April 21, 2011, 2:36 p.m.
posted by pythonics
Languages and Metalanguages
A language is composed of commonly accepted symbols that we assemble in a meaningful way in order to express ourselves and to pass along information that is intelligible to others. For example, English is a language with rules (grammar) that define how to put its symbols (words) together to form sentences, paragraphs, and, ultimately, books like the one you are holding. If you know the words and understand the grammar, you can read the book, even if you don't necessarily understand its contents.
An important difference between human and computer-based languages is that human languages are self-describing. We use English sentences and paragraphs to define how to create correct English sentences and paragraphs. Our brains are marvelous machines that have no problem understanding that you can use a language to describe itself. However, computer languages are not so rich and computers are not so bright that you could easily define a computer language with itself. Instead, we define one language, a metalanguage, that defines the rules and symbols for other computer languages.
Software developers create the metalanguage rules and then define one or more languages based on those rules. The metalanguage also guides developers who create the automated agents that display or otherwise process the contents of documents that use its language(s).
XML is the metalanguage the W3C created and that developers use to define markup languages such as XHTML. Browser developers rely on XML's metalanguage rules to create automated processes that read the language definition of XHTML and implement the processes that ultimately display or otherwise process XHTML documents.
Why bother with a markup metalanguage? Because, as the familiar proverb goes, the W3C wants to teach us how to fish so that we can feed ourselves for a lifetime. With XML, there is a standardized way to define markup languages for different needs, instead of having to rely upon HTML extensions. Mathematicians need a way to express mathematical notations, for instance; composers need a way to present musical scores; businesses want their web sites to take sales orders from customers; physicians look to exchange medical records; plant managers want to run their factories from web-based documents. All of these groups need an acceptable, resilient way to express these different kinds of information so that the software industry can develop the programs that process and display these diverse documents.
XML provides the answer. Each content sector (the business group, the factory-automation consortium, a trade association) may define a markup language that suits its particular need for information exchange and processing over the Web. Computer programmers then create XML-compliant processes, called parsers, that read the new language definitions and allow the server to process the documents of those languages.
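To make the parser idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The <order> and <score> vocabularies and the helper names element_names and is_well_formed are invented for illustration; they are not part of any real markup language. Note too that ElementTree is a non-validating parser: it checks only that a document is well formed and does not read a language definition.

```python
import xml.etree.ElementTree as ET

# Two made-up markup languages (hypothetical vocabularies, for illustration only).
ORDER = "<order><customer>Ada</customer><item qty='2'>widget</item></order>"
SCORE = "<score><measure><note pitch='C4'/><note pitch='E4'/></measure></score>"


def element_names(xml_text):
    """Return the sorted set of element names used in a document."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()})


def is_well_formed(xml_text):
    """Report whether the text obeys XML's basic syntax rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # The same parser code handles both languages without modification.
    print(element_names(ORDER))
    print(element_names(SCORE))
    # A mismatched closing tag breaks the rules of the metalanguage itself.
    print(is_well_formed("<order><item></order>"))
```

The point of the sketch is that nothing in the code is specific to orders or musical scores: because both vocabularies follow XML's rules, one generic parser can read either.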
Creation Versus Display
While there is no limit to the kinds of markup languages that you can create with XML, displaying your documents may be more complicated. For instance, when you write HTML, a browser understands what to do with the <h1> tag because it is defined in the HTML DTD.
With XML, you create the DTD. For example, wouldn't a recipe DTD be a great way to capture and standardize all those kumquat recipes you've been collecting in your kitchen drawers? With special <ingredient> and <portion> tags, the recipes are easy to define and understand. However, browsers won't know what to do with these new tags unless you attach a stylesheet that defines their handling. Without a stylesheet, XML-compliant browsers render these tags in a very generic way, which is certainly not the flourishing presentation your kumquat recipes deserve.
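As a rough sketch of what such a recipe language might look like, the Python example below embeds a small DTD and a sample document, then renders it two ways. Everything here is an assumption made for illustration: the content model, the <title> element, the render function, and the STYLES table, which is only a stand-in for a real stylesheet and is not CSS or XSL. Python's ElementTree parser reads the DTD declarations but does not validate the document against them.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe DTD (internal subset) followed by a sample document.
RECIPE = """<!DOCTYPE recipe [
  <!ELEMENT recipe (title, (ingredient, portion)+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT portion (#PCDATA)>
]>
<recipe>
  <title>Kumquat Marmalade</title>
  <ingredient>kumquats</ingredient>
  <portion>2 cups</portion>
  <ingredient>sugar</ingredient>
  <portion>1 cup</portion>
</recipe>"""

# Stand-in for a stylesheet: one formatting rule per tag name.
STYLES = {
    "title": lambda text: text.upper(),
    "ingredient": lambda text: "- " + text,
    "portion": lambda text: "    (" + text + ")",
}


def render(xml_text, styles=None):
    """Render each child of the root, falling back to a generic 'tag: text' form."""
    styles = styles or {}
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        text = (child.text or "").strip()
        style = styles.get(child.tag)
        lines.append(style(text) if style else f"{child.tag}: {text}")
    return lines


if __name__ == "__main__":
    print("\n".join(render(RECIPE)))          # generic rendering, no style rules
    print("\n".join(render(RECIPE, STYLES)))  # rendering with per-tag rules
```

Without the style table, every tag falls through to the same generic treatment, which mirrors how a browser handles tags it has no presentation rules for.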
Even with stylesheets, there are limitations to presenting XML-based information. Let's say you want to create something more challenging, such as a DTD for musical notation or silicon chip design. While describing these data types in a DTD is possible, displaying this information graphically is certainly beyond the capabilities of any stylesheets we've seen yet; properly displaying this type of graphically rich information would require a specialized rendering tool.
Nonetheless, your recipe DTD is a great tool for capturing and sharing recipes. As we'll see later in this chapter, XML isn't simply about creating markup languages for displaying content in browsers. It has great promise for sharing and managing information so that those precious kumquat dishes will be preserved for many generations to come. Just bear in mind that, in addition to writing a DTD to describe your new XML-based markup language, in most cases you will want to supplement the DTD with a stylesheet.
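To illustrate the information-management side, here is a small Python sketch that searches a collection of recipes by ingredient, with no display involved at all. The <recipes> wrapper element, the <title> element, the sample data, and the recipes_using function are all hypothetical; only the <ingredient> and <portion> tags come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe collection; the data is invented for illustration.
COLLECTION = """<recipes>
  <recipe>
    <title>Kumquat Marmalade</title>
    <ingredient>kumquats</ingredient><portion>2 cups</portion>
    <ingredient>sugar</ingredient><portion>1 cup</portion>
  </recipe>
  <recipe>
    <title>Kumquat Salad</title>
    <ingredient>kumquats</ingredient><portion>1 cup</portion>
    <ingredient>arugula</ingredient><portion>3 cups</portion>
  </recipe>
</recipes>"""


def recipes_using(xml_text, ingredient):
    """Return the titles of recipes that list the given ingredient."""
    root = ET.fromstring(xml_text)
    wanted = ingredient.strip().lower()
    titles = []
    for recipe in root.findall("recipe"):
        # Compare ingredient names case-insensitively.
        names = {
            (element.text or "").strip().lower()
            for element in recipe.findall("ingredient")
        }
        if wanted in names:
            titles.append(recipe.findtext("title", default="").strip())
    return titles


if __name__ == "__main__":
    print(recipes_using(COLLECTION, "sugar"))
    print(recipes_using(COLLECTION, "kumquats"))
```

Because the tags name what each piece of data is rather than how it looks, a program can answer questions like this directly from the documents.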
A Little History
To complete your education into the whys and wherefores of markup languages, it helps to know how all these markup languages came to be.
In the beginning, there was SGML. SGML was intended to be the only metalanguage from which all markup languages would derive. With SGML, you can define everything from hieroglyphics to HTML, negating the need for any other metalanguage.
The problem with SGML is that it is so broad and all-encompassing that mere mortals cannot use it. Using SGML effectively requires very expensive and complex tools that are completely beyond the scope of regular people who just want to bang out an HTML document in their spare time. As a result, developers created other markup languages that are greatly reduced in scope and are much easier to use. The HTML standards themselves were initially defined using a subset of SGML that eliminated many of its more esoteric features. The DTD in Appendix D uses this subset of SGML to define the HTML 4.01 standard.
Recognizing that SGML was too unwieldy to describe HTML in a useful way and that there was a growing need to define other HTML-like markup languages, the W3C defined XML. XML is a formal markup metalanguage that uses select features of SGML to define markup languages in a style similar to that of HTML. It eliminates many SGML elements that aren't applicable to languages such as HTML, and simplifies other elements to make them easier to use and understand.
XML is a middle ground between SGML and HTML, a useful tool for defining a wide variety of markup languages. XML is becoming increasingly important as the Web extends beyond browsers and moves into the realm of direct data interchange among people, computers, and disparate systems. A small number of people wind up creating new markup languages with XML, and many more people want to be able to understand XML DTDs in order to use all of these new markup languages.