April 11, 2011, 10:05 p.m.
posted by max
While you may already be familiar with XML, it is important to understand XML concepts from the point of view of applications handling XML documents. With this knowledge, you are in a better position to judge the impact of your design decisions on the implementation and performance of your XML-based applications.
Essentially, XML is a markup language that enables hierarchical data content, such as that held in programming language data structures, to be represented as a marked-up text document. As a markup language, XML uses tags to mark pieces of data. Each tag attempts to assign meaning to the data associated with it; that is, to transform the data into information. If you know SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language), then XML will look familiar to you. XML is derived from SGML, of which it is a simplified subset, and it also bears some resemblance to HTML, which was defined as an application of SGML. But unlike HTML, XML focuses on representing data rather than end-user presentation. While XML aims to separate data from presentation, the end-user presentation of XML data is nevertheless addressed by additional XML-based technologies in rich and various ways.
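To make the idea of representing a hierarchical data structure as marked-up text concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `order` data, the tag names, and the `to_element` helper are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def to_element(tag, value):
    """Recursively convert a nested dict/list/scalar into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            if isinstance(child_value, list):
                # A list becomes repeated child elements with the same tag.
                for item in child_value:
                    elem.append(to_element(child_tag, item))
            else:
                elem.append(to_element(child_tag, child_value))
    else:
        # Scalars become the text content of the element.
        elem.text = str(value)
    return elem

# Hypothetical order data; the tag names are illustrative only.
order = {
    "id": 1234,
    "customer": {"name": "Duke", "city": "Santa Clara"},
    "item": [
        {"sku": "A-1", "quantity": 2},
        {"sku": "B-7", "quantity": 1},
    ],
}

root = to_element("order", order)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each tag in the output names the piece of data it encloses, and the nesting of the tags mirrors the nesting of the original structure.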
Although XML documents are not primarily intended to be read by users, the XML specification clearly states as one of its goals that "XML documents should be human-legible and reasonably clear." This legibility contributed to XML's adoption. XML supports both computer and human communication, and, being plain text, it offers greater openness, transparency, and platform independence than a binary format.
A grammar along with its vocabulary (also called a schema in the generic sense of the word) defines the set of tags and their nesting (the tag structure) that may be allowed or that are expected in an XML document. In addition, a schema can be specific to a particular domain, and domain-specific schemas are sometimes referred to as markup vocabularies. The Document Type Definition (DTD) syntax, which is part of the core XML specification, allows for the definition of domain-specific schemas and gives XML its "eXtensible" capability. Over time, the number of these XML vocabularies or XML-based languages has grown steadily, and this extensibility is a key factor in XML's success. In particular, XML and its vocabularies are becoming the lingua franca of business-to-business (B2B) communication.
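The notion of a grammar that fixes which tags may appear and how they may nest can be illustrated with a toy checker. This is only a sketch under stated assumptions: the `GRAMMAR` table and the `find_violations` function are hypothetical, and the check covers allowed nesting only. It is not a DTD validator and ignores ordering, cardinality, and content rules that a real schema language can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: each tag maps to the set of child tags it may contain.
GRAMMAR = {
    "order": {"id", "customer", "item"},
    "customer": {"name", "city"},
    "item": {"sku", "quantity"},
    "id": set(), "name": set(), "city": set(), "sku": set(), "quantity": set(),
}

def find_violations(elem, grammar, path=""):
    """Return messages for tags or nestings that the grammar does not allow."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in grammar:
        return [f"unknown tag at {here}"]
    problems = []
    for child in elem:
        if child.tag not in grammar[elem.tag]:
            problems.append(f"<{child.tag}> not allowed inside {here}")
        else:
            problems.extend(find_violations(child, grammar, here))
    return problems

good = ET.fromstring("<order><id>1</id><customer><name>Duke</name></customer></order>")
bad = ET.fromstring("<order><id>1</id><customer><sku>A-1</sku></customer></order>")

good_problems = find_violations(good, GRAMMAR)
bad_problems = find_violations(bad, GRAMMAR)
print(good_problems)
print(bad_problems)
```

The first document uses only permitted nestings, so no problems are reported; the second places a `sku` tag inside `customer`, which the toy grammar does not allow.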
In sum, XML is a metalanguage used to define other markup languages. While tags help to describe XML documents, they are not sufficient, even when carefully chosen, to make a document completely self-describing. Schemas written as DTDs, or in some other schema language such as the W3C XML Schema Definition (XSD), improve the descriptiveness of XML documents since they may define a document's syntax or exact structure. But even with the type systems introduced by modern schema languages, it is usually necessary to accompany an XML schema with specification documents that describe the domain-specific semantics of the various XML tags. These specifications are intended for application developers and others who create and process the XML documents. Schemas are necessary for specifying and validating the structure and, to some extent, the content of XML documents. Even so, developers must ultimately build the XML schema's tag semantics into the applications that produce and consume the documents. However, thanks to the well-defined XML markup scheme, intermediary applications such as document routers can still handle documents partially or in a generic way without knowing the complete domain-specific semantics of the documents.
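The point about intermediaries can be sketched in a few lines: a router that inspects only the root tag and never interprets the rest of the document. The `ROUTES` table, the queue names, and the `route` function are hypothetical; a real router might equally key on a namespace or an envelope header.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: root tag -> destination queue name.
ROUTES = {"order": "orders-queue", "invoice": "billing-queue"}

def route(document_text, routes, default="dead-letter-queue"):
    """Pick a destination from the root tag alone; the body is not interpreted."""
    root = ET.fromstring(document_text)
    return routes.get(root.tag, default)

print(route("<invoice><total currency='USD'>42.00</total></invoice>", ROUTES))
print(route("<memo>hello</memo>", ROUTES))
```

The router relies only on the documents being well-formed XML, so it keeps working when the domain-specific content of an invoice or order changes.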
1 Document Type and W3C XML Schema Definitions
Originally, the Document Type Definition (DTD) syntax, which is part of the core XML 1.0 specification and became a recommendation in 1998, allowed for the definition of domain-specific schemas. However, with the growth in the adoption of XML (particularly in the B2B area), it became clear that the DTD syntax had some limitations. DTD's limitations are:
To address these shortcomings, the W3C defined the XML Schema Definition language (XSD). (XSD became an official recommendation of the W3C in 2001.) XSD addresses some of the shortcomings of DTD, as do other schema languages, such as RELAX NG. In particular, XSD:
The following convention applies to the rest of the chapter: The noun "schema" or "XML schema" designates the grammar or schema to which an XML document must conform and is used regardless of the actual schema language (DTD, XSD, and so forth). Note: While XSD plays a major role in Web services, Web services may still have to deal with DTD-based schemas for legacy reasons.
As an additional convention, we use the word "serialization" to refer to XML serialization and deserialization. We explicitly refer to Java serialization when referring to serialization supported by the Java programming language. Also note that we may use the terms "marshalling" and "unmarshalling" as synonyms for XML serialization and deserialization. This is the same terminology used by XML data-binding technologies such as JAXB.
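As a rough illustration of XML serialization and deserialization in the sense used here, the following sketch marshals an object to XML and unmarshals it back. It is written by hand with Python's standard library rather than with a data-binding framework, and the `Customer` class, `marshal`, and `unmarshal` are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    city: str

def marshal(customer):
    """Serialize a Customer object to an XML string."""
    root = ET.Element("customer")
    ET.SubElement(root, "name").text = customer.name
    ET.SubElement(root, "city").text = customer.city
    return ET.tostring(root, encoding="unicode")

def unmarshal(xml_text):
    """Deserialize an XML string back into a Customer object."""
    root = ET.fromstring(xml_text)
    return Customer(name=root.findtext("name"), city=root.findtext("city"))

original = Customer(name="Duke", city="Santa Clara")
xml_text = marshal(original)
restored = unmarshal(xml_text)
print(xml_text)
print(restored == original)
```

A data-binding technology generates or derives this kind of mapping code from a schema or from annotated classes, so that applications do not have to write it by hand.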
2 XML Horizontal and Vertical Schemas
XML schemas, which are applications of the XML language, may apply XML to horizontal or vertical domains. Horizontal domains are cross-industry domains, while vertical domains are specific to types of industries. Specific XML schemas have been developed for these different types of domains, and these horizontal and vertical applications of XML usually define publicly available schemas.
Many schemas have been established for horizontal domains; that is, they address issues that are common across many industries. For example, W3C specifications define such horizontal domain XML schemas or applications as Extensible HyperText Markup Language (XHTML), Scalable Vector Graphics (SVG), Mathematical Markup Language (MathML), Synchronized Multimedia Integration Language (SMIL), Resource Description Framework (RDF), and so forth.
Likewise, there are numerous vertical domain XML schemas. These schemas or applications of XML define standards that extend or apply XML to a vertical domain, such as e-commerce. Typically, groups of companies in an industry develop these standards. Some examples of e-commerce XML standards are Electronic Business with XML (ebXML), Commerce XML (cXML), Common Business Language (CBL), and Universal Business Language (UBL).
When designing an enterprise application, developers often define their own custom schemas. These custom schemas may be kept private within the enterprise, shared only with those partners that intend to exchange data with the application, or publicly exposed. Such custom or application-specific schemas are either defined from scratch or, where appropriate, built by reusing existing horizontal or vertical schema components. Note that publishing schemas in order to share them among partners can be implemented in various ways, including publishing the schemas along with Web service descriptions on a registry (see "Publishing a Web Service").
3 Other Specifications Related to XML
For those interested in exploring further, here is a partial list of the many specifications that relate to XML.