System.Xml Version 1.x

In this section we briefly review System.Xml version 1.x, as delivered in previous releases of the .NET Framework. Many of the innovations in version 2.0 build on the version 1.x designs and methodologies and continue to serve the requirements that drove the System.Xml version 1.x architecture.

Version 1.x Design Goals

The following list of requirements was used to drive the design of System.Xml in version 1.x.

  • W3C standards compliance: Support for the major W3C XML standards that provide cross-platform interoperability is necessary. These are XML 1.0, Namespaces in XML, XSLT 1.0, XPath 1.0, and W3C XML Schema. The DOM Level 2 specification is also included, but because it is an API specification it has little to do with interoperability, as the large number of different DOM implementations on each platform demonstrates. It does give developers some level of API familiarity, which is useful, but the API suits only particular scenarios.

  • XML provider model: When data access APIs are discussed, these are often thought of in terms of relational APIs such as ADO, OLE-DB, ODBC, and ADO.NET. However, a primary goal was to provide a base to build XML data access providers and hence create an extensible and pluggable architecture. This was achieved through the use of abstract classes to define the XML API and the behavior (or semantics) of the providers as well as concrete implementations. The three abstract classes that constitute this XML provider model are the XmlReader, the XmlWriter, and the XPathNavigator. The latter is of particular significance because it combines a cursor-style API with an XPath 1.0 query engine. The other supporting API that becomes more significant in version 2.0 is the XmlResolver.

  • Integration with ADO.NET: The System.Xml classes can really be considered part of ADO.NET because they are data access APIs, and it was a goal to provide as seamless an experience as possible when moving between XML and relational data. This is epitomized in the flexibility of the DataSet class to both read and write XML into relational table structures.
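The XML provider model in the list above can be sketched in a few lines. The consumer below is written against the abstract XmlReader only, so either version 1.x implementation can be plugged in. The class and method names (ProviderModelSketch, CountElements, and so on) are hypothetical and exist only for illustration.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of System.Xml.
public static class ProviderModelSketch
{
    // Consumer code: depends only on the abstract XmlReader API.
    public static int CountElements(XmlReader reader)
    {
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                count++;
            }
        }
        return count;
    }

    // Provider 1: XmlTextReader parses serialized XML 1.0 text.
    public static int CountFromText(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            return CountElements(reader);
        }
        finally
        {
            reader.Close();
        }
    }

    // Provider 2: XmlNodeReader streams over an in-memory XmlDocument.
    public static int CountFromDom(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeReader reader = new XmlNodeReader(doc);
        try
        {
            return CountElements(reader);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Both entry points should report the same count for the same input, because CountElements never needs to know which provider is behind the reader.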

When we refer to the XML 1.0 standard, we mean the serialized text format, with its angle brackets (< and >). Contrast this with a data model, which specifies what information in the document is accessible but does not specify the APIs used to represent or access the data, nor how the data is written.

Before diving into more detail on these requirements, we'll examine the main classes in System.Xml version 1.x in relation to their usage scenarios.

XML Reading and Writing

The XmlReader and XmlWriter classes provide input/output (I/O) support for XML by providing the abstract definition of how to read or parse XML and how to generate a stream of XML content. The XmlReader implementations are the XmlTextReader, which reads streams of text characters serialized according to the W3C XML 1.0 specification, and the XmlNodeReader, which can read an XmlDocument (DOM) in a streaming fashion. A major innovation in the .NET Framework was the implementation of the XmlReader as a pull model parser, which significantly reduces the complexity of the code needed to read an XML document. This was a first for any platform.
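To illustrate the pull model, here is a minimal sketch in which the calling code drives the parser and stops as soon as it has what it needs. The class PullParsingSketch and its method are hypothetical names chosen for this example.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of System.Xml.
public static class PullParsingSketch
{
    // Returns the text of the first element with the given local name,
    // or null if no such element exists.
    public static string FirstElementText(string xml, string elementName)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // The caller asks for each node; nothing is pushed at it.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == elementName)
                {
                    // Read the element's text content and stop parsing.
                    return reader.ReadString();
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Because the application owns the loop, it needs no callback state machine, and the rest of the document is never parsed once the value has been found.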

The XmlTextWriter generates XML according to the W3C XML 1.0 specification, checking the document structure according to XML well-formedness rules. The push model XmlWriter provides an API that is significantly easier to use when generating XML. As a rule, prefer these classes over the XmlDocument and other in-memory stores, such as the XPathDocument, when you do not need the ability to edit an entire document.
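A short sketch of the push model follows. The code pushes elements into an XmlTextWriter, which takes care of tag nesting and character escaping. The class PushWritingSketch, the method WriteOrder, and the element names are assumptions made for illustration.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of System.Xml.
public static class PushWritingSketch
{
    // Serializes the given items as <order><item>...</item>...</order>.
    public static string WriteOrder(string[] items)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);

        writer.WriteStartElement("order");
        foreach (string item in items)
        {
            // Text content is escaped by the writer (for example, < becomes &lt;).
            writer.WriteElementString("item", item);
        }
        writer.WriteEndElement();

        writer.Flush();
        string result = output.ToString();
        writer.Close();
        return result;
    }
}
```

Calling WriteOrder(new string[] { "pen", "a<b" }) should yield markup in which the second item's text is escaped, without the application building any strings by hand.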

XML Document Editing

The W3C DOM was the first widely used API for manipulating XML, and it is still used in many XML applications. The XmlDocument class implements the W3C DOM Level 2 specification, but with a .NET feel. This means that it uses the .NET types and provides many useful additional methods, such as Load, Save, and SelectNodes for XPath support, as well as node-level properties such as InnerXml and OuterXml to easily extract and build node trees from strings. You should use the XmlDocument only when you need to load the whole XML document into memory and the application needs random access to all parts of the document (as XPath requires).
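The following sketch shows the kind of in-place editing that justifies loading a whole document into an XmlDocument. It uses SelectNodes, InnerXml, and OuterXml as described above, with LoadXml standing in for Load because the input is a string. The class DomEditingSketch and the element and attribute names are hypothetical.

```csharp
using System.Xml;

// Hypothetical helper class, not part of System.Xml.
public static class DomEditingSketch
{
    // Marks every item as shipped and appends a note built from a string.
    public static string MarkShipped(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath selection over the in-memory tree.
        foreach (XmlElement item in doc.SelectNodes("/order/item"))
        {
            item.SetAttribute("shipped", "true");
        }

        // InnerXml parses the string into a node subtree.
        XmlElement note = doc.CreateElement("note");
        note.InnerXml = "<b>rush</b>";
        doc.DocumentElement.AppendChild(note);

        // OuterXml serializes the edited tree back to a string.
        return doc.OuterXml;
    }
}
```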

The term "in-memory XML store" or just "XML store" is used in this book to describe a document that has been loaded into memory and is accessed via an XML API. For example, the XmlDocument, XPathDocument, and XmlDataDocument classes are all XML stores in the .NET Framework. By contrast, the ADO.NET DataSet class is a relational store because it has a relational API and storage model.

XML Validation or Content Checking

The ability to check, or validate, the structure of an XML document according to a Document Type Definition (DTD) or an XML schema is provided by the XmlValidatingReader class, which layers validation support on top of an XmlTextReader. This class ensures that when an XML instance document is read, the structure and (in the case of an XML schema) the types are matched according to the schema definition. Any irregularities cause errors to be thrown. When using XML schemas to perform this validation, there is the added benefit of being able to return .NET Common Language Runtime (CLR) types mapped from the XML schema types. For example, a DateTime CLR type is returned for the numerous W3C XML schema date types, such as gDay, gMonthDay, gYear, and so on. The W3C XML schemas are represented via the XmlSchema class and its related classes, such as XmlSchemaElement, XmlSchemaAttribute, and so on. The XmlSchemaCollection class is a library of XML schemas loaded into memory that can be used by the XmlValidatingReader.
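As a minimal sketch of this layering, the code below wraps an XmlTextReader in an XmlValidatingReader and counts the validation errors reported while reading. It validates against an inline DTD purely for brevity; schema validation follows the same reading pattern with ValidationType.Schema and schemas supplied through the reader's Schemas property. The class ValidationSketch and its members are hypothetical names.

```csharp
#pragma warning disable 0618 // XmlValidatingReader is a 1.x class, marked obsolete in later releases.
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper class, not part of System.Xml.
public static class ValidationSketch
{
    // Shared counter keeps the sketch short; it is not thread-safe.
    private static int errorCount;

    // Reads the whole document and returns the number of validation events raised.
    public static int CountValidationErrors(string xml)
    {
        errorCount = 0;
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader validator = new XmlValidatingReader(textReader);
        validator.ValidationType = ValidationType.DTD;

        // With a handler attached, problems are reported as events
        // instead of being thrown as exceptions.
        validator.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);
        try
        {
            while (validator.Read())
            {
                // Validation happens as a side effect of reading.
            }
        }
        finally
        {
            validator.Close();
        }
        return errorCount;
    }

    private static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        errorCount++;
    }
}
```

A document that conforms to its DTD should produce a count of zero, while one containing an undeclared element should produce at least one error.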

XML Querying

XPath 1.0 has established itself as the preferred query language for XML, in part because of its similarity to a file path–like syntax. The XPathNavigator class is another major innovation in System.Xml version 1.x because not only does this class separate the XPath query engine from the XSLT engine but it also provides a cursor-style API for random access retrieval within the XML. Importantly, this class implements the XPath 1.0 data model via the XPathNodeType enumeration, which means that it has a much simpler set of node types based on the XML Information Set (or InfoSet) types, rather than the overburdened DOM node types represented by the XmlNodeType enumeration.
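The cursor-style side of the XPathNavigator can be sketched as follows. The navigator is a single movable position rather than a tree of node objects, and its node types come from the XPathNodeType enumeration. The class NavigatorSketch is hypothetical, and the sketch assumes the document element is the first child of the root (no leading comments or processing instructions).

```csharp
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper class, not part of System.Xml.
public static class NavigatorSketch
{
    // Returns the names of the document element's child elements, comma separated.
    public static string ChildNames(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The navigator starts on the root node of the XPath data model.
        XPathNavigator nav = doc.CreateNavigator();
        nav.MoveToFirstChild(); // move the cursor to the document element

        string names = "";
        if (nav.MoveToFirstChild()) // first child of the document element
        {
            do
            {
                if (nav.NodeType == XPathNodeType.Element)
                {
                    if (names.Length > 0)
                    {
                        names += ",";
                    }
                    names += nav.LocalName;
                }
            }
            while (nav.MoveToNext()); // advance the cursor to the next sibling
        }
        return names;
    }
}
```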

The XPathDocument Class: A Better XML Store for XPath Queries

In order to support faster querying, and hence faster XSLT, an XML store that is based on the XPath data model is provided. This is the XPathDocument. The XPathDocument is typically 20%–30% faster than the XmlDocument for queries and is the preferred store for XSLT transformations. The XmlDocument should be used only if you need to edit the XML document before performing the transformation. In System.Xml version 1.x the XPathDocument is a read-only store, and one of the significant changes in version 2.0 is to make this editable via a cursor-style API. This now makes the XPathDocument the preferred class not only for querying but also for editing. Given its other new features (which we'll see in Chapter 6), the XPathDocument will supplant the XmlDocument class as the main XML store.
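Querying the read-only XPathDocument might look like the sketch below, which compiles an XPath expression and iterates over the matching nodes. The class XPathDocumentSketch, its method, and the sample element names in the usage note are assumptions for illustration only.

```csharp
using System.IO;
using System.Xml.XPath;

// Hypothetical helper class, not part of System.Xml.
public static class XPathDocumentSketch
{
    // Returns the string values of all nodes matching the XPath, comma separated.
    public static string SelectValues(string xml, string xpath)
    {
        // Load the XML into the XPath-optimized store.
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // Compile once; a compiled expression can be reused across queries.
        XPathExpression expression = nav.Compile(xpath);
        XPathNodeIterator matches = nav.Select(expression);

        string values = "";
        while (matches.MoveNext())
        {
            if (values.Length > 0)
            {
                values += ",";
            }
            values += matches.Current.Value;
        }
        return values;
    }
}
```

For an order document whose item elements carry a qty attribute, an expression such as /order/item[@qty > 3] would return only the items whose quantity exceeds three.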

XSL Transformation

The XslTransform class is an XSLT processor that queries and transforms XML according to XSLT stylesheets. XSLT is a heavily used XML technology because it solves a range of development issues centered on shaping XML. The classic use of XSLT is for data-driven Web sites. Here the data layer is used to generate the presentation layer, typically as HTML, thereby allowing a clean UI separation (i.e., a data/view separation). As new data feeds are introduced, these may or may not be rendered in the UI differently, depending on the generation rules embedded in the XSLT stylesheets.

The innovation of the XslTransform class is its ability to provide a streaming output via an XmlReader (pull) or an XmlWriter (push), rather than as an XML store such as an XPathDocument or XmlDocument as required by most other XSLT processors. This means that the output of the XSL transformation does not need to be cached unnecessarily if it is being streamed, for example, via an XML Web Service.
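The push form of this streaming output can be sketched as below: the transformation writes directly into an XmlWriter instead of building a result store. The class TransformSketch and its method are hypothetical, and the sketch captures the output in a string only so that the result is easy to inspect; in practice the writer could wrap a network stream.

```csharp
#pragma warning disable 0618 // XslTransform is a 1.x class, marked obsolete in later releases.
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper class, not part of System.Xml.
public static class TransformSketch
{
    // Applies the stylesheet to the input XML and returns the serialized result.
    public static string Transform(string xml, string stylesheet)
    {
        XslTransform xslt = new XslTransform();
        xslt.Load(new XmlTextReader(new StringReader(stylesheet)));

        // The XPathDocument is the preferred input store for transformations.
        XPathDocument input = new XPathDocument(new StringReader(xml));

        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);

        // Results are pushed into the writer as they are produced.
        xslt.Transform(input, new XsltArgumentList(), writer, null);

        writer.Flush();
        string result = output.ToString();
        writer.Close();
        return result;
    }
}
```

Passing a stylesheet that turns each item element into an li element should produce the list markup in a single pass, with no intermediate XmlDocument or XPathDocument holding the result.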
