Working with Raw XML

The XmlDataSource control gives you a great way to display XML data, but there are times when you need to work with XML documents directly. Several classes support this, and which ones you use depends on your requirements. Broadly, there are two ways of working with XML: loading the whole document into memory, or streaming it node by node from a file or other source, and the approach you choose dictates the classes you use. Which approach suits you is again a question of requirements, and certain scenarios naturally lead to one or the other. Typical scenarios where you should use in-memory XML stores include the following:

  • XSLT transformations: Transformations naturally suit an in-memory store, because a stylesheet can access nodes in any order.

  • Random access to a document that is being updated by a user: For example, users often work on Microsoft Word documents over a period of time. Performance here is not that critical, because a user's typing speed is typically the limiting factor.

  • Caching configuration state or data that will be read many times: In-memory XML stores are best used as one layer of a caching strategy that improves application responsiveness. Examples of this scenario include caching weather data for multiple page hits from different clients and working with an application configuration file that is loaded when the application starts and is read many times.

  • Application of business rules to a document to check its validity: For example, e-commerce shopping carts may apply a discount to an order if a person buys more than five books. From an implementation perspective, this relies on the node-level events that the document fires when nodes are inserted, changed, or removed.

  • XML Digital Signatures (XMLDSig): In some cases (but not all), in order to sign a document with an XML digital signature, the document has to be loaded into memory first. Streaming models for XMLDSig are also possible based on the XmlReader.
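The shopping-cart rule above relies on the events an XmlDocument raises as its nodes change. The following is a minimal sketch, not a production rule engine: the cart layout (a <cart> element containing <book> elements), the discount attribute, and its "10%" value are all hypothetical. The NodeInserted handler is attached after the document is loaded, so only later edits trigger the rule.

```csharp
using System.Xml;

// Hypothetical cart rule: once more than five <book> elements sit under
// <cart>, mark the cart with a discount attribute.
public static class CartRules
{
    public static XmlDocument CreateCart()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<cart />");
        // Attach after loading so that only subsequent edits run the rule
        doc.NodeInserted += OnNodeInserted;
        return doc;
    }

    public static void AddBook(XmlDocument doc, string title)
    {
        XmlElement book = doc.CreateElement("book");
        book.SetAttribute("title", title);      // also raises NodeInserted (for the attribute)
        doc.DocumentElement.AppendChild(book);  // raises NodeInserted for the <book> element
    }

    private static void OnNodeInserted(object sender, XmlNodeChangedEventArgs e)
    {
        // Ignore attributes, text, and anything that is not a <book> element
        if (e.Node is XmlElement node && node.Name == "book" &&
            e.NewParent is XmlElement cart)
        {
            if (cart.GetElementsByTagName("book").Count > 5)
                cart.SetAttribute("discount", "10%");
        }
    }
}
```

Because the handler filters on the inserted node, the attribute insertions it causes itself (and the title attributes) do not re-trigger the rule.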

Typical scenarios where you should not use in-memory XML stores include the following:

  • When performance is critical: Building a tree of nodes just to access a few elements within it is extremely costly and should be avoided.

  • When memory resources are critical: Typically a UTF-8 encoded document will quadruple in size when loaded into memory, and a UTF-16 document will roughly double or triple. Thus your 1MB document jumps to 4MB in memory, with the CLR garbage collector working overtime to reclaim the hundreds of strings allocated for it once the document is no longer used. By contrast, the XmlReader loads a 4K character buffer from the stream along with some additional state, so it is much less memory-intensive. The XmlReader can also read XML documents of any size (hundreds of gigabytes, if necessary; there is no upper limit), which is impossible with the memory-constrained XmlDocument.

  • When you have a "touch-once" scenario: For example, in some low-volume XML messaging systems, once the message header has been read to determine the routing or action information, the body of the message can simply be passed on to the relevant processing module. Message caching is still necessary in many cases to ensure throughput, because you cannot rely on the application to stream the received XML messages fast enough.
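For the touch-once case, an XmlReader lets you read just far enough to make a routing decision and then stop. This is a minimal sketch that assumes a hypothetical message layout, with a <route> element inside <header> followed by a <body>; the element names are illustrative only.

```csharp
using System.IO;
using System.Xml;

// Hypothetical layout:
// <message><header><route>...</route></header><body>...</body></message>
public static class MessageRouter
{
    // Reads forward only until the routing information is found; the rest
    // of the message is not processed by this method.
    public static string ReadRoute(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;
                if (reader.Name == "route")
                    return reader.ReadElementContentAsString();
                if (reader.Name == "body")
                    return null;   // reached the body without finding a route
            }
        }
        return null;
    }
}
```

The body could then be handed on unparsed (for example as a stream) to whichever module the route names.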

Streaming XML

Streaming XML is a connected scenario, in which you move forward through an XML file or stream one node at a time. You use the XmlReader class for reading, and the XmlWriter class for writing. Typical scenarios where you will be streaming XML include the following:

  • Reading and writing application XML configuration files

  • Reading from URLs such as RSS feeds or writing the XML for the RSS feeds

  • Validating an XML document against an XML schema to ensure that it conforms to the schema and, optionally, to enforce business rules

  • Combining the XmlReader and XmlWriter to perform simple data transformations. Consider this as an alternative to using XSLT, because this combination is often faster and uses less memory.

  • Pipelining XML processing. For example, you may have an XML document that is validated by a series of different XML schemas, with business rules written in Visual Basic, C#, or XSLT that act on the data.

  • Accessing relational data as XML from a SQL Server database

  • Implementing a custom XmlReader or XmlWriter in order to expose data not necessarily stored as XML through the XML data model. For example, you can implement an XmlReader over the file system to make it look like an XML document in which file properties, such as creation date and file size, are mapped to XML attributes. Equally, you can implement an XmlWriter that enables you to write to the file system as if it were an XML document.
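The XmlReader and XmlWriter combination mentioned in the list above can be sketched as a streaming copy that renames one element as it goes, without ever building a tree. This is a minimal illustration with stated limits: it copies only elements, attributes, and text (whitespace, comments, processing instructions, CDATA, and namespace prefixes are not handled), and the shipper/carrier names used with it are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class StreamingRename
{
    // Streams xml from a reader to a writer, renaming elements called `from`
    // to `to`. Only elements, attributes, and text are copied in this sketch.
    public static string Rename(string xml, string from, string to)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Name == from ? to : reader.Name);
                        writer.WriteAttributes(reader, false); // copy attributes; reader stays on the element
                        if (reader.IsEmptyElement)
                            writer.WriteEndElement();          // <x/> has no separate EndElement node
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                }
            }
        }   // disposing the writer flushes it into output
        return output.ToString();
    }
}
```

Only one node is held at a time, which is why this style of transformation uses so little memory compared with loading the document for XSLT.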

All of these scenarios can be achieved with the XmlReader and XmlWriter classes, both of which are simple to use, although you need to understand a bit about how XML documents are structured. This is best seen with an example.

Reading XML Documents

Consider Listing 7.12, which uses the static Create method to create an XmlReader over the shippers file. Like the SqlDataReader, the XmlReader uses the Read method to move to the next node in the underlying data, returning false when there are no more nodes to read. The Name property returns the name of the current node.

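Listing 7.12. Using an XmlReader

// File name assumed; the text refers only to "the shippers file"
using (XmlReader rdr = XmlReader.Create(Server.MapPath("shippers.xml")))
{
  while (rdr.Read())   // false once there are no more nodes
    Response.Write(rdr.Name + "<br/>");
}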

What's interesting about this code fragment is that it returns more than you'd first think.


What you notice is that the element names appear twice, because the start and end tags of each element appear as separate nodes. There are also other nodes in the XML file: whitespace appears as a separate node, as does the text value of an element, and because these nodes have no name, they show up as empty lines. This is what we meant when we said you must understand a bit about how XML documents are structured: they are node-based. Everything within an XML document is a node, and you can see this in Figure 7.9, where even the white space in the document is a node. Compare this with Figure 7.10, where there are fewer nodes and no values, because the content is stored within attributes.

Figure 7.9. Nodes, types, and values of the Shippers document

Figure 7.10. Nodes, types, and values of the Shippers Attributes document

The most important point these figures show is that when navigating through documents with an XmlReader, you need to know the structure of the XML if you want to process it in an intelligent fashion. The XmlReader provides members such as NodeType, HasValue, and HasAttributes that tell you what type of node you are on, whether it has a value, and whether it has attributes, so that you can build logic into the data reading.
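To see this node structure for yourself, the following sketch records each node's NodeType, its Name, and its Value when HasValue is true. It reads from an in-memory string rather than the shippers file, and the sample XML used with it is illustrative.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeDump
{
    // Returns one "NodeType:Name=Value" entry per node the reader visits;
    // the "=Value" part is added only for nodes that carry a value.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                string line = reader.NodeType + ":" + reader.Name;
                if (reader.HasValue)
                    line += "=" + reader.Value;
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

For "<shippers>\n  <shipper>Speedy Express</shipper>\n</shippers>" this yields seven entries: the two start elements, two Whitespace nodes, one nameless Text node, and two EndElement nodes, which is exactly why the element names appeared twice earlier.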

Writing XML Documents

Writing XML documents is similar to reading them, in that you are dealing with nodes as they will appear in the output document. You have to remember that XML documents are hierarchical, and that nodes have a start point and an end point, and so you have to create the start and end of each node as you write the file. This is shown in Listing 7.13.

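Listing 7.13. Writing XML Using the XmlWriter

StringBuilder bldr = new StringBuilder();

using (XmlWriter writer = XmlWriter.Create(bldr, null))
{
  writer.WriteStartDocument();
  writer.WriteComment("Generated automatically");
  writer.WriteStartElement("shipper");
  // attribute names assumed; only the values survive in this listing
  writer.WriteStartAttribute("CompanyName");
  writer.WriteString("Speedy Express");
  writer.WriteEndAttribute();
  writer.WriteStartAttribute("Phone");
  writer.WriteString("(503) 555-9831");
  writer.WriteEndAttribute();
  writer.WriteEndElement();
  writer.WriteEndDocument();
}

NewXML.Text = bldr.ToString();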

This code uses the static Create method of the XmlWriter class to create a writer (ignore the null for the moment) that writes into a StringBuilder. You can also write to streams and files, but this example outputs the XML to a TextBox control. The writer is then used to write content, starting with WriteStartDocument, which writes the XML declaration (the <?xml ... ?> line) stating that this is an XML document, its version, and the encoding used. Next, WriteComment writes a simple text comment, and then the first element, shipper, is started with WriteStartElement, which writes <shipper to the string builder. Within this element, WriteStartAttribute starts an attribute, WriteString writes the attribute's value, and WriteEndAttribute writes the attribute's closing quote. Other attributes can be added in the same manner before WriteEndElement closes the element and WriteEndDocument ends the document. The results of this can be seen in Figure 7.11.

Figure 7.11. Unformatted XML

If you are supplying this XML to another program, this output, with no white space, is perfectly acceptable, but it is hard for a person to read. The output can be formatted automatically by creating an XmlWriterSettings object and passing it to the Create method of the XmlWriter in place of the null shown in the earlier code. Listing 7.14 shows this in action, using the Indent property of the XmlWriterSettings to indent child elements and NewLineOnAttributes to write each attribute on a new line. The results of this are shown in Figure 7.12.

Figure 7.12. Formatted XML

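Listing 7.14. Using XmlWriterSettings

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;              // indent child nodes
settings.NewLineOnAttributes = true; // one attribute per line (needs Indent)

using (XmlWriter writer = XmlWriter.Create(bldr, settings))
{
  // write the document exactly as in Listing 7.13
}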
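To compare the two outputs outside ASP.NET, the following console-style sketch repeats the writing steps of Listing 7.13 into a string and takes the XmlWriterSettings as a parameter, with null meaning the defaults. The CompanyName and Phone attribute names are assumptions made for illustration.

```csharp
using System.Text;
using System.Xml;

public static class ShipperWriter
{
    // Writes a declaration, a comment, and one <shipper> element with two
    // attributes. Pass null to get the default (unformatted) settings.
    public static string Write(XmlWriterSettings settings)
    {
        var bldr = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(bldr, settings))
        {
            writer.WriteStartDocument();
            writer.WriteComment("Generated automatically");
            writer.WriteStartElement("shipper");
            writer.WriteStartAttribute("CompanyName");   // assumed attribute name
            writer.WriteString("Speedy Express");
            writer.WriteEndAttribute();
            writer.WriteStartAttribute("Phone");         // assumed attribute name
            writer.WriteString("(503) 555-9831");
            writer.WriteEndAttribute();
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }   // disposing the writer flushes it into bldr
        return bldr.ToString();
    }

    public static XmlWriterSettings Formatted()
    {
        // NewLineOnAttributes has no effect unless Indent is true
        return new XmlWriterSettings { Indent = true, NewLineOnAttributes = true };
    }
}
```

Calling Write(null) produces a single line, while Write(ShipperWriter.Formatted()) puts the declaration, comment, and each attribute on lines of their own.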

Reading and writing XML using the XmlReader and XmlWriter classes is streaming-based, meaning that you have to deal with nodes in the order in which they appear in the document. If you need to deal with nodes in a more arbitrary manner, streaming is not the solution. Instead, you need to deal with an in-memory XML store.

Working with XML Documents in Memory

When working with XML in memory, you will use one of the XPathDocument, XmlDocument, or XmlDataDocument classes. The difference between them is summed up easily:

  • The XPathDocument is read-only, and provides the best performance.

  • The XmlDocument is read-write.

  • The XmlDataDocument is read-write, and also provides XML in relational form.

All of these classes represent an XML document in its entirety. To move around its nodes with a single cursor-style API, you use an XPathNavigator (created by calling CreateNavigator on the document), which provides read access to the nodes, and write access too if the underlying object supports updates. The use of these is best seen with some examples.

Using the XPathDocument Object

The XPathDocument is really a way of providing a read-only document to an XPathNavigator, as it has only constructors and a single method, CreateNavigator. The constructors allow the document to be created from a variety of sources, such as streams, text readers, and files, while CreateNavigator returns an XPathNavigator that allows you to navigate around the document. Listing 7.15 shows some examples of the movement methods, using MoveToFirstChild to move to the first child of the current node; subsequent calls move deeper into the hierarchy of nodes. MoveToFirstAttribute allows you to move to the first attribute of a node, and there are equivalents for moving to the next node or attribute, moving to previous nodes or the first node, selecting a set of nodes with an XPath expression, and so on.

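Listing 7.15. Using the XPathDocument and XPathNavigator

StringBuilder bldr = new StringBuilder();
XPathDocument doc = new XPathDocument(Server.MapPath("cars.xml"));
XPathNavigator nav = doc.CreateNavigator();

bldr.Append("Processing 'cars.xml' - editing allowed: " +
  nav.CanEdit.ToString() + "<br />");

nav.MoveToFirstChild();      // root node -> document element
string root = nav.Name;
bldr.Append("First child: " + root + "<br />");

nav.MoveToFirstChild();      // document element -> its first child
string child = nav.Name;
bldr.Append("First child of '" + root + "': " + child + "<br />");
bldr.Append("Inner XML: " + nav.InnerXml + "<br />");

nav.MoveToFirstAttribute();
bldr.Append("First attribute of '" + child + "': " +
  nav.Name + "=" + nav.Value + "<br />");

// assumed call: on an attribute MoveToPrevious returns false, so the
// navigator stays on Make
nav.MoveToPrevious();
bldr.Append("Previous: " + nav.Name + "<br />");

// assumed calls: back to the root node, then down to the document element
nav.MoveToRoot();
nav.MoveToFirstChild();
bldr.Append("Reset: " + nav.Name + "<br />");

Label1.Text = bldr.ToString();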

The output of this code is as follows:

Processing 'cars.xml' - editing allowed: False
First child: Automobiles
First child of 'Automobiles': Manufacturer
Inner XML: <Car Model="A4" Id="02347">
  <Package Trim="Sport Package" />
  <Package Trim="Luxury Package" />
</Car>
<Car Model="A6" Id="02932">
  <Package Trim="Sport Package" />
  <Package Trim="Luxury Package" />
</Car>
<Car Model="A8" Id="09381">
  <Package Trim="Sport Package" />
  <Package Trim="Luxury Package" />
</Car>
First attribute of 'Manufacturer': Make=Audi
Previous: Make
Reset: Automobiles

You can see that you can move forward and backward through the nodes, and that you can access element and attribute values as well as the entire XML for a node. The limitation of the XPathDocument is that it is read-only, so for updates you need to consider the XmlDocument.
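The navigation calls above can be tried without a web server. This sketch loads a small inline stand-in for cars.xml (its content is assumed from the sample output, trimmed to two cars) into an XPathDocument, walks it with an XPathNavigator, and also selects nodes with an XPath expression. It records what it visits, including the fact that CanEdit is false.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CarsNavigator
{
    // Inline stand-in for cars.xml; structure assumed from the sample output
    private const string CarsXml =
        "<Automobiles>" +
        "<Manufacturer Make=\"Audi\" WebSite=\"\">" +
        "<Car Model=\"A4\" Id=\"02347\" />" +
        "<Car Model=\"A6\" Id=\"02932\" />" +
        "</Manufacturer>" +
        "</Automobiles>";

    public static List<string> Walk()
    {
        var doc = new XPathDocument(new StringReader(CarsXml));
        XPathNavigator nav = doc.CreateNavigator();
        var visited = new List<string>();

        visited.Add("CanEdit=" + nav.CanEdit);    // XPathDocument is read-only
        nav.MoveToFirstChild();                   // root node -> <Automobiles>
        visited.Add(nav.Name);
        nav.MoveToFirstChild();                   // -> first <Manufacturer>
        nav.MoveToFirstAttribute();               // -> Make attribute
        visited.Add(nav.Name + "=" + nav.Value);
        nav.MoveToParent();                       // attribute -> owning <Manufacturer>
        visited.Add(nav.Name);

        // XPath expression evaluated relative to the current node
        foreach (XPathNavigator model in nav.Select("Car/@Model"))
            visited.Add(model.Value);

        nav.MoveToRoot();                         // the root node has an empty name
        visited.Add("root='" + nav.Name + "'");
        return visited;
    }
}
```

Note that MoveToParent, not MoveToPrevious, is the way back from an attribute to its element.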

Using the XmlDocument Object

In use, the XmlDocument can be similar to the XPathDocument in that you use an XPathNavigator to move through the document, but because the XmlDocument is read-write, you can use additional methods on the navigator to create new content, as seen in Listing 7.16.

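Listing 7.16. Creating Nodes with an XmlDocument

XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath("cars.xml"));
XPathNavigator nav = doc.CreateNavigator();

Label1.Text = "Processing 'cars.xml' - editing allowed: " +
  nav.CanEdit.ToString() + "<br />";

nav.MoveToFirstChild();      // root node -> Automobiles
nav.PrependChildElement(null, "Manufacturer", null, null);
nav.MoveToFirstChild();      // onto the new Manufacturer element
nav.CreateAttribute(null, "Make", null, "Ferrari");
nav.CreateAttribute(null, "WebSite", null, "");   // web site URL (value not shown)
nav.AppendChildElement(null, "Car", null, null);
nav.MoveToFirstChild();      // onto the new Car element
nav.CreateAttribute(null, "Model", null, "F430");
nav.CreateAttribute(null, "Id", null, "00430");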

Here a new manufacturer is created using the PrependChildElement method, which inserts a new element as the first child of the current node; CreateAttribute then adds attributes to the new element, and AppendChildElement adds a Car element as its last child. The output of this code is as follows:

<Automobiles>
  <Manufacturer Make="Ferrari" WebSite="">
    <Car Model="F430" Id="00430" />
  </Manufacturer>
  <Manufacturer Make="Audi" WebSite="">
    <Car Model="A4" Id="02347">
      <Package Trim="Sport Package" />
      <Package Trim="Luxury Package" />
    </Car>

The XPathNavigator offers many methods for creating content within the document it is navigating over, including appending elements, inserting before and after, replacing existing elements, and changing values of existing elements. What is interesting about this method of working with XML documents is that it offers great flexibility; you can work with existing content, add nodes individually, or, in conjunction with XmlReaders and XmlWriters, add content in bulk.
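Here is a self-contained version of the same editing steps, run against an in-memory XmlDocument. It is a sketch under assumptions: the input XML is a trimmed stand-in for cars.xml, and the explicit MoveToFirstChild calls are there because PrependChildElement, AppendChildElement, and CreateAttribute do not move the navigator, so you have to move onto each new element before adding attributes to it.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class CarsEditor
{
    // Adds a Ferrari manufacturer with one car as the first child of the
    // document element, using only XPathNavigator editing methods.
    public static XmlDocument AddFerrari(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator nav = doc.CreateNavigator();   // CanEdit is true here

        nav.MoveToFirstChild();                                     // root node -> <Automobiles>
        nav.PrependChildElement(null, "Manufacturer", null, null);  // new first child
        nav.MoveToFirstChild();                                     // onto the new <Manufacturer>
        nav.CreateAttribute(null, "Make", null, "Ferrari");
        nav.AppendChildElement(null, "Car", null, null);            // new last child
        nav.MoveToFirstChild();                                     // onto the new <Car>
        nav.CreateAttribute(null, "Model", null, "F430");
        nav.CreateAttribute(null, "Id", null, "00430");
        return doc;
    }
}
```

Because the navigator writes straight into the XmlDocument, the changes are visible through the ordinary DOM (DocumentElement, FirstChild, and so on) as soon as each call returns.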

Using the XmlDataDocument Object

Many ASP.NET developers also sit in the database developer camp, having to do database design and administration. While knowledge of XML is also widespread, the XML APIs described in this chapter are less widely used; often the DataSet is used instead, because its ReadXml and WriteXml methods surface relational data in XML form.

For the developer experienced with XML but not relational data, the XmlDataDocument is the solution. It is a subclass of the XmlDocument and provides one really important additional property, DataSet, which returns the XML data as a DataSet object. Before the DataSet can be exposed from the XML, a schema must be loaded so that the DataSet knows the structure of the underlying XML. Listing 7.17 shows this in action. First an XmlDataDocument is created, and the ReadXmlSchema method of its DataSet property is used to read the schema before the XML itself is loaded. The DataSet is then used as the source for a grid, and because the DataSet property is simply another view on the XML data, rows added to the DataSet are visible in the XML, as seen in Figure 7.13.

Figure 7.13. Using the XmlDataDocument's DataSet

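Listing 7.17. Using the XmlDataDocument

XmlDataDocument doc = new XmlDataDocument();
// the schema must be loaded before the XML; both file names are assumed
doc.DataSet.ReadXmlSchema(Server.MapPath("cars.xsd"));
doc.Load(Server.MapPath("cars.xml"));

GridView1.DataSource = doc.DataSet.Tables[0].DefaultView;
GridView1.DataMember = "Manufacturer";
GridView1.DataBind();

DataSet ds = doc.DataSet;
DataTable tbl = ds.Tables[0];

// a row added through the DataSet also appears in the XML
tbl.Rows.Add(new string[] {"Ferrari", ""});
NewXML.Text = doc.OuterXml;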
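If you only need the relational view, the DataSet's own ReadXml method, mentioned earlier, can produce it directly. The sketch below is a swapped-in technique rather than the XmlDataDocument approach of Listing 7.17: it lets ReadXml infer the schema from attribute-only Manufacturer elements instead of reading an .xsd file, and the sample XML used with it is hypothetical. Unlike XmlDataDocument, the DataSet here holds its own copy of the data, so a row you add appears in the DataSet's XML output (GetXml or WriteXml) but not in any original document.

```csharp
using System.Data;
using System.IO;

public static class CarsDataSet
{
    // Reads XML into a DataSet, letting ReadXml infer the schema: the
    // attribute-bearing <Manufacturer> elements become rows of a
    // "Manufacturer" table with Make and WebSite columns.
    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }
}
```

Supplying an explicit schema with ReadXmlSchema before calling ReadXml, as the listing does for the XmlDataDocument, gives you control over table and column types instead of relying on inference.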
