Generating and Serializing XML Documents


In the examples in Chapter 2, we assumed that an input XML document existed before any processing began. If you are building an application that takes a set of data from a backend database and generates an XML document from it, you might need to generate a document structure from scratch. The generated structure can then be exchanged between applications by serializing it to an XML document.

In this chapter, we present the basics of constructing a DOM tree from scratch. Once the structure is built, you can instruct an XML processor to output an XML document from it.

We first discuss generating a DOM tree without worrying about a DTD or validity (see Section 3.2). The DOM API described in Chapter 2 enables you to build a well-formed tree structure without relying on any particular XML processor implementation. Applications that use the XML processor can create and modify a DOM tree in a very flexible way. This characteristic is useful for many applications, especially document applications.
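As a rough illustration of building a tree from scratch, the following sketch creates a small document using only the standard JAXP and DOM interfaces. The class name BuildDomSketch and the element and attribute names (order, item, id) are hypothetical and chosen only for this example; the sections that follow may organize the code differently.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildDomSketch {
    /** Builds the tree for <order id="42"><item>pencil</item></order> (hypothetical names). */
    static Document build() {
        try {
            // Ask JAXP for an empty Document; no input XML document is involved.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();

            // Every node is created by the Document and then attached to a parent.
            Element order = doc.createElement("order");
            order.setAttribute("id", "42");
            doc.appendChild(order);

            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode("pencil"));
            order.appendChild(item);
            return doc;
        } catch (ParserConfigurationException e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
    }
}
```

Nothing in this sketch consults a DTD, so the result is guaranteed to be well-formed but not necessarily valid.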

However, if you use XML for messaging or for exchanging structured data between business applications, validity is important. A validating XML processor can check validity on the receiving side, but it is generally more desirable to generate a valid XML document at the point where it is created. If the network is not reliable, validation may be needed on both sides. Section 3.3 shows how to build a valid DOM tree according to a given DTD.

In the first edition of this book, we introduced validating generation, which enables developers to create DOM nodes incrementally according to a DTD. The function was provided by the XML for Java Parser versions 1 and 2. Unfortunately, the function is not implemented in Xerces (perhaps because the DOM specification does not support it). However, we believe this function is essential for XML document generation. Therefore, we introduce an alternative approach: validating a DOM tree after creating it (see Section 3.3).
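One way to validate a tree after it has been created is to serialize it with a DOCTYPE declaration and reparse the result with a validating parser. The sketch below does this using only standard JAXP calls. It is an assumption for illustration, not necessarily the technique used in Section 3.3. The class name, the DTD text, the system identifier order.dtd, and the element names are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidateAfterBuildSketch {
    // Hypothetical DTD, kept in memory so the sketch needs no external file.
    static final String DTD =
            "<!ELEMENT order (item+)>\n"
          + "<!ATTLIST order id CDATA #REQUIRED>\n"
          + "<!ELEMENT item (#PCDATA)>\n";

    /** Returns true if the tree, serialized and reparsed against DTD, has no validity errors. */
    static boolean isValid(Document doc) {
        try {
            // Step 1: serialize the tree, adding a DOCTYPE that names the DTD.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "order.dtd");
            StringWriter xml = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(xml));

            // Step 2: reparse with validation turned on.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Supply the in-memory DTD whenever the parser asks for an external entity.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader(DTD)));
            final boolean[] valid = {true};
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { valid[0] = false; } // validity error
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml.toString())));
            return valid[0];
        } catch (SAXException e) {
            return false; // not even well-formed after the round trip
        } catch (TransformerException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds <order id="42"> with or without the required <item> child. */
    static Document sample(boolean withItem) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element order = doc.createElement("order");
            order.setAttribute("id", "42");
            doc.appendChild(order);
            if (withItem) {
                Element item = doc.createElement("item");
                item.appendChild(doc.createTextNode("pencil"));
                order.appendChild(item);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with item:    " + isValid(sample(true)));
        System.out.println("without item: " + isValid(sample(false)));
    }
}
```

The round trip costs an extra serialization and parse, and it reports problems only after the whole tree has been built, unlike the incremental checking that validating generation offered.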

A generated DOM tree in memory can be stored as an XML document in a file system or database, or sent to other applications. The conversion from a DOM tree to an XML document is called serialization. Serialization is not a task required of an XML processor by XML 1.0, and the DOM API itself does not support it. However, serialization is important in practice because XML documents are exchanged among applications as text data on a network. In this book, you learn how to serialize a DOM tree by using functions provided by Xerces.
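To give a feel for what serialization involves, the sketch below writes a small DOM tree to a string. Note that it substitutes the standard JAXP identity transformation for the Xerces-specific serializer classes this book relies on, so that it runs on a plain JDK. The class name and the element names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeDomSketch {
    /** Converts a DOM tree to XML text using an identity transformation. */
    static String serialize(Document doc) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Leave out the XML declaration to keep the output short.
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a small sample tree: an order element holding one item (hypothetical names). */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element order = doc.createElement("order");
            order.setAttribute("id", "42");
            doc.appendChild(order);
            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode("pencil"));
            order.appendChild(item);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

Because the tree contains no whitespace-only text nodes and indentation is not requested, the output is expected to appear on a single line.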

The rest of this chapter covers some miscellaneous but important topics in XML processing. The serialized form of a generated DOM tree may sometimes look strange to readers. This is caused by the way whitespace in an XML document is processed. We cover whitespace handling in Section 3.5. The last section is devoted to internationalization, which is also important for interoperability.
