The Document Interface as an Abstract Factory

The Document interface, summarized in Figure, serves two purposes in DOM:

  1. As an abstract factory, it creates instances of other nodes for that document.

  2. It is the representation of the document node.

4 The Document Interface
package org.w3c.dom;

public interface Document extends Node {

  public Element createElement(String tagName)
   throws DOMException;
  public Element createElementNS(String namespaceURI,
   String qualifiedName) throws DOMException;
  public Text createTextNode(String data);
  public Comment createComment(String data);
  public CDATASection createCDATASection(String data)
   throws DOMException;
  public ProcessingInstruction createProcessingInstruction(
   String target, String data) throws DOMException;
  public Attr createAttribute(String name) throws DOMException;
  public Attr createAttributeNS(String namespaceURI,
   String qualifiedName) throws DOMException;
  public DocumentFragment createDocumentFragment();
  public EntityReference createEntityReference(String name)
   throws DOMException;

  public DocumentType      getDoctype();
  public DOMImplementation getImplementation();
  public Element           getDocumentElement();
  public Node              importNode(Node importedNode,
                               boolean deep) throws DOMException;
  public NodeList          getElementsByTagName(String tagname);
  public NodeList          getElementsByTagNameNS(
                          String namespaceURI, String localName);
  public Element           getElementById(String elementId);

}

Remember that in addition to the methods listed here, each Document object has all the methods of the Node interface discussed in Chapter 9. These are key parts of the functionality of the interface.

I'll begin with the use of the Document interface as an abstract factory. You'll notice that the Document interface has ten separate createXXX() methods for creating eight different kinds of node objects. (There are two methods each for creating element and attribute nodes, because you can create these with or without namespaces.) For example, given a Document object doc, the following code fragment creates a new processing instruction and a comment:

ProcessingInstruction xmlstylesheet 
 = doc.createProcessingInstruction("xml-stylesheet",
 "type=\"text/css\" href=\"standard.css\"");
Comment comment = doc.createComment(
 "An example from Chapter 10 of Processing XML with Java");

Although these two nodes are associated with the document, they are not yet parts of its tree. To add them, it's necessary to use the insertBefore() method of the Node interface that Document extends. Specifically, I'll insert each of these nodes before the root element of the document, which can be retrieved via getDocumentElement():

Node rootElement = doc.getDocumentElement(); 
doc.insertBefore(comment, rootElement);
doc.insertBefore(xmlstylesheet, rootElement);

To add content inside the root element, it's necessary to use the Node methods on the root element. For example, the following code fragment adds a desc child element to the root element:

Element desc 
 = doc.createElementNS("http://www.w3.org/2000/svg", "desc");
rootElement.appendChild(desc);

Each node is created by the owner document, but it is inserted using the parent node. For example, the following code fragment adds a text-node child containing the phrase "An example from Processing XML with Java" to the previous desc element node:

Text descText 
 = doc.createTextNode("An example from Processing XML with Java");
desc.appendChild(descText);
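The difference between the owner document and the parent node can be checked directly. The following is only a minimal sketch: the class name OwnerVersusParent and the report() method are my own illustrative names, and it assumes a JAXP parser is available on the class path. It shows that getOwnerDocument() is set as soon as the node is created, whereas getParentNode() stays null until the node is appended.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class; not part of the original examples
class OwnerVersusParent {

  static final String SVG_NS = "http://www.w3.org/2000/svg";

  // Reports whether desc has an owner document and a parent,
  // first right after creation and then after appendChild()
  static String report() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl
       = factory.newDocumentBuilder().getDOMImplementation();
      Document doc = impl.createDocument(SVG_NS, "svg", null);
      Element root = doc.getDocumentElement();

      // Created by the document: owned, but not yet in the tree
      Element desc = doc.createElementNS(SVG_NS, "desc");
      boolean owned  = desc.getOwnerDocument() == doc;
      boolean orphan = desc.getParentNode() == null;

      // Inserted by the parent: now part of the tree
      root.appendChild(desc);
      boolean attached = desc.getParentNode() == root;

      return "owned=" + owned + " orphan=" + orphan
       + " attached=" + attached;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(report());
  }

}
```

All three flags should come out true: the document owns the node from the moment it is created, but only the call to appendChild() on the parent places it in the tree.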

Figure puts this all together to create a program that builds a complete, albeit very simple, SVG document in memory using DOM. JAXP loads the DOMImplementation so that the program is reasonably parser independent. The JAXP ID-transform hack introduced in Chapter 9 dumps the document on System.out.

5 Using DOM to Build an SVG Document in Memory
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.*;


public class SimpleSVG {

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      DocumentType svgDOCTYPE = impl.createDocumentType(
       "svg", "-//W3C//DTD SVG 1.0//EN",
       "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
      );
      Document doc = impl.createDocument(
       "http://www.w3.org/2000/svg", "svg", svgDOCTYPE);

      // Fill the document
      Node rootElement = doc.getDocumentElement();
      ProcessingInstruction xmlstylesheet
       = doc.createProcessingInstruction("xml-stylesheet",
       "type=\"text/css\" href=\"standard.css\"");
      Comment comment = doc.createComment(
       "An example from Chapter 10 of Processing XML with Java");
      doc.insertBefore(comment, rootElement);
      doc.insertBefore(xmlstylesheet, rootElement);
      Node desc = doc.createElementNS(
       "http://www.w3.org/2000/svg", "desc");
      rootElement.appendChild(desc);
      Text descText = doc.createTextNode(
       "An example from Processing XML with Java");
      desc.appendChild(descText);

      // Serialize the document onto System.out
      TransformerFactory xformFactory
       = TransformerFactory.newInstance();
      Transformer idTransform = xformFactory.newTransformer();
      Source input = new DOMSource(doc);
      Result output = new StreamResult(System.out);
      idTransform.transform(input, output);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
        "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
      System.err.println(e);
    }
    catch (TransformerConfigurationException e) {
      System.err.println(e);
    }
    catch (TransformerException e) {
      System.err.println(e);
    }

  }

}

When this program is run, it produces the following output:

C:\XMLJAVA>java SimpleSVG 
<?xml version="1.0" encoding="utf-8"?><!--An example from Chapter
10 of Processing XML with Java--><?xml-stylesheet type="text/css"
href="standard.css"?><svg><desc>An example from Processing XML
with Java</desc></svg>

I've inserted line breaks to make the output fit on this page, but the actual output doesn't have any. In the prolog, that's because the JAXP ID transform doesn't include any. In the document, that's because the program did not add any text nodes containing only white space. Many parser vendors include custom serialization packages that allow you to more closely manage the placement of white space and other syntax sugar in the output. In addition, this will be a standard part of DOM3. We'll explore these options for prettifying the output in Chapter 13.

Note

The lack of namespace declarations and possibly the lack of a document type declaration is a result of bugs in JAXP implementations. I've reported the problem to several XSLT processor/XML parser vendors and am hopeful that at least some of them will fix this bug before the final draft of this book. As of July 2002, GNU JAXP and Oracle include the namespace declaration, whereas Xerces 2.0.2 leaves it out. So far no implementation I've seen includes the document type declaration. You can work around the problem by explicitly adding namespace declaration attributes to the tree.
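The workaround mentioned in this note might look something like the following sketch. It assumes the same SVG document as in the program above; the class name NamespaceDeclarationWorkaround and its methods are hypothetical names of my own. DOM2 places xmlns attributes in the reserved namespace http://www.w3.org/2000/xmlns/, so the declaration is added with setAttributeNS() on the root element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class; not part of the original examples
class NamespaceDeclarationWorkaround {

  static final String SVG_NS   = "http://www.w3.org/2000/svg";
  // The namespace DOM2 reserves for xmlns and xmlns:* attributes
  static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

  // Builds an empty SVG document whose root element carries an
  // explicit xmlns attribute in addition to its namespace URI
  static Document buildDocument() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl
       = factory.newDocumentBuilder().getDOMImplementation();
      Document doc = impl.createDocument(SVG_NS, "svg", null);
      Element root = doc.getDocumentElement();
      // Add the declaration as an ordinary attribute node so that
      // a serializer that ignores namespace URIs still writes it
      root.setAttributeNS(XMLNS_NS, "xmlns", SVG_NS);
      return doc;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Reads the declaration back from the tree
  static String declaredNamespace() {
    Element root = buildDocument().getDocumentElement();
    return root.getAttributeNS(XMLNS_NS, "xmlns");
  }

  public static void main(String[] args) {
    System.out.println(declaredNamespace());
  }

}
```

A prefixed declaration would be added the same way, using a qualified name such as xmlns:svg in place of xmlns.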


The same techniques can be used for all of the nodes in the tree: text, comments, elements, processing instructions, and entity references. But because attributes are not children, attribute nodes can only be set on element nodes and only by using the methods of the Element interface. I'll take that up in Chapter 11. Attr objects, on the other hand, are created by Document objects, just like all the other DOM node objects.
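As a brief preview, here is a sketch of that division of labor: the Document creates the Attr object, but an Element attaches it. The class name AttributeFactoryDemo, the width attribute, and its value are illustrative assumptions of mine rather than anything taken from the examples in this chapter.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class; not part of the original examples
class AttributeFactoryDemo {

  // Creates an attribute with the document, attaches it with the
  // element, and reports its value and the element's child count
  static String report() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl
       = factory.newDocumentBuilder().getDOMImplementation();
      Document doc = impl.createDocument(
       "http://www.w3.org/2000/svg", "svg", null);
      Element root = doc.getDocumentElement();

      // The Document is the factory for the Attr object...
      Attr width = doc.createAttribute("width");
      width.setValue("12cm");

      // ...but only an Element method can attach it
      root.setAttributeNode(width);

      // The attribute is set, yet it is not a child of the element
      return root.getAttribute("width") + " children="
       + root.getChildNodes().getLength();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(report());
  }

}
```

The child count stays at zero after the attribute is attached, which is why appendChild() and insertBefore() are the wrong tools for attributes.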

DOM is not picky about whether you work from the top down or the bottom up. You can start at the root and add its children, then add the child nodes to these nodes, and continue down the tree. Alternatively, you can start by creating the deepest nodes in the tree, then create their parents, then the parents of the parents, and so on back up to the root. Or you can mix and match as seems appropriate in your program. DOM really doesn't care as long as there's always a root element.
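For instance, a bottom-up construction might be sketched as follows. The class name BottomUpDemo and the g and desc elements are arbitrary choices of mine for illustration. The text node is created first, then its parent, then its grandparent, and only at the very end is the finished subtree attached to the root element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demonstration class; not part of the original examples
class BottomUpDemo {

  static final String SVG_NS = "http://www.w3.org/2000/svg";

  // Builds svg/g/desc/text starting from the deepest node
  static String report() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl
       = factory.newDocumentBuilder().getDOMImplementation();
      Document doc = impl.createDocument(SVG_NS, "svg", null);

      // Deepest node first
      Text text = doc.createTextNode("Built from the leaves up");
      // Then its parent
      Element desc = doc.createElementNS(SVG_NS, "desc");
      desc.appendChild(text);
      // Then the parent of the parent
      Element g = doc.createElementNS(SVG_NS, "g");
      g.appendChild(desc);
      // Finally hook the finished subtree onto the root element
      Element root = doc.getDocumentElement();
      root.appendChild(g);

      return root.getFirstChild().getNodeName() + "/"
       + g.getFirstChild().getNodeName() + ":"
       + desc.getFirstChild().getNodeValue();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(report());
  }

}
```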

Each node created is firmly associated with the document that created it. If document A creates node X, then node X cannot be inserted into document B. A copy of node X can be imported into document B, but node X itself is always attached only to document A.
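The following sketch illustrates this rule with two small documents. The class name ImportDemo and the element names a, b, and x are placeholders of my own. Appending a node directly to a foreign document should fail with a DOMException whose code is WRONG_DOCUMENT_ERR, which is the behavior DOM2 specifies; importNode() produces a copy that the second document owns and can accept.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demonstration class; not part of the original examples
class ImportDemo {

  static String report() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl
       = factory.newDocumentBuilder().getDOMImplementation();
      Document a = impl.createDocument(null, "a", null);
      Document b = impl.createDocument(null, "b", null);

      // Node x is created by, and belongs to, document A
      Element x = a.createElement("x");
      x.appendChild(a.createTextNode("hello"));

      // Inserting x itself into document B is refused
      boolean rejected = false;
      try {
        b.getDocumentElement().appendChild(x);
      }
      catch (DOMException e) {
        rejected = (e.code == DOMException.WRONG_DOCUMENT_ERR);
      }

      // A deep copy owned by document B can be inserted
      Node copy = b.importNode(x, true);
      b.getDocumentElement().appendChild(copy);

      return "rejected=" + rejected
       + " copyOwnedByB=" + (copy.getOwnerDocument() == b)
       + " originalOwnedByA=" + (x.getOwnerDocument() == a);
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(report());
  }

}
```

Passing true as the second argument to importNode() copies the node's descendants as well; passing false copies only the node itself.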

We're now in a position to repeat some examples from Chapter 3, this time using DOM to create the document rather than just writing strings onto a stream. Among other advantages, this means that many well-formedness constraints are automatically satisfied. Furthermore, the programs will have a much greater object-oriented feel to them.

I'll begin with the simple Fibonacci problem of Figure. That program produced documents that look like this:

<?xml version="1.0"?> 
<Fibonacci_Numbers>
  <fibonacci>1</fibonacci>
  <fibonacci>1</fibonacci>
  <fibonacci>2</fibonacci>
  <fibonacci>3</fibonacci>
  <fibonacci>5</fibonacci>
  <fibonacci>8</fibonacci>
  <fibonacci>13</fibonacci>
  <fibonacci>21</fibonacci>
  <fibonacci>34</fibonacci>
  <fibonacci>55</fibonacci>
</Fibonacci_Numbers>

This is a straightforward element-based hierarchy that does not use namespaces or document type declarations. Although simple, these sorts of documents are important. XML-RPC is just one of many real-world applications that do not use anything more than element, text, and document nodes.

Figure is a DOM-based program that generates documents of this form. It is at least superficially more complex than the equivalent program from Chapter 3, but it has some advantages over that program. In particular, well-formedness of the output is almost guaranteed. It's a lot harder to produce incorrect XML with DOM than by simply writing strings on a stream. Furthermore, the data structure is a lot more flexible. Here, the document is written more or less from beginning to end, but if this were part of a larger program that ran for a longer time, then nodes could be added and deleted in almost random order anywhere in the tree at any time. It's not necessary to know all of the information that will ever go into the document before you begin writing it. The downside is that DOM programs tend to eat substantially more RAM than the streaming equivalents because they must keep the entire document in memory at all times. This can be a significant problem for large documents.

6 A DOM Program That Outputs the Fibonacci Numbers as an XML Document
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.math.BigInteger;


public class FibonacciDOM {

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      Document doc = impl.createDocument(null,
       "Fibonacci_Numbers", null);

      // Fill the document
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = doc.getDocumentElement();

      for (int i = 0; i < 10; i++) {
        Element number = doc.createElement("fibonacci");
        Text text = doc.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Serialize the document onto System.out
      TransformerFactory xformFactory
       = TransformerFactory.newInstance();
      Transformer idTransform = xformFactory.newTransformer();
      Source input = new DOMSource(doc);
      Result output = new StreamResult(System.out);
      idTransform.transform(input, output);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
        "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
      System.err.println(e);
    }
    catch (TransformerConfigurationException e) {
      System.err.println(e);
    }
    catch (TransformerException e) {
      System.err.println(e);
    }

  }

}

As usual, this code contains the four main tasks for creating a new XML document with DOM:

  1. Locate a DOMImplementation.

  2. Create a new Document object.

  3. Fill the Document with various kinds of nodes.

  4. Serialize the Document onto a stream.

Most DOM programs that create new documents follow this structure. They may hide parts in different methods, or use DOM3 to serialize instead of JAXP; but they all must locate a DOMImplementation, use that to create a Document object, fill the document with other nodes created by the Document object, and finally serialize the result. (A few programs may skip the serialization step.)

The only part that really changes from one program to the next is how the document is filled with content. This naturally depends on the structure of the document. A program that reads tables from a database to get the data will naturally look very different from a program like this one, which algorithmically generates numbers. And both of these will look very different from a program that asks the user to type in information. However, all three and many more besides will use the same methods of the Document and Node interfaces to build the structures they need.

Here is the output when this program is run:

C:\XMLJAVA>java FibonacciDOM 
<?xml version="1.0" encoding="utf-8"?><Fibonacci_Numbers>
<fibonacci>1</fibonacci><fibonacci>1</fibonacci><fibonacci>2
</fibonacci><fibonacci>3</fibonacci><fibonacci>5</fibonacci>
<fibonacci>8</fibonacci><fibonacci>13</fibonacci><fibonacci>21
</fibonacci><fibonacci>34</fibonacci><fibonacci>55</fibonacci>
</Fibonacci_Numbers>

Notice once again that the white space is not quite what was expected. One way to fix this is to add the extra text nodes that represent the white space. For example,

for (int i = 0; i < 10; i++) {
  Text space = doc.createTextNode("\n  ");
  root.appendChild(space);
  Element number = doc.createElement("fibonacci");
  Text text = doc.createTextNode(low.toString());
  number.appendChild(text);
  root.appendChild(number);

  BigInteger temp = high;
  high = high.add(low);
  low  = temp;
}
Text lineBreak = doc.createTextNode("\n");
root.appendChild(lineBreak);

An alternate approach is to use a more sophisticated serializer and tell it to add the extra white space. I prefer this approach because it's much simpler and does not clutter up the code with basically insignificant white space, as I'll demonstrate in Chapter 13. Of course, if you really do care about white space, then you need to manage the white-space-only text nodes explicitly and tell whichever serializer you use to leave the white space alone.
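One such option that is already available in JAXP is the standard indent output property. The sketch below is not necessarily the approach taken in Chapter 13; it simply assumes the same identity transform used throughout this chapter and sets OutputKeys.INDENT on it. The class name IndentedFibonacci is hypothetical, only three numbers are generated to keep the example short, and the exact white space produced depends on the XSLT processor in use.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class; not part of the original examples
class IndentedFibonacci {

  // Builds a three-number document and serializes it with the
  // serializer, not the tree, responsible for the line breaks
  static String serialize() {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
       .getDOMImplementation()
       .createDocument(null, "Fibonacci_Numbers", null);
      Element root = doc.getDocumentElement();

      String[] values = {"1", "1", "2"};
      for (int i = 0; i < values.length; i++) {
        Element number = doc.createElement("fibonacci");
        number.appendChild(doc.createTextNode(values[i]));
        root.appendChild(number);
      }

      Transformer idTransform
       = TransformerFactory.newInstance().newTransformer();
      // Ask the serializer to insert white space between elements
      idTransform.setOutputProperty(OutputKeys.INDENT, "yes");
      StringWriter out = new StringWriter();
      idTransform.transform(
       new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(serialize());
  }

}
```

No white-space-only text nodes are added to the tree here; the request for line breaks is made entirely through the output property.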

Adding namespaces or a document type declaration pointing to an external DTD subset is not significantly harder. For example, suppose you want to generate valid MathML, as in Figure.

7 A Valid MathML Document That Contains Fibonacci Numbers
<?xml version="1.0"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
 "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow><mi>f(1)</mi><mo>=</mo><mn>1</mn></mrow>
  <mrow><mi>f(2)</mi><mo>=</mo><mn>1</mn></mrow>
  <mrow><mi>f(3)</mi><mo>=</mo><mn>2</mn></mrow>
  <mrow><mi>f(4)</mi><mo>=</mo><mn>3</mn></mrow>
  <mrow><mi>f(5)</mi><mo>=</mo><mn>5</mn></mrow>
  <mrow><mi>f(6)</mi><mo>=</mo><mn>8</mn></mrow>
  <mrow><mi>f(7)</mi><mo>=</mo><mn>13</mn></mrow>
  <mrow><mi>f(8)</mi><mo>=</mo><mn>21</mn></mrow>
  <mrow><mi>f(9)</mi><mo>=</mo><mn>34</mn></mrow>
  <mrow><mi>f(10)</mi><mo>=</mo><mn>55</mn></mrow>
</math>

The markup is somewhat more complex, but the Java code is not significantly more so. You simply need to use the implementation to create a new DocumentType object, and include both that and the namespace URI in the call to createDocument(). Figure demonstrates.

8 A DOM Program That Outputs the Fibonacci Numbers as a MathML Document
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.math.BigInteger;


public class FibonacciMathMLDOM {

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      DocumentType mathml = impl.createDocumentType("math",
       "-//W3C//DTD MathML 2.0//EN",
       "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd");
      Document doc = impl.createDocument(
       "http://www.w3.org/1998/Math/MathML", "math", mathml);

      // Fill the document
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = doc.getDocumentElement();
      // Child elements must be created in the MathML namespace too;
      // createElement() would leave them in no namespace at all
      String mathmlNS = root.getNamespaceURI();

      for (int i = 1; i <= 10; i++) {
        Element mrow = doc.createElementNS(mathmlNS, "mrow");

        Element mi = doc.createElementNS(mathmlNS, "mi");
        Text function = doc.createTextNode("f(" + i + ")");
        mi.appendChild(function);

        Element mo = doc.createElementNS(mathmlNS, "mo");
        Text equals = doc.createTextNode("=");
        mo.appendChild(equals);

        Element mn = doc.createElementNS(mathmlNS, "mn");
        Text value = doc.createTextNode(low.toString());
        mn.appendChild(value);

        mrow.appendChild(mi);
        mrow.appendChild(mo);
        mrow.appendChild(mn);

        root.appendChild(mrow);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Serialize the document onto System.out
      TransformerFactory xformFactory
       = TransformerFactory.newInstance();
      Transformer idTransform = xformFactory.newTransformer();
      Source input = new DOMSource(doc);
      Result output = new StreamResult(System.out);
      idTransform.transform(input, output);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
        "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
      System.err.println(e);
    }
    catch (TransformerConfigurationException e) {
      System.err.println(e);
    }
    catch (TransformerException e) {
      System.err.println(e);
    }

  }

}

Internal DTD subsets are a little harder, and not really supported at all in DOM2. For example, let's suppose you want to use a namespace prefix on your MathML elements but still want to have the document be valid MathML. The MathML DTD is designed in such a way that you can change the prefix and whether or not prefixes are used by redefining the MATHML.prefixed and MATHML.prefix parameter entities. Figure uses the prefix math.

9 A Valid MathML Document That Uses Prefixed Names
<?xml version="1.0"?>
<!DOCTYPE math:math PUBLIC "-//W3C//DTD MathML 2.0//EN"
 "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd" [
  <!ENTITY % MATHML.prefixed "INCLUDE">
  <!ENTITY % MATHML.prefix "math">
]>
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:mrow>
    <math:mi>f(1)</math:mi>
    <math:mo>=</math:mo>
    <math:mn>1</math:mn>
  </math:mrow>
  <math:mrow>
    <math:mi>f(2)</math:mi>
    <math:mo>=</math:mo>
    <math:mn>1</math:mn>
  </math:mrow>
  <math:mrow>
    <math:mi>f(3)</math:mi>
    <math:mo>=</math:mo>
    <math:mn>2</math:mn>
  </math:mrow>
  <math:mrow>
    <math:mi>f(4)</math:mi>
    <math:mo>=</math:mo>
    <math:mn>3</math:mn>
  </math:mrow>
</math:math>

Using prefixed names in DOM code is straightforward enough, but there's no way to override the entity definitions in the DTD to tell it to validate against the prefixed names. DOM does not provide any means to create a new internal DTD subset or change an existing one. In order for the document you generate to be valid, therefore, it must use the same prefix the DTD does.

There are some hacks that can work around this. Some of the concrete classes that implement the DocumentType interface, such as Xerces' org.apache.xerces.dom.DocumentTypeImpl, include a nonstandard setInternalSubset() method. Or instead of pointing to the normal DTD, you can point to an external DTD that redefines the namespace parameter entities and then imports the usual DTD. You could even generate this DTD on the fly using a separate output stream that writes strings containing entity declarations into a file. However, the bottom line is that the internal DTD subset just isn't well supported by DOM, and any program that needs access to it should use a different API.
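The wrapper-DTD hack might be sketched as follows. This is only an illustration under several assumptions of mine: the class name PrefixedMathMLDTD, the file name prefixed-mathml.dtd, and the parameter entity name mathml2 are all hypothetical. The generated file declares MATHML.prefixed and MATHML.prefix first, and then pulls in the normal MathML DTD through an external parameter entity. Because the first declaration of an entity is the one that counts, the values in the wrapper take precedence over the defaults in the real DTD.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demonstration class; not part of the original examples
class PrefixedMathMLDTD {

  // Returns the text of a small DTD that sets the prefix entities
  // and then imports the standard MathML 2.0 DTD
  static String wrapperDTD(String prefix) {
    return "<!ENTITY % MATHML.prefixed \"INCLUDE\">\n"
     + "<!ENTITY % MATHML.prefix \"" + prefix + "\">\n"
     + "<!ENTITY % mathml2 PUBLIC \"-//W3C//DTD MathML 2.0//EN\"\n"
     + " \"http://www.w3.org/TR/MathML2/dtd/mathml2.dtd\">\n"
     + "%mathml2;\n";
  }

  public static void main(String[] args) throws IOException {
    // Write the wrapper next to the document that will refer to it
    try (Writer out = new OutputStreamWriter(
     new FileOutputStream("prefixed-mathml.dtd"), "UTF-8")) {
      out.write(wrapperDTD("math"));
    }
  }

}
```

The DOM program would then pass the wrapper's location as the system ID, for example impl.createDocumentType("math:math", null, "prefixed-mathml.dtd"), so that no internal DTD subset is needed.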

