OutputFormat

The detailed behavior of a serializer is controlled by an OutputFormat object. This class can configure almost any aspect of serialization, including setting the maximum line length, changing the indentation, specifying which elements have their text escaped as CDATA sections, and more. A few options even have the potential to make your documents malformed. For example, if you add an element to the list of nonescaping elements, then any reserved characters like < and & that appear in its text content will be output as themselves rather than escaped as &lt; and &amp;.
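To see why unescaped reserved characters matter, it may help to hand both forms of the same text to a parser. The following is a minimal sketch that uses only the JAXP parser bundled with the JDK, not the Xerces serializer itself. The class name EscapingCheck, the method isWellFormed(), and the test element are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EscapingCheck {

  // Returns true if a JAXP parser accepts the string as
  // well-formed XML, false if it reports a fatal error
  public static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors instead of printing them
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      return false;
    }
    catch (IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // What a serializer writes when it escapes text content
    String escaped = "<test>x &lt; y &amp;&amp; y &lt; z</test>";
    // What it would write for a nonescaping element
    String raw = "<test>x < y && y < z</test>";

    System.out.println(isWellFormed(escaped)); // true
    System.out.println(isWellFormed(raw));     // false
  }

}
```

The second string is exactly the kind of output a nonescaping element can produce, and no conforming parser will read it back.");
assert !EscapingCheck.isWellFormed("<test>x < y && y < z</test>");
</test>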

One of the most frequent requests for serializers is "pretty printing" data with extra line breaks and indentation. Within reasonable limits, the OutputFormat class can provide this. Pass true to setIndenting(), pass the number of spaces to indent each level to setIndent(), and pass the maximum line length to setLineWidth(). Example 1 demonstrates.

Example 1. Using Xerces' OutputFormat Class to "Pretty Print" XML
import java.math.*;
import java.io.IOException;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.apache.xml.serialize.*;

public class IndentedFibonacci {

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      Document doc = impl.createDocument(null,
       "Fibonacci_Numbers", null);

      // Fill the document
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = doc.getDocumentElement();

      for (int i = 0; i < 10; i++) {
        Element number = doc.createElement("fibonacci");
        Text text = doc.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Serialize the document
      OutputFormat format = new OutputFormat(doc);
      format.setLineWidth(65);
      format.setIndenting(true);
      format.setIndent(2);
      XMLSerializer serializer
       = new XMLSerializer(System.out, format);
      serializer.serialize(doc);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
       "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
     System.err.println(e);
    }
    catch (IOException e) {
     System.err.println(e);
    }

  }

}

When run, this program produces the following output:

C:\XMLJAVA>java IndentedFibonacci 
<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci>1</fibonacci>
  <fibonacci>1</fibonacci>
  <fibonacci>2</fibonacci>
  <fibonacci>3</fibonacci>
  <fibonacci>5</fibonacci>
  <fibonacci>8</fibonacci>
  <fibonacci>13</fibonacci>
  <fibonacci>21</fibonacci>
  <fibonacci>34</fibonacci>
  <fibonacci>55</fibonacci>
</Fibonacci_Numbers>

I think you'll agree that this looks much more attractive than the smushed together output from the bare serialization without any extra white space. One warning, however: White space is significant in XML. Adding this white space has changed the document. This is not the same document as existed before it was "pretty printed." For this particular application, the extra white space is insignificant, but this is not true for all XML applications.
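The difference is easy to measure. The sketch below parses a compact and an indented version of a shortened Fibonacci document with the JDK's JAXP parser and counts the child nodes of the root element. The class name WhiteSpaceCount and the method countRootChildren() are hypothetical, and the two strings are hand-written stand-ins for serializer output rather than output captured from Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhiteSpaceCount {

  // Parses the string and returns the number of child nodes
  // (elements and text nodes alike) of the root element
  public static int countRootChildren(String xml) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    return doc.getDocumentElement().getChildNodes().getLength();
  }

  public static void main(String[] args) throws Exception {
    String compact = "<Fibonacci_Numbers>"
     + "<fibonacci>1</fibonacci>"
     + "<fibonacci>1</fibonacci>"
     + "<fibonacci>2</fibonacci>"
     + "</Fibonacci_Numbers>";

    String indented = "<Fibonacci_Numbers>\n"
     + "  <fibonacci>1</fibonacci>\n"
     + "  <fibonacci>1</fibonacci>\n"
     + "  <fibonacci>2</fibonacci>\n"
     + "</Fibonacci_Numbers>";

    // Three element children and nothing else
    System.out.println(countRootChildren(compact));  // 3
    // The same three elements plus four white space text nodes
    System.out.println(countRootChildren(indented)); // 7
  }

}
```

Code that walks the tree by position, for instance by asking for the first child of the root, will behave differently on the two documents.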

White space is just the beginning of what the OutputFormat class can control. Other features include the MIME media type, the XML declaration, the system and public IDs for the document type, which elements' content should be escaped as CDATA sections, and more. Following is a list of the properties you can control by invoking various methods on OutputFormat. In some cases, the default is document dependent. When it's not, the default value is given in parentheses.

Method

The method is normally set to one of three values—xml, html, or text—indicating the type of output that is desired. The serializer uses this value to configure itself. The default value is determined by the type of the document being serialized.

public void setMethod (String method)
public String getMethod()
public static String whichMethod (Document doc)

Media Type (Null)

This is the MIME media type for the output, such as application/xml or application/xhtml+xml. Although not included in the document itself, this may be used as part of the stream's metadata if it's written into a file system or onto an HTTP connection or some such.

public void setMediaType (String mediaType)
public String getMediaType()
public static String whichMediaType (String method)

Version (1.0)

The version number used in the XML declaration. This should always be "1.0". Do not change this.

public void setVersion (String version)
public String getVersion()

Standalone (No)

The value of the standalone attribute in the XML declaration. Pass true for "yes" and false for "no".

public void setStandalone (boolean standalone)
public boolean getStandalone()

Encoding (UTF-8)

The encoding specified in the encoding attribute in the XML declaration and used to convert characters to bytes when serializing onto an OutputStream.

public void setEncoding (String encoding)
public String getEncoding()
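The choice of encoding changes the bytes that reach the stream, and it determines which characters can be written directly at all. The following sketch uses only java.nio.charset from the JDK to illustrate both points. It does not call the serializer, and the class name EncodingBytes and the method byteLength() are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {

  // Returns the number of bytes the text occupies in the given encoding
  public static int byteLength(String text, Charset charset) {
    return text.getBytes(charset).length;
  }

  public static void main(String[] args) {
    // Three ASCII letters followed by e with an acute accent
    String text = "caf\u00E9";

    // One byte per character in ISO-8859-1
    System.out.println(byteLength(text, StandardCharsets.ISO_8859_1)); // 4
    // The accented letter needs two bytes in UTF-8
    System.out.println(byteLength(text, StandardCharsets.UTF_8));      // 5

    // The Greek letter pi has no representation in ISO-8859-1
    System.out.println(
     StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u03C0')); // false
  }

}
```

A character that the chosen encoding cannot represent has to be written some other way, such as a numeric character reference, so a narrow encoding can make documents with many such characters larger rather than smaller.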

Omit XML Declaration (False)

If true, then no XML declaration is output. If false, then an XML declaration is written.

public void setOmitXMLDeclaration (boolean omitXMLDeclaration)
public boolean getOmitXMLDeclaration()

Document Type

This specifies the system and public IDs of the external DTD subset given in the document type declaration. These values are used only if the Document being serialized does not contain a DocumentType object of its own.

public void setDoctype (String publicID, String systemID)
public String getDoctypePublic()
public String getDoctypeSystem()
public static String whichDoctypePublic (Document doc)
public static String whichDoctypeSystem (Document doc)

Omit Document Type (False)

If true, then no document type declaration is output. If false, then a document type declaration is written. If the document does not have a document type declaration and none has been set with setDoctype(), then no document type declaration will be written, regardless of the value of this property.

public void setOmitDocumentType (boolean omitDocumentType)
public boolean getOmitDocumentType()

Nonescaping Elements

The elements whose text-node children should not be escaped using entity references.

public void setNonEscapingElements (String[] elementNames)
public String[] getNonEscapingElements()
public boolean isNonEscapingElement (String name)

CDATA Elements

The elements whose text content should be enclosed in a CDATA section.

public void setCDATAElements (String[] elementNames)
public String[] getCDATAElements()
public boolean isCDATAElement (String name)
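Unlike the nonescaping elements property, wrapping text in a CDATA section changes only the syntax, not the content a parser reports. The sketch below parses the same text written both ways with the JDK's JAXP parser and compares the results. The class name CDATAEquivalence, the method textOf(), and the script element are hypothetical names for this illustration, and the input strings are hand-written rather than produced by the serializer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CDATAEquivalence {

  // Parses the string and returns the text content of the root element
  public static String textOf(String xml) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    return doc.getDocumentElement().getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // Text enclosed in a CDATA section
    String cdata
     = "<script><![CDATA[if (a < b && b < c) run();]]></script>";
    // The same text escaped with entity references
    String escaped
     = "<script>if (a &lt; b &amp;&amp; b &lt; c) run();</script>";

    System.out.println(textOf(cdata));
    System.out.println(textOf(cdata).equals(textOf(escaped))); // true
  }

}
```

CDATA sections are therefore a convenience for human readers of the serialized document, which is why this property is safe where the nonescaping elements property is not.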

Omit Comments (False)

If true, then comments in the document are not written onto the output. If false, they are written.

public void setOmitComments (boolean omitComments)
public boolean getOmitComments()

Indenting (False)

If true, then the serializer will add indents at each level and wrap lines that exceed the maximum line width. If false, it won't. The number of spaces to indent is set by the indent property, and the column to wrap at is set by the line width property.

public void setIndenting (boolean indenting)
public boolean getIndenting()

Indent (4)

The number of spaces to indent each level if indenting is true.

public void setIndent (int indent)
public int getIndent()

Line Width (72)

The maximum number of characters in a line when indenting is true. Setting this to zero turns off line wrapping completely.

public void setLineWidth (int width)
public int getLineWidth()

Line Separator (\n)

The character or characters to use for a line break. Take care to set this property only to a carriage return, a linefeed, or a carriage return/linefeed pair.

public void setLineSeparator (String separator)
public String getLineSeparator()
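Several of these properties have close counterparts in the standard JAXP transformation API, which may be useful where the Xerces serialize package is not available. The following sketch is a JAXP analog rather than the OutputFormat API described here: it sets the corresponding javax.xml.transform.OutputKeys on an identity Transformer. The class name JaxpOutputProperties and its methods are hypothetical. The indent-amount key is implementation-specific; this sketch assumes the transformer bundled with the JDK, and other implementations may ignore it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JaxpOutputProperties {

  public static final String MATHML_NS
   = "http://www.w3.org/1998/Math/MathML";

  // Builds a tiny MathML document to serialize
  public static Document buildDocument() throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document doc = factory.newDocumentBuilder()
     .getDOMImplementation().createDocument(MATHML_NS, "math", null);
    Element mn = doc.createElementNS(MATHML_NS, "mn");
    mn.appendChild(doc.createTextNode("1"));
    doc.getDocumentElement().appendChild(mn);
    return doc;
  }

  // Serializes the document with an identity transform
  public static String serialize(Document doc) throws Exception {
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.METHOD, "xml");                // Method
    t.setOutputProperty(OutputKeys.MEDIA_TYPE, "application/xml");// Media Type
    t.setOutputProperty(OutputKeys.VERSION, "1.0");               // Version
    t.setOutputProperty(OutputKeys.STANDALONE, "yes");            // Standalone
    t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");       // Encoding
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,                // Document Type
     "-//W3C//DTD MathML 2.0//EN");
    t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
     "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd");
    t.setOutputProperty(OutputKeys.INDENT, "yes");                // Indenting
    // Implementation-specific key; assumed to be honored by the JDK
    t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize(buildDocument()));
  }

}
```

JAXP has no direct counterparts for the line width, line separator, omit comments, and nonescaping elements properties, so this is not a drop-in replacement.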

Example 2 uses these OutputFormat methods to create a valid MathML document encoded in ISO-8859-1 with a document type declaration, an XML declaration, no comments, a 65-character maximum line width, a two-space indent, a standalone declaration with the value yes, and the MIME media type application/xml:

Example 2. Using Xerces' OutputFormat Class to "Pretty Print" MathML
import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.apache.xml.serialize.*;

public class ValidFibonacciMathML {

   public static String MATHML_NS
    = "http://www.w3.org/1998/Math/MathML";

   public static void main(String[] args) {

     try {

       DocumentBuilderFactory factory
        = DocumentBuilderFactory.newInstance();
       factory.setNamespaceAware(true);
       DocumentBuilder builder = factory.newDocumentBuilder();
       DOMImplementation impl = builder.getDOMImplementation();

       Document doc = impl.createDocument(MATHML_NS, "math", null);

       BigInteger low  = BigInteger.ONE;
       BigInteger high = BigInteger.ONE;

       Element root = doc.getDocumentElement();
       root.setAttribute("xmlns", MATHML_NS);

       for (int i = 1; i <= 10; i++) {
         Element mrow = doc.createElementNS(MATHML_NS, "mrow");

         Element mi = doc.createElementNS(MATHML_NS, "mi");
         Text function = doc.createTextNode("f(" + i + ")");
         mi.appendChild(function);

         Element mo = doc.createElementNS(MATHML_NS, "mo");
         Text equals = doc.createTextNode("=");
         mo.appendChild(equals);

         Element mn = doc.createElementNS(MATHML_NS, "mn");
         Text value = doc.createTextNode(low.toString());
         mn.appendChild(value);

         mrow.appendChild(mi);
         mrow.appendChild(mo);
         mrow.appendChild(mn);

         root.appendChild(mrow);

         BigInteger temp = high;
         high = high.add(low);
         low = temp;
       }

       OutputFormat format = new OutputFormat(doc);
       format.setLineWidth(65);
       format.setIndenting(true);
       format.setIndent(2);
       format.setEncoding("ISO-8859-1");
       format.setDoctype("-//W3C//DTD MathML 2.0//EN",
        "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd");
       format.setMediaType("application/xml");
       format.setOmitComments(true);
       format.setOmitXMLDeclaration(false);
       format.setVersion("1.0");
       format.setStandalone(true);

       XMLSerializer serializer
        = new XMLSerializer(System.out, format);
       serializer.serialize(doc);

     }
     catch (FactoryConfigurationError e) {
       System.out.println("Could not locate a JAXP factory class");
     }
     catch (ParserConfigurationException e) {
       System.out.println(
         "Could not locate a JAXP DocumentBuilder class"
       );
     }
     catch (DOMException e) {
       System.err.println(e);
     }
     catch (IOException e) {
       System.err.println(e);
     }
    }
 }

Following is the beginning of the output that this program produces:

C:\XMLJAVA>java ValidFibonacciMathML 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
                  "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>f(1)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
  <mrow>
    <mi>f(2)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>

...

You can imagine other requests for the serializer. For example, you might want a line break after each </mrow> end-tag but no line breaks inside mrow elements. Although OutputFormat doesn't give you enough control to arrange serialization to this level of detail, you could write a custom subclass of XMLSerializer to accomplish this.
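As a rough illustration of that layout, here is a minimal sketch that does not subclass XMLSerializer at all. Instead it walks the DOM tree by hand and writes each child element of the root on its own line with no white space inside it. The class name RowPerLineWriter and all of its methods are hypothetical, and the sketch makes simplifying assumptions: it ignores comments, processing instructions, the XML declaration, and the document type declaration, and it skips any text that is a direct child of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RowPerLineWriter {

  // Writes the root's start-tag, then one line per child element,
  // then the root's end-tag
  public static String serialize(Document doc) {
    StringBuilder out = new StringBuilder();
    Element root = doc.getDocumentElement();
    writeStartTag(root, out);
    out.append('\n');
    for (Node child = root.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      if (child.getNodeType() == Node.ELEMENT_NODE) {
        writeCompact(child, out);
        out.append('\n');
      }
    }
    out.append("</").append(root.getTagName()).append(">\n");
    return out.toString();
  }

  // Writes an element and its descendants without adding white space
  private static void writeCompact(Node node, StringBuilder out) {
    short type = node.getNodeType();
    if (type == Node.ELEMENT_NODE) {
      Element element = (Element) node;
      writeStartTag(element, out);
      for (Node child = node.getFirstChild(); child != null;
       child = child.getNextSibling()) {
        writeCompact(child, out);
      }
      out.append("</").append(element.getTagName()).append('>');
    }
    else if (type == Node.TEXT_NODE
     || type == Node.CDATA_SECTION_NODE) {
      out.append(escape(node.getNodeValue()));
    }
    // Other node types are skipped in this sketch
  }

  private static void writeStartTag(Element element, StringBuilder out) {
    out.append('<').append(element.getTagName());
    NamedNodeMap attributes = element.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attribute = attributes.item(i);
      out.append(' ').append(attribute.getNodeName()).append("=\"")
       .append(escape(attribute.getNodeValue())).append('"');
    }
    out.append('>');
  }

  // Escapes the reserved characters; & must be replaced first
  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
     .replace(">", "&gt;").replace("\"", "&quot;");
  }

  public static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<math>"
     + "<mrow><mi>f(1)</mi><mo>=</mo><mn>1</mn></mrow>"
     + "<mrow><mi>f(2)</mi><mo>=</mo><mn>1</mn></mrow>"
     + "</math>");
    System.out.print(serialize(doc));
  }

}
```

A production-quality version would also need to handle encodings, namespaces, and the remaining node types, which is the work a subclass of XMLSerializer would inherit rather than reimplement.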

