Talking to SAX Programs






JDOM works very well with SAX parsers. SAX is an almost-ideal event model for building a JDOM tree; and when the tree is complete, JDOM makes it easy to walk the tree, firing off SAX events as you go. Fast and memory efficient, SAX doesn't add a lot of extra overhead to JDOM programs.

Configuring SAXBuilder

When reading a file or stream through a SAX parser, you can set various properties on the parser, including the ErrorHandler, EntityResolver, DTDHandler, and any custom features or properties that are supported by the underlying SAX XMLReader. SAXBuilder includes several methods that delegate these configurations to the underlying XMLReader:

public void setErrorHandler (ErrorHandler errorHandler) 

public void setEntityResolver (EntityResolver entityResolver)

public void setDTDHandler (DTDHandler dtdHandler)

public void setIgnoringElementContentWhitespace (boolean ignoreWhitespace)

public void setFeature (String name, boolean value)

public void setProperty (String name, Object value)

For example, suppose you want to schema validate documents before using them. This requires three additional steps beyond the norm:

  1. Explicitly pick a parser class that is known to be able to schema validate, such as org.apache.xerces.parsers.SAXParser. (Most parsers can't schema validate.)

  2. Install a SAX ErrorHandler that reports validity errors.

  3. Set the SAX feature that turns on schema validation to true. Which feature this is depends on the parser you picked in step 1. In Xerces, it's http://apache.org/xml/features/validation/schema, and you also need to turn on validation using the standard SAX feature http://xml.org/sax/features/validation.

Figure 11 is a simple JDOM program that uses Xerces to schema validate a URL named on the command line. It is similar to the earlier JDOMValidator program. Here, because the installed ErrorHandler (BestSAXChecker from Chapter 7) merely prints validity error messages on System.out and does not throw an exception, validity errors do not terminate the parse. The Document object is still built as long as the document is well-formed, whether or not it's valid. You could of course change this behavior by using a more draconian ErrorHandler that throws an exception for each validity error.

Figure 11. A JDOM Program That Schema Validates Documents
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;

public class JDOMSchemaValidator {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMSchemaValidator URL");
      return;
    }

    SAXBuilder builder = new SAXBuilder(
     "org.apache.xerces.parsers.SAXParser");
    // turn on the standard SAX validation feature
    builder.setValidation(true);
    // BestSAXChecker is the ErrorHandler from Chapter 7
    builder.setErrorHandler(new BestSAXChecker());
    // turn on schema support
    builder.setFeature(
      "http://apache.org/xml/features/validation/schema", true);

    // command line should offer URIs or file names
    try {
      builder.build(args[0]);
    }
    // indicates a well-formedness error
    catch (JDOMException e) {
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }
    catch (IOException e) {
      System.out.println("Could not check " + args[0]);
      System.out.println(" because " + e.getMessage());
    }

  }

}

Here is the result when I used this program to check a mildly invalid document. One error was reported.

% java JDOMSchemaValidator original_hotcop.xml 
Error: cvc-type.3.1.3: The value '6:20' of element 'LENGTH' is
 not valid.
 at line 10, column 24
 in entity file:///D:/books/XMLJAVA/examples/14/
original_hotcop.xml
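
The more draconian ErrorHandler mentioned earlier could look something like the following minimal sketch. The class name StrictErrorHandler is hypothetical; it is not part of SAX or JDOM. It uses only the standard org.xml.sax interfaces, and it assumes that you want warnings reported but every validity error to stop the parse.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical class name; not part of SAX or JDOM
class StrictErrorHandler implements ErrorHandler {

  // warnings are printed but do not stop the parse
  public void warning(SAXParseException e) {
    System.out.println("Warning: " + e.getMessage());
  }

  // validity errors arrive here; rethrowing aborts the parse
  public void error(SAXParseException e) throws SAXParseException {
    throw e;
  }

  // well-formedness errors are passed along unchanged
  public void fatalError(SAXParseException e)
   throws SAXParseException {
    throw e;
  }

}
```

Installed with builder.setErrorHandler(new StrictErrorHandler()), a handler like this should make build() fail on the first validity error, most likely by way of a JDOMException. In that case the catch block in the program above could no longer assume that every JDOMException indicates a well-formedness error.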

Caution

You should only use setFeature() and setProperty() for nonstandard features and properties like http://apache.org/xml/features/validation/schema. SAXBuilder requires certain settings of the standard features such as http://xml.org/sax/features/namespace-prefixes and standard properties such as http://xml.org/sax/properties/lexical-handler in order to work properly. If you change these, then the document may not be built correctly.


Another interesting possibility is to set a SAX filter that is applied to the document as it's read:

public void setXMLFilter (XMLFilter filter) 

If you use this, the JDOM Document will include only the filtered content.
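
To make this concrete, here is a minimal sketch of a filter that could be passed to setXMLFilter(). The class name ElementDropFilter and the element name used below are hypothetical. The sketch extends the standard org.xml.sax.helpers.XMLFilterImpl and assumes that the parser reports local names, that is, that namespace processing is turned on. It removes every element with a given local name along with that element's child elements and text.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops every element with the given
// local name, together with its child elements and text
class ElementDropFilter extends XMLFilterImpl {

  private final String dropName;
  private int depth = 0; // greater than 0 inside a dropped element

  ElementDropFilter(String dropName) {
    this.dropName = dropName;
  }

  public void startElement(String uri, String localName,
   String qName, Attributes atts) throws SAXException {
    if (depth > 0 || localName.equals(dropName)) depth++;
    else super.startElement(uri, localName, qName, atts);
  }

  public void endElement(String uri, String localName,
   String qName) throws SAXException {
    if (depth > 0) depth--;
    else super.endElement(uri, localName, qName);
  }

  public void characters(char[] text, int start, int length)
   throws SAXException {
    if (depth == 0) super.characters(text, start, length);
  }

}
```

After builder.setXMLFilter(new ElementDropFilter("secret")), the Document that build() returns should contain no secret elements. A fuller version would also suppress ignorable white space, comments, and processing instructions that appear inside a dropped element.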

SAXOutputter

In addition to reading a file or stream through a SAX parser, you can also feed a JDOM document into a SAX ContentHandler using the org.jdom.output.SAXOutputter class. This class is initially configured with a ContentHandler and optionally an ErrorHandler, DTDHandler, EntityResolver, and/or LexicalHandler. The output() method walks the tree, firing off events to these handlers as it does so.

For example, suppose you've built a document in memory that happens to contain some XInclude elements, and you'd like to resolve them. JDOM does not have built-in support for XInclude. To JDOM, an XInclude element is just an element that happens to have the local name include and the namespace URI http://www.w3.org/2001/XInclude. However, GNU JAXP does include a filter that can resolve XIncludes. Unfortunately it's a SAX filter rather than a JDOM filter. Not to worry. It's straightforward to feed a JDOM document into the GNU JAXP gnu.xml.pipeline.XIncludeFilter using a SAXOutputter, as shown in Figure 12.

Figure 12. A JDOM Program That Passes Documents to a SAX ContentHandler
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.SAXOutputter;
import java.io.IOException;
import gnu.xml.pipeline.*;
import org.xml.sax.SAXException;

public class XIncluder {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java XIncluder URL");
      return;
    }

    SAXBuilder builder = new SAXBuilder(
     "gnu.xml.aelfred2.XmlReader");

    // command line should offer URIs or file names
    try {
      Document doc = builder.build(args[0]);
      XIncludeFilter filter = new XIncludeFilter(
        new TextConsumer(System.out)
      );
      // the constructor installs filter as the ContentHandler;
      // the same object also receives DTD and lexical events
      SAXOutputter outputter = new SAXOutputter(filter);
      outputter.setDTDHandler(filter);
      outputter.setLexicalHandler(filter);
      outputter.output(doc);
    }
    // indicates a well-formedness error
    catch (JDOMException e) {
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }
    catch (SAXException e) {
      System.out.println(e.getMessage());
    }
    catch (IOException e) {
      System.out.println("Could not merge " + args[0]);
      System.out.println(" because " + e.getMessage());
    }

  }

}

Here the XIncludeFilter is itself hooked up to another GNU JAXP class, TextConsumer, which merely prints the document on a specified OutputStream.
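
The receiving end of a SAXOutputter does not have to be a GNU JAXP pipeline stage. Any SAX ContentHandler will do. As a minimal sketch, the hypothetical ElementCounter class below (the name is mine, not part of JDOM or SAX) extends the standard DefaultHandler and simply counts the startElement() events it receives.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts the elements it is shown
class ElementCounter extends DefaultHandler {

  private int count = 0;

  // reset at the start of each document so the handler
  // can be reused
  public void startDocument() {
    count = 0;
  }

  public void startElement(String uri, String localName,
   String qName, Attributes atts) {
    count++;
  }

  public int getCount() {
    return count;
  }

}
```

Given a Document doc, you would pass an ElementCounter to the SAXOutputter constructor, call output(doc), and then read getCount(). This assumes that SAXOutputter fires one startElement() event for each element in the tree.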

