XSLT






There is a criticism that the design of HTML is not very sophisticated because it does not separate a logical document structure from its presentation design.

Suppose that you are the CEO of a newspaper company and plan to publish news on the Web. News writers create source articles but they do not touch on the design of their presentation on the Web. Usually, Web designers create HTML documents based on the input from the news writers. If an HTML document contains both the content of news articles and their presentation (this is usually the case), a problem occurs when both designers and news writers want to modify the same HTML at the same time.

Furthermore, to differentiate your news site from those of your competitors, you may want to revise the overall design of your news site to capture the interest of more mobile users who are using Personal Digital Assistants (PDAs). For that purpose, you may decide to prepare two page designs, one for PC users and another for PDA users. How can you do that? You could first create a page for PC users and then copy it and modify the copy for PDA readers. However, it is easy to imagine that these steps would create a serious problem; for example, when a writer wants to update an article after the page design is final, it is hard to always modify both copies consistently.

As many of you may know, there is a famous software architecture model called the Model-View-Controller (MVC) Model. It is very useful when you design a graphical user interface (GUI). It clearly separates the role of the components in a GUI program into a model (M) for the structure of the data, a view (V) for presentation of the model, and a controller (C) for operations on the model. This separation makes it easy to minimize the side effect of each component; for example, changes on a view do not affect a model.

The proper design of HTML could, ideally, be the same as described earlier; that is, it could separate a model and its views.[6] In contrast to HTML, XML has the concept of model-view separation. A model is specified by an XML document, and a view is represented as an HTML document. It would be convenient to have a tool that converts an XML document to an appropriate HTML document based on a set of translation rules. XSLT is designed to provide the basis for such a translation.

[6] Such M-V separation makes the concept of HTML complicated; however, HTML could not have become very popular if it had very complicated syntax.

1 What Is XSLT?

XSLT is a language, defined in a W3C Recommendation, for converting an XML document to something else. In many cases, the result of the conversion is another XML document, but it may also be an HTML document, Comma-Separated Values (CSV), or, with further processing, even Portable Document Format (PDF). In this section, we describe only the XML-to-XML conversion.

In XSLT, a template is used to represent a fragment of a result tree. With XSLT, some parts of an input XML document—typically a value of an element or an attribute—are inserted into the template, and then the result tree is created. XSLT uses XPath and its additional functions to select or test nodes.

2 Syntax and Semantics of XSLT

A detailed explanation of XSLT syntax and semantics goes beyond the scope of this book because the XSLT specification is very big. Instead, we describe some typical techniques by using concrete examples that translate an XML document to various XHTML documents. Even though it does not fully cover all the capabilities provided by XSLT, we think it is still enough to give you the flavor of XSLT. We refer you to the following book for more detail: XSLT: Working with XML and HTML, by Khun Yee Fung (Addison-Wesley, ISBN 0-201-71103-6).

XSLT Stylesheets

A document that defines XSLT translation rules is called a stylesheet. Listing 7.4 is an example of a stylesheet (sample-1.xsl) that translates the XML document sample.xml (see Listing 7.2) into an XHTML document.

Listing 7.4 Stylesheet for translating XML to XHTML, chap07/data/sample-1.xsl
[1]   <?xml version="1.0" encoding="UTF-8"?>
[2]   <xsl:stylesheet
[3]     version="1.0"
[4]     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
[5]     xmlns="http://www.w3.org/TR/xhtml1"
[6]     xmlns:ch7="http://www.example.com/xmlbook2/chap07/"
        exclude-result-prefixes="ch7">
        <xsl:output method="xml" encoding="UTF-8"/>
        <xsl:template match="/">
          <html>
            <head><title>XSLT Sample</title></head>
            <body>
              <ul>
                <li>
                  <xsl:value-of select="ch7:W3Cspecs/ch7:spec[1]/@title"/>
                </li>
                <li>
                  <xsl:value-of select="ch7:W3Cspecs/ch7:spec[2]/@title"/>
                </li>
              </ul>
            </body>
          </html>
        </xsl:template>
      </xsl:stylesheet>

A stylesheet is an XML document. For example, you can specify an encoding attribute in an XML declaration (line 1) just as in other XML documents. The document element of a stylesheet is the xsl:stylesheet element.[7] The value of the version attribute (line 3) is the XSLT version number (1.0). Usually, the namespaces used in a stylesheet are declared in the xsl:stylesheet element. In this example, three namespaces are declared: for the stylesheet itself (line 4), XHTML 1.0 as the output document (line 5), and the input XML document, sample.xml (line 6). The exclude-result-prefixes attribute specifies the namespace prefixes that should not be included in the result XML document. Child elements of the xsl:stylesheet element are parameters to the stylesheet (for example, xsl:output) and templates (xsl:template). We describe templates in the next section, XSLT Templates.

[7] You can use xsl:transform instead of xsl:stylesheet. It is an alias of xsl:stylesheet and has the same meaning. In this book, we use xsl:stylesheet.

As shown in this example, xsl:output specifies the output format of this stylesheet. The method attribute specifies the overall format of the result tree. Possible values of the method attribute are xml (XML output), html (HTML output), text (text output), and so on. The XSLT processor, which is a component that processes an XSLT stylesheet, changes its behavior according to the method.
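
The effect of the method attribute can be observed with a small experiment. The following is only a sketch: the class name OutputMethodDemo, the one-template stylesheet, and the trivial input document are assumptions made for illustration and are not part of the book's samples. It runs the same template twice through JAXP, once with method="xml" and once with method="text".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodDemo {
    // Hypothetical stylesheet: identical except for the output method.
    public static String stylesheet(String method) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='" + method + "'/>"
             + "<xsl:template match='/'><p>a &lt; b</p></xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Applies the stylesheet to a trivial input and returns the serialized result.
    public static String run(String method) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet(method))));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // With "xml", the p element is serialized and "<" is escaped.
        System.out.println("xml : " + run("xml"));
        // With "text", only the text nodes are written, without escaping.
        System.out.println("text: " + run("text"));
    }
}
```

The result tree is the same in both runs; only the way the processor serializes it differs.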

Listing 7.5 is the output XHTML document (sample-1.xhtml) translated from sample.xml by applying sample-1.xsl. The actual output does not contain any spaces, but we indented it for readability.

Listing 7.5 Translated XHTML document, chap07/data/sample-1.xhtml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title>XSLT Sample</title>
  </head>
  <body>
    <ul>
      <li>XML Path Language (XPath) Version 1.0</li>
      <li>XSL Transformations (XSLT) Version 1.0</li>
    </ul>
  </body>
</html>

In summary, elements in the template under the html element are first written, and then the values of the title attributes in sample.xml are embedded. In this way, XSLT is very powerful in extracting values from an input XML document and embedding them into an output template. To do the same task, you can write a program using DOM or SAX, but you need much more effort than when using XSLT.
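
To make the comparison with DOM concrete, here is a rough sketch of the same extraction written directly against the DOM API. The class name DomTitleList and the inline document are assumptions: the document is a reduced stand-in for sample.xml, reconstructed from the XPaths used in this section, and the sketch produces only the ul fragment rather than a complete XHTML page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTitleList {
    static final String NS = "http://www.example.com/xmlbook2/chap07/";

    // Assumed stand-in for sample.xml (only the title attributes are kept).
    public static final String SAMPLE =
        "<W3Cspecs xmlns='" + NS + "'>"
      + "<spec title='XML Path Language (XPath) Version 1.0'/>"
      + "<spec title='XSL Transformations (XSLT) Version 1.0'/>"
      + "</W3Cspecs>";

    // Parses the document and builds the list markup by hand.
    public static String titlesAsList(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList specs = doc.getElementsByTagNameNS(NS, "spec");
        StringBuilder out = new StringBuilder("<ul>");
        for (int i = 0; i < specs.getLength(); i++) {
            Element spec = (Element) specs.item(i);
            // Note: no escaping of special characters is done here;
            // an XSLT processor takes care of that when it serializes.
            out.append("<li>").append(spec.getAttribute("title")).append("</li>");
        }
        return out.append("</ul>").toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesAsList(SAMPLE));
    }
}
```

Even in this reduced form, the output structure is buried in string concatenation, whereas the stylesheet shows it as a readable template.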

In sample-1.xsl, the xsl:output and xsl:template elements are the only child elements of the xsl:stylesheet element. However, you can specify more child elements. The following table summarizes the typical child elements of the xsl:stylesheet element.

Listing 7.6 is an overview of the XSLT stylesheet structure. Please refer to the XSLT specification for the details.

Child Elements of xsl:stylesheet

| ELEMENT | DESCRIPTION |
|---|---|
| xsl:include and xsl:import | Embed another stylesheet, specified by the href attribute, into the current stylesheet. Unlike with xsl:include, definitions brought in by xsl:import have lower precedence than those of the importing stylesheet, and xsl:import elements must appear before all other child elements of xsl:stylesheet. |
| xsl:output | Specifies the format of an output document; for example, an output is an XML document with the XML declaration. |
| xsl:variable and xsl:param | Declare a variable with a name specified by the name attribute and a value specified as its text value. With xsl:param, unlike with xsl:variable, the text is regarded as a default value; that is, it is overridden when a value with the same name is passed in from outside the stylesheet. Variable declarations can also be written in a template (described later). |
| xsl:template | Specifies a template. You can write more than one xsl:template in xsl:stylesheet. The details are described in the section XSLT Templates. |

Listing 7.6 An overview of the stylesheet structure
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- xsl:import elements must precede all other child elements -->
  <xsl:import href="..."/>
  <xsl:include href="..."/>
  <xsl:strip-space elements="..."/>
  <xsl:preserve-space elements="..."/>
  <xsl:variable name="...">...</xsl:variable>
  <xsl:param name="...">...</xsl:param>
  <xsl:output method="..." />
  <xsl:key name="..." match="..." use="..."/>
  <xsl:decimal-format name="..."/>
  <xsl:namespace-alias stylesheet-prefix="..." result-prefix="..."/>
  <xsl:attribute-set name="...">
    ...
  </xsl:attribute-set>

  <xsl:template match="...">
    ...
  </xsl:template>
  <xsl:template match="...">
    ...
  </xsl:template>
</xsl:stylesheet>
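
The default-value behavior of a top-level xsl:param can be tried from Java with Transformer#setParameter. This is a minimal sketch under assumptions: the class name ParamDemo, the parameter name heading, and the inline stylesheet are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamDemo {
    // Hypothetical stylesheet: the param content is only a default value.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='heading'>Default heading</xsl:param>"
      + "<xsl:template match='/'><xsl:value-of select='$heading'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet; a non-null argument overrides the default.
    public static String run(String heading) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        if (heading != null) {
            transformer.setParameter("heading", heading);
        }
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(null));           // uses the default value
        System.out.println(run("Custom title")); // overrides the default
    }
}
```

Had heading been declared with xsl:variable instead, the value passed from Java would have no effect on it.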

XSLT Templates

A template is the most important part of a stylesheet. An xsl:template element typically has a match attribute, which is a pattern written in XPath syntax that specifies the nodes of an input XML document to which the template is to be applied. For the stylesheet sample-1.xsl, for example, match="/" indicates that the template is to be applied to the document root of an input XML document. An xsl:template element contains a mixture of literal strings as fragments in the output document and instructions. The namespace prefix xsl distinguishes instructions from literal strings. When a template is applied, literal strings in the template are copied to the output document as they are, while instructions are executed. The template in the following example always outputs a fixed XHTML document regardless of its input document because it contains only literal strings.

<xsl:template match="/">
  <html xmlns="http://www.w3.org/TR/xhtml1">
    <head><title>XSLT sample 1</title></head>
    <body>
      <ul>
        <li>XML Path Language (XPath) Version 1.0</li>
        <li>XSL Transformations (XSLT) Version 1.0</li>
      </ul>
    </body>
  </html>
</xsl:template>

A literal string can be any sequence of characters that are allowed by XML, although the stylesheet as a whole must be a well-formed XML document. For example, an XSLT processor reports a syntax error if an end tag is missing but its start tag is specified. If you want to output such a fragment of XML, you must escape the start tag or use a CDATA section. Note that it is not always true that the output XML document is well-formed. For example, if you write an inappropriate string, such as "foo", outside the html tag in the previous example, the output document is no longer well-formed. In general, it is a very hard problem to make sure the output XML document is valid against a given DTD or an XML Schema. At this moment, one of the best ways to solve this problem is to parse the generated XML documents by using an XML parser as we discussed in Chapter 3, Section 3.4.2.

An instruction may apply a template to the selected nodes or output the values of the selected nodes. For example, the xsl:value-of instruction outputs the string value of the node selected by an XPath specified in the select attribute (in XSLT 1.0, only the first node in document order is used if several nodes are selected). An XPath can be absolute or relative to the node that an XSLT processor is working on (called the current node, to be described later). There are two XPaths in sample-1.xsl (see Listing 7.4): ch7:W3Cspecs/ch7:spec[1]/@title and ch7:W3Cspecs/ch7:spec[2]/@title. Their values are shown in sample-1.xhtml (see Listing 7.5) as "XML Path Language (XPath) Version 1.0" and "XSL Transformations (XSLT) Version 1.0," respectively.

Note that the namespace declaration for the namespace prefix ch7 in the xsl:stylesheet element affects the evaluation of the XPath. In this case, the namespace prefix ch7 in the XPath is associated with the namespace http://www.example.com/xmlbook2/chap07/ declared in the stylesheet. In XSLT, a namespace scope in a stylesheet is effective even in the evaluation of XPaths. This means that the caution about namespaces described in Section 7.1.3 (that fixed namespace prefixes should not be used in XPath) does not apply as long as XPath is used in XSLT.
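
A small experiment illustrates that only the namespace URI matters, not the prefix used in the input document. The sketch below is built on assumptions: the class name NamespacePrefixDemo and both inline documents are invented, and the stylesheet uses method="text" only to keep the output easy to inspect.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamespacePrefixDemo {
    public static final String NS = "http://www.example.com/xmlbook2/chap07/";

    // The prefix ch7 is bound in the stylesheet and used in the XPath.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:ch7='" + NS + "'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='ch7:W3Cspecs/ch7:spec[1]/@title'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String firstTitle(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Same namespace URI, different prefix (w): the XPath still matches.
        String samePrefixless = "<w:W3Cspecs xmlns:w='" + NS + "'>"
            + "<w:spec title='XPath'/></w:W3Cspecs>";
        // Same prefix (ch7), different namespace URI: nothing is selected.
        String otherUri = "<ch7:W3Cspecs xmlns:ch7='http://www.example.com/other/'>"
            + "<ch7:spec title='XPath'/></ch7:W3Cspecs>";
        System.out.println("[" + firstTitle(samePrefixless) + "]");
        System.out.println("[" + firstTitle(otherUri) + "]");
    }
}
```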

The stylesheet sample-1.xsl assumes that there must be two title attributes in an input XML document. To make it more flexible, in Listing 7.7 we show you an improved stylesheet (sample-2.xsl) that can accept any number of title attributes.

Listing 7.7 Improved stylesheet, chap07/data/sample-2.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/TR/xhtml1"
  xmlns:ch7="http://www.example.com/xmlbook2/chap07/"
  exclude-result-prefixes="ch7">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Template 1 -->
  <xsl:template match="/">
    <html>
      <head><title>XSLT Sample</title></head>
      <body>
        <ul>
          <xsl:apply-templates select="ch7:W3Cspecs/ch7:spec"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Template 2 -->
  <xsl:template match="ch7:W3Cspecs/ch7:spec">
    <li><xsl:value-of select="@title"/></li>
  </xsl:template>

</xsl:stylesheet>

Two templates are defined in sample-2.xsl (Templates 1 and 2), while there is only one template in sample-1.xsl. Template 1 is almost the same as the one in sample-1.xsl. The only difference is that the child element of the ul element is the xsl:apply-templates element. Template 2 is newly added. The result of applying sample-2.xsl to sample.xml is identical to that of applying sample-1.xsl (see Listing 7.5). However, sample-2.xsl can accept any number of title attributes in sample.xml.

The xsl:apply-templates element is an instruction to apply templates to the nodes selected by the XPath expression specified in the select attribute. This instruction lets an XSLT processor first select templates that match nodes specified by an XPath expression in the match attribute, and then apply the selected templates to specified nodes. If more than one node is selected, an XSLT processor applies templates to all the selected nodes. In sample-2.xsl, an XSLT processor first selects two ch7:spec elements in sample.xml. Selected elements are specified by the XPath expression ch7:W3Cspecs/ch7:spec in the select attribute of the xsl:apply-templates element. Then it applies the second template to the selected elements. The second template is selected because the selected elements also match the match attribute ch7:W3Cspecs/ch7:spec. For each selected element, the second template outputs the value of the title attribute of the ch7:spec element as an li element.
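
The claim that the same two templates handle any number of ch7:spec elements can be checked with a short program. This is a sketch under assumptions: the class name ApplyTemplatesDemo, the generated input documents, and the reduced stylesheet (which outputs only the ul fragment, without the XHTML namespace) are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyTemplatesDemo {
    static final String NS = "http://www.example.com/xmlbook2/chap07/";

    // Reduced variant of the two-template approach: one li per ch7:spec.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:ch7='" + NS + "' exclude-result-prefixes='ch7'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<ul><xsl:apply-templates select='ch7:W3Cspecs/ch7:spec'/></ul>"
      + "</xsl:template>"
      + "<xsl:template match='ch7:spec'>"
      + "<li><xsl:value-of select='@title'/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Generates an input document with n spec elements (assumed structure).
    public static String sample(int n) {
        StringBuilder sb = new StringBuilder("<W3Cspecs xmlns='" + NS + "'>");
        for (int i = 1; i <= n; i++) {
            sb.append("<spec title='Spec ").append(i).append("'/>");
        }
        return sb.append("</W3Cspecs>").toString();
    }

    public static String run(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Counts how many li start tags appear in the output.
    public static int countItems(String output) {
        return output.split("<li>", -1).length - 1;
    }

    public static void main(String[] args) throws TransformerException {
        for (int n : new int[] {0, 2, 5}) {
            System.out.println(n + " specs -> " + countItems(run(sample(n))) + " items");
        }
    }
}
```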

In general, an XSLT processor recursively traverses an input XML tree structure as shown in the figure below. The node that an XSLT processor is working with is called the current node. An XSLT processor may change the current node to a different part of the tree structure depending on the instructions. Even so, control is finally returned to the previous current node, like a function call in a programming language.

Figure 3. Traversing an input XML document with an XSLT processor

There are several built-in templates to determine the default processing behavior of an XSLT processor. The priorities of built-in templates are lower than those written explicitly by a programmer. Therefore, a template written in a stylesheet is always selected when the template matches the same node that a built-in template also matches. Some typical built-in templates follow.[8]

[8] There are other built-in templates, which are not listed here.

<!-- Built-in template 1 -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Built-in template 2 -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Built-in template 3 -->
<xsl:template match="processing-instruction()|comment()"/>

Built-in template 1 matches all the elements and the document root, and applies templates to all their child nodes. If an xsl:apply-templates element does not have a select attribute, all the child nodes of the current node (elements, text, comments, and PIs, but not attributes) are selected. Note that the xsl:apply-templates element changes the current node. Built-in template 2 matches all the text nodes and attribute nodes, and outputs their values. Built-in template 3 matches all the processing instructions (PIs) and comment nodes. The template does not output anything because it is empty. Therefore, it removes PIs and comments from the output document.

If you do not write any template (that is, you just use these built-in templates), all the tags, PIs, and comments are removed from an input XML document, and only the text remains in the output document. Attribute values are not output either: attributes are not child nodes, so built-in template 1 never selects them, and built-in template 2 is applied to an attribute only when a stylesheet selects it explicitly.
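
The behavior of the built-in templates can be observed by running a stylesheet that contains no templates at all. The following sketch rests on assumptions: the class name BuiltInTemplatesDemo and the inline input document are invented, and method="text" is used so that no XML declaration is mixed into the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BuiltInTemplatesDemo {
    // A stylesheet without any xsl:template: only built-in templates apply.
    static final String EMPTY_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    // Assumed input with an attribute, a comment, a PI, and nested text.
    public static final String INPUT =
        "<a x='attr'>hello<!-- note --><?pi data?><b>world</b></a>";

    public static String run(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(EMPTY_XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Tags, the comment, the PI, and the attribute value are all dropped;
        // only the text nodes survive.
        System.out.println(run(INPUT));
    }
}
```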

Using sample-2.xsl as an example, we can explain how templates are applied to an input XML document. To make the explanation simple, we omit the processing of whitespace. The following table shows the stylesheet processing steps in order.

Stylesheet Processing Steps

| STEP NO. | CURRENT NODE | TEMPLATE IN PROCESS | OUTPUT | DESCRIPTION |
|---|---|---|---|---|
| 1 | / | N/A | N/A | Selects a template in the stylesheet that matches /. In this case, template 1 is selected. |
| 2 | / | 1 | <html><head><title>XSLT Sample</title></head><body><ul> | Outputs the first literal string. |
| 3 | / | 1 | N/A | Selects /ch7:W3Cspecs[1]/ch7:spec[1] and /ch7:W3Cspecs[1]/ch7:spec[2] according to the select attribute in the apply-templates element. |
| 4 | /ch7:W3Cspecs[1]/ch7:spec[1] | N/A | N/A | Selects a template in the stylesheet that matches /ch7:W3Cspecs[1]/ch7:spec[1]. In this case, template 2 is selected. |
| 5 | /ch7:W3Cspecs[1]/ch7:spec[1] | 2 | <li> | Outputs the first literal string in template 2. |
| 6 | /ch7:W3Cspecs[1]/ch7:spec[1] | 2 | XML Path Language (XPath) Version 1.0 | Outputs the value of the attribute selected by the XPath (@title) in the select attribute of the xsl:value-of element. |
| 7 | /ch7:W3Cspecs[1]/ch7:spec[1] | 2 | </li> | Outputs the remaining literal string in template 2. |
| 8 | /ch7:W3Cspecs[1]/ch7:spec[2] | N/A | <li>XSL Transformations (XSLT) Version 1.0</li> | Repeats steps 4 through 7 with /ch7:W3Cspecs[1]/ch7:spec[2]. |
| 9 | / | 1 | </ul></body></html> | Outputs the remaining literal string in the first template. |
| 10 | N/A | N/A | N/A | Ends. |

The last example, shown in Listing 7.8, is sample-3.xsl, which generates an XHTML table from sample.xml (see Listing 7.2).

Listing 7.8 More complex example of stylesheets, chap07/data/sample-3.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/TR/xhtml1"
  xmlns:ch7="http://www.example.com/xmlbook2/chap07/"
  exclude-result-prefixes="ch7">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Template 1 -->
  <xsl:template match="/">
    <html>
      <head><title>XSLT Sample</title></head>
      <body>
        <table border="1">
          <thead>
            <tr>
              <th>Title</th>
              <th>Status</th>
              <th>Date</th>
            </tr>
          </thead>
          <tbody><xsl:apply-templates/></tbody>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- Template 2 -->
  <xsl:template match="ch7:spec">
    <tr>
      <td><xsl:value-of select="@title"/></td>
      <td><xsl:apply-templates select="ch7:date/@type"/></td>
      <td><xsl:value-of select="ch7:date"/></td>
    </tr>
  </xsl:template>

  <!-- Template 3 -->
  <xsl:template match="@type">
    <xsl:variable name="type" select="."/>
    <xsl:choose>
      <xsl:when test='$type="REC"'>Recommendation</xsl:when>
      <xsl:when test='$type="PR"'>Proposed Recommendation</xsl:when>
      <xsl:when test='$type="WD"'>Working Draft</xsl:when>
      <xsl:otherwise>Other</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Listing 7.9, sample-3.xhtml, is the output XHTML file. We show the result with appropriate indentations for readability, but the actual result is not indented.

Listing 7.9 The output XHTML document generated by sample-3.xsl, chap07/data/sample-3.xhtml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title>XSLT Sample</title>
  </head>
  <body>
    <table border="1">
      <thead>
        <tr>
          <th>Title</th>
          <th>Status</th>
          <th>Date</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>XML Path Language (XPath) Version 1.0</td>
          <td>Recommendation</td>
          <td>16 November 1999</td>
        </tr>
        <tr>
          <td>XSL Transformations (XSLT) Version 1.0</td>
          <td>Recommendation</td>
          <td>16 November 1999</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

In this example, for each ch7:spec element in sample.xml, its title attribute, its child element ch7:date, and the type attribute of that ch7:date element are extracted and inserted as a record of the output XHTML table. There are three templates: Template 1 matches the root node and outputs the outer structure of the XHTML table; template 2 matches ch7:spec elements and outputs the contents of the matched element as a table record; and template 3 matches each type attribute and outputs the full name of the status that it abbreviates.

An xsl:choose element, which appeared in the third template, is similar to the switch-case statement in a programming language. It tests the value of the test attribute in each xsl:when element and outputs the contents of the xsl:when element that first passes the test. This template translates shorthand notations (for example, "REC") to their corresponding full notations (for example, "Recommendation").

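Here, type is a variable declared by the name attribute of the xsl:variable element. Its value is given by the select attribute; because select="." selects the current node, the variable holds the value of the type attribute that the template matched. It is referred to as $type in an xsl:when element.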
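
To see the xsl:choose mapping in isolation, the third template can be run against a one-element document. This is only a sketch: the class name ChooseDemo, the root template that selects date/@type, and the inline input (which uses no namespace) are assumptions made for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChooseDemo {
    // The @type template follows the one in the listing; the rest is assumed.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:apply-templates select='date/@type'/>"
      + "</xsl:template>"
      + "<xsl:template match='@type'>"
      + "<xsl:variable name='type' select='.'/>"
      + "<xsl:choose>"
      + "<xsl:when test='$type=\"REC\"'>Recommendation</xsl:when>"
      + "<xsl:when test='$type=\"PR\"'>Proposed Recommendation</xsl:when>"
      + "<xsl:when test='$type=\"WD\"'>Working Draft</xsl:when>"
      + "<xsl:otherwise>Other</xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the full status name for a shorthand such as "REC".
    public static String status(String type) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        String xml = "<date type='" + type + "'>16 November 1999</date>";
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        for (String type : new String[] {"REC", "PR", "WD", "NOTE"}) {
            System.out.println(type + " -> " + status(type));
        }
    }
}
```

Only the first xsl:when whose test succeeds contributes output; a value that matches none of them falls through to xsl:otherwise.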

The following table shows some typical instructions that can be used in a template. There are other instructions in XSLT. Refer to the XSLT specification for the details.

Instructions in XSLT

INSTRUCTION: <xsl:value-of select="expression"/>
EXAMPLE: <xsl:value-of select="ch7:W3Cspecs/ch7:spec[1]/@title"/>
DESCRIPTION: Outputs the result of evaluating the expression.

INSTRUCTION: <xsl:text>content</xsl:text>
EXAMPLE: <xsl:text>XSLT sample 1</xsl:text>
DESCRIPTION: Outputs the content.

INSTRUCTION: <xsl:if test="expression">template</xsl:if>
EXAMPLE: <xsl:if test="position()=1">first one</xsl:if>
DESCRIPTION: Evaluates the expression and, if it is true, evaluates the template.

INSTRUCTION:
<xsl:choose>
  <xsl:when test="expression">template</xsl:when>
  <xsl:otherwise>template</xsl:otherwise>
</xsl:choose>
EXAMPLE:
<xsl:choose>
  <xsl:when test='node()="US"'>United States</xsl:when>
  <xsl:when test='node()="CA"'>Canada</xsl:when>
  <xsl:when test='node()="JP"'>Japan</xsl:when>
  <xsl:otherwise>Other</xsl:otherwise>
</xsl:choose>
DESCRIPTION: Evaluates the expression of each xsl:when in order and evaluates the template of the first one that is true; if none is true, evaluates the template of xsl:otherwise.

INSTRUCTION: <xsl:variable name="qname" select="expression"/>
EXAMPLE: <xsl:variable name="type" select="."/>
DESCRIPTION: Declares a variable and initializes it with the expression.

3 XSLT Programming in Java

In this section, we use Xalan as an XSLT processor; it was also used as an XPath processor in Section 7.1.4. However, we do not use Xalan's API (the org.apache.xalan package); we use the Java API for XML Processing (JAXP). JAXP is a standard API for XML processing that is defined by the Java Community Process (JCP) and will be shipped with Java Development Kit (JDK) 1.4, which is the next version of JDK. It provides the API for XSLT processing as the javax.xml.transform package. The JAXP package is also included in xalan.jar, which is Xalan's jar file.

Overview of JAXP API for Processing XSLT

Here we give an overview of the JAXP API for processing XSLT.

To absorb the differences between the underlying XSLT processors, JAXP provides two classes for processing XSLT: One is the Transformer class for the abstraction of an XSLT processor, and the other is the TransformerFactory class for the Transformer's factory class. This design is similar to JAXP for XML processors, such as the DocumentBuilder and DocumentBuilderFactory classes. To abstract the I/O, JAXP provides two interfaces: the Source interface and the Result interface. The Source interface is used for both stylesheets and input XML documents.

JAXP supports three types of I/O interfaces: stream, DOM, and SAX. There are three packages, javax.xml.transform.{stream, dom, sax}, provided for that purpose. Each package contains two classes (for example, StreamSource and StreamResult) to implement the Source and the Result interfaces.

Calling the XSLT Processor

Listing 7.10, XSLTStreamTest.java, shows a very simple example program to call the XSLT processor with streaming I/O.

Listing 7.10 Calling XSLT with streaming I/O, chap07/XSLTStreamTest.java
    package chap07;

    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;

    public class XSLTStreamTest {
       public static void main(String[] args) throws Exception {
          // args[0] specifies the path to the input XSLT stylesheet
          String xsltURL = args[0];
          // args[1] specifies the path to the input XML document
          String xmlURL = args[1];

          // Creates instances of StreamSource for the stylesheet
[16]      StreamSource xslt = new StreamSource(xsltURL);
          // Creates instances of StreamSource for the input document
[18]      StreamSource xml = new StreamSource(xmlURL);
          // Creates an instance of TransformerFactory
[21]      TransformerFactory factory =
[22]         TransformerFactory.newInstance();
          // Creates an instance of Transformer
[24]      Transformer transformer = factory.newTransformer(xslt);
          // Executes the Transformer
[26]      transformer.transform(xml, new StreamResult(System.out));
       }
    }

The XSLTStreamTest program accepts two arguments: an XSLT stylesheet and an input XML document. It applies the XSLT stylesheet to the input XML document and then outputs the result to the standard output.

You can run the program as follows:

R:\samples\>java chap07.XSLTStreamTest
  file:./chap07/data/sample-1.xsl file:./chap07/data/sample.xml

The result is shown in Listing 7.5. Let's run XSLTStreamTest for sample-2.xsl and sample-3.xsl, too. We get the results shown in Listings 7.5 and 7.9, respectively.

Next we explain the details of XSLTStreamTest.

First, it creates StreamSource objects from the URIs of a stylesheet and an input XML document (lines 16 and 18). A StreamSource object can also be created from either an InputStream object or a Reader object. The XSLT processor calls an XML processor. Applications do not need to call the XML processor by themselves. Next, it creates a TransformerFactory object (lines 21 and 22) and a Transformer object (line 24).

Note that this program uses TransformerFactory#newTransformer(Source). In this way, JAXP creates a Transformer object associated with a particular stylesheet. The Transformer object is reusable for multiple calls of the transform() method. Therefore, it is useful for applying the same stylesheet to multiple XML documents. Note, however, that a Transformer is not thread-safe: the same Transformer object must not be used by more than one thread at a time.
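
When the same stylesheet must be applied from several threads, one common approach in JAXP is to compile it once into a javax.xml.transform.Templates object, which is designed to be shared, and to create a separate Transformer from it in each thread. The following is a sketch of that approach, not part of the book's samples: the class name TemplatesReuseDemo, the inline stylesheet, and the input documents are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesReuseDemo {
    // Hypothetical stylesheet: greets the name given in the to attribute.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:text>Hello, </xsl:text><xsl:value-of select='greeting/@to'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms every document on a thread pool; results keep input order.
    public static List<String> transformAll(List<String> docs) throws Exception {
        // Compile the stylesheet once; Templates may be shared across threads.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> {
                    // Each task gets its own Transformer (not thread-safe).
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(doc)),
                                          new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(Arrays.asList(
            "<greeting to='Alice'/>", "<greeting to='Bob'/>")));
    }
}
```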

Finally, a transformation is executed (line 26). The translation result is written to the standard output.

Working with DOM

Listing 7.11, XSLTDOMTest.java, is an example using DOM for its I/O.

Listing 7.11 Using DOM for I/O, chap07/XSLTDOMTest.java
    package chap07;

    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.dom.DOMResult;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.apache.xml.serialize.OutputFormat;
    import org.apache.xml.serialize.XMLSerializer;
    import org.w3c.dom.Node;
    import org.w3c.dom.Element;
    import org.w3c.dom.Document;
    import org.w3c.dom.DocumentFragment;

    public class XSLTDOMTest {
       public static void main(String[] args) throws Exception {
[18]       // args[0] specifies the path to the input XSLT stylesheet
[19]       String xsltURL = args[0];
[20]       // args[1] specifies the path to the input XML document
[21]       String xmlURL = args[1];
[22]
[23]       // Creates an instance of DocumentBuilderFactory.
[24]       DocumentBuilderFactory dFactory =
[25]          DocumentBuilderFactory.newInstance();
[26]       dFactory.setNamespaceAware(true);
[27]       // Creates an instance of DocumentBuilder
[28]       DocumentBuilder parser = dFactory.newDocumentBuilder();
[29]
[30]       // Creates a DOM instance of the stylesheet
[31]       Document xsltDoc = parser.parse(xsltURL);
[32]       // Creates a DOM instance of the input document
[33]       Document xmlDoc = parser.parse(xmlURL);

           // Creates an instance of TransformerFactory
           TransformerFactory tFactory =
              TransformerFactory.newInstance();
           // Checks if the factory supports DOM or not
[39]       if(!tFactory.getFeature(DOMSource.FEATURE) ||
[40]          !tFactory.getFeature(DOMResult.FEATURE))
[41]           throw new Exception("DOM is not supported");

           // Creates instances of DOMSource and DOMResult
[44]       DOMSource xsltDOMSource = new DOMSource(xsltDoc);
[45]       DOMSource xmlDOMSource = new DOMSource(xmlDoc);
[46]       DOMResult domResult = new DOMResult();
           // Creates an instance of Transformer
           Transformer transformer =
               tFactory.newTransformer(xsltDOMSource);

           // Executes the Transformer
           transformer.transform(xmlDOMSource, domResult);

           // Gets the result
[56]       Node resultNode = domResult.getNode();

[58]       // Prints the response
[59]       OutputFormat formatter = new OutputFormat();
[60]       formatter.setPreserveSpace(true);
[61]       XMLSerializer serializer =
[62]          new XMLSerializer(System.out, formatter);
[63]       switch (resultNode.getNodeType()) {
[64]       case Node.DOCUMENT_NODE:
[65]          serializer.serialize((Document)resultNode);
[66]          break;
[67]       case Node.ELEMENT_NODE:
[68]          serializer.serialize((Element)resultNode);
[69]          break;
[70]       case Node.DOCUMENT_FRAGMENT_NODE:
[71]          serializer.serialize((DocumentFragment)resultNode);
[72]          break;
[73]       default:
[74]          throw new Exception("Unexpected node type");
[75]       }
        }
     }

Run the program in the same way as XSLTStreamTest. The only thing you need to change is the class name.

Let's see the details of the program.

The first half of the program parses the input stylesheet and the input XML document into DOM trees (lines 18–33). Then it tests whether the JAXP implementation supports DOM (lines 39–41) because DOM support may not be provided in some JAXP implementations. The rest of the program is almost the same as XSLTStreamTest. The only difference is that it uses DOMSource and DOMResult instead of StreamSource and StreamResult (lines 44–46). Finally, it gets the result node by using DOMResult#getNode() (line 56), serializes it, and outputs it (lines 58–75).
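
The serialization step in the listing relies on the parser-specific org.apache.xml.serialize classes. An alternative that stays within JAXP is to serialize the result node with an identity Transformer, that is, one created by TransformerFactory#newTransformer() without a stylesheet. The sketch below shows this variant; the class name DomSerializeDemo and the inline stylesheet and document (which use no namespace) are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomSerializeDemo {
    // Hypothetical stylesheet and input, kept free of namespaces for brevity.
    public static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><ul><xsl:apply-templates select='specs/spec'/></ul>"
      + "</xsl:template>"
      + "<xsl:template match='spec'><li><xsl:value-of select='@title'/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";
    public static final String XML = "<specs><spec title='XPath'/></specs>";

    public static String transformViaDom(String xslt, String xml) throws Exception {
        // The stylesheet must be parsed with namespace awareness turned on.
        DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
        dFactory.setNamespaceAware(true);
        DocumentBuilder parser = dFactory.newDocumentBuilder();
        Document xsltDoc = parser.parse(new InputSource(new StringReader(xslt)));
        Document xmlDoc = parser.parse(new InputSource(new StringReader(xml)));

        // DOM in, DOM out.
        TransformerFactory tFactory = TransformerFactory.newInstance();
        DOMResult domResult = new DOMResult();
        tFactory.newTransformer(new DOMSource(xsltDoc))
                .transform(new DOMSource(xmlDoc), domResult);

        // An identity Transformer copies the result tree to a stream.
        Transformer identity = tFactory.newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(domResult.getNode()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformViaDom(XSLT, XML));
    }
}
```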

Translating SAX Events to Other SAX Events

In Chapter 5, Section 5.2.2, we showed how to use a SAX event filter to translate SAX events. In this section, we create a program with a similar behavior using XSLT and JAXP. We discuss the pros and cons of using SAX, DOM, XPath, and XSLT in Section 7.3.

The figure below illustrates the concept of SAX event translation using XSLT. In the following discussion, the numbers in parentheses refer to the numbered areas in the figure. In the figure, TransformerHandler is a handler that implements the ContentHandler interface and applies a transformation to the SAX events it receives. Similar to Transformer, TransformerHandler is associated with a stylesheet (1).

Figure 7.4 Translating SAX events using XSLT

graphics/07fig04.gif

SAXParser, TransformerHandler, and the application are connected as a pipeline; that is, the ContentHandler interface of TransformerHandler is registered to SAXParser (2), and similarly the application's ContentHandler interface is registered to TransformerHandler (3). In this way, each ContentHandler of a right-hand component is registered to its left-hand component to make a pipeline.

Once an XML document is given to the SAX parser (4), it is translated into SAX events and passed to the next component (5, 7). TransformerHandler translates SAX events by applying the stylesheet (6).

What is the advantage of this approach? Suppose that an application already has a SAX event handler for processing XML documents that are compliant with a particular DTD or an XML Schema. This approach provides the ability to process similar but different XML documents by preparing a stylesheet for this particular translation. A similar example is shown in Chapter 8, Section 8.3.

Suppose that an application developer has a SAX event handler like the one shown in Listing 7.12.

Listing 7.12 SAX event handler, chap07/BookHandler.java
package chap07;

import java.util.Vector;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BookHandler extends DefaultHandler {
    // An array to store the results
    final Vector books;
    public BookHandler() {
       this.books = new Vector();
    }

    // Instance variables temporarily used for processing
    Book currentBook = null;
    Author currentAuthor = null;
    StringBuffer buf = new StringBuffer();

    class Book {
       String publishedDate;
       Vector authors = new Vector();
       public String toString() {
           return ("Book(publishedDate=" + publishedDate +
                   ", authors=" + authors+")");
       }
    }
    class Author {
       String authorName;
       String contactTo;
       public String toString() {
           return ("Author(authorName=" + authorName +
                   ", contactTo=" + contactTo + ")");
       }
    }

    public void endDocument() throws SAXException {
       System.out.println(books);
    }

    public void startElement(String uri,
                             String localName,
                             String qName,
                             Attributes attributes)
       throws SAXException
    {
       // Starts a new Book or Author object
       if ("Book".equals(localName)) {
          currentBook = new Book();
          return;
       }
       if ("Author".equals(localName)) {
          currentAuthor = new Author();
          return;
       }
       // Clears the text buffer for any other element
       buf.setLength(0);
    }

    public void endElement(String uri,
                           String localName,
                           String qName)
       throws SAXException
    {
       // Stores a completed Book or Author object
       if ("Book".equals(localName)) {
          books.addElement(currentBook);
          return;
       }
       if ("Author".equals(localName)) {
          currentBook.authors.addElement(currentAuthor);
          return;
       }
       // Copies the buffered text into the matching field
       if ("PublishedDate".equals(localName))
          currentBook.publishedDate = buf.toString();
       else if ("AuthorName".equals(localName))
          currentAuthor.authorName = buf.toString();
       else if ("ContactTo".equals(localName))
          currentAuthor.contactTo = buf.toString();
       buf.setLength(0);
    }

    public void characters(char[] ch, int start, int length)
       throws SAXException
    {
       buf.append(ch, start, length);
    }
}

BookHandler is designed to process Books.xml (shown in Listing 7.13). It reads Books.xml as an input XML document, stores each book entry as a Book object in the Vector variable books, and finally prints the contents of books to the standard output.

Listing 7.13 The Books.xml document
<?xml version="1.0" encoding="UTF-8"?>
<Books xmlns="http://www.example.com/xmlbook2/chap07/">
  <Book>
    <PublishedDate>16 November 1999</PublishedDate>
    <Author>
      <AuthorName>James Clark</AuthorName>
      <ContactTo>[email protected]</ContactTo>
    </Author>
    <Author>
      <AuthorName>Steve DeRose</AuthorName>
      <ContactTo>[email protected]</ContactTo>
    </Author>
  </Book>
  <Book>
    <PublishedDate>16 November 1999</PublishedDate>
    <Author>
      <AuthorName>James Clark</AuthorName>
      <ContactTo>[email protected]</ContactTo>
    </Author>
  </Book>
</Books>

If you remember the document sample.xml, you will find similarities between Books.xml and sample.xml. For example, a spec element in sample.xml corresponds to a Book element in Books.xml. Even though these two have semantically the same contents, BookHandler cannot handle sample.xml correctly because they have different tag names.

In such a case, it is useful to have a preprocessing mechanism that translates sample.xml into another XML document conforming to the same DTD as Books.xml. Once such a mechanism is in place, an application can handle a similar but different XML document just by preparing an XSLT stylesheet, without any modification to the application itself. If we plan for such a transformer between a SAX parser and an application at design time, the application becomes very flexible with respect to its input XML documents.

The program XSLTSAXTest.java (shown in Listing 7.14) implements the translation described above. It accepts two URI arguments (an XSLT stylesheet and an input XML document), applies the stylesheet to the input XML document, and then generates SAX events to be passed to BookHandler.

Listing 7.14 Translating SAX events using XSLT, chap07/XSLTSAXTest.java
    package chap07;

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.sax.SAXResult;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamSource;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;

    public class XSLTSAXTest {
       public static void main(String[] args) throws Exception {
           // args[0] specifies the URI for the XSLT stylesheet
           String xsltURL = args[0];
           // args[1] specifies the URI for the input XML document
           String xmlURL = args[1];

           // Creates a stream source for the stylesheet
           StreamSource xslt = new StreamSource(xsltURL);
           // Creates an input source for the input document
           InputSource xml = new InputSource(xmlURL);

           // Creates a SAX parser
[27]       SAXParserFactory pFactory = SAXParserFactory.newInstance();
[28]       SAXParser parser = pFactory.newSAXParser();
[29]       XMLReader xmlReader = parser.getXMLReader();

           // Creates an instance of TransformerFactory
[32]       TransformerFactory tFactory =
[33]          TransformerFactory.newInstance();

           // Checks whether the TransformerFactory supports SAX
[36]       if (!tFactory.getFeature(SAXSource.FEATURE))
[37]          throw new Exception("SAX is not supported");

           // Casts TransformerFactory to SAXTransformerFactory
[40]       SAXTransformerFactory stFactory =
[41]           ((SAXTransformerFactory)tFactory);
           // Creates a TransformerHandler with the stylesheet
[43]       TransformerHandler tHandler =
[44]           stFactory.newTransformerHandler(xslt);
           // Sets the TransformerHandler to the SAXParser
           xmlReader.setContentHandler(tHandler);
           // Sets the application ContentHandler
           // to the TransformerHandler
[49]       tHandler.setResult(new SAXResult(new BookHandler()));

           // Parses the input XML
[52]       xmlReader.parse(xml);
       }
    }

Let's examine XSLTSAXTest in detail with reference to Figure 7.4.

First, we create a SAXParser by using the JAXP API (lines 27–28) and get an XMLReader by calling SAXParser#getXMLReader() (line 29). Second, we create a TransformerFactory (lines 32–33) and test whether it supports the SAX API (lines 36–37). If it does, we cast the TransformerFactory to the SAXTransformerFactory type (lines 40–41). Third, we create a TransformerHandler associated with a stylesheet (lines 43–44). This corresponds to step (1) in Figure 7.4. Fourth, we register the TransformerHandler with the XMLReader (2). Fifth, we register BookHandler, which is a ContentHandler prepared by the application, with the TransformerHandler by wrapping it in a SAXResult (line 49) (3). Finally, we call XMLReader#parse() to run the pipeline (line 52); that is, the input XML document is translated into SAX events and passed to TransformerHandler (5), TransformerHandler performs the conversion (6), and the converted SAX events are passed to BookHandler (7).
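The same pipeline can also be written with an XMLFilter obtained from SAXTransformerFactory#newXMLFilter(), which is closer to the SAX filter style of Chapter 5. The following is a minimal sketch, not a replacement for XSLTSAXTest: the class name XsltFilterSketch, the two-template stylesheet, and the in-memory input are assumptions made for illustration, and the handler only records the element names it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XsltFilterSketch {
    // Hypothetical stylesheet: renames spec to Book and date to PublishedDate
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match=\"*[local-name()='spec']\">"
      + "<Book><xsl:apply-templates/></Book></xsl:template>"
      + "<xsl:template match=\"*[local-name()='date']\">"
      + "<PublishedDate><xsl:apply-templates/></PublishedDate></xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> elementNamesAfterTransform(String xml) throws Exception {
        // The parser must be namespace aware to feed an XSLT processor
        SAXParserFactory pFactory = SAXParserFactory.newInstance();
        pFactory.setNamespaceAware(true);
        XMLReader reader = pFactory.newSAXParser().getXMLReader();

        TransformerFactory tFactory = TransformerFactory.newInstance();
        if (!tFactory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("XMLFilter is not supported");
        }
        // The filter applies the stylesheet to the events of its parent
        XMLFilter filter = ((SAXTransformerFactory) tFactory)
            .newXMLFilter(new StreamSource(new StringReader(XSLT)));
        filter.setParent(reader);

        // The application handler sees only the translated events
        List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNamesAfterTransform(
            "<spec><date>16 November 1999</date></spec>"));
    }
}
```

The filter plays the role of TransformerHandler in the pipeline: the parser sits on its input side, and the application's ContentHandler sits on its output side.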

Let's run XSLTSAXTest with sample.xml as an input XML document (see Listing 7.2) and sample-4.xsl as an XSLT stylesheet (see Listing 7.15). The stylesheet sample-4.xsl matches each element in sample.xml and translates it to the corresponding element in Books.xml. It outputs every attribute and content string as is. The final result is the same as Books.xml (see Listing 7.13).

Listing 7.15 A stylesheet that translates sample.xml into Books.xml, chap07/data/sample-4.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.example.com/xmlbook2/chap07/">
  <xsl:output method="xml" encoding="UTF-8"/>

  <xsl:template match="*[local-name()='W3Cspecs']">
    <Books><xsl:apply-templates/></Books>
  </xsl:template>

  <xsl:template match="*[local-name()='spec']">
    <Book><xsl:apply-templates/></Book>
  </xsl:template>

  <xsl:template match="*[local-name()='date']">
    <PublishedDate><xsl:apply-templates/></PublishedDate>
  </xsl:template>

  <xsl:template match="*[local-name()='editor']">
    <Author><xsl:apply-templates/></Author>
  </xsl:template>

  <xsl:template match="*[local-name()='name']">
    <AuthorName><xsl:apply-templates/></AuthorName>
  </xsl:template>

  <xsl:template match="*[local-name()='email']">
    <ContactTo><xsl:apply-templates/></ContactTo>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:value-of select="name()"/>=<xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
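To inspect the XML that such an element-renaming stylesheet produces before connecting it to a handler, the transformation can be run to a string. The following is a minimal sketch under stated assumptions: the class name RenameElementsSketch is hypothetical, the stylesheet is a cut-down, two-template version of the listing above without the output namespace, and the input is a small fragment rather than sample.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RenameElementsSketch {
    // Cut-down stylesheet: matching on local-name() ignores the
    // namespace of the input elements
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match=\"*[local-name()='spec']\">"
      + "<Book><xsl:apply-templates/></Book></xsl:template>"
      + "<xsl:template match=\"*[local-name()='date']\">"
      + "<PublishedDate><xsl:apply-templates/></PublishedDate></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The namespace URI here is made up for the example
        String xml = "<spec xmlns='urn:example:specs'>"
                   + "<date>16 November 1999</date></spec>";
        System.out.println(transform(XSLT, xml));
    }
}
```

Text nodes need no template of their own; the built-in XSLT rule for text copies them to the output, which is why the date string passes through unchanged.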

Running XSLTSAXTest in the console generates the output shown in Listing 7.16. The output is indented here for readability; the actual output is a single line.

R:\samples\>java chap07.XSLTSAXTest file:./chap07/data/sample-4.xsl file:./chap07/data/sample.xml
Listing 7.16 Output of XSLTSAXTest
[Book(
  publishedDate=16 November 1999,
  authors=[
    Author(
      authorName=James Clark,
      contactTo=[email protected]),
    Author(
      authorName=Steve DeRose,
      contactTo=[email protected])
  ]),
Book(
  publishedDate=16 November 1999,
  authors=[
    Author(
      authorName=James Clark,
      contactTo=[email protected])
  ])
]

You will find that the contents of sample.xml are stored in the Book objects correctly.

A Simple Way to Translate SAX Events

Now you can translate SAX events to other SAX events and pass them to another application's event handler. However, the drawback is that the programming style of XSLTSAXTest is far from the standard style of using JAXP.

For example, the standard way to use the JAXP API is to call SAXParser#parse(). XSLTSAXTest, on the other hand, calls the parse() method of an XMLReader object obtained from SAXParser#getXMLReader(). In addition, the application itself must create the TransformerHandler and connect it to both the parser and its own ContentHandler, which makes the use of the JAXP API tricky. An application written in the standard JAXP programming style therefore requires significant changes to adopt this approach, so we do not recommend it.

We want to make JAXP invocation as transparent as possible; that is, we want to minimize the modifications to an application. For that purpose, as shown in Figure 7.5, we provide wrapper classes (TransformingSAXParserFactory, TransformingSAXParser, and XMLReaderWrapper) for the corresponding JAXP classes (SAXParserFactory, SAXParser, and XMLReader).

Figure 7.5 A mechanism for transforming SAXParser

graphics/07fig05.gif

Each wrapping class implements the same interface that the corresponding wrapped class does. Therefore, an application can call JAXP transparently through these wrapper classes. The application program is shown in Listing 7.17.

Listing 7.17 Calling JAXP transparently, chap07/TransformingSAXParserTest.java
package chap07;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class TransformingSAXParserTest {
   public static void main(String[] args) throws Exception {
       // args[0] specifies the path to the input XSL file
       String xslURL = args[0];
       // args[1] specifies the path to the input XML file
       String xmlURL = args[1];

       // Creates a stream source for the XSL stylesheet
       StreamSource xsl = new StreamSource(xslURL);
       // Creates an input source for XML document
       InputSource xml = new InputSource(xmlURL);

       SAXParserFactory factory =
          TransformingSAXParserFactory.newInstance();
       SAXParser parser = factory.newSAXParser();
       parser.setProperty(TransformingSAXParser.PROPERTY_URI, xsl);
       parser.parse(xml, new BookHandler());
    }
}

Refer to XSLTSAXTest to run the program. You only need to change the class name.

This wrapper approach reduces the necessary changes to an application. In fact, only two lines must be changed from the original: the first change, when creating an instance of the SAXParserFactory class, is to call TransformingSAXParserFactory.newInstance() explicitly instead of SAXParserFactory.newInstance(); the second change is to specify a stylesheet by calling TransformingSAXParser#setProperty(String, Object) with a property name (http://www.example.com/xmlbook2/chap07/) and its value (a StreamSource object for the stylesheet).
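To give an idea of what such a wrapper has to do internally, the following is a minimal sketch of a SAXParser subclass that hides the TransformerHandler wiring behind the usual parse() call. It is an assumption-laden illustration, not the implementation of TransformingSAXParser: the class name WrappedParserSketch, the demo() method, and the inline stylesheet are hypothetical, and only the parse(InputSource, DefaultHandler) overload is adapted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical wrapper that delegates to a real SAXParser
class WrappedParserSketch extends SAXParser {
    // Property name from the text; its value is a Source for the stylesheet
    static final String PROPERTY_URI = "http://www.example.com/xmlbook2/chap07/";

    private final SAXParser inner;
    private Source stylesheet;

    WrappedParserSketch(SAXParser inner) { this.inner = inner; }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (PROPERTY_URI.equals(name)) stylesheet = (Source) value;
        else inner.setProperty(name, value);
    }

    @Override
    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        return PROPERTY_URI.equals(name) ? stylesheet : inner.getProperty(name);
    }

    // Inserts a TransformerHandler between the parser and the handler
    @Override
    public void parse(InputSource input, DefaultHandler handler)
            throws SAXException, IOException {
        if (stylesheet == null) { inner.parse(input, handler); return; }
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE))
                throw new SAXException("SAX is not supported");
            TransformerHandler th =
                ((SAXTransformerFactory) tf).newTransformerHandler(stylesheet);
            th.setResult(new SAXResult(handler));
            XMLReader reader = inner.getXMLReader();
            reader.setContentHandler(th);
            reader.parse(input);
        } catch (TransformerConfigurationException e) {
            throw new SAXException(e);
        }
    }

    @Override @SuppressWarnings("deprecation")
    public org.xml.sax.Parser getParser() throws SAXException { return inner.getParser(); }
    @Override public XMLReader getXMLReader() throws SAXException { return inner.getXMLReader(); }
    @Override public boolean isNamespaceAware() { return inner.isNamespaceAware(); }
    @Override public boolean isValidating() { return inner.isValidating(); }

    // Records the element names that reach the application handler
    static List<String> demo() throws Exception {
        String xslt = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match=\"*[local-name()='spec']\">"
            + "<Book><xsl:apply-templates/></Book></xsl:template>"
            + "</xsl:stylesheet>";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = new WrappedParserSketch(factory.newSAXParser());
        parser.setProperty(PROPERTY_URI, new StreamSource(new StringReader(xslt)));
        List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader("<spec/>")),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    names.add(qName);
                }
            });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

From the caller's side, the code in demo() has the same shape as TransformingSAXParserTest: create a parser, set the stylesheet property, and call parse() with the application's handler.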

In this section, we explained XSLT programming through examples. We hope this helps you understand the powerful capability of XSLT. However, certain classes of applications cannot take advantage of XSLT. In some cases, it is not easy to determine whether it is better to use DOM, SAX, XPath, or XSLT for a particular application. We discuss this issue in Section 7.3.
