Data Binding Basics

Before getting into the specifics of JAXB, it will be helpful to look at the concepts that underlie data binding in general. Fundamentally, data binding is similar to the document object model APIs we've discussed (DOM, JDOM, and dom4j) in that it defines an association, referred to as a binding, between an XML document and a tree of Java objects. A tree of Java objects can be created from an XML document and vice versa. The difference is that with data binding, the Java objects mapped to the document are instances not of generic interfaces representing elements and attributes (and comments, processing instructions, etc.), but of specific classes that have a meaning beyond the XML document. In part to indicate this difference, with data binding you don't "parse" or "serialize" documents. Instead, you unmarshal XML into Java objects and marshal Java objects into an XML document. The components that sit between objects and XML documents are called unmarshallers and marshallers. This relationship is shown in Figure.

Marshallers and unmarshallers

Let's take a look at what we can do with a fictional data binding framework and the XML document in Figure.

A person XML document

<?xml version="1.0"?>
<person xmlns="">
    <firstName>Lola</firstName>
    <lastName>Arbuckle</lastName>
</person>

Using DOM, outputting the first name looks something like:

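import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = factory.newDocumentBuilder();
Document doc = documentBuilder.parse(new File("lola.xml"));
Element element = doc.getDocumentElement();
NodeList firstNames = element.getElementsByTagName("firstName");
Element firstName = (Element) firstNames.item(0);
System.out.println(firstName.getTextContent());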

With a data binding framework, we can write much simpler code, as in Figure.

Unmarshalling to a Person object

Unmarshaller unmarshaller = DataBindingFactory.newUnmarshaller();
Person person = (Person) unmarshaller.unmarshal(new File("lola.xml"));
System.out.println(person.getFirstName());

This is both shorter and much more obvious about what it is doing. The first line uses a factory class in our fictional framework to obtain a new instance of the Unmarshaller interface for this framework. The second line passes a File object for our document to the unmarshaller, which returns an instance of the Person class. Finally, we call the getFirstName() method on this object and output the result to the console.

We can also do the reverse. Producing the XML document in Figure could be done with the code in Figure.

Marshalling a Person object

Person person = new Person();
person.setFirstName("Lola");
person.setLastName("Arbuckle");
Marshaller marshaller = DataBindingFactory.newMarshaller();
try (FileWriter writer = new FileWriter("lola.xml")) {
    marshaller.marshal(person, writer);
}

The code above should raise a question: how did the unmarshaller in Figure know to create a Person object? And how did the marshaller in Figure know to create that specific XML structure and not, for example:

<person xmlns="" firstName="Lola" lastName="Arbuckle"/>

The answer to both questions could be that we need to explicitly tell the unmarshaller and marshaller about our Person class and that we want the properties to result in elements, not attributes:

// for the unmarshaller
unmarshaller.addMapping(Person.class, "", "person");

// for the marshaller
marshaller.addMapping(Person.class, "", "person");
marshaller.setMarshalPropertiesAsElements(Person.class, true);

This is reasonable, and Java's reflection features are good enough that you could write a simple data binding framework using this sort of configuration scheme. However, the more prevalent technique among current data binding frameworks is to put the configuration of the document-to-object mapping in some sort of class-level metadata, as seen in Figure.

Class metadata determines structure
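To make the reflection point concrete, here is a minimal sketch of such a configuration-driven framework. SimpleBinder and its methods are hypothetical, loosely modeled on the addMapping() and setMarshalPropertiesAsElements() calls shown earlier (the namespace argument is dropped). It assumes a Person JavaBean whose properties are all Strings and handles only a single flat element.

```java
import java.io.StringReader;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Plain JavaBean: nothing in it refers to XML.
class Person {
    private String firstName;
    private String lastName;
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

// Hypothetical toy binder configured in code rather than by metadata.
// Supports only String fields on one flat, unqualified element.
class SimpleBinder {
    private final Map<Class<?>, String> classToElement = new HashMap<>();
    private final Map<String, Class<?>> elementToClass = new HashMap<>();
    private final Map<Class<?>, Boolean> asElements = new HashMap<>();

    void addMapping(Class<?> type, String elementName) {
        classToElement.put(type, elementName);
        elementToClass.put(elementName, type);
    }

    void setMarshalPropertiesAsElements(Class<?> type, boolean value) {
        asElements.put(type, value);
    }

    String marshal(Object obj) {
        String name = classToElement.get(obj.getClass());
        if (name == null) throw new IllegalArgumentException("No mapping for " + obj.getClass());
        boolean elements = asElements.getOrDefault(obj.getClass(), false);
        StringBuilder attrs = new StringBuilder();
        StringBuilder children = new StringBuilder();
        for (Field f : propertyFields(obj.getClass())) {
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) continue;
            String text = escape(value.toString());
            if (elements) { // property -> child element
                children.append('<').append(f.getName()).append('>').append(text)
                        .append("</").append(f.getName()).append('>');
            } else {        // property -> attribute
                attrs.append(' ').append(f.getName()).append("=\"").append(text).append('"');
            }
        }
        return children.length() == 0
                ? "<" + name + attrs + "/>"
                : "<" + name + attrs + ">" + children + "</" + name + ">";
    }

    Object unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Class<?> type = elementToClass.get(root.getTagName());
            if (type == null) throw new IllegalArgumentException("No mapping for <" + root.getTagName() + ">");
            Object obj = type.getDeclaredConstructor().newInstance();
            for (Field f : propertyFields(type)) {
                // Accept each property either as a child element or as an attribute.
                NodeList matches = root.getElementsByTagName(f.getName());
                if (matches.getLength() > 0) {
                    f.set(obj, matches.item(0).getTextContent());
                } else if (root.hasAttribute(f.getName())) {
                    f.set(obj, root.getAttribute(f.getName()));
                }
            }
            return obj;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Unmarshalling failed", e);
        }
    }

    // Instance fields only; getDeclaredFields() order is not guaranteed by the spec.
    private static List<Field> propertyFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            fields.add(f);
        }
        return fields;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Flipping setMarshalPropertiesAsElements() switches the same Person between the element form and the attribute form of the document, which is exactly the ambiguity the configuration has to resolve.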

In some frameworks, this metadata is located within the class definition itself, either through static methods and fields or Java annotations. In others, the metadata is contained in an external mapping configuration file (it should come as no surprise that this file is usually XML itself). Some frameworks support multiple configuration methods or combinations of methods.
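As an illustration of metadata carried by Java annotations, the following sketch moves the mapping decisions into the class definition itself. The @BindRoot and @BindAttribute annotations, the AnnotationMarshaller class, and the extra id property are all hypothetical; real frameworks define their own annotations with much richer semantics.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical annotations; RUNTIME retention lets reflection read them.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface BindRoot {
    String namespace() default "";
    String name();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BindAttribute {}

// The mapping metadata lives in the class definition.
@BindRoot(name = "person")
class Person {
    @BindAttribute String id; // hypothetical extra property, marshalled as an attribute
    String firstName;         // unannotated fields become child elements
    String lastName;

    Person(String id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

class AnnotationMarshaller {
    static String marshal(Object obj) {
        BindRoot root = obj.getClass().getAnnotation(BindRoot.class);
        if (root == null) {
            throw new IllegalArgumentException(obj.getClass() + " has no @BindRoot metadata");
        }
        StringBuilder attrs = new StringBuilder(" xmlns=\"").append(root.namespace()).append('"');
        StringBuilder children = new StringBuilder();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) continue;
            String text = escape(value.toString());
            // The annotation, not marshaller configuration, decides attribute vs. element.
            if (f.isAnnotationPresent(BindAttribute.class)) {
                attrs.append(' ').append(f.getName()).append("=\"").append(text).append('"');
            } else {
                children.append('<').append(f.getName()).append('>').append(text)
                        .append("</").append(f.getName()).append('>');
            }
        }
        return "<" + root.name() + attrs + ">" + children + "</" + root.name() + ">";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The marshaller needs no per-class setup calls: everything it needs is discovered from the class at run time.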

Data Binding and Schemas

The above description of data binding left out a critical component: a schema. In many applications, the XML documents produced by a marshaller and consumed by an unmarshaller are expected to conform to a schema, whether that be a DTD, an XML Schema, a RELAX NG schema, or some other schema language. For data binding, schemas are used in two distinct ways. The first is validation: a schema can be used to check the output of marshalling or the input to unmarshalling, as shown in Figure.

Documents conform to a schema
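For the validation case, the JDK's own javax.xml.validation API is enough to check a document before unmarshalling it or after marshalling it. The schema in this sketch is an assumed W3C XML Schema for the person document; the chapter does not define one, and the PersonValidation class is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PersonValidation {
    // Assumed schema for the person document (no target namespace).
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "  <xs:element name=\"person\">"
        + "    <xs:complexType>"
        + "      <xs:sequence>"
        + "        <xs:element name=\"firstName\" type=\"xs:string\"/>"
        + "        <xs:element name=\"lastName\" type=\"xs:string\"/>"
        + "      </xs:sequence>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    // Compile once; a Schema object is thread-safe and can be shared.
    static final Schema PERSON_SCHEMA = compile(PERSON_XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    // True if the document conforms to the schema.
    static boolean isValid(String xml) {
        try {
            PERSON_SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator reports non-conformance as a SAXException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this schema, the element-based person document validates while the attribute-based variant shown earlier does not, so validation also catches a marshaller configured for the wrong structure.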

The second is that many data binding frameworks can generate Java classes from a schema. This is usually referred to as compiling a schema, and the application that performs it is called a schema compiler. In general, the compiler will have some mechanism for customizing the generated classes.
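To show the idea behind a schema compiler, here is a deliberately tiny, hypothetical one. ToySchemaCompiler assumes the schema's first global element contains only a flat sequence of simple elements, maps every property to String, and offers no customization mechanism at all; a real compiler does considerably more.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical toy schema compiler: XML Schema in, JavaBean source code out.
class ToySchemaCompiler {
    private static final String XS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    static String compile(String xsd) {
        Element schema = parse(xsd);
        // The first xs:element in document order is the global element declaration.
        Element root = (Element) schema.getElementsByTagNameNS(XS, "element").item(0);
        StringBuilder src = new StringBuilder("public class ")
                .append(capitalize(root.getAttribute("name"))).append(" {\n");
        // Each nested xs:element becomes a private field plus a getter/setter pair.
        NodeList props = root.getElementsByTagNameNS(XS, "element");
        for (int i = 0; i < props.getLength(); i++) {
            String p = ((Element) props.item(i)).getAttribute("name");
            String cap = capitalize(p);
            src.append("    private String ").append(p).append(";\n")
               .append("    public String get").append(cap).append("() { return ").append(p).append("; }\n")
               .append("    public void set").append(cap).append("(String ").append(p)
               .append(") { this.").append(p).append(" = ").append(p).append("; }\n");
        }
        return src.append("}\n").toString();
    }

    private static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse schema", e);
        }
    }

    private static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

Fed a person schema, this emits the source of a Person class with getFirstName()/setFirstName() and getLastName()/setLastName(), which is the kind of class the fictional unmarshaller returned.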

If you've used Hibernate or another object-relational mapping framework, you may have used a similar code-generation mechanism to generate Java classes from a database schema.

In addition to generating Java classes from a schema, a few frameworks, including JAXB 2.0, allow for the reverse: a schema definition can be generated from Java classes as seen in Figure.

JAXB 2.0 includes schema generation
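Schema generation runs the other way. This hypothetical sketch walks a class's fields with reflection and emits one xs:element per field. The type mapping covers only String, int, and boolean, and the age property is an assumption added purely to show a non-string type.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Example bean; nothing here is XML-specific.
class Person {
    String firstName;
    String lastName;
    int age; // hypothetical extra property
}

// Hypothetical toy schema generator: the reverse of a schema compiler.
// getDeclaredFields() does not guarantee declaration order, so a real
// generator would need explicit ordering metadata for xs:sequence.
class ToySchemaGenerator {
    static String generate(Class<?> type, String elementName) {
        StringBuilder xsd = new StringBuilder()
            .append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n")
            .append("  <xs:element name=\"").append(elementName).append("\">\n")
            .append("    <xs:complexType>\n")
            .append("      <xs:sequence>\n");
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            xsd.append("        <xs:element name=\"").append(f.getName())
               .append("\" type=\"").append(xsType(f.getType())).append("\"/>\n");
        }
        return xsd.append("      </xs:sequence>\n")
                  .append("    </xs:complexType>\n")
                  .append("  </xs:element>\n")
                  .append("</xs:schema>\n").toString();
    }

    // Maps a few Java types to built-in XML Schema types.
    private static String xsType(Class<?> t) {
        if (t == String.class) return "xs:string";
        if (t == int.class || t == Integer.class) return "xs:int";
        if (t == boolean.class || t == Boolean.class) return "xs:boolean";
        throw new IllegalArgumentException("No XML Schema type for " + t.getName());
    }
}
```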

When to (and When Not to) Use Data Binding

Data binding is designed to allow applications to easily move data between a set of Java classes and an XML representation where both the Java classes and the XML representation are defined in some way. When this is the case, data binding can be very useful and, as we've seen, produce significantly more readable source code. However, if you use data binding when it is not appropriate, you may find yourself spending more time fighting the API than it saves you. Here are some basic guidelines:

If you don't have a schema, don't use data binding

Although there are data binding frameworks that don't need a schema, XML applications that don't have a defined schema generally need more flexibility in the structure of the produced XML than data binding allows.

If you have a schema that uses mixed content, don't use data binding

Or at least pick your framework carefully. Data binding, as its name suggests, is designed for data, not documents. Mixed content, such as XHTML, interleaves text and elements in an order that doesn't map naturally onto the properties of a Java interface or class. Some frameworks support partial binding of a schema such that some portion of the object tree is represented by a Node object from DOM or a similar API, but this is not a universal feature.
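The partial-binding approach can be sketched with DOM alone. In this hypothetical Note class, the data-like title is bound to a String, while the mixed-content body is kept as a DOM Element, because its sequence of text, a b element, and more text has no natural representation as a fixed set of fields. The element names are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical partially bound class.
class Note {
    String title; // simple text content: binds naturally to a String
    Element body; // mixed content: left as a DOM subtree

    static Note unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Note note = new Note();
            note.title = root.getElementsByTagName("title").item(0).getTextContent();
            // No field-by-field mapping here: the whole subtree is kept as-is.
            note.body = (Element) root.getElementsByTagName("body").item(0);
            return note;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse note", e);
        }
    }
}
```

Code that consumes body must fall back to walking DOM nodes, which is the trade-off such frameworks make.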

If you are dealing with large documents, don't use data binding

Like DOM and its ilk, data binding creates an object tree with at least one Java object per node in the XML document. As a result, the size of a document you can unmarshal or marshal is limited by the amount of memory available.
