XML Schema

XML Schema is designed to enable object-oriented descriptions of XML. The rich type system introduced in XML Schema was specifically designed to allow the encoding of structured data in XML. In XForms, XML Schema's ability to create object-oriented descriptions of XML data is used to advantage in modeling the data to be collected by the application.

User input can be checked against these declarative constraints using XML processors. The rest of this section gives a brief tutorial on the features of XML Schema that prove useful in designing data models for XForms applications. The interested reader is referred to the wealth of XML Schema resources for additional details.

With the maturing of XML on the Web, the use of XML for structured data interchange is becoming increasingly popular. Data repositories, such as relational databases, also find XML representations of structured data a convenient means of exchanging data among different systems. These uses of XML for encapsulating and interchanging structured data create the need for static type checking of XML data. Type information can be captured using XML Schema, and such type constraints can be automatically checked using off-the-shelf XML processors such as Xerces.[9] Such structured data can be bound to specific implementation languages such as Java using data binding. Data binding is a convenient means of interchanging data among systems distributed across the network and of automatically marshaling such data between the XML interchange representation and the run-time representation used by a given environment.

[9] http://xml.apache.org/xerces

We illustrate the XML Schema declaration for USAddress in Figure and compare it to an equivalent Interface Definition Language (IDL) declaration of the same type in Figure. Notice that the IDL representation is biased toward implementation languages, whereas the XML representation is biased toward declaratively capturing the required information in an implementation-independent manner. These representations should not be viewed as competing approaches; rather, each reflects different design points in the overall spectrum of possible solutions.

6 IDL declaration of type USAddress
  interface USAddress {
    String name; String street; String city;
    // Enumeration of two-letter codes
    USState state;
    Integer zip; Float gpsLatitude; Float gpsLongitude;
  };
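
As a rough illustration of the data binding idea, the following Python sketch unmarshals an XML address into a typed run-time object. The class layout, the sample document, and the helper unmarshal_address are assumptions made for this illustration; they do not come from any particular binding toolkit.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class USAddress:
    # Fields mirror the IDL declaration; state is kept as a plain string here.
    name: str
    street: str
    city: str
    state: str
    zip: int
    gps_latitude: float
    gps_longitude: float


def unmarshal_address(xml_text: str) -> USAddress:
    """Convert the XML interchange form into a run-time object (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return USAddress(
        name=root.findtext("name"),
        street=root.findtext("street"),
        city=root.findtext("city"),
        state=root.findtext("state"),
        zip=int(root.findtext("zip")),
        gps_latitude=float(root.get("gpsLatitude")),
        gps_longitude=float(root.get("gpsLongitude")),
    )


# Sample document invented for this sketch.
SAMPLE = """
<address gpsLatitude="37.33" gpsLongitude="-121.89">
  <name>Jane Doe</name>
  <street>1 Main Street</street>
  <city>San Jose</city>
  <state>CA</state>
  <zip>95110</zip>
</address>
"""

address = unmarshal_address(SAMPLE)
```

The conversions to int and float are where a type mismatch in the XML would surface at run time; a schema lets the same mismatch be caught earlier by an XML processor.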

1 Schema Built-in Types

The following built-in types from XML Schema are especially relevant for modeling structured data to be collected from the user. Note that XML Schema also has a few built-in types that are more relevant to defining document grammars (for example, token); these are not discussed in detail in this book. The complete list of built-in schema data types is described in XML Schema Part 2.[10]

[10] http://www.w3.org/TR/xmlschema-2/

These built-in types help constrain the lexical values of leaf nodes in an XML structure. These constraints can thus be applied to the text contents of an element or the value of an attribute. This set of basic types can be extended as described in the next section. XML Schema defines several additional data types derived from the above set of built-in types. We illustrate the use of the enumerated built-in types with an example in Figure. An XML instance document is shown with the type of each element declared using attribute xsi:type. Later, we extend this example to define a complete schema for invitations in Section 2.5.3.[11]

[11] Note that the types in this initial example are shown using xsi:type for clarity. In a real-world example, these would be provided by the XML Schema definition.

7 Illustrates the use of some of the built-in data types provided by XML Schema.
<invitation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <title>BubbleDog's 5th Birthday</title>
  <age xsi:type="xsd:integer">5</age>
  <born xsi:type="xsd:date">1997-12-21</born>
  <!-- party at a palindromic moment -->
  <party xsi:type="xsd:dateTime">...</party>
  <!-- lasts 1 hour -->
  <duration xsi:type="xsd:duration">PT1H</duration>
  <!-- Recurs annually on December 21 -->
  <annual xsi:type="xsd:gMonthDay">--12-21</annual>
  <!-- if celebrated monthly -->
  <monthly xsi:type="xsd:gDay">---21</monthly>
  <location  xsi:type="USAddress">...</location>
  <replyTo  xsi:type="xsd:anyURI">...</replyTo>
  <picture xsi:type="xsd:anyURI">...</picture>
</invitation>
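
To make the idea of constraining leaf values concrete, here is a minimal Python sketch that checks a few of the lexical forms used in the invitation. The patterns are deliberate simplifications (no time zones, no range checks on gMonthDay or gDay), and the names CHECKS and is_valid are invented for this sketch; a schema-aware processor performs the full checks. Note that XML Schema Part 2 writes gMonthDay and gDay values with leading hyphens, as in --12-21 and ---21.

```python
import re
from datetime import date

# Simplified duration pattern: optional sign, "P", then date and time parts.
DURATION = re.compile(
    r"-?P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=.)([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?"
)


def _is_date(text: str) -> bool:
    # Check the shape first, then let datetime reject impossible dates.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


# Maps a type name to a predicate over the element's text content.
CHECKS = {
    "xsd:integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "xsd:date": _is_date,
    "xsd:duration": lambda s: DURATION.fullmatch(s) is not None,
    "xsd:gMonthDay": lambda s: re.fullmatch(r"--[0-9]{2}-[0-9]{2}", s) is not None,
    "xsd:gDay": lambda s: re.fullmatch(r"---[0-9]{2}", s) is not None,
}


def is_valid(type_name: str, text: str) -> bool:
    """Return True if text is an acceptable lexical value for type_name."""
    return CHECKS[type_name](text.strip())
```

For example, is_valid("xsd:duration", "PT1H") accepts the one-hour duration from the invitation, while is_valid("xsd:date", "1997-13-21") rejects a date with an impossible month.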

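Commonly Used XML Schema Data Types

Data Type                 Description
float                     32-bit float
double                    64-bit float
duration                  Time period
dateTime                  ISO 8601 date-time
time                      Instant of time
date                      Calendar date
gYearMonth                Calendar month
gYear                     Calendar year
gDay                      Monthly recurring date
gMonthDay                 Annually recurring day
gMonth                    Annually recurring month
hexBinary, base64Binary   Binary data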

2 Extending Built-in Types

New data types can be defined starting from the set of XML Schema built-in types. Such type derivations are carried out by imposing appropriate restrictions on the set of allowable values for a given built-in type. Allowable values in XML Schema are governed by several facets; by restricting these facets, one can define subtypes of the built-in types described thus far. Figure lists facets that can be used in defining subtypes of the built-in types; type derivation by restricting values along one or more facets is called restriction. Note that not all of the facets listed in Figure are available on all built-in types; for details, see the XML Schema specification.

Simple types are defined in XML Schema using element simpleType. We use it here to define two user-defined types, USState and ZIPCode. Type USState is defined by string enumeration in Figure.

Figure Type USState is derived by restricting type xsd:string.
<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on ...-->
  </xsd:restriction>
</xsd:simpleType>

We define type ZIPCode by restricting xsd:string in Figure; the set of allowable values is specified via facet pattern. XML Schema also allows the definition of list and union types using element simpleType. Complete details of the use of element simpleType are beyond the scope of this book, and the interested reader is referred to the references on XML Schema.

Figure Using facet pattern to define type ZIPCode that can hold five-digit U.S. ZIP codes.
<xsd:simpleType name="ZIPCode">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{5}"/>
  </xsd:restriction>
</xsd:simpleType>
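
The effect of the enumeration and pattern facets can be sketched as two small predicates in Python. The function names are assumptions for illustration, and the state set is truncated in the same way as the USState figure. XML Schema patterns must match the entire value, which is why the sketch uses re.fullmatch rather than a search.

```python
import re

# Truncated on purpose, like the enumeration in the USState definition.
US_STATES = {"AK", "AL", "AR"}


def is_us_state(value: str) -> bool:
    """Enumeration facet: the value must be one of the listed strings."""
    return value in US_STATES


def is_zip_code(value: str) -> bool:
    """Pattern facet: the whole value must match the regular expression."""
    return re.fullmatch(r"\d{5}", value) is not None
```

With these definitions, is_zip_code("95110") holds, whereas "9511" and "95110-1234" are rejected because the pattern has to cover the value from start to end.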

Facets Restrict Values of Built-in XML Schema Data Types

Facet           Description
pattern         Regular expression
enumeration     Enumerate values
minLength       Minimum length
maxLength       Maximum length
minExclusive    Lower bound
minInclusive    Minimum allowed value
maxExclusive    Upper bound
maxInclusive    Maximum allowed value

3 Defining Aggregations Using Complex Types

Higher level data aggregations are encoded in XML using elements and attributes. The previous section described simple types as defined by XML Schema. XML structures that are the result of attaching attributes or element children are called complex types in XML Schema.

Constructs for creating complex types are defined in XML Schema Part 1,[12] and XML Schema Primer[13] gives a good tutorial introduction to this topic. This section gives a high-level overview of how these constructs can be used to define data aggregations.

[12] http://www.w3.org/TR/xmlschema-1/

[13] http://www.w3.org/TR/xmlschema-0/

In XML Schema, complex types allow elements in their content and may carry attributes; simple types cannot have element content and cannot carry attributes. XML Schema definitions create new types, and XML Schema declarations enable elements and attributes with specific names and types to appear in XML instances. In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.

We illustrate these concepts by first defining complex type USAddress and then using this to define a more complete schema for the party invitation introduced in Figure. The schema in Figure defines a new type called USAddress. It declares that data conforming to type USAddress must have five element children and may carry two attributes. It further constrains the values of these elements and attributes using XML Schema built-in types.

Figure Type Definition for complex type USAddress.
<x:complexType name="USAddress">
  <x:sequence>
    <x:element name="name"   type="x:string"/>
    <x:element name="street" type="x:string"/>
    <x:element name="city"   type="x:string"/>
    <x:element name="state"  type="x:string"/>
    <x:element name="zip"    type="x:integer"/>
  </x:sequence>
  <x:attribute name="gpsLatitude" type="x:decimal"/>
  <x:attribute name="gpsLongitude" type="x:decimal"/>
</x:complexType>
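
What this definition demands of an instance can be approximated in a few lines of Python. This is only a sketch of the constraints, not a substitute for a schema processor: the function name, the simplified integer and decimal patterns, and the sample element are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Child element names in the order the definition declares them.
ADDRESS_CHILDREN = ["name", "street", "city", "state", "zip"]
# Attributes the definition allows; neither is required in this sketch.
ADDRESS_ATTRIBUTES = {"gpsLatitude", "gpsLongitude"}

INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def conforms_to_us_address(element: ET.Element) -> bool:
    """Return True if element has the shape declared for USAddress."""
    # Each child must appear once, in declaration order.
    if [child.tag for child in element] != ADDRESS_CHILDREN:
        return False
    # zip is declared as an integer.
    if INTEGER.fullmatch((element.findtext("zip") or "").strip()) is None:
        return False
    # No undeclared attributes, and each one present must be a decimal.
    if not set(element.attrib) <= ADDRESS_ATTRIBUTES:
        return False
    return all(DECIMAL.fullmatch(v.strip()) for v in element.attrib.values())


# Sample instance invented for this sketch.
location = ET.fromstring(
    '<location gpsLatitude="37.33" gpsLongitude="-121.89">'
    "<name>Jane Doe</name><street>1 Main Street</street>"
    "<city>San Jose</city><state>CA</state><zip>95110</zip>"
    "</location>"
)
```

The check does not depend on the name of the outer element, which reflects the point that a type can be associated with more than one element name.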

New complex types are defined using element complexType, and such definitions contain a set of element declarations, element references, and attribute declarations. The declarations are not themselves types, but rather an association between a name and the constraints that govern the appearance of that name in conforming XML instances. Thus, these are similar to statements in programming languages used to declare identifiers of a given type.

Elements are declared using element element; attributes are declared using element attribute. For example, we define InvitationType as a complex type, and within that definition, we see the element declarations shown in Figure.

Figure Definition of type InvitationType.
<s:schema xmlns:s="http://www.w3.org/2001/XMLSchema">
  <!-- insert USAddress definition here -->
  <s:complexType name="InvitationType">
    <s:sequence>
      <s:element name="title" type="s:string"/>
      <s:element name="age" type="s:integer"/>
      <s:element name="born" type="s:date"/>
      <s:element name="party" type="s:dateTime"/>
      <s:element name="duration" type="s:duration"/>
      <s:element name="annual" type="s:gMonthDay"/>
      <s:element name="monthly" type="s:gDay"/>
      <s:element name="location" type="USAddress"/>
      <s:element name="replyTo" type="USAddress"/>
      <s:element name="picture" type="s:anyURI"/>
    </s:sequence>
  </s:complexType>
</s:schema>

The consequence of the definition shown in Figure is that any element whose type is declared to be InvitationType must consist of the requisite number of elements and attributes. These elements must be named as specified by the values of the name attributes appearing in the definition, and each element must appear in the same order as declared. The USAddress definition contains only declarations involving the simple types string, integer, and decimal. More advanced type definitions, like the one for InvitationType shown in Figure, can use complex types defined earlier by using the same mechanism shown here.

In defining InvitationType, two of the element declarations, replyTo and location, associate different element names with the same complex type USAddress. The consequence of this definition is that any element appearing in an instance document whose type is declared to be InvitationType must consist of elements named replyTo and location, each containing the five subelements (name, street, city, state, and zip) that were declared as part of type USAddress. These elements may also carry the GPS attributes that were declared as part of USAddress.

Finally, notice that the declaration of child elements is enclosed in element sequence. Attributes minOccurs and maxOccurs on element sequence may be used to specify cardinality constraints on the number of child elements. If omitted, these default to 1 as in the examples shown in Figure.
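
The cardinality rule can be sketched as a small Python check. This sketch applies the bounds to a single child element name, which corresponds to placing minOccurs and maxOccurs on an element declaration, something XML Schema also permits. The function name, the use of None to stand for the schema value "unbounded", and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def occurs_within_bounds(parent: ET.Element, name: str,
                         min_occurs: int = 1,
                         max_occurs: Optional[int] = 1) -> bool:
    """Check how often child `name` appears directly under `parent`.

    Both bounds default to 1, as in XML Schema; max_occurs=None stands in
    for the schema value "unbounded".
    """
    count = len(parent.findall(name))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# Sample instance invented for this sketch.
invitation = ET.fromstring(
    "<invitation><title>Party</title>"
    "<picture>a.png</picture><picture>b.png</picture></invitation>"
)
```

Under the default bounds the two picture children are one too many, while passing max_occurs=None accepts them; passing min_occurs=0 makes an absent child such as age acceptable.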
