Recipe 11.7. Validating an XML Document
Credit: Mauro Cicio
Problem
You want to check whether an XML document conforms to a certain schema or DTD.
Solution
Unfortunately, as of this writing there are no stable, pure Ruby libraries that do XML validation. You'll need to install a Ruby binding to a C library. The easiest one to use is the Ruby binding to the GNOME libxml2 toolkit. (There are actually two Ruby bindings to libxml2, so don't get confused: we're referring to the one you get when you install the libxml-ruby gem.)
To validate a document against a DTD, create a Dtd object and pass it into Document#validate. To validate against an XML Schema, pass in a Schema object instead.
Consider the following DTD, for a cookbook like this one:
require 'rubygems'
require 'libxml'
dtd = XML::Dtd.new(%{<!ELEMENT rubycookbook (recipe+)>
<!ELEMENT recipe (title?, problem, solution, discussion, seealso?)+>
<!ELEMENT title (#PCDATA)>
<!ELEMENT problem (#PCDATA)>
<!ELEMENT solution (#PCDATA)>
<!ELEMENT discussion (#PCDATA)>
<!ELEMENT seealso (#PCDATA)>})
Here's an XML document that looks like it conforms to the DTD:
open('cookbook.xml', 'w') do |f|
f.write %{<?xml version="1.0"?>
<rubycookbook>
<recipe>
<title>A recipe</title>
<problem>A difficult/common problem</problem>
<solution>A smart solution</solution>
<discussion>A deep solution</discussion>
<seealso>Pointers</seealso>
</recipe>
</rubycookbook>
}
end
But does it really? We can tell for sure with Document#validate:
document = XML::Document.file('cookbook.xml')
document.validate(dtd) # => true
Here's a Schema definition for the same document. We can validate the document against the schema by making it into a Schema object and passing that into Document#validate:
schema = XML::Schema.from_string %{<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="recipe" type="recipeType"/>
<xsd:element name="rubycookbook" type="rubycookbookType"/>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="problem" type="xsd:string"/>
<xsd:element name="solution" type="xsd:string"/>
<xsd:element name="discussion" type="xsd:string"/>
<xsd:element name="seealso" type="xsd:string"/>
<xsd:complexType name="rubycookbookType">
<xsd:sequence>
<xsd:element ref="recipe"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="recipeType">
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:element ref="problem"/>
<xsd:element ref="solution"/>
<xsd:element ref="discussion"/>
<xsd:element ref="seealso"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
}
document.validate(schema) # => true
Discussion
Programs that use XML validation are more robust and less complicated than nonvalidating versions. Before starting work on a document, you can check whether it's in the format you expect. Most services that accept XML as input don't have forgiving parsers, so validate your document before submitting it; otherwise the submission might fail without your even noticing.
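The "check first, then work" pattern described above can be sketched as a small guard method. This is only an illustration: with_valid_document is a hypothetical helper, not part of any library, and the validator here is a stand-in lambda so the sketch runs without the C binding. With the binding from this recipe, the validator would presumably wrap a call like document.validate(dtd).

```ruby
# Hypothetical guard (not part of any library): yields the document to the
# block only if the validator approves it, and raises otherwise.
def with_valid_document(document, validator)
  unless validator.call(document)
    raise ArgumentError, 'document failed validation'
  end
  yield document
end

# Stand-in validator, an assumption for illustration only: it merely looks
# for the expected root element in a string. Real validation would go
# through a Dtd or Schema object as shown in the Solution.
stand_in = lambda { |doc| doc.include?('<rubycookbook>') }

# The block runs because the stand-in validator accepts this document.
result = with_valid_document('<rubycookbook></rubycookbook>', stand_in) do |doc|
  doc.length
end

# A rejected document never reaches the block.
begin
  with_valid_document('<shoppinglist/>', stand_in) { |doc| doc.length }
rescue ArgumentError => e
  rejection = e.message
end
```

Keeping the check in one place means the processing code never has to handle a malformed document itself.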
One of the most popular and complete XML libraries around is the GNOME Libxml2 library. Despite its name, it works fine outside the GNOME platform, and has been ported to many different operating systems. The Ruby project libxml (http://libxml.rubyforge.org) is a Ruby wrapper around the GNOME Libxml2 library. The project is not yet mature, but it's very active and the validation features are definitely usable. Not only does libxml support validation and a complete range of XML manipulation techniques, it can also improve your program's speed by an order of magnitude, since it's implemented in C, whereas REXML is pure Ruby.
Don't confuse the libxml project with the libxml library. The latter is part of the XML::Tools project. It binds against the GNOME Libxml2 library, but it doesn't expose that library's validation features. If you try the example code above but can't find the XML::Dtd or the XML::Schema classes, then you've got the wrong binding. If you installed the libxml-ruby package on Debian GNU/Linux, you've got the wrong one. You need the one you get by installing the libxml-ruby gem. Of course, you'll need to have the actual GNOME Libxml2 library installed as well.
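A quick way to find out which binding you have is to test for the two class names before relying on them. The following is a minimal sketch: validating_binding? is a hypothetical helper name, and it assumes the classes appear at the top level as XML::Dtd and XML::Schema, as they do in this recipe. Other releases of the gem may place them under a different namespace, in which case the check reports them as missing.

```ruby
# Try to load the binding; a missing gem should not crash the check.
begin
  require 'rubygems'
  require 'libxml'
rescue LoadError
  # No binding is installed at all.
end

# Hypothetical helper (not part of any library): true only if both
# validation classes used in this recipe are visible at the top level.
def validating_binding?
  !!(defined?(XML::Dtd) && defined?(XML::Schema))
end

if validating_binding?
  puts 'Found XML::Dtd and XML::Schema'
else
  puts 'XML::Dtd or XML::Schema is missing: wrong binding, or none installed'
end
```

Because defined? returns nil for an unknown constant instead of raising, the check is safe to run even when no XML module has been loaded.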
See Also