What Is XML?

XML is nothing more than a data format that is both human readable and machine readable. Have you ever tried to open a Microsoft Word document with Notepad? Good luck (see Figure). Although you can usually sift out the main text of the document, most of what you see is gobbledygook. That's because it is in a proprietary binary format. It's proprietary because, frankly, you shouldn't be poking your fingers in there. That's what Microsoft Word is for. And it's binary because you can store a lot of information conveniently in a little bit of disk space. With such a file, I can store my data any way I choose. In fact, I can write my data out willy-nilly, and not have to get permission from anyone, because it's mine, mine, all mine.

This chapter in Notepad

Binary files are great for storing any kind of data: numbers, strings, base-64 encoded images, streams of networking data chatter, anything. The problem is that unless you know the exact structure that you used to write it out, there is little chance of ever getting the data back. This is good if your goal is secrecy, but if you ever need to share that data with another person or program, or worse yet, debug the output from your errant program, you're in for a tough time. If one little byte gets messed up, the whole file might be useless.

There are, of course, other ways to store your data. For files that store records of data, tab-delimited and CSV (comma-separated values) files provide a convenient transfer medium, in a more human-friendly format. For instance, consider this data from Microsoft's sample "Northwind Traders" database, stored as comma-separated values.

"3","Aniseed Syrup","9874","Condiments","On Sale","Yes"

Now that's better. This data is pretty easy to understand. Each piece of data is grouped by commas, and the first row indicates what each column contains. And the best part is, many programs already know how to read files in this format. If you save this data in a text file with a ".csv" extension, and open it in Microsoft Excel, the data automatically appears in columns as expected.
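To make the "many programs already know how to read this" point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the chapter's own tooling). It feeds the sample row shown above to the standard `csv` module. Notice that every value comes back as a plain string, identified only by its position in the row.

```python
import csv
import io

# The sample row from the text, exactly as shown above.
line = '"3","Aniseed Syrup","9874","Condiments","On Sale","Yes"\n'

# csv.reader accepts any iterable of lines; StringIO stands in for a file.
rows = list(csv.reader(io.StringIO(line)))
fields = rows[0]

# Each value is just a string with a position, nothing more.
for position, value in enumerate(fields):
    print(position, repr(value))
```

The parser strips the quotes and splits on the commas, but it has no idea what any column means. That is left entirely to the program (and the person) reading the file.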

But it could be better. For instance, what do those "652" and "9874" values refer to anyway? And is it correct that the unit price of Aniseed Syrup is "On Sale?" Sure, I can load this data into my program, but can I do anything with it? At least it's an easy read for both people and computer programs, and isn't that what I said XML was all about?

Well, yes. Although XML includes rules and features that make it more flexible than your average text data file, it's not that different. For all the hype, XML is just a way of storing data. Any of the fancy-schmancy XML traits discussed in this chapter could be performed easily with data stored in simpler text or binary proprietary formats. In fact, it is often quicker and more convenient to develop using a proprietary format, because your data will contain exactly and only what you need, without any fluff.

That being said, XML does include many aspects that make it a strong contender when considering a data format.

  • It's straightforward to read. Each data element includes a type of title. Good titles make for good reading.

  • It's easy to process. All data includes starting and ending tags, so a program can process the data without much effort. And one bad element won't necessarily ruin the whole file.

  • It's flexible. You can store any type of data in XML. It is just a text file, after all. If you have a certain XML file format used in version 1 of your program, and you add features to it in version 2, you can do it in a way that still allows version 1 programs to use version 2 files without breaking.

  • It's self-describing. XML includes several methods that let you describe the content of a given XML file. Two of the most popular are DTD (Document Type Definition) and XSD (XML Schema Definition). You use these tools to indicate exactly what you expect your data file to contain. Additionally, XML allows you to embed comments in the content without impacting the actual data.

  • It's self-verifying. There are tools available, including tools in .NET, that can confirm the integrity and format of an XML file by comparing the content to the associated DTD or XSD. This lets you verify a file before you even process it.

  • It's an open standard. XML has gained widespread acceptance, even across divergent computer platforms.

  • It's built into .NET. This is going to be the biggest reason for using it. In fact, you won't be able to get away from XML in .NET, even if you try. It's everywhere.
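A few of these points (titled elements, embedded comments, and version 1 readers surviving version 2 files) can be sketched in a few lines. This is only an illustration in Python using its standard `xml.etree.ElementTree` module, not the .NET classes. The element names (`products`, `product`, `name`, `category`, `onSale`) and both documents are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical "version 1" file format: every value carries its own title.
V1_DOC = """<products>
  <product id="3">
    <name>Aniseed Syrup</name>
    <category>Condiments</category>
  </product>
</products>"""

# Hypothetical "version 2" format: adds a comment and a new element.
V2_DOC = """<products>
  <!-- Comments ride along without changing the data. -->
  <product id="3">
    <name>Aniseed Syrup</name>
    <category>Condiments</category>
    <onSale>Yes</onSale>
  </product>
</products>"""

def read_products(xml_text):
    """A 'version 1' reader: it asks for elements by name and
    never looks at anything it does not know about."""
    root = ET.fromstring(xml_text)
    return [
        (p.get("id"), p.findtext("name"), p.findtext("category"))
        for p in root.findall("product")
    ]

v1_result = read_products(V1_DOC)
v2_result = read_products(V2_DOC)
print(v1_result)
print(v2_result)
```

Because the reader looks data up by name rather than by position, the extra `onSale` element and the comment in the second document pass by unnoticed, and both calls return the same product data. Checking a file against a DTD or XSD is a separate step that this sketch does not attempt.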

But there's bad news, too.

  • It's bulky. XML content contains a lot of repetitive structural information, and generally lots of whitespace. You could abbreviate many of the structure elements, and remove all the whitespace (XML doesn't require it), but that would remove the human-readable aspects of the data. Some platforms, such as cell phone browsers, like to keep data small. XML is anything but small.

  • It's text. Wait a minute, this is a good thing, most of the time. Sometimes you just need to store binary data, like pictures. You can't really store true binary data in an XML file without breaking one of the basic rules about XML: text only! Often, binary data is encoded in a text-like format, such as base-64 (which uses readable characters to store binary data).

  • It's inefficient. This comes from having data in a verbose semi-human-readable format, rather than in terse, compact binary form. It simply takes longer for a computer to scan text looking for matching angle brackets than it does to move a few bytes directly from a lump of binary data into a location in memory.

  • It's human readable. There are not many secrets in an XML file. And while you could encrypt the data elements in the file, or the entire file for that matter, that would kind of defeat the purpose of using XML.

  • It's machine readable. If you are expecting the average Joe to pick up an XML printout and read it in his easy chair, think again. XML is not appropriate for every type of data file.

  • It's not immune to errors. As I keep repeating, XML is just a text file. If you open it in Notepad and let your five-year-old pound on the keyboard, the content will have problems. XML is not a panacea; it's just a useful file format.
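Two of these drawbacks are easy to see in code. The sketch below, again in Python purely for illustration, tucks some stand-in binary data into a hypothetical `picture` element using base-64, then shows what a parser does when a file gets damaged. The 256 bytes of "picture" data are invented for the example.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for real binary data such as a picture (256 arbitrary bytes).
raw = bytes(range(256))

# Base-64 turns the bytes into readable characters that XML can hold.
encoded = base64.b64encode(raw).decode("ascii")

picture = ET.Element("picture")  # hypothetical element name
picture.text = encoded
xml_text = ET.tostring(picture, encoding="unicode")

# Reading it back: parse the XML, then decode the text to bytes.
restored = base64.b64decode(ET.fromstring(xml_text).text)

print(len(raw), "bytes of data became", len(xml_text), "characters of XML")

# A damaged file: the closing tag no longer matches the opening tag.
damaged = False
try:
    ET.fromstring("<picture>abc</pictur>")
except ET.ParseError:
    damaged = True
print("parser rejected the damaged file:", damaged)
```

Base-64 uses four characters for every three bytes, so the text form is always larger than the binary it represents, and that is before the tags are added. The second half shows that a parser will at least notice broken structure, although it cannot tell you whether the values inside well-formed tags still make sense.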
