April 5, 2011, 5:31 p.m.
posted by soulmaker
What Is XML?
XML is nothing more than a data format that is both human readable and machine readable. Have you ever tried to open a Microsoft Word document with Notepad? Good luck (see Figure). Although you can usually sift out the main text of the document, most of what you see is gobbledygook. That's because it is stored in a proprietary binary format. It's proprietary because, frankly, you shouldn't be poking your fingers in there; that's what Microsoft Word is for. And it's binary because you can store a lot of information compactly in very little disk space. With such a file, I can store my data any way I choose. In fact, I can write my data out willy-nilly, and not have to get permission from anyone, because it's mine, mine, all mine.
Figure: This chapter in Notepad
Binary files are great for storing any kind of data: numbers, strings, base-64 encoded images, streams of network chatter, anything. The problem is that unless you know the exact structure that was used to write it out, there is little chance of ever getting the data back. This is good if your goal is secrecy, but if you ever need to share that data with another person or program, or worse yet, debug the output from your errant program, you're in for a tough time. If one little byte gets messed up, the whole file might be useless.
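To make the point concrete, here is a minimal Python sketch using the standard struct module. The record layout (a 4-byte product ID followed by an 8-byte price) and the function names write_record and read_record are hypothetical, chosen only for illustration. The same twelve bytes read back correctly with the writer's layout and turn into meaningless numbers with any other.

```python
import struct

# Hypothetical layout chosen for this sketch: a 4-byte little-endian
# unsigned int (product ID) followed by an 8-byte little-endian float (price).
RECORD_FORMAT = "<Id"


def write_record(product_id: int, price: float) -> bytes:
    """Pack one record into raw bytes using RECORD_FORMAT."""
    return struct.pack(RECORD_FORMAT, product_id, price)


def read_record(data: bytes, fmt: str = RECORD_FORMAT) -> tuple:
    """Unpack raw bytes; only meaningful if fmt matches the writer's layout."""
    return struct.unpack(fmt, data)


blob = write_record(1, 18.00)
print(blob.hex())               # raw bytes: unreadable without the layout
print(read_record(blob))        # (1, 18.0) with the layout the writer used
print(read_record(blob, "<dI")) # same 12 bytes, wrong layout: garbage values
```

Nothing in the bytes themselves records which layout was used; that knowledge lives only in the program that wrote them.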
There are, of course, other ways to store your data. For files that store records of data, tab-delimited and CSV (comma-separated values) files provide a convenient, more human-friendly transfer medium. For instance, consider this data from Microsoft's sample "Northwind Traders" database, stored as comma-separated values.
ProductID,ProductName,SupplierID,Category,UnitPrice,Available
"1","Chai","652","Beverages","$18.00","Yes"
"2","Chang","9874","Beverages","$19.00","No"
"3","Aniseed Syrup","9874","Condiments","On Sale","Yes"
Now that's better. This data is pretty easy to understand. Each value is separated from the next by a comma, and the first row indicates what each column contains. And the best part is, many programs already know how to read files in this format. If you save this data in a text file with a ".csv" extension and open it in Microsoft Excel, the data automatically appears in columns as expected.
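As a small illustration of a program that already knows the format, Python's standard csv module can read the rows above, using the header row for column names. The sample is embedded as a string here for convenience; in practice you would open the .csv file instead.

```python
import csv
import io

# The Northwind sample rows, embedded as a string for this sketch.
CSV_TEXT = """ProductID,ProductName,SupplierID,Category,UnitPrice,Available
"1","Chai","652","Beverages","$18.00","Yes"
"2","Chang","9874","Beverages","$19.00","No"
"3","Aniseed Syrup","9874","Condiments","On Sale","Yes"
"""

# DictReader takes column names from the first row, so each record
# becomes a dict keyed by those names. Quotes around fields are removed.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)

for row in rows:
    print(row["ProductName"], row["UnitPrice"])
```

Note that DictReader hands back every value as a string: it knows which column a value came from, but not what kind of value it is supposed to be.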
But it could be better. For instance, what do those "652" and "9874" values refer to anyway? And is it correct that the unit price of Aniseed Syrup is "On Sale"? Sure, I can load this data into my program, but can I do anything with it? At least it's an easy read for both people and computer programs, and isn't that what I said XML was all about?
Well, yes. Although XML includes rules and features that make it more flexible than your average text data file, it's not that different. For all the hype, XML is just a way of storing data. Any of the fancy-schmancy XML traits discussed in this chapter could be replicated easily with data stored in simpler text formats or in proprietary binary formats. In fact, it is often quicker and more convenient to develop using a proprietary format, because your data will contain exactly and only what you need, without any fluff.
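For comparison, here is a sketch of the Aniseed Syrup row written as XML and read with Python's standard xml.etree.ElementTree module. The element and attribute names are hypothetical, picked for this example; choosing them is up to whoever designs the file.

```python
import xml.etree.ElementTree as ET

# One Northwind row written as XML. The element and attribute names are
# hypothetical, chosen for this sketch; XML itself does not prescribe them.
XML_TEXT = """<Products>
  <Product ProductID="3" Available="Yes">
    <ProductName>Aniseed Syrup</ProductName>
    <SupplierID>9874</SupplierID>
    <Category>Condiments</Category>
    <UnitPrice>On Sale</UnitPrice>
  </Product>
</Products>"""

root = ET.fromstring(XML_TEXT)
for product in root.findall("Product"):
    # Attributes come from get(), child element text from findtext().
    print(product.get("ProductID"),
          product.findtext("ProductName"),
          product.findtext("UnitPrice"))  # prints: 3 Aniseed Syrup On Sale
```

Each value now carries a label, but it is still just text: UnitPrice comes back as the string "On Sale", exactly as it did from the CSV file.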
That being said, XML does include many aspects that make it a strong contender when considering a data format.
But there's bad news, too.