Creating an XML Document






As we mentioned earlier, basic Web pages are coded in HTML so that they can be displayed in a Web browser. HTML is a small, fixed markup language defined with Standard Generalized Markup Language (SGML), a comprehensive system for coding the structure of text documents and other forms of data so that they can be used in a variety of environments. Extensible Markup Language (XML) is a simplified subset of SGML. However, instead of being fixed like HTML, XML can be customized (extended) to store data so that it can be used in many ways in many environments: for example, as text, in a database or spreadsheet, or as a Web page.

Creating sophisticated, multi-purpose XML files can involve highly technical processes that are designed by experienced systems analysts and application developers. However, with Word 2007, anyone can participate in these processes by creating a Word document and then saving it as an XML file. During conversion, Word tags the file based on its styles and other formatting and saves it with an .xml extension.

You can open and edit an XML file in Word, just as you can an HTML file. You can also open it in an XML editor such as XMetal, or as a plain text file in a text editor such as Notepad.

If you want more control over the tagging of a document, you can attach an XML schema to it. The schema is an additional file that describes the structure allowed in the document, including the names of structural elements and which elements can contain which other elements. For example, a book might be divided into parts that can each contain chapters, which in turn can contain topics, which in turn can contain a heading, paragraphs, numbered and bulleted lists, tables, and other elements. The schema might also define formatting attributes that you can apply to text within specified elements. Word uses the schema to validate the document content and prompts you when content has been incorrectly tagged. Generally, companies employ a specialist with in-depth knowledge of XML to create custom schemas, but anyone can use an existing schema to tag a Word document and save it as an XML file.
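To make the idea of containment rules concrete, here is a minimal Python sketch. It is not a real schema (schemas used with Word are typically written in the W3C XML Schema language and checked by a validating parser); it simply models the book example above as a hypothetical map of which child elements each element may contain, and reports any child that breaks those rules. The element names (book, part, chapter, topic, and so on) are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical containment rules for the book example. A real schema can also
# control order, counts, and attributes; this map covers containment only.
ALLOWED_CHILDREN = {
    "book": {"part"},
    "part": {"chapter"},
    "chapter": {"topic"},
    "topic": {"heading", "paragraph", "numberedlist", "bulletedlist", "table"},
}

def containment_errors(element, path=""):
    """Return a message for every child that is not allowed in its parent."""
    errors = []
    here = f"{path}/{element.tag}"
    # Elements missing from the map (headings, paragraphs) allow no children.
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    for child in element:
        if child.tag not in allowed:
            errors.append(f"{child.tag} is not allowed in {here}")
        errors.extend(containment_errors(child, here))
    return errors

doc = ET.fromstring(
    "<book><part><chapter>"
    "<topic><heading>Intro</heading><paragraph>Text</paragraph></topic>"
    "<paragraph>Stray paragraph outside any topic</paragraph>"
    "</chapter></part></book>"
)
for message in containment_errors(doc):
    print(message)
```

Here the only complaint is the paragraph placed directly inside a chapter, because under these assumed rules a chapter may contain only topics.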

In this exercise, you will first save a document in XML format. Then you will attach a schema to a document, tag document elements to create valid structure, and save that file as an XML file.

USE the 04_XML document and the 04_XMLSchema document schema. These practice files are located in the Chapter11 subfolder under SBS_Word2007.

OPEN the 04_XML document.


1.
Click the Microsoft Office Button, and then click Save As.


2.
In the Save As dialog box, type My XML in the File name box, click Word XML Document in the Save as type list, and then click Save.

Nothing appears to change, except that the title bar now displays My XML.

3.
Close the document.

4.
Click the Start button, click Documents, and then in the Documents window, navigate to the MSP\SBS_Word2007\Chapter11 folder.


5.
Right-click the My XML file, point to Open With, and then click Notepad.

The Notepad plain text editor opens, displaying the contents of the XML file.

This "simple" method of creating XML files turns out to be not so simple after all! Hundreds of tags enclosed in less-than (<) and greater-than (>) signs make it possible for this plain text document to be displayed exactly as it appears in Word.
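If you want to see how paragraphs and styles are represented in markup like this, the following Python sketch parses a heavily simplified WordprocessingML fragment (the w: namespace Word uses for document content) and prints each paragraph's style and text. The sample is hand-written for illustration; a file actually saved by Word contains far more markup, so treat this as a reading aid rather than a general-purpose parser.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the "w:" prefix in WordprocessingML.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hand-written, heavily simplified sample (hypothetical content).
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Designing with Color</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t xml:space="preserve">Sample </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>paragraph text.</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

def paragraphs(xml_text):
    """Yield (style, text) for each w:p element; style is None if none is set."""
    root = ET.fromstring(xml_text)
    for p in root.iter(W + "p"):
        # Paragraph properties (w:pPr) hold the style reference (w:pStyle).
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(W + "val") if style_el is not None else None
        # Text lives in w:t elements inside runs (w:r); join the runs.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style, text

for style, text in paragraphs(SAMPLE):
    print(style, "|", text)
```

Notice that the bold formatting in the second paragraph is carried by its own run (w:r) with run properties (w:rPr), which is one reason even a short document produces so many tags.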

6.
Close Notepad, and then in the Chapter11 window, double-click the 04_XML document to reopen it in Word.

7.
Click the Microsoft Office Button, and click Word Options. Then on the Popular page of the Word Options window, under Top options for working with Word, select the Show Developer tab in the Ribbon check box, and click OK.

The Developer tab appears on the Ribbon.

8.
On the Developer tab, in the XML group, click the Schema button.

The Templates And Add-Ins dialog box opens.

9.
On the XML Schema tab of the dialog box, click Add Schema.

10.
In the Add Schema dialog box, navigate to the Documents\MSP\SBS_Word2007\Chapter11 folder, and then double-click 04_XMLSchema.

The Schema Settings dialog box opens.

11.
In the Alias box, type 04_XMLSchema, and then click OK.

Word adds the schema to the list of available schemas and attaches it to the document.

12.
In the Templates and Add-ins dialog box, click XML Options.

The XML Options dialog box opens.

13.
Under Schema validation options, verify that the Validate document against attached schemas check box is selected and the Hide schema violations in this document check box is cleared.

14.
Under XML view options, verify that the Hide namespace alias in XML Structure task pane check box is cleared, and then select the Show advanced XML error messages check box.

15.
Click OK to close the XML Options dialog box, and then close the Templates and Add-ins dialog box.

The XML Structure task pane opens.

16.
In the XML Structure task pane, verify that the Show XML tags in the document check box is selected.

Tip

When you don't need to see XML tags in a document, you can hide them by clearing the Show XML Tags In The Document check box.

17.
Click anywhere in the document window. Then at the bottom of the XML Structure task pane, in the Choose an element to apply to your current selection list, click classlist {04_XMLSchema}.

18.
In the message box asking how you want to apply the selected element, click Apply to Entire Document.

Word selects all the text in the document, adds an opening XML tag and a closing XML tag at either end of the document to indicate that the entire document is now a classlist element, and lists the element in the Elements In The Document box in the XML Structure task pane.

19.
Select all the text from Designing with Color down through Check with Jo about color swatches and kits for students. Then in the Choose an element to apply to your current selection box, click class.

Word tags the selection as a class element. All the information between the two class tags belongs to one particular class.

Tip

By default, the List Only Child Elements Of Current Element check box is selected. This simplifies the list of elements by showing only the ones that are valid in the current location. If you want to see a complete list of elements allowed in this schema, clear this check box. Invalid elements are then flagged with a slash inside a circle (the "not allowed" symbol).

20.
Select the Designing with Color heading, and tag it as title. Then select each of the next six paragraphs one at a time, and tag them in turn as instructor, date, time, description, cost, and classroom.

Tip

It is helpful to have non-printing characters displayed when you are selecting paragraphs for tagging.

As you tag each element, it appears in the Elements In The Document box. An X next to the classlist and class elements indicates that the structure is not valid according to the schema rules, and three dots under the classroom element and at the end of the class element tell you that an element is missing.

21.
Point to the X beside class.

A ScreenTip tells you that untagged text is not allowed in the class element; all text must be enclosed in valid start and end element tags.
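The rule in this ScreenTip, that a class element may contain only tagged child elements and no loose text, can be sketched in Python with the standard xml.etree.ElementTree module. In ElementTree, text that sits directly inside an element is stored in that element's text attribute and in the tail attribute of each child, so checking those values finds untagged content. The sample class below is a trimmed, hypothetical version of the practice file, not its actual markup.

```python
import xml.etree.ElementTree as ET

def untagged_text(element):
    """Return stripped, non-empty text that sits directly inside element
    instead of inside one of its child elements."""
    # element.text precedes the first child; each child's tail follows it.
    pieces = [element.text or ""] + [child.tail or "" for child in element]
    return [piece.strip() for piece in pieces if piece.strip()]

NOTE = "Check with Jo about color swatches and kits for students."

# Trimmed sample: the note paragraph has not been tagged yet.
before = ET.fromstring(f"<class><title>Designing with Color</title>{NOTE}</class>")
print(untagged_text(before))

# After wrapping the note in a notes element, no loose text remains.
after = ET.fromstring(
    f"<class><title>Designing with Color</title><notes>{NOTE}</notes></class>"
)
print(untagged_text(after))
```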

22.
Select the sentence that begins Check with Jo (the only remaining untagged text in the class element). Then in the Choose an element to apply to your current selection list, click notes.

Word tags the element, and the X next to class disappears.

23.
Select all the text from Feng Shui Made Easy down to Andy will need the screen set up for his PowerPoint slides. In the Choose an element to apply to your current selection box, click class.

Word tags the element, and the X next to classlist disappears.

24.
Select each of the paragraphs in this class in turn, and tag them as title, instructor, date, time, description, cost, and notes.

In the Elements In The Document box, a question mark appears next to the second class element, and a wavy purple line appears in the left margin of the document to show you the section with invalid structure.

25.
Point to the question mark.

Word tells you that according to the rules laid out by the schema, the class element is incomplete.
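Word's check for an incomplete class element compares the children that are present against the sequence the schema requires. The 04_XMLSchema file itself isn't reproduced in this chapter, so the following Python sketch assumes that a class must contain title, instructor, date, time, description, cost, classroom, and notes, in that order; the real schema's rules may differ (for example, some elements might be optional).

```python
import xml.etree.ElementTree as ET

# ASSUMED content model for class (04_XMLSchema itself is not shown here).
CLASS_SEQUENCE = ["title", "instructor", "date", "time",
                  "description", "cost", "classroom", "notes"]

def missing_elements(cls):
    """Return the required child names that a class element lacks.
    Raises ValueError if the children that are present are out of order,
    repeated, or not part of the sequence."""
    tags = [child.tag for child in cls]
    in_order = [name for name in CLASS_SEQUENCE if name in tags]
    if tags != in_order:
        raise ValueError(f"unexpected child order: {tags}")
    return [name for name in CLASS_SEQUENCE if name not in tags]

def make_class(*names):
    """Build a class element with empty children of the given names."""
    cls = ET.Element("class")
    for name in names:
        ET.SubElement(cls, name)
    return cls

# The Feng Shui Made Easy class before step 26: classroom is missing.
feng_shui = make_class("title", "instructor", "date", "time",
                       "description", "cost", "notes")
print(missing_elements(feng_shui))

# Step 26: add classroom right after cost (index 6, just before notes).
feng_shui.insert(6, ET.Element("classroom"))
print(missing_elements(feng_shui))
```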

26.
In the Feng Shui Made Easy class in the document, click to the right of the cost end tag, press the Enter key, type Room 2, select the text, and tag it as classroom.

The document's structure is now fully valid, and you're ready to save the document as an XML file.
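As a rough picture of the structure you have just built, the following Python sketch assembles the same classlist, class, and field elements with ElementTree and prints them as indented XML. It shows only the custom elements; the file Word saves also contains Word's own formatting markup. Apart from the Feng Shui Made Easy title, Room 2, and the note about Andy's screen, the text values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# (element name, text) pairs in document order; placeholder values are hypothetical.
FENG_SHUI = [
    ("title", "Feng Shui Made Easy"),
    ("instructor", "(instructor)"),
    ("date", "(date)"),
    ("time", "(time)"),
    ("description", "(description)"),
    ("cost", "(cost)"),
    ("classroom", "Room 2"),
    ("notes", "Andy will need the screen set up for his PowerPoint slides."),
]

def build_class(fields):
    """Create a class element with one child per (name, text) pair."""
    cls = ET.Element("class")
    for name, text in fields:
        ET.SubElement(cls, name).text = text
    return cls

classlist = ET.Element("classlist")
classlist.append(build_class(FENG_SHUI))
ET.indent(classlist)  # pretty-print in place (Python 3.9 and later)
xml_text = ET.tostring(classlist, encoding="unicode")
print(xml_text)
```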

Troubleshooting

If the Allow Saving As XML Even If Not Valid check box is cleared in the XML Options dialog box, Word will not allow you to save a document as XML unless the structure is valid. If Word tells you that it cannot save your document as XML because its structure violates the rules set by the schema, you have three choices: save the file as a Word document; click Cancel and change the option in the XML Options dialog box; or click Cancel and go back to the Elements In The Document box of the XML Structure task pane to correct the structure of marked elements.

27.
Click the Microsoft Office Button, click Save As, name the file My XML With Schema, change the Save as type setting to Word XML Document, and then click Save.

28.
Close the XML Structure task pane, and then close the My XML With Schema document.

29.
Click the Microsoft Office Button, and then in the Recent Documents pane, click My XML With Schema.

The XML file opens in Word, where you can edit it like a normal document.

BE SURE TO hide the Developer tab by displaying the Word Options window and clearing the Show Developer Tab In The Ribbon check box.

CLOSE the My XML With Schema file, and if you are not continuing directly on to the next chapter, quit Word.


Tip

The power of XML lies in its flexibility. After you create an XML file, you can apply a transform (also called a translation) to it to pull only the data you need and put it in the format you want. For example, you could apply one transform to the list of classes that extracts the title, description, instructor, cost, date, and time of the class and then formats that information as a Web page for customers. You could also apply a different transform that extracts the date, classroom, and notes and then formats that information as a memo for setup staff. The subject of transforms is beyond the scope of this book. For more information, see Microsoft Office Word 2007 Inside Out, by Katherine Murray and Mary Millhollon (Microsoft Press, 2007).
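Transforms are usually written in XSLT. Python's standard library has no XSLT processor, so the following sketch uses plain Python functions as a stand-in to illustrate the idea: the same class data is filtered two different ways, once into a simple Web page for customers and once into a setup memo. The element names match the practice file; values in parentheses are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape

# Sample data; values in parentheses are hypothetical placeholders.
SAMPLE = """<classlist>
  <class>
    <title>Feng Shui Made Easy</title>
    <instructor>(instructor)</instructor>
    <date>(date)</date>
    <time>(time)</time>
    <description>(description)</description>
    <cost>(cost)</cost>
    <classroom>Room 2</classroom>
    <notes>Andy will need the screen set up for his PowerPoint slides.</notes>
  </class>
</classlist>"""

def extract(root, fields):
    """Return one dict per class element holding only the requested fields."""
    return [{name: cls.findtext(name) or "" for name in fields}
            for cls in root.iter("class")]

def to_web_page(root):
    """Customer view: title, description, instructor, cost, date, and time."""
    items = []
    for row in extract(root, ["title", "description", "instructor",
                              "cost", "date", "time"]):
        safe = {key: escape(value) for key, value in row.items()}
        items.append(
            f"<li><b>{safe['title']}</b>: {safe['description']} "
            f"({safe['instructor']}; {safe['date']}, {safe['time']}; {safe['cost']})</li>"
        )
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"

def to_setup_memo(root):
    """Setup-staff view: date, classroom, and notes, one line per class."""
    rows = extract(root, ["date", "classroom", "notes"])
    return "\n".join(f"{r['date']} | {r['classroom']} | {r['notes']}" for r in rows)

root = ET.fromstring(SAMPLE)
print(to_web_page(root))
print(to_setup_memo(root))
```

The point of the design is that the source data never changes; each output is just a different selection and layout of the same tagged elements.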


The DOCX Format

The Microsoft Office 2007 system introduces a new file format based on XML, called Microsoft Office Open XML Formats. By default, Word 2007 files are saved in the DOCX format, which is the Word variation of this new file format.

The DOCX format provides the following benefits:

  • File size is smaller because files are compressed when saved, decreasing the amount of disk space needed to store the file and the amount of bandwidth needed to send files in e-mail, over a network, or across the Internet.

  • Recovering at least some of the content of damaged files is possible because a DOCX file is a compressed package of XML files, which can be extracted and opened in a text program such as Notepad.

  • Security is greater because DOCX files cannot contain macros, and personal data can be detected and removed from the file. (Word 2007 provides a different file format, DOCM, for files that contain macros.)
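To see the first two benefits in action, the following Python sketch builds a tiny stand-in for a DOCX package in memory. A real DOCX is a ZIP archive holding word/document.xml plus content-type, relationship, and other parts; this stand-in holds only word/document.xml, so Word would not open it. The sketch compares the size of the XML with the size of the compressed package, then simulates a damaged file by cutting the XML off halfway and salvages the surviving text with a regular expression, which, unlike a strict XML parser, does not give up on malformed markup.

```python
import io
import re
import zipfile

def make_fake_docx(document_xml):
    """Zip a single word/document.xml part in memory (a stand-in, not a valid DOCX)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", document_xml)
    return buffer.getvalue()

# Matches <w:t> or <w:t ...attributes...> but not <w:tbl>, <w:tab/>, and so on.
# Entities such as &amp; are left undecoded in this sketch.
TEXT_RUN = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")

def recover_text(docx_bytes):
    """Pull the text of every complete w:t element out of word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        xml_text = zf.read("word/document.xml").decode("utf-8", errors="replace")
    return TEXT_RUN.findall(xml_text)

paragraph = "<w:p><w:r><w:t>Sample paragraph text.</w:t></w:r></w:p>"
document_xml = ('<w:document xmlns:w="http://schemas.openxmlformats.org/'
                'wordprocessingml/2006/main"><w:body>'
                + paragraph * 200 + "</w:body></w:document>")

package = make_fake_docx(document_xml)
print(len(document_xml.encode("utf-8")), "bytes of XML ->", len(package), "bytes zipped")

# Simulate damage: keep only the first half of the XML.
damaged = document_xml[: len(document_xml) // 2]
runs = recover_text(make_fake_docx(damaged))
print(len(runs), "text runs recovered from the damaged copy")
```

Repetitive markup like WordprocessingML compresses very well, which is why saved DOCX files are much smaller than the XML they contain.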



