The EntityReference Interface






The EntityReference interface represents a general entity reference such as &nbsp; or &copyright_notice;. (It is not used for the five predefined entity references &amp;, &lt;, &gt;, &apos;, and &quot;.)

Figure 13 summarizes the EntityReference interface. You'll notice it declares no methods of its own. It inherits all of its functionality from the Node superinterface. In an XML document, an entity reference is just a placeholder for the text that will replace it. In a DOM tree, an EntityReference object merely contains, as its children, the nodes that will replace the entity reference.

Figure 13. The EntityReference Interface
package org.w3c.dom;

public interface EntityReference extends Node {

}

The name of the entity reference is returned by the getNodeName() method. The replacement text for the entity (assuming that the parser has resolved the entity) can be read through the usual methods of the Node interface, such as getFirstChild(). However, entity references are read-only. You cannot change their children using methods such as appendChild() or replaceChild(), and the Node interface offers no method for changing their names. An attempt to modify the children throws a DOMException with the error code NO_MODIFICATION_ALLOWED_ERR.
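
The read-only behavior is easy to check for yourself. The following is a minimal sketch, not a definitive test: the class name ReadOnlyReferenceDemo and its helper methods are my own inventions, and it assumes the JAXP parser bundled with the JDK, which enforces the read-only rule even for a reference whose entity is not declared. Other DOM implementations may differ in the details.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.EntityReference;

// Hypothetical demo class; not part of the DOM API
class ReadOnlyReferenceDemo {

  // Creates a free-standing entity reference in an empty document
  static EntityReference newReference(String name) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      return doc.createEntityReference(name);
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Tries to append a text node to the reference. Returns the
  // DOMException error code, or -1 if the append was accepted.
  static int tryAppend(EntityReference ref) {
    try {
      Document doc = ref.getOwnerDocument();
      ref.appendChild(doc.createTextNode("new text"));
      return -1;
    }
    catch (DOMException e) {
      return e.code;
    }
  }

  public static void main(String[] args) {
    EntityReference ref = newReference("copyright_notice");
    System.out.println("Name: " + ref.getNodeName());
    boolean rejected = tryAppend(ref)
     == DOMException.NO_MODIFICATION_ALLOWED_ERR;
    System.out.println("Append rejected: " + rejected);
  }

}
```

Returning the error code rather than letting the exception propagate keeps the demonstration short; in real code you would more likely avoid calling the mutating methods on an EntityReference in the first place.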

EntityReference objects do not know their own system ID (URL) or public ID. Using the entity reference's name, however, you can look up this information in the NamedNodeMap of Entity objects returned by the getEntities() method of the DocumentType interface. I'll show you an example of this when we get to the Entity interface. In the meantime, let's consider an example that creates new entity references in the tree.
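
As a brief preview of that lookup, here is a rough sketch. The class EntityLookup, its methods, and the sample document with its footer entity are all hypothetical, chosen only for illustration. The only DOM calls it relies on are getDoctype(), getEntities(), getNamedItem(), getPublicId(), and getSystemId().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.w3c.dom.EntityReference;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the DOM API
class EntityLookup {

  // A made-up document that declares one external entity
  // but never references it, so nothing is loaded
  static final String SAMPLE =
     "<!DOCTYPE doc [\n"
   + "  <!ENTITY footer PUBLIC \"-//Example//TEXT Footer//EN\""
   + " \"footer.xml\">\n"
   + "]>\n"
   + "<doc/>";

  // Returns the Entity declared under the reference's name,
  // or null if there is no DTD or no such declaration
  static Entity findEntity(EntityReference ref) {
    DocumentType doctype = ref.getOwnerDocument().getDoctype();
    if (doctype == null) return null;
    String name = ref.getNodeName();
    return (Entity) doctype.getEntities().getNamedItem(name);
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(SAMPLE);
    EntityReference ref = doc.createEntityReference("footer");
    Entity entity = findEntity(ref);
    System.out.println("Public ID: " + entity.getPublicId());
    System.out.println("System ID: " + entity.getSystemId());
  }

}
```

Whether the system ID comes back exactly as written in the declaration or resolved against a base URL may depend on the parser, so treat the printed value as informational.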

One common complaint about XML is that it doesn't support the entity references like &nbsp; and &eacute; that developers are accustomed to from HTML. Using DOM, it's straightforward to replace any inconvenient character with an entity reference, as Figure 14 demonstrates. This program recursively descends the element tree looking for nonbreaking space characters (Unicode code point 0xA0). It replaces each one it finds with an entity reference named nbsp. To do so, it has to split the text node around the nonbreaking space.

Figure 14. Inserting Entity References into a Document
import org.w3c.dom.*;


public class NBSPUtility {

  // Recursively descend the tree replacing all nonbreaking
  // spaces with &nbsp; entity references
  public static void addEntityReferences(Node node) {

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      // split the text node around the first nonbreaking space
      Text text = (Text) node;
      String s = text.getNodeValue();
      int nbsp = s.indexOf('\u00A0'); // finds the first A0
      if (nbsp != -1) {
        Text middle = text.splitText(nbsp);
        Text end = middle.splitText(1);
        Node parent = text.getParentNode();
        Document factory = text.getOwnerDocument();
        EntityReference ref =
         factory.createEntityReference("nbsp");
        parent.replaceChild(ref, middle);
        addEntityReferences(end); // finds any subsequent A0s
        System.out.println("Added");
      }
    } // end if

    // Skip entity references; their children are read-only
    else if (type != Node.ENTITY_REFERENCE_NODE
     && node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        addEntityReferences(child);
      } // end for
    } // end if

  }  // end addEntityReferences()

}

It would be easy enough to extend this program to replace all of the Latin-1 characters, or all of the characters that have standard entity references in HTML. You'd just need to keep a table of the characters and their corresponding entity names. You could even build such a table from the entities map available from the DTD.
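
One way such a table-driven version might look is sketched below. The class EntityTableUtility, the three-entry ENTITY_NAMES table, and the sample() helper are assumptions of mine for illustration; a real table would be much larger. The sketch also copies each child list before descending, since a NodeList is live and grows as text nodes are split, and it assumes the node passed in is not itself a parentless text node.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

// Hypothetical generalization of the nbsp example
class EntityTableUtility {

  // Illustrative table only: character -> entity name
  static final Map<Character, String> ENTITY_NAMES = Map.of(
    '\u00A0', "nbsp", '\u00E9', "eacute", '\u00A9', "copy");

  // Replaces every mapped character below node with an entity
  // reference and returns the number of replacements made
  static int addEntityReferences(Node node,
   Map<Character, String> table) {
    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      return replaceInText((Text) node, table);
    }
    // Children of existing entity references are read-only
    if (type == Node.ENTITY_REFERENCE_NODE) return 0;
    // Copy the live child list before modifying the tree
    NodeList children = node.getChildNodes();
    Node[] snapshot = new Node[children.getLength()];
    for (int i = 0; i < snapshot.length; i++) {
      snapshot[i] = children.item(i);
    }
    int count = 0;
    for (Node child : snapshot) {
      count += addEntityReferences(child, table);
    }
    return count;
  }

  private static int replaceInText(Text text,
   Map<Character, String> table) {
    int count = 0;
    Text current = text;
    while (true) {
      String s = current.getNodeValue();
      int pos = firstMapped(s, table);
      if (pos == -1) return count;
      String name = table.get(s.charAt(pos));
      Text middle = current.splitText(pos); // starts at the match
      Text rest = middle.splitText(1);      // text after the match
      Document factory = current.getOwnerDocument();
      current.getParentNode().replaceChild(
       factory.createEntityReference(name), middle);
      count++;
      current = rest;
    }
  }

  // Index of the first character that has a table entry, or -1
  private static int firstMapped(String s,
   Map<Character, String> table) {
    for (int i = 0; i < s.length(); i++) {
      if (table.containsKey(s.charAt(i))) return i;
    }
    return -1;
  }

  // Builds <doc>text</doc> in a new document for experimenting
  static Element sample(String text) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element root = doc.createElement("doc");
      root.appendChild(doc.createTextNode(text));
      doc.appendChild(root);
      return root;
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

}
```

For example, calling addEntityReferences(sample("a\u00A0b"), ENTITY_NAMES) should leave the root element with three children: a text node, an nbsp entity reference, and another text node.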

Although this code runs, the documents it produces are not necessarily well-formed. In particular, only entities defined in the DTD should be used. Assuming that's the case, the child list of the entity reference will be filled in automatically from the entity's replacement text. Unfortunately, however, DOM does not offer any means of defining new entities that are not part of the document's original DTD.
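
Given that limitation, it may be worth checking that an entity is declared before inserting a reference to it. The following is one possible guard, offered as a sketch: the class DeclaredEntityCheck and its methods are hypothetical names of my own, and the check consults only the entities map that the parser exposes through the DocumentType.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the DOM API
class DeclaredEntityCheck {

  // True if the document has a DTD whose entities map
  // contains a declaration with the given name
  static boolean isDeclared(Document doc, String entityName) {
    DocumentType doctype = doc.getDoctype();
    return doctype != null
     && doctype.getEntities().getNamedItem(entityName) != null;
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(
     "<!DOCTYPE doc [<!ENTITY nbsp \"&#160;\">]><doc/>");
    System.out.println(isDeclared(doc, "nbsp"));
    System.out.println(isDeclared(doc, "eacute"));
  }

}
```

A utility like NBSPUtility could call isDeclared() first and leave the character untouched when the entity is missing.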

