Specific Node-Type Interfaces

Although it is possible to access the data from the original XML document using only the Node interface, the DOM Core provides a number of specific node-type interfaces that simplify common programming tasks. These specific node types fall into two broad categories: structural nodes and content nodes.

Within an XML document, a number of syntax structures exist that are not formally part of the content. The following interfaces provide access to the portions of the document that are not related to element data.

The actual data conveyed by an XML document is contained completely within the document element. The following node types map directly to the XML document's nonstructural parts, such as character data, elements, and attribute values.

Each parsed document causes the creation of a single Document node in memory. (Empty Document nodes can be created through the DOMImplementation interface.) This interface provides access to the document type information and to the single, top-level Element node that contains the entire body of the parsed document (the documentElement). It also provides the factory methods that allow an application to create new content nodes that were not created by parsing a document. Table 19-10 shows all attributes and methods of the Document interface.
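As a minimal sketch of these relationships, the following uses Python's standard xml.dom.minidom module, which is only one DOM implementation among many. The sample markup and variable names are invented for illustration.

```python
from xml.dom import minidom, Node

# Parsing produces exactly one Document node for the whole input.
doc = minidom.parseString("<order id='42'><item>widget</item></order>")
print(doc.nodeType == Node.DOCUMENT_NODE)

# documentElement is the single top-level Element of the document.
root = doc.documentElement
print(root.tagName)

# No DOCTYPE declaration was present, so there is no document type node.
print(doc.doctype)

# An empty Document can be requested from the DOMImplementation instead.
impl = minidom.getDOMImplementation()
empty = impl.createDocument(None, None, None)
print(empty.documentElement)
```

Passing None for all three createDocument( ) arguments asks minidom for a Document with no root element and no document type.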

The various create...( ) methods are important for applications that need to modify the structure of a previously parsed document. Note that a node created by one Document instance may be inserted only into the document tree belonging to that same Document. DOM Level 2 added the importNode( ) method, which copies a node, and optionally its children, from one document into another while leaving the original in place. DOM Level 3 introduced the adoptNode( ) method, which moves an entire node subtree from one document to another.
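The factory methods and importNode( ) might be used as in the following sketch, again assuming Python's xml.dom.minidom. The element names and the sku attribute are hypothetical. adoptNode( ) is not shown because minidom does not provide it.

```python
from xml.dom import minidom

doc = minidom.parseString("<inventory/>")
root = doc.documentElement

# Build new content nodes with the Document's factory methods.
item = doc.createElement("item")
item.setAttribute("sku", "A-100")
item.appendChild(doc.createTextNode("widget"))
root.appendChild(doc.createComment(" added after parsing "))
root.appendChild(item)

# Copy the subtree into a second document; True requests a deep copy.
other = minidom.parseString("<catalog/>")
copy = other.importNode(item, True)
other.documentElement.appendChild(copy)

print(root.toxml())
print(other.documentElement.toxml())
```

The imported copy is owned by the second document, and the original item element remains attached to the first.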

Besides the various node-creation methods, the Document interface includes methods that locate specific XML elements or lists of elements. The getElementsByTagName( ) and getElementsByTagNameNS( ) methods return a list of all XML elements with the specified name or, for the namespace-aware variant, the specified namespace URI and local name. The getElementById( ) method returns the single element that has an ID-type attribute with the given value.
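The lookup methods can be sketched as follows with Python's xml.dom.minidom. The library markup is invented for illustration. Because this sample has no DTD, no attribute is known to be of type ID, so the sketch marks the id attributes explicitly with the DOM Level 3 setIdAttribute( ) method before calling getElementById( ).

```python
from xml.dom import minidom

DC = "http://purl.org/dc/elements/1.1/"
XML = (
    '<lib xmlns:dc="' + DC + '">'
    '<book id="b1"><dc:title>First</dc:title></book>'
    '<book id="b2"><dc:title>Second</dc:title></book>'
    '</lib>'
)
doc = minidom.parseString(XML)

# Match on the element name as written in the document.
books = doc.getElementsByTagName("book")
print(books.length)

# Match on namespace URI plus local name, whatever prefix was used.
titles = doc.getElementsByTagNameNS(DC, "title")
print([t.firstChild.data for t in titles])

# An attribute merely named "id" is not an ID until it is declared as one.
before = doc.getElementById("b2")
print(before)

# Flag each id attribute as ID-typed, then look the element up.
for book in books:
    book.setIdAttribute("id")
found = doc.getElementById("b2")
print(found.toxml())
```

The first getElementById( ) call finds nothing, which shows that the lookup depends on attribute type information rather than on the attribute's name.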

DOM Level 3 also introduced several attributes that are useful when an application needs to write an XML document back out in its original, pre-parsing form. The inputEncoding, xmlEncoding, and xmlStandalone attributes preserve the values given in the XML declaration of the original document, as well as the character encoding the document used before it was parsed (and converted to Unicode).
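The following sketch shows the idea using Python's xml.dom.minidom. Note the assumption: minidom does not use the final Level 3 attribute names. The version, encoding, and standalone attributes read here are minidom implementation details that play a similar role, so this illustrates the concept rather than the standard API. The sample document is invented.

```python
from xml.dom import minidom

# A Latin-1 encoded source document; \xe9 is e-acute in ISO-8859-1.
SOURCE = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    b'<note>caf\xe9</note>'
)
doc = minidom.parseString(SOURCE)

# minidom-specific attributes recording the XML declaration values.
print(doc.version, doc.encoding, doc.standalone)

# In memory, character data is held as Unicode text.
text = doc.documentElement.firstChild.data
print(text == "caf\u00e9")

# Reuse the recorded encoding when serializing the tree again.
out = doc.toxml(encoding=doc.encoding)
print(out)
```

Serializing with the recorded encoding returns bytes in the document's original character encoding instead of a Unicode string.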

One of the major additions to DOM in Level 3 was the inclusion of document validation support within the DOM tree itself. The normalizeDocument( ) method gives the developer a mechanism for essentially "re-parsing" the XML document from the DOM tree in memory. Parameters available through the domConfig attribute control how this normalization is performed. It is also possible to change the target version of XML by modifying the xmlVersion attribute before normalization, which causes the DOM to enforce the name construction rules of the selected XML version. See Chapter 21 for more information about the differences between XML versions 1.0 and 1.1.