Chapter 23. XPath Reference

XPath is a non-XML syntax for expressions that identify particular nodes and groups of nodes in an XML document. It is used by both XPointer and XSLT, as well as by some native XML databases and query languages.

XPath views each XML document as a tree of nodes. Each node has one of seven types:

Root

Each document has exactly one root node, which is the root of the tree. This node contains one comment node child for each comment outside the root element, one processing-instruction node child for each processing instruction outside the root element, and exactly one element node child for the root element. It does not contain any representation of the XML declaration, the document type declaration, or any whitespace that occurs before or after the root element. The root node has no parent node. The root node's value is the value of the root element.
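The shape of the root node's child list can be approximated with Python's standard `xml.dom.minidom` module. This is only a sketch: the DOM is not the XPath data model (a DOM tree would, for instance, also hold a doctype node if the document had a DOCTYPE), and the sample document and its `app-hint` processing instruction are invented for illustration.

```python
from xml.dom import minidom, Node

# Hypothetical sample: a comment and a processing instruction before the
# root element, one comment after it, and whitespace between all of them.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!-- before -->\n'
    '<?app-hint mode="demo"?>\n'
    '<doc>content</doc>\n'
    '<!-- after -->\n'
)

document = minidom.parseString(SAMPLE)

# The DOM document node stands in for the XPath root node here.
# Neither the XML declaration nor the whitespace around the root
# element shows up among its children.
top_level_types = [child.nodeType for child in document.childNodes]

print(top_level_types == [
    Node.COMMENT_NODE,
    Node.PROCESSING_INSTRUCTION_NODE,
    Node.ELEMENT_NODE,
    Node.COMMENT_NODE,
])
```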

Element

An element node has a name, a namespace URI, a parent node, and a list of child nodes, which may include other element nodes, comment nodes, processing-instruction nodes, and text nodes. An element node also has a collection of attributes and a collection of in-scope namespaces, none of which are considered to be children of the element. The string-value of an element node is the complete, parsed text between the element's start- and end-tags that remains after all tags, comments, and processing instructions are removed and all entity and character references are resolved.
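As a rough illustration of an element's string-value, the sketch below uses Python's standard `xml.etree.ElementTree` rather than an XPath engine. It assumes the default parser, which discards comments and processing instructions and resolves entity and character references, so joining the remaining text gives the same result as the rule above. The sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: nested markup, an entity reference, a comment,
# a processing instruction, and a character reference.
SAMPLE = "<p>Hello, <b>XPath</b> &amp; <!-- note --><?pi data?>friends&#33;</p>"

paragraph = ET.fromstring(SAMPLE)

# itertext() walks the element and all of its descendants in document
# order and yields only character data, so the tags drop out.
string_value = "".join(paragraph.itertext())

print(string_value)
```

With an XPath engine, the same text is what `string(/p)` would be expected to return for this document.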

Attribute

An attribute node has a name, a namespace URI, a value, and a parent element. Although elements are parents of attributes, attributes are not children of their parent elements. The biological metaphor breaks down here. xmlns and xmlns:prefix attributes are not represented as attribute nodes. An attribute node's value is the normalized attribute value.
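The following sketch, again using `xml.etree.ElementTree` as a stand-in for an XPath processor, shows three of the points above on an invented sample: attributes are kept apart from children, the `xmlns:x` declaration is not reported as an attribute, and a literal line break inside an attribute value is normalized to a space.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The "\n" below is a real line break inside the
# attribute value, not a character reference.
SAMPLE = '<item xmlns:x="urn:example:x" id="42" note="two\nlines"><name/></item>'

item = ET.fromstring(SAMPLE)

# Children and attributes live in separate collections.
child_tags = [child.tag for child in item]

# The namespace declaration is consumed by the parser and is absent here.
attribute_names = sorted(item.attrib)

# Attribute-value normalization turns the literal line break into a space.
note_value = item.get("note")

print(child_tags, attribute_names, repr(note_value))
```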

Text

Each text node represents the maximum possible contiguous run of text between tags, processing instructions, and comments. A text node has a parent node but does not have children. A text node's value is the text of the node.
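A small sketch with `xml.dom.minidom` can make the "maximum contiguous run" rule concrete. In the invented sample below, the entity reference does not split the text around it, but the comment does. The DOM only approximates the XPath model, so treat this as an illustration rather than a definition.

```python
from xml.dom import minidom, Node

# Hypothetical sample: an entity reference inside the first run of text
# and a comment separating it from the second run.
SAMPLE = "<p>one &amp; two<!-- split here -->three</p>"

paragraph = minidom.parseString(SAMPLE).documentElement

# Keep only the text-node children and read their character data.
text_runs = [
    child.data
    for child in paragraph.childNodes
    if child.nodeType == Node.TEXT_NODE
]

print(text_runs)
```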

Namespace

A namespace node represents a namespace in scope on an element. In general, each namespace declaration by an xmlns or xmlns:prefix attribute produces a namespace node on that element and on all of its descendant elements (unless overridden by another namespace declaration). Like attribute nodes, each namespace node has a parent element but is not the child of that parent. The name of a namespace node is the prefix. The value of a namespace node is the namespace URI.
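Python's standard library has no namespace axis, but the set of namespace nodes each element would carry can be sketched by tracking declarations while parsing. The helper `in_scope_namespaces` and the `urn:example:` URIs are hypothetical, and the default namespace is shown with an empty prefix.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: <b> overrides the x prefix for itself and <c>.
SAMPLE = (
    b'<a xmlns="urn:example:default" xmlns:x="urn:example:x">'
    b'<b xmlns:x="urn:example:override"><c/></b>'
    b'<d/>'
    b'</a>'
)

def in_scope_namespaces(xml_bytes):
    """Map each element's local name to the prefix -> URI bindings in scope."""
    stack = []   # declarations currently open, innermost last
    scopes = {}
    events = ("start-ns", "start", "end-ns")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            stack.append(payload)        # payload is (prefix, uri)
        elif event == "end-ns":
            stack.pop()                  # a declaration went out of scope
        else:
            local_name = payload.tag.rsplit("}", 1)[-1]
            # Later (inner) declarations override earlier (outer) ones.
            # The implicit xml prefix is not reported by the parser and
            # is left out of this sketch.
            scopes[local_name] = dict(stack)
    return scopes

scopes = in_scope_namespaces(SAMPLE)
for name in ("a", "b", "c", "d"):
    print(name, scopes[name])
```

Here `c` inherits the overridden binding from `b`, while `d` sees the original binding declared on `a`.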

Processing instruction

A processing-instruction node has a target, data, a parent node, and no children. The name of a processing-instruction node is its target. The value of a processing-instruction node is the data of the processing instruction, not including any initial whitespace.
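The split between target and data can be seen with `xml.dom.minidom`, used here only as an approximation of the XPath view. The stylesheet file name in the sample is made up.

```python
from xml.dom import minidom

# Hypothetical sample: extra spaces separate the target from the data.
SAMPLE = '<?xml-stylesheet   href="style.xsl" type="text/xsl"?><doc/>'

document = minidom.parseString(SAMPLE)
instruction = document.firstChild

# The target corresponds to the node's name in XPath terms.
pi_name = instruction.target

# The data starts at the first non-whitespace character after the target.
pi_value = instruction.data

print(pi_name)
print(pi_value)
```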

Comment

A comment node represents a comment. It has a parent node and no children. The value of a comment is the string content of the comment, not including the <!-- and --> delimiters.

The XML declaration and the document type declaration are not included in XPath's view of an XML document. All entity references, character references, and CDATA sections are resolved before the XPath tree is built. The references themselves are not included as a separate part of the tree.
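To illustrate that references and CDATA sections leave no trace of their own, the sketch below parses an invented sample with `xml.etree.ElementTree`. It is a stand-in for an XPath processor, not one, but the resolved text it produces matches what the paragraph above describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a predefined entity reference, a character
# reference, and a CDATA section in one run of text.
SAMPLE = (
    '<?xml version="1.0"?>'
    "<doc>A &lt; B, &#169; Example, <![CDATA[<raw> & text]]></doc>"
)

doc = ET.fromstring(SAMPLE)

# Everything is delivered as plain character data in a single string.
resolved_text = doc.text

# Nothing inside the CDATA section was parsed as markup.
child_count = len(doc)

print(resolved_text)
```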