XPaths, shorthand pointers, and child sequences can only point
to entire nodes or sets of nodes. However, sometimes you want to point
to something that isn't a node, such as the third word of the second
paragraph or the year in a date attribute that looks like
date="01/03/1950". XPointer adds points and
ranges to the XPath data model to make this possible. A
point is the position preceding or following any
tag, comment, processing instruction, or character in the #PCDATA.
Points can also be positions inside comments, processing instructions,
or attribute values. Points cannot be located inside an entity
reference, although they can be located inside the entity's
replacement text. A range is the span of parsed
character data between two points. Nodes, points, and ranges are
collectively called locations; a set that may contain nodes, points, and ranges is
called a location set. In other words, a location is a generalization of the
XPath node that includes points and ranges, as well as elements,
attributes, namespaces, text nodes, comments, processing instructions,
and the root node.
A point is identified by its container node and a non-negative index into that node. If the node can contain child nodes (that is, if it's a document or element node), then there is a point before and after each of its children. Between two adjacent children, the point after one child is the same as the point before the next, so a node with n children contains n + 1 points. If the node cannot contain child nodes (that is, if it's a comment, processing instruction, attribute, namespace, or text node), then there's a point before and after each character in the string value of the node, and again the point after one character is the same as the point before the next character.
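The container-plus-index rule can be sketched in a few lines of Python. This is only an illustration, not an XPointer implementation: it assumes the standard library's xml.dom.minidom as a stand-in for the XPath data model, and point_count is a hypothetical helper name.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def point_count(node):
    """Count the points whose container is `node` (hypothetical helper)."""
    if node.nodeType in (Node.DOCUMENT_NODE, Node.ELEMENT_NODE):
        # One point before each child, plus one after the last child.
        return len(node.childNodes) + 1
    # Text, comment, processing instruction, and attribute nodes:
    # one point before each character, plus one after the last character.
    return len(node.nodeValue) + 1


doc = minidom.parseString("<year>1900</year>")
year = doc.documentElement

print(point_count(year))             # 2: one child, so points 0 and 1
print(point_count(year.firstChild))  # 5: four characters, so points 0 to 4
```

An element with no children still contains one point, at index 0.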
Consider the document in Example 11-1. It contains a novel element
that has seven child nodes, three of which are element nodes and four
of which are text nodes containing only whitespace.
Example 11-1. A novel document
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="novel.css"?>
<!-- You may recognize this from the last chapter -->
<novel copyright="public domain">
  <title>The Wonderful Wizard of Oz</title>
  <author>L. Frank Baum</author>
  <year>1900</year>
</novel>
There are eight points directly inside the novel element, numbered
from 0 to 7, one immediately after and one immediately before each
tag. Figure 11-1 identifies these points.
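The count of eight can be checked with a short Python sketch. It assumes xml.dom.minidom as an approximation of the XPath data model and parses only the novel element from Example 11-1; the describe helper is a hypothetical name used for the printout.

```python
from xml.dom import minidom

# The novel element from Example 11-1, with its line breaks and indentation.
NOVEL = """<novel copyright="public domain">
  <title>The Wonderful Wizard of Oz</title>
  <author>L. Frank Baum</author>
  <year>1900</year>
</novel>"""

novel = minidom.parseString(NOVEL).documentElement
children = novel.childNodes


def describe(node):
    """Short label for a child node (hypothetical helper)."""
    if node.nodeType == node.ELEMENT_NODE:
        return "<" + node.tagName + "> element"
    return "whitespace text node"


# A container with n children has n + 1 points, indexed 0 to n.
for index in range(len(children) + 1):
    before = describe(children[index - 1]) if index > 0 else "<novel> start-tag"
    after = describe(children[index]) if index < len(children) else "</novel> end-tag"
    print(f"point {index}: after the {before}, before the {after}")
```

The loop runs over one more index than there are children, which is where the eighth point comes from.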
Inside the text node child of the year element, there are five points:
Point 0 between <year> and 1
Point 1 between 1 and 9
Point 2 between 9 and 0
Point 3 between 0 and 0
Point 4 between 0 and </year>
Notice that the points occur between the characters of the text rather than on the characters themselves. Points are zero-dimensional. They identify a location, but they have no extension, not even a single character. To indicate one or more characters, you need to specify a range between two points.
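Points that sit between characters behave much like slice indexes in Python, which gives a convenient way to picture a range. The sketch below is an analogy rather than XPointer itself: range_between is a hypothetical helper, and the string "1900" stands in for the text node inside the year element.

```python
def range_between(value, start, end):
    """Return the characters spanned by a range from point `start` to point `end`."""
    if not 0 <= start <= end <= len(value):
        raise ValueError("points must satisfy 0 <= start <= end <= len(value)")
    return value[start:end]


year_text = "1900"  # string value of the text node inside <year>

print(range_between(year_text, 0, 2))        # 19
print(range_between(year_text, 0, 4))        # 1900
print(repr(range_between(year_text, 2, 2)))  # '' (both points coincide, so no characters)
```

A range whose two points are the same covers no characters at all, which matches the idea that a point by itself has no extension.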
XPointer adds two functions to XPath that make it very easy to
select the first and last points inside a node: start-point() and
end-point(). For example, this XPointer identifies the first point
inside the title element, that is, the point between the title node
and its text node child:
xpointer(start-point(//title))
This XPointer indicates the point immediately before the </author> tag:
xpointer(end-point(//author))
If there were multiple title and author elements in the document,
then these functions would select multiple points.
This XPointer points to the point immediately before the letter T in "The Wonderful Wizard of Oz":
xpointer(start-point(//title/text()))
This point falls immediately after the point indicated by
xpointer(start-point(//title)). These are two different points, even
though they fall between the same two characters (> and T) in the
text.
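One way to see why these are two different points is to model each point as a (container, index) pair. The Python sketch below does that on top of xml.dom.minidom; the start_point and end_point functions are hypothetical stand-ins for the XPointer functions of the same names, written only for element, document, and text nodes, and are not a conforming implementation.

```python
from xml.dom import minidom


def start_point(node):
    """Model of start-point(): the first point inside `node`."""
    return (node, 0)


def end_point(node):
    """Model of end-point(): the last point inside `node`."""
    if node.nodeType in (node.ELEMENT_NODE, node.DOCUMENT_NODE):
        return (node, len(node.childNodes))  # after the last child
    return (node, len(node.nodeValue))       # after the last character


doc = minidom.parseString(
    "<novel><title>The Wonderful Wizard of Oz</title>"
    "<author>L. Frank Baum</author></novel>"
)
title = doc.getElementsByTagName("title")[0]
author = doc.getElementsByTagName("author")[0]

in_element = start_point(title)          # like start-point(//title)
in_text = start_point(title.firstChild)  # like start-point(//title/text())

print(in_element[0].nodeName, in_element[1])  # title 0
print(in_text[0].nodeName, in_text[1])        # #text 0
print(in_element == in_text)                  # False: different containers
print(end_point(author)[1])                   # 1: the point just before </author>
```

Both points have index 0, but one is contained by the title element and the other by its text node child, so they are distinct even though nothing in the document's text separates them.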
To select points other than the start-point or end-point of a
node, you first need to form a range that begins or ends with the
point of interest, using string-range(), and then use the
start-point() or end-point() function on that range. We take
this up in the next section.