The IDs and IDREFs are stored in the PSVI in a table (called the “ID/IDREF table”) and can eventually be used by the applications to locate the corresponding nodes. We can expect XPath applications (including XPointer) to provide shortcuts and fast access to the nodes identified by W3C XML Schema, as is already the case with the DTD IDs.
Simple and easy to use within their domain, IDs and IDREFs keep the limitations of their DTD ancestors. W3C XML Schema provides a more flexible feature for defining identity constraints, one that places no limitation on the lexical space and allows local keys and references as well as multinode keys.
Another important difference is that the ID/IDREF checks are done on datatypes derived from xs:NCName, while the checks described below can be performed on other datatypes, and the comparisons are made in the value space rather than on string representations from the lexical space. These checks are based on a set of XPath expressions and are defined through three different (but similar) constructs: one to test the uniqueness of a value, one to define a key, and one to define a key reference.
The first of these constructs defines a simple check for uniqueness. We will spend some time explaining this in detail, since the two other constructs are based on the same pattern.
The definition of these constraints is done using two consecutive relative XPath expressions evaluated against the position of the element under which they are defined. We need a clear picture of the structure of the instance documents to define them. The starting point is the location of the element under which the check is defined. This location determines the scope of the test and must be carefully chosen, since it is the basis from which all the checks will be performed for this constraint.
For instance, in our library, we can choose to define a check for the uniqueness of the ISBN number of our books under the library element, since we need to check it within the scope of the whole library. However, within a book, we may also test that the reference to a character is unique within the scope of this book. We can define this second check inside the book element.
Once we have chosen the location of the test, we can start writing it at the end of the definition of the element:
    <xs:element name="library">
      <xs:complexType>
        .../...
      </xs:complexType>
      <xs:unique name="book">
        .../...
      </xs:unique>
    </xs:element>
The name attribute used here will be useful if we want to refer to this constraint through a keyref.
Now that we have defined the name and the root of the test, we will define the selector, which is the relative path to the nodes being identified. In our example, the relative path to access a book element from library is book, so we write:
    <xs:element name="library">
      <xs:complexType>
        .../...
      </xs:complexType>
      <xs:unique name="book">
        <xs:selector xpath="book"/>
        .../...
      </xs:unique>
    </xs:element>
We have expressed the fact that a book must be unique within a library. To complete the description of this check, we need to define how a book is identified through field elements. In our case, the identifier is the isbn subelement, and the complete definition is:
    <xs:element name="library">
      <xs:complexType>
        .../...
      </xs:complexType>
      <xs:unique name="book">
        <xs:selector xpath="book"/>
        <xs:field xpath="isbn"/>
      </xs:unique>
    </xs:element>
Translated into plain English, this definition can be read as “for each library, each book identified by its ISBN should be unique.”
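The second check mentioned earlier, the uniqueness of a character reference within a single book, follows the same pattern with a different anchor element. The sketch below is only an illustration: it assumes a hypothetical character-ref child element carrying a ref attribute, and neither name nor the simplified content models come from the schema discussed here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:string"/>
              <!-- Hypothetical element: a reference to a character. -->
              <xs:element name="character-ref" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="ref" type="xs:string" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
          <!-- Declared on book, so the scope is one book at a time:
               the same character may still be referenced from other books. -->
          <xs:unique name="character-ref">
            <xs:selector xpath="character-ref"/>
            <xs:field xpath="@ref"/>
          </xs:unique>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Declared on library, so the scope is the whole library. -->
    <xs:unique name="book">
      <xs:selector xpath="book"/>
      <xs:field xpath="isbn"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

The two constraints differ only in the element that carries them, which is what determines the scope of each test.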
If the names of our authors were split in our library into first, middle, and last names, we may find it convenient to define a composite field to identify our authors. W3C XML Schema provides this feature by allowing definition of several fields within a single constraint—for instance:
    <xs:element name="library">
      <xs:complexType>
        .../...
      </xs:complexType>
      <xs:unique name="author">
        <xs:selector xpath="author"/>
        <xs:field xpath="first-name"/>
        <xs:field xpath="middle-name"/>
        <xs:field xpath="last-name"/>
      </xs:unique>
    </xs:element>
The check is then done on the triple that is composed of the values of the three fields (first-name, middle-name, last-name) that need to be unique as a combination.
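Because field values are compared in the value space, the datatype declared for a field changes what counts as a duplicate. The following sketch illustrates this with a hypothetical number element typed as xs:integer; the element name and the constraint name are assumptions made for the example only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Hypothetical field. As xs:integer, the lexical forms
                   "7" and "007" denote the same value. -->
              <xs:element name="number" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- The comparison uses the integer values, not the strings. -->
    <xs:unique name="book-number">
      <xs:selector xpath="book"/>
      <xs:field xpath="number"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

With this declaration, a library containing one book numbered 7 and another numbered 007 should be reported as violating the constraint. If number were declared as xs:string instead, the two values would be different strings and the same document should be accepted.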
A key is a unique constraint with the additional restriction that all the nodes corresponding to all the fields are required.
The syntax for defining a key is the same as the syntax for defining a unique condition, except the unique element is replaced by a key element:
    <xs:element name="library">
      <xs:complexType>
        .../...
      </xs:complexType>
      <xs:key name="book">
        <xs:selector xpath="book"/>
        <xs:field xpath="isbn"/>
      </xs:key>
    </xs:element>
There is clearly an overlap between the additional existence check done by a key constraint and the other ways to control the number of occurrences of an element or attribute. In our example, if the minimum number of occurrences for the author’s name is set to one, using xs:unique or xs:key is equivalent, except when the author’s name can have a “nil” value. (We will discuss the “nil” value in Chapter 11.)
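The difference between the two constructs shows up only when a field may be missing. The sketch below reuses the ISBN key shown earlier rather than the author's name, and assumes, purely for illustration, that isbn is declared optional.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Assumption for this example: the ISBN is optional. -->
              <xs:element name="isbn" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- As a key, every selected book must supply an isbn, so a book
         without one is an error even though the content model allows it.
         Written as xs:unique instead, a book without an isbn would not
         take part in the check and would be accepted. -->
    <xs:key name="book">
      <xs:selector xpath="book"/>
      <xs:field xpath="isbn"/>
    </xs:key>
  </xs:element>
</xs:schema>
```

Changing minOccurs back to its default of one removes the difference, which is the overlap described above.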
Despite its name, xs:keyref can be used to define a reference not only to an xs:key, but also to an xs:unique. The usage of xs:keyref is straightforward and similar to the usage of xs:key or xs:unique, with one important point worth mentioning: the xs:keyref must be defined under the same element as the xs:key or xs:unique named by its refer attribute, or under one of that element’s ancestors.
The reason for this rule is that the “identity-constraint tables” where the keys and references are stored are local to the element under which they are defined and to its ancestors.
The fact that the definitions of matching xs:unique or xs:key and xs:keyref need to be done within the same element, or within one of its ancestors, has an impact on the choice of this location. Suppose, for instance, that our books and authors are kept in separate sections of our document:
    <library>
      <books>
        <book>
          .../...
          <author-ref ref="Charles M. Schulz"/>
          .../...
        </book>
        .../...
      </books>
      <authors>
        <author>
          <name>Charles M. Schulz</name>
          .../...
        </author>
        .../...
      </authors>
    </library>
It’s good practice to define a modular schema by locating the constraints as near as possible to the elements they control. A natural fit is to locate a key in the authors element and the matching keyref in the books element. However, since an xs:keyref needs to be in the same element as the matching xs:key or in one of its ancestors, and books isn’t an ancestor of authors, the xs:keyref definition can only be done in the library element. (The xs:key can be defined either in the library or in the authors element.)
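The arrangement just described, with the key kept close to the authors and the key reference moved up to library, could be sketched as follows. The content models are deliberately reduced to the parts visible in the instance document, and the xs:token types and the constraint names author and author-ref are assumptions made for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="books">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="book" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="author-ref" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="ref" type="xs:token" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="authors">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="author" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:token"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
          <!-- The key stays next to the elements it controls. -->
          <xs:key name="author">
            <xs:selector xpath="author"/>
            <xs:field xpath="name"/>
          </xs:key>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- The key reference cannot live in books, which is not an ancestor
         of authors, so it is declared on library. -->
    <xs:keyref name="author-ref" refer="author">
      <xs:selector xpath="books/book/author-ref"/>
      <xs:field xpath="@ref"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Both fields are given the same datatype here so that the referencing value and the key value are compared like for like.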
In the previous example, locating the xs:key definition within library or authors was only a matter of style, since the authors are unique both within the library element and within the authors element. However, W3C XML Schema allows for situations in which this isn’t the case and in which a key is unique within the scope of a subelement without being unique within the whole document.
Let’s modify the previous example to define several categories of authors:
    <library>
      <books>
        <book>
          .../...
          <author-ref ref="Charles M. Schulz"/>
          .../...
        </book>
        .../...
      </books>
      <authors>
        <category id="comics">
          <author>
            <name>Charles M. Schulz</name>
            .../...
          </author>
          .../...
        </category>
        <category id="novels">
          .../...
        </category>
        .../...
      </authors>
    </library>
Defining an xs:key (or xs:unique) within library or authors specifies uniqueness within the scope of the entire library. Defining it within category specifies uniqueness within that category only, and allows authors with the same name to be defined under several categories.
It is perfectly valid, per W3C XML Schema, to define an xs:key under category and a matching xs:keyref under library (since library is an ancestor of category). Doing so adds a new constraint on authors’ names: when an author is referenced within a book, that author’s name has to be unique within the scope of the xs:keyref. Applied to our instance document, this means that if “Charles M. Schulz” were not referenced in any of the books, he could be defined in several categories; since he is referenced in one book, his name must be defined only once.
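A sketch of this last arrangement follows, with the key declared on category and the key reference on library. As before, the content models are reduced to what the instance document shows, and the datatypes and the constraint names author-in-category and author-ref are assumptions for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="books">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="book" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="author-ref" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="ref" type="xs:token" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="authors">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="category" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="author" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="name" type="xs:token"/>
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:token" use="required"/>
                </xs:complexType>
                <!-- Scope: one category. The same name may appear again
                     in another category. -->
                <xs:key name="author-in-category">
                  <xs:selector xpath="author"/>
                  <xs:field xpath="name"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Scope: the whole library. A name referenced from a book has to
         resolve to a single author across all the categories. -->
    <xs:keyref name="author-ref" refer="author-in-category">
      <xs:selector xpath="books/book/author-ref"/>
      <xs:field xpath="@ref"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

The only structural change from a key declared on authors is the element that carries the xs:key, yet it changes both where duplicates are tolerated and which references can be resolved.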
The W3C XML Schema Recommendation states that “to reduce the burden on implementers, in particular implementers of streaming processors, only restricted subsets of XPath expressions are allowed” in xs:selector and xs:field. The result of this statement is a limited subset of XPath that allows only the selection of nodes that are descendants of, or are part of, the current location.
The XPath expressions allowed in xs:selector must exclusively go deeper into the hierarchy of the XML element nodes, do not allow any tests in the XPath steps, and must match a set of elements. In addition, the XPath expressions allowed in xs:field can also select attributes.
The full BNF for this subset is given in the reference guide. Rather than giving a verbose explanation, let’s see some examples of what is possible and what is not.
The following are allowed:
author
    Selects the child elements named author that do not belong to any namespace.
author|character
    Selects the child elements named author or character that do not belong to any namespace.
lib:author
    Selects the child elements named author that belong to the namespace whose prefix is “lib”.
*
    Selects all the child elements.
lib:*
    Selects all the child elements that belong to the namespace whose prefix is “lib”.
authors/author
    Selects all the authors/author child elements.
.//author
    Selects all the elements that are descendants of the current node, named author, and don’t belong to any namespace.
author/@id
    Selects the id attribute of the author child element (allowed only for xs:field, and not for xs:selector).
@id|@name
    Selects @id or @name (valid only in xs:field, since attributes are forbidden in xs:selector).
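To show how the prefixed forms fit into a schema, here is a minimal sketch that assumes a hypothetical target namespace, http://example.org/library, bound to the lib prefix. The namespace URI, the id attribute, and the constraint names are illustrative assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:lib="http://example.org/library"
           targetNamespace="http://example.org/library"
           elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Elements are namespace-qualified, so each step needs the prefix.
         An unprefixed "book" would only match a book in no namespace. -->
    <xs:unique name="book-isbn">
      <xs:selector xpath="lib:book"/>
      <xs:field xpath="lib:isbn"/>
    </xs:unique>
    <!-- Descendant form in the selector, attribute in the field.
         The attribute is unqualified, so it takes no prefix. -->
    <xs:unique name="book-id">
      <xs:selector xpath=".//lib:book"/>
      <xs:field xpath="@id"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

The prefix used in the XPath expressions is resolved against the namespace declarations in scope in the schema document, not against those of the instance document.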
The following are forbidden:
Absolute paths are not allowed.
The parent axis is not allowed.
Tests are not allowed.
Tests are not allowed.
Function calls are not allowed.
Absolute paths are not allowed.
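Some of these restrictions can be worked around by moving the constraint rather than by changing the expression. The sketch below is one possible illustration: the forbidden expressions quoted in the comment are hypothetical examples, not taken from the lists above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
          <!-- A constraint declared here can only look down into book.
               Hypothetical selectors such as "/library/book" (absolute
               path) or "../book" (parent axis) would be rejected. -->
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Declaring the constraint on the common ancestor lets a plain
         relative path reach every book. -->
    <xs:unique name="book">
      <xs:selector xpath="book"/>
      <xs:field xpath="isbn"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

Tests and function calls have no such workaround within this subset: a constraint that depends on them cannot be expressed with xs:unique, xs:key, or xs:keyref.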