Like any storage system, an XML document needs to provide ways to identify and reference pieces of the information it contains. In this chapter, we will present and compare the two features that W3C XML Schema provides for this purpose. One directly emulates the ID, IDREF, and IDREFS attribute types from the XML DTDs, while the other was introduced to provide more flexibility through the use of XPath expressions.
The first way to describe identifiers and references
with W3C XML Schema is inherited from XML’s DTDs. We
already discussed this in Chapter 5: the xs:ID, xs:IDREF, and
xs:IDREFS datatypes introduced in W3C XML Schema emulate the
behavior of the XML DTD’s ID, IDREF, and IDREFS attribute types.
Unlike their DTD counterparts, these simple types can be used to
describe both elements and attributes, but they inherit the other
restrictions from the DTDs: their lexical space is that of an
unqualified XML name (the xs:NCName datatype), and they are global
to a document, meaning that you won’t be allowed to use
the same ID value to identify, for instance, both an author and a
character within the same document.
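As an illustration of this document-wide scope, the following fragment sketches a collision. The library, author, and character element names, their id attributes, and the ID value are assumptions made for the sake of the example; both id attributes are supposed to be declared with the type xs:ID in a hypothetical schema.

```xml
<!-- Assumption: both id attributes are declared with type xs:ID. -->
<library>
  <author id="x001">
    <name>Charles M. Schulz</name>
  </author>
  <!-- Invalid: the ID value "x001" is already used by the author above,
       even though the two elements have different names. -->
  <character id="x001">
    <name>Snoopy</name>
  </character>
</library>
```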
The restriction on the lexical space can
often prevent you from using the value of an existing node as an
identifier. For instance, in our library, we will not be able to use
an ISBN as an ID, since an xs:NCName cannot start with a digit
and cannot contain whitespace. We will therefore need to create
IDs of our own, deriving their values from existing nodes.
The ISBN “0836217462” can, for instance, be used to build the ID
isbn-0836217462, and the name “Charles M. Schulz” can become the
ID au-Charles_M._Schulz. Adding a prefix (isbn-, au-, etc.) is also
a way to avoid a collision between IDs used for
different element types.
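If the schema should also enforce such a naming convention, one possible approach is to derive a type from xs:ID with a pattern. This is only a sketch: the type name bookID is hypothetical, and the pattern assumes 10-character ISBNs (nine digits followed by a digit or X) written without hyphens.

```xml
<!-- Hypothetical type: an xs:ID restricted to the "isbn-" naming convention. -->
<xs:simpleType name="bookID">
  <xs:restriction base="xs:ID">
    <!-- "isbn-" followed by nine digits and a final digit or X -->
    <xs:pattern value="isbn-[0-9]{9}[0-9X]"/>
  </xs:restriction>
</xs:simpleType>
```

A type derived from xs:ID in this way is still treated as an ID, so the value isbn-0836217462 would be accepted and would still have to be unique within the document.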
These datatypes can be used to define either attributes or elements; however, the Recommendation reminds us that if we want to maintain compatibility with XML 1.0 IDs and IDREFs, they should be used only for attributes. In both cases (elements or attributes), the contribution to the PSVI is made in a similar fashion through an “ID/IDREF table”. Apart from maintaining compatibility with the feature as it was previously defined in XML 1.0, there is no reason to avoid using ID, IDREF, and IDREFS to define elements.
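To make the element case concrete, here is a minimal sketch in which the identifier is carried by a child element rather than by an attribute. The author, id, and name element names are assumptions chosen for illustration.

```xml
<!-- Hypothetical instance fragment: the ID is element content. -->
<author>
  <id>au-Charles_M._Schulz</id>
  <name>Charles M. Schulz</name>
</author>

<!-- Hypothetical matching declaration. -->
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="id" type="xs:ID"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Tools that rely on DTD-style IDs would not recognize this identifier, which is the compatibility concern mentioned above.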
For example, to show how these styles can be combined, a book
element of our library can be written as:
<book identifier="isbn-0836217462">
  <isbn>
    0836217462
  </isbn>
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author-ref ref="au-Charles_M._Schulz"/>
  <character-refs>
    ch-Peppermint_Patty ch-Snoopy ch-Schroeder ch-Lucy
  </character-refs>
</book>
The book element is identified by an identifier (ID) attribute.
It references its author through the ref (IDREF) attribute of an
author-ref element, and a whitespace-separated list of
characters through a character-refs (IDREFS) element.
The corresponding piece of schema can be:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="isbn" type="xs:NMTOKEN"/>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author-ref">
        <xs:complexType>
          <xs:attribute name="ref" type="xs:IDREF" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="character-refs" type="xs:IDREFS"/>
    </xs:sequence>
    <xs:attribute name="identifier" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>
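For these references to be valid, the same document has to contain elements whose ID values match them. A sketch of what such targets might look like follows, assuming author and character elements that each carry an id attribute of type xs:ID; these names are illustrative and are not defined in the schema fragment above.

```xml
<!-- Assumption: the id attributes below are declared with type xs:ID. -->
<author id="au-Charles_M._Schulz">
  <name>Charles M. Schulz</name>
</author>
<character id="ch-Snoopy">
  <name>Snoopy</name>
</character>
<!-- ch-Peppermint_Patty, ch-Schroeder, and ch-Lucy would need matching
     elements as well: a reference to a missing ID makes the document invalid. -->
```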