Chapter 9. Defining Uniqueness, Keys, and Key References

Like any storage system, an XML document needs ways to identify and reference the pieces of information it contains. In this chapter, we present and compare the two features that W3C XML Schema provides for this purpose. One directly emulates the ID, IDREF, and IDREFS attribute types of XML DTDs, while the other was introduced to provide more flexibility through the use of XPath expressions.

The first way to describe identifiers and references with W3C XML Schema is inherited from XML’s DTDs. We already discussed this in Chapter 5: the xs:ID, xs:IDREF, and xs:IDREFS datatypes introduced in W3C XML Schema emulate the behavior of the XML DTD’s ID, IDREF, and IDREFS attribute types.

Unlike their DTD counterparts, these simple types can be used to describe both elements and attributes, but they inherit the other restrictions from the DTDs. Their lexical space is that of an unqualified XML name (the xs:NCName datatype), and they are global to a document, meaning that you won’t be allowed to use the same ID value to identify, for instance, both an author and a character within the same document.
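
As a minimal sketch of this document-wide scope, consider the following instance fragment. It assumes hypothetical author and character elements whose id attribute is declared with the xs:ID type; neither declaration is shown here. Under that assumption, a validating processor would be expected to reject the second id, even though the two elements have different names:

```xml
<library>
  <!-- Assumption: id is declared as xs:ID on both author and character -->
  <author id="a1">
    <name>Charles M. Schulz</name>
  </author>
  <!-- Invalid: the value "a1" is already used as an ID in this document -->
  <character id="a1">
    <name>Snoopy</name>
  </character>
</library>
```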

The restriction on the lexical space can often prevent you from using an existing node as an identifier. For instance, in our library, we will not be able to use an ISBN number as an ID, since an xs:NCName cannot start with a digit, nor an author's name, since whitespace is prohibited. We will therefore need to create new IDs and derive their values from existing nodes. The ISBN number “0836217462” can, for instance, be used to build the ID isbn-0836217462, and the name “Charles M. Schulz” can become the ID au-Charles_M._Schulz. Adding a prefix (isbn-, au-, etc.) is also a way to avoid a collision between IDs used for different element types.
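
The fragment below sketches these lexical rules side by side. It assumes that every identifier attribute shown is declared with the xs:ID type; the identifier attribute on author and the library wrapper element are illustrative assumptions, not declarations taken from our schema:

```xml
<library>
  <!-- Not a valid xs:ID: an NCName cannot start with a digit -->
  <book identifier="0836217462"/>
  <!-- Not a valid xs:ID: an NCName cannot contain whitespace -->
  <author identifier="Charles M. Schulz"/>

  <!-- Valid: the prefix supplies a leading letter and separates ID families -->
  <book identifier="isbn-0836217462"/>
  <!-- Valid: each space is replaced with an underscore -->
  <author identifier="au-Charles_M._Schulz"/>
</library>
```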

These IDs can be used to define either attributes or elements; however, the Recommendation reminds us that if we want to maintain compatibility with XML 1.0 IDs and IDREFs, they should be used only for attributes. In both cases (elements or attributes), the contribution to the PSVI is made in a similar fashion, through an “ID/IDREF table”. Apart from maintaining compatibility with the feature as it was previously defined in XML 1.0, there is no reason to avoid using ID, IDREF, and IDREFS to define elements.
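
As a sketch of the element-based style, an identifier can be carried by a child element rather than an attribute. The author, id, and name element names below are assumptions chosen for illustration; a DTD has no equivalent for this declaration, which is why it gives up XML 1.0 compatibility:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <!-- The identifier is the content of a child element of type xs:ID -->
        <xs:element name="id" type="xs:ID"/>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A matching instance would then read `<author><id>au-Charles_M._Schulz</id><name>Charles M. Schulz</name></author>`.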

For example, to show how these styles can be combined, a book element of our library can be written as:

<book identifier="isbn-0836217462">
  <isbn>
    0836217462
  </isbn>
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author-ref ref="au-Charles_M._Schulz"/>
  <character-refs>
    ch-Peppermint_Patty ch-Snoopy ch-Schroeder ch-Lucy
  </character-refs>
</book>

The book element is identified by an identifier (ID) attribute. It references its author through the ref (IDREF) attribute of an author-ref element, and a whitespace-separated list of characters through a character-refs (IDREFS) element. The piece of schema for this element can be:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="isbn" type="xs:NMTOKEN"/>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author-ref">
        <xs:complexType>
          <xs:attribute name="ref" type="xs:IDREF" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="character-refs" type="xs:IDREFS"/>
    </xs:sequence>
    <xs:attribute name="identifier" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>
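
For the references above to resolve, every IDREF value must match an ID found somewhere in the same document. The following sketch shows how the targets could be declared; the author and character element names, their name child, and the id attribute are assumptions made for this illustration rather than declarations taken from the schema above:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical target of the ref attribute of author-ref -->
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
  <!-- Hypothetical target of each token listed in character-refs -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With these declarations, an instance containing `<author id="au-Charles_M._Schulz">` and `<character id="ch-Snoopy">` would satisfy the corresponding references made by the book element, and a reference to an ID that is absent from the document would be reported as a validation error.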