New Lessons

Although this schema describes the same document as the one in Chapter 2, it illustrates very different aspects of W3C XML Schema.

Depth Versus Modularity?

Even though we will present features to balance this fact in the next chapters (xs:complexType and xs:group), we have sacrificed the modularity of our first schema to gain the depth and structure of the second one. This is a general tendency in W3C XML Schema.

In practice, you will probably want to keep a balance between these two opposite styles and allow a certain level of depth under several global elements.
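As a rough sketch of such a balance, the fragment below keeps character as a global element, so that it can be referenced, while its children are defined locally beneath it. The datatypes and occurrence constraints are assumptions made for illustration, and the other children of book are omitted.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global element: can be reused by reference -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <!-- Local elements: depth kept under the global element -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="qualification" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Other children of book are omitted in this sketch -->
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```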

There are two cases, however, in which these two styles are not equivalent. The first is when elements with the same name need to be defined with different contents at different locations. In this case, local element definitions should be used (at all locations except one, at least), since global elements are identified by their names alone.

In our example, the element name appears both within author and character with the same datatype. We may want to define the element name with different content models in author and character, as in this instance document:

<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <isbn>
      0836217462
    </isbn>
    <title lang="en">
      Being a Dog Is a Full-Time Job
    </title>
    <author id="CMS">
      <name>
        <first>
          Charles
        </first>
        <middle>
          M.
        </middle>
        <last>
          Schulz
        </last>
      </name>
      <born>
        1922-11-26
      </born>
      <dead>
        2000-02-12
      </dead>
    </author>
    <character id="Snoopy">
      <name>
        Snoopy
      </name>
      <born>
        1950-10-04
      </born>
      <qualification>
        extroverted beagle
      </qualification>
    </character>
  </book>
</library>

Since we can define only one global element named name, we need to define at least one of the name elements locally under its parent.
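A minimal sketch of what this could look like follows. Both name elements are defined locally here, one with a structured content model and one as a simple string. The datatypes (xs:string, xs:date, xs:ID) and the choice of making middle and dead optional are assumptions made for illustration, not definitions taken from the schemas of the previous chapters.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <!-- Local definition: a structured name for authors -->
        <xs:element name="name">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <!-- Optional middle name: an assumption of this sketch -->
              <xs:element name="middle" type="xs:string" minOccurs="0"/>
              <xs:element name="last" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="dead" type="xs:date" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <!-- Local definition: a plain string name for characters -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="qualification" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because neither name is global, the two definitions never collide: each one applies only under its own parent.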

The W3C Schema for XML Schema gives several examples of elements having different types depending on their location. We will see this used in the next section in our Russian doll schema: global definitions of elements have a different type in the schema for schema than local definitions or references, even though they use the same element name (xs:element).

Tip

Whether defining elements with the same name and different datatypes is good practice or not is subject to discussion. It may be confusing for human authors and more difficult to document, but W3C XML Schema gives, through local definitions, a way to avoid any confusion for the applications that will process these documents. In our example, for instance, we have two occurrences of a name element under author and under character. It is perfectly possible to define different constraints and even contents on those two elements. Although this could be presented as overloaded element names (“character/name” versus “author/name”), I find this practice unreliable, since we often don’t have a clear and simple way to identify those two contexts.

The second case is that of recursive schemas, in which an element can be included, directly or indirectly through a child element, within an element of the same type. In this case, a flat design employing references must be used since the depth of these recursive structures is unlimited.

W3C XML Schema itself offers several examples of such elements: local definitions of elements can be recursively nested, as they are in our second schema. The schema for schema must therefore use a flat design with a reference mechanism, since these elements need to be referenced if we don’t want to limit the maximum depth of the structure. (The actual mechanism used in this case involves an element group, a feature we have not seen yet but that is equivalent to an actual reference to an element.)
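To make the recursive case concrete, here is a small sketch using a hypothetical section element, which is not part of our library vocabulary. The element is defined globally and refers back to itself, so nothing in the schema fixes the nesting depth.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "section" is a hypothetical element used only for illustration -->
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- The reference points back to the global definition,
             so sections can nest to any depth -->
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Writing the same structure with purely local definitions would require repeating the definition of section once per level of nesting, which is why the depth would have to be bounded.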

Russian Doll and Object-Oriented Design

The style of defining elements and attributes locally is often called the Russian doll design, since the definition of each element is embedded in the definition of its parent, in the same way Russian dolls are embedded into each other.

If we look at the Russian dolls through our object-oriented lenses, we may say that the objects are now created locally where they are needed, as opposed to being created globally and cloned when we need them (which was the case in our first schema).

At this point, we still need to learn how we can create types that are the equivalent of classes of objects and containers, and that will let us manipulate sets of objects.

Where Have the Element Types Gone?

Those of you who are familiar with XML (or SGML) and its DTDs are used to identifying elements through the term “element type.” The XML 1.0 Recommendation states that “each element has a type, identified by name.” This is further disambiguated by the namespaces specification, which explains that “an XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names.”

A surprising feature of our Russian doll schema is that this fundamental notion of element type has completely disappeared, and there is no way to tell what the element type “name” is. Two different elements have been defined with a name equal to name. They have independent definitions, which are identical in our example but could be different, for instance if we had decomposed the first, middle, and last names for authors but not for characters. The notion of an element type “name” doesn’t mean anything if we do not specify the context in which it is used.

This loss has so little importance that few people have even noticed it. There are some situations where we need to identify elements, though, for instance to document XML vocabularies. A convenient way to write a reference manual for an XML vocabulary is to write an index of the element names with their definitions. This becomes much more complex when there is no clear match between element types and their definitions and content models.

Tip

RDF is another application that relies on element types. RDF uses element types to identify elements as objects in its triples. The element “name” of the namespace http://dyomedea.com/ns is identified as http://dyomedea.com/ns#name. Cutting the link between element types and their schema definition makes it difficult, if not impossible, to answer basic questions, such as what’s the content model of http://dyomedea.com/ns#name, and where can I find its definition.

I was confronted with this issue when writing the reference guide of this book, since the W3C XML Schema for W3C XML Schema uses many local element definitions. I came to the conclusion that the fact that the same element type (such as xs:restriction, which we will see later on) can have different content models with different semantics, depending on its location in a schema, adds a significant amount of difficulty in understanding the language and reading a schema.