Complex Content Models

Restricting or extending simple content models is useful, but XML would be of limited use without more complex models.

Complex content models are created by defining the list (and order) of their elements and attributes. We have already seen a couple of examples of complex content models, defined as local complex types in Chapter 1 and Chapter 2:

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

These examples show the basic structure of a complex type with a complex content definition: the xs:complexType element holds the definition. Here, this definition is local (xs:complexType is not top-level, since it is included under an xs:element element) and, thus, anonymous. Under xs:complexType, we find the sequence of child elements (xs:sequence) and the list of attributes.

In these examples, the xs:sequence elements act as “compositors,” and the xs:element elements included in xs:sequence act as “particles.” This simple scenario may be extended using other compositors and particles.

W3C XML Schema defines three different compositors: xs:sequence, to define ordered lists of particles; xs:choice, to define a choice of one particle among several; and xs:all, to define an unordered list of particles. The xs:sequence and xs:choice compositors can define their own number of occurrences using the minOccurs and maxOccurs attributes, and they can be used as particles (some important restrictions apply to xs:all, which cannot be used as a particle, as we will see in the next section).
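
As a minimal sketch of a compositor that carries its own number of occurrences and acts as a particle, the following declaration nests a repeatable xs:choice inside an xs:sequence. The element names remarks, heading, para, and quote are hypothetical and are not part of the library vocabulary:

<xs:element name="remarks">
  <xs:complexType>
    <xs:sequence>
      <!-- a single mandatory heading comes first -->
      <xs:element name="heading" type="xs:token"/>
      <!-- the choice is a particle of the sequence and may repeat -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="para" type="xs:token"/>
        <xs:element name="quote" type="xs:token"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Under these assumptions, a heading may be followed by any number of para and quote elements, freely mixed.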

The particles are xs:element, xs:sequence, xs:choice, plus xs:any and xs:group, which we will see later in the section. The ability to include compositors within compositors is key to defining complex structures, although it is unfortunately subject to the allergy of W3C XML Schema for “nondeterminism.”

To give an idea of the kind of structures that can be defined, let’s suppose that the names in our library may be expressed in two different ways: either as a name element, as we have shown up to now, or as three different elements to define the first, middle, and last name (the middle name should be optional). Names could then be expressed as one of the three following combinations:

<first-name>
  Charles
</first-name>
<middle-name>
  M
</middle-name>
<last-name>
  Schulz
</last-name>

or:

<first-name>
  Peppermint
</first-name>
<last-name>
  Patty
</last-name>

or:

<name>
  Snoopy
</name>

To describe this, we will replace the reference to the name element with a choice between either a name element or a sequence of first-name, middle-name (optional), and last-name. The definition of author then becomes:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:choice>
        <xs:element ref="name"/>
        <xs:sequence>
          <xs:element ref="first-name"/>
          <xs:element ref="middle-name" minOccurs="0"/>
          <xs:element ref="last-name"/>
        </xs:sequence>
      </xs:choice>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

The name element also appears in the character element, and a copy/paste could be used to replace it with the xs:choice structure, but we would rather take this opportunity to introduce a new feature that is very handy for manipulating reusable sets of elements.

Element and attribute groups are containers in which sets of elements and attributes may be embedded and manipulated as a whole. These simple and flexible structures are very convenient for defining bits of content models that can be reused in multiple locations, such as the xs:choice structure that we created for our name.

The first step is to define the element group. The definition needs to be named and global (i.e., immediately under the xs:schema element) and has the following form:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="first-name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>

These groups can then be used by reference as particles within compositors:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

Groups of attributes can be created in the same way using xs:attributeGroup:

<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
</xs:attributeGroup>
             
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>
</xs:element>

Let’s try a new example to illustrate one of the most constraining limitations of W3C XML Schema. We may want to describe all the pages of our books and to have a different description using different elements, such as odd-page and even-page for odd and even pages that require a different pagination. We can try to describe the new content model in the following group:

<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <xs:element ref="odd-page" minOccurs="0"/>
  </xs:sequence>
</xs:group>

This seems like a simple, smart way to describe the sequences of odd and even pages: a sequence of odd and even pages, possibly followed by a last odd page. The model covers books with an odd or even number of pages as well as tiny booklets with a single page. Neither XSV nor Xerces appears to enjoy it, though:

XSV:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd 
first-ambigous.xml
using xsv (default)
<?xml version='1.0'?>
<xsv docElt='{None}library' instanceAssessed='true' instanceErrors='0' 
rootType='[Anonymous]' schemaDocs='first-ambigous.xsd' schemaErrors='1' 
target='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xml' 
validation='strict' version='XSV 1.203.2.20/1.106.2.11 of 2001/11/01 17:07:43' 
xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd' 
outcome='success' source='command line'/>
<schemaError char='7' line='65' phase='instance' 
resource='file:///home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'>
non-deterministic content model for type None: {None}:odd-page/{None}:odd-page
</schemaError>
</xsv>

Xerces:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd 
-p xerces-cvs first-ambigous.xml
using xerces-cvs
startDocument
[Error] first-ambigous.xml:2:10: Error: cos-nonambig: (,odd-page) 
and (,odd-page) violate the "Unique Particle Attribution" rule.
endDocument

Misled by the apparent flexibility of construction with compositors and particles, we violated an ancient taboo known in SGML as “ambiguous content models,” which was imported into XML’s DTDs as “nondeterministic content models,” and preserved by W3C XML Schema as the “Unique Particle Attribution Rule.”

In practice, this rule adds a significant amount of complexity to writing a W3C XML Schema, since it must be checked after the schema processor has resolved all the features that allow you to define, redefine, derive, import, reference, and substitute complex types. The Recommendation recognizes that “given the presence of element substitution groups and wildcards, the concise expression of this constraint is difficult.” Once these features have been resolved, the remaining constraint requires that a schema processor never be in doubt about which branch of the content model it is in while validating an element, looking only at that element. Even the previous example, which was as simple as possible, runs into the problem. When a schema processor meets the first odd-page element, it has no way of knowing whether the page will be followed by an even-page element (and thus which of the two odd-page particles it matches) without first looking ahead to the next element. This is a violation of the Unique Particle Attribution Rule.
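
To make the ambiguity concrete, here is a sketch of the children that the pages group would have to match for a three-page booklet. The book wrapper is only there to keep the fragment well formed, and the page elements are shown empty because their content is not defined in this section; both are assumptions:

<book>
  <!-- start of an odd-page/even-page pair, or the final odd-page?
       Both particles accept this element. -->
  <odd-page/>
  <even-page/>
  <!-- the same question arises again here -->
  <odd-page/>
</book>

A processor can settle each question only by checking whether an even-page follows, which is exactly the lookahead that the rule forbids.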

This example, adapted from an example describing a chess board, is one of the famous instances in which the content model cannot be written in a “deterministic” way. This is not always the case, and many nondeterministic constructions describe content models that may be rewritten in a deterministic fashion. We should differentiate those that are fundamentally nondeterministic from those that are only “accidentally” nondeterministic. Let’s go back to our example with a “name” sequence that can have two different content models, and imagine that instead of using first-name, we reused the name element. The content model is now either name or a sequence of name, middle-name, and last-name:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
             
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

Here again, when the processor meets a name element, it has no way of knowing (without looking ahead) if this element matches the first or the second branch of the choice. In this case, though, the content model may be simplified if we note that the name element is common to both branches and that, in fact, we now have a mandatory name element followed by an optional sequence of an optional middle-name and a mandatory last-name. The content model can then be rewritten in a deterministic way as:

<xs:group name="name">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:sequence minOccurs="0">
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:sequence>
</xs:group>

This is a slippery path, though, which frequently depends on slight nuances in the content model and leads to schemas that are very difficult to maintain and may require unsatisfactory compromises. If the requirement for the content model we have just written is changed and the name element in the second branch is no longer mandatory, then we are in trouble. The new content model is as follows:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name" minOccurs="0"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>

But this model is nondeterministic for the same reason that the previous one was, and we need to reevaluate the different possible combinations to find that the new content model can now be expressed as:

<xs:group name="name">
  <xs:choice>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:sequence minOccurs="0">
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:sequence>
    <xs:sequence>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
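
As a quick check on this rewrite, the five combinations of children that the group should accept can be listed as instance fragments. The text values are borrowed from the earlier examples purely for illustration:

<!-- 1: name alone -->
<name>Snoopy</name>

<!-- 2: name, last-name -->
<name>Peppermint</name>
<last-name>Patty</last-name>

<!-- 3: name, middle-name, last-name -->
<name>Charles</name>
<middle-name>M</middle-name>
<last-name>Schulz</last-name>

<!-- 4: middle-name, last-name -->
<middle-name>M</middle-name>
<last-name>Schulz</last-name>

<!-- 5: last-name alone -->
<last-name>Schulz</last-name>

The first three start with name and fall into the first branch of the choice; the last two start with middle-name or last-name and fall into the second branch, so the first child is enough to pick the branch.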

While useful, unordered content models have their own sets of limitations.

Unordered content models (i.e., content models that do not impose any order on the child elements) not only increase the risks of nondeterministic content models, but are also an important complexity factor for schema processors. For the sake of implementation simplicity, the Recommendation has imposed huge limitations on the xs:all element, which make it hardly usable in practice. xs:all cannot be used as a particle, but as a compositor only; xs:all cannot have a number of occurrences greater than one; the particles included within xs:all must be xs:element; and these particles must not specify numbers of occurrences greater than one.
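
These rules can be sketched with a declaration that stays within them. The contact, email, and phone names are hypothetical, and the comments restate the restrictions as they apply to W3C XML Schema 1.0:

<xs:element name="contact">
  <xs:complexType>
    <!-- xs:all is the only compositor of the type: it is not nested
         in an xs:sequence or xs:choice and does not repeat -->
    <xs:all>
      <!-- children are xs:element particles only;
           maxOccurs is left at its default of 1 -->
      <xs:element name="email" type="xs:token"/>
      <!-- making a child optional is still allowed -->
      <xs:element name="phone" type="xs:token" minOccurs="0"/>
    </xs:all>
  </xs:complexType>
</xs:element>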

To illustrate these limitations, let’s imagine we have decided to simplify the life of document producers and want to create a vocabulary that doesn’t care about the relative order of child elements. With a simple vocabulary such as the one defined in our first schema, this wouldn’t add a big burden to the applications handling our vocabulary. When you think about it, there is no special reason to impose the definition of the title of a book after its ISBN number or the definition of the list of authors before the list of characters. The first content model that may be affected by this decision is the content model of the book element:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>

Unfortunately, here the xs:sequence cannot be replaced by xs:all, since two of the children elements (author and character) have a maximum number of occurrences that is “unbounded” and thus higher than one. The second group of candidates includes the content models of author and character, which are relatively similar:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
                
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

The good news here is that both author and character match the criteria for xs:all, so we can write:

<xs:element name="author">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
                
<xs:element name="character">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

We now have two elements (author and character) in which the order of child elements is not significant. One may question, though, whether this is very useful, since this independence is not consistent throughout the schema. More importantly, we must note that we have lost a great deal of flexibility and extensibility by using an xs:all compositor. Since the maximum number of occurrences for each child element needs to be one, we can no longer, for instance, change the number of occurrences of the qualification element to accept several qualifications in different languages. And since the particles used in xs:all cannot be compositors or groups, we can’t extend the content model to accept both name and the sequence first-name, middle-name, and last-name either.

Since xs:all appears to be pretty ineffective in general, there are a couple of workarounds that may be proposed for people who would like to develop order-independent vocabularies.

The first workaround, which may be used only if you are creating your own vocabulary from scratch, is to adapt the structures of your document to the constraints of xs:all. In practice, this means that each time we have to use an xs:choice, an xs:sequence, or include elements with more than one occurrence, we will add a new element as a container. For instance, we will create containers named authors and characters that will encapsulate the multiple occurrences of author and character. The result is instance documents such as:

<?xml version="1.0"?> 
<library>
  <book id="b0836217462" available="true">
    <title lang="en">
      Being a Dog Is a Full-Time Job
    </title>
    <isbn>
      0836217462
    </isbn>
    <authors>
      <author id="CMS">
        <born>
          1922-11-26
        </born>
        <dead>
          2000-02-12
        </dead>
        <name>
          Charles M Schulz
        </name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>
          Peppermint Patty
        </name>
        <qualification>
          bold, brash and tomboyish
        </qualification>
        <born>
          1966-08-22
        </born>
      </character>
      <character id="Snoopy">
        <born>
          1950-10-04
        </born>
        <name>
          Snoopy
        </name>
        <qualification>
          extroverted beagle
        </qualification>
      </character>
      <character id="Schroeder">
        <qualification>
          brought classical music to the Peanuts strip
        </qualification>
        <name>
          Schroeder
        </name>
        <born>
          1951-05-30
        </born>
      </character>
      <character id="Lucy">
        <name>
          Lucy
        </name>
        <born>
          1952-03-03
        </born>
        <qualification>
          bossy, crabby and selfish
        </qualification>
      </character>
    </characters>
  </book>
</library>

This instance document can be described by a full schema, which could be:

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

This adaptation of the instance document is more painful if we want to implement our alternative “name” content model. Since we cannot include an xs:choice in an xs:all compositor, we have to add a first level of container, which is always the same (name), and a second level of container (simple-name or complex-name), which expresses the choice. This leads to instance documents such as:

<?xml version="1.0"?> 
<library>
  <book id="b0836217462" available="true">
    <title lang="en">
      Being a Dog Is a Full-Time Job
    </title>
    <isbn>
      0836217462
    </isbn>
    <authors>
      <author id="CMS">
        <born>
          1922-11-26
        </born>
        <dead>
          2000-02-12
        </dead>
        <name>
          <complex-name>
            <last-name>
              Schulz
            </last-name>
            <first-name>
              Charles
            </first-name>
            <middle-name>
              M
            </middle-name>
          </complex-name>
        </name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>
          <complex-name>
            <first-name>
              Peppermint
            </first-name>
            <last-name>
              Patty
            </last-name>
          </complex-name>
        </name>
        <qualification>
          bold, brash and tomboyish
        </qualification>
        <born>
          1966-08-22
        </born>
      </character>
      <character id="Snoopy">
        <born>
          1950-10-04
        </born>
        <name>
          <simple-name>
            Snoopy
          </simple-name>
        </name>
        <qualification>
          extroverted beagle
        </qualification>
      </character>
      <character id="Schroeder">
        <qualification>
          brought classical music to the Peanuts strip
        </qualification>
        <name>
          <simple-name>
            Schroeder
          </simple-name>
        </name>
        <born>
          1951-05-30
        </born>
      </character>
      <character id="Lucy">
        <name>
          <simple-name>
            Lucy
          </simple-name>
        </name>
        <born>
          1952-03-03
        </born>
        <qualification>
          bossy, crabby and selfish
        </qualification>
      </character>
    </characters>
  </book>
</library>

The adaptation of the schema is then straightforward and could be (keeping a flat design):

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="simple-name" type="xs:token"/>
  <xs:element name="first-name" type="xs:token"/>
  <xs:element name="middle-name" type="xs:token"/>
  <xs:element name="last-name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="simple-name"/>
        <xs:element ref="complex-name"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="complex-name">
    <xs:complexType>
      <xs:all>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

This process may be generalized and used for purposes other than adapting instance documents to the constraints of xs:all. It is interesting to note that we have “externalized” the complexity, which was previously hidden from the instance document in the schema, to bring the full structure of the content model into the instance document itself. The choices and sequences (an element with multiple occurrences is nothing more than an implicit sequence) are now expressed through containers in the instance documents. Since the structure is more apparent in the instance documents, it can be considered more readable; some people find it a good practice to use such containers.

Complex contents can also be derived, by extension or by restriction, from complex types. Before we see the details of these mechanisms, note that they are not symmetrical and that their semantics are very different. The derivation of a complex content by restriction is a restriction of the set of matching instances. All the instance structures that match the restricted complex type must also match the base complex type. The derivation of a complex content by extension of a complex type is an extension of the content model by addition of new particles. A content that matches the base type does not necessarily match the extended complex type. This also means that there is no “roundtrip”: in the general case, neither a restricted complex type nor an extended type can be extended or restricted back into its base type.

Derivation by extension is similar to the extension of simple content complex types. It is functionally very similar to joining groups of elements and attributes to create a new complex type. The idea behind this feature is to let people add new elements and attributes after those already defined in the base type. This is virtually equivalent to creating a sequence with the current content model followed by the new content model. Let’s go back to our library to illustrate this. The content models of our elements author and character are relatively similar: author expects name, born, and dead, while character expects name, born, and qualification. If we want to use a derivation by extension, we can first create a base type that contains the first elements common to the content model of both elements:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

It is then possible to use derivations by extension to append new elements (dead for author and qualification for character) after those that have already been defined in the base type:

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Technically, the meaning of this derivation is equivalent to creating a sequence containing the compositor used to define the base type followed by the compositor included in the xs:extension element. Thus, the content models of these elements are similar to the content models defined as:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

This equivalence clearly shows the nature of this derivation mechanism. As stated in the introduction to complex content derivation mechanisms, this is not an extension of the set of valid instance structures. A character element, with its mandatory qualification, does not have a valid basePerson content model; its content model is the merge of two content models. This merge is itself subject to limitations: you cannot choose the point where the new content model is inserted, since it is always appended after the compositor of the base type. In our example, if the common elements name and born were not the first two elements, we couldn’t have used a derivation by extension.
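
For illustration, instance fragments along the following lines should match the derived content models, assuming the element declarations used so far (name as xs:token, born and dead as xs:date). The values are taken from the earlier instance documents:

<!-- base part (name, born) first, then the appended dead -->
<author id="CMS">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- base part (name, born) first, then the appended qualification -->
<character id="Snoopy">
  <name>Snoopy</name>
  <born>1950-10-04</born>
  <qualification>extroverted beagle</qualification>
</character>

Placing qualification before name or born would not match, since the appended particles can only come after those of the base type.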

Another caveat in derivations by extension is that we can’t choose the compositor that is used to merge the two content models. This means that when we derive content models that use xs:choice as their compositor, the scope of the choice is not extended; instead, the choices are included in an xs:sequence. We could, for instance, extend the content model of the element persons, which we just created and which could be defined as a global complex type:

<xs:complexType name="basePersons">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
  </xs:choice>
</xs:complexType>

If we add a new element using a derivation by extension:

<xs:complexType name="persons">
  <xs:complexContent>
    <xs:extension base="basePersons">
      <xs:sequence> 
        <xs:element name="editor" type="xs:token" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

The result is a content type that is equivalent to:

<xs:complexType name="personsEquivalent">
  <xs:sequence>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
    <xs:sequence> 
      <xs:element name="editor" type="xs:token" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:sequence>
</xs:complexType>

There is no way to obtain an extension of the xs:choice such as:

<xs:complexType name="personsAsWeWouldHaveLiked">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
    <xs:element name="editor" type="xs:token"/>
  </xs:choice>
</xs:complexType>
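
The difference shows up in instance documents. In the following sketch, which assumes that a persons element is declared with the extended persons type and which uses purely illustrative values, the first fragment is accepted because all the editor elements follow the authors and characters. The second fragment would be accepted only by the model we would have liked, since an author appears after an editor:

<!-- Valid per the extended type: editors come last -->
<persons>
  <character id="c1">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <author id="a1">
    <name>Charles M. Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
  </author>
  <editor>Example Press</editor>
</persons>

<!-- Invalid per the extended type: an author follows an editor -->
<persons>
  <editor>Example Press</editor>
  <author id="a1">
    <name>Charles M. Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
  </author>
</persons>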

The situation with xs:all is even worse: the restrictions on the composition of xs:all still apply. This means you can’t add any content to a complex type defined with an xs:all (although you can still add new attributes), and you can only use an xs:all compositor in a derivation by extension if the base type has an empty content model.
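
As a minimal sketch of the one extension that remains possible, assume a hypothetical base type basePersonAll defined with xs:all (it is not part of our library schema). The derived type, also hypothetical, adds only an attribute and leaves the content model untouched:

<xs:complexType name="basePersonAll">
  <xs:all>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:all>
</xs:complexType>

<xs:complexType name="basePersonAllWithId">
  <xs:complexContent>
    <xs:extension base="basePersonAll">
      <!-- Only attributes may be added; no compositor is allowed here -->
      <xs:attribute ref="id"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>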

Whereas derivation by extension is similar to merging two content models through an xs:sequence compositor, derivation by restriction is a restriction of the number of instance structures matching the complex type. In this respect, it is similar to the derivation by restriction of simple datatypes or simple content complex types (even though we’ve seen that a facet such as xs:whiteSpace expanded the number of instance documents matching a simple type). Note that this is the only similarity between derivations by restriction of simple and complex datatypes. This is highly confusing, since W3C XML Schema uses the same word and even the same element name in both cases, but the word has a different meaning in each case and the content models of the xs:restriction elements are different.

Unlike simple type derivation, there are no facets to apply to complex types, and the derivation is done by defining the full content model of the derived datatype, which must be a logical restriction of the base type. Any instance structure valid per the derived datatype must also be valid per the base datatype. The W3C XML Schema specification does not define the derivation by restriction in these terms, but defines a formal algorithm to be followed by schema processors, which is roughly equivalent.

The derivation by restriction of a complex type is a declaration of intention that the derived type is a subset of the base type, rather than a derivation of the kind we’ve seen for simple types. This declaration is needed for features allowing substitutions and redefinitions of types, which we will see in Chapter 8 and Chapter 12, and it may provide useful information to some applications. When we derive simple types, we can take a base type without having to care about the details of the facets that are already applied, and just add our own set of facets. Here, on the contrary, we need to provide a full definition of the content model. Attributes are the exception: they are inherited unless redeclared, and they can be declared as “prohibited” to exclude them from the derived type, something we have seen for the restriction of complex types with simple contents.

Moving on, let’s try to find a base from which we can derive both the author and character elements by restriction. This time, we can be sure that such a complex type exists, since all complex types can be derived from xs:anyType, which allows any elements and attributes. In practice, however, we will try to find the most restrictive base type that can accommodate our needs. Since the name and born elements are present in both author and character, with the same number of occurrences, we can keep them as they appear. We then have two elements, dead and qualification, each of which appears in only one of the two elements author and character. Since both author and character will need to be valid per the base type, we will include both of them in the base type but make them optional by giving them a minOccurs attribute equal to 0. Our base type can then be:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

The derivations are then done by defining the content model within an xs:restriction element (note that we have not repeated the attribute declarations, which are not modified):

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

We see here that the syntax of a derivation by restriction is more verbose than the syntax of the straight definition of the content model. The purpose of this derivation is not to build modular schemas, but rather to give applications that use this schema the indication that there is some commonality between the content models, and if they know how to handle the complex type “person,” they can handle the elements author and character. We will see W3C XML Schema features that rely on this derivation method in Chapter 8 and Chapter 12.
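
The exclusion of an attribute mentioned earlier can be sketched against the same person base type. The derived type below is hypothetical (the name anonymousPerson is ours); it repeats the full content model and declares the id attribute as prohibited:

<xs:complexType name="anonymousPerson">
  <xs:complexContent>
    <xs:restriction base="person">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
        <xs:element ref="qualification" minOccurs="0"/>
      </xs:sequence>
      <!-- The id attribute inherited from person is no longer allowed -->
      <xs:attribute ref="id" use="prohibited"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>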

Changing the number of occurrences of particles is not the only modification that can be done during a derivation by restriction. Other operations that result in a reduction of the number of valid instance structures are also possible, such as changing a simple type to a more restrictive one or fixing values. The main constraint in this mechanism is that each particle of the derived type must be an explicit derivation of the corresponding particle of the base type. The effect of this statement is to limit the “depth” of the restrictions that can be performed in a single step: when we need to restrict particles at a deeper level of nesting, we may have to transform local definitions into global ones. We will see a concrete example in Section 7.5.1, which is similar in this respect.
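
As a minimal sketch of these other operations, assume two hypothetical complex types, looseCharacter and heroCharacter, built with local element declarations (they are not part of our library schema). The derived type narrows xs:string to xs:token, makes qualification mandatory, and fixes its value:

<xs:complexType name="looseCharacter">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="qualification" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="heroCharacter">
  <xs:complexContent>
    <xs:restriction base="looseCharacter">
      <xs:sequence>
        <!-- xs:token is itself derived by restriction from xs:string -->
        <xs:element name="name" type="xs:token"/>
        <!-- Now mandatory, with a more restrictive type and a fixed value -->
        <xs:element name="qualification" type="xs:token" fixed="hero"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

Each particle of the derived type matches the particle of the same name in the base type, so every instance valid per heroCharacter is also valid per looseCharacter.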

We now have all the elements we need to look back at the claim about the asymmetry of these derivation methods. This lack of symmetry is not a defect as such, but studying it is a good exercise for understanding the meaning of these two derivation methods. Let’s examine the derivation by extension of basePerson into the character element:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

The content model of character contains a mandatory qualification element. Valid characters are not valid per basePerson; thus, there is no hope of deriving character back into basePerson by restriction, since all the instance structures that are valid per the derived type must be valid per the base type in a derivation by restriction.
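
To make this concrete, here is a sketch of an instance fragment, with illustrative values and assuming that name, born, and qualification hold simple text. It is valid per the character content model, but the qualification element has no counterpart in basePerson, so the same content is not valid per the base type:

<character id="c1">
  <name>Snoopy</name>
  <born>1950-10-04</born>
  <!-- Required by character, not allowed by basePerson -->
  <qualification>extroverted beagle</qualification>
</character>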

Let’s look back at the derivation by restriction of the person base type into a character element:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Again, it is not possible to derive the complex type of character into person, since it means changing the number of minimum occurrences of qualification from 1 to 0 and adding an optional dead element between born and qualification. Neither of these operations is possible during a derivation by extension, which can only append new content after the content of the base type, and can neither update an existing particle (to change the number of occurrences) nor insert a new particle between two existing particles.