Restricting or extending simple content models is useful, but XML is not very useful without more complex models.
Complex content models are created by defining the list (and order) of their elements and attributes. We have already seen a couple of examples of complex content models, defined as local complex types, in Chapter 1 and Chapter 2:
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
These examples show the basic structure of a complex type with a complex content definition: the xs:complexType element holds the definition. Here, this definition is local (xs:complexType is not top-level, since it is included under an xs:element element) and, thus, anonymous. Under xs:complexType, we find the sequence of child elements (xs:sequence) and the list of attributes.
In these examples, the xs:sequence elements act as “compositors,” and the xs:element elements included in xs:sequence play the role of “particles.” This simple scenario may be extended using other compositors and particles.
W3C XML Schema defines three different compositors: xs:sequence, to define ordered lists of particles; xs:choice, to define a choice of one particle among several; and xs:all, to define unordered lists of particles. The xs:sequence and xs:choice compositors can define their own number of occurrences using the minOccurs and maxOccurs attributes, and they can be used as particles (some important restrictions apply to xs:all, which cannot be used as a particle, as we will see in the next section).
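As a sketch of compositors used as particles, here is a content model that nests an xs:choice, carrying its own occurrence attributes, inside an xs:sequence. The element names come from our library vocabulary, but this particular content model is hypothetical and is not used elsewhere in the chapter:

```xml
<!-- Hypothetical content model: isbn and title, in that order,
     followed by any number of author and character elements in any order -->
<xs:sequence>
  <xs:element ref="isbn"/>
  <xs:element ref="title"/>
  <!-- The xs:choice is itself a particle of the enclosing xs:sequence;
       its maxOccurs lets the whole choice be repeated -->
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
  </xs:choice>
</xs:sequence>
```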
The particles are xs:element, xs:sequence, and xs:choice, plus xs:any and xs:group, which we will see later in this section. The ability to include compositors within compositors is key to defining complex structures, although it is unfortunately subject to W3C XML Schema’s allergy to “nondeterminism.”
To give an idea of the kind of structures that can be defined,
let’s suppose that the names in our library may be
expressed in two different ways: either as a name
element, as we have shown up to now, or as three different elements
to define the first, middle, and last name (the middle name should be
optional). Names could then be expressed as one of the three
following combinations:
<first-name>Charles</first-name>
<middle-name>M</middle-name>
<last-name>Schulz</last-name>
or:
<first-name>Peppermint</first-name>
<last-name>Patty</last-name>
or:
<name>Snoopy</name>
To describe this, we will replace the reference to the name element with a choice between either a name element or a sequence of first-name, middle-name (optional), and last-name. The definition of author then becomes:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:choice>
        <xs:element ref="name"/>
        <xs:sequence>
          <xs:element ref="first-name"/>
          <xs:element ref="middle-name" minOccurs="0"/>
          <xs:element ref="last-name"/>
        </xs:sequence>
      </xs:choice>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
The name element also appears in the character element, and copy/paste could be used to replace it with the xs:choice structure, but we would rather take this opportunity to introduce a new feature that is very handy for manipulating reusable sets of elements.
Element and attribute groups are containers in which sets of elements and attributes may be embedded and manipulated as a whole. These simple and flexible structures are very convenient for defining bits of content models that can be reused in multiple locations, such as the xs:choice structure that we created for our name.
The first step is to define the element group. The definition needs to be named and global (i.e., immediately under the xs:schema element) and has the following form:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="first-name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
These groups can then be used by reference as particles within compositors:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
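With these definitions, both alternatives of the name group are available wherever the group is referenced. As an illustration (the data is borrowed from the examples in this chapter; the pairing of the two fragments is ours), an author may use the three-part form while a character uses the single name element:

```xml
<!-- author: the xs:sequence branch of the "name" group -->
<author id="CMS">
  <first-name>Charles</first-name>
  <middle-name>M</middle-name>
  <last-name>Schulz</last-name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- character: the single name element branch of the same group -->
<character id="Snoopy">
  <name>Snoopy</name>
  <born>1950-10-04</born>
  <qualification>extroverted beagle</qualification>
</character>
```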
Groups of attributes can be created in the same way using xs:attributeGroup:
<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
</xs:attributeGroup>

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>
</xs:element>
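The attributes of the group behave exactly as if they had been declared directly in the complex type of book. A minimal instance, as a sketch (values borrowed from the library examples; author and character are omitted since both are optional in this model):

```xml
<!-- id and available both come from the bookAttributes group -->
<book id="b0836217462" available="true">
  <isbn>0836217462</isbn>
  <title>Being a Dog Is a Full-Time Job</title>
</book>
```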
Let’s try a new example to illustrate one of the most constraining limitations of W3C XML Schema. We may want to describe all the pages of our books, using different elements, such as odd-page and even-page, for odd and even pages, which require different pagination. We can try to describe the new content model in the following group:
<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <xs:element ref="odd-page" minOccurs="0"/>
  </xs:sequence>
</xs:group>
This seems like a simple, smart way to describe the sequences of odd and even pages: a sequence of odd and even pages, optionally followed by a last odd page. The model covers books with an odd or even number of pages as well as tiny booklets with a single page. Neither XSV nor Xerces appears to like it, though:
XSV:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd first-ambigous.xml
using xsv (default)
<?xml version='1.0'?>
<xsv docElt='{None}library' instanceAssessed='true' instanceErrors='0'
     rootType='[Anonymous]' schemaDocs='first-ambigous.xsd' schemaErrors='1'
     target='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xml'
     validation='strict' version='XSV 1.203.2.20/1.106.2.11 of 2001/11/01 17:07:43'
     xmlns='http://www.w3.org/2000/05/xsv'>
  <schemaDocAttempt URI='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'
                    outcome='success' source='command line'/>
  <schemaError char='7' line='65' phase='instance'
               resource='file:///home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'>
    non-deterministic content model for type None: {None}:odd-page/{None}:odd-page
  </schemaError>
</xsv>

Xerces:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd -p xerces-cvs first-ambigous.xml
using xerces-cvs
startDocument
[Error] first-ambigous.xml:2:10: Error: cos-nonambig: (,odd-page) and (,odd-page) violate the "Unique Particle Attribution" rule.
endDocument
Misled by the apparent flexibility of construction with compositors and particles, we violated an ancient taboo known in SGML as “ambiguous content models,” which was imported into XML’s DTDs as “nondeterministic content models,” and preserved by W3C XML Schema as the “Unique Particle Attribution Rule.”
In practice, this rule adds a significant amount of complexity to writing a W3C XML Schema, since it must be checked after the schema processor has resolved all the features that allow you to define, redefine, derive, import, reference, and substitute complex types. The Recommendation recognizes that “given the presence of element substitution groups and wildcards, the concise expression of this constraint is difficult.” Once these features have been resolved, the remaining constraint requires that a schema processor never be in doubt about which branch it is in when it validates an element while looking only at that element. Our previous example, as simple as it is, breaks this rule: when a schema processor meets the first odd-page element, it has no way of knowing whether the page will be followed by an even-page element without first looking ahead to the next element. This is a violation of the Unique Particle Attribution Rule.
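To make the ambiguity concrete, consider a hypothetical three-page booklet. Page content is not defined in this chapter, so the pages are shown as empty elements for brevity; the comments trace the decision the processor would have to make:

```xml
<!-- Hypothetical content matched by the "pages" group -->
<odd-page/>  <!-- First particle of the repeated xs:sequence, or the final
                  optional odd-page? Only the next element can tell. -->
<even-page/> <!-- Now we know: the previous odd-page began the repeated xs:sequence. -->
<odd-page/>  <!-- The same question arises again for this element. -->
```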
This example, adapted from an
example describing a chess board, is one of the famous instances in
which the content model cannot be written in a
“deterministic” way. This is not
always the case, and many nondeterministic constructions describe
content models that may be rewritten in a deterministic fashion. We
should differentiate those that are fundamentally nondeterministic
from those that are only
“accidentally” nondeterministic.
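For a fundamentally nondeterministic model such as the pages group, the usual way out is a compromise. As a sketch, one can make even-page optional inside the repeated sequence: there is then a single odd-page particle, so the Unique Particle Attribution Rule is satisfied. The price is that the group now also accepts sequences that the original model rejected, such as two consecutive odd-page elements; that part of the constraint would have to be checked elsewhere (for instance, by the application):

```xml
<!-- Looser, deterministic approximation of the "pages" group.
     Accepts every sequence the original model accepts, plus some
     sequences it rejects (e.g., odd-page followed by odd-page). -->
<xs:group name="pages">
  <!-- The top-level compositor of a named group cannot carry
       minOccurs/maxOccurs, hence the nested xs:sequence -->
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page" minOccurs="0"/>
    </xs:sequence>
  </xs:sequence>
</xs:group>
```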
Let’s go back to our example with a “name” group that can have two different content models, and imagine that instead of using first-name, we reused the name element. The content model is now either name or a sequence of name, middle-name, and last-name:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
Here again, when the processor meets a name element, it has no way of knowing (without looking ahead) whether this element matches the first or the second branch of the choice. In this case, though, the content model may be simplified if we note that the name element is common to both branches and that we in fact have a mandatory name element followed by an optional sequence of an optional middle-name and a mandatory last-name. The content model can then be rewritten in a deterministic way as:
<xs:group name="name">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:sequence minOccurs="0">
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:sequence>
</xs:group>
This is a slippery path, though: it frequently depends on slight nuances in the content model, leads to schemas that are very difficult to maintain, and may require unsatisfactory compromises. If the requirements for the content model we have just written change, and the name element in the second branch is no longer mandatory, then we are in trouble. The new content model is as follows:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name" minOccurs="0"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
But this model is nondeterministic for the same reason that the previous one was, and we need to reevaluate the different possible combinations to find that the new content model can now be expressed as:
<xs:group name="name">
  <xs:choice>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:sequence minOccurs="0">
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:sequence>
    <xs:sequence>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
Formal theories and algorithms can rewrite nondeterministic content models in a deterministic way when possible. Hopefully, W3C XML Schema development tools will integrate some of these algorithms to propose an alternative when a schema author creates nondeterministic content models.
Ambiguous content models were already a controversial issue in the SGML community in the 1990s, and the restriction was maintained in XML DTDs under the name “nondeterministic content models” despite the dissent of Tim Bray, Jean Paoli, and Peter Sharpe, three influential members of the XML Special Interest Group who wanted to maintain compatibility with SGML parsers. The motivation for maintaining the restriction in W3C XML Schema is to keep schema processors simple to implement and to allow implementations through finite state machines (FSM). The execution time of these automata could grow exponentially when the Unique Particle Attribution Rule is violated. This decision has been heavily criticized by experts including Joe English, James Clark, and Murata Makoto, who have proved that other simple algorithms can be used that keep the processing time linear when this rule is not met. This is also one of the main differences in descriptive power between W3C XML Schema and schema languages such as RELAX, TREX, and RELAX NG, which do not impose this rule.
Although, strictly speaking, they are not related, the Unique Particle Attribution Rule and the Consistent Declaration Rule are often associated since, in practice, when the Consistent Declaration Rule is violated, the Unique Particle Attribution Rule is often violated too. The Consistent Declaration Rule is much easier to explain and understand: it simply states that W3C XML Schema explicitly forbids choices between elements with the same name and different types, such as in the following:
<xs:choice>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:choice>
In Chapter 11, we will see a workaround, based on the xsi:type attribute, that some applications may use.
While useful, unordered content models have their own sets of limitations.
Unordered content models (i.e., content models that do not impose any order on the child elements) not only increase the risk of nondeterministic content models, but are also an important complexity factor for schema processors. For the sake of implementation simplicity, the Recommendation has imposed severe limitations on the xs:all element, which make it hard to use in practice: xs:all cannot be used as a particle, only as a compositor; xs:all cannot have a number of occurrences greater than one; the particles included within xs:all must be xs:element; and these particles must not specify numbers of occurrences greater than one.
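These rules can be summarized in a short sketch. The type name allowedAll is hypothetical; the forbidden constructs are kept inside a comment, since each of them would be rejected by a W3C XML Schema 1.0 processor:

```xml
<!-- Allowed: xs:all as the top-level compositor of a complex type,
     occurring at most once, containing only xs:element particles,
     each occurring at most once -->
<xs:complexType name="allowedAll">
  <xs:all minOccurs="0">
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
  </xs:all>
</xs:complexType>

<!-- Forbidden:
     1. xs:all used as a particle:
          <xs:sequence> <xs:all> ... </xs:all> </xs:sequence>
     2. xs:all occurring more than once:
          <xs:all maxOccurs="unbounded"> ... </xs:all>
     3. a compositor or group inside xs:all:
          <xs:all> <xs:choice> ... </xs:choice> </xs:all>
     4. an element occurring more than once inside xs:all:
          <xs:all> <xs:element ref="author" maxOccurs="unbounded"/> </xs:all> -->
```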
To illustrate these limitations, let’s imagine we have decided to simplify the life of document producers and want to create a vocabulary that doesn’t care about the relative order of child elements. With a simple vocabulary such as the one defined in our first schema, this wouldn’t add a big burden to the applications handling our vocabulary. When you think about it, there is no special reason to require the title of a book to come after its ISBN or the list of authors to come before the list of characters. The first content model that may be affected by this decision is the content model of the book element:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>
Unfortunately, here the xs:sequence cannot be replaced by xs:all, since two of the child elements (author and character) have a maximum number of occurrences that is “unbounded” and thus higher than one. The second group of candidates includes the content models of author and character, which are relatively similar:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
The good news here is that both author and character match the criteria for xs:all, so we can write:
<xs:element name="author">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

<xs:element name="character">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
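With these definitions, the children of author and character may appear in any order. For example, both of the following fragments (data taken from the instance documents used in this chapter) would be valid, even though neither follows the order of the declarations:

```xml
<!-- dead, name, born: any order is accepted by xs:all -->
<author id="CMS">
  <dead>2000-02-12</dead>
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
</author>

<!-- born, name, qualification: also accepted -->
<character id="Snoopy">
  <born>1950-10-04</born>
  <name>Snoopy</name>
  <qualification>extroverted beagle</qualification>
</character>
```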
We now have two elements (author and character) in which the order of child elements is not significant. One may question, though, whether this is very useful, since this order independence is not consistent throughout the schema. More importantly, we must note that we have lost a great deal of flexibility and extensibility by using an xs:all compositor. Since the maximum number of occurrences of each child element needs to be one, we can no longer, for instance, change the number of occurrences of the qualification element to accept several qualifications in different languages. And since the particles used in xs:all cannot be compositors or groups, we can’t extend the content model to accept a choice between name and the sequence of first-name, middle-name, and last-name either.
Since xs:all appears to be of limited use in general, a couple of workarounds may be proposed for people who would like to develop order-independent vocabularies.
The first workaround, which may be used only if you are creating your own vocabulary from scratch, is to adapt the structures of your document to the constraints of xs:all. In practice, this means that each time we would need an xs:choice or an xs:sequence, or would include elements with more than one occurrence, we add a new element as a container. For instance, we will create containers named authors and characters that will encapsulate the multiple occurrences of author and character. The result is instance documents such as:
<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <isbn>0836217462</isbn>
    <authors>
      <author id="CMS">
        <born>1922-11-26</born>
        <dead>2000-02-12</dead>
        <name>Charles M Schulz</name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>Peppermint Patty</name>
        <qualification>bold, brash and tomboyish</qualification>
        <born>1966-08-22</born>
      </character>
      <character id="Snoopy">
        <born>1950-10-04</born>
        <name>Snoopy</name>
        <qualification>extroverted beagle</qualification>
      </character>
      <character id="Schroeder">
        <qualification>brought classical music to the Peanuts strip</qualification>
        <name>Schroeder</name>
        <born>1951-05-30</born>
      </character>
      <character id="Lucy">
        <name>Lucy</name>
        <born>1952-03-03</born>
        <qualification>bossy, crabby and selfish</qualification>
      </character>
    </characters>
  </book>
</library>
This instance document can be described by a full schema such as:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
This adaptation of the instance document is more painful if we want to implement our alternative “name” content model. Since we cannot include an xs:choice in an xs:all compositor, we have to add a first level of container (name), which is always the same, and a second level of container (simple-name or complex-name), which holds only the chosen alternative. This leads to instance documents such as:
<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <isbn>0836217462</isbn>
    <authors>
      <author id="CMS">
        <born>1922-11-26</born>
        <dead>2000-02-12</dead>
        <name>
          <complex-name>
            <last-name>Schulz</last-name>
            <first-name>Charles</first-name>
            <middle-name>M</middle-name>
          </complex-name>
        </name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>
          <complex-name>
            <first-name>Peppermint</first-name>
            <last-name>Patty</last-name>
          </complex-name>
        </name>
        <qualification>bold, brash and tomboyish</qualification>
        <born>1966-08-22</born>
      </character>
      <character id="Snoopy">
        <born>1950-10-04</born>
        <name>
          <simple-name>Snoopy</simple-name>
        </name>
        <qualification>extroverted beagle</qualification>
      </character>
      <character id="Schroeder">
        <qualification>brought classical music to the Peanuts strip</qualification>
        <name>
          <simple-name>Schroeder</simple-name>
        </name>
        <born>1951-05-30</born>
      </character>
      <character id="Lucy">
        <name>
          <simple-name>Lucy</simple-name>
        </name>
        <born>1952-03-03</born>
        <qualification>bossy, crabby and selfish</qualification>
      </character>
    </characters>
  </book>
</library>
The adaptation of the schema is then straightforward and could be (keeping a flat design):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="simple-name" type="xs:token"/>
  <xs:element name="first-name" type="xs:token"/>
  <xs:element name="middle-name" type="xs:token"/>
  <xs:element name="last-name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="simple-name"/>
        <xs:element ref="complex-name"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="complex-name">
    <xs:complexType>
      <xs:all>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
This process may be generalized and used for purposes other than adapting instance documents to the constraints of xs:all. It is interesting to note that we have “externalized” the complexity, which was previously hidden from the instance document in the schema, bringing the full structure of the content model into the instance document itself. The choices and sequences (an element with multiple occurrences is nothing more than an implicit sequence) are now expressed through containers in the instance documents. Since the structure is more apparent in the instance documents, they can be considered more readable; some people consider the use of such containers a good practice.
When it is not possible or not practical to adapt the structure of a document to the limitations of xs:all, another workaround is to replace xs:all compositors with xs:choice, when possible. This trick is far less generic than the adaptation of structures we just saw, and it may be surprising that two compositors with very different meanings could be “interchanged.” It applies only when loose control over the number of occurrences is acceptable, such as in a container that accepts both author and character elements in any order with any number of occurrences. Such a container can be defined as:
<xs:element name="persons">
  <xs:complexType>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
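Each repetition of the choice picks one element, so author and character elements can be freely interleaved. As a sketch (data borrowed from the instance documents above), the following would be valid, and so would an empty persons element, since the choice has minOccurs="0":

```xml
<!-- character, author, character: any order, any number -->
<persons>
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <author id="CMS">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
  </author>
  <character id="Lucy">
    <name>Lucy</name>
    <born>1952-03-03</born>
    <qualification>bossy, crabby and selfish</qualification>
  </character>
</persons>
```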
This definition has the same meaning as the following xs:all definition, which is forbidden:
<xs:element name="persons">
  <xs:complexType>
    <xs:all>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:element>
Complex content can also be derived from complex types, by extension or by restriction. Before we see the details of these mechanisms, note that they are not symmetrical and that their semantics are very different. Deriving complex content by restriction restricts the set of matching instances: all the instance structures that match the restricted complex type must also match the base complex type. Deriving complex content by extension extends the content model by adding new particles: content that matches the base type does not necessarily match the extended complex type. This also means that there is no “roundtrip”: in the general case, neither a restricted complex type nor an extended type can be extended or restricted back into its base type.
Derivation by extension is similar to the extension of simple content complex types. Functionally, it is very similar to joining groups of elements and attributes to create a new complex type. The idea behind this feature is to let people add new elements and attributes after those already defined in the base type. This is virtually equivalent to creating a sequence with the current content model followed by the new content model. Let’s go back to our library to illustrate this. The content models of our author and character elements are relatively similar: author expects name, born, and dead, while character expects name, born, and qualification. If we want to use a derivation by extension, we can first create a base type that contains the first elements common to the content models of both elements:
<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
It is then possible to use derivations by extension to append new elements (dead for author and qualification for character) after those that have already been defined in the base type:
<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
Technically, this derivation is equivalent to creating a sequence that contains the compositor used to define the base type followed by the compositor included in the xs:extension element. Thus, the content models of these elements are similar to the content models defined as:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
This equivalence clearly shows the nature of this derivation mechanism. As stated in the introduction to complex content derivation, this is not an extension of the set of valid instance structures. A character element, with its mandatory qualification, is not valid per the basePerson content model; its content model is rather the merge of two content models. This merge is itself subject to limitations: you cannot choose the point where the new content model is inserted, since the addition is always done by appending the new compositor after the one of the base type. In our example, if the common elements name and born were not the first two elements, we couldn’t have used a derivation by extension.
Another caveat in derivations by extension is that we can’t choose the compositor used to merge the two content models. This means that when we derive content models that use xs:choice as their compositor, it is not the scope of the choice that is extended; rather, the choice is included in an xs:sequence followed by the new particles. We could, for instance, extend the content model of the persons element, which we just created and which could be defined as a global complex type:
<xs:complexType name="basePersons">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
  </xs:choice>
</xs:complexType>
If we add a new element using a derivation by extension:
<xs:complexType name="persons">
  <xs:complexContent>
    <xs:extension base="basePersons">
      <xs:sequence>
        <xs:element name="editor" type="xs:token" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
The result is a content type that is equivalent to:
<xs:complexType name="personsEquivalent">
  <xs:sequence>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
    <xs:sequence>
      <xs:element name="editor" type="xs:token" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:sequence>
</xs:complexType>
There is no way to obtain an extension of the xs:choice such as:
<xs:complexType name="personsAsWeWouldHaveLiked">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
    <xs:element name="editor" type="xs:token"/>
  </xs:choice>
</xs:complexType>
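The difference shows up on instances. Assuming, for illustration, that the persons element is declared with the persons type, the first fragment below is accepted; the second is rejected by persons but would be accepted by personsAsWeWouldHaveLiked. Element contents are abbreviated as "..." and the id values are hypothetical.

```xml
<!-- Accepted by persons: any mix of author and character elements
     first, then the editor elements -->
<persons>
  <author id="a1">...</author>
  <character id="c1">...</character>
  <author id="a2">...</author>
  <editor>...</editor>
  <editor>...</editor>
</persons>

<!-- Rejected by persons: once an editor appears, the repeated
     xs:choice is over, so a following author is not allowed -->
<persons>
  <author id="a1">...</author>
  <editor>...</editor>
  <author id="a2">...</author>
</persons>
```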
The situation with xs:all is even worse: the restrictions on the composition of xs:all still apply. This means you can’t add any content to a complex type defined with an xs:all (although you can still add new attributes), and you can only use an xs:all compositor in a derivation by extension if the base type has an empty content model.
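A minimal sketch of the one case that works under W3C XML Schema 1.0: the base type identified (a hypothetical name) declares only an attribute and therefore has an empty content model, so the derived type identifiedPerson (also hypothetical) can bring in an xs:all. The last definition shows the case a schema processor rejects, because basePerson already has element content.

```xml
<!-- Hypothetical base type: an attribute only, so its content is empty -->
<xs:complexType name="identified">
  <xs:attribute ref="id"/>
</xs:complexType>

<!-- Accepted: the xs:all becomes the whole content model -->
<xs:complexType name="identifiedPerson">
  <xs:complexContent>
    <xs:extension base="identified">
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Rejected: the xs:all would have to be appended inside a sequence
     after the content of basePerson, which xs:all does not allow -->
<xs:complexType name="rejectedExtension">
  <xs:complexContent>
    <xs:extension base="basePerson">
      <xs:all>
        <xs:element ref="qualification"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```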
Whereas derivation by extension is similar to merging two content models through an xs:sequence compositor, derivation by restriction reduces the number of instance structures matching the complex type. In this respect, it is similar to the derivation by restriction of simple datatypes or of complex types with simple content (even though we’ve seen that a facet such as xs:whiteSpace expanded the number of instance documents matching a simple type). Note that this is the only similarity between derivations by restriction of simple and complex datatypes. This is highly confusing, since W3C XML Schema uses the same word and even the same element name in both cases, but these words have different meanings and the content models of the xs:restriction elements are different.
Unlike simple type derivation, there are no facets to apply to complex types; the derivation is done by defining the full content model of the derived datatype, which must be a logical restriction of the base type: any instance structure valid per the derived datatype must also be valid per the base datatype. The W3C XML Schema specification does not define derivation by restriction in these terms; instead, it defines a formal algorithm to be followed by schema processors, which is roughly equivalent.
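To illustrate this rule, here is a sketch of two attempts to restrict the basePersons type defined earlier; the derived type names are hypothetical. Limiting the choice to author elements is a valid restriction, since any sequence of authors is also valid per basePersons. Adding editor to the choice, as in personsAsWeWouldHaveLiked, is rejected, since it would accept instances that basePersons does not.

```xml
<!-- Accepted: every instance structure valid here is valid per basePersons -->
<xs:complexType name="authorsOnly">
  <xs:complexContent>
    <xs:restriction base="basePersons">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="author"/>
      </xs:choice>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<!-- Rejected: editor has no counterpart in basePersons, so this
     content model is not a restriction of the base type -->
<xs:complexType name="personsWithEditors">
  <xs:complexContent>
    <xs:restriction base="basePersons">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="author"/>
        <xs:element ref="character"/>
        <xs:element name="editor" type="xs:token"/>
      </xs:choice>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```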
The derivation by restriction of a complex type is a declaration of intention that the derived type is a subset of the base type. (Unlike the derivations we’ve seen for simple types, this declaration is needed by the features allowing substitutions and redefinitions of types, which we will see in Chapter 8 and Chapter 12, and it may provide useful information to some applications.) When we derive simple types, we can take a base type without having to care about the details of the facets that are already applied, and just add our own set of facets. Here, on the contrary, we need to provide a full definition of the content model. Attributes are the exception: they are inherited from the base type and need to be declared only when they are modified, for instance when they are declared as “prohibited” to exclude them from the restriction, as we have seen for the restriction of complex types with simple content.
Moving on, let’s try to find a base from which we can derive both the author and character elements by restriction. This time, we can be sure that such a complex type exists, since all complex types can be derived from the built-in xs:anyType, which allows any elements and attributes. In practice, however, we will try to find the most restrictive base type that can accommodate our needs. Since the name and born elements are present in both author and character, with the same number of occurrences, we can keep them as they appear. We then have two elements, dead and qualification, each of which appears in only one of the two elements author and character. Since both author and character will need to be valid per the base type, we will include both of them in the base type but make them optional by giving them a minOccurs attribute equal to 0. Our base type can then be:
<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
The derivations are then done by defining the content model within an xs:restriction element (note that we have not repeated the attribute declarations, which are not modified):
<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
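Since attributes are inherited, they appear in a restriction only when they are modified. As a sketch (the anonymousPerson name is hypothetical), a derived type that keeps the content model of person but removes its optional id attribute could be written as:

```xml
<xs:complexType name="anonymousPerson">
  <xs:complexContent>
    <xs:restriction base="person">
      <!-- the full content model must still be restated -->
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
        <xs:element ref="qualification" minOccurs="0"/>
      </xs:sequence>
      <!-- the inherited id attribute is excluded from the derived type -->
      <xs:attribute ref="id" use="prohibited"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```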
We see here that the syntax of a derivation by restriction is more verbose than the syntax of the straight definition of the content model. The purpose of this derivation is not to build modular schemas, but rather to give applications that use this schema the indication that there is some commonality between the content models: if they know how to handle the complex type “person,” they can handle the elements author and character.
We will see W3C XML Schema features that rely on this derivation
method in Chapter 8 and Chapter 12.
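Conversely, a restriction that widens a particle of the base type is rejected. In this hypothetical sketch (the type name is an assumption), qualification is given maxOccurs="unbounded", whereas person allows at most one qualification; instances with several qualification elements would not be valid per person, so a schema processor should report an invalid restriction. Narrowing an occurrence range within the one of the base particle, as done for qualification in character, is accepted.

```xml
<!-- Rejected: qualification may occur at most once in person -->
<xs:complexType name="multiQualifiedCharacter">
  <xs:complexContent>
    <xs:restriction base="person">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```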
Changing the number of occurrences of particles is not the only modification that can be made during a derivation by restriction. Other operations that reduce the number of valid instance structures are also possible, such as changing a simple type to a more restrictive one or fixing values. The main constraint in this mechanism is that each particle of the derived type must be an explicit derivation of the corresponding particle of the base type. The effect of this rule is to limit the “depth” of the restrictions that can be performed in a single step; when we need to restrict particles at a deeper level of nesting, we may have to transform local definitions into global ones. We will see a concrete example in Section 7.5.1.
We now have all the elements we need to look back at the claim about the asymmetry of these derivation methods. This lack of symmetry is not a defect as such, but studying it is a good exercise for understanding the meaning of these two derivation methods. Let’s examine the derivation by extension of basePerson into the character element:
<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
The content model of character contains a mandatory qualification element, so valid character elements are not valid per basePerson. Thus, there is no hope of deriving character back into basePerson by restriction, since in a derivation by restriction all the instance structures that are valid per the derived type must be valid per the base type.
Let’s look back at the derivation by restriction of the person base type into a character element:
<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
Again, it is not possible to derive the complex type of character back into person by extension, since this would mean changing the minimum number of occurrences of qualification from 1 to 0 and adding an optional dead element between born and qualification. Neither of these operations is possible during a derivation by extension, which can only append new content after the content of the base type, and can neither update an existing particle (to change its number of occurrences) nor insert a new particle between two existing particles.
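To make this concrete, suppose (hypothetically) that the content model of character were available as a named type and that an optional dead element were appended to it by extension. The result would be equivalent to the content model below, named characterExtendedEquivalent for illustration only: qualification stays mandatory, and dead lands after qualification instead of between born and qualification, so the result still differs from person.

```xml
<!-- Hypothetical: what appending an optional dead element to the
     character content model by extension would amount to -->
<xs:complexType name="characterExtendedEquivalent">
  <xs:sequence>
    <!-- content of character: qualification is still mandatory -->
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <!-- appended content: dead can only come after qualification -->
    <xs:sequence>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
```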