Although this schema describes the same document as the one in Chapter 2, it illustrates very different aspects of W3C XML Schema.
Even though we will present features that compensate for this in the next chapters (xs:complexType and xs:group), we have sacrificed the modularity of our first schema to gain the depth and structure of the second one. This trade-off between modularity and depth is a general tendency in W3C XML Schema.
In practice, you will probably want to keep a balance between these two opposite styles and allow a certain level of depth under several global elements.
There are two cases, however, in which these two styles are not equivalent. The first is when elements with the same name need to be defined with different contents at different locations. In this case, local element definitions must be used (at least at all the locations except one), since global elements are identified by their names and only one global element can be defined for a given name.
In our example, the element name appears within both author and character, with the same datatype. We may, however, want to define the name element with different content models in author and character, as in this instance document:
<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <author id="CMS">
      <name>
        <first>Charles</first>
        <middle>M.</middle>
        <last>Schulz</last>
      </name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <character id="Snoopy">
      <name>Snoopy</name>
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
  </book>
</library>
Since we can define only one global element named name, we need to define at least one of the name elements locally, under its parent.
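A minimal sketch of the relevant part of such a schema follows. It assumes global author and character elements and leaves library and book out for brevity. Each parent carries its own local definition of name: a structured one for authors and a plain string for characters.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <!-- Local definition: author/name has a structured content model -->
        <xs:element name="name">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="middle" type="xs:string" minOccurs="0"/>
              <xs:element name="last" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="dead" type="xs:date" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <!-- Local definition: character/name is a plain string -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="born" type="xs:date"/>
        <xs:element name="qualification" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Neither definition of name is global. Each one is only reachable through its parent, which is what lets the two content models coexist.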
The W3C Schema for XML Schema gives several examples of elements having different types depending on their location. We will see this used in the next section in our Russian doll schema: global definitions of elements have a different type in the schema for schema than local definitions or references, even though they use the same element name (xs:element).
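The following illustrative sketch uses element names from our library vocabulary and shows xs:element in both locations. In the schema for schema, a top-level xs:element is validated against the type xs:topLevelElement, which requires name and forbids ref, minOccurs, and maxOccurs. A nested xs:element is validated against xs:localElement, which allows those attributes but forbids substitutionGroup, abstract, and final.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Top-level xs:element: "name" is required; minOccurs, maxOccurs and ref are not allowed -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <!-- Nested xs:element (local definition): occurrence constraints are allowed -->
        <xs:element name="qualification" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Nested xs:element used as a reference to the global declaration -->
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```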
Whether defining elements with the same name and different datatypes is good practice is open to debate. It may be confusing for human authors and more difficult to document, but, through local definitions, W3C XML Schema gives a way to avoid any confusion for the applications that will process these documents. In our example, for instance, we have two occurrences of a name element, one under author and one under character, and it is perfectly possible to define different constraints, and even different contents, for those two elements. Although this could be presented as overloading element names (“character/name” versus “author/name”), I find this practice unreliable, since we often don’t have a clear and simple way to identify those two contexts.
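The second case is a recursive schema, in which an element can contain an element of the same type, either directly as a child or indirectly through one of its descendants. In this case, a flat design employing references must be used (at least with the features we have seen so far). The depth of these recursive structures is unlimited, and a Russian doll design would have to nest definitions infinitely deep to describe them.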
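Here is a minimal sketch of such a recursive structure. It uses a hypothetical section element that is not part of our library vocabulary. The global section declaration refers to itself through ref, so sections can be nested to any depth.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical recursive element, declared globally so it can be referenced -->
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- Reference back to the global declaration: no limit on nesting depth -->
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because section is referenced rather than redefined locally, the schema stays finite while still accepting section elements nested at any depth.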
The schema for W3C XML Schema offers several examples of such elements. Their local definitions can be nested recursively, as is the case in our second schema, where xs:element definitions appear within other xs:element definitions. A flat design must be used for these elements, since they need to be referenced if we don’t want to limit the maximum depth of the structure, and the schema for schema does indeed use a reference mechanism. (The actual mechanism used in this case involves an element group. We have not seen this feature yet, but it is equivalent to an actual reference to an element.)
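The group-based mechanism can be sketched with hypothetical names: a section element and a sectionContent group, neither of which belongs to our library vocabulary. The group contains a local section definition whose content refers back to the group, so the nesting depth is again unlimited.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical named group holding the recursive content model -->
  <xs:group name="sectionContent">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <!-- Local definition whose content reuses the enclosing group -->
      <xs:element name="section" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:group ref="sectionContent"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:group>
  <!-- Global entry point using the same group -->
  <xs:element name="section">
    <xs:complexType>
      <xs:group ref="sectionContent"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The recursion goes through an element declaration rather than directly from the group to itself, which is what keeps the group from being circular.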
The style of defining elements and attributes locally is often called the Russian doll design, since the definition of each element is embedded in the definition of its parent, in the same way Russian dolls are embedded into each other.
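For comparison, here is a trimmed sketch of what a Russian doll schema for the library vocabulary might look like. Only library is global, and every other element is defined inside its parent. The sketch deliberately omits author, the lang attribute of title, and the available attribute of book, so it would not validate the full instance document shown earlier.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The only global declaration: the outermost doll -->
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- book is defined inside library -->
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:string"/>
              <xs:element name="title" type="xs:string"/>
              <!-- character is defined inside book -->
              <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="born" type="xs:date"/>
                    <xs:element name="qualification" type="xs:string"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:ID" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```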
If we look at the Russian dolls through our object-oriented lens, we may say that the objects are now created locally, where they are needed, as opposed to being created globally and cloned when we need them (as was the case in our first schema).
At this point, we still need to learn how we can create types that are the equivalent of classes of objects and containers, and that will let us manipulate sets of objects.
Those of you who are familiar with XML (or SGML) and its DTDs are used to identifying elements through the term “element type.” The XML 1.0 Recommendation states that “each element has a type, identified by name.” This is further disambiguated by the namespaces specification, which explains that “an XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names.”
A surprising feature of our Russian doll schema is that this fundamental notion of element type has completely disappeared, and there is no way to tell what the element type name refers to. Two different elements have been defined with the name name. These have independent definitions, which are identical in our example but could be different, for instance if we had decomposed the first, middle, and last names for authors but not for characters. The notion of element type name doesn’t mean anything if we do not specify the context in which it is used.
This loss matters so little that few people have even noticed it. There are some situations where we need to identify elements, though, for instance, to document XML vocabularies. A convenient way to write a reference manual for an XML vocabulary is to write an index of the element names with their definitions. This becomes much more complex when there is no clear match between element types and their definitions and content models.
RDF is another application that relies on element types. RDF/XML uses element types to identify the properties and classes that appear in its triples. The element “name” of the namespace http://dyomedea.com/ns# is identified as http://dyomedea.com/ns#name, the concatenation of the namespace URI and the local name. Cutting the link between element types and their schema definition makes it difficult, if not impossible, to answer basic questions, such as what the content model of http://dyomedea.com/ns#name is, and where its definition can be found.
I was confronted with this issue when writing the reference guide of this book, since the W3C XML Schema for W3C XML Schema uses many local element definitions. I came to the conclusion that the same element type (such as xs:restriction, which we will see later on) can have different content models with different semantics, depending on its location in a schema. This adds a significant amount of difficulty to understanding the language and reading a schema.
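The following minimal sketch shows xs:restriction in two locations with two different content models. The type names isbnType, personType, and namedOnlyType are hypothetical. Under xs:simpleType, the children of xs:restriction are facets such as xs:pattern. Under xs:complexContent, they form a content model that narrows the base type.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:restriction inside xs:simpleType: its children are facets -->
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Base complex type with an optional born element -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="born" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <!-- xs:restriction inside xs:complexContent: its children are a content model -->
  <xs:complexType name="namedOnlyType">
    <xs:complexContent>
      <xs:restriction base="personType">
        <xs:sequence>
          <!-- The optional born element is removed by the restriction -->
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

The element name is the same in both places, but a reader has to know the parent (xs:simpleType or xs:complexContent) to know which children are allowed and what they mean.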