The X from XML stands for “extensible.” The goal of any schema language is to control and limit this extensibility to help the applications deal with it. Extensibility and schemas pursue two opposite goals. Carelessly written schemas may significantly reduce extensibility, and we need to keep this in mind when we design our own schemas.
Here again, we find the duality between the schema and the instance documents, and we need to distinguish between two different forms of extensibility. The extensibility of the schema is the ability to reuse its components to create other schemas. The extensibility of the vocabulary is the ability to add to or modify the content models with minimal impact on the applications; it is, in fact, the openness of the schema.
The extensibility of a schema is essentially determined by three factors: its style, that is, the choice of which components (elements and attributes, element and attribute groups, and simple and complex types) have been made global; the use of the final and fixed attributes; and the optional division of these components over different schema documents. We need to have a look at these three factors.
A simple example is often better than a long explanation, so to illustrate the differences between the different schema styles, we will take some examples out of our library and study complex and simple type elements and attributes.
Let’s consider the definition of the book element in the context of our library. We have four basic ways of defining this element, and they all validate the same set of instance elements, but not the same set of instance documents, since exposing an element as global allows its use as a document element. We can use a Russian doll design and define the book element and its type locally within the library element. (I have not used the Russian doll design for book’s child elements, which are referenced as global elements, to keep the schema concise, as we will focus on the definition of book for this example.)
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="book" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="isbn"/>
            <xs:element ref="title"/>
            <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
          <xs:attribute ref="available"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
We can also define a global book element and reference it in the content model of our library:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
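To make the difference between instance elements and instance documents concrete, here is a minimal sketch of a document whose document element is book. The flat schema can validate it, since book is declared globally; the Russian doll schema has no global book declaration to validate it against. The values are illustrative, and the sketch assumes that isbn and title accept plain text and that id and available accept these values, since their declarations are not shown in this section.

```xml
<?xml version="1.0"?>
<!-- Accepted only when book is a global element; all values are illustrative -->
<book id="b0836217462" available="true">
  <isbn>0836217462</isbn>
  <title>Being a Dog Is a Full-Time Job</title>
</book>
```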
The third classical way is to define a complex type for the content model of our book element (note that I could have called it book, but I feel bookType is less confusing):
<xs:complexType name="bookType">
  <xs:sequence>
    <xs:element ref="isbn"/>
    <xs:element ref="title"/>
    <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
  <xs:attribute ref="available"/>
</xs:complexType>

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="book" type="bookType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Finally, we can define a group containing our book element:
<xs:group name="bookGroup">
  <xs:sequence>
    <xs:element name="book">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/>
          <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="id"/>
        <xs:attribute ref="available"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:group>

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="bookGroup" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
These four basic styles can, of course, be combined. The most extreme example is as follows:
<xs:complexType name="bookType">
  <xs:sequence>
    <xs:element ref="isbn"/>
    <xs:element ref="title"/>
    <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
  <xs:attribute ref="available"/>
</xs:complexType>

<xs:element name="book" type="bookType"/>

<xs:group name="bookGroup">
  <xs:sequence>
    <xs:element ref="book"/>
  </xs:sequence>
</xs:group>

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="bookGroup" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Although this example may seem excessive, we must acknowledge that it is also the most extensible, since it lets you use all the “reuse and derive” methods of our three components (the complex type, the global element, and the group). Now that we’ve seen these four basic styles, let’s see how they compare for reusability and derivation.
The Russian doll is obviously the least extensible style: both the definition of the book element and its content model are local. They cannot be referenced for reuse in another part of a schema, cannot be used as a document element, and cannot be modified by derivation, through xs:redefine, or through substitution groups. Using a Russian doll style here is thus a more efficient “blocking” feature than any blocking attribute. Changing or reusing the book element or its content model requires attaching a totally different schema to the instance document or using an xsi:type attribute in the instance document.
The flat model, which uses global element definitions, gives a basic level of flexibility, since the element can now be reused in any location within any schema, can be used as a document element in an instance document, and can be used as the head of a substitution group. Among these three features, use as the head of a substitution group is the only one that can be blocked (using a block attribute); the element can be used without restriction as a document element in an instance document or referenced anywhere in a schema. When used with a local complex type definition, as in our example, the flat model doesn’t allow you to change the content model of the book element: elements cannot be redefined, so the content model cannot be changed, except through a substitution by means of xsi:type in the instance document.
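As a minimal sketch of what the flat model makes possible, a second global element can declare the global book element as the head of its substitution group. The name novel is a hypothetical one chosen for illustration; it is not part of the library schema.

```xml
<!-- Hypothetical element: novel may appear wherever the schema references book.
     With no type attribute, it takes the type of its head, book. -->
<xs:element name="novel" substitutionGroup="book"/>
```

With this declaration, a library element may contain novel elements in place of book elements. Adding block="substitution" to the declaration of book would forbid this replacement in instance documents.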
The definition of a global complex type to describe the content model of the book element opens two different doors. The content model of the book element can now be reused to derive extended or restricted content models that may be used elsewhere, and the complex type can be redefined through xs:redefine. As seen in the previous chapter, the derivation can be blocked through the final attribute, but the redefinition cannot be controlled.
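The first of these doors can be sketched as a derivation by extension from the global bookType. Both the derived type name reviewedBookType and the review element are assumptions made for the sake of the example.

```xml
<!-- Hypothetical type derived by extension from the global bookType -->
<xs:complexType name="reviewedBookType">
  <xs:complexContent>
    <xs:extension base="bookType">
      <xs:sequence>
        <!-- "review" is an illustrative element, not part of the library schema -->
        <xs:element name="review" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

The new element is appended after the content model inherited from bookType. Declaring bookType with final="extension" (or final="#all") would turn this derivation into a schema error.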
Last but not least, embedding the definition of the book element in a group allows the group to be reused elsewhere (for example, in our flat model) but can hide the definition of the book element, if needed, to avoid its usage as a document element in instance documents. (Incidentally, it also blocks its usage as the head of a substitution group.) Defining a group also opens the possibility of redefining it through xs:redefine to change the number of occurrences of the element, to add new elements, or even to change its content model if a global complex type has been used. Using an element group this way is very similar to the approach of RELAX NG and gives a bit of its flexibility. We need to note, though, that element groups cannot be recursive. This limits their use for defining recursive content models, since a global element still needs to be defined for use in a reference. This can be a problem when we can’t, or don’t want to, use a global element: for instance, when we have two different recursive content models using the same element name with different contents.
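A redefinition of the group that adds a new element might look like the following sketch. It assumes that bookGroup is defined in a schema document named library.xsd with no target namespace; the file name and the note element are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- library.xsd is a hypothetical name for the document defining bookGroup -->
  <xs:redefine schemaLocation="library.xsd">
    <xs:group name="bookGroup">
      <xs:sequence>
        <!-- Self-reference: stands for the original definition of the group -->
        <xs:group ref="bookGroup"/>
        <!-- Illustrative addition, allowed after each book -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>
```

Inside xs:redefine, the reference to bookGroup within its own definition is not a recursion: it designates the group as it was originally defined, and every reference to bookGroup elsewhere in the schema now picks up the redefined version.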
Which approach is appropriate? There is no single definite answer to this question, but we know that each of these styles has a different set of extensibility features. The choice between them or a combination of them has a major impact on the reusability and derivability of the definitions present in a schema. Table 13-1 may help with visualizing the differences between these styles, but keep in mind that combinations of all of them are allowed!
Table 13-1. Complex type styles

| Style | Element reference | Content model reference | Derivation | Substitution group | Document element | Redefine |
|---|---|---|---|---|---|---|
| Russian doll | No | No | No | No | No | No |
| Flat | Yes | No | No | Yes | Yes | No |
| Complex type | No | Yes | Yes | No | No | Yes |
| Group | Yes | No | No | No | No | Yes |
Simple type elements behave much like complex types, except that the complex type definitions are, of course, replaced by simple type definitions similar to those for attributes, discussed in the next section.
As seen in Chapter 10, attributes behave differently from elements in that most of the time they are unqualified, which means that they cannot then be globally defined. Otherwise, the situation with attributes, simple types, and attribute groups is similar to the one we had with elements and complex types (the other exception being that there is no equivalent in attribute land to substitution groups or xsi:type). If we take the definition of a lang attribute restricted to en or fr in the title element, we can have a Russian doll design in which the attribute and its type are locally defined:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attribute name="lang">
          <xs:simpleType>
            <xs:restriction base="xs:language">
              <xs:enumeration value="en"/>
              <xs:enumeration value="fr"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
We can also take a flat design in which the attribute is globally defined:
<xs:attribute name="lang">
  <xs:simpleType>
    <xs:restriction base="xs:language">
      <xs:enumeration value="en"/>
      <xs:enumeration value="fr"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attribute ref="lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
A global simple type can also be defined:
<xs:simpleType name="langType">
  <xs:restriction base="xs:language">
    <xs:enumeration value="en"/>
    <xs:enumeration value="fr"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attribute name="lang" type="langType"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
The attribute may be “hidden” in an attribute group:
<xs:attributeGroup name="langGroup">
  <xs:attribute name="lang">
    <xs:simpleType>
      <xs:restriction base="xs:language">
        <xs:enumeration value="en"/>
        <xs:enumeration value="fr"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>
</xs:attributeGroup>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attributeGroup ref="langGroup"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
All of this can be used together:
<xs:simpleType name="langType">
  <xs:restriction base="xs:language">
    <xs:enumeration value="en"/>
    <xs:enumeration value="fr"/>
  </xs:restriction>
</xs:simpleType>

<xs:attribute name="lang" type="langType"/>

<xs:attributeGroup name="langGroup">
  <xs:attribute ref="lang"/>
</xs:attributeGroup>

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attributeGroup ref="langGroup"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
The impact of these design decisions is pretty much the same as those we’ve seen in complex type elements, except, of course, for substitution groups and usability as a document element. Table 13-2 explains the options these varying approaches provide.
The final and fixed attributes were already covered in Chapter 12, and they have an obvious impact on the reusability of simple and complex type definitions, since they can block some or all further derivations. This category of features affects the flexibility of the schema itself and has no impact on the set of valid instance documents. Their friends block and abstract, in contrast, are features that impact the openness of the schema.
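As a brief reminder of what final and fixed look like in practice, here is a minimal sketch. The langType definition reuses the one shown earlier; isbnType and its length of 10 are illustrative assumptions, not definitions taken from the library schema.

```xml
<!-- final="restriction" forbids deriving new types from langType by restriction -->
<xs:simpleType name="langType" final="restriction">
  <xs:restriction base="xs:language">
    <xs:enumeration value="en"/>
    <xs:enumeration value="fr"/>
  </xs:restriction>
</xs:simpleType>

<!-- fixed="true" still allows derived types but forbids changing this facet.
     isbnType and the value 10 are hypothetical. -->
<xs:simpleType name="isbnType">
  <xs:restriction base="xs:token">
    <xs:length value="10" fixed="true"/>
  </xs:restriction>
</xs:simpleType>
```

Neither attribute changes which values the type itself accepts; both only constrain what later schema authors may derive from it.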
The last factor that acts on the flexibility and reusability of our schema (and schema libraries) is the split of the components among different documents. Some schema designers have gone as far as possible in this direction and advise locating each class or component in its own schema document and then including and importing the components needed to create a full schema. This may seem excessive, but it provides a very fine granularity and allows a workaround for the limitations of xs:redefine. (If a component needs to be redefined, just leave out the old definition and write a new one.)
The biggest issue with such a design is probably the management of a number of different documents that can rapidly grow, and the many dependencies between these documents. These dependencies must be considered when designing libraries of schemas since they can be tough to track because the links between the included and including documents are multidirectional. A component within an included schema can reference components defined in any other schema processed by the schema processor.
We need to reexamine how a schema processor builds a global schema from all the import, include, and redefine instructions it finds. The schema processor initially builds a big consolidated schema with all the components defined in all the schema documents it has processed. It then resolves the references between components after building this consolidated schema. Although this simple and powerful mechanism applies to inclusions without restriction, we will see that things can get nastier with imports and redefinitions.
Let’s start with the simplest case: the processing of xs:include.
The semantics of xs:include are slightly different from those of the include statements used in languages such as C, and it should be considered a conditional include. An xs:include is actually a request to read a schema if it has not already been read, to add all the component declarations found in this schema to the consolidated schema if they have not already been defined, to ignore the components found in the new schema that are already defined in the global schema if they are identical, and to raise an error if they are different. This means it is perfectly legitimate to create loops and multiple inclusions, either directly (schema A includes schema B, which includes schema A) or indirectly (schema A includes schema B and schema C, which includes schema B), and we can create inclusion paths as complex as we wish.
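The indirect case can be sketched with three schema documents, shown one after another. The file names are hypothetical, and the xs:token types given to isbn and title are assumptions made to keep the sketch short.

```xml
<!-- common.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="isbn" type="xs:token"/>
  <xs:element name="title" type="xs:token"/>
</xs:schema>

<!-- book.xsd: includes common.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="common.xsd"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- library.xsd: includes book.xsd and, a second time, common.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="book.xsd"/>
  <xs:include schemaLocation="common.xsd"/>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Here common.xsd is reached twice from library.xsd, once directly and once through book.xsd, yet isbn and title each contribute a single component to the consolidated schema.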
The meaning of xs:redefine is similar, except that some components can be redefined. This difference is enough to rule out loops in which a schema A redefines components of a schema B, which redefines or includes schema A. This restriction means that while we can speak of inclusion graphs, the redefinitions instead form a tree. The process of including or redefining is recursive, however: when we include (or redefine) a schema, we include the consolidated schema resulting from the included document rather than the document itself. We can still create inclusion loops within the branches of the redefinition tree (schema A can redefine schema B, which includes schema C, which includes schema B).
Some designers rely on the fact that when a schema without a target namespace is included (or redefined) in a schema with a target namespace, the included schema “borrows” the target namespace of its “includer.” This feature, already mentioned in Chapter 10, can be used to build “neutral” components with no namespace that can be included and used as building blocks. Since these components take the namespace of the including schema like a chameleon takes the color of its environment, these schemas are called “chameleon schemas.” Although this technique is simple and may be convenient in some cases, it can be confusing to define similar components (and, therefore, similar types and content models) in different namespaces instead of creating a common namespace for them, which would immediately identify these types and content models as identical.
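A chameleon inclusion can be sketched with two schema documents, shown one after another. The file names, the namespace URI http://example.org/library, and the lib prefix are all assumptions chosen for illustration.

```xml
<!-- types.xsd: a chameleon schema, with no targetNamespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="langType">
    <xs:restriction base="xs:language">
      <xs:enumeration value="en"/>
      <xs:enumeration value="fr"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

<!-- library.xsd: includes the chameleon and gives it its own namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:lib="http://example.org/library"
           targetNamespace="http://example.org/library">
  <xs:include schemaLocation="types.xsd"/>
  <!-- langType is now referenced in the including schema's namespace -->
  <xs:attribute name="lang" type="lib:langType"/>
</xs:schema>
```

Including the same types.xsd from a schema with a different target namespace would yield a second langType, identical in content but belonging to that other namespace.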
xs:import behaves somewhat like xs:include: no redefinitions occur, which means that loops can be created where schema A (for namespace A) imports schema B (for namespace B), which itself imports schema A. It is important to note that xs:import serves two different purposes: it is an instruction to import a schema and a declaration that components from a namespace can be referenced. If schema A for namespace A imports schema B for namespace B, and if schema B needs to reference components from namespace A, an xs:import statement must be included in schema B to declare that namespace A can be used (the schemaLocation attribute is optional and can be omitted in such cases).
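Such a pair of mutually importing schemas might look like the following sketch, shown as two documents one after another. The file names, namespace URIs, and element names are hypothetical.

```xml
<!-- a.xsd: schema for namespace A, imports schema B -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://example.org/b"
           targetNamespace="http://example.org/a">
  <xs:import namespace="http://example.org/b" schemaLocation="b.xsd"/>
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b:b" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- b.xsd: schema for namespace B, declares that namespace A may be referenced -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:a="http://example.org/a"
           targetNamespace="http://example.org/b">
  <!-- No schemaLocation: this import only declares the use of namespace A -->
  <xs:import namespace="http://example.org/a"/>
  <xs:element name="b">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="a:a" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The sketch assumes that processing starts from a.xsd, so that the components of namespace A are already known when b.xsd refers to them.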
After working through the three mechanisms (include, redefine, and import), we can mix all of them together and note that chameleon schemas can be used together with imports. In this case, the same chameleon can contribute several times to a global schema under different namespaces. If schema A for namespace A includes schema B with no namespace, and imports schema C for namespace C, which also includes schema B, the two inclusions of schema B belong to different namespaces and are considered different.
We now have all the elements to find innovative ways to mix inclusion and import graphs with redefinition trees. Keep in mind that simple is beautiful, and if we don’t restrict ourselves, we humans might get lost well before our favorite schema processor!