We will see, in the course of this book, that there are many different styles for writing a schema, and even more approaches to deriving a schema from an instance document. For our first schema, we will adopt a style that is familiar to those of you who have already worked with DTDs. We’ll start by creating a classified list of the elements and attributes found in the instance document.
The elements found in our instance document are author, book, born, character, dead, isbn, library, name, qualification, and title, and the attributes are available, id, and lang.
We will build our first schema by defining each element in turn under our schema document element (named, unsurprisingly, schema), which belongs to the W3C XML Schema namespace (http://www.w3.org/2001/XMLSchema) and is usually bound to the prefix xs.
Before we start, we need to classify the elements and, for this exercise, introduce a few key definitions that explain how W3C XML Schema performs this classification. (We will see these definitions in more detail in the chapters about simple and complex types.)
The content model characterizes the types of child elements and text nodes that can be included in an element (without paying any attention to the attributes).
The content model is said to be “empty” when neither child elements nor text nodes are expected, “simple” when only text nodes are accepted, “complex” when only subelements are expected, and “mixed” when both text nodes and subelements can be present. Note that to determine the content model, we pay attention only to the element and text nodes and ignore any attribute, comment, or processing instruction that could be included. For instance, an element with some attributes, a comment, and a couple of processing instructions has an “empty” content model if it has no text or element children.
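To make the four categories concrete, here is one small fragment for each. These are separate snippets rather than one document. The element names marker and note and the processing-instruction target render are hypothetical and do not belong to our library vocabulary. The simple and complex fragments reuse elements from our instance document.

```xml
<!-- Empty: attributes, a comment, and a processing instruction,
     but no text and no child elements (hypothetical element) -->
<marker id="m1"><!-- a comment --><?render fast?></marker>

<!-- Simple: text nodes only -->
<name>Lucy</name>

<!-- Complex: child elements only (whitespace used for indentation
     between the children is allowed in element-only content) -->
<author id="CMS">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
</author>

<!-- Mixed: text interleaved with child elements (hypothetical element) -->
<note>A note that mentions <name>Lucy</name> in passing.</note>
```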
Elements such as name, born, and title have simple content models:
```xml
.../...
<title lang="en">Being a Dog Is a Full-Time Job</title>
.../...
<name>Charles M Schulz</name>
<born>1922-11-26</born>
.../...
```
Elements such as library or character have complex content models:
```xml
<library>
  <book id="b0836217462" available="true">
    .../...
  </book>
</library>

<character id="Lucy">
  <name>Lucy</name>
  <born>1952-03-03</born>
  <qualification>bossy, crabby and selfish</qualification>
</character>
```
Among elements with a simple content model, we can distinguish those that have attributes from those that cannot have any attributes. Later chapters discuss how W3C XML Schema can also represent empty and mixed content models.
W3C XML Schema considers elements that have a simple content model and no attributes to be “simple types,” while all the other elements (those with simple content and attributes, as well as those with any other content model) are “complex types.” In other words, when an element can only have text nodes and accepts neither child elements nor attributes, it is considered a simple type; in all other cases, it is a complex type.
Attributes always have a simple type since they have no children and contain only a text value.
In our example, elements such as author or title have a complex type:
```xml
<author id="CMS">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>
.../...
<title lang="en">Being a Dog Is a Full-Time Job</title>
```
Elements such as born or qualification (and, of course, all the attributes), on the other hand, have a simple type:
```xml
<born>1922-11-26</born>
.../...
<qualification>brought classical music to the Peanuts strip</qualification>
.../...
<book available="true"/>
```
Now that we have criteria to classify our components, we can define each of them. Let’s start with the simplest case: a simple type element such as name, which can be found in both author and character:
```xml
<name>Charles M Schulz</name>
```
To define such an element, we use a global xs:element declaration, placed directly under the xs:schema document element:
```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  .../...
</xs:schema>
```
The value used to reference the datatype (xs:string) is prefixed by xs, the prefix bound to the W3C XML Schema namespace. This tells us that xs:string is a predefined W3C XML Schema datatype.
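The prefix itself is only a convention. What identifies the datatype is the namespace URI the prefix is bound to. As a sketch, the following declaration uses xsd, another prefix often seen for this namespace, and is equivalent to the one above:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- xsd:string names the same predefined datatype as xs:string,
       because both prefixes are bound to the same namespace URI -->
  <xsd:element name="name" type="xsd:string"/>
</xsd:schema>
```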
The same can be done for all the other simple type elements, as well as for the attributes:
```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:string"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  .../...
</xs:schema>
```
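Choosing a specific predefined datatype instead of xs:string for everything lets a validator check the values. As an illustration, the fragments below would be accepted or rejected under the declarations above. The rejected values are made-up counterexamples and do not come from our instance document.

```xml
<!-- Accepted: xs:date expects the YYYY-MM-DD form -->
<born>1922-11-26</born>

<!-- Rejected: 26/11/1922 does not match the lexical form of xs:date -->
<born>26/11/1922</born>

<!-- For the attributes: xs:boolean accepts true, false, 1, and 0,
     so available="yes" would be rejected; xs:language accepts "en"
     or "en-US" but rejects "en_US" -->
```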
While we said that this design style would be familiar to DTD users, we must note that it is flatter than a DTD, since the attributes are declared outside of the element declarations. The result is a schema in which elements and attributes get fairly equal treatment. We will see, though, that when a schema describes an XML vocabulary that uses a namespace, this simple flat style is impossible to use most of the time.
Treating simple type elements and attributes alike is a simplification compared to the XPath, DOM, and Infoset data models. These models consider a simple type element to be an item having a single child item of type “character,” and an attribute to be an item having a normalized value. The benefit of this simplification is that we can use simple datatypes to define simple type elements and attributes interchangeably, and write, in a consistent fashion:
```xml
<xs:element name="isbn" type="xs:string"/>
```
or
```xml
<xs:attribute name="isbn" type="xs:string"/>
```
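On the instance side, each declaration describes the same value carried in a different place. In this sketch, the element form matches our vocabulary. The attribute form is hypothetical because our book element has no isbn attribute. The ISBN value is an assumption, taken from the digits of the book’s id b0836217462.

```xml
<!-- Matched by <xs:element name="isbn" type="xs:string"/> -->
<isbn>0836217462</isbn>

<!-- Would be matched by <xs:attribute name="isbn" type="xs:string"/>
     (hypothetical: not part of our vocabulary) -->
<book isbn="0836217462">.../...</book>
```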
The order of the definitions in a schema isn’t significant, so we can now take the next step in terms of type complexity and define the title element, which appears in the instance document as:
```xml
<title lang="en">Being a Dog Is a Full-Time Job</title>
```
Since this element has an attribute, it has a complex type. Since it has only a text node, it is considered to have simple content. We will, therefore, write its definition as:
```xml
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
```
The XML syntax makes it verbose, but this can almost be read as plain English: “the element named title has a complex type with simple content, obtained by extending the predefined datatype xs:string with the attribute that is defined in this schema under the name lang.”
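Reading the definition the other way round, here is how a validator would treat a few variants of title. Only the first one comes from our instance document; the others are made-up counterexamples. Because the attribute reference carries no use attribute, lang is optional: use defaults to “optional.”

```xml
<!-- Valid: text content plus the lang attribute -->
<title lang="en">Being a Dog Is a Full-Time Job</title>

<!-- Also valid: lang is optional, since use="optional" is the default -->
<title>Being a Dog Is a Full-Time Job</title>

<!-- Invalid: simple content does not allow child elements -->
<title lang="en">Being a <b>Dog</b> Is a Full-Time Job</title>

<!-- Invalid: id is not among the attributes added by the extension -->
<title id="t1">Being a Dog Is a Full-Time Job</title>
```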
The remaining elements (library, book, author, and character) are all complex types with complex content. They are defined by specifying the sequence of child elements and the attributes that compose them.
The library element, the most straightforward of them, is defined as:
```xml
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```
This definition can be read as “the element named library is a complex type composed of a sequence of one to many occurrences (note the maxOccurs attribute) of the element named book.”
The element author, which has an attribute and for which we may consider the date of death optional, could be defined as:
```xml
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
```
This means that the element named author is a complex type composed of a sequence of three elements (name, born, and dead) and an id attribute. The dead element is optional: it may occur zero times.
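Because dead has minOccurs="0", both of the first two author elements below are valid. The second and third fragments use a hypothetical author whose id, name, and birth date are made up for the illustration. The third fragment shows that xs:sequence also fixes the order of the children.

```xml
<!-- Valid: all three children, in the declared order -->
<author id="CMS">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- Valid: dead is optional (hypothetical author) -->
<author id="AUTH1">
  <name>Jane Doe</name>
  <born>1970-01-01</born>
</author>

<!-- Invalid: xs:sequence requires name before born -->
<author id="AUTH2">
  <born>1970-01-01</born>
  <name>Jane Doe</name>
</author>
```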
The minOccurs and maxOccurs attributes, which we have already seen in a couple of the previous definitions, allow us to define the minimum and maximum number of occurrences. Their default value is 1, which means that when both are missing, the element must appear exactly once in the sequence. The special value “unbounded” may be used for maxOccurs when the maximum number of occurrences is unlimited.
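For readers coming from DTDs, the common combinations map onto the familiar occurrence indicators. The summary below is a sketch that reuses element names from our schema. Each line would sit inside an xs:sequence. The last declaration is a hypothetical range with no DTD shorthand.

```xml
<!-- Exactly once (DTD: name); both attributes default to 1 -->
<xs:element ref="name"/>
<!-- Optional, zero or one (DTD: dead?) -->
<xs:element ref="dead" minOccurs="0"/>
<!-- One or more (DTD: book+) -->
<xs:element ref="book" maxOccurs="unbounded"/>
<!-- Zero or more (DTD: character*) -->
<xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
<!-- Hypothetical bounded range with no DTD shorthand: two to five -->
<xs:element ref="author" minOccurs="2" maxOccurs="5"/>
```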
The attributes need to be declared after the sequence. The remaining elements (book and character) can be defined in the same way, which leads us to the following full schema:
```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:string"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
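To try the schema out, it helps to have an instance document to validate against it. The following sketch assembles the fragments used throughout this section into one document. The isbn value is assumed, taken from the digits of the book’s id, and the schema file name library.xsd is hypothetical. The xsi:noNamespaceSchemaLocation attribute is the standard hint for a schema without a target namespace. It remains only a hint, though: validators may ignore it and be given the schema some other way.

```xml
<?xml version="1.0"?>
<!-- library.xsd is a hypothetical file name for the schema above -->
<library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="library.xsd">
  <book id="b0836217462" available="true">
    <!-- assumed ISBN, derived from the book's id -->
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <author id="CMS">
      <name>Charles M Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <character id="Lucy">
      <name>Lucy</name>
      <born>1952-03-03</born>
      <qualification>bossy, crabby and selfish</qualification>
    </character>
  </book>
</library>
```

Each id value (b0836217462, CMS, Lucy) is a valid XML name and is unique in the document, as xs:ID requires.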