Although document type definitions can enforce basic structural rules on documents, many applications need a more powerful and expressive validation method. The W3C developed the XML Schema Recommendation to address these needs. Schemas can describe complex restrictions on elements and attributes. Multiple schemas can be combined to validate documents that use multiple XML vocabularies. This chapter provides a rapid introduction to key W3C XML Schema concepts and usage, starting with the fundamental structures that are common to all schemas. We begin with a very simple schema and proceed to add more functionality to it until every major feature of XML Schemas has been introduced.
An XML schema is an XML document containing a formal description of what constitutes a valid XML document. A W3C XML Schema Language schema is an XML schema written in the particular syntax recommended by the W3C.
In this chapter, when we use the word "schema" without further qualification, we are referring specifically to a schema written in the W3C XML Schema language. However, there are numerous other XML schema languages, including RELAX NG and Schematron, each with its own strengths and weaknesses.
An XML document described by a schema is called an instance document. If a document satisfies all the constraints specified by the schema, it is considered to be schema-valid. The schema document is associated with an instance document through one of the following methods:
An xsi:schemaLocation attribute on an element contains a list of namespaces used within that element and the URLs of the schemas with which to validate elements and attributes in those namespaces.
An xsi:noNamespaceSchemaLocation attribute contains a URL for the schema used to validate elements that are not in any namespace.
A validating parser may be instructed to validate a given document against an explicitly provided schema, ignoring any hints that might be provided within the document itself.
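As a minimal sketch of the first two methods, the following instance document carries both hint attributes on its root element. The element names, the namespace URI http://example.com/ns/person, and the schema URLs are hypothetical placeholders chosen for illustration; only the xsi namespace URI is fixed by the W3C.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance document; all example.com URIs are placeholders -->
<addressBook
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:p="http://example.com/ns/person"
    xsi:noNamespaceSchemaLocation="http://example.com/schemas/addressbook.xsd"
    xsi:schemaLocation="http://example.com/ns/person
                        http://example.com/schemas/person.xsd">
  <!-- addressBook is in no namespace, so the noNamespaceSchemaLocation hint applies -->
  <!-- p:fullName is in the person namespace, so the schemaLocation pair applies -->
  <p:fullName>Zoe</p:fullName>
</addressBook>
```

The value of xsi:schemaLocation is a whitespace-separated list of pairs: a namespace URI followed by the location of a schema for that namespace. The third method leaves no trace in the document, because the schema is supplied to the parser by the application.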
DTDs provide the capability to do basic validation of the following items in XML documents:
Element nesting
Element occurrence constraints
Permitted attributes
Attribute types and default values
However, DTDs do not provide fine control over the format and data types of element and attribute values. Other than the various special attribute types (ID, IDREF, ENTITY, NMTOKEN, and so forth), once an element or attribute has been declared to contain character data, no limits may be placed on the length, type, or format of that content. For narrative documents (such as web pages, book chapters, newsletters, etc.), this level of control is probably good enough.
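To make the limits concrete, here is a small sketch of a document with an internal DTD subset that exercises each of the four capabilities listed above. The person vocabulary is invented for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- Nesting and occurrence: one fullName, then zero or more email elements -->
  <!ELEMENT person (fullName, email*)>
  <!ELEMENT fullName (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!-- Permitted attributes, their types, and a default value -->
  <!ATTLIST person
            id     ID                 #REQUIRED
            status (active | retired) "active">
]>
<person id="p1">
  <fullName>Zoe</fullName>
  <!-- Valid against the DTD: #PCDATA accepts any text at all -->
  <email>not really an email address</email>
</person>
```

The document is valid even though the email content is nonsense, because a DTD has no way to restrict the format of character data.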
But as XML makes inroads into more record-like applications, such as remote procedure calls and object serialization, more precise control over the text content of elements and attributes becomes important. The W3C XML Schema standard includes the following features:
Simple and complex data types
Type derivation and inheritance
Element occurrence constraints
Namespace-aware element and attribute declarations
The most important of these features is the addition of simple data types for parsed character data and attribute values. Schemas can enforce much more specific rules about the contents of elements and attributes than DTDs can. In addition to a wide range of built-in simple types (such as string, integer, decimal, and dateTime), the schema language provides a framework for declaring new data types, deriving new types from old types, and reusing types from other schemas.
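The following sketch shows one way a built-in type can be used directly and a new type derived from it. The names ageType, age, and lastUpdated, and the bounds 0 and 150, are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A new simple type derived from the built-in integer type by restriction -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- An element that uses the derived type -->
  <xs:element name="age" type="ageType"/>

  <!-- An element that uses a built-in type directly -->
  <xs:element name="lastUpdated" type="xs:dateTime"/>

</xs:schema>
```

With this schema, <age>42</age> is schema-valid, while <age>forty-two</age> and <age>-1</age> are not.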
Besides simple data types, schemas can place more explicit restrictions on the number and sequence of child elements that can appear in a given location. This is true even when elements are mixed with character data; DTD mixed content, by contrast, cannot constrain the order or number of the child elements it allows.
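A brief sketch of such a declaration follows. The letter vocabulary and the specific occurrence limits are hypothetical; the point is that the sequence and the minOccurs and maxOccurs values are still enforced although text may appear between the child elements.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="letter">
    <!-- mixed="true" allows character data between the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- Exactly one salutation, and it must come first -->
        <xs:element name="salutation" type="xs:string"/>
        <!-- Between one and three item elements -->
        <xs:element name="item" type="xs:string" minOccurs="1" maxOccurs="3"/>
        <!-- An optional signature, and it must come last -->
        <xs:element name="signature" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The closest DTD equivalent, a declaration of the form (#PCDATA | salutation | item | signature)*, would accept the child elements in any order and any quantity.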
As XML documents are exchanged between different people and organizations around the world, proper use of namespaces becomes critical to prevent misunderstandings. Depending on what type of document is being viewed, a simple element like <fullName>Zoe</fullName> could have widely different meanings. It could be a person's name, a pet's name, or the name of a ship that recently docked. By associating every element with a namespace URI, it is possible to distinguish between two elements with the same local name.
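For instance, a single document might record both a crew member and a ship named Zoe. In this sketch the harborLog element, the prefixes, and both namespace URIs are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical vocabularies; the example.com URIs are placeholders -->
<harborLog xmlns:crew="http://example.com/ns/person"
           xmlns:ship="http://example.com/ns/vessel">
  <!-- Same local name, different namespace URIs: two distinct element types -->
  <crew:fullName>Zoe</crew:fullName>
  <ship:fullName>Zoe</ship:fullName>
</harborLog>
```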
Because the "Namespaces in XML" recommendation was released after the XML 1.0 recommendation, DTDs do not provide explicit support for namespaces. Unlike DTDs (where element and attribute declarations must include a namespace prefix), schemas validate against the combination of the namespace URI and local name, rather than the prefixed name.
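The difference can be sketched with three elements that a namespace-aware schema processor treats as the same element type, because each pairs the namespace URI http://example.com/ns/person (a placeholder) with the local name fullName. The wrapper element names is likewise hypothetical.

```xml
<?xml version="1.0"?>
<names>
  <!-- Prefix a bound to the person namespace -->
  <a:fullName xmlns:a="http://example.com/ns/person">Zoe</a:fullName>
  <!-- A different prefix bound to the same namespace -->
  <b:fullName xmlns:b="http://example.com/ns/person">Zoe</b:fullName>
  <!-- No prefix: the same namespace declared as the default -->
  <fullName xmlns="http://example.com/ns/person">Zoe</fullName>
</names>
```

A DTD that declared an element named a:fullName would match only the first of these, since it compares the prefixed name character by character.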
XML Schema uses namespaces internally for several purposes. The XML Schema vocabulary is in its own namespace, the vocabulary being defined is in its namespace, and components used within the schema (groups, attribute groups, and types) may also have namespaces. XML Schema processing also uses namespaces within instance documents to include directives to the schema processor. For example, the special attributes used to associate an element with a schema (schemaLocation and noNamespaceSchemaLocation) must be associated with the official XML Schema instance namespace URI (http://www.w3.org/2001/XMLSchema-instance) in order for the schema processor to recognize them as instructions to itself.
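A skeleton schema illustrates how these namespaces appear side by side within a schema document. The target namespace URI, the p prefix, and the nameType and fullName components are assumptions made for this sketch; the xs namespace URI is the one defined by the W3C.

```xml
<?xml version="1.0"?>
<!-- xs: the XML Schema vocabulary itself -->
<!-- p:  the vocabulary being defined (placeholder URI) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://example.com/ns/person"
           targetNamespace="http://example.com/ns/person">

  <!-- A named type; it belongs to the target namespace -->
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- The type is referenced by its qualified name, p:nameType -->
  <xs:element name="fullName" type="p:nameType"/>

</xs:schema>
```

The prefixes themselves are arbitrary. What matters is that xs is bound to the XML Schema namespace and that the prefix used to reference nameType is bound to the target namespace.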