This chapter provides a jump-start on XML Schema for readers who are familiar with DTDs. It offers a detailed comparison of DTD and schema syntax, which is useful both for understanding XML Schema and for converting existing DTDs to schemas. It also describes features, such as entities and notations, for which XML Schema either offers a different mechanism or still requires a DTD.
Table 19–1 shows examples of various DTD content models and matches them up with the corresponding XML Schema content types. Each of these content types is explained in the rest of this section.
Element types with (#PCDATA) content and no attributes in a DTD correspond to element declarations with simple types in schemas. Example 19–1 shows such an element declaration.
Note that the built-in type decimal is assigned to price. It is possible to assign the built-in type string to all #PCDATA element types; string handles whitespace in the same way as DTD processors handle whitespace for any character data content of an element. However, it is advisable to be as specific as possible when choosing a type for an element declaration. Chapter 11 describes the built-in simple types in detail, and Chapter 8 describes how to define your own simple types.
DTD:
<!ELEMENT price (#PCDATA)>
Schema:
<xs:element name="price" type="xs:decimal"/>
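If price values should never be negative and should carry at most two decimal places, a user-defined type can express that more precisely than decimal alone. The sketch below assumes those two rules; the type name PriceType and the facet values are illustrative and do not come from the examples in this chapter.

```xml
<!-- Hypothetical type: a non-negative decimal with at most two fraction digits -->
<xs:simpleType name="PriceType">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>

<!-- Used in place of xs:decimal in the price declaration -->
<xs:element name="price" type="PriceType"/>
```

With this declaration, a value such as 19.99 is accepted, while -5 or 19.999 is rejected.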
Element types with (#PCDATA) content that do have attributes correspond to element declarations using complex types with simple content in schemas. Example 19–2 shows such an element declaration. It extends the simple type decimal to add the attribute currency.
Example 19–2. Simple content (with attributes)
DTD:
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency NMTOKEN #IMPLIED>
Schema:
<xs:element name="price">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="currency" type="xs:NMTOKEN"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Element types that may have children, regardless of whether they have attributes, correspond to element declarations using complex types with complex content in schemas. Example 19–3 shows such an element declaration.
DTD:
<!ELEMENT product (number, name+, size?, color*)>
Schema:
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element ref="number"/>
<xs:element ref="name" maxOccurs="unbounded"/>
<xs:element ref="size" minOccurs="0"/>
<xs:element ref="color" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
In Example 19–3, the content model was converted into a sequence. Groups, enclosed in parentheses in DTDs, are represented by one of the three model groups in a schema:
• sequence groups require that the elements appear in order.
• choice groups allow a choice from several elements.
• all groups allow the elements to appear in any order.
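DTDs have no counterpart to the all group. To accept two elements in either order, a DTD must list every permitted order as a separate alternative. The sketch below contrasts the two approaches for a hypothetical element pair with children a and b. Note that in XML Schema 1.0, each element in an all group may appear at most once.

```xml
<!-- DTD: each permitted order must be spelled out -->
<!ELEMENT pair ((a, b) | (b, a))>

<!-- Schema: an all group accepts a and b in either order -->
<xs:element name="pair">
  <xs:complexType>
    <xs:all>
      <xs:element ref="a"/>
      <xs:element ref="b"/>
    </xs:all>
  </xs:complexType>
</xs:element>
```

With three children the DTD would need six alternatives, while the all group stays the same size.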
Table 19–2 shows the mapping between DTD groups and XML Schema model groups.
As shown in Example 19–3, the occurrence constraints on element types and groups are represented by the minOccurs and maxOccurs attributes in schemas. Table 19–3 shows the mapping between occurrence constraints in DTDs and schemas.
Table 19–3. Occurrence constraints
The defaults for minOccurs and maxOccurs are both 1. XML Schema can provide more specific validation than DTDs, since any non-negative integer can be specified. For example, you can specify that the color element may appear a maximum of three times.
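As a sketch, the product declaration from Example 19–3 caps color at three occurrences once its maxOccurs value is changed. The rest of the content model is unchanged.

```xml
<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="name" maxOccurs="unbounded"/>
      <xs:element ref="size" minOccurs="0"/>
      <!-- DTD color* allows any number; here zero to three are allowed -->
      <xs:element ref="color" minOccurs="0" maxOccurs="3"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```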
Groups may be nested in schemas just as they may in DTDs, as illustrated in Example 19–4. Note that minOccurs and maxOccurs may appear on groups as well as on element declarations.
DTD:
<!ELEMENT el ((a | b)*, (c | d)?)>
Schema:
<xs:element name="el">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
<xs:choice minOccurs="0" maxOccurs="1">
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
Element types that have both #PCDATA content and children are said to have mixed content. In schemas, mixed content is indicated by a mixed attribute of a complexType element, as shown in Example 19–5.
With DTDs, you are limited to the choice operator (|) with mixed content element types, and you cannot constrain the order or number of the children. In schemas, any content model can be mixed, allowing more complex validation of the children. For example, in a DTD you cannot specify that custName must appear before prodName. In schemas, you can accomplish this using a sequence group instead of a choice group.
DTD:
<!ELEMENT letter (#PCDATA | custName | prodName)*>
Schema:
<xs:element name="letter">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="custName"/>
<xs:element ref="prodName"/>
</xs:choice>
</xs:complexType>
</xs:element>
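A minimal sketch of that approach follows. It assumes that a letter must contain exactly one custName followed by any number of prodName elements. Those occurrence choices and the sample text are illustrative. Because the type is mixed, character data may still appear before, between, and after the child elements.

```xml
<xs:element name="letter">
  <xs:complexType mixed="true">
    <!-- sequence instead of choice: custName must come before prodName -->
    <xs:sequence>
      <xs:element ref="custName"/>
      <xs:element ref="prodName" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- A conforming instance (the text content is made up) -->
<letter>Dear <custName>Ms. Smith</custName>, your order for the
<prodName>linen blouse</prodName> has shipped.</letter>
```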
Empty content, indicated by the keyword EMPTY in DTDs, is simply indicated by an absence of a content model in a schema. Example 19–6 shows an element declaration with empty content, containing only attribute declarations.
DTD:
<!ELEMENT color EMPTY>
<!ATTLIST color value NMTOKEN #IMPLIED>
Schema:
<xs:element name="color">
<xs:complexType>
<!-- no content model is specified here -->
<xs:attribute name="value" type="xs:NMTOKEN"/>
</xs:complexType>
</xs:element>
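When an EMPTY element type also has no attributes, the schema equivalent is a complexType element with no children at all. The element name br in this sketch is hypothetical.

```xml
<!-- DTD equivalent: <!ELEMENT br EMPTY> -->
<xs:element name="br">
  <!-- no content model and no attribute declarations -->
  <xs:complexType/>
</xs:element>
```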
Any content, indicated by the keyword ANY in DTDs, is represented by an element wildcard any in a schema. Because ANY also permits character data, the complex type is marked as mixed. This is illustrated in Example 19–7.
DTD:
<!ELEMENT anything ANY>
Schema:
<xs:element name="anything">
<xs:complexType mixed="true">
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
XML Schema offers much more sophisticated wildcard capabilities than DTDs. It is possible with XML Schema to put a wildcard anywhere in a content model, specify how many replacement elements may appear, restrict the namespace(s) of the replacement elements, and control how strictly they are validated. See Section 12.7.1 on p. 285 for more information on element wildcards.
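The sketch below combines several of those options in one wildcard. The element name productExtras and the namespace URI are placeholders. The wildcard accepts zero to five elements from that single namespace, and processContents="lax" asks the processor to validate a replacement element only when it can find a declaration for it.

```xml
<xs:element name="productExtras">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="number"/>
      <!-- placeholder namespace; elements from any other namespace are rejected -->
      <xs:any namespace="http://example.org/ext" processContents="lax"
              minOccurs="0" maxOccurs="5"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```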
The DTD attribute types are represented in XML Schema as simple types, most of them with the same name. Table 19–4 lists the DTD attribute types and their equivalent types in XML Schema.
Table 19–4. DTD attribute types and equivalents
In order to represent an enumerated attribute type in a schema, it is necessary to define a new simple type and apply enumeration facets to restrict the values to the desired set. This is illustrated in Example 19–8.
Example 19–8. Representing an enumerated attribute
DTD:
<!ATTLIST price currency (USD | CHF) "USD">
Schema:
<xs:attribute name="currency" default="USD">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:enumeration value="USD"/>
<xs:enumeration value="CHF"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
The built-in type token is used as the base type for the restriction, which results in whitespace handling identical to that of enumerated attribute types in DTDs.
A NOTATION attribute type exists in XML Schema as it does in XML DTDs. However, the NOTATION type cannot be used directly by an attribute. Instead, you must define a new simple type that restricts NOTATION and apply enumeration facets to list the notations that are permitted. This is illustrated in Example 19–9.
Example 19–9. Representing a notation attribute
DTD:
<!ATTLIST picture fmt NOTATION (jpg | gif) "jpg">
Schema:
<xs:attribute name="fmt" default="jpg">
<xs:simpleType>
<xs:restriction base="xs:NOTATION">
<xs:enumeration value="jpg"/>
<xs:enumeration value="gif"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
Attribute default values are handled by three attributes in schemas: the use attribute, which indicates whether the attribute being declared is required or optional; the default attribute, which specifies a default value; and the fixed attribute, which specifies a fixed value. Table 19–5 shows how the DTD attribute default values correspond to schema attributes.
Table 19–5. DTD default values and their equivalents
Example 19–10 provides some examples of attribute declarations with various types and default values.
Example 19–10. Attribute declarations
DTD:
<!ATTLIST product
id ID #REQUIRED
name CDATA #IMPLIED
type NMTOKEN "PR"
version NMTOKEN #FIXED "A123">
Schema:
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="name" type="xs:normalizedString"
use="optional"/>
<xs:attribute name="type" type="xs:NMTOKEN" default="PR"/>
<xs:attribute name="version" type="xs:NMTOKEN" fixed="A123"/>
Internal parameter entities are often used in DTDs to reuse pieces of element or attribute declarations. Using schemas, reuse is handled by creating reusable types, named model groups, and attribute groups.
This section explains how to convert internal parameter entities into XML Schema components.
In DTDs, a parameter entity may be used to define a content model once and reuse it for multiple element types. Using schemas, the best way to accomplish this is to define a named complex type that is then used by multiple element declarations. This is illustrated in Example 19–11, where the AOrB content model is used by two element declarations, x and y.
Example 19–11. Reusing entire content models
DTD:
<!ENTITY % AOrB "(a | b)">
<!ELEMENT x %AOrB;>
<!ELEMENT y %AOrB;>
Schema:
<xs:complexType name="AOrBType">
<xs:choice>
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
</xs:complexType>
<xs:element name="x" type="AOrBType"/>
<xs:element name="y" type="AOrBType"/>
A parameter entity may also be used to represent a fragment of a content model. In XML Schema, named model groups serve this purpose. Example 19–12 shows a content model fragment AOrB that is used as part of the entire content model in the x element declaration. See Section 15.2 on p. 386 for more information on named model groups.
Example 19–12. Reusing fragments of content models
DTD:
<!ENTITY % AOrB "a | b">
<!ELEMENT x ((%AOrB;), c)>
Schema:
<xs:group name="AOrBGroup">
<xs:choice>
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
</xs:group>
<xs:element name="x">
<xs:complexType>
<xs:sequence>
<xs:group ref="AOrBGroup"/>
<xs:element ref="c"/>
</xs:sequence>
</xs:complexType>
</xs:element>
In some cases, parameter entities are used in DTDs to reuse an attribute or a set of attributes that are common to several element types. In XML Schema, attribute groups are used for this purpose. Example 19–13 shows the definition of an attribute group HeaderGroup containing two attributes, which is then referenced by the x element declaration.
Example 19–13. Reusing groups of attributes
DTD:
<!ENTITY % HeaderGroup "id ID #REQUIRED
variety NMTOKEN #IMPLIED">
<!ATTLIST x %HeaderGroup;>
Schema:
<xs:attributeGroup name="HeaderGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="variety" type="xs:NMTOKEN"/>
</xs:attributeGroup>
<xs:element name="x">
<xs:complexType>
<xs:attributeGroup ref="HeaderGroup"/>
</xs:complexType>
</xs:element>
Parameter entities are sometimes used to make DTDs more flexible and future-proof. Empty entities are declared and placed in various parts of the DTD, most often in content models and attribute lists. This allows a parent (or internal) DTD to override the entity declaration, thus overriding the original DTD without having to completely rewrite it. Using schemas, this can be accomplished through several methods: type derivation, substitution groups, redefines, or overrides.
In DTDs, you can place a reference to an empty parameter entity at the end of a content model, as shown in Example 19–14. In XML Schema, this can be accomplished using the redefine or override mechanism.
Example 19–14. Allowing future extensions for sequence groups
DTD:
<!ENTITY % ext "" >
<!ELEMENT x (a, b %ext;)>
Schema:
<xs:group name="ext">
<xs:sequence/>
</xs:group>
<xs:element name="x">
<xs:complexType>
<xs:sequence>
<xs:element ref="a"/>
<xs:element ref="b"/>
<xs:group ref="ext"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Example 19–15 shows how these extensions could be accomplished in a new parent DTD or in a new schema. In the schema, the redefine mechanism is used to extend the named model group to add to the end of the content model. Redefinition is covered in Chapter 18.
Example 19–15. Implementing extensions for sequence groups using redefine
DTD:
<!ENTITY % ext ", c, d" >
<!ENTITY % original SYSTEM "original.dtd">
%original;
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="original.xsd">
<xs:group name="ext">
<xs:sequence>
<xs:group ref="ext"/>
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:sequence>
</xs:group>
</xs:redefine>
</xs:schema>
In version 1.1 of XML Schema, a better choice is to use override, since redefine is deprecated. Example 19–16 shows a revised example that uses override. Overrides are also covered in Chapter 18.
Example 19–16. Implementing extensions for sequence groups using override
DTD:
<!ENTITY % ext ", c, d" >
<!ENTITY % original SYSTEM "original.dtd">
%original;
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="original.xsd">
<xs:group name="ext">
<xs:sequence>
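<!-- override replaces ext outright (no self-reference), so only the new elements are listed; repeating a and b here would yield a, b, a, b, c, d -->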
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:sequence>
</xs:group>
</xs:override>
</xs:schema>
On the other hand, if it is a choice group that you wish to leave open, type derivation by extension will not meet your needs. This is because all extensions are added to the end of the content model as part of a sequence group. For a more detailed explanation of this, see Section 13.4.2.1 on p. 309.
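The following sketch, with two hypothetical type names, illustrates the problem. Extending a type whose content model is a choice does not add new alternatives to that choice. Instead, the derived type's effective content model is the base content followed by the new particles, here ((a | b), c, d).

```xml
<!-- Base type: a choice between a and b -->
<xs:complexType name="BaseType">
  <xs:choice>
    <xs:element ref="a"/>
    <xs:element ref="b"/>
  </xs:choice>
</xs:complexType>

<!-- Derived type: c and d follow the whole choice; they are not alternatives -->
<xs:complexType name="DerivedType">
  <xs:complexContent>
    <xs:extension base="BaseType">
      <xs:sequence>
        <xs:element ref="c"/>
        <xs:element ref="d"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```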
The best approach to extending a choice group is by using a substitution group. Substitution groups allow an element declaration to be replaced by any of a group of designated element declarations. New element declarations can be added to the substitution group at any time. The schema fragment in Example 19–17 uses a choice group that contains a reference to the ext element declaration. Because it is abstract, ext can never be used in an instance.
Example 19–17. Allowing future extensions for choice groups
DTD:
<!ENTITY % ext "" >
<!ELEMENT x (a | b %ext;)*>
Schema:
<xs:element name="x">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="a"/>
<xs:element ref="b"/>
<xs:element ref="ext"/>
</xs:choice>
</xs:complexType>
</xs:element>
<xs:element name="ext" abstract="true" type="xs:string"/>
Example 19–18 shows how these extensions would be accomplished in a new parent DTD or in a new schema. In the schema, element declarations c and d are added to the substitution group headed by ext, allowing these element declarations to appear in the content model as part of the choice.
Example 19–18. Implementing extensions for choice groups
DTD:
<!ENTITY % ext "| c | d" >
<!ENTITY % original SYSTEM "original.dtd">
%original;
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="original.xsd"/>
<xs:element name="c" substitutionGroup="ext"/>
<xs:element name="d" substitutionGroup="ext"/>
</xs:schema>
Parameter entities may also be used in DTDs to leave attribute lists open to future additions. Using schemas, this can be handled through redefining or overriding attribute groups. Example 19–19 shows a DTD that includes an empty parameter entity in an attribute list. The corresponding schema has an empty attribute group that serves the same purpose.
Example 19–19. Allowing future extensions for attributes
DTD:
<!ENTITY % attExt "" >
<!ATTLIST x id ID #REQUIRED
%attExt;>
Schema:
<xs:attributeGroup name="attExt"/>
<xs:element name="x">
<xs:complexType>
<!-- content model here -->
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attributeGroup ref="attExt"/>
</xs:complexType>
</xs:element>
Example 19–20 shows how attribute extensions would be accomplished in a new parent DTD or in a new schema. In the schema, the redefine mechanism is used to extend the attribute group to add a new attribute.
Example 19–20. Implementing extensions for attributes using redefine
DTD:
<!ENTITY % attExt "myAttr NMTOKEN #IMPLIED" >
<!ENTITY % original SYSTEM "original.dtd">
%original;
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="original.xsd">
<xs:attributeGroup name="attExt">
<xs:attributeGroup ref="attExt"/>
<xs:attribute name="myAttr" type="xs:NMTOKEN"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>
This technique can also replace the practice, sometimes used to extend attribute lists, of declaring multiple ATTLISTs for a single element type.
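For reference, the DTD practice in question looks like the first fragment below. XML 1.0 merges multiple attribute-list declarations for the same element type. After the redefine in Example 19–20 (or the override in Example 19–21), element x effectively has the two attribute declarations shown in the second fragment, which is a conceptual summary rather than text you would write.

```xml
<!-- DTD: a second ATTLIST for x adds myAttr; the two lists are merged -->
<!ATTLIST x id ID #REQUIRED>
<!ATTLIST x myAttr NMTOKEN #IMPLIED>

<!-- Schema: attributes x ends up with once attExt contributes myAttr -->
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="myAttr" type="xs:NMTOKEN"/>
```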
In version 1.1 of XML Schema, a better choice is to use override, since redefine is deprecated. Example 19–21 shows a revised example that uses override.
Example 19–21. Implementing extensions for attributes using override
DTD:
<!ENTITY % attExt "myAttr NMTOKEN #IMPLIED" >
<!ENTITY % original SYSTEM "original.dtd">
%original;
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="original.xsd">
<xs:attributeGroup name="attExt">
<xs:attribute name="myAttr" type="xs:NMTOKEN"/>
</xs:attributeGroup>
</xs:override>
</xs:schema>
External parameter entities are used to include other DTDs (or fragments of DTDs) in a parent DTD. In a schema, this is accomplished using either include or import. An include can be used if both schema documents have the same target namespace (or no namespace), while import is used if they are in different namespaces. Example 19–22 illustrates the use of include to combine schema documents. See Section 4.3.1 on p. 62 for more detailed information on the include mechanism.
Example 19–22. Including other DTDs or schema documents
DTD:
<!ENTITY % prodInfo SYSTEM "prod.dtd">
%prodInfo;
Schema:
<xs:include schemaLocation="prod.xsd"/>
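When the schema document being combined has a different target namespace, import takes the place of include. The sketch below assumes that prod.xsd declares a global product element in the namespace http://example.org/prod. That URI and the order element are placeholders.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://example.org/prod">
  <!-- placeholder namespace; it must match prod.xsd's targetNamespace -->
  <xs:import namespace="http://example.org/prod"
             schemaLocation="prod.xsd"/>
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- assumes prod.xsd declares product globally -->
        <xs:element ref="prod:product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```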
General entities are used in DTDs to represent characters or other repeated character data that appears in instances. Unfortunately, there is no direct equivalent for general entities in XML Schema. It is still possible to use an internal or external DTD to declare the entities and use this DTD in conjunction with schemas, as explained in Section 19.9 on p. 499.
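As a sketch of that approach, the instance below declares a general entity in its internal DTD subset and is also associated with a schema. The entity name company, its replacement text, and the supplier element (assumed to be declared in prod.xsd) are hypothetical. The internal subset exists here only to declare the entity. The parser expands the entity reference before schema validation, so the schema sees only the replacement text.

```xml
<!DOCTYPE catalog [
  <!ENTITY company "ACME Apparel, Inc.">
]>
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="prod.xsd">
  <!-- validated as if it contained the literal text ACME Apparel, Inc. -->
  <supplier>&company;</supplier>
</catalog>
```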
Unparsed entities are used in conjunction with notations to reference external data in non-XML formats, such as graphics files. Because XML Schema has no way to declare entities, an instance that uses unparsed entities must be associated with a DTD (usually an internal DTD subset) that declares them, even if the instance is validated against a schema. This is described further in Section 19.7.3 on p. 496.
Notations are used to indicate the format of non-XML data. For example, notations can be declared to indicate whether certain binary graphics data embedded in a picture element is in JPEG or GIF format. Notations may describe data embedded in an XML instance, or data in external files that are linked to the instance through unparsed entities.
A notation may have a system or public identifier. There are no standard notation names or identifiers for well-known formats such as JPEG. Sometimes the identifier points to an application that can be used to process the format, for example viewer.exe, and other times it points to documentation about that format. Sometimes it is simply an abbreviation that can be interpreted by an application. Schema processors do not resolve these identifiers; it is up to the consuming application to process the notations as desired.
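The notation element accepts a public identifier, a system identifier, or both. The sketch below shows each form. The notation names and identifier values are illustrative only, since no standard identifiers exist.

```xml
<!-- public identifier only: an abbreviation an application recognizes -->
<xs:notation name="jpeg" public="image/jpeg"/>
<!-- system identifier only: points to a processing application -->
<xs:notation name="png" system="viewer.exe"/>
<!-- both: a public name plus a (placeholder) documentation URI -->
<xs:notation name="tiff" public="TIFF" system="http://example.org/docs/tiff"/>
```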
To indicate that a picture element contains JPEG data, it will generally have a notation attribute (for example, fmt) that indicates which notation applies. An element should only have one notation attribute.
Example 19–23 shows an instance that uses a notation. The fmt attribute contains the name of the notation that applies to the contents of picture.
Example 19–23. Using a notation in an instance
<picture fmt="gif">47494638396132003200F7FF00FFFFFFFFFFCCFFFF99FFFF66FFFF33FFFF00FF</picture>
Notations in XML Schema are declared using notation elements, whose syntax is shown in Table 19–6. Notations are always declared globally, with schema as their parent. Notations are named components whose qualified names must be unique among all notations in a schema. Like other named, global components, notations take on the target namespace of the schema document. However, for compatibility, it is recommended that notations only be declared in schemas that have no target namespace.
Table 19–6. XSD Syntax: notation
As mentioned earlier, elements that contain data described by a notation have a notation attribute. This attribute has a type that restricts the type NOTATION by specifying one or more enumeration facets. Each of these enumeration values must match the name of a declared notation.
Example 19–24 shows two notation declarations that represent graphics formats. A simple type PictureNotationType is then defined, based on NOTATION, which enumerates the names of the notations. Next, an element declaration for picture is provided, which declares an attribute fmt of type PictureNotationType.
Example 19–24. Declaring notations and notation attributes
<xs:notation name="jpeg" public="JPG"/>
<xs:notation name="gif" public="GIF"/>
<xs:simpleType name="PictureNotationType">
<xs:restriction base="xs:NOTATION">
<xs:enumeration value="jpeg"/>
<xs:enumeration value="gif"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="picture">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:hexBinary">
<xs:attribute name="fmt" type="PictureNotationType"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Examples 19–23 and 19–24 showed the graphics data embedded directly in the XML as hexadecimal (hexBinary) text. Notations can also be used to indicate the format of an unparsed general entity. Example 19–25 shows an XML document that lists products and links to pictures of those products. In the schema, picture is declared to have an attribute location that is of type ENTITY. In the instance, each value of the location attribute (in this case, prod557 and prod563) matches the name of an entity declared in the internal DTD subset for the instance. The entity, in turn, refers to the notation via the NDATA keyword. The notation must therefore be declared in the DTD (here, the internal subset) so that the entity can reference it; a notation declared only in the schema is not visible to entity declarations.
Example 19–25. A notation with an unparsed entity
Schema:
<xs:element name="picture">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITY"/>
</xs:complexType>
</xs:element>
Instance:
<!DOCTYPE catalog SYSTEM "catalog.dtd" [
<!NOTATION jpeg SYSTEM "JPG">
<!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
<!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>
]>
<catalog>
<product>
<number>557</number>
<picture location="prod557"/>
</product>
<product>
<number>563</number>
<picture location="prod563"/>
</product>
</catalog>
DTDs often use comments to further explain the declarations they contain. Schema documents, as XML, can also contain comments. However, XML Schema also offers an annotation facility that is designed to provide more structured, usable documentation of schema components. Example 19–26 shows a DTD fragment that has a comment describing a section (CUSTOMER INFORMATION) and two element declarations with element-specific comments appearing before each one.
The corresponding schema places each of these comments within an annotation element. The first annotation element, which describes the section, appears as a direct child of the schema. The element-specific annotations, on the other hand, are defined entirely within the element declarations to which they apply. In all three cases, documentation elements are used, which are designed for human-readable information. The schema is considerably more verbose than the DTD, but the descriptive information is much better structured. Section 21.8 on p. 580 covers schema documentation in detail.
DTD:
<!-- ******************** -->
<!-- CUSTOMER INFORMATION -->
<!-- ******************** -->
<!-- billing address -->
<!ELEMENT billTo (%AddressType;)>
<!-- shipping address -->
<!ELEMENT shipTo (%AddressType;)>
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc">
<xs:annotation>
<xs:documentation>
<doc:section>CUSTOMER INFORMATION</doc:section>
</xs:documentation>
</xs:annotation>
<xs:element name="billTo" type="AddressType">
<xs:annotation>
<xs:documentation>
<doc:description>billing address</doc:description>
</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element name="shipTo" type="AddressType">
<xs:annotation>
<xs:documentation>
<doc:description>shipping address</doc:description>
</xs:documentation>
</xs:annotation>
</xs:element>
</xs:schema>
There is nothing to prevent an instance from being validated against both a DTD and a schema. In fact, if you wish to use general entities, you must continue to use DTDs alongside schemas. Example 19–27 shows an instance that has both a DTD and a reference to a schema.
Example 19–27. Using a DTD and a schema
<!DOCTYPE catalog SYSTEM "catalog.dtd" [
<!NOTATION jpeg SYSTEM "JPG">
<!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
<!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>]>
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="prod.xsd">
<product>
<number>557</number>
<picture location="prod557"/>
</product>
<product>
<number>563</number>
<picture location="prod563"/>
</product>
</catalog>
Two separate validations can take place: one against the DTD and one against the schema. The DTD validity will be assessed first. This process will not only validate the instance, but also augment it by resolving the entities, filling in attributes’ default values, and normalizing whitespace in attribute values. Validity according to the schema is then assessed on the augmented instance. None of the declarations in the DTD override the declarations in the schema. If there are declarations for the same element in both the DTD and the schema and these declarations are conflicting, an element may be DTD-valid but not schema-valid.
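The following sketch shows how such a conflict can arise. It assumes that catalog.dtd gives product a defaulted status attribute while prod.xsd does not declare that attribute at all; the attribute name and value are hypothetical. If the parser reads the DTD and supplies the default, an instance element such as <product><number>557</number></product> can be DTD-valid (assuming the DTD declares its content) but schema-invalid, because the augmented product element carries a status attribute that the schema does not allow.

```xml
<!-- catalog.dtd (fragment): DTD processing adds status="new"
     to every product that does not specify it -->
<!ATTLIST product status NMTOKEN "new">

<!-- prod.xsd (fragment): product declares no status attribute,
     so the augmented element fails schema validation -->
<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="picture" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```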