Complex Types

A schema assigns a type to each element and attribute it declares. In Example 17-5, the fullName element has a complex type. Elements with complex types may contain nested elements and have attributes. Only elements can contain complex types. Attributes always have simple types.

Since the type is declared using an xs:complexType element embedded directly in the element declaration, it is also an anonymous type, rather than a named type.

New types are defined using xs:complexType or xs:simpleType elements. If a new type is declared globally with a top-level element, it needs to be given a name so that it can be referenced from element and attribute declarations within the schema. If a type is defined inline (inside an element or attribute declaration), it does not need to be named. But since it has no name, it cannot be referenced by other element or attribute declarations. When building large and complex schemas, data types will need to be shared among multiple different elements. To facilitate this reuse, it is necessary to create named types.

To show how named types and complex content interact, let's expand the example schema. A new address element will contain the fullName element, and the person's name will be divided into a first- and last-name component. A typical instance document would look like Example 17-6.

Example 17-6. addressdoc.xml after adding address, first, and last elements

<?xml version="1.0"?>
<addr:address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://namespaces.oreilly.com/xmlnut/address 
      address-schema.xsd"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    addr:language="en">
  <addr:fullName>
    <addr:first>Scott</addr:first>
    <addr:last>Means</addr:last>
  </addr:fullName>
</addr:address>

To accommodate this new format, fairly substantial structural changes to the schema are required, as shown in Example 17-7.

Example 17-7. address-schema.xsd to support address element

<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
  xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
  elementFormDefault="qualified">
<xs:element name="address">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="fullName">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first" type="addr:nameComponent"/>
            <xs:element name="last" type="addr:nameComponent"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
 </xs:element>
 
 <xs:complexType name="nameComponent">
  <xs:simpleContent>
    <xs:extension base="xs:string"/>
  </xs:simpleContent>
 </xs:complexType>
</xs:schema>

The first major difference between this schema and the previous version is that the root element name has been changed from fullName to address. The same result could have been accomplished by creating a new top-level element declaration for the new address element, but that would have opened a loophole allowing a valid instance document to contain only a fullName element and nothing else.

The address element declaration defines a new anonymous complex type. Unlike the old definition, this complex type is defined to contain complex content using the xs:sequence element. The sequence element tells the schema processor that the contained list of elements must appear in the target document in the exact order they are given. In this case, the sequence contains only one element declaration.

The nested element declaration is for the fullName element, which then repeats the xs:complexType and xs:sequence definition process. Within this nested sequence, two element declarations appear for the first and last elements.

These two element declarations, unlike all prior element declarations, explicitly reference a new complex type that's declared in the schema: the addr:nameComponent type. It is fully qualified to differentiate it from possible conflicts with built-in schema data types.

The nameComponent type is declared by the xs:complexType element immediately following the address element declaration. It is identified as a named type by the presence of the name attribute, but in every other way it is constructed the same way it would have been as an anonymous type.

One feature of schemas that should be welcome to DTD developers is the ability to explicitly set the minimum and maximum number of times an element may occur at a particular point in a document using minOccurs and maxOccurs attributes of the xs:element element. For example, this declaration adds an optional middle name to the fullName element:

<xs:element name="fullName">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="first" type="addr:nameComponent"/>
      <xs:element name="middle" type="addr:nameComponent"
          minOccurs="0"/>
      <xs:element name="last" type="addr:nameComponent"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Notice that the element declaration for the middle element has a minOccurs value of 0. The default value for both minOccurs and maxOccurs is 1, if they are not provided explicitly. Therefore, setting minOccurs to 0 means that the middle element may appear 0 to 1 times. This is equivalent to using the ? operator in a DTD declaration. Another possible value for the maxOccurs attribute is unbounded, which indicates that the element in question may appear an unlimited number of times. This value is used to produce the same effect as the * and + operators in a DTD declaration. The advantage over DTDs comes when you use values other than 0, 1, or unbounded, letting you specify things like "this element must appear at least twice but no more than four times."

So far you have seen elements that only contain character data and elements that only contain other elements. The next several sections cover each of the possible types of element content individually, from most restrictive to least restrictive: