Chapter 5. Creating Simple Datatypes

So far, we have used only predefined datatypes. In this chapter, we will see how to create new simple types, taking advantage of the different derivation mechanisms and facets of derivation by restriction.

W3C XML Schema defines three independent and complementary mechanisms for creating our own custom datatypes, using existing datatypes as starting points. Building new user datatypes upon existing predefined datatypes, or upon other user datatypes, is called “derivation.”

The three derivation methods are derivation by restriction (where constraints are added on a datatype without changing its original semantics or meaning), derivation by list (where new datatypes are defined as lists of values belonging to a datatype and take on the semantics of list datatypes), and derivation by union (where new datatypes are defined as allowing values from a set of other datatypes and lose most of their semantics).
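
As a rough sketch of the three methods side by side (the type names below are hypothetical and chosen only for illustration):

<!-- Restriction: a narrower range of the same kind of value -->
<xs:simpleType name="smallInteger">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="9"/>
  </xs:restriction>
</xs:simpleType>

<!-- List: whitespace-separated sequence of smallInteger values -->
<xs:simpleType name="smallIntegerList">
  <xs:list itemType="smallInteger"/>
</xs:simpleType>

<!-- Union: either a smallInteger or a date -->
<xs:simpleType name="smallIntegerOrDate">
  <xs:union memberTypes="smallInteger xs:date"/>
</xs:simpleType>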

As with xs:complexType definitions (which we saw in our Russian doll design), xs:simpleType definitions can be either named or anonymous. Despite this similarity, simple and complex types are very different. A simple type is a restriction on the value of an element or an attribute (i.e., a constraint on the content of a set of documents) while a complex type is a definition of a content model (i.e., a constraint on the markup). This is why the derivation methods for simple and complex types are very different, even though W3C XML Schema uses the same element name (xs:restriction) for both. This is a common source of confusion.

Tip

These derivation methods are flexible and powerful. However, the fact that W3C XML Schema needs many different primitive datatypes can be seen as proof that these methods are not sufficient to create a new primitive datatype. The reason is that the derivation methods act only on the value space or on the lexical space (as defined in Chapter 4); they cannot modify the relations between these two spaces, nor create new value or lexical spaces. This subject was debated by the W3C XML Schema Working Group, which did not reach an agreement on a way to define an abstract datatype system that would allow the definition of several lexical representations. The most obvious consequence of this decision is that, despite the protests of the W3C I18N Working Group, W3C XML Schema doesn’t allow the definition of localized decimal or date datatypes.

Derivation By Restriction

Restriction is probably the most commonly used and natural derivation method. Datatypes are created by restriction by adding new constraints to the possible values. W3C XML Schema itself uses derivation by restriction to define most of its derived predefined datatypes, such as xs:positiveInteger, which is derived by restriction (through xs:nonNegativeInteger) from xs:integer. The restrictions can be defined along different aspects or axes that W3C XML Schema calls “facets.”

A derivation by restriction is done using an xs:restriction element, and each facet is defined using a specific element embedded in the xs:restriction element. The datatype on which the restriction is applied is called the base datatype; it can either be referenced through a base attribute or be defined inline within the xs:restriction element. Using the base attribute:

<xs:simpleType name="myInteger">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="-2"/>
    <xs:maxExclusive value="5"/>
  </xs:restriction>
</xs:simpleType>

It can also be defined in two steps using an embedded anonymous xs:simpleType definition:

<xs:simpleType name="myInteger">
  <xs:restriction>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxExclusive value="5"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:minInclusive value="-2"/>
  </xs:restriction>
</xs:simpleType>

The xs:minInclusive and xs:maxExclusive elements are two facets that can be applied to an integer datatype. As can be guessed from their names, they specify the minimum inclusive (i.e., that can be reached) and maximum exclusive (i.e., that is not allowed) values. We will introduce the list of facets in the next section. Depending on the facet, each acts directly either on the value space or on the lexical space of the datatype, and the same facet may have different effects depending on the datatype on which it is applied.

Whatever facet is being applied on a datatype, the semantic of its primitive type is unchanged, the list of facets that can be applied cannot be extended, and one must be careful to choose, when possible, a datatype whose primitive type matches the purpose of the node in which it will be used. For instance, while it is possible to constrain a string datatype to match non-ISO 8601 dates using patterns, this solution should be used only when absolutely required since this datatype would still be considered a string and lack facets, such as xs:minInclusive or xs:maxExclusive that are defined on date datatypes but that have no meaning (for W3C XML Schema) on a string.
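
To illustrate what is gained by keeping a date primitive, here is a minimal sketch (the type name is hypothetical) that bounds an xs:date. The same range could not be expressed with these facets on a string-based datatype:

<xs:simpleType name="dateIn2001">
  <xs:restriction base="xs:date">
    <!-- Both bounds are compared as dates, not as character strings -->
    <xs:minInclusive value="2001-01-01"/>
    <xs:maxInclusive value="2001-12-31"/>
  </xs:restriction>
</xs:simpleType>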

Tip

The impact of the “right” choice of the base datatype with a semantic as close as possible to its actual usage in the instance documents will become more critical when W3C XML Schema aware applications become available. Such applications will have a different behavior depending on the datatype information found in the PSVI. A “wrong” choice will have side effects. For instance, the first drafts of XPath 2.0 propose to interpret values according to predefined datatypes and the results of equality tests on values or the sort orders would depend on the datatypes.

Facets

Before we start looking at the list of facets, we’ll discuss the way they work. They may be classified into three categories: xs:whiteSpace defines the whitespace processing that happens between the parsed space and the lexical space (but can be used only on xs:string and xs:normalizedString); xs:pattern works on the lexical space; all the other facets constrain the value space. The availability of the facets and their effect depend on the datatype on which they are applied. We will see them in the context of groups of datatypes sharing the same set of facets.

Whitespace collapsed strings

These datatypes share the fact that they are character strings (even though technically W3C XML Schema doesn’t consider all of them as derived from the xs:string datatypes) and that whitespaces are collapsed before validation, as defined in the Recommendation, “all occurrences of #x9 (tab), #xA (line feed), and #xD (carriage return) are replaced with #x20 (space) and then, contiguous sequences of #x20s are collapsed to a single #x20, and initial and/or final #x20s are deleted.”

Those datatypes are: xs:ENTITY, xs:ID, xs:IDREF, xs:language, xs:Name, xs:NCName, xs:NMTOKEN, xs:token, xs:anyURI, xs:base64Binary, xs:hexBinary, xs:NOTATION, and xs:QName. Their facets are explained in the next sections.

xs:enumeration

xs:enumeration allows definition of a list of possible values. Here’s an example:

<xs:simpleType name="schemaRecommendations">
  <xs:restriction base="xs:anyURI">
    <xs:enumeration value="http://www.w3.org/TR/xmlschema-0/"/>
    <xs:enumeration value="http://www.w3.org/TR/xmlschema-1/"/>
    <xs:enumeration value="http://www.w3.org/TR/xmlschema-2/"/>
  </xs:restriction>
</xs:simpleType>

This facet constrains the value space. For most of the string (and assimilated) datatypes, the lexical and value spaces are identical and this doesn’t make any difference; however, it does make a difference for xs:anyURI, xs:base64Binary, and xs:QName. For instance, "http://dmoz.org/World/Français/" and "http://dmoz.org/World/Fran%c3%a7ais/" would be considered equal for xs:anyURI, the line breaks would be ignored for xs:base64Binary, and the match would be done on the tuples {namespace URI, local name} for xs:QName, ignoring the prefix used in the schema and instance documents.

One should also note that xs:anyURI values are not “absolutized” by W3C XML Schema, which does not support xml:base. This means that if the “schemaRecommendations” datatype defined in the previous example is assigned to an XLink href attribute, the following instance element will fail to validate:

<a xml:base="http://www.w3.org/TR/" href="xmlschema-1/">
  XML Schema Part 2: Datatypes
</a>

We cannot leave this section without discussing xs:NOTATION. This datatype is the only case of a predefined datatype that cannot be used directly in a schema and must be used through derived types specifying a set of xs:enumeration facets. Even though notations are very seldom used in real-life applications, this book wouldn’t be complete without at least one example of notations. Take the usual example of a picture that uses a notation in an attribute to qualify the content of a binary field:

<?xml version="1.0"?> 
<picture type="png"> 
 
  iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAIAAAACUFjqAAAABmJLR0QA/wD/AP+gvaeTAAAA
  CXBIWXMAAAsSAAALEgHS3X78AAAAB3RJTUUH0QofESYx2JhwGwAAAFZJREFUeNqlj8ENwDAI
  A6HqGDCWp2QQ2AP2oI9IbaQm/dRPn9EJ7m7a56DPPDgiIoKIzGyBM9Pdx+4ueXabWVUBEJHR
  nLNJVbfuqspMAEOxwO9r/vX3BTEnKRXtqqslAAAAAElFTkSuQmCC
</picture>

The schema might be written as (note how the notations need to be declared in the schema to be used in an xs:enumeration facet):

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
  <xs:notation name="jpeg" public="image/jpeg"
    system="file:///usr/bin/xv"/> 
  <xs:notation name="gif" public="image/gif"
    system="file:///usr/bin/xv"/> 
  <xs:notation name="png" public="image/png"
    system="file:///usr/bin/xv"/> 
  <xs:notation name="svg" public="image/svg"
    system="file:///usr/bin/xsmiles"/> 
  <xs:notation name="pdf" public="application/pdf"
    system="file:///usr/bin/acroread"/>
  <xs:simpleType name="graphicalFormat">
    <xs:restriction base="xs:NOTATION">
      <xs:enumeration value="jpeg"/>
      <xs:enumeration value="gif"/>
      <xs:enumeration value="png"/>
      <xs:enumeration value="svg"/>
      <xs:enumeration value="pdf"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="picture">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:base64Binary">
          <xs:attribute name="type" type="graphicalFormat"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

xs:length

xs:length defines a fixed length measured in number of characters (general case) or bytes (xs:hexBinary and xs:base64Binary):

<xs:simpleType name="standardNotations">
  <xs:restriction base="xs:NOTATION">
    <xs:length value="8"/>
  </xs:restriction>
</xs:simpleType>

This facet also constrains the value space. For xs:anyURI, the result may be difficult to predict since the length is checked after the character normalization. For xs:QName, this is even worse since the W3C XML Schema Recommendation has not given any definition of the length of an xs:QName tuple. Fortunately, in practice, constraining the length of these datatypes doesn’t seem to be very useful, and it’s a good idea to avoid using these constraints on these datatypes. The same restriction applies to the next two facets.

xs:maxLength

xs:maxLength defines a maximum length measured in number of characters (general case) or bytes (xs:hexBinary and xs:base64Binary):

<xs:simpleType name="binaryImage">
  <xs:restriction base="xs:hexBinary">
    <xs:maxLength value="1024"/>
  </xs:restriction>
</xs:simpleType>

xs:minLength

xs:minLength defines a minimum length measured in number of characters (general case) or bytes (xs:hexBinary and xs:base64Binary):

<xs:simpleType name="longName">
  <xs:restriction base="xs:NCName">
    <xs:minLength value="6"/>
  </xs:restriction>
</xs:simpleType>
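
The two length bounds can also be combined in a single derivation step. The following sketch (the type name is hypothetical) accepts names of 6 to 12 characters:

<xs:simpleType name="mediumName">
  <xs:restriction base="xs:NCName">
    <!-- Lower and upper bounds on the number of characters -->
    <xs:minLength value="6"/>
    <xs:maxLength value="12"/>
  </xs:restriction>
</xs:simpleType>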

xs:pattern

xs:pattern defines a pattern that must be matched by the string (we will explore patterns in more detail in the next chapter):

<xs:simpleType name="httpURI">
  <xs:restriction base="xs:anyURI">
    <xs:pattern value="http://.*"/>
  </xs:restriction>
</xs:simpleType>

Several pattern facets can be defined in a single derivation step. They are then merged together through a logical “or” (a value will match the restricted datatype if it matches one of the patterns).
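
As a small illustration of this “or” behavior (the type name is hypothetical), the following datatype accepts a URI that matches either of the two patterns:

<xs:simpleType name="webURI">
  <xs:restriction base="xs:anyURI">
    <!-- Two patterns in the same step: a value needs to match only one -->
    <xs:pattern value="http://.*"/>
    <xs:pattern value="https://.*"/>
  </xs:restriction>
</xs:simpleType>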

Tip

Because of the impossibility of defining a single order that would be useful for all the regional alphabets, W3C XML Schema has decided to handle the string datatypes as being unordered. The consequence is there are no facets to define minimal or maximal values for string datatypes.

Other strings

The whitespaces of these other strings are not collapsed before validation, and a new facet (xs:whiteSpace) is available, in addition to the facets just described, to specify the treatment to apply on whitespaces for the user-defined datatypes derived from them.

Those datatypes are: xs:normalizedString and xs:string.

xs:whiteSpace

xs:whiteSpace defines the way to handle whitespaces—i.e., #x20 (space), #x9 (tab), #xA (linefeed), and #xD (carriage return)—for this datatype:

<xs:simpleType name="CapitalizedNameWS">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
    <xs:pattern value="([A-Z]([a-z]*) ?)+"/>
  </xs:restriction>
</xs:simpleType>

The values of an xs:whiteSpace facet are “preserve” (whitespaces are kept unchanged), “replace” (all the instances of any whitespace are replaced with a space), and “collapse” (leading and trailing whitespaces are removed and all the other sequences of contiguous whitespaces are replaced by a single space). This facet is atypical since it specifies a treatment to be applied to a value before any validation test is run on this value. In the earlier example, setting whitespace to “collapse” allows the pattern to test for a single space character (“ ?”): because whitespaces are collapsed before the pattern is tested, the pattern will match any run of whitespaces between words.

The whitespace behavior cannot be relaxed during a restriction: if a datatype has a whitespace set as “preserve,” its derived datatypes can have any whitespace behavior; if its whitespace is set as “replace,” its derived datatypes can only have whitespace equal to “replace” or “collapse”; if its whitespace is “collapse,” all its derived datatypes must have the same behavior. This means xs:string is the only datatype that can be used to derive datatypes without any whitespace processing and xs:string and xs:normalizedString are the only datatypes that can be used to derive datatypes normalizing the whitespaces.

In practice, this facet isn’t really useful for user-defined datatypes since the whitespace processing largely dictates the choice of the predefined datatype to use. When we need a datatype that does no whitespace processing, we must use xs:string without any xs:whiteSpace facet. When we need a datatype that normalizes the whitespaces, instead of using xs:string and applying an xs:whiteSpace facet, we can use xs:normalizedString directly, which has the same effect. When we need a datatype that collapses the whitespaces, we can use xs:token if it’s a string (again, xs:token is not a token in the usual meaning of the word but rather a “tokenized string”) or any nonstring datatype. The whitespace processing will already be set to “collapse” without any need to use xs:whiteSpace. The previous example is then equivalent to:

<xs:simpleType name="CapitalizedNameWS">
  <xs:restriction base="xs:token">
    <xs:pattern value="([A-Z]([a-z]*) ?)+"/>
  </xs:restriction>
</xs:simpleType>

Tip

Technically speaking, the W3C Working Group hasn’t “fixed” the xs:whiteSpace facet for xs:token and its derived datatypes. However, xs:whiteSpace has been set to “collapse” for xs:token ; since the facet can’t be relaxed in further restriction, this value cannot be changed in any datatype derived from these datatypes.
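
The rule that whitespace processing can be tightened but never relaxed can be sketched as follows (both type names are hypothetical; the second definition is left in a comment because a schema processor should reject it):

<!-- Allowed: "replace" (xs:normalizedString) is tightened to "collapse" -->
<xs:simpleType name="collapsedText">
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse"/>
  </xs:restriction>
</xs:simpleType>

<!-- Not allowed: xs:token already collapses, so "preserve" would relax it
<xs:simpleType name="preservedToken">
  <xs:restriction base="xs:token">
    <xs:whiteSpace value="preserve"/>
  </xs:restriction>
</xs:simpleType>
-->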

Float datatypes

The facets of xs:double and xs:float are described in the next sections.

xs:enumeration

xs:enumeration allows definition of a list of possible values and operates on the value space—for example:

<xs:simpleType name="enumeration">
  <xs:restriction base="xs:float">
    <xs:enumeration value="-INF"/>
    <xs:enumeration value="1.618033989"/>
    <xs:enumeration value="3e3"/>
  </xs:restriction>
</xs:simpleType>

This simple type will match literals such as:

<enumeration>
  1.618033989
</enumeration>

<enumeration>
  3e3
</enumeration>

<enumeration>
  003000.0000
</enumeration>

This example shows (as we’ve briefly seen with xs:anyURI, xs:QName, and xs:base64Binary) two different lexical representations (“3e3” and “003000.0000”) for the same value. It also shows, as expected, that since the comparison is done in the value space, any lexical representation of one of the enumerated values is accepted.

xs:maxExclusive

xs:maxExclusive defines a maximum value that cannot be reached:

<xs:simpleType name="maxExclusive">
  <xs:restriction base="xs:float">
    <xs:maxExclusive value="10"/>
  </xs:restriction>
</xs:simpleType>

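This datatype validates “9.999999,” but not “10.” Because xs:float is a single-precision type, a literal carrying more significant digits than the type can hold (such as “9.999999999999999”) will typically be rounded to 10 and therefore be rejected.

The xs:maxExclusive facet is especially useful for datatypes such as xs:float, xs:double, xs:decimal, or even for datetime types, whose values can be arbitrarily close to each other and for which it is not practical to determine the greatest value that is smaller than a given value.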

xs:maxInclusive

xs:maxInclusive defines a maximum value that can be reached:

<xs:simpleType name="thousands">
  <xs:restriction base="xs:double">
    <xs:maxInclusive value="1e3"/>
  </xs:restriction>
</xs:simpleType>

xs:minExclusive

xs:minExclusive defines a minimum value that cannot be reached:

<xs:simpleType name="strictlyPositive">
  <xs:restriction base="xs:double">
    <xs:minExclusive value="0"/>
  </xs:restriction>
</xs:simpleType>

xs:minInclusive

xs:minInclusive defines a minimum value that can be reached:

<xs:simpleType name="positive">
  <xs:restriction base="xs:double">
    <xs:minInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>
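
Lower and upper bounds can be combined in a single derivation step, mixing inclusive and exclusive facets as needed. A minimal sketch (the type name is hypothetical) of a half-open interval:

<xs:simpleType name="angleInDegrees">
  <xs:restriction base="xs:double">
    <!-- 0 is allowed, 360 is not: the interval is [0, 360) -->
    <xs:minInclusive value="0"/>
    <xs:maxExclusive value="360"/>
  </xs:restriction>
</xs:simpleType>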

xs:pattern

xs:pattern defines a pattern that must be matched by the lexical value of the datatype:

<xs:simpleType name="nonScientific">
  <xs:restriction base="xs:float">
    <xs:pattern value="[^eE]*"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="noLeading0">
  <xs:restriction base="xs:float">
    <xs:pattern value="[^0].*"/>
  </xs:restriction>
</xs:simpleType>

This example shows how a pattern, acting on the lexical value of the float, can disable the use of scientific notation (xxxEyyy) or leading zeros.

Tip

The xs:pattern is the only facet that directly acts on the lexical space of the datatype.

Date and time datatypes

These datatypes are partially ordered, and bounds can be defined even though some restrictions apply. These datatypes are: xs:date, xs:dateTime, xs:duration, xs:gDay, xs:gMonth, xs:gMonthDay, xs:gYear, xs:gYearMonth, and xs:time. Their facets are the same as those of the float datatypes, as shown in the next sections.

xs:enumeration

xs:enumeration allows definition of a list of possible values and works on the value space. For example:

<xs:simpleType name="ModernSwissHistoricalDates">
  <xs:restriction base="xs:gYear">
    <xs:enumeration value="1864"/>
    <xs:enumeration value="1872"/>
    <xs:enumeration value="1914"/>
    <xs:enumeration value="1939"/>
    <xs:enumeration value="1971"/>
    <xs:enumeration value="1979"/>
    <xs:enumeration value="1992"/>
  </xs:restriction>
</xs:simpleType>

This simple type will match literals such as:

1939

Since no time zone is specified for the dates in the enumeration, the time zone is undetermined. These dates do not match any date with a time zone specified, such as:

1939Z

or:

1939+10:00

The same issue appears if enumerations include a time zone, such as in:

<xs:simpleType name="wakeUpTime">
  <xs:restriction base="xs:time">
    <xs:enumeration value="07:00:00-07:00"/>
    <xs:enumeration value="07:15:00-07:00"/>
    <xs:enumeration value="07:30:00-07:00"/>
    <xs:enumeration value="07:45:00-07:00"/>
    <xs:enumeration value="08:00:00-07:00"/>
  </xs:restriction>
</xs:simpleType>

This new datatype matches:

07:00:00-07:00

as well as:

11:00:00-04:00

and even:

07:15:00-07:15

but will not validate any time without a time zone.

Even though handling both times with and without time zones is problematic and questionable, it is possible to mix enumerations of values with and without time zones, such as:

<xs:simpleType name="sevenOClockPST">
  <xs:restriction base="xs:time">
    <xs:enumeration value="07:00:00-07:00"/>
    <xs:enumeration value="07:00:00"/>
  </xs:restriction>
</xs:simpleType>

xs:maxExclusive

xs:maxExclusive defines a maximum value that cannot be reached:

<xs:simpleType name="beforeY2K">
  <xs:restriction base="xs:dateTime">
    <xs:maxExclusive value="2000-01-01T00:00:00Z"/>
  </xs:restriction>
</xs:simpleType>

This datatype validates any date strictly less than Y2K UTC, such as:

1999-12-31T23:59:59Z

or:

1999-12-31T23:59:59.999999999999Z

It will also validate the same instants expressed in any other time zone, such as:

2000-01-01T11:59:59+12:00

It doesn’t validate:

2000-01-01T00:00:00Z

The interval of indeterminacy of +/-14 hours is applied when compared to datetimes without a time zone. The greatest datetime without a time zone (without counting the fractions of seconds) is therefore:

1999-12-31T09:59:59
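
A lower bound can be added in the same derivation step to define a closed period. The following sketch (the type name is hypothetical) accepts any instant of the year 2001 in UTC; values without a time zone remain subject to the 14-hour indeterminacy at both ends:

<xs:simpleType name="during2001UTC">
  <xs:restriction base="xs:dateTime">
    <!-- First instant of 2001 is allowed, first instant of 2002 is not -->
    <xs:minInclusive value="2001-01-01T00:00:00Z"/>
    <xs:maxExclusive value="2002-01-01T00:00:00Z"/>
  </xs:restriction>
</xs:simpleType>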

xs:maxInclusive

xs:maxInclusive defines a maximum value that can be reached:

<xs:simpleType name="AQuarterOrLess">
  <xs:restriction base="xs:duration">
    <xs:maxInclusive value="P3M"/>
  </xs:restriction>
</xs:simpleType>

This datatype validates all the durations less than or equal to 3 months. Durations such as P2M (2 months) or P3M (3 months) qualify. If both months and days are used, P2M30D (2 months and 30 days) will be valid, but P2M31D (2 months and 31 days), or even P2M30DT1S (2 months, 30 days and 1 second), will be rejected because of the indetermination of the actual duration when parts from year/month on one side and day/hours/minutes/seconds on the other side are used.

xs:minExclusive

xs:minExclusive defines a minimum value that cannot be reached:

<xs:simpleType name="afterTeaTimeInParisInSummer">
  <xs:restriction base="xs:time">
    <xs:minExclusive value="17:00:00+02:00"/>
  </xs:restriction>
</xs:simpleType>

xs:minInclusive

xs:minInclusive defines a minimum value that can be reached:

<xs:simpleType name="afterOrOnThe20th">
  <xs:restriction base="xs:gDay">
    <xs:minInclusive value="---20"/>
  </xs:restriction>
</xs:simpleType>

We can also take back our example using durations and define:

<xs:simpleType name="AQuarterOrMore">
  <xs:restriction base="xs:duration">
    <xs:minInclusive value="P3M"/>
  </xs:restriction>
</xs:simpleType>

This datatype validates all durations that are more than or equal to 3 months. Durations such as P4M (4 months) or P3M (3 months) will qualify. If both months and days are used, P2M31D (2 months and 31 days) will be valid, but P2M30D (2 months and 30 days), or even P2M30DT23H59M59S (2 months, 30 days, 23 hours, 59 minutes and 59 seconds), will be rejected because of the indetermination of the actual duration.

Because of this indeterminacy, W3C XML Schema effectively takes the least favorable length for our third month in each case: 31 days when we apply xs:minInclusive (P2M31D is the first value certain to reach P3M) and 30 days when we apply xs:maxInclusive (P2M30D is the last value certain not to exceed it). In practice, it may be wise to forbid the combinations that allow such an indeterminacy. We will see in the next chapter how to do it with a pattern.

xs:pattern

xs:pattern defines a pattern that must be matched by the lexical value of the datatype. We will see patterns in detail in the next chapter. To get an idea of what they look like, look at the following datatype. It forbids usage of a time zone by an xs:dateTime datatype:

<xs:simpleType name="noTimeZone">
  <xs:restriction base="xs:dateTime">
    <xs:pattern value=".*T[^Z+-]*"/>
  </xs:restriction>
</xs:simpleType>

Integer and derived datatypes

These datatypes are: xs:byte, xs:int, xs:integer, xs:long, xs:negativeInteger, xs:nonNegativeInteger, xs:nonPositiveInteger, xs:positiveInteger, xs:short, xs:unsignedByte, xs:unsignedInt, xs:unsignedLong, and xs:unsignedShort.

They accept the same facets as the float and datetime datatypes, which we just saw, plus an additional facet to constrain the number of digits, as shown next.

xs:totalDigits

xs:totalDigits defines the maximum number of decimal digits:

<xs:simpleType name="totalDigits">
  <xs:restriction base="xs:integer">
    <xs:totalDigits value="5"/>
  </xs:restriction>
</xs:simpleType>

This datatype accepts only integers with up to five decimal digits.

xs:totalDigits acts on the value space, which means that the integer “000012345,” whose canonical value is “12345,” matches the datatype defined previously.

Decimals

This single datatype (xs:decimal) accepts all the facets of the integers and an additional facet to define the number of fractional digits, as shown next.

xs:fractionDigits

xs:fractionDigits specifies the maximum number of decimal digits in the fractional part (after the dot):

<xs:simpleType name="fractionDigits">
  <xs:restriction base="xs:decimal">
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>

xs:fractionDigits acts on the value space, which means that the decimal “1.12000,” whose canonical value is “1.12,” matches the datatype defined previously.
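
The two digit-counting facets are often used together. As a sketch (the type name is hypothetical), the following datatype accepts “123456.78” but rejects “1.234” (three fractional digits) and “1234567.89” (nine digits in total):

<xs:simpleType name="amount">
  <xs:restriction base="xs:decimal">
    <!-- At most 8 significant digits, of which at most 2 after the dot -->
    <xs:totalDigits value="8"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>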

Booleans

With only one facet allowed, as far as restriction facets are concerned, the simplest datatype is xs:boolean. The value space of this simple datatype is limited to “true” and “false,” but its lexical space also includes “0” and “1.” The xs:pattern facet can be used to exclude one of these formats.

xs:pattern

The functionality of xs:pattern is usually very rich; however, given the limited number of values of xs:boolean, its only use here appears to be to fix a format:

<xs:simpleType name="trueOrFalse">
  <xs:restriction base="xs:boolean">
    <xs:pattern value="true"/>
    <xs:pattern value="false"/>
  </xs:restriction>
</xs:simpleType>
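
The opposite choice, keeping only the numeric format, can be sketched in the same way (the type name is hypothetical):

<xs:simpleType name="zeroOrOne">
  <xs:restriction base="xs:boolean">
    <!-- Only the lexical forms "0" and "1" remain valid -->
    <xs:pattern value="[01]"/>
  </xs:restriction>
</xs:simpleType>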

List datatypes

The available facets for the list datatypes (xs:IDREFS, xs:ENTITIES, and xs:NMTOKENS) are the facets available for all the datatypes that are derived by list, as we will see in the next section.

Multiple Restrictions and Fixed Attribute

New restrictions can be applied to datatypes that are already derived by restriction from other types.

When the new restrictions are done on facets that have not yet been constrained, the new facets are just added to the set of facets already defined. The value and lexical spaces of the new datatype are the intersection of all the restrictions. Things become more complex when the same facets are being redefined, and in one case (xs:whiteSpace) a “restriction” can actually extend the set of accepted instance values.

As far as multiple facet definitions are concerned, we can classify the facets into four categories, described in the next sections.

Facet that can be changed but needs to be more restrictive

This is the general case. xs:enumeration, xs:fractionDigits, xs:maxExclusive, xs:maxInclusive, xs:maxLength, xs:minExclusive, xs:minInclusive, xs:minLength, and xs:totalDigits are in this case.

For all these facets, it is forbidden to add a facet that expands the value space of the base datatype. The following examples demonstrate such errors:

<xs:simpleType name="minInclusive">
  <xs:restriction base="xs:float">
    <xs:minInclusive value="10"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="minInclusive2">
  <xs:restriction base="minInclusive">
    <xs:minInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>

or:

<xs:simpleType name="enumeration">
  <xs:restriction base="xs:float">
    <xs:enumeration value="-INF"/>
    <xs:enumeration value="1.618033989"/>
    <xs:enumeration value="3e3"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="enumeration2">
  <xs:restriction base="enumeration">
    <xs:enumeration value="0"/>
  </xs:restriction>
</xs:simpleType>

Facet that cannot be changed

The xs:length facet is the only one in this category. The length of a derived datatype cannot be redefined if the length of its parent has been defined.

xs:length can be seen as a shortcut for assigning an equal value to xs:maxLength and xs:minLength. This behavior is coherent with what happens if these two facets are both used with the same value: further values of xs:maxLength must be less than or equal to the length, and further values of xs:minLength must be greater than or equal to the length. Since xs:minLength must also be smaller than or equal to xs:maxLength, the only possibility is that they all need to stay equal to the length as previously defined.
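
This equivalence can be sketched with two datatypes (both names are hypothetical) that accept exactly the same values:

<!-- Fixed length expressed with xs:length -->
<xs:simpleType name="twoLetterCode">
  <xs:restriction base="xs:token">
    <xs:length value="2"/>
  </xs:restriction>
</xs:simpleType>

<!-- Same constraint expressed with equal minimum and maximum lengths -->
<xs:simpleType name="twoLetterCodeMinMax">
  <xs:restriction base="xs:token">
    <xs:minLength value="2"/>
    <xs:maxLength value="2"/>
  </xs:restriction>
</xs:simpleType>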

Facet that performs the intersection of the lexical spaces

The xs:pattern facet can be applied in several successive derivation steps, and each step always restricts the lexical space by performing a straight intersection of the lexical spaces. A value of the following noScientificNoLeading0 datatype must match the patterns of both the base datatype and the new restriction:

<xs:simpleType name="nonScientific">
  <xs:restriction base="xs:float">
    <xs:pattern value="[^eE]*"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="noScientificNoLeading0">
  <xs:restriction base="nonScientific">
    <xs:pattern value="[^0].*"/>
  </xs:restriction>
</xs:simpleType>
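
For comparison, the same intersection can be written by hand as a single pattern in a single derivation step. This is only a sketch (the type name is hypothetical): the first character may be neither “0” nor “e”/“E”, and the remaining characters may not be “e”/“E”:

<xs:simpleType name="noScientificNoLeading0SingleStep">
  <xs:restriction base="xs:float">
    <!-- Hand-written intersection of "[^eE]*" and "[^0].*" -->
    <xs:pattern value="[^0eE][^eE]*"/>
  </xs:restriction>
</xs:simpleType>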

Facet that does its job before the lexical space

xs:whiteSpace is a remarkable exception. This facet defines the whitespace processing and can actually expand the set of accepted instance documents during a “restriction,” as shown in the following example:

<xs:simpleType name="greetings">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace"/>
    <xs:enumeration value="hi"/>
    <xs:enumeration value="hello"/>
    <xs:enumeration value="how do you do?"/>
  </xs:restriction>
</xs:simpleType>
      
<xs:simpleType name="restricted-greetings">
  <xs:restriction base="greetings">
    <xs:whiteSpace value="collapse"/>
  </xs:restriction>
</xs:simpleType>

While the first datatype (“greetings”) accepts:

how do you do?

but rejects a string such as:

how do     you do?

the type issued from the “restriction” accepts both.

Fixed facets

Each facet (except xs:enumeration and xs:pattern) includes a fixed attribute which, when set to true, disables the possibility of modifying the facet during further restrictions by derivation.

If we want to make sure that the minimum value of our minInclusive cannot be modified, we write:

<xs:simpleType name="minInclusive">
  <xs:restriction base="xs:float">
    <xs:minInclusive value="10" fixed="true"/>
  </xs:restriction>
</xs:simpleType>
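
With this definition in place, a further restriction that tries to change the facet should be reported as an error by a schema processor, even though the new value would otherwise be more restrictive. A sketch (the derived type name is hypothetical):

<!-- Error: minInclusive is fixed to 10 in the base datatype -->
<xs:simpleType name="minInclusive3">
  <xs:restriction base="minInclusive">
    <xs:minInclusive value="20"/>
  </xs:restriction>
</xs:simpleType>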

Note

This is the method used by the schema for W3C XML Schema to fix the value of the facets used to derive predefined datatypes. For instance, the type xs:integer is derived from xs:decimal through:

<xs:simpleType name="integer" id="integer">
  <xs:restriction base="xs:decimal">
    <xs:fractionDigits value="0" fixed="true"/>
  </xs:restriction>
</xs:simpleType>

xs:enumeration and xs:pattern cannot be fixed.