Glossary

A

abstract

A type of element or complex datatype that cannot be used directly in the instance documents. An abstract element must be substituted and is usually the head of a substitution group. An abstract complex type may be used to define content models, in which case the type will have to be substituted in the instance documents using xsi:type. There is no feature to define simple types as abstract (even though the predefined type xs:NOTATION could be considered abstract).

atom

In a regular expression, an atom expresses a condition on a substring. Atoms may be followed by a quantifier defining the expected number of the atom’s occurrences. The atom, with its optional number of occurrences, constitutes a “piece.” An atom may be a character, a wildcard, a special character, a character class, or a regular expression.

atomic type

A simple type that is not derived by list or union from another simple type.

attributes

Pieces of information attached to an element and defined in its start tag. Considered child nodes by the XPath data model, and considered property nodes by the DOM, attributes are “information items” to the XML Infoset.

attribute groups

Containers that allow you to define, reference, and redefine groups of attributes.

B

base type

The datatype that is used as the starting point to define a new datatype by derivation by restriction or extension.

Berners-Lee, Tim

Inventor of HTML and HTTP, and Director of the W3C; he is considered the father of the World Wide Web (see http://www.w3.org/People/all#timbl).

block

Elements and complex datatypes that cannot be substituted in the instance documents. A blocked element or complex type is restricted in the substitutions that may occur in the instance documents. There is no feature to block simple types.

C

canonical lexical representation

When a value in the value space may have different lexical representations in the lexical space, the W3C XML Schema Recommendation provides (when possible) a canonical representation, which is the most “normal” or “classical” and may be used as a reference. Although most of the types have canonical representations, some, such as xs:duration or xs:QName, do not have one.
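
As a rough illustration of several lexical representations sharing one canonical form, the following Python sketch models xs:boolean, whose lexical space is “true”, “false”, “1”, and “0” and whose canonical representations are “true” and “false”. The function names are hypothetical and the code is only a sketch, not part of any schema processor.

```python
# Lexical space of xs:boolean mapped to its value space.
_BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def boolean_value(lexical: str) -> bool:
    """Map a lexical representation to its value (hypothetical helper)."""
    try:
        return _BOOLEAN_LEXICAL[lexical]
    except KeyError:
        raise ValueError(f"not in the lexical space of xs:boolean: {lexical!r}")


def boolean_canonical(value: bool) -> str:
    """Return the canonical lexical representation of a boolean value."""
    return "true" if value else "false"
```

Both “1” and “true” map to the same value, and that value has a single canonical representation.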

chameleon design

Including a schema without a target namespace in a schema with a target namespace (using xs:include) is known as “chameleon design.” The included schema takes the target namespace of the schema in which it is included, just as a chameleon takes the color of the environment in which it is placed.

character class

In a regular expression, a character class is an atom matching a set of characters. Character classes may be classical Perl character classes, Unicode character classes, or user-defined character classes.

classical Perl character class

A set of character classes designated by a single letter, for which the uppercase and lowercase forms of the same letter are complementary (for instance, “\d” is all the decimal digits, and “\D” is all the characters that are not decimal digits).
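
The complementary behavior of “\d” and “\D” can be checked with a short Python sketch. It assumes Python's re module as a stand-in for a W3C XML Schema regular expression engine (the two syntaxes are close but not identical), and the helper names are hypothetical.

```python
import re


def is_decimal_digit(ch: str) -> bool:
    # \d accepts a single decimal digit.
    return re.fullmatch(r"\d", ch) is not None


def is_not_decimal_digit(ch: str) -> bool:
    # \D accepts any single character that \d rejects.
    return re.fullmatch(r"\D", ch) is not None
```

For any single character, exactly one of the two functions returns True.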

complex content

An element has a complex content model when it has child element nodes only (and no text node).

component

Something that can be defined and referenced in a schema. Elements, attributes, simple and complex types, and element and attribute groups are components.

compositor

Containers that allow the manipulation of a set of elements as a whole and define their relative order. Compositors include xs:sequence, xs:choice, and xs:all. Compositors may be included in other compositors to form complex combinations (with some limitations). Most can also be used as particles and have minOccurs and maxOccurs attributes, which allow definition of the number of repetitions expected for the whole group of elements that they define. The child elements of a compositor are “particles.” A restriction applies to xs:all as a compositor: it can only include xs:element particles.

Consistent Declaration rule

This states that an element referenced by one “location” in a schema cannot be associated with two different simple or complex types.

content model

A description of the structure of child elements and text nodes (independent of attributes). The content model is “simple” when there is a text node but no elements, “complex” when there are element nodes but no text, “mixed” when there are text and element nodes, and “empty” when there are neither text nor element nodes. These definitions are commonly used by XML developers and differ slightly from those of W3C XML Schema, for which there are only simple and complex content models. (Mixed models are considered special cases of complex contents, and empty models are considered either simple or complex contents with no child nodes.)

D

datatype

A term used by W3C XML Schema to qualify both the content and the structure of an element or attribute. Datatypes can be either simple (when they describe an attribute or an element without an embedded element or attribute) or complex (when they describe elements with embedded child elements or attributes). W3C XML Schema datatypes should not be confused with XML 1.0 element types, which are called element names by W3C XML Schema.

default value

A value that is used when no value is provided in the instance document. Default values apply to attributes that are missing from the instance documents and to elements that are present but empty.

derivation

The action of defining a datatype by using the definition of one or several other datatypes. Simple datatypes may be defined by derivation by restriction, list, or union, while complex datatypes can be defined by derivation by restriction or extension.

derivation by extension

The action of adding attributes or elements to a complex type.

derivation by list

The action of using a simple datatype (called the item type) to define a new simple datatype as a whitespace-separated list of values of the item type. Derivation by list applies only to simple datatypes.
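
To make the idea of a whitespace-separated list concrete, here is a minimal Python sketch that reads the lexical form of a list of integers. It assumes Python's int() as an approximation of xs:integer (the two do not accept exactly the same strings), and the function name is hypothetical.

```python
from typing import List


def parse_integer_list(lexical: str) -> List[int]:
    """Split a whitespace-separated list and convert each item."""
    # str.split() with no argument splits on any run of whitespace,
    # so tabs, linefeeds, and repeated spaces all act as separators.
    return [int(item) for item in lexical.split()]
```

Each item must be valid on its own; a single invalid item makes the whole list invalid (here, a ValueError).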

derivation by restriction

For simple datatypes, a derivation by restriction is the action of defining a simple datatype by adding new constraints (called facets) on the lexical or value space of an existing datatype (called the base type). For complex datatypes, a derivation by restriction is the action of giving a new content model for the datatype that is a restriction of the base type.

derivation by union

The action of using a set of simple datatypes (called the member types) to define a new simple datatype whose lexical space is the union of the lexical spaces of the member types.
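
A union can be sketched in Python as a value that is accepted if any member type accepts it. The example below assumes two member types approximated by Python's int() and date.fromisoformat(), which are stand-ins for xs:integer and xs:date rather than exact equivalents, and tries them in order. The function name is hypothetical.

```python
from datetime import date
from typing import Union


def parse_integer_or_date(lexical: str) -> Union[int, date]:
    """Return the value produced by the first member type that accepts it."""
    for member_parser in (int, date.fromisoformat):
        try:
            return member_parser(lexical)
        except ValueError:
            continue  # not in this member's lexical space, try the next one
    raise ValueError(f"not accepted by any member type: {lexical!r}")
```

The lexical space of the result is the union of what each member parser accepts.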

derived datatype

A datatype that is defined by derivation from other datatypes. Derived datatypes can be user-defined when defined in a schema, or predefined when defined in the W3C XML Schema Recommendation.

DOM

Document Object Model. An object-oriented model of XML documents, including the definition of the API allowing its manipulation. The third version of DOM (DOM Level 3) will include an API named “Abstract Schemas” to facilitate schema-guided editing of XML documents (see http://www.w3.org/TR/DOM-Level-3-Core).

DSDL

Document Schema Definition Language (DSDL) is a project undertaken by the ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is “to create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology” (see http://dsdl.org). DSDL has classified W3C XML Schema as an “object-oriented schema language.”

DTD

Document Type Definition. XML 1.0 DTDs are inherited from SGML, in which DTDs included rules that allow the customization of the markup itself and played a very central role. Because of the syntactical rules included in their DTDs, SGML applications need a DTD to be able to read an SGML document. One of the simplifications of XML is to state that an XML parser should be able to read a document without needing a DTD. DTDs have therefore been simplified over their SGML ancestors and remain the first incarnation of what is today called an XML Schema language.

E

element

One of the basic types of nodes in the tree represented by an XML document. An element is delimited by start and end tags. In the corresponding tree, an element is a nonterminal node, which may have subnodes of type element, character (text), namespace, and attribute, as well as comment and processing instruction nodes.

element type

Term used in the XML 1.0 Recommendation, which is equivalent to the notion of element names in W3C XML Schema and should not be confused with the simple or complex datatype of an element.

element groups

Containers that allow you to define, reference, and redefine groups of elements.

empty content

An element that has neither child element nor text nodes (with or without attributes).

F

facet

A constraint added to the lexical or value space of a simple datatype during a derivation by restriction. The list of facets that can be used depends on the simple datatype. Facets can be “fixed” to disable their use during further derivations.
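
As an informal illustration of facets on the value space, the following Python sketch restricts an integer base type with two bounds that play the role of the xs:minInclusive and xs:maxInclusive facets. It assumes Python's int() as an approximation of xs:integer, and the function names are hypothetical.

```python
from typing import Callable


def make_restricted_integer(min_inclusive: int,
                            max_inclusive: int) -> Callable[[str], int]:
    """Build a validator for an integer type restricted by two bounds."""
    def validate(lexical: str) -> int:
        value = int(lexical)  # the base type must accept the value first
        if not (min_inclusive <= value <= max_inclusive):
            raise ValueError(
                f"{value} is outside [{min_inclusive}, {max_inclusive}]")
        return value
    return validate


# A hypothetical "percentage" type: integers from 0 to 100.
percentage = make_restricted_integer(0, 100)
```

The restricted type accepts a subset of what the base type accepts, never more.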

final

Elements and datatypes that cannot be substituted or derived any longer in the schema. A final element may not be chosen as the head of a substitution group while a final complex or simple type cannot be used as a base for further derivation.

fixed facets

Facets that are “fixed” during a derivation by restriction cannot be used during further derivations by restriction.

fixed values

Values that must match the value found in the instance document. They are also used as default values if no value is supplied.

G

global definition

All the components (elements, attributes, simple and complex types, element and attribute groups) can be defined at the top level of the schema, directly under the xs:schema document element. Their definition is said to be “global,” and they can be referenced elsewhere in the schema, as well as in any schema that has imported or included this schema.

I

Infoset

XML Information Set. A formal description of the information that may be found in a well-formed XML document.

instance document

An XML document that is a candidate to be validated by a schema. Any well-formed XML 1.0 document that conforms to the Namespaces in XML 1.0 Recommendation can be considered a valid or invalid instance document.

item type

The simple datatype that is used as the starting point to define a new simple datatype using a derivation by list.

L

lexical space

The set of all representations (after parsing and whitespace processing) allowed for a simple datatype.

local definition

Most of the components (elements, attributes, simple and complex types) can be defined inside of other components where they are used. Their definition is said to be “local” and they cannot be referenced in other parts of the schema.

local name

The name of a component in its namespace, i.e., the part of the qualified name that comes after the namespace prefix.

M

member types

The simple datatypes used as the starting point to define a new simple datatype using a derivation by union.

mixed content

The content of an element that contains both child element and text nodes.

N

namespace

A unique identifier that can be associated with a set of XML elements and attributes. This identifier is a URI, which is not required to point to an actual resource but must “belong” to the author of these elements and attributes. Since this full URI can’t be included in the name of each element and attribute, a namespace prefix is assigned to the namespace URI through a namespace declaration. This prefix is added to the local name of the elements and attributes to form a qualified name. Namespaces are optional and elements and attributes may have no namespaces attached. W3C XML Schema has extended the scope of namespaces by using them not only for elements and attributes but also for all the components of a schema. A schema identifies the namespace of the components described in a schema as a target namespace. When these components do not have a namespace, the schema is said to have no target namespace.

P

parsed space

The set of values that are sent by the parser to the applications. It is at the interface between the parser and the schema validator. Values from the parsed space undergo whitespace processing, as defined by their simple datatype, to feed the lexical space. The parsed space is, therefore, not visible to the facets.

particle

An element, such as a compositor, a group of elements (xs:group), an element definition or reference (xs:element), or an element wildcard (xs:any), which is included in a compositor to define a list of elements. A restriction applies to xs:all, which cannot be used as a particle even though it is defined as a compositor. The number of occurrences of particles may be constrained using their minOccurs and maxOccurs attributes.

pattern

A facet that allows definition of a regular expression, which will be applied to the lexical space to check its validity. By extension, the regular expression defined in a pattern is often called “pattern” as well.

piece

Regular expressions (or patterns) are composed of pieces. Each piece is itself composed of an atom describing a condition on a substring and an optional quantifier defining the expected number of occurrences of the atom.
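
The decomposition into pieces can be seen on a small pattern. This Python sketch assumes the re module as a stand-in for a W3C XML Schema regular expression engine and uses re.fullmatch because a schema pattern must match the whole value. The pattern and function name are hypothetical examples.

```python
import re

# Three pieces, each an atom with an optional quantifier:
#   [A-Z]{2}  atom: character class [A-Z]   quantifier: {2}
#   -         atom: the character "-"       no quantifier (exactly once)
#   \d+       atom: Perl class \d           quantifier: + (one or more)
PATTERN = r"[A-Z]{2}-\d+"


def matches_pattern(value: str) -> bool:
    """Return True when the whole value matches PATTERN."""
    return re.fullmatch(PATTERN, value) is not None
```

A value such as “AB-123” satisfies all three pieces in order, while “AB-” fails on the last piece because its quantifier requires at least one digit.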

predefined datatype

The simple datatypes (both primitive and derived) that are defined in the W3C XML Schema Recommendation.

primitive datatype

A simple datatype that cannot be defined by derivation from other datatypes. There is no way to create primitive datatypes, so all the primitive datatypes are therefore predefined.

PSVI

The Post Schema Validation Infoset. The Infoset after the information gathered during a schema validation is added.

Q

qualified element or attribute

Elements and attributes that belong to a namespace; i.e., a namespace URI is defined for them. The name of qualified elements may have no prefix if a default namespace is defined, but the name of qualified attributes must be prefixed.
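
The difference between elements and attributes with respect to the default namespace can be observed with Python's xml.etree.ElementTree, which reports names as “{namespace URI}local name”. The document and the example.org URIs below are hypothetical, and split_name is an illustrative helper, not a library function.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

DOC = (
    '<library xmlns="http://example.org/library" '
    'xmlns:m="http://example.org/meta">'
    '<book id="b1" m:lang="en"/>'
    '</library>'
)


def split_name(name: str) -> Tuple[Optional[str], str]:
    """Split an ElementTree name into (namespace URI or None, local name)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


book = ET.fromstring(DOC)[0]
# The unprefixed element is qualified through the default namespace.
element_name = split_name(book.tag)
# The unprefixed attribute "id" stays unqualified; "m:lang" is qualified.
attribute_names = sorted(split_name(key)[1] for key in book.attrib)
```

Here the book element belongs to the default namespace, the id attribute has no namespace, and only the prefixed lang attribute is qualified.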

qualified name

The complete name of a component, including the prefix associated with its target namespace if one is defined.

R

RDBMS

Relational DataBase Management System. Developed in the late 1970s, this type of system has taken most of the database market and hosts a significant amount of the data of many organizations. XML Schema languages may help to ensure the interface between that information and XML documents.

Recommendation

Specifications published by the W3C. They cannot be officially called “standards,” since the W3C is a consortium that does not have the status of a standards body, which is reserved for the ISO and national standards bodies. The specifications, which are finalized and approved by the Director, are then called “W3C Recommendations.”

reference

All of the components (elements, attributes, simple and complex types, element and attribute groups) that have been created with a global definition can be referenced when needed in the schema in which they are defined, and in any schema that has imported or included this schema. Their definition is used at the location where they are referenced.

regular expression

A syntax to express conditions on strings. The syntax used by the W3C XML Schema for its patterns is very close to the syntax introduced by the Perl programming language. A regular expression is composed of elementary “pieces.”

RELAX

A grammar-based XML Schema language developed by Murata Makoto and published in March 2000 as a Japanese ISO Standard (see http://www.xml.gr.jp/relax).

RELAX NG

A grammar-based XML Schema language resulting from a merger between RELAX and TREX (see http://relaxng.org).

S

SAX

Simple API for XML. A streaming event-based API used between parsers and applications. Its streaming nature means that pipelines of XML processing may be created using SAX (see http://www.saxproject.org).
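
The event-based style can be sketched with Python's xml.sax module, in which the parser calls back into a handler as it meets each start tag. The handler class and function names are hypothetical.

```python
import xml.sax
from typing import List


class ElementNameCollector(xml.sax.ContentHandler):
    """Record the name of each element as its start tag is reported."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag, in document order.
        self.names.append(name)


def element_names(document: str) -> List[str]:
    handler = ElementNameCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.names
```

Because events are delivered as the document is read, a handler like this one does not need the whole tree in memory.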

Schematron

A rule-based XML Schema language, developed by Rick Jelliffe, using XPath expressions to describe validation rules (see http://www.ascc.net/xml/resource/schematron/schematron.html).

serialization space

The set of values as they are stored in a document. These values are transformed by the parser, as defined in the XML 1.0 Recommendation, before reaching the application. The serialization space is not visible to the schema processors.

SGML

Standard Generalized Markup Language. Created in 1980, the ancestor of XML. XML was designed as a simplified subset of SGML to be used on the Web.

simple content

An element has a simple content model when it has a child text node only (and no subelements). A simple content element has a simple type if it has no attributes, and it has a complex type if it has any attributes.

simple datatype

A datatype that accepts only a text value. Simple datatypes can be directly assigned to attributes and simple content elements that do not accept any attribute. Simple datatypes can be used to define complex datatypes by extension.

SOAP

The major XML protocol used by Web Services; relies on W3C XML Schema to describe the messages exchanged (see http://www.w3.org/TR/SOAP).

space

W3C XML Schema uses the term “space” to mean a set of values (lexical versus value spaces). For completeness, we introduced two additional spaces in this book (the serialization and parsed spaces).

special character

A character that may be used as an atom after a “\” to accept a specific character, either for convenience or because this character is interpreted differently in the context of a regular expression.

substitution group

A feature of W3C XML Schema, allowing you to define groups of elements that may be used interchangeably in instance documents. They are not declared as element groups, but through the substitutionGroup attribute of xs:element global definitions.

T

target namespace

The namespace of the components described in a schema. When these components do not have a namespace, the schema is said to have no target namespace.

TREX

A grammar-based XML Schema language developed by James Clark (see http://www.thaiopensource.com/trex).

U

Unicode block

A set of characters classified by their “localization” (Latin, Arabic, Hebrew, Tibetan, and even Gothic or musical symbols).

Unicode category

A set of characters classified by their usage (letters, uppercase, digits, punctuation, etc.).

Unicode character class

A set of character classes defined based on the Unicode blocks and categories.

unqualified element or attribute

Elements and attributes that don’t belong to a namespace; i.e., no namespace URI is defined for them. Any unprefixed attribute is unqualified, but unprefixed elements are unqualified only if no default namespace is defined.

UPA rule

The UPA (Unique Particle Attribution) rule states that at any given moment, a W3C XML Schema processor must know—without ambiguity and without needing any forward reference in the document—which particle in the schema describes an element in the instance document. This rule is roughly equivalent to the restrictions known as “non-deterministic content models” for the XML 1.0 DTDs and as “ambiguous content models” by SGML. The UPA rule is often associated with the “Consistent Declaration rule.”

URI

Uniform Resource Identifier. Defined by the RFCs 2396 and 2732. URIs were created to extend the notion of URLs (Uniform Resource Locators) to include abstract identifiers that do not necessarily need to “locate” a resource.

URL

Uniform Resource Locator, a common identifier used on the Web. URLs are absolute when the full path to the resource is indicated, and relative when a partial path is given that needs to be evaluated relative to a base URL.
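
Resolving a relative URL against a base URL can be shown with urljoin from Python's urllib.parse. The example.org URLs are hypothetical.

```python
from urllib.parse import urljoin

BASE = "http://example.org/schemas/main.xsd"

# A relative URL is evaluated against the base URL.
sibling = urljoin(BASE, "types/common.xsd")
parent = urljoin(BASE, "../common/types.xsd")
# An absolute URL is returned unchanged, whatever the base is.
absolute = urljoin(BASE, "http://example.org/other.xsd")
```

The same kind of resolution applies when a schema location is given as a relative URL.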

user-defined character class

A set of characters defined by the schema author.

user-defined datatype

Datatypes that are defined in a schema. All the datatypes can be defined by derivation or, for the complex datatypes only, by definition.

V

valid

An XML document that is well-formed and conforms to a schema (DTD, W3C XML Schema, etc.) of some kind.

value space

The set of all the possible values for a simple datatype, independent of their actual representation in the instance documents.

W

W3C

World Wide Web Consortium. Originally created to settle HTML and HTTP as de facto standards. The main specification body for the core specifications of the World Wide Web and the keeper of the core XML specifications (see http://www.w3.org).

Web Services

An approach to using the Web for applications, as opposed to the Web for human consumption that we use on a daily basis. Those services rely on the same infrastructure as the Web, and exchange XML documents over HTTP through a layer of protocols (such as SOAP or XML-RPC), which are themselves based on XML. XML Schema languages are used by these services to describe and control the XML documents that are exchanged.

well-formed

An XML document that meets the conditions defined in the XML 1.0 Recommendation: it must be readable without ambiguity. Syntax errors will be detected by an XML parser without a schema of any type.
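
A well-formedness check needs only a parser, as this minimal Python sketch suggests. It relies on xml.etree.ElementTree, which raises ParseError on a syntax error and uses no schema; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the parser can read the document without error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

A document with a mismatched end tag, such as “<a><b></a>”, is rejected even though no schema is involved.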

whitespace

Characters #x9 (tab), #xA (linefeed), #xD (carriage return), and #x20 (space). These are often used to indent the XML documents to give them a more readable aspect, and are filtered by an operation named “whitespace processing.”

whitespace collapsing

The action of applying the whitespace replacement, trimming the leading and trailing spaces, and replacing all the sequences of contiguous whitespaces by a single space between the parsed and lexical spaces. Most of the simple datatypes apply whitespace collapsing.

whitespace preservation

The action of preserving all the whitespaces from the parsed to the lexical space. The xs:string datatype and the user-defined simple types derived from xs:string (which do not change the value of the xs:whiteSpace facet) are the only datatypes applying whitespace preservation.

whitespace processing

The operation of filtering that is done on the whitespaces present in the value of a simple datatype. The whitespace processing is done during the transformation between parsed and lexical spaces. W3C XML Schema defines three whitespace processing approaches (depending on the simple type): whitespace preservation, whitespace replacement, and whitespace collapsing.
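
The three approaches can be sketched as a single Python function that takes a value from the parsed space and returns the value handed to the lexical space. The function name and its mode argument are hypothetical; the mode names mirror the values “preserve”, “replace”, and “collapse”.

```python
import re


def process_whitespace(parsed: str, mode: str) -> str:
    """Apply whitespace preservation, replacement, or collapsing."""
    if mode == "preserve":
        return parsed
    # Replacement: each tab, linefeed, and carriage return becomes a
    # space, so the length of the string does not change.
    replaced = re.sub(r"[\t\n\r]", " ", parsed)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Collapsing: replacement, then runs of spaces are reduced to
        # one space and leading and trailing spaces are trimmed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whitespace mode: {mode!r}")
```

Collapsing builds on replacement, which is why the function computes the replaced string first.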

whitespace replacement

The action of replacing all the occurrences of the characters #x9 (tab), #xA (linefeed), and #xD (carriage return) by a #x20 (space) between the parsed and the lexical space. Whitespace replacement doesn’t change the length of the string. xs:normalizedString and the user-defined simple types derived from xs:string and xs:normalizedString (for which the value of the xs:whiteSpace facet is “replace”) are the only datatypes that apply whitespace replacement.

wildcard

A character used as an atom in a regular expression to accept a set of characters. W3C XML Schema supports only one such wildcard: the character “.”, which means “any character.” The term is also used to designate the xs:any and xs:anyAttribute particles.

X

Xerces

The XML parser developed by the XML Apache project (see http://xml.apache.org/xerces2-j/index.html).

XInclude

A W3C specification defining a general purpose inclusion mechanism for XML documents (see http://www.w3.org/TR/xinclude).

XLink

XML Linking Language is a W3C Recommendation (http://www.w3.org/TR/xlink) “which allows elements to be inserted into XML documents in order to create and describe links between resources.”

XML

Extensible Markup Language. A subset of SGML created to be used on the Web. Its core specification (XML 1.0) was published by the W3C in February 1998. New specifications have been added since this date, and the W3C considers that, with the addition of W3C XML Schema, the core specifications are now complete.

XML-RPC

Considered the ancestor of SOAP, XML-RPC is a simple XML protocol that may be used to implement Web Services. It does not rely on the W3C XML Schema to describe the content of its messages but has defined a simpler binding mechanism (see http://www.xmlrpc.com).

XPath

A query language used to identify a set of nodes within an XML document. Originally defined to be used with XSLT, it is also used by XPointer and a simple subset is used in the xs:key, xs:keyref, and xs:unique W3C XML Schema elements. The XQuery specification will be a superset of the second version of XPath. This version will use type information provided by W3C XML Schema (see http://www.w3.org/TR/xpath).
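
Selecting a set of nodes with a path expression can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The document below is a hypothetical example.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<library>"
    "<book id='b1'><title>First</title></book>"
    "<book><title>Second</title></book>"
    "</library>"
)

root = ET.fromstring(DOC)
# Select the title of every book child that has an id attribute.
titles_with_id = [title.text for title in root.findall("./book[@id]/title")]
# Select every title, at any depth below the root.
all_titles = [title.text for title in root.findall(".//title")]
```

The predicate “[@id]” keeps only the book elements that carry an id attribute.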

XQuery

XML Query language. This will be a superset of XPath 2.0 that will use type information provided by the W3C XML Schema to optimize its queries, and for features such as sort orders (see http://www.w3.org/TR/xquery).

XSLT

Extensible Stylesheet Language Transformations. A programming language specialized for the transformation of XML documents (see http://www.w3.org/TR/xslt).

XSV

An open source W3C XML Schema implementation available at http://www.w3.org/2001/03/webdata/xsv.