A type of element or complex datatype that cannot be used directly in the instance documents. An abstract element must be substituted and is usually the head of a substitution group. An abstract complex type may be used to define content models, in which case the type will have to be substituted in the instance documents using xsi:type. There is no feature to define simple types as abstract (even though the predefined type xs:NOTATION could be considered abstract).
In a regular expression, an atom expresses a condition on a substring. Atoms may be followed by a quantifier defining the expected number of the atom’s occurrences. The atom, with its optional number of occurrences, constitutes a “piece.” An atom may be a character, a wildcard, a special character, a character class, or a regular expression.
A simple type that is not derived by list or union from another simple type.
Pieces of information attached to an element and defined in its start tag. The XPath data model treats attributes as nodes whose parent is the element (although they are not counted among its children), the DOM treats them as properties of the element, and the XML Infoset treats them as “information items.”
Containers that allow you to define, reference, and redefine groups of attributes.
The datatype that is used as the starting point to define a new datatype by derivation by restriction or extension.
Inventor of HTML and HTTP, and Director of the W3C; he is considered the father of the World Wide Web (see http://www.w3.org/People/all#timbl).
Elements and complex datatypes that cannot be substituted in the instance documents. A blocked element or complex type is restricted in the substitutions that may occur in the instance documents. There is no feature to block simple types.
When a value in the value space may have different lexical representations in the lexical space, the W3C XML Schema Recommendation provides (when possible) a canonical representation, which is the most “normal” or “classical” and may be used as a reference. Although most of the types have canonical representations, some, such as xs:duration or xs:QName, do not have one.
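As a minimal sketch of the idea, the following Python maps each lexical representation of xs:boolean ("true", "1", "false", "0") to its value and then back to its canonical representation ("true" or "false"). The function names are hypothetical and exist only for this illustration; they are not part of any schema processor.

```python
# Lexical representations accepted by xs:boolean, mapped to their values.
BOOLEAN_LEXICAL_SPACE = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean(lexical: str) -> bool:
    """Map a lexical representation to its value (KeyError if invalid)."""
    return BOOLEAN_LEXICAL_SPACE[lexical]


def canonical_boolean(value: bool) -> str:
    """Return the canonical representation of a boolean value."""
    return "true" if value else "false"


# Two lexical representations of the same value share one canonical form.
for lexical in ("true", "1", "false", "0"):
    print(lexical, "->", canonical_boolean(parse_boolean(lexical)))
```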
Including a schema without a target namespace in a schema with a target namespace (using xs:include) is known as “chameleon design.” The included schema takes on the target namespace of the schema that includes it, just as a chameleon takes on the color of its surroundings.
In a regular expression, a character class is an atom matching a set of characters. Character classes may be classical Perl character classes, Unicode character classes, or user-defined character classes.
A set of character classes designated by a single letter, for which upper- and lowercases of the same letter are complementary (for instance, “\d” is all the decimal digits, and “\D” is all the characters that are not decimal digits).
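The complementary nature of the two cases can be checked with Python's re module, which also supports “\d” and “\D”. This is only an illustration in a different regular expression engine (an assumption of this sketch), and the sample string is made up.

```python
import re

text = "Order 42, item 7"

digits = re.findall(r"\d", text)      # every decimal digit
non_digits = re.findall(r"\D", text)  # every character that is not a digit

# Each character belongs to exactly one of the two complementary classes.
assert len(digits) + len(non_digits) == len(text)
print("".join(digits))
```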
An element has a complex content model when it has child element nodes only (and no text node).
Something that can be defined and referenced in a schema. Elements, attributes, simple and complex types, and element and attribute groups are components.
Containers that allow the manipulation of a set of elements as a whole and define their relative order. Compositors include xs:sequence, xs:choice, and xs:all. Compositors may be included in other compositors to form complex combinations (with some limitations). Most can also be used as particles and have minOccurs and maxOccurs attributes, which allow definition of the number of repetitions expected for the whole group of elements that they define. The child elements of a compositor are “particles.” A restriction applies to xs:all as a compositor: it can only include xs:element particles.
This states that an element referenced by one “location” in a schema cannot be associated with two different simple or complex types.
A description of the structure of child elements and text nodes (independent of attributes). The content model is “simple” when there is a text node but no elements, “complex” when there are element nodes but no text, “mixed” when there are text and element nodes, and “empty” when there are neither text nor element nodes. These definitions are commonly used by XML developers and differ slightly from those of W3C XML Schema, for which there are only simple and complex content models. (Mixed models are considered special cases of complex contents, and empty models are considered either simple or complex contents with no child nodes.)
A term used by W3C XML Schema to qualify both the content and the structure of an element or attribute. Datatypes can be either simple (when they describe an attribute or an element without an embedded element or attribute) or complex (when they describe elements with embedded child elements or attributes). W3C XML Schema datatypes should not be confused with XML 1.0 element types, which are called element names by W3C XML Schema.
A value that is used when no value is provided in the instance document. Default values apply to attributes that are missing from the instance documents and to elements that are present but empty.
The action of defining a datatype by using the definition of one or several other datatypes. Simple datatypes may be defined by derivation by restriction, list, or union, while complex datatypes can be defined by derivation by restriction or extension.
The action of adding attributes or elements to a complex type.
The action of using a simple datatype (called the list type) to define a new simple datatype as a whitespace-separated list of values of the list type. Derivation by list applies only to simple datatypes.
For simple datatypes, a derivation by restriction is the action of defining a simple datatype by adding new constraints (called facets) on the lexical or value space of an existing datatype (called the base type). For complex datatypes, a derivation by restriction is the action of giving a new content model for the datatype that is a restriction of the base type.
The action of using a set of simple datatypes (called the member types) to define a new simple datatype whose lexical space is the union of the lexical spaces of the member types.
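A derivation by union can be sketched in Python as a list of member-type parsers tried in order. The two member types here, a simplified non-negative integer and the single token “unbounded,” are chosen for illustration only, and all function names are hypothetical. The integer check is a simplification of the real lexical space.

```python
import re
from typing import Union


def parse_non_negative_integer(lexical: str) -> int:
    # Simplified lexical check: an optional "+" followed by ASCII digits.
    if not re.fullmatch(r"\+?[0-9]+", lexical):
        raise ValueError(f"{lexical!r} is not a non-negative integer")
    return int(lexical)


def parse_unbounded(lexical: str) -> str:
    # A member type whose lexical space holds a single token.
    if lexical != "unbounded":
        raise ValueError(f"{lexical!r} is not the token 'unbounded'")
    return lexical


# The member types of the union, in the order in which they are tried.
MEMBER_PARSERS = [parse_non_negative_integer, parse_unbounded]


def parse_union(lexical: str) -> Union[int, str]:
    """Accept a value if it belongs to the lexical space of any member type."""
    for parser in MEMBER_PARSERS:
        try:
            return parser(lexical)
        except ValueError:
            continue
    raise ValueError(f"{lexical!r} matches none of the member types")


print(parse_union("3"), parse_union("unbounded"))
```

The lexical space accepted by parse_union is the union of what each member parser accepts, which is the property described above.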
A datatype that is defined by derivation from other datatypes. Derived datatypes can be user-defined when defined in a schema, or predefined when defined in the W3C XML Schema Recommendation.
Document Object Model. An object-oriented model of XML documents, including the definition of the API allowing its manipulation. The third version of DOM (DOM Level 3) will include an API named “Abstract Schemas” to facilitate schema-guided editing of XML documents (see http://www.w3.org/TR/DOM-Level-3-Core).
Document Schema Definition Language (DSDL) is a project undertaken by the ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is “to create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology” (see http://dsdl.org). DSDL has classified W3C XML Schema as “object-oriented schema language.”
Document Type Definition. XML 1.0 DTDs are inherited from SGML, where DTDs played a very central role and included rules that allow customization of the markup itself. Because of the syntactical rules included in their DTDs, SGML applications need a DTD to be able to read an SGML document. One of the simplifications of XML is to state that an XML parser should be able to read a document without needing a DTD. DTDs have therefore been simplified over their SGML ancestors and remain the first incarnation of what is today called an XML Schema language.
One of the basic types of nodes in the tree represented by an XML document. An element is delimited by start and end tags. In the corresponding tree, an element is a nonterminal node, which may have subnodes of type element, character (text), namespace, and attribute, as well as comment and processing instruction nodes.
Term used in the XML 1.0 Recommendation, which is equivalent to the notion of element names in W3C XML Schema and should not be confused with the simple or complex datatype of an element.
Containers that allow you to define, reference, and redefine groups of elements.
An element that has neither child element nor text nodes (with or without attributes).
A constraint added to the lexical or value space of a simple datatype during a derivation by restriction. The list of facets that can be used depends on the simple datatype. Facets can be “fixed” to disable their use during further derivations.
Elements and datatypes that cannot be substituted or derived any longer in the schema. A final element may not be chosen as the head of a substitution group while a final complex or simple type cannot be used as a base for further derivation.
Facets that are “fixed” during a derivation by restriction cannot be used during further derivations by restriction.
A value that must match the value found in the instance document. A fixed value is also used as the default value if no value is supplied.
All the components (elements, attributes, simple and complex types, element and attribute groups) can be defined at the top level of the schema, directly under the xs:schema document element. Their definition is said to be “global,” and they can be referenced elsewhere in the schema, as well as in any schema that has imported or included this schema.
XML Information Set. A formal description of the information that may be found in a well-formed XML document.
An XML document that is a candidate to be validated by a schema. Any well-formed XML 1.0 document that conforms to the Namespaces in XML 1.0 Recommendation can be considered a valid or invalid instance document.
The simple datatype that is used as the starting point to define a new simple datatype using a derivation by list.
The set of all representations (after parsing and whitespace processing) allowed for a simple datatype.
Most of the components (elements, attributes, simple and complex types) can be defined inside of other components where they are used. Their definition is said to be “local” and they cannot be referenced in other parts of the schema.
The name of a component in its namespace, i.e., the part of the qualified name that comes after the namespace prefix.
The simple datatypes used as the starting point to define a new simple datatype using a derivation by union.
The content of an element that contains both child element and text nodes.
A unique identifier that can be associated with a set of XML elements and attributes. This identifier is a URI, which is not required to point to an actual resource but must “belong” to the author of these elements and attributes. Since this full URI can’t be included in the name of each element and attribute, a namespace prefix is assigned to the namespace URI through a namespace declaration. This prefix is added to the local name of the elements and attributes to form a qualified name. Namespaces are optional and elements and attributes may have no namespaces attached. W3C XML Schema has extended the scope of namespaces by using them not only for elements and attributes but also for all the components of a schema. A schema identifies the namespace of the components described in a schema as a target namespace. When these components do not have a namespace, the schema is said to have no target namespace.
The set of values that are sent by the parser to the applications. It is at the interface between the parser and the schema validator. Values from the parsed space undergo whitespace processing, as defined by their simple datatype, to feed the lexical space. The parsed space is, therefore, not visible by the facets.
An element, such as a compositor, a group of elements (xs:group), an element definition or reference (xs:element), or an element wildcard (xs:any), which is included in a compositor to define a list of elements. A restriction applies to xs:all, which cannot be used as a particle even though it is defined as a compositor. The number of occurrences of particles may be constrained using their minOccurs and maxOccurs attributes.
A facet that allows definition of a regular expression, which will be applied to the lexical space to check its validity. By extension, the regular expression defined in a pattern is often called “pattern” as well.
Regular expressions (or patterns) are composed of pieces. Each piece is itself composed of an atom describing a condition on a substring and an optional quantifier defining the expected number of occurrences of the atom.
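The decomposition into pieces can be illustrated with a small pattern checked in Python. Python's re syntax is close to, but not identical to, the one used by W3C XML Schema, so this is only a sketch. The pattern and the helper name matches_pattern are made up for the example. Because a W3C XML Schema pattern must match the whole value, the sketch uses re.fullmatch rather than re.search.

```python
import re

# Three pieces:
#   [A-Z]{2}  atom: character class [A-Z], quantifier {2}
#   -         atom: the character "-", no quantifier (exactly one occurrence)
#   \d{3,5}   atom: character class \d, quantifier {3,5}
PATTERN = r"[A-Z]{2}-\d{3,5}"


def matches_pattern(value: str) -> bool:
    """Return True when the entire value matches the pattern."""
    return re.fullmatch(PATTERN, value) is not None


for candidate in ("AB-123", "AB-12", "xAB-1234"):
    print(candidate, matches_pattern(candidate))
```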
The simple datatypes (both primitive and derived) that are defined in the W3C XML Schema Recommendation.
A simple datatype that cannot be defined by derivation from other datatypes. There is no way to create primitive datatypes, so all the primitive datatypes are therefore predefined.
The Post Schema Validation Infoset. The Infoset after the information gathered during a schema validation is added.
Elements and attributes that belong to a namespace; i.e., a namespace URI is defined for them. The name of qualified elements may have no prefix if a default namespace is defined, but the name of qualified attributes must be prefixed.
The complete name of a component, including the prefix associated to its target namespace if one is defined.
Relational DataBase Management System. Developed in the late 70s, this system has taken most of the database market and hosts a significant amount of the data of many organizations. XML Schema languages may help to ensure the interface between that information and XML documents.
Specifications published by the W3C. They cannot be officially called “standards,” since the W3C is a consortium that does not have the status of a standards body, which is reserved for the ISO and national standards bodies. The specifications, which are finalized and approved by the Director, are then called “W3C Recommendations.”
All of the components (elements, attributes, simple and complex types, element and attribute groups) that have been created with a global definition can be referenced when needed in the schema in which they are defined, and in any schema that has imported or included this schema. Their definition is used at the location where they are referenced.
A syntax to express conditions on strings. The syntax used by the W3C XML Schema for its patterns is very close to the syntax introduced by the Perl programming language. A regular expression is composed of elementary “pieces.”
A grammar-based XML Schema language developed by Murata Makoto and published in March 2000 as a Japanese ISO Standard (see http://www.xml.gr.jp/relax).
A grammar-based XML Schema language resulting from a merger between RELAX and TREX (see http://relaxng.org).
Simple API for XML. A streaming event-based API used between parsers and applications. Its streaming nature means that pipelines of XML processing may be created using SAX (see http://www.saxproject.org).
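The event-based style can be sketched with the xml.sax module from the Python standard library. The parser calls the handler for each start tag as it streams through the document, and no tree is built. The handler class, the helper function, and the sample document are hypothetical names used only for this example.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count element occurrences from the stream of start-tag events."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time a start tag is read.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(document: bytes) -> dict:
    handler = ElementCounter()
    xml.sax.parseString(document, handler)
    return handler.counts


counts = count_elements(b"<library><book/><book/><author/></library>")
print(counts)
```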
A rule-based XML Schema language, developed by Rick Jelliffe, using XPath expressions to describe validation rules (see http://www.ascc.net/xml/resource/schematron/schematron.html).
The set of values as they are stored in a document. These values are transformed by the parser, as defined in the Recommendation XML 1.0, before reaching the application. The serialization space is not visible to the schema processors.
Standard Generalized Markup Language. Created in 1980, the ancestor of XML. XML was designed as a simplified subset of SGML to be used on the Web.
An element has a simple content model when it has a child text node only (and no subelements). A simple content element has a simple type if it has no attributes, and it has a complex type if it has any attributes.
A datatype that accepts only a text value. Simple datatypes can be directly assigned to attributes and simple content elements that do not accept any attribute. Simple datatypes can be used to define complex datatypes by extension.
The major XML protocol used by Web Services; relies on W3C XML Schema to describe the messages exchanged (see http://www.w3.org/TR/SOAP).
W3C XML Schema uses the term “space” to mean a set of values (lexical versus value spaces). For completeness, we introduced two additional spaces in this book (the serialization and parsed spaces).
A character that may be used as an atom after a “\” to accept a specific character, either for convenience or because this character is interpreted differently in the context of a regular expression.
A feature of W3C XML Schema, allowing you to define groups of elements that may be used interchangeably in instance documents. They are not declared as element groups, but through the substitutionGroup attribute of xs:element global definitions.
The namespace of the components described in a schema. When these components do not have a namespace, the schema is said to have no target namespace.
A grammar-based XML Schema language developed by James Clark (see http://www.thaiopensource.com/trex).
A set of characters classified by their “localization” (Latin, Arabic, Hebrew, Tibetan, and even Gothic or musical symbols).
A set of characters classified by their usage (letters, uppercase, digit, punctuation, etc.).
A set of character classes defined based on the Unicode blocks and categories.
Elements and attributes that don’t belong to a namespace; i.e., no namespace URI is defined for them. Any unprefixed attribute is unqualified, but unprefixed elements are unqualified only if no default namespace is defined.
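The difference between unprefixed elements and unprefixed attributes can be observed with Python's xml.etree.ElementTree, which reports names as {namespace URI}local name. The namespace URIs and element names below are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<library xmlns="http://example.org/ns/library" id="l1">
  <bk:book xmlns:bk="http://example.org/ns/book" bk:isbn="x" lang="en"/>
</library>"""

root = ET.fromstring(doc)
book = root[0]

# The unprefixed element is qualified by the default namespace...
print(root.tag)
# ...but its unprefixed attribute stays unqualified.
print(root.attrib)
# Prefixed names are qualified; the unprefixed "lang" attribute is not.
print(book.tag)
print(book.attrib)
```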
The UPA (Unique Particle Attribution) rule states that at any given moment, a W3C XML Schema processor must know—without ambiguity and without needing any forward reference in the document—which particle in the schema describes an element in the instance document. This rule is roughly equivalent to the restrictions known as “non-deterministic content models” for the XML 1.0 DTDs and as “ambiguous content models” by SGML. The UPA rule is often associated with the “Consistent Declaration rule.”
Uniform Resource Identifier. Defined by the RFCs 2396 and 2732. URIs were created to extend the notion of URLs (Uniform Resource Locators) to include abstract identifiers that do not necessarily need to “locate” a resource.
Uniform Resource Locator, a common identifier used on the Web. URLs are absolute when the full path to the resource is indicated, and relative when a partial path is given that needs to be evaluated in relation with a base URL.
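Resolution of a relative URL against a base URL can be sketched with urllib.parse from the Python standard library. The URLs are made-up examples, and is_absolute is a hypothetical helper that simply checks for the presence of a scheme.

```python
from urllib.parse import urljoin, urlparse

BASE = "http://example.org/schemas/library.xsd"


def is_absolute(url: str) -> bool:
    """Treat a URL as absolute when it carries its own scheme."""
    return bool(urlparse(url).scheme)


# Relative URLs are evaluated in relation to the base URL.
sibling = urljoin(BASE, "types/common.xsd")
parent = urljoin(BASE, "../dtd/library.dtd")

print(sibling)
print(parent)
```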
A set of characters defined by the schema author.
Datatypes that are defined in a schema. All the datatypes can be defined by derivation or, for the complex datatypes only, by definition.
An XML document that is well-formed and conforms to a schema (DTD, W3C XML Schema, etc.) of some kind.
The set of all the possible values for a simple datatype, independent of their actual representation in the instance documents.
World Wide Web Consortium. Originally created to settle HTML and HTTP as de facto standards. The main specification body for the core specifications of the World Wide Web and the keeper of the core XML specifications (see http://www.w3.org).
An approach to using the Web for applications, as opposed to the Web for human consumption that we use on a daily basis. Those services rely on the same infrastructure as the Web, and exchange XML documents over HTTP through a layer of protocols (such as SOAP or XML-RPC), which are themselves based on XML. XML Schema languages are used by these services to describe and control the XML documents that are exchanged.
An XML document that meets the conditions defined in the XML 1.0 Recommendation: it must be readable without ambiguity. Syntax errors will be detected by an XML parser without a schema of any kind.
Characters #x9 (tab), #xA (linefeed), #xD (carriage return), and #x20 (space). These are often used to indent the XML documents to give them a more readable aspect, and are filtered by an operation named “whitespace processing.”
The action of applying the whitespace replacement, trimming the leading and trailing spaces, and replacing all the sequences of contiguous whitespaces by a single space between the parsed and lexical spaces. Most of the simple datatypes apply whitespace collapsing.
The action of preserving all the whitespaces from the parsed to the lexical space. The xs:string datatype and the user-defined simple types derived from xs:string (which do not change the value of the xs:whiteSpace facet) are the only datatypes applying whitespace preservation.
The operation of filtering that is done on the whitespaces present in the value of a simple datatype. The whitespace processing is done during the transformation between parsed and lexical spaces. W3C XML Schema defines three whitespace processing approaches (depending on the simple type): whitespace preservation, whitespace replacement, and whitespace collapsing.
The action of replacing all the occurrences of the characters #x9 (tab), #xA (linefeed), and #xD (carriage return) by a #x20 (space) between the parsed and the lexical space. Whitespace replacement doesn’t change the length of the string. xs:normalizedString and the user-defined simple types derived from xs:string and xs:normalizedString (for which the value of the xs:whiteSpace facet is “replace”) are the only datatypes that apply whitespace replacement.
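The three whitespace processing modes can be sketched as three small Python functions operating on a value from the parsed space. The function names are hypothetical and the code is only an illustration of the definitions, not an excerpt from a schema processor.

```python
import re


def preserve(value: str) -> str:
    """Whitespace preservation: the value is passed through unchanged."""
    return value


def replace(value: str) -> str:
    """Whitespace replacement: tab, linefeed, and carriage return become spaces."""
    return re.sub(r"[\t\n\r]", " ", value)


def collapse(value: str) -> str:
    """Whitespace collapsing: replace, merge runs of spaces, then trim."""
    return re.sub(r" +", " ", replace(value)).strip(" ")


raw = "  a\tb\n\n c  "
print(repr(preserve(raw)))
print(repr(replace(raw)))
print(repr(collapse(raw)))
```

Note that replace always returns a string of the same length as its input, while collapse may shorten it.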
A character used as an atom in a regular expression to accept a set of characters. W3C XML Schema supports only one such wildcard: the character “.”, which means “any character.” This expression is also used to designate the xs:any and xs:anyAttribute particles.
The XML parser developed by the XML Apache project (see http://xml.apache.org/xerces2-j/index.html).
A W3C specification defining a general purpose inclusion mechanism for XML documents (see http://www.w3.org/TR/xinclude).
XML Linking Language is a W3C Recommendation (http://www.w3.org/TR/xlink) “which allows elements to be inserted into XML documents in order to create and describe links between resources.”
Extensible Markup Language. A subset of SGML created to be used on the Web. Its core specification (XML 1.0) was published by the W3C in February 1998. New specifications have been added since this date, and the W3C considers that, with the addition of W3C XML Schema, the core specifications are now complete.
Considered the ancestor of SOAP, XML-RPC is a simple XML protocol that may be used to implement Web Services. It does not rely on the W3C XML Schema to describe the content of its messages but has defined a simpler binding mechanism (see http://www.xmlrpc.com).
A query language used to identify a set of nodes within an XML document. Originally defined to be used with XSLT, it is also used by XPointer, and a simple subset is used in the xs:key, xs:keyref, and xs:unique W3C XML Schema elements. The XQuery specification will be a superset of the second version of XPath. This version will use type information provided by W3C XML Schema (see http://www.w3.org/TR/xpath).
XML Query language. This will be a superset of XPath 2.0 that will use type information provided by the W3C XML Schema to optimize its queries, and for features such as sort orders (see http://www.w3.org/TR/xquery).
Extensible Stylesheet Language Transformations. A programming language specialized for the transformation of XML documents (see http://www.w3.org/TR/xslt).
An open source W3C XML Schema implementation available at http://www.w3.org/2001/03/webdata/xsv.