Chapter 4. Namespaces

Namespaces have two purposes in XML:

  1. To distinguish between elements and attributes from different vocabularies with different meanings that happen to share the same name

  2. To group all the related elements and attributes from a single XML application together so that software can easily recognize them

The first purpose is easier to explain and grasp, but the second purpose is more important in practice.

Namespaces are implemented by attaching a prefix to element and attribute names. Each prefix is mapped to a URI by an xmlns:prefix attribute. A default namespace URI can also be provided for elements that don't have a prefix; it is declared by an xmlns attribute and does not apply to unprefixed attributes. Elements and attributes that are attached to the same URI are in the same namespace. Elements from many XML applications are identified by standard URIs.
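As a minimal sketch of how a namespace-aware parser sees these declarations, the following Python fragment uses the standard library's xml.etree.ElementTree module. The catalog vocabulary, the ex prefix, and both example.com URIs are assumptions made up for illustration; they are not defined by any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the example.com URIs and the "ex" prefix are
# invented for this sketch.
DOC = """\
<catalog xmlns="http://example.com/catalog"
         xmlns:ex="http://example.com/extra">
  <painting id="p1" ex:status="verified">
    <title>Sample Painting</title>
    <ex:note>On loan</ex:note>
  </painting>
</catalog>
"""

root = ET.fromstring(DOC)

# ElementTree reports each element name as "{namespace-URI}local-name".
# Unprefixed elements pick up the default namespace; ex:note picks up
# the URI bound to the "ex" prefix.
for elem in root.iter():
    print(elem.tag)

# The unprefixed id attribute is in no namespace, while ex:status is in
# the namespace bound to "ex".
painting = root[0]
print(painting.attrib)
```

The parser discards the prefix and keeps only the URI, which is why two documents that use different prefixes for the same URI are treated as using the same namespace.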

In an XML 1.1 document, an Internationalized Resource Identifier (IRI) can be used instead of a URI. An IRI is just like a URI except it can contain non-ASCII characters such as é and π. In practice, XML 1.0 parsers don't check that namespace names are legal URIs, so the distinction is mostly academic.

Some documents combine markup from multiple XML applications. For example, an XHTML document may contain both SVG pictures and MathML equations. An XSLT stylesheet will contain both XSLT instructions and elements from the result-tree vocabulary. And XLinks are always symbiotic with the elements of the document in which they appear since XLink itself doesn't define any elements, only attributes.

In some cases, these applications may use the same name to refer to different things. For example, in SVG a set element sets the value of an attribute for a specified duration of time, while in MathML, a set element represents a mathematical set such as the set of all positive even numbers. It's essential to know when you're working with a MathML set and when you're working with an SVG set. Otherwise, software that performs validation, rendering, indexing, and many other tasks will confuse the two and fail.
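The distinction between the two set elements can be sketched in Python with xml.etree.ElementTree. The namespace URIs below are the standard ones for XHTML, SVG, and MathML; the document itself, the svg and m prefixes, and the classify_sets helper are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical XHTML page containing one SVG set and one MathML set.
DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <svg:svg xmlns:svg="http://www.w3.org/2000/svg">
      <svg:rect width="10" height="10">
        <svg:set attributeName="fill" to="red" dur="5s"/>
      </svg:rect>
    </svg:svg>
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:set><m:cn>2</m:cn><m:cn>4</m:cn><m:cn>6</m:cn></m:set>
    </m:math>
  </body>
</html>
"""


def split_name(tag):
    """Split an ElementTree tag "{uri}local" into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def classify_sets(xml_text):
    """Label every element whose local name is 'set' by its namespace."""
    labels = []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_name(elem.tag)
        if local != "set":
            continue
        if uri == SVG_NS:
            labels.append("SVG animation")
        elif uri == MATHML_NS:
            labels.append("MathML set")
        else:
            labels.append("unknown set")
    return labels


labels = classify_sets(DOC)
print(labels)
```

Both elements have the local name set, so only the namespace URI tells them apart. For this document the labels come back in document order as SVG animation followed by MathML set.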

Consider Example 4-1. This is a simple list of paintings, including the title of each painting, the date each was painted, the artist who painted it, and a description of the painting.

Now suppose that Example 4-1 is to be served as a web page and you want to make it accessible to search engines. One possibility is to use the Resource Description Framework (RDF) to embed metadata in the page. This metadata describes the page for any search engines or other robots that might come along. Using the Dublin Core metadata vocabulary (http://purl.oclc.org/dc/), a standard vocabulary for library catalog-style information that can be encoded in XML or other syntaxes, an RDF description of this page might look something like this:

<RDF>
  <Description
     about="http://www.cafeconleche.org/examples/impressionists.xml">
    <title> Impressionist Paintings </title>
    <creator> Elliotte Rusty Harold </creator>
    <description>
      A list of famous impressionist paintings organized
      by painter and date
    </description>
    <date>2000-08-22</date>
  </Description>
</RDF>

Here we've used the Description and RDF elements from RDF and the title, creator, description, and date elements from the Dublin Core. We have no choice about these names; they are established by their respective specifications. If we want software that understands RDF and the Dublin Core to understand our documents, then we have to use these names. Example 4-2 combines this description with the actual list of paintings.

Now we have a problem. Several elements have been overloaded with different meanings in different parts of the document. The title element is used for both the title of the page and the title of a painting. The date element is used for both the date the page was written and the date the painting was painted. One description element describes pages, while another describes paintings.

This presents all sorts of problems. Validation is difficult because catalog and Dublin Core elements with the same name have different content specifications. Web browsers may want to hide the page description while showing the painting description, but not all stylesheet languages can tell the difference between the two. Processing software may understand the date format used in the Dublin Core date element, but not the more free-form format used in the painting date element.
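A short Python sketch shows how these collisions look to software. The document below is a hypothetical, heavily abbreviated stand-in for a combined catalog, not a reproduction of Example 4-2; the painting title and date are placeholders, and the parse_iso helper is an assumption introduced for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical combined document with no namespaces at all.
DOC = """\
<catalog>
  <RDF>
    <Description
       about="http://www.cafeconleche.org/examples/impressionists.xml">
      <title>Impressionist Paintings</title>
      <date>2000-08-22</date>
    </Description>
  </RDF>
  <painting>
    <title>Sample Painting</title>
    <date>November, 1888</date>
  </painting>
</catalog>
"""

root = ET.fromstring(DOC)

# A search by name alone cannot tell the page title from the painting
# title, or the page date from the painting date.
titles = [t.text for t in root.iter("title")]
dates = [d.text for d in root.iter("date")]
print(titles)
print(dates)


def parse_iso(text):
    """Return a date for YYYY-MM-DD text, or None for anything else."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


# Software expecting the Dublin Core date format handles the first date
# but not the free-form painting date.
parsed = [parse_iso(d) for d in dates]
print(parsed)
```

Each name lookup returns two elements with unrelated meanings, and the date handling succeeds for the page date but yields None for the painting date. The program has no reliable way to know in advance which is which.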

We could rename the elements in our own vocabulary: painting_title instead of title, date_painted instead of date, and so on. However, this is inconvenient if you already have a lot of documents marked up in the old version of the vocabulary. And it may not be possible to do this in all cases, especially if the name collisions occur not because of conflicts between your vocabulary and a standard vocabulary, but because of conflicts between two or more standard vocabularies. For instance, RDF just barely avoids a collision with the Dublin Core over the Description and description elements, and only because XML names are case-sensitive.

In other cases, there may not be any name conflicts, but it may still be important for software to determine quickly and decisively which XML application a given element or attribute belongs to. For instance, an XSLT processor needs to distinguish between XSLT instructions and literal result-tree elements.
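The kind of test an XSLT processor must make can be sketched in Python as follows. The namespace URI is the standard XSLT 1.0 one; the tiny stylesheet and the partition_elements helper are hypothetical, and a real processor does far more than this.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet mixing XSLT elements with literal result
# elements (html and body).
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""


def partition_elements(xml_text):
    """Separate elements in the XSLT namespace from all other elements."""
    xslt_elements, literal_elements = [], []
    marker = "{" + XSLT_NS + "}"
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(marker):
            # Keep only the local name of the XSLT element.
            xslt_elements.append(elem.tag[len(marker):])
        else:
            literal_elements.append(elem.tag)
    return xslt_elements, literal_elements


xslt_elements, literal_elements = partition_elements(STYLESHEET)
print(xslt_elements)
print(literal_elements)
```

The decision depends only on the namespace URI, never on the xsl prefix or on a list of known element names, so the same test keeps working if the stylesheet author chooses a different prefix.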