Chapter 12. XInclude

XInclude is a new technology developed at the W3C for combining multiple well-formed and optionally valid documents and fragments thereof into a single document. It's similar in effect to using external entity references to assemble a document from several component pieces. However, XInclude can assemble a document from resources that are themselves fully well-formed documents that include XML declarations and even document type declarations. It can also use XPointers to extract only a piece of an external document, rather than including the entire thing.

XInclude defines two elements, xi:include and xi:fallback, both in the http://www.w3.org/2001/XInclude namespace. An xi:include element has an href attribute that points to a document. An XInclude processor replaces all the xi:include elements in a master document with the documents they point to. These documents can be other XML documents or plain text documents like Java source code. If the xi:include element has an xpointer attribute, then the xi:include element is replaced by only those parts of the remote document that the XPointer indicates. If the processor cannot find the external document the href attribute points to, then it replaces the xi:include element with the contents of the element's xi:fallback child element instead.
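The replacement rule just described, including the fallback case, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular XInclude processor is implemented: the expand and load functions and the in-memory AVAILABLE table are hypothetical names invented for this sketch, and it ignores the xpointer attribute, plain text inclusion, and any text or whitespace around the elements.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

# Hypothetical in-memory stand-in for the network: only one document exists.
AVAILABLE = {"AlanTuring.xml": "<person><last_name>Turing</last_name></person>"}

def load(href):
    """Parse the named document, raising OSError if it cannot be found."""
    if href not in AVAILABLE:
        raise OSError("cannot find " + href)
    return ET.fromstring(AVAILABLE[href])

def expand(parent):
    """Replace the xi:include children of parent in place."""
    # Walk backwards so earlier indexes stay valid as children are replaced.
    for index in range(len(parent) - 1, -1, -1):
        child = parent[index]
        if child.tag != XI + "include":
            expand(child)  # look for xi:include elements deeper in the tree
            continue
        try:
            replacement = [load(child.get("href"))]
        except OSError:
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise  # no fallback child: the missing document is an error
            replacement = list(fallback)  # use the fallback's child elements
        parent[index:index + 1] = replacement

master = ET.fromstring("""
<people xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="AlanTuring.xml"/>
  <xi:include href="Missing.xml">
    <xi:fallback><person><last_name>Unknown</last_name></person></xi:fallback>
  </xi:include>
</people>""")
expand(master)
names = [e.text for e in master.iter("last_name")]
print(names)  # ['Turing', 'Unknown']
```

The first xi:include is replaced by the document it points to; the second, whose target cannot be found, is replaced by the contents of its xi:fallback child.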

Warning

This chapter is based on the April 13, 2004 second Candidate Recommendation of XInclude. We think this draft is pretty stable, but it's possible some of the details described here may change before the final release. The most current version of the XInclude specification can be found at http://www.w3.org/TR/xinclude/.

The include Element

The key component of XInclude is the include element. This must be in the http://www.w3.org/2001/XInclude namespace. The xi or xinclude prefixes are customary, although, as always, the prefix can change as long as the URI remains the same. This element has an href attribute that contains a URL pointing to the document to include. For example, this element includes the document found at the relative URL AlanTuring.xml:

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" 
            href="AlanTuring.xml"/>

Of course, you can use absolute URLs as well:

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" 
  href="http://cafeconleche.org/books/xian3/examples/12/AlanTuring.xml"
/>

Tip

Technically, the href attribute contains an IRI rather than a URI or URL. An IRI is like a URI except that it can contain non-ASCII characters such as é. These characters are normally encoded in UTF-8, and then each byte of the UTF-8 sequence is percent escaped to convert the IRI to a URI before resolving it. If you're working in English, and you're not writing an XInclude processor, you can pretty much ignore this. All standard URLs are legal IRIs. If you are working with IRIs that contain non-ASCII characters, this just means you can use them exactly as you'd expect without having to manually hex-encode those characters yourself.
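For the curious, the conversion can be approximated with Python's standard urllib.parse.quote function. This is a minimal sketch rather than a complete IRI processor: the iri_to_uri helper and the example.com address are hypothetical, and the sketch assumes the non-ASCII characters appear in the path or query rather than in the host name.

```python
from urllib.parse import quote

def iri_to_uri(iri):
    """Percent-escape an IRI so that only ASCII characters remain."""
    # The characters listed in `safe` are URI delimiters (plus "%" itself) and
    # are left alone. quote() percent-escapes everything else outside the
    # unreserved ASCII set, encoding non-ASCII characters as UTF-8 first.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")

uri = iri_to_uri("http://www.example.com/résumé.xml")
print(uri)  # http://www.example.com/r%C3%A9sum%C3%A9.xml
```

Each é becomes the two bytes C3 A9 in UTF-8, and each byte is then written as a percent escape. A plain ASCII URL such as AlanTuring.xml passes through unchanged.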

Normally, the namespace declaration is placed on the root element of the including document, and not repeated on each individual xi:include element. Henceforth in this chapter, we will assume that the namespace prefix xi is bound to the correct namespace URI.

Example 12-1 shows a document similar to Example 8-1 that contains two xi:include elements. The first one loads the document found at the relative URL AlanTuring.xml. The second loads the document found at the relative URL RichardPFeynman.xml.

Example 12-1. A document that uses XInclude to load two other documents

<?xml version="1.0"?>
<people xmlns:xi="http://www.w3.org/2001/XInclude" >
  <xi:include href="AlanTuring.xml"/>
  <xi:include href="RichardPFeynman.xml"/>
</people>

When an XInclude processor reads this document, it parses the XML documents found at the two URLs and inserts their contents (except for the XML and document type declarations, if any) into the finished document at the positions indicated by the xi:include elements. The xi:include elements themselves are removed.

XInclusion is not done by default, and many XML parsers do not understand or support XInclude. You need to either use a filter that resolves the xi:include elements before processing the documents further, or tell the parser that you want it to perform XInclusion. The exact details vary from one processor to the next. For example, the --xinclude option tells xmllint from libxml to resolve XIncludes:

$ xmllint --xinclude http://cafeconleche.org/books/xian3/examples/12/12-1.xml
<?xml version="1.0"?>
<people xmlns:xi="http://www.w3.org/2001/XInclude">
  <person born="1912" died="1954" 
   xml:base=
     "http://cafeconleche.org/books/xian3/examples/12/AlanTuring.xml">
    <name>
      <first_name>Alan</first_name>
      <last_name>Turing</last_name>
    </name>
    <profession>computer scientist</profession>
    <profession>mathematician</profession>
    <profession>cryptographer</profession>
  </person>
  <person born="1918" died="1988" 
  xml:base=
     "http://cafeconleche.org/books/xian3/examples/12/RichardPFeynman.xml">
    <name>
      <first_name>Richard</first_name>
      <middle_initial>P</middle_initial>
      <last_name>Feynman</last_name>
    </name>
    <profession>physicist</profession>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>

You'll notice that the processor has added xml:base attributes to preserve the base URIs of the included elements. This is not so important here, where the including document and the two included documents all live in the same directory. However, when assembling a document from sources on different servers and in different directories, these attributes make sure that relative URLs in the included content are still resolved against the document they originally came from.
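To see why the base URI matters, consider what happens to a relative URL inside an included element. The following sketch uses Python's standard urljoin function to resolve the same relative reference against two different bases. The including document's address and the images/turing.jpg reference are both hypothetical; only the xml:base value comes from the output above.

```python
from urllib.parse import urljoin

# Base URI recorded by the xml:base attribute in the xmllint output.
included_base = "http://cafeconleche.org/books/xian3/examples/12/AlanTuring.xml"
# Hypothetical including document that lives on a different server.
including_base = "http://www.example.com/master/people.xml"
# Hypothetical relative reference inside the included person element.
relative = "images/turing.jpg"

with_xml_base = urljoin(included_base, relative)
without_xml_base = urljoin(including_base, relative)
print(with_xml_base)     # http://cafeconleche.org/books/xian3/examples/12/images/turing.jpg
print(without_xml_base)  # http://www.example.com/master/images/turing.jpg
```

Without the xml:base attribute, the reference would be resolved against the including document and would point at the wrong server.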

It's also important to note that the inclusion is based on the parsed documents. It's not done as if by copying and pasting the raw text. XML declarations are not copied. Insignificant whitespace inside tags may not be quite the same after inclusion as it was before. Whitespace in the prolog and epilog is not copied at all. Document type declarations are not copied, but any default attribute values they define are copied.
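The effect of working on parsed documents can be demonstrated with the xml.etree.ElementInclude module in Python's standard library. Treat this as a sketch under a few assumptions: the module implements only part of XInclude (as far as we know it handles neither xi:fallback nor XPointers, and it does not add xml:base attributes the way xmllint does), and the DOCUMENTS table and loader function are hypothetical in-memory stand-ins for the real files of Example 12-1.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory copies of the documents; each has its own XML declaration.
DOCUMENTS = {
    "12-1.xml": """<?xml version="1.0"?>
<people xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="AlanTuring.xml"/>
  <xi:include href="RichardPFeynman.xml"/>
</people>""",
    "AlanTuring.xml": """<?xml version="1.0"?>
<person born="1912" died="1954">
  <name><first_name>Alan</first_name><last_name>Turing</last_name></name>
</person>""",
    "RichardPFeynman.xml": """<?xml version="1.0"?>
<person born="1918" died="1988">
  <name><first_name>Richard</first_name><last_name>Feynman</last_name></name>
</person>""",
}

def loader(href, parse, encoding=None):
    """Return a parsed element for XML inclusion, or the raw text otherwise."""
    text = DOCUMENTS[href]
    if parse == "xml":
        return ET.fromstring(text)  # parsing discards the XML declaration
    return text

root = ET.fromstring(DOCUMENTS["12-1.xml"])
ElementInclude.include(root, loader=loader)  # replaces xi:include elements in place
result = ET.tostring(root, encoding="unicode")
print(result)
```

The serialized result contains the two person elements but no trace of the included documents' XML declarations, because what was inserted was each document's parsed root element rather than its raw text.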

libxml includes fairly complete support for XInclude. Xerces-J 2.7 includes incomplete support for XInclude. Other parsers typically have none at all and require the use of third-party libraries that do support XInclude, such as XOM's nu.xom.xinclude package. This is still fairly bleeding-edge technology.