Including Text Files

By default, the XInclude processor assumes the document pointed to by an href attribute is a well-formed XML document. This document is parsed, and its content replaces the xi:include element in the including document. However, it is often useful to include unparsed text when assembling a larger document. For instance, the program and XML examples in this book could be included directly from their source form. If you add a parse attribute with the value text to an xi:include element, then the document is loaded as plain text and not parsed. For example, this element includes Example 12-1 as plain text, without parsing it:

<xi:include 
  href="http://cafeconleche.org/books/xian3/examples/12/12-1.xml"
  parse="text"
/>

When parse="text", it is no longer necessary for the referenced document to be well-formed. Indeed, it need not be an XML document at all. It can be C source code, an email message, a classic HTML document, or almost anything else. The only restriction is that the included document must not contain any characters that are illegal in XML, such as an ASCII NUL or an unmatched half of a surrogate pair.
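The behavior described above can be tried out with Python's standard-library xml.etree.ElementInclude module, which offers limited XInclude support. The following is only a sketch: the href snippet.c, the FAKE_RESOURCES table, and the text_loader function are hypothetical stand-ins for a real resource and loader, and the namespace URI is the one from the final XInclude recommendation. It shows that text which is not well-formed XML can still be included, because it arrives as character data rather than markup.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the network or file system: href -> raw text.
FAKE_RESOURCES = {
    "snippet.c": 'if (a < b && b > 0) { puts("<ok>"); }',
}

def text_loader(href, parse, encoding=None):
    """Loader used in place of ElementInclude's default file-based loader."""
    if parse != "text":
        raise ValueError("this sketch only serves text resources")
    return FAKE_RESOURCES[href]

doc = ET.fromstring(
    '<listing xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="snippet.c" parse="text"/>'
    '</listing>'
)

# Replace the xi:include element with the text returned by the loader.
ElementInclude.include(doc, loader=text_loader)

included_text = doc.text                         # the raw C code, unchanged
serialized = ET.tostring(doc, encoding="unicode")  # markup characters escaped
print(serialized)
```

After inclusion, the C fragment is ordinary text content of the listing element, so its < and & characters are escaped when the document is serialized again.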

XInclude processors use any available protocol metadata, such as HTTP headers, to determine the encoding of a referenced document so they can transcode it into Unicode before including it. If that metadata does not specify an encoding, but the MIME media type is text/xml, application/xml, or some type that ends in +xml, then the processor looks inside the document for common signatures like byte-order marks or XML declarations that help it guess the encoding. If these standard mechanisms won't suffice, the document author can add an encoding attribute to the xi:include element, indicating the expected encoding of the document. For example, this element tries to load Example 12-1 using the Latin-1 encoding:

<xi:include 
  href="http://cafeconleche.org/books/xian3/examples/12/12-1.xml"
  encoding="ISO-8859-1" parse="text"
/>

Finally, if none of these mechanisms yields an encoding, the processor assumes the document is encoded in UTF-8. Any byte sequence that is undefined in the document's encoding (or what the XInclude processor thinks is the document's encoding) is a fatal error.
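The order in which these sources are consulted can be summarized in a short Python sketch. It is an illustration of the steps described here, not the algorithm of any particular processor: the function names choose_encoding, sniff_xml_encoding, and decode_text_include are hypothetical, the signature sniffing is deliberately simplified (it ignores UTF-32, for instance), and a strict decoding failure stands in for the fatal error.

```python
import codecs
import re

XML_MEDIA_TYPES = ("text/xml", "application/xml")

def sniff_xml_encoding(data):
    """Guess an encoding from a byte-order mark or XML declaration, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data
    )
    return match.group(1).decode("ascii") if match else None

def choose_encoding(data, charset=None, media_type=None, encoding_attr=None):
    """Pick an encoding for a parse="text" include, in the order described."""
    # 1. Protocol metadata, such as a charset parameter in an HTTP header.
    if charset:
        return charset
    # 2. For XML media types, look inside the document for signatures.
    if media_type and (
        media_type in XML_MEDIA_TYPES or media_type.endswith("+xml")
    ):
        sniffed = sniff_xml_encoding(data)
        if sniffed:
            return sniffed
    # 3. The encoding attribute on the xi:include element.
    if encoding_attr:
        return encoding_attr
    # 4. Last resort.
    return "utf-8"

def decode_text_include(data, **hints):
    """Decode strictly; UnicodeDecodeError plays the role of the fatal error."""
    return data.decode(choose_encoding(data, **hints), errors="strict")
```

For instance, decode_text_include(b"caf\xe9", encoding_attr="ISO-8859-1") decodes the bytes as Latin-1, while the same call without any hints falls back to UTF-8 and raises UnicodeDecodeError because the byte sequence is not valid UTF-8.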

The parse attribute can also have the value xml to indicate that the referenced document should be parsed. However, this is the default, so most authors don't bother to write parse="xml". Any other value is a fatal error.
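As a quick illustration of that last rule, here is a sketch using Python's standard-library xml.etree.ElementInclude module, which is one limited XInclude implementation among many. The value html and the dummy loader are made up for the example; other processors report the error in their own ways.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

doc = ET.fromstring(
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="12-1.xml" parse="html"/>'
    '</doc>'
)

def dummy_loader(href, parse, encoding=None):
    """Hypothetical loader; it returns an empty string for any request."""
    return ""

try:
    ElementInclude.include(doc, loader=dummy_loader)
    outcome = "included"
except ElementInclude.FatalIncludeError as err:
    # parse="html" is neither "xml" nor "text", so inclusion is abandoned.
    outcome = f"fatal error: {err}"

print(outcome)
```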