By default, the XInclude processor assumes the document pointed to by an
href attribute is a well-formed XML document. This document is parsed,
and the content of the included document replaces the xi:include element
in the including document. However, it is often useful to include
unparsed text when assembling a larger document. For instance, the
program and XML examples in this book could be included directly from
their source form. If you add a parse attribute with the value text to
an xi:include element, the document is loaded as plain text and not
parsed. For example, this element includes Example 12-1 as plain text,
without parsing it:
<xi:include
href="http://cafeconleche.org/books/xian3/examples/12/12-1.xml"
parse="text"
/>
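Python's standard library includes a small XInclude processor, xml.etree.ElementInclude, which makes the effect of parse="text" easy to see. The sketch below is only an illustration. To keep it runnable offline, it assumes a hypothetical in-memory loader (memory_loader) and a made-up href, example.xml, in place of a real fetch from cafeconleche.org. The included markup arrives as character data, so it is escaped on output instead of becoming child elements.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "documents", keyed by href, standing in for real URLs.
RESOURCES = {
    "example.xml": b"<greeting>Hello</greeting>",
}


def memory_loader(href, parse, encoding=None):
    """Loader callback for ElementInclude (hypothetical, for illustration only)."""
    data = RESOURCES[href]
    if parse == "text":
        # Unparsed text: just decode the bytes, defaulting to UTF-8.
        return data.decode(encoding or "utf-8")
    # parse="xml": parse the bytes and hand back the root element.
    return ET.fromstring(data)


doc = ET.fromstring(
    '<listing xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="example.xml" parse="text"/>'
    "</listing>"
)
ElementInclude.include(doc, loader=memory_loader)

# The angle brackets are now ordinary text, so the serializer escapes them.
print(ET.tostring(doc, encoding="unicode"))
# <listing>&lt;greeting&gt;Hello&lt;/greeting&gt;</listing>
```

Because the xi:include element was the only thing using the XInclude namespace, the serialized result carries no leftover namespace declaration.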
When parse="text", the referenced document no longer needs to be
well-formed. Indeed, it need not be an XML document at all. It can be C
source code, an email message, a classic HTML document, or almost
anything else. The only restriction is that the included document must
not contain characters that are illegal anywhere in XML, such as an
ASCII NUL or an unmatched half of a surrogate pair.
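Because the text still ends up inside an XML document, this restriction amounts to checking the decoded text against XML 1.0's Char production. The following is a minimal Python sketch that assumes the XML 1.0 rules. first_illegal_char is a hypothetical helper, not part of any library. In a Python string, an unmatched surrogate appears as a lone code point between U+D800 and U+DFFF, and that range falls outside every permitted range.

```python
import re

# XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything else, e.g. U+0000 (NUL), other C0 controls, or a lone surrogate
# in U+D800-U+DFFF, cannot appear in an XML document, even as included text.
_ILLEGAL_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_illegal_char(text):
    """Return the index of the first character XML 1.0 forbids, or None."""
    match = _ILLEGAL_XML_CHAR.search(text)
    return match.start() if match else None


print(first_illegal_char("int main(void) { return 0; }\n"))  # None
print(first_illegal_char("abc\x00def"))                       # 3
print(first_illegal_char("lone \ud800 surrogate"))            # 5
```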
XInclude processors use any available protocol metadata, such as HTTP
headers, to determine the encoding of a referenced document so they can
transcode it into Unicode before including it. If external metadata is
not available, but the MIME media type is text/xml, application/xml, or
some type that ends in +xml, then the processor looks inside the
document for common signatures, such as byte-order marks or XML
declarations, that help it guess the encoding. If these mechanisms don't
identify the encoding, the document author can add an encoding attribute
to the xi:include element, indicating the expected encoding of the
document. For example, this element tries to load Example 12-1 using the
Latin-1 encoding:
<xi:include
href="http://cafeconleche.org/books/xian3/examples/12/12-1.xml"
encoding="ISO-8859-1" parse="text"
/>
Finally, if all of those fail, the processor assumes the document is encoded in UTF-8. Any byte sequences that are undefined in the document's encoding (or what the XInclude processor thinks is the document's encoding) are a fatal error.
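The fallback chain in the preceding paragraphs can be sketched in Python as follows. It checks protocol metadata first, then BOM or XML-declaration sniffing for XML media types, then the encoding attribute, and finally UTF-8. All of the function names are hypothetical. The sniffing recognizes only the UTF-8 and UTF-16 byte-order marks and a simple XML declaration, and UTF-32 is ignored. Treat it as a simplification: real processors differ in details such as exactly when the encoding attribute is consulted. A strict decode stands in for the fatal error on undefined byte sequences.

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the data.
_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def is_xml_media_type(media_type):
    """True for text/xml, application/xml, and any type ending in +xml."""
    if not media_type:
        return False
    base = media_type.split(";")[0].strip().lower()
    return base in ("text/xml", "application/xml") or base.endswith("+xml")


def sniff_xml_encoding(data):
    """Guess from a byte-order mark or XML declaration; None if neither is found."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the byte order from the BOM
    match = _DECL_ENCODING.match(data)
    return match.group(1).decode("ascii") if match else None


def resolve_encoding(data, charset=None, media_type=None, encoding_attr=None):
    # 1. External protocol metadata, e.g. the charset from an HTTP header.
    if charset:
        return charset
    # 2. XML media types: look inside for a BOM or XML declaration.
    if is_xml_media_type(media_type):
        sniffed = sniff_xml_encoding(data)
        if sniffed:
            return sniffed
    # 3. The encoding attribute on the xi:include element.
    if encoding_attr:
        return encoding_attr
    # 4. Last resort: UTF-8.
    return "utf-8"


def decode_included_text(data, **metadata):
    """Strict decode: undefined byte sequences raise UnicodeDecodeError."""
    return data.decode(resolve_encoding(data, **metadata))


print(decode_included_text(b"caf\xe9", encoding_attr="ISO-8859-1"))  # café
```

Without the encoding attribute, the same bytes fall through to UTF-8. There, 0xE9 starts a multibyte sequence that never completes, so the strict decode raises UnicodeDecodeError, the analogue of the fatal error described above.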
The parse attribute can also have the value xml to indicate that the
referenced document should be parsed. However, this is the default, so
most authors don't bother to write parse="xml". Any other value is a
fatal error.
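Python's xml.etree.ElementInclude can illustrate both rules. An omitted parse attribute behaves like parse="xml", and an unrecognized value such as html stops processing with ElementInclude.FatalIncludeError. As in any sketch of this kind, the loader (note_loader), the helper expand, and the href note.xml are hypothetical stand-ins for a real fetch.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"
NOTE = "<note>Parsed as XML</note>"  # hypothetical content of note.xml


def note_loader(href, parse, encoding=None):
    """Serve NOTE for every href: raw text for parse="text", an element otherwise."""
    return NOTE if parse == "text" else ET.fromstring(NOTE)


def expand(parse_attr):
    """Expand a one-include document; parse_attr=None omits the parse attribute."""
    attr = "" if parse_attr is None else f' parse="{parse_attr}"'
    doc = ET.fromstring(f'<doc xmlns:xi="{XI_NS}"><xi:include href="note.xml"{attr}/></doc>')
    ElementInclude.include(doc, loader=note_loader)
    return ET.tostring(doc, encoding="unicode")


print(expand(None))   # <doc><note>Parsed as XML</note></doc>
print(expand("xml"))  # <doc><note>Parsed as XML</note></doc>
try:
    expand("html")  # not "xml" or "text"
except ElementInclude.FatalIncludeError as err:
    print("fatal error:", err)
```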