Relative URL references such as sark.jpg, ../pi1/sark.jpg, and turing/pi1/sark.jpg must be resolved relative to an absolute base URI before being retrieved. When relative URLs are found in XLinks, xml-stylesheet processing instructions, system identifiers, and other locations in XML documents, they are normally resolved relative to the absolute base URL of the document or entity that contains them. For instance, if you find the element <image xlink:type="simple" xlink:href="pi1/sark.jpg" /> in a document at the URL http://www.turing.org.uk/turing/index.html, you would expect to find the file sark.jpg at the URL http://www.turing.org.uk/turing/pi1/sark.jpg.
This isn't a surprise. It's pretty much how links have worked in HTML for over a decade.
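Any general-purpose URL library implements these same resolution rules. The following rough check uses Python's urllib.parse.urljoin, which follows the RFC 3986 resolution algorithm, to resolve the relative references from the first paragraph against the document URL http://www.turing.org.uk/turing/index.html. Note that "turing/pi1/sark.jpg" lands one directory deeper than you might guess, because it is resolved relative to the turing/ directory that already contains index.html.

```python
from urllib.parse import urljoin

# Absolute base URL of the document that contains the links.
base = "http://www.turing.org.uk/turing/index.html"

refs = ["pi1/sark.jpg", "sark.jpg", "../pi1/sark.jpg", "turing/pi1/sark.jpg"]
# The last path segment of the base (index.html) is dropped before joining.
resolved = {ref: urljoin(base, ref) for ref in refs}

for ref, url in resolved.items():
    print(ref, "->", url)
# pi1/sark.jpg -> http://www.turing.org.uk/turing/pi1/sark.jpg
# sark.jpg -> http://www.turing.org.uk/turing/sark.jpg
# ../pi1/sark.jpg -> http://www.turing.org.uk/pi1/sark.jpg
# turing/pi1/sark.jpg -> http://www.turing.org.uk/turing/turing/pi1/sark.jpg
```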
However, XML does add a couple of new wrinkles to this procedure. First, an XML document may be composed of multiple entities loaded from multiple different URLs, even on different servers. If this is the case, then a relative URL is resolved relative to the base URL of the specific entity in which it appears, not the base URL of the entire document.
Second, the base URL may be reset or changed from within the document by using xml:base attributes. Such an attribute may appear on the XLink element itself or on any ancestor element in the same entity. For example, this XLink points to ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt:
<novel xmlns:xlink = "http://www.w3.org/1999/xlink"
       xml:base="ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/"
       xlink:type = "simple" xlink:href = "wizoz10.txt">
  <title>The Wonderful Wizard of Oz</title>
  <author>L. Frank Baum</author>
  <year>1900</year>
</novel>
So does this one:
<novel xmlns:xlink = "http://www.w3.org/1999/xlink"
       xml:base="ftp://ftp.knowtion.net/"
       xlink:type = "simple"
       xlink:href = "/pub/mirrors/gutenberg/etext93/wizoz10.txt">
  <title>The Wonderful Wizard of Oz</title>
  <author>L. Frank Baum</author>
  <year>1900</year>
</novel>
And this one does too:
<series xml:base="ftp://ftp.knowtion.net/">
  <title>Oz Books</title>
  <author>L. Frank Baum</author>
  <novel xmlns:xlink = "http://www.w3.org/1999/xlink" xlink:type = "simple"
         xlink:href = "/pub/mirrors/gutenberg/etext93/wizoz10.txt">
    <title>The Wonderful Wizard of Oz</title>
    <year>1900</year>
  </novel>
  ...
</series>
Because each of these xml:base values is an absolute URL, all of these link to the URL ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt regardless of where the document containing the XLink actually came from. The base URL is taken from the nearest xml:base attribute in the same entity, in preference to the base URL of the entity that contains the element.
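One way to see the "nearest xml:base wins" rule is to carry the base URL down the tree while walking it. Below is a minimal sketch that does this with Python's ElementTree. Two assumptions: ElementTree does not record entity boundaries or the URL a document came from, so the sketch treats the whole document as one entity and takes that entity's URL as a parameter; and both resolve_xlinks and the ENTITY_URL value are hypothetical names introduced for illustration. The three Oz examples are trimmed to their linking attributes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Attribute names in ElementTree's {namespace-URI}local-name form.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def resolve_xlinks(element, inherited_base):
    """Yield the absolute URL of every xlink:href at or below element."""
    base = inherited_base
    # xml:base on this element (absolute or relative) is resolved against the
    # inherited base and then applies to this element and its descendants.
    if XML_BASE in element.attrib:
        base = urljoin(base, element.attrib[XML_BASE])
    if XLINK_HREF in element.attrib:
        yield urljoin(base, element.attrib[XLINK_HREF])
    for child in element:
        yield from resolve_xlinks(child, base)

XLINK = 'xmlns:xlink="http://www.w3.org/1999/xlink"'
examples = [
    # xml:base on the linking element itself
    f'<novel {XLINK} xml:base="ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/" '
    'xlink:type="simple" xlink:href="wizoz10.txt"/>',
    # absolute xml:base plus an absolute-path href
    f'<novel {XLINK} xml:base="ftp://ftp.knowtion.net/" '
    'xlink:type="simple" xlink:href="/pub/mirrors/gutenberg/etext93/wizoz10.txt"/>',
    # xml:base inherited from an ancestor element
    '<series xml:base="ftp://ftp.knowtion.net/"><title>Oz Books</title>'
    f'<novel {XLINK} xlink:type="simple" '
    'xlink:href="/pub/mirrors/gutenberg/etext93/wizoz10.txt"/></series>',
]

# Hypothetical URL the entity was loaded from; the absolute xml:base
# values make it irrelevant in all three examples.
ENTITY_URL = "http://www.example.com/catalog/oz.xml"
results = [list(resolve_xlinks(ET.fromstring(doc), ENTITY_URL)) for doc in examples]
for urls in results:
    print(urls)
# each line prints ['ftp://ftp.knowtion.net/pub/mirrors/gutenberg/etext93/wizoz10.txt']
```

Passing the base down as an argument avoids the need for parent pointers, which ElementTree elements do not have.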
xml:base attributes can themselves contain relative URLs. In this case, the base URL is formed by resolving this relative URL against the base URL specified by xml:base attributes higher up in the tree and/or the base URL of the entity that contains the element. For example, resolving the URLs in the xlink:href attributes in this authors element requires applying the xml:base attributes of three separate ancestor elements:
<authors xml:base="http://www.literature.org/authors/"
         xmlns:xlink = "http://www.w3.org/1999/xlink">
  <author xml:base="baum-l-frank/">
    <name>L. Frank Baum</name>
    <novel xml:base = "the-wonderful-wizard-of-oz/">
      <title>The Wonderful Wizard of Oz</title>
      <year>1900</year>
      <chapter xlink:type="simple" xlink:href="introduction.html">Introduction</chapter>
      <chapter xlink:type="simple" xlink:href="chapter-01.html">The Cyclone</chapter>
      <chapter xlink:type="simple" xlink:href="chapter-02.html">The Council with the Munchkins</chapter>
      ...
    </novel>
  </author>
</authors>
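Resolving a chain of xml:base attributes amounts to a left fold. You start from the entity's URL, resolve each xml:base value from the outermost ancestor inward, and finally resolve the xlink:href itself. Here is a sketch using Python's functools.reduce with urllib.parse.urljoin, applied to the first chapter link above. The entity URL is a hypothetical placeholder; it drops out because the outermost xml:base is absolute.

```python
from functools import reduce
from urllib.parse import urljoin

# Hypothetical URL of the entity containing <authors>.
entity_url = "http://www.example.com/catalog.xml"

# xml:base values from outermost to innermost, then the xlink:href.
chain = [
    "http://www.literature.org/authors/",  # <authors xml:base>
    "baum-l-frank/",                       # <author xml:base>
    "the-wonderful-wizard-of-oz/",         # <novel xml:base>
    "introduction.html",                   # <chapter xlink:href>
]

url = reduce(urljoin, chain, entity_url)
print(url)
# http://www.literature.org/authors/baum-l-frank/the-wonderful-wizard-of-oz/introduction.html

# Trailing slashes matter: without one, the last segment is replaced, not extended.
no_slash = urljoin("http://www.literature.org/authors/baum-l-frank",
                   "the-wonderful-wizard-of-oz/")
print(no_slash)
# http://www.literature.org/authors/the-wonderful-wizard-of-oz/
```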
What if the outermost element has a relative base URL or no xml:base attribute at all? Then you resolve against the absolute base URL of the entity that contains that element. In theory, this entity should always have an absolute base URL against which relative URLs can be resolved as a last resort. After all, the entity had to come from somewhere, right? Unfortunately, there are some corner cases where this isn't true. In particular, many APIs lose track of the base URLs or create documents in memory without any base URLs, so full resolution isn't always possible. The relevant specifications are not perfectly clear on what happens here, though one possible interpretation is to simply declare that the base URI is the empty string. The URI specification defines this to mean the URI of the current document, whatever it is. However, in the common case where a document is read from an actual file or URL, it should always be possible to calculate an absolute base URL for every element.
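Parsing from a string with Python's ElementTree is one example of the in-memory case: the parser never learns where the text came from. Below is a minimal sketch of detecting that situation. It treats a missing entity URL as the empty string and reports failure (None) when the result still has no scheme, rather than silently returning a relative URL. This is a design choice for illustration, not behavior mandated by any specification. The function name root_base and the index.xml URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin, urlsplit

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def root_base(root: ET.Element, entity_url: str = "") -> Optional[str]:
    """Absolute base URL for root, or None if it cannot be made absolute.

    An empty entity_url models a document built in memory (for instance,
    parsed from a string) whose location is unknown.
    """
    base = urljoin(entity_url, root.get(XML_BASE, ""))
    # Without a scheme the result is still relative: resolution is incomplete.
    return base if urlsplit(base).scheme else None

in_memory = ET.fromstring('<authors xml:base="baum-l-frank/"/>')
print(root_base(in_memory))
# None
print(root_base(in_memory, "http://www.literature.org/authors/index.xml"))
# http://www.literature.org/authors/baum-l-frank/
```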
There's one point we've made a couple of times, but it's worth calling out because it's not obvious and quite tricky. All base URL resolutions are performed within the scope of a single entity, not a single document. If a document is built from multiple entities, then it's the base URI of the entity that matters, not the base URI of the document. Furthermore, xml:base attributes only have scope within the entity from which they come. They do not apply in any other entities. That is, if entity A includes entity B, no xml:base attributes in entity A will be used to resolve relative URLs in entity B. If the base URL cannot be fully resolved using xml:base attributes from entity B, then whatever relative URL remains is resolved against the URL from which entity B was loaded. xml:base attributes in ancestor elements from different entities are not considered.
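Common tree APIs often discard entity boundaries after parsing, so the following sketch uses a small hypothetical in-memory model in which every element records the entity whose text contains it. The Entity and Node classes, the base_uri function, and all example.com/example.org URLs are assumptions made for illustration. The key step is that the climb toward the root stops at the first ancestor from a different entity, so an xml:base in entity A never reaches elements that came from entity B.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

@dataclass(eq=False)
class Entity:
    url: str  # URL the entity was actually loaded from

@dataclass(eq=False)
class Node:
    entity: Entity                  # entity whose text contains this element
    xml_base: Optional[str] = None  # value of xml:base, if present
    parent: Optional["Node"] = None

def base_uri(node: Node) -> str:
    """Base URI of node, honoring xml:base only within its own entity."""
    bases = []
    current = node
    # Climb toward the root, but stop at the first ancestor from another
    # entity: xml:base attributes there do not apply.
    while current is not None and current.entity is node.entity:
        if current.xml_base is not None:
            bases.append(current.xml_base)
        current = current.parent
    # Start from the entity's own URL and apply xml:base values outermost first.
    base = node.entity.url
    for value in reversed(bases):
        base = urljoin(base, value)
    return base

# Entity A (the document entity) includes external entity B.
a = Entity("http://a.example.com/doc.xml")
b = Entity("http://b.example.com/parts/chapter.xml")
root = Node(a, xml_base="http://mirror.example.org/")
chapter = Node(b, parent=root)   # top-level element of entity B
figure = Node(b, parent=chapter)
note = Node(a, parent=root)      # another element back in entity A

print(urljoin(base_uri(figure), "fig1.png"))  # http://b.example.com/parts/fig1.png
print(urljoin(base_uri(note), "fig1.png"))    # http://mirror.example.org/fig1.png
```

Because an entity's replacement text is itself well-formed, its elements form a contiguous run in any ancestor chain, which is why stopping at the first boundary is enough.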
Although we've emphasized the application of xml:base attributes to xlink:href attributes in this section, they also apply in many other contexts. For instance, they're used in XInclude and XHTML 2.0. However, xml:base is a relative latecomer to the XML table, so it isn't universally supported. In particular, XHTML 1.0 and 1.1 do not consider xml:base attributes when resolving relative URLs in a and img elements. Instead, they use the traditional base element in the document's head.
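For contrast, here is a rough sketch of the XHTML 1.x rule in Python. Only a base element inside head and the document's own URL determine where an a element's href points, and any xml:base attribute is deliberately ignored. The function name xhtml1_link_targets, the page, and its URL are hypothetical, and this is a simplification, not a description of how any particular browser is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"

def xhtml1_link_targets(root, document_url):
    """Resolve every <a href> using only <head><base href> and the document URL."""
    base = document_url
    base_el = root.find(f"{XHTML}head/{XHTML}base")
    if base_el is not None and base_el.get("href"):
        base = urljoin(document_url, base_el.get("href"))
    # xml:base attributes anywhere in the body are not consulted.
    return [urljoin(base, a.get("href"))
            for a in root.iter(f"{XHTML}a") if a.get("href") is not None]

page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Oz</title>
    <base href="http://www.literature.org/authors/baum-l-frank/"/></head>
  <body><p xml:base="http://ignored.example.com/">
    <a href="the-wonderful-wizard-of-oz/">The Wonderful Wizard of Oz</a></p></body>
</html>"""

targets = xhtml1_link_targets(ET.fromstring(page), "http://www.example.com/index.html")
print(targets)
# ['http://www.literature.org/authors/baum-l-frank/the-wonderful-wizard-of-oz/']
```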