The character data inside an element must not contain a raw,
unescaped opening angle bracket (<). This character is always
interpreted as beginning a tag. If you need to use this character in
your text, you can escape it using the entity reference &lt;, the
numeric character reference &#60;, or the hexadecimal numeric
character reference &#x3C;. When a parser reads the document, it
replaces any &lt;, &#60;, or &#x3C; references it finds with the
actual < character. However, it will not confuse the references with
the starts of tags. For example:
<SCRIPT LANGUAGE="JavaScript">
  if (location.host.toLowerCase().indexOf("ibiblio") &lt; 0) {
    location.href="http://ibiblio.org/xml/";
  }
</SCRIPT>
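The equivalence of the three escapes can be checked with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element name expr and the sample text are made up for illustration and do not come from the text above.

```python
import xml.etree.ElementTree as ET

# Three equivalent ways to write a less-than sign in character data.
# The <expr> element is a hypothetical example.
documents = [
    "<expr>a &lt; b</expr>",    # predefined entity reference
    "<expr>a &#60; b</expr>",   # decimal character reference
    "<expr>a &#x3C; b</expr>",  # hexadecimal character reference
]

# The parser hands the application the resolved character in each case.
texts = [ET.fromstring(doc).text for doc in documents]
print(texts)  # ['a < b', 'a < b', 'a < b']

# A raw < in the same position is a well-formedness error.
try:
    ET.fromstring("<expr>a < b</expr>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)  # False
```

The application never sees which of the three forms the author chose; it only receives the < character.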
Character data may not contain a raw, unescaped ampersand (&)
either. This is always interpreted as beginning an entity reference.
However, the ampersand may be escaped using the &amp;
entity reference like this:
<company>W.L. Gore &amp; Associates</company>
The ampersand is code point 38, so it could also be written with
the numeric character reference &#38;:
<company>W.L. Gore &#38; Associates</company>
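When XML is generated by a program, the escaping is usually delegated to a library rather than done by hand. As a hedged sketch, Python's standard xml.sax.saxutils.escape function can be used for this; building the element by string concatenation is a simplification for illustration only.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

name = "W.L. Gore & Associates"

# escape() replaces &, <, and > with their predefined entity references.
element = "<company>" + escape(name) + "</company>"
print(element)  # <company>W.L. Gore &amp; Associates</company>

# Parsing restores the original text, whichever reference form was used.
parsed = ET.fromstring(element).text
numeric = ET.fromstring("<company>W.L. Gore &#38; Associates</company>").text
print(parsed == name, numeric == name)  # True True
```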
Entity references such as &amp; and character references such as
&#60; are markup. When an application parses an XML document, it
replaces this particular markup with the actual character or
characters the reference refers to.
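XML predefines exactly five entity references. These are:
&lt;    the less-than sign, also known as the opening angle bracket (<)
&amp;   the ampersand (&)
&gt;    the greater-than sign, also known as the closing angle bracket (>)
&quot;  the straight double quotation mark (")
&apos;  the apostrophe, also known as the straight single quotation mark (')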
Only &lt; and &amp; must be used instead of the literal characters
in element content. The others are optional. &quot; and &apos; are
useful inside attribute values where a raw " or ' might be
misconstrued as ending the attribute value. For example, this image
tag uses the &apos; entity reference to fill in the apostrophe in
"O'Reilly":
<image source='oreilly_koala3.gif' width='122' height='66'
       alt='Powered by O&apos;Reilly Books'
/>
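To see why &apos; is optional rather than required, the same attribute can be written two ways. This is a small sketch using Python's xml.etree.ElementTree; the second tag, which switches to double-quoted delimiters, is an illustrative variant and not part of the original example.

```python
import xml.etree.ElementTree as ET

# Single-quoted attribute value: the apostrophe must be escaped.
escaped_tag = (
    "<image source='oreilly_koala3.gif' width='122' height='66' "
    "alt='Powered by O&apos;Reilly Books'/>"
)

# Double-quoted attribute value: a raw apostrophe is unambiguous.
double_quoted_tag = (
    '<image source="oreilly_koala3.gif" width="122" height="66" '
    'alt="Powered by O\'Reilly Books"/>'
)

alt_escaped = ET.fromstring(escaped_tag).get("alt")
alt_double = ET.fromstring(double_quoted_tag).get("alt")
print(alt_escaped)                # Powered by O'Reilly Books
print(alt_escaped == alt_double)  # True
```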
An unescaped greater-than sign (>) cannot be misinterpreted as
closing a tag it wasn't meant to close, so escaping it is not
normally required. The &gt; reference is allowed mostly for symmetry
with &lt;.
There is one unusual case where the greater-than sign does
need to be escaped. The three-character sequence ]]>
cannot appear in character data.
Instead you have to write it as ]]&gt;.
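The ]]> case tends to come up when source code is embedded in a document. The sketch below assumes a made-up code fragment and a hypothetical code element, and uses Python's xml.sax.saxutils.escape, which escapes every > and therefore handles this sequence as a side effect.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# A made-up fragment that happens to contain the sequence ]]>
fragment = "if (a[b[0]]> c) return;"

# escape() rewrites every > as &gt;, so ]]> becomes ]]&gt;
escaped = escape(fragment)
print(escaped)  # if (a[b[0]]&gt; c) return;

# The parser restores the original characters for the application.
roundtrip = ET.fromstring("<code>" + escaped + "</code>").text
print(roundtrip == fragment)  # True
```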
In addition to the five predefined entity references, you can define others in the document type definition. We'll discuss how to do this in Chapter 3.
Entity and character references can only be used in element content and
attribute values. They cannot be used in element names, attribute
names, or other markup. Text like &amp; or &#60; may appear inside a
comment or a processing instruction, but in these places it is not
resolved: the parser replaces references only in element content and
attribute values, and does not recognize them anywhere else.
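The difference between element content and comments can be observed with a parser that keeps comments in its tree. The following sketch uses Python's standard xml.dom.minidom for that reason; the note and text element names are invented for the example.

```python
from xml.dom import minidom

# The same six characters, AT&amp;T, appear in a comment and in an element.
doc = minidom.parseString(
    "<note><!-- AT&amp;T --><text>AT&amp;T</text></note>"
)
comment, text_element = doc.documentElement.childNodes

# Inside the comment the reference is left exactly as written.
comment_data = comment.data
print(repr(comment_data))  # ' AT&amp;T '

# Inside element content it is replaced by the character it names.
element_data = text_element.firstChild.data
print(element_data)  # AT&T
```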