In addition to declaring its elements, a valid document must declare
all the elements' attributes. This is done with ATTLIST
declarations. A single ATTLIST
can
declare multiple attributes for a single element type. However, if the
same attribute is repeated on multiple elements, then it must be
declared separately for each element where it appears. (Later in this
chapter you'll see how to use parameter entity references to make this
repetition less burdensome.)
For example, this ATTLIST declaration declares the source attribute
of the image element:
<!ATTLIST image source CDATA #REQUIRED>
It says that the image element has an attribute named source. The
value of the source attribute is character data, and instances of the
image element in the document are required to provide a value for the
source attribute.
A single ATTLIST declaration can declare multiple attributes for the
same element. For example, this ATTLIST declaration not only declares
the source attribute of the image element, but also the width, height,
and alt attributes:
<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED
>
This declaration says the source, width, and height attributes are
required. However, the alt attribute is optional and may be omitted
from particular image elements. All four attributes are declared to
contain character data, the most generic attribute type.
This declaration has the same effect and meaning as four
separate ATTLIST
declarations, one
for each attribute. Whether to use one ATTLIST
declaration per attribute is a
matter of personal preference, but most experienced DTD designers
prefer the multiple-attribute form. Given judicious application of
whitespace, it's no less legible than the alternative.
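For comparison, here is a sketch of the alternative form: the same four
attributes of the image element, each given its own ATTLIST
declaration. A validating parser should treat this exactly like the
single combined declaration.

```xml
<!ATTLIST image source CDATA #REQUIRED>
<!ATTLIST image width  CDATA #REQUIRED>
<!ATTLIST image height CDATA #REQUIRED>
<!ATTLIST image alt    CDATA #IMPLIED>
```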
In merely well-formed XML, attribute values can be any string of text.
The only restrictions are that any occurrences of < or & must be
escaped as &lt; and &amp;, and whichever kind of quotation mark, single
or double, is used to delimit the value must also be escaped. However,
a DTD allows you to make somewhat stronger statements about the content
of an attribute value. Indeed, these are stronger statements than can
be made about the contents of an element. For instance, you can say
that an attribute value must be unique within the document, that it
must be a legal XML name token, or that it must be chosen from a fixed
list of values.
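As an illustration of the well-formedness rules, the following
fragments show escaped attribute values. The file name and alt text are
invented for this sketch; only the escaping matters.

```xml
<!-- The characters < and & are always escaped inside an attribute value -->
<image source="cup.gif" alt="Coffee &amp; tea, served at &lt; 100 degrees"/>

<!-- The delimiting quotation mark is escaped; the other kind may appear as is -->
<image source="cup.gif" alt="A &quot;bottomless&quot; cup"/>
<image source="cup.gif" alt='A "bottomless" cup'/>
<image source="cup.gif" alt='The customer&apos;s cup'/>
```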
There are 10 attribute types in XML. They are:
CDATA
NMTOKEN
NMTOKENS
Enumeration
ENTITY
ENTITIES
ID
IDREF
IDREFS
NOTATION
These are the only attribute types allowed. A DTD cannot say that an attribute value must be an integer or a date between 1966 and 2004, for example.
A CDATA
attribute
value can contain any string of text acceptable in a well-formed
XML attribute value. This is the most general attribute type. For
example, you would use this type for an alt
attribute of an image
element because there's no
particular form the text in such an attribute has to
follow.
<!ATTLIST image alt CDATA #IMPLIED>
You would also use this for other kinds of data such as prices, URLs, email and snail mail addresses, citations, and other types that—while they have more structure than a simple string of text—don't match any of the other attribute types. For example:
<!ATTLIST sku list_price             CDATA #IMPLIED
              suggested_retail_price CDATA #IMPLIED
              actual_price           CDATA #IMPLIED
>
<!-- All three attributes should be in the form $XX.YY -->
An XML name token is very close to an XML name. It must consist of the
same characters as an XML name; that is, alphanumeric and/or
ideographic characters and the punctuation marks _, -, ., and :.
Furthermore, like an XML name, an XML name token may not contain
whitespace. However, a name token differs from an XML name in that any
of the allowed characters can be the first character in a name token,
while only letters, ideographs, and the underscore can be the first
character of an XML name. Thus 12 and .cshrc are valid XML name tokens
although they are not valid XML names. Every XML name is an XML name
token, but not all XML name tokens are XML names.
The value of an attribute declared to have type NMTOKEN
is an XML name token. For
example, if you knew that the year
attribute of a journal
element should contain an
integer such as 1990 or 2015, you might declare it to have
NMTOKEN
type, since all years
are name tokens:
<!ATTLIST journal year NMTOKEN #REQUIRED>
This still doesn't prevent the document author from
assigning the year
attribute
values like "99" or "March", but at least it eliminates some
possible wrong values, especially those that contain whitespace
such as "1990 C.E." or "Sally had a little lamb."
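A small self-contained document makes the distinction concrete. This is
only a sketch: the journals root element is a hypothetical wrapper
added so that the example forms a complete document.

```xml
<?xml version="1.0"?>
<!DOCTYPE journals [
  <!ELEMENT journals (journal*)>
  <!ELEMENT journal EMPTY>
  <!ATTLIST journal year NMTOKEN #REQUIRED>
]>
<journals>
  <journal year="1990"/>       <!-- valid -->
  <journal year="99"/>         <!-- valid, though probably not what was meant -->
  <journal year="March"/>      <!-- valid, though probably not what was meant -->
  <journal year="1990 C.E."/>  <!-- invalid: a name token cannot contain whitespace -->
</journals>
```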
An NMTOKENS type attribute contains one or more XML name tokens
separated by whitespace. For example, you might use this to describe
the dates attribute of a performances element, if the dates were given
in the form 08-26-2000, like this:
<performances dates="08-21-2001 08-23-2001 08-27-2001">
  Kat and the Kings
</performances>
The appropriate declaration is:
<!ATTLIST performances dates NMTOKENS #REQUIRED>
On the other hand, you could not use this for a list of dates in the form 08/27/2001 because the forward slash is not a legal name character.
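For instance, under the NMTOKENS declaration of the dates attribute,
the following element would be well-formed but invalid. The element
content is reused from the earlier example purely for illustration.

```xml
<!-- Invalid: 08/21/2001 and 08/23/2001 are not name tokens -->
<performances dates="08/21/2001 08/23/2001">
  Kat and the Kings
</performances>
```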
An enumeration is the only attribute type that is
not an XML keyword. Rather, it is a list of all possible values
for the attribute, separated by vertical bars. Each possible value
must be an XML name token. For example, the following declarations
say that the value of the month
attribute of a date
element
must be one of the 12 English month names, that the value of the
day
attribute must be a number
between 1 and 31, and that the value of the year
attribute must be an integer
between 1970 and 2009:
<!ATTLIST date month (January | February | March | April | May | June
                    | July | August | September | October | November
                    | December) #REQUIRED
>
<!ATTLIST date day (1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
                  | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
                  | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
                  | 31) #REQUIRED
>
<!ATTLIST date year (1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976
                   | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983
                   | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990
                   | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997
                   | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004
                   | 2005 | 2006 | 2007 | 2008 | 2009) #REQUIRED
>
<!ELEMENT date EMPTY>
Given this DTD, this date
element is valid:
<date month="January" day="22" year="2001"/>
However, these date
elements are invalid:
<date month="01"      day="22" year="2001"/>
<date month="Jan"     day="22" year="2001"/>
<date month="January" day="02" year="2001"/>
<date month="January" day="2"  year="1969"/>
<date month="Janvier" day="22" year="2001"/>
This trick works here because all the desired values happen to be legal XML name tokens. However, we could not use the same trick if the possible values included whitespace or any punctuation besides the underscore, hyphen, colon, and period.
An ID
type
attribute must contain an XML name (not a name token but a name)
that is unique within the XML document. More precisely, no other
ID
type attribute in the
document can have the same value. (Attributes of non-ID
type are not considered.) Each
element may have no more than one ID
type attribute.
As the keyword suggests, ID
type attributes assign unique
identifiers to elements. ID
type attributes do not need to have the name "ID" or "id",
although they very commonly do. For example, this ATTLIST
declaration says that every
employee
element must have a
social_security_number
ID
attribute:
<!ATTLIST employee social_security_number ID #REQUIRED>
ID numbers are tricky because a number is not an XML name and therefore
not a legal XML ID. The normal solution is to prefix the values with an
underscore or a common letter. For example:
<employee social_security_number="_078-05-1120"/>
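The following sketch shows how the ID constraints play out in a
complete document. The staff root element is a hypothetical wrapper
introduced only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee*)>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee social_security_number ID #REQUIRED>
]>
<staff>
  <employee social_security_number="_078-05-1120"/>  <!-- valid -->
  <employee social_security_number="_987-65-4320"/>  <!-- valid -->
  <employee social_security_number="_078-05-1120"/>  <!-- invalid: repeats an earlier ID value -->
  <employee social_security_number="078-05-1120"/>   <!-- invalid: starts with a digit, so not an XML name -->
</staff>
```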
An IDREF
type
attribute refers to the ID
type
attribute of some element in the document. Thus, it must be an XML
name. IDREF
attributes are
commonly used to establish relationships between elements when
simple containment won't suffice.
For example, imagine an XML document that contains a list of
project
and employee
elements. Every project
has a project_id
ID type attribute, and every
employee
has a social_security_number
ID type
attribute. Furthermore, each project
has team_member
child elements that identify
who's working on the project. Since each project is assigned to
multiple employees and some employees are assigned to more than
one project, it's not possible to make the employees children of
the projects or the projects children of the employees. The
solution is to use IDREF
type
attributes like this:
<project project_id="p1">
  <goal>Develop Strategic Plan</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss987-65-4320"/>
</project>
<project project_id="p2">
  <goal>Deploy Linux</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss9876-12-3456"/>
</project>
<employee social_security_number="ss078-05-1120">
  <name>Fred Smith</name>
</employee>
<employee social_security_number="ss987-65-4320">
  <name>Jill Jones</name>
</employee>
<employee social_security_number="ss9876-12-3456">
  <name>Sydney Lee</name>
</employee>
In this example, the project_id attribute of the project element and
the social_security_number attribute of the employee element would be
declared to have type ID. The person attribute of the team_member
element would have type IDREF. The relevant ATTLIST declarations look
like this:
<!ATTLIST employee    social_security_number ID    #REQUIRED>
<!ATTLIST project     project_id             ID    #REQUIRED>
<!ATTLIST team_member person                 IDREF #REQUIRED>
These declarations constrain the person
attribute of the team_member
element to match the ID of
something in the document. However, they do not constrain the
person
attribute of the
team_member
element to match
only employee IDs. It would be valid (though not necessarily
correct) for a team_member
to
hold the ID of another project or even the same project.
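To make the limits of IDREF concrete, consider this variation on the
first project. It is a sketch that assumes the rest of the document is
as shown earlier, so that p2 and the employee IDs exist, and that no
element has the made-up ID ss000-00-0000.

```xml
<project project_id="p1">
  <goal>Develop Strategic Plan</goal>
  <team_member person="ss078-05-1120"/>  <!-- valid: matches an employee's ID -->
  <team_member person="p2"/>             <!-- valid but dubious: matches a project's ID -->
  <team_member person="ss000-00-0000"/>  <!-- invalid: matches no ID in the document -->
</project>
```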
An IDREFS type attribute contains a whitespace-separated list of XML
names, each of which must be the ID of an element in the document. This
is used when one element needs to refer to multiple other elements. For
instance, the previous project example could be rewritten so that the
team_member children of the project element are replaced by a team
attribute like this:
<project project_id="p1" team="ss078-05-1120 ss987-65-4320">
  <goal>Develop Strategic Plan</goal>
</project>
<project project_id="p2" team="ss078-05-1120 ss9876-12-3456">
  <goal>Deploy Linux</goal>
</project>
<employee social_security_number="ss078-05-1120">
  <name>Fred Smith</name>
</employee>
<employee social_security_number="ss987-65-4320">
  <name>Jill Jones</name>
</employee>
<employee social_security_number="ss9876-12-3456">
  <name>Sydney Lee</name>
</employee>
The appropriate declarations are:
<!ATTLIST employee social_security_number ID #REQUIRED>
<!ATTLIST project project_id ID     #REQUIRED
                  team       IDREFS #REQUIRED>
An ENTITY type attribute contains the name of an unparsed entity
declared elsewhere in the DTD. For instance, a movie element might have
an ENTITY type source attribute identifying the MPEG or QuickTime file
to play when the movie is activated:
<!ATTLIST movie source ENTITY #REQUIRED>
If the DTD declared an unparsed entity named X-Men-trailer, then this
movie element might be used to embed that video file in the XML
document:
<movie source="X-Men-trailer"/>
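As a rough sketch, the declarations behind such a movie element might
look like the following. The quicktime notation name and the
example.com URL are assumptions made up for illustration; only the
entity name X-Men-trailer and the source attribute come from the
example above.

```xml
<!-- Hypothetical notation identifying the format of the unparsed data -->
<!NOTATION quicktime SYSTEM "video/quicktime">

<!-- NDATA marks X-Men-trailer as an unparsed entity in that notation -->
<!ENTITY X-Men-trailer SYSTEM "http://www.example.com/trailers/xmen.mov"
         NDATA quicktime>

<!ELEMENT movie EMPTY>
<!ATTLIST movie source ENTITY #REQUIRED>
```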
We'll discuss unparsed entities in more detail later in this chapter.
An ENTITIES
type
attribute contains the names of one or more unparsed entities
declared elsewhere in the DTD, separated by whitespace. For
instance, a slide_show
element
might have an ENTITIES
attribute identifying the JPEG files to show and the order in
which to show them:
<!ATTLIST slide_show slides ENTITIES #REQUIRED>
If the DTD declared unparsed entities named slide1, slide2, slide3, and
so on through slide10, then this slide_show element might be used to
embed the show in the XML document:
<slide_show slides="slide1 slide2 slide3 slide4 slide5 slide6 slide7 slide8 slide9 slide10"/>
A NOTATION
type
attribute contains the name of a notation declared in the
document's DTD. This is perhaps the rarest attribute type and
isn't much used in practice. In theory, it could be used to
associate types with particular elements, as well as limiting the
types associated with the element. For example, these declarations
define four notations for different image types and then specify
that each image
element must
have a type
attribute that
selects exactly one of them:
<!NOTATION gif  SYSTEM "image/gif">
<!NOTATION tiff SYSTEM "image/tiff">
<!NOTATION jpeg SYSTEM "image/jpeg">
<!NOTATION png  SYSTEM "image/png">
<!ATTLIST image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>
The type attribute of each image element can have one of the four
values gif, tiff, jpeg, or png but not any other value. This has a
slight advantage over the enumerated type in that the actual MIME media
type of the notation is available, whereas an enumerated type could not
specify image/png or image/gif as an allowed value because the forward
slash is not a legal character in XML names.
In addition to providing a data type, each ATTLIST
declaration includes a default declaration for that
attribute. There are four possibilities for this default:
#IMPLIED
The attribute is optional. Each instance of the element may or may not provide a value for the attribute. No default value is provided.
#REQUIRED
The attribute is required. Each instance of the element must provide a value for the attribute. No default value is provided.
#FIXED
The attribute value is constant and immutable. This attribute has the specified value regardless of whether the attribute is explicitly noted on an individual instance of the element. If it is included, though, it must have the specified value.
Literal
The actual default value is given as a quoted string.
For example, this ATTLIST
declaration says that person
elements can but do not need to have born
and died
attributes:
<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED
>
This ATTLIST declaration says that every circle element must have
center_x, center_y, and radius attributes:
<!ATTLIST circle center_x NMTOKEN #REQUIRED
                 center_y NMTOKEN #REQUIRED
                 radius   NMTOKEN #REQUIRED
>
This ATTLIST declaration says that every biography element has a
version attribute and that the value of that attribute is 1.0, even if
the start-tag of the element does not explicitly include a version
attribute:
<!ATTLIST biography version CDATA #FIXED "1.0">
This ATTLIST declaration says that every web_page element has a
protocol attribute. If a particular web_page element doesn't have an
explicit protocol attribute, then the parser will supply one with the
value http:
<!ATTLIST web_page protocol NMTOKEN "http">
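The following sketch pulls the #FIXED and literal defaults together in
one document. The site root element is a hypothetical wrapper; the two
ATTLIST declarations are the ones shown above.

```xml
<?xml version="1.0"?>
<!DOCTYPE site [
  <!ELEMENT site (biography | web_page)*>
  <!ELEMENT biography EMPTY>
  <!ELEMENT web_page EMPTY>
  <!ATTLIST biography version CDATA #FIXED "1.0">
  <!ATTLIST web_page protocol NMTOKEN "http">
]>
<site>
  <biography/>                <!-- valid: treated as if version="1.0" were present -->
  <biography version="1.0"/>  <!-- valid: restates the fixed value -->
  <biography version="2.0"/>  <!-- invalid: contradicts the fixed value -->
  <web_page/>                 <!-- valid: treated as if protocol="http" were present -->
  <web_page protocol="ftp"/>  <!-- valid: overrides the default -->
</site>
```

A parser that reads the DTD should report the defaulted version and
protocol attributes to the application just as if they had been typed
in the start-tags.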