Attribute Declarations

In addition to declaring its elements, a valid document must declare all the elements' attributes. This is done with ATTLIST declarations. A single ATTLIST can declare multiple attributes for a single element type. However, if the same attribute is repeated on multiple elements, then it must be declared separately for each element where it appears. (Later in this chapter you'll see how to use parameter entity references to make this repetition less burdensome.)

For example, ATTLIST declares the source attribute of the image element:

<!ATTLIST image source CDATA #REQUIRED>

It says that the image element has an attribute named source. The value of the source attribute is character data, and instances of the image element in the document are required to provide a value for the source attribute.

A single ATTLIST declaration can declare multiple attributes for the same element. For example, this ATTLIST declaration not only declares the source attribute of the image element, but also the width, height, and alt attributes:

<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED
>

This declaration says the source, width, and height attributes are required. However, the alt attribute is optional and may be omitted from particular image elements. All four attributes are declared to contain character data, the most generic attribute type.

This declaration has the same effect and meaning as four separate ATTLIST declarations, one for each attribute. Whether to use one ATTLIST declaration per attribute is a matter of personal preference, but most experienced DTD designers prefer the multiple-attribute form. Given judicious application of whitespace, it's no less legible than the alternative.

Attribute Types

In merely well-formed XML, attribute values can be any string of text. The only restrictions are that any occurrences of < or & must be escaped as < and &, and whichever kind of quotation mark, single or double, is used to delimit the value must also be escaped. However, a DTD allows you to make somewhat stronger statements about the content of an attribute value. Indeed, these are stronger statements than can be made about the contents of an element. For instance, you can say that an attribute value must be unique within the document, that it must be a legal XML name token, or that it must be chosen from a fixed list of values.

There are 10 attribute types in XML. They are:

CDATA
NMTOKEN
NMTOKENS
Enumeration
ENTITY
ENTITIES
ID
IDREF
IDREFS
NOTATION

These are the only attribute types allowed. A DTD cannot say that an attribute value must be an integer or a date between 1966 and 2004, for example.

CDATA

A CDATA attribute value can contain any string of text acceptable in a well-formed XML attribute value. This is the most general attribute type. For example, you would use this type for an alt attribute of an image element because there's no particular form the text in such an attribute has to follow.

<!ATTLIST image alt CDATA #IMPLIED>

You would also use this for other kinds of data such as prices, URLs, email and snail mail addresses, citations, and other types that—while they have more structure than a simple string of text—don't match any of the other attribute types. For example:

<!ATTLIST sku
 list_price               CDATA #IMPLIED
 suggested_retail_price   CDATA #IMPLIED
 actual_price             CDATA #IMPLIED
>
<!-- All three attributes should be in the form $XX.YY -->

NMTOKEN

An XML name token is very close to an XML name. It must consist of the same characters as an XML name; that is, alphanumeric and/or ideographic characters and the punctuation marks _, -, ., and :. Furthermore, like an XML name, an XML name token may not contain whitespace. However, a name token differs from an XML name in that any of the allowed characters can be the first character in a name token, while only letters, ideographs, and the underscore can be the first character of an XML name. Thus 12 and .cshrc are valid XML name tokens although they are not valid XML names. Every XML name is an XML name token, but not all XML name tokens are XML names.

The value of an attribute declared to have type NMTOKEN is an XML name token. For example, if you knew that the year attribute of a journal element should contain an integer such as 1990 or 2015, you might declare it to have NMTOKEN type, since all years are name tokens:

<!ATTLIST journal year NMTOKEN #REQUIRED>

This still doesn't prevent the document author from assigning the year attribute values like "99" or "March", but at least it eliminates some possible wrong values, especially those that contain whitespace such as "1990 C.E." or "Sally had a little lamb."

NMTOKENS

A NMTOKENS type attribute contains one or more XML name tokens separated by whitespace. For example, you might use this to describe the dates attribute of a performances element, if the dates were given in the form 08-26-2000, like this:

<performances dates="08-21-2001 08-23-2001 08-27-2001">
  Kat and the Kings
</performances>

The appropriate declaration is:

<!ATTLIST performances dates NMTOKENS #REQUIRED>

On the other hand, you could not use this for a list of dates in the form 08/27/2001 because the forward slash is not a legal name character.

Enumeration

An enumeration is the only attribute type that is not an XML keyword. Rather, it is a list of all possible values for the attribute, separated by vertical bars. Each possible value must be an XML name token. For example, the following declarations say that the value of the month attribute of a date element must be one of the 12 English month names, that the value of the day attribute must be a number between 1 and 31, and that the value of the year attribute must be an integer between 1970 and 2009:

<!ATTLIST date month (January | February | March | April | May | June
  | July | August | September | October | November | December) #REQUIRED
>
<!ATTLIST date day (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
  | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
  | 26 | 27 | 28 | 29 | 30 | 31) #REQUIRED
>
<!ATTLIST date year (1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976
  | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986
  | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996
  | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006
  | 2007 | 2008 | 2009 ) #REQUIRED
>
<!ELEMENT date EMPTY>

Given this DTD, this date element is valid:

<date month="January" day="22" year="2001"/>

However, these date elements are invalid:

<date month="01"      day="22" year="2001"/>
<date month="Jan"     day="22" year="2001"/>
<date month="January" day="02" year="2001"/>
<date month="January" day="2"  year="1969"/>
<date month="Janvier" day="22" year="2001"/>

This trick works here because all the desired values happen to be legal XML name tokens. However, we could not use the same trick if the possible values included whitespace or any punctuation besides the underscore, hyphen, colon, and period.

ID

An ID type attribute must contain an XML name (not a name token but a name) that is unique within the XML document. More precisely, no other ID type attribute in the document can have the same value. (Attributes of non-ID type are not considered.) Each element may have no more than one ID type attribute.

As the keyword suggests, ID type attributes assign unique identifiers to elements. ID type attributes do not need to have the name "ID" or "id", although they very commonly do. For example, this ATTLIST declaration says that every employee element must have a social_security_number ID attribute:

<!ATTLIST employee social_security_number ID #REQUIRED>

ID numbers are tricky because a number is not an XML name and therefore not a legal XML ID. The normal solution is to prefix the values with an underscore or a common letter. For example:

<employee social_security_number="_078-05-1120"/>

IDREF

An IDREF type attribute refers to the ID type attribute of some element in the document. Thus, it must be an XML name. IDREF attributes are commonly used to establish relationships between elements when simple containment won't suffice.

For example, imagine an XML document that contains a list of project and employee elements. Every project has a project_id ID type attribute, and every employee has a social_security_number ID type attribute. Furthermore, each project has team_member child elements that identify who's working on the project. Since each project is assigned to multiple employees and some employees are assigned to more than one project, it's not possible to make the employees children of the projects or the projects children of the employees. The solution is to use IDREF type attributes like this:

<project id="p1">
  <goal>Develop Strategic Plan</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss987-65-4320"/>
</project>
<project id="p2">
  <goal>Deploy Linux</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss9876-12-3456"/>
</project>
<employee social_security_number="ss078-05-1120">
  <name>Fred Smith</name>
</employee>
<employee social_security_number="ss987-65-4320">
  <name>Jill Jones</name>
</employee>
<employee social_security_number="ss9876-12-3456">
  <name>Sydney Lee</name>
</employee>

In this example, the id attribute of the project element and the social_security_number attribute of the employee element would be declared to have type ID. The person attribute of the team_member element would have type IDREF. The relevant ATTLIST declarations look like this:

<!ATTLIST employee social_security_number ID    #REQUIRED>
<!ATTLIST project  project_id             ID    #REQUIRED>
<!ATTLIST team_member person              IDREF #REQUIRED>

These declarations constrain the person attribute of the team_member element to match the ID of something in the document. However, they do not constrain the person attribute of the team_member element to match only employee IDs. It would be valid (though not necessarily correct) for a team_member to hold the ID of another project or even the same project.

IDREFS

An IDREFS type attribute contains a whitespace-separated list of XML names, each of which must be the ID of an element in the document. This is used when one element needs to refer to multiple other elements. For instance, the previous project example could be rewritten so that the team_member children of the project element could be replaced by a team attribute like this:

<project project_id="p1" team="ss078-05-1120 ss987-65-4320">
  <goal>Develop Strategic Plan</goal>
</project>
<project project_id="p2" team="ss078-05-1120 ss9876-12-3456">
  <goal>Deploy Linux</goal>
</project>
<employee social_security_number="ss078-05-1120">
  <name>Fred Smith</name>
</employee>
<employee social_security_number="ss987-65-4320" >
  <name>Jill Jones</name>
</employee>
<employee social_security_number="ss9876-12-3456">
  <name>Sydney Lee</name>
</employee>

The appropriate declarations are:

<!ATTLIST employee social_security_number ID     #REQUIRED
                          fsteam            IDREFS #REQUIRED>
<!ATTLIST project  project_id             ID     #REQUIRED>

ENTITY

An ENTITY type attribute contains the name of an unparsed entity declared elsewhere in the DTD. For instance, a movie element might have an entity attribute identifying the MPEG or QuickTime file to play when the movie was activated:

<!ATTLIST movie source ENTITY #REQUIRED>

If the DTD declared an unparsed entity named X-Men-trailer, then this movie element might be used to embed that video file in the XML document:

<movie source="X-Men-trailer"/>

We'll discuss unparsed entities in more detail later in this chapter.

ENTITIES

An ENTITIES type attribute contains the names of one or more unparsed entities declared elsewhere in the DTD, separated by whitespace. For instance, a slide_show element might have an ENTITIES attribute identifying the JPEG files to show and the order in which to show them:

<!ATTLIST slide_show slides ENTITIES #REQUIRED>

If the DTD declared unparsed entities named slide1, slide2, slide3, and so on through slide10, then this slide_show element might be used to embed the show in the XML document:

<slide_show slides="slide1 slide2 slide3 slide4 slide5 slide6
                    slide7 slide8 slide9 slide10"/>

NOTATION

A NOTATION type attribute contains the name of a notation declared in the document's DTD. This is perhaps the rarest attribute type and isn't much used in practice. In theory, it could be used to associate types with particular elements, as well as limiting the types associated with the element. For example, these declarations define four notations for different image types and then specify that each image element must have a type attribute that selects exactly one of them:

<!NOTATION gif  SYSTEM "image/gif">
<!NOTATION tiff SYSTEM "image/tiff">
<!NOTATION jpeg SYSTEM "image/jpeg">
<!NOTATION png  SYSTEM "image/png">
<!ATTLIST  image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>

The type attribute of each image element can have one of the four values gif, tiff, jpeg, or png but not any other value. This has a slight advantage over the enumerated type in that the actual MIME media type of the notation is available, whereas an enumerated type could not specify image/png or image/gif as an allowed value because the forward slash is not a legal character in XML names.

Attribute Defaults

In addition to providing a data type, each ATTLIST declaration includes a default declaration for that attribute. There are four possibilities for this default:

#IMPLIED: The attribute is optional. Each instance of the element may or may not provide a value for the attribute. No default value is provided.
#REQUIRED: The attribute is required. Each instance of the element must provide a value for the attribute. No default value is provided.
#FIXED: The attribute value is constant and immutable. This attribute has the specified value regardless of whether the attribute is explicitly noted on an individual instance of the element. If it is included, though, it must have the specified value.
Literal: The actual default value is given as a quoted string.

For example, this ATTLIST declaration says that person elements can but do not need to have born and died attributes:

<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED
>

This ATTLIST declaration says that every circle element must have center_x, center_y, and radius attributes:

<!ATTLIST circle center_x NMTOKEN #REQUIRED
                 center_y NMTOKEN #REQUIRED
                 radius   NMTOKEN #REQUIRED
>

This ATTLIST declaration says that every biography element has a version attribute and that the value of that attribute is 1.0, even if the start-tag of the element does not explicitly include a version attribute:

<!ATTLIST biography version CDATA #FIXED "1.0">

This ATTLIST declaration says that every web_page element has a protocol attribute. If a particular web_page element doesn't have an explicit protocol attribute, then the parser will supply one with the value http:

<!ATTLIST web_page protocol NMTOKEN "http">