Attribute Declarations

In addition to declaring its elements, a valid document must declare all the elements' attributes. This is done with ATTLIST declarations. A single ATTLIST can declare multiple attributes for a single element type. However, if the same attribute is repeated on multiple elements, then it must be declared separately for each element where it appears. (Later in this chapter you'll see how to use parameter entity references to make this repetition less burdensome.)

For example, ATTLIST declares the source attribute of the image element:

<!ATTLIST image source CDATA #REQUIRED>

It says that the image element has an attribute named source. The value of the source attribute is character data, and instances of the image element in the document are required to provide a value for the source attribute.

A single ATTLIST declaration can declare multiple attributes for the same element. For example, this ATTLIST declaration not only declares the source attribute of the image element, but also the width, height, and alt attributes:

<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED
>

This declaration says the source, width, and height attributes are required. However, the alt attribute is optional and may be omitted from particular image elements. All four attributes are declared to contain character data, the most generic attribute type.

This declaration has the same effect and meaning as four separate ATTLIST declarations, one for each attribute. Whether to use one ATTLIST declaration per attribute is a matter of personal preference, but most experienced DTD designers prefer the multiple-attribute form. Given judicious application of whitespace, it's no less legible than the alternative.

In merely well-formed XML, attribute values can be any string of text. The only restrictions are that any occurrences of < or & must be escaped as &lt; and &amp;, and whichever kind of quotation mark, single or double, is used to delimit the value must also be escaped. However, a DTD allows you to make somewhat stronger statements about the content of an attribute value. Indeed, these are stronger statements than can be made about the contents of an element. For instance, you can say that an attribute value must be unique within the document, that it must be a legal XML name token, or that it must be chosen from a fixed list of values.

There are 10 attribute types in XML. They are:

These are the only attribute types allowed. A DTD cannot say that an attribute value must be an integer or a date between 1966 and 2004, for example.

An enumeration is the only attribute type that is not an XML keyword. Rather, it is a list of all possible values for the attribute, separated by vertical bars. Each possible value must be an XML name token. For example, the following declarations say that the value of the month attribute of a date element must be one of the 12 English month names, that the value of the day attribute must be a number between 1 and 31, and that the value of the year attribute must be an integer between 1970 and 2009:

<!ATTLIST date month (January | February | March | April | May | June
  | July | August | September | October | November | December) #REQUIRED
>
<!ATTLIST date day (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
  | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
  | 26 | 27 | 28 | 29 | 30 | 31) #REQUIRED
>
<!ATTLIST date year (1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976
  | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986
  | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996
  | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006
  | 2007 | 2008 | 2009 ) #REQUIRED
>
<!ELEMENT date EMPTY>

Given this DTD, this date element is valid:

<date month="January" day="22" year="2001"/>

However, these date elements are invalid:

<date month="01"      day="22" year="2001"/>
<date month="Jan"     day="22" year="2001"/>
<date month="January" day="02" year="2001"/>
<date month="January" day="2"  year="1969"/>
<date month="Janvier" day="22" year="2001"/>

This trick works here because all the desired values happen to be legal XML name tokens. However, we could not use the same trick if the possible values included whitespace or any punctuation besides the underscore, hyphen, colon, and period.

An IDREF type attribute refers to the ID type attribute of some element in the document. Thus, it must be an XML name. IDREF attributes are commonly used to establish relationships between elements when simple containment won't suffice.

For example, imagine an XML document that contains a list of project and employee elements. Every project has a project_id ID type attribute, and every employee has a social_security_number ID type attribute. Furthermore, each project has team_member child elements that identify who's working on the project. Since each project is assigned to multiple employees and some employees are assigned to more than one project, it's not possible to make the employees children of the projects or the projects children of the employees. The solution is to use IDREF type attributes like this:

<project id="p1">
  <goal>Develop Strategic Plan</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss987-65-4320"/>
</project>
<project id="p2">
  <goal>Deploy Linux</goal>
  <team_member person="ss078-05-1120"/>
  <team_member person="ss9876-12-3456"/>
</project>
<employee social_security_number="ss078-05-1120">
  <name>Fred Smith</name>
</employee>
<employee social_security_number="ss987-65-4320">
  <name>Jill Jones</name>
</employee>
<employee social_security_number="ss9876-12-3456">
  <name>Sydney Lee</name>
</employee>

In this example, the id attribute of the project element and the social_security_number attribute of the employee element would be declared to have type ID. The person attribute of the team_member element would have type IDREF. The relevant ATTLIST declarations look like this:

<!ATTLIST employee social_security_number ID    #REQUIRED>
<!ATTLIST project  project_id             ID    #REQUIRED>
<!ATTLIST team_member person              IDREF #REQUIRED>

These declarations constrain the person attribute of the team_member element to match the ID of something in the document. However, they do not constrain the person attribute of the team_member element to match only employee IDs. It would be valid (though not necessarily correct) for a team_member to hold the ID of another project or even the same project.

In addition to providing a data type, each ATTLIST declaration includes a default declaration for that attribute. There are four possibilities for this default:

For example, this ATTLIST declaration says that person elements can but do not need to have born and died attributes:

<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED
>

This ATTLIST declaration says that every circle element must have center_x, center_y, and radius attributes:

<!ATTLIST circle center_x NMTOKEN #REQUIRED
                 center_y NMTOKEN #REQUIRED
                 radius   NMTOKEN #REQUIRED
>

This ATTLIST declaration says that every biography element has a version attribute and that the value of that attribute is 1.0, even if the start-tag of the element does not explicitly include a version attribute:

<!ATTLIST biography version CDATA #FIXED "1.0">

This ATTLIST declaration says that every web_page element has a protocol attribute. If a particular web_page element doesn't have an explicit protocol attribute, then the parser will supply one with the value http:

<!ATTLIST web_page protocol NMTOKEN "http">