Two DTD Examples

Some of the best techniques for DTD design only become apparent when you look at larger documents. In this section, we'll develop DTDs that cover the two different document formats for describing people that were presented in Example 2-4 and Example 2-5 of the last chapter.

DTDs for Record-Like Documents

DTDs for record-like documents are very straightforward. They make heavy use of sequences, occasional use of choices, and almost no use of mixed content. Example 3-6 shows such a DTD. Since this is a small example, and since it's easier to understand when both the document and the DTD are on the same page, we've made this an internal DTD included in the document. However, it would be easy to extract it and store it in a separate file.

Example 3-6. A DTD describing people

<?xml version="1.0"?>
<!DOCTYPE person  [
  <!ELEMENT person (name+, profession*)>
  <!ELEMENT name EMPTY>
  <!ATTLIST name first CDATA #REQUIRED
                 last  CDATA #REQUIRED>
  <!-- The first and last attributes are required to be present
       but they may be empty. For example,
       <name first="Cher" last=""/> -->
  <!ELEMENT profession EMPTY>
  <!ATTLIST profession value CDATA #REQUIRED>
]>
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>

The DTD here is contained completely inside the internal DTD subset. First a person ELEMENT declaration states that each person must have one or more name children, and zero or more profession children, in that order. This allows for the possibility that a person changes his name or uses aliases. It assumes that each person has at least one name but may not have a profession.

This declaration also requires that all name elements precede all profession elements. Here the DTD is less flexible than it ideally would be. There's no particular reason that the names have to come first. However, if we were to allow a more arbitrary ordering, it would be hard to say that there must be at least one name. One of the weaknesses of DTDs is that they occasionally force extra sequence order on you when all you really need is a constraint on the number of some element.
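To make the trade-off concrete, here is a rough sketch of two alternative content models for person. Neither is part of Example 3-6, and only one of them could appear in a given DTD, since an element may be declared only once. The first allows any order but no longer guarantees a name. The second keeps the guarantee while allowing any order, at the cost of a model that is noticeably harder to read.

```xml
<!-- Alternative 1: names and professions in any order,
     but a person with no name at all is now valid -->
<!ELEMENT person (name | profession)*>

<!-- Alternative 2: any order with at least one name. Optional
     professions come first, then one mandatory name, then any
     further mix of names and professions -->
<!ELEMENT person (profession*, name, (name | profession)*)>
```

With only two kinds of children the second form is still manageable. Each additional element type that needed its own minimum count would make the model considerably longer, which is why the simple sequence in Example 3-6 is often the more practical choice.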

Both name and profession elements are empty so their declarations are very simple. The attribute declarations are a little more complex. In all three cases, the form of the attribute is open, so all three attributes are declared to have type CDATA. All three are also required. However, note the use of comments to suggest a solution for edge cases such as celebrities with no last names. Comments are an essential tool for making sense of otherwise obfuscated DTDs.
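As a sketch of how the internal DTD subset of Example 3-6 might be extracted into a separate file, the declarations could be saved on their own and referenced from the document type declaration with a system identifier. The file name person.dtd is an assumption chosen for illustration.

```xml
<!-- person.dtd (hypothetical file name) -->
<!ELEMENT person (name+, profession*)>
<!ELEMENT name EMPTY>
<!ATTLIST name first CDATA #REQUIRED
               last  CDATA #REQUIRED>
<!ELEMENT profession EMPTY>
<!ATTLIST profession value CDATA #REQUIRED>
```

The document then points at that file instead of carrying the declarations itself:

```xml
<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
</person>
```

The relative URL here assumes the DTD sits in the same directory as the document.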

DTDs for Narrative Documents

Narrative-oriented DTDs tend to be a lot looser and make much heavier use of mixed content than do DTDs that describe more database-like documents. Consequently, they tend to be written from the bottom up, starting with the smallest elements and building up to the largest. They also tend to use parameter entities to group together similar content specifications and attribute lists.
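As a minimal illustration of grouping attribute declarations, a parameter entity can hold a set of attribute definitions that several elements then share. The entity name simple_link and the citation element are hypothetical and are used here only for illustration. Because the entity references appear inside declarations, this sketch assumes an external DTD rather than an internal DTD subset.

```xml
<!-- simple_link is a hypothetical entity name -->
<!ENTITY % simple_link "xlink:type (simple) #IMPLIED
                        xlink:href CDATA    #IMPLIED">

<!-- Both elements receive the same two attributes -->
<!ATTLIST emphasize %simple_link;>

<!-- citation is a hypothetical element used only for this sketch -->
<!ELEMENT citation (#PCDATA)>
<!ATTLIST citation  %simple_link;>
```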

Example 3-7 is a standalone DTD for biographies like the one shown in Example 2-5 of the last chapter. Notice that not everything it declares is actually present in Example 2-5. That's often the case with narrative documents. For instance, not all web pages contain unordered lists, but the XHTML DTD still needs to declare the ul element for those XHTML documents that do include them. Also, notice that a few attributes present in Example 2-5 have been made into fixed defaults here.

Example 3-7. A narrative-oriented DTD for biographies

<!ATTLIST biography xmlns:xlink CDATA #FIXED
                                       "http://www.w3.org/1999/xlink">
     
<!ELEMENT person (first_name, last_name)>
<!-- Birth and death dates are given in the form yyyy/mm/dd -->
<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED>
     
<!ELEMENT date   (month, day, year)>
<!ELEMENT month  (#PCDATA)>
<!ELEMENT day    (#PCDATA)>
<!ELEMENT year   (#PCDATA)>
     
<!-- xlink:href must contain a URL.-->
<!ATTLIST emphasize xlink:type (simple) #IMPLIED
                    xlink:href CDATA   #IMPLIED>
     
<!ELEMENT profession (#PCDATA)>
<!ELEMENT footnote   (#PCDATA)>
     
<!-- The source is given according to the Chicago Manual of Style
     citation conventions -->
<!ATTLIST footnote source CDATA #REQUIRED>
     
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>
     
<!ELEMENT image EMPTY>
<!ATTLIST image source CDATA   #REQUIRED
                width  NMTOKEN #REQUIRED
                height NMTOKEN #REQUIRED
                ALT    CDATA   #IMPLIED
>
<!ENTITY % top_level "( #PCDATA | image | paragraph | definition 
                      | person | profession | emphasize | last_name
                      | first_name | footnote | date )*">
     
<!ELEMENT paragraph  %top_level; >
<!ELEMENT definition %top_level; >
<!ELEMENT emphasize  %top_level; >
<!ELEMENT biography  %top_level; >

The root biography element has a classic mixed-content declaration. Since several elements can contain other elements in a fairly unpredictable fashion, we group all the possible top-level elements (elements that appear as immediate children of the root element) in a single top_level parameter entity. Referencing that entity in each declaration then makes all of them potential children of each other in a straightforward way. This also makes it much easier to add new elements in the future. That's important since this one small example is almost certainly not broad enough to cover all possible biographies.
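To show what adding an element might look like, suppose a later biography needed a quotation element. That element is hypothetical and is not part of Example 3-7. Under that assumption, the change would amount to one new name in the top_level entity plus one new declaration:

```xml
<!-- quotation (hypothetical) is appended to the existing choice -->
<!ENTITY % top_level "( #PCDATA | image | paragraph | definition
                      | person | profession | emphasize | last_name
                      | first_name | footnote | date | quotation )*">

<!-- The new element reuses the same mixed-content model -->
<!ELEMENT quotation %top_level; >
```

Every element declared with %top_level; picks up quotation as a possible child automatically, so none of the other declarations has to be touched.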