Some of the best techniques for DTD design only become apparent when you look at larger documents. In this section, we'll develop DTDs that cover the two different document formats for describing people that were presented in Example 2-4 and Example 2-5 of the last chapter.
DTDs for record-like documents are very straightforward. They make heavy use of sequences, occasional use of choices, and almost no use of mixed content. Example 3-6 shows such a DTD. Since this is a small example, and since it's easier to understand when both the document and the DTD are on the same page, we've made this an internal DTD included in the document. However, it would be easy to extract it and store it in a separate file.
Example 3-6. A DTD describing people
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name+, profession*)>

  <!ELEMENT name EMPTY>
  <!ATTLIST name first CDATA #REQUIRED
                 last  CDATA #REQUIRED>
  <!-- The first and last attributes are required to be present
       but they may be empty. For example,
       <name first="Cher" last=""> -->

  <!ELEMENT profession EMPTY>
  <!ATTLIST profession value CDATA #REQUIRED>
]>
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>
The DTD here is contained completely inside the internal DTD subset. First, a person ELEMENT declaration states that each person must have one or more name children, and zero or more profession children, in that order. This allows for the possibility that a person changes his name or uses aliases. It assumes that each person has at least one name but may not have a profession.
This declaration also requires that all name elements precede all profession elements. Here the DTD is less flexible than it ideally would be. There's no particular reason that the names have to come first. However, if we were to allow more random ordering, it would be hard to say that there must be at least one name. One of the weaknesses of DTDs is that they occasionally force extra sequence order on you when all you really need is a constraint on the number of some element.
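To make the trade-off concrete, here is one possible alternative declaration that allows names and professions in any order while still requiring at least one name. It is only a sketch, not the declaration used in Example 3-6, and it shows why the looser constraint is harder to express and to read:

```xml
<!-- Alternative to (name+, profession*): any order, at least one name.
     Read as: optional leading professions, then the first name,
     then any mix of further names and professions. -->
<!ELEMENT person (profession*, name, (name | profession)*)>
```

With this content model, a person element whose children are a profession, then a name, then another profession would be valid, whereas the declaration in Example 3-6 would reject it.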
Both name and profession elements are empty, so their declarations are very simple. The attribute declarations are a little more complex. In all three cases, the form of the attribute value is open, so all three attributes are declared to have type CDATA. All three are also required. However, note the use of comments to suggest a solution for edge cases such as celebrities with no last names. Comments are an essential tool for making sense of otherwise obfuscated DTDs.
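As mentioned earlier, the internal DTD subset of Example 3-6 could be moved into a separate file. The following sketch assumes the declarations are saved as person.dtd in the same directory as the document; the filename is an illustrative choice, not one given in the original example.

```xml
<!-- person.dtd (hypothetical filename): the declarations from Example 3-6 -->
<!ELEMENT person (name+, profession*)>

<!ELEMENT name EMPTY>
<!ATTLIST name first CDATA #REQUIRED
               last  CDATA #REQUIRED>

<!ELEMENT profession EMPTY>
<!ATTLIST profession value CDATA #REQUIRED>
```

The document then points at that file with a system identifier instead of carrying the declarations itself:

```xml
<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>
```

Keeping the DTD external lets many person documents share one set of declarations.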
Narrative-oriented DTDs tend to be a lot looser and make much heavier use of mixed content than do DTDs that describe more database-like documents. Consequently, they tend to be written from the bottom up, starting with the smallest elements and building up to the largest. They also tend to use parameter entities to group together similar content specifications and attribute lists.
Example 3-7 is a standalone DTD for biographies like the one shown in Example 2-5 of the last chapter. Notice that not everything it declares is actually present in Example 2-5. That's often the case with narrative documents. For instance, not all web pages contain unordered lists, but the XHTML DTD still needs to declare the ul element for those XHTML documents that do include them. Also, notice that a few attributes present in Example 2-5 have been made into fixed defaults here.
Example 3-7. A narrative-oriented DTD for biographies
<!ATTLIST biography xmlns:xlink CDATA #FIXED
                                "http://www.w3.org/1999/xlink">

<!ELEMENT person (first_name, last_name)>
<!-- Birth and death dates are given in the form yyyy/mm/dd -->
<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED>

<!ELEMENT date   (month, day, year)>
<!ELEMENT month  (#PCDATA)>
<!ELEMENT day    (#PCDATA)>
<!ELEMENT year   (#PCDATA)>

<!-- xlink:href must contain a URL.-->
<!ATTLIST emphasize xlink:type (simple) #IMPLIED
                    xlink:href CDATA    #IMPLIED>

<!ELEMENT profession (#PCDATA)>
<!ELEMENT footnote   (#PCDATA)>

<!-- The source is given according to the Chicago Manual of Style
     citation conventions -->
<!ATTLIST footnote source CDATA #REQUIRED>

<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>

<!ELEMENT image EMPTY>
<!ATTLIST image source CDATA   #REQUIRED
                width  NMTOKEN #REQUIRED
                height NMTOKEN #REQUIRED
                ALT    CDATA   #IMPLIED
>
<!ENTITY % top_level "( #PCDATA | image | paragraph | definition
                      | person | profession | emphasize | last_name
                      | first_name | footnote | date )*">

<!ELEMENT paragraph  %top_level; >
<!ELEMENT definition %top_level; >
<!ELEMENT emphasize  %top_level; >
<!ELEMENT biography  %top_level; >
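For orientation, here is a small document that should be valid against Example 3-7. It is an illustrative sketch rather than a copy of Example 2-5: the filename biography.dtd and the example.com URL are assumptions made for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE biography SYSTEM "biography.dtd">
<!-- No xmlns:xlink attribute is written on the root element;
     the #FIXED default in the DTD supplies it. -->
<biography>
  <paragraph>
    <person born="1912/06/23" died="1954/06/07">
      <first_name>Alan</first_name>
      <last_name>Turing</last_name>
    </person>
    was a <profession>mathematician</profession> and
    <emphasize xlink:type="simple"
               xlink:href="http://www.example.com/turing">codebreaker</emphasize>.
  </paragraph>
</biography>
```

The fixed default only takes effect when the parser actually reads the DTD. A parser that skips the external DTD would see an undeclared xlink prefix, so documents meant for such parsers should declare the namespace explicitly on the root element.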
The root biography element has a classic mixed-content declaration. Since there are several elements that can contain other elements in a fairly unpredictable fashion, we group all the possible top-level elements (elements that appear as immediate children of the root element) in a single top_level parameter entity. Then we can make all of them potential children of each other in a straightforward way. This also makes it much easier to add new elements in the future. That's important since this one small example is almost certainly not broad enough to cover all possible biographies.
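As a sketch of how such an extension might look, suppose a later version of the DTD needs an element for quotations. The quote element here is hypothetical and not part of Example 3-7. Only the entity definition changes, plus one new declaration:

```xml
<!-- "quote" is a hypothetical addition used only for illustration -->
<!ENTITY % top_level "( #PCDATA | image | paragraph | definition
                      | person | profession | emphasize | last_name
                      | first_name | footnote | date | quote )*">

<!ELEMENT quote %top_level; >
```

The existing declarations for paragraph, definition, emphasize, and biography all refer to %top_level;, so they accept quote children without being edited. The entity must still be declared before the first declaration that references it.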