Every HTML document should conform to the HTML SGML DTD, the formal Document Type Definition that defines the HTML standard. The DTD defines the tags and syntax that are used to create an HTML document. You can inform the browser which DTD your document complies with by placing a special Standard Generalized Markup Language (SGML) command in the first line of the document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
This cryptic message indicates that your document is intended to be compliant with the HTML 4.01 final DTD defined by the World Wide Web Consortium (W3C). Other versions of the DTD define more restricted versions of the HTML standard, and not all browsers support all versions of the HTML DTD. In fact, specifying any other <!DOCTYPE> may cause the browser to misinterpret your document when displaying it for the user. It's also unclear what <!DOCTYPE> to use if you include nonstandard, albeit popular, extensions in the HTML document, even for the deprecated HTML 3.0 standard, for which a DTD was never released.
HTML developers are increasingly including an appropriate SGML DOCTYPE command as a prefix in their HTML documents. Because of the confusion of versions and standards, if you do choose to include a DOCTYPE in your HTML document, choose the appropriate one to ensure that your document is rendered correctly.
For XHTML authors, we do strongly recommend that you include the proper DOCTYPE statement in your XHTML documents, in conformance with XML standards. Read Chapters 15 and 16 for more about DTDs and the XML and XHTML standards.
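As an illustration, an XHTML document might open as in the following sketch. It assumes the XHTML 1.0 Strict DTD; the Transitional and Frameset DTDs use different public and system identifiers, and the comment stands in for content of your own.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- document head and body go here -->
</html>
```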
As we saw earlier, the <html> and </html> tags serve to delimit the beginning and end of a document. Since the typical browser can easily infer from the enclosed source that it is an HTML or XHTML document, you don't really need to include the tag in your source HTML document.
That said, it's considered good form to include this tag so that other tools, particularly more mundane text-processing ones, can recognize your document as an HTML document. At the very least, the presence of the beginning and end <html> tags lets you confirm that neither the beginning nor the end of the document has inadvertently been deleted.
Besides, XHTML requires the <html> and </html> tags.
Between <html> and </html> are the document's head and body. Within the head, you'll find tags that identify the document and define its place within a document collection. Within the body is the actual document content, defined by tags that determine the layout and appearance of the document text. As you might expect, the document head is contained within <head> and </head> tags and the body is within <body> and </body> tags, all of which we define in more detail later in this chapter.[*]
By far, the most common form of the <html> tag is simply:
<html>
document head and body content
</html>
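Filled in, that skeleton might look like the following sketch. The title and paragraph text are placeholders of our own, not content from any particular document.

```html
<html>
<head>
<title>A Minimal Document</title>
</head>
<body>
<p>The head above identifies the document; this paragraph is part of its body.</p>
</body>
</html>
```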
The dir attribute specifies in which direction the browser should render text within the containing element. When used within the <html> tag, it determines how text will be presented within the entire document. When used within another tag, it controls the text's direction for just the content of that tag.
By default, the value of this attribute is ltr, indicating that text is presented to the user left to right. Use the other value, rtl, to display text right to left, for languages like Arabic and Hebrew. Of course, the results depend on your content and the browser's support of HTML 4 or XHTML. Netscape and Internet Explorer versions 4 and earlier ignore the dir attribute. The HTML 4-compliant Internet Explorer versions 5 and 6 simply right-justify (dir=rtl) the text, although if you look closely at Figure 3-2, you'll notice that the browser moves the punctuation (the period) to the other side of the sentence. Netscape 6 right-justified everything, including the ending period, but versions 7 and 8 did not (yet another sign that the browser wars are over):
<html dir=rtl>
<head>
<title>Display Directions</title>
</head>
<body>
This is how IE 6 renders right-to-left directed text.
</body>
</html>
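To limit the direction change to a single element rather than the whole document, the attribute can be placed on an inner tag instead. The following is a minimal sketch with placeholder text of our own; as noted above, how a given browser actually renders it depends on its support for dir.

```html
<html>
<head>
<title>Mixed Display Directions</title>
</head>
<body>
<p>This paragraph inherits the default left-to-right direction.</p>
<p dir="rtl">This paragraph alone is directed right to left.</p>
</body>
</html>
```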
When included within the <html> tag, the lang attribute specifies the language you've generally used within the document. When used within other tags, the lang attribute specifies the language you used within that tag's content. Ideally, browsers eventually will use lang to better render the text for the user.
Set the value of the lang attribute to an ISO 639 standard two-character language code. You may also indicate a dialect by following the International Organization for Standardization (ISO) language code with a hyphen and a subcode name. For example, "en" is the ISO language code for English; "en-US" is the complete code for U.S. English. Other common language codes include "fr" (French), "de" (German), "it" (Italian), "nl" (Dutch), "el" (Greek), "es" (Spanish), "pt" (Portuguese), "ar" (Arabic), "he" (Hebrew), "ru" (Russian), "zh" (Chinese), "ja" (Japanese), and "hi" (Hindi).
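As a brief sketch of both uses, the document below declares U.S. English as its overall language and marks one paragraph as French. The title and paragraph text are placeholders of our own.

```html
<html lang="en-US">
<head>
<title>Language Codes</title>
</head>
<body>
<p>This paragraph is in U.S. English, the document's declared language.</p>
<p lang="fr">Ce paragraphe est en français.</p>
</body>
</html>
```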
Use the version attribute to define the HTML standard version that you followed when composing the document. Its value, for HTML version 4.01, should read exactly:
version="-//W3C//DTD HTML 4.01//EN"
In general, version information within the <html> tag is more trouble than it is worth, and this attribute has been deprecated in HTML 4. Serious authors should instead use an SGML <!DOCTYPE> tag at the beginning of their documents, like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
[*] For the special HTML/XHTML frame document, a <frameset> tag replaces the <body> tag; more about this in Chapter 11.