Chapter 3. Anatomy of an HTML Document

Most HTML and XHTML documents are very simple, and writing one shouldn't intimidate even the most timid of computer users. Although you might use a fancy WYSIWYG editor to help you compose it, a document is ultimately stored, distributed, and read by a browser as a simple text file.[*] That's why even the poorest user with a barebones text editor can compose the most elaborate of web pages. (Accomplished webmasters often elicit the admiration of "newbies" by composing astonishingly cool pages using the crudest text editor on a cheap laptop computer, and by performing in odd places, such as on a bus or in the bathroom.) Authors should, however, keep several of the popular browsers on hand, including recent versions of each, and alternate among them to view new documents under construction. Remember: browsers differ in how they display a page, not all browsers implement all of the language standards, and some have their own special extensions.

A document never looks the same in a text editor as it does in a browser. Take a look at any source document on the Web. At the very least, return characters, tabs, and leading spaces, although important for the readability of the source text, are for the most part ignored when the document is displayed by an HTML/XHTML browser. There is also a lot of extra text in a source document, mostly display tags, interactivity markers, and their parameters, which affect portions of the document but don't themselves appear in the display.
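
To make the point concrete, here is a minimal sketch of a source document whose layout differs sharply from its display. The title and text are invented for illustration only; the behavior described is what popular browsers typically do with ordinary paragraph text.

```html
<html>
<head>
<title>Whitespace Demonstration</title>
</head>
<body>
<!-- The line breaks and extra spaces below are for the author's
     benefit only. In ordinary paragraph text, a browser typically
     collapses each run of them into a single space. -->
<p>
    This    sentence is spread
        across several
    oddly indented lines.
</p>
</body>
</html>
```

Viewed in a browser, the paragraph should appear as one continuous run of text, "This sentence is spread across several oddly indented lines.", wrapped only to fit the window. Neither the comment nor any of the tags shows up in the display.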

Accordingly, new authors are confronted with having to develop not only a presentation style for their web pages, but also a different style for their source text. The source document's layout should highlight the programming-like markup aspects of HTML and XHTML, not their display aspects. And it should be readable not only by you, the author, but by others as well.

Experienced document writers typically adopt a programming-like style, albeit very relaxed, for their source text. We do the same throughout this book, and that style will become apparent as you compare our source examples with the actual display of the document by a browser.

Our formatting style is simple, but it serves to create readable, easily maintained documents:

The task of maintaining the indentation of your source file ranges from trivial to onerous. Some text editors, such as Emacs, manage the indentation automatically; others, such as common word processors, couldn't care less about indentation and leave the task completely up to you. If your editor makes your life difficult, you might consider striking a compromise, perhaps by indenting the tags to show structure, but leaving the actual text without indentation to make modifications easier.
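
The compromise might look something like the following sketch, in which the tags are indented to show the nesting of the list while the text itself starts at the left margin. The list contents are hypothetical, and the two-space indent is just one reasonable choice, not a rule.

```html
<html>
<head>
<title>Things to Buy</title>
</head>
<body>
<!-- Tags are indented to show structure; the text is not,
     so it can be edited without re-indenting anything. -->
<ul>
  <li>
Kumquats, preferably fresh, although canned will do
in a pinch.
  </li>
  <li>
Paper towels.
  </li>
</ul>
</body>
</html>
```

Because the browser ignores the leading spaces and line breaks, the display is the same as it would be for a fully indented source, and you can rewrite or rewrap the text of either item without touching the indentation of the tags.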

No matter what compromises or stands you make on source-code style, it's important that you adopt a style and stick with it. You'll be very glad you did when you go back to that document you wrote three months ago searching for that really cool trick you did with...now, where was that?



[*] Informally, both the text and the markup tags are ASCII characters. Technically, unless you specify otherwise, text and tags are made up of 8-bit characters as defined in the standard ISO-8859-1 Latin character set. The HTML/XHTML standards support alternative character encodings, including those for Arabic and Cyrillic text. See Appendix F for details.