Nearly everything else you put into your HTML or XHTML document that isn't a tag is, by definition, content, and the majority of that is text. Like tags, document content is encoded using a specific character set—by default, the ISO-8859-1 Latin character set. This character set is a superset of conventional ASCII, adding the necessary characters to support the Western European languages. If your keyboard does not allow you to directly enter the characters you need, you can use character entities to insert the desired characters.
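To make the character-set point concrete, here is a minimal sketch in Python 3. Python is our choice for illustration and not something the discussion above depends on. It assumes only the built-in `str.encode` method and shows that plain ASCII text is unchanged in ISO-8859-1, while a character outside that set (the euro sign, in this made-up sample string) has to fall back to a numeric entity.

```python
# Sketch: ISO-8859-1 (Latin-1) extends ASCII; a character outside it
# can still be written as a numeric character entity.
text = "café costs 3 €"  # hypothetical sample text

# Plain ASCII characters have the same byte values in Latin-1.
ascii_matches = "cafe".encode("ascii") == "cafe".encode("latin-1")

# "é" exists in Latin-1 (byte 0xE9). The euro sign does not, so the
# "xmlcharrefreplace" error handler writes it as a decimal entity.
encoded = text.encode("latin-1", errors="xmlcharrefreplace")

print(ascii_matches)
print(encoded)
```

The accented letter survives as a single byte, and only the character the set cannot represent becomes an ampersand sequence.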
Perhaps the hardest rule to remember when marking up an HTML or XHTML document is that all the tags you insert regarding text display and formatting are only advice for the browser: they do not explicitly control how the browser will display the document. In fact, the browser can choose to ignore all of your tags and do what it pleases with the document content. What's worse, the user (of all people!) has control over the text-display characteristics of her own browser.
Get used to this lack of control. The best way to use markup to control the appearance of your documents is to concentrate on the content of the document, not on its final appearance. If you find yourself worrying excessively about spacing, alignment, text breaks, and character positioning, you'll surely end up with ulcers. You will have gone beyond the intent of HTML. If you focus on delivering information to users in an attractive manner, using the tags to advise the browser as to how best to display that information, you are using HTML or XHTML effectively, and your documents will render well on a wide range of browsers.
Besides common text, HTML and XHTML give you a way to display special text characters that you normally might not be able to include in your source document or that have other purposes. A good example is the less-than or opening bracket symbol (<). In HTML, it normally signifies the start of a tag, so if you insert it simply as part of your text, the browser will get confused and probably misinterpret your document.
For both HTML and XHTML, the ampersand character (&) instructs the browser to use a special character, formally known as a character entity. For example, the sequence &lt; inserts that pesky less-than symbol into the rendered text, and the browser does not mistake it for the start of a tag. Similarly, &gt; inserts the greater-than symbol, and &amp; inserts an ampersand. There can be no spaces between the ampersand, the entity name, and the required trailing semicolon. (Semicolons aren't special characters; you don't need to use an ampersand sequence to display a semicolon normally.) [Handling Special Characters, 16.3.7]
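The following sketch shows the three named entities at work. It assumes Python 3 and its standard `html` module, which is our choice of tool and only one of many ways to do this. The sample string is invented for the example.

```python
import html

# Hypothetical text containing all three characters that need entities.
raw = "if a < b && b > c"

# html.escape() rewrites &, <, and > as named entities.
# quote=False leaves quotation marks alone, since we only care about text.
escaped = html.escape(raw, quote=False)
print(escaped)

# html.unescape() goes the other way, much as a browser does when rendering.
restored = html.unescape(escaped)
print(restored)

# A space after the ampersand breaks the entity, so nothing is decoded.
broken = html.unescape("& lt;")
print(broken)
```

Note that the ampersand itself is escaped too. Otherwise a literal ampersand in the text could be mistaken for the start of an entity.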
You also may replace the entity name after the ampersand with a pound symbol (#) and a decimal value corresponding to the entity's position in the character set. Hence, the sequence &#60; does the same thing as &lt; and represents the less-than symbol. In fact, you could substitute all the normal content characters within an HTML document with ampersand special characters, such as &#65; for the capital letter A or &#97; for its lowercase version, but that would be silly. You can find a complete listing of all characters and their names and numerical equivalents in Appendix F.
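As a quick check of the numeric form, the sketch below builds decimal entities from character positions. It assumes Python 3 and its standard `html` module. The helper `to_decimal_entity` is a name we made up for this example.

```python
import html

def to_decimal_entity(ch: str) -> str:
    """Return the decimal ampersand sequence for a single character."""
    return f"&#{ord(ch)};"

less_than = to_decimal_entity("<")
capital_a = to_decimal_entity("A")
small_a = to_decimal_entity("a")
print(less_than, capital_a, small_a)

# The numeric and named forms decode to the same character.
same = html.unescape("&#60;") == html.unescape("&lt;") == "<"
print(same)

# The "silly" approach: every character of a word as an entity.
silly = "".join(to_decimal_entity(c) for c in "Hi")
print(silly, "->", html.unescape(silly))
```

The last two lines show why nobody writes ordinary text this way: the encoded form is several times longer and decodes to exactly what you started with.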
Keep in mind that not all special characters can be rendered by all browsers. Some browsers just ignore many of the special characters; with others, the characters aren't available in the character sets on a specific platform. Be sure to test your documents on a range of browsers before electing to use some of the more obscure character entities.
Comments are another type of textual content that appears in the source HTML document but is not rendered by the user's browser. Comments fall between the special <!-- and --> markup elements. Browsers ignore the text between the comment character sequences. Here are some sample comments:
<!-- This is a comment -->
<!-- This is a
multiple-line comment
that ends on this line -->
You can put nearly anything you'd like inside a comment. The biggest exception to this rule is that the HTML standard doesn't let you nest comments.[*]
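To see how a parser separates comments from visible text, and what goes wrong with nesting, consider this sketch. It assumes Python 3 and the standard `html.parser.HTMLParser` class, which behaves like a browser here but is not one. The class `CommentSplitter` and the function `split_comments` are names invented for the example.

```python
from html.parser import HTMLParser

class CommentSplitter(HTMLParser):
    """Collects comment bodies and visible text separately."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        # Called with everything between <!-- and -->.
        self.comments.append(data)

    def handle_data(self, data):
        # Called with ordinary text, which a browser would display.
        self.text.append(data)

def split_comments(source):
    parser = CommentSplitter()
    parser.feed(source)
    parser.close()
    return parser.comments, "".join(parser.text)

# A well-formed comment disappears from the visible text.
simple_comments, simple_text = split_comments("Shown<!-- hidden --> too")
print(simple_comments, repr(simple_text))

# An attempted nested comment ends at the FIRST closing sequence,
# so the tail of the "outer" comment leaks into the visible text.
nested_comments, nested_text = split_comments(
    "<!-- outer <!-- inner --> tail -->"
)
print(nested_comments, repr(nested_text))
```

In the second case the parser treats the first closing sequence as the end of the comment, so the leftover text would be displayed to the reader.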
Internet Explorer also lets you place comments within a special, nonstandard <comment> tag. Everything between the <comment> and </comment> tags is ignored by Internet Explorer. All other browsers display the comment to the user. Obviously, because of this undesirable behavior, we do not recommend using the <comment> tag. Instead, always use the <!-- and --> sequences to delimit comments.
Besides the obvious use of comments for source documentation, many web servers use comments to take advantage of features specific to the document server software. These servers scan the document for specific character sequences within conventional HTML/XHTML comments and then perform some action based upon the commands embedded in the comments. The action might be as simple as including text from another file (known as a server-side include) or as complex as executing other commands on the server to generate the document contents dynamically.
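The idea of a server acting on commands inside comments can be sketched as a toy include processor. This is an illustration only, written in Python 3 with the standard `re` module. It is not how any particular server is implemented. The directive form `<!--#include file="..." -->` follows the style used by Apache's server-side includes. The `FILES` dictionary, the file name, and the function `expand_includes` are hypothetical stand-ins for real files on disk.

```python
import re

# Hypothetical in-memory "files"; a real server would read from disk.
FILES = {"footer.html": "<p>Copyright notice</p>"}

# Matches a comment of the form <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')

def expand_includes(source, files):
    """Replace each include comment with the named file's contents."""
    def replace(match):
        name = match.group(1)
        # Unknown names leave the original comment untouched.
        return files.get(name, match.group(0))
    return INCLUDE.sub(replace, source)

page = (
    "<body>Hello"
    '<!--#include file="footer.html" -->'
    "<!-- plain comment -->"
    "</body>"
)
expanded = expand_includes(page, FILES)
print(expanded)
```

Ordinary comments pass through unchanged, which is why this technique is safe: a server that does not understand the directive simply sends it along, and the browser ignores it like any other comment.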
[*] Early versions of Netscape did let you nest comments, but no longer. The practice is tricky, so just say no.