More and Better Web Sites: site.simple

We are now in a position to start creating real(ish) web sites, which can be found in the sample code at the web site for the book, http://oreilly.com/catalog/apache3/. For the sake of a little extra realism, we will base the site loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don’t yet want to venture into the outside world, they should be variants on your own network ID. This way, all the machines in the network realize that they don’t have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Windows 95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:

127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com

localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.

You probably need to consult your network manager to make similar arrangements.

site.simple is site.toddle with a few small changes. The script go will work anywhere. To get started, do the following, depending on your operating environment:

test -d logs || mkdir logs
httpd -d 'pwd' -f 'pwd'/conf/httpd.conf

Open an MS-DOS window and from the command line, type:

c>cd \program files\apache group\apache
c>apache -k start
c>Apache/1.3.26 (Win32) running ...

To stop Apache, open a second MS-DOS window:

c>apache -k stop 
c>cd logs 
c>edit error.log

This will be true of each site in the demonstration setup, so we will not mention it again.

From here on, there will be minimal differences between the server setups necessary for Win32 and those for Unix. Unless one or the other is specifically mentioned, you should assume that the text refers to both.

It would be nice to have a log of what goes on. In the first edition of this book, we found that a file access_log was created automatically in ...site.simple/logs. In a rather bizarre move since then, the Apache Group has broken backward compatibility and now requires you to mention the log file explicitly in the Config file using the TransferLog directive.

The ... /conf/httpd.conf file now contains the following:

User webuser
Group webgroup

ServerName www.butterthlies.com

DocumentRoot /usr/www/APACHE3/APACHE3/site.simple/htdocs

TransferLog logs/access_log

In ... /htdocs we have, as before, 1.txt :

hullo world from site.simple again!

Type ./go on the server. Become the client, and retrieve http://www.butterthlies.com. You should see:

Index of /
. Parent Directory
. 1.txt

Click on 1.txt for an inspirational message as before.

This all seems satisfactory, but there is a hidden mystery. We get the same result if we connect to http://sales.butterthlies.com. Why is this? Why, since we have not mentioned either of these URLs or their IP addresses in the configuration file on site.simple, do we get any response at all?

The answer is that when we configured the machine on which the server runs, we told the network interface to respond to anyof these IP addresses:

192.168.123.2
192.168.123.3

By default Apache listens to all IP addresses belonging to the machine and responds in the same way to all of them. If there are virtual hosts configured (which there aren’t, in this case), Apache runs through them, looking for an IP name that corresponds to the incoming connection. Apache uses that configuration if it is found, or the main configuration if it is not. Later in this chapter, we look at more definite control with the directives BindAddress, Listen, and <VirtualHost>.

It has to be said that working like this (that is, switching rapidly between different configurations) seemed to get Netscape or Internet Explorer into a rare muddle. To be sure that the server was functioning properly while using Netscape as a browser, it was usually necessary to reload the file under examination by holding down the Control key while clicking on Reload. In extreme cases, it was necessary to disable caching by going to Edit Preferences Advanced Cache. Set memory and disk cache to 0, and set cache comparison to Every Time. In Internet Explorer, set Cache Compares to Every Time. If you don’t, the browser tends to display a jumble of several different responses from the server. This occurs because we are doing what no user or administrator would normally do, namely, flipping around between different versions of the same site with different versions of the same file. Whenever we flip from a newer version to an older version, Netscape is led to believe that its cached version is up-to-date.

Back on the server, stop Apache with ^C, and look at the log files. In ... /logs/access_log, you should see something like this:

192.168.123.1--- [<date-time>] "GET / HTTP/1.1" 200 177

200 is the response code (meaning “OK, cool, fine”), and 177 is the number of bytes transferred. In ... /logs/error_log, there should be nothing because nothing went wrong. However, it is a good habit to look there from time to time, though you have to make sure that the date and time logged correspond to the problem you are investigating. It is easy to fool yourself with some long-gone drama.

Life being what it is, things can go wrong, and the client can ask for something the server can’t provide. It makes sense to allow for this with the ErrorDocument command.