What Can Headers Tell Us?

There are several headers that are common to all servers, as well as others that vary between servers and between different pages on the same server. The twelve headers that are returned with the O’Reilly home page are representative. I will step through each of these and explain their significance. The order has been changed so as to group related headers together.

Server Response
HTTP/1.1 200 OK

The first header, strictly speaking the status line, announces the protocol that is being used by the server and reports its response to the page request. In this case, the server is using Version 1.1 of HTTP, which is typical. The number that follows is the server response code. The value 200 signifies that the requested page was found and is being sent back to the browser. The status message OK is a convenience that reminds us what the numeric code means.

There are more than 30 possible server response codes but most are rarely seen. You are undoubtedly familiar with code 404, which signifies that the requested page was not found. Codes in the 300 series indicate that the browser is being redirected to another page. Redirection is commonly used in scams to conceal the identity of a web site. The mechanism being used can be determined by looking at that specific code, as I discuss in Chapter 4.
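As a rough illustration, the status line can be split into its three parts and the code grouped by its first digit. This is a minimal sketch; the function names parse_status_line and response_class are my own and not part of any library.

```python
def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    # The reason phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def response_class(code):
    """Name the broad class of a response code from its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason, response_class(code))
```

A code in the 300 series would be reported as "redirection", which is the cue to look more closely at where the browser is being sent.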

Date
Date: Thu, 20 Jan 2005 17:08:11 GMT

The Date header is a timestamp of when the server generated this response, which is effectively when the page was downloaded.

Last-Modified
Last-Modified: Thu, 20 Jan 2005 09:19:26 GMT

This header is the timestamp of when the content of the page was last modified. This is important to the browser, which checks its cache to see whether it already holds a copy of the page. If it does, it compares the time at which that copy was downloaded with the Last-Modified timestamp. If the page was modified after the cached copy was downloaded, the browser retrieves the new version. If the cached copy was downloaded after the last change, the browser uses it instead of continuing with the current download. In practice, the browser usually hands this comparison to the server by sending the cached timestamp in an If-Modified-Since request header; the server replies with code 304 (Not Modified) if nothing has changed.
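The timestamp comparison can be sketched in a few lines of Python. This is only an illustration of the rule, not how any particular browser implements its cache. The function name cached_copy_is_current is hypothetical, and both arguments are assumed to be HTTP date strings.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(cached_at, last_modified):
    """Return True when the cached copy was downloaded at or after the last change."""
    return parsedate_to_datetime(cached_at) >= parsedate_to_datetime(last_modified)


# Using the two timestamps from the O'Reilly example: a copy downloaded at
# 17:08:11 is newer than the last modification at 09:19:26, so it can be reused.
print(cached_copy_is_current(
    "Thu, 20 Jan 2005 17:08:11 GMT",
    "Thu, 20 Jan 2005 09:19:26 GMT",
))
```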

This header is also of interest from a forensics perspective because it tells us a little about the history of the page. Looking at these dates from a set of related files can help define when the site was created, which would be on or before the earliest date. The most recent change to any of the files can suggest how long the site has been in operation. Fake bank sites, for example, tend to be created immediately before the associated phishing email is sent out.
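If you have collected Last-Modified values from a set of related files, finding the earliest and latest is easy to automate. The sketch below assumes the values are plain HTTP date strings; the function names page_age and modification_window are hypothetical.

```python
from email.utils import parsedate_to_datetime


def page_age(date_header, last_modified_header):
    """Return how long before the download the page was last modified."""
    return parsedate_to_datetime(date_header) - parsedate_to_datetime(last_modified_header)


def modification_window(last_modified_values):
    """Return the earliest and latest timestamps from a set of Last-Modified values."""
    stamps = sorted(parsedate_to_datetime(value) for value in last_modified_values)
    return stamps[0], stamps[-1]


# The two headers from the O'Reilly example.
print(page_age("Thu, 20 Jan 2005 17:08:11 GMT", "Thu, 20 Jan 2005 09:19:26 GMT"))
```

The earliest value from modification_window gives an upper bound on when the site was created, and the latest suggests when it was last touched.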

ETag
ETag: "a4524-d5f6-41ef779e"

Entity Tags, or ETags, offer an alternative to comparing timestamps. The browser can compare the ETag stored with its cached version against the one reported by the web site. If the two strings are identical, then the cached version will be used. This offers a very slight performance improvement over comparing timestamps directly.
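An ETag is meant to be opaque, but some servers build it from file attributes. Apache 1.3, by default, joins the file's inode number, size, and modification time as hexadecimal fields. The sketch below assumes this server uses that default layout, which the header itself does not guarantee; the function name decode_apache_etag is hypothetical.

```python
from datetime import datetime, timezone


def decode_apache_etag(etag):
    """Decode an ETag, assuming Apache's default inode-size-mtime hex layout."""
    inode_hex, size_hex, mtime_hex = etag.strip('"').split("-")
    return {
        "inode": int(inode_hex, 16),
        "size": int(size_hex, 16),
        # The third field is interpreted as seconds since the Unix epoch.
        "mtime": datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc),
    }


fields = decode_apache_etag('"a4524-d5f6-41ef779e"')
print(fields["size"], fields["mtime"].isoformat())
```

For the O'Reilly example, the size field decodes to 54774, which matches the Content-Length header, and the time field decodes to 09:19:26 GMT on 20 January 2005, which matches Last-Modified. That agreement supports the assumption about the layout.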

Connection and Keep-Alive
Connection: Keep-Alive
Keep-Alive: timeout=15, max=500

The Connection header tells the browser what kind of connection the server would like to establish. Keep-Alive is the usual value for this and means that the connection between the two computers will stay open after this page has been downloaded. The Keep-Alive header sets the limits on that connection: timeout is the number of seconds that an idle connection will stay open, waiting for a new request, and max is the number of further requests the server will accept on the connection before closing it.
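The parameters in a Keep-Alive value are simple name=number pairs, so they are easy to extract. A minimal sketch, with a hypothetical function name parse_keep_alive:

```python
def parse_keep_alive(value):
    """Turn a Keep-Alive value such as 'timeout=15, max=500' into a dictionary."""
    params = {}
    for part in value.split(","):
        name, _, number = part.strip().partition("=")
        # Skip anything that is not a plain name=integer pair.
        if name and number.isdigit():
            params[name.lower()] = int(number)
    return params


print(parse_keep_alive("timeout=15, max=500"))
```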

Content-Type
Content-Type: text/html

This tells the browser what type of content it should expect. text/html is the MIME type of a basic web page. This would be different if the document were an audio file or an Excel spreadsheet or some other type of file.

Content-Length and Accept-Ranges
Content-Length: 54774
Accept-Ranges: bytes

Content-Length tells the browser how large a document is coming its way. Checking this against the number of bytes actually received provides a simple integrity check for the browser. Accept-Ranges tells the browser that it can, if it needs to, request specific pieces of the requested file, rather than the entire thing. This is not very relevant for regular web pages.
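Both ideas can be sketched briefly. The first function performs the integrity check, assuming headers is a dictionary of response headers and body holds the raw bytes exactly as they were received. The second builds the Range request header that a client would send to a server that accepts byte ranges. Both function names are hypothetical.

```python
def length_matches(headers, body):
    """Return True or False for the check, or None when no Content-Length was sent."""
    declared = headers.get("Content-Length")
    if declared is None:
        return None
    return int(declared) == len(body)


def range_request_header(first_byte, last_byte):
    """Build a Range request header asking for an inclusive span of bytes."""
    return {"Range": f"bytes={first_byte}-{last_byte}"}


# Ask for the first 1024 bytes of a file.
print(range_request_header(0, 1023))
```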

P3P
P3P: policyref="http://www.oreillynet.com/w3c/p3p.xml", CP="CAO DSP COR [...]"

This header is used by a growing number of sites to disclose their privacy policy to browsers prior to actually downloading any content. P3P stands for Platform for Privacy Preferences, a project of the World Wide Web Consortium (http://www.w3.org/P3P/).

X-Cache
X-Cache: MISS from www.oreilly.com

Headers with the X- prefix can represent anything the server wants them to. They are equivalent to the X- headers found in email messages. In this example, X-Cache most likely refers to a cache of dynamically generated pages on the O’Reilly web site, suggesting that this site handles its load using something more than a basic web server.

Server
Server: Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29

The most informative header, from a forensics point of view, is the Server header. This tells us what type of web server is responding to this request. Apache is the most common server on the Internet, and its default configuration offers up a surprising amount of detail in this header.

In this example, you can see that the site is hosted on a system running Unix. The Apache web server is Version 1.3.33, which is widely used although not the most recent release. In addition, it tells us the specific versions of PHP and the mod_perl module. With a bit of work investigating the release history of these packages and their inclusion in different Linux distributions, you could make an educated guess as to when the computer running this server was set up or last updated.
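A Server header in this style can be broken into its parts with a short function. This sketch assumes the common layout of product/version tokens with an optional platform in parentheses; servers are free to deviate from it, and the function name parse_server_header is hypothetical.

```python
import re


def parse_server_header(value):
    """Split a Server header into a platform string and product/version pairs."""
    platform = re.search(r"\(([^)]*)\)", value)
    products = {}
    # Drop the parenthesized comment, then read each remaining token as name/version.
    for token in re.sub(r"\([^)]*\)", " ", value).split():
        name, _, version = token.partition("/")
        products[name] = version or None
    return {
        "platform": platform.group(1) if platform else None,
        "products": products,
    }


info = parse_server_header("Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29")
print(info["platform"], info["products"])
```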

Here are some other examples of Server headers taken from various web sites:

Apache, version 1.3 on Mac OS X, version 2.0, and a commercial version
Server: Apache/1.3.29 (Darwin) PHP/4.3.1
Server: Apache/2.0.51 (Unix)
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12

Microsoft Internet Information Server, versions 5 and 6
Server: Microsoft-IIS/5.0
Server: Microsoft-IIS/6.0

Sun ONE Web Server
Server: Sun-ONE-Web-Server/6.1

Oracle Application Server
Server: Oracle-Application-Server-10g OracleAS-Web-Cache-10g/9.0.4.1.0 (H;max-age=300+0;age=73)

Google’s custom web server
Server: GWS/2.1

The amount of information revealed varies according to the type of server. Apache is, by default, very generous. But while this works to our benefit when we want to investigate a web site, it can be viewed as a liability when other people use it to learn about the sites that we control.

Anyone who wants to break into a server is looking for a vulnerability they can exploit. By revealing the specific versions of Apache, PHP, DAV, etc. that we are running on our server, we may be making their lives much easier than they need to be. If someone knows that a certain vulnerability exists in, say, Apache 1.3.29, they can write a simple wrapper script that runs wget on every IP address in a range, and then runs grep on the headers that are returned, looking for the specific version. I show you how you can limit the information in this header in the section "Controlling HTTP Headers" later in this chapter.
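The matching step is trivial, which is the point: once the headers have been captured, a single comparison picks out a specific version. The sketch below works on header strings you have already collected, for example from your own servers, and makes no network requests. The function names discloses_version and matching_headers are hypothetical.

```python
def discloses_version(server_header, product, version):
    """Return True when the header names this exact product and version."""
    return f"{product}/{version}" in server_header.split()


def matching_headers(server_headers, product, version):
    """Return the captured headers that disclose the given product and version."""
    return [h for h in server_headers if discloses_version(h, product, version)]


# Three of the example headers shown earlier in this section.
captured = [
    "Apache/1.3.29 (Darwin) PHP/4.3.1",
    "Apache/2.0.51 (Unix)",
    "Microsoft-IIS/5.0",
]
print(matching_headers(captured, "Apache", "1.3.29"))
```

Running the same check against your own servers is a quick way to see what they give away.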