There are several headers that are common to all servers, as well as others that vary between servers and between different pages on the same server. The twelve headers that are returned with the O’Reilly home page are representative. I will step through each of these and explain their significance. The order has been changed so as to group related headers together.
HTTP/1.1 200 OK
The first header announces the protocol that is being used by the server and reports its response to the page request. In this case, the server is using Version 1.1 of the HTTP protocol, which is typical. The number that follows is the server response code. The value 200 signifies that the requested page was found and is being sent back to the browser. The status message OK is a convenience that reminds us what the numeric code means.
There are more than 30 possible server response codes, but most are rarely seen. You are undoubtedly familiar with code 404, which signifies that the requested page was not found. Codes in the 300 series indicate that the browser is being redirected to another page. Redirection is commonly used in scams to conceal the identity of a web site. The mechanism being used can be determined by looking at that specific code, as I discuss in Chapter 4.
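As a minimal sketch of how the status line breaks down, the following Python splits it into protocol version, numeric code, and status message, and then reports the general class of the code from its first digit. The function names are my own, chosen for illustration.

```python
def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    # Split at most twice so that a message like 'Not Found' stays in one piece.
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def classify(code):
    """Return the general class of a response code, based on its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason, classify(code))
```

Any code in the 300 series is reported as a redirection, which is the class worth a closer look when a site appears to be hiding where it really lives.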
Date: Thu, 20 Jan 2005 17:08:11 GMT
The Date header is a timestamp, set by the server, for when this response was generated, which is effectively when the page was downloaded.
Last-Modified: Thu, 20 Jan 2005 09:19:26 GMT
This header is the timestamp of when the content of the page was last modified. This is an important piece of information for the browser because it will check its cache to see if it already has a copy of this page. If it does, then it will compare the timestamp from when that copy was downloaded with the Last-Modified timestamp. If the page was modified after the earlier download, the browser will retrieve the new version. If the cached version was downloaded after the last change, the browser will use that instead of continuing with the current download.
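The comparison can be sketched in a few lines of Python. This is an illustration of the logic only, not how any particular browser implements it, and the function name is hypothetical. In practice the check is often delegated to the server: the browser sends its stored timestamp in an If-Modified-Since request header and the server replies with code 304 if nothing has changed.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(last_modified, cached_at):
    """Return True if the cached copy was downloaded at or after the last change.

    Both arguments are HTTP date strings, as found in the Date and
    Last-Modified headers.
    """
    modified = parsedate_to_datetime(last_modified)
    downloaded = parsedate_to_datetime(cached_at)
    return downloaded >= modified


# Last-Modified value from the example; the download times are made up.
last_modified = "Thu, 20 Jan 2005 09:19:26 GMT"
print(cached_copy_is_current(last_modified, "Wed, 19 Jan 2005 12:00:00 GMT"))
print(cached_copy_is_current(last_modified, "Thu, 20 Jan 2005 17:08:11 GMT"))
```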
This header is also of interest from a forensics perspective because it tells us a little about the history of the page. Looking at these dates from a set of related files can help define when the site was created, which would be on or before the earliest date. The most recent change to any of the files can suggest how long the site has been in operation. Fake bank sites, for example, tend to be created immediately before the associated phishing email is sent out.
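To make the idea concrete, here is a small Python sketch that takes the Last-Modified values collected from several files on one site and reports the earliest and latest. The file paths and all but the first timestamp are invented for the example.

```python
from email.utils import parsedate_to_datetime

# Hypothetical Last-Modified values gathered from the files of a single site.
last_modified = {
    "/index.html": "Thu, 20 Jan 2005 09:19:26 GMT",
    "/images/logo.gif": "Mon, 17 Jan 2005 22:41:03 GMT",
    "/login.php": "Tue, 18 Jan 2005 03:12:55 GMT",
}


def modification_window(headers):
    """Return the earliest and latest timestamps found in the mapping."""
    stamps = [parsedate_to_datetime(value) for value in headers.values()]
    return min(stamps), max(stamps)


earliest, latest = modification_window(last_modified)
print("Site created on or before:", earliest.isoformat())
print("Most recent change:", latest.isoformat())
```

Bear in mind that these timestamps come from the server and can be wrong or deliberately altered, so treat the result as a clue rather than proof.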
ETag: "a4524-d5f6-41ef779e"
Entity Tags, or ETags, offer an alternative to comparing timestamps. The browser can compare the bit representation of the ETag from the cached version with the one on the web site. If they are identical, then the cached version will be used. This offers a very slight performance improvement over comparing timestamps directly.
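A rough Python sketch of that comparison follows. It treats the ETag as an opaque string and checks for an exact match; the function name is hypothetical, and the sketch ignores the finer points of the HTTP specification, such as weak validators. A real browser typically sends the stored tag to the server in an If-None-Match request header and lets the server decide.

```python
def can_use_cached_copy(cached_etag, server_etag):
    """Return True if the two ETag values are identical strings."""
    # The tag is opaque: no attempt is made to interpret its contents.
    return cached_etag.strip() == server_etag.strip()


stored = '"a4524-d5f6-41ef779e"'
print(can_use_cached_copy(stored, '"a4524-d5f6-41ef779e"'))
print(can_use_cached_copy(stored, '"a4524-d600-41f0a1b2"'))  # made-up tag
```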
Connection: Keep-Alive
Keep-Alive: timeout=15, max=500
The Connection header tells the browser what kind of connection the server would like to establish. Keep-Alive is the usual value for this and means that the connection between the two computers will stay open after this page has been downloaded. The Keep-Alive header sets two limits on that connection: timeout is the number of seconds that the server will keep an idle connection open while waiting for a new request, and max is the maximum number of requests that it will serve over the connection before closing it.
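The value of the Keep-Alive header is a comma-separated list of name=value pairs, where timeout is a number of seconds and max is a count of requests. A minimal Python sketch for pulling these apart, using a function name of my own choosing, might look like this:

```python
def parse_keep_alive(value):
    """Turn a value such as 'timeout=15, max=500' into a dictionary of integers."""
    params = {}
    for part in value.split(","):
        name, _, number = part.strip().partition("=")
        # Skip anything that is not a simple name=integer pair.
        if name and number.isdigit():
            params[name.lower()] = int(number)
    return params


print(parse_keep_alive("timeout=15, max=500"))  # {'timeout': 15, 'max': 500}
```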
Content-Type: text/html
This tells the browser what type of content it should expect. text/html is the MIME type of a basic web page. This would be different if the document were an audio file or an Excel spreadsheet or some other type of file.
Content-Length: 54774
Accept-Ranges: bytes
Content-Length tells the browser how large a document is coming its way. Checking this against the number of bytes actually received provides a simple integrity check for the browser. Accept-Ranges tells the browser that it can, if it needs to, request specific pieces of the requested file, rather than the entire thing. This is not very relevant for regular web pages.
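The integrity check amounts to a single comparison. The Python sketch below assumes that the headers have already been collected into a dictionary and that the body is available as raw bytes, exactly as received and before any decompression; the function name is hypothetical.

```python
def length_matches(headers, body):
    """Compare the declared Content-Length with the number of bytes received.

    Returns None if the server did not send a Content-Length header.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return None
    return int(declared) == len(body)


print(length_matches({"Content-Length": "5"}, b"hello"))  # True
print(length_matches({"Content-Length": "54774"}, b"hello"))  # False
```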
P3P: policyref="http://www.oreillynet.com/w3c/p3p.xml", CP="CAO DSP COR [...]"
This header is used by a growing number of sites to disclose their privacy policy to browsers prior to actually downloading any content. P3P stands for Platform for Privacy Preferences, a project of the World Wide Web Consortium (http://www.w3.org/P3P/).
X-Cache: MISS from www.oreilly.com
Headers with the X- prefix can represent anything the server wants them to. They are equivalent to the X- headers found in email messages. In this example, X-Cache most likely refers to a cache of dynamically generated pages on the O’Reilly web site, suggesting that this site handles its load using something more than a basic web server.
Server: Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29
The most informative header, from a forensics point of view, is the Server header. This tells us what type of web server is responding to this request. Apache is the most common server on the Internet, and its default configuration offers up a surprising amount of detail in this header.
In this example, you can see that the site is hosted on a system running Unix. The Apache web server is Version 1.3.33, which is widely used although not the most recent release. In addition, it tells us the specific versions of PHP and the mod_perl module. With a bit of work investigating the release history of these packages and their inclusion in different Linux distributions, you could make an educated guess as to when the computer running this server was set up or last updated.
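Pulling the individual packages and versions out of a Server header is easy to automate. The Python sketch below is one way to do it, assuming the common layout of name/version tokens with optional comments in parentheses. The regular expression and function name are my own, and unusual headers may not parse cleanly.

```python
import re

# Either a product token with an optional /version, or a (comment).
TOKEN = re.compile(r"([^\s/()]+)(?:/(\S+))?|\(([^)]*)\)")


def parse_server(value):
    """Return a list of (product, version) pairs and a list of comments."""
    products, comments = [], []
    for match in TOKEN.finditer(value):
        if match.group(3) is not None:
            comments.append(match.group(3))
        else:
            products.append((match.group(1), match.group(2)))
    return products, comments


products, comments = parse_server("Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29")
print(products)  # [('Apache', '1.3.33'), ('PHP', '4.3.10'), ('mod_perl', '1.29')]
print(comments)  # ['Unix']
```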
Here are some other examples of Server headers taken from various web sites:
Server: Apache/1.3.29 (Darwin) PHP/4.3.1
Server: Apache/2.0.51 (Unix)
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Server: Microsoft-IIS/5.0
Server: Microsoft-IIS/6.0
Server: Sun-ONE-Web-Server/6.1
Server: Oracle-Application-Server-10g OracleAS-Web-Cache-10g/9.0.4.1.0 (H;max-age=300+0;age=73)
Server: GWS/2.1
The amount of information revealed varies according to the type of server. Apache is, by default, very generous. But while this works to our benefit when we want to investigate a web site, it can be viewed as a liability when other people use it to learn about the sites that we control.
Anyone who wants to break into a server is looking for a vulnerability they can exploit. By revealing the specific versions of Apache, PHP, DAV, etc. that we are running on our server, we may be making their life much easier than it needs to be. If someone knows that a certain vulnerability exists in, say, Apache 1.3.29, they can write a simple wrapper script that runs wget on every IP address in a range, and then runs grep on the headers that are returned, looking for the specific version. I show you how you can limit the information in this header in the section "Controlling HTTP Headers" later in this chapter.
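You can turn the same idea around and use it to audit the sites that you control. The Python sketch below works on Server headers that have already been captured, so it does no network access of its own. The host names are placeholders, and the function name is hypothetical.

```python
def hosts_disclosing(captured, product_version):
    """Return the hosts whose Server header contains the given name/version token."""
    wanted = product_version.lower()
    return sorted(
        host
        for host, server in captured.items()
        # Compare whole tokens so that 'Apache/1.3.2' does not match 'Apache/1.3.29'.
        if wanted in (token.lower() for token in server.split())
    )


# Placeholder host names paired with Server headers of the kind shown earlier.
captured = {
    "www.example.com": "Apache/1.3.29 (Darwin) PHP/4.3.1",
    "shop.example.com": "Apache/2.0.51 (Unix)",
    "mail.example.com": "Microsoft-IIS/5.0",
}
print(hosts_disclosing(captured, "Apache/1.3.29"))  # ['www.example.com']
```

Any host that shows up in the result is advertising exactly the detail that an attacker would be searching for.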