Web cookies are used for a variety of purposes, such as preserving
session information between multiple page requests, recording user
preferences, and tracking the identity of users. They are passed from
server to browser in the form of HTTP Set-Cookie headers.
Most browsers will let you view the cookies that have been placed
on your system. In Mozilla Firefox, they can be examined by clicking on
the Preferences menu item, followed by the Privacy tab and the Cookies
menu item therein. Clicking on View Cookies will list all cookies
currently on your system. In Safari, these are displayed by clicking on
Preferences, followed by the Security tab and the Show Cookies button.
The easiest way to see these headers in their native format is to use
wget.
Here are examples of Set-Cookie headers from three different sites:
Set-Cookie: BBC-UID=1462a528736a32a501c6c457b1a65b753c72fa78f090b0a3eb3a05748d271ef60Wget%2f1%2e9%2bcvs%2dstable%20%28Red%20Hat%20modified%29; expires=Wed, 08-Apr-09 20:25:09 GMT; path=/; domain=bbc.co.uk;
Set-Cookie: PREF=ID=0aa0e58b50c9105d:TM=1113084813:LM=1113084813:S=59giS7pgAA31xLPV; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Set-Cookie: pds%5Fsess=d=AQCDWscw3cEV42bFRbjUriJ9jNujxeTdMvYlVc5bwnkxHkEDhTPg3fr6TuIaa4i0NQSGXgqxu2lZZf3vokumcLM4r2czDLNgX6iomKfdeX9ZXg%3D%3D&v=5; domain=.barnesandnoble.com; path=/
Set-Cookie: pds%5Flife=d=AQBr0gi5Z9MM4Ctl1jKNjR8LVWpRnNiF5aEQupVa4ewlSgkIybohsw4tgTuHXUMtC1m8qu9Xf6NehBWUzf1vkE9e&v=5; expires=Sat, 10-Apr-2010 21:23:54 GMT; domain=.barnesandnoble.com; path=/
Set-Cookie: browserid=version=0&os=0&browser=0; expires=Sat, 08-Oct-2005 04:00:00 GMT; domain=.barnesandnoble.com; path=/
Set-Cookie: returning=1; expires=Sat, 10-Apr-2010 21:23:54 GMT; domain=.barnesandnoble.com; path=/
Set-Cookie: affiliate=siteId=0&btbLogoFile=&siteType=1&LinkToReferrer=Y&url=+&showBackButton=N&name=Network&isAffiliate=True; expires=Mon, 11-Apr-2005 23:23:54 GMT; domain=.barnesandnoble.com; path=/
Set-Cookie: userid=dK8EQuGB5B; expires=Sun, 11-Apr-2010 21:23:54 GMT; domain=.barnesandnoble.com; path=/
The syntax of a Set-Cookie header is as follows:
Set-Cookie: <NAME>=<CONTENT>; expires=<TIMESTAMP>; path=<PATH>; domain=<DOMAIN>;
NAME
A unique name that identifies the cookie.
CONTENT
An arbitrary string of information that has some specific meaning to the server. As you can see from the examples, the content is often encoded in some way.
TIMESTAMP
A timestamp that denotes the date and time at which a browser will
remove the cookie. This must be in RFC-822 format
(Wdy, DD-Mon-YYYY HH:MM:SS GMT).
PATH
The path denotes the directories on the target site to which the
cookie applies. This is usually set to /, which refers to the entire
site.
DOMAIN
This defines the hosts within a domain that the cookie applies to. For
example, domain=www.oreilly.com denotes a single host, whereas
domain=.oreilly.com denotes all hosts in this domain.
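The syntax described above can be picked apart with a few lines of Python. This is only a minimal sketch: the function name parse_set_cookie is made up for illustration, and it assumes a single well-formed header in which semicolons appear only as field separators. Real-world headers can be messier than this.

```python
def parse_set_cookie(header):
    """Split one Set-Cookie header line into name, content, and attributes."""
    # Drop the "Set-Cookie:" prefix if it is present.
    prefix = "set-cookie:"
    if header.lower().startswith(prefix):
        header = header[len(prefix):]
    # Fields are separated by semicolons; a trailing semicolon would leave
    # an empty field, so empty fields are discarded.
    fields = [f.strip() for f in header.split(";") if f.strip()]
    # The first field is NAME=CONTENT. Split on the first "=" only,
    # because the content may itself contain "=" characters.
    name, _, content = fields[0].partition("=")
    attributes = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        attributes[key.strip().lower()] = value.strip()
    return {"name": name, "content": content, "attributes": attributes}


example = (
    "Set-Cookie: PREF=ID=0aa0e58b50c9105d:TM=1113084813:LM=1113084813:"
    "S=59giS7pgAA31xLPV; expires=Sun, 17-Jan-2038 19:14:07 GMT; "
    "path=/; domain=.google.com"
)
cookie = parse_set_cookie(example)
print(cookie["name"])
print(cookie["content"])
print(cookie["attributes"])
```

Splitting on the first equals sign matters here, since the content of this cookie contains several more of them.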
The use of cookies by a web site implies a reasonably sophisticated operation, including server-side scripts that can make use of the information contained in the cookies. Phishing web sites are typically very simple, so you are not likely to find cookies used in this kind of scam. But they are used extensively by sites that are involved in online advertising. Some of these fall in the gray area between legitimate advertising and spyware.
Looking at the names of the cookies stored on your system can be
very enlightening. The same names will appear linked to very different
sites, indicating that they use the same software to manage user
sessions or to track user activity. For example, cookies with the names
CP, CFID, and CFTOKEN indicate that the server is using Macromedia
ColdFusion software to manage user sessions. Similarly, cookies named
WEBTRENDS_ID are used to gather information about the browsing habits
of visitors within a site, using software from WebTrends. If you have
an interest in a particular company, look closely at the cookies it
sets. These can reveal interesting details about the software that the
company uses.
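As a rough illustration, this kind of name matching can be automated with a small lookup table. The table below contains only the names mentioned in the text; the function name guess_software and the sample list of cookie names are hypothetical, and a match is a hint rather than proof of what a site is running.

```python
# Cookie names mentioned in the text, mapped to the software they suggest.
KNOWN_COOKIE_NAMES = {
    "CP": "Macromedia ColdFusion",
    "CFID": "Macromedia ColdFusion",
    "CFTOKEN": "Macromedia ColdFusion",
    "WEBTRENDS_ID": "WebTrends",
}


def guess_software(cookie_names):
    """Return the set of products hinted at by a list of cookie names."""
    hints = set()
    for name in cookie_names:
        product = KNOWN_COOKIE_NAMES.get(name.upper())
        if product is not None:
            hints.add(product)
    return hints


# Hypothetical cookie names collected from one site.
seen = ["CFID", "CFTOKEN", "userid"]
print(guess_software(seen))
```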
This is an instance where an interactive browser and wget are
complementary. While wget is great for capturing headers to a file, it
does not, by default, store cookies in a form where they can be
returned to the server as part of a subsequent request. This happens
automatically with a browser. The difference in behavior can produce
unexpected results if wget is used to fetch a series of pages from a
site that uses cookies. The --save-cookies and --load-cookies options
allow wget to get around this.
The content of a cookie can be an arbitrary string and precisely what it represents is left in the hands of the server. Some examples are easy to understand but most make use of some form of encoding. Trying to decipher their internal format can make for an interesting forensic challenge.
As an example, look at the cookie that is set by Google when you pay their site a visit.
Set-Cookie: PREF=ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
The name of the cookie is PREF, and its value consists of the string
from the equals sign up to the first semicolon. Within that string
there are four key/value pairs, separated by colons, with the key names
of ID, TM, LM, and S.
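Breaking the value into its key/value pairs takes one line of Python. This sketch assumes the format seen in this one cookie, with colons between pairs and an equals sign between each key and its value.

```python
value = "ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF"

# Split on colons to get the pairs, then on the first "=" within each pair.
pairs = dict(item.split("=", 1) for item in value.split(":"))

for key, val in pairs.items():
    print(key, val)
```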
One approach to deciphering a cookie is to visit the web site multiple times and see what, if anything, changes between visits. When viewed in a browser, the content of the Google cookie remains the same across multiple visits. This is not much of a surprise as the expiration date for it is set in the year 2038. So this is probably a long-lived identifier, rather than something that represents the current state of my Google session.
However, if I do exactly the same series of operations using wget,
which does not save the cookie between requests, I see that the string
is different every time, as in these three examples:
ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF
ID=fc61be0a2e1c6fb6:TM=1113242036:LM=1113242036:S=uldLqtcxHvKCiuV5
ID=56ead2a5473d19e1:TM=1113242042:LM=1113242042:S=xogR1GyjQZFwgI0x
The ID and S values are completely different strings each time, but
the TM and LM values are integers that increase on consecutive
requests. Also note that TM and LM are identical to each other in each
instance of the cookie.
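These observations can be checked mechanically. The following sketch uses the three sample values shown above; the helper name parse_value is made up for illustration.

```python
samples = [
    "ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF",
    "ID=fc61be0a2e1c6fb6:TM=1113242036:LM=1113242036:S=uldLqtcxHvKCiuV5",
    "ID=56ead2a5473d19e1:TM=1113242042:LM=1113242042:S=xogR1GyjQZFwgI0x",
]


def parse_value(value):
    """Turn 'K1=V1:K2=V2' into a dictionary."""
    return dict(item.split("=", 1) for item in value.split(":"))


parsed = [parse_value(s) for s in samples]

# ID and S: are all three values distinct?
ids_differ = len({p["ID"] for p in parsed}) == len(parsed)
s_differ = len({p["S"] for p in parsed}) == len(parsed)

# TM and LM: equal within each cookie, and how far apart between requests?
tm_equals_lm = all(p["TM"] == p["LM"] for p in parsed)
tm = [int(p["TM"]) for p in parsed]
gaps = [later - earlier for earlier, later in zip(tm, tm[1:])]

print(ids_differ, s_differ, tm_equals_lm, gaps)
```

The gaps between consecutive TM values are a few units each, which is what you would expect if the unit is seconds and the requests were made by hand a few seconds apart.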
Looking at the difference in the TM values and estimating the time
difference between these requests, it seems likely that both TM and LM
are timestamps, representing the time in seconds since some starting
point. The chances are extremely high that a timestamp in the form of a
large integer like this will represent the number of seconds since
the epoch, which is a standard point in time used in computer clocks
(00:00:00 on January 1st, 1970, UTC). A simple way to get the current
number of seconds since this point is to use the Perl one-liner
perl -e 'print time'. Running this immediately after downloading the
cookie shows that this is precisely what the TM and LM values
represent.
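Another way to test the hypothesis is to convert one of the values into a calendar date and see whether it is plausible. This is a small sketch using Python's standard datetime module, with the TM value taken from the first wget sample above.

```python
from datetime import datetime, timezone

tm = 1113242033  # TM value from the first sample

# Interpret the integer as seconds since 00:00:00 UTC on January 1st, 1970.
when = datetime.fromtimestamp(tm, tz=timezone.utc)
print(when.isoformat())
```

The result falls on April 11th, 2005, which matches the dates in the headers captured from the other sites.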
The ID and S values are more of a puzzle. The ID string is a
16-character hexadecimal string and its name suggests that it is a
unique identifier. The S string is also 16 characters long, but it is
made up of numbers and letters of the alphabet in lower- or uppercase.
The wider range of characters allows this string to carry more
information than the ID value. So it could represent another unique
identifier, but it might also represent an encrypted string of some
form. Analysis of a lot more examples might shed more light on the
situation. Most examples of cookies are not going to be as informative
as this one, but trying to pick them apart makes for an entertaining
puzzle.
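The difference in capacity between the two strings can be put into numbers. This is a back-of-the-envelope sketch that assumes every character is chosen independently from its alphabet, and that the S string draws only on the 62 characters 0-9, a-z, and A-Z. Neither assumption is confirmed by the three samples, so the figures are upper bounds.

```python
import math


def max_bits(length, alphabet_size):
    """Upper bound on the information a string of this shape can carry."""
    return length * math.log2(alphabet_size)


id_bits = max_bits(16, 16)  # 16 hexadecimal characters
s_bits = max_bits(16, 62)   # 16 characters from 0-9, a-z, A-Z (assumed)

print(f"ID: up to {id_bits:.1f} bits")
print(f"S:  up to {s_bits:.1f} bits")
```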