Cookies

Web cookies are used for a variety of purposes, such as preserving session information between multiple page requests, recording user preferences, and tracking the identity of users. They are passed from server to browser in the form of HTTP Set-Cookie headers.

Most browsers will let you view the cookies that have been placed on your system. In Mozilla Firefox, they can be examined by clicking on the Preferences menu item, followed by the Privacy tab and the Cookies menu item therein. Clicking on View Cookies will list all cookies currently on your system. In Safari, these are displayed by clicking on Preferences, followed by the Security tab and the Show Cookies button. To see the Set-Cookie headers themselves in their native format, the easiest approach is to use wget.

Here are three examples of Set-Cookie headers from three different sites:

BBC

    Set-Cookie: BBC-UID=1462a528736a32a501c6c457b1a65b753c72fa78f090b
    0a3eb3a05748d271ef60Wget%2f1%2e9%2bcvs%2dstable%20%28Red%20Hat%20
    modified%29; expires=Wed, 08-Apr-09 20:25:09 GMT; path=/;
    domain=bbc.co.uk;

Google

    Set-Cookie: PREF=ID=0aa0e58b50c9105d:TM=1113084813:LM=1113084813:
    S=59giS7pgAA31xLPV; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/;
    domain=.google.com

Barnes and Noble

    Set-Cookie: pds%5Fsess=d=AQCDWscw3cEV42bFRbjUriJ9jNujxeTdMvYlVc5b
    wnkxHkEDhTPg3fr6TuIaa4i0NQSGXgqxu2lZZf3vokumcLM4r2czDLNgX6iomKfde
    X9ZXg%3D%3D&v=5; domain=.barnesandnoble.com; path=/
    Set-Cookie: pds%5Flife=d=AQBr0gi5Z9MM4Ctl1jKNjR8LVWpRnNiF5aEQupVa
    4ewlSgkIybohsw4tgTuHXUMtC1m8qu9Xf6NehBWUzf1vkE9e&v=5; expires=Sat,
    10-Apr-2010 21:23:54 GMT; domain=.barnesandnoble.com; path=/
    Set-Cookie: browserid=version=0&os=0&browser=0; expires=Sat,
    08-Oct-2005 04:00:00 GMT; domain=.barnesandnoble.com; path=/
    Set-Cookie: returning=1; expires=Sat, 10-Apr-2010 21:23:54 GMT;
    domain=.barnesandnoble.com; path=/
    Set-Cookie: affiliate=siteId=0&btbLogoFile=&siteType=1&LinkTo
    Referrer=Y&url=+&showBackButton=N&name=Network&isAffiliate=True;
    expires=Mon, 11-Apr-2005 23:23:54 GMT; domain=.barnesandnoble.com;
    path=/
    Set-Cookie: userid=dK8EQuGB5B; expires=Sun, 11-Apr-2010 21:23:54
    GMT; domain=.barnesandnoble.com; path=/

The syntax of a Set-Cookie header is as follows:

    Set-Cookie: <NAME>=<CONTENT>; expires=<TIMESTAMP>; path=<PATH>;
    domain=<DOMAIN>;
NAME

A unique name that identifies the cookie.

CONTENT

An arbitrary string of information that has some specific meaning to the server. As you can see from the examples, the content is often encoded in some way.

TIMESTAMP

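The date and time at which a browser will remove the cookie. This must be in the format Wdy, DD-Mon-YYYY HH:MM:SS GMT, a variant of the RFC 822 date format, although some servers send a two-digit year, as in the BBC example. A cookie sent without an expires attribute, such as pds%5Fsess in the Barnes and Noble example, lasts only until the browser is closed.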

PATH

The path denotes the directories on the target site in which the cookie can be applied. This is usually set to /, which refers to the entire site.

DOMAIN

This defines the hosts within a domain that the cookie applies to. For example, domain=www.oreilly.com denotes a single host whereas domain=.oreilly.com denotes all hosts in this domain.
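The syntax above can be picked apart mechanically. The following is a minimal sketch, not a full cookie parser: the function names are hypothetical, the header is assumed to have been joined back onto a single line, and the domain check follows only the rule described above (a leading dot covers every host in the domain, anything else is a single host).

```python
from datetime import datetime, timezone


def parse_set_cookie(header):
    """Split one Set-Cookie header into (name, content, attributes)."""
    # Drop the "Set-Cookie:" prefix if it is present.
    if header.lower().startswith("set-cookie:"):
        header = header.split(":", 1)[1]
    # Attributes are separated by semicolons; ignore empty trailing parts.
    parts = [p.strip() for p in header.split(";") if p.strip()]
    # Only the first "=" separates NAME from CONTENT, because the
    # content may itself contain "=" characters.
    name, _, content = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        attributes[key.strip().lower()] = value.strip()
    return name, content, attributes


def parse_expires(text):
    """Parse an expires value with a four-digit or two-digit year."""
    for fmt in ("%a, %d-%b-%Y %H:%M:%S GMT", "%a, %d-%b-%y %H:%M:%S GMT"):
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None  # format not recognized


def domain_matches(host, cookie_domain):
    """Apply the leading-dot rule described in the text (an assumption)."""
    host, cookie_domain = host.lower(), cookie_domain.lower()
    if cookie_domain.startswith("."):
        return host.endswith(cookie_domain) or host == cookie_domain[1:]
    return host == cookie_domain


header = ("Set-Cookie: PREF=ID=0aa0e58b50c9105d:TM=1113084813:LM=1113084813:"
          "S=59giS7pgAA31xLPV; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; "
          "domain=.google.com")
name, content, attrs = parse_set_cookie(header)
print(name, attrs["domain"], parse_expires(attrs["expires"]))
```

Splitting on the first equals sign only is what keeps a content string such as ID=...:TM=... intact.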

The use of cookies by a web site implies a reasonably sophisticated operation, including server-side scripts that can make use of the information contained in the cookies. Phishing web sites are typically very simple, so you are not likely to find cookies used in this kind of scam. But they are used extensively by sites that are involved in online advertising. Some of these fall in the gray area between legitimate advertising and spyware.

Looking at the names of the cookies stored on your system can be very enlightening. The same names will appear linked to very different sites, indicating that they use the same software to manage user sessions or to track user activity. For example, cookies with the names CP, CFID, and CFTOKEN indicate that the server is using Macromedia ColdFusion software to manage user sessions. Similarly, cookies named WEBTRENDS_ID are used to gather information about the browsing habits of visitors within a site, using software from WebTrends. If you have an interest in a particular company, look closely at the cookies that its sites set. These can reveal interesting details about the software that the company uses.
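This kind of name matching is easy to automate. The sketch below uses only the cookie names mentioned above; the table and the function name are illustrative, and a real list would need many more entries gathered from observation.

```python
# Cookie names taken from the text, mapped to the software they suggest.
KNOWN_COOKIE_NAMES = {
    "CP": "Macromedia ColdFusion",
    "CFID": "Macromedia ColdFusion",
    "CFTOKEN": "Macromedia ColdFusion",
    "WEBTRENDS_ID": "WebTrends",
}


def guess_software(cookie_names):
    """Return the sorted set of software products hinted at by cookie names."""
    found = set()
    for name in cookie_names:
        software = KNOWN_COOKIE_NAMES.get(name.upper())
        if software:
            found.add(software)
    return sorted(found)


print(guess_software(["CFID", "CFTOKEN", "userid"]))
```

A match is only a hint, since any site is free to choose the same cookie name for an unrelated purpose.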

This is an instance where an interactive browser and wget are complementary. While wget is great for capturing headers to a file, it does not, by default, store cookies in a form where they can be returned to the server as part of a subsequent request. This happens automatically with a browser. The difference in behavior can produce unexpected results if wget is used to fetch a series of pages from a site that uses cookies. The --save-cookies and --load-cookies options allow wget to get around this.
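The same browser-like behavior can be reproduced in a script. The following sketch uses Python's standard library rather than wget, so it is an analogue of the wget options and not a description of them. It keeps cookies in a Netscape-style cookies.txt file, the format that wget also reads and writes. The function name, the file name, and the URL in the comment are placeholders.

```python
import os
import urllib.request
from http.cookiejar import MozillaCookieJar


def make_cookie_session(jar_path):
    """Build a URL opener that returns cookies to the server like a browser."""
    jar = MozillaCookieJar(jar_path)
    # Reuse cookies from an earlier run if the file already exists.
    if os.path.exists(jar_path):
        jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Typical use (placeholder URL, requires network access):
#   opener, jar = make_cookie_session("cookies.txt")
#   opener.open("http://www.example.com/")   # cookies are captured here
#   opener.open("http://www.example.com/")   # and sent back here
#   jar.save(ignore_discard=True)            # keep session cookies as well
```

Passing ignore_discard=True keeps cookies that have no expiration date, which would otherwise be dropped when the jar is saved.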

The content of a cookie can be an arbitrary string and precisely what it represents is left in the hands of the server. Some examples are easy to understand but most make use of some form of encoding. Trying to decipher their internal format can make for an interesting forensic challenge.

As an example, look at the cookie that is set by Google when you pay their site a visit.

    Set-Cookie: PREF=ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:
    S=Kz1EhS5vGqaY6AoF; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/;
    domain=.google.com

The name of the cookie is PREF, and its value consists of the string from the first equals sign up to the first semicolon. Within that string there are four key/value pairs, separated by colons, with the key names of ID, TM, LM, and S.
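Breaking the value into its four pairs takes only a few lines. This is a small sketch with a hypothetical function name; it assumes, as in this example, that colons never occur inside an individual value.

```python
def split_pref(value):
    """Split a colon-separated string of KEY=VALUE pairs into a dict."""
    pairs = {}
    for field in value.split(":"):
        key, _, val = field.partition("=")
        pairs[key] = val
    return pairs


pref = "ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF"
fields = split_pref(pref)
for key, val in fields.items():
    print(key, val)
```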

One approach to deciphering a cookie is to visit the web site multiple times and see what, if anything, changes between visits. When viewed in a browser, the content of the Google cookie remains the same across multiple visits. This is not much of a surprise as the expiration date for it is set in the year 2038. So this is probably a long-lived identifier, rather than something that represents the current state of my Google session.

However, if I do exactly the same series of operations using wget, which does not save the cookie between requests, I see that the string is different every time, as in these three examples:

    ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF
    ID=fc61be0a2e1c6fb6:TM=1113242036:LM=1113242036:S=uldLqtcxHvKCiuV5
    ID=56ead2a5473d19e1:TM=1113242042:LM=1113242042:S=xogR1GyjQZFwgI0x

The ID and S values are completely different strings each time, but the TM and LM values are integers that increase on consecutive requests. Also note that TM and LM are identical to each other in each instance of the cookie.
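These observations can be checked programmatically, which helps when there are many more samples than three. The sketch below uses the three strings shown above; the function names and the layout of the report are arbitrary choices made for illustration.

```python
SAMPLES = [
    "ID=bb4498284cb8aec1:TM=1113242033:LM=1113242033:S=Kz1EhS5vGqaY6AoF",
    "ID=fc61be0a2e1c6fb6:TM=1113242036:LM=1113242036:S=uldLqtcxHvKCiuV5",
    "ID=56ead2a5473d19e1:TM=1113242042:LM=1113242042:S=xogR1GyjQZFwgI0x",
]


def split_fields(value):
    """Turn 'K1=V1:K2=V2' into a dict."""
    return dict(field.split("=", 1) for field in value.split(":"))


def compare_samples(samples):
    """Report, per key, how the value behaves across the samples."""
    parsed = [split_fields(s) for s in samples]
    report = {}
    for key in parsed[0]:
        values = [p[key] for p in parsed]
        numeric = all(v.isdigit() for v in values)
        entry = {"distinct": len(set(values)), "numeric": numeric}
        if numeric:
            numbers = [int(v) for v in values]
            # True when each value is strictly larger than the previous one.
            entry["increasing"] = all(
                a < b for a, b in zip(numbers, numbers[1:]))
        report[key] = entry
    return report


report = compare_samples(SAMPLES)
tm_equals_lm = all(f["TM"] == f["LM"] for f in map(split_fields, SAMPLES))
for key, entry in report.items():
    print(key, entry)
print("TM equals LM in every sample:", tm_equals_lm)
```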

Looking at the difference in the TM values and estimating the time difference between these requests, it seems likely that both TM and LM are timestamps, representing the time in seconds since some starting point. The chances are extremely high that a timestamp in the form of a large integer like this will represent the number of seconds since the epoch, which is a standard point in time used in computer clocks (00:00:00 on January 1st, 1970, UTC). A simple way to get the current number of seconds since this point is to use the Perl one-liner: perl -e 'print time'. Running this immediately after downloading the cookie shows that this is precisely what the TM and LM values represent.
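The same check can be made after the fact by converting the values to calendar dates. This sketch is a Python equivalent of that test, not the Perl one-liner itself, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Interpret an integer as seconds since 00:00:00 UTC on 1 January 1970."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


tm_values = [1113242033, 1113242036, 1113242042]
for tm in tm_values:
    print(tm, epoch_to_utc(tm).isoformat())

# Gaps in seconds between consecutive requests.
gaps = [b - a for a, b in zip(tm_values, tm_values[1:])]
print(gaps)
```

The first value decodes to 11 April 2005 at 17:53:53 UTC, and the gaps of 3 and 6 seconds are consistent with requests made by hand in quick succession.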

The ID and S values are more of a puzzle. The ID string is a 16-character hexadecimal string and its name suggests that it is a unique identifier. The S string is also 16 characters long, but it is made up of numbers and letters of the alphabet in lower- or uppercase. The wider range of characters allows this string to carry more information than the ID value. So it could represent another unique identifier, but it might also represent an encrypted string of some form. Analysis of a lot more examples might shed more light on the situation. Most examples of cookies are not going to be as informative as this one but trying to pick them apart makes for an entertaining puzzle.
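The difference in capacity between the two strings can be estimated with a little arithmetic. The sketch below assumes that every character position is used freely, which gives an upper bound and not a measurement. The function names are illustrative, and the alphabet sizes of 16 and 62 follow from the character classes described above.

```python
import math
import string

HEX = set("0123456789abcdef")
ALNUM = set(string.ascii_letters + string.digits)

IDS = ["bb4498284cb8aec1", "fc61be0a2e1c6fb6", "56ead2a5473d19e1"]
S_VALUES = ["Kz1EhS5vGqaY6AoF", "uldLqtcxHvKCiuV5", "xogR1GyjQZFwgI0x"]


def smallest_alphabet(values):
    """Name the smallest known character class that covers all values."""
    chars = set("".join(values))
    if chars <= HEX:
        return "hex", 16
    if chars <= ALNUM:
        return "alphanumeric", 62
    return "other", len(chars)


def max_bits(length, alphabet_size):
    """Upper bound on the information carried by a string of this length."""
    return length * math.log2(alphabet_size)


for label, values in (("ID", IDS), ("S", S_VALUES)):
    kind, size = smallest_alphabet(values)
    print(label, kind, round(max_bits(len(values[0]), size), 1), "bits at most")
```

Under these assumptions the ID value holds at most 64 bits, while the S value could hold roughly 95.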