Chapter 8

Internet and E-Mail

Information in This Chapter:

ent Overview of the Internet and How it Works

ent How Web Browsers Work and the Evidence They Can Create

ent E-Mail Function & Forensics

ent Chat and Social Networking Evidence

E-mail and Internet evidence can play a vital role in any investigation. Chapter 8 examines the technologies behind the Internet such as HTTP, IP Addressing, peer-to-peer networks, chat clients, and web browsers. It also covers reading e-mail headers and ways to conceal the sender's identity.

Uniform Resource Locator (URL), Browser, Hypertext Transfer Protocol (HTTP), Top Level Domain (TLD), Domain name, IP (Internet Protocol), Domain Name Server (DNS), HTML (Hypertext Markup Language), Whois, Gnutella, Peer-to-Peer (P2P), INDEX.DAT, Cookies, Temporary Internet Files (TIF), NTUSER.DAT, Internet Relay Chat, ICQ, Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP), Internet Message Access Protocol (IMAP), Spoofing, Anonymous remailing, Message ID

Introduction

In the beginning, the Internet was a little-known tool used by a few academics and the military. Today, it's a tool truly for the masses. We can order pizza, pay bills, look up a phone number, and take a class. For many of us, it is hard to imagine life without it. For examiners, its use can leave significant pieces of evidence. Web browsing, chat, e-mail, and social networking are just some of the technologies that we must understand how they're used, how they work, and where they leave traces.

Internet Overview

We'll begin with a quick introduction to the technology involved in getting your favorite web page to appear on your computer screen. Perhaps the best way is to track the process from start to finish. It all begins when someone enters a web address or URL (Uniform Resource Locator) into the address bar of a browser. A URL comprises three parts: the host, the domain name, and the file name. Let's use http://www.digitalforensics.com as an example.

In our example, “http” or Hypertext Transfer Protocol (HTTP) is the protocol used on the Internet to browse and interact with web sites and the like. A protocol is nothing more than an agreed-upon way for devices to communicate with one another. Next is the domain name, “digital forensics.” Last is the Top Level Domain (TLD), “.com.” It's called a TLD because it is at the top of the hierarchy that makes up the Internet's domain name system. Other TLDs include .org, .edu, and .net, just to name a few.

The browser, using the HTTP protocol, sends a “get” request to the web server hosting www.digitalforensics.com. A browser is an application that is used to view and access content on the Internet. There are several browsers to choose from: the most common are Microsoft's Internet Explorer, Mozilla's Firefox, and Google's Chrome.

After hitting enter, the first order of business is to convert the domain name into an IP (Internet Protocol) address. The Internet functions with IP addresses. It can't do anything with the domain name itself. The domain name is for us, making it easier to remember. A Domain Name Server (DNS) is responsible for mapping domain names to specific IP addresses. After the DNS makes the conversion, the request is then sent on to the server hosting the web site. After receiving the request, the server returns the requested web page and associated content.

A web page comprises several components. The first is the HTML (Hypertext Markup Language) document. This contains quite a bit of information including directions for how the page should be rendered (displayed) by the browser, content, and more. It also contains file names for subcomponents of the web page such as images. It's important to note that HTML is not a programming language.

There are two types of web pages: static and dynamic. A static web page is one that is prebuilt. Its content, layout, etc., are predetermined. A dynamic page, however, is built “on the fly.” It doesn't exist until it's called. The page is built from different pieces drawn from databases. Amazon is a great example of a dynamic web site. My page will very likely be different from your page. The books and so on that appear on my page are based on my shopping and buying habits. All this information is stored in a database(s) along with the things like the book images, descriptions, and so on. When I logon to Amazon, the server sends the items that are standard for everyone (like the Amazon logo) along with the content targeted to me.

When interacting with a web site, it's important to understand where certain things are occurring. This can be especially important to know from a forensics perspective because it can tell you where you should be looking for a given artifact. Actions can occur on either the client-side or the server-side. JavaScript (no relation to the Java the programming language) is a client-side technology. It's used for things such as roll-overs on a navigation bar. The code that makes that work is downloaded and run on the local machine. Server-side actions are just the opposite and are used when there is a need to send information to another computer (like my custom content at Amazon).

Determining the ownership and host of a particular domain name can become relevant in a criminal or civil case. A search query known as a “whois” can help you identify some of the individuals and/or companies associated with a given domain name. A whois search can tell you the registrant, when the domain was created, the administrative contact, and the technical contact. The contact information typically provides a name, address, and phone number. Most if not all domain name registrars now offer private registration. Any whois search for a domain name with private registration will typically get the registrar's contact information, rather than the actual owner (Network Solutions, LLC). If you'd like to give this a try, visit one of the sites offering the whois service. Network Solutions is one: http://www.networksolutions.com/whois/index.jsp.

Peer-to-Peer (P2P)

P2P is used primarily as a means to share files. A major portion of the traffic on a P2P network is pirated music and movies as well as child pornography. P2P differs from a client/server network in that computers on a P2P network can serve both roles (client and server). Gnutella is one of the major systems or architectures used in P2P networks.

More Advanced

Gnutella Requests

On a P2P network, what stops a file request from just propagating forever? There is actually a built-in mechanism in the information packets. In each packet, there is a Time To Live (TTL) value that is set to decrease by one every time it is delivered to another node on the network. Once that number hits 0, the packet is stopped.

To get started with a P2P network, users must first download and install a P2P client such as KaZaA, Frostwire, GigaTribe or eMule. Typically, users then create a “shared” directory containing files they want to make available to others. To find files of interest to download, users normally enter search term(s) for the file or files he wants. If the search is successful, the software returns a list of computers that have the requested file(s). Lastly, the files are downloaded to a directory of the user's choosing or to the default location specified by the client. P2P networks use HTTP to transfer files.

Nodes on a Gnutella fall into two categories. Nodes that have the required bandwidth as well as the uptime (time on the network) are classified as Ultrapeers. Those that don't are known as leafs. Ultrapeers perform some additional duties such as searching, indexing, and facilitating connections.

Web Browsers—Internet Explorer

Web browsers are an indispensable part of the overall computing experience and serve as our “vehicles” on the “Information Superhighway” known as the World Wide Web. Although there are multiple browsers on the market, Microsoft's Internet Explorer is far and away the most widely used. Other browsers (for the PC) also getting some traction are Mozilla's Firefox and Google's Chrome. On Macintosh computers, Safari is king, with Firefox getting some use here as well. At their foundation, these applications function in much the same way. For instance, all of them utilize some sort of caching system. They also have mechanisms to deal with cookies, Internet history, typed URLs, bookmarks, and more. They differ in the details. Space does not permit an exhaustive look at all the browsers and the details of their inner workings. Instead, we'll focus on some of the common functions as they are in MSIE, the overwhelming market leader.

Cookies

A cookie is a small text file that is deposited on a user's computer by a web server. Cookies can serve a variety of purposes. They can be used to track sessions as well as remember a user's preferences for a particular web site. Amazon.com is a great example. When you return to the site you are normally greeted with a “Hello, Susan” as well as customized recommendations based on your buying and browsing history. That level of individualization is made possible through cookies.

Cookies can provide valuable evidence and are tracked in a single INDEX.DAT file. They can contain Uniform Resource Locators (URLs), dates and times, user names, and more. Deciphering a cookie can be a challenge, as they aren't normally written in the clear. Fortunately for us, tools are available to get this done. It's critical to note that the existence of a web address in a cookie is not necessarily proof that the suspect actually visited the site (Casey, 2009).

Temporary Internet Files, a.k.a. web Cache

We are an impatient lot. As such, speed is vital to a user's Internet experience. Today, web browsing is expected to be nearly indistinguishable from the applications running on our own machines. web cache is one way that the browser makers shave some time off the download times. Cache speeds things along by reusing web page components like images, saving time from having to download objects more than once.

Microsoft's browser, Internet Explorer, refers to web cache as Temporary Internet Files (TIF). In Microsoft Internet Explorer, TIF is organized into subfolders bearing a random eight-character name. They are organized using a collection of INDEX.DAT files. Each file in TIF has a corresponding date and time value associated with it. This includes a “last-checked” time, which is used by the browser to determine if a newer version exists on the server. If so, then it will download the newer version.

Users can view their TIF anytime using Windows Explorer. Inside the TIF folder users will see a listing of its contents. Each item in the list will display an icon showing file type, file name, and the associated URL. It's important to understand that in this instance, what the user sees is a virtualized representation of the content. The actual items are kept in the TIF subdirectories. The only file that is actually kept here is the INDEX.DAT that keeps tabs on where the files are located inside the various subdirectories.

Webmail evidence can also be found in TIF. Hotmail, AOL, and Yahoo! can all leave messages and/or inbox information that can prove useful. These items can be recognized by the file names. Here are some examples:

More Advanced

Caching and HTTPS

If you've ever bought anything on the Internet or done any online banking, then odds are that you've used the HTTPS protocol. HTTPS is not just used for electronic commerce. It's also used for secure web-based e-mail.

HTTPS is a secure version of the HTTP protocol we use on the Internet. By default, and for security reasons, MSIE does not cache any HTTPS web pages. This is important to note, especially if you are investigating a case that might involve some sort of HTTPS web traffic. If so, then you may not find any remnants of this activity in cache.

web cache can be used to determine both culpability and intent. Much of what's in web cache will be thumbnails (those small images) along with bits and pieces of web pages.

Image size can impact a case, particularly those involving child pornography. If the suspect images are comprised entirely of small, cache-like images, then some prosecutors may be reluctant to file charges. The issue then becomes intent. Those images could have been downloaded automatically, without his consent. Images of such a small size can make for a much weaker case. Larger images, those not commonly found as part of a web page, are harder to explain away.

Internet History

Microsoft's Internet Explorer, the reigning king of browsers, keeps multiple historic user records. History is used to prevent a user from having to retype URLs into the address bar of the browser. The index.dat files track other details as well. For example, it tracks the number of times the site is visited, and the name of the file. The Internet history is organized in multiple folders and index.dat files. There are three folders: Daily, Weekly, and Cumulative.

These folders use a naming convention based on a set prefix followed by a date range. For example, a folder covering the Internet history from October 1, 2011, to October 8, 2011, would look like this:

People who have something to hide will often clear their history on a frequent basis. This can be done manually by the user or automatically by the system. By default, the history is set to clear every twenty days. The user can change this to clear much faster than that. Using a tool that can read the registry, you can view this information here:

NTUSERS\Software Microsoft\Windows\CurrentVersion\Internet Settings\URL History

More Advanced

The NTUSER.DAT File

The NTUSER.DAT file contains preference settings and individual information for each user profile. Browser history is part of this information. There is one NTUSER.DAT for each user profile on the system. Although technically a registry file, the NTUSER.DAT is located in the user folder. Note that we're talking about user “profiles” and not “users.” Putting a specific person on the keyboard is a very difficult if not impossible determination to make. Just because a person has a profile on the machine does not mean their fingers were on the keyboard at any given moment.

If this value is set less than the default of twenty days, this can be used to show the defendant took proactive steps to remove potentially incriminating evidence.

Internet Explorer Artifacts in the Registry

As part of its everyday function, MSIE deposits artifacts in the registry. These items are stored particularly in the NTUSER.DAT hive. Here we can see if the browser stores passwords, the default search engine, the default search provider, and more.

The registry can also tell us what URLs have been typed right into the browser's address bar. These are listed from 1 to 25 with the lowest number being the most recent. Only twenty-five entries can be kept at a time. The entries are purged on a first in, first out basis. Figure 8.1 shows you what they look like through a forensic tool.

image

Figure 8.1 Typed URLs as found in the Windows Registry. Graphic courtesy of Jonathan Sisson.

Here is the file path to this registry artifact:

NTUSER\Software\Microsoft\Internet Explorer\Typed URLs

Remember, the registry is not human-readable in its native form. To examine it you will need an appropriate tool. Some of these tools include Microsoft's RegEdit, Harlan Carvey's RegRipper, and AccessData's Registry Viewer.

Chat Clients

Chat applications are both popular and numerous. They are used for instant text-based communication. Popular applications include AOL Instant Messenger (AIM), Yahoo! Messenger, Windows Live Messenger, Trillian, Digsby, and many more. These clients can be used either to commit or to facilitate a variety of crimes. Pedophiles use these tools to solicit sex from minors or to distribute child pornography. Buyers and sellers use them to negotiate the sale and transfer of narcotics. The list can go on and on. Function varies from client to client as do the artifacts they leave behind. Function and residual evidence can also vary from version to version. It's difficult to keep up with the rapid pace at which these clients change. Changes can result in artifacts moving or disappearing. Rather than get “down in the weeds” with each application and version, we'll talk in broad terms of what kind of artifacts are possible and how they can be used as evidence.

Not unlike other software, chat client will leave artifacts of its installation. Paths and directories may vary somewhat. The presence or absence of these files and folders may help in proving or disproving that a specific client was used to communicate with a victim or accomplice.

Chat programs maintain a contact or “buddy” list. This list of screen names can be used to link individuals together, particularly if the other parties' screen names appear in the logs or on the drive. Screen names are often nonsensical, like “footballfan7878,” and can require some effort to connect them with a specific person. Entering screen names as part of your keyword search can also be very helpful. To complicate matters further, users can have multiple screen names. Many times these alternate identities assume a parent-child relationship with the primary identity.

Users can also choose to block people, preventing them from communicating with them. If this function is available, then this setting should be tracked somewhere, potentially leaving relevant artifacts. Often clients will also maintain a list of recent chats.

Other preferences that are under user control include embedding the date time in the chat, selecting a custom icon or image, and enabling or disabling logging. Logging can serve as a tremendous source of evidence if it's enabled.

Normally, logging is turned off by default, requiring the user to activate that function. Logs typically record the chat conversations and/or other related information like connection details, etc. Even if logging is turned off, the user can manually save that particular chat session should they need to. A major difference between having logging turned on and manually saving a session log is the location where the resulting file is saved. Auto-saved logs will normally go to a default location, whereas a destination will need to be selected for a manually saved log.

Another preference setting of interest involves the automatic acceptance of video calls, file transfers, real-time instant messages, and so on. By default, many of these features are disabled. This setting and the subsequent functionality can be used to prove that an image wasn't downloaded without consent. A suspect will have an uphill slog trying to get a jury to believe that they “had no idea” they were downloading child pornography through their chat client when the settings prove that they had to agree to accept it.

Some chat/IM clients are now allowing users to associate a cell phone (or more than one) with their account. This allows them to have IM messages forwarded to their mobile phone. In this situation, the cell number together with the account information could be used to help connect that person to a particular screen name.

Internet Relay Chat (IRC)

Commercial chat clients like Yahoo! and AOL are quite popular and in wide use. There are two other chat clients that are well worth exploring. These tools are arguably better suited for criminal activity. Internet Relay Chat or IRC is one such tool. IRC is a large chat network that has little to no oversight as it is under the control of no one single entity. It affords its user near total anonymity because there is no formal registration process. IRC is also free to use. The IRC network comprises many smaller networks such as Undernet, IRCnet, and EFnet, just to name a few (Casey, 2011). IRC users create their own chat rooms or “channels.” IRC attracts criminals with a wide range of interests looking to trade information or contraband. Network intrusion, identity theft, and child pornography represent some of the main criminal interests found on IRC.

IRC boasts some other features that make it attractive for criminals. Direct Client Connection (DCC) allows two users to connect directly from one machine to the other. In this mode the communication is totally private. This private traffic even avoids network servers, leaving no evidence for investigators to find.

E-Mail

Of all the potential sources of digital evidence, e-mail is one of the best. People often draft and send e-mail that they assume will never be read by anyone other than the intended recipient. As such, these often candid exchanges can (and have) come back to haunt the parties involved. It's also persistent, residing in multiple locations, thus making it harder to get rid of.

Accessing E-mail

E-mail is accessed and managed in one of two ways. The first is web-based e-mail such as Google's Gmail or Microsoft's Hotmail. These tools function through a web browser. The second is through an e-mail application (client). E-mail clients are specialized programs designed specifically for working with e-mail. Some applications also manage calendars, tasks, contacts, and more. Outlook and Windows Live Mail by Microsoft are two of the most popular e-mail clients on Windows systems. Outlook, the more robust of the two, is used primarily in the workplace or by power users. Windows Live Mail and its predecessor Outlook Express have much more limited functionality.

Outlook stores data in either a .pst or .ost file. Windows Live Mail and Outlook Express use .dbx. Getting at the individual messages from inside these containers is a concern, but much less so now that several current tools handle these file types natively. Individual e-mail messages (.msg files) can be exported out and given to investigators or attorneys for review.

Tracing E-mail

Tracing an e-mail message is heavily reliant on logs. As we learned earlier, each server along the e-mail's path adds information to the message's header. One of those bits of information is the Message ID. The message ID is a unique number assigned to the message by the e-mail server. Correlating the message ID with the server's logs is solid evidence that the message was received and sent by that particular machine. Again, the providers may purge those logs on a regular basis if they even keep them at all. Foreign providers will likely be very tough to deal with, making collection of this evidence that much harder.

Reading E-mail Headers

The e-mail header provides a record of the path the message took from sender to receiver (assuming steps weren't taken to alter or remove it). E-mail headers should be read from the bottom to the top. Below is a sample e-mail header from a message I may have sent to legendary Steeler linebacker Jack Lambert.

Delivered-To: Lambert58@gmail.com

Received: by 11.48.31.1 with SMTP1 id c2ct279nzg; Fri, 25 Oct 2011

22:38:23 −0800 (PST)

Return-Path:

Received: from mail.emailprovider.com (mail.myisp.com [12.34.567.890]) by mx.gmail.com with SMTP id f27se846431anc.2011.10.25.22.38.19; Fri, 25 Oct 2011 22:38:23 −0800 (PST)

Message-ID: <20111025233819.47097.mail@mail.myisp.com>

Received: from [12.34.567.890] by mail.myisp.com via HTTP; Fri, 25 Oct 2011 22:38:19 PST

Date: Fri, 25 Oct 2011 22:38:19 −0800 (PST)

From: John Sammons

Subject: Super Bowl

To: Jack Lambert

Delivered-To: Lambert58@gmail.com

The message recipient

Message-ID: <20111025233819.47097.mail@mail.myisp.com>

Received: from [12.34.567.890] by mail.myisp.com via HTTP; Fri, 25 Oct 2011 22:38:19 PST

This the record of the message being sent through Jack Lambert's email provider, mail.myisp.com.

Delivered-To: Lambert58@gmail.com

Received: by 11.48.31.1 with SMTP1 id c2ct279nzg; Fri, 25 Oct 2011 22:38:23 −0800 (PST)

Return-Path:

Received: from mail.emailprovider.com (mail.myisp.com [12.34.567.890]) by mx.gmail.com with SMTP id f27se846431anc.2011.10.25.22.38.19; Fri, 25 Oct 2011 22:38:23 −0800 (PST)

Finally,the message is transmitted from my email provider to Jack's Gmail account, Lambert58@Gmail.com

Note the message ID, 20111025233819.47097.mail@mail.myisp.com. Remember, this is a unique number assigned by an e-mail server (Google, 2011).

Social Networking Sites

E-mail and social media have at least one thing in common. There seems to be almost nothing that people won't send, post, or tweet. The fact that everyone seems to be on Facebook, Twitter, or some flavor of social media is not lost on law enforcement or prospective employers for that matter. Both groups routinely look to social media to learn more about suspects and prospective employees.

Social media evidence can be found in several places including the suspect's computer, smartphone, and the provider's network. Getting evidence from the provider will require relatively quick action along with a subpoena or search warrant. Remember, the provider only retains this information for a certain amount of time. At some point, the data you need will be purged without some legal intervention. All things considered, collecting the evidence from the provider might yield the best results.

Recovering evidence on the local machine can be a challenge. The page file (or swap space) is one location that could bear fruit. INDEX.DAT files also hold promise. Multiple artifacts can be found here. The confirmation e-mail (sent when the account is created) is found in the History.IE5\Index.dat file. The user's Facebook profile can be found on the local machine in a file named profile[#].htm. This is located in the Content.IE5 directories. The History.IE Index.dat file can hold Facebook friend searches.

Summary

The Internet functions in large part due to two protocols, specifically HTTP and TCP/IP. Another very common technology in wide use is HTML or Hyper-text Markup Language. HTML is one of the primary languages used to construct web pages. In digital forensics, evidence can be found within this code so it behooves us as examiners to be able navigate through it to locate any existing evidence.

We also looked at how web pages are found and sent to browsers using Uniform Resource Locators (URLs) and Domain Name Servers (DNS).

Peer-to-Peer (P2P) networks can be used to share not only pirated music and movies, but contraband such as child pornography as well.

Chapter 8 also looked at several artifacts generated from Internet and e-mail usage. This includes such things as INDEX.DAT records, Temporary Internet Files (TIF), the NTUSER.DAT file, cookies, and e-mail headers. Tracing an e-mail back to its origin is no easy feat as the identifying information can be forged or removed.

Chat clients and their associated logs are worth examining if found on a computer. Remember, logging may not be turned on by default.

IRC and ICQ are two modes of Internet communication that can't be ignored. These are two of the most popular ways for criminals (and others concerned with private communication) to help cover their trail.

Social networking is used worldwide today by a massive number of people. Social networking evidence can be found locally and remotely on the provider's network.

REFERENCES

1. Casey E. Handbook of Digital Forensics and Investigation Burlington, MA: Academic Press; 2009.

2. Casey E. Digital Evidence and Computer Crime: Forensic Science, Computers and the Internet Waltham, MA: Academic Press; 2011.

3. E.I. du Pont de Nemours and Company v. Kolon Industries, Inc., 2011 U.S. Dist. LEXIS 45888 (E.D. Va. April 27, 2011).

4. Google. (2011, September 21). Reading Full Email Headers. Retrieved October 24, 2011, from: http://mail.google.com/support/bin/answer.py?hl=en&answer=29436.

5. Network Solutions, LLC. (n.d.). WHOIS Behind That Domain Name? Retrieved October 13, 2011, from: http://www.networksolutions.com/whois/index.jsp.

6. Refsnes Data. (n.d.). HTML Introduction. Retrieved October 13, 2011, from: http://www.w3schools.com/html/html_intro.asp.