Now that you’ve seen how much information a web server can record about its visitors, you might be feeling a little uneasy. Let’s turn the tables and discuss how you can control the information that your browser gives to the servers to which it connects.
There are many reasons why you might not want a server to know anything about you. Seeing as you are reading this book, you might be investigating a dodgy web site and be concerned that the bad guys could identify you. You might be visiting sites that your government views as subversive and be worried about surveillance. Or you might be doing something illegal and not want to get caught.
The technology of the Internet, through its speed, ubiquity, and complete disdain for traditional national boundaries, has raised many complex issues involving civil liberties, censorship, law enforcement, and property laws. The technologies to protect or disguise your identity that are described here are at the heart of several of these debates. I encourage you to think about their ethical and political implications. The Electronic Frontier Foundation (EFF) (http://www.eff.org) is a vigorous champion of freedom on the Internet, and their site is an excellent resource.
If you want to disguise or hide your identity, then you have several choices, ranging from simple browser settings to sophisticated encryption and networking software.
The easiest approach is to modify the User-Agent string that your browser sends to the server. With some browsers, this is trivial. Konqueror, for example, can be set up to impersonate specific browsers on specific sites, or to send no User-Agent string at all. If you write your own Perl script to fetch web pages using the LWP module, you can have it masquerade as anything you want. Ideally, you should give it a unique name so that it can be identified, which lets a server decide whether or not to allow it access.
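The same idea carries over to other scripting languages. The following is a minimal sketch in Python rather than Perl, using the standard urllib.request module; the function name build_request and the agent name MyCrawler/0.1 are made up for illustration.

```python
import urllib.request


def build_request(url, agent_name):
    """Return a Request whose User-Agent header is set to agent_name."""
    request = urllib.request.Request(url)
    # Replace the library's default User-Agent with our own string.
    request.add_header("User-Agent", agent_name)
    return request


# Hypothetical usage. This performs a real network fetch, so it is
# left commented out here:
# request = build_request("http://www.example.com/", "MyCrawler/0.1")
# with urllib.request.urlopen(request) as response:
#     page = response.read()
```

Whatever string you pass as agent_name is what should show up in the User-Agent field of the target server's log.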
This sort of disguise can conceal the browser and operating system that you use, but that’s about it. In fact, it may work against you because some sites deliver browser-specific content. If you pretend to be using Internet Explorer when you are really using Safari, you may receive content that cannot be properly displayed.
The next step is to use a proxy that sits between your browser and the server you want to visit. A proxy is an intermediate server that takes your request, forwards it to the target server, accepts the content from that server, and passes that back to you. It has the potential to modify both the request you send and the content it receives. Proxies come in many forms. Some are used to cache frequently requested pages rather than fetch them from the original site every time. Some companies funnel requests from internal users through a proxy to block visits to objectionable web sites. There are two types that are particularly relevant to our interests. The first is a local proxy that provides some of the privacy features that are lacking from most browsers. The second is an external proxy through which we send our requests and that can mask our IP address.
Privoxy is an example of a local proxy that provides a wide range of filtering capabilities. It can process the outgoing requests sent from your browser to modify User-Agent and other headers. It can also modify incoming content to block cookies, pop-ups, and ads.
The software is open source and is available from http://www.privoxy.org. You install it on your client computer, rather than on a server, and then configure your browser to send all HTTP and SSL requests to port 8118 on localhost. Figure 7-1 shows the proxy configuration dialog box for Firefox running on Mac OS X. Other browsers have a similar interface.
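Scripts can be pointed at the same local proxy. As a rough sketch, assuming a proxy is listening on port 8118 of the local machine as described above, Python's urllib.request can route its requests through it. The helper names here are invented for the example.

```python
import urllib.request


def local_proxy_settings(host="127.0.0.1", port=8118):
    """Map each URL scheme to the address of the local proxy."""
    address = f"http://{host}:{port}"
    return {"http": address, "https": address}


def build_proxy_opener(host="127.0.0.1", port=8118):
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler(local_proxy_settings(host, port))
    return urllib.request.build_opener(handler)


# Hypothetical usage (needs a proxy actually running on that port):
# opener = build_proxy_opener()
# page = opener.open("http://www.example.com/").read()
```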
The software then applies a series of filters to the request
according to the actions that you have defined. You set these up by
going to the URL http://config.privoxy.org,
which is actually served by privoxy
running on your machine. Configuring the software is quite daunting
due to the large number of options. I’ll limit my description to just
a few of the more important ones.
To change the configuration, go to http://config.privoxy.org/show-status and click on the Edit button next to the default.action filename in the first panel of that page. This pulls up a confusing page that lists a great many actions, most of which apply to incoming content and can be safely ignored. Click on the first Edit button in the section entitled “Editing Actions File default.action”. This brings up a page of actions, each with radio buttons that can enable or disable that filter. You are strongly advised not to mess with any filters that you do not understand.
Perhaps the most useful of these is the hide-referrer action, which is enabled by default. Normally your browser would forward the URL of the page that contained the link to the current page. With this filter you can remove this header completely, set it to a fixed arbitrary URL, or set it to the root page for the target site. The last of these is the preferred option, as some sites will only serve images if the request was referred from a page on their site. Earlier in this chapter, I mentioned how query strings from Google searches can be included in the referrer header and can then be logged by the target site. Using this privoxy filter allows you to hide this information.
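To make the three choices concrete, here is a small Python sketch of the rewriting logic. It only illustrates the behavior described above and is not how privoxy itself is implemented or configured. The mode names remove, fixed, and site-root are labels invented for this example. Note that the header is spelled Referer in HTTP.

```python
from urllib.parse import urlsplit


def rewrite_referer(headers, target_url, mode,
                    fixed_url="http://www.example.com/"):
    """Return a copy of headers with the Referer header rewritten.

    mode is one of three labels made up for this sketch:
      "remove"    - drop the header completely
      "fixed"     - replace it with fixed_url
      "site-root" - replace it with the root page of the target site
    """
    # Start from a copy that has no Referer header at all.
    result = {name: value for name, value in headers.items()
              if name.lower() != "referer"}
    if mode == "remove":
        return result
    if mode == "fixed":
        result["Referer"] = fixed_url
        return result
    if mode == "site-root":
        parts = urlsplit(target_url)
        result["Referer"] = f"{parts.scheme}://{parts.netloc}/"
        return result
    raise ValueError(f"unknown mode: {mode}")
```

With the site-root mode, a request for an image on a site appears to have been referred from that site's own home page, and any search query string in the original header is discarded.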
The hide-user-agent action can be used to disguise the identity of the browser. Click on the enable button next to this item. Below it will appear an entry box that contains the string: Privoxy/3.0 (Anonymous). You don’t want to use this because it tells the server that you are disguising your identity. Instead, take the default User-Agent string from your browser and strip out the text that identifies the version of either the browser or the operating system. For example, if the original string was this:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
You would replace it with this abbreviated form:
Mozilla/5.0 (Macintosh) Firefox
This allows the server to figure out what type of browser is being used and deliver appropriate content, while not revealing information that might be useful to an attacker. Figure 7-2 shows the relevant section of the configuration page.
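If you want to produce the abbreviated form programmatically, something like the following Python sketch would do it. It assumes the string follows the common "product (platform; details) product ..." layout shown above. The function name is invented, and strings in other layouts are returned unchanged.

```python
import re


def abbreviate_user_agent(user_agent):
    """Keep the leading token, the first platform field, and the bare
    name of the last product token; drop all other version details."""
    match = re.match(r"^(\S+)\s+\(([^;)]*)[^)]*\)\s*(.*)$", user_agent)
    if match is None:
        # Unrecognized layout: leave the string alone.
        return user_agent
    lead, platform, rest = match.groups()
    abbreviated = f"{lead} ({platform.strip()})"
    products = rest.split()
    if products:
        # "Firefox/1.0" becomes "Firefox"
        abbreviated += " " + products[-1].split("/", 1)[0]
    return abbreviated
```

Applied to the Firefox string above, this yields Mozilla/5.0 (Macintosh) Firefox.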
You can check what privoxy is actually doing to your requests by going to http://config.privoxy.org/show-request, which shows the headers before and after it has modified them.
Neither of these approaches does anything to hide the IP address of your computer. To do that, you need an external proxy that will forward your request to the target server and return the content to your browser. There are many sites on the Internet that have been set up to provide this service. Typically you go to their home page and type in the URL you want to view. With a basic proxy, the IP address of the proxy site, rather than your own, will appear in the log of the target server. Sites vary in their level of sophistication. Some will redirect requests among their own set of servers so that no one address is used all the time. Others maintain a list of active proxies elsewhere on the Net and redirect through these, adding further steps between yourself and the target server. A Google search will turn up many examples; these are a few that are active at the time of writing:
Sites like these are set up for various reasons. Some people believe strongly in Internet freedom and want to provide a service to the community. Others are set up to help people who want to view pornography or other questionable, but legal, material, perhaps making some money in the process by serving up ads to their users. Undoubtedly there are some, lurking in the back alleys of the Net, that cater for those interested in illegal material such as child pornography.
Proxies are a dual-use technology. They can just as well protect a whistle-blower or dissident as they can protect a pedophile downloading child pornography. That poses a serious liability for people who operate proxy sites. If their server is involved in illegal activity, whether they know it or not, it will be their door that the FBI will be knocking on. Many proxies have been set up with the best of intentions only to find their service abused. Some have been shut down by the authorities, some have shut themselves down, and, without wanting to sound too paranoid, you can bet that some of them are honeypots, set up by the authorities, that exist solely to intercept and trace illegal traffic.
Proxy servers can protect the identity of an individual who accesses a specific server. But they do nothing to protect someone from a government that is able to monitor and trace traffic passing through the network, either by packet sniffing or through the use of compromised proxies. Truly anonymous browsing needs to use technology at a whole other level of sophistication that combines proxies with encryption. That technology, albeit in its infancy, is already available to us. One of the front-runners in this field is Tor, a project started by the Free Haven Project and the U.S. Naval Research Lab that was recently brought under the wing of the EFF (http://tor.eff.org).
Tor uses a network of servers, or nodes, dispersed across the Internet to implement what is called an onion routing network. This paper provides a detailed technical background to the project: http://tor.eff.org/cvs/tor/doc/design-paper/tor-design.pdf.
It works by redirecting an HTTP request through multiple Tor nodes until finally sending it to the target web server. All communication between nodes is encrypted in such a way that no single node has enough information to decode the messages. Each node is a proxy, but not in the simple sense that we’ve been talking about thus far.
A Tor transaction starts with a regular web browser making a request for a page on a remote web server. The Tor client consults a directory of available nodes and picks one at random as the first hop towards the target server. It then extends the path from that node to a second one, and so on until there are deemed to be enough to ensure anonymity. The final node in the path is called the exit node. It will send the unencrypted request to the target web server and pass the content back along the same path to the client. All data sent between nodes on the network is encrypted and each node has a separate set of encryption keys generated for it by the client. The upshot is that any given node in the system, other than the client, only knows about the node it received data from and the one it sent data to. The use of separate encryption keys prevents any node from eavesdropping on the data it passes down the chain. This idea of building a path incrementally through the network is conceptually like peeling away the layers of an onion, hence the name onion routing.
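The layering idea can be shown with a toy Python sketch. This is emphatically not Tor's actual protocol or cryptography: the "cipher" is a simple XOR against a SHA-256 keystream, used only so the example runs with the standard library, and the node names and function names are invented. What it does show is the general onion-routing pattern, in which the client wraps the request once per node and each node can remove only its own layer, learning nothing except the next hop.

```python
import hashlib
import secrets


def keystream(key, length):
    """Derive length pseudo-random bytes from key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key, data):
    """Apply or remove one toy encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(request, path, keys, destination):
    """Wrap request in one layer per node, innermost layer for the exit node.

    Returns the name of the first node and the blob to send to it.
    """
    payload = request
    next_hop = destination
    for node in reversed(path):
        # Each layer holds the next hop's name plus the inner blob.
        plaintext = next_hop.encode() + b"\n" + payload
        payload = xor_layer(keys[node], plaintext)
        next_hop = node
    return next_hop, payload


def peel(node_key, blob):
    """Remove one layer; return the next hop's name and the inner blob."""
    plaintext = xor_layer(node_key, blob)
    next_hop, _, inner = plaintext.partition(b"\n")
    return next_hop.decode(), inner


# Hypothetical three-node path with a separate random key per node.
path = ["entry", "middle", "exit"]
keys = {name: secrets.token_bytes(32) for name in path}
```

Calling build_onion and then peel at each node in turn hands the original request to the exit node, while the entry and middle nodes only ever see a next-hop name and an opaque blob.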
The path selection and encryption prevent anyone from observing the traffic passing through the network. The target web server sees only the IP address of the exit node, and it is impossible to trace a path back to the client. Furthermore, the lifespan of a path through the network is short, typically less than a minute, so that consecutive requests for pages from a single client will most likely come from different exit nodes.
Tor is available for Windows, Mac OS X, and Unix. Installation as a client is straightforward. Installing privoxy is recommended alongside Tor, and happens automatically with the Mac OS X installation. To use the network you need to set your browser to use a proxy. That configuration is identical to the one described earlier for privoxy.
Once you have it configured, the software works quietly in the background. It does slow things down, sometimes significantly. This is a function of the number of server nodes and the traffic going through them at any one time. The Tor project team encourages users of the system to contribute to its success by setting up server nodes. The more servers there are, the better the performance and the more secure the system.
Here is an example of some edited Apache log entries for a regular browser following a series of links from one page to another:
208.12.16.2 "GET /index.html HTTP/1.1"
208.12.16.2 "GET /mobile/ora/index.html HTTP/1.1"
208.12.16.2 "GET /mobile/ora/wurfl_cgi_listing.html HTTP/1.1"
The owner of the web server can see a single machine and the path it takes through the site. Now look at the same path when run through Tor:
64.246.50.101 "GET /index.html HTTP/1.1"
24.207.210.2 "GET /mobile/ora/index.html HTTP/1.1"
67.19.27.123 "GET /mobile/ora/wurfl_cgi_listing.html HTTP/1.1"
Each page appears to have been retrieved from a separate browser, none of which is the true source of the request.
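The difference is easy to see if you group the requested pages by client address, which is roughly what a site owner reconstructing visitor paths would do. The Python sketch below assumes log lines in the simplified, edited form shown above (an IP address followed by a quoted request), not a full Apache log format. The function name is made up.

```python
from collections import defaultdict


def paths_by_client(log_lines):
    """Group requested paths by client IP address, preserving order."""
    paths = defaultdict(list)
    for line in log_lines:
        ip, _, request = line.strip().partition(" ")
        # The request looks like: "GET /index.html HTTP/1.1"
        fields = request.strip('"').split()
        if len(fields) >= 2:
            paths[ip].append(fields[1])
    return dict(paths)
```

Run against the first set of entries, this returns one address with a three-page path. Run against the Tor entries, it returns three addresses with one page each.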
As it stands, Tor is a great way to protect your communications from attempts at eavesdropping, and it effectively shields your IP address from any site that you visit. Of course, no system is perfect. Even though a site cannot determine your IP address, its owners can still detect that someone is visiting by way of the Tor network, which might indicate to them that they are under investigation.
We can download the list of all the current active Tor nodes (http://belegost.seul.org/), and then look for their IP addresses in our logs. At the time of this writing, there are only 134 of these, so this is not difficult. Sets of log records with these IP addresses, close together in time, would suggest that a site is being accessed via the Tor network. Looking at the collection of pages that were visited and, if possible, the referring pages, could allow us to piece together the path taken by that visitor. For this reason, it is especially important that you set up privoxy in conjunction with Tor and have it hide your referring page.
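The matching step is simple enough to sketch in a few lines of Python. This assumes you have already extracted the node IP addresses from the downloaded list into a plain collection of strings (the format of that list is not covered here), and that log lines begin with the client IP address as in the edited examples above. The function name is invented.

```python
def requests_from_known_nodes(log_lines, node_addresses):
    """Return the log lines whose client IP is in node_addresses."""
    nodes = set(node_addresses)
    hits = []
    for line in log_lines:
        # The client IP address is the first space-separated field.
        fields = line.split(None, 1)
        if fields and fields[0] in nodes:
            hits.append(line)
    return hits
```

Sorting the hits by timestamp, where your log format includes one, would then show whether requests from different nodes cluster closely enough in time to look like a single visitor.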
Tor is a work in progress. The technology behind it is sophisticated, well thought out, and well implemented. It addresses most of the technical issues that face any scheme for anonymous communication. While the network is still small, it is growing and has solid backing from the EFF and others. How it will deal with the inevitable problem of abuse remains to be seen. Finding a technical solution to this social problem is probably impossible.
As a practical matter, if you are going to be poking around web sites that are involved in phishing or other shady business, then it makes sense to hide your identity from them using Tor. It’s a simple precaution that guards against the outside possibility that someone will get upset with you and flood you with spam or try to break into your machine.
On a lighter note, I do have to warn you about certain side effects when you use Tor for regular browsing. Some sites, such as Google, look at the IP address that your request is coming from and deliver content tailored to that part of the world. With Tor, you cannot predict which exit node your request will finally emerge from. It had me scratching my head for quite a while the first time my Google search returned its results in Japanese!