Types of Proxy Servers

As mentioned earlier, the term proxy server can refer to many different things. And even within the definition of proxies that are of particular interest to webbot developers, you’ll find wide diversity. This section describes some of the proxy servers available to you, as well as their advantages and disadvantages.

For a variety of reasons, which will be described later, thousands of proxy servers are available on the Internet for you to use freely. These proxies are known as open proxies. Just as with the proxy servers we discussed earlier, when you connect your webbot or browser to an open proxy, you assume that proxy’s IP address and, by extension, its apparent physical location.

To experience an open proxy for yourself, do an Internet search on the term “open proxy.” Within the search results, you will find links to services that list hundreds, if not thousands, of open proxies. Figure 27-6 shows a representative list (from http://www.xroxy.com[77]).

In addition to the proxy’s IP address and port number, most of these lists also provide other information about the proxy, such as the proxy type (this will probably be HTTP, SOCKS, or SOCKS5), the country of origin, whether the proxy server supports SSL encryption, the latency (the amount of time it takes to get a response from the proxy), and some type of reliability rating.
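To make those fields concrete, here is a minimal sketch of how a webbot might hold one row of such a list and narrow the list down to usable candidates. It is written in Python rather than the PHP used elsewhere in this book, and everything in it is an assumption made for illustration: the ProxyEntry record, the pick_candidates function, the 0.0 to 1.0 reliability scale, the thresholds, and the sample rows (which use addresses reserved for documentation, not real proxies). Actual lists differ in which fields they publish and in how they rate reliability.

```python
from dataclasses import dataclass


@dataclass
class ProxyEntry:
    """One row from a hypothetical open-proxy listing."""
    ip: str
    port: int
    proxy_type: str      # e.g. "HTTP" or "SOCKS5"
    country: str         # two-letter country code
    supports_ssl: bool
    latency_ms: int      # response time reported by the list
    reliability: float   # assumed 0.0-1.0 scale; real lists vary


def pick_candidates(entries, proxy_type="HTTP", max_latency_ms=1000,
                    min_reliability=0.8):
    """Return entries of the wanted type that meet both thresholds,
    sorted so the lowest-latency proxy comes first."""
    matches = [
        e for e in entries
        if e.proxy_type == proxy_type
        and e.latency_ms <= max_latency_ms
        and e.reliability >= min_reliability
    ]
    return sorted(matches, key=lambda e: e.latency_ms)


# Made-up sample rows; the addresses come from ranges reserved for
# documentation, so none of them is a working proxy.
SAMPLE = [
    ProxyEntry("192.0.2.10", 8080, "HTTP", "US", True, 450, 0.92),
    ProxyEntry("198.51.100.7", 1080, "SOCKS5", "DE", False, 300, 0.97),
    ProxyEntry("203.0.113.25", 3128, "HTTP", "BR", False, 2200, 0.99),
    ProxyEntry("192.0.2.44", 80, "HTTP", "FR", True, 120, 0.85),
    ProxyEntry("198.51.100.90", 8080, "HTTP", "JP", True, 200, 0.40),
]

if __name__ == "__main__":
    for entry in pick_candidates(SAMPLE):
        print(f"{entry.ip}:{entry.port} ({entry.country}, {entry.latency_ms} ms)")
```

Filtering on the published latency and reliability figures only trims the list. Those numbers are whatever the listing service last measured, so a proxy that looks good on paper may still be gone by the time your webbot tries it.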

I prefaced the first paragraph in this section with the words “for a variety of reasons,” because like many things you’ll find online, not everything is as it appears to be. Before you use an open proxy, you should ask yourself why anyone would open up their network and allow strangers to consume their resources. The truth is that there are very few legitimate reasons for anyone to do so. So why are these open proxies made available?

Many open proxies are actually misconfigured servers that allow open relaying of connections. This can happen for many reasons, including when a system administrator installs a mail server and never bothers to change the default settings.

It is also strongly suspected that law enforcement agencies, governments, and cyber voyeurs operate open proxies, either to detect or conceal criminal activity or to uncover covert political movements. Still other open proxies are run unknowingly by ordinary people who inadvertently installed them when they downloaded malware or viruses.

Open proxies are good for learning, but I would not recommend them for production use. Since you don’t control the open proxy’s environment, and since the service isn’t guaranteed, there is no way to predict whether the proxy’s performance will hold up or whether the proxy will even be there when you really need it. The other problem with open proxies is that, as we mentioned earlier, you don’t know who is operating the service, so never use an open proxy when you are transmitting confidential information like usernames or passwords.

Tor is an anonymous proxy service that is based on US naval technology. While the military is believed to still use this technology, it is now an open source project maintained by the nonprofit Tor Project (http://www.TorProject.org).

Unlike open proxies, Tor is a voluntary community of proxies that relay traffic through a varying route of community servers until finally exiting at a Tor endpoint. This technique makes tracing traffic back to its origin very difficult. Tor also encrypts traffic as it passes through its network, so there is reduced danger of being identified by network sniffers. Because of Tor’s availability (it’s free) and success, it has been embraced by journalists, military personnel, law enforcement, political dissenters, webbot developers, and people like you and me.

A lot could be written about Tor, but you will only find the basics here. You are strongly encouraged to visit the Tor Project website to learn more and possibly even contribute to the project.

In addition to open proxies and Tor, a variety of commercial proxy products are available for purchase. The quality, features, and price vary from provider to provider, but most let you restrict the IP addresses you use to those originating in a specific country.

It is not the intent of this book to endorse any specific proxy service providers. However, two of the bigger players in this segment are Anonymizer (http://www.Anonymizer.com) and HideMyIP (http://HideMyIP.com). You can find a wide selection of similar proxy services by performing an online search with the term “anonymous browsing.”

Available commercial proxy services range from marginal to downright amazing. Some of the more compelling proxy services utilize many thousands of IP addresses, have low network latency, and change IP addresses every few seconds. Less desirable proxy services are slow, change IP addresses infrequently, and put a small pool of available addresses at your disposal.

Pricing also varies widely. Some proxy services are available for a few dollars a month, but the more advanced proxies (which provide the most anonymity) are priced per HTTP GET and can become quite expensive in commercial webbot environments.

One thing that many commercial, or consumer-oriented, proxy services have in common is that they deviate from traditional proxies in the way they are configured. Instead of having you set a proxy’s IP address and port in a browser or PHP/CURL configuration, these programs work with the browser to intercept web traffic and automatically route it through their network. This “configure-less” environment makes it easier for consumers to set up and use the proxy, but it also makes it next to impossible for a webbot employing PHP/CURL to use such services. They are, however, ideal for the browser macro applications discussed in Chapter 23 and Chapter 24.
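For contrast, here is a minimal sketch of the traditional style of configuration, in which the script itself is told the proxy’s IP address and port. In PHP/CURL this is normally done with the CURLOPT_PROXY option. The sketch below uses Python’s standard urllib library as a stand-in, and the build_proxy_opener function name and the proxy address (taken from a range reserved for documentation) are placeholders invented for this example. It assumes an HTTP-type proxy; SOCKS proxies need other tooling.

```python
import urllib.request


def build_proxy_opener(proxy_ip, proxy_port):
    """Return a urllib opener that sends HTTP and HTTPS requests
    through the HTTP proxy at proxy_ip:proxy_port."""
    proxy_url = f"http://{proxy_ip}:{proxy_port}"
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


# Placeholder address from a documentation-only range; substitute the
# IP address and port of a proxy you are entitled to use.
opener = build_proxy_opener("192.0.2.10", 8080)

# With a working proxy, a fetch would then look like this:
# page = opener.open("http://www.example.com/", timeout=10).read()
```

A configure-less service gives a script nothing to plug into a call like this, because it publishes no address and port to connect to. That is why those services suit browser-based tools better than PHP/CURL webbots.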



[77] The current web page containing this list is found at http://www.xroxy.com/proxylist.htm.