HTTP

HyperText Transfer Protocol (HTTP) is the protocol that the web is based on. The protocol itself is relatively secure and straightforward to allow through firewalls.

HTTP itself is a simple protocol, but it can carry quite complex data. Because HTTP is simple and popular, most people let it through their firewalls. Because it can carry complex data, it's easy to use it to carry other protocols, which can be useful. For instance, as a firewall maintainer, you may prefer having audio data come in over HTTP to having to configure more open ports for audio data (your users may not, since the quality may not be very good).

On the other hand, tunneling can also allow inherently insecure protocols to cross your firewall. For this reason, it may be advantageous to use a firewall solution that does content-based checking of HTTP connections, so that you can disallow connections that are actually tunneling other protocols. This can be quite difficult to do.

Different programs use different methods of "tunneling". These range from simply running their normal protocol on port 80, to including support for HTTP proxying using the "CONNECT" method (discussed later in the section about HTTP proxying), to actually using HTTP with a data type that the client handles specially.

Some of these are much easier to filter out than others. For instance, almost any content checking, whether it's an intelligent packet filter or an HTTP-aware proxy, will get rid of people running protocols other than HTTP on port 80. Similarly, most HTTP proxies will let you control what destinations can be used with CONNECT, and you should restrict them carefully to just the destinations that you need.
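As an illustration of the simplest kind of content check, here is a minimal Python sketch that decides whether the first bytes a client sends on port 80 form a plausible HTTP/1.x request line. It assumes the filter or proxy can see those bytes before relaying anything. The function name and the method list are illustrative, not taken from any particular product. A check like this catches programs that run their own protocol on port 80, but does nothing against tunnels that really do speak HTTP.

```python
import re

# Methods from the HTTP/1.1 specification, plus PATCH (RFC 5789).
# A real filter would make this list configurable.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                 "OPTIONS", "TRACE", "CONNECT", "PATCH"}

# request-line = method SP request-target SP HTTP-version CRLF
REQUEST_LINE = re.compile(rb"^([A-Z]+) (\S+) HTTP/(\d)\.(\d)\r\n")


def looks_like_http_request(first_bytes: bytes) -> bool:
    """Return True if the client's opening bytes look like an HTTP/1.x
    request line; anything else (an SSH banner, raw audio, ...) fails."""
    match = REQUEST_LINE.match(first_bytes)
    if match is None:
        return False
    method = match.group(1).decode("ascii")
    major_version = int(match.group(3))
    return method in KNOWN_METHODS and major_version == 1
```

An SSH client's opening banner, for example, fails the check because it does not have the method, target, and version layout of a request line.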

Tunneling that actually uses HTTP, on the other hand, is very difficult to filter out successfully. In order to get rid of it, you need to do content filtering on the HTTP stream and remove the relevant data types. Relatively few firewalls support this functionality, and it's very difficult to do successfully in any case. The problem is that if you remove only the data types that you know are being used for tunneling, you are setting up a policy that allows connections by default, which is guaranteed to leave you with a continuous stream of new problems. On the other hand, if you accept only data types that you believe to be safe, you are going to have a continuous stream of new complaints from users, because many data types are in use on the web, and they change rapidly.
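The trade-off between the two policies can be sketched in a few lines of Python. Both type lists here are hypothetical placeholders; real lists would be much longer and would need constant maintenance. The point is the default: the deny-list function passes a tunnel type nobody has catalogued yet, while the allow-list function blocks it along with every legitimate new type.

```python
# Hypothetical policy lists, for illustration only.
KNOWN_TUNNEL_TYPES = {"application/x-example-tunnel"}
KNOWN_SAFE_TYPES = {"text/html", "text/plain", "image/gif",
                    "image/jpeg", "image/png"}


def media_type(content_type_header: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def deny_list_allows(content_type_header: str) -> bool:
    # Default allow: anything not known to be bad gets through.
    return media_type(content_type_header) not in KNOWN_TUNNEL_TYPES


def allow_list_allows(content_type_header: str) -> bool:
    # Default deny: anything not known to be good is blocked.
    return media_type(content_type_header) in KNOWN_SAFE_TYPES
```

Both functions also trust the Content-Type label, which the server chooses; a tunnel that simply labels its data text/html defeats either policy, which is part of why this filtering is so hard to do well.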

Fortunately, tunneling over genuine HTTP has fairly limited uses. HTTP is built around interactions that look like normal web browsing: the client sends a request, and the server sends a response. The client can send information only as part of a request, and the server can send information only in reply to one; the server has no way to start an exchange on its own. This model works well for tunneling some other protocols (for instance, it's fine for RealAudio, where the client asks for a stream and the server sends it) but poorly for protocols that need prolonged back-and-forth interaction between the client and the server. This doesn't prevent people from tunneling any protocol they like over HTTP, but it does at least make doing so more difficult and less efficient.

There is unfortunately no good solution to the general problem of tunneled protocols. Using proxying to make sure that connections are using HTTP, and controlling the use of CONNECT, will at least limit your exposure.

We have been discussing web servers, programs that exist purely to provide content via HTTP and related protocols. But HTTP is a straightforward and widely implemented protocol, so a number of things speak HTTP not to provide random content, but for some specialized purpose. The classic example is the administrative interface to normal HTTP servers. If you're administering a web server, you probably have a browser handy, so what's more natural than using the browser to do the administration? For a number of reasons, you don't want the administrative interface built into the standard server (among other things, common administrative tasks involve stopping and starting the server; stopping it while you're talking to it is one thing, but starting it again is a neat trick if it's not there to talk to). Therefore, there is often a second server that speaks the HTTP protocol but doesn't behave exactly like a normal web server reading information out of files.

These days, other programs and even hardware devices may provide HTTP interfaces. For instance, you can buy a power strip with a built-in web server, allowing you to turn its outlets on and off from a web browser. These servers do not behave like the servers we have been discussing, and the fact that they speak the HTTP protocol doesn't give you any particularly good idea of what their security vulnerabilities may be.

You will have to assess the security of each of these servers separately. Some of the questions you should ask are:

In general, you do not want to allow connections to these servers to cross a firewall.

HTTP is a TCP-based service. Clients use random ports above 1023. Most servers use port 80, but some don't. To understand why, you need some history.
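The port usage just described translates into a classic pair of packet filtering rules for internal clients reaching external servers. The following is a minimal Python sketch, assuming a stateless filter that sees the packet's direction, its ports, and the TCP ACK bit; the Packet type and function name are hypothetical. Requiring ACK on inbound packets stops outside hosts from opening new connections, because the first packet of a TCP connection has ACK clear.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    direction: str  # "out" (internal to external) or "in"
    src_port: int
    dst_port: int
    ack: bool       # is the TCP ACK flag set?


HTTP_PORT = 80


def allow_outbound_http(pkt: Packet) -> bool:
    """Permit internal clients to reach external HTTP servers on port 80."""
    if pkt.direction == "out":
        # Client to server: from a port above 1023 to port 80.
        return pkt.src_port > 1023 and pkt.dst_port == HTTP_PORT
    # Server to client: only packets that belong to an existing connection.
    return pkt.src_port == HTTP_PORT and pkt.dst_port > 1023 and pkt.ack
```

Note that under these rules a server on a port other than 80 is simply unreachable.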

Many information access services (notably HTTP, WAIS, and Gopher) were designed so that the servers don't have to run on a fixed well-known port on all machines. A standard well-known port was established for each of these services, but the clients and servers are all capable of using alternate ports as well. When you reference one of these servers, you can include the port number it's running on (assuming that it's not the standard port for that service) in addition to the name of the machine it's running on. For example, an HTTP URL of the form http://host.domain.example/file.html is assumed to refer to a server on the standard HTTP port (port 80); if the server were on an alternate port (port 8000, for example), the URL would be written http://host.domain.example:8000/file.html.
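The rule for finding the port from a URL can be shown with Python's standard urllib.parse module. This is a minimal sketch; the helper name and the small table of default ports are illustrative assumptions, covering only http (80), https (443), and gopher (70).

```python
from urllib.parse import urlsplit

# Default ports for a few schemes; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443, "gopher": 70}


def server_port(url: str) -> int:
    """Return the TCP port a client will connect to for this URL."""
    parts = urlsplit(url)
    # .port is None when the URL does not name a port explicitly.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]
```

With the two URLs from the example above, server_port gives 80 for http://host.domain.example/file.html and 8000 for http://host.domain.example:8000/file.html.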

The protocol designers had two valid reasons for designing these services this way:

The ability to provide these services on nonstandard ports has its uses, but it complicates things considerably from a packet filtering point of view. If your users wish to access a server running on a nonstandard port, you have several choices:

The good news is that the vast majority of these servers (probably more than 90 percent of them) use the standard port, and the more widely used and important the server is, the more likely it is to use the standard port. Many servers that use nonstandard ports use one of a few easily recognizable substitutes (800 or 8000, for instance).

Some servers also use nonstandard ports to run secondary servers. Traditionally, HTTP proxies use port 8080, and administrative servers use a port number one higher than the server they're controlling (81 for administering a standard web server and 8081 for administering a proxy server).

Your firewall will probably prevent people on your internal network from setting up their own servers at nonstandard ports (you're not going to want to allow inbound connections to arbitrary ports above 1023). You could set up such servers on a bastion host, but wherever possible, it's kinder to other sites to leave your servers on the standard port.

Various HTTP clients (such as Internet Explorer and Netscape Navigator) transparently support various proxying schemes. Some clients support SOCKS; others support user-transparent proxying via special HTTP servers, and some support both. (See the discussion of SOCKS and proxying in general in Chapter 9.)

HTTP proxies of various kinds are extremely common, and many incorporate caching, which can provide significant performance advantages for most sites. (A caching proxy is one that makes a copy of the requested data, so that if somebody else requests the same data, the proxy can fulfill the request with the copy instead of going back to the original server to request the data again.) In addition, many sites are worried about the content that people access via HTTP and use proxies to control accessibility (for instance, to prevent access to sites containing pornography, stock prices, or sports scores, all of which are common nonbusiness uses of the web by employees).
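The basic idea of a caching proxy can be modeled in a few lines of Python. This is a toy sketch: the class name is hypothetical, and the origin fetch is passed in as a function so the model can be exercised without a network. A real cache must also honor expiry times and Cache-Control headers, which this sketch ignores.

```python
from typing import Callable, Dict


class CachingFetcher:
    """First request for a URL goes to the origin server; later requests
    for the same URL are answered from the stored copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how often we actually went to the origin

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        return self._store[url]
```

If two users request the same page, origin_requests ends up at 1: the second request never leaves the site, which is where the performance advantage comes from.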

Clients that are speaking to HTTP proxy servers use HTTP, but they use slightly different commands from the ones they'd normally use. A client that wants to get the document known as "http://amusinginformation.example/foodle" without using a proxy will connect to the host amusinginformation.example and send a command much like "GET /foodle HTTP/1.1". In order to use an HTTP proxy, the client will connect to the proxy instead and issue the command as "GET http://amusinginformation.example/foodle HTTP/1.1". The proxy will then connect to amusinginformation.example and send "GET /foodle HTTP/1.1" and return the resulting page to the client.
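The difference between the two forms of request can be sketched in Python. The helper names and the proxy hostname proxy.internal.example are hypothetical. A real HTTP/1.1 request would also carry a Host header and end with a blank line; only the host to connect to and the request line are shown here.

```python
from typing import Tuple
from urllib.parse import urlsplit


def direct_request(url: str) -> Tuple[str, str]:
    """(host to connect to, request line) when talking to the origin server."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, f"GET {path} HTTP/1.1"


def proxied_request(url: str, proxy_host: str) -> Tuple[str, str]:
    """(host to connect to, request line) when using a proxy: connect to
    the proxy and send the full URL so it knows where to go."""
    return proxy_host, f"GET {url} HTTP/1.1"
```

For http://amusinginformation.example/foodle, the direct form connects to amusinginformation.example and sends "GET /foodle HTTP/1.1". The proxied form connects to the proxy and sends the complete URL.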

Some HTTP proxy servers support requests that normal HTTP servers don't. For instance, they may accept a request like "GET ftp://amusinginformation.example/foodle HTTP/1.1" (to have the proxy server transfer the named file via FTP and return it to the client) or the CONNECT method, as in "CONNECT amusinginformation.example:873 HTTP/1.1" (to have the proxy server make a TCP connection to the named port and relay information between it and the client). CONNECT is now defined in the HTTP specification, but which of these extensions a given proxy supports varies from product to product; FTP and CONNECT are two of the most common. Most web browsers will support using an HTTP proxy server for FTP and Gopher connections, and common web proxies (for instance, Microsoft Proxy Server) will support FTP and Gopher.
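Here is a minimal Python sketch of the client side of CONNECT. It builds the request a client sends to the proxy and checks the proxy's status line. The function names are hypothetical, and the sketch does no network I/O. Once the proxy answers with a 2xx status, the bytes on the connection are whatever protocol the client speaks (rsync, in this example), and the proxy just copies them in both directions.

```python
def connect_request(host: str, port: int) -> bytes:
    """Build the request asking a proxy for a raw TCP relay to host:port."""
    target = f"{host}:{port}"
    return (f"CONNECT {target} HTTP/1.1\r\n"
            f"Host: {target}\r\n"
            "\r\n").encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's status line reports success (any 2xx code)."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return (len(parts) >= 2 and parts[0].startswith(b"HTTP/")
            and parts[1].isdigit() and parts[1].startswith(b"2"))
```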

Some clients that are not web browsers will allow you to use an HTTP proxy server for protocols other than HTTP, and most of them depend on using CONNECT, which makes the HTTP proxy server into a generic proxy. For instance, both Lotus Notes and rsync clients can use HTTP proxies to get to their servers via CONNECT.

Using an HTTP proxy server as a generic proxy in this way is convenient but not particularly secure. Few HTTP proxy servers provide any interesting control or logging on the protocols used with CONNECT. You will want to be very restrictive about what protocols you allow this way.
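One way to be restrictive is a default-deny check on each CONNECT target. This Python sketch assumes a hypothetical policy: one named rsync server (rsync.partner.example, an invented name) plus port 443 to any host. Allowing 443 everywhere is common because browsers use CONNECT for HTTPS, but it also lets through anything willing to run on port 443.

```python
# Hypothetical policy: the only CONNECT destinations this site needs.
ALLOWED_CONNECT = {("rsync.partner.example", 873)}
ALLOWED_CONNECT_PORTS = {443}  # HTTPS to any host


def connect_permitted(target: str) -> bool:
    """Decide whether to honor 'CONNECT host:port'. Default deny."""
    host, sep, port_text = target.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        return False  # malformed target
    host, port = host.lower(), int(port_text)
    return (host, port) in ALLOWED_CONNECT or port in ALLOWED_CONNECT_PORTS
```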

It's extremely important to prevent external users from connecting to your HTTP proxy servers. If external users can connect to your HTTP proxy server, they can use it as a platform to attack internal servers they would not otherwise be able to get to (this is particularly dangerous if they can use CONNECT to get to arbitrary services). Even if the proxy server can't be used this way, it can be used to attack third parties.
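At the proxy itself, the simplest safeguard is to serve only clients whose source addresses are on internal networks. A minimal Python sketch using the standard ipaddress module follows. The two network ranges are hypothetical placeholders for a site's own address space.

```python
import ipaddress

# Hypothetical internal address space; substitute your own.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8"),
                     ipaddress.ip_network("192.168.0.0/16")]


def proxy_client_allowed(client_addr: str) -> bool:
    """Serve only clients whose source address is on an internal network."""
    try:
        addr = ipaddress.ip_address(client_addr)
    except ValueError:
        return False  # not a valid address at all
    return any(addr in net for net in INTERNAL_NETWORKS)
```

A source-address check like this is only meaningful if the packet filter also drops packets arriving from outside that claim internal source addresses; otherwise an attacker can forge one.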

People often search actively for open HTTP proxy servers. Some of these people are hostile and want to use the proxy servers as attack platforms, but some of them just want to use the proxy servers to access web sites that would otherwise be unavailable to them because of filtering rules at their site (or in a few cases, filtering imposed by national governments). Either way, it's probably not to your advantage to let them use your site. Being nice to people behind restrictive filters is tempting, but in the long run, it will merely use up your bandwidth and get you added to the list of filtered sites.

HTTP does not use embedded IP addresses as a functional part of the protocol, so network address translation will not interfere with HTTP. Web pages may contain URLs written with IP addresses instead of hostnames, and those embedded IP addresses will not be translated. You should therefore be careful about the content of web pages on servers behind network address translators.
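To find such URLs before publishing pages behind a network address translator, you could scan the HTML for links whose host is a literal IP address. The following is a rough Python sketch: the regular expression is a simple assumption rather than a real HTML parser, and it handles only IPv4 literals (bracketed IPv6 literals are not matched).

```python
import ipaddress
import re

# Rough matcher for the host part of http/https URLs in HTML text.
URL_HOST = re.compile(r"""https?://([^/\s"'<>:]+)""", re.IGNORECASE)


def urls_with_ip_literals(html: str) -> list:
    """Return the IPv4 literals used as hosts in http/https URLs."""
    found = []
    for match in URL_HOST.finditer(html):
        host = match.group(1)
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # an ordinary hostname, which is fine
        found.append(host)
    return found
```

A link such as http://10.1.2.3/report.html would be reported, since outside users can never reach a private address that the translator does not rewrite inside the page.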

In addition, HTTP clients may provide name and/or IP address information to servers, leaking information about your internal numbering and naming schemes. HTTP clients may provide "From:" headers, telling the server the user's email address (as the user told it to the browser), and proxies may add "Via:" headers giving the names or IP addresses of proxies that a request (or response) has passed through.
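An outbound proxy can reduce this leakage by editing headers before forwarding a request. This Python sketch removes From and replaces the host in each Via entry with a pseudonym. The HTTP specification permits that substitution, and recommends it for hosts behind a firewall, rather than deleting Via outright. The function name and the pseudonym "proxy" are assumptions, and splitting Via on commas is a simplification: comments containing commas are not parsed properly, and this sketch drops all comments.

```python
def scrub_outbound_headers(headers, pseudonym="proxy"):
    """Copy of a list of (name, value) request headers with From removed
    and each Via entry's received-by host replaced by a pseudonym."""
    cleaned = []
    for name, value in headers:
        lname = name.lower()
        if lname == "from":
            continue  # don't reveal the user's email address
        if lname == "via":
            entries = []
            for member in value.split(","):
                fields = member.split()
                if len(fields) >= 2:  # protocol, received-by, [comment]
                    entries.append(f"{fields[0]} {pseudonym}")
            if not entries:
                continue
            value = ", ".join(entries)
        cleaned.append((name, value))
    return cleaned
```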

You may hear discussions about secure versions of HTTP and wonder how they relate to firewalls and the configuring of services. Such discussions are mainly focused on the privacy issues of passing information around via HTTP. They don't really help solve the kinds of problems we've been discussing in previous sections.

Two defined protocols actually provide privacy using encryption and strong authentication for HTTP. The one that everyone knows is usually called HTTPS and is denoted by using https in the URL. The other, almost unknown protocol, is called Secure HTTP and is denoted by using shttp in the URL.

The goal of HTTPS is to protect your communication channel when retrieving or sending data. HTTPS achieves this by running HTTP over TLS or its predecessor, SSL. Chapter 14 contains more technical information on TLS and SSL.

The goal of Secure HTTP is to protect individual objects rather than the communications channel. This allows, for example, individual pages on a web server to be digitally signed—a web client can check the signature when the page is downloaded. If someone replaces the page without re-signing, then the signature check will fail, causing an alert to be displayed. Similarly, a secure form that is submitted to a web server can be a self-contained digitally signed object. This means that the object can be stored and used later to prove or dispute the transaction.

The use of Secure HTTP could have significant advantages for the consumer in the world of electronic commerce. If a company claims that it has a digitally signed object indicating your desire to purchase 2,000 rubber chickens but the digital signature doesn't match, then you can argue that you did not make the request. If the signature does match, then it can only mean one of two things: either you requested the chickens, or your private key has been stolen. In contrast, when you use HTTPS, your identity is bound not to the transaction but to the communication channel. This means that HTTPS cannot protect you from someone switching your order for rubber chickens to live ones once it has been made, or from someone simply ordering chickens on your behalf.