Dynamic Port Forwarding

We are often asked, "How can I tunnel my web browsing over SSH?" The usual reasons are privacy or browsing across a firewall. The SSH port forwarding we've described so far doesn't meet this need very well, but another flavor, called dynamic port forwarding, does. In contrast, we'll call the previous technique "static forwarding."

Suppose you're at home, using your home machine H, and need to access a web server W1 at work, but your employer's internal network is behind a firewall. You might try to do this through a bastion server at work (say, B) that you can log into via SSH, and from which you can reach whatever internal web servers you want. So you create a tunnel using the following port-forwarding command on H:

    $ ssh -L 8080:W1:80 B          # This runs into problems

and point your web browser on H at http://localhost:8080/. This is a reasonable attempt, based on forwarding as we've seen it so far, but it runs into several problems:

Problem 1: virtual hosts

Web servers can make decisions based on the hostname portion of the URL you request; the browser passes that name along in each HTTP request (in the Host header). For example, if the names foo and bar are aliases for the same host, then the URLs http://foo/ and http://bar/ may return different pages. A practical example is an ISP's web server, which could host content for dozens or hundreds of customers' web sites under different hostnames, all of which point to that same machine. This web server configuration is called virtual hosting.

In our home/work example, we're trying to access web server W1 as "localhost," but it might not be configured to serve any content under this name; and even if it is, it might not be the content you want. To address this problem, you'd have to get the browser to recognize other names as aliases for localhost, e.g., by hacking /etc/hosts on a Unix box. Not exactly a smooth solution.
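To make the virtual-host problem concrete, here is a minimal sketch of how a server might choose a site from the Host header of an HTTP/1.1 request. The site table and the function `select_site` are hypothetical, for illustration only; real web servers do this through their own configuration.

```python
# Hypothetical virtual-host table: several names served by one machine.
VHOSTS = {
    "w1": "W1 intranet home page",
    "foo": "Foo's site",
    "bar": "Bar's site",
}

def select_site(request: bytes) -> str:
    """Choose a site by the request's Host header, as a virtual host would."""
    for line in request.split(b"\r\n")[1:]:      # skip the request line
        if not line:                             # blank line ends the headers
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            # Drop any ":port" suffix; host names are case-insensitive.
            host = value.strip().decode("ascii").rsplit(":", 1)[0].lower()
            return VHOSTS.get(host, "default site")
    return "default site"

direct = b"GET / HTTP/1.1\r\nHost: W1\r\n\r\n"
forwarded = b"GET / HTTP/1.1\r\nHost: localhost:8080\r\n\r\n"
print(select_site(direct))     # W1 intranet home page
print(select_site(forwarded))  # default site
```

Through the static forward, W1 receives `Host: localhost:8080` and falls back to its default site, which may not be the one you wanted.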

Problem 2: absolute links

Suppose problem 1 is a non-issue, and you see the web page you want. However, any absolute links on that page that directly reference the hostname W1 might not work. For example, the absolute URL http://W1/some_great_content.html fails when your browser tries to follow it, because your browser can reach the site only as localhost:8080.

Problem 3: links to other secured servers

Even if problems 1 and 2 don't bite you, your luck runs out when you hit a link to another internal web server, W2, or even a page on the same server but on a different port (e.g., http://W1:81/java-is-great.jsp).

Clearly, static port forwarding is woefully inadequate for this scenario. You could get around individual problems by editing your hosts file or stopping now and then to forward another port, but who wants the annoyance? And such a burdensome workaround isn't exactly easy to explain to your Aunt Mae. Or your boss.

We can address problems 1 and 2 by realizing that we want to redirect the web browser's connections over SSH without fussing with the URL. Most browsers have just such a feature: a proxy setting. We can set the browser's HTTP proxy to our SSH-forwarded port, localhost:8080; the browser then connects to that port in response to any HTTP URL we give it, and sends the complete URL, hostname included, in each request. The browser assumes this port leads to a proxy server that knows how to get content from the various web servers we seek, so it doesn't have to contact those servers directly. Since the hostname now travels inside the request, W1 sees the name W1 rather than localhost, and absolute links to W1 go through the same port.
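A rough sketch of how a proxy setting changes what the browser sends, assuming plain http:// URLs and a request trimmed down to its request line and Host header; `plan_request` is a hypothetical helper, not any browser's actual code. With a proxy, the request line carries the whole URL (HTTP's "absolute form"), so the name W1 survives the trip through the forwarded port.

```python
from urllib.parse import urlsplit

def plan_request(url, proxy=None):
    """Return (address to connect to, request bytes) for an http:// URL.

    Without a proxy, connect to the URL's own host and send just the path.
    With a proxy, always connect to the proxy and send the whole URL.
    """
    parts = urlsplit(url)
    if parts.port is None:
        host_header = parts.hostname
    else:
        host_header = f"{parts.hostname}:{parts.port}"
    if proxy is None:
        address = (parts.hostname, parts.port or 80)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    else:
        address = proxy           # every URL goes to the same place
        target = url              # "absolute form" request target
    request = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return address, request.encode("ascii")

proxy = ("localhost", 8080)       # the static forward to W1:80
for url in ("http://W1/some_great_content.html", "http://W2/"):
    address, request = plan_request(url, proxy)
    print(address, request.split(b"\r\n")[0])
```

Notice that the URL naming W2 is also sent to localhost:8080, and therefore to W1; that is the difficulty that remains.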

Proxying gets us part of the way there, but doesn't solve problem 3: what happens if we hit a link to a hostname other than W1? The browser sends the request to W1 anyway, via its proxy setting, but W1 won't know how to handle it, so we'll get a web server error along the lines of "unrecognized URL." We can't feasibly deal with this manually: not only would we have to forward another port, but we'd also have to reset the browser to proxy through the new port, at which point it could reach the new URLs but not the old ones on W1! That's just a mess. What we really need is a way for the browser to communicate dynamically with SSH itself, telling it which web server to forward to for each URL the browser handles. And indeed, there is a feature that does exactly this, called dynamic forwarding or SOCKS forwarding.

SOCKS is a small protocol; its current version (Version 5) is defined in RFC 1928. A SOCKS client connects to the SOCKS server via TCP and uses the protocol to indicate the remote socket (host and port) it wants to reach; the SOCKS server makes that connection, then gets out of the way, transparently passing data back and forth. Thereafter, it is just as if the client had connected directly to the remote socket. The OpenSSH and Tectia syntax for this kind of forwarding is:

    # OpenSSH
    $ ssh -D 1080 B

    # Tectia
    $ ssh -L socks/1080 B

We've switched to port 1080 since that's the usual SOCKS port; 8080 or any other free port would do just as well. Note that there's no destination socket in either command, just the local port to be forwarded; that's because the destination is determined dynamically and can be different for each connection. We can use this solution only if the browser has an option to use a SOCKS proxy (as most do).
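The client's side of the SOCKS5 exchange is only a few bytes. The sketch below follows the message layouts in RFC 1928, assuming the client offers only the "no authentication required" method (0x00) and names the destination by domain name; the function names are our own, and the socket I/O is left out.

```python
import struct

def socks5_greeting() -> bytes:
    # VER=5, NMETHODS=1, METHODS=[0x00] ("no authentication required")
    return bytes([0x05, 0x01, 0x00])

def socks5_connect(host: str, port: int) -> bytes:
    """CONNECT request that names the destination by domain name."""
    name = host.encode("ascii")   # non-ASCII names would need IDNA encoding
    if not 0 < len(name) < 256:
        raise ValueError("domain name must be 1 to 255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), name length,
    # the name itself, then the port in network byte order.
    return (bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name
            + struct.pack("!H", port))

def socks5_reply_ok(reply: bytes) -> bool:
    """True if a server reply reports success (VER=5, REP=0x00)."""
    return len(reply) >= 2 and reply[0] == 0x05 and reply[1] == 0x00

print(socks5_connect("foo", 1234).hex())
```

A client sends `socks5_greeting()`, reads the server's two-byte method choice, sends `socks5_connect(host, port)`, and checks the reply before treating the connection as a plain pipe to host:port.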

This solves the whole problem neatly! The process goes like so:

  1. The user types the URL scheme://foo:1234/ into the browser. The port might be implicit, as in 80 for HTTP or 443 for HTTPS.

  2. The browser connects to the SSH SOCKS proxy on localhost:1080, and asks for a connection to foo:1234 using the SOCKS protocol.

  3. In response, the SSH client opens a new direct-tcpip channel in the existing SSH session [3.4.4.1] and associates the browser's connection with it; at the other end, the SSH server connects the channel to foo:1234 via a new TCP connection of its own.

  4. The SSH client and server "get out of the way," and the browser is connected to the desired web server. Note that there is nothing here specific to HTTP; the browser can next negotiate an SSL session if the scheme is HTTPS, or use any protocol at all over the proxied connection.
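Step 3 begins with the SSH client, acting as a SOCKS server, pulling the destination out of the browser's request. Here is a minimal sketch of that parsing, assuming SOCKS5, that method negotiation has already happened, and that the whole request has arrived in one buffer; the function names are illustrative and not taken from OpenSSH or Tectia.

```python
import ipaddress
import struct
from typing import Tuple

def parse_socks5_connect(req: bytes) -> Tuple[str, int]:
    """Extract (host, port) from a SOCKS5 CONNECT request (RFC 1928)."""
    ver, cmd, _rsv, atyp = req[:4]
    if ver != 0x05 or cmd != 0x01:
        raise ValueError("not a SOCKS5 CONNECT request")
    if atyp == 0x01:                      # IPv4 address, 4 bytes
        host, rest = str(ipaddress.IPv4Address(req[4:8])), req[8:]
    elif atyp == 0x03:                    # domain name, length-prefixed
        n = req[4]
        host, rest = req[5:5 + n].decode("ascii"), req[5 + n:]
    elif atyp == 0x04:                    # IPv6 address, 16 bytes
        host, rest = str(ipaddress.IPv6Address(req[4:20])), req[20:]
    else:
        raise ValueError(f"unknown address type {atyp:#x}")
    (port,) = struct.unpack("!H", rest[:2])
    return host, port

def socks5_success_reply() -> bytes:
    # VER=5, REP=0 (succeeded), RSV=0, ATYP=1, then BND.ADDR and BND.PORT,
    # zeroed here because this sketch has no bound socket to report.
    return bytes([0x05, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0])

req = b"\x05\x01\x00\x03\x03foo\x04\xd2"   # CONNECT foo:1234
print(parse_socks5_connect(req))            # ('foo', 1234)
```

Whatever (host, port) comes back is where the client asks the SSH server to connect, which is why each new connection on port 1080 can go somewhere different.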

Each time a new connection arrives on port 1080, it can be forwarded to a different socket. This might seem odd if you have static forwarding firmly in mind, but it's just an extension of what you already know. With static forwarding, the SSH client still creates a new channel for each connection; it just sends them all to the same place. With dynamic forwarding, SOCKS allows each connection to indicate its own destination, and SSH obliges.

The SSH server needs no special support for dynamic forwarding, since it uses exactly the same mechanism as static forwarding. Only the client needs to support dynamic forwarding.
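The shared mechanism is the `direct-tcpip` channel-open message defined in RFC 4254, section 7.2. This sketch builds its payload (before SSH packet framing and encryption); the default window and maximum packet sizes are arbitrary values chosen for illustration, and the originator address would be the local socket the forwarded connection came from.

```python
import struct

SSH_MSG_CHANNEL_OPEN = 90

def ssh_string(text: str) -> bytes:
    """SSH wire "string": uint32 length followed by the bytes."""
    data = text.encode("utf-8")
    return struct.pack("!I", len(data)) + data

def direct_tcpip_open(sender_channel: int, host: str, port: int,
                      originator_ip: str, originator_port: int,
                      window: int = 2 * 1024 * 1024,
                      max_packet: int = 32 * 1024) -> bytes:
    """Payload of SSH_MSG_CHANNEL_OPEN for a "direct-tcpip" channel."""
    return (bytes([SSH_MSG_CHANNEL_OPEN])
            + ssh_string("direct-tcpip")
            + struct.pack("!III", sender_channel, window, max_packet)
            + ssh_string(host) + struct.pack("!I", port)       # destination
            + ssh_string(originator_ip) + struct.pack("!I", originator_port))

# Static forwarding (-L 8080:W1:80): the destination never changes.
static = direct_tcpip_open(0, "W1", 80, "127.0.0.1", 50001)
# Dynamic forwarding: the destination comes from each SOCKS request.
dynamic = direct_tcpip_open(1, "foo", 1234, "127.0.0.1", 50002)
print(len(static), len(dynamic))
```

Static forwarding would always fill in `"W1", 80`; dynamic forwarding fills in whatever the SOCKS request named. The message has the same form either way, which is why the server needs no changes.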

So, this would be a perfect lightweight solution: complete remote web browsing with just SSH. Ah, if only we lived in such a simple world....

There are actually two commonly used versions of the SOCKS protocol: Version 4 and Version 5. Both OpenSSH and Tectia clients can do SOCKS proxying, and recent versions implement SOCKS5 as well as SOCKS4. SOCKS5 added many features over SOCKS4 (authentication, UDP support, bidirectional forwarding, and more), but the germane feature here is that SOCKS4 understands only IP addresses in destination sockets, whereas SOCKS5 accepts domain names as well. This is crucial for both practical and privacy reasons. The naming context often differs on the two sides of the SSH connection: in our current example, your company's network probably has a private namespace for hosts (e.g., an internal-only DNS that isn't available to the outside world). With SOCKS4, your browser must look up the name in the URL locally, then ask the SOCKS proxy to connect to the resulting address. That won't work for us; we want to give the proxy the (name, port) pair to reach, and have it resolve the name on the far side of the connection, in the correct context.

As for privacy: if you're proxying your browsing to shield your web traffic from prying eyes, you don't want to reveal the names of all the web servers you're visiting to anyone who can watch the DNS traffic from your browsing host.
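The SOCKS4 request format shows why the local lookup is forced. In this sketch (assuming an empty user ID; `socks4_connect` is our own name), the destination field is exactly four bytes of IPv4 address, so a name such as W1 must be turned into an address on the client's side, by the client's resolver, before the request can even be built. SOCKS5 instead has an address type (0x03) that carries the name itself.

```python
import ipaddress
import struct

def socks4_connect(ip: str, port: int, user_id: str = "") -> bytes:
    """SOCKS4 CONNECT request; the destination must be an IPv4 address."""
    # IPv4Address rejects names like "W1" with a ValueError: SOCKS4 has
    # no field in which to send a name to the proxy.
    packed_ip = ipaddress.IPv4Address(ip).packed
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, USERID, terminating NUL
    return (struct.pack("!BBH", 4, 1, port) + packed_ip
            + user_id.encode("ascii") + b"\x00")

print(socks4_connect("10.0.0.1", 80).hex())
```

A SOCKS4 client would typically call something like `socks4_connect(socket.gethostbyname(host), port)`, and that `gethostbyname` call is precisely the local DNS lookup that fails, or leaks, in our scenario.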

OK, so SOCKS4 is out. That's no problem, as many browsers support SOCKS5. But there's a further complication: the ugly face of reality nosing into our elegant solution. Disappointingly, most of the major browsers, even when they support SOCKS5, don't actually use it properly: they look up names locally, even though they could pass them through the proxy. We've tried dozens of OS/browser combinations, including Firefox, Safari, Netscape, Mozilla, Internet Explorer (IE), and Opera, and the only one we've found so far that does the right thing is... (drum roll, please) IE 5.2 on Macintosh OS X. We guess that the main motivation for adding SOCKS5 support was authentication, so it was added without changing the address-lookup logic. That oversight makes SOCKS5 proxying much less useful than it could be. So: write to your browser developers and ask for better SOCKS5 support! A switch for choosing either local or remote name resolution would be ideal.
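Such a switch is easy to express. The sketch below builds a SOCKS5 CONNECT request either way: with `remote_dns` set, it sends the name (address type 0x03) for the proxy to resolve; otherwise it resolves locally and sends an IPv4 address (type 0x01), the behavior we saw in browsers. The function and its `resolve` parameter are hypothetical; for comparison, the curl command-line tool offers exactly this choice as `--socks5` (local resolution) versus `--socks5-hostname` (resolution by the proxy).

```python
import ipaddress
import struct
from typing import Callable

def socks5_request(host: str, port: int, remote_dns: bool,
                   resolve: Callable[[str], str]) -> bytes:
    """SOCKS5 CONNECT request with a local/remote name-resolution switch."""
    header = bytes([0x05, 0x01, 0x00])          # VER=5, CMD=CONNECT, RSV
    if remote_dns:
        # ATYP=3: send the name; the proxy resolves it on the far side,
        # and no local DNS query is ever made.
        name = host.encode("ascii")
        address = bytes([0x03, len(name)]) + name
    else:
        # ATYP=1: resolve here and send only the IPv4 address.
        address = bytes([0x01]) + ipaddress.IPv4Address(resolve(host)).packed
    return header + address + struct.pack("!H", port)

fake_resolver = lambda name: "10.1.2.3"   # stand-in for socket.gethostbyname
print(socks5_request("W1", 80, True, fake_resolver).hex())
print(socks5_request("W1", 80, False, fake_resolver).hex())
```

In real use, `resolve` might be `socket.gethostbyname`; passing it in keeps the sketch testable without DNS.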

Given the realities of browser SOCKS support, the best solution for now is usually a static SSH port forwarding to a separate HTTP proxy server, such as Squid or Privoxy. Some of these proxies also provide other useful features, such as pop-up blocking and cookie management. But you don't always have such a proxy available, or the ability to set one up, so the SSH-only approach with dynamic forwarding is preferable if you can use it.

The remote web-browsing problem provided a perfect setting in which to introduce dynamic forwarding, but there are certainly other uses. Any program that can use a SOCKS proxy is a candidate, and there are lots of them if you look. For instance: SSH itself! With dynamic forwarding, SSH acts as a SOCKS server, but as a completely separate feature, some SSH products can also be SOCKS clients. This is normally used where the local network isn't directly connected to the Internet and provides outside access only through a SOCKS proxy. However, it has a neat use in combination with dynamic forwarding:

    # Tectia
    # In one window:
    $ ssh -L socks/1080 B
    # In another window:
    $ export SSH_SOCKS_SERVER=socks://localhost:1080/
    $ ssh -o'usesocks5 yes' HOST1

where you're on the outside but HOST1 is on your company's internal network. The second ssh command uses the SSH/SOCKS proxy established by the first to connect through the bastion host B to HOST1, resolving the name HOST1 on the inside. This is obviously more convenient than forwarding a separate port to host:22 for each internal host you might want to reach. It also has many advantages over the idiom ssh B -t ssh HOST1, including: