Preventing address-in-use errors

If you do TCP server programming for very long, you will eventually run into the following scenario: your TCP server has one or more open connections, and then you terminate it (or it crashes). You restart the server, but the call to bind() fails with an EADDRINUSE (WSAEADDRINUSE on Windows) error.

When this happens, you can wait a few moments, try it again, and it works. What's going on here?

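Essentially, when an application initiates a TCP socket close (or causes the disconnection by crashing, in which case the operating system closes the socket on its behalf), that socket goes into the TIME-WAIT state. The operating system continues to keep track of this socket for twice the maximum segment lifetime (2MSL), which can be minutes. Linux, for example, holds TIME-WAIT sockets for 60 seconds, while the 2-minute MSL suggested in RFC 793 implies a 4-minute wait.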

An example program, server_noreuse.c, is included in this chapter's code repo. You can reproduce this address-in-use problem by running it, accepting a connection, and then terminating server_noreuse. To reproduce the problem, it is vital that the server is the one to terminate the open connection, not the client.

If you immediately start server_noreuse again, you will see the bind() error.
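To see the failure without the chapter's code repo, here is a minimal self-contained sketch, assuming a POSIX system (such as Linux or macOS) and the loopback interface. It is not the book's server_noreuse.c, and the function name `reproduce_addr_in_use` is hypothetical. The first server binds to a port chosen by the OS, accepts one connection, and closes it before the client does; a second socket then tries to bind() to the same address and port without SO_REUSEADDR.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only, no SO_REUSEADDR anywhere).
 * Returns the errno from the "restarted" server's bind(),
 * 0 if that bind() succeeded, or -1 if the setup itself failed. */
int reproduce_addr_in_use(void) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the OS choose a free port */

    /* First server instance: bind, listen, and learn which port it got. */
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(listener);
        return -1;
    }

    /* A client connects and the server accepts the connection. */
    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) {
        close(listener);
        return -1;
    }
    int conn = -1;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (conn = accept(listener, NULL, NULL)) < 0) {
        close(client);
        close(listener);
        return -1;
    }

    /* The server closes its end first, as if the process were terminated.
     * That end stays bound to the server's port on its way to TIME-WAIT. */
    close(conn);
    close(listener);
    close(client);

    /* "Restart" the server on the same address and port. */
    int restarted = socket(AF_INET, SOCK_STREAM, 0);
    if (restarted < 0) return -1;
    int result = 0;
    if (bind(restarted, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        result = errno; /* expected: EADDRINUSE */
    close(restarted);
    return result;
}
```

On Linux, calling reproduce_addr_in_use() should return EADDRINUSE. Letting the OS pick the port (port 0) and reading it back with getsockname() keeps the sketch from colliding with other services on the machine.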

The following screenshot shows this on a Linux desktop:

You can use the netstat command to see these half-dead connections that are preventing our server from starting. The following command shows which connections are stuck in the TIME-WAIT state on Linux:

netstat -na | grep TIME

As long as one of these connections is hanging on, it prevents any new process from calling bind() on the same local port and address.
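On Linux, the same information is exposed in /proc/net/tcp, where each IPv4 socket's local port and state appear as hex fields and state 06 means TIME-WAIT. The following is a minimal sketch, assuming Linux, that counts TIME-WAIT sockets on a given local port; `count_time_wait` is a hypothetical name, and IPv6 sockets (listed separately in /proc/net/tcp6) are ignored.

```c
#include <stdio.h>

/* Hypothetical helper (Linux only): counts IPv4 TCP sockets in TIME-WAIT
 * whose local port equals `port`; pass 0 to count all ports.
 * Returns -1 if /proc/net/tcp cannot be opened (e.g. on non-Linux systems). */
int count_time_wait(unsigned int port) {
    FILE *f = fopen("/proc/net/tcp", "r");
    if (!f) return -1;

    char line[512];
    int count = 0;

    /* The first line is a column header; skip it. */
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return 0;
    }

    while (fgets(line, sizeof(line), f)) {
        unsigned int local_port, state;
        /* Example entry:   0: 0100007F:1F90 00000000:0000 0A ...
         * slot, local addr:port (hex), remote addr:port (hex), state (hex). */
        if (sscanf(line, " %*d: %*x:%x %*x:%*x %x", &local_port, &state) != 2)
            continue;
        /* 0x06 is TIME_WAIT in the kernel's TCP state numbering. */
        if (state == 0x06 && (port == 0 || local_port == port))
            count++;
    }
    fclose(f);
    return count;
}
```

For example, count_time_wait(8080) reports how many TIME-WAIT connections are currently holding port 8080 (shown as 1F90 in the file).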

This failure of the bind() call can be prevented by setting the SO_REUSEADDR flag on the server socket before calling bind().

The following code demonstrates this:

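int yes = 1;
/* Must be set before bind(); lets bind() succeed even while old
   connections on this address and port are still in TIME-WAIT. */
if (setsockopt(my_socket, SOL_SOCKET, SO_REUSEADDR,
        (void*)&yes, sizeof(yes)) < 0) {
    fprintf(stderr, "setsockopt() failed. (%d)\n", GETSOCKETERRNO());
}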

Once the SO_REUSEADDR flag is set, bind() succeeds even if a few TIME-WAIT connections are still hanging on to the same local port and address.
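A self-contained sketch of the fix applied to the reproduction described earlier, assuming POSIX sockets on the loopback interface. The names `restart_with_reuseaddr` and `set_reuseaddr` are hypothetical, and this is not the book's server_reuse.c. The option is set on both the original listener and the restarted socket, as a server that always uses SO_REUSEADDR would do, and always before bind().

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_REUSEADDR; must be called before bind(). */
static int set_reuseaddr(int fd) {
    int yes = 1;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
}

/* Hypothetical helper (POSIX only). Returns 0 if the restarted server's
 * bind() succeeded, its errno if it failed, or -1 if setup failed. */
int restart_with_reuseaddr(void) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the OS choose a free port */

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;
    if (set_reuseaddr(listener) < 0 ||
        bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(listener);
        return -1;
    }

    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) {
        close(listener);
        return -1;
    }
    int conn = -1;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (conn = accept(listener, NULL, NULL)) < 0) {
        close(client);
        close(listener);
        return -1;
    }

    /* Server side closes first, leaving a connection bound to the port. */
    close(conn);
    close(listener);
    close(client);

    /* Restart: with SO_REUSEADDR set before bind(), this should succeed. */
    int restarted = socket(AF_INET, SOCK_STREAM, 0);
    if (restarted < 0) return -1;
    int result = 0;
    if (set_reuseaddr(restarted) < 0 ||
        bind(restarted, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        result = errno;
    close(restarted);
    return result;
}
```

On Linux or macOS, restart_with_reuseaddr() should return 0, even though the server-side connection from the first run has not yet left TIME-WAIT.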

An example program, server_reuse.c, is included to demonstrate this technique.

I suggest that you always use SO_REUSEADDR for TCP servers because there are few downsides. The only real drawback is that using SO_REUSEADDR allows your program to bind to a specific interface even if another program has already bound to the wildcard address. Usually, this isn't a problem, but it is something to keep in mind.

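You may sometimes see programs that attempt to fix this issue by keeping sockets out of the TIME-WAIT state. This can be accomplished by setting the SO_LINGER socket option with a linger time of zero, which makes close() abort the connection with a reset (RST) instead of closing it gracefully. This is dangerous! The TIME-WAIT state is essential to TCP's reliability: it allows a lost final ACK to be retransmitted, and it keeps delayed segments from the old connection from being accepted by a new connection that reuses the same addresses and ports. An abortive close can also silently discard data that has not yet been delivered.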
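So that you can recognize the pattern in other people's code, here is a sketch of that linger setting, assuming POSIX sockets: SO_LINGER with l_onoff set to 1 and l_linger set to 0, applied just before close(). The helper name `abortive_close` is hypothetical, and this is the pattern to avoid, not one to copy.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper illustrating the linger trick.
 * Do NOT use this to work around EADDRINUSE: on a connected socket,
 * close() then sends RST instead of FIN, may discard unsent data,
 * and skips TIME-WAIT entirely. Returns setsockopt()'s result. */
int abortive_close(int fd) {
    struct linger lg;
    lg.l_onoff = 1;  /* enable linger behavior on close() */
    lg.l_linger = 0; /* zero timeout: abort instead of a graceful close */
    int rc = setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(fd);
    return rc;
}
```

Setting SO_REUSEADDR before bind() achieves the restart behavior you actually want without touching TIME-WAIT.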

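Why is this address-in-use error only a problem for servers and not clients? Because the problem manifests itself when calling bind(). Client programs don't usually call bind(); instead, the operating system assigns them an unused ephemeral port when they call connect(). If a client does call bind() with a fixed port, it can run into the same problem on the client side.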
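A minimal sketch that makes the client's implicit port assignment visible, assuming POSIX sockets and the loopback interface: the client never calls bind(), yet getsockname() reports a nonzero local port once connect() returns. The function name `client_ephemeral_port` is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only). Returns the local port the OS gave
 * an unbound client during connect(), or -1 on failure. */
int client_ephemeral_port(void) {
    struct sockaddr_in addr, local;
    socklen_t addr_len = sizeof(addr), local_len = sizeof(local);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    /* A throwaway loopback server for the client to connect to. */
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(listener);
        return -1;
    }

    int port = -1;
    int client = socket(AF_INET, SOCK_STREAM, 0);
    /* No bind() on the client: connect() picks an unused ephemeral port. */
    if (client >= 0 &&
        connect(client, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(client, (struct sockaddr *)&local, &local_len) == 0) {
        port = ntohs(local.sin_port);
    }
    if (client >= 0) close(client);
    close(listener);
    return port;
}
```

Because the port is chosen fresh for each connection, an earlier client connection lingering in TIME-WAIT does not normally stop the next run of the client.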

While we are still on the topic of disconnected sockets, what happens when you try to send data to a peer that has already called close()? Let's consider that next.