The operation of stream sockets can be explained by analogy with the telephone system:
The socket() system call, which creates a socket, is the equivalent of installing a telephone. In order for two applications to communicate, each of them must create a socket.
Communication via a stream socket is analogous to a telephone call. One application must connect its socket to another application’s socket before communication can take place. Two sockets are connected as follows:
One application calls bind() in order to bind the socket to a well-known address, and then calls listen() to notify the kernel of its willingness to accept incoming connections. This step is analogous to having a known telephone number and ensuring that our telephone is turned on so that people can call us.
The other application establishes the connection by calling connect(), specifying the address of the socket to which the connection is to be made. This is analogous to dialing someone’s telephone number.
The application that called listen() then accepts the connection using accept(). This is analogous to picking up the telephone when it rings. If the accept() is performed before the peer application calls connect(), then the accept() blocks (“waiting by the telephone”).
Once a connection has been established, data can be transmitted in both directions between the applications (analogous to a two-way telephone conversation) until one of them closes the connection using close(). Communication is performed using the conventional read() and write() system calls or via a number of socket-specific system calls (such as send() and recv()) that provide additional functionality.
Figure 56-1 illustrates the system calls used with stream sockets.
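The sequence of calls described above can be sketched in a single process using a UNIX domain socket. This is a minimal illustration, not a realistic server: the function name stream_round_trip() and its parameters are hypothetical, and the backlog of 5 is an arbitrary choice. Because the kernel queues the connection as pending, connect() can return before accept() has been called, which is what lets both roles run one after the other here.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from the text): plays both roles over a UNIX
   domain stream socket bound to 'path'. Sends 'msg' from the client end and
   returns the number of bytes read at the server end, or -1 on error. */
ssize_t stream_round_trip(const char *path, const char *msg,
                          char *buf, size_t buflen)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, sfd = -1;
    size_t len = strlen(msg);
    ssize_t n = -1;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);            /* Fits: length checked above */
    unlink(path);                           /* Remove any stale socket file */

    /* Server: install the telephone, publish the number, switch it on */
    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd == -1
            || bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1
            || listen(lfd, 5) == -1)
        goto done;

    /* Client: install a telephone and dial the number. The connection is
       queued as pending, so connect() returns before accept() is called. */
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cfd == -1
            || connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
        goto done;

    /* Server: pick up the telephone */
    sfd = accept(lfd, NULL, NULL);
    if (sfd == -1)
        goto done;

    /* The conversation: one message from client to server */
    if (write(cfd, msg, len) == (ssize_t) len)
        n = read(sfd, buf, buflen);

done:
    if (sfd != -1) close(sfd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    unlink(path);
    return n;
}
```

In a real application, the two roles would normally live in separate processes, and the server would loop on accept().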
Stream sockets are often distinguished as being either active or passive:
By default, a socket that has been created using socket() is active. An active socket can be used in a connect() call to establish a connection to a passive socket. This is referred to as performing an active open.
A passive socket (also called a listening socket) is one that has been marked to allow incoming connections by calling listen(). Accepting an incoming connection is referred to as performing a passive open.
In most applications that employ stream sockets, the server performs the passive open, and the client performs the active open. We presume this scenario in subsequent sections, so that instead of saying “the application that performs the active socket open,” we’ll often just say “the client.” Similarly, we’ll equate “the server” with “the application that performs the passive socket open.”
The listen() system call marks the stream socket referred to by the file descriptor sockfd as passive. The socket will subsequently be used to accept connections from other (active) sockets.
#include <sys/socket.h>
int listen(int sockfd, int backlog);
Returns 0 on success, or -1 on error
We can’t apply listen() to a connected socket, that is, a socket on which a connect() has been successfully performed or a socket returned by a call to accept().
To understand the purpose of the backlog argument, we first observe that the client may call connect() before the server calls accept(). This could happen, for example, because the server is busy handling some other client(s). This results in a pending connection, as illustrated in Figure 56-2.
The kernel must record some information about each pending connection request so that a subsequent accept() can be processed. The backlog argument allows us to limit the number of such pending connections. Connection requests up to this limit succeed immediately. (For TCP sockets, the story is a little more complicated, as we’ll see in TCP Connection Establishment.) Further connection requests block until a pending connection is accepted (via accept()), and thus removed from the queue of pending connections.
SUSv3 allows an implementation to place an upper limit on the value that can be specified for backlog, and permits an implementation to silently round backlog values down to this limit. SUSv3 specifies that the implementation should advertise this limit by defining the constant SOMAXCONN in <sys/socket.h>. On Linux, this constant is defined with the value 128. However, since kernel 2.4.25, Linux allows this limit to be adjusted at run time via the Linux-specific /proc/sys/net/core/somaxconn file. (In earlier kernel versions, the SOMAXCONN limit is immutable.)
In the original BSD sockets implementation, the upper limit for backlog was 5, and we may see this number specified in older code. All modern implementations allow higher values of backlog, which are necessary for network servers employing TCP sockets to serve large numbers of clients.
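As a rough sketch of how a program might choose a backlog value on Linux, the following reads the run-time limit from the /proc file and falls back to the compile-time SOMAXCONN constant if the file can't be read. The helper names backlog_limit() and listen_max() are hypothetical. Requesting the maximum is only one possible policy; many servers simply pass a fixed value.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the run-time backlog limit from the
   Linux-specific /proc file, or SOMAXCONN if the file is unavailable. */
int backlog_limit(void)
{
    int limit = SOMAXCONN;
    FILE *fp = fopen("/proc/sys/net/core/somaxconn", "r");

    if (fp != NULL) {
        int value;

        if (fscanf(fp, "%d", &value) == 1 && value > 0)
            limit = value;
        fclose(fp);
    }
    return limit;
}

/* Hypothetical helper: marks 'sockfd' as passive, requesting the largest
   backlog that the system currently permits. Returns 0 or -1, as listen(). */
int listen_max(int sockfd)
{
    return listen(sockfd, backlog_limit());
}
```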
The accept() system call accepts an incoming connection on the listening stream socket referred to by the file descriptor sockfd. If there are no pending connections when accept() is called, the call blocks until a connection request arrives.
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
Returns file descriptor on success, or -1 on error
The key point to understand about accept() is that it creates a new socket, and it is this new socket that is connected to the peer socket that performed the connect(). A file descriptor for the connected socket is returned as the function result of the accept() call. The listening socket (sockfd) remains open, and can be used to accept further connections. A typical server application creates one listening socket, binds it to a well-known address, and then handles all client requests by accepting connections via that socket.
The remaining arguments to accept() return the address of the peer socket. The addr argument points to a structure that is used to return the socket address. The type of this argument depends on the socket domain (as for bind()).
The addrlen argument is a value-result argument. It points to an integer that, prior to the call, must be initialized to the size of the buffer pointed to by addr, so that the kernel knows how much space is available to return the socket address. Upon return from accept(), this integer is set to indicate the number of bytes of data actually copied into the buffer.
If we are not interested in the address of the peer socket, then addr and addrlen should be specified as NULL and 0, respectively. (If desired, we can retrieve the peer’s address later using the getpeername() system call, as described in Section 61.5.)
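The value-result usage of addrlen can be sketched as follows. The helper name accept_with_peer() is hypothetical, and the use of struct sockaddr_storage (a standard type large enough to hold an address from any domain) is our choice for the illustration rather than something the text prescribes.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accepts one connection on the listening socket 'lfd'
   and returns the new connected descriptor, or -1 on error. On success,
   '*peer' holds the peer address and '*peerlen' the length that the kernel
   reported for it. */
int accept_with_peer(int lfd, struct sockaddr_storage *peer,
                     socklen_t *peerlen)
{
    memset(peer, 0, sizeof(*peer));

    /* On input: the amount of space available in the buffer */
    *peerlen = sizeof(*peer);

    /* On output: accept() overwrites *peerlen with the address length */
    return accept(lfd, (struct sockaddr *) peer, peerlen);
}
```

A caller would then inspect peer->ss_family and cast the buffer to the structure for that domain (for example, struct sockaddr_un for AF_UNIX).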
Starting with kernel 2.6.28, Linux supports a new, nonstandard system call, accept4(). This system call performs the same task as accept(), but supports an additional argument, flags, that can be used to modify the behavior of the system call. Two flags are supported: SOCK_CLOEXEC and SOCK_NONBLOCK. The SOCK_CLOEXEC flag causes the kernel to enable the close-on-exec flag (FD_CLOEXEC) for the new file descriptor returned by the call. This flag is useful for the same reasons as the O_CLOEXEC flag of open(). The SOCK_NONBLOCK flag causes the kernel to enable the O_NONBLOCK flag on the underlying open file description, so that future I/O operations on the socket will be nonblocking. This saves additional calls to fcntl() to achieve the same result.
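To make concrete what the two flags save us, here is a sketch of the portable equivalent built from accept() and fcntl(). The helper name accept_cloexec_nonblock() is hypothetical. Unlike accept4(), this version sets the flags in separate steps after the descriptor already exists, so in a multithreaded program another thread could fork() and exec() in the interval before FD_CLOEXEC is set.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: roughly what
       accept4(lfd, NULL, NULL, SOCK_CLOEXEC | SOCK_NONBLOCK)
   achieves in one call, written with accept() plus fcntl().
   Returns the new descriptor, or -1 on error. */
int accept_cloexec_nonblock(int lfd)
{
    int fd, flags;

    fd = accept(lfd, NULL, NULL);
    if (fd == -1)
        return -1;

    /* Close-on-exec is a file descriptor flag */
    flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        goto fail;

    /* O_NONBLOCK is an open file status flag */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```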
The connect() system call connects the active socket referred to by the file descriptor sockfd to the listening socket whose address is specified by addr and addrlen.
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
Returns 0 on success, or -1 on error
The addr and addrlen arguments are specified in the same way as the corresponding arguments to bind().
If connect() fails and we wish to reattempt the connection, then SUSv3 specifies that the portable method of doing so is to close the socket, create a new socket, and reattempt the connection with the new socket.
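The portable retry technique might look like the following sketch, in which each attempt uses a freshly created socket. The helper name connect_retry(), its parameters, and the fixed delay between attempts are all hypothetical choices made for the illustration.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tries up to 'max_attempts' times to connect a new
   stream socket in 'domain' to 'addr', sleeping 'delay_secs' seconds between
   attempts. Returns a connected descriptor, or -1 with errno set. */
int connect_retry(int domain, const struct sockaddr *addr, socklen_t addrlen,
                  int max_attempts, unsigned int delay_secs)
{
    int saved_errno = EINVAL;               /* Reported if max_attempts < 1 */

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int fd = socket(domain, SOCK_STREAM, 0);

        if (fd == -1)
            return -1;
        if (connect(fd, addr, addrlen) == 0)
            return fd;                      /* Connected */

        /* Don't reuse the failed socket: close it and start afresh */
        saved_errno = errno;
        close(fd);
        if (attempt < max_attempts)
            sleep(delay_secs);
    }

    errno = saved_errno;                    /* Error from the last connect() */
    return -1;
}
```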
A pair of connected stream sockets provides a bidirectional communication channel between the two endpoints. Figure 56-3 shows what this looks like in the UNIX domain.
The semantics of I/O on connected stream sockets are similar to those for pipes:
To perform I/O, we use the read() and write() system calls (or the socket-specific send() and recv(), which we describe in Socket-Specific I/O System Calls: recv() and send()). Since sockets are bidirectional, both calls may be used on each end of the connection.
A socket may be closed using the close() system call or as a consequence of the application terminating. Afterward, when the peer application attempts to read from the other end of the connection, it receives end-of-file (once all buffered data has been read). If the peer application attempts to write to its socket, it receives a SIGPIPE signal, and the system call fails with the error EPIPE. As we noted in Creating and Using Pipes, the usual way of dealing with this possibility is to ignore the SIGPIPE signal and find out about the closed connection via the EPIPE error.
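The end-of-file and EPIPE behavior can be observed with a short experiment. In this sketch, we use socketpair() purely as a convenient way of obtaining two connected UNIX domain stream sockets in one process; the function name write_after_peer_close() is hypothetical. Note that ignoring SIGPIPE changes the signal's disposition for the whole process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: closes one end of a connected pair, then reads and
   writes on the other end. Returns the errno value from the failed write(),
   0 if the write unexpectedly succeeded, or -1 on any other failure. */
int write_after_peer_close(void)
{
    int sv[2], err;
    char c;

    /* Ignore SIGPIPE so that write() fails with EPIPE rather than the
       signal terminating the process */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                           /* The peer closes its socket */

    /* No buffered data remains, so read() sees end-of-file (returns 0) */
    if (read(sv[0], &c, 1) != 0) {
        close(sv[0]);
        return -1;
    }

    /* Writing to the closed connection fails */
    err = (write(sv[0], "x", 1) == -1) ? errno : 0;
    close(sv[0]);
    return err;
}
```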
The usual way of terminating a stream socket connection is to call close(). If multiple file descriptors refer to the same socket, then the connection is terminated when all of the descriptors are closed.
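The effect of multiple descriptors can be demonstrated with a sketch along the following lines. Here, socketpair() and dup() are used only to set up the scenario within one process, the MSG_DONTWAIT flag keeps recv() from blocking while the connection is still open, and the function name eof_only_after_last_close() is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the peer sees end-of-file only after
   the last descriptor for the socket is closed, 0 if not, -1 on error. */
int eof_only_after_last_close(void)
{
    int sv[2], dupfd, ok = 0;
    char c;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    dupfd = dup(sv[1]);                     /* Second descriptor, same socket */
    if (dupfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]);                           /* dupfd still refers to the socket */

    /* Connection still open: a nonblocking read finds no data, not EOF */
    n = recv(sv[0], &c, 1, MSG_DONTWAIT);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        close(dupfd);                       /* Last descriptor closed */
        dupfd = -1;

        /* Now the connection is terminated, and the peer sees EOF */
        n = recv(sv[0], &c, 1, MSG_DONTWAIT);
        ok = (n == 0);
    }

    if (dupfd != -1)
        close(dupfd);
    close(sv[0]);
    return ok;
}
```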
Suppose that, after we close a connection, the peer application crashes or otherwise fails to read or correctly process the data that we previously sent to it. In this case, we have no way of knowing that an error occurred. If we need to ensure that the data was successfully read and processed, then we must build some type of acknowledgement protocol into our application. This normally consists of an explicit acknowledgement message passed back to us from the peer.
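One very simple form that such an acknowledgement protocol could take is sketched below: the receiver sends back a single byte once it has read (and, in a real application, processed) the message, and the sender waits for that byte before closing. The one-byte ACK_BYTE value and the function names are hypothetical, and a realistic protocol would also need message framing and timeouts.

```c
#include <sys/socket.h>
#include <unistd.h>

#define ACK_BYTE 'A'                        /* Hypothetical acknowledgement */

/* Receiver side: reads a message, then acknowledges it. Returns the number
   of bytes read, or -1 on error or end-of-file. */
ssize_t recv_and_ack(int fd, char *buf, size_t buflen)
{
    char ack = ACK_BYTE;
    ssize_t n = read(fd, buf, buflen);

    if (n <= 0)
        return -1;
    /* (Application-specific processing of the message would go here) */
    if (write(fd, &ack, 1) != 1)
        return -1;
    return n;
}

/* Sender side: returns 0 if the peer acknowledged, or -1 if the connection
   closed, an error occurred, or an unexpected byte arrived. */
int wait_for_ack(int fd)
{
    char ack;

    return (read(fd, &ack, 1) == 1 && ack == ACK_BYTE) ? 0 : -1;
}

/* Runs both sides in turn over a connected pair. Returns 0 on success. */
int ack_demo(void)
{
    int sv[2], result = -1;
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (write(sv[0], "data", 4) == 4
            && recv_and_ack(sv[1], buf, sizeof(buf)) == 4)
        result = wait_for_ack(sv[0]);       /* Close only after the ACK */
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

If the peer terminates before acknowledging, wait_for_ack() sees end-of-file and returns -1, so the sender knows that the message can't be assumed to have been processed.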
In The shutdown() System Call, we describe the shutdown() system call, which provides finer control of how a stream socket connection is closed.