User Datagram Protocol

Abstract

In this chapter, you will learn about UDP, one of the major transport layer protocols in the TCP/IP stack. We’ll talk about datagrams and the structure of the UDP header.

You will learn about ports and sockets and how they are used at the transport layer.

Keywords

UDP; IPv4; IPv6; TCP/IP stack; DNS; VoIP; TFTP

The User Datagram Protocol (UDP) is one the major transport layer protocols that rides on top of IPv4 or IPv6. Most explorations of the TCP/IP transport layer treat the other major protocol, the connection-oriented Transmission Control Protocol (TCP) first and present connectionless UDP later. But the complexities of TCP, and the reasons for these often sophisticated procedures, are better understood after appreciating the basic connectionless service provided by UDP. In addition, certain concepts that are shared by both UDP and TCP, such as ports, can be introduced in UDP and so reduce the number of new ideas that must be covered during TCP discussions to a more manageable level.

The UDP acronym shows the effects of early Internet efforts to distinguish connectionless packet delivery (“It’s a datagram, not a packet!”) from more conventional connection-oriented schemes in use at the time. The data unit of UDP is not a packet, but a datagram, the content of a connectionless packet (many authors call IP packets datagrams as well, but we do not in this book). UDP datagrams have their own headers, naturally, and the UDP header is about as simple as a header can get. That’s only to be expected, because UDP operation is also very simple, making UDP ideal for a first look at end-to-end functions on a network.

In recent years, UDP’s popularity as a transport layer protocol for applications has been growing. The simple and fast operation of UDP makes it ideal for delay-sensitive traffic like voice samples (the digital representation of analog speech), multicast digital video, and other types of streaming and “real-time” traffic that cannot be resent piecemeal if lost on the network. This use of UDP is not as originally intended, and there are other things that need to be done before UDP is ready for voice and video, but in the true spirit of Internet innovation, UDP has adapted to these new circumstances.

UDP is used by many common network applications, including DNS, IPTV streaming media applications, voice over IP (VoIP), the Trivial File Transfer Protocol (TFTP), and online games. UDP is required for multicast applications.

UDP Ports and Sockets

Figure 11.1 shows the hosts on the Illustrated Network that we’ll be using in this chapter to explore UDP ports and sockets. We’ll primarily use the Unix-based hosts, both FreeBSD and Linux.

Let’s look at a simple application of UDP between the lnxclient and lnxserver hosts. The standard Unix “echo” utility (not the same “echo” program as the application used in a previous chapter) sends a simple text string from a client to a server using UDP port 7. The server just bounces a UDP datagram back with the same content. But even with this simple interaction, all of the major points about UDP discussed in this chapter can be illustrated.

The capture is from lnxserver (10.10.11.66). The server is responding to the lnxclient (10.10.12.166) request to echo the string “TEST.” The important sections of the request and response packets relevant to UDP are highlighted.

[root@lnxserver admin]# /usr/sbin/tethereal -V port 7

Capturing on eth0

Frame 1 (60 bytes on wire, 60 bytes captured)

  Arrival Time: May 6, 2008 16:31:30.947137000

  Time delta from previous packet: 0.000000000 seconds

  Time relative to first packet: 0.000000000 seconds

  Frame Number: 1

  Packet Length: 60 bytes

  Capture Length: 60 bytes

Ethernet II, Src: 00:05:85:88:cc:db, Dst: 00:d0:b7:1 f:fe:e6

  Destination: 00:d0:b7:1 f:fe:e6 (Intel_1 f:fe:e6)

  Source: 00:05:85:88:cc:db (Juniper__88:cc:db)

  Type: IP (0x0800)

  Trailer: 0000000000000000000000000000

Internet Protocol, Src Addr: 10.10.12.166 (10.10.12.166), Dst Addr:10.10.11.66 (10.10.11.66)

  Version: 4

  Header length: 20 bytes

  Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)

  0000 00.. = Differentiated Services Codepoint: Default (0x00)

   .... ..0. = ECN-Capable Transport (ECT): 0

   .... ...0 = ECN-CE: 0

  Total Length: 32

  Identification: 0x0000

  Flags: 0x04

   .1.. = Don’t fragment: Set

   ..0. = More fragments: Not set

  Fragment offset: 0

  Time to live: 62

  Protocol: UDP (0x11)

  Header checksum: 0x10d2 (correct)

  Source: 10.10.12.166 (10.10.12.166)

  Destination: 10.10.11.66 (10.10.11.66)

User Datagram Protocol, Src Port: 32787 (32787), Dst Port: echo (7)

  Source port: 32787 (32787)

  Destination port: echo (7)

  Length: 12

  Checksum: 0xac26 (correct)

Data (4 bytes)

0000 54 45 53 54        TEST

Frame 2 (46 bytes on wire, 46 bytes captured)

  Arrival Time: May 6, 2008 16:31:30.948312000

  Time delta from previous packet: 0.001175000 seconds

  Time relative to first packet: 0.001175000 seconds

  Frame Number: 2

  Packet Length: 46 bytes

  Capture Length: 46 bytes

Ethernet II, Src: 00:d0:b7:1 f:fe:e6, Dst: 00:05:85:88:cc:db

  Destination: 00:05:85:88:cc:db (Juniper__88:cc:db)

  Source: 00:d0:b7:1 f:fe:e6 (Intel_1 f:fe:e6)

  Type: IP (0x0800)

Internet Protocol, Src Addr: 10.10.11.66 (10.10.11.66), Dst Addr: 10.10.12.166 (10.10.12.166)

  Version: 4

  Header length: 20 bytes

  Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)

  0000 00.. = Differentiated Services Codepoint: Default (0x00)

   .... ..0. = ECN-Capable Transport (ECT): 0

   .... ...0 = ECN-CE: 0

  Total Length: 32

  Identification: 0x0000

  Flags: 0x04

   .1.. = Don’t fragment: Set

   ..0. = More fragments: Not set

  Fragment offset: 0

  Time to live: 64

  Protocol: UDP (0x11)

  Header checksum: 0x0ed2 (correct)

  Source: 10.10.11.66 (10.10.11.66)

  Destination: 10.10.12.166 (10.10.12.166)

User Datagram Protocol, Src Port: echo (7), Dst Port: 32787 (32787)

  Source port: echo (7)

  Destination port: 32787 (32787)

  Length: 12

  Checksum: 0xac26 (correct)

Data (4 bytes)

0000 54 45 53 54       TEST

The DF bit in the packet is set, and the UDP checksum field is used. Technically, the UDP checksum is optional, and the client decides whether to use it. The server responds with a checksum because the client used a checksum in the request. In fact, Windows and FreeBSD do the same.

The UDP checksum was made optional to cut processing on reliable networks like small LAN segments to a bare minimum. Today, client and server on the same LAN segment are not very common, and processing the checksum is not a burden for modern computing devices. Also, UDP checksum calculation can be offloaded to modern Ethernet chipsets, so it’s less “expensive” than it used to be. Currently, use of the UDP checksum is common, and most traditional texts say it “should” be used with IPv4. Use of the UDP checksum is mandatory with IPv6.

Note that the program uses client UDP port 32787. This is in the range of ports known as registered ports. We’ll talk about those, and the dynamic port range of 49152 to 65535, later in this chapter. The dynamic port range that a Unix system uses is a kernel-tunable parameter and can be changed using tweaks to the /etc/sysctl.conf file, but information on exactly how to do it is scarce and beyond the scope of this book.

We can see the sockets in use on a Linux host by using the netstat −lp command to display listening sockets. (Although the options imply these are listening ports, it is the socket information that is displayed.)

[root@lnxserver admin]# netstat -lp

Active Internet connections (only servers)

ProtoRecv-Q Send-Q Local AddressForeign AddressState
PID/Program name
tcp00 *:32768*:*LISTEN
1664/
tcp00 localhost.localdo:32769*:*LISTEN
1783/xinetd
tcp00 localhost.localdoma:783*:*LISTEN
 1853/spamd -d -c -a   
tcp00 *:sunrpc*:*LISTEN
1645/
tcp00 *:x11*:*LISTEN
2103/X
tcp00 *:ssh*:*LISTEN
1769/sshd
tcp00 localhost.localdoma:ipp*:*LISTEN
6813/cupsd
tcp00 localhost.localdom:smtp*:*LISTEN
1826/
udp00 *:32768*:* 
1664/
udp00 *:echo*:* 
1923/Echo
udp00 *:sunrpc*:* 
1645/
udp00 *:631*:* 
6813/cupsd
udp00 localhost.localdoma:ntp*:* 
1800/
udp00 *:ntp*:* 
1800/

Image

Active UNIX domain sockets (only servers)

ProtoRefCntFlagsTypeStateI-NodePID/Program name
Path
unix2[ACC]STREAMLISTENING26631939/
/tmp/jd_sockV4
unix2[ACC]STREAMLISTENING28392053/
/tmp/.gdm_socket
unix2[ACC]STREAMLISTENING27142016/
/tmp/.font-unix/fs7100
unix2[ACC]STREAMLISTENING25421872/
/tmp/.iroha_unix/IROHA
unix2[ACC]STREAMLISTENING28492103/X
/tmp/.X11-unix/X0
unix2[ACC]STREAMLISTENING25351862/gpm
/dev/gpmctl

Image

The output is difficult to parse, but we can see our little echo utility (highlighted, and the second line of the UDP section) patiently waiting for clients on port 7 (the output identifies it as the standard “echo” port). UDP, being a stateless protocol, is not technically in a “listening” state, but that’s what the server socket essentially does. The asterisks (*:*) mean that communications will be accepted from another IP address and port.

The command to reveal the same type of information on bsdserver is sockstat.

bsdserver# sockstat

USERCOMMANDPIDFDPROTOLOCAL ADDRESSFOREIGN ADDRESS
rootsendmail884tcp4*:25*:*
rootsendmail886tcp4*:587*:*
rootsshd834tcp4*:22*:*
rootinetd794tcp4*:21*:*
rootinetd795tcp4*:23*:*
rootsyslogd725udp4*:514*:*
USERCOMMANDPIDFDPROTOLOCAL ADDRESSFOREIGN ADDRESS
Rootsendmail885tcp46*:25*:*
Rootsshd833tcp46*:22*:*
Rootsyslogd724udp6*:514*:*
USERCOMMANDPIDFDPROTOADDRESS 
Adminsshd482183streamsshd[48216]:4 
Rootsshd482164streamsshd[48218]:3 
Smmspsendmail913dgramsyslogd[72]:3 
Rootsendmail883dgramsyslogd[72]:3 
Rootsyslogd723dgram/var/run/log 

Image

What UDP Is For

UDP was defined in RFC 768 and refined in RFC 1122. All implementations must follow both RFCs to make interoperability reliable, and all do. UDP uses IP protocol ID 17. Any IPv4 or IPv6 packet received with 17 in the protocol ID field is given to the local UDP service.

UDP is defined as stateless (no session information is kept by hosts) and unreliable (no guarantees of any QoS parameters, not even delivery). This does not mean that UDP traffic is somehow lower priority on the network or through routers. It’s not as if UDP traffic is routinely tossed by stressed-out routers. It just means that if the application using UDP needs to keep track of a session history (“How many datagrams did you get before that link failed?”) or guaranteed delivery (“I’m not sending any more until I know if you got the datagrams I sent.”), then the application itself must do it, because UDP can’t and won’t.

Nevertheless, there is a whole class of applications that use UDP, some almost exclusively. These are applications that are invoked to exchange quick, request–response pairs of messages, such as DNS (“Quick! What IP address goes with www.example.com?”). These applications could suffer while waiting for all the overhead that TCP requires to set up a connection between hosts before sending a message.

Multicast allows one source to send a single packet stream to multiple destinations (TCP is strictly a one-source-to-one-destination protocol), so UDP must be used for multicast data transfer as well. Multicast is not only used with video or audio, but also in applications such as the Dynamic Host Configuration Protocol (DHCP).

In other words, UDP is a low-overhead transport for applications that do not need, or cannot have, the “point-to-point” connections or guaranteed delivery that TCP provides.

Packets carrying UDP traffic in IPv4 sometimes have the DF (Don’t Fragment) bit set in the IPv4 header. However, no one should be surprised or upset to find a UDP datagram riding inside an IPv4 packet without the DF bit set.

The UDP Header

Figure 11.2 shows the UDP header. There are only four fields, and the data inside the datagram (the message) are optional.

The header is only 8 bytes (64 bits) long. First are the 2-byte Source Port field and the 2-byte Destination Port field. These fields are the datagram counterparts of the source and destination IP addresses at the packet level. But unlike IP addresses, there is no structure to the port fields: All values between 0 and 65,353 are represented as pure numerics. This does not mean that all port numbers, source and destination, are the same, however. Port values can be divided into well-known, registered, and dynamic port numbers.

The Length field gives the length in bytes of the UDP datagram, and includes the header fields along with any data. The minimum length is 8 (the header alone), and the maximum value is 65,353. However, the achievable maximum UDP datagram lengths are determined by the size of the send and receive buffers on the host end systems, which are usually set to around 8000 bytes (although they can be changed).

As already mentioned, hosts are required to handle 576-byte IP packets at a minimum, but many protocols (the most common being DNS and DHCP) limit the maximum size of the UDP datagram that they use to 512 bytes or less.

The Checksum field is the most interesting field in the UDP header. This is because the checksum is not a simple value calculated on the UDP header fields and data, if present. The UDP checksum is computed on what is called the pseudo-header. The pseudo-header fields for IPv4 are shown in Figure 11.3.

The all-zero byte is used to provide alignment of the pseudo-header, and the data field must be padded to align it with a 16-bit boundary. The 12 bytes of the UDP pseudo-header are prepended to the UDP datagram, and the checksum is computed on the whole object. For this computation, the Checksum field itself is set to zero, and the 16-bit result placed in the field before transmission. If the checksum computes to zero, an all-1s value is sent, and all-1s is not a computable checksum. The pseudo-header fields are not sent with the datagram.

At the receiver, the value of the Checksum is copied and the field again set to zero. The checksum is again computed on the pseudo-header and compared to the received value. If they match, the datagram is processed by the receiving application indicated by the destination port number. If they do not match, the datagram is silently discarded (i.e., no error message is sent to the source).

Naturally, using 32-bit IPv4 addresses to compute transport layer checksums will not work in IPv6, although the procedure is the same. RFC 2460 establishes a different set of pseudo-header fields for IPv6. The IPv6 pseudo-header is shown in Figure 11.4.

The Next Header value is not always 17 for UDP, because other extension headers could be in use. Length is the length of the upper layer header and the data it carries.

IPv4 and IPv6 Notes

The presence of the IP source and destination address in an upper layer checksum computation strikes many as a violation of the concept of protocol layer independence. (The same concern applies to NAT, discussed in Chapter 27.) In fact, a lot of TCP/IP books mention that including packet level fields in the end-to-end checksum helps assure (when the checksum is correct at the receiver) that the message has not only made its way to right port, but to the correct system.

The presence of a pseudo-header also shows how late in the development process that TCP and UDP were separated from IP. Not only that, but the transport layer and network layer (or, to give them their more intuitive names, the end-to-end layer and routing layer) have always been tightly coupled in any working network.

The use of the UDP checksum is not required for IPv4, but highly recommended. It is required in IPv6, of course. In IPv4, servers that receive client datagrams with the checksum field set are supposed to reply using the checksum, but this is not always enforced. If the IPv4 checksum field is not used, it is set to all 0 bits (recall that all 0 checksums are sent as all-1s).

Port Numbers

Each application running above UDP (and TCP) and IP is indexed by its port number, allowing for the multiplexing of the IP layer. Just as frames with different types of packets inside (on Ethernet, IPv4 is 0x0800 and IPv6 is 0x86DD) are multiplexed onto a single LAN interface, the individual IPv4 or IPv6 packets are multiplexed and distributed by the protocol number (UDP is IP protocol number 17, and TCP is 6).

The port numbers in turn multiplex and distribute datagrams from applications, allowing them to share a single UDP or TCP process, which is usually integrated closely with the operating system. This function of frame Ethertype, packet protocol, and datagram port is shown in Figure 11.5. The figure shows how IPv4 data for DNS makes its way from frame through IPv4 through UDP to the DNS application listening on UDP port 53.

Well-Known Ports

Port numbers can run from 0 to 65353. Port numbers from 0 to 1023 are reserved for common TCP/IP applications and are called well-known ports. The use of well-known ports allows client applications to easily locate the corresponding server application processes on other hosts. For example, a client process wanting to contact a DNS process running on a server must send the datagram to some destination port. The well-known port number for DNS is 53, and that’s where the server process should be listening for client requests. These ports are sometimes called “privileged” ports, although a number of applications that formerly ran in “privileged” mode, such as HTTP servers, do not run this way anymore except when binding to the port. It should be noted that it is getting harder and harder to register new applications in the space below 1023 (these often use registered ports in the range 1024 to 49151).

Ports used on servers are persistent in the sense that they last for a long time, or at least as long as the application is running. Ports used on clients are ephemeral (“lasting a short time,” although the term technically means “lasting a day”) in the sense that they “come and go” as the user runs client applications.

Technically, UDP port numbers are independent from TCP port numbers. In practice, most of the applications indexed by port numbers are the same in UDP or TCP (although a few applications can use either protocol), excepting a handful that are maintained for historical reasons. This does not imply that applications can use TCP or UDP as they choose. It just means that it’s easier to maintain one list rather than two. But no matter what port numbers are used, UDP port 1000 is a different application than TCP port 1000, even though both applications might perform the same function.

Some of the more common well-known port numbers are shown in Table 11.1. In the table, the UDP and TCP port numbers are identical.

Table 11.1

Some Well-Known Ports Used by UDP and TCP Services and Functions

Port Number Service Meaning
7 Echo Used to echo data back to the sender
9 Discard Used to discard data at receiver
13 Daytime Reports time information in user-friendly format
17 Quote Returns a “quote of the day” (rarely used today)
19 Chargen Character generator
53 DNS Domain Name Service
67 DHCP server Server port used to send configuration information
68 DHCP client Client port used to receive configuration information
69 TFTP Trivial file transfer
161 SNMP Used to receive network management queries
162 SNMP traps Used to receive network problem reports
1011–1023 Reserved Reserved for future use

Port numbers above 1023 can be either registered or dynamic (also called private or non-reserved). Registered ports are in the range 1024 to 49151. Dynamic ports are in the range 49152 to 65535. As mentioned, most new port assignments are in the range from 1024 to 49151.

Registered port numbers are non–well-known ports that are used by vendors for their own server applications. After all, not every possible application capability will be reflected in a well-known port, and software vendors should be free to innovate. Of course, if another vendor chooses the same port number for a server process, and they are run on the same system, there would be no way to distinguish between these two seemingly identical applications.

Vendors can register their application’s ports with ICANN. Other software vendors are supposed to respect these registered values and register their own server application port numbers from the pool of unused values. Some registered UDP and TCP protocol numbers are shown in Table 11.2.

The private, or dynamic, port numbers are used by clients and not servers. Datagrams sent from a client to a server are typically only sent to well-known or registered ports (although there are exceptions). Server applications are usually long lived, while client processes come and go as users run them. Client applications therefore are free to choose almost any port number not used for some other purpose (hence the term “dynamic”), and many use different source port numbers every time they are run. The server has no trouble replying to the proper client because the server can just reverse the source and destination port numbers to send a reply to the correct client (assuming the IP address of the client is correct).

All TCP/IP implementations must know the range of well-known, registered, and private ports when choosing a port number to use. Unix systems hold this information is the /etc/services file. Windows users can find this C:\%SystemRoot%\system32\drivers\etc\SERVICES file, where %SystemRoot% will be automatically referred to a folder such as WinNT or WINDOWS. UDP or TCP, but some are unique to one or the other. For example, FTP control uses TCP port 21.

Here is the beginning of the file from winsrv2:

# Copyright (c) 1993-2004 Microsoft Corp.

#

# This file contains port numbers for well-known services defined by IANA

#

# Format:

#

# <service name> <port number>/<protocol> [aliases...] [#<comment>]

#

echo7/tcp  
echo7/udp  
discard9/tcpsink null 
discard9/udpsink null 
systat11/tcpusers#Active users
systat11/tcpusers#Active users
daytime13/tcp  
daytime13/udp  
qotd17/tcpquote#Quote of the day
qotd17/udpquote#Quote of the day
chargen19/tcpttytst source#Character generator
chargen19/udpttytst source#Character generator
ftp-data20/tcp #FTP, data
ftp21/tcp #FTP. control
telnet23/tcp  

Image

[many more lines not shown...]

For the latest global list of well-known, registered, and private port numbers, see www.iana.org/assignments/port-numbers. The port numbers are the same for IPv4 and IPv6.

The Socket

The combination of IPv4 or IPv6 address and port numbers forms an abstract concept called a socket. We’ve mentioned the socket concept briefly before, and will do so again and again in later chapters. The socket concept is important for many reasons, and a later chapter will explore some of them more completely. For now, all that is important to mention is that, for each client–server interaction, there is a socket on each host at the endpoints of the network. The sockets at each end uniquely identify that particular client–server interaction, although the same sockets can be used for subsequent interactions.

Sockets are usually written in IPv4 and IPv6 by adding a colon (:) to the IP address, although sometimes a dot (.) is used instead. In IPv6, it is also necessary to add brackets to avoid confusion with the :: notation, such as in [FC00:490:f100:1000::1]:80. A UDP socket on lnxclient, for example, would be 10.10.12.166:17, while one on bsdserver would be 10.10.12.77:17.

UDP Operation

The delivery of UDP datagrams is by no means certain. The lack of an expected response on the part of a server to a UDP client request is handled by a simple timeout. Responses are not always expected, as might be the case with streaming audio and video. The client might resend the datagram, but in many cases this might not be the best strategy.

In some cases, lack of response is not a reliable indication that anything is wrong with the network or remote host. Routers routinely filter out unwanted packets, and many do so silently, while others send the appropriate ICMP “administratively prohibited” message.

In general, there are five major possible results when an application sends a UDP request, shown in Figure 11.6. Note that any of the replies can be lost on the way back to the sender, generating a timeout.

UDP Overflows

We’ve looked at UDP as a sort of quick-and-dirty request–response interaction between hosts over a network. Delivery is not guaranteed, but neither is an important network property called flow control. A lot of nonsense has been written about flow control, which is a very simple idea. It just means that no sender should ever be able to overwhelm a receiver with traffic. In other words, receivers must have a way to tell senders to slow down. UDP, of course, has no such mechanism.

The confusion over flow control often comes from treating flow control as a synonym for a related concept called congestion control. While flow control is strictly a local property of individual senders and receivers, congestion control is a global property of the network. No sender has to overwhelm a receiver in a congested network: There’s just too much traffic in the router network for things to work properly.

Congestion control often uses flow control to accomplish its goals (source quench was a not-too-sophisticated mechanism). There’s not much else a router can use other than flow control to tell senders to shut up for a while. But that’s no excuse for treating the two as one and the same.

What has this to do with UDP? Well, it is possible for UDP receivers’ buffers, which are usually fixed, to overflow with unexpected UDP datagrams and be forced to discard traffic. Most UDP implementations include a way to display “UDP socket overflows” or discarded UDP datagrams.

But what if an application needs guaranteed delivery, sequencing, and flow control to work properly, and we don’t want to add these to the application? Files cannot use quick request–response messages to transfer themselves over a network. That’s the job of TCP, which is the topic of the next chapter.

Questions for Readers

Figure 11.7 shows some of the concepts discussed in this chapter and can be used to help you answer the following questions.

1. Which UDP header field does UDP use for demultiplexing?

2. What is UDP’s only attempt at error control?

3. A socket is comprised of which two TCP/IP components?

4. What is the registered port range? Is this assigned or controlled?

5. What is the dynamic or private port range? Are these assigned or controlled?