Chapter 18. Network Capture

Sooner or later, you will connect your system to a network, whether it is a LAN segment at work, a cable or DSL modem at home, or even a dial-up connection on the road. You will send and receive packets from a variety of computers that you know almost nothing about. Being able to monitor, capture, and analyze those packets can be incredibly useful, whether to troubleshoot network performance, debug a problematic networking program, or capture an attack for later analysis or as evidence for prosecution.

This chapter is meant to give you a short introduction to the essential tools of capturing and manipulating traffic. For additional resources, I strongly recommend Wireshark & Ethereal Network Protocol Analyzer Toolkit, by Orebaugh et al. (Syngress) and Network Intrusion Detection, by Stephen Northcutt and Judy Novak (SAMS).

tcpdump is a command-line packet sniffer for Unix-based operating systems. To capture packets other than those addressed to the host's own MAC address, it must put the network card into promiscuous mode, which requires superuser (root) access. Most versions of Unix will not let you run tcpdump unless you are root, because being able to see other users' packets would violate Unix's security model.

tcpdump was originally written by Van Jacobson, Craig Leres, and Steven McCanne when they worked at Lawrence Berkeley National Laboratory (LBNL) Network Research Group (NRG), just up the hill from the main UC Berkeley campus. Because of this, the filtering language used in tcpdump is known as the Berkeley Packet Filter (BPF) language.

Acquiring tcpdump is easy: most Linux/Unix/POSIX distributions (or distros, for short) provide simple-to-install packages for both tcpdump and libpcap (you need both to capture traffic). If your distro has no package, the source is available from http://www.tcpdump.org, and compiling it is a fairly straightforward operation.

Once installed, log in as root (or use sudo as discussed in Limiting Access) and run it:

# tcpdump

By default, it picks up the first interface it finds (excluding the loopback, typically eth0 for Linux) and displays each packet it sees on that interface as a single line; for example:

12:55:09.459039 IP 10.10.9.24.3766 > server.ssh: . ack 887032 win 65535
12:55:09.459181 IP server.ssh > 10.10.9.24.3766: P 887280:887396(116) ack 3693 win 9648
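Each line follows a fixed pattern: a timestamp, the protocol, the source and destination written as host.port, and then protocol-specific details. Because the port is appended to the host with a dot, a script that post-processes this text output has to split each endpoint at its last dot. The following Python sketch assumes the default single-line format for IPv4 packets shown above. Other protocols, other tcpdump versions, and the verbose modes print different layouts. The names PacketLine and parse_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PacketLine(NamedTuple):
    """Fields from one line of default tcpdump output for an IPv4 packet."""
    timestamp: str
    src_host: str
    src_port: str
    dst_host: str
    dst_port: str
    details: str


# Assumed layout: "HH:MM:SS.frac IP src.port > dst.port: details"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): (?P<details>.*)$"
)


def split_endpoint(endpoint: str) -> Tuple[str, str]:
    """Split 'host.port' at the LAST dot, because dotted-quad addresses contain dots too."""
    host, _, port = endpoint.rpartition(".")
    return host, port


def parse_line(line: str) -> Optional[PacketLine]:
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not an IPv4 line in the assumed format (e.g., ARP or IPv6)
    src_host, src_port = split_endpoint(m["src"])
    dst_host, dst_port = split_endpoint(m["dst"])
    return PacketLine(m["ts"], src_host, src_port, dst_host, dst_port, m["details"])


pkt = parse_line("12:55:09.459039 IP 10.10.9.24.3766 > server.ssh: . ack 887032 win 65535")
print(pkt.src_host, pkt.src_port, "->", pkt.dst_host, pkt.dst_port)  # 10.10.9.24 3766 -> server ssh
```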

Ctrl-C quits the capture. If you ran the capture above remotely over SSH, you receive a torrent of packets similar to the ones just shown: you are essentially seeing yourself, then telling yourself that you are seeing yourself. This is not terribly useful, although sometimes it is enough to confirm that your network is working properly. Note that if tcpdump finds the port number in the /etc/services file, it displays the service name instead of the number (e.g., .ssh in the example). Also, by default it does a reverse DNS lookup on each address and displays the resulting hostname if one exists (e.g., server in the example). This can be useful, or it can be bad.

Consider this: you may be doing a covert packet capture between yourself and a remote target, or between two targets on a port-mirrored switch (or on a switch whose CAM table you have flooded so that it sends frames out every port). If you then start doing reverse DNS lookups on the hosts you are collecting against, you are leaking information that someone else may be able to monitor and detect. That reveals an activity that would otherwise be passive and hard to trace. To disable name lookups, use the -n option:

# tcpdump -n
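The port-to-name half of this conversion is just a table lookup. The following Python sketch parses a few lines in /etc/services format and formats an endpoint with or without name conversion. The sample is embedded in the code and is not read from your system's file. The sketch performs no DNS queries, and the reverse lookups are the part that can leak information. The names SAMPLE_SERVICES, parse_services, and format_endpoint are hypothetical.

```python
from typing import Dict, Tuple

# A few lines in /etc/services format: "name  port/proto  [aliases...]  [# comment]"
SAMPLE_SERVICES = """\
# Network services, Internet style
ssh             22/tcp                          # SSH Remote Login Protocol
domain          53/udp
http            80/tcp          www             # WorldWideWeb HTTP
"""


def parse_services(text: str) -> Dict[Tuple[int, str], str]:
    """Map (port, protocol) -> service name, keeping the first entry seen."""
    table: Dict[Tuple[int, str], str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 2:
            continue
        port, proto = fields[1].split("/")
        table.setdefault((int(port), proto), fields[0])
    return table


def format_endpoint(host: str, port: int, proto: str,
                    services: Dict[Tuple[int, str], str], convert: bool) -> str:
    """Render 'host.port', swapping in the service name when conversion is enabled."""
    if convert:
        return f"{host}.{services.get((port, proto), port)}"
    return f"{host}.{port}"
```

Unknown ports, such as the client's ephemeral port 3766, fall through and are printed as numbers whether or not conversion is enabled.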

What if you want to ssh in on one interface and then monitor on another? A neat trick is to bring the second interface up but assign it no IP address. The interface then does not take part in the monitored network, which makes it much harder to detect or attack from that network. (On Linux, an interface that is up normally gets an IPv6 link-local address automatically, so also disable IPv6 on it if you want it truly silent.) To specify which interface to capture on, use the -i option followed by the interface name; for example:

# tcpdump -i eth1

Say your Unix system is also forwarding traffic, perhaps modifying or even dropping some of it along the way (as an inline IDS or a proxy server would), and you want to see the traffic it forwards. Or suppose your machine is multihomed or connected to multiple SPAN ports. On a Linux system, you can easily capture traffic to and from all interfaces at once with the any pseudo-interface; for example:

# tcpdump -i any

If you save packets to disk (discussed later) while capturing on any, the resulting file is typically called a Linux cooked capture. In this format, a pseudo link-layer header takes the place of each packet's real link-layer (e.g., Ethernet) header. Also keep in mind that capturing on any does not put the interfaces into promiscuous mode, so only packets sent to or from the system itself are captured.
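You can tell a cooked capture from an ordinary one by the link-layer type recorded in the file's 24-byte global header. The value is 1 for Ethernet and 113 for the Linux cooked (SLL) format; newer libpcap versions may write the version 2 format, 276, instead. The following Python sketch reads that header. It handles only the classic pcap format, not pcapng. The names parse_global_header and describe are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
MAGIC_NSEC = 0xA1B23C4D  # classic pcap, nanosecond timestamps

# A few values from the tcpdump.org link-layer header type registry.
LINKTYPE_NAMES = {
    1: "Ethernet (LINKTYPE_ETHERNET)",
    113: "Linux cooked (LINKTYPE_LINUX_SLL)",
    276: "Linux cooked v2 (LINKTYPE_LINUX_SLL2)",
}


class PcapHeader(NamedTuple):
    byte_order: str           # "<" little-endian or ">" big-endian
    nanosecond: bool          # True if timestamps are in nanoseconds
    version: Tuple[int, int]
    snaplen: int
    linktype: int


def parse_global_header(data: bytes) -> PcapHeader:
    """Parse the 24-byte global header at the start of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    for order in ("<", ">"):  # the magic number reveals the writer's byte order
        (magic,) = struct.unpack(order + "I", data[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng uses a different layout)")
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(order + "HHiIII", data[4:24])
    return PcapHeader(order, magic == MAGIC_NSEC, (major, minor), snaplen, linktype)


def describe(path: str) -> str:
    """One-line summary, e.g. to check whether a file is a Linux cooked capture."""
    with open(path, "rb") as f:
        hdr = parse_global_header(f.read(24))
    name = LINKTYPE_NAMES.get(hdr.linktype, f"link type {hdr.linktype}")
    return f"{path}: {name}, snaplen {hdr.snaplen}"
```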

To get a list of available capture interfaces, use the -D option:

# tcpdump -D
1.eth0
2.eth1
3.eth2
4.eth3
5.any (Pseudo-device that captures on all interfaces)
6.lo

My example shows four Ethernet interfaces, the Linux any interface, and the loopback interface.
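When scripting captures, you may want the interface names rather than the numbered listing. The following Python sketch runs tcpdump -D and parses each index.name line. It assumes the output shape shown above, which varies between versions; newer releases may append status flags such as [Up, Running], and those end up in the description field. The names CaptureInterface, parse_interface_list, and list_capture_interfaces are hypothetical.

```python
import re
import subprocess
from typing import List, NamedTuple


class CaptureInterface(NamedTuple):
    index: int
    name: str
    description: str  # empty if tcpdump printed nothing after the name


# Assumed line shape: "<index>.<name>" optionally followed by free text.
IFACE_RE = re.compile(r"^(\d+)\.(\S+)\s*(.*)$")


def parse_interface_list(output: str) -> List[CaptureInterface]:
    interfaces = []
    for line in output.splitlines():
        m = IFACE_RE.match(line.strip())
        if m:
            interfaces.append(CaptureInterface(int(m[1]), m[2], m[3]))
    return interfaces


def list_capture_interfaces() -> List[CaptureInterface]:
    """Run 'tcpdump -D' (may require root) and parse what it prints."""
    result = subprocess.run(["tcpdump", "-D"], capture_output=True, text=True, check=True)
    return parse_interface_list(result.stdout)
```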

Up until now, you have been capturing any old packet that comes across the wire. And while this may be handy, it is not always desirable (consider the problem of sshing into a box, then performing a tcpdump and capturing your own SSH session in the process). As mentioned earlier, the gents from Berkeley came up with a filtering language to specify packets of interest, which became known as the Berkeley Packet Filter (BPF) language. It can be as simple or as complex as you like. We will start out with some basics.

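Anything that follows the options on the tcpdump command line is treated as a BPF expression. An expression consists of one or more primitives, and a primitive usually consists of an identifier (a name or number) preceded by one or more qualifiers. There are three different kinds of qualifiers:

type
    Says what kind of thing the identifier refers to: host, net, port, or portrange (for example, host server or port 22). If no type is given, host is assumed.
dir
    Specifies a direction of transfer, such as src, dst, src or dst, and src and dst (for example, src host 10.10.9.24). If no direction is given, src or dst is assumed.
proto
    Restricts the match to a particular protocol, such as ether, ip, ip6, arp, tcp, or udp (for example, tcp port 80). If no protocol is given, all protocols consistent with the type are assumed.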

Here's a simple example: say you want to capture only IPv4 packets (no ARP, IPX, or IPv6). You could use the ip primitive to create the following filter:

# tcpdump -i eth1 ip

You can take these different primitives and combine them, using Boolean logic (and, not, or); for example:

# tcpdump -i eth1 tcp port 80 and ether host 00:0C:F1:F1:B6:20

This would listen on interface eth1 for traffic on TCP port 80 (normally HTTP) that was sent or received by the system with the MAC address 00:0C:F1:F1:B6:20.
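Because the expression is just the text that follows the options, it is easy to build programmatically. The following Python sketch composes primitives with and, or, and not, wrapping each term in parentheses so precedence is explicit. It passes the result as a single argument, so the parentheses need no shell quoting. The helper names are hypothetical. The resulting list could be handed to subprocess.run, which needs root privileges to actually capture.

```python
from typing import List


def bpf_and(*terms: str) -> str:
    """Join primitives with 'and', parenthesizing each term so precedence is explicit."""
    return " and ".join(f"({t})" for t in terms)


def bpf_or(*terms: str) -> str:
    return " or ".join(f"({t})" for t in terms)


def bpf_not(term: str) -> str:
    return f"not ({term})"


def tcpdump_argv(interface: str, expression: str, no_resolve: bool = True) -> List[str]:
    """Options first, then the whole BPF expression as one final argument."""
    argv = ["tcpdump", "-i", interface]
    if no_resolve:
        argv.append("-n")
    argv.append(expression)
    return argv


# The filter from the example above, built from its two primitives:
expr = bpf_and("tcp port 80", "ether host 00:0C:F1:F1:B6:20")
argv = tcpdump_argv("eth1", expr)
```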

This next example is very useful:

# tcpdump -i eth2 not tcp port 22

This captures everything except traffic to or from TCP port 22 (normally SSH). That is really handy when you have ssh'd in remotely to run the capture and want to exclude your own encrypted (and therefore useless) SSH session.
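Excluding port 22 also hides any other SSH traffic on the segment, which may be exactly what you want to see. A narrower alternative, not from the original text, is to exclude only your own session. OpenSSH exports the SSH_CONNECTION environment variable (client address, client port, server address, server port) in the remote shell. The following Python sketch turns that variable into a filter and falls back to not tcp port 22 when it is absent. The function name is hypothetical.

```python
import os
from typing import Optional


def exclude_own_ssh_session(ssh_connection: Optional[str] = None) -> str:
    """Build a filter that drops only this SSH session's packets.

    OpenSSH sets SSH_CONNECTION to "client_ip client_port server_ip server_port".
    Falls back to the broader 'not tcp port 22' when the value is missing or empty.
    """
    value = ssh_connection if ssh_connection is not None else os.environ.get("SSH_CONNECTION")
    if not value:
        return "not tcp port 22"
    client_ip, client_port, server_ip, server_port = value.split()
    # Both hosts and both ports must match, so other SSH sessions stay visible.
    return (f"not (host {client_ip} and host {server_ip} "
            f"and tcp port {client_port} and tcp port {server_port})")
```

Read the variable in your own login shell and pass the value in explicitly if needed, since it may not survive sudo's environment handling.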

Watching packets fly over the wire is interesting, for about 10 seconds. Under normal circumstances, packets scroll past far too quickly for you to read them, let alone make sense of them. Sure, you can pipe the output through your favorite pager (e.g., more or less), but it would be much more useful to save the packets to disk.

Fortunately, tcpdump has a switch to do just that: the write option, -w. When you give a filename after -w, tcpdump writes the raw packets to that file instead of displaying them:

# tcpdump -i eth2 -w eth2-all-but-ssh.pcap not tcp port 22
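With -w, tcpdump saves raw packets in the libpcap file format rather than as text. A file in that format is one 24-byte global header followed by a 16-byte header and the saved bytes for each packet. The following Python sketch builds a minimal little-endian file with microsecond timestamps to show that layout. It is meant for understanding the format or generating test files, not as a replacement for tcpdump. It assumes the frames you pass in are complete Ethernet frames, and the name build_pcap is hypothetical.

```python
import struct
import time
from typing import Iterable, Optional

PCAP_MAGIC = 0xA1B2C3D4   # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def build_pcap(frames: Iterable[bytes], snaplen: int = 65535,
               linktype: int = LINKTYPE_ETHERNET,
               timestamp: Optional[float] = None) -> bytes:
    """Return the bytes of a little-endian classic pcap file containing `frames`."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    parts = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)]
    for frame in frames:
        ts = time.time() if timestamp is None else timestamp
        sec = int(ts)
        usec = int((ts - sec) * 1_000_000)
        captured = frame[:snaplen]  # keep at most snaplen bytes, as a capture would
        # Record header: seconds, microseconds, bytes saved (incl_len), bytes on the wire (orig_len).
        parts.append(struct.pack("<IIII", sec, usec, len(captured), len(frame)))
        parts.append(captured)
    return b"".join(parts)
```

For example, writing build_pcap([bytes(60)]) to a file named example.pcap gives a one-packet file (a dummy all-zero frame) that tcpdump -r example.pcap should be able to read back.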

Older versions of tcpdump examine only the first 68 bytes of each packet by default (96 with SunOS NIT). That is enough to capture the IP header and an ICMP, TCP, UDP, or similar header, but not much of the payload. This saves memory (packet buffers) and improves performance, but a file containing only packet headers is of limited use for most later analysis. To ensure the entire packet is captured, including the payload, set the snapshot length (snaplen) option, -s, to zero, which captures the whole packet regardless of its length. Newer versions of tcpdump default to a snaplen large enough for full packets, but -s 0 does no harm; for example:

# tcpdump -s 0 -i eth2 -w eth2-all-but-ssh.pcap not tcp port 22

These are the most common options I use when I'm creating pcaps for later review.
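You can check afterward whether a capture was cut short by the snaplen. Each record header stores both the number of bytes saved (incl_len) and the packet's original length on the wire (orig_len). The following Python sketch counts the records whose incl_len is smaller than orig_len. It handles classic pcap only and reads the whole file into memory, so it is meant for modest files. The names iter_records and count_truncated are hypothetical.

```python
import struct
from typing import Iterator, Tuple

MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond classic pcap


def iter_records(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (incl_len, orig_len) for each record in a classic pcap file's bytes."""
    if struct.unpack("<I", data[:4])[0] in MAGICS:
        order = "<"
    elif struct.unpack(">I", data[:4])[0] in MAGICS:
        order = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, orig_len = struct.unpack(order + "IIII", data[offset:offset + 16])
        yield incl_len, orig_len
        offset += 16 + incl_len  # the bytes stored on disk are incl_len, not orig_len


def count_truncated(data: bytes) -> Tuple[int, int]:
    """Return (truncated, total); a record is truncated when incl_len < orig_len."""
    truncated = total = 0
    for incl_len, orig_len in iter_records(data):
        total += 1
        if incl_len < orig_len:
            truncated += 1
    return truncated, total
```

A capture taken with a 68-byte snaplen would typically show most full-size packets as truncated. A capture taken with -s 0 would normally show none.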

One problem with writing packets to disk is the sheer volume of data that gets written. Good BPF-fu can produce more precise captures, but there are times when we cannot filter. We may not know what we are looking for, we may need to capture a lot of data before we find something interesting, or we may need to capture for a very long time. All of these scenarios result in very large pcap files. Since most systems can handle 2 GB files these days without any problems, this could be considered a non-issue, except for the fact that eventually you will want to view these pcaps. Viewing a 2 GB pcap file is not something I have ever tried, as I have neither 3-4 GB of RAM nor 30-45 minutes of free time to watch it load.

tcpdump has a -C (chunk) option that spreads a capture across several files. The -C option takes a file size in millions of bytes (1,000,000 bytes, not 1,048,576-byte megabytes). When the current file exceeds that size, tcpdump closes it and starts a new capture file; for example:

# tcpdump -s 0 -C 100 -i eth2 -w eth2-all-but-ssh.pcap not tcp port 22

This creates a series of pcap files of roughly 100 million bytes each until you stop tcpdump with Ctrl-C. Each file can run slightly past the limit, because tcpdump does not split a packet between two capture files. The first file has exactly the name given with -w, and each later file has the same name with a number appended, starting at 1 and counting up. Unfortunately, the number goes after the entire name, so a conventional .pcap extension ends up in the middle: the files are named eth2-all-but-ssh.pcap, eth2-all-but-ssh.pcap1, eth2-all-but-ssh.pcap2, and so on.
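If the .pcap1-style names bother you, they are easy to fix after the capture. The following Python sketch encodes the naming rule for -C without -W, assuming (as the tcpdump manual page describes) that the first file keeps the exact -w name. It also shows the units -C uses, and it renames eth2-all-but-ssh.pcap2 to eth2-all-but-ssh-2.pcap so the extension comes last. The helper names are hypothetical. Run the rename only after tcpdump has exited, and check for name clashes first.

```python
import re
from pathlib import Path
from typing import Optional

MB = 1_000_000  # -C counts millions of bytes, not 1,048,576-byte megabytes


def chunk_name(base: str, index: int) -> str:
    """Name of the index-th file (0 = first) written with -C and no -W."""
    return base if index == 0 else f"{base}{index}"


def pcap_friendly_name(name: str) -> Optional[str]:
    """Turn 'capture.pcap3' into 'capture-3.pcap'; None if there is no trailing number."""
    m = re.fullmatch(r"(.*)\.pcap(\d+)", name)
    if m is None:
        return None
    return f"{m[1]}-{m[2]}.pcap"


def rename_chunks(directory: str, base: str) -> None:
    """Rename numbered chunks in place; the unnumbered first file is left alone."""
    for path in Path(directory).glob(base + "*"):
        new = pcap_friendly_name(path.name)
        if new is not None:
            path.rename(path.with_name(new))
```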

The -C option can be combined with the -W (wrap) option to limit how many files are created. When that number is reached, tcpdump starts overwriting the oldest file, creating a rolling buffer. This is useful for a really long-term capture when you do not want the capture to fail because the disk filled up. The following command keeps 10 pcap files of 100 million bytes each (about a gigabyte in total):

# tcpdump -s 0 -C 100 -W 10 -i eth2 -w eth2-all-but-ssh.pcap not tcp port 22

I would normally set -W higher, say 100 or even 500, so that data is not overwritten right away.
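The disk budget for a rolling buffer is simply the file size times the number of files. Once every slot has been used, each new file replaces the oldest one. The following Python sketch does that arithmetic and models which chunks survive. With -C 100, a -W of 100 or 500 means about 10 GB or 50 GB of disk. This is a model of the behavior described above, not tcpdump's code, and the function names are hypothetical. Real files run slightly past the -C size.

```python
from collections import deque
from typing import List

MB = 1_000_000  # -C units: millions of bytes


def nominal_disk_budget(chunk_mb: int, wrap: int) -> int:
    """Nominal bytes kept by '-C chunk_mb -W wrap' (each real file may overshoot slightly)."""
    return chunk_mb * MB * wrap


def slot_for_chunk(chunk: int, wrap: int) -> int:
    """File slot a chunk lands in: slots are reused round-robin once all are filled."""
    return chunk % wrap


def surviving_chunks(chunks_written: int, wrap: int) -> List[int]:
    """Chunk numbers (0-based, in write order) still on disk after a rolling capture."""
    ring = deque(maxlen=wrap)  # appending to a full deque discards the oldest entry
    for chunk in range(chunks_written):
        ring.append(chunk)
    return list(ring)
```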

There are a variety of options for displaying data. I will cover the two most important: verbosity and format.

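Verbosity is controlled with the -q and various -v options:

-q
    Quick (quiet) output: prints less protocol information, so output lines are shorter.
-v
    Slightly more verbose output; for example, the IP time to live, identification, total length, and options are printed.
-vv
    Even more verbose output; for example, additional fields are printed from NFS reply packets, and SMB packets are fully decoded.
-vvv
    Even more verbose output; for example, telnet options are printed in full.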

Format controls how the data is displayed. There are a few options for this: