An Example—Dissecting a Spam Network

Now let’s see how these tools can be used in the real world. This section shows how you can figure out the structure of a sophisticated spam operation. A point that I will stress here and throughout the book is how valuable it can be to have multiple examples of an email or a web site. Even though the details may differ, the similarities between them can be very revealing.

For a while last year I was getting a lot of spam emails that all had a similar underlying appearance. The products being offered varied, as did the name of the Sender, but they clearly had a common origin. The From addresses all had the form and they all had the same mechanism for unsubscribing from their mailing list. So I collected a bunch of messages that fit this pattern and made a list of the web sites they were directing me to. At first glance these seemed to be a diverse group but as I added more examples the domain names started to take on a similar form. That was my motivation to investigate further and start to run dig on the hostnames. Table 2-3 shows a small sample of the results from that survey, sorted by IP address.

Table 2-3. Hostnames with similar IP addresses

Warning

Web sites come and go. The dodgy ones, in particular, often have a very short life. So don’t be surprised if the specific IP addresses and hostnames given here no longer give the same results. Instead, let the examples illustrate the underlying techniques and use them to explore sites that you come across in your own email.

First, look at the hostnames. You can see a common pattern in the domain names, with two or three words joined together that almost make sense. Likewise, the first part of each hostname has the form of a name and a number, and there are two groups that are arranged sequentially. Now look at the IP addresses: the pattern is glaringly obvious. The people behind this operation would appear to have a bank of servers covering a significant block of IP addresses. These are organized very logically such that, for example, servers in the dynamicrhythms.com block have consecutive IP addresses.

It’s a safe bet that other servers occupy the gaps in the IP address range. We can even predict some of the hostnames. The next step was to figure out just how large this network was. I couldn’t get that information directly, but by calling dig systematically across a range of addresses, I thought I might be able to define its limits. Doing this one address at a time became tedious, so I wrote a small Perl script that takes a range of numeric addresses and performs a reverse lookup on each of them. This can be useful in other scenarios, so I’ve included it here as Example 2-1. Note that the script has to convert back and forth between the dotted-quad notation that dig expects and the plain decimal form of each address, which is what lets you step through a range sequentially.

Example 2-1. scan_ip_range.pl

#!/usr/bin/perl -w
# Runs dig on all IP addresses in the specified range

die "Usage: $0 <start IP addr> <end IP addr>\n" unless @ARGV == 2;
my $start_dec = dotted_quad_to_decimal($ARGV[0]);
my $end_dec   = dotted_quad_to_decimal($ARGV[1]);

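for(my $i=$start_dec; $i<=$end_dec; $i++) {
    my $i_ip = decimal_to_dotted_quad($i);
    # dig prints nothing when there is no reverse entry, so strip its
    # trailing newline and always add our own
    chomp(my $hostname = `dig +short -x $i_ip`);
    printf "%-15s %s\n", $i_ip, $hostname;
}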

sub dotted_quad_to_decimal {
   my @fields = split /\./, shift;
   ($fields[0] * 16777216) + ($fields[1] * 65536) +
   ($fields[2] * 256)     +  $fields[3];
}

sub decimal_to_dotted_quad {
    my $decimal = shift;
    my $factor = 16777216;
    my @quad = ();
    for(my $i=0; $i<4; $i++) {
       $quad[$i] = int($decimal / $factor);
       $decimal -= $quad[$i] * $factor;
       $factor /= 256;
    }
    join ".", @quad;
}
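
If you would rather not do the arithmetic by hand, the same conversion can be sketched in Python, whose standard ipaddress module handles the dotted-quad and integer forms for you. This is an alternative illustration, not part of Example 2-1; the helper names ip_range and reverse_name are my own, and the sketch only builds the list of addresses and the name that a reverse lookup queries, without contacting any DNS server.

```python
import ipaddress

def ip_range(start, end):
    """Yield every IPv4 address from start to end, inclusive, as dotted-quad strings."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    for value in range(first, last + 1):
        yield str(ipaddress.IPv4Address(value))

def reverse_name(ip):
    """Return the in-addr.arpa name that a reverse (PTR) lookup queries for ip."""
    return ipaddress.IPv4Address(ip).reverse_pointer

# The integer form is 66*16777216 + 111*65536 + 233*256 + 170
print(int(ipaddress.IPv4Address("66.111.233.170")))

# Stepping by integer crosses the boundary between the two blocks cleanly
print(list(ip_range("66.111.233.254", "66.111.234.1")))

# The octets are reversed and in-addr.arpa is appended
print(reverse_name("66.111.233.170"))
```

Working with integers is what makes the step from 66.111.233.255 to 66.111.234.0 trivial; the dotted-quad form exists only for input and output.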

Running this over the 66.111.233.x and 66.111.234.x blocks (of 256 addresses each) uncovered 211 hostnames similar to those above, which fell into 60 groups of related names. I didn’t bother to scan adjacent blocks, but I know from other sources on the Web that the network extends even further than this. Here is a sample of the scan output:

    66.111.233.168  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.169  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.170  dyna1.dynamicrhythms.com.
    66.111.233.171  dyna2.dynamicrhythms.com.
    66.111.233.172  dyna3.dynamicrhythms.com.
    66.111.233.173  dyna4.dynamicrhythms.com.
    66.111.233.174  dyna5.dynamicrhythms.com.
    66.111.233.175  spec1.greenplanetspecials.com.
    66.111.233.176  spec2.greenplanetspecials.com.

One other thing to note from these scans was the mapping of a significant number of the IP addresses in the 66.111.233.x block to a single host called 233-111-66.ftl-nj.webhostplus.com, and of addresses in the other block to 234-111-66.ftl-nj.webhostplus.com. We’ll return to this shortly.
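
With a few hundred lines of scan output, sorting the hostnames into groups by hand gets tedious. The following Python sketch shows one way it might be done, using the sample lines above as input. The function name group_by_domain is my own, and taking the last two labels of a hostname as its domain is a simplifying assumption that works for .com names but not for domains such as example.co.uk.

```python
from collections import defaultdict

SCAN_OUTPUT = """\
66.111.233.168  233-111-66.ftl-nj.webhostplus.com.
66.111.233.169  233-111-66.ftl-nj.webhostplus.com.
66.111.233.170  dyna1.dynamicrhythms.com.
66.111.233.171  dyna2.dynamicrhythms.com.
66.111.233.172  dyna3.dynamicrhythms.com.
66.111.233.173  dyna4.dynamicrhythms.com.
66.111.233.174  dyna5.dynamicrhythms.com.
66.111.233.175  spec1.greenplanetspecials.com.
66.111.233.176  spec2.greenplanetspecials.com.
"""

def group_by_domain(scan_text):
    """Map each domain to the list of (ip, hostname) pairs found under it."""
    groups = defaultdict(list)
    for line in scan_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and addresses with no reverse entry
        ip, hostname = parts
        hostname = hostname.rstrip(".")  # dig prints fully qualified names
        # Assumption: the domain is the last two labels of the hostname
        domain = ".".join(hostname.split(".")[-2:])
        groups[domain].append((ip, hostname))
    return dict(groups)

for domain, hosts in group_by_domain(SCAN_OUTPUT).items():
    print(f"{domain:28s} {len(hosts)} address(es)")
```

Addresses that fall under the hosting company's own domain stand out immediately in the result, which is the pattern noted above.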

So far we’ve used dig for reverse lookups. Using it with the reported hostnames would not be expected to add much information in this case. In fact, a sampling of such queries as I write this, some months after that period of spam, shows that many do not return IP addresses. That tells me that not only have these sites been taken down but also that the DNS entries have been removed. Fortunately for us, someone slipped up and left the reverse entries in the tables. The management of DNS records can be surprisingly sloppy and still work just fine. Sometimes that works to your advantage.
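
To check a batch of hostnames for this kind of leftover reverse entry, you could compare each reverse result with a forward lookup of the name it returned. Here is a minimal Python sketch of that comparison. The function names and the three labels it returns are my own invention, and the demonstration uses a small stand-in table with reserved documentation addresses rather than live DNS, so that it gives the same answer whenever you run it.

```python
import socket

def forward_lookup(hostname):
    """Return an IPv4 address for hostname, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # covers socket.gaierror, raised when resolution fails
        return None

def classify(ip, hostname, resolve=forward_lookup):
    """Compare a reverse-lookup result (ip -> hostname) with the forward lookup."""
    forward = resolve(hostname)
    if forward is None:
        return "reverse only"   # forward record removed, reverse entry left behind
    if forward == ip:
        return "consistent"
    return "mismatch"           # the name now points somewhere else

# Stand-in for DNS: only one of the names still has a forward record.
# These names and addresses are made up for the demonstration.
fake_dns = {"www1.example.com": "192.0.2.10"}

print(classify("192.0.2.10", "www1.example.com", resolve=fake_dns.get))
print(classify("192.0.2.11", "www2.example.com", resolve=fake_dns.get))
```

Leaving out the resolve argument makes classify use the real resolver through forward_lookup. Note that gethostbyname returns a single address, so a name with several address records may be reported as a mismatch when it is not.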

Now let’s see what whois can contribute to this story. Running it on a sample of the domain names turns up a mixed bag of names and addresses in the contact information. Most of the domains appear linked to three addresses in the towns of Sunny Isles Beach, Aventura, and Hollywood, which are all in Florida. I don’t know if these are real addresses or not, but they serve as a type of signature or fingerprint for the people behind these sites. We’ll talk more about making these kinds of connections later in the book.

Warning

Note that you should NOT write scripts that attempt to step through whois records the way I did with the DNS lookups. This is exactly how spammers have built up their mailing lists in the past, and the domain registries will likely detect your script and block any further whois queries coming from your computer. Modest numbers of queries submitted manually should not get you into trouble.

Using whois with any of the IP addresses revealed something about the network these servers reside in:

    [whois.arin.net]
    OrgName:    WebHostPlus Inc
    OrgID:      WEBHO-3
    Address:    100 Plaza drive
    City:       Secaucus
    StateProv:  NJ
    PostalCode: 07094
    Country:    US

    NetRange:   66.111.192.0 - 66.111.255.255
    CIDR:       66.111.192.0/18
    NetName:    WEBHOSTPLUS-INC
    NetHandle:  NET-66-111-192-0-1
    Parent:     NET-66-0-0-0-0
    NetType:    Direct Allocation
    NameServer: NS.WHP-SERVER.COM
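
The NetRange and CIDR lines describe the same block of addresses in two notations. If the /18 form is unfamiliar, a short Python sketch using the standard ipaddress module can confirm that the two agree and that the addresses from the scan fall inside the block. The function name describe is my own.

```python
import ipaddress

def describe(cidr):
    """Return the first address, last address, and size of a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses

first, last, size = describe("66.111.192.0/18")
print(first, "-", last, f"({size} addresses)")

# Is one of the scanned addresses inside the allocation?
block = ipaddress.ip_network("66.111.192.0/18")
print(ipaddress.ip_address("66.111.233.170") in block)

# Going the other way: turn the NetRange into CIDR notation
cidrs = ipaddress.summarize_address_range(
    ipaddress.IPv4Address("66.111.192.0"),
    ipaddress.IPv4Address("66.111.255.255"),
)
print([str(c) for c in cidrs])
```

A /18 fixes the first 18 bits of the address and leaves 14 free, which gives 16,384 addresses. The two blocks scanned earlier account for only 512 of them.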

WebHost Plus is a well-established company in New Jersey that provides web hosting and other services to a large number of clients. Our friends sending out the emails are simply using them to host their web sites. But with over 200 web sites, each with a unique IP address, this looks like a big operation. Are they really running that many different web servers and physical computers?

No, what they are doing is configuring their servers with multiple IP addresses. Even with a single Ethernet card, you can configure Linux, for example, to act as though it has 256 IP addresses. Then you configure the Apache web server to respond to each address with a different web site. That’s what was going on with the 66.111.233.x addresses handled by one machine (233-111-66.ftl-nj.webhostplus.com) and the 66.111.234.x block handled by another. In their DNS tables, all the addresses were mapped to the canonical names of those machines until they were allocated to a client’s site. This is how companies such as WebHost Plus can afford to offer web sites for just a few dollars a month. You are sharing the server with other people and, as long as no one site hogs all the CPU cycles, it will appear as though you have your own dedicated server.
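
To make the Apache side of this concrete, here is a Python sketch that prints the kind of IP-based virtual host stanzas a hosting company might generate for a group of sites. This is a guess at the general shape of such a setup, not the actual configuration used on these servers. The function name virtual_host and the /var/www paths are hypothetical, and the hostnames are simply the ones seen in the scan.

```python
def virtual_host(ip, server_name, doc_root):
    """Return an Apache IP-based <VirtualHost> stanza for one site."""
    return (
        f"<VirtualHost {ip}:80>\n"
        f"    ServerName {server_name}\n"
        f"    DocumentRoot {doc_root}\n"
        f"</VirtualHost>\n"
    )

# Addresses and names taken from the scan; document roots are made up
sites = [
    ("66.111.233.170", "dyna1.dynamicrhythms.com"),
    ("66.111.233.171", "dyna2.dynamicrhythms.com"),
    ("66.111.233.175", "spec1.greenplanetspecials.com"),
]

for ip, name in sites:
    print(virtual_host(ip, name, f"/var/www/{name}"))
```

Each stanza ties one IP address to one site, so a single Apache process on one machine can present every address as if it were a separate server.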

It seems like our friends are giving themselves a lot of extra work creating and managing all these distinct web sites. Why go to all that trouble? It’s all an attempt to evade the spam filters that are becoming ever more sophisticated. By generating emails with continually evolving content and including links to web sites with different hostnames they can avoid—or at least delay—being detected by the spam filters and being blacklisted by mail relays. They can run one web site for a week or two, shut it down, and then reappear under a totally different name.

This example has shown how much can be learned about an operation simply using dig and whois. By looking at similar emails, I found a set of hostnames that resembled each other. dig revealed that these all had similar IP addresses. Reverse lookups across a wider range of addresses turned up a lot more domains and hostnames, and whois showed that the same company hosted all of these. Unallocated addresses from the reverse lookup scan suggested that two physical servers were being used to host all these web sites. Running whois on the domain names turned up a confused mass of contact information that, in isolation, was not that useful. But even untrustworthy contact information can be useful as a signature or fingerprint for this operation.