Now let’s see how these tools can be used in the real world. This section shows how you can figure out the structure of a sophisticated spam operation. A point that I will stress here and throughout the book is how valuable it can be to have multiple examples of an email or a web site. Even though the details may differ, the similarities between them can be very revealing.
For a while last year I was getting a lot of spam emails that all
had a similar underlying appearance. The products being offered varied,
as did the name of the Sender, but they clearly had a common origin. The
From addresses all had the form
<somebody>@stderr.<somedomain>.com and they
all had the same mechanism for unsubscribing from their mailing list. So
I collected a bunch of messages that fit this pattern and made a list of
the web sites they were directing me to. At first glance these seemed to
be a diverse group but as I added more examples the domain names started
to take on a similar form. That was my motivation to investigate further
and start running dig on the hostnames. Table 2-3 shows a small sample of
the results from that survey, sorted by IP address.
Table 2-3. Hostnames with similar IP addresses
Hostname | IP address |
---|---|
| |
| |
| |
| |
| |
| |
| |
| |
| |
|
Web sites come and go. The dodgy ones, in particular, often have a very short life. So don’t be surprised if the specific IP addresses and hostnames given here no longer give the same results. Instead, let the examples illustrate the underlying techniques and use them to explore sites that you come across in your own email.
First, look at the hostnames. The domain names share a common pattern: two or three words joined together that almost make sense. Likewise, the first part of each hostname has the form of a name and a number, and there are two groups that are arranged sequentially. Now look at the IP addresses, where the pattern is glaringly obvious. The people behind this operation appear to have a bank of servers covering a significant block of IP addresses. These are organized very logically such that, for example, servers in the dynamicrhythms.com block have consecutive IP addresses.
It’s a safe bet that other servers occupy the gaps in the IP
address range. We can even predict some of the hostnames. The next step
was to figure out just how large this network was. I couldn’t get that
information directly, but by calling dig systematically across a range of
addresses, I thought I might be able to define its limits. Doing this
one address at a time became tedious, so I wrote a small Perl script
that takes a range of numeric addresses and performs a reverse lookup on
each of them. This can be useful in other scenarios, so I’ve included it
here as Example 2-1.
Note that you need to switch between the dotted-quad notation that dig
expects and the decimal (single integer) form you need in order to step
through the addresses sequentially.
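As a standalone illustration of that conversion, here is a minimal sketch that uses Perl's core Socket module together with pack and unpack, rather than the explicit arithmetic in Example 2-1. The function names ip_to_int and int_to_ip are my own choices for this sketch and do not appear in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Dotted quad -> 32-bit integer. "N" is an unsigned 32-bit value in
# network (big-endian) byte order, which matches the order of the octets.
# ip_to_int is a hypothetical helper name used only in this sketch.
sub ip_to_int {
    my $ip = shift;
    my $packed = inet_aton($ip);
    die "Not a valid address: $ip\n" unless defined $packed;
    return unpack("N", $packed);
}

# 32-bit integer -> dotted quad (also a hypothetical helper name).
sub int_to_ip {
    return inet_ntoa(pack("N", shift));
}

# Stepping to the next address is just integer addition.
my $n = ip_to_int("66.111.233.170");
print int_to_ip($n + 1), "\n";   # prints 66.111.233.171
```

Adding one to the integer form also carries across octet boundaries, so the address after 66.111.233.255 comes out as 66.111.234.0 without any special handling.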
Example 2-1. scan_ip_range.pl
    #!/usr/bin/perl -w
    # Runs dig on all IP addresses in the specified range

    die "Usage: $0 <start IP addr> <end IP addr>\n" unless @ARGV == 2;

    my $start_dec = dotted_quad_to_decimal($ARGV[0]);
    my $end_dec   = dotted_quad_to_decimal($ARGV[1]);

    for(my $i=$start_dec; $i<=$end_dec; $i++) {
        my $i_ip = decimal_to_dotted_quad($i);
        # dig prints nothing when there is no reverse entry, so strip its
        # trailing newline and always end the output line ourselves
        chomp(my $hostname = `dig +short -x $i_ip`);
        printf "%-15s %s\n", $i_ip, $hostname;
    }

    # Convert a dotted-quad address into a single decimal number
    sub dotted_quad_to_decimal {
        my @fields = split /\./, shift;
        ($fields[0] * 16777216) +
        ($fields[1] * 65536) +
        ($fields[2] * 256) +
         $fields[3];
    }

    # Convert a decimal number back into a dotted-quad address
    sub decimal_to_dotted_quad {
        my $decimal = shift;
        my $factor = 16777216;
        my @quad = ();
        for(my $i=0; $i<4; $i++) {
            $quad[$i] = int($decimal / $factor);
            $decimal -= $quad[$i] * $factor;
            $factor /= 256;
        }
        join ".", @quad;
    }
Running this over the 66.111.233.x and 66.111.234.x blocks (of 256
addresses each) uncovered 211 hostnames similar to those above, which
fell into 60
groups of related names. I didn’t bother to scan adjacent blocks, but I
know from other sources on the Web that the network extends even further
than this. Here is a sample of the scan output:
    66.111.233.168  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.169  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.170  dyna1.dynamicrhythms.com.
    66.111.233.171  dyna2.dynamicrhythms.com.
    66.111.233.172  dyna3.dynamicrhythms.com.
    66.111.233.173  dyna4.dynamicrhythms.com.
    66.111.233.174  dyna5.dynamicrhythms.com.
    66.111.233.175  spec1.greenplanetspecials.com.
    66.111.233.176  spec2.greenplanetspecials.com.
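Counting groups of related names by hand gets tedious with a couple of hundred hostnames. The following is a rough sketch of one way to do it, not the method I actually used at the time. It assumes that each line of scan output has the form "address hostname" and that the last two labels of the hostname identify the domain, which holds for .com names like these but not for domains such as .co.uk. The function name group_by_domain is made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group scan output lines ("<ip> <hostname>") by domain, approximated
# here as the last two labels of the hostname (an assumption that only
# suits simple cases such as .com).
sub group_by_domain {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        my ($ip, $host) = split ' ', $line;
        next unless defined $host;        # address with no reverse entry
        $host =~ s/\.$//;                 # drop the trailing dot dig prints
        my @labels = split /\./, $host;
        next if @labels < 2;
        my $domain = join '.', @labels[-2, -1];
        push @{ $groups{$domain} }, $ip;
    }
    return \%groups;
}

# The sample of scan output shown above
my @sample = (
    "66.111.233.168 233-111-66.ftl-nj.webhostplus.com.",
    "66.111.233.169 233-111-66.ftl-nj.webhostplus.com.",
    "66.111.233.170 dyna1.dynamicrhythms.com.",
    "66.111.233.171 dyna2.dynamicrhythms.com.",
    "66.111.233.172 dyna3.dynamicrhythms.com.",
    "66.111.233.173 dyna4.dynamicrhythms.com.",
    "66.111.233.174 dyna5.dynamicrhythms.com.",
    "66.111.233.175 spec1.greenplanetspecials.com.",
    "66.111.233.176 spec2.greenplanetspecials.com.",
);

my $groups = group_by_domain(@sample);
for my $domain (sort keys %$groups) {
    printf "%-28s %d\n", $domain, scalar @{ $groups->{$domain} };
}
```

On the sample above this reports three domains, with the unallocated addresses collected under webhostplus.com.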
One other thing to note from these scans is that a significant number of
the IP addresses in the 66.111.233.x block map to a single host called
233-111-66.ftl-nj.webhostplus.com, and those in the other block map to
234-111-66.ftl-nj.webhostplus.com. We’ll return to this shortly.
So far we’ve used dig for reverse lookups. Using it with the reported
hostnames would not be
expected to add much information in this case. In fact, a sampling of
such queries as I write this, some months after that period of spam,
shows that many do not return IP addresses. That tells me that not only
have these sites been taken down but also that the DNS entries have been
removed. Fortunately for us, someone slipped up and left the reverse
entries in the tables. The management of DNS records can be surprisingly
sloppy and still work just fine. Sometimes that works to your
advantage.
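If you want to check for this kind of leftover reverse entry across a set of addresses, the comparison can be sketched as follows. This is an illustration under my own assumptions rather than a tool I used in the survey: the function names and the four result labels are invented here, and the lookups go through the system resolver via the core Socket module instead of calling dig. The two lookups are passed in as code references so that the logic can be tried out without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Compare the reverse entry for an address with the forward lookup of
# the name that entry reports. The result labels are hypothetical.
# $reverse and $forward are optional code references; by default the
# system resolver is used.
sub check_address {
    my ($ip, $reverse, $forward) = @_;
    $reverse ||= \&system_reverse;
    $forward ||= \&system_forward;

    my $name = $reverse->($ip);
    return 'no reverse entry' unless defined $name;

    my $forward_ip = $forward->($name);
    # The name no longer resolves, but the reverse entry was left behind
    return 'stale reverse entry' unless defined $forward_ip;
    return $forward_ip eq $ip ? 'consistent' : 'mismatch';
}

# Reverse lookup: address -> hostname, or undef if there is none
sub system_reverse {
    my $packed = inet_aton(shift) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Forward lookup: hostname -> first address, or undef if it fails
sub system_forward {
    my $packed = inet_aton(shift) or return;
    return inet_ntoa($packed);
}

# Check any addresses given on the command line
for my $ip (@ARGV) {
    printf "%-15s %s\n", $ip, check_address($ip);
}
```

An address reported as a stale reverse entry is exactly the situation described above: the site's forward DNS record is gone, but the reverse table still gives away the hostname.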
Now let’s see what whois can contribute to this story. Running it on a
sample of the domain names
turns up a mixed bag of names and addresses in the contact information.
Most of the domains appear linked to three addresses in the towns of
Sunny Isles Beach, Aventura, and Hollywood, which are all in Florida. I
don’t know if these are real addresses or not, but they serve as a type
of signature or fingerprint for the people behind these sites. We’ll
talk more about making these kinds of connections later in the
book.
Note that you should NOT write scripts that attempt to step through whois
records the way I did with the DNS lookups. This is exactly how spammers
have built up their mailing lists in the past, and the domain registries
will likely detect your script and block any further whois queries coming
from your computer. Modest numbers of queries submitted manually should
not get you into trouble.
Using whois with any of the IP addresses revealed something about the
network these servers reside in:
    [whois.arin.net]
    OrgName:    WebHostPlus Inc
    OrgID:      WEBHO-3
    Address:    100 Plaza drive
    City:       Secaucus
    StateProv:  NJ
    PostalCode: 07094
    Country:    US

    NetRange:   66.111.192.0 - 66.111.255.255
    CIDR:       66.111.192.0/18
    NetName:    WEBHOSTPLUS-INC
    NetHandle:  NET-66-111-192-0-1
    Parent:     NET-66-0-0-0-0
    NetType:    Direct Allocation
    NameServer: NS.WHP-SERVER.COM
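The NetRange and CIDR lines say the same thing in two notations. A /18 leaves 32 - 18 = 14 bits for host addresses, which gives 16,384 addresses in the block. Here is a short sketch of that arithmetic, again using the core Socket module. The function names cidr_range and in_cidr are hypothetical helpers written for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the first address, last address, and size of a CIDR block.
# cidr_range is a hypothetical helper name used only in this sketch.
sub cidr_range {
    my ($cidr) = @_;
    my ($base, $bits) = split m{/}, $cidr;
    my $size  = 2 ** (32 - $bits);              # number of addresses
    my $start = unpack("N", inet_aton($base));
    $start -= $start % $size;                   # align to block boundary
    my $end   = $start + $size - 1;
    return (inet_ntoa(pack("N", $start)),
            inet_ntoa(pack("N", $end)),
            $size);
}

# True if $ip falls inside $cidr (also a hypothetical helper).
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($first, $last) = cidr_range($cidr);
    my $n = unpack("N", inet_aton($ip));
    return $n >= unpack("N", inet_aton($first))
        && $n <= unpack("N", inet_aton($last));
}

my ($first, $last, $size) = cidr_range("66.111.192.0/18");
print "$first - $last ($size addresses)\n";
print "66.111.233.170 is inside the block\n"
    if in_cidr("66.111.233.170", "66.111.192.0/18");
```

The computed range matches the NetRange line in the whois output, and the addresses from the scan fall inside it, so the two blocks I scanned make up only 512 of the 16,384 addresses allocated to this company.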
WebHost Plus is a well-established company in New Jersey that provides web hosting and other services to a large number of clients. Our friends sending out the emails are simply using them to host their web sites. But with over 200 web sites, each with a unique IP address, this looks like a big operation. Are they really running that many different web servers and physical computers?
No, what they are doing is configuring their servers with multiple
IP addresses. Even with a single Ethernet card, you can configure Linux,
for example, to act as though it has 256 IP addresses. Then you
configure the Apache web server to respond to each address with a different web site.
That’s what was going on with the 66.111.233.x addresses handled by one
machine (233-111-66.ftl-nj.webhostplus.com) and the 66.111.234.x block
handled by another.
In their DNS tables, all the addresses were mapped to the canonical
names of those machines until they were allocated to a client’s site.
This is how companies such as WebHost Plus can afford to offer web sites
for just a few dollars a month. You are sharing the server with other
people and, as long as no one site hogs all the CPU cycles, it will
appear as though you have your own dedicated server.
It seems like our friends are giving themselves a lot of extra work creating and managing all these distinct web sites. Why go to all that trouble? It’s all an attempt to evade the spam filters that are becoming ever more sophisticated. By generating emails with continually evolving content and including links to web sites with different hostnames they can avoid—or at least delay—being detected by the spam filters and being blacklisted by mail relays. They can run one web site for a week or two, shut it down, and then reappear under a totally different name.
This example has shown how much can be learned about an operation
simply using dig and whois. By looking at similar emails, I found a
set of hostnames that resembled each other. dig revealed that these all
had similar IP addresses. Reverse lookups across a wider range of
addresses turned up a lot more domains and hostnames, and whois showed
that the same company hosted all of these. Unallocated addresses from
the reverse lookup scan suggested that two physical servers were being
used to host all these web sites. Running whois on the domain names
turned up a confused mass of contact information that, in isolation, was
not that useful. But even untrustworthy contact information can be
useful as a signature or fingerprint for this operation.