In some cases, spammers have been able to hijack the computers of unsuspecting Internet users, either through targeted attacks or through virus infections. The Sobig series of worms is widely believed to be an example of this. This family of worms was disseminated across the Internet beginning in 2003. The worms showed a clear evolution in their design from the first (Sobig.A) through the sixth (Sobig.F), in terms of their ability to sidestep the defenses that were quickly raised against them. That evolution also appears to reflect improvements in the worm's secondary function, which was to install email proxy servers on infected computers.
Having access to a network of these proxy servers is of great value to spammers. Not only do the proxies greatly reduce the chance that a spammer's identity will be revealed, but by constantly switching between them, spammers can keep their emails from being rejected by mail servers that consult spam blacklists. These blacklists keep track of machines that have sent large amounts of spam. If any given machine sends only a small number of messages, it is unlikely ever to be blacklisted.
The evolution of Sobig through its fifth incarnation is summarized nicely in a report by the LURHQ Threat Intelligence Group, which can be found at http://www.lurhq.com/sobig-e.html. For a more detailed technical analysis, written by a group of analysts who have chosen to remain anonymous, you might find this document of interest: http://spamkings.oreilly.com/WhoWroteSobig.pdf. It offers a fascinating insight into the world of virus tracking and even names the individual who the authors believe created the worm.
These networks of compromised machines have been termed Botnets, with individual computers called zombies or bots. Their implications for computer security go beyond spamming to include distributed denial-of-service attacks on target machines and networks. The Honeynet Project and Research Alliance has published a detailed whitepaper about Botnets (http://www.honeynet.org/papers/bots/).
That level of analysis is beyond the scope of this book, but we can use our forensic skills to look at sets of related spam messages and perhaps infer something about the software used to generate the email.
In the face of increasingly sophisticated spam-blocking software, spammers are forced to continually generate unique email messages. Anyone who looks at spam messages will be familiar with the many ways of intentionally misspelling Viagra, oxycontin, etc., along with all the extraneous text that is used to get past spam filters. A similar approach is taken to the message headers. The goal is to continually change the headers so that spam filters can never determine a signature that clearly indicates spam. Most bulk mailers now include this feature. However, while the specific strings may be continually changed, the algorithms used to generate them are not, and those algorithms can serve as unique signatures in their own right. This is an ongoing battle between bulk mailers and spam filters, but you can place yourself at the front line with some simple analyses.
In the earlier section "Forged Headers," I showed the headers for a spam message about a pornography site. That was one example of a series of similar messages that are clearly from the same source. At the time of this writing, I receive one or two new messages from this series every hour. No two messages have the same sender, but all senders have names like Reuse L. Idahoan, Aggravation E. Envelops, Hatching B. Saunter, and so forth. Right away I can see a simple algorithm at work. Every sender consists of a forename, a middle initial, and a last name. The software probably performs random lookups in a dictionary of words. Similar algorithms are used to generate other headers. The content boundary string, the headers with the X- prefix, and a forged Received header all show clear patterns across the examples.
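As a rough illustration of how such a sender pattern might be tested, here is a minimal Python sketch. The regular expression and the function name `looks_generated` are my own assumptions about the shape of these names, not the rule used by the mailing software.

```python
import re

# Assumed shape: a capitalized word, a single capital initial followed by a
# period, then another capitalized word. This is an illustration only.
SENDER_PATTERN = re.compile(r"^[A-Z][a-z]+ [A-Z]\. [A-Z][a-z]+$")


def looks_generated(display_name: str) -> bool:
    """Return True if the name fits the forename / initial / surname shape."""
    return SENDER_PATTERN.match(display_name.strip()) is not None


# The three sender names quoted in the text
senders = ["Reuse L. Idahoan", "Aggravation E. Envelops", "Hatching B. Saunter"]
print(all(looks_generated(name) for name in senders))
```

Plenty of legitimate senders have names of this shape, so a test like this is only useful in combination with other header patterns.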
Most striking is the pattern contained in the Message-ID headers, of which eight examples are shown here.
Message-ID: <111101c518f6$c8dbcb2d$3511bb57@pkst.fi>
Message-ID: <100001c518f7$95f3a014$35733cb2@laguna1.com>
Message-ID: <110001c518f7$89d12751$9e11aa16@tostado.com.ar>
Message-ID: <010001c51903$2b95e38f$f9ddef3b@inkk.tk>
Message-ID: <011001c51913$abcb792a$ba934b39@mandate.nl>
Message-ID: <100101c51916$a7250710$b47397ef@st.vtu.lt>
Message-ID: <111001c51916$4eee0050$c74db867@antill.net>
Message-ID: <010101c5193f$bdf33582$fd56dd00@cactusbuilders.com>
                 #####   #        #        #
The last line shows a hash mark wherever a character has been conserved in a specific position through all these examples. The dollar signs in each line are of particular interest. They split the string into blocks of 12, 8, and 8 characters before the @ character. This layout is itself a clear signature of the mailing software in use here, and it can be used to identify the same software at work in other spam campaigns beyond this current onslaught of porn.
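Both observations can be checked with a few lines of Python. This is a minimal sketch: the helper names `conserved_mask` and `matches_signature` are hypothetical, and the regular expression assumes that the three blocks contain only lowercase hexadecimal characters, which holds for these eight examples but is not documented behavior of the software.

```python
import re

# The eight Message-ID values shown above, without the angle brackets
MESSAGE_IDS = [
    "111101c518f6$c8dbcb2d$3511bb57@pkst.fi",
    "100001c518f7$95f3a014$35733cb2@laguna1.com",
    "110001c518f7$89d12751$9e11aa16@tostado.com.ar",
    "010001c51903$2b95e38f$f9ddef3b@inkk.tk",
    "011001c51913$abcb792a$ba934b39@mandate.nl",
    "100101c51916$a7250710$b47397ef@st.vtu.lt",
    "111001c51916$4eee0050$c74db867@antill.net",
    "010101c5193f$bdf33582$fd56dd00@cactusbuilders.com",
]


def conserved_mask(strings):
    """Mark with '#' each position where every string has the same character."""
    width = min(len(s) for s in strings)  # compare only the shared prefix length
    return "".join(
        "#" if len({s[i] for s in strings}) == 1 else " "
        for i in range(width)
    )


# Assumed signature: blocks of 12, 8, and 8 hexadecimal characters separated
# by dollar signs, followed by @ and a host name.
SIGNATURE = re.compile(r"^[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@\S+$")


def matches_signature(message_id):
    """Return True if a Message-ID value has the 12$8$8@host layout."""
    return SIGNATURE.match(message_id.strip().strip("<>")) is not None


print(conserved_mask(MESSAGE_IDS))
print(all(matches_signature(mid) for mid in MESSAGE_IDS))
```

The regular expression tests only the shape of the string. Other software may produce Message-IDs with the same layout, so a match is a lead to follow up rather than proof of origin.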
In fact, this pattern is so distinctive that I noticed it right away when I read the technical analysis of the Sobig worm that I mentioned earlier in this section. That report includes examples of the message headers generated by the Send-Safe bulk mailing software, all of which match the signature. That software is linked by the authors of the report to the Sobig worm and its installation of email proxy servers on infected machines. When I looked at the addresses of the machines that transferred these spam examples to my server, every one was different, and several had reverse DNS lookups that suggested they were personal machines on cable modems or DSL connections. This is strong evidence that this recent campaign is related to the Sobig infections and may be using the proxy servers created by that worm.
Like the rifling marks found on a bullet at a crime scene, patterns like this are able to link separate incidents in very specific ways.
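The reverse DNS check described above can be roughed out in Python. This is a sketch under stated assumptions: the keyword list in `HINTS` is my own guess at tokens that tend to appear in the host names of consumer broadband connections, the function names are hypothetical, and the host names in the usage lines are invented examples under the reserved example.net and example.com domains.

```python
import re
import socket

# Hypothetical keyword list; an assumption for illustration, not a standard.
HINTS = {"adsl", "dsl", "cable", "dhcp", "dialup", "dyn", "dynamic", "pool", "ppp"}


def reverse_name(ip_address):
    """Return the reverse DNS name for an address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def residential_hints(hostname):
    """Return the sorted keywords in a host name that hint at a home connection."""
    # Split on anything that is not a letter, so digits and dots separate tokens
    tokens = re.split(r"[^a-z]+", hostname.lower())
    return sorted(set(tokens) & HINTS)


# Invented host names, used only to show the calls
print(residential_hints("adsl-68-1-2-3.example.net"))
print(residential_hints("mail.example.com"))
```

Naming conventions vary widely between providers, so an empty result says little. A hit is best treated as one more piece of circumstantial evidence alongside the header patterns.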