Some of the most advanced text-searching software can be found within Internet search engines such as Google. So it makes sense to use these to help uncover links between scam web sites and other dubious operations.
Despite the large amount of spam that I receive every day, I know that I am only seeing a fraction of the total that is out there on the Internet. Likewise, the phishing scams that I encounter represent only a small sample of those that are running at any given time. I know there are other people like me who are investigating these scams, and some of their findings are posted on web sites and newsgroups. I can build on their work by running Google searches on any signature patterns that I come across. It may sound like an obvious thing to do, but it is easy to overlook when you are working your way through the mass of information that some investigations can produce.
The in-depth example on directory listings that I describe in Chapter 5 is a good illustration of how a Google search can quickly expand the scope of an investigation. There I found several files in a directory listing on a fake bank web site, some of which had no apparent connection to that bank. Those files contained the names of several businesses and one individual; Google searches on these turned up a very clear connection to a whole series of other scams involving check cashing.
You can make good use of Google’s versatile query syntax to focus your searches. The Anti-Phishing Working Group (APWG) maintains an archive of reports about phishing attempts (http://www.antiphishing.org/phishing_archive.html). Although their site lacks a search function, you can achieve the same result with a Google query that limits its search to that domain. For example, the query site:antiphishing.org paypal will return all pages that mention PayPal on the APWG site, many of which will lie within the phishing archive.
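If you run many such searches, it can help to generate the queries with a short script. The following Python sketch is only an illustration: the function names build_site_query and google_search_url are my own inventions, and it assumes the standard Google search URL, which takes the query in its q parameter. It builds the query string and the matching URL but does not fetch any results.

```python
from urllib.parse import urlencode


def build_site_query(domain, terms, exact=False):
    """Return a Google query string restricted to a single domain.

    If exact is True and the terms contain a space, they are wrapped
    in double quotes so that Google treats them as one phrase.
    """
    terms = terms.strip()
    if exact and " " in terms:
        terms = f'"{terms}"'
    return f"site:{domain} {terms}"


def google_search_url(query):
    """Return a search URL for the query (assumes the usual q parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The example from the text: pages on the APWG site that mention PayPal
query = build_site_query("antiphishing.org", "paypal")
print(query)
print(google_search_url(query))
```

Passing exact=True is useful when the signature pattern is a multi-word phrase taken from a scam message, since it restricts the matches to pages that contain the phrase as written.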
The Spamhaus Project maintains the Register of Known Spam Operations (ROKSO), a very useful database of information about known spammers (http://www.spamhaus.org/rokso/index.lasso). Many of these people are also involved in other scams, and the ROKSO database records can provide considerable insight into the breadth of their activities. Much of the higher-level content at Spamhaus is accessible via Google, but individual ROKSO records do not appear to be indexed. Fortunately, Spamhaus has a local search function that can make up for this.
The techniques for pattern matching and text searching that I have described here merely scratch the surface of this field of study. Many of the people responsible for Internet fraud work together in organized or informal groups. They share code and ideas and use their skills to commit multiple instances of fraud. Being able to link these together can help us understand the operation of these groups, and it can help us identify the people behind them. I see a great deal of potential for developing tools that help us make these connections quickly and efficiently. Alongside those, I also see the need for a comprehensive archive of information about these scams that is available for anyone who wants to study them. Such a database would include not only the original email messages but also copies of web sites and related DNS records. The work of the APWG and Spamhaus is taking us in the right direction, but both need to be more detailed and broader in scope, and both need to adopt structured formats that allow others to query their resources more efficiently.