Anti-Spam Toolkit
Paul Wolfe, Charlie Scott, Mike Erwin
McGraw-Hill/Osborne
New York, Chicago, San Francisco, Lisbon, London, Madrid, Mexico City, Milan, New Delhi, San Juan, Seoul, Singapore, Sydney, Toronto
McGraw-Hill/Osborne, 2100 Powell Street, 10th Floor, Emeryville, California 94608, U.S.A.
To arrange bulk purchase discounts for sales promotions, premiums, or fund-raisers, please contact McGraw-Hill/Osborne at the above address. For information on translations or book distributors outside the U.S.A., please see the International Contact Information page immediately following the index of this book.
Anti-Spam Tool Kit. Copyright © 2004 by The McGraw-Hill Companies. All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of publisher, with the exception that the program listi
Introduction The three of us began our tech careers at a small local Internet service provider, where every dollar was sacred and wasting one was the equivalent of sacrilege. What we discovered then is still true today: Any way you cut it, spam wastes money. Organizations and individuals devote more time, money, and strategy to thwarting spam than to any other online problem. Even Congress is talking about it. As of this writing, they have passed the Controlling the Assault of Non-Solicited Pornography and Marketing Act (or the CAN-SPAM Act, if you can believe it), and we know that those guys have more important things to do than monitor your e-mail box. This book was written to help you thwart the assault of unwanted commercial e-mail, whether you run a sizable organization’s e-mail system or you’re sitting at home banging your head on your keyboard, trying to sift a real e-mail message from the chaff. Within this book, you will find a thorough discussion of spam-fighting tools an
Part I: Preparing for the Fight
Chapter Overview
Chapter 1: Forming Your Plan Against Spam Overview Some years back, it was suggested that in a typical e-mail box, the number of spam messages per day might eventually match the number of regular e-mail messages on a one-for-one basis. Sadly, we must report that this is already true; in fact, we “Internet old fogies” reached that milestone a few years back. Our personal ratio is currently about 30 good messages to around 300 spam messages each and every day! Faced with this exponential torrent of unwanted e-mail, almost everyone now goes to extraordinary lengths to stem the spam tide. We install filters, we use blacklists, and, increasingly, we rely on advanced content analysis to help us control unwanted e-mail. An entire software industry has emerged that builds applications to assist us in holding back the flood. These applications have grown in size and number almost as rapidly as the amount of spam in our inboxes. Not to be left out, the Internet Engineering Task Force (IETF), a think tan
A Brief History of Spam Believe it or not, the first spam ever transmitted crossed the wire in May of 1978. Even more amusing is the fact that the message actually contained an advertisement and was sent in bulk to every e-mail address on the West Coast (more than 600 users). Hats off to Digital Equipment Corporation (DEC), who caught flames for sending it. Because we knew you would be curious, we’ve included the contents of the first recognized e-mail spam here: DIGITAL WILL BE GIVING A PRODUCT PRESENTATION OF THE NEWEST MEMBERS OF THE DECSYSTEM-20 FAMILY; THE DECSYSTEM-2020, 2020T, 2060, AND 2060T. THE DECSYSTEM-20 FAMILY OF COMPUTERS HAS EVOLVED FROM THE TENEX OPERATING SYSTEM AND THE DECSYSTEM-10 <PDP-10> COMPUTER ARCHITECTURE. BOTH THE DECSYSTEM-2060T AND 2020T OFFER FULL ARPANET SUPPORT UNDER THE TOPS-20 OPERATING SYSTEM. THE DECSYSTEM-2060 IS AN UPWARD EXTENSION OF THE CURRENT DECSYSTEM 2040 AND 2050 FAMILY. THE DECSYSTEM-2020 IS A NEW LOW END MEMBER OF THE DECSYSTEM-20 FAMILY A
The Basics of Fighting Spam Most spam fighting is accomplished by filtering e-mail through a process that determines the probability of each message being “junk.” The next several chapters cover in detail the basic methods used to filter spam; here, we’ve highlighted the major points of each spam-fighting method. Traditional Methods: Filtering by Keyword Early techniques for combating spam relied on detecting keywords in the subject field of the message. For instance, the use of such words as promotion, advertisement, and refinance became a likely indicator of spam. Fledgling spam filters (often nothing more than command-line scripts) matched messages against a list of forbidden words and moved the matches into specified mail folders for later inspection. As the amount of spam increased, so did the options for filtering it, and so did the accidental removal of real mail. Filtering as a technique was somewhat effective in the beginning, but its accuracy gradually declined over time. Spammers
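A subject-line keyword filter of the kind described above takes only a few lines. The following is a minimal Python sketch, not a reconstruction of any particular historical script; the word list and the folder names `INBOX` and `Suspected-Spam` are hypothetical.

```python
import re
from email.message import EmailMessage

# Hypothetical forbidden-word list, using the example words from the text.
FORBIDDEN_WORDS = ("promotion", "advertisement", "refinance")

# Match whole words only, ignoring case.
FORBIDDEN_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FORBIDDEN_WORDS)) + r")\b",
    re.IGNORECASE,
)

def choose_folder(msg: EmailMessage) -> str:
    """Return the (hypothetical) folder a subject-keyword filter would pick."""
    subject = str(msg.get("Subject", ""))
    return "Suspected-Spam" if FORBIDDEN_RE.search(subject) else "INBOX"

if __name__ == "__main__":
    for subject in ("Refinance your home today!", "Lunch on Friday?",
                    "Agenda: Q3 promotion schedule"):
        msg = EmailMessage()
        msg["Subject"] = subject
        print(f"{subject!r} -> {choose_folder(msg)}")
```

The third subject shows the weakness the text describes: a legitimate internal message that happens to mention a promotion is filed with the spam, and every word added to the list widens that net.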
Developing E-Mail Policies Now that we’ve taken you through a basic introduction to spam and some of the issues surrounding it, we need to embark upon the first step in bolstering your e-mail posture: policy development. Without getting too bogged down by complexity, let’s quickly review how a general security system is governed. Security implementations are classified into common sets of rules: standards, policies, guidelines, and procedures. Standards represent industry-accepted norms for a particular topic (for example, a specified type and strength of encryption for securing sensitive information, such as a requirement that a financial institution store PINs encrypted with 256-bit AES keys). Policies are company-specific adoptions of standards within a given environment. Our goal here is to help you establish a rudimentary e-mail policy as a starting point. Guidelines are commonly agreed-upon methods for enacting standards and policies, but they are not rigidly enforced. Lastly, procedures are s
Spotting Problems Before They Happen The Federal Trade Commission (FTC), overseer of the public good, considers spam e-mail a scourge that’s gaining popularity among fraudsters. For more information, check out http://www.ftc.gov/bcp/conline/pubs/alerts/doznalrt.htm. Imagine a tool capable of sending millions of solicitations at almost no monetary cost, in the hands of pyramid schemers. We shudder at the thought. The FTC has named its “dirty dozen” scams most likely to arrive via bulk e-mail:
Business Opportunities: Pyramid, multi-level marketing (MLM), or identity theft scams
Bulk E-mail: Offers to sell and resell lists of e-mail accounts
Chain Letters: Requests to send $1 to the five names on the list…or else!
Work-at-Home Schemes: Envelope stuffing, pen assembly, jewelry construction
Health and Diet Scams: Pills, herbals, pharmacy, and other drug misrepresentations
Effortless Income: Get-rich-quick schemes
Free Goods and Services: Just another MLM or Ponzi scheme
Investment
Advanced Topics and Cross-Pollination With even worse goals than filling your mailbox with garbage, spammers are increasingly abusing the general public to extend their blanket of anonymity and increase their rate of solicitation. Previously, we mentioned the open-relay condition, whereby spammers create two victims at once. One victim has a misconfigured mail server and acts as a spam amplifier, sending out millions of e-mails to other recipients, who angrily point back to the first victim instead of the spammer. To combat that style of attack, many excellent resources catalog these broken mail servers across the Internet. The Open Relay Database (http://www.ordb.org) is currently reporting more than a quarter of a million active open relays, each of which spews forth spam like an open fire hydrant. An important first step in making sure you aren’t being used is to check that your IP addresses and domain names aren’t listed as known open relays. The spam promoters make extensive use of
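Besides checking public listings, you can ask your own server directly whether it will relay. The Python sketch below uses the standard library's `smtplib` to open a session, offer an outside sender and an outside recipient, and report whether the server accepts that recipient; it then resets the session without ever sending a message. The host and addresses are placeholders you supply. A server that accepts the recipient and bounces the message later would still be flagged here, and a server that relays only under other conditions may pass, so treat the result as a first-pass test, and run it only against servers you administer.

```python
import smtplib

def relay_accepted(rcpt_code: int) -> bool:
    """A 2xx reply to MAIL FROM or RCPT TO means the server accepted it."""
    return 200 <= rcpt_code < 300

def probe_open_relay(host: str, outside_sender: str, outside_rcpt: str,
                     port: int = 25, timeout: float = 10.0) -> bool:
    """Return True if `host` accepts a recipient in a domain it does not serve.

    Both addresses should be in domains unrelated to `host`. No DATA is sent.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        mail_code, _ = smtp.mail(outside_sender)
        if not relay_accepted(mail_code):
            return False              # sender refused, so nothing can be relayed
        rcpt_code, _ = smtp.rcpt(outside_rcpt)
        smtp.rset()                   # abandon the transaction before DATA
        return relay_accepted(rcpt_code)

# Example with placeholder names:
# probe_open_relay("mail.example.com", "probe@example.net", "probe@example.org")
```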
Summary In these pages, we covered a short history of spam origins, took you through the various methods used to combat spam, and built a simplified e-mail policy as a demonstration. In the next chapters and throughout the rest of the book, we introduce you to the best spam-fighting strategies and tools available today. In addition to installation and configuration, we also cover each tool in a real-world, spam-fighting scenario. The Anti-Spam Tool Kit is your tool kit and guide to reducing spam across your mail systems.
Chapter 2: Goals and Criteria for Evaluating Spam Control Solutions In Chapter 1, we established that spam was a wildly growing contagion, greater in size, scope, and acceleration than most people realize. We also introduced you to a simple mail content control policy as a preparation for the introduction of anti-spam software to your systems. In this chapter we will explore how e-mail systems are organized, track the flow of mail content through the plumbing of the Internet, articulate some goals that are consistent with filtering techniques, and (most important) develop some criteria that can be used to measure the success of those systems once they have been deployed. The Mail Flow Architecture Let’s start by following a piece of e-mail as it swims its way through the maze of networks, eventually landing in your inbox. Figure 2-1 is a fairly complete example of the potential paths that an e-mail message might take from sender to recipient. Not all links may exist in that chain or ev
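One practical way to follow a real message's path is to read its Received headers: each mail server that handles a message conventionally adds one at the top, so reading them from the bottom up replays the journey. A minimal Python sketch using the standard `email` package follows; the sample message and host names are made up.

```python
from email import message_from_string

def relay_path(raw_message: str) -> list[str]:
    """Return the message's Received headers, oldest hop first."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Each server prepends its header, so the bottom one is the earliest hop.
    # Collapse folded (multi-line) headers onto one line for readability.
    return [" ".join(h.split()) for h in reversed(received)]

SAMPLE = (
    "Received: from relay.example.net by mx.example.com;\n"
    "\tMon, 1 Mar 2004 10:00:02 -0500\n"
    "Received: from sender-pc by relay.example.net;\n"
    "\tMon, 1 Mar 2004 10:00:00 -0500\n"
    "From: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "Hello.\n"
)

if __name__ == "__main__":
    for hop, line in enumerate(relay_path(SAMPLE), start=1):
        print(hop, line)
```

Only the headers added by servers you control can be trusted; anything below them was supplied by the previous hop and may be forged by a spammer.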
The Digital You: Authentication and Repudiation E-mail is one of the staples of modern mass communication, resplendent in its capacity to tie people together wherever they happen to be. Even with its age and proven reliability, e-mail has long been criticized for its deepest shortcoming: the lack of any real authentication. Authentication is just a fancy term meaning that a verifiable, trusted way exists to determine the identity of the parties on the connection. A good analogy is a conversation on a regular telephone line (including one equipped with caller ID). Many a time we have picked up the receiver and stared cross-ways at the incoming phone number, fearful that a solicitor was preparing his script. An e-mail arriving from some unknown domain name (from the other side of the world) provokes much the same reaction. What’s drastically different with e-mail is that the e-mail solicitor incurs virtually no cost, either in time or in money. Add to this the near-perfect anonymity of
Goals of a Robust Mail Control System Before evaluating specific spam solutions and their implementations, we cover the basics of evaluating a tool’s capabilities in light of your overall mail filtering objectives. Put simply, the goals of deploying spam control systems fall into the same categories as those of deploying most IT solutions: Total Cost of Ownership (TCO) and Return on Investment (ROI). The difficulty comes in comparing the currency-denominated costs with the more ephemeral productivity savings. Thankfully, new spam control software is a much easier sell to a CIO than other technology implementations, provided, of course, that your proposal includes eradicating all the spam in the CIO’s mailbox. Restrict Access to Your E-Mail Identities One of the strongest and most effective goals is also the most difficult to effect: Limiting or removing a spammer’s access to your e-mail identities would stop spam at its source. Of course, that’s hard to imagine, given
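To make the productivity side of the TCO/ROI comparison concrete, here is a back-of-the-envelope calculation in Python. Every input (head count, spam volume, seconds spent per message, loaded hourly cost, catch rate, and the yearly cost of the solution) is a hypothetical figure; substitute your own.

```python
def annual_spam_cost(users: int, spam_per_user_per_day: float,
                     seconds_per_spam: float, hourly_cost: float,
                     workdays: int = 250) -> float:
    """Yearly cost of staff time spent recognizing and deleting spam."""
    seconds = users * spam_per_user_per_day * seconds_per_spam * workdays
    return seconds * hourly_cost / 3600

def filter_roi(annual_cost: float, catch_rate: float,
               annual_solution_cost: float) -> float:
    """Net productivity savings divided by what the filter costs per year."""
    savings = annual_cost * catch_rate
    return (savings - annual_solution_cost) / annual_solution_cost

if __name__ == "__main__":
    # Hypothetical: 100 users, 50 spam each per day, 3 seconds per message,
    # $36/hour loaded cost.
    cost = annual_spam_cost(100, 50, 3, 36)
    print(f"time lost to spam: ${cost:,.0f}/year")   # $37,500/year
    roi = filter_roi(cost, catch_rate=0.9, annual_solution_cost=10_000)
    print(f"ROI at a 90% catch rate: {roi:.2f}")
```

The model deliberately leaves out administrator time for tuning and the cost of recovering wrongly filtered mail; both belong on the TCO side of a real proposal.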
Bringing It All Together A perfect system addresses spam as unwanted content through the following measures:
Denying the spammer ways to harvest addresses.
Using a multi-node architecture deployed at many levels, providing a distributed approach throughout the mail transmission path.
Using a multi-mode approach that layers different types of anti-spam software at the different points.
Establishing a ritual of testing, tweaking, and refinement. No system is complete without its upkeep.
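The multi-mode idea, several independent checks applied one after another, maps naturally onto a pipeline of small functions. The Python sketch below models layering inside a single node only; the `Message` fields, the blocked address (taken from the 192.0.2.0/24 documentation range), the keyword list, and both example layers are hypothetical stand-ins for real tools such as the DNSBLs and content filters covered later.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Message:
    client_ip: str
    sender: str
    subject: str

# A layer returns a rejection reason, or None to pass the message along.
Layer = Callable[[Message], Optional[str]]

BLOCKED_IPS = {"192.0.2.10"}              # hypothetical local blacklist
SPAM_WORDS = ("refinance", "mortgage")    # hypothetical keyword list

def network_layer(msg: Message) -> Optional[str]:
    return "client IP is blacklisted" if msg.client_ip in BLOCKED_IPS else None

def content_layer(msg: Message) -> Optional[str]:
    subject = msg.subject.lower()
    hit = next((w for w in SPAM_WORDS if w in subject), None)
    return f"subject contains {hit!r}" if hit else None

def run_layers(msg: Message, layers: list[Layer]) -> str:
    """Apply the layers in order; the first one to object decides."""
    for layer in layers:
        reason = layer(msg)
        if reason is not None:
            return "reject: " + reason
    return "deliver"

if __name__ == "__main__":
    layers = [network_layer, content_layer]   # cheap network check first
    print(run_layers(Message("192.0.2.10", "a@example.net", "hello"), layers))
    print(run_layers(Message("198.51.100.7", "b@example.net", "Refinance now"), layers))
```

Ordering the cheap, connection-level check ahead of content inspection means most junk is turned away before any message body has to be examined.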
Selecting Mail Control Components Now that we’ve covered a broad set of general goals meant to guide you in your selection of an effective anti-spam solution, we turn our attention to the criteria you should use in evaluating the tools that meet those requirements. We start by offering some generalized criteria that can be used across a broad range of anti-spam solutions; then we cover some specific guidelines that you can use to narrow down your selection to a specific package or set of packages. Breadth (All Forms of Mail-borne Content Filtered) By using the goals defined earlier, we can classify the location of implementation, the identification of the spammers, and the content of the spam itself as strategies of breadth. The term spam encompasses lots of territory (including instant messenger spam, digital fax spam, and SMS [short message service] spam), and as the universe of anti-spam solutions matures, we will see them tackling this whole raft of additional content. For instance,
Summary Spam is an abuse of e-mail, a system that is architecturally and automatically “trusted” without any authentication on the front end. There is an inherent asymmetry of identities in a mail transmission that spammers use to their advantage: they know us, but we don’t know them. The goals of any spam control system should be to reduce spam by limiting the spammer’s access to your data in the first place, and then by controlling what e-mail enters your system. The criteria for selecting anti-spam packages are based on breadth, depth, impact, and operational ease.
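Once a filter is deployed, its impact can be tracked with two numbers that pull in opposite directions: the share of spam it catches and the share of legitimate mail it wrongly flags (false positives). A small Python helper follows; the function name and the counts in the example are invented for illustration.

```python
def filter_rates(spam_caught: int, spam_missed: int,
                 legit_flagged: int, legit_delivered: int) -> dict[str, float]:
    """Catch rate over all spam; false-positive rate over all legitimate mail."""
    total_spam = spam_caught + spam_missed
    total_legit = legit_flagged + legit_delivered
    return {
        "catch_rate": spam_caught / total_spam if total_spam else 0.0,
        "false_positive_rate": legit_flagged / total_legit if total_legit else 0.0,
    }

if __name__ == "__main__":
    # Invented sample week: 1,000 spam and 1,000 legitimate messages.
    print(filter_rates(spam_caught=900, spam_missed=100,
                       legit_flagged=2, legit_delivered=998))
```

Keeping the two rates separate matters because a missed spam is an annoyance while a lost legitimate message can cost real business, so many administrators weigh the false-positive rate more heavily when comparing packages.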
Chapter 3: Methods for Mail Content Control Although no bulletproof architectural standard has emerged to restrict or regulate the sending and receiving of electronic mail, many vendors have implemented common methods of control. In this chapter we cover best practices, their corresponding strategies, and a brief technical explanation of each practice and strategy. Building on a Historical Basis First, let’s take a visit to the “Encyclopedia Internet,” where we will find some well-drafted memorandums and position papers used to set the “unwritten” laws of the Net. These documents are referred to as Requests for Comment (RFCs). Since the Internet is a hostile, connect-at-your-own-peril kind of place, “laws” aren’t really practical or technically feasible to enforce, so the best we have is the agreement of large collections of participants that could be coerced to act in consistent ways given certain circumstances. A curious strategy for keeping order in a huge unmanageable place like t
Analyzing Spam One common element that underlies all anti-spam technology is the reliance on a relatively limited set of standards used to transact e-mail transmissions. In Chapter 2, we learned that almost all e-mail relied on the SMTP, POP, and IMAP protocols for transmission and delivery. This has evolved almost exclusively to promote and ensure the interoperability between vendors of messaging software. Some of you may note that an obvious and ideal way to combat spam would be to rewrite the networking protocols used to send it. Certainly this would solve the bulk of the spam problem, but the engineering and deployment of such a solution at this immense scale is likely impossible. Current efforts that are well underway along these lines mainly involve the IPv6 protocol (the next generation of the common Internet Protocol). We don’t want to belittle the excellent work being done in this area; rather, we’ll redirect your attention to more immediate methods that lessen our collective
New Approaches to Circumvent Advanced Spam Filtering Throughout this book we cover the latest defensive methods for fighting spam. As we’ve illustrated, the more advanced anti-spam solutions become, the smarter the spammers get at building workarounds. Given the success that Bayesian and statistical analysis have had in limiting spam, the senders have basically two big choices for countering it:
Reduce the message to its essential parts until it slips through.
Make the message something the Bayesian classifier can’t get a grip on.
We’ve seen activity in both arenas. Over the past year, we’ve noticed a reduction in the complexity of spam message bodies, which suggests that the senders are not only refining their messages, but also testing them against a reference platform to see how well they fare. In an attempt to bypass the advanced filter entirely, a growing number of messages are being run through a process that converts them into images (JPEGs, GIFs, TIFFs,
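For readers who haven't looked inside a statistical filter, here is a deliberately small Python sketch of the idea the spammers are working against. It learns how often each word appears in spam versus legitimate mail, then combines the most telling words into one probability. The scoring is a simplified variant in the spirit of Paul Graham's "A Plan for Spam", not any specific product's algorithm, and the class name and training messages are made up. It also shows why image-only spam is effective: a message with no text yields no tokens and scores a neutral 0.5.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

class TinyBayes:
    def __init__(self, max_tokens: int = 15):
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0
        self.max_tokens = max_tokens      # only the most telling words vote

    def train(self, text: str, is_spam: bool) -> None:
        # Count each word once per message (document frequency).
        if is_spam:
            self.spam.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham.update(tokens(text))
            self.n_ham += 1

    def token_prob(self, tok: str) -> float:
        """Estimated P(spam | token), clamped so no single word is certain."""
        s = self.spam[tok] / max(self.n_spam, 1)
        h = self.ham[tok] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5                    # never seen: neutral
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text: str) -> float:
        """prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
        probs = sorted((self.token_prob(t) for t in tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[: self.max_tokens]
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))

if __name__ == "__main__":
    f = TinyBayes()
    f.train("cheap pills buy now", is_spam=True)
    f.train("refinance your mortgage now", is_spam=True)
    f.train("meeting agenda for monday", is_spam=False)
    f.train("lunch plans for friday", is_spam=False)
    print(f.score("buy cheap pills"), f.score("monday lunch meeting"))
```

Stripping a message down to a few innocuous words, the first tactic described above, works against exactly this kind of scoring: the fewer distinctive tokens there are, the closer the result drifts toward neutral.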
Summary A variety of methods are available by which anti-spam solutions get their job done. They can be easily classified as filters that operate on the content of the message itself, filters that target the sender and the intermediaries through which spam passes, and filters that target the final beneficiary who underwrote the spam to begin with. We covered a group of current practices in each category, and we even explored some of the more leading-edge technology. Without going into specific products or implementations, we covered the theoretical methods on which those solutions rely and even projected futuristic next steps in the evolution of spam-sending software.
Chapter 4: Anti-Spam Implementation Strategies Overview In the preceding chapters we covered the origin of spam, its impact on society, and the out-of-control growth that has marked its existence. We also explored the development of some plans for combating it, the beginnings of an e-mail content control policy, and the technical merits of the variety of methods that exist for eradicating it. We discovered that to be effective, a strategy of “defense-in-depth” should be adopted and deployed throughout an organization. In short, this means that implementing a single defense at only one point along the mail path is generally inadequate to control an appreciable amount of spam in the long run. In this chapter, we take what we’ve learned and explore some options for building the most appropriate spam control system across a broad set of circumstances, platforms, and operating systems. We briefly identify the technologies covered in depth across the remainder of this book, broken int
Choosing the Right Solutions “Defense-in-depth” doesn’t necessarily mean that you have to procure as many different products as you can and install them throughout your organization; instead, it means that you cover the broadest possible range of filtering controls at every point a spam transmission might pass. The objective is to get the best bang for the buck with a minimum of long-term maintenance hassles, using solutions obtained from stable, reputable firms. Key Factors That Affect Your Decision Of the factors that guide you, we consider an organization’s size, architectural complexity, and ongoing support/maintenance requirements to be the most critical. There are, of course, many other important factors that should be taken into consideration, such as what types of training your staff has undergone, how frequently you send and receive mail, and the importance of e-mail to your organization’s overall success, but size, composition, and support requirements really should
Recommendations on Solution Robustness We’ve separated our recommendations into policy and technical recommendations. Policy suggestions are all-encompassing across an entire organization’s methods of operation. They include such things as what escalation procedures should be followed and when to report spammers to the authorities. Technical recommendations cover the specific software, hardware, and architectural goals. Policy Recommendations As suggested previously, it’s best to deploy spam-fighting software at multiple levels. Layered protection produces better results, is naturally scalable, and distributes processing requirements to specific nodes. Additionally, this approach segregates streams of traffic by destination and produces discrete systems for more granular control. Implement strategies based on your organization’s security policies first, content control policies second, and e-mail-specific policies last. Since spam filtering falls under the larger domain of genera
Spam Solutions Covered in this Book In this book, we have tried to review a broad set of spam-filtering software solutions across many operating environments in the hope of giving you a good idea of what choices you have out there. Some packages weren’t reviewed and some were not included, but this in no way represents a lack of support for those offerings; we wanted to keep this book moderately broad, but not exhaustively so. Subscriptions to Network-Based Blacklists In Chapter 5, we cover a few of the more popular DNS-based blacklists. Many of the services they provide are free. Even so, we would definitely recommend sending the maintainers a donation if you find their services useful. We found that most of the mail servers we installed and configured in our testing lab had about equal support for connecting to the blacklists (including Microsoft Exchange Server, Sendmail, Postfix, and Courier). In Table 4-1, we summarize the highlights of the different lists. For more
Summary In this chapter we covered where to implement the many different methods for spam control, as well as the systems necessary for getting the job done. We also looked ahead to the rest of the book, giving you an outline of the packages that we cover in depth and how they compare and contrast with each other. Lastly, we explored some of the newer offerings for enterprise shoppers looking to outsource and investigated a few of the gateway vendors, some of which offer pure software solutions and some of which sell spam-killing network appliances.
Part II: Building Your Anti-Spam Arsenal
Chapter Overview
Chapter 5: Blocking Spammers with DNS Blacklists In Chapter 4 we introduced you to DNS Blacklists as one of several means for fighting spam. In this chapter, we will look at popular individual DNS Blacklists, explain how to implement them on a mail server, and help you decide which list is the best one to use. When referring to DNS Blacklists, the shorthand DNSBL is often used, and that’s how we’ll refer to them throughout this chapter. Before we talk about specific blacklists and how to implement them, we’ll delve into what DNSBLs are and how they work. Understanding DNS Blacklists DNSBLs are an integral part of any spam-fighting toolkit. The fact that many, many users on the Internet are updating them means you get the benefit of blocking a spammer before the first piece of spam even hits you. To understand how DNSBLs help, you need to know the types of DNSBLs available and how they work. Types of DNSBLs Currently, two different types of DNS Blacklists are used: IP-based blacklists D
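The lookup at the heart of an IP-based DNSBL is an ordinary DNS query: reverse the four octets of the connecting IP address, append the list's zone, and ask for an A record. An answer (conventionally an address in 127.0.0.0/8) means the IP is listed; a nonexistent-domain error means it isn't. The Python sketch below uses only the standard library and takes the zone as a parameter; sbl.spamhaus.org, discussed later in this chapter, appears in the example. Note that `socket.gethostbyname` cannot distinguish "not listed" from a DNS failure, so a production check would want a full resolver library.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: 192.0.2.99 + zone -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone.strip(".")

def is_listed(ip: str, zone: str) -> bool:
    """True if the zone publishes an A record for the reversed IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN means "not listed", but resolver failures also land here.
        return False

if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "sbl.spamhaus.org"))
```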
Choosing a DNS Blacklist Many DNSBLs are out there: more than 100 public ones, and who knows how many private ones. Organizationally, they typically fall into three categories:
Nonprofit organizations that are dedicated to spam-fighting. These organizations generally have employees and a semicorporate structure.
Loose-knit groups of administrators who have banded together to fight spam. These groups usually do not have full-time employees, and they resemble open-source projects more than anything else. (For example, though there may be a recognized leader, they follow democratic principles in decision-making.)
Individuals who have set up their own DNSBLs for their own private use, but who allow others to use them if they like.
In addition to their organizational structure, DNSBLs differ greatly in the way they operate. Areas in which there might be differences include the following:
Criteria for what constitutes a spammer (or potential spammer)
The method they use to obtain candidates f
Mail Abuse Prevention System (MAPS) Mail Abuse Prevention System, LLC (MAPS for short), is one of the biggest, oldest, most controversial, and best-known DNSBLs around. Formed in 1997 by a small group that included Internet developer Paul Vixie (author of the BIND software found on the majority of Internet DNS servers), MAPS is a nonprofit corporation based in California. MAPS’s main web site is at http://www.mail-abuse.org. Vixie’s reputation and knowledge have earned MAPS a lot of respect among system administrators and a high profile in the press. Unfortunately, MAPS’s methods (which really aren’t all that different from those of many other blacklists), its high profile, and its ubiquity have made it a legal target for numerous bulk e-mail senders who feel that MAPS is unfairly persecuting them. MAPS even collects money online for its legal defense fund! How MAPS Works MAPS has one of the widest assortments of DNSBLs available. A different group within the MAPS organization
SpamCop SpamCop is a popular DNSBL that has been around since 1998. SpamCop itself is based in Seattle and is run by Julian Haight (who wrote the code) and many contributors. SpamCop has a unique method of keeping its list fresh and promptly removing sites that are no longer spamming, which appears to make it one of the “fairest” DNSBLs around. Apart from being a DNSBL, SpamCop also offers filtered POP/IMAP/web-mail accounts. In this chapter, we’re not going to go into this service in depth, but we’ll briefly describe it since it’s a major component of SpamCop’s business. The SpamCop service currently costs $30 per year per e-mail account. You can either forward your existing e-mail account to the service or use it as a new account under the domain spamcop.net (that is, your e-mail address would be your_account@spamcop.net). The service checks incoming mail for viruses, and then it uses SpamCop’s DNSBL to decide whether or not the message is spam. If it decides it is spam, it
Open Relay Database (ORDB) The Open Relay Database (ORDB) has listed open relays on the Internet since 2001. ISPs and organizations large and small use ORDB. It’s a nonprofit headquartered in Denmark, but it has contributors and users the world over. ORDB is simply a listing of reported and tested open relays. It differs from MAPS’s RSS in that ORDB doesn’t care whether or not the site has actually been used to send spam before, only that it’s technically an open relay. Therefore, if you use it, there’s a chance of blocking mail from legitimate mail servers, even if those servers aren’t currently being used by a spammer. The reasoning is that it’s only a matter of time before a spammer finds the open relay and uses it, so it’s better to go ahead and block the site. This also encourages system administrators to close their open relays if they know their servers might get blocked. ORDB’s web site is at http://www.ordb.org. How ORDB Works List type: IP-based DNSB
Distributed Server Boycott List (DSBL) The Distributed Server Boycott List (DSBL) is a group of administrators and users who have banded together to fight spam. They are primarily concerned with spam sources that are open relays or open proxies. The DSBL web site is at http://www.dsbl.org. How DSBL Works The DSBL does not test sites itself. Anybody can submit a site to the DSBL, and submitters fall into two types: untrusted and trusted. An untrusted user is anyone who reports spam to the DSBL. A trusted user is someone who has requested an account with the DSBL and provided a rationale for being made a trusted user. The DSBL then gives the user a provisional account that can be revoked if the user violates the reporting standards the DSBL sets forth.
List
List Type: IP-based DNSBL
Server: list.dsbl.org
This list contains only spam sources verified by users whom the DSBL staff trusts. Because of this, it has a lower incidence of false positives than the other DSBL lists. Multihop
SPAMHAUS The operators of Spamhaus believe that 90 percent of all spam in Europe and North America is sent by fewer than 200 known spammers, whom they track in their Registry of Known Spam Operators (ROKSO). By knowing their enemy and tracking its movements from one ISP to another, Spamhaus has made the Spamhaus Blacklist (SBL) a popular and effective DNSBL. Its web site is at http://www.spamhaus.org. How Spamhaus Works
List type: IP-based DNSBL
Server: sbl.spamhaus.org
The SBL is updated around the clock by an international team of administrators who watch for known spammers and spamming in progress. Updates are published to the SBL once per hour. The SBL contains a list of known spammers. It does not contain a list of open proxies or open relays, so Spamhaus suggests using its list in combination with a good open relay/open proxy list. It uses the following criteria to decide whether a given IP address will be listed in the SBL:
Spam Sources: Spammers sending bulk e-mail from
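Following Spamhaus's advice to pair its list with an open relay/open proxy list, a mail filter can consult several DNSBLs and combine the answers. The Python sketch below adds up per-list weights and rejects only when the total reaches a threshold, so no single aggressive list can block a sender on its own. The zone names come from this chapter, but the weights, the threshold, and the function name are hypothetical; the DNS lookup is passed in as a function so the scoring can be tried without network access.

```python
from typing import Callable

# Hypothetical weights: how much each list's verdict counts toward rejection.
ZONE_WEIGHTS = {
    "sbl.spamhaus.org": 3.0,   # known spam operations
    "list.dsbl.org": 2.0,      # verified open relays and proxies
    "dnsbl.njabl.org": 1.0,    # relays, proxies, and dial-up ranges
}

def combined_verdict(ip: str, zone_weights: dict[str, float], threshold: float,
                     lookup: Callable[[str, str], bool]) -> tuple[float, bool]:
    """Return (score, reject?) where score sums the weights of lists naming ip."""
    score = sum((w for zone, w in zone_weights.items() if lookup(ip, zone)), 0.0)
    return score, score >= threshold

if __name__ == "__main__":
    # Stand-in lookup for testing; a real one would query DNS.
    fake_listings = {("192.0.2.10", "list.dsbl.org"),
                     ("192.0.2.10", "dnsbl.njabl.org")}
    fake_lookup = lambda ip, zone: (ip, zone) in fake_listings
    print(combined_verdict("192.0.2.10", ZONE_WEIGHTS, 3.0, fake_lookup))
```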
Not Just Another Bogus List (NJABL) Not Just Another Bogus List (NJABL) is run by a group of e-mail administrators who were frustrated with the policies and uptime of existing DNSBLs. They decided to take matters into their own hands and created a blacklist that is almost entirely supported by the e-mail administrators who use it. How NJABL Works
List type: IP-based DNSBL
Server: dnsbl.njabl.org
The NJABL has only one list. It includes any IP address that meets any of the following criteria:
The system is an open relay or open proxy, or runs an open web-to-mail gateway.
The IP address belongs to a dial-up or dynamic range. This information is obtained from American Registry for Internet Numbers (ARIN) records or from ISPs that report such ranges to NJABL directly.
The system has been used to send out spam directly.
NJABL tests for open proxies by scanning individual servers for them. It then does an open-relay test by scanning the SMTP port. It takes about four weeks to re
RFC Ignorant (RFCI) RFCI sets itself apart from the rest of the DNSBLs because it is not concerned whether or not a site is, or could be, a spammer. In fact, instead of worrying about spam, RFCI is more concerned about whether or not a domain or IP network block’s administrator is a good “Netizen” (that is, citizen of the Internet). The domain and network blocks in the RFCI have been placed there because their owners are deemed “RFC ignorant.” You may have heard the term RFC before and are vaguely aware of it being related to Internet standards (numerous companies tout their products as being “RFC compliant”). RFC stands for Request for Comments, which is the common name for the rules and best practices ratified and published by the Internet Engineering Task Force (IETF). Many RFCs are like technical blueprints, but some RFCs are human protocols—best practices for configuring and running networks and servers on the Internet. Domains are listed in the RFCI because their owners have refu
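Because RFCI lists domains rather than IP addresses, its query differs from the IP-based lookup: the domain itself is prefixed to the list's zone instead of a reversed address. The Python sketch below extracts the domain from a sender address and builds that query name. The zone name postmaster.rfc-ignorant.org in the example is an assumption, as are the helper names; confirm RFCI's current zone names on its web site before relying on them.

```python
import socket
from email.utils import parseaddr

def sender_domain(address: str) -> str:
    """'Joe <joe@Example.COM>' -> 'example.com' (empty string if no domain)."""
    _, addr = parseaddr(address)
    _, at, domain = addr.rpartition("@")
    return domain.rstrip(".").lower() if at else ""

def domain_query_name(domain: str, zone: str) -> str:
    """Domain-based DNSBL query name: <domain>.<zone>."""
    domain = domain.strip().rstrip(".").lower()
    if not domain:
        raise ValueError("empty domain")
    return f"{domain}.{zone.strip('.')}"

def domain_listed(domain: str, zone: str) -> bool:
    """True if the zone publishes an A record for the domain."""
    try:
        socket.gethostbyname(domain_query_name(domain, zone))
        return True
    except socket.gaierror:
        return False   # not listed, or the lookup itself failed

if __name__ == "__main__":
    dom = sender_domain("Joe <joe@Example.COM>")
    print(domain_query_name(dom, "postmaster.rfc-ignorant.org"))  # assumed zone
```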
Implementing DNSBLs Within Sendmail Sendmail is the most venerable mail transfer agent on the Internet and runs on most Unix and Unix-like operating systems. (In fact, almost every major distribution of Linux comes with Sendmail.) Sendmail added direct DNSBL support with version 8.9 and changed the syntax slightly in 8.10. Support was also possible in version 8.8, but you had to hack your sendmail.cf configuration file to do it. However, because of security vulnerabilities in earlier versions of Sendmail, including major buffer overflows discovered in the spring and fall of 2003 (see CERT Advisories CA-2003-12 and CA-2003-25), we highly recommend that you run the latest stable version. You can find Sendmail at http://www.sendmail.org. Configuring Sendmail for IP-Based DNSBLs By Sendmail standards, configuring Sendmail 8.10 or later to use IP-based DNSBLs is quite straightforward. Edit the sendmail.mc file, or equivalent .mc file, that you’re using. If you are unfamiliar with .mc files
Implementing DNSBLs with Postfix Postfix is another popular Sendmail replacement for Unix and Unix-like operating systems, designed by Wietse Venema while he worked at IBM. IBM released it to the public in 1998, and Postfix is credited with instigating Big Blue’s open-source strategy. Postfix was designed with security and speed in mind. We’re going to cover only Postfix 2.x here. Postfix is available at http://www.postfix.org. Configuring Postfix for IP-Based DNSBLs Postfix is probably one of the simplest mail servers to configure for DNSBLs. All you have to do is edit your main.cf file (usually in /etc/postfix) and modify (or add) the smtpd_client_restrictions line. This line controls many spam-related functions, including anti-relaying provisions. To configure Postfix 2.x to reject mail from hosts listed in a DNSBL, make the following modification:
smtpd_client_restrictions = reject_rbl_client <zone server>
where <zone server> is the DNSBL zone you want to use. For instance:
smtpd_client_restricti
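Under the hood, the IP-based DNSBL checks that Sendmail and Postfix perform come down to a DNS lookup. The MTA reverses the connecting client's IPv4 octets, appends the DNSBL zone, and treats an A record in the answer as meaning the address is listed. The Python sketch below illustrates that convention. The function names and the dnsbl.example.org zone are our own placeholders. Your MTA performs this lookup internally, so you never need to run it yourself.

```python
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    # 192.0.2.99 in zone dnsbl.example.org -> 99.2.0.192.dnsbl.example.org
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL zone answers with an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any other lookup failure) is treated as "not listed".
        return False
```

Treating a failed lookup as "not listed" is a deliberate choice. If the DNSBL cannot be reached, mail is accepted rather than rejected.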
Implementing DNSBLs with Microsoft Exchange While Sendmail and Postfix are still the darlings of many ISPs, data centers, and Unix-based shops, Microsoft’s Exchange Server is the most popular corporate e-mail system. Many companies no doubt front-end their Exchange servers with a server running one of the three implementations mentioned earlier, but many Exchange servers also receive mail directly from the Internet. Exchange 2000 (we assume most of you have upgraded from Exchange 5 and 5.5 by now) had little in the way of anti-spam features and required third-party plug-ins or front-end servers to deal with spam. Exchange 2003 has rectified the situation by introducing its own spam-fighting features. We’ll discuss this progression as it pertains to DNSBLs. Exchange 2000 Exchange 2000 requires that you use a third-party application to implement DNSBLs. Almost any anti-spam package for Exchange supports DNSBLs, including those we cover in Chapter 11. You can also use tw
Summary DNS Blacklists help you reduce spam by allowing you to reject, tag, or score e-mail that comes from known or potential spam sources. They give you the benefit of the experience and resources of others. The primary pitfall of a DNSBL is the relatively high chance of gross false positives, higher than with most other anti-spam solutions. Entire networks of e-mail could potentially be lost should a major mail server or service be listed in a DNSBL. Likewise, you’re placing a high degree of trust in the people running and contributing to the DNSBLs. By understanding the technology and philosophy behind individual blacklists, you’ll be able to choose those that suit your needs. Most e-mail servers and anti-spam software support DNSBLs either natively or through third-party add-ons, so there’s no reason why DNSBLs shouldn’t be a part of your anti-spam strategy. Just don’t rely on them as your primary means to thwart spam.
Chapter 6: Filtering Spam with SpamAssassin Now that you understand the theoretical, philosophical, and practical basics of spamming and spam fighting, and you have a thorough understanding of DNS Blacklists and their place in your arsenal, we’ll look at one of the most popular anti-spam programs to hit the Internet in recent years: SpamAssassin. SpamAssassin uses a number of techniques to identify spam, including blacklists, pattern matching, and Bayesian learning. In this chapter, we’ll introduce you to SpamAssassin, help you understand how it works, and show you how to install and configure it on your system. Chapter 7 will delve deeper into SpamAssassin’s Bayesian learning capabilities, and Chapter 8 will cover advanced SpamAssassin topics, including integration with other mail-filtering programs. Dossier of a Spam Assassin SpamAssassin is a spam filtering utility written in the Perl scripting language (http://www.perl.org), with a few extra utilities written in the C programming lan
Installing SpamAssassin In this section we walk you through the installation of SpamAssassin on Linux-based systems. (This also applies to most Unix-based systems.) Typically, a discussion of installing software lists the hardware requirements and any prerequisite software you might need. Here, we also recommend that you have experience installing software on a Linux or Unix system. Although SpamAssassin’s installation and configuration process is simple relative to some tools, we don’t recommend that you cut your Unix teeth on it, either. A good working knowledge of how your system processes mail and of its message transfer agent (MTA), such as the previously discussed Sendmail or Postfix, is also recommended. We discuss both the requirements for using SpamAssassin and how to install it. You can install SpamAssassin via three main methods: The Comprehensive Perl Archive Network (CPAN) An automated means of downloading and installing Perl
Understanding SpamAssassin’s Components SpamAssassin isn’t just one application; rather, it’s a collection of tools and an API that can be used in a multitude of ways. In this section we talk about SpamAssassin’s major components. You need to know what was installed before you can begin configuration. The spamassassin Utility The spamassassin utility is a Perl script that calls on the Mail::SpamAssassin Perl module. In many cases, it’s the major workhorse of a SpamAssassin installation. Since it’s just a script, spamassassin has to be called from Procmail or from a mail filter-type utility, such as MIMEDefang or amavisd-new, both of which we discuss in Chapter 8. By default, spamassassin is installed in /usr/bin (or ~/sausr/bin if it’s a personal installation). Essentially, Procmail pipes each e-mail through spamassassin, which compares the message against its rules. The spamassassin utility also has a host of other command-line options, such as adding and removing addresses (inside an e-mail) from your bl
Configuring SpamAssassin You can run SpamAssassin in two ways: One is to use Procmail to run the spamassassin utility. The other is to run the spamd daemon and use it in conjunction with the spamc client (also run by Procmail). In this section, we cover configuring SpamAssassin for both per-user and site-wide installations. The primary difference is that site-wide installations can use spamassassin or spamd/spamc, while per-user installations can only use spamassassin. We used Procmail for this installation, but we also cover other mail filters in Chapter 8. Per-User Configuration A per-user configuration should be implemented when an individual user wants to run SpamAssassin. This is required for a personal use installation, but it can also be used if a particular site offers SpamAssassin but doesn’t “turn it on” by default for all users. The first thing you need to do is set yourself up to use Procmail, if you’re not using it already. To do this, first locate Procmail on the system b
An Introduction to SpamAssassin’s Output Now your users should receive spam messages attached to spam reports, letting them decide whether the messages are legitimate or spam. Either way, SpamAssassin’s process protects users by keeping them from having to view the spam messages directly. Looking at a Message A sample spam report is shown in Figure 6-3. By default, the report preserves the Subject line and the From address, so users can quickly decide whether the mail came from a legitimate user. Figure 6-3: A spam report from SpamAssassin If you open the mail, the spam report appears. It states that the attached e-mail is probably spam and gives you a resource to visit if you need more information (this is the e-mail address or URL we filled in earlier). It goes on to provide a text-only preview of the content, if it can (some messages are simply a few HTML tags and GIF images or otherwise cannot be displayed in the preview). Finally, it offers its analysis of the content, breaking it d
Summary SpamAssassin is one of the most popular anti-spam tools available on Linux today. It’s free, extensible, and packed with features, such as Bayesian analysis, DNSBLs, and an extremely large number of header and body tests. You can install and run SpamAssassin on your system in several ways, as a user or an administrator, and you’ll have to pick the method that best fits your plan against spam. Now that you understand the basics of how to install and configure SpamAssassin, in Chapters 7 and 8 we cover its Bayesian analysis features and advanced configuration examples.
Chapter 7: Catching Spam with SpamAssassin’s Bayesian Classifier Version 2.50 of SpamAssassin was the first version of the software to include a Bayesian classifier. Prior to that version’s release, if you wanted to use Bayesian analysis you had to run a separate program, such as SpamBayes or Bogofilter, in conjunction with SpamAssassin. While both of those are fine programs, including Bayesian analysis in SpamAssassin makes your anti-spam implementation much simpler and cleaner. SpamAssassin allows you to use Bayesian analysis with either site-wide or user-specific Bayes databases. Like many other Bayes implementations, SpamAssassin based its algorithms on those described in Paul Graham’s forward-thinking paper, “A Plan for Spam,” found at http://www.paulgraham.com/spam.html. (Note that this chapter does not explain how Bayesian classification works. For a detailed explanation, see Chapter 4.) Implementing SpamAssassin’s Bayesian Classifier Bayesian analysis is turned on by default i
Looking at SpamAssassin’s Bayes-Related Files The SpamAssassin Bayesian classifier uses a number of files to track and analyze tokens. Tokens are words that, through “learning,” are found to have a certain probability of being in spam or ham. By default, SpamAssassin builds separate Bayes databases for each user and stores each database in the user’s home directory in ~/.spamassassin. Each Bayes-related SpamAssassin file begins with bayes_, and each is described here: bayes_toks This is the database of tokens (words) that are taken from analyzed messages. Each token is given a count of its frequency in spam and ham messages, along with the message count of the last message in which it was seen. This file is a binary file, but you can use the Unix strings command to see the various tokens. It makes for amusing reading. You can also get a dump of the file using the sa-learn utility, which we talk about later in this chapter in the section “Training SpamAssassin’s Bayesian Classif
SpamAssassin’s Bayes Rules SpamAssassin’s Bayes rules are stored in a file called 23_bayes.cf in /usr/share/spamassassin. Looking at the rules themselves won’t teach you much, because an internal Bayesian engine performs all Bayesian calculations, but they do give you an idea of how SpamAssassin ranks its Bayesian scores. As you can see in the following list (taken directly from 23_bayes.cf), SpamAssassin groups ranges of probability together rather than creating a separate rule for each percentage. So, for example, any message that Bayes gives a 66 percent chance of being spam is assigned to the BAYES_60 rule.
describe BAYES_00 Bayesian spam probability is 0 to 1%
describe BAYES_01 Bayesian spam probability is 1 to 10%
describe BAYES_10 Bayesian spam probability is 10 to 20%
describe BAYES_20 Bayesian spam probability is 20 to 30%
describe BAYES_30 Bayesian spam probability is 30 to 40%
describe BAYES_40 Bayesian spam probability is 40 to 44%
describe BAYES_44 Bayesian
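To make the grouping concrete, the following Python sketch maps a Bayes probability to the name of the rule whose range contains it. The ranges through BAYES_44 come from the list above. The higher ranges, BAYES_50 through BAYES_99, are our assumption based on the 23_bayes.cf shipped with SpamAssassin 2.6x, so compare them against your own copy. We also treat each range as including its lower bound and excluding its upper bound, which is a simplification.

```python
# (lower bound, upper bound, rule name). Ranges up to BAYES_44 follow the
# list in the text; the higher ranges are assumed, not quoted.
BAYES_BUCKETS = [
    (0.00, 0.01, "BAYES_00"),
    (0.01, 0.10, "BAYES_01"),
    (0.10, 0.20, "BAYES_10"),
    (0.20, 0.30, "BAYES_20"),
    (0.30, 0.40, "BAYES_30"),
    (0.40, 0.44, "BAYES_40"),
    (0.44, 0.50, "BAYES_44"),
    (0.50, 0.56, "BAYES_50"),
    (0.56, 0.60, "BAYES_56"),
    (0.60, 0.70, "BAYES_60"),
    (0.70, 0.80, "BAYES_70"),
    (0.80, 0.90, "BAYES_80"),
    (0.90, 0.99, "BAYES_90"),
    (0.99, 1.00, "BAYES_99"),
]


def bayes_rule_for(probability: float) -> str:
    """Return the rule name whose range contains probability (0.0 to 1.0)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for low, high, name in BAYES_BUCKETS:
        # Half-open range [low, high).
        if low <= probability < high:
            return name
    # Only probability == 1.0 falls through; it belongs to the top bucket.
    return BAYES_BUCKETS[-1][2]
```

For example, bayes_rule_for(0.66) returns "BAYES_60", matching the example in the text.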
Automated Learning SpamAssassin has the option of allowing the Bayesian classifier to auto-learn what is spam and what is ham based on the score SpamAssassin assigns to a message using its various other rules and pattern matching. Bayesian auto-learning is turned on by default, but you can turn it on (1) or off (0) using the following line in your local.cf or user_prefs file:
bayes_auto_learn (0|1)
You don’t necessarily want SpamAssassin to learn from every message tagged as spam, especially if your required hits are set to 5, because any false positives would then be learned as spam and their legitimate words counted as spam keywords. Likewise, you don’t want every message that isn’t tagged as spam to be auto-learned as ham, because a few spams with scores between 0 and 5 will get through as false negatives. Fortunately, SpamAssassin has a way around this conundrum. It allows you to set minimum and maximum score thresholds for when to auto-learn. By default, SpamAssassin auto-learns as spam mes
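The threshold logic can be sketched in a few lines of Python. The function is hypothetical. The default values 0.1 and 12.0 are our assumption about SpamAssassin’s bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings, so check the documentation for your version before relying on them.

```python
from typing import Optional


def auto_learn_decision(score: float,
                        ham_threshold: float = 0.1,      # assumed default
                        spam_threshold: float = 12.0     # assumed default
                        ) -> Optional[str]:
    """Decide whether a scored message should be auto-learned.

    Scores at or above spam_threshold are learned as spam, scores below
    ham_threshold are learned as ham, and anything in between is skipped.
    """
    if ham_threshold >= spam_threshold:
        raise ValueError("ham_threshold must be below spam_threshold")
    if score >= spam_threshold:
        return "spam"
    if score < ham_threshold:
        return "ham"
    return None  # the uncertain middle ground is never learned
```

The wide gap between the two thresholds is the point. A message scoring 5 or 6 may well be a false positive, so it is left out of the training data.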
Training SpamAssassin’s Bayesian Classifier To train SpamAssassin’s Bayesian classifier, you use an included command-line utility called sa-learn. In fact, this powerful tool is used to control much of the Bayesian classifier. To run the tool, you’ll need access to the system on which SpamAssassin resides. On a Unix system, this means shell access, unless your system administrator has set up a way for you to run sa-learn through a web-based system such as Webmin, Usermin, or some other utility. Giving sa-learn Input The sa-learn utility can take input from three different sources: text-based files, directories, or the Unix mbox (that is, mailbox) format. A file is just what you’d expect: An individual e-mail is saved off as a file and then given to sa-learn as input. A directory is a place where a collection of such files is stored. By default, sa-learn treats any input source as a file or directory, so no special command-line switches are required. If you’re using a mail transfer agen
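Your spam corpus may be in mbox format when you would rather hand sa-learn a directory of individual files. In that case you can split the mbox first. The sketch below uses Python’s standard mailbox module. The split_mbox name and the msgNNNNN.eml naming scheme are our own choices, not anything sa-learn requires.

```python
import mailbox
import os


def split_mbox(mbox_path: str, out_dir: str) -> int:
    """Write each message in an mbox file to its own file in out_dir.

    Returns the number of messages written.
    """
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for i, msg in enumerate(box):
            path = os.path.join(out_dir, f"msg{i:05d}.eml")
            with open(path, "wb") as fh:
                # as_bytes() omits the mbox "From " separator line by default.
                fh.write(msg.as_bytes())
            count += 1
    finally:
        box.close()
    return count
```

You could then give the resulting directory to sa-learn in place of the original mbox file.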
Implementing Bayes System-Wide Thus far we have discussed implementing SpamAssassin’s Bayesian classifier on a per-user basis, with each user having his or her own database and journal. In some instances, it may be desirable to share a single set of Bayes databases across the whole system instead. One such instance is when you’re running SpamAssassin site-wide and do not allow users to train the Bayesian system themselves. Another is when your SpamAssassin server isn’t the final destination for the mail but simply forwards processed mail on to a “smart host” that contains the user accounts. In that case it’s neither possible nor practical for users to have individual Bayes databases (we explore an example of this in Chapter 8) or to train the database. For an entire system to share the Bayes databases among users, you must set the bayes_path configuration option in local.cf, using the following syntax:
bayes_path /path/to/file
The default path is ~/.spamassassin/bayes. For sharing the databases system-wide, you may want to crea
Bayesian Learning Caveats While Bayesian learning is a vast improvement over many of the older methods of spam detection, it still has its downsides, weak points, and “gotchas.” When using Bayesian learning with SpamAssassin, keep the following in mind: The database needs to learn from at least 200 ham and 200 spam messages before message analysis (and thus SpamAssassin’s Bayes scoring rules) kicks in. It may take at least 1000 spam and 1000 ham messages before Bayesian analysis becomes most effective. If you’re running SpamAssassin in a site-wide configuration where it forwards mail on to a smart host after processing, it may not be possible for users to teach the system easily. This is because when the mail reaches the smart host, it’s no longer in the same place (server) as the Bayes database. The trick is to get it back to that server for processing, which, depending on the smart host MTA and end-user client, may not be possible (we discuss options that address this in Chapter 8).
Summary Bayesian classification is an excellent addition to SpamAssassin, even if its sole purpose is to eliminate the need to have a separate program running to perform this task. It’s implemented by default in SpamAssassin and can be tailored to your needs using SpamAssassin’s straightforward text configuration files. Teaching the system is accomplished through the sa-learn command-line tool. Although the Bayesian classifier uses per-user databases by default, it can also be configured to run on a site-wide basis—though this may reduce its effectiveness. In Chapter 8, we address Bayesian classification again, and we cover some of the more advanced capabilities included within SpamAssassin.
Chapter 8: Enhancing and Maintaining SpamAssassin SpamAssassin runs well without any kind of tuning, but if you want to get the most out of your system, you need to know when and where to make modifications. After a couple of months of running SpamAssassin, you will inevitably start to see more spam sneak through. This is a sign that spammers have adapted to your defenses, so you must adapt as well. In this chapter, we cover ways in which you can fine-tune and improve your SpamAssassin installation. We go in depth with some specific configuration options and touch upon helper programs. Creating Your Own Rules One of SpamAssassin’s most powerful features is that it lets you create your own rules. Even though the SpamAssassin developers put out a new version every few months, you’re likely to receive a crafty spam that doesn’t match any of your current rules or, worse, uses SpamAssassin’s own rules against you. In such cases, the ability to create and modify rules is invaluable. SpamAssassin's Double-E
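Conceptually, each SpamAssassin rule pairs a test with a score. The test is often a regular expression run against a header or the message body, and a message’s total is the sum of the scores of the rules it triggers. The Python sketch below is a toy model of that idea only, not SpamAssassin’s engine. The Rule class, the rule names, and the scores are hypothetical. In SpamAssassin itself, a custom rule is written in local.cf with directives such as header, body, describe, and score.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str      # hypothetical rule name, e.g. "LOCAL_CHEAP_MEDS"
    target: str    # "body", or a header name such as "Subject"
    pattern: str   # regular expression, matched case-insensitively
    score: float   # points added when the rule fires


def score_message(headers: dict[str, str], body: str,
                  rules: list[Rule]) -> tuple[float, list[str]]:
    """Return (total score, names of rules that fired)."""
    total = 0.0
    hits = []
    for rule in rules:
        # Header names must match the dict keys exactly in this sketch.
        text = body if rule.target == "body" else headers.get(rule.target, "")
        if re.search(rule.pattern, text, re.IGNORECASE):
            total += rule.score
            hits.append(rule.name)
    return total, hits
```

For example, a body rule for cheap\s+meds worth 2.5 and a Subject rule for \bfree\b worth 1.0 would together give a message 3.5 points if both match.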
Whitelisting and Blacklisting SpamAssassin has many, many whitelisting and blacklisting features. We’ve already covered the most obvious ones in Chapter 6: whitelist_from and blacklist_from. A few other settings let you customize what networks you trust and how much spam individual users receive. trusted_networks The trusted_networks setting (in local.cf or user_prefs) is a way of whitelisting e-mail coming from specific IP networks. Hosts in listed networks are never checked against DNSBLs. It’s important to include only networks that you are sure contain no open proxies, open relays, or hosts used by spammers. Networks with netmasks are entered in classless interdomain routing (CIDR) notation (/24, /16, and so on). So, for example, if you want to trust e-mail from the entire (fictional) class C network 192.168.100.0, you enter this:
trusted_networks 192.168.100.0/24
If you omit the last octet but leave a trailing dot (192.168.100.), every host in that network is included.
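The following Python sketch is built on the standard ipaddress module. It shows how the CIDR and trailing-dot notations describe the same set of hosts, and how a connecting address could be checked against them. It models only the matching logic described above. parse_trusted and is_trusted are hypothetical helpers, not part of SpamAssassin.

```python
import ipaddress


def parse_trusted(entry: str) -> ipaddress.IPv4Network:
    """Convert a trusted_networks-style entry into an IPv4Network.

    Accepts CIDR notation ("192.168.100.0/24"), a single address
    ("192.168.100.5", treated as /32), or a trailing-dot prefix
    ("192.168.100."), which covers every host sharing those octets.
    """
    entry = entry.strip()
    if entry.endswith("."):
        octets = entry.rstrip(".").split(".")
        if not 1 <= len(octets) <= 3:
            raise ValueError(f"bad prefix: {entry!r}")
        padded = octets + ["0"] * (4 - len(octets))
        # Each listed octet fixes 8 bits of the network prefix.
        return ipaddress.IPv4Network(f"{'.'.join(padded)}/{8 * len(octets)}")
    return ipaddress.IPv4Network(entry)


def is_trusted(ip: str, entries: list[str]) -> bool:
    """Return True if ip falls inside any trusted_networks entry."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in parse_trusted(e) for e in entries)
```

With this sketch, "192.168.100." and "192.168.100.0/24" produce the same network, so either form trusts 192.168.100.25 but not 192.168.101.1.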
Localizing An easy way to tweak SpamAssassin to reduce spam from foreign sources is to localize it to your particular user base. That is, if all of the users on your mail system are English-only speakers, they are unlikely to get messages they want in a Chinese character set. SpamAssassin gives you the ability to score such messages higher. You can test message localization in two ways in SpamAssassin, and both can be configured in either the local.cf or the user_prefs file. The first is the ok_locales setting and the second is the ok_languages setting. ok_locales The ok_locales setting checks the character set of the received e-mail. It activates the rules CHARSET_FARAWAY, CHARSET_FARAWAY_BODY, and CHARSET_FARAWAY_HEADERS. It supports only the settings shown in Table 8-1.
Table 8-1: Settings for ok_locales
Setting   Description
all       All character sets (default)
en        English (and other western) character sets
ja        Japanese character sets
ko        Korean character sets
ru        Cyrillic character sets
th
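As a rough illustration of what an ok_locales check does, the Python sketch below reads a message’s declared character set and reports whether it falls outside the locales you allow. The charset-to-locale table is a small hypothetical sample; SpamAssassin maintains its own mapping internally. The sketch also inspects only the top-level Content-Type header.

```python
from email import message_from_string

# Hypothetical sample of charset -> locale code; SpamAssassin keeps its own,
# much more complete mapping internally.
CHARSET_LOCALES = {
    "us-ascii": "en", "iso-8859-1": "en", "windows-1252": "en",
    "iso-2022-jp": "ja", "shift_jis": "ja", "euc-jp": "ja",
    "euc-kr": "ko", "ks_c_5601-1987": "ko",
    "koi8-r": "ru", "windows-1251": "ru",
    "tis-620": "th",
}


def charset_is_faraway(raw_message: str, ok_locales: list[str]) -> bool:
    """Return True if the message's top-level charset maps to a locale
    that is not in ok_locales."""
    if "all" in ok_locales:
        return False  # "all" accepts every character set
    msg = message_from_string(raw_message)
    charset = msg.get_content_charset()  # None if no charset is declared
    if charset is None:
        return False
    locale = CHARSET_LOCALES.get(charset.lower())
    # Charsets missing from the sample table get the benefit of the doubt.
    return locale is not None and locale not in ok_locales
```

With ok_locales set to en, a message declared as KOI8-R would be flagged, while adding ru to the list would let it through.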
Using MIMEDefang with SpamAssassin MIMEDefang is a mail filtering processor for UNIX systems that was originally created as a way to thwart viruses by deleting attachments likely to contain them (such as .exe and .com executables). It still has that functionality, along with the ability to convert Microsoft Word .doc files into HTML, and it can “defang” MIME attachments in other ways. It can also be used in conjunction with a number of anti-virus programs to detect and remove viruses based on signatures. Its focus has since shifted toward anti-spam work, and it can now work with SpamAssassin to tag or reject spam. It is written primarily in Perl (with some C components for quick processing) and is therefore highly configurable by system administrators. Like many thriving open-source projects, MIMEDefang is available both as a free package and as a fully supported commercial product. Its free, community-supported version is available at http://www.mimedefang.org/. The commercial ve
Using amavisd-new with SpamAssassin amavisd-new is to Postfix what MIMEDefang is to Sendmail, or at least that’s the simplest way to put it. Actually, you can run amavisd-new with a number of MTAs, including Sendmail and qmail, but it seems to be the mail filter of choice for Postfix users. amavisd-new sprang out of the AMaViS project (which stands for A Mail Virus Scanner). Whereas AMaViS focused mostly on integrating virus scanning with MTAs, amavisd-new also added SpamAssassin integration and other features. Like SpamAssassin itself, amavisd-new is written in Perl with some helper components written in C. amavisd-new works by running as a daemon on a specified port on the mail host. Incoming mail arrives at your MTA as usual, is passed from the MTA to amavisd-new (where scanning and modifications are made), and then returns to your MTA for delivery. amavisd-new does not run as a plug-in to your MTA; instead, it runs as a separate ESMTP mailer on a specified port. Your MTA also listen
Using SpamAssassin as a Gateway to Another Mail Server You don’t need to run SpamAssassin on the server that is the ultimate destination of the mail. In fact, in many cases this may not be possible. In such a case, the MTA running SpamAssassin can simply act as a gateway to another “smart host” after scanning. One situation where this makes sense is when you have a public-facing mail server on your DMZ (semitrusted network) where you scan mail for spam and viruses before it hits the interior mail server. Another is when you aren’t running a UNIX or Linux system as your final-destination mail server (you’re running Microsoft Exchange or Lotus Notes, for instance), but still want to run SpamAssassin. Figure 8-3 shows a configuration example. Figure 8-3: Running SpamAssassin as a gateway to another MTA server The best way to configure a SpamAssassin server as a mail gateway is to use either MIMEDefang or amavisd-new. If you simply use Procmail in conjunction with the spamassassin program,
Summary In this chapter we learned how to tweak SpamAssassin to make it more effective. One way to do this is to add and modify your own SpamAssassin rules, which can be challenging to get right, but of great benefit. Modifying whitelists and blacklists is another powerful way to fine-tune how much spam is allowed to get through and avoid false positives and false negatives. Localizing SpamAssassin for the languages in which you normally get e-mail reduces foreign spam. Finally, using filtering programs such as MIMEDefang and amavisd-new gives you the ability to filter for viruses, manipulate headers, and use smart hosts.
Chapter 9: Configuring Popular E-mail Clients for Spam Filtering In Chapters 6, 7, and 8 we installed and configured SpamAssassin. In most cases, however, this takes care of only half the job that needs to be done. SpamAssassin is now tagging messages it thinks are spam, but the messages are still coming through, and a manual process for dealing with them is still necessary. Fortunately, most e-mail clients allow you to filter e-mail based on certain criteria—such as specific text in a header. By doing this, we can filter out and eliminate messages that are tagged by SpamAssassin. In this chapter, we’ll configure the anti-spam and filtering features on four popular e-mail clients: Eudora, Mozilla Mail, Outlook Express, and Outlook. Configuring Spam Filters on Eudora Eudora is a popular e-mail client that has been around since the early 1990s, with versions for both Windows and Macintosh (OS 9 and OS X) platforms. In this chapter we cover only the Windows version, but the features are t
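The client-side filters in this chapter all come down to the same test: does a header added by SpamAssassin mark the message as spam? By default, SpamAssassin adds an X-Spam-Flag: YES header to messages it tags as spam. The Python sketch below applies that test to a batch of raw messages. The function names are hypothetical, and if your site marks spam differently you would match on whatever header it adds instead.

```python
from email import message_from_string


def is_tagged_spam(raw_message: str) -> bool:
    """True if SpamAssassin's X-Spam-Flag header says YES."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive; the value is compared in upper case.
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def sort_messages(raw_messages: list[str]) -> tuple[list[str], list[str]]:
    """Split messages into (inbox, spam_folder), like a client filter rule."""
    inbox, spam_folder = [], []
    for raw in raw_messages:
        (spam_folder if is_tagged_spam(raw) else inbox).append(raw)
    return inbox, spam_folder
```

Each client covered below expresses this same rule through its own filter dialog rather than in code.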
Configuring Spam Filters on Mozilla Mail Mozilla Mail is included as an optional part of the Mozilla browser suite installation. (It’s actually Mozilla Mail & Newsgroups, but since we’re not concerned with reading news in this chapter, we’ll drop the Newsgroups part.) Mozilla was born when Netscape, facing intense competition from Microsoft’s Internet Explorer for its Communicator browser suite, released its source code to the open-source community in 1998. Since then, Mozilla has become a popular cross-platform browser (it’s the default browser in Red Hat and many other Linux distributions), but it has barely made a dent in Microsoft’s market share. The Mozilla Foundation releases binaries of Mozilla and Mozilla Mail for Windows, Linux, and Mac OS X, though you can also typically find binaries for Solaris and HP-UX, to name a couple. If a binary doesn’t exist for your platform, there’s always the option of compiling it yourself from source code, which Mozilla makes readily av
Configuring Spam Filters in Outlook Express Microsoft Outlook Express (OE) for Windows is distributed for free as part of Internet Explorer. It’s designed primarily as an Internet mail and news client for home users and lacks the corporate features of Outlook (such as a Microsoft Exchange connector, scheduling, and so on). It supports the Post Office Protocol (POP), Internet Message Access Protocol (IMAP), and Hypertext Transfer Protocol (HTTP) for reading mail, as well as the Secure Sockets Layer (SSL) versions of POP and IMAP. For sending mail, OE supports the Simple Mail Transfer Protocol (SMTP), and for Internet news, the Network News Transfer Protocol (NNTP). For many Windows users, OE is their first and only e-mail client. As of this writing, Outlook Express 6 (a component of Internet Explorer 6) is the latest version. Even though OE has been around for many years, it lacks solid anti-spam features and requires third-party applications and plug-ins—such as those we cover in Chapter
Configuring Spam Filters on Outlook Microsoft Outlook comes with the Microsoft Office suite of programs. Microsoft’s corporate-class e-mail client has many more features than Outlook Express. In fact, the primary similarity between Outlook and Outlook Express is the name—almost everything else is different. Like OE, though, Outlook can connect to mail servers using POP, IMAP, and HTTP (for web accounts). But it can also connect directly to Microsoft Exchange servers for e-mail, calendaring, and public folders. We are covering Outlook 2002 (part of Office XP) in this chapter. The version you have installed depends on the version of Office you are using (Outlook 97 came with Office 97, Outlook 2000 came with Office 2000, and so on). Configuring Outlook’s Junk and Adult Content E-mail Filters Outlook 2002 has built-in Adult and Junk e-mail filters. These filters operate by comparing the message’s headers and body against a list of keywords and also by comparing user-defined lists of known
Summary In this chapter we’ve looked at four popular e-mail clients: Eudora, Mozilla Mail, Outlook Express, and Outlook. Each has its own set of anti-spam features. Some, such as Eudora’s SpamWatch, are quite robust and full-featured. Others, such as Outlook Express, offer little more than manual blacklists. All of them will allow you to filter messages based on certain criteria, which makes them extremely useful when filtering SpamAssassin-tagged messages. Configuring your e-mail clients in this way is helpful in keeping spam out of sight.
Part III: Implementing Other Popular Anti-Spam Tools
Chapter Overview
Chapter 10: Anti-Spam Clients for Windows Overview By now you’ve learned about powerful tools that can stop an unwanted e-mail message before it reaches your inbox. But what if these tools don’t provide enough protection? Your mail system might block 99 percent of the spam that hits the mail exchanger, but if you get 10,000 pieces of spam a day, you still end up with 100 pieces of unwanted e-mail every day. And what do you do if your ISP doesn’t filter mail at the mail server? That’s where anti-spam clients come in. As the last line of defense (other than the Delete button) in the war on spam, client-side mail filters use a variety of standard and advanced methods for detecting, matching, filtering, and tweaking spam. In this chapter, we cover spam-client solutions for the Microsoft Windows operating systems (for the most part tested on Windows 2000 or XP systems). The clients covered are organized by method of detection (POP proxy, Outlook plug-in, and Other) and method of filtering
SpamBayes SpamBayes is an open-source anti-spam tool. The development group, which first developed SpamBayes as a platform-independent spam filter for UNIX, released a version of it as a plug-in for Microsoft Outlook 2000 and Outlook XP. This plug-in uses Bayesian statistical analysis to classify incoming e-mail messages as spam, ham (good e-mail), or unknown. SpamBayes then sorts the classified mail according to your configuration and the mail folders you have set up. How It Works SpamBayes is a powerful Outlook plug-in that relies heavily on user interaction (on the front-end) and machine learning to classify incoming mail as spam or ham. First, the user presents examples of spam to SpamBayes—the more examples the better. The program analyzes the e-mail’s headers, including To and From e-mail addresses, Subject, and the text of the message itself, building a statistical model of spam for that user. Next, the user presents examples of legitimate e-mail, and SpamBayes repeats the pro
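To illustrate the train-then-classify workflow described above, here is a toy Python classifier. It counts which tokens appear in spam and ham examples, combines the per-token probabilities in naive Bayes style, and sorts messages into spam, ham, or unsure. It is our own simplified sketch, not SpamBayes’ algorithm (SpamBayes uses a more elaborate chi-squared combining scheme), and the 0.2 and 0.9 cutoffs are placeholders.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


class TinyBayes:
    """Toy two-class token classifier; not SpamBayes' actual scoring."""

    def __init__(self) -> None:
        self.spam_counts = Counter()  # token -> spam messages containing it
        self.ham_counts = Counter()   # token -> ham messages containing it
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokens(text: str) -> set[str]:
        # Count each token once per message, ignoring case.
        return set(TOKEN_RE.findall(text.lower()))

    def train(self, text: str, is_spam: bool) -> None:
        toks = self.tokens(text)
        if is_spam:
            self.spam_counts.update(toks)
            self.nspam += 1
        else:
            self.ham_counts.update(toks)
            self.nham += 1

    def token_prob(self, tok: str) -> float:
        """Estimated probability that a message containing tok is spam."""
        s = self.spam_counts[tok] / max(self.nspam, 1)
        h = self.ham_counts[tok] / max(self.nham, 1)
        if s + h == 0:
            return 0.5  # never seen: no evidence either way
        # Clamp so no single token is ever treated as absolutely certain.
        return min(max(s / (s + h), 0.01), 0.99)

    def score(self, text: str) -> float:
        """Combine token probabilities: near 0.0 is hammy, near 1.0 spammy."""
        log_p = log_q = 0.0
        for tok in self.tokens(text):
            p = self.token_prob(tok)
            log_p += math.log(p)
            log_q += math.log(1.0 - p)
        # P / (P + Q), computed in log space to avoid underflow.
        diff = log_q - log_p
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text: str, ham_cutoff: float = 0.2,
                 spam_cutoff: float = 0.9) -> str:
        sc = self.score(text)
        if sc >= spam_cutoff:
            return "spam"
        if sc <= ham_cutoff:
            return "ham"
        return "unsure"
```

Words the classifier has never seen score a neutral 0.5. This is why a freshly installed filter puts so much mail in the unsure pile until it has been shown enough examples.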
SpamPal SpamPal is an anti-spam proxy program that relies on DNS blacklist/banlist information to tag and sort suspected spam. SpamPal is available for Windows 9x/2000/XP and is compatible with any standard Post Office Protocol 3 (POP3) e-mail client, such as Eudora, Outlook/Outlook Express, and Pegasus Mail. How It Works SpamPal monitors POP3 messages as they pass to the user’s mail client, acting as a type of proxy between the mail server and mail client. SpamPal checks the IP addresses of the hosts that relayed each message, as recorded in its Received headers, against DNS blacklists (DNSBLs). If the message has passed through any host on these blacklists to reach your mailbox, SpamPal flags the message as spam with a special message header. The user configures the mail client to sort messages containing this special spam header into a specific folder for further review. Of course, as with any blacklist-based spam filter, many legitimate messages pass through blacklisted hosts to reach your Inbox. Thus, these legitimate messages must be hand-sorted back to t
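A blacklist proxy like SpamPal needs the addresses of the hosts a message passed through before it can query any DNSBL. The Python sketch below pulls bracketed IPv4 addresses out of a message’s Received headers. It assumes each relay recorded the connecting host’s address in square brackets, which is a common but not universal convention. relay_ips is a hypothetical helper, not SpamPal’s code.

```python
import re
from email import message_from_string

# A dotted-quad IPv4 address inside square brackets, the form many MTAs
# use when recording the connecting host in a Received header.
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message: str) -> list[str]:
    """Return unique relay IPs, newest Received header first."""
    msg = message_from_string(raw_message)
    ips: list[str] = []
    for header in msg.get_all("Received", []):
        for ip in BRACKETED_IP.findall(header):
            if ip not in ips:
                ips.append(ip)
    return ips
```

Keep in mind that spammers can forge Received headers below the ones your own servers add. A real filter therefore has to decide how far down the chain it trusts.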
SpamCatcher Mailshell SpamCatcher is a complex e-mail filtering plug-in for Microsoft Outlook 2000/2002/XP as well as for POP3 clients, such as Eudora, Netscape Messenger, and the like. Using approve and block lists together with a remote algorithmic rules engine, SpamCatcher identifies, tags, and filters incoming spam. In addition to the POP client version, a SpamCatcher service is also available for web-based e-mail sites, such as Yahoo!, Hotmail, and America Online. How It Works SpamCatcher Universal installs as a proxy to the mail client. Once it is installed, the user can keep the SpamCatcher network’s default configuration or set custom filter strengths and further customize the proxy with approve/block e-mail lists. E-mail addresses already in the user’s address book are automatically added to the default Approved Senders list. SpamCatcher scans each e-mail, assigning it a “fingerprint” ID (essentially an algorithmic hash), and compares it to IDs on the SpamCatcher network.
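Mailshell does not describe here how SpamCatcher computes its fingerprints. The Python sketch below therefore only illustrates the general idea, under our own assumptions: normalize the message body, hash it with SHA-256, and look the hash up in a set of fingerprints of known spam. The function names are hypothetical.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and case."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches_known_spam(body: str, known_fingerprints: set[str]) -> bool:
    """True if the body's fingerprint is already in the known-spam set."""
    return fingerprint(body) in known_fingerprints
```

An exact hash like this is defeated by changing a single word. That weakness is why production fingerprinting services rely on fuzzier matching, and this sketch shows only the lookup step.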
Lyris MailShield Desktop Lyris MailShield Desktop is an e-mail proxy application that uses filtering rules, “fuzzy logic,” and e-mail lists to filter incoming spam before it reaches your Inbox. Supporting POP, IMAP, and MAPI protocols as well as msn.com and Hotmail’s HTML interfaces, Lyris MailShield Desktop also integrates a spam reporting system with numerous configuration options. MailShield Desktop runs on Windows 98, Me, 2000, and XP and supports any POP3, IMAP, or MAPI e-mail client. How It Works MailShield uses a combination of tactics to detect, score, and filter both legitimate and spam e-mail. First, MailShield uses a complex set of token-based rulesets that scan every word in an e-mail and weight the appearance of certain words. Words are weighted for where they appear in the message as well, and these word lists are user-configurable. Additionally, MailShield employs standard e-mail whitelists and blacklists, sender/domain verification, and other more personalized settings
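Weighting words by both what they are and where they appear can be sketched in a few lines of Python. The word list and the location multipliers below are hypothetical values for illustration. MailShield’s own rulesets and weights are not described here.

```python
import re

# Hypothetical user-editable word list: word -> base weight.
WORD_WEIGHTS = {"free": 1.5, "viagra": 4.0, "mortgage": 2.0}

# Hypothetical location multipliers: a word in the Subject counts double.
LOCATION_WEIGHTS = {"subject": 2.0, "body": 1.0}


def weighted_word_score(subject: str, body: str,
                        word_weights: dict[str, float] = WORD_WEIGHTS,
                        location_weights: dict[str, float] = LOCATION_WEIGHTS) -> float:
    """Sum each listed word's weight, scaled by where it appears."""
    score = 0.0
    for location, text in (("subject", subject), ("body", body)):
        for word in re.findall(r"[a-z']+", text.lower()):
            score += word_weights.get(word, 0.0) * location_weights[location]
    return score
```

Under these made-up weights, a message with the subject "Free stuff" and the body "free viagra" scores 1.5 × 2 + 1.5 + 4.0 = 8.5.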
SPAMfighter SPAMfighter is a distributed anti-spam solution that relies on individual users of the client software to update a central server with known spam messages. SPAMfighter is available for Windows 98/Me/2000/XP and functions as a plug-in for Microsoft Outlook 2000/2002/2003 and for Outlook Express. How It Works Once installed, SPAMfighter compares incoming e-mail to the known spam messages reported to the central SPAMfighter server. Messages matching those found on the server are automatically sent to the SPAMfighter folder within Outlook. Spam messages that manage to get through SPAMfighter’s filter system can be flagged as spam. Flagging spam messages automatically updates SPAMfighter’s central server, and if enough users report the same spam message, the message is flagged for all other SPAMfighter users. Few details about the “gut-level” operation of SPAMfighter appear on the product web site. As SPAMfighter’s designers put it, “the spammers might be reading our site as
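The report-and-threshold idea can be sketched in Python. SPAMfighter does not publish its internals, so everything here is an assumption: messages are identified by an opaque ID, each user’s report counts once, and the REPORT_THRESHOLD of 3 is arbitrary.

```python
from collections import defaultdict

REPORT_THRESHOLD = 3  # hypothetical number of distinct reporters needed

class CentralServer:
    """Toy stand-in for a central server that aggregates user spam reports."""

    def __init__(self) -> None:
        # message ID -> set of users who reported it (duplicates ignored)
        self.reports = defaultdict(set)

    def report(self, message_id: str, user: str) -> None:
        """Record that a user flagged a message as spam."""
        self.reports[message_id].add(user)

    def is_spam(self, message_id: str) -> bool:
        """Flag a message for everyone once enough distinct users report it."""
        return len(self.reports.get(message_id, ())) >= REPORT_THRESHOLD

server = CentralServer()
for user in ("alice", "bob", "bob"):
    server.report("msg-42", user)
print(server.is_spam("msg-42"))  # False: only two distinct reporters
server.report("msg-42", "carol")
print(server.is_spam("msg-42"))  # True
```

Counting distinct users rather than raw reports keeps one enthusiastic (or malicious) user from flagging a message for everyone.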
SpamButcher SpamButcher is a POP3 proxy anti-spam tool that uses an unconfigurable “fuzzy logic” system to filter suspected spam. Available for Windows 95 and up, SpamButcher functions with any POP3 client application, including Outlook, Outlook Express, Eudora, Pegasus, and the like. How It Works SpamButcher runs as a proxy to a POP3 client, downloading e-mail messages, matching them against blacklists and whitelists, and executing anti-spam filtering rules against the message headers, subject, and body. Messages detected as spam are flagged and deleted, while legitimate messages are passed on to the POP client. As noted, e-mail addresses can be added to SpamButcher’s whitelist for automatic passage to the mail client or added to the blacklist for automatic designation as spam. Additionally, SpamButcher issues a regular spam report that lists messages the application has deleted. These messages can be restored if SpamButcher has inadvertently deleted a legitimate message. The use
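The order of operations described here, with list matching before rule evaluation, can be sketched in Python. The list contents and the two regular-expression rules are hypothetical. SpamButcher’s own rules are not user-configurable and are not published in this text.

```python
import re

WHITELIST = {"boss@example.com"}       # hypothetical always-allowed senders
BLACKLIST = {"deals@spam.example"}     # hypothetical always-blocked senders

# Hypothetical filtering rules applied to the subject and body.
RULES = [
    re.compile(r"\bact now\b", re.IGNORECASE),
    re.compile(r"\b100% free\b", re.IGNORECASE),
]

def classify(sender: str, subject: str, body: str) -> str:
    """Return 'ham' or 'spam' using the lists first, then content rules."""
    sender = sender.lower()
    if sender in WHITELIST:
        return "ham"   # whitelisted senders pass straight to the POP client
    if sender in BLACKLIST:
        return "spam"  # blacklisted senders are designated spam automatically
    text = subject + "\n" + body
    if any(rule.search(text) for rule in RULES):
        return "spam"
    return "ham"
```

Checking the whitelist first means a trusted sender is never lost to an overeager content rule, which is the usual reason for this ordering.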
iHateSpam iHateSpam from Sunbelt Software uses heuristic processing, rules-based engines, and block/unblock lists to identify incoming spam messages. Running as an Outlook or Outlook Express plug-in, iHateSpam processes e-mail and quarantines spam outside of the mail client for further review. iHateSpam runs on Windows 9x/Me/2000/XP and supports all versions of Outlook and Outlook Express. iHateSpam is also distributed as Postal Inspector by the Giant Company, though the configuration and operation of the program are essentially the same. Postal Inspector also works as a plug-in with AOL versions 6.0 and 7.0. How It Works The iHateSpam spam detection engine uses libraries of spam and nonspam semantic knowledge, drawn from the Sunbelt Software central server and from the user’s incoming mail. Using a heuristically determined probability based on this knowledge, along with user-definable rules-based filtering and blacklists, iHateSpam designates the message as spam or ham, catching spam in an off
SpamNet Cloudmark bills SpamNet as “collaborative spamfighting.” SpamNet operates as a P2P network application, and its users contribute to the anti-spam community just by deleting spam from their Inbox: once deleted, those unwanted e-mails can be tagged as spam for other SpamNet users. SpamNet operates on Windows 98/NT/2000/XP and works with Outlook 2000/2002/XP. How It Works SpamNet functions as an Outlook plug-in. When anyone within the SpamNet community designates an e-mail as spam, this information is sent to Cloudmark’s central repository. There, the report is evaluated. If it’s a legitimate report of spam, the message is added to the SpamNet block list, and the message is filtered as spam for other SpamNet users. To prevent abuse, each SpamNet reporter is ranked according to the number of “good” reports they have made. As a message arrives at the user’s e-mail application, SpamNet generates a one-way hash that represents the e-mail. The SpamNet application transmits this message fingerprint
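Ranking reporters by their track record lets a collaborative network discount abusive or careless reports. The Python sketch below is hypothetical. It weights each report by the reporter’s trust score, computed here as the fraction of that reporter’s past reports that were confirmed (newcomers start at 0.5), and blocks a fingerprint once the weighted total reaches an arbitrary threshold. Cloudmark’s actual ranking method is not described in this text.

```python
from collections import defaultdict

BLOCK_THRESHOLD = 1.5  # hypothetical weighted-report total needed to block

class ReputationNetwork:
    """Toy repository that weights spam reports by reporter reputation."""

    def __init__(self) -> None:
        self.history = defaultdict(lambda: [0, 0])  # reporter -> [confirmed, total]
        self.reports = defaultdict(dict)            # fingerprint -> {reporter: weight}

    def trust(self, reporter: str) -> float:
        """Fraction of past reports confirmed; newcomers start at 0.5."""
        confirmed, total = self.history[reporter]
        return confirmed / total if total else 0.5

    def record_outcome(self, reporter: str, was_spam: bool) -> None:
        """Update a reporter's track record after a report is evaluated."""
        stats = self.history[reporter]
        stats[0] += int(was_spam)
        stats[1] += 1

    def report(self, fingerprint: str, reporter: str) -> None:
        """Store the report, weighted by the reporter's current trust."""
        self.reports[fingerprint][reporter] = self.trust(reporter)

    def is_blocked(self, fingerprint: str) -> bool:
        """Block once the trust-weighted reports reach the threshold."""
        return sum(self.reports.get(fingerprint, {}).values()) >= BLOCK_THRESHOLD
```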
KnockKnock KnockKnock is a POP3 e-mail filter that manages spam using approved and denied lists with a twist. The program incorporates secret passwords and an interesting management system to prevent unsolicited e-mail from reaching you at all. KnockKnock is available for Windows 9x/2000/XP and works with most POP3 clients, including Netscape, Outlook Express, and Eudora. How It Works With KnockKnock, you set up approved and blocked sender lists at the outset, and these lists work as they do in other spam clients of this ilk. When you subsequently receive e-mail from a sender that’s not on one of the lists, KnockKnock compiles the sender addresses and subjects of the questionable messages and mails them to you for review. From here, you accept or deny the messages (and thus the senders), allowing KnockKnock to learn, after a fashion, what spam is to you. Additionally, you can issue a secret word that senders not on your approved list can include in the subject of their e-mail to bypass Knoc
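KnockKnock’s approve/deny lists plus secret-word bypass can be sketched as a three-way decision in Python. The sketch assumes the secret word is matched case-insensitively anywhere in the subject; the addresses and the word itself are hypothetical.

```python
APPROVED = {"friend@example.com"}   # hypothetical approved senders
BLOCKED = {"spammer@example.net"}   # hypothetical blocked senders
SECRET_WORD = "marmalade"           # hypothetical word shared with new contacts

def route(sender: str, subject: str) -> str:
    """Return 'deliver', 'reject', or 'review' for an incoming message."""
    sender = sender.lower()
    if sender in APPROVED:
        return "deliver"
    if sender in BLOCKED:
        return "reject"
    if SECRET_WORD in subject.lower():
        return "deliver"  # unknown sender knew the secret word
    return "review"       # summarized and mailed to the user for a decision
```

Checking the blocked list before the secret word means a known spammer cannot get through simply by learning the word.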
POPFile POPFile is a free, open-source anti-spam tool that uses word and logic filters to classify e-mail and sort it into buckets (folders) that you define. As you train POPFile, it starts to sort mail on its own. Although it functions as a spam-fighting tool, POPFile can also sort your e-mail into any buckets you desire. Thus, if you have personal and business messages arriving at the same POP box, POPFile can sort these by the same logic it uses to fight spam. POPFile is distributed in both a Windows-based version and a cross-platform version. POPFile is compatible with any POP e-mail client, and the Windows version runs on Windows 9x/NT/2000/XP/2003. The cross-platform version runs on any operating system that supports the Perl programming environment. We cover the Windows version in this chapter; however, both versions work the same way. How It Works POPFile operates as a POP mail proxy, scanning incoming messages for keywords and performing Bayesian
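Sorting mail into buckets with Bayesian classification can be illustrated with a generic multinomial naive Bayes classifier that uses add-one (Laplace) smoothing, trained here on a few hypothetical messages. This Python sketch shows the technique only; it is not POPFile’s own Perl code.

```python
import math
import re
from collections import Counter, defaultdict

class BucketClassifier:
    """Multinomial naive Bayes over word counts, one 'bucket' per class."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # bucket -> word frequencies
        self.doc_counts = Counter()              # bucket -> training messages
        self.vocab = set()

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z']+", text.lower())

    def train(self, bucket: str, text: str) -> None:
        """Teach the classifier that this text belongs in this bucket."""
        words = self.tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        """Return the bucket with the highest log posterior probability."""
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            score = math.log(self.doc_counts[bucket] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket

clf = BucketClassifier()
clf.train("spam", "cheap pills buy now limited offer")
clf.train("business", "quarterly report meeting agenda")
clf.train("personal", "dinner with mom on sunday")
print(clf.classify("buy cheap offer"))  # spam
print(clf.classify("meeting report"))   # business
```

Working in log probabilities avoids numeric underflow when a message contains many words.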
Chapter 11: Anti-Spam Servers for Windows In previous chapters, we’ve talked a lot about client anti-spam tools and how they are great for individual users. But what about tools for the organization? The logical chokepoint for spam is at the mail gateway, and since most organizations do not run UNIX-based e-mail solutions, we offer the following Windows-based server solutions. iHateSpam Server Edition Why not start with the tool whose name says how we all really feel about spam? If you think we already covered this product in Chapter 10, you’re only half correct. In addition to a client tool, Sunbelt Software also distributes a server-based anti-spam tool. Like the client version, iHateSpam Server Edition is a multistrategy spam fighter using semantic and rules-based filtering and black/whitelists to block spam at the mail gateway. Out of the box, iHateSpam claims a 90 percent or better accuracy rate, although we saw a considerably lower rate on initial install. iHateSpam runs on
GFI MailEssentials MailEssentials is a Bayesian filter-based anti-spam server solution available from GFI, Inc. In addition to spam filtering, MailEssentials adds server-based e-mail tools such as global disclaimer signatures, reporting, mail archiving, and auto-replies. How It Works MailEssentials controls spam at the gateway by applying Bayesian rulesets, blacklists and whitelists, and other functions to all incoming mail. Like most Bayesian filter-based tools, MailEssentials learns the difference between spam and legitimate e-mail over time within your specific enterprise. MailEssentials filters scan each message in its entirety, firing on keywords, checking for whitelisted/blacklisted domains and e-mail addresses, and verifying header information, such as domains, forged headers, mutation, and the like. Once the scan is done, it applies a weight to the message (its likely spam probability) and filters it according to thresholds that you set. In addition, MailEssentials checks third
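How per-word probabilities become a single message weight is not specified here for MailEssentials. One widely described approach is the combining rule from Paul Graham’s “A Plan for Spam,” which multiplies the individual spam probabilities (clamped to 0.01–0.99 so that no single word is decisive). The Python sketch below uses that rule with hypothetical token probabilities and a hypothetical 0.9 threshold.

```python
import math

def combine(probabilities: list) -> float:
    """Combine per-token spam probabilities with Graham's naive Bayes rule."""
    # Clamp so that no single token can force the result to exactly 0 or 1.
    clamped = [min(max(p, 0.01), 0.99) for p in probabilities]
    spam = math.prod(clamped)
    ham = math.prod(1.0 - p for p in clamped)
    return spam / (spam + ham)

THRESHOLD = 0.9  # hypothetical administrator-set cutoff

token_probs = [0.99, 0.95, 0.20]  # hypothetical probabilities for three tokens
weight = combine(token_probs)
print(round(weight, 4), weight >= THRESHOLD)  # 0.9979 True
```

Note how one fairly innocent token (0.20) barely dents the result when two strongly spammy tokens are present.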
Trend Micro Spam Prevention Service Spam Prevention Service (SPS) is a feature-rich spam-fighting tool from Trend Micro. Although its spam-filtering process is similar to that of other tools covered in this chapter, its deployment strategy is different. SPS fights spam as a pass-through SMTP server, meaning that instead of applying rules to e-mail already received by the mail server, SPS filters mail before it ever touches the mail server. How It Works Deployed between the mail server and the Internet, SPS assigns a numeric value to incoming e-mail based on an equation formed by rules that apply a spam score or weight to the incoming message. The spam score is then compared to a global threshold and the mail is either forwarded on to the mail server, tagged as spam and forwarded on, held on the SPS server, or deleted entirely. SPS runs on its own machine and monitors port 25 (the SMTP port). In addition to its complex filter set, SPS also filters mail using the standard whitelist/black
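The text describes one global threshold but four possible outcomes. A minimal Python sketch, assuming a hypothetical tiered policy in which higher scores trigger progressively stronger actions (the cutoff values are invented for illustration, not taken from SPS), looks like this:

```python
# Hypothetical tiers: (minimum score, action), checked from highest to lowest.
POLICY = [
    (15.0, "delete"),
    (10.0, "hold"),
    (5.0, "tag and forward"),
]

def action_for(score: float) -> str:
    """Map a message's spam score to what happens before the mail server."""
    for minimum, action in POLICY:
        if score >= minimum:
            return action
    return "forward"  # below every threshold: pass through untouched
```

Keeping the tiers in one ordered table makes it easy to tune cutoffs without touching the decision logic.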
Chapter 12: Anti-Spam Tools for Macs We’ve discussed client and server tools for Windows-based platforms, of which there are many. However, client tools for the Macintosh operating systems are relatively few. Fortunately, most UNIX-based implementations also compile on the OS X platform. For those wanting Mac-centric anti-spam clients, especially those still available for OS 9, we present four solutions. PostArmor PostArmor is a simple Java-based mail filter that runs as a Post Office Protocol (POP), Authenticated POP (APOP), or Internet Message Access Protocol (IMAP) proxy. The version we cover in this chapter is essentially the same across Windows, Linux/UNIX, and Mac OS 9 and X, although the installers are different for each platform. PostArmor is not open-source, but it is distributed for free if used as a single-machine client application. If used in the server mode, the application must be registered and a nominal fee paid, depending on the implementation. How It Works PostArmor a
POPmonitor POPmonitor is a simple e-mail management program that incorporates limited filtering, black/whitelisting and other anti-spam tool functions. As advertised, POPmonitor allows you to connect to your mailbox before mail downloads to your mail client and manually delete unwanted e-mails, run word filters against message headers and body, and apply black/whitelists to your mailbox. Of all the clients covered in this book, POPmonitor is the simplest in functionality and operation. We cover the Mac OS X version of POPmonitor, although a version for Mac OS 8/9 is also available. POPmonitor is distributed as shareware, limiting some functionality until the software is registered for a nominal fee. How It Works POPmonitor connects to your mailbox and downloads the headers of all the messages found there. From the main interface, you may then manually select individual messages to read, save, trust, block, bounce, or delete. Additionally, you can configure and apply simple word and bla
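Downloading only message headers, as POPmonitor does, is possible with the POP3 TOP command. The Python sketch below parses what the standard library’s poplib returns from top(). The server name and credentials in the usage comment are placeholders, and the function accepts any object with poplib.POP3’s stat() and top() methods so it can be tried without a live server.

```python
from email.parser import BytesHeaderParser

def list_headers(conn) -> list:
    """Return (message number, From, Subject) for every message in the mailbox.

    conn is a logged-in poplib.POP3 (or POP3_SSL) connection; TOP n 0 asks the
    server for the headers only, so message bodies are never downloaded.
    """
    count, _size = conn.stat()
    results = []
    for number in range(1, count + 1):
        _resp, lines, _octets = conn.top(number, 0)
        headers = BytesHeaderParser().parsebytes(b"\r\n".join(lines))
        results.append((number, headers.get("From", ""), headers.get("Subject", "")))
    return results

# Usage against a real server (placeholder host and credentials):
# import poplib
# conn = poplib.POP3_SSL("pop.example.com")
# conn.user("me"); conn.pass_("secret")
# for number, sender, subject in list_headers(conn):
#     print(number, sender, subject)
# conn.quit()
```

From the returned list, a front end can let the user mark messages for deletion (POP3 DELE) before the mail client ever downloads them.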
Spamfire Spamfire is a filter-based POP/IMAP/Hotmail proxy for Mac OS 9 and OS X, distributed by Matterform Media. Using scored filters that search the headers and bodies of messages, Spamfire matches and flags incoming e-mail before it hits your Inbox. We cover the OS X version in this chapter. How It Works Spamfire downloads e-mail from your mail server and applies word-based filtering rules against the headers, body, and attachments. Each time a rule matches, Spamfire applies a weighted score, filtering those that score over its threshold. The program automatically updates filters from the Matterform server, and you can also add custom filters at any time. Additionally, you can add senders to a Friends or Spammers list to accept or block incoming mails automatically from e-mail addresses or domains. All of the incoming mail is managed from a central console before the messages download to your mail client. Installing Spamfire is distributed as an install package for Mac OS X and OS
MailGoGoGo Our last Mac entry is a POP proxy mail filter with little to configure or manage. Distributed by the Japanese company Maki Enterprise, MailGoGoGo is a mail filter that utilizes word matching and black/whitelists to manage incoming spam e-mail. MailGoGoGo runs best on OS 9 and earlier versions. MailGoGoGo does not have the features and extensibility of other anti-spam tools for the Mac, but for a simple mail filter with black/whitelist functionality, it performs adequately. In our testing, MailGoGoGo caught about 75 percent of the spam we shot at it. How It Works MailGoGoGo comes complete with its own set of filters and black-box analysis methods. The program operates independently of your e-mail client, checking your mailbox and processing the messages either when you prompt it or on a scheduled basis. MailGoGoGo scans the body of the message and performs a context-sensitive analysis of the words and phrases found there, assigning a spam score to the message. If the score
Summary The anti-spam client tools written specifically for the Mac maintain that Mac-centric look and feel and provide just as much functionality as Windows or Linux clients. While there are fewer spam tools specifically for the Macintosh, Mac OS X can compile and run most of the anti-spam tools for the UNIX platform, such as POPMonitor and SpamBayes. If you’re on an OS earlier than OS 9, your options are limited to software that won’t be updated and lacks the features of modern spam-fighting tools.
Chapter 13: Anti-Spam Tools for Linux In Chapters 6, 7, and 8 we covered the granddaddy of anti-spam tools for Linux: SpamAssassin. Many, many more spam-fighting programs run on Linux, however. All of them tend to fill a specific niche that SpamAssassin may or may not already cover, and some interoperate with SpamAssassin. In this chapter, we look at six of these tools and give you specifics on how to bring them into your Linux mail environment. Two of the tools are distributed checksum matching networks, three are Bayesian classifiers, and one is a collection of Procmail recipes. Vipul’s Razor Vipul’s Razor (Razor for short) is both software and a collaborative, distributed anti-spam network. It is free, open source, and distributed under the Artistic License (like Perl). When a user designates a piece of mail as spam, a “signature” is generated from it and is fed to the Razor network. From that point on, anyone who uses the Razor network has access to that signature. When a user rece
Distributed Checksum Clearinghouse The Distributed Checksum Clearinghouse (DCC), like Razor, is a distributed anti-spam network. It’s also free and open source, distributed under its own licensing terms that simply say you cannot take credit for writing it. DCC is a project of Rhyolite Software and can be found at http://www.rhyolite.com/anti-spam/dcc/. Mail designated by a user as spam will receive a checksum that is reported to the DCC network. The more often a message’s checksum is reported, the more likely it is that the message is spam. A mail server or client checks against the DCC network using DCC tools, comparing the message’s checksum to those in the database. The DCC uses fuzzy checksums that “filter out” a certain amount of randomness and personalization in e-mail messages. Unlike Razor, however, DCC is written in C and can be used in a variety of ways, from being called using Procmail or a .forward file, to being a Sendmail mail filter (milter), to running
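A fuzzy checksum can be approximated by normalizing away the parts of a message that spammers randomize before hashing. The normalization rules below (lowercasing, removing the recipient’s name, dropping digits and punctuation, collapsing whitespace) are assumptions for illustration and are not DCC’s actual algorithm. SHA-256 is used only as a convenient digest.

```python
import hashlib
import re

def fuzzy_checksum(body: str, recipient_name: str = "") -> str:
    """Hash a normalized body so trivially personalized copies collide."""
    text = body.lower()
    if recipient_name:
        text = text.replace(recipient_name.lower(), "")  # drop personalization
    text = re.sub(r"[^a-z\s]", "", text)  # strip digits and punctuation
    text = " ".join(text.split())         # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = fuzzy_checksum("Dear Alice, order #48213 ships today!", "Alice")
b = fuzzy_checksum("Dear Bob, order #99107 ships   today!!", "Bob")
print(a == b)  # True
```

An exact hash would treat these two copies as unrelated; the normalization step is what lets a network count them as the same bulk message.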
Bogofilter Bogofilter is a Bayesian classifier that uses advanced statistical methods and is written in C. It was originally created by Eric S. Raymond, noted programmer, author, and open- source software advocate. Now a number of other developers are assisting in the project. The bogo in Bogofilter stems from the word bogus, one of the meanings of which is “useless.” In that sense, spam can be considered bogus e-mail, and it’s Bogofilter’s job to filter out this bogus e-mail based on a particular piece of spam’s “bogosity” level. If all this sounds like jargon, you’re right, because it is. You can visit Eric’s Jargon File at http://www.catb.org/~esr/jargon/ for more. If you want to skip all that and go straight to Bogofilter, you can find it at http://bogofilter.sourceforge.net/. Installing Bogofilter Bogofilter is distributed as source code or as a package of binaries in RPM Package Manager (RPM) format. You have the option of downloading the most current version (bogofilter-current)
SpamBayes In Chapter 10 we looked at the Windows version of SpamBayes, which works as an Outlook plug-in. SpamBayes isn’t just an Outlook plug-in, though; it’s actually a whole suite of programs (including the plug-in) that make up the SpamBayes Project:
The Outlook plug-in: reviewed in Chapter 10
Pop3proxy: a filter that analyzes mail as your POP client downloads it from the server
Imapfilter: works like Pop3proxy, but is used with the IMAP mail-reading protocol
Hammiefilter: a filter that can be used with Procmail
Since we’re writing a chapter on Linux anti-spam tools, we’ll stick to using Procmail with SpamBayes. All SpamBayes programs require that you have Python installed. Python is an open source, cross-platform, object-oriented scripting language and interpreter. Python packages come with many Linux distributions, and you can also obtain it at http://www.python.org/. SpamBayes requires Python 2.2 (the current version is 2.3.3) or later and can be found at http://spambayes.sourcefor
Quick Spam Filter The Quick Spam Filter (QSF) is another Bayesian classifier. (Popular, aren’t they?) It’s designed to be simple to set up and use. QSF is written in C, and therefore it must be compiled on your system. If you’re not interested in compiling, RPMs and even an experimental Windows binary are available. QSF is available from http://www.ivarch.com/programs/qsf.shtml. One thing that sets QSF apart from some of the other Bayesian classifiers is its ability to use a MySQL backend database. This is useful if you’re already running a MySQL server, as it will simplify database backups. Downloading and Installing QSF QSF’s source code is distributed as a gzipped tarball called qsf-x.x.x.tar.gz, where x.x.x is the version number. The current version as of this writing is qsf-0.9.9.tar.gz. Installing QSF is easy. Simply un-zip and un-tar the tarball into a source directory: tar -zxvf qsf-0.9.9.tar.gz Then change into the newly created qsf-x.x.x directory and run the
The SpamBouncer The SpamBouncer is a collection of Procmail recipes. While using Procmail exclusively is a rather “old-school” way of dealing with spam, some administrators still like the elegant simplicity of its regular expression matching. There’s nothing hidden, no algorithms you have to have a major in mathematics to understand, and they’re easy to modify, if you understand regular expressions, that is. The SpamBouncer was created by Catherine A. Hampton and can be found at http://www.spambouncer.org/. Installing and Configuring the SpamBouncer First, make sure you have Procmail installed (see the “Groking Procmail” sidebar earlier in the chapter). The SpamBouncer also requires the nslookup or host programs to perform its DNSBL checks. One or both of these come with most Linux distributions. The SpamBouncer is distributed as either a compressed tar file or a ZIP file. It’s simply called sb.tar.Z or sb.zip. Once you get it, decompress and un-tar into a directory off of your home di
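A DNSBL check reverses the octets of the connecting IPv4 address, appends the blacklist’s zone, and looks the name up: an answer means the address is listed, and a failed lookup means it is not. The Python sketch below performs the same kind of lookup that the SpamBouncer delegates to nslookup or host. The zone name is only a placeholder, so substitute the DNSBL you actually use.

```python
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 2.0.0.127.<zone> for 127.0.0.2."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers with an A record for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror: no such name, so not listed
        return False

# Example (requires network access; placeholder zone):
# print(is_listed("127.0.0.2", "dnsbl.example.org"))
```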
Summary If you’re not interested in a full-blown SpamAssassin installation, the Linux anti-spam tools covered in this chapter are excellent alternatives. Vipul’s Razor and DCC are distributed spam checksum databases that allow you to block spam others have already seen, and they can be used with SpamAssassin. Bogofilter, SpamBayes, and QSF are Bayesian classifiers that can be used by individual users on your Linux system. Finally, the SpamBouncer is a collection of Procmail rules that perform pattern matching and blacklist checking on incoming e-mail. Any of these Linux anti-spam tools are good for implementing specific anti-spam functionality on your Linux e-mail server or workstation.
Part IV: Stopping Spam in the Long Term
Chapter 14: Know Your Enemy Throughout this book, we have mentioned various spammer tactics, usually in relation to anti-spam rule creation. In this chapter, we discuss these tactics in depth and provide a “profile” of a typical spammer and his product: spam. Additionally, we discuss how to track the spammer back to his foul lair (though not usually a successful venture) and how to report a real live “fathead” spammer once you’ve found one. Profile of an “E-Mail Direct Marketer” Before we prompt a flood of hate mail and possible litigation, consider the following caveat: Not all direct marketers are scum-sucking, bottom-feeding spammers. Legitimate businesses do have real products to sell, and they responsibly market directly to people willing to receive direct solicitation of that real product. Although these businesses exist, they are drowning (or better yet, trapped) in the tidal wave of spammers posing as legitimate marketers. In this section we discuss the types of tools used by the s
Getting to Know the Product (Spam) If you delve into a well-constructed spam message, you’ll realize that spammers, or at least the developers of their software, are ingenious. For almost every spam countermeasure, direct e-mail marketing software has an answer, and most often, spammers are proactively developing for the next countermeasure. In this section, we dissect a few spam messages and highlight the most common methods that spammers use to defeat your anti-spam tools. We also take a look at how spammers might thwart anti-spam tools of the future. Anatomy of an E-Mail Header To understand e-mail, and thus spam, you must understand the e-mail (or SMTP) header. Within the headers are the details of the pathway the mail took to get to you and certain information about the sender. The bad news is that almost everything that happens to the e-mail before it reaches your mail server can be forged. And if the message is spam, it probably is forged. In this section we discuss e-mail heade
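Reading the Received headers is the first step in tracing a message. Each relay prepends its own Received line, so the first one listed was added by the server closest to you and the last one by the server nearest the origin; only the entries added by servers you control can be fully trusted. The Python sketch below uses the standard email package, and the sample message is invented for illustration.

```python
from email import message_from_string

RAW = """\
Received: from mx.example.net (mx.example.net [198.51.100.7])
\tby mail.example.com; Mon, 15 Dec 2003 10:02:11 -0500
Received: from unknown (HELO friendly) ([203.0.113.45])
\tby mx.example.net; Mon, 15 Dec 2003 10:01:58 -0500
From: "Deals" <deals@example.org>
Subject: You have won

Body text.
"""

def relay_path(raw: str) -> list:
    """Return Received headers oldest-first, with folded lines unfolded."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    # Each hop prepends its header, so reverse to follow the mail forward.
    return [" ".join(h.split()) for h in reversed(hops)]

for hop in relay_path(RAW):
    print(hop)
```

In this sample, the oldest hop claims HELO “friendly” but the relay recorded the real connecting address, 203.0.113.45, which is the value worth investigating.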
Red Alert: Reporting Known Spammers So, you’ve managed to track down the spam bear and trap it in its cave. What do you do now? Since it’s still illegal to shoot spammers, your only recourse is to share information for the betterment of the Internet community. In this section, we cover the few ways you have to report spammers. Direct E-mail Your first, and probably least effective, option is to contact the administrator of the offending domain directly. Whether you’ve determined that the spammer was a remote-access client on an ISP’s network, transiting through an open mail relay, or have otherwise found proof that a spammer originated on a given network, you can probably find the administrator responsible for following up. The fastest way to find the point of contact is to take the header and go to http://www.spamcop.net. You’ll find a handy tool that analyzes the headers and spits out the e-mail contact for the appropriate administrator. We discuss SpamCop in more detail in Chapter 5. When
Summary In this chapter, we covered the spammer’s tools in addition to detailing the spam message itself. Most spammer tools contain a suite of programs and functions specifically designed to thwart anti-spam solutions. Built-in mail exchangers, message randomizers, the use of HTML, and random lists of words all target specific classes of anti-spam tools. Knowing spam tools and methods helps you to tweak your countermeasures to compensate. The spam message itself can also become a spam-fighting tool. Information in the message headers typically leads you as close to the spammer as you can get, and it usually gets you at least enough information to report the message and the sender. By sharing this information, you are helping others to fight spam, as well as helping yourself and your organization.
Chapter 15: Advanced Topics and Fine Tuning In this chapter we explore some advanced spamming topics, both from the defender’s perspective as well as from the spammer’s. Our objective is to delve into what really makes some of the technologies tick so that you will be well prepared on your mission of ridding your inbox of spam. Some of the items covered here are best described by using source code, but that by no means should dissuade you from diving right in. We promise to keep things at a topical and uncomplicated level. We explore some of the finer facets of black/whitelisting and cover a new approach that splits the difference between the two, affectionately called greylisting. Next, we cover a group of defensive tactics with supporting examples, which are largely based on the methods and practices used by the spammers themselves. Lastly, we explore some automation techniques for handling the ongoing operation of an anti-spam suite of tools and the problems you will likely encounter
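Greylisting is only named here, so a minimal Python sketch of the commonly described approach may help: the server remembers each (client IP, sender, recipient) triplet, temporarily rejects the first attempt, and accepts a retry that arrives after a delay. It relies on the assumption that legitimate mail servers retry and many spam tools do not. The five-minute delay and in-memory storage are illustrative choices.

```python
import time

RETRY_DELAY = 300  # seconds a new triplet must wait; illustrative value

class Greylist:
    """In-memory greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, clock=time.time) -> None:
        self.first_seen = {}
        self.clock = clock  # injectable so the logic can be tested

    def check(self, ip: str, sender: str, recipient: str) -> str:
        """Return an SMTP-style reply for a delivery attempt."""
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= RETRY_DELAY:
            return "250 OK"
        return "451 Greylisted, please try again later"
```

A 4xx reply is a temporary failure, so a well-behaved sender queues the message and tries again instead of bouncing it.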
The Complete MX Relay Defense Another strategy of a spam-blasting operation is to find unused, unprotected, or unregulated channels into a network. Usually through old or neglected mail exchangers (MX servers), spammers can get a foothold into your organization. Be sure that all of your MX relays are set up with the same software and spam controls, including outsourced or infrequently used ones. In addition, many organizations apply spam filtering only at the main mail server, which can easily be determined by the spammers. Often third- or fourth-tier MX relays are easily identified by spam-bots and are used as primary transmission vehicles. Shuffling through MX entries is one of the best methods spammers have to circumvent all of your hard work in defeating their advances. As a defensive posture, you could either choose to install your complement of spam-fighting solutions at all of your exposed mail servers, including the backup MX servers, or you could force a “re-forward” of e-mail
Defense by Disguise The harvesting of target addresses is largely done using web page scraping robots, or bots for short. Only a few cheap and easy measures are available to stop your addresses from leaking right at the source. By disguising the e-mail addresses published on your web pages, you can keep a mountain of garbage from collecting in your inbox. First off, spot-check your addresses in Google or one of the other big Internet search engines. You should be able to find your own web pages cited, especially if you have linked references as part of your HTML files. This fact alone is one of the key reasons why spam-bots can quickly zero in on your site and why you can’t seem to shake them. It also gives you a good baseline for reviewing the effectiveness of any countermeasures you take. In the pages that follow, we cover four of the basic methods for reducin
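One simple disguise is to write every character of a published address as an HTML numeric character reference. Browsers render it normally, but a harvester that scans raw page source for literal user@domain text will miss it. The Python sketch below is illustrative; determined bots can decode entities, so treat it as a speed bump rather than a guarantee. The address is a placeholder.

```python
def obfuscate(address: str) -> str:
    """Encode each character of an address as an HTML decimal entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)

def mailto_link(address: str) -> str:
    """Build a mailto: link whose href and text are both entity-encoded."""
    encoded = obfuscate(address)
    return f'<a href="{obfuscate("mailto:")}{encoded}">{encoded}</a>'

print(obfuscate("me@x.com"))  # &#109;&#101;&#64;&#120;&#46;&#99;&#111;&#109;
```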
Spam-bots and How They Work In this section we don the hat of the enemy, as we analyze the operation of a spam- harvesting tool and explore some of the challenges spam-bots face when trying to collect and administer spam to a restless, aggravated, and retaliatory audience. Let’s start by writing our own short mail harvester to build a database of target e-mail addresses. Our application should be able to take a web page as a starting point and optionally follow any embedded web links that it finds along the way. It should also scan for mailto: tags anywhere it looks and output the address so that the operator can include them in the master database. To keep things short, simple, and easy to follow, we didn’t bother with a bunch of the niceties that would be the mark of a good, well-written application. The following is a complete code listing of our harvesting application, which we developed with the Perl programming language. If you’d like the latest electronic version, it’s posted on
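The core of any harvester is a pattern that pulls addresses out of page text. The Python sketch below only extracts mailto: targets from an HTML string that is already in memory; unlike the Perl harvester described here, it neither fetches pages nor follows links. Its deliberately simple pattern also shows why an entity-encoded mailto: prefix slips past naive bots.

```python
import re

# mailto: followed by a plain-text address; stops at quotes, '?', '>', etc.
MAILTO = re.compile(
    r"mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})", re.IGNORECASE
)

def harvest(html: str) -> list:
    """Return unique mailto: addresses in order of first appearance."""
    seen = []
    for address in MAILTO.findall(html):
        address = address.lower()
        if address not in seen:
            seen.append(address)
    return seen

page = '''<a href="mailto:Sales@Example.com">Sales</a>
<a href="mailto:sales@example.com?subject=hi">again</a>
<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;hidden@example.com">x</a>'''
print(harvest(page))  # ['sales@example.com']
```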
Siphoning a 55-Gallon Drum of Spam Once you’ve got spam filtering working reliably, and once you’ve built a system for updates that keeps your whole system in tune, all that’s left are a few spams here and there, often as few as one per day (at least, that’s what we can expect from a well-oiled system). What’s curious is that we have to remind ourselves how big that iceberg of unseen spam actually is. In this section, we cover a few statistics from our spam corpus, which can help elucidate the elements that really matter to a system that’s been around the block a few times and is essentially operating per its specification. Consider the following distribution of the spaminess of our test mailbox. After collecting about ten days’ worth of mail, we were able to draw the following statistics:
Number of Spam Mails Received: 8014
Time Period Covered: Sunday, 14 Dec 2003 20:03:20 to Thursday, 24 Dec 2003 14:18:09 (about 10 days)
Average spam received per hour: 33 (one every 2 minutes)
Number of link
Reversing the Spam-bot Spigot Now that we’ve covered the intricate architectures of how robotic automated collection agents troll for addresses 7×24×365, let’s get to the fun part: incorporating a few other defensive measures against their assaults. The simplest and most logical tactic is to feed the spam-bots what they want: addresses for their database. Who says the addresses they get need to be real? The hard part is determining when you have a live spam-bot on the line with you. A few of the defensive techniques covered next can help determine this, usually only through empirical or traffic analysis rather than anything overt. As is their nature, the spam trafficking organizations do everything they can to minimize their footprint, their length of stay, and any indication that they were there at all. But they do leave a trace, and to get their messages sent, they need to make a connection at some point. It’s by finding these traces that we can detect them best. The Reverse
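Feeding bots fake addresses usually means serving pages full of plausible but undeliverable addresses. A minimal Python sketch, assuming you generate them under a domain that can never receive mail (the reserved .invalid top-level domain is used here so that no innocent third party gets flooded), could look like this:

```python
import random
import string

def bogus_addresses(count: int, domain: str = "example.invalid", seed=None) -> list:
    """Generate random, plausible-looking addresses under a dead domain."""
    rng = random.Random(seed)
    addresses = []
    for _ in range(count):
        length = rng.randint(5, 10)
        local = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        addresses.append(f"{local}@{domain}")
    return addresses

def poison_page(count: int = 20, seed=None) -> str:
    """Wrap the addresses in mailto: links for a page only bots should visit."""
    links = (f'<a href="mailto:{a}">{a}</a>' for a in bogus_addresses(count, seed=seed))
    return "<html><body>\n" + "<br>\n".join(links) + "\n</body></html>"
```

Passing a seed makes a page reproducible, which is handy if you want to recognize later which served page a given bogus address came from.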
Summary In this chapter we explored some advanced techniques used by the spammers, and we projected some of the more interesting features of where we think spam fighting is going in the next few years. We built a child’s spam harvesting bot and investigated a handful of good defensive maneuvers that you could take to level the playing field. Lastly, we did a little sanity checking of the systems we erected and deployed with this book and found our strategies to be most effective.
Chapter 16: Fighting Spam Defensively The art of defending against unwanted e-mail goes beyond simply installing a few anti-spam programs and updating the profiles. Spam e-mail is a security threat, in that it denies the availability of resources, and with a combined attack such as a spam message with a virus attachment, this threat can quickly become very serious and expensive. As with any information security-related issue, a defense-in-depth posture is required. All of the tools we covered in the previous 15 chapters of this book do a great job of managing your spam-fighting energies, but additional, basic information security techniques can greatly increase your success. In earlier chapters, we discussed e-mail management organization and policy, but what other network and IT management-level strategies reduce the threat spam poses to the organization? In this chapter, we discuss these strategies. Some are extreme, but most are up-and-coming spam-fighting techniques that preclu
Keeping Your Own House Clean While the spam-fighting tools presented in this book are powerful and feature-rich, the first step to fighting spam is ensuring that you are not adding to the problem. In this section we cover the four simple rules for preventing your resources from being used by spammers: closing your open relays, hardening all hosts and servers, restricting access, and sniffing out spyware. Open Relays The biggest obstacle to the ongoing spam-fighting campaign is the open e-mail relay. In earlier chapters, we discussed DNSBLs that track and block mail from open relays, but how do you close your mail exchanger to spammers? In this section, we discuss how to secure mail relaying for Sendmail 8.12 and Microsoft Exchange 2000. Both of these mail servers deny relaying by default; however, certain configuration options allow some flexibility for mail exchange that may be required in your environment. Sendmail We chose Sendmail 8.12 for this chapter because this release was the firs
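Whatever the mail server, the rule that closes an open relay is the same: accept a recipient if the mail is for a domain you host or if the client is one of your own trusted hosts, and refuse everything else. A minimal Python sketch of that decision, with hypothetical domains and networks, is below; in Sendmail or Exchange the same policy is expressed through configuration rather than code.

```python
import ipaddress

LOCAL_DOMAINS = {"example.com", "mail.example.com"}           # hypothetical
TRUSTED_NETWORKS = [ipaddress.ip_network("192.168.0.0/16"),   # hypothetical LAN
                    ipaddress.ip_network("127.0.0.0/8")]

def rcpt_reply(client_ip: str, recipient: str) -> str:
    """Decide how to answer RCPT TO so the server never relays for strangers."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return "250 OK"  # inbound mail for our own users
    client = ipaddress.ip_address(client_ip)
    if any(client in net for net in TRUSTED_NETWORKS):
        return "250 OK"  # outbound mail from our own hosts
    return "550 Relaying denied"
```

An outside client sending to an outside domain is exactly the open-relay case, and it is the only combination this rule refuses.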
Spyware: Another Spam Pathway
Spyware consists of programs, installed on a computer without the user's explicit permission, that monitor and report on activity or other data on that computer. Spyware is also known as adware, trojan horses, and $%#$$#@ web pop-ups. While many spyware programs are truly illicit, several legitimate programs, such as Microsoft Windows Media Player, RealNetworks RealPlayer, and others, also collect information on user actions (DVD movies watched, MP3 music files opened, and so on) and surreptitiously report this information to a remote server. While this may be an innocuous exchange of information for the user's benefit, it is still network communication that the user did not specifically allow and the vendor did not explicitly disclose (or perhaps it did, hidden in the fine print of the license agreement you clicked past). By most security definitions, that makes it a trojan horse program. In this section we cover the illicit version of these programs, their operation
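The defining trait described above is network communication the user never approved. As an illustration of that idea only, and not of how any particular anti-spyware product works, here is a minimal sketch that assumes you already have a list of observed outbound connections (for example, exported from a personal firewall log) and compares each program's destinations against a list the user has knowingly approved. The `Connection` record, the `APPROVED` table, the program names, and the hostnames are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One observed outbound connection (hypothetical log record)."""
    program: str
    destination: str


# Hypothetical allowlist: destinations the user has knowingly approved
# for each program. "*" means any destination is approved.
APPROVED = {
    "mailclient.exe": {"mail.example.com"},
    "browser.exe": {"*"},
}


def unapproved(connections):
    """Return the connections whose destination the user never approved.

    A program with no allowlist entry at all is always flagged, since the
    user never sanctioned any network traffic from it.
    """
    flagged = []
    for conn in connections:
        allowed = APPROVED.get(conn.program, set())
        if "*" in allowed or conn.destination in allowed:
            continue
        flagged.append(conn)
    return flagged
```

Given connections from `mailclient.exe` to `mail.example.com`, from `browser.exe` to any host, and from `mediaplayer.exe` to `stats.example.net`, only the last is flagged, because no approval was ever recorded for that program "phoning home."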
Summary
In this chapter, we talked about various strategies, outside of your anti-spam tools, for combating spam before it actually reaches you, including protecting your e-mail addresses, keeping your own mail server resources out of the hands of the spammers, and protecting your hosts and other computers on your network from becoming zombie spam-spewers. Finally, we discussed spyware, its new convergence with spammer tools, and anti-spyware programs that protect you from that threat.