SpamAssassin Rules

A set of predefined SpamAssassin rules is stored in /usr/share/spamassassin, in files named xy_name.cf. SpamAssassin reads these files in alphabetical order; a later rule overrides an earlier one if they set the same options. The rules are similar to the scoring rules in Procmail, but they are more flexible and powerful.

A rule is made of three lines that define four mandatory elements:

Scope

This determines where the test applies: to the body, the header, or both; or to the raw body, alone or together with the header.

Name

This is the name of the test. The name has to be unique; otherwise, the test is overridden by the test with the same name from the last file loaded by SpamAssassin.

Test

The test can be a regular expression to match, a call to a function inside SpamAssassin, or a plug-in.

Score

The weight to apply to the test.

/usr/share/spamassassin/20_phrases.cf contains simple rules to match a Perl regular expression. One of the tests contained in this file is:

body STRONG_BUY        /strong buy/i
describe STRONG_BUY        Tells you about a strong buy

This rule matches the string "strong buy" in the text part of the email body. The trailing i after the closing slash makes the search case-insensitive (i.e., it matches STRONG BUY, strONg Buy, etc.).
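Putting the four elements together, a complete rule might look like the following sketch. The rule name LOCAL_STRONG_BUY and the score of 2.0 are illustrative assumptions and are not taken from the distributed rule files.

```conf
# Scope (body), name, and test (a Perl regular expression)
body     LOCAL_STRONG_BUY  /strong buy/i
# Human-readable description of the test
describe LOCAL_STRONG_BUY  Mentions a strong buy
# Weight added to the message score when the test matches
score    LOCAL_STRONG_BUY  2.0
```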

An optional flag can be set with tflags to specify a special condition. The most common flags are:

nice

Used to offset false positives. The test should carry a negative score, so that a match lowers the total instead of raising it.

learn

Requires training before using the test. This is commonly used for tests that use the Bayesian filter.

noautolearn

The score should not be used for learning systems. This flag is used in association with the Bayesian filter.

userconf

Requires an option in local.cf or user_prefs.
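As a sketch of how a flag is attached to a test, tflags takes the test name followed by one or more flags. The test LOCAL_NEWSLETTER, its pattern, and its score are hypothetical.

```conf
# Hypothetical header test for a mailing list we subscribed to
header   LOCAL_NEWSLETTER  List-Id =~ /example\.org/i
describe LOCAL_NEWSLETTER  Mailing list we subscribed to
# Mark the test as "nice" and keep it out of automatic learning
tflags   LOCAL_NEWSLETTER  nice noautolearn
# A nice test is given a negative score
score    LOCAL_NEWSLETTER  -1.5
```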

Tests can be combined together to create a new test called a meta test. They can be combined in a Boolean expression:

meta NEW_TEST    TEST_1 && (TEST_2 || ! TEST_3) && TEST_4

Or they can be combined in an arithmetic expression (a positive test is equal to 1; a negative test is equal to 0):

meta OTHER_TEST (3 * TEST_2 - TEST_3) + 2 > 2 * TEST_1
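To see how the arithmetic form evaluates, the comments below work through one case by hand. The hit and miss values chosen for TEST_1 to TEST_3 are assumed for illustration.

```conf
# Suppose TEST_1 and TEST_2 hit (value 1) and TEST_3 does not (value 0):
#   left side:  (3 * 1 - 0) + 2 = 5
#   right side: 2 * 1           = 2
#   5 > 2 is true, so OTHER_TEST hits.
meta OTHER_TEST (3 * TEST_2 - TEST_3) + 2 > 2 * TEST_1
```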

Meta tests can be used to counterbalance other tests in special conditions. For example, if you are interested in commercial offers on computers, you can offset the tests that would classify such mail as spam:

body COMPUTER           /computer|laptop|PC/i
describe COMPUTER       Mail about computers
tflags COMPUTER         nice

body OFFER              /buy/i
describe OFFER          Buy something you do not need

body PRICE              /price/i
describe PRICE          Price for something you do not want

meta DISABLE_TESTS      (PRICE || OFFER) && COMPUTER
describe DISABLE_TESTS  These tests do not matter
# Negative score, so that a match offsets OFFER and PRICE
score DISABLE_TESTS     -3
tflags DISABLE_TESTS    nice

By default, every test has a score of 1.0, except tests whose names start with T_ (for testing), which have a default score of 0.01. This value can be changed with the score option, which takes the test name followed by the new score.
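For instance, a line like the following in local.cf would change the weight of the STRONG_BUY test shown earlier. The value 2.5 is an arbitrary illustration.

```conf
# Override the default score of 1.0 for STRONG_BUY
score STRONG_BUY 2.5
```

Setting a score to 0 disables the test.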

A whitelist and a blacklist are used to override the SpamAssassin score. If an email matches the whitelist, 100 is subtracted from its SpamAssassin score, which means that whitelisted email is never identified as spam. Email that matches the blacklist is always reported as spam.

The whitelist is managed with spamassassin with several options:

The options to manage the blacklist are the same. Just change "whitelist" to "blacklist" in the option name.
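A minimal sketch, assuming the entries are kept in local.cf or user_prefs: the whitelist_from and blacklist_from directives accept addresses with * and ? wildcards. The addresses below are placeholders.

```conf
# Senders that receive the whitelist bonus
whitelist_from  friend@example.com
whitelist_from  *@partner.example.org

# Senders that are always reported as spam
blacklist_from  *@spammer.example.net
```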

The auto-whitelist feature averages the SpamAssassin score per sender. If use_auto_whitelist is set to 1, the final score is approximately the average of the current score and the score of past emails from the same sender (the actual algorithm is a bit more complex than an arithmetic average). This feature smooths out peaks in the score. For example, if a sender has a low average score of 2 but a new email from that sender scores 10, it is probably a glitch: the final score is pulled down, and the email may no longer be classified as spam. The same happens in reverse when the score is usually high except for a few emails with a low score; those emails are reclassified as spam. The feature thus acts as a dynamic whitelist/blacklist.
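The averaging can be sketched with the formula given in the SpamAssassin documentation, final = score + (mean - score) * factor, where factor is set by auto_whitelist_factor (0.5 by default). The sender history and scores in the comments are made-up numbers.

```conf
use_auto_whitelist     1
auto_whitelist_factor  0.5

# Sender's long-term mean: 2.0, score of the new message: 7.0
#   final = 7.0 + (2.0 - 7.0) * 0.5 = 4.5
# With the default spam threshold of 5.0, the message is not tagged as spam.
```

Depending on the SpamAssassin version, the AWL plug-in may need to be loaded before these options take effect.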

Most people only receive legitimate email in a couple of languages. If you receive an email in a Chinese encoding but you do not read Chinese, it is very likely spam.

SpamAssassin has a set of rules to filter languages and character sets. To give the language and character set of an email more weight, raise the default scores of these rules:

ok_languages en fr de es
ok_locales en

score UNWANTED_LANGUAGE_BODY   2.0
score CHARSET_FARAWAY          2.0
score CHARSET_FARAWAY_BODY     2.0
score CHARSET_FARAWAY_HEADERS  2.0

SpamAssassin includes a Bayesian filter (see Spam Filtering with Bayesian Filters). During the learning period, the rules using the filter can be disabled while it is learning new words:

use_bayes_rules 0
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.5
bayes_auto_learn_threshold_spam    10.0
bayes_ignore_from     *@good-domain.net
bayes_ignore_to    user@domain.net

These settings feed emails to the Bayesian filter as spam when the final score is 10.0 or above, and as good email when the final score is 0.5 or lower. Emails sent from the domain good-domain.net or to the recipient user@domain.net are not used for learning.

Instead of being enabled manually, the Bayesian filter can turn on automatically once it has been trained on enough messages. The option use_bayes_rules 0 can be replaced by the following:

bayes_min_ham_num     500
bayes_min_spam_num    500

With these settings, the filter turns on automatically once it has learned from 500 good emails and 500 spam emails.

The database maintenance is done by SpamAssassin. It is possible to specify a maximum number of words in the database and to remove words that were not seen recently:

bayes_expiry_max_db_size 200000
bayes_auto_expire    1

This rule cleans up the database when it reaches about 10 MB.

The SpamAssassin Bayesian filter is not as fast as SpamProbe, but it is easier to set up and maintain. It is possible to use SpamProbe in association with SpamAssassin. The SpamAssassin Bayesian filter can be disabled completely with the following rule:

use_bayes 0