A set of predefined SpamAssassin rules is contained in /usr/share/spamassassin, in files named xy_name.cf (for example, 20_phrases.cf). SpamAssassin reads these files in alphabetical order; a later rule overrides an earlier one that sets the same option. These rules are similar to the scoring rules in Procmail, but they are more flexible and powerful.
A rule is made of three lines that define four mandatory elements:
Type
Where the test applies: the body (body), the header (header), the raw body (rawbody), or both the header and the raw body (full).
Name
The name of the test. It must be unique; otherwise, it is overridden by the test with the same name from the last file loaded by SpamAssassin.
Test
A regular expression to match, a call to a function inside SpamAssassin, or a call to a plug-in.
Score
The weight to apply to the test.
/usr/share/spamassassin/20_phrases.cf contains simple rules to match a Perl regular expression. One of the tests contained in this file is:
body STRONG_BUY /strong buy/i
describe STRONG_BUY Tells you about a strong buy
This rule matches the string "strong buy" in the text part of the email body. The trailing /i means that the search is case-insensitive (i.e., it also matches STRONG BUY, strONg Buy, etc.).
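The /i modifier corresponds to Python's re.IGNORECASE flag. The short Python sketch below mimics the STRONG_BUY body test only to illustrate the regular-expression semantics; strong_buy_hits is a hypothetical helper, and this is not how SpamAssassin (written in Perl) actually runs its rules.

```python
import re

# Equivalent of: body STRONG_BUY /strong buy/i
# re.IGNORECASE plays the role of Perl's trailing /i modifier.
STRONG_BUY = re.compile(r"strong buy", re.IGNORECASE)


def strong_buy_hits(body: str) -> bool:
    """Return True if the body text contains "strong buy" in any letter case."""
    return STRONG_BUY.search(body) is not None


if __name__ == "__main__":
    for sample in ["This is a STRONG BUY!", "strONg Buy now", "strong  buy", "weak buy"]:
        print(repr(sample), strong_buy_hits(sample))
```

Note that the pattern is literal apart from case: "strong  buy" with two spaces does not match.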
An optional flag can be set with tflags to specify a special condition. The most common flags are:
nice
Used to avoid false positives: the test score is subtracted instead of added.
learn
Requires training before using the test. This is commonly used for tests that use the Bayesian filter.
noautolearn
The score of this test is not used by the learning systems. This flag is used in association with the Bayesian filter.
userconf
Requires an option in local.cf or user_prefs.
Tests can be combined to create a new test, called a meta test. They can be combined in a Boolean expression:
meta NEW_TEST TEST_1 && (TEST_2 || ! TEST_3) && TEST_4
Or they can be combined in an arithmetic expression (a test that matches counts as 1; a test that does not match counts as 0):
meta OTHER_TEST (3 * TEST_2 - TEST_3) + 2 > 2 * TEST_1
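To see how the two kinds of meta expression evaluate, the Python sketch below treats each sub-test result as 1 (matched) or 0 (not matched), as described above. The TEST_1 to TEST_4 values and the new_test/other_test helpers are hypothetical, chosen only to trace the two expressions.

```python
def new_test(h: dict) -> bool:
    # meta NEW_TEST TEST_1 && (TEST_2 || ! TEST_3) && TEST_4
    return bool(h["TEST_1"] and (h["TEST_2"] or not h["TEST_3"]) and h["TEST_4"])


def other_test(h: dict) -> bool:
    # meta OTHER_TEST (3 * TEST_2 - TEST_3) + 2 > 2 * TEST_1
    return (3 * h["TEST_2"] - h["TEST_3"]) + 2 > 2 * h["TEST_1"]


if __name__ == "__main__":
    hits = {"TEST_1": 1, "TEST_2": 1, "TEST_3": 0, "TEST_4": 1}  # hypothetical results
    none_hit = dict.fromkeys(hits, 0)                            # no sub-test matched
    print(new_test(hits), other_test(hits))
    print(new_test(none_hit), other_test(none_hit))
```

One consequence worth noticing: with every sub-test at 0, OTHER_TEST is still true, because 2 > 0. Arithmetic meta tests need their constants checked against the "nothing matched" case.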
Meta tests can be used to counterbalance some tests in special conditions. For example, if you are interested in commercial offers on computers, you can disqualify the tests that classify such emails as spam:
body COMPUTER /computer|laptop|PC/i
describe COMPUTER Mail about computers
tflags COMPUTER nice
body OFFER /buy/i
describe OFFER Buy something you do not need
body PRICE /price/i
describe PRICE Price for something you do not want
meta DISABLE_TESTS (PRICE || OFFER) && COMPUTER
describe DISABLE_TESTS These tests do not matter
score DISABLE_TESTS 3
tflags DISABLE_TESTS nice
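The Python sketch below evaluates whether DISABLE_TESTS fires for a given body; it does not model how SpamAssassin then applies the score and tflags lines. The regexes are copied from the rules above with re.IGNORECASE standing in for /i; run_body_tests and disable_tests are hypothetical helpers.

```python
import re

# Python stand-ins for the three body tests; re.IGNORECASE plays the role of /i.
BODY_TESTS = {
    "COMPUTER": re.compile(r"computer|laptop|PC", re.IGNORECASE),
    "OFFER": re.compile(r"buy", re.IGNORECASE),
    "PRICE": re.compile(r"price", re.IGNORECASE),
}


def run_body_tests(body: str) -> dict:
    """Return 1 for each test whose pattern occurs anywhere in the body, else 0."""
    return {name: int(rx.search(body) is not None) for name, rx in BODY_TESTS.items()}


def disable_tests(hits: dict) -> bool:
    # meta DISABLE_TESTS (PRICE || OFFER) && COMPUTER
    return bool((hits["PRICE"] or hits["OFFER"]) and hits["COMPUTER"])


if __name__ == "__main__":
    for body in ["Buy a new laptop at a low price", "Buy cheap watches", "Our computer lab is closed"]:
        print(body, "->", disable_tests(run_body_tests(body)))
```

Because the patterns have no word boundaries, PC with /i also matches inside ordinary words such as "upcoming", so COMPUTER can fire more often than intended.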
By default, all tests have a score of 1.0, except tests whose names start with T_ (for testing), which have a default score of 0.01. This value can be changed with the score option:
score TEST_NAME x.y
Assigns a score of x.y to the test TEST_NAME.
score TEST_NAME 0
Disables the test TEST_NAME.
score TEST_NAME a.b c.d e.f g.h
Assigns a different score for different SpamAssassin configurations:
a.b
Score when the Bayesian filter and the network tests are disabled.
c.d
Score when the Bayesian filter is disabled and the network tests are enabled.
e.f
Score when the Bayesian filter is enabled and the network tests are disabled.
g.h
Score when the Bayesian filter and the network tests are enabled.
score TEST_NAME (a.b) (c.d) (e.f) (g.h)
Increases the existing scores by the values between parentheses.
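The Python sketch below is a hypothetical model of how these score lines resolve to one number, assuming the defaults and the order of the four values described above (neither, network only, Bayes only, both). It also assumes that a single value applies to every configuration. The function names are invented for illustration and are not part of SpamAssassin.

```python
from typing import Optional, Sequence


def default_score(test_name: str) -> float:
    # Defaults described above: 0.01 for T_ test rules, 1.0 for everything else.
    return 0.01 if test_name.startswith("T_") else 1.0


def score_set_index(use_bayes: bool, use_network: bool) -> int:
    # Positions of a.b, c.d, e.f, g.h: 0 = neither, 1 = network only,
    # 2 = Bayes only, 3 = Bayes and network tests.
    return 2 * int(use_bayes) + int(use_network)


def effective_score(test_name: str, scores: Optional[Sequence[float]] = None,
                    use_bayes: bool = False, use_network: bool = False) -> float:
    """Score a test contributes under one configuration.

    scores is None when there is no score line, one value for
    `score TEST_NAME x.y`, or four values for `score TEST_NAME a.b c.d e.f g.h`.
    """
    if scores is None:
        return default_score(test_name)
    if len(scores) == 1:
        return scores[0]  # assumption: a single value applies to every configuration
    return scores[score_set_index(use_bayes, use_network)]


def apply_relative(scores: Sequence[float], deltas: Sequence[float]) -> tuple:
    # `score TEST_NAME (a.b) (c.d) (e.f) (g.h)`: add each delta to the existing score.
    return tuple(s + d for s, d in zip(scores, deltas))
```

For example, with four scores [1.0, 2.0, 3.0, 4.0], a setup with the Bayesian filter enabled and network tests disabled picks the third value, 3.0.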
A whitelist and a blacklist are used to override the SpamAssassin score. If an email matches the whitelist, 100 is subtracted from its SpamAssassin score, which means that whitelisted email is never identified as spam. Email that matches the blacklist is always reported as spam.
The whitelist is managed with several SpamAssassin options:
whitelist_from address
If the email appears to be sent by address, it is whitelisted. This option can be used several times.
unwhitelist_from address
Remove address from the whitelist. Users can use this to override the global whitelist if allow_user_rules was set to 1 in local.cf, or to make exceptions to a whitelist:
whitelist_from *@good-domain.net
unwhitelist_from not-so-good@good-domain.net bad@good-domain.net
This creates a whitelist with all users in good-domain.net except not-so-good and bad.
whitelist_to address
Similar to whitelist_from, but the match is done on the recipient.
The options to manage the blacklist are the same. Just change "whitelist" to "blacklist" in the option name.
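The addresses given to these options can be patterns. SpamAssassin documents them as file-glob style patterns in which only * and ? are wildcards, matched case-insensitively. The Python sketch below approximates that matching for illustration only; glob_to_regex and listed are hypothetical helpers, and SpamAssassin's choice of which headers supply the sender address is not reproduced.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile an address pattern where only '*' and '?' are wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def listed(address: str, patterns: list) -> bool:
    # The whole address must match one of the patterns.
    return any(glob_to_regex(p).fullmatch(address) for p in patterns)


if __name__ == "__main__":
    print(listed("Alice@GOOD-DOMAIN.net", ["*@good-domain.net"]))
    print(listed("alice@good-domain.net.example.com", ["*@good-domain.net"]))
```

Requiring a full match is what keeps a lookalike domain such as good-domain.net.example.com from matching *@good-domain.net.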
It is recommended to manage a whitelist outside SpamAssassin (in Procmail, for example), since all the SpamAssassin tests are still run after an email matches the whitelist.
The auto-whitelist feature averages the SpamAssassin score per sender. If use_auto_whitelist is set to 1, the final score is approximately the average of the current score and the scores of past emails from the same sender (the actual algorithm is a bit more complex than an arithmetic average). This feature smooths out score peaks. For example, if a sender has a low average score of 2, but a new email from that sender scores 10, it is probably a glitch: its final score is pulled down and the email is not classified as spam. The same applies when a sender's score is usually high except for a few emails with a low score; those emails are reclassified as spam. This acts as a dynamic whitelist/blacklist.
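SpamAssassin describes the adjustment behind this feature through its auto_whitelist_factor option (default 0.5): final score = score + (mean - score) * factor, where mean is the sender's long-term average score. The Python sketch below applies that formula. Keeping the history as a list and leaving first-time senders unchanged are simplifications of this sketch, and awl_adjust is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def awl_adjust(score: float, history: Sequence[float], factor: float = 0.5) -> float:
    """Pull a message score toward the sender's historical mean.

    final = score + (mean - score) * factor, with factor in [0, 1].
    """
    if not history:
        return score  # simplification: no history yet, so no adjustment
    return score + (mean(history) - score) * factor


if __name__ == "__main__":
    # Sender whose past emails averaged 2 sends an email scoring 10.
    print(awl_adjust(10.0, [2.0, 2.0, 2.0]))
```

With the numbers from the example above, the adjusted score is 6.0, halfway between 10 and the sender's mean of 2. Whether that falls below the spam threshold depends on the configured threshold and the factor.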
Most people only receive legitimate email in a couple of languages. If you receive emails with a Chinese encoding, but you do not read Chinese, they are very likely spam.
SpamAssassin has a set of rules to filter languages and character sets:
ok_languages xx
By default, all languages are acceptable. This option restricts the list of languages. Languages are identified by two letters: en for English, fr for French, etc. The value of ok_languages is used by the test UNWANTED_LANGUAGE_BODY in /usr/share/spamassassin/20_body_tests.cf.
ok_locales xx
Similar to ok_languages, but for character sets. It is used by the tests CHARSET_FARAWAY, CHARSET_FARAWAY_BODY, and CHARSET_FARAWAY_HEADERS.
To give more weight to the language and character set of emails, raise the default scores of the rules above:
ok_languages en fr de es
ok_locales en
score UNWANTED_LANGUAGE_BODY 2.0
score CHARSET_FARAWAY 2.0
score CHARSET_FARAWAY_BODY 2.0
score CHARSET_FARAWAY_HEADERS 2.0
SpamAssassin includes a Bayesian filter (see Spam Filtering with Bayesian Filters). While the filter is still learning new words, the rules that use it can be disabled:
use_bayes_rules 0
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.5
bayes_auto_learn_threshold_spam 10.0
bayes_ignore_from *@good-domain.net
bayes_ignore_to user@domain.net
These settings feed emails to the Bayesian filter as spam when the final score is 10.0 or above, and as good email when the final score is 0.5 or lower. Emails sent from the domain good-domain.net or to the recipient user@domain.net are not used for learning.
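The Python sketch below captures only the threshold and ignore-list logic described in the paragraph above, with the boundaries taken from its wording (10.0 or above, 0.5 or lower). SpamAssassin's real auto-learn decision involves additional checks that are not modeled here; auto_learn_label and the constant names are hypothetical.

```python
import re

SPAM_THRESHOLD = 10.0               # bayes_auto_learn_threshold_spam
NONSPAM_THRESHOLD = 0.5             # bayes_auto_learn_threshold_nonspam
IGNORE_FROM = ["*@good-domain.net"]  # bayes_ignore_from
IGNORE_TO = ["user@domain.net"]      # bayes_ignore_to


def _glob(pattern: str) -> "re.Pattern[str]":
    # '*' and '?' wildcards, case-insensitive; everything else is literal.
    rx = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.compile(rx, re.IGNORECASE)


def auto_learn_label(score: float, sender: str, recipients: list):
    """Return "spam", "ham", or None when the message is not fed to the filter."""
    if any(_glob(p).fullmatch(sender) for p in IGNORE_FROM):
        return None
    if any(_glob(p).fullmatch(r) for p in IGNORE_TO for r in recipients):
        return None
    if score >= SPAM_THRESHOLD:
        return "spam"
    if score <= NONSPAM_THRESHOLD:
        return "ham"
    return None  # scores in between are not learned automatically


if __name__ == "__main__":
    print(auto_learn_label(12.0, "a@example.org", ["me@example.com"]))
    print(auto_learn_label(12.0, "bob@good-domain.net", ["me@example.com"]))
```

Scores between the two thresholds are deliberately left unlearned, which keeps borderline messages out of the training data.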
Instead of manually enabling the Bayesian filter, you can let it turn on automatically once it has learned enough messages. The option use_bayes_rules 0 can be replaced by the following:
bayes_min_ham_num 500
bayes_min_spam_num 500
With these settings, the filter turns on automatically once it has learned 500 good emails and 500 spam emails.
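The activation rule amounts to a two-condition check. In the Python sketch below, the counts are numbers of learned messages (which is what these options count, as SpamAssassin's documentation describes them), the constants mirror the values above, and bayes_active is a hypothetical helper.

```python
BAYES_MIN_HAM_NUM = 500   # value of bayes_min_ham_num above
BAYES_MIN_SPAM_NUM = 500  # value of bayes_min_spam_num above


def bayes_active(learned_ham: int, learned_spam: int) -> bool:
    """True once at least the minimum number of ham AND spam messages were learned."""
    return learned_ham >= BAYES_MIN_HAM_NUM and learned_spam >= BAYES_MIN_SPAM_NUM


if __name__ == "__main__":
    for ham, spam in [(120, 900), (500, 499), (500, 500)]:
        print(ham, spam, bayes_active(ham, spam))
```

Both minimums must be reached: plenty of spam does not compensate for too little ham.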
SpamAssassin maintains the database itself. You can specify a maximum number of words (tokens) in the database and have words that were not seen recently removed:
bayes_expiry_max_db_size 200000
bayes_auto_expire 1
With these settings, the database is cleaned up when it grows beyond about 200,000 tokens (roughly 10 MB).
The SpamAssassin Bayesian filter is not as fast as SpamProbe, but it is easier to set up and maintain. SpamProbe can also be used alongside SpamAssassin. The SpamAssassin Bayesian filter can be disabled completely with the following option:
use_bayes 0