[Nelug] Spam Filtering

Tue Apr 13 22:17:43 UTC 2004

I have just implemented a Perl module and some supporting scripts
for spam filtering. The module is called Mail::SpamFilter
and is available from my web pages:

http://www.cse.dmu.ac.uk/~mward/martin/software
http://www.dur.ac.uk/martin.ward/martin/software

The module presents a uniform interface for passing a message through each 
of several spam filters and determining which of them consider the message 
to be spam.

The spamcheck script passes a copy of the given message to each filter and 
counts how many filters consider it to be spam. It adds an X-SPAM-Votes: 
header with the total.
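
As a rough illustration of the voting step, here is a minimal Python sketch. 
It is not the Perl module's actual interface: the spamcheck() function, the 
demo_filters table and the toy filters in it are all made up for the example. 
Each filter is assumed to be a callable that takes the raw message text and 
returns true if it considers the message to be spam.

```python
from email import message_from_string

def spamcheck(raw_message, filters):
    """Return (votes, names of flagging filters, message text with votes header)."""
    # Every filter sees the same, unmodified copy of the message.
    flagged = [name for name, is_spam in filters.items() if is_spam(raw_message)]
    msg = message_from_string(raw_message)
    del msg["X-SPAM-Votes"]  # drop any stale or forged header (no error if absent)
    msg["X-SPAM-Votes"] = str(len(flagged))
    return len(flagged), flagged, msg.as_string()

# Hypothetical stand-in filters; real ones would call SpamAssassin, CRM114, etc.
demo_filters = {
    "keyword": lambda text: "viagra" in text.lower(),
    "shouting": lambda text: text.isupper(),
    "never": lambda text: False,
}

votes, flagged, tagged = spamcheck(
    "Subject: hi\n\nCheap viagra here\n", demo_filters
)
```

Keeping the list of filters that voted, and not only the total, makes it 
easy to see later which filter missed a message.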

I currently delete everything with three or more votes and quarantine 
everything with one or two votes using these procmail rules:

:0fw: spamcheck.lock
| spamcheck

# Record the votes in the procmail log file:
:0
* ^X-Spam-Votes: \/.*$
{ LOG="Spam-Votes: ${MATCH}" }

# Junk anything that three or more scanners consider to be spam:

:0
* ^X-Spam-Votes: [3456789]
/dev/null

# Quarantine anything else that at least one scanner considers to be spam:

:0
* ^X-Spam-Votes: [12]
SPAM
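
The thresholds in these recipes amount to the following decision, written 
out here as a small Python sketch. The disposition() function and its 
delete_at parameter are names invented for the illustration. Note that the 
recipes test only the first digit of the count, which is enough with four 
filters.

```python
def disposition(votes, delete_at=3):
    """Map a vote count to the action the procmail recipes take."""
    if votes >= delete_at:
        return "delete"      # the recipe that delivers to /dev/null
    if votes >= 1:
        return "quarantine"  # the recipe that delivers to the SPAM folder
    return "deliver"         # no recipe matched, so normal delivery continues

actions = [disposition(v) for v in range(5)]
```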

The isspam and notspam scripts can be used to train your filters. Any spam 
message that is missed by any filter can be passed to isspam, while false 
positives should be passed to notspam.
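
One way to picture the training step is the Python sketch below. It rests 
on an assumption: only the filters that got the message wrong are retrained. 
The real scripts may behave differently (for instance by training every 
filter). The train() function and the classifier and trainer tables are 
hypothetical names for the example.

```python
def train(message, is_spam, filters, trainers):
    """Retrain the filters whose verdict disagrees with the correct label."""
    retrained = []
    for name, classify in filters.items():
        if classify(message) != is_spam:
            trainers[name](message, is_spam)  # teach this filter the right answer
            retrained.append(name)
    return retrained

# Toy setup: filter "a" catches the message, filter "b" misses it.
learned = []
filters = {"a": lambda m: "offer" in m, "b": lambda m: False}
trainers = {
    name: (lambda m, spam, n=name: learned.append((n, spam))) for name in filters
}

# isspam corresponds to is_spam=True, notspam to is_spam=False.
retrained = train("special offer", True, filters, trainers)
```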

The spam filters the module currently knows about are:

    * SpamAssassin
    * The CRM114 Discriminator
    * Nuclear Elephant: DSPAM
    * WPBL - Weighted Private Block List

(See the web page for URLs for each of these.)

Let me know what you think of it!

-- 
			Martin

Martin.Ward at durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/