[Sderby] Spam filtering

Thu Jun 10 09:15:49 BST 2004

On Wednesday 09 June 2004 20:11, Mike Hemstock wrote:
> Hi folks,
>
> Does anyone have any advice for filtering spam?  I currently use
> SpamAssassin, but I am finding that it is becoming less effective as the
> spammers are starting to become more effective at creating less obvious
> spam.  The answer, I fear, may be a combination of Bayesian filtering, word
> lists, white lists and all sorts!  Just wondering if anyone could make any
> recommendations?

I have quite a bit of advice and could do a demo at a future meeting if this 
would be of use? I run SuSE 9.0 and collect my mail by running fetchmail as a 
cron job. All mail is received by a single user account that runs a series of 
procmail scripts against the incoming mail. This includes filtering through 
spamassassin and razor, checking for viruses and auto-filing emails from 
mailing lists. I am intercepting >95% of spam, with fewer than 1 in 
1000 non-spam messages caught by mistake.

Here is my advice:

(1) Read the procmail quickstart document at 
http://www.ii.com/internet/robots/procmail/qs/ . Look at the bit under 
strategies. The key is flexibility. The spammers have access to all the tools 
you have and design their spam to get through. This means that you are most 
likely to win if you use a modular approach which can be adapted over time. I 
use a series of procmail scripts to achieve this. 

(2) This is the order I run the scripts in:

(a) Filter out clean mailing list emails;
(b) Filter out white list emails. I run two lists, one where I trust the whole 
domain and one where I trust specific email addresses. I harvested these by 
running a grep on my outbox.
(d) Filter via razor - you need a permanent connection to the internet because 
razor checks a "signature" of the email against spam reported by other razor 
users. This is very good for new spam that does not match a Bayesian pattern.
(e) Filter using virussnag. This picks up attachments with bad extensions, 
e.g. .pif
(f) Filter via spamassassin.
(g) Do your local filing into mailboxes.
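
The ordering above can be sketched outside procmail as a chain of small, 
swappable stages. This is only an illustration in Python, not the procmail 
scripts themselves. The function names, the folder names, the use of a 
List-Id header to spot list mail and the example addresses are all 
assumptions made for the sketch. The razor and spamassassin stages are left 
as comments because they call external tools.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

BAD_EXTENSIONS = (".pif",)  # only .pif is named in the post; extend to taste


def harvest_whitelist(sent_messages):
    """Collect To/Cc addresses from mail you have sent (the outbox grep)."""
    found = set()
    for msg in sent_messages:
        recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses(recipients):
            if "@" in addr:
                found.add(addr.lower())
    return found


def mailing_list_stage(msg):
    # Assumption: list mail is recognised by the presence of a List-Id header.
    return "lists" if msg.get("List-Id") else None


def make_whitelist_stage(trusted_domains, trusted_addresses):
    """Build a stage that trusts whole domains or specific addresses."""
    def stage(msg):
        _name, sender = parseaddr(msg.get("From", ""))
        sender = sender.lower()
        domain = sender.rpartition("@")[2]
        if sender in trusted_addresses or domain in trusted_domains:
            return "trusted"
        return None
    return stage


def bad_attachment_stage(msg):
    # Flag any MIME part whose filename ends in a listed extension.
    for part in msg.walk():
        name = part.get_filename()
        if name and name.lower().endswith(BAD_EXTENSIONS):
            return "quarantine"
    return None


def classify(msg, stages, default="inbox"):
    """Run the stages in order; the first one that names a folder wins."""
    for stage in stages:
        folder = stage(msg)
        if folder is not None:
            return folder
    return default


if __name__ == "__main__":
    sent = message_from_string("To: Alice <alice@example.com>\n\nhello\n")
    stages = [
        mailing_list_stage,                                    # (a)
        make_whitelist_stage({"example.org"},
                             harvest_whitelist([sent])),       # (b)
        # (d) a razor check would go here
        bad_attachment_stage,                                  # (e)
        # (f) a spamassassin check would go here
    ]
    incoming = message_from_string("From: ALICE@example.com\n\nhi\n")
    print(classify(incoming, stages))
```

Because each stage is just a function that either names a folder or passes 
the message on, a stage can be added, dropped or reordered without touching 
the others, which is the flexibility point (1) is about.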

(3) Remember that for a Bayesian filter to be effective it needs roughly EQUAL 
NUMBERS of spam and ham. Good results are achieved after about 1000 of each. 
Little is gained after 5000 emails. In short: keep your spam.
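
As a rough check on the training corpus, something like the following could 
be run against the message counts. It is a sketch only: the function name and 
the 1.5 ratio used for "roughly equal" are assumptions, and the 5000 figure 
is read here as applying to each class, which the advice above does not spell 
out.

```python
def corpus_status(n_spam, n_ham, minimum=1000, plateau=5000, max_ratio=1.5):
    """Return (balanced, stage) for a Bayesian training corpus.

    balanced: True when the larger class is at most max_ratio times the
              smaller one (max_ratio is an assumed cut-off).
    stage:    judged on the smaller class, since it is the limiting one.
    """
    smaller, larger = sorted((n_spam, n_ham))
    balanced = smaller > 0 and larger / smaller <= max_ratio
    if smaller < minimum:
        stage = "keep collecting"
    elif smaller < plateau:
        stage = "well trained"
    else:
        stage = "diminishing returns"
    return balanced, stage


if __name__ == "__main__":
    print(corpus_status(1200, 300))
```

Judging the stage on the smaller class reflects the point that a large pile 
of ham does not help much if only a little spam has been kept.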

(4) Any false spam and false ham are manually filed in separate folders. I 
then run a shell script that runs spamassassin's sa-learn utility as well as 
razor's razor-report utility. That way they each learn from the other's spam.
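
The retraining step might look roughly like this. It is written in Python 
rather than as the shell script described above, and it is a sketch under 
assumptions: the folder paths are hypothetical, the folders are assumed to be 
mbox files, and passing the mbox filename straight to razor-report should be 
checked against the man page for your razor version. sa-learn's --spam, 
--ham and --mbox options are standard.

```python
import subprocess


def retrain_commands(missed_spam_mbox, wrongly_caught_mbox):
    """Build the command lines for one retraining run."""
    return [
        # Spam that got through: teach the Bayesian filter it is spam...
        ["sa-learn", "--spam", "--mbox", missed_spam_mbox],
        # ...and report it to razor so other razor users benefit too.
        # Assumption: razor-report accepts an mbox filename argument.
        ["razor-report", missed_spam_mbox],
        # Good mail that was caught by mistake: teach it as ham.
        ["sa-learn", "--ham", "--mbox", wrongly_caught_mbox],
    ]


def retrain(missed_spam_mbox, wrongly_caught_mbox, dry_run=True):
    """Run (or, with dry_run, just print) the retraining commands."""
    for cmd in retrain_commands(missed_spam_mbox, wrongly_caught_mbox):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folder names; dry_run only prints the commands.
    retrain("Mail/missed-spam", "Mail/false-positives", dry_run=True)
```

The dry run is the default so the commands can be inspected before anything 
is fed to the filters.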

Works like a dream.

Andre