[Gllug] How do I improve spamassassin's usefulness?

Sat Aug 15 19:22:35 UTC 2009

On Sat, Aug 15, 2009 at 07:39:08PM +0100, Nix wrote:
> On 11 Aug 2009, Richard Jones spake thusly:
> 
> > I'm using spamassassin on my work email, classifying email mostly by
> > hand, and running sa-learn.  Unfortunately SA is still very bad at
> > sorting my email.  After about 6 months of doing this, I still get
> > lots of (very obvious) spam arriving in my inbox.  I'm left thinking
> > how long will SA take to get the idea that "EUROMILLIONS LOTTERY" is
> > not a subject I'm interested in.
> 
> Are you using Justin Mason's sought rules? 
> <http://wiki.apache.org/spamassassin/SoughtRules>

No.

> > My email comes in from fetchmail and is filtered using a .procmailrc
> > using the recommended recipe at the top:
> >
> >   # Send mail through spamassassin
> >   :0fw: spamassassin.lock
> >   * < 512000
> >   | spamassassin
> 
> (Not spamc? spamassassin(1) has very high invocation overhead: it takes
> over a second of maxed-out-core on my Nehalem...)
> 
> >   :0:
> >   * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> >   mail/spam
> 
> I've always found this annoyingly hard to read: ${MATCH} is much
> clearer. Personally I do this:
> 
> ,----
> | # SpamAssassin, assassinate!
> | :0 fw
> | | spamc -d spamfilter.srvr.nix
> | 
> | # Is this certain to be spam?
> | :0 H:
> | * ^X-Spam-Status: yes, +score=\/[^. ]*
> | * ? (( ${MATCH} > 9 ))
> | spambox
> | 
> | # Is this merely likely to be spam?
> | :0 H:
> | * ^X-Spam-Flag: YES$
> | blockbox
> `----
> 
> > And I do get spam appearing in mail/inspect and mail/spam, so I am
> > sure that SA does run.  But I also get obvious spam with ridiculously
> > low scores in my inbox.
> >
> > What am I doing wrong?
> 
> I'm not sure. Stick a sample of spam that got through (with headers)
> somewhere and I'll see if it gets equally low scores here (with 3.2.x
> and 3.3-to-be). Maybe it's a config error: there are some important
> configuration items (like trusted_networks) that are often set wrongly
> and badly damage accuracy until they're fixed.

Two spams that made it through SA with rather low scores are
(temporarily) here:

http://www.annexia.org/tmp/spam1.txt
http://www.annexia.org/tmp/spam2.txt
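
For checking which folder saved samples like these would land in, here is a
rough Python sketch of the sorting logic in the recipe quoted above. The
function name `classify` and the `certain_above` parameter are made up for
illustration, and it assumes the usual "Yes, score=N.N required=..." layout
of the X-Spam-Status header. Like the recipe, it compares only the integer
part of the score.

```python
import email
import re

# Assumed header layout: "Yes, score=12.3 required=5.0 tests=..."
# Only the integer part of the score is captured, as in the recipe's [^. ]*.
STATUS_RE = re.compile(r"(yes|no),\s+score=(-?\d+)", re.IGNORECASE)


def classify(raw_message, certain_above=9):
    """Return (folder, integer_score) for one raw message (hypothetical helper)."""
    msg = email.message_from_string(raw_message)

    status = msg.get("X-Spam-Status", "").strip()
    match = STATUS_RE.match(status)
    score = int(match.group(2)) if match else None

    # "Certain" spam: status says yes and the score exceeds the threshold.
    if match and match.group(1).lower() == "yes" and score > certain_above:
        return ("spambox", score)

    # "Likely" spam: SpamAssassin set the flag, but the score is lower.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return ("blockbox", score)

    return ("inbox", score)
```

Feeding it the full text of a saved message (headers included) returns the
folder name and the score it saw, or a score of None if the header is missing.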

> Also, note that SA intentionally biases strongly towards allowing a
> little spam through rather than mistakenly blocking legitimate email. I
> get a spam or two a day through (and 2000-or-so blocked by SA). Even the
> 'likely' box only gets an FP once every year or so.

It's definitely letting a lot of spam through: roughly 50%.

Rich.

-- 
Richard Jones
Red Hat