[Gllug] How do I improve spamassassin's usefulness?
Richard Jones
rich at annexia.org
Sat Aug 15 19:22:35 UTC 2009
On Sat, Aug 15, 2009 at 07:39:08PM +0100, Nix wrote:
> On 11 Aug 2009, Richard Jones spake thusly:
>
> > I'm using spamassassin on my work email, classifying email mostly by
> > hand, and running sa-learn. Unfortunately SA is still very bad at
> > sorting my email. After about 6 months of doing this, I still get
> > lots of (very obvious) spam arriving in my inbox. I'm left thinking
> > how long will SA take to get the idea that "EUROMILLIONS LOTTERY" is
> > not a subject I'm interested in.
>
> Are you using Justin Mason's sought rules?
> <http://wiki.apache.org/spamassassin/SoughtRules>

No.

> > My email comes in from fetchmail and is filtered using a .procmailrc
> > using the recommended recipe at the top:
> >
> > # Send mail through spamassassin
> > :0fw: spamassassin.lock
> > * < 512000
> > | spamassassin
>
> (Not spamc? spamassassin(1) has very high invocation overhead: it takes
> over a second of maxed-out-core on my Nehalem...)
>
> > :0:
> > * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> > mail/spam
>
> I've always found this annoyingly hard to read: ${MATCH} is much
> clearer. Personally I do this:
>
> ,----
> | # SpamAssassin, assassinate!
> | :0 fw
> | | spamc -d spamfilter.srvr.nix
> |
> | # Is this certain to be spam?
> | :0 H:
> | * ^X-Spam-Status: yes, +score=\/[^. ]*
> | * ? (( ${MATCH} > 9 ))
> | spambox
> |
> | # Is this merely likely to be spam?
> | :0 H:
> | * ^X-Spam-Flag: YES$
> | blockbox
> `----
>
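For reference, this is how I read that recipe, written as a rough Python
sketch rather than anything procmail would run. The function name route()
and the sample headers are my own invention; the mailbox names are the ones
from the recipe above. One thing the sketch makes visible: [^. ]* captures
only the integer part of the score, so a score of 9.9 does not count as
"more than 9".

```python
import re

# procmail conditions are case-insensitive by default, hence IGNORECASE.
# Like the recipe's "\/[^. ]*", this captures only the integer part of the score.
STATUS_RE = re.compile(r"^X-Spam-Status: yes, +score=(-?\d+)",
                       re.IGNORECASE | re.MULTILINE)
FLAG_RE = re.compile(r"^X-Spam-Flag: YES$", re.IGNORECASE | re.MULTILINE)


def route(headers: str, certain: int = 9) -> str:
    """Return the mailbox a message would land in (illustrative only)."""
    m = STATUS_RE.search(headers)
    if m is not None and int(m.group(1)) > certain:
        return "spambox"      # certain to be spam
    if FLAG_RE.search(headers):
        return "blockbox"     # merely likely to be spam
    return "inbox"            # falls through to normal delivery


if __name__ == "__main__":
    sample = "X-Spam-Flag: YES\nX-Spam-Status: Yes, score=12.3 required=5.0\n"
    print(route(sample))
```

The 9 is just the threshold from the recipe; pass a different value for
certain to experiment with where the cut-off sits.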
> > And I do get spam appearing in mail/inspect and mail/spam, so I am
> > sure that SA does run. But I also get obvious spam with ridiculously
> > low scores in my inbox.
> >
> > What am I doing wrong?
>
> I'm not sure. Stick a sample of spam that got through (with headers)
> somewhere and I'll see if it gets equally low scores here (with 3.2.x
> and 3.3-to-be). Maybe it's a config error: there are some important
> configuration items (like trusted_networks) that are often set wrongly
> and badly damage accuracy until they're fixed.

Two spams that made it through SA with rather low scores are
(temporarily) here:

http://www.annexia.org/tmp/spam1.txt
http://www.annexia.org/tmp/spam2.txt

> Also, note that SA intentionally biases strongly towards allowing a
> little spam through rather than mistakenly blocking legitimate email. I
> get a spam or two a day through (and 2000-or-so blocked by SA). Even the
> 'likely' box only gets an FP once every year or so.
It's definitely letting a lot of spam through: roughly 50%.
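
To put numbers on the gap, here is a back-of-the-envelope calculation that
takes the figures quoted above at face value (a spam or two a day through,
2000 or so blocked). The function name is only for illustration, and I am
assuming the upper end of "a spam or two".

```python
def miss_rate(missed: int, blocked: int) -> float:
    """Fraction of all incoming spam that reached the inbox."""
    return missed / (missed + blocked)


if __name__ == "__main__":
    # Quoted figures: about 2 spams a day through, about 2000 blocked.
    print(f"{miss_rate(2, 2000):.2%}")
```

That works out to about 0.1% of spam getting through, against roughly 50%
here, so something in my setup is off by a factor of several hundred.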
Rich.
--
Richard Jones
Red Hat
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug