[Gllug] Whitelist-only spam filtering

Thu Sep 12 13:18:37 UTC 2002

* Rev Simon Rumble (simon at rumble.net) wrote:
> > Ugh. This is highly offensive to people sending you mail. Frankly when I
> > get messages bounced from people saying "send the message again with
> > this SECRIT PASSWORD so I can add you to my whitelist", I never send
> > them another mail.
> 
> Well that's okay, I don't want to talk to you anyway :P  Seriously,
> if someone really does want to talk to me, they'll follow the (simple)
> instructions.  Failing that, I'll get around to checking the junk box
> and add them to the whitelist.

Good luck with that :)

> > Sensible ideas for filtering spam:
> > http://www.paulgraham.com/spam.html
> 
> The problem with heuristic filters is that once they gain widespread
> use, spammers will test their spam on them.  So it's an arms race.
> This is why I find the whitelist idea elegant.

Then you failed to read or understand the article. Might I draw your
attention to what Paul said in the article I pointed you to:

    To beat Bayesian filters, it would not be enough for spammers to
    make their emails unique or to stop using individual naughty words.
    They'd have to make their mails indistinguishable from your ordinary
    mail. And this I think would severely constrain them. Spam is mostly
    sales pitches, so unless your regular mail is all sales pitches,
    spams will inevitably have a different character. And the spammers
    would also, of course, have to change (and keep changing) their
    whole infrastructure, because otherwise the headers would look as
    bad to the Bayesian filters as ever, no matter what they did to the
    message body. I don't know enough about the infrastructure that
    spammers use to know how hard it would be to make the headers look
    innocent, but my guess is that it would be even harder than making
    the message look innocent.
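
For anyone wondering how headers end up counting against a spam, here is
a rough Python sketch of the per-token statistics as I read them in the
article. The tokeniser regexp, the function names and the toy corpora are
my own assumptions for illustration; the doubling of good counts, the
minimum of five occurrences and the 0.01/0.99 clamps are what I recall
from Paul's description, so check the article before relying on them.

```python
import re
from collections import Counter

def tokenize(raw_message):
    """Lower-cased tokens from the whole raw message, headers included."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9$'-]+", raw_message)]

def train(good_msgs, bad_msgs, min_count=5):
    """Map each token to an estimated probability that mail containing it is spam.

    Assumes both corpora are non-empty.
    """
    good = Counter(t for m in good_msgs for t in tokenize(m))
    bad = Counter(t for m in bad_msgs for t in tokenize(m))
    ngood, nbad = len(good_msgs), len(bad_msgs)
    table = {}
    for tok in set(good) | set(bad):
        g = 2 * good[tok]   # good counts doubled to bias against false positives
        b = bad[tok]
        if g + b < min_count:
            continue        # too rare to say anything about
        pg = min(1.0, g / ngood)
        pb = min(1.0, b / nbad)
        table[tok] = max(0.01, min(0.99, pb / (pg + pb)))
    return table

# Hypothetical toy corpora, invented for illustration only.
GOOD = ["From: alice@example.org\n\nlunch meeting tomorrow"] * 5
BAD = ["Received: from relay.spam.example\n\nbuy cheap pills now"] * 5
TABLE = train(GOOD, BAD)
```

In this toy table the header token "relay" ends up just as incriminating
as the body token "pills", while "from", which appears in both corpora,
stays neutral. Rewording the body does nothing about the first of those.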

Also, from the FAQ, which you might have missed,
http://www.paulgraham.com/spamfaq.html:

    Once this software was available, couldn't spammers just tune their
    spams to get through it?

    They couldn't necessarily tune their emails and still say what they
    wanted to say. If they wanted to send you to a url that is known to
    the filters, for example, they would find it hard to tune their way
    around that.

    Second, tune using what? Each user's filters will be different, and
    the innocent words will vary especially. At most, spammers will be
    able to dilute their mails with merely neutral words, and those will
    not tend to be much use because they won't be among the fifteen most
    interesting.

    If the spammers did try to get most of the incriminating words out
    of their messages, they would all have to use different euphemisms,
    because if they all started saying "adolescents" instead of "teens",
    then "adolescents" would start to have a high spam probability.

    Finally, even if spammers worked to get all the incriminating words
    out of the message body, that wouldn't be enough, because in a
    typical spam a lot of the incriminating words are in the headers.
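
To make the "fifteen most interesting" point concrete, here is a minimal
Python sketch of how the per-token probabilities get combined, as I
understand the article. The table of probabilities, the function names
and the 0.4 score for unknown tokens are assumptions for illustration
(the 0.4 is what I recall Paul using), not a definitive implementation.
It needs Python 3.8 or later for math.prod.

```python
import math

UNKNOWN = 0.4  # assumed score for a token the filter has never seen

def most_interesting(probs, n=15):
    """The n probabilities furthest from the neutral value 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]

def spam_score(tokens, table, n=15):
    """Combine the n most interesting token probabilities into one score."""
    probs = most_interesting([table.get(t, UNKNOWN) for t in set(tokens)], n)
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical per-user table, invented for illustration only.
TABLE = {"pills": 0.99, "click": 0.95, "free": 0.90,
         "meeting": 0.05, "the": 0.5, "and": 0.5, "of": 0.5}

pitch = ["free", "pills", "click"]
padded = pitch + ["the", "and", "of"]   # same pitch diluted with neutral words
```

Here spam_score(pitch, TABLE) and spam_score(padded, TABLE) come out the
same, because a neutral word multiplies both products by 0.5 and cancels.
Only a word that is innocent in the recipient's own mail (like "meeting"
in this made-up table) pulls the score down, and the spammer has no way
of knowing which words those are for each user.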

Anyhow, I don't mind what you do; I'm just offering some advice. To
shed some light: I've moved my spam filtering over from the combination
blacklist/whitelist/regexp-matching procmail filter I've been using for
several years to a Bayesian filter, which so far has caught every spam
thrown my way with no false positives.

Tom.
-- 
   .^.    .-------------------------------------------------------.
   /V\    | Tom Gilbert, London, England | http://linuxbrit.co.uk |
 /(   )\  | Open Source/UNIX consultant  | tom at linuxbrit.co.uk    |
  ^^-^^   `-------------------------------------------------------'