[Gllug] Whitelist-only spam filtering
Tom Gilbert
tom at linuxbrit.co.uk
Thu Sep 12 13:18:37 UTC 2002
* Rev Simon Rumble (simon at rumble.net) wrote:
> > Ugh. This is highly offensive to people sending you mail. Frankly when I
> > get messages bounced from people saying "send the message again with
> > this SECRIT PASSWORD so I can add you to my whitelist", I never send
> > them another mail.
>
> Well that's okay, I don't want to talk to you anyway :P Seriously,
> if someone really does want to talk to me, they'll follow the (simple)
> instructions. Failing that, I'll get around to checking the junk box
> and add them to the whitelist.
Good luck with that :)
> > Sensible ideas for filtering spam:
> > http://www.paulgraham.com/spam.html
>
> The problem with heuristic filters is that once they gain widespread
> use, spammers will test their spam on them. So it's an arms race.
> This is why I find the whitelist idea elegant.
Then you failed to read or understand the article. Might I draw your
attention to what Paul said in the article I pointed you to:
  To beat Bayesian filters, it would not be enough for spammers to
  make their emails unique or to stop using individual naughty words.
  They'd have to make their mails indistinguishable from your ordinary
  mail. And this I think would severely constrain them. Spam is mostly
  sales pitches, so unless your regular mail is all sales pitches,
  spams will inevitably have a different character. And the spammers
  would also, of course, have to change (and keep changing) their
  whole infrastructure, because otherwise the headers would look as
  bad to the Bayesian filters as ever, no matter what they did to the
  message body. I don't know enough about the infrastructure that
  spammers use to know how hard it would be to make the headers look
  innocent, but my guess is that it would be even harder than making
  the message look innocent.
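As a rough illustration of why the headers matter: a filter of this
kind reads the whole message, headers included, so the sending
infrastructure leaves tokens behind just as the body text does. The
sketch below is not Paul's code. The token characters follow the
article's description as I understand it; the function name, the
header-name prefix on header tokens, and the example.com address are
assumptions made purely for illustration.

```python
import re
from email import message_from_string

# Alphanumerics, dashes, apostrophes and dollar signs form a token;
# everything else is treated as a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokens(raw_message):
    """Return lower-cased tokens from the headers and body of a message."""
    msg = message_from_string(raw_message)
    found = []
    # Header tokens get the header name as a prefix (an assumption of
    # this sketch) so "free" in a Subject line is counted separately
    # from "free" in the body.
    for name, value in msg.items():
        for tok in TOKEN_RE.findall(value):
            found.append(f"{name.lower()}:{tok.lower()}")
    body = msg.get_payload()
    if isinstance(body, str):  # multipart messages are skipped here
        found.extend(tok.lower() for tok in TOKEN_RE.findall(body))
    return found


raw = "From: deals@example.com\nSubject: FREE offer\n\nClick here now\n"
print(tokens(raw))
```

Even if the body were rewritten to look innocent, the From and
Subject tokens would still be there for the filter to score.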
Also, from the FAQ, which you might have missed,
http://www.paulgraham.com/spamfaq.html:
  Once this software was available, couldn't spammers just tune their
  spams to get through it?

  They couldn't necessarily tune their emails and still say what they
  wanted to say. If they wanted to send you to a url that is known to
  the filters, for example, they would find it hard to tune their way
  around that.

  Second, tune using what? Each user's filters will be different, and
  the innocent words will vary especially. At most, spammers will be
  able to dilute their mails with merely neutral words, and those will
  not tend to be much use because they won't be among the fifteen most
  interesting.

  If the spammers did try to get most of the incriminating words out
  of their messages, they would all have to use different euphemisms,
  because if they all started saying "adolescents" instead of "teens",
  then "adolescents" would start to have a high spam probability.

  Finally, even if spammers worked to get all the incriminating words
  out of the message body, that wouldn't be enough, because in a
  typical spam a lot of the incriminating words are in the headers.
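The arithmetic behind those answers can be sketched in a few lines of
Python. This is a minimal sketch of the scheme as I understand it
from the article, not Paul's code: the function names are
hypothetical, and the constants (good counts doubled, a minimum of
five occurrences, probabilities clamped to 0.01..0.99, the fifteen
most interesting tokens) are taken from the article's description.

```python
from math import prod


def token_probability(good, bad, ngood, nbad):
    """Spam probability of one token, or None if it is too rare to score.

    good/bad are how often the token appears in each corpus; ngood/nbad
    are the number of messages in each corpus (assumed to be non-zero).
    """
    g = 2 * good  # good counts are doubled to bias against false positives
    b = bad
    if g + b < 5:
        return None
    p_bad = min(1.0, b / nbad)
    p_good = min(1.0, g / ngood)
    p = p_bad / (p_good + p_bad)
    return max(0.01, min(0.99, p))


def spam_probability(probs, n=15):
    """Combine the n token probabilities that lie farthest from 0.5."""
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


# A euphemism picks up a high probability once it shows up in spam:
print(token_probability(good=0, bad=1, ngood=100, nbad=100))
print(token_probability(good=0, bad=10, ngood=100, nbad=100))

# Padding a message with neutral (0.5) words does not change the result:
print(spam_probability([0.99, 0.99]))
print(spam_probability([0.99, 0.99] + [0.5] * 40))
```

Neutral words contribute the same factor to both products, so they
cancel out; only tokens far from 0.5 move the score.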
Anyhow, I don't mind what you do; I'm just offering some advice. To
shed some light: I've moved my spam filtering from the combination
blacklist/whitelist/regexp-matching procmail filter I'd been using for
several years to a Bayesian filter, which so far has caught every spam
thrown my way with no false positives.
Tom.
--
.^. .-------------------------------------------------------.
/V\ | Tom Gilbert, London, England | http://linuxbrit.co.uk |
/( )\ | Open Source/UNIX consultant | tom at linuxbrit.co.uk |
^^-^^ `-------------------------------------------------------'