[Gllug] A few words on the topic of stock spam
Dagfinn Ilmari Mannsåker
ilmari at ilmari.org
Tue Jul 10 21:25:57 UTC 2007
"Martin A. Brooks" <martin at hinterlands.org> writes:
> Nix wrote:
>> FWIW, FuzzyOCR with a pipeline that turns the jpegs into images and then
                                                 ^^^^^
>> OCRs them as usual does a reasonable job on this (if you ignore the
>> hokey horrible method FuzzyOCR uses to identify spammy words: I really
>> must get this stuff fed through Bayes like everything else).
>
> Perhaps you misread a little. We're already past that stage, PDF spam is
> all the rage.
I suspect Nix meant "PDFs" above, which are a SMOP to convert to
something suitable for FuzzyOCR as long as they aren't encrypted (I've
seen spam/viruses with password-protected zip files attached and the
password in the message body).
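
For what it's worth, a minimal sketch of that conversion step, assuming
pdftoppm from xpdf/poppler is installed and that its PPM output is what
gets handed on to the OCR stage. The function names and the 150 dpi
default are my own inventions for illustration, not anything FuzzyOCR
ships, and the /Encrypt check is only a crude byte-search heuristic:

```python
import shutil
import subprocess
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    # Heuristic (assumption): encrypted PDFs carry an /Encrypt entry in
    # the trailer, so a plain byte search catches the common case.
    return b"/Encrypt" in pdf_bytes


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 150) -> list:
    # pdftoppm writes one PPM per page, named <out_prefix>-<page>.ppm.
    return ["pdftoppm", "-r", str(dpi), pdf_path, out_prefix]


def pdf_to_images(pdf_path: str, out_dir: str) -> list:
    # Returns the rendered page images, or an empty list if the PDF
    # looks encrypted and is therefore skipped.
    if looks_encrypted(Path(pdf_path).read_bytes()):
        return []
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, str(out / "page")), check=True)
    return sorted(out.glob("page-*.ppm"))
```

The resulting PPM files can then go through the same image pipeline as
any other attachment; encrypted PDFs simply fall through unscanned.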
--
ilmari
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug