[Gllug] A few words on the topic of stock spam

Dagfinn Ilmari Mannsåker ilmari at ilmari.org
Tue Jul 10 21:25:57 UTC 2007


"Martin A. Brooks" <martin at hinterlands.org> writes:

> Nix wrote:
>> FWIW, FuzzyOCR with a pipeline that turns the jpegs into images and then
                                                 ^^^^^
>> OCRs them as usual does a reasonable job on this (if you ignore the
>> hokey horrible method FuzzyOCR uses to identify spammy words: I really
>> must get this stuff fed through Bayes like everything else).
>
> Perhaps you misread a little. We're already past that stage, PDF spam is 
> all the rage.

I suspect Nix meant "PDFs" above, which are a SMOP to convert to
something suitable for FuzzyOCR as long as they aren't encrypted (I've
seen spam/viruses with password-protected zip files attached and the
password in the message body).
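
As a rough sketch of that conversion step, the following rasterises a PDF
into PNG pages that an OCR stage could then read. It assumes the
poppler-utils tools pdfinfo and pdftoppm are installed; neither is named in
this thread, and the function names are made up for illustration. How the
images are then handed to FuzzyOCR is not shown.

```python
import subprocess


def is_encrypted(pdfinfo_output):
    """Return True if pdfinfo's 'Encrypted:' field starts with 'yes'."""
    for line in pdfinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Encrypted":
            return value.strip().lower().startswith("yes")
    # Field absent: treat the document as unencrypted.
    return False


def pdftoppm_command(pdf_path, out_prefix, dpi=150):
    """Build the pdftoppm argument list that writes one PNG per page."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def pdf_to_images(pdf_path, out_prefix):
    """Rasterise pdf_path; return False if it looks encrypted or unreadable."""
    info = subprocess.run(["pdfinfo", pdf_path], capture_output=True, text=True)
    # Assumption: a non-zero exit means pdfinfo could not open the file
    # (for example, it needs a password), so we skip it as well.
    if info.returncode != 0 or is_encrypted(info.stdout):
        return False
    subprocess.run(pdftoppm_command(pdf_path, out_prefix), check=True)
    return True
```

Calling pdf_to_images("attachment.pdf", "page") would leave the page images
next to the given prefix, ready for whatever OCR pipeline follows.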

-- 
ilmari
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



