[Gllug] A few words on the topic of stock spam

Nix nix at esperi.org.uk
Wed Jul 11 19:26:41 UTC 2007


On 10 Jul 2007, Martin A. Brooks stated:

> Nix wrote:
>> On 10 Jul 2007, Martin A. Brooks stated:
>>
>>> Nix wrote:
>>>
>>>> FWIW, FuzzyOCR with a pipeline that turns the jpegs into images and then
>>>> OCRs them as usual does a reasonable job on this (if you ignore the
>>>> hokey horrible method FuzzyOCR uses to identify spammy words: I really
>>>> must get this stuff fed through Bayes like everything else).
>>>>
>>> Perhaps you misread a little. We're already past that stage, PDF spam is 
>>> all the rage.
>>
>> I can't parse this at all. We're already past *what* stage? I never
>> mentioned any sort of stage.
>
> We're past the stage of FuzzyOCR being an effective method of picking 
> out this stuff.

FuzzyOCR is pretty customizable: I was suggesting a trivial
customization (actually already in FuzzyOCR SVN, I now find) which makes
it quite capable of picking this stuff out.
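
For illustration only, the general shape of such a pipeline (render the
PDF pages to images, OCR them, look for spammy words) might be sketched
as below. This is not FuzzyOCR's code or configuration. It assumes the
pdftoppm and gocr command-line tools are installed, and the function
names and the crude substring check are hypothetical stand-ins for
whatever scoring is really used.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def pdf_to_ppm_cmd(pdf_path, out_prefix):
    """Build the command that renders each PDF page to a PPM image."""
    return ["pdftoppm", str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path):
    """Build the command that OCRs one image; gocr prints text to stdout."""
    return ["gocr", "-i", str(image_path)]


def count_hits(text, words):
    """Count how many of the given words occur in the text (case-insensitive).

    Hypothetical placeholder: a plain substring check, not fuzzy matching.
    """
    lowered = text.lower()
    return sum(1 for word in words if word.lower() in lowered)


def scan_pdf(pdf_path, words):
    """Render a PDF to images, OCR every page, and count word hits."""
    for tool in ("pdftoppm", "gocr"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} must be installed")
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdf_to_ppm_cmd(pdf_path, prefix), check=True)
        text = ""
        # pdftoppm names its output <prefix>-<page number>.ppm
        for image in sorted(Path(tmp).glob("page-*.ppm")):
            result = subprocess.run(
                ocr_cmd(image),
                check=True,
                capture_output=True,
                encoding="utf-8",
                errors="replace",
            )
            text += result.stdout
        return count_hits(text, words)
```

Something like scan_pdf("attachment.pdf", ["stock", "symbol"]) would then
return the number of listed words found in the OCR output, which a filter
could turn into a score.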

-- 
`... in the sense that dragons logically follow evolution so they would
 be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
 furiously