[Gllug] Extracting information from huge volumes of email

Daniel P. Berrange dan at berrange.com
Mon Mar 15 14:25:00 UTC 2004


On Mon, Mar 15, 2004 at 01:45:07PM +0000, Mamading Ceesay wrote:
> On Mon, 15 Mar 2004 11:45:58 +0000
> Daniel P. Berrange wrote:
> 
> > 
> > I've come to the conclusion that mail programs / filesystem folders are
> > not so good for extracting information from huge volumes of email.
> >
> 
> I've been intending to investigate mairix for this very purpose.
> http://www.rc0.org.uk/mairix/

Looks interesting, but I can't help thinking he's taken the wrong approach
to implementing it. He's spent all that time writing code to parse / write
mailboxes, parse RFC 822 and MIME messages, and implement a crude query
language, when he could have concentrated on formulating interesting
search/query patterns.

I briefly considered doing something similar, but decided there was no way
I'd get anywhere near the quality of existing code in these areas. By
leveraging the existing Perl Mail-Box & MIME-tools modules for parsing and
manipulating folders, SQL for the general query language & PostgreSQL's
TSearch module for full-text indexing, I've built, in a matter of days,
something easily as powerful in query terms, with better mailbox support,
and trivially extensible. Having said that, since mairix is written in C it
does have pretty awesome speed compared to my Perl / SQL solution, which is
evidently important to the guy since he apparently runs it on a 486!
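A minimal sketch of the "reuse existing parsers, query with SQL" idea, written in Python rather than Perl: the standard `mailbox` and `email` packages stand in for Mail-Box and MIME-tools, and SQLite stands in for the SQL database. The table layout and the function names `index_mailbox`, `text_body` and `senders_mentioning` are hypothetical, and the plain `LIKE` scan is only a stand-in for a real full-text index such as TSearch; this is not the actual schema described above.

```python
import mailbox
import sqlite3
from email.header import decode_header, make_header
from email.utils import parseaddr


def text_body(msg):
    """Concatenate the decoded text/plain parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes after undoing base64/QP
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label in the header
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def index_mailbox(path, conn):
    """Load every message in an mbox file into a `messages` table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY, sender TEXT, subject TEXT,"
        " date_header TEXT, body TEXT)"
    )
    count = 0
    box = mailbox.mbox(path, create=False)  # the parsing is left to the library
    try:
        for msg in box:
            sender = parseaddr(str(msg.get("From", "")))[1].lower()
            # Decode RFC 2047 encoded-words such as =?iso-8859-1?q?...?=
            subject = str(make_header(decode_header(msg.get("Subject", ""))))
            conn.execute(
                "INSERT INTO messages (sender, subject, date_header, body)"
                " VALUES (?, ?, ?, ?)",
                (sender, subject, str(msg.get("Date", "")), text_body(msg)),
            )
            count += 1
    finally:
        box.close()
    conn.commit()
    return count


def senders_mentioning(conn, word):
    """Who mentions `word` most? A plain SQL query over the indexed mail.

    LIKE is case-insensitive for ASCII in SQLite; '%' and '_' in `word`
    are not escaped in this sketch.
    """
    return conn.execute(
        "SELECT sender, COUNT(*) AS n FROM messages WHERE body LIKE ?"
        " GROUP BY sender ORDER BY n DESC, sender",
        (f"%{word}%",),
    ).fetchall()
```

Once the mail is in tables, a new "interesting query" is just another SQL statement rather than new parsing code, which is the trade-off argued for above; the cost is speed compared with a purpose-built C indexer.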

Dan.
-- 
|=-               http://www.berrange.com/~dan/gpgkey.txt             -=|
|=-   berrange at redhat.com  -  Daniel Berrange  -  dan at berrange.com    -=|