[Gllug] Extracting information from huge volumes of email
Daniel P. Berrange
dan at berrange.com
Mon Mar 15 14:25:00 UTC 2004
On Mon, Mar 15, 2004 at 01:45:07PM +0000, Mamading Ceesay wrote:
> On Mon, 15 Mar 2004 11:45:58 +0000
> Daniel P. Berrange wrote:
>
> >
> > I've come to the conclusion that mail programs / filesystem folders are
> > not so good for extracting information from huge volumes of email.
> >
>
> I've been intending to investigate mairix for this very purpose.
> http://www.rc0.org.uk/mairix/
Looks interesting, but I can't help thinking he's taken the wrong approach
to implementing this. All that time went into writing code to parse and write
mailboxes, parse RFC822 and MIME messages, and implement a crude query
language, when he could have concentrated on formulating interesting
search/query patterns.

I briefly considered doing something similar, but decided there was no way
I'd get anywhere near the quality of the existing code in these areas. By
leveraging the existing Perl Mail-Box & MIME-tools modules for parsing and
manipulating folders, SQL for the general query language & TSearch for full
text indexing, I've created, in a matter of days, something easily as
powerful in query terms, with better mailbox support & trivially extensible.
Having said that, being written in C, mairix does have pretty awesome speed
compared to the Perl / SQL solution - evidently important to the guy, since
he apparently runs it on a 486!
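
To illustrate the idea of reusing existing parsers and a general query
language, here is a minimal sketch. It is not the Perl code described above:
it assumes Python's standard mailbox and email modules in place of Mail-Box
and MIME-tools, and SQLite with a plain LIKE match in place of a SQL database
with TSearch. The table layout, the function names index_mbox and search, and
the sample messages are all hypothetical.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.utils import parseaddr


def index_mbox(path, db):
    """Load headers and text/plain bodies from an mbox file into a SQL table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(msgid TEXT, sender TEXT, subject TEXT, body TEXT)"
    )
    box = mailbox.mbox(path)
    try:
        for msg in box:
            # Walk the MIME tree and keep only the text/plain parts.
            parts = []
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    parts.append(payload.decode("utf-8", errors="replace"))
            db.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (
                    msg["Message-ID"],
                    parseaddr(msg["From"] or "")[1],  # bare address only
                    msg["Subject"],
                    "\n".join(parts),
                ),
            )
    finally:
        box.close()
    db.commit()


def search(db, term):
    """Return (sender, subject) rows whose body contains term.

    Uses LIKE as a crude stand-in for a real full text index, so '%' and
    '_' inside term act as wildcards.
    """
    cur = db.execute(
        "SELECT sender, subject FROM messages WHERE body LIKE ? ORDER BY msgid",
        ("%" + term + "%",),
    )
    return cur.fetchall()


# Two made-up messages in mbox format, used only for this demonstration.
SAMPLE = (
    "From alice@example.org Mon Mar 15 11:45:58 2004\n"
    "From: Alice <alice@example.org>\n"
    "Subject: mail indexing\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Full text search over mail folders.\n"
    "\n"
    "From bob@example.org Mon Mar 15 13:45:07 2004\n"
    "From: Bob <bob@example.org>\n"
    "Subject: lunch\n"
    "Message-ID: <2@example.org>\n"
    "\n"
    "Sandwiches at noon.\n"
)

db = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as fh:
        fh.write(SAMPLE)
    index_mbox(path, db)

hits = search(db, "search")
print(hits)
```

Once the messages are in a table, new kinds of query are just more SQL, which
is the extensibility argument made above; swapping LIKE for a proper full
text index would be the obvious next step.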
Dan.
--
|=- http://www.berrange.com/~dan/gpgkey.txt -=|
|=- berrange at redhat.com - Daniel Berrange - dan at berrange.com -=|