[Gllug] strip invalid email addresses

Joel Bernstein joel at fysh.org
Fri Dec 7 17:46:35 UTC 2007


On Fri, Dec 07, 2007 at 05:21:46PM +0000, countd wrote:
> I have a plain text list of email addresses. Can anyone help me with a
> one liner (sed?) to strip out the invalid ones? I can see for example
> that there are telephone numbers in there, and lines with no @ sign
> which I'd like to get rid of.

You might require that they all contain an '@' sign. But really there is
a huge set of possible email addresses that are valid according to RFC822
and its successors, and a single regex is a rotten way to tackle that. The
current specification for what an email address looks like is at:
http://tools.ietf.org/html/rfc2822#section-3.4.1
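A sketch of the crudest version of the '@' requirement, in Python. It does
the same job as `grep '@' addresses.txt` or `sed -n '/@/p' addresses.txt`.
The filename and the function names are hypothetical, chosen only for
illustration. It also shows how little this test buys: a line consisting
of nothing but '@@@' passes.

```python
def has_at_sign(line: str) -> bool:
    """Crudest possible test: the line contains at least one '@'."""
    return "@" in line


def keep_at_lines(lines):
    """Return the stripped lines containing an '@'; telephone numbers and
    other lines without one are dropped."""
    return [line.strip() for line in lines if has_at_sign(line)]


sample = ["alice@example.org\n", "020 7946 0000\n", "not an address\n", "@@@\n"]
print(keep_at_lines(sample))  # ['alice@example.org', '@@@']
```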

The Perl module Regexp::Common [1] doesn't make any attempt to provide a
matcher for email addresses, for good reason. I suggest you either rely
on your mail server to chuck out anything it doesn't know how to deliver,
or settle on a relatively narrow definition of what an email address
looks like and write a regex that covers those cases.
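A minimal Python sketch of the "narrow definition" approach. The pattern
is an assumption made up for illustration, not taken from RFC 2822 or from
any module: it accepts a common subset of local-part characters, one or
more dot-separated domain labels, and an alphabetic top-level domain. The
names NARROW_ADDRESS, looks_like_address and filter_addresses are
hypothetical.

```python
import re

# Hypothetical, deliberately narrow pattern. It is NOT RFC 2822: it rejects
# quoted local parts, comments, IP-literal domains and dotless hosts, all of
# which the RFC permits.
NARROW_ADDRESS = re.compile(
    r"[A-Za-z0-9._%+-]+"                                # local part: common characters only
    r"@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"  # dot-terminated domain labels
    r"[A-Za-z]{2,}"                                     # alphabetic top-level domain
)


def looks_like_address(line: str) -> bool:
    """True if the whole stripped line matches the narrow pattern."""
    return NARROW_ADDRESS.fullmatch(line.strip()) is not None


def filter_addresses(lines):
    """Keep only the lines that pass the narrow check, stripped of whitespace."""
    return [line.strip() for line in lines if looks_like_address(line)]


sample = [
    "alice@example.org\n",
    "bob.smith+lists@mail.example.co.uk\n",
    "020 7946 0000\n",             # telephone number: no '@'
    "@@@\n",                       # has an '@' but nothing around it
    '"john smith"@example.com\n',  # valid per RFC 2822, rejected here
]
print(filter_addresses(sample))
```

Passing sys.stdin instead of sample turns it into a filter over a file of
addresses. Note that the quoted address is rejected even though RFC 2822
permits it: that is the price of choosing a narrow definition.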

Fundamentally, you're making a mistake in conflating "looks like a valid
email address" (according to whatever metric) and "is an address to
which I can deliver email". 

/joel

[1] http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common.pm
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug