[Gllug] strip invalid email addresses

Joel Bernstein joel at fysh.org
Fri Dec 7 17:56:53 UTC 2007


On Fri, Dec 07, 2007 at 05:52:20PM +0000, Progga wrote:
> On Fri, Dec 07, 2007 at 05:21:46PM +0000, countd wrote:
> 
> > I have a plain text list of email addresses. Can anyone help me with a
> > one liner (sed?) to strip out the invalid ones? I can see for example
> > that there are telephone numbers in there, and lines with no @ sign
> > which I'd like to get rid of.
> 
> Assuming that each line contains only one address/phone-number/... 
> 
> $ cat list.txt | sed -n "/@.*/p"
> 
>   will filter the email addresses.  You'll have to tweak the regex for best
> performance.  You can google for the perfect regex for email address.
> 

Actually, requiring an '@' sign is about as much as you can do in that
regard. There is theoretically a 'perfect' regex for RFC 2822 addresses,
but it is very long and does little more than match
<localpart>@<something> in a rather long-winded way -- see my answer
elsewhere in this thread for an explanation and a link to the RFC.

Regular expressions are the wrong tool for validating email addresses.
Filter out the lines without an '@' sign and see which of the remainder
are deliverable. Beyond that point you're looking at diminishing
returns.
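
For what it's worth, that first pass could look something like the sketch
below. It assumes one entry per line, as in the quoted example, and only
checks that an '@' is present; it says nothing about deliverability. The
function name filter_candidates and the output file candidates.txt are
made up for illustration.

```shell
# Hypothetical helper: print only the lines of a file that contain an '@'.
# Assumes one address/phone number/etc. per line.
filter_candidates() {
    # -n suppresses sed's default output; /@/p prints each line matching '@'.
    sed -n '/@/p' "$1"
}

# Example use (list.txt is the file name from the quoted command,
# candidates.txt is an assumed output name):
# filter_candidates list.txt > candidates.txt
```

Whatever survives this is only a list of candidates; actually trying to
deliver to them is the real test.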

/joel
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



