[Gllug] sed question: hyperlinking URLs

J F jnns at linuxmail.org
Fri Jan 26 14:12:26 UTC 2007


> > > I'm trying to write a sed command to pick out URLs in a file
> > > and turn them into an HTML hyperlink. I've come up with this
> > > thus far:
> > >
> > > sed -e "s/\(www\.[,\=\/\%\&\+\#\?0-9_a-z\.A-Z-]\+\)/\x3Ca href=\"http\x3A\/\/\1\"\x3E\1\x3C\/a\x3E/g" file.txt
> > >
> > > (\x3C is a '<' character; \x3E is a '>' character; \x3A is a colon.)
> > >
> > > The trouble is, the input text contains URLs at the end of
> > > sentences and the expression picks up the full stop ending the
> > > sentence.
> >
> > You need negative lookahead assertions to do this. sed doesn't have
> > them, nor does awk :/
> 
> I'm not convinced that's correct. The first regex I wrote used a (?!\.)
> negative lookahead assertion (in Perl) but I realised it didn't need to
> be anything like as complex. Requiring the last character to be a word
> character is enough for this particular example, since no TLD that I am
> aware of ends in anything else. He won't get [,=+/% ...] in the domain
> name either, so the original character class is overly complex.
> 
> Did you see the Perl oneliner version I replied with earlier? That isn't
> using lookahead assertions. Does it miss the point somehow?
> 
> /joel

Thanks for the Perl code and the tips. Having looked at your code, I tried this:

sed -e "s@www\.[^ \t]*[^., \t]@<a href=\"http://\0\">\0</a>@g"

And it works!
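
As far as I know, both \0 in the replacement and \t inside a bracket
expression are GNU sed behaviours, so the command above may not work
with other seds. Below is a rough sketch of the same idea using & and
[[:space:]], which should be more portable. The function name linkify
and the sample sentence are made up for illustration only.

```shell
#!/bin/sh
# linkify (hypothetical name): wrap anything starting with "www." in an
# HTML anchor, reading text on stdin and writing the result to stdout.
#   [^[:space:]]*    greedily takes everything up to the next whitespace
#   [^.,[:space:]]   forces the last matched character to be something
#                    other than a full stop, comma, or whitespace, so
#                    sentence punctuation is left outside the link
#   &                stands for the whole match in the replacement
linkify() {
    sed -e 's@www\.[^[:space:]]*[^.,[:space:]]@<a href="http://&">&</a>@g'
}

# Made-up sample input: one URL followed by a comma, one by a full stop.
printf '%s\n' 'See www.example.org/docs, or www.example.com.' | linkify
```

The trailing comma and full stop should end up outside the generated
links. Note that this still only recognises URLs that begin with
"www.", exactly as the original command does.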

Thanks

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



