[Gllug] sed question: hyperlinking URLs
Joel Bernstein
joel at fysh.org
Thu Jan 25 00:47:06 UTC 2007
On Wed, Jan 24, 2007 at 10:22:46PM +0000, Nix wrote:
> On 24 Jan 2007, J. F. spake thusly:
>
> > I'm trying to write a sed command to pick out URLs in a file and turn them into HTML hyperlinks. I've come up with this so far:
> >
> > sed -e "s/\(www\.[,\=\/\%\&\+\#\?0-9_a-z\.A-Z-]\+\)/\x3Ca href=\"http\x3A\/\/\1\"\x3E\1\x3C\/a\x3E/g" file.txt
> >
> > (\x3C is a '<' character; \x3E is a '>' character; \x3A is a colon.)
> >
> > The trouble is, the input text contains URLs at the end of sentences and the expression picks up the full stop ending the sentence.
>
> You need negative lookahead assertions to do this. sed doesn't have
> them, nor does awk :/
I'm not convinced that's correct. The first regex I wrote used a (?!\.)
negative lookahead assertion (in Perl), but I realised it didn't need to
be anything like as complex. Requiring the last character of the match
to be a word character is enough for this particular example, since no
TLD that I am aware of ends in anything else. The greedy match then has
to back off the trailing full stop, which stays outside the link. He
won't get [,=+/% ...] in the domain name either, so the original
character class is more complex than it needs to be.
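As a rough sketch of that idea in sed itself (not the Perl version mentioned below), something like the following should do. It keeps the original character class, so paths and query strings still match, and only adds a final bracket expression that must match a word character. The function name linkify and the sample sentence are made up for illustration. I've also swapped the \x3C-style escapes and \+ for literal characters, `*`, and a `|` delimiter, on the assumption that this makes it work in seds other than GNU's.

```shell
#!/bin/sh
# linkify: hypothetical helper name, not from the original thread.
# The bracket expression is the original poster's class; the trailing
# [0-9A-Za-z_] forces the match to end on a word character, so the
# greedy '*' has to give back a sentence-ending '.' or ','.
linkify() {
    sed 's|\(www\.[,=/%&+#?0-9A-Za-z_.-]*[0-9A-Za-z_]\)|<a href="http://\1">\1</a>|g' "$@"
}

# Sample input (made up): one URL ends a sentence, one precedes a comma.
printf '%s\n' 'See www.example.com. More at www.example.org/faq?id=3, too.' | linkify
```

With that input, the full stop after www.example.com and the comma after the second URL should both end up outside the closing </a>. A URL that genuinely ends in "/" would lose the slash from the link, which is the price of this simple approach.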
Did you see the Perl one-liner I replied with earlier? It doesn't use
lookahead assertions. Does it miss the point somehow?
/joel