[Gllug] Batch Text Replacement

F Andersen fendersan at aol.com
Tue Oct 30 23:31:29 UTC 2007


> Will need to recap on regex though :/
>
> > > Just need to know what easy app can perform such feat - pretty easily.
> > sed/awk/Perl

Try gsar (General Search And Replace). It's faster and simpler than
sed/awk/Perl, but it doesn't support regular expressions:
http://home.online.no/~tjaberg/

You'll have to download your file(s) first though; it doesn't work over FTP.

> Quick question then:
>
> What would be the regex (or even the "sed" command) that would strip
> any occurrence of the string ' style="blah blah blah' from a group of
> html files?
>
> ie
>
> <span style="font-size:11.0pt;mso-bidi-font-size:10.0pt">
>
> becomes
>
> <span>
>
> that would help me loads.

First, I would use HTML Tidy to make the tags consistent and to stop the
tags from wrapping (I think the option is 'wrap: 0'):
http://www.w3.org/People/Raggett/tidy/

... then use gsar, sed or whatever to strip the tags you don't want.
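
For the sed route, here is a minimal sketch of the kind of command that was
asked for. It assumes Tidy has already put each tag on a single line, that
attribute names are lower-case and that the values are double-quoted. The
function name strip_style is just something made up for this example.

```shell
# Sketch only: removes every double-quoted style="..." attribute,
# together with the whitespace in front of it. Reads HTML on stdin.
# strip_style is a made-up helper name, not a standard tool.
strip_style() {
    sed -E 's/[[:space:]]+style="[^"]*"//g'
}

# The tag from the question:
printf '%s\n' '<span style="font-size:11.0pt;mso-bidi-font-size:10.0pt">' | strip_style
# prints: <span>

# To run it over a group of files (writes a temp copy, then replaces):
# for f in *.html; do strip_style < "$f" > "$f.new" && mv "$f.new" "$f"; done
```

It is a plain text substitution, not an HTML parser, so it will also hit the
same pattern if it appears in body text, and it will miss attributes that
wrap across lines or use single quotes.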

I believe HTML Tidy comes with an option to strip MS Word tags; it might
work with those FrontPage tags too.
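
As a rough sketch, the two Tidy settings mentioned here could go in a small
config file. The option names are as I remember them from the Tidy docs
('wrap' and 'word-2000'), so check them against your version; the filename
tidy.conf is only an example.

```
// tidy.conf (example filename)
// 0 disables line wrapping, so each tag stays on one line
wrap: 0
// try to strip the extra markup MS Word adds when saving as HTML
word-2000: yes
```

Something like 'tidy -config tidy.conf -m file.html' should then rewrite the
file in place, so keep a backup copy until you've checked the result.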

The general search and replace syntax for gsar is:

gsar -i -o -s"search-string" -r"replace-string" file.htm

So the gsar command to turn your example tag into a plain <span> would be
(all on one line):

gsar -i -o -s"<span style=:x22font-size::11.0pt;mso-bidi-font-size::10.0pt:x22>" -r"<span>" file.html

From the documentation:
--------
Ctrl characters may be entered by using a ':' in the string followed by the
ASCII value of the character. The value is entered using ':' followed by
three decimal digits or ':x' followed by two hex numbers. To enter ':' use
'::'
--------

... that's why I used :x22 for the double quotes and :: for the colon.
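
If you have a lot of strings to convert, the same two escapes can be applied
mechanically. A small sketch, assuming only colons and double quotes need
escaping (other control characters are not handled); gsar_escape is a
made-up name, not something that ships with gsar.

```shell
# Sketch only: gsar_escape is a made-up helper, not part of gsar.
# Colons are doubled first; doing the quotes first would also double
# the ':' in ':x22' and break the escape.
gsar_escape() {
    printf '%s\n' "$1" | sed -e 's/:/::/g' -e 's/"/:x22/g'
}

gsar_escape '<span style="font-size:11.0pt;mso-bidi-font-size:10.0pt">'
# prints: <span style=:x22font-size::11.0pt;mso-bidi-font-size::10.0pt:x22>
```

The output is what goes inside the -s"..." argument.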

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug