[Sussex] Updated Grep, Sed and RegExp links from August moot

Steve Dobson steve at dobbo.org
Fri Sep 16 02:14:33 UTC 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dominic

On 15/09/11 10:29, Dominic Humphries wrote:
> Wow.. that's some serious grep & sed usage!
> 
> I'm impressed that you've managed to get so much functionality out of
> sed, in particular.

sed is an extremely powerful editor.  I use it, for example, to fix
typos and the like in the boilerplate comments of files.

> One thing I would say, though, is that if this is the kind of thing
> you're doing often, you would probably find Perl both easy to use (if
> you can do regexes, you're more than halfway to being good at Perl
> anyway) and immensely helpful - Perl was basically created for this kind
> of text processing.
> 
> For example, your first sed example,
>   sed -e '/^[[:space:]]*\- *[0-9]\{1,3\} *\-$/d' test_file_00.txt > test_file_01.txt
> 
> You could replace with the below perl script, which you would run as, e.g.:
>   ./de-page-number filename.txt
> And it would create filename.txt.out with the de-paginated output.
>
> I'll grant you it would be more typing, but it's a *lot* more
> readable & re-usable :)

Your perl script may be more readable, but I would argue that it isn't
more re-usable.  You have to use it on a file, and the modified text is
written to another file in the same directory.  What if the file you
want to de-page-number is on a read-only CD-ROM?

First of all we can turn the sed expression into a command by creating
the following file:
- ------------- sed-de-page-number ------------------
#!/bin/sed -f
/^[[:space:]]*\- *[0-9]\{1,3\} *\-$/d
- ---------------------------------------------------

and then chmod(1) the file to give it execute permissions

	$ chmod u+x sed-de-page-number

and you're good to go:

	$ ./sed-de-page-number < filename.txt > filename.txt.out
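
As a quick check of what the expression actually removes, here is a
minimal sketch.  The sample text and the variable names are made up for
illustration, and I have written the dashes without the backslashes
(POSIX leaves "\-" undefined in a basic regexp, although GNU sed accepts
it); otherwise the expression is the one in the script.

```shell
# Hypothetical sample: two lines of text with a page-number line between them.
sample=$(printf 'end of page one\n   - 12 -\nstart of page two\n')

# The de-page-number expression, given inline with -e instead of via the script file.
result=$(printf '%s\n' "$sample" | sed -e '/^[[:space:]]*- *[0-9]\{1,3\} *-$/d')

# Show what is left after the page-number line has been deleted.
printf '%s\n' "$result"
```

Only the line consisting of a dash, one to three digits and another dash
should be dropped; ordinary text that merely contains a number is left
alone.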

But with this version one can send the output anywhere:

	$ ./sed-de-page-number < filename.txt > /tmp/filename.txt

Or how about if you want to send the file to the default printer:

	$ ./sed-de-page-number < filename.txt | lpr

Or if the directory contains the chapters of a book and I want to print
the whole book without page numbers:

	$ cat chapter-*.txt | ./sed-de-page-number | lpr

Or if the directory contains directories each of which contains the
chapters of a different book and I want to print the whole library:

	$ cat */chapter-*.txt | ./sed-de-page-number | lpr
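
To try the multi-directory pipeline without sending anything to a
printer, something like the following sketch should do.  The directory
layout, the file contents and the de_page_number function are all
assumptions made for the example (the function stands in for the script
file), and wc(1) takes the place of lpr.

```shell
# Throwaway directory tree standing in for a library of books (hypothetical names).
lib=$(mktemp -d)
mkdir "$lib/book-a" "$lib/book-b"
printf 'Chapter one\n - 1 -\n'  > "$lib/book-a/chapter-01.txt"
printf 'Chapter two\n - 2 -\n'  > "$lib/book-a/chapter-02.txt"
printf 'Another book\n - 1 -\n' > "$lib/book-b/chapter-01.txt"

# The filter as a shell function, so the sketch does not depend on the script file.
de_page_number() {
    sed -e '/^[[:space:]]*- *[0-9]\{1,3\} *-$/d'
}

# Same shape as the lpr pipeline, but counting the surviving lines instead of printing them.
kept=$(cd "$lib" && cat */chapter-*.txt | de_page_number | wc -l | tr -d ' ')
echo "$kept"

# Clean up the throwaway tree.
rm -r "$lib"
```

Because the filter only reads stdin and writes stdout, swapping the last
stage of the pipeline is all it takes to change where the text ends up.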

At the end of the day, though, the really important thing is to learn
regexps.  One does have to be careful, because the syntax differs
somewhat from command to command, but I've used regexps in all of the
following programs:

Command shells: bash, ksh

Editors: sed, [n]vi, emacs, xemacs, OpenOffice

Others: awk, [f]lex, find

And that's not to mention the number of configuration files that can
include regexps.
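
As one illustration of the command-to-command differences mentioned
above, here is a small sketch using grep(1).  The sample line and the
variable names are made up; the point is only that the interval braces
are written \{1,3\} under basic regexp rules and {1,3} under extended
ones.

```shell
# Hypothetical page-number line to test against.
line=' - 12 -'

# Basic regexp (plain grep, plain sed): the interval braces need backslashes.
if printf '%s\n' "$line" | grep -q '^[[:space:]]*- *[0-9]\{1,3\} *-$'; then
    bre=match
else
    bre=nomatch
fi

# Extended regexp (grep -E): the braces are written bare.
if printf '%s\n' "$line" | grep -Eq '^[[:space:]]*- *[0-9]{1,3} *-$'; then
    ere=match
else
    ere=nomatch
fi

# Bare braces under basic rules are literal characters, so this one does not match.
if printf '%s\n' "$line" | grep -q '^[[:space:]]*- *[0-9]{1,3} *-$'; then
    mixed=match
else
    mixed=nomatch
fi

echo "$bre $ere $mixed"
```

The same pattern therefore has to be spelled slightly differently
depending on which flavour of regexp the command expects.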

Steve
- -- 
Steve Dobson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFOcrEBu7HOw0Q66oERAqWyAKCfkjrimJCeUXbD15gUExsHMmgouACfR2/e
TmCSST6krm3CMAHaqi1y23s=
=AEx6
-----END PGP SIGNATURE-----


