[Gllug] Stories of using filters

Fri Mar 18 16:43:34 UTC 2005

On Thu, 17 Mar 2005, Steve Nelson prattled cheerily:
> Would people like to contribute some real-life examples of using:
> 
> od

Whenever you want to see WTF is going on with some binary file :)

These days there are nicer binary viewers, but there are few as
configurable.
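Here is a minimal sketch of two od invocations that cover most of this, assuming a POSIX od. The -c option prints each byte as a character or a C-style escape, which is handy for spotting stray carriage returns. The -A n -t x1 options give a plain hex dump without the offset column.

```shell
#!/bin/sh
# Show each byte as a character or escape; the \r in this "text" stands out.
printf 'one\r\ntwo\n' | od -c

# Hex dump of the same bytes: -A n suppresses the offset column,
# -t x1 prints one-byte hexadecimal units.
printf 'one\r\ntwo\n' | od -A n -t x1
```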

> nl

This one tends to be used at the *ends* of pipelines, generally just
before fmt(1) and pr(1), as in something I wrote last week (reflowed, so
some of the linefeeds may be in the wrong place):

#!/bin/ksh

[[ $# -ne 1 ]] && { echo "Syntax: find-symbol-users symbol" >&2; exit 1; }

# Skip NFS mounts, /proc and a few large or uninteresting trees; list every
# other regular file with any execute bit set. (Current GNU find spells
# this -perm /a+x; -perm +a+x no longer means "any of these bits".)
find / \( -fstype nfs -o -fstype proc -o -path "/mirror" -o -path "/dos" \
          -o -path "/var/spool" -o -path "/mnt" -o -path "/home" \) -prune \
          -o -perm /a+x -type f -print0 | xargs -0r file |\
    grep -E "dynamically linked|shared object" | cut -d: -f1 |\
    xargs readelf --symbols | grep -E "^File: | UND $1@" |\
    awk '/^File: / { header = $0; next }   # remember the current file name
         { if (header != "") { print header; header = "" }  # once, before its first match
           print }' | sed -r 's/[ \t]+/ /g' |\
    nl | fmt --split-only | pr | lpr

(OK, maybe this isn't the *best* example for didactic purposes in the
world. It searches most of my system for ELF executables and shared
libraries that reference a particular symbol, and lists which binaries
they are. It's useful for keeping up with ABI changes and for finding out
whether anyone really uses strfry() and the like.)
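The pipeline above uses nl without options. As a small standalone sketch, assuming POSIX nl defaults (6-column numbers followed by a TAB, with empty lines left unnumbered), here is how the default compares with -b a:

```shell
#!/bin/sh
# Default (-b t): only non-empty lines are numbered, so "three" is line 2.
printf 'one\n\nthree\n' | nl

# -b a: every line is numbered, blank or not, so "three" is line 3.
printf 'one\n\nthree\n' | nl -b a
```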

> paste

Often used with join, apparently, but I'll admit I've never got into
the habit of using them.
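For what it's worth, here is a minimal sketch of two common paste idioms, assuming a POSIX paste. Note that paste pairs lines by position, whereas join(1) pairs them by a shared key field.

```shell
#!/bin/sh
# Put each pair of input lines side by side, TAB-separated: every "-"
# argument takes the next line from standard input in turn.
printf 'name\nAlice\nage\n42\n' | paste - -

# Serial mode (-s) joins all lines of one input with "+", giving an
# expression that shell arithmetic can then evaluate.
sum_expr=$(printf '1\n2\n3\n4\n5\n' | paste -s -d + -)
echo "$sum_expr = $(( $sum_expr ))"
```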

> tr

This one is really useful. Examples (all assuming LC_COLLATE="C"):

echo "$something" | tr '[a-z]' '[A-Z]'  # converts to uppercase

tr -d '\015' < something  # proof that `dos2unix' is a pointless command

GNU tr also accepts NUL characters in its sets, which can be decidedly
useful when GNU find has to feed commands that can't read NUL-separated
input:

find ... -print0 | tr "\0" "\n" | ... # or something other than \n

(the common case for find is feeding xargs, of course, and GNU xargs
supports -0, so all is well there.)
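Here is a small runnable version of the tr examples above. It forces the C locale, as the text assumes. The brackets in `[a-z]' are only required by old System V tr, and POSIX tr simply maps them to themselves, so plain ranges are used here. The NUL conversion assumes a tr, such as GNU's, that accepts `\0'.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL            # ranges like a-z are only reliable in C

# Upper-case a string.
echo "hello, world" | tr 'a-z' 'A-Z'

# Strip DOS carriage returns (octal 015) from CRLF text.
printf 'line1\r\nline2\r\n' | tr -d '\015'

# Turn NUL-separated names (as from find -print0) into one per line.
printf 'a b\0c\0' | tr '\0' '\n'
```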

> cut

Lots of uses: the scriptage above uses it to transform a stream of
things like

/lib/ld-2.3.4.so: ELF 32-bit MSB shared object, SPARC32PLUS, V8+ Required, Sun UltraSPARC1 Extensions Required, version 1 (SYSV), not stripped

into

/lib/ld-2.3.4.so

after grepping them for `shared object' or `dynamically linked'.
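Here is a minimal sketch of the same cut usage on single sample lines. The passwd-style line is a made-up example. Note that a file name which itself contains a colon would be cut short.

```shell
#!/bin/sh
line='/lib/ld-2.3.4.so: ELF 32-bit MSB shared object, SPARC32PLUS, V8+ Required'

# -d: sets the field delimiter; -f1 keeps everything before the first colon.
echo "$line" | cut -d: -f1

# Several fields can be picked at once; the delimiter is kept between them.
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f1,7
```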

> fmt

See above :)

> expand

Useful for the poor deprived vi user to convert tabs to spaces :) The
rest of us use `M-x untabify', of course. :)
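Here is a tiny sketch, assuming POSIX expand. Each TAB becomes enough spaces to reach the next tab stop, which falls every 8 columns by default or wherever -t puts it.

```shell
#!/bin/sh
# "a" sits in column 0, so the TAB becomes 7 spaces (next stop: column 8).
printf 'a\tb\n' | expand

# With stops every 4 columns, the same TAB becomes 3 spaces.
printf 'a\tb\n' | expand -t 4
```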

> pr

(See above.)

> cut

(What, again? ;) )

> tac

I've used this one to cater for the fact that `tail' allows the syntax
`+N' (nowadays spelled `-n +N') to cut off the first N-1 lines of a
file, but `head' doesn't have an analogous way to cut off the end of a
file. Hence:

... | tac | tail -n +2 | tac | ...

cuts off the last line of a file. (You could do exactly the same thing
with sed, or with awk.)
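Here is a sketch of the equivalents mentioned above, each dropping the last line of a three-line stream. tac is part of GNU coreutils. The last line relies on GNU head accepting a negative count ("all but the last N lines"), which later versions do.

```shell
#!/bin/sh
printf '1\n2\n3\n' | tac | tail -n +2 | tac    # reverse, drop first, reverse
printf '1\n2\n3\n' | sed '$d'                  # delete the last line
printf '1\n2\n3\n' | awk 'NR > 1 { print prev } { prev = $0 }'  # lag by one
printf '1\n2\n3\n' | head -n -1                # GNU head only
```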

I use it in this appallingly ugly hack of a script, which I run out of
cron to delete old packages downloaded by CPAN.pm but not removed (note
that this is only safe when run over non-publicly-writable hierarchies
because I didn't bother to use find -print0):

#!/bin/ksh
#
# Clean up the junk from the CPAN directories.
#

CPAN=/usr/packages/perl/modules/CPAN

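integer NUMSLASHES

# Slashes in $CPAN (tr -cd keeps only the slashes, so unlike echo no
# trailing newline gets counted), plus the six in /authors/id/A/AB/AUTHOR/file.
NUMSLASHES=$(printf '%s' "$CPAN" | tr -cd / | wc -c)+6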

# This for loop's word list is the names of modules that have been
# downloaded more than once, minus their version numbers and paths.
# (It'll break with a module named e.g. Foo-77.Men-Alive-0.22, but I
# think we can cope with that.) uniq -d prints one copy of each line
# that occurs more than once, however many times that is.

for name in `find $CPAN/authors/id -type f -name "*.tar.gz" |\
             sed 's,^.*/\([^/]*\)$,\1,; s/^\(.*\)-[0-9]*\..*$/\1/' |\
             sort | uniq -d`; do

    # We now have a name. Find all perl modules that match that name,
    # sort them by basename, and tear the last line off. Zap everything
    # else. (The sort is lexical, so e.g. 0.9 counts as newer than 0.10.)

    find $CPAN/authors/id -type f -regex '^.*/'$name'-[0-9.]*\.tar\.gz$' |\
        sort -t / -k $(($NUMSLASHES+1)) | tac | tail -n +2 | tac | xargs rm -f
done
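Here is a sketch of what the script's two stages do, run over hypothetical paths so it doesn't touch a real CPAN tree. It also shows a caveat of the plain sort: version numbers compare as text, so the "newest" tarball that is kept can be the wrong one. GNU sort -V is one way round that.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
# Hypothetical paths laid out like $CPAN/authors/id/A/AB/AUTHOR/file.
paths='/cpan/authors/id/A/AB/ABC/Foo-Bar-0.21.tar.gz
/cpan/authors/id/A/AB/ABC/Foo-Bar-0.22.tar.gz
/cpan/authors/id/X/XY/XYZ/Baz-1.00.tar.gz'

# Stage 1 (the for loop's word list): strip the directory, then the
# "-VERSION..." suffix, and keep only names seen more than once.
printf '%s\n' "$paths" |
    sed 's,^.*/\([^/]*\)$,\1,; s/^\(.*\)-[0-9]*\..*$/\1/' |
    sort | uniq -d                                 # -> Foo-Bar

# Stage 2 (inside the loop): sort one module's tarballs and drop the last
# line, leaving the list to delete. sed '$d' stands in for tac|tail|tac.
printf '%s\n' Foo-Bar-0.22.tar.gz Foo-Bar-0.21.tar.gz |
    sort | sed '$d'                                # -> Foo-Bar-0.21.tar.gz

# The lexical pitfall: 0.10 sorts before 0.9, so the newer tarball would
# be deleted. GNU sort -V compares version numbers instead.
printf '%s\n' Foo-0.9.tar.gz Foo-0.10.tar.gz | sort | sed '$d'     # -> Foo-0.10.tar.gz
printf '%s\n' Foo-0.9.tar.gz Foo-0.10.tar.gz | sort -V | sed '$d'  # -> Foo-0.9.tar.gz
```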

> split

Another one I've never really used.

One I use all the time in high-performance shell scripting (OK, OK, stop
laughing) is comm(1). The trick when writing fast shell scripts is to do
*nothing* with loops: each iteration typically forks one or more external
commands, and forking is slow. Instead, manipulate lists (maintained as
LF-separated files in a temporary directory under /tmp) with head, tail,
and these functions:

#
# filter_out REMOVE FROM
#
# Remove the lines that appear in REMOVE from the file FROM, and echo the
# result on standard output. `-', meaning standard input, can be used for
# either REMOVE or FROM. Both lists must be sorted.
#
# This is a simple wrapper around `comm': -1 suppresses lines only in
# REMOVE and -3 suppresses lines in both, leaving lines only in FROM.
filter_out()
{
    comm -13 "$1" "$2"
}

#
# matching A B
#
# Emit the lines that the sorted lists A and B have in common.
#
# This is a simple wrapper around `comm': -1 and -2 suppress the lines
# unique to each list, leaving only the common ones.
matching()
{
    comm -12 "$1" "$2"
}

and do everything else with sed and tr and friends. Never use loops.

(Obviously, this is in situations when I don't have perl available.)
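Here is a minimal sketch of that list style, using the two wrappers on hypothetical sorted lists in a scratch directory from mktemp (neither the lists nor the directory come from the original scripts). Set difference and intersection replace what would otherwise be a loop running a grep per item.

```shell
#!/bin/sh
# The wrappers again, so this sketch runs on its own.
filter_out() { comm -13 "$1" "$2"; }
matching()   { comm -12 "$1" "$2"; }

# Hypothetical lists, kept as sorted LF-separated files.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' apple banana cherry date > "$tmp/all"
printf '%s\n' banana date > "$tmp/done"

filter_out "$tmp/done" "$tmp/all"              # still to do: apple, cherry
matching   "$tmp/done" "$tmp/all"              # done: banana, date
printf '%s\n' banana | filter_out - "$tmp/all" # "-" reads standard input
```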

You probably also want to read the chapter `Portable Shell Programming'
in the Autoconf manual, particularly the section on `Limitations of Shell
Builtins'. Not everything it says necessarily matters (e.g. I've never
seen a shell on which ! doesn't work), but a lot of it is worth bearing
in mind, like the `echo "x$foo" | cut -c 2- | ...' convention, which
stops echo from mistaking a value such as `-n' for an option.
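Here is a small illustration of that convention, using a value that an echo builtin may take as an option. The x prefix only guards against leading dashes. It does not protect against backslash escapes, which some echo implementations still interpret. printf '%s\n' avoids both problems.

```shell
#!/bin/sh
foo='-n'   # would be swallowed as an option by many echo builtins

# Leading "x" means echo never sees "-n" as its first argument;
# cut -c 2- then strips the "x" off again.
echo "x$foo" | cut -c 2-

# printf prints its argument verbatim, without the extra process.
printf '%s\n' "$foo"
```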

> I remember Bruce's example of using netcat with a borked laptop - it's
> always so helpful when you can give a real, useful example of a tool
> rather than just giving a dry theoretical example.

True enough. Well, there are a few real examples above, and some
didactic wibbling.

-- 
> ...Hires Root Beer...
What we need these days is a stable, fast, anti-aliased root beer
with dynamic shading. Not that you can let just anybody have root.
 --- John M. Ford
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug