[Gllug] Stories of using filters
Nix
nix at esperi.org.uk
Fri Mar 18 16:43:34 UTC 2005
On Thu, 17 Mar 2005, Steve Nelson prattled cheerily:
> Would people like to contribute some real-life examples of using:
>
> od
Whenever you want to see WTF is going on with some binary file :)
These days there are nicer binary viewers, but there are few as
configurable.
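A minimal sketch of the sort of thing od shows at a glance: invisible bytes such as a DOS carriage return. The helper name `show_bytes' is made up for illustration; `-An' suppresses the offset column and `-c' prints each byte as a character or a named escape.

```shell
#!/bin/sh
# show_bytes: hypothetical helper that dumps stdin one byte per column,
# printing escapes such as \r and \n for control characters.
show_bytes() {
    od -An -c
}

# A CRLF-terminated line: the \r that an editor would hide shows up plainly.
printf 'hi\r\n' | show_bytes
```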
> nl
Again, this one tends to be used at the *ends* of pipelines, generally
just before pr(1) and fmt(1), e.g. something I did last week (reflowed,
some of the linefeeds may be in the wrong place):
#!/bin/ksh
# find-symbol-users SYMBOL: list the ELF binaries and shared libraries
# that import SYMBOL, grouped under readelf's "File:" headers.
[[ $# -ne 1 ]] && echo "Syntax: find-symbol-users symbol" >&2 && exit 1
# -perm /a+x: any execute bit set (older GNU find spelled this -perm +a+x)
find / \( -fstype nfs -o -fstype proc -o -path "/mirror" -o -path "/dos" \
          -o -path "/var/spool" -o -path "/mnt" -o -path "/home" \) -prune \
       -o -perm /a+x -type f -print0 | xargs -0r file |\
    grep -E "dynamically linked|shared object" | cut -d: -f1 |\
    xargs -r readelf --symbols | grep -E "^File: | UND $1@" |\
    awk 'BEGIN { header = 0; oldline = "" }
         /^File: / { header = 1; oldline = $0 }
         !/^File: / {
             # print a File: header only if a matching symbol follows it
             if (header == 1) { header = 0; print oldline }
             print $0
         }' | sed -r 's/[ \t]+/ /g' |\
    nl | fmt --split-only | pr | lpr
(OK, maybe this isn't the *best* example in the world for didactic
purposes; it searches most of my system for ELF executables and shared
libraries that reference a particular symbol, and reports which binaries
they are. Useful for keeping up with ABI changes and for figuring out
whether anyone really uses strfry() and the like.)
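nl can of course be tried on its own. A minimal sketch, using only POSIX nl options; the helper name `number_lines' is hypothetical.

```shell
#!/bin/sh
# number_lines: hypothetical helper.
#   -b a   number every line, blank ones included (the default, -b t,
#          skips blank lines)
#   -w 3   use a three-column, right-justified number field
#   -s ': ' put ': ' between number and text instead of a tab
number_lines() {
    nl -b a -w 3 -s ': '
}

printf 'first\n\nthird\n' | number_lines
```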
> paste
Often used with join, apparently, but I'll admit I've never got into
the habit of using them.
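For anyone who does want to get into the habit, a minimal sketch of the two on made-up data. paste glues files together line by line (or, with `-s', folds one file onto a single line), while join merges two files on a common key field, both of which must be sorted on that field. The helper names are hypothetical.

```shell
#!/bin/sh
# comma_join: fold all lines of stdin into one comma-separated line.
comma_join() {
    paste -s -d , -
}

# side_by_side A B: line N of A and line N of B, tab-separated.
side_by_side() {
    paste "$1" "$2"
}

# merge_on_key A B: join A and B on their first field; both inputs
# must be sorted on that field in the same collation order.
merge_on_key() {
    join "$1" "$2"
}

printf 'alice\nbob\ncarol\n' | comma_join     # alice,bob,carol
```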
> tr
This one is really useful. Examples (all assuming LC_COLLATE="C"):
echo "$something" | tr '[a-z]' '[A-Z]' # converts to uppercase
tr -d '\015' < something # proof that `dos2unix' is a pointless command
GNU tr also accepts NUL characters (written \0) in its character sets,
which can be decidedly useful when GNU find has to feed a command that
can't read null-separated argument lists:
find ... -print0 | tr "\0" "\n" | ... # or something other than \n
(the common case for find is feeding xargs, of course, and GNU xargs
supports -0, so all is well there.)
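The tr tricks above, wrapped up as a minimal sketch. The helper names are hypothetical; `[:lower:]' and `[:upper:]' are the POSIX character-class spelling of the a-z to A-Z mapping, and `\015' and `\000' are octal escapes for carriage return and NUL.

```shell
#!/bin/sh
# The examples above assume the C locale.
LC_ALL=C
export LC_ALL

# upcase: convert lower case to upper case.
upcase() {
    tr '[:lower:]' '[:upper:]'
}

# strip_cr: dos2unix-style removal of carriage returns.
strip_cr() {
    tr -d '\015'
}

# nul_to_lf: turn a NUL-separated list (e.g. from find -print0) into
# newline-separated lines for tools that cannot read NULs.
nul_to_lf() {
    tr '\000' '\n'
}

printf 'hello\r\n' | strip_cr | upcase     # HELLO
printf 'a\000b\000' | nul_to_lf
```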
> cut
Lots of uses: the scriptage above uses it to transform a stream of
things like
/lib/ld-2.3.4.so: ELF 32-bit MSB shared object, SPARC32PLUS, V8+ Required, Sun UltraSPARC1 Extensions Required, version 1 (SYSV), not stripped
into
/lib/ld-2.3.4.so
after grepping them for `shared object' or `dynamically linked'.
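That cut step in isolation, as a minimal sketch; `first_field' is a made-up name. `-d :' splits each line on colons and `-f 1' keeps the text before the first one, so a file name that itself contained a colon would be truncated.

```shell
#!/bin/sh
# first_field: hypothetical helper keeping everything before the first colon.
first_field() {
    cut -d : -f 1
}

printf '%s\n' '/lib/ld-2.3.4.so: ELF 32-bit MSB shared object, SPARC32PLUS' |
    first_field     # /lib/ld-2.3.4.so
```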
> fmt
See above :)
> expand
Useful for the poor deprived vi user to convert tabs to spaces :) The
rest of us use `M-x untabify', of course. :)
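For the vi users, a minimal sketch; `detab' is a hypothetical name, and `-t 4' sets tab stops every four columns instead of the default eight.

```shell
#!/bin/sh
# detab: hypothetical helper replacing tabs with spaces, stops every 4 columns.
detab() {
    expand -t 4
}

printf 'a\tb\n' | detab     # "a   b": the tab after column 1 becomes 3 spaces
```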
> pr
(See above.)
> cut
(What, again? ;) )
> tac
I've used this one to cater for the fact that `tail' accepts `+{n}'
(spelled `-n +{n}' in POSIX) to cut the first {n-1} lines off a file,
but POSIX `head' has no analogous way to cut lines off the end. Hence:
... | tac | tail -n +2 | tac | ...
cuts off the last line of a file. (You could do exactly the same thing
with sed, or with awk.)
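The three variants side by side, as a minimal sketch with made-up function names. Note that tac is a GNU coreutils tool rather than a POSIX one; sed's `$d' deletes the last line, and the awk version prints each line only once it knows another line follows.

```shell
#!/bin/sh
# Three hypothetical helpers that all drop the last line of stdin.
drop_last_tac() {
    tac | tail -n +2 | tac
}

drop_last_sed() {
    sed '$d'
}

drop_last_awk() {
    # print the previous line whenever a new one arrives,
    # so the final line is never printed
    awk 'NR > 1 { print prev } { prev = $0 }'
}

printf 'a\nb\nc\n' | drop_last_sed     # a, b
```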
I use it in this appallingly ugly hack of a script, which I run out of
cron to delete old packages downloaded by CPAN.pm but never removed (note
that this is only safe when run over non-publicly-writable hierarchies,
because I didn't bother to use find -print0):
#!/bin/ksh
#
# Clean up the junk from the CPAN directories.
#
CPAN=/usr/packages/perl/modules/CPAN
integer NUMSLASHES
NUMSLASHES=`echo $CPAN | sed 's,[^/],,g' | wc -c`+6
# This for condition gives us a list of module names of dupped modules,
# absent their version number and path. (It'll break with a module
# named e.g. Foo-77.Men-Alive-0.22, but I think we can cope with that.)
# `uniq -d' prints one copy of each name that occurs more than once.
for name in `find $CPAN/authors/id -type f -name "*.tar.gz" |\
sed 's,^.*/\([^/]*\)$,\1,; s/^\(.*\)-[0-9]*\..*$/\1/' |\
sort | uniq -d`; do
# We now have a name. Find all perl modules that match that name,
# sort them by basename, and tear the last line off. Zap everything
# else.
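# NUMSLASHES already equals the basename's field number when splitting
# on /: the wc -c above also counted echo's trailing newline, which
# stands in for the empty field before the leading /.
find $CPAN/authors/id -type f -regex '^.*/'$name'-[0-9.]*\.tar\.gz$' |\
sort -t / -k $NUMSLASHES | tac | tail -n +2 | tac | xargs rm -f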
done
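The name-extraction step of that loop can be checked on its own with a few made-up paths. This is a minimal sketch: `module_names' is a hypothetical wrapper around the same two sed expressions, and `uniq -d' prints one copy of each repeated line, which picks out the modules present in more than one version.

```shell
#!/bin/sh
# module_names: strip the directory, then the version suffix.
module_names() {
    sed 's,^.*/\([^/]*\)$,\1,; s/^\(.*\)-[0-9]*\..*$/\1/'
}

printf '%s\n' \
    /cpan/authors/id/A/AB/ABC/Foo-Bar-0.21.tar.gz \
    /cpan/authors/id/A/AB/ABC/Foo-Bar-0.22.tar.gz \
    /cpan/authors/id/X/XY/XYZ/Baz-1.0.tar.gz |
    module_names | sort | uniq -d     # Foo-Bar
```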
> split
Another one I've never really used.
One I use all the time in high-performance shell scripting (OK, OK, stop
laughing) is comm(1). The trick when writing fast shell scripts is to do
*nothing* with loops: a loop that runs an external command on every
iteration forks a new process each time round, which is slow. Instead,
manipulate lists (maintained as LF-separated files in a temporary
directory under /tmp) with head, tail, and these functions:
#
# filter_out REMOVE FROM
#
# Remove the lines that appear in REMOVE from the file FROM, and echo the
# result on standard output. `-', meaning standard input, can be used for
# either REMOVE or FROM. Both lists must be sorted in the same order.
#
# This is a simple wrapper around `comm'.
filter_out()
{
comm -13 "$1" "$2"
}
#
# matching A B
#
# Emit the lines that the sorted lists A and B have in common.
#
# This is a simple wrapper around `comm'.
matching()
{
comm -12 "$1" "$2"
}
and do everything else with sed and tr and friends. Never use loops.
(Obviously, this is in situations when I don't have perl available.)
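The two wrappers can be exercised on throwaway lists. A minimal sketch: the function bodies repeat the ones above (with quoted arguments), and the temporary files and their contents are invented for illustration.

```shell
#!/bin/sh
filter_out() { comm -13 "$1" "$2"; }   # lines of $2 not in $1
matching()   { comm -12 "$1" "$2"; }   # lines common to $1 and $2

tmp=$(mktemp -d) || exit 1
printf '%s\n' apple banana cherry > "$tmp/all"    # already sorted
printf '%s\n' banana > "$tmp/done"

filter_out "$tmp/done" "$tmp/all"                 # apple, cherry
printf '%s\n' banana | matching - "$tmp/all"      # banana (- is stdin)

rm -rf "$tmp"
```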
You probably also want to read the chapter `Portable Shell Programming'
in the Autoconf manual, particularly the bit on `Limitations of Shell
Builtins' --- not everything it says necessarily matters (e.g. I've
never seen a shell on which ! doesn't work) but a lot of it is worth
bearing in mind, like the `echo "x$foo" | cut -c 2- | ...' convention.
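That convention in a minimal sketch: the leading `x' guarantees echo never sees an argument starting with `-' (such as `-n'), and `cut -c 2-' strips it off again. The helper name is hypothetical, and it assumes a single-line value, since cut works per line.

```shell
#!/bin/sh
# safe_value: hypothetical helper that echoes $1 even if it looks like
# an echo option; assumes $1 is a single line.
safe_value() {
    echo "x$1" | cut -c 2-
}

safe_value -n     # prints -n, where a bare `echo -n' prints nothing
```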
> I remember Bruce's example of using netcat with a borked laptop - it's
> always so helpful when you can give a real, useful example of a tool
> rather than just giving a dry theoretical example.
True enough. Well, there are a few real examples above, and some
didactic wibbling.
--
> ...Hires Root Beer...
What we need these days is a stable, fast, anti-aliased root beer
with dynamic shading. Not that you can let just anybody have root.
--- John M. Ford
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug