[Nottingham] shell script guru rqd.
Mr Alan Carter
nottingham at mailman.lug.org.uk
Wed Jan 15 02:06:01 2003
> I have a large list of publications. The problem is that the order they
> are in is the inverse to what I want (the oldest are first). The other
> major problem is the date is not in the same place for each article e.g.
>
> Bloggs Joe, The Art Picking Your Nose, Journal of Useless Things, 1975
>
> Blair Tony, How to Make Friends and Influence People, 1997 Labour
> Manifestation pp 69 - 79.
The moving position of the dates in Matt's problem is not best
addressed by a shell script, because a) shell is not good at string
handling, b) the input data may contain characters shell is sensitive
to, like ", c) he may believe he has a rigorous scheme for delimiting
fields in his references, but you can bet there will be all sorts of
doubles, tabs, spaces and even weird unprintables in there, d) there
are a couple of messy aspects of the necessary loops that significantly
increase the size and complexity of the script.
In ye olde UNIX model the solution is to scan through the file with a
tiny C filter program, find the year fields and output each line with
the year as the first field. Then shell can be used to sort the lines
by year, and strip off the first field. The C program looks like this:
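#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strcpy, strtok */

#define MAXLINE 1024
#define START 1950    /* earliest value accepted as a year */
#define END 2003      /* latest value accepted as a year */

int main(void)
{
    int Year;
    char *Pointer;
    char Buffer[MAXLINE + 1];
    char Scratch[MAXLINE + 1];

    while(fgets(Buffer, sizeof Buffer, stdin))
    {
        Year = 1900;   /* default for lines with no year in range */

        /* strtok modifies its input, so tokenise a copy */
        strcpy(Scratch, Buffer);
        Pointer = strtok(Scratch, " \t\n");
        while(Pointer)
        {
            /* the last token in range wins */
            if(atoi(Pointer) >= START &&
               atoi(Pointer) <= END)
                Year = atoi(Pointer);
            Pointer = strtok(NULL, " \t\n");
        }

        /* Buffer still ends with its newline */
        printf("%d %s", Year, Buffer);
    }

    return 0;
}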
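The atoi test above accepts a token like "1975," but misses "(1975)", because atoi gives up at the first non-digit. If the references turn out to have years wrapped in brackets, a slightly stricter check could be dropped in. This is only a sketch: token_year is a made-up helper name, not part of the program above, and the "exactly four digits" rule is an assumption about what a year looks like in the data.

```c
#include <ctype.h>
#include <stdlib.h>

#define START 1950
#define END   2003

/* Hypothetical helper (not in the original program): return the year
   held in a token such as "1975", "(1997)" or "1975.", or 0 if the
   token does not hold a year between START and END. */
int token_year(const char *token)
{
    char *end;
    long value;

    /* Skip any leading non-digit characters, e.g. an opening bracket. */
    while (*token && !isdigit((unsigned char)*token))
        token++;

    value = strtol(token, &end, 10);

    /* Assumption: a year is exactly four digits, and is not glued to
       further letters or digits (so "19975" and "1975abc" are rejected). */
    if (end - token != 4 || isalnum((unsigned char)*end))
        return 0;

    return (value >= START && value <= END) ? (int)value : 0;
}
```

In the inner loop you would then call token_year(Pointer) once per token and keep any non-zero result in Year, instead of calling atoi three times.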
Just copy and paste it into a file called years.c, and compile by
saying:
$ make years
Then just pipe the references through years, pipe the years
output through sort(1) and pipe the results of that through a little
command line loop that reads each line into two variables - one to be
thrown away and the other that contains the original line, like this:
$ cat refs | ./years | sort -rn | while read -r junk data
> do
> printf '%s\n' "$data"
> done
You could do this with awk or even perl just on the command line, both
of which can be smart with string handling, delimiters and control flow,
but in both cases the funny characters problem would remain, and if you
used perl you'd have to spend several hours debugging and ritually
washing afterwards ;-)
Alan
--