[Nottingham] AWK script

stripes stripes at roppa.demon.co.uk
Thu Mar 24 23:00:23 UTC 2011


Hi Cam,

> are you confident of Awk's behaviour splitting the file into records
> and fields? If you ran a simple script along the lines of
>
> { print "fields = " NF ; for (i=1; i<NF; i++){ print i " is " $i } }
>
> Does it print what you expect?

Not even close, but this variant of it does:
{RS=""};{FS="\n"}; { print "fields = " NF ; for (i=1; i<NF; i++){ print i " is " $i } }

This is the first field and the script will always find and print the
Author-Name: for whatever name is entered here.
The string <U+FEFF> is the byte order mark (BOM) for the file. As the
file is UTF-8 it should not be needed, but RePEc won't process accented
characters without it.
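
If the BOM turns out to get in the way of matching the first line, one option is to leave it in the file for RePEc and strip it only inside awk. This is a minimal sketch, assuming the BOM is the three UTF-8 bytes EF BB BF at the very start of the input; the one-record sample and the variable name bom_out are made up for illustration.

```shell
# Hypothetical one-record sample that starts with a UTF-8 BOM (bytes EF BB BF).
bom_out=$(printf '\357\273\277Template-Type: ReDIF-Paper\nAuthor-Name: Harry R Clarke\n' |
  LC_ALL=C awk 'BEGIN { RS = ""; FS = "\n" }
  # Strip the BOM from the first record only; changing $0 re-splits the fields.
  NR == 1 { sub(/^\357\273\277/, "") }
  { for (i = 1; i <= NF; i++) print i " is " $i }')
printf '%s\n' "$bom_out"
```

LC_ALL=C is there so that awk treats the input as plain bytes, which keeps the octal escapes in the pattern unambiguous.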

The reported number of fields is 1 more than the number of fields in
the record; maybe the extra one is the blank line separating the
records. I have no idea why the first line,

<U+FEFF>Template-Type: ReDIF-Paper

is reported as 3 fields and separated from the rest of the record. The
other records all appear normal.

fields = 3
1 is <U+FEFF>Template-Type:
2 is ReDIF-Paper
fields = 12
1 is Author-Name: Harry R Clarke
2 is Author-Name-First: Harry
3 is Author-Name-Last: Clarke
4 is Author-Workplace-Name: School of Economics and Finance, La Trobe University
5 is Author-Name: William J. Reed
6 is Author-Name-First: William
7 is Author-Name-Last: Reed
8 is Author-Workplace-Name: School of Economics and Finance, La Trobe University
9 is Title: The Tree-Cutting Problem in a Stochastic Environment: The
case of Age—Dependent Growth
10 is Creation-Date: 1989
11 is Number: 1989.01
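
A possible explanation for both oddities, offered as a sketch rather than a diagnosis: RS and FS are assigned inside an ordinary action, and awk only runs that action after it has already read the first record with its default separators (one line per record, fields split on whitespace). That would account for the Template-Type: line arriving on its own, split at the space. The loop test i<NF also stops one short, so the last field of every record is counted but never printed. The version below sets the separators in a BEGIN block and loops with i<=NF. The sample data and the variable names are hypothetical, and the sample has no BOM.

```shell
# Hypothetical two-record sample modelled on the output above (no BOM).
sample='Template-Type: ReDIF-Paper
Author-Name: Harry R Clarke
Title: First sample title

Template-Type: ReDIF-Paper
Author-Name: William J. Reed
Title: Second sample title'

# Separators set in BEGIN apply from the first record onwards;
# i <= NF makes the loop include the last field.
fixed=$(printf '%s\n' "$sample" | awk 'BEGIN { RS = ""; FS = "\n" }
{ print "fields = " NF; for (i = 1; i <= NF; i++) print i " is " $i }')
printf '%s\n' "$fixed"
```

With this sample each record should report 3 fields and print all 3 of them, with Template-Type: as field 1 of its own record.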


> If that's working well then the logic of the program looks a bit off,
> I think there are too many for loops and most of the second part of
> the script wants to be inside the 'if' that checks for the name of the
> author... but that could be me misunderstanding the intentions of the
> script.

You are probably right. I am just learning to use AWK, so my ideas
may be a bit off. I will try to explain what I think the script should
do, so you can understand what I am trying to do; it might clarify
it a bit for me too.

{RS=""};{FS="\n"}; sets the record separator to a blank line and the
field separator to a newline.

{for(i=1;i<NF;i++){if($i ~ name ){print $i} finds and prints the first
instance of name in the file, but it only works if the name is in the
very first record in the file. I don't know why.

break}} breaks out of the loop so we don't go on to the next record.
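
One thing that may explain this: as written, break sits after the if rather than inside it, so the loop stops after testing only the first field of each record, and break leaves the loop but does not stop awk from moving on to the next record. A minimal sketch of the intended behaviour follows, assuming the goal is to print the first matching Author-Name: line once. The flag found, the sample data and the name Clarke are all hypothetical.

```shell
# Hypothetical sample: the author appears in both records, not as field 1.
sample='Template-Type: ReDIF-Paper
Author-Name: Harry R Clarke
Title: First sample title

Template-Type: ReDIF-Paper
Author-Name: Harry R Clarke
Author-Name: William J. Reed
Title: Second sample title'

first=$(printf '%s\n' "$sample" | awk -v name='Clarke' 'BEGIN { RS = ""; FS = "\n" }
# "found" is a hypothetical flag: once it is set, later records skip this rule.
!found {
  for (i = 1; i <= NF; i++)
    if ($i ~ /^Author-Name:/ && $i ~ name) {
      print $i
      found = 1
      break        # inside the if, so the loop stops only after a match
    }
}')
printf '%s\n' "$first"
```

Anchoring the test on /^Author-Name:/ keeps Author-Name-First: and Author-Name-Last: lines from matching as well.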

{for(i=1;i<NF;i++) initializes a new loop to scan over all the fields
in the current record.

{if($i ~ name) searches for the author name previously found

{{for(i=1;i<NF;i++){if($i ~ /Title:/) if the previous search found the
author name, it starts another new loop and scans over all the fields
in the record, searching for Title:

{print $i }}}}}}' prints the titles of the papers written by our
chosen author, goes to the next record, and returns to the second for loop.
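
The steps above might be collapsed into a single rule per record: first decide whether the record belongs to the chosen author, then print its Title: fields. This is only a sketch under assumptions. The sample records, the variable wanted and the name Clarke are invented, and the two loops use separate counters (i and j) so that neither disturbs the other.

```shell
# Hypothetical three-record sample; only records 1 and 3 list Clarke.
sample='Template-Type: ReDIF-Paper
Author-Name: Harry R Clarke
Title: First sample title

Template-Type: ReDIF-Paper
Author-Name: William J. Reed
Title: Second sample title

Template-Type: ReDIF-Paper
Author-Name: Harry R Clarke
Author-Name: William J. Reed
Title: Third sample title'

titles=$(printf '%s\n' "$sample" | awk -v name='Clarke' 'BEGIN { RS = ""; FS = "\n" }
{
  wanted = 0                                   # reset for every record
  for (i = 1; i <= NF; i++)
    if ($i ~ /^Author-Name:/ && $i ~ name) wanted = 1
  if (wanted)
    for (j = 1; j <= NF; j++)
      if ($j ~ /^Title:/) print $j
}')
printf '%s\n' "$titles"
```

Because awk runs the rule once per record on its own, no explicit step is needed to move to the next record.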


At least that is what I want it to do. Most of it works, in that it
scans all the records and prints all the papers written by the chosen
author, but it doesn't print the Author-Name: field unless it is the
first author in the file. I assume this must be mixed up with the BOM
at the start of the file, but I can't see how.

Stripes



