[Nottingham] AWK script

Camilo Mesias camilo at mesias.co.uk
Thu Mar 24 23:29:57 UTC 2011


Ah, that's interesting,

> The number of fields is 1 more than the number of fields in the record
> maybe that is the blank line separating the records. I have no idea
> why the first field
>
> <U+FEFF>Template-Type: ReDIF-Paper is split into 3 fields and
> separated from the rest of the record. The other records all appear
> normal

I think the for loop condition should be i <= NF; then you will see all
the fields. We're all used to 0-based counting, where you loop while
x < NF, but awk fields are 1-based, so the last field is $NF.
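
As a small illustration of the off-by-one (the three-line record is made up, and RS/FS are set in a BEGIN block only to keep the example self-contained), this counts how many fields each loop condition visits:

```sh
# Made-up record with three fields, one per line.
out=$(printf 'Author-Name: A\nTitle: T\nNumber: 1\n' |
  awk 'BEGIN { RS = ""; FS = "\n" }
       {
         for (i = 1; i <  NF; i++) lt++   # stops before the last field
         for (i = 1; i <= NF; i++) le++   # includes the last field
         print "NF=" NF, "i<NF visits " lt, "i<=NF visits " le
       }')
printf '%s\n' "$out"
```

With three fields, the i<NF loop should visit only two of them.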

> fields = 3
> 1 is <U+FEFF>Template-Type:
> 2 is ReDIF-Paper

Yes that is odd!

> fields = 12
> 1 is Author-Name: Harry R Clarke
> 2 is Author-Name-First: Harry
> 3 is Author-Name-Last: Clarke
> 4 is Author-Workplace-Name: School of Economics and Finance, La Trobe University
> 5 is Author-Name: William J. Reed
> 6 is Author-Name-First: William
> 7 is Author-Name-Last: Reed
> 8 is Author-Workplace-Name: School of Economics and Finance, La Trobe University
> 9 is Title: The Tree-Cutting Problem in a Stochastic Environment: The
> case of Age—Dependent Growth
> 10 is Creation-Date: 1989
> 11 is Number: 1989.01
>
>
>> If that's working well then the logic of the program looks a bit off,
>> I think there are too many for loops and most of the second part of
>> the script wants to be inside the 'if' that checks for the name of the
>> author... but that could be me misunderstanding the intentions of the
>> script.
>
> You are probably right. I am just learning to use AWK so my ideas
> maybe a bit off. I will try to explain what I think the script should
> do so you can understand what I am trying to do and it might clarify
> it a bit for me too.
>
> {RS=""};{FS="\n"}; Sets record separator equals blank line and Field
> separator equals newline.

OK
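
One guess at the odd first record, which is my assumption rather than anything confirmed in this thread: {RS=""} and {FS="\n"} written as ordinary actions only run after the first record has already been read, so that record would be a single line split on whitespace with the default separators. Setting both in a BEGIN block avoids that. A sketch with made-up data (the "1.0" after ReDIF-Paper is assumed, not taken from your file):

```sh
# Made-up two-record file in the style of the output quoted above.
out=$(printf 'Template-Type: ReDIF-Paper 1.0\nAuthor-Name: A\nTitle: T\n\nTemplate-Type: ReDIF-Paper 1.0\nAuthor-Name: B\nTitle: U\n' |
  awk 'BEGIN { RS = ""; FS = "\n" }   # runs once, before any input is read
       { print NF " fields, first is " $1 }')
printf '%s\n' "$out"
```

Both records should then report three fields, with the whole Template-Type line as the first field.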

> {for(i=1;i<NF;i++){if($i ~ name ){print $i} finds and prints the first
> instance of name in the file but it only works if the name is in the
> very first record in the file I don't know why
>
> break}} breaks out of the loop so we don't go onto the next record.

I think there is an implied loop around the whole of the awk script:
it is applied to each record in the file. Your break also comes after
the if statement, so it runs whether or not the if part is true, and
the for loop only ever tests the first field of each record.
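
A rough sketch of the difference the break placement makes, using a made-up record where the name sits in the second field (the -v name=... option and the BEGIN block are my additions to make it runnable):

```sh
# Made-up record where the name is in the second field, not the first.
sample() { printf 'Title: T\nAuthor-Name: Harry R Clarke\n'; }

# break after the if: the loop always stops after testing field 1.
after=$(sample | awk -v name='Clarke' 'BEGIN { RS = ""; FS = "\n" }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ name) { print $i }
      break
    }
  }')

# break inside the if: the loop stops only once the name has matched.
inside=$(sample | awk -v name='Clarke' 'BEGIN { RS = ""; FS = "\n" }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ name) { print $i; break }
    }
  }')

printf 'break after if:  [%s]\nbreak inside if: [%s]\n' "$after" "$inside"
```

The first variant should print nothing for this record, while the second finds the Author-Name line.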

> {for(i=1;i<NF;i++) initializes a new loop to scan over all the fields
> in the current record.
>
> {if($i ~ name) searches for the author name previously found
>
> {{for(i=1;i<NF;i++){if($i ~ /Title:/) if the previous search found
> author name it starts another new loop and scans over all the fields
> in the record searching for Title:
>
> {print $i }}}}}}' Prints the titles of the papers written by our
> chosen author, go to next record, return to second for loop.

OK, I follow that now - it's not how I would have written it but it
should work, although reusing the loop variable i in the nested loops
might cause confusion, since awk variables are global.
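
To show what I mean about the loop variable, here is a sketch on a made-up three-field record. It counts outer-loop iterations when the inner loop reuses i, and again when the inner loop has its own variable j:

```sh
out=$(printf 'Author-Name: A\nTitle: T\nNumber: 1\n' |
  awk 'BEGIN { RS = ""; FS = "\n" }
       {
         # Inner loop reuses i, so it leaves i past NF and the outer loop ends early.
         for (i = 1; i <= NF; i++) {
           shared++
           for (i = 1; i <= NF; i++) n++
         }
         # Inner loop uses j, so the outer loop visits every field.
         for (i = 1; i <= NF; i++) {
           separate++
           for (j = 1; j <= NF; j++) n++
         }
         print "shared=" shared, "separate=" separate
       }')
printf '%s\n' "$out"
```

With a shared variable the outer loop body should run only once per record.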

I think I would have written something like:

{ # in this record, look for the name we want
 for(i=1;i<=NF;i++)
 {
  if ($i ~ name)
  {
   print "Found " $i
   # since the name is right, print the titles in this record
   for(j=1;j<=NF;j++)
   {
     if ($j ~ /Title:/)
     {
      print $j
     }
   }
  # finished with this record, no need to continue looking for the
  # name in this record
  break
  }
 }
}
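
To try that block end to end, one way is to wrap it in a shell command. This is only a sketch: the sample records are made up, the name is passed in with -v, and RS/FS are set in BEGIN, none of which comes from the original script.

```sh
# Two made-up records separated by a blank line.
out=$(printf 'Author-Name: Harry R Clarke\nTitle: The Tree-Cutting Problem\n\nAuthor-Name: Someone Else\nTitle: Another Paper\n' |
  awk -v name='Clarke' 'BEGIN { RS = ""; FS = "\n" }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ name) {
        print "Found " $i
        # the name matched, so print the titles in this record
        for (j = 1; j <= NF; j++)
          if ($j ~ /Title:/) print $j
        break   # finished with this record
      }
    }
  }')
printf '%s\n' "$out"
```

Only the first record should produce output, since the second has a different author.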


