[YLUG] Subtracting one list from another.

Zoe Stephenson zrs1 at york.ac.uk
Mon Mar 26 09:49:55 BST 2007


On Sun, Mar 25, 2007 at 04:38:07PM +0100, Gaffer wrote:
> On Sunday 25 March 2007 12:27, Gaffer inscribed thus:
> > Hi Guys,
> >
> > How can I subtract items from two lists where one list has items
> > that are not in the other list in order to get a list of those
> > items ?
> >
> > Sorry if that sounds confusing !
> >
> > I have tried   "diff -a g1.txt g2.txt > g3.txt"  but I end up with
> > a file that is as big as the sum of the g1 + g2 file sizes and
> > seems to have as entries that are not in either file.
> >
> > cmp just says the first line is different and stops.
> >
> > I'm obviously doing something wrong..........!!

[many cool suggestions]

> Thankyou very much for the help !  I got there in the end, or at least 
> somewhere close !

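You had a look at diff.  By way of extra explanation - diff gives you
a summary of the changes that would turn the first file you give it
into the second file.  If the two files have little in common, that
summary lists nearly every line of both, which is why your output was
about as big as g1 and g2 put together.  It also tells you the line
numbers, e.g.: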

2c2,4 tells you to change line 2 of the first file into lines 2-4 of
the second, and
3a6 tells you to add line 6 of the second file after line 3 of the
first.

Under each of those it lists the lines involved, marking lines to
remove (from the first file) with < and lines to add (from the second
file) with >.
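
To make the notation concrete, here is a small sketch that produces
exactly those two change commands.  The file contents (a, b, c and so
on) are made up for illustration; only the names g1.txt, g2.txt and
g3.txt come from the original question.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical lists: g2.txt replaces "b" with three lines and appends "d".
printf 'a\nb\nc\n' > g1.txt
printf 'a\nx\ny\nz\nc\nd\n' > g2.txt

# diff exits with status 1 when the files differ, so ignore that status.
diff g1.txt g2.txt > g3.txt || true
cat g3.txt
# prints:
#   2c2,4
#   < b
#   ---
#   > x
#   > y
#   > z
#   3a6
#   > d

# Keep only the lines diff wants removed from the first file.
sed -n 's/^< //p' g3.txt
# prints: b
```

The last command is a rough way to get "lines in g1 but not in g2"
out of diff.  It is only reliable when both lists are in the same
order (for example both sorted), because diff compares the files line
by line in sequence and will mark a line that has merely moved.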

> I hadn't realised that there may be non printing characters hidden in 
> the files that i was trying to compare.  Also I missed a switch "-r" 
> which didn't help.  
> 
> Of course I am now curious as to why there should be non printing 
> characters, other than new lines, carriage returns and white space in 
> a text file !  One file created from "ls > g1.txt" and the other from 
> a text file copied from another machine via drag n drop.

Some ideas, not an exhaustive list:

  Filenames can have non-printing characters in them.  You can put them
  there yourself, or they might turn up because of some bizarre
  automation problem with encodings.

  If ls is set up to colour its output, it emits control characters
  (escape sequences) that set the colours, but these should be
  suppressed when it finds out that its output isn't going to the
  terminal.

  The text file on the other machine may have a different encoding, or
  may have been edited with a different editor (one that writes
  different line endings, for instance).
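
The terminal check in the second point can be imitated in a shell
function.  This is only a sketch of the idea, not a description of how
ls is implemented; show_name is a hypothetical helper and the blue
colour code is an arbitrary choice.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical helper: add colour escapes only when stdout is a terminal.
show_name() {
    if [ -t 1 ]; then
        printf '\033[34m%s\033[0m\n' "$1"   # stdout is a terminal: blue
    else
        printf '%s\n' "$1"                  # redirected or piped: plain
    fi
}

show_name docs              # coloured when run interactively
show_name docs > g1.txt     # g1.txt gets just "docs" and a newline
```

A program that skips this check, or is told to colour unconditionally,
would leave the escape sequences in the redirected file.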

Do you know exactly what extra characters are in the file and where?
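
One way to find out, sketched on a made-up file: sample.txt and its
contents are assumptions for illustration, so substitute g1.txt or
g2.txt when trying it for real.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical sample: line 2 ends in a carriage return and line 3
# starts with an ESC colour sequence.
printf 'plain.txt\ndos.txt\r\n\033[0mcolour.txt\n' > sample.txt

# od -c dumps every byte; a carriage return shows up as \r and ESC as 033.
od -c sample.txt

# Print the numbers of the lines containing anything other than
# printable ASCII.  LC_ALL=C makes grep work on raw bytes.
LC_ALL=C grep -n '[^[:print:]]' sample.txt | cut -d: -f1
# prints:
#   2
#   3
```

Note that this also flags tabs and any non-ASCII bytes, so a filename
with accented characters would be reported as well.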

Also, why are you comparing lists of files?  There may be a better
way of doing what you're doing, depending on the application.

-- 
 -- zoe


