[Nottingham] A quick bash updating table lookup

Martin martin at ml1.co.uk
Mon Feb 7 16:20:46 UTC 2011


Folks,

Now this is something that just must have already been done...

I'm checking md5sums of files per filesystem _inode_ rather than per
filename. This is so that my system doesn't go checking the md5sum for
data corruption of the same inode a gazillion times via the multiple
hard links to it in the archive/snapshot copies that are kept. So... The
problem is:

How to look up a table of already seen inode/md5sum pairs quickly, for
many millions of files?

Also, how to efficiently add new entries to the table and yet maintain
a fast lookup?
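
To make the question concrete, here is a minimal sketch of the kind of
lookup meant above. It assumes bash 4 or later (for associative arrays)
and GNU stat and md5sum. The names seen_md5, md5_result, load_table and
md5_for_inode are hypothetical, made up for illustration, and the flat
"device:inode md5" table file is just one possible on-disk format.

```shell
#!/usr/bin/env bash
# Sketch only: needs bash >= 4 (associative arrays), GNU stat and md5sum.
# All names below are hypothetical, chosen for illustration.

declare -A seen_md5      # key "device:inode" -> md5sum
md5_result=""            # result of the last md5_for_inode call

# Load previously recorded "device:inode md5" pairs from a flat file.
load_table() {
    local table=$1 key sum
    [[ -f $table ]] || return 0
    while read -r key sum; do
        seen_md5[$key]=$sum
    done < "$table"
}

# Set md5_result for a file, hashing only if its inode is not yet known.
# New entries are appended to the table file so later runs can reuse them.
# The result goes in a variable rather than on stdout because calling this
# inside $(...) would run it in a subshell and lose the array update.
md5_for_inode() {
    local file=$1 table=$2 key sum
    key=$(stat -c '%d:%i' -- "$file") || return 1
    if [[ -z ${seen_md5[$key]+set} ]]; then
        sum=$(md5sum < "$file") || return 1
        sum=${sum%% *}                     # keep only the hash field
        seen_md5[$key]=$sum
        printf '%s %s\n' "$key" "$sum" >> "$table"
    fi
    md5_result=${seen_md5[$key]}
}
```

Typical use would be to call load_table once, then md5_for_inode for
each file and read $md5_result afterwards. The device number is part of
the key because inode numbers are only unique within one filesystem.
This keeps the whole table in memory, so it may well not be the right
shape for 100 million entries.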


I'm running a 'quick and dirty' but non-ideal solution at the moment.
However, it isn't all that quick, and it will eventually slow down
badly as the number of lookups increases... Also, I'm looking for
something that will scale up to 100 million files or more.

Any good ideas?

Already done even?

(And this was supposed to be just a "5 minutes bash scripting exercise"... :-( )

Cheers,
Martin


