[Nottingham] A quick bash updating table lookup

Mon Feb 7 17:52:13 UTC 2011

It sounds like a database problem. I probably wouldn't tackle it with
anything less than Perl, although the result might look quite
bash-like.

A Perl hash (associative array) could map inodes to md5sums; the same
hash would also tell you whether an inode has already been summed.

That would work for small numbers of entries (several thousand) effortlessly.
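
As a rough sketch of that idea (written in Python rather than Perl, with
a plain dict standing in for the hash): each file is stat'ed, and the
md5sum is only computed when its inode has not been seen before. The
names md5_of and sum_by_inode are made up for illustration, and keying
on (device, inode) rather than the bare inode number is my own
assumption, since inode numbers are only unique within one filesystem.

```python
import hashlib
import os


def md5_of(path, chunk=1 << 20):
    """Return the hex md5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def sum_by_inode(paths, seen=None):
    """Return {(st_dev, st_ino): md5hex}, hashing each inode only once.

    `seen` is the lookup table; pass an existing dict to keep adding to it.
    """
    seen = {} if seen is None else seen
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)  # assumption: device + inode as key
        if key not in seen:           # hard links to a known inode are skipped
            seen[key] = md5_of(path)
    return seen
```

Lookups and inserts on a dict (or a Perl hash) take roughly constant
time on average, so the table does not slow down as it grows; memory is
the limit.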

To scale to huge numbers you could 'tie' the hash to a database
file; the hash would then be stored in, and persisted by, that DB
file instead of living only in memory.

If Perl is installed you may already have the man page for DB_File,
which might help, or you can search for some examples on the net.
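
For a feel of what a file-backed hash looks like, here is a minimal
sketch using Python's standard dbm module as a stand-in for tying a
Perl hash to DB_File (it is not the DB_File API itself). The function
name cached_md5, the "device:inode" key format and the database file
name are all assumptions for illustration. The table lives on disk, so
it survives between runs.

```python
import dbm
import hashlib
import os


def cached_md5(path, db):
    """Return (md5hex, was_cached) for path.

    `db` is an open dbm database used as a persistent lookup table,
    keyed by "device:inode" (an assumed key format).
    """
    st = os.stat(path)
    key = f"{st.st_dev}:{st.st_ino}".encode()
    hit = db.get(key)
    if hit is not None:
        return hit.decode(), True      # inode already summed earlier
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    digest = h.hexdigest()
    db[key] = digest.encode()          # new entry is written to the DB file
    return digest, False
```

It would be used along the lines of opening the table with
dbm.open("inode_md5", "c") (the "c" flag creates the file if it is
missing) and calling cached_md5(path, db) for every file found. How
well this holds up at 100 million entries depends on which dbm backend
is installed, so it would need measuring.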

-Cam

On Mon, Feb 7, 2011 at 4:16 PM, Martin <martin at ml1.co.uk> wrote:
> Folks,
>
> Now this is something that just must have already been done...
>
> I'm checking md5sums for files for filesystem _inodes_  ... This is so
> that my system doesn't go checking the md5sum for data corruption for
> the same inode a gazillion times through multiple hard links to the
> same inode in the archive/snapshot copies that are kept. So... The
> problem is:
>
> How to look up a table of already seen inode/md5sum pairs quickly, for
> many millions of files?
>
> Also, how to efficiently add new entries to the table and yet maintain
> a fast lookup?
>
>
> I'm running a 'quick and dirty' but non-ideal solution at the moment.
> However, it isn't all that quick and will eventually suffer an
> exponential degrade as the number of lookups increase... Also, I'm
> looking for something that will scale up to a 100 million files or
> more.
>
> Any good ideas?
>
> Already done even?
>
> (And this was supposed to be just a "5 minutes bash scripting exercise"... :-( )
>
> Cheers,
> Martin