[Nottingham] A quick bash updating table lookup

Martin martin at ml1.co.uk
Mon Feb 7 18:15:15 UTC 2011


On 7 February 2011 17:48, Camilo Mesias <camilo at mesias.co.uk> wrote:
> It sounds like a database; I probably wouldn't tackle the problem with
> anything less than Perl, although the result might look quite
> bash-like.
>
> A Perl hash (associative array) could map inodes to md5sums; the hash
> would also tell you whether the inode was already summed.
>
> That would work for small numbers of entries (several thousand) effortlessly.

My thoughts also. I /could/ just have a huge look-up table simply
using the inode numbers as an index, but that's going to be rather
expensive in memory, or in disk seeks if I use a big disk file. It's
only workable if I allow the table to gobble up a few GBytes
temporarily.

If I use hashing, I just can't guess whether a 'big table' is needed
or whether a smaller table would be fine. There's no way to predict
what proportion of the files are hard linked together.
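
Just to check I follow the suggestion, something like this untested
sketch is roughly what I picture for the in-memory hash (the start
path and the use of Digest::MD5 rather than the md5sum binary are my
own guesses):

  #!/usr/bin/perl
  # Checksum each inode once; every other path hard linked to the
  # same inode re-uses the stored sum instead of being re-read.
  use strict;
  use warnings;
  use File::Find;
  use Digest::MD5;

  my %md5_by_inode;    # inode number => md5 hex digest

  find(sub {
      return unless -f $_;
      my $inode = (stat $_)[1];
      if (!exists $md5_by_inode{$inode}) {
          open my $fh, '<', $_ or return;
          binmode $fh;
          $md5_by_inode{$inode} =
              Digest::MD5->new->addfile($fh)->hexdigest;
          close $fh;
      }
      print "$md5_by_inode{$inode}  $File::Find::name\n";
  }, '/some/start/path');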


> To scale it to huge numbers then you could 'tie' the hash to a
> database file - it would then be implemented and persisted in the DB
> file.
>
> If Perl is installed you might have the man page for DB_File, which
> may help, or you can search for some examples on the net.

Thanks, but I'm not sure I gain anything from a database other than
bloat and a slow-down. Will the database do anything cleverer than a
simple linear search if there is no (slow) index creation upon each
new entry?

Are there self-balancing binary tree tables available so that the
lookup is done 'quickly', without the brute-force expense of a huge
(and likely sparse) lookup table, and without the expense of
re-indexing for every new addition to the table?
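
Having said that, the DB_File man page does mention a $DB_BTREE
option, which as far as I can tell gives a sorted B-tree on disk
rather than a linear scan, and a B-tree rebalances locally on each
insert, so there shouldn't be any wholesale re-indexing per addition.
An untested sketch (the file name and example values are invented):

  use strict;
  use warnings;
  use Fcntl;
  use DB_File;

  # Tie the hash to a Berkeley DB B-tree file so it persists on disk
  # and lookups/inserts stay roughly O(log n).
  my %md5_by_inode;
  tie %md5_by_inode, 'DB_File', 'inode-md5.db',
      O_RDWR|O_CREAT, 0666, $DB_BTREE
      or die "Cannot tie inode-md5.db: $!";

  $md5_by_inode{12345} = 'd41d8cd98f00b204e9800998ecf8427e';  # store
  print "already summed\n" if exists $md5_by_inode{12345};    # lookup

  untie %md5_by_inode;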


Cheers,
Martin


