[Nottingham] A quick bash updating table lookup

Mat Booth mbooth at fedoraproject.org
Mon Feb 7 18:43:03 UTC 2011


On 7 February 2011 18:11, Martin <martin at ml1.co.uk> wrote:
> On 7 February 2011 17:48, Camilo Mesias <camilo at mesias.co.uk> wrote:
>> It sounds like a database, I probably wouldn't tackle the problem with
>> anything less than Perl, although the result might look quite
>> bash-like.
>>
>> A Perl hash (associative array) could map inodes to md5sums; the hash
>> would also work for telling you if the inode was already summed.
>>
>> That would work for small numbers of entries (several thousand) effortlessly.
>
> My thoughts also. I /could/ just have a huge look-up table simply
> using the inode numbers as an index, but that's going to be rather
> expensive on memory or disk seeks if I use a big disk file. That's
> workable if I allow a table to gobble up a few GBytes temporarily.
>
> For using hashing, I just can't guess whether a 'big table' is needed
> or if a smaller table would be fine. There's no prediction on what
> proportion of files are hard linked together.
>
>
>> To scale it to huge numbers then you could 'tie' the hash to a
>> database file - it would then be implemented and persisted in the DB
>> file.
>>
>> If Perl is installed you might have the man page for DB_File which
>> might help, or search for some examples on the net.
>
> Thanks but I'm not sure I gain anything with a database other than
> bloat and a slow-down. Will the database do anything more clever than
> a simple linear search if there is no (slow) index creation upon each
> new entry?
>
> Are there self balancing binary tree tables available so that the
> lookup is done 'quickly' without the brute force expense of a huge
> (and likely sparse) lookup table? And without the expense of
> re-indexing for every new addition to the table?...
>
>
> Cheers,
> Martin


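I'm not sure how you think databases work, but indices exist precisely
to make a record search faster than a linear lookup. A typical index
is a balanced tree (a B-tree) or a hash table that is updated
incrementally as each record is inserted, so there is no full
re-index for every new entry.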
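
To make that concrete, here is a minimal sketch of the inode-to-md5sum
table kept in an indexed database file. It is only an illustration,
not what anyone in this thread is actually running: it uses Python's
bundled sqlite3 module rather than Perl's DB_File, and the table name
"sums" and the helper names open_table, lookup, remember and
query_plan are made up for the example. It also assumes inode numbers
fit in a signed 64-bit integer.

```python
import sqlite3


def open_table(path=":memory:"):
    """Open (or create) the lookup table; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    # INTEGER PRIMARY KEY makes the inode the key of SQLite's B-tree, so
    # the index is maintained incrementally as rows are inserted.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sums ("
        "inode INTEGER PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def lookup(conn, inode):
    """Return the stored md5sum for an inode, or None if not yet summed."""
    row = conn.execute(
        "SELECT md5 FROM sums WHERE inode = ?", (inode,)
    ).fetchone()
    return row[0] if row else None


def remember(conn, inode, md5):
    """Record the md5sum for an inode (hypothetical helper name)."""
    conn.execute(
        "INSERT OR REPLACE INTO sums (inode, md5) VALUES (?, ?)",
        (inode, md5),
    )


def query_plan(conn):
    """Ask SQLite how it would execute the lookup (search vs. full scan)."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT md5 FROM sums WHERE inode = ?", (1,)
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(str(r[-1]) for r in rows)
```

Calling lookup() before summing a file tells you whether one of its
hard links has already been processed, and query_plan() lets you check
for yourself that the lookup goes through the key rather than walking
every row.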


-- 
Mat Booth
http://fedoraproject.org/get-fedora


