[Nottingham] A quick bash updating table lookup

Martin martin at ml1.co.uk
Tue Feb 8 13:02:00 UTC 2011


On 8 February 2011 12:48, Martin <martin at ml1.co.uk> wrote:
> On 7 February 2011 21:35, Richard Hodgson <rich at dearinternet.com> wrote:
>> On Mon, Feb 7, 2011 at 21:23, Camilo Mesias <camilo at mesias.co.uk> wrote:
>>>
>>> Even if you have an external program to do the b-tree heavy lifting,
>>> its performance will likely be crippled if it has to start up,
>>> navigate the tree and interact with it. It will also be better to use
>>> a scripting language that has extensions for md5 (for example) rather
>>> than run up a new subprocess for every md5sum.
>
> I suspect that this exercise is IO-limited on reading the files
> through md5sum and in seeking around the HDD chasing the inodes. ...

Now, for a real speedup...

Why not instead start at HDD block zero and md5sum each file as its
inode comes up while scanning up through the disk? That should
minimise disk seeks, apart from the inevitable fragmented files.

Is there a filesystem-independent utility that can read off all the
inodes by partition offset? You would still need to run through the
directory trees to retrieve all the filenames, however.

So, would naively running through all the inodes in ascending order
have the HDD heads scanning efficiently up the disk?
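
For what it's worth, here is a rough sketch of that naive approach. It
is in Python rather than bash so that the MD5 runs in-process via
hashlib instead of spawning md5sum once per file. The function names
are made up for illustration, and it simply assumes that ascending
inode number roughly tracks on-disk position, which depends on the
filesystem. It still has to walk the directory tree first to collect
the filenames.

```python
import hashlib
import os
import stat


def files_by_inode(root):
    """Walk root and return (inode, path) pairs sorted by inode number."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_ino, path))
    entries.sort()  # ascending inode number (assumed to approximate disk order)
    return entries


def md5_of_file(path, chunk_size=1 << 20):
    """Hash one file in 1 MiB chunks so large files are not read into RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_in_inode_order(root):
    """Return (path, md5 hex digest) pairs, visiting files by ascending inode."""
    results = []
    for _inode, path in files_by_inode(root):
        try:
            results.append((path, md5_of_file(path)))
        except OSError:
            continue  # unreadable file; skip it
    return results
```

Calling md5_in_inode_order("/some/dir") and comparing the wall-clock
time against a plain directory-order walk, with the page cache dropped
between runs, would show whether the ordering helps on a given
filesystem.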

Does anyone know how inode numbers are allocated across the disk for
various filesystems?

Cheers,
Martin


