[Nottingham] A quick bash updating table lookup

Mon Feb 7 21:27:25 UTC 2011

On the whole I trust Perl to do the right thing unless the
requirements are ridiculous. I've been fairly pleased by the
performance of scripts, with and without large data sets. I imagine
there are some cases where Perl can be made to behave badly, but they
seem rare.

>> For this case under consideration, the table is being continuously
>> randomly updated whilst also being read to see if anything has been
>> already seen...
>>
>> Do databases support such as (big) sparse tables or fast updating of a
>> table index without a long pregnant pause upon each new entry and
>> re-index?

Perl hashes are pretty much designed to do this, and as I understand
it a DB_File tie maps the same hash API onto persistent storage. It
would be interesting to run some tests to see exactly how it
performs. The hardest part is probably getting a dataset to test the
algorithms on.
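
A rough sketch of the "have I seen this already?" table described
above, first with a plain in-memory hash and then with the same hash
API tied to a DB_File. The file name seen.db, the sample keys and the
seen_before helper are made up for illustration, and it assumes
DB_File is available in your Perl build:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # exports $DB_HASH (and $DB_BTREE)

# Return 1 if $key is already in the table, otherwise record it and
# return 0. Works on any hash reference, tied or not.
sub seen_before {
    my ($table, $key) = @_;
    return 1 if exists $table->{$key};
    $table->{$key} = 1;
    return 0;
}

my @keys = qw(apple pear apple plum pear);   # made-up sample data

# 1. Plain in-memory hash.
my %mem;
for my $key (@keys) {
    print "memory: $key ", (seen_before(\%mem, $key) ? "seen" : "new"), "\n";
}

# 2. Same code, but the hash is tied to a file on disk.
my $file = "seen.db";                        # hypothetical file name
unlink $file;                                # start from an empty table
my %disk;
tie %disk, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open $file: $!";
for my $key (@keys) {
    print "disk:   $key ", (seen_before(\%disk, $key) ? "seen" : "new"), "\n";
}
untie %disk;
unlink $file;   # tidy up; drop this line to keep the table between runs
```

Swapping $DB_HASH for $DB_BTREE should give a b-tree backed file
instead, which is one of the things worth timing against a real
dataset.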

Of course performance will degrade a bit as the data set grows, so it
might be less risky to find some good libraries and get coding in a
compiled language.

My gut feeling is that if you can't do it easily in Perl, there's no
way it could have been done in Bash :) but you might still be able to
do it in C.

Even if you have an external program to do the b-tree heavy lifting,
performance will likely be crippled if the script has to start that
program up, have it navigate the tree and interact with it on every
lookup. It will also be better to use a scripting language that has
extensions for md5 (for example) rather than spawn a new subprocess
for every md5sum.
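
To illustrate the md5 point, here is a minimal sketch that hashes each
record inside the Perl process with Digest::MD5 (a core module) and
uses the digest as the key of the seen table, so nothing is forked per
record. The record strings are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %seen;   # digest => number of times that record has turned up

for my $record ("first line", "second line", "first line") {
    # Computed in-process: no fork/exec of md5sum per record.
    my $digest = md5_hex($record);

    if ($seen{$digest}++) {
        print "duplicate: $record ($digest)\n";
    } else {
        print "new:       $record ($digest)\n";
    }
}
```

Note that md5_hex hashes exactly the bytes it is given, so a digest
taken this way only matches md5sum output if any trailing newline is
treated the same way on both sides.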

-Cam