[Gllug] Faster maths?

Nix nix at esperi.org.uk
Sat Mar 10 01:18:56 UTC 2007


On 9 Mar 2007, Andy Farnsworth told this:
> Hashing in perl has been highly optimized.  I have used hashes on the
> order of a million entries and it has worked extremely quickly (<1
> second for 1000 lookups).

I'm sorry, but for an in-memory hash table that's not exactly shining
performance. Lookups should be running at close to memory speed
(obviously not dcache speed), which should be *vastly* faster than that.

Not-very-optimized C hashing libraries I've written in the past have
managed >50000 lookups per second on 500MHz SPARCs. The only part I
optimized, and this is where I think the difference lies, is the hashing
function: Bob Jenkins has published some *very* good ones, along with
some slightly less good ones, one of which (one-at-a-time) is the one
Perl 5 is using.
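
For concreteness, here's one-at-a-time sketched in Perl itself (masked
to 32 bits; assumes a 64-bit perl, and it's an illustration, not the
exact code in perl's hv.h; the $seed parameter is where the per-process
randomness mentioned below gets mixed in):

    use strict;
    use warnings;

    # Jenkins's one-at-a-time hash, 32-bit, seedable.
    sub one_at_a_time {
        my ($key, $seed) = @_;
        my $h = $seed || 0;
        for my $byte (unpack 'C*', $key) {
            $h = ($h + $byte)      & 0xffffffff;
            $h = ($h + ($h << 10)) & 0xffffffff;
            $h ^= ($h >> 6);
        }
        # Final avalanching passes.
        $h = ($h + ($h << 3))  & 0xffffffff;
        $h ^= ($h >> 11);
        $h = ($h + ($h << 15)) & 0xffffffff;
        return $h;
    }

    # Different seeds scatter the same keys differently.
    printf "%08x %08x\n", one_at_a_time($_, 0), one_at_a_time($_, 42)
        for qw(foo bar baz);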

But if you're only getting 1000 hash lookups per second, I'd start
profiling perl if I were you. Something is wrong in there.
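
If you want a quick number first, something like this (key names and
sizes made up for illustration; the stock Benchmark module does the
timing) should report rates well beyond 1000/sec on any modern box:

    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # A million-entry hash and a fixed batch of random probe keys.
    my %h      = map { ("key$_" => $_) } 1 .. 1_000_000;
    my @probes = map { "key" . (1 + int rand 1_000_000) } 1 .. 10_000;

    # Run the batch repeatedly for at least 3 CPU seconds.
    timethese(-3, {
        lookup => sub {
            my $sum = 0;
            $sum += $h{$_} for @probes;   # 10_000 lookups per call
        },
    });

Multiply the iterations-per-second Benchmark reports by 10,000 to get
lookups per second.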

>                           The only real issue is if you start to get
> a very large number of hash collisions, then it may slow down.

That won't happen on recent Perl versions, at least not for long: the
hash table used is a chained, dynamically expanding table, so the
first time it resizes the hash gets in effect repopulated. Also, Perl
5.8.1+ incorporates defences against hash complexity attacks: to wit, a
per-process random seed is incorporated into the hash function (the c
parameter, I think, in many of Jenkins's hashes can be (ab)used for
this) such that hash bucket assignment differs on each run, with the
specific intention of making sure that keys which collide on one run
won't collide on the next.
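
On 5.8.1+ you can poke at this with the documented PERL_HASH_SEED and
PERL_HASH_SEED_DEBUG environment variables (see perlrun); the exact
default behaviour differs a bit between 5.8.1 and later 5.8.x, so treat
this as a sketch:

    # Show the seed this perl picked for the run:
    PERL_HASH_SEED_DEBUG=1 perl -e 1

    # Pin the seed, e.g. to reproduce a collision-heavy run
    # (yourscript.pl stands in for whatever you're debugging):
    PERL_HASH_SEED=42 perl yourscript.pl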

-- 
`In the future, company names will be a 32-character hex string.'
  --- Bruce Schneier on the shortage of company names