[YLUG] Why is this so much faster on an Opteron?
Gavin Atkinson
gavin.atkinson at ury.york.ac.uk
Mon Sep 24 16:43:37 BST 2007
On Mon, 2007-09-24 at 15:20 +0100, Robert Hulme wrote:
> I have this code:
>
> #include <stdio.h>
>
> int main(char **argv,int argc) {
> double f=0;
> int i=0;
> int j=0;
> int c=1;
> for(i=0;i<10000;i++)
> for(j=0;j<10000;j++,c++)
> f=f+0.505/(j+i/(1.0+j+i));
> return (int)f%10;
> }
>
> If I compile it with -O2 it takes about 40 seconds to run on my
> Pentium D (dual core pentium 4) desktop, but only about 2 seconds
> (when using time) to run on a 2.2Ghz Opteron.
>
> Why is there such an enormous difference?
You've sort of touched on the problem by discovering that SSE/SIMD
instructions mask the issue.
P4 CPUs are notoriously bad at handling infinities, as well as NaNs and
denormals. Usually this isn't a huge problem, but on the first
iteration, i == j == 0, so the division becomes "0.505/(0 + 0/1)", i.e.
0.505/0, which is infinity. Once f == infinity, every later
"f = f + ..." has to operate on an infinity as well, so all the
remaining iterations are affected.
Try starting i and j at 1, and you should see the speed increase.
On the other hand, using SSE instructions for this incurs almost no
penalties for handling these numbers.
See http://www.cygnus-software.com/papers/x86andinfinity.html for more
details.
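
If you want to check the infinity part without timing anything, here is a
minimal sketch. It assumes IEEE 754 doubles (so dividing a non-zero value
by 0.0 gives infinity rather than trapping); the names term and sum_terms
are just ones I made up for illustration, and the loop bound is a
parameter so you can keep it small.

```c
#include <math.h>

/* One term of the sum from the quoted program:
 * 0.505 / (j + i / (1.0 + j + i)).
 * For i == j == 0 the divisor is 0 + 0/1.0 == 0.0. */
double term(int i, int j)
{
    return 0.505 / (j + i / (1.0 + j + i));
}

/* Sum the terms for i and j in [start, n).
 * start == 0 mirrors the quoted loops (with a smaller bound);
 * start == 1 skips the i == j == 0 term. */
double sum_terms(int start, int n)
{
    double f = 0.0;
    for (int i = start; i < n; i++)
        for (int j = start; j < n; j++)
            f += term(i, j);
    return f;
}
```

Calling isinf(sum_terms(0, 100)) from a main() should come back true, and
isinf(sum_terms(1, 100)) false: only the very first term is infinite, but
once it has been added f stays infinite for the rest of the run.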
Gavin