[Gllug] code optimisations

Tue Mar 22 13:16:25 UTC 2005

Nix <nix at esperi.org.uk> writes:

> On Mon, 21 Mar 2005, Pete Ryland uttered the following:
>> And by Cache efficiency, I mean things like organising memory usage to be
>> more sequencial if possible, e.g.:
>> 
>> for (i=0; i<1000000; i++)
>>   for (j=1; j<10000000; j++)
>>     a[i][j]++;
>> 
>> is *significantly* faster than:
>> 
>> for (j=1; j<10000000; j++)
>>   for (i=0; i<1000000; i++)
>>     a[i][j]++;
>> 
>> I don't think a compiler would optimise that type of thing.
>
> Optimizations that can do that sort of code motion in some cases are on
> the cards for GCC-4.1. :)

There was a fairly notorious optimisation the Sun C compiler did on one
of the SPEC benchmarks that was roughly[1] equivalent to this kind of
structure re-ordering.

Also, on some of the Athlon-and-later systems, the memory controller can
support multiple fetch streams, making

 for (i=0; i<1000000/2; i++)
   for (j=1; j<10000000; j++) {
     a[i][j]++;
     a[i+1000000/2][j]++;
 }
perform faster than the naive version.

But it's ... interesting ... to communicate to a C compiler that the
second optimisation is valid. If you did:

  void foo(int ** __restrict__ a) { }

then you might expect it to happen, but I'm not sure it would. The use
of __attribute__ ((vector_size(16))) applied to the type of a might
permit the C compiler to use vector operation, but the exact semantics
are not clear.

cheers, Rich.

Footnotes: 
[1]  For some value of roughly including "I vaguely remember this"

-- 
rich walker         |  Shadow Robot Company | rw at shadow.org.uk
technical director     251 Liverpool Road   |
need a Hand?           London  N1 1LX       | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug