[Gllug] code optimisations
Rich Walker
rw at shadow.org.uk
Tue Mar 22 13:16:25 UTC 2005
Nix <nix at esperi.org.uk> writes:
> On Mon, 21 Mar 2005, Pete Ryland uttered the following:
>> And by Cache efficiency, I mean things like organising memory usage to be
>> more sequencial if possible, e.g.:
>>
>> for (i=0; i<1000000; i++)
>> for (j=1; j<10000000; j++)
>> a[i][j]++;
>>
>> is *significantly* faster than:
>>
>> for (j=1; j<10000000; j++)
>> for (i=0; i<1000000; i++)
>> a[i][j]++;
>>
>> I don't think a compiler would optimise that type of thing.
>
> Optimizations that can do that sort of code motion in some cases are on
> the cards for GCC-4.1. :)
There was a fairly notorious optimisation the Sun C compiler did on one
of the SPEC benchmarks that was roughly[1] equivalent to this kind of
structure re-ordering.
Also, on some of the Athlon-and-later systems, the memory controller can
support multiple fetch streams, making
for (i=0; i<1000000/2; i++)
for (j=1; j<10000000; j++) {
a[i][j]++;
a[i+1000000/2][j]++;
}
perform faster than the naive version.
But it's ... interesting ... to communicate to a C compiler that the
second optimisation is valid. If you did:
void foo(int ** __restrict__ a) { }
then you might expect it to happen, but I'm not sure it would. The use
of __attribute__ ((vector_size(16))) applied to the type of a might
permit the C compiler to use vector operation, but the exact semantics
are not clear.
cheers, Rich.
Footnotes:
[1] For some value of roughly including "I vaguely remember this"
--
rich walker | Shadow Robot Company | rw at shadow.org.uk
technical director 251 Liverpool Road |
need a Hand? London N1 1LX | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug
More information about the GLLUG
mailing list