[Gllug] code optimisations
Rich Walker
rw at shadow.org.uk
Wed Mar 23 14:14:42 UTC 2005
Nix <nix at esperi.org.uk> writes:
> On Tue, 22 Mar 2005, Rich Walker stipulated:
>> Also, on some of the Athlon-and-later systems, the memory controller can
>> support multiple fetch streams, making
>>
>>   for (i=0; i<1000000/2; i++)
>>     for (j=1; j<10000000; j++) {
>>       a[i][j]++;
>>       a[i+1000000/2][j]++;
>>     }
>> perform faster than the naive version.
>
> That requires loop peeling and the value range optimization to work, I
> think (at least: even that doesn't provide an optimization that will
> adjust i like that, although it's an interesting idea, related to the
> autovectorization stuff that's going into 4.1).
>
> This is all 4.1 material again, at least.
>
> (I bet the Sun compiler had a special-purpose speed-up-SPEC
> optimization...)
It was a great surprise to the regulars at news:comp.arch, and caused
some discussion. The general conclusion was that it was just on the edge
of what is permissible for SPEC.
A quick check finds
<http://groups-beta.google.com/group/comp.sys.super/browse_frm/thread/ecd8ea5519ea1ee0/489980efd438b957?tvc=1&q=sun+spec+optimisation#489980efd438b957>
which is actually a more interesting optimisation, and
<http://groups-beta.google.com/group/comp.arch/browse_frm/thread/ad3ba8b791a07e58/15b4f6f0d6c11ae4?q=sun+spec+optimisation+179.art#15b4f6f0d6c11ae4>
which suggests the other one is "transpose the matrix" rather than
multiple fetch streams.
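
For comparison, here is roughly what the multiple-fetch-streams rewrite
looks like next to the naive loop. This is only a sketch: the names
bump_naive and bump_two_streams, the flat row-major layout and the
rows/cols parameters are my assumptions for illustration, and whether
the second form is actually faster depends on the memory controller.

```c
#include <stddef.h>

/* Naive version: a single sequential stream through every row in turn. */
void bump_naive(int *a, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i * cols + j]++;
}

/* Two-stream version: row i and row i + rows/2 are walked together, so
   the memory system sees two independent sequential streams. */
void bump_two_streams(int *a, size_t rows, size_t cols)
{
    size_t half = rows / 2;

    for (size_t i = 0; i < half; i++) {
        int *lo = a + i * cols;           /* first stream  */
        int *hi = a + (i + half) * cols;  /* second stream */
        for (size_t j = 0; j < cols; j++) {
            lo[j]++;
            hi[j]++;
        }
    }

    /* With an odd row count, the last row is left over: do it alone. */
    if (rows % 2 != 0) {
        int *last = a + (rows - 1) * cols;
        for (size_t j = 0; j < cols; j++)
            last[j]++;
    }
}
```

Both functions touch every element exactly once, so they give the same
result; only the order of the memory accesses differs.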
>
>> But it's ... interesting ... to communicate to a C compiler that the
>> second optimisation is valid. If you did:
>>
>> void foo(int ** __restrict__ a) { }
>
> As long as i and j are locals, I think that is acceptable. The compiler
> knows that a[i][j] and a[i+1000000/2][j] cannot alias :)
Actually, I'm not sure it does. Suppose a[i] == a[i+1000000/2], which is
certainly legal. __restrict__ seems to be strong enough to say that in

    int foo(int * __restrict__ p, int * __restrict__ q)

p and q do not overlap, but I'm not sure it says anything about the
arrangement of a...
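
To make the worry concrete, here is a small sketch using the standard
C99 spelling "restrict". The function names bump_pair and bump_rows are
made up for illustration. As I read the C99 definition, qualifying the
int ** only covers the array of row pointers, so two entries holding
the same row address stays legal; treat that reading as an assumption
rather than a ruling.

```c
#include <stddef.h>

/* Both pointers are restrict-qualified: the caller promises that what is
   modified through p is not also accessed through q, and vice versa. */
void bump_pair(int *restrict p, int *restrict q, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        p[j]++;
        q[j]++;
    }
}

/* Only the pointer-to-pointer is qualified here. The row pointers
   a[i] and a[i + half] are loaded from memory and may be equal, so the
   two increments below cannot be assumed to hit different ints. */
void bump_rows(int **restrict a, size_t rows, size_t cols)
{
    size_t half = rows / 2;

    for (size_t i = 0; i < half; i++)
        for (size_t j = 0; j < cols; j++) {
            a[i][j]++;
            a[i + half][j]++;
        }
}
```

Calling bump_rows with a two-entry table whose entries both point at the
same row is well defined and bumps each element of that row twice, which
is exactly the case an optimiser would have to rule out before treating
the two accesses as independent streams.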
>> then you might expect it to happen, but I'm not sure it would. The use
>> of __attribute__ ((vector_size(16))) applied to the type of a might
>
> You shouldn't need to use this; vector_size has no effect with respect
> to arrays anyway, and even if it did, the (ISO C99) parameter
> declaration would be something like
>
> int *a [restrict __attribute__((vector_size(16)))]
>
> only GCC doesn't (yet) support the use of __attribute__ there.
That's going to be fun :->
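
For anyone who hasn't met it, this is the context where vector_size does
work today: on a typedef of a scalar base type, using the GCC vector
extension. A minimal sketch, with v4si and add4 as illustrative names;
it says nothing about applying the attribute to an array parameter,
which is the unsupported case discussed above.

```c
/* GCC vector extension: four 32-bit ints packed into one 16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise addition; the compiler is free to lower this to a single
   SIMD add where the target has one. */
v4si add4(v4si x, v4si y)
{
    return x + y;
}
```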
cheers, Rich.
--
rich walker | Shadow Robot Company | rw at shadow.org.uk
technical director 251 Liverpool Road |
need a Hand? London N1 1LX | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug