[Gllug] code optimisations

Rich Walker rw at shadow.org.uk
Wed Mar 23 14:14:42 UTC 2005


Nix <nix at esperi.org.uk> writes:

> On Tue, 22 Mar 2005, Rich Walker stipulated:
>> Also, on some of the Athlon-and-later systems, the memory controller can
>> support multiple fetch streams, making
>> 
>>  for (i=0; i<1000000/2; i++)
>>    for (j=1; j<10000000; j++) {
>>      a[i][j]++;
>>      a[i+1000000/2][j]++;
>>  }
>> perform faster than the naive version.
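The quoted loop is only a fragment: its bounds would need tens of terabytes, and j starts at 1. Below is a minimal sketch that puts the naive traversal next to the two-stream one. It assumes a fixed-size int array, and ROWS, COLS and both function names are hypothetical. It only shows that both versions touch every element exactly once. It does not try to measure the speed-up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, far smaller than in the quoted loop so the
   array fits comfortably in memory. */
#define ROWS 1000   /* must be even for the two-stream split */
#define COLS 1000

/* Naive version: one sequential stream, row after row. */
void bump_naive(int a[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            a[i][j]++;
}

/* Two-stream version: each inner iteration touches row i of the first
   half and row i + ROWS/2 of the second half, so the memory system
   sees two independent sequential streams rather than one. */
void bump_two_streams(int a[ROWS][COLS])
{
    assert(ROWS % 2 == 0);
    for (size_t i = 0; i < ROWS / 2; i++)
        for (size_t j = 0; j < COLS; j++) {
            a[i][j]++;
            a[i + ROWS / 2][j]++;
        }
}
```

Whether the second version is actually faster depends on the memory controller, as the message says. The results are identical either way.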
>
> That requires loop peeling and the value range optimization to work, I
> think (at least: even that doesn't provide an optimization that will
> adjust i like that, although it's an interesting idea, related to the
> autovectorization stuff that's going into 4.1).
>
> This is all 4.1 material again, at least.
>
> (I bet the Sun compiler had a special-purpose speed-up-SPEC
> optimization...)

It came as a great surprise to the regulars on news:comp.arch and caused
some discussion. The general conclusion was that it was just on the edge
of what the SPEC rules permitted.

A quick check finds
<http://groups-beta.google.com/group/comp.sys.super/browse_frm/thread/ecd8ea5519ea1ee0/489980efd438b957?tvc=1&q=sun+spec+optimisation#489980efd438b957>

which is actually a more interesting optimisation, and

<http://groups-beta.google.com/group/comp.arch/browse_frm/thread/ad3ba8b791a07e58/15b4f6f0d6c11ae4?q=sun+spec+optimisation+179.art#15b4f6f0d6c11ae4> 

which suggests the other one is "transpose the matrix" rather than
multiple fetch streams.
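The sketch below is a generic illustration of why transposing can help. It is not a reconstruction of what Sun's compiler did to the 179.art benchmark named in the second link. It assumes a square matrix of hypothetical size N and uses hypothetical function names. Both functions compute column sums of a row-major matrix. The first walks down columns with a stride of N elements. The second transposes once, then sums with unit stride.

```c
#include <assert.h>
#include <stddef.h>

#define N 512   /* hypothetical square matrix size */

/* Column sums read straight from a row-major matrix: the inner loop
   walks down a column, so successive loads are N elements apart. */
void col_sums_strided(long m[N][N], long out[N])
{
    for (size_t j = 0; j < N; j++) {
        long s = 0;
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
        out[j] = s;
    }
}

/* Same sums after transposing m into t once: column j of m becomes
   row j of t, so the summing loop is a unit-stride walk. */
void col_sums_transposed(long m[N][N], long t[N][N], long out[N])
{
    assert((void *)t != (void *)m);   /* the transpose is not in place */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            t[j][i] = m[i][j];
    for (size_t j = 0; j < N; j++) {
        long s = 0;
        for (size_t i = 0; i < N; i++)
            s += t[j][i];
        out[j] = s;
    }
}
```

The transpose itself still makes one strided pass. The trade only pays off when the transposed copy is walked repeatedly.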

>
>> But it's ... interesting ... to communicate to a C compiler that the
>> second optimisation is valid. If you did:
>> 
>>   void foo(int ** __restrict__ a) { }
>
> As long as i and j are locals, I think that is acceptable. The compiler
> knows that a[i][j] and a[i+1000000/2][j] cannot alias :)

Actually, I'm not sure it does. Suppose a[i] == a[i+1000000/2], which is
certainly legal: a is an array of row pointers, and nothing stops two of
them pointing at the same row. __restrict__ seems to be strong enough to
say that in

  int foo(int * __restrict__ p, int * __restrict__ q)

p and q do not overlap, but I'm not sure it says anything about the
arrangement of the rows a points to...
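This point can be checked directly. The sketch below uses C99 `restrict` (GCC's __restrict__ is the same qualifier with a different spelling) and a hypothetical bump_pair helper that mirrors one iteration of the quoted loop body. The test makes two row pointers equal, and both increments then land on the same int. As I read the C99 definition of restrict, this is well defined. The ints are reached through row pointers loaded from a, and those pointers are not "based on" a. So qualifying a constrains only the row-pointer objects themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring one iteration of the quoted loop body.
   restrict on a promises only that the row-pointer objects a[0], a[1],
   ... are accessed through a alone; it says nothing about whether two
   of those row pointers hold the same address. */
void bump_pair(int **restrict a, size_t i, size_t half, size_t j)
{
    assert(a != NULL);
    a[i][j]++;
    a[i + half][j]++;   /* may hit the very same int as the line above */
}
```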

>> then you might expect it to happen, but I'm not sure it would. The use
>> of __attribute__ ((vector_size(16))) applied to the type of a might
>
> You shouldn't need to use this; vector_size has no effect with respect
> to arrays anyway, and even if it did, the (ISO C99) parameter
> declaration would be something like
>
> int *a [restrict __attribute__((vector_size(16)))]
>
> only GCC doesn't (yet) support the use of __attribute__ there.

That's going to be fun :->
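The ISO C99 parameter syntax Nix describes can be shown without the GCC attribute. The vector_size part is left out here, since the message says GCC doesn't accept it in that position. In the sketch below, a hypothetical sum_rows function is declared once with an explicit restrict-qualified pointer and defined with the array-parameter form.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype spelled with an explicitly restrict-qualified pointer... */
long sum_rows(size_t nrows, size_t ncols, int **restrict a);

/* ...and the definition spelled with the C99 array-parameter form.
   A parameter declared as int *a[restrict] is adjusted to
   int **restrict a, so both declarations name the same function. */
long sum_rows(size_t nrows, size_t ncols, int *a[restrict])
{
    assert(nrows == 0 || a != NULL);
    long total = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            total += a[i][j];
    return total;
}
```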

cheers, Rich.

-- 
rich walker         |  Shadow Robot Company | rw at shadow.org.uk
technical director     251 Liverpool Road   |
need a Hand?           London  N1 1LX       | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



