[Gllug] code optimisations

Thu Mar 24 10:37:55 UTC 2005

On Wed, 23 Mar 2005, Chris Ball whispered secretively:
> On Tue, Mar 22, 2005 at 01:16:25PM +0000, Rich Walker wrote:
>> Also, on some of the Athlon-and-later systems, the memory controller can
>> support multiple fetch streams, making
>> 
>>  for (i=0; i<1000000/2; i++)
>>    for (j=1; j<10000000; j++) {
>>      a[i][j]++;
>>      a[i+1000000/2][j]++;
>>    }
>> perform faster than the naive version.
> 
> This is standard autovectorization, and working in gcc-4.0 when you use 
> -O2 and -ftree-vectorize.

Is it?

Wow. I hadn't been paying much attention to the autovectorization branch:
obviously I should pay more :)

---- ah, of course, I'm blind: there are provably no aliasing problems
here because the loop bound for i is halved, and the compiler knows
this because it adjusted the loop bound in the first place.
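
For concreteness, here is a scaled-down sketch of the two loops. ROWS and
COLS are hypothetical small sizes standing in for the bounds in Rich's
example, and versions_agree() is just a helper I made up to compare the
results. The two stores in each iteration of the split loop hit rows i and
i + ROWS/2, which can never be the same row while i < ROWS/2. This only
shows the transformation is valid; it says nothing about which version is
actually faster on a given machine.

```c
#include <assert.h>
#include <string.h>

#define ROWS 8   /* hypothetical size, must be even */
#define COLS 16  /* hypothetical size */

/* Naive version: walks the rows one at a time (a single fetch stream). */
void increment_naive(int a[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j]++;
}

/* Split version: each iteration touches row i and row i + ROWS/2,
   giving two independent streams over disjoint halves of the array. */
void increment_split(int a[ROWS][COLS])
{
    assert(ROWS % 2 == 0);  /* otherwise the last row would be skipped */
    for (int i = 0; i < ROWS / 2; i++)
        for (int j = 0; j < COLS; j++) {
            a[i][j]++;
            a[i + ROWS / 2][j]++;
        }
}

/* Returns 1 if both versions leave identical array contents. */
int versions_agree(void)
{
    int x[ROWS][COLS], y[ROWS][COLS];

    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            x[i][j] = y[i][j] = i * COLS + j;

    increment_naive(x);
    increment_split(y);
    return memcmp(x, y, sizeof x) == 0;
}
```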

>> But it's ... interesting ... to communicate to a C compiler that the
>> second optimisation is valid. If you did:
> 
> Quite.  The compiler can't always know that there's no data dependency 
> inside the loop.  The current plan for autovect-branch is to control 
> the vectorizer with a #pragma (which is apparently how icc does it).

Eeew. I can't help feeling that there must be a better way :(
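
One language-level alternative, assuming a C99 compiler, is the restrict
qualifier: it lets the programmer promise that two pointers never refer to
overlapping storage, which is exactly the dependency information the
compiler cannot always work out for itself. A minimal sketch follows;
add_arrays() and add_arrays_selftest() are names invented for illustration,
and whether any particular compiler actually vectorizes the loop is not
guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* restrict is the caller's promise that dst and src do not overlap,
   so no store through dst can change a value later read through src. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Returns 1 if add_arrays() produced the expected sums. */
int add_arrays_selftest(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};

    add_arrays(a, b, 4);
    return a[0] == 11.0f && a[1] == 22.0f &&
           a[2] == 33.0f && a[3] == 44.0f;
}
```

Calling add_arrays() with overlapping arguments breaks the promise and is
undefined behaviour, so the burden moves from the compiler to the caller.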

-- 
This is like system("/usr/funky/bin/perl -e 'exec sleep 1'");
   --- Peter da Silva