[Gllug] Patterns of locality of reference in xterm

Nix nix at esperi.org.uk
Mon Jun 20 10:19:48 UTC 2011


On 20 Jun 2011, James Courtier-Dutton uttered the following:

> On 20 June 2011 00:40, Nix <nix at esperi.org.uk> wrote:
>> On 19 Jun 2011, Adrian McMenamin verbalised:
>>
>>> A few people expressed interest in this (honestly, they did).
>>>
>>> My first, rather crude, results can be seen here:
>>> http://cartesianproduct.wordpress.com/2011/06/19/patterns-of-locality-in-xterm/
>>
>> Doing it for GCC might *very* well be useful, because locality of
>> reference (particularly within single compilation passes) translates
>> directly into speed: GCC is distinctly cache-unfriendly and fixing that
>> is a long, long slog.
>
> Why not use a cache-oblivious algorithm.

It does, sort of: structures are arranged to maximize cacheline reuse,
and that sort of thing.

> I used one for some crypto implementation recently and it sped it up
> by about 20x.

The problem is that a crypto implementation is one very heavily studied
algorithm. A compiler has thousands of algorithms in it, most of which
have been studied by one or perhaps two people (though some parts such
as SSA representation have been heavily studied, plus we are privileged
to have an original designer of SSA working on GCC :) ). Plus there are
a large number of shared datastructures, and great quantities of churn
in the shape of new structures appearing and vanishing all the time.

> But for my optimization I only had to arrange the data to be cache
> oblivious. The code itself was so small it would all fit in the Level
> 1 cache of any modern CPU.

Would that this were true of any nontrivial compiler. Hell, the only
interpreters it's true of are some Forth interpreters and Lua.

> I suppose you are wishing the code to be cache oblivious with the code
> size being larger than the cache?

Well, yes, but it's more data size. I don't know of anyone who's done it
recently, but a few years ago someone did a profile and found that GCC
was taking one cache miss per twenty clock cycles. That's pretty bad.
(Some effort has gone into fixing this -- this was *way* bback when
Apple was still working on GCC, before it went GPLv3).

-- 
NULL && (void)
--
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug




More information about the GLLUG mailing list