[Gllug] Programming for performance on Linux

Nix nix at esperi.org.uk
Mon May 17 18:30:35 UTC 2010

On 17 May 2010, Jasper Wallace said:

> On Tue, 11 May 2010, James Courtier-Dutton wrote:
>> So, I am assuming that I will need to memcpy all 128 packets of the
>> same type to a memory location so they can all sit next to each other
>> in the Layer 1 cache, run the algorithm on them, and then memcpy them
>> back where they came from. The memcpy is relatively inexpensive in
>> relation to the number crunching done in the algorithm.
> You don't memcpy stuff into caches, it's handled for you at the memory
> manager level.

You may well want to memcpy() first if the data is widely scattered, for
two reasons: to pull it into L1/L2 cache up front (which level depends
on the volume), reducing uncertainty in running time (OK, this is a rare
requirement); and to pack the data more tightly into the cache, so that
most of your cache isn't filled with data that merely happens to sit in
the same cache lines as the data being worked on. (But it's better, if
possible, to avoid the memcpy() by storing the data in a cache-friendly
manner in the first place. This is only really practical if one
algorithm dominates the accesses to that data, though.)
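
A rough sketch of the gather/process/scatter idea, in case it helps.
Everything named here is made up for illustration: struct pkt, PKT_SIZE,
crunch() and process_type() are hypothetical, the 64-byte payload is a
guess, and crunch() is only a stand-in for the real algorithm. With
these sizes the scratch buffer is 8 KiB, which I'm assuming fits in L1
data cache on the target CPU; measure before trusting any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_SIZE  64    /* hypothetical payload size in bytes */
#define BATCH_MAX 128   /* batch size mentioned earlier in the thread */

/* Hypothetical descriptor: payloads live wherever they were received. */
struct pkt {
    int      type;
    uint8_t *payload;   /* PKT_SIZE bytes, possibly far from its neighbours */
};

/* Stand-in for the real number crunching: adds 1 to every byte. */
static void crunch(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] += 1;
}

/*
 * Gather up to BATCH_MAX payloads of the given type into one contiguous
 * buffer, run crunch() over the packed copy, then copy the results back
 * to where they came from. Returns the number of packets processed.
 */
static size_t process_type(struct pkt *pkts, size_t npkts, int type)
{
    static uint8_t scratch[BATCH_MAX * PKT_SIZE];  /* not thread-safe */
    size_t idx[BATCH_MAX];
    size_t n = 0;

    /* Gather: pack matching payloads back to back. */
    for (size_t i = 0; i < npkts && n < BATCH_MAX; i++) {
        if (pkts[i].type == type) {
            assert(pkts[i].payload != NULL);
            memcpy(scratch + n * PKT_SIZE, pkts[i].payload, PKT_SIZE);
            idx[n++] = i;
        }
    }

    /* Process: every byte touched is now part of the working set. */
    crunch(scratch, n * PKT_SIZE);

    /* Scatter: write each result back to its original location. */
    for (size_t k = 0; k < n; k++)
        memcpy(pkts[idx[k]].payload, scratch + k * PKT_SIZE, PKT_SIZE);

    return n;
}
```

Calling process_type(pkts, npkts, t) once per packet type is the whole
of it. Whether the two copies pay for themselves depends on how much
work crunch() does per byte, so this only makes sense when the algorithm
dominates, as described above.
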
Gllug mailing list  -  Gllug at gllug.org.uk