[SWLUG] how much more CPU performance can one get these days?

Fri Sep 18 08:40:27 UTC 2009

On Fri, 2009-09-18 at 08:50 +0100, Mark Summerfield wrote:
> On 2009-09-18, Iain Menzies-Runciman wrote:
> > Mark,
> > 
> > Having hit the limits of machines several times in the past you may need
> > to change the problem.
> > 
> > I would assume that you have already done the obvious and optimised the
> > code you have (also assuming that you have the source code).
> 
> I have the source but it is C and complicated & I can't even think of
> changing it. (Also the author is an expert on algorithms for intractable
> problems so I doubt I could improve what he's done.)
> 
> > From the sounds of it, it is an iterative process, so just running the
> > code on many machines at the same time would not help.
> 
> That's correct.
> 
> > Given that you are on a dual core, I am assuming that you are threading
> > and make the most of each core.
> 
> I don't know for sure if it is threaded, but if I don't use nice it
> manages to use 100% of CPU on my dual core.
> 
> > Can the code be parallelised and then split out onto multiple machines
> > e.g. using something like PVM or MPI? If it can then you can get
> > yourself a small cluster of cheap systems (e.g. 2nd hand systems at £50
> > a pop - or if you are like most techies use some of the old systems that
> > we have just lying around :-) ) and spread the load out onto a
> > distributed cluster. I managed to get a task from 22 hours down to 45
> > minutes - but that was PVM spread over 50 machines.
> 
> I wish... but there's no way of splitting it up. 
> 
> I was really wondering if getting a modern 64-bit quad core would
> actually make a significant difference.

Well, an important thing to determine is whether that's 100% of one core
or of all cores.

In top, press '1' to switch the view to list the cores separately. If
it's only maxing out one core, then it isn't threaded and more cores
aren't going to help.
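If you'd rather check from a script than watch top, Linux exposes the
same per-core counters that top uses in /proc/stat. Below is a minimal
Python sketch, assuming a Linux box; the function names (parse_cpu_times,
per_core_usage, sample) are hypothetical helpers, not part of any tool.
It takes two snapshots a second apart and reports how busy each core was
in between.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (idle, total)} for each per-core 'cpuN' line of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines; keep cpu0, cpu1, ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # The guest fields are already counted in user/nice, so sum only the first 8.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        times[fields[0]] = (idle, sum(values))
    return times


def per_core_usage(before, after):
    """Percent busy for each core between two parsed /proc/stat snapshots."""
    usage = {}
    for cpu, (idle_after, total_after) in after.items():
        idle_before, total_before = before.get(cpu, (0, 0))
        total = total_after - total_before
        idle = idle_after - idle_before
        usage[cpu] = 100.0 * (total - idle) / total if total > 0 else 0.0
    return usage


def sample(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_times(f.read())
    return per_core_usage(before, after)


if __name__ == "__main__":
    # Sort numerically so cpu10 comes after cpu9.
    for cpu, pct in sorted(sample().items(), key=lambda kv: int(kv[0][3:])):
        print(f"{cpu}: {pct:5.1f}% busy")
```

Run it while the job is going: one core near 100% with the other mostly
idle points to a single-threaded program.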

In that case, I would imagine (I'm no expert) that the thing to
concentrate on is getting the highest single-core performance, either by
using a single-core chip or a dual core that has better speed per core.

Also, it's probably worth making sure you're getting the best possible
RAM performance. I believe there are certain chipsets that use DIMMs in
pairs or triples (dual- or triple-channel memory) to get faster access,
so that, along with the speed of the RAM itself, will probably make a
significant difference when crunching lots of numbers.

Also make sure that the process isn't starved for memory: use something
like vmstat to check that there's no significant swapping happening (the
si and so columns), as that would slow it down and is easily fixed with
more RAM.
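The same swap information is available as cumulative counters in
/proc/vmstat: pswpin and pswpout count pages swapped in and out since
boot. A minimal Python sketch, assuming Linux; parse_vmstat,
swap_activity and check_swapping are hypothetical helper names. It
samples the counters twice while the job runs and reports the difference.

```python
import time


def parse_vmstat(text):
    """Turn /proc/vmstat's 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped (in, out) between two parsed /proc/vmstat snapshots."""
    return (after.get("pswpin", 0) - before.get("pswpin", 0),
            after.get("pswpout", 0) - before.get("pswpout", 0))


def check_swapping(interval=5.0):
    """Sample /proc/vmstat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/vmstat") as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        after = parse_vmstat(f.read())
    return swap_activity(before, after)


if __name__ == "__main__":
    pages_in, pages_out = check_swapping()
    print(f"swapped in: {pages_in} pages, swapped out: {pages_out} pages")
    if pages_in or pages_out:
        print("the machine is swapping while the job runs; more RAM may help")
```

A steady stream of non-zero numbers while the job is running is the
thing to look for; the odd page here and there is harmless.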

But if it really is a single-threaded algorithm, then it just isn't
going to scale all that well on consumer hardware.
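To confirm whether the process really is single-threaded without opening
top, Linux reports a thread count on the "Threads:" line of
/proc/<pid>/status. A minimal Python sketch, assuming Linux;
parse_thread_count and thread_count are hypothetical helper names. A
count of 1 means the process can only keep one core busy on its own
(unless it spawns separate worker processes).

```python
import os
import sys


def parse_thread_count(status_text):
    """Extract the thread count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


def thread_count(pid):
    """Number of threads in process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


if __name__ == "__main__":
    # Pass the solver's PID on the command line; defaults to this script itself.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid} has {thread_count(pid)} thread(s)")
```

For example, `python3 threads.py 12345`, where 12345 stands in for the
solver's actual PID as shown by top or ps.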

If it is threaded (the 'H' key in top, which shows individual threads,
is also useful there), then all of the speed advice above still applies,
but you can also throw more cores at it. How much that increases speed
will depend on how well the algorithm parallelises, but it's more
scalable than single-threaded.
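A standard rule of thumb for "how well the algorithm parallelises" is
Amdahl's law: if a fraction p of the run time can be spread across n
cores, the best possible speedup is 1 / ((1 - p) + p / n). The sketch
below is only that textbook formula, not a measurement of this program;
amdahl_speedup is a hypothetical helper name and the p values are
made-up illustrations.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when `parallel_fraction` of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Gain from moving a dual core up to a quad core, for a few hypothetical p values.
    for p in (0.0, 0.5, 0.9, 1.0):
        gain = amdahl_speedup(p, 4) / amdahl_speedup(p, 2)
        print(f"p={p:.1f}: quad core vs dual core = {gain:.2f}x")
```

With p = 0 (fully single-threaded) the quad core gains nothing, and even
a job that is half parallel only gets about 1.2x from doubling the cores
from 2 to 4, which is why per-core speed matters so much here.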