[Nottingham] Linux HT

Robert Davies nottingham at mailman.lug.org.uk
Sun Apr 6 21:47:01 2003


On Sunday 06 Apr 2003 17:35, Jon Masters wrote:

> I am reviewing a box at the moment and am disappointed with the performance
> two 2.4GHz Xeon CPUs when running under the supplied Linux configuration
> (Redhat 8, 2.4.18-26.8.0smp). This could be due in part to the Linux
> kernel configuration which Redhat supply and other related factors.
>
> It would seem logical to schedule a single process per real CPU and
> several threads of execution context per logical CPU though this is not
> what is happening in practice and the cache contention must be nasty.
>
> I have read Intel's architecture papers and understand roughly how the
> implementation of logical architectural units is done though there is
> always a chance someone here will be much more acquainted with the Linux
> kernel side of the HT support. I can see why the Linux implementation
> would have been done as it initially has been - and it is early days.
>
> The idea behind Hyperthreading is to improve the performance of threaded
> applications at a cost of about 10% increased die usage. The execution
> context of the Intel Architecture is duplicated but not the supporting
> caching and related mechanisms. This would be great in situations where
> standard locality of code and data dictates that highly threaded
> applications will benefit from increased throughput, but only for threads.

The problem with Hyper-Threading compared with traditional SMP is that you're 
trying to make more efficient use of on-die resources (otherwise idle 
execution units and so on), which means performance gains depend very much on 
how well the two hyper-threads complement each other.  I can only comment 
theoretically, but having followed the kernel news, I think you need a kernel 
with the 2.5 scheduler changes, which treat a hyperthreaded SMP machine as, 
in effect, a 2 x 2 NUMA-like architecture.  The O(1) scheduler has patches in 
2.5 aimed at hyper-threading.  The new scheduler understands that the cache 
is shared when hyperthreading, so both HT execution contexts on the same die 
are equally good targets when scheduling a process, without blowing away the 
cache context by moving it onto a second physical CPU.

The idea behind hyper-threading is to make progress using otherwise unused 
execution units while a thread stalls on memory access.  The strength is that 
it can raise performance for about a 5% increase in die area; the weakness is 
that unless the threads are working on cached memory locations, it can 
actually be slower, because of the CPU's internal overhead in faking SMP, 
only for both contexts to stall on memory access or contend for the same 
execution units.

Anandtech has a fairly good article on Hyper-Threading which explains why its 
use is not a no-brainer; Intel have shipped HT-capable chips for the desktop, 
but advised BIOS vendors to default to keeping it off.  You should get the 
most benefit where one set of processes is doing one type of work, say 
compiling, whilst the 'hyper-thread' makes use of the unused execution units 
to do, say, floating point.
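You can at least check whether the kernel has spotted HT from /proc/cpuinfo: 
kernels of this era export an "ht" flag and a "siblings" count per package.  
A simplistic sketch, written against a string so it can be tried anywhere 
(the field names are assumed from contemporary kernels, and the parsing is 
deliberately naive):

```c
/* Simplistic sketch: decide whether /proc/cpuinfo text reports HT.
 * Field names ("ht" in the flags line, "siblings") are assumed
 * from 2.4/2.5-era kernels. */
#include <stdio.h>
#include <string.h>

/* Returns logical contexts per package if HT is reported, else 1. */
int ht_siblings(const char *cpuinfo)
{
    int siblings = 1;
    const char *p = strstr(cpuinfo, "siblings");

    if (!strstr(cpuinfo, " ht"))   /* "ht" in the flags line */
        return 1;
    if (p)
        sscanf(p, "siblings : %d", &siblings);
    return siblings;
}
```

Note the "ht" flag only says the chip is capable; whether the BIOS has left 
it enabled shows up as siblings being greater than 1.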

Intel are suggesting software developers make applications hyper-thread 
aware, turning it on and off depending on the instruction mix, which seems 
dire and unworkable to me.  The Linux approach seems to be to have the kernel 
attempt to schedule extra threads, and to hope that different processes avoid 
contention and simultaneous cache misses often enough to make it worthwhile.  
That said, the early P4s were very low on cache; it's noticeable that planned 
L2 cache sizes have already quadrupled in two years, even if the L1 caches 
remain a puny (but fast) 16KB, and it's likely there's a link.  Actually the 
new Pentium Mobile chip (PIII-based) is a nice design, managing to wring even 
more instructions per clock cycle out of parallelism, though it's clear these 
architectures have reached the point of diminishing returns, and many RISC 
server chips under development are SMP multi-CPU on one die.  Those don't 
risk slow-downs from thread contention in the same way, but separate CPUs 
double the silicon required.

Rob