[Gllug] Random freezes

Sun Feb 6 16:12:23 UTC 2005

On Mon, 31 Jan 2005, Christian Smith yowled:
> Thought this'd stoke the flames...;)
> 
> 
> On Fri, 28 Jan 2005, Nix wrote:
> 
>>On Fri, 28 Jan 2005, Christian Smith spake:
>>> With sun4c, NetBSD handles the strange SPARC MMU much better than Linux
>>> ever could
>>
>>That's a risky statement.
> 
> 
> It's widely acknowledged that Linux sucks on sun4c. Check out:

Linux is presently *nonfunctional* on sun4c: the code has rotted.

> http://www.uwsg.iu.edu/hypermail/linux/kernel/0107.0/0003.html
> 
> The issues have not been fixed in the meantime, and probably never will
> be.

Certainly not, unless someone other than davem does it. :)

I can't recall what the weirdness is with sun4c, but yes, I was wrong,
because I was thinking of UltraSPARC (stupid, really, since you
*said* `sun4c').

>>In fact, substantial revisions were made specifically to support SPARC.

UltraSPARC. Acquire brain, idiot.

>>(Note that while several different MMUs are used on SPARC-class boxes,
>>the UltraSPARC MMUs are recognisably related to (some of) the SPARC
>>ones, and many of the same weird features are still needed (IIRC, the
>>biggest being software-invalidated caches and the lovely instructions
>>that let you look at address spaces `as if' you were running in some
>>other protection domain; but it's several years since I looked at this
>>last...))
> 
> Crikey, software-invalidated cache is truly the spawn of satan.

I dunno: you can do really wicked cool evil tricks with it...

... however, nobody ever does, so everyone just gets the costs without
the (questionable) benefits.

Quite like register windows really. (You *could* have an infinitely
large stack with it, thanks to the software window underflow/overflow
traps... but nobody ever does it.)
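For anyone who hasn't met register windows, here is a toy model in Python of how a fixed number of on-chip windows plus overflow/underflow traps let call depth grow until memory runs out. It is not real SPARC semantics: it ignores the in/out register overlap between caller and callee and the window invalid mask, and the `resident' count of 4 used below is an arbitrary assumption.

```python
from collections import deque


class RegisterWindows:
    """Toy model: a few on-chip register windows backed by a memory stack."""

    def __init__(self, resident: int = 8):
        self.resident = resident   # windows that fit on chip (assumed value)
        self.on_chip = deque()     # resident frames, innermost on the right
        self.spilled = []          # frames spilled to memory, oldest first
        self.overflow_traps = 0
        self.underflow_traps = 0

    def save(self, frame: dict) -> None:
        """Function call: give the callee a fresh window."""
        if len(self.on_chip) == self.resident:
            # Window overflow trap: spill the outermost resident window.
            self.spilled.append(self.on_chip.popleft())
            self.overflow_traps += 1
        self.on_chip.append(frame)

    def restore(self) -> dict:
        """Function return: drop the callee's window, reactivate the caller's."""
        frame = self.on_chip.pop()
        if not self.on_chip and self.spilled:
            # Window underflow trap: the caller's window was spilled; fill it.
            self.on_chip.append(self.spilled.pop())
            self.underflow_traps += 1
        return frame

    def depth(self) -> int:
        return len(self.on_chip) + len(self.spilled)


rw = RegisterWindows(resident=4)
for i in range(10):                   # recurse 10 deep with only 4 windows
    rw.save({"level": i})
print(rw.depth(), rw.overflow_traps)  # 10 6
```

Nothing in the model bounds `depth()' except memory; the catch is that every trap costs far more than an ordinary call or return.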

>>So what's NetBSD got that Linux hasn't in this area? Where's Linux's MM
>>`fundamentally weak'? (And which MM do you mean? 2.0, 2.2, 2.4 <2.4.10,
>>2.4.x >=2.4.10, and 2.6 are really quite different from each other in
>>a number of fundamental ways.)
> 
> The fact that you were asking for qualification on which version of the
> Linux VM has the problem shows that it is not a particularly stable
> abstraction.

Actually, the VM itself stayed quite stable, modulo a rewrite in 2.4.10;
it's the parameters that control things like memory balancing that have
been tweaked like mad. (That's also where its weaknesses were: before
rmap, the inability to go from a physical page back to the virtual
addresses mapping it really caused problems with things like swapping.)
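A toy illustration of what a reverse map buys, using plain Python dicts (nothing like the kernel's actual rmap structures; `ToyMM' and its methods are made up for the example). To evict a physical frame you must find every PTE pointing at it: without a reverse map that means scanning every address space, with one it is a direct lookup.

```python
from collections import defaultdict


class ToyMM:
    """Forward page tables plus a reverse map, for comparing unmap strategies."""

    def __init__(self):
        # Forward: address space -> {virtual page number: physical frame number}
        self.page_tables = defaultdict(dict)
        # Reverse: physical frame number -> {(address space, vpn), ...}
        self.rmap = defaultdict(set)

    def map(self, mm: str, vpn: int, pfn: int) -> None:
        self.page_tables[mm][vpn] = pfn
        self.rmap[pfn].add((mm, vpn))

    def unmap_frame_by_scan(self, pfn: int) -> list:
        """Pre-rmap approach: walk every page table looking for pfn."""
        found = []
        for mm, table in self.page_tables.items():
            for vpn in [v for v, p in table.items() if p == pfn]:
                del table[vpn]
                found.append((mm, vpn))
        self.rmap.pop(pfn, None)  # keep the reverse map consistent
        return sorted(found)

    def unmap_frame_by_rmap(self, pfn: int) -> list:
        """rmap approach: jump straight from the frame to its mappings."""
        found = self.rmap.pop(pfn, set())
        for mm, vpn in found:
            del self.page_tables[mm][vpn]
        return sorted(found)
```

The scan costs time proportional to every mapped page in the system on each eviction; the reverse map trades that for memory spent on every mapping.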

> It's based on a "generalised" page table model, with 3 levels of pages.
> That works fine on machines with <=3 levels of page tables, but x86-64 has
> 4 levels in 64 bit mode, I believe.

You're whole weeks behind the times :) Linux has had four levels since
early January, and the middle levels can be folded away at compile time
on architectures that don't need them.

(Mind you, no *released* kernel has this yet, but when 2.6.11 comes
out...)
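Roughly how a virtual address gets carved up, as a Python sketch. On x86-64 with 4 KiB pages each of the four levels (pgd, pud, pmd, pte) indexes 9 bits, giving 48-bit virtual addresses. A level an architecture doesn't need is modelled here as zero bits wide, so its index is always 0; that is the effect of the kernel's compile-time folding, though the kernel does it with C headers rather than anything like this. The `Layout' tuple and the 2-level i386 (non-PAE) layout are illustrative assumptions.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    page_shift: int   # log2(page size)
    bits: tuple       # index widths for (pgd, pud, pmd, pte); 0 = folded


X86_64 = Layout(12, (9, 9, 9, 9))          # 4 levels, 48-bit virtual addresses
I386_2LEVEL = Layout(12, (10, 0, 0, 10))   # 32-bit non-PAE: pud and pmd folded


def split(vaddr: int, layout: Layout) -> tuple:
    """Return (pgd, pud, pmd, pte, offset) indices for vaddr under layout."""
    offset = vaddr & ((1 << layout.page_shift) - 1)
    shift = layout.page_shift
    indices = []
    for nbits in reversed(layout.bits):    # lowest level (pte) first
        # A zero-width (folded) level masks with 0, so its index is always 0.
        indices.append((vaddr >> shift) & ((1 << nbits) - 1))
        shift += nbits
    pte, pmd, pud, pgd = indices
    return pgd, pud, pmd, pte, offset


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split(va, X86_64))                   # (1, 2, 3, 4, 5)
```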

>                                     It works alright in TLB-based MMUs,
> such as MIPS and SPARC V9, but doesn't really fit architectures based
> on inverted page tables.

Well, it can work on them. It's just trickier. (Quite a bit trickier.)
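For comparison, a toy hashed inverted page table in Python: one entry per *physical* frame, found by hashing (address-space ID, virtual page). The layout (a hash anchor table plus per-frame chain links) is a common textbook arrangement, assumed here rather than taken from any particular CPU. Each frame records exactly one owner, which is one reason shared pages make a Linux-style VM tricky on such machines.

```python
from typing import Optional


class InvertedPageTable:
    """One entry per physical frame, located by hashing (asid, vpn)."""

    def __init__(self, nframes: int):
        self.nframes = nframes
        self.owner = [None] * nframes   # frame -> (asid, vpn), or None if free
        self.chain = [-1] * nframes     # next frame in the same hash bucket
        self.anchor = [-1] * nframes    # hash bucket -> first frame in chain

    def _bucket(self, asid: int, vpn: int) -> int:
        return hash((asid, vpn)) % self.nframes

    def insert(self, asid: int, vpn: int, frame: int) -> None:
        if self.owner[frame] is not None:
            raise ValueError("frame already mapped: one owner per frame")
        b = self._bucket(asid, vpn)
        self.owner[frame] = (asid, vpn)
        self.chain[frame] = self.anchor[b]   # push onto the bucket's chain
        self.anchor[b] = frame

    def lookup(self, asid: int, vpn: int) -> Optional[int]:
        frame = self.anchor[self._bucket(asid, vpn)]
        while frame != -1:
            if self.owner[frame] == (asid, vpn):
                return frame
            frame = self.chain[frame]
        return None                          # not resident: page fault

    def remove(self, frame: int) -> None:
        asid, vpn = self.owner[frame]
        b = self._bucket(asid, vpn)
        prev, cur = -1, self.anchor[b]
        while cur != frame:                  # find frame's predecessor in chain
            prev, cur = cur, self.chain[cur]
        if prev == -1:
            self.anchor[b] = self.chain[frame]
        else:
            self.chain[prev] = self.chain[frame]
        self.owner[frame] = None
        self.chain[frame] = -1
```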

> Page tables are largely a hardware notion.

Linux page tables are divorced from the hardware: we just use the same
terms as do the hardware people. s/page table/way of tracking a page-sized
block of memory/ if you prefer.

>                                            Linux page tables are
> overloaded with swap information, for example, making them impossible to
> page out or discard.

The inability to swap out page tables is definitely a failing, but it's
hard to see how it could be fixed while retaining the ability to examine
memory globally to search for things to swap out. I guess you'd have to
impose a rule that says that page tables are swapped out when the pages
they control have all been swapped out, and use something like swap
clustering...

... but none of that requires moving the page flags out of `struct
page'. Doing that would be hugely disruptive, actually...
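The rule guessed at above, sketched in Python under obvious simplifying assumptions (a page-table page as a dict of entries, each either resident or holding a swap slot; `PageTablePage' and its methods are hypothetical): keep a count of entries still pointing at RAM, and only let the table page itself go to swap once that count hits zero.

```python
class PageTablePage:
    """Toy page-table page implementing the hypothetical eviction rule."""

    def __init__(self):
        # index -> ("resident", pfn) or ("swapped", swap_slot); absent = unmapped
        self.entries = {}
        self.resident = 0  # how many entries still point at RAM

    def set_resident(self, idx: int, pfn: int) -> None:
        if self.entries.get(idx, ("", 0))[0] != "resident":
            self.resident += 1
        self.entries[idx] = ("resident", pfn)

    def set_swapped(self, idx: int, swap_slot: int) -> None:
        if self.entries.get(idx, ("", 0))[0] == "resident":
            self.resident -= 1
        self.entries[idx] = ("swapped", swap_slot)

    def can_swap_out(self) -> bool:
        # The table page may go to swap only once every page it controls is
        # already out; its swap entries then travel with it.
        return bool(self.entries) and self.resident == 0
```

Because the swap entries live in the table page, evicting it means writing them out too, which is the `page tables overloaded with swap information' point seen from the other side.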

>                      Not an issue for memory-laden modern desktops, but an
> issue for embedded or small machines.

Linux is drifting in the direction you mention, actually, driven by very
*large*-memory machines (tens of GB of RAM on 32-bit boxes, NUMA
boxes...) in which page tables have all sorts of weird extra stuff which
needs to be tracked.

>                                       And page table reuse could even be a
> win on memory rich machines, due to increased cache utilisation, though I
> guess it would not be a big win.

It's a *huge* win. The `struct page' uses up ~4% of all memory; that's a
significant amount no matter how much memory you've got.
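The arithmetic behind a figure like that is just descriptor size over page size, since there is one `struct page' per physical page frame. A Python sketch; the 64-byte descriptor and 4 KiB page in the example are assumed values, not measurements of any particular kernel build, and `implied_desc_bytes' runs the calculation backwards to show what per-page size a given percentage implies.

```python
def descriptor_overhead(desc_bytes: int, page_bytes: int = 4096) -> float:
    """Fraction of RAM spent on per-page descriptors (one per page frame)."""
    return desc_bytes / page_bytes


def descriptor_total(ram_bytes: int, desc_bytes: int, page_bytes: int = 4096) -> int:
    """Bytes of descriptors needed to cover ram_bytes of physical memory."""
    return (ram_bytes // page_bytes) * desc_bytes


def implied_desc_bytes(fraction: float, page_bytes: int = 4096) -> float:
    """Per-page descriptor size that a given overhead fraction implies."""
    return fraction * page_bytes


print(descriptor_overhead(64))          # 0.015625, i.e. about 1.6%
print(descriptor_total(1 << 30, 64))    # 16777216 bytes (16 MiB) for 1 GiB
```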

> Mach/BSD pmap and SVR4 HAT are much better MMU abstractions: simple APIs
> which hide *all* the details of the hardware MMU, on top of which truly
> platform-independent VM can be built.

Downside: if you hide *all* the details then you can't build a VM that
takes full advantage of the hardware.
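A rough Python sketch of the shape of such an interface: machine-independent VM code talks only to an abstract per-address-space object, and each port supplies a backend. The method names echo BSD's pmap(9) (`pmap_enter', `pmap_remove', `pmap_protect', `pmap_extract'), but the signatures here are simplified inventions, and `DictPmap' is a hypothetical backend standing in for real hardware page tables.

```python
from abc import ABC, abstractmethod
from typing import Optional

PAGE = 4096            # assumed page size
PAGE_MASK = PAGE - 1


class Pmap(ABC):
    """Machine-independent interface to one address space's MMU state."""

    @abstractmethod
    def enter(self, va: int, pa: int, prot: str) -> None: ...

    @abstractmethod
    def remove(self, start: int, end: int) -> None: ...

    @abstractmethod
    def protect(self, start: int, end: int, prot: str) -> None: ...

    @abstractmethod
    def extract(self, va: int) -> Optional[int]: ...


class DictPmap(Pmap):
    """Hypothetical backend: a dict of page-aligned va -> (pa, prot)."""

    def __init__(self):
        self.entries = {}

    def enter(self, va, pa, prot):
        self.entries[va & ~PAGE_MASK] = (pa & ~PAGE_MASK, prot)

    def remove(self, start, end):
        for va in [v for v in self.entries if start <= v < end]:
            del self.entries[va]

    def protect(self, start, end, prot):
        for va, (pa, _) in list(self.entries.items()):
            if start <= va < end:
                self.entries[va] = (pa, prot)

    def extract(self, va):
        entry = self.entries.get(va & ~PAGE_MASK)
        return None if entry is None else entry[0] | (va & PAGE_MASK)
```

The catch, as said above, is that anything the interface doesn't expose (superpages, say, or odd cache behaviour) is invisible to the VM layered on top of it.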

>                                       NetBSD had a completely new VM model
> between 1.4 and 1.5 across all 12 CPU families then supported, possible
> only because the VM model was completely MMU agnostic.

That's hyperbole.

Being completely VM-agnostic is of course impractical: you still have to
acknowledge the existence of swap, L2 cache, or at least staggered levels
of storage of different speeds. That's the whole *point* of MM.

> The fact that BSDs are often cited as better under high load is a function
> of their better VM architecture. Linux would do well to adopt a similar
> separation of VM policy and mechanism. Alas, it will not happen.

*sigh* VM policy and mechanism have been separated to some degree in
Linux since the 1.something days, and are growing more so with
time. `struct page' *is not a hardware page table entry*.

(And anyone who walks the page table directly without a good reason gets
his code rejected.)

-- 
`anybody who quotes Russ [Allbery] can be forgiven almost anything!'
                                                  -- Stephen J. Turnbull
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug