[Nottingham] Various kernel oops

Martin martin at ml1.co.uk
Fri Jun 6 01:47:02 BST 2008


Martin wrote:
[---]
> No OC and I've had it at 100% CPU utilisation running Boinc projects.
> Also checked out with memtest86+ and a HDD diagnostics.
> 
> I may well try it with Bonnie to check out the HDDs/IO and fs.

Well, the story rolls on...

The 'fix' appears to have been to clobber the SATA HDDs' NCQ depth down
to '1' (effectively no NCQ) and to boot with the kernel parameter
mem=3584M so that memory use never goes over the 4 GByte boundary. (I'm
sure mem=4000M or even mem=4096M would work just as well.)
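For anyone wanting to try the same, here is a rough sketch of both halves of that 'fix'. It is not what I actually ran. The helper names are hypothetical. It assumes the usual Linux sysfs location /sys/block/<disk>/device/queue_depth for a SATA disk driven through libata/SCSI, and it assumes that mem= caps the top of usable RAM at the given size. The arithmetic shows why 3584M (0xE0000000 bytes) keeps well clear of the 4 GByte (2^32) boundary, and why 4096M just touches it.

```python
from pathlib import Path

FOUR_GIB = 1 << 32  # the 4 GByte boundary: 0x100000000

def parse_mem_param(value: str) -> int:
    """Convert a mem= style size such as '3584M' into bytes (K/M/G suffixes only here)."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # plain number of bytes

def stays_below_4gib(value: str) -> bool:
    """True if capping RAM at `value` keeps the highest address at or below 0xFFFFFFFF.

    Assumes RAM is addressed from 0 and mem= limits the top address.
    """
    return parse_mem_param(value) <= FOUR_GIB

def set_queue_depth(disk: str, depth: int, sysfs_root: str = "/sys/block") -> None:
    """Write a new queue depth for `disk`; depth 1 effectively disables NCQ.

    Hypothetical helper: writing to the real sysfs file needs root.
    """
    path = Path(sysfs_root) / disk / "device" / "queue_depth"
    path.write_text(f"{depth}\n")
```

For example, `stays_below_4gib("3584M")` is true (3584 × 2^20 = 3758096384 bytes), while something like `"6G"` is not. `set_queue_depth("sda", 1)` is the equivalent of echoing 1 into that sysfs file as root. The `sysfs_root` parameter is only there so the function can be pointed at a scratch directory instead.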

From various thrashings with Bonnie++, I don't think having the swap on
was really the problem, other than perhaps encouraging more physical
RAM to be used for buffers sooner, and so tripping over the 4 GByte
boundary sooner. Oops.

And the cause of the Oops?...

From reading around, I suspect that I/O DMA from some hardware somewhere
does not survive going above the 4 GByte boundary.
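To make that suspicion a little more concrete, here is a simplified model. It is not the kernel's actual code path. A device limited to 32-bit DMA can only reach buffers whose every byte lies at or below 0xFFFFFFFF. `dma_bit_mask` mirrors the Linux kernel's `DMA_BIT_MASK(n)` macro. `buffer_reachable` is a hypothetical helper. In a working system, the kernel should keep such a device away from high buffers, for example by bouncing them or by using an IOMMU. The guess above is that some device or driver somewhere gets this wrong.

```python
def dma_bit_mask(bits: int) -> int:
    """Python mirror of the kernel's DMA_BIT_MASK(n): highest address an n-bit device can use."""
    return (1 << bits) - 1

def buffer_reachable(phys_addr: int, length: int, mask_bits: int = 32) -> bool:
    """True if every byte of [phys_addr, phys_addr + length) fits under the device's DMA mask."""
    last_byte = phys_addr + length - 1
    return last_byte <= dma_bit_mask(mask_bits)
```

With mem=3584M, a 4 KByte buffer at 0xE0000000 is reachable by a 32-bit device. A buffer at 0x100000000 is not. Neither is an 8 KByte buffer starting at 0xFFFFF000, because its last byte (0x100000FFF) lands just past the boundary.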

There have been various kernel fixes recently (2.6.25...), including for
NCQ and for avoiding unwarranted locking of pages for pdflush (inodes?),
which I might have seen as a problem during high utilisation of RAM.
Mandriva have also recently released another version of the 2.6.24
kernel with security updates and bugfixes... So... Further experimenting
when I can.


I also enabled ksymoops and that sent off a fair few reports to the
kernel people. No idea if any particular one got picked up.

To be continued...


(The next meet might be about kernels and schedulers unless there's
other better or more down-to-earth suggestions! ;-) )

Cheers,
Martin

-- 
----------------
Martin Lomas
martin at ml1.co.uk
----------------


