[Gllug] DD Ext3 Move

Nix nix at esperi.org.uk
Fri Feb 10 07:28:53 UTC 2006


On Thu, 09 Feb 2006, Rich Walker yowled:
> Nix <nix at esperi.org.uk> writes:
>> hardware problems appear resolved (really weird hardware problems,
>> too; turn on ECC L2 cache RAM, and get intermittent single-bit errors
>> under high load: turn it off, and the errors go away. ECC shouldn't
>> introduce errors, should it?)
> 
> Might be - turn on ECC, *detect* single-bit errors. turn off ECC,
> *don't* detect single-bit errors.

The `detection' consists of unexpected single-bit errors when running
this test script:

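#!/bin/bash
# Repeatedly compress, verify and decompress one file, keeping any copy
# that fails so it can be examined later.

cd /tmp || exit 1
i=0
cp ~nix/Graphics/cats_te_mpl_mpg.mpg .
while :; do
    echo -n "$i "
    # Any nonzero exit (e.g. a bzip2 CRC error) counts as a failure.
    if ! ( bzip2 -9 cats_te_mpl_mpg.mpg && bzip2 -t cats_te_mpl_mpg.mpg.bz2 && bunzip2 cats_te_mpl_mpg.mpg.bz2 ); then
        # Set the bad file aside and start again from a fresh copy.
        mv cats_te* failure.$i
        cp /home/nix/Graphics/cats_te_mpl_mpg.mpg .
        echo -n "(failure) "
    fi
    i=$((i+1))
done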

where that .mpg file is 10MB. Without the L2 ECC, the single-bit errors
go away.
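
One way to check that a given failure really is a single flipped bit is
to count the differing bits between the failed file and a known-good
copy of the same file. A minimal sketch, assuming a POSIX cmp(1) whose
-l output is "offset octal-byte octal-byte"; the function name bitdiff
and the file names in the usage line are made up for illustration and
are not part of the test above:

```shell
# bitdiff FILE1 FILE2: print the number of bits that differ between
# two files of the same length (hypothetical helper, not from the
# original script).
bitdiff() {
    cmp -l "$1" "$2" | {
        bits=0
        # cmp -l prints one line per differing byte:
        #   <decimal offset> <octal byte in FILE1> <octal byte in FILE2>
        while read -r off a b; do
            # A leading 0 makes the shell treat the values as octal.
            x=$(( 0$a ^ 0$b ))
            # Count the set bits in the XOR of the two bytes.
            while [ "$x" -ne 0 ]; do
                bits=$(( bits + (x & 1) ))
                x=$(( x >> 1 ))
            done
        done
        echo "$bits"
    }
}
```

Usage would be something like `bitdiff failure.17 good.mpg.bz2`, where
good.mpg.bz2 is assumed to be the same input recompressed with the same
bzip2 -9 under settings you trust. A result of 1 would point at a flip
after compression; a large count would suggest the damage happened
earlier, since a corrupted input changes the compressed stream widely.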

So it's not `enabling ECC on the L2 is tripping warnings'; no warnings
are seen. It's `enabling ECC on the L2 is causing corruption'.

As I said, I am mystified.

> I.e. - replace machine time.

The machine is brand new, and I've already replaced the RAM, the cache
RAM and the hilariously underspecified CPU fan.

> You did the "thousand kernel compiles" test?

A rolling GCC bootstrap-and-test while running cpuburn (a much more
intensive test than any mere kernel compile) found no problems.  Memtest
found no problems, and nor did half-a-dozen other testers. But bzip2 and
bunzip2, and my backup run which uses libbz2, seem to trip something
nasty: persistent CRC errors, tightly clustered. (It might work for half
an hour and then fail ten times in five minutes: probably there's *one*
bad bit of cache or something, and everything works until the kernel
moves things around to hit that bit). Turn off ECC and suddenly the
failures dissolve.

It's the single oddest memory subsystem failure I've ever seen.

>> It's only recently that I've had more than two disks in one machine, or
>> more than one disk of any vaguely similar size. Now I've got 1x50Gb
>> and 2x73Gb in one machine, it's time for 50Gbx3 RAID-5... :)
> 
> Now, if you could just wean yourself off SCSI/FC and onto PATA/SATA, you
> could put 6 or 8 200GB disks in instead for the same money, and probably

Hardly, I got two of those disks for almost nothing. ;)

> manage the same speed, as well as having a hot spare for the day
> everything goes south...

Well, I can SATA it up later, as well. (Or I could if the machine were
SATA-capable, which it isn't.)

>> (There will be some un-RAIDed storage there, as well, probably for swap
>> and things like MP3s which I can just re-rip if a disk dies. But
>> everything of significance is going under the RAID-5 hammer, including
>> the root filesystem.)
> 
> I'm still trying to work out whether to try RAID-ing swap
> partitions. 'Cos, at the moment, if a RAID disk with a non-RAID swap
> partition on it dies, I lose (potentially) all running processes as well...

Yeah, but there *is* CPU overhead from RAID... admittedly if you're swapping
the CPU is pretty much going to be idle.

The question is, are you considering RAID a `get back on your feet fast
after a reboot' system, or a `work unattended come what may' system? :)

>> My backups are semi-incremental (level 0 on CD-R, incremental on CD-RW
>> until I run out of disks, then incremental on CD-R against the most
>> recent CD-R backup), but of *everything*. I'm a paranoid madman :)
> 
> From time to time I run multicd to generate DVDs. That's always fun -
> it takes *forever* to do my maildir alone...

Likewise, although I think that's because of quadratic behaviour in
dar(1) on large directories. I must fix that, although the source being
mostly in French doesn't help.

-- 
`... follow the bouncing internment camps.' --- Peter da Silva