[Gllug] disk problems

Sean Burlington sean at burlington.eclipse.co.uk
Tue Mar 14 21:27:30 UTC 2006


Nix wrote:
> 
> Between May 1989 (my first PC) and Jan 2006 I had two fan failures.
> 
> In late Jan 2006 and early-to-mid Feb 2006 I had
> 
> - two disk failures (one whose motor died at spinup, one just from
>   old age and bearing wear), leading to the decommissioning of an
>   entire machine because Sun disks cost so much to replace
> - one motherboard-and-network-card failure (static, oops)
> - one overheating CPU (on the replacement for the static-death box)
> - and some (very) bad RAM (on that replacement).

How many machines do you have?

It can't be so many that this isn't an appalling failure rate!

> Everything seems to be stable now, but RAID it is, and because I want
> actual *robustness* I'm LVM+RAID-5ing everything necessary for normal
> function except for /boot, and RAID-1ing that.

RAID 5 = at least 3 hard disks + a controller...

I can't really justify the expense of that, even though I have had a
couple of failures (and one or two learning experiences a while back).
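
Though with Linux software RAID the "controller" is really just
mdadm, so it's mostly the disks that cost. A setup like the one Nix
describes would go roughly like this (a sketch only - the device
names and volume group are made up):

    # RAID-5 across three disks, with LVM on top
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
          /dev/sdb /dev/sdc /dev/sdd
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 20G -n root vg0
    mkfs.ext3 /dev/vg0/root
    # and a RAID-1 mirror for /boot
    mdadm --create /dev/md1 --level=1 --raid-devices=2 \
          /dev/sda1 /dev/sdb1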

Pretty much everything important is under version control at work and
backed up to tape nightly.

I just need a more effective way of separating out stuff that needs to
be regularly backed up from the rest of it...
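
One low-tech approach might be to keep the backup-worthy paths in a
plain file and feed it to tar, so changing what gets backed up is a
one-line edit - a sketch (the list filename is made up):

    # dirs-to-backup.txt holds one path per line
    tar -czf /var/backups/home-$(date +%Y%m%d).tar.gz \
        -T ~/dirs-to-backup.txt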

My backups have been getting less frequent as I keep finding that the
stuff I planned to back up is over DVD size!
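
When an archive outgrows a single disc, split(1) can at least chop it
into DVD-sized pieces - for example (filenames made up):

    split -b 4300m backup.tar.gz backup.tar.gz.part-
    # reassemble later with: cat backup.tar.gz.part-* > backup.tar.gz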

And maybe I'll look into something like MondoRescue - even if I end up
spending more time backing up than I would restoring from scratch, at
least I can plan when I do the work.

> Thanks to initramfs this really isn't actually all that hard :) my /init
> script in the initramfs is 76 lines, and that includes enough error
> checking that if / can't be mounted, or the RAID arrays are shagged, or
> LVM has eaten itself, I get a terribly primitive shell with access to
> mdadm, the lvm tools, and fsck, on a guaranteed-functioning FS which can
> only go away if the kernel image itself has vanished.
> 
> (I'm an early adopter; eventually use of an initramfs will be mandatory,
> and even now everyone running 2.6 has an initramfs of sorts built in,
> although it's empty.)

I try not to be an early adopter - or at least I try to stick with
Debian stable where I can and cherry-pick from more up-to-date stuff.
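
Still, for the curious, a trimmed-down /init along the lines Nix
describes might look something like this (my sketch, not his actual
script; the volume group name is made up and it assumes an mdadm.conf
inside the initramfs):

    #!/bin/sh
    # fall back to a primitive rescue shell if any step fails
    rescue() { echo "$1 - dropping to shell"; exec /bin/sh; }

    mount -t proc proc /proc
    mdadm --assemble --scan || rescue "RAID assembly failed"
    vgchange -ay            || rescue "LVM activation failed"
    fsck -a /dev/vg0/root            # exit status 1 just means "fixed"
    [ $? -gt 1 ] && rescue "fsck found serious errors"
    mount /dev/vg0/root /root || rescue "can't mount /"
    umount /proc
    exec switch_root /root /sbin/init  # busybox-style handover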


> 
> The CPU in this box is nice and cool for a P3/600, probably because I
> overdid it and slung in a huge fan after the first overheating incident:
> 
> CPU Temp:  +44.2°C  (high =   +95°C, hyst =   +89°C)
> 
> :)

lm-sensors has been on my todo list for a while ...
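
On Debian it looks like it should be no more than this (untested by
me, so treat it as a sketch):

    apt-get install lm-sensors
    sensors-detect    # interactive; works out which modules to load
    sensors           # report temperatures, fan speeds and voltages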

> 
> 
>>I think I'm going to make more effort to switch off overnight in
>>future - it seems to me that it's boxes that get left on for months
>>which have problems.
> 
> 
> I've left boxes on for years at a time with no problems at all. Most of
> my failures have happened at poweron time (excepting the old-age disk
> and that was fifteen years old when it died and had been running
> constantly for all that time, except for house moves; it had been a
> huge and expensive disk in its day, 4GB!)

Now that I've got things up and running again I'm much happier, and I
don't want to give up MythTV - so it's likely to get left switched on.

What I will do is make a full backup before shutting down such a machine!

> 
>>>What is `the system'? /etc/mtab (as used by df(1)) is maintained by
>>>mount(8) and is terribly unreliable; it's confused by mount --bind,
>>>per-process filesystems, chroot(8), mount --move, mount --rmove, subtree
>>>sharing, you name it, it confuses it.
>>>/proc/mounts is maintained by the kernel and is actually reliable. What
>>
>>this seems bad - there was no /proc/mounts
> 
> 
> Either you don't have /proc mounted or you're running a pre-2.4 kernel.
> Both these things are generally bad signs.
> 

it was a 2.6 kernel compiled by me rather than one shipped by the
distribution.

I'm not sure if /proc was mounted or not - it was in /etc/fstab, but
since I didn't trust df and couldn't read /proc ... I re-installed to
a different disk.
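
For next time: as long as the kernel was built with CONFIG_PROC_FS,
/proc can be mounted by hand and the kernel's own mount table read
directly:

    mount -t proc proc /proc    # mount procfs by hand
    cat /proc/mounts            # the kernel's view, unlike /etc/mtab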


-- 

Sean