[Gllug] disk problems
Nix
nix at esperi.org.uk
Mon Mar 13 21:33:19 UTC 2006
On Mon, 13 Mar 2006, Tethys murmured woefully:
> Nix writes:
>>Join the club. After I lost two disks in a month-long period I went
>>all-out and have now RAIDed the lot. No more disk death for *me*.
>
> Don't be too proud of this technological terror you've constructed.
> The first time you lose 3 drives concurrently will cure you of your
> belief that RAID will protect you :-)
Oh, it's not a replacement for a backup, and if lightning or a massive
static blast strikes my machine, or the house burns down, it won't help.
What it *will* do is reduce the panic quotient when a disk fails: I
don't have to buy a replacement instantly regardless of finances, nor
sit knocked off the net until it's installed with my most critical
machine (holding 95% of my data and 85% of my disk space) dead in the
water.
Plus I got an excuse to hack at initramfs scripts, busybox and uClibc a
bit, and to eliminate the vast and risky kludge which is fscking a
mounted filesystem (i.e. /, just after boot): now I can fsck it before
it's mounted, from a root filesystem constructed ex nihilo and populated
from unchanging data at the instant it's used, and thus guaranteed not
to be damaged.
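The shape of the initramfs /init ends up something like this. This is
only a rough sketch with illustrative device and LV names, not my exact
script, which does rather more error checking:

  #!/bin/sh
  # busybox ash, running from the initramfs before the real root exists
  mount -t proc proc /proc
  mount -t sysfs sysfs /sys
  mdadm --assemble --scan           # bring up the md arrays
  vgchange -ay                      # activate the LVM volume groups
  fsck -a /dev/raid/root            # check / while it is still unmounted
  mkdir -p /newroot
  mount -o ro /dev/raid/root /newroot
  exec switch_root /newroot /sbin/init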
> Unless you're using some seriously
> OTT RAID scheme -- which I've seen done, but it wasn't exactly cheap.
Nah, this is just a pair of three-device RAID5s (with no spares; I
don't have *that* much space), a tiny old-metadataed RAID1 for /boot,
and a couple of non-RAIDed partitions in the remaining space and for
swapping to:
loki:/boot# cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sdb7[0] hda5[3] sda7[1]
      19631104 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md1 : active raid5 sda6[0] hdc5[3] sdb6[1]
      76807296 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid1 sda5[0] hdc1[3] hda1[2] sdb5[1]
      56064 blocks [4/4] [UUUU]
unused devices: <none>
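(For anyone wanting to replicate it, the arrays were made with mdadm
invocations along these lines. The exact options are reconstructed from
the mdstat output above, so treat this as a sketch:

  mdadm --create /dev/md1 --metadata=1.2 --level=5 --raid-devices=3 \
        --chunk=64 /dev/sda6 /dev/sdb6 /dev/hdc5
  mdadm --create /dev/md2 --metadata=1.2 --level=5 --raid-devices=3 \
        --chunk=64 /dev/sdb7 /dev/sda7 /dev/hda5
  mdadm --create /dev/md0 --level=1 --raid-devices=4 \
        /dev/sda5 /dev/sdb5 /dev/hda1 /dev/hdc1
)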
loki:/boot# pvs -o +pv_used
  PV         VG    Fmt  Attr PSize   PFree  Used
  /dev/hdc6  disks lvm2 a-   592.00M      0 592.00M
  /dev/md1   raid  lvm2 a-    73.25G 34.05G  39.20G
  /dev/md2   raid  lvm2 a-    18.72G 15.72G   3.00G
  /dev/sda8  disks lvm2 a-    21.70G 18.22G   3.48G
  /dev/sdb8  disks lvm2 a-    21.67G  8.58G  13.09G
(It's all LVMed except for /dev/md0; LVMing that seemed a bit of a waste
of effort; /dev/md1 is where my important data is, because one of the
disks in /dev/md2 is old and slow).
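The LVM layer itself is nothing clever; roughly the following, though
the lvcreate line is purely illustrative (the real LVs get carved out
as needed, with different names and sizes):

  pvcreate /dev/md1 /dev/md2 /dev/hdc6 /dev/sda8 /dev/sdb8
  vgcreate raid /dev/md1 /dev/md2
  vgcreate disks /dev/hdc6 /dev/sda8 /dev/sdb8
  lvcreate -L 10G -n scratch disks   # example LV only; name and size made up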
It'll do the job, and I don't even need to feel uneasy about the
initramfs: the biggest problem I had with initrds (getting out of synch
with the kernel, and/or forgetting to rebuild it when needed) is not
present :)
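(For instance, if the initramfs cpio is built into the kernel image
itself, it gets regenerated on every kernel build; in the kernel
.config that's controlled by something like the following, with the
path purely illustrative:

  CONFIG_INITRAMFS_SOURCE="/usr/src/initramfs"

That way there is no separate initrd image to forget about.)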
--
`Come now, you should know that whenever you plan the duration of your
unplanned downtime, you should add in padding for random management
freakouts.'
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug