[Wylug-help] Inactive RAID 10 Array
Chris Davies
Chris.Davies at bcs.org.uk
Tue Apr 14 22:58:27 UTC 2009
Dave Fisher wrote:
> 1. How to correctly backup the affected array before I do anything else.
Using dd for each partition of the array will work. The MD superblock is
usually stored at the end of each physical partition (not sure how big
it is, though).
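For example, something like this (a rough sketch; the device names and
the /backup path are placeholders to adjust):

# for p in sdb4 sdc4 sdd4 sde4 sdf4; do dd if=/dev/$p of=/backup/$p.img conv=noerror,sync; done

conv=noerror,sync makes dd carry on past unreadable sectors, padding
them with zeros, which matters if one of the disks really is on the
way out.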
The manual page clarifies this a little:
<<
The different sub-versions store the superblock at different locations
on the device, either at the end (for 1.0), at the start (for 1.1) or
4K from the start (for 1.2).
>>
But it then goes on to muddy the waters by saying that the default (on
Debian, at least) is to create superblocks with version 0.90, which are
also stored at the end of the partition.
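You can check which version your superblocks actually carry with
something like this (substitute one of your own component partitions):

# mdadm --examine /dev/sdb4 | grep -i version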
> 2. How to diagnose the fault with a high degree of certainty.
Have you considered simply trying to re-assemble the array?
If you "mdadm --examine --scan -v", and the superblocks are still
intact, you'll get a dump of something like this (from my RAID 1 setup):
# mdadm --examine --scan -v
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=...
devices=/dev/hdc1,/dev/hda1
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=...
devices=/dev/hdc5,/dev/hda5
ARRAY /dev/md6 level=raid0 num-devices=2 UUID=...
devices=/dev/hdc6,/dev/hda6
ARRAY /dev/md9 level=raid10 num-devices=4 UUID=...
devices=/dev/dm-8,/dev/dm-3,/dev/dm-2,/dev/dm-1
If you see the right devices, you should simply be able to restart the array:
# mdadm --assemble /dev/md1 /dev/sd{b,c,d,e,f}4
Provided you don't use --force, mdadm will refuse to assemble the array
if there are too many errors on the devices.
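Either way, /proc/mdstat will show whether the array came up and how
many devices it's running with:

# cat /proc/mdstat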
> 5. sdd4 appears to be faulty and sdf4 is supposed to be a spare
You can mark the suspect partition as failed by hand:
# mdadm /dev/md1 --fail /dev/sdd4
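Then, if the spare doesn't get pulled in automatically, something like
this should swap it in (a sketch, assuming sdf4 really is the spare):

# mdadm /dev/md1 --remove /dev/sdd4
# mdadm /dev/md1 --add /dev/sdf4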
> My two big fears are that:
>
> 1. Some of the RAID metadata is stored elsewhere, e.g. on a different partition
> or superblock.
>
> If so, how do I back that up and restore it?
The superblock is stored at the end of each corresponding physical RAID
partition. If you created the array on this machine, its definition
(UUID and device list) should also be in /etc/mdadm/mdadm.conf.
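If that entry has gone missing, the scan output above is in the right
format to recreate it; check it by eye before appending:

# mdadm --examine --scan >> /etc/mdadm/mdadm.conf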
> 2. There may be hardware constraints that I've forgotten or never knew about.
>
> For example, I remember that the partitions in an array have to be identically
> sized, but I am guessing that they don't have to be physically identical, i.e.
> they don't have to occupy identically positioned blocks on identical models of HDD.
Partitions /should/ be identically sized; if they're not, the size of
the smallest is used for all of them. They don't have any other
constraints (I've just built a RAID10 device from four LVM volumes
allocated from a two-disk RAID0).
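A quick way to compare component sizes, in bytes (assuming your
components are sd[b-f]4):

# for p in /dev/sd[b-f]4; do echo "$p $(blockdev --getsize64 $p)"; done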
> So I should be able to treat raw images of the partitions just like the originals.
I'm not sure you can use a raw image as a component of a RAID device,
but if you attached it via the loop device I guess it's possible. (Now
there's an interesting train of thought!)
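Something along these lines might do it, using the images taken earlier
(untested, so treat it as a thought experiment):

# losetup /dev/loop0 /backup/sdd4.img
# mdadm --assemble /dev/md1 /dev/sdb4 /dev/sdc4 /dev/loop0 /dev/sde4 /dev/sdf4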
> What about block sizes for the dd'd copies?
As big as possible. bs=10240k often works for me.
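For example:

# dd if=/dev/sdb4 of=/backup/sdb4.img bs=10240k

The large block size just cuts down on the number of read/write calls;
the copy is byte-for-byte identical either way.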
> One thing I don't understand is why mdadm -E /dev/sdb2 reports one
> failed partition, but mdadm -E /dev/sd{c,d,e,f} show no such error?
(Un)fortunately, just because a partition has failed doesn't
automatically mean that the remainder of the disk is seen as failed.
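Each component's superblock holds its own copy of the array state, and
a component that has dropped out stops being updated, so the copies can
disagree. Comparing them may tell you more:

# for p in /dev/sd[b-f]4; do echo $p; mdadm --examine $p | grep -i state; done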
Chris