[Wylug-help] Inactive RAID 10 Array
Dave Fisher
wylug-help at davefisher.co.uk
Tue Apr 14 13:59:14 UTC 2009
Hi All,
I need some help recovering from a RAID 10 fault. Unfortunately, I really have
to get this fixed quickly and I can't afford to experiment, because I haven't
yet set up a backup procedure for the affected array.
So I need advice on:
1. How to correctly back up the affected array before I do anything else.
2. How to diagnose the fault with a high degree of certainty.
The relevant bits of my RAID setup are as follows:
1. I have 2 arrays (see /proc/mdstat and /etc/fstab below)
1.1. md0 is a RAID 1 array containing my root filesystem
1.2. md1 is a RAID 10 array containing an LVM volume group (vg-data1) with logical volumes for /home, /tmp, and /var
2. md1 is inactive, but contains c. 2TB of data spread across 4 primary
partitions of c. 1TB each.
3. Most of the data on md1 is multimedia files that I can afford to lose,
but several MB of it is business-critical stuff like invoices, tax info, etc.
4. md1 consists of the following partitions (see mdadm -E readouts below)
/dev/sdb4
/dev/sdc4
/dev/sdd4
/dev/sde4
/dev/sdf4
5. sdd4 appears to be faulty and sdf4 is supposed to be a spare
I suspect that the first thing I should do is dd sd{b,c,d,e}4 to some spare disks.
I currently have 3 spare 1TB SATA disks and am just about to pop out to
buy 2 more, but before I do anything that relies on these spares I'd
like to be more certain about how useful the dd'd copies are going to
be, i.e. will they contain all the RAID metadata that I need to
preserve?
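In the meantime, I assume it can't hurt to save plain-text copies of each
member's superblock before I touch anything, e.g. something along these
lines (just my guess at a sensible precaution, not a substitute for the images):

  for d in /dev/sd{b,c,d,e,f}4; do
      sudo mdadm -E "$d" > "mdadm-E-${d##*/}.txt"   # plain-text record of each superblock
  done
  sudo mdadm --examine --scan > mdadm-scan.txt      # ARRAY lines as mdadm reports them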
It's some years since I last read up on RAID, so my memory is hazy, and a brief
bit of googling suggests that the documentation on Linux RAID (especially RAID
10) is just as scattered as it was then.
My two big fears are that:
1. Some of the RAID metadata is stored elsewhere, e.g. on a different partition
or superblock.
If so, how do I back that up and restore it?
2. There may be hardware constraints that I've forgotten or never knew about.
For example, I remember that the partitions in an array have to be identically
sized, but I am guessing that they don't have to be physically identical, i.e.
they don't have to occupy identically positioned blocks on identical models of HDD.
So I should be able to treat raw images of the partitions just like the originals.
Is this the case?
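For what it's worth, before buying more disks I was planning to compare sizes
with something along these lines (assuming the first spare shows up as
/dev/sdg, as in the dd command below):

  for d in /dev/sd{b,c,d,e,f}4 /dev/sdg; do
      printf '%s: ' "$d"
      sudo blockdev --getsize64 "$d"   # size in bytes; each target must be at least as big as its source
  done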
What about block sizes for the dd'd copies?
Would this command be sufficient to copy /dev/sdb4?
$ sudo dd if=/dev/sdb4 of=/dev/sdg
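Or would something slightly more defensive be better, e.g.:

$ sudo dd if=/dev/sdb4 of=/dev/sdg bs=64K conv=noerror,sync

(bs=64K is only a guess at a sensible block size; conv=noerror,sync is meant
to carry on past read errors and pad the bad blocks, but I'm not sure it's
appropriate here.)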
Suggested diagnoses and prognoses based on the readouts (e.g. dmesg and
mdadm) below would be appreciated, but that's step no. 2 ... after I've
sorted the backups.
One thing I don't understand: why does mdadm -E /dev/sdb4 report one
failed device, while mdadm -E on /dev/sd{c,d,e,f}4 reports no such failure?
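If it helps with the comparison, something like this should pull the
interesting fields out of each superblock side by side:

  for d in /dev/sd{b,c,d,e,f}4; do
      echo "== $d =="
      sudo mdadm -E "$d" | grep -E 'Update Time|State :|Events|Failed Devices'
  done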
Hope someone can help,
Dave
----------------------------------------------------------------------------
####################
# cat /proc/mdstat #
####################
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdb4[0](S) sdf4[4](S) sde4[3](S) sdd4[2](S) sdc4[1](S)
4829419520 blocks
md0 : active raid1 sdb2[0] sdf2[2](S) sdc2[1]
9767424 blocks [2/2] [UU]
unused devices: <none>
##############
# /etc/fstab #
##############
# <file system> <mount point> <type> <options> <dump> <pass>
# /dev/md0
UUID=5d494f0c-d723-4f15-90d6-b4d08e5fd059 / ext3 relatime,errors=remount-ro 0 1
# /dev/sda1
UUID=2968bbbe-223f-490f-869e-1312dabdaf18 /boot ext2 relatime 0 2
# /dev/mapper/vg--data1-lv--home
UUID=8b824f93-e686-4f08-9ec2-76e754d8f06f /home ext3 relatime 0 2
# /dev/mapper/vg--data1-lv--tmp
UUID=dee6072f-ca1c-462f-9730-c277e3f8b8d9 /tmp ext3 relatime 0 2
# /dev/mapper/vg--data1-lv--var
UUID=03600db0-f72f-4021-9bb2-b8cb19f3a2a0 /var ext3 relatime 0 2
#########
# dmesg #
#########
[ 1.360604] device-mapper: uevent: version 1.0.3
[ 1.360753] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel at redhat.com
[ 1.372169] md: linear personality registered for level -1
[ 1.374645] md: multipath personality registered for level -4
[ 1.376874] md: raid0 personality registered for level 0
[ 1.380025] md: raid1 personality registered for level 1
...
[ 1.468017] raid6: int64x1 2136 MB/s
[ 1.536001] raid6: int64x2 2936 MB/s
[ 1.604005] raid6: int64x4 2237 MB/s
[ 1.672023] raid6: int64x8 1907 MB/s
[ 1.740013] raid6: sse2x1 4421 MB/s
[ 1.808006] raid6: sse2x2 5179 MB/s
[ 1.876004] raid6: sse2x4 7969 MB/s
[ 1.876032] raid6: using algorithm sse2x4 (7969 MB/s)
[ 1.876062] md: raid6 personality registered for level 6
[ 1.876090] md: raid5 personality registered for level 5
[ 1.876580] md: raid4 personality registered for level 4
[ 1.889737] md: raid10 personality registered for level 10
...
[ 6.298660] sdf: sdf1 sdf2 sdf3 sdf4
[ 6.319252] sd 6:0:0:0: [sdf] Attached SCSI disk
[ 6.480811] md: md0 stopped.
[ 6.650804] md: md0 stopped.
[ 6.672476] md: md1 stopped.
[ 6.703699] md: md0 stopped.
[ 6.776075] md: bind<sdc2>
[ 6.776281] md: bind<sdf2>
[ 6.776493] md: bind<sdb2>
[ 6.781330] raid1: raid set md0 active with 2 out of 2 mirrors
[ 6.781421] md: md1 stopped.
[ 6.827664] md: bind<sdc4>
[ 6.827877] md: bind<sdd4>
[ 6.828101] md: bind<sde4>
[ 6.828322] md: bind<sdf4>
[ 6.828511] md: bind<sdb4>
...
[ 16.211526] md: md1 stopped.
[ 16.211568] md: unbind<sdb4>
[ 16.238909] md: export_rdev(sdb4)
[ 16.238974] md: unbind<sdf4>
[ 16.260022] md: export_rdev(sdf4)
[ 16.260087] md: unbind<sde4>
...
[ 16.288038] md: export_rdev(sde4)
[ 16.288177] md: unbind<sdd4>
[ 16.310872] MT2060: successfully identified (IF1 = 1210)
[ 16.316024] md: export_rdev(sdd4)
[ 16.316157] md: unbind<sdc4>
[ 16.348112] md: export_rdev(sdc4)
[ 16.356096] md: bind<sdc4>
[ 16.356309] md: bind<sdd4>
[ 16.356489] md: bind<sde4>
[ 16.356659] md: bind<sdf4>
[ 16.356861] md: bind<sdb4>
[ 16.389718] md: md1 stopped.
[ 16.389760] md: unbind<sdb4>
[ 16.416231] md: export_rdev(sdb4)
[ 16.416272] md: unbind<sdf4>
[ 16.428038] md: export_rdev(sdf4)
[ 16.428074] md: unbind<sde4>
[ 16.440027] md: export_rdev(sde4)
[ 16.440063] md: unbind<sdd4>
[ 16.452029] md: export_rdev(sdd4)
[ 16.452064] md: unbind<sdc4>
[ 16.464059] md: export_rdev(sdc4)
[ 16.471971] md: bind<sdc4>
[ 16.472177] md: bind<sdd4>
[ 16.472353] md: bind<sde4>
[ 16.472533] md: bind<sdf4>
[ 16.472734] md: bind<sdb4>
...
######################
# mdadm -E /dev/sdb2 #
######################
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.00
UUID : e1023500:94537d05:cb667a5a:bd8e784b
Creation Time : Tue May 6 01:50:43 2008
Raid Level : raid1
Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
Array Size : 9767424 (9.31 GiB 10.00 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Update Time : Tue Apr 14 13:07:21 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 3a0f281d - correct
Events : 176
Number Major Minor RaidDevice State
this 0 8 18 0 active sync /dev/sdb2
0 0 8 18 0 active sync /dev/sdb2
1 1 8 34 1 active sync /dev/sdc2
2 2 8 82 2 spare /dev/sdf2
######################
# mdadm -E /dev/sdc2 #
######################
/dev/sdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : e1023500:94537d05:cb667a5a:bd8e784b
Creation Time : Tue May 6 01:50:43 2008
Raid Level : raid1
Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
Array Size : 9767424 (9.31 GiB 10.00 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Update Time : Tue Apr 14 13:07:21 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 3a0f282f - correct
Events : 176
Number Major Minor RaidDevice State
this 1 8 34 1 active sync /dev/sdc2
0 0 8 18 0 active sync /dev/sdb2
1 1 8 34 1 active sync /dev/sdc2
2 2 8 82 2 spare /dev/sdf2
######################
# mdadm -E /dev/sdf2 #
######################
/dev/sdf2:
Magic : a92b4efc
Version : 00.90.00
UUID : e1023500:94537d05:cb667a5a:bd8e784b
Creation Time : Tue May 6 01:50:43 2008
Raid Level : raid1
Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
Array Size : 9767424 (9.31 GiB 10.00 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Update Time : Tue Apr 14 02:33:25 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 3a0e93c7 - correct
Events : 176
Number Major Minor RaidDevice State
this 2 8 82 2 spare /dev/sdf2
0 0 8 18 0 active sync /dev/sdb2
1 1 8 34 1 active sync /dev/sdc2
2 2 8 82 2 spare /dev/sdf2
######################
# mdadm -E /dev/sdb4 #
######################
/dev/sdb4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Apr 14 00:45:27 2009
State : active
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Checksum : 7a3576c1 - correct
Events : 221
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 20 0 active sync /dev/sdb4
0 0 8 20 0 active sync /dev/sdb4
1 1 8 36 1 active sync /dev/sdc4
2 2 0 0 2 faulty removed
3 3 8 68 3 active sync /dev/sde4
4 4 8 84 4 spare /dev/sdf4
######################
# mdadm -E /dev/sdc4 #
######################
/dev/sdc4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Apr 14 00:44:13 2009
State : active
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 7a35767a - correct
Events : 219
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 36 1 active sync /dev/sdc4
0 0 8 20 0 active sync /dev/sdb4
1 1 8 36 1 active sync /dev/sdc4
2 2 8 52 2 active sync /dev/sdd4
3 3 8 68 3 active sync /dev/sde4
4 4 8 84 4 spare /dev/sdf4
######################
# mdadm -E /dev/sdd4 #
######################
/dev/sdd4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Apr 14 00:44:13 2009
State : active
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 7a35768c - correct
Events : 219
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 52 2 active sync /dev/sdd4
0 0 8 20 0 active sync /dev/sdb4
1 1 8 36 1 active sync /dev/sdc4
2 2 8 52 2 active sync /dev/sdd4
3 3 8 68 3 active sync /dev/sde4
4 4 8 84 4 spare /dev/sdf4
######################
# mdadm -E /dev/sde4 #
######################
/dev/sde4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Apr 14 00:44:13 2009
State : active
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 7a35769e - correct
Events : 219
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 68 3 active sync /dev/sde4
0 0 8 20 0 active sync /dev/sdb4
1 1 8 36 1 active sync /dev/sdc4
2 2 8 52 2 active sync /dev/sdd4
3 3 8 68 3 active sync /dev/sde4
4 4 8 84 4 spare /dev/sdf4
######################
# mdadm -E /dev/sdf4 #
######################
/dev/sdf4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Fri Apr 10 16:43:47 2009
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 7a31126a - correct
Events : 218
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 84 4 spare /dev/sdf4
0 0 8 20 0 active sync /dev/sdb4
1 1 8 36 1 active sync /dev/sdc4
2 2 8 52 2 active sync /dev/sdd4
3 3 8 68 3 active sync /dev/sde4
4 4 8 84 4 spare /dev/sdf4