[Wylug-discuss] Help needed with 'failed' Linux software RAID 10
Dave Fisher
wylug-discuss at davefisher.co.uk
Sat May 29 18:43:11 UTC 2010
Dear all,
The other day, I posted a request to wylug-help concerning a serious
(for me) RAID failure, but got no reply, so I'm trying again via
wylug-discuss (with apologies).
I know that many wylug members are RAID experts (at least in
comparison with me), so I guess that either I didn't express myself
properly, no-one uses wylug-help any more, or everyone hates me ;-)
In the first instance, the only help I'm asking for is advice on how
to carry out safe diagnostics and how to interpret the results.
To give you some idea of the facts that I've already established, I've
copied the output from some simple diagnostics in a text attachment.
I might be completely missing the obvious, but some salient points seem to be:
a) LVM2 logical volumes sit on top of a RAID 10 array
b) mdadm --examine shows partition names/IDs and statuses to be mixed
up, like my last RAID failure
c) That last failure was 'fixed' by finding the right incantation to
re-assemble and re-sync
d) One problem, then as now, was to identify which pairs of
partitions make up each side of the mirror
e) There seems to be no superblock detected for the whole device (md1)
f) The spare (sdj4) seems to be invisible in proc
g) The report of md0 on sdi2 is a red herring - left over from the
incomplete removal of md0
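On point (d): if I've understood the near=2 layout right, adjacent RaidDevice slots (0 and 1, 2 and 3) carry mirrored copies of the same data, so the pairing should fall straight out of the device tables in the superblocks below. A throwaway sketch, with the device-to-slot mapping copied from those tables:

```shell
# Mirror pairing under RAID10 near=2: adjacent RaidDevice slots hold the
# same data, so slot/2 gives the pair index. Mapping copied from the
# --examine device tables in the attachment.
pairs=$(while read dev slot; do
  echo "$dev -> pair $((slot / 2))"
done <<'EOF'
sdi4 0
sdg4 1
sdh4 2
sdf4 3
EOF
)
echo "$pairs"
```

If that's right, (sdi4, sdg4) and (sdh4, sdf4) are the two mirrored pairs - which would answer (d), assuming the mapping in the superblocks can be trusted.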
Some questions
--------------
a) My, admittedly dodgy, understanding of RAID 10 is that if I can ID
the two halves of a pair, I should be able to mount the pair like
normal partitions or LVM volumes. Is this correct?
b) Is it possible to readonly mount the md1 partitions in their
current state - safely, without screwing-up further?
Any interpretation of the attached data, or advice on further safe
diagnostics would be appreciated.
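For what it's worth, this is the sort of read-only incantation I have in mind for (b) - completely untested, and the LV path is a placeholder for whatever logical volume actually sits on md1, so please shout if any of these steps would write to the disks:

```shell
# Untested sketch of a read-only assembly and mount.
# /dev/VG/LV is a placeholder, not my real volume name.
try_readonly_mount() {
    # --readonly should stop md writing to the members during assembly
    mdadm --assemble --readonly /dev/md1 /dev/sdi4 /dev/sdg4 /dev/sdh4 /dev/sdf4
    vgchange -ay                          # activate the LVM volumes on top
    mount -o ro /dev/VG/LV /mnt/recovery  # read-only mount of one LV
}
```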
Dave
-------------- next part --------------
1. The proc filesystem doesn't seem to recognise the spare (sdj4)
-----------------------------------------------------------------
<pre>
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdh4[2](S) sdf4[3](S) sdg4[1](S) sdi4[0](S)
3863535616 blocks
unused devices: <none>
</pre>
2. Using Mdadm to examine the device /dev/md1
---------------------------------------------
<pre>
$ sudo mdadm --examine /dev/md1
mdadm: No md superblock detected on /dev/md1.
</pre>
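(A note to self, which may explain point (e): I believe --examine reads the superblock stored on a *member* device, so finding nothing on /dev/md1 itself is expected; the array-level query would be something like the following, though with md1 inactive it will presumably just refuse:)

```shell
# --detail reports on an assembled array; --examine is for member devices.
# With md1 inactive this would likely refuse, but noting it for completeness.
detail_md1() {
    mdadm --detail /dev/md1
}
```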
3. Using Mdadm to examine the partitions + the spare that make up /dev/md1
--------------------------------------------------------------------------
3.1. sdf4 - seems to report AOK
-------------------------------
<pre>
$ sudo mdadm --examine /dev/sdf4
/dev/sdf4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon May 24 02:12:54 2010
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 7d3a624c - correct
Events : 7828427
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 84 3 active sync /dev/sdf4
0 0 8 132 0 active sync /dev/sdi4
1 1 8 100 1 active sync /dev/sdg4
2 2 8 116 2 active sync /dev/sdh4
3 3 8 84 3 active sync /dev/sdf4
</pre>
3.2. sdg4 - seems to think two of the four partitions are AOK
--------------------------------------------------------------
<pre>
$ sudo mdadm --examine /dev/sdg4
/dev/sdg4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Sat May 29 01:12:30 2010
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : 7ccd4c92 - correct
Events : 8079459
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 100 1 active sync /dev/sdg4
0 0 0 0 0 removed
1 1 8 100 1 active sync /dev/sdg4
2 2 8 116 2 active sync /dev/sdh4
3 3 0 0 3 faulty removed
</pre>
3.3. sdh4 - seems to have two personalities
-------------------------------------------
<pre>
$ sudo mdadm --examine /dev/sdh4
/dev/sdh4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Sat May 29 01:26:30 2010
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : 7d4898bb - correct
Events : 8079505
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 116 2 active sync /dev/sdh4
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 116 2 active sync /dev/sdh4
3 3 0 0 3 faulty removed
</pre>
3.4. sdi4 - seems to think all 4 partitions are AOK
---------------------------------------------------
<pre>
$ sudo mdadm --examine /dev/sdi4
/dev/sdi4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon May 24 02:12:54 2010
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 7d3a6276 - correct
Events : 7828427
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 132 0 active sync /dev/sdi4
0 0 8 132 0 active sync /dev/sdi4
1 1 8 100 1 active sync /dev/sdg4
2 2 8 116 2 active sync /dev/sdh4
3 3 8 84 3 active sync /dev/sdf4
</pre>
3.5. sdj4 should be the spare, but it seems to think that sdf4 is the spare
---------------------------------------------------------------------------
<pre>
$ sudo mdadm --examine /dev/sdj4
/dev/sdj4:
Magic : a92b4efc
Version : 00.90.00
UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
Creation Time : Tue May 6 02:06:45 2008
Raid Level : raid10
Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Oct 6 18:01:45 2009
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 7b1d23e4 - correct
Events : 370
Layout : near=2, far=1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 148 3 active sync /dev/sdj4
0 0 8 132 0 active sync /dev/sdi4
1 1 8 100 1 active sync /dev/sdg4
2 2 8 116 2 active sync /dev/sdh4
3 3 8 148 3 active sync /dev/sdj4
4 4 8 84 4 spare /dev/sdf4
</pre>
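Pulling the Events counters from the five superblocks above into one place (values copied verbatim), the split between the stale pair and the fresh pair jumps out:

```shell
# Events counters copied from the --examine outputs above; higher = more
# recent. sdf4/sdi4 stopped updating on May 24; sdg4/sdh4 carried on to
# May 29; sdj4 has been out of the picture since October 2009.
events=$(sort -k2 -n <<'EOF'
sdj4 370
sdf4 7828427
sdi4 7828427
sdg4 8079459
sdh4 8079505
EOF
)
echo "$events"
```

If mdadm uses the event count the way I think it does, sdg4 and sdh4 hold the most recent view of the array, and sdf4/sdi4 fell out of it around May 24.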
4. Output from TestDisk
-----------------------
Disk /dev/sdf - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdg - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdh - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdj - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdf - 1000 GB / 931 GiB - CHS 121601 255 63
Partition Start End Size in sectors
1 * Linux 0 1 1 15 254 63 256977
2 P Linux 16 0 1 1231 254 63 19535040
3 P Linux Swap 1232 0 1 1353 254 63 1959930
4 P Linux RAID 1354 0 1 121600 254 63 1931768055 [md1]
Disk /dev/sdg - 1000 GB / 931 GiB - CHS 121601 255 63
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 P Linux 0 1 1 15 254 63 256977
1 P Linux 0 1 1 15 254 63 256977
Invalid RAID superblock
2 P Linux RAID 16 0 1 1231 254 63 19535040
2 P Linux RAID 16 0 1 1231 254 63 19535040
3 P Linux Swap 1232 0 1 1353 254 63 1959930
4 P Linux RAID 1354 0 1 121600 254 63 1931768055 [md1]
No partition is bootable
Disk /dev/sdh - 1000 GB / 931 GiB - CHS 121601 255 63
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 * Linux 0 1 1 15 254 63 256977
1 * Linux 0 1 1 15 254 63 256977
Invalid RAID superblock
2 P Linux RAID 16 0 1 1231 254 63 19535040
2 P Linux RAID 16 0 1 1231 254 63 19535040
3 P Linux Swap 1232 0 1 1353 254 63 1959930
4 P Linux RAID 1354 0 1 121600 254 63 1931768055 [md1]
Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 P Linux 0 1 1 15 254 63 256977
1 P Linux 0 1 1 15 254 63 256977
2 P Linux RAID 16 0 1 1231 254 63 19535040 [md0]
3 P Linux Swap 1232 0 1 1353 254 63 1959930
4 P Linux RAID 1354 0 1 121600 254 63 1931768055 [md1]
No partition is bootable
Disk /dev/sdj - 1000 GB / 931 GiB - CHS 121601 255 63
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 P Linux 0 1 1 15 254 63 256977
1 P Linux 0 1 1 15 254 63 256977
2 * Linux 16 0 1 1231 254 63 19535040
3 P Linux Swap 1232 0 1 1353 254 63 1959930
4 P Linux RAID 1354 0 1 121600 254 63 1931768055 [md1]