[Wylug-discuss] Help needed with 'failed' Linux software RAID 10

Dave Fisher wylug-discuss at davefisher.co.uk
Sat May 29 18:43:11 UTC 2010


Dear all,

The other day, I posted a request to wylug-help concerning a serious
(for me) RAID failure, but got no reply, so I'm trying again via
wylug-discuss (with apologies).

I know that many wylug members are RAID experts (at least in
comparison with me), so I guess that either I didn't express myself
properly, or no-one uses wylug-help any more, or everyone hates me ;-)

In the first instance, the only help I'm asking for is advice on how
to carry out safe diagnostics and how to interpret the results.

To give you some idea of the facts that I've already established, I've
copied the output from some simple diagnostics in a text attachment.

I might be completely missing the obvious, but some salient points seem to be:

 a) LVM2 logical volumes sit on top of a RAID 10 array
 b) mdadm --examine shows partition names/IDs and statuses to be mixed
up, just as in my last RAID failure
 c) That last failure was 'fixed' by finding the right incantation to
re-assemble and re-sync
 d) One problem, then as now, was to identify which pairs of
partitions make up each side of the mirror
 e) There seems to be no superblock detected for the whole device (md1)
 f) The spare (sdj4) seems to be invisible in proc
 g) The report of md0 on sdi2 is a red herring - left over from the
incomplete removal of md0


Some questions
--------------

a) My, admittedly dodgy, understanding of RAID 10 is that if I can
identify the two halves of a mirrored pair, I should be able to mount
the pair like normal partitions or LVM volumes. Is this correct?
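
In case it clarifies what I mean by 'pair': my assumption (which may
well be wrong, hence the question) is that in a four-device near=2
RAID 10, adjacent RaidDevice numbers hold identical data, so the
mirror pairs would be (0,1) and (2,3) - i.e. (sdi4, sdg4) and
(sdh4, sdf4), if the sdi4 report in the attachment is to be believed:

```shell
# Assumption on my part, not established fact: in a 4-device near=2
# RAID10, RaidDevice n mirrors RaidDevice n XOR 1, giving the pairs
# (0,1) and (2,3).
pairs=$(for n in 0 1 2 3; do
    echo "RaidDevice $n mirrors RaidDevice $((n ^ 1))"
done)
echo "$pairs"
```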

b) Is it possible to mount the md1 partitions read-only in their
current state - safely, without screwing things up further?
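
To make question (b) concrete, this is the sort of read-only assembly
I have in mind - shown strictly as a dry run, because I haven't dared
touch the array. sdg4 and sdh4 are picked only because their
superblocks look freshest and (if my pairing guess is right) they come
from different halves of the mirror; myvg/mylv are placeholder LVM
names, not my real ones:

```shell
# Dry run only: DRY_RUN=echo prints each command instead of executing it.
# myvg/mylv are placeholder names; sdg4/sdh4 are chosen on the
# assumptions described above, not on any certain knowledge.
DRY_RUN=echo
plan=$(
    $DRY_RUN sudo mdadm --stop /dev/md1
    $DRY_RUN sudo mdadm --assemble --readonly --run /dev/md1 /dev/sdg4 /dev/sdh4
    $DRY_RUN sudo vgchange -a y myvg
    $DRY_RUN sudo mount -o ro /dev/myvg/mylv /mnt/recovery
)
echo "$plan"
```

Would something along those lines be safe, or is there a gotcha I'm
not seeing?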

Any interpretation of the attached data, or advice on further safe
diagnostics would be appreciated.

Dave
-------------- next part --------------
1. The proc filesystem doesn't seem to recognise the spare (sdj4)
-----------------------------------------------------------------
<pre>
$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : inactive sdh4[2](S) sdf4[3](S) sdg4[1](S) sdi4[0](S)
      3863535616 blocks
unused devices: <none>
</pre>


2. Using mdadm to examine the device /dev/md1
---------------------------------------------
<pre>
$ sudo mdadm --examine /dev/md1
mdadm: No md superblock detected on /dev/md1.
</pre>


3. Using mdadm to examine the partitions + the spare that make up /dev/md1
--------------------------------------------------------------------------


3.1. sdf4 - seems to report AOK
-------------------------------

<pre>
$ sudo mdadm --examine /dev/sdf4
/dev/sdf4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon May 24 02:12:54 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7d3a624c - correct
         Events : 7828427

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       84        3      active sync   /dev/sdf4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8       84        3      active sync   /dev/sdf4
</pre>


3.2. sdg4 - seems to think two of the four partitions are AOK
--------------------------------------------------------------
<pre>
$ sudo mdadm --examine /dev/sdg4
/dev/sdg4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat May 29 01:12:30 2010
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 7ccd4c92 - correct
         Events : 8079459

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      100        1      active sync   /dev/sdg4

   0     0       0        0        0      removed
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed
</pre>


3.3. sdh4 - seems to have two personalities
-------------------------------------------

<pre>
$ sudo mdadm --examine /dev/sdh4
/dev/sdh4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat May 29 01:26:30 2010
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7d4898bb - correct
         Events : 8079505

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      116        2      active sync   /dev/sdh4

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed
</pre>


3.4. sdi4 - seems to think all 4 partitions are AOK
---------------------------------------------------

<pre>
$ sudo mdadm --examine /dev/sdi4
/dev/sdi4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon May 24 02:12:54 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7d3a6276 - correct
         Events : 7828427

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      132        0      active sync   /dev/sdi4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8       84        3      active sync   /dev/sdf4
</pre>


3.5. sdj4 should be the spare, but it seems to think that sdf4 is the spare
---------------------------------------------------------------------------

<pre>
$ sudo mdadm --examine /dev/sdj4
[sudo] password for davef: 
/dev/sdj4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Oct  6 18:01:45 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7b1d23e4 - correct
         Events : 370

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      148        3      active sync   /dev/sdj4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8      148        3      active sync   /dev/sdj4
   4     4       8       84        4      spare   /dev/sdf4
</pre>
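
For convenience, here are the Events counters copied by hand from the
five reports above. My understanding is that a higher count means a
more recently updated superblock, which would make sdh4 and sdg4 the
freshest members and sdj4 long stale:

```shell
# Events counters transcribed from the mdadm --examine output above,
# sorted ascending; a higher count should mean a more recently
# updated superblock.
events_sorted=$(sort -t: -k2 -n <<'EOF'
sdf4:7828427
sdg4:8079459
sdh4:8079505
sdi4:7828427
sdj4:370
EOF
)
echo "$events_sorted"
```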


4. Output from TestDisk
-----------------------

Disk /dev/sdf - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdg - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdh - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512
Disk /dev/sdj - 1000 GB / 931 GiB - CHS 121601 255 63, sector size=512

Disk /dev/sdf - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition			Start        End    Size in sectors
 1 * Linux                    0   1  1    15 254 63     256977
 2 P Linux                   16   0  1  1231 254 63   19535040
 3 P Linux Swap            1232   0  1  1353 254 63    1959930
 4 P Linux RAID            1354   0  1 121600 254 63 1931768055 [md1]

Disk /dev/sdg - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition			Start        End    Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
 1 P Linux                    0   1  1    15 254 63     256977
 1 P Linux                    0   1  1    15 254 63     256977
Invalid RAID superblock
 2 P Linux RAID              16   0  1  1231 254 63   19535040
 2 P Linux RAID              16   0  1  1231 254 63   19535040
 3 P Linux Swap            1232   0  1  1353 254 63    1959930
 4 P Linux RAID            1354   0  1 121600 254 63 1931768055 [md1]
No partition is bootable

Disk /dev/sdh - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition			Start        End    Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
 1 * Linux                    0   1  1    15 254 63     256977
 1 * Linux                    0   1  1    15 254 63     256977
Invalid RAID superblock
 2 P Linux RAID              16   0  1  1231 254 63   19535040
 2 P Linux RAID              16   0  1  1231 254 63   19535040
 3 P Linux Swap            1232   0  1  1353 254 63    1959930
 4 P Linux RAID            1354   0  1 121600 254 63 1931768055 [md1]

Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition			Start        End    Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
 1 P Linux                    0   1  1    15 254 63     256977
 1 P Linux                    0   1  1    15 254 63     256977
 2 P Linux RAID              16   0  1  1231 254 63   19535040 [md0]
 3 P Linux Swap            1232   0  1  1353 254 63    1959930
 4 P Linux RAID            1354   0  1 121600 254 63 1931768055 [md1]
No partition is bootable

Disk /dev/sdj - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition			Start        End    Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
 1 P Linux                    0   1  1    15 254 63     256977
 1 P Linux                    0   1  1    15 254 63     256977
 2 * Linux                   16   0  1  1231 254 63   19535040
 3 P Linux Swap            1232   0  1  1353 254 63    1959930
 4 P Linux RAID            1354   0  1 121600 254 63 1931768055 [md1]

