[Wylug-help] Inactive RAID 10 Array

Dave Fisher wylug-help at davefisher.co.uk
Tue Apr 14 13:59:14 UTC 2009


Hi All,

I need some help recovering from a RAID 10 fault.  Unfortunately, I really have
to get this fixed quickly and I can't afford to experiment, because I haven't
yet set up a backup procedure for the affected array. 

So I need advice on:

  1. How to correctly backup the affected array before I do anything else.
  2. How to diagnose the fault with a high degree of certainty.

The relevant bits of my RAID setup are as follows:

  1. I have 2 arrays (see /proc/mdstat and /etc/fstab below)
     
     1.1. md0 is a RAID 1 array containing my root filesystem
     1.2. md1 is a RAID 10 array containing 3 LVM volume groups for /home, /tmp, and /var

  2. md1 is inactive, but contains c. 2TB of data spread across 4 primary
     partitions of c. 1TB each.

  3. Most of the data on md1 are multimedia files that I can afford to lose,
     but several MB are business-critical stuff like invoices, tax info, etc.
  
  4. md1 consists of the following partitions (see mdadm -E readouts below)
     
     /dev/sdb4
     /dev/sdc4
     /dev/sdd4
     /dev/sde4
     /dev/sdf4

  5. sdd4 appears to be faulty and sdf4 is supposed to be a spare.

I suspect that the first thing I should do is dd sd{b,c,d,e}4 to some spare disks.

I currently have 3 spare 1TB SATA disks and am just about to pop out to
buy 2 more, but before I do anything that relies on these spares I'd
like to be more certain about how useful the dd'd copies are going to
be, i.e. will they contain all the RAID metadata that I need to
preserve?
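Not an authoritative answer, but my understanding of the 0.90 metadata format
(which all the mdadm -E readouts below report) is that each member's superblock
is stored inside the member partition itself: 64 KiB back from the end of the
partition, rounded down to a 64 KiB boundary. If that's right, a raw image of
the whole partition carries the metadata and there is nothing stored elsewhere
to back up separately. A sketch of the offset arithmetic, in case you want to
pull just the superblock region out as an extra precaution (sizes in 512-byte
sectors; 128 sectors = 64 KiB):

```shell
# Sketch only: where a 0.90 superblock should sit on a member device,
# per my reading of the md superblock layout.  Round the device size
# down to a 64 KiB boundary, then step back one more 64 KiB.
# All sizes are in 512-byte sectors (128 sectors = 64 KiB).
sb_offset_sectors() {
    echo $(( ($1 / 128) * 128 - 128 ))
}

# e.g. a hypothetical 1000000-sector member partition:
sb_offset_sectors 1000000    # -> 999808
```

On the live system something like
`sudo dd if=/dev/sdb4 of=sb-sdb4.bin bs=512 skip=$(sb_offset_sectors $(sudo blockdev --getsz /dev/sdb4)) count=128`
should pull out just that region, though a full-partition image already
contains the same bytes, and you can point `mdadm -E` at an image file to
confirm the superblock survived the copy.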

It's some years since I last read up on RAID, so my memory is hazy, and a brief
bit of googling suggests that the documentation on Linux RAID (especially RAID
10) is just as scattered and dislocated as it was then.

My two big fears are that:

  1. Some of the RAID metadata is stored elsewhere, e.g. on a different
     partition or in a separate superblock.

     If so, how do I back that up and restore it?  
  
  2. There may be hardware constraints that I've forgotten or never knew about.
  
     For example, I remember that the partitions in an array have to be
     identically sized, but I'm guessing that they don't have to be physically
     identical, i.e. they don't have to occupy identically positioned blocks
     on identical models of HDD.

     So I should be able to treat raw images of the partitions just like the originals.
     
     Is this the case?

     What about block sizes for the dd'd copies?     

     Would this command be sufficient to copy /dev/sdb4?

       $ sudo dd if=/dev/sdb4 of=/dev/sdg
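For what it's worth, my understanding is that dd's bs only changes how the copy
is batched, not the bytes written, so a larger block size (e.g. bs=64K) just
speeds things up and the image comes out byte-identical. Two caveats, hedged:
conv=noerror,sync with a large bs will zero-fill a whole block around any
unreadable sector, so for the suspect sdd4 something like GNU ddrescue is
probably a better tool; and of=/dev/sdg writes the image over the raw spare
disk, which works but means one image per disk, whereas imaging to files on a
mounted filesystem is more flexible. A quick demonstration on a scratch file
that block size doesn't affect content:

```shell
# Stand-in for /dev/sdb4: 4 MiB of random data in a scratch file.
dd if=/dev/urandom of=/tmp/src.img bs=1M count=4 2>/dev/null
# The same data copied with a different block size -- bs affects the
# speed of the copy, not its content.
dd if=/tmp/src.img of=/tmp/dst.img bs=64K 2>/dev/null
cmp /tmp/src.img /tmp/dst.img && echo "byte-identical"
```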


Suggested diagnoses and prognoses based on the read-outs (e.g. dmesg and
mdadm) below would be appreciated, but that's step no.2 ... after I've
sorted the backups.

One thing I don't understand is why mdadm -E /dev/sdb4 reports one failed
device, while mdadm -E /dev/sd{c,d,e,f}4 shows no such error.
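My guess at the answer (happy to be corrected): each member device carries its
own copy of the superblock, and md updates them independently, so only the copy
with the highest Events counter (sdb4's, at 221, versus 219 on sdc4/sdd4/sde4
and 218 on sdf4) was written after sdd4 was marked faulty. Lining the counters
up per device makes this easy to see; demonstrated here against a pasted
fragment so the pipeline can be tried anywhere, but on the live system you
would feed it from `sudo mdadm -E /dev/sd[b-f]4`:

```shell
# Filter each member's name and event counter out of `mdadm -E` output.
# Live usage: sudo mdadm -E /dev/sd[b-f]4 | grep -E '^/dev/|Events'
grep -E '^/dev/|Events' <<'EOF'
/dev/sdb4:
          Magic : a92b4efc
         Events : 221
/dev/sdc4:
          Magic : a92b4efc
         Events : 219
EOF
```

If the counters differ only slightly, `mdadm --assemble --force` is the usual
next step for reassembling such an array, but I wouldn't try it until the
images are safely taken.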

Hope someone can help,

Dave 
----------------------------------------------------------------------------

####################
# cat /proc/mdstat #
####################
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : inactive sdb4[0](S) sdf4[4](S) sde4[3](S) sdd4[2](S) sdc4[1](S)
      4829419520 blocks
       
md0 : active raid1 sdb2[0] sdf2[2](S) sdc2[1]
      9767424 blocks [2/2] [UU]
      
unused devices: <none>


##############
# /etc/fstab #
##############
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# /dev/md0
UUID=5d494f0c-d723-4f15-90d6-b4d08e5fd059 /               ext3    relatime,errors=remount-ro 0       1
# /dev/sda1
UUID=2968bbbe-223f-490f-869e-1312dabdaf18 /boot           ext2    relatime        0       2
# /dev/mapper/vg--data1-lv--home
UUID=8b824f93-e686-4f08-9ec2-76e754d8f06f /home           ext3    relatime        0       2
# /dev/mapper/vg--data1-lv--tmp
UUID=dee6072f-ca1c-462f-9730-c277e3f8b8d9 /tmp            ext3    relatime        0       2
# /dev/mapper/vg--data1-lv--var
UUID=03600db0-f72f-4021-9bb2-b8cb19f3a2a0 /var            ext3    relatime        0       2



#########
# dmesg #
#########

[    1.360604] device-mapper: uevent: version 1.0.3
[    1.360753] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel at redhat.com
[    1.372169] md: linear personality registered for level -1
[    1.374645] md: multipath personality registered for level -4
[    1.376874] md: raid0 personality registered for level 0
[    1.380025] md: raid1 personality registered for level 1
...
[    1.468017] raid6: int64x1   2136 MB/s
[    1.536001] raid6: int64x2   2936 MB/s
[    1.604005] raid6: int64x4   2237 MB/s
[    1.672023] raid6: int64x8   1907 MB/s
[    1.740013] raid6: sse2x1    4421 MB/s
[    1.808006] raid6: sse2x2    5179 MB/s
[    1.876004] raid6: sse2x4    7969 MB/s
[    1.876032] raid6: using algorithm sse2x4 (7969 MB/s)
[    1.876062] md: raid6 personality registered for level 6
[    1.876090] md: raid5 personality registered for level 5
[    1.876580] md: raid4 personality registered for level 4
[    1.889737] md: raid10 personality registered for level 10
...
[    6.298660]  sdf: sdf1 sdf2 sdf3 sdf4
[    6.319252] sd 6:0:0:0: [sdf] Attached SCSI disk
[    6.480811] md: md0 stopped.
[    6.650804] md: md0 stopped.
[    6.672476] md: md1 stopped.
[    6.703699] md: md0 stopped.
[    6.776075] md: bind<sdc2>
[    6.776281] md: bind<sdf2>
[    6.776493] md: bind<sdb2>
[    6.781330] raid1: raid set md0 active with 2 out of 2 mirrors
[    6.781421] md: md1 stopped.
[    6.827664] md: bind<sdc4>
[    6.827877] md: bind<sdd4>
[    6.828101] md: bind<sde4>
[    6.828322] md: bind<sdf4>
[    6.828511] md: bind<sdb4>
...
[   16.211526] md: md1 stopped.
[   16.211568] md: unbind<sdb4>
[   16.238909] md: export_rdev(sdb4)
[   16.238974] md: unbind<sdf4>
[   16.260022] md: export_rdev(sdf4)
[   16.260087] md: unbind<sde4>
...
[   16.288038] md: export_rdev(sde4)
[   16.288177] md: unbind<sdd4>
[   16.310872] MT2060: successfully identified (IF1 = 1210)
[   16.316024] md: export_rdev(sdd4)
[   16.316157] md: unbind<sdc4>
[   16.348112] md: export_rdev(sdc4)
[   16.356096] md: bind<sdc4>
[   16.356309] md: bind<sdd4>
[   16.356489] md: bind<sde4>
[   16.356659] md: bind<sdf4>
[   16.356861] md: bind<sdb4>
[   16.389718] md: md1 stopped.
[   16.389760] md: unbind<sdb4>
[   16.416231] md: export_rdev(sdb4)
[   16.416272] md: unbind<sdf4>
[   16.428038] md: export_rdev(sdf4)
[   16.428074] md: unbind<sde4>
[   16.440027] md: export_rdev(sde4)
[   16.440063] md: unbind<sdd4>
[   16.452029] md: export_rdev(sdd4)
[   16.452064] md: unbind<sdc4>
[   16.464059] md: export_rdev(sdc4)
[   16.471971] md: bind<sdc4>
[   16.472177] md: bind<sdd4>
[   16.472353] md: bind<sde4>
[   16.472533] md: bind<sdf4>
[   16.472734] md: bind<sdb4>
...


######################
# mdadm -E /dev/sdb2 #
######################
/dev/sdb2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e1023500:94537d05:cb667a5a:bd8e784b
  Creation Time : Tue May  6 01:50:43 2008
     Raid Level : raid1
  Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
     Array Size : 9767424 (9.31 GiB 10.00 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0

    Update Time : Tue Apr 14 13:07:21 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 3a0f281d - correct
         Events : 176


      Number   Major   Minor   RaidDevice State
this     0       8       18        0      active sync   /dev/sdb2

   0     0       8       18        0      active sync   /dev/sdb2
   1     1       8       34        1      active sync   /dev/sdc2
   2     2       8       82        2      spare   /dev/sdf2


######################
# mdadm -E /dev/sdc2 #
######################
/dev/sdc2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e1023500:94537d05:cb667a5a:bd8e784b
  Creation Time : Tue May  6 01:50:43 2008
     Raid Level : raid1
  Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
     Array Size : 9767424 (9.31 GiB 10.00 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0

    Update Time : Tue Apr 14 13:07:21 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 3a0f282f - correct
         Events : 176


      Number   Major   Minor   RaidDevice State
this     1       8       34        1      active sync   /dev/sdc2

   0     0       8       18        0      active sync   /dev/sdb2
   1     1       8       34        1      active sync   /dev/sdc2
   2     2       8       82        2      spare   /dev/sdf2


######################
# mdadm -E /dev/sdf2 #
######################
/dev/sdf2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e1023500:94537d05:cb667a5a:bd8e784b
  Creation Time : Tue May  6 01:50:43 2008
     Raid Level : raid1
  Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
     Array Size : 9767424 (9.31 GiB 10.00 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0

    Update Time : Tue Apr 14 02:33:25 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 3a0e93c7 - correct
         Events : 176


      Number   Major   Minor   RaidDevice State
this     2       8       82        2      spare   /dev/sdf2

   0     0       8       18        0      active sync   /dev/sdb2
   1     1       8       34        1      active sync   /dev/sdc2
   2     2       8       82        2      spare   /dev/sdf2


######################
# mdadm -E /dev/sdb4 #
######################
/dev/sdb4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:45:27 2009
          State : active
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 7a3576c1 - correct
         Events : 221

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       20        0      active sync   /dev/sdb4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       0        0        2      faulty removed
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare   /dev/sdf4

######################
# mdadm -E /dev/sdc4 #
######################
/dev/sdc4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35767a - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       36        1      active sync   /dev/sdc4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare   /dev/sdf4


######################
# mdadm -E /dev/sdd4 #
######################
/dev/sdd4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35768c - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       52        2      active sync   /dev/sdd4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare   /dev/sdf4



######################
# mdadm -E /dev/sde4 #
######################
/dev/sde4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Apr 14 00:44:13 2009
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a35769e - correct
         Events : 219

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       68        3      active sync   /dev/sde4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare   /dev/sdf4



######################
# mdadm -E /dev/sdf4 #
######################
/dev/sdf4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Fri Apr 10 16:43:47 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7a31126a - correct
         Events : 218

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       84        4      spare   /dev/sdf4

   0     0       8       20        0      active sync   /dev/sdb4
   1     1       8       36        1      active sync   /dev/sdc4
   2     2       8       52        2      active sync   /dev/sdd4
   3     3       8       68        3      active sync   /dev/sde4
   4     4       8       84        4      spare   /dev/sdf4





