[Gllug] Problems rebuilding a RAID5 array after a failed disc on Centos 5
Oliver Howe
ojhowe at gmail.com
Wed Sep 22 08:41:15 UTC 2010
I have a backup server configured with the following three RAID arrays:
//backup2> /c0 show
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK      -       -       -       74.4951   OFF    OFF
u1    RAID-5    OK      -       -       64K     3259.56   OFF    OFF
u2    RAID-5    OK      -       -       64K     5122.17   OFF    OFF
The first array (RAID-1), /c0/u0, contains 2 x 500GB discs which hold the
operating system.
The second array (RAID-5), /c0/u1, contains 8 x 500GB discs and provides just
over 3TB of space to hold my backup files.
This is /dev/sdb2, mounted as /disk01.
The third array (RAID-5), /c0/u2, contains 12 x 500GB discs and provides just
over 5TB of space to hold my backup files.
This is /dev/sdb1, mounted as /disk02.
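For context, the two data filesystems are mounted via /etc/fstab with entries
along these lines (reconstructed from memory, so the filesystem type and
options are approximate; /disk02 is at least an ext2/3 filesystem, given the
e2fsck output further down):

/dev/sdb2    /disk01    ext3    defaults    1 2
/dev/sdb1    /disk02    ext3    defaults    1 2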
The third array, /c0/u2, started having problems after one of its discs
failed.
This first showed up as I/O errors on /disk02, the partition it is mounted
on.
So I rebooted and went into the RAID controller's BIOS utility (Alt-3), which
showed me the failed disc.
I replaced the failed disc and told the controller to rebuild the array,
which it did. I then ran a verify on it via the controller.
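For what it's worth, the unit can also be checked from the running OS with the
same tw_cli client whose full output is at the end of this mail; a per-unit
query is just something like:

# /opt/3ware/CLI/tw_cli
//backup2> /c0/u2 show

and that reports u2 as OK, the same as the summary at the bottom.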
However, when I booted the system back up it didn't mount /disk02.
So I mounted it manually, but it didn't show any data and it didn't let me
write to it.
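For the record, the manual check was roughly the following (exact mount
options not to hand):

# mount /dev/sdb1 /disk02
# ls /disk02             # came back empty
# touch /disk02/test     # failed -- the filesystem wouldn't take writes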
Then I ran fsck on it, and it said the following:
# fsck /dev/sdb1
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
The filesystem size (according to the superblock) is 854473244 blocks
The physical size of the device is 317602332 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>?
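For what it's worth, converting those block counts to sizes, assuming the
default 4KiB ext3 block size (which I haven't confirmed with dumpe2fs):

# echo $(( 854473244 * 4 / 1024 / 1024 ))    # superblock's idea of the size -> ~3259 GiB
# echo $(( 317602332 * 4 / 1024 / 1024 ))    # device size seen by e2fsck    -> ~1211 GiB

If that block size is right, the superblock figure is suspiciously close to
the size of the u1 array (3259.56 GB) rather than u2, which only adds to my
confusion.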
I'm not sure what has happened here or why the array hasn't been recovered.
Does anyone have any suggestions?
I'm not too bothered about losing the data from /disk02 as I have it backed
up elsewhere, but I'd rather
recover everything to its previous state than have to reinstall the entire
server.
Thanks,
Oliver
Below is the full output from the RAID controller, using the tw_cli Linux
client from 3ware.
Note that slots 2 and 11 (ports p2 and p11) have always been empty.
# /opt/3ware/CLI/tw_cli
//backup2> /c0 show
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK      -       -       -       74.4951   OFF    OFF
u1    RAID-5    OK      -       -       64K     3259.56   OFF    OFF
u2    RAID-5    OK      -       -       64K     5122.17   OFF    OFF
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 74.53 GB 156301488 6QZ3D8E8
p1 OK u0 74.53 GB 156301488 6QZ3D04Z
p2 NOT-PRESENT - - - -
p3 OK u1 465.76 GB 976773168 9QM0G1SF
p4 OK u1 465.76 GB 976773168 9QM0L1N3
p5 OK u1 465.76 GB 976773168 9QM0EEP9
p6 OK u1 465.76 GB 976773168 9QM0HXMH
p7 OK u1 465.76 GB 976773168 9QM0GZ6H
p8 OK u1 465.76 GB 976773168 9QM0G16T
p9 OK u1 465.76 GB 976773168 9QM0LK3C
p10 OK u1 465.76 GB 976773168 9QM0EVP7
p11 NOT-PRESENT - - - -
p12 OK u2 465.76 GB 976773168 WD-WCASU6421435
p13 OK u2 465.76 GB 976773168 WD-WCASU6548401
p14 OK u2 465.76 GB 976773168 WD-WCASU6548327
p15 OK u2 465.76 GB 976773168 WD-WCASU6421420
p16 OK u2 465.76 GB 976773168 WD-WCASU6421342
p17 OK u2 465.76 GB 976773168 WD-WCASU6519440
p18 OK u2 465.76 GB 976773168 WD-WCASU6542386
p19 OK u2 465.76 GB 976773168 WD-WCASU6418851
p20 OK u2 465.76 GB 976773168 WD-WCASU6548441
p21 OK u2 465.76 GB 976773168 WD-WCASU6548301
p22 OK u2 465.76 GB 976773168 WD-WCASU6541089
p23 OK u2 465.76 GB 976773168 WD-WCASU6041935
//backup2>