[Wolves] Software RAID reconstruction

Adam Sweet adam at adamsweet.org
Wed Aug 27 08:34:04 UTC 2008


chris procter wrote:
> I think you're wrong :)
> 
> Yes, sdd1 (scsi0 : destination target 3, lun 0) is failing, but sdc1 *must* have been part of the
> array, otherwise what is sdd rebuilding from?

I actually think I was wrong too. I think sdc1 is continually rebuilding
from sdd1, but as sdd1 has a bad block, the rebuild of sdc1 keeps
starting over, which seems a bit screwy to my eyes, but I can understand
how it happened. I think that once sdc1 has finished rebuilding and is in
sync, I just have to fail out sdd1 before it starts rebuilding again,
which is a window of 15 to 30 seconds so far as I can tell.
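
A rough sketch of how that window might be caught automatically rather
than by hand: poll the md status once a second and mark sdd1 faulty as
soon as the array reports in sync with no recovery running. The array
name (md0 here), the function names and the one-second interval are all
assumptions for illustration, not taken from the thread, and it relies on
the usual /proc/mdstat layout, so treat it as a starting point only.

```shell
# Sketch only. Assumed, not from the thread: the array name (md0), the
# helper names below, and the usual /proc/mdstat text layout.

# array_rebuilding FILE ARRAY
# Succeeds (exit 0) while ARRAY's stanza in FILE shows a recovery/resync line.
array_rebuilding() {
    awk -v md="$2" '
        $1 == md && $2 == ":" { in_md = 1; next }  # first line of our stanza
        in_md && NF == 0      { in_md = 0 }        # blank line ends the stanza
        in_md && /(recovery|resync) *=/ { busy = 1 }
        END { exit !busy }
    ' "$1"
}

# array_in_sync FILE ARRAY
# Succeeds when ARRAY's member map (e.g. [UU]) contains no "_" slot.
array_in_sync() {
    awk -v md="$2" '
        $1 == md && $2 == ":" { in_md = 1; next }
        in_md && NF == 0      { in_md = 0 }
        in_md && / blocks / {
            seen = 1
            if ($NF ~ /_/) degraded = 1            # last field is the [UU] map
        }
        END { ok = seen && !degraded; exit !ok }
    ' "$1"
}

# fail_when_synced ARRAY MEMBER [FILE]
# Polls once a second; once ARRAY is in sync and idle, marks MEMBER faulty.
# FILE defaults to /proc/mdstat. Needs root for the mdadm call.
fail_when_synced() {
    status=${3:-/proc/mdstat}
    until array_in_sync "$status" "$1" && ! array_rebuilding "$status" "$1"; do
        sleep 1
    done
    mdadm --manage "/dev/$1" --fail "$2"
}
```

It would be run as root along the lines of

    fail_when_synced md0 /dev/sdd1

with md0 swapped for whatever the array is really called. There is still
a small gap between the status check and the mdadm call, so it narrows
the race rather than removing it.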

> when I fail a device in my array I get (ignore the odd devices :)

This is the network block driver isn't it? :)

> [root@north ~]# mdadm --manage -f /dev/md0 /dev/gnbd0
> mdadm: set /dev/gnbd0 faulty in /dev/md0
> [root@north ~]# cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 gnbd0[2](F) loop0[1]
>       2097088 blocks [2/1] [_U]

<snip>

> Do you see the same log messages earlier in the logs? Could it just be a weird log message you get
> while the array is starting up at boot time?

Which same messages? After failing sdc1 out and re-adding it, sdc1 has
remained part of the array across reboots. It was just that first boot
after it was kicked out where it couldn't be added to the array on boot.

> If so, you're seeing sdd rebuilding then failing again, either when it writes to that block or on some
> sort of integrity check on finishing rebuilding. Replace sdd and you'll be fine.

That's the new plan. I just have to catch it.

> Of course I could be wrong about you being wrong :)

We could both be wrong and then I'll phone you up for advice on advanced
filesystem recovery techniques.

Thanks Chris.

Ad


