[Wolves] Software RAID reconstruction

Adam Sweet adam at adamsweet.org
Tue Aug 26 10:49:31 UTC 2008


David Goodwin wrote:
> <snip>
> 
> Shouldn't you really have forced sdc1 to be the array 'source' and
> failed sdd1?

This was because sdc1 was kicked out of the array by the software RAID
layer itself and couldn't be reinserted automatically. I had to fail it
out and then reinsert it. I tried to do the same with sdd1, but mdadm
refused, saying that sdd1 was in use.
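
For reference, a minimal sketch of that fail/remove/re-add sequence,
written as a small Python wrapper around mdadm's manage-mode options
(--fail, --remove, --add). The array name /dev/md0 is an assumption, as
the array isn't named anywhere in this thread, and the helper names are
made up for illustration. It only prints the commands unless dry_run is
turned off.

```python
import subprocess


def readd_commands(array, member):
    """Return the mdadm argv lists to fail, remove and re-add one member."""
    return [
        ["mdadm", array, "--fail", member],
        ["mdadm", array, "--remove", member],
        ["mdadm", array, "--add", member],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it (needs root) when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # /dev/md0 is an assumed array name; sdc1 is the member from this thread.
    run(readd_commands("/dev/md0", "/dev/sdc1"))
```

Keeping the default as a dry run means the commands can be checked
before anything touches a degraded array.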

> Considering it's all handled at the bit level, it's possible that you
> could reconstruct the array from sdc1.

The thing is that I don't trust that sdc1 is in sync. I can't tell which
disk is being rebuilt from which, though, and each rebuild starts within
seconds of the previous one completing, so I'm not convinced I can catch
it.
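
One way to see the direction of a rebuild is the device table printed
by "mdadm --detail", where the member being written to is normally
listed as "spare rebuilding" and the source as "active sync". Below is
a rough Python sketch that pulls those two roles out of that table. The
sample text is hypothetical (it is not output from the machine in
question), /dev/md0 is an assumed array name, and the exact state
strings may differ between mdadm versions.

```python
import re

# Hypothetical `mdadm --detail /dev/md0` device table, for illustration
# only; not captured from the machine discussed in this thread.
SAMPLE_DETAIL = """\
    Number   Major   Minor   RaidDevice State
       2       8       33        0      spare rebuilding   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
"""

# Number, Major, Minor, RaidDevice, then a free-text state and the device.
ROW = re.compile(
    r"^\s*\d+\s+\d+\s+\d+\s+(?:\d+|-)\s+(?P<state>.+?)\s+(?P<dev>/dev/\S+)\s*$"
)


def member_states(detail_text):
    """Map each member device to the state string mdadm reports for it."""
    states = {}
    for line in detail_text.splitlines():
        match = ROW.match(line)
        if match:
            states[match.group("dev")] = match.group("state")
    return states


def rebuild_direction(detail_text):
    """Return (sources, targets): in-sync members and members being rebuilt."""
    states = member_states(detail_text)
    targets = sorted(d for d, s in states.items() if "rebuilding" in s)
    sources = sorted(d for d, s in states.items() if s.startswith("active sync"))
    return sources, targets


if __name__ == "__main__":
    sources, targets = rebuild_direction(SAMPLE_DETAIL)
    print("rebuilding", targets, "from", sources)
```

Run in a loop against live "mdadm --detail" output, something like this
might catch a rebuild that only lasts a short while, though that is
untested here.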

> I think the bit about sdc1 being out of date is a red herring - after
> all sdd1 has failed (for whatever reason), so it can hardly be judged to
> be an authority!

Agreed, but sdd1 is running. sdc1 is syncing from it.

> I'd be tempted to:
> 
> a) Take a dd disk image of sdc1 (unless you have backups you can use?)

Umph. No, there are 16 live virtual machines with a 10GB disk each. We
have a quick rebuild process (~15 mins) instead; everything not in the
rebuild is only temporary data anyway.

> b) Recreate the array with sdc1 being the only device present, and see
> if you can access it... if so, is the data valid?  This should end up
> causing the raid superblock to alter, but not the filesystem table - so
> you should still be able to access the data etc.

The downtime for all of the customers on the machine would be too much
for this.

> sdc1 should still be a valid disk - as even though it's partially
> resynced, the order of the data should be maintained and it should be
> writing the same content to the blocks as was there previously.
> 
> (At least that's my understanding of how it works...)
> 
> (I'm no expert... [insert disclaimer here])

No trouble, thanks for your advice. I think at the moment my best bet
is to order some replacement hardware (we need some in the next few
weeks anyway) and move all of the VMs to the new hardware before
rebuilding the machine and, if necessary, moving them back.

The machine in question has a hardware RAID controller, but for some
reason, my predecessor used software RAID. I can only assume that the
controller wasn't supported by the OS installed at the time, so an OS
refresh is probably a good idea all the same.

Ad


