[Wylug-help] Inactive RAID 10 Array - Need More Help Please

Sun May 3 00:50:41 UTC 2009

On Tue, Apr 28, 2009 at 11:51:37PM +0100, Chris Davies MBCS wrote:
> Are you sorted with your RAID issue?

Hi Chris,

Thanks for your concern and your previous response to my original plea
for help.

Yes, I am sorted (ish), i.e. I have a running system, using a recovered
version of the dodgy array (/dev/md0) and I don't think that I've lost
any data.

I am not exactly happy, however, because:

  1. It appears that a power surge de-synchronised the discs' revision
     counts (the "Events" counters that mdadm keeps in each superblock),
     so simply updating those counts should have sorted everything.  In
     fact, the syntax given by many sources (including the mdadm manpage
     and the kernel.org linux-raid mailing list) had no effect
     whatsoever on my array.

     In the end I simply failed and removed the one excessively
     up-to-date disc so that mdadm automatically brought in the spare
     and synched it with the remaining 3/4 of the original 4-disc array.

     In other words, I wasted vast amounts of time taking backups,
     resynching discs, and trying to connect up snippets of mostly
     missing documentation, because the simplest one-line mdadm command
     to update revision counts just didn't work ... actually, it did
     return something: an erroneous 'success' message {sigh}.

  2. I've been using RAID to enhance fault tolerance, but at the cost
     of introducing an additional (and potentially catastrophic) point
     of failure.

     Sure, my array did tolerate the loss of one of its discs, but that
     was an entirely unnecessary loss, brought about by a flaw in RAID's
     own management software.
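
For anyone wanting to check for the same kind of mismatch, one rough
approach is to compare the "Events" line that `mdadm --examine` prints
for each member device and see which disc disagrees with the rest.  The
following is only a minimal sketch: the device names and counter values
are made up for illustration, the helper names are hypothetical, and in
real use the text would have to come from running `mdadm --examine` on
each device as root.

```python
import re
from collections import Counter

# Hypothetical excerpts of `mdadm --examine` output, one per member disc.
# Device names and counter values are invented for illustration only.
SAMPLE = {
    "/dev/sda1": "         Events : 1000\n",
    "/dev/sdb1": "         Events : 1000\n",
    "/dev/sdc1": "         Events : 1000\n",
    "/dev/sdd1": "         Events : 1004\n",
}


def parse_events(examine_output):
    """Return the value on the 'Events' line as a string, or None if absent."""
    match = re.search(r"^\s*Events\s*:\s*(\S+)\s*$", examine_output, re.MULTILINE)
    return match.group(1) if match else None


def odd_members(outputs):
    """Return the devices whose event count differs from the most common one."""
    counts = {dev: parse_events(text) for dev, text in outputs.items()}
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(dev for dev, value in counts.items() if value != majority)


print(odd_members(SAMPLE))  # prints ['/dev/sdd1']
```

The counts are compared as plain strings on purpose, so the sketch only
reports which disc is out of step, not whether it is ahead or behind.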

Is the de-synchronisation of revision counts an example of the so-called
'write hole' in RAID?

Dave