[Nottingham] [OT - OTT technical] A good geek giggle too good even for The Register! - RAID data recovery poser

Martin martin at ml1.co.uk
Tue May 7 17:10:47 UTC 2013


This one sent me into a spin of curious giggles... (All the more so
whilst at the moment delving rather too deeply into btrfs and RAID and
bit flips for different types of media... :-/ Also just successfully
recovered a failed disk on RAID5 for rather a lot of data :-) )

For any of you twisted and warped enough to appreciate The Register, for
this article even they have hit the buffers for circumlocutory
bafflement of this techno-geekery advance on not losing all your bits:

Reg boffins: Help us answer this Big Blue RAID data recovery poser
It's not just the disks that have failed



...Alerted at once to an insufficiency of cerebral expertise, your hack
pressed on to the introduction. Its first sentence reads:

    Consider an m×n array whose entries are elements in a finite field...

... Enough already! We can't determine if this IBM Research paper is hot
stuff or an interesting mathematical side line with no real-world

So, having publicly admitted to our cerebral insufficiency, let's try
crowd-sourcing (via you, dear Reg readers) to get an understanding of it.

The deal is we make the paper available here (PDF), you read it, and
those of you still sentient and standing after doing that can post a
plain language description of it in this Reg forum thread. Then everyone
can benefit from your expertise, you become known as a ... wait for it ...


Excellent stuff! And far better than the usual ignorant journo
regurgitating techno-speak in meaningless random gay abandon.

As can be expected, some of the comments are fun.

Until I line up a really good pint mug of coffee to read further, and
also overcome my Shannon theory and entropy theory fuelled scepticism,
here's a few fun giggles from the comments:

(Best of all, this isn't April 1st!)

((For the full fun, please go and read the comments thread yourself!))

(((The impatient can skip to the end for the one line summary ;-) )))

"But it reads like Hamming distance meets Sudoku"

"I didn't attempt to read the paper, the quoted bits were enough to put
me off."

"There is a good reason for not rebuilding a degraded RAID 5 array, but
that's not it.

An obvious reason is that you can bet your salary that the array was
commissioned with a full set of shiny, new disks. Unless the disk that
failed was a serious short-lifer, you can bet that the others are now
not long for this world too. Rebuilding the array soon gets like
painting the Forth bridge.

Compound that with the fact that disks are more likely to fail (if
they're going to) when heavily loaded and that the rebuild is going to
cane the living ..."

"Wow, that's nasty... some of it has echos of reed-solomon, but that's
all I can follow once it gets past defining conventional RAID in matrix

I can tell you that this stuff is obscure. Really obscure. This isn't
just your regular nasty math that any professor can figure out - this is ...

... Might be applicable to flash controllers though - while drives tend
to fail in a 'clunk-bzz-dead' manner, flash memory is more likely to
lose individual cells."

"... That's a big problem with mathematicians, they can explain stuff
simply but when they write papers it's their custom to be as terse as
possible. Clarity is sacrificed by convention and that's a crying shame.
It's time they changed, really (any eggheads listening?)

Caesarius gave a good example of what it's like to have it put in simple
terms with his comments about GF..."

"I just realised while soaking in the bath - there is an application of
this in RAID. It can correct the single-sector read errors encountered
during an array rebuild.

Also, why do none of your tablet accessory reviews include the most
useful accessory to me, the watertight food bag...?"

"... to construct RAID-like "erasure codes". The paper here has elements
of Reed-Solomon codes or Rabin's "Information Dispersal Algorithm", but
also needs some understanding of the maths underlying maximal distance
separation (MDS) codes in theory. I'll try to give a brief outline of
what I think is going on...

[Beautifully clear analogy, really!]

... That's the simplest analogy to explain how these kinds of codes
work. In practice, we don't use lines, planes and hyperplanes in a real
space, but instead (generally) use either field arithmetic modulo some
prime or (more usually) Galois fields modulo an irreducible polynomial
(which is effectively a prime, or serves the same function, anyway).
They're just more convenient to work with. ...

... I'll follow this up with another post to explain RAID and the paper..."

2 hours later...

"... RAID looks at redundancy by distinguishing between data "slices"
and parity slices. In the simplest case, you can have a data slice
mirrored across two drives, and use the XOR of the two slices to provide
your parity stripe. For this type of parity checking, if you lose either
of the data stripes, you can still recover it by XORing the remaining
one with the parity stripe. This kind of redundancy is (more or less)
how different RAID levels are built up, with extra mirroring and parity
added on top of existing layers. As the paper says, this is pretty

... do some maths magic and return the correct bits. Unfortunately,
that's not how RAID works. All of its redundancy is built using either
plain XOR of data or (in RAID 6) a more complex formula that's
(generally) based on Hamming codes...

... this paper is showing is how to construct a couple of different
parity calculations that can be used to scale up the redundancy level to
tolerate more failures in a (presumably) more realistic pattern. It
seems to be using the same kind of matrix operations as used in
Reed-Solomon codes, including using a Vandermonde-form matrix to prove
the invertibility (solvability) of their coding (parity check) matrices.
So it's not quite as generic as a secret-sharing scheme would be, but
it's definitely using the same mathematical machinery. ...

... once you understand the basics of using matrices for solving linear
equations and know generally how to work in Galois fields, this paper
isn't actually all that obscure. IMO, of course :)"

"... you've just solved the most pressing problem known to bathkind.

Yours truly, from the B Ark."

"... If you want to understand the detail, start by reading about
Hamming codes.

Also, read about "finite fields". GF(2) means Galois Field order 2,
which is a very posh way of saying "binary, and use XOR to add and AND
to multiply". GF(anything else) is a more complicated arrangement where
we redefine ADD and MULTIPLY to behave themselves, even though the
numbers don't behave quite like familiar integers and certainly don't
look like them. ..."

"Pretty mathematics, but mirroring is more practical

... Disks are cheap, and mirroring does does not impose the large
performance penalties that come with RAID 5.

Now lets move into the IBM universe. In this universe, error detection
is perfect, so the file system corruption risk of RAID 5 disappears. The
poor write performance of RAID 5 does not matter to you, and you do not
mind you system slowing to a craw each time a disk fails. You do care
about the price of disks, so you are too cheap to go for mirroring.
Lastly, you buy your drives from which ever manufacturer is currently
going through a bad patch (this happens to all of them - just watch the
commentards slagging off the particular manufacture that happened to
caused them grief). You decide that a second drive can fail before a
first failed drive can be restored, but that you will never have three
broken drives at the same time because you have sacrificed a chicken to
to voodoo gods. ...

... If you cannot make data packets small enough to get only single bit
errors, you will need to look at correcting multiple bits at once.
Mathematicians have done all the grunt work for you already, but they
express it in their own jargon which is clear as mud to most programmers.

Lets start with...

... properties for a Galois field. They can use all the things they have
discovered about Galois fields (far more than most programmers can
stand) to find the equations for recovering data from a RAID array with
two broken drives.

We have almost reached the point where I am going to run away screaming,
but there is one more piece of mathematics... with any luck, I can
explain it and perhaps three or four of you will actually need to use at
some point in the next 50 years. Pretend we have...

... In the IBM universe, disk drives can always spot when reading a
sector failed. Instead of some unknown bits getting flipped, some known
bits are set to 0. This requires some slightly different mathematics to
to usual error correcting code programmers meet. If you get as far as
page 2, the paper defines Partitial Minimum Spanning Distance and SD
codes which are just acronyms to frighten journalist on page 1. I
skimmed through to the end, and did not see any mention of performance,
thrashing when a drive fails or error propagation when a bad read has
the right checksum by chance. I expect this set up has worse performance
and corruption problems than RAID 5. For most people, the extra disks
required for RAID 10 are going to be cheaper than implementing IBM's
pretty mathematics."

"... I can see that disks are cheap, and I agree that recovering from a
failure should not detract from normal operation more than is
acceptable. Discussion then follows the "unfortunately .. fortunately ..
unfortunately there was a pitchfork in the haystack ..." route, where
people pay their money and make their choice. So I observe..."

"Murphy's Law says that there are a finite number of things that you can
think of that might go wrong - and an infinite number of things that can
go wrong..."

"Rather than try to understand this math to save a parity drive

I think I'll go with dual parity 14+2 RAID and at least understand how
the data protection is working so I can enjoy my beer instead of
stressing out wondering if the proof has a math error somewhere that
will require a team of PhD's to uncover..."

"... Now this is pretty rare, and we're talking lightning bolt strike
while winning the lottery kind of rare here, but the paper seems to be
saying that by using both vertical *and* horizontal parity, you can
increase the level of protection.

>From the above example, you can use vertical parity to correct the
individual bit errors [across the disks]...

And now it's a straightforward job of using the 'normal' parity to
complete the rebuild. Nice idea, not sure how much benefit it adds vs
simply adding more traditional parity, but it's interesting at least."

"Re: Really?

Nail. Head. On the. Hit.

I couldn't understand the paper as I don't have 90 per cent of the maths
knowledge to do so - cerebral insufficiency. Mario couldn't explain it
to me so I could understand it because of my limitations - so that would
be no use in trying to get the paper's contents described on the Reg'.

This way is much better - and more fun."

"Just use ZFS: ZRAID2, ZRAID3.... etc

ZRAID will detect the problems before you do, unlike conventional
rubbish RAID 5 or 6..."

Myself... A rather liked the beer vs PhDs comment ;-)

In summary, choose your (anticipated or hoped for selection of) failure
mechanisms and regardless, *keep independent full backups* !

Now to head off to tonight's comingling of of beer and PhuD-ings :-)

Have fun!



*TODAY* *Tuesday 07/05/2013: Going Cloudy*

All welcome!

- ------------------ - ----------------------------------------
-    Martin Lomas    - OpenPGP (GPG/PGP) Public Key: 0xCEE1D3B7
- martin @ ml1 co uk - Import from   hkp://subkeys.pgp.net   or
- ------------------ - http:// ml1 .co .uk/martin_ml1_co_uk.gpg

More information about the Nottingham mailing list