[GLLUG] Advice on expanding an existing RAID 1 installation

Philip Hands phil at hands.com
Mon May 26 06:31:08 UTC 2014


Andy Smith <andy at bitfolk.com> writes:

> Hi John,
>
> On Sat, May 24, 2014 at 11:12:56AM +0100, John Winters wrote:
>> The Linux RAID Wiki has a page on extending an existing RAID array,
>> but says I should start by removing one of my existing drives.
>> Presumably this isn't necessary if I have a spare physical
>> slot/controller slot/power lead?  Rather than their proposed approach
>> of remove=>add=>remove=>add, is there any reason I can't do
>> add=>remove=>add=>remove?
>
> That should work and would have the advantage of never having a
> degraded array – you'd go to a three way RAID-1 which would not be
> degraded when you took it down to two devices again.

For that to be the case you need to use mdadm's --grow to set the number
of devices to 3, after you've added the drive to the array, otherwise
it'll just be added as a hot spare.
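
Off the top of my head -- and assuming the array is /dev/md0 and the
new disk turns up as /dev/sdc1, which are made-up names -- that would
look something like:

  # add the new drive; at this point it only becomes a hot spare
  mdadm /dev/md0 --add /dev/sdc1

  # grow the array so the spare becomes an active third mirror
  mdadm --grow /dev/md0 --raid-devices=3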

I tend to partition my disks, and then RAID the partitions, because that
allows you to rejig the RAIDs on a live system -- it is more effort
though, and if you're just using this system as a backup server, perhaps
the effort will never be repaid.
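
(For illustration only, with invented device names, that's just
building the mirror from partitions rather than whole disks:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

and each such array can then be failed/removed/re-added independently
without touching the rest of the disk.)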

>> Second question - I'm undecided about whether to put the 3rd drive
>> into the system to start with (thus three drives up to date all the
>> time) or keep it on a shelf for when it's needed.  Does anyone have
>> strong views on which is better?

I'd guess that if you're power-cycling the drives regularly, there's a
reasonable chance that they'll wear out because of that before they
fail mechanically.  If that's the case, it's probably best to have the
drives in the machine for differing amounts of time.

I suppose you could set the thing up as a 3 disk RAID1, and leave it
with one disk missing most of the time, and occasionally put the third
disk in the machine, sync up the array, and then take a disk out again
(perhaps giving one of the other drives a holiday, so they get different
amounts of wear).
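
A sketch of that, again with made-up device names:

  # create a 3-way mirror with one slot deliberately left empty
  mdadm --create /dev/md0 --level=1 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 missing

  # when the shelf disk comes in for its occasional sync:
  mdadm /dev/md0 --add /dev/sdc1
  # ...wait for the resync to finish, then retire whichever disk is
  # getting its holiday:
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1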

An internal bitmap (see the mdadm(8) man page) will speed up your
resyncs at the cost of a tiny write speed penalty.
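
Adding one to an existing array is a one-liner (array name invented):

  # write-intent bitmap, so re-adding a recently removed member only
  # resyncs the blocks that changed while it was out
  mdadm --grow /dev/md0 --bitmap=internal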

You are perhaps better off --grow-ing the array to include the extra
disk only while it's actually loaded, since leaving the RAID degraded
will cause lots of mail from mdadm's checks (though you could tweak
the check script to expect just one _ in the relevant array's
/proc/mdstat).
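
A crude version of that check, just to give the idea (array name and
pattern are illustrative):

  # [UU_] is what a 3-slot RAID1 with one member absent looks like in
  # /proc/mdstat, so treat that as the "normal" state here
  grep -q '\[UU_\]' /proc/mdstat || echo "md0 not in its expected 2-of-3 state"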

Things to consider:

If you have a cold spare, then when the first disk dies, you're going to
have to copy everything off of the second disk, which is when you'll
find all the bad blocks on that drive, which may contain the one bit of
data you really wanted to keep.

If you have all the drives running, then when you ask for data, you'll
be given whatever one of the drives thinks is right, unless that drive
decides the data is broken -- if the three drives do not agree about
the contents, you will not discover it unless you run checks, e.g.:

  echo check > /sys/block/md5/md/sync_action
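
and once that finishes you can see whether anything disagreed (md5 is
just the array name from the example above):

  cat /sys/block/md5/md/mismatch_cnt

with "repair" instead of "check" telling md to rewrite mismatched
blocks rather than just count them.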

BTW If you're using non-"raid ready" drives, then they generally try way
too hard to recover your data (assuming they contain the only copy), so
will sod about for 10 seconds when you start getting duff blocks, and
cause the whole system to grind to a halt as it waits for the disk to
respond.  This seems to result in the block not (or not always) being
overwritten by good data from another drive, which is a bit
disappointing.  If SMART shows you pending and/or uncorrectable sectors,
but doesn't seem to be reallocating them, that's what's up.
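
You can check (and sometimes shorten) the drive's error recovery
timeout with smartmontools, if the drive supports SCT ERC at all
(device name made up):

  # show the current error recovery control timeouts
  smartctl -l scterc /dev/sda
  # ask for 7 second read/write timeouts (values are tenths of a second)
  smartctl -l scterc,70,70 /dev/sda

  # and keep an eye on the pending/uncorrectable counts
  smartctl -A /dev/sda | grep -i -E 'Pending|Uncorrect'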

>> The machine in question is only switched on for a couple of hours
>> each day.  It switches itself on, backs up a selection of other
>> systems, then switches itself off again.
>
> It sounds like neither power usage nor the inconvenience of
> replacing devices would be much of a factor for you. Maybe the extra
> resilience is worth it then?
>
> As James mentioned, if you were in the mood for a radical rethink
> then you could use btrfs. There's also ZFS on Linux. Either of those
> would help avoid the phantom read error problem by keeping checksums
> of files and automatically reconstructing them from the good copies.
> You still need to have enough good copies and to read the actual
> files to discover the problem, though.

BTRFS is still quite new though, so you might find that you have to
take care of it.  I think the checksums-for-everything feature may be
enough to swing the decision anyway, especially on a machine with easy
physical access (as seems to be the case here): with larger drive sizes
you're pretty much guaranteed to lose a block or two whenever you try a
rebuild from a single drive, so it's better to assume in advance that
the drives are unreliable.

Also, it seems that putting btrfs on top of software RAID is a mistake,
so don't do that -- for that and other hints, Russell Coker's been
writing lots of useful stuff around this subject of late:

  http://etbe.coker.com.au/category/storage/
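
(If you do go that way, btrfs wants to do the mirroring itself, e.g.
-- with invented device names --

  mkfs.btrfs -m raid1 -d raid1 /dev/sda1 /dev/sdb1

and then a periodic "btrfs scrub start /mountpoint" gives you the
read-everything-and-verify-checksums pass.)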

Cheers, Phil.
-- 
|)|  Philip Hands [+44 (0)20 8530 9560]    http://www.hands.com/
|-|  HANDS.COM Ltd.                    http://ftp.uk.debian.org/
|(|  10 Onslow Gardens, South Woodford, London  E18 1NE  ENGLAND