[sclug] Mirror disks across machines like distributed RAID?

Tom Carbert-Allen tom at randominter.net
Mon Jan 4 11:39:35 UTC 2010

On 02/01/2010 12:20, Dickon Hood wrote:
> On Fri, Jan 01, 2010 at 23:48:24 +0000, John Stumbles wrote:
> : doesn't help if/when the host
> : system goes bad,
> iSCSI.  Export the block device from the server to the client, use it as
> any other block device (so add it to LVM / md / whathaveyou) on the
> client.  Job's a goodun.  And it's standard.

iSCSI is great, I use it for my diskless machines and VMs etc., but it 
doesn't meet the OP's requirements: a client/server disk system like the 
one you suggest doesn't provide the failover of services he wanted.
BTW, DRBD is now in the mainline kernel, so it's "standard" too. I have 
never understood why DRBD had so many haters in the past, any ideas?


To overcome the same problem you describe I use DRBD, which is pretty 
easy to set up and works great for me. I have been using it for years 
without issue and have found it copes well with all the failure 
scenarios I've been through. The throughput is good too when you use a 
dedicated disk network: just stick an extra gigabit card in each machine 
and link them back to back. This removes any worry of the disks getting 
out of sync, unless you are regularly writing to the disk at more than 
gigabit speeds (e.g. SSD - in which case add cards to suit).
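
For anyone wanting to try it, a minimal resource definition looks 
something like this (DRBD 8.x syntax; the resource name, hostnames, 
devices and addresses here are made up for illustration):

```
# /etc/drbd.d/r0.res -- hypothetical two-node resource
resource r0 {
  protocol C;                 # fully synchronous replication
  device    /dev/drbd0;       # the replicated block device you mount
  disk      /dev/sdb1;        # backing partition on each node
  meta-disk internal;
  on alpha {
    address 10.0.0.1:7788;    # back-to-back gigabit link
  }
  on beta {
    address 10.0.0.2:7788;
  }
}
```

Then `drbdadm create-md r0` and `drbdadm up r0` on both nodes, promote 
one to primary, and put a filesystem on /dev/drbd0 as usual.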

This does increase your power footprint, as both machines have to be on 
to allow the disk replication, but it is worth it if you need the 
reliability. If you went down any other route, like an external 
iSCSI/SAN/NAS box feeding both servers, you would only end up having to 
buy two of them as well to avoid failure at that layer anyway, so having 
the dual storage system and dual CPUs be the same pair of boxes saves 
money and power.

Not everyone is mad enough to build a rack-mount server room in their 
back garden like I have, so a great solution I've found to the power and 
space problem of a redundant home setup is to use two laptops, modified 
to take 3.5" disks, with a second network port added via the card slot. 
This way you have total hardware fault tolerance (including TWO UPSes!) 
on a power budget of under 50W, and normally a smaller cubic space 
requirement than a single ATX machine. It's a great use for laptops with 
broken LCDs.

The next issue will be your switch or router. I have seen a fair number 
of no-name switches die in 24/7 use in the last year (in fact probably 
more than I have seen disks fail, if you weight the data by number of 
units). You can find a decent Cisco or HP rack-mount unit on eBay for 
less than £40 which will be much more reliable, but larger, more power 
hungry and a lot noisier! Or just wire up two entire cheap no-name 
networks with Ethernet bonding for failover.
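
The bonding side is just the kernel bonding driver. A sketch of what I 
mean, assuming Debian-style ifupdown with the ifenslave package (the 
interface names and address are made up):

```
# /etc/network/interfaces -- hypothetical active-backup bond
# across two switches: eth0 on one switch, eth1 on the other
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode active-backup    # failover only, no switch support needed
    bond-miimon 100            # check link state every 100ms
    bond-primary eth0
```

active-backup mode needs nothing clever from the switches, which is the 
point when the switches are the cheap part you expect to die.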

The most important part of any redundant setup is always monitoring and 
reporting though. If a system works well, from a user's perspective you 
will have no idea the primary has failed and you are now running in 
non-fault-tolerant mode. So it is critical you set up and test systems 
to let you know, so you can fix the problem and get back to two-node 
mode as soon as possible. Recently I saw a big fuss over a hosted 
redundant system going offline despite having a RAID 5 SAN with a spare 
disk: upon investigation it turned out one disk from the array died in 
2007, and the hot spare died shortly after it joined the array, two 
years ago, without anyone noticing a thing!
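
A sketch of the sort of check I mean, assuming DRBD's /proc/drbd status 
format (the function name is mine, not any standard tool; in real use 
you'd feed it `$(cat /proc/drbd)` from cron and mail yourself on 
anything other than OK):

```shell
#!/bin/sh
# Hypothetical DRBD health check: look at /proc/drbd-style status text
# and report whether replication is degraded.

drbd_health() {
    # $1: contents of /proc/drbd, passed as a string for testability
    case "$1" in
        *cs:Connected*ds:UpToDate/UpToDate*) echo "OK" ;;
        *) echo "DEGRADED" ;;
    esac
}

# Healthy pair, both disks up to date:
drbd_health "cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate"
# Peer unreachable -- you are quietly running on one node:
drbd_health "cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown"
```

Pipe the DEGRADED case into `mail` (or whatever alerting you trust) and 
actually test it by pulling the crossover cable once in a while.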

When you get comfortable with DRBD sharing your application and user 
data disks, you can start to look at some advanced topics too, like 
booting both machines from the same OS, multiple online primaries with 
load balancing, and things like that. Be careful you don't get ahead of 
yourself and end up with your brain split in two though, boom boom.....


More information about the Sclug mailing list