[Gllug] Filesystems again :)

Thu May 16 11:17:11 UTC 2002

On Thu, May 16, 2002 at 12:53:31PM +0200, John Hearns wrote:
> On Thu, 2002-05-16 at 11:37, pauln at truemesh.com wrote:
> > More pondering on how to do high availability storage :)
> > 
> What do you mean by high availability storage?

Sadly not a SAN (we had EMC at my last place), as there is no budget.
I have a cluster of machines, and all of them need to get their data
from a central source that is robust.

The servers get both user data and live feeds (updated every 2-4 mins).
The cluster is not physically in one rack (so shared storage may be
out), however they are on a fast local network.

The cluster is round robin, meaning writes to the storage could arrive
via any of the servers.  The servers currently both read and write the
data on a single server, while the feed is delivered to all members.
I'd like to consolidate both into a more robust solution.

What I've played with so far:

1) rsync/unison in cron between all machines - this works, but the
polling is expensive.  Also, every machine added means more syncs to
set up (n(n-1)/2 pairs for a full mesh of n machines).

2) rsync/unison event triggered using FAM.  Less expensive as there is
no polling; however, when cp'ing a large file (I tried an iso), it was
sync'd in chunks, meaning a reader may see an incomplete file.

3) nfs/samba/dav - still a single point of failure with the current
setup, as there is only one write master.  Need to see if the write
side can be made robust (see point 5).

4) Berkeley db - looks like it will do all I need, but needs code
changes.

5) Buy more machines and make back end redundant with take over (LVS,
http://people.redhat.com/jrfuller/cms/index.html, or something).  This
could then use DAV/samba/whatever and failover.

6) Intermezzo and intersync - just compiling atm. 

7) coda - haven't looked at yet.
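
The partial-read problem in point 2 can often be sidestepped on the
producer side by writing to a temporary name and renaming into place
once the file is complete.  A minimal sketch, assuming the sync trigger
can be told to ignore the temporary names; publish_atomically and the
".incoming-" prefix are made up for illustration:

```python
import os
import tempfile


def publish_atomically(data, dest_path):
    """Write data to a temp file next to dest_path, then rename it into place.

    Readers (and any watcher that skips the hypothetical ".incoming-"
    prefix) see either the old complete file or the new complete file.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_path, dest_path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the partial temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This only guarantees atomicity on the local filesystem; whether the
rename is also atomic over nfs or samba depends on the server.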

I don't think there is an easy headless way to do it, so write master
with takeover is probably the way to go.  It's a production system so
I'd prefer not to be running something bleeding edge.
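
To make "write master with takeover" concrete, here is a toy sketch of
the decision logic only: a standby is promoted once the master has
missed heartbeats for longer than a timeout.  The class, node names and
timeout are all hypothetical, and a real setup would lean on existing
heartbeat/LVS tooling rather than anything hand-rolled like this:

```python
class FailoverMonitor:
    """Tracks heartbeats from the active write master and promotes a standby."""

    def __init__(self, master, standby, timeout=10.0):
        self.active = master
        self.standby = standby
        self.timeout = timeout  # seconds of silence tolerated before takeover
        self.last_seen = 0.0    # time of the last heartbeat from the active node

    def heartbeat(self, node, now):
        # Only heartbeats from the current write master count.
        if node == self.active:
            self.last_seen = now

    def check(self, now):
        """Return the node that should currently accept writes."""
        silent_for = now - self.last_seen
        if silent_for > self.timeout and self.standby is not None:
            # Takeover: the standby becomes the write master.
            self.active, self.standby = self.standby, None
            self.last_seen = now
        return self.active
```

Note this does nothing about split brain: if the master is alive but
merely unreachable from the monitor, both nodes could end up accepting
writes unless the old master is fenced off.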

Problems I've thought about: write collisions, merging, node removal,
nodes not being able to talk to other nodes.
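
For write collisions, one common approach (not something I've tested
here) is optimistic versioning: each write states which version it was
based on, and the store rejects it if someone else got in first.  A
small in-memory sketch; VersionedStore and WriteCollision are invented
names, not part of any of the tools above:

```python
class WriteCollision(Exception):
    """Raised when a write was based on a stale version of the data."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Store value only if nobody has written since expected_version."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise WriteCollision(
                "%s is at version %d, write was based on version %d"
                % (key, current_version, expected_version)
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

The losing writer then has to re-read and retry (or merge), which is
where the merging problem comes back in.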

The round robin is annoying as it could mean that you make a change, and
when you look, you're looking at a machine it hasn't propagated to.
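
One way around that, at the cost of a less even spread of load, would
be to pin each client to one server instead of rotating.  A rough
sketch, assuming whatever does the balancing can key on some stable
client identifier; pick_server and the server names are hypothetical:

```python
import hashlib


def pick_server(client_id, servers):
    """Map a client to the same server every time, given the same server list."""
    digest = hashlib.sha1(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["web1", "web2", "web3"]
# The same client always lands on the same machine, so it sees its own writes.
assert pick_server("paul", servers) == pick_server("paul", servers)
```

Adding or removing a server reshuffles most clients with this simple
modulo scheme, so it only papers over the propagation delay.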

Paul

-- 
Gllug mailing list  -  Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug