[Nottingham] Distributed *file* filesystems

James Dobson dob_3001 at yahoo.co.uk
Tue Mar 16 21:36:11 UTC 2010


Hi,

FUSE can be much too slow; it depends on how verbose your file-information transport is and how you've implemented it.

It almost sounds like you want boxes X, Y and Z to replicate between themselves on the fly. That's possible in *many* different ways. One I came across recently is ZFS (Solaris/OpenSolaris): you can keep sending incremental snapshots to another ZFS server so it constantly mirrors the first, with the downside that the receiving copy is read-only until you clone/promote it (rough sketch below).
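For what it's worth, here is roughly what I mean as a Python sketch. The dataset name 'tank/data' and the host 'nas2' are made up for the example, and it assumes an initial full send/receive has already seeded the remote copy: take a snapshot, then pipe an incremental 'zfs send' over ssh into 'zfs receive' on the other box.

#!/usr/bin/env python3
"""Rough sketch of ZFS replication via incremental send/receive.

Assumptions (not from the original mail): a local dataset 'tank/data',
a peer 'nas2' reachable over ssh, and an initial full send/receive
that has already seeded the remote dataset.
"""
import subprocess
import time

DATASET = "tank/data"          # hypothetical local dataset
REMOTE = "nas2"                # hypothetical replication target
REMOTE_DATASET = "tank/data"   # dataset on the receiving box

def snapshot(name: str) -> None:
    # Create a named snapshot of the local dataset.
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{name}"], check=True)

def replicate(prev: str, curr: str) -> None:
    # Incremental send of everything between the two snapshots,
    # piped over ssh into 'zfs receive' on the remote box.
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{DATASET}@{prev}", f"{DATASET}@{curr}"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(
        ["ssh", REMOTE, "zfs", "receive", "-F", REMOTE_DATASET],
        stdin=send.stdout,
        check=True,
    )
    send.wait()

if __name__ == "__main__":
    prev = "repl-0"
    snapshot(prev)              # assumes this also seeded the remote copy
    n = 1
    while True:
        time.sleep(60)          # replicate roughly once a minute
        curr = f"repl-{n}"
        snapshot(curr)
        replicate(prev, curr)
        prev, n = curr, n + 1

A cron job running the same send/receive pipeline would do the job just as well without the long-running loop.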

Alternatively you could write a 'daemon' that checks a filesystem for new files and, when it sees one, replicates it with rsync across your NASes. You could probably use FUSE to manage/kick off the rsync when a file is copied/deleted, rather than a polling daemon (quick sketch below). You might have inspired me to write a little project now!!
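As a sketch of that daemon idea (Python again; the path /srv/share and the hosts nas2/nas3 are just placeholders): poll the tree for changes and, when anything differs, push the lot to the peer boxes with rsync --delete. A FUSE or inotify hook could replace the polling, but the shape is the same.

#!/usr/bin/env python3
"""Toy polling 'daemon': watch a directory tree for changes and push
them to other boxes with rsync.

Assumptions (illustrative only): a local export at /srv/share and two
peer NAS boxes reachable over ssh as nas2 and nas3.
"""
import os
import subprocess
import time

WATCH_DIR = "/srv/share"                          # hypothetical local tree
PEERS = ["nas2:/srv/share/", "nas3:/srv/share/"]  # hypothetical rsync targets
POLL_SECS = 30

def tree_state(root: str) -> dict:
    """Map every file path to (size, mtime) so we can spot changes."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                state[path] = (st.st_size, st.st_mtime)
            except FileNotFoundError:
                pass  # file vanished between walk and stat
    return state

def push_to_peers() -> None:
    for peer in PEERS:
        # --delete propagates removals as well as new/changed files
        subprocess.run(["rsync", "-a", "--delete", WATCH_DIR + "/", peer],
                       check=False)

if __name__ == "__main__":
    previous = tree_state(WATCH_DIR)
    while True:
        time.sleep(POLL_SECS)
        current = tree_state(WATCH_DIR)
        if current != previous:
            push_to_peers()
            previous = current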

Clustered filesystems are nice, but the presumption is that the backend storage is shared in some fashion, e.g. iSCSI, FCoE, FC, SAS, etc.

JD

--- On Mon, 3/15/10, Martin <martin at ml1.co.uk> wrote:

> From: Martin <martin at ml1.co.uk>
> Subject: Re: [Nottingham] Distributed *file* filesystems
> To: "Notts GNU/Linux Users Group" <nottingham at mailman.lug.org.uk>
> Date: Monday, March 15, 2010, 12:27 PM
> James of the family Moore wrote:
> > I had developed a DDBFS layer a few moons back (2003?)
> that used a
> > replicated node table/index; this was updated live
> across all live
> > nodes and updated new nodes as they came online, to
> the most
> > up-to-date version (that worked assuming that the
> offline nodes had no
> > new additions in the meantime!). The file objects did
> not distribute
> > across the nodes, however, they stayed at "home" on
> each originating
> [...]
> 
> Interesting. It sounds like a more general version of the
> Google
> distributed fs. Google use fixed blocks of 64MBytes. An
> interesting
> design feature is that they assume files will usually be
> appended to
> rather than rewritten. I guess that's optimised for their
> web crawling...
> 
> 
> > little bit I nicked from the way image caching on most
> browsers works.
> 
> Surely you mean "reused" :-)
> 
> 
> > What I did find tho, while chatting with researchers
> into
> > collaborative virtual environments, was the problem of
> centralised
> > servers. These single points of failure easily became
> clogged with
> > data, hence I looked at decentralising the experience.
> Ergo, born was
> [...]
> 
> Having or needing a single coordinating point in parallel
> systems is
> always the bottleneck or single weak link that chokes
> performance. Going
> truly distributed and yet maintaining coherence across all
> the parallel
> working seems to be a Very Hard Problem.
> 
> I rediscovered some of that with my parallel bash demos and
> experiments
> with the Sieve of Eratosthenes...
> 
> Whatever did happen to the HURD?...
> 
> 
> > think you might find some of it useful; it shouldn't
> take too long to
> > have the thing set up to replicate data across a
> dynamic nodemap in
> > the background, updating the indices as it goes...
> 
> Now that would make for a very interesting techie talk! You
> got time to
> hack?
> 
> However, hasn't all that been overtaken now by the various
> FUSE based
> distributed flavours?
> 
> 
> Cheers,
> Martin
> 
> 
> -- 
> ----------------
> Martin Lomas
> martin at ml1.co.uk
> ----------------
> 
> _______________________________________________
> Nottingham mailing list
> Nottingham at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/nottingham
> 
