[Nottingham] Distributed *file* filesystems

James of the family Moore jmthelostpacket at googlemail.com
Sun Mar 14 23:30:04 UTC 2010


I developed a DDBFS layer a few moons back (2003?) that used a
replicated node table/index; this was updated live across all live
nodes, and new nodes were brought up to the most up-to-date version as
they came online (which worked on the assumption that the offline
nodes had picked up no new additions in the meantime!). The file
objects did not distribute across the nodes, however; they stayed at
"home" on each originating node. This was developed with the
intention of allowing a 3D/immersive environment to query each node
as the user came into contact with it in the environment. The process
would be transparent, depending on how good the network link was and
how much compression could be attained in the index; the number of
nodes present determined how large the index grew, which in turn
affected how long a network-wide and per-node update would take.
Practically speaking, it took milliseconds to run through a full
update on a LAN with thirty nodes and 10GB of random data each.

A later version (which I didn't get around to finishing) would also
sport a common area where a section of the node table was held in RAM
and file objects were replicated across the environment - like the
Whiteboard in SL. This bit obviously would not be persistent, save
for a built-in reserved space in the environment where you could drop
an object and it would update each client node's index as that client
came into contact with the Whiteboard. A way could be found here to
upload the Whiteboard objects to all clients from every other client,
a la torrents, relieving server load. Persistent objects would be
uploaded from the server on demand and cached by the client without
being added as a new object in that client's index, although they
would still show in the index at their original location; the cached
version would be compared with the persistent version for updates and
other changes, and the most up-to-date would be used. That little bit
I nicked from the way image caching works in most browsers.
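In rough sketch form (all the names here are my invention, not the
original code), the replicated index and its "most up-to-date wins"
merge rule looked something like this:

```python
import time

class NodeIndex:
    """Replicated index: object name -> (home node, version timestamp).

    File objects stay "at home" on their originating node; only this
    index is replicated, and a node coming online is brought up to the
    newest version it can see.
    """

    def __init__(self, node_id):
        self.node_id = node_id
        self.entries = {}  # name -> (home_node, timestamp)

    def add_local(self, name):
        # New objects are registered against their originating node.
        self.entries[name] = (self.node_id, time.time())

    def merge(self, other):
        # Last-writer-wins: keep whichever entry is most up to date,
        # the same rule as the cached-vs-persistent comparison above.
        for name, (home, ts) in other.entries.items():
            if name not in self.entries or ts > self.entries[name][1]:
                self.entries[name] = (home, ts)

# A node coming online merges against any live node's index:
a, b = NodeIndex("node-a"), NodeIndex("node-b")
a.add_local("model.obj")
b.merge(a)
print(b.entries["model.obj"][0])  # -> node-a (the object stays home)
```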

What I did find, though, while chatting with researchers into
collaborative virtual environments, was the problem of centralised
servers. These single points of failure easily became clogged with
data, hence I looked at decentralising the experience. Ergo was born
the idea of distributing the index, complete with information about
each object's location on the network. No central server needed: each
and every client had all the information it needed. The only reason to
have a centralised server at all would probably be to hold a master
index and upload it to new nodes (though there may well be a way
around that - look at dynamic host tracking for torrents to see how
that works), or to act as a relay for the Whiteboard.
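To sketch the point (with hypothetical object and host names): once
every client holds the whole index, locating an object never touches
a central server at all.

```python
# Each client's local replica of the index maps objects to the node
# that hosts them - the names below are made up for illustration.
index = {
    "whiteboard/sketch1": "node-7.lan",
    "room3/chair.mesh":   "node-2.lan",
}

def locate(obj_name):
    """Resolve an object's home node from the local index replica."""
    return index[obj_name]

print(locate("room3/chair.mesh"))  # -> node-2.lan
```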

I wish I could find all my research and development notes on this, as
I do think you might find some of it useful; it shouldn't take too
long to set the thing up to replicate data across a dynamic nodemap
in the background, updating the indices as it goes...
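A minimal sketch of that background replication, assuming a plain
dict index (name -> (home node, timestamp)) and a peer list that can
grow and shrink as nodes come and go - again, all names are mine:

```python
import threading
import time

def replicate_in_background(local, peers, interval=5.0):
    """Merge peer indices into `local` on a timer (last-writer-wins).

    `local` and each member of `peers` are dicts of
    name -> (home_node, timestamp); `peers` is a mutable list
    representing the dynamic nodemap.
    """
    def loop():
        while True:
            for peer in list(peers):  # snapshot: the nodemap is dynamic
                for name, (home, ts) in peer.items():
                    if name not in local or ts > local[name][1]:
                        local[name] = (home, ts)
            time.sleep(interval)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t

# Usage: a node joins with an empty index and picks entries up quietly.
local = {}
peers = [{"a.txt": ("node-1", 100.0)}]
replicate_in_background(local, peers, interval=0.05)
time.sleep(0.2)
print(local["a.txt"][0])  # -> node-1
```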

On 3/14/10, Martin <martin at ml1.co.uk> wrote:
> Folks,
>
> There is RAID for offering redundancy at the device block level across
> multiple disks in one host;
>
> There is DRBD that offers in effect RAID across disks that are on
> networked hosts;
>
> What is there for offering redundancy instead at the /file/ level across
> multiple hosted /filesystems/ ?... (Instead of the block device level RAID.)
>
>
> My crazy idea is to have two hosts acting as NAS so that another machine
> can mount both of them and automatically have files immediately
> replicated across both, but also to allow another few machines to mount
> either NAS read-only to read the files...
>
> How best to do that?
>
> I also hope to gain some redundancy in that if one NAS goes offline, the
> second can pick up all the read-only mounts.
>
>
> I've found:
>
> Gluster Storage Platform
> http://www.gluster.com/community/documentation/index.php/Gluster_Storage_Platform
>
> Sector/Sphere
> http://en.wikipedia.org/wiki/Sector/Sphere
>
>
> Anyone played with them?
>
> Any thoughts/ideas for redundant operation but at the filesystem level
> rather than at the block level?
>
>
> I've tried DRBD and it works fine, but it is useless for read-only from
> the multiple DRBD hosts if you also have live read-writes... You must go
> through the one master machine that has the read-write filesystem
> mounted on the DRBD device.
>
>
> Aside: Whilst in FUSE-land, I also stumbled across:
>
> lessfs data deduplication
> http://www.lessfs.com/wordpress/?page_id=50
>
> which could be an alternative to "rsync --link-dest=" for hard linking
> to previous unchanged copies for saving backups disk space. Anyone tried it?
>
>
> All on FUSE:
> http://fuse.sourceforge.net/
>
>
> Any comments, thoughts, stories?
>
> Cheers,
> Martin
>
> --
> ----------------
> Martin Lomas
> martin at ml1.co.uk
> ----------------
>
> _______________________________________________
> Nottingham mailing list
> Nottingham at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/nottingham
>


-- 
Vi veri veniversum vivus vici


