[Gllug] Hacker Attack, and a wild aside about version-controlled filesystems

Thu Jan 12 16:06:17 UTC 2006

On Thu, 12 Jan 2006, Daniel P. Berrange stated:
> On Wed, Jan 11, 2006 at 10:54:44PM +0000, Nix wrote:
>> [wild aside]
>> I'm thinking a lot about revertability right now 'cos of my current
>> spare-time obsession. I've just completed the design of a
>> PostgreSQL-backed, version-controlled FUSE-based filesystem I'm calling
>> Recant. Coding is underway as of three hours ago :)
> 
> Personally, I've been permanently put off the idea of versioning filesystems
> having had to deal with the abortion that is IBM/Rational ClearCase and its
> MVFS filesystem.

I feel your pain.

>                  One of many problems with MVFS was that it tried to mirror 
> the entirety of your existing VFS hierarchy & apply versioning to it, rather
> than just providing a standalone versioned storage bucket you could mount
> anywhere.

That's a... deeply silly and profoundly non-Unixlike idea.

>           With the mirroring approach it turned out to be very very hard to
> pass through correct semantics for all the weird custom filesystem types like 
> devpts, proc, sysfs, etc,

Did it even handle hardlinks properly? I've yet to find a single versioning
filesystem that gets them right. Even Recant's implementation is odd, viz.:

# Make a file
$ echo 'file contents mark one' > a-link
$ recant version a-link
1.1

# Hard-link it to some new name, then unlink that link
$ ln a-link a-nother-link
$ rm a-nother-link

# Make a new file with that name (new inode, of course)
$ echo 'file contents mark two' > a-nother-link
$ recant version a-nother-link
1.2
$ recant version a-link
1.1

# Swapitty-do-dah!
$ recant roll a-nother-link 1.1
$ recant roll a-link 1.2
$ cat a-link
file contents mark two
$ cat a-nother-link
file contents mark one

That is, combine hardlink-based versioning and filename-based
versioning, and all of a sudden you find that you can roll one link to
contents that it never had before (because one of its links was given a
name that was, earlier or later, used by an inode that did have those
contents).

This seems odd, but it's an inevitable consequence of making versioning
that works both by inode and by filename at the same time. The semantics
of multiple independent inodes, each with its own history, being linked to the
same name at different times are even nastier: you may have to do
automatic branch merging and things like that. However, the result of
all this thrashing should be reasonably intuitive, I hope.
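To see how the swap in the session above can fall out of those rules, here is a toy Python model. It is a hypothetical sketch, not Recant's actual design or code: every class and method name is invented. Each inode keeps its own history, each name remembers every version that was ever visible under it, and rolling a link back searches the inode's own history first, then the history of every name that inode has ever been linked under. To keep it small, each file gets exactly one version, assigned at creation.

```python
"""Toy model: versioning by inode and by filename at the same time.

Hypothetical sketch only; not Recant's design. Rolling back searches the
inode's own history, then the histories of every name it was ever linked
under. Each file gets exactly one version, at creation.
"""


class ToyVersionedFS:
    def __init__(self):
        self.next_inode = 0
        self.names = {}        # name -> inode currently linked there
        self.contents = {}     # inode -> current contents
        self.by_inode = {}     # inode -> {version: contents}
        self.by_name = {}      # name -> {version: contents} ever seen there
        self.ever_linked = {}  # inode -> set of names it has ever had

    def _link(self, name, inode):
        self.names[name] = inode
        self.ever_linked.setdefault(inode, set()).add(name)
        # Copy the inode's history (as of now) into the name's history.
        self.by_name.setdefault(name, {}).update(self.by_inode[inode])

    def create(self, name, data, version):
        inode = self.next_inode
        self.next_inode += 1
        self.contents[inode] = data
        self.by_inode[inode] = {version: data}
        self._link(name, inode)

    def link(self, existing, new):
        self._link(new, self.names[existing])

    def unlink(self, name):
        del self.names[name]  # the name's history survives the unlink

    def cat(self, name):
        return self.contents[self.names[name]]

    def roll(self, name, version):
        inode = self.names[name]
        histories = [self.by_inode[inode]]
        histories += [self.by_name[n] for n in sorted(self.ever_linked[inode])]
        for history in histories:
            if version in history:
                self.contents[inode] = history[version]
                return
        raise KeyError(f"{name!r} has no version {version!r}")


if __name__ == "__main__":
    fs = ToyVersionedFS()
    fs.create("a-link", "file contents mark one", "1.1")
    fs.link("a-link", "a-nother-link")
    fs.unlink("a-nother-link")
    fs.create("a-nother-link", "file contents mark two", "1.2")
    fs.roll("a-nother-link", "1.1")
    fs.roll("a-link", "1.2")
    print(fs.cat("a-link"))         # file contents mark two
    print(fs.cat("a-nother-link"))  # file contents mark one
```

In this model, rolling a-link to 1.2 succeeds only because a-link's inode was once also called a-nother-link, and that name later held a different inode at version 1.2.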

>                           not to mention that you now have interactions between two
> layers of inode caching, locking, god knows what.

Yuck?

>                                                   The end result was that using
> MVFS tended to break various things, pseudo-TTYs for example - /dev/pts would
> then have filesystem type 'mvfs' instead of 'devpts', which confused glibc...

All I can say is whoever thought that up was completely irrational. ;)
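As a rough illustration of why the wrong type string matters: any program that decides how to treat /dev/pts by asking what filesystem is mounted there gets a different answer once a mirror sits on top. The Python sketch below is hypothetical and does not reproduce glibc's own check; it only reads /proc/mounts-style text (device, mount point, type, options, dump, pass) and reports the type at /dev/pts. The `view /dev/pts mvfs` entry is invented for the example.

```python
def fs_type_at(mount_point, mounts_text):
    """Return the filesystem type mounted at mount_point, or None.

    mounts_text uses the /proc/mounts layout. Later lines win, because a
    later mount on the same point hides the earlier one.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]
    return found


def looks_like_devpts(mounts_text):
    return fs_type_at("/dev/pts", mounts_text) == "devpts"


if __name__ == "__main__":
    normal = "devpts /dev/pts devpts rw,gid=5,mode=620 0 0\n"
    # Hypothetical mirror mounted over /dev/pts.
    mirrored = normal + "view /dev/pts mvfs rw 0 0\n"
    print(looks_like_devpts(normal))    # True
    print(looks_like_devpts(mirrored))  # False
```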

> Still don't let IBM's horrific mistakes discourage you - even with using

I am not, not, *not* writing something that tries to unionfs and
versionise your entire VFS. I think someone from IBM mentioned something
about this on l-k a few years back: they wanted to export a bunch of
VFS-internal things so they could traverse the vfsmnt tree and mirror
it. They didn't say what it was for, it was non-free, and Al Viro
shot it down in microseconds.

Al can't shoot Recant down 'cos I'm not asking him for anything. I
expect he'll be violently ill when he spots it: he's a simplicity fiend
and this, well, it's not simple, and it's got lots of heuristics in,
most of which will probably be wrong to start with.

(However, he's also a vi fiend, and the object at the heart of Recant is
called a vinode. Maybe this will soften the blow. ;} )

> FUSE & PostgreSQL I'm sure it'll be faster than MVFS ;-P

I'm not setting the bar *that* low. ;P

Space consumption is actually turning out to be a bigger constraint than
time, because there are a lot of `obvious' ways to speed this sort of
thing up which turn out to consume insane space. The `obvious' way to
speed up file lookups, for instance, is to replicate all the per-block
metadata whenever a new version of a file is created... but the space
requirements of that are horrible: make a few dozen versions with a one-byte change in
each and you've probably eaten as much space in metadata as the original
file consumed. Not a good idea, even though it lets lookups be O(log n).
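To put rough numbers on that, here is a back-of-envelope Python sketch. Every figure in it is an assumption for illustration (4 KiB blocks, a 64-byte metadata entry per block, a 1 MiB file, exactly one changed block per version), and the function names are hypothetical; none of it is Recant's real on-disk layout. It compares replicating the full per-block map in every version against recording only the changed block per version, which is cheap in space but may force a lookup to walk back through earlier versions.

```python
"""Back-of-envelope metadata cost for a versioned file.

Hypothetical assumptions: fixed-size blocks, a fixed-size metadata entry
per block, and each new version changing exactly one block.
"""


def blocks_in(file_size, block_size):
    # Ceiling division: a partial final block still needs an entry.
    return -(-file_size // block_size)


def replicated_metadata(file_size, block_size, entry_size, versions):
    # Every version carries a complete copy of the per-block map, so any
    # block of any version can be found without consulting other versions.
    return versions * blocks_in(file_size, block_size) * entry_size


def delta_metadata(file_size, block_size, entry_size, versions):
    # The first version has a full map; each later version records only
    # the one block it changed, so finding an unchanged block may mean
    # walking back through earlier versions.
    return (blocks_in(file_size, block_size) + (versions - 1)) * entry_size


if __name__ == "__main__":
    size = 1 << 20  # assumed 1 MiB file
    for n in (1, 16, 64):
        full = replicated_metadata(size, 4096, 64, n)
        delta = delta_metadata(size, 4096, 64, n)
        print(f"{n:3d} versions: replicated {full:8d} B "
              f"({full / size:.2f}x file), delta {delta:6d} B")
```

Under these assumed sizes, replicated metadata as a fraction of the file is versions × entry size / block size, so 64 versions come to exactly 1 MiB, as much as the file itself, while the delta scheme stays around 20 KB.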

But I should stop gibbering about it and get back to working on it and
getting it neat enough that I can release it properly the way all other
projects do (i.e. with some actual *code* and the design docs) rather
than polluting an unrelated mailing list with my babblings. :)

-- 
`I must caution that dipping fingers into molten lead
 presents several serious dangers.' --- Jearl Walker
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug