[Gllug] Hacker Attack, and a wild aside about version-controlled filesystems

Nix nix at esperi.org.uk
Thu Jan 12 08:00:47 UTC 2006


On Thu, 12 Jan 2006, Tethys moaned:
> 
> Nix writes:
> 
>>File access is, unfortunately, going to be slow, scaling as O(mn) where
>>m = the number of versions needed to traverse to reconstruct the file
>>contents, and n = the number of blocks in the file.
> 
> Not necessarily. A few alternative options:
> 
> 1. Use an RCS-like approach of storing the most recent version in its
>    entirety, and deltas back to the beginning of time, to make m=1 for
>    accessing the current version.

I've considered this. However, if this isn't to lead to massive writes
every time a tiny write bumps a version, you have to write forward
deltas against the existing branch tip and transform them into reverse
deltas later. That means deltas travelling in *both* directions, which
seems to me terribly complex.

Plus, this is rather hard to make work with branches (yes, of *course*
I support branches; rolling back doesn't stop open()/write()/close()
from working!).

So instead I'm recording forward deltas and analyzing the delta trees
in the background to determine the earliest delta you need to scan in
order to reconstruct all the data for a given version. If that analysis
hasn't yet happened for a version, the version is probably so new it's
still in the page cache; if it *has* happened, we'll probably hardly
need to step backwards at all.
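
To make that concrete, here's roughly the shape of the reconstruction
walk. This is a sketch only: the struct fields and the
copy_unseen_blocks() helper are illustrative, not the real code.

  #include <stddef.h>

  struct block;                        /* on-disk block, details elided */

  struct delta {
      struct delta *parent;            /* previous version on this branch */
      int           version;           /* this delta's version number     */
      int           earliest;          /* set by the background analysis:
                                          oldest version any live block of
                                          this version was written in; 0
                                          if not yet analysed             */
  };

  /* hypothetical helper: copy blocks written by d that newer deltas
   * haven't already supplied; returns how many it filled in          */
  int copy_unseen_blocks(struct delta *d, struct block **out, int nblocks);

  /* Reconstruct the contents of `v' by walking the forward-delta chain
   * backwards, stopping as soon as every block is accounted for or the
   * precomputed `earliest' bound says nothing older can matter.        */
  void reconstruct(struct delta *v, struct block **out, int nblocks)
  {
      int filled = 0;
      struct delta *d;

      for (d = v; d != NULL && filled < nblocks; d = d->parent) {
          filled += copy_unseen_blocks(d, out, nblocks);

          if (v->earliest != 0 && d->version <= v->earliest)
              break;                   /* the analysis says: stop here */
      }
  }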

>                                   Yes, as Larry McVoy is so keen to point
>    out, this can lead to undetected corruption of previous versions.

That's what the MD5 hash of every disk block is meant to spot. (In
practice the database should spot this: the major point of the hash is
to make spotting candidate duplicate blocks for merging much faster. Of
course we byte-for-byte compare candidates too; we have to read them
anyway, and this lets us defend against hash collisions.)
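
In sketch form the check looks something like this. The block size is
illustrative, and I've used OpenSSL's MD5() purely to keep the sketch
self-contained; the real code would work from the hashes already stored
against each block rather than recomputing them.

  #include <string.h>
  #include <openssl/md5.h>             /* MD5(), MD5_DIGEST_LENGTH */

  #define BLOCK_SIZE 4096              /* illustrative block size */

  /* Return 1 iff `a' and `b' are genuinely identical blocks.  The MD5
   * comparison is only a cheap filter for finding merge candidates;
   * the byte-for-byte memcmp() is what we actually trust, so a hash
   * collision can never cause two different blocks to be merged.     */
  int blocks_identical(const unsigned char *a, const unsigned char *b)
  {
      unsigned char ha[MD5_DIGEST_LENGTH], hb[MD5_DIGEST_LENGTH];

      MD5(a, BLOCK_SIZE, ha);
      MD5(b, BLOCK_SIZE, hb);

      if (memcmp(ha, hb, MD5_DIGEST_LENGTH) != 0)
          return 0;                    /* different hashes: can't match */

      return memcmp(a, b, BLOCK_SIZE) == 0;
  }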

>    However, his insistence on using SCCS style storage isn't the only
>    solution. You could also have a periodic reconstruction of the first
>    checked in version, which could be checksummed to detect corruption.
>    The period can be as frequent or infrequent as you wish, but pick a
>    sensible default, and you should be fine.

Most of the optimizations which usefully apply to newly-written versions
are applied lazily. Not the hashing, but things like block merging and
last-safe-version determination work by noting, in a (semi-temporary)
table, that the appropriate optimization thread should get around to
checking this candidate soon, please; the optimization threads treat
those notes as `low-hanging fruit' and look at them before anything else.
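
In sketch form, the note-taking and the thread that eats the notes look
something like this (the names and the pthread plumbing are illustrative
only; the real table lives in the database):

  #include <pthread.h>
  #include <stdlib.h>

  /* hypothetical workers, standing in for the real optimizations */
  void merge_if_duplicate(unsigned long block);
  void update_earliest_needed(unsigned long block);

  /* One `please look at this soon' note, as jotted into the
   * semi-temporary table on the write path.                  */
  struct candidate {
      struct candidate *next;
      unsigned long     block;
  };

  static struct candidate *pending;    /* the low-hanging fruit */
  static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t    more = PTHREAD_COND_INITIALIZER;

  /* Called from the write path: cheap, just a note-to-self. */
  void note_candidate(unsigned long block)
  {
      struct candidate *c = malloc(sizeof *c);
      if (!c)
          return;                      /* losing a hint is harmless */
      c->block = block;

      pthread_mutex_lock(&lock);
      c->next = pending;
      pending = c;
      pthread_cond_signal(&more);
      pthread_mutex_unlock(&lock);
  }

  /* The optimization thread: drain the fresh notes before falling
   * back to any slow sweep of older versions.                     */
  void *optimizer(void *unused)
  {
      (void)unused;
      for (;;) {
          pthread_mutex_lock(&lock);
          while (pending == NULL)
              pthread_cond_wait(&more, &lock);
          struct candidate *c = pending;
          pending = c->next;
          pthread_mutex_unlock(&lock);

          merge_if_duplicate(c->block);
          update_earliest_needed(c->block);
          free(c);
      }
      return NULL;
  }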

> 2. Store a full version of the file every c revisions, so at most you need
>    to go back c revisions, rather than allowing m to grow arbitrarily large.

Hm, actually, a per-block analogue of this sounds like a good idea :)
Since I store read/write/frequency statistics on a per-allocated-block
basis, it should be fairly easy to determine which blocks are being
heavily read, and from what distance in versions; those blocks can get
duplicates allocated further up the stack. (Of course the *blocks*
aren't duplicated, but the per-block-per-version tracking state *is*;
that's the thing which ties block data to file versions and takes the
time to scan.)
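
The decision itself might be no more than this; the field names and
thresholds are plucked out of the air for illustration:

  /* Should this block's per-version tracking record get a duplicate
   * nearer the branch tip, so that hot reads of old data stop paying
   * for the long walk backwards?                                     */
  struct block_stats {
      unsigned long reads;             /* reads since the last decay tick */
      unsigned int  version_distance;  /* how far back the walk reaches   */
  };

  int should_duplicate(const struct block_stats *s)
  {
      return s->reads > 1000 && s->version_distance > 16;
  }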

(The frequency stuff seems expensive to track at first, but it can be
made quite cheap to store. There are fairly obvious data structures for
storing frequency data in bounded space, as long as you're willing to
accept lower resolution for data far in the past.)
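
One of the obvious schemes, sketched: a single counter per block, halved
on a periodic tick, so recent activity dominates and old history fades
to ever-lower resolution without the storage ever growing. (Illustrative
only; the real scheme may differ.)

  struct freq {
      unsigned long count;
  };

  void freq_hit(struct freq *f)        /* bump on each read or write */
  {
      f->count++;
  }

  void freq_decay_tick(struct freq *f) /* run periodically           */
  {
      f->count >>= 1;                  /* the past loses precision   */
  }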

>    Yes, this incurs a storage overhead, but for most non-niche cases today,
>    storage is cheap.

The overhead is only 70 bytes or so per block duplication; insignificant,
especially since I hope most blocks will be large (how many files are
really written to piecemeal?).

> 3. Every c revisions, store not the delta from the previous version, but
>    from the file as it was c versions ago. This is a kind of half way house,
>    in that if you set c=10, you reduce m by nearly an order of magnitude,

Oo. *Nifty* trick. It means we need to track what version things are deltaed
from, and complicates expiry, but I'll see if I can incorporate that.

(I think it may reduce to the per-block duplication trick, though, since
we store `deltas' on a block-by-block basis.)

>    without paying the storage penalty of saving the whole file. Thus to get
>    to revision 32, you'd go from 0 -> 10 -> 20 -> 30 -> 31 -> 32, making
>    m=5, rather than m=32. Checkin time increases, but the ratio of reads to
>    checkins is likely to make this a win for most files.

Yes: how often do you call close() on a file you opened for writing anyway?
;)

(actually, commits/checkins are going to be very common for some
directories, and it's for those that the block duplication idea makes a
*lot* of sense. So thanks for the idea! :) )
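
For reference, here's the shape of the chain from your point 3 with
c=10; a toy to check the arithmetic rather than how I'd store it, since
my `deltas' are per-block:

  #include <stdio.h>

  /* Every c-th revision deltas against the revision c back;
   * everything else deltas against its immediate parent.    */
  static int delta_base(int rev, int c)
  {
      if (rev == 0)
          return -1;                   /* revision 0 is stored whole */
      return (rev % c == 0) ? rev - c : rev - 1;
  }

  int main(void)
  {
      /* prints 32 31 30 20 10 0: five deltas on top of revision 0 */
      for (int r = 32; r >= 0; r = delta_base(r, 10))
          printf("%d ", r);
      printf("\n");
      return 0;
  }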

-- 
`I must caution that dipping fingers into molten lead
 presents several serious dangers.' --- Jearl Walker