[Wolves] Tinkle tinkle little disk...

chris procter chris-procter at talk21.com
Thu Jan 3 18:03:29 GMT 2008


>> That idea turned into
>>
>> http://www.kryogenix.org/days/2006/04/12/distributed-backups-to-friends
>> in my head, but there are certain limitations with that (which I'll
>> happily lay out if people would like me to, so you can help me work
>> out how they might be fixed :)).
>
>Actually, I've changed my mind; I'm going to lay these out anyway
>whether you all want me to or not, so if anyone's bored by technical
>stuff, turn away now. This is going to be relatively long :)
>
>The idea behind my distributed-backups system, as per link above, is
>that you download the backup client, say "I am in the Wolves LUG
>backup group", and mark files you want to back up, and that's it. The
>system itself then backs up the files you've marked onto machines of
>other people in the Wolves LUG backup group, making sure that it
>breaks the files up in such a way that if someone disappears from the
>group you can still retrieve your full backups, and it encrypts the
>backups on the other machines so that the owner of the machine can't
>read them. That's all. There are three problems with this, none of
>which I have a clear idea how to solve. They are:
>  rsyncness
>  encryption key loss
>  bandwidth


You could break the file up into blocks (say 4KB each), then calculate the difference between each block and the previous backup of that block. If there are any changes, compress the diff, encrypt it and send it over (compress before encrypting, since encrypted data won't compress).
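A rough Python sketch of that change-detection step. The 4KB block size is from the text; hashing blocks to spot changes and zlib for the compress step are my assumptions, and the encrypt step is left out as a placeholder:

```python
# Sketch of block-level change detection. zlib stands in for the
# compress step; real encryption would wrap the compressed patch.
import hashlib
import zlib

BLOCK_SIZE = 4096

def block_hashes(data):
    """Hash every 4KB block so changed blocks can be spotted cheaply."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old_data, new_data):
    """Yield (index, new_block) for blocks that differ from the last backup."""
    old = block_hashes(old_data)
    new = block_hashes(new_data)
    for i, h in enumerate(new):
        if i >= len(old) or old[i] != h:
            yield i, new_data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]

old = b"a" * 8192                      # previous backup: two identical blocks
new = b"a" * 4096 + b"b" * 4096        # second block has changed
patches = [(i, zlib.compress(blk)) for i, blk in changed_blocks(old, new)]
# only the changed block becomes a (compressed) patch to send
```

Only the changed block crosses the wire; the unchanged first block never leaves the machine.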

You actually don't need to reconstruct the file from the diffs until you restore it, so all the remote site ends up storing is encrypted, compressed messages that say "at 23:00 on 29th Feb, the 23rd 4K block in file X changed its seventh byte from a to e". To restore, you fetch all of the patch files and replay them in time order.
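The restore path could look like this, a minimal sketch assuming each patch is a (timestamp, block_index, new_block_bytes) tuple, which is my invention, not anything from the text:

```python
# Sketch of the restore step: replay patches over the base copy,
# oldest first, so the newest version of each block wins.
BLOCK_SIZE = 4096

def restore(base, patches):
    """Rebuild the latest file by applying patches in time order."""
    data = bytearray(base)
    for _ts, idx, block in sorted(patches):
        start = idx * BLOCK_SIZE
        end = start + len(block)
        if end > len(data):          # the file grew since the base copy
            data.extend(b"\x00" * (end - len(data)))
        data[start:end] = block
    return bytes(data)

base = b"a" * 8192
patches = [
    (2, 1, b"c" * 4096),   # later patch to block 1 wins...
    (1, 1, b"b" * 4096),   # ...over this earlier one
]
restored = restore(base, patches)
```

Sorting by timestamp is what makes "rerun them in time order" work even if the patches arrive out of order.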

If you're really clever you get three remote machines and send the diff to two of them and a parity block to the third, RAID style.
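The parity trick is just XOR, same as RAID 5: with two data blocks on two machines and their XOR on a third, any one machine can disappear and the missing block is recoverable from the other two. A minimal sketch:

```python
# RAID-style parity: parity = d1 XOR d2, so losing either data
# block leaves enough information to rebuild it.
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d1 = b"\x01\x02\x03\x04"       # goes to machine 1
d2 = b"\x10\x20\x30\x40"       # goes to machine 2
parity = xor_blocks(d1, d2)    # goes to machine 3

# machine 2 vanishes from the group: rebuild d2 from d1 and parity
recovered = xor_blocks(d1, parity)
```

This is exactly the "someone disappears from the group" property the original post asks for, at the cost of one extra block per pair.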


If you're really, really clever you can deduplicate the data first by finding blocks that appear in several places, transmit them once, and for the rest just send references like "for block 17 in file X, use block 231 from file Y", which should help the bandwidth a lot as well. If you get this bit working, sell the technology to IBM and buy Bill Gates to act as your footstool.

Most changes to files cluster together (new emails get appended to the end of mbox files, for example), so only a few blocks should change at a time even in very large files. You only replicate those changes, which in turn minimises the bandwidth needed. Restore is still a pain, because you're pulling over everyone's ADSL upstream and then have to decrypt and reconstruct the files, but if you don't drop your laptop and shatter the hard disk you'll never need to restore it ;-)

The encryption key problem is more of a pain. Probably the best way I can think of is to use an encryption system that always derives the same key from the same passphrase. It would make Bruce Schneier cry, but your dad would only need to remember his phrase (or even better, "it's the first line of my favourite book"). Not exactly military-grade security, but does it need to be? And even if someone cracks it, they can't *change* the data, because there are several copies distributed around to compare.
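That "same phrase in, same key out" property falls out of any deterministic key-derivation function. A sketch using PBKDF2 from the Python standard library, with a fixed salt (the fixed salt is precisely the Schneier-unfriendly trade-off: the phrase alone is enough to regenerate the key):

```python
# Derive a repeatable 256-bit key from a passphrase with PBKDF2.
# The fixed salt sacrifices some security so the key can always
# be rebuilt from the phrase alone -- no stored key material.
import hashlib

SALT = b"wolves-lug-backup"    # fixed (hypothetical value), by design
ITERATIONS = 200_000           # slows down brute-forcing the phrase

def key_from_phrase(phrase):
    """Same phrase in, same 32-byte key out, every time."""
    return hashlib.pbkdf2_hmac("sha256", phrase.encode(), SALT, ITERATIONS)

k1 = key_from_phrase("the first line of my favourite book")
k2 = key_from_phrase("the first line of my favourite book")
# k1 == k2: the key is recoverable from the phrase forever
```

So a lost disk doesn't mean a lost key: the phrase is the only thing your dad has to keep.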

chris

"tinkle tinkle little disk,
all your data is at risk"





