[Wolves] Tinkle tinkle little disk...

Stuart Langridge sil at kryogenix.org
Thu Jan 3 16:53:34 GMT 2008


> That idea turned into
> http://www.kryogenix.org/days/2006/04/12/distributed-backups-to-friends
> in my head, but there are certain limitations with that (which I'll
> happily lay out if people would like me to, so you can help me work
> out how they might be fixed :)).

Actually, I've changed my mind; I'm going to lay these out anyway
whether you all want me to or not, so if anyone's bored by technical
stuff, turn away now. This is going to be relatively long :)

The idea behind my distributed-backups system, as per link above, is
that you download the backup client, say "I am in the Wolves LUG
backup group", and mark files you want to back up, and that's it. The
system itself then backs up the files you've marked onto machines of
other people in the Wolves LUG backup group, making sure that it
breaks the files up in such a way that if someone disappears from the
group you can still retrieve your full backups, and it encrypts the
backups on the other machines so that the owner of the machine can't
read them. That's all. There are three problems with this, none of
which I have a clear idea how to solve. They are:
  rsyncness
  encryption key loss
  bandwidth
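
The "breaks the files up so that someone can disappear" part could be done in several ways, and nothing above says which. As a rough sketch only, assuming the simplest possible scheme (N-1 data pieces plus one XOR parity piece, so that any single lost peer can be tolerated), it might look like the following. The function names are made up for illustration, and in the real system each piece would be encrypted before it left your machine.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, peers: int) -> List[bytes]:
    """Split data into (peers - 1) equal pieces plus one XOR parity piece."""
    if peers < 3:
        raise ValueError("need at least 3 peers (2 data pieces + parity)")
    k = peers - 1
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so every piece is the same size
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    pieces.append(reduce(xor_bytes, pieces))  # parity piece goes last
    return pieces


def recover(pieces: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one piece is missing (None)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("cannot recover from more than one lost piece")
    pieces = list(pieces)
    if missing:
        present = [p for p in pieces if p is not None]
        # XOR of all surviving pieces equals the lost one.
        pieces[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity piece and the padding.
    return b"".join(pieces[:-1])[:length]
```

This only survives one peer vanishing at a time; coping with more than that needs a proper erasure code, and costs more bandwidth.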

1. rsyncness
Because bandwidth is small (see point 3), the system should not back
up all my tagged files every night: instead, it should only back up
changes. The rsync program already knows how to do this; it can look
at a file and transfer only the bits which changed since the last
backup, which is great. However, the file on the backup "server"
(which is actually, say, Adam's PC) is encrypted. A small change to
the "real" file (on my machine) will result in a very big change to
the encrypted version (on Adam's PC), which in practice means that
rsync's clever "work out which bits have changed" algorithm is
useless. This can be avoided by assuming that any change to a file
means you have to back up the whole file again; this works OK if your
files are all small (like, say /etc/* files on a Linux box) or if your
large files never change (like, say, your Photos folder), but if you
have a large file and it changes a lot (like, say, your mailbox, if
it's one mbox file, or an Outlook PST) then you pay the long-time
backup penalty every night. Comments on whether this is likely to be a
problem are invited.
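
For what it's worth, the "any change means re-send the whole file" fallback is easy to sketch. The following is a minimal illustration, not the actual client: it assumes the client keeps a manifest (a dict of path to SHA-256 digest) from the previous run, and `changed_files` is a hypothetical name.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(
    marked: List[Path], old_manifest: Dict[str, str]
) -> Tuple[List[Path], Dict[str, str]]:
    """Return the marked files that differ from the last backup,
    plus the new manifest to save once the upload has succeeded."""
    changed = []
    new_manifest = {}
    for path in marked:
        digest = file_digest(path)
        new_manifest[str(path)] = digest
        if old_manifest.get(str(path)) != digest:
            changed.append(path)  # whole file gets encrypted and re-sent
    return changed, new_manifest
```

With this approach a one-byte change to a big mbox still puts the whole mbox in the `changed` list, which is exactly the penalty described above.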

2. encryption key loss
Because files in the backup need to be encrypted (so that Adam can't
read my files), there needs to be an encryption key. Since asking the
user to remember it, or enter a passphrase, or anything like that is a
total loser if the user is my dad, the system should generate my
encryption key for itself and remember it without telling me about it.
That's all fine, except that if my hard disc dies and I get a new one
and then try and restore my backups (this is what backups are for,
after all!) the new system won't know what the encryption key was, and
so it won't be able to decrypt the backups. I can't think of a way
around this without saying to the user "here is a file which you must
keep safe somewhere other than on your computer", which is a total
abject loss if you're my dad (because (a) he won't understand the
request and (b) he has nowhere other than his computer to keep such a
file safe! Yes, he could burn it to a CD, but then it's hardly an
easy-to-use backup system, is it?) I can't think of a user-friendly
way to get around this one other than "the central server stores your
backup key", which translates to "the central server admin can read
all your backups", which is clearly a bad idea.
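
To make the problem concrete, here is a minimal sketch of "the system generates the key and remembers it", assuming the key is simply 32 random bytes kept in a file on the local disc. The function name and the key file location are hypothetical.

```python
import os
import secrets
from pathlib import Path


def load_or_create_key(key_file: Path) -> bytes:
    """Return the stored 256-bit key, generating and saving one on first use."""
    if key_file.exists():
        return key_file.read_bytes()
    key = secrets.token_bytes(32)  # random, so it can never be regenerated
    key_file.parent.mkdir(parents=True, exist_ok=True)
    # Create the file readable by its owner only (mode 0600 on Unix).
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key
```

On the same disc this hands back the same key every time. On a replacement disc the file isn't there, so a brand new (and useless) key gets generated, and the old backups stay encrypted forever.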

3. bandwidth
If I want to back up 200MB of photos somewhere away from my home
network, then I'm screwed. This isn't a problem with my backup system,
it's just The Way It Is, because home internet connections are slow,
upload especially. This possibly makes any concept of off-site backup
for home users
basically useless. Yes, some people will just use it to back up their
most vital documents, but some people want to make sure they don't
lose their photos and their mp3s, and saying "here is a brilliant
backup system; only use it for text documents" is pretty stupid. This,
all by itself, might kill the entire concept of off-site home-user
backup. Not a lot that can be done about that if that's the case; it
can't be solved with software, obviously, no matter how clever the
software is. Thoughts on whether this is an issue are gratefully
received.
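
To put a rough number on it, here is a back-of-envelope calculation. The upstream speed is an assumption plugged in for illustration (it is not a measurement of anyone's actual line), and `copies` stands for however many peers end up holding a copy of the data.

```python
def upload_hours(size_mb: float, upstream_kbit_per_s: float, copies: int = 1) -> float:
    """Hours needed to upload size_mb megabytes `copies` times
    at the given upstream rate, ignoring protocol overhead."""
    if upstream_kbit_per_s <= 0 or copies < 1:
        raise ValueError("need a positive upstream rate and at least one copy")
    bits = size_mb * 1_000_000 * 8 * copies
    seconds = bits / (upstream_kbit_per_s * 1_000)
    return seconds / 3600
```

For example, `upload_hours(200, 256)` comes to roughly 1.7 hours for the 200MB of photos on an assumed 256 kbit/s upstream, and that multiplies up with every extra copy and every extra batch of photos.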

sil

-- 
New Year's Day --
everything is in blossom!
I feel about average.
   -- Kobayashi Issa


