[Nottingham][Talk] TODAY 17/07/2007 1: Backups; 2: Microsoft! (Navigation Inn)

Thu Jul 19 19:04:14 BST 2007

On Thu, 2007-07-19 at 16:56 +0100, Martin wrote:
> The average seemed to be that home backups were done 'once in a while',
> from every few days to once every few months.

As per your sig, the frequency increases when you have a disk fail :)

> Backup methods were straight-forward using 'drag-n-drop', "cp -r", "cp
> -a", tar, rsync, or whatever direct copy onto an alternate media.
> 
> HDDs are cheap enough to use them for speed and convenience. Similarly
> so for using "USB memory sticks". Finally, CD-R/RW and DVD+/-R was used
> also.

I use a mix of CD-R and extra hard disks (using RAID, of which more
below).

> Noone bothers with tapes or punched tape or punched cards any more!

Tapes are great for capacity *but* nowadays more and more people want
"instant" recovery of individual files. That's just not possible with
tape, no matter how quick the drive - unless you've a *very* good
offline indexing system to call on which keeps positional info of where
on which tape a file exists. If you have that - which is a substantial
quantity of data itself, and needs to be kept intact (hence needs
backing up!) - then modern drives like LTO-2/3/4 can zap to a position
very quickly. LTO-anything isn't really affordable for home use,
however.
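
To make the indexing point concrete, here is a minimal sketch of such an offline index, assuming a small SQLite database. The table layout and the names (tape_index, tape_label, block_offset) are hypothetical and not taken from any real backup product; a real system would also track file versions, sizes and checksums.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index database. The schema is illustrative only."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tape_index ("
        " path TEXT NOT NULL,"
        " tape_label TEXT NOT NULL,"
        " block_offset INTEGER NOT NULL,"
        " backup_date TEXT NOT NULL)"  # ISO 8601 dates sort correctly as text
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_path ON tape_index (path)")
    return conn


def record_file(conn, path, tape_label, block_offset, backup_date):
    """Note where one file was written during a backup run."""
    conn.execute(
        "INSERT INTO tape_index VALUES (?, ?, ?, ?)",
        (path, tape_label, block_offset, backup_date),
    )
    conn.commit()


def locate_latest(conn, path):
    """Return (tape_label, block_offset, backup_date) for the newest copy, or None."""
    return conn.execute(
        "SELECT tape_label, block_offset, backup_date FROM tape_index"
        " WHERE path = ? ORDER BY backup_date DESC LIMIT 1",
        (path,),
    ).fetchone()
```

With something like this, a restore request becomes one lookup followed by "load that tape, seek to that block". The index itself lives on disk, which is exactly why it needs backing up too.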

> And more surprisingly, noone verifies that their backups are good and
> that they can be used for recovery!!

That's something we do only infrequently at $dayjob; it really should
be something we do a lot more often.
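
For a plain directory-copy backup (the cp / rsync style mentioned above), a basic check could look something like the sketch below. It assumes the backup is an ordinary directory tree mirroring the source, and the function names are made up for illustration. It only proves the copies match byte for byte; a proper test restore onto a spare machine is still worth doing.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every regular file under source_dir with its copy in backup_dir.

    Returns a sorted list of (relative_path, problem) tuples, where problem
    is "missing" or "differs". An empty list means every file matched.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    problems = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "differs"))
    return sorted(problems)
```

Run it right after the backup finishes, before the source has had a chance to change, otherwise files edited in the meantime will show up as "differs".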

> Aside: Raid got slammed for being ineffective and even dangerous for
> giving a /false/ /sense/ of security...

I cannot stress this point enough (and I guess that the same sentiments
were discussed last night):

RAID is *not* a substitute for backups. If you think it is, you have
never suffered a multiple drive (or metadata) failure.

RAID should only ever be used for fault tolerance, unless you're
simply using it for extended space or increased throughput where loss of
data is unimportant - render farm scratch space, for example.

A given RAID system in fault-tolerant mode *should* be able to tolerate
failure of a single (or possibly more than one) drive in a given array.
Sadly, I've been burned so many times by crappy RAID hardware - HP,
Dell, AMI/LSI MegaRAID, various aacraid variations - that I cannot
stress enough how important it is to back up, back up, and back up again.

In $dayjob we have a bunch of Dell PowerVault storage arrays which for a
very long time used to pre-emptively fail out perfectly good drives
(which passed later diagnostics) and then fail out another one a few
seconds later, when under load. At that point the filesystem would
cough, the server halt, and we'd be left having to reboot the machine
and do a full Reiser filesystem tree rebuild. On a system of 4TB, that
takes a *long* time (18 hours or so). It took Dell over three years to
get their firmware right and stop the arrays doing that :-/

Several subscribers might also recall the days of the amazing exploding
MegaRAID / HP NetRAID arrays at Webfusion...

I'm fortunate enough to have enjoyed using NetApp appliances for some
time, and am now using Sun/StorageTek SAN systems. They don't fail
anywhere near as often, and the premium price means that if a disk says
"ooh, I'm on me way out" there's often a new one on the premises before
you know you need it. That's service :)

In both of those systems, you can take snapshots (or checkpoints) of a
given filesystem on a scheduled basis then use the snapshot to take a
backup. That helps with performance, since the live filesystem isn't
getting thrashed doing the backup - and there's an online copy of the
filesystem to get single files back from. It completely changes backup
methodologies.

Anyway, curry to eat :)

Graeme
