[sclug] My disk esplode

Mon Apr 14 23:37:26 UTC 2008

On Tue, Apr 15, 2008 at 00:05:04 +0100, David Given wrote:
: I recently had a major disk failure --- bad sectors galore. I thought
: people would appreciate a brief writeup to tell how I managed to recover
: (AFAICT) all my data off it.

You used the other half of the mirror..?

: The scenario: a Porsche branded LaCie external USB disk (yes, I crashed
: my Porsche). Inside is a SATA Seagate ST3500 Barracuda 7200.10 500GB
: disk. It was cheap. On the disk is a JFS partition.

: http://www.scan.co.uk/Products/ProductInfo.asp?WebProductID=675794

: What happened: the disk, running quite happily, started chattering to
: itself. At first I thought it was actual activity until I noticed that
: it still happened when not actually plugged in to the computer. At this
: point I realised something was very wrong and prepared to back up my
: data. The first thing I did was nuke a very, very big directory of
: temporary data because I didn't need it and didn't want to spend time
: backing it up.

...oh dear, no, you didn't.

: Lesson #1: do not do this. When your disk starts acting funny, mount it
: read only and never try to write to it again.

Correct.

: When the disk started making rhythmic grinding noises and spewing I/O
: errors I killed it, remounted it, and even though it still seemed
: readable, fscked it. EPIC FAIL.

: Lesson #2: see lesson #1.

Very definitely correct.  Never, ever expect to write to the device again
and have it succeed.  Any further write will likely make things worse.
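
For anyone wondering what 'read only' looks like in practice, here is a
rough sketch.  The device name and mount point are placeholders I have
made up, and the run() wrapper is only there so that the script prints
the commands instead of executing them unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: /dev/sdX and /mnt/porsche are placeholders, not real names.
DRY_RUN=${DRY_RUN:-1}

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@"
    fi
}

dev=/dev/sdX
mnt=/mnt/porsche

# Stop the filesystem from accepting writes.
run mount -o remount,ro "$mnt"
# Ask the kernel to refuse writes to the block device itself.
run blockdev --setro "$dev"
# Prints 1 if the device is now flagged read-only.
run blockdev --getro "$dev"
```

Setting the read-only flag on the device as well as the mount is belt
and braces: it should stop a later fsck or a careless remount from
writing, which is exactly what went wrong above.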

: fsck tried to write to the superblock. The sectors containing the
: superblock curled up and died. At this point I now had an unmountable
: filesystem. Further fsck attempts (read-only this time!) revealed that
: while the backup superblock seemed to be fine, there was no way of
: fixing things because the primary superblock was unreadable.

: At this point I needed to take an image of the hard disk. dd would not
: work, because dd doesn't handle I/O errors appropriately.

dd if=... of=... conv=noerror,sync

Bad blocks are replaced with zero bytes, and it all works as well as can
be expected.  Likely one of the two tools you mention below does little
more than this.
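
To see what those two conv flags do without risking a real disk, here is
a small demonstration on an ordinary file.  The file names and the
512-byte block size are my own choices for illustration; nothing here
reads from a device.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)

# 1000 bytes of input: one full 512-byte block plus a short 488-byte read.
head -c 1000 /dev/urandom > "$workdir/source.bin"

# noerror: keep going after a read error instead of stopping.
# sync:    pad every short (or failed) input block with NULs up to bs.
# dd's transfer statistics go to stderr and are discarded here.
dd if="$workdir/source.bin" of="$workdir/image.bin" \
   bs=512 conv=noerror,sync 2>/dev/null

src_size=$(( $(wc -c < "$workdir/source.bin") ))
img_size=$(( $(wc -c < "$workdir/image.bin") ))
echo "source: $src_size bytes, image: $img_size bytes"
```

The image comes out padded to a whole number of blocks.  On a failing
disk the same padding is what stands in for an unreadable block, so a
small bs keeps the zeroed region small, at the cost of speed.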

: Luckily, there
: are two tools that do do this: dd_rescue and ddrescue (these are
: *different*); they're in Debian packages ddrescue and gddrescue
: respectively. These will both read disk images and attempt to recover
: bad sectors, but ddrescue does the better job of it. The Debian version
: is, unfortunately, very old; if you ever do this, you will want to
: compile the most recent version yourself. This supports sparse files,
: which means that empty blocks in the image will consume no disk space.
: Thanks to this feature, I managed to get the 500GB image to occupy only
: about 300GB of real disk space on my other big drive. This was a good
: thing, as otherwise I'd have had no room.
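
For those who have not met sparse files, a quick illustration of why a
500GB image can fit in 300GB.  This is a sketch on a scratch file, with
names and sizes picked by me; it assumes the filesystem under your
temporary directory supports holes, which most do.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
img="$workdir/sparse.img"

# Write a single 4 KiB block at the very end of a 64 MiB file.
# Everything before it is a hole that is never written to disk.
dd if=/dev/urandom of="$img" bs=4096 count=1 seek=16383 2>/dev/null

# Apparent size: what ls -l reports.  Actual size: blocks really allocated.
apparent_kb=$(( $(wc -c < "$img") / 1024 ))
actual_kb=$(du -k "$img" | cut -f1)
echo "apparent: ${apparent_kb} kB, on disk: ${actual_kb} kB"

# On a real rescue, GNU ddrescue's -S (--sparse) option writes the image
# this way.  Device and file names below are placeholders:
#   ddrescue -S /dev/sdX disk.img disk.map
```

Beware that tools which are not sparse-aware will happily expand the
holes into real zeros when copying the image elsewhere.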

: Lesson #3: it's always worth having a spare of the biggest disk you've got.

Well, I'd personally say that lesson three should be 'ensure the other
half of the mirror is working', although people often don't understand
that they need a mirror in the first place.

It's quite simple, people: one copy of the data is no copy at all.  You
have no backup, and you have no guarantee that the data you think you have
is available.

: At this point all I needed to do was fsck.jfs the disk image, mount it
: loopback, and copy the data off. Success. (If there were any bad sectors
: in spaces used by actual files, those files will now contain blocks of
: zeros, but ddrescue said that there were only 419kB of unreadable
: sectors on the entire 488386560kB disk, so chances are I was lucky.)

Likely.
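
To put a number on it, using the two figures ddrescue gave you.  The
sector count assumes kB here means 1024 bytes and that the disk uses
512-byte sectors; both are my assumptions, not something stated above.

```shell
#!/bin/sh
set -eu

bad_kb=419
total_kb=488386560

# Percentage of the disk that could not be read.
lost_pct=$(LC_ALL=C awk -v bad="$bad_kb" -v total="$total_kb" \
    'BEGIN { printf "%.6f", bad / total * 100 }')

# Assumed: 1 kB = 1024 bytes, 512-byte sectors.
bad_sectors=$(( bad_kb * 1024 / 512 ))

echo "unreadable: ${lost_pct}% of the disk, about ${bad_sectors} sectors"
```

That is under one ten-thousandth of a percent, so unless the damage
happened to land on metadata, very few files should be affected.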

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.