[sclug] Re: sclug Digest, Vol 41, Issue 10

Fri Feb 16 18:16:22 UTC 2007

> Message: 1
> Date: Thu, 15 Feb 2007 18:00:47 +0000
> From: Darren Davison <darren at davisononline.org>
> Subject: [sclug] file system errors on SATA drive
> To: SCLUG <sclug at sclug.org.uk>
> Message-ID: <20070215180047.GA15704 at davisononline.org>
> Content-Type: text/plain; charset="us-ascii"
> 
> hi,
> 
> I recently started getting errors on a new (3 mth old) SATA drive...

Hi. As everyone has realised, it's the difference between
/dev/sda1 and /dev/sda, multiplied by the actions you took,
plus maybe an unrecoverable bad block.

If you always used sda1 and never did anything to sda,
then carry on, as it now works. However, since you say that mount
worked for a while and then failed, some damage has been done.

My suggestion is to back up all files onto a different disk,
then zero out this disk and start again. Even if some of the file
contents you copy are corrupted, at least the FS layout will be clean.

# to copy files

cd /mnt/sda1
find . -depth -print | cpio -pvdm /mnt/sdb1/old_sda1/
dmesg # look for read errors logged during the copy

# to overwrite disk surface

dd if=/dev/zero of=/dev/sda bs=1024k # 20-80 minutes later ...
dmesg

ext2 (and many other filesystems) leaves the first 1024 bytes unused,
so that a boot sector and partition information can live there safely.
"fdisk" writes its partition table there, and the kernel uses
it for /proc/partitions. Running mke2fs on /dev/sda doesn't overwrite
that first sector, so an old partition table still gets seen as partitions.
Conversely, ext2 doesn't write into that sector, so it doesn't
accidentally create something that looks like a partition table.
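To see that overlap on a real disk, look at the two magic numbers directly: an MBR ends with the bytes 0x55 0xAA at offset 510, and ext2/ext3 keep their superblock at byte 1024 with the magic 0xEF53 (stored little-endian) 56 bytes into it, i.e. at byte 1080. A minimal POSIX sh sketch; the function names hexbytes and probe are made up for illustration, and it only reads, so running it against /dev/sda (as root) is harmless:

```shell
# Print LEN bytes at byte OFFSET of FILE as lower-case hex, e.g. "55 aa".
# Usage: hexbytes FILE OFFSET LEN
hexbytes() {
    # unquoted $(...) on purpose: word splitting squeezes od's spacing
    echo $(od -v -An -tx1 -j "$2" -N "$3" "$1")
}

# Say which on-disk signatures a disk (or disk image) carries.
# Read-only: safe to run on /dev/sda as root.
probe() {
    if [ "$(hexbytes "$1" 510 2)" = "55 aa" ]; then
        echo "$1: MBR signature at byte 510 (kernel/fdisk will look for partitions)"
    fi
    if [ "$(hexbytes "$1" 1080 2)" = "53 ef" ]; then
        echo "$1: ext2/ext3 magic at byte 1080 (filesystem starts at sector 0)"
    fi
}

# Example: probe /dev/sda
```

If both lines print for /dev/sda, the kernel sees partitions and ext2 sees a filesystem on the same disk, which is exactly the ghost situation described here.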

A bad block (if that is what happened) is something the
disk tries to resolve itself. If it manages to read it, it copies the data
to a spare sector, marks the old sector as unusable, and silently redirects
requests for one to the other. If it fails, it has no data
to copy over, so it reports a failure to the OS, which might recover
the data from elsewhere, or retry on a warmer day. If at any stage the
OS writes over that whole sector (no read-then-write needed),
the disk can do its sector remapping without error.
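A sketch of that "write it, don't read it" trick, assuming you already know the bad sector's number (LBA, in 512-byte units) from dmesg. zero_sector is a hypothetical helper, not a standard tool, and it destroys whatever was in that sector, so only use it on a sector you have already written off:

```shell
# Overwrite one 512-byte sector with zeros, in place.
# Usage: zero_sector DEVICE_OR_IMAGE LBA
# A plain write needs no read first, so the drive is free to remap the
# sector. conv=notrunc stops dd from truncating an image file after it.
# WARNING: the old contents of that sector are gone for good.
zero_sector() {
    dd if=/dev/zero of="$1" bs=512 seek="$2" count=1 conv=notrunc 2>/dev/null && sync
}

# Example (made-up sector number): zero_sector /dev/sda 1234567
```

Afterwards, re-running the dd read sweep and checking dmesg shows whether the error has gone.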

Your logs don't show a disk I/O error from the disk driver
(just errno = EIO from mount). NB: a disk sector is 512 bytes.

dd is your friend; it will sweep for unreadable blocks
(conv=noerror keeps it going past the first bad one):

	dd if=/dev/sda of=/dev/null bs=16k conv=noerror # then check dmesg
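Since a disk sector is 512 bytes and this fs uses 4k blocks, a sector number from a kernel error message converts to a filesystem block with a little arithmetic. A sketch, with sector_to_block as a made-up helper; pass 0 as the start sector when the fs was made on /dev/sda itself, or sda1's start sector when it lives in a partition:

```shell
# Convert a disk sector (512-byte units, counted from the start of the
# whole disk) into a filesystem block number.
# Usage: sector_to_block SECTOR FS_START_SECTOR BLOCK_SIZE
sector_to_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Example: sector_to_block 1000 0 4096   (fs made on /dev/sda itself)
```

From a block number, debugfs's icheck and ncheck commands can work back to the inode and then the file name, so you know which file got hurt.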

If you used to use /dev/sda1, then one day switched to /dev/sda
(edited a file, ran a command, ...), the tools should have
noticed the missing superblock (it is in a different place) and refused
to damage the data on the disk.

If you used to use /dev/sda, and it had a nulled-out partition sector,
but then one day you (or the magic GUI) ran fdisk, there would
be a valid partition table, which would lead various tools (e.g. the kernel)
to believe that /dev/sda1 existed, with specific boundaries, even with
ext2/ext3 continuing to use /dev/sda. Those tools might have written
new data at the start of /dev/sda2, or even /dev/sda1 (which doesn't
contain a recognisable superblock).

I can't remember; maybe /dev/sda1 has to start on a cylinder boundary,
which might leave all of track zero for the boot sector!?
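For what it's worth: with the 255-head, 63-sectors-per-track geometry fdisk usually reports for disks of this size, DOS-compatible fdisk traditionally starts the first partition at sector 63 (head 1 of cylinder 0, i.e. the start of track 1) and later partitions on cylinder boundaries, so track zero holds the MBR and otherwise sits idle. Treat that as a rule of thumb rather than gospel; the CHS-to-LBA arithmetic, with chs_to_lba as a made-up helper, is:

```shell
# Convert a cylinder/head/sector address to a linear sector number.
# CHS sectors are numbered from 1, cylinders and heads from 0.
# Usage: chs_to_lba CYL HEAD SECTOR HEADS SECTORS_PER_TRACK
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + $3 - 1 ))
}

# Example: chs_to_lba 0 1 1 255 63   (start of track 1)
```

So sda1 would begin 63 * 512 = 32256 bytes into /dev/sda, and losetup -o 32256 can reach the old partition's data through the whole-disk device.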

Running fsck might have patched over the overwrite, but it might not.
That's why I'd recommend a backup, a zero-out, and starting again.

Running mkfs (without -n) would have written superblocks
and inode tables in several places, offset from the older fs image
by the start of sda1 (roughly one track), i.e. on top of its data,
at 4k-block positions of /dev/sda. Most of the disk surface would remain untouched.
Then running fsck on the OTHER layout ... start again!
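To see where those new structures land inside the old fs, assume (an assumption, not something your logs confirm) that sda1 started at sector 63, the usual DOS-style layout. Then block N of the new whole-disk fs starts N * 4096 - 32256 bytes into sda1, and since 32256 is not a multiple of 4096, every new block straddles two old blocks. A sketch with a hypothetical new_to_old helper:

```shell
# Map a block of a filesystem made on /dev/sda onto the old filesystem
# that lived in /dev/sda1. Prints "OLD_BLOCK BYTE_OFFSET_IN_IT", or
# "before-sda1" if the block lies in the gap in front of the partition.
# Usage: new_to_old NEW_BLOCK SDA1_START_SECTOR BLOCK_SIZE
new_to_old() {
    byte=$(( $1 * $3 - $2 * 512 ))
    if [ "$byte" -lt 0 ]; then
        echo "before-sda1"
    else
        echo "$(( byte / $3 )) $(( byte % $3 ))"
    fi
}

# Example: new_to_old 32768 63 4096
```

For the backup superblock at 32768 this gives old block 32760, 512 bytes in, so that write clobbered the tail of old block 32760 and the head of 32761.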

If you want to edit out any fdisk partition table info, so that
/proc/partitions doesn't show a ghost outline, look at
the lilo docs for the layout of the boot sector, and use dd
with a byte count and offset (the whole sector is read in, count bytes
are overwritten starting seek bytes in, and it is written back).
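Concretely, the 64-byte table starts at byte 446 of sector 0 and the 2-byte 0x55 0xAA signature follows it, so 66 bytes from offset 446 covers both while leaving the boot code alone. A sketch, wipe_mbr_table being a made-up name; note that dd's seek= offsets the output (skip= would offset the input), and conv=notrunc keeps an image file from being cut short:

```shell
# Zero the partition table (bytes 446-509) and the 0x55AA signature
# (bytes 510-511) of sector 0; bytes 0-445 (boot code) are untouched.
# Usage: wipe_mbr_table DEVICE_OR_IMAGE
# WARNING: only after the backup; this removes every partition entry.
wipe_mbr_table() {
    dd if=/dev/zero of="$1" bs=1 seek=446 count=66 conv=notrunc 2>/dev/null
}

# Example: wipe_mbr_table /dev/sda
```

The kernel keeps its old idea of the partitions until it re-reads the table (blockdev --rereadpt /dev/sda, or a reboot).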

From memory, the first 446 bytes are the boot code; of the remaining
512-446=66 bytes, 64 are the fdisk table and the last 2 are the 0x55 0xAA signature.
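Each of the four 16-byte table entries holds a boot flag (byte 0), a partition type (byte 4, 0x83 for Linux), the first sector as a little-endian 32-bit LBA (bytes 8-11) and the length in sectors (bytes 12-15); the CHS fields in between are ignored here. A read-only sketch that decodes them with od, assuming a device or image at least 512 bytes long; list_mbr is a made-up name, and fdisk -lu or sfdisk -d report the same thing properly:

```shell
# List the non-empty primary entries of an MBR partition table.
# Usage: list_mbr DEVICE_OR_IMAGE   (read-only)
list_mbr() {
    img=$1
    i=0
    while [ "$i" -lt 4 ]; do
        # the 16 bytes of entry i become $1 .. ${16} (decimal values)
        set -- $(od -v -An -tu1 -j $((446 + 16 * i)) -N 16 "$img")
        ptype=$5
        start=$(( ${9} + ${10} * 256 + ${11} * 65536 + ${12} * 16777216 ))
        count=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))
        if [ "$ptype" -ne 0 ]; then
            printf '%d type=0x%02x start=%d sectors=%d\n' \
                $((i + 1)) "$ptype" "$start" "$count"
        fi
        i=$((i + 1))
    done
}

# Example: list_mbr /dev/sda
```

An all-zero sector 0 prints nothing; a ghost table left on a whole-disk ext3 still prints its entries even though nothing uses them.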

> sda1): ext3_clear_journal_err: Filesystem error recorded from previous
> mount: IO failure

Clever! Find the old superblocks!

> root at hepburn:~# mke2fs -n /dev/sda

> Block size=4096 (log=2)
> Fragment size=4096 (log=2)
> 19546112 inodes, 39072726 blocks
> 1953636 blocks (5.00%) reserved for the super user
> First data block=0
> Maximum filesystem blocks=41943040

A block group is like a mini ext2 within an ext2;
files in one directory tend to go into the same group,
so a corrupted group will have bunches of corrupted files.
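Finding the group for a given block is a single division. A sketch with a made-up block_group helper, assuming first data block 0 and 32768 blocks per group as in the mke2fs -n output quoted here (the last group may be shorter than the range printed):

```shell
# Which block group holds BLOCK, and which blocks does that group span?
# Usage: block_group BLOCK [BLOCKS_PER_GROUP]   (default 32768)
block_group() {
    bpg=${2:-32768}
    g=$(( $1 / bpg ))
    echo "block $1: group $g (blocks $(( g * bpg ))-$(( (g + 1) * bpg - 1 )))"
}

# Example: block_group 100000
```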

> 1193 block groups
> 32768 blocks per group, 32768 fragments per group
> 16384 inodes per group
> Superblock backups stored on blocks: 
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
> 		2654208, 4096000, 7962624, 11239424, 20480000, 23887872
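The backup list above isn't arbitrary: with the sparse_super feature, ext2/ext3 keep backup superblocks only in block group 1 and in groups that are powers of 3, 5 or 7 (the primary is in group 0). A sketch that reproduces mke2fs -n's list from the group count and group size, assuming first data block 0 (true for 4k blocks); sparse_backups is a made-up name:

```shell
# Print the block numbers of backup superblocks under sparse_super.
# Usage: sparse_backups NUM_GROUPS BLOCKS_PER_GROUP
sparse_backups() {
    ngroups=$1
    bpg=$2
    groups=1
    for base in 3 5 7; do
        g=$base
        while [ "$g" -lt "$ngroups" ]; do
            groups="$groups $g"
            g=$(( g * base ))
        done
    done
    for g in $groups; do
        echo $(( g * bpg ))
    done | sort -n
}

# Example: sparse_backups 1193 32768
```

With 1193 groups of 32768 blocks it prints the same 14 numbers quoted above; any of them can be given to e2fsck -b, adding -B 4096 if e2fsck can't work out the block size.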

Did this do any damage?

> root at hepburn:~# e2fsck -b 32768 /dev/sda

--
Graham