[Gllug] Disk problems

Alain Williams addw at phcomp.co.uk
Tue Sep 10 21:36:59 UTC 2002


A machine that I look after has suddenly developed disk problems.

P400, reasonable memory, 2.4.18 kernel. 3 disks (hda, hdb, hdd - all IDE), SCSI tape.
Reasonably busy: 1,000,000 processes/day. Dell box of some sort. Put together April last
year - Red Hat 7.1.
Up 6 months (a fortnight under - damn) and it dies slowly: processes will not exit,
but new ones can be run. There is a regular clicking noise (sounds like it is from one of
the disks). Try to sync, reboot. All disks are invisible - even when booting from
floppy.

Diagnose bust IDE controller, rip out disks & put into new box (same, just a bit faster).

Seems OK, go home.

Box dies at 4am when the backup starts.
Reboot: notice error messages relating to hdd in /var/log/messages.
Install new hdd.

Several hours later (now) I notice errors relating to the other 2 disks (hda, hdb), none
for hdd. Errors are like:
Sep 10 22:02:12 zebra kernel: hdb: lost interrupt
Sep 10 22:02:16 zebra kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 10 22:02:16 zebra kernel: hdb: drive not ready for command
Sep 10 22:02:16 zebra kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 10 22:02:16 zebra kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Sep 10 22:02:36 zebra kernel: hdb: lost interrupt
Sep 10 22:02:36 zebra kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 10 22:02:36 zebra kernel: hdb: drive not ready for command
Sep 10 22:03:03 zebra kernel: hdb: lost interrupt
Sep 10 22:03:04 zebra kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 10 22:03:04 zebra kernel: hdb: drive not ready for command
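
For reference, the status bytes in these messages can be decoded by hand. A minimal
Python sketch, assuming the standard ATA status register bit layout; the flag names are
my assumption of what the 2.4 IDE driver prints, and decode_status is a hypothetical
helper, not part of any tool:

```python
# Bit values of the ATA status register, highest bit first.
# The names are assumed to match the strings printed by the Linux 2.4 IDE driver.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of all flags set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


print(decode_status(0x58))  # the hdb "status error" value
print(decode_status(0x51))  # the hda "dma_intr" value
```

So 0x58 on hdb has no Error bit set (the drive is still asking for data), while 0x51
on hda does have it set, which is why only hda gets a second line with an error= value.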

This is the sort of thing that I was finding for hdd after the initial crash. Going back through
a month of /var/log/messages I cannot see any messages relating to the disks before the initial
crash.
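
The search through the log can be sketched roughly as follows. This is a minimal
Python example, assuming the syslog line format shown above; the list of error phrases
is taken only from the messages quoted in this post and is not exhaustive, and
count_disk_errors is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches "... kernel: hdX: message" as in the syslog lines quoted above.
LINE_RE = re.compile(r"kernel: (hd[a-z]): (.*)$")

# Phrases seen in the error messages in this post (an assumption, not a full list).
ERROR_HINTS = ("lost interrupt", "status error", "not ready", "dma_intr", "error=")


def count_disk_errors(lines):
    """Count error-looking kernel messages per IDE drive."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match and any(hint in match.group(2) for hint in ERROR_HINTS):
            counts[match.group(1)] += 1
    return counts


sample = [
    "Sep 10 19:32:28 zebra kernel: hda: Maxtor 90645D3, ATA DISK drive",
    "Sep 10 22:02:12 zebra kernel: hdb: lost interrupt",
    "Sep 10 22:02:16 zebra kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }",
    "Sep 10 22:02:16 zebra kernel: hdb: drive not ready for command",
    "Sep 10 22:02:16 zebra kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Sep 10 22:02:16 zebra kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
]
print(count_disk_errors(sample))
```

To run it over the real log, pass an open file instead of the sample list, e.g.
count_disk_errors(open("/var/log/messages")). Boot-time probe lines such as the
"ATA DISK drive" one are not counted because they contain none of the phrases.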

Does anyone have any idea what is happening? I intend to replace the 2 other disks asap tomorrow.
This will mean completely new hardware except for the SCSI tape & the SCSI card (Adaptec 7899;
I know that it is overkill for just one device, but that is what could be got quickly when I built it).

My theory is that the original IDE controller failed & somehow damaged the disks.

A bit more hardware from the last boot:

Sep 10 19:32:28 zebra kernel: PIIX4: IDE controller on PCI bus 00 dev f9
Sep 10 19:32:28 zebra kernel: PIIX4: chipset revision 2
Sep 10 19:32:28 zebra kernel: PIIX4: not 100%% native mode: will probe irqs later
Sep 10 19:32:28 zebra kernel:     ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
Sep 10 19:32:28 zebra kernel:     ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Sep 10 19:32:28 zebra kernel: hda: Maxtor 90645D3, ATA DISK drive
Sep 10 19:32:28 zebra kernel: hdb: Maxtor 98196H8, ATA DISK drive
Sep 10 19:32:28 zebra random: Initializing random number generator:  succeeded
Sep 10 19:32:28 zebra kernel: hdc: Lite-On LTN486 48x Max, ATAPI CD/DVD-ROM drive
Sep 10 19:32:28 zebra kernel: hdd: ST380021A, ATA DISK drive
Sep 10 19:32:29 zebra kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Sep 10 19:32:29 zebra kernel: ide1 at 0x170-0x177,0x376 on irq 15
Sep 10 19:32:29 zebra kernel: hda: 12594960 sectors (6449 MB) w/512KiB Cache, CHS=784/255/63, UDMA(33)
Sep 10 19:32:29 zebra kernel: hdb: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, (U)DMA
Sep 10 19:32:29 zebra kernel: hdd: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=155061/16/63, (U)DMA
Sep 10 19:32:29 zebra kernel: hdc: ATAPI 48X CD-ROM drive, 120kB Cache, UDMA(33)
Sep 10 19:32:29 zebra kernel: Uniform CD-ROM driver Revision: 3.12
Sep 10 19:32:29 zebra kernel: Partition check:
Sep 10 19:32:29 zebra kernel:  hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
Sep 10 19:32:29 zebra kernel:  hdb: [PTBL] [9964/255/63] hdb1 hdb2 hdb3 < hdb5 hdb6 >
Sep 10 19:32:29 zebra kernel:  hdd: hdd1 hdd2
Sep 10 19:32:29 zebra kernel: Floppy drive(s): fd0 is 1.44M
Sep 10 19:32:29 zebra kernel: FDC 0 is a National Semiconductor PC87306
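
As a sanity check on the probe lines above, the reported geometry and sizes are
self-consistent. A small Python sketch, assuming 512-byte sectors and decimal megabytes
rounded to the nearest whole number (the kernel's exact rounding rule is not shown in
the log, so that part is an assumption):

```python
# Sector counts and CHS geometry copied from the boot messages above.
DRIVES = {
    "hda": (12594960, (784, 255, 63)),
    "hdb": (160086528, (158816, 16, 63)),
    "hdd": (156301488, (155061, 16, 63)),
}

SECTOR_BYTES = 512  # assumed sector size


def chs_sectors(chs):
    """Total sectors implied by a cylinders/heads/sectors triple."""
    cylinders, heads, sectors = chs
    return cylinders * heads * sectors


def size_mb(sectors):
    """Capacity in decimal megabytes, rounded to the nearest integer."""
    return round(sectors * SECTOR_BYTES / 1_000_000)


for name, (sectors, chs) in DRIVES.items():
    print(name, chs_sectors(chs) == sectors, size_mb(sectors))
```

For all three drives the CHS product equals the reported sector count, and the
computed sizes agree with the MB figures in the log, so the identify data read back
from the drives at boot looks sane.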


-- 
Alain Williams

#include <std_disclaimer.h>

-- 
Gllug mailing list  -  Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug



