[Gllug] Disk problem

Alain Williams addw at phcomp.co.uk
Fri Sep 4 10:00:24 UTC 2009


I was wondering if anyone had any view on this.

Early this morning I got errors from one of my SATA disks (1TB).

When I got up I found that it would not respond to anything; however, after a power cycle
of the whole machine (the first in a long time) it came back. smartctl tests claim that
it is healthy, although I did get some more errors (see below). How worried should I be?

I'll replace the disk later today.

I did not lose anything since it is software mirrored (kernel MD), which is rebuilding
nicely now.

The disk is a Western Digital: WDC WD10EACS-00D

Curiously, one thing I did to all my disks yesterday was to switch off the disk write cache:
	hdparm -W 0 /dev/sdb
this was after reading something in LWN saying that disk write caches can cause consistency
problems on failure:
	http://lwn.net/Articles/349970/
I wonder if this could have triggered something?


Sep  4 02:22:56 mint kernel: ata2: EH in SWNCQ mode,QC:qc_active 0xFFFFFF sactive 0xFFFFFF
Sep  4 02:23:57 mint kernel: ata2: SWNCQ:qc_active 0x10 defer_bits 0xFFFFEF last_issue_tag 0x4   
Sep  4 02:23:57 mint kernel:   dhfis 0x10 dmafis 0x10 sdbfis 0x0
Sep  4 02:23:57 mint kernel: ata2: ATA_REG 0x40 ERR_REG 0x0
Sep  4 02:23:57 mint kernel: ata2: tag : dhfis dmafis sdbfis sacitve
Sep  4 02:23:57 mint kernel: ata2: tag 0x4: 1 1 0 1
Sep  4 02:23:57 mint kernel: ata2.00: exception Emask 0x0 SAct 0xffffff SErr 0x0 action 0x6 frozen
Sep  4 02:23:57 mint kernel: ata2.00: cmd 61/10:00:f5:5e:03/00:00:00:00:00/40 tag 0 ncq 8192 out 
Sep  4 02:23:57 mint kernel:          res 40/00:00:00:4f:c2/04:00:03:00:00/00 Emask 0x4 (timeout)
Sep  4 02:23:57 mint kernel: ata2.00: status: { DRDY }
Sep  4 02:23:57 mint kernel: ata2.00: cmd 61/08:08:85:fb:f8/00:00:00:00:00/40 tag 1 ncq 4096 out 
Sep  4 02:23:57 mint kernel:          res 40/00:00:a4:39:4c/04:00:03:00:00/40 Emask 0x4 (timeout)
Sep  4 02:23:57 mint kernel: ata2.00: status: { DRDY }
Sep  4 02:23:57 mint kernel: ata2.00: cmd 61/08:10:dd:12:30/00:00:36:00:00/40 tag 2 ncq 4096 out 
Sep  4 02:23:57 mint kernel:          res 40/00:00:a4:39:4c/04:00:03:00:00/40 Emask 0x4 (timeout)
Sep  4 02:23:57 mint kernel: ata2.00: status: { DRDY }
Sep  4 02:23:58 mint kernel: ata2.00: cmd 61/10:18:4d:31:99/00:00:01:00:00/40 tag 3 ncq 8192 out 
Sep  4 02:23:58 mint kernel:          res 40/00:00:a4:39:4c/04:00:03:00:00/40 Emask 0x4 (timeout)
Sep  4 02:23:58 mint kernel: ata2.00: status: { DRDY }
Sep  4 02:23:58 mint kernel: ata2.00: cmd 61/e8:20:95:7e:32/02:00:03:00:00/40 tag 4 ncq 380928 out
Sep  4 02:23:58 mint kernel:          res 40/00:00:a4:39:4c/04:00:03:00:00/40 Emask 0x4 (timeout)
Sep  4 02:23:58 mint kernel: ata2.00: status: { DRDY }

I have also seen some errors like this:

Sep  4 09:54:36 mint kernel: ata2.00: cmd 61/80:f0:cd:7d:66/00:00:00:00:00/40 tag 30 ncq 65536 out
Sep  4 09:54:36 mint kernel:          res 41/04:00:cc:75:66/04:00:00:00:00/40 Emask 0x1 (device error)
Sep  4 09:54:36 mint kernel: ata2.00: status: { DRDY ERR }
Sep  4 09:54:36 mint kernel: ata2.00: error: { ABRT }
Sep  4 09:54:36 mint kernel: ata2: soft resetting link
Sep  4 09:54:36 mint kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  4 09:54:36 mint kernel: ata2.00: configured for UDMA/133
Sep  4 09:54:36 mint kernel: ata2: EH complete
Sep  4 09:54:36 mint kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Sep  4 09:54:36 mint kernel: sdb: Write Protect is off
Sep  4 09:54:36 mint kernel: SCSI device sdb: drive cache: write through
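
To see which sectors the failing writes in the "cmd" lines above were aimed at, the
taskfile can be decoded by hand. The following is only a rough sketch, assuming the
usual libata field order (command/feature:nsect:lbal:lbam:lbah/ then the five
high-order bytes in the same order/device) and assuming an NCQ command such as 0x61,
where the sector count is carried in the feature bytes and the tag in nsect. The
function name decode_ncq_cmd is made up for illustration.

```shell
#!/bin/sh
# Decode the taskfile from a libata "cmd" line, e.g.
#   61/10:00:f5:5e:03/00:00:00:00:00/40
# Assumed field order (not taken from the post itself):
#   cmd/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/dev
# The body runs in a subshell so the IFS and "set" changes do not leak out.
decode_ncq_cmd() (
    IFS='/:'
    set -f                      # no pathname expansion while splitting
    set -- $1                   # split the taskfile into 12 hex fields
    if [ "$#" -ne 12 ]; then
        echo "expected 12 fields, got $#" >&2
        exit 1
    fi
    # 48-bit LBA: high-order bytes first, then lbah, lbam, lbal
    lba=$(( 0x${11}${10}${9}${6}${5}${4} ))
    # NCQ only: sector count = hob_feat:feat, tag = nsect >> 3
    sectors=$(( 0x${7}${2} ))
    tag=$(( 0x${3} >> 3 ))
    echo "lba=$lba sectors=$sectors bytes=$(( sectors * 512 )) tag=$tag"
)
```

For the first timed-out command, decode_ncq_cmd '61/10:00:f5:5e:03/00:00:00:00:00/40'
prints "lba=220917 sectors=16 bytes=8192 tag=0", which agrees with the "tag 0 ncq 8192 out"
that the kernel printed on the same line.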


-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
Past chairman of UKUUG: http://www.ukuug.org/
#include <std_disclaimer.h>
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



