[Wylug-help] Disk health monitoring

Anne Wilson cannewilson at googlemail.com
Thu May 26 13:11:22 UTC 2011

On Thursday 26 May 2011 12:36:39 John Hodrien wrote:
> take (44 minutes in this case).
> Wait that amount of time, then do:
> smartctl -a /dev/sda
> This'll then return something like this (snipped):
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Interrupted (host reset)      80%     15763         -
> # 2  Short offline       Interrupted (host reset)      80%     15763         -
> # 3  Short offline       Completed without error       00%         0         -
> If these tests fail, buy a new disk.
It says the overall health check passed, but the error log shows 4 errors.
I could use some help in understanding what it is telling me, though.  Could
you look through the attached output and comment, please?  Thanks

-------------- next part --------------
# smartctl -a /dev/sda
smartctl 5.40 2010-10-16 r3189 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Model Family:     Western Digital Scorpio EIDE family
Device Model:     WDC WD600UE-22KVT0
Serial Number:    WD-WXE706485695
Firmware Version: 01.03K01
User Capacity:    60,011,642,880 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu May 26 14:46:50 2011 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (3180) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  44) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       4
  3 Spin_Up_Time            0x0003   160   157   021    Pre-fail  Always       -       1000
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1250
  5 Reallocated_Sector_Ct   0x0033   188   188   140    Pre-fail  Always       -       95
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7903
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1236
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       1230
193 Load_Cycle_Count        0x0032   067   067   000    Old_age   Always       -       400095
194 Temperature_Celsius     0x0022   096   091   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always       -       6
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
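
One way to read the attribute table above is to compare each normalised VALUE with its THRESH and to look at the raw counts of the sector-related attributes. The following is only a minimal sketch in Python, not part of smartctl: the helper names, the regular expression and the choice of attribute IDs to watch (5, 196, 197, 198) are my own assumptions, and the sample rows are copied from the table above. It assumes the usual column order of ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value.

```python
import re

# Sample rows copied from the smartctl output above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       4
  5 Reallocated_Sector_Ct   0x0033   188   188   140    Pre-fail  Always       -       95
196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always       -       6
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
"""

# Assumed column order: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: attribute IDs whose non-zero raw count deserves a look.
WATCH = {5, 196, 197, 198}


def parse_attributes(text):
    """Turn attribute rows into dictionaries; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "type": m.group(6),
                "raw": int(m.group(9)),
            })
    return rows


def review(rows):
    """Return (name, note) pairs for attributes that look worth checking."""
    notes = []
    for r in rows:
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            notes.append((r["name"], "at or below threshold"))
        elif r["id"] in WATCH and r["raw"] > 0:
            notes.append((r["name"], f"raw count {r['raw']}"))
    return notes


if __name__ == "__main__":
    for name, note in review(parse_attributes(SAMPLE)):
        print(name, "->", note)
```

On these rows nothing is at or below its threshold, which fits the PASSED result, but the sketch still lists Reallocated_Sector_Ct (raw 95) and Reallocated_Event_Count (raw 6), so the disk appears to have remapped some sectors already.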

SMART Error Log Version: 1
ATA Error Count: 4
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 occurred at disk power-on lifetime: 5383 hours (224 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 41 08 37 4c 01 e3  Error: ABRT 8 sectors at LBA = 0x03014c37 = 50416695

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 37 4c 01 03 08   3d+23:27:27.479  READ DMA
  27 00 00 00 00 00 00 08   3d+23:27:27.473  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 0a   3d+23:27:27.470  IDENTIFY DEVICE
  ef 03 45 00 00 00 00 0a   3d+23:27:27.470  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   3d+23:27:27.470  READ NATIVE MAX ADDRESS EXT

Error 3 occurred at disk power-on lifetime: 5383 hours (224 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 41 08 37 4c 01 e3  Error: ABRT 8 sectors at LBA = 0x03014c37 = 50416695

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 37 4c 01 03 08   3d+23:27:24.633  READ DMA
  27 00 00 00 00 00 00 08   3d+23:27:24.627  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 0a   3d+23:27:24.625  IDENTIFY DEVICE
  ef 03 45 00 00 00 00 0a   3d+23:27:24.624  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   3d+23:27:24.624  READ NATIVE MAX ADDRESS EXT

Error 2 occurred at disk power-on lifetime: 5383 hours (224 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 41 08 37 4c 01 e3  Error: ABRT 8 sectors at LBA = 0x03014c37 = 50416695

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 37 4c 01 03 08   3d+23:27:21.788  READ DMA
  27 00 00 00 00 00 00 08   3d+23:27:21.781  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 0a   3d+23:27:21.779  IDENTIFY DEVICE
  ef 03 45 00 00 00 00 0a   3d+23:27:21.779  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   3d+23:27:21.778  READ NATIVE MAX ADDRESS EXT

Error 1 occurred at disk power-on lifetime: 5383 hours (224 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 41 08 37 4c 01 e3  Error: ABRT 8 sectors at LBA = 0x03014c37 = 50416695

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 37 4c 01 03 08   3d+23:27:18.924  READ DMA
  c8 00 18 97 72 bc 03 08   3d+23:27:18.914  READ DMA
  c8 00 08 d7 72 bc 03 08   3d+23:27:18.914  READ DMA
  c8 00 08 f7 72 bc 03 08   3d+23:27:18.909  READ DMA
  c8 00 48 e7 73 bc 03 08   3d+23:27:18.905  READ DMA
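
The register dumps in the four error entries above can be decoded by hand. Below is a small Python sketch of that arithmetic, assuming the seven bytes are printed in the order ER ST SC SN CL CH DH (the order smartctl normally uses) and that the drive uses 28-bit LBA addressing, where the low four bits of DH hold the top of the address. The function name is made up for illustration.

```python
def decode_error_registers(hex_bytes):
    """Decode 'ER ST SC SN CL CH DH' (hex) from a smartctl error entry.

    Assumes 28-bit LBA: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27.
    """
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in hex_bytes.split())
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "aborted": bool(er & 0x04),     # ABRT bit in the error register
        "error_flag": bool(st & 0x01),  # ERR bit in the status register
        "sectors": sc,                  # sector count register
        "lba": lba,
    }


if __name__ == "__main__":
    info = decode_error_registers("04 41 08 37 4c 01 e3")
    print(info["sectors"], "sectors at LBA", hex(info["lba"]), "=", info["lba"])
```

For the bytes logged here this gives 8 sectors at LBA 0x3014c37 = 50416695, matching the "Error: ABRT 8 sectors at LBA" line. All four entries carry the same bytes, so they look like one READ DMA at the same address being retried about three seconds apart.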

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7903         -
# 2  Extended offline    Completed without error       00%      7901         -
# 3  Short offline       Completed without error       00%      7899         -
# 4  Short offline       Completed without error       00%      7899         -
# 5  Extended offline    Completed without error       00%      7887         -
# 6  Extended offline    Completed without error       00%      1839         -
# 7  Short offline       Completed without error       00%      1838         -
# 8  Short offline       Completed without error       00%      1838         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
