[Gllug] Software Raid 5 MD0 just stopped working
Ken Smith
kens at kensnet.org
Sun Apr 22 15:50:00 UTC 2012
I'm helping a friend with an old FC6 system I set up for him ages ago.
It has a Logical Volume made from MD0 and MD1, which in turn are two
three-disk RAID 5 sets.
One day MD0 decided not to play any more. When I looked at the system,
MD0 was no longer mentioned in /proc/mdstat, and the VG was showing that
it was made of an unknown device and MD1.
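For anyone wanting to automate that check, here is a minimal sketch of spotting a vanished array from /proc/mdstat-style text. It assumes the expected arrays are called md0 and md1; the sample text and its member device names are made up for illustration and are not taken from this machine.

```python
import re

def listed_md_devices(mdstat_text):
    """Return the set of md device names that appear in /proc/mdstat-style text."""
    # Array lines start at column 0 and look like "md1 : active raid5 ...".
    return set(re.findall(r"^(md\d+)\s*:", mdstat_text, flags=re.MULTILINE))

def missing_arrays(mdstat_text, expected=("md0", "md1")):
    """Return the expected arrays that are not listed at all."""
    present = listed_md_devices(mdstat_text)
    return [name for name in expected if name not in present]

# Hypothetical sample: md1 is present, md0 has disappeared.
SAMPLE = """Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdf1[2] sde1[1] sdd1[0]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
"""

print(missing_arrays(SAMPLE))
```

On a live system the text would come from reading /proc/mdstat instead of the sample string.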
I reassembled MD0 and the RAID appeared to be happy. It re-established
the uuid of MD0 and the LV was found again, but the ext3 filesystem on
the LV was a shambles. It's all backed up, so it can all be put back.
The machine runs smartctl -a on its disks daily and I have the records
of that going back for over a year. MD0 is made of two Western Digital
500GB drives and a Seagate 500GB. All the smartctl data looks fine,
except that the Seagate is showing:-
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always   -           70478277
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           126904381
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always   -           34946
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           26
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always   -           0
190 Temperature_Celsius     0x0022   069   060   045    Old_age   Always   -           554958879
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always   -           31 (Lifetime Min/Max 0/13)
195 Hardware_ECC_Recovered  0x001a   060   057   000    Old_age   Always   -           167912025
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
The Raw Read Error Rate drew my attention, but a year ago it showed:-
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always   -           70478277
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           22
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always   -           112009888
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always   -           26049
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           22
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always   -           0
190 Temperature_Celsius     0x0022   067   060   045    Old_age   Always   -           588513313
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always   -           33 (Lifetime Min/Max 0/14)
195 Hardware_ECC_Recovered  0x001a   060   057   000    Old_age   Always   -           18314776
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
Pretty similar.
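Rather than comparing the two dumps by eye, they could be diffed mechanically. Below is a minimal sketch, assuming each daily record is saved as plain text in the attribute-table layout shown above. The function names are hypothetical, and for RAW_VALUE only the leading integer is compared, so the "(Lifetime Min/Max ...)" suffix is ignored.

```python
import re

# One attribute row of the smartctl -a table: ID, name, flag, VALUE, WORST,
# THRESH, type, updated, when_failed, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_smart(text):
    """Map attribute ID to its name, normalised value, worst, threshold and raw value."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # the header line and any other text are skipped
            ident, name, value, worst, thresh, raw = m.groups()
            attrs[int(ident)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs

def changed_attributes(old_text, new_text):
    """List (id, name, old VALUE, new VALUE, old raw, new raw) for rows that differ."""
    old, new = parse_smart(old_text), parse_smart(new_text)
    changes = []
    for ident in sorted(set(old) & set(new)):
        o, n = old[ident], new[ident]
        if (o["value"], o["raw"]) != (n["value"], n["raw"]):
            changes.append(
                (ident, n["name"], o["value"], n["value"], o["raw"], n["raw"])
            )
    return changes
```

Feeding it the year-old record as old_text and the current one as new_text would list only the attributes whose normalised or raw value moved, which makes slow drifts easier to spot across a year of daily logs.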
I'm trying to fathom why MD0 just packed up and went home. Nothing in the
log files gives a clue.
Any ideas/suggestions?
Thanks
Ken