sata NCQ errors (Re: [Nottingham] Missing CF drive with libata...)

Martin martin at ml1.co.uk
Sat May 24 23:21:05 BST 2008


For anyone similarly afflicted:

It all started with a curiously disappearing CF drive...

Using Mandriva (kernel 2.6.24) on a Gigabyte GA-MA790FX-DS5 with a Sandisk
Extreme III CompactFlash plugged into a CF-ide adapter on the ide Ch0
Master, the libata/ata_generic kernel modules appeared to drag
everything down to a maximum transfer rate of just a few MB/s. Including
for the sata HDDs!

Removing the CF and disabling the ide channel in the BIOS restored the
full disk IO of tens of MB/s for the HDDs.


And then the next strangeness... Native Command Queuing on the sata HDDs:

egrep '(kernel: ata)|(kernel: sd)' /var/log/messages

kernel: ata7.00: exception Emask 0x10 SAct 0xf SErr 0x580100 action 0x2
kernel: ata7.00: irq_stat 0x08000000
kernel: ata7: SError: { UnrecovData 10B8B Dispar Handshk }
[...]
kernel: ata7: hard resetting link


Those errors are for:

kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata7.00: ATA-7: Hitachi HUA721010KLA330, GKAOA70M, max UDMA/133
kernel: ata7.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata7.00: configured for UDMA/133
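
To see which port is complaining and how often, the exception lines can be tallied per port. A rough sketch only: the function name is made up for illustration, and the default log path assumes syslog writes kernel messages to /var/log/messages as it does here.

```shell
#!/bin/sh
# Sketch: count libata "exception" lines per port/device in a syslog file.
# count_ata_exceptions is a hypothetical helper name; the default log path
# is an assumption and may differ on other distributions.
count_ata_exceptions() {
    log="${1:-/var/log/messages}"
    # Keep only the exception lines, reduce each to its ataN or ataN.MM
    # label, then count how many times each label occurs.
    grep -E 'kernel: ata[0-9]+(\.[0-9]+)?: exception' "$log" |
        sed -E 's/.*kernel: (ata[0-9]+(\.[0-9]+)?): exception.*/\1/' |
        sort | uniq -c
}
```

Calling count_ata_exceptions with no argument reads /var/log/messages; pass another file name to check a rotated log.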

I also have in there:

kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: ata8.00: ATA-7: WDC WD1500ADFD-00NLR5, 21.07QR5, max UDMA/133
kernel: ata8.00: 293046768 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata8.00: configured for UDMA/133

which hasn't so far given those errors, but others have blacklisted the
NLR4 version of that drive:

http://bugs.centos.org/view.php?id=2609&nbn=1

Is that /really/ fixed in the NLR5?...

This condemns NCQ on all Western Digital HDDs:

http://www.transcoding.org/cgi-bin/wiki?action=browse&id=Western_Digital_NCQ&revision=26

Hitachi gets a few hits on a web search for NCQ problems also...


The NCQ appears to fail under heavy loading, which is exactly when you
want it to be working!

OK, so to effectively disable NCQ, you can set the queue depth to 1:

echo 1 >/sys/block/sdX/device/queue_depth

where X is your offending device.
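
For checking the value before and after the change, the same write can be wrapped in a small function. This is a minimal sketch: set_queue_depth and the SYSFS_ROOT override are names invented here for illustration (the override only exists so the function can be tried against a scratch directory), and writing to the real /sys needs root.

```shell
#!/bin/sh
# Sketch: cap the queue depth of one disk, by default at 1, which
# effectively disables NCQ for that device.
# set_queue_depth and SYSFS_ROOT are hypothetical names; SYSFS_ROOT
# defaults to /sys when unset.
set_queue_depth() {
    dev="$1"            # device name without /dev/, e.g. sdb
    depth="${2:-1}"     # default depth of 1
    f="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
    if [ ! -w "$f" ]; then
        echo "cannot write $f (wrong device name, or not root?)" >&2
        return 1
    fi
    echo "before: $(cat "$f")"
    echo "$depth" > "$f"
    echo "after:  $(cat "$f")"
}
```

For example, "set_queue_depth sdb" sets a depth of 1, and "set_queue_depth sdb 31" asks for the depth of 31 that the kernel reported at boot.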

Is there a better way of setting this for boot rather than doing the
hack of adding that line onto the end of "/etc/rc.d/rc.local" (or
whatever other startup script)?
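
Whichever startup script ends up doing it, one wrinkle is that sdX letters can change between boots, so matching on the drive model may be safer than hard-coding a device name. A sketch under assumptions: disable_ncq_by_model and SYSFS_ROOT are hypothetical names, and the model string in sysfs may be shorter than the one in the kernel log and may carry trailing spaces, so a glob ending in * is the safest pattern.

```shell
#!/bin/sh
# Sketch: set queue_depth to 1 on every sd disk whose sysfs model string
# matches a shell glob, so the setting follows the drive rather than its
# sdX letter.
# disable_ncq_by_model and SYSFS_ROOT are hypothetical names; SYSFS_ROOT
# defaults to /sys when unset.
disable_ncq_by_model() {
    pattern="$1"
    for dir in "${SYSFS_ROOT:-/sys}"/block/sd*; do
        # Skip entries without a readable model file (including the
        # unexpanded glob when no sd devices exist).
        [ -r "$dir/device/model" ] || continue
        model=$(cat "$dir/device/model")
        case "$model" in
            $pattern)
                echo 1 > "$dir/device/queue_depth" &&
                    echo "${dir##*/}: queue_depth set to 1 ($model)"
                ;;
        esac
    done
}
```

Called from a startup script as, say, "disable_ncq_by_model 'Hitachi*'", this would leave the other drives at their default depth.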

See also:

http://linux-ata.org/faq.html



The fun all started with:

Martin wrote:
> OK, a curious one...
> 
> Loading up Mandriva 2.6.24.4 onto a new machine and a SanDisk
> CompactFlash plugged into ide0 (master) (via an ide -> CF adapter) is
> not discovered.
> 
> The CF is seen and identified by the BIOS and I've updated to the latest
> BIOS for the motherboard (SB600 chip for the single ide channel) just in
> case...
> 
> I'm also using LVM on top of software RAID.
> 
> The only significantly visible difference is for the selection of
> modules. Also on Mandriva, the SB600_PATA module logs that it runs out
> of table space...
> 
> Help?
> 
> 
> 
> The dirty details:
> 
> Mandriva is using (lsmod selected results):
> 
> fuse                   51504  1
> loop                   20100  0
> usb_storage           108100  0
> sg                     39064  0
> ide_disk               18816  0
> atiixp                  8848  0 [permanent]
> jmicron                 6912  0 [permanent]
> ide_core              123928  3 ide_disk,atiixp,jmicron
> shpchp                 36764  0
> pci_hotplug            32816  1 shpchp
> ata_piix               22788  0
> ahci                   31620  8
> libata                155696  2 ata_piix,ahci
> sd_mod                 31872  12
> scsi_mod              157880  5 sbp2,usb_storage,sg,libata,sd_mod
> raid456               129568  0
> async_xor               8448  1 raid456
> async_memcpy            6912  1 raid456
> async_tx               11252  3 raid456,async_xor,async_memcpy
> xor                     9744  2 raid456,async_xor
> raid1                  25728  2
> reiserfs              246536  4
> uhci_hcd               29088  0
> ohci_hcd               27268  0
> ehci_hcd               40076  0
> usbcore               145712  5 usb_storage,uhci_hcd,ohci_hcd,ehci_hcd
> 
> 
> For comparison, an Ubuntu LiveDVD works fine and shows and accesses the
> SanDisk no problem:
> 
> Ubuntu 2.6.22-14
> 
> pci_hotplug            32704  1 shpchp
> squashfs               48132  1
> ide_disk               18560  0
> loop                   19076  2
> unionfs                77096  1
> nls_cp437               6784  1
> isofs                  36412  1
> sd_mod                 30336  4
> sg                     36764  0
> sr_mod                 17828  1
> cdrom                  37536  1 sr_mod
> ata_generic             8452  0
> atiixp                  7056  0 [permanent]
> ide_core              116804  2 ide_disk,atiixp
> r8169                  32260  0
> pata_jmicron            7552  0
> ahci                   23300  3
> ehci_hcd               36492  0
> libata                125168  3 ata_generic,pata_jmicron,ahci
> scsi_mod              147084  4 sd_mod,sg,sr_mod,libata
> ohci_hcd               22916  0
> usbcore               138632  3 ehci_hcd,ohci_hcd
> fuse                   47124  1
> 
> 
> Further info:
> 
> On Mandriva there is no mention of the SanDisk anywhere in the logs. The
> only complaint is:
> 
> Jan  1 03:45:51 Server01a kernel: shpchp: Standard Hot Plug PCI
> Controller Driver version: 0.4
> Jan  1 03:45:51 Server01a kernel: Uniform Multi-Platform E-IDE driver
> Revision: 7.00alpha2
> Jan  1 03:45:51 Server01a kernel: ide: Assuming 33MHz system bus speed
> for PIO modes; override with idebus=xx
> Jan  1 03:45:51 Server01a kernel: JMB: IDE controller (0x197b:0x2363 rev
> 0x02) at  PCI slot 0000:05:00.1
> Jan  1 03:45:51 Server01a kernel: PCI: Enabling device 0000:05:00.1
> (0000 -> 0001)
> Jan  1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:05:00.1[B] ->
> GSI 18 (level, low) -> IRQ 18
> Jan  1 03:45:51 Server01a kernel: JMB: 100% native mode on irq 18
> Jan  1 03:45:51 Server01a kernel:     ide0: BM-DMA at 0xbb00-0xbb07,
> BIOS settings: hda:pio, hdb:pio
> Jan  1 03:45:51 Server01a kernel:     ide1: BM-DMA at 0xbb08-0xbb0f,
> BIOS settings: hdc:pio, hdd:pio
> Jan  1 03:45:51 Server01a kernel: JMB: IDE controller (0x197b:0x2363 rev
> 0x02) at  PCI slot 0000:06:00.1
> Jan  1 03:45:51 Server01a kernel: PCI: Enabling device 0000:06:00.1
> (0000 -> 0001)
> Jan  1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:06:00.1[B] ->
> GSI 19 (level, low) -> IRQ 19
> Jan  1 03:45:51 Server01a kernel: JMB: 100% native mode on irq 19
> Jan  1 03:45:51 Server01a kernel:     ide2: BM-DMA at 0xab00-0xab07,
> BIOS settings: hde:pio, hdf:pio
> Jan  1 03:45:51 Server01a kernel:     ide3: BM-DMA at 0xab08-0xab0f,
> BIOS settings: hdg:DMA, hdh:DMA
> Jan  1 03:45:51 Server01a kernel: SB600_PATA: IDE controller
> (0x1002:0x438c rev 0x00) at  PCI slot 0000:00:14.1
> Jan  1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:00:14.1[A] ->
> GSI 16 (level, low) -> IRQ 16
> Jan  1 03:45:51 Server01a kernel: SB600_PATA: not 100% native mode: will
> probe irqs later
> Jan  1 03:45:51 Server01a kernel: SB600_PATA: too many IDE interfaces,
> no room in table
> 
> Don't believe the date!
> 
> 
> Ubuntu shows:
> 
> Apr 30 15:17:40 ubuntu kernel: [   10.704000] SB600_PATA: IDE controller
> at PCI slot 0000:00:14.1
> Apr 30 15:17:40 ubuntu kernel: [   10.704000] ACPI: PCI Interrupt
> 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16
> Apr 30 15:17:40 ubuntu kernel: [   10.704000] SB600_PATA: chipset revision 0
> Apr 30 15:17:40 ubuntu kernel: [   10.704000] SB600_PATA: not 100%%
> native mode: will probe irqs later
> Apr 30 15:17:40 ubuntu kernel: [   10.704000]     ide0: BM-DMA at
> 0xf900-0xf907, BIOS settings: hda:DMA, hdb:pio
> [...]
> Apr 30 15:17:40 ubuntu kernel: [   10.992000] hda: SanDisk SDCFX3-16384,
> CFA DISK drive
> Apr 30 15:17:40 ubuntu kernel: [   11.396000] Registering unionfs 1.4
> Apr 30 15:17:40 ubuntu kernel: [   11.396000] unionfs: debugging is not
> enabled
> Apr 30 15:17:40 ubuntu kernel: [   11.400000] loop: module loaded
> Apr 30 15:17:40 ubuntu kernel: [   11.664000] ide0 at 0x1f0-0x1f7,0x3f6
> on irq 14
> Apr 30 15:17:40 ubuntu kernel: [   11.668000] hda: max request size: 128KiB
> Apr 30 15:17:40 ubuntu kernel: [   11.668000] hda: 32014080 sectors
> (16391 MB) w/1KiB Cache, CHS=31760/16/63, DMA
> Apr 30 15:17:40 ubuntu kernel: [   11.668000]  hda: hda1
> 
> Which is exactly right.
> 
> 
> So how to kick the Mandriva SB600_PATA into seeing only the one ide channel?
> 
> Or, what needs kicking (or modprobe-ing) to bring the CF to life on
> Mandriva? Is this a 2.6.24 kernel problem?
> 
> 
> Help!
> Martin

Hope that might be of help to others similarly bemused!

Fixing the CF visibility can wait for another evening...

Good luck,
Martin

-- 
----------------
Martin Lomas
martin at ml1.co.uk
----------------


