sata NCQ errors (Re: [Nottingham] Missing CF drive with libata...)
Martin
martin at ml1.co.uk
Sat May 24 23:21:05 BST 2008
For anyone similarly afflicted:
It all started with a curiously disappearing CF drive...
Using Mandriva (kernel 2.6.24) on a Gigabyte GA-MA790FX-DS5, with a SanDisk
Extreme III CompactFlash card plugged into a CF-to-IDE adapter on the
IDE Ch0 Master, the libata/ata_generic kernel modules appeared to drag
everything down to a maximum transfer rate of only a few MB/s, including
the SATA HDDs!
Removing the CF and disabling the IDE channel in the BIOS restored the
full disk I/O rate of tens of MB/s for the HDDs.
And then the next strangeness: Native Command Queuing (NCQ) errors on the SATA HDDs:
egrep '(kernel: ata)|(kernel: sd)' /var/log/messages
kernel: ata7.00: exception Emask 0x10 SAct 0xf SErr 0x580100 action 0x2
kernel: ata7.00: irq_stat 0x08000000
kernel: ata7: SError: { UnrecovData 10B8B Dispar Handshk }
[...]
kernel: ata7: hard resetting link
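To see how often each port is hitting these, here is a rough sketch that tallies the libata "exception" lines per ataN.MM port. It assumes syslog lines shaped like the excerpt above, and count_ata_exceptions is just a made-up name for illustration:

```shell
#!/bin/sh
# Sketch: count libata "exception" lines per port (ataN.MM) in a syslog file.
# Assumes lines shaped like:
#   kernel: ata7.00: exception Emask 0x10 SAct 0xf SErr 0x580100 action 0x2
count_ata_exceptions() {
    # Keep only the port name from matching lines, then count per port.
    sed -n -E 's/.*kernel: (ata[0-9]+\.[0-9]+): exception .*/\1/p' "$1" |
        sort | uniq -c |
        awk '{ print $2, $1 }'   # print as "port count"
}
```

For example, count_ata_exceptions /var/log/messages prints one "port count" pair per line.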
Those errors are for this drive:
kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata7.00: ATA-7: Hitachi HUA721010KLA330, GKAOA70M, max UDMA/133
kernel: ata7.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata7.00: configured for UDMA/133
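To tie an ataN.MM port from the error lines back to a physical drive, here is a small sketch that pulls the model string out of the boot-time identification line. The helper name is hypothetical, and it assumes the "ataN.MM: ATA-n: MODEL, FIRMWARE, ..." shape shown above:

```shell
#!/bin/sh
# Sketch: print the drive model logged for an ATA port.
# Usage: ata_port_model LOGFILE PORT   (PORT such as ata7.00)
# Assumes identification lines like:
#   kernel: ata7.00: ATA-7: Hitachi HUA721010KLA330, GKAOA70M, max UDMA/133
ata_port_model() {
    # -F: match the port literally (the dot is not a regex wildcard here).
    # tail: if the log spans several boots, use the most recent entry.
    grep -F "kernel: $2: ATA-" "$1" | tail -n 1 |
        sed -E 's/.*: ATA-[0-9]+: ([^,]*),.*/\1/'
}
```

On this machine, ata_port_model /var/log/messages ata7.00 should print the Hitachi model.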
I also have this drive in there:
kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: ata8.00: ATA-7: WDC WD1500ADFD-00NLR5, 21.07QR5, max UDMA/133
kernel: ata8.00: 293046768 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata8.00: configured for UDMA/133
which hasn't given those errors so far, but others have blacklisted it:
http://bugs.centos.org/view.php?id=2609&nbn=1
for the NLR4 version of the drive. Is that /really/ fixed in the NLR5?
This page condemns NCQ on all Western Digital HDDs:
http://www.transcoding.org/cgi-bin/wiki?action=browse&id=Western_Digital_NCQ&revision=26
Hitachi also gets a few hits on a web search for NCQ problems...
The NCQ appears to fail under heavy loading, which is exactly when you
want it to be working!
OK, so to effectively disable NCQ, you can use:
echo 1 >/sys/block/sdX/device/queue_depth
where sdX is the offending device.
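Here is the same thing wrapped up as a helper that reads the value back afterwards. The function name is hypothetical. SYSFS_ROOT is only there so it can be tried on a fake directory tree, and against the real /sys it needs root:

```shell
#!/bin/sh
# Sketch: set the SCSI queue depth for one libata disk and report the result.
# A depth of 1 effectively disables NCQ for that disk.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
set_queue_depth() {
    dev=$1      # e.g. sdb (no /dev/ prefix)
    depth=$2    # e.g. 1
    f="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
    if [ ! -w "$f" ]; then
        echo "set_queue_depth: cannot write $f" >&2
        return 1
    fi
    old=$(cat "$f")
    echo "$depth" > "$f"
    # Read back rather than trust the write: the driver may clamp the value.
    echo "$dev: queue_depth $old -> $(cat "$f")"
}
```

For example, set_queue_depth sdb 1. Writing a larger value back (up to the depth the drive reports, 31 here) should turn NCQ on again.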
Is there a better way of setting this at boot than the hack of
appending that line to "/etc/rc.d/rc.local" (or whichever other
startup script)?
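One snag with a fixed sdX in rc.local is that the letters can move around between boots (adding or removing a drive is enough). Below is a sketch for a startup script that picks the drives by model instead. NCQ_OFF_MODELS and its default "HUA72101" fragment are assumptions for this machine. The sysfs model string for ATA disks is usually truncated to 16 characters, so check cat /sys/block/sdX/device/model before choosing the fragment:

```shell
#!/bin/sh
# Sketch for a boot script such as /etc/rc.d/rc.local (needs root on a real
# system): disable NCQ by drive model rather than by sdX name.
# NCQ_OFF_MODELS: hypothetical space-separated list of model fragments.
# SYSFS_ROOT: hypothetical override for testing; defaults to /sys.
NCQ_OFF_MODELS=${NCQ_OFF_MODELS:-"HUA72101"}

disable_ncq_by_model() {
    for dir in "${SYSFS_ROOT:-/sys}"/block/sd*/device; do
        # Skips unmatched globs and devices without these attributes.
        [ -r "$dir/model" ] && [ -w "$dir/queue_depth" ] || continue
        model=$(cat "$dir/model")
        for want in $NCQ_OFF_MODELS; do
            case "$model" in
                *"$want"*)
                    echo 1 > "$dir/queue_depth"
                    echo "NCQ off: $(basename "$(dirname "$dir")") ($model)"
                    ;;
            esac
        done
    done
}
```

A boot script would then just call disable_ncq_by_model after defining it.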
See also:
http://linux-ata.org/faq.html
The fun all started with:
Martin wrote:
> OK, a curious one...
>
> Loading up Mandriva 2.6.24.4 onto a new machine and a SanDisk
> CompactFlash plugged into ide0 (master) (via an ide -> CF adapter) is
> not discovered.
>
> The CF is seen and identified by the BIOS and I've updated to the latest
> BIOS for the motherboard (SB600 chip for the single ide channel) just in
> case...
>
> I'm also using LVM on top of software RAID.
>
> The only significant visible difference is in the selection of
> modules. Also, on Mandriva the SB600_PATA module logs that it runs out
> of table space...
>
> Help?
>
>
>
> The dirty details:
>
> Mandriva is using (lsmod selected results):
>
> fuse 51504 1
> loop 20100 0
> usb_storage 108100 0
> sg 39064 0
> ide_disk 18816 0
> atiixp 8848 0 [permanent]
> jmicron 6912 0 [permanent]
> ide_core 123928 3 ide_disk,atiixp,jmicron
> shpchp 36764 0
> pci_hotplug 32816 1 shpchp
> ata_piix 22788 0
> ahci 31620 8
> libata 155696 2 ata_piix,ahci
> sd_mod 31872 12
> scsi_mod 157880 5 sbp2,usb_storage,sg,libata,sd_mod
> raid456 129568 0
> async_xor 8448 1 raid456
> async_memcpy 6912 1 raid456
> async_tx 11252 3 raid456,async_xor,async_memcpy
> xor 9744 2 raid456,async_xor
> raid1 25728 2
> reiserfs 246536 4
> uhci_hcd 29088 0
> ohci_hcd 27268 0
> ehci_hcd 40076 0
> usbcore 145712 5 usb_storage,uhci_hcd,ohci_hcd,ehci_hcd
>
>
> For comparison, an Ubuntu LiveDVD works fine and shows and accesses the
> SanDisk no problem:
>
> Ubuntu 2.6.22-14
>
> pci_hotplug 32704 1 shpchp
> squashfs 48132 1
> ide_disk 18560 0
> loop 19076 2
> unionfs 77096 1
> nls_cp437 6784 1
> isofs 36412 1
> sd_mod 30336 4
> sg 36764 0
> sr_mod 17828 1
> cdrom 37536 1 sr_mod
> ata_generic 8452 0
> atiixp 7056 0 [permanent]
> ide_core 116804 2 ide_disk,atiixp
> r8169 32260 0
> pata_jmicron 7552 0
> ahci 23300 3
> ehci_hcd 36492 0
> libata 125168 3 ata_generic,pata_jmicron,ahci
> scsi_mod 147084 4 sd_mod,sg,sr_mod,libata
> ohci_hcd 22916 0
> usbcore 138632 3 ehci_hcd,ohci_hcd
> fuse 47124 1
>
>
> Further info:
>
> On Mandriva there is no mention of the SanDisk anywhere in the logs. The
> only complaint is:
>
> Jan 1 03:45:51 Server01a kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> Jan 1 03:45:51 Server01a kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> Jan 1 03:45:51 Server01a kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> Jan 1 03:45:51 Server01a kernel: JMB: IDE controller (0x197b:0x2363 rev 0x02) at PCI slot 0000:05:00.1
> Jan 1 03:45:51 Server01a kernel: PCI: Enabling device 0000:05:00.1 (0000 -> 0001)
> Jan 1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 18 (level, low) -> IRQ 18
> Jan 1 03:45:51 Server01a kernel: JMB: 100% native mode on irq 18
> Jan 1 03:45:51 Server01a kernel: ide0: BM-DMA at 0xbb00-0xbb07, BIOS settings: hda:pio, hdb:pio
> Jan 1 03:45:51 Server01a kernel: ide1: BM-DMA at 0xbb08-0xbb0f, BIOS settings: hdc:pio, hdd:pio
> Jan 1 03:45:51 Server01a kernel: JMB: IDE controller (0x197b:0x2363 rev 0x02) at PCI slot 0000:06:00.1
> Jan 1 03:45:51 Server01a kernel: PCI: Enabling device 0000:06:00.1 (0000 -> 0001)
> Jan 1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:06:00.1[B] -> GSI 19 (level, low) -> IRQ 19
> Jan 1 03:45:51 Server01a kernel: JMB: 100% native mode on irq 19
> Jan 1 03:45:51 Server01a kernel: ide2: BM-DMA at 0xab00-0xab07, BIOS settings: hde:pio, hdf:pio
> Jan 1 03:45:51 Server01a kernel: ide3: BM-DMA at 0xab08-0xab0f, BIOS settings: hdg:DMA, hdh:DMA
> Jan 1 03:45:51 Server01a kernel: SB600_PATA: IDE controller (0x1002:0x438c rev 0x00) at PCI slot 0000:00:14.1
> Jan 1 03:45:51 Server01a kernel: ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16
> Jan 1 03:45:51 Server01a kernel: SB600_PATA: not 100% native mode: will probe irqs later
> Jan 1 03:45:51 Server01a kernel: SB600_PATA: too many IDE interfaces, no room in table
>
> Don't believe the date!
>
>
> Ubuntu shows:
>
> Apr 30 15:17:40 ubuntu kernel: [ 10.704000] SB600_PATA: IDE controller at PCI slot 0000:00:14.1
> Apr 30 15:17:40 ubuntu kernel: [ 10.704000] ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16
> Apr 30 15:17:40 ubuntu kernel: [ 10.704000] SB600_PATA: chipset revision 0
> Apr 30 15:17:40 ubuntu kernel: [ 10.704000] SB600_PATA: not 100%% native mode: will probe irqs later
> Apr 30 15:17:40 ubuntu kernel: [ 10.704000] ide0: BM-DMA at 0xf900-0xf907, BIOS settings: hda:DMA, hdb:pio
> [...]
> Apr 30 15:17:40 ubuntu kernel: [ 10.992000] hda: SanDisk SDCFX3-16384, CFA DISK drive
> Apr 30 15:17:40 ubuntu kernel: [ 11.396000] Registering unionfs 1.4
> Apr 30 15:17:40 ubuntu kernel: [ 11.396000] unionfs: debugging is not enabled
> Apr 30 15:17:40 ubuntu kernel: [ 11.400000] loop: module loaded
> Apr 30 15:17:40 ubuntu kernel: [ 11.664000] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Apr 30 15:17:40 ubuntu kernel: [ 11.668000] hda: max request size: 128KiB
> Apr 30 15:17:40 ubuntu kernel: [ 11.668000] hda: 32014080 sectors (16391 MB) w/1KiB Cache, CHS=31760/16/63, DMA
> Apr 30 15:17:40 ubuntu kernel: [ 11.668000] hda: hda1
>
> Which is exactly right.
>
>
> So how to kick the Mandriva SB600_PATA into seeing only the one ide channel?
>
> Or, what needs kicking (or modprobe-ing) to bring the CF to life on
> Mandriva? Is this a 2.6.24 kernel problem?
>
>
> Help!
> Martin
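Spotting the module differences between the two lsmod listings above by eye is tedious. Here is a sketch that does it with comm(1), assuming each listing has been saved to a file as plain lsmod output (diff_modules is a made-up name). On the listings above it would flag, among others, ata_generic and pata_jmicron as loaded only under Ubuntu:

```shell
#!/bin/sh
# Sketch: compare two saved `lsmod` outputs and list modules loaded in only one.
# Assumes plain lsmod output; the "Module  Size  Used by" header is skipped.
module_names() {
    # First column only, sorted in the C locale so comm(1) sees a consistent order.
    awk '$1 != "Module" && NF > 0 { print $1 }' "$1" | LC_ALL=C sort -u
}

diff_modules() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    module_names "$1" > "$a"
    module_names "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/only in first:  /'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/only in second: /'
    rm -f "$a" "$b"
}
```

For example, run lsmod > mandriva.txt on one system and lsmod > ubuntu.txt on the other, then diff_modules mandriva.txt ubuntu.txt.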
Hope that might be of help to others similarly bemused!
Fixing the CF visibility can wait for another evening...
Good luck,
Martin
--
----------------
Martin Lomas
martin at ml1.co.uk
----------------
More information about the Nottingham mailing list