
Add quirk to disable NCQ TRIM for Samsung 860 SSDs
Abandoned, Public

Authored by ryan_freqlabs.com on Sep 14 2018, 9:04 AM.

Details

Reviewers
mav
Summary

Like the previous generations, Samsung 860 EVO SSDs have a broken NCQ TRIM command and report 512-byte sectors while being optimized for 4K.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

imp added a comment. Sep 14 2018, 9:13 PM

How were you able to determine this?
The latest version of Linux doesn't have this quirk.

ryan_freqlabs.com added a comment. Edited Sep 15 2018, 4:26 AM
In D17169#366159, @imp wrote:

How were you able to determine this?
The latest version of Linux doesn't have this quirk.

It appears I was mistaken, and I'm inclined to suspect the AMD SB950 AHCI SATA controller is the problem. In that case, I can remove the NCQ_TRIM_BROKEN quirk from this diff, but it would still be nice to have the 4K quirk entry. Should I create a new diff for that or is it OK to simply update this one?


Explanation

While replacing the Samsung 850 EVO SSDs in my system with Samsung 860 EVO SSDs, I started to notice error messages from the kernel about the new drives.
"Uncorrectable parity/CRC error" (among other errors, screenshot attached) appeared during ZFS resilvering.
Only the new drives were causing errors. I tested with different cables, swapped ports, etc.

The system has two AHCI SATA controllers:

  • AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller (specifically, SB950)
  • ASMedia ASM1062 AHCI SATA controller

I have errors on the 860 drives (but not the 850 drives) when they are attached to the AMD SB950 controller. When I attach the same drives to the other controller and run a scrub, there are no errors. The ASMedia controller has its own quirk to disable NCQ, and I assumed that was masking the same broken NCQ TRIM command that existed for the 830, 840, 845, and 850.

I added loader tunables to test the quirks added by this diff:

kern.cam.ada.0.quirks=0x3
kern.cam.ada.1.quirks=0x3
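
The value 0x3 is a bitmask of ada(4) quirk flags. As a minimal sketch, the snippet below decodes it, assuming the bit assignments ADA_Q_4K = 0x01 and ADA_Q_NCQ_TRIM_BROKEN = 0x02 as recalled from sys/cam/ata/ata_da.c. Those values are an assumption here and should be checked against the source tree in use; `decode_quirks` is a hypothetical helper name.

```python
# Quirk bits for the ada(4) driver. ASSUMPTION: these values are recalled from
# sys/cam/ata/ata_da.c and should be verified against your own source tree.
ADA_QUIRKS = {
    0x01: "4K",
    0x02: "NCQ_TRIM_BROKEN",
}


def decode_quirks(mask):
    """Return names of the known quirk bits set in mask, then any unknown bits as hex."""
    names = [name for bit, name in sorted(ADA_QUIRKS.items()) if mask & bit]
    known = 0
    for bit in ADA_QUIRKS:
        known |= bit
    unknown = mask & ~known
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_quirks(0x3))  # ['4K', 'NCQ_TRIM_BROKEN']
```

Under that assumption, 0x3 enables both quirks this diff proposed for the 860 EVO.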

I confirmed the tunables took effect by observing that sysctl kern.cam.ada.*.delete_method changed from NCQ_DSM_TRIM to DSM_TRIM for the new drives after a reboot.
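
A check like this can be scripted. The sketch below parses the text printed by `sysctl kern.cam.ada` and collects the delete method per unit. The sample output is hypothetical (it was not captured from the system described here), and `delete_methods` is an illustrative helper, not part of any FreeBSD tool.

```python
def delete_methods(sysctl_output):
    """Map ada unit number -> delete_method, given `sysctl kern.cam.ada` output text."""
    methods = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        parts = name.strip().split(".")
        # Only accept names of the form kern.cam.ada.<unit>.delete_method
        if (
            sep
            and len(parts) == 5
            and parts[:3] == ["kern", "cam", "ada"]
            and parts[3].isdigit()
            and parts[4] == "delete_method"
        ):
            methods[int(parts[3])] = value.strip()
    return methods


# Hypothetical sample output for illustration only.
sample = """\
kern.cam.ada.0.delete_method: DSM_TRIM
kern.cam.ada.0.write_cache: -1
kern.cam.ada.1.delete_method: DSM_TRIM
kern.cam.ada.2.delete_method: NCQ_DSM_TRIM
"""

print(delete_methods(sample))
```

In this made-up sample, units 0 and 1 stand in for drives with the quirk applied and unit 2 for a drive still using NCQ TRIM.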

Since then, I have not seen another "Uncorrectable parity/CRC error" message. The "Command timeout" and "ATA Status Error" messages still appear during scrub, but I have found reports of these same errors with the same pairing of 860 EVO drives and the AMD SB950 controller on Linux and on Windows. In those reports, the same controller worked with the older SSDs, and the same SSDs worked with different controllers (or perhaps different drivers).

At this point I decided to add the quirk and submit this diff.

However, since you pointed out that Linux doesn't have this quirk, I have recognized the assumption I made and am now performing more thorough testing.

After removing the quirks (reverting to a stock configuration), I have been able to reproduce the "Uncorrectable parity/CRC error" messages with an 860 and the SB950 by running benchmarks/fio using an example config with the following changes:

--- /usr/local/share/examples/fio/ssd-test.fio	2018-07-23 07:55:32.000000000 -0700
+++ ssd-test.fio	2018-09-14 12:41:22.060666000 -0700
@@ -12,12 +12,11 @@
 #
 [global]
 bs=4k
-ioengine=libaio
 iodepth=4
 size=10g
 direct=1
 runtime=60
-directory=/mount-point-of-ssd
+directory=/tmpzfs
 filename=ssd.test.file

 [seq-read]

/tmpzfs is the mountpoint of a pool I created on an 860.
I get the following errors during the course of the test:

Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 2e ef 7f 40 10 00 00 01 00 00
Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command
Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 2e f0 7f 40 10 00 00 01 00 00
Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 12:44:09 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command
[...snipped out more of the same for brevity...]
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 e9 d9 1e 40 11 00 00 01 00 00
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 e9 da 1e 40 11 00 00 01 00 00
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 12:44:28 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command
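
The ACB bytes in these messages can be decoded to see which writes are failing. The following is a rough sketch that assumes the 12 bytes are printed in the order command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp, and that for FPDMA QUEUED commands the transfer length is carried in the features fields. Both points are assumptions about the log format, and `decode_fpdma_acb` is a hypothetical helper.

```python
def decode_fpdma_acb(acb):
    """Decode command, 48-bit LBA and sector count from a 12-byte ACB hex string.

    ASSUMED byte order: command, features, lba_low, lba_mid, lba_high, device,
    lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count,
    sector_count_exp.
    """
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    # Low three LBA bytes come first; the high three follow the device byte.
    lba = b[2] | b[3] << 8 | b[4] << 16 | b[6] << 24 | b[7] << 32 | b[8] << 40
    # For FPDMA QUEUED, the sector count is split across features/features_exp.
    sectors = b[1] | b[9] << 8
    return {"command": b[0], "lba": lba, "sectors": sectors}


first = decode_fpdma_acb("61 00 2e ef 7f 40 10 00 00 01 00 00")
second = decode_fpdma_acb("61 00 2e f0 7f 40 10 00 00 01 00 00")
print(hex(first["lba"]), first["sectors"], second["lba"] - first["lba"])
```

Read this way, the first two failing commands in the log are 256-sector writes at adjacent offsets (the second starts exactly 256 sectors after the first), which is at least consistent with the assumed layout.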

I added the quirk to loader.conf, rebooted, confirmed the change took effect, and ran the same test with the same drive in the same port. I see the same errors:

[...]
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 fa c6 29 40 16 00 00 01 00 00
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 fa c7 29 40 16 00 00 01 00 00
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): CAM status: Uncorrectable parity/CRC error
Sep 14 13:10:30 fx-freebsd kernel: (ada1:ahcich5:0:0:0): Retrying command

This quirk doesn't appear to be necessary (or even to help) for the 860 EVO drives after all. In hindsight, it didn't make any sense to test TRIM by doing a scrub. Since this new test seems reliable, I ran it in several other configurations for good measure:

Drive   | Drive Quirks        | Platform | Controller                 | Controller Quirks | Errors
860 EVO | None                | Intel    | Intel Lynx Point AHCI SATA | None              | None
860 EVO | None                | Intel    | ASMedia ASM1062 AHCI SATA  | NOCCS, NOAUX      | None
860 EVO | None                | AMD      | ASMedia ASM1062 AHCI SATA  | NOCCS, NOAUX      | None
860 EVO | None                | AMD      | AMD SB950 AHCI SATA        | ATI_PMP_BUG, 1MSI | Uncorrectable parity/CRC error, ATA Status Error, Command timeout
860 EVO | 4K, NCQ_TRIM_BROKEN | AMD      | AMD SB950 AHCI SATA        | ATI_PMP_BUG, 1MSI | Uncorrectable parity/CRC error, ATA Status Error, Command timeout
850 EVO | 4K, NCQ_TRIM_BROKEN | AMD      | AMD SB950 AHCI SATA        | ATI_PMP_BUG, 1MSI | None
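
To make the pattern in these results explicit, the sketch below restates the six test runs as data and reports which drive/controller pairings produced errors. The tuple layout and the `failing_pairs` helper are illustrative only; the values are the ones listed above, with controller names shortened.

```python
# (drive, drive quirks, platform, controller, errors seen) for each run above.
RESULTS = [
    ("860 EVO", "None", "Intel", "Intel Lynx Point", False),
    ("860 EVO", "None", "Intel", "ASMedia ASM1062", False),
    ("860 EVO", "None", "AMD", "ASMedia ASM1062", False),
    ("860 EVO", "None", "AMD", "AMD SB950", True),
    ("860 EVO", "4K, NCQ_TRIM_BROKEN", "AMD", "AMD SB950", True),
    ("850 EVO", "4K, NCQ_TRIM_BROKEN", "AMD", "AMD SB950", False),
]


def failing_pairs(results):
    """Return the sorted (drive, controller) pairs for which any run showed errors."""
    outcome = {}
    for drive, _quirks, _platform, controller, failed in results:
        key = (drive, controller)
        outcome[key] = outcome.get(key, False) or failed
    return sorted(pair for pair, failed in outcome.items() if failed)


print(failing_pairs(RESULTS))  # [('860 EVO', 'AMD SB950')]
```

Only the 860 EVO on the SB950 fails, with or without the drive quirks, while the same drive on other controllers and the 850 EVO on the SB950 are clean.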

Removed the TRIM quirk. Added the 4K quirk for SCSI da in addition to ATA ada.

ryan_freqlabs.com abandoned this revision. Sep 19 2019, 2:44 AM