Wait for EN transition on disable

Authored by imp on Dec 6 2017, 1:37 AM.



When we're disabling the nvme device, we need to wait first for the EN bit to go from 1 to 0, and then we need to wait for the RDY bit to go to 0. We have a timeout of 2500ms.

Before, we were just waiting a little bit and assuming life is
good. However, we waited only 5ms. Linux, for quirked devices, waits
2.3s. We shouldn't need to wait so long, since we can expect cards to
follow the standard here (Linux doesn't check, but likely should).

This results in not being able to recover from wedged cards on a busy
system on at least one card in Linux's quirk list.

Finally, fail the reset if the disable fails. This will lead to a failed device, which is what we want.

Sponsored by: Netflix

Test Plan

Note: We go off the rails when we fail, especially with the nda client, but that's for another time to fix.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
imp created this revision.Dec 6 2017, 1:37 AM
imp edited the summary of this revision. (Show Details)Dec 6 2017, 1:40 AM
imp edited the test plan for this revision. (Show Details)
imp added reviewers: jimharris, scottl.
imp updated this revision to Diff 36279.Dec 6 2017, 1:41 AM
imp edited the test plan for this revision. (Show Details)

Actually fail the reset.

imp updated this revision to Diff 36280.Dec 6 2017, 1:43 AM

Doh, must commit before arc diff --update it seems.

jimharris accepted this revision.Dec 7 2017, 5:22 PM
This revision is now accepted and ready to land.Dec 7 2017, 5:22 PM
chuck added a subscriber: chuck.Dec 7 2017, 7:05 PM
chuck added inline comments.
250 ↗(On Diff #36280)

I suspect what the code was trying to do here was follow the CC.EN guidance in the spec (section 3.1.5 of the 1.3 spec) that transitioning CC.EN from 0 to 1 when CSTS.RDY is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has undefined results".

If that is the case, there should be code similar to:

if (csts.bits.rdy == desired) {
        "%s: cannot transition in undefined state "
        "(cc.en -> %u when csts.rdy = %u)\n",
        __func__, desired, csts.bits.rdy);
    return (ENXIO);

in nvme_ctrlr_disable() and nvme_ctrlr_enable(), but not here.

1134 ↗(On Diff #36280)

Vendors specify the maximum time for CSTS.RDY and CSTS.SHST transitions in CAP.TO, and by in large, test that their products meet this value. While the other changes seem good, this change forces the driver to wait more than the vendor specifies and longer than the spec mandates.

The original code here is what I would do and have done.

imp added inline comments.Dec 7 2017, 7:31 PM
250 ↗(On Diff #36280)

The old code didn't do that at all... But this is a good suggestion.

1134 ↗(On Diff #36280)

I'm not sure I get this comment.

We still use ready_timeout_in_ms which is derived from CAP.TO. What enable_timeout does is for the CC.EN readback after we've set it to make sure that it transitions before we start the timer on the RDY transition (which is still bounded by ready_timeout_in_ms). I couldn't find a spec for how long it will take for EN to transition.

imp updated this revision to Diff 36358.Dec 7 2017, 8:22 PM

Update, per chuck's feedback.

This revision now requires review to proceed.Dec 7 2017, 8:22 PM
chuck added inline comments.Dec 7 2017, 8:31 PM
1134 ↗(On Diff #36280)

Bah, I misunderstood what the code was trying to do.

Best I can tell, the Linux code doesn't check that the register write to CC completed for either the enable or disable case. My interpretation of the Linux code is that it allows extra time for the quirked device to set CSTS.RDY == 0.

Since NVMe doesn't specify when implementation must update the EN field, there's an argument that says the code which waits for this field to be set is not guaranteed to be correct for compliant implementations. This would argue for removing the check of cc.bits.en altogether. Also, wouldn't PCIe transaction ordering rules prevent the read of CSTS (non-posted) from passing the write to CC (posted)?

If the troublesome device needs additional time to disable, wouldn't it be better to add a quirk for the device and then add an additional DELAY() in nvme_ctrlr_disable() for those devices?

imp updated this revision to Diff 36359.Dec 7 2017, 8:34 PM

Remove unused read of cc register.

imp added inline comments.Dec 7 2017, 8:44 PM
250 ↗(On Diff #36280)

Ah, I see what you are saying... The code was trying to enforce that as a precondition.... However, since we just changed things, there's a race here, I think, so I'll leave this check out.

1134 ↗(On Diff #36280)

I hate quirks. They are super ugly. Linux always sleeps 2.3s on disable for these devices.

Also, it's hard to know that you need to wait until you've had one FAIL, notice that the failure was because the timeout was insufficient, and add a quirk. And if your OPS group is lame, it may mean you have to wait for several failures to happen before it makes it back to you for FA.

247 ↗(On Diff #36358)

This is now unused, remove it.

imp updated this revision to Diff 36369.Dec 7 2017, 11:59 PM

After talking with an NVME expert, go with the delay + quirk instead
since this is a very odd thing, and doing the EN read dance (a) would
fail on these cards and (b) isn't a standard conforming thing (as
already pointed out in the review).

chuck added inline comments.Dec 8 2017, 1:55 AM
107 ↗(On Diff #36369)

Should the device ID be 0xa821144d?

108 ↗(On Diff #36369)

Should the device ID be 0xa822144d?

imp added inline comments.Dec 8 2017, 4:25 AM
107 ↗(On Diff #36369)


108 ↗(On Diff #36369)


imp updated this revision to Diff 36374.Dec 8 2017, 4:26 AM

PCI IDs. For the Win.

chuck accepted this revision.Dec 11 2017, 5:17 AM

Other than the noted if check, this looks good to me.

285 ↗(On Diff #36374)

If EN and RDY are out of sync, the user probably needs to issue a reset to recover the card. Would it be useful to log this?

327 ↗(On Diff #36374)

Should this check CSTS.RDY? Something like

if (csts.bit.rdy == 1) {
        /* EN == 0 already wait for RDY == 0 or fail */
        err = nvme_ctrlr_wait_for_ready(ctrlr, 0);
        if (err != 0)
                return (err);
This revision is now accepted and ready to land.Dec 11 2017, 5:17 AM
dch added a subscriber: dch.Dec 18 2017, 1:33 PM
This revision was automatically updated to reflect the committed changes.