Page MenuHomeFreeBSD

Revamp nvme recovery

Authored by imp on Apr 3 2020, 11:31 PM.
Referenced Files
Unknown Object (File)
Thu, Jan 12, 2:02 PM
Unknown Object (File)
Jan 2 2023, 5:29 AM
Unknown Object (File)
Dec 12 2022, 12:27 PM
Unknown Object (File)
Nov 28 2022, 8:15 AM



Calling the completion routine from the timout was a good idea for the first
generation of missing interrupt problems that we had. However, since we've solve
the real root cause for them its need is much lessened. In addition, it's always
been racy, but as a 'last resort' it wasn't too bad.

However, under heavy load on some cloud platforms we see missing interrupts for
reasons unknown. In that environment, something the timeout was doing was
causing the interrupt to fire, causing a race between the two threads. To fix
this race would require an additional lock / unlock pair in the hot path.

To avoid that, we revamp how we do recovery. If the controller has failed, we
still do what we did before: we reset the controller. This also picks up the
hotplug case.

However, for the first timeout, we now send an innocuous command to the card (A
GET_FEATURES for a required feature). We start a new timeout on the original
request (but do nothing further). If that works, then we do nothing further as
the card is unwedge. If sending that command causes a timeout, we reset the
card. If the original command doesn't get a completion before its new timeout
exires, we either abort the command (if enable_abort = 1), or we reset the
card. If sending the abort times out, then we reset the card.

This should be more robust and free races. It remains to be seen if it actully

Diff Detail

Lint Passed
No Test Coverage
Build Status
Buildable 30270
Build 28047: arc lint + arc unit

Event Timeline

imp added inline comments.

I should likely make this a counter and publish it as a sysctl.


Maybe I need a comment here justifying reset on error. The GET FEATURES command is a can't fail sort of thing for a mandatory feature like arbitration.

cperciva added inline comments.

If nvme_qpair_complete_tracker lost a race, we could enter nvme_timeout with tr->req = NULL, resulting in a panic when we dereference req.

For that matter, with sufficient racing it's possible that tr->req is non-NULL and refers to a *completely different request* to the one which triggered the timeout.

cem added a subscriber: cem.

I'm sure we have some interested parties in this revision, but I'm not sure if they have Phab accounts. Added some possibilities, feel free to remove yourselves / forward.

imp added inline comments.

Wouldn't cancelling the timeout be sufficient, which is done as part of the tracking tear down during completion?