Revamp nvme recovery
AbandonedPublic
Actions

Authored by imp on Apr 3 2020, 11:31 PM.

Details

Reviewers

dab
daniel.william.ryan_gmail.com

Summary

Calling the completion routine from the timout was a good idea for the first
generation of missing interrupt problems that we had. However, since we've solve
the real root cause for them its need is much lessened. In addition, it's always
been racy, but as a 'last resort' it wasn't too bad.

However, under heavy load on some cloud platforms we see missing interrupts for
reasons unknown. In that environment, something the timeout was doing was
causing the interrupt to fire, causing a race between the two threads. To fix
this race would require an additional lock / unlock pair in the hot path.

To avoid that, we revamp how we do recovery. If the controller has failed, we
still do what we did before: we reset the controller. This also picks up the
hotplug case.

However, for the first timeout, we now send an innocuous command to the card (A
GET_FEATURES for a required feature). We start a new timeout on the original
request (but do nothing further). If that works, then we do nothing further as
the card is unwedge. If sending that command causes a timeout, we reset the
card. If the original command doesn't get a completion before its new timeout
exires, we either abort the command (if enable_abort = 1), or we reset the
card. If sending the abort times out, then we reset the card.

This should be more robust and free races. It remains to be seen if it actully
works.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 30270
Build 28047: arc lint + arc unit

Event Timeline

imp created this revision.Apr 3 2020, 11:31 PM

Herald added a subscriber: dab. · View Herald TranscriptApr 3 2020, 11:31 PM

Harbormaster completed remote builds in B30270: Diff 70172.Apr 3 2020, 11:31 PM

imp mentioned this in D24256: Avoid concurrent nvme_qpair_process_completions runs.Apr 3 2020, 11:33 PM

imp added inline comments.

sys/dev/nvme/nvme_private.h
340	I should likely make this a counter and publish it as a sysctl.
sys/dev/nvme/nvme_qpair.c
883	Maybe I need a comment here justifying reset on error. The GET FEATURES command is a can't fail sort of thing for a mandatory feature like arbitration.

cperciva added a subscriber: cperciva.Apr 3 2020, 11:48 PM

cperciva added inline comments.

sys/dev/nvme/nvme_qpair.c
918	If nvme_qpair_complete_tracker lost a race, we could enter nvme_timeout with tr->req = NULL, resulting in a panic when we dereference req. For that matter, with sufficient racing it's possible that tr->req is non-NULL and refers to a completely different request to the one which triggered the timeout.

I'm sure we have some interested parties in this revision, but I'm not sure if they have Phab accounts. Added some possibilities, feel free to remove yourselves / forward.

dab added a subscriber: rramsden_isilon.com.Apr 6 2020, 12:27 PM

imp abandoned this revision.Jul 10 2021, 4:43 PM

imp added inline comments.