
nvme: fix a race between failing the controller and failing requests
ClosedPublic

Authored by imp on May 20 2021, 6:57 PM.

Details

Summary

Part of the nvme error-recovery process is to reset the
card. Sometimes this results in failing the entire controller. When nda
is in use, we free the CAM SIM, which sleeps until all outstanding I/O has
completed. However, with only one taskqueue thread, the request-fail task
can never run once the reset task sleeps there. Create two threads so that
I/O can be failed until it has all been processed and the reset task can
proceed.

Sponsored by: Netflix

Test Plan

Use my artificial fail on reset code:
start a huge fio job
set the fail on reset sysctl
nvmecontrol reset

make sure there's no hang
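The steps above might look like the following on a test box. The device names are examples, and the fail-on-reset knob is out-of-tree test code, so it is left as a placeholder rather than a real sysctl name.

```shell
# Start a large fio job against the nda-attached namespace (example device).
fio --name=stress --filename=/dev/nda0 --rw=randrw --iodepth=128 \
    --numjobs=8 --time_based --runtime=60 &

# ... set the artificial fail-on-reset sysctl here (private test patch) ...

# Trigger the reset path; the controller should fail rather than recover.
nvmecontrol reset nvme0

# fio should complete with I/O errors instead of hanging.
wait
```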

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

imp requested review of this revision.May 20 2021, 6:57 PM
imp added reviewers: mav, jhb, markj.

It seems a bit sad to have to use a blocking task for this vs some kind of counter or state machine such that the task doesn't have to block.

sys/dev/nvme/nvme_ctrlr.c:1445
This revision is now accepted and ready to land.May 20 2021, 7:18 PM
In D30366#682171, @jhb wrote:

It seems a bit sad to have to use a blocking task for this vs some kind of counter or state machine such that the task doesn't have to block.

In an ideal world, we'd just deref the sim and it would go away when the refcount reaches 0 after the I/O has failed, and then cleanup would happen. But today SIM removal is synchronous, and cam_sim_free() likely has consumers that assume the SIM is really gone once it returns, though I've not done an audit. So dereferencing it and having it eventually go away will have to wait...

The other workaround would be to fail all the I/O inline as part of freeing the sim. I worried about races there too, though smaller ones, and didn't want the problem half fixed. Though inelegant and a really big hammer, this approach ensures that kind of race can't happen.

I don't have particular objections, but it makes me wonder why, in the case of a failed controller, _nvme_qpair_submit_request() decouples the completion via the taskqueue and nvme_ctrlr_post_failed_request(), while in the case of a busdma error we call nvme_qpair_manual_complete_tracker() directly.

This revision now requires review to proceed.May 20 2021, 7:37 PM
In D30366#682178, @mav wrote:

I don't have particular objections, but it makes me wonder why, in the case of a failed controller, _nvme_qpair_submit_request() decouples the completion via the taskqueue and nvme_ctrlr_post_failed_request(), while in the case of a busdma error we call nvme_qpair_manual_complete_tracker() directly.

I'm not sure why we need a task to do the failures at all...

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

So there once was a need for it, but it's not clear to me that the need remains
now that we have more locking in the driver and can cope better with other
issues like hotplug. It's a good question for another day. And I'm not sure
why the same reasoning wouldn't also apply to failed busdma loading...

I'll commit this for now, but investigate other ways to fail the requests. It's not at all clear to me whether the races mentioned in the initial commit of this code are still relevant.

This revision was not accepted when it landed; it landed in state Needs Review.May 29 2021, 5:06 AM
This revision was automatically updated to reflect the committed changes.