
nvme: Fix locking protocol violation
Closed, Public

Authored by imp on Sep 15 2023, 9:23 AM.

Details

Summary

Currently, when we suspend, we need to tear down all the qpairs. We call
nvme_admin_qpair_abort_aers with the admin qpair lock held, but the
tracker completion it invokes for the pending AER also takes that lock
(recursively), tripping an assert. This routine is already called without
the qpair lock held in the several places where we destroy the device
entirely. Add an assert to that effect and drop the qpair lock before
calling it. nvme_admin_qpair_abort_aers then takes the qpair lock to
traverse the list, dropping it around calls to
nvme_qpair_complete_tracker and restarting the list scan after picking it
back up.
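
For illustration, here's a minimal sketch of that traversal pattern,
assuming the driver's internal types from sys/dev/nvme/nvme_private.h;
the function name is mine and the completion helper's signature is
approximate, so treat this as a sketch rather than the committed code:

/*
 * Illustrative sketch only: walk the outstanding-tracker list with the
 * qpair lock held, drop the lock around each completion (completing a
 * tracker takes the lock itself), and restart the scan afterwards
 * because the list may have changed while unlocked.
 */
static void
abort_aers_sketch(struct nvme_qpair *qpair)
{
	struct nvme_tracker *tr;

	mtx_lock(&qpair->lock);
restart:
	TAILQ_FOREACH(tr, &qpair->outstanding_tr, tailq) {
		if (tr->req->cmd.opc != NVME_OPC_ASYNC_EVENT_REQUEST)
			continue;
		mtx_unlock(&qpair->lock);
		/* Signature approximate; completes the AER as aborted. */
		nvme_qpair_manual_complete_tracker(tr, NVME_SCT_GENERIC,
		    NVME_SC_ABORTED_SQ_DELETION, 0, ERROR_PRINT_NONE);
		mtx_lock(&qpair->lock);
		goto restart;	/* list may have changed while unlocked */
	}
	mtx_unlock(&qpair->lock);
}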

Note: If interrupts are still running, there's a tiny window for these
AERs: one could fire just an instant after we manually complete it. Even
then we'll be fine: we set the state of the queue to 'waiting' and we
ignore interrupts while 'waiting'. We know we'll destroy all the queue
state, with these interrupts still pending, before looking at them again,
and we know all the TRs will have been completed or rescheduled by then.
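
A minimal sketch of the ISR-side guard this note relies on, assuming the
recovery lock and state names mirror the driver's (illustrative, not the
committed code):

/*
 * Illustrative sketch: the completion path bails out while the qpair is
 * in recovery, so a late AER interrupt in the window above is ignored.
 */
static bool
process_completions_sketch(struct nvme_qpair *qpair)
{
	mtx_lock(&qpair->recovery);
	if (qpair->recovery_state != RECOVERY_NONE) {
		/* Queue is WAITING (or FAILED): ignore the interrupt. */
		mtx_unlock(&qpair->recovery);
		return (false);
	}
	/* ... normal completion processing would follow here ... */
	mtx_unlock(&qpair->recovery);
	return (true);
}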

Also, tidy up the failure case: failing a queue is a superset of
disabling it, so there's no need to call disable first. This solves some
locking issues. Set the qpair state of failed queues to RECOVERY_FAILED
and stop scheduling the watchdog. Assert we're not failed when we're
enabling a qpair. Make failure a little less verbose.
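
A sketch of the failure path just described, with field and helper names
assumed from the driver (the function names are mine; illustrative only):

/*
 * Illustrative sketch: failing a qpair goes straight to the terminal
 * state and stops the watchdog, instead of disabling first.
 */
static void
fail_qpair_sketch(struct nvme_qpair *qpair)
{
	mtx_lock(&qpair->recovery);
	qpair->recovery_state = RECOVERY_FAILED;	/* terminal */
	mtx_unlock(&qpair->recovery);
	callout_stop(&qpair->timer);	/* no more watchdog rescheduling */
}

static void
enable_qpair_sketch(struct nvme_qpair *qpair)
{
	/* A failed qpair must never be re-enabled. */
	KASSERT(qpair->recovery_state != RECOVERY_FAILED,
	    ("enabling a failed qpair"));
	qpair->recovery_state = RECOVERY_NONE;
}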

Next, kill the pre/post reset stuff. It's completely bogus: since we
disable the qpairs, we don't need to also hold the lock through the
reset, because disabling will cause the ISR to return early. This keeps
us from recursing on the recovery lock when resuming.
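
A sketch of the resulting reset flow (illustrative; the disable helper's
name is assumed from the driver and the rest is shown as comments):

static void
reset_flow_sketch(struct nvme_controller *ctrlr)
{
	/*
	 * Disabling moves each qpair to RECOVERY_WAITING; the ISR returns
	 * early in that state, so nothing needs to hold the recovery lock
	 * across the actual controller reset, and the resume path can't
	 * recurse on it.
	 */
	nvme_ctrlr_disable_qpairs(ctrlr);
	/* ... hardware controller reset happens here, lock not held ... */
	/* ... then each qpair is re-enabled, back to RECOVERY_NONE ... */
}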

Finally, kill NVME_RESET_2X. It's been a major release since we put it
in and nobody has used it as far as I can tell. And it was a motivator
for the pre/post uglification.

These changes are all interrelated, so they need to be done at the same time.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 53593
Build 50484: arc lint + arc unit

Event Timeline

imp requested review of this revision. Sep 15 2023, 9:23 AM
sys/dev/nvme/nvme_qpair.c
960

I'm not 100% sure that this is race free if we're racing the ISR to complete these trackers.
Any opinions on this? Is there a chance we'll free twice, or could the TAILQ_NEXT walk race
another instance of nvme_qpair_complete_tracker that's already running?

hmmm, we do need the lock.

sys/dev/nvme/nvme_qpair.c
960

Since interrupts might be running, we do need to hold the lock during the traverse, and drop it around calls to complete_tracker to be safe. We should really only have AERs here now, and they shouldn't be firing, so in practice it won't matter. BUT I'm not sure what would happen if the AER completes just an instant after we manually complete it... I think we'll set the tr in our cid mapping array to NULL and we'll hit the assert in _nvme_qpair_process_completions.
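
For context, a minimal sketch of the lookup being worried about here,
assuming act_tr is the driver's cid-to-tracker map (illustrative, not
the committed code):

static struct nvme_tracker *
cid_lookup_sketch(struct nvme_qpair *qpair, uint16_t cid)
{
	struct nvme_tracker *tr;

	/*
	 * Manual completion clears act_tr[cid]; a hardware completion
	 * racing it would find NULL here and trip the assert.
	 */
	tr = qpair->act_tr[cid];
	KASSERT(tr != NULL,
	    ("completion for cid %u with no active tracker", cid));
	return (tr);
}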

sys/dev/nvme/nvme_qpair.c
960

This race can't happen because disable_queues() sets the state to WAITING, so we ignore such races...

Update locking protocol

John and I chatted about this at EuroBSDCon and we don't need the pre/post reset
thing: disabled is enough, since we abort the ISR when we're in state
RECOVERY_WAITING. There's no need to also hold the lock for this case. This solves a
problem on resume where we'd otherwise recurse on these locks.

Also, I noticed the failure case was lame and fixed it. We now don't disable first
(we don't need to) and we have a new recovery state: FAILED. This state is
terminal.

fix stupid stray character, make failure cases safer, etc.

This all looks sane to me.

sys/dev/nvme/nvme_qpair.c
1462

You could maybe split out the "printf once vs in the loop" into a separate commit, but that's a minor nit.

This revision is now accepted and ready to land. Sep 17 2023, 11:11 AM

merge in the 'quick failed fix' and turn it into an assert
in the failed case.

This revision now requires review to proceed. Sep 21 2023, 10:16 PM
This revision was not accepted when it landed; it landed in state Needs Review. Sep 24 2023, 1:22 PM
This revision was automatically updated to reflect the committed changes.