Page MenuHomeFreeBSD

nvme: Greatly improve error recovery
ClosedPublic

Authored by imp on Oct 10 2022, 2:15 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 27, 8:48 PM
Unknown Object (File)
Nov 6 2024, 9:36 PM
Unknown Object (File)
Nov 6 2024, 9:24 PM
Unknown Object (File)
Nov 4 2024, 12:40 PM
Unknown Object (File)
Oct 30 2024, 7:10 AM
Unknown Object (File)
Oct 18 2024, 12:54 PM
Unknown Object (File)
Sep 26 2024, 7:29 AM
Unknown Object (File)
Sep 24 2024, 3:06 PM

Details

Summary

Next phase of error recovery: Eliminate the REOVERY_START phase, since
we don't need to wait to start recovery. Eliminate the RECOVERY_RESET
phase since it is transient, we now transition from RECOVERY_NORMAL into
RECOVERY_WAITING.

In normal mode, read the status of the controller. If it is in failed
state, or appears to be hot-plugged, jump directly to reset which will
sort out the proper things to do. This will cause all pending I/O to
complete with an abort status before the reset.

When in the NORMAL state, call the interrupt handler. This will complete
all pending transactions when interrupts are broken or temporarily
misbehaving. We then check all the pending completions for timeouts. If
we have abort enabled, then we'll send an abort. Otherwise we'll assume
the controller is wedged and needs a reset. By calling the interrupt
handler here, we'll avoid an issue with the current code where we
transitioned to RECOVERY_START which prevented any completions from
happening. Now completions happen. In addition and follow-on I/O that is
scheduled in the completion routines will be submitted, rather than
queued, because the recovery state is correct. This also fixes a problem
where I/O would timeout, but never complete, leading to hung I/O.

Resetting remains the same as before, just when we chose to reset has
changed.

A nice side effect of these changes is that we now do I/O when
interrupts to the card are totally broken. Followon commits will improve
the error reporting and logging when this happens. Performance will be
aweful, but will at least be minimally functional.

MFC After: 3 days
Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 47761
Build 44648: arc lint + arc unit

Event Timeline

imp requested review of this revision.Oct 10 2022, 2:15 AM
sys/dev/nvme/nvme_qpair.c
1000

There is NVME_GONE define for this, actually all-1s also means cfs set, so this check for NVME_GONE does not add much.

1012

Is this safe to call it in parallel to active interrupts? I don't see any locking there.

imp marked 2 inline comments as done.Aug 12 2023, 4:56 PM
imp added inline comments.
sys/dev/nvme/nvme_qpair.c
1000

Will use NVME_GONE.

I'll agree that the csts check doesn't add much. It's more about intent, though that can be covered by the comments adequately so I'll remove it.

1012

I keep going round and round about how safe it is (but although it is safer than it used to be, this is a great question because I don't think it's safe enough after looking closely). The qpair lock is used to protect qpair state, so it's safe from that perspective... But it doesn't protect all the qpair state (even if I fixed a couple of unprotected increments). And it doesn't protect the hardware against parallel access, which would be a problem.

We already make this call, so it's not a new problem.... But it's a problem that should be fixed none-the-less. I have an ideal I'll explore and post a review on and a pointer here. IIRC, the one issue I had last time I tried to do locking was in the case of a panic while we happened to be in the completion routine: We can't have only one in the completion routine if we need to poll to write crash dumps.... But I think that's surmountable.

For the case I'm trying to fix, there's no races since there's no interrupts, but for the more typical case,

https://reviews.freebsd.org/D36924 is what's needed, but there may be better, more performant ways to accomplish this.

sys/dev/nvme/nvme_qpair.c
988

I'm surprised this is locking manually rather than using callout_init_mtx().

1011

Dropping the lock here means you can't depend on callout_stop() under the lock not having races. If nvme_qpair_process_completions just locks it again, maybe add a _locked variant of nvme_qpair_process_completions this can call? (Or make all the callers acquire the lock?)

imp marked 2 inline comments as done.Aug 14 2023, 11:41 PM
imp added inline comments.
sys/dev/nvme/nvme_qpair.c
988

tl;dr: I need a recovery lock and a tracker (and other state) lock that this comment really exposes. Will update for that and other comments

This revision was not accepted when it landed; it landed in state Needs Review.Aug 25 2023, 4:20 PM
This revision was automatically updated to reflect the committed changes.