Page MenuHomeFreeBSD

nvme: Add a workaround for spurious completion interrupts.
AbandonedPublic

Authored by imp on Wed, Jul 14, 8:37 PM.

Details

Reviewers
mav
chuck
Summary

At least one experimental NVMe card has passed through our test labs
that would interrupt before it was started and its completion queue
memory would read all 0's. This would lead us to believe we had a queue
size'd set of completions that completed w/o trackers. Work around this
by making sure we've submitted commands to queue when the cid tracker
lookup returns NULL. For cards w/o this issue, this code will never run.

Sponsored by: Netflix

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 40476
Build 37365: arc lint + arc unit

Event Timeline

imp requested review of this revision.Wed, Jul 14, 8:37 PM

The motivation is weird to me. If we never saw this before on any production unit, not sure we should care about samples. But if you need it, I'm fine about it.

In D31183#701837, @mav wrote:

The motivation is weird to me. If we never saw this before on any production unit, not sure we should care about samples. But if you need it, I'm fine about it.

Yes. I thought about making it an ifdef as well...

I think that this may be because we enable interrupts before we've written the ACQ. The card is the Nth one in our lab machine, so isn't initialized at all by the BIOS. We enable the interrupts as part of nvme_qpair_construct, which is called for the admin queue before we call nvme_ctrlr_hw_reset( ) and ultimately write the ACQ. We compute that based on the memory we located in nvme_qpair_construct for the DMA buffer. We now, after D31182, enable interrupts after this. We could, in theory, write these addresses before we enable the interrupts. The standard is opaque as to this detail, imho. We need to enable interrupts on the admin queue before we send any commands, but needn't do them where we are. My experimentation with this didn't lead to satisfactory results either, though I could give it another try tomorrow if you think that your unease about the propose changes is telling us something else is wrong...

No objection here to the change (robustness / sanity checks and all), but it does seem like an odd thing to need to do.

After some careful thought, I'm going to punt on this. The other workaround I put in is adequate, even though it's a bit messy.
This is for an extreme edge case in beta hardware, so likely shouldn't clutter up FreeBSD.