sys/dev/nvme/nvme_qpair.c
Context not available.
 	if (!qpair->is_enabled)
 		return (false);
+	mtx_lock(&qpair->complete_lock);
imp: If we go with a mutex, it would be better to try_lock here and bail if we can't get the lock. That is a higher-level wrapper on the atomic_cmpset() that I did in the comments.
cperciva (author): This, and the atomic_cmpset you suggested in the comments, would introduce a different race:
1. …
which would result in the driver getting stuck.
imp: How? Thread 2 doesn't need to run in this case, so there's no getting stuck.
cperciva (author): The I/O hadn't completed yet when Thread #1 checked. So neither thread is going to handle the completion.
imp: The next timeout will catch this, or the next completion on this queue.
cperciva (author): Only if there *is* a next timeout or completion. We could hang indefinitely with one outstanding completion if there aren't any more I/Os added to this queue. (Which could happen even if there's still significant I/O to the device, since we have a LIFO for free trackers -- if we have only a small number of concurrent I/Os they might keep on reusing the same trackers all on the same queue.)
imp: The driver will have scheduled others for in-flight I/O, or it will cancel the I/O that's in flight, which will produce an interrupt... So either way, we'll have something to clock the next round of completions, and any stragglers will get picked up here. It's how the driver normally works (e.g., if we complete the I/O and exit the loop and the next I/O completes before we can exit the interrupt context, that's handled correctly today, and this race seems no different than that).
cperciva (author): Each qpair has its own ithread and its own set of timeouts. There's no way for an interrupt on CPU #0 which was triggered by a completion on qpair #1 to do anything with I/Os which are part of qpair #2 (which are handled by CPU #8)...
cperciva (author):
> if we complete the I/O and exit the loop and the next I/O completes before we can exit the interrupt context, that's handled correctly today

That's handled correctly because swi_sched sets ih_need = 1, which results in nvme_qpair_msix_handler being called again if an interrupt arrives while it is already running. This doesn't cover the case where an interrupt arrives while nvme_timeout is running.
imp: I still want to know why the timeout is happening in the first place. I added the code as a fallback, failsafe, not as a reliable part of the driver. Even if we put this locking in, it's not a reliable, fail-safe part of the driver. I really really really do not want to slow down the normal path of the driver if we can't answer this fundamental question: why are they racing at all? What's causing the interrupt to miraculously fire enough of the time while the interrupt is running to cause all this contention? If we know what we're doing to cause that to happen, it would be a hell of a lot more reliable to just do that in the timeout routine to trigger an interrupt and let the ISR handle completions... The timeout means 'something totally f* up and now I have to sort this st out' and shouldn't slow down the fast path to accommodate it sorting the st out...
 	/*
 	 * A panic can stop the CPU this routine is running on at any point.  If
 	 * we're called during a panic, complete the sq_head wrap protocol for
Context not available.
 		nvme_mmio_write_4(qpair->ctrlr, doorbell[qpair->id].cq_hdbl,
 		    qpair->cq_head);
 	}
+	mtx_unlock(&qpair->complete_lock);
 	return (done != 0);
 }
Context not available.
 	}
 	mtx_init(&qpair->lock, "nvme qpair lock", NULL, MTX_DEF);
+	mtx_init(&qpair->complete_lock, "nvme qpair process completions lock", NULL, MTX_DEF);
 	/* Note: NVMe PRP format is restricted to 4-byte alignment. */
 	err = bus_dma_tag_create(bus_get_dma_tag(ctrlr->dev),
Context not available.
 	if (mtx_initialized(&qpair->lock))
 		mtx_destroy(&qpair->lock);
+	if (mtx_initialized(&qpair->complete_lock))
+		mtx_destroy(&qpair->complete_lock);
 	if (qpair->res)
 		bus_release_resource(qpair->ctrlr->dev, SYS_RES_IRQ,
 		    rman_get_rid(qpair->res), qpair->res);
Context not available.
 	uint32_t csts;
 	uint8_t cfs;
+	/* XXX debugging XXX */
+	nvme_printf(ctrlr, "nvme_timeout\n");
 	/*
 	 * Read csts to get value of cfs - controller fatal status.
 	 * If no fatal status, try to call the completion routine, and
Context not available.