sys/dev/nvme/nvme_qpair.c
Context not available.
 	if (!qpair->is_enabled)
 		return (false);
+	mtx_lock(&qpair->complete_lock);
imp: If we go with a mutex, it would be better to try_lock here and bail if we can't get the lock. That is a higher-level wrapper on the atomic_cmpset() that I did in the comments.
cperciva (author): This, and the atomic_cmpset you suggested in the comments, would introduce a different race:
1. …
which would result in the driver getting stuck.
imp: How? Thread 2 doesn't need to run in this case, so there's no getting stuck.
cperciva (author): The I/O hadn't completed yet when Thread #1 checked. So neither thread is going to handle the completion.
imp: The next timeout will catch this, or the next completion on this queue.
cperciva (author): Only if there *is* a next timeout or completion. We could hang indefinitely with one outstanding completion if there aren't any more I/Os added to this queue. (Which could happen even if there's still significant I/O to the device, since we have a LIFO for free trackers -- if we have only a small number of concurrent I/Os they might keep on reusing the same trackers all on the same queue.)
imp: The driver will have scheduled others for in-flight I/O, or it will cancel the I/O that's in flight, which will produce an interrupt... So either way, we'll have something to clock the next round of completions, and any stragglers will get picked up here. It's how the driver normally works (e.g., if we complete the I/O and exit the loop and the next I/O completes before we can exit the interrupt context, that's handled correctly today, and this race seems no different than that).
cperciva (author): Each qpair has its own ithread and its own set of timeouts. There's no way for an interrupt on CPU #0 which was triggered by a completion on qpair #1 to do anything with I/Os which are part of qpair #2 (which are handled by CPU #8)...
cperciva (author):
> if we complete the I/O and exit the loop and the next I/O completes before we can exit the interrupt context, that's handled correctly today

That's handled correctly because swi_sched sets ih_need = 1, which results in nvme_qpair_msix_handler being called again if an interrupt arrives while it is already running. This doesn't cover the case where an interrupt arrives while nvme_timeout is running.
imp: I still want to know why the timeout is happening in the first place. I added the code as a fallback, failsafe, not as a reliable part of the driver. Even if we put this locking in, it's not a reliable, fail-safe part of the driver. I really really really do not want to slow down the normal path of the driver if we can't answer this fundamental question: why are they racing at all? What's causing the interrupt to miraculously fire enough of the time while the interrupt is running to cause all this contention? If we know what we're doing to cause that to happen, it would be a hell of a lot more reliable to just do that in the timeout routine to trigger an interrupt and let the ISR handle completions... The timeout means 'something totally f* up and now I have to sort this st out' and shouldn't slow down the fast path to accommodate it sorting the st out...
 	/*
 	 * A panic can stop the CPU this routine is running on at any point.  If
 	 * we're called during a panic, complete the sq_head wrap protocol for
Context not available.
 		nvme_mmio_write_4(qpair->ctrlr, doorbell[qpair->id].cq_hdbl,
 		    qpair->cq_head);
 	}
+	mtx_unlock(&qpair->complete_lock);
 	return (done != 0);
 }
Context not available.
 	}
 	mtx_init(&qpair->lock, "nvme qpair lock", NULL, MTX_DEF);
+	mtx_init(&qpair->complete_lock, "nvme qpair process completions lock", NULL, MTX_DEF);
 	/* Note: NVMe PRP format is restricted to 4-byte alignment. */
 	err = bus_dma_tag_create(bus_get_dma_tag(ctrlr->dev),
Context not available.
 	if (mtx_initialized(&qpair->lock))
 		mtx_destroy(&qpair->lock);
+	if (mtx_initialized(&qpair->complete_lock))
+		mtx_destroy(&qpair->complete_lock);
 	if (qpair->res)
 		bus_release_resource(qpair->ctrlr->dev, SYS_RES_IRQ,
 		    rman_get_rid(qpair->res), qpair->res);
Context not available.
 	uint32_t csts;
 	uint8_t cfs;
+	/* XXX debugging XXX */
+	nvme_printf(ctrlr, "nvme_timeout\n");
 	/*
 	 * Read csts to get value of cfs - controller fatal status.
 	 * If no fatal status, try to call the completion routine, and
Context not available.