sys/dev/nvme/nvme_qpair.c
nvme_qpair_submit_tracker(struct nvme_qpair *qpair, struct nvme_tracker *tr)
	/* Copy the command from the tracker to the submission queue. */
	memcpy(&qpair->cmd[qpair->sq_tail], &req->cmd, sizeof(req->cmd));
	if (++qpair->sq_tail == qpair->num_entries)
		qpair->sq_tail = 0;
	bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map,
	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
#if !defined(__powerpc__) && !defined(__aarch64__) && !defined(__arm__)
	/*
	 * powerpc's bus_dmamap_sync() already includes a heavyweight sync, but
	 * no other archs do.
	 */
	wmb();
#endif
	bus_space_write_4(qpair->ctrlr->bus_tag, qpair->ctrlr->bus_handle,
imp: I'm half thinking this should be just ifdef x86, but I didn't want to change that just yet.
imp: The above comment is only correct if the sfence here is actually needed. kib@ makes a good case that it's not needed.
imp: And talking to Drew Gallatin, he's convinced me that the DMA the device does will be consistent with what was written, even if the physical DRAM doesn't contain the new contents. The volatile pointer dereference in bus_space_* will act as a memory bar to keep the compiler from reordering that call with the memcpy above.
kib: Volatile access does not provide a barrier. I think that x86 needs atomic_thread_fence_rel() there. I suspect that most arches that need any fence semantic not ensured by bus_dmamap_sync() would be served by the fence_rel too.
imp: When talking to Drew, we used "memory bar" to mean "compiler memory bar", which the volatile in bus_space will afford. So that raises the question of whether we should put that into bus_dmamap_sync or not on x86. The equivalent is in the arm64, arm and powerpc dmamap routines if I'm reading them right. arm64 defines atomic_thread_fence_rel() as dsb(sy), and has dsb(sy) for the PREWRITE case. Since bus_dmamap_sync() on x86 is not inlined, __compiler_membar() seems redundant since it's already implied by the bus_space() functions. So in light of all that, what do you recommend, kib@? I'm sympathetic to mmel's notion that needing extra things after the sync is a bug in the busdma* implementation. It's my belief that the C standard already requires there to be an implicit compiler_membar() for the function call, but you're in a much better position to know that than I.
kib: Calling a function does not provide the guarantees that __compiler_membar() does. A function call only ensures a sequence point, which is only meaningful for the C virtual machine; it does not prevent the compiler from moving memory accesses around. So I do believe that x86 needs __compiler_membar(). I do think that placing fence_rel() into the i386/amd64 bus_dmamap_sync() is the right resolution.
mmel: I don't agree with the first paragraph. For all non-local data, calling an external function forces the compiler not to move memory accesses over the function call, and forces it not to use cached data across the call. About the second paragraph, and only for my education: does this mean that the x86 architecture doesn't guarantee store order and/or visibility of store buffers to external observers? Or is it necessary as a 'barrier' for possible write combining?
kib: This is mostly fine when you consider this guarantee as a feature of the current build.

I do not understand this question. The second paragraph you reference talked about __compiler_membar(), which ensures that the compiler cannot reorder memory accesses around it. x86 guarantees that writes to write-back (and normal uncacheable) memory are observed in program order. That is not true for write-combining memory and in some other situations when special instructions are used.
mmel: Well, LTO is a very heavy/strong argument :) On the opposite side, putting __compiler_membar() at the top of bus_dmamap_sync() is almost a no-op in the current situation (I think), so why not.

Thank you very much, that's exactly what I needed explained.
	    qpair->sq_tdbl_off, qpair->sq_tail);
	qpair->num_cmds++;
}
static void
nvme_payload_map(void *arg, bus_dma_segment_t *seg, int nseg, int error)
{
	struct nvme_tracker *tr = arg;