sys/dev/nvme/nvme_qpair.c
nvme_qpair_submit_tracker(struct nvme_qpair *qpair, struct nvme_tracker *tr)
	/* Copy the command from the tracker to the submission queue. */
	memcpy(&qpair->cmd[qpair->sq_tail], &req->cmd, sizeof(req->cmd));
	if (++qpair->sq_tail == qpair->num_entries)
		qpair->sq_tail = 0;
	bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map,
	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
#if !defined(__powerpc__) && !defined(__aarch64__) && !defined(__arm__)
	/*
	 * powerpc's bus_dmamap_sync() already includes a heavyweight sync, but
	 * no other archs do.
	 */
	wmb();
#endif
	bus_space_write_4(qpair->ctrlr->bus_tag, qpair->ctrlr->bus_handle,
imp: I'm half thinking this should be just ifdef x86, but I didn't want to change that just yet.
imp: The above comment is only correct if the sfence here is actually needed. kib@ makes a good case that it's not needed.
imp: And talking to Drew Gallatin, he's convinced me that the DMA the device does will be consistent with what was written, even if the physical DRAM doesn't contain the new contents. The volatile pointer dereference in bus_space_* will act as a memory bar to keep the compiler from reordering that call with the memcpy above.
kib: Volatile access does not provide a barrier. I think that x86 needs atomic_thread_fence_rel() there. I suspect that most arches that need any fence semantic not ensured by bus_dmamap_sync() would be served by the fence_rel too.
imp: When talking to Drew, we used "memory bar" to mean "compiler memory bar", which the volatile in bus_space will afford. So that raises the question of whether we should put that into bus_dmamap_sync or not on x86. The equivalent is in the arm64, arm and powerpc dmamap routines if I'm reading them right. arm64 defines atomic_thread_fence_rel() as dsb(sy), and has dsb(sy) for the PREWRITE case. Since bus_dmamap_sync() on x86 is not inlined, __compiler_membar() seems redundant since it's already implied by the bus_space() functions. So in light of all that, what do you recommend, kib@? I'm sympathetic to mmel's notion that needing extra things after the sync is a bug in the busdma* implementation. It's my belief that the C standard already requires there to be an implicit compiler_membar() for the function call, but you're in a much better position to know that than I.
kib: Calling a function does not provide the guarantees that __compiler_membar() does. A function call only ensures a sequence point, which is only meaningful for the C virtual machine; it does not prevent the compiler from moving memory accesses around. So I do believe that x86 needs __compiler_membar(). I do think that placing fence_rel() into the i386/amd64 bus_dmamap_sync() is the right resolution.
mmel: I don't agree with the first paragraph. For all non-local data, calling an external function forces the compiler not to move memory accesses over the function call, and forces it not to use cached data across the call. About the second paragraph, and only for my education: does this mean that the x86 architecture doesn't guarantee store order and/or visibility of store buffers to external observers? Or is it necessary as a 'barrier' for possible write combining?
kib: This is mostly fine when you consider this guarantee as a feature of the current build.

I do not understand this question. The second paragraph you reference talked about __compiler_membar(), which ensures that the compiler cannot reorder memory accesses around it. x86 guarantees that writes to write-back (and normal uncacheable) memory are observed in program order. That is not true for write-combining memory and in some other situations when special instructions are used.
mmel: Well, LTO is a very heavy/strong argument :) On the opposite side, putting __compiler_membar() at the top of bus_dmamap_sync() is almost a no-op in the current situation (I think), so why not.

Thank you very much, that's exactly what I needed explained.
	    qpair->sq_tdbl_off, qpair->sq_tail);
	qpair->num_cmds++;
}
static void
nvme_payload_map(void *arg, bus_dma_segment_t *seg, int nseg, int error)
{
	struct nvme_tracker *tr = arg;