The NVMe standard requires that all writes to host memory for the
submission queue are globally visible before updating the tail
pointer. Document this fact, and explain why some (all !x86?)
implementations don't need it.
and
Remove a wmb() that's not necessary.
bus_dmamap_sync() is supposed to provide guarantees that ensure that
memory that's prepared for PREWRITE can be DMA'd immediately after it
returns.
For non-x86 platforms, bus_dmamap_sync() takes care of ensuring that
all writes to the command buffer have been posted well enough for the
device to initiate DMA from that memory and get its contents. Those
implementations all include a memory fence of the appropriate strength.
For x86 platforms, the memory ordering is already strong enough. Once
memory is written, the later write to the uncached BAR that forces the
DMA cannot pass it, so the device will get the memory's contents. As
such, we don't need the wmb() here. It translates to an sfence, which
is only needed for writes to regions that have the write combining
attribute set. The nvme driver does no such writes. Now that x86's
bus_dmamap_sync() includes a __compiler_membar(), we can be assured
the optimizer won't reorder the bus_dmamap_sync() and the
bus_space_write operations.
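The ordering described above can be sketched in userland C. This is only a
model under stated assumptions, not the driver code: struct model_sq,
model_dmamap_sync_prewrite() and model_submit() are hypothetical names, the
"doorbell" is an ordinary volatile variable rather than an uncached BAR, and
C11's atomic_signal_fence() stands in for the kernel's __compiler_membar().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SQ_DEPTH 4u   /* hypothetical queue depth */
#define CMD_SIZE 64u  /* NVMe submission queue entries are 64 bytes */

/* Hypothetical stand-in for a DMA-visible queue plus its tail doorbell. */
struct model_sq {
	uint8_t           entries[SQ_DEPTH][CMD_SIZE];
	uint32_t          tail;
	volatile uint32_t doorbell; /* models the uncached BAR register */
};

/*
 * Models the x86 case from the message: no CPU fence instruction, only a
 * barrier that stops the compiler from moving stores across this point.
 */
static inline void
model_dmamap_sync_prewrite(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Fill a slot, sync, then ring the doorbell, in that order. */
static uint32_t
model_submit(struct model_sq *sq, const uint8_t cmd[CMD_SIZE])
{
	assert(sq->tail < SQ_DEPTH);
	memcpy(sq->entries[sq->tail], cmd, CMD_SIZE); /* 1. write the command */
	sq->tail = (sq->tail + 1) % SQ_DEPTH;
	model_dmamap_sync_prewrite();                 /* 2. sync, no wmb() */
	sq->doorbell = sq->tail;                      /* 3. update the tail */
	return sq->doorbell;
}
```

The point of the sketch is the sequence of steps 1 to 3. No store fence sits
between the sync and the doorbell write, which matches the argument that x86
needs none as long as no write combining mapping is involved.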
and
Annotate bus_dmamap_sync() with fence
Add an explicit release thread fence before returning from
bus_dmamap_sync(). This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory.
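A rough userland illustration of a sync routine that ends in a release fence
follows. It assumes C11's atomic_thread_fence(memory_order_release) as the
analogue of the kernel fence; model_dmamap_sync(), model_publish(),
model_dma_buf and model_dev_reg are hypothetical names. Strictly speaking,
C11 defines fence ordering only relative to atomic operations, so treat this
as a model of intent rather than a portable guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_SLOTS 4u /* hypothetical buffer size */

static uint32_t          model_dma_buf[MODEL_SLOTS]; /* ordinary memory */
static volatile uint32_t model_dev_reg;              /* models a device register */

static void
model_dmamap_sync(void)
{
	/* A real implementation might do cache maintenance or bounce copies here. */

	/* Prior ordinary stores may not be reordered after this point. */
	atomic_thread_fence(memory_order_release);
}

static uint32_t
model_publish(uint32_t slot, uint32_t value)
{
	assert(slot < MODEL_SLOTS);
	model_dma_buf[slot] = value; /* ordinary store */
	model_dmamap_sync();         /* returns only after the release fence */
	model_dev_reg = slot;        /* subsequent device access */
	return model_dev_reg;
}
```

On x86 a release fence emits no instruction and only constrains the compiler,
which is consistent with the "no-op in practice" remark.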
There is one exception. If you've mapped memory as write combining,
then you will need to add an sfence or similar.
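For the write combining exception, a minimal sketch of where the extra fence
would go is shown below. model_wc_copy() and MODEL_WC_FLUSH() are hypothetical
names, and the destination here is plain memory since a real write combining
mapping cannot be set up in a small example. _mm_sfence() is the compiler
intrinsic for the sfence instruction; the fallback for builds without SSE is
an assumption made for portability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#define MODEL_WC_FLUSH() _mm_sfence()
#else
#include <stdatomic.h>
/* Assumed fallback where the sfence intrinsic is unavailable. */
#define MODEL_WC_FLUSH() atomic_thread_fence(memory_order_seq_cst)
#endif

/*
 * Copy n words into a (notionally) write combining region, then fence so
 * the combined stores are pushed out before any later doorbell write.
 */
static size_t
model_wc_copy(volatile uint32_t *wc_dst, const uint32_t *src, size_t n)
{
	assert(wc_dst != NULL && src != NULL);
	for (size_t i = 0; i < n; i++)
		wc_dst[i] = src[i];
	MODEL_WC_FLUSH(); /* the "sfence or similar" from the text */
	return n;
}
```

A caller would ring the device register only after model_wc_copy() returns.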