Page MenuHomeFreeBSD

NVME: Multiple busdma related fixes.
ClosedPublic

Authored by mmel on Dec 2 2020, 12:56 PM.

Details

Summary
  • in nvme_qpair_process_completions() do dma sync before completion buffer is used.
  • in nvme_qpair_submit_tracker(), don't do explicit wmb() also for arm and arm64. Execution bus_dmamap_sync() is (and must be) sufficient to ensure that all CPU stores are visible to external (including DMA) observers.
  • Allocate completion buffer as BUS_DMA_COHERENT. On not-DMA coherent systems, buffers continuously owned (and accessed) by DMA must be allocated with this flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems (or coherent buses in mixed systems).

MFC after: 3 weeks


I think that wmb() in nvme_qpair_submit_tracker() is relict from early
implementation (without bus_dmamap_sync() ). It's job of bus_dmamap_sync()
to ensure visibility of all CPU stores committed before, therefore write
barrier is clearly superfluous.
Unfortunately I have not right HW to test this on amd64/i386.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mmel requested review of this revision.Dec 2 2020, 12:56 PM
mmel edited the summary of this revision. (Show Details)

It is mostly OK to me, except I am not sure what was the motivation behind the wmb() originally. I haven't seen it in any other driver, so it is weird to me from the beginning. But before removing it it would be nice to figure out what was it.

This revision is now accepted and ready to land.Dec 2 2020, 2:57 PM

Added some comments about the wmb() history and a quick analysis of the busdma impl.

sys/dev/nvme/nvme_qpair.c
990 ↗(On Diff #80225)

This was in the original driver, as committed. There was no sync() call in that version and it was needed.
The wmb() wasn't removed when this was ported to powerpc. Justin Hibbits added the #ifdef when he added bus_dmamap_sync().

On x86 this places a sfence in the sequence so that all stores are globally visible. This is a fairly common thing in other drivers that have the same queueing model where the producer/consumer queue is shared between the CPU and the device to ensure that part of 'global' includes the device. The x86 busdma implementation doesn't appear[*] to have this in its implementation, which is likely why wmb() was retained when Justin committed his changes.

  • I say appears because I didn't read all the code. When we aren't actively bouncing the buffers (which we won't), the x86 sync appears to do nothing, but there may be something else.

This wmb() was exist in initial version of nvme drivers, which did not used standard FreeBSD busdma functions.
See https://svnweb.freebsd.org/base/head/sys/dev/nvme/nvme_qpair.c?view=markup&pathrev=240616#l415 .
I'm near sure it was used as barrier to ensure the visibility of memcpy() few line above to nvme dma before the real comand is fired (by nvme_mmio_write_4()).

jhibbits in r337273 added bus_dmamap_sync for powerpc. The existence of wmb () was mentioned in preview for this commit - https://reviews.freebsd.org/D16570#351840. I
have checked x86 implementation of bounce_bus_dmamap_sync () and he's right.

I'm not expert on x86 but this looks very strange for me. Either the platform guarantees the write order (then wmb () is unnecessary) or not, and then bus_dmamap_sync () should do it.
With all this, I think that its safer to leave patch (including wmb()) as is.

Bah, Warner was much faster :)

In D27446#613264, @mmel wrote:

I'm not expert on x86 but this looks very strange for me. Either the platform guarantees the write order (then wmb () is unnecessary) or not, and then bus_dmamap_sync () should do it.
With all this, I think that its safer to leave patch (including wmb()) as is.

I agree we should leave it here for now.

I believe it's to force the write to the go to the hardware faster rather than a strict ordering perspective, but the confidence in that belief is weak.

  • UPDATE ** see below, I think the last sentence here is incorrect and I was wrong about it.
This revision was automatically updated to reflect the committed changes.
sys/dev/nvme/nvme_qpair.c
990 ↗(On Diff #80225)

Now that I've looked at the code more extensively, I think this is here to ensure writes done to the submission queue are forced out before we write to the tailq pointer to do the actual submission to the device, not for any latency advantage. sfence ensure writes that happen before it aren't reordered past writes that happen after it, and the next line we're writing to the tailq pointer.