Page MenuHomeFreeBSD

ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU
ClosedPublic

Authored by tychon on Mar 27 2019, 2:30 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, May 1, 12:03 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 3:51 AM
Subscribers

Details

Summary

Supplying physical address for the source and destination addresses directly to the DMA engine makes the IOAT(4) driver is incompatible with the VT-d IOMMU enabled in re-map mode.

Use bus_dma(9) to obtain addresses -- translatable by the IOMMU when enabled -- instead.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

+CCing a few other folks who have ever had some interest in I/OAT :-)

Generally I'd indeed like to see some busdma integration either in the driver, or may be in some wrapper/subsystem on top of it to simplify usage, but right now in my code I am already using busdma to translate from virtual to physical addresses, and this busdma call here will be duplicate waste of time, if not worse somehow.

sys/dev/ioat/ioat.c
549 ↗(On Diff #55496)

Not sure it would change much, but I think here ioat->max_xfer_size could be specified.

In D19725#422587, @mav wrote:

Generally I'd indeed like to see some busdma integration either in the driver, or may be in some wrapper/subsystem on top of it to simplify usage, but right now in my code I am already using busdma to translate from virtual to physical addresses, and this busdma call here will be duplicate waste of time, if not worse somehow.

When the IOMMU is enabled you need a 3rd-level of translation. As currently written the ioat driver programs the DMA engine, for the source and destination addresses of the operation, with physical addresses. With the IOMMU enabled these physical addresses are presented for translation. That is incorrect. In the re-mapped mode the physical addresses masquerading as DMA addresses will fail to translate. This causes the ioat operations to fail.

When the IOMMU is disabled there is little overhead. If this overhead is concerning it's hard to argue that bus_dma(9) non-compliance is really desirable, certainly not by default, but one could always choose wrap the load/unload calls in ACPI_DMAR.

tychon retitled this revision from ioat(4) should use bus_dma(9) to work properly with the VT-d IOMMU to ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU.
tychon edited the summary of this revision. (Show Details)

Use ioat-> max_xfer_size instead of BUS_SPACE_MAXADDR when creating the DMA tag for the operands.

sys/dev/ioat/ioat.c
363 ↗(On Diff #55506)

Keep in mind this detach routine is invoked when attach3 below errors, so e.g. the ring may be only partially initialized if one of the dmamap_create operations fails.

381 ↗(On Diff #55506)

Keep in mind data_tag here may not be any value allocated by bus_dma_tag_create (i.e., initial softc zeroes).

547–548 ↗(On Diff #55506)

Not sure this is accurate; some versions of the device seem limited to 48 bits of physical address.

773–774 ↗(On Diff #55506)

It seems like these might need to be conditional, unless unload without prior load is ok? E.g., ioat_blockfill only has a destination address; there is no user source address to map.

1085–1088 ↗(On Diff #55506)

This isn't generally valid. Again, the example is ioat_blockfill. Here, src_addr is overloaded to mean "64-bit block pattern constant." Mapping it is nonsensical.

Similarly, ioat_crc doesn't use dst and passes 0.

That probably means we should add additional flags for ioat_op_generic about whether src/dst represent actual addresses and conditionalize mapping based on those?

1194–1201 ↗(On Diff #55506)

These will also need to be mapped (potentially 2 srcs and 2 dsts per op).

tychon marked 6 inline comments as done.
sys/dev/ioat/ioat.c
547–548 ↗(On Diff #55506)

I'm going to assume that whatever DMA address the IOMMU assigns are going to be constrained by the same limitations as this device. While that too may not be accurate, intuitively it seems a better approach than perhaps imposing an arbitrary constraint.

sys/dev/ioat/ioat.c
547–548 ↗(On Diff #55506)

I think one of us should dig out the arbitrary constant from the NDA ioat reference documents and just use the right constant rather than assuming it matches the IOMMU :-).

cem requested changes to this revision.Apr 1 2019, 11:39 PM

Thanks, this addresses a lot of my concerns. I have a few more suggestions / problems to point out. I really appreciate that someone's fixing this driver to use busdma correctly. Certainly when I committed it into the tree I didn't really understand busdma well enough to do it myself, and then never found the motivation after that.

sys/dev/ioat/ioat.c
364 ↗(On Diff #55691)

This isn't good enough, unfortunately. dmamap can be both valid and NULL (e.g., when bounce DMA is used on x86, but a bounce mapping is not needed). The dmamap still needs to be destroyed (reference count release) and the rest of the cleanup logic needs to proceed.

638–639 ↗(On Diff #55691)

Could we just reuse dst2 for this one?

1270 ↗(On Diff #55691)

This optimization is not valid with iommu (or really busdma in general) -- only the first page is mapped and the mapping is only one page long. We can always use src/dst_page_break and map src2/dst2 unconditionally, or we could somehow attempt to handle this more carefully for busdma.

1367–1375 ↗(On Diff #55691)

Any chance it makes sense to change this common snippet of repeated logic into a subroutine (or macro+function, to automatically insert __func__)?

This revision now requires changes to proceed.Apr 1 2019, 11:39 PM

This revision includes setups up the '48-bit' DMA address constraint for the 'data' tag and the '40-bit' DMA address constraint for the 'crc data' tag.

This should close out the feedback I received prior to today and next I will address the feedback that I see has just arrived. Expect at least one more revision if you'd like to simply review it in it's entirety.

sys/dev/ioat/ioat.c
593 ↗(On Diff #55708)

We define a BUS_SPACE_MAXADDR_40BIT for this you can use :-)

I think I've addressed at least many an attempt to address all outstanding feedback now. The diff is updated.

sys/dev/ioat/ioat.c
638–639 ↗(On Diff #55691)

No, alas, the address width requirements are different.

1367–1375 ↗(On Diff #55691)

I tried. Not sure if it's any better.

Looks good to me in the happy case and in the ordinary / weird DMA operations. I think teardown still isn't quite clean in the case where attach fails at a partially initialized ring.

sys/dev/ioat/ioat.c
380 ↗(On Diff #55732)

This may still (bogusly) release references if there was a failure partway through initialization of the ring and bouncebuf dma is used.

This revision is now accepted and ready to land.Apr 2 2019, 6:12 PM
This revision was automatically updated to reflect the committed changes.
tychon marked 2 inline comments as done.