Page MenuHomeFreeBSD

ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU
ClosedPublic

Authored by tychon on Mar 27 2019, 2:30 PM.
Tags
None
Referenced Files
F84415902: D19725.diff
Thu, May 23, 2:38 PM
Unknown Object (File)
Thu, May 16, 8:02 PM
Unknown Object (File)
Wed, May 1, 12:03 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Unknown Object (File)
Wed, May 1, 12:02 PM
Subscribers

Details

Summary

Supplying physical address for the source and destination addresses directly to the DMA engine makes the IOAT(4) driver is incompatible with the VT-d IOMMU enabled in re-map mode.

Use bus_dma(9) to obtain addresses -- translatable by the IOMMU when enabled -- instead.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

+CCing a few other folks who have ever had some interest in I/OAT :-)

Generally I'd indeed like to see some busdma integration either in the driver, or may be in some wrapper/subsystem on top of it to simplify usage, but right now in my code I am already using busdma to translate from virtual to physical addresses, and this busdma call here will be duplicate waste of time, if not worse somehow.

sys/dev/ioat/ioat.c
549

Not sure it would change much, but I think here ioat->max_xfer_size could be specified.

In D19725#422587, @mav wrote:

Generally I'd indeed like to see some busdma integration either in the driver, or may be in some wrapper/subsystem on top of it to simplify usage, but right now in my code I am already using busdma to translate from virtual to physical addresses, and this busdma call here will be duplicate waste of time, if not worse somehow.

When the IOMMU is enabled you need a 3rd-level of translation. As currently written the ioat driver programs the DMA engine, for the source and destination addresses of the operation, with physical addresses. With the IOMMU enabled these physical addresses are presented for translation. That is incorrect. In the re-mapped mode the physical addresses masquerading as DMA addresses will fail to translate. This causes the ioat operations to fail.

When the IOMMU is disabled there is little overhead. If this overhead is concerning it's hard to argue that bus_dma(9) non-compliance is really desirable, certainly not by default, but one could always choose wrap the load/unload calls in ACPI_DMAR.

tychon retitled this revision from ioat(4) should use bus_dma(9) to work properly with the VT-d IOMMU to ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU.
tychon edited the summary of this revision. (Show Details)

Use ioat-> max_xfer_size instead of BUS_SPACE_MAXADDR when creating the DMA tag for the operands.

sys/dev/ioat/ioat.c
363

Keep in mind this detach routine is invoked when attach3 below errors, so e.g. the ring may be only partially initialized if one of the dmamap_create operations fails.

381

Keep in mind data_tag here may not be any value allocated by bus_dma_tag_create (i.e., initial softc zeroes).

547–548

Not sure this is accurate; some versions of the device seem limited to 48 bits of physical address.

773–774

It seems like these might need to be conditional, unless unload without prior load is ok? E.g., ioat_blockfill only has a destination address; there is no user source address to map.

1085–1088

This isn't generally valid. Again, the example is ioat_blockfill. Here, src_addr is overloaded to mean "64-bit block pattern constant." Mapping it is nonsensical.

Similarly, ioat_crc doesn't use dst and passes 0.

That probably means we should add additional flags for ioat_op_generic about whether src/dst represent actual addresses and conditionalize mapping based on those?

1194–1201

These will also need to be mapped (potentially 2 srcs and 2 dsts per op).

tychon marked 6 inline comments as done.
sys/dev/ioat/ioat.c
547–548

I'm going to assume that whatever DMA address the IOMMU assigns are going to be constrained by the same limitations as this device. While that too may not be accurate, intuitively it seems a better approach than perhaps imposing an arbitrary constraint.

sys/dev/ioat/ioat.c
547–548

I think one of us should dig out the arbitrary constant from the NDA ioat reference documents and just use the right constant rather than assuming it matches the IOMMU :-).

cem requested changes to this revision.Apr 1 2019, 11:39 PM

Thanks, this addresses a lot of my concerns. I have a few more suggestions / problems to point out. I really appreciate that someone's fixing this driver to use busdma correctly. Certainly when I committed it into the tree I didn't really understand busdma well enough to do it myself, and then never found the motivation after that.

sys/dev/ioat/ioat.c
364

This isn't good enough, unfortunately. dmamap can be both valid and NULL (e.g., when bounce DMA is used on x86, but a bounce mapping is not needed). The dmamap still needs to be destroyed (reference count release) and the rest of the cleanup logic needs to proceed.

595–596

Could we just reuse dst2 for this one?

1194

This optimization is not valid with iommu (or really busdma in general) -- only the first page is mapped and the mapping is only one page long. We can always use src/dst_page_break and map src2/dst2 unconditionally, or we could somehow attempt to handle this more carefully for busdma.

1275–1283

Any chance it makes sense to change this common snippet of repeated logic into a subroutine (or macro+function, to automatically insert __func__)?

This revision now requires changes to proceed.Apr 1 2019, 11:39 PM

This revision includes setups up the '48-bit' DMA address constraint for the 'data' tag and the '40-bit' DMA address constraint for the 'crc data' tag.

This should close out the feedback I received prior to today and next I will address the feedback that I see has just arrived. Expect at least one more revision if you'd like to simply review it in it's entirety.

sys/dev/ioat/ioat.c
548

We define a BUS_SPACE_MAXADDR_40BIT for this you can use :-)

I think I've addressed at least many an attempt to address all outstanding feedback now. The diff is updated.

sys/dev/ioat/ioat.c
595–596

No, alas, the address width requirements are different.

1275–1283

I tried. Not sure if it's any better.

Looks good to me in the happy case and in the ordinary / weird DMA operations. I think teardown still isn't quite clean in the case where attach fails at a partially initialized ring.

sys/dev/ioat/ioat.c
365

This may still (bogusly) release references if there was a failure partway through initialization of the ring and bouncebuf dma is used.

This revision is now accepted and ready to land.Apr 2 2019, 6:12 PM
This revision was automatically updated to reflect the committed changes.
tychon marked 2 inline comments as done.