Page MenuHomeFreeBSD

dmar: reserve memory windows of PCIe root port
ClosedPublic

Authored by rlibby on Dec 7 2020, 10:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 10:32 AM
Unknown Object (File)
Sun, Jan 19, 4:37 PM
Unknown Object (File)
Wed, Jan 8, 8:34 PM
Unknown Object (File)
Tue, Jan 7, 8:26 AM
Unknown Object (File)
Dec 7 2024, 5:27 AM
Unknown Object (File)
Dec 5 2024, 12:41 AM
Unknown Object (File)
Nov 26 2024, 1:00 PM
Unknown Object (File)
Nov 24 2024, 10:54 AM

Details

Summary

PCI memory address space is shared between memory-mapped devices (MMIO)
and host memory (which may be remapped by an IOMMU). Device accesses to
an address within a memory aperture in a PCIe root port will be treated
as peer-to-peer and not forwarded to an IOMMU. To avoid this, reserve
the address space of the root port's memory apertures in the address
space used by the IOMMU for remapping.

Test Plan

Internally we have tested this on a device where we use DMAR, and where
IOVA mappings were regularly intersecting the device BAR before this
patch.

tychon has/will boot systems on CURRENT with hw.dmar.enable=1 and some
set of devices enabled.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

So idea is to ensure that IOMMU mapping never aliases BAR, am I right ? It was impossible to infer it from the summary, but this seems to be what the code does.

sys/dev/iommu/iommu_gas.c
728 ↗(On Diff #80420)

I do not think this is good pattern. Either unstaticise iommua_gas_alloc_region(), or better, add some function like iommu_gas_reserve_region_locked() so that you do not drop/reacquire the domain lock between iteration and reserve.

sys/x86/iommu/intel_ctx.c
333 ↗(On Diff #80420)

I suggest to leave non-pci devices alone, and return 0 instead of EINVAL. Not that we support them ATM for memory translations, but we should start someday.

In D27503#614700, @kib wrote:

So idea is to ensure that IOMMU mapping never aliases BAR, am I right ? It was impossible to infer it from the summary, but this seems to be what the code does.

Apologies up front if I'm not using the right terminology.

The device BARs, yes, but not just the BARs. Essentially the intention is to prevent overlap with any host memory assignment to anything on the same PCIe root port. The reason is that otherwise my understanding is that PCI devices may (mis)interpret the IO virtual addresses as host addresses which they think they know how to route, and may intercept. My understanding is that they do not care whether the untranslated bit is set. Furhter complicating this is hotplug. So, the intention here is to take a big hammer to this, because we have a lot of IOVA space.

I'll work on trying to explain this better in the summary.

sys/dev/iommu/iommu_gas.c
728 ↗(On Diff #80420)

Okay I'll take another look at trying to make this nice. At this point the lock is somewhat moot because this is only intended to be used during the initialization, but I understand that that could become a problem if it is ever used beyond that initialization period. Incidentally the reason for the lock dropping is that we can't iommu_gas_alloc_entry under the mutex. Because RMRRs might exist, and my understanding is that it is not forbidden that they might exist within the memory windows, the reservation may need to be split, which means we may need to (pre)allocate multiple entries.

sys/x86/iommu/intel_ctx.c
333 ↗(On Diff #80420)

Okay, will do.

Implement feedback from kib. Don't drop the lock between determining
the free region and inserting an entry to cover it.

At first I tried to implement the feedback by counting how many entries
we would need, then preallocating them all, then rechecking, then
reserving. But then I realized that doing all the reservations under
one lock doesn't really buy us anything, since
iommu_gas_reserve_region_extend is not called with the lock anyway (i.e.
if we were racing with someone removing entries, they could just as well
do it immediately after we dropped our lock). However I can show that
code for discussion if desired.

Now instead what this does is preallocate just a single entry and then
with the lock compute the next free region and fill it in, before taking
a lock break and repeating.

comment and rename dmar_reserve_pci_regions

Update comment with suggestion from Tycho and Anton. The summary log
will also be updated with this text.

I've tested this latest iteration as well.

This revision is now accepted and ready to land.Dec 8 2020, 8:09 PM

New description is much cleaner, I believe I understand the purpose of the patch. Thanks.

sys/dev/iommu/iommu_gas.c
747 ↗(On Diff #80446)

Hm, you are still stepping to the next entry on each unlock.

Wouldn't it be more accurate to keep start constant, and fill the lowest hole after start and below end, on each iteration which takes the lock ?

sys/dev/iommu/iommu_gas.c
747 ↗(On Diff #80446)

I can definitely code that up if you prefer. Normally there shouldn't be that many segments, so the walk forward shouldn't be too expensive. However, I think ultimately it doesn't really make a difference to the meaning of the API, since the caller doesn't hold a lock anyway, as in it seems that whatever that would protect us from could just happen right after we dropped our lock the last time.

This revision was automatically updated to reflect the committed changes.