iommu_get_requester(): make it more resilient against arbitrary dev arg
ClosedPublic
Actions

Authored by kib on Sep 5 2025, 11:11 PM.

Details

Reviewers

jhb
jah

Commits

rGc745a6818bcb: iommu_get_requester(): make it more resilient against arbitrary dev arg

Summary

If passed the parent of a device instead of the device, the loop might
end up with the host bridge in the pci local variable.  If the passed
device is not from the pci hierarchy, any of the calculated parents
might be NULL.

Instead of asserting, just issue a message and return the last guess of
the requester.

PR:     289318

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Sep 5 2025, 11:11 PM

Herald added a subscriber: imp. · View Herald TranscriptSep 5 2025, 11:11 PM

kib requested review of this revision.Sep 5 2025, 11:11 PM

kib added a reviewer: jah.Sep 6 2025, 8:56 AM

Would it be better to handle the bug in the PR with a few extra checks in the calling DEVICE_SYSCTL_IOMMU code? That would also allow the sysctl to return some string in the sbuf to clearly indicate that the RID can't be determined or is N/A.

For the original (busdma) intended use of iommu_get_requester(), I'd prefer to err on the side of panicking if e.g. a driver messes up and passes a bogus dev instead of just returning the original dev only to later get IOMMU faults or some other unexpected behavior.

In D52406#1196858, @jah wrote:

Would it be better to handle the bug in the PR with a few extra checks in the calling DEVICE_SYSCTL_IOMMU code? That would also allow the sysctl to return some string in the sbuf to clearly indicate that the RID can't be determined or is N/A.

Ok, but I do not quite understand how can I check that without actually climbing up the device newbus hierarchy to see where we end. Do you have a proposal?

May be, the interface of iommu_get_requester() should be changed to indicate the failure of translation, so that instead of rather arbitrary 'requester', my patch would be able to return an error.

For the original (busdma) intended use of iommu_get_requester(), I'd prefer to err on the side of panicking if e.g. a driver messes up and passes a bogus dev instead of just returning the original dev only to later get IOMMU faults or some other unexpected behavior.

I hadn't thought about it very much at all TBH, but the 2 very rough ideas I had were either to factor out the newbus walk into a common (gracefully failing) helper that could be used by both iommu_get_requester() and the sysctl handler, or to do as you suggest and change iommu_get_requester() to return an error.

If it's not too much work to adapt the other callers to handle the error, I think I would prefer the second option, as it would have the added benefit of producing a (hopefully) clear failure on non-INVARIANTS builds.

Change the interface to allow the caller to receive error.
Instead of asserting, just issue a message and return ENXIO,
allowing the caller to select appropriate action.

Herald added a subscriber: olce. · View Herald TranscriptSep 7 2025, 7:34 AM

Do not print garbage (zero) rid when IOMMU is not configured.

jah added inline comments.Sep 7 2025, 5:18 PM

sys/dev/iommu/busdma_iommu.c
270	Should we KASSERT/MPASS here instead of returning NULL ? Would this call ever be expected to fail under normal operation?

kib added inline comments.Sep 8 2025, 3:15 AM

sys/dev/iommu/busdma_iommu.c
270	I think yes, it could fail, in principle. The iommu passed as an argument is coming from some external sources of information, e.g. DMAR table might have the glob 'INCLUDE_PCI_ALL' flag. Then, there is no guarantee that we were not called e.g. on the host bridge directly (for the dev argument), for whatever reasons. Might be newer PCIe std would allow for some data transfers initiated by upstream ports. I understand the motivation to assert that error == 0, but IMO since the patch grows the capability to accept and handle an error, this is much more flexible and error-resilient. Note that user is already getting the diagnostic, and that the device with the error is still the subject of translation by the iommu unit, so I doubt that it can work with busdma bouncing fallback.