If passed the parent of a device instead of the device, the loop might end up with the host bridge in the pci local variable. If the passed device is not from the pci hierarchy, any of the calculated parents might be NULL. Instead of asserting, just issue a message and return the last guess of the requester. PR: 289318
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Would it be better to handle the bug in the PR with a few extra checks in the calling DEVICE_SYSCTL_IOMMU code? That would also allow the sysctl to return some string in the sbuf to clearly indicate that the RID can't be determined or is N/A.
For the original (busdma) intended use of iommu_get_requester(), I'd prefer to err on the side of panicking if e.g. a driver messes up and passes a bogus dev instead of just returning the original dev only to later get IOMMU faults or some other unexpected behavior.
Ok, but I do not quite understand how can I check that without actually climbing up the device newbus hierarchy to see where we end. Do you have a proposal?
May be, the interface of iommu_get_requester() should be changed to indicate the failure of translation, so that instead of rather arbitrary 'requester', my patch would be able to return an error.
For the original (busdma) intended use of iommu_get_requester(), I'd prefer to err on the side of panicking if e.g. a driver messes up and passes a bogus dev instead of just returning the original dev only to later get IOMMU faults or some other unexpected behavior.
I hadn't thought about it very much at all TBH, but the 2 very rough ideas I had were either to factor out the newbus walk into a common (gracefully failing) helper that could be used by both iommu_get_requester() and the sysctl handler, or to do as you suggest and change iommu_get_requester() to return an error.
If it's not too much work to adapt the other callers to handle the error, I think I would prefer the second option, as it would have the added benefit of producing a (hopefully) clear failure on non-INVARIANTS builds.
Change the interface to allow the caller to receive error. Instead of asserting, just issue a message and return ENXIO, allowing the caller to select appropriate action.
sys/dev/iommu/busdma_iommu.c | ||
---|---|---|
270 | Should we KASSERT/MPASS here instead of returning NULL ? |
sys/dev/iommu/busdma_iommu.c | ||
---|---|---|
270 | I think yes, it could fail, in principle. The iommu passed as an argument is coming from some external sources of information, e.g. DMAR table might have the glob 'INCLUDE_PCI_ALL' flag. Then, there is no guarantee that we were not called e.g. on the host bridge directly (for the dev argument), for whatever reasons. Might be newer PCIe std would allow for some data transfers initiated by upstream ports. I understand the motivation to assert that error == 0, but IMO since the patch grows the capability to accept and handle an error, this is much more flexible and error-resilient. Note that user is already getting the diagnostic, and that the device with the error is still the subject of translation by the iommu unit, so I doubt that it can work with busdma bouncing fallback. |