Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, jhb's suggestion of limiting the range in the PCI DMA tag now seems
the best way to deal with it.
Details
- Reviewers: jhibbits, jhb, nwhitehorn, sbruno
- Commits: rS339589: ppc64: limited 32-bit DMA address range
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Not Applicable
- Unit: Tests Not Applicable
Event Timeline
This is running live on the host archon.nyi.freebsd.org that hosts ref12-ppc64.freebsd.org. We can do buildworlds and network things now. This is awesome.
I just rebased to today's HEAD from that of the 6th. This change makes my system unbootable. I haven't delved into it, but my guess is that I don't have enough contiguous memory below the limit (my system has 512GB).
Ok. This change should limit only the usable 32-bit range, to 2GB for each PCIe domain.
Is your system trying to use more than 2GB of 32-bit DMA in a given PCIe domain?
Are you using POWER8 or POWER9?
I'm not sure if this 2GB restriction applies to POWER9 as well. Judging by the skiboot code, it seems so.
What version of skiboot do you have on your system?
The problem is that the change applies to 64-bit devices too. It really needs to be restricted to PECs that have 32-bit devices attached.
Specifically, here's the problem that we're seeing on the Talos II:
Since DMA tags extend the window, devices that set lowaddr and highaddr to BUS_SPACE_MAXADDR get restricted to addresses below OPAL_PCI_BUS_SPACE_LOWADDR_32BIT. When the tag code factors in the parent's restriction, it takes the minimum of lowaddr and the maximum of highaddr, so the entire 64-bit address space is excluded, leaving just 0x0 to 0x7FFFFFFF as possible DMA space. That range also has to contend with other things, such as the kernel text, which generally ends up in the same physical area.
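To make the inheritance effect concrete, here is a minimal sketch of the min/max merge described above. The struct, function, and macro names are hypothetical stand-ins, not the kernel's definitions; the 0x7FFFFFFF limit is taken from the range quoted in this comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
#define SKETCH_MAXADDR       UINT64_MAX            /* plays the role of BUS_SPACE_MAXADDR */
#define SKETCH_LOWADDR_32BIT UINT64_C(0x7FFFFFFF)  /* top of the 0x0..0x7FFFFFFF range */

struct sketch_tag {
	uint64_t lowaddr;   /* memory above this is treated as excluded */
	uint64_t highaddr;  /* upper end of the exclusion window */
};

/*
 * Merge a child's restriction with its parent's: minimum of lowaddr,
 * maximum of highaddr, as described in the comment above.
 */
static struct sketch_tag
sketch_inherit(struct sketch_tag parent, struct sketch_tag child)
{
	struct sketch_tag merged = child;

	if (parent.lowaddr < merged.lowaddr)
		merged.lowaddr = parent.lowaddr;
	if (parent.highaddr > merged.highaddr)
		merged.highaddr = parent.highaddr;
	return (merged);
}
```

With a parent of { SKETCH_LOWADDR_32BIT, SKETCH_MAXADDR } and a 64-bit-capable child of { SKETCH_MAXADDR, SKETCH_MAXADDR }, the merged tag ends up with lowaddr at 0x7FFFFFFF, so the child loses everything above 2GB even though the device itself could address it.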
On POWER9, the controllers operate in 32-bit and 64-bit mode simultaneously. A full fix will require teaching the DMA tag code to handle multiple exclusion ranges, I imagine. Right now it's hardcoded to only allocate memory between 0x0 and lowaddr, and it ignores highaddr entirely.
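As a rough illustration of the difference, the sketch below contrasts the current single-limit check with a check against a list of exclusion ranges. None of this is existing busdma code: the type, the function names, and the example range are assumptions made only to show the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exclusion range, inclusive on both ends. */
struct sketch_excl {
	uint64_t start;
	uint64_t end;
};

/* Behaviour described above: only [0, lowaddr] is usable. */
static bool
sketch_ok_single(uint64_t addr, uint64_t lowaddr)
{
	return (addr <= lowaddr);
}

/* Possible alternative: usable unless inside some exclusion range. */
static bool
sketch_ok_multi(uint64_t addr, const struct sketch_excl *ex, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= ex[i].start && addr <= ex[i].end)
			return (false);
	}
	return (true);
}
```

Given a purely illustrative exclusion of 0x80000000 to 0xFFFFFFFF, the multi-range check would accept an address such as 0x100000000, which the single-limit check with lowaddr at 0x7FFFFFFF rejects.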
Additionally, setting lowaddr to an address below the end of physical memory forces bus_dmamem_alloc to always do a contig malloc, which adds memory pressure and increases the chance of failure. sys/powerpc/powerpc/busdma_machdep.c badly needs updating to handle things like multiple segment allocations and multiple exclusion ranges.
Also worth looking at:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/174985.html