- Pass through a DMA tag for the controller.
- In theory the controller can access the lower 3 GB, but in my testing I found that unreliable. OpenBSD also restricts DMA to the lowest 1 GB.
- Rename some constants to be a bit more meaningful.
- Minor changes to comments that were no longer accurate.
- rS365677: bcm2838_pci.c: Respect DMA limits of controller.
Tested on my 8 GiB Pi4.
- Multiple readers and writers to USB media (md5 of multiple large [> 10 GB] files while copying other files onto the device).
- With this bus tag in place, the checksum is preserved. Without, the checksums become garbled.
I shall ask for additional testers from the mailing list.
Basic functionality also tested on a 2 GiB Pi4.
My understanding is that different devices have different DMA limits. XHCI has 3 GiB but others have 1 GiB. There is not one DMA controller enforcing one limit common to all contexts but multiple separate limits. Using the smallest limit for all devices may well be an alternative but that is not what the ACPI tables describe (from what I read on the lists).
There is only one device on the PCI express bus: the xhci controller. This patch only affects devices attached to the PCI-E bus. It is possible to desolder the xhci controller and attach alternative devices, but I haven't tried that.
I have experimented extensively with a 3 GB limit, and it is better than no limit but there is corruption on the xhci DMA. I am very happy if there is a working alternative to this patch: it is probable that there is some further nuance of configuration around the PCI-e DMA that I do not know.
The 0x3c000000 limit is taken from OpenBSD.
Thanks for the patch! Not my area of expertise, but here are some superficial details: limiting DMA to a certain memory region is necessary because the RPi4 ships with more than 1 GB of memory in some configurations, but uses DMA translation for the platform devices in a way that puts that memory out of reach for 32-bit DMA (i.e., the DMA translation window is 3 GB). So there are two possibilities: set the device's DMA limit to a restricted host address limit (1 GB, as done in this patch) rather than MAX_UINT32, or add a bounce-buffering implementation for any buffers that might be mapped outside that region.
A quote I found about this was: "OpenBSD can deal with the 3GB limit. In fact it imposes a 1GB DMA limit because there are additional DMA restrictions for the SD controller." Sounds like their design uses (needs?) the same figure for both, and so picked the smaller limit that fits both. (Mark Kettenis, Aug 24, 2020, 4:29am, "Re: Discuss UEFI settings in arm64.html/INSTALL.arm64") I've not found any evidence of such a limit for NetBSD (not that I'm good at finding such things).
Also: I just found a July verbose boot log for NetBSD on a RPi4. It reports:
[ 1.0000030] acpi0: \_SB_.SCB0.XHC0: DMA sys 0-0xbfffffff bus 0-0xbfffffff
[ 1.0000030] acpi0: \_SB_.SCB0.XHC0: DMA sys 0-0xbfffffff bus 0-0xbfffffff
. . .
So it reports a 3 GiB range. The other devices all listed:
. . . DMA sys 0-0x3fffffff bus 0xc0000000-0xffffffff
For reference: https://dmesgd.nycbug.org/index.cgi?do=view&id=5562
. . . And now I've figured out how to get that output in my boots of NetBSD on the RPi4B's (boot . . . -x). The other devices listed with DMA address ranges are prefixed with GDV0. instead of SCB0.: USB0, GPU0, RPIQ, VCIQ, VCSM, GPI0, I2C1, I2C2, SPI0, SPI1, PWM0, URT0, URTM, BTH0, RHPX, SDC1, SDC1.SDMM, SDC2, and SDC2.SDMM. They all seem to list 2 lines, like XHC0 did. There are no other DMA address ranges shown for anything else.
FYI: Something that I do not see represented in the NetBSD debug information is that DMA channel 11 (of DMA4 engine type) is the only one explicitly noted in rpi_DATA_2711_1p0.pdf as "additionally able to access the PCIe interface".
(Inline comment on line 643, Diff #76745)
Note: from a RPi4B:
/usr/include/machine/bus.h:#define BUS_SPACE_MAXSIZE 0xFFFFFFFFFFFFFFFFUL
bus_dma(9) reports for boundary and maxsegsz:
boundary: The boundary must be a power of 2 and must be no smaller than the maximum segment size. `0' indicates that there are no boundary restrictions.
maxsegsz: Maximum size, in bytes, of a segment in any DMA mapped region associated with dmat.
Unless bus_get_dma_tag(dev) returns something that already has sufficiently restrictive values for the device that will constrain what is specified here, the above reads to me like the bus_dma_tag_create call allows for segments too large to fit the context. If there are such implicit constraints via bus_get_dma_tag(dev), should it be commented on in the code?
Given what I've seen about the 5.8 mainline Linux kernel status for the RPi4 and that Fedora just branched 33 from Rawhide (for a late Oct. release), I decided to try installing a Fedora 33 branch server and UEFI/ACPI v1.20 and doing a "yum update" on the booted result. (USB3 SSD based, no microsd card involved.) That ended up with a 5.8.7-300.fc33.aarch64 kernel and it sees all 8GiB of RAM. I've tried the huge file copy and diff type of test and it has not failed yet. (The files are each bigger than the RAM.) (I've not noticed other issues but that likely does not imply much.) For reference, the boot reported the following "Zone ranges":
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x00000001ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1fefec8c0-0x1ff002fff]
[ 0.000000] Zone ranges:
[ 0.000000]   DMA      [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000]   DMA32    [mem 0x0000000040000000-0x00000000ffffffff]
[ 0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
[ 0.000000]   Device   empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000]   node 0: [mem 0x0000000000000000-0x00000000001fffff]
[ 0.000000]   node 0: [mem 0x0000000000200000-0x00000000337d7fff]
[ 0.000000]   node 0: [mem 0x00000000337d8000-0x000000003387ffff]
[ 0.000000]   node 0: [mem 0x0000000033880000-0x000000003390ffff]
[ 0.000000]   node 0: [mem 0x0000000033910000-0x000000003398ffff]
[ 0.000000]   node 0: [mem 0x0000000033990000-0x00000000339affff]
[ 0.000000]   node 0: [mem 0x00000000339b0000-0x0000000033a2ffff]
[ 0.000000]   node 0: [mem 0x0000000033a30000-0x0000000036efffff]
[ 0.000000]   node 0: [mem 0x0000000036f00000-0x00000000372dffff]
[ 0.000000]   node 0: [mem 0x00000000372e0000-0x000000003b2fffff]
[ 0.000000]   node 0: [mem 0x0000000040000000-0x00000000fbffffff]
[ 0.000000]   node 0: [mem 0x0000000100000000-0x00000001ffffffff]
[ 0.000000] Zeroed struct page in unavailable ranges: 552 pages
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000001ffffffff]
What OpenBSD did for the size of the DMA area for XHCI looks to be unusual (relative to NetBSD and Linux). But the next-to-last "node 0:" line above does show that the whole DMA32 range is not used. (I do not know why they had it do that.)
Mark, my estimation in short: this patch is a must-have, respecting DMA limits predefined by the hardware design (or better said: by a hardware design bug). Since file corruption is a no-go, the only thing left to do here, as requested by an expert reviewer, is to rename BUS_SPACE_DMA_MAXADDR to BUS_SPACE_DMA_bcm2838_MAXADDR or similar... and of course to click the commit button :-)
Expanding to 3 GB via a bounce-buffer implementation is a nice-to-have for later, which needs further research (or discussion on the mailing list, not here), not only about ACPI and OpenBSD/NetBSD/Linux but especially about how or whether U-Boot has dealt with it in its latest revisions. IIRC https://reviews.freebsd.org/p/kevans/ has recommended handling these things a bit more consistently, e.g. in sysutils/u-bootxxx, which I would hereby underline again. By the way, thanks, Mark, for hunting this bug without mercy :-)
As I have just reported on the arm list, the checked-in patch fails the huge file duplicate and diff/cmp test. xhci DMA is still a problem.
(I finally got things updated past head -r363590 to head -r365932 and finally re-ran the test in the new context.)
FYI: sys/arm64/arm64/busdma_machdep.c 's common_bus_dma_tag_create has:
common->lowaddr = trunc_page((vm_paddr_t)lowaddr) + (PAGE_SIZE - 1);
common->highaddr = trunc_page((vm_paddr_t)highaddr) + (PAGE_SIZE - 1);
and so forces reference to the last byte of the page identified by lowaddr (and highaddr). Thus lowaddr needs to identify the last page that can avoid bouncing (or earlier), not the first page that needs bouncing.
arm64, arm, powerpc, riscv, mips, and x86 all use the trunc_page and PAGE_SIZE-1 computation. I do not remember a hint of that from the documentation for bus_dma_tag_create.
Turns out that basing the code on subtracting 1 from 0xc0000000u for DMA_HIGH_LIMIT works without needing to force dma_bits to 32 in the bcm2838_xhci.c code.
(I'm unsure of the old #define pair vs. the newer material below. 0x21-0xf=0x12 instead of 0x11, for example. But I'm not PCIe literate to know which is better. I used the old pair in my testing the adjusted DMA_HIGH_LIMIT.)
By the way: the edk2 toolkit uses the 0x11 figure instead of the below 0x12 figure (0x21-0xf). So my list-posted patch that also reverted to your earlier REG_VALUE_4GB_WINDOW and REG_VALUE_4GB_CONFIG use matches what they did.
So far, I've not seen any evidence of problems with what I did.