
LinuxKPI: SKBuff: remove DMA32/36 workaround tunable
Needs Review · Public

Authored by bz on Apr 24 2025, 12:02 AM.
Tags
None

Details

Reviewers
jhb
Summary

In theory, with the LinuxKPI allocation code now providing
physically contiguous memory, we should be fine and can remove
the entire compat code.

For this to happen, busdma bounce needs to be able to bounce multiple
contiguous pages with nseg=1. With this code removed, the LinuxKPI debug
counter for failed mappings rises quickly. See the discussion in
D45813 for more information.

Older Intel cards do have a 36-bit DMA limit, and rtw88 seems to still
have a 32-bit limit despite being PCIe.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 63687
Build 60571: arc lint + arc unit

Event Timeline

bz requested review of this revision. Apr 24 2025, 12:02 AM
adrian added a subscriber: adrian.

yeah, lots of PCIe wifi hardware has a 36-bit DMA limit; it saves on gates :-P

This revision is now accepted and ready to land. Apr 24 2025, 12:46 AM
This revision now requires review to proceed. Apr 24 2025, 7:38 AM

So linuxkpi_skb_memlimit is used both to provide contiguous > PAGE_SIZE allocations and to limit allocations to 32- or 36-bit addresses; what do we do about the latter? How does rtw88 on Linux handle this?

I believe (not 100% sure) that Linux tries to get DMA memory by default from the lower 32 bits.

We have everything in place so that we could bounce, BUT there is the 1-segment limit (if the drivers don't use the S/G interface, which none of rtw8x, ath10k, or mt76 do; mt76 with page pools being a different story now). Our busdma framework fails to provide contiguous bounce pages for the single segment, especially if you need 3 or 4 as is sometimes needed. Observed recently with mt76 before I added a workaround (which, I believe, will become obsolete in Linux v6.19): https://reviews.freebsd.org/D54061
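
To make the constraint concrete, here is a hedged sketch (names hypothetical, not actual LinuxKPI code) of the kind of tag these drivers effectively end up with: nsegments=1 plus a 36-bit lowaddr means that mapping any multi-page buffer above the limit requires busdma to produce one physically contiguous bounce region.

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define EXAMPLE_MAXADDR_36BIT   ((bus_addr_t)((1ULL << 36) - 1))

/*
 * Hypothetical tag setup: one segment, addresses restricted to 36 bits.
 * Buffers above lowaddr must be bounced, and with nsegments=1 the bounce
 * pages backing a multi-page buffer must be physically contiguous.
 */
static int
example_create_tag(device_t dev, bus_size_t maxsize, bus_dma_tag_t *tagp)
{
        return (bus_dma_tag_create(
            bus_get_dma_tag(dev),       /* parent */
            1, 0,                       /* alignment, boundary */
            EXAMPLE_MAXADDR_36BIT,      /* lowaddr: bounce above this */
            BUS_SPACE_MAXADDR,          /* highaddr */
            NULL, NULL,                 /* filter, filterarg */
            maxsize, 1, maxsize,        /* maxsize, nsegments=1, maxsegsz */
            0, NULL, NULL,              /* flags, lockfunc, lockfuncarg */
            tagp));
}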

For rtw88 specifically, we even added the [explicit] DMA_BIT_MASK() calls, as they are lacking upstream (rtw89 does have them):

sys/contrib/dev/iwlwifi/pcie/gen1_2/trans.c:    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(addr_size));   // 36 or 64 bit
sys/contrib/dev/iwlwifi/pcie/gen1_2/trans.c:            ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/athk/ath11k/ahb.c:      ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/athk/ath11k/pci.c:      ret = dma_set_mask(&pdev->dev,                                          // #define ATH11K_PCI_DMA_MASK 36
sys/contrib/dev/athk/ath10k/ahb.c:      ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/athk/ath10k/pci.c:      ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/athk/ath10k/snoc.c:     ret = dma_set_mask_and_coherent(dev, drv_data->dma_mask);
sys/contrib/dev/athk/ath12k/pci.c:      ret = dma_set_mask_and_coherent(&pdev->dev,                            // #define ATH12K_PCI_DMA_MASK 32
sys/contrib/dev/rtw89/pci.c:    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(36));
sys/contrib/dev/rtw89/pci.c:            ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/rtw88/pci.c:    ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));                              // added by us
sys/contrib/dev/mediatek/mt76/mt7925/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt76x0/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7615/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7921/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7996/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(36));
sys/contrib/dev/mediatek/mt76/mt76x2/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7603/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7915/pci.c:     ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/mediatek/mt76/mt7915/mmio.c:    ret = dma_set_mask(wed->dev, DMA_BIT_MASK(32));
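
The pattern visible in the grep output above (rtw89, iwlwifi) is the standard Linux probe-time idiom: try the wider mask first and fall back to 32-bit. A minimal sketch of that idiom (function name and error handling are illustrative):

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Try 36-bit DMA first, fall back to the 32-bit default. */
static int example_set_dma_mask(struct pci_dev *pdev)
{
	int ret;

	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(36));
	if (ret) {
		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
		if (ret)
			dev_err(&pdev->dev, "no usable DMA mask\n");
	}
	return ret;
}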

If we really want to get rid of this we need to sort out busdma. How, I don't know yet. I tried three times over the years.

At this point, it would almost sound best if "driver brings its own bounce page pool" would work, as then we could avoid a more general implementation (which I did a few years ago but @jhb wasn't quite keen on, e.g., given busdma tries to use "hot" pages right away again, while I re-sorted them to keep them as contiguous as possible).

And I have to say I would love to get rid of the sysctl and hack.
My FW.16 has 96GB of memory, which exceeds the 36-bit (64GB) limit some of these cards support, so even there I have to set the sysctl.
In theory they are all supposed to support 64-bit DMA, but "lanes are expensive" as one of our fellow committers put it.

I mean, look, it's 2025 and the machines we're going to run these devices on have a huge amount of RAM compared to the 90s. Linux has always had the notion of low, high, very high, etc. pools for allocators, and their device drivers, via various malloc/etc incantations, end up using those behaviours implicitly. They have slowly churned on malloc flags, dev-based allocations, etc. to make this less error-prone, but it's still not guaranteed.

I honestly think we should just keep a hack like this in the tree and reserve a pool of pages to use for these allocations - which we do for bounce buffers, mind! - and feed them up like this. No one is going to notice 64MB of RAM in low 32-bit space and 64MB of RAM in 36-bit space if it buys us better-working linuxkpi network drivers. Heck, something storage/nvme related happily allocates RAM on my laptop like it's free in 2025:

nvme0: Allocated 64MB host memory buffer
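
For what it's worth, a rough sketch of the "reserve a pool" idea above using contigmalloc(9) (sizes and all names made up; a real version would want a freelist and per-mask accounting):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>

static MALLOC_DEFINE(M_EXPOOL, "expool", "example low-address DMA pool");

#define EXAMPLE_POOL_SIZE       (64 * 1024 * 1024)      /* 64MB, as mused above */

static void *example_pool_32;   /* physically below 4GB */
static void *example_pool_36;   /* physically below 64GB */

/* Hypothetical: called once early at boot, before memory fragments. */
static void
example_pool_init(void)
{
        /* Physically contiguous, page-aligned, below 2^32. */
        example_pool_32 = contigmalloc(EXAMPLE_POOL_SIZE, M_EXPOOL,
            M_WAITOK | M_ZERO, 0, ((vm_paddr_t)1 << 32) - 1, PAGE_SIZE, 0);

        /* Physically contiguous, page-aligned, below 2^36. */
        example_pool_36 = contigmalloc(EXAMPLE_POOL_SIZE, M_EXPOOL,
            M_WAITOK | M_ZERO, 0, ((vm_paddr_t)1 << 36) - 1, PAGE_SIZE, 0);
}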

I accepted this in the beginning because I thought we had all the machinery needed elsewhere for this not to be needed, but from a quick read today it doesn't look like we do.