Page MenuHomeFreeBSD

LinuxKPI: skbuff: adjust to updated malloc/contigmalloc and free(9).
Needs ReviewPublic

Authored by bz on Jun 30 2024, 7:52 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 3 2025, 6:50 PM
Unknown Object (File)
Mar 1 2025, 7:05 PM
Unknown Object (File)
Jan 28 2025, 6:13 PM
Unknown Object (File)
Jan 28 2025, 2:32 AM
Unknown Object (File)
Jan 21 2025, 6:58 PM
Unknown Object (File)
Jan 5 2025, 4:07 AM
Unknown Object (File)
Dec 18 2024, 7:21 AM
Unknown Object (File)
Dec 7 2024, 3:17 AM

Details

Reviewers
jhb
Summary

Simplify the code using the extended contigmalloc(9) which means we
(a) no longer have to remember our allocation size, and (b) we can
call free(9) independent of the allocator (malloc or contigmalloc).

This will likely also allow us to make the TUNABLE a SYSCTL in the
future, allow drivers to change it on the fly (at least lowering the
limit) and with that removing the need for user adjustments.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 63200
Build 60084: arc lint + arc unit

Event Timeline

bz requested review of this revision.Jun 30 2024, 7:52 PM

s/siplify/simplify/ in the commit message

sys/compat/linuxkpi/common/src/linux_skbuff.c
103

@jhb This should likely be len > PAGE_SIZE only and not >= ?

sys/compat/linuxkpi/common/src/linux_skbuff.c
103

I don't understand why we will need this tunable at all once kmalloc() and friends are physically contiguous? I guess what you really need is a way for a driver to actually use bus_dmamem_alloc under the hood instead of kmalloc() for things like descriptor rings?

sys/compat/linuxkpi/common/src/linux_skbuff.c
103

Because ...^%$%^&*?

From all I could say DMA only worked with the lower 32bit on realtek rtw88 and they did not even set that "hint" on Linux in the rtw88 driver when I first ported it over.
For rtw89 I think they had it then... and it's 32bit (unless I added that too).

QCA/Atheros is very picky on coming out of reset and who knows what, ...
sys/contrib/dev/athk/ath10k/pci.c- /* Target expects 32 bit DMA. Enforce it. */
sys/contrib/dev/athk/ath10k/pci.c: ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

ath11k is 36 bits

ath12k is sys/contrib/dev/athk/ath12k/pci.c:#define ATH12K_PCI_DMA_MASK 32

Mediatek is all 32bit but the mt7996 chipset which is 36bit.

For Intel I found one place so far where they used a KPI which didn't "inherit and adhere to" the limit they give for the chipset; they have different limits depending on chipset generation (later ones are fine, older ones are 36 or 32 bit DMA too);

if (trans->trans_cfg->gen2) {
        trans->txqs.tfd.addr_size = 64;
... } else { ...
        trans->txqs.tfd.addr_size = 36;
}

addr_size = trans->txqs.tfd.addr_size;
ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(addr_size));
if (ret) {
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

The problem remains that even if physically contig you then still need to bounce because of that limit and most of them not always using the proper KPIs on Linux (I assume); and you still do not get large enough contig segments at least on arm64 for that.

Completely rewriting the drivers to busdma means we'd half port them over natively and that's a pain to maintain too -- at least for now; I added a hack back then to LinuxKPI to be able to use bus_read/write as a start by adding a 1-liner to a driver and that was an okay-ish tradeoff...

As seen by Intel, it's not that the drivers couldn't do the right thing ... I am just in no place to *NOW* deal with all their internals while too many other things are pressing. In the future having a proper abstraction for these things like we do for Ethernet and have a foo_linux.c and a foo_freebsd.c would be awesome; we are just in no place yet to get there.

But I'd get rid of the TUNABLE/SYSCTL in the future and at least let the driver express (set -- as in lower if needed) that limit (or rather let LinuxKPI do that internally given they do call dma_set_mask() or dma_set_mask_and_coherent().

103

The real question was - I should adjust the boundary check, right?

So (ignoring realloc) I am running the kmalloc changes with this and the entire tunable/private malloc #if 0 here using __kmalloc() in my branch.

By the understanding of it all being a single segment physically contiguous allocation now, busdma bouncing with nseg=1 should be happy if need be too.

This is an amd64 machine with 4GB of memory so should be fine anyway. Or so I thought. rtw88 is badly unhappy and:
compat.linuxkpi.lkpi_pci_nseg1_fail: 11

I assume I'll do realloc and see again or we are lacking yet another something ...

In D45813#1129990, @bz wrote:

So (ignoring realloc) I am running the kmalloc changes with this and the entire tunable/private malloc #if 0 here using __kmalloc() in my branch.

Makes little sense; it's a skb_dev_alloc call which directly calls into __kmalloc now with D45813 applied.
This is (in rtw88/linuxkpi):
rtw_pci_init_rx_ring() -> dev_alloc_skb() -> rtw_pci_reset_rx_desc() -> dma_map_single() -> linux_dma_map_phys() -> linux_dma_map_phys_common -> _bus_dmamap_load_phys().
Thankfully I added the dump_stack() there with debugging turned on. Made this easier to look up.

If I am not mistaken that means we end up in bounce_bus_dmamap_load_phys() from sys/x86/ .. but why?

sys/contrib/dev/rtw88/pci.c: ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
sys/contrib/dev/rtw88/pci.c: ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));

rtw_pci_setup_resource() first calls into rtw_pci_io_mapping() doing the above dma_set_mask() before calling rtw_pci_init() which eventually will end up in rtw_pci_init_rx_ring().

Right and here's the obvious answer why we may bounce because there's a chunk above 32bit:

[1.000026] real memory = 4294967296 (4096 MB)
[1.000027] Physical memory chunk(s):
[1.000028] 0x0000000000010000 - 0x000000000009bfff, 573440 bytes (140 pages)
[1.000029] 0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
[1.000030] 0x0000000002602000 - 0x00000000c7fa0fff, 3315200000 bytes (809375 pages)
[1.000031] 0x00000000cfc00000 - 0x00000000cfd4ffff, 1376256 bytes (336 pages)
[1.000032] 0x0000000100000000 - 0x000000012eb9ffff, 783941632 bytes (191392 pages)
[1.000033] avail memory = 4084629504 (3895 MB)

I should verify that the requested physical address is indeed from the last chunk, and where the _bus_dmamap_load_phys() call fails and why.

But it seems I am back to square one with the bounce code not being able to fullfill the request?

Makes me wonder but the skbuff data bits come out of "kmalloc" if I understood correctly; we just also get the header and other bits out of the same allocation rather than using an UMA zone for the header and attaching the data so far.

In D45813#1129996, @bz wrote:

I should verify that the requested physical address is indeed from the last chunk, and where the _bus_dmamap_load_phys() call fails and why.

But it seems I am back to square one with the bounce code not being able to fullfill the request?

[419.292946] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230bc140 len 11478
[419.964686] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230c0140 len 11478
[420.234643] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230c0140 len 11478
[420.835826] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230c0140 len 11478
[421.046703] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230c0140 len 11478
[421.355133] rtw880: linux_dma_map_phys_common:1482: _bus_dmamap_load_phys error 27, phys 0x00000001230c0140 len 11478

EFBIG

@jhb probably not as complicated to fix on x86 but there's an old incarnation around at D31823. Not sure if I kept the updated code somewhere. bouncing remains the problem or keeping the tunable.

Update the code using #ifdefs to hide the old implementation.

LinuxKPI: skbuff: disable stunable for malloc restrictions

With kmalloc et all platying by the rules of giving us one contiguous
physical section of memory even if we have to bounce we should be able
to given nseg=1 should work.

In practise we still see nseg=1 failures so keep the current code

Sponsored by:   The FreeBSD Foundation
MFC after:      3 days
Gratitude to:   jhb for all the help sorting all the malloc bits
Differential Revision: https://reviews.freebsd.org/D45813
bz added inline comments.
sys/compat/linuxkpi/common/src/linux_80211.c
6359

This bit I could probably simply commit w/o this review so it's out of the way.