
iflib: Drop tx lock when freeing mbufs using simple_transmit
Needs Review · Public

Authored by gallatin on Tue, Dec 23, 10:21 PM.
Details

Reviewers
shurd
erj
glebius
markj
kgalazka
kbowling
Group Reviewers
iflib
Summary

Freeing completed transmit mbufs can be time-consuming (the mbufs are
cold in cache, and their ext free routines may take locks), especially
when tx completions are batched. Doing this while holding the tx ring
mutex causes lock contention on that mutex when using
iflib_simple_transmit.

To resolve this, this patch opportunistically copies completed mbuf
pointers into a new array (ifsd_m_defer) so they can be freed after
dropping the transmit mutex. The ifsd_m_defer array is used
opportunistically and may be NULL; if it's NULL, mbufs are freed
inline as before. The array pointer is atomically nulled while a
thread is using it, and atomically restored when the freeing thread is
done with it. Using atomics here avoids re-acquiring and releasing the
tx lock just to restore the array after freeing the mbufs.
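
Roughly, the claim/free/restore sequence looks like the sketch below.
This is illustrative only: apart from ifsd_m_defer, the structure
layout, names, and signatures are assumptions rather than the actual
patch, though the atomics are FreeBSD's standard atomic(9) primitives.

/*
 * Sketch only: apart from ifsd_m_defer, the names and layout here
 * are illustrative assumptions, not the actual patch.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <machine/atomic.h>

struct sketch_txq {
	struct mtx	  ift_mtx;	/* tx ring mutex */
	struct mbuf	**ifsd_m;	/* per-descriptor mbufs */
	struct mbuf	**ifsd_m_defer;	/* NULL while claimed */
	int		  ift_size;	/* ring size, power of 2 */
};

/* Called with the tx mutex held; drops it before returning. */
static void
sketch_tx_reclaim(struct sketch_txq *txq, int cidx, int ndesc)
{
	struct mbuf **defer, *m;
	int i, idx, n;

	mtx_assert(&txq->ift_mtx, MA_OWNED);

	/*
	 * Opportunistically claim the defer array.  Reading NULL
	 * means another thread owns it (or the feature is off), in
	 * which case we fall back to freeing under the lock.
	 */
	defer = (struct mbuf **)atomic_swap_ptr(
	    (volatile uintptr_t *)&txq->ifsd_m_defer, (uintptr_t)NULL);

	for (i = 0, n = 0; i < ndesc; i++) {
		idx = (cidx + i) & (txq->ift_size - 1);
		m = txq->ifsd_m[idx];
		txq->ifsd_m[idx] = NULL;
		if (m == NULL)
			continue;
		if (defer != NULL)
			defer[n++] = m;		/* free after unlock */
		else
			m_freem(m);		/* old inline path */
	}

	mtx_unlock(&txq->ift_mtx);

	if (defer != NULL) {
		while (n-- > 0)
			m_freem(defer[n]);
		/*
		 * Release store: our frees complete before the array
		 * becomes visible to the next reclaiming thread.
		 */
		atomic_store_rel_ptr(
		    (volatile uintptr_t *)&txq->ifsd_m_defer,
		    (uintptr_t)defer);
	}
}

The swap means at most one thread owns the defer array at a time; a
second concurrent reclaimer simply sees NULL and takes the inline
path, so no extra locking is needed.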

Since we're no longer always freeing mbufs inline, peeking into one to
see whether its transmit used TSO would cause a useless cache miss, as
nothing else in the mbuf is likely to be accessed soon. To avoid that
cache miss, we encode a TSO/non-TSO flag in the low bits of the mbuf
pointer stored in the ifsd_m array. Note that the IFLIB_NO_TSO flag
exists primarily for sanity checking and debugging.
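
The encoding is the conventional low-bit pointer tag: mbufs are at
least pointer-aligned, so the low bit of a valid mbuf pointer is
always zero. A sketch, with illustrative names (only the technique
itself is from the patch):

/* Sketch: names are illustrative; only the technique is from the patch. */
#include <sys/param.h>
#include <sys/mbuf.h>

#define	SKETCH_M_TSO	((uintptr_t)0x1)	/* low tag bit */

static inline struct mbuf *
sketch_m_encode(struct mbuf *m, bool tso)
{
	return ((struct mbuf *)((uintptr_t)m |
	    (tso ? SKETCH_M_TSO : 0)));
}

static inline struct mbuf *
sketch_m_decode(struct mbuf *stored, bool *tso)
{
	*tso = ((uintptr_t)stored & SKETCH_M_TSO) != 0;
	return ((struct mbuf *)((uintptr_t)stored & ~SKETCH_M_TSO));
}

At reclaim time the tag tells the caller whether the transmit used
TSO, presumably so the right busdma resources (see the busdma note in
the comments below) can be selected without dereferencing the mbuf.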

iflib_completed_tx_reclaim() was refactored to break out
iflib_txq_can_reclaim() and _iflib_completed_tx_reclaim(),
so that the tx routine can call iflib_tx_credits_update()
just once, rather than twice.
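
The split is roughly the following shape (a sketch against iflib's
internal types, where iflib_txq_t is private to sys/net/iflib.c; the
signatures are assumptions, and only the function names and the
single-credits-update idea come from the patch):

/* Sketch: signatures are assumptions; only the split is from the patch. */
static bool
iflib_txq_can_reclaim(iflib_txq_t txq, int thresh, int credits)
{
	return (credits >= thresh);
}

static int
_iflib_completed_tx_reclaim(if_ctx_t ctx, iflib_txq_t txq, int credits)
{
	/* ... free the mbufs for `credits` completed descriptors ... */
	return (credits);
}

static int
iflib_completed_tx_reclaim(if_ctx_t ctx, iflib_txq_t txq, int thresh)
{
	/* One credits update shared by the check and the reclaim. */
	int credits = iflib_tx_credits_update(ctx, txq);

	if (!iflib_txq_can_reclaim(txq, thresh, credits))
		return (0);
	return (_iflib_completed_tx_reclaim(ctx, txq, credits));
}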

Note that deferred mbuf freeing is not enabled by default, and can be
enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.
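For example, enabling it on a hypothetical unit 0 of an "mce" driver
would look like sysctl dev.mce.0.iflib.tx_defer_mfree=1 (the device
and unit names here are illustrative).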

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I've only been able to test this on amd64, and I'm pretty bad at atomic acq/rel semantics. If somebody could review that, I'd appreciate it.

Have you tried chaining mbufs instead of having an array? Does that have negative cache effect?

How big is the array? Can it be placed on stack achieving same level of batching?

> Have you tried chaining mbufs instead of having an array? Does that have negative cache effect?

Yes; chained mbufs cause a cache miss for each mbuf when queuing, and again when dequeuing.

> How big is the array? Can it be placed on stack achieving same level of batching?

The array is the same size as the tx ring, so that we can free an entire ring's worth of transmits at once if needed.
Iflib supports ring sizes up to 32k entries, so that's 32k * 8 bytes per pointer, or 256KB. That's too big for the stack, I think.

I don't like the atomics either, but I'm fairly sure that is the cheapest way to do this.

What really bugs me is the mess with the different busdma resources for TSO and non-TSO. No driver I've ever written has needed this. I'd like to clean it up, but there are so many drivers using iflib, and I'm sure that at least one must actually need it.