Page MenuHomeFreeBSD

iflib: Drop tx lock when freeing mbufs using simple_transmit
ClosedPublic

Authored by gallatin on Tue, Dec 23, 10:21 PM.
Tags
None
Referenced Files
F142308921: D54356.id169253.diff
Sun, Jan 18, 10:32 AM
F142275577: D54356.id169266.diff
Sun, Jan 18, 2:00 AM
F142261202: D54356.id.diff
Sat, Jan 17, 10:18 PM
F142258008: D54356.id169215.diff
Sat, Jan 17, 9:29 PM
F142253114: D54356.id.diff
Sat, Jan 17, 8:18 PM
Unknown Object (File)
Fri, Jan 16, 2:05 PM
Unknown Object (File)
Thu, Jan 15, 6:52 PM
Unknown Object (File)
Mon, Jan 12, 11:36 PM

Details

Summary

Freeing completed transmit mbufs can be time consuming (due to them
being cold in cache, and due to ext free routines taking locks),
especially when we batch tx completions. If we do this when holding
the tx ring mutex, this can cause lock contention on the tx ring mutex
when using iflib_simple_transmit.

To resolve this, this patch opportunistically copies completed mbuf
pointers into a new array (ifsd_m_defer) so they can be freed after
dropping the transmit mutex. The ifsd_m_defer array is
opportunistically used, and may be NULL. If its NULL, then we free
mbufs in the old way. The ifsd_m_defer array is atomically nulled
when a thread is using it, and atomically restored when the freeing
thread is done with it. The use of atomics here avoids
acquire/release of the tx lock to restore the array after freeing
mbufs.

Since we're no longer always freeing mbufs inline, peeking into them to see if a
transmit used TSO or not will cause a useless cache miss, as nothing
else in the mbuf is likely to be accessed soon. To avoid that cache
miss, we encode a TSO or not TSO flag in the lower bits of the mbuf
pointer stored in the ifsd_m array. Note that the IFLIB_NO_TSO flag
exists primarily for sanity/debugging.

iflib_completed_tx_reclaim() was refactored to break out
iflib_txq_can_reclaim() and _iflib_completed_tx_reclaim()
so the that the tx routine can call iflib_tx_credits_update()
just once, rather than twice.

Note that deferred mbuf freeing is not enabled by default, and can be
enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I've only been able to test this on amd64, and I'm pretty bad at atomic acq/rel semantics. If somebody could review that, I'd appreciate it.

Have you tried chaining mbufs instead of having an array? Does that have negative cache effect?

How big is the array? Can it be placed on stack achieving same level of batching?

Have you tried chaining mbufs instead of having an array? Does that have negative cache effect?

Yes; chained mbufs will cause a cache miss for each mbuf when queuing, and when de-queing.

How big is the array? Can it be placed on stack achieving same level of batching?

The array is the same size as the tx ring, so that we can free an entire ring's worth of transmits at once if needed.
Iflib supports up to 32k ring sizes, so that's 32k*8 or 256k. That's too big for the stack, I think.

I don't like the atomics either, but I'm fairly sure that is the cheapest way to do this.

What really bugs me is the mess with the different busdma resources for TSO and non-TSO. No driver I've ever written has needed this. I'd like to clean it up, but there are so many drivers using iflib and I'm sure that at least one must actually need this..

ziaee added a subscriber: ziaee.

Note that deferred mbuf freeing is not enabled by default, and can be

enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.

Can you put tx_defer_mfree in iflib(4) in this commit?

Note that deferred mbuf freeing is not enabled by default, and can be

enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.

Can you put tx_defer_mfree in iflib(4) in this commit?

I

Note that deferred mbuf freeing is not enabled by default, and can be

enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.

Can you put tx_defer_mfree in iflib(4) in this commit?

I opened a separate review to document this and the other undocumented things I've added recently: https://reviews.freebsd.org/D54564

  • rebased diff
  • added documentation for tx_defer_mfree sysctl

Manual LGTM, idk about the code.

This revision is now accepted and ready to land.Tue, Jan 6, 10:10 PM
FreeBSD/sys/net/iflib.c
7336 ↗(On Diff #169215)

This should be atomic_store, not atomic_set. It happens to work because the current value of the pointer is 0, but it's logically wrong and more expensive than necessary.

  • use atomic_store rather than atomic_set, as per @markj 's feedback
  • update man page date
This revision now requires review to proceed.Wed, Jan 7, 5:19 PM
gallatin added inline comments.
FreeBSD/sys/net/iflib.c
7336 ↗(On Diff #169215)

Thank you! This is exactly the kind of feeback I was hoping for!

This revision was not accepted when it landed; it landed in state Needs Review.Wed, Jan 7, 7:33 PM
This revision was automatically updated to reflect the committed changes.
gallatin marked an inline comment as done.