Page MenuHomeFreeBSD

iflib: Implement tx desc reclaim threshold
ClosedPublic

Authored by gallatin on Sep 15 2025, 9:50 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Oct 11, 2:57 AM
Unknown Object (File)
Fri, Oct 10, 4:06 AM
Unknown Object (File)
Fri, Oct 10, 4:06 AM
Unknown Object (File)
Fri, Oct 10, 4:06 AM
Unknown Object (File)
Fri, Oct 10, 4:05 AM
Unknown Object (File)
Fri, Oct 10, 3:15 AM
Unknown Object (File)
Thu, Oct 9, 10:51 PM
Unknown Object (File)
Thu, Oct 9, 9:18 PM

Details

Summary

On some iflib drivers, the txd reclaim routine can be fairly expensive at high packet rates.
Iflib was designed with the intent of only reclaiming tx descriptors above a configurable threshold, but this logic was left unimplemented
This change:

  • implements 2 new knobs, iflib.tx_reclaim_thresh and iflib.tx_reclaim_ticks.
  • moves tx reclaim thresh from the if_shared_ctx and into the iflib_ctx as drivers don't need to see it, and it needs to be changed, so it can't be const o tx_reclaim_thresh and ticks are replicated into the txq to improve cache locality of data accessed in the hot path
  • ticks is used rather than more expensive timekeeping mechanism so as to keep things simple and cheap

This change substantially improves packet rates on bnxt. It has been tested on bxnt and ixl

On ixl, this change interacts poorly with doorbell deferral via txq_max_db_deferred(). On ixl, I observed iflib sending doorbells only every 128 packets once traffic had ramped up, which caused LACP to flap constantly. I'm currently unsure if this is a problem in the patch, or a problem with doorbell deferral. For now, I have disabled doorbell deferral when we use a non-zero tx desc reclaim threshold as a workaround. That allowed a busy server to run for over 48hrs with no flaps using ixl. I would like to understand how doorbell deferral is supposed to work, as it seems to have no backstop timer to force doorbells on mosty-idle links (like my LACP problem)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

FreeBSD/sys/net/iflib.c
3054 ↗(On Diff #162111)

The more I think about this, the more I think that this is entirely unsafe in the general case, since there is no timer-driven backstop. The condition that I was hitting (packets pending for xmit, just waiting for doorbells) could happen in other cases (like something blocks tx complete so we have a lot of packets pending).

Without a timer-driven backstop, doorbells deferral is only safe if the caller can see the future. mp_ring can do this in some cases, so if we want to defer doorbells, it should happen at that level.

I think I may make a future patch to entirely remove this deferral mechanism.

FreeBSD/sys/net/iflib.c
349 ↗(On Diff #162111)
3054 ↗(On Diff #162111)

I don't quite follow, isn't the enqueue in _task_fn_tx() supposed to (eventually) trigger a doorbell? Is that not prompt enough?

3094 ↗(On Diff #162111)

This comment can go too I think.

6814 ↗(On Diff #162111)
6816 ↗(On Diff #162111)
6851 ↗(On Diff #162111)

Same style bugs above.

6957 ↗(On Diff #162111)

Why are these NEEDGIANT? Do we not need to serialize the sysctl handlers with the tx path?

gallatin marked 4 inline comments as done and an inline comment as not done.Mon, Sep 22, 11:16 PM

@markj I've marked things as done as I've done them in my local tree. I'll update the patch once I do some more research into the doorbell issue. thank you for the reveiw!

FreeBSD/sys/net/iflib.c
3054 ↗(On Diff #162111)

I don't quite follow, isn't the enqueue in _task_fn_tx() supposed to (eventually) trigger a doorbell? Is that not prompt enough?

The codepath seems to involve mp_ring. I cannot find a codepath that leads to doing anything with pending doorbells unless there are packets to be sent.

Eg, in the if (txq->ift_db_pending) case, we call ifmp_ring_enqueue(), which then calls drain_ring_lockless() in the !abdicate case. This might call iflib_txq_drain(). In the case where we have pending packets that we have not sent via doorbell, its ring arg will be 0. So then we fall back to using the "max" calculation TXQ_MAX_DB_DEFERRED(txq, txq->ift_in_use);

If there is nothing to send, ring will be iflib_min_tx_latency | err will be 0 at the bottom of iflib_txq_drain

6957 ↗(On Diff #162111)

Because the code I cloned them from was NEEDGIANT :)

I think it is safe to leave them un-serialized.

Update patch to address @markj 's feedback, and to properly fix the bug I encountered where doorbells were withheld when using the mp_ring transmit method.

This revision was not accepted when it landed; it landed in state Needs Review.Wed, Oct 1, 10:57 PM
This revision was automatically updated to reflect the committed changes.