
epoch: Don't idle CPUs when there's pending epoch work
ClosedPublic

Authored by markj on Sat, Apr 18, 7:07 PM.
Details

Summary

The epoch(9) subsystem implements per-CPU queues of object destructors
which get invoked once it is safe to do so. These queues are polled via
hardclock().

When a CPU is about to go idle, we reduce the hardclock frequency to 1 Hz
by default, to avoid unneeded wakeups. This means that if there is any
garbage in these queues, it won't be cleared for at least one second (and
possibly longer).

epoch_drain_callbacks() is used in some places to provide a barrier,
ensuring that all garbage present in the destructor queues is cleaned up
before returning. It's implemented by adding a fake destructor in the
queues and blocking until it gets run on all CPUs. The above-described
phenomenon means that it can take a long time for these calls to return,
even (especially) when some CPUs are idle. This causes long delays when
destroying VNET jails, for instance, as epoch_drain_callbacks() is
invoked each time a network interface is destroyed.

Work around this problem by not disabling the hardclock timer if there
is garbage present in the destructor queues. The implementation of
epoch_drain_callbacks() has other problems, but this small change on its
own gives a good improvement.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Sat, Apr 18, 7:07 PM

Not sure who else might want to review this.

This gives a big speedup when running the regression test suite in parallel, as that involves destroying thousands of VNET jails.

IMHO, if everything in the network stack is coded properly, we don't need explicit epoch_drain_callbacks() anywhere and this KPI should disappear.

Of course I'd love to see the test suite running faster. Now even the most trivial test case takes at least a second.

Are there any pitfalls with this patch? Increased idle CPU/power use? Anything else?

Maybe leave a comment pointing at this review, so that later, when the network stack is improved, this can be deleted? Since the patch is a one-liner, leaving the attribution to git blame alone could quickly lose the rationale for it.

I'd love to see tests running faster, but I'm not enough of an expert to tell whether this change will have any side effects.

This revision is now accepted and ready to land. Sun, Apr 19, 6:25 AM

> IMHO, if everything in the network stack is coded properly, we don't need explicit epoch_drain_callbacks() anywhere and this KPI should disappear.

Right, this is always a hack. But fixing this requires much deeper integration of epoch into the network stack. For instance, VNET teardown would have to become completely asynchronous, I think.

Really I'd like to unify epoch and SMR, and make the network stack much more explicit about where we're relying on synchronization from epoch(9), using accessors from smr_types.h.

> Of course I'd love to see the test suite running faster. Now even the most trivial test case takes at least a second.

> Are there any pitfalls with this patch? Increased idle CPU/power use? Anything else?

I don't think so. If the CPU is truly idle, it will not be generating new garbage, so if the system is responsive, the CPU-local destructors should be cleaned up within a few ticks. By "responsive" I mean that no thread is staying in a read section for a long period of time without exiting. We have some problems like that, e.g., taskqueue_run_locked() may stay in a net_epoch read section indefinitely. I will fix that.

> Maybe leave a comment pointing at this review, so that later, when the network stack is improved, this can be deleted? Since the patch is a one-liner, leaving the attribution to git blame alone could quickly lose the rationale for it.

This change isn't a workaround, it's just a natural thing to do given how destructors are scheduled from hardclock(). So, it should not be deleted unless the integration between epoch and hardclock() changes.