
epoch: Don't idle CPUs when there's pending epoch work
ClosedPublic

Authored by markj on Sat, Apr 18, 7:07 PM.
Details

Summary

The epoch(9) subsystem implements per-CPU queues of object destructors
which get invoked once it is safe to do so. These queues are polled via
hardclock().

When a CPU is about to go idle, we reduce the hardclock frequency to 1 Hz
by default, to avoid unneeded wakeups. This means that if there is any
garbage in these queues, it won't be cleared for at least one second (and
possibly longer).

epoch_drain_callbacks() is used in some places to provide a barrier,
ensuring that all garbage present in the destructor queues is cleaned up
before returning. It's implemented by adding a fake destructor in the
queues and blocking until it gets run on all CPUs. The above-described
phenomenon means that it can take a long time for these calls to return,
even (especially) when some CPUs are idle. This causes long delays when
destroying VNET jails, for instance, as epoch_drain_callbacks() is
invoked each time a network interface is destroyed.

Work around this problem by not disabling the hardclock timer if there
is garbage present in the destructor queues. The implementation of
epoch_drain_callbacks() has other problems, but this small change on its
own gives a good improvement.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Sat, Apr 18, 7:07 PM

Not sure who else might want to review this.

This gives a big speedup when running the regression test suite in parallel, as that involves destroying thousands of VNET jails.

IMHO, if everything in the network stack is coded properly, we don't need explicit epoch_drain_callbacks() anywhere and this KPI should disappear.

Of course I'd love to see the test suite running faster. Now even the most trivial test case takes at least a second.

Are there any pitfalls with this patch? Increased idle CPU/power use? Anything else?

Maybe leave a comment pointing at this review, so that later, when the network stack is improved, this can be deleted? Since the patch is a one-liner, leaving the attribution to git blame alone could quickly lose the rationale for it.

I'd love to see tests running faster, but I'm not enough of an expert to tell whether this change will have any side effects.

This revision is now accepted and ready to land. Sun, Apr 19, 6:25 AM

> IMHO, if everything in the network stack is coded properly, we don't need explicit epoch_drain_callbacks() anywhere and this KPI should disappear.

Right, this is always a hack. But fixing this requires much deeper integration of epoch into the network stack. For instance, VNET teardown would have to become completely asynchronous, I think.

Really I'd like to unify epoch and SMR, and make the network stack much more explicit about where we're relying on synchronization from epoch(9), using accessors from smr_types.h.

> Of course I'd love to see the test suite running faster. Now even the most trivial test case takes at least a second.

> Are there any pitfalls with this patch? Increased idle CPU/power use? Anything else?

I don't think so. If the CPU is truly idle, it will not be generating new garbage, so if the system is responsive, the CPU-local destructors should be cleaned up within a few ticks. By "responsive" I mean that no thread is staying in a read section for a long period of time without exiting. We have some problems like that, e.g., taskqueue_run_locked() may stay in a net_epoch read section indefinitely. I will fix that.

> Maybe leave a comment pointing at this review, so that later, when the network stack is improved, this can be deleted? Since the patch is a one-liner, leaving the attribution to git blame alone could quickly lose the rationale for it.

This change isn't a workaround, it's just a natural thing to do given how destructors are scheduled from hardclock(). So, it should not be deleted unless the integration between epoch and hardclock() changes.