epoch_drain_callbacks() is supposed to wait until all pending callbacks
have been executed. This is useful in situations where we synchronously
tear down some context (like a VNET jail and its associated UMA zones)
and want to make sure that all pending asynchronous callbacks (which may
free objects to said UMA zones) have run first.

The implementation schedules a callback on each CPU and waits for them
all to run. This assumes that, on a given CPU, callbacks are executed
in the order that they are pushed. This assumption depends on the
implementation of epoch_call_task() and ck_epoch_poll_deferred(), and it
is not true in general.
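The ordering assumption can be made concrete with a minimal userland sketch of the drain scheme, assuming that each CPU runs its deferred callbacks strictly in FIFO order. This is a hypothetical model, not the kernel code: the names model_call(), model_run_one(), model_drain(), drain_cb() and free_object_cb(), the fixed NCPU and QLEN, and the round-robin loop that stands in for the sleeping drainer are all illustrative. Under the FIFO assumption, a CPU's sentinel can only run after everything queued ahead of it, so once every sentinel has fired, every earlier callback has fired too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userland model of the drain scheme, assuming every CPU
 * runs its deferred callbacks strictly in FIFO order.
 */
#define NCPU 4
#define QLEN 32

typedef void (*cb_fn)(void);

struct cpu_queue {
	cb_fn q[QLEN];
	size_t head, tail;	/* pop at head, push at tail */
};

static struct cpu_queue cpus[NCPU];
static int drain_count;		/* sentinels that have not run yet */
static int objects_freed;	/* stand-in for frees into a UMA zone */

static void free_object_cb(void) { objects_freed++; }
static void drain_cb(void) { drain_count--; }

/* Queue a callback on one CPU. */
static void model_call(int cpu, cb_fn fn)
{
	struct cpu_queue *cq = &cpus[cpu];

	assert(cq->tail - cq->head < QLEN);
	cq->q[cq->tail++ % QLEN] = fn;
}

/* Run the oldest pending callback on a CPU; false if there is none. */
static bool model_run_one(int cpu)
{
	struct cpu_queue *cq = &cpus[cpu];

	if (cq->head == cq->tail)
		return false;
	cq->q[cq->head++ % QLEN]();
	return true;
}

/*
 * Post one sentinel per CPU, then "sleep" until all of them have run.
 * The round-robin loop stands in for the CPUs making progress and stops
 * as soon as the last sentinel fires.
 */
static void model_drain(void)
{
	drain_count = NCPU;
	for (int cpu = 0; cpu < NCPU; cpu++)
		model_call(cpu, drain_cb);
	for (int cpu = 0; drain_count != 0; cpu = (cpu + 1) % NCPU)
		(void)model_run_one(cpu);
}
```

For example, queueing two frees on CPU 0 and one on CPU 2 and then calling model_drain() leaves objects_freed at 3, because each sentinel sits behind the frees queued on its CPU.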
Callbacks are pushed onto a per-CPU stack in LIFO order.
ck_epoch_poll_deferred() first pulls out the callbacks from epoch - 2,
which are always safe to execute, and in so doing reorders them such
that the oldest callback is at the top of the stack, so in this case,
epoch_call_task() will execute them in order. However,
ck_epoch_poll_deferred() may determine that it is safe to execute
callbacks from epoch - 1 (or even from the current epoch if there are
no active readers), and in this case it will push those callbacks onto
the returned stack. epoch_call_task() will then invoke those newer
destructors before the older ones, so epoch_drain_callbacks() may return
while older callbacks are still pending.
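The reordering can be reproduced with a small single-CPU model. This is a hypothetical sketch, not the ck_epoch data structures: callbacks are plain integer ids (a larger id was pushed later), NBUCKETS = 4 is an assumed stand-in for the number of per-epoch pending buckets, and poll_deferred() and call_task() only imitate the behaviour described above: the epoch - 2 bucket is moved onto the returned stack first, the epoch - 1 bucket is stacked on top of it when deemed safe, and the task pops from the top.

```c
#include <assert.h>

/*
 * Hypothetical single-CPU model of the deferred-callback path; not the
 * real ck_epoch structures. A larger callback id is a newer callback.
 */
#define NBUCKETS 4	/* assumed stand-in for the epoch bucket count */
#define MAXCB 16

struct stack {
	int items[MAXCB];
	int n;		/* items[n - 1] is the top */
};

static struct stack pending[NBUCKETS];	/* per-epoch LIFO buckets */
static int run_order[MAXCB];		/* ids in the order they ran */
static int nrun;

static void push(struct stack *s, int id)
{
	assert(s->n < MAXCB);
	s->items[s->n++] = id;
}

static int pop(struct stack *s)
{
	assert(s->n > 0);
	return s->items[--s->n];
}

/* Defer callback `id` while this CPU observes epoch e. */
static void defer(unsigned e, int id)
{
	push(&pending[e % NBUCKETS], id);
}

/* Move a whole bucket onto `out`, popping newest-first; this reverses it. */
static void dispatch(unsigned e, struct stack *out)
{
	struct stack *b = &pending[e % NBUCKETS];

	while (b->n > 0)
		push(out, pop(b));
}

/* Imitates the poll step: epoch - 2 first, then epoch - 1 on top if safe. */
static void poll_deferred(unsigned e, int prev_is_safe, struct stack *out)
{
	dispatch(e - 2, out);
	if (prev_is_safe)
		dispatch(e - 1, out);
}

/* Imitates the task: pop from the top and "run" each callback. */
static void call_task(struct stack *cb_stack)
{
	while (cb_stack->n > 0) {
		assert(nrun < MAXCB);
		run_order[nrun++] = pop(cb_stack);
	}
}
```

Deferring ids 1 and 2 in epoch 5 and id 3 (say, the drain callback) in epoch 6, then polling at epoch 7 with epoch 6 deemed safe, runs 3 first and only then 1 and 2: the two older callbacks keep their relative order, but the newest one overtakes them.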
Fix the correctness problem by simply doing all of this twice: once the
first drain callback on a given CPU is invoked, we know that all of the
callbacks that were pending on that CPU at the time that
epoch_drain_callbacks() was called are scheduled to be executed, so when
the second drain callback on that CPU is executed, we know that they must
have finished.
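The two-round fix can be sketched with a hypothetical single-CPU model in the same spirit; it is an illustration, not the kernel implementation, and the epoch numbers in it are arbitrary. Buckets are per-epoch LIFO stacks, and task_run() moves epoch - 2 (and, if deemed safe, epoch - 1) onto a local batch before popping and running all of it. For brevity, run_cb() folds the drainer into the callback path: when the last sentinel of a round fires and another round remains, it immediately posts the next sentinel into a pending bucket, which cannot join the batch that is already running. model_drain(rounds) returns how many FREE callbacks had run at the moment the drain would return.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-CPU model of the two-round drain (not FreeBSD code). */
#define NBUCKETS 4	/* assumed stand-in for the epoch bucket count */
#define MAXCB 16

enum cb_kind { CB_FREE, CB_SENTINEL };

struct stack { int items[MAXCB]; int n; };

static struct stack pending[NBUCKETS];	/* per-epoch LIFO buckets */
static unsigned cur_epoch;	/* epoch observed when deferring */
static int drain_count;		/* outstanding sentinels (one CPU) */
static int rounds_left;		/* drain rounds still to perform */
static int freed;		/* FREE callbacks executed so far */
static int freed_at_return;	/* value of freed when the drain returns */

static void push(struct stack *s, int v) { assert(s->n < MAXCB); s->items[s->n++] = v; }
static int pop(struct stack *s) { assert(s->n > 0); return s->items[--s->n]; }
static void defer(unsigned e, int kind) { push(&pending[e % NBUCKETS], kind); }

/* Move a whole bucket onto the batch, newest-first (reversing it). */
static void dispatch(unsigned e, struct stack *batch)
{
	struct stack *b = &pending[e % NBUCKETS];

	while (b->n > 0)
		push(batch, pop(b));
}

static void run_cb(int kind)
{
	if (kind == CB_FREE) {
		freed++;
		return;
	}
	if (--drain_count > 0)
		return;
	if (--rounds_left > 0) {
		/* The woken drainer posts the next round's sentinel; it lands
		 * in a pending bucket, not in the batch already being run. */
		drain_count = 1;
		defer(cur_epoch, CB_SENTINEL);
	} else {
		freed_at_return = freed;	/* the drain would return here */
	}
}

/* One task invocation: build a batch, then pop and run all of it. */
static void task_run(unsigned e, int prev_is_safe)
{
	struct stack batch = { .n = 0 };

	dispatch(e - 2, &batch);
	if (prev_is_safe)
		dispatch(e - 1, &batch);
	while (batch.n > 0)
		run_cb(pop(&batch));
}

/* Returns how many FREE callbacks had run when the drain returned. */
static int model_drain(int rounds)
{
	memset(pending, 0, sizeof(pending));
	freed = 0;
	freed_at_return = -1;
	rounds_left = rounds;
	drain_count = 1;

	defer(5, CB_FREE);		/* old callback, pending before the drain */
	cur_epoch = 6;
	defer(cur_epoch, CB_SENTINEL);	/* first-round sentinel */

	cur_epoch = 7;
	task_run(7, 1);		/* epoch 6 deemed safe: sentinel runs first */
	task_run(9, 0);		/* a later run picks up any epoch-7 sentinel */
	return freed_at_return;
}
```

With one round, the older free has not yet run when the drain returns (model_drain(1) returns 0); with two rounds it has (model_drain(2) returns 1), because the second sentinel can only be picked up by a task invocation that starts after the first batch has finished.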
I note that in an ideal world, this function would not exist, and all of
the teardown would happen asynchronously, rather than the current
mishmash of synchronous and asynchronous cleanup.