Page MenuHomeFreeBSD

Run epoch calls sooner and more reliably
ClosedPublic

Authored by jtl on Oct 10 2018, 3:18 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 4 2024, 8:11 AM
Unknown Object (File)
Nov 22 2023, 2:33 PM
Unknown Object (File)
Nov 22 2023, 2:15 PM
Unknown Object (File)
Nov 13 2023, 7:10 PM
Unknown Object (File)
Nov 12 2023, 10:30 AM
Unknown Object (File)
Nov 12 2023, 9:39 AM
Unknown Object (File)
Oct 12 2023, 5:32 AM
Unknown Object (File)
Aug 17 2023, 1:58 PM

Details

Summary

Epoch calls are stored in a 4-bucket hash table. The 4-bucket hash table allows for calls to be stored for three epochs: the current epoch and two previous ones. The comments at the beginning of ck_epoch.c explain why this is necessary.

When there are active threads, ck_epoch_poll_deferred() current runs the epoch calls for the current global epoch + 1. Because of modulo arithmetic, this is equivalent to running the calls for epoch - 3. However, this means that ck_epoch_poll_deferred() is waiting unnecessarily long to run epoch calls.

Further, there could be races in incrementing the global epoch. Imagine all active threads have observed epoch n. CPU 0 sees this. It increments the global epoch to (n + 1). It runs the epoch calls for (n - 3). Now, CPU 1 checks. It sees that there are active threads which have not yet observed the new global epoch (n + 1). In this case, ck_epoch_poll_deferred() will return without running any epochs. In the worst case (CPU 1 continually losing the race), these epoch calls could be deferred indefinitely.

To fix this, always run any epoch calls for global epoch - 2. Further, if all active threads have observed the global epoch, run epoch calls for global epoch - 1.

The global epoch is only incremented when all active threads have observed it. Therefore, all active threads must always have observed global epoch - 1 or the current global epoch. Accordingly, it is safe to always run epoch calls for global epoch - 2.

Further, if all active threads have observed the global epoch, it is safe to run epoch calls for global epoch - 1.

While here, add a compile-time check to validate assumptions about the value of CK_EPOCH_LENGTH.

Test Plan

I ran a test module against this and it seemed to "do the right thing" (both by results and from observation with a dtrace script).

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted? If so, there's a few other changes we may want to make and I'll need to do some additional validation.

A few notes, just for my memory:

  • There was interest in re-entrant epoch functions. Those would be more expensive and would not using n_pending != startpend invariant.

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted?

See my comments in that review.

Regardless of whether that is true, it seems like this change would still be safe. Do you agree?

A few notes, just for my memory:

  • There was interest in re-entrant epoch functions. Those would be more expensive and would not using n_pending != startpend invariant.

Feel free to suggest a different way to construct a return value. It wasn't clear to me what the return value was supposed to mean, so I took a guess.

FWIW, i have patched both of these changes (this + D17492) into the upstream ck git repo, and run them through the CK regression tests for epoch in userpace on my 32-core/64-thread amd, and they passed.

This revision is now accepted and ready to land.Oct 10 2018, 7:16 PM
In D17503#373529, @jtl wrote:

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted?

See my comments in that review.

Regardless of whether that is true, it seems like this change would still be safe. Do you agree?

Is it expected by the kernel that ck_epoch write-side functions are safe to call on records that are already in a read-side critical section?
Helping me understand the intended use-case will allow me to answer accurately. There is further validation needed beyond poll.

A few notes, just for my memory:

  • There was interest in re-entrant epoch functions. Those would be more expensive and would not using n_pending != startpend invariant.

Feel free to suggest a different way to construct a return value. It wasn't clear to me what the return value was supposed to mean, so I took a guess.

Is it possible to put these changes upstream first at https://github.com/concurrencykit/ck? We are responsive. You'll get more eye balls on the review and we have the advantage of CI with several torture tests for epoch on both TSO and RMO architectures. We are responsive. Longer term, if you're doing more work here, happy to provide commit privileges as well.

In D17503#373529, @jtl wrote:

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted?

See my comments in that review.

Regardless of whether that is true, it seems like this change would still be safe. Do you agree?

Is it expected by the kernel that ck_epoch write-side functions are safe to call on records that are already in a read-side critical section?
Helping me understand the intended use-case will allow me to answer accurately. There is further validation needed beyond poll.

Nothing is executed from hardclock "the tick handler", as that happens in interrupt context. If there are epoch_calls pending it schedules a task.
The task then _should_ be making any calls referencing the per-cpu record from within a critical section (preemption disable) - thus ensuring that record accesses don't need to be reentrant.

See sys/kern/subr_epoch.c for further details.
ck_epoch_call is called from a critical section in epoch_call. And then ck_epoch_poll_deferred is similarly called in a critical section epoch_call_task:

	ck_stack_init(&cb_stack);
	critical_enter();
	epoch_enter(global_epoch);
	for (total = i = 0; i < epoch_count; i++) {
		if (__predict_false((epoch = allepochs[i]) == NULL))
			continue;
		er = epoch_currecord(epoch);
		record = &er->er_record;
		if ((npending = record->n_pending) == 0)
			continue;
		ck_epoch_poll_deferred(record, &cb_stack);
		total += npending - record->n_pending;
	}
	epoch_exit(global_epoch);
	*DPCPU_PTR(epoch_cb_count) -= total;
	critical_exit();
In D17503#373529, @jtl wrote:

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted?

See my comments in that review.

Regardless of whether that is true, it seems like this change would still be safe. Do you agree?

Is it expected by the kernel that ck_epoch write-side functions are safe to call on records that are already in a read-side critical section?
Helping me understand the intended use-case will allow me to answer accurately. There is further validation needed beyond poll.

See related comments in D17492. In short, the kernel does expect to be able to call ck_epoch_call() and ck_epoch_poll_deferred() on records that are already in a read-side critical section.

It also expects to be able to call ck_epoch_synchronize_wait() with the global epoch.

Is it possible to put these changes upstream first at https://github.com/concurrencykit/ck? We are responsive.

I see that you are responsive. :-) And, I appreciate that!

Normally, I would prefer to fix these upstream and merge down to our tree. Because we're on the verge of a release and trying to track down some outstanding bugs, I would like to get these changes into our tree as quickly as possible to understand whether it fixes some of the bugs we're seeing that seem to be related to synchronization. I'm open to whatever is the best path forward to do that quickly.

You'll get more eye balls on the review and we have the advantage of CI with several torture tests for epoch on both TSO and RMO architectures. We are responsive. Longer term, if you're doing more work here, happy to provide commit privileges as well.

If you agree these are of general applicability for CK, I would love to see these changes get into upstream CK, whether that's before or after the commit to FreeBSD.

In D17503#373585, @jtl wrote:
In D17503#373529, @jtl wrote:

See previous note in D17492 regarding the assumption that write-side functions are *not* called with-in a read-side critical. Does FreeBSD require that this is permitted?

See my comments in that review.

Regardless of whether that is true, it seems like this change would still be safe. Do you agree?

Is it expected by the kernel that ck_epoch write-side functions are safe to call on records that are already in a read-side critical section?
Helping me understand the intended use-case will allow me to answer accurately. There is further validation needed beyond poll.

See related comments in D17492. In short, the kernel does expect to be able to call ck_epoch_call() and ck_epoch_poll_deferred() on records that are already in a read-side critical section.

Right we can actually be preempted while _in_ an epoch section. I'm just saying record manipulation itself doesn't need to be preemption safe.

Ok. I caught up with Matt offline. This doesn't solve the problem at hand. The API is not being used as intended in some areas, so we will work to address that. I would advise against merging this in as it doesn't address the underlying issues:

Some usage of epoch records is re-entrant: This is not supported. Supporting this would require atomic usage that would be, expensive. We have an easy fix for that.
Handling of ck_epoch_call across multiple writers: If writers are globally serialized, then this is fine. There is more magic involved if you have multiple writers mutating the epoch while concurrent ck_epoch_call invocations are made outside of an active section.

I will provide a detailed proposal tonight and provide more color on the above. I was caught on a particularly bad day. We'll work to get these merged in as soon as possible upstream with sufficient tests. Both of these should be easy fixes.
@jtl What timelines are you working with? It'll help me prioritize my time on this.

This revision now requires changes to proceed.Oct 10 2018, 9:10 PM

Thanks for the update. I had a few items come up and will be unable to spend time on this until Saturday. I'll loop back here and I'll try my best to have things ready for Sunday. Unfortunately, Friday won't be realistic at the moment.

Ok, all these changes here should be good. Thanks. Merging these upstream and then circling back on two issues we wanted to validate:

  • Using writer function on active record (Matt has a simple work-around). As 12 is getting cut, we won't extend the API right now to allow for writer-functions to be used on active records.
  • Usage of call.

Merged upstream and passes tests. poll return value was changed a bit to only return false if no forward progress was made (no dispatch or no progression of epoch). It will return true if we're in a grace period or if anything has been dispatched.

Let's merge from upstream if possible.

This revision is now accepted and ready to land.Oct 14 2018, 2:00 AM

Let's merge from upstream if possible.

Thanks again for fixing these upstream so quickly!

The new code was merged in rS339375.