Implement callout_drain_async() in a preliminar fashion. Inspired by projects/hps_head.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
This revision should superceed D3078.
Use callout_drain_async() in the TCP stack instead, like done by:
https://svnweb.freebsd.org/changeset/base/287261
And all the races and clutter will go away I think :-)
share/man/man9/timeout.9 | ||
---|---|---|
270 | This sentence packs about three sentences into one. .Fn callout_drain_async is non-blocking and works much like .Fn callout_stop . However, when .Fn callout_drain_async returns non-zero, it means that the callback function .Fa fn will be called with .Fa arg when all references to the callout .Fa c are gone. | |
281 | Needs commas, can avoid an if/then, and use active rather than passive voice. When this function returns non-zero, do not call it again until the callback function has been called. | |
282 | Rearrange to avoid if/then and pauses. The "might" here is a little unclear. US spelling of "canceled" (which looks wrong to me, too). Previously pending asynchronous drains might get canceled if .Fn callout_drain or .Fn callout_drain_async functions are called while an asynchronous drain is pending. | |
288 | "pointed to by the" seems unnecessary. Also, "right away" seems colloquial. It is safe to immediately free the callout structure .Fa c when this function returns zero. |
share/man/man9/timeout.9 | ||
---|---|---|
282 | Looking at this again, I think "might be canceled" sounds better. |
I do approve the addition of this function, it would have made my life much easier for these two tasks:
- D2079: Fix TCP timers use-after-free old race conditions
- D2763: Fix a callout race condition introduced in TCP timers callouts with r281599.
mp-safe callout usage is tricky, mp-safe callout usage in context when you cannot use callout_drain() is even trickier, and this callout_drain_async()function definitively helps to address that complex case.
head/sys/sys/_callout.h | ||
---|---|---|
49 ↗ | (On Diff #8742) | Minor comment: Here callout_func_t functions will be used mostly for defer destroy/free/discard/release the structures used in the callout callback, so what do you think of calling it like: callout_dtor_t or callout_defer_dtor_t or anything more gears towards destruction of structures used in the callback. My 2 cents, |
Let me try to try callout_drain_async() in top of our last TCP changes. I will put the results here.
head/sys/sys/_callout.h | ||
---|---|---|
49 ↗ | (On Diff #8742) | I was thinking to use this function in general for callback prototypes. |
head/sys/sys/_callout.h | ||
---|---|---|
49 ↗ | (On Diff #8742) | Ok, then leave it that way. |
FYI:
This change and projects/hps_head will be discussed at EuroBSDcon 2015 next week:
Hi,
Can the people subscribed here please give some feedback how much time they need for review. I'd like to put this change back in when all issues are resolved.
Thank you!
--HPS
head/sys/kern/kern_timeout.c | ||
---|---|---|
1156 ↗ | (On Diff #8742) | I would check callout_stop() return value, and if it returns 0 (fail) no need to test if the callout is currently running: It is already over, we have to schedule the final callback anyway. Something like: stop = callout_stop(); cc = callout_lock(c); if (stop) { /* Check if the callout is not currently running */ if (c->c_iflags & CALLOUT_DIRECT) { direct = 1; } else { direct = 0; } retval = (cc_exec_curr(cc, direct) == c); } else { retval = 1; } ... Or anything equivalent. |
1169 ↗ | (On Diff #8742) | Quick open question for me here: What would be the use case for callouts with associated mutex to use callout_drain_async() instead of using the classical callout_drain()? If there is none I would set clear that callout_drain_async() is only for mpsafe callout and that's it. |
I am indeed for adding callout_drain_async() as this is exactly what we are doing with TCP timer callouts in HEAD, stable/10 and releng/10.2, but it is currently done using callout_stop()/callout_reset() in a non simple way instead of using directly callout_drain_async().
I plan to test callout_drain_async() by the end of this week.
head/sys/kern/kern_timeout.c | ||
---|---|---|
1156 ↗ | (On Diff #8742) | Hi, I'm not convinced that callout_stop() is consistent that way. In my projects/hps_head I've followed the examples in the manual page, and it only describes the return value from callout_stop() the first time you call it and not the second time! if (callout_stop()) { // was cancelled -- need to check } else { // not running } if (callout_stop()) { // it doesn't make sense to me that a second call to this function // would also say cancelled, while the callout is already cancelled. } else { // not running } --HPS |
1169 ↗ | (On Diff #8742) | Hi, The use-case is the same, to avoid blocking the caller when draining. This function should work for both MPSAFE and non-MPSAFE callouts. --HPS |
head/sys/kern/kern_timeout.c | ||
---|---|---|
1156 ↗ | (On Diff #8742) | To optimise this case I would write a new function, which merges into callout_stop_safe(). That's a bit more work and out of my scope for -current. |
head/sys/kern/kern_timeout.c | ||
---|---|---|
1156 ↗ | (On Diff #8742) | Thanks for your answer, I was more concern by a potential case where (hold your breath):
It cannot happen with MPSAFE callouts, just double-check it also never happens for non-MPSAFE callouts. In this case, your are right: No need to check the return value of callout_stop() in callout_drain_async(). That said, I would put a comment then, e.g.: /* callout_stop() return value is meaningless in this context: - It can return 1 (success) where the callout is currently running (MPSAFE callout) - It can return 0 (fail) where no callout are currently scheduled or running - etc. */ callout_stop(c); |
1169 ↗ | (On Diff #8742) | I see, it makes sense. |
head/sys/kern/kern_timeout.c | ||
---|---|---|
1183 ↗ | (On Diff #8742) | I would revert the callout_drain_async() return value to follow callout_stop()/callout_drain(): callout_stop() returns 0 == Fail to stop then: callout_drain_async() returns 0 == Fail to stop and deferred callback has been scheduled |
I have used and tested callout_async_drain() for TCP timer callouts and all good, everything works as expected. I would suggest you to create a review that depends on this one, to show how to use callout_async_drain() as a real example. What I did for my testing:
- Revert the rS284245 workaround (Not needed anymore with callout_drain_async())
- Apply D3521 patch
- Use callout_drain_async() in TCP timers with:
Note: Obviously here callout_drain_async() returns 1 in case of success:
diff --git a/sys/netinet/tcp_timer.c b/sys/netinet/tcp_timer.c index c11ef78..41fb0c6 100644 --- a/sys/netinet/tcp_timer.c +++ b/sys/netinet/tcp_timer.c @@ -918,20 +918,8 @@ tcp_timer_stop(struct tcpcb *tp, uint32_t timer_type) } if (tp->t_timers->tt_flags & timer_type) { - if (callout_stop(t_callout)) { + if (callout_drain_async(t_callout, f_callout, tp)) { tp->t_timers->tt_flags &= ~timer_type; - } else { - /* - * Can't stop the callout, defer tcpcb actual deletion - * to the last tcp timer discard callout. - * The TT_STOPPED flag will ensure that no tcp timer - * callouts can be restarted on our behalf, and - * past this point currently running callouts waiting - * on inp lock will return right away after the - * classical check for callout reset/stop events: - * callout_pending() || !callout_active() - */ - callout_reset(t_callout, 1, f_callout, tp); } } }
I approve this change for MPSAFE callouts. As I lack of specific tests (and knowledge) for non-MPSAFE callouts, I would like another reviewer to look at it (@rrs?).
This has a very very awful side effect in it. You change
the callout lock type. This means the KAPI user may have
specified to return locked, and you change it to UNLOCKED.
This is sure to invoke a crash when the callout that it is running
returns with it being unlocked and the user expects it to be locked.
The user will surely do a unlock() some place and boom. The machine
dies.
This is *not* how this should be done!!
I vote NO NO NO do not commit this!!
R
This has a very very awful side effect in it. You change
the callout lock type. This means the KAPI user may have
specified to return locked, and you change it to UNLOCKED.
Assuming that the drain function is the last operation on the callout, this is perferctly fine for callouts with mutexes. We should not call the defer function locked, because the mutex being locked might be destroyed by the destructor too.
This is sure to invoke a crash when the callout that it is running
returns with it being unlocked and the user expects it to be locked.
The user will surely do a unlock() some place and boom. The machine
dies.
This is an incorrect analysis. Given that the code doesn't call callout_restart_xxx() after the callout_stop() in callout_drain_async(), this cannot happen simply. The callout drain async code specifically checks if "cc_exec_curr(cc, direct) == c" and only then it will replace the lock and lock type. In that case the lock and lock type is copied on the stack, and will no longer be referred by the code which handles the callback. Even the c_flags are copied! See lines 671 +++ :
Please show step by step how this can happen.
c_lock = c->c_lock; c_func = c->c_func; c_arg = c->c_arg; c_iflags = c->c_iflags;
This is *not* how this should be done!!
Yes, this is how it can be done without adding more states to the existing callout subsystem.
--HPS
This is *not* how this should be done!!
Can you explain how you think "this" should be done and give some code pointers so that we can move this patch forward?
I don't know but as yet, I have not taken the time to think through the problem. I don't
understand why you think there is a *burning* need for this call. TCP has, as you pointed
out, used a similar method.. it can do that and *not* harm itself since it knows what it
expects the lock to return as (locked or unlocked) and take appropriate action.
Can you share for the group why you think this *must* go in? In its current form it
is broken and can lead to panic's or other un-wholesome behavior (imagine some other
thing getting in and locking it) and in the mean time the original caller thinking it was returned
locked goes and changes it... that could cause interesting surprises too. Those mysterious panics
that are *hard* to explain.
If we had a queue of users demanding "where is the async_drain" I would put more effort into
thinking it through and coming up with a solution that does not lead to hidden problems. Without
that demand, to me, this is a back-burner project and my plate is filled with quite a few higher priority
items that need to be addressed.
Hi Randall,
I don't know but as yet, I have not taken the time to think through the problem. I don't
understand why you think there is a *burning* need for this call. TCP has, as you pointed
out, used a similar method.. it can do that and *not* harm itself since it knows what it
expects the lock to return as (locked or unlocked) and take appropriate action.
Can you share for the group why you think this *must* go in?
- This is about coding principles. You simply don't free a structure until you're absolutely sure nobody is using it! Having proper API's for draining callouts is important for timing critical applications, beyond the TCP stack.
- The code in the TCP stack which tracks the return values of callout_reset() and callout_stop() is what I would call a hack and when using projects/hps_head and possibly other callout implementations, has been observed to escape that logic and cause use after free situations - still! I believe the same can happen with 11-current given the correct timing conditions. Only when using callout_drain_async() no failures were seen with projects/hps_head .
In its current form it
is broken and can lead to panic's or other un-wholesome behavior (imagine some other
thing getting in and locking it) and in the mean time the original caller thinking it was returned
locked goes and changes it... that could cause interesting surprises too. Those mysterious panics
that are *hard* to explain.
For MPSAFE callouts, which is the main target right now, it is _not_ broken, because MPSAFE callouts don't use mutexes.
For non-MPSAFE callouts, there are some restrictions on the usage. If it makes you worry, I can change the implementation, so that the callout's flags are not touched and instead the per-CPU callout structures are extended containing a drain function and drain argument. That will add one more unlikely "if" to the fast path for every tick per CPU. Is this way of implementing more acceptable to you? Or we can simply say panic() or "not supported" for non-MPSAFE callouts until further?
If we had a queue of users demanding "where is the async_drain" I would put more effort into
thinking it through and coming up with a solution that does not lead to hidden problems. Without
that demand, to me, this is a back-burner project and my plate is filled with quite a few higher priority
items that need to be addressed.
Users don't always know what they need. Let's make this API happen, before seeing more creative solutions to the drain a callout asynchronously issue, which doesn't involve a kernel callout_drain_async() function.
sys/kern/kern_timeout.c | ||
---|---|---|
1191 | Trying to catch up with @rrs comments and learn more about non-MPSAFE callouts. Open question to @hps: Why "the final callback should not be called locked" here? If the final callout is indeed going to free the callout structure, it does not have to free the lock used to protect this callout. They can have different lifetime. For example, if the TCP timer callouts were non-MPSAFE (yeah), you would use the inp_lock as callout lock and when the final callout tcp_timer_discard() is called only the callout structure is freed but not the inp_lock that can be safely unlocked. Of course if you use a non-MPSAFE callout and destroy both callout structure and the lock in callout in the callout_drain_async() discard function, it will crash when trying to unlock it. But it seems to be more callout user/documentation issue to me. My 2 cents. |
Hi @jch,
Trying to catch up with @rrs comments and learn more about non-MPSAFE callouts.
Open question to @hps: Why "the final callback should not be called locked" here? If the final callout is indeed going to free the callout structure, it does not have to free the lock used to protect this callout. They can have different lifetime.
That is right, though I think it is not common to use a lock to protect a destructur. In the callout subsystem locks are mostly used to ensure an atomic stop of a callout. Because the destructor should not be cancelled, it can be called unlocked.
For example, if the TCP timer callouts were non-MPSAFE (yeah), you would use the inp_lock as callout lock and when the final callout tcp_timer_discard() is called only the callout structure is freed but not the inp_lock that can be safely unlocked.
When is the "inp_lock" destroyed?
Of course if you use a non-MPSAFE callout and destroy both callout structure and the lock in callout in the callout_drain_async() discard function, it will crash when trying to unlock it. But it seems to be more callout user/documentation issue to me.
Possibly one mutex covering multiple callouts will be fine too with regard to SMP, but the problem is then we don't know which callouts to drain before freeing this common mutex, because the callouts are already freed? How could you solve that? pause("W", 1)?
Then I think it would be better to allow a mutex being associated with a callout to be destructed inside the destructor.
The right way to do this is so that
callout_stop_async_drain(struct callout *, drain_function)
is made.
It calls into
__callout_stop(co, 0, drain)
When we hit the return 0 in the safe == 0 case, because our callout
is running, we save drain in the callout's structure (where exec is at) and
return 0.
The guy getting the return will know that nothing was stopped, but his function will
be called at the end.
The callout system then after re-acquiring its lock when running the callout.. then calls
the callout-drain function saved..
There will be some thoughts needed on the locking of that call.. probably best to redrop the callout system
lock and call it again.. but that should be ok.
I will be more than glad to do this at the vendor summit today.. if thats ok with you Hans..
I will be more than glad to do this at the vendor summit today.. if thats ok with you Hans..
Yes, it is OK with me if you implement it like that. That's similar to what I've already done in projects/hps_head.