Page MenuHomeFreeBSD

When the RTC is adjusted, reevaluate sleep times that are based on the RTC
ClosedPublic

Authored by vangyzen on Feb 24 2017, 9:07 PM.
Tags
None
Referenced Files
F108172022: D9791.id25667.diff
Wed, Jan 22, 5:05 AM
Unknown Object (File)
Mon, Jan 20, 9:55 PM
Unknown Object (File)
Mon, Jan 20, 7:01 AM
Unknown Object (File)
Sat, Jan 11, 6:38 AM
Unknown Object (File)
Sat, Jan 11, 6:34 AM
Unknown Object (File)
Sat, Jan 11, 5:54 AM
Unknown Object (File)
Sat, Jan 11, 1:53 AM
Unknown Object (File)
Sat, Jan 11, 1:48 AM
Subscribers

Details

Summary

POSIX says this about sem_timedwait(3):

The timeout shall expire when the absolute time
specified by abstime passes, as measured by the clock
on which timeouts are based (that is, when the value
of that clock equals or exceeds abstime), ...

It says essentially the same about:

pthread_cond_timedwait(3)
pthread_mutex_timedlock(3)
pthread_rwlock_timedrdlock(3)
pthread_rwlock_timedwrlock(3)

When the real-time clock is adjusted, such as by clock_settime(3),
wake any sleeping threads whose sleep duration is based on that clock.
The sleep function will return a special errno that tells the caller
to reevaluate its sleep duration based on the new value of the clock.

Adopt the new POSIX-compliant behavior in the functions mentioned above
(and sem_clockwait_np(3)) when using CLOCK_REALTIME. Other functions
might need to do likewise.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

vangyzen retitled this revision from to When the RTC is adjusted, reevaluate sleep times that are based on the RTC.
vangyzen updated this object.
vangyzen edited the test plan for this revision. (Show Details)

Remove dead code from a different approach

The approach might be fine, but I see at least one rough race which is not handled. Suppose that a thread calculated timeout based on the absolute time, but not yet managed to actually sleep, as seen by being owned by sq. Then, if the clock is set before the thread is finally put on sq, it will not be woken up.

sys/kern/kern_tc.c
1311 ↗(On Diff #25668)

Why do you need enqueue_timeout() ? I do not quite understand why cannot you directly call sleepq_remove_matching() in the current context.

sys/kern/subr_sleepqueue.c
1071 ↗(On Diff #25668)

I suggest to re-factor this code fragment from sleepq_broadcast() and reuse the call.

1084 ↗(On Diff #25668)

It is enough to do one kick_proc0() after waking up all chains.

@kib: Well spotted. I'm still considering this race. For now, that's why I have the 10ms timeout taskqueue. That's an ugly hack that will probably be insufficient in some cases.

I'm currently considering a global rmlock. Sleepers will rlock it before reading the RTC. _sleep() will runlock it after sleepq_add if PRTCLK is passed. tc_setclock will wlock it. The taskqueue will not be used.

I don't like the asymmetry of the locking, but it's fairly reasonable, and it's the best idea I've had. I would appreciate your thoughts.

sys/kern/subr_sleepqueue.c
1071 ↗(On Diff #25668)

Good idea. Done.

1084 ↗(On Diff #25668)

I was wondering about that. Thanks.

vangyzen marked 2 inline comments as done.

Code review feedback

Rename sleepq_remove_matching() to sleepq_chains_remove_matching().
The former name is used for a new function that loops over threads
in a queue. It is refactored out of sleepq_broadcast() and called
from that function and sleepq_chains_remove_matching().

Kick proc0 once after waking up all chains.

I don't like the asymmetry of the locking, but it's fairly reasonable, and it's the best idea I've had. I would appreciate your thoughts.

Suppose that we have some 'generation' for the system clock value, which is incremented on each setclock(). Also suppose that we added some field td_umtx_agen to struct thread and store there the clock generation we used to calculate timeout for msleep(). Then sleepq code can refuse to put the thread to sleep if td_umtx_agen != current setclock generation.

The check in sleepq must happens quite late, somewhere in sleepq_switch() similar to TDF_TIMEOUT recheck. Then, either we note the generation bump on our own, or the setclock() wakeup would note that we are on sleepqueue and wakes us. This basically means that sleepqueue chain lock is used as interlock.

I like your idea. This is cleaner than the rmlock.

In D9791#202610, @kib wrote:

Then sleepq code can refuse to put the thread to sleep if td_umtx_agen != current setclock generation.

Since this read of the clock generation is not synchronized with the increment in setclock(), we would still need the setclock() wakeup. So, is this an optimization to avoid an unnecessary sleep (and immediate wake)?

The check in sleepq must happens quite late, somewhere in sleepq_switch() similar to TDF_TIMEOUT recheck.

That makes sense, but why do you say it? If this is an optimization, this would make it most optimal. Is that the reason?

I like your idea. This is cleaner than the rmlock.

In D9791#202610, @kib wrote:

Then sleepq code can refuse to put the thread to sleep if td_umtx_agen != current setclock generation.

Since this read of the clock generation is not synchronized with the increment in setclock(), we would still need the setclock() wakeup. So, is this an optimization to avoid an unnecessary sleep (and immediate wake)?

Sure we need to do the wakeup loop, same as it is done in the current patch. What I described is not an optimization, it is a measure to avoid the situation which I described in the first review message, namely, the problem where a thread calculates timeout with old clock, but does not make to sleep fast enough to be woken after setclock().

The check in sleepq must happens quite late, somewhere in sleepq_switch() similar to TDF_TIMEOUT recheck.

That makes sense, but why do you say it? If this is an optimization, this would make it most optimal. Is that the reason?

No, this is the 'interlock' to avoid lost wakeup, continued.

My apologies. That's blindingly obvious now. (Maybe I shouldn't work on this at home while my kids are around...)

Add an rtc_generation and use it to close the race,
instead of the 10ms timeout taskqueue.

Add test code (to be removed).

I think I can remove PRTCLK and TDF_SLEEPRTC, and just use td_rtcgen. I'm working on that now.

sys/kern/kern_umtx.c
833 ↗(On Diff #25806)

This will be removed before commit.

  • Remove PRTCLK and TDF_SLEEPRTC; just use td_rtcgen
  • Remove test code
  • Update locking comment
  • Avoid superfluous clock re-read when not using RTC

(previous update included unrelated changes due to stale master branch)

Please commit the grammar fixes for comments in subr_sleepqueue.c.

sys/kern/kern_synch.c
218 ↗(On Diff #25826)

I think that the changes to _sleep() can be avoided at all. If we set td_rtcgen before msleep() call in umtxq_sleep() and get td_rtcgen == 0 after sleep, we are guaranteed that the thread was either aborted in sleeping_on_old_rtc(), or sleepq_switch() refused to put it on sleepq. So you can just test for td_rtcgen == 0 in umtxq_sleep() instead of ERELOOKUP.

sys/kern/kern_tc.c
1486 ↗(On Diff #25826)

I think it is fine to increment generation in tc_setclock, after the tc_setclock_mtx spinlock is unlocked, right before remove_matching().

Also, the increment must be slightly more advanced, to avoid overflowing the value into zero.

sys/kern/kern_umtx.c
779 ↗(On Diff #25826)

() are not needed.

vangyzen added inline comments.
sys/kern/kern_synch.c
218 ↗(On Diff #25826)

I considered this, but I liked the fact that the result was communicated entirely by the return value. However, after a fresh look, I realize that testing td_rtcgen is fairly clean, especially since callers are already manipulating it.

I have reverted all changes in this file. (I hope Phabricator still shows these comments...)

sys/kern/kern_tc.c
1486 ↗(On Diff #25826)

I agree about the location.

I figured overflow wouldn't happen in practice, but guaranteeing it is easy enough, so I will.

sys/kern/kern_umtx.c
779 ↗(On Diff #25826)

True, but I like them. However, style(9) discourages them, so I'll remove them.

vangyzen marked 3 inline comments as done.

Revert all changes in kern_synch.c. Callers will test td_rtcgen instead
of the *sleep return value. They're already maintaining that field
anyway, so this is clean enough.

Bump rtc_generation in tc_setclock() instead of tc_windup().

Remove superfluous parens to adhere to style(9).

sys/kern/kern_tc.c
1277 ↗(On Diff #25846)

Blank line is not needed.

sys/kern/kern_umtx.c
842 ↗(On Diff #25846)

I understand why this reading of rtc_generation needs acquire semantic: we ensure that if we see updated rtc_generation, then we also see the updated timehands.

But I do not quite see why other rtc_generation reads, in particular, in sleeping_on_old_rtc() and in sleepq_switch(), need the acquire. sleeping_on_old_rtc() happens in the same thread that incremented the variable. sleepq_switch() only needs the value of the variable, it does not order its read with other happenings in tc_windup() which did release write. Could you, please, explain your use of _acq there ?

876 ↗(On Diff #25846)

Too many blank lines.

sys/kern/subr_sleepqueue.c
586 ↗(On Diff #25846)

As I asked before, please commit the style and comment grammar fixes now.

917 ↗(On Diff #25846)

I would prefer to remove special case for matcher == NULL and instead provided a trivial matcher always returning true there.

1101 ↗(On Diff #25846)

No need in {}.

sys/sys/proc.h
151 ↗(On Diff #25846)

This is too cryptic. My understanding is that owner of the field if curthread when running. When curthread is sleeping, the field is protected by the thread lock. I am not sure how to explain this shortly in the annotations, might be a longer comment in more appropriate place would be less confusing.

sys/kern/subr_sleepqueue.c
566 ↗(On Diff #25846)

I think this is a right place to provide some reasoning why the code correctly covers the race where tc_windup()->sleeping_on_old_rtc() missed us. Some note about sleepq lock used as interlock there is due, IMO.

vangyzen added inline comments.
sys/kern/kern_umtx.c
842 ↗(On Diff #25846)

There is no good reason. I'm embarrassed to say, it was absent-minded. Of course, you're right that only this function needs the acquire semantic.

sys/kern/subr_sleepqueue.c
566 ↗(On Diff #25846)

Will do.

1101 ↗(On Diff #25846)

True, but I very strongly prefer using such braces.

sys/sys/proc.h
151 ↗(On Diff #25846)

Yes, this deserves more explanation.

I will write the documentation in my next upload.

POSIX compliance; code review feedback

POSIX says that only absolute sleeps are affected by RTC adjustments.
Relative sleeps are immune. Make it so.

Use atomic acquire only where needed.

Change sleepq_remove_matching() to require a predicate.
Provide a constant function when called from sleepq_broadcast().

Remove unstylish blank lines.

Add documentation in all interesting places.

@kib, thank you for your thorough review and very helpful comments. When you started, the code was essentially just an example of an approach and not ready for such a review, so I realize that you have already spent a lot of time on it. I believe it is now ready, if you have the time.

Note to self: Update the description.

kib added a reviewer: kib.

I think that the code is fine as is. We might work on improving the comment in sleepq_switch() after the code is landed.

I suggest asking pho@ to test this, he would probably need specific instructions and might be a specific test programs from you. I also added jhb@ to the reviewers.

sys/kern/kern_tc.c
1274 ↗(On Diff #26019)

By finding the thread on a sleepqueue and holding the lock ...

sys/kern/subr_sleepqueue.c
572 ↗(On Diff #26019)

I think it is a prereq for the race that Ts is not put to sleep yet. In fact, there is very fine difference between TD_ON_SLEEPQ() which means that sleepq_add() was done, and TD_IS_SLEEPING(), which mean that the thread is really asleep. TD_IS_SLEEPING() condition is set as the last action in sleepq_switch() (this function) right before the call to mi_switch().

When I asked for the comment, I mean just mention this race and also to explain how the use of sleep chain spinlocks makes the td_rtcgen accesses correctly synchronized.

914 ↗(On Diff #26019)

Style(9) requires a blank line after the '{' if no local vars is defined.

This revision is now accepted and ready to land.Mar 6 2017, 1:30 PM
jhb edited edge metadata.
jhb added inline comments.
sys/kern/kern_tc.c
1311 ↗(On Diff #26019)

My understanding of style(9) would require a blank line before the comment.

vangyzen added a subscriber: pho.
vangyzen added inline comments.
sys/kern/kern_tc.c
1274 ↗(On Diff #26019)

That sounds better.

1311 ↗(On Diff #26019)

I couldn't find the reference, but I like an extra line here anyway.

vangyzen edited edge metadata.
vangyzen marked 2 inline comments as done.

Polish comments; improve style(9).

Also merge from master so @pho can apply the patch to master.

This revision now requires review to proceed.Mar 6 2017, 9:49 PM
This revision was automatically updated to reflect the committed changes.