Fix several races in ntptime code and provide fine-grained locking for ntptime and resettodr().
ClosedPublic
Actions

Authored by kib on Jun 12 2016, 2:16 PM.

Details

Reviewers

jhb
imp
lstewart
• ian

Commits

rS302252: Currently the ntptime code and resettodr() are Giant-locked. In

Summary

Currently both ntptime code and resettodr() are Giant-locked. In particular, the Giant is supposed to protect against parallel ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does copyout(9) under Giant and then examines time_status to return syscall result. Since copyout(9) could sleep, the syscall result might be
inconsistent.

Another and bigger issue is that if PPS is configured, hardpps() is executed without any protection against the parallel syscall invocation. Potentially, this may result in the inconsistent state of the ntptime state variables, but I cannot say how serious such distortion is. The non-functional splclock() call in sys_ntp_adjtime() protected against clock interrupts calling hardpps() in the pre-SMP era.

The patch modernizes the locking. A mutex protects ntptime data. Due to the hardpps() KPI legitimately serving from the interrupt filters (and e.g. uart(4) does call it from filter), the lock cannot be sleepable mutex if PPS_SYNC is defined.

The ntp_is_time_error() interface was changed to make the calls outside the ntpadj_lock at least self-consistent. Potentially callout could act upon stale state of the ntptime machinery still, but this is true for the current code in more ugly ways, and probably should be fine due to both low quality and long time of update for typical RTC.

For resettodr(), I just added an internal mutex and removed (non-enforced) requirement of owning Giant around the call. Since resettodr() is callable from the clock swi, MD CLOCK_SETTIME() implementations must not sleep anyway. The lock ensures that the callout to write-out the clock does not interfere with the machdep.adjkerntz sysctl and shutdown. I believe, but did not verified, that this is enough for all drivers (due to non-sleepable statement).

Test Plan

I do not have any PPS hardware so I only tested that the spinlock case does not cause immediate smoke-out for an adjtime(2) call.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib updated this revision to Diff 17538.Jun 12 2016, 2:16 PM

kib retitled this revision from to Fix several races in ntptime code and provide fine-grained locking for ntptime and resettodr()..

kib updated this object.

kib edited the test plan for this revision. (Show Details)

kib added reviewers: • ian, imp, lstewart.

kib set the repository for this revision to rS FreeBSD src repository - subversion.

kib added a subscriber: emaste.

kib added a reviewer: jhb.Jun 27 2016, 6:45 PM

jhb added inline comments.Jun 27 2016, 7:03 PM

sys/kern/kern_ntptime.c
489 ↗	(On Diff #17538)	Is time_state a global protected by this lock? If so, you could just do this: int retval; if (ntp_is_time_error(time_status)) retval = TIME_ERROR; else retval = time_state; NTPADJ_UNLOCK(); And then set td_retval[0] to retval if there is no error from copyout().

Implement jhb' suggestion.

Generally, I like this cleanup. Wondering why sometimes we spin and other times we don't. Is there a performance win for that? Since this lock is usually not contested, I'm skeptical there's a win. If there's not a win, I'd recommend always spinning. There's a couple of minor style nits, but I'm not going to block based on them. Use your best judgement.

sys/kern/kern_ntptime.c
179 ↗	(On Diff #17945)	Why not always spin?
233 ↗	(On Diff #17945)	time_status hides the global time status. Higher warning levels will complain.
822–823 ↗	(On Diff #17945)	Given the number of unlocks, I'd be more inclined to have a goto out pattern that does the unlock and returns. But it's a minor point.
965 ↗	(On Diff #17945)	out would be here, see comment above.

This revision is now accepted and ready to land.Jun 27 2016, 8:11 PM

kib marked 3 inline comments as done.Jun 27 2016, 8:25 PM

kib added inline comments.

sys/kern/kern_ntptime.c
179 ↗	(On Diff #17945)	Spinlocks add interrupt latency, as a generic policy in kernel we prefer not to spin if ever possible. I fixed several existing cases in kernel, like umtx, which used spinlocks without a justification. And there, if we can avoid spinning, we should IMO. Typical kernel does not have PPS_SYNC, so always using spinlock adds unneeded overhead.

Rename argument to ntp_is_time_error().
Use goto out in hardpps().

This revision now requires review to proceed.Jun 27 2016, 8:33 PM

jhb accepted this revision.Jun 28 2016, 3:27 PM

jhb edited edge metadata.

This revision is now accepted and ready to land.Jun 28 2016, 3:27 PM

Closed by commit rS302252: Currently the ntptime code and resettodr() are Giant-locked. In (authored by kib). · Explain WhyJun 28 2016, 4:43 PM

This revision was automatically updated to reflect the committed changes.

kib mentioned this in D7302: Further locking changes to timekeeping code..Jul 24 2016, 3:23 PM

Revision Contents
Changeset List

Path

Size

head/

sys/

kern/

kern_ntptime.c

174 lines

Diff 17985

View Options

head/sys/kern/kern_ntptime.c

Fix several races in ntptime code and provide fine-grained locking for ntptime and resettodr().ClosedPublicActions