This reduces the overall number of hard timer IRQ's when precision higher
than "kern.hz" is not needed.
MFC after: 1 week
Sponsored by: NVIDIA Networking
Differential D37594
callout(9): When using the C_HARDCLOCK flag, ensure the target time is aligned to "tick_sbt" • hselasky on Dec 3 2022, 12:07 AM. Authored by Tags None Referenced Files
Subscribers
Details
Diff Detail
Event TimelineComment Actions Why do you think this is needed? All C_HARDCLOCK consumers are supposed to pass only periods multiple of tick_sbt, and hardclocktime above should be aligned to hardclock also. In what cases do you see misaligned values? Comment Actions @mav: I added some code to the kernel and found the remainder of the to_sbt for hardclock is never properly aligned: diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c index 0ac0eca36da4..a6acff175d85 100644 --- a/sys/kern/kern_timeout.c +++ b/sys/kern/kern_timeout.c @@ -114,6 +114,9 @@ static int ncallout; SYSCTL_INT(_kern, OID_AUTO, ncallout, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &ncallout, 0, "Number of entries in callwheel and size of timeout() preallocation"); +static int remainder; +SYSCTL_INT(_kern, OID_AUTO, remainder, CTLFLAG_RW, &remainder, 0, ""); + #ifdef RSS static int pin_default_swi = 1; static int pin_pcpu_swi = 1; @@ -906,6 +909,11 @@ callout_when(sbintime_t sbt, sbintime_t precision, int flags, to_sbt = sbinuptime(); if ((flags & C_HARDCLOCK) == 0) to_sbt += tick_sbt; + + /* make sure the target time is really aligned to "tick_sbt" */ + if ((to_sbt % tick_sbt) != 0) + remainder = to_sbt % tick_sbt; + to_sbt -= to_sbt % tick_sbt; } else to_sbt = sbinuptime(); if (SBT_MAX - to_sbt < sbt) I think this has to do with: --HPS Comment Actions @mav : I see more. Maybe when setting/updating tick_sbt, there should be a callback to re-align the hardclocktime? What do you think? Comment Actions @hselasky Until you've started to try align everything to absolute tick_sbt multiples it should not matter. All C_HARDCLOCK callouts measure time from the last hardclocktime, stepping in tick_sbt increments, but without strictly defined initial offset. I have no objections for realigning hardclocktime on the tick rate change, but I am missing a motivation. Comment Actions I was thinking about removing 64bit divisions in the hot path and moving them into the first initialization, instead you've added more... We should not bother about alignment in callout_when() if hardclocktime is aligned. Time passed by caller must be aligned to tick_sbt already by him, otherwise he should not use C_HARDCLOCK. Comment Actions If it is not aligned, then we should not try to align it. Most of C_HARDCLOCK users should do tick_sbt *, so it should be aligned.
Comment Actions @mav : I know modulus is a bit expensive, but many of those computations, even for 64-bit, could easily be avoided by using 1024 ticks instead of 1000 ticks per second .... Though that's a completely different discussion.
Comment Actions @mav : What about synchronous execution between CPUs. When you have one CPU it doesn't matter that much, but when you have multiple CPUs, it might make sense to execute the timers aligned to eachother, instead of sliding randomly? Comment Actions @hselasky As I have told above, our code tries to align periodic timers by providing eventtimers first event time, but it is not guarantied to be supported. I am not sure what are you saying or proposing. Comment Actions Maybe I wasn't that clear. For SMP the value that holds the next hardclock event is per-CPU, and what prevents those per-CPU values from sliding between CPUs? During boot, the dummy timer will put "random" starting values in there? Or am I wrong?
Comment Actions
IIRC nexthard is initialized to the same value on all CPUs and incremented by tick_sbt up to the current time. They may lag on idle CPUs, but should catch up back in sync as soon as CPU wake up. I don't see there any randomness. By the time cpu_initclocks_bsp() calls configtimer() to initialize next hard I see system should already calibrate TSC, so time should already be accurate. nexttick may be random on a system with periodic timers, but as I have told above, I'd prefer it to be removed for periodic timers rather than fixed, unless I missed what it is used for. Whats about "kern.hz a power of two", we never had that limitation, and introducing it now feels a bit late. Mentioned divisions for earlier CPUs were much more expensive than for modern, and still hz was never restricted. I'd prefer hz to be deprecated instead. Comment Actions
I see that too, but dumping the values I see they are not aligned like they should. Try this:
Comment Actions
It is not a good idea to deprecate "hz", because thread preemption depends on it! I.E. when a thread runs for too long, and other threads needs to be switched in. Comment Actions nexthard is not expected to be divisible by tick_sbt, since configtimer() initializes it to sbinuptime() + timerperiod. If you see something above that, please share, save my time.
Scheduler's preemption in sched_clock() depends on statclock (stathz), not hardclock (hz). Though sched_ule indeed uses ticks for some stats. And I am only saying that we should stop using hz in all random applications, not depending on it and just needing sleep for some time, etc. |