
hz.9: update stathz for current usage

Authored by imp on Jun 18 2021, 3:51 PM.



Update the stathz description to reflect reality. profhz is the only
thing we should deprecate.

Sponsored by: Netflix

Diff Detail

rG FreeBSD src repository

Event Timeline

imp requested review of this revision. Jun 18 2021, 3:51 PM
jrtc27 added inline comments.

You now have an "either" without an "or".


While it is true that historically we used the same physical timer to drive statclock and profclock, it might be clearer to the reader to describe them as three separate logical clocks, and then perhaps add a note after the descriptions of statclock and profclock saying that historically these two clocks were driven by the same physical timer. I think that would also fix Jess's comment about the missing "or".

Phab's UI won't let me select the body of text to make a suggestion since there's already a comment, so I'll just write one by hand:

A second clock running at
.Va stathz
gathers statistics on the system and its processes.
It computes values for
.Xr top 1 .
Finally, a third clock runs at
.Va profhz
to sample user program counter values for profiling purposes.
This profiling clock has been replaced by the more functional
.Xr hwpmc 4
and may be removed in a future version of
.Fx .

Maybe it's not even worth mentioning the historical note that statclock and profclock use the same underlying timer, or that profclock doesn't run except when it is enabled by at least one process requesting profiling. If we did, maybe we would add it in a separate '.Sh IMPLEMENTATION NOTES' section. Also, I snuck in a fuller deprecation warning for profclock. :)

Update, per jhb@ review, adopting his wording...

Unsure about the implementation details section, so please comment.

add in kib's observations.

Added kib based on his observations on my original commit.

imp marked 2 inline comments as done. Jun 18 2021, 7:18 PM
emaste added inline comments.

Discussed on IRC and @kib pointed out in email - this is not correct.

I think "up to .Va hz times per second" would be accurate, but I'm not sure if there's more context we should offer.

imp marked an inline comment as done. Jun 18 2021, 7:25 PM
imp added inline comments.

hardclock needs to be updated, and I'll put the details there.
Forgot to update this based on IRC earlier, and kib's comments made me revisit.
There are a lot of details about what we try to do when it is called fewer times per second than hz, but that detail is best documented elsewhere.

imp marked an inline comment as done. Jun 18 2021, 8:16 PM
imp added inline comments.

In the hardclock.9 page, I'll say "the sum of cnt over a second will tend to be about hz" or something similar.... It looks like we make our best guesses on ticks, but nothing really cares how exact we are on that for time keeping, within certain limits dictated by overflow and such.


I think something along these lines would be more accurate:

This routine is called with a varying frequency, high enough to handle the events fired from it in time. The highest frequency is hz, and the current implementation ensures that the function is called at least once per second even if the CPU is idle.




I think that's correct, but probably not quite the right way to express it; essentially hardclock is called on an upcoming 1/hz tick boundary when necessary. But anyway this is outside the scope of hz.9.
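The "upcoming tick boundary" idea from the discussion can be sketched as plain arithmetic. This is a hypothetical user-space illustration, not the actual FreeBSD event-timer code (`getnextcpuevent()` and friends are considerably more involved): round the current time up to the next 1/hz boundary, but never schedule further out than some cap, such as the half-second tc_windup deadline mentioned later in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: given the current time in nanoseconds, return the
 * time of the next hardclock-style event. Events land on 1/hz tick
 * boundaries, but are never deferred longer than cap_ns. Illustrative
 * only; names and structure are invented for this example.
 */
static uint64_t
next_hardclock_ns(uint64_t now_ns, uint64_t hz, uint64_t cap_ns)
{
	uint64_t period = 1000000000ULL / hz;		/* one tick = 1/hz s */
	uint64_t next = ((now_ns / period) + 1) * period; /* next boundary */

	if (next - now_ns > cap_ns)	/* never sleep past the cap */
		next = now_ns + cap_ns;
	return (next);
}
```

With hz = 1000 the next event is at most 1 ms away; with a very low hz the cap takes over, which matches the "at least once in a while even when idle" behavior being described.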

Looks good to me overall. kib@'s remaining suggestions should be resolved, but I'm otherwise happy. A bit of a buzzsaw process-wise to get these pages in, but nice to have this more clearly documented. Thanks!


That suggestion to use "mechanism" is good with me.

This revision is now accepted and ready to land.Jun 18 2021, 9:22 PM

Skate the middle ground between all the suggested wordings for the frequency of hardclock. I believe this tersely, but accurately, reflects the system behavior.

This revision now requires review to proceed. Jun 19 2021, 12:46 AM
imp marked 5 inline comments as done. Jun 19 2021, 12:56 AM
This revision is now accepted and ready to land. Jun 19 2021, 1:03 AM

Tempt fate and tweak a word or two here and there, and
add a sentence saying what these clocks do.

This revision now requires review to proceed. Jun 19 2021, 1:09 AM

As an aside I brought up removing -pg profiling support last year (


Interesting, I wonder if we can eliminate the "at least once per second"?


Yea. I saw that and wasn't sure what to think, so said nothing.
Is that where we turned off PROFILE by default? Or was that prior?


No. I don't think so.

  1. timecounters generally wrap in the few-second range these days, and we have to poll them faster than that.
  2. time_second needs to be bumped once a second.

There may be other reasons as well. hardclock also clocks polling devices and queues the end-of-epoch processing (if I'm reading the code right; I just took a quick glance). There may be other reasons I'm unaware of that kib@ or mav@ could expound upon.


Is that where we turned off PROFILE by default? Or was that prior?

We haven't turned PROFILE off by default yet.


Yes, the main reason is the timecounters. They are set up to return a uint32_t value which we allow to wrap in approximately a second. In particular, this is why we have the TSC-low timecounter, which shifts the TSC value right.

And this means that we should actually call tc_windup() once per half-second, not once per second. We indeed do; see getnextcpuevent().
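The wrap arithmetic behind this can be checked in a few lines. This is a user-space sketch with a hypothetical 3 GHz TSC frequency, not kernel code: a free-running 32-bit counter at frequency f wraps every 2^32 / f seconds, and tc_windup() must observe the counter more often than half that interval to detect each wrap unambiguously.

```c
#include <stdint.h>

/*
 * Milliseconds until a 32-bit counter running at freq_hz wraps:
 * 2^32 / freq_hz seconds, scaled to ms to keep the math integral.
 * Illustrative only.
 */
static uint64_t
wrap_ms(uint64_t freq_hz)
{
	return ((1ULL << 32) * 1000 / freq_hz);
}
```

For a 3 GHz TSC this gives roughly 1.43 seconds; shifting the TSC right by one bit (as TSC-low does) halves the effective frequency and doubles the wrap interval to roughly 2.86 seconds, at the cost of one bit of resolution.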

Also this frequency determines how fast we can reclaim resources when epochs end, which imp noted above.

On the other hand, this "once per half-second" really needs to occur once per half-second globally, not on each CPU. Right now we do not take advantage of this possible optimization.

The other niggle about "once a second" is that kern.cp_time is updated for idle CPUs at least once a second so that top(1) works. Right now, when a CPU is idle, we don't schedule statclock on it as an optimization; instead we do a fixup later, when the CPU resumes, assuming all the missed ticks were idle ticks. Perhaps we could do this fixup globally in the same way as is done for tc_windup, as Konstantin mentions, but I don't think that has been implemented.
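The idle-tick fixup jhb describes can be sketched as follows. The struct and function names here are invented for illustration; the real accounting lives in the kernel's statclock/cp_time code: on wakeup, all ticks missed while statclock was suppressed are credited to the idle bucket in one batch.

```c
#include <stdint.h>

/* Hypothetical per-CPU accounting state for this sketch. */
struct cpu_stats {
	uint64_t last_tick;	/* last tick already accounted for */
	uint64_t idle_ticks;	/* accumulated idle tick count */
};

/*
 * Fixup run when a CPU resumes from idle: assume every tick missed
 * since last_tick was an idle tick and credit them all at once.
 */
static void
idle_fixup(struct cpu_stats *cs, uint64_t now_tick)
{
	if (now_tick > cs->last_tick) {
		cs->idle_ticks += now_tick - cs->last_tick;
		cs->last_tick = now_tick;
	}
}
```

The "at least once a second" constraint in the thread exists precisely so that this batched credit (or a periodic statclock) keeps kern.cp_time fresh enough for top(1) even on CPUs that never wake.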

Update based on the observations here. These tidbits, I think, are instructive.


I know I called this "stat clock" earlier, so this is my bad.


Do we want to include ps(1) and top(1) here since we Xr them inline now?

gbe added a subscriber: gbe.


This revision was not accepted when it landed; it landed in state Needs Review. Jul 1 2021, 3:34 PM
This revision was automatically updated to reflect the committed changes.

So if there are still issues, let me know and I'll work with you; I wanted to get this round into the tree.
Next up: hardclock.9.