Page MenuHomeFreeBSD

kernel: Rework how loadtimer() works.
Needs ReviewPublic

Authored by hselasky on Sep 30 2022, 5:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 20 2023, 7:54 AM
Unknown Object (File)
Jun 8 2023, 1:36 AM
Unknown Object (File)
May 2 2023, 8:11 AM
Unknown Object (File)
Apr 3 2023, 12:47 AM
Unknown Object (File)
Mar 1 2023, 5:49 PM
Unknown Object (File)
Feb 11 2023, 7:29 PM
Unknown Object (File)
Dec 15 2022, 8:03 PM
Unknown Object (File)
Dec 14 2022, 10:07 PM
Subscribers

Details

Reviewers
mav
kib
Summary

Because et_start() uses the delta between timestamps to compute the next event,
it is important to get the current sbinuptime() just before calling et_start(),
else the time spent to execute code and obtaining mutexes, since the last
sbinuptime() call gets accumulated as delay for the next timer event.

Tested using constant 1ms timers firing from user-space using absolute
timeouts and sending UDP packets.

See audio/hpsjam for more details. The test program was run using real-time
priority and the resulting jitter was measured by ear.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Before:

hpsjam_before.png (447×936 px, 56 KB)

After:
hpsjam_after.png (458×1 px, 27 KB)

The original idea was to keep the potentially expensive clock reading operation out of the lock scope, which in case of global timer, like HPET (which I guess you are using according to other review), may be quite congested. You are right that reading time counter closer to event timer programming should reduce the jitter. I just have difficulties to believe that spin on the lock may be long enough to be possible to hear (thinking whether there can be a context switch somewhere?). But if you say so, I have no big objections. I don't know any modern system with global event timer or slow enough time counter any more, so don't care so much about this aspect.

sys/kern/kern_clocksource.c
420

Above in !start and here in eq cases now would not be needed, but you are reading it, that is a waste.

473

With now variable removed above, you could pass sbintime() result here directly. Though it is a cosmetics.

@mav : Thanks for your comments. I will check this issue more tomorrow. I'm using HPET (per-cpu). I'll try to add some debug prints to measure the time difference between the old "now" and "sbinuptime()", to try and locate when deltas get too high.

I'm using HPET (per-cpu).

I am curious why.

@kib : Its LAPIC : kern.eventtimer.timer: LAPIC . Note this is a laptop and not server.

FreeBSD 14-main w/o this patch:

Worst case offset is approximately 83 microseconds, which is close to 16 percent when using kern.hz = 2000 .

358479 / 4294967
ans = 0.083465

sys/kern/kern_clocksource.c
420

cpu_new_callout() now can save a sbinuptime() call ! That's good.

One question though: What is the return value from cpu_idleclock() used for? Must it be accurate?

sys/kern/kern_clocksource.c
420

On x86 it is used to select optimal sleep method, depending on expected sleep time. I.e. to not try deep sleep if we expect to wake up in few microseconds. It does not need to be too accurate, but less then the order of magnitude would be good.

@kib : Its LAPIC : kern.eventtimer.timer: LAPIC . Note this is a laptop and not server.

LAPIC is very much different from HPET. For LAPIC, esp. modern LAPIC with TSC-deadline (almost any not too old CPU) programming timer costs less than the memory access. Then
But for HPET, typically used when there are either some problems with TSC (e.g. BIOS mis-programming) or under virtualization, the cost of the patch could be significant.

Also, sbinuptime() is about timecounter and not timers, but the cost analysis of TSC vs. HPET is even more applicable, since HPET means that we issue a read (which is not posting) to the south bridge, while TSC read is typically core-local or uncore.

@mav : I added a sysctl to show the maximum time offset in timercb():

Typical time is:

123781 / 4294967 = 0.028820 milliseconds

But when I run powerd -a maximum -v, then something impacts the timer!

4298726 / 4294967 = 1.0009 milliseconds of jitter!