hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC
Closed · Public

Authored by sepherosa_gmail.com on Dec 14 2016, 8:49 AM.

Details

Reviewers

kib
howard0su_gmail.com
honzhan_microsoft.com
decui_microsoft.com
jhb
delphij
royger

Commits

rS310239: hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC

Summary

This improves gettimeofday(2) performance about 6 times, as measured by tools/tools/syscall_timing.

Test Plan

perf03-sephe:syscall_timing# /syscall_timing gettimeofday
Clock resolution: 0.000000101
test loop time iterations periteration
gettimeofday 0 1.014996400 36657501 0.000000027
gettimeofday 1 1.014232700 36467423 0.000000027
gettimeofday 2 1.037720000 37498712 0.000000027
gettimeofday 3 1.040983300 37674919 0.000000027
gettimeofday 4 1.010988300 36572763 0.000000027
gettimeofday 5 1.006983400 36404722 0.000000027
gettimeofday 6 1.006974100 36229166 0.000000027
gettimeofday 7 1.005982500 36437948 0.000000027
gettimeofday 8 1.011984500 36638109 0.000000027
gettimeofday 9 1.004983900 36347236 0.000000027
perf03-sephe:syscall_timing# sysctl kern.timecounter.fast_gettime=0
kern.timecounter.fast_gettime: 1 -> 0
perf03-sephe:syscall_timing# /syscall_timing gettimeofday
Clock resolution: 0.000000101
test loop time iterations periteration
gettimeofday 0 1.000997300 6175800 0.000000162
gettimeofday 1 1.000986700 6181355 0.000000161
gettimeofday 2 1.000981500 6174445 0.000000162
gettimeofday 3 1.000985100 6086535 0.000000164
gettimeofday 4 1.000982500 6137102 0.000000163
gettimeofday 5 1.000984700 6173990 0.000000162
gettimeofday 6 1.033986300 6284882 0.000000164
gettimeofday 7 1.050068700 4551081 0.000000230
gettimeofday 8 1.011150300 6186478 0.000000163
gettimeofday 9 1.030976700 6369569 0.000000161

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

sepherosa_gmail.com updated this revision to Diff 22905. Dec 14 2016, 8:49 AM

sepherosa_gmail.com retitled this revision to hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC.

sepherosa_gmail.com updated this object.

sepherosa_gmail.com edited the test plan for this revision.

sepherosa_gmail.com added reviewers: delphij, royger, decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, kib, jhb.

kib added inline comments. Dec 14 2016, 9:00 AM

lib/libc/x86/sys/__vdso_gettc.c
163 ↗	(On Diff #22905)	If the device cannot be opened for some reason, e.g. the process is executing with a devfs instance which does not provide the device, say in a jail, and the kernel reports the hyperv algo for the timecounter, then each gettimeofday(2) will be accompanied by a failing open(2). This is the reason why I check for map == NULL _and_ MAP_FAILED for HPET.
176 ↗	(On Diff #22905)	Please follow style(9) and put declarations at the beginning of function.
180 ↗	(On Diff #22905)	I know about it but it seems to be fine to use lfence on amd, I never got reports of weird timecounter behavior on AMD. I suspect that mfence usage on AMD is somewhat cargo-cult.
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c
137 ↗	(On Diff #22905)	You do not need to fill these fields when algorithm is hv.
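
The concern about a failing open(2) on every call can be illustrated with a minimal sketch. It is not the libc code under review: `ref_page`, `ref_page_get`, `open_attempts`, and the device path passed in are hypothetical names for this example, and it ignores the atomic or locked update that concurrent callers would need. The idea is to keep three states in one pointer (NULL for "not tried yet", MAP_FAILED for "tried and failed", anything else for "mapped"), so a failure is remembered instead of retried.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state: NULL = not tried, MAP_FAILED = tried and failed. */
static void *ref_page;
/* Instrumentation for this example only: counts real open(2) attempts. */
static int open_attempts;

static void *
ref_page_get(const char *devpath, size_t len)
{
	void *p;
	int fd;

	/* Both a cached mapping and a cached failure short-circuit here. */
	if (ref_page != NULL)
		return (ref_page);

	open_attempts++;
	p = MAP_FAILED;
	fd = open(devpath, O_RDONLY);
	if (fd != -1) {
		p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
		close(fd);
	}
	/* Remember the outcome, including MAP_FAILED, for later calls. */
	ref_page = p;
	return (p);
}
```

A caller would treat MAP_FAILED as "fall back to the syscall path"; with this pattern a jail whose devfs lacks the device pays for one failed open(2) rather than one per gettimeofday(2).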

sepherosa_gmail.com updated this revision to Diff 22937. Dec 15 2016, 2:53 AM

kib added inline comments. Dec 15 2016, 10:19 AM

lib/libc/x86/sys/__vdso_gettc.c
159 ↗	(On Diff #22937)	e.g. in misconfigured jail.
186 ↗	(On Diff #22937)	You moved one variable, but left two others.
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c
76 ↗	(On Diff #22937)	I do not understand this. The libc part is only compiled on amd64, so why do you provide the compat32 version? Either libc for i386 should also implement the HV timecounter (preferred), or the compat32 part should become NULL.
137 ↗	(On Diff #22937)	You should just bzero the whole part of the timehands structure starting with th_x86_shift. I am sorry that I was not quite clear in my previous note.
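
The suggestion to zero everything from th_x86_shift onward can be sketched with offsetof. The structure below is a hypothetical stand-in, not the kernel's timehands definition; only the field name th_x86_shift comes from the comment above, and `example_clear_tail` is an invented helper. memset is used so that the sketch builds outside the kernel; bzero would do the same job.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; not the real timehands structure. */
struct example_timehands {
	uint32_t th_algo;
	uint32_t th_gen;
	uint64_t th_scale;
	/* Everything from here on is treated as algorithm-specific. */
	uint32_t th_x86_shift;
	uint32_t th_x86_hpet_idx;
	uint32_t th_res[6];
};

/* Zero the tail of the structure, starting at th_x86_shift. */
static void
example_clear_tail(struct example_timehands *th)
{
	size_t off = offsetof(struct example_timehands, th_x86_shift);

	memset((char *)th + off, 0, sizeof(*th) - off);
}
```

Clearing by offset rather than field by field means fields added to the tail later are covered without touching this code.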

sepherosa_gmail.com added inline comments. Dec 16 2016, 1:11 AM

lib/libc/x86/sys/__vdso_gettc.c
186 ↗	(On Diff #22937)	I actually prefer to have local variables close to their usage. IIRC, there was some discussion about the style of grouping all local variables at the beginning of the function, but it went nowhere. Sure, I will move them to the beginning of the function.
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c
76 ↗	(On Diff #22937)	OK, I will remove it.
137 ↗	(On Diff #22937)	Heh, I used tsc.c as an example, which fills in the "shift" and sets "hpet" to ~0; that is why the original code did the same (since Hyper-V TSC does not use the shift, I set it to 0 and hpet to ~0).

sepherosa_gmail.com updated this revision to Diff 22988. Dec 16 2016, 2:29 AM

Other than compat32 bits, this looks good.

This revision is now accepted and ready to land. Dec 16 2016, 9:49 AM

Closed by commit rS310239: hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC (authored by sephe). Dec 19 2016, 7:41 AM

This revision was automatically updated to reflect the committed changes.

So you did not add support to the 32-bit libc. Why?

In D8789#183203, @kib wrote:

So you did not add support to the 32-bit libc. Why?

It currently requires mulq, which is not available on 32-bit systems.

In D8789#183231, @sepherosa_gmail.com wrote:

It currently requires mulq, which is not available on 32-bit systems.

The code doesn't require 128-bit multiplication support; it is useful for optimization but not critical. It is possible to express the same calculation using big-number multiplication (or, if you prefer, a term from Russian elementary school, 'multiplication in column').

If the 64-bit values are X=a*g+b and Y=c*g+d, where g is 2^32, and a,b and c,d are the 32-bit high and low words of the corresponding 64-bit values, then X*Y = a*c*g*g + (a*d + b*c)*g + b*d. You need to take care of the carry bit. It is slightly more cumbersome than mulq, but not too complicated.

You could copy/paste the code from contrib/libcompiler_rt/lib/builtins/multi3.c, the __mulddi3() function.
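
The column multiplication described above can be sketched in portable C as follows. This is an illustration under stated assumptions, not the code from multi3.c and not what was committed: `mul64x64_hi` is a hypothetical helper name, and it returns only the upper 64 bits of the 128-bit product, which is the part a scale-and-shift-by-64 conversion needs.

```c
#include <stdint.h>

/* Hypothetical helper: upper 64 bits of the 128-bit product x * y. */
static uint64_t
mul64x64_hi(uint64_t x, uint64_t y)
{
	/* X = a*g + b and Y = c*g + d, with g = 2^32. */
	uint64_t a = x >> 32, b = x & 0xffffffffu;
	uint64_t c = y >> 32, d = y & 0xffffffffu;
	uint64_t bd = b * d;
	uint64_t ad = a * d;
	uint64_t bc = b * c;
	uint64_t ac = a * c;
	/*
	 * Middle column: the carry out of b*d plus the low words of a*d
	 * and b*c.  The sum is at most 3 * (2^32 - 1), so it cannot
	 * overflow 64 bits; its upper word is the carry into the top half.
	 */
	uint64_t mid = (bd >> 32) + (ad & 0xffffffffu) + (bc & 0xffffffffu);

	return (ac + (ad >> 32) + (bc >> 32) + (mid >> 32));
}
```

On a 32-bit target the compiler lowers each 64-bit multiply of two 32-bit values to a single widening multiply, so the cost is four multiplies plus a few additions. Assuming the reference TSC conversion has the form ((tsc * scale) >> 64) + offset, a caller would compute `mul64x64_hi(tsc, scale) + offset`.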
