This gives roughly 6 times better gettimeofday performance (about 27ns vs. about 162ns per call), as measured by tools/tools/syscall_timing.
Details
perf03-sephe:syscall_timing# /syscall_timing gettimeofday
Clock resolution: 0.000000101
test loop time iterations periteration
gettimeofday 0 1.014996400 36657501 0.000000027
gettimeofday 1 1.014232700 36467423 0.000000027
gettimeofday 2 1.037720000 37498712 0.000000027
gettimeofday 3 1.040983300 37674919 0.000000027
gettimeofday 4 1.010988300 36572763 0.000000027
gettimeofday 5 1.006983400 36404722 0.000000027
gettimeofday 6 1.006974100 36229166 0.000000027
gettimeofday 7 1.005982500 36437948 0.000000027
gettimeofday 8 1.011984500 36638109 0.000000027
gettimeofday 9 1.004983900 36347236 0.000000027
perf03-sephe:syscall_timing# sysctl kern.timecounter.fast_gettime=0
kern.timecounter.fast_gettime: 1 -> 0
perf03-sephe:syscall_timing# /syscall_timing gettimeofday
Clock resolution: 0.000000101
test loop time iterations periteration
gettimeofday 0 1.000997300 6175800 0.000000162
gettimeofday 1 1.000986700 6181355 0.000000161
gettimeofday 2 1.000981500 6174445 0.000000162
gettimeofday 3 1.000985100 6086535 0.000000164
gettimeofday 4 1.000982500 6137102 0.000000163
gettimeofday 5 1.000984700 6173990 0.000000162
gettimeofday 6 1.033986300 6284882 0.000000164
gettimeofday 7 1.050068700 4551081 0.000000230
gettimeofday 8 1.011150300 6186478 0.000000163
gettimeofday 9 1.030976700 6369569 0.000000161
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Not Applicable
- Unit: Tests Not Applicable
Event Timeline
lib/libc/x86/sys/__vdso_gettc.c | ||
---|---|---|
163 ↗ | (On Diff #22905) | If the device cannot be opened for some reason, e.g. the process is executing with a devfs instance which does not provide the device, say in a jail, and the kernel reports the hyperv algo for the timecounter, then each gettimeofday(2) will be accompanied by a failing open(2). This is the reason why I check for map == NULL _and_ MAP_FAILED for HPET. |
176 ↗ | (On Diff #22905) | Please follow style(9) and put declarations at the beginning of function. |
180 ↗ | (On Diff #22905) | I know about it, but it seems to be fine to use lfence on AMD; I never got reports of weird timecounter behavior on AMD. I suspect that mfence usage on AMD is somewhat cargo-cult. |
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c | ||
137 ↗ | (On Diff #22905) | You do not need to fill these fields when algorithm is hv. |
lib/libc/x86/sys/__vdso_gettc.c | ||
---|---|---|
159 ↗ | (On Diff #22937) | e.g. in misconfigured jail. |
186 ↗ | (On Diff #22937) | You moved one variable, but left two others. |
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c | ||
76 ↗ | (On Diff #22937) | I do not understand this. The libc part is only compiled on amd64, so why do you provide the compat32 version? Either libc for i386 should also implement the HV timecounter (preferred), or the compat32 part should become NULL. |
137 ↗ | (On Diff #22937) | You should just bzero the whole part of the timehands structure starting with th_x86_shift. I am sorry that I was not quite clear in my previous note. |
lib/libc/x86/sys/__vdso_gettc.c | ||
---|---|---|
186 ↗ | (On Diff #22937) | I actually prefer to have the local variables close to their usage. IIRC, there was some discussion about the style of grouping all of the local variables at the beginning of the function, but it went nowhere. Sure, I will move them to the beginning of the function. |
sys/dev/hyperv/vmbus/amd64/hyperv_machdep.c | ||
76 ↗ | (On Diff #22937) | OK, I will remove it. |
137 ↗ | (On Diff #22937) | Heh, I used tsc.c as an example, which fills the "shift" and sets the "hpet" to ~0; that's why the original code did the same (since Hyper-V TSC does not use shift, I set it to 0 and hpet to ~0). |
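The bzero suggestion for line 137 can be sketched as below. This is only an illustration: `struct example_timehands` and `clear_md_fields()` are hypothetical names, the field layout is an assumption rather than the real timehands definition, and memset() stands in for bzero() to keep the sketch portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the timehands structure. Only the idea
 * "machine-dependent fields start at th_x86_shift" is taken from the
 * review; the exact fields and their order are assumed.
 */
struct example_timehands {
	uint32_t th_algo;
	uint64_t th_scale;
	uint32_t th_x86_shift;		/* first machine-dependent field */
	uint32_t th_x86_hpet_idx;
	uint32_t th_res[6];
};

/* Zero everything from th_x86_shift to the end of the structure. */
static void
clear_md_fields(struct example_timehands *th)
{
	size_t off = offsetof(struct example_timehands, th_x86_shift);

	memset((char *)th + off, 0, sizeof(*th) - off);
}
```

Clearing the tail in one call avoids having to list each unused field, so fields added after th_x86_shift later are zeroed as well.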
The code doesn't require 128-bit multiplication support; it is useful as an optimization but not critical. It is possible to express the same calculation using big-number multiplication (or, if you prefer, a term from Russian elementary school, 'multiplication in column').
If the 64-bit values are X=a*g+b and Y=c*g+d, where g is 2^32, and a,b and c,d are the 32-bit high and low words of the corresponding 64-bit values, then X*Y = a*c*g*g + (a*d + b*c)*g + b*d. You need to take care of the carries out of the middle term. It is slightly more cumbersome than mulq, but not too complicated.
You could copy/paste the code from contrib/libcompiler_rt/lib/builtins/multi3.c, the __mulddi3() function.
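A minimal sketch of the column multiplication described above, in case it helps. The names `struct u128` and `mul64x64()` are made up for illustration and are not from the patch or from multi3.c; the variables a, b, c, d follow the notation in the comment.

```c
#include <stdint.h>

/* Hypothetical result type: the 128-bit product as two 64-bit halves. */
struct u128 {
	uint64_t hi;
	uint64_t lo;
};

/*
 * 64x64 -> 128 bit multiplication using only 64-bit arithmetic.
 * With x = a*g + b and y = c*g + d, g = 2^32:
 *   x*y = a*c*g*g + (a*d + b*c)*g + b*d
 */
static struct u128
mul64x64(uint64_t x, uint64_t y)
{
	const uint64_t mask = 0xffffffffULL;
	uint64_t a = x >> 32, b = x & mask;
	uint64_t c = y >> 32, d = y & mask;
	uint64_t ac = a * c, ad = a * d, bc = b * c, bd = b * d;
	uint64_t mid;
	struct u128 r;

	/*
	 * Everything that lands in bits 32..63 of the result. Each term
	 * is below 2^32, so the sum fits in 64 bits; its upper half is
	 * the carry into the high word.
	 */
	mid = (bd >> 32) + (ad & mask) + (bc & mask);

	r.lo = (mid << 32) | (bd & mask);
	r.hi = ac + (ad >> 32) + (bc >> 32) + (mid >> 32);
	return (r);
}
```

If the consumer only needs the high 64 bits of the product (the result of a multiply followed by a 64-bit right shift), `r.hi` alone is enough and the computation of `r.lo` can be dropped.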