Details

Reviewers

mav
phk
imp
rrs
kib

Commits

rG4f576e96663e: time(3): Increase precision of time conversion functions by using gcd.
rGd97ecf714e79: time(3): Increase precision of time conversion functions by using gcd.

Summary

When converting times to and from units which have many leading zeros,
it pays off to compute the greatest common divisor first, and then do the
scaling product. This way all time unit conversion comes down to scaling a
signed or unsigned 64-bit value by a fraction represented by two signed
or unsigned 32-bit integers.

For example:

SBT_1S is defined as 2^32 . When scaling using powers of 10 above 1,
the gcd of SBT_1S and 10^N is always greater than or equal to 4,
when N is greater or equal to 2.

For example scaling a sbt value to milliseconds is done by multiplying
by (1000 / 8) and dividing by (2^32 / 8).

This trick allows for higher sbt precision at very little additional CPU cost.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Test Plan

#include <stdio.h>
#include "/usr/src/sys/sys/time.h"

int
main()
{
	for (int64_t x = 0; x != 2000000000; x++) {
		int64_t y = nstosbt(x);
		int64_t z = sbttons(y);

		if (z != x)
			printf("NS ERROR %zd %zd\n", z, x);
	}

	for (int64_t x = 0; x != 2000000; x++) {
		int64_t y = ustosbt(x);
		int64_t z = sbttous(y);

		if (z != x)
			printf("US ERROR %zd %zd\n", z, x);
	}

	for (int64_t x = 0; x != 2000; x++) {
		int64_t y = mstosbt(x);
		int64_t z = sbttoms(y);

		if (z != x)
			printf("MS ERROR %zd %zd\n", z, x);
	}

	return (0);
}

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

• hselasky requested review of this revision.Oct 2 2022, 10:23 PM

• hselasky created this revision.

• hselasky added a reviewer: kib.Oct 3 2022, 12:10 AM

kib added inline comments.Oct 3 2022, 1:43 AM

sys/sys/time.h
169	I am not sure what does the definition mean. I suspect it is not GCD, but the largest power of 2 dividing both of the arguments, am I right? I also needed the following hint from the hacker's delight to decipher the calculation. I believe it must be included in the description. create a word with 1’s at the positions of the trailing 0’s in x, and 0’s elsewhere, producing 0 if none (e.g., 01011000 ->0000 0111): ~x & x – 1
172	Why all these macros cannot be static inline functions? I have to look up the signatures of the used functions to understand the rules. Also, I do not think we should put symbols in the user namespace, that are not supposed to be accessed by users. Please use implementation namespace: __xxxx, i.e. two underscores in front of alphabetical characters.

• hselasky added a child revision: D36859: time(3): Optimize tvtohz() function..Oct 3 2022, 9:20 AM

Implement suggestions from @kib

use __ to prefix internal function names.
add more detailed explanation.
use static inline functions when possible.

• hselasky marked 2 inline comments as done.Oct 3 2022, 10:24 AM

Refactor code a bit.

x=125 NEW: 536870912 0  OLD: 536870913 0 DIFF: -1 sbt 0 ms
x=250 NEW: 1073741824 0  OLD: 1073741825 0 DIFF: -1 sbt 0 ms
x=375 NEW: 1610612736 0  OLD: 1610612737 0 DIFF: -1 sbt 0 ms
x=500 NEW: 2147483648 0  OLD: 2147483649 0 DIFF: -1 sbt 0 ms
x=625 NEW: 2684354560 0  OLD: 2684354561 0 DIFF: -1 sbt 0 ms
x=750 NEW: 3221225472 0  OLD: 3221225473 0 DIFF: -1 sbt 0 ms
x=875 NEW: 3758096384 0  OLD: 3758096385 0 DIFF: -1 sbt 0 ms
x=1125 NEW: 4831838208 0  OLD: 4831838209 0 DIFF: -1 sbt 0 ms
x=1250 NEW: 5368709120 0  OLD: 5368709121 0 DIFF: -1 sbt 0 ms
x=1375 NEW: 5905580032 0  OLD: 5905580033 0 DIFF: -1 sbt 0 ms
x=1500 NEW: 6442450944 0  OLD: 6442450945 0 DIFF: -1 sbt 0 ms
x=1625 NEW: 6979321856 0  OLD: 6979321857 0 DIFF: -1 sbt 0 ms
x=1750 NEW: 7516192768 0  OLD: 7516192769 0 DIFF: -1 sbt 0 ms
x=1875 NEW: 8053063680 0  OLD: 8053063681 0 DIFF: -1 sbt 0 ms

Here we see that the old sbt conversions are off by-one for 125ms for example.

Looking at the hex values you see (1/8 hz) is not correctly computed as sbt:

octave:3> dec2hex(536870912)
ans = 20000000 (correct)
octave:4> dec2hex(536870913)
ans = 20000001 (wrong)

Fix spelling in comment.

Something to consider? :

It looks to me that the original code tried to minimize the number of divisions (and modulo's). This might have been because of the likes of, say, "ARMv7-A implementation that does not include the Virtualization Extensions" having SDIV and UDIV as optional in the profile and ARMv7-R only requiring such instructions for Thumb, not ARM. (Just examples, I've no general knowledge of the range of platform's status for such.) Also: so far as I know, there is no 64-bit division instruction. The prior is about 32-bit division.

I also wonder if factor or divisor or the like is intended to ever allowed to be negative in the use of %. (Some routines have signed types involved.) Checking C's language requirements for negatives involved in % operations could be important otherwise: If there is C implementation freedom, that could lead to variability in the results.

Looks like C89 was the last with "implementation defined" for / and % involving negative integers. After that: 7/3==2; 7%3==1; -7/3==-2; -7%3==-1; 7/-3==-2; 7/-3==1; -7/-3==2; -7%-3==-1. So / truncates towards zero and a%b==a-(a/b)*b when a/b is representable. Notably -7%3!=7%-3 and 7%3!=-7%-3 .

In D36857#836796, @markmi_dsl-only.net wrote:

It looks to me that the original code tried to minimize the number of divisions (and modulo's).

You are on right track, but off by a decade or two :-)

There were no 64 bit divide instruction in the i386 instruction set, and the software version in the gcc runtime were horribly slow by kernel standards.

markmi_dsl-only.net mentioned this in D36858: time(3): Align fast clock times to avoid firing multiple timers..Oct 3 2022, 8:06 PM

In D36857#836796, @markmi_dsl-only.net wrote:

Something to consider? :

It looks to me that the original code tried to minimize the number of divisions (and modulo's). This might have been because of the likes of, say, "ARMv7-A implementation that does not include the Virtualization Extensions" having SDIV and UDIV as optional in the profile and ARMv7-R only requiring such instructions for Thumb, not ARM. (Just examples, I've no general knowledge of the range of platform's status for such.)

Yes, but by doing a bunch of if's, which in turn kill performance, when the branch misses.

Try to compile the new code and look at the assembly.

I also wonder if factor or divisor or the like is intended to ever allowed to be negative in the use of %. (Some routines have signed types involved.) Checking C's language requirements for negatives involved in % operations could be important otherwise: If there is C implementation freedom, that could lead to variability in the results.

Looks like C89 was the last with "implementation defined" for / and % involving negative integers. After that: 7/3==2; 7%3==1; -7/3==-2; -7%3==-1; 7/-3==-2; 7/-3==1; -7/-3==2; -7%-3==-1. So / truncates towards zero and a%b==a-(a/b)*b when a/b is representable. Notably -7%3!=7%-3 and 7%3!=-7%-3 .

The divisor and modulus is always positive, but the result may be negative for signed operation, if the input value is negative, in my code.

Try to compile the new code and look at the assembly.

You mean the calls for armv7 when I compile a copy your test program (and a locally patched time.h copy as ./time.h and the #include adjusted)? :

. . .
<main+0x50> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x90> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x124> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x154> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x1f0> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x220> bl        000209b0 <__aeabi_uldivmod>
. . .

FYI: Using the system's normal time.h shows 3 such calls, not 6.

Side note: The compile also complains about the likes of:

# cc -O2 fbsd_time_conv.c
fbsd_time_conv.c:13:33: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                         ~~~         ^
                                         %lld
fbsd_time_conv.c:13:36: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                             ~~~        ^
                                             %lld
. . .

(modulo the style changes I listed)

sys/sys/time.h
169
171–177
177	... and the same changes over the functions below.

This revision is now accepted and ready to land.Oct 3 2022, 11:58 PM

In D36857#836894, @markmi_dsl-only.net wrote:

Try to compile the new code and look at the assembly.

You mean the calls for armv7 when I compile a copy your test program (and a locally patched time.h copy as ./time.h and the #include adjusted)? :

. . .
<main+0x50> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x90> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x124> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x154> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x1f0> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x220> bl        000209b0 <__aeabi_uldivmod>
. . .

FYI: Using the system's normal time.h shows 3 such calls, not 6.

Side note: The compile also complains about the likes of:

# cc -O2 fbsd_time_conv.c
fbsd_time_conv.c:13:33: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                         ~~~         ^
                                         %lld
fbsd_time_conv.c:13:36: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                             ~~~        ^
                                             %lld
. . .

Yes, that looks correct. You should also notice that going to SBT is slower than going from SBT.

I checked qdivrem() in the kernel, and It is highly optimized. For small timeouts it will return zero infact, so only one full qdivrem() will be needed.

Also for millisecond and microsecond conversion, the divisor now fits into 16-bits, and apparently __qdivrem() optimizes for this!

• hselasky marked 3 inline comments as done.Oct 4 2022, 10:39 AM

Closed by commit rGd97ecf714e79: time(3): Increase precision of time conversion functions by using gcd. (authored by • hselasky). · Explain WhyOct 4 2022, 11:54 AM

This revision was automatically updated to reflect the committed changes.

• hselasky added a commit: rGd97ecf714e79: time(3): Increase precision of time conversion functions by using gcd..

• hselasky added a commit: rG4f576e96663e: time(3): Increase precision of time conversion functions by using gcd..Oct 12 2022, 3:56 PM

time(3): Increase precision of sbt and bintime conversion functions by using gcd.
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 111417

sys/sys/time.h

time(3): Increase precision of sbt and bintime conversion functions by using gcd.ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 111417

sys/sys/time.h

time(3): Increase precision of sbt and bintime conversion functions by using gcd.
ClosedPublic
Actions

Revision Contents
Changeset List