Details

Reviewers

cperciva
dwmalone
• hselasky

Group Reviewers

linuxkpi

Commits

rGb80ea452375f: LinuxKPI: implement mul_u64_u64_div_u64()

Summary

Implement mul_u64_u64_div_u64() for an updated iwlwifi driver (though
we do not yet use it there; it is used for in-kernel ptp on wifi).

Sponsored by: The FreeBSD Foundation
Submitted by: cperciva
MFC after: 10 days

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

bz created this revision.May 16 2023, 9:07 PM

Herald added subscribers: emaste, imp. · View Herald TranscriptMay 16 2023, 9:07 PM

bz requested review of this revision.May 16 2023, 9:07 PM

Harbormaster completed remote builds in B51512: Diff 122012.May 16 2023, 9:07 PM

Please don't implement integers like floating point!

I think you need to do something exactly like below. I just quickly sketched this up. Can you verify it?

static inline uint64_t
mul_u64_u64_div_u64(uint64_t x, uint64_t y, uint64_t z)
{
	const uint64_t r32[4] = { x >> 32, x & 0xFFFFFFFFULL, y >> 32, y & 0xFFFFFFFFULL };
	const uint64_t rZ[8] =  { r32[0] / z, r32[0] % z, r32[1] / z, r32[1] % z,
					       r32[2] / z, r32[2] % z, r32[3] / z, r32[3] % z };

	return (
	(((rZ[0] * rZ[4]) * z) << 64) +
	((rZ[0] * rZ[5]) << 64) +
	(((rZ[0] * rZ[6]) * z) << 32) +
	((rZ[0] * rZ[7]) << 32) +
	
	((rZ[1] * rZ[4]) << 64) +
	(((rZ[1] * rZ[5]) / z) << 64) +
	((rZ[1] * rZ[6]) << 32) +
	(((rZ[1] * rZ[7]) / z) << 32) +
	
	(((rZ[2] * rZ[4]) * z) << 32) +
	((rZ[2] * rZ[5]) << 32) +
	((rZ[2] * rZ[6]) * z) +
	(rZ[2] * rZ[7]) +
	
	((rZ[3] * rZ[4]) << 32) +
	(((rZ[3] * rZ[5]) / z) << 32) +
	(rZ[3] * rZ[6]) +
	((rZ[3] * rZ[7]) / z)
	);
}

This revision now requires changes to proceed.May 17 2023, 7:06 AM

In D40120#913699, @hselasky wrote:

Please don't implement integers like floating point!

I think you need to do something exactly like below. I just quickly sketched this up. Can you verify it?

In D40120#913732, @bz wrote:

In D40120#913699, @hselasky wrote:

Please don't implement integers like floating point!

I think you need to do something exactly like below. I just quickly sketched this up. Can you verify it?

No

Can you explain a bit more what this function is supposed to do?

Loosing integer precision is not a good thing!

What is this function used for?

--HPS

@bz : Another thing coming to mind, is to apply the greatest common divisor between x,y,z. This computation is pretty cheap. When you have exact fractions, it doesn't look so good seeing rounding errors.

Is this function being used for clock computations?

In D40120#914011, @hselasky wrote:

@bz : Another thing coming to mind, is to apply the greatest common divisor between x,y,z. This computation is pretty cheap. When you have exact fractions, it doesn't look so good seeing rounding errors.

Is this function being used for clock computations?

Yeah, it's used in a driver ptp code (which we don't actually support yet, the in-kernel ptp bits)

Based on it's name it does X * Y / Z

Hans, any suggestions on how to proceed?

In D40120#918932, @bz wrote:

Hans, any suggestions on how to proceed?

@bz

Can you show me excatly where this function is needed?

The function name does not indicate the function is inaccurate.

--HPS

In D40120#918998, @hselasky wrote:

In D40120#918932, @bz wrote:

Hans, any suggestions on how to proceed?

@bz

Can you show me excatly where this function is needed?

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git/tree/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c#n74

In case this is going to be too complicated I'll add a "skeleton" in given we are unlikely going to use this much -- if at all -- anywhere.

In D40120#921419, @bz wrote:

In D40120#918998, @hselasky wrote:

In D40120#918932, @bz wrote:

Hans, any suggestions on how to proceed?

@bz

Can you show me excatly where this function is needed?

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git/tree/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c#n74

In case this is going to be too complicated I'll add a "skeleton" in given we are unlikely going to use this much -- if at all -- anywhere.

Thank you!

I looked a bit at Linux, and it seems they have an exact implementation for this function, utilizing dedicated CPU instructions when possible to avoid bunches of code.

What do you think?

Some ideas:

What values are being passed to this function?

Can we compute the greatest common divisor on the mul/div values, and reduce this to a classic, scale 64-bit number by 32/32-bits as in sys/sys/time.h, after my recent changes?

Update with algo submitted by @cperciva.

Harbormaster completed remote builds in B53126: Diff 126059.Aug 16 2023, 12:37 AM

bz edited the summary of this revision. (Show Details)Aug 16 2023, 12:38 AM

bz removed a subscriber: • hselasky.

cperciva added inline comments.Aug 16 2023, 4:31 AM

sys/compat/linuxkpi/common/include/linux/math64.h
121	Even if it's not a KASSERT, I think it's worth documenting // INVARIANT: x * y = res * z + rem + (y1 + y1z * z) * x1 // INVARIANT: y1 < z to help the reader understand.
134	The second part of this should be `y1 * 2 >= z`.
142	Before returning I would probably add KASSERT(res * z + rem == x * y); KASSERT(rem < z); Those don't prove that the computation is correct (since they only verify the bottom 64 bits of the 128-bit product) but it's still better than nothing (and would have caught the typo above).

Update based on @cperciva comments

Harbormaster completed remote builds in B53131: Diff 126066.Aug 16 2023, 4:39 AM

bz marked 3 inline comments as done.Aug 16 2023, 4:39 AM

@dwmalone Please take a look at this too, just to make sure I didn't goof anywhere... it's always hard to review my own code. ;-)

Will have a look today.

Looks sensible to me. Minor observations:

It will blow up with a division by zero if z = 0, however, that's probably the correct behaviour.
The two checks for overflow are clever - if we overflow, then we have definitely exceed our invariant, and so need to do the reduce step.
I've suggested recording one extra invariant above.
I've run a few different versions of this test code, and it has always produced the correct answers, so I think it is working fine.

int main(void) { 
        uint64_t x, y, z, m;
        uint64_t a1, a2;
        int i;
        
        for (i = 0; i < 1000000; i++) { 
                x = arc4random();
                y = arc4random();
                z = arc4random();
                a2 = (x*(uint64_t)y)/z;
                m = arc4random();
                y *= m; 
                z *= m; 
                a1 = mul_u64_u64_div_u64(x,y,z);
                if (a1 != a2)
                        printf("%ju %ju %ju\n", (uintmax_t)x, (uintmax_t)y, (uintmax_t)z);
        }                       
                        
        return 0;
}