Implement mul_u64_u64_div_u64() for an updated iwlwifi driver (though
we do not yet use it there; it is used for inkernel ptp on wifi).
Sponsored by: The FreeBSD Foundation
Submitted by: cperciva
MFC after: 10 days
Differential D40120
LinuxKPI: implement mul_u64_u64_div_u64() bz on May 16 2023, 9:07 PM. Authored by
Details
Implement mul_u64_u64_div_u64() for an updated iwlwifi driver (though Sponsored by: The FreeBSD Foundation
Diff Detail
Event TimelineComment Actions Please don't implement integers like floating point! I think you need to do something exactly like below. I just quickly sketched this up. Can you verify it? static inline uint64_t mul_u64_u64_div_u64(uint64_t x, uint64_t y, uint64_t z) { const uint64_t r32[4] = { x >> 32, x & 0xFFFFFFFFULL, y >> 32, y & 0xFFFFFFFFULL }; const uint64_t rZ[8] = { r32[0] / z, r32[0] % z, r32[1] / z, r32[1] % z, r32[2] / z, r32[2] % z, r32[3] / z, r32[3] % z }; return ( (((rZ[0] * rZ[4]) * z) << 64) + ((rZ[0] * rZ[5]) << 64) + (((rZ[0] * rZ[6]) * z) << 32) + ((rZ[0] * rZ[7]) << 32) + ((rZ[1] * rZ[4]) << 64) + (((rZ[1] * rZ[5]) / z) << 64) + ((rZ[1] * rZ[6]) << 32) + (((rZ[1] * rZ[7]) / z) << 32) + (((rZ[2] * rZ[4]) * z) << 32) + ((rZ[2] * rZ[5]) << 32) + ((rZ[2] * rZ[6]) * z) + (rZ[2] * rZ[7]) + ((rZ[3] * rZ[4]) << 32) + (((rZ[3] * rZ[5]) / z) << 32) + (rZ[3] * rZ[6]) + ((rZ[3] * rZ[7]) / z) ); } Comment Actions Can you explain a bit more what this function is supposed to do? Loosing integer precision is not a good thing! What is this function used for? HPS Comment Actions @bz : Another thing coming to mind, is to apply the greatest common divisor between x,y,z. This computation is pretty cheap. When you have exact fractions, it doesn't look so good seeing rounding errors. Is this function being used for clock computations? Comment Actions Yeah, it's used in a driver ptp code (which we don't actually support yet, the inkernel ptp bits) Based on it's name it does X * Y / Z Comment Actions Can you show me excatly where this function is needed? The function name does not indicate the function is inaccurate. HPS Comment Actions In case this is going to be too complicated I'll add a "skeleton" in given we are unlikely going to use this much  if at all  anywhere. Comment Actions Thank you! I looked a bit at Linux, and it seems they have an exact implementation for this function, utilizing dedicated CPU instructions when possible to avoid bunches of code. What do you think? Some ideas: What values are being passed to this function? Can we compute the greatest common divisor on the mul/div values, and reduce this to a classic, scale 64bit number by 32/32bits as in sys/sys/time.h, after my recent changes?
Comment Actions @dwmalone Please take a look at this too, just to make sure I didn't goof anywhere... it's always hard to review my own code. ;) Comment Actions Looks sensible to me. Minor observations:
int main(void) { uint64_t x, y, z, m; uint64_t a1, a2; int i; for (i = 0; i < 1000000; i++) { x = arc4random(); y = arc4random(); z = arc4random(); a2 = (x*(uint64_t)y)/z; m = arc4random(); y *= m; z *= m; a1 = mul_u64_u64_div_u64(x,y,z); if (a1 != a2) printf("%ju %ju %ju\n", (uintmax_t)x, (uintmax_t)y, (uintmax_t)z); } return 0; }
