Page MenuHomeFreeBSD

Build arm64 kernel with inline ffs, fls
ClosedPublic

Authored by dougm on May 12 2019, 9:16 PM.

Details

Summary

Implement the ffs and fls instructions, and their longer counterparts, in cpufunc, in terms of gcc extensions like __builtin_ffs, for arm64 architectures, and use those, rather than simple libkern implementations, in building arm64 kernels.

Tested by: greg_unrelenting.technology

Test Plan

I can complete an arm64 build, and I can examine an object file like pmap.o or subr_blist.o for signs that the inlining has happened, as instruction sequences like

244: dac001c9 rbit x9, x14
248: dac01129 clz x9, x9

appear there to indicate a fast implementation of ffs. My reference on smart bit ops for arm64 and how to recognize them is https://fgiesen.wordpress.com/2013/10/18/bit-scanning-equivalencies/.

Without the patch, the arm64 build does not have these sequences in these object files and has lines like this instead:

af34:       94000000        bl      0 <ffsl>

which do not appear with the patch in place.

None of this suggests that this patch is correct, but only that it does something. I don't have the means to test an arm64 kernel at this time. If Peter Holm can help, that would be great. If not, I'll have to wait for the freebsd arm64 test boxes to come back online.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.May 12 2019, 9:16 PM

Perhaps @andrew, @manu, @greg_unrelenting.technology or one of the other ARM folks would be willing to test?

arm64/include/cpufunc.h
56 ↗(On Diff #57343)

The indentation in this function and the one above looks off.

dougm updated this revision to Diff 57349.May 13 2019, 1:14 AM

Address indentation issues.

kib added a comment.May 13 2019, 8:44 AM

Can somebody explain to me why don't we use __builtin_ffs() and other functions everywhere ?

arm64/include/cpufunc.h
47 ↗(On Diff #57349)

Why do you need the cast of mask to u_int ? Prototype in the gcc reference shows
int __builtin_ctz(unsigned).

I also find it strange to write mask == 0 ? mask : ..., I would certainly express it as mask == 0 ? 0 : ....

85 ↗(On Diff #57349)

Why casting the result of __builtin_clzl() to int ? It is tautology.

dougm marked 3 inline comments as done.May 13 2019, 2:25 PM
dougm updated this revision to Diff 57360.

Reduce strangeness.

arm64/include/cpufunc.h
47 ↗(On Diff #57349)

In imitating the code in amd64/include/cpufunc.h, I unnecessarily copied a cast-to-int. I will remove it. I also copied the use of 'mask' that seems strange. I will remove it too.

85 ↗(On Diff #57349)

Agreed.

dougm added a comment.May 13 2019, 8:14 PM
In D20250#436408, @kib wrote:

Can somebody explain to me why don't we use __builtin_ffs() and other functions everywhere ?

In case you are asking why I wrote code using builtin_clz and not builtin_ffs, it is because I believe that many call to ffs and fls always subtract one from the result, and always know that the argument is nonzero, and I think that the way I wrote things might let the compiler optimize out the zero-check and/or the plus-one sometimes, and maybe with __builtin_ffs that wouldn't happen. But that is speculation.

I case you are asking about the whole freebsd codebase in general, I can't say.

Kernel with this patch boots fine on Amazon EC2 (which btw you can use too, freebsd-snapshots posts have aarch64 AMIs)

dougm edited the summary of this revision. (Show Details)May 14 2019, 4:27 PM
dougm added a reviewer: alc.May 16 2019, 7:41 AM
dougm updated this revision to Diff 57437.May 16 2019, 7:46 AM

The only difference in the pmap.o code on arm64 between using builtin_ffsl and _builtin_ctzl (with a zero-test and an add) is that the version with builtin_ffsl has an 'orr' instruction where the other version has an 'add'.

So this version uses builtin_ffs and builtin_ffsl.

alc added inline comments.May 16 2019, 5:24 PM
arm64/include/cpufunc.h
94 ↗(On Diff #57437)

This assumes that long and long long will always be the same size. If a tool chain came along that defined long long as 128 bits, this would compile without an error or warning and produce broken results. I would either add a CTASSERT() that long and long long are the same size or implement this using __builtin_clzll.

dougm updated this revision to Diff 57452.May 16 2019, 5:35 PM

Handle the possibility that sizeof(long long) != sizeof(long).

alc accepted this revision.May 17 2019, 4:36 AM
This revision is now accepted and ready to land.May 17 2019, 4:36 AM
kib accepted this revision.May 17 2019, 7:58 AM
This revision was automatically updated to reflect the committed changes.