Page MenuHomeFreeBSD

Build riscv kernel with inline ffs, fls
AbandonedPublic

Authored by dougm on Jun 18 2023, 4:11 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 28, 7:23 AM
Unknown Object (File)
Nov 15 2024, 9:11 AM
Unknown Object (File)
Sep 27 2024, 7:17 AM
Unknown Object (File)
Sep 23 2024, 9:08 PM
Unknown Object (File)
Sep 22 2024, 9:09 AM
Unknown Object (File)
Sep 19 2024, 11:35 AM
Unknown Object (File)
Sep 18 2024, 5:13 AM
Unknown Object (File)
Sep 17 2024, 4:31 PM
Subscribers

Details

Reviewers
alc
Group Reviewers
riscv
Summary

Implement the ffs and fls instructions, and their longer counterparts, in cpufunc, in terms of gcc extensions like __builtin_ffs, for riscv architectures, and use those, rather than simple libkern implementations, in building riscv kernels.

This is a clone of D20250, which was for arm64.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm requested review of this revision.Jun 18 2023, 4:11 AM
dougm created this revision.

Yes, I would like to see this change happen, with some caveats. I'll try to clarify the situation here.

Unlike the change for arm64, the switch to these builtins will not (by default) convert these routines to optimized instructions, but rather libcalls to the compiler-provided routines, e.g. contrib/llvm-project/compiler-rt/lib/builtins/ctzsi2.c.

This is because the standard set of RISC-V instructions does not define bit-manipulation instructions. There is a "Bitmanip" ISA extension which adds such instructions, but we can't easily enable this in the kernel because it would be incompatible with existing hardware implementations. How we will usefully support the bitmanip CPUs in GENERIC now that they are nearly here is a question I've considered but do not really have an answer for.

I do think that switching away from the libkern implementations to whatever is provided by the compiler runtime library is a fine choice, that will give equivalent or better performance in all cases. In fact, I've had a local change for a long time that will switch the remaining architectures (riscv, arm) to use the builtins, and removes the libkern implementations altogether. I've held off on it mainly due to a lack of testing/verifying the performance impact; there are a handful of kernel routines that are implemented differently depending on if HAVE_INLINE_F** is defined.

So, all of this is to say that I am in favor of this change, but it doesn't really accomplish what I think you were hoping to accomplish in terms of speeding up the routines, nor does it go all the way to clean up these older libkern routines.

Your only real option would be to IFUNC them (or have a branch like arm64 does for LSE atomics); the initial set of ratified extensions for RISC-V that then appeared in real hardware was a bit pitiful.

Unlike the change for arm64, the switch to these builtins will not (by default) convert these routines to optimized instructions, but rather libcalls to the compiler-provided routines, e.g. contrib/llvm-project/compiler-rt/lib/builtins/ctzsi2.c.

These compiler-provided routines use binary search instead of sequential search, so replacing the libkern implementations with them is still a win.

I do think that switching away from the libkern implementations to whatever is provided by the compiler runtime library is a fine choice, that will give equivalent or better performance in all cases. In fact, I've had a local change for a long time that will switch the remaining architectures (riscv, arm) to use the builtins, and removes the libkern implementations altogether. I've held off on it mainly due to a lack of testing/verifying the performance impact; there are a handful of kernel routines that are implemented differently depending on if HAVE_INLINE_F** is defined.

Well, the two instances of "HAVE_INLINE_F**" in sys/kern/subr_blist.c are my doing, and they are only there because I didn't want to rely on the slow libkern implementations. What's in sys/netinet/tcp_lro.c looks like the same kind of thing.

I don't know what to do with IFUNC for riscv, so I won't even try.

I'm happy to commit this if people approve it. I'm happy for you to commit your patch that covers other architectures. I'm happy to go clean the "HAVE_INLINE"s out of at least two files that would no longer need them after you rid the world of the slow libkern implemenations. How do you want to proceed?

Well, the two instances of "HAVE_INLINE_F**" in sys/kern/subr_blist.c are my doing, and they are only there because I didn't want to rely on the slow libkern implementations. What's in sys/netinet/tcp_lro.c looks like the same kind of thing.

I'm happy to commit this if people approve it. I'm happy for you to commit your patch that covers other architectures. I'm happy to go clean the "HAVE_INLINE"s out of at least two files that would no longer need them after you rid the world of the slow libkern implemenations. How do you want to proceed?

I took a look at what I have. I can submit a change that consolidates the ffs/fls definitions to a single place (with an MD override), and disables the conditional optimizations by defining HAVE_INLINE_FOO for every architecture. I will post this to phabricator very soon, most likely tomorrow. If the review does not stall we should prefer this to handling only riscv.

The second half will be actually removing the HAVE_INLINE_FOO instances from the source tree. Since these have crept into OpenZFS, it unfortunately can't be done in a single commit with part 1. If you can handle any of this removal, such as the cases you listed above, that would be very helpful.