Pass cpuid values to the ifunc resolvers on x86.
ClosedPublic

Authored by kib on Nov 5 2016, 3:18 PM.
Details

Summary

Pass CPUID[1].%edx (cpu_feature), CPUID[1].%ecx (cpu_feature2), CPUID[7].%ebx (cpu_stdext_feature), and CPUID[7].%ecx (cpu_stdext_feature2) to the ifunc resolvers on x86.

I consider it much cleaner on x86 to use the CPUID instruction in usermode to retrieve this information than to pass an AT_HWCAP aux vector entry from the kernel. Still, I do allow for the use of AT_HWCAP on arches where it is needed, by passing the aux array to the ifunc_init() initializer, which should prepare the arguments for ifunc resolvers.
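For illustration only, here is a minimal sketch of reading the same four words from usermode. It assumes the compiler-provided <cpuid.h> helpers __get_cpuid() and __get_cpuid_count() from GCC and clang; the struct and function names are hypothetical and this is not the code in the diff. On non-x86 builds the sketch just reports failure.

```c
#include <stdint.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical container; field names mirror the kernel variables. */
struct example_cpu_features {
	uint32_t cpu_feature;		/* CPUID[1].%edx */
	uint32_t cpu_feature2;		/* CPUID[1].%ecx */
	uint32_t cpu_stdext_feature;	/* CPUID[7].%ebx */
	uint32_t cpu_stdext_feature2;	/* CPUID[7].%ecx */
};

/*
 * Fill *f from CPUID.  Returns 1 if leaf 1 could be read, 0 otherwise
 * (including on non-x86 builds, where all fields stay zero).
 */
int
example_read_cpu_features(struct example_cpu_features *f)
{
	memset(f, 0, sizeof(*f));
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return (0);
	f->cpu_feature = edx;
	f->cpu_feature2 = ecx;

	/* Leaf 7, subleaf 0; left as zero if the CPU lacks the leaf. */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
		f->cpu_stdext_feature = ebx;
		f->cpu_stdext_feature2 = ecx;
	}
	return (1);
#else
	return (0);
#endif
}
```

No kernel involvement is needed for any of this, which is the argument made above for not routing the data through AT_HWCAP on x86.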

Current signature for resolvers on x86 is

func_t iresolve(uint32_t cpu_feature, uint32_t cpu_feature2, uint32_t cpu_stdext_feature, uint32_t cpu_stdext_feature2)

where arguments have the same meaning as the kernel variables.
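As a hedged sketch of what a resolver with this signature might look like: the function below picks one of three stand-in implementations from the feature words. All names here (example_resolve, impl_*, and the EX_* macros) are hypothetical and local to the sketch; the bit positions follow the Intel SDM (SSE4.2 is CPUID[1].%ecx bit 20, AVX2 is CPUID[7].%ebx bit 5). A real AVX2 variant would also have to confirm that the OS has enabled AVX state, which is omitted here.

```c
#include <stdint.h>

/* Hypothetical local names for the feature bits used below. */
#define EX_CPUID2_SSE42		0x00100000u	/* CPUID[1].%ecx bit 20 */
#define EX_CPUID_STDEXT_AVX2	0x00000020u	/* CPUID[7].%ebx bit 5 */

typedef int (*impl_t)(void);

/* Stand-in implementations; each returns an id so the choice is visible. */
static int impl_generic(void) { return (0); }
static int impl_sse42(void) { return (1); }
static int impl_avx2(void) { return (2); }

/*
 * Resolver sketch with the four-argument x86 signature from the summary.
 * Prefers the most capable variant the feature words advertise.
 */
impl_t
example_resolve(uint32_t cpu_feature, uint32_t cpu_feature2,
    uint32_t cpu_stdext_feature, uint32_t cpu_stdext_feature2)
{
	/* Unused in this sketch. */
	(void)cpu_feature;
	(void)cpu_stdext_feature2;

	if ((cpu_stdext_feature & EX_CPUID_STDEXT_AVX2) != 0)
		return (impl_avx2);
	if ((cpu_feature2 & EX_CPUID2_SSE42) != 0)
		return (impl_sse42);
	return (impl_generic);
}
```

How such a resolver is attached to a symbol is toolchain-specific and left out; the sketch only shows the selection logic driven by the passed-in words.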

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision to Pass cpuid values to the ifunc resolvers on x86.
kib updated this object.
kib edited the test plan for this revision.
kib added reviewers: jhb, emaste.
kib set the repository for this revision to rS FreeBSD src repository - subversion.

This does look convenient, though I wonder if it wouldn't be better to let the x86 ifuncs call a common routine to initialize private globals in libc instead? In particular, I'm thinking of the gdb change made recently which just passes the first word (but not all) of AT_HWCAP to resolvers. I could fix that in gdb by making the ifunc thing a gdbarch method, but the more complicated the method, the more work it is in the debugger to emulate.

That said, I suspect this was motivated by my renewed interest in SSE memcpy and friends. Do you know if we can compile ifuncs ok if we use the amd64 cross-toolchain?

In D8448#176317, @jhb wrote:

This does look convenient, though I wonder if it wouldn't be better to let the x86 ifuncs call a common routine to initialize private globals in libc instead? In particular, I'm thinking of the gdb change made recently which just passes the first word (but not all) of AT_HWCAP to resolvers. I could fix that in gdb by making the ifunc thing a gdbarch method, but the more complicated the method, the more work it is in the debugger to emulate.

I do not think that ifunc use should be limited to libc. libm can benefit as well, and a lot of non-system code should also take advantage of convenient cpu sizing.

That said, binary-local symbols for CPU feature bitmasks, same as kernel cpu_feature* vars, would be a solution, in my opinion. This is mostly done in D2651.

That said, I suspect this was motivated by my renewed interest in SSE memcpy and friends. Do you know if we can compile ifuncs ok if we use the amd64 cross-toolchain?

I am not sure about the cross-toolchain. I use a manually compiled gcc 6.2 and binutils 2.27 for testing my rtld patches, and it works. gcc requires the --enable-gnu-indirect-function flag at configure time. The cross-toolchain needs a similar tweak.

Yes, I finally decided to add something similar to AT_HWCAP, motivated by scrollback, but again, I do not see it as reasonable to marshal data from the kernel that is readily available in userspace.

In D8448#176323, @kib wrote:
In D8448#176317, @jhb wrote:

This does look convenient, though I wonder if it wouldn't be better to let the x86 ifuncs call a common routine to initialize private globals in libc instead? In particular, I'm thinking of the gdb change made recently which just passes the first word (but not all) of AT_HWCAP to resolvers. I could fix that in gdb by making the ifunc thing a gdbarch method, but the more complicated the method, the more work it is in the debugger to emulate.

I do not think that ifunc use should be limited to libc. libm can benefit as well, and a lot of non-system code should also take advantage of convenient cpu sizing.

Ok, fair enough. To be clear, I don't mind adding args to ifuncs as it is certainly convenient. I just imagine that as vendors keep adding more features we will find at some point we will want yet-another feature flag and adding a new arg to the ifuncs would break the ABI.

That said, binary-local symbols for CPU feature bitmasks, same as kernel cpu_feature* vars, would be a solution, in my opinion. This is mostly done in D2651.

Ok.

That said, I suspect this was motivated by my renewed interest in SSE memcpy and friends. Do you know if we can compile ifuncs ok if we use the amd64 cross-toolchain?

I am not sure about the cross-toolchain. I use a manually compiled gcc 6.2 and binutils 2.27 for testing my rtld patches, and it works. gcc requires the --enable-gnu-indirect-function flag at configure time. The cross-toolchain needs a similar tweak.

Yes, I finally decided to add something similar to AT_HWCAP, motivated by scrollback, but again, I do not see it as reasonable to marshal data from the kernel that is readily available in userspace.

I agree. I think some platforms may require AT_HWCAP, but for x86 it is much more flexible to do it all in userland.

In D8448#176325, @jhb wrote:

Ok, fair enough. To be clear, I don't mind adding args to ifuncs as it is certainly convenient. I just imagine that as vendors keep adding more features we will find at some point we will want yet-another feature flag and adding a new arg to the ifuncs would break the ABI.

As far as I remember from what you said, glibc passes only one word of flags to ifuncs; I decided to pass all four current flag words. And all our ABIs allow increasing the number of integer args to a function without breaking an older variant that takes fewer arguments.

kib edited edge metadata.

Final version with non-x86 arches handled by a placeholder.

libexec/rtld-elf/Makefile
50 ↗(On Diff #22129)

Can do without the second -Wl: -Wl,-Bsymbolic,-z,defs

libexec/rtld-elf/Makefile
50 ↗(On Diff #22129)

I think it would work.

I weakly prefer the split version, since -Wl is just a weird way to pass a linker flag, due to clang not behaving like a normal unix compiler. In the split version it is clearer that symbolic is one option and "-z defs" is another.

If you strongly prefer the glued variant, I will update this chunk.

jhb edited edge metadata.
jhb added inline comments.
libexec/rtld-elf/Makefile
50 ↗(On Diff #22129)

I probably prefer each '-Wl' to pass a single, logical argument to ld, so that each argument to ld requires a -Wl as you have done.

This revision is now accepted and ready to land. Nov 14 2016, 7:09 PM
libexec/rtld-elf/Makefile
50 ↗(On Diff #22129)

Ok, you've convinced me. I agree separate -Wl arguments make the intent clearer.

This revision was automatically updated to reflect the committed changes.