Page MenuHomeFreeBSD

Make WRFSBASE and WRGSBASE functional.
ClosedPublic

Authored by kib on Aug 14 2017, 11:42 AM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 5 2024, 3:28 PM
Unknown Object (File)
Sep 22 2024, 2:32 AM
Unknown Object (File)
Sep 20 2024, 12:01 PM
Unknown Object (File)
Sep 18 2024, 3:22 AM
Unknown Object (File)
Sep 4 2024, 11:35 PM
Unknown Object (File)
Aug 31 2024, 8:44 AM
Unknown Object (File)
Aug 28 2024, 2:28 AM
Unknown Object (File)
Aug 23 2024, 7:41 PM
Subscribers

Details

Summary

Right now, we enable the CR4.FSGSBASE on CPUs which support the facility (Ivy and later), to allow usermode to read fs and gs bases without syscalls. This bit also controls the write access to bases from userspace, but WRFSBASE and WRGSBASE instructions currently cannot be used because return path from both exceptions or interrupts overrides bases with the values from pcb.

Supporting the instructions is useful because this means that usermode can implement green-threads completely in userspace without issuing syscalls to change all of the machine context.

Support is implemented by saving the fs base and user gs base when PCB_FULL_IRET flag is set. The flag is set on the context switch, which potentially causes clobber of the bases due to activation of another context, and when explicit modification of the user context by a syscall or exception handler is performed. In particular, the patch moves setting of the flag before syscalls change context.

The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on entry from userspace can be considered a bug fixes on its own.

Just a note, previous version of the patch used different strategy. It saved the bases on every kernel entrance from usermode. Patch was much simpler to analyze for correctness, although it added a lot of assembly code. I abandoned that version after measurements of the syscall microbenchmarks demonstrated 20% increase of syscall time for getpid(). Current patch shows statistically zero impact (of course, the save overhead is moved to the context switch time).

Test Plan

The program which runs a thread on each core, writes some values in bases and expect them to be read back both by instructions and by sysarch(2) was used as additional load for stress2. See https://gist.github.com/23f6b30e72e6bc8738196fc6c7e2e39c.

hwmpc(4) was used to generate NMIs.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib added reviewers: jhb, markj.
lib/libc/amd64/sys/amd64_detect_rdfsgsbase.c
58 ↗(On Diff #32040)

Why goto vs just return?

sys/amd64/amd64/cpu_switch.S
109 ↗(On Diff #32040)

It took me a bit to understand this as I thought perhaps we were relying on wrfsbase/gsbase somehow changing %fs and %gs itself, but instead it seems that the logic is that 32-bit processes can't use these instructions, but for 64-bit processes on a supported CPU we always save/restore the base in cpu_switch now?

117 ↗(On Diff #32040)

Why does this use the msr instead of rdgsbaseq?

Ah, because the current gs base is the the kernel one. Perhaps a comment "Read user gs base" would help avoid confusion for future readers.

sys/amd64/amd64/exception.S
191 ↗(On Diff #32040)

Is this unrelated (or leftovers from the first version)? I didn't see any references to alltraps_pushregs_no_rax.

sys/amd64/amd64/machdep.c
2138 ↗(On Diff #32040)

Not sure why we need to set this flag to read pcb_fsbase/gsbase? If we need "up to date" values for userland and this is curthread then we need to just use rdfsbase() and rdmsr(MSR_KGSBASE) if this is a CPU with the new instructions instead of the values from the pcb? As it is, the effect here is that get_mcontext is now going to potentially overwrite fsbase / gsbase with stale values from the PCB (if a thread happened to invoke wrfsbase or wrgsbase in userland right before calling getmcontext without an intervening syscall or interrupt that triggered a context switch)

sys/amd64/amd64/ptrace_machdep.c
175 ↗(On Diff #32040)

Same note from get_mcontext() applies to the PT_GETFS/GSBASE ptrace ops.

kib marked 2 inline comments as done.Aug 18 2017, 6:45 PM
kib added inline comments.
sys/amd64/amd64/cpu_switch.S
109 ↗(On Diff #32040)

The instructions are not executed in 32bit mode, this is specified by Intel.

For 64bit processes I consider the MSR to be the authoritative source on CPUs with the feature implemented. But we only support the base overload using MSRs in usermode when well-known segments are loaded into the segment registers, see the iretq return path in exception.S.

If this explanation does not answer your question, perhaps I did not understand it. Please specify it more then.

sys/amd64/amd64/exception.S
191 ↗(On Diff #32040)

Look at the very end of the prot_addrf-labeled block.

There the bases must be saved before swapgs otherwise code need to use RDMSR which would put even higher registers pressure.

sys/amd64/amd64/machdep.c
2138 ↗(On Diff #32040)

I do not quite understand this comment. The set_pcb_flag(PCB_FULL_IRET) ensures that pcb_xsbase values are not stalled, i.e. the set flag function evacuates the values from the registers into pcb. Code is generally structured so that PCB_FULL_IRET flag, when not set, indicates that the bases are in MSRs, and when set, that the bases are in pcb. See the context switch code as well.

I considered extracting the code from the new set_pcb_flag() to only read the bases into pcb, but I disliked both the fact that PCB_FULL_IRET no longer reflects the bases value ownership, and that I still need to disable interrupts to reliably read the values. So use of the set_pcb_flags() seems to be most correct, although the name is somewhat confusing, I agree. This is why I added the 'update base' comments.

Add comment. Do not use goto.

sys/amd64/amd64/machdep.c
2138 ↗(On Diff #32040)

Ahh, I do think the 'set_pcb_flags()' name is what threw me off. I think it would be good to add a comment above set_pcb_flags() explaining why the bases are fetched, and why it is important to call set_pcb_flags() before reading or writing pcb_[fg]sbase.

Hmm, if you've already had to change all of the set_pcb_flags calls that set PCB_FULL_IRET (either to move them or make them use _raw), maybe you could leave 'set_pcb_flags' as is (just the atomic op) and add a new 'update_pcb_bases(pcb)' which is the new set_pcb_flags but hardcodes the fact that it is setting PCB_FULL_IRET.

Add update_pcb_bases() and the comment explaining the method for saving the user-set bases together with the relation to PCB_FULL_IRET flag.

Thanks. Just some small suggestions for thought, but fine either way.

lib/libc/amd64/sys/amd64_set_gsbase.c
50 ↗(On Diff #32247)

Possibly abort(), or maybe restructure the libc wrappers to do 'if (detect == SUPPORTED) rd/wrfs/gsbase() else sysarch()' so that they always do the sysarch in the else?

sys/amd64/amd64/vm_machdep.c
241 ↗(On Diff #32247)

These could actually not change because pcb2 != curpcb so set_pcb_flags() will still DTRT?

This revision is now accepted and ready to land.Aug 21 2017, 3:59 PM
kib marked an inline comment as done.Aug 21 2017, 4:11 PM
kib added inline comments.
sys/amd64/amd64/vm_machdep.c
241 ↗(On Diff #32247)

Yes, they could, but _raw() variant has slightly less overhead. I prefer to use it there, if you do not object.

kib edited edge metadata.

Ensure that amd64_get/set_f/gbase() use syscall if the return value from amd64_detect_rdfsgsbase() is unknown.

This revision now requires review to proceed.Aug 21 2017, 4:11 PM
This revision was automatically updated to reflect the committed changes.