Right now, we enable CR4.FSGSBASE on CPUs that support the facility (Ivy Bridge and later) to allow usermode to read the fs and gs bases without syscalls. This bit also controls write access to the bases from userspace, but the WRFSBASE and WRGSBASE instructions currently cannot be used, because the return path from both exceptions and interrupts overwrites the bases with the values from the pcb.
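For reference, a minimal userspace sketch of how CPU support for the facility can be detected. This is not code from the patch: the helper name cpu_has_fsgsbase and the macro name are made up for illustration, and it assumes a GCC or Clang toolchain that ships <cpuid.h>.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=07H,ECX=0):EBX bit 0 reports FSGSBASE support. */
#define CPUID_LEAF7_EBX_FSGSBASE (1u << 0)

/* Hypothetical helper: true if the CPU advertises RD/WR{FS,GS}BASE. */
static bool
cpu_has_fsgsbase(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Returns 0 when leaf 7 is not available on this CPU. */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) == 0)
		return (false);
	return ((ebx & CPUID_LEAF7_EBX_FSGSBASE) != 0);
#else
	/* Not an x86 build: the facility does not exist. */
	return (false);
#endif
}
```

Note that CPUID only reports what the CPU can do. Whether the instructions are usable from usermode still depends on the kernel setting CR4.FSGSBASE.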
Supporting the instructions is useful because usermode can then implement green threads completely in userspace, switching the whole machine context without issuing any syscalls.
Support is implemented by saving the fs base and the user gs base when the PCB_FULL_IRET flag is set. The flag is set on a context switch, which can clobber the bases when another context is activated, and when a syscall or exception handler explicitly modifies the user context. In particular, the patch moves the setting of the flag to before the point where syscalls change the context.
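The save policy can be modeled in plain C. This is a toy userspace simulation under my own assumptions, not the kernel code: only the PCB_FULL_IRET name and the idea of pcb-held bases come from the description above. The flag value, the toy_* structures and functions, and the "save only if the flag was not already set" detail are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value; only the flag name comes from the patch. */
#define PCB_FULL_IRET	0x01u

struct toy_pcb {
	uint64_t pcb_fsbase;	/* saved user fs base */
	uint64_t pcb_gsbase;	/* saved user gs base */
	unsigned pcb_flags;
};

/* Stand-in for the live hardware bases of one CPU. */
struct toy_cpu {
	uint64_t fsbase;
	uint64_t user_gsbase;
};

/* Entry from usermode: clear the flag so the fast return keeps live bases. */
static void
toy_enter_kernel(struct toy_pcb *pcb)
{
	pcb->pcb_flags &= ~PCB_FULL_IRET;
}

/* Save the live bases once, then mark that a full return is required. */
static void
toy_set_full_iret(const struct toy_cpu *cpu, struct toy_pcb *pcb)
{
	if ((pcb->pcb_flags & PCB_FULL_IRET) == 0) {
		pcb->pcb_fsbase = cpu->fsbase;
		pcb->pcb_gsbase = cpu->user_gsbase;
		pcb->pcb_flags |= PCB_FULL_IRET;
	}
}

/* Context switch: save the outgoing bases, activate the incoming ones. */
static void
toy_switch(struct toy_cpu *cpu, struct toy_pcb *from, struct toy_pcb *to)
{
	assert(from != to);
	toy_set_full_iret(cpu, from);
	cpu->fsbase = to->pcb_fsbase;
	cpu->user_gsbase = to->pcb_gsbase;
	to->pcb_flags |= PCB_FULL_IRET;
}

/* Return to usermode: reload from the pcb only when the flag is set. */
static void
toy_return_to_user(struct toy_cpu *cpu, const struct toy_pcb *pcb)
{
	if ((pcb->pcb_flags & PCB_FULL_IRET) != 0) {
		cpu->fsbase = pcb->pcb_fsbase;
		cpu->user_gsbase = pcb->pcb_gsbase;
	}
}
```

In this model a value written with WRFSBASE survives a plain kernel entry and return because nothing touches it, and survives a context switch because the outgoing bases are copied into the pcb before another context is activated. A syscall that changes the bases would call toy_set_full_iret() first and then overwrite the pcb fields, which is why the flag has to be set before the context is modified.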
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on entry from userspace can be considered bug fixes in their own right.
Just a note: the previous version of the patch used a different strategy. It saved the bases on every kernel entry from usermode. That patch was much simpler to analyze for correctness, although it added a lot of assembly code. I abandoned that version after syscall microbenchmarks showed a 20% increase in syscall time for getpid(). The current patch shows no statistically significant impact (as expected, since the save is moved to context switch time).