This saves 320 bytes of the precious stack space.
The only downside of the change I can think of is that struct thread grows by 320 bytes, and that these 320 bytes are no longer swapped out. I believe the freed stack space is much more important than that. Also, struct thread is currently 1392 bytes on amd64, so UMA allocates two thread structures per (4KB) slab, which leaves room for the pcb without increasing zone memory use.
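The slab arithmetic above can be checked with a rough sketch. It assumes that 1392 bytes is the struct thread size before the change and that a slab is a single 4KB page. The helper items_per_slab() and the macro names are hypothetical, introduced only for illustration. The model ignores alignment and any in-slab header that UMA may use.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the commit message (amd64). */
#define SLAB_SIZE        4096u  /* one slab, assumed to be a single 4KB page */
#define THREAD_SIZE_OLD  1392u  /* struct thread size, assumed pre-change */
#define PCB_GROWTH        320u  /* bytes moved from the stack into struct thread */

/*
 * Simplified model (hypothetical helper, not a UMA function): how many
 * items of item_size bytes fit in one slab.  Alignment and any in-slab
 * header are deliberately ignored.
 */
static size_t
items_per_slab(size_t slab_size, size_t item_size)
{
	return (slab_size / item_size);
}
```

Under these assumptions, items_per_slab(SLAB_SIZE, THREAD_SIZE_OLD) and items_per_slab(SLAB_SIZE, THREAD_SIZE_OLD + PCB_GROWTH) both yield 2, so the larger structure (1712 bytes) still packs two per slab.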
I consider moving the user FPU save area into a dedicated allocation as the next step. Then it might even be possible to reduce the default stack size (not tried yet).
Tested by: pho