
x86: Reset fsbase and gsbase for new threads.
Abandoned, Public

Authored by jhb on Mar 3 2021, 12:12 AM.

Details

Reviewers
kib
Summary

New threads shouldn't reuse TLS registers from old threads, and in
general new threads should begin with a clean register state closer
to exec() than to fork().

In practice thr_new() explicitly sets the TLS register to
params->tls_base before the thread begins execution anyway for native
processes.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 37517
Build 34406: arc lint + arc unit

Event Timeline

But this change breaks the ABI, I suspect. E.g. on amd64 the current behavior is to set fsbase, while gs/gsbase is copied. Similarly, for 32-bit processes on amd64, gsbase is set but fsbase is copied.

After your change, the non-TLS base is zeroed.

In D29025#650004, @kib wrote:

But this change breaks the ABI, I suspect. E.g. on amd64 the current behavior is to set fsbase, while gs/gsbase is copied. Similarly, for 32-bit processes on amd64, gsbase is set but fsbase is copied.

After your change, the non-TLS base is zeroed.

This is true, but it's not clear what is really guaranteed here vs a historical accident. I think if any software wanted to reliably use the non-TLS base they should be explicitly setting it for new threads as there isn't any sort of portable guarantee about what state is inherited from the creating thread.

Hmm, so the only portable guarantee is one we perhaps don't honor, which is "The floating-point environment shall be inherited from the creating thread." (for pthread_create() from pubs.opengroup.org). I'm not sure if that means things like the rounding mode or the actual values of floating-point registers. That seems like an odd requirement.

In D29025#650451, @jhb wrote:
In D29025#650004, @kib wrote:

But this change breaks the ABI, I suspect. E.g. on amd64 the current behavior is to set fsbase, while gs/gsbase is copied. Similarly, for 32-bit processes on amd64, gsbase is set but fsbase is copied.

After your change, the non-TLS base is zeroed.

This is true, but it's not clear what is really guaranteed here vs a historical accident. I think if any software wanted to reliably use the non-TLS base they should be explicitly setting it for new threads as there isn't any sort of portable guarantee about what state is inherited from the creating thread.

I am quite sure that there are programs that rely on the kernel ABI in this regard. For instance, I remember that at least some versions of sbcl used the thr(2) syscalls directly and manipulated segment register bases for their threading implementation (on Linux they used futex(2)). I would not be surprised to learn that Go-generated binaries do the same. That said, I do not see why we should take the risk and change the ABI there.

Hmm, so the only portable guarantee is one we perhaps don't honor, which is "The floating-point environment shall be inherited from the creating thread." (for pthread_create() from pubs.opengroup.org). I'm not sure if that means things like the rounding mode or the actual values of floating-point registers. That seems like an odd requirement.

POSIX does not mention many concepts that we implement, or these concepts are too low-level and become implementation details.

ANSI C states: "The floating-point environment refers collectively to any floating-point status flags and control modes supported by the implementation." I would assume that the POSIX authors are knowledgeable enough to use proper terminology. I wrote the following test program: https://gist.github.com/bc6e2256ed12de4dc67d1cac7478d717. Its output on FreeBSD is

m 3.141593e+00
t 0.000000e+00

and on Linux

m 3.141593e+00
t 3.141593e+00

I'm fine with dropping this. Should we fix the floating point inheritance to match Linux? Even if the goal is just to preserve rounding mode, etc. it seems simpler to just preserve all the state implementation-wise.

In D29025#651549, @jhb wrote:

I'm fine with dropping this. Should we fix the floating point inheritance to match Linux? Even if the goal is just to preserve rounding mode, etc. it seems simpler to just preserve all the state implementation-wise.

I agree that copying the whole FPU save area is best. In fact, the save area contains not only FPU-like state such as x87+XMM+YMM+ZMM, but also e.g. PKRU and (in the future) CET. So copying it first is perhaps the right approach for other reasons as well.