- Clear the current thread's TLS pointer on exec. Previously, the TLS pointer (and register) remained unchanged across exec.
- Explicitly clear the TLS pointer when new threads are created.
- Make md_tls_tcb_offset per-process instead of per-thread.
The layout of the TLS and TCB is identical for all threads in a process; only the TLS pointer values themselves vary by thread. This also makes setting md_tls_tcb_offset in cpu_set_user_tls() redundant with the setting in exec_setregs(), so only set it in exec_setregs().
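
As a rough illustration of the split described above, here is a minimal userland sketch, not the kernel code. The struct names, the model_* functions, and the arithmetic in model_tp_value() are all hypothetical stand-ins chosen for this example. Only md_tls_tcb_offset, exec_setregs(), and cpu_set_user_tls() come from the change itself, and md_tls is an assumed name for the per-thread TLS pointer field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process MD state: one TLS/TCB layout for all threads. */
struct model_proc {
	size_t md_tls_tcb_offset;
};

/* Hypothetical per-thread MD state: only the TLS pointer varies. */
struct model_thread {
	struct model_proc *proc;
	uintptr_t md_tls;	/* 0 means "cleared" in this model */
};

/* New threads start with a cleared TLS pointer instead of a stale one. */
static void
model_thread_init(struct model_thread *td, struct model_proc *p)
{
	td->proc = p;
	td->md_tls = 0;
}

/*
 * Modeled on exec_setregs(): the only place the offset is set, and the
 * exec'ing thread's old TLS pointer is cleared.
 */
static void
model_exec_setregs(struct model_thread *td, size_t tcb_offset)
{
	td->proc->md_tls_tcb_offset = tcb_offset;
	td->md_tls = 0;
}

/* Modeled on cpu_set_user_tls(): records only the per-thread pointer. */
static void
model_cpu_set_user_tls(struct model_thread *td, uintptr_t tls_base)
{
	td->md_tls = tls_base;
}

/*
 * Assumed for illustration: the value handed to userland is the thread's
 * TLS pointer plus the process-wide offset.
 */
static uintptr_t
model_tp_value(const struct model_thread *td)
{
	assert(td->md_tls != 0);	/* pointer must be set before use */
	return (td->md_tls + td->proc->md_tls_tcb_offset);
}
```

In this model, two threads of the same process share one offset and differ only in md_tls, so setting the offset anywhere other than the exec path would add nothing.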