The original code did not support dynamically loaded libraries and used
suboptimal access to TLS variables.
The new implementation removes lazy resolving of TLS relocations: due to a
flaw in the TLSDESC design, it is impossible to switch the resolver function
at runtime without expensive locking.
Because of this, three specialized resolvers are used:
- load-time resolver, for TLS relocations from libraries loaded with the main binary (and thus with a known TLS offset).
- undefined thread weak symbol resolver, for initial-exec symbols.
- slower lazy resolver for dynamically loaded libraries, with a fast path for already resolved symbols. It correctly handles undefined thread weak symbols (they should resolve to NULL).
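
To illustrate the split between the three resolvers, here is a minimal model in portable C. It is only a sketch and not the rtld-elf code: the real resolvers are written in assembly, receive the descriptor address in X0 and read the thread pointer from TPIDR_EL0. All names below (thread_model, dyn_arg, resolve_static, resolve_undef_weak, resolve_dynamic, tls_access) are hypothetical, and the thread is passed as an explicit parameter instead of being read from a register. The dynamic slow path is reduced to a plain calloc() of the module's TLS block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_MODULES 4

/* Hypothetical per-thread state: a static TLS block plus a DTV-like table. */
struct thread_model {
	char  static_tls[64];     /* the thread pointer points at its start */
	void *dtv[MAX_MODULES];   /* per-module dynamic blocks, NULL until used */
};

struct tlsdesc;
typedef ptrdiff_t tlsdesc_fn(const struct tlsdesc *, struct thread_model *);

/* A TLS descriptor is a pair of words: resolver function and its argument. */
struct tlsdesc {
	tlsdesc_fn *resolver;
	uintptr_t   arg;
};

/* Argument of the dynamic resolver (assumed layout, for illustration). */
struct dyn_arg {
	size_t module;   /* index into dtv[] */
	size_t offset;   /* offset of the variable inside the module block */
	size_t size;     /* size of the module's TLS block */
};

static uintptr_t
thread_pointer(struct thread_model *thr)
{
	return ((uintptr_t)thr->static_tls);
}

/* 1. Load-time resolver: the offset from the thread pointer is already known. */
static ptrdiff_t
resolve_static(const struct tlsdesc *d, struct thread_model *thr)
{
	(void)thr;
	return ((ptrdiff_t)d->arg);
}

/* 2. Undefined weak symbol: return an offset so that tp + offset == addend. */
static ptrdiff_t
resolve_undef_weak(const struct tlsdesc *d, struct thread_model *thr)
{
	return ((ptrdiff_t)(d->arg - thread_pointer(thr)));
}

/* 3. Dynamic resolver: fast path if the block exists, slow path allocates it. */
static ptrdiff_t
resolve_dynamic(const struct tlsdesc *d, struct thread_model *thr)
{
	const struct dyn_arg *a = (const struct dyn_arg *)d->arg;
	void *block = thr->dtv[a->module];

	if (block == NULL) {                 /* slow path, first access only */
		block = calloc(1, a->size);
		if (block == NULL)
			abort();
		thr->dtv[a->module] = block;
	}
	return ((ptrdiff_t)((uintptr_t)block + a->offset - thread_pointer(thr)));
}

/* What a compiled TLSDESC access boils down to: tp + resolver(descriptor). */
static void *
tls_access(const struct tlsdesc *d, struct thread_model *thr)
{
	return ((void *)(thread_pointer(thr) + (uintptr_t)d->resolver(d, thr)));
}
```

In this model the resolver for each descriptor is chosen once, when the relocation is processed, and never replaced afterwards. That is the property that avoids having to update the two-word descriptor while other threads may be reading it.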
Unfortunately, there are still some issues which I am unable to solve:
- the TLSDESC ABI requires the resolver function to preserve all registers (except X0), including all floating-point and optional SVE registers. This is not possible in a universal way, so the current code expects that nothing in the rtld-elf code uses them, including compiler intrinsics and/or the libraries it uses. I'm not sure whether there is any compiler switch that ensures this.
- memory allocated by reloc_tlsdesc_alloc() is not freed.
- for TLS_TPREL64 relocations, we don't handle undefined thread weak symbols, but this problem exists on all other architectures as well.
- the TLSDESC case in reloc_plt() is nonsense: this kind of relocation relocates data, not code. But the (GNU-compiled) /usr/local/lib/libuuid.so.1.2 has a TLSDESC relocation in its .rela.plt section. I have no idea why.