
Align initial-exec TLS segments to the p_vaddr % align.
Closed, Public

Authored by kib on Apr 10 2020, 3:08 PM.

Details

Summary

Only amd64 and i386 are implemented. I do not plan to handle other arches.

This is a continuation of D21163/r359634, which handled the alignment for global mode.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I will apply this to my tree and test soon.

I've built and installed rtld w/ this change on my main laptop, no issues observed so far.

This revision was not accepted when it landed; it landed in state Needs Review. (Apr 19 2020, 9:29 AM)
This revision was automatically updated to reflect the committed changes.

Do we not need an equivalent change to the frustratingly-duplicated parts in lib/libc/gen/tls.c for statically-linked binaries? Initial-exec is unusual but still valid there so long as your linker has statically filled in the relocations for the GOT, and can happen if you disable TLS linker relaxation (or use one of the architectures that can't relax TLS sequences).

It would be nice to have, regardless of static or initial-exec mode. For rtld I handled both.

On the other hand, I somewhat doubt that any program where it would matter is linked statically. Still, if you want to provide the patch, go ahead. Otherwise I might return to this sometime.

fbsd-phab_maskray.me added inline comments.
head/libexec/rtld-elf/amd64/reloc.c
561

Considering a PT_TLS segment with p_vaddr == 4, p_align == 8 and p_memsz == 9, the computed offset is 20 (does that mean TP-20?)

The offset computed by LLD is -12 (which matches musl libc):
https://github.com/llvm/llvm-project/blob/main/lld/ELF/InputSection.cpp#L665 The offset is congruent to p_vaddr modulo p_align.

Local-exec TLS offsets are a protocol between the linker and the dynamic loader. Their values must match.
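
To make the arithmetic above concrete, here is a minimal sketch of the variant II (x86) computation, modeled on the expression in the LLD source linked above. The function name tls_start_offset is hypothetical and exists only for illustration; this is not the rtld code under review.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: TP-relative offset of the start of the PT_TLS
 * segment in a variant II layout, where the TLS block sits below the
 * thread pointer. Assumes p_align is a power of two.
 */
static int64_t
tls_start_offset(uint64_t p_vaddr, uint64_t p_memsz, uint64_t p_align)
{
	assert(p_align != 0 && (p_align & (p_align - 1)) == 0);

	/* Padding that keeps the segment start congruent to p_vaddr. */
	uint64_t pad = (0 - p_vaddr - p_memsz) & (p_align - 1);

	return -(int64_t)(p_memsz + pad);
}
```

With p_vaddr == 4, p_memsz == 9 and p_align == 8, the padding is 3 and the result is -12, which is congruent to 4 modulo 8.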

head/libexec/rtld-elf/amd64/reloc.c
561

Nobody ever answered my query about the right formula. Please provide the algorithm that should be implemented by rtld.

head/libexec/rtld-elf/amd64/reloc.c
561

20 mod 8 == 12 mod 8 == 4. Can you provide a self-contained ELF binary that demonstrates the problem you complain about?

It might be that the problem is not in the calculation, but in the alignment of the end of the tls segment (from which this offset is subtracted).

head/libexec/rtld-elf/amd64/reloc.c
561

20 mod 8 == 12 mod 8 == 4.

This is not sufficient. For the local-exec TLS model, the linker and the dynamic loader must agree on the TP offset, otherwise if you take the address from an executable (local-exec) and take the address from a DF_STATIC_TLS shared object (initial-exec movq tls@GOTTPOFF(%rip), %rax), the two addresses may mismatch.
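
A small sketch of why congruence alone does not settle it, using the two candidate offsets from this thread. It assumes the value 20 is read as TP-20, which the thread itself leaves as a question; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when tp_offset is congruent to p_vaddr modulo p_align. */
static bool
offset_is_congruent(int64_t tp_offset, uint64_t p_vaddr, uint64_t p_align)
{
	assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
	uint64_t mask = p_align - 1;

	return (((uint64_t)tp_offset & mask) == (p_vaddr & mask));
}

/*
 * Address of a TLS variable, given the thread pointer, the TP-relative
 * offset of the segment start, and the variable's offset in the segment.
 */
static uint64_t
tls_var_addr(uint64_t tp, int64_t seg_offset, uint64_t var_off)
{
	return (tp + (uint64_t)seg_offset + var_off);
}
```

Both -12 and -20 pass offset_is_congruent() for p_vaddr == 4 and p_align == 8, yet tls_var_addr() returns addresses 8 bytes apart for the same variable. If the linker bakes one offset into local-exec code and the loader stores the other in the GOT, the two access paths disagree.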

Can you provide a self-contained ELF binary that demonstrates the problem you complain about?

It might be that the problem is not in the calculation, but in the alignment of the end of the tls segment (from which this offset is subtracted).

LLD has a workaround to force p_vaddr%p_align = 0. If you delete the else if (Out::tlsPhdr && Out::tlsPhdr->firstSec == p->firstSec) branch from https://reviews.llvm.org/D64906, LLD can produce p_vaddr%p_align != 0 when sh_addralign(.tdata) < sh_addralign(.tbss), e.g.

# REQUIRES: x86
# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
# RUN: ld.lld %t.o -o %t
# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s

# -alignTo(p_memsz, p_align) = -alignTo(4, 64) = -64

# CHECK: movl %fs:-64, %eax

  movl %fs:a@TPOFF, %eax

.section .tdata,"awT"
a:
.long 0

.section .tbss,"awT",@nobits
.balign 64
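
The -alignTo(p_memsz, p_align) comment in the test can be checked with a short sketch. It assumes the segment start is aligned (p_vaddr % p_align == 0) and takes the values 4 and 64 from that comment; align_to and aligned_tls_offset are hypothetical stand-ins, not the LLD functions.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of align (a power of two). */
static uint64_t
align_to(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((value + align - 1) & ~(align - 1));
}

/*
 * TP-relative offset of the segment start when p_vaddr is already
 * aligned: the block simply ends at the thread pointer.
 */
static int64_t
aligned_tls_offset(uint64_t p_memsz, uint64_t p_align)
{
	return (-(int64_t)align_to(p_memsz, p_align));
}
```

For p_memsz == 4 and p_align == 64 this gives -64, matching the CHECK line.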

I haven't spent time crafting an example that makes the FreeBSD rtld inconsistent, but with an appropriate sh_size(.tdata) you can.

I wrote most of this up in https://maskray.me/blog/2021-02-14-all-about-thread-local-storage