Make p_vaddr % p_align == p_offset % p_align for TLS segments.
Authored by kib on Sun, Aug 4, 10:31 PM.
See for the test case.
See for the background and more discussion.
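
As a minimal sketch of the congruence named in the title, the check below compares the two remainders for a PT_TLS program header. The helper name tls_segment_congruent is hypothetical and is not part of rtld-elf or lld; it assumes p_align is 0, 1, or a power of two, as ELF requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from rtld-elf): returns true when
 * p_vaddr % p_align == p_offset % p_align for a program header.
 */
static bool
tls_segment_congruent(uint64_t p_vaddr, uint64_t p_offset, uint64_t p_align)
{
    /* A p_align of 0 or 1 means there is no alignment constraint. */
    if (p_align <= 1)
        return true;

    /* Assumption: p_align is a power of two. */
    assert((p_align & (p_align - 1)) == 0);

    /* For a power of two, x % p_align equals x & (p_align - 1). */
    return (p_vaddr & (p_align - 1)) == (p_offset & (p_align - 1));
}
```

For example, p_vaddr 0x201040 and p_offset 0x1040 with p_align 0x40 satisfy the check, while p_vaddr 0x201044 with the same offset and alignment does not.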

This also fixes another bug in malloc_aligned(), where the total size of the allocated memory might not be enough to fit the requested aligned block after the initial pointer is incremented by the pointer size.
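
The sizing issue can be illustrated with a generic over-allocating aligned allocator. This is a sketch under stated assumptions, not the rtld-elf malloc_aligned() code: the names aligned_alloc_sketch and aligned_free_sketch are hypothetical, and the original malloc() pointer is stored in the word just before the returned block. The point is that the request must budget for both the pointer-sized header and the worst-case alignment padding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: returns a block of `size` bytes aligned to `align`. */
static void *
aligned_alloc_sketch(size_t size, size_t align)
{
    /* Assumption: align is a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < sizeof(void *))
        align = sizeof(void *);

    /* Refuse requests whose padded size would overflow size_t. */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;

    /*
     * Budget for the saved pointer plus up to align - 1 bytes of padding,
     * so the aligned block still fits after skipping the pointer slot.
     */
    char *mem = malloc(size + sizeof(void *) + align - 1);
    if (mem == NULL)
        return NULL;

    /* Skip the pointer slot first, then round up to the alignment. */
    uintptr_t start = (uintptr_t)mem + sizeof(void *);
    uintptr_t res = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember the original pointer just below the aligned block. */
    ((void **)res)[-1] = mem;
    return (void *)res;
}

/* Hypothetical counterpart: frees a block from aligned_alloc_sketch(). */
static void
aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Sizing the request as size + align alone would be too small in the worst case, because rounding up after the pointer-sized increment can consume up to align - 1 additional bytes.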

Local-Exec TP offsets are link-time constants, and thus a contract between ld and the run-time linker. It may be worth checking whether the Local-Exec TP offsets computed by rtld-elf match lld. A series of lld changes (EM_PPC, EM_PPC64, EM_AARCH64 and EM_386) was committed yesterday; see static int64_t getTlsTpOffset(const Symbol &s) in the changed ELF/InputSection.cpp. The lld formulae should match musl > 1.1.22 and glibc. ARM/AArch64 are a bit tricky because there is an optional alignment padding after the 2 reserved words.
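
A sketch of how such offsets can be derived, assuming TP is aligned to p_align at run time and the start of the TLS block must be congruent to p_vaddr modulo p_align. The function names and the sym_off parameter (the symbol's offset from the start of the TLS segment) are hypothetical; this is an illustration of the layout rule, not lld's code verbatim.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Variant 1 with two reserved words after TP (ARM/AArch64 style):
 *   TP | 2 reserved words | optional padding | TLS block
 * The padding makes (TP + gap + padding) congruent to p_vaddr mod p_align.
 */
static int64_t
tpoff_variant1_tcb2(uint64_t sym_off, uint64_t p_vaddr, uint64_t p_align,
    uint64_t wordsize)
{
    /* Assumption: p_align is a nonzero power of two. */
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    uint64_t gap = 2 * wordsize;
    uint64_t padding = (p_vaddr - gap) & (p_align - 1);
    return (int64_t)(sym_off + gap + padding);
}

/*
 * Variant 2 (x86 style):
 *   TLS block | padding | TP
 * The padding makes (TP - padding - p_memsz) congruent to p_vaddr mod p_align.
 */
static int64_t
tpoff_variant2(uint64_t sym_off, uint64_t p_vaddr, uint64_t p_memsz,
    uint64_t p_align)
{
    /* Assumption: p_align is a nonzero power of two. */
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    /* Unsigned wraparound is intended; only the low bits are kept. */
    uint64_t padding = (-p_vaddr - p_memsz) & (p_align - 1);
    return (int64_t)sym_off - (int64_t)p_memsz - (int64_t)padding;
}
```

With p_vaddr 0x20040, p_align 64 and 8-byte words, the Variant 1 sketch places the first TLS byte at TP + 64: 16 bytes of reserved words plus 48 bytes of padding. With the same p_vaddr and p_align and p_memsz 0x14, the Variant 2 sketch places it at TP - 64.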

Quick instructions to build lld:

git clone
cd llvm-project
cmake -Hllvm -BRelease -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='lld'

# Delete `else if (Out::tlsPhdr && Out::tlsPhdr->firstSec == p->firstSec)` from lld/ELF/Writer.cpp around line 2244
# If sh_addralign(.tdata) < sh_addralign(.tbss), sometimes p_vaddr(PT_TLS)%p_align(PT_TLS)!=0

ninja -C Release lld
# lld is at Release/bin/ld.lld


#include <stdio.h>
__thread int a __attribute__((aligned(4))) = 0xb612; // .tdata alignment, try a few numbers, e.g. 1, 2, 4, 8, 16, 32
__asm(".section .tbss,\"awT\"; .align 64"); // try 64, 128, 256, etc
int main(void) { printf("%p %x\n", (void *)&a, a); } // %p expects a void * argument
clang a.c -fuse-ld=path/to/ld.lld -o a
# Experiment with a few different alignments