Page MenuHomeFreeBSD

Make p_vaddr % p_align == p_offset % p_align for TLS segments.
Needs ReviewPublic

Authored by kib on Aug 4 2019, 10:31 PM.

Details

Summary

See https://sourceware.org/bugzilla/show_bug.cgi?id=24606 for the test case.
See https://reviews.llvm.org/D64930 for the background and more discussion.

Also this fixes another bug in malloc_aligned() where total size of the allocated memory might be not enough to fir the aligned requested block after the initial pointer is incremented by the pointer size.

Test Plan

Would be nice to boot on non-x86 to see how it breaks things.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 26331

Event Timeline

kib created this revision.Aug 4 2019, 10:31 PM
kib edited the summary of this revision. (Show Details)Aug 4 2019, 10:32 PM
kib edited the summary of this revision. (Show Details)
kib retitled this revision from Make p_vaddr%p_align = p_offset%p_align for TLS segments. to Make p_vaddr % p_align == p_offset % p_align for TLS segments..Aug 4 2019, 10:39 PM
kib edited the summary of this revision. (Show Details)

Local-Exec TP offsets are link-time constants, and thus contracts between ld and ld.so. It may be worth checking if rtld-elf computed Local-Exec TP offsets match lld.

https://reviews.llvm.org/D64906 series (EM_PPC, EM_PPC64, EM_AARCH64 and EM_386) were committed yesterday. See the changed ELF/InputSection.cpp static int64_t getTlsTpOffset(const Symbol &s) {. The lld formulae should match musl>1.1.22 and glibc. ARM/AArch64 are a bit tricky because there is an optional alignment padding after the 2 reserved words.

Quick instructions to build lld:

git clone git@github.com:llvm/llvm-project.git
cd llvm-project
cmake -Hllvm -BRelease -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='lld'

# Delete `else if (Out::tlsPhdr && Out::tlsPhdr->firstSec == p->firstSec)` from lld/ELF/Writer.cpp around line 2244
# If sh_addralign(.tdata) < sh_addralign(.tbss), sometimes p_vaddr(PT_TLS)%p_align(PT_TLS)!=0

ninja -C Release lld
# lld is at Release/bin/ld.lld

a.c:

#include <stdio.h>
__thread int a __attribute__((aligned(4))) = 0xb612; // .tdata alignment, try a few numbers, e.g. 1, 2, 4, 8, 16, 32
__asm(".section .tbss,\"awT\"; .align 64"); // try 64, 128, 256, etc
int main() { printf("%p %x\n", &a, a); }
clang a.c -fuse-ld=path/to/ld.lld -o a
./a
# Experiment with a few different alignments
kib added a comment.EditedAug 24 2019, 9:08 PM

...
Experiment with a few different alignments

Is that what you mean ?

diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 9db4259be49..b8b154d1f32 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2236,12 +2236,14 @@ template <class ELFT> void Writer<ELFT>::fixSectionAlignments() {
       // p_align for dynamic TLS blocks (PR/24606), FreeBSD rtld has the same
       // bug, musl (TLS Variant 1 architectures) before 1.1.23 handled TLS
       // blocks correctly. We need to keep the workaround for a while.
+#if 0
       else if (Out::tlsPhdr && Out::tlsPhdr->firstSec == p->firstSec)
         cmd->addrExpr = [] {
           return alignTo(script->getDot(), config->maxPageSize) +
                  alignTo(script->getDot() % config->maxPageSize,
                          Out::tlsPhdr->p_align);
         };
+#endif
       else
         cmd->addrExpr = [] {
           return alignTo(script->getDot(), config->maxPageSize) +

?

What should I infer from the './a' output ? I suspect there is some alignment that just happens, which makes the test not effective:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
...
  TLS            0x002000 0x0000000000202000 0x0000000000202000 0x000004 0x000040 R   0x40

orion% ./a                                                         ~/build/llvm
0x80022f900 b612
In D21163#465743, @kib wrote:

...
Experiment with a few different alignments

Is that what you mean ?

diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 9db4259be49..b8b154d1f32 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -2236,12 +2236,14 @@ template <class ELFT> void Writer<ELFT>::fixSectionAlignments() {
       // p_align for dynamic TLS blocks (PR/24606), FreeBSD rtld has the same
       // bug, musl (TLS Variant 1 architectures) before 1.1.23 handled TLS
       // blocks correctly. We need to keep the workaround for a while.
+#if 0
       else if (Out::tlsPhdr && Out::tlsPhdr->firstSec == p->firstSec)
         cmd->addrExpr = [] {
           return alignTo(script->getDot(), config->maxPageSize) +
                  alignTo(script->getDot() % config->maxPageSize,
                          Out::tlsPhdr->p_align);
         };
+#endif
       else
         cmd->addrExpr = [] {
           return alignTo(script->getDot(), config->maxPageSize) +

?

Yes, you can get misaligned a PT_TLS segment with p_vaddr%p_align!=0 with a.c above. Or:

# lld/test/ELF/aarch64-tls-vaddr-align.s
add x0, x0, :tprel_lo12_nc:a

.section .tdata,"awT"
.byte 0

.section .tbss,"awT"
.p2align 8
a:
.quad 0

# lld/test/ELF/ppc64-tls-vaddr-align.s
ld 3, a@tprel(13)

.section .tdata,"awT"
.byte 0

.section .tbss,"awT"
.p2align 8
a:
.quad 0

Your patch looks good.

kib edited the test plan for this revision. (Show Details)Sep 6 2019, 2:01 PM
kib added reviewers: emaste, brooks, andrew, jhibbits.
kib removed a subscriber: emaste.

Testing powerpc64.

libexec/rtld-elf/rtld.c
4823

missing param on Variant I.

kib updated this revision to Diff 61768.Sep 6 2019, 10:22 PM

Compile fix for var I.

Hmm, calculations on powerpc64 seem to be offset incorrectly in both cases with the patched lld. Gonna meditate on a disassembly for a bit.

bdragon added inline comments.Sep 7 2019, 1:57 AM
libexec/rtld-elf/rtld.c
4845–4846

I *think* the problem I'm having is that this bit needs to compensate for the difference between the file offset and the first legal address according to the padding, so that it copies the preset variables at the correct offset, instead of copying them to the beginning of the TLS memory.

bdragon added inline comments.Sep 7 2019, 2:08 AM
libexec/rtld-elf/rtld.c
4845–4846

Example, thinking out loud:

Broken:

header:
     TLS off    0x0000000000000cd0 vaddr 0x0000000010020cd0 paddr 0x0000000010020cd0 align 2**6
         filesz 0x0000000000000004 memsz 0x0000000000000030 flags r--

binary:
00000cc0  4e 80 04 20 00 00 00 00  00 02 02 50 4b ff ff c4  |N.. .......PK...|
00000cd0 [00 00 b6 12]00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000ce0  00 00 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000cf0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Working:

header:
     TLS off    0x0000000000000c40 vaddr 0x0000000010010c40 paddr 0x0000000010010c40 align 2**6
         filesz 0x0000000000000004 memsz 0x0000000000000004 flags r--

binary:
00000c10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000c40 [00 00 b6 12]00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000c50  00 00 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000c60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000c70  00 00 00 00 00 00 00 01  00 00 00 00 00 00 00 01  |................|

In the broken case, the binary has 0x10 offset built in to reference the thread variable, because it expected the init to start at the difference between off (0xcd0) and the base-according-to-alignment (0xcc0). But since it was copying to the start of the area instead of an offset into it, the variable is copied 0x10 too low.

So I guess another memset is needed to zero the bytes between the base and the offset here, and the memcpy should write starting after that.

bdragon added inline comments.Sep 7 2019, 3:41 AM
libexec/rtld-elf/rtld.c
4845–4846

Something like this seems to work for my test case. (remote editing, sorry in advance about the patch copypasted from a terminal:)

Index: local-src/libexec/rtld-elf/rtld.c
===================================================================
--- local-src.orig/libexec/rtld-elf/rtld.c
Index: local-src/libexec/rtld-elf/rtld.c
===================================================================
--- local-src.orig/libexec/rtld-elf/rtld.c
+++ local-src/libexec/rtld-elf/rtld.c
@@ -1447,6 +1447,7 @@ digest_phdr(const Elf_Phdr *phdr, int ph
            obj->tlsalign = ph->p_align;
            obj->tlsinitsize = ph->p_filesz;
            obj->tlsinit = (void*)(ph->p_vaddr + obj->relocbase);
+           obj->tlspoffset = ph->p_offset;
            break;
 
        case PT_GNU_STACK:
@@ -4802,7 +4803,7 @@ allocate_tls(Obj_Entry *objs, void *oldt
     Elf_Addr addr;
     Elf_Addr i;
     size_t extra_size, maxalign, post_size, pre_size, tls_block_size;
-    size_t tls_init_align;
+    size_t tls_init_align, tls_init_offset;
 
     if (oldtcb != NULL && tcbsize == TLS_TCB_SIZE)
        return (oldtcb);
@@ -4843,13 +4844,17 @@ allocate_tls(Obj_Entry *objs, void *oldt
 
        for (obj = globallist_curr(objs); obj != NULL;
          obj = globallist_next(obj)) {
+           tls_init_offset = obj->tlspoffset & (obj->tlsalign - 1); 
+           if (tls_init_offset > 0)
+               memset((void*) addr, 0, tls_init_offset);
            if (obj->tlsoffset > 0) {
                addr = (Elf_Addr)tcb + obj->tlsoffset;
                if (obj->tlsinitsize > 0)
-                   memcpy((void*) addr, obj->tlsinit, obj->tlsinitsize);
+                   memcpy((void*)(addr + tls_init_offset), obj->tlsinit, obj->tlsinitsize);
+
                if (obj->tlssize > obj->tlsinitsize)
-                   memset((void*)(addr + obj->tlsinitsize), 0,
-                          obj->tlssize - obj->tlsinitsize);
+                   memset((void*)(addr + tls_init_offset + obj->tlsinitsize), 0,
+                          obj->tlssize - obj->tlsinitsize - tls_init_offset);
                dtv[obj->tlsindex + 1] = addr;
            }
        }
bdragon added inline comments.Sep 7 2019, 3:49 AM
libexec/rtld-elf/rtld.c
1449–1450

As per the inline patch I just dumped into my last comment: Need to set obj->tlspoffset = ph->p_offset; here as well to handle an unaligned tls offset in the main program.

kib updated this revision to Diff 61794.Sep 7 2019, 6:15 PM

Add change from bdragon.