Rework alignment handling in __libc_allocate_tls() for Variant I of TLS layout.
ClosedPublic

Authored by meloun-miracle-cz on Dec 5 2017, 4:46 PM.

Details

Summary

Rework alignment handling in __libc_allocate_tls() for Variant I of TLS layout.

There are two versions of variant I of TLS

  • ARM and aarch64 uses original version of variant I here TP points to start of TCB followed by aligned TLS segment. Both TCB and TLS must be aligned to alignment of TLS section. The TCB[0] points to DTV vector and DTV values are real addresses (without bias).
  • MIPS, PowerPC and RISC-V uses modified, incomatible version of variant I, where TP points (with bias) to TLS and TCB immediately precedes TLS without any alignment gap[4]. Only TLS should be aligned. The TCB[0] points to DTV vector and DTV values are biased by constant value (0x8000) from real addresses.

Take this in account when allocating memory for TLS structures.

MFC after: 1 month


There are at least one oustanding issue: I cannot find any evidence
if a DTV vector is used in Local Exec TLS Model and if so, whether
is expected that it contains biased TLS values (addresses).

Test Plan

I made a simple test, so it would be nice if you can run it on the
affected platforms: https://people.freebsd.org/~mmel/test_tls.tgz
Thanks.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
landonf added a subscriber: landonf.Dec 5 2017, 7:43 PM
mizhka_gmail.com accepted this revision.Dec 5 2017, 9:21 PM

MIPS is fine:

[RT-N53:/tmp]$ ./test_tls 
OK.

Tested on Asus RT-N53 with jemalloc correction. jemalloc issue (incorrect debug assertion) has been discussed with David Goldblatt in https://gitter.im/jemalloc/jemalloc. It's accepted as bug to be fixed before 5.1 planned for early next year.

This revision is now accepted and ready to land.Dec 5 2017, 9:21 PM
kib accepted this revision.Dec 6 2017, 4:54 PM

I did not found anything obviously wrong (i.e. an easy to see contradiction between the comment and code).

lib/libc/gen/tls.c
143 ↗(On Diff #36245)

Perhaps commit the malloc_aligned() changes separately. Am I right that this is needed only for Cheri ?

172 ↗(On Diff #36245)

use

205 ↗(On Diff #36245)

Why int and not size_t ?

Also, order the variables in declaration alphabetically.

207 ↗(On Diff #36245)

fragments

240 ↗(On Diff #36245)

the following

261 ↗(On Diff #36245)

Same note for the type and ordering.

269 ↗(On Diff #36245)

fragments

287 ↗(On Diff #36245)

Space between Elf_addr and '**'.

This code generally seem like an improvement for code where there are hidden alignment requirements in the binary, but it isn't complete. If you do something like:

extern __thread char array_16k[16384];
__attribute__((__aligned__(16384)))
__thread char array_16k[16384];

Then it will only reliably be aligned to 8k. On at least mips64 this will be a regression.

Kib,
thanks. I will incorporate all your comments into final commit.

Brooks,
I'm sorry for my ignorance, but why? I don't see single reason for this. Can you be more verbose, please.

This revision was automatically updated to reflect the committed changes.

First the overall alignment is computed:

maxalign = MAX(tcbalign, tls_init_align);

and then tls_block_size is computed with all the assorted padding and without relation to tcbalign.
The whole allocation is then aligned to maxalign with:

tls_block = malloc_aligned(tls_block_size, maxalign);

However, if tcbalign is > tls_block_size the TLS segment will be aligned to tls_block_size not maxalign. As a practical matter I'm not sure this can happen, but this code is confusing enough that I really don't want to have to read through multiple codepaths and traverser weak symbols to convince myself it won't. We should either honor all reasonable values for tcbalign is document our assumptions about them clearly.