Page MenuHomeFreeBSD

[mips32/tls] change TCB size from 8 to 16 to be aligned with r324938 & r325364
AbandonedPublic

Authored by mizhka_gmail.com on Nov 17 2017, 3:44 PM.

Details

Reviewers
adrian
yamori813_yahoo.co.jp
landonf
Group Reviewers
MIPS
Summary

Hi,

The subtle of previous changes is to align thread local data to 16 bytes.
At first jemalloc aligned "tsd_s" to 16 bytes and then for static executables TCB & TLS have been changed to 16 byte alignment.
On MIPS platform, pointer to TLS is stored in UserLocal registers (since 2016?) and offset is calculated on kernel side. TCB size is 16 bytes for mips64 and 8 bytes for mips32. As result, commits r324938 & r325364 results in shift between userland & kernel TLS pointers (fixed by r325364 for other 32-bit platforms like arm & i386).

Also following ddb command from sys/mips helped to identify root cause:

#ifdef DDB
#include <ddb/ddb.h>
#include <machine/proc.h>

DB_SHOW_COMMAND(tls, db_show_tls)
{
	struct thread *td;

	td = curthread;
	db_printf("TLS: %p %x\n", td->td_md.md_tls, td->td_md.md_tls_tcb_offset);
	return;
}
#endif /* DDB */

I doubt that it's right way to fix it, but at least it unbreak mips32 platform. IMHO md_tls_tcb_offset is to be populated by process, not by kernel, am I right?

Test Plan

Tested on MIPS32 34k (BCM5357, Asus RT-N53).

ATTENTION: rebuild of world & kernel is required.

Original exception on revision 325744 (with patched libc abort method to call sysctl debug.kdb.enter):

mx25l0: bhnd: Firmware image [0x30524448]: 0x20000 - 0x7f8000
ugen0.1: <Broadcom EHCI root HUB> at usbus0
uhub0: <Broadcom EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
Warning: no time-of-day clock registered, system time will not be set accurately
uhub0: 2 ports with 2 removable, self powered
ugen0.2: <Broadcom Remote Download Wireless Adapter> at usbus0
KDB: enter: sysctl debug.kdb.enter
[ thread pid 1 tid 100001 ]
Stopped at      _DYNAMIC_LINKING+0x3:
db> bt
Tracing pid 1 tid 100001 td 0x80642000
[kernel]        kdb_enter+0x48 (?,?,?,?) ra 80183ae4 sp c2b03b90 sz 24 subr 80183994
[kernel]        kdb_sysctl_enter+0xbc (?,?,?,?) ra 801457e8 sp c2b03ba8 sz 48 subr 80183a28
[kernel]        sysctl_root_handler_locked+0xa4 (?,?,?,?) ra 80145aa0 sp c2b03bd8 sz 56 subr 80145744
[kernel]        sysctl_root+0xd8 (?,?,?,?) ra 80145efc sp c2b03c10 sz 96 subr 801459c8
[kernel]        userland_sysctl+0x110 (80642000,?,?,?) ra 801461ac sp c2b03c70 sz 112 subr 80145dec
[kernel]        sys___sysctl+0xac (?,?,?,?) ra 80339e1c sp c2b03ce0 sz 168 subr 80146100
[kernel]        trap+0x1018 (?,?,?,?) ra 80326524 sp c2b03d88 sz 248 subr 80338e04
[kernel]        MipsUserGenException+0x10c (?,?,?,?) ra 00000000 sp c2b03e80 sz 0 subr 80326418
--- exception, cause 0 badvaddr 0 ---
[/sbin/init]    __sysctl (?,?,?,?) ra 004791ec sp 7fffeca8 sz 0 subr 0
[/sbin/init]    sysctl (?,?,?,?) ra 00476684 sp 7fffeca8 sz 64 subr 47917c
[/sbin/init]    sysctlbyname (?,?,?,?) ra 00475cf8 sp 7fffece8 sz 160 subr 47660c
[/sbin/init]    abort (?,?,?,?) ra 0047ce68 sp 7fffed88 sz 72 subr 475c9c
[/sbin/init]    __je_malloc_tsd_boot0 (?,?,?,?) ra 00423834 sp 7fffedd0 sz 32 subr 47cd78
[/sbin/init]    malloc_init_hard (?,?,?,?) ra 00423fe8 sp 7fffedf0 sz 40 subr 423738
[/sbin/init]    jemalloc_constructor (?,?,?,?) ra 00543924 sp 7fffee18 sz 32 subr 423fb4
[/sbin/init]    __do_global_ctors_aux (?,?,?,?) ra 0040018c sp 7fffee38 sz 40 subr 5438ec
[/sbin/init]    _init (?,?,?,?) ra 004002e4 sp 7fffee60 sz 24 subr 40012c
[/sbin/init]    __start (?,?,?,?) ra 00000000 sp 7fffee78 sz 56 subr 4001ac
db> show tls
TLS: 0x40768010 7008

After change of TCB size from 8 to 16 and rebuild ONLY kernel:

Warning: no time-of-day clock registered, system time will not be set accurately
uhub0: 2 ports with 2 removable, self powered
Nov 15 17:26:30 init: login_getclass: unknown class 'daemon'
BAD_PAGE_FAULT: pid 20 tid 100039 (sh), uid 0: pc 0x40442930 got a read fault (type 0x2) at 0
Trapframe Register Dump:
	zero: 0	at: 0x406a4b04	v0: 0x4072e008	v1: 0xffffffffffff8000
	a0: 0x4072e008	a1: 0x2	a2: 0	a3: 0
	t0: 0	t1: 0	t2: 0x10e8bc67	t3: 0
	t4: 0x407389d0	t5: 0x4c957f2d	t6: 0x5851f42d	t7: 0x124fb5
	t8: 0x1	t9: 0x40442914	s0: 0x4069edec	s1: 0x40462090
	s2: 0x404416ac	s3: 0x4046f4bc	s4: 0x4046f4d4	s5: 0x4046f4d0
	s6: 0x404415c0	s7: 0x4046e1d0	k0: 0	k1: 0
	gp: 0x404764d0	sp: 0x7fffe1f8	s8: 0x7fffe2a0	ra: 0x4043d340
	sr: 0xa413	mullo: 0xffffffffcb417800	mulhi: 0x2	badvaddr: 0
	cause: 0x8	pc: 0x40442930
Page table info for pc address 0x40442930: pde = 0x80828000, pte = 0xa002601a
Dumping 4 words starting at pc address 0x40442930: 
8ce20000 8f8380d8 8c630000 14430005
Page table info for bad address 0: pde = 0, pte = 0
pid 20 (sh), uid 0: exited on signal 11
Nov 15 17:26:36 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh:

After patch and rebuild WORLD+kernel:

uhub0: <Broadcom EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
Warning: no time-of-day clock registered, system time will not be set accurately
uhub0: 2 ports with 2 removable, self powered
ugen0.2: <Broadcom Remote Download Wireless Adapter> at usbus0
Nov 17 15:10:54 init: login_getclass: unknown class 'daemon'
*** Mounting /tmp, /var, /etc ...
random: unblocking device.
*** Copying /c/etc -> /etc ...
*** Starting rc2 ...
*** Populating /var ..
ls: /dev/cfi*.cfg: No such file or directory
*** Loading configuration files ..
ls: /dev/cfi*.cfg: No such file or directory
*** Restoring from /dev/flash/spi0s.cfg .. 
gunzip: invalid compressed data--crc error
1+0 records in
1+0 records out
etc/cfg/manifest
65536 bytes transferred in 3.658103 secs (17915 bytes/sec)
etc/master.passwd
etc/group
etc/cfg/rc.conf
etc/profile
10 blocks
*** Completed.
*** setting up hostname
*** Load kernel modules
  .. bgmac
bgmac0: Ethernet address: bc:ee:7b:71:97:8b
mdio0: <MDIO> on bgmac0
  .. bridgestp
kldload: can't load bridgestp: No such file or directory
  .. if_bridge
kldload: can't load if_bridge: No such file or directory
  .. random
kldload: can't load random: No such file or directory
*** Setting kern.random.harvest.mask=511
kern.random.harvest.mask: 2047 -> 511
*** bringing up loopback ..
*** Default password/login databases ..
*** Customising sysctl ..
*** Starting networking via /etc/rc.d/base/net
*** Interface: bgmac0: start
bgmac0: [BHND  info] bgmac_chip_start_txrx:396 => starting TX / RX
*** Interface: bgmac0: done
/etc/rc.d/base/net: /etc/rc.d/net/: Permission denied
/etc/rc.d/base/net: /etc/rc.d/net/: Permission denied
/etc/rc.d/base/net: /etc/rc.d/net/: Permission denied
*** Interface: bridge0: start
ifconfig: SIOCIFCREATE2: Invalid argument
ifconfig: interface bridge0 does not exist
ifconfig: interface bridge0 does not exist
ifconfig: interface bridge0 does not exist
*** Interface: bridge0: done
ipfw: setsockopt(IP_FW_XDEL): Protocol not available
*** Done!

FreeBSD/mips (RT-N53) (ttyu0)

login: root
[RT-N53:/root]$

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

mizhka_gmail.com edited the summary of this revision. (Show Details)Nov 17 2017, 4:15 PM
brooks added a subscriber: brooks.Nov 17 2017, 4:34 PM

Why not just nuke TLS_TCB_SIZE32?

Why not just nuke TLS_TCB_SIZE32?

I'll do by next commit. AFAIK TLS_TCB_SIZE32 is used for 32 compatibility on MIPS64. I have no experience with MIPS64, so it's not easy to quick test this change. So I prefer to nuke it by next step.

Is this needed at all after rS325364? We should now be respecting the alignment of the PT_TLS section. If the PT_TLS section isn't correctly aligned then fixing that is the right answer rather than trashing the ABI.

Is this needed at all after rS325364? We should now be respecting the alignment of the PT_TLS section. If the PT_TLS section isn't correctly aligned then fixing that is the right answer rather than trashing the ABI.

Thank you for idea. I'll update review request after test (hope in next few hours). I'll try to add roundup2 to sysarch of mips to avoid trashing ABI.

@brooks ,

On kernel side, we calculate TLS size (td->td_md.md_tls_tcb_offset) in several places used by static & dynamic at same time. So if we roundup2 it, /sbin/init works, but /bin/sh fails.
As of now, I suppose that current fix is the only quick solution :\

BTW, here is stack trace of bad page fault:

ugen0.2: <Broadcom Remote Download Wireless Adapter> at usbus0
Nov 17 18:33:54 init: login_getclass: unknown class 'daemon'
BAD_PAGE_FAULT: pid 20 tid 100039 (sh), uid 0: pc 0x40442930 got a read fault (type 0x2) at 0
Trapframe Register Dump:
	zero: 0	at: 0x406a4b04	v0: 0x4072e008	v1: 0xffffffffffff8000
	a0: 0x4072e008	a1: 0x2	a2: 0	a3: 0
	t0: 0	t1: 0	t2: 0x12ed7ff3	t3: 0
	t4: 0x407389d0	t5: 0x4c957f2d	t6: 0x5851f42d	t7: 0x124fb5
	t8: 0x1	t9: 0x40442914	s0: 0x4069edec	s1: 0x40462090
	s2: 0x404416ac	s3: 0x4046f4bc	s4: 0x4046f4d4	s5: 0x4046f4d0
	s6: 0x404415c0	s7: 0x4046e1d0	k0: 0	k1: 0
	gp: 0x404764d0	sp: 0x7fffe1f8	s8: 0x7fffe2a0	ra: 0x4043d340
	sr: 0xa413	mullo: 0xffffffffcb417800	mulhi: 0x2	badvaddr: 0
	cause: 0x8	pc: 0x40442930
Page table info for pc address 0x40442930: pde = 0x8084f000, pte = 0xa003621a
Dumping 4 words starting at pc address 0x40442930: 
8ce20000 8f8380d8 8c630000 14430005
Page table info for bad address 0: pde = 0, pte = 0
KDB: enter: bad page fault
[ thread pid 20 tid 100039 ]
Stopped at      _DYNAMIC_LINKING+0x3:
db> bt
Tracing pid 20 tid 100039 td 0x8064b780
[kernel]        kdb_enter+0x48 (?,?,?,?) ra 80338690 sp c2bc5d10 sz 24 subr 80183994
[kernel]        log_bad_page_fault+0x29c (?,?,?,?) ra 803393c8 sp c2bc5d28 sz 96 subr 803383f4
[kernel]        trap+0x5b0 (?,?,?,?) ra 80326524 sp c2bc5d88 sz 248 subr 80338e18
[kernel]        MipsUserGenException+0x10c (?,?,?,?) ra 00000000 sp c2bc5e80 sz 0 subr 80326418
--- exception, cause 404764d0 badvaddr 40446a84 ---
[/libexec/ld-elf.so.1]  0x40442930 (?,?,?,?) ra 4043d340 sp 7fffe1f8 sz 0 subr 0
[/libexec/ld-elf.so.1]  0x4043d340 (?,?,?,?) ra 4043f46c sp 7fffe1f8 sz 32 subr 4043d2fc
[/libexec/ld-elf.so.1]  0x4043f46c (?,?,?,?) ra 40696e68 sp 7fffe218 sz 32 subr 4043f430
can't lookup vm_map_entry: 2
0x40696e68 (?,?,?,?) ra 00000000 sp 7fffe238 sz 0 subr 0
db> show tls
TLS: 0x4072e000 7010

According to kernel thread info, TLS is aligned to 16-byte. I guess that page fault is due to dynamic executable expects TLS with size = 0x7008 bytes.

You shouldn't need to change the size or roundup2 the syscall argument. With the change in rS325364 the TLS allocation should be properly aligned so long as the PT_TLS segment has the right alignment. You could change the definition of TLS_TCB_ALIGN to be 16 for mips like it is on amd64, but we should be setting the alignment of the TLS segment correctly and fix it if we aren't.

mizhka_gmail.com updated this revision to Diff 35473.

I don't touch "Summary" because original patch is still good IMHO, but it breaks ABI.
This revision of patch is only kernel change and doesn't break ABI, i.e. buildkernel/installkernel is enough.

As discussion with @meloun-miracle-cz in IRC, static binaries expects aligned TCB thanks to new CRT code, but dynamic binaries is fully managed by rtld and behavior should be same as it was before r325364. To handle this difference, let's check first offset in dtv[] array (actual third element) to identify actual TCB size. It allows to adjust md_tls_tcb_offset for static binaries and keep original behaviour for dynamic binaries.

@meloun-miracle-cz , @ray , @yamori813_yahoo.co.jp , @adrian , I'll appreciate your review! Thx!

mmel added a comment.Nov 20 2017, 3:19 PM

with full respect, I don’t think that this is right way. Moreover, I think that you papering over real problem there.
With this patch, what’s happen if someone requests any higher alignment (that actual 16) for TLS data? Or, can patched kernel run the old init (pre r324938)?

After day of digging, I’m still not sure which TLS model mips uses. Mainly which value is expected to be returned by tls_get_tp() function.
It’s clear that mips doesn’t follow Variant I of TLS standard (which require TP pointing to TCB), but it takes relaxed model of this. Something like
“The "thread pointer register" points to some fixed offset from the beginning of the TLS segment. This offset varies by arch“.

If is this true, then my change in libc/gen/tls.h has been significantly incomplete and the arch specific bits need to be implemented.

In D13134#274207, @meloun-miracle-cz wrote:

with full respect, I don’t think that this is right way. Moreover, I think that you papering over real problem there.
With this patch, what’s happen if someone requests any higher alignment (that actual 16) for TLS data? Or, can patched kernel run the old init (pre r324938)?
After day of digging, I’m still not sure which TLS model mips uses. Mainly which value is expected to be returned by tls_get_tp() function.
It’s clear that mips doesn’t follow Variant I of TLS standard (which require TP pointing to TCB), but it takes relaxed model of this. Something like
“The "thread pointer register" points to some fixed offset from the beginning of the TLS segment. This offset varies by arch“.
If is this true, then my change in libc/gen/tls.h has been significantly incomplete and the arch specific bits need to be implemented.

Kernel world with old world fine, tested on r322999 world.

Cosmetic change: increase MIPS_MAX_TCB_SIZE from 0x20 to 0x1000

mmel added a comment.Nov 25 2017, 8:58 AM

Generally speaking, I think that implementing more and more knowledge about TLS ABI into kernel is direct way to hell and it simply leads to hard to solve compatibility issues.
The ideal kernel should take TP pointer as opaque value, without any attempt to interpret it.

At this point I don't see any other option than implementing new sysarch syscalls, say <ARCH>_SET/GET_TP with correct semantics, adapt all library bits to it and hide old <ARCH>_SET/GET_TLS under COMPAT11 #ifdef.

sys/mips/include/tls.h
55

Why is this define needed ?

sys/mips/mips/sys_machdep.c
82

You can never directly dereference the usermode pointer from the kernel. Malicious program can pass invalid or kernel address in tcb/dtv which can leads to panic or it can have security implication. Use fueword*() or copyin() for this.

Also, it seems that mips ABI expects dtv[] pointers are biased by TLS_DTP_OFFSET. We have implemented this bias in libexec/rtld, but our libc/gen/tls.c ignores it.