libc: Give __thr_jtable protected visibility
Closed, Public

Authored by markj on Wed, May 14, 4:42 PM.
Tags
None
Referenced Files
F118429940: D50354.diff
Thu, May 29, 7:06 AM

Details

Summary

This function pointer table is overwritten by libthr when it's loaded.
libc's pthread stubs are implemented by looking up an entry in this
table and invoking the function pointer contained in the entry.

pthread calls are fairly expensive even when libthr is not loaded: each
call involves indirection through the PLT, then through the GOT to look
up __thr_jtable, then the function pointer itself. We can however
eliminate one level of indirection by disallowing preemption of the
__thr_jtable symbol, and I believe that doing so is unlikely to break
anything, so do it.
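
To make the mechanism concrete, here is a minimal sketch of a jump table with protected visibility, a stub that dispatches through it, and an installer that overwrites an entry the way a threading library would. All names here (demo_jtable, demo_mutex_lock, demo_install, demo_real_lock) are hypothetical and the layout is an assumption for illustration; it is not the actual libc definition of __thr_jtable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry type: every slot takes one opaque argument. */
typedef int (*demo_func_t)(void *);

enum demo_slot { DEMO_MUTEX_LOCK, DEMO_MUTEX_UNLOCK, DEMO_MAX };

/* Default implementation used until a threading library installs its own. */
static int
demo_stub(void *arg)
{
	(void)arg;
	return (0);
}

/*
 * Protected visibility: the symbol is still exported, but it cannot be
 * preempted, so references from inside the defining object can bind
 * directly instead of going through the GOT.
 */
__attribute__((visibility("protected")))
demo_func_t demo_jtable[DEMO_MAX] = {
	[DEMO_MUTEX_LOCK] = demo_stub,
	[DEMO_MUTEX_UNLOCK] = demo_stub,
};

/* Stub: look up the entry and call the function pointer it contains. */
int
demo_mutex_lock(void *mutex)
{
	return (demo_jtable[DEMO_MUTEX_LOCK](mutex));
}

/* Stands in for the threading library overwriting the table on load. */
void
demo_install(enum demo_slot slot, demo_func_t fn)
{
	assert(slot < DEMO_MAX && fn != NULL);
	demo_jtable[slot] = fn;
}

/* Hypothetical "real" implementation, distinguishable by its return value. */
int
demo_real_lock(void *mutex)
{
	(void)mutex;
	return (42);
}
```

Calling demo_mutex_lock() before demo_install() reaches demo_stub(); afterwards it reaches whatever was installed. The saved indirection only shows up when the table and the stubs are built into the same shared object with -fPIC, which is the situation described above.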

Sponsored by: Innovate UK

Test Plan

I used a microbenchmark which locks and unlocks a mutex one billion times.
On a Ryzen 7950X3D, with libthr not loaded, the runtime decreases from
4.53s to 4.15s (about 8%). When libthr is loaded, either at program start
time or by dlopen(), there's no change (the runtime is about 10.2s).

Without the change, a typical stub looks like this:

Dump of assembler code for function pthread_mutex_lock_exp:
   0x000000000009b530 <+0>:     push   %rbp
   0x000000000009b531 <+1>:     mov    %rsp,%rbp
   0x000000000009b534 <+4>:     mov    0x14ddc5(%rip),%rax        # 0x1e9300
   0x000000000009b53b <+11>:    pop    %rbp
   0x000000000009b53c <+12>:    jmp    *0x2a0(%rax)

whereas with the change it becomes:

Dump of assembler code for function pthread_mutex_lock_exp:
   0x000000000009b380 <+0>:     push   %rbp
   0x000000000009b381 <+1>:     mov    %rsp,%rbp
   0x000000000009b384 <+4>:     pop    %rbp
   0x000000000009b385 <+5>:     jmp    *0x14ecd5(%rip)        # 0x1ea060 <__thr_jtable+672>

(The frame pointer manipulations look a bit silly now.) I see a similar change on arm64. Before:

Dump of assembler code for function pthread_mutex_lock_exp:
   0x00000000000a8e5c <+0>:     adrp    x8, 0x1f1000
   0x00000000000a8e60 <+4>:     ldr     x8, [x8, #3240]
   0x00000000000a8e64 <+8>:     ldr     x1, [x8, #672]
   0x00000000000a8e68 <+12>:    br      x1

after:

Dump of assembler code for function pthread_mutex_lock_exp:
   0x00000000000a8dc4 <+0>:     adrp    x8, 0x202000
   0x00000000000a8dc8 <+4>:     ldr     x1, [x8, #1296]
   0x00000000000a8dcc <+8>:     br      x1

Diff Detail

Repository
rG FreeBSD src repository