Page MenuHomeFreeBSD

ptrace: Clear TDB_BORN during PT_DETACH.
ClosedPublic

Authored by jhb on Apr 27 2023, 5:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Oct 27, 6:15 PM
Unknown Object (File)
Sat, Oct 26, 7:05 PM
Unknown Object (File)
Sep 30 2024, 9:59 AM
Unknown Object (File)
Sep 19 2024, 11:50 AM
Unknown Object (File)
Sep 19 2024, 10:24 AM
Unknown Object (File)
Sep 18 2024, 5:13 PM
Unknown Object (File)
Sep 18 2024, 12:32 PM
Unknown Object (File)
Sep 16 2024, 1:00 PM
Subscribers

Details

Summary

If a debugger detaches from a process that has a new thread that has
not yet executed, the new thread will raise a SIGTRAP signal to report
it's thread birth event even after the detach. With the debugger
detached, this results in a SIGTRAP sent to the process and typically
a core dump. Fix this by clearing TDB_BORN from any new threads
during detach.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jhb requested review of this revision.Apr 27 2023, 5:38 PM

I discovered this bug running GDB's test suite which was able to reproduce this reliably. I tried writing my own test (included here), but was not able to get the test to provoke the condition reliably. I'd like to still merge the bug fix though even if I can't come up with a reliable test for it.

The kernel change seems ok, but I wonder why fork_return() checks TDB_BORN when P_TRACED is clear.

tests/sys/kern/ptrace_test.c
4331 ↗(On Diff #121154)

Check for errors here?

4344 ↗(On Diff #121154)

Why do you start the loop from 1?

4403 ↗(On Diff #121154)

If I run the test, this test occasionally fails: ptrace_test.c:4403: WSTOPSIG(status) (11) == SIGTRAP (5) not met. Presumably that reflects a bug in the test? Otherwise the test just times out.

The test was a bit more of a current WIP of what I had been trying. I'm not sure it's possible to get a reliable test for this race (and don't plan to include the test unless it can be made to work reliably). As far as the checking the flag in fork_return: I actually think in the rare case you do PT_DETACH and then PT_ATTACH again, the expectation should be that the second PT_ATTACH should be using PT_GETLWPS to "see" the new threads right after PT_ATTACH and not need the born event. It's possible that we should also be only checking P_TRACED in fork_return, but then you'd never have TDB_BORN cleared in this case which is kind of odd.

tests/sys/kern/ptrace_test.c
4344 ↗(On Diff #121154)

Humm, that's probably an oversight, though I'm not sure one that matters. The 4 above is my current SWAG while trying different values of this when I was actively running the test on my laptop. (I found this bug back in January or so but have just been reticent in posting a review)

4403 ↗(On Diff #121154)

Do you get a coredump for the SIGSEGV?

In D39856#907301, @jhb wrote:

The test was a bit more of a current WIP of what I had been trying. I'm not sure it's possible to get a reliable test for this race (and don't plan to include the test unless it can be made to work reliably). As far as the checking the flag in fork_return: I actually think in the rare case you do PT_DETACH and then PT_ATTACH again, the expectation should be that the second PT_ATTACH should be using PT_GETLWPS to "see" the new threads right after PT_ATTACH and not need the born event. It's possible that we should also be only checking P_TRACED in fork_return, but then you'd never have TDB_BORN cleared in this case which is kind of odd.

I think you would also want to clear TDB_BORN regardless, yes.

tests/sys/kern/ptrace_test.c
4403 ↗(On Diff #121154)

I had to hack up the test to get it:

Thread 2.34434 "simple" received signal SIGSEGV, Segmentation fault.
Address not mapped to object.
[Switching to LWP 1097487 of process 46014]
check_and_init_mutex (mutex=0x40, m=m@entry=0x7fffdedf4ad8) at /root/freebsd/lib/libthr/thread/thr_mutex.c:597
597     /root/freebsd/lib/libthr/thread/thr_mutex.c: No such file or directory.
(gdb) bt
#0  check_and_init_mutex (mutex=0x40, m=m@entry=0x7fffdedf4ad8) at /root/freebsd/lib/libthr/thread/thr_mutex.c:597
#1  0x00000008010b0caa in __Tthr_mutex_trylock (mutex=0x40) at /root/freebsd/lib/libthr/thread/thr_mutex.c:625
#2  0x000000080124e98d in malloc_mutex_trylock_final (mutex=0x0) at /root/freebsd/contrib/jemalloc/include/jemalloc/internal/mutex.h:159
#3  malloc_mutex_lock (tsdn=0x8014f5090, mutex=0x0) at /root/freebsd/contrib/jemalloc/include/jemalloc/internal/mutex.h:218
#4  arena_dalloc_bin (tsdn=0x8014f5090, arena=0x801e008c0, extent=0x801e0a040, ptr=0x802208318) at jemalloc_arena.c:1724
#5  __je_arena_dalloc_small (tsdn=0x8014f5090, ptr=0x802208318) at jemalloc_arena.c:1735
#6  0x000000080124b053 in arena_dalloc_no_tcache (tsdn=tsdn@entry=0x8014f5090, ptr=ptr@entry=0x802208318) at /root/freebsd/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:263
#7  0x0000000801243a11 in arena_dalloc (tsdn=0x8014f5090, tcache=0x0, slow_path=true, ptr=<optimized out>, alloc_ctx=<optimized out>) at /root/freebsd/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:292
#8  idalloctm (tsdn=0x8014f5090, tcache=0x0, is_internal=false, slow_path=true, ptr=<optimized out>, alloc_ctx=<optimized out>) at /root/freebsd/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h:118
#9  ifree (tsd=0x8014f5090, tcache=0x0, slow_path=true, ptr=<optimized out>) at jemalloc_jemalloc.c:2593
#10 __je_free_default (ptr=<optimized out>) at jemalloc_jemalloc.c:2784
#11 0x00000008010ad23b in exit_thread () at /root/freebsd/lib/libthr/thread/thr_exit.c:284
#12 0x00000008010ad1be in _pthread_exit_mask (status=status@entry=0x0, mask=mask@entry=0x0) at /root/freebsd/lib/libthr/thread/thr_exit.c:266
#13 0x00000008010ad01b in _Tthr_exit (status=0x40, status@entry=0x0) at /root/freebsd/lib/libthr/thread/thr_exit.c:209
#14 0x000000000102f6db in simple_thread (arg=<optimized out>) at /usr/home/markj/src/freebsd/tests/sys/kern/ptrace_test.c:1230
#15 0x00000008010acb08 in thread_start (curthread=0x802800700) at /root/freebsd/lib/libthr/thread/thr_create.c:292
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdedf5000

At a glance, I'm not sure what to make of it.

This revision is now accepted and ready to land.Apr 27 2023, 8:08 PM
This revision was automatically updated to reflect the committed changes.