Page MenuHomeFreeBSD

procdesc: fix reparenting when the debugger is attached
ClosedPublic

Authored by oshogbo on May 22 2019, 7:42 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 15, 10:44 PM
Unknown Object (File)
Fri, Nov 8, 5:31 AM
Unknown Object (File)
Oct 18 2024, 12:36 PM
Unknown Object (File)
Oct 4 2024, 11:30 PM
Unknown Object (File)
Oct 2 2024, 8:58 PM
Unknown Object (File)
Oct 2 2024, 6:00 PM
Unknown Object (File)
Oct 2 2024, 3:04 PM
Unknown Object (File)
Oct 2 2024, 3:40 AM
Subscribers

Details

Summary

The process is reparented to the debugger while it is attached.

  B          B
 /   ---->   |
A          A D

Every time when the process is reparented, it is added to the orphan list of
the previous parent:

A->orphan = B
D->orphan = NULL

When the A process will close the process descriptor to the B process,
the B process will be reparented to the init process.

   B            B - init
   |   ---->
A  D           A   D
`

A->orphan = B
D->orphan = B

In this scenario, the B process is in the orphan list of A and D.

When the last process descriptor is closed instead of reparenting
it to the reaper let it stay with the debugger process and set
our previews parent to the reaper.

Add test case for this situation.
Notice that without this patch the kernel will crash with this test case:
panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

oshogbo edited the summary of this revision. (Show Details)
oshogbo edited the summary of this revision. (Show Details)
oshogbo added reviewers: emaste, markj.
oshogbo removed a subscriber: emaste.
sys/kern/sys_procdesc.c
419 ↗(On Diff #57716)

You forgot to remove the old call.

425 ↗(On Diff #57716)

I think this is necessary but it's not obvious to me that it's sufficient. Looking at proc_reparent(), it also calls sigqueue_take(), but I don't think we need to do that in the traced case. Is there any other state that might need to be updated?

sys/kern/sys_procdesc.c
419 ↗(On Diff #57716)

Aghr. Screw up while rebasing. Thanks!

sys/kern/sys_procdesc.c
425 ↗(On Diff #57716)

I am not sure how would it work. The process must be on the orphans' list for its old parent, otherwise premature exit of the debugger or even detach would see unexpected state.

sys/kern/sys_procdesc.c
425 ↗(On Diff #57716)

The parent has disassociated itself from the process by closing the process descriptor. Your comment makes me think that p should be added to the orphan list of the reaper. I noted however that the code in exit1() that raises SIGKILL to children of an exiting debugger does not do this.

sys/kern/sys_procdesc.c
425 ↗(On Diff #57716)

Yes, inserting into the reapper' orphan list is the most obvious option.

In exit1() case, when debugger exits, the traced children are reparented back, there is no need to do anything about new orphanage.

Add a process to the orphan list as suggested by @kib and @markj.

sys/kern/kern_exit.c
1361 ↗(On Diff #57852)

Can you explain why do you require PROC_LOCK there ?

sys/kern/kern_exit.c
1361 ↗(On Diff #57852)

p_flag is protected by PROC_LOCK.
But I guess the proctree_lock should be enough.

P_TRACED is always set or cleared with proctree_lock exclusively locked, except at the process exit. After that, the process lock in exit1() is dropped and we acquire the proctree_lock. PRS_ZOMBIE is set much later on the process.

So I wonder is it possible that your patched code would see non-traced process which is not yet safe to reparent. Or rather, I do not see anything that prevents this situation.

I suspect that we should clear P_TRACED under proctree_lock in exit1() as well.

In D20361#440566, @kib wrote:

So I wonder is it possible that your patched code would see non-traced process which is not yet safe to reparent. Or rather, I do not see anything that prevents this situation.

What exactly makes it unsafe? The procdesc close may race with a PT_DETACH, but PT_DETACH should not modify the process tree when p_oppid and p_ptr refer to the same process.

sys/kern/kern_exit.c
171 ↗(On Diff #57897)

IMO it should be renamed to proc_clear_orphan() now.

In D20361#440566, @kib wrote:

So I wonder is it possible that your patched code would see non-traced process which is not yet safe to reparent. Or rather, I do not see anything that prevents this situation.

What exactly makes it unsafe? The procdesc close may race with a PT_DETACH, but PT_DETACH should not modify the process tree when p_oppid and p_ptr refer to the same process.

ptrace(PT_DETACH) is safe, but exit1() is not IMO. It clears P_TRACED very early, which would make procdesc_close() not calling proc_clear_orphan().

In D20361#441005, @kib wrote:
In D20361#440566, @kib wrote:

So I wonder is it possible that your patched code would see non-traced process which is not yet safe to reparent. Or rather, I do not see anything that prevents this situation.

What exactly makes it unsafe? The procdesc close may race with a PT_DETACH, but PT_DETACH should not modify the process tree when p_oppid and p_ptr refer to the same process.

ptrace(PT_DETACH) is safe, but exit1() is not IMO. It clears P_TRACED very early, which would make procdesc_close() not calling proc_clear_orphan().

Sorry, I don't follow. If procdesc_close() does not observe P_TRACED, it will simply reparent to the reaper, and proc_reparent() clears the orphan state.

In D20361#441005, @kib wrote:

ptrace(PT_DETACH) is safe, but exit1() is not IMO. It clears P_TRACED very early, which would make procdesc_close() not calling proc_clear_orphan().

Sorry, I don't follow. If procdesc_close() does not observe P_TRACED, it will simply reparent to the reaper, and proc_reparent() clears the orphan state.

kib pointed out that the debugger will not be able to collect the exit status in this case. I now agree that we should postpone clearing P_TRACED until the proctree lock is acquired in exit1(). It would probably be worth testing such a patch separately; I would ask Peter to run stress2, since it contains a number of tests for ptrace.

As Mark noted, move of P_TRACED cleanup should be a separate commit which makes sense to ask pho to test. Also I recommend to split introduction of proc_add_orphan() into another preliminary commit, since it will make merges easier.

This revision is now accepted and ready to land.May 29 2019, 5:42 PM
markj added inline comments.
sys/kern/kern_exit.c
171 ↗(On Diff #57897)

Do you disagree with my comment above?

I got this after a 5 hour test on i386, followed by a shutdown:

panic: vm_fault_hold: fault on nofault entry, addr: 0x83326000
cpuid = 0
time = 1564486041
KDB: stack backtrace:
db_trace_self_wrapper(e6666d,1bdcda4,0,23a17534,bd2eb1,...) at db_trace_self_wrapper+0x2a/frame 0x23a17508
kdb_backtrace(e,0,0,2b06100,2db5000,...) at kdb_backtrace+0x2e/frame 0x23a17568
vpanic(163709e,23a175b0,23a175b0,23a17680,12c2044,...) at vpanic+0x121/frame 0x23a17590
panic(163709e,158dad9,83326000,4895800,7000004,...) at panic+0x14/frame 0x23a175a4
vm_fault_hold(2db5000,83326000,2,0,0) at vm_fault_hold+0x2474/frame 0x23a17680
vm_fault(2db5000,83326000,2,0) at vm_fault+0x4a/frame 0x23a176a8
trap_pfault(8332680c) at trap_pfault+0xf3/frame 0x23a176f4
trap(23a177bc,8,28,28,83df8640,...) at trap+0x3c0/frame 0x23a177b0
calltrap() at 0xffc0316d/frame 0x23a177b0
--- trap 0xc, eip = 0xf7f255, esp = 0x23a177fc, ebp = 0x23a17814 ---
funsetownlst(27e5ea50) at funsetownlst+0xa5/frame 0x23a17814
pgdelete(27e5ea6c,23c00a80,1651842,36,23a178f8,...) at pgdelete+0x60/frame 0x23a17834
leavepgrp(298d46a8,0,1651842,396,44c,...) at leavepgrp+0xd5/frame 0x23a1784c
proc_reap(23c00a80,298d46a8,23a17ba4,36,1edeb50,...) at proc_reap+0x714/frame 0x23a178f8
proc_to_reap(23c00a80,298d46a8,7,0,0,...) at proc_to_reap+0x718/frame 0x23a17998
kern_wait6(23c00a80,7,0,0,23a17ba4,...) at kern_wait6+0x37a/frame 0x23a17a78
kern_wait(23c00a80,ffffffff,23a17ba4,36,0,...) at kern_wait+0x119/frame 0x23a17b80
sys_wait4(23c00a80,23c00d0c) at sys_wait4+0x68/frame 0x23a17c08
syscall(23a17ce8,3b,3b,3b,0,...) at syscall+0x304/frame 0x23a17cdc
Xint0x80_syscall() at 0xffc033b7/frame 0x23a17cdc

https://people.freebsd.org/~pho/stress/log/oshogbo001.txt

Just a follow up. I'm currently testing your patch on amd64 where I have not yet seen any problems.

I completed a full stress2 test on amd64 without seeing any problems.