Page MenuHomeFreeBSD

procdesc: allow to collect status through wait(1) if process is traced

Authored by oshogbo on May 22 2019, 7:46 PM.



The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).

The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.

To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the parent.

The new process flag (P2_EXIT_TRACED) is introduced.
This flag informs that process exited while being traced.
Otherwise, the debugger will not collect the exit code of the traced process.

Unfortunately, that is not possible because of the kern_wait6().
This function again reparents the process to the original state.
If the P_TRACE flag would be set, the process would be added to
the orphan list of the debugger process - which would cause crash.

Add test case to verify that. The test case was implemented by markj@.

Diff Detail

Lint OK
No Unit Test Coverage
Build Status
Buildable 25701
Build 24286: arc lint + arc unit

Event Timeline

oshogbo created this revision.May 22 2019, 7:46 PM
oshogbo added a reviewer: jhb.
kib added inline comments.May 22 2019, 8:46 PM

Wouldn't this cause return of the process status through wait(2) to parent (and not a debugger) if the process was already reparented back from the debugger to the original parent ?

kib added a comment.May 22 2019, 8:46 PM

wait(2) and pdfork(2)

oshogbo added inline comments.May 22 2019, 8:56 PM

The P_TRACED and P2_EXIT_TRACED flag would not be set if the child was returned.

oshogbo marked an inline comment as not done.May 22 2019, 9:06 PM
oshogbo added inline comments.

Sorry maybe I respond to quickly. Could you clarify a little bit more?

Is this possible that we have attached a debugger and we reparented back to the parent? I didn't find such scenario in the code.
If we detach the debugger we are removing the P_TRACED flag.

kib added inline comments.May 22 2019, 9:37 PM

Process is traced and exiting, which sets it P2_EXIT_TRACED flag. Debugger exits as well, which allows re-parenting of the child to the original parent. Then you get everything set up for returning status in wait(2) despite p_procdesc != NULL.

Let me state it in other way: P2_EXIT_TRACED is not too different from not clearing P_TRACED at all.

oshogbo added inline comments.May 22 2019, 10:05 PM

Yea you are right. I introduce a new flag to different our selves from the P_TRACED and not add our selves to the orphan list.

I guess we should remove P2_EXIT_TRACED on re-parenting or detaching.
Or those you see another way of solving this?

oshogbo updated this revision to Diff 57853.May 24 2019, 6:40 PM

Clear the P2_EXIT_TRACED after reparenting.

kib added inline comments.May 25 2019, 2:43 PM

I wonder do you need P2_EXIT_TRACED flag. Isn't the condition where P_TRACED or P2_EXIT_TRACED are set equivalent to the orphaned state of the child ?

oshogbo added inline comments.May 27 2019, 3:02 PM

I'm not sure.
I think if the debugger is also a parent then we are not orphaned.
But maybe I miss understand some code.

kib added inline comments.May 27 2019, 7:52 PM

You can check if parent == oparent additionally ?

oshogbo added inline comments.May 27 2019, 8:08 PM

When we are parent and debugger then, I think, I wouldn't be still able to fetch last status when process exited and the P_TRACED flag was removed.

If I would like to leave P_TRACED flag then I need to figure out how to inform kern_wait6 not to add it to the orphan list.

markj added inline comments.Jun 3 2019, 7:48 PM

Could we change ptrace() to add the child to the parent's orphan list even when the original parent is the debugger?

oshogbo added inline comments.Jun 9 2019, 10:47 PM

I'm not sure about this approach. We may try to kill process twice first in the child list then in orphan lists. We can skip that if p_pid == q_oppid .
Im not sure if this intuitive that the child would land on the orphan list in this situation.

markj added inline comments.Jun 10 2019, 4:03 PM

Sorry, I'm not sure what you mean. exit() will send SIGKILL to children of the debugger, not orphans, but I guess you are referring to something else.

IMO the "orphan" terminology is already non-intuitive: an orphan's parent may still be alive. Without having tried to implement it myself, I think having the debuggee always live on an orphan list is more consistent and easier to reason about.

jhb added inline comments.Jun 10 2019, 6:23 PM

I have long wanted a 'p_debugees' list and a 'p_debugger' backpointer that was used when a debugger was detached and to not change the actual parent at all. I have had partial patches to do this in the past. The last time I brought this up when getppid was fixed to not report debuggers, kib@ convinced me that the new approach wouldn't really be any cleaner (you need workarounds to avoid reporting the status of a child with p_debugger != NULL to the "real" parent for example), but I still think it would probably be easier to reason about. Not sure if that is what you are asking about here?

In case the parent is the debugger meaning the debugger explicitly used pdfork() to create the child, it seems like the debugger should be using pdwait() instead of wait(). OTOH, if the child process then uses pdfork() the debugger is only going to get pids back for that event unless we add additional changes (e.g. a new trace op) to get back a procdesc for the new process. I don't really think we want to target the case of a debugger using pdfork() to create debugees currently as that needs a lot more work.

markj added inline comments.Jun 17 2019, 6:11 PM

Right, the problem is to recognize the case where the parent is also the debugger.

I think you're right that we can not worry about the case where the debugger used pdfork() to create the child. The main motivation here is to fix the case where some existing debugger (truss, gdb, etc.) is pointed at a pdfork() child. If we make that simplifying assumption, then it should be sufficient to check:

if (p->p_procdesc == NULL || ((p->p_treeflag & P_TREE_ORPHANED) != 0 && p->p_pptr == td->td_proc)

and maybe assert p->p_oppid != td->td_proc->p_pid in the latter case.

oshogbo updated this revision to Diff 60500.Aug 6 2019, 10:20 AM

If the process is created by pdfork the final status have to be
collected through process descriptor.

markj added inline comments.Aug 6 2019, 2:45 PM

Won't this test fail?

markj accepted this revision.Aug 7 2019, 8:31 PM

I think the wait or pdfork man pages should clarify this behaviour, but the code changes look ok to me.


Sorry, I was going off of the description, not what the test actually does. The comment should be updated: processes created this way are *not* visible to waitid(P_ALL).

This revision is now accepted and ready to land.Aug 7 2019, 8:31 PM
kib added a comment.Aug 9 2019, 3:04 PM
  1. Can you rewrite the review summary according to the new code ? I want to see the commit message for the change.
  2. Looking at proc_to_reap() and you initial note about kern_wait6(), I wonder should we check for p_procdesc for all idtype cases in proc_to_reap() ? In other words, isn't there somewhat larger problem with kern_wait6() reaping procdesc zombies at all ?