Page MenuHomeFreeBSD

Corestop.
Needs ReviewPublic

Authored by trasz on May 11 2020, 9:08 PM.

Details

Reviewers
kib
Group Reviewers
manpages
Summary

Add corestop - a mechanism to try to stop a process instead of dumping
core, and send out a devd(8) notification. This way devd can run a script
to gather various data related to the process, or trigger a snapshot of the
VM it's running in, so that the user can later restore the snapshot, log in,
and attach to the offending proccess with a debugger.

This is a kind of WIP - it works, but might not be the final shape;
comments are welcome.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 31549
Build 29151: arc lint + arc unit

Event Timeline

trasz requested review of this revision.May 11 2020, 9:08 PM

This only handles trapsignal(), so won't catch abort() from userland? (E.g. if a C++ exception isn't caught and unwinds to the bottom of the stack) Also, is the idea that then later do a SIGCONT to get a core dump by retrying the instruction (won't work for the abort() case)? Or to depend on the notification to send a SIGKILL?

For FreeBSD opting into this sort of thing might be something only certain processes want to do via a new procctl(2) flag or the like.

In D24813#546230, @jhb wrote:

This only handles trapsignal(), so won't catch abort() from userland? (E.g. if a C++ exception isn't caught and unwinds to the bottom of the stack) Also, is the idea that then later do a SIGCONT to get a core dump by retrying the instruction (won't work for the abort() case)? Or to depend on the notification to send a SIGKILL?

Sure it does not handle async signals.

For FreeBSD opting into this sort of thing might be something only certain processes want to do via a new procctl(2) flag or the like.

If process opts-in into this mode, it can instead fork/fork/exec gdb which attaches to the originator.

Overall this looks like mach debug servers (AFAIR) with very poor implementation. I am not sure that this can be useful except for small hacks.

sys/kern/kern_sig.c
2042

You never clear P2_CORESTOPPED.

Thanks for the review. I'm in the middle of rewriting the kernel part, so that instead of replacing the signal it handles things a bit lower down the stack; I'll try to avoid the problems you've pointed out.

Regarding the opt-in - I think that would make the mechanism somewhat harder to use; after all, the point of debugging is to force the process to cooperate in the first place.

For now, I'm updating the review to also include the userspace part.

Add the userspace part; kernel part will be updated in subsequent review.

Slightly less hacky, although still a bit buggy (SIGABRT doesn't
quite work) kernel part.

Am I going in the right direction?

sys/kern/kern_sig.c
3079

I think you should pass td as an argument.

3082

Why is this assert valid ?

Also, you should document, by asserts, all the locks that are assumed in this functions. The proc lock at least.

Which leads to the question, how wise is to log and send devctl notification under the process lock (and sigacts mutex as well).

3118

You basically loose the original signal there, I do not see how could the code work as intended, reliably.

trasz marked an inline comment as done.Jun 7 2020, 4:43 PM
trasz added inline comments.
sys/kern/kern_sig.c
3082

Is the assert ok now?

Regarding logs and devctl - I'm not sure. WITNESS doesn't yell about it, so I'm assuming it's ok? The sigacts mutex would be easy to get rid of; proc - not sure.

3118

I think I got it right this time. Or rather - as right as reasonably possible.

sys/kern/kern_sig.c
2992

If I would play with something like this, I would just changed this line to

if ((prop & SIGPROP_STOP) != 0 || (corestop && (prop & SIGPROP_CORE) != 0))

Then if SIGPROP_CORE is set, I would send notifications before stopping.

3813

You still never clear the flag.

3838

Again this erases any info about the signal.

3840

This is weird. More weird is that you stopping the process without notifying potential debugger.

3845

This block was copied/pasted. Can you refactor instead ?

3852

This is a recursion, perhaps infinite.

gbe added a subscriber: gbe.

OK from manpages.