Another fix for PR275286
Closed, Public

Authored by kib on Nov 26 2023, 9:33 PM.

Details

Reviewers

markj

Commits

rG171f0832c5b1: EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of…
rGed410b78edc5: EVFILT_SIGNAL: do not use target process pointer on detach
rG877ef685322f: Revert "kqueue: on process exit, force-clear its registered signal events"

Summary

EVFILT_TIMER: initialize stop timer list in type-stable proc init, instead of fork

Since a kqueue timer may outlive the process that created it (the same
rfork(2) scenario as in PR 275286), make the p_kqtim_stop tailq, which
filt_timerdetach() accesses, type-stable.
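A minimal userspace sketch of why moving the initialization into the type-stable init helps. Everything here is hypothetical (struct fake_proc, zone_alloc(), proc_init_once(), proc_ctor()); it only loosely mirrors an allocator zone whose one-time init hook is separate from the per-allocation setup that fork performs. Freed objects go back on a private free list and are never returned to the system, so a stale pointer still reaches a well-formed stop list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list, standing in for a TAILQ. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head) { head->next = head->prev = head; }

static int list_empty(const struct list_node *head) { return head->next == head; }

static void list_insert_tail(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_remove(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical process object with only the fields this sketch needs. */
struct fake_proc {
	int pid;                     /* set by the per-allocation ctor */
	struct list_node stop_list;  /* set up once, like p_kqtim_stop after the fix */
	struct fake_proc *free_next; /* linkage while on the zone free list */
	int init_count;              /* how many times the one-time hook ran */
};

/* Type-stable "zone": freed objects are cached, never handed back to free(). */
static struct fake_proc *zone_free_list;

/* Runs once per object, when it is first created (cf. a zone init hook). */
static void proc_init_once(struct fake_proc *p)
{
	list_init(&p->stop_list);
	p->init_count++;
}

/* Runs on every allocation (cf. per-fork setup); does not touch stop_list. */
static void proc_ctor(struct fake_proc *p, int pid)
{
	p->pid = pid;
}

static struct fake_proc *zone_alloc(int pid)
{
	struct fake_proc *p = zone_free_list;

	if (p != NULL) {
		zone_free_list = p->free_next;
		assert(p->init_count == 1);  /* recycled: one-time init already done */
	} else {
		p = calloc(1, sizeof(*p));
		if (p == NULL)
			return NULL;
		proc_init_once(p);
	}
	proc_ctor(p, pid);
	return p;
}

/* "Process exit": the memory stays owned by the zone. */
static void zone_free(struct fake_proc *p)
{
	p->free_next = zone_free_list;
	zone_free_list = p;
}
```

A late list_remove() on a timer node after zone_free() is what a detach through a stale proc pointer amounts to in this model; it is safe only because the memory never leaves the zone and the per-allocation constructor does not re-initialize the list head.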

EVFILT_SIGNAL: do not use target process pointer on detach

It is enough to know the knlist to remove the knote from it, and the
list is automatically destroyed on the last removal.
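The detach rule described above can be modeled in userspace as follows. This is a hypothetical sketch, not the kern_event.c code: klist_model and knote_model are invented stand-ins, and the assumption (read from "autodestroyed on last removal") is that once the owning process has exited, whichever detach empties the list also frees it.

```c
#include <assert.h>
#include <stdlib.h>

struct knote_model;

/* Hypothetical stand-in for a knlist that can outlive its owner. */
struct klist_model {
	struct knote_model *head;
	int owner_gone;  /* set once the owning process has exited */
};

/* Hypothetical stand-in for a knote: it remembers only its list. */
struct knote_model {
	struct knote_model *next;
	struct klist_model *list;
};

static struct klist_model *klist_create(void)
{
	return calloc(1, sizeof(struct klist_model));
}

static void knote_attach(struct klist_model *kl, struct knote_model *kn)
{
	kn->list = kl;
	kn->next = kl->head;
	kl->head = kn;
}

/* Owner exit: free now if empty, otherwise defer to the last detach. Returns 1 if freed. */
static int klist_owner_exit(struct klist_model *kl)
{
	kl->owner_gone = 1;
	if (kl->head == NULL) {
		free(kl);
		return 1;
	}
	return 0;
}

/*
 * Detach using kn->list alone; no process pointer is consulted.
 * Returns 1 if this removal destroyed the list.
 */
static int knote_detach(struct knote_model *kn)
{
	struct klist_model *kl = kn->list;
	struct knote_model **pp;

	for (pp = &kl->head; *pp != kn; pp = &(*pp)->next)
		assert(*pp != NULL);  /* kn must be on its own list */
	*pp = kn->next;
	kn->list = NULL;
	if (kl->owner_gone && kl->head == NULL) {
		free(kl);  /* last removal after owner exit destroys the list */
		return 1;
	}
	return 0;
}
```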

PR:     275286

Revert "kqueue: on process exit, force-clear its registered signal events"

This reverts commit 393ac29f0b8be068c8e46f76c2eeee07d20ea4df.  A
different fix follows, which preserves the semantics required by the
sys.kqueue.proc3_test.proc3 test.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision. Nov 26 2023, 9:33 PM

Herald added a subscriber: imp. Nov 26 2023, 9:33 PM

kib requested review of this revision. Nov 26 2023, 9:33 PM

kib edited the summary of this revision. Nov 26 2023, 9:58 PM

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

sys/sys/proc.h:754 (On Diff #130598)

In D42777#975712, @markj wrote:

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

The processes do not just share a kqueue; they share the file descriptor table, so this is natural. IMO the ugliness is in EVFILT_PROC; the issue with the p_klist lifetime is more fundamental.

kib marked an inline comment as done. Nov 26 2023, 10:34 PM

In D42777#975714, @kib wrote:

The processes do not just share a kqueue; they share the file descriptor table, so this is natural.

It is natural for the child of fork() to inherit copies of the parent's file descriptors. But for kqueue we already make an exception, so the situation is already unnatural. It seems wrong to add additional complexity to handle a corner case that has been broken for a long time. Plus, with the patch we now allocate a large table (PID_MAX * 8B on most systems) and require extra locks for common operations like fork() and kill(). We can't even combine this with the main PID hash table, I think, because the kevent table requires a different lock order with respect to the proc lock (because of tdsendsignal()).
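For scale, the "PID_MAX * 8B" estimate works out as below, assuming FreeBSD's PID_MAX of 99999 and 8-byte pointers (LP64): 99999 × 8 = 799,992 bytes, about 781 KiB. The macro and helper names are hypothetical; the point is only the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* FreeBSD's <sys/proc.h> defines PID_MAX as 99999; restated under a
 * hypothetical name so the sketch builds on any system. */
#define MODEL_PID_MAX 99999UL

/* Bytes needed for a flat table holding one slot per possible PID. */
static size_t pid_table_bytes(size_t nslots, size_t slot_size)
{
	/* Guard against size_t overflow in the multiplication. */
	assert(slot_size == 0 || nslots <= (size_t)-1 / slot_size);
	return nslots * slot_size;
}
```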

In D42777#975716, @markj wrote:

It is natural for the child of fork() to inherit copies of the parent's file descriptors. But for kqueue we already make an exception, so the situation is already unnatural. It seems wrong to add additional complexity to handle a corner case that has been broken for a long time. Plus, with the patch we now allocate a large table (PID_MAX * 8B on most systems) and require extra locks for common operations like fork() and kill(). We can't even combine this with the main PID hash table, I think, because the kevent table requires a different lock order with respect to the proc lock (because of tdsendsignal()).

No, this is not about copies. The rforked child, in this case, shares the file descriptor table, which is fine. Kqueue must work per file descriptor table, so the code must handle this case. And I do think that the only architecturally clean way to provide the documented semantics for the signal and process filters is to attach knotes to pids and not to struct proc.

But for the case at hand, I think I found a much simpler way to fix the bug, with a small trick.
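One possible reading of "attach knotes to pids and not to struct proc", sketched as a hypothetical userspace model: knote state lives in a table slot indexed by PID, so a detach after the process exits touches only the slot. All names and the tiny PID space are illustrative assumptions, and the sketch deliberately ignores locking and PID reuse.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NPIDS 8  /* tiny, hypothetical PID space */

struct model_proc {
	int pid;
};

/* Per-PID slot owned by the table, so it outlives any struct model_proc. */
struct pid_slot {
	struct model_proc *proc;  /* NULL once the process has exited */
	int nknotes;              /* knotes registered against this PID */
};

static struct pid_slot pid_table[MODEL_NPIDS];

static void proc_register(struct model_proc *p)
{
	assert(p->pid >= 0 && p->pid < MODEL_NPIDS);
	pid_table[p->pid].proc = p;
}

static void proc_exit(struct model_proc *p)
{
	pid_table[p->pid].proc = NULL;  /* the slot and its knotes stay */
}

static struct pid_slot *knote_attach_pid(int pid)
{
	assert(pid >= 0 && pid < MODEL_NPIDS);
	pid_table[pid].nknotes++;
	return &pid_table[pid];
}

/* Detach touches only the slot, never the possibly exited process. */
static void knote_detach_pid(struct pid_slot *slot)
{
	assert(slot->nknotes > 0);
	slot->nknotes--;
}
```

The trade-off markj raises above shows up directly here: the table must be sized for every possible PID, and in the kernel it would need its own locking relative to the proc lock.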

Another way to fix.

I think this addresses the PR, but what about other kevent filters which maintain a proc pointer? In particular, I think filt_timerdetach() is susceptible to a similar problem.

This revision is now accepted and ready to land. Nov 28 2023, 3:36 PM

In D42777#976040, @markj wrote:

I think this addresses the PR, but what about other kevent filters which maintain a proc pointer? In particular, I think filt_timerdetach() is susceptible to a similar problem.

Yes, I think filt_timerdetach() is problematic, but not because of p_klist. And again, attaching the timer to the pid would be the right thing to do.

Fix for timers

This revision now requires review to proceed. Nov 28 2023, 3:55 PM

markj added inline comments. Nov 28 2023, 4:05 PM

sys/kern/kern_proc.c:278 (On Diff #130647)
Hmm, I think this isn't sufficient. Suppose a timer callout is armed, and the target process exits. If a child process holds the kqueue reference, the callout will not be drained. Then filt_timerexpire_l() loads `kc->p`, but this will be a pointer to a freed proc.

kib added inline comments. Nov 28 2023, 4:08 PM

sys/kern/kern_proc.c:278 (On Diff #130647)
The pointer refers to a freed proc, but it is still valid: struct proc is type-stable, so the process mutex and (now) p_kqtim_stop remain usable.

markj accepted this revision. Nov 28 2023, 4:17 PM

markj added inline comments.

sys/kern/kern_proc.c:278 (On Diff #130647)
Hmm, I see. So, in essence, we use a random process as the holder of stopped timers. This is still incorrect, since it means that a program can defeat the safeguard which ensures that SIGKILL/SIGSTOP can be delivered to a process which arms timers with a very short period. Also, if this random process is stopped or killed, the timer will stop firing. It is a rather strange scenario.

This revision is now accepted and ready to land. Nov 28 2023, 4:17 PM

kib added inline comments. Nov 28 2023, 5:09 PM

sys/kern/kern_proc.c:278 (On Diff #130647)
I agree, but again, the fix is to attach everything to the pid, not to the process.

markj added inline comments. Nov 28 2023, 5:12 PM

sys/kern/kern_proc.c:278 (On Diff #130647)
Perhaps it's possible to use the existing PID hash table and its associated locks. This didn't seem easy for EVFILT_SIGNAL (there is a lock order reversal, LOR, with the proc lock), but perhaps it works for timers.