Page MenuHomeFreeBSD

Copy kqueues into the child on fork
Needs ReviewPublic

Authored by kib on Aug 20 2025, 10:48 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 13, 2:23 PM
Unknown Object (File)
Mon, Oct 13, 2:22 PM
Unknown Object (File)
Sun, Oct 12, 8:15 PM
Unknown Object (File)
Sun, Oct 12, 8:10 PM
Unknown Object (File)
Sun, Oct 12, 7:21 PM
Unknown Object (File)
Sun, Oct 12, 7:21 PM
Unknown Object (File)
Sun, Oct 12, 7:06 PM
Unknown Object (File)
Thu, Oct 9, 8:56 PM

Details

Reviewers
kevans
markj
Summary

Add KQUEUE_CPONFORK flag for kqueue1(2), that marks the created kqueue as requiring copy on fork into the child' file descriptor table.

This means that a new kqueue is created at the same fd index as the parent' kqueue, and all registered events are copied into the new kqueue (when possible). The current active events list is also duplicated.

Another sore place is the removal on the marker from the knlist, which is very high-cost for SLIST (the second XXXKIB place).

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Aug 20 2025, 10:48 AM

Missed check for KQ_CPONFORK

Add f_copy; properly copy timers.
Many fixes for generic knote copy.

markj added inline comments.
sys/kern/kern_event.c
3037

Shouldn't we ignore knotes corresponding to fds with O_CLOFORK set?

sys/kern/kern_event.c
3037

I thought about translating KQUEUE_CPONFORK into ~O_CLOFORK and vice versa, but there are some details. I might get to this soon.

sys/kern/kern_event.c
3037

Oh, you mean knotes. Right, I think this is good idea.

sys/kern/kern_event.c
3037

I mean, suppose the parent process has a EVFILT_READ knote for fd 3, and fd 3 is flagged O_CLOFORK, so it will be closed in the child process. Then, this knote must not be inherited, I think.

kib marked an inline comment as done.

Ignore fd-based knotes for which fd was not copied.
Tweak locking to avoid witness warning about dup filedesc lock.

kib marked an inline comment as done.Aug 21 2025, 7:20 PM

Handle copy for vfs/vnode filters

Skip copying of influx notes (that are infux not due to scan).
Add copy method for netmap notes.

Call kqueue_expand() as needed. Not doing so has profound consequences, including not calling knote_drop() on close of forked kqueue.

Allocate forked kqueue fd with the ref count on 1 instead of 2. Using '2' hide the issue with missed knote_drop().

Harmonize internal interfaces for copying specific knote.

kib removed subscribers: kevans, markj.

From my current PoV, this should be more or less finished implementation. In particular, I reviewed all filters for the need of special copy handling.

In D52045#1193293, @kib wrote:

From my current PoV, this should be more or less finished implementation. In particular, I reviewed all filters for the need of special copy handling.

Ping?

In D52045#1199301, @kib wrote:
In D52045#1193293, @kib wrote:

From my current PoV, this should be more or less finished implementation. In particular, I reviewed all filters for the need of special copy handling.

Ping?

Sorry, I will try to get to this patch later this week.

kib added a subscriber: pho.

Rebase. Document KQUEUE_CPONFORK

sys/compat/linux/linux_event.c
107

Shouldn't the third arg be true in this call? In Linux epoll fds are copied across the forks.

kib marked an inline comment as done.Wed, Oct 8, 5:30 AM
kib added inline comments.
sys/compat/linux/linux_event.c
107

Might be. For now I intend to keep existing behavior.

kib marked an inline comment as done.

Bug fixes after Peter' testing.

Keep source kq pointer in the copy kq until copying phase two is done.

I intend to commit this early next week.

lib/libsys/kqueue.2
105
109
111

Maybe, "The kqueue descriptor will be inherited by the child, i.e., the numeric value of the descriptor will remain the same."?

115
sys/kern/kern_event.c
3013

Strange indentation here, is it from debugging?

3124

This looks like some debugging.

kib marked 6 inline comments as done.Mon, Oct 13, 4:06 PM
kib added inline comments.
sys/kern/kern_event.c
3013

Indeed this is debugging, I forgot to drop the patch when generating the diff.

kib marked an inline comment as done.

Man page editing

lib/libsys/kqueue.2
109

Maybe mention that this is contrary to how all other descriptors are treated upon fork(), i.e., we make shallow copies instead.

sys/kern/kern_descrip.c
2539–2554

This comment should be removed or enhanced to explain what is actually happening.

2582

If you handle the !fork_pass case here instead, you can reduce indentation below.

2584

Maybe make a local variable for ofde->fde_file->f_ops so that the lines aren't so long.

2596

Why do we handle DFLAG_FORK in a separate pass? Why not call fo_fork in the first loop?

sys/kern/kern_event.c
966

Inheriting the cpuid like this doesn't really make sense. Ideally the callout will fire on the same CPU where it was scheduled from. Maybe filt_timerstart() should also reset kc->cpuid = PCPU_GET(cpuid).

3089

It is not too unusual for a queue to have several thousand knotes. I suspect this will be slow indeed.

kib marked 7 inline comments as done.Tue, Oct 14, 6:01 PM
kib added inline comments.
sys/kern/kern_descrip.c
2582

Each FILEDESC_FOREACH_FDE() needs its own block scope unfortunately.

2596

This is the commit message for DFLAG_FORK change:

struct file: add DFLAG_FORK, indicate copying on fork

instead of referencing or ignoring.

The pass to handle copying is performed after all passable files are
installed into the new file descriptor's table, since one of the future
consumers of this interface (kqueue) needs all passable file descriptors
already valid. For the same reason the copying is done in two passes
by itself, first allowing copyable files to be copied, and then giving
the subsystems a second chance to really fill the copied file backing
structure with the content.

[MFC note: use fo_spares for fo_fork]

(and note that kqueue might need the fd number to reference other kqueue)

sys/kern/kern_event.c
3089

It is worse at more significant place, knlist_remove_kq().

kib marked 3 inline comments as done.

Changes due to the latest comments.

Wrap the first stage of the fork pass into the original pass over source fd table.

Some editing to the comment.

The code changes seem ok to me. This should really have some regression tests. Even some simple ones which exercise the f_copy routine for various filters, and verify that knotes for O_CLOFORK fds are handled properly, would be much better than nothing.

lib/libsys/kqueue.2
120

You might mention somewhere that knotes corresponding to O_CLOFORK fds will not be inherited.

sys/kern/kern_descrip.c
2588
sys/kern/kern_event.c
3089

knlist_remove_kq() is just removing a single knote though.