Page MenuHomeFreeBSD

Make influx state counted.
ClosedPublic

Authored by kib on Dec 24 2016, 1:18 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 21, 4:36 PM
Unknown Object (File)
Mon, Dec 2, 3:37 AM
Unknown Object (File)
Nov 5 2024, 1:53 AM
Unknown Object (File)
Oct 22 2024, 6:23 AM
Unknown Object (File)
Oct 8 2024, 6:48 AM
Unknown Object (File)
Oct 3 2024, 9:38 PM
Unknown Object (File)
Oct 1 2024, 2:33 AM
Unknown Object (File)
Sep 29 2024, 10:27 PM
Subscribers

Details

Summary

This is a continuation of the r310302.

If KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or knote_fork(), i.e. the knote is scanned, we might erronously clear INFLUX when finishing notification. For normal knote() it was fixed in r310302 simply by remembering the fact that we do not own INFLUX, since there we own knlist lock and scan thread cannot clear INFLUX until we drop knlist lock.

For knote_fork(), the situation is more complicated. We must drop knlist lock AKA the process lock, since we need to register new knotes. As result, the bug is more involved.

I see two ways to fix this. One is to do something with KN_SCAN, e.g. delegating the knote() operation to the scan thread, when the knote() sees KN_INFLUX | KN_SCAN. This is somewhat big change.

Another solution is to change KN_INFLUX into counter. It is quite simple, the drawback there is that the change is not KBI backward-compatible. But I think that this is the natural solution, and there might appear more situations where we want to distinguish exclusive ownership of knote for drop vs. shared ownership for some non-destructive manipulations. Note that added assert about kn_influx == 1 in knote_drop() verifies that influx state is not shared when knote is destroyed.

Patch also makes the knlist_destroy() function just assert the right condition instead of allowing it and printing a message, which is useless for user. This will be committed separately.

Test Plan

Patch was tested by Peter Holm.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision from to Make influx state counted..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: badger, markj, jhb.
kib set the repository for this revision to rS FreeBSD src repository - subversion.

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

sys/kern/kern_event.c
1672 ↗(On Diff #23236)

This should be "... not supposed to be". Or more succinctly, "... is unexpectedly in flux".

sys/sys/event.h
206 ↗(On Diff #23236)

This comment seems dubious now that influx is a shared lock.

226 ↗(On Diff #23236)

Perhaps this should be grouped with kn_hookid to avoid extra padding?

kib marked 2 inline comments as done.Dec 25 2016, 11:01 AM
In D8898#184313, @markj wrote:

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

It is fine to sleep waiting for the influx to pass when we traverse kq list, but I do not see what to do when we traverse knlist to report registered events.

knote_fdclose() traverses kq knote list, not knlist. There, the function is fine if the knote which influx state we are waiting for, goes away. What should knote_fork() do if, while we dropped all locks, the knote we iterated to, is removed from the knlist ? At very least, we lost our position in the knlist and might activate some event second time, or generate more auto-created notes for children. Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Similarly, kqueue_scan() traverses kq note list, and can accept drop of knotes meantime.

All this troubles appear because we do not want to the 'non-dropping' influx state enforced on knote during scan to prevent simultaneous event activation (KN_SCAN). As I said, if we delegated the notification to the scan itself, then this shared influx state would be not needed.

sys/sys/event.h
206 ↗(On Diff #23236)

I tried to specify the rules for managing kn_influx+KN_SCAN there, instead. Might be, some day it would be worth to change kn_influx + KN_SCAN combo into sx.

226 ↗(On Diff #23236)

Right now kn_status+kn_influx are logically grouped. If you strongly prefer reordering, I will do the move, but size of the knote seems to be not too critical.

kib edited edge metadata.
kib marked an inline comment as done.

Mark' notes.

markj edited edge metadata.
In D8898#184339, @kib wrote:
In D8898#184313, @markj wrote:

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

It is fine to sleep waiting for the influx to pass when we traverse kq list, but I do not see what to do when we traverse knlist to report registered events.

knote_fdclose() traverses kq knote list, not knlist. There, the function is fine if the knote which influx state we are waiting for, goes away. What should knote_fork() do if, while we dropped all locks, the knote we iterated to, is removed from the knlist ? At very least, we lost our position in the knlist and might activate some event second time, or generate more auto-created notes for children.

Ok, I see. I had some vague notion of inserting a KN_MARKER knote before going to sleep, but there is a number of problems with that approach.

Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Ok. I don't see what modifications to fork() would be needed though - all locks are dropped while calling knote_fork(), if I'm not mistaken.

Similarly, kqueue_scan() traverses kq note list, and can accept drop of knotes meantime.

All this troubles appear because we do not want to the 'non-dropping' influx state enforced on knote during scan to prevent simultaneous event activation (KN_SCAN). As I said, if we delegated the notification to the scan itself, then this shared influx state would be not needed.

Got it. This seems like a natural solution to me now. KN_SCAN isn't a very good name, but I don't have any alternative suggestions at the moment.

sys/sys/event.h
206 ↗(On Diff #23236)

Thanks, this is more clear. Minor nits: "a thread can", "must not be set on a knote which". The first sentence also seems somewhat awkward to me - maybe, "An in-flux knote cannot be dropped from its kq while the kq is unlocked."?

It's worth noting that "influx" is a noun with a meaning distinct from "in-flux."

226 ↗(On Diff #23236)

I don't think it's critical, it just seems a bit wasteful. sizeof(struct knote) is 128 on amd64; we'll get 31 knotes per slab in that case, since the slab header is smaller than 128 bytes. With this change, sizeof(struct knote) becomes 136; we'll get 29 knotes per slab.

I think it makes sense to keep kn_status and kn_influx grouped - when I wrote the original comment I was really thinking about moving kn_hookid instead.

This revision is now accepted and ready to land.Dec 25 2016, 10:23 PM
sys/sys/event.h
226 ↗(On Diff #23236)

Sorry, I meant "moving kn_hook and kn_hookid". Something like:

kn_status
kn_influx
kn_sdata
kn_sfflags
kn_hookid
kn_hook

In D8898#184392, @markj wrote:
In D8898#184339, @kib wrote:

Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Ok. I don't see what modifications to fork() would be needed though - all locks are dropped while calling knote_fork(), if I'm not mistaken.

In the normal circumstances, the new child is not made runnable until knotes fire. The kqueue_register() call to register automatic notes for the children in knote_fork() is made in non-sleepable mode (the waitok argument is 0). In other words, the code makes some efforts right now to avoid unbounded sleep before finishing fork and making the child runnable. This is not directly tied to locking constraints.

Might be it is not too problematic to start sleeping in knote_fork(), but it definitely avoided right now. I see some logic in this.

sys/sys/event.h
226 ↗(On Diff #23236)

I moved kn_hook and kn_hookid after kn_kevent. With this change, sizeof(struct knote) did not changed and is still 128.

kib edited edge metadata.

Reword the in-flux comment according to Mark recommendations.

Rearrange struct knote so that its size did not changed.

This revision now requires review to proceed.Dec 26 2016, 9:41 AM
markj edited edge metadata.
markj added inline comments.
sys/sys/event.h
226 ↗(On Diff #23236)

Thanks.

This revision is now accepted and ready to land.Dec 26 2016, 5:58 PM
This revision was automatically updated to reflect the committed changes.