Make influx state counted.
ClosedPublic
Actions

Authored by kib on Dec 24 2016, 1:18 PM.

Details

Reviewers

markj
jmg
badger
jhb

Commits

rS310617: Make knote KN_INFLUX state counted. This is final fix for the issue

Summary

This is a continuation of the r310302.

If KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or knote_fork(), i.e. the knote is scanned, we might erronously clear INFLUX when finishing notification. For normal knote() it was fixed in r310302 simply by remembering the fact that we do not own INFLUX, since there we own knlist lock and scan thread cannot clear INFLUX until we drop knlist lock.

For knote_fork(), the situation is more complicated. We must drop knlist lock AKA the process lock, since we need to register new knotes. As result, the bug is more involved.

I see two ways to fix this. One is to do something with KN_SCAN, e.g. delegating the knote() operation to the scan thread, when the knote() sees KN_INFLUX | KN_SCAN. This is somewhat big change.

Another solution is to change KN_INFLUX into counter. It is quite simple, the drawback there is that the change is not KBI backward-compatible. But I think that this is the natural solution, and there might appear more situations where we want to distinguish exclusive ownership of knote for drop vs. shared ownership for some non-destructive manipulations. Note that added assert about kn_influx == 1 in knote_drop() verifies that influx state is not shared when knote is destroyed.

Patch also makes the knlist_destroy() function just assert the right condition instead of allowing it and printing a message, which is useless for user. This will be committed separately.

Test Plan

Patch was tested by Peter Holm.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib updated this revision to Diff 23236.Dec 24 2016, 1:18 PM

kib retitled this revision from to Make influx state counted..

kib updated this object.

kib edited the test plan for this revision. (Show Details)

kib added reviewers: badger, markj, jhb.

kib set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a reviewer: jmg. · View Herald TranscriptDec 24 2016, 1:18 PM

Herald added a subscriber: imp. · View Herald Transcript

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

sys/kern/kern_event.c
1672 ↗	(On Diff #23236)	This should be "... not supposed to be". Or more succinctly, "... is unexpectedly in flux".
sys/sys/event.h
206 ↗	(On Diff #23236)	This comment seems dubious now that influx is a shared lock.
226 ↗	(On Diff #23236)	Perhaps this should be grouped with kn_hookid to avoid extra padding?

In D8898#184313, @markj wrote:

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

It is fine to sleep waiting for the influx to pass when we traverse kq list, but I do not see what to do when we traverse knlist to report registered events.

knote_fdclose() traverses kq knote list, not knlist. There, the function is fine if the knote which influx state we are waiting for, goes away. What should knote_fork() do if, while we dropped all locks, the knote we iterated to, is removed from the knlist ? At very least, we lost our position in the knlist and might activate some event second time, or generate more auto-created notes for children. Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Similarly, kqueue_scan() traverses kq note list, and can accept drop of knotes meantime.

All this troubles appear because we do not want to the 'non-dropping' influx state enforced on knote during scan to prevent simultaneous event activation (KN_SCAN). As I said, if we delegated the notification to the scan itself, then this shared influx state would be not needed.

sys/sys/event.h
206 ↗	(On Diff #23236)	I tried to specify the rules for managing kn_influx+KN_SCAN there, instead. Might be, some day it would be worth to change kn_influx + KN_SCAN combo into sx.
226 ↗	(On Diff #23236)	Right now kn_status+kn_influx are logically grouped. If you strongly prefer reordering, I will do the move, but size of the knote seems to be not too critical.

Mark' notes.

Herald edited edge metadata. · View Herald TranscriptDec 25 2016, 11:02 AM

In D8898#184339, @kib wrote:

In D8898#184313, @markj wrote:

I'm a bit confused by this. kqueue_scan() checks for in-flux knotes and sleeps ("kqflxwt") until the concurrent knote operation is finished. Why can't knote_fork() do the same as, e.g., knote_fdclose() does?

It is fine to sleep waiting for the influx to pass when we traverse kq list, but I do not see what to do when we traverse knlist to report registered events.

knote_fdclose() traverses kq knote list, not knlist. There, the function is fine if the knote which influx state we are waiting for, goes away. What should knote_fork() do if, while we dropped all locks, the knote we iterated to, is removed from the knlist ? At very least, we lost our position in the knlist and might activate some event second time, or generate more auto-created notes for children.

Ok, I see. I had some vague notion of inserting a KN_MARKER knote before going to sleep, but there is a number of problems with that approach.

Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Ok. I don't see what modifications to fork() would be needed though - all locks are dropped while calling knote_fork(), if I'm not mistaken.

Similarly, kqueue_scan() traverses kq note list, and can accept drop of knotes meantime.

All this troubles appear because we do not want to the 'non-dropping' influx state enforced on knote during scan to prevent simultaneous event activation (KN_SCAN). As I said, if we delegated the notification to the scan itself, then this shared influx state would be not needed.

Got it. This seems like a natural solution to me now. KN_SCAN isn't a very good name, but I don't have any alternative suggestions at the moment.

sys/sys/event.h
206 ↗	(On Diff #23236)	Thanks, this is more clear. Minor nits: "a thread can", "must not be set on a knote which". The first sentence also seems somewhat awkward to me - maybe, "An in-flux knote cannot be dropped from its kq while the kq is unlocked."? It's worth noting that "influx" is a noun with a meaning distinct from "in-flux."
226 ↗	(On Diff #23236)	I don't think it's critical, it just seems a bit wasteful. sizeof(struct knote) is 128 on amd64; we'll get 31 knotes per slab in that case, since the slab header is smaller than 128 bytes. With this change, sizeof(struct knote) becomes 136; we'll get 29 knotes per slab. I think it makes sense to keep kn_status and kn_influx grouped - when I wrote the original comment I was really thinking about moving kn_hookid instead.

This revision is now accepted and ready to land.Dec 25 2016, 10:23 PM

markj added inline comments.Dec 25 2016, 10:28 PM

sys/sys/event.h
226 ↗	(On Diff #23236)	Sorry, I meant "moving kn_hook and kn_hookid". Something like: kn_status kn_influx kn_sdata kn_sfflags kn_hookid kn_hook

In D8898#184392, @markj wrote:

In D8898#184339, @kib wrote:

Also, callers of knote() do not expect the function to sleep. It might be that fork() could be teached that knote_fork() call might sleep, but it would make the operation of process creation depended on the potentially unbound sleep of kqueue consumers.

Ok. I don't see what modifications to fork() would be needed though - all locks are dropped while calling knote_fork(), if I'm not mistaken.

In the normal circumstances, the new child is not made runnable until knotes fire. The kqueue_register() call to register automatic notes for the children in knote_fork() is made in non-sleepable mode (the waitok argument is 0). In other words, the code makes some efforts right now to avoid unbounded sleep before finishing fork and making the child runnable. This is not directly tied to locking constraints.

Might be it is not too problematic to start sleeping in knote_fork(), but it definitely avoided right now. I see some logic in this.