Page MenuHomeFreeBSD

A kludge to work around the knlist_remove_inevent() making the process exit non-synced with the NOTE_EXIT delivery.
ClosedPublic

Authored by kib on Jun 16 2016, 7:42 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Apr 20, 7:57 AM
Unknown Object (File)
Mar 16 2024, 11:26 PM
Unknown Object (File)
Mar 14 2024, 3:38 PM
Unknown Object (File)
Mar 14 2024, 3:38 PM
Unknown Object (File)
Mar 14 2024, 3:33 PM
Unknown Object (File)
Mar 14 2024, 3:33 PM
Unknown Object (File)
Mar 14 2024, 3:33 PM
Unknown Object (File)
Mar 14 2024, 3:33 PM
Subscribers

Details

Summary

When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:

  • knote kn_knlist pointer is reset
  • INFLUX knote is removed from the process knlist.

And, there are two consequences:

  • KN_LIST_UNLOCK() on such knote is nop
  • there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers).

Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking.

The patch removes setting kn->kn_list to NULL on knlist_remove(), as well as setting lock functions to NULL on knlist_destroy(). This makes knlist unlocks in the kqueue_scan() functional even for knotes from _inevent(). I call it kludge because this change relies on the type-safety of the struct proc.

Alternate, non-kludge solution is to either remove _inevent() at all, or make some additional hold mechanism on the knlists to block knlist_cleardel/destroy, besides the presence of the INFLUX notes on list. E.g. knlist might also grow some reference count which must be waited for dropping to zero in knlist_destroy(). But this is much more fundamental rework of the code.

Diagnosed by: Eric Badger <eric@badgerio.us>
Tested by: pho, Eric Badger <eric@badgerio.us>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision from to A kludge to work around the knlist_remove_inevent() making the process exit non-synced with the NOTE_EXIT delivery..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added a reviewer: jhb.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added subscribers: pho, badger.

Forgot to note, style and minor cosmetics bits will be split for commit, of course.

Hmmm, knlist_remove_inevent() always passes '1' in 'knlislocked' so knlist_remove() should never use the lock function pointers or argument. (The lock is held by the calling KNOTE() invocation.) In the stack trace where you see the panic, are you trying to add a filter for a process that is already a zombie? Ah, I guess in kqueue_register() it is the KNL_LIST_UNLOCK() after 'done_ev_add' and before 'done'? We hold PROC_LOCK throughout the part of exit1() that calls KNOTE_LOCKED / knlist_clear / knlist_destroy (which means filt_procattach() should complete the "immediate" KNOTE_ACTIVATE fine).

Hmm, is the issue a race between the KNOTE_LOCKED() in exit1() with kqueue_register() (so we don't catch it in the "immediate" case but the process exits just after)?

Oh bah. Wouldn't it be nice if filt_procattach() didn't do PROC_UNLOCK but returned with it locked? That is, suppose f_attach()'s semantics required it to return with KN_LIST_LOCK held. I think that would fix this, but it would mean fixing the various f_attach routines. That is, we do PROC_UNLOCK() in filt_procattach() only to turn around and do KN_LIST_LOCK(kn) as the next statement in kqueue_register(). Dropping the lock there opens the race to allow exit1() to do KNOTE_LOCKED, knlist_*, etc.

The race is between the exiting process (who runs knote(..., NOTE_EXIT), then filt_proc(), then knlist_remove_inevent()) and another thread in kqueue_scan(), processing possibly a NOTE_EXEC for the same process (I've only been able to reproduce this with short lived processes). I could try to track it down more precisely, though even KTR messages can close the timing window, depending on where I put them.

To make my last comment slightly more specific, I think what happens is:

  1. The exiting process gets the PROC_LOCK first, and runs knote().
  2. The thread in kqueue_scan() blocks on KN_LIST_LOCK.
  3. The exiting process releases the PROC_LOCK, having finished its work.
  4. The thread in kqueue_scan() acquires the PROC_LOCK, but kn->kn_knlist has been cleared by this point, so it can't release it.
kib edited edge metadata.

I believe that the real issue there is the tight coupling of the knlist and struct proc lifetime. The _inevent() hack is yet another indication of the problem. The knlist is cleared and destroyed on the process exit, which is too early: we must allow knotes to exist past the exit. Even moving the destroy into process reap would not help, knotes must survive that as well.

I propose to delete the _inevent() stuff, instead allow the 'detached' knlists. A detached knlist is automatically removed by the kqueue subsystem when the last knote is removed.

The patch below implements that, it survived very light testing. In some sense, the patch also repeats what was done in the previous patch, but in less kludge way. The kn_list_lock() function now returns knlist, so zeroing kn->kn_list is possible on detach, and NULLing lock pointers in knlist_destroy() does not affect proc knlists, because knlist_destroy() is called after last unlock.

Of course, even after the patch, code still accesses freed struct proc, because detached knlist is locked by the proc mutex. I think this is an acceptable outcome for now.

I tested the 17676 patch with my original script. Haven't seen any crashes, but I am seeing spurious NOTE_EXEC events. That is, I might encounter >4000 NOTE_EXECs even though only 4000 occurred. The number of extra NOTE_EXECs is the same as the number of events where both NOTE_EXEC and NOTE_CHILD are in fflags. I think the extra knote that is created for NOTE_CHILD as well as the normal knote are both getting NOTE_EXEC added to fflags.

In D6859#144745, @eric_badgerio.us wrote:

I tested the 17676 patch with my original script. Haven't seen any crashes, but I am seeing spurious NOTE_EXEC events. That is, I might encounter >4000 NOTE_EXECs even though only 4000 occurred. The number of extra NOTE_EXECs is the same as the number of events where both NOTE_EXEC and NOTE_CHILD are in fflags. I think the extra knote that is created for NOTE_CHILD as well as the normal knote are both getting NOTE_EXEC added to fflags.

Indeed, thanks for noting this. I included the supposed fix into the update, although the issue is arguably unrelated. Since I was there, I also untwisted the immediate activation of the child note and NOTE_EXIT activation of the existing note for exiting process. I am unable to see how existing code is correct, the KNOTE_ACTIVATE() was not called for child unless NOTE_EXIT was asked for.

kib edited edge metadata.

Clear NOTE_EXIT | NOTE_FORK | NOTE_EXEC for child knote.
Untwist the immediate activation.

kib edited edge metadata.

Remove racy check for KN_INFLUX, which is additionally buggy by not chechking KN_SCAN.

In D6859#144877, @kib wrote:

Remove racy check for KN_INFLUX, which is additionally buggy by not chechking KN_SCAN.

Thank you for addressing this. I've bumped into problems that I believe were caused by this before. So far (several hours of testing), it looks like your change has addressed those issues (specifically, I would miss events that were skipped due to an KN_INFLUX state).

jhb edited edge metadata.

I think this looks ok and is generally cleaner.

sys/sys/event.h
161 ↗(On Diff #17727)

Might be better to say "passed to lock functions" or some such since the arg is passed to all 4 fucntions.

This revision is now accepted and ready to land.Jun 27 2016, 6:57 PM
This revision was automatically updated to reflect the committed changes.