Page MenuHomeFreeBSD

Don't defer wakeup()'s for completed journal workitems.
ClosedPublic

Authored by jhb on Sep 25 2017, 6:04 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 20 2023, 3:47 AM
Unknown Object (File)
Aug 12 2023, 9:14 PM
Unknown Object (File)
Aug 2 2023, 10:01 AM
Unknown Object (File)
Jul 12 2023, 5:06 PM
Unknown Object (File)
Jan 9 2023, 11:33 AM
Unknown Object (File)
Jan 4 2023, 11:38 AM
Subscribers

Details

Summary

Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.

The main reason I can think of for deferring a wakeup() would be to
avoid waking up a waiter while holding a lock that the waiter would
need. However, no locks are dropped in between the wakeup() in
workitem_free() and the end of the loop in complete_jseg() as far as I
can tell.

In general I think it is not safe to do a wakeup() after free() as one
cannot control how other parts of the kernel that might reuse the
address for a different wait channel will handle spurious wakeups.

Test Plan
  • I did a simple boot test, but pho@ will need to run the test that reproduced the crash.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Sep 25 2017, 6:08 PM

I tested this patch on two amd64 hosts for 24 hours with all of the SU+J tests I have. No problems seen.

This revision was automatically updated to reflect the committed changes.