Page MenuHomeFreeBSD

Fix locking issues with aio_fsync().
ClosedPublic

Authored by jhb on Jul 27 2016, 8:55 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 24, 7:53 PM
Unknown Object (File)
Oct 20 2024, 9:05 AM
Unknown Object (File)
Oct 1 2024, 10:02 AM
Unknown Object (File)
Sep 28 2024, 12:33 AM
Unknown Object (File)
Sep 23 2024, 5:34 AM
Unknown Object (File)
Sep 21 2024, 11:43 AM
Unknown Object (File)
Sep 20 2024, 5:56 AM
Unknown Object (File)
Aug 29 2024, 5:20 PM

Details

Summary

Fix locking issues with aio_fsync().

  • Use correct lock in aio_cancel_sync when dequeueing job.
  • Add _locked variants of aio_set/clear_cancel_function and use those to avoid lock recursion when adding and removing fsync jobs to the per-process sync queue.
  • While here, add a basic test for aio_fsync().

PR: 211390

Test Plan
  • Run aio_fsync test on an unpatched kernel and get panic reported in PR.
  • Patched kernel is fine.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 4636
Build 4690: arc lint + arc unit

Event Timeline

jhb retitled this revision from to Fix locking issues with aio_fsync()..
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added a reviewer: kib.

I'm using 12-CURRENT r303286, and setting the sysctl vfs.aio.enable_unsafe=1 to enable the tests.

Running the test on an unpatched kernel successfully triggers the same panic from the PR.

Running the test on a patched kernel produces the following panic:

root@mako:/usr/tests # kyua test -k Kyuafile sys/aio/aio_test
sys/aio/aio_test:aio_fifo_test -> passed [0.007s]
sys/aio/aio_test:aio_file_test -> passed [0.007s]
sys/aio/aio_test:aio_fsync_test ->
panic: _mtx_lock_sleep: recursed on non-recursive mutex aiomtx @ /usr/src/sys/kern/vfs_aio.c:994

cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x182
kassert_panic() at kassert_panic+0x126
__mtx_lock_sleep() at __mtx_lock_sleep+0x228
__mtx_lock_flags() at __mtx_lock_flags+0x10d
aio_clear_cancel_function() at aio_clear_cancel_function+0x2f
aio_bio_done_notify() at aio_bio_done_notify+0x28a
aio_complete() at aio_complete+0x9a
aio_process_rw() at aio_process_rw+0x259
aio_daemon() at aio_daemon+0x161
fork_exit() at fork_exit+0x84
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rop = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 1171 tid 101186 ]
Stopped at                  kdb_enter+0x3b: movq      $0,kdb_why

(edited to fix formatting)

Are you sure the patch applied cleanly? I reworked the code in aio_bio_done_notify() to reduce indentation and committed that code yesterday. If you don't have that change then the hunk that changed 'aio_clear_cancel_function()' to 'aio_clear_cancel_function_locked()' would have failed resulting in the panic you see.

You're correct, the revision I was using was before your other changes went in. With r303462 and this patch, the tests pass successfully for me :)

kib edited edge metadata.
This revision is now accepted and ready to land.Jul 29 2016, 7:39 AM
This revision was automatically updated to reflect the committed changes.