Page MenuHomeFreeBSD

Save the last callout function executed on each CPU
ClosedPublic

Authored by vangyzen on Jun 28 2019, 4:41 PM.
Tags
None
Referenced Files
F107741609: D20794.id59162.diff
Fri, Jan 17, 10:04 PM
Unknown Object (File)
Thu, Jan 16, 4:13 AM
Unknown Object (File)
Fri, Jan 3, 6:39 AM
Unknown Object (File)
Thu, Dec 26, 5:15 AM
Unknown Object (File)
Sep 24 2024, 2:53 AM
Unknown Object (File)
Sep 16 2024, 10:31 PM
Unknown Object (File)
Sep 15 2024, 9:19 PM
Unknown Object (File)
Sep 12 2024, 3:45 AM
Subscribers

Details

Summary

Save the last callout function pointer (and its argument) executed
on each CPU for inspection by a debugger. Add a ddb show callout_last
command to show these pointers. Add a kernel module that I used
for testing that command.

Relocate ce_migration_cpu to reduce padding and therefore preserve
the size of struct callout_cpu (320 bytes on amd64) despite the
added members.

This should help diagnose reference-after-free bugs where the
callout's mutex has already been freed when softclock_call_cc
tries to unlock it.

You might hope that the pointer would still be available, but it
isn't. The argument to that function is on the stack (because
softclock_call_cc uses it later), and that might be enough in
some cases, but even then, it's very laborious. A pointer to the
callout is saved right before these newly added fields, but that
callout might have been freed. We still have the pointer to its
associated mutex, and the name within might be enough, but it might
also have been freed.

Test Plan

The structure size did not change:

(kgdb) p sizeof(struct callout_cpu)
$1 = 320

The DDB command works:

# make -C /usr/src/tools/test/callout_free load
callout_free_func = 0xffffffff82c110e0
callout_free_arg = 0xffffffff82c11288
squirrel!
panic: mutex callout_free not owned at /usr/src/sys/kern/kern_mutex.c:202
cpuid = 0
time = 1561739378
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004b7870
vpanic() at vpanic+0x19d/frame 0xfffffe00004b78c0
panic() at panic+0x43/frame 0xfffffe00004b7920
unlock_mtx() at unlock_mtx+0x86/frame 0xfffffe00004b7930
softclock_call_cc() at softclock_call_cc+0x175/frame 0xfffffe00004b79e0
softclock() at softclock+0x7c/frame 0xfffffe00004b7a10
ithread_loop() at ithread_loop+0x187/frame 0xfffffe00004b7a70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004b7ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004b7ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100029 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> show callout_last
cpu 0 last callout function: 0xffffffff82c110e0 callout_free_func
cpu 0 last callout argument: 0xffffffff82c11288
cpu 0 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 0 last direct callout argument: 0xfffff80003ea65a0
cpu 1 last callout function: 0 0
cpu 1 last callout argument: 0
cpu 1 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 1 last direct callout argument: 0xfffff80003ea65a0
cpu 2 last callout function: 0 0
cpu 2 last callout argument: 0
cpu 2 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 2 last direct callout argument: 0xfffff80003fbe5a0
cpu 3 last callout function: 0 0
cpu 3 last callout argument: 0
cpu 3 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 3 last direct callout argument: 0xfffff80003fef000
cpu 4 last callout function: 0 0
cpu 4 last callout argument: 0
cpu 4 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 4 last direct callout argument: 0xfffff80003ec55a0
cpu 5 last callout function: 0 0
cpu 5 last callout argument: 0
cpu 5 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 5 last direct callout argument: 0xfffff80003ea25a0
db> show callout_last 0
cpu 0 last callout function: 0xffffffff82c110e0 callout_free_func
cpu 0 last callout argument: 0xffffffff82c11288
cpu 0 last direct callout function: 0xffffffff80c1e7b0 sleepq_timeout
cpu 0 last direct callout argument: 0xfffff80003ea65a0
db>

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 25137
Build 23831: arc lint + arc unit

Event Timeline

The change looks ok to me. Are you proposing committing the test code?

sys/kern/kern_timeout.c
1704

I think printing the last callout for each CPU would be a more useful default behaviour.

The change looks ok to me.

Thanks for the review, Mark.

Are you proposing committing the test code?

Yep.

sys/kern/kern_timeout.c
1704

I agree.

  • Show the last callout on every CPU by default
This revision is now accepted and ready to land.Jul 2 2019, 1:54 PM