CALLOUT_DEBUG_DRAIN and its use for mlx5en/mlx5ib
Needs ReviewPublic
Actions

Authored by kib on Aug 3 2020, 11:37 AM.

Details

Reviewers

• hselasky
menyy_mellanox.com

Summary

The patch adds a facility to detect stray callouts after driver unload. Instead of crashes on callwheel list iteration, hopefully the checker would detect a callout for which the driver was already unloaded and provide the diagnostic, which should be enough to find the offender.

How to use:

Compile kernel with the option CALLOUT_DEBUG_DRAIN. This changes KBI, all modules must be built with the option.
Ensure that there is only one device for attach. Disable all other devices, including sibling PCI functions, with devctl disable <pci name>.
Load modules, kldload mlx5en. Then print the modules load address with kldstat, save this info.
Put some load.
Unload the driver with kldunload mlx5en. If a callout was left armed, kernel panics and prints addresses of callout callback, and for Linuxkpi callouts, additional address of the actual timer of work function that would be called.
Using kldstat base addresses of modules, and addresses of callout or timer functions, the offender should be identified.

The patch works by copying the callout and timer function addresses from the callwheel callout entry to the _previous_ entry in callwheel list. In other word, until callout exec operates on valid callouts, there is always the data about next, possible invalid, callout.

Example of the test panic dump:

root@:/ # sysctl debug.callout_check_drain_test=1
debug.callout_check_drain_test: 0CALLOUT CHECK DRAIN 0xffffffff803dc2e0 0xfeedf00d 0xffffffff803dc2e0
func 0xffffffff803dc2e0 at callout_check_drain_test_func+0
arg1 0xffffffff803dc2e0 at callout_check_drain_test_func+0
#0 0xffffffff803da240 at callout_reset_sbt_on_arg1+0x480
#1 0xffffffff803dc2a2 at callout_check_drain_test+0xb2
#2 0xffffffff803cddfc at sysctl_root_handler_locked+0x9c
#3 0xffffffff803cd261 at sysctl_root+0x1c1
#4 0xffffffff803cd8db at userland_sysctl+0x13b
#5 0xffffffff803cd75f at sys___sysctl+0x5f
#6 0xffffffff806751ab at amd64_syscall+0x18b
#7 0xffffffff8064d6ae at fast_syscall_common+0xf8
panic: callout check drain

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 32744

Event Timeline

kib created this revision.Aug 3 2020, 11:37 AM

Herald added subscribers: emaste, ae, imp. · View Herald TranscriptAug 3 2020, 11:37 AM

kib requested review of this revision.Aug 3 2020, 11:37 AM

Harbormaster completed remote builds in B32711: Diff 75292.Aug 3 2020, 11:38 AM

When you unload a kernel module, you know its text segment virtual address, where all code resides.
Why can't you iterate all pending callbacks and check if the function pointer is within the unloaded range?

In D25924#574449, @hselasky wrote:

When you unload a kernel module, you know its text segment virtual address, where all code resides.
Why can't you iterate all pending callbacks and check if the function pointer is within the unloaded range?

An intermediate version of the patch was very similar to what you suggested.
Current patch does the check much earlier than modules unload. So a possible situation where the non-drained callout went unnoticed because it is executed with garbage callout data should be less likely.

Significant efforts in the patch are spent to handle 'indirect' callbacks, which we typically have with Linuxkpi. The callout function points into linuxkpi.ko, but real function that should be called from callout handler is in some higher-level module. The patch should detect this, look for 'arg1' stuff.

Store stack of the callout_reset. Print it on violation.
Try to resolve func and arg1 with symbols.

Harbormaster completed remote builds in B32732: Diff 75328.Aug 3 2020, 9:03 PM

Significant efforts in the patch are spent to handle 'indirect' callbacks, which we typically have with Linuxkpi. The callout function points into linuxkpi.ko, but real function that should be called from callout handler is in some higher-level module. The patch should detect this, look for 'arg1' stuff.

I see. I need some more time to look at this patch.

sys/dev/mlx5/mlx5_core/mlx5_main.c
1272	What if mlx4 is loaded?
sys/kern/kern_thread.c
80	This ifndef looks like a temporary hack. Will you fix before commit?
sys/kern/kern_timeout.c
255	CALLOUT_DEBUG_TRACE ?
299	put ()'s around (cp == NULL) to make code more clear?
341	The right way to declare an empty function macro is by using do { } while (0)
344	Can we call this callout_trace_xxx . callout_remove_le get's me confused, and uppercase the macros ?
383	What about !SMP
623	Direct callouts should be supported too? Why is there no callout_insert_tqe() call here ?
915	Naming: It is more a sub_function than arg1
1208	Call it trace arg ? arg1 is too generic
1224	arg1 = ftn ????
1875	Should be on the stack or malloc'ed. There can be multiple threads invoking the sysctl.
1899	missing: callout_drain(&ctd_c);

kib marked 13 inline comments as done.Aug 4 2020, 10:03 AM

kib added inline comments.

sys/dev/mlx5/mlx5_core/mlx5_main.c
1272	Then things will break. Similarly, if there is more than one mlx5@pci devices, things will break. Read the description of the patch for instructions how to use it, you should see.
sys/kern/kern_thread.c
80	No it is not a hack, the patch changes KBI of the kernel. On the other hand, I am not sure that this patch is worth committing. It is for debugging.
sys/kern/kern_timeout.c
255	The option is called DEBUG_DRAIN, because it is for debugging missed drains, not tracing callouts.
299	Style(9) suggests to use () only when required by operators priority. We typically not enclose triple-op condition with ().
344	It should be callout_debug_drain_xxx then, but it is too long name. Also, they are functions when really used. So I made them functions for unused #ifdef branch as well.
383	For !SMP callouts cannot migrate so cc_migration_stack is unused.
623	Because direct callouts are removed from the wheel and not re-inserted into the expired queue. From the callouts subsystem PoV, after this moment, direct callout 'disappears'.
915	sub_function is too long, 'cc_migration_sub_function'
1208	It is not trace.
1224	Are you suggesting to always fill arg1 ? This will break the purpose of the patch. I detect 'interesting' leaked undrained callouts by looking for non-NULL arg1. This might be generalized, but for purpose of finding leaked callouts from mlx5/ofed, it is the ideal fit.
1875	It is debugging sysctl, which should result in panic when worked. I do not see much point from preventing user from misusing it.
1899	Why ? It is the point that the drain is missed, so I can check that the patch worked and leaked callout is detected.