Page MenuHomeFreeBSD

CALLOUT_DEBUG_DRAIN and its use for mlx5en/mlx5ib
Needs ReviewPublic

Authored by kib on Aug 3 2020, 11:37 AM.

Details

Summary

The patch adds a facility to detect stray callouts after driver unload. Instead of crashes on callwheel list iteration, hopefully the checker would detect a callout for which the driver was already unloaded and provide the diagnostic, which should be enough to find the offender.

How to use:

  1. Compile kernel with the option CALLOUT_DEBUG_DRAIN. This changes KBI, all modules must be built with the option.
  2. Ensure that there is only one device for attach. Disable all other devices, including sibling PCI functions, with devctl disable <pci name>.
  3. Load modules, kldload mlx5en. Then print the modules load address with kldstat, save this info.
  4. Put some load.
  5. Unload the driver with kldunload mlx5en. If a callout was left armed, kernel panics and prints addresses of callout callback, and for Linuxkpi callouts, additional address of the actual timer of work function that would be called.
  6. Using kldstat base addresses of modules, and addresses of callout or timer functions, the offender should be identified.

The patch works by copying the callout and timer function addresses from the callwheel callout entry to the _previous_ entry in callwheel list. In other word, until callout exec operates on valid callouts, there is always the data about next, possible invalid, callout.

Example of the test panic dump:

root@:/ # sysctl debug.callout_check_drain_test=1
debug.callout_check_drain_test: 0CALLOUT CHECK DRAIN 0xffffffff803dc2e0 0xfeedf00d 0xffffffff803dc2e0
func 0xffffffff803dc2e0 at callout_check_drain_test_func+0
arg1 0xffffffff803dc2e0 at callout_check_drain_test_func+0
#0 0xffffffff803da240 at callout_reset_sbt_on_arg1+0x480
#1 0xffffffff803dc2a2 at callout_check_drain_test+0xb2
#2 0xffffffff803cddfc at sysctl_root_handler_locked+0x9c
#3 0xffffffff803cd261 at sysctl_root+0x1c1
#4 0xffffffff803cd8db at userland_sysctl+0x13b
#5 0xffffffff803cd75f at sys___sysctl+0x5f
#6 0xffffffff806751ab at amd64_syscall+0x18b
#7 0xffffffff8064d6ae at fast_syscall_common+0xf8
panic: callout check drain

Sponsored by: Mellanox Technologies

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 32744

Event Timeline

kib created this revision.Aug 3 2020, 11:37 AM
kib requested review of this revision.Aug 3 2020, 11:37 AM

When you unload a kernel module, you know its text segment virtual address, where all code resides.
Why can't you iterate all pending callbacks and check if the function pointer is within the unloaded range?

kib added a comment.Aug 3 2020, 4:48 PM

When you unload a kernel module, you know its text segment virtual address, where all code resides.
Why can't you iterate all pending callbacks and check if the function pointer is within the unloaded range?

An intermediate version of the patch was very similar to what you suggested.
Current patch does the check much earlier than modules unload. So a possible situation where the non-drained callout went unnoticed because it is executed with garbage callout data should be less likely.

Significant efforts in the patch are spent to handle 'indirect' callbacks, which we typically have with Linuxkpi. The callout function points into linuxkpi.ko, but real function that should be called from callout handler is in some higher-level module. The patch should detect this, look for 'arg1' stuff.

kib updated this revision to Diff 75328.Aug 3 2020, 9:02 PM
kib edited the summary of this revision. (Show Details)

Store stack of the callout_reset. Print it on violation.
Try to resolve func and arg1 with symbols.

Significant efforts in the patch are spent to handle 'indirect' callbacks, which we typically have with Linuxkpi. The callout function points into linuxkpi.ko, but real function that should be called from callout handler is in some higher-level module. The patch should detect this, look for 'arg1' stuff.

I see. I need some more time to look at this patch.

sys/dev/mlx5/mlx5_core/mlx5_main.c
1272

What if mlx4 is loaded?

sys/kern/kern_thread.c
80

This ifndef looks like a temporary hack. Will you fix before commit?

sys/kern/kern_timeout.c
255

CALLOUT_DEBUG_TRACE ?

299

put ()'s around (cp == NULL) to make code more clear?

341

The right way to declare an empty function macro is by using do { } while (0)

344

Can we call this callout_trace_xxx . callout_remove_le get's me confused, and uppercase the macros ?

383

What about !SMP

623

Direct callouts should be supported too?

Why is there no callout_insert_tqe() call here ?

915

Naming: It is more a sub_function than arg1

1208

Call it trace arg ? arg1 is too generic

1224

arg1 = ftn ????

1875

Should be on the stack or malloc'ed. There can be multiple threads invoking the sysctl.

1899

missing: callout_drain(&ctd_c);

kib marked 13 inline comments as done.Aug 4 2020, 10:03 AM
kib added inline comments.
sys/dev/mlx5/mlx5_core/mlx5_main.c
1272

Then things will break. Similarly, if there is more than one mlx5@pci devices, things will break. Read the description of the patch for instructions how to use it, you should see.

sys/kern/kern_thread.c
80

No it is not a hack, the patch changes KBI of the kernel.

On the other hand, I am not sure that this patch is worth committing. It is for debugging.

sys/kern/kern_timeout.c
255

The option is called DEBUG_DRAIN, because it is for debugging missed drains, not tracing callouts.

299

Style(9) suggests to use () only when required by operators priority. We typically not enclose triple-op condition with ().

344

It should be callout_debug_drain_xxx then, but it is too long name. Also, they are functions when really used.

So I made them functions for unused #ifdef branch as well.

383

For !SMP callouts cannot migrate so cc_migration_stack is unused.

623

Because direct callouts are removed from the wheel and not re-inserted into the expired queue. From the callouts subsystem PoV, after this moment, direct callout 'disappears'.

915

sub_function is too long, 'cc_migration_sub_function'

1208

It is not trace.

1224

Are you suggesting to always fill arg1 ? This will break the purpose of the patch.

I detect 'interesting' leaked undrained callouts by looking for non-NULL arg1. This might be generalized, but for purpose of finding leaked callouts from mlx5/ofed, it is the ideal fit.

1875

It is debugging sysctl, which should result in panic when worked. I do not see much point from preventing user from misusing it.

1899

Why ? It is the point that the drain is missed, so I can check that the patch worked and leaked callout is detected.

kib updated this revision to Diff 75346.Aug 4 2020, 10:05 AM
kib marked 12 inline comments as done.

Convert stub macros into empty functions.

If the patch is internal debugging only, not for upstream, it is fine.