Page MenuHomeFreeBSD

witness: Provide facility to print detailed lock tree
Needs ReviewPublic

Authored by jtl on Mon, Jan 19, 7:42 PM.
Tags
None
Referenced Files
F142542437: D54785.diff
Tue, Jan 20, 7:43 PM
F142506960: D54785.id170071.diff
Tue, Jan 20, 11:53 AM
F142501289: D54785.diff
Tue, Jan 20, 10:37 AM
F142500481: D54785.id.diff
Tue, Jan 20, 10:27 AM
Unknown Object (File)
Tue, Jan 20, 2:30 AM
Unknown Object (File)
Tue, Jan 20, 1:41 AM
Unknown Object (File)
Tue, Jan 20, 1:28 AM

Details

Reviewers
jhb
markj
kib
Summary
When witness(4) detects lock order reversals (LORs), it prints
information about the stack trace which caused the LOR. If available,
it can also print information about the first stack trace which
established the other lock ordering. However, it only does this for
"simple" LORs where the two locks in question were directly locked
in the opposite order. When the lock order was established through
a more complex pattern of intermediate locks, WITNESS only prints
the stack trace where it detected the LOR.

This commit provides a new sysctl (debug.witness.badstacks_verbose)
which provides more verbose information about the lock chain(s) which
established the lock ordering. It does not modify any existing
functionality.

Reviewed by:    <If someone else reviewed your modification.>
Sponsored by:   <If the change was sponsored by an organization.>
Differential Revision:  <https://reviews.freebsd.org/D###>
Test Plan

I wrote some kernel code to create a few LORs. With the changes in this diff, I see the following output.

jtl@freebsd:~ $ sysctl debug.witness.badstacks
debug.witness.badstacks: Number of known direct relationships is 731

Lock order reversal between "so_rcv"(sleep mutex) and "lle"(rw)!
Lock order "lle"(rw) -> "so_rcv"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff0000005501fc at sysctl_debug_witness_generate_lor+0x1a8
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c


Lock order reversal between "jtl1"(sleep mutex) and "jtl2"(sleep mutex)!
Lock order "jtl1"(sleep mutex) -> "jtl2"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff000000550270 at sysctl_debug_witness_generate_lor+0x21c
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c

Lock order "jtl2"(sleep mutex) -> "jtl1"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff0000005502c0 at sysctl_debug_witness_generate_lor+0x26c
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c


jtl@freebsd:~ $ sysctl debug.witness.badstacks_verbose
debug.witness.badstacks_verbose: Number of known direct relationships is 731

Lock order reversal between "so_rcv"(sleep mutex) and "lle"(rw)!
All lock orders from "so_rcv"(sleep mutex) -> "lle"(rw):
"so_rcv" -> "tcphash" -> "in6_ifaddr_lock" -> "lle"
"so_rcv" -> "udphash" -> "in6_ifaddr_lock" -> "lle"
"so_rcv" -> "udphash" -> "lle"

Lock order "so_rcv"(sleep mutex) -> "tcphash"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff0000006b5314 at tcp6_usr_listen+0x188
#3 0xffff00000058b564 at solisten+0x48
#4 0xffff000000593d2c at kern_listen+0x74
#5 0xffff0000008bb31c at do_el0_sync+0x68c
#6 0xffff00000088e1ac at handle_el0_sync+0x4c

Lock order "tcphash"(sleep mutex) -> "in6_ifaddr_lock"(rm) first seen at:
(No stack trace--hardcoded relationship?)

Lock order "so_rcv"(sleep mutex) -> "udphash"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff000000550118 at sysctl_debug_witness_generate_lor+0xc4
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c

Lock order "udphash"(sleep mutex) -> "in6_ifaddr_lock"(rm) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004ccbb4 at _rm_wlock_debug+0x78
#2 0xffff0000005501b4 at sysctl_debug_witness_generate_lor+0x160
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c

Lock order "in6_ifaddr_lock"(rm) -> "lle"(rw) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004cf4c8 at __rw_rlock_int+0x8c
#2 0xffff0000006c9fe4 at in6_lltable_lookup+0xe0
#3 0xffff0000006e9980 at nd6_lookup+0x74
#4 0xffff0000006f44ac at find_pfxlist_reachable_router+0x88
#5 0xffff0000006f1f08 at pfxlist_onlink_check+0x4b4
#6 0xffff0000006f17f4 at nd6_ra_input+0x116c
#7 0xffff0000006c1754 at icmp6_input+0x900
#8 0xffff0000006dc020 at ip6_input+0xf60
#9 0xffff00000063a730 at netisr_dispatch_src+0xd8
#10 0xffff00000061ba80 at ether_demux+0x174
#11 0xffff00000061cfd4 at ether_nh_input+0x374
#12 0xffff00000063a730 at netisr_dispatch_src+0xd8
#13 0xffff00000061bec8 at ether_input+0xdc
#14 0xffff0000006967d8 at tcp_lro_flush_all+0xec
#15 0xffff000000635d2c at iflib_rxeof+0xbc4
#16 0xffff00000062ffbc at _task_fn_rx+0xa4
#17 0xffff0000005221b8 at gtaskqueue_run_locked+0x164

Lock order "udphash"(sleep mutex) -> "lle"(rw) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004cf4c8 at __rw_rlock_int+0x8c
#2 0xffff000000550164 at sysctl_debug_witness_generate_lor+0x110
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c


All lock orders from "lle"(rw) -> "so_rcv"(sleep mutex):
** "lle" -> "so_rcv"

Lock order "lle"(rw) -> "so_rcv"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff0000005501fc at sysctl_debug_witness_generate_lor+0x1a8
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c



Lock order reversal between "jtl1"(sleep mutex) and "jtl2"(sleep mutex)!
All lock orders from "jtl1"(sleep mutex) -> "jtl2"(sleep mutex):
"jtl1" -> "jtl2"

Lock order "jtl1"(sleep mutex) -> "jtl2"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff000000550270 at sysctl_debug_witness_generate_lor+0x21c
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c


All lock orders from "jtl2"(sleep mutex) -> "jtl1"(sleep mutex):
** "jtl2" -> "jtl1"

Lock order "jtl2"(sleep mutex) -> "jtl1"(sleep mutex) first seen at:
#0 0xffff00000054b774 at witness_checkorder+0x32c
#1 0xffff0000004abc8c at __mtx_lock_flags+0x9c
#2 0xffff0000005502c0 at sysctl_debug_witness_generate_lor+0x26c
#3 0xffff0000004e5ca0 at sysctl_root_handler_locked+0xe4
#4 0xffff0000004e4fec at sysctl_root+0x1e4
#5 0xffff0000004e5684 at userland_sysctl+0x140
#6 0xffff0000004e5508 at sys___sysctl+0x84
#7 0xffff0000008bb31c at do_el0_sync+0x68c
#8 0xffff00000088e1ac at handle_el0_sync+0x4c

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jtl requested review of this revision.Mon, Jan 19, 7:42 PM
jtl added a subscriber: gallatin.

This should be extremely useful, I already anticipate it. But, why not make this information available by default? I am not even sure that the knob is needed to hide it.

In D54785#1252125, @kib wrote:

This should be extremely useful, I already anticipate it.

Great! Glad to hear it. I knew I would find it useful. I wasn't sure if others would.

But, why not make this information available by default? I am not even sure that the knob is needed to hide it.

If you are talking about changing the behavior of the debug.witness.badstacks sysctl, the main reason I didn't change the default behavior is that I don't know how people use that sysctl; therefore, I was hesitant to change it. A second reason is that I wasn't sure other people would appreciate the extra information as much as I would. :-) But, I am happy to simply change the behavior of the sysctl if the consensus is that we should do that.

If you are talking about changing the information printed to the console when a LOR is detected at run-time, there are a few reasons I was concerned about changing this: a) This new code adds a decent bit of extra processing which will make a WITNESS violation even more expensive, b) This code allows (bounded) recursion, which I am hesitant to add in unconstrained contexts, c) Printing extra information could mean a lot of the information scrolls off the console for people using graphical consoles, and d) I wasn't sure people would appreciate the extra information. :-)

FWIW, the same function (sbuf_print_witness_badstacks()) is called from a DDB command. Because of the recursion in the current version of the patch, I am hesitant to call it from anything but the sysctl context (where the current stack should be predictable and short). I can unroll the recursion into a loop, but I think that would hurt readability.

If we want to add this same information to the run-time checks printed on the console, I would likely switch those to defer most output to a taskqueue. But, modifying the code to print the extra information should be a minimal change.

I do not think that trying to schedule a task is very robust with a LOR detected. We are already in the situation risking the deadlock.
Also I do not think that the doubts of the stack overflow is very grounded, since WITNESS typically has spinlock recording disabled II believe there are not solved issues with console spinlock which in reality cause hard hang). So there is definitely the space for interrupt frame, and mode, when WITNESS running.

If you refuse to make this a default, I am asking to add a simple knob which would make this additional info to be printed in all situations.

In D54785#1252666, @kib wrote:

I do not think that trying to schedule a task is very robust with a LOR detected. We are already in the situation risking the deadlock.

Fair point!

If you refuse to make this a default, I am asking to add a simple knob which would make this additional info to be printed in all situations.

I don't "refuse"; I was just giving my thinking.

I'm currently working on changing the recursion into a loop; once I do that, I'm happy to add it everywhere.

In D54785#1252692, @jtl wrote:
In D54785#1252666, @kib wrote:

I do not think that trying to schedule a task is very robust with a LOR detected. We are already in the situation risking the deadlock.

Fair point!

If you refuse to make this a default, I am asking to add a simple knob which would make this additional info to be printed in all situations.

I don't "refuse"; I was just giving my thinking.

Ok, refuse was probably too strong word, sorry about this.

I'm currently working on changing the recursion into a loop; once I do that, I'm happy to add it everywhere.

Thank you.