Don't take the device_mtx lock in daasync() when handling an
AC_UNIT_ATTENTION. Instead, assert the lock is held before modifying the
periph's softc flags.
The device_mtx lock is taken in xptdevicetraverse() before daasync()
is eventually called in xpt_async_bcast().
What I found was that even though the periphs are different
("da0" and "probe0", in my case), they are attached to the same device and
share the same device_mtx.
remnants of PR: 240917, 226510, 226578
I'll add the following for completeness, but not sure if phabricator is the
right place for it..
This patch fixes the panic:
panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /usr/src/sys/cam/scsi/scsi_d cpuid = 0 time = 1607456465 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0001d2b750 vpanic() at vanic+0x181/frame 0xfffffe0001d2b7a0 panic() at panic+0x43/frame 0xfffffe0001d2b800 __mtx_lock_sleep() at __mtx_lock_sleep+0x47a/frame 0xfffffe0001d2b890 __mtx_lock_flags() at __mtx_lock_flags+0xe5/frame 0xfffffe0001d2b8e0 daasync() at daasync+0x4dc/frame 0xfffffe0001d2b930 xpt_async_process_dev() at xpt_async_process_dev+0x152/frame 0xfffffe0001d2b980 xptdevicetraverse() at xptdevicetraverse+0xa5/frame 0xfffffe0001d2b9d0 xpttargettraverse() at xpttargettraverse+0x6b/frame 0xfffffe0001d2ba10 xpt_async_process() at xpt_async_process+0x2ed/frame 0xfffffe0001d2bb20 xpt_done_process() at xpt_done_process+0x3a3/frame 0xfffffe0001d2bb60 xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe0001d2bbb0 fork_exit() at fork_exit+0x80/frame 0xfffffe0001d2bbf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0001d2bbf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic [ thread pid 4 tid 100026 ] Stopped at kdb_enter+0x37: movq $0,0x10aa136(%rip) db>
To reproduce, setup a LUN backed by a zvol, (in ctl.conf):
portal-group pg1 { discovery-auth-group no-authentication listen 0.0.0.0 } target iqn.2012-18.com.example:target155 { auth-group no-authentication portal-group pg1 lun 0 { path /dev/zvol/zroot/testvol size 100M } }
Start ctld:
% service ctld start
Turn on the camsim port twice, the second time will trigger panic:
% ctladm port -p 0 -o on % ctladm port -p 0 -o on
output =>
(probe0:camsim0:0:0:0): INQUIRY data has changed -- panic triggered --
Another way, by modifying LUN size (panic triggered by the rescan):
% ctladm modify -b block -l 0 -s 99M % camcontrol rescan all
output =>
(probe0:camsim0:0:0:0): Capacity data has changed -- panic triggered --