This change tries to fix the most obvious locking problems.
sbp_cam_scan_lun() is never called with the sbp lock held, so the lock
needs to be acquired internally (if it's needed at all).
Without this change a kernel with INVARIANTS panics when a firewire disk
is connected:
panic: mutex sbp not owned at /usr/src/sys/dev/firewire/sbp.c:967 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff80420bbb = db_trace_self_wrapper+0x2b/frame 0xfffffe0504df0930 kdb_backtrace() at 0xffffffff80670359 = kdb_backtrace+0x39/frame 0xfffffe0504df09e0 vpanic() at 0xffffffff8063986c = vpanic+0x14c/frame 0xfffffe0504df0a20 panic() at 0xffffffff806395b3 = panic+0x43/frame 0xfffffe0504df0a80 __mtx_assert() at 0xffffffff8061c40d = __mtx_assert+0xed/frame 0xfffffe0504df0ac0 sbp_cam_scan_lun() at 0xffffffff80474667 = sbp_cam_scan_lun+0x37/frame 0xfffffe0504df0af0 xpt_done_process() at 0xffffffff802aacfa = xpt_done_process+0x2da/frame 0xfffffe0504df0b30 xpt_done_td() at 0xffffffff802ac2e5 = xpt_done_td+0xd5/frame 0xfffffe0504df0b80 fork_exit() at 0xffffffff805ff72f = fork_exit+0xdf/frame 0xfffffe0504df0bf0 fork_trampoline() at 0xffffffff8082483e = fork_trampoline+0xe/frame 0xfffffe0504df0bf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Also, I tried to reduce the scope of the sbp lock to avoid holding it
while doing bus_dma allocations. Without the change there is mirriad
of warnings like:
kernel: bus_dma_tag_create with the following non-sleepable locks held: kernel: exclusive sleep mutex sbp (sbp) r = 0 (0xfffff8000a099748) locked @ /usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802 kernel: stack backtrace: kernel: #0 0xffffffff80aa1250 at witness_debugger+0x70 kernel: #1 0xffffffff80aa2537 at witness_warn+0x3d7 kernel: #2 0xffffffff810064dd at bus_dma_tag_create+0x3d kernel: #3 0xffffffff83703402 at fwdma_malloc+0x82 kernel: #4 0xffffffff836ed759 at sbp_post_explore+0x849 kernel: #5 0xffffffff836f9e76 at fw_bus_probe_thread+0x906 kernel: #6 0xffffffff80a05674 at fork_exit+0x84 kernel: #7 0xffffffff80e988be at fork_trampoline+0xe kernel: bus_dmamem_alloc with the following non-sleepable locks held: kernel: exclusive sleep mutex sbp (sbp) r = 0 (0xfffff8000a099748) locked @ /usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802 kernel: stack backtrace: kernel: #0 0xffffffff80aa1250 at witness_debugger+0x70 kernel: #1 0xffffffff80aa2537 at witness_warn+0x3d7 kernel: #2 0xffffffff810065c3 at bus_dmamem_alloc+0x33 kernel: #3 0xffffffff83703424 at fwdma_malloc+0xa4 kernel: #4 0xffffffff836ed759 at sbp_post_explore+0x849 kernel: #5 0xffffffff836f9e76 at fw_bus_probe_thread+0x906 kernel: #6 0xffffffff80a05674 at fork_exit+0x84 kernel: #7 0xffffffff80e988be at fork_trampoline+0xe kernel: bus_dmamap_create with the following non-sleepable locks held: kernel: exclusive sleep mutex sbp (sbp) r = 0 (0xfffff8000a099748) locked @ /usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802 kernel: stack backtrace: kernel: #0 0xffffffff80aa1250 at witness_debugger+0x70 kernel: #1 0xffffffff80aa2537 at witness_warn+0x3d7 kernel: #2 0xffffffff8100655f at bus_dmamap_create+0x2f kernel: #3 0xffffffff836ed7d2 at sbp_post_explore+0x8c2 kernel: #4 0xffffffff836f9e76 at fw_bus_probe_thread+0x906 kernel: #5 0xffffffff80a05674 at fork_exit+0x84 kernel: #6 0xffffffff80e988be at fork_trampoline+0xe
Also, there were a couple of LORs:
kernel: lock order reversal: kernel: 1st 0xfffff8000a099748 sbp (sbp) @ /usr/src/sys/kern/kern_mutex.c:220 kernel: 2nd 0xfffffe0001b80870 firewire (firewire) @ /usr/src/sys/modules/firewire/firewire/../../../dev/firewire/firewire.c:302 kernel: stack backtrace: kernel: #0 0xffffffff80aa1250 at witness_debugger+0x70 kernel: #1 0xffffffff80aa1144 at witness_checkorder+0xe54 kernel: #2 0xffffffff80a22a04 at __mtx_lock_flags+0xa4 kernel: #3 0xffffffff836f6423 at fw_asyreq+0x2d3 kernel: #4 0xffffffff80a580dc at softclock_call_cc+0x19c kernel: #5 0xffffffff80a584d7 at softclock+0x47 kernel: #6 0xffffffff80a07f26 at intr_event_execute_handlers+0x96 kernel: #7 0xffffffff80a085a6 at ithread_loop+0xa6 kernel: #8 0xffffffff80a05674 at fork_exit+0x84 kernel: #9 0xffffffff80e988be at fork_trampoline+0xe kernel: lock order reversal: kernel: 1st 0xfffff8000a099748 sbp (sbp) @ /usr/src/sys/kern/kern_mutex.c:220 kernel: 2nd 0xfffff8000ff37460 CAM device lock (CAM device lock) @ /usr/src/sys/cam/scsi/scsi_xpt.c:2356 kernel: stack backtrace: kernel: #0 0xffffffff80aa1250 at witness_debugger+0x70 kernel: #1 0xffffffff80aa1144 at witness_checkorder+0xe54 kernel: #2 0xffffffff80a22a04 at __mtx_lock_flags+0xa4 kernel: #3 0xffffffff80319f12 at scsi_scan_lun+0x122 kernel: #4 0xffffffff836eef40 at sbp_cam_scan_target+0x100 kernel: #5 0xffffffff80a580dc at softclock_call_cc+0x19c kernel: #6 0xffffffff80a584d7 at softclock+0x47 kernel: #7 0xffffffff80a07f26 at intr_event_execute_handlers+0x96 kernel: #8 0xffffffff80a085a6 at ithread_loop+0xa6 kernel: #9 0xffffffff80a05674 at fork_exit+0x84 kernel: #10 0xffffffff80e988be at fork_trampoline+0xe
The code badly needs some re-engineering. SBP really should implement
a CAM transport, so that it avoids control flow inversion when re-scanning
the bus. Also, the sbp lock seems to be too coarse.
Also, the change includes some changes to related to locking.
- sbp_cam_scan_lun: restore CAM_DEV_QFREEZE before re-queueing the ccb because xpt_setup_ccb resets ccb_h.flags
- sbp_post_busreset: call xpt_release_simq only if it's actually frozen
- don't place private SIMQ_FREEZED flag (sic, "freezed") into sim->flags, use sbp->flags for that
- some style fixes and control flow enhancements