Running sys/net tests in parallel reveals some panics which look like
the one below:
shared lock of (sx) vlan_sx @ /home/markj/sb/main/src/sys/net/if_vlan.c:2395 while exclusively locked from /home/markj/sb/main/src/sys/net/if_vlan.c:1850 panic: excl->share cpuid = 9 time = 1776467219 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d84e0780 vpanic() at vpanic+0x136/frame 0xfffffe00d84e08b0 panic() at panic+0x43/frame 0xfffffe00d84e0910 witness_checkorder() at witness_checkorder+0xdb1/frame 0xfffffe00d84e0ad0 _sx_slock_int() at _sx_slock_int+0x64/frame 0xfffffe00d84e0b10 vlan_ioctl() at vlan_ioctl+0x25c/frame 0xfffffe00d84e0b70 if_setflag() at if_setflag+0xdc/frame 0xfffffe00d84e0be0 ifpromisc() at ifpromisc+0x27/frame 0xfffffe00d84e0c00 vlan_setflags() at vlan_setflags+0x64/frame 0xfffffe00d84e0c30 vlan_unconfig_locked() at vlan_unconfig_locked+0xb7/frame 0xfffffe00d84e0c70 vlan_clone_destroy() at vlan_clone_destroy+0x5d/frame 0xfffffe00d84e0cb0 if_clone_destroyif_flags() at if_clone_destroyif_flags+0x8c/frame 0xfffffe00d84e0cf0 if_clone_detach() at if_clone_detach+0x106/frame 0xfffffe00d84e0d20 vnet_destroy() at vnet_destroy+0x154/frame 0xfffffe00d84e0d50 prison_deref() at prison_deref+0xaf5/frame 0xfffffe00d84e0dc0 sys_jail_remove() at sys_jail_remove+0x1a7/frame 0xfffffe00d84e0e00 amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe00d84e0f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d84e0f30 --- syscall (508, FreeBSD ELF64, jail_remove), rip = 0x25bd44705ca, rsp = 0x25bcfe72ab8, rbp = 0x25bcfe72b40 ---
All vlan interfaces are locked by a single recursive global lock. There
are cases, like in the panic above where vlans are stacked on top of
each other, where the driver tries to acquire an exclusive lock while
holding a shared lock, and vice versa.
With longer-term goals of:
- making the networking regression test suites stable when run in parallel, and
- simplifying network control plane locking, which I find is quite complex and buggy,
let's change if_vlan to use the exclusive lock everywhere.