Page MenuHomeFreeBSD

if_vlan: Use the exclusive lock everywhere
ClosedPublic

Authored by markj on Mon, May 4, 12:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, May 17, 10:53 AM
Unknown Object (File)
Sat, May 16, 5:31 PM
Unknown Object (File)
Sat, May 16, 4:36 AM
Unknown Object (File)
Fri, May 15, 2:30 AM
Unknown Object (File)
Thu, May 14, 6:13 PM
Unknown Object (File)
Thu, May 14, 2:31 PM
Unknown Object (File)
Thu, May 14, 1:45 PM
Unknown Object (File)
Tue, May 12, 7:27 PM

Details

Summary

Running sys/net tests in parallel reveals some panics which look like
the one below:

shared lock of (sx) vlan_sx @ /home/markj/sb/main/src/sys/net/if_vlan.c:2395
while exclusively locked from /home/markj/sb/main/src/sys/net/if_vlan.c:1850
panic: excl->share
cpuid = 9
time = 1776467219
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d84e0780
vpanic() at vpanic+0x136/frame 0xfffffe00d84e08b0
panic() at panic+0x43/frame 0xfffffe00d84e0910
witness_checkorder() at witness_checkorder+0xdb1/frame 0xfffffe00d84e0ad0
_sx_slock_int() at _sx_slock_int+0x64/frame 0xfffffe00d84e0b10
vlan_ioctl() at vlan_ioctl+0x25c/frame 0xfffffe00d84e0b70
if_setflag() at if_setflag+0xdc/frame 0xfffffe00d84e0be0
ifpromisc() at ifpromisc+0x27/frame 0xfffffe00d84e0c00
vlan_setflags() at vlan_setflags+0x64/frame 0xfffffe00d84e0c30
vlan_unconfig_locked() at vlan_unconfig_locked+0xb7/frame 0xfffffe00d84e0c70
vlan_clone_destroy() at vlan_clone_destroy+0x5d/frame 0xfffffe00d84e0cb0
if_clone_destroyif_flags() at if_clone_destroyif_flags+0x8c/frame 0xfffffe00d84e0cf0
if_clone_detach() at if_clone_detach+0x106/frame 0xfffffe00d84e0d20
vnet_destroy() at vnet_destroy+0x154/frame 0xfffffe00d84e0d50
prison_deref() at prison_deref+0xaf5/frame 0xfffffe00d84e0dc0
sys_jail_remove() at sys_jail_remove+0x1a7/frame 0xfffffe00d84e0e00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe00d84e0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d84e0f30
--- syscall (508, FreeBSD ELF64, jail_remove), rip = 0x25bd44705ca, rsp = 0x25bcfe72ab8, rbp = 0x25bcfe72b40 ---

All vlan interfaces are locked by a single recursive global lock. There
are cases, like in the panic above where vlans are stacked on top of
each other, where the driver tries to acquire an exclusive lock while
holding a shared lock, and vice versa.

With longer-term goals of:

  • making the networking regression test suites stable when run in parallel, and
  • simplifying network control plane locking, which I find is quite complex and buggy,

let's change if_vlan to use the exclusive lock everywhere.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Mon, May 4, 12:08 AM

Indeed the net stack lacks proper ioctl locks. I found that while diagnosing PR 292993 and some attaching / detaching issues. Also some drivers have comments about that. I think normally users do not generate concurrent ioctls, so the issue is merely noticed. For parallel tests that is easy to spot.

I think this approach is good, as IMO the ioctls are not performance critical path. Ideally a per-interface sx lock should be sufficient ( think about lots of interfaces cross multiple VNETs ) , but I think that is not scope of this change.

This revision is now accepted and ready to land.Mon, May 4, 6:20 AM

Indeed the net stack lacks proper ioctl locks. I found that while diagnosing PR 292993 and some attaching / detaching issues. Also some drivers have comments about that. I think normally users do not generate concurrent ioctls, so the issue is merely noticed. For parallel tests that is easy to spot.

I think this approach is good, as IMO the ioctls are not performance critical path. Ideally a per-interface sx lock should be sufficient ( think about lots of interfaces cross multiple VNETs ) , but I think that is not scope of this change.

Even a global lock (or a per-VNET lock) might be ok, so long as it is not touched in the packet processing path. But the integration of net_epoch and preexisting locks is still messy and unclear, and in order to fix that we must first simplify existing locking as much as possible, IMHO.

pouria added a subscriber: pouria.

LGTM, I hope @ivy also take a look at this.

As Zhenlei said, we don't need much parallelism in the configuration path.

But the integration of net_epoch and preexisting locks is still messy and unclear, and in order to fix that we must first simplify existing locking as much as possible, IMHO.

I had started a Christmas project where vlan would calculate a perfect hash on every reconfiguration and reallocate the entire hash and then free the old one with epoch. Didn't finish, maybe will get back to that.

This revision was automatically updated to reflect the committed changes.