Page MenuHomeFreeBSD

ifnet: drain+halt ioctl/vmove operations on detach
Needs ReviewPublic

Authored by kevans on Dec 5 2019, 2:56 PM.

Details

Reviewers
rgrimes
ae
kp
glebius
jhb
avg
bz
Group Reviewers
network
Summary

This pushes us towards a mode of operation similar to other teardown routines, where we're guaranteed to be free of clashing operations after if_detach/ether_ifdetach returns. Various operations (e.g. vnet moves, ioctl handling) will busy/unbusy the ifnet to block detaching until they've completed.

Some further vnet ordering issues are touched on here to make sure that we handle concurrent re-home/removal properly.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 28331

Event Timeline

kevans created this revision.Dec 5 2019, 2:56 PM
kevans added reviewers: ae, kp.Dec 5 2019, 6:19 PM
kevans added a subscriber: kp.

Specifically tagging @ae and @kp as well, as they've written or maintained some of the drivers that could benefit from a callback to do this deferred cleanup of softc.

kevans added a subscriber: glebius.

Explicitly adding @glebius as well, as he's well-versed in ifnet lifetime issues.

rgrimes accepted this revision.Dec 5 2019, 7:34 PM

I might of used tunif_freed inplace of tunifdtor, but thats just me wanting grep to find these :-)

This revision is now accepted and ready to land.Dec 5 2019, 7:34 PM
glebius added a reviewer: jhb.Dec 5 2019, 11:22 PM

We once discussed this issue with John back when I was working on projects/ifnet branch in subversion. Of course now with epoch in place and if_detach() waiting on all callbacks the situation is very different to what we had in 2015.

I'm trying to find discussion in archives, meanwhile adding John.

I don't like this change. The problem comes from the fact that we are allowing to run administrative detach in parallel with any other ifioctl(). I believe there are much more issues here except this one. This is just a can of worms as long as we keep these two things to run in parallel. Back when resolving all possible races in carp(4) I just have created a single sx lock that serialized any possible CARP reconfigurations. And this isn't a performance critical path - nobody suffers from this serialization. I'm inclined that we need same approach at interface wide level. Not necessarily a global sx, a per-interface sx should work. And at the point where ifioctl() today does ifunit_ref() it should lock the sx. Same thing should if_clone_destroy() do instead of taking reference.

I don't like this change. The problem comes from the fact that we are allowing to run administrative detach in parallel with any other ifioctl(). I believe there are much more issues here except this one. This is just a can of worms as long as we keep these two things to run in parallel. Back when resolving all possible races in carp(4) I just have created a single sx lock that serialized any possible CARP reconfigurations. And this isn't a performance critical path - nobody suffers from this serialization. I'm inclined that we need same approach at interface wide level. Not necessarily a global sx, a per-interface sx should work. And at the point where ifioctl() today does ifunit_ref() it should lock the sx. Same thing should if_clone_destroy() do instead of taking reference.

Sure- I had considered this approach as well, but didn't immediately see the downside in letting the in-process ioctl continue and just freeing softc at the end. I can see where that'd be less than ideal, though.

One thing I am unsure about is how to handle the possibility that we've already entered the SIOCIFDESTROY path and taken the lock when another thread or two pops in and tries to execute another ioctl but waits on the sx in ifhwioctl. Presumably the destroy path would have to refrain from destroying the sx because the other patch could have taken it in the small window between releasing it and destroying it, right?

ae added a reviewer: avg.Dec 6 2019, 6:50 AM
jhb added a comment.Dec 6 2019, 5:43 PM

My view is that if_detach() (and ether_ifdetach which calls it) needs to be a blocking wait that ensures that any in-process ioctls have finished and exited the ifnet's routines as well as forbidding new ones before it returns. This matches the pattern of other teardown routines like bus_teardown_intr, callout_drain, taskqueue_drain, etc. Many drivers in the tree assume these semantics (except a few that still call ifdetach "late").

kevans updated this revision to Diff 65512.Dec 11 2019, 10:31 PM
kevans retitled this revision from ifnet: add callback for post-free cleanup to ifnet: drain+halt ioctl/vmove operations on detach.
kevans edited the summary of this revision. (Show Details)
This revision now requires review to proceed.Dec 11 2019, 10:31 PM
kevans added a reviewer: bz.Dec 11 2019, 10:31 PM

Adding bz since I started touching some vimage parts.

jhb added inline comments.Dec 11 2019, 11:01 PM
sys/net/if.c
1355

You don't have to use the refcount API if you are doing all operations under a lock. OTOH, I think a better fix is to avoid the lock for the release unless you need it.

I would perhaps call this a "busy" count to match terminology we use for other parts of the kernel and add little wrappers if_busy / if_unbusy that just call refcount_acquire and refcount_release. You can drop the CV and instead use refcount_sleep() in if_detach() to wait for the reference count to drop to zero.

You would still need the lock to gate checking the IFF_DETACHING flag before doing if_busy, but that's about it. The vnet moving thing though makes this more complicated. I'd prefer that we solve that by using a suitable busy/unbusy of the ifnet instead of adding the explicit check for destroying the vnet. Fore example, maybe vnet destroy busies all the ifp's and then unbusies them once they are moved if that works?

kp added a comment.Dec 12 2019, 12:19 AM

This seems to break the build:

In file included from /usr/src/contrib/ntp/libntp/systime.c:10:
In file included from /usr/src/contrib/ntp/include/ntp.h:13:
In file included from /usr/src/contrib/ntp/include/ntp_stdlib.h:14:
In file included from /usr/src/contrib/ntp/include/ntp_net.h:19:
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/net/if_var.h:431:12: error: field has incomplete type 'struct cv'
        struct cv if_mgmt_cv;
                  ^
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/net/if_var.h:431:9: note: forward declaration of 'struct cv'
        struct cv if_mgmt_cv;
               ^
--- all_subdir_usr.bin ---
yacc: 1 shift/reduce conflict, 16 reduce/reduce conflicts.

I've not yet looked any deeper.

In D22691#498397, @kp wrote:

This seems to break the build:

In file included from /usr/src/contrib/ntp/libntp/systime.c:10:
In file included from /usr/src/contrib/ntp/include/ntp.h:13:
In file included from /usr/src/contrib/ntp/include/ntp_stdlib.h:14:
In file included from /usr/src/contrib/ntp/include/ntp_net.h:19:
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/net/if_var.h:431:12: error: field has incomplete type 'struct cv'
        struct cv if_mgmt_cv;
                  ^
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/net/if_var.h:431:9: note: forward declaration of 'struct cv'
        struct cv if_mgmt_cv;
               ^
--- all_subdir_usr.bin ---
yacc: 1 shift/reduce conflict, 16 reduce/reduce conflicts.

I've not yet looked any deeper.

Whoops- failed to check/build if any userland stuff included if_var.h... only buildkernel+test. Will remedy in the next iteration.

kp added a comment.Dec 12 2019, 1:48 AM

Still haven't looked at it, but it seems to fix PR 238870. No more panics (in a couple of runs of) pf:names / pf:synproxy tests, so as far as I'm concerned this cannot get merged quickly enough.

kevans added inline comments.Dec 12 2019, 4:28 AM
sys/net/if.c
1355

The semantics of if_vmove need to change a little bit, too, if we go that route... and that's probably actually a good thing. Right now, if_vmove isn't mutually exclusive with ioctl handlers as well, but we should probably be treating that more like a temporary detach; drain ioctl handlers similarly and prevent if_vmove() on an interface that's already trying to move.

Then, vnet_destroy busies all of its interfaces that are getting re-homed and flags them as IFF_DETACHED to prevent any vnet motion before we get a chance to move them back to their home later on in the shutdown procedure. If done properly, I think this also closes the remainder of the vnet race that I describe later in if_detach_internal where shutdown can commence in the middle of if_detach.

But +1 for dropping the cv, removing the additional dependency of the header (pointed out by kp) and reducing the weight (because I'd still like to try and produce something MFC'able to stable/12).

kevans added inline comments.Dec 12 2019, 6:17 AM
sys/net/if.c
1355

Perhaps if_vmove draining ioctl handlers is unnecessary, but it should at least block any other if_vmove and further ioctls until the info has stabilized in the new vnet.

kevans updated this revision to Diff 65557.Dec 12 2019, 6:20 PM
kevans edited the summary of this revision. (Show Details)

This is now stacked on D22780 and D22783, which separate out some of the other vnet ordering issues.

kp added a comment.Dec 13 2019, 4:58 PM

With all three of these patches I now see this panic running the pf tests:

panic: hashdestroy: hashtbl 0xfffff80045423000 not empty (malloc type ifaddr)
cpuid = 5
time = 1576251489
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0061f20830
vpanic() at vpanic+0x17e/frame 0xfffffe0061f20890
panic() at panic+0x43/frame 0xfffffe0061f208f0
hashdestroy() at hashdestroy+0x54/frame 0xfffffe0061f20900
vnet_destroy() at vnet_destroy+0x123/frame 0xfffffe0061f20930
prison_deref() at prison_deref+0x29d/frame 0xfffffe0061f20970
sys_jail_remove() at sys_jail_remove+0x290/frame 0xfffffe0061f209c0
amd64_syscall() at amd64_syscall+0x2d6/frame 0xfffffe0061f20af0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0061f20af0
--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x80031bfaa, rsp = 0x7fffffffe848, rbp = 0x7fffffffe8d0 ---
KDB: enter: panic
[ thread pid 1407 tid 100482 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1082616(%rip)
db>
In D22691#499064, @kp wrote:

With all three of these patches I now see this panic running the pf tests:

panic: hashdestroy: hashtbl 0xfffff80045423000 not empty (malloc type ifaddr)
cpuid = 5
time = 1576251489
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0061f20830
vpanic() at vpanic+0x17e/frame 0xfffffe0061f20890
panic() at panic+0x43/frame 0xfffffe0061f208f0
hashdestroy() at hashdestroy+0x54/frame 0xfffffe0061f20900
vnet_destroy() at vnet_destroy+0x123/frame 0xfffffe0061f20930
prison_deref() at prison_deref+0x29d/frame 0xfffffe0061f20970
sys_jail_remove() at sys_jail_remove+0x290/frame 0xfffffe0061f209c0
amd64_syscall() at amd64_syscall+0x2d6/frame 0xfffffe0061f20af0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0061f20af0
--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x80031bfaa, rsp = 0x7fffffffe848, rbp = 0x7fffffffe8d0 ---
KDB: enter: panic
[ thread pid 1407 tid 100482 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1082616(%rip)
db>

I suspect this would be fine with if ((ifp->if_flags & IFF_VMOVING) != 0) patched out in the SYSUNINIT releasing interfaces back to their home vnet. I'm slightly uncomfortable with how some of this works -- this teardown happens at SI_SUB_PROTO_DOMAIN, but still we have some interfaces that were created in this vnet that will be torn down later at SI_SUB_PROTO_IF. Intuitively, I would think we'd want to make sure all interfaces are moved out or destroyed (I "broke" the former, but I think the latter's still an issue) before we get to that point -- in which case the answer would instead be to bump SYSUNINITs that are detaching the other cloned interfaces to SI_SUB_PROTO_END or earlier in the process, maybe convert the vnet ifcnt to refcount(9) API and stick a refcount_wait in this bad boy before we get too SI_SUB_DOMAIN to be sure that all interfaces have vacated.

I haven't looked at how those interfaces operate, though. In hindsight it's probably fine and we should force the move+detach done by patching out the VMOVING check.

kevans updated this revision to Diff 66011.Dec 27 2019, 1:30 AM

Force removal of ifnet from a vnet, even if another thread's come along to destroy it -- we need to ensure it's gone before we proceed with sysinits or else we're just racing if_detach_internal vs. vnet sysuninits.