Page MenuHomeFreeBSD

if_tuntap: defer transient destroy_dev() to a taskqueue
Needs ReviewPublic

Authored by kevans on Wed, Oct 29, 3:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 3, 5:18 AM
Unknown Object (File)
Mon, Nov 3, 5:18 AM
Unknown Object (File)
Mon, Nov 3, 5:18 AM
Unknown Object (File)
Mon, Nov 3, 5:18 AM
Unknown Object (File)
Mon, Nov 3, 1:25 AM
Unknown Object (File)
Sun, Nov 2, 8:08 PM
Unknown Object (File)
Sun, Nov 2, 8:08 PM
Unknown Object (File)
Sun, Nov 2, 8:08 PM

Details

Reviewers
kib
markj
Summary

We're in the dtor, so we can't destroy it now without deadlocking after
recent changes to make destroy_dev() provide a barrier. However, we
know there isn't any other dtor to run, so we can go ahead and clean up
our state and just prevent a use-after-free if someone races to open
the device while we're trying to destroy it.

While we're here, allow a destroy operation to proceed if we caught a
signal in cv_wait_sig() but tun_busy dropped to 0 while we were waiting
to acquire the lock.

This was more of an inherent design flaw, then a bug in the below-refed
commit.

PR: 290575
Fixes: 4dbe6628179d ("devfs: make destroy_dev() a release [...]")

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68401
Build 65284: arc lint + arc unit

Event Timeline

sys/net/if_tuntap.c
680
1167

What prevents a scenario where tunopen() is entered and the check for si_drv1 == dev to fail because a parallel tun_destroy() not yet set si_drv1 to dev. Meantime tun_destroy() progressed to the point that si_drv1 is no longer a pointer to tuntap_softc, and we read it there?

kevans marked an inline comment as done.

Dance around tunmtx to protect against concurrent tun_destroy() / tunopen()

destroy_dev_drain() after we've issued all of our tun_destroy(), just in case
the module unload destroyed a transient tunnel. I note that module unload does
not actually work at the moment, though, since the stub tun_mod/tap_mod use
a NULL modevent handler, which means modevent_nop / EBUSY a MOD_UNLOAD. This is
an easy fix, but out of scope for this review.

sys/net/if_tuntap.c
1179

I do not understand this even more. If everything you do under the tunmtx is checking the flag, then you can check it without lock as well.

Could it become simpler if we made two observations:

  • if you need to call destroy_dev() from cdevpriv destructor, then the destructor itself was not called in the context of destroy_dev(), but instead due to the close of a file
  • after destroy_dev_sched(), no new threads could enter any csw methods. Existing threads continue to run, of course.
sys/net/if_tuntap.c
1179

Oh, right- I missed that your point #1 is implemented by way of dev_refthread() checking CDP_SCHED_DTR... I'll chew on this a little bit more, because I don't immediately see how I drain the theoretical remaining thread in tunopen(), unless I switch it to something more like an epoch with a deferred-free.

Simplify a bit using the net epoch

Enter the net epoch in tunopen() and use that fact to defer reclamation of just
the bits needed by a concurrent open. NET_EPOCH_DRAIN_CALLBACKS() in our
SYSUNINIT now that the dtor might have scheduled a callback to free a softc.

sys/net/if_tuntap.c
1167

So I still do not understand why what I asked above is not possible, with the new code. You ought to explain it, perhaps as a comment to make it easily visible.

sys/net/if_tuntap.c
1167

Sure, let me explain here to be sure it's actually correct (and it may be that I'm missing a barrier around si_drv1 load/store to actually guarantee the ordering that I want): tun_destroy() sets si_drv1 = dev and subsequently schedules the softc to be freed in coordination with the net epoch, so that a parallel thread in tunopen can observe one of two states:

  • si_drv1 == dev: the softc will be freed imminently
  • si_drv1 != dev: we're in the middle of an epoch section, so even if we just missed the store to si_drv1 the free will still be deferred until sometime after we exited. The tunnel is TUN_DYING, so we'll EBUSY our way out of the epoch.

Still, add the comment please.

This revision is now accepted and ready to land.Mon, Nov 3, 3:38 AM
sys/net/if_tuntap.c
1167

Don't you at least need read-once semantics for dev->si_drv1? That is, how do you know that the value tp is a valid pointer to the softc? It looks like this should be:

void *p;

p = atomic_load_ptr(&dev->si_drv1);
if (p == dev)
    return error;
...
tp = p;

Do a read-once load of si_drv1, just in case. Amend the comment to try and
describe our solution more completely

This revision now requires review to proceed.Tue, Nov 4, 6:14 AM