Rework vlan(4) locking.
ClosedPublic

Authored by mjoras on Jun 26 2017, 6:11 PM.

Details

Summary

Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.

The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required.vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.

Test Plan

I have a few simple shell scripts to test the tx/rx panics and to make
sure the config path locking is still sane. Both were run with INVARIANTS
kernels. The tx/rx panics are easily reproducible using two bhyve VMs
with the tap interfaces connected via IF_BRIDGE(4). Additionally Isilon
will be running this through an internal vlan-locking test suite shortly.

Basic create_destroy script that creates a bunch of vlans, propogates
an lladdr change, sets the MTU, then destroys them:

#!/bin/sh

for i in `seq 1 200`
do
    ifconfig "vlan$i" create
    ifconfig "vlan$i" vlan $i vlandev vtnet0
    ifconfig "vlan$i" "203.$i.113.1/24"
done

ifconfig vtnet0 ether "00:a0:98:e7:7e:f6"

for i in `seq 1 200`
do
    ifconfig "vlan$i" mtu 1000 &
    ifconfig "vlan$i" destroy &
done

The following two scripts transmit over vlans and iteratively
destroys it on the receiving side:

#!/bin/sh
set -x
set -e

ifconfig vlan0 create
ifconfig vlan0 vlandev vtnet0 vlan 42
ifconfig vlan0 203.0.113.11/24
dd if=/dev/zero bs=32k count=1000000 | nc -N 203.0.113.10 8000
#!/bin/sh
set -x
set -e
ifconfig vlan0 create
ifconfig vlan0 vlandev vtnet0 vlan 42
ifconfig vlan0 203.0.113.10/24
nc -l -k 8000 > /dev/null &
set s=''
read s

while true
do
    ifconfig vlan0 destroy
    ifconfig vlan0 create
    ifconfig vlan0 vlandev vtnet0 vlan 42
    ifconfig vlan0 203.0.113.10/24
    sleep 1
done

The following two do the opposite:

#!/bin/sh
set -x
set -e

ifconfig vlan0 create
ifconfig vlan0 vlandev vtnet0 vlan 42
ifconfig vlan0 203.0.113.11/24
dd if=/dev/zero bs=32k count=1000000 | nc -N 203.0.113.10 8000 &

set s=''
read s

while true
do
    ifconfig vlan0 destroy
    ifconfig vlan0 create
    ifconfig vlan0 vlandev vtnet0 vlan 42
    ifconfig vlan0 203.0.113.11/24
    sleep 1
done
#!/bin/sh
set -x
set -e

ifconfig vlan0 create
ifconfig vlan0 vlandev vtnet0 vlan 42
ifconfig vlan0 203.0.113.10/24
nc -l -k 8000 > /dev/null

As of 7/7/17 Isilon has completed our internal regression testing of these changes against HEAD as of r310194. Those regression tests involve creating/destroying vlan interfaces on top of physical interfaces and passing traffic between them. Additionally all of the configuration ioctls paths are tested. This is all done with an INVARIANTS kernel to note any WITNESS warnings.

A version of these changes, using just an rmlock(9) but covering broadly the same areas, has been in production with Isilon customers for over a year.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mjoras created this revision.Jun 26 2017, 6:11 PM
mjoras edited the test plan for this revision. (Show Details)Jun 26 2017, 6:15 PM
mjoras set the repository for this revision to rS FreeBSD src repository.
rstone added inline comments.Jun 27 2017, 5:34 PM
sys/net/if_vlan.c
288 ↗(On Diff #30105)

is there a KASSERT() that we can add to RLOCK() to enforce this?

609 ↗(On Diff #30105)

I think you accidentally the verb in this sentence. "This is stupidly *requiring* the rmlock...?"

mjoras added inline comments.Jun 27 2017, 6:18 PM
sys/net/if_vlan.c
288 ↗(On Diff #30105)

I was looking for a way to do this and the problem I ran into was the fact that while there's calls for rm_assert and sx_assert, what we really need here is assert on multiple locks for !RA_LOCKED and !SA_LOCKED. So I think we'd need to add a call that basically does what sx_assert and rm_assert do, except without actually panicking (e.g. something like bool sx_is(struct sx *sx, int what))? Unless I'm missing an easier way to accomplish it.

rstone added inline comments.Jun 27 2017, 8:09 PM
sys/net/if_vlan.c
288 ↗(On Diff #30105)

Ah. For mtx there is mtx_owned(), but there doesn't seem to be anything similar for sx.

mav accepted this revision.Jun 29 2017, 10:34 AM

I have no objections. Just not sure why if_inc_counter() is repeatedly called under the lock. IIRC it is atomic by itself, and interface pointer should be valid any way.

This revision is now accepted and ready to land.Jun 29 2017, 10:34 AM
In D11370#235953, @mav wrote:

I have no objections. Just not sure why if_inc_counter() is repeatedly called under the lock. IIRC it is atomic by itself, and interface pointer should be valid any way.

Thanks for the review.

The reasoning for keeping the counter increment under the lock is to protect against the possibility of the vlan ifnet being freed while we are touching the counter, since the vlan ifnet can't be freed while we still have the read lock. That panic in particular is actually fairly easy to reproduce with a INVARIANTs kernel since it will write 0xdeadc0de into the ifnet once it's freed.

mjoras updated this revision to Diff 30233.Jun 29 2017, 8:01 PM
mjoras edited edge metadata.
mjoras marked an inline comment as done.
  • edit comment to properly english
This revision now requires review to proceed.Jun 29 2017, 8:01 PM
ngie added a subscriber: ngie.Jun 29 2017, 9:24 PM
mav added a comment.EditedJun 30 2017, 2:54 AM
In D11370#236121, @matt.joras_gmail.com wrote:

The reasoning for keeping the counter increment under the lock is to protect against the possibility of the vlan ifnet being freed while we are touching the counter, since the vlan ifnet can't be freed while we still have the read lock.

It looks odd to me that network stack does not protect against this. What if interface decide to go away earlier, just after entering vlan_transmit? I have subtle feeling that this may only hide the problem.

mjoras added a comment.EditedJun 30 2017, 3:25 AM
In D11370#236225, @mav wrote:
In D11370#236121, @matt.joras_gmail.com wrote:

The reasoning for keeping the counter increment under the lock is to protect against the possibility of the vlan ifnet being freed while we are touching the counter, since the vlan ifnet can't be freed while we still have the read lock.

It looks odd to me that network stack does not protect against this. What if interface decide to go away earlier, just after entering vlan_transmit? I have subtle feeling that this may only hide the problem.

Ah, you've stumbled upon the remaining problem here. You're absolutely correct, there is nothing preventing a race where we destroy the vlans AND the parent interface before entering one of the functions. This is because there's nothing that globally synchronizes the references of an ifnet so that one doesn't destroy an ifnet while there are references held. This is theoretically a problem for physical interfaces with no cloned interfaces associated, but it isn't very noticeable due to the fact that detachment is rather rare.

For vlan interfaces, it turns out, that the main window that ends up causing panics is significantly narrowed by using these global locks. Specifically that window is the one where a vlan interface is destroyed, but the parent (usually physical) interface is still intact. In these cases the path for vlan_input; we are assuming the parent ifnet isn't going to disappear out from under us.

The problem with synchronizing ifnet destruction with reference holders is tricky due to the fact that mbufs hold ifnet references on the rx path, so it would probably require a combination of reference counting and something to drain the references, such as an RCU mechanism or something similar.

ae added a comment.Jun 30 2017, 9:00 AM
In D11370#236225, @mav wrote:
In D11370#236121, @matt.joras_gmail.com wrote:

The reasoning for keeping the counter increment under the lock is to protect against the possibility of the vlan ifnet being freed while we are touching the counter, since the vlan ifnet can't be freed while we still have the read lock.

It looks odd to me that network stack does not protect against this. What if interface decide to go away earlier, just after entering vlan_transmit? I have subtle feeling that this may only hide the problem.

We have similar patch https://people.freebsd.org/~ae/netgc.diff in our environment to minimize such chance.

jhb added a subscriber: jhb.Jul 7 2017, 6:45 PM
jhb added inline comments.
sys/net/if_vlan.c
288 ↗(On Diff #30105)

Yes, in general I don't want to advertise 'foo_owned()'-like functions because people might use them outside of assertions which I think is not good practice (I think that generally code should be aware enough of the locking to not do things like 'if (!locked) { lock(); }'. For that reason I've tried to have locking primitives only provide assertions as that is the one case where I do think that an "owned" check is useful.

That said, combining multiple locks and asserting that at least one is locked for a "read" lock is a perfectly fine thing to do and the existing assertion functions don't really let you do that. I think the proposed 'sx_is' which accepts the assertion argument as the 'what' would be a useful tool to permit construction of these assertions (along with rm_is, etc.). I would perhaps suggest that if these are added that the documentation explicitly recommend them only for use in building assertions for more complex locking schemes.

mjoras added inline comments.Jul 8 2017, 4:50 AM
sys/net/if_vlan.c
288 ↗(On Diff #30105)

Yes, I was thinking similarly. I think if it's added the manpage should strongly suggest only using them in building locking assertions. I think I'll explore this in a different revision.

mjoras edited the test plan for this revision. (Show Details)Jul 9 2017, 12:42 AM
markj accepted this revision.Jul 10 2017, 4:17 PM
markj added inline comments.
sys/net/if_vlan.c
228 ↗(On Diff #30233)

"is has"

266 ↗(On Diff #30233)

The continuation line should be indented to the same level as "rm_rlock", plus four spaces.

290 ↗(On Diff #30233)

I'd explicitly say, "both the global locks." It's also not obvious to me why you singled out TRUNK_WLOCK - does this not apply to TRUNK_RLOCK?

This revision is now accepted and ready to land.Jul 10 2017, 4:17 PM
mjoras marked 2 inline comments as done.Jul 11 2017, 4:24 AM
mjoras added inline comments.
sys/net/if_vlan.c
290 ↗(On Diff #30233)

Wouldn't saying "both the global locks" here indicate that you need both of them to acquire a trunk lock? I was trying to convey that you need at least a VLAN_RLOCK or VLAN_SLOCK before TRUNK_*LOCK.

To your second point, no, there's no reason for singling out TRUNK_WLOCK, I can fix that.

mjoras updated this revision to Diff 30635.Jul 11 2017, 4:27 AM
mjoras edited edge metadata.
  • comment updates
This revision now requires review to proceed.Jul 11 2017, 4:27 AM
markj added inline comments.Jul 11 2017, 4:53 PM
sys/net/if_vlan.c
290 ↗(On Diff #30233)

Oops, I missed that you need *both* shared locks to acquire the trunk lock. So perhaps explicitly say "either of the global locks."

mjoras updated this revision to Diff 30650.Jul 11 2017, 6:25 PM
  • further clarify comment
emaste added a subscriber: emaste.Jul 25 2017, 1:47 AM
ae added a comment.Aug 15 2017, 11:43 AM

I have tested your patch in our test environment against forwarding performance.
[packet generator] -> [ switch ] -> [ix.10 -> ix.100]

So, the FreeBSD 12 receives tagged by vlan10 packets on ixgbe(4) and then sends them into vlan100 through the same interface.
With used traffic distribution this test machine is able to forward about 1.3Mpps with and without your patch.
Then I applied our local patch to reduce RX overhead using direct vlan handling in the ixgbe(4). With this patch the same machine is able to forward 3Mpps without packet loss. With your patch this value is lowered to 2.9Mpps. Thus the locking overhead cost is about ~100kpps.
Also I think the possible panic in the vlan_input() due to the race now fixed.

In D11370#249511, @ae wrote:

I have tested your patch in our test environment against forwarding performance.
[packet generator] -> [ switch ] -> [ix.10 -> ix.100]

So, the FreeBSD 12 receives tagged by vlan10 packets on ixgbe(4) and then sends them into vlan100 through the same interface.
With used traffic distribution this test machine is able to forward about 1.3Mpps with and without your patch.
Then I applied our local patch to reduce RX overhead using direct vlan handling in the ixgbe(4). With this patch the same machine is able to forward 3Mpps without packet loss. With your patch this value is lowered to 2.9Mpps. Thus the locking overhead cost is about ~100kpps.
Also I think the possible panic in the vlan_input() due to the race now fixed.

Thanks for your testing! We didn't observe any measurable performance decreases in our testing, but we aren't a forwarding use case so we don't push the pps that far on 10GbE; we are looking for line-rate at 1500 and 9000 MTU. Is that 100kpps cost enough to dissuade you from using the improved locking? Long term I'd like to investigate other less-costly synchronization schemes using if_vlan as a test bed, but that's not something I'm going to get to anytime soon.

ae accepted this revision.Aug 15 2017, 3:53 PM

I also did the same test with vlans created on top of lagg with 2x25G mellanox adapters.
I didn't see measurable performance drop there. It is able to forward 14Mpps with our RX direct vlan handling patch.

I think it is acceptable.

This revision is now accepted and ready to land.Aug 15 2017, 3:53 PM
In D11370#249572, @ae wrote:

I also did the same test with vlans created on top of lagg with 2x25G mellanox adapters.
I didn't see measurable performance drop there. It is able to forward 14Mpps with our RX direct vlan handling patch.

I think it is acceptable.

Great. For my own curiosity, are your direct vlan handling patches posted anywhere?

rstone accepted this revision.Aug 15 2017, 5:05 PM
This revision was automatically updated to reflect the committed changes.
ae added a comment.Aug 16 2017, 9:15 AM

Great. For my own curiosity, are your direct vlan handling patches posted anywhere?

No, they are hackish :)
I published them here:
https://reviews.freebsd.org/D12040
https://reviews.freebsd.org/D12041