
Wrap a vlan's parent's if_output in a separate function.
Needs Revision · Public

Authored by mjoras on Feb 15 2018, 6:18 PM.

Details

Reviewers
rstone
melifaro
olivier_cochard.me
Group Reviewers
network
Summary

When a vlan interface is created, its if_output is set directly to the
parent interface's if_output. This is fine in the normal case but has an
unfortunate consequence if you end up with a certain combination of vlan
and lagg interfaces.

Consider you have a lagg interface with a single laggport member. When
an interface is added to a lagg its if_output is set to
lagg_port_output, which blackholes traffic from the normal networking
stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now
create a vlan with the laggport member (not the lagg interface) as its
parent, its if_output is set to lagg_port_output as well. While this is
confusing conceptually and likely represents a misconfigured system, it
is not itself a problem. The problem arises when you then remove the
lagg interface. Doing this resets the if_output of the laggport member
back to its original state, but the vlan's if_output is left pointing to
lagg_port_output. This gives rise to the possibility that the system
will panic when e.g. bpf is used to send any frames on the vlan
interface.

Fix this by creating a new function, vlan_output, which simply wraps the
parent's current if_output. That way when the parent's if_output is
restored there is no stale usage of lagg_port_output.

Test Plan

Some basic testing with vlan interfaces to ensure they still work.
Verified the vlan interface goes back to using ether_output after
the lagg is destroyed.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 15068
Build 15163: arc lint + arc unit

Event Timeline

mjoras created this revision. Feb 15 2018, 6:18 PM
melifaro requested changes to this revision. Feb 15 2018, 6:33 PM
melifaro added a subscriber: melifaro.

The issue this change is trying to solve is a control plane issue. Dealing with it in the data path code seems to be the wrong approach. Additionally, it imposes significant performance penalties.
The better way of doing this is to have a "solver" function that can handle such cases. Calls to this function can be triggered by an ifnet/lagg change event subscription.

This revision now requires changes to proceed. Feb 15 2018, 6:33 PM

Why don't we just reject creation of a vlan interface on a lagg member (as opposed to the lagg itself)?

The issue this change is trying to solve is a control plane issue. Dealing with it in the data path code seems to be the wrong approach. Additionally, it imposes significant performance penalties.
The better way of doing this is to have a "solver" function that can handle such cases. Calls to this function can be triggered by an ifnet/lagg change event subscription.

You can't just state it has "significant performance penalties" by looking at the code. Consider also that the path if_output takes isn't as direct as you might think anyway. When a vlan's if_output gets called you end up back in ether_output_frame. That ends up calling the interface's if_transmit, which puts you back in vlan_transmit. So the old path would have been:

<vlan if>->if_output()
ether_output(<vlan if>)
ether_output_frame(<vlan if>) 
<vlan if>->if_transmit() 
vlan_transmit() 
<parent if>->if_output() 
ether_output(<vlan parent if>) 
ether_output_frame(<vlan parent if>) 
<vlan parent if>->if_transmit()

And with this change it would be:

<vlan if>->if_output()
vlan_output(<vlan if>)
ether_output(<vlan if>)
ether_output_frame(<vlan if>) 
<vlan if>->if_transmit() 
vlan_transmit() 
<parent if>->if_output() 
ether_output(<vlan parent if>) 
ether_output_frame(<vlan parent if>) 
<vlan parent if>->if_transmit()

So it wasn't exactly a hyper-optimized data path to begin with. This is why some vlan users avoid this mess entirely with an out-of-tree direct-dispatch approach. Also keep in mind that this only affects direct users of if_output, not users of if_transmit.

We could have a vlan event handler explicitly for lagg events, but that seems to me like an annoying layering violation / special case.

mjoras added a comment (edited). Feb 15 2018, 7:01 PM

Why don't we just reject creation of vlan interface on lagg member (as opposed to lagg itself)?

We could do that, but it would technically be a behavioral regression. Today you can get something very weird:

  1. Setup a lagg with laggport member
  2. Setup vlan on top of laggport member
  3. Send frames out of the vlan interface via BPF.

What this results in is tagged frames being sent out of the parent by way of the lagg interface. It's a counter-intuitive result, but I didn't want to just go and block it, because if I've learned anything it's that people sometimes rely on very strange network configurations.

Indeed, people sometimes do strange things with their networks, and there's nothing wrong with that (I do sometimes, too). But we should distinguish supported configurations from unsupported ones (those leading to a panic or other misbehavior). For example, we do not support assigning an IP address to a lagg or bridge member: one can do that, but it generally won't work.

If one needs to send tagged frames out of a lagg interface, we support that in numerous consistent ways, like creating a vlan over the lagg, using ng_vlan, or manually crafting raw frames and sending them via BPF (I personally have such custom perl code for the BPF way in production).

Why should we support inconsistent or (to use stronger language) broken ways to abuse the network stack? I'd like to see a real example of the usefulness of such an approach.

ae added a reviewer: olivier_cochard.me (edited). Feb 16 2018, 8:52 AM

You can't just state it has "significant performance penalties" by looking at the code. Consider also that the path if_output takes isn't as direct as you might think anyway. When a vlan's if_output gets called you end up back in ether_output_frame. That ends up calling the interface's if_transmit, which puts you back in vlan_transmit. So the old path would have been:

Reducing the inbound call path improves forwarding performance by up to 20%, and additional entries can hurt it. This is not noticeable on stock FreeBSD, which can't do more than 3-5 Mpps due to lock contention, but at packet rates around 10-12 Mpps it will be significant. We can ask Olivier to try a test with our patches.

In D14385#301730, @ae wrote:

You can't just state it has "significant performance penalties" by looking at the code. Consider also that the path if_output takes isn't as direct as you might think anyway. When a vlan's if_output gets called you end up back in ether_output_frame. That ends up calling the interface's if_transmit, which puts you back in vlan_transmit. So the old path would have been:

Reducing the inbound call path improves forwarding performance by up to 20%, and additional entries can hurt it. This is not noticeable on stock FreeBSD, which can't do more than 3-5 Mpps due to lock contention, but at packet rates around 10-12 Mpps it will be significant. We can ask Olivier to try a test with our patches.

Right, and since this targets the current stack I figured it probably wouldn't have too much of a perf impact. It would definitely be nice to have some actual numbers from Olivier!

That being said, I'm ambivalent about doing this versus just blocking the configuration entirely. Do we care about breaking the very strange configuration I described earlier?

I vote to block it for now. We can re-think it if a useful real-world example comes up.

In D14385#301730, @ae wrote:

Reducing the inbound call path improves forwarding performance by up to 20%, and additional entries can hurt it. This is not noticeable on stock FreeBSD, which can't do more than 3-5 Mpps due to lock contention, but at packet rates around 10-12 Mpps it will be significant. We can ask Olivier to try a test with our patches.

Ready to do a bench: does "our patches" mean this review or another patch?
Because I've already done some benchmarks regarding VLAN impact (with a router use case) here:

ae added a comment. Feb 18 2018, 11:52 AM

Ready to do a bench: Does "our patches" means is this review or another patch ?

I meant the patches that replace the rwlock with an rmlock. Our vlan-related patches are targeted at reducing overhead for inbound traffic.

Because I've already did some benches regarding VLAN impact (with a router use case) here:

In this case I think you can route untagged packets into some vlan to see if any difference appears: one test without this patch and one with it.

Then, here are my bench results regarding the forwarding-performance impact of this patch. I'm using a simple 802.1Q setup (no LAGG).
Configuration on the Device Under Test (DUT):

vlans_cxl0="2"
vlans_cxl1="4"
ifconfig_cxl0="up -tso4 -tso6 -lro -vlanhwtso"
ifconfig_cxl1="up -tso4 -tso6 -lro -vlanhwtso"
gateway_enable="YES"
ifconfig_cxl0_2="inet 198.18.0.10/24"
ifconfig_cxl1_4="inet 198.19.0.10/24"

Now the difference with a -head FreeBSD (no difference):

x FreeBSD r330012: inet4 packets-per-second forwarded
+ FreeBSD r330012 and D14385: inet4 packets-per-second forwarded
+--------------------------------------------------------------------------+
|+        +      +    x  *                 +             x            x   x|
|                        |________________________A______M________________||
|  |_____________M_A________________|                                      |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       3852149       4034011     3976874.5     3949331.1     85995.825
+   5       3778967       3927953     3836199.5     3843752.3     56492.182
No difference proven at 95.0% confidence

And the difference with a Yandex (AF_DATA + RADIX) patched FreeBSD (a loss of about 2.7%, from 5 Mpps to 4.9 Mpps):

x FreeBSD r330012 and Yandex patches: inet4 packets-per-second forwarded
+ FreeBSD r330012 and Yandex patches and D14385: inet4 packets-per-second forwarded
+--------------------------------------------------------------------------+
|                              +                                           |
|+            x          +     + +                 x   x           x   x   |
|                            |______________________A__M__________________||
|          |____________A______M______|                                    |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       4830339     5114537.5       5038083     5019229.9     112874.46
+   5       4763490       4925624       4914459       4880596      67325.88
Difference at 95.0% confidence
        -138634 +/- 135539
        -2.76206% +/- 2.64558%
        (Student's t, pooled s = 92933.9)