
Pass SIOCSIFCAP request to vlan(4)'s parent if there is one.
Status: Abandoned (Public)

Authored by sd_beastie.io on Apr 20 2016, 12:29 AM.

Details

Summary

This patch is copied from TrueNAS/FreeNAS because it resolves an issue that occurs when vlan + bridge + vimage/vnet are combined. The patch was originally created by Xin Li for FreeNAS; see: https://bugs.pcbsd.org/issues/3676

Issue:
In a jail environment that uses vlan + bridge with VIMAGE, tearing down a jail's inet interfaces causes Ethernet MAC addresses to switch, as reported below:

Apr 12 02:54:45 uat kernel: arp: 192.168.6.103 moved from 20:e8:83:05:00:f8 to 02:ff:c0:00:09:0b on epair106b

I have tested the patch and it has proven reliable for over 2 weeks. It has also been in FreeNAS/TrueNAS/TrueOS for over 2 years. It's about time to get this into 11-CURRENT for those of us who run jails with VLAN tagging using the "new" vimage/vnet framework.

PR: 208910

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sd_beastie.io retitled this revision to "Pass SIOCSIFCAP request to vlan(4)'s parent if there is one."
sd_beastie.io updated this object.
sd_beastie.io edited the test plan for this revision.
sd_beastie.io set the repository for this revision to rS FreeBSD src repository - subversion.
sd_beastie.io added a project: network.

Do you have a bit more information on how to provoke the problem? I've looked at the PR (and the pcbsd bug report), but I've not yet been able to reproduce it myself.

I found the issue when using jails. My setup uses em0 with a static IP (assigned to vlan0 in the rc.conf below) and two cloned interfaces: vlan0, which adds a VLAN tag to em0 traffic, and bridge0, which includes vlan0 and all the epair interfaces created for the jails. Once the jails are created and running, if I remove one with jail -r <name>, I see the error message above and all traffic after that is dropped. If just removing the jail doesn't trigger the error, you may also try re-creating it (jail -c <name>).

After applying the patch, I have been unable to make the problem reappear, and I have been using it successfully in production for 2+ weeks now.

error
arp: 192.168.6.108 moved from 20:90:3b:1b:00:f8 to 02:ff:c0:00:0b:0b on epair107b

rc.conf

[snip]  
EXT_IP="192.168.6.10"
ifconfig_em0="up"
cloned_interfaces="vlan0 bridge0"
ifconfig_vlan0="inet $EXT_IP netmask 255.255.255.0 vlan 6 vlandev em0"
defaultrouter="192.168.6.1"
ifconfig_bridge0="addm vlan0 up"
[snip]

jail.conf

publicweb {
  $if  = "109";
  $ip_addr  = "192.168.6.${if}";   # Jail ip address
  $ip_route  = "192.168.6.1";  # Gateway or host's ip address
  $mask = "255.255.255.0";  # Netmask
  vnet;
  vnet.interface  = "epair${if}b";

  # Commands to run on host before jail is created
  exec.prestart  = "ifconfig epair${if} create up";
  exec.prestart  += "ifconfig epair${if}a up";
  exec.prestart  += "ifconfig bridge0 addm epair${if}a up";

  # Commands to run in jail after it is created
  exec.start  = "/sbin/ifconfig lo0 127.0.0.1 up";
  exec.start  += "/sbin/ifconfig epair${if}b up";
  exec.start  += "/sbin/ifconfig epair${if}b ${ip_addr} netmask ${mask} up";
  exec.start  += "/sbin/route add default ${ip_route}";
  exec.start  += "/bin/sh /etc/rc";  

  exec.stop  = "/bin/sh /etc/rc.shutdown";
  exec.poststop  = "ifconfig bridge0 deletem epair${if}a";
  exec.poststop  += "ifconfig epair${if}a destroy";
  persist;
}

I don't understand how this patch helps.

I've reproduced the arp warning, but it doesn't seem to be related to SIOCSIFCAP.
The reason for the arp warning is that our bridging code sets the MAC address of the bridge to that of the first interface added to it. In this case it doesn't seem to take the MAC of the VLAN interface (figuring out why would be worthwhile), so as soon as the jail is started and the epair interface is added as a member, the bridge's MAC changes.
Apart from the vlan interface not doing this, that's entirely expected. It should also be harmless.

The bridge code does call SIOCSIFCAP, but that's to disable IFCAP_LRO. Arguably the vlan code should allow this, but exposing all of SIOCSIFCAP to the parent interface seems dangerous as well. That implies that changing any capability on a vlan interface (which might be delegated to a jail!) would change it on the physical interface too.

Thanks for your response. I have started reading the bridge code in if_bridge.c and found the bridge_ioctl_add() and bridge_ioctl_del() functions. I should be able to add some instrumentation to see what is causing this behavior, and even try to patch it so the bridge keeps the MAC address of the first interface added to it unless that interface is removed. I've been meaning to jump in and write some code, so this is a perfect opportunity.

One question, though: would the bridge's MAC address switching result in the loss of packets to and from the jail? I am very new to networking, so pardon the silly questions. With the patch in this review, the bridge has a unique MAC address in the system, as shown below. Are you suggesting the bridge's MAC address should be exactly that of the first interface, in this case em0?

em0: ether 68:05:ca:36:3c:26
bge0: ether 78:2b:cb:8e:0d:27
vlan0: ether 68:05:ca:36:3c:26
bridge0:
ether 02:0b:47:0d:33:00
member: epair109a
member: epair200a
member: epair108a
member: epair104a
member: epair106a
member: epair105a
member: epair103a
member: epair101a
member: epair201a
member: epair107a
member: epair100a
member: epair102a
member: vlan0
epair102a: ether 02:ff:70:00:07:0a
epair103a: ether 02:ff:70:00:09:0a
epair100a: ether 02:ff:70:00:0b:0a
epair101a: ether 02:ff:70:00:0d:0a
epair104a: ether 02:ff:70:00:0f:0a
epair105a: ether 02:ff:70:00:11:0a
epair201a: ether 02:ff:70:00:13:0a
epair106a: ether 02:ff:70:00:15:0a
epair107a: ether 02:ff:70:00:17:0a
epair108a: ether 02:ff:70:00:19:0a
epair200a: ether 02:ff:70:00:08:0a
epair109a: ether 02:ff:70:00:0a:0a

In D6015#129660, @shawn_debnath.net wrote:

One question though is would the switching of mac address of the bridge result in the loss of packets to and from the jail?

I wouldn't expect it to. In fact, I think the arp message is a red herring here.

What I suspect is happening is that the bridge on top of the vlan interface isn't working, because it couldn't disable capabilities (likely LRO, but perhaps something else) on the vlan interface. (See bridge_mutecaps() in net/if_bridge.c).

I'd start by testing a system without this patch, manually disabling capabilities on the bridge members. I would expect things to start working if the correct capability is disabled on the right interface.

@kristof - perfect. I should have enough info to start digging. Will close this review for now. Would you mind if I ping you separately if questions come up? Thanks

In D6015#129721, @shawn_debnath.net wrote:

@kristof - perfect. I should have enough info to start digging. Will close this review for now. Would you mind if I ping you separately if questions come up? Thanks

Yes, please do contact me (kp@freebsd.org) if you've got questions or answers ;)

Per comments, closing review. Will create new review linked to the same bug once I have something tangible.

This revision was automatically updated to reflect the committed changes.

@kp, might have been closed with the wrong review/commit pair.

emaste added a subscriber: emaste.

rS298664 should have referenced D5977, not this one

No need to re-open. Abandoning again.

Oops -- sorry I missed that it was previously abandoned.

In any case, it now has the correct state and a reference to the revision that accidentally closed it.

This revision was automatically updated to reflect the committed changes.

And like an idiot I copy/pasted the incorrect differential again.

@kristof, glad this issue is on the top of your head at all times :)