Page MenuHomeFreeBSD

Fix 802.1ad handling bug in input processing of ethernet frames.
Needs ReviewPublic

Authored by donner on Sep 7 2021, 8:29 AM.

Details

Reviewers
freebsd_oprs.eu
melifaro
Group Reviewers
network
Summary

During input processing of received frames, the commit
c7cffd65c5d858425e90b847d2e8e583e3b13bf7 (D26436) introduced a
generalized processing von 802.1q and 802.1ad frames by replacing the
VLAN header by mbuf meta data.

Unfortunately the ethertype of the original frame is lost. So if the
frame is processed by other kernel routines later (i.e. netgraph),
802.1ad frames will be indistinguishable from 802.1q frames.

In my case this breaks further processing.

PR: 258334

Test Plan

Given the simple netgraph infrastructure to extrude an 802.1ad vlan (2) from an interface (lagg1) (and vice versa)

+ mkpeer lagg1: vlan lower downstream
+ name lagg1:lower vlag
+ msg vlag: setencapproto 0x88A8
+ mkpeer vlag: eiface test ether
+ msg vlag: addfilter { vid=2 hook="test" }
+ mkpeer vlag: eiface nomatch ether
# tcpdump -eni ngeth1 # nomatch

An unpatched 13-STABLE does spit out 802.1q frames on the "nomatch" hook which are displayed by tcpdump as

09:22:35 aa:aa:aa:aa:aa > bb:bb:bb:bb:bb:bb ethertype 802.1Q (0x8100), length 68: vlan 2, p 0, ethertype ...

After applying the patch, the netgraph node ng_vlan(4) connected to
the lower hook of ng_ether(4) is able to handle the frames accordingly
to the ethertype correctly.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 43607
Build 40495: arc lint + arc unit

Event Timeline

donner requested review of this revision.Sep 7 2021, 8:29 AM

Could you describe the actual problem with ng_vlan? Without the problem statement, it's not exactly clear what can/should be done here.
This diff breaks QinQ processing, so it doesn't look like a path forward.

Could you describe the actual problem with ng_vlan? Without the problem statement, it's not exactly clear what can/should be done here.
This diff breaks QinQ processing, so it doesn't look like a path forward.

Done. Does it help to understand the problem?

Rebase to main.

@melifaro may you please point me to the problem with QinQ processing?
How can the processing work, if the ethertype information is lost?

Rebase to main.

@melifaro may you please point me to the problem with QinQ processing?
How can the processing work, if the ethertype information is lost?

Regardless of what encap is (vlan or qinq) the vlan code effectively does the same - pullups the header and record the tag.
If NIC does not support QINIQ tag removal (or vlanhwtag is explicitly turned off), such frames will be dropped after the change, which is the breakage I'm talking about

@melifaro may you please point me to the problem with QinQ processing?
How can the processing work, if the ethertype information is lost?

Regardless of what encap is (vlan or qinq) the vlan code effectively does the same - pullups the header and record the tag.
If NIC does not support QINIQ tag removal (or vlanhwtag is explicitly turned off), such frames will be dropped after the change, which is the breakage I'm talking about

In my case I have to handle arbitrary vlan tags in both directions, so the are provided inband (vlanhwtag disabled) and need to match the outside world exactly. The current approach is not able to regenerate the QinQ ethertype in outbound direction without hardware support. This is an unexpected behavior which breaks existing production setups.

This fix reestablish the previous, working situation for 802.1ad frames without static hardware assignments. It runs in production since the breakage of 13.0.

In order to solve this mess, I'd need to know witch part of the QinQ handling will break if only 802.1q frames are removed. It would be very nice to have a code path with will server all purposes, not only the latest commiter. Can you point me to the relevant part, please?