Page MenuHomeFreeBSD

Allow to specify PCP on packets not belonging to any VLAN.
ClosedPublic

Authored by kib on Mar 15 2018, 4:16 PM.

Details

Summary

According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html .

Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Mar 15 2018, 4:16 PM
kib edited the summary of this revision. (Show Details)Mar 15 2018, 4:38 PM
ae added a reviewer: network.Mar 16 2018, 8:26 AM
hselasky added inline comments.Mar 16 2018, 9:00 AM
sys/net/if_ethersubr.c
457 ↗(On Diff #40318)

Re-order to save a branch prediction?

if (error != 0) {
  if (error == EJUSTRETURN) {
  }
  return (error);
}
1320 ↗(On Diff #40318)

Why is pad not marked "const". Further, I suggest adding __aligned(4) to optimise the access to this field.

ae added inline comments.Mar 16 2018, 9:19 AM
sys/net/ethernet.h
423 ↗(On Diff #40318)

I think m and mout arguments can be replaced with one struct mbuf **mp argument like is currently used in many places.

sys/net/if_ethersubr.c
447 ↗(On Diff #40318)

the "error" and "i" variables seems have the same meaning. Also, are "etype" and "pcp" variables really needed?

sys/net/if_var.h
369 ↗(On Diff #40318)

Do you plan to MFC this change? Maybe it makes sense to use spare value for this?

kib marked 5 inline comments as done.Mar 16 2018, 10:33 AM
kib added inline comments.
sys/net/if_ethersubr.c
447 ↗(On Diff #40318)

I eliminated 'i'. 'pcp' is in fact needed to consistently make the decision to frame the packet with the VLAN tag, and to put right pcp into the tag. It is the bug that code used ifp_pcp as the argument to ether_8021q_frame(), instead of pcp.

457 ↗(On Diff #40318)

I do not believe that compiler is obliged to do anything differently compared with the existing code, if changed as you suggested. The only guaranteed way to mark 'error == 0' case as common is to use __predict_true() ugliness. It certainly requires numbers demonstrating the effect, which I doubt is possible in this case.

1320 ↗(On Diff #40318)

It was copied from the existing code, so no const, and I added the qualifier in the update. __aligned(4) is quite useless for byte access. The minor effect can be achieved if we put this into dedicated cache line, but again it is impossible to measure the effect, I believe.

I contemplated using zero_region as the source of high-quality zeros, but decided to not rototile this code too much.

sys/net/if_var.h
369 ↗(On Diff #40318)

spare fields should be only used when doing MFC. The HEAD changes should add necessary fields, it is fine to break KBI on HEAD.

kib marked 4 inline comments as done.Mar 16 2018, 10:33 AM
kib updated this revision to Diff 40342.

Handle review notes.

ae accepted this revision.Mar 16 2018, 11:12 AM

This looks good to me.

This revision is now accepted and ready to land.Mar 16 2018, 11:12 AM
hselasky accepted this revision.Mar 16 2018, 1:53 PM
melifaro added a subscriber: melifaro.EditedMar 18 2018, 3:03 PM

Several questions.

  1. What was the driver of implementing interface-level pcp settings?

For example, currently pcp can be set on per-mbuf basis for the "real" vlans, which can give more granular control on the outgoing traffic.
Personally it is a bit hard for me to see the benefits of having the same pcp applied to all packets from the host.
Typically this kind of configuration can be easily configured on the access switch and host is supposed to be able to do more fine-granular control.
What comes into my mind is having something like dscp-to-pcp map, which allows user to benefit from all existing dscp manipulation framework.
What do you think?

  1. Have you performed interop testing for the non-routable protocols like lldp, stp, lacp, etc?

Having seen some issues related to the routers/switches control plane forgetting to remove similar "dummy" headers, I'd recommend to perform validation with multiple vendors before committing this.

kib added a comment.Mar 18 2018, 4:29 PM

Several questions.

  1. What was the driver of implementing interface-level pcp settings?

For example, currently pcp can be set on per-mbuf basis for the "real" vlans, which can give more granular control on the outgoing traffic.

This is supported in the patch as well. If the scheduled mbuf has 8021Q tag attached, the pcp value from the tag is inserted into the VLAN frame tag.
If you look at the code, you will see that I extracted the fragment from vlan_trasnmit() to reuse in both places.

Personally it is a bit hard for me to see the benefits of having the same pcp applied to all packets from the host.
Typically this kind of configuration can be easily configured on the access switch and host is supposed to be able to do more fine-granular control.

I think that this is complimentary, and one method of manipulating pcp does not exclude the validity, and apparent usefulness, of another.
The patch was written because there are users who need this feature (I cannot say more).

What comes into my mind is having something like dscp-to-pcp map, which allows user to benefit from all existing dscp manipulation framework.
What do you think?

Perhaps yes, but this is out of scope of the change. If pcp is already assigned, it will be honored. The assignment should be managed by the layer above the place where the framing and transmit are performed.

  1. Have you performed interop testing for the non-routable protocols like lldp, stp, lacp, etc?

Having seen some issues related to the routers/switches control plane forgetting to remove similar "dummy" headers, I'd recommend to perform validation with multiple vendors before committing this.

No, no interop testing was done. As an anecdote, I can say that my home switch filters the vid 0 packets outright. More, Mellanox driver has the flow table programmed to drop such packets as well, right now (the fix is already worked out).

The feature is disabled by default, so I do not see it as critical or even important that some stuff would break when vid 0 encapsulation is enabled. The feature is added for the cases where it works.

In D14702#309602, @kib wrote:

Several questions.

  1. What was the driver of implementing interface-level pcp settings?

For example, currently pcp can be set on per-mbuf basis for the "real" vlans, which can give more granular control on the outgoing traffic.

This is supported in the patch as well. If the scheduled mbuf has 8021Q tag attached, the pcp value from the tag is inserted into the VLAN frame tag.
If you look at the code, you will see that I extracted the fragment from vlan_trasnmit() to reuse in both places.

Ah, my bad. I haven't noticed the 8021q tag override part. In that case it's more like setting default pcp.

Personally it is a bit hard for me to see the benefits of having the same pcp applied to all packets from the host.
Typically this kind of configuration can be easily configured on the access switch and host is supposed to be able to do more fine-granular control.

I think that this is complimentary, and one method of manipulating pcp does not exclude the validity, and apparent usefulness, of another.
The patch was written because there are users who need this feature (I cannot say more).

What comes into my mind is having something like dscp-to-pcp map, which allows user to benefit from all existing dscp manipulation framework.
What do you think?

Perhaps yes, but this is out of scope of the change. If pcp is already assigned, it will be honored. The assignment should be managed by the layer above the place where the framing and transmit are performed.

  1. Have you performed interop testing for the non-routable protocols like lldp, stp, lacp, etc?

Having seen some issues related to the routers/switches control plane forgetting to remove similar "dummy" headers, I'd recommend to perform validation with multiple vendors before committing this.

No, no interop testing was done. As an anecdote, I can say that my home switch filters the vid 0 packets outright. More, Mellanox driver has the flow table programmed to drop such packets as well, right now (the fix is already worked out).
The feature is disabled by default, so I do not see it as critical or even important that some stuff would break when vid 0 encapsulation is enabled. The feature is added for the cases where it works.

That's the topic we see differently. The functionality indeed is not used by default, however the actual code complicates ether_output(). Furthermore, I'm afraid that after someone actually tries to use this, the code will get more complicated. I'd really appreciate if you could change the ether_output() part to call the (inlined) function doing all of the business logic for handling pcp.

kib added a comment.Mar 18 2018, 7:17 PM
In D14702#309602, @kib wrote:

The feature is disabled by default, so I do not see it as critical or even important that some stuff would break when vid 0 encapsulation is enabled. The feature is added for the cases where it works.

That's the topic we see differently. The functionality indeed is not used by default, however the actual code complicates ether_output().

Indeed, it complicates the function because it adds the new feature.

Furthermore, I'm afraid that after someone actually tries to use this, the code will get more complicated. I'd really appreciate if you could change the ether_output() part to call the (inlined) function doing all of the business logic for handling pcp.

Can you explain more explicitly what do you want to change ? The only interpretation for your words which I was able to construct is that you want ether_8021q_frame() to become inlined in ether_output_frame(). Is it correct ?
If yes, I do not see much sense in it, because ether_8021q_frame() is only called for non-default path, and it really makes sense to keep ether_output_frame() short to not pollute icache.

In D14702#309683, @kib wrote:
In D14702#309602, @kib wrote:

The feature is disabled by default, so I do not see it as critical or even important that some stuff would break when vid 0 encapsulation is enabled. The feature is added for the cases where it works.

That's the topic we see differently. The functionality indeed is not used by default, however the actual code complicates ether_output().

Indeed, it complicates the function because it adds the new feature.

Furthermore, I'm afraid that after someone actually tries to use this, the code will get more complicated. I'd really appreciate if you could change the ether_output() part to call the (inlined) function doing all of the business logic for handling pcp.

Can you explain more explicitly what do you want to change ? The only interpretation for your words which I was able to construct is that you want ether_8021q_frame() to become inlined in ether_output_frame(). Is it correct ?
If yes, I do not see much sense in it, because ether_8021q_frame() is only called for non-default path, and it really makes sense to keep ether_output_frame() short to not pollute icache.

No, not exactly.

/* ether_otput_frame() */

if (ifp->if_pcp != IFNET_PCP_NONE) {
  if (ether_set_pcp(&m) != 0)
   return 0;
}

/* ether_set_pcp() */

/*
 * De-facto ether_8021q_frame() either produces new mbuf,  or fails to do so and produces an error.
 * Make its semantic consistent: either
 * 1) return/set new mbuf
 * or
 * 2) free resources and fill in errmsg.
 * With that change, an intermediate function can look like the following:
*/
eh = mtod(m, struct ether_header *);
if (ntohs(eh->ether_type) != ETHERTYPE_VLAN) {
  ether_8021q_frame(&m, &error);
  if m == NULL {
     print(error)
     return (1);
  }
}
return 0;
kib updated this revision to Diff 40426.EditedMar 18 2018, 10:10 PM

Refactor the patch according to melifaro' preferences.

Not tested. I will work further on this variant if agreed upon.

This revision now requires review to proceed.Mar 18 2018, 10:10 PM
melifaro accepted this revision.Mar 18 2018, 10:17 PM

Thank you!

This revision is now accepted and ready to land.Mar 18 2018, 10:17 PM
hselasky accepted this revision.Mar 18 2018, 10:35 PM

Maybe commit the mlx5en part separately from the other bits.

This revision was automatically updated to reflect the committed changes.