Page MenuHomeFreeBSD

Add IP(V6)_VLAN_PCP to adjust 802.1 priority per-flow.
ClosedPublic

Authored by rscheff on Sep 12 2020, 1:02 AM.

Details

Summary

Currently, IOCTL only allows the interface-wide adjustment
of the Ethernet 802.1 priority. This adds a new IP_PROTO / IPV6_PROTO
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.

Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header to
carry the priority setting

Test Plan

Will provide a "-C" option to ping and ping6 to allow
testing and demonstrating the use of this new functionality.

Changing the default interface priority with
ifconfig em0 pcp 3, and validating the prior function
with tcpdump -i em0 icmp or ( vlan and icmp ),
using the ping -C <pcp> option we can override the
interface default (or revert a flow later to resume
using the interface default.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Please add some documentation for the SOL_SOCKET level socket option SO_VLAN_PCP to the man page lib/libc/sys/getsockopt.2.

Maybe it makes sense to split the commit:

  • One adding the socket options.
  • One allowing ping to use it.
  • One allowing ping6 to use it.

Maybe one could combine the last two.

sys/netinet6/ip6_output.c
466 ↗(On Diff #76940)

I'm really concerned you're going to add cache misses to the critical path by doing this in a blanket fashion for all inp with a socket. Can you instead add a flag to inp_flags2 which indicates that you should check the socket for the so_vlan_pcp?

sys/netinet6/ip6_output.c
466 ↗(On Diff #76940)

I hear your concern, but that makes me concerned that to do what you suggest would be "attempting to optimize without measurements". Might I suggest first a set of impact measurements be made, with and without this differential and then with and without a inp_flags2 flag?

I might also suggest that this KASSERT in the fast path is sub optimal, it appears to me that you have already range checked the value at the only setter, so this KASSERT would only catch a data corruption error?

sys/netinet6/ip6_output.c
466 ↗(On Diff #76940)

Accessing the socket is expensive. Especially for a workload with a lot of flows (100,000 plus). Its one of the biggest cache miss points in rack tcp as shown by vtune for us. I've seen it over and over again in pretty much every vtune i look at. Randall is working on some optimizations that add a fast path which, among other things, does not deref the socket every time. inp->flags2 is checked on output already, which is why I suggested it.

I've added Randall so he can comment on this.

The other option is to put this in as a compile-time option so it can be compiled out entirely.

rrs requested changes to this revision.Sep 25 2020, 7:19 PM
rrs added inline comments.
sys/netinet6/ip6_output.c
467–468 ↗(On Diff #76940)

I agree with Drew here, you would take cache misses on every output packet. That is just plain unacceptable, especially
if we are trying to squeeze as much as we can out of a box.

A flag in the inp would be better since it would be much more likely to be in cache.

This revision now requires changes to proceed.Sep 25 2020, 7:19 PM

So, how about moving the PCP to become part of the inpcb, and use IPPROTO sockopts instead. Sounds like that may be the better approach; doing this in the socket struct was "just" the easiest way to get a proof-of-concept running for various reasons... While the PCP is part of the (extended) Ethernet Header, there are no setsockopt Protos that deal with data on that level...

So, how about moving the PCP to become part of the inpcb, and use IPPROTO sockopts instead. Sounds like that may be the better approach; doing this in the socket struct was "just" the easiest way to get a proof-of-concept running for various reasons... While the PCP is part of the (extended) Ethernet Header, there are no setsockopt Protos that deal with data on that level...

That does sound better. I'd still suggest guarding it with a check of a flag or a bit that's in a cacheline which is already hot, because if you start checking a cold cacheline of the inp, its still going to have overhead. But if the cacheline you're checking is already hot, then its "free". Thank you!

  • move VLAN_PCP to INPCB, and move sockopt to IP_* / IPV6_* from SO_VLAN_PCP
rscheff retitled this revision from Add SO_VLAN_PCP to adjust 802.1 priority per-flow. to Add IP(V6)_VLAN_PCP to adjust 802.1 priority per-flow..Oct 1 2020, 12:37 PM
rscheff edited the summary of this revision. (Show Details)
rscheff edited the test plan for this revision. (Show Details)

Moved the setsockopt/getsockopt from socket level to IP_PROTO level (IPV6_PROTO, the numeric ids are identical), which addresses the concerns about the cold cacheline hit (inp->inp_flags2 should be hot in the ip_output path. Also, this change alignes slightly better with where the Ethernet PCP "lives" (as marking on ethernet packets, not internal unix sockets, which IMHO is more close to an IP packet ).

Finally, splitting off the changes to ping/ping6, which would demonstrate the use of the new setsockopt into another Diff.

Thank you for making these changes.

This revision is now accepted and ready to land.Oct 8 2020, 2:42 PM