
Add GRE-in-UDP encapsulation support
Closed, Public

Authored by ae on Apr 16 2019, 11:14 AM.

Details

Summary

This patch adds support for GRE-in-UDP encapsulation to if_gre(4) as defined in RFC8086.

I did some tests to see the CPU load difference between GRE and GRE-in-UDP. I used an Ixia packet generator to create many packet flows.
The ingress side uses a Mellanox mlx5 card. With plain GRE, a single CPU core is loaded up to 80%; at the same packet rate with udpencap enabled, the load is spread across at least 6 CPU cores at 10-20% each.

How it is implemented: when a user enables UDP encapsulation with the command ifconfig gre0 udpencap, the driver creates a kernel socket that binds to the tunnel source address and, after udp_set_kernel_tunneling(), starts receiving all UDP packets destined to port 4754. Each kernel socket maintains a list of tunnels with different destination addresses. Thus, when several tunnels use the same source address, they are all handled by a single socket.
The IP[V6]_BINDANY socket option is used so that the socket can be bound to the source address even if it is not yet available in the system. This may happen at system boot, when the gre(4) interface is created before the source address becomes available. Because ip_encap_register_srcaddr() is used, the tunnel will not send packets until the address becomes available. And since the address is not yet configured, there is no chance of the tunnel receiving UDP packets.
The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets.
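
The receive-side lookup described above (one socket per tunnel source address, each holding a list of tunnels keyed by destination address) can be illustrated with a small userland sketch. This is not the code from the patch: the struct and function names (struct tunnel, struct gre_socket, gre_udp_lookup) are hypothetical, addresses are plain 32-bit IPv4 values in host order, and simple linked lists stand in for whatever containers the driver actually uses.

```c
#include <stddef.h>
#include <stdint.h>

#define GRE_UDPPORT 4754 /* GRE-in-UDP destination port (RFC 8086) */

/* Hypothetical tunnel entry: identified by its remote (destination) address. */
struct tunnel {
	uint32_t dst;            /* tunnel destination address */
	const char *ifname;      /* e.g. "gre0" */
	struct tunnel *next;
};

/* Hypothetical per-source-address socket holding all tunnels that share it. */
struct gre_socket {
	uint32_t src;            /* tunnel source (local) address */
	struct tunnel *tunnels;
	struct gre_socket *next;
};

/*
 * Find the tunnel for an incoming UDP packet. The packet's destination
 * address must match a socket's local address, and the packet's source
 * address must match one of that socket's tunnel destinations.
 * Returns NULL when nothing matches.
 */
static const struct tunnel *
gre_udp_lookup(const struct gre_socket *socks, uint32_t pkt_src,
    uint32_t pkt_dst, uint16_t pkt_dport)
{
	const struct gre_socket *s;
	const struct tunnel *t;

	if (pkt_dport != GRE_UDPPORT)
		return (NULL);
	for (s = socks; s != NULL; s = s->next) {
		if (s->src != pkt_dst)
			continue;
		for (t = s->tunnels; t != NULL; t = t->next)
			if (t->dst == pkt_src)
				return (t);
	}
	return (NULL);
}
```

With two tunnels sharing source 10.0.0.1, a packet from 10.0.0.3 to 10.0.0.1 on port 4754 resolves to the tunnel whose destination is 10.0.0.3, while a packet from an unknown peer or to another port resolves to nothing.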

gre_transmit() uses the gre_flowid() function to generate an entropy value for the UDP source port. For now it is a simple XOR of the src and dst IP addresses. For IPv6, this value is also set in the flow label field. If the RSS option is enabled, the rss_hash_ip[46]_2tuple() functions are used instead.
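
As a rough illustration of the entropy idea, here is a minimal userland sketch that XORs two IPv4 addresses and maps the result into the 49152-65535 range that RFC 8086 recommends for the UDP source port. It is not the gre_flowid() from the patch: the helper names (flowid_xor, flowid_to_sport) are hypothetical, and folding the upper 16 bits into the lower ones is an assumption made here so that the whole address contributes to the 14 usable bits.

```c
#include <stdint.h>

/* RFC 8086 section 3.2.1: ports 49152..65535 have the top two bits set. */
#define GRE_UDP_PORT_MIN  0xC000u
#define GRE_UDP_PORT_MASK 0x3FFFu /* fourteen bits of entropy */

/* Hypothetical helper: XOR of IPv4 source and destination (host order). */
static uint32_t
flowid_xor(uint32_t src, uint32_t dst)
{
	return (src ^ dst);
}

/*
 * Hypothetical helper: fold a 32-bit flow id into 14 bits and place it
 * in the ephemeral port range. The same flow id always yields the same
 * port, so packets of one flow are not reordered across paths.
 */
static uint16_t
flowid_to_sport(uint32_t flowid)
{
	uint32_t folded = flowid ^ (flowid >> 16);

	return ((uint16_t)(GRE_UDP_PORT_MIN | (folded & GRE_UDP_PORT_MASK)));
}
```

For the tunnel endpoints in the usage example, 10.0.0.1 and 10.0.0.2, the XOR is 3 and the resulting source port is 0xC003 (49155).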

Usage example:

# ifconfig gre0 create
# ifconfig gre0 inet tunnel 10.0.0.1 10.0.0.2 udpencap
# ifconfig gre0 inet 192.168.0.1/24 192.168.0.2
# ping 192.168.0.2

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

Document GRE-in-UDP in gre(4).

ae added a reviewer: network.
bz requested changes to this revision. Apr 19 2019, 4:10 PM
bz added inline comments.
sys/conf/config.mk
33 ↗(On Diff #56244)

This seems like noise in this change; does it belong here?

sys/conf/kern.opts.mk
56 ↗(On Diff #56244)

Equally noise that does not belong here? Please commit it separately up front and then update this change again.

This revision now requires changes to proceed. Apr 19 2019, 4:10 PM

I like the idea.

Speaking of load balancing, there are cases when more control is needed. lagg(4) offers a good example with its "lagghash" option for ifconfig(8). Something like its l3/l4 values may be useful, but that can wait for later updates.

sbin/ifconfig/ifgre.c
91 ↗(On Diff #56244)

Please consider using sysctl net.inet.ip.portrange.hifirst and net.inet.ip.portrange.hilast instead of embedding magic constants 0xC000 and 0xFFFF.

And maybe issue a warning if value is outside of the range but still process it.

sys/net/if_gre.c
296 ↗(On Diff #56244)

Same here, consider using V_ipport_hifirstauto and V_ipport_hilastauto (sys/netinet/in_pcb.c) instead of magic constants.

745 ↗(On Diff #56244)

V_ipport_hifirstauto ?

sys/conf/kern.opts.mk
56 ↗(On Diff #56244)

This is mostly for testing; it seems hard to compile the module with the WITH_RSS/WITHOUT_RSS make options without such a change.

sys/net/if_gre.c
296 ↗(On Diff #56244)

I implemented this as required by the RFC.

745 ↗(On Diff #56244)

https://tools.ietf.org/html/rfc8086#section-3.2.1

GRE-in-UDP permits the UDP source port value to be used to encode an
entropy value.  The UDP source port contains a 16-bit entropy value
that is generated by the encapsulator to identify a flow for the
encapsulated packet.  The port value SHOULD be within the ephemeral
port range, i.e., 49152 to 65535, where the high-order two bits of
the port are set to one.  This provides fourteen bits of entropy for
the inner flow identifier.  In the case that an encapsulator is
unable to derive flow entropy from the payload header or the entropy
usage has to be disabled to meet operational requirements (see
Section 7), to avoid reordering with a packet flow, the encapsulator
SHOULD use the same UDP source port value for all packets assigned to
a flow, e.g., the result of an algorithm that performs a hash of the
tunnel ingress and egress IP address.
  • remove RSS-related chunks.
  • allow the use of any port number within the [V_ipport_hifirstauto, V_ipport_hilastauto] range.
ae marked 3 inline comments as done. Apr 22 2019, 11:18 AM
melifaro added inline comments.
sys/netinet/ip_gre.c
370 ↗(On Diff #56477)

Shouldn't mtu be changed here as well?

sys/netinet/ip_gre.c
370 ↗(On Diff #56477)

There was a period when gre(4) interfaces adjusted the MTU automatically when some GRE options were added or the IP version of the outer header changed. But then users started complaining that this was unexpected and that they wanted to calculate the MTU themselves. So I reverted it.

sys/netinet/ip_gre.c
370 ↗(On Diff #56477)

I agree with the statement about manual MTU adjustment: it shouldn't be the kernel's job. However, currently a gre tunnel is created with mtu 1476 by default, which works for the most common use case. If we know at creation time that we're going to use an additional UDP header, shouldn't we have a better default?

sys/netinet/ip_gre.c
370 ↗(On Diff #56477)

If an administrator knows that gre-in-udp will be used, the corresponding mtu can be configured at creation time :)
1476 is just the historical default.

sys/netinet/ip_gre.c
370 ↗(On Diff #56477)

Yeah, there is always "you can configure everything yourself and you WILL configure everything yourself"-style way of doing things :-). That's certainly not a blocker for the change and can be discussed/addressed separately.
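
For reference, the arithmetic behind the 1476 default discussed in this thread can be sketched as follows. This is only an illustration, not code from the patch: the function name gre_tunnel_mtu is hypothetical, and the sketch assumes a 1500-byte link MTU, a 4-byte base GRE header with no key, sequence, or checksum options, and outer IP headers without options or extension headers.

```c
/* Header sizes in bytes; the GRE size assumes no optional fields. */
enum {
	IPV4_HDR = 20,
	IPV6_HDR = 40,
	UDP_HDR  = 8,
	GRE_HDR  = 4
};

/*
 * Hypothetical helper: largest inner packet that fits in link_mtu once
 * the outer IP header, the optional UDP header, and the GRE header are
 * added.
 */
static int
gre_tunnel_mtu(int link_mtu, int outer_is_ipv6, int udpencap)
{
	int overhead = (outer_is_ipv6 ? IPV6_HDR : IPV4_HDR) + GRE_HDR;

	if (udpencap)
		overhead += UDP_HDR;
	return (link_mtu - overhead);
}
```

Under these assumptions plain GRE over IPv4 gives 1500 - 20 - 4 = 1476, GRE-in-UDP over IPv4 gives 1468, and GRE-in-UDP over IPv6 gives 1448, so an administrator following the advice above might set the value explicitly, for example with ifconfig gre0 mtu 1468.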

This revision was not accepted when it landed; it landed in state Needs Review. Apr 24 2019, 9:05 AM
This revision was automatically updated to reflect the committed changes.