
Differential D19921

Add GRE-in-UDP encapsulation support
Closed, Public

Authored by ae on Apr 16 2019, 11:14 AM.

Details

Reviewers

eugen_grosbein.net
bz

Group Reviewers

manpages
network

Commits

rS346630: Add GRE-in-UDP encapsulation support as defined in RFC8086.

Summary

This patch adds support for GRE-in-UDP encapsulation to if_gre(4) as defined in RFC8086.

I ran some tests to compare CPU load between plain GRE and GRE-in-UDP, using an Ixia packet generator to create many packet flows.
The ingress side is a Mellanox mlx5 card. With plain GRE, a single CPU core is loaded at up to 80%; at the same packet rate with udpencap enabled, the load is spread across at least 6 CPU cores at 10-20% each.

How it is implemented: when the user enables UDP encapsulation with the command "ifconfig gre0 udpencap", the driver creates a kernel socket that binds to the tunnel source address and, after udp_set_kernel_tunneling(), starts receiving all UDP packets destined to port 4754. Each kernel socket maintains a list of tunnels with different destination addresses. Thus, when several tunnels use the same source address, they are all handled by a single socket.
The IP[V6]_BINDANY socket option is used so that the socket can be bound to the source address even if that address is not yet available in the system. This may happen at boot, when a gre(4) interface is created before its source address becomes available. Because ip_encap_register_srcaddr() is used, the tunnel will not send packets until the address becomes available. And since the address is not yet configured, there is no chance for UDP packets to be received by the tunnel.
The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets.

gre_transmit() uses the gre_flowid() function to generate the entropy value for the UDP source port. For now it is a simple XOR of the src and dst IP addresses. For IPv6, this value is also set in the flow label field. If the RSS option is enabled, the rss_hash_ip[46]_2tuple() functions are used instead.

Usage example:

# ifconfig gre0 create
# ifconfig gre0 inet tunnel 10.0.0.1 10.0.0.2 udpencap
# ifconfig gre0 inet 192.168.0.1/24 192.168.0.2
# ping 192.168.0.2

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

ae created this revision. Apr 16 2019, 11:14 AM

Herald added subscribers: bz, imp. Apr 16 2019, 11:14 AM

Harbormaster completed remote builds in B23692: Diff 56241. Apr 16 2019, 11:14 AM

Document GRE-in-UDP in gre(4).

Herald added a reviewer: manpages. Apr 16 2019, 12:52 PM

Harbormaster completed remote builds in B23693: Diff 56244. Apr 16 2019, 12:52 PM

ae edited the summary of this revision. Apr 16 2019, 3:25 PM

ae added a reviewer: network.

ae edited the summary of this revision. Apr 19 2019, 10:58 AM

eugen_grosbein.net added a reviewer: eugen_grosbein.net. Apr 19 2019, 3:43 PM

bz requested changes to this revision. Apr 19 2019, 4:10 PM

bz added inline comments.

sys/conf/config.mk
33 ↗	(On Diff #56244)	This seems like noise in this change; does it belong here?
sys/conf/kern.opts.mk
56 ↗	(On Diff #56244)	Likewise, this seems like noise that doesn't belong here. Please commit it separately first, then update this change.

This revision now requires changes to proceed. Apr 19 2019, 4:10 PM

I like the idea.

Speaking of load balancing, there are cases when more control is needed. lagg(4) shows a good example with the "lagghash" option of ifconfig(8). Something like its l3/l4 values may be useful, but that can wait for later updates.

sbin/ifconfig/ifgre.c
91 ↗	(On Diff #56244)	Please consider using the sysctls net.inet.ip.portrange.hifirst and net.inet.ip.portrange.hilast instead of embedding the magic constants 0xC000 and 0xFFFF. Maybe also issue a warning if the value is outside that range, but still process it.
sys/net/if_gre.c
296 ↗	(On Diff #56244)	Same here, consider using V_ipport_hifirstauto and V_ipport_hilastauto (sys/netinet/in_pcb.c) instead of magic constants.
745 ↗	(On Diff #56244)	V_ipport_hifirstauto?

ae added inline comments. Apr 19 2019, 4:35 PM

sys/conf/kern.opts.mk
56 ↗	(On Diff #56244)	This is mostly for testing; it seems hard to compile the module with the WITH_RSS/WITHOUT_RSS make options without such a change.
sys/net/if_gre.c
296 ↗	(On Diff #56244)	I implemented this as required by the RFC.
745 ↗	(On Diff #56244)	From https://tools.ietf.org/html/rfc8086#section-3.2.1: "GRE-in-UDP permits the UDP source port value to be used to encode an entropy value. The UDP source port contains a 16-bit entropy value that is generated by the encapsulator to identify a flow for the encapsulated packet. The port value SHOULD be within the ephemeral port range, i.e., 49152 to 65535, where the high-order two bits of the port are set to one. This provides fourteen bits of entropy for the inner flow identifier. In the case that an encapsulator is unable to derive flow entropy from the payload header or the entropy usage has to be disabled to meet operational requirements (see Section 7), to avoid reordering with a packet flow, the encapsulator SHOULD use the same UDP source port value for all packets assigned to a flow, e.g., the result of an algorithm that performs a hash of the tunnel ingress and egress IP address."

Remove RSS-related chunks.
Allow use of any port number within the [V_ipport_hifirstauto, V_ipport_hilastauto] range.

Harbormaster completed remote builds in B23815: Diff 56477. Apr 22 2019, 11:17 AM

ae marked 3 inline comments as done. Apr 22 2019, 11:18 AM

eugen_grosbein.net accepted this revision. Apr 23 2019, 5:42 AM

melifaro added a subscriber: melifaro. Apr 23 2019, 9:45 AM

melifaro added inline comments.

sys/netinet/ip_gre.c
370 ↗	(On Diff #56477)	Shouldn't the MTU be changed here as well?

ae added inline comments. Apr 23 2019, 10:36 AM

sys/netinet/ip_gre.c
370 ↗	(On Diff #56477)	There was a period when gre(4) interfaces did automatic MTU adjustment whenever some GRE options were enabled or the IP version of the outer header changed. But then users started to complain that this was unexpected and that they wanted to calculate the MTU themselves, so I reverted it.

melifaro added inline comments. Apr 23 2019, 2:18 PM

sys/netinet/ip_gre.c
370 ↗	(On Diff #56477)	I agree with the statement about manual MTU adjustment: it shouldn't be the kernel's job. However, a gre tunnel is currently created with an MTU of 1476 by default, which works for the most common use case. If we know at creation time that we're going to use an additional UDP header, shouldn't we have a better default?

ae added inline comments. Apr 23 2019, 3:52 PM

sys/netinet/ip_gre.c
370 ↗	(On Diff #56477)	If an administrator knows that GRE-in-UDP will be used, the corresponding MTU can be configured at creation time :) 1476 is just a historical default.

melifaro added inline comments. Apr 23 2019, 4:45 PM

sys/netinet/ip_gre.c
370 ↗	(On Diff #56477)	Yeah, there is always "you can configure everything yourself and you WILL configure everything yourself"-style way of doing things :-). That's certainly not a blocker for the change and can be discussed/addressed separately.