This patch adds support for GRE-in-UDP encapsulation to if_gre(4) as defined in RFC8086.
I ran some tests to compare the CPU load of plain GRE and GRE-in-UDP, using an Ixia packet generator to create many packet flows.
The ingress side is a Mellanox mlx5 card. With plain GRE, a single CPU core is loaded up to 80%; at the same packet rate with udpencap enabled, the load is spread over at least 6 CPU cores, each loaded up to 10-20%.
How it is implemented: when the user enables UDP encapsulation with the command ifconfig gre0 udpencap, the driver creates a kernel socket that binds to the tunnel source address and, after udp_set_kernel_tunneling(), starts receiving all UDP packets destined for port 4754. Each kernel socket maintains a list of tunnels with different destination addresses. Thus, when several tunnels use the same source address, they are all handled by a single socket.
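The socket-to-tunnel relationship described above can be sketched in plain userland C. This is only an illustration of the lookup idea, not the code from the patch: the names sketch_socket, sketch_tunnel and sketch_lookup are hypothetical, addresses are shown as IPv4 values in host byte order, and the real driver uses kernel lists and locking that are omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one tunnel, identified by its remote (destination) address. */
struct sketch_tunnel {
	uint32_t dst;
	struct sketch_tunnel *next;
};

/* Hypothetical: one socket per local (source) address, shared by its tunnels. */
struct sketch_socket {
	uint32_t src;
	struct sketch_tunnel *tunnels;
	struct sketch_socket *next;
};

/*
 * Find the tunnel for a received packet. "local" is the address the packet
 * was sent to (the tunnel source), "remote" is the address it came from
 * (the tunnel destination). Returns NULL when no tunnel matches.
 */
static struct sketch_tunnel *
sketch_lookup(struct sketch_socket *sockets, uint32_t local, uint32_t remote)
{
	struct sketch_socket *so;
	struct sketch_tunnel *t;

	for (so = sockets; so != NULL; so = so->next) {
		if (so->src != local)
			continue;
		/* Only one socket exists per source address. */
		for (t = so->tunnels; t != NULL; t = t->next)
			if (t->dst == remote)
				return (t);
		return (NULL);
	}
	return (NULL);
}
```

With this layout, two tunnels 10.0.0.1 -> 10.0.0.2 and 10.0.0.1 -> 10.0.0.3 hang off the same socket entry and are told apart only by the remote address of the incoming packet.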
The IP[V6]_BINDANY socket option is used so that the socket can be bound to the source address even if that address is not yet available in the system. This may happen at system boot, when the gre(4) interface is created before the source address becomes available. Because ip_encap_register_srcaddr() is used, the tunnel will not send packets until the address becomes available. And since the address is not yet configured, there is no chance that UDP packets will be received by the tunnel.
The encapsulation and sending of packets is done directly from gre(4) into ip[6]_output() without using sockets.
gre_transmit() uses the gre_flowid() function to generate an entropy value for the UDP source port. For now it is a simple XOR of the src and dst IP addresses. For IPv6, this value is also set in the flow label field. If the RSS option is enabled, the rss_hash_ip[46]_2tuple() functions are used instead.
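As a rough illustration of the entropy idea for IPv4, the sketch below XORs the two addresses and folds the result into a UDP source port. It is not the gre_flowid() from the patch: the helper names are hypothetical, and the folding step and the mapping into the 49152-65535 dynamic port range are my assumptions about one reasonable way to turn a 32-bit value into a port.

```c
#include <stdint.h>

/* Hypothetical helper: XOR of the IPv4 source and destination addresses. */
static uint32_t
sketch_flowid_v4(uint32_t src, uint32_t dst)
{
	return (src ^ dst);
}

/*
 * Hypothetical helper: fold the 32-bit flow id into 16 bits and map it
 * into the dynamic port range 49152..65535 (an assumed choice).
 */
static uint16_t
sketch_udp_sport(uint32_t flowid)
{
	uint16_t folded;

	folded = (uint16_t)((flowid >> 16) ^ (flowid & 0xffff));
	return ((uint16_t)(49152 + (folded % 16384)));
}
```

Because the value depends only on the address pair, all packets of one tunnel get the same source port, while tunnels with different endpoints can be spread by the NIC over different receive queues.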
Usage example:
    # ifconfig gre0 create
    # ifconfig gre0 inet tunnel 10.0.0.1 10.0.0.2 udpencap
    # ifconfig gre0 inet 192.168.0.1/24 192.168.0.2
    # ping 192.168.0.2