
netinet6: fix interface handling for loopback traffic

Authored by melifaro on Jul 6 2022, 4:02 PM.




Currently, processing of IPv6 local traffic is partially broken:

## Link-local fails:
ifconfig vtnet0 inet6 | awk '$2~/fe80::/{print$2}'
 telnet fe80::5054:ff:fe42:fef%vtnet0 22
Trying fe80::5054:ff:fe42:fef...

## Global unicast connect() takes 3 seconds:

time echo | nc `ifconfig vtnet0 inet6 | awk '$2~/^2a01/{print$2}'` 22
SSH-2.0-OpenSSH_9.0 FreeBSD-20220415
Invalid SSH identification string.
echo  0,00s user 0,00s system 62% cpu 0,004 total
nc `ifconfig vtnet0 inet6 | awk '$2~/^2a01/{print$2}'` 22  0,00s user 0,00s system 0% cpu 3,515 total

There are multiple underlying issues. First, let's take a short survey of what is happening.
Unlike IPv4, IPv6 has a native concept of scopes (non-overlapping zones to which an address can belong). One such scope is the link-local scope: a link-local address is only "valid" within its link. Traditionally we shortcut traffic to local addresses via the loopback interface, using host routes. This approach offers better performance than looping such traffic through the L2 output path (or the actual NIC). To support this shortcut for IPv6 packets with a link-local source or destination, the original zone or interface must somehow be propagated to the loopback input, so that ip6_input() can work properly.
The interface propagation is implemented as follows: ip6_output() determines both ifp and origifp for the passed mbuf. The former is the "transmit" interface, the one whose if_output() method will be called. The latter is the "original" interface, the one the IPv6 packet source belongs to.
For example, if one wants to send a packet to the fe80::1 address that is assigned to vtnet0, ifp would be lo0 (the result of a routing table lookup) and origifp would be vtnet0. For non-local destinations, origifp would be the same as ifp.
Both ifp and origifp are passed to ip6_output_send(), then to nd6_output_ifp(), then to if_output() (which is looutput() in this case), and finally to if_simloop(). The latter does two things before passing the packet to netisr: (a) it updates the mbuf rcvif to origifp and (b) it sets the M_LOOP flag for IPv6 packets. Thus, the newly-received packet has the "original" interface set as its receive interface and a special flag that allows checking whether it was received locally.
ip6_input() does tons of checks to ensure that the packet is "valid" and can be accepted. One such check is the recently-added source address validation, which rejects packets from local sources. Problem#1 is that this check relies on the interface type (loopback) instead of the M_LOOP flag to exclude a packet from source checking. This breaks link-local processing.
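
To make Problem#1 concrete, here is a minimal userland sketch of the two steps attributed to if_simloop() above and of the two possible exemption tests. It is not kernel code: struct toy_ifnet, struct toy_mbuf, TOY_M_LOOP and all toy_* functions are hypothetical stand-ins invented for illustration, and the real ip6_input() logic is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct ifnet and struct mbuf. */
struct toy_ifnet {
	const char *name;
	bool is_loopback;
};

#define TOY_M_LOOP 0x1	/* models the M_LOOP mbuf flag */

struct toy_mbuf {
	const struct toy_ifnet *rcvif;
	int flags;
};

/*
 * Models the two steps described for if_simloop():
 * (a) record origifp as the receive interface,
 * (b) mark the packet as looped back locally.
 */
void
toy_simloop(struct toy_mbuf *m, const struct toy_ifnet *origifp)
{
	assert(m != NULL && origifp != NULL);
	m->rcvif = origifp;
	m->flags |= TOY_M_LOOP;
}

/* Exemption keyed on the receive interface type (the Problem#1 behavior). */
bool
toy_sav_exempt_by_iftype(const struct toy_mbuf *m)
{
	assert(m != NULL && m->rcvif != NULL);
	return (m->rcvif->is_loopback);
}

/* Exemption keyed on the loop flag (the direction of the proposed fix). */
bool
toy_sav_exempt_by_flag(const struct toy_mbuf *m)
{
	assert(m != NULL);
	return ((m->flags & TOY_M_LOOP) != 0);
}
```

In this model, a packet looped with origifp=vtnet0 is not exempted by the interface-type test (its rcvif is vtnet0, not a loopback interface), while the flag-based test still recognizes it as local.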

Problem#2 is that TCP connect() takes 3 seconds to complete. The underlying reason is that ip6_output() behaves inconsistently between the cached and non-cached route selection paths. The following example, based on dtrace output, provides more details:

13:58 [0] m@devel2 ifconfig vtnet0 inet6
	inet6 2a01:4f8:13a:70c:ffff::8 prefixlen 96

telnet 2a01:4f8:13a:70c:ffff::8 22
## dtrace probe checking ifp & origifp @ ip6_output_send():
# First TCP SYN:
* TX ifp=lo0 origifp=vtnet0 src=2a01:4f8:13a:70c:ffff::8
# Second TCP SYN:
* TX ifp=lo0 origifp=vtnet0 src=2a01:4f8:13a:70c:ffff::8
# Third TCP SYN:
* TX ifp=lo0 origifp=lo0 src=2a01:4f8:13a:70c:ffff::8

What happens here?
The first trace (ifp=lo0, origifp=vtnet0) is exactly what is expected: the transmit interface is loopback, and the original interface is properly retained.
However, this result is achieved in a non-obvious manner. In the middle of ip6_output(), at the routing lookup phase, in6_selectroute() is called.
It returns the correct nexthop, specific to the address in question (2a01:4f8:13a:70c:ffff::8), with the proper nh_ifp=lo0 and nh_aifp=vtnet0. Surprisingly, the ifp returned by in6_selectroute() is vtnet0 instead of the expected lo0. (In fact, in6_selectroute() explicitly returns nh_aifp on a successful lookup.)
The interface is changed once again in the "Check for valid scope ID" section: origifp is set to ifp, and ifp is set to ia->ia_ifp. Here ia is derived from the same nexthop and is currently ::1 for such routes, but that can change in the future, so this looks pretty fragile. Finally, the packet gets dropped by the source address validation, as origifp is not loopback.
The second result looks exactly like the first, so we jump straight to the third result. There, origifp suddenly becomes lo0. This happens because the nexthop finally gets cached in the inpcb and the call to in6_selectroute() is avoided. Thus, ifp starts as lo0, and the machinery described above results in both origifp and ifp becoming lo0.
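
The difference between the two paths can be summarized with a small userland model. This is only a sketch of the behavior as described in the text above, not the actual ip6_output() code: struct toy_nhop, its ia_ifp field, and toy_pick_ifps() are hypothetical names, and the real function handles many more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet. */
struct toy_ifnet {
	const char *name;
};

/* Hypothetical stand-in for the nexthop of a local host route. */
struct toy_nhop {
	const struct toy_ifnet *nh_ifp;		/* transmit interface */
	const struct toy_ifnet *nh_aifp;	/* interface owning the address */
	const struct toy_ifnet *ia_ifp;		/* models ia->ia_ifp */
};

/*
 * Models the pre-fix flow described above.
 * Non-cached path: the lookup hands back nh_aifp as the starting ifp.
 * Cached path: the lookup is skipped and ifp starts as nh_ifp.
 * Scope ID step: origifp takes the starting ifp, ifp becomes ia->ia_ifp.
 */
void
toy_pick_ifps(const struct toy_nhop *nh, bool cached,
    const struct toy_ifnet **ifp, const struct toy_ifnet **origifp)
{
	const struct toy_ifnet *start;

	assert(nh != NULL && ifp != NULL && origifp != NULL);
	start = cached ? nh->nh_ifp : nh->nh_aifp;
	*origifp = start;
	*ifp = nh->ia_ifp;
}
```

With nh_ifp=lo0, nh_aifp=vtnet0 and ia_ifp=lo0, the non-cached call yields ifp=lo0, origifp=vtnet0 (the first two SYNs), while the cached call yields ifp=lo0, origifp=lo0 (the third SYN), matching the trace.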
Why is the nexthop not cached immediately? Because (Problem#3) in6_selectroute() (and the underlying selectroute()) updates the nexthop in the provided inpcb route, but does not update the route generation id (inp_rt_cookie). The next validation simply wipes the cached nexthop, as the route generation id is wrong.
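
The caching failure can be illustrated with a toy generation-id cache. This is a sketch under stated assumptions: struct toy_route_cache and the toy_* functions are hypothetical and only mimic the relationship between a cached nexthop and inp_rt_cookie; the variant that stores the cookie is shown for contrast and is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the route cached in an inpcb. */
struct toy_route_cache {
	const void *nh;		/* cached nexthop, NULL when empty */
	uint32_t cookie;	/* generation id recorded with the nexthop */
};

/*
 * Wipes the cached nexthop when its recorded generation id differs from
 * the current one. Returns true if a usable nexthop remains cached.
 */
bool
toy_cache_validate(struct toy_route_cache *rc, uint32_t cur_gen)
{
	assert(rc != NULL);
	if (rc->nh != NULL && rc->cookie != cur_gen)
		rc->nh = NULL;
	return (rc->nh != NULL);
}

/* Models Problem#3: the nexthop is stored, the cookie is left stale. */
void
toy_lookup_no_cookie(struct toy_route_cache *rc, const void *nh)
{
	assert(rc != NULL);
	rc->nh = nh;
}

/* Hypothetical contrast case: nexthop and cookie are stored together. */
void
toy_lookup_with_cookie(struct toy_route_cache *rc, const void *nh,
    uint32_t cur_gen)
{
	assert(rc != NULL);
	rc->nh = nh;
	rc->cookie = cur_gen;
}
```

In this model, a nexthop stored without refreshing the cookie is discarded by the very next validation, so the following packet has to go through the full lookup again.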

Proposed solution

The first problem is addressed by checking the M_LOOP flag instead of the interface type (patch by @glebius).

The second problem is handled by consistently filling ifp / origifp for each route lookup condition. Note that this diff depends on the two previous diffs in the stack, which guarantee that in6_selectroute() always returns ifp when successful.

The third problem is not addressed here and is left for future work.

Test Plan


...[gu-no_sav]  ->  passed  [0.281s]
[gu-sav]  ->  failed: /usr/tests/sys/netinet6/ AssertionError  [3.548s]
[ll-no_sav]  ->  passed  [0.289s]
[ll-sav]  ->  failed: /usr/tests/sys/netinet6/ TimeoutError  [75.330s]
[gu-no_sav]  ->  passed  [0.280s]
[gu-sav]  ->  failed: /usr/tests/sys/netinet6/ timeout  [2.337s]
[ll-no_sav]  ->  passed  [0.275s]
[ll-sav]  ->  failed: /usr/tests/sys/netinet6/ timeout  [2.352s]


16:54 [0] m@devel2 s kyua test -k /usr/tests/sys/netinet6/Kyuafile
->  passed  [0.453s]
->  passed  [0.442s]
[empty]  ->  passed  [0.423s]
[ifsame]  ->  passed  [0.410s]
[ipandif]  ->  passed  [0.492s]
[iponly1]  ->  passed  [0.451s]
[nolocalip]  ->  passed  [0.432s]
->  passed  [0.441s]
[gu-no_sav]  ->  passed  [0.327s]
[gu-sav]  ->  passed  [0.290s]
[ll-no_sav]  ->  passed  [0.344s]
[ll-sav]  ->  passed  [0.299s]
[lo-no_sav]  ->  passed  [0.296s]
[lo-sav]  ->  passed  [0.288s]
[gu-no_sav]  ->  passed  [0.295s]
[gu-sav]  ->  passed  [0.287s]
[ll-no_sav]  ->  passed  [0.283s]
[ll-sav]  ->  passed  [0.296s]
[lo-no_sav]  ->  passed  [0.294s]
[lo-sav]  ->  passed  [0.308s]
[ff02]  ->  passed  [0.422s]
[ff05]  ->  passed  [0.440s]
[ff08]  ->  passed  [0.414s]
[ff0e]  ->  passed  [0.481s]
->  passed  [0.454s]

Diff Detail

rG FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

melifaro retitled this revision from netinet6: simplify interface handling for loopback traffic to netinet6: fix interface handling for loopback traffic. Jul 6 2022, 4:58 PM
melifaro edited the summary of this revision.
melifaro edited the test plan for this revision.
melifaro added reviewers: network, bz, glebius, ae.

Good explanation. It would be nice to have something similar somewhere in comments.


typo: s/ifp\-/ifp=/

This revision is now accepted and ready to land. Jul 7 2022, 12:28 PM