Change Details

## Overview

Currently, processing of IPv6 local traffic is partially broken:

```
## Link-local fails:
ifconfig vtnet0 inet6 | awk '$2~/fe80::/{print$2}'
fe80::5054:ff:fe42:fef%vtnet0
telnet fe80::5054:ff:fe42:fef%vtnet0 22
Trying fe80::5054:ff:fe42:fef...
^C

## Global unicast connect() takes 3 seconds:
time echo | nc `ifconfig vtnet0 inet6 | awk '$2~/^2a01/{print$2}'` 22
SSH-2.0-OpenSSH_9.0 FreeBSD-20220415
Invalid SSH identification string.
echo 0,00s user 0,00s system 62% cpu 0,004 total
nc `ifconfig vtnet0 inet6 | awk '$2~/^2a01/{print$2}'` 22 0,00s user 0,00s system 0% cpu 3,515 total
```

There are multiple underlying issues. First, let's take a short survey of what's happening.

As opposed to the IPv4 world, IPv6 has a native concept of `scopes` (i.e. non-overlapping zones to which an address can belong). One such scope is the //link-local// scope (i.e. a link-local address is only "valid" within the link).

Traditionally, we shortcut traffic to local addresses via the loopback interface, using host routes. This approach offers better performance than relying on the L2 output path (or the actual NIC) to perform such a loop.

In order to support this shortcut for IPv6 packets with a link-local source/destination, one needs to somehow propagate the original zone or interface to the loopback input, so that `ip6_input()` can work properly. The implementation of that interface propagation looks as follows:

`ip6_output()` determines both `ifp` and `origifp` for the passed mbuf. The former is the "transmit" interface: the one whose `if_output()` method will be called. The latter is the "original" interface: the one to which the IPv6 packet source belongs. For example, if one wants to send a packet to the `fe80::1` address that is assigned to `vtnet0`, `ifp` would be `lo0` (the result of a routing table lookup) and `origifp` would be `vtnet0`. For non-local traffic, `origifp` would be the same as `ifp`.
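
The `ifp` / `origifp` relationship described above can be illustrated with a minimal userland sketch. This is not the kernel code: `struct sketch_ifnet`, `struct sketch_ifpair` and `sketch_pick_ifps()` are hypothetical names invented for illustration, and the real selection logic in `ip6_output()` is considerably more involved.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct ifnet. */
struct sketch_ifnet {
	const char *name;
};

/* Hypothetical pair mirroring the ifp/origifp naming used in the text. */
struct sketch_ifpair {
	const struct sketch_ifnet *ifp;     /* "transmit" interface */
	const struct sketch_ifnet *origifp; /* "original" interface */
};

/* Two example interfaces matching the names used in the summary. */
static const struct sketch_ifnet sketch_lo0 = { "lo0" };
static const struct sketch_ifnet sketch_vtnet0 = { "vtnet0" };

/*
 * route_ifp: interface found by the routing lookup.
 * addr_ifp:  interface that owns the destination address when the
 *            destination is local, or NULL for non-local traffic.
 */
static struct sketch_ifpair
sketch_pick_ifps(const struct sketch_ifnet *route_ifp,
    const struct sketch_ifnet *addr_ifp)
{
	struct sketch_ifpair p;

	p.ifp = route_ifp;
	/* Non-local traffic: origifp is the same as ifp. */
	p.origifp = (addr_ifp != NULL) ? addr_ifp : route_ifp;
	return (p);
}
```

Under these assumptions, `sketch_pick_ifps(&sketch_lo0, &sketch_vtnet0)` models the `fe80::1` on `vtnet0` example (transmit via `lo0`, original interface `vtnet0`), while `sketch_pick_ifps(&sketch_vtnet0, NULL)` models non-local traffic where both members point to the same interface.
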
Both `ifp` and `origifp` are passed to `ip6_output_send()`, then to `nd6_output_ifp()`, then to `if_output()` (which is `looutput()` here), and finally to `if_simloop()`. The latter does two things before passing the packet to netisr: (a) updates the mbuf `rcvif` to `origifp` and (b) sets the `M_LOOP` flag for IPv6 packets. Thus, the newly-received packet has the "original" interface set as its receive interface and a special flag that allows checking whether it was received locally.

`ip6_input()` does a number of checks to ensure that the packet is "valid" and can be accepted. One of these checks is the recently-added source address validation, which rejects packets from local sources.

**Problem#1** is that this check relies on the interface type (loopback) instead of the `M_LOOP` flag to exclude the packet from source checking. This breaks link-local processing.

**Problem#2** is that TCP connect() takes 3 seconds to complete. The underlying reason is that `ip6_output()` behavior is not consistent across the cached and non-cached route selection paths. The following example, based on dtrace output, provides more details:

```
13:58 [0] m@devel2 ifconfig vtnet0 inet6
..
inet6 2a01:4f8:13a:70c:ffff::8 prefixlen 96

telnet 2a01:4f8:13a:70c:ffff::8 22
...
## dtrace probe checking ifp & origifp @ ip6_output_send():
# First TCP SYN:
* TX ifp=lo0 origifp=vtnet0 src=2a01:4f8:13a:70c:ffff::8
# Second TCP SYN:
* TX ifp=lo0 origifp=vtnet0 src=2a01:4f8:13a:70c:ffff::8
# Third TCP SYN:
* TX ifp=lo0 origifp=lo0 src=2a01:4f8:13a:70c:ffff::8
```

What happens here? The first trace (ifp=lo0, origifp=vtnet0) is exactly what is expected: the transmit interface is loopback, and the original interface is properly retained. However, this result is achieved in a non-obvious manner. In the middle of `ip6_output()`, at the routing lookup phase, `in6_selectroute()` is called. It returns the correct nexthop, specific to the address in question (`2a01:4f8:13a:70c:ffff::8`), with the proper `nh_ifp=lo0` and `nh_aifp=vtnet0`.
Surprisingly, the `ifp` returned by `in6_selectroute()` is `vtnet0` instead of the expected `lo0`. (In fact, `in6_selectroute()` explicitly returns `nh_aifp` in case of a successful lookup.) It is changed once again in the `Check for valid scope ID` section: `origifp` becomes `ifp`, and `ifp` is set to `ia->ia_ifp`. The latter `ia` is derived from the same nexthop and is currently `::1` for such routes, but this can change in the future, so it looks pretty fragile. Finally, the packet gets dropped by the source address validation, as `origifp` is not loopback.

The second result looks exactly like the first, so we jump to the **third** result for a second. There, `origifp` suddenly becomes `lo0`. This happens because the nexthop finally gets cached in the inpcb and the call to `in6_selectroute()` is avoided. Thus, `ifp` starts as `lo0`, and the machinery described above results in both `origifp` and `ifp` becoming `lo0`.

Why is the nexthop not cached immediately? Because (**Problem#3**) `in6_selectroute()` (and the underlying `selectroute()`) updates the nexthop in the provided inpcb route, but does not update the route generation id (`inp_rt_cookie`). The next validation simply wipes the cached nexthop, as the route generation id is wrong.

## Proposed solution

The first problem is addressed by checking the `M_LOOP` flag instead of the interface type (patch by @glebius).

The second problem is handled by consistently filling `ifp` / `origifp` for each route lookup condition. Note that this diff depends on the two previous diffs in the stack, which guarantee that `in6_selectroute()` always returns `ifp` when successful.

The third problem is not addressed here and is a subject for future work.
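
The difference between the two ways of exempting looped-back packets from source validation (Problem#1) can be sketched as follows. This is a simplified userland model, not the actual `ip6_input()` code: `struct sketch_pkt`, `SKETCH_M_LOOP`, the interface type enum and both functions are hypothetical and exist only to show why a packet whose `rcvif` was rewritten to `origifp` fails an interface-type test but passes a flag test.

```c
#include <stdbool.h>

/* Hypothetical flag value; the real M_LOOP is defined in kernel headers. */
#define SKETCH_M_LOOP 0x1u

/* Hypothetical interface types for the receive interface. */
enum sketch_iftype { SKETCH_IFT_ETHER, SKETCH_IFT_LOOP };

/* Hypothetical, heavily reduced view of a received packet. */
struct sketch_pkt {
	enum sketch_iftype rcvif_type; /* type of the interface in rcvif */
	unsigned int flags;            /* packet flags */
	bool src_is_local;             /* source address belongs to this host */
};

/* Check keyed on the interface type: only loopback rcvif is exempt. */
static bool
sketch_accept_by_iftype(const struct sketch_pkt *p)
{
	if (p->rcvif_type == SKETCH_IFT_LOOP)
		return (true);
	return (!p->src_is_local);
}

/* Check keyed on the flag: any packet marked as looped back is exempt. */
static bool
sketch_accept_by_mloop(const struct sketch_pkt *p)
{
	if ((p->flags & SKETCH_M_LOOP) != 0)
		return (true);
	return (!p->src_is_local);
}

/* Looped-back packet after rcvif was set to origifp (e.g. vtnet0). */
static const struct sketch_pkt sketch_looped = {
	SKETCH_IFT_ETHER, SKETCH_M_LOOP, true
};

/* Packet from the wire claiming a local source address. */
static const struct sketch_pkt sketch_from_wire = {
	SKETCH_IFT_ETHER, 0, true
};
```

In this model, `sketch_looped` is rejected by `sketch_accept_by_iftype()` but accepted by `sketch_accept_by_mloop()`, while `sketch_from_wire` is rejected by both, which is the behaviour the flag-based check is meant to preserve.
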