netinet6: streamline scope6 checks for loopback traffic in ip6_output().
Needs ReviewPublic

Authored by melifaro on Tue, May 3, 1:53 PM.


Current ip6_output() behaviour is not consistent across cached and non-cached lookup versions (followup of D18769).
For example, TCP retransmits (and, to some extent, normal TCP traffic) for local connections look like the following:

13:58 [0] m@devel2 ifconfig vtnet0 inet6
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	inet6 fe80::5054:ff:fe14:e319%vtnet0 prefixlen 64 scopeid 0x1
	inet6 2a01:4f8:13a:70c:ffff::8 prefixlen 96

telnet 2a01:4f8:13a:70c:ffff::8 22
## dtrace probe checking ifp & origifp @ ip6_output_send():
# First TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Second TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Third TCP SYN:
* TX ifp=lo0 origifp=lo0 2a01:4f8:13a:70c:ffff::8

Apart from being inconsistent, it also adds complexity to the recently-added source address validation (D32915).

So, what happens here?
Let's start with a bit of background: what is origifp and why is it needed?
As opposed to the IPv4 world (mostly), IPv6 has a concept of scopes (i.e. non-overlapping zones which an address can belong to). One such scope is the link-local scope (i.e. a link-local address is only "valid" within the link). Traditionally we shortcut traffic to local addresses via the loopback interface, instead of relying on the L2 output route (or the actual NIC) to do the loop. In order to support this shortcut for IPv6 link-local addresses, one needs to somehow pass the original zone/interface to the loopback input, so that ip6_input() can work properly. This is what origifp is used for: passing the "address" interface. For the sake of simplicity, it is used for all IPv6 traffic, not just link-local traffic.
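The role of origifp described above can be illustrated with a small user-space toy model. This is only a sketch, not kernel code: `struct toy_ifnet`, `struct toy_pkt` and `toy_input_zone()` are hypothetical names invented for illustration, and the interface index is assumed to double as the link-local zone id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct ifnet. */
struct toy_ifnet {
	const char	*name;
	uint32_t	 index;		/* assumed to double as the zone id */
	bool		 loopback;
};

/* Hypothetical per-packet metadata carried from output to input. */
struct toy_pkt {
	const struct toy_ifnet	*ifp;		/* transmit interface */
	const struct toy_ifnet	*origifp;	/* interface owning the address */
};

/*
 * Zone the input side should use for a link-local destination.
 * When the packet was shortcut via loopback, the zone comes from the
 * original ("address") interface, not from the loopback interface.
 */
static uint32_t
toy_input_zone(const struct toy_pkt *p)
{
	assert(p != NULL && p->ifp != NULL);
	if (p->ifp->loopback && p->origifp != NULL)
		return (p->origifp->index);
	return (p->ifp->index);
}
```

With ifp=lo0 and origifp=vtnet0 the model yields the zone of vtnet0; with both set to lo0 it yields the zone of lo0, which is why losing the original interface matters for link-local traffic.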

Let's look into the dtrace results once again. The first one (ifp=lo0, origifp=vtnet0) is exactly what is expected - transmit interface is loopback, and the original interface is properly retained.
However, this result is achieved in a non-obvious manner. In the middle of ip6_output(), at the routing lookup phase, in6_selectroute() is called.
It returns the correct nexthop, specific to the address in question (2a01:4f8:13a:70c:ffff::8), with proper nh_ifp=lo0 & nh_aifp=vtnet0. Surprisingly, the ifp returned by in6_selectroute() is vtnet0 instead of the expected lo0. (In fact, in6_selectroute() explicitly returns nh_aifp in case of a successful lookup.)
It is changed once again in the "Check for valid scope ID" section: origifp becomes ifp, and ifp is set to ia->ia_ifp. The ia here is derived from the same nexthop and is currently ::1 for such routes, but that can change in the future, so it looks pretty fragile.

The second result looks exactly like the first, so we jump straight to the third result. There the origifp suddenly becomes lo0. This happens because the nexthop finally gets cached in the inpcb and the call to in6_selectroute() is avoided. Thus, ifp starts as lo0 and the machinery described above results in both origifp and ifp becoming lo0.

Why is the nexthop not cached immediately? Because in6_selectroute() (and the underlying selectroute()) updates the nexthop in the provided inpcb route, but does not update the route generation id (inp_rt_cookie). The next validation simply wipes the cached nexthop, as the route generation id is wrong.
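The caching behaviour described above can be reproduced with a minimal user-space model. This is a sketch under stated assumptions, not the kernel implementation: all `toy_*` names are hypothetical, and the model additionally assumes that validation refreshes the stored generation id when it wipes a stale nexthop (the summary above does not state this, but it would explain why the third send finally hits the cache).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nexthop: transmit interface and "address" interface. */
struct toy_nhop {
	const char	*nh_ifp;
	const char	*nh_aifp;
};

/* Hypothetical per-connection route cache (think inpcb route + cookie). */
struct toy_pcb_route {
	const struct toy_nhop	*nh;		/* cached nexthop or NULL */
	uint32_t		 cookie;	/* generation id last seen */
};

/*
 * Wipe the cached nexthop if the generation id does not match.
 * ASSUMPTION: the stored id is refreshed at the same time.
 */
static void
toy_validate(struct toy_pcb_route *ro, uint32_t rt_gen)
{
	if (ro->nh != NULL && ro->cookie != rt_gen) {
		ro->nh = NULL;
		ro->cookie = rt_gen;
	}
}

/* Lookup that stores the nexthop but leaves the cookie untouched. */
static void
toy_lookup(struct toy_pcb_route *ro, const struct toy_nhop *nh)
{
	assert(nh != NULL);
	ro->nh = nh;
}

/* Returns true if this send used the cached nexthop. */
static bool
toy_send(struct toy_pcb_route *ro, const struct toy_nhop *nh, uint32_t rt_gen)
{
	toy_validate(ro, rt_gen);
	if (ro->nh != NULL)
		return (true);
	toy_lookup(ro, nh);
	return (false);
}
```

Starting from an empty cache with a stale cookie, three consecutive calls to `toy_send()` with an unchanged generation id take the lookup path twice and the cached path once, mirroring the three SYNs in the dtrace output. Having the lookup also store the current generation id would make the second send hit the cache.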

Proposed solution

The proposed idea is relatively simple and is composed of two actions. The first is explicitly filling in the proper ifp and origifp at the route lookup stage. The second is simplifying the source/destination scope ID checks, as no actions other than pass/fail are expected.
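The two actions could look roughly like the sketch below. It is an illustration only and not the code in this diff: the `toy_*` names are hypothetical, interfaces are reduced to name strings, and the scope check assumes that a zone id of 0 means "no zone embedded in the address".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nexthop: transmit interface and "address" interface. */
struct toy_nhop {
	const char	*nh_ifp;
	const char	*nh_aifp;
};

/* Hypothetical output state kept for the rest of the send path. */
struct toy_out {
	const char	*ifp;
	const char	*origifp;
};

/*
 * Action 1: take both interfaces straight from the nexthop at lookup
 * time, so cached and non-cached paths end up with the same values.
 */
static void
toy_fill_ifps(const struct toy_nhop *nh, struct toy_out *out)
{
	assert(nh != NULL && out != NULL);
	out->ifp = nh->nh_ifp;
	out->origifp = nh->nh_aifp;
}

/*
 * Action 2: a scope check that only answers pass/fail.
 * ASSUMPTION: zone 0 means the address carries no zone.
 */
static bool
toy_scope_ok(uint32_t addr_zone, uint32_t origifp_zone)
{
	return (addr_zone == 0 || addr_zone == origifp_zone);
}
```

For the nexthop from the example above (nh_ifp=lo0, nh_aifp=vtnet0) this gives ifp=lo0 and origifp=vtnet0 on every send, and the scope check no longer rewrites either interface.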

Diff Detail

rS FreeBSD src repository - subversion
Lint OK
No Unit Test Coverage
Build Status
Buildable 45455
Build 42343: arc lint + arc unit

Event Timeline

melifaro retitled this revision from "netinet6: streamline scope6 checks." to "netinet6: streamline scope6 checks for loopback traffic in ip6_output()." Tue, May 3, 2:45 PM
melifaro edited the summary of this revision.

I am not sure I can currently review this technically in full, but we fixed a similar issue in the code you are currently changing in ef0111fdf364e4e87b522025b13aad69067c3fe6.

When origifp was introduced in 686cdd19b1b18, there was a special check for IFF_LOOPBACK all along to deal with this. a1f7e5f8ee7fe then changed it into a check that uses the interface of our own address, if available, as origifp. Since then, too many edits have had an impact and introduced subtle behaviour changes.

From all I am reading we are talking about multiple problems, and I am with @glebius from the email thread in that the 3rd lookup seems wrong. With no route changes, the results should be correct and deterministic whether cached or not. I am not sure if that problem was addressed in the first place, but that should be one dedicated change.

Second, and only from there, I'd go after the scope zone checks as a 2nd fix, though I haven't yet convinced myself that that is correct either.

Third, given glebius also has a patch for ip6_input, the problem seems even bigger and means more things got broken along the way. Maybe it would be good to find out where this breakage was introduced, as that's not clear to me yet.
The fact that m->m_pkthdr.rcvif may or may not actually be lo0 may be unexpected. Should rcvif be updated somewhere in this case too? This may also affect packet filtering in firewalls. There used to be a sysctl for this behaviour, nd6_useloopback, introduced in 82cd038d51e2f (and useloopback for IPv4). Sadly that got nuked along the way after 10-CURRENT/2012-04, as it was great for testing this stuff.

> % sysctl net.inet6.icmp6.nd6_useloopback
> net.inet6.icmp6.nd6_useloopback: 1

I would also kind of like to know which 4 packets do not show up in tcpdump on vtnet0 but are counted by ipfw on vtnet0:

# ipfw show
00100 0 0 allow log ip6 from any to any via lo0
00200 0 0 allow log ip6 from any to any via vtnet0
65535 0 0 deny ip from any to any
# telnet 2001:db8::1 22
Trying 2001:db8::1...
Connected to 2001:db8::1.
Escape character is '^]'.
SSH-2.0-OpenSSH_9.0 FreeBSD-20220415
telnet> quit
Connection closed.
# jobs
[2]  + Running                       tcpdump -ln -s0 -i vtnet0 -v
# fg
tcpdump -ln -s0 -i vtnet0 -v
0 packets captured
0 packets received by filter
0 packets dropped by kernel
# ipfw show
00100 22 1784 allow log ip6 from any to any via lo0
00200  4  380 allow log ip6 from any to any via vtnet0
65535  0    0 deny ip from any to any

Does this mean that rcvif is updated and @glebius' change from the email thread might not be required? Has anyone validated this?


If you can guarantee (further down) that ifp is always set here, then it would probably be a good change (missed at an earlier time) to remove the if() here first, as it'll make the follow-up changes a lot more logical (i.e. the missing NULL check further down, in 716, for the mtu).


That probably wants a #define elsewhere? But then GLOBAL is 0xe ... so either the 0xf or the comment are misleading...