Page MenuHomeFreeBSD

netinet6: streamline scope6 checks for loopback traffic in ip6_output().
Changes PlannedPublic

Authored by melifaro on May 3 2022, 1:53 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 26, 1:56 AM
Unknown Object (File)
Tue, Nov 26, 1:55 AM
Unknown Object (File)
Tue, Nov 26, 1:29 AM
Unknown Object (File)
Nov 16 2024, 5:27 PM
Unknown Object (File)
Nov 11 2024, 7:59 PM
Unknown Object (File)
Sep 22 2024, 8:38 PM
Unknown Object (File)
Sep 8 2024, 3:31 PM
Unknown Object (File)
Aug 26 2024, 1:11 PM

Details

Reviewers
bz
thj
Group Reviewers
network
Summary

Overview

Current ip6_output() behaviour is not consistent across cached and non-cached lookup versions (followup of D18769).
For example, TCP retransmits (and to some extent to the normal TCP) for the local connections looks the following:

13:58 [0] m@devel2 ifconfig vtnet0 inet6
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4c04bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
	inet6 fe80::5054:ff:fe14:e319%vtnet0 prefixlen 64 scopeid 0x1
	inet6 2a01:4f8:13a:70c:ffff::8 prefixlen 96
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

telnet 2a01:4f8:13a:70c:ffff::8 22
...
## dtrace probe checking ifp & originifp @ ip6_output_send():
# First TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Second TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Third TCP SYN:
* TX ifp=lo0 origifp=lo0 2a01:4f8:13a:70c:ffff::8

Apart from being inconsistent, it also adds complexity to the recently-added source address validation (D32915).

So, what happens here?
Let's start with a small background - what is originifp and why it is needed?
As opposed to IPv4 world (mostly), IPv6 has a concept of scopes (e.g. non-overlapping zones which an address can belong to). One of such scopes is link-local scope (e.g. link-local address is only "valid" within the link). Traditionally we shortcut traffic to the local addresses via loopback interface, instead of relying on the L2 output route (or the actual NIC) to do the loop. In order to support this shortcut for IPv6 link-local, one needs to somehow pass the original zone/interface to the loopback input, so ip6_input() can properly work. This is what origins is used for - passing the "address" interface. For the sake of simplicity, it is used for all IPv6 traffic, not just link-local one.

Let's look into the dtrace results once again. The first one (ifp=lo0, origifp=vtnet0) is exactly what is expected - transmit interface is loopback, and the original interface is properly retained.
However, this result is achieved in a non-obvious manner. In the middle of ip6_output(), at the routing lookup phase, in6_selectroute() is called.
It returns the correct nexthop, specific for the address in question (2a01:4f8:13a:70c:ffff::8), with proper nh_ifp=lo0& nh_aifp=vtnet0. Surprisingly, the ifp returned by the in6_selectroute() is vtnet0 instead of expected lo0. (In fact, in6_selectroute() explicitly returns nh_aifp in case of successful lookup).
It is changed once again in the Check for valid scope ID section - originifp becomes ifp and ifp is set to be ia->ia_ifp. The latter ia is derived from the same nexthop and is currently ::1 for such routes, but can change in the future, so it looks pretty fragile.

The second result looks exactly like the first, so we jump to the third result for a second. There the origifp suddenly becomes lo0. It happens because the nexthop finally get cached in the inpcb and the call to in6_selectroute() is avoided. Thus, ifp starts with lo0 and the machinery described above results in both origifp and ifp to become lo0.

Why the nexthop is not cached immediately? Because in6_selectroute() (and underlying selectroute()) updates the nexthop in the provided inpcb route, but does not update the route generation id (inp_rt_cookie). Next validation simply wipes the cached nexthop as the route generate id is wrong.

Proposed solution

The proposed idea is relatively simple and is composed of two actions. First is explicitly filling in proper ifp and origifp at the route lookup stage. The second is simplifying source/destination scope Id checks, as no actions other that pass/fail are expected.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 45455
Build 42343: arc lint + arc unit

Event Timeline

melifaro retitled this revision from netinet6: streamline scope6 checks. to netinet6: streamline scope6 checks for loopback traffic in ip6_output()..May 3 2022, 2:45 PM
melifaro edited the summary of this revision. (Show Details)

I am not sure if I currently can review this technically in full but we fixed a similar issue by the code you are currently changing in ef0111fdf364e4e87b522025b13aad69067c3fe6 .

686cdd19b1b18 when origifp was introduced there was a special check for IFF_LOOPBACK all along to deal with this. a1f7e5f8ee7fe then changed it to a check to use the interface of our own address if avail as origifp. From there too many edits have had an impact and subtle behaviour.

For all I am reading we are talking multiple problems and I am with @glebius from the email thread in that the 3rd lookup seems wrong. With no route changes the results should be correct and deterministic whether cached or not. I am not sure if that problem was addressed in first place but that should be one dedicated change.

Second and only from there I'd go after the scope zone checks as a 2nd fix though I haven't convinced myself that that is correct either yet.

Third, given glebius also has a patch for ip6_input the problem seems even bigger and means more things got broken along the way. Maybe it would be good where this breakage was introduced as that's not clear to me yet.
The fact that m->m_pkthdr.rcvif may be lo0 or actually is not may be unexpected. Should rcvif be updated in this case somewhere too? May this also affects packet filtering in firewalls potentially. There used to be a sysctl for this behaviour actually introduced in 82cd038d51e2f nd6_useloopback (and useloopback for IPv4). Sadly that got nuked along the way after 10-CURRENT/2012-04 as it was great or testing this stuff..

> % sysctl net.inet6.icmp6.nd6_useloopback
> net.inet6.icmp6.nd6_useloopback: 1

I would also kind-of like to know which the 4 packets are which do not show up in tcpdump on vtnet but ipfw counts them on vtnet:

# ipfw show
00100 0 0 allow log ip6 from any to any via lo0
00200 0 0 allow log ip6 from any to any via vtnet0
65535 0 0 deny ip from any to any
# telnet 2001:db8::1 22
Trying 2001:db8::1...
Connected to 2001:db8::1.
Escape character is '^]'.
SSH-2.0-OpenSSH_9.0 FreeBSD-20220415
^]
telnet> quit
Connection closed.
# jobs
[2]  + Running                       tcpdump -ln -s0 -i vtnet0 -v
# fg
tcpdump -ln -s0 -i vtnet0 -v
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
# ipfw show
00100 22 1784 allow log ip6 from any to any via lo0
00200  4  380 allow log ip6 from any to any via vtnet0
65535  0    0 deny ip from any to any

Does this mean that rcvif is updated and @glebius' change from the email thread might not be required? Has anyone validated this?

sys/netinet6/ip6_output.c
711

If you can guarantee (further down) that ifp is always set here then that would probably be a good change (missed at an earlier time) to remove the if() here first as it'll make the follow-up changes a lot more logic (i.e. the missing NULL check further down) in 716 for the mtu.

sys/netinet6/scope6.c
564

That probably wants a #define elsewhere? But then GLOBAL is 0xe ... so either the 0xf or the comment are misleading...

In D35117#798486, @bz wrote:

I am not sure if I currently can review this technically in full but we fixed a similar issue by the code you are currently changing in ef0111fdf364e4e87b522025b13aad69067c3fe6 .

686cdd19b1b18 when origifp was introduced there was a special check for IFF_LOOPBACK all along to deal with this. a1f7e5f8ee7fe then changed it to a check to use the interface of our own address if avail as origifp. From there too many edits have had an impact and subtle behaviour.

For all I am reading we are talking multiple problems and I am with @glebius from the email thread in that the 3rd lookup seems wrong. With no route changes the results should be correct and deterministic whether cached or not. I am not sure if that problem was addressed in first place but that should be one dedicated change.

First of all, thank you for reviewing!
Indeed, this review packs two distinct changes. It targets the deterministic cache/lookup result, exactly as you suggested, so all cached/uncached always performs in same fashion. After untangling the ifp/originifp complexities, it resulted in empty if` condition after /* Check for valid scope ID. */, which made me think of simplifying the scope checks code as well. I'll split this diff into two, to simplify reviewing.

Second and only from there I'd go after the scope zone checks as a 2nd fix though I haven't convinced myself that that is correct either yet.

What would be the best way forward here (provided scope check changes are in separate review & the comments are addressed) ?

Third, given glebius also has a patch for ip6_input the problem seems even bigger and means more things got broken along the way. Maybe it would be good where this breakage was introduced as that's not clear to me yet.
The fact that m->m_pkthdr.rcvif may be lo0 or actually is not may be unexpected. Should rcvif be updated in this case somewhere too? May this also affects packet filtering in firewalls potentially. There used to be a sysctl for this behaviour actually introduced in 82cd038d51e2f nd6_useloopback (and useloopback for IPv4). Sadly that got nuked along the way after 10-CURRENT/2012-04 as it was great or testing this stuff..

I'd prefer not to see rcvif being loX (unless it's local traffic AND source address belongs to loX), but that's a separate discussion.

> % sysctl net.inet6.icmp6.nd6_useloopback
> net.inet6.icmp6.nd6_useloopback: 1

I would also kind-of like to know which the 4 packets are which do not show up in tcpdump on vtnet but ipfw counts them on vtnet:

# ipfw show
00100 0 0 allow log ip6 from any to any via lo0
00200 0 0 allow log ip6 from any to any via vtnet0
65535 0 0 deny ip from any to any
# telnet 2001:db8::1 22
Trying 2001:db8::1...
Connected to 2001:db8::1.
Escape character is '^]'.
SSH-2.0-OpenSSH_9.0 FreeBSD-20220415
^]
telnet> quit
Connection closed.
# jobs
[2]  + Running                       tcpdump -ln -s0 -i vtnet0 -v
# fg
tcpdump -ln -s0 -i vtnet0 -v
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
# ipfw show
00100 22 1784 allow log ip6 from any to any via lo0
00200  4  380 allow log ip6 from any to any via vtnet0
65535  0    0 deny ip from any to any

Does this mean that rcvif is updated and @glebius' change from the email thread might not be required? Has anyone validated this?

Indeed, this review packs two distinct changes. It targets the deterministic cache/lookup result, exactly as you suggested, so all cached/uncached always performs in same fashion. After untangling the ifp/originifp complexities, it resulted in empty if` condition after /* Check for valid scope ID. */, which made me think of simplifying the scope checks code as well. I'll split this diff into two, to simplify reviewing.

The ifp/mtu changes are actually the hard parts to review here to make sure everything is set as expected in all the possible cases (which are plenty in that function).

Second and only from there I'd go after the scope zone checks as a 2nd fix though I haven't convinced myself that that is correct either yet.

What would be the best way forward here (provided scope check changes are in separate review & the comments are addressed) ?

Sounds like a good start to move forward indeed.

Third, given glebius also has a patch for ip6_input the problem seems even bigger and means more things got broken along the way. Maybe it would be good where this breakage was introduced as that's not clear to me yet.
The fact that m->m_pkthdr.rcvif may be lo0 or actually is not may be unexpected. Should rcvif be updated in this case somewhere too? May this also affects packet filtering in firewalls potentially. There used to be a sysctl for this behaviour actually introduced in 82cd038d51e2f nd6_useloopback (and useloopback for IPv4). Sadly that got nuked along the way after 10-CURRENT/2012-04 as it was great or testing this stuff..

I'd prefer not to see rcvif being loX (unless it's local traffic AND source address belongs to loX), but that's a separate discussion.

Well, you could also argue that it never sees origifp (and its driver or interface) in that case of local traffic and it's all lo0 but then it may be confusing to people given the interface belonging to the address and the interface of the packet are not matching (already confusing when looking at interface counters as counters on two interfaces are incrementing for the same incoming packet; but that is correct). I think what I was trying to point out with tcpdump vs. ipfw is that there are more inconsistencies lingering around which should be looked into.

In D35117#798612, @bz wrote:

Indeed, this review packs two distinct changes. It targets the deterministic cache/lookup result, exactly as you suggested, so all cached/uncached always performs in same fashion. After untangling the ifp/originifp complexities, it resulted in empty if` condition after /* Check for valid scope ID. */, which made me think of simplifying the scope checks code as well. I'll split this diff into two, to simplify reviewing.

The ifp/mtu changes are actually the hard parts to review here to make sure everything is set as expected in all the possible cases (which are plenty in that function).

Second and only from there I'd go after the scope zone checks as a 2nd fix though I haven't convinced myself that that is correct either yet.

What would be the best way forward here (provided scope check changes are in separate review & the comments are addressed) ?

Sounds like a good start to move forward indeed.

I've raised a separate stack of diffs, currently ending with D35732. If you could take a look at these, it would be awesome.

Third, given glebius also has a patch for ip6_input the problem seems even bigger and means more things got broken along the way. Maybe it would be good where this breakage was introduced as that's not clear to me yet.
The fact that m->m_pkthdr.rcvif may be lo0 or actually is not may be unexpected. Should rcvif be updated in this case somewhere too? May this also affects packet filtering in firewalls potentially. There used to be a sysctl for this behaviour actually introduced in 82cd038d51e2f nd6_useloopback (and useloopback for IPv4). Sadly that got nuked along the way after 10-CURRENT/2012-04 as it was great or testing this stuff..

I'd prefer not to see rcvif being loX (unless it's local traffic AND source address belongs to loX), but that's a separate discussion.

Well, you could also argue that it never sees origifp (and its driver or interface) in that case of local traffic and it's all lo0 but then it may be confusing to people given the interface belonging to the address and the interface of the packet are not matching (already confusing when looking at interface counters as counters on two interfaces are incrementing for the same incoming packet; but that is correct). I think what I was trying to point out with tcpdump vs. ipfw is that there are more inconsistencies lingering around which should be looked into.