Details

Reviewers

asomers
bz

Group Reviewers

network

Commits

rS315458: Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0

Summary

Respect net.add_addr_allfibs for IPv6

This patch contains two related changes:

First, we add routes for interface addresses to only the interface's FIB if net.add_addr_allfibs is 0. This is congruent with IPv4 behavior. This includes making changes to source address selection logic to ensure a source from the correct FIB is chosen. Previously, we could assume all interface routes were in the default FIB since interface addresses were added to all FIBs.

Second, default router selection (from router announcements) is now able to select multiple routers, one per FIB. Obviously, the selected router is installed as the default route in the FIB of the interface that received the RA.

See PR196361 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196361

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

jhujhiti_adjectivism.org updated this revision to Diff 24768.Feb 5 2017, 4:34 PM

jhujhiti_adjectivism.org retitled this revision from to Constrain IPv6 interface routes to each FIB.

jhujhiti_adjectivism.org updated this object.

jhujhiti_adjectivism.org edited the test plan for this revision. (Show Details)

jhujhiti_adjectivism.org added reviewers: asomers, network.

jhujhiti_adjectivism.org set the repository for this revision to rS FreeBSD src repository - subversion.

jhujhiti_adjectivism.org added a subscriber: freebsd-net-list.

Herald added subscribers: ae, imp. · View Herald TranscriptFeb 5 2017, 4:34 PM

Awesome work jhujhiti. Unfortunately, I won't be able to test it until PR 216734 is fixed or I make myself another FreeBSD head machine. I'll try to do that sometime next week.

Can you resubmit the patch with more context? You can make it using such arguments for svn diff "--diff-cmd=diff -x -U999999" or similar arguments for your VCS. It produces a much large patches, but it is possible to see more context.

Here's the same patch with full context

In addition to the issues I mentioned inline, could you please also update the review summary to include the full commit message? Try to mention every scenario where you're intentionally changing behavior. I'll try to add ATF testcases for all of them.

sys/net/route.c
2224 ↗	(On Diff #24782)	This line is going to break vimage, and it's also the wrong place to check the tunable. See rtinit1, line 2032. Please revert this chunk.
sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	It seems like this code selects the incoming interface's FIB for routing an ICMPv6 response. Do I understand that correctly? If so, why is similar code not needed in icmp_reflect?
sys/netinet6/in6.c
180 ↗	(On Diff #24782)	Should be `V_rt_add_addr_allfibs`
2127 ↗	(On Diff #24782)	Should be V_rt_add_addr_allfibs
sys/netinet6/in6_src.c
566 ↗	(On Diff #24782)	I think this is a bug. If we have a socket, then the source address should be based on the socket's FIB, not the interface's. Socket FIBs default to the FIB of the process, but can be manually changed.
sys/netinet6/nd6.c
198 ↗	(On Diff #24782)	Should be V_rt_add_addr_allfibs
1295 ↗	(On Diff #24782)	This looks wrong to me. If we're trying to determine whether a given IPv6 address is a neighbor, that shouldn't depend on the interface's default fib. OTOH, the old code looks wrong too, because it only checks the default FIB, not all FIBs. The right answer is probably to loop through all FIBs. I would ask for @bz's review of this section
sys/netinet6/nd6.h
472 ↗	(On Diff #24782)	This is a KPI change. We generally can't break existing KPIs. Your options are: Is `defrouter_select` new in FreeBSD 12? If so, go ahead and change it Has `defrouter_select`'s signature already changed since FreeBSD 11.0? If so, go ahead and change it. Otherwise, leave it alone, and add a new function called `defrouter_select_fib`.
sys/netinet6/nd6_nbr.c
265 ↗	(On Diff #24782)	As with `nd6_is_new_addr_neighbor`, we should get @bz's review here.
sys/netinet6/nd6_rtr.c
735 ↗	(On Diff #24782)	Please document the new parameter. Also, I'm not sure the code is correct when `fibnum == RT_ADDR_ALLFIBS`. In that case, no interface will ever have `if_fib == fibnum`. You should probably fall back to legacy behavior in that case.
806 ↗	(On Diff #24782)	While you're here, you may as well fix the style on this line. It's > 80 chars.
1758 ↗	(On Diff #24782)	Should be V_rt_add_addr_allfibs
1854 ↗	(On Diff #24782)	Should be V_rt_add_addr_allfibs
1936 ↗	(On Diff #24782)	Should be V_rt_add_addr_allfibs

This revision now requires changes to proceed.Feb 8 2017, 6:22 PM

asomers added a reviewer: bz.Feb 8 2017, 6:23 PM

As I mentioned in the PR, this is my first attempt at kernel work, so I very much appreciate the comments. I'll go ahead and update the review summary at my next opportunity.

The testcase that is difficult to write is unfortunately the one that tests the most complex code paths: configuring multiple prefixes and default routers from router advertisements in different FIBs. I'm not sure it's possible to write this with a single machine (sending RAs to itself)? For what it's worth, my test case while working has been to attach the machine's NIC (which is connected to a network sending RAs) to a bridge, and then connect several epair interfaces in different FIBs to the bridge. This test also proved that the code has no problems with overlapping subnets in different FIBs.

sys/net/route.c
2224 ↗	(On Diff #24782)	You're right - this was mistakenly left over from my first (poor) attempt at the patch. I'll revert and replace with rtinit1() in the applicable places in netinet6/in6.c. Can I ask what you mean by this being the wrong place to check the tunable? Is this a stylistic thing, or is there a technical reason I'm not aware of?
sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	You do understand correctly, and trying to answer why equivalent code is not needed in icmp_reflect() caused me to realize that this logic is mostly redundant. icmp6_reflect() callers copy the FIB when creating the mbuf of the packet to reflect, and RT_DEFAULT_FIB will get passed to in6_selectsrc_addr() below in almost all of the same cases as it did before. The one case that seems worth thinking about more is the case where the packet is destined to an anycast address. The logic on lines 2125-2126 avoid selecting that address as a source. Since srcp will not be set in this case, we will end up selecting a source address for the response from the default FIB on line 2159 (assuming my change had been reverted). This seems like a case where we should attempt to reply from some other address in the same FIB. Does this logic make sense? If so, I'll rework the FIB selection logic above to be less redundant. Alternatively, I notice that a note in RFC 6724, Appendix B suggests that anycast source addresses may be valid (https://tools.ietf.org/html/rfc6724 page 30, section 3), so perhaps avoiding selecting the anycast address as a source for the reply is no longer necessary (although reworking source address selection in general is definitely not in scope for my change).
sys/netinet6/in6_src.c
566 ↗	(On Diff #24782)	I must be misunderstanding the inpcb structure -- does this not represent the socket (and therefore inp->inp_inc.inc_fibnum represent the socket FIB)? Even if my understanding is correct, I do think there's a bug here: it seems like we should prefer the socket's FIB, then the interface's FIB, and only the default as a last resort.
sys/netinet6/nd6.c
1295 ↗	(On Diff #24782)	I think this makes sense... I assume the old code used the default FIB because all interface addresses were added to all FIBs, so it didn't matter which FIB was searched.
sys/netinet6/nd6.h
472 ↗	(On Diff #24782)	I was under the impression that KPIs were mutable on CURRENT - although I also understand that a KPI change would block an MFC. Are the rules around KPI changes on CURRENT more strict than I realized? I realized while writing this comment that I can simply make the old function loop over all FIBs, so maintaining KPI stability works :)
sys/netinet6/nd6_rtr.c
735 ↗	(On Diff #24782)	I would expect a value RT_ADDR_ALLFIBS to either (1) be an invalid argument (assertion failure?) or (2) call ourselves recursively to loop over all FIBs rather than falling back to old behavior, which doesn't make a lot of sense -- old behavior wasn't FIB-aware and could trample over multiple selected routers.

jch added a subscriber: jch.Feb 9 2017, 8:43 AM

In D9451#196364, @jhujhiti_adjectivism.org wrote:

As I mentioned in the PR, this is my first attempt at kernel work, so I very much appreciate the comments. I'll go ahead and update the review summary at my next opportunity.

The testcase that is difficult to write is unfortunately the one that tests the most complex code paths: configuring multiple prefixes and default routers from router advertisements in different FIBs. I'm not sure it's possible to write this with a single machine (sending RAs to itself)? For what it's worth, my test case while working has been to attach the machine's NIC (which is connected to a network sending RAs) to a bridge, and then connect several epair interfaces in different FIBs to the bridge. This test also proved that the code has no problems with overlapping subnets in different FIBs.

I think we can make the testcase a little simpler. We can create two tap devices in different fibs and bridge them with socat. Then we can configure one to broadcast router advertisements, and check that the other one sets up its routes correctly. We could create a second pair of tap interfaces too, to test having multiple simultaneous subnets configured with SLAAC. I can write the testcase, if you tell me what to check and how to configure an interface to broadcast router advertisements.

sys/net/route.c
2224 ↗	(On Diff #24782)	Just three lines below this point, `rtinit` calls `rtinit1`, and `rtinit1` checks `V_rt_add_addr_allfibs` as almost the first thing that it does. So `rtinit` shouldn't also check it.
sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	At line 2159, do we know the fib of the receiving interface? If so, we should probably use it. The interface fib is the fib to use when forwarding packets, so it makes sense to use it for ICMP replies, too. Your observation about RFC 6724 is interesting, but I don't think we should act on it as part of this commit. It should go in separately.
sys/netinet6/in6_src.c
566 ↗	(On Diff #24782)	You're correct; I was confused about inpcb. However, there is no need to consider the interface fib. The interface fib really only matters for forwarding packets and for automatically adding routes to routing tables. Also, I think that the `inp != NULL` check is superfluous. AFAICT, there's no way for `inp` to be `NULL` right here.
sys/netinet6/nd6.h
472 ↗	(On Diff #24782)	Definitely better to preserver the old `defrouter_select` as a wrapper around the new function.
sys/netinet6/nd6_rtr.c
735 ↗	(On Diff #24782)	Looping sounds like a good idea.

jhujhiti_adjectivism.org updated this revision to Diff 25512.Feb 22 2017, 5:12 AM

jhujhiti_adjectivism.org updated this object.

jhujhiti_adjectivism.org edited edge metadata.

jhujhiti_adjectivism.org marked 13 inline comments as done.

I've updated the patch to address these points aside from the open question about determining neighborship.

I think the testcase for default router advertisement should look something like this. I'm more familiar with epair interfaces than tap, but I assume the concept is the same.

First, enable IPv6 forwarding and rfc6204w3. The former is necessary for rtadvd to send RAs and the latter is a workaround to prevent the kernel from disabling default router selection on a router itself.

# sysctl net.inet6.ip6.rfc6204w3=1
net.inet6.ip6.rfc6204w3: 0 -> 1
# sysctl net.inet6.ip6.forwarding=1
net.inet6.ip6.forwarding: 0 -> 1

Create epair interfaces and configure "a" as the router and "b" as the client.

# ifconfig epair0 create
epair0a
# ifconfig epair0a fib 1
# ifconfig epair0a inet6 fd01::1/64 
# ifconfig epair0a up
# ifconfig epair0b fib 2
# ifconfig epair0b inet6 -ifdisabled accept_rtadv
# ifconfig epair0b up
# rtadvd epair0a
# rtsol epair0b

Sanity check - if epair0b received the RA, it should have an address in fd01::/64.

# ifconfig epair0b
epair0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:ff:30:00:05:0b
	inet6 fe80::ff:30ff:fe00:50b%epair0b prefixlen 64 scopeid 0x5 
	inet6 fd01::ff:30ff:fe00:50b prefixlen 64 autoconf 
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	fib: 2
	groups: epair

Verify that the router was learned and that the default route was installed.

# ndp -rn
fe80::ff:e0ff:fe00:40a%epair0b if=epair0b, flags=, pref=medium, expire=28m4s
# netstat -rnf inet6 -F2
Routing tables (fib: 2)

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
default                           fe80::ff:e0ff:fe00:40a%epair0b UG     epair0b
::1                               lo0                           UHS         lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
fd01::/64                         link#5                        U       epair0b
fd01::ff:30ff:fe00:50b            link#5                        UHS         lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%epair0b/64                 link#5                        U       epair0b
fe80::ff:30ff:fe00:50b%epair0b    link#5                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0

The same thing should work simultaneously in FIBs 3 and 4 with epair1a and 1b (and the scope on the default route in FIB 4 should be %epair1b). The MAC address of the epair interfaces can be set to something predictable to make identifying the route easier.

sys/net/route.c
2224 ↗	(On Diff #24782)	Ah, of course... A second look while re-implementing this revealed that rtinit1() actually does the correct thing when passed RT_ALL_FIBS for both additions and deletions, so I've simply reverted my change to line 2224.
sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	At line 2159, do we know the fib of the receiving interface? It looks like the mbuf passed to icmp6_reflect() will always have the FIB set based on the receiving interface, so we can just use that! Can a more experienced set of eyes confirm?
sys/netinet6/in6_src.c
566 ↗	(On Diff #24782)	I agree - all callers of in6_selectsrc_socket() unconditionally dereference inp prior to passing it, so this check for NULL is definitely redundant. I've removed it.
sys/netinet6/nd6.c
1295 ↗	(On Diff #24782)	I'm rethinking this a little bit. While I think it's true that the most correct way to answer the question "Is this address a neighbor?" is to consider addresses in any FIB, I'm not sure it's necessary. Since the caller is specifying which interface this address should be a neighbor on, it's safe to consider only that interface's FIB because interface routes (ie., those routes with neighbors) will always be added there.
1353 ↗	(On Diff #24782)	Should we pass the ifp->if_fib here? We compare ifps on line 1355, so searching all FIBs is usually fine, but if we have two identical addresses in two different FIBs, we could find the wrong one here and end up returning false.

This review is starting to look pretty good. But in addition to the few things I mentioned inline, there's one other change that you need to make: you get to clear the atf_expect_fail statements from tests/sys/netinet/fibs_test.sh.

sys/netinet6/nd6.c
1310 ↗	(On Diff #25512)	This comment is incorrect now.
1295 ↗	(On Diff #24782)	Remember, the interface fib only matters for forwarding packets. It's totally valid for an interface to have multiple addresses assigned, each of which is on a different fib. So, to correctly determine whether `addr` is a neighbor of `ifp`, we must either Loop over all fibs, and check whether `addr` is a neighbor of `ifp` on any of them, or Loop over all addresses assigned to `ifp`, and check whether `addr` is a neighbor of `ifp` on that address's fib. I'm guessing that this will be the slower option, because an interface can have arbitrarily many addresses
1353 ↗	(On Diff #24782)	The original code seems too complicated. I think it should go a little like this (locks elided): if (ifp->if_flags & IFF_POINTOPOINT) { TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != addr->sa_family) continue; if (ifa->ifa_dstaddr != NULL && sa_equal(addr, ifa->ifa_dstaddr)) { return (1); } } } No unnecessary looping over either fibs or interfaces.

jhujhiti_adjectivism.org added inline comments.Mar 3 2017, 12:47 AM

sys/netinet6/nd6.c
1295 ↗	(On Diff #24782)	It's totally valid for an interface to have multiple addresses assigned, each of which is on a different fib. Is this true? I'm not aware of a way this could happen. Interface routes are added to the FIB associated with the interface, and of course there is only one FIB per interface. For instance in6_newaddrmsg() provides no mechanism by which to assign the route for the interface's address to anything other than the interface FIB.

asomers added inline comments.Mar 3 2017, 1:16 AM

sys/netinet6/nd6.c
1295 ↗	(On Diff #24782)	Yep, it's true. One way is to do it with static routes. Another way involves changing the interfaces's fib. For example, like this: ifconfig tap0 create ifconfig tap0 10.1.0.1/24 fib 2 ifconfig tap0 10.1.1.1/24 fib 3 alias

jhujhiti_adjectivism.org updated this revision to Diff 26022.Mar 6 2017, 5:10 AM

jhujhiti_adjectivism.org edited edge metadata.

jhujhiti_adjectivism.org marked an inline comment as done.

jhujhiti_adjectivism.org added inline comments.

sys/netinet6/nd6.c
1295 ↗	(On Diff #24782)	That's... counter-intuitive. But given that behavior, we absolutely should check other other FIBs here. The function is already looping over all prefixes on the interface so I inserted a check of all FIBs (if necessary) inside of it.
1353 ↗	(On Diff #24782)	This seems like a good idea. Is this new code what you had in mind?
sys/netinet6/nd6_rtr.c
574 ↗	(On Diff #25512)	I'm now wondering if this is incorrect. If the interface's FIB changes after the router has been added to the list, we would fail to remove it here. Should the default router (and maybe prefix) structs include a new field to store the FIB in which the router(/prefix) was originally learned?

Almost done. I think the only thing left is to delete all of the related atf_expect_fail statements from fibs_test.sh, not just one.

sys/netinet6/nd6.c
1353 ↗	(On Diff #24782)	Yep.
sys/netinet6/nd6_rtr.c
574 ↗	(On Diff #25512)	I think that's a limitation of the design of our network stack, and I don't think that fixing it should be a part of this change. I think that fixing would require either: Associate a fib with each ifaddr, and delete the network and host routes whenever the ifaddr is deleted, or Store extra information with each route, such as why they were added to the routing table.

Forgot to change the other tests - this ought to do it.

One annoying detail: while running these tests, I noticed intermittent failures on the SLAAC test. I was reminded of a similar intermittent issue I had while developing this: for reasons I can not explain, the epair0a interface (in this example) oftentimes gets correctly configured, but the IFDISABLED flag is never removed, like so:

epair0a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:e0:00:04:0a
        inet6 2001:db8::2 prefixlen 64 tentative 
        inet6 fe80::ff:e0ff:fe00:40a%epair0a prefixlen 64 tentative scopeid 0x4 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        fib: 1
        groups: epair

Inserting a sleep between creation of the epair and configuration solves it. The good news is that this behaves identically on my system without the patch, so it's not a new regression.

sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	@asomers, can you confirm that M_GETFIB(m) is always correctly set to the FIB of the receiving interface?

asomers added inline comments.Mar 7 2017, 6:21 PM

sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	No. According to the comment at the bottom of icmp6_error, it isn't, because icmp6_reflect can sometimes be called from the output path. In that case, there wouldn't be a receiving interface. However, `M_GETFIB(m)` will definitely return _a_ fib. It will either have the fib of the receiving interface if this is on the receive path, or the fib of the socket if this is on the output path. Socket fibs default to the process fib, which cannot be `RT_ALL_FIBS`, but they can also be set by `setsockopt(..., SO_SETFIB, ...)`, which also doesn't allow `RT_ALL_FIBS`. So I think your code is ok here.

jhujhiti_adjectivism.org marked 6 inline comments as done.Mar 8 2017, 1:41 AM

jhujhiti_adjectivism.org added inline comments.

sys/netinet6/icmp6.c
2147 ↗	(On Diff #24782)	Ah yes, that's how I should have phrased it - valid FIB even if called in the output path.
sys/netinet6/nd6_nbr.c
265 ↗	(On Diff #24782)	I think this is the only thing left to consider for this patch, but it seems to me that using the receiving interface's FIB is the most correct thing to do here. Checking other FIBs seems incorrect since that would cause the proxy to "leak" across FIB boundaries.

asomers accepted this revision.Mar 9 2017, 6:14 PM

asomers added inline comments.

sys/netinet6/nd6_nbr.c
265 ↗	(On Diff #24782)	At this point, `dst6` is the destination address of the received ns packet. ns packets are sent either to unicast or multicast addresses. In the former case, we should never receive a ns packet except to an address we already use, and it should only arrive at the interface that's using that address (unless your interface fib is different than fib the address was configured with, but I would call that a configuration error). If it's a multicast packet, the same logic applies, except that there's a small chance for us to receive a multicast ns packet for an address that's not ours. But even in that case, the packet should arrive at the interface that's a member of that multicast group, unless the group's unicast address is configured for a different fib than the interface. In summary, I think it's ok to use the receiving interface's fib here.

This revision is now accepted and ready to land.Mar 9 2017, 6:14 PM

Closed by commit rS315458: Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0 (authored by asomers). · Explain WhyMar 17 2017, 4:51 PM

This revision was automatically updated to reflect the committed changes.

Constrain IPv6 interface routes to each FIB
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 26359

head/sys/netinet6/icmp6.c

head/sys/netinet6/in6.c

head/sys/netinet6/in6_src.c

head/sys/netinet6/nd6.h

head/sys/netinet6/nd6.c

head/sys/netinet6/nd6_nbr.c

head/sys/netinet6/nd6_rtr.c

head/tests/sys/netinet/fibs_test.sh

Constrain IPv6 interface routes to each FIBClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 26359

head/sys/netinet6/icmp6.c

head/sys/netinet6/in6.c

head/sys/netinet6/in6_src.c

head/sys/netinet6/nd6.h

head/sys/netinet6/nd6.c

head/sys/netinet6/nd6_nbr.c

head/sys/netinet6/nd6_rtr.c

head/tests/sys/netinet/fibs_test.sh

Constrain IPv6 interface routes to each FIB
ClosedPublic
Actions

Revision Contents
Changeset List