Page MenuHomeFreeBSD

Constrain IPv6 interface routes to each FIB
ClosedPublic

Authored by jhujhiti_adjectivism.org on Feb 5 2017, 4:34 PM.

Details

Summary

Respect net.add_addr_allfibs for IPv6

This patch contains two related changes:

First, we add routes for interface addresses to only the interface's FIB if net.add_addr_allfibs is 0. This is congruent with IPv4 behavior. This includes making changes to source address selection logic to ensure a source from the correct FIB is chosen. Previously, we could assume all interface routes were in the default FIB since interface addresses were added to all FIBs.

Second, default router selection (from router announcements) is now able to select multiple routers, one per FIB. Obviously, the selected router is installed as the default route in the FIB of the interface that received the RA.

See PR196361 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196361

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhujhiti_adjectivism.org retitled this revision from to Constrain IPv6 interface routes to each FIB.
jhujhiti_adjectivism.org updated this object.
jhujhiti_adjectivism.org edited the test plan for this revision. (Show Details)
jhujhiti_adjectivism.org set the repository for this revision to rS FreeBSD src repository.
asomers edited edge metadata.Feb 5 2017, 5:33 PM

Awesome work jhujhiti. Unfortunately, I won't be able to test it until PR 216734 is fixed or I make myself another FreeBSD head machine. I'll try to do that sometime next week.

ae added a comment.Feb 5 2017, 11:48 PM

Can you resubmit the patch with more context? You can make it using such arguments for svn diff "--diff-cmd=diff -x -U999999" or similar arguments for your VCS. It produces a much large patches, but it is possible to see more context.

jhujhiti_adjectivism.org edited edge metadata.

Here's the same patch with full context

asomers requested changes to this revision.Feb 8 2017, 6:22 PM
asomers edited edge metadata.
asomers added a subscriber: bz.

In addition to the issues I mentioned inline, could you please also update the review summary to include the full commit message? Try to mention every scenario where you're intentionally changing behavior. I'll try to add ATF testcases for all of them.

sys/net/route.c
2224 ↗(On Diff #24782)

This line is going to break vimage, and it's also the wrong place to check the tunable. See rtinit1, line 2032. Please revert this chunk.

sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

It seems like this code selects the incoming interface's FIB for routing an ICMPv6 response. Do I understand that correctly? If so, why is similar code not needed in icmp_reflect?

sys/netinet6/in6.c
180 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

2127 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

sys/netinet6/in6_src.c
566 ↗(On Diff #24782)

I think this is a bug. If we have a socket, then the source address should be based on the socket's FIB, not the interface's. Socket FIBs default to the FIB of the process, but can be manually changed.

sys/netinet6/nd6.c
198 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

1295 ↗(On Diff #24782)

This looks wrong to me. If we're trying to determine whether a given IPv6 address is a neighbor, that shouldn't depend on the interface's default fib. OTOH, the old code looks wrong too, because it only checks the default FIB, not all FIBs. The right answer is probably to loop through all FIBs. I would ask for @bz's review of this section

sys/netinet6/nd6.h
472 ↗(On Diff #24782)

This is a KPI change. We generally can't break existing KPIs. Your options are:

  1. Is defrouter_select new in FreeBSD 12? If so, go ahead and change it
  2. Has defrouter_select's signature already changed since FreeBSD 11.0? If so, go ahead and change it.
  3. Otherwise, leave it alone, and add a new function called defrouter_select_fib.
sys/netinet6/nd6_nbr.c
265 ↗(On Diff #24782)

As with nd6_is_new_addr_neighbor, we should get @bz's review here.

sys/netinet6/nd6_rtr.c
735 ↗(On Diff #24782)

Please document the new parameter. Also, I'm not sure the code is correct when fibnum == RT_ADDR_ALLFIBS. In that case, no interface will ever have if_fib == fibnum. You should probably fall back to legacy behavior in that case.

806 ↗(On Diff #24782)

While you're here, you may as well fix the style on this line. It's > 80 chars.

1758 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

1854 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

1936 ↗(On Diff #24782)

Should be V_rt_add_addr_allfibs

This revision now requires changes to proceed.Feb 8 2017, 6:22 PM
asomers added a reviewer: bz.Feb 8 2017, 6:23 PM
jhujhiti_adjectivism.org marked 6 inline comments as done.Feb 9 2017, 5:31 AM

As I mentioned in the PR, this is my first attempt at kernel work, so I very much appreciate the comments. I'll go ahead and update the review summary at my next opportunity.

The testcase that is difficult to write is unfortunately the one that tests the most complex code paths: configuring multiple prefixes and default routers from router advertisements in different FIBs. I'm not sure it's possible to write this with a single machine (sending RAs to itself)? For what it's worth, my test case while working has been to attach the machine's NIC (which is connected to a network sending RAs) to a bridge, and then connect several epair interfaces in different FIBs to the bridge. This test also proved that the code has no problems with overlapping subnets in different FIBs.

sys/net/route.c
2224 ↗(On Diff #24782)

You're right - this was mistakenly left over from my first (poor) attempt at the patch. I'll revert and replace with rtinit1() in the applicable places in netinet6/in6.c.

Can I ask what you mean by this being the wrong place to check the tunable? Is this a stylistic thing, or is there a technical reason I'm not aware of?

sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

You do understand correctly, and trying to answer why equivalent code is not needed in icmp_reflect() caused me to realize that this logic is mostly redundant. icmp6_reflect() callers copy the FIB when creating the mbuf of the packet to reflect, and RT_DEFAULT_FIB will get passed to in6_selectsrc_addr() below in *almost* all of the same cases as it did before.

The one case that seems worth thinking about more is the case where the packet is destined to an anycast address. The logic on lines 2125-2126 avoid selecting that address as a source. Since srcp will not be set in this case, we will end up selecting a source address for the response from the default FIB on line 2159 (assuming my change had been reverted). This seems like a case where we should attempt to reply from some other address in the same FIB. Does this logic make sense? If so, I'll rework the FIB selection logic above to be less redundant.

Alternatively, I notice that a note in RFC 6724, Appendix B suggests that anycast source addresses may be valid (https://tools.ietf.org/html/rfc6724 page 30, section 3), so perhaps avoiding selecting the anycast address as a source for the reply is no longer necessary (although reworking source address selection in general is definitely not in scope for my change).

sys/netinet6/in6_src.c
566 ↗(On Diff #24782)

I must be misunderstanding the inpcb structure -- does this not represent the socket (and therefore inp->inp_inc.inc_fibnum represent the socket FIB)?

Even if my understanding is correct, I do think there's a bug here: it seems like we should prefer the socket's FIB, then the interface's FIB, and only the default as a last resort.

sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

I think this makes sense... I assume the old code used the default FIB because all interface addresses were added to all FIBs, so it didn't matter which FIB was searched.

sys/netinet6/nd6.h
472 ↗(On Diff #24782)

I was under the impression that KPIs were mutable on CURRENT - although I also understand that a KPI change would block an MFC. Are the rules around KPI changes on CURRENT more strict than I realized?

I realized while writing this comment that I can simply make the old function loop over all FIBs, so maintaining KPI stability works :)

sys/netinet6/nd6_rtr.c
735 ↗(On Diff #24782)

I would expect a value RT_ADDR_ALLFIBS to either (1) be an invalid argument (assertion failure?) or (2) call ourselves recursively to loop over all FIBs rather than falling back to old behavior, which doesn't make a lot of sense -- old behavior wasn't FIB-aware and could trample over multiple selected routers.

jch added a subscriber: jch.Feb 9 2017, 8:43 AM

As I mentioned in the PR, this is my first attempt at kernel work, so I very much appreciate the comments. I'll go ahead and update the review summary at my next opportunity.
The testcase that is difficult to write is unfortunately the one that tests the most complex code paths: configuring multiple prefixes and default routers from router advertisements in different FIBs. I'm not sure it's possible to write this with a single machine (sending RAs to itself)? For what it's worth, my test case while working has been to attach the machine's NIC (which is connected to a network sending RAs) to a bridge, and then connect several epair interfaces in different FIBs to the bridge. This test also proved that the code has no problems with overlapping subnets in different FIBs.

I think we can make the testcase a little simpler. We can create two tap devices in different fibs and bridge them with socat. Then we can configure one to broadcast router advertisements, and check that the other one sets up its routes correctly. We could create a second pair of tap interfaces too, to test having multiple simultaneous subnets configured with SLAAC. I can write the testcase, if you tell me what to check and how to configure an interface to broadcast router advertisements.

sys/net/route.c
2224 ↗(On Diff #24782)

Just three lines below this point, rtinit calls rtinit1, and rtinit1 checks V_rt_add_addr_allfibs as almost the first thing that it does. So rtinit shouldn't also check it.

sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

At line 2159, do we know the fib of the receiving interface? If so, we should probably use it. The interface fib is the fib to use when forwarding packets, so it makes sense to use it for ICMP replies, too.

Your observation about RFC 6724 is interesting, but I don't think we should act on it as part of this commit. It should go in separately.

sys/netinet6/in6_src.c
566 ↗(On Diff #24782)

You're correct; I was confused about inpcb. However, there is no need to consider the interface fib. The interface fib really only matters for forwarding packets and for automatically adding routes to routing tables. Also, I think that the inp != NULL check is superfluous. AFAICT, there's no way for inp to be NULL right here.

sys/netinet6/nd6.h
472 ↗(On Diff #24782)

Definitely better to preserver the old defrouter_select as a wrapper around the new function.

sys/netinet6/nd6_rtr.c
735 ↗(On Diff #24782)

Looping sounds like a good idea.

jhujhiti_adjectivism.org updated this object.
jhujhiti_adjectivism.org edited edge metadata.
jhujhiti_adjectivism.org marked 13 inline comments as done.

I've updated the patch to address these points aside from the open question about determining neighborship.

I think the testcase for default router advertisement should look something like this. I'm more familiar with epair interfaces than tap, but I assume the concept is the same.

First, enable IPv6 forwarding and rfc6204w3. The former is necessary for rtadvd to send RAs and the latter is a workaround to prevent the kernel from disabling default router selection on a router itself.

# sysctl net.inet6.ip6.rfc6204w3=1
net.inet6.ip6.rfc6204w3: 0 -> 1
# sysctl net.inet6.ip6.forwarding=1
net.inet6.ip6.forwarding: 0 -> 1

Create epair interfaces and configure "a" as the router and "b" as the client.

# ifconfig epair0 create
epair0a
# ifconfig epair0a fib 1
# ifconfig epair0a inet6 fd01::1/64 
# ifconfig epair0a up
# ifconfig epair0b fib 2
# ifconfig epair0b inet6 -ifdisabled accept_rtadv
# ifconfig epair0b up
# rtadvd epair0a
# rtsol epair0b

Sanity check - if epair0b received the RA, it should have an address in fd01::/64.

# ifconfig epair0b
epair0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:ff:30:00:05:0b
	inet6 fe80::ff:30ff:fe00:50b%epair0b prefixlen 64 scopeid 0x5 
	inet6 fd01::ff:30ff:fe00:50b prefixlen 64 autoconf 
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	fib: 2
	groups: epair

Verify that the router was learned and that the default route was installed.

# ndp -rn
fe80::ff:e0ff:fe00:40a%epair0b if=epair0b, flags=, pref=medium, expire=28m4s
# netstat -rnf inet6 -F2
Routing tables (fib: 2)

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
default                           fe80::ff:e0ff:fe00:40a%epair0b UG     epair0b
::1                               lo0                           UHS         lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
fd01::/64                         link#5                        U       epair0b
fd01::ff:30ff:fe00:50b            link#5                        UHS         lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%epair0b/64                 link#5                        U       epair0b
fe80::ff:30ff:fe00:50b%epair0b    link#5                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0

The same thing should work simultaneously in FIBs 3 and 4 with epair1a and 1b (and the scope on the default route in FIB 4 should be %epair1b). The MAC address of the epair interfaces can be set to something predictable to make identifying the route easier.

sys/net/route.c
2224 ↗(On Diff #24782)

Ah, of course...

A second look while re-implementing this revealed that rtinit1() actually does the correct thing when passed RT_ALL_FIBS for both additions and deletions, so I've simply reverted my change to line 2224.

sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

At line 2159, do we know the fib of the receiving interface?

It looks like the mbuf passed to icmp6_reflect() will always have the FIB set based on the receiving interface, so we can just use that! Can a more experienced set of eyes confirm?

sys/netinet6/in6_src.c
566 ↗(On Diff #24782)

I agree - all callers of in6_selectsrc_socket() unconditionally dereference inp prior to passing it, so this check for NULL is definitely redundant. I've removed it.

sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

I'm rethinking this a little bit. While I think it's true that the most correct way to answer the question "Is this address a neighbor?" is to consider addresses in any FIB, I'm not sure it's necessary. Since the caller is specifying which interface this address should be a neighbor on, it's safe to consider only that interface's FIB because interface routes (ie., those routes with neighbors) will always be added there.

1353 ↗(On Diff #24782)

Should we pass the ifp->if_fib here? We compare ifps on line 1355, so searching all FIBs is usually fine, but if we have two identical addresses in two different FIBs, we could find the wrong one here and end up returning false.

This review is starting to look pretty good. But in addition to the few things I mentioned inline, there's one other change that you need to make: you get to clear the atf_expect_fail statements from tests/sys/netinet/fibs_test.sh.

sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

Remember, the interface fib only matters for forwarding packets. It's totally valid for an interface to have multiple addresses assigned, each of which is on a different fib. So, to correctly determine whether addr is a neighbor of ifp, we must either

  1. Loop over all fibs, and check whether addr is a neighbor of ifp on any of them, or
  2. Loop over all addresses assigned to ifp, and check whether addr is a neighbor of ifp on that address's fib. I'm guessing that this will be the slower option, because an interface can have arbitrarily many addresses
1353 ↗(On Diff #24782)

The original code seems too complicated. I think it should go a little like this (locks elided):

if (ifp->if_flags & IFF_POINTOPOINT) {
        TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
			if (ifa->ifa_addr->sa_family != addr->sa_family)
				continue;
			if (ifa->ifa_dstaddr != NULL &&
			    sa_equal(addr, ifa->ifa_dstaddr)) {
				return (1);
			}
		}
}

No unnecessary looping over either fibs or interfaces.

1310 ↗(On Diff #25512)

This comment is incorrect now.

sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

It's totally valid for an interface to have multiple addresses assigned, each of which is on a different fib.

Is this true? I'm not aware of a way this could happen. Interface routes are added to the FIB associated with the interface, and of course there is only one FIB per interface. For instance in6_newaddrmsg() provides no mechanism by which to assign the route for the interface's address to anything other than the interface FIB.

asomers added inline comments.Mar 3 2017, 1:16 AM
sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

Yep, it's true. One way is to do it with static routes. Another way involves changing the interfaces's fib. For example, like this:

ifconfig tap0 create
ifconfig tap0 10.1.0.1/24 fib 2
ifconfig tap0 10.1.1.1/24 fib 3 alias
jhujhiti_adjectivism.org edited edge metadata.
jhujhiti_adjectivism.org marked an inline comment as done.
jhujhiti_adjectivism.org added inline comments.
sys/netinet6/nd6.c
1295 ↗(On Diff #24782)

That's... counter-intuitive. But given that behavior, we absolutely should check other other FIBs here. The function is already looping over all prefixes on the interface so I inserted a check of all FIBs (if necessary) inside of it.

1353 ↗(On Diff #24782)

This seems like a good idea. Is this new code what you had in mind?

sys/netinet6/nd6_rtr.c
574 ↗(On Diff #25512)

I'm now wondering if this is incorrect. If the interface's FIB changes after the router has been added to the list, we would fail to remove it here.

Should the default router (and maybe prefix) structs include a new field to store the FIB in which the router(/prefix) was originally learned?

Almost done. I think the only thing left is to delete all of the related atf_expect_fail statements from fibs_test.sh, not just one.

sys/netinet6/nd6.c
1353 ↗(On Diff #24782)

Yep.

sys/netinet6/nd6_rtr.c
574 ↗(On Diff #25512)

I think that's a limitation of the design of our network stack, and I don't think that fixing it should be a part of this change. I think that fixing would require either:

  1. Associate a fib with each ifaddr, and delete the network and host routes whenever the ifaddr is deleted, or
  2. Store extra information with each route, such as why they were added to the routing table.
jhujhiti_adjectivism.org marked 14 inline comments as done.

Forgot to change the other tests - this ought to do it.

One annoying detail: while running these tests, I noticed intermittent failures on the SLAAC test. I was reminded of a similar intermittent issue I had while developing this: for reasons I can not explain, the epair0a interface (in this example) oftentimes gets correctly configured, but the IFDISABLED flag is never removed, like so:

epair0a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:e0:00:04:0a
        inet6 2001:db8::2 prefixlen 64 tentative 
        inet6 fe80::ff:e0ff:fe00:40a%epair0a prefixlen 64 tentative scopeid 0x4 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        fib: 1
        groups: epair

Inserting a sleep between creation of the epair and configuration solves it. The good news is that this behaves identically on my system without the patch, so it's not a new regression.

sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

@asomers, can you confirm that M_GETFIB(m) is always correctly set to the FIB of the receiving interface?

asomers added inline comments.Mar 7 2017, 6:21 PM
sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

No. According to the comment at the bottom of icmp6_error, it isn't, because icmp6_reflect can sometimes be called from the output path. In that case, there wouldn't be a receiving interface. However, M_GETFIB(m) will definitely return _a_ fib. It will either have the fib of the receiving interface if this is on the receive path, or the fib of the socket if this is on the output path. Socket fibs default to the process fib, which cannot be RT_ALL_FIBS, but they can also be set by setsockopt(..., SO_SETFIB, ...), which also doesn't allow RT_ALL_FIBS. So I think your code is ok here.

jhujhiti_adjectivism.org marked 6 inline comments as done.Mar 8 2017, 1:41 AM
jhujhiti_adjectivism.org added inline comments.
sys/netinet6/icmp6.c
2147 ↗(On Diff #24782)

Ah yes, that's how I should have phrased it - valid FIB even if called in the output path.

sys/netinet6/nd6_nbr.c
265 ↗(On Diff #24782)

I think this is the only thing left to consider for this patch, but it seems to me that using the receiving interface's FIB is the most correct thing to do here. Checking other FIBs seems incorrect since that would cause the proxy to "leak" across FIB boundaries.

asomers accepted this revision.Mar 9 2017, 6:14 PM
asomers added inline comments.
sys/netinet6/nd6_nbr.c
265 ↗(On Diff #24782)

At this point, dst6 is the destination address of the received ns packet. ns packets are sent either to unicast or multicast addresses. In the former case, we should never receive a ns packet except to an address we already use, and it should only arrive at the interface that's using that address (unless your interface fib is different than fib the address was configured with, but I would call that a configuration error). If it's a multicast packet, the same logic applies, except that there's a small chance for us to receive a multicast ns packet for an address that's not ours. But even in that case, the packet should arrive at the interface that's a member of that multicast group, unless the group's unicast address is configured for a different fib than the interface. In summary, I think it's ok to use the receiving interface's fib here.

This revision is now accepted and ready to land.Mar 9 2017, 6:14 PM
This revision was automatically updated to reflect the committed changes.