
Fix rtsock route message generation for interface addresses.
ClosedPublic

Authored by melifaro on Dec 30 2019, 9:46 PM.

Details

Summary

There are multiple problems in both the IPv4 and IPv6 code that lead to messages being generated with incorrect data.
Additionally, due to the lack of a proper API, the IPv6 code uses ugly hacks to generate rtm messages of the needed kind.

Some background on the routing messages and their handling by the popular daemons is provided below for the interested reader:

rtsock/sysctl interface

Typically, routing daemons use both the rtsock "async" interface and the sysctl(3) "sync" interface to keep their information on interfaces and routes in sync with the kernel.
This allows them to recover from both missed (for example, due to excessive RTM_MISSMSG generation) and incorrect rtsock messages. However, as the sync calls are not cheap, especially for reading the routing tables, they are performed on a cadence of minutes to hours, which makes recovery slow.
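
The async/sync split described above can be sketched as a small model. This is only an illustration: the `RouteView` class and its method names are hypothetical and are not taken from any daemon; real daemons read rtsock messages and sysctl(3) dumps rather than Python sets.

```python
class RouteView:
    """A daemon-side copy of the kernel routing table (prefixes only)."""

    def __init__(self):
        self.routes = set()

    def on_async(self, op, prefix):
        """Apply one asynchronous notification ('add' or 'delete')."""
        if op == "add":
            self.routes.add(prefix)
        elif op == "delete":
            self.routes.discard(prefix)

    def on_full_scan(self, kernel_routes):
        """Replace the view with a full dump and report what had drifted."""
        kernel_routes = set(kernel_routes)
        missing = kernel_routes - self.routes  # in the kernel, never learned
        stale = self.routes - kernel_routes    # learned, gone from the kernel
        self.routes = kernel_routes
        return missing, stale


view = RouteView()
view.on_async("add", "10.1.0.0/24")
# Suppose the notification for 10.1.0.1/32 was lost or ignored.
missing, stale = view.on_full_scan({"10.1.0.0/24", "10.1.0.1/32"})
print(sorted(missing), sorted(stale))
```

The lost host route only shows up in `missing` once the full scan runs, which is why a long scan interval translates directly into slow recovery.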

Interface address setup

The following resources are generated:

  • (1) link-local entry, corresponding to the interface (given the interface is Ethernet or similar). This translates to an RTM_ADD rtsock notification with RTF_LLDATA
  • (2 "newaddrs") the interface address itself. This translates to an RTM_NEWADDR rtsock notification
  • (3 "hostroute") host route for the interface address. This translates to RTM_ADD for the host route
  • (4 "prefixroute") prefix route for the interface address and mask. This translates to RTM_ADD for the prefix
  • corresponding multicast group(s)
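
As a rough illustration of items 2 to 4, the sketch below derives the address, host route and prefix route that a consumer would expect to hear about for a given interface address. The function name `expected_notifications` is hypothetical, and the link-local entry and multicast groups are deliberately left out.

```python
import ipaddress


def expected_notifications(cidr):
    """List (label, rtsock message type, subject) for one interface address."""
    iface = ipaddress.ip_interface(cidr)
    # The host route covers exactly the address: /32 for IPv4, /128 for IPv6.
    host = ipaddress.ip_network(f"{iface.ip}/{iface.ip.max_prefixlen}")
    return [
        ("newaddrs", "RTM_NEWADDR", str(iface)),
        ("hostroute", "RTM_ADD", str(host)),
        ("prefixroute", "RTM_ADD", str(iface.network)),
    ]


for row in expected_notifications("10.1.0.1/24"):
    print(row)
for row in expected_notifications("2a02:6b8:6::6/64"):
    print(row)
```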

IPv4 world

Let's see how it works in IPv4:

The user's SIOCAIFADDR is handled by in_aifaddr_ioctl(), which calls rtinit1() via in_addprefix().
rtinit1() generates the "newaddrs" message:

RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
  vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255

Note the _netmask_ sa: it is actually AF_LINK and not AF_INET. This is one of the things fixed in this change.
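
A consumer can detect this kind of mismatch by comparing the family byte of the netmask sockaddr with that of the address sockaddr. The sketch below assumes the BSD sockaddr layout (sa_len, sa_family, then data) and the FreeBSD address-family numbers; the helper names and the sample AF_LINK mask bytes are made up for illustration.

```python
import socket
import struct

# Address-family numbers as defined on FreeBSD (an assumption of this sketch;
# other systems use different values for AF_LINK and AF_INET6).
AF_INET, AF_LINK, AF_INET6 = 2, 18, 28


def sa_family(sa):
    """Return the family of a BSD-style sockaddr: (sa_len, sa_family, ...)."""
    _sa_len, family = struct.unpack_from("=BB", sa)
    return family


def make_sockaddr_in(addr):
    """Pack a 16-byte BSD sockaddr_in holding an IPv4 address or mask."""
    return struct.pack("=BBH4s8x", 16, AF_INET, 0, socket.inet_aton(addr))


def netmask_family_ok(ifa_sa, mask_sa):
    """True when the netmask sockaddr has the same family as the address."""
    return sa_family(ifa_sa) == sa_family(mask_sa)


ifa = make_sockaddr_in("10.1.0.1")
good_mask = make_sockaddr_in("255.255.255.0")
bad_mask = struct.pack("=BB6x", 8, AF_LINK)  # hypothetical AF_LINK-typed mask
print(netmask_family_ok(ifa, good_mask), netmask_family_ok(ifa, bad_mask))
```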

Message 3 "hostroute" is simply not generated for IPv4, so routing daemons pick up the host /32 only from a scan at a later point in time (see testing).

Message 4 "prefixroute" is also generated by rtinit1():

RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.1.0.0 link#3 255.255.255.0

As the flags do NOT contain RTF_DONE, this message is ignored by bird/quagga. However, they typically don't care, as they derive these prefixes from message 2 ("newaddrs"). Please see more details in the Testing section.
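
For readers who want to check the flags field by hand, here is a minimal decoder for the flag names that appear in the dumps on this page. The numeric values are the ones found in FreeBSD's net/route.h as far as I know; treat them as an assumption and verify them against the headers of the target system. Only a subset of flags is covered.

```python
# Subset of route flags; values assumed from FreeBSD's net/route.h.
RTF_FLAGS = [
    (0x1, "UP"),
    (0x2, "GATEWAY"),
    (0x4, "HOST"),
    (0x40, "DONE"),
    (0x800, "STATIC"),
    (0x100000, "PINNED"),
]
RTF_DONE = 0x40


def decode_flags(flags):
    """Render a flags word in the <A,B,C> style used by the dumps above."""
    names = [name for bit, name in RTF_FLAGS if flags & bit]
    return "<" + ",".join(names) + ">"


def has_done(flags):
    """True when RTF_DONE is set, i.e. the message confirms a completed change."""
    return bool(flags & RTF_DONE)


print(decode_flags(0x100001), has_done(0x100001))
print(decode_flags(0x100041), has_done(0x100041))
```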

IPv6 world

The IPv6 control plane starts in in6_update_ifa(). IPv6 address setup is complex, so the code currently takes an "all-or-nothing" approach to reporting changes to userland: it generates rtsock messages if and only if everything was successful.
However, this approach has the drawback of not having the data (rtentries) to generate these messages from, resulting in a hackish in6_newaddrmsg() that creates the "newaddrs" and "hostroute" messages.

Message 2 "newaddrs":

got message of size 164 on Fri Dec 27 21:59:50 2019
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
 link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6

Note the _netmask_ sa: it is actually AF_LINK and not AF_INET6. This is one of the things fixed in this change.

Message 3 "hostroute":

got message of size 272 on Fri Dec 27 21:59:50 2019
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::

Note that the RTF_DONE flag is not set, so this message is ignored by bird/quagga.
Note also the useless NETMASK sockaddr being passed, even though this is a host route.

Message 4 "prefixroute", generated by nd6_prefix_onlink() afterwards:

RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6

This looks fine at first glance. In reality, however, it does not fill in the interface index in rtm->rtm_index (rt_missmsg_fib() does not do that), so this message is ignored by bird.

How these messages are perceived by the routing daemons

Quagga/FRR

rtm_read() ignores all rtm messages without RTF_DONE: quagga/kernel_socket.c at 88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b · Quagga/quagga · GitHub

Additionally, quagga ignores all routes without a gateway (RTF_GATEWAY): quagga/kernel_socket.c at 88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b · Quagga/quagga · GitHub

This makes quagga ignore both the "hostroute" and "prefixroute" messages, relying only on the "newaddrs" message to construct its view. Lastly, the netmask SA in "newaddrs" is incorrect (wrong family/af_size). However, as variations of this have been present in *BSDs for decades, routing daemons have already worked around that.

bird

bird also ignores all rtm messages w/o RTF_DONE: bird/krt-sock.c at 822a7ee6d5cd9bf38548026e0dd52fbc4634030d · BIRD/bird · GitHub.

It also relies on the "newaddrs" message to construct the prefix routes. Learning the host route is postponed until the next route sync:

2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created
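
The acceptance rules described in this section can be summarised as two predicates. This is a simplified model of the behaviour stated above, not code from Quagga or bird: the `RouteMsg` type and both function names are hypothetical, and the flag values are assumed from FreeBSD's net/route.h.

```python
from dataclasses import dataclass

RTF_UP = 0x1
RTF_GATEWAY = 0x2
RTF_DONE = 0x40


@dataclass
class RouteMsg:
    flags: int    # rtm_flags of the route message
    ifindex: int  # rtm_index; 0 means "not filled in"


def quagga_like_accepts(msg):
    """Model: needs RTF_DONE and a gateway (RTF_GATEWAY)."""
    return bool(msg.flags & RTF_DONE) and bool(msg.flags & RTF_GATEWAY)


def bird_like_accepts(msg):
    """Model: needs RTF_DONE and a known (non-zero) interface index."""
    return bool(msg.flags & RTF_DONE) and msg.ifindex != 0


# IPv6 "prefixroute" before the change: <UP,DONE> but rtm_index left at 0.
old_prefixroute = RouteMsg(flags=RTF_UP | RTF_DONE, ifindex=0)
# After the change, assuming the index is 3 as "link#3" suggests.
new_prefixroute = RouteMsg(flags=RTF_UP | RTF_DONE, ifindex=3)
print(bird_like_accepts(old_prefixroute), bird_like_accepts(new_prefixroute))
```

Under this model the directly connected prefix route is still skipped by the Quagga-like predicate in both cases, since it carries no RTF_GATEWAY, which matches the statement that Quagga relies on "newaddrs" alone.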

Changes

IPv4

Newaddrs

OLD:

RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
  vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255

NEW:

got message of size 124 on Mon Dec 30 22:56:57 2019
RTM_NEWADDR: address being added to iface: len 124, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 vlan2:52.54.0.14.e3.19 10.1.0.1 10.1.0.255

Changes:

  • proper netmask AF

Prefixroute

OLD:

RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.1.0.0 link#3 255.255.255.0

NEW:

RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.1.0.0 link#3 255.255.255.0

Changes:

  • RTF_DONE is set

IPv6

"newaddrs" message:

OLD:

RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
 link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6

NEW:

RTM_NEWADDR: address being added to iface: len 140, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
 ffff:ffff:ffff:ffff:: vlan2:52.54.0.14.e3.19 2a02:6b8:6::6

Changes:

  • Netmask has proper AF

"Hostroute" message:

OLD:

RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::

NEW:

RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,HOST,DONE,STATIC,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY>
 2a02:6b8:6::6 link#3

Changes:

  • RTF_DONE is set, along with RTF_PINNED, which is the real flag set on the route by the ifa_maintain_loopback_route()
  • No netmask SA

"Prefixroute" message:

OLD:

RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6

NEW:

RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 2a02:6b8:6:: link#3 ffff:ffff:ffff:ffff::

Changes:

  • rtm_index is filled in (not visible here)
  • The IFP/IFA sockaddrs have been removed. Other route messages do not contain this info, as it is excessive: it can easily be obtained via getifaddrs(3).
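
With rtm_index filled in, a consumer can resolve the interface itself instead of relying on IFP/IFA sockaddrs in the message. Python's standard library does not wrap getifaddrs(3), so the sketch below swaps in `socket.if_indextoname()` to map an index to a name; the helper name `ifname_for` is hypothetical.

```python
import socket


def ifname_for(rtm_index):
    """Return the interface name for an index, or None if it is unknown."""
    try:
        return socket.if_indextoname(rtm_index)
    except OSError:
        # Index 0 (not filled in) and indexes of vanished interfaces end up here.
        return None


# Show the name resolved for every interface index known to this host.
for index, name in socket.if_nameindex():
    print(index, name, ifname_for(index))
```

An index of 0 resolves to nothing, which corresponds to the "unknown ifindex 0" error that bird logged before the change.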

BEFORE - bird:

2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created

AFTER - bird:

2019-12-30 23:01:12 <TRACE> static1 < interface vlan2 goes up
2019-12-30 23:01:12 <TRACE> direct1 < primary address 2a02:6b8:6::/64 on interface vlan2 added
2019-12-30 23:01:12 <TRACE> direct1 > added [best] 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1 < rejected by protocol 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien async] created

Test Plan

Multiple tests have been added to verify the expected behaviour.

Diff Detail

Repository
rS FreeBSD src repository - subversion