There are multiple problems in both the IPv4 and IPv6 code that lead to rtsock messages being generated with incorrect data.
Additionally, due to the lack of a proper API, the IPv6 code uses ugly hacks to generate rtm messages of the needed kind.
Some background on the routing messages and their handling by the popular daemons is provided below for the interested reader:
### rtsock/sysctl interface
Typically routing daemons use both the rtsock "async" interface and the `sysctl(3)` "sync" interface to keep their information on interfaces and routes in sync with the kernel.
This makes it possible to recover from both missed (for example, due to excessive `RTM_MISSMSG` generation) and incorrect rtsock messages. However, as the sync calls are not cheap, especially for reading the routing tables, they are performed on a cadence of minutes to hours, so recovery is not fast.
### Interface address setup
The following resources are generated:
* (1) link-local entry, corresponding to the interface (given the interface is ethernet or similar). This translates to an `RTM_ADD` rtsock notification with `RTF_LLDATA`.
* (2 "newaddrs") interface address itself. This translates to an `RTM_NEWADDR` rtsock notification.
* (3 "hostroute") host route for the interface address. This translates to `RTM_ADD` for the host route.
* (4 "prefixroute") prefix route for the interface address & mask. This translates to `RTM_ADD` for the prefix.
* corresponding multicast group(s)
#### IPv4 world
Let's see how it works in IPv4:
User's `SIOCAIFADDR` is handled by the //in_aifaddr_ioctl()//, which calls //rtinit1()// via //in_addprefix()//.
//rtinit1()// generates message 2 "newaddrs":
```
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255
```
Note the _netmask_ sa: it is actually AF_LINK and not AF_INET. This is one of the things fixed in this change.
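To make the netmask problem concrete, here is a minimal sketch of how a userland consumer could walk the sockaddrs that follow the message header and compare the family of the netmask with the family of the interface address. It assumes the usual rtsock layout (sockaddrs packed in ascending bit order of the address bitmask, each padded to `sizeof(long)`). All `ex_`/`EX_` names are hypothetical stand-ins defined locally so the sketch compiles anywhere; they are not the real header macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins (assumptions), not the real system header macros. */
#define EX_AF_INET      2
#define EX_AF_LINK      18
#define EX_RTAX_NETMASK 2   /* bit index of NETMASK in the address bitmask */
#define EX_RTAX_IFA     5   /* bit index of IFA in the address bitmask */
#define EX_RTAX_MAX     8

/* Every BSD sockaddr starts with a length byte followed by a family byte. */
struct ex_sa_hdr {
    uint8_t sa_len;
    uint8_t sa_family;
};

/* Space one sockaddr occupies in the message: sa_len rounded up to sizeof(long). */
static size_t
ex_sa_size(uint8_t len)
{
    size_t a = sizeof(long);

    return (len == 0) ? a : (((size_t)len + a - 1) & ~(a - 1));
}

/*
 * Record the address family of each sockaddr present in `addrs`.
 * fam[i] is the family for bit i, or -1 if that sockaddr is absent.
 * Returns 0 on success, -1 if the buffer ends before a sockaddr header.
 */
static int
ex_parse_families(const uint8_t *buf, size_t buflen, int addrs,
    int fam[EX_RTAX_MAX])
{
    size_t off = 0;

    assert(buf != NULL && fam != NULL);
    for (int i = 0; i < EX_RTAX_MAX; i++)
        fam[i] = -1;
    for (int i = 0; i < EX_RTAX_MAX; i++) {
        struct ex_sa_hdr h;

        if ((addrs & (1 << i)) == 0)
            continue;
        if (off + sizeof(h) > buflen)
            return -1;
        memcpy(&h, buf + off, sizeof(h));
        fam[i] = h.sa_family;
        off += ex_sa_size(h.sa_len);
    }
    return 0;
}

/* Returns 1 when a netmask is present and shares the family of the IFA. */
static int
ex_netmask_family_ok(const int fam[EX_RTAX_MAX])
{
    return fam[EX_RTAX_NETMASK] != -1 &&
        fam[EX_RTAX_NETMASK] == fam[EX_RTAX_IFA];
}
```

Fed a message shaped like the dump above, with the netmask slot carrying an `AF_LINK` sockaddr next to an `AF_INET` address, `ex_netmask_family_ok()` returns 0.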
Message 3 "hostroute" is simply **not generated** for IPv4, resulting in routing daemons picking up the host /32 from the scan at a later point in time (see testing).
Message 4 "prefixroute" is also generated by //rtinit1()//:
```
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.1.0.0 link#3 255.255.255.0
```
As the flags do NOT contain `RTF_DONE`, this message **is ignored** by bird/quagga. However, they typically don't care, as they derive these prefixes from message 2 "newaddrs". Please see more details in the Testing section.
#### IPv6 world
The IPv6 control plane starts in //in6_update_ifa()//. IPv6 address setup is complex, so currently the code takes an "all-or-nothing" approach to reporting changes to userland: it generates rtsock messages if and only if everything was successful.
However, this approach has the drawback of not having the data (rtentries) to generate these messages from, resulting in a hackish //in6_newaddrmsg()// creating the "newaddrs" and "hostroute" messages.
Message 2 "newaddrs":
```
got message of size 164 on Fri Dec 27 21:59:50 2019
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
```
Note the _netmask_ sa: it is actually AF_LINK and not AF_INET6. This is one of the things fixed in this change.
Message 3 "hostroute":
```
got message of size 272 on Fri Dec 27 21:59:50 2019
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::
```
Note that the `RTF_DONE` flag is not set, so this message is ignored by bird/quagga.
Note also the useless `NETMASK` sockaddr being passed, even though this is a host route.
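The point about the netmask can be sketched as a rule for choosing which sockaddrs to announce: a host route is fully described by its destination and gateway, so the netmask is only included for non-host routes. This is a hypothetical illustration of that rule, not the kernel's code; the `ex_`/`EX_` names and the `struct ex_route` type are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins (assumptions) for the route flag and address bits. */
#define EX_RTF_UP      0x1
#define EX_RTF_HOST    0x4
#define EX_RTA_DST     0x1
#define EX_RTA_GATEWAY 0x2
#define EX_RTA_NETMASK 0x4

/* Hypothetical, heavily simplified route description. */
struct ex_route {
    int flags;
};

/* Pick the sockaddr bitmask for a route message: no netmask for host routes. */
static int
ex_route_msg_addrs(const struct ex_route *rt)
{
    int addrs = EX_RTA_DST | EX_RTA_GATEWAY;

    assert(rt != NULL);
    if ((rt->flags & EX_RTF_HOST) == 0)
        addrs |= EX_RTA_NETMASK;
    return addrs;
}
```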
Message 4 "prefixroute" is generated by //nd6_prefix_onlink()// afterwards:
```
RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
```
Fine at first glance; however, in reality it does not fill in the interface index in //rtm->rtm_index// (//rt_missmsg_fib()// does not do that), so this message is ignored by bird.
### How these messages are perceived by the routing daemons
#### Quagga/FRR
//rtm_read()// ignores all rtm messages w/o `RTF_DONE`. [quagga/kernel_socket.c at 88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b · Quagga/quagga · GitHub](https://github.com/Quagga/quagga/blob/88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b/zebra/kernel_socket.c#L884)
Additionally, quagga ignores all routes without a gateway (`RTF_GATEWAY`): [quagga/kernel_socket.c at 88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b · Quagga/quagga · GitHub](https://github.com/Quagga/quagga/blob/88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b/zebra/kernel_socket.c#L902)
This makes quagga ignore both the "hostroute" and "prefixroute" messages, relying only on the "newaddrs" message to construct its view. Lastly, the netmask SA in "newaddrs" is incorrect (wrong family/af_size). However, as variations of this have been present in *BSDs for decades, routing daemons have already worked around that.
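The two filters described above can be sketched as a simple predicate on the message flags. This is a simplified model of the behaviour described here, not a copy of zebra's code. The `EX_RTF_*` constants are local stand-ins whose values are assumed to mirror the FreeBSD route flags, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins (assumptions) for the route flags. */
#define EX_RTF_UP      0x1
#define EX_RTF_GATEWAY 0x2
#define EX_RTF_DONE    0x40
#define EX_RTF_PINNED  0x100000

/* Model of the described filter: require both RTF_DONE and RTF_GATEWAY. */
static bool
ex_zebra_like_accepts(int rtm_flags)
{
    if ((rtm_flags & EX_RTF_DONE) == 0)
        return false;   /* message not marked as completed */
    if ((rtm_flags & EX_RTF_GATEWAY) == 0)
        return false;   /* directly connected route, no gateway */
    return true;
}

/* Count how many messages in a batch would pass the filter. */
static size_t
ex_count_accepted(const int *flags, size_t n)
{
    size_t accepted = 0;

    assert(flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ex_zebra_like_accepts(flags[i]))
            accepted++;
    }
    return accepted;
}
```

Under this model a route flagged `<UP,PINNED>` is dropped by the first check, and one flagged `<UP,DONE,PINNED>` is still dropped by the second, which matches the observation that the daemon builds its view from "newaddrs" alone.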
#### bird
bird also ignores all rtm messages w/o `RTF_DONE`: [bird/krt-sock.c at 822a7ee6d5cd9bf38548026e0dd52fbc4634030d · BIRD/bird · GitHub](https://github.com/BIRD/bird/blob/822a7ee6d5cd9bf38548026e0dd52fbc4634030d/sysdep/bsd/krt-sock.c#L349).
It also relies on the "newaddrs" message to construct the prefix routes. Learning the host route is postponed until the next route sync:
```
2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created
```
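The behaviour visible in this log can be modelled as two checks on an incoming route message: skip it when `RTF_DONE` is missing, and reject it when the interface index is 0. The sketch below is modelled on the log lines and the description above, not on bird's source. `struct ex_rtmsg`, the verdict enum and the `EX_RTF_*` values are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins (assumptions) for the route flags. */
#define EX_RTF_UP     0x1
#define EX_RTF_HOST   0x4
#define EX_RTF_DONE   0x40
#define EX_RTF_STATIC 0x800

/* Hypothetical subset of a route message: flags and interface index. */
struct ex_rtmsg {
    int            flags;
    unsigned short index;   /* 0 means the sender did not fill it in */
};

enum ex_verdict {
    EX_IGNORED_NOT_DONE,     /* no RTF_DONE: silently skipped */
    EX_REJECTED_NO_IFINDEX,  /* "unknown ifindex 0" style error */
    EX_ACCEPTED
};

/* Apply the two checks in order and report what happens to the message. */
static enum ex_verdict
ex_bird_like_check(const struct ex_rtmsg *m)
{
    assert(m != NULL);
    if ((m->flags & EX_RTF_DONE) == 0)
        return EX_IGNORED_NOT_DONE;
    if (m->index == 0)
        return EX_REJECTED_NO_IFINDEX;
    return EX_ACCEPTED;
}
```

A message flagged `<UP,DONE>` with index 0 ends up as `EX_REJECTED_NO_IFINDEX`, corresponding to the `<ERR>` line above, while the same flags with a non-zero index pass.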
## Changes
### IPv4
#### Newaddrs
OLD:
```
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255
```
NEW:
```
got message of size 124 on Mon Dec 30 22:56:57 2019
RTM_NEWADDR: address being added to iface: len 124, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan2:52.54.0.14.e3.19 10.1.0.1 10.1.0.255
```
Changes:
* proper netmask AF
#### Prefixroute
OLD:
```
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.1.0.0 link#3 255.255.255.0
```
NEW:
```
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.1.0.0 link#3 255.255.255.0
```
Changes:
* `RTF_DONE` is set
### IPv6
#### "newaddrs" message:
OLD:
```
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
```
NEW:
```
RTM_NEWADDR: address being added to iface: len 140, metric 0, flags:<HOST>
sockaddrs: <NETMASK,IFP,IFA>
ffff:ffff:ffff:ffff:: vlan2:52.54.0.14.e3.19 2a02:6b8:6::6
```
Changes:
* Netmask has proper AF
#### "Hostroute" message:
OLD:
```
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::
```
NEW:
```
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,HOST,DONE,STATIC,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY>
2a02:6b8:6::6 link#3
```
Changes:
* `RTF_DONE` is set, along with `RTF_PINNED`, which is the real flag set on the route by //ifa_maintain_loopback_route()//
* No netmask SA
#### "Prefixroute" message:
OLD:
```
RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
```
NEW:
```
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
2a02:6b8:6:: link#3 ffff:ffff:ffff:ffff::
```
Changes:
* //rtm_index// is filled in (not visible here)
* The `IFP`/`IFA` sockaddrs have been removed. Other route messages do not contain this info, as it is redundant: it can easily be obtained via `getifaddrs(3)`.
BEFORE - bird:
```
2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created
```
AFTER - bird:
```
2019-12-30 23:01:12 <TRACE> static1 < interface vlan2 goes up
2019-12-30 23:01:12 <TRACE> direct1 < primary address 2a02:6b8:6::/64 on interface vlan2 added
2019-12-30 23:01:12 <TRACE> direct1 > added [best] 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1 < rejected by protocol 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien async] created
```