Page MenuHomeFreeBSD

Stage 1: Introduce nexhop objects and new routing kpi

Authored by melifaro on Mar 30 2020, 9:45 PM.


NOTE: This is the part of the larger change introducing new routing kpi, nexhops and scalable multipath: D24141

Please see the description of that review for the reasoning and high-level technical overview.

This change focus on bringing the pre-reqisits in compatible way, allowing to introduce the changes sequentially in small pieces.
To be more specific, the change keeps the existing routing control plane while still calculating nexhop objects AND adds nexthop references to the struct rtentry. This allows both old and new routing kpi to coexists.

This review introduces the following 4 dataplane functions:

struct nhop_object *fib4_lookup_nh_ptr(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup_nh_ptr(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid);

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if);

The first 2 are intended to replace the existing fib api and all flavours of *rtalloc* functions.
The second two are intended to be used by firewalls to provide efficient reverse path forwarding verification.

Existing fib(4) functions have been converted to use the kpi. They are mostly used in forwarding. To ease the review process, conversion of the other consumers will be submitted as separate reviews.

  • D24245 - Convert IP/IPv6 forwarding and ICMP processing to the new routing KPI.
  • D24334 - Convert ip6_forward() to the new routing KPI.
  • D24340 - Convert route caching to nexthop caching


  • Per-rte traffic statistics will not be supported with the proposed changes. The only statistics available is be per-nexthop. With more and more consumers using the new kpi, the per-rte statistics will become less and less relevant. Given that, netstat(8) per-rte statistics reported should be turned off at some point of this conversion.
  • rtentry size and cache lines. Header of struct rtentry is occupied by radix header structures consisting of 2*6 pointers, leaving 4 pointers remaining in first 128 bytes. Currently datapath-required fields like rt_mtu` and rt_pksent are already outsize this boundary, so temporarily adding another pointer should not make the things worse.
Test Plan
test_rtsock_l3:rtm_add_v4_gu_ifa_ordered_success  ->  passed  [0.123s]
test_rtsock_l3:rtm_add_v4_gw_direct_success  ->  passed  [0.122s]
test_rtsock_l3:rtm_add_v4_temporal1_success  ->  passed  [0.139s]
test_rtsock_l3:rtm_add_v6_gu_gw_gu_direct_success  ->  passed  [0.122s]
test_rtsock_l3:rtm_add_v6_gu_ifa_hostroute_success  ->  passed  [0.131s]
test_rtsock_l3:rtm_add_v6_gu_ifa_ordered_success  ->  passed  [0.121s]
test_rtsock_l3:rtm_add_v6_gu_ifa_prefixroute_success  ->  passed  [0.124s]
test_rtsock_l3:rtm_add_v6_temporal1_success  ->  passed  [0.124s]
test_rtsock_l3:rtm_change_v4_gw_success  ->  passed  [0.133s]
test_rtsock_l3:rtm_change_v4_mtu_success  ->  passed  [0.127s]
test_rtsock_l3:rtm_change_v6_gw_success  ->  passed  [0.145s]
test_rtsock_l3:rtm_change_v6_mtu_success  ->  passed  [0.131s]
test_rtsock_l3:rtm_del_v4_gu_ifa_prefixroute_success  ->  passed  [0.127s]
test_rtsock_l3:rtm_del_v4_prefix_nogw_success  ->  passed  [0.124s]
test_rtsock_l3:rtm_del_v6_gu_ifa_hostroute_success  ->  passed  [0.130s]
test_rtsock_l3:rtm_del_v6_gu_ifa_prefixroute_success  ->  passed  [0.139s]
test_rtsock_l3:rtm_del_v6_gu_prefix_nogw_success  ->  passed  [0.131s]
test_rtsock_l3:rtm_get_v4_empty_dst_failure  ->  passed  [0.004s]
test_rtsock_l3:rtm_get_v4_exact_success  ->  passed  [0.137s]
test_rtsock_l3:rtm_get_v4_hostbits_failure  ->  passed  [0.129s]
test_rtsock_l3:rtm_get_v4_lpm_success  ->  passed  [0.121s]
test_rtsock_lladdr:rtm_add_v4_gu_lle_success  ->  passed  [0.126s]
test_rtsock_lladdr:rtm_add_v6_gu_lle_success  ->  passed  [0.125s]
test_rtsock_lladdr:rtm_add_v6_ll_lle_success  ->  passed  [0.121s]
test_rtsock_lladdr:rtm_del_v4_gu_lle_success  ->  passed  [0.122s]
test_rtsock_lladdr:rtm_del_v6_gu_lle_success  ->  passed  [0.120s]
test_rtsock_lladdr:rtm_del_v6_ll_lle_success  ->  passed  [0.120s]

forward:fwd_ip_icmp_gw_fast_success  ->  passed  [1.053s]
forward:fwd_ip_icmp_gw_slow_success  ->  passed  [1.050s]
forward:fwd_ip_icmp_iface_fast_success  ->  passed  [1.046s]
forward:fwd_ip_icmp_iface_slow_success  ->  passed  [1.055s]
ip_reass_test:ip_reass__large_fragment  ->  passed  [0.014s]
ip_reass_test:ip_reass__multiple_last_fragments  ->  passed  [0.032s]
ip_reass_test:ip_reass__zero_length_fragment  ->  passed  [0.015s]
output:output_raw_success  ->  passed  [0.075s]
output:output_tcp_setup_success  ->  passed  [0.138s]
output:output_udp_setup_success  ->  passed  [1.221s]
redirect:valid_redirect  ->  passed  [1.025s]
so_reuseport_lb_test:basic_ipv4  ->  passed  [0.600s]
so_reuseport_lb_test:basic_ipv6  ->  passed  [1.334s]
socket_afinet:socket_afinet  ->  passed  [0.003s]
socket_afinet:socket_afinet_bind_ok  ->  passed  [0.002s]

forward6:fwd_ip6_gu_icmp_gw_gu_fast_success  ->  passed  [2.993s]
forward6:fwd_ip6_gu_icmp_gw_gu_slow_success  ->  passed  [2.702s]
forward6:fwd_ip6_gu_icmp_gw_ll_fast_success  ->  passed  [3.132s]
forward6:fwd_ip6_gu_icmp_gw_ll_slow_success  ->  passed  [3.011s]
forward6:fwd_ip6_gu_icmp_iface_fast_success  ->  passed  [2.683s]
forward6:fwd_ip6_gu_icmp_iface_slow_success  ->  passed  [3.356s]
mld:mldraw01  ->  passed  [4.274s]
output6:output6_raw_flowid_mpath_success  ->  skipped: This test requires ROUTE_MPATH enabled  [0.018s]
output6:output6_raw_success  ->  passed  [2.014s]
output6:output6_tcp_flowid_mpath_success  ->  skipped: This test requires ROUTE_MPATH enabled  [0.022s]
output6:output6_tcp_setup_success  ->  passed  [2.106s]
output6:output6_udp_flowid_mpath_success  ->  skipped: This test requires ROUTE_MPATH enabled  [0.028s]
output6:output6_udp_setup_success  ->  passed  [2.599s]
redirect:valid_redirect  ->  passed  [3.103s]
scapyi386:scapyi386  ->  passed  [4.107s]

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

melifaro retitled this revision from Stage 1: Introduce nexhop objects. to Stage 1: Introduce nexhop objects and new routing kpi.Mar 30 2020, 10:15 PM
melifaro edited the summary of this revision. (Show Details)
melifaro edited the test plan for this revision. (Show Details)
melifaro added reviewers: network, glebius, ae, bz, olivier.
1730 ↗(On Diff #70031)

It seems there is another exit point, but you didn't add nhop_free_object(nh) here. Is it ok?

1905 ↗(On Diff #70031)

Probably the "goto bad" case needs nhop_free_object(). Isn't it?

melifaro edited the summary of this revision. (Show Details)

Address ae@ comments.
Remove debugging (RTDEBUG)
Sort includes.

melifaro marked 2 inline comments as done.

Fix rtrequest1_change().

Improve comments, remove the multipath parts sneaked in.

27 ↗(On Diff #70031)

I think usually we use __FBSDID() for *.c files.

282 ↗(On Diff #70089)

This locking looks redundant.

192 ↗(On Diff #70089)

/* _KERNEL */

362 ↗(On Diff #70089)

this lock also seems unneeded.

127 ↗(On Diff #70089)

It would be better use something one - memset or bzero :)

137 ↗(On Diff #70089)

IMHO, using gw_sa and sa_len is preferable. It mean address family independence.

40 ↗(On Diff #70089)

It seems this macro is unused and useless.

304 ↗(On Diff #70089)

sin4.sin_family = AF_INET?

163 ↗(On Diff #70089)

Do we need to initialize sin6_family here?

213 ↗(On Diff #70089)


269 ↗(On Diff #70089)


melifaro marked 7 inline comments as done.

Remove NH_PRIV lock, switch to refcount(9)-based nh_linked field
to track unlink status.
Remove nhop_req request layer, thus removing double copying for each
nhop request.
Address ae@ comments.

304 ↗(On Diff #70089)

No, radix doesn't care of anything that is below sockaddr adress offset.
Additionally, there is a review ( D23051 ) to allow address lookups, eliminating the need to use sockaddrs as a lookup key .

I agree that generally this should be filled in properly to ease code understanding, but I really hope to commit the sockaddr change soon.

163 ↗(On Diff #70089)

Same as in IPv4 case: radix does not care of anything that is below the address offset in the struct sockaddr_in6.

I'm going to commit it on Sunday, April 12 unless anyone has any objection.

Fix new KPI compatibility with current mpath implementation.

I'm going to commit it on Sunday, April 12 unless anyone has any objection.

"Поехали!" (c) :-)

Pre-commit update:

  • fix useland directory creation
  • rename fib[46]_lookup_nh_ptr to just fib[46]_lookup
  • rename fib[46]_lookup_urpf to fib[46]_check_urpf
  • rename nhop_free_object to nhop_free
  • add forgotten unlock in nhops_dump()
  • fix improper return code check for bitmask_copy
  • fix some DPRINT arguments
  • improve documentation a bit
This revision was not accepted when it landed; it landed in state Needs Review.Apr 12 2020, 2:30 PM
This revision was automatically updated to reflect the committed changes.