Page MenuHomeFreeBSD

Bring back redirect route expiration.
AbandonedPublic

Authored by melifaro on Jan 1 2020, 1:45 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 1:40 PM
Unknown Object (File)
Sat, Jan 18, 4:32 AM
Unknown Object (File)
Sat, Jan 11, 4:58 AM
Unknown Object (File)
Sat, Jan 11, 4:58 AM
Unknown Object (File)
Sat, Jan 11, 1:52 AM
Unknown Object (File)
Sat, Jan 4, 1:01 PM
Unknown Object (File)
Sat, Dec 28, 9:05 PM
Unknown Object (File)
Sat, Dec 28, 3:44 AM
Subscribers

Details

Reviewers
bz
ae
donner
Group Reviewers
network
Summary

Redirect (and temporal) route expiration was broken a while ago.

This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions.

Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run.
The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal.

To support this callout, rib_fibnum, rib_family and rib_vnet fields are added to the rhn. These will be used by the upcoming route changes as well.

This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl.

Test Plan
12:20 [1] m@devel0 s route add -6 2a02:6b8:1::/64 fe80::5054:ff:fe42:fef%vtnet0 -expire +10
add net 2a02:6b8:1::/64: gateway fe80::5054:ff:fe42:fef%vtnet0
12:21 [1] m@devel0 route -n get -6 2a02:6b8:1::/64
   route to: 2a02:6b8:1::
destination: 2a02:6b8:1::
       mask: ffff:ffff:ffff:ffff::
    gateway: fe80::5054:ff:fe42:fef%vtnet0
        fib: 0
  interface: vtnet0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         4
12:21 [1] m@devel0 route -n get -6 2a02:6b8:1::/64
   route to: 2a02:6b8:1::
destination: 2a02:6b8:1::
       mask: ffff:ffff:ffff:ffff::
    gateway: fe80::5054:ff:fe42:fef%vtnet0
        fib: 0
  interface: vtnet0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         1
12:21 [1] m@devel0 route -n get -6 2a02:6b8:1::/64
route: route has not been found
12:22 [1] m@devel0 s route add -net 10.2.0.0/24 10.0.0.2 -expire +20
add net 10.2.0.0: gateway 10.0.0.2
12:22 [1] m@devel0 route -n get 10.2.0.0/24
   route to: 10.2.0.0
destination: 10.2.0.0
       mask: 255.255.255.0
    gateway: 10.0.0.2
        fib: 0
  interface: vtnet0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1        16
12:22 [1] m@devel0 route -n get 10.2.0.0/24
   route to: 10.2.0.0
destination: 10.2.0.0
       mask: 255.255.255.0
    gateway: 10.0.0.2
        fib: 0
  interface: vtnet0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         6
12:22 [1] m@devel0
12:22 [1] m@devel0
12:22 [1] m@devel0
12:22 [1] m@devel0 route -n get 10.2.0.0/24
route: route has not been found

added tests (route expiration)

14:33 [0] m@devel0 s kyua test -k /usr/kyua test -k /usr/obj/usr/home/melifaro/free/head/amd64.amd64/tests/sys/net/routing/Kyuafile
test_rtsock_l3:rtm_add_v4_gw_direct_success  ->  passed  [0.008s]
test_rtsock_l3:rtm_add_v4_temporal1_success  ->  passed  [0.015s]
test_rtsock_l3:rtm_add_v6_gu_gw_gu_direct_success  ->  passed  [0.005s]
test_rtsock_l3:rtm_add_v6_temporal1_success  ->  failed: /usr/home/melifaro/free/head/tests/sys/net/routing/test_rtsock_l3.c:152: NETMASK sa diff: overall memcmp() reports diff for af 28 offset 2  [0.008s]
test_rtsock_l3:rtm_del_v4_prefix_nogw_success  ->  passed  [0.004s]
test_rtsock_l3:rtm_del_v6_gu_prefix_nogw_success  ->  failed: /usr/home/melifaro/free/head/tests/sys/net/routing/test_rtsock_l3.c:152: NETMASK sa diff: overall memcmp() reports diff for af 28 offset 2  [0.004s]
test_rtsock_l3:rtm_get_v4_empty_dst_failure  ->  passed  [0.002s]
test_rtsock_l3:rtm_get_v4_exact_success  ->  passed  [0.004s]
test_rtsock_l3:rtm_get_v4_hostbits_failure  ->  passed  [0.004s]
test_rtsock_l3:rtm_get_v4_lpm_success  ->  passed  [0.005s]
test_rtsock_lladdr:rtm_add_v4_gu_lle_success  ->  passed  [0.004s]
test_rtsock_lladdr:rtm_add_v6_gu_lle_success  ->  passed  [0.003s]
test_rtsock_lladdr:rtm_add_v6_ll_lle_success  ->  passed  [0.002s]
test_rtsock_lladdr:rtm_del_v4_gu_lle_success  ->  passed  [0.004s]
test_rtsock_lladdr:rtm_del_v6_gu_lle_success  ->  passed  [0.003s]
test_rtsock_lladdr:rtm_del_v6_ll_lle_success  ->  passed  [0.002s]

netmask failures are not related to this change, this will be addressed in a separate change.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 28412
Build 26491: arc lint + arc unit

Event Timeline

melifaro edited the test plan for this revision. (Show Details)
melifaro edited the summary of this revision. (Show Details)

Add forgotten route_temporal.c

Add tests for route expiration.

Add ND redirect test, fix panic in redirect handling code.

Missing rtedirect.py is here. Phabricator refuses to add python files to the diff due to the lack of pep8 binary installed on phabric server..

Remove unrelevant changes, add forgotten license header.

Add python code to generate IPv6 redirect.

Is it possible to default the "redirect" settings to an "automatic mode" where redirects are disabled if a certain, hardcoded size of the routing table is crossed?

Normally the admin can not know all implementation details on each sysctl setting. So unless the setting was not modified by hand, the system is expected to choose the "best" path for the workload. Traversing the large routing table every 10 minutes causes a performance glitch which is very hard to find.

In D22988#503966, @lutz_donnerhacke.de wrote:

Is it possible to default the "redirect" settings to an "automatic mode" where redirects are disabled if a certain, hardcoded size of the routing table is crossed?

Normally the admin can not know all implementation details on each sysctl setting. So unless the setting was not modified by hand, the system is expected to choose the "best" path for the workload. Traversing the large routing table every 10 minutes causes a performance glitch which is very hard to find.

First of all, I do like the callout about the reasonable and auto-tuning parameters.

Speaking of this specific change: there are no fixed-time traversals of the routing table. No installed redirects -> no traversals at all.
I went through the exercise of finding the cause of performance degradation in similar cases a couple of times as well :-)

Speaking of auto-tuning: I'm not sure if the kernel is really the best place to implement it. Size of the routing table is one datapoint, but it will not always produce the consistent results. Also, kernel-based logic is really hard to modify. If going this way, I'd be more thinking of having an rc.d script that would verify the presence of an enabled routing daemon and rely on that to disable redirects.

If going this way, I'd be more thinking of having an rc.d script that would verify
the presence of an enabled routing daemon and rely on that to disable redirects.

That's a very good idea. It makes the decision explicit and allow human readable comments.
Let me have a look into this (beside the other pressing issues at work).

This revision is now accepted and ready to land.Jan 2 2020, 9:25 PM

I have a very annoying question: how much extra work would it be to split this up into: (a) adding rib_fibnum, rib_family and rib_vnet fields and changing the KPI for them by adding them to the current calls and then (b) adding the new functionality (and changing the support function logic beyond just passing the extra fields around)?
It would make the change history much more clear and also make it easier to review things.

PS: I am happy some of the FIB KPI gets cleaned up again, i.e. in6_rtredirect() going away. I hated adding them based on the IPv4 model when I did the initial IPv6 FIB work.

In D22988#504231, @bz wrote:

I have a very annoying question: how much extra work would it be to split this up into: (a) adding rib_fibnum, rib_family and rib_vnet fields and changing the KPI for them by adding them to the current calls and then (b) adding the new functionality (and changing the support function logic beyond just passing the extra fields around)?

Just to be sure we're on the same page and I understand it correctly:
(a) - review1 means adding rib_fibnum, rib_family, rib_vnet to struct rib_head, along with changes in dom_rtattach(), rt_table_init() and so on, while
(b) - review2 is the same review w/o these changes, right?

It would make the change history much more clear and also make it easier to review things.

Sure. Would be happy to do it.

PS: I am happy some of the FIB KPI gets cleaned up again, i.e. in6_rtredirect() going away. I hated adding them based on the IPv4 model when I did the initial IPv6 FIB work.

More changes will come soon :-)

Just to be sure we're on the same page and I understand it correctly:
(a) - review1 means adding rib_fibnum, rib_family, rib_vnet to struct rib_head, along with changes in dom_rtattach(), rt_table_init() and so on, while
(b) - review2 is the same review w/o these changes, right?

da!

It would make the change history much more clear and also make it easier to review things.

Sure. Would be happy to do it.

PS: I am happy some of the FIB KPI gets cleaned up again, i.e. in6_rtredirect() going away. I hated adding them based on the IPv4 model when I did the initial IPv6 FIB work.

More changes will come soon :-)

:-)

In D22988#504244, @bz wrote:

Just to be sure we're on the same page and I understand it correctly:
(a) - review1 means adding rib_fibnum, rib_family, rib_vnet to struct rib_head, along with changes in dom_rtattach(), rt_table_init() and so on, while
(b) - review2 is the same review w/o these changes, right?

da!

Raised D23047 for the former.

It would make the change history much more clear and also make it easier to review things.

Sure. Would be happy to do it.

PS: I am happy some of the FIB KPI gets cleaned up again, i.e. in6_rtredirect() going away. I hated adding them based on the IPv4 model when I did the initial IPv6 FIB work.

More changes will come soon :-)

:-)

In D22988#504244, @bz wrote:

Just to be sure we're on the same page and I understand it correctly:
(a) - review1 means adding rib_fibnum, rib_family, rib_vnet to struct rib_head, along with changes in dom_rtattach(), rt_table_init() and so on, while
(b) - review2 is the same review w/o these changes, right?

da!

Raised D23047 for the former.

Raised D23075 for the latter. However, diffs w/o context are a bit harder to review.

This change got reviewed and committed in D23047 and D23075.
Closing.