
routing: Subscribe nhops to ifnet link events
Accepted (Public)

Authored by pouria on Sun, May 31, 9:19 PM.

Details

Reviewers
melifaro
glebius
markj
Group Reviewers
network
Summary

Update nexthop flags on interface link status events.
Instead of checking the link status of the interface for every packet,
check only the reachability flag of the final nexthop.

Q: Why not listen in the rib instead of the nhops?
A: There are far fewer nhops than routes.

Updating nhops directly is much faster and more efficient.
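
The idea in the summary can be sketched as a small userspace model. This is not the code from the patch: the struct, the flag name `NHF_MODEL_LINK_DOWN`, and both functions are hypothetical and only illustrate why one walk over a short nexthop table per link event can replace an interface check on every packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real kernel flag name and value may differ. */
#define NHF_MODEL_LINK_DOWN 0x1u

/* Simplified stand-in for a nexthop object. */
struct nhop_model {
	int ifindex;		/* interface the nexthop transmits on */
	unsigned int flags;	/* cached reachability state */
};

/*
 * Link event handler: walk the (small) nexthop table once per event and
 * update the cached flag of every nexthop that uses the interface.
 * Returns the number of nexthops touched.
 */
size_t
nhop_model_link_event(struct nhop_model *tbl, size_t n, int ifindex,
    bool link_up)
{
	size_t touched = 0;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].ifindex != ifindex)
			continue;
		if (link_up)
			tbl[i].flags &= ~NHF_MODEL_LINK_DOWN;
		else
			tbl[i].flags |= NHF_MODEL_LINK_DOWN;
		touched++;
	}
	return (touched);
}

/* Per-packet check: a single flag test, no interface lookup. */
bool
nhop_model_is_reachable(const struct nhop_model *nh)
{
	return ((nh->flags & NHF_MODEL_LINK_DOWN) == 0);
}
```

In this model the cost of a link event scales with the number of nexthops, not the number of routes, which is the argument made in the Q&A above.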
Test Plan
[root@ftsr1] [~] # ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9): 56 data bytes
64 bytes from 9.9.9.9: icmp_seq=0 ttl=57 time=87.410 ms
64 bytes from 9.9.9.9: icmp_seq=1 ttl=57 time=95.969 ms
64 bytes from 9.9.9.9: icmp_seq=3 ttl=57 time=96.876 ms
[nhop_ctl] inet.0 nhop_free: deleting nh#2/inet/vtnet0/resolve
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
^C

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 73616
Build 70499: arc lint + arc unit

Event Timeline

I updated my servers with this patch and got good results.
Before the patch, when I pinged routed addresses and a nexthop became unreachable, my pings got stuck.
Now ping immediately reports "Network is down" and recovers faster once the nexthop is reachable again.
For context, I have more than 2M routes.

% ping 3fff::10:15::1
PING(56=40+8+8 bytes) 3fff::2::fbf8:3f4 --> 3fff::10:15::1
16 bytes from 3fff::10:15::1, icmp_seq=0 hlim=64 time=4.100 ms
16 bytes from 3fff::10:15::1, icmp_seq=1 hlim=64 time=4.273 ms
16 bytes from 3fff::10:15::1, icmp_seq=2 hlim=64 time=1.290 ms
16 bytes from 3fff::10:15::1, icmp_seq=3 hlim=64 time=5.433 ms
16 bytes from 3fff::10:15::1, icmp_seq=4 hlim=64 time=2.654 ms
ping: sendmsg: Network is down
ping: wrote 3fff::10:15::1 16 chars, ret=-1
ping: sendmsg: Network is down
ping: wrote 3fff::10:15::1 16 chars, ret=-1
ping: sendmsg: Network is down
ping: wrote 3fff::10:15::1 16 chars, ret=-1
16 bytes from 3fff::10:15::1, icmp_seq=8 hlim=64 time=2.928 ms
16 bytes from 3fff::10:15::1, icmp_seq=9 hlim=64 time=5.729 ms
16 bytes from 3fff::10:15::1, icmp_seq=10 hlim=64 time=2.608 ms
sys/net/route/nhop_ctl.c
1338–1341

Shouldn't this be an atomic(9) operation?

1361–1364

What happened without this crutch?

sys/net/route/nhop_ctl.c
1338–1341

AFAICU, no.
We already set nhop flags without any kind of protection in netlink, rtsock and other subsystems.
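
For comparison only, here is a minimal sketch of what an atomic flag update could look like. It uses C11 `<stdatomic.h>` as a userspace stand-in for atomic(9), where the closest primitives would presumably be `atomic_set_int()` and `atomic_clear_int()`. The struct and the flag name are hypothetical, and this is not what the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit, for illustration only. */
#define NHF_ATOMIC_MODEL_LINK_DOWN 0x1u

/* Simplified stand-in for a nexthop whose flags are updated atomically. */
struct nhop_atomic_model {
	atomic_uint flags;
};

/*
 * Set or clear the link-down bit with an atomic read-modify-write, so a
 * concurrent update to a different bit in the same word is not lost.
 */
void
nhop_atomic_set_link(struct nhop_atomic_model *nh, bool link_up)
{
	assert(nh != NULL);
	if (link_up)
		atomic_fetch_and_explicit(&nh->flags,
		    ~NHF_ATOMIC_MODEL_LINK_DOWN, memory_order_relaxed);
	else
		atomic_fetch_or_explicit(&nh->flags,
		    NHF_ATOMIC_MODEL_LINK_DOWN, memory_order_relaxed);
}

/* Datapath side: a relaxed load followed by a bit test. */
bool
nhop_atomic_is_reachable(struct nhop_atomic_model *nh)
{
	unsigned int f;

	f = atomic_load_explicit(&nh->flags, memory_order_relaxed);
	return ((f & NHF_ATOMIC_MODEL_LINK_DOWN) == 0);
}
```

The difference from a plain `flags |= bit` is that the read-modify-write cannot interleave with another writer of the same word. Whether that matters here depends on which other paths modify the same flags concurrently, which is the locking question deferred to D57389.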

1361–1364

Without it, the bridge tests panic.
Why?
rt_tables_get_rnh computes rib_head (rh) from fibnum + V_tables, but by then the VNET has already been destroyed.
So rh ends up pointing at garbage memory.
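
The failure mode described above can be illustrated with a userspace model. Everything here is hypothetical (`vnet_model`, `rib_head_model`, `vnet_model_get_rnh`) and does not mirror the kernel structures; it only sketches the shape of the guard: check for shutdown before indexing into per-VNET tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a per-FIB routing table head. */
struct rib_head_model {
	int fibnum;
};

/* Simplified stand-in for per-VNET routing state. */
struct vnet_model {
	bool shutting_down;		/* set when teardown starts */
	struct rib_head_model **tables;	/* indexed by fibnum */
	size_t ntables;
};

/*
 * Look up the table head for a FIB. Once teardown has started, the table
 * array may already be freed, so return early instead of indexing into it.
 */
struct rib_head_model *
vnet_model_get_rnh(const struct vnet_model *vnet, size_t fibnum)
{
	assert(vnet != NULL);
	if (vnet->shutting_down)
		return (NULL);
	if (fibnum >= vnet->ntables)
		return (NULL);
	return (vnet->tables[fibnum]);
}
```

A caller in this model treats NULL as "nothing to update" and returns, which plays the role of the early exit guarded by VNET_IS_SHUTTING_DOWN() in the revision.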

Move the one line that updates rib_head from D57389 into this revision.

I believe this is an architecturally correct change, but I'm not sure about all the details.

P.S. I'll move my locking concerns discussion to D57389.

sys/net/route/nhop_ctl.c
1361–1364

Please add this info into a comment above VNET_IS_SHUTTING_DOWN().

This revision is now accepted and ready to land.
Thu, Jun 11, 4:01 PM