Page MenuHomeFreeBSD

If reading the routing table fails, retry up to 10 times
AcceptedPublic

Authored by allanjude on May 19 2018, 3:25 PM.

Details

Summary

To read the routing table, we call sysctl() with a NULL pointer to get the size of the result (needed), then allocate a buffer that large, and then call sysctl() for real.

However, if you are operating a large router, like a BGP full feed, your routing table is changing constantly, and the routing table may grow by one or more entries between the estimate, and the actual read.
Currently, this results in the confusing error message:

# netstat -rnf inet
Routing tables
netstat: sysctl: net.route.0.2.dump.0: Cannot allocate memory

On a router with 12GB of free memory. Memory allocation did not fail, rather the buffer we passed to sysctl() was insufficient because some additional routes were added in the mean time.

This patch will expand needed by 25% over the most recent estimate and try again, up to 10 times.

During convergence, when 100s of thousands of routes are being added rapidly, this can be the difference between being able to read the routing table, and not.

This patch was inspired by one Hiroki offered on the mailing list in 2014.
https://lists.freebsd.org/pipermail/freebsd-current/2014-February/thread.html#48412
That version of the patch only grew the buffer by 4kb, however the routing table in this case is over 180 MB, and the churn may be more than the ~16 additional routes that would fit in an extra 4kb of memory.

Test Plan

Tested on a live router with 3 full BGP feeds

# netstat -rnf inet|wc -l
netstat: sysctl: net.route.0.2.dump.0: routing table grew beyond estimated buffer size (188334440), retrying with larger buffer...
  692454

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 16700
Build 16594: arc lint + arc unit

Event Timeline

allanjude created this revision.May 19 2018, 3:25 PM
allanjude edited the summary of this revision. (Show Details)May 19 2018, 3:26 PM
allanjude edited the test plan for this revision. (Show Details)
allanjude edited the summary of this revision. (Show Details)
allanjude edited the test plan for this revision. (Show Details)
allanjude updated this revision to Diff 42737.May 19 2018, 4:19 PM

After discussion with rgrimes, round needed up to the nearest full PAGE_SIZE

This should eliminate the need to retry in the common case where the number of routes has increased by only a single digit

Make the expansion facter larger, to try to deal with the 'BGP feed flap' case

allanjude edited the summary of this revision. (Show Details)May 19 2018, 4:20 PM
melifaro added inline comments.
usr.bin/netstat/route.c
317

What is the reason of using 1.25 and not, say, 2?

I see this problem often on BPG routers. I usually use the birdc as a better and more reliabe interface to the routing information, though on occasion you do have to sanity check with what the kernel has.
Also birdc seems to be able to give me a routing table 10x faster than netstat -rn and without this issue.

usr.bin/netstat/route.c
255–256

I am not a fan of mixing initialized and un initialized variables in the same line.

rgrimes added inline comments.May 19 2018, 4:53 PM
usr.bin/netstat/route.c
311

I am concerned about this high retry count, netstat -rn on a full view BGP router is a non trivial process, repeating it 10 times in a tight loop may lead to additional problems.

Can we lower this retry count to 4, and add a short delay() inside the loop giving the route table a chance to settle?

I think the round up to page is already going to handle the stable state, and it is just during bgp session start up that the problem is most likely. You certainly do not want to be banging on the CPU when it is already pretty busy doing a full view route flap convergence!

While the intention is good, I'm curious why someone would want to use "netstat -rn" these days for BGPv4 full view having about 700 thousands prefixes?

netstat gives text view of the table requiring great deal of CPU cycles to format it first and then to parse again.

usr.bin/netstat/route.c
255–256

Also, variables should be sorted by name.

allanjude marked 3 inline comments as done.May 19 2018, 6:35 PM
allanjude updated this revision to Diff 42739.

Address feedback from rgrimes@, melifaro@, and everyone else who commented

  • Cleanup style of declarations
  • Lower retry count from 10 to 4
  • Add a linearly increasing sleep between retries
  • Change the growth factor from 1.25 to 2x
allanjude marked an inline comment as done.May 19 2018, 6:40 PM
rgrimes accepted this revision.May 23 2018, 7:38 AM

I am ok with this for now, but I think the values 100000uS and 2x are going to need some "experience" tuning.

This revision is now accepted and ready to land.May 23 2018, 7:38 AM