When executing a workload where threads wake up and sleep frequently, such as a web server workload with RACK TCP using tcphpts pacing, the scheduler frequently calls cpu_search_highest() to try to find a more heavily loaded core and steal work from it. Netflix 100Gb/s servers call cpu_search_highest() as many as 7 million times per second.
This change rate-limits the search for work to a configurable interval per core. Limiting the search to once every 10ms moves the cpu_search() code from our 3rd-4th hottest function on the system down into the noise, drops CPU load by a few percent, and increases throughput by 4-5Gb/s on experimental machines with multiple 100GbE links.
Note that I used getsbinuptime() intentionally: we did not seem to need the extra precision of sbinuptime(), and when sbinuptime() was used, tsc_get_timecount_low() surfaced as a hot function in profiling and the CPU savings were smaller.
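A minimal sketch of the idea follows, assuming a per-CPU timestamp compared against a tunable interval. The names steal_interval, last_search, and steal_search_allowed() are illustrative, not the actual sched_ule(4) identifiers:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/time.h>

    /* Hypothetical tunable interval and per-CPU last-search timestamps. */
    static sbintime_t steal_interval = 10 * SBT_1MS;
    static sbintime_t last_search[MAXCPU];

    static bool
    steal_search_allowed(int cpu)
    {
            sbintime_t now;

            /*
             * getsbinuptime() returns the cached, lower-precision
             * uptime, avoiding a timecounter read (and the hot
             * tsc_get_timecount_low() path) on every call.
             */
            now = getsbinuptime();
            if (now - last_search[cpu] < steal_interval)
                    return (false);         /* too soon; skip the search */
            last_search[cpu] = now;
            return (true);
    }

A caller on the steal path would check steal_search_allowed(curcpu) before invoking cpu_search_highest(), so that the expensive scan runs at most once per interval per core.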