Allow multiple cores to process if_epair traffic. When RSS is enabled,
packets are assigned to a core based on the RSS hash of the incoming
packet. This distributes the load over multiple cores, rather than
sending everything to the same one.

We also switch from swi_sched() to taskqueues, which further improves
throughput.
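
For illustration only, the hash-based distribution described above can be
sketched in userland C as follows. This is a minimal sketch, not the code
from this change: struct pkt, its fields, and pick_queue() are hypothetical
names, and the modulo mapping and the fallback to queue 0 for packets
without a hash are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet descriptor: only the fields this sketch needs. */
struct pkt {
	uint32_t flow_hash;	/* RSS hash; meaningful only if has_hash */
	bool	 has_hash;	/* false when no hash was computed */
};

/*
 * Pick the queue (and so the worker core) for a packet.
 * Packets of one flow carry the same hash, so they always land on the
 * same queue and are not reordered against each other.
 * Assumption: packets without a hash all go to queue 0.
 */
static unsigned int
pick_queue(const struct pkt *p, unsigned int nqueues)
{
	assert(nqueues > 0);
	if (!p->has_hash)
		return (0);
	return (p->flow_hash % nqueues);
}
```

With four queues, a packet whose hash is 10 maps to queue 2, and every
later packet of the same flow maps there too. Different flows spread
across queues only as well as their hashes do.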
Benchmark results:

Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

    Before          346.223 Kpps
    After (no RSS)  1.419 Mpps
    After (RSS)     3.110 Mpps

Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

    Before          7.705 Kpps
    After (no RSS)  1.362 Mpps
    After (RSS)     781.086 Kpps

MFC after: 3 weeks
Sponsored by: Orange Business Services