Performance improvements for octe(4):
- Distribute RX load across multiple cores, if present. This reverts r217212, which is no longer relevant (I think because of the newer SDK).
- Use newer APIs for pinning taskqueue entries to specific cores.
- Deepen RX buffers.
This more than doubles NAT forwarding throughput on my EdgeRouter Lite from,
with typical packet mixture, 90 Mbps to over 200 Mbps. The result matches
forwarding throughput in Linux without the UBNT hardware offload on the same
hardware, and thus likely reflects hardware limits.
Reviewed by: jhibbits