Currently each non-TCP packet (and the first packet of each TCP connection) performs an ARP/ND lookup when transmitted via an IFT_ETHER interface.
These lookups account for ~7% of CPU time when doing IP forwarding. Similarly, LLE refcounting for short-lived TCP connections going through the default gateway may be a contention point.
This change eliminates L2 lookup and LLE refcounting for all output/forward routes that have a gateway.
The diff introduces a "glue" nhop_neigh layer between nexthops and LLE entries. Nexthops "subscribe" to link-layer notifications, and the LLE layer provides those notifications.
- the fast path uses the struct route ro_prepend and ro_plen infrastructure, which allows bypassing most of ether_output().
- the nhop_neigh data structure is implemented as a per-VNET resizable hash table (nexthops from different fibs can reference the same interface, and an IPv4 nexthop can reference an IPv6 LLE).
- datapath feedback ("get the timestamp of the first packet traversing a given LLE, starting from now") occupies nearly half of the implementation. When such feedback is requested, the packet counters of all matching nexthops are summed. Then, in a global callout, each affected nhop_neigh structure is checked every second for a change in that sum.
- there is an assumption that all prepends are at most 64 bytes (and that the cache line size is at least 64 bytes). This is required to allow atomic updates of both the prepend and the prepend length.
- nhop prepends are allocated from a newly created UMA zone. They use epoch(9) reclamation.
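
The datapath feedback scheme described above can be illustrated with a minimal userspace sketch. It is not code from the diff: struct nhop_sketch, struct neigh_sketch and the three functions are hypothetical names. It assumes each nexthop keeps a single packet counter and that the per-second callout calls neigh_check_feedback() for every affected entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the structures from the diff. */
struct nhop_sketch {
	uint64_t packets;		/* bumped by the datapath on every transmit */
};

struct neigh_sketch {
	struct nhop_sketch **nhops;	/* nexthops that reference this LLE */
	size_t nhop_count;
	bool feedback_requested;
	uint64_t baseline;		/* packet sum taken at request time */
	int64_t first_use;		/* tick at which use was first seen, or -1 */
};

/* Sum the packet counters of every nexthop attached to this entry. */
static uint64_t
neigh_packet_sum(const struct neigh_sketch *nn)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nn->nhop_count; i++)
		sum += nn->nhops[i]->packets;
	return (sum);
}

/* LLE side: "tell me when a packet next traverses this entry". */
static void
neigh_request_feedback(struct neigh_sketch *nn)
{
	nn->baseline = neigh_packet_sum(nn);
	nn->first_use = -1;
	nn->feedback_requested = true;
}

/*
 * Periodic check (once per second in the description above).
 * Returns true when traffic was seen since the request was made.
 */
static bool
neigh_check_feedback(struct neigh_sketch *nn, int64_t now)
{
	assert(now >= 0);		/* -1 is reserved as "not seen yet" */

	if (!nn->feedback_requested)
		return (false);
	if (neigh_packet_sum(nn) == nn->baseline)
		return (false);

	nn->first_use = now;		/* resolution is one check interval */
	nn->feedback_requested = false;
	return (true);
}
```

In this sketch the datapath only increments a counter it already owns, and all comparison work happens in the periodic check. The recorded time is the tick at which the change was noticed, so it is accurate only to the check interval.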