- Implement LRO using the tcp_lro APIs; LRO is enabled by default
- Add several stats sysctl nodes
- Check the IP/TCP lengths before passing a packet to tcp_lro_rx() if the host does not provide RX csum information (*). Also add a sysctl option to always trust the host's TCP segment csum checks (off by default).
- Add a sysctl to control the LRO entry depth. This depends on a later tcp_lro patch, so it is disabled by default. It is used to avoid holding too many TCP segments in the driver. Limiting the LRO entry depth helps a lot in one- or two-stream RX tests.
This change roughly triples RX performance in my local test (3Gbps -> 10Gbps) and roughly doubles it over a directly connected 40Ge network (5Gbps -> 9Gbps).
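
For illustration only, the IP/TCP length check described in the list above could be sketched roughly as follows. This is a userland sketch under stated assumptions, not the driver's actual code: the helper name lro_len_ok() and the PROTO_TCP macro are hypothetical, the buffer is assumed to start at the IPv4 header, and only IPv4 is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_TCP 6	/* IANA protocol number for TCP */

/*
 * Hypothetical helper, not the driver's actual routine.
 * `pkt` points at the start of an IPv4 header and `len` is the number
 * of bytes available in the buffer.  Returns 1 if the IPv4 and TCP
 * header lengths are self-consistent and fit in the buffer, else 0.
 */
static int
lro_len_ok(const uint8_t *pkt, size_t len)
{
	size_t iphlen, iplen, thoff;
	uint16_t frag;

	if (len < 20)				/* minimal IPv4 header */
		return (0);
	if ((pkt[0] >> 4) != 4)			/* IPv4 only in this sketch */
		return (0);

	iphlen = (size_t)(pkt[0] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (iphlen < 20 || iphlen > len)
		return (0);

	iplen = ((size_t)pkt[2] << 8) | pkt[3];	/* total length, network order */
	if (iplen < iphlen || iplen > len)
		return (0);

	frag = (uint16_t)((pkt[6] << 8) | pkt[7]);
	if (frag & 0x3fff)			/* MF set or non-zero offset */
		return (0);

	if (pkt[9] != PROTO_TCP)
		return (0);

	if (iplen < iphlen + 20)		/* minimal TCP header */
		return (0);
	thoff = (size_t)(pkt[iphlen + 12] >> 4) * 4;	/* TCP data offset */
	if (thoff < 20 || iphlen + thoff > iplen)
		return (0);

	return (1);
}
```

In this sketch a packet would be handed to LRO only when lro_len_ok() returns 1; anything else would take the normal input path, where the stack performs its own validation.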
Reviewed by: Hongjiang Zhang <honzhan microsoft com>, Dexuan Cui <decui microsoft com>, Jun Su <junsu microsoft com>
Tested by: me (local), Hongjiang Zhang <honzhan microsoft com> (directly connected 40Ge)
Sponsored by: Microsoft OSTC