TCP LRO support for VLAN and VXLAN
Needs Review, Public

Authored by hselasky on Fri, Apr 2, 10:37 PM.

Details

Reviewers
kib
Group Reviewers
transport
Summary
  • Speed up header data comparison. Don't compare field by field; compare as unsigned long words instead.
  • Split big functions with lots of repeated code into smaller functions that each do one thing.
  • Refactor the TCP ACK compression code to be more readable. Predict the number of ACKs that need compression.
  • Use sbintime() for all time-keeping.
  • Shrink the size of the LRO entry, because it is frequently zeroed.
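
The first point above can be illustrated with a minimal sketch. It is not the code from this diff: `struct flow_key`, `union flow_key_words`, `flow_key_init()` and `flow_key_equal()` are hypothetical names, and the layout is an assumption chosen so that the key is a whole number of `unsigned long` words. The key must be fully zeroed before it is filled in, otherwise padding bytes could make equal keys compare as different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key; not the structure used by the patch. */
struct flow_key {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t vlan_id;
	uint16_t pad;		/* explicit padding, always zero */
};

_Static_assert(sizeof(struct flow_key) % sizeof(unsigned long) == 0,
    "flow key must be a whole number of words");

#define	FLOW_KEY_WORDS	(sizeof(struct flow_key) / sizeof(unsigned long))

/* The union gives word-aligned, well-defined access to the same bytes. */
union flow_key_words {
	struct flow_key key;
	unsigned long words[FLOW_KEY_WORDS];
};

static void
flow_key_init(union flow_key_words *k, uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport, uint16_t vlan)
{
	assert(k != NULL);
	memset(k, 0, sizeof(*k));	/* zero everything, including pad */
	k->key.src_addr = src;
	k->key.dst_addr = dst;
	k->key.src_port = sport;
	k->key.dst_port = dport;
	k->key.vlan_id = vlan;
}

/* Compare word by word instead of field by field. */
static bool
flow_key_equal(const union flow_key_words *a, const union flow_key_words *b)
{
	unsigned long diff = 0;

	for (size_t i = 0; i < FLOW_KEY_WORDS; i++)
		diff |= a->words[i] ^ b->words[i];
	return (diff == 0);
}
```

Accumulating the XOR of all words and testing once keeps the comparison branch-free until the final check, which is one plausible reason word-wise comparison is faster than a chain of per-field tests.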

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

Fix UDP checksumming when there is no UDP checksum. Optimise other checksumming.
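
For context, UDP over IPv4 allows a transmitted checksum field of zero, meaning the sender generated no checksum (RFC 768), so a receiver should skip verification in that case. The sketch below shows that rule only; it assumes this is the case the update refers to, and `udp4_csum_check()`, `ones_sum16()` and the result enum are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum udp_csum_result {
	UDP_CSUM_ABSENT,	/* sender did not compute a checksum */
	UDP_CSUM_GOOD,
	UDP_CSUM_BAD
};

/* Folded 16-bit one's-complement sum over a big-endian byte buffer. */
static uint16_t
ones_sum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/*
 * 'covered' is assumed to hold the pseudo-header, UDP header and
 * payload, with the checksum field as received on the wire.
 */
static enum udp_csum_result
udp4_csum_check(uint16_t wire_sum, const uint8_t *covered, size_t len)
{
	assert(covered != NULL);
	if (wire_sum == 0)
		return (UDP_CSUM_ABSENT);	/* nothing to verify */
	/* A valid checksum makes the total sum all ones. */
	return (ones_sum16(covered, len) == 0xffff ?
	    UDP_CSUM_GOOD : UDP_CSUM_BAD);
}
```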

Test OK: VXLAN
Test OK: NO-VLAN
Test OK: VLAN
TODO: PRIO-TAGGED

Add fixes for BBR/RACK.

Incorporate a fix for bad assert / panic.

I may be reading this wrong, but for mbuf compression, you now seem to be appending a chain of packets, then coalescing them. The original design choice of coalescing as each new packet is encountered was intentional. The idea is that you want to operate on and free the incoming mbuf while it is hot in cache. By chaining mbufs and dealing with them later, they're likely to be cold in cache, so you're likely to encounter additional cache misses on every mbuf in the chain as you copy out data and free it.

Hi Drew,

For sorted LRO the mbufs are still hot in the CPU cache. Remember we are flushing all the entries after each new flowid value!

--HPS
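
The argument above can be pictured with a simplified model: queued packets are sorted by flowid, and the aggregation entry is closed as soon as the flowid changes, so only one flow is being worked on at any time. This is a sketch under stated assumptions, not the kernel implementation; `struct pkt`, `struct agg` and `sorted_lro_run()` are hypothetical, and the real sort key may include more than the flowid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a queued mbuf and a flushed LRO entry. */
struct pkt {
	uint32_t flowid;
	uint32_t len;
};

struct agg {
	uint32_t flowid;
	uint32_t bytes;
};

static int
pkt_cmp(const void *a, const void *b)
{
	const struct pkt *x = a;
	const struct pkt *y = b;

	return ((x->flowid > y->flowid) - (x->flowid < y->flowid));
}

/*
 * Sort the batch by flowid, then aggregate. Each change of flowid
 * closes ("flushes") the current entry and opens a new one.
 * 'out' must have room for up to n entries. Returns the entry count.
 */
static size_t
sorted_lro_run(struct pkt *pkts, size_t n, struct agg *out)
{
	size_t nout = 0;

	assert(pkts != NULL && out != NULL);
	qsort(pkts, n, sizeof(*pkts), pkt_cmp);
	for (size_t i = 0; i < n; i++) {
		if (nout == 0 || out[nout - 1].flowid != pkts[i].flowid) {
			out[nout].flowid = pkts[i].flowid;
			out[nout].bytes = 0;
			nout++;
		}
		out[nout - 1].bytes += pkts[i].len;
	}
	return (nout);
}
```

In this model every packet of a flow is touched in one contiguous run, which is the property the cache-locality claim relies on; it says nothing about drivers that do not sort.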

Remember that very few drivers use sorted LRO. Iflib-based drivers, which handle the vast majority of network traffic in FreeBSD, do not.

(I tried to add sorted LRO to iflib, but wound up causing a small packet forwarding regression, so I abandoned the effort)

The advantages of my approach are:

  1. Lookup the INP only once.
  2. Less error handling.

Won't the same problem occur when looking up INPs for thousands of connections, namely that the CPU runs out of cache?

Isn't it then better to process packets for only one INP at a time?

@gallatin: Who can test this patch with non-sorted-LRO and TCPHPTS, to verify performance improvements or losses?

hselasky retitled this revision from "Work in progress TCP LRO support for VLAN and VXLAN." to "TCP LRO support for VLAN and VXLAN".

Incorporate changes from Randall.
Some minor style and spelling fixes.

Include more changes from Randall.