Page MenuHomeFreeBSD

Combine LROed mbufs for a single call to if_input()
Needs ReviewPublic

Authored by hselasky on Sep 24 2017, 10:10 PM.

Details

Reviewers
gallatin
shurd
bz
erj
Group Reviewers
transport
Summary

Combining non-LRO mbufs results in a 12% throughput increase,
Chain LROed mbufs before passing to if_input() to reduce number of if_input() calls.
No API changes.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 11731

Event Timeline

hselasky added inline comments.
sys/netinet/tcp_lro.c
425–427

I think you forgot to remove this if_input call ?

885

Hi, All if_input() calls inside the LRO code should be removed (!)

hselasky added a reviewer: shurd.
hselasky updated this revision to Diff 33400.
hselasky edited the summary of this revision. (Show Details)
hselasky edited the test plan for this revision. (Show Details)
hselasky edited the summary of this revision. (Show Details)

Whitespace nits.

hselasky marked an inline comment as done.

bz@ has a valid point:

Got it, so the “after LRO” in the original commit message is as confusing as forwarding.

I not saying anything against the change, I am just saying the commit message doesn’t describe what it does.

Also I am pretty sure this works with ether_input but not so much with fddi_input, iso88025_input, and ifdead_input is probably going to leak as well.

I think this is intended to piggyback on the recent iflib change which claims a speedup from chaining the packets. However, I'm afraid that I don't understand where this speedup is coming from. The stated reason to allow chaining in ether_input() is to allow drivers to amortize the release/acquire of the rx lock. However, no decent driver even uses an rx lock anymore, certainly not iflib or mlx5. So is there a benefit? If yes, then can you explain where it is coming from?

I think this is intended to piggyback on the recent iflib change which claims a speedup from chaining the packets. However, I'm afraid that I don't understand where this speedup is coming from. The stated reason to allow chaining in ether_input() is to allow drivers to amortize the release/acquire of the rx lock. However, no decent driver even uses an rx lock anymore, certainly not iflib or mlx5. So is there a benefit? If yes, then can you explain where it is coming from?

For the small packet forwarding microbenchmark in the iflib change, the speedup appears to be simply from avoiding the extra function calls, and maybe some better cache usage as a result. It's a highly synthetic benchmark however, even for the same test with packets large enough to not fit in the mbuf, the improvement dropped into the single digits (I don't remember the exact number off-hand) and for more conventional loads, it had no effect.

It's not at all unlikely that *this* change will not actually improve anything, I just wanted something on my dashboard before I went offline for a week to ensure I didn't forget about it.

I think chaining mbufs together after doing LRO re-assembly is asking for too much.
It took a while to get all the different HW LRO cases sorted out; this is likely just going to add extra delay?

The change has style problems but I am really not interested in it, so leaving this review.

There doesn't seem to be any will to move this forward, leaving review.

erj removed a reviewer: Intel Networking.

Can this diff be abandoned if no one is interested in following up - to clean up the clutter of the essential transport reviews?