Combine LROed mbufs for a single call to if_input()
Needs ReviewPublic
Actions

Authored by • hselasky on Sep 24 2017, 10:10 PM.

Details

Reviewers

gallatin
shurd
bz
erj

Group Reviewers

transport

Summary

Combining non-LRO mbufs results in a 12% throughput increase,
Chain LROed mbufs before passing to if_input() to reduce number of if_input() calls.
No API changes.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 11731

Event Timeline

shurd created this revision.Sep 24 2017, 10:10 PM

Herald added a reviewer: Intel Networking. · View Herald TranscriptSep 24 2017, 10:10 PM

Herald added a reviewer: transport. · View Herald Transcript

Herald added subscribers: ae, imp. · View Herald Transcript

• hselasky added a subscriber: • hselasky.Sep 25 2017, 7:58 AM

• hselasky added inline comments.

sys/netinet/tcp_lro.c
425–427	I think you forgot to remove this if_input call ?
886	Hi, All if_input() calls inside the LRO code should be removed (!)

• hselasky added a reviewer: gallatin.Sep 25 2017, 7:59 AM

• hselasky commandeered this revision.Sep 25 2017, 9:17 AM

• hselasky added a reviewer: shurd.

Whitespace nits.

• hselasky marked an inline comment as done.Sep 25 2017, 9:24 AM

• hselasky marked an inline comment as done.

bz@ has a valid point:

Got it, so the “after LRO” in the original commit message is as confusing as forwarding.

I not saying anything against the change, I am just saying the commit message doesn’t describe what it does.

Also I am pretty sure this works with ether_input but not so much with fddi_input, iso88025_input, and ifdead_input is probably going to leak as well.

I think this is intended to piggyback on the recent iflib change which claims a speedup from chaining the packets. However, I'm afraid that I don't understand where this speedup is coming from. The stated reason to allow chaining in ether_input() is to allow drivers to amortize the release/acquire of the rx lock. However, no decent driver even uses an rx lock anymore, certainly not iflib or mlx5. So is there a benefit? If yes, then can you explain where it is coming from?

In D12487#258902, @gallatin wrote:

I think this is intended to piggyback on the recent iflib change which claims a speedup from chaining the packets. However, I'm afraid that I don't understand where this speedup is coming from. The stated reason to allow chaining in ether_input() is to allow drivers to amortize the release/acquire of the rx lock. However, no decent driver even uses an rx lock anymore, certainly not iflib or mlx5. So is there a benefit? If yes, then can you explain where it is coming from?

For the small packet forwarding microbenchmark in the iflib change, the speedup appears to be simply from avoiding the extra function calls, and maybe some better cache usage as a result. It's a highly synthetic benchmark however, even for the same test with packets large enough to not fit in the mbuf, the improvement dropped into the single digits (I don't remember the exact number off-hand) and for more conventional loads, it had no effect.

It's not at all unlikely that *this* change will not actually improve anything, I just wanted something on my dashboard before I went offline for a week to ensure I didn't forget about it.

I think chaining mbufs together after doing LRO re-assembly is asking for too much.
It took a while to get all the different HW LRO cases sorted out; this is likely just going to add extra delay?

The change has style problems but I am really not interested in it, so leaving this review.

There doesn't seem to be any will to move this forward, leaving review.

erj resigned from this revision.Oct 3 2018, 5:21 PM

erj removed a reviewer: Intel Networking.

Can this diff be abandoned if no one is interested in following up - to clean up the clutter of the essential transport reviews?

sbruno removed a reviewer: sbruno.Jan 19 2020, 11:25 PM

Ping.

Herald added a subscriber: melifaro. · View Herald TranscriptAug 4 2021, 9:09 AM