Combining non-LRO mbufs results in a 12% throughput increase.
Chain LROed mbufs before passing to if_input() to reduce number of if_input() calls.
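For readers unfamiliar with the idea, the chaining described above can be sketched in userland as follows. This is only an illustration, not the code under review: `struct mbuf` here is a stand-in that models nothing but the `m_nextpkt` link, and `rx_batch`, `rx_batch_append()`, `rx_batch_flush()`, and `mock_if_input()` are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for the kernel mbuf; only the packet link is modeled. */
struct mbuf {
	struct mbuf *m_nextpkt;	/* next packet in the chain */
};

/* Hypothetical per-queue batch of packets waiting to be passed up. */
struct rx_batch {
	struct mbuf *head;
	struct mbuf *tail;
};

static int input_calls;	/* number of mock_if_input() invocations */
static int pkts_seen;	/* number of packets those invocations processed */

/* Mock input routine that walks the m_nextpkt chain, one packet at a time. */
static void
mock_if_input(struct mbuf *m)
{
	input_calls++;
	while (m != NULL) {
		struct mbuf *next = m->m_nextpkt;

		m->m_nextpkt = NULL;	/* detach before handing the packet on */
		pkts_seen++;
		m = next;
	}
}

/* Append one packet to the batch instead of passing it up immediately. */
static void
rx_batch_append(struct rx_batch *b, struct mbuf *m)
{
	assert(m != NULL);
	m->m_nextpkt = NULL;
	if (b->head == NULL)
		b->head = m;
	else
		b->tail->m_nextpkt = m;
	b->tail = m;
}

/* Pass the whole chain up with a single input call, then reset the batch. */
static void
rx_batch_flush(struct rx_batch *b)
{
	if (b->head != NULL) {
		mock_if_input(b->head);
		b->head = b->tail = NULL;
	}
}
```

Appending three packets and flushing once yields one input call that sees three packets, rather than three separate calls. Whether that saving is measurable on real workloads is exactly what the comments below question.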
No API changes.
bz@ has a valid point:
Got it, so the “after LRO” in the original commit message is as confusing as forwarding.
I am not saying anything against the change, I am just saying the commit message doesn’t describe what it does.
Also, I am pretty sure this works with ether_input but not so much with fddi_input or iso88025_input, and ifdead_input is probably going to leak as well.
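The leak concern can be illustrated with a small userland model. This is a sketch of the failure mode only: `struct pkt`, `naive_input()`, `chain_aware_input()`, and `count_leaked()` are hypothetical names, and whether the real fddi_input, iso88025_input, or ifdead_input routines behave like `naive_input()` is the suspicion stated above, not something this example verifies.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for a packet; "freed" records whether it was consumed. */
struct pkt {
	struct pkt *nextpkt;	/* models m_nextpkt */
	int freed;
};

/* Hypothetical input routine that assumes it is handed a single packet. */
static void
naive_input(struct pkt *p)
{
	assert(p != NULL);
	p->freed = 1;		/* packets linked via nextpkt are never touched */
}

/* Hypothetical input routine that consumes every packet in the chain. */
static void
chain_aware_input(struct pkt *p)
{
	while (p != NULL) {
		struct pkt *next = p->nextpkt;

		p->nextpkt = NULL;
		p->freed = 1;
		p = next;
	}
}

/* Count packets that were handed to an input routine but never consumed. */
static int
count_leaked(const struct pkt *pkts, size_t n)
{
	int leaked = 0;

	for (size_t i = 0; i < n; i++)
		if (!pkts[i].freed)
			leaked++;
	return (leaked);
}
```

In this model, handing a three-packet chain to `naive_input()` leaves two packets unconsumed, while `chain_aware_input()` leaves none.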
I think this is intended to piggyback on the recent iflib change which claims a speedup from chaining the packets. However, I'm afraid that I don't understand where this speedup is coming from. The stated reason to allow chaining in ether_input() is to allow drivers to amortize the release/acquire of the rx lock. However, no decent driver even uses an rx lock anymore, certainly not iflib or mlx5. So is there a benefit? If yes, then can you explain where it is coming from?
For the small-packet forwarding microbenchmark in the iflib change, the speedup appears to come simply from avoiding the extra function calls, and maybe some better cache usage as a result. It's a highly synthetic benchmark, however: even for the same test with packets large enough to not fit in the mbuf, the improvement dropped into the single digits (I don't remember the exact number off-hand), and for more conventional loads it had no effect.
It's not at all unlikely that *this* change will not actually improve anything, I just wanted something on my dashboard before I went offline for a week to ensure I didn't forget about it.
I think chaining mbufs together after doing LRO re-assembly is asking for too much.
It took a while to get all the different HW LRO cases sorted out; this is likely just going to add extra delay?
The change has style problems, but I am really not interested in it, so I am leaving this review.