
Shuffle of tcpcb to optimize cache line efficiencies in main tcp_input/output paths.
Closed, Public

Authored by rrs on Apr 19 2018, 3:22 PM.

Details

Summary

This diff reorders the fields of the tcpcb so that its layout is
optimized for the common input and output processing paths, with a
64-byte cache line in mind. We want the first cache miss to bring in
the most commonly accessed fields, and we want the common path to stay
within that cache line for as long as possible.
Hopefully, by the time we spill over to the next cache line, the
read-ahead (prefetch) will already have pulled in line two, and so on.
Fields that are used less often (retransmission paths, SACK, etc.) are
pushed towards the bottom, optimizing for what should be the most common paths.
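
As a rough illustration of the layout idea in the summary, here is a minimal C sketch. It assumes a 64-byte cache line and uses a hypothetical `struct demo_cb`; the field names, sizes, and the `cache_line_of` helper are invented for the example and are not the real tcpcb layout from this diff.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* fixed-width integer types */

#define CACHE_LINE 64 /* assumed cache line size, as in the summary */

/* Hypothetical control block for illustration; NOT the real struct tcpcb. */
struct demo_cb {
    /* Hot fields: assumed to be touched on every input/output pass. */
    uint32_t snd_una;
    uint32_t snd_nxt;
    uint32_t rcv_nxt;
    uint32_t snd_wnd;
    uint32_t rcv_wnd;
    uint32_t snd_cwnd;
    uint32_t flags;
    uint32_t state;
    uint64_t last_rcv_time;
    uint64_t last_snd_time;
    uint64_t bytes_in;
    uint64_t bytes_out;   /* hot section ends at byte 64 */

    /* Cold fields: retransmission, SACK bookkeeping, back pointers. */
    uint32_t rexmt_count;
    uint32_t sack_holes;
    void *parent;
};

/* Index of the cache line that contains a given byte offset. */
static size_t
cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Fail the build if a later edit pushes a hot field out of line 0. */
static_assert(offsetof(struct demo_cb, bytes_out) + sizeof(uint64_t) <= CACHE_LINE,
    "hot fields must fit in the first cache line");
static_assert(offsetof(struct demo_cb, rexmt_count) >= CACHE_LINE,
    "cold fields must start after the first cache line");
```

The compile-time checks are a design choice for the sketch: they let a reordering like this one be guarded against accidental regressions when fields are added later. Line 0 of the structure only coincides with a hardware cache line if the allocation itself is 64-byte aligned, which this sketch does not enforce.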

Test Plan

This changes no code; it only reorders fields in the tcpcb.

It has been tested and running like this at NF for a couple of years now. VTune
has shown it to be more efficient.

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

Did you ever measure any difference (apart from VTune)? What does this change do to VIMAGE kernels, given that t_vnet moves down to the cold side of the structure?

Yes, I gained about 0.5 Gbps of added performance in my tests.
As to VIMAGE, who really uses that? No one I know of. Considering
the use of it (or lack thereof), I saw no real reason to have it
in the first cache line. Of course, the other question is how
often one uses the back pointer to the parent vnet.

Hmm, looking in the code, t_vnet is only used by:

  1. The new hpts code
  2. tcp_subr, when creating a new tcpcb
  3. The timer code

All of these seem to me to be prime candidates for a later cache line. You
want the hits to land on fields in the direct input/output path, which
t_vnet is not.
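
One way to check where a rarely used field such as a vnet back pointer ends up is to compute its cache line index from its offset. The sketch below assumes 64-byte lines; `struct demo_pcb`, its members, and the `FIELD_FIRST_LINE`/`FIELD_LAST_LINE` macros are hypothetical names made up for this example, not part of the FreeBSD source.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint8_t */

#define CACHE_LINE 64 /* assumed cache line size */

/* Cache line index of the first and last byte of a struct member. */
#define FIELD_FIRST_LINE(type, field) \
    (offsetof(type, field) / CACHE_LINE)
#define FIELD_LAST_LINE(type, field) \
    ((offsetof(type, field) + sizeof(((type *)0)->field) - 1) / CACHE_LINE)

/* Hypothetical stand-in for a protocol control block. */
struct demo_pcb {
    uint8_t hot[CACHE_LINE];   /* placeholder for fast-path fields */
    uint8_t warm[CACHE_LINE];  /* placeholder for less common fields */
    void *vnet_backptr;        /* placeholder for a rarely used back pointer */
};

/* A pointer-sized member should never straddle two cache lines here. */
static_assert(FIELD_FIRST_LINE(struct demo_pcb, vnet_backptr) ==
    FIELD_LAST_LINE(struct demo_pcb, vnet_backptr),
    "back pointer must not straddle a cache line boundary");
```

With this layout the back pointer sits in the third line (index 2), so the fast path never pays for it, while the few callers that do need it take one extra miss.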

kbowling added a subscriber: kbowling.

I have been running a variant of this for over a year (with some slight site-specific changes).

This revision is now accepted and ready to land. (Apr 22 2018, 7:36 PM)
This revision was automatically updated to reflect the committed changes.