Shuffle of tcpcb to optimize cache line efficiencies in main tcp_input/output paths.
Closed, Public

Authored by rrs on Apr 19 2018, 3:22 PM.

Details

Reviewers

gnn
bz
jtl
kbowling

Group Reviewers

transport

Commits

rS333041: This change re-arranges the fields within the tcp-pcb so that

Summary

This diff shuffles the fields of the tcpcb so that it is optimized
for the common input and output processing, with a 64-byte
cache line in mind. We want the first cache miss to land on the
most commonly accessed fields, and the fields accessed in the
common path to stay within that cache line for as long as possible.
Hopefully, by the time we spill over to the next cache line, the
read-ahead (prefetch) will already have brought in line two, and so on.
Things that are used less often (retransmission paths, SACK, etc.) are
pushed towards the bottom, optimizing for what are hopefully the most
common paths.
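
The layout idea in the summary can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct demo_pcb` and all of its field names are hypothetical stand-ins, not the real `struct tcpcb`, and the 64-byte line size is the one the summary assumes. The sketch keeps the hot fields inside the first line and uses compile-time checks so that a later reshuffle cannot silently push them out.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

#define DEMO_CACHE_LINE 64	/* line size assumed by the summary */

/* Hypothetical, heavily reduced control block; NOT the real struct tcpcb. */
struct demo_pcb {
	/* Hot: assumed to be touched on nearly every input/output pass. */
	_Alignas(DEMO_CACHE_LINE) uint32_t t_state;
	uint32_t t_flags;
	uint32_t snd_una;
	uint32_t snd_nxt;
	uint32_t snd_max;
	uint32_t rcv_nxt;
	uint32_t snd_wnd;
	uint32_t rcv_wnd;
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
	/* Fill the rest of line 0 so the cold fields start on line 1. */
	uint8_t hot_pad[DEMO_CACHE_LINE - 10 * sizeof(uint32_t)];

	/* Cold: retransmission and SACK bookkeeping, pushed to the bottom. */
	uint32_t rexmt_count;
	uint32_t sack_holes;
	void *owner;
};

/* The last hot field must end inside the first cache line. */
static_assert(offsetof(struct demo_pcb, snd_ssthresh) + sizeof(uint32_t)
    <= DEMO_CACHE_LINE, "hot fields spill out of cache line 0");

/* The first cold field must not share a line with the hot fields. */
static_assert(offsetof(struct demo_pcb, rexmt_count) >= DEMO_CACHE_LINE,
    "cold fields leaked into cache line 0");
```

Field offsets only map onto cache lines if the object itself starts on a line boundary. `_Alignas` covers static and stack instances here; a heap allocator would need to honor the same alignment, which this sketch does not show.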

Test Plan

This changes no code; it only shuffles fields around in the tcp-pcb.

It has been tested and running like this at NF for a couple of years now. VTune
has shown it to be more efficient.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rrs created this revision. Apr 19 2018, 3:22 PM

Herald added 1 blocking reviewer(s): transport. Apr 19 2018, 3:22 PM

Did you ever measure (apart from VTune) any difference? What does this change do to VIMAGE kernels, given that td_vnet moves down to the cold side of the structure?

Yes, I gained about 1/2 Gbps of added performance in my tests.
As to VIMAGE, who really uses that? No one I know of. Considering
the use of it (or lack thereof), I saw no real reason to have it
in the first cache line. Of course, the other question is how
often one uses the back-pointer to the parent vnet.

Hmm, looking in the code, t_vnet is only used by:

- The new hpts code
- tcp_subr, when creating a new tcb
- The timer code

All of these seem to me to be prime candidates for a later cache line. You
want the hits to be against things in the direct input/output path, which
this is not.
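
To make the argument above concrete, here is a small sketch that counts how many distinct 64-byte lines a code path touches, given the offsets of the fields it reads. Everything here is an assumption made for illustration: `struct demo_tcb`, its placeholder arrays, and `demo_lines_touched()` are hypothetical and do not come from the FreeBSD tree. Only the idea of a vnet back-pointer living on a later line is taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>

#define DEMO_LINE 64	/* assumed cache line size */

struct demo_vnet;	/* opaque stand-in for the parent vnet */

/* Hypothetical layout; NOT the real struct tcpcb. */
struct demo_tcb {
	uint32_t t_state;	/* line 0: hot input/output fields */
	uint32_t t_flags;
	uint32_t snd_una;
	uint32_t rcv_nxt;
	uint8_t other_hot[48];	/* placeholder for the rest of line 0 */
	uint8_t other_warm[64];	/* placeholder for line 1 */
	struct demo_vnet *t_vnet; /* line 2: back-pointer, rarely needed */
};

/*
 * Count the distinct cache lines covered by a list of field offsets.
 * Assumes the object is line-aligned and that no field straddles a line.
 */
static size_t
demo_lines_touched(const size_t *offs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		/* Skip offsets whose line was already counted. */
		for (size_t j = 0; j < i; j++) {
			if (offs[j] / DEMO_LINE == offs[i] / DEMO_LINE) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			count++;
	}
	return (count);
}

/* Example: an I/O-style path stays on one line, a timer-style path needs two. */
static void
demo_selfcheck(void)
{
	size_t io_path[] = {
		offsetof(struct demo_tcb, t_state),
		offsetof(struct demo_tcb, snd_una),
		offsetof(struct demo_tcb, rcv_nxt),
	};
	size_t timer_path[] = {
		offsetof(struct demo_tcb, t_state),
		offsetof(struct demo_tcb, t_vnet),
	};

	assert(demo_lines_touched(io_path, 3) == 1);
	assert(demo_lines_touched(timer_path, 2) == 2);
}
```

The extra line costs the timer-style path a potential miss, but that path runs far less often than per-packet input/output, which is the trade-off being argued for.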