
tcp: release nic ktls send tags before time wait
Accepted, Public

Authored by gallatin on Thu, Apr 23, 9:23 PM.

Details

Summary

Under heavy load or connection churn, NICs with inline ktls offload may run out of the hardware resources that ktls send tags describe.
Rather than waiting for connections to pass through the fin_wait_2 and time_wait states, reclaim the ktls send
tag early. This keeps potentially tens or hundreds of thousands of sessions from holding send tags in fin_wait_2 / time_wait,
which allows more ktls sessions to be offloaded to hardware.

fin_wait_2 was chosen because I *THINK* this is the earliest place where we can be
certain that all outgoing traffic has been acknowledged and no more data from the
socket buffer will be transmitted.
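
The idea in this summary can be modelled in user space. The following is only a sketch under stated assumptions: `conn_model`, `nic_model`, `model_state_change()` and `model_release_snd_tag()` are hypothetical stand-ins invented for illustration, not the kernel's structures or functions. It shows a connection giving back its send tag as soon as it enters fin_wait_2, so the tag is no longer held through time_wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real types. */
enum conn_state { ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT };

struct nic_model {
	int tags_in_use;	/* models the number of send tags the NIC has handed out */
};

struct conn_model {
	enum conn_state state;
	struct nic_model *nic;
	bool has_snd_tag;	/* true while this connection holds a send tag */
};

/* Give the tag back to the NIC. Safe to call more than once. */
void
model_release_snd_tag(struct conn_model *c)
{
	if (!c->has_snd_tag)
		return;
	assert(c->nic->tags_in_use > 0);
	c->nic->tags_in_use--;
	c->has_snd_tag = false;
}

/*
 * Assumed hook point: release on entry to fin_wait_2, where (per the
 * summary) all outgoing data is believed to have been acknowledged.
 */
void
model_state_change(struct conn_model *c, enum conn_state next)
{
	c->state = next;
	if (next == ST_FIN_WAIT_2)
		model_release_snd_tag(c);
}
```

In this model a connection walking ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT stops counting against `tags_in_use` at the third step, and the later transition is a no-op for the tag.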

Something similar may be possible with receive ktls offload, but we do not run with rx ktls offload and I cannot test it.

Test Plan

Bounce nginx on a busy server. Watch netstat -sptcp and vmstat -z | grep mlx5_0_tls, and ensure the USED column of the zone roughly matches the established TCP connection count and does not include connections in time_wait or fin_wait_2.

Diff Detail

Repository: rG FreeBSD src repository
Lint: Lint Skipped
Unit: Tests Skipped

Event Timeline

sys/netinet/tcp_subr.c
4257 ↗(On Diff #176298)

To me this looks like a quick ad-hoc hack. By its nature, this function is purely for stats keeping. Imagine if everybody interested in doing some work at a certain state change piled their logic up right here.

I understand that there are several places where tcp_state_change(tp, TCPS_FIN_WAIT_2) is called and we want to reduce the amount of copy-paste. Any ideas? Maybe a common function for entering a late state? There could be more cleanups performed there.

You shouldn't stay in TCPS_FIN_WAIT_2 for a long time, but you stay in TCPS_TIME_WAIT for a long time. Wouldn't it be good enough to release the snd_tag when you enter TCPS_TIME_WAIT? If that is the case, you can add the call of ktls_release_snd_tag() to tcp_twstart().
If you are really short of resources, I guess it might make sense to release them during periods when a TCP connection is idle.

sys/netinet/tcp_subr.c
4257 ↗(On Diff #176298)

It's more than several. At least in Netflix's tree, we have 25 calls to tcp_state_change(tp, TCPS_FIN_WAIT_2).

Rather than tackle this, I'll take Michael's advice and move this to tcp_twstart()

sys/netinet/tcp_subr.c
4257 ↗(On Diff #176298)

Yes, tcp_twstart() would be excellent! Thanks!

Address review feedback by moving this into tcp_twstart()

Excellent. Once TCP has its own socket buffer, this could be part of TCP's soisdisconnected() method. For now it just sits next to this call.
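
For readers following along, the placement agreed on above can be sketched as a user-space model. Everything here is an assumption for illustration: `tw_conn`, `tw_twstart()`, `tw_release_snd_tag()` and the `disconnected` flag are hypothetical names, and the ordering of the steps is a guess, not the committed change. The point is only that the tag is dropped once, on entry to time_wait, next to the step that marks the socket disconnected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real types. */
enum tw_state { TW_ESTABLISHED, TW_FIN_WAIT_2, TW_TIME_WAIT };

struct tw_conn {
	enum tw_state state;
	bool has_snd_tag;	/* true while a NIC send tag is held */
	bool disconnected;	/* models the socket being marked disconnected */
};

/* Drop the tag if one is held; returns true if something was released. */
bool
tw_release_snd_tag(struct tw_conn *c)
{
	if (!c->has_snd_tag)
		return (false);
	c->has_snd_tag = false;
	return (true);
}

/* Model of entering time_wait: one place, one release, idempotent. */
void
tw_twstart(struct tw_conn *c)
{
	assert(c != NULL);
	c->state = TW_TIME_WAIT;
	(void)tw_release_snd_tag(c);	/* assumed to sit next to the disconnect step */
	c->disconnected = true;
}

/* Count connections that still hold a send tag. */
size_t
tw_tags_in_use(const struct tw_conn *conns, size_t n)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++)
		if (conns[i].has_snd_tag)
			used++;
	return (used);
}

/* Count connections currently in a given state. */
size_t
tw_count_state(const struct tw_conn *conns, size_t n, enum tw_state s)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (conns[i].state == s)
			count++;
	return (count);
}
```

Note that in this variant a connection sitting in fin_wait_2 still holds its tag; only the transition into time_wait releases it, which trades a short extra hold time for a single call site.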

This revision is now accepted and ready to land.
Sat, Apr 25, 5:41 AM