Page MenuHomeFreeBSD

TCP Ratelimit code update
ClosedPublic

Authored by rrs on Feb 12 2020, 2:14 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 13 2024, 1:59 PM
Unknown Object (File)
Nov 6 2024, 12:18 PM
Unknown Object (File)
Oct 18 2024, 11:21 PM
Unknown Object (File)
Oct 5 2024, 10:40 PM
Unknown Object (File)
Oct 5 2024, 10:34 PM
Unknown Object (File)
Oct 5 2024, 10:31 PM
Unknown Object (File)
Oct 1 2024, 11:09 PM
Unknown Object (File)
Sep 8 2024, 6:03 PM
Subscribers

Details

Summary

The TCP Ratelimit code was written originally with only access to a mlx 4th generation
card that supported only 13 fixed non-settable rates and a chelsio card that supported
16 fixed rates but settable by the first flow to use them.

There was no provision to have a required "setup" utility to setup a rte. The newer
5th and 6th generation mlx cards require a distinct setup to be called if
you want to define your own rates.

Also TCP either wanted 1000 rates (which no card supports) or would
settle in on 14 or so.. with a common rate so that both mlx and chelsio would work.
This a bit of wanting too much. The newer mlx cards support over 100 rates
so lets define these and there uses as well as fix it so we can set them up.

Test Plan

With a new mlx 5th or 6 gen card test to make sure
that the system will initialize in net.inet.rl with the
new 100+ rates.

I have actually also tested with one of these card setup
back to back so I can measure each of the rates, and the mlx
5th generation card at least performs perfectly with a 1 packet
gap.

Note that changes will need to be brought in from the mlx driver in order
for this to all work (thanks to Hans for his patches)!!

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

netinet/tcp_ratelimit.c
140 ↗(On Diff #68184)

Should this table be const uint64_t ?

525 ↗(On Diff #68184)

Should if_printf() be used here? Or should these prints go away?

Address Hans comments, also fix the issue I had
when I back things out (the net epoch macro should have
been in the source).. also add just a few more comments and
fix a spelling error in one.

netinet/tcp_ratelimit.c
140 ↗(On Diff #68184)

Yeah your right, we want it to be const.. good catch!

525 ↗(On Diff #68184)

Yeah I had left them in, in our image but they are really un-needed.. good idea!!

This revision is now accepted and ready to land.Feb 12 2020, 3:41 PM
melifaro added inline comments.
netinet/tcp_ratelimit.c
116 ↗(On Diff #68184)

Would it be possible to add a bit more wording about the measurement units here?

I don't have Chelsio/Mellanox docs handy but my expectation is that the hw rate is indeed bps.
In my understanding, what is calculated here is burst size for the goodput - e.g. actual data without accounting for encapsulation.

If that's correct, then in similar example where we have 31Mbps reported by both TCP stack and supported HW queue, we would get 40 bytes overhead for each packet, which will ends up queuing 1-second delay each 37.5 seconds.

117 ↗(On Diff #68184)

Typo: pacets

1469 ↗(On Diff #68184)

Don't we need to provide some error notification for the caller that there is no suitable hw rate?

netinet/tcp_ratelimit.h
151 ↗(On Diff #68184)

Given it returns the ideal burst size, would it be possible to name the function something like tcp_get_pacing_burst_size() or similar to avoid confusion around mss?

netinet/tcp_ratelimit.c
116 ↗(On Diff #68184)

First both Chelsio and Mellanox use bytes per second for controlling the card. Second
my testing has shown that this rate is the complete rate not the goodput rate. I recently
changed the NF version of rack to add in the overhead for this very reason. So no it is
the complete size including the encap (E-H, IP and TCP Hdr).

I can add a bit more here if you like.

117 ↗(On Diff #68184)

Thats should have been fixed by my second update... caught that as I was
re-reading the comments ;=_

1469 ↗(On Diff #68184)

I did not feel an error was appropriate for this return. In this last case we are forcing
the stack to fall back to its rate that it would be using without hardware assist.
But I can see if we are still growing a queue that may not be desirable, in this
case the stack has made an error and selected the wrong hardware rate for the
rate it wants to send at. Hmm I suppose I could add a return code via a
passed argument that can be checked.

netinet/tcp_ratelimit.h
151 ↗(On Diff #68184)

sure thats fine :-)

Update comments and name of the mss function per the comments in the review.

This revision now requires review to proceed.Feb 18 2020, 12:07 PM
This revision is now accepted and ready to land.Feb 18 2020, 12:07 PM