Page MenuHomeFreeBSD

TCP Ratelimit code update

Authored by rrs on Wed, Feb 12, 2:14 PM.


Group Reviewers

The TCP Ratelimit code was written originally with only access to a mlx 4th generation
card that supported only 13 fixed non-settable rates and a chelsio card that supported
16 fixed rates but settable by the first flow to use them.

There was no provision to have a required "setup" utility to setup a rte. The newer
5th and 6th generation mlx cards require a distinct setup to be called if
you want to define your own rates.

Also TCP either wanted 1000 rates (which no card supports) or would
settle in on 14 or so.. with a common rate so that both mlx and chelsio would work.
This a bit of wanting too much. The newer mlx cards support over 100 rates
so lets define these and there uses as well as fix it so we can set them up.

Test Plan

With a new mlx 5th or 6 gen card test to make sure
that the system will initialize in net.inet.rl with the
new 100+ rates.

I have actually also tested with one of these card setup
back to back so I can measure each of the rates, and the mlx
5th generation card at least performs perfectly with a 1 packet

Note that changes will need to be brought in from the mlx driver in order
for this to all work (thanks to Hans for his patches)!!

Diff Detail

Lint Skipped
Unit Tests Skipped

Event Timeline

rrs created this revision.Wed, Feb 12, 2:14 PM
hselasky added inline comments.Wed, Feb 12, 2:27 PM
140 ↗(On Diff #68184)

Should this table be const uint64_t ?

525 ↗(On Diff #68184)

Should if_printf() be used here? Or should these prints go away?

rrs updated this revision to Diff 68189.Wed, Feb 12, 3:38 PM

Address Hans comments, also fix the issue I had
when I back things out (the net epoch macro should have
been in the source).. also add just a few more comments and
fix a spelling error in one.

140 ↗(On Diff #68184)

Yeah your right, we want it to be const.. good catch!

525 ↗(On Diff #68184)

Yeah I had left them in, in our image but they are really un-needed.. good idea!!

hselasky accepted this revision.Wed, Feb 12, 3:41 PM
This revision is now accepted and ready to land.Wed, Feb 12, 3:41 PM
melifaro added inline comments.
116 ↗(On Diff #68184)

Would it be possible to add a bit more wording about the measurement units here?

I don't have Chelsio/Mellanox docs handy but my expectation is that the hw rate is indeed bps.
In my understanding, what is calculated here is burst size for the goodput - e.g. actual data without accounting for encapsulation.

If that's correct, then in similar example where we have 31Mbps reported by both TCP stack and supported HW queue, we would get 40 bytes overhead for each packet, which will ends up queuing 1-second delay each 37.5 seconds.

117 ↗(On Diff #68184)

Typo: pacets

1469 ↗(On Diff #68184)

Don't we need to provide some error notification for the caller that there is no suitable hw rate?

151 ↗(On Diff #68184)

Given it returns the ideal burst size, would it be possible to name the function something like tcp_get_pacing_burst_size() or similar to avoid confusion around mss?