Page MenuHomeFreeBSD

Remove refill budget from iflib
ClosedPublic

Authored by pkelsey on Mar 3 2020, 4:54 PM.

Details

Summary

This patch removes the limit on rx refills.

It is not clear what the perceived benefit of limiting receive descriptor refills is. Except for a startup transient due to the limited initial filling of receive rings during init, under load and without any other type of limit imposed, the refill work would be naturally limited according to the receive packet processing budget because the only descriptors available to be refilled would be the ones just released by the retrieving the current batch of packets.

It is further not clear, even if an argument can be successfully made as to what the purpose of the limit is, why the refill limit is the packet-reception limit (plus an arbitrary constant). Packets are made of fragments, fragments are 1:1 with driver ring descriptors, and driver ring descriptors are what need to be refilled. Limiting refills based on the packet limit misses an entire dimension of the actual resource consumption (number of fragments per packet) and is guaranteed to fall behind (resulting in starvation and dropped packets) whenever there is more than a little traffic with multiple fragments per packet, except where coincidences favorably align.

Concretely, simple iperf testing with the vmx driver configured with device LRO enabled (multiple fragments per packet) demonstrates that the existing refill-limiting behavior results in dropped packets and higher CPU load (because additional receive interrupts must be processed that are refill-only work) for the same TCP throughput. Removing the refill limit eliminates dropped packets and reduces CPU consumption during the test.

Test Plan

This has been tested on a 12.1 system (with r356932 applied) with em and vmx devices.

This builds for 13-current, but testing on 13-current is waiting for the package system to get fixed.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Why not just remove the limit, rather than making things even more complex?

Why not just remove the limit, rather than making things even more complex?

Based on my current thinking, there's no reason not to remove the limit.

This formulation of the patch is a hedge against there being some operational benefit under some other driver or workload that just isn't occurring to me and a hedge against any potential ensuing debate as it make A/B testing as low-friction as possible.

Another consideration is that it seems to me that there are now iflib and vmx driver fixes that should be released in an errata for 12.1, and it might be harder to get killing the existing refill limits outright into such an errata in the near term than a sysctl that preserves the default but still allows user-override.

I get it, but that (totally arbitrary) limit is a pet peeve of mine. Its one of those things that you follow up several levels of code to an "XXX" comment and an arbitrary value. Its like reading a mystery novel and finding the butler really did do it.

And, to be honest, you already have the sysctl, If the user already has to touch something for decent performance, let's just have him set the budget to 65535 rather than making the running code even more complex.

So it sounds like you should keep the patch as-is, but then remove the limit sysctls and the arbitrary limit in a follow-on patch, possibly intended to be MFC'd in to 12.2?

I get it, but that (totally arbitrary) limit is a pet peeve of mine. Its one of those things that you follow up several levels of code to an "XXX" comment and an arbitrary value. Its like reading a mystery novel and finding the butler really did do it.

And, to be honest, you already have the sysctl, If the user already has to touch something for decent performance, let's just have him set the budget to 65535 rather than making the running code even more complex.

I think the point about there already being the rx_budget sysctl that can be raised greater than the queue size to get unlimited refill is a good one. When I first developed this patch, I was untangling multiple effects during vmx operation with multi-descriptor packets, and having independent control over the rx_budget and the refill was desirable for the investigation. Those multiple effects have been untangled now, and I have just gone back and verified that with the receive rings properly sized, there's no difference in iperf throughput or CPU load between {rx_budget=0 (which means 16), limit_rx_refill=0} and {rx_budget=65535, limit_rx_refill=1}.

This changes my current perspective to:

  • Kill the refill limit entirely in -current
  • MFC removal of the refill limit to 12-STABLE after a suitable while
  • Some decision will have to be made as to whether to include killing the refill limit in an iflib/vmx errata for 12.1, or to just advise on possibly tuning the rx_budget to infinity for jumbo frame or device-LRO configurations

I think this would be compatible with erj's thoughts as well.

This kills the refill limit entirely.

pkelsey retitled this revision from Add a sysctl to iflib to disable budget-based refill to Remove refill budget from iflib.Mar 14 2020, 7:07 PM
pkelsey edited the summary of this revision. (Show Details)
This revision was not accepted when it landed; it landed in state Needs Review.Mar 14 2020, 7:59 PM
This revision was automatically updated to reflect the committed changes.