Differential D30155
ixgbe: Bring back accounting for tx in AIM
Authored by kbowling on May 7 2021, 12:04 AM.
Details
This is a necessary fix to the AIM algorithm. I would still not advise using it under any circumstances until D30094 is resolved. Tested on two X552s. Does not fix the problems being discussed in D30094 but this will be necessary eventually to test/use AIM when it is functioning correctly.
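To illustrate why tx accounting matters here, below is a minimal sketch of an AIM-style interval calculation, modeled on the legacy (pre-iflib) ixgbe heuristic as best I understand it. It is not the patch under review: the function name `aim_next_itr` and its parameters are hypothetical, and the constants (24 bytes of frame overhead, the 3000 cap, and the 300 to 1200 mid-range band) are assumptions that should be checked against the driver source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the next interrupt throttle interval from the
 * average packet size seen since the last interrupt. Larger values mean a
 * longer interval, i.e. fewer interrupts per second.
 * Returns 0 when there was no traffic, meaning "keep the previous setting".
 */
uint32_t
aim_next_itr(uint64_t tx_bytes, uint64_t tx_packets,
    uint64_t rx_bytes, uint64_t rx_packets)
{
	uint64_t newitr = 0;

	if (tx_bytes == 0 && rx_bytes == 0)
		return (0);

	/* Average tx packet size: the part that tx accounting contributes. */
	if (tx_bytes != 0 && tx_packets != 0)
		newitr = tx_bytes / tx_packets;

	/* Take the larger of the tx and rx averages. */
	if (rx_bytes != 0 && rx_packets != 0) {
		uint64_t rx_avg = rx_bytes / rx_packets;

		if (rx_avg > newitr)
			newitr = rx_avg;
	}

	newitr += 24;		/* assumed allowance for hardware frame and CRC */
	if (newitr > 3000)	/* assumed upper bound */
		newitr = 3000;

	/* Assumed mid-range band gets a smaller interval than plain halving. */
	if (newitr > 300 && newitr < 1200)
		newitr /= 3;
	else
		newitr /= 2;

	return ((uint32_t)newitr);
}
```

Under these assumptions, a sender pushing 1500-byte frames while receiving only 64-byte ACKs gets an interval of 762 when tx is counted, but only 44 when tx is ignored, so without tx accounting a sender-heavy queue is treated as a small-packet, low-latency workload.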
Diff Detail
Event Timeline
Comment Actions
I have good news and bad news to report. The good news is that AIM functions closer to how it was intended on the sender-heavy workload with the txr accounting in place. The bad news is that it occasionally lops off around 2 Gbps on a single-stream TCP TSO sender, with occasional packet loss in my test environment. On the receiver, AIM reduces single-stream UDP performance by about 1 Gbps and increases loss by 20%.

That seems like a bigger issue than the current situation, and I'd rather just change static int ixgbe_max_interrupt_rate = (4000000 / IXGBE_LOW_LATENCY); to use IXGBE_AVE_LATENCY as a break fix instead of enabling AIM while we continue to figure out this EITR interaction for the Intel drivers.

From my perspective there are two worthwhile paths to investigate. In one, we improve the AIM algorithm. In the other, we figure out what is going on in iflib and make it work the way it's supposed to; we have enough information on the sender that we really shouldn't need to dynamically tune EITR, as far as I can tell. I'm less sure about the receiver, but I think that in the cases where FreeBSD is used, a correct static EITR value would be OK if we get the iflib re-arms correct. What do you think?

Comment Actions
There are some optimizations in the iflib driver to decrease TX descriptor writeback (txq_max_rs_deferred; I think @gallatin mentioned this earlier). I wonder if this is just a matter of the old AIM algorithm being too aggressive and needing to be tamped down a bit for this batching.
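For reference, the break fix proposed above (switching the default from IXGBE_LOW_LATENCY to IXGBE_AVE_LATENCY) can be put into numbers with a small sketch. The macro values below are what I recall from ixgbe.h and should be treated as assumptions, as should the 0x0FF8 interval mask; the helper names `max_interrupt_rate` and `eitr_interval_from_rate` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed values; verify against ixgbe.h in the tree being tested. */
#define IXGBE_LOW_LATENCY	128
#define IXGBE_AVE_LATENCY	400
#define IXGBE_BULK_LATENCY	1200

/* Hypothetical helper: default interrupt rate for a given latency constant. */
uint32_t
max_interrupt_rate(uint32_t latency)
{
	return (4000000 / latency);
}

/*
 * Hypothetical helper: convert an interrupts-per-second limit back into an
 * EITR interval value. The 0x0FF8 mask is an assumption about which bits
 * hold the interval. A rate of 0 is treated here as "no limit".
 */
uint32_t
eitr_interval_from_rate(uint32_t rate)
{
	if (rate == 0)
		return (0);
	return ((4000000 / rate) & 0x0FF8);
}
```

With these assumed values, the current default works out to 31250 interrupts per second per vector, and the proposed static default to 10000.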
Comment Actions
I have a similar observation (the bad news) with regard to UDP, but TCP looks fine to me. My runs are all on a NetApp platform.

client% sudo iperf3 -c 192.168.228.0 -i 5 -u -b 2G
Connecting to host 192.168.228.0, port 5201
[  4] local 192.168.227.254 port 24476 connected to 192.168.228.0 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  4]   0.00-5.00   sec  1.16 GBytes  2.00 Gbits/sec  139506
[  4]   5.00-10.00  sec  1.16 GBytes  2.00 Gbits/sec  139508
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec  0.000 ms  0/279014 (0%)  sender
[  4]   0.00-10.09  sec  1.61 GBytes  1.37 Gbits/sec  0.017 ms  85679/279012 (31%)  receiver
iperf Done.

With regard to TCP, I do not see what you observed. My lab NIC is an embedded 10G (X552).

client% sudo iperf3 -c 192.168.228.0 -i 5 -b 2G
Connecting to host 192.168.228.0, port 5201
[  4] local 192.168.227.254 port 38791 connected to 192.168.228.0 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  4]   0.00-5.00   sec  1.16 GBytes  2.00 Gbits/sec    1   4.33 MBytes
[  4]   5.00-10.00  sec  1.16 GBytes  2.00 Gbits/sec    0   7.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  4]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec    1   sender
[  4]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec        receiver
iperf Done.

client% sudo iperf3 -c 192.168.228.0 -i 5 -b 7G
Connecting to host 192.168.228.0, port 5201
[  4] local 192.168.227.254 port 38773 connected to 192.168.228.0 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  4]   0.00-5.00   sec  4.07 GBytes  7.00 Gbits/sec    0   3.74 MBytes
[  4]   5.00-10.00  sec  4.07 GBytes  7.00 Gbits/sec    0   3.74 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  4]   0.00-10.00  sec  8.15 GBytes  7.00 Gbits/sec    0   sender
[  4]   0.00-10.09  sec  8.15 GBytes  6.94 Gbits/sec        receiver
iperf Done.

Comment Actions
Also, I would prefer to have a quick call to discuss the ideas and thoughts we have.
We would need an expert from Intel to help us understand AIM.

Comment Actions
@stallamr_netapp.com thanks. There is a variable here in that I am running in two VMs, amongst other things. I'm also diving into this code for the first time in 3 years, so this is new to me; I'm just trying to understand the problem in the drivers and hopefully fix it or find someone who can. @gnn is getting me access to the project's network lab, and I'll use that to see if I can take a look at the problem on other types of hardware.

I don't have any authority over Intel, but I agree it would be helpful if we could get them back on a regular call to discuss important networking development. Would you like me to send out a Google Calendar invite for an iflib meeting?