We have ENA_RX_BUDGET = 256 in order to allow up to 256 received
packets to be processed before we do other cleanups (handling tx
packets and, critically, refilling the rx buffer ring). Since the
ring holds 1024 buffers by default, this is fine for normal packets:
We refill the ring when it falls below 7/8 full, and even with a large
burst of incoming packets allowing it to fall by another 1/4 before we
consider refilling the ring still leaves it at 7/8 - 1/4 = 5/8 full.
With jumbos, the story is different: A 9k jumbo (as is used by default
within the EC2 network) consumes 3 descriptors, so a single rx cleanup
pass can consume 3/4 of the default-sized rx ring; if the rx buffer
ring wasn't completely full before a packet burst arrives, this puts
us perilously close to running out of rx buffers.
This precise failure mode has been observed on some EC2 instance types
within a Cluster Placement Group, resulting in the nominal 10 Gbps
single-flow throughput between instances dropping to ~100 Mbps as a
result of repeated rx overruns causing packet loss and ultimately
retransmission timeouts.
Note that theoretically up to ENA_PKT_MAX_BUFS (19) descriptors can be
used for a single packet, in which case even 54 packets would exhaust
the default rx buffer ring; it's not clear if this ever occurs in
practice, but this fix will address that case as well.
Sponsored by: Amazon
MFC after: 1 week