Use smaller jumbo mbufs in ENA as needed
Closed, Public

Authored by cperciva on Apr 22 2020, 11:29 PM.

Details

Reviewers

imp

Commits

rS360777: Optimize ENA Rx refill for low memory conditions

Summary

On systems under heavy memory load, the ENA driver may spend an excessive amount of time in its packet-receiving ithread attempting to allocate new receive buffers. Because network ithreads run at high priority, this can starve other threads. Starving the ENA *transmit* side causes ENA driver resets, and starving the NVMe driver's I/O completion ithread results in timeouts and other issues there.

This patch:
(a) Switches the ENA driver from using 16 kB mbuf clusters by default to using 9 kB mbuf clusters by default, since the larger clusters are never appropriate for the EC2 network's MTU, and
(b) When refilling receive buffers, remembers if a 9 kB allocation fails and switches over to 4 kB mbuf clusters for the remainder of the refill operation.
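
The fallback in (b) can be illustrated with a small userland sketch. This is not the ENA driver code: the pool, the allocator, and every sketch_* name below are hypothetical stand-ins, and the retry of the failed buffer at 4 kB is an assumption based on the description above. It only shows the "remember the first 9 kB failure for the rest of this refill" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Buffer sizes taken from the summary: 9 kB preferred, 4 kB fallback. */
#define SKETCH_BUF_9K (9 * 1024)
#define SKETCH_BUF_4K (4 * 1024)

/* Hypothetical stand-in for the jumbo cluster zone under memory pressure. */
struct sketch_pool {
	int jumbo9_left;   /* 9 kB allocations that will still succeed */
	int jumbo9_fails;  /* number of failed 9 kB attempts */
};

/* Hypothetical allocator: 9 kB requests fail once the pool is exhausted. */
static void *
sketch_alloc(struct sketch_pool *p, size_t size)
{
	if (size == SKETCH_BUF_9K) {
		if (p->jumbo9_left == 0) {
			p->jumbo9_fails++;
			return (NULL);
		}
		p->jumbo9_left--;
	}
	return (malloc(size));
}

struct sketch_result {
	int got_9k;  /* buffers filled with 9 kB clusters */
	int got_4k;  /* buffers filled with 4 kB clusters */
};

/*
 * Refill up to num buffers.  The size starts at 9 kB on every call; after
 * the first 9 kB failure it drops to 4 kB and stays there until the call
 * returns, so a refill pays for at most one failed 9 kB attempt.
 */
static struct sketch_result
sketch_refill(struct sketch_pool *p, int num)
{
	struct sketch_result r = { 0, 0 };
	size_t size = SKETCH_BUF_9K;

	assert(num >= 0);
	for (int i = 0; i < num; i++) {
		void *buf = sketch_alloc(p, size);

		if (buf == NULL && size == SKETCH_BUF_9K) {
			/* Remember the failure and retry this buffer at 4 kB. */
			size = SKETCH_BUF_4K;
			buf = sketch_alloc(p, size);
		}
		if (buf == NULL)
			break;  /* even the small allocation failed; give up */
		if (size == SKETCH_BUF_9K)
			r.got_9k++;
		else
			r.got_4k++;
		free(buf);  /* a real driver would post the buffer to the Rx ring */
	}
	return (r);
}
```

With a pool that can satisfy only three 9 kB requests, refilling eight buffers yields three 9 kB and five 4 kB buffers and records a single failed 9 kB attempt, rather than one failure per remaining buffer.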

This patch applies to FreeBSD stable/12; the code in HEAD is very slightly different.

Test Plan

I'm hoping that all the people who have reported issues with the ENA driver can test this and confirm that the problems go away. The only uncertainty is whether "try 9 kB and switch to 4 kB upon the first failure" is the right approach or if we should use 4 kB clusters from the start -- but I suspect that always using 4 kB clusters would hurt network performance, while switching allocation size after the first failure is probably good enough to mitigate the CPU usage.

Testers: In addition to checking whether this patch gets rid of ENA "lost tx completion" and other ENA (and NVMe!) resets, please report on the output of vmstat -z | grep mbuf_jumbo; sysctl -a | grep mjum_alloc_fail. Under heavy load we should see vmstat reporting that 9 kB allocations failed but ENA not reporting any allocation failures (since it would have gotten a 4 kB allocation).