gve: Add support for 4k RX Buffers when using DQO queue formats
Closed · Public

Authored by veethebee_google.com on Jun 10 2025, 9:09 PM.

Details

Reviewers

markj
delphij
kibab
ziaee
jtranoleary_google.com

Group Reviewers

network

Commits

rGfc03742b2512: gve: Add support for 4k RX Buffers when using DQO queue formats
rG71702df61262: gve: Add support for 4k RX Buffers when using DQO queue formats

Summary

This change adds support for 4K RX buffers with the DQO queue formats.
The feature is gated by a boot-time tunable that the user must set to true.
When the tunable is enabled, the driver uses a 4K RX buffer size if either
hardware LRO is enabled or the MTU is greater than 2048.
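A minimal userland sketch of the selection rule described in the summary. The helper and constant names are hypothetical, and the real driver may account for headers or other details differently; this only encodes "tunable on AND (HW LRO on OR MTU > 2048)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values follow the 2K/4K sizes in the summary. */
#define SKETCH_2K_RX_BUFFER_SIZE	2048u
#define SKETCH_4K_RX_BUFFER_SIZE	4096u

/*
 * Hypothetical helper modeling the summary's rule: 4K buffers are used only
 * when the boot-time tunable is set AND either hardware LRO is enabled or
 * the MTU exceeds 2048 bytes. Otherwise the 2K size is kept.
 */
static uint32_t
sketch_dqo_rx_buf_size(bool tunable_enabled, bool hw_lro_enabled, uint32_t mtu)
{
	if (tunable_enabled &&
	    (hw_lro_enabled || mtu > SKETCH_2K_RX_BUFFER_SIZE))
		return (SKETCH_4K_RX_BUFFER_SIZE);
	return (SKETCH_2K_RX_BUFFER_SIZE);
}
```

With the tunable off, `sketch_dqo_rx_buf_size(false, true, 9000)` still yields 2048; with it on, an MTU of 1500 plus HW LRO yields 4096, and an MTU of exactly 2048 without LRO stays at 2048.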

Signed-off-by: Vee Agarwal <veethebee@google.com>

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

veethebee_google.com created this revision. Jun 10 2025, 9:09 PM

Herald added a subscriber: imp. Jun 10 2025, 9:09 PM

veethebee_google.com requested review of this revision. Jun 10 2025, 9:09 PM

Harbormaster completed remote builds in B64755: Diff 156808. Jun 10 2025, 9:09 PM

LGTM from manpages.

This revision is now accepted and ready to land. Jun 10 2025, 10:32 PM

markj added inline comments. Jun 11 2025, 10:16 PM

sys/dev/gve/gve.h
685	Do you need `_Static_asserts` which verify that MJUMPAGESIZE == 4096 and MCLBYTES == 2048? The latter in particular might not hold; I'm aware of some downstreams which tweak that value, so it'd be nice to at least fail to build in that case if necessary.
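A minimal sketch of the build-time checks being requested here (the thread below later relaxes them). Because `MCLBYTES` and `MJUMPAGESIZE` come from the kernel's `<sys/param.h>`, this userland version uses hard-coded stand-ins with their common amd64 values; all `SKETCH_*` names are hypothetical.

```c
#include <assert.h>

/*
 * Stand-ins for the kernel's MCLBYTES and MJUMPAGESIZE (<sys/param.h>),
 * hard-coded to their common amd64 values so this compiles in userland.
 */
#define SKETCH_MCLBYTES			2048
#define SKETCH_MJUMPAGESIZE		4096

/* Hypothetical names for the two DQO RX buffer sizes. */
#define SKETCH_2K_RX_BUFFER_SIZE_DQO	2048
#define SKETCH_4K_RX_BUFFER_SIZE_DQO	4096

/* Fail the build, rather than misbehave at runtime, if a downstream changes the sizes. */
_Static_assert(SKETCH_MCLBYTES == SKETCH_2K_RX_BUFFER_SIZE_DQO,
    "2K RX buffers assume MCLBYTES == 2048");
_Static_assert(SKETCH_MJUMPAGESIZE == SKETCH_4K_RX_BUFFER_SIZE_DQO,
    "4K RX buffers assume MJUMPAGESIZE == 4096");
```

`_Static_assert` is a C11 keyword, so the checks cost nothing at runtime and need no header.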
sys/dev/gve/gve_main.c
414	I think this driver tries to follow the style(9) convention of declaring variables at the beginning of a scope, in which case the declaration should move up.

veethebee_google.com added inline comments. Jun 11 2025, 10:50 PM

sys/dev/gve/gve_rx_dqo.c
323	@markj, @delphij or any other folks familiar with memory management in FreeBSD, do you have any insight on whether 4k mbuf jumbo clusters are as readily available as 2k mbuf clusters? From my understanding, allocating a 2k mbuf cluster takes only one fetch, because there is a memory region of mbufs with pre-attached 2k clusters, whereas for a 4k jumbo cluster the mbuf and the jumbo cluster are fetched separately. Would this pose any issues in replacing the use of 2k clusters with 4k jumbo clusters?

markj added inline comments. Jun 11 2025, 11:20 PM

sys/dev/gve/gve_rx_dqo.c
323	By "fetch" you mean allocate from UMA? It shouldn't make a functional difference. There's a "packet" zone which caches mbuf headers with a 2KB cluster attached, whereas a 4KB packet buffer requires two back-to-back allocations. The packet zone is an optimization to elide a call to uma_zalloc(), but aside from extra performance overhead, the driver shouldn't notice. Are you able to do some benchmarking to measure CPU usage and throughput with small packets to compare the two allocation strategies?
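A simplified userland model (not kernel code) of the difference markj describes: how many zone allocations the driver-facing path makes to get one mbuf with an attached RX cluster. In the kernel, the 2KB case corresponds to `m_getcl()` (served from the "packet" zone), and the larger case to `m_getjcl()` with a size such as `MJUMPAGESIZE` (mbuf header and jumbo cluster allocated separately). The helper name is hypothetical, and the model ignores what UMA does internally on a packet-zone cache miss.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's default MCLBYTES. */
#define SKETCH_MCLBYTES	2048u

/*
 * Hypothetical model: number of zone allocations per RX buffer.
 * - cluster_size == MCLBYTES: the packet zone caches mbufs with a 2KB
 *   cluster already attached, so a single allocation suffices.
 * - larger cluster sizes: the mbuf header and the jumbo cluster come
 *   from separate zones, i.e. two back-to-back allocations.
 */
static unsigned
sketch_uma_allocs_per_rx_buf(uint32_t cluster_size)
{
	return (cluster_size == SKETCH_MCLBYTES ? 1u : 2u);
}
```

The extra call is pure CPU overhead, which is why benchmarking with small packets is the relevant comparison rather than anything functional.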

veethebee_google.com marked an inline comment as done. Jun 12 2025, 12:06 AM

veethebee_google.com added inline comments.

sys/dev/gve/gve_rx_dqo.c
323	Yup, I meant allocating from UMA. We did extensive performance testing and didn't see any significant differences between the two, so it sounds like replacing 2k buffers with 4k buffers shouldn't cause any issues overall. Thanks so much!

Adding static asserts and fixing declaration to match style(9)

This revision now requires review to proceed. Jun 12 2025, 9:54 PM

Harbormaster completed remote builds in B64833: Diff 156943. Jun 12 2025, 9:54 PM

veethebee_google.com marked 2 inline comments as done. Jun 12 2025, 9:55 PM

Please send me a patch and I'll apply it.

This revision is now accepted and ready to land. Jun 13 2025, 1:01 AM

markj added inline comments. Jun 13 2025, 2:03 PM

sys/dev/gve/gve.h
91	Sorry, I realized that this will trivially fail on arm64 when the page size is 16KB. There, we define MJUMPAGESIZE == 8192 (which is admittedly a bit odd). Using 8KB buffers is obviously suboptimal, but I'd expect it to work. Can we relax this assertion to `MJUMPAGESIZE >= GVE_4K_RX_BUFFER_SIZE_DQO`?
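A userland model of why the equality check breaks on 16KB-page arm64 while the relaxed `>=` check holds. The derivation mirrors our reading of `<sys/param.h>` (MJUMPAGESIZE is `MCLBYTES` below 2KB pages, the page size up to 8KB pages, and capped at `8 * 1024` above that); verify it against the header before relying on it. `GVE_4K_RX_BUFFER_SIZE_DQO` is the name from this comment, with its value assumed to be 4096; the `sketch_*` helpers are hypothetical.

```c
#include <assert.h>

/* Name from the review; value assumed to be 4096. */
#define GVE_4K_RX_BUFFER_SIZE_DQO	4096u

/* Model of how MJUMPAGESIZE is derived from the page size. */
static unsigned
sketch_mjumpagesize(unsigned page_size, unsigned mclbytes)
{
	if (page_size < 2048)
		return (mclbytes);
	if (page_size <= 8192)
		return (page_size);
	return (8 * 1024);	/* capped at 8KB, e.g. arm64 with 16KB pages */
}

/*
 * Runtime model of the relaxed check; in the kernel it would be
 * _Static_assert(MJUMPAGESIZE >= GVE_4K_RX_BUFFER_SIZE_DQO, "...").
 */
static int
sketch_relaxed_check_ok(unsigned mjumpagesize)
{
	return (mjumpagesize >= GVE_4K_RX_BUFFER_SIZE_DQO);
}
```

On a 4KB-page system both forms pass; with 16KB pages MJUMPAGESIZE is 8192, so `==` fails but `>=` passes and the driver simply gets 8KB clusters.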

Fix static assert

This revision now requires review to proceed. Jun 13 2025, 5:51 PM

Harbormaster completed remote builds in B64851: Diff 156985. Jun 13 2025, 5:51 PM

veethebee_google.com marked an inline comment as done. Jun 13 2025, 5:51 PM

This revision was not accepted when it landed; it landed in state Needs Review. Jun 13 2025, 6:55 PM

Closed by commit rG71702df61262: gve: Add support for 4k RX Buffers when using DQO queue formats (authored by veethebee_google.com, committed by markj).

This revision was automatically updated to reflect the committed changes.

markj added a commit: rG71702df61262: gve: Add support for 4k RX Buffers when using DQO queue formats.

markj added inline comments. Jun 14 2025, 12:23 PM

sys/dev/gve/gve.h
91	I committed a follow-up change to relax the assertion for the MCLBYTES comparison as well: there are some non-default kernel configs which set MCLBYTES=4096. In that case we don't have to switch allocation strategies, so the driver could be modified to handle it, but I'm not sure it's worth bothering. The kernel config in question changes the default only as a testing measure.

markj added inline comments. Jun 19 2025, 1:47 PM

sys/dev/gve/gve_main.c
438	Coverity points out that we should check for errors from gve_alloc_rx_rings(). I'm not sure how to handle errors though; perhaps this routine should be reworked to try and allocate new rings earlier?
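One possible reading of the "allocate new rings earlier" suggestion is the generic allocate-then-swap pattern: build the replacement rings first and release the old ones only on success, so an allocation error leaves the interface with a working ring set. This is a sketch under that assumption; every name below is hypothetical, not a gve routine, and real ring allocation (DMA memory, queue teardown and restart) is far more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a set of RX rings. */
struct sketch_rx_rings {
	unsigned	count;
	unsigned	buf_size;
	void		*mem;
};

/* Hypothetical allocator: returns 0 or an errno value; *r is untouched on error. */
static int
sketch_alloc_rx_rings(struct sketch_rx_rings *r, unsigned count,
    unsigned buf_size)
{
	void *mem;

	if (count == 0 || buf_size == 0)
		return (EINVAL);
	mem = calloc(count, buf_size);
	if (mem == NULL)
		return (ENOMEM);
	r->mem = mem;
	r->count = count;
	r->buf_size = buf_size;
	return (0);
}

static void
sketch_free_rx_rings(struct sketch_rx_rings *r)
{
	free(r->mem);
	r->mem = NULL;
	r->count = 0;
	r->buf_size = 0;
}

/* Allocate-then-swap: old rings are freed only after the new ones exist. */
static int
sketch_reconfigure_rx_rings(struct sketch_rx_rings *cur, unsigned count,
    unsigned buf_size)
{
	struct sketch_rx_rings next;
	int error;

	error = sketch_alloc_rx_rings(&next, count, buf_size);
	if (error != 0)
		return (error);		/* cur is still valid and in use */
	sketch_free_rx_rings(cur);
	*cur = next;
	return (0);
}
```

The trade-off is peak memory: both ring sets exist briefly, in exchange for never leaving the driver without rings when allocation fails.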