gve: Fix TX livelock

Description

Before this change, the transmit taskqueue would re-enqueue itself
whenever it could not find space on the NIC ring, in the hope that space
would eventually be made. This results in the following livelock, which
only occurs after passing ~200Gbps of TCP traffic for many hours:

                        100% CPU
┌───────────┐wait on  ┌──────────┐         ┌───────────┐
│user thread│  cpu    │gve xmit  │wait on  │gve cleanup│
│with mbuf  ├────────►│taskqueue ├────────►│taskqueue  │
│uma lock   │         │          │ NIC ring│           │
└───────────┘         └──────────┘  space  └─────┬─────┘
     ▲                                           │
     │      wait on mbuf uma lock                │
     └───────────────────────────────────────────┘

Further details about the livelock are available at
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560.

After this change, the transmit taskqueue no longer spins till there is
room on the NIC ring. It instead stops itself and lets the
completion-processing taskqueue wake it up.
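
The stop-and-wake handshake described above can be sketched as a
single-threaded userspace simulation. This is only an illustration under
stated assumptions: struct sim_tx_ring, sim_xmit_task() and
sim_cleanup_task() are hypothetical names, not the driver's actual
structures or functions, and a direct function call stands in for
enqueueing the transmit task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-ring state. */
struct sim_tx_ring {
    size_t   capacity;   /* descriptor slots on the NIC ring */
    size_t   in_flight;  /* slots posted but not yet completed */
    size_t   backlog;    /* packets waiting in the software queue */
    bool     stopped;    /* xmit task parked itself, awaiting cleanup */
    unsigned xmit_runs;  /* number of times the xmit task body ran */
};

/* Xmit task body: post packets until the backlog drains or the ring fills. */
static void
sim_xmit_task(struct sim_tx_ring *r)
{
    r->xmit_runs++;
    while (r->backlog > 0) {
        if (r->in_flight == r->capacity) {
            /* No room: park instead of re-enqueueing ourselves. */
            r->stopped = true;
            return;
        }
        r->in_flight++;
        r->backlog--;
    }
}

/* Cleanup task body: reclaim completed slots, then wake a parked xmit task. */
static void
sim_cleanup_task(struct sim_tx_ring *r, size_t completed)
{
    assert(completed <= r->in_flight);
    r->in_flight -= completed;
    if (r->stopped && r->in_flight < r->capacity) {
        r->stopped = false;
        sim_xmit_task(r);  /* stands in for enqueueing the xmit task */
    }
}
```

In this model the transmit task runs once per wakeup rather than once
per failed attempt, so it cannot monopolize a CPU while the ring is full.
Because the two tasks run concurrently in a real driver, the stopped
flag would need atomic updates and a re-check for ring space after it is
set, so that a wakeup is not lost; the sketch omits that.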

Since I'm touching the transmit taskqueue, I've also corrected the name
of a counter and fixed a bug where mbufs that failed with EINVAL were
not being freed and instead lived forever on the bufring.
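
The EINVAL fix can be illustrated the same way. The sketch below is a
userspace model, not the driver code: struct sim_queue, sim_post() and
sim_drain() are hypothetical, and treating a non-positive length as the
EINVAL case is an assumption made only for the example. The point it
shows is that a transient error leaves the packet at the head of the
queue for a later retry, while a permanent error must consume and free
it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sim_pkt { int len; };

/* Hypothetical software queue; head == tail means empty, no wraparound. */
struct sim_queue {
    struct sim_pkt *slots[8];
    size_t head, tail;
};

static unsigned sim_sent, sim_dropped;

static void
sim_enqueue(struct sim_queue *q, int len)
{
    struct sim_pkt *p = malloc(sizeof(*p));

    assert(p != NULL && q->tail < 8);
    p->len = len;
    q->slots[q->tail++] = p;
}

/* Try to post one packet: EINVAL is permanent, ENOBUFS is transient. */
static int
sim_post(const struct sim_pkt *p, size_t *room)
{
    if (p->len <= 0)
        return (EINVAL);
    if (*room == 0)
        return (ENOBUFS);
    (*room)--;
    return (0);
}

static void
sim_drain(struct sim_queue *q, size_t *room)
{
    while (q->head != q->tail) {
        struct sim_pkt *p = q->slots[q->head];
        int err = sim_post(p, room);

        if (err == ENOBUFS)
            break;      /* transient: keep the packet for a retry */
        q->head++;      /* sent or permanently bad: consume it */
        if (err == EINVAL)
            sim_dropped++;
        else
            sim_sent++;
        free(p);        /* the model frees on success as well */
    }
}
```

If the EINVAL case instead broke out of the loop like ENOBUFS, the bad
packet would stay at the head of the queue and be retried forever, which
is the behaviour the paragraph above describes.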

Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47138

(cherry picked from commit 40097cd67c0d52e2b288e8555b12faf02768d89c)

Details

Provenance
shailend_google.com: Authored on Nov 5 2024, 7:38 PM
markj: Committed on Nov 20 2024, 9:41 PM
Differential Revision
D47138: gve: Fix TX livelock
Parents
rG1bda36a393c2: gve: Add DQO QPL support