Page MenuHomeFreeBSD

gve: Fix TX livelock
ClosedPublic

Authored by shailend_google.com on Oct 15 2024, 8:45 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Dec 1, 7:31 PM
Unknown Object (File)
Wed, Nov 20, 11:00 PM
Unknown Object (File)
Oct 24 2024, 3:06 PM
Unknown Object (File)
Oct 24 2024, 3:06 PM
Unknown Object (File)
Oct 24 2024, 2:48 PM
Unknown Object (File)
Oct 23 2024, 6:08 AM
Unknown Object (File)
Oct 21 2024, 5:14 AM
Unknown Object (File)
Oct 16 2024, 2:51 AM
Subscribers

Details

Summary

Before this change the transmit taskqueue would enqueue itself when it
cannot find space on the NIC ring with the hope that eventually space
would be made. This results in the following livelock that only occurs
after passing ~200Gbps of TCP traffic for many hours:

iperf thread (with uma zone lock) ---sched---> gve tx xmit thread ---for room---> gve tx cleanup thread -----uma zone lock----> iperf thread

Further details about the livelock are available on
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560.

After this change, the transmit taskqueue no longer spins till there is
room on the NIC ring. It instead stops itself and lets the
completion-processing taskqueue wake it up.

Since I'm touching the trasnmit taskqueue I've also corrected the name
of a counter and also fixed a bug where EINVAL mbufs were not being
freed and were instead living forever on the bufring.

Signed-off-by: Shailend Chand <shailend@google.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/dev/gve/gve.h
386

The variable is accessed with atomic_load, so doesn't need to be declared volatile.

I think you can use a bool here instead.

sys/dev/gve/gve_tx.c
808
sys/dev/gve/gve_tx_dqo.c
1061

Suppose we load stopped, and then are preempted. Then, the transmit path tries to enqueue a packet and fails with ENOBUFS, so it sets stopped = 1. The only way we can make progress is by running the cleanup task again, right?

I think this mechanism works, but I'm wondering if we really need a full barrier here. It looks like acquire/release loads/stores of stopped would be sufficient.

shailend_google.com marked 3 inline comments as done.

Address Mark's review comments

sys/dev/gve/gve.h
386

Removed the volatile, but a switch to bool angers atomic:

gve_tx.c:249:2: error: controlling expression type 'bool' (aka '_Bool') not compatible with any generic association type
        atomic_store_8(&tx->stopped, false);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/src/sys/sys/atomic_common.h:114:2: note: expanded from macro 'atomic_store_8'
        __atomic_store_generic(p, v, int8_t, uint8_t, 8)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/src/sys/sys/atomic_common.h:86:11: note: expanded from macro '__atomic_store_generic'
        _Generic(*(p),                          \
sys/dev/gve/gve_tx_dqo.c
1061

Suppose we load stopped, and then are preempted. Then, the transmit path tries to enqueue a packet and fails with ENOBUFS, so it sets stopped = 1. The only way we can make progress is by running the cleanup task again, right?

Yup. The correctness relies on the ordering between the space made by this thread in the ring and its load of stopped.

If this thread loads a 0 and is preempted and then the xmit thread ENOBUFSs, that implies that the space this thread made before getting preempted wasn't enough to xmit a pkt, which means there is a future interrupt coming that will eventually make space.

Note that the xmit thread does not ever schedule this cleanup thread, only the hardware summons it to work by irq.

I think this mechanism works, but I'm wondering if we really need a full barrier here. It looks like acquire/release loads/stores of stopped would be sufficient.

Since I need to load stopped after making the space in the ring, am I not forced to use a barrier: there being no "release" variant of load?

sys/dev/gve/gve.h
386

You'd have to use atomic_store_bool.

sys/dev/gve/gve_tx_dqo.c
1061

We do also have atomic_thread_fence_*, to be used instead of mb() etc. when needed. In particular, atomic_thread_fence_seq_cst() should be equivalent to mb() on all platforms.

shailend_google.com marked 2 inline comments as done.

Switch to a boolean stopped
Switch to atomic_thread_fence_seq_cst

sys/dev/gve/gve.h
386

Sorry, was testing briefly on a version without https://reviews.freebsd.org/D36078 and missed its existence

This revision is now accepted and ready to land.Nov 5 2024, 3:26 PM
This revision now requires review to proceed.Nov 5 2024, 5:33 PM
This revision was not accepted when it landed; it landed in state Needs Review.Nov 6 2024, 3:18 PM
Closed by commit rG40097cd67c0d: gve: Fix TX livelock (authored by shailend_google.com, committed by markj). · Explain Why
This revision was automatically updated to reflect the committed changes.