Page MenuHomeFreeBSD

gve: Add callout to detect and handle TX timeouts
ClosedPublic

Authored by jtranoleary_google.com on May 16 2025, 7:33 PM.
Tags
None
Referenced Files
F133443456: D50385.diff
Sat, Oct 25, 8:26 PM
Unknown Object (File)
Tue, Oct 21, 9:45 PM
Unknown Object (File)
Sat, Oct 18, 7:56 PM
Unknown Object (File)
Sep 24 2025, 6:12 AM
Unknown Object (File)
Sep 24 2025, 2:15 AM
Unknown Object (File)
Sep 24 2025, 1:08 AM
Unknown Object (File)
Sep 21 2025, 10:11 PM
Unknown Object (File)
Sep 20 2025, 4:20 AM
Subscribers

Details

Summary

A TX timeout occurs when the driver allocates resources on a TX queue
for a packet to be sent, prompts the hardware to send the packet, but
does not receive a completion for the packet within a given timeout
period. An accumulation of TX timeouts can cause one or more queues to
run out of space and cause the entire driver to become stuck.

This commit adds a lockless timer service that runs periodically and
checks queues for timed out packets. In the event we detect a timeout,
we prompt the completion phase taskqueue to process completions. Upon
the next inspection of the queue we still detect timed out packets, if
the last "kick" occurred within a fixed cooldown window, we opt to
reset the driver, even if the prior kick successfully freed timed out
packets.

Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 64324
Build 61208: arc lint + arc unit

Event Timeline

Hey, thanks for tagging me.

sys/dev/gve/gve_main.c
119

Can we put these in gve.4 DIAGNOSTICS section?

125

Ditto.

sys/dev/gve/gve_sysctl.c
190

This should also go in gve.4 SYSCTL section.

sys/dev/gve/gve_main.c
142

The operator should be on the previous line per style(9), i.e.,

num_timeout_pkts = gve_is_gqi(priv) ?
    gve_check_tx_timeout_gqi(priv, tx) :
    gve_check_tx_timeout_dqo(priv, tx);
152

MPSAFE is set at callout initialization time (and you're already setting it there).

162

Do you really want such a large precision? This basically permits the callout subsystem to further delay the callout by up to the timer period in order to coalesce events, i.e., it could in theory fire after 2*GVE_TX_TIMEOUT_CHECK_CADENCE_SEC seconds. In practice I'd expect it to fire well before that, but it still looks odd.

If you pass 0 instead you'll get a sane default (I think in practice it's the timer period divided by 4).

163

Same comment about MPSAFE.

sys/dev/gve/gve_tx.c
441

Stray whitespace?

sys/dev/gve/gve_utils.c
448

These functions won't get inlined as they're always called from a different compilation unit--do we really need these inline annotations?

481
jtranoleary_google.com marked 8 inline comments as done.

Address first round of reviewer feedback

  • Remove unnecessary inline keywords
  • Various style fixes
  • Change callout(9) params to more reasonable values
  • Add supporting documentation
sys/dev/gve/gve_main.c
119

Added these two TX timeout error messages to the gve.4 diagnostics section.

142

Got it—changed this to operator-at-end style to match style(9).

152

Changed the flag to 0 as indicated.

162

Understood—no, I would not have wanted to such a large precision. I changed to to 0 to get the reasonable default that you described.

sys/dev/gve/gve_sysctl.c
190

In gve.4 we currently don't have any of the read-only counter sysctl variables explicitly listed. It seems like similar drivers also don't list them. The gve.4 page currently includes:

Apart from these messages, the driver exposes per-queue packet and error
     counters as sysctl nodes.

We can also expose per-queue sysctl nodes with sysctl dev.gve.0. Given that there's no explicit section for counters currently, what would be the best approach here?

sys/dev/gve/gve_tx.c
441

Yes, removed.

sys/dev/gve/gve_utils.c
448

You're right—I removed the inline annotations.

481

Fixed.

jtranoleary_google.com marked an inline comment as done.

Once you're happy with the patches, please mail them to me and I'll apply them.

sys/dev/gve/gve_main.c
148

The % operator should be on the preceding line here too.

sys/dev/gve/gve_sysctl.c
190

I don't see any particular reason to exhaustively document these counters. The sysctl descriptions are useful, and someone who wants to know more can always look at the driver sources.

This revision is now accepted and ready to land.May 20 2025, 8:29 PM

Moved a modulo operator to end of line RE style(9)

This revision now requires review to proceed.May 20 2025, 10:23 PM
jtranoleary_google.com added inline comments.
sys/dev/gve/gve_main.c
148

Fixed. In the future, I will be more mindful of operators going at the end of the line for multiline statements.

This revision was not accepted when it landed; it landed in state Needs Review.May 20 2025, 11:51 PM
This revision was automatically updated to reflect the committed changes.