
aq(4): take F/W statistics off the iflib core lock (kick-and-read)
Closed, Public

Authored by nick_spun.io on Thu, Jun 4, 11:25 AM.

Details

Summary

The once-per-second statistics refresh ran the whole F/W-mailbox
transaction under iflib's CTX (sx) lock: fw2x_get_stats toggled the MPI
STATISTICS control bit and busy-polled the state register for the
acknowledgement (up to ~25 ms) before downloading the counters, so a slow
F/W response blocked datapath reconfiguration and ioctls for the whole wait.
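
For reference, a userspace model of the toggle-and-wait shape described above. Everything in it (struct fake_fw, STATS_BIT, the 100 us poll step, toggle_and_wait) is hypothetical and only mirrors the prose; the real code talks to MMIO registers. It shows that when the F/W never acknowledges, the caller, and therefore whoever holds the CTX lock, spins for the full ~25 ms bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; not the aq(4) register definitions. */
#define STATS_BIT	(1u << 0)	/* assumed STATISTICS bit position */
#define POLL_STEP_US	100u		/* assumed delay per poll iteration */
#define POLL_LIMIT_US	25000u		/* ~25 ms bound from the summary */

struct fake_fw {
	uint32_t ctrl;		/* driver-written MPI control word */
	uint32_t state;		/* F/W-written state word */
	unsigned ack_after;	/* polls until the F/W mirrors the bit */
	unsigned polls;
};

/* One state-register read; the fake F/W acks after ack_after reads. */
static uint32_t
fw_read_state(struct fake_fw *fw)
{
	if (++fw->polls >= fw->ack_after)
		fw->state = (fw->state & ~STATS_BIT) | (fw->ctrl & STATS_BIT);
	return (fw->state);
}

/*
 * Toggle the bit, then spin until the state bit matches or the bound
 * expires. Returns the simulated microseconds spent waiting, i.e. how
 * long the caller's lock is held for the poll alone.
 */
static unsigned
toggle_and_wait(struct fake_fw *fw, bool *acked)
{
	unsigned waited = 0;

	fw->ctrl ^= STATS_BIT;
	fw->polls = 0;
	while (waited < POLL_LIMIT_US) {
		if ((fw_read_state(fw) & STATS_BIT) == (fw->ctrl & STATS_BIT)) {
			*acked = true;
			return (waited);
		}
		waited += POLL_STEP_US;	/* stands in for DELAY(POLL_STEP_US) */
	}
	*acked = false;
	return (waited);
}
```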

The per-cast (unicast/multicast/broadcast) and error counters have no
direct-register source: both the reference Linux atlantic driver and our
port read them out of the F/W mailbox, and neither uses the chip's MSM
registers for the periodic counters. So rather than poll, adopt the
kick-and-read pattern used by vmxnet3, the iflib driver with the same
constraint: consume the snapshot the F/W produced for the *previous*
request, then toggle the bit to request the next one, without waiting.
The F/W finished that previous refresh ~1 s ago, so the download needs no
poll, and the toggle write stays serialized against set_mode by the CTX
lock exactly as before. This removes the 25 ms poll (and the
toggle_mpi_ctrl_and_wait_ helper) from under the lock; only the fast
16-dword download remains.
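
A minimal sketch of the kick-and-read tick, again as a userspace model under assumed names: struct fake_fw, fw_service and stats_tick are hypothetical, not aq(4) functions. fw_service stands in for the F/W completing a pending request at some point during the ~1 s between ticks; stats_tick only copies out the last snapshot and toggles the bit, so nothing in it waits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSTATS	16	/* the "16-dword download" from the summary */

/* Hypothetical model of the F/W side of the mailbox. */
struct fake_fw {
	uint32_t ctrl_bit;		/* driver-written STATISTICS toggle */
	uint32_t ack_bit;		/* F/W-written acknowledgement */
	uint32_t live[NSTATS];		/* the chip's running counters */
	uint32_t mbox[NSTATS];		/* snapshot the F/W last published */
};

/* F/W services a pending request by publishing a fresh snapshot. */
static void
fw_service(struct fake_fw *fw)
{
	if (fw->ack_bit != fw->ctrl_bit) {
		memcpy(fw->mbox, fw->live, sizeof(fw->mbox));
		fw->ack_bit = fw->ctrl_bit;
	}
}

/*
 * Kick-and-read: download the snapshot produced for the previous
 * request, then toggle the bit to request the next one. No polling.
 */
static void
stats_tick(struct fake_fw *fw, uint32_t out[NSTATS])
{
	memcpy(out, fw->mbox, sizeof(fw->mbox));	/* fast download */
	fw->ctrl_bit ^= 1;				/* kick next refresh */
}
```

Driven once per timer tick, each call returns the counters as they stood when the F/W serviced the previous toggle, which is the one-cycle lag accepted in the summary.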

Cost: the counters lag by one 1 s refresh cycle, which is invisible for
monitoring, and a torn read is already rejected by aq_update_hw_stats'
monotonic-delta check.
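
One way a monotonic-delta check can reject a torn snapshot, sketched under assumptions: struct counter and counter_update are hypothetical and are not the actual aq_update_hw_stats() code, and the sketch assumes the raw 32-bit counters do not wrap between samples. A snapshot that moves backwards is discarded and the accumulated total is left unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-counter state. */
struct counter {
	uint32_t last;		/* previous accepted raw snapshot */
	uint64_t total;		/* accumulated value reported to the stack */
};

/*
 * Accumulate the delta from the previous snapshot. A raw value below
 * the last accepted one would be a negative delta (e.g. a torn read),
 * so it is rejected. Assumes no 32-bit wrap between samples.
 */
static bool
counter_update(struct counter *c, uint32_t raw)
{
	if (raw < c->last)
		return (false);
	c->total += raw - c->last;
	c->last = raw;
	return (true);
}
```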

Validated on AQC107: a fixed 500 MiB (524.3 MB) RX transfer advances
good_octets_rcvd by 549.6 MB, i.e. the payload plus the ~4.8% Ethernet
framing overhead, with rx_err=0 and traffic at line rate.
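
The arithmetic behind the validation figure, as a small check. 500 MiB is 524,288,000 bytes, so a 549.6 MB delta is about 4.83% above the payload. The per-frame model in per_frame_overhead is an assumption not stated in the revision (TCP with timestamps over a 1500-byte MTU, with the FCS counted in good_octets_rcvd); under that assumption it lands on the same ~4.8%.

```c
#include <assert.h>

/* 500 MiB is binary (2^20-byte units); the counter delta is decimal MB. */
#define PAYLOAD_BYTES	(500.0 * 1024 * 1024)	/* 524,288,000 B */
#define COUNTED_BYTES	549.6e6			/* observed counter delta */

/* Fractional excess of the counted bytes over the payload. */
static double
overhead_fraction(double counted, double payload)
{
	return (counted / payload - 1.0);
}

/*
 * Assumed frame model: 1448 payload bytes per frame plus
 * 14 (Ethernet) + 20 (IPv4) + 32 (TCP with timestamps) + 4 (FCS) = 70.
 */
static double
per_frame_overhead(double payload_per_frame, double header_bytes)
{
	return (header_bytes / payload_per_frame);
}
```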

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit: Tests Not Applicable

Event Timeline

Doesn't this mean the first toggle reads no/garbage statistics?

This revision is now accepted and ready to land. (Sun, Jun 14, 4:12 PM)

> Doesn't this mean the first toggle reads no/garbage statistics?

I totally didn't think about that. Let me look at how some other drivers handle this, and whether they just yolo the data for the first ~1 s.