The once-per-second statistics refresh ran the whole F/W-mailbox
transaction under iflib's CTX (sx) lock: fw2x_get_stats toggled the MPI
STATISTICS control bit and busy-polled the state register for the
acknowledgement (up to ~25 ms) before downloading the counters, so a slow
F/W response blocked datapath reconfiguration and ioctls for the
duration.

The per-cast and error counters have no direct-register source -- the
reference Linux atlantic driver and our port both read them out of the
F/W mailbox, and the MSM registers the chip exposes are never used for the
periodic counters. So rather than poll, adopt the kick-and-read shape
used by vmxnet3, the iflib peer with the same constraint: consume the
snapshot the F/W produced for the *previous* request, then toggle the
bit to request the next one, with no wait. The F/W finished that
previous refresh ~1 s ago, so the download needs no poll, and the toggle
write stays serialized against set_mode by the CTX lock exactly as
before. This removes the poll of up to ~25 ms (and the
toggle_mpi_ctrl_and_wait_ helper) from under the lock; only the fast
16-dword download remains.

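
A minimal sketch of that ordering, in C against a simulated device.
struct fake_hw, the bit position of MPI_CTRL_STATISTICS, fake_fw_service
and stats_kick_and_read are hypothetical stand-ins, not the driver's
code; only the 16-dword size and the download-then-toggle order come
from the description above.

```c
#include <stdint.h>
#include <string.h>

#define STATS_DWORDS        16u        /* counter block size, per the text */
#define MPI_CTRL_STATISTICS (1u << 0)  /* hypothetical bit position */

/* Hypothetical stand-in for the device registers and F/W mailbox. */
struct fake_hw {
    uint32_t mpi_control;              /* driver-written request bits */
    uint32_t mpi_state;                /* F/W-written acknowledgement bits */
    uint32_t mailbox[STATS_DWORDS];    /* F/W-owned statistics snapshot */
    uint32_t fw_generation;            /* bumped on every simulated refresh */
};

/* Simulated F/W: if the control and state bits differ, a refresh is
 * pending, so rewrite the snapshot and mirror the bit as the ack. */
static void fake_fw_service(struct fake_hw *hw)
{
    if (((hw->mpi_control ^ hw->mpi_state) & MPI_CTRL_STATISTICS) == 0)
        return;                        /* nothing requested */
    hw->fw_generation++;
    for (uint32_t i = 0; i < STATS_DWORDS; i++)
        hw->mailbox[i] = hw->fw_generation * 100u + i;
    hw->mpi_state ^= MPI_CTRL_STATISTICS;
}

/* Kick-and-read: download the snapshot produced for the previous
 * request, then toggle the bit to request the next one. No polling of
 * mpi_state happens here. The caller is assumed to hold the lock that
 * serializes writes to mpi_control. */
static void stats_kick_and_read(struct fake_hw *hw,
                                uint32_t out[STATS_DWORDS])
{
    memcpy(out, hw->mailbox, sizeof(hw->mailbox));
    hw->mpi_control ^= MPI_CTRL_STATISTICS;
}
```

In this model the first tick returns the empty initial snapshot and
every later tick returns the data requested one tick earlier, which is
the one-cycle lag the change accepts.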
Cost: the counters lag by one 1 s cycle, which is invisible for
monitoring. If a download ever races a late F/W refresh, the torn read
is already rejected by aq_update_hw_stats' monotonic-delta check.

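
For illustration, a monotonic-delta check of this kind could look like
the sketch below. It is not the body of aq_update_hw_stats: the names
stats_acc, stats_accumulate and N_COUNTERS are hypothetical, the
counter count is shrunk to 4, and 32-bit wraparound is deliberately
ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_COUNTERS 4u  /* hypothetical; kept small for illustration */

struct stats_acc {
    uint32_t last[N_COUNTERS];   /* previous accepted raw snapshot */
    uint64_t total[N_COUNTERS];  /* accumulated totals reported upward */
};

/* Accept a snapshot only if no counter moved backwards relative to the
 * last accepted one. A rejected snapshot leaves the state untouched, so
 * the next good snapshot still yields the correct delta. */
static bool stats_accumulate(struct stats_acc *acc,
                             const uint32_t snap[N_COUNTERS])
{
    /* Pass 1: validate the whole snapshot before changing anything. */
    for (size_t i = 0; i < N_COUNTERS; i++) {
        if (snap[i] < acc->last[i])
            return false;        /* inconsistent or torn read: drop it */
    }
    /* Pass 2: fold the deltas into the running totals. */
    for (size_t i = 0; i < N_COUNTERS; i++) {
        acc->total[i] += (uint64_t)(snap[i] - acc->last[i]);
        acc->last[i] = snap[i];
    }
    return true;
}
```

Validating before accumulating means a dropped snapshot costs one more
cycle of lag and no lost counts.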
Validated on AQC107: a fixed 500 MiB (524.3 MB) RX transfer advances
good_octets_rcvd by 549.6 MB -- 500 MiB plus the ~4.8% Ethernet framing
overhead -- with rx_err=0 and traffic at line rate.
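
The overhead figure can be re-derived from the two byte counts. The
helper below is a hypothetical convenience for that arithmetic, not
part of the driver.

```c
#include <stdint.h>

/* Ratio of bytes counted on the wire to payload bytes transferred. */
static double rx_overhead_ratio(uint64_t payload_bytes,
                                uint64_t counted_bytes)
{
    return (double)counted_bytes / (double)payload_bytes;
}
```

With payload_bytes = 500 * 1024 * 1024 = 524288000 and counted_bytes =
549600000, the ratio is about 1.048, i.e. roughly 4.8% above the
payload.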