Page MenuHomeFreeBSD

iwn: (partially) rewrite A-MPDU Tx path
ClosedPublic

Authored by avos on May 15 2017, 12:29 AM.

Details

Summary
  • Generic Tx stats fixes:
    • do not try to parse "aggregation status" for single frames; send them to iwn_tx_done() instead;
    • try to attach mbuf / node reference pair to reported BA events; allows to fix reported status for ieee80211_tx_complete() and ifnet counters (previously all A-MPDU frames were counted as failed - see PR 210211); requires few more firmware bug workarounds;
    • preserve short / long retry counters for AMRR (disabled for now - causes significant performance degradation).
  • Add new IWN_DEBUG_AMPDU debug category.
  • Add one more check into iwn_tx_data() to prevent aggregation ring overflow.
  • Workaround 'seqno % 256' != 'current Tx slot' case (until D9195 is not in the tree)
  • Improve watchdog timer updates (previously watchdog check was omitted when at least one frame was transmitted).
  • Stop Tx when memory leak in currently used ring was detected (unlikely to happen).
  • Few other minor fixes.

Since node / mbuf NULL checks now are omitted for A-MPDU path (firmware (or driver?) is too buggy), this patch should 'fix' PR 192641.

Test Plan

Tested with Intel 6205 (STA):
+ RTL8821AU (AP mode, netserver);
+ WF2780 (ping -f -s 1400; regular usage)

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

node / mbuf NULL checks now are omitted

not really omitted - just considered as 'correct' - and ignored (see SCD_QUERY status comment + workarounds cannot cover all possible cases).

I'm so very glad you're working on this stuff! It's approved; can you email freebsd-wireless@ for some further testing before commit?

sys/dev/iwn/if_iwn.c
3290 ↗(On Diff #28331)

Is this one of the tx seqno fixes? Is it standalone or depend upon other bits?

3311 ↗(On Diff #28331)

Ah, so we were failing each subframe even if we were just retrying it?

3653 ↗(On Diff #28331)

oh, and this is another bad aggregate queue thing? yes, these slots could be empty. but, hm, why would they be? ideally we'd never schedule non-aggregate frames into this ring?

3794–3795 ↗(On Diff #28331)

How'd you generate these overflow seqno field firmware bug workarounds? Is it from local testing, or from somewhere online?

4594 ↗(On Diff #28331)

Maybe put a warning in here in case it happens, so we can find the code path that lead here and nuke it?

This revision is now accepted and ready to land.May 16 2017, 4:07 AM
sys/dev/iwn/if_iwn.c
3290 ↗(On Diff #28331)

This code corrects bitmap calculation; AFAIR it is standalone (up to line 3305)

an example from logs (-64 < shift <= 0 case is wery rare):
ba->seq: 0x2C50
ba->ssn & 0xff: 0xC3
ba->bitmap: 0x1
wn->agg[tid].startidx: 0xC3
wn->agg[tid].bitmap: 0x7

calculated shift: -2 (and bitmap: 0x4)

3311 ↗(On Diff #28331)

Exactly.

3653 ↗(On Diff #28331)

There is a hypothetical case when long_retries counter was not updated correctly and iwn_ampdu_index_check() was not done - then iwn_compressed_ba() may start to free wrong frames.

3794–3795 ↗(On Diff #28331)

Local testing; few more printfs + IWN_DEBUG_AMPDU was turned on to see what is going on (and why KASSERT is triggered)

4594 ↗(On Diff #28331)

Done! (see D10753; this code fragment will be removed then)

I can't test iwn(4) with 4965AGN - firmware throws FH_ERROR's during testing with and/or without this patch when HT rates are in use (A-MPDU / HT40 / SGI status does not matter).
Scanning is not reliable too - sometimes AP is shown on (almost) all 5GHz channels, while it is present on 44th channel only -> this results in NMI_INTERRUPT_WDG error during next scan.

dhw added a subscriber: dhw.

I tested this earlier today: having built head/amd64 @r327616, first verified that output counters were rarely being updated, then I "cloned" that slice, used "svn patch" to apply the patch, rebuilt the kernel, and rebooted. (All of this worked without incident or drama.) Once rebooted, I used "scp" to copy a large file from the laptop to a machine on a different network here at home; I was pleased to note that the traffic counters on the laptop showed a very close correspondence with the counters on the router (where the NICs are em(4), and "just work").

This appears to be a significant improvement; I look forward to seeing it committed.

This revision was automatically updated to reflect the committed changes.