
LinuxKPI: 802.11: implement a deferred RX path
ClosedPublic

Authored by bz on Feb 18 2024, 10:15 PM.

Details

Summary

Some calls, e.g., for action frames, cause us to call all the
way down to the firmware from the RX path without any deferral in
net80211.

For LinuxKPI and iwlwifi this goes (with omissions) like this:
lkpi_napi_task -> linuxkpi_ieee80211_rx -> ieee80211_input_mimo ->
sta_input -> ht_recv_action_ba_addba_request ->
lkpi_ic_ampdu_rx_start -> iwl_mvm_mac_ampdu_action ->
iwl_trans_txq_send_hcmd. At that point we are waiting for an
interrupt from the firmware but given the lkpi_napi_task has not
finished (and may have more to dispatch based on budget and what
was received) we will not see the new interrupt/fw response.
With no answer from the firmware, the software timeout in the
driver kills the command and the firmware and issues a complete
restart.

Implement the deferred RX path in LinuxKPI for the moment. If any
native drivers hit this in the future, we should carefully look at
how to shift this into net80211.

This fixes the hangs for (*ic_ampdu_rx_start)() calls.
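
The deferral described above can be illustrated with a minimal userspace sketch. It is not the code from this revision: `struct rx_frame`, `struct rx_deferred`, `rx_enqueue()` and `rx_deferred_task()` are hypothetical names, a plain linked list stands in for the kernel's mbuf queue, and a flag stands in for scheduling a task. The point is only that the RX context queues frames and returns, and a separate context later hands them to the stack, so a downcall that waits for a firmware interrupt no longer runs inside the RX task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a received frame (an mbuf in the kernel). */
struct rx_frame {
	int		 id;
	struct rx_frame	*next;
};

/* Hypothetical deferred-RX state: a FIFO plus a "task scheduled" flag. */
struct rx_deferred {
	struct rx_frame	*head;
	struct rx_frame	*tail;
	bool		 task_pending;
	int		 delivered;	/* frames handed to the stack so far */
};

/* Called from the RX (NAPI-like) context: queue only, never call down. */
static void
rx_enqueue(struct rx_deferred *d, struct rx_frame *f)
{
	assert(d != NULL && f != NULL);
	f->next = NULL;
	if (d->tail != NULL)
		d->tail->next = f;
	else
		d->head = f;
	d->tail = f;
	d->task_pending = true;		/* stands in for scheduling a task */
}

/*
 * Runs later in its own context, after the RX context has returned.
 * Returns the number of frames processed in this run.
 */
static int
rx_deferred_task(struct rx_deferred *d)
{
	struct rx_frame *f;
	int n = 0;

	assert(d != NULL);
	d->task_pending = false;
	while ((f = d->head) != NULL) {
		d->head = f->next;
		if (d->head == NULL)
			d->tail = NULL;
		f->next = NULL;
		d->delivered++;		/* stands in for the net80211 input call */
		n++;
	}
	return (n);
}
```

In this model the RX side would call `rx_enqueue()` once per frame and return; `rx_deferred_task()` then drains the queue in order. A real implementation would also need locking around the queue, which is left out here.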

MFC after: 3 days
PR: 276083

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bz requested review of this revision. Feb 18 2024, 10:15 PM

Notes:
(1) (*ic_ampdu_rx_start)() MO downcalls in LinuxKPI need to unlock the ic lock and lock the LHW lock, as done in the state machine updates, given the MO calls can sleep.
(2) Further work may be needed on the node_cleanup -> ieee80211_ht_node_cleanup -> lkpi_ic_ampdu_rx_stop path if (1) is not sufficient.
(3) A separate change for ieee80211_sn_*() is to come.
(4) Likely need to look at packets and rates now as well, as the RX packets after the BA response is out do not show up with 11n for me, though I can see the first packet (and retry) in monitor mode. I'd need a "sane" AP to test with.
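
The lock hand-over in note (1) can be sketched as follows. This is a toy model under stated assumptions, not LinuxKPI code: `struct lock_model`, `mo_downcall()` and `ampdu_rx_start_sketch()` are hypothetical, and boolean flags stand in for the real ic lock and LHW lock. It only shows the ordering: drop the ic lock, take the LHW lock, make the downcall that may sleep, then restore the lock state the caller expects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: one flag per lock, enough to show the ordering. */
struct lock_model {
	bool ic_locked;		/* ic lock, must not be held across a sleep */
	bool lhw_locked;	/* LHW lock, assumed safe to hold while sleeping */
	bool downcall_ok;	/* set if the downcall saw the expected locks */
};

/* Stand-in for an MO downcall that may sleep. */
static void
mo_downcall(struct lock_model *m)
{
	m->downcall_ok = (!m->ic_locked && m->lhw_locked);
}

/* The caller enters and leaves with the ic lock held. */
static void
ampdu_rx_start_sketch(struct lock_model *m)
{
	assert(m->ic_locked);
	m->ic_locked = false;	/* drop the ic lock before sleeping */
	m->lhw_locked = true;	/* take the LHW lock for the downcall */
	mo_downcall(m);
	m->lhw_locked = false;	/* release the LHW lock */
	m->ic_locked = true;	/* reacquire the ic lock for the caller */
}
```

State protected by the ic lock may change while it is dropped, so real code would have to revalidate anything it relies on after reacquiring it.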

honestly we should defer rx in net80211, it'd make a whole lot of packet handling w/ state transitions easier

Initial test shows this error when I changed the channel number on the AP. Is this a reproduction of PR 277100 without LKPI_80211_HW_CRYPTO enabled, or a separate issue?

dmesg snip

[4779.973889] iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xfffffe00bdeb3e80 vap 0xfffffe00bdeb3010 state RUN
[4780.076267] iwlwifi0: linuxkpi_ieee80211_beacon_loss: vif 0xfffffe00bdeb3e80 vap 0xfffffe00bdeb3010 state RUN
[4780.079975] calling _callout_stop_safe with the following non-sleepable locks held:
[4780.080635] exclusive sleep mutex iwlwifi0_com_lo (iwlwifi0_com_lo) r = 0 (0xfffffe00bde1c020) locked @ /usr/src/sys/net80211/ieee80211_scan_sw.c:436
[4780.081455] stack backtrace:
[4780.081740] #0 0xffffffff80bc9585 at witness_debugger+0x65
[4780.082164] #1 0xffffffff80bca6e9 at witness_warn+0x3e9
[4780.082531] #2 0xffffffff80b73976 at _callout_stop_safe+0x76
[4780.082942] #3 0xffffffff80de7f7e at timer_shutdown_sync+0xe
[4780.083333] #4 0xffffffff82650912 at iwl_mvm_sta_rx_agg+0x892
[4780.083711] #5 0xffffffff8262e51e at iwl_mvm_mac_ampdu_action+0x1ee
[4780.084171] #6 0xffffffff80dde33b at lkpi_ic_ampdu_rx_stop+0x1bb
[4780.084599] #7 0xffffffff80ce333d at ieee80211_ht_node_cleanup+0x12d
[4780.085044] #8 0xffffffff80cf92f1 at node_cleanup+0x161
[4780.085409] #9 0xffffffff80cfb6d3 at ieee80211_sta_leave+0x13
[4780.085806] #10 0xffffffff80d0ef3d at sta_newstate+0x4cd
[4780.086180] #11 0xffffffff80dd837e at lkpi_sta_run_to_init+0x27e
[4780.086600] #12 0xffffffff80de21db at lkpi_iv_newstate+0x2db
[4780.086990] #13 0xffffffff80d06424 at ieee80211_newstate_cb+0x2a4
[4780.087407] #14 0xffffffff80bbb48b at taskqueue_run_locked+0xab
[4780.087809] #15 0xffffffff80bbc543 at taskqueue_thread_loop+0xd3
[4780.088213] #16 0xffffffff80b098f2 at fork_exit+0x82
[4780.088578] #17 0xffffffff8102ff9e at fork_trampoline+0xe
[4780.094279] wlan0: link state changed to DOWN
root@n2fbsd:/usr/src #

In D43968#1003463, @cc wrote:

Initial test shows this error when I changed the channel number on the AP. Is this a reproduction of PR 277100 without LKPI_80211_HW_CRYPTO enabled, or a separate issue?

Yes, it looks likely to be the same as, or similar to, the issue in the PR.

You've seen the comments (1)-(4) above?

In D43968#1003515, @bz wrote:

Yes, it looks likely to be the same as, or similar to, the issue in the PR.

You've seen the comments (1)-(4) above?

Right, I understand. This fixes PR 276083. The notes (1)-(4) are remaining work to be done.

Tested and the timeout is gone.

This revision is now accepted and ready to land. Feb 21 2024, 3:48 PM
This revision was automatically updated to reflect the committed changes.