
LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
Closed, Public

Authored by bz on Feb 3 2024, 9:29 PM.

Details

Summary

With firmware-based solutions we cannot simply jump from an active session
to a new iv_bss node without tearing down the state for the old node and
bringing up the new one. This likely used to work on softmac-based
cards/drivers, where one could essentially set the state and fire at will.

We track (*iv_update_bss) calls from net80211 and set a local flag
indicating that we are out of sync, and we do not allow any further
transitions up the state machine until we reach INIT or SCAN. That way,
something has to take the state down and clean up the firmware state
before we can join again and rebuild state.

Apparently this problem has been "known" for a while, as native iwm(4)
and other drivers have similar (though less strict) workarounds and can
equally be pushed into bad states. In LinuxKPI, all the KASSERTs simply
made the problem far more visible. The real solution will be some
rewrites in net80211. Until then, try to at least keep us more stable
and avoid dying on a second join1() call triggered by
"service netif start wlan0" and similar.

Sponsored by: The FreeBSD Foundation (2023, partial)
MFC after: 3 days

Test Plan

This is currently very verbose; before it goes into main, the ic_printf()
calls should be converted to tracing.
This requires D43389 as well as commit
49619f73151aeaca4cef5adf631253da04a46e19 to be applied to head.
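The test plan notes that the verbose ic_printf() output should become tracing before landing. One common pattern, sketched here in userland C under assumptions, is to gate each message on a bit in a debug mask so that it stays silent when disabled. The mask variable, the SKETCH_D_* bits and sketch_trace() are hypothetical; in the kernel the mask would typically be tunable (for example via sysctl) and the output would go through a kernel printing routine rather than stderr.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug bits, one per area of interest. */
#define SKETCH_D_STATE	0x01u	/* state machine transitions */
#define SKETCH_D_BSS	0x02u	/* iv_bss / node changes */

/* Hypothetical mask; all tracing is off by default. */
static unsigned int sketch_debug_mask = 0;

/*
 * Print a message only if at least one of the requested bits is enabled
 * in the mask.  Returns true if the message was emitted.
 */
static bool
sketch_trace(unsigned int bits, const char *fmt, ...)
{
	va_list ap;

	if ((sketch_debug_mask & bits) == 0)
		return (false);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return (true);
}
```

A call such as sketch_trace(SKETCH_D_BSS, "%s: iv_bss changed\n", __func__) then produces no output unless the BSS bit has been enabled.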

Diff Detail

Repository: rG FreeBSD src repository
Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 55780, Build 52669: arc lint + arc unit

Event Timeline

bz requested review of this revision. Feb 3 2024, 9:29 PM

Given that D43389 is the FIRST of a series, is this the SECOND one, or is there a dependency? Please help clarify.

Inline comments on sys/compat/linuxkpi/common/src/linux_80211.c:

Line 1210: No need to print a NULL when "ni->ni_drv_data == NULL".
Line 1316: No need to print a NULL when "lvif->lvif_bss == NULL".
Line 1600: No need to print a NULL when "lvif->lvif_bss == NULL".
Line 1976: No need to print a NULL when "lvif->lvif_bss == NULL".
Line 2111: No need to print a NULL when "lvif->lvif_bss == NULL".
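The inline comments point out that a branch taken only when a pointer is NULL gains nothing from printing that pointer's value. A minimal sketch of the suggested shape, assuming a hypothetical helper sketch_describe_bss() that formats the diagnostic into a caller-supplied buffer so the result can be checked: the NULL case states the fact in words, and only the non-NULL case prints the pointer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a diagnostic about a possibly missing BSS pointer.  When bss is
 * NULL the pointer value carries no information, so it is not printed;
 * the message says so in words instead.  Returns snprintf()'s result.
 */
static int
sketch_describe_bss(char *buf, size_t len, const char *who, const void *bss)
{
	if (bss == NULL)
		return (snprintf(buf, len, "%s: lvif_bss is NULL", who));
	return (snprintf(buf, len, "%s: lvif_bss %p", who, bss));
}
```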

In D43725#999194, @cc wrote:

Given that D43389 is the FIRST of a series, is this the SECOND one, or is there a dependency? Please help clarify.

I think there is a dependency: I applied only this patch, restarted netif, and hit this panic:

--- trap 0x9, rip = 0xffffffff80cf8661, rsp = 0xfffffe00ab111d00, rbp = 0xfffffe00ab111d10 ---
node_free() at node_free+0x11/frame 0xfffffe00ab111d10
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x27f/frame 0xfffffe00ab111d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00ab111df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x226/frame 0xfffffe00ab111e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00ab111ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00ab111ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00ab111f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ab111f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100192 ]
Stopped at      kdb_enter+0x32: movq    $0,0xe394d3(%rip)
db>
Replying to D43725#999202 (the panic report above):

With the patches from D43389, D43725 and D43753 applied, it looks like "service netif restart" no longer causes a panic.

My initial testing of the three patches (D43389, D43725, D43753) looks good: no more panics. I still need to sort out some issues in my testbed, so I am giving my approval now, as I don't want my testing to delay the schedule.

This revision is now accepted and ready to land. Feb 13 2024, 2:54 PM