net80211: deal with lost state transitions
Closed, Public

Authored by bz on Jan 10 2024, 11:12 AM.

Details

Summary

Since 5efea30f039c4 we can possibly lose a state transition which can
cause trouble further down the road.
The reproducer from 643d6dce6c1e can trigger these for example.
Drivers for firmware based wireless cards have worked around some of
this (and other) problems in the past.

Add an array of tasks rather than a single one, as with a single task we
would simply get npending > 1 and lose order with other tasks. Try to keep state
changes updated as queued in case we end up with more than one at a
time. While this is not ideal either (call it a hack) it will sort
the problem for now.
We will queue in ieee80211_new_state_locked() and do checks there
and dequeue in ieee80211_newstate_cb().
If we still overrun the (currently) 8 slots we will drop the state
change rather than overwrite the last one.
When dequeuing we will update iv_nstate and keep it around for historic
reasons for the moment.
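
The queueing scheme described above can be illustrated with a minimal,
standalone sketch. This is not the net80211 code: all names below
(nstate_queue, nstate_enqueue(), nstate_dequeue(), the sketch_state enum)
are hypothetical, and locking and the taskqueue machinery are left out
entirely. It only shows the intended behaviour under those assumptions:
a fixed ring of 8 slots, FIFO order, a new state change dropped on overrun
rather than overwriting a queued one, and the last dequeued state recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration, not net80211 itself. */
#define NSTATE_SLOTS 8	/* the summary mentions (currently) 8 slots */

enum sketch_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

struct nstate_queue {
	enum sketch_state slots[NSTATE_SLOTS];
	size_t head;			/* next slot to dequeue */
	size_t count;			/* number of queued transitions */
	enum sketch_state nstate;	/* last dequeued state, kept around */
};

static void
nstate_queue_init(struct nstate_queue *q)
{
	q->head = 0;
	q->count = 0;
	q->nstate = S_INIT;
}

/*
 * Queue a state change.  On overrun the NEW change is dropped and
 * false is returned; queued entries are never overwritten.
 */
static bool
nstate_enqueue(struct nstate_queue *q, enum sketch_state s)
{
	assert(q->count <= NSTATE_SLOTS);
	if (q->count == NSTATE_SLOTS)
		return (false);
	q->slots[(q->head + q->count) % NSTATE_SLOTS] = s;
	q->count++;
	return (true);
}

/*
 * Dequeue the oldest state change in FIFO order and remember it
 * in nstate.  Returns false if nothing is queued.
 */
static bool
nstate_dequeue(struct nstate_queue *q, enum sketch_state *out)
{
	if (q->count == 0)
		return (false);
	*out = q->slots[q->head];
	q->head = (q->head + 1) % NSTATE_SLOTS;
	q->count--;
	q->nstate = *out;
	return (true);
}
```

In this sketch the caller learns about a dropped transition only through
the return value of nstate_enqueue(), so it has to check that value.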

In the longer term we should make the callers of
ieee80211_new_state[_locked]() actually use the returned errors
and act appropriately but that will touch a lot more places and
drivers (possibly incl. changed behaviour for ioctls).

rtwn(4) and rum(4) should probably be revisited and net80211 internals
removed (for rum(4) at least the current logic still seems prone to
races).

Sponsored by: The FreeBSD Foundation (in 2023)
MFC after: 3 days

NB: this is the first of a series of changes to deal with the firmware
state problem in the LinuxKPI compat layer and all the drivers using it.

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

bz requested review of this revision. Jan 10 2024, 11:12 AM

I think this patch is meant as a fix for 271979#c28. If I am right, I will use the reproduction method to test this patch. Thanks.

Also referencing 274382#c3 for the "panic: lkpi_sta_auth_to_scan", although I initially misplaced that report in this PR.

In D43389#989053, @cc wrote:

I think this patch is meant as a fix for 271979#c28. If I am right, I will use the reproduction method to test this patch. Thanks.

Also referencing 274382#c3 for the "panic: lkpi_sta_auth_to_scan", although I initially misplaced that report in this PR.

As said, this is the FIRST in a series. A lot more changes in LinuxKPI are needed to fix any of that, so please hold off on testing for now.

My initial test of the three patches D43389, D43725, and D43753 looks good: no more panics. I still need to figure out some issues in my testbed, so I am approving now, as I don't want my testing to delay the schedule.

This revision is now accepted and ready to land. Feb 13 2024, 2:54 PM
This revision was automatically updated to reflect the committed changes.