net80211: add a timer to flush fast-frames queues
ClosedPublic

Authored by s3erios_gmail.com on Mon, Mar 13, 12:17 AM.

Details

Summary

This should allow to drop 'ieee80211_ff_[age/flush]' calls from drivers
(an additional call can be made from ieee80211_tx_complete() for non-default ieee80211_ffagemax values to prevent stalls - but it will require an additional counter for transmitted frames).

Test Plan

Tested with RTL8821AU, STA mode (A-MSDU part only).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

IMHO, M_AGE_*() is useless for both powersave & stage queues - the first frame always has AGE set to some default value + it is always 0 for subsequent frames; how is it supposed to work?

The whole M_AGE thing is a bit "special". For fast frames I bet yeah, it's just the right age for the first frame. For the say power save queue, it's supposed to be the age of the frame as a delta from the previous one. If it isn't doing that then it's a bug.

The whole fast frames staging queue should /really/ be a per-node, per-TID staging queue with a queue discipline/dispatch in front of it to hand to the driver packets to transmit. The fast-frames / a-msdu staging queue thing and powersave queues are .. well, not quite that. They're more a global queue before per-VAP, per-node and encap options.

Anyway - one of the big reasons for doing ieee80211_tx_complete() was to get enough information to the wifi stack to continue transmit scheduling. Before it the net80211 transmit path wouldn't know which node/TIDs are making forward progress.

Are all the wifi drivers using ieee80211_tx_complete() now? I'd like to stop the drivers having to do the explicit aging like they're doing right now. :-)

Hm, do you think it'd be worth to do something like "ieee80211_tx_end() / ieee80211_rx_end()" as a kind of driver-driven "I've finished TX/RX loop, please do any deferred/interesting work now" in drivers?

I would rather there be something that was generic versus the special cased SUPERG stuff. (And that way we don't need to ifdef it for superg.)

Hm, do you think it'd be worth to do something like "ieee80211_tx_end() / ieee80211_rx_end()" as a kind of driver-driven "I've finished TX/RX loop, please do any deferred/interesting work now" in drivers?

I would rather there be something that was generic versus the special cased SUPERG stuff. (And that way we don't need to ifdef it for superg.)

For Tx loop there is already ieee80211_tx_complete() call (and ieee80211_tx_end() will just duplicate it); ieee80211_rx_end() ... does it really needed? (right now it seems to be used just because of no timer for aging / flushing FF queues)
+ counters will be needed anyway (but they will reside in drivers instead).

The whole M_AGE thing is a bit "special". For fast frames I bet yeah, it's just the right age for the first frame. For the say power save queue, it's supposed to be the age of the frame as a delta from the previous one. If it isn't doing that then it's a bug.

The current state:

  1. for (some) typical configurations with beacon interval <= 120 ms and listen interval <= 2 (more precisely, bintval * lintval < 244) the age will be always 0;
  2. for other cases:
    • the first frame will receive some calculated AGE0;
    • until inact loop (or powersave event) will not touch the queue, other will receive value of AGE0 - AGE0 = 0;
    • when inact loop occurs, it will compare the age with IEEE80211_INACT_WAIT (15 seconds); only this event can decrease age (but typically they are all flushed; no matter if it was sent 14 or 0.05 seconds ago)
adrian accepted this revision.Mon, Mar 13, 4:06 PM

Yeah I think it's fine for now.

I think that we eventually should aim to replace this with a per-node pre-encap queue for both power saving and general queue management (for say, full offload devices, maybe some power save situations, TX AMSDU) and then post-encap per-node/per-TID for well, everything else. That'll mean powersave, uAPSD, TX AMPDU, rate control, etc can then be pushed from ath(4)'s TX path into net80211 where it really should be and all software offload NICs can finally do A-MPDU / A-MSDU correctly. ieee80211_tx_complete() was the first hook required for that. :)

This revision has a positive review.Mon, Mar 13, 4:06 PM
This revision was automatically updated to reflect the committed changes.

Hm, is it really worth trying to cancel the timer each time we receive a frame? I wonder if we're better off just letting it always run to completion as flushing the timer each time we drain the stageq could be spendy.

(I'm worried about the overhead of always calling into the callout for each TX frame, especially when doing 11ac data rates..)

Yeah, this creates unneeded overhead for permanent data flow (and not so noticeable at the end, since it called only once); I will remove it.
BTW, there is another problem: with default ieee80211_ffagemax there is almost no chances to get an aggregate; typically, the queue is flushed earlier.

Yeah, this creates unneeded overhead for permanent data flow (and not so noticeable at the end, since it called only once); I will remove it.
BTW, there is another problem: with default ieee80211_ffagemax there is almost no chances to get an aggregate; typically, the queue is flushed earlier.

I was seeing two-frame AMSDUs during UDP TX tests. I'll update my test devices and re-test all of this and get back to you.

Lemme know when you've fixed up the code to be more CPU efficient and I'll test it all.

Yeah, this creates unneeded overhead for permanent data flow (and not so noticeable at the end, since it called only once); I will remove it.
BTW, there is another problem: with default ieee80211_ffagemax there is almost no chances to get an aggregate; typically, the queue is flushed earlier.

I was seeing two-frame AMSDUs during UDP TX tests. I'll update my test devices and re-test all of this and get back to you.

This was a bit simpler test ('ping -f -s 1400' with default 200 frames/sec limitation)

Lemme know when you've fixed up the code to be more CPU efficient and I'll test it all.

Callout cancellation was removed in rS315594.