The igb driver at least appears to be able to get into a state
where there's no carrier, and dmesg is spammed with something like:
"igb0 TX(2) desc avail = 1024, pidx = 0", and it remains in a broken
state even if link should be re-established. It's possible that this
is being caused by the watchdog running when link is down, and
IFLIB_ADMIN_ALWAYS_RUN is set.
Details
Before committing, figure out a way to reliably reproduce this
and ensure this fixes the issue.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 19116 Build 18741: arc lint + arc unit
Event Timeline
The in-tree igb never sets IFLIB_ADMIN_ALWAYS_RUN. Is this for a modified version of igb?
Not as far as I know... I see I confused ixl with igb.
It looks like I may need to have iflib_txq_drain() consume and reclaim descriptors when !LINK_ACTIVE() then... I was really hoping it was the watchdog.
Regarding this patch then, should the watchdog still be running when there's no link on ixl?
Possibly the easiest way to ensure we don't enter the STALLED state while link is down is to have iflib_link_state_change() set txq->ift_br->drain = iflib_txq_drain_free when link goes down, and then txq->ift_br->drain = iflib_txq_can_drain when it comes back up... which would mean stopping and resetting the callout timer on link changes.
Thoughts?
Looking at iflib_txq_drain_free() more, we likely want to just extract the free loop and put it in a LINK_ACTIVE() block in iflib_txq_drain.