Possible fix for PR221990
Closed, Public

Authored by shurd on Oct 26 2017, 5:59 PM.

Details

Summary

On the assumption that ifl_pidx and ifl_credits are going out of sync in
_iflib_fl_refill(), use the same update logic for both to fix the error.
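
To illustrate the idea only: the sketch below is not the iflib code. It is a toy ring in which a producer index and a credit count are advanced by one helper, so the two cannot drift apart. The names toy_fl, toy_fl_advance, toy_fl_in_sync and TOY_FL_SIZE are hypothetical; only the field names pidx and credits echo the summary.

```c
#include <assert.h>

#define TOY_FL_SIZE 8 /* hypothetical ring size for this sketch */

/* Toy stand-in for a free list; not the real iflib structure. */
struct toy_fl {
	unsigned pidx;    /* next slot the producer will fill */
	unsigned credits; /* number of slots currently filled */
};

/*
 * Advance both counters by the same amount in a single place.
 * Updating them together is what keeps them in sync.
 */
static void
toy_fl_advance(struct toy_fl *fl, unsigned filled)
{
	assert(fl->credits + filled <= TOY_FL_SIZE);
	fl->pidx = (fl->pidx + filled) % TOY_FL_SIZE;
	fl->credits += filled;
}

/*
 * Invariant check: starting from consumer index cidx, walking
 * forward by credits slots must land on pidx (modulo ring size).
 */
static int
toy_fl_in_sync(const struct toy_fl *fl, unsigned cidx)
{
	return ((cidx + fl->credits) % TOY_FL_SIZE == fl->pidx);
}
```

If the two fields were updated by separate code paths, a path that bumps one but not the other would break the invariant that toy_fl_in_sync() checks.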

Test Plan

If this is verified as the root cause, have the reporter test the patch
in the failing case.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
shurd created this revision. Oct 26 2017, 5:59 PM
pho added a comment. Oct 28 2017, 2:13 PM

With this patch I see:

panic: Assertion sd_m[frag_idx] == NULL failed at ../../../net/iflib.c:1912
cpuid = 6
time = 1509192766
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe07c1559660
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe07c1559710
vpanic() at vpanic+0x19f/frame 0xfffffe07c1559790
kassert_panic() at kassert_panic+0x139/frame 0xfffffe07c1559800
_iflib_fl_refill() at _iflib_fl_refill+0x1fc/frame 0xfffffe07c15598f0
_task_fn_rx() at _task_fn_rx+0xb92/frame 0xfffffe07c15599f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x119/frame 0xfffffe07c1559a40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xab/frame 0xfffffe07c1559a70
fork_exit() at fork_exit+0x84/frame 0xfffffe07c1559ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe07c1559ab0

Details @ https://people.freebsd.org/~pho/stress/log/iflib.txt

shurd updated this revision to Diff 34463. Oct 30 2017, 8:53 PM

If an error occurs in _iflib_fl_refill(), still update pidx/count to
account for the slots refilled before the error. This should prevent
sd_m[idx] != NULL.
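
A minimal sketch of that error-path behavior, under stated assumptions: this is a toy model, not the committed diff. The names toy_ring, toy_refill, toy_alloc, toy_budget and TOY_RING_SIZE are hypothetical, and slot[] merely plays the role that sd_m[] plays in the assertion. The point is that when allocation fails partway through, the index and count are still committed for the slots already filled, so a later refill starts at an empty slot.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8 /* hypothetical ring size for this sketch */

struct toy_ring {
	void *slot[TOY_RING_SIZE]; /* NULL means the slot is empty */
	unsigned pidx;             /* next slot to fill */
	unsigned credits;          /* number of filled slots */
};

/* Hypothetical allocator callback; returns NULL on failure. */
typedef void *(*toy_alloc_fn)(void);

/*
 * Try to fill count slots starting at pidx and return how many were
 * filled. The caller must not ask for more than the free slots.
 */
static unsigned
toy_refill(struct toy_ring *r, unsigned count, toy_alloc_fn alloc)
{
	unsigned idx = r->pidx;
	unsigned filled = 0;

	while (filled < count) {
		void *buf;

		/* Same shape as the assertion in the panic above. */
		assert(r->slot[idx] == NULL);
		buf = alloc();
		if (buf == NULL)
			break; /* error: stop, but still commit below */
		r->slot[idx] = buf;
		idx = (idx + 1) % TOY_RING_SIZE;
		filled++;
	}
	/* Commit progress on both the success and the error path. */
	r->pidx = idx;
	r->credits += filled;
	return (filled);
}

/* Test allocator that succeeds toy_budget times, then fails. */
static int toy_budget;
static char toy_pool[TOY_RING_SIZE];

static void *
toy_alloc(void)
{
	if (toy_budget <= 0)
		return (NULL);
	toy_budget--;
	return (&toy_pool[toy_budget]);
}
```

Had the error path returned without updating pidx, the next call would start again at a slot that was already filled and trip the assertion at the top of the loop.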

pho added a comment. Oct 31 2017, 5:16 PM

I tested this version for 12 hours without seeing any problems.

sbruno accepted this revision. Oct 31 2017, 5:26 PM
This revision is now accepted and ready to land. Oct 31 2017, 5:26 PM
This revision was automatically updated to reflect the committed changes.