To avoid races, epoch_wait() must be called after removing an interface address (IFA) entry and before calling ifa_free(). Otherwise, concurrent threads may still reference the old IFA while it remains visible, which can lead to a double free. To detect this situation, let ifa_ref() use refcount_acquire_if_not_zero(), which fails when the IFA is still visible but can no longer be referenced because it is scheduled to be freed. This makes it possible to both iterate over and free IFAs within an epoch section, and it also simplifies the code in question.
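The acquire-if-not-zero idea in the summary can be illustrated with a minimal userland sketch. It is not the kernel's refcount(9) implementation or the code in this patch: it uses C11 atomics, and the `demo_ifa`, `demo_ref_if_not_zero()` and `demo_release()` names are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ifaddr; only the refcount is modeled. */
struct demo_ifa {
	atomic_uint refcount;
};

/*
 * Take a reference only while the count is non-zero.
 * A count of zero means the object is already scheduled to be freed,
 * so a reader that still sees it in a list must not revive it.
 */
static bool
demo_ref_if_not_zero(struct demo_ifa *ifa)
{
	unsigned int old = atomic_load(&ifa->refcount);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&ifa->refcount, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool
demo_release(struct demo_ifa *ifa)
{
	unsigned int old = atomic_fetch_sub(&ifa->refcount, 1);

	assert(old > 0);
	return (old == 1);
}
```

In this sketch, a reader that finds the object during iteration calls `demo_ref_if_not_zero()` and skips the entry when it returns false, so the count can never go from zero back to one and the final release happens exactly once.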
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
Event Timeline
Nice test scenario!
So the original problem was a hang?
So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt
Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
are unrelated?
I'll proceed with testing your patch.
Hi @pho,
The initial problem was a deadlock, which was fixed by https://svnweb.freebsd.org/changeset/base/339588. After that I experienced some panics and started looking at the code.
--HPS
So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt
Might fix this one.
Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
There are probably missing checks for IFF_DYING before accessing the ifp. I started making a patch, but realized the problem was deeper. See: https://reviews.freebsd.org/D17617
--HPS
../../../netinet6/in6.c:1460:2: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
        ifa_ref(&ia->ia_ifa);
The page fault seen in epoch-2.txt is the only problem I see now. I also see this on a pristine HEAD.
The page fault is very easy to get, so it may mask other problems.
It is still an issue with HEAD:
:
:
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 0a
instruction pointer     = 0x20:0xffffffff80cf1ce5
stack pointer           = 0x28:0xfffffe00aa7e76b0
frame pointer           = 0x28:0xfffffe00aa7e7700
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5492 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 10
time = 1567437407
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aa7e73c0
vpanic() at vpanic+0x19d/frame 0xfffffe00aa7e7410
panic() at panic+0x43/frame 0xfffffe00aa7e7470
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00aa7e74d0
trap() at trap+0x6a/frame 0xfffffe00aa7e75e0
calltrap() at calltrap+0x8/frame 0xfffffe00aa7e75e0
--- trap 0x9, rip = 0xffffffff80cf1ce5, rsp = 0xfffffe00aa7e76b0, rbp = 0xfffffe00aa7e7700 ---
vlan_ioctl() at vlan_ioctl+0x105/frame 0xfffffe00aa7e7700
ifhwioctl() at ifhwioctl+0x2ad/frame 0xfffffe00aa7e7780
ifioctl() at ifioctl+0x52d/frame 0xfffffe00aa7e7850
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe00aa7e78b0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe00aa7e7980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00aa7e7ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aa7e7ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047f29a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe4b0 ---
KDB: enter: panic
[ thread pid 5492 tid 100148 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r351673M: Mon Sep  2 01:26:53 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>
I agree that if_ref() and ifa_ref() should do refcount_acquire_if_not_zero(). I haven't reviewed the whole change, though. Is the reproducer still producing the panic with fresh -current?
I have not seen any panics with the reproducer on:
FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n251492-d2de68811a80b-dirty: Thu Dec 9 17:05:30 CET 2021 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO amd64