In order to avoid races, epoch_wait() must be called between removing an interface address entry and before calling ifa_free(). Else concurrent threads may still reference the old interface address, IFA, still visible and this can lead up to a double free situation. To detect this situation let ifa_ref() use the function refcount_acquire_if_not_zero() to detect when the IFA is still visible, but can no longer be referenced, because it is scheduled for free. This way both iterating and freeing IFA's in an epoch section becomes possible and this also simplifies the code in question.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Skipped - Unit
Tests Skipped - Build Status
Buildable 20287
Event Timeline
Nice test scenario!
So the original problem was a hang?
So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt
Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
are unrelated?
I'll proceed with testing your patch.
Hi @pho,
The initial problem was a deadlock which was fixed by https://svnweb.freebsd.org/changeset/base/339588 and after that I experienced some panics and started looking at the code.
--HPS
So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt
Might fix this one.
Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
There are probably missing checks for IFF_DYING before accessing the ifp. I started making a patch, but realized the problem was deeper. See: https://reviews.freebsd.org/D17617
--HPS
../../../netinet6/in6.c:1460:2: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result] ifa_ref(&ia->ia_ifa);
The page fault seen in epoch-2.txt is the only problem I see now. I also this on a pristine HEAD.
The page fault is very easy to get, so it may shadow for any other problems.
It is still an issue with HEAD:
: : if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000 if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800 Fatal trap 9: general protection fault while in kernel mode cpuid = 10; apic id = 0a instruction pointer = 0x20:0xffffffff80cf1ce5 stack pointer = 0x28:0xfffffe00aa7e76b0 frame pointer = 0x28:0xfffffe00aa7e7700 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 5492 (ifconfig) trap number = 9 panic: general protection fault cpuid = 10 time = 1567437407 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aa7e73c0 vpanic() at vpanic+0x19d/frame 0xfffffe00aa7e7410 panic() at panic+0x43/frame 0xfffffe00aa7e7470 trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00aa7e74d0 trap() at trap+0x6a/frame 0xfffffe00aa7e75e0 calltrap() at calltrap+0x8/frame 0xfffffe00aa7e75e0 --- trap 0x9, rip = 0xffffffff80cf1ce5, rsp = 0xfffffe00aa7e76b0, rbp = 0xfffffe00aa7e7700 --- vlan_ioctl() at vlan_ioctl+0x105/frame 0xfffffe00aa7e7700 ifhwioctl() at ifhwioctl+0x2ad/frame 0xfffffe00aa7e7780 ifioctl() at ifioctl+0x52d/frame 0xfffffe00aa7e7850 kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe00aa7e78b0 sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe00aa7e7980 amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00aa7e7ab0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aa7e7ab0 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047f29a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe4b0 --- KDB: enter: panic [ thread pid 5492 tid 100148 ] Stopped at kdb_enter+0x3b: movq $0,kdb_why db> x/s version version: FreeBSD 13.0-CURRENT #0 r351673M: Mon Sep 2 01:26:53 CEST 2019\012 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012 db>
I agree that if_ref() and ifa_ref() should do refcount_acquire_if_not_zero(). Haven't reviewed the whole change, though. Is the reproducer still producing the panic with fresh -current?
I have not seen any panics with the reproducer on:
FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n251492-d2de68811a80b-dirty: Thu Dec 9 17:05:30 CET 2021 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO amd64