Page MenuHomeFreeBSD

Fix for double free when deleting entries from epoch managed lists
Needs ReviewPublic

Authored by hselasky on Oct 17 2018, 5:33 PM.

Details

Reviewers
mmacy
slavash
kib
bz
mjg
Group Reviewers
network
Summary

In order to avoid races, epoch_wait() must be called between removing an interface address entry and before calling ifa_free(). Else concurrent threads may still reference the old interface address, IFA, still visible and this can lead up to a double free situation. To detect this situation let ifa_ref() use the function refcount_acquire_if_not_zero() to detect when the IFA is still visible, but can no longer be referenced, because it is scheduled for free. This way both iterating and freeing IFA's in an epoch section becomes possible and this also simplifies the code in question.

Test Plan
#!/usr/local/bin/bash
# Script to reproduce hang

for i in {1..255}
do
(ifconfig mce0.$i create; 
ifconfig mce0.$i inet 224.0.0.$i;
ifconfig mce0.$i destroy) &
done

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 20446

Event Timeline

hselasky created this revision.Oct 17 2018, 5:33 PM
kib added a comment.Oct 18 2018, 11:06 AM

Again, please upoad the context.

hselasky updated this revision to Diff 49307.Oct 19 2018, 1:56 PM
hselasky retitled this revision from Calling epoch_wait() under VLAN_XLOCK() leads to deadlock to Fix for deadlock and race deleting entries for epoch managed lists.
hselasky edited the summary of this revision. (Show Details)
hselasky edited the test plan for this revision. (Show Details)
hselasky added a reviewer: bz.

Updated patch with full context.

hselasky updated this revision to Diff 49311.Oct 19 2018, 5:07 PM

Make sure the return value of ifa_ref() is checked.

hselasky updated this revision to Diff 49313.Oct 19 2018, 5:46 PM
hselasky edited the summary of this revision. (Show Details)
hselasky added a reviewer: mjg.
hselasky added a subscriber: mjg.

Use already existing refcount_acquire_if_not_zero(). Suggested by @mjg .

hselasky edited the summary of this revision. (Show Details)Oct 22 2018, 10:21 AM
hselasky updated this revision to Diff 49439.Oct 22 2018, 11:52 AM
hselasky retitled this revision from Fix for deadlock and race deleting entries for epoch managed lists to Fix for double free when deleting entries from epoch managed lists.
hselasky edited the summary of this revision. (Show Details)
mmacy added a comment.Oct 25 2018, 7:04 PM

@hselasky have you had pho@ test this change?

hselasky added a subscriber: pho.Oct 26 2018, 8:38 AM

No, @pho has not tested this change.

pho added a comment.Oct 26 2018, 10:47 AM

No, @pho has not tested this change.

Nice test scenario!

So the original problem was a hang?

So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt

Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
are unrelated?

I'll proceed with testing your patch.

Hi @pho,

The initial problem was a deadlock which was fixed by https://svnweb.freebsd.org/changeset/base/339588 and after that I experienced some panics and started looking at the code.

--HPS

So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt

Might fix this one.

Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt

There are probably missing checks for IFF_DYING before accessing the ifp. I started making a patch, but realized the problem was deeper. See: https://reviews.freebsd.org/D17617

--HPS

hselasky updated this revision to Diff 49645.Oct 26 2018, 11:06 AM

Removed unrelated patches from if_vlan.c .

pho added a comment.Oct 26 2018, 11:21 AM

Removed unrelated patches from if_vlan.c .

../../../netinet6/in6.c:1460:2: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
        ifa_ref(&ia->ia_ifa);

@pho: Must have been some recent patches. I'll update my patch shortly.

hselasky updated this revision to Diff 49650.Oct 26 2018, 11:52 AM

Update patches for latest FreeBSD 13-current.

pho added a comment.Oct 26 2018, 4:47 PM

Update patches for latest FreeBSD 13-current.

The fault is still there: https://people.freebsd.org/~pho/stress/log/epoch2.txt

@pho : Is the other fault you referred still there?

pho added a comment.Oct 29 2018, 11:25 AM

@pho : Is the other fault you referred still there?

The page fault seen in epoch-2.txt is the only problem I see now. I also this on a pristine HEAD.
The page fault is very easy to get, so it may shadow for any other problems.

mjg added a comment.Sep 2 2019, 2:33 PM

is this still an issue?

I don't see any panics due to this, so it is not a big issue.

This patch likely needs to be rebased before we can continue.

pho added a comment.Sep 2 2019, 3:21 PM

It is still an issue with HEAD:

:
:
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800


Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 0a
instruction pointer     = 0x20:0xffffffff80cf1ce5
stack pointer           = 0x28:0xfffffe00aa7e76b0
frame pointer           = 0x28:0xfffffe00aa7e7700
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5492 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 10
time = 1567437407
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aa7e73c0
vpanic() at vpanic+0x19d/frame 0xfffffe00aa7e7410
panic() at panic+0x43/frame 0xfffffe00aa7e7470
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00aa7e74d0
trap() at trap+0x6a/frame 0xfffffe00aa7e75e0
calltrap() at calltrap+0x8/frame 0xfffffe00aa7e75e0
--- trap 0x9, rip = 0xffffffff80cf1ce5, rsp = 0xfffffe00aa7e76b0, rbp = 0xfffffe00aa7e7700 ---
vlan_ioctl() at vlan_ioctl+0x105/frame 0xfffffe00aa7e7700
ifhwioctl() at ifhwioctl+0x2ad/frame 0xfffffe00aa7e7780
ifioctl() at ifioctl+0x52d/frame 0xfffffe00aa7e7850
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe00aa7e78b0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe00aa7e7980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00aa7e7ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aa7e7ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047f29a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe4b0 ---
KDB: enter: panic
[ thread pid 5492 tid 100148 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r351673M: Mon Sep  2 01:26:53 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>