Details

Reviewers

mmacy
slavash
kib
bz
mjg

Group Reviewers

network

Summary

In order to avoid races, epoch_wait() must be called between removing an interface address entry and before calling ifa_free(). Else concurrent threads may still reference the old interface address, IFA, still visible and this can lead up to a double free situation. To detect this situation let ifa_ref() use the function refcount_acquire_if_not_zero() to detect when the IFA is still visible, but can no longer be referenced, because it is scheduled for free. This way both iterating and freeing IFA's in an epoch section becomes possible and this also simplifies the code in question.

Test Plan

#!/usr/local/bin/bash
# Script to reproduce hang

for i in {1..255}
do
(ifconfig mce0.$i create; 
ifconfig mce0.$i inet 224.0.0.$i;
ifconfig mce0.$i destroy) &
done

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 20287

Event Timeline

• hselasky created this revision.Oct 17 2018, 5:33 PM

Herald added subscribers: ae, imp. · View Herald TranscriptOct 17 2018, 5:33 PM

Harbormaster completed remote builds in B20265: Diff 49260.Oct 17 2018, 5:33 PM

Again, please upoad the context.

Updated patch with full context.

Harbormaster completed remote builds in B20287: Diff 49307.Oct 19 2018, 1:56 PM

Make sure the return value of ifa_ref() is checked.

Harbormaster completed remote builds in B20290: Diff 49311.Oct 19 2018, 5:07 PM

Use already existing refcount_acquire_if_not_zero(). Suggested by @mjg .

Harbormaster completed remote builds in B20292: Diff 49313.Oct 19 2018, 5:46 PM

• hselasky added a child revision: D17617: Fix for use-after-free in if_vlan.c.Oct 19 2018, 5:56 PM

• hselasky edited the summary of this revision. (Show Details)Oct 22 2018, 10:21 AM

• hselasky updated this revision to Diff 49439.Oct 22 2018, 11:52 AM

• hselasky retitled this revision from Fix for deadlock and race deleting entries for epoch managed lists to Fix for double free when deleting entries from epoch managed lists.

• hselasky edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B20333: Diff 49439.Oct 22 2018, 11:52 AM

@hselasky have you had pho@ test this change?

No, @pho has not tested this change.

In D17599#378436, @hselasky wrote:

No, @pho has not tested this change.

Nice test scenario!

So the original problem was a hang?

So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt

Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt
are unrelated?

I'll proceed with testing your patch.

Hi @pho,

The initial problem was a deadlock which was fixed by https://svnweb.freebsd.org/changeset/base/339588 and after that I experienced some panics and started looking at the code.

--HPS

So these:
panic: starting DAD on non-tentative address 0xfffff8010c311000
https://people.freebsd.org/~pho/stress/log/epoch.txt

Might fix this one.

Fatal trap 9: general protection fault while in kernel mode
https://people.freebsd.org/~pho/stress/log/epoch-2.txt

There are probably missing checks for IFF_DYING before accessing the ifp. I started making a patch, but realized the problem was deeper. See: https://reviews.freebsd.org/D17617

--HPS

Removed unrelated patches from if_vlan.c .

Harbormaster completed remote builds in B20441: Diff 49645.Oct 26 2018, 11:06 AM

In D17599#378458, @hselasky wrote:

Removed unrelated patches from if_vlan.c .

../../../netinet6/in6.c:1460:2: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
        ifa_ref(&ia->ia_ifa);

@pho: Must have been some recent patches. I'll update my patch shortly.

Update patches for latest FreeBSD 13-current.

Harbormaster completed remote builds in B20446: Diff 49650.Oct 26 2018, 11:52 AM

In D17599#378493, @hselasky wrote:

Update patches for latest FreeBSD 13-current.

The fault is still there: https://people.freebsd.org/~pho/stress/log/epoch2.txt

@pho : Is the other fault you referred still there?

In D17599#379215, @hselasky wrote:

@pho : Is the other fault you referred still there?

The page fault seen in epoch-2.txt is the only problem I see now. I also this on a pristine HEAD.
The page fault is very easy to get, so it may shadow for any other problems.

is this still an issue?

I don't see any panics due to this, so it is not a big issue.

This patch likely needs to be rebased before we can continue.

It is still an issue with HEAD:

:
:
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803d7598000
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800
if_delmulti_locked: detaching ifnet instance 0xfffff803bc652800


Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 0a
instruction pointer     = 0x20:0xffffffff80cf1ce5
stack pointer           = 0x28:0xfffffe00aa7e76b0
frame pointer           = 0x28:0xfffffe00aa7e7700
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5492 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 10
time = 1567437407
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aa7e73c0
vpanic() at vpanic+0x19d/frame 0xfffffe00aa7e7410
panic() at panic+0x43/frame 0xfffffe00aa7e7470
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00aa7e74d0
trap() at trap+0x6a/frame 0xfffffe00aa7e75e0
calltrap() at calltrap+0x8/frame 0xfffffe00aa7e75e0
--- trap 0x9, rip = 0xffffffff80cf1ce5, rsp = 0xfffffe00aa7e76b0, rbp = 0xfffffe00aa7e7700 ---
vlan_ioctl() at vlan_ioctl+0x105/frame 0xfffffe00aa7e7700
ifhwioctl() at ifhwioctl+0x2ad/frame 0xfffffe00aa7e7780
ifioctl() at ifioctl+0x52d/frame 0xfffffe00aa7e7850
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe00aa7e78b0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe00aa7e7980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00aa7e7ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aa7e7ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047f29a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe4b0 ---
KDB: enter: panic
[ thread pid 5492 tid 100148 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r351673M: Mon Sep  2 01:26:53 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

what's up with this?

Herald added subscribers: glebius, melifaro. · View Herald TranscriptDec 9 2021, 10:42 PM

I agree that if_ref() and ifa_ref() should do refcount_acquire_if_not_zero(). Haven't reviewed the whole change, though. Is the reproducer still producing the panic with fresh -current?

In D17599#754462, @glebius wrote:

I agree that if_ref() and ifa_ref() should do refcount_acquire_if_not_zero(). Haven't reviewed the whole change, though. Is the reproducer still producing the panic with fresh -current?

I have not seen any panics with the reproducer on:

FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n251492-d2de68811a80b-dirty: Thu Dec  9 17:05:30 CET 2021     pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO  amd64

Fix for double free when deleting entries from epoch managed lists
Needs ReviewPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 49307

sys/net/if.c

sys/net/if_var.h

sys/netinet/if_ether.c

sys/netinet/in.c

sys/netinet/ip_carp.c

sys/netinet/raw_ip.c

sys/netinet6/in6.c

sys/netinet6/nd6.c

sys/netinet6/nd6_nbr.c

sys/netinet6/nd6_rtr.c

Fix for double free when deleting entries from epoch managed listsNeeds ReviewPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 49307

sys/net/if.c

sys/net/if_var.h

sys/netinet/if_ether.c

sys/netinet/in.c

sys/netinet/ip_carp.c

sys/netinet/raw_ip.c

sys/netinet6/in6.c

sys/netinet6/nd6.c

sys/netinet6/nd6_nbr.c

sys/netinet6/nd6_rtr.c

Fix for double free when deleting entries from epoch managed lists
Needs ReviewPublic
Actions

Revision Contents
Changeset List