Page MenuHomeFreeBSD

Fix numerous refcount bugs in multicast ...
AbandonedPublic

Authored by mmacy on Apr 11 2019, 11:43 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 15, 3:50 PM
Unknown Object (File)
Fri, Nov 1, 7:18 AM
Unknown Object (File)
Thu, Oct 24, 7:32 PM
Unknown Object (File)
Thu, Oct 24, 7:31 PM
Unknown Object (File)
Thu, Oct 24, 7:30 PM
Unknown Object (File)
Thu, Oct 24, 7:30 PM
Unknown Object (File)
Thu, Oct 24, 4:39 PM
Unknown Object (File)
Oct 6 2024, 6:28 PM

Details

Summary

Fix numerous refcount bugs in multicast leading to use after free of ifnet, if_multiaddr, and in_mcast, as well as null derefs

There are 4 different things that can be freed independently of each other
ifnet (interface structure), if_multiaddr (interface multicast address), in_mcast (multicast address), pcb (protocol control block for connections). If one goes away we have to make sure to not reference it from anywhere else. Conversely anything that points to one of these has to hold a reference. Multicast isn't really well thought out for multithreading.

a) in_getmulti implicitly relies on the caller to add a reference when a new multicast address is created and *not* do so when it's an existing one. Breaking this contract has led to pf and carp causing use after free and could be causing in_multi leaks.

b) If a link layer address is freed before the network layer address, there can be a null deref. Clear the pointer in the network address.

c) the if_multiaddr holds a reference to an ifnet, to insure the ifnet doesn't go away before the multiaddr does, acquire a reference on create and release the reference on free and then clear the pointer to the ifnet

d) if the ifnet has gone away, the ll_ifma pointer is no longer valid

e) release the ifma reference to the ifnet on first delete (possibly incorrect)

f) clear the pointer to the in_mcast in the if_multiaddr when we release the reference in igmp_ifdetach to avoid stale references

g) similarly clear the in_mcast pointer in the ifma in purgemaddrs

h) in inp_join_group don't bump imo_num_memberships until we have actually assigned the mcast address to the array entry, thus avoid a null pointer deref with an invalid value

i) simplify freeing in the error case in carp_multicast_setup

j) ip_moptions handling isn't so performance critical that we have to use an unsigned short for num memberships, thus leading silently to underflow

k) in6_getmulti has to bump the ifma reference once inm points to it

l) removing members in v6 needs to be separate from disconnecting the in6_mcast

m) in6p_join_group would try to unlock the pcb without it held in the error path

n) any foreach loop that can lead to an entry being deleted from the list its iterating over needs to use the _SAFE macro (mld_ifdetach)

pkg install mdnsresponder
sysrc mdnsd_enable=YES
service mdnsd start
ifconfig vtnet0 delete # or whatever iface egress/dhclient is running on
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
# panic
inm: 0xfffff8005a31d700 refcount: 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091f3ee50
in_purgemaddrs() at in_purgemaddrs+0xce/frame 0xfffffe0091f3ee90
in_ifdetach() at in_ifdetach+0x83/frame 0xfffffe0091f3eeb0
if_detach_internal() at if_detach_internal+0x7ee/frame 0xfffffe0091f3ef30
if_detach() at if_detach+0x3d/frame 0xfffffe0091f3ef50
epair_clone_destroy() at epair_clone_destroy+0x117/frame 0xfffffe0091f3efa0
if_clone_destroyif() at if_clone_destroyif+0x175/frame 0xfffffe0091f3eff0
if_clone_destroy() at if_clone_destroy+0x205/frame 0xfffffe0091f3f040
ifioctl() at ifioctl+0x4ee/frame 0xfffffe0091f3f110
kern_ioctl() at kern_ioctl+0x28a/frame 0xfffffe0091f3f180
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe0091f3f250
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0091f3f370
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0091f3f370
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047008a, rsp = 0x7fffffffe2f8, rbp = 0x7fffffffe310 ---
freeing 0xfffff8005a31d700 
root@beastie3:~ # 

Fatal trap 9: general protection fault while in kernel mode
cpuid = 13; apic id = 0d
instruction pointer     = 0x20:0xffffffff80d648e7
stack pointer           = 0x28:0xfffffe0000538170
frame pointer           = 0x28:0xfffffe00005381b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (softirq_13)
trap number             = 9
panic: general protection fault
cpuid = 13
time = 1556067304
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000537e80
vpanic() at vpanic+0x19d/frame 0xfffffe0000537ed0
panic() at panic+0x43/frame 0xfffffe0000537f30
trap_fatal() at trap_fatal+0x394/frame 0xfffffe0000537f90
trap() at trap+0x6c/frame 0xfffffe00005380a0
calltrap() at calltrap+0x8/frame 0xfffffe00005380a0
--- trap 0x9, rip = 0xffffffff80d648e7, rsp = 0xfffffe0000538170, rbp = 0xfffffe00005381b0 ---
in_leavegroup_locked() at in_leavegroup_locked+0xa7/frame 0xfffffe00005381b0
inp_freemoptions() at inp_freemoptions+0x150/frame 0xfffffe0000538200
in_pcbfree_deferred() at in_pcbfree_deferred+0x16f/frame 0xfffffe0000538250
epoch_call_task() at epoch_call_task+0x1aa/frame 0xfffffe00005382b0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe0000538300
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe0000538330
fork_exit() at fork_exit+0x84/frame 0xfffffe0000538370
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000538370
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100039 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Will rebase to 20070 when it goes in.

Test Plan

Add sysctls for if_multiaddr, in_mcast, in6_mcast, and ifnet counts and have @pho run stress tests.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 23908

Event Timeline

  • fix case where failure path could lead to ref leak

What kind of tests have been done in order to check for leaks?

Can you also try to start and stop rpcbind daemon and watch the memory statistics for INM's.
Further try to add and delete multicast addresses from ifnet and watch the memory statistics for INM's.

sys/netinet/igmp.c
1087

After this change you need to initialize the inm pointer to NULL!

Is it possible to fix PF and CARP instead? Allocating an INM with an initial refcount of 2 seems a bit odd!

sys/netinet/in_mcast.c
567

Nothing is preventing a lookup with inm_refcount == 1, which then will assert ??

Hi,

Are you trying to mimic the solution I did for IPv6 MLD?

--HPS

sys/netinet/in_mcast.c
2288

Please add a comment describing that the refcount is already acquired.

sys/netinet/in_mcast.c
641

in6_getmulti() is only setting refcount = 1 .... Can you try to follow the approach taken there?

if ((error = in6_joingroup(ifp, &in6, NULL, &in6m, 0)) != 0) {
        in6_leavegroup(im6o->im6o_membership[0], NULL);
        free(im6o->im6o_membership, M_CARP);
        break;
}
in6m_acquire(in6m);

The latter in6m_acquire(in6m) should also be removed from carp driver. I think this is a bug. Can you test?

if ((error = in6_joingroup(ifp, &in6, NULL, &in6m, 0)) != 0) {
        in6_leavegroup(im6o->im6o_membership[0], NULL);
        free(im6o->im6o_membership, M_CARP);
        break;
}
in6m_acquire(in6m);

Once this bug is fully fixed. It now takes longer to hit the use after free panic - but the other issue you cited might be a problem

The latter in6m_acquire(in6m) should also be removed from carp driver. I think this is a bug. Can you test?
sys/netinet/in_mcast.c
641

No. This is a bug. It relies on the caller to differentiate between when it already exists and when it's new.

sys/netinet/in_mcast.c
567

In order to be found in on a list it has to have a reference for that list in addition to the caller. The assert could only falsely trigger if the lock weren't held at the time and you hit a race.

Can you test ifconfig adding / removing and setting same ipv4 multicast address and ensure you don't leak by printing the refs involved?
Also can you do the same for starting and stopping rpcbind?

Fix at least a half dozen different issues exposed by the test case in the summary.

mmacy retitled this revision from Fix refcount issue in in_getmulti to Fix numerous refcount bugs in multicast ....Apr 25 2019, 11:50 PM
mmacy edited the summary of this revision. (Show Details)
mmacy edited the test plan for this revision. (Show Details)
mmacy added a subscriber: pho.
20190426 12:09:07 all (1/1): multicast2.sh
Apr 26 12:09:09 t2 mDNSResponder[14951]: mDNSResponder (Engineering Build) (Jan  5 2019 05:09:39) starting
Apr 26 12:09:09 t2 mDNSResponder[14951]: mDNS_AddDNSServer: Lock not held! mDNS_busy (0) mDNS_reentrancy (0)
panic: Assertion ifma->ifma_ifp == NULL || ifma->ifma_ifp == ifp failed at ../../../net/if.c:3777
cpuid = 8
time = 1556273349
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004ec870
vpanic() at vpanic+0x19d/frame 0xfffffe00004ec8c0
panic() at panic+0x43/frame 0xfffffe00004ec920
if_delmulti_locked() at if_delmulti_locked+0x1e7/frame 0xfffffe00004ec940
if_delmulti_ifma_flags() at if_delmulti_ifma_flags+0xbb/frame 0xfffffe00004ec990
inm_release_task() at inm_release_task+0x1ac/frame 0xfffffe00004ec9f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe00004eca40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00004eca70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004ecab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004ecab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Details @ https://people.freebsd.org/~pho/stress/log/mmacy035.txt

sys/netinet6/mld6.c
561

There is no need for CK_STAILQ_FOREACH_SAFE() here. Elements are not freed inside the for-loop and the next pointer of the STAILQ is valid even after rele_locked. Please explain.

Hi,

Can you add:

options DEBUG_MEMGUARD

To the kernel and then reproduce this test and you'll see the bug happens much earlier using this script:

ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 08
fault virtual address	= 0x10
fault code		= supervisor read data  , page not present
instruction pointer	= 0x20:0xffffffff80e2ec18
stack pointer	        = 0x28:0xfffffe084f181580
frame pointer	        = 0x28:0xfffffe084f1815c0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2786 (ifconfig)
trap number		= 12
panic: page fault
cpuid = 4
time = 1556280609
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe084f181240
vpanic() at vpanic+0x19d/frame 0xfffffe084f181290
panic() at panic+0x43/frame 0xfffffe084f1812f0
trap_fatal() at trap_fatal+0x394/frame 0xfffffe084f181350
trap_pfault() at trap_pfault+0x62/frame 0xfffffe084f1813a0
trap() at trap+0x2b4/frame 0xfffffe084f1814b0
calltrap() at calltrap+0x8/frame 0xfffffe084f1814b0
--- trap 0xc, rip = 0xffffffff80e2ec18, rsp = 0xfffffe084f181580, rbp = 0xfffffe084f1815c0 ---
in6_pcbpurgeif0() at in6_pcbpurgeif0+0xc8/frame 0xfffffe084f1815c0
_in6_ifdetach() at _in6_ifdetach+0x72/frame 0xfffffe084f1815f0
if_detach_internal() at if_detach_internal+0x7f6/frame 0xfffffe084f181670
if_detach() at if_detach+0x3d/frame 0xfffffe084f181690
epair_clone_destroy() at epair_clone_destroy+0x98/frame 0xfffffe084f1816e0
if_clone_destroyif() at if_clone_destroyif+0x175/frame 0xfffffe084f181730
if_clone_destroy() at if_clone_destroy+0x205/frame 0xfffffe084f181780
ifioctl() at ifioctl+0x4ee/frame 0xfffffe084f181850
kern_ioctl() at kern_ioctl+0x28a/frame 0xfffffe084f1818c0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe084f181990
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe084f181ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe084f181ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80048631a, rsp = 0x7fffffffe248, rbp = 0x7fffffffe260 ---
KDB: enter: panic

__curthread () at /usr/img/freebsd/sys/amd64/include/pcpu.h:230
230		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));

(kgdb) frame 16
#16 0xffffffff80e2ec18 in in6_pcbpurgeif0 (pcbinfo=<optimized out>, ifp=0xfffff80161522000)
    at /usr/img/freebsd/sys/netinet6/in6_pcb.c:829
829					if (im6o->im6o_membership[i]->in6m_ifp ==
(kgdb) list
824				 * Drop multicast group membership if we joined
825				 * through the interface being detached.
826				 */
827				gap = 0;
828				for (i = 0; i < im6o->im6o_num_memberships; i++) {
829					if (im6o->im6o_membership[i]->in6m_ifp ==
830					    ifp) {
831						in6_leavegroup(im6o->im6o_membership[i], NULL);
832						gap++;
833					} else if (gap != 0) {
(kgdb) print i
$1 = 0
(kgdb) print im6o->im6o_num_memberships
$2 = 1
(kgdb) print im6o->im6o_membership[i]->in6m_ifp
Cannot access memory at address 0x10

(kgdb) print im6o 
$1 = (struct ip6_moptions *) 0xfffff800097c8140
(kgdb) print *im6o
$2 = {im6o_multicast_ifp = 0x0, im6o_multicast_hlim = 1 '\001', im6o_multicast_loop = 1 '\001', im6o_num_memberships = 1, 
  im6o_max_memberships = 31, im6o_membership = 0xfffff800097b3300, im6o_mfilters = 0xfffff8016156e400, imo6_epoch_ctx = {
    data = {0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de}}}


Possible race with other thread:

(kgdb) bt
#0  0xffffffff8122aa7f in cpustop_handler () at /usr/img/freebsd/sys/x86/x86/mp_x86.c:1403
#1  0xffffffff8122aa40 in ipi_nmi_handler () at /usr/img/freebsd/sys/x86/x86/mp_x86.c:1364
#2  0xffffffff810b26f8 in trap (frame=0xfffffe0000284f30) at /usr/img/freebsd/sys/amd64/amd64/trap.c:206
#3  <signal handler called>
#4  lock_delay (la=0xfffffe084fc71d28) at /usr/img/freebsd/sys/kern/subr_lock.c:132
#5  0xffffffff80be9369 in _sx_xlock_hard (sx=0xffffffff820c53a0 <in6_multi_sx>, x=<optimized out>, opts=<optimized out>, 
    file=<optimized out>, line=<optimized out>) at /usr/img/freebsd/sys/kern/kern_sx.c:682
#6  0xffffffff80be9061 in _sx_xlock (sx=0xffffffff820c53a0 <in6_multi_sx>, opts=0, 
    file=0xffffffff8138072c "/usr/img/freebsd/sys/netinet6/in6_mcast.c", line=2180)
    at /usr/img/freebsd/sys/kern/kern_sx.c:325
#7  0xffffffff80e2afff in in6p_join_group (inp=0xfffff80282aba1e8, sopt=<optimized out>)
    at /usr/img/freebsd/sys/netinet6/in6_mcast.c:2180
#8  ip6_setmoptions (inp=0xfffff80282aba1e8, sopt=<optimized out>) at /usr/img/freebsd/sys/netinet6/in6_mcast.c:2790
#9  0xffffffff80e38111 in ip6_ctloutput (so=0xfffff80009c79368, sopt=0xfffffe084fc72928)
    at /usr/img/freebsd/sys/netinet6/ip6_output.c:1896
#10 0xffffffff80c7d39d in sosetopt (so=0xfffff80009c79368, sopt=0xfffffe084fc72928)
    at /usr/img/freebsd/sys/kern/uipc_socket.c:2760
#11 0xffffffff80c8293c in kern_setsockopt (td=0xfffff8001e1c65a0, s=<optimized out>, level=<optimized out>, 
    name=<optimized out>, val=<optimized out>, valseg=<optimized out>, valsize=20)
    at /usr/img/freebsd/sys/kern/uipc_syscalls.c:1260
#12 0xffffffff80c82884 in sys_setsockopt (td=0xfffffe084fc71d28, uap=<optimized out>)
    at /usr/img/freebsd/sys/kern/uipc_syscalls.c:1221
#13 0xffffffff810b3e16 in syscallenter (td=0xfffff8001e1c65a0)
    at /usr/img/freebsd/sys/amd64/amd64/../../kern/subr_syscall.c:135
#14 amd64_syscall (td=0xfffff8001e1c65a0, traced=0) at /usr/img/freebsd/sys/amd64/amd64/trap.c:1166
#15 0xffffffff8108ccdd in fast_syscall_common () at /usr/img/freebsd/sys/amd64/amd64/exception.S:504
#16 0x0000000000000008 in ?? ()
#17 0x0000000000000029 in ?? ()
#18 0x000000000000000c in ?? ()
#19 0x00007fffffffd990 in ?? ()
#20 0x0000000000000014 in ?? ()
#21 0x0000000000000000 in ?? ()
(kgdb) frame 7 
#7  0xffffffff80e2afff in in6p_join_group (inp=0xfffff80282aba1e8, sopt=<optimized out>)
    at /usr/img/freebsd/sys/netinet6/in6_mcast.c:2180
2180		IN6_MULTI_LOCK();
(kgdb) list
2175		/*
2176		 * Begin state merge transaction at MLD layer.
2177		 */
2178		in_pcbref(inp);
2179		INP_WUNLOCK(inp);
2180		IN6_MULTI_LOCK();
2181	
2182		if (is_new) {
2183			error = in6_joingroup_locked(ifp, &gsa->sin6.sin6_addr, imf,
2184			    &inm, 0);

Hi,

Can you add:

options DEBUG_MEMGUARD

To the kernel and then reproduce this test and you'll see the bug happens much earlier using this script:

ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy

That did not make any difference.

In D19886#431573, @pho wrote:
20190426 12:09:07 all (1/1): multicast2.sh
Apr 26 12:09:09 t2 mDNSResponder[14951]: mDNSResponder (Engineering Build) (Jan  5 2019 05:09:39) starting
Apr 26 12:09:09 t2 mDNSResponder[14951]: mDNS_AddDNSServer: Lock not held! mDNS_busy (0) mDNS_reentrancy (0)
panic: Assertion ifma->ifma_ifp == NULL || ifma->ifma_ifp == ifp failed at ../../../net/if.c:3777
cpuid = 8
time = 1556273349
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004ec870
vpanic() at vpanic+0x19d/frame 0xfffffe00004ec8c0
panic() at panic+0x43/frame 0xfffffe00004ec920
if_delmulti_locked() at if_delmulti_locked+0x1e7/frame 0xfffffe00004ec940
if_delmulti_ifma_flags() at if_delmulti_ifma_flags+0xbb/frame 0xfffffe00004ec990
inm_release_task() at inm_release_task+0x1ac/frame 0xfffffe00004ec9f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe00004eca40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00004eca70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004ecab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004ecab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Details @ https://people.freebsd.org/~pho/stress/log/mmacy035.txt

@pho do you have a simple reproducer?

I'm just using your test case:

service mdnsd onestart
ifconfig vtnet0 delete 2>/dev/null
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
service mdnsd onestop

You may need to add some VM pressure.

mmacy added a reviewer: markj.

I have been running the tests in tests/sys/net with Matt's patch. Many of the test cases in if_clone_test are currently skipped due to panics, and with this patch I am now able to complete a run with those tests enabled. Doing so with mdnsresponder running exposes a different issue in an lo(4) test case, which will be something else to look into. I have also added test cases for epair(4), which pass as well. I will be putting my changes to enable these tests and add the epair test cases up for review shortly.

Please also check output from vmstat -m and vmstat -z that you are not leaking memory:

vmstat -m | grep multi

before running tests:

ether_multi    17     2K       -       17  16,32,64,128
   in_multi     2     1K       -        2  256
  in6_multi    15     2K       -       15  32,256

after:

ether_multi 32839  2449K       -    40807  16,32,64,128
   in_multi     2     1K       -     3538  256
  in6_multi  9270  2316K       -    18525  32,256
In D19886#432900, @ryan_freqlabs.com wrote:

before running tests:

ether_multi    17     2K       -       17  16,32,64,128
   in_multi     2     1K       -        2  256
  in6_multi    15     2K       -       15  32,256

after:

ether_multi 32839  2449K       -    40807  16,32,64,128
   in_multi     2     1K       -     3538  256
  in6_multi  9270  2316K       -    18525  32,256

Looks like a yes. For iX purposes, leaking a megabyte over the course of months of uptime is ok - seeing as the code doesn't work at all otherwise. However, I'll revisit the patch once the reviews from @hselasky and @markj have gone in.

So, what you think about this patch? It just restores the behavior that was in stable/11. I used this patch with head/ for two weeks and seems there is no regression.

In D19886#435308, @ae wrote:

So, what you think about this patch? It just restores the behavior that was in stable/11. I used this patch with head/ for two weeks and seems there is no regression.

@ae go ahead

Ping: Can this change be abandoned now?

Ping - this patch should be abandoned?