Page MenuHomeFreeBSD

Fix possible panic in udp6_input() due to lack of locking.
ClosedPublic

Authored by ae on Jan 19 2021, 8:03 AM.

Details

Summary
The lookup for a IPv6 multicast addresses corresponding to
the destination addresses in the datagram is protected by
the NET_EPOCH section. Access to each PCB is protected by
INP_RLOCK during comparing. But access to socket's so_options
field is not protected. And in some cases it is possible,
that PCB pointer is still valid, but inp_socket is not.
The patch wides lock holding to protect access to inp_socket.
It copies locking strategy from IPv4 UDP handling.

Also fix lock and mbuf leak in udp_input() that can happen
in error case, and modify UDP statistic accounting to be the
same as for IPv6.

PR: 232192
Test Plan

I wrote small program to reproduce panic:


It joins to IPv6 multicast group and reads packets in the loop. Also it periodically closes and reopens socket.
With packet generator I produce needed multicast packets and with ~15kpps rate I can reproduce the panic within 20 seconds.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ae published this revision for review.Jan 19 2021, 8:06 AM
ae retitled this revision from Fix pissible panic in udp6_input() due to lack of locking. to Fix possible panic in udp6_input() due to lack of locking..Jan 19 2021, 8:39 AM
gnn requested changes to this revision.Jan 19 2021, 1:07 PM
gnn added a subscriber: gnn.

I believe the use of this boolean is the wrong approach. If you want to check and then lock you'll need to an an INP_RLOCKED similar to INP_WLOCKED which can be found in in_pcb.h.

This revision now requires changes to proceed.Jan 19 2021, 1:07 PM

Before epochification we had an integer variable that was used to track locking.

#define        UH_WLOCKED      2
#define        UH_RLOCKED      1
#define        UH_UNLOCKED     0

And currently rwlock doesn't provide public function or macro similar to rw_wowned() for shared lock.

Yup, I understand that we do not currently have that function. I think you ought to add that as part of this fix.

In D28232#631463, @gnn wrote:

Yup, I understand that we do not currently have that function. I think you ought to add that as part of this fix.

I'm not expert in the locking internals, but as I understand from the code, it is not possible to determine that shared lock was held in the current thread. So, I can't just create RW_RLOCKED() macro, because some other thread can hold shared lock and only it should unlock the lock later, otherwise my unlock will decrease shared lock counter and this will lead to the problem. But I can be wrong.

Yes it is impossible to have RW_RLOCKED() functionality without adding significant overhead. Local boolean tracking the lock status is preferable over lock state interrogation anyway.

Missed one case when PCB could be unlocked before check - when PCB is
already dying and imo is NULL. Also remove extra IN6_IS_ADDR_MULTICAST(),
we already checked it few lines before.

The last patch seems to have fixed the panic for us. But I have doubts, probably we should take INP_RLOCK just before checking in6p_moptions. Any thoughts?
Let me remind what panic it is targeted to fix:

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 12
fault virtual address	= 0xcc
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80d36a19
stack pointer	        = 0x0:0xfffffe015b813440
frame pointer	        = 0x0:0xfffffe015b813560
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq141: mlx5_core1)
trap number		= 12
panic: page fault
cpuid = 8
time = 1611737705
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015b813100
vpanic() at vpanic+0x182/frame 0xfffffe015b813150
panic() at panic+0x43/frame 0xfffffe015b8131b0
trap_fatal() at trap_fatal+0x387/frame 0xfffffe015b813210
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015b813260
trap() at trap+0x271/frame 0xfffffe015b813370
calltrap() at calltrap+0x8/frame 0xfffffe015b813370
--- trap 0xc, rip = 0xffffffff80d36a19, rsp = 0xfffffe015b813440, rbp = 0xfffffe015b813560 ---
udp6_input() at udp6_input+0x759/frame 0xfffffe015b813560
ip6_input() at ip6_input+0xb3a/frame 0xfffffe015b813640
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b813690
ether_demux() at ether_demux+0x138/frame 0xfffffe015b8136c0
ether_nh_input() at ether_nh_input+0x344/frame 0xfffffe015b813720
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b813770
ether_input() at ether_input+0x69/frame 0xfffffe015b8137d0
ether_demux() at ether_demux+0x121/frame 0xfffffe015b813800
ether_nh_input() at ether_nh_input+0x344/frame 0xfffffe015b813860
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b8138b0
ether_input() at ether_input+0x69/frame 0xfffffe015b813910
tcp_lro_queue_mbuf() at tcp_lro_queue_mbuf+0xca/frame 0xfffffe015b813940
mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0xfd/frame 0xfffffe015b813a60
mlx5_cq_completion() at mlx5_cq_completion+0x90/frame 0xfffffe015b813ac0
mlx5_eq_int() at mlx5_eq_int+0xb0/frame 0xfffffe015b813b10
mlx5_msix_handler() at mlx5_msix_handler+0x15/frame 0xfffffe015b813b20
ithread_loop() at ithread_loop+0x24d/frame 0xfffffe015b813bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe015b813bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015b813bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

...

#14 0xffffffff80efab8f in trap_pfault (frame=0xfffffe015b813380, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:732
#15 0xffffffff80efa1f1 in trap (frame=0xfffffe015b813380) at /usr/src/sys/amd64/amd64/trap.c:398
#16 <signal handler called>
#17 udp6_input (mp=0xfffffe015b8135a8, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet6/udp6_usrreq.c:418
#18 0xffffffff80d1caca in ip6_input (m=0xfffff80325811600) at /usr/src/sys/netinet6/ip6_input.c:931
#19 0xffffffff80c5be5a in netisr_dispatch_src (proto=6, source=<optimized out>, m=0xffffffff815742c0 <vnet_entry_udb>) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80c4f548 in ether_demux (ifp=0xfffff80003d6b800, m=0x11) at /usr/src/sys/net/if_ethersubr.c:921
#21 0xffffffff80c508a4 in ether_input_internal (ifp=0xfffff80003d6b800, m=0x11) at /usr/src/sys/net/if_ethersubr.c:705
#22 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:735

...
kgdb) f 17
#17 udp6_input (mp=0xfffffe015b8135a8, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet6/udp6_usrreq.c:418
418	/usr/src/sys/netinet6/udp6_usrreq.c: No such file or directory.
(kgdb) i lo
inp_locked = false
pcblist = <optimized out>
last = 0xfffff809753377e0
imo = <optimized out>
fromsa = {{sin6_len = 28 '\034', sin6_family = 28 '\034', sin6_port = 56110, sin6_flowinfo = 0, sin6_addr = {__u6_addr = {
        __u6_addr8 = "\376\200", '\000' <repeats 12 times>, <incomplete sequence \365>, __u6_addr16 = {33022, 0, 0, 0, 0, 0, 0, 62720}, __u6_addr32 = {33022, 0, 0, 
          4110417920}}}, sin6_scope_id = 12}, {sin6_len = 28 '\034', sin6_family = 28 '\034', sin6_port = 56110, sin6_flowinfo = 0, sin6_addr = {__u6_addr = {
        __u6_addr8 = "\377\002", '\000' <repeats 12 times>, "\002\020", __u6_addr16 = {767, 0, 0, 0, 0, 0, 0, 4098}, __u6_addr32 = {767, 0, 0, 268566528}}}, 
    sin6_scope_id = 12}}
m = 0xfffff80325811600
off = <optimized out>
ifp = <optimized out>
uh = <optimized out>
ip6 = <optimized out>
plen = <optimized out>
nxt = <optimized out>
ulen = <optimized out>
cscov_partial = 629216880
uh_sum = <optimized out>
pcbinfo = <optimized out>
fwd_tag = <optimized out>
inp = 0xfffff809753377e0
up = <optimized out>

...
(kgdb) p/x last->inp_flags2
$2 = 0x10
(kgdb) p/x last->in6p_moptions
$3 = 0x0
(kgdb) p/x last->inp_socket
$4 = 0x0
(kgdb) p offsetof (struct socket, so_options)
$1 = (int *) 0xcc
ae edited the test plan for this revision. (Show Details)

Make locking similar to what we heave for IPv4.
Fix mbuf leaks and lock leaks for error case.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 11 2021, 9:24 AM
This revision was automatically updated to reflect the committed changes.