Page MenuHomeFreeBSD

Fix lock order reversal tcpinp -> so_rcv in tcp_mss()
Needs RevisionPublic

Authored by sumit.saxena_broadcom.com on Wed, Apr 1, 9:17 AM.
Tags
None
Referenced Files
F150810304: D56213.diff
Sat, Apr 4, 5:14 AM
F150810282: D56213.diff
Sat, Apr 4, 5:14 AM
Unknown Object (File)
Thu, Apr 2, 11:07 AM
Unknown Object (File)
Thu, Apr 2, 11:07 AM
Unknown Object (File)
Thu, Apr 2, 10:40 AM

Details

Summary

When tcp_mss() is called from syncache_socket() or tcp_input() while
holding the inpcb (tcpinp) lock, it acquires SOCK_RECVBUF_LOCK(so_rcv)
and SOCK_SENDBUF_LOCK(so_snd). That establishes lock order tcpinp ->
so_rcv, which reverses the canonical so_rcv -> tcpinp order used
elsewhere and triggers WITNESS.

Add a skip_sockbuf argument to tcp_mss(). When set (callers that hold
the inpcb lock: syncache_socket and tcp_input MSS option handling),
skip the socket buffer lock and resize logic. TCP state (t_maxseg,
TF2_PROC_SACK_PROHIBIT, TSO) is still updated. Socket buffer sizing
still runs when tcp_mss() is invoked without the inpcb lock (e.g. from
tcp_usrreq).

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

tuexen added inline comments.
sys/netinet/tcp_var.h
1484

Would it make sense to use bool instead of int?

glebius requested changes to this revision.Wed, Apr 1, 3:33 PM

That establishes lock order tcpinp ->so_rcv, which reverses the canonical so_rcv -> tcpinp order used elsewhere and triggers WITNESS.

Where exactly elsewhere is "the canonical order"?

For example, let's take a look at send(2): tcp_usr_send() first takes inpcb lock and then appends to the send buffer.

Can you please share your WITNESS report?

This revision now requires changes to proceed.Wed, Apr 1, 3:33 PM

That establishes lock order tcpinp ->so_rcv, which reverses the canonical so_rcv -> tcpinp order used elsewhere and triggers WITNESS.

Where exactly elsewhere is "the canonical order"?

For example, let's take a look at send(2): tcp_usr_send() first takes inpcb lock and then appends to the send buffer.

Can you please share your WITNESS report?

Below are warnings I see in dmesg :

lock order reversal:
1st 0xff0100820a0d51e0 so_rcv (so_rcv, sleep mutex) @ /usr/src/sys/kern/uipc_socket.c:1899
2nd 0xff01008208a10020 tcpinp (tcpinp, rw) @ /usr/src/sys/kern/uipc_ktls.c:1962
lock order tcpinp -> so_rcv established at:
#0 0xffffffff80c0e5e4 at witness_checkorder+0x364
#1 0xffffffff80b70c41 at __mtx_lock_flags+0x91
#2 0xffffffff80da2f6f at tcp_mss+0x15f
#3 0xffffffff80dbd2de at syncache_socket+0x6de
#4 0xffffffff80dbc2b9 at syncache_expand+0x6e9
#5 0xffffffff80d9de60 at tcp_input_with_port+0x9d0
#6 0xffffffff80d9f1ab at tcp_input+0xb
#7 0xffffffff80d8c03d at ip_input+0x28d
#8 0xffffffff80d01554 at netisr_dispatch_src+0xb4
#9 0xffffffff80ce36da at ether_demux+0x16a
#10 0xffffffff80ce4c4e at ether_nh_input+0x3ce
#11 0xffffffff80d01554 at netisr_dispatch_src+0xb4
#12 0xffffffff80ce3b25 at ether_input+0xd5
#13 0xffffffff83723154 at uether_rxflush+0xa4
#14 0xffffffff8371cc49 at axe_bulk_read_callback+0x229
#15 0xffffffff809982fe at usbd_callback_wrapper+0x87e
#16 0xffffffff809995c9 at usb_command_wrapper+0x99
#17 0xffffffff8099847e at usb_callback_proc+0x8e
lock order so_rcv -> tcpinp attempted at:
#0 0xffffffff80c0ee61 at witness_checkorder+0xbe1
#1 0xffffffff80b929f5 at _rw_wlock_cookie+0x65
#2 0xffffffff80c3bd1c at ktls_destroy+0x4dc
#3 0xffffffff80c4adfc at sbdestroy+0x4c
#4 0xffffffff80c4e2e2 at sorele_locked+0x162
#5 0xffffffff80db537f at tcp_close+0x17f
#6 0xffffffff837969e7 at ctf_process_rst+0xe7
#7 0xffffffff83776609 at rack_do_segment_nounlock+0x2469
#8 0xffffffff83774136 at rack_do_segment+0xe6
#9 0xffffffff80d9e5a2 at tcp_input_with_port+0x1112
#10 0xffffffff80d9f1ab at tcp_input+0xb
#11 0xffffffff80d8c03d at ip_input+0x28d
#12 0xffffffff80d01554 at netisr_dispatch_src+0xb4
#13 0xffffffff80ce36da at ether_demux+0x16a
#14 0xffffffff80ce4c4e at ether_nh_input+0x3ce
#15 0xffffffff80d01554 at netisr_dispatch_src+0xb4
#16 0xffffffff80ce3b25 at ether_input+0xd5
#17 0xffffffff80cfcfd9 at iflib_rxeof+0xbb9

sys/netinet/tcp_var.h
1484

Yes, bool would be better.

This is kTLS that violates the correct order. This is a known problem (at least to some of us).

Also, the WITNESS report is correct: in the first 3 lines it points at the problem, then for verbosity it first prints the stack that established the correct order and then full stack of the violator.

Lock order for TCP has always been

INP lock
<Then>
Socket buffer lock

Personally I would like to get rid of at least the send socket buffer lock and just use the INP lock for TCP. But
thats for a future review :)