Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
- Move the selinfos out of the socket buffers and into the socket.
  The first reason is to support the braindamaged scenario where a
  socket is added to kevent(2) and then listen(2) is called on it.
  The second reason is a future plan to make socket buffers
  pluggable, so that the socket buffer of a dataflow socket can be
  changed; in that case we also want to keep the same selinfos
  through the lifetime of the socket.
- Remove struct so_accf. Since the listening fields no longer
  affect the size of struct socket, just move its fields into the
  listening part of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
  only on a dataflow socket, which has buffers, and for listening sockets
  provide solisten_upcall_set().
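
The union layout described above can be sketched in userspace C. This
is only an illustrative model, not the kernel's struct socket: the
model_* names, the qlen/qlimit fields, the 64-byte buffer stand-in, and
the -1 return on misuse are all assumptions made for the example. Only
sol_upcall is a field name taken from this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket buffer; the real one is far larger. */
struct model_sockbuf {
	char	sb_data[64];
};

typedef void model_upcall_t(void *arg);

/* Listening and dataflow fields overlap: a socket is never both at once. */
struct model_socket {
	int	so_listening;		/* nonzero once the socket listens */
	union {
		struct {		/* dataflow part */
			struct model_sockbuf	 so_rcv;
			struct model_sockbuf	 so_snd;
			model_upcall_t		*so_upcall;
		} df;
		struct {		/* listening part */
			unsigned		 qlen;
			unsigned		 qlimit;
			model_upcall_t		*sol_upcall;
		} ls;
	} u;
};

/* The same fields laid out side by side, for a size comparison only. */
struct model_socket_flat {
	int	so_listening;
	struct {
		struct model_sockbuf	 so_rcv;
		struct model_sockbuf	 so_snd;
		model_upcall_t		*so_upcall;
	} df;
	struct {
		unsigned		 qlen;
		unsigned		 qlimit;
		model_upcall_t		*sol_upcall;
	} ls;
};

_Static_assert(sizeof(struct model_socket) < sizeof(struct model_socket_flat),
    "the union should shrink the structure");

/* Sample upcall that does nothing; used only to exercise the setters. */
void
model_noop_upcall(void *arg)
{
	(void)arg;
}

/* Switch the model socket to the listening state and set up that part. */
void
model_listen(struct model_socket *so, unsigned backlog)
{
	so->so_listening = 1;
	so->u.ls.qlen = 0;
	so->u.ls.qlimit = backlog;
	so->u.ls.sol_upcall = NULL;
}

/* Dataflow-only setter: refuses a listening socket (modeled as -1). */
int
model_upcall_set(struct model_socket *so, model_upcall_t *fn)
{
	if (so->so_listening)
		return (-1);
	so->u.df.so_upcall = fn;
	return (0);
}

/* Listening-only setter: refuses a dataflow socket (modeled as -1). */
int
model_listen_upcall_set(struct model_socket *so, model_upcall_t *fn)
{
	if (!so->so_listening)
		return (-1);
	so->u.ls.sol_upcall = fn;
	return (0);
}
```

Keeping one setter per socket kind means neither can write through the
wrong union member; how the kernel reports misuse may well differ from
the return code used in this model.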

o Remove ACCEPT_LOCK() global.

  • Add a mutex to the socket, to be used instead of the socket buffer lock to protect fields of struct socket that don't belong to a socket buffer.
  • Allow acquiring two socket locks at once, but the first one must belong to a listening socket.
  • Make soref()/sorele() use atomic(9). This allows soref() to be done in some situations without owning the socket lock. There is room for improvement here: it is possible to make sorele() take the lock only optionally as well.
  • Most protocols aren't touched by this change, except UNIX local sockets. See below for more information.
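
The lock-free reference counting mentioned above can be illustrated
with C11 atomics in userspace. This is a rough model under stated
assumptions, not the kernel code: <stdatomic.h> stands in for
atomic(9), and struct ref_socket, model_soref(), model_sorele() and the
"freed" flag are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted socket. */
struct ref_socket {
	atomic_uint	so_count;	/* number of references held */
	bool		freed;		/* set when the last reference goes */
};

/* Start the model socket with a single reference held by the creator. */
void
model_soinit(struct ref_socket *so)
{
	atomic_init(&so->so_count, 1);
	so->freed = false;
}

/*
 * Take a reference without holding any lock.  The caller must already
 * own a reference, so the count cannot reach zero underneath us.
 */
void
model_soref(struct ref_socket *so)
{
	atomic_fetch_add_explicit(&so->so_count, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  atomic_fetch_sub returns the previous value, so a
 * result of 1 means this call released the last reference.  Returns
 * true when the socket was torn down.
 */
bool
model_sorele(struct ref_socket *so)
{
	if (atomic_fetch_sub_explicit(&so->so_count, 1,
	    memory_order_acq_rel) == 1) {
		so->freed = true;	/* stands in for the real teardown */
		return (true);
	}
	return (false);
}
```

In this model only the final release would need to serialize with other
users of the socket, which is the kind of optional locking in sorele()
that the note above leaves as future work.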

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.
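
To show why one shared dequeue helper removes duplication, here is a
small userspace model of a listen queue. It is not solisten_dequeue()
itself: struct conn, struct listen_queue and the lq_* functions are
hypothetical, there is no locking or sleeping, and returning
EWOULDBLOCK on an empty queue is an assumption about non-blocking
behaviour made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical completed connection waiting to be accepted. */
struct conn {
	int		 id;
	struct conn	*next;
};

/* Hypothetical queue of completed connections on a listening socket. */
struct listen_queue {
	struct conn	 *first;	/* oldest pending connection */
	struct conn	**lastp;	/* where the next one is linked in */
	unsigned	  qlen;		/* number of pending connections */
};

void
lq_init(struct listen_queue *q)
{
	q->first = NULL;
	q->lastp = &q->first;
	q->qlen = 0;
}

/* Append a completed connection at the tail of the queue. */
void
lq_enqueue(struct listen_queue *q, struct conn *c)
{
	c->next = NULL;
	*q->lastp = c;
	q->lastp = &c->next;
	q->qlen++;
}

/*
 * Single place that removes the oldest connection.  Every consumer
 * calls this instead of open-coding the queue manipulation.
 * Returns 0 and stores the connection in *ret, or EWOULDBLOCK if the
 * queue is empty (this model never sleeps).
 */
int
lq_dequeue(struct listen_queue *q, struct conn **ret)
{
	struct conn *c = q->first;

	if (c == NULL)
		return (EWOULDBLOCK);
	q->first = c->next;
	if (q->first == NULL)
		q->lastp = &q->first;	/* queue is empty again */
	c->next = NULL;
	q->qlen--;
	*ret = c;
	return (0);
}
```

Each consumer then reduces to a call to the helper plus its own
per-connection setup, rather than a private copy of the unlinking and
accounting logic.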

o UNIX local sockets.

  • Removal of the ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most of the races are around spawning a new socket when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one, which means holding them across sonewconn(). This creates a LOR (lock order reversal) between the pcb locks and unp_list_lock.
  • To fix the new LOR, abandon the global unp_list_lock in favor of the global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parallelism in the UNIX sockets.
  • Now a call into uipc_attach() may happen with unp_link_lock held if we are accepting, or without unp_link_lock if we are just creating a socket.
  • Another problem in UNIX sockets is that uipc_close() basically did nothing for a listening socket. The vnode remained open for connections. This is fixed by removing the vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), and simply move the vnode teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770


glebius authored on Jun 8 2017, 9:30 PM
