The first version created a new file descriptor type for listening sockets,
and listen(2) toggled the fileops, file type and f_data on the descriptor.
That worked pretty nicely, but the toggle operation was racy and there was
no clear way to fix it, especially taking into account our lockless
optimizations for fget().
In try #2 we neither toggle the file type nor allocate a new f_data.
Instead we create a union in struct socket, to separate data flow fields
from listening ones. This shrinks struct socket a bit (it still is very
The most important change, of course, is the removal of the global
ACCEPT_LOCK(). The new locking protocol is that we are allowed to take two
socket locks at a time, but the first one must belong to a listening socket
and the second one must not. Unfortunately, WITNESS doesn't yet have the
functionality to enforce that.
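The ordering rule above can be sketched in userland C. This is only an illustration under stated assumptions: `lk_socket`, `lk_lock_pair()` and `lk_unlock_pair()` are hypothetical names, and pthread mutexes stand in for the kernel socket locks. It is not the code from this change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct socket; only what the sketch needs. */
struct lk_socket {
	pthread_mutex_t	lock;
	bool		listening;	/* true once listen(2) has been called */
};

static void
lk_socket_init(struct lk_socket *so, bool listening)
{
	assert(so != NULL);
	pthread_mutex_init(&so->lock, NULL);
	so->listening = listening;
}

/*
 * Take two socket locks following the rule: the first lock must belong
 * to a listening socket, the second must not.  Returns true with both
 * locks held, or false with no locks held if the pair breaks the rule.
 */
static bool
lk_lock_pair(struct lk_socket *head, struct lk_socket *so)
{
	if (head == so)
		return (false);		/* same lock twice would deadlock */
	pthread_mutex_lock(&head->lock);
	if (!head->listening) {
		pthread_mutex_unlock(&head->lock);
		return (false);
	}
	pthread_mutex_lock(&so->lock);
	if (so->listening) {
		pthread_mutex_unlock(&so->lock);
		pthread_mutex_unlock(&head->lock);
		return (false);
	}
	return (true);
}

/* Release in the reverse order of acquisition. */
static void
lk_unlock_pair(struct lk_socket *head, struct lk_socket *so)
{
	pthread_mutex_unlock(&so->lock);
	pthread_mutex_unlock(&head->lock);
}
```

A caller would use `lk_lock_pair(listener, child)` and treat a false return as a programming error, which is roughly the check one would want WITNESS to make automatically.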
Removal of ACCEPT_LOCK() uncovered two races in UNIX sockets. To fix the
first one, we run sonewconn() holding both PCB locks, those of the
connecting and the accepting sockets. This introduces a lock order
reversal (LOR) between the UNPCB lock and PCB_LIST; the reverse order
lives in the pcblist sysctl handler. This will be fixed later. The second
one is that uipc_close() basically did nothing. If the socket was
listening, the vnode remained open for connections. This is fixed by
removing the vnode in uipc_close(). Maybe the right way would be to do it
for all sockets (not only listening ones) and simply move the code from
uipc_detach() to uipc_close()?
The socket API already sucked enough with listen(2) itself, which
drastically changes the file descriptor type. But kqueue made it worse. :(
We should allow the following sequence to work: add a socket to a kqueue,
listen(2) on it, then run kevent() and receive notifications about new
connections. This didn't work in FreeBSD until 2017, but it was recently
fixed in r313043 :( Seems like we must keep this working. To make kqueue
work with our pretty new union, we take the selinfo structures out of the
socket buffers and put them on the socket itself. The read selinfo works
both for the buffer and the accept queue. Why did I move the write
selinfo, too? Actually, this shift of selinfos out of sockbufs is going to
be needed for future work with sockbufs. There is a plan to make sockbufs
an opaque class, not visible to the socket layer. This will allow for
seamless integration of TLS sockbufs.
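To illustrate why the selinfos sit outside the union, here is a rough layout sketch. All types and field names (`sk_socket`, `sk_selinfo`, `sk_sockbuf`, `sk_readable()`) are hypothetical placeholders, not the real kernel definitions; the point is only that the read selinfo survives the switch from data flow to listening.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for the kernel's struct selinfo. */
struct sk_selinfo {
	int	si_notes;	/* stands in for the attached knote list */
};

/* Hypothetical placeholder for a socket buffer. */
struct sk_sockbuf {
	size_t	sb_cc;		/* bytes currently buffered */
};

struct sk_socket {
	bool	so_listening;	/* selects which union arm is valid */
	/*
	 * Selinfos live on the socket, outside the union, so a knote
	 * attached before listen(2) stays valid afterwards.
	 */
	struct sk_selinfo	so_rdsel;
	struct sk_selinfo	so_wrsel;
	union {
		struct {	/* data flow fields */
			struct sk_sockbuf	rcv;
			struct sk_sockbuf	snd;
		} data;
		struct {	/* listening fields */
			unsigned int	qlen;	/* completed connections */
			unsigned int	qlimit;	/* backlog limit */
		} listen;
	} u;
};

/*
 * What a read filter could check: the same read selinfo serves either
 * the receive buffer or the accept queue, depending on socket state.
 */
static bool
sk_readable(const struct sk_socket *so)
{
	assert(so != NULL);
	if (so->so_listening)
		return (so->u.listen.qlen > 0);
	return (so->u.data.rcv.sb_cc > 0);
}
```

In this sketch, a watcher registered on `so_rdsel` before the socket starts listening does not need to be re-registered: only the condition behind "readable" changes.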
Changes to accept filters are cosmetic.
Another cosmetic change is that the vnode now points to the unpcb, not
the socket.