Page MenuHomeFreeBSD

Listening sockets revamp try #2.

Authored by kbowling on Feb 23 2017, 5:38 PM.



The first version created a new file descriptor type for listening sockets,
and listen(2) toggled fileops and file type and f_data on the descriptor.
That worked pretty nice, but the toggle operation was racy and there was
no clear way on how to fix it, especially taking into account our lockless
optimizations for fget().

In the try #2 we don't toggle the file type, neither allocate new f_data.
Instead we create a union in struct socket, to separate data flow fields
from listening ones. This shrinks struct socket a bit (it still is very
big, however).

The most important change of course is removal of ACCEPT_LOCK() global.
The new locking protocol is that we are allowed to take 2 socket locks
at a time, but the first one must correspond to a listening socket and
second one must not. Unfortunately, WITNESS doesn't yet have functionality
to enforce that.

UNIX sockets

Removal of ACCEPT_LOCK() uncovered two races in UNIX sockets. To fix the
first one, we run sonewconn() holding both PCB locks - the connecting
and the accepting sockets. This introduces a LOR between UNPCB lock and
PCB_LIST. Reverse order lives in the pcblist sysctl handler. This will
be fixed later.
The second one is that uipc_close() basicly did nothing. If socket was
listening, the vnode remained opened for connections. This is fixed by
removing vnode in uipc_close(). Maybe the right way would be to do it
for all sockets (not only listening), simply move the code from
uipc_detach() to uipc_close()?


The socket API already sucked enough with listen(2) itself, which
drastically changes filedescriptor type. But kqueue made it worse. :(
We should allow the following sequence to work: add socket to kqueue,
listen(2) it, then run kevent() and receive notifications about new
connections then. This didn't work in FreeBSD until 2017, but recently
it was fixed in r313043 :( Seems like we must keep this working.
To make kqueue work with our pretty new union, we take the selinfo
structures out of socket buffers and put them on the socket itself.
The read selinfo works both for buffer and accept queue.
Why did I move the write selinfo, too? Actually this shift of selinfos
out of sockbufs is going to be needed for future work with sockbufs.
There is plan to make sockbufs an opaque class, not visible to socket
layer. This will allow for seamless integration of TLS sockbufs.

Changes to accept filters are cosmetic.
Another cosmetic change is that vnode points to unpcb, not socket.

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius retitled this revision from to Listening sockets revamp try #2..
glebius updated this object.
  • Provide solisten_wakeup(), call it from sonewconn() and from soisconnected().
  • Fix compilation and convert to solisten_dequeue() and to solisten_upcall_set()
  • Make SCTP compilable.
  • Don't meddle into so_options in SCTP.
tuexen added inline comments.
5207 ↗(On Diff #27729)

Why don't we need to check if the accept queue is already full? This was done in the old code but doesn't seem to be the case any more for the new code?

7009 ↗(On Diff #27729)

Calliing listen() on a 1-to-many style socket is perfectly fine. See RFC 6458. Why don't you fail it indicating EOPNOTSUPP?

7009 ↗(On Diff #27729)

I meant: Why do you fail it...

5207 ↗(On Diff #27729)

Here I trusted the comment, not the code. Comment says that we are checking against socket still listening.

The SCTP code always used not the SO_ACCEPTCONN flag but so_qlimit for this check, since it tweaked so_qlimit to unlisten a socket. The usual check in SCTP look like:

(so->so_qlimit > 0)

I believe, for some obscure reason this place has different check, but effectively should have the same. Trusting the comment, as I said.

If you think that queue overflow is also a case we should check for, than this should be:

(stcb == NULL && (!SOLISTENING(inp->sctp_socket) || (inp->sctp_socket->so_qlen >= inp->sctp_socket->so_qlimit))

Is calling sctp_abort_association() the right thing to do for listen queue overflow?

7009 ↗(On Diff #27729)

Yes, this change is definitely wrong.

My understanding of 1-to-many listening is that we just mark the SCTP pcb as open for new connections, and we don't actually do anything to the socket itself, since a 1-to-many can't have an accept queue. So, would the correct fix be to just skip calling solisten_proto() for the 1-to-many sockets? Can you please point me to the code where we will automatically accept (or not accept) new associations?

5207 ↗(On Diff #27729)

The condition above is the one you want to use. That should fix this.

7138 ↗(On Diff #27729)

I think we need to keep this, since you will not be able to call connect() on a 1-to-many style socket after having called listen() if you remove it.

  • Merge commit '153f9fe91de8a2d0aa5910076e983a7c9d76100a' into solisten
  • Check if socket is not listening or is listening with queue overflow.
  • SOCK_LOCK is no longer synonym for locking receive socket buffer.
  • Access to so_error and to receive socket buffer are not protected
  • Allow listening on 1-to-many sockets, but not actually convert them
  • Merge commit 'f92f6763f17a7dff5cd5e4b96b5c433c33931811' into solisten
  • Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners.

Merge remote-tracking branch 'FreeBSD/master' into solisten

  • Fix check for the listen queue overflow. We shall not do this
  • Revert unnecessary changes to syncache KPI.
  • Put back SO_ACCEPTFILTER on the listening socket. It is inherited by
  • Copy so_options holding the lock. This closes race with accept
  • Two unrelated changes: accept filter teardown and close race.
  • Fix a degenerate case when soisdisconnected() would call soisconnected().
  • Improve fe263daf185e1b88691f9d1df5c92b56a496e050: check sol_accept_filter

I plan to commit this tomorrow.

This revision was automatically updated to reflect the committed changes.

In essence some notes from bugzilla 220404 for
future reference to folks that might look here.

Parts of the wording here are 32-bit powerpc
FreeBSD memory layout specific. But other
aliasing of ->sol_upcall would likely also be

->so_rcv.sb_sel and ->sol_upcall are aliased in the new
union of 2 anonymous structs.

This creates problems for the code as it is and the
test for if ->sol_upcall is NULL is ineffective for
determining when ->sol_upcall should be used as
things are.

static struct socket *
soalloc(struct vnet *vnet)

struct socket *so;
so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);

. . .

so->so_rcv.sb_sel = &so->so_rdsel;
so->so_snd.sb_sel = &so->so_wrsel;

. . .

That so->so_rcv.sb_sel assignment makes so->sol_upcall
non-NULL and so appear to be defined for use.

And that makes the following code problematical:

solisten_wakeup(struct socket *sol)

if (sol->sol_upcall != NULL)
        (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
else {

. . .

And this code is what is failing on production 32-bit
powerpc kernels (fatal kernel trap for illegal instructions
from jumping to *...->sb_sel conent).

There could be more anonymous struct field problems in
the union that is in struct socket . I've not checked.

I'll note that the only references to sol_upcall are:

grep -r "\<sol_upcall" /usr/src/sys/* | more
/usr/src/sys/kern/uipc_socket.c: if (sol->sol_upcall != NULL)
/usr/src/sys/kern/uipc_socket.c: (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
/usr/src/sys/kern/uipc_socket.c: so->sol_upcall = func;
/usr/src/sys/kern/uipc_socket.c: so->sol_upcallarg = arg;
/usr/src/sys/sys/socketvar.h: so_upcall_t *sol_upcall; /* (e) */
/usr/src/sys/sys/socketvar.h: void *sol_upcallarg; /* (e) */

None of those assign NULL.

If NULL was assigned then ->so_rcv.sb_sel would
also become NULL in value.

Again for future reference for folks that might look at this:

At least some C++ compilers reject the header syntax using enum
in one of the anonymous structs in the new union. Probably all
should reject that syntax used where it is.

Likely the headers are supposed to support use from C++, not
just from C.

I'd guess this is easily tested via a temporary C++ source file that
just has the include preprocessor directive(s) required to see if
compiling the temporary source is rejected.

Please see PR 220454. This code generates kernel panics.

Fixed version has been discussed in and committed to head. I think we can close this one.

kbowling abandoned this revision.
kbowling added a reviewer: glebius.

Commandeer to close

Quick question from scrolling through

Never mind; I see you abandoned it.


Can you get the "knl" which I suppose stands for "kernel" out of that name? I've never seen us do that before anywhere in native FreeBSD code.

You get the idea? Is there any reason we don't follow that?

Given they are static I keep wondering why they need to be functions (despite the compiler going to inline them). Normally we do the "function wrappers" in order to export a stable KPI/KBI to modules and 3rd party code, which doesn't seem to be the need for with these?