Details

Reviewers

tuexen
glebius

Group Reviewers

transport

Commits

rS319722: Listening sockets improvements.

Summary

The first version created a new file descriptor type for listening sockets,
and listen(2) toggled fileops and file type and f_data on the descriptor.
That worked pretty nice, but the toggle operation was racy and there was
no clear way on how to fix it, especially taking into account our lockless
optimizations for fget().

In the try #2 we don't toggle the file type, neither allocate new f_data.
Instead we create a union in struct socket, to separate data flow fields
from listening ones. This shrinks struct socket a bit (it still is very
big, however).

The most important change of course is removal of ACCEPT_LOCK() global.
The new locking protocol is that we are allowed to take 2 socket locks
at a time, but the first one must correspond to a listening socket and
second one must not. Unfortunately, WITNESS doesn't yet have functionality
to enforce that.

UNIX sockets

Removal of ACCEPT_LOCK() uncovered two races in UNIX sockets. To fix the
first one, we run sonewconn() holding both PCB locks - the connecting
and the accepting sockets. This introduces a LOR between UNPCB lock and
PCB_LIST. Reverse order lives in the pcblist sysctl handler. This will
be fixed later.
The second one is that uipc_close() basicly did nothing. If socket was
listening, the vnode remained opened for connections. This is fixed by
removing vnode in uipc_close(). Maybe the right way would be to do it
for all sockets (not only listening), simply move the code from
uipc_detach() to uipc_close()?

KQUEUE

The socket API already sucked enough with listen(2) itself, which
drastically changes filedescriptor type. But kqueue made it worse. :(
We should allow the following sequence to work: add socket to kqueue,
listen(2) it, then run kevent() and receive notifications about new
connections then. This didn't work in FreeBSD until 2017, but recently
it was fixed in r313043 :( Seems like we must keep this working.
To make kqueue work with our pretty new union, we take the selinfo
structures out of socket buffers and put them on the socket itself.
The read selinfo works both for buffer and accept queue.
Why did I move the write selinfo, too? Actually this shift of selinfos
out of sockbufs is going to be needed for future work with sockbufs.
There is plan to make sockbufs an opaque class, not visible to socket
layer. This will allow for seamless integration of TLS sockbufs.

P.S.
Changes to accept filters are cosmetic.
Another cosmetic change is that vnode points to unpcb, not socket.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 9616
Build 10061: arc lint + arc unit

Event Timeline

glebius updated this revision to Diff 25629.Feb 23 2017, 5:38 PM

glebius retitled this revision from to Listening sockets revamp try #2..

glebius updated this object.

Herald added a reviewer: transport. · View Herald TranscriptFeb 23 2017, 5:38 PM

Provide solisten_wakeup(), call it from sonewconn() and from soisconnected().
Fix compilation and convert to solisten_dequeue() and to solisten_upcall_set()

Make SCTP compilable.
Don't meddle into so_options in SCTP.

Harbormaster completed remote builds in B8901: Diff 27729.Apr 25 2017, 8:06 PM

tuexen added a subscriber: tuexen.May 15 2017, 8:15 PM

tuexen added inline comments.

sys/netinet/sctp_input.c
5204–5206	Why don't we need to check if the accept queue is already full? This was done in the old code but doesn't seem to be the case any more for the new code?
sys/netinet/sctp_usrreq.c
7014	Calliing listen() on a 1-to-many style socket is perfectly fine. See RFC 6458. Why don't you fail it indicating EOPNOTSUPP?

tuexen added inline comments.May 15 2017, 8:20 PM

sys/netinet/sctp_usrreq.c
7014	I meant: Why do you fail it...

tuexen added a reviewer: tuexen.May 15 2017, 8:22 PM

glebius added inline comments.May 15 2017, 9:16 PM

sys/netinet/sctp_input.c
5204–5206	Here I trusted the comment, not the code. Comment says that we are checking against socket still listening. The SCTP code always used not the SO_ACCEPTCONN flag but so_qlimit for this check, since it tweaked so_qlimit to unlisten a socket. The usual check in SCTP look like: (so->so_qlimit > 0) I believe, for some obscure reason this place has different check, but effectively should have the same. Trusting the comment, as I said. If you think that queue overflow is also a case we should check for, than this should be: (stcb == NULL && (!SOLISTENING(inp->sctp_socket) \|\| (inp->sctp_socket->so_qlen >= inp->sctp_socket->so_qlimit)) Is calling sctp_abort_association() the right thing to do for listen queue overflow?
sys/netinet/sctp_usrreq.c
7014	Yes, this change is definitely wrong. My understanding of 1-to-many listening is that we just mark the SCTP pcb as open for new connections, and we don't actually do anything to the socket itself, since a 1-to-many can't have an accept queue. So, would the correct fix be to just skip calling solisten_proto() for the 1-to-many sockets? Can you please point me to the code where we will automatically accept (or not accept) new associations?

tuexen added inline comments.May 15 2017, 10:04 PM

sys/netinet/sctp_input.c
5204–5206	The condition above is the one you want to use. That should fix this.

tuexen added inline comments.May 25 2017, 9:38 PM

sys/netinet/sctp_usrreq.c
7143	I think we need to keep this, since you will not be able to call `connect()` on a 1-to-many style socket after having called `listen()` if you remove it.

Merge commit '153f9fe91de8a2d0aa5910076e983a7c9d76100a' into solisten
Check if socket is not listening or is listening with queue overflow.
SOCK_LOCK is no longer synonym for locking receive socket buffer.
Access to so_error and to receive socket buffer are not protected
Allow listening on 1-to-many sockets, but not actually convert them
Merge commit 'f92f6763f17a7dff5cd5e4b96b5c433c33931811' into solisten
Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners.

Merge remote-tracking branch 'FreeBSD/master' into solisten

Fix check for the listen queue overflow. We shall not do this
Revert unnecessary changes to syncache KPI.
Put back SO_ACCEPTFILTER on the listening socket. It is inherited by
Copy so_options holding the lock. This closes race with accept
Two unrelated changes: accept filter teardown and close race.
Fix a degenerate case when soisdisconnected() would call soisconnected().
Improve fe263daf185e1b88691f9d1df5c92b56a496e050: check sol_accept_filter

I plan to commit this tomorrow.

Closed by commit rS319722: Listening sockets improvements. (authored by glebius). · Explain WhyJun 9 2017, 8:00 PM

This revision was automatically updated to reflect the committed changes.

rlibby mentioned this in D11062: ddb show socket debugging.Jun 14 2017, 7:07 AM

In essence some notes from bugzilla 220404 for
future reference to folks that might look here.

Parts of the wording here are 32-bit powerpc
FreeBSD memory layout specific. But other
aliasing of ->sol_upcall would likely also be
problematical.

->so_rcv.sb_sel and ->sol_upcall are aliased in the new
union of 2 anonymous structs.

This creates problems for the code as it is and the
test for if ->sol_upcall is NULL is ineffective for
determining when ->sol_upcall should be used as
things are.

static struct socket *
soalloc(struct vnet *vnet)
{

struct socket *so;
 
so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);

. . .

so->so_rcv.sb_sel = &so->so_rdsel;
so->so_snd.sb_sel = &so->so_wrsel;

. . .

That so->so_rcv.sb_sel assignment makes so->sol_upcall
non-NULL and so appear to be defined for use.

And that makes the following code problematical:

void
solisten_wakeup(struct socket *sol)
{

if (sol->sol_upcall != NULL)
        (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
else {

. . .

And this code is what is failing on production 32-bit
powerpc kernels (fatal kernel trap for illegal instructions
from jumping to *...->sb_sel conent).

There could be more anonymous struct field problems in
the union that is in struct socket . I've not checked.

I'll note that the only references to sol_upcall are:

grep -r "\<sol_upcall" /usr/src/sys/* | more
/usr/src/sys/kern/uipc_socket.c: if (sol->sol_upcall != NULL)
/usr/src/sys/kern/uipc_socket.c: (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
/usr/src/sys/kern/uipc_socket.c: so->sol_upcall = func;
/usr/src/sys/kern/uipc_socket.c: so->sol_upcallarg = arg;
/usr/src/sys/sys/socketvar.h: so_upcall_t *sol_upcall; /* (e) */
/usr/src/sys/sys/socketvar.h: void *sol_upcallarg; /* (e) */

None of those assign NULL.

If NULL was assigned then ->so_rcv.sb_sel would
also become NULL in value.

Again for future reference for folks that might look at this:

At least some C++ compilers reject the header syntax using enum
in one of the anonymous structs in the new union. Probably all
should reject that syntax used where it is.

Likely the headers are supposed to support use from C++, not
just from C.

I'd guess this is easily tested via a temporary C++ source file that
just has the include preprocessor directive(s) required to see if
compiling the temporary source is rejected.

sylvain_sylvaingarrigues.com added a subscriber: sylvain_sylvaingarrigues.com.Jul 3 2017, 5:42 PM

Please see PR 220454. This code generates kernel panics.

Fixed version has been discussed in https://reviews.freebsd.org/D11475 and committed to head. I think we can close this one.

jch added a subscriber: jch.Jul 6 2017, 8:33 PM

Can we close this?

Herald added a subscriber: imp. · View Herald TranscriptMay 15 2018, 6:12 AM

Commandeer to close

Quick question from scrolling through

Never mind; I see you abandoned it.

head/sys/kern/uipc_socket.c
166 ↗	(On Diff #29395)	Can you get the "knl" which I suppose stands for "kernel" out of that name? I've never seen us do that before anywhere in native FreeBSD code. so_rlock() so_unlock() so_... so_wlock() so_.. You get the idea? Is there any reason we don't follow that? Given they are static I keep wondering why they need to be functions (despite the compiler going to inline them). Normally we do the "function wrappers" in order to export a stable KPI/KBI to modules and 3rd party code, which doesn't seem to be the need for with these?

Listening sockets revamp try #2.AbandonedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 29118

sys/cam/ctl/ctl_ha.c

sys/dev/iscsi/icl_soft_proxy.c

sys/kern/sys_socket.c

sys/kern/uipc_accf.c

sys/kern/uipc_debug.c

sys/kern/uipc_sockbuf.c

sys/kern/uipc_socket.c

sys/kern/uipc_syscalls.c

sys/kern/uipc_usrreq.c

sys/kern/vfs_default.c

sys/kern/vfs_subr.c

sys/kern/vnode_if.src

sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c

sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c

sys/netgraph/bluetooth/socket/ng_btsocket_sco.c

sys/netgraph/ng_ksocket.c

sys/netinet/sctp_input.c

sys/netinet/sctp_os_bsd.h

sys/netinet/sctp_output.c

sys/netinet/sctp_pcb.c

sys/netinet/sctp_syscalls.c

sys/netinet/sctp_sysctl.c

sys/netinet/sctp_usrreq.c

sys/netinet/sctputil.c

sys/netinet/tcp_input.c

sys/netinet/tcp_subr.c

sys/netinet/tcp_syncache.h

sys/netinet/tcp_syncache.c

sys/netinet/tcp_timewait.c

sys/netinet/toecore.c

sys/ofed/drivers/infiniband/core/iwcm.c

sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c

sys/rpc/svc_vc.c

sys/sys/sockbuf.h

sys/sys/socket.h

sys/sys/socketvar.h

sys/sys/sockopt.h

sys/sys/unpcb.h

sys/sys/vnode.h

usr.bin/netstat/inet.c

Listening sockets revamp try #2.
AbandonedPublic
Actions

Revision Contents
Changeset List