The only purpose of this that I can find is to cause surprising behavior on
accepted connections. Newly-created sockets will never hit these paths as
one cannot set SO_LINGER prior to socket(2). If SO_LINGER is set on a
listening socket and inherited, one would expect the timeout to be inherited
rather than changed arbitrarily like this.
Details
- Reviewers
rscheff tuexen - Group Reviewers
network transport - Commits
- rG4c0bef07be07: kern: net: remove TCP_LINGERTIME
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 36385 Build 33274: arc lint + arc unit
Event Timeline
In theory, set SO_LINGER with l_onoff=1, l_linger=0 on a listening socket; sonewconn() will inherit the SO_LINGER and linger time but hit this path and reset it to TCP_LINGERTIME. My best guess is that this might've been intended at one point to apply across all TCP sockets, but ended up getting hammered around until we arrived at it only applying to accept()d sockets.
In practice, I found that this was subtly broken in r10:779f106aa169 which moved setting so->so_options in sonewconn() to *after* pru_attach(). This made this effectively dead code, and we've spent the last ~3 years with this being dead code.
I sent a separate e-mail to @glebius to gather his thoughts on it, because the only effect I can see from the move across all in-tree pru_attach() that may be called from sonewconn() is that these two bits became dead code. Nothing else in-tree relies on an accurate so->so_options in pru_attach.
How is the default 2 minute "linger" timer being implemented now if this is dead code? iirc this value use to be used when a socket was closed on the server side to control how long one waited in close_wait for the client to finish up the connection (server initiated close.)
sys/netinet/tcp_usrreq.c | ||
---|---|---|
2566 | The fact that so_linger is NOT 0 becomes critical here, as it causes the else clause to trigger. This is the expected *default* behavior of TCP. If we do not initialize so_linger to TCP_LINGERTIME above this second condition, == 0, is always true, thus altering the expected default behavior of SO_LINGER if one does not also set the timer. |
sys/netinet/tcp_usrreq.c | ||
---|---|---|
2566 | I'm not sure what you're saying here. SO_LINGER cannot be set when the socket is initialized, and you can't set it post-creation without also specifying a timeout. If you have some out-of-band method to set it on a socket that doesn't pair those together, then that should be fixed to apply whatever TCP_LINGERTIME semantics may be needed; the bits I've removed here are not the droids you are looking for, though. |
Ping; we now again hit this path as of 504ebd612ec6 when we create a new connection from a listening socket with SO_LINGER set. I'm not aware of any other implementation that has this TCP_LINGERTIME type of concept that specifically applies to SO_LINGER, and especially not one that resets a 0 lingertime to any other value for those inherited from a listening socket -- but I only tested/inspected Illumos, and tested Linux.
looked into uipc_socket.c, sonewconn() does as you mention, and yes, I would agree that the expected behavior should be to retain the settings of the listening socket rather than change this... But I leave it to @tuexen to give the verdict on this.
Thanks; I think my description is clear enough, but I've also got this (suboptimal, but functional) demo of the issue (adapted from a previous reproducer I've sent, in case it looks familiar): https://people.freebsd.org/~kevans/list.c (cc -pthread)
I think clamping so_linger could make sense (my recollection is that somewhere around ~4 minutes is the most one should reasonably need to wait by spec once we've progressed to FIN_WAIT*), but it should be evenly enforced rather than just on new incoming connections.
This fixes actually a bug. With this patch the following packetdrill runs successfully:
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0.000 bind(3, ..., ...) = 0 +0.000 setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0 +0.000 listen(3, 1) = 0 +0.000 < S 0:0(0) win 65535 <mss 1460,sackOK,eol,eol> +0.000 > S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,eol,eol> +0.000 < . 1:1(0) ack 1 win 65535 +0.000 accept(3, ..., ...) = 4 +0.000 close(3) = 0 +0.000 getsockopt(4, SOL_SOCKET, SO_LINGER, {onoff=128, linger=0}, [8]) = 0 +0.000 close(4) = 0 +0.000 > R. 1:1(0) ack 1 win 0
Thanks for finding the bug and fixing it.