
tcp: don't ever return ECONNRESET on close(2)
Closed, Public

Authored by glebius on Dec 19 2024, 4:09 PM.

Details

Summary

The SUS doesn't list this error code as a possible one for close(2) [1]. The
FreeBSD manual page, however, specifies a possible ECONNRESET for close(2):

[ECONNRESET]  The underlying object was a stream socket that was
              shut down by the peer before all pending data was
              delivered.

In the past it had been EINVAL (see 21367f630d72), and this EINVAL was
added as a safety measure in 623dce13c64ef. After the conversion to
ECONNRESET it was documented in the manual page in 78e3a7fdd51e6, but I
bet it was never tested to actually be returned, because the
tcp-testsuite [2] didn't exist back then. So the documentation has been
incorrect since 2006, if my bet wins. Anyway, in modern FreeBSD the
condition described above doesn't result in an ECONNRESET error code from
close(2). The error condition is reported via the SO_ERROR socket option,
though. This can be checked with the tcp-testsuite by temporarily deleting
the getsockopt(SO_ERROR) lines with the sed command in [3]; since reading
SO_ERROR clears the stored error, this leaves the error pending when the
socket is closed. Most of these getsockopt(2) calls are followed by
'+0.00 close(3) = 0', which confirms that close(2) doesn't return
ECONNRESET even on a socket that has the error stored, nor in the case
described in the manual page. The latter case is covered by multiple tests
residing in tcp-testsuite/state-event-engine/rcv-rst-*.
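The check described above can be approximated outside the testsuite with a plain C program. The sketch below is an illustration under assumptions, not a test from tcp-testsuite: it builds a loopback TCP connection, has the peer abort it (SO_LINGER with a zero timeout, which makes the peer's close(2) send an RST), and then looks at the other end. With read_so_error set to 0 it closes the socket while the error is still stored, which is what deleting the getsockopt(SO_ERROR) lines with [3] achieves in the testsuite. The helper names and the 2-second poll timeout are made up for the example.

```c
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connected loopback pair, *server comes from accept(2). */
static int
make_loopback_pair(int *client, int *server)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int lfd, cfd = -1, sfd = -1;

	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* port 0: kernel picks one */
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lfd, 1) == -1 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) == -1 ||
	    (cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
	    connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    (sfd = accept(lfd, NULL, NULL)) == -1) {
		if (cfd != -1)
			close(cfd);
		close(lfd);
		return (-1);
	}
	close(lfd);
	*client = cfd;
	*server = sfd;
	return (0);
}

/* Close fd with SO_LINGER { 1, 0 } so that the kernel sends an RST. */
static void
abort_connection(int fd)
{
	struct linger l = { .l_onoff = 1, .l_linger = 0 };

	(void)setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
	(void)close(fd);
}

/*
 * The peer resets the connection, then we (optionally) read SO_ERROR and
 * close our end. Returns close(2)'s result, or -2 if the setup failed.
 */
static int
close_after_peer_reset(int read_so_error, int *so_error, int *close_errno)
{
	struct pollfd pfd;
	socklen_t len = sizeof(*so_error);
	int client, server, rc;

	if (make_loopback_pair(&client, &server) == -1)
		return (-2);
	abort_connection(server);

	/* Wait up to 2 s for the RST; POLLERR/POLLHUP are reported regardless. */
	pfd.fd = client;
	pfd.events = POLLIN;
	(void)poll(&pfd, 1, 2000);

	/* Reading SO_ERROR reports the stored error and clears it. */
	*so_error = 0;
	if (read_so_error)
		(void)getsockopt(client, SOL_SOCKET, SO_ERROR, so_error, &len);

	/* close(2) is expected to succeed either way. */
	rc = close(client);
	*close_errno = (rc == -1) ? errno : 0;
	return (rc);
}
```

Per the summary, on modern FreeBSD both variants should see close(2) return 0, while the SO_ERROR variant reads ECONNRESET.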

However, the deleted block of code could still be entered due to a race
between close(2) and the processing of an incoming packet, when the
connection had already been half-closed with shutdown(SHUT_WR) and was
sitting in TCPS_LAST_ACK. This was reported in bug 146845. With the block
deleted, we continue into tcp_disconnect(), which handles INP_DROPPED
properly.

The race goes as follows. The connection is in TCPS_LAST_ACK. The
network input thread acquires the tcpcb lock first, sets INP_DROPPED,
acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED.
Meanwhile, the syscall thread goes through sodisconnect(), which checks
SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on
the tcpcb lock in tcp_usr_disconnect(). Once the input thread releases the
lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.

  • Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
  • Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()

Note that the lockless operation in sodisconnect() isn't correct, but
enforcing the socket lock there will not fix the problem.
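The interleaving above can be replayed deterministically. This sketch uses no real threads or kernel structures: struct conn_model, replay_close_race() and usr_disconnect() are hypothetical, and the two booleans only stand in for SS_ISCONNECTED and INP_DROPPED. Each step runs in the order given in the race explanation, so the old and the patched tcp_usr_disconnect() behavior can be compared. It also shows why a locked check does not help in this model: at the moment of the check the flag has simply not been cleared yet.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two flags involved in the race. */
struct conn_model {
	bool	ss_isconnected;	/* stands in for SS_ISCONNECTED (socket) */
	bool	inp_dropped;	/* stands in for INP_DROPPED (inpcb) */
};

/* tcp_usr_disconnect() once thread 2 finally holds the tcpcb lock. */
static int
usr_disconnect(const struct conn_model *c, bool patched)
{
	if (!patched && c->inp_dropped)
		return (ECONNRESET);	/* the block this revision deletes */
	/* tcp_disconnect() copes with INP_DROPPED; modelled as success. */
	return (0);
}

/*
 * Replays Thread 1 (tcp_do_segment()->tcp_close()) against Thread 2
 * (sys_close()...->soclose()->sodisconnect()). Returns what close(2)
 * would report in this model.
 */
static int
replay_close_race(bool patched)
{
	struct conn_model c = { .ss_isconnected = true, .inp_dropped = false };

	/* Thread 1 holds the tcpcb lock and runs in_pcbdrop(). */
	c.inp_dropped = true;
	/*
	 * Thread 2: sodisconnect() checks SS_ISCONNECTED. It is still set, so
	 * the check passes; a check under the socket lock reads the same value.
	 */
	if (!c.ss_isconnected)
		return (0);
	/* Thread 2 blocks on the tcpcb lock; Thread 1 runs soisdisconnected(). */
	c.ss_isconnected = false;
	/* Thread 1 releases the tcpcb lock; Thread 2 enters tcp_usr_disconnect(). */
	return (usr_disconnect(&c, patched));
}
```

In this replay, replay_close_race(false) yields ECONNRESET, the error from bug 146845, and replay_close_race(true) yields 0.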

[1] https://pubs.opengroup.org/onlinepubs/9799919799/
[2] https://github.com/freebsd-net/tcp-testsuite
[3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)

PR: 146845

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

This revision is now accepted and ready to land. (Dec 19 2024, 4:50 PM)

I'm still wondering whether close() returns ECONNRESET if you call it with the linger option enabled and the TCP stack receives an RST segment. I can test that over the weekend, but not today or tomorrow. If it does, that would mean ECONNRESET needs to be listed in the close() man page.
We also need to check SCTP. I will look into this and report back by Monday.

imp accepted this revision. (Edited Dec 19 2024, 6:00 PM)

IEEE Std 1003.1™-2024 (aka POSIX.1-2024) says:

The close() and posix_close() functions shall fail if:
[EBADF] The fildes argument is not an open file descriptor.
[EINPROGRESS] The function was interrupted by a signal and fildes was closed but the close operation is continuing asynchronously.
The close() and posix_close() functions may fail if:
[EINTR] The function was interrupted by a signal, POSIX_CLOSE_RESTART is defined as non-zero, and (in the case of posix_close()) the flag argument included POSIX_CLOSE_RESTART, in which case fildes is still open.
[EIO] An I/O error occurred while reading from or writing to the file system.

That's all the standard requires.
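The quoted list is short and closed. As a hypothetical application-side helper (all names are invented for illustration), it can be written as a lookup that also records what the excerpt says about fildes after each error. ECONNRESET falls through to "not listed".

```c
#include <errno.h>

/* What the quoted POSIX.1-2024 excerpt says about fildes after each error. */
enum close_fd_state {
	CLOSE_ERR_NOT_LISTED,	/* not an error the excerpt lists for close() */
	CLOSE_FD_WAS_NOT_OPEN,	/* EBADF: fildes was not an open descriptor */
	CLOSE_FD_CLOSED,	/* EINPROGRESS: closed, closing continues async */
	CLOSE_FD_STILL_OPEN,	/* EINTR (POSIX_CLOSE_RESTART case): still open */
	CLOSE_FD_UNSPECIFIED	/* EIO: the excerpt does not say */
};

static enum close_fd_state
classify_close_errno(int err)
{
	switch (err) {
	case EBADF:
		return (CLOSE_FD_WAS_NOT_OPEN);
	case EINPROGRESS:
		return (CLOSE_FD_CLOSED);
	case EINTR:
		return (CLOSE_FD_STILL_OPEN);
	case EIO:
		return (CLOSE_FD_UNSPECIFIED);
	default:
		return (CLOSE_ERR_NOT_LISTED);	/* e.g. ECONNRESET */
	}
}
```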

The 2024 standard does talk about the SO_ERROR case returning ECONNRESET (via an asynchronous error), but I didn't delve deeply enough to see whether there are some buried weasel words saying these errors might also be returned by close(). ECONNRESET is a documented return value of connect(), read(), recv(), recvfrom(), recvmsg(), send(), sendmsg(), sendto(), and write(). It was added in Issue 6, which is POSIX.1-2001, IIRC. The only socket-related requirement close() documents is that it must honor the SO_LINGER option if set.
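For comparison with close(), here is a sketch of a path where ECONNRESET is documented: reading from a socket whose peer has reset the connection. It repeats a loopback setup so that it compiles on its own; the helper names are hypothetical. The stored error is reported once by recv(2) and cleared, so a later getsockopt(SO_ERROR) is expected to read 0.

```c
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connected loopback pair, *server comes from accept(2). */
static int
make_loopback_pair(int *client, int *server)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int lfd, cfd = -1, sfd = -1;

	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* port 0: kernel picks one */
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lfd, 1) == -1 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) == -1 ||
	    (cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
	    connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    (sfd = accept(lfd, NULL, NULL)) == -1) {
		if (cfd != -1)
			close(cfd);
		close(lfd);
		return (-1);
	}
	close(lfd);
	*client = cfd;
	*server = sfd;
	return (0);
}

/*
 * The peer aborts (SO_LINGER { 1, 0 } makes its close(2) send an RST), then
 * we recv(2). Returns recv's errno (0 if it did not fail, -1 if the setup
 * failed) and stores the SO_ERROR value read afterwards in *so_error_after.
 */
static int
recv_errno_after_peer_reset(int *so_error_after)
{
	struct linger l = { .l_onoff = 1, .l_linger = 0 };
	struct pollfd pfd;
	socklen_t len = sizeof(*so_error_after);
	char c;
	int client, server, err;

	if (make_loopback_pair(&client, &server) == -1)
		return (-1);
	(void)setsockopt(server, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
	(void)close(server);

	/* Wait up to 2 s until the RST has been processed. */
	pfd.fd = client;
	pfd.events = POLLIN;
	(void)poll(&pfd, 1, 2000);

	/* recv(2) reports the stored error once and clears it... */
	err = (recv(client, &c, 1, 0) == -1) ? errno : 0;
	/* ...so SO_ERROR is expected to read 0 afterwards. */
	*so_error_after = -1;
	(void)getsockopt(client, SOL_SOCKET, SO_ERROR, so_error_after, &len);
	(void)close(client);
	return (err);
}
```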

In D48148#1097931, @imp wrote:

That's all the standard requires.

I guess they are not dealing with file descriptors related to reliable network sockets. For example, what is indicated if you call close() on a TCP socket with the linger time set to 1 second, but the TCP connection is not terminated within that time due to retransmissions? I would expect something like EWOULDBLOCK or ETIMEDOUT, but I will test this. Unix Network Programming, 3rd edition, says that EWOULDBLOCK should be indicated, whereas POSIX says: "The close() and posix_close() functions shall not return an [EAGAIN] or [EWOULDBLOCK] error."
I will write some tests and report back, but it won't be before the weekend.