
Bugs and problems even in the base stack.
Closed, Public

Authored by rrs on May 25 2021, 12:06 PM.

Details

Summary

Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is that the left edge gets transmitted before the adjustments are made
to the send_map. This means that the right-edge bits must be added only if
the entire RSM is being retransmitted.
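
As a rough illustration of the rule above (not the actual rack code), the check can be sketched as follows. The struct, the flag name SK_HAD_PUSH, and the function name are all hypothetical stand-ins for whatever the send_map entry really records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_HAD_PUSH 0x01	/* hypothetical flag: the original send carried PSH */

/* Hypothetical, stripped-down send-map entry covering [start, end). */
struct sk_rsm {
	uint32_t start;
	uint32_t end;
	uint32_t flags;
};

/*
 * Decide whether a retransmission of the first len bytes of rsm should
 * carry the push bit. PSH belongs to the right edge of the entry, so it
 * is only added when the whole entry goes out again.
 */
static bool
retransmit_sets_push(const struct sk_rsm *rsm, uint32_t len)
{
	uint32_t rsm_len = rsm->end - rsm->start;

	assert(len <= rsm_len);
	return ((rsm->flags & SK_HAD_PUSH) != 0 && len == rsm_len);
}
```

With this shape, a retransmission that covers only the left part of an entry never picks up PSH, whether or not the send_map has been adjusted yet.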

syzkaller also continued to find a crash, for which Michael sent me the reproducer. It turns
out that the reproducer, run on the default (FreeBSD) stack, made the stack get into an ACK war with itself.
After fixing the reference issues in rack, the same ACK war was found in rack (and bbr). Basically,
what happens is that we go into the reassembly code and lose the FIN bit. The trick here is that we
should not go into the reassembly code if tlen == 0, i.e. the peer never sent any data.
That gets the proper action on the FIN bit, but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called after the FINs have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1), we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that it properly
recognizes the condition and drops into FIN_WAIT2, where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ACK is lost). Setting the fast_finwait2
timer can speed this up in testing.
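
The two decisions described above can be modelled in a few lines. This is only one reading of the description, not the kernel code: the state enum is a reduced subset, and wants_reassembly(), state_after_user_close(), needs_finwait2_timer() and the fins_exchanged flag are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Reduced set of TCP states, named after the usual TCPS_* constants. */
enum sk_state {
	SK_ESTABLISHED,
	SK_CLOSE_WAIT,
	SK_FIN_WAIT_1,
	SK_FIN_WAIT_2,
	SK_LAST_ACK
};

/*
 * Only hand a segment to reassembly when it carries payload and cannot
 * be taken in order. An empty segment (tlen == 0) has nothing to queue,
 * so its FIN bit is handled directly instead of being lost.
 */
static bool
wants_reassembly(int tlen, bool in_order)
{
	return (tlen > 0 && !in_order);
}

/*
 * Model of the user-close transition. fins_exchanged is an assumed flag
 * meaning the FINs were already exchanged before the user close ran.
 */
static enum sk_state
state_after_user_close(enum sk_state cur, bool fins_exchanged)
{
	switch (cur) {
	case SK_ESTABLISHED:
		return (SK_FIN_WAIT_1);
	case SK_CLOSE_WAIT:
		/* Without the extra check this is always LAST_ACK. */
		return (fins_exchanged ? SK_FIN_WAIT_2 : SK_LAST_ACK);
	default:
		return (cur);
	}
}

/* FIN_WAIT_2 is the state in which the idle timer must be armed. */
static bool
needs_finwait2_timer(enum sk_state st)
{
	return (st == SK_FIN_WAIT_2);
}
```

In this model the stuck case corresponds to state_after_user_close(SK_CLOSE_WAIT, true): it yields SK_FIN_WAIT_2, and needs_finwait2_timer() then reports that a timer is required, so the connection cannot sit without one.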

Test Plan

Use both the reproducer and the packetdrill script below to make sure that
all the issues are fixed.

r.c

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void) {
	struct sockaddr_in addr;
	int fd;

	if ((fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
		perror("socket");
		return (1);
	}
	/* Zero the address first so that sin_zero is not left uninitialized. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_len = sizeof(struct sockaddr_in);
	addr.sin_port = htons(1234);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* Bind to 127.0.0.1:1234; the same address is used as the destination. */
	if (bind(fd, (struct sockaddr *)&addr, (socklen_t)sizeof(struct sockaddr_in)) < 0) {
		perror("bind");
	}
	/* Zero-length send with MSG_EOF to our own address. */
	if (sendto(fd, NULL, 0, MSG_EOF, (struct sockaddr *)&addr, (socklen_t)sizeof(struct sockaddr_in)) < 0) {
		perror("sendto");
	}
	if (close(fd) < 0) {
		perror("close");
	}
	return (0);
}

Packet drill script

// Ensure that all relevant sysctl variables have their default values.
 0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
+0.00 `sysctl -w net.inet.tcp.recvspace=65536`
+0.00 `sysctl -w kern.ipc.maxsockbuf=2097152`
// Flush host cache.
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
// Ensure that the relevant sysctl variables have their value.
+0.00 `sysctl -w net.inet.tcp.udp_tunneling_port=9811`
+0.00 `sysctl -w net.inet.tcp.udp_tunneling_overhead=8`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_REMOTE_UDP_ENCAPS_PORT, [1], 4) = 0
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S  0:0(0) win 65535 <mss 1452,nop,wscale 6,sackOK,TS val 100 ecr 0>/udp(9811 > 1)
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1452,sackOK,eol,eol>/udp(1 > 9811)
+0.00 >  . 1:1(0) ack 1 win 65535/udp(9811 > 1)
// Verify that there are no errors pending at the socket layer.
+0.10 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
// Now it is in the ESTABLISHED state.
+0.00 send(3, ..., 1452, 0) = 1452
+0.00 > P. 1:1453(1452) ack 1 win 65535/udp(9811 > 1)
+0.10 <  . 1:1(0) ack 1453 win 32767/udp(1 > 9811)
+0.00 send(3, ..., 1452, 0) = 1452
+0.00 > P. 1453:2905(1452) ack 1 win 65535/udp(9811 > 1)
+0.00 < [1453:2905(1452)/udp(9811 > 1)] icmp unreachable frag_needed mtu 1300
+0.00 > P. 1453:2705(1252) ack 1 win 65535/udp(9811 > 1)
// The default stack and the RACK stack send the next segment immediately,
// whereas the BBR stack sends it after an RTT, which is 100 ms.
*     > P. 2705:2905(200) ack 1 win 65535/udp(9811 > 1)
+0.10 <  . 1:1(0) ack 2905 win 32767/udp(1 > 9811)
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 2905:2905(0) ack 1 win 65535/udp(9811 > 1)
+0.10 < F. 1:1(0) ack 2906 win 32767/udp(1 > 9811)
+0.00 >  . 2906:2906(0) ack 2 win 65535/udp(9811 > 1)
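
The segment sizes in the script follow from simple header arithmetic. The sketch below assumes IPv4 without IP options, data segments without TCP options (the SYN-ACK in the script does not offer timestamps), and an initial path MTU of 1500 bytes; the function and macro names are made up for this example.

```c
#define IPV4_HDR_LEN	20	/* IPv4 header without options */
#define UDP_HDR_LEN	 8	/* matches udp_tunneling_overhead=8 */
#define TCP_HDR_LEN	20	/* TCP header without options */

/* TCP payload that fits in one UDP-encapsulated packet for a given MTU. */
static int
tcp_over_udp_payload(int mtu)
{
	return (mtu - IPV4_HDR_LEN - UDP_HDR_LEN - TCP_HDR_LEN);
}
```

Under these assumptions tcp_over_udp_payload(1500) is 1452, the MSS and the size of the first two sends, and tcp_over_udp_payload(1300) is 1252, the size of the segment retransmitted after the ICMP message. The remaining 1452 - 1252 = 200 bytes go out in the following segment.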

Note that you may need to tweak the script for window scaling, depending on
your defaults. Also be aware that if you use a larger maxsockbuf (recv/send window), you
can end up in a situation where no TCP sockets work if the script leaves these values inconsistent,
i.e. a recv/send window larger than maxsockbuf.
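
The condition in the note can be written down as a small sanity check to run against the values you intend to set. This is a minimal sketch: the function name and parameters are hypothetical, and it only tests the coarse condition from the note (the kernel may apply a somewhat lower effective limit than maxsockbuf itself).

```c
#include <stdbool.h>

/*
 * Return true when both default window sizes fit under maxsockbuf,
 * i.e. the combination the note warns about is avoided.
 */
static bool
windows_fit(unsigned long maxsockbuf, unsigned long recvspace,
    unsigned long sendspace)
{
	return (recvspace <= maxsockbuf && sendspace <= maxsockbuf);
}
```

For the values in the script, windows_fit(2097152, 65536, sendspace) holds for any send window up to 2097152 bytes. If your system runs with a recvspace above that, lowering maxsockbuf to 2097152 as the script does would fail the check.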
