
tests/netlink: fix flaky netlink_socket:overflow
Needs Review (Public)

Authored by olivier on Mar 12 2026, 11:50 PM.

Reviewers
glebius
melifaro
Summary

Three bugs in the overflow test caused intermittent failures:

  1. The send buffer accepts exactly sendspace bytes before returning EAGAIN, so sizeof(hdr) * cnt can equal sendspace. The assertion sizeof(hdr) * cnt > sendspace was therefore too strict; change it to >=.
  2. The fullsocket() loop exited when recvavail > recvspace - rsize, but the kernel's taskqueue only stops when the receive buffer is at or past its hiwat (recvavail >= recvspace). Exiting early left the taskqueue running, so it could free a send buffer slot between fullsocket() returning and the blocking send(), letting that send succeed when it should have blocked. Wait until recvavail >= recvspace instead.
  3. The kernel uses ignore_limit=true when writing replies, so the receive buffer can overshoot its hiwat by up to one full reply's worth of bytes. A single recv(buf, BUFLEN=1000) call may not consume enough data to bring the buffer back below hiwat, leaving the taskqueue stuck. Replace the single recv() with a drain loop that reads until FIONREAD < SO_RCVBUF, so the taskqueue can resume and drain the send buffer before the subsequent blocking send().
Test Plan

This test behaves differently on different servers:
On most servers and VMs, it passed reliably.
On our CI, it was flaky, failing about once every 50 runs.
On one large server, it always failed with:

$ kyua test sys/netlink/netlink_socket:overflow
sys/netlink/netlink_socket:overflow  ->  failed: /usr/src/tests/sys/netlink/netlink_socket.c:109: sizeof(hdr) * cnt > sendspace not met  [0.112s]

On two other, older servers, it always failed with:

$ kyua test sys/netlink/netlink_socket:overflow
sys/netlink/netlink_socket:overflow  ->  failed: /usr/src/tests/sys/netlink/netlink_socket.c:149: send(fd, &hdr, sizeof(hdr), 0) == sizeof(hdr) not met  [2.237s]