Page MenuHomeFreeBSD

tests/kern/ssl_sendfile: fix 'random' and 'basic' flakyness
ClosedPublic

Authored by glebius on Mon, Mar 9, 10:16 PM.
Tags
None
Referenced Files
F151055462: D55781.diff
Sun, Apr 5, 4:49 PM
Unknown Object (File)
Sun, Apr 5, 8:38 AM
Unknown Object (File)
Sun, Apr 5, 3:49 AM
Unknown Object (File)
Fri, Apr 3, 11:53 PM
Unknown Object (File)
Thu, Apr 2, 1:15 AM
Unknown Object (File)
Wed, Apr 1, 11:33 AM
Unknown Object (File)
Thu, Mar 26, 6:41 AM
Unknown Object (File)
Wed, Mar 25, 9:36 AM
Subscribers

Details

Summary

The basic and random test cases checked c->sbytes immediately after draining all data from SSL_read, without waiting for the server thread to return from SSL_sendfile().
On a loopback connection, the client can consume all data from the kernel receive buffer before SSL_sendfile() returns on the server side, leaving c->sbytes unset at the point of the ATF_REQUIRE check.

Add wait_server() which holds the mutex and waits on the condvar until the server thread transitions out of RUNNING state, ensuring c->sbytes is stable before it is read.

This fixes a flaky failure appearing roughly once every 40 runs with error:

tests/sys/kern/ssl_sendfile.c:327: c.sbytes == (ssize_t)expect not met
Test Plan

Initialy I’ve increased the failure rate with a small patch by adding usleep(1) temporarily in the test:

ATF_REQUIRE(nread == expect);
usleep(1);  /* widen the race window */
ATF_REQUIRE(c.sbytes == (ssize_t)expect);

With this patch applied, I’ve ran stress-ng to load all my core stress-ng --cpu $(nproc) while running the test in loop thousand of time while sudo cpuset -l 0 kyua test sys/kern/ssl_sendfile:random; do :; done

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 71318
Build 68201: arc lint + arc unit

Event Timeline

olivier created this revision.
olivier edited the test plan for this revision. (Show Details)

Thanks a lot for the diagnosis and fix. Note that truncate, grow and offset_beyond_eof tests already have a very similar race protection. What about making the epilogue of those tests common? I'd suggest to combine the wait and the sbytes check into a single function.

glebius edited reviewers, added: olivier; removed: glebius.
glebius retitled this revision from tests/ssl_sendfile: fix race between client read and c->sbytes check to tests/kern/ssl_sendfile: fix 'random' and 'basic' flakyness.Tue, Mar 10, 5:47 PM
glebius edited the test plan for this revision. (Show Details)

This isn't actually a true race, but an unsynchronized read. No need to
sleep, just flush the caches.

This revision was not accepted when it landed; it landed in state Needs Review.Tue, Mar 10, 9:04 PM
This revision was automatically updated to reflect the committed changes.