
fusefs: fix intermittency in the BadServer.ShortWrite test
ClosedPublic

Authored by asomers on Oct 14 2025, 2:22 AM.

Details

Summary

This test implicitly depended on the order in which two threads
completed. If the test thread finished first, the test would pass. But
if the mock file system thread did, it would attempt to read from an
unmounted file system, and fail. As a result, the test would randomly
fail once out of every several thousand executions. Fix it by telling
the mock file system's event loop to exit without attempting to read any
more events.

Reported by: Siva Mahadevan <me@svmhdvn.name>
MFC after: 1 week
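
The race and the fix described in the summary can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the actual change: FakeDevice, FINAL_EVENT, run_loop, and exit_after_final are hypothetical names invented for illustration, and the unmount that the test thread would perform concurrently is simulated inline so that the losing side of the race happens deterministically.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical stand-in for the FUSE device: reading after unmount fails.
struct FakeDevice {
	std::deque<int> events;	// pending requests, identified by opcode
	bool mounted = true;
	int failed_reads = 0;	// reads attempted after the unmount

	std::optional<int> read_event() {
		if (!mounted) {
			failed_reads++;
			return std::nullopt;
		}
		if (events.empty())
			return std::nullopt;
		int ev = events.front();
		events.pop_front();
		return ev;
	}
};

// Hypothetical opcode marking the last request the test expects.
constexpr int FINAL_EVENT = 99;

// Model of the mock file system's event loop. Returns the number of
// events handled. When exit_after_final is true, the loop stops right
// after answering the final request and never reads the device again.
int run_loop(FakeDevice &dev, bool exit_after_final) {
	bool quit = false;
	int handled = 0;

	while (!quit) {
		std::optional<int> ev = dev.read_event();
		if (!ev)
			break;
		handled++;
		if (*ev == FINAL_EVENT) {
			// Simulate the test thread unmounting as soon as it
			// has seen the reply (the unlucky ordering).
			dev.mounted = false;
			if (exit_after_final)
				quit = true;
		}
	}
	return handled;
}
```

In this model, calling run_loop with exit_after_final set to false makes the loop attempt one more read after the unmount, which is the failure mode the summary describes. Setting it to true avoids that read regardless of which side finishes first.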

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

LGTM and makes sense. Tested in a loop for 15+ minutes and found no failures.

This revision was not accepted when it landed; it landed in state Needs Review. Oct 14 2025, 2:47 PM
This revision was automatically updated to reflect the committed changes.

@asomers As of f4f638eb23d770e19ede167908d8145b8851f835, this test is still failing intermittently in CI as seen here on aarch64: https://ci.freebsd.org/view/Test/job/FreeBSD-main-aarch64-test/1801/testReport/junit/sys.fs.fusefs/bad_server/main.

This also seems to be reproducible locally on amd64 with more than 20 minutes of testing (very intermittent, but it becomes clear when running on slower platforms). I'm not sure this fix is comprehensive, although it certainly did help reduce the intermittency.

You're right, Siva. I just reproduced the problem myself, with about 100,000 iterations. And I see the problem. I'll fix it.