Page MenuHomeFreeBSD

Significantly speed up libthr/mutex_test and make more reliable

Authored by arichardson on Sep 18 2020, 10:29 AM.



Instead of using a simple global++ as the data race, with this change we
performs the increment by loading the global, delaying for a bit and then
storing back the incremented value. If I move the increment outside of the
mutex protected range, I can now see the data race with only 100 iterations
on amd64 in almost all cases. Before such a broken test almost always
passed with < 100,000 iterations and only reliably failed with the current
limit of 10 million.

I noticed this poorly written test because the mutex:mutex{2,3} and
timedmutex:mutex{2,3} tests were always timing out on our CheriBSD Jenkins.
Writing good concurrency tests is hard so I won't attempt to do so, but
this change should make the test more likely to fail if pthread_mutex_lock
is not implemented correctly while also significantly reducing the time it
takes to run these four tests. It will also reduce the time it takes to
perform QEMU RISC-V testsuite runs by almost 40 minutes (out of currently
7 hours).

Diff Detail

No Linters Available
No Unit Test Coverage
Build Status
Buildable 33621
Build 30866: arc lint + arc unit

Event Timeline

arichardson created this revision.

These are in contrib/netbsd-tests; will you be submitting this upstream? I note that our vendor copy is ancient; mutex6 was deleted upstream in December 2017, and the workaround for PR 44387 on PowerPC removed in March 2017.

Please add __FreeBSD__ around any test modifications; it makes it easier for folks to upstream the changes as we're 3 years out of date.

This is where the sources live upstream: .

This revision is now accepted and ready to land.Sep 29 2020, 10:11 PM

This is misindented.


strto*l is preferred; also: what if the number of iterations was <= 0?


The messages could be a bit more complete. Ideally we shouldn't have to read source code to understand errors.

arichardson added inline comments.

Yes that would be nice, but most failures in this test (and most other tests) do require looking at the source code.

This error will print "count != -1: <value>" which is marginally better than the existing checks in this file such as ATF_REQUIRE_EQ(x, 21); -> "x != 21"
ATF_REQUIRE_EQ(*(int *)joinval, 21); -> "*(int *)joinval != 21"

  • fix indentation and avoid negative num_iterations
This revision now requires review to proceed.Sep 30 2020, 9:39 PM

I can also drop the getenv part of the patch? This was mostly to allow testing the duration and effectiveness of different numbers of iterations.

Non-blocking thought: can this use sem_post/sem_(timed)?wait instead of spinning in busy-loops waiting for threads to start?

This revision is now accepted and ready to land.Nov 18 2020, 5:51 AM

Printing out the line number might be helpful in this case along with a more humanized description, TBH.


We could modify ATF_REQUIRE_EQ_MSG to also print the location information?

In D26473#608811, @ngie wrote:

Non-blocking thought: can this use sem_post/sem_(timed)?wait instead of spinning in busy-loops waiting for threads to start?

That is probably nicer, but it adds a dependency on working sem_* implementations. I'd be tempted to keep the busy-wait.