Page MenuHomeFreeBSD

Significantly speed up libthr/mutex_test and make more reliable

Authored by arichardson on Sep 18 2020, 10:29 AM.



Instead of using a simple global++ as the data race, with this change we
performs the increment by loading the global, delaying for a bit and then
storing back the incremented value. If I move the increment outside of the
mutex protected range, I can now see the data race with only 100 iterations
on amd64 in almost all cases. Before such a broken test almost always
passed with < 100,000 iterations and only reliably failed with the current
limit of 10 million.

I noticed this poorly written test because the mutex:mutex{2,3} and
timedmutex:mutex{2,3} tests were always timing out on our CheriBSD Jenkins.
Writing good concurrency tests is hard so I won't attempt to do so, but
this change should make the test more likely to fail if pthread_mutex_lock
is not implemented correctly while also significantly reducing the time it
takes to run these four tests. It will also reduce the time it takes to
perform QEMU RISC-V testsuite runs by almost 40 minutes (out of currently
7 hours).

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

arichardson created this revision.

These are in contrib/netbsd-tests; will you be submitting this upstream? I note that our vendor copy is ancient; mutex6 was deleted upstream in December 2017, and the workaround for PR 44387 on PowerPC removed in March 2017.

Please add __FreeBSD__ around any test modifications; it makes it easier for folks to upstream the changes as we're 3 years out of date.

This is where the sources live upstream: .

This revision is now accepted and ready to land.Sep 29 2020, 10:11 PM
206 ↗(On Diff #77593)

This is misindented.

324 ↗(On Diff #77593)

strto*l is preferred; also: what if the number of iterations was <= 0?

359–362 ↗(On Diff #77593)

The messages could be a bit more complete. Ideally we shouldn't have to read source code to understand errors.

arichardson added inline comments.
359–362 ↗(On Diff #77593)

Yes that would be nice, but most failures in this test (and most other tests) do require looking at the source code.

This error will print "count != -1: <value>" which is marginally better than the existing checks in this file such as ATF_REQUIRE_EQ(x, 21); -> "x != 21"
ATF_REQUIRE_EQ(*(int *)joinval, 21); -> "*(int *)joinval != 21"

  • fix indentation and avoid negative num_iterations
This revision now requires review to proceed.Sep 30 2020, 9:39 PM
324 ↗(On Diff #77593)

I can also drop the getenv part of the patch? This was mostly to allow testing the duration and effectiveness of different numbers of iterations.

Non-blocking thought: can this use sem_post/sem_(timed)?wait instead of spinning in busy-loops waiting for threads to start?

This revision is now accepted and ready to land.Wed, Nov 18, 5:51 AM
359–362 ↗(On Diff #77593)

Printing out the line number might be helpful in this case along with a more humanized description, TBH.

359–362 ↗(On Diff #77593)

We could modify ATF_REQUIRE_EQ_MSG to also print the location information?

In D26473#608811, @ngie wrote:

Non-blocking thought: can this use sem_post/sem_(timed)?wait instead of spinning in busy-loops waiting for threads to start?

That is probably nicer, but it adds a dependency on working sem_* implementations. I'd be tempted to keep the busy-wait.