Instead of using a simple global++ as the data race, with this change we
perform the increment by loading the global, delaying for a bit and then
storing back the incremented value. If I move the increment outside of the
mutex-protected region, I can now see the data race with only 100 iterations
on amd64 in almost all cases. Before this change, such a broken test almost
always passed with < 100,000 iterations and only failed reliably with the
current limit of 10 million.
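
The load, delay, store increment described above could be sketched roughly
as follows. This is a minimal illustration, not the code from this change:
the names worker, worker_args and run_increment_test are hypothetical, and
sched_yield() is only an assumed stand-in for "delaying for a bit". Passing
use_mutex = 0 gives the deliberately broken variant whose lost updates the
widened window is meant to expose.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int global_x;	/* shared counter updated by both threads */

struct worker_args {
	int iterations;		/* increments performed by each thread */
	int use_mutex;		/* 0 = intentionally racy variant */
};

static void *
worker(void *p)
{
	const struct worker_args *args = p;

	for (int i = 0; i < args->iterations; i++) {
		if (args->use_mutex)
			pthread_mutex_lock(&lock);
		int tmp = global_x;	/* load the shared value */
		sched_yield();		/* assumed delay: widen the race window */
		global_x = tmp + 1;	/* store back the incremented value */
		if (args->use_mutex)
			pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/* Runs two workers and returns the final value of the counter. */
static int
run_increment_test(int iterations, int use_mutex)
{
	struct worker_args args = { iterations, use_mutex };
	pthread_t a, b;
	int rc;

	global_x = 0;
	rc = pthread_create(&a, NULL, worker, &args);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, worker, &args);
	assert(rc == 0);
	rc = pthread_join(a, NULL);
	assert(rc == 0);
	rc = pthread_join(b, NULL);
	assert(rc == 0);
	(void)rc;
	return global_x;
}
```

With a correct pthread_mutex_lock, run_increment_test(100, 1) should return
200. With use_mutex = 0 the two threads can overwrite each other's stores,
so the result is usually lower, though that outcome is not guaranteed.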

I noticed this poorly written test because the mutex:mutex{2,3} and
timedmutex:mutex{2,3} tests were always timing out on our CheriBSD Jenkins.
Writing good concurrency tests is hard, so I won't attempt to do so, but
this change should make the test more likely to fail if pthread_mutex_lock
is not implemented correctly while also significantly reducing the time it
takes to run these four tests. It will also reduce the time it takes to
perform QEMU RISC-V test suite runs by almost 40 minutes (out of the
current 7 hours).