resolv_test: Fix racy exit check, remove mutexes, and reduce output
ClosedPublic
Actions

Authored by arichardson on Mar 23 2021, 11:21 AM.

Details

Reviewers

jhb

Group Reviewers

tests

Commits

rGc5cde5af4b82: resolv_test: Fix racy exit check, remove mutexes, and reduce output
rG85425bdc5a80: resolv_test: Fix racy exit check, remove mutexes, and reduce output

Summary

Instead of polling nleft[i] (without appropriate memory barriers!) and
using sleep() to detect the exit just call pthread_join() on all threads.

Also replace the use of a mutex that guarding the increments with atomic
fetch_add. This should reduce the runtime of this test on SMP systems.

Finally, remove all the debug printfs unless DEBUG_OUTPUT is set in
the environment.

Test Plan

still fails sometimes on qemu (but maybe less often?)

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 38050
Build 34939: arc lint + arc unit

Event Timeline

arichardson created this revision.Mar 23 2021, 11:21 AM

Herald added a subscriber: imp. · View Herald TranscriptMar 23 2021, 11:21 AM

arichardson requested review of this revision.Mar 23 2021, 11:21 AM

Harbormaster completed remote builds in B38012: Diff 86190.Mar 23 2021, 11:21 AM

jhb added inline comments.Mar 23 2021, 7:54 PM

lib/libc/tests/resolv/resolv_test.c
203	I think if you cared about the error you could probably write this simpler as: atomic_add_explicit(&ask[i], 1, memory_order_relaxed); if (error == 0) atomic_add_explicit(&got[i], 1, memory_order_relaxed); else if (got[i] != 0) fprintf(stderr, ...); Not sure why you needed the fetchadd for ask[i], and I think the race you are worried about here doesn't matter in terms of reading got[i]?

I think the real problem might be QEMU's builtin DNS being flaky (or something wrong in the kernel). However, the nleft[i] race was pointed out by TSan.

lib/libc/tests/resolv/resolv_test.c
203	I guess could do that instead, but since fetchadd returns the old value, there's no need to load it again? Multiple threads are reading+writing to both ask[] and got[]. I think using `_Atomic(int) *` will ensure that the ++ is a fetchadd, but might as well be explicit. If we don't use atomics we could get lost updates.

jhb added inline comments.Mar 23 2021, 8:04 PM

lib/libc/tests/resolv/resolv_test.c
203	In the code above, you don't load got[i] twice as you either increment it for error == 0, or you check it against 0 for error != 0. I find the current code a bit confusing in that case (it intentionally adds 0 sometimes). It was a cute trick to avoid a branch perhaps under the mutex (but not really), but since you have to check error explicitly for the new stderr message, you might as well avoid the cute "add 0" trick. I understand you need atomics, but using 'fetchadd' instead of plain 'add' for ask[i] when you don't use the returned value is what I found confusing.