Page MenuHomeFreeBSD

syslogd/tests: Address races
AcceptedPublic

Authored by markj on Mon, Jan 19, 3:41 PM.
Tags
None
Referenced Files
F142525057: D54779.diff
Tue, Jan 20, 3:26 PM
F142445529: D54779.diff
Tue, Jan 20, 1:08 AM
Unknown Object (File)
Mon, Jan 19, 9:36 PM
Unknown Object (File)
Mon, Jan 19, 9:23 PM
Unknown Object (File)
Mon, Jan 19, 8:04 PM
Unknown Object (File)
Mon, Jan 19, 7:31 PM
Subscribers

Details

Reviewers
jfree
jlduran
Summary

I occasionally see failures in the syslogd test suite. The problem is
that the tests are racy: they often 1) send a message using logger(1),
then 2) immediately check whether the message was logged to a log file.
If the syslogd instance under test doesn't get a chance to run before
step 2 the test fails.

This change reworks things to avoid the race while minimizing the amount
of time sleeping.

  1. Each test uses a single logfile, so have them use a new common variable, SYSLOGD_LOGFILE, instead of something test-specific.
  2. In syslogd_start(), if the configuration references SYSLOGD_LOGFILE, wait for it to be created by syslogd before returning. Record its modification time.
  3. Add a helper syslogd_check_log(), to check for a given log entry in the last line of SYSLOGD_LOGFILE, instead of using atf_check directly.
  4. In syslogd_check_log(), wait for the modification time of the logfile to change relative to syslogd_startup or a previous syslogd_check_log() call, before checking for the desired log entry.

With this change, I was able to run the tests 1000 times in a loop with
4-way parallelism without seeing any test failures. Without the change
I usually get a failure within 10 loops.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 70000
Build 66883: arc lint + arc unit

Event Timeline

Nice! I may steal a few ideas for some other similar tests.
I made a small observation, that can be ignored, I'm just thrilled with my recent discovery of atf_check -r.

usr.sbin/syslogd/tests/syslogd_test_common.sh
196

I wonder if almost the entirety of this function can be replaced with:

atf_check -r 50 test $(cat MTIME) != ${mtime}
This revision is now accepted and ready to land.Mon, Jan 19, 8:27 PM
usr.sbin/syslogd/tests/syslogd_test_common.sh
196

Oh huh. I think I can use that to get rid of MTIME entirely: when we're looking for logfile entries, just do something like:

atf_check -r 5 -o match:"${msg}" tail -n 1 "${SYSLOGD_LOGFILE}"

I'll give that a try.

usr.sbin/syslogd/tests/syslogd_test_common.sh
196

This almost works, but there's a small issue: if the command times out, as it will in some syslogd tests which are marked expected-to-fail, the test result is "broken" instead of "expected failure".

I think I'll just modify those test cases to use syslogd_check_log_nopoll() instead, as the -r trick makes this patch much simpler.

Follow jlduran's suggestion of using atf_check -r.

This revision now requires review to proceed.Mon, Jan 19, 11:58 PM

Very nice!

usr.sbin/syslogd/tests/syslogd_test_common.sh
196

I think that is a bug in atf-check. Will investigate.

This revision is now accepted and ready to land.Tue, Jan 20, 1:37 AM
usr.sbin/syslogd/tests/syslogd_test_common.sh
196

Yeah, I tend to agree. I think if the global test timeout elapses, then the test should be marked broken regardless of whether it's expected to fail, but individual atf_check timeouts should be handled the same as if atf_check itself had failed.