pf: tests: Introduce wait_for_process()
Needs Review, Public

Authored by jlduran on Tue, Jan 13, 10:12 PM.
Tags
None

Details

Reviewers
kp
Group Reviewers
tests
Summary

Introduce a new function that waits for a process to start, with a
configurable timeout and polling interval. By default, it checks at
1-second intervals for up to 30 seconds.

This is an attempt to address sporadic failures in the pflog:rdr_action
test, which may be caused by an insufficient sleep while tcpdump starts.
The new function polls for the process instead of sleeping for a fixed
time, and its parameters can be tuned for different timing requirements.

The implementation uses pgrep to detect running processes and includes
proper timeout handling to prevent indefinite waiting, as well as an
optional jail parameter. There are many "sleep 1" calls in the test
suite that can be replaced with this helper function.
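
A minimal sketch of what such a helper could look like, based only on the summary above and not on the code in the diff. The argument order, the variable names, and the use of return (the update notes mention exiting instead) are assumptions; pgrep's -j option is the FreeBSD one, and the interval is assumed to be a whole number of seconds.

```sh
# Hypothetical sketch; signature and defaults are assumed from the summary.
wait_for_process()
{
	process="$1"
	timeout="${2:-30}"	# total seconds to wait (default 30)
	interval="${3:-1}"	# whole seconds between checks (default 1)
	jail="${4:-}"		# optional jail name or JID

	elapsed=0
	while [ "$elapsed" -lt "$timeout" ]; do
		# -x matches the process name exactly; -j (FreeBSD pgrep)
		# restricts the search to the given jail when one is set.
		if pgrep ${jail:+-j "$jail"} -x "$process" >/dev/null 2>&1; then
			return 0
		fi
		sleep "$interval"
		elapsed=$((elapsed + interval))
	done

	echo "timed out waiting for $process" >&2
	return 1
}
```

Under these assumptions, a test could call `wait_for_process tcpdump` where it currently has a "sleep 1" after starting tcpdump in the background, or `wait_for_process tcpdump 10 1 alcatraz` for a process in a jail (the jail name is illustrative).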

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 69866
Build 66749: arc lint + arc unit

Event Timeline

  • Simplify jail processing
  • Exit rather than return

This might be overkill, but I would like to know whether a helper function like this could reduce flakiness in some tests. I ran this test natively on a very fast aarch64 machine and it did not fail once. Comparing the output of a successful run with that of a failed one makes me wonder whether the test is not sleeping long enough:

For example:
Reference GOOD: https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/27610/testReport/junit/sys.netpfil.pf/pflog/rdr_action/ (galahad2.nyi.freebsd.org)
Reference BAD: https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/27611/testReport/junit/sys.netpfil.pf/pflog/rdr_action/ (galahad2.nyi.freebsd.org)

I'm not sure this is sufficient. It is still possible for tcpdump to have started, but not gotten to the point of actually opening the pflog device.

I've had a very quick look at the tcpdump code, and I think it looks like we could potentially rely on the existence of a capture file (so -w <file>). I think that file only gets created after we've done pcap_open()/pcap_setfilter(). That'd require a bit of test-reworking though, because we'd be saving a pcap file, which we'd have to translate back to text to run the checks on.
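
A rough sketch of that idea, assuming the capture file really does appear only once tcpdump is ready to capture (not verified here). The wait_for_file helper, its defaults, and the file names are hypothetical; the tcpdump flags used in the comments (-n, -i, -U, -w, -r) are standard ones.

```sh
# Hypothetical helper: poll until a file exists, checking every
# $interval seconds and giving up after $timeout seconds.
wait_for_file()
{
	file="$1"
	timeout="${2:-30}"	# total seconds to wait (assumed default)
	interval="${3:-1}"	# whole seconds between checks (assumed default)

	elapsed=0
	while [ ! -e "$file" ]; do
		if [ "$elapsed" -ge "$timeout" ]; then
			echo "timed out waiting for $file" >&2
			return 1
		fi
		sleep "$interval"
		elapsed=$((elapsed + interval))
	done
	return 0
}

# Possible use in a test (interface and file names are illustrative):
#   tcpdump -n -i pflog0 -U -w capture.pcap &
#   wait_for_file capture.pcap
#   ... generate traffic, then stop tcpdump ...
#   tcpdump -n -r capture.pcap > capture.txt
```

The last step is the translation back to text mentioned above: the existing checks would run against the output of tcpdump -r rather than against live tcpdump output.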

tests/sys/netpfil/pf/pflog.sh, line 384

I'd keep this in. It can be useful when debugging test failures.

In D54695#1249657, @kp wrote:

I'm not sure this is sufficient. It is still possible for tcpdump to have started, but not gotten to the point of actually opening the pflog device.

I've had a very quick look at the tcpdump code, and I think it looks like we could potentially rely on the existence of a capture file (so -w <file>). I think that file only gets created after we've done pcap_open()/pcap_setfilter(). That'd require a bit of test-reworking though, because we'd be saving a pcap file, which we'd have to translate back to text to run the checks on.

Ah, yes! I'll experiment with -w <file> and atf_check -r instead.