Page MenuHomeFreeBSD

tests/carp: Rework unicast_v4
ClosedPublic

Authored by zlei on Fri, May 1, 7:40 PM.
Tags
None
Referenced Files
F157165996: D56761.id177048.diff
Mon, May 18, 11:03 PM
F157165775: D56761.id177048.diff
Mon, May 18, 11:00 PM
Unknown Object (File)
Sun, May 17, 10:07 PM
Unknown Object (File)
Sun, May 17, 9:19 AM
Unknown Object (File)
Sun, May 17, 9:12 AM
Unknown Object (File)
Sat, May 16, 11:18 PM
Unknown Object (File)
Sat, May 16, 5:14 PM
Unknown Object (File)
Thu, May 14, 4:04 PM

Details

Summary

For unicast tests, it is sufficient to use wait_for_carp() to verify
the setup is sane. Additional sanity checks are not necessarily
required but can serve purpose for redundancy.

For some unclear reason routed(8) is advertising route to carp BACKUP.
That makes the test flaky. Also routed(8) is marked deprecated and may
be removed from base in the future. Let's just add static route entry
manually for additional sanity checks.

PR: 294817
Fixes: 93fbdef51a13 tests: carp: Update test case unicast_v4 to catch PR 284872
MFC after: 3 days

Test Plan
# kyua debug carp:unicast_v4
Executing command [ sysctl -j carp_uni_v4_one net.inet.ip.forwarding=1 ]
Executing command [ ifconfig -j carp_uni_v4_one epair0a inet 198.51.100.1/25 ]
Executing command [ ifconfig -j carp_uni_v4_one epair1a inet 198.51.100.129/25 ]
Executing command [ ifconfig -j carp_uni_v4_two epair0b 198.51.100.2/25 up ]
Executing command [ route -j carp_uni_v4_two -n add default 198.51.100.1 ]
Executing command [ ifconfig -j carp_uni_v4_three epair1b 198.51.100.224/25 up ]
Executing command [ route -j carp_uni_v4_three -n add default 198.51.100.129 ]
Executing command [ jexec carp_uni_v4_one ping -c 1 198.51.100.2 ]
Executing command [ jexec carp_uni_v4_one ping -c 1 198.51.100.224 ]
Executing command [ jexec carp_uni_v4_two ping -c 1 198.51.100.224 ]
Executing command [ ifconfig -j carp_uni_v4_two epair0b add vhid 1 peer 198.51.100.224 192.0.2.1/32 ]
Executing command [ ifconfig -j carp_uni_v4_three epair1b add vhid 1 peer 198.51.100.2 192.0.2.1/32 ]
	carp: MASTER vhid 1 advbase 1 advskew 0
Executing command [ route -j carp_uni_v4_one -n add 192.0.2.1 198.51.100.2 ]
Executing command [ jexec carp_uni_v4_one ping -c 1 192.0.2.1 ]
Executing command [ ifconfig -j carp_uni_v4_two epair0b vhid 1 advskew 2 ]
Executing command [ ifconfig -j carp_uni_v4_two epair0b ]
ifconfig: interface epair0b does not exist
ifconfig: interface epair1b does not exist
carp:unicast_v4  ->  passed

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

zlei requested review of this revision.Fri, May 1, 7:40 PM
tests/sys/netinet/carp.sh
180–182

Is there any reason not to use atf_check for all of these configuration lines? That would make it easier to catch typos, unexpected failures, etc..

192

A general comment: -s exit:0 is redundant, that's already the default. I'd avoid adding it because it just makes the line harder to read.

I can confirm that with this patch the test appears to be stable, thanks!

tests/sys/netinet/carp.sh
180–182

Many many tests have similar configuration steps, so I believe ifconfig and route have been widely and extensively tested, they should be very very unlikely to fail.

Apart from that, I think the tests in base shall be ordered as such,

  1. libc
  2. other libraries
  3. bin / sbin / usr.bin / usr.sbin
  4. contrib ?
  5. libexec
  6. sys/... ( other kernel parts )

The basic principle is, the dependency should be tested firstly, then the dependants.

192

Oh, I'll update.

tests/sys/netinet/carp.sh
180–182

fwiw, i always use atf_check even for basic setup stuff (at least when i remember) because even though it's unlikely the test will run into an ifconfig bug, it's possible the test itself might have a bug that causes ifconfig to fail in some unanticipated situations, and failing earlier makes it much easier to debug the test. also, this catches setup failures that cause the test to wrongly pass instead of fail.

zlei edited the test plan for this revision. (Show Details)

Removed redundant -s exit:0 to make the code easier to read.

tests/sys/netinet/carp.sh
180–182

Yes, kernel bugs/race conditions etc. can cause all sorts of odd failures.

For the past while I've been trying to get the regression test suite to work reliably when run in parallel. I can run the full test suite with -v parallelism=16 on my ryzen in about 23 minutes now, but there's still a small number of flaky tests. It's much easier to debug flaky tests when they make use of atf_check everywhere, as that makes test logs more complete and helps catch unexpected failures.

pouria added inline comments.
tests/sys/netinet/carp.sh
180–182

+1
I also prefer to use -j whenever possible to make lines shorter.
for example:
ifconfig -j carp_uni_v4_two ${epair_one}b 198.51.100.2/25 up
Of course this is not a normal style that people follow.

192

+1

202

We are waiting for carp below, why not sleep 0.01?

206–208

Normal style is to keep if and then on the same line.

tests/sys/netinet/carp.sh
180–182

because even though it's unlikely the test will run into an ifconfig bug, it's possible the test itself might have a bug that causes ifconfig to fail

I think that is debatable.

Many years ago, I asked a question to my team, " How would you ensure that the tests you wrote is bugfree/bugless ? " No simple answer.

Back to this test, the basic setup is a combination of epair, jail, vnet, ifconfig, sysctl, route, which are extensively tested. Also that is not part of pre-checking or post-validation, so more atf_check sound a bit waste.

If that really matters, I think I'd propose to wrap those steps to some common function, just like carp_init and vnet_mkjail. So the main logic of the testcase is much clear to read / review.

tests/sys/netinet/carp.sh
180–182

+1
I also prefer to use -j whenever possible to make lines shorter.
for example:
ifconfig -j carp_uni_v4_two ${epair_one}b 198.51.100.2/25 up
Of course this is not a normal style that people follow.

Actually sysctl also supports the -j option ;)

zlei marked an inline comment as done.Fri, May 1, 9:06 PM
zlei added inline comments.
tests/sys/netinet/carp.sh
202

We are waiting for carp below, why not sleep 0.01?

See 27ff90cd3d8d (tests/carp: make a 0.2 second pause before configuring second jail).

I have an idea to sleep in wait_for_carp() instead.

If there is a small window to have two carp MASTER, then eventually one of them shall become BACKUP. That should be rare, so I think this should be better,

--- a/tests/sys/netinet/carp.sh
+++ b/tests/sys/netinet/carp.sh
@@ -46,6 +46,9 @@ wait_for_carp()
                sleep 0.1
        done
 
+       # It is potential to have two carp MASTER, but eventually
+       # one become MASTER and another become BACKUP.
+       sleep 0.2
        if [ -n "$(is_master ${jail1} ${itf1})" ] &&
            [ -n "$(is_master ${jail2} ${itf2})" ]; then
                atf_fail "Both jails are master"
@@ -90,7 +93,6 @@ basic_v4_body()
        jexec carp_basic_v4_two ifconfig ${epair_one}b 192.0.2.202/29 up
        jexec carp_basic_v4_two ifconfig ${epair_one}b add vhid 1 192.0.2.1/29
 
-       sleep 0.2
        jexec carp_basic_v4_three ifconfig ${epair_two}b 192.0.2.203/29 up
        jexec carp_basic_v4_three ifconfig ${epair_two}b add vhid 1 \
            192.0.2.1/29
@@ -138,7 +140,6 @@ vrrp_v4_body()
        jexec ${j}_two ifconfig ${epair_one}b 192.0.2.202/29 up
        jexec ${j}_two ifconfig ${epair_one}b add vhid 1 carpver 3 192.0.2.1/29
 
-       sleep 0.2
        jexec ${j}_three ifconfig ${epair_two}b 192.0.2.203/29 up
        jexec ${j}_three ifconfig ${epair_two}b add vhid 1 carpver 3 \
            192.0.2.1/29
@@ -197,7 +198,6 @@ unicast_v4_body()
        # A peer address x.x.x.224 to catch PR 284872
        jexec carp_uni_v4_two ifconfig ${epair_one}b add vhid 1 \
            peer 198.51.100.224 192.0.2.1/32
-       sleep 0.2
        jexec carp_uni_v4_three ifconfig ${epair_two}b add vhid 1 \
            peer 198.51.100.2 192.0.2.1/32
 
@@ -262,7 +262,6 @@ basic_v6_body()
        jexec carp_basic_v6_two ifconfig ${epair_one}b inet6 add vhid 1 \
            2001:db8::0:1/64
 
-       sleep 0.2
        jexec carp_basic_v6_three ifconfig ${epair_two}b inet6 2001:db8::1:3/64 up no_dad
        jexec carp_basic_v6_three ifconfig ${epair_two}b inet6 add vhid 1 \
            2001:db8::0:1/64

Well that is not scope of this change.

zlei marked an inline comment as done.Fri, May 1, 9:18 PM
zlei added inline comments.
tests/sys/netinet/carp.sh
180–182

Yes, kernel bugs/race conditions etc. can cause all sorts of odd failures.

For the past while I've been trying to get the regression test suite to work reliably when run in parallel. I can run the full test suite with -v parallelism=16 on my ryzen in about 23 minutes now, but there's still a small number of flaky tests. It's much easier to debug flaky tests when they make use of atf_check everywhere, as that makes test logs more complete and helps catch unexpected failures.

Is it possible to teach kyua to do that ? At least a return value ( the steps without atf_check) is recorded ? Something like this,

Executing command [ jexec carp_uni_v4_one sysctl net.inet.ip.forwarding=1 ], return value [ 0 ]
net.inet.ip.forwarding: 0 -> 1

I think it burdens developers to sprinkle atf_check on every step, and it also make the code harder to read.

Thanks for working on this, Zhenlei!

IMHO, adding atf_check to purely configuration lines is not really important. In "a near bright future", I hope, we will be able to delete the configuration lines from the test bodies completely and declare test jails in some declarative manner instead of scripting.

This revision is now accepted and ready to land.Fri, May 1, 9:50 PM
zlei edited the test plan for this revision. (Show Details)
  1. Added atf_check to config / setup steps.
  2. Minor style fix, if ... ; then
  3. Shortened the jail names to ${j}[one|two|three] to reduce the line width, and focus only on the different part of the jail name, should improve readability a bit.
  4. Prefer ifconfig -j , route -j , sysctl -j over jexec ifconfig / route / sysctl.
This revision now requires review to proceed.Sat, May 2, 5:26 PM
zlei marked 7 inline comments as done.Sat, May 2, 5:35 PM

It requires little effort on a single testcase. Let's keep moving.

This revision is now accepted and ready to land.Sat, May 2, 6:35 PM
This revision was automatically updated to reflect the committed changes.

Thanks for working on this, Zhenlei!

IMHO, adding atf_check to purely configuration lines is not really important. In "a near bright future", I hope, we will be able to delete the configuration lines from the test bodies completely and declare test jails in some declarative manner instead of scripting.

+1 for declarative manner.

Ideally the tests should focusing on the key issue, but not the configurations / setups, even though configurations / setups may fail due to something unexpected, for example firewall rules / bugs, mac rules, or regression from in-tree / out-of-tree softwares.