
tests: run all in jail by default
Needs ReviewPublic

Authored by igor.ostapenko_pm.me on Oct 31 2023, 12:09 AM.

Details

Reviewers
markj
jilles
melifaro
Group Reviewers
tests
Summary

Overview

This is a followup to the kyua jail support patch, https://reviews.freebsd.org/D42350:

  • 1% of this patch assumes that everyone wants faster test suite runs and that jail support is always built in, so it makes execenv="jail" the default for all tests. This part probably needs discussion and opinions from people who have worked with the test suite for a long time and have a feel for its real-life use cases and nuances. For demo purposes, the jail execution environment is turned on in this patch by a direct change in suite.test.mk; the setting is prepended so that individual tests can override it.
  • 99% of this patch is unrelated to the decision on whether "run in jail" should be the default. It deals with existing tests: it either gives them the jail configuration they need to work when asked to run in a jail, or sets them back to the host execution environment.

The way it's done

I did a first run of the whole suite with execenv="jail" set for every test, then iterated through the skipped, broken, and failed ones. The rules I followed were:

  • if a test has an issue with kernel module loading in a jail -- add respective execenv_jail params (if_epair is a frequent example)
  • if a test needs additional rights or features in a jail -- add respective execenv_jail params (e.g. vnet, allow.mlock)
  • if a test expects a user to load a module -- keep it as is
  • if a test shows instability in a jail (panics, failures) -- set it to always ask for execenv="host", i.e. kyua won't run it in a jail
  • if a test obviously cannot run in a jail -- set it to execenv="host"
  • else -- set it to execenv="host"
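
The rules above can be read as a small decision function. The sketch below is only an illustration of that reading, not code from the patch: the `Finding` categories, the `triage` name, and the convention that `None` means "leave the test's metadata untouched" are all assumptions made for the example. The jail parameters shown (vnet, allow.mlock) are the ones mentioned in the list.

```python
from enum import Enum, auto
from typing import Optional, Sequence, Tuple


class Finding(Enum):
    """Why a test was skipped, broken, or failed under execenv="jail" (hypothetical categories)."""
    MODULE_LOAD_BLOCKED = auto()         # kernel module cannot be loaded from inside the jail
    NEEDS_JAIL_FEATURE = auto()          # needs extra rights or features, e.g. vnet, allow.mlock
    EXPECTS_USER_LOADED_MODULE = auto()  # test expects the user to load a module beforehand
    UNSTABLE_IN_JAIL = auto()            # panics or fails when jailed
    CANNOT_RUN_IN_JAIL = auto()          # obviously not jailable
    OTHER = auto()                       # anything else


def triage(
    finding: Finding, jail_params: Sequence[str] = ()
) -> Tuple[Optional[str], Tuple[str, ...]]:
    """Return (execenv, jail parameters); execenv None means 'keep the test as is'."""
    if finding in (Finding.MODULE_LOAD_BLOCKED, Finding.NEEDS_JAIL_FEATURE):
        # Stay in a jail, but give the jail what the test needs.
        return "jail", tuple(jail_params)
    if finding is Finding.EXPECTS_USER_LOADED_MODULE:
        return None, ()
    # Unstable, not jailable, or anything else: fall back to the host.
    return "host", ()


print(triage(Finding.NEEDS_JAIL_FEATURE, ["vnet", "allow.mlock"]))
print(triage(Finding.UNSTABLE_IN_JAIL))
```

Note that the last three rules all end in execenv="host"; they differ only in the reason, which is why the sketch collapses them into one fallback.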

ZFS tests are all set to execenv="host". This is done in a single place, atf.test.mk, and relies on the fact that those tests are the only ones that require ksh93. This limits the time investment, because the ZFS tests have a lot of Makefiles, and it's better to hear other opinions first before dealing with all of them.

Results

This way I managed to move more tests out of the skipped/broken/failed categories with "run in jail" activated. Some failures remain, but judging by the CI they look like the usual fluctuations; they are open topics waiting for fixes or tuning.

RFC

This patch could be a starting point after the kyua patch, if the latter gets a green light. I also believe that some tests can be improved in the future so that they move off the host execenv and can run in parallel in jails.


Demo

I guess all of this says little without some numbers to compare. I've managed to collect the following.

The baseline:

  • 8 cpus
  • all builds were based on CURRENT 314542de6d (Oct 27)
  • pkg install python py39-pip py39-pytest jq perl5 openvpn ksh93 gtar isc-dhcp44-server
  • pip install scapy
  • kldload zfs pf pfsync pflog dummynet if_bridge if_ovpn ipdivert sctp carp ipsec tcpmd5 if_wg cryptodev if_stf
  • sysctl kern.crypto.allow_soft=1
  • sysctl kern.ipc.tls.enable=1
  • /usr/tests/sys/cddl/zfs tests were excluded from the runs

AArch64

1 h 52 min -- no patches applied, non-parallel

8168/8216 passed (48 failed)
Test cases: 8216 total, 265 skipped, 35 expected failures, 1 broken, 47 failed
     6730.37 real       394.08 user       747.83 sys

1 h 25 min -- no patches applied, parallelism=8

8131/8216 passed (85 failed)
Test cases: 8216 total, 277 skipped, 35 expected failures, 7 broken, 78 failed
     5099.31 real       699.89 user      1164.26 sys

0 h 36 min -- kyua and this patch applied, parallelism=8

8164/8216 passed (52 failed)
Test cases: 8216 total, 262 skipped, 35 expected failures, 4 broken, 48 failed
     2155.09 real       473.59 user      1040.41 sys
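
When reading these summaries, the arithmetic of the quoted lines suggests that the headline "failed" number is the sum of the broken and failed counts, and that the headline "passed" number still includes skipped tests and expected failures. The sketch below parses the two summary lines of the patched AArch64 run to make that explicit. The function name and the `clean_passes` key are assumptions made for the example; the regular expressions are written against the text as quoted here, not against a documented kyua output format.

```python
import re
from typing import Dict


def parse_kyua_summary(passed_line: str, cases_line: str) -> Dict[str, int]:
    """Parse the two summary lines as quoted above into a dict of counts."""
    head = re.fullmatch(r"(\d+)/(\d+) passed \((\d+) failed\)", passed_line.strip())
    if head is None:
        raise ValueError(f"unexpected headline: {passed_line!r}")
    counts = {
        name: int(number)
        for number, name in re.findall(
            r"(\d+) (total|skipped|expected failures|broken|failed)", cases_line
        )
    }
    counts["headline_passed"] = int(head.group(1))
    counts["headline_failed"] = int(head.group(3))
    # Tests that really ran and passed: everything not in any other bucket.
    counts["clean_passes"] = counts["total"] - (
        counts["skipped"]
        + counts["expected failures"]
        + counts["broken"]
        + counts["failed"]
    )
    return counts


summary = parse_kyua_summary(
    "8164/8216 passed (52 failed)",
    "Test cases: 8216 total, 262 skipped, 35 expected failures, 4 broken, 48 failed",
)
print(summary)
```

Comparing `clean_passes` and `skipped` across runs, rather than only the headline numbers, shows whether a run passed more tests or merely skipped more of them.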

AMD64

2 h 6 min -- no patches applied, parallelism=8 (had to exclude a few tests due to constant panics)

8129/8204 passed (75 failed)
Test cases: 8204 total, 266 skipped, 35 expected failures, 5 broken, 70 failed
     7556.41 real      4971.02 user      7299.63 sys

1 h 5 min -- kyua and this patch applied, parallelism=8

8172/8224 passed (52 failed)
Test cases: 8224 total, 253 skipped, 35 expected failures, 4 broken, 48 failed
     3921.02 real      3637.01 user      4967.95 sys
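
To put the wall-clock numbers side by side, the short sketch below divides each unpatched "real" time by the patched parallel one for the same architecture. The dictionary keys are labels chosen for this example; the seconds are copied from the runs above. The AMD64 comparison is only approximate, because the unpatched run excluded a few panicking tests (8204 versus 8224 test cases).

```python
from typing import Dict

# "real" seconds from the runs quoted above, per architecture.
REAL_SECONDS: Dict[str, Dict[str, float]] = {
    "aarch64": {
        "unpatched serial": 6730.37,
        "unpatched parallel": 5099.31,
        "patched parallel": 2155.09,
    },
    "amd64": {
        "unpatched parallel": 7556.41,
        "patched parallel": 3921.02,
    },
}


def speedups(runs: Dict[str, float]) -> Dict[str, float]:
    """Return baseline_time / patched_time for every non-patched run."""
    patched = runs["patched parallel"]
    return {
        name: seconds / patched
        for name, seconds in runs.items()
        if name != "patched parallel"
    }


for arch, runs in REAL_SECONDS.items():
    for name, factor in speedups(runs).items():
        print(f"{arch}: {name} -> patched parallel: {factor:.2f}x")
```

On these figures the patched parallel run is roughly 3.1 times faster than the serial AArch64 baseline, 2.4 times faster than the unpatched parallel AArch64 run, and 1.9 times faster than the unpatched parallel AMD64 run.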

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Thanks for working on the idea of improving tests parallelism!
Conceptually I like the idea of auto-jailing, but the details matter :-)
Jails deal nicely with removing some tests-running side-effects, but not all.
Filesystem “virtualization” as well as network “virtualization” has to be set up explicitly to actually provide isolation for the respective tests.

For example, all pytest tests set up an explicit vnet environment, and so do many firewall tests.

It’s worth not running them “jailed”.

It would be nice not to have to specify test execution params in two places, one in the test and another in the Makefile. I’d prefer to avoid Makefile changes altogether, or at least minimize them.

Thanks for your attention. Agreed, the details matter. I expect more opinions and real use cases to come in when the kyua jail support patch gets detailed consideration and discussion. The changes to the existing test suite configuration (this patch) may go in another direction after that.