Adding a regression test to check that a vnet interface assigned to a jail is visible from the host again once the jail is destroyed.
There is a regression on -head only. On 13.1-RELEASE the test passes:
root@fbsd131:/usr/tests # uname -a
FreeBSD fbsd131 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
root@fbsd131:/usr/tests # kyua debug usr.sbin/jail/jail_basic_test:remove
Executing command [ ifconfig lo0 inet 18.104.22.168/32 alias ]
Executing command [ jail -c name=removejail persist ip4.addr=22.214.171.124 ]
Executing command [ jexec removejail ifconfig lo0 ]
Executing command [ jail -R removejail ]
Executing command [ jls -d -j removejail ]
usr.sbin/jail/jail_basic_test:remove -> passed
And on head:
root@current:/usr/tests # uname -a
FreeBSD bigone 14.0-CURRENT FreeBSD 14.0-CURRENT #148 main-n256820-6a26c99f827: Wed Jul 20 00:50:17 CEST 2022 root@bigone:/usr/obj/usr/src/amd64.amd64/sys/BBR amd64
root@current:/usr/tests # kyua debug usr.sbin/jail/jail_basic_test:remove
Executing command [ ifconfig lo0 inet 126.96.36.199/32 alias ]
Executing command [ jail -c name=removejail persist ip4.addr=188.8.131.52 ]
Executing command [ jexec removejail ifconfig lo0 ]
Executing command [ jail -R removejail ]
Executing command [ jls -d -j removejail ]
Fail: incorrect exit status: 0, expected: 1
stdout:
   JID  IP Address      Hostname                      Path
     1  184.108.40.206                                /
stderr:
usr.sbin/jail/jail_basic_test:remove -> failed: atf-check failed; see the output of the test for details
Polling for jail shutdown would probably be wise too...
Yes, the sleep is racy, but I have no idea how to fix it.
Without it, the test always fails: there is a delay between the jail's destruction and the interface becoming visible again from the host.
So I've tested a loop like this one:
while jls -dj jifdestroy >/dev/null 2>&1; do
	sleep 1
done
with a dummy script on 13.1-release like this one:
#!/bin/sh
#set -eu
ifconfig lo888 create
jail -c name=bug persist vnet vnet.interface=lo888
jexec bug ifconfig lo888 up
jail -R bug
while jls -dj bug >/dev/null 2>&1; do
	echo "wait"
	sleep 1
done
ifconfig lo888 destroy && echo "success" || echo "fails"
And it fails in about 1 run out of 5:
# sh -x ./yo.sh
+ ifconfig lo888 create
+ jail -c 'name=bug' persist vnet 'vnet.interface=lo888'
+ jexec bug ifconfig lo888 up
+ jail -R bug
+ jls -dj bug
+ ifconfig lo888 destroy
ifconfig: interface lo888 does not exist
+ echo fails
fails
So the only stable solution here seems to be the racy sleep.
Checking that vnet is not dying is the right thing - once vnet is destroyed you should get the interface back. I’d prefer to have this logic in the test instead of a random sleep.
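One way to keep this logic in the test without an open-ended wait is a bounded polling helper. The sketch below is only an illustration: `wait_while` is a hypothetical helper name, and the 10-second limit in the usage comment is an arbitrary assumption, not a value taken from this review. The bound matters because a jail that never leaves the dying state would otherwise hang the test.

```shell
#!/bin/sh
# wait_while TIMEOUT CMD...   (hypothetical helper, not part of atf-sh)
# Polls CMD once per second for as long as it keeps succeeding.
# Returns 0 as soon as CMD fails (the condition has cleared),
# or 1 if CMD is still succeeding after TIMEOUT seconds.
wait_while()
{
	_timeout=$1
	shift
	_elapsed=0
	while "$@" >/dev/null 2>&1; do
		if [ "$_elapsed" -ge "$_timeout" ]; then
			return 1
		fi
		sleep 1
		_elapsed=$((_elapsed + 1))
	done
	return 0
}

# Possible use inside a test body (assumed timeout of 10 seconds):
#   wait_while 10 jls -d -j removejail ||
#       atf_fail "jail still in dying state after 10 seconds"
```

With this shape, a jail that is removed promptly costs no extra delay, while a jail stuck in the dying state turns into an explicit test failure instead of a hang.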
If this fails (on current head), I can take a look in about a week.
Thanks for the tip about checking for the dying state: that is the root cause. The jail stays stuck in the dying state forever after it is destroyed, even when the vnet interface is never used.
In a new troubleshooting study on the PR, email@example.com identified the culprit commits and proposed a simpler way to reproduce the problem.
The problem isn't with vnet but with the IP stack and jail, so I rewrote the tests as a simpler version and moved them to the usr.sbin/jail tests.
Since the test is no longer vnet-specific, don't forget to adjust the commit message. Also, does the polling loop still exhibit problems with the non-vnet version of the test? We've got to get rid of that "sleep 5".
This looks like a valid public IP address. You should use an RFC 5735 address instead, like 192.0.2.0/24.
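To make that check mechanical, a test script could verify that an address falls in one of the documentation-only IPv4 blocks (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24). The sketch below is an assumption-laden illustration: `is_doc_ipv4` is a hypothetical helper, and it only does a textual prefix match rather than validating the address.

```shell
#!/bin/sh
# is_doc_ipv4 ADDR   (hypothetical helper for illustration)
# Succeeds if ADDR, with any trailing /prefix stripped, starts with one of
# the documentation-only blocks 192.0.2.0/24, 198.51.100.0/24 or
# 203.0.113.0/24. This is a textual match, not full address validation.
is_doc_ipv4()
{
	case "${1%%/*}" in
	192.0.2.*|198.51.100.*|203.0.113.*)
		return 0
		;;
	*)
		return 1
		;;
	esac
}

# Example: pick an alias address for the test and sanity-check it.
addr="192.0.2.1/32"
if is_doc_ipv4 "$addr"; then
	echo "$addr is a documentation address"
fi
```

Using an address such as 192.0.2.1 for the lo0 alias keeps the test from configuring something that could be a real, routable host.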