Add a regression test to check whether a vnet interface assigned to a jail is visible from the host once the jail is destroyed.
Details
There is a regression on -head only. On 13.1-RELEASE the test passes:
root@fbsd131:/usr/tests # uname -a
FreeBSD fbsd131 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
root@fbsd131:/usr/tests # kyua debug usr.sbin/jail/jail_basic_test:remove
Executing command [ ifconfig lo0 inet 172.254.254.254/32 alias ]
Executing command [ jail -c name=removejail persist ip4.addr=172.254.254.254 ]
Executing command [ jexec removejail ifconfig lo0 ]
Executing command [ jail -R removejail ]
Executing command [ jls -d -j removejail ]
usr.sbin/jail/jail_basic_test:remove -> passed
And on head:
root@current:/usr/tests # uname -a
FreeBSD bigone 14.0-CURRENT FreeBSD 14.0-CURRENT #148 main-n256820-6a26c99f827: Wed Jul 20 00:50:17 CEST 2022 root@bigone:/usr/obj/usr/src/amd64.amd64/sys/BBR amd64
root@current:/usr/tests # kyua debug usr.sbin/jail/jail_basic_test:remove
Executing command [ ifconfig lo0 inet 172.254.254.254/32 alias ]
Executing command [ jail -c name=removejail persist ip4.addr=172.254.254.254 ]
Executing command [ jexec removejail ifconfig lo0 ]
Executing command [ jail -R removejail ]
Executing command [ jls -d -j removejail ]
Fail: incorrect exit status: 0, expected: 1
stdout:
   JID  IP Address      Hostname                      Path
     1  172.254.254.254                               /
stderr:
usr.sbin/jail/jail_basic_test:remove -> failed: atf-check failed; see the output of the test for details
Diff Detail
Lint: Skipped
Unit Tests: Skipped
Event Timeline
tests/sys/net/if_destroy_vnet.sh, line 22 (On Diff #108417):
Polling for jail shutdown would probably be wise too...
Yes, the sleep is racy, but I have no idea how to fix it.
Without it, the test always fails: there is a delay between the destruction of the jail and the interface becoming visible again on the host.
So I've tested a loop like this one:
while jls -dj jifdestroy >/dev/null 2>&1; do
	sleep 1
done
with a dummy script on 13.1-release like this one:
#!/bin/sh
#set -eu
ifconfig lo888 create
jail -c name=bug persist vnet vnet.interface=lo888
jexec bug ifconfig lo888 up
jail -R bug
while jls -dj bug >/dev/null 2>&1; do
	echo "wait"
	sleep 1
done
ifconfig lo888 destroy && echo "success" || echo "fails"
And it fails in about 1 run out of 5:
# sh -x ./yo.sh
+ ifconfig lo888 create
+ jail -c 'name=bug' persist vnet 'vnet.interface=lo888'
+ jexec bug ifconfig lo888 up
+ jail -R bug
+ jls -dj bug
+ ifconfig lo888 destroy
ifconfig: interface lo888 does not exist
+ echo fails
fails
So the only stable solution here seems to be the racy sleep.
Checking that vnet is not dying is the right thing - once vnet is destroyed you should get the interface back. I’d prefer to have this logic in the test instead of a random sleep.
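One way to keep this logic in the test without an unbounded loop is a small retry helper. The sketch below is only an illustration: `wait_until` and `not` are hypothetical names, not part of atf-sh or of the diff under review, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of atf-sh or of this diff.

# not CMD [ARGS...]: run CMD and invert its exit status.
not()
{
	! "$@"
}

# wait_until TRIES DELAY CMD [ARGS...]
# Rerun CMD (output discarded) until it succeeds.
# Returns 0 as soon as CMD succeeds, 1 if it still fails after TRIES attempts.
wait_until()
{
	_tries=$1
	_delay=$2
	shift 2
	while [ "$_tries" -gt 0 ]; do
		if "$@" >/dev/null 2>&1; then
			return 0
		fi
		_tries=$((_tries - 1))
		sleep "$_delay"
	done
	return 1
}
```

In the test this could be used as `wait_until 30 1 not jls -d -j jifdestroy || atf_fail "jail stuck in dying state"`, followed by `wait_until 30 1 ifconfig lo888` to wait for the interface itself to return to the host. Because the number of attempts is bounded, a jail that never leaves the dying state makes the test fail instead of hang.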
If this fails (on current head), I can take a look in about a week's time.
Thanks for the tip about checking for the dying state: that is the root cause. The jail stays stuck in the dying state forever after it is destroyed, even if the vnet interface was never used.
In a new troubleshooting study on the PR, zlei.huang@gmail.com identified the culprit commits and proposed a simpler way to reproduce the problem.
The problem isn't with vnet but with the IP stack and jail, so I rewrote the whole test as a simpler version and moved it to the usr.sbin/jail tests.
Since the test is no longer vnet-specific, don't forget to adjust the commit message. Also, does the polling loop still exhibit problems with the non-vnet version of the test? We've got to get rid of that "sleep 5".
usr.sbin/jail/tests/jail_basic_test.sh, line 73:
This looks like a valid public IP address. You should use an RFC 5735 address instead, like 192.0.2.0/24.
usr.sbin/jail/tests/jail_basic_test.sh, line 73:
Maybe s/172/127/g would suffice?
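To make the address concern concrete, here is a rough sketch of a check for the three documentation ranges that RFC 5735 lists (TEST-NET-1, TEST-NET-2 and TEST-NET-3). `is_doc_net` is a hypothetical helper written for this comment, not something in the tree, and it only matches on the leading octets without validating the rest of the address.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the FreeBSD test suite.
# is_doc_net ADDR: succeed if the dotted-quad ADDR (no prefix length) starts
# with one of the documentation prefixes 192.0.2.0/24, 198.51.100.0/24 or
# 203.0.113.0/24. The last octet is not validated.
is_doc_net()
{
	case "$1" in
	192.0.2.*|198.51.100.*|203.0.113.*)
		return 0
		;;
	*)
		return 1
		;;
	esac
}
```

With this check, `is_doc_net 172.254.254.254` fails while `is_doc_net 192.0.2.1` succeeds. Note that 127.0.0.0/8 is the loopback block rather than a documentation range, so the `s/172/127/g` variant would not pass this particular check even though it also avoids a public address.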