Page MenuHomeFreeBSD

tests/sys/net: run if_bridge_test in a jail
ClosedPublic

Authored by ivy on Thu, May 22, 8:46 PM.
Tags
None
Referenced Files
F119323109: D50476.diff
Sat, Jun 7, 3:57 PM
Unknown Object (File)
Wed, Jun 4, 7:13 AM
Unknown Object (File)
Sat, May 24, 7:31 AM
Unknown Object (File)
Sat, May 24, 7:02 AM
Unknown Object (File)
Sat, May 24, 5:12 AM
Unknown Object (File)
Fri, May 23, 11:52 AM

Details

Summary

Many of the tests in if_bridge_tests create new epair interfaces and
then add them to a bridge.

If devd is running, devd will attempt to configure these interfaces
based on the host's /etc/rc.conf, and may enable IPv6 on the new
interface (for example, if ipv6_activate_all_interfaces is enabled),
causing an IPv6 link-local address to be assigned to the epair. This
causes the test to fail if net.link.bridge.member_ifaddrs is set to 0,
which is the default.

Fix this by running the tests in a jail, so there is no devd to
interfere with our new interfaces.

Unfortunately, while this should improve overall test performance by
allowing these tests to be parallelised, destroying the epairs inside a
vnet is a lot slower, which causes the many_bridge_members test to time
out in its cleanup trying to destroy 512 epairs. Fix this by doing the
epair destroy in the test body instead.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 64392
Build 61276: arc lint + arc unit

Event Timeline

ivy requested review of this revision.Thu, May 22, 8:46 PM

I do wonder if we shouldn't run all of them in a jail and get rid of the is_exclusive=true. If we genuinely have reproducible panics doing that that'd be a great opportunity to fix another bug.
Still, this is an improvement too, so let's not let that notion stand in the way of making this better.

This revision is now accepted and ready to land.Fri, May 23, 9:35 AM
des requested changes to this revision.Fri, May 23, 5:59 PM
des added inline comments.
tests/sys/net/if_bridge_test.sh
702 ↗(On Diff #155894)

I'd really rather try to fix the timeout.

This revision now requires changes to proceed.Fri, May 23, 5:59 PM
tests/sys/net/if_bridge_test.sh
702 ↗(On Diff #155894)

See the top of contrib/kyua/engine/scheduler.cpp; there are two timeout values, I believe the first is a global cleanup timeout and the second is a per-testcase cleanup timeout. They are both set to 60. The easy way out is to just bump them both to 300. In the long term, it would be better to make them configurable, like the source code comments suggest, but that's quite a bit of work.

tests/sys/net/if_bridge_test.sh
702 ↗(On Diff #155894)

can this be committed directly to contrib, or does it need to go via upstream?

i am not sure about increasing the timeout for all tests to placate one test (especially when this is, really, a kernel bug) but i can do it that way if you think it makes more sense.

kp added inline comments.
tests/sys/net/if_bridge_test.sh
702 ↗(On Diff #155894)

We mostly own kyua at this point. Loop in @ngie and maybe @igoro for Kyua patches.

It's also not the only place we have this issue. There are a couple of pf tests with the same timeout on cleanup, where the underlying issue is probably also slow interface destruction.

You're right that that's the real problem, but it's also likely to be a lot harder to fix that one.

don't work around the test timeout here, we'll fix it in kyua

This revision is now accepted and ready to land.Fri, May 23, 8:46 PM

The combinatorics seems to be as follows:

A. Fix the root cause, i.e. optimize the kernel (if it is doable).
Pros: The ideal way.
Cons: This patch will be blocked for a long time.

B. Add a workaround to the test (this patch includes)
Pros: Doable today.
Cons: Hides the root cause. (probably we could add a TODO which will be an interesting artifact for future archaeologists, but anyway)

C. Increase the timeout
Pros: Doable not today, but in several days.
Cons: Still hides the root cause. In addition we have a question "for how long?", for one system timeout=T could be enough, but another env and test execution conf might need T x k, we may end up fine-tuning the 'k'onstant periodically (in theory, I'm not aware of the details). But, if it looks to be within 60-300 sec with a big chunk of spare time then it might end up being good enough.

We have implementation sub-options here:

C1. Change kyua's default.
  Pros: Quick shot.
  Cons: As already mentioned, it does not feel good to do that for all tests. And the upstream repo is used not only by FreeBSD, while we would like to keep the minimal difference between freebsd-src/contrib/kyua and github/freebsd/kyua.

C2. Improve kyua to introduce per-test "cleanup_timeout" metadata.
  Pros: The best option, presumably.
  Cons: Needs more time for the implementation plus review/approve process.

My personal summary:

  • Option A seems to be off the table.
  • Option B seems to be a working way, if there are votes for. (it looks we have first exit poll results)
  • Option C:
    • If there is a rush to avoid some global impact like constantly failing CI or another annoyance, then we could quickly go with C1, with the idea to eventually apply C2 and revert the default back to the prev value. (meanwhile keep it not replicated to the upstream repo)
    • If there is no rush I would pick C2 as @des suggests, and I could help with impl or review.

my preference would be to land this and D50499 sooner rather than later, i.e. your “C1” option (and then do whatever the ‘better’ fix is in a followup commit) since currently if_bridge_test is broken for anyone with $ipv6_activate_all_interfaces enabled. however this issue doesn't affect ci.freebsd.org Jenkins as that option doesn't seem to be enabled there.

an alternative would be to not land this at all (at least until the kernel issue is fixed) and instead rely on D50477 to fix the specific problem described in the commit message. the upside is this doesn't add 180s to the execution time for if_bridge_test. the downside is it also doesn't actually fix the problem in all cases, e.g. if a user has some ifconfig_epair* options in rc.conf.

This revision was automatically updated to reflect the committed changes.