Page MenuHomeFreeBSD

kyua: Do not count skipped as passed in test cmd
ClosedPublic

Authored by igoro on Sep 12 2024, 1:52 PM.
Tags
None
Referenced Files
F102067947: D46653.diff
Thu, Nov 7, 5:30 AM
F102014301: D46653.id144300.diff
Wed, Nov 6, 1:42 PM
Unknown Object (File)
Tue, Nov 5, 12:12 PM
Unknown Object (File)
Sun, Nov 3, 2:56 PM
Unknown Object (File)
Sun, Nov 3, 5:26 AM
Unknown Object (File)
Fri, Nov 1, 12:37 AM
Unknown Object (File)
Mon, Oct 28, 2:26 AM
Unknown Object (File)
Sat, Oct 26, 11:44 AM

Details

Summary
It changes only output of 'kyua test' CLI command. Hence, other outputs
like junit are kept intact for CI and other use cases. It's meant to
improve UX of attended use cases.

The issue is that the following can be tricky to interpret:

  222/222 passed (0 failed)

It can be read as all tests are passed, but it might be a summary line
of all tests skipped due to some requirement is not met.

It's reworked to easily distinguish such cases:

  222/222 passed (0 broken, 0 failed, 0 skipped)
  0/222 passed (0 broken, 0 failed, 222 skipped)

The overall formula is:

  <actually passed>/<total> (<details about not actually passed ones>)

Suggested by:  kp

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 59461
Build 56348: arc lint + arc unit

Event Timeline

igoro requested review of this revision.Sep 12 2024, 1:52 PM

It changes only output of kyua test CLI command. Hence, other outputs like junit are kept intact for CI and other use cases. It's meant to improve UX of attended use cases.

The classic testing is usually about userland applications, where 'skipped' usually means that a developer intentionally altered a test to be hidden from error side and keep the whole suite green. Typically, it's about a few tests like this, when time budget does not allow to fix it right here right now.

Here we deal with OS and infrastructure testing. In our case 'skipped' usually is not meant by a test author. Frequently, it's based on runtime configuration like whether some module is loaded, root access is given, etc. It means that from common sense perspective the following can be tricky:

222/222 passed (0 failed)

Actually, this is the summary of netpfil/pf test suite when all tests are skipped due to no root privileges provided. But from a glance it may look like everything is fine. Even the exit code is 0, by design.

The proposal is to keep logic and exit code the same way but print another, a bit more detailed, summary line. In the example above it would be as follows:

0/222 passed (0 broken, 0 failed, 222 skipped)

The overall formula is:

<actually passed>/<total> (<details about not actually passed ones>)

A more versatile example:

Before the change:

> kyua test testprog
testprog:broken  ->  broken: Premature exit; test case exited with code 1  [0.002s]
testprog:failed  ->  failed: foo != bar (foo != bar)  [0.006s]
testprog:passed  ->  passed  [0.006s]
testprog:skipped01  ->  skipped: requires something  [0.005s]
testprog:skipped02  ->  skipped: requires something  [0.006s]
testprog:skipped03  ->  skipped: requires something  [0.005s]
testprog:skipped04  ->  skipped: requires something  [0.005s]
testprog:skipped05  ->  skipped: requires something  [0.005s]
testprog:skipped06  ->  skipped: requires something  [0.005s]
testprog:skipped07  ->  skipped: requires something  [0.005s]
testprog:skipped08  ->  skipped: requires something  [0.006s]
testprog:skipped09  ->  skipped: requires something  [0.005s]
testprog:skipped10  ->  skipped: requires something  [0.006s]

Results file id is usr_tests_sys_netpfil_pf.20240912-134547-984724
Results saved to /home/igoro/.kyua/store/results.usr_tests_sys_netpfil_pf.20240912-134547-984724.db

11/13 passed (2 failed)

After the change:

> /usr/obj/home/igoro/src/arm64.aarch64/usr.bin/kyua/kyua test testprog
testprog:broken  ->  broken: Premature exit; test case exited with code 1  [0.002s]
testprog:failed  ->  failed: foo != bar (foo != bar)  [0.006s]
testprog:passed  ->  passed  [0.006s]
testprog:skipped01  ->  skipped: requires something  [0.005s]
testprog:skipped02  ->  skipped: requires something  [0.006s]
testprog:skipped03  ->  skipped: requires something  [0.006s]
testprog:skipped04  ->  skipped: requires something  [0.005s]
testprog:skipped05  ->  skipped: requires something  [0.005s]
testprog:skipped06  ->  skipped: requires something  [0.005s]
testprog:skipped07  ->  skipped: requires something  [0.005s]
testprog:skipped08  ->  skipped: requires something  [0.006s]
testprog:skipped09  ->  skipped: requires something  [0.005s]
testprog:skipped10  ->  skipped: requires something  [0.005s]

Results file id is usr_tests_sys_netpfil_pf.20240912-134606-988436
Results saved to /home/igoro/.kyua/store/results.usr_tests_sys_netpfil_pf.20240912-134606-988436.db

1/13 passed (1 broken, 1 failed, 10 skipped)

This is a request for comments. Probably, there are use cases, I'm not aware of, which would require another format. Also, there can be various expectations regarding the first x/y two numbers, like what exactly should be counted there. As a reference, modern and very active frameworks can be found to deal with such questions very simple way -- they drop the question and print everything explicitly like: A passed, B failed, C skipped, T total etc.

I'm not aware of the cases when this line is parsed for further decision making.

It's published here instead of the GitHub for the initial discussion and easier testing via git-arc-patch etc. Yes, Kyua's tests are not aligned yet.

I'm in favour of the change. The current output is quite misleading.

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

  • the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,
  • fusefs tests refuse to run if mac_bsdextended is loaded, but mac_bsdextended has its own tests which are skipped if mac_bsdextended is not loaded.
contrib/kyua/cli/cmd_test.cpp
67โ€“68

Do we still need these good_count/bad_count counters? It looks like they can be derived from type_count.

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

  • the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,

My TODO already has an item to re-think those ipfw on/off tests with the idea to avoid such separation. It seems not to add much value.

0/222 passed (0 broken, 0 failed, 222 skipped)

The overall formula is:

<actually passed>/<total> (<details about not actually passed ones>)

My preference would be <passed>/<passed + failed> (X broken, Y failed, Z skipped), so the example would become

0/0 passed (0 broken, 0 failed, 222 skipped)

That's my preference, but I do think that any change would be an improvement over the current misleading output.

contrib/kyua/cli/cmd_test.cpp
67โ€“68

Yeah, I was thinking about all the calculations upon printing, then all three counters are not needed. But I kept it simpler for the very first diff. Let me revise it with the next update.

Postpone all the calculations until it's time to print the summary

I'm in favour of the change. The current output is quite misleading.

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

  • the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,

I've ever planed to convert ipfw's tunable net.inet.ip.fw.default_to_accept a per-vnet one. Further plan is introducing per-jail's env parameter [1] so that the host and testing jails do not influence each other. Do you have interests with that ?

  1. A per-jail's env parameter is some syntax like jail -c env.net.inet.ip.fw.default_to_accept=1
  • fusefs tests refuse to run if mac_bsdextended is loaded, but mac_bsdextended has its own tests which are skipped if mac_bsdextended is not loaded.

This looks ok to me. I guess the kyua test suite (which is currently not hooked up to the build in FreeBSD) needs some updating though.

This revision is now accepted and ready to land.Sep 14 2024, 7:09 PM
ngie requested changes to this revision.Sep 14 2024, 7:29 PM
ngie added inline comments.
contrib/kyua/cli/cmd_test.cpp
68

Is there a way an enum or something similar could be used for indexing type_count? Relying on how the array is indexed with another structure seems like implicitly tightly coupled interfaces to me (which is not what we want).

contrib/kyua/model/test_result.hpp
44

Could this be derived from test_result_types instead of it being hardcoded here? Maybe make it an extern in this header and actually do the calculation in the .cpp file.

This revision now requires changes to proceed.Sep 14 2024, 7:29 PM
In D46653#1063256, @kp wrote:

My preference would be <passed>/<passed + failed> (X broken, Y failed, Z skipped), so the example would become

0/0 passed (0 broken, 0 failed, 222 skipped)

That's my preference, but I do think that any change would be an improvement over the current misleading output.

My current perception is that dynamic "total" value could be a new misleading number to introduce. Probably, it's better to keep numbers almost completely 1-to-1 mapped to the numbers shown by kyua report --verbose. @ngie , do you have a stronger opinion here as a Kyua veteran?

contrib/kyua/cli/cmd_test.cpp
68

Agree. See the next comment discussion, there is a way to cover both points.

contrib/kyua/model/test_result.hpp
44

Thanks for raising this, it also bothered me. Adding the 6th element to the enum is definitely a bad idea, other new options are available since C++17 or so (we are with C++11 here), and extracting it from a TU as extern leads to something like error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension].

And now I see that we could do extract the calculation to the cpp, along with reworking of type_count array (the previous comment is about this). That array could easily be migrated to a dynamic structure like std::map and avoid strict dependency on other structures.

I will bring the change with the next update.

Rework structure of test result types metadata. Fix tests.

This revision is now accepted and ready to land.Oct 5 2024, 4:09 AM

Rework structure of test result types metadata. Fix tests.

Thank you for doing this!

This revision was automatically updated to reflect the committed changes.