Paths

Table of Contentst

kyua: Do not count skipped as passed in test cmd
ClosedPublic
Actions

Authored by igoro on Sep 12 2024, 1:52 PM.

Details

Reviewers

ngie
kp
markj

Commits

rG1152da4a99fb: kyua: Do not count skipped as passed in test cmd
rG99689201a1eb: kyua: Do not count skipped as passed in test cmd

Summary

It changes only output of 'kyua test' CLI command. Hence, other outputs
like junit are kept intact for CI and other use cases. It's meant to
improve UX of attended use cases.

The issue is that the following can be tricky to interpret:

  222/222 passed (0 failed)

It can be read as all tests are passed, but it might be a summary line
of all tests skipped due to some requirement is not met.

It's reworked to easily distinguish such cases:

  222/222 passed (0 broken, 0 failed, 0 skipped)
  0/222 passed (0 broken, 0 failed, 222 skipped)

The overall formula is:

  <actually passed>/<total> (<details about not actually passed ones>)

Suggested by:  kp

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

igoro created this revision.Sep 12 2024, 1:52 PM

Herald added a subscriber: imp. · View Herald TranscriptSep 12 2024, 1:52 PM

igoro requested review of this revision.Sep 12 2024, 1:52 PM

Harbormaster completed remote builds in B59456: Diff 143279.Sep 12 2024, 1:52 PM

It changes only output of kyua test CLI command. Hence, other outputs like junit are kept intact for CI and other use cases. It's meant to improve UX of attended use cases.

The classic testing is usually about userland applications, where 'skipped' usually means that a developer intentionally altered a test to be hidden from error side and keep the whole suite green. Typically, it's about a few tests like this, when time budget does not allow to fix it right here right now.

Here we deal with OS and infrastructure testing. In our case 'skipped' usually is not meant by a test author. Frequently, it's based on runtime configuration like whether some module is loaded, root access is given, etc. It means that from common sense perspective the following can be tricky:

222/222 passed (0 failed)

Actually, this is the summary of netpfil/pf test suite when all tests are skipped due to no root privileges provided. But from a glance it may look like everything is fine. Even the exit code is 0, by design.

The proposal is to keep logic and exit code the same way but print another, a bit more detailed, summary line. In the example above it would be as follows:

0/222 passed (0 broken, 0 failed, 222 skipped)

The overall formula is:

<actually passed>/<total> (<details about not actually passed ones>)

A more versatile example:

Before the change:

> kyua test testprog
testprog:broken  ->  broken: Premature exit; test case exited with code 1  [0.002s]
testprog:failed  ->  failed: foo != bar (foo != bar)  [0.006s]
testprog:passed  ->  passed  [0.006s]
testprog:skipped01  ->  skipped: requires something  [0.005s]
testprog:skipped02  ->  skipped: requires something  [0.006s]
testprog:skipped03  ->  skipped: requires something  [0.005s]
testprog:skipped04  ->  skipped: requires something  [0.005s]
testprog:skipped05  ->  skipped: requires something  [0.005s]
testprog:skipped06  ->  skipped: requires something  [0.005s]
testprog:skipped07  ->  skipped: requires something  [0.005s]
testprog:skipped08  ->  skipped: requires something  [0.006s]
testprog:skipped09  ->  skipped: requires something  [0.005s]
testprog:skipped10  ->  skipped: requires something  [0.006s]

Results file id is usr_tests_sys_netpfil_pf.20240912-134547-984724
Results saved to /home/igoro/.kyua/store/results.usr_tests_sys_netpfil_pf.20240912-134547-984724.db

11/13 passed (2 failed)

After the change:

> /usr/obj/home/igoro/src/arm64.aarch64/usr.bin/kyua/kyua test testprog
testprog:broken  ->  broken: Premature exit; test case exited with code 1  [0.002s]
testprog:failed  ->  failed: foo != bar (foo != bar)  [0.006s]
testprog:passed  ->  passed  [0.006s]
testprog:skipped01  ->  skipped: requires something  [0.005s]
testprog:skipped02  ->  skipped: requires something  [0.006s]
testprog:skipped03  ->  skipped: requires something  [0.006s]
testprog:skipped04  ->  skipped: requires something  [0.005s]
testprog:skipped05  ->  skipped: requires something  [0.005s]
testprog:skipped06  ->  skipped: requires something  [0.005s]
testprog:skipped07  ->  skipped: requires something  [0.005s]
testprog:skipped08  ->  skipped: requires something  [0.006s]
testprog:skipped09  ->  skipped: requires something  [0.005s]
testprog:skipped10  ->  skipped: requires something  [0.005s]

Results file id is usr_tests_sys_netpfil_pf.20240912-134606-988436
Results saved to /home/igoro/.kyua/store/results.usr_tests_sys_netpfil_pf.20240912-134606-988436.db

1/13 passed (1 broken, 1 failed, 10 skipped)

This is a request for comments. Probably, there are use cases, I'm not aware of, which would require another format. Also, there can be various expectations regarding the first x/y two numbers, like what exactly should be counted there. As a reference, modern and very active frameworks can be found to deal with such questions very simple way -- they drop the question and print everything explicitly like: A passed, B failed, C skipped, T total etc.

I'm not aware of the cases when this line is parsed for further decision making.

It's published here instead of the GitHub for the initial discussion and easier testing via git-arc-patch etc. Yes, Kyua's tests are not aligned yet.

I'm in favour of the change. The current output is quite misleading.

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,
fusefs tests refuse to run if mac_bsdextended is loaded, but mac_bsdextended has its own tests which are skipped if mac_bsdextended is not loaded.

contrib/kyua/cli/cmd_test.cpp
67–68	Do we still need these good_count/bad_count counters? It looks like they can be derived from `type_count`.

In D46653#1063244, @markj wrote:

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,

My TODO already has an item to re-think those ipfw on/off tests with the idea to avoid such separation. It seems not to add much value.

In D46653#1063243, @igoro wrote:
0/222 passed (0 broken, 0 failed, 222 skipped)
The overall formula is:
<actually passed>/<total> (<details about not actually passed ones>)

My preference would be <passed>/<passed + failed> (X broken, Y failed, Z skipped), so the example would become

0/0 passed (0 broken, 0 failed, 222 skipped)

That's my preference, but I do think that any change would be an improvement over the current misleading output.

igoro added inline comments.Sep 12 2024, 2:25 PM

contrib/kyua/cli/cmd_test.cpp
67–68	Yeah, I was thinking about all the calculations upon printing, then all three counters are not needed. But I kept it simpler for the very first diff. Let me revise it with the next update.

Postpone all the calculations until it's time to print the summary

Harbormaster completed remote builds in B59461: Diff 143286.Sep 12 2024, 4:41 PM

igoro mentioned this in D46655: pf tests: Do not handle ipfw presence.Sep 12 2024, 10:00 PM

In D46653#1063244, @markj wrote:

I'm in favour of the change. The current output is quite misleading.

BTW, in the FreeBSD regression test suite, it's basically impossible to avoid skipping some tests:

the are tests that get skipped if ipfw.ko is not loaded, and tests which are skipped if ipfw.ko /is/ loaded,

I've ever planed to convert ipfw's tunable net.inet.ip.fw.default_to_accept a per-vnet one. Further plan is introducing per-jail's env parameter [1] so that the host and testing jails do not influence each other. Do you have interests with that ?

A per-jail's env parameter is some syntax like jail -c env.net.inet.ip.fw.default_to_accept=1

fusefs tests refuse to run if mac_bsdextended is loaded, but mac_bsdextended has its own tests which are skipped if mac_bsdextended is not loaded.

This looks ok to me. I guess the kyua test suite (which is currently not hooked up to the build in FreeBSD) needs some updating though.

This revision is now accepted and ready to land.Sep 14 2024, 7:09 PM

ngie requested changes to this revision.Sep 14 2024, 7:29 PM

ngie added inline comments.

contrib/kyua/cli/cmd_test.cpp
68	Is there a way an enum or something similar could be used for indexing `type_count`? Relying on how the array is indexed with another structure seems like implicitly tightly coupled interfaces to me (which is not what we want).
contrib/kyua/model/test_result.hpp
45	Could this be derived from `test_result_types` instead of it being hardcoded here? Maybe make it an extern in this header and actually do the calculation in the .cpp file.

This revision now requires changes to proceed.Sep 14 2024, 7:29 PM

In D46653#1063256, @kp wrote:

My preference would be <passed>/<passed + failed> (X broken, Y failed, Z skipped), so the example would become

0/0 passed (0 broken, 0 failed, 222 skipped)

That's my preference, but I do think that any change would be an improvement over the current misleading output.

My current perception is that dynamic "total" value could be a new misleading number to introduce. Probably, it's better to keep numbers almost completely 1-to-1 mapped to the numbers shown by kyua report --verbose. @ngie , do you have a stronger opinion here as a Kyua veteran?

contrib/kyua/cli/cmd_test.cpp
68	Agree. See the next comment discussion, there is a way to cover both points.
contrib/kyua/model/test_result.hpp
45	Thanks for raising this, it also bothered me. Adding the 6th element to the enum is definitely a bad idea, other new options are available since C++17 or so (we are with C++11 here), and extracting it from a TU as extern leads to something like `error: variable length arrays in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]`. And now I see that we could do extract the calculation to the cpp, along with reworking of `type_count` array (the previous comment is about this). That array could easily be migrated to a dynamic structure like `std::map` and avoid strict dependency on other structures. I will bring the change with the next update.

Rework structure of test result types metadata. Fix tests.

Harbormaster completed remote builds in B59504: Diff 143382.Sep 16 2024, 11:25 AM

igoro marked 2 inline comments as done.Sep 16 2024, 11:26 AM

igoro edited the summary of this revision. (Show Details)Sep 16 2024, 11:49 AM

igoro edited the summary of this revision. (Show Details)Sep 16 2024, 12:07 PM

guest-patmaddox added a subscriber: guest-patmaddox.Sep 19 2024, 5:08 PM

ngie accepted this revision.Oct 5 2024, 4:09 AM

This revision is now accepted and ready to land.Oct 5 2024, 4:09 AM

In D46653#1064186, @igoro wrote:

Rework structure of test result types metadata. Fix tests.

Thank you for doing this!

Closed by commit rG99689201a1eb: kyua: Do not count skipped as passed in test cmd (authored by igoro). · Explain WhyOct 5 2024, 11:34 AM

This revision was automatically updated to reflect the committed changes.

igoro added a commit: rG99689201a1eb: kyua: Do not count skipped as passed in test cmd.

igoro added a commit: rG1152da4a99fb: kyua: Do not count skipped as passed in test cmd.Oct 17 2024, 1:51 PM

For the record, stuff like this should really be committed to freebsd/kyua first, then backported. As of right now this change is not in what's intended to go in to 0.14 and in order to do the upgrade I would need to reapply this patch. I provided a ship-it earlier, but I was swayed by the comment in the PR about this change being confusing for naive results parsers: https://github.com/freebsd/kyua/pull/230 .

In D46653#1091513, @ngie wrote:

For the record, stuff like this should really be committed to freebsd/kyua first, then backported. As of right now this change is not in what's intended to go in to 0.14 and in order to do the upgrade I would need to reapply this patch. I provided a ship-it earlier, but I was swayed by the comment in the PR about this change being confusing for naive results parsers: https://github.com/freebsd/kyua/pull/230 .

Yeah, ideally it should go through the GitHub repo. I tried to help with the synchronization by keeping this review and the GitHub PR in sync, the latter was expected to be "blindly" merged after all approvals on the Phab, unfortunately I do not have access to do that. Probably, we could think of some access for me? My vision is that I could merge the things like that PR after all discussions and approvals on the Phab side.

Regarding the change proposed on the GitHub PR. I think it's better to merge the existing PR to keep FreeBSD src/contrib/kyua and the github/freebsd/kyua in sync with absolutely the same commit. And I would open a new PR for the change requested with a reference to the previous GitHub discussion/PR. What do you think is the best here from the organizational perspective?

P.S. The upcoming 0.14 is great news!

In D46653#1091534, @igoro wrote:

In D46653#1091513, @ngie wrote:

For the record, stuff like this should really be committed to freebsd/kyua first, then backported. As of right now this change is not in what's intended to go in to 0.14 and in order to do the upgrade I would need to reapply this patch. I provided a ship-it earlier, but I was swayed by the comment in the PR about this change being confusing for naive results parsers: https://github.com/freebsd/kyua/pull/230 .

Yeah, ideally it should go through the GitHub repo. I tried to help with the synchronization by keeping this review and the GitHub PR in sync, the latter was expected to be "blindly" merged after all approvals on the Phab, unfortunately I do not have access to do that. Probably, we could think of some access for me? My vision is that I could merge the things like that PR after all discussions and approvals on the Phab side.

I think being added to the FreeBSD org is documented somewhere in the Committer's Guide. CC: @markj

Regarding the change proposed on the GitHub PR. I think it's better to merge the existing PR to keep FreeBSD src/contrib/kyua and the github/freebsd/kyua in sync with absolutely the same commit. And I would open a new PR for the change requested with a reference to the previous GitHub discussion/PR. What do you think is the best here from the organizational perspective?

I don't think it would be a good idea to merge the PR as-is. I'm trying to avoid violating POLA as much as humanly possible for 0.14 and taking a look at the output definitely violates POLA. There are enough Linux/MacOS things to deal with -- I'd rather not get more questions with a UX change like has been made on main...

In D46653#1091544, @ngie wrote:

Regarding the change proposed on the GitHub PR. I think it's better to merge the existing PR to keep FreeBSD src/contrib/kyua and the github/freebsd/kyua in sync with absolutely the same commit. And I would open a new PR for the change requested with a reference to the previous GitHub discussion/PR. What do you think is the best here from the organizational perspective?

I don't think it would be a good idea to merge the PR as-is. I'm trying to avoid violating POLA as much as humanly possible for 0.14 and taking a look at the output definitely violates POLA. There are enough Linux/MacOS things to deal with -- I'd rather not get more questions with a UX change like has been made on main...

I forgot to mention explicitly that I was thinking about "for the sake of easy merging between FreeBSD and github/Kyua". Now, as I get it, you worry about potential parsers of the last line of the output. Yes, I mentioned it during the discussion here, and it was like there are no objections to changing it. But, having it changed only for FreeBSD, we could step back and re-think the interface again, before GA.

I would like to list the options that quickly come to mind:

Revert the change [1], keep UI as before.
Revert the change, and probably add an extra line above the last one:

1 passed, 221 skipped, 0 broken, 0 failed, 222 total
222/222 passed (0 failed)

Change the last line to:

1 passed, 221 skipped, 0 broken, 0 failed, 222 total

Change the last line to repeat output of kyua report --verbose:

Test cases: 222 total, 221 skipped, 0 expected failures, 0 broken, 0 failed

If we decide to astonish the users and change the last line then we could think of an extra value for exchange, e.g. make it provide some "smart" summarization to highlight the majority of the results. Perhaps I went too far, or maybe it's just a cup of great coffee talking here :). Please, do not take my wording as the best one:

All failed: 0 passed, 0 skipped, 1 broken, 221 failed, 222 total
Mostly failing: 90 passed, 0 skipped, 1 broken, 131 failed, 222 total
Some failures: 200 passed, 0 skipped, 2 broken, 20 failed, 222 total
All skipped: 0 passed, 222 skipped, 0 broken, 0 failed, 222 total
All passed: 222 passed, 0 skipped, 0 broken, 0 failed, 222 total

[1] Revert the change in FreeBSD main and stable/14 branches. Close the GitHub PR w/o merging.

Please let me know your vision/plan when you have one, and which steps are mine -- I'm ready to help with any option.

In D46653#1091680, @igoro wrote:
In D46653#1091544, @ngie wrote:

Regarding the change proposed on the GitHub PR. I think it's better to merge the existing PR to keep FreeBSD src/contrib/kyua and the github/freebsd/kyua in sync with absolutely the same commit. And I would open a new PR for the change requested with a reference to the previous GitHub discussion/PR. What do you think is the best here from the organizational perspective?

I don't think it would be a good idea to merge the PR as-is. I'm trying to avoid violating POLA as much as humanly possible for 0.14 and taking a look at the output definitely violates POLA. There are enough Linux/MacOS things to deal with -- I'd rather not get more questions with a UX change like has been made on main...

I forgot to mention explicitly that I was thinking about "for the sake of easy merging between FreeBSD and github/Kyua". Now, as I get it, you worry about potential parsers of the last line of the output. Yes, I mentioned it during the discussion here, and it was like there are no objections to changing it. But, having it changed only for FreeBSD, we could step back and re-think the interface again, before GA.

I would like to list the options that quickly come to mind:

Revert the change [1], keep UI as before.

Revert the change, and probably add an extra line above the last one:
1 passed, 221 skipped, 0 broken, 0 failed, 222 total
222/222 passed (0 failed)
Change the last line to:
1 passed, 221 skipped, 0 broken, 0 failed, 222 total
Change the last line to repeat output of kyua report --verbose:
Test cases: 222 total, 221 skipped, 0 expected failures, 0 broken, 0 failed
If we decide to astonish the users and change the last line then we could think of an extra value for exchange, e.g. make it provide some "smart" summarization to highlight the majority of the results. Perhaps I went too far, or maybe it's just a cup of great coffee talking here :). Please, do not take my wording as the best one:
All failed: 0 passed, 0 skipped, 1 broken, 221 failed, 222 total
Mostly failing: 90 passed, 0 skipped, 1 broken, 131 failed, 222 total
Some failures: 200 passed, 0 skipped, 2 broken, 20 failed, 222 total
All skipped: 0 passed, 222 skipped, 0 broken, 0 failed, 222 total
All passed: 222 passed, 0 skipped, 0 broken, 0 failed, 222 total
[1] Revert the change in FreeBSD main and stable/14 branches. Close the GitHub PR w/o merging.

Please let me know your vision/plan when you have one, and which steps are mine -- I'm ready to help with any option.

I kind of thing option 4 makes the most sense.. maybe omitting categories which have 0 results associated with them (less is more)? Let me look at some other test frameworks to see how they do things (this will take me a few days since I’m traveling right now..).

In D46653#1091544, @ngie wrote:

In D46653#1091534, @igoro wrote:

In D46653#1091513, @ngie wrote:

For the record, stuff like this should really be committed to freebsd/kyua first, then backported. As of right now this change is not in what's intended to go in to 0.14 and in order to do the upgrade I would need to reapply this patch. I provided a ship-it earlier, but I was swayed by the comment in the PR about this change being confusing for naive results parsers: https://github.com/freebsd/kyua/pull/230 .

Yeah, ideally it should go through the GitHub repo. I tried to help with the synchronization by keeping this review and the GitHub PR in sync, the latter was expected to be "blindly" merged after all approvals on the Phab, unfortunately I do not have access to do that. Probably, we could think of some access for me? My vision is that I could merge the things like that PR after all discussions and approvals on the Phab side.

I think being added to the FreeBSD org is documented somewhere in the Committer's Guide. CC: @markj

I don't see any docs on that. Am I missing something?

In any case, I submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283127

Regarding the change itself, I have a hard time believing any serious project is parsing the result summary line and using that as input to something else. Do we know of any actual examples?

I kind of thing option 4 makes the most sense..

For the run of the FreeBSD regression suite that I just completed, option 4 gives:

Test cases: 9032 total, 211 skipped, 33 expected failures, 1 broken, 6 failed

Whereas kyua currently prints:

8814/9032 passed (1 broken, 6 failed, 211 skipped)

I think the current behaviour is the most useful. Option 4 is okay, but it doesn't print the number of passed test cases, which is a bit weird at first glance.

In D46653#1092361, @markj wrote:
In D46653#1091544, @ngie wrote:

I think being added to the FreeBSD org is documented somewhere in the Committer's Guide. CC: @markj

I don't see any docs on that. Am I missing something?

In any case, I submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283127

Regarding the change itself, I have a hard time believing any serious project is parsing the result summary line and using that as input to something else. Do we know of any actual examples?

I kind of thing option 4 makes the most sense..

For the run of the FreeBSD regression suite that I just completed, option 4 gives:
Test cases: 9032 total, 211 skipped, 33 expected failures, 1 broken, 6 failed
Whereas kyua currently prints:
8814/9032 passed (1 broken, 6 failed, 211 skipped)

The issue is the ratio… random folks who use other frameworks might be confused by it. My first gut reaction was “X testcases failed…?! Oh wait… they were skipped—really??”. pytest for instance doesn’t emit the passed count as a ratio, but instead outputs the actual count in each category: https://slar.se/essential-pytest-1.html .

Skipped tests shouldn’t count against passed or failed tests. The ratio as it stands is confusing. I vote for just outputting the raw values; non-verbose output can just omit categories without any results — seems like a reasonable compromise from the -v output.

I think the current behaviour is the most useful. Option 4 is okay, but it doesn't print the number of passed test cases, which is a bit weird at first glance.

In D46653#1096065, @ngie wrote:
In D46653#1092361, @markj wrote:
In D46653#1091544, @ngie wrote:

I think being added to the FreeBSD org is documented somewhere in the Committer's Guide. CC: @markj

I don't see any docs on that. Am I missing something?

In any case, I submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283127

Regarding the change itself, I have a hard time believing any serious project is parsing the result summary line and using that as input to something else. Do we know of any actual examples?

I kind of thing option 4 makes the most sense..

For the run of the FreeBSD regression suite that I just completed, option 4 gives:
Test cases: 9032 total, 211 skipped, 33 expected failures, 1 broken, 6 failed
Whereas kyua currently prints:
8814/9032 passed (1 broken, 6 failed, 211 skipped)
The issue is the ratio… random folks who use other frameworks might be confused by it. My first gut reaction was “X testcases failed…?! Oh wait… they were skipped—really??”. pytest for instance doesn’t emit the passed count as a ratio, but instead outputs the actual count in each category: https://slar.se/essential-pytest-1.html .

Skipped tests shouldn’t count against passed or failed tests. The ratio as it stands is confusing. I vote for just outputting the raw values; non-verbose output can just omit categories without any results — seems like a reasonable compromise from the -v output.

We should strive to avoid skipping tests as well, so I don't really see why it's wrong to include skipped test cases in the ratio.

To me this is a bikeshed. I don't insist on keeping the current format, but so long as all the info (i.e., counts of each test result) is there, I don't care very much.

I think the current behaviour is the most useful. Option 4 is okay, but it doesn't print the number of passed test cases, which is a bit weird at first glance.

As @ngie pointed out, I think the ratio is confusing. For reference, here are how Elixir and RSpec (Ruby) report a test run including skips:

207 tests, 0 failures, 3 excluded

3 examples, 0 failures, 3 pending

Revision Contents
Changeset List

Path

Size

contrib/

kyua/

cli/

cmd_test.cpp

50 lines

integration/

cmd_test_test.sh

42 lines

model/

test_result.hpp

15 lines

test_result.cpp

48 lines

Diff 144300

View Options

contrib/kyua/cli/cmd_test.cpp

View Options

contrib/kyua/integration/cmd_test_test.sh

View Options

contrib/kyua/model/test_result.hpp

View Options

contrib/kyua/model/test_result.cpp

kyua: Do not count skipped as passed in test cmdClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 144300

contrib/kyua/cli/cmd_test.cpp

contrib/kyua/integration/cmd_test_test.sh

contrib/kyua/model/test_result.hpp

contrib/kyua/model/test_result.cpp

kyua: Do not count skipped as passed in test cmd
ClosedPublic
Actions

Revision Contents
Changeset List