Details

Reviewers

Commits

rG11ed0a95bfa7: zfs tests: stop writing to arbitrary devices

Summary

TL;DR: Three ZFS tests created ZFS pools on all unmounted devices listed
in /etc/fstab, corrupting their contents. Stop that.

Imagine my surprise when the ESP on my main dev/test VM would "randomly"
become corrupted, making it unbootable. Three tests collect various devices
from the system and try to add them to a test pool. The test expects this
to fail because it _assumes_ these devices are in use and ZFS will correctly
reject the request.

My /etc/fstab has two entries for devices in /dev:

/dev/gpt/swap0  none        swap    sw,trimonce,late
/dev/gpt/esp0   /boot/efi   msdosfs rw,noauto

Note the noauto on the ESP. In a remarkable example of irony, I chose
this because it should keep the ESP more protected from corruption;
in fact, mounting it would have protected it from this case.

The tests added all of these devices to a test pool in a _single command_,
expecting the command to fail. The swap device was in use, so the command
correctly failed, but the ESP was added and therefore corrupted. However,
since the command correctly failed, the test didn't notice the ESP problem.
If each device had been added with its own command, the test _might_ have
noticed that one of them incorrectly succeeded. However, two of these
tests would not have noticed:

hotspare_create_001_neg was incorrectly specified as needing the Solaris
dumpadm command, so it was skipped. _Some_ of the test needs that command,
but it checks for its presence and runs fine without it.

Due to bug 241070, zpool_add_005_pos was marked as an expected failure.
Due to the coarse level of integration with ATF, this test would still
"pass" even if it failed for the wrong reason. I wrote bug 267554 to
reconsider the use of atf_expect_fail in these tests.

Let's further consider the use of various devices found around the system.
In addition to devices in /etc/fstab, the tests also used mounted devices
listed by the mount command. If ZFS behaves correctly, it will refuse
to added mounted devices and swap devices to a pool. However, these are
unit tests used by developers to ensure that ZFS still works after they
modify it, so it's reasonable to expect ZFS to do the _wrong_ thing
sometimes. Using random host devices is unsafe.

Fix the root problem by using only the disks provided via the "disks"
variable in kyua.conf. Use one to create a UFS file system and mount it.
Use another as a swap device. Write code to use a third as a dump device,
but leave it disabled due to bug 241070.

While I'm here:

Due to commit 6b6e2954dd65, we can simply add a second dump device and
remove it in cleanup. We no longer need to save, replace, and restore the
pre-existing dump device.

The cleanup_devices function used camcontrol inquiry to distinguish disks
from other devices, such as partitions. That works fine for SCSI, but not
for ATA or VirtIO block. Use geom disk list instead.

PR: 241070
PR: 267554

Test Plan

These three tests pass without corrupting the ESP.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

vangyzen created this revision.Nov 3 2022, 8:10 PM

Herald added subscribers: dab, badger, delphij, imp. · View Herald TranscriptNov 3 2022, 8:10 PM

vangyzen requested review of this revision.Nov 3 2022, 8:10 PM

Harbormaster completed remote builds in B48131: Diff 112570.Nov 3 2022, 8:10 PM

asomers added inline comments.Nov 3 2022, 8:36 PM

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_005_pos.ksh
74	I disagree with this change. If anybody ever fixes bug 241070, it's unlikely that they'll think to update this test, and we'll continue to lack coverage. Having an expected_failure test fail for the "wrong" reason is a real hazard, but it's less bad than the alternatives, such as simply disabling the test. In this case, might I suggest moving the dump_dev assertion into a separate standalone test case so it can still be marked with atf_expect_fail, while all of the other stuff can remain here and hopefully pass?
77	I know that many of the other ZFS tests do this, but it's really best not to use "log_must" in a cleanup function. Because if it fails, then it will prevent the rest of the cleanup function from running.

grahamperrin added a subscriber: grahamperrin.Nov 4 2022, 1:21 AM

vangyzen added inline comments.Nov 4 2022, 1:17 PM

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_005_pos.ksh
74	If anybody ever fixes bug 241070, it's unlikely that they'll think to update this test, I just mentioned these tests in a comment. That will greatly increase the odds that they'll get fixed. But even then, we're only human, so I agree, they could get missed. In this case, might I suggest moving the dump_dev assertion into a separate standalone test case so it can still be marked with atf_expect_fail, while all of the other stuff can remain here and hopefully pass? I don't want to make even more copies of these tests. I already had to fix the same issue in three nearly identical copies. I don't want to make it worse. I also don't have the time to refactor. How about this? I could remove the `if $bug_241070_is_fixed` conditions and invert the sense of the test (`log_must zpool add ... dump_dev`) with a comment referring to the bug. It's not an explicit expected failure, but it's practically the same and avoids all of the above issues. I'll code that up for review.
77	Thanks. I admit, I was mostly on auto-pilot for the cleanup.

Don't use log_must in cleanup.
Invert the test for dump devices in lieu of atf_expect_fail.

Harbormaster completed remote builds in B48145: Diff 112602.Nov 4 2022, 1:49 PM

Upload the whole diff.

Harbormaster completed remote builds in B48146: Diff 112603.Nov 4 2022, 1:53 PM

vangyzen marked 2 inline comments as done.Nov 4 2022, 1:57 PM

Fix the tests so they actually pass

Harbormaster completed remote builds in B48149: Diff 112608.Nov 4 2022, 2:45 PM

@asomers What do you think?

Well, it isn't ideal. But it's certainly much better than it used to be, and the ZFS tests are much too big to shoot for "ideal".

This revision is now accepted and ready to land.Nov 11 2022, 5:35 PM

In D37257#848954, @asomers wrote:

Well, it isn't ideal. But it's certainly much better than it used to be, and the ZFS tests are much too big to shoot for "ideal".

Agreed on all points. Thanks for the review.

Closed by commit rG11ed0a95bfa7: zfs tests: stop writing to arbitrary devices (authored by vangyzen). · Explain WhyNov 11 2022, 8:45 PM

This revision was automatically updated to reflect the committed changes.

vangyzen added a commit: rG11ed0a95bfa7: zfs tests: stop writing to arbitrary devices.

zfs tests: stop writing to arbitrary devices
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 113017

tests/sys/cddl/zfs/include/libtest.kshlib

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add.kshlib

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_005_pos.ksh

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_test.sh

tests/sys/cddl/zfs/tests/hotspare/hotspare_add_003_neg.ksh

tests/sys/cddl/zfs/tests/hotspare/hotspare_create_001_neg.ksh

tests/sys/cddl/zfs/tests/hotspare/hotspare_test.sh

zfs tests: stop writing to arbitrary devicesClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 113017

tests/sys/cddl/zfs/include/libtest.kshlib

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add.kshlib

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_005_pos.ksh

tests/sys/cddl/zfs/tests/cli_root/zpool_add/zpool_add_test.sh

tests/sys/cddl/zfs/tests/hotspare/hotspare_add_003_neg.ksh

tests/sys/cddl/zfs/tests/hotspare/hotspare_create_001_neg.ksh

tests/sys/cddl/zfs/tests/hotspare/hotspare_test.sh

zfs tests: stop writing to arbitrary devices
ClosedPublic
Actions

Revision Contents
Changeset List