tests: Add better pjdfstest integration
Closed, Public

Authored by markj on Apr 23 2026, 5:00 PM.
Referenced Files
F159943209: D56605.id.diff
Details

Summary

Use ATF to wrap the new reimplementation of pjdfstest that came out of
GSOC 2022: https://github.com/saidsay-so/pjdfstest

So far I added tests for UFS, tmpfs and ZFS. More filesystems and
filesystem option combinations should be added. The result mostly
works, but see the comments below.

The p9fs tests don't really work properly: they need a 9pfs share to be
provided by the hypervisor, which itself is not a big deal, but we also
need the hypervisor to be running as the root user, which I avoid. I
think the solution there will be to implement an INET/unix socket
transport for p9fs, and then add a simple 9p server to the base system
and use that for testing.

This version of pjdfstest requires a pjdfstest user, which we currently
don't have. The current plan is for the pjdfstest package to create the
user upon installation.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

As I mentioned by email, some tests need more than one unprivileged user. Some need up to three. We can't really eliminate that need, but we can change the configured users.

Hmm, so we have "nobody" and "tests"... I guess we could just use another system user, maybe "daemon"? Alternately, the port could create a pjdfstest user, or we could require the test runner to configure one somehow (i.e., skip the tests unless a pjdfstest user exists).

I'm in favor of creating that user in the port.

This revision is now accepted and ready to land. Apr 23 2026, 8:55 PM

Yeah, I think that makes the most sense.

One more related question: should the rewrite call itself pjdfstest, or something else? Is it going to replace the upstream implementation?

This revision now requires review to proceed. Apr 27 2026, 1:37 PM

Yes, the plan is that this will replace the sh-based implementation. So I don't want to completely rename it. Or I suppose we could just retire the sh-based implementation, and call this one "pjdfstest2" or something.
Also, while this diff LGTM, you shouldn't commit it just yet, until we resolve all of the issues you opened on GitHub. That's because some of those might change the config file format.

tests/sys/fs/pjdfstest/ufs.sh
61

Is the birthtime failure related to 64-bit timestamps? Or is that due to "not having a birthtime field"?

pjdfstest2 seems fine. I have no real preference, just wondering.

Sure, no rush. I wasn't going to commit this until pjdfstest is available as a binary package.

markj added a reviewer: tests.

Allow parallel test runs by default now that pjdfstest handles it.

Please note that the new version requires _rust_. That's a pretty large dependency to require for CI, and one that CI doesn't have today.
This is part of the reason why I was more keen on doing python+pytest integration: python is more common in CI and pytest is on the docket for kyua integration (it's on my TODO list because I want the atf_python stuff to go away...).

tests/sys/fs/pjdfstest/ufs.sh
2

Please don't add shebangs to ATF tests: the build process automatically adds its own shebang, which uses the appropriately pathed atf-sh.

tests/sys/fs/pjdfstest/zfs.sh
37
49
87

Why is this the only test where the directory is cleaned up after the fact?

markj marked an inline comment as done.

Address some comments

I don't follow: the tests require a compiled rust executable, which will be available via the ports tree. There is no dependency on rust itself.

tests/sys/fs/pjdfstest/zfs.sh
37

Why? Quoting isn't needed around assignments like this: the shell doesn't perform field splitting or pathname expansion on the right-hand side of an assignment.

87

This test creates an extra directory.

asomers requested changes to this revision. May 7 2026, 2:45 PM
asomers added inline comments.
tests/sys/fs/pjdfstest/p9fs.sh
21 ↗(On Diff #176715)

Since you have a dedicated test file system, and aren't merely using /tmp, I suggest enabling the remount tests, here and for the other file systems too.

tests/sys/fs/pjdfstest/tmpfs.sh
44

require.progs here, too

This revision now requires changes to proceed. May 7 2026, 2:45 PM
markj marked 2 inline comments as done. May 7 2026, 3:39 PM
markj added inline comments.
tests/sys/fs/pjdfstest/p9fs.sh
21 ↗(On Diff #176715)

This doesn't trivially work:

open::erofs_named                                                         FAILED
        called `Result::unwrap()` on an `Err` value: Failed to remount: mount: tmpfs: Device busy

I presume pjdfstest has its cwd set to some path under the mount?

tests/sys/fs/pjdfstest/p9fs.sh
21 ↗(On Diff #176715)

I saw that with UFS too, but not ZFS. I have a theory, but I'm not sure of the best way to handle it. My theory is that on certain file systems, close triggers some kind of asynchronous task that flushes data to disk, and prevents the file system from being remounted read-only. ZFS either doesn't do that, or else the remount operation waits for such tasks to complete.
Should we consider that a file system bug, and require all file systems to handle it the way that ZFS does? Or do we switch the test to use "unmount -f", which works?

tests/sys/fs/pjdfstest/p9fs.sh
21 ↗(On Diff #176715)

I see this on tmpfs and ufs, so I think there must be something else going on.

tests/sys/fs/pjdfstest/p9fs.sh
21 ↗(On Diff #176715)

Just revisiting this: shall I look into remount issue? I haven't tried to debug it.

Rebase, drop p9fs tests for now.

@asomers I retried the tests with the pjdfstest built from ports. All of the tests pass except the UFS1 one:

root@freebsd:/tmp/kyua.uO4a2P/2/work # pjdfstest -c pjdfstest.toml -p mnt
utimensat::utime_now_write_perm                                               ok
utimensat::changes_timestamps::regular                                        ok
utimensat::changes_timestamps::socket                                         ok
utimensat::subsecond                                                          ok
utimensat::utime_now_nobody                                                   ok
utimensat::changes_timestamps::dir                                            ok
utimensat::nobody                                                             ok
utimensat::utime_now_root                                                     ok
utimensat::order                                                              ok
utimensat::follow_symlink                                                     ok
utimensat::root                                                               ok
Abort trap
root@freebsd:/tmp/kyua.uO4a2P/2/work # echo $?
134

I see something similar with an NFS client mount. Please let me know if you're short on time and I'll try to debug it. On top of this I'd quite like to get some basic NFS testing into the tree.

I'll take a look at the UFSv1 failure.

It took a while, but I found the problem. It's this commit: https://github.com/freebsd/freebsd-ports/commit/83a19a60d13fe . For some reason, that commit forces panic=abort for any Rust port that uses LTO, which is most of them. I don't know why diizzy included that line. But I can confirm that removing that line fixes the problem.

But why is pjdfstest panicking in the first place?

It's panicking because those two test cases fail on UFS1. In pjdfstest, a test case failure is implemented as a panic that gets caught, a little bit like exceptions in C++. But when built with panic=abort, it's impossible to catch a panic.
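The failure-as-panic pattern described here can be sketched with the standard library's `std::panic::catch_unwind`. This is a minimal illustration that assumes the harness works roughly this way; `run_case` and `Outcome` are hypothetical names, not pjdfstest's code. It also accounts for the UFS1 log earlier in the thread: when the binary is built with `panic = "abort"`, the first failing case aborts the whole process with SIGABRT ("Abort trap"), which the shell reports as exit status 128 + 6 = 134.

```rust
use std::panic;

/// Hypothetical per-case result type for this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed(String),
}

/// Run one test case. A failing assertion, or an `unwrap()` on an `Err`,
/// panics; `catch_unwind` stops the unwinding here and turns it into a
/// `Failed` result, so the remaining cases can still run.
///
/// This only works when the binary is built with `panic = "unwind"` (the
/// default). With `panic = "abort"`, the process aborts at the first panic
/// and `catch_unwind` never returns.
fn run_case<F: FnOnce() + panic::UnwindSafe>(f: F) -> Outcome {
    match panic::catch_unwind(f) {
        Ok(()) => Outcome::Passed,
        Err(payload) => {
            // Panic payloads are usually a &'static str (literal message)
            // or a String (formatted message).
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Outcome::Failed(msg)
        }
    }
}
```

Under the default `panic = "unwind"`, `run_case(|| { panic!("boom"); })` returns `Outcome::Failed("boom")` and the run continues (the default panic hook still prints the message to stderr); under `panic = "abort"`, the same call never returns.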

I'll fix the port so it doesn't panic=abort anymore.

@asomers did you have any more comments on the tests? I think all of my issues are resolved now.

Sprinkle more require.progs

tests/sys/fs/pjdfstest/tmpfs.sh
45

This should be either require.user root OR somehow require that vfs.usermount=1.

tests/sys/fs/pjdfstest/zfs.sh
74

This should be require.user root

markj marked an inline comment as done.

More require.user root

This revision is now accepted and ready to land. Wed, Jun 17, 7:07 PM
ngie requested changes to this revision. (Edited) Wed, Jun 17, 7:31 PM

This forces all end-users of pjdfstest to build/install the new rust application/binding instead of the simple C application in tree. I would argue that's a regression.

Dell is stuck on an old version of rust for various reasons and upgrading the builder image/toolchain requires a proverbial act of Congress (I love that compiling a new toolchain requires a builder host with 768GB+ of RAM/12 CPUs just to finish within a day of starting). This is part of the reason why I wasn't on the bandwagon of adopting rust in everything in base, despite the fact that I really like rust as a language in concept.

I think it would be a good idea to add a build option to build/run either one or the other. Otherwise, some consumers will be broken by this change, because not everyone has access to a rust toolchain and rust has a poor story around ABI compatibility due to the fact that its bootstrap process is self-hosted (version N-1 of the toolchain must build version N of the toolchain).

Side note

I guess this forces my hand to rewrite everything in pytest and get around to adding native pytest support in Kyua--which I've been saying needs to be done for years, but haven't gotten around to, because it's tedious work that doesn't involve a ton of fanfare/usually invites more maintainer burden. Ugh.

I'm not going to rewrite things in C/C++, atf-sh, etc -- that's going down the path to unmaintainable/unportable hell (there are people still using pjdfstest on Linux, Solaris, etc; we get periodic bug reports from them).

I really wish I didn't work for corporate America so I had more time to invest in efforts like this.

This revision now requires changes to proceed. Wed, Jun 17, 7:31 PM

> This forces all end-users of pjdfstest to build/install the new rust binding instead of the simple C application in tree.
> I think it would be a good idea to add a build option to build/run either one or the other. Otherwise, some consumers will be broken by this change because not everyone has access to the rust toolchain.

The simple C/sh application isn't really suitable for automated testing. That's the main reason that the Rust version was written. The main problem with the C/sh application is that it's not configurable. There's no way for a user to change the selection of tests to run. The only way to do that is to modify the application itself, which isn't scalable. Since retiring mips, the only FreeBSD platform that doesn't have Rust support is armv6, and that platform is already retired in 15.X. So I'm comfortable leaving the sh-based version out of the automated test suite.
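To make the configurability point concrete, here is a hypothetical selection filter: the runner receives patterns as input and runs every case whose name contains one of them, so narrowing a run requires no change to the program itself. This is only a sketch; `select` and its substring-matching rule are assumptions, not pjdfstest's actual command-line or config syntax. The example names are test cases that appear in the logs in this thread.

```rust
/// Hypothetical test-selection filter (not pjdfstest's real interface):
/// a case is selected if its name contains any of the user-supplied
/// patterns, or if no patterns were given at all.
fn select<'a>(all: &[&'a str], patterns: &[&str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|name| patterns.is_empty() || patterns.iter().any(|p| name.contains(*p)))
        .collect()
}
```

For example, passing `&["utimensat::"]` keeps only the utimensat cases, while an empty pattern list runs everything.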

Apologies for the many edits.

Please reread the latest version.

Dell (PowerScale OneFS) can't use rust in a sustainable manner without switching to Linux (don't get me started on how much people are still itching to do that..) because of clowny technical/bureaucratic reasons.

Also, we (Dell) have it integrated into a custom internal test harness and running in an automated manner, so the current mechanism can be automated -- it just (as you rightfully note) sucks to do so.

Really? I can build lang/rust on my Ryzen 5950X with 64 GB of ram in just under an hour.

If you _really_ want to do this, I have a PoC pjdfstest rewrite in Python using python's unittest module. I'll give that to you if you want, though I'd rather focus my own efforts on the Rust version, which is much more complete.

I can't get into details too much, but yes.. ~12 hours if I'm lucky because it has to bootstrap using the stage 1 compiler instead of rebuilding everything with the stage 2 compiler (which FreeBSD can do).

Any little bit helps (I would appreciate the head-start).

pytest offers every bit of flexibility that the rust version of pjdfstest would without too much bespoke logic (I would need to write some code to handle the config, but a lot of that stuff is pretty easy to accomplish). The big downsides of going from rust to python are the performance losses and losing the static typing/type safety :(.

Ouch.

It's all yours: https://github.com/asomers/py-pjdfstest . Also, I want to caution you that Python has an additional handicap: its C interface is much more awkward than Rust's. The Rust standard library is simple enough that for many things, we use it directly to test the file system. And for other stuff, it's pretty easy to call Rust's libc bindings. But the Python version needs to define a Python-C interface for pretty much everything that touches the file system.
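As a rough illustration of "using the standard library directly", the sketch below does a chmod-then-stat round trip with nothing but `std` and its Unix extension traits in `std::os::unix::fs`, with no hand-written C bindings. `chmod_roundtrip` is a hypothetical helper written for this example, not code from pjdfstest.

```rust
use std::fs::{self, File, Permissions};
use std::io;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper: create `path`, chmod it to `mode`, and return the
/// permission bits that stat(2) reports afterwards. Every call is plain
/// `std`; the Unix-specific pieces come from `std::os::unix::fs`.
fn chmod_roundtrip(path: &Path, mode: u32) -> io::Result<u32> {
    File::create(path)?;
    // chmod(2) sets the mode exactly; the umask only applies at creation.
    fs::set_permissions(path, Permissions::from_mode(mode))?;
    let meta = fs::metadata(path)?;
    // st_mode also carries the file-type bits; keep only the permission bits.
    Ok(meta.mode() & 0o7777)
}
```

A file-system test would compare the returned bits against the requested mode, e.g. expecting `0o640` back after asking for `0o640`.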

BTW, what version of Rust is OneFS stuck on? If it isn't too old, we may be able to make pjdfstest work on it.

> ...

> It's all yours: https://github.com/asomers/py-pjdfstest .

Thank you!

> Also, I want to caution you that Python has an additional handicap: its C interface is much more awkward than Rust's. The Rust standard library is simple enough that for many things, we use it directly to test the file system. And for other stuff, it's pretty easy to call Rust's libc bindings. But the Python version needs to define a Python-C interface for pretty much everything that touches the file system.

Yeah, the CFFI part is kind of annoying, but it's a part of the joy of doing C bindings in python, unless you have a thing for SWIG or doing everything with the raw python-C APIs *shudders*.

> BTW, what version of Rust is OneFS stuck on? If it isn't too old, we may be able to make pjdfstest work on it.

1.74.1, which was released in 12/2023. I don't honestly know if you want to deal with that pain :(.

It actually wasn't hard. Would you like to try it out? https://github.com/saidsay-so/pjdfstest/pull/186 .

Well,

  1. for FreeBSD users, this is a matter of running "pkg install pjdfstest"
  2. the test suite already requires quite a few different third party packages, including e.g., py-cryptography (transitively), which already depends on rust to build
  3. the existing pjdfstest tests are not hooked up to the test suite at all, so are not getting run in CI etc.
  4. one always has the fallback option of simply not installing the pjdfstest package, and they'll be exactly where they are today.

So, I don't really see how this is a regression.

It's a regression for folks that use the tool in-tree as-designed because:

  1. The existing tool does not support the configuration format required by the rust tool. Execution is driven by prove/kyua.
  2. This mechanism is not opt-in and there isn't an opt-out mechanism, so folks are going to be confused when their test runs start taking 15+ minutes longer to complete and/or start filling up data partitions/datasets. If the test timeouts are set naively, this could result in failures when running these tests.

Has @lwhsu been consulted on this? ci.freebsd.org has ZFS specific test pipelines today for running the ZFS tests; it seems like it would be a good idea to create a separate set of pjdfstest pipeline jobs to verify that things are ok (I'm all for this! we should have been running pjdfstest for ages now..).

> It's a regression for folks that use the tool in-tree as-designed because:
>
>   1. The existing tool does not support the configuration format required by the rust tool. Execution is driven by prove/kyua.

Why is that a problem? Is the sh-based pjdfstest in your path or something? Are you worried that markj's new tests will try to invoke the sh-based tool?

>   2. This mechanism is not opt-in and there isn't an opt-out mechanism, so folks are going to be confused when their test runs start taking 15+ minutes longer to complete and/or start filling up data partitions/datasets. If the test timeouts are set naively, this could result in failures when running these tests.

Again, that shouldn't happen unless they're running the sh-based tool by mistake. The rust-based tool only takes seconds.

> Has @lwhsu been consulted on this? ci.freebsd.org has ZFS specific test pipelines today for running the ZFS tests; it seems like it would be a good idea to create a separate set of pjdfstest pipeline jobs to verify that things are ok (I'm all for this! we should have been running pjdfstest for ages now..).

No, but given that these tests only take seconds, I don't think a new CI pipeline is warranted.

> It's a regression for folks that use the tool in-tree as-designed because:
>
>   1. The existing tool does not support the configuration format required by the rust tool. Execution is driven by prove/kyua.

This isn't a problem unless you install the new implementation. The old one isn't installed to the PATH.

>   2. This mechanism is not opt-in and there isn't an opt-out mechanism, so folks are going to be confused when their test runs start taking 15+ minutes longer to complete and/or start filling up data partitions/datasets. If the test timeouts are set naively, this could result in failures when running these tests.

Of course you can opt out: don't install the pjdfstest package.

> Has @lwhsu been consulted on this? ci.freebsd.org has ZFS specific test pipelines today for running the ZFS tests; it seems like it would be a good idea to create a separate set of pjdfstest pipeline jobs to verify that things are ok (I'm all for this! we should have been running pjdfstest for ages now..).

Why does that need to be separate? pjdfstest runs on ZFS in about 15 seconds. The ZTS takes many hours to run (assuming you don't shard it across multiple runners), and it's definitely not running in jenkins today.

ngie accepted this revision. (Edited) Thu, Jun 18, 9:47 PM

I'm sorry -- y'all are right about the 2 implementations being able to coexist. 15 seconds is really short, but that's done on an in-memory filesystem, whereas our tests are run on physical device/virtual compute (so vSphere, etc) backed devices with a WITNESS/INVARIANTS enabled kernel (runtime is ~10 minutes). I'm not sure what the kernel options were for the 15 second run, but regardless, it would probably be at least an order of magnitude longer than the runtime for the new pjdfstest implementation.

I would just add a build knob and documentation for clarity if possible so we could split up the old implementation from the new one from ports. I can do that if needed after the fact though, since I'm the only person really raising any concerns about the new implementation. My coworkers would likely be none the wiser for a couple years given our current rate of upgrading FreeBSD..

This revision is now accepted and ready to land. Thu, Jun 18, 9:47 PM

BTW, using an in-memory file system isn't the reason why markj's tests run quickly. The main reason is that rust-based pjdfstest uses shorter sleeps. sh-based pjdfstest hard-codes a bunch of one-second sleeps, but rust-based pjdfstest uses a configurable sleep time. For file systems with sub-one-second timestamp granularity, we can really dial down the sleep time. For instance, just now I ran rust-based pjdfstest on a 7200 RPM HDD. It took 300ms for ZFS, and 4.3s for UFS.
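The reasoning about sleep lengths can be modeled in a few lines: a file system that stores timestamps at granularity g rounds each time down to a multiple of g, so two operations are only guaranteed distinct stored timestamps if they happen at least g apart. This is a simplified model that ignores clock resolution and scheduling jitter, and `stored_time`/`distinguishable` are hypothetical names, not pjdfstest code.

```rust
use std::time::Duration;

/// Model of a file system storing timestamps at `granularity`: round a
/// time (nanoseconds since the epoch) down to a multiple of the granule.
fn stored_time(t_ns: u128, granularity: Duration) -> u128 {
    let g = granularity.as_nanos().max(1);
    t_ns - t_ns % g
}

/// Would two operations `sleep` apart, the first at `t0_ns`, get different
/// stored timestamps? Sleeping at least one `granularity` guarantees it for
/// any `t0_ns`, so the sleep can shrink as the granularity gets finer.
fn distinguishable(t0_ns: u128, sleep: Duration, granularity: Duration) -> bool {
    stored_time(t0_ns, granularity) != stored_time(t0_ns + sleep.as_nanos(), granularity)
}
```

With one-second granularity a 10 ms sleep may leave both timestamps equal, which is why the sh-based suite sleeps a full second; with nanosecond granularity the same 10 ms sleep is already enough.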

Dang -- not bad. Are there any known data creation races that need to be managed in the Rust implementation? How much "tire kicking" has the Rust version gotten so far? How about A/B test code coverage from the kernel?

What about just using the tests user in the upstream port?

% id tests
uid=977(tests) gid=977(tests) groups=977(tests)

Data creation races? Not that I know of. The closest thing I know of is that the remount tests don't currently work on UFS. It seems that some operation, maybe close(), on UFS is asynchronous and blocks a subsequent umount if it happens too quickly. As for "tire-kicking", a lot less than the sh-based version. This PR is probably the best yet. And by A/B code coverage, do you mean running both implementations and comparing the UFS or ZFS coverage in kernel code? No, there hasn't been. And in fact, I haven't done any FreeBSD kernel code coverage checking using modern tools. But before anybody does that, there are still some test cases that haven't been fully ported over from the sh-based version. A lot of negative tests, for example. Various EPERM conditions, etc.

Rust-based pjdfstest required three unprivileged users. So it was using nobody, tests, and pjdfstest. This is actually fewer than sh-based pjdfstest was using, but sh-based pjdfstest used hard-coded UIDs instead of usernames. However, the comment you're replying to is out-of-date. I've subsequently adjusted rust-based pjdfstest so that it now only requires two unprivileged users. That means that we can get by without creating any new ones.
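Looking users up by name, rather than hard-coding UIDs, also makes it easy for a wrapper to skip cleanly when a required account is missing. A minimal sketch, assuming the two required users are `nobody` and `tests` (the comment above does not name them) and parsing passwd(5)-format text directly to stay within `std`; a real implementation would more likely call getpwnam(3). `uid_of` and `missing_users` are hypothetical helpers.

```rust
/// Hypothetical helper: look up a user's UID by name in passwd(5)-format
/// text (name:password:uid:gid:...).
fn uid_of(passwd: &str, name: &str) -> Option<u32> {
    passwd
        .lines()
        // Skip comment lines, as found at the top of /etc/passwd.
        .filter(|line| !line.starts_with('#'))
        .find_map(|line| {
            let mut fields = line.split(':');
            if fields.next()? != name {
                return None;
            }
            // After the name come the password field and then the UID.
            fields.nth(1)?.parse().ok()
        })
}

/// Hypothetical helper: return the required users that are absent, so a
/// wrapper can skip the test case instead of failing it.
fn missing_users<'a>(passwd: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|user| uid_of(passwd, user).is_none())
        .collect()
}
```

For example, `missing_users(passwd, &["nobody", "tests"])` returning an empty list means the case can run; otherwise the wrapper can report which account is missing.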

This revision was automatically updated to reflect the committed changes.