Page MenuHomeFreeBSD

geom: Make g_waitidle() wait for orphaned providers
AcceptedPublic

Authored by markj on Mon, Feb 2, 2:07 AM.
Tags
None
Referenced Files
F144050933: D55049.diff
Tue, Feb 3, 10:57 PM
Unknown Object (File)
Tue, Feb 3, 2:08 AM
Unknown Object (File)
Mon, Feb 2, 3:28 PM
Unknown Object (File)
Mon, Feb 2, 5:31 AM
Unknown Object (File)
Mon, Feb 2, 5:14 AM
Subscribers

Details

Summary

This is motivated by the following race in the ZFS zvol code.

When a zvol is created, we create a GEOM-backed zvol, which results in a
/dev/zvol/<zvol path> device file, created by GEOM::dev. If volmode=dev
is specified, zvol_set_volmode_impl() will wither the GEOM, then create
a device file with the same name. This sometimes fails because
g_wither_geom() is asynchronous, so we end up trying to create a device
file while the old one still exists. I want to fix this by adding a
g_waitidle() call to zvol_os_remove_minor().

g_waitidle() is not sufficient: GEOM::dev does not destroy the device
until g_dev_orphan() is called. (In fact the device destruction is
asynchronous too, but the delist_dev() call is sufficient to address
this race.) So, I propose modifying g_waitidle() to block until
orphaned providers are processed.

PR: 258766

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 70336
Build 67219: arc lint + arc unit

Event Timeline

markj requested review of this revision.Mon, Feb 2, 2:07 AM
This revision is now accepted and ready to land.Mon, Feb 2, 5:04 AM

It seems to make sense to know that we at least tried to orphan the device, even if it may still be open for a while. But I don't remember why it was not done before.

Speaking about ZFS, I remember that some of ZVOL tests in ZTS are known to be failing on FreeBSD and marked as such (switching volmode or something similar?). Wonder if your fix will help it too.

In D55049#1258481, @mav wrote:

It seems to make sense to know that we at least tried to orphan the device, even if it may still be open for a while. But I don't remember why it was not done before.

The providergone method is relatively new. Before that, I guess there was little reason to care.

Speaking about ZFS, I remember that some of ZVOL tests in ZTS are known to be failing on FreeBSD and marked as such (switching volmode or something similar?). Wonder if your fix will help it too.

Could be, I did not try running the openzfs tests yet. I will do it later this week. If you are curious, my WIP changes are here: https://github.com/markjdb/freebsd/commits/main-zvol-races/

This commit tries to fix the race: https://github.com/markjdb/freebsd/commit/cd59711024878bded9db0f172923048c060a7467

In D55049#1258481, @mav wrote:

It seems to make sense to know that we at least tried to orphan the device, even if it may still be open for a while. But I don't remember why it was not done before.

The providergone method is relatively new. Before that, I guess there was little reason to care.

Before that classes tracked provider open/close to know when to be gone. Orphanization was there from the beginning.

This looks good to me, and I agree with markj's analysis.