
move handling of zvol devices out of txg sync thread
AbandonedPublic

Authored by avg on Jul 11 2016, 10:57 AM.
Tags
None

Details

Reviewers
pjd
smh
mav
Group Reviewers
ZFS
Summary

Handling zvol devices in the txg sync thread caused a deadlock because
zvol devices are manipulated under spa_namespace_lock, but there are
cases where that lock is held while waiting for the sync thread.

This change is imperfect, because zvol updates are applied with an
arbitrary delay after the corresponding dataset changes. So zvol
devices can potentially get out of sync with their datasets if
multiple changes are performed concurrently.
A better solution would be to queue all changes and apply them to zvol
devices in the same order as they are originally done.
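
A minimal user-space sketch of the "queue all changes" idea follows. It is not taken from this patch: every name in it (zvol_op, zvol_opq, zvol_opq_enqueue, zvol_opq_drain) is hypothetical, and the real device work is replaced by appending to a text log. Locking is omitted; a kernel version would presumably need a mutex around the list and a separate thread to run the drain step.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical change kinds; illustrative only, not from the patch. */
enum zvol_op_kind { ZVOL_OP_CREATE, ZVOL_OP_RENAME, ZVOL_OP_REMOVE };

/* One recorded dataset change that still has to be applied to a device. */
struct zvol_op {
	enum zvol_op_kind kind;
	char name[64];
	struct zvol_op *next;
};

/* FIFO of pending changes plus a log standing in for real device work. */
struct zvol_opq {
	struct zvol_op *head, *tail;
	char applied[256];
};

static void
zvol_opq_init(struct zvol_opq *q)
{
	memset(q, 0, sizeof(*q));
}

/*
 * Would be called where the dataset change happens (the sync context).
 * It only appends a record and never touches the devices themselves.
 */
static void
zvol_opq_enqueue(struct zvol_opq *q, enum zvol_op_kind kind, const char *name)
{
	struct zvol_op *op = calloc(1, sizeof(*op));

	assert(op != NULL);
	op->kind = kind;
	snprintf(op->name, sizeof(op->name), "%s", name);
	if (q->tail != NULL)
		q->tail->next = op;
	else
		q->head = op;
	q->tail = op;
}

/*
 * Would be called from a separate context, outside the sync thread.
 * Applies the pending changes strictly in the order they were queued
 * and returns how many were applied.
 */
static int
zvol_opq_drain(struct zvol_opq *q)
{
	static const char *const kinds[] = { "create", "rename", "remove" };
	struct zvol_op *op;
	int n = 0;

	while ((op = q->head) != NULL) {
		size_t used = strlen(q->applied);

		q->head = op->next;
		if (q->head == NULL)
			q->tail = NULL;
		/* Stand-in for the real create/rename/remove of a device. */
		snprintf(q->applied + used, sizeof(q->applied) - used,
		    "%s:%s;", kinds[op->kind], op->name);
		free(op);
		n++;
	}
	return (n);
}
```

Because each change is recorded at the point where it happens and replayed from the head of the list, two changes to the same dataset cannot be applied to the devices in the wrong order, however long the replay is delayed.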

PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864
See: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/2924

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage
Build Status
Buildable 4475
Build 4526: arc lint + arc unit

Event Timeline

avg retitled this revision to move handling of zvol devices out of txg sync thread.
avg updated this object.
avg edited the test plan for this revision. (Show Details)
avg added reviewers: ZFS, smh, pjd.

This looks reasonable; however, as you say, it's potentially racy with multiple renames happening.

Would a simple global or pool-specific lock taken by zfs_ioc_promote and zfs_ioc_rename help protect against that?

In D7179#149342, @smh wrote:

This looks reasonable; however, as you say, it's potentially racy with multiple renames happening.

Would a simple global or pool-specific lock taken by zfs_ioc_promote and zfs_ioc_rename help protect against that?

It would, but it would also mean having a Giant-like pool-wide serialization for all DSL modifications. And that would be on top of the existing serialization via the sync thread meaning that at most one DSL operation would be performed per txg sync. I am hesitant to introduce that performance impact.

I think that the above-mentioned serialization via the sync thread ("sync task") greatly reduces the number of mutations that can happen to a dataset "almost" concurrently, so in practice the changes should be spaced out in time by the sync thread runs.

Abandoned in favor of D23478.