move handling of zvol devices out of txg sync thread
Abandoned · Public

Authored by avg on Jul 11 2016, 10:57 AM.

Details

Reviewers

pjd
smh
mav

Group Reviewers

ZFS

Summary

That caused a deadlock because zvol devices are manipulated under
spa_namespace_lock, but there are cases where the lock is held while
waiting for the sync thread.

This is imperfect because zvol updates are applied with an arbitrary
delay after the corresponding dataset changes. So zvol devices can
potentially get out of sync with their datasets if multiple changes
are performed concurrently.
A better solution would be to queue all changes and apply them to zvol
devices in the same order in which they were originally made.
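
The "queue all changes" idea could look roughly like the userland sketch below. It is not code from this revision: every name in it (zvol_change, zvol_queue_push, zvol_queue_drain, zvol_apply_to_name) is hypothetical, locking is omitted, and a single string stands in for the device node. It only shows changes being recorded in the order they are made and replayed later in that same order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ZVOL_NAME_MAX 64	/* illustrative limit, not the kernel's */

/* Hypothetical kinds of dataset change that affect a zvol device node. */
enum zvol_change_kind { ZVOL_CREATE, ZVOL_REMOVE, ZVOL_RENAME };

/* One recorded change. */
struct zvol_change {
	enum zvol_change_kind kind;
	char name[ZVOL_NAME_MAX];	/* dataset the change applies to */
	char newname[ZVOL_NAME_MAX];	/* used only by ZVOL_RENAME */
	struct zvol_change *next;
};

/* FIFO: the producer appends at the tail, the consumer drains the head. */
struct zvol_queue {
	struct zvol_change *head;
	struct zvol_change *tail;
};

typedef void (*zvol_apply_fn)(const struct zvol_change *c, void *arg);

/*
 * Record a change. In a kernel this would be called from sync context
 * and would need a lock around the queue; both are left out here.
 * Returns 0 on success, -1 if allocation fails.
 */
int
zvol_queue_push(struct zvol_queue *q, enum zvol_change_kind kind,
    const char *name, const char *newname)
{
	struct zvol_change *c;

	assert(q != NULL && name != NULL);
	c = calloc(1, sizeof(*c));
	if (c == NULL)
		return (-1);
	c->kind = kind;
	snprintf(c->name, sizeof(c->name), "%s", name);
	if (newname != NULL)
		snprintf(c->newname, sizeof(c->newname), "%s", newname);
	if (q->tail != NULL)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
	return (0);
}

/*
 * Apply all recorded changes in the order they were pushed, outside of
 * sync context. Returns the number of changes applied.
 */
int
zvol_queue_drain(struct zvol_queue *q, zvol_apply_fn apply, void *arg)
{
	struct zvol_change *c;
	int n = 0;

	while ((c = q->head) != NULL) {
		q->head = c->next;
		if (q->head == NULL)
			q->tail = NULL;
		apply(c, arg);
		free(c);
		n++;
	}
	return (n);
}

/*
 * Toy "device" for illustration: arg points to a ZVOL_NAME_MAX-byte
 * buffer holding the current device name ("" when no device exists).
 */
void
zvol_apply_to_name(const struct zvol_change *c, void *arg)
{
	char *dev = arg;

	switch (c->kind) {
	case ZVOL_CREATE:
		snprintf(dev, ZVOL_NAME_MAX, "%s", c->name);
		break;
	case ZVOL_RENAME:
		/* A rename only matches the name the device has right now. */
		if (strcmp(dev, c->name) == 0)
			snprintf(dev, ZVOL_NAME_MAX, "%s", c->newname);
		break;
	case ZVOL_REMOVE:
		if (strcmp(dev, c->name) == 0)
			dev[0] = '\0';
		break;
	}
}
```

In this toy model, two back-to-back renames (tank/vol to tank/vol2, then tank/vol2 to tank/vol3) leave the device at tank/vol3 only when replayed in the original order; replayed in reverse, the second rename finds no matching device and the device ends up under a stale name.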

PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864
See: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/2924

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

No Lint Coverage

Unit

No Test Coverage

Build Status

Buildable 4475
Build 4526: arc lint + arc unit

Event Timeline

avg updated this revision to Diff 18277. Jul 11 2016, 10:57 AM

avg retitled this revision from to move handling of zvol devices out of txg sync thread.

avg updated this object.

avg edited the test plan for this revision.

avg added reviewers: ZFS, smh, pjd.

Herald added subscribers: delphij, imp. Jul 11 2016, 10:57 AM

This looks reasonable; however, as you say, it's potentially racy with multiple renames happening.

Would a simple global or pool-specific lock taken by zfs_ioc_promote and zfs_ioc_rename help protect against that?

In D7179#149342, @smh wrote:

This looks reasonable; however, as you say, it's potentially racy with multiple renames happening.

Would a simple global or pool-specific lock taken by zfs_ioc_promote and zfs_ioc_rename help protect against that?

It would, but it would also mean Giant-like pool-wide serialization of all DSL modifications. That would come on top of the existing serialization via the sync thread, so at most one DSL operation would be performed per txg sync. I am hesitant to introduce that performance impact.

I think that the above-mentioned serialization via the sync thread (the "sync task") greatly reduces the number of mutations that can happen to a dataset "almost" concurrently, so in practice the changes should be spaced out in time by the sync thread runs.

avg mentioned this in D13447: zfs: deadlock in "zfs rename" if zv_total_opens > 0 since MFV r323535. Dec 12 2017, 3:37 PM

anthoine.bourgeois_blade-group.com added a subscriber: anthoine.bourgeois_blade-group.com. Mar 12 2018, 5:09 PM

anthoine.bourgeois_blade-group.com added a child revision: D14669: Fix move handling of zvol devices out of txg sync thread. Mar 12 2018, 7:23 PM

anthoine.bourgeois_blade-group.com mentioned this in D14669: Fix move handling of zvol devices out of txg sync thread.

avg added a reviewer: mav. Dec 6 2018, 8:35 AM

avg mentioned this in D23478: rework how ZVOLs are updated in response to DSL operations. Feb 3 2020, 12:52 PM

Abandoned in favor of D23478.

avg mentioned this in rS362047: rework how ZVOLs are updated in response to DSL operations. Jun 11 2020, 10:42 AM