
zfs: Fix a deadlock between page busy and the teardown lock
ClosedPublic

Authored by markj on Nov 10 2021, 5:45 PM.

Details

Summary

ZFS has a per-mountpoint read-mostly "teardown lock" which is acquired
in read mode by most VOPs. It is used to suspend filesystem operations
in preparation for some dataset-level operation like a rollback
(reverting a filesystem to an earlier snapshot). In particular, the ZFS
VOP_GETPAGES implementation acquires this lock.

When rolling back a dataset, ZFS invalidates all file data resident in
the page cache, since a rollback can change a file's contents and we
don't want stale data to linger. To do this, it calls
vn_pages_remove() on each vnode associated with the mountpoint. This
introduces a lock order reversal: to handle a page fault we busy vnode
pages before calling VOP_GETPAGES (which acquires the teardown lock in
read mode), and during rollback we busy vnode pages in
vm_object_page_remove() with the teardown lock held in write mode.

Resolve the deadlock by exploiting the fact that rollback only needs to
purge valid pages: invalid pages need not be purged by definition, so a
thread holding a busy lock on an invalid ZFS vnode page can safely
block on the teardown lock. This assumes that vm_pager_get_pages() will
never pass valid pages to VOP_GETPAGES. Since ZFS is solely responsible
for marking its vnode pages valid, I believe this is a safe assumption.

Add a new mode to vm_object_page_remove() to skip over invalid pages.
Use it when rolling back a dataset.

PR: 258208

Test Plan

Peter has a stress2 scenario which triggers the deadlock fairly regularly.
We haven't observed any problems with this patch applied.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Nov 10 2021, 5:45 PM
sys/vm/vm_object.c:2129

Would this still cause the issue with partially valid pages at EOF? If a page is not fully valid, vm_fault() still calls into the pager, and the pager calls into the VOP. Validation of the rest of the page is performed by vm_pager_get_pages() after pgo_getpages() has validated the page up to EOF.

It might be simplest to avoid this issue altogether by unconditionally doing vm_page_zero_invalid() in the ZFS VOP_GETPAGES().

sys/vm/vm_object.c:2129

I believe ZFS vnode pages are never partially valid. This is asserted in several places that maintain coherence between the page cache and the DMU; see page_busy() or dmu_read_pages(). Truncation (currently) does not mark pages partially invalid; see the last comment in vnode_pager_subpage_purge(). Maybe it is a somewhat fragile assumption, but it exists already.

This revision is now accepted and ready to land. Nov 10 2021, 8:05 PM
sef added a subscriber: sef.

The code looks fine (small change, after all), other than my request for comments. :)

sys/kern/vfs_vnops.c:2447–2458

Can we get some comments? I realize the code has historically been low on them, but we can try to fix that for new code. :)

markj marked an inline comment as done.

Add a comment for vn_pages_remove_valid().

This revision now requires review to proceed. Nov 11 2021, 8:33 PM

Thank you very much! This is a very neat fix for what appeared to be a quite substantial problem (and maybe is, in terms of interactions between layers/subsystems).

This revision is now accepted and ready to land. Nov 12 2021, 10:04 AM