Page MenuHomeFreeBSD

Resolve a ZFS hang
ClosedPublic

Authored by allanjude on Oct 7 2018, 8:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Dec 16, 1:17 AM
Unknown Object (File)
Tue, Dec 3, 1:05 PM
Unknown Object (File)
Oct 11 2024, 1:22 AM
Unknown Object (File)
Oct 5 2024, 6:01 AM
Unknown Object (File)
Sep 26 2024, 9:34 PM
Unknown Object (File)
Sep 24 2024, 8:32 AM
Unknown Object (File)
Sep 20 2024, 5:45 PM
Unknown Object (File)
Sep 10 2024, 5:03 PM

Details

Summary

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

Over the course of many revisions to the vnode/znode locking code, the check in zfs_zget() for 'this vnode is being unlinked' was lost.

The PR above contains a reproduction case I used to investigate and solve this.

This patch brings the code closer inline with what is in IllumOS.
I plan a followup commit to cleanup some excess delta that has grown over time, but this fix should be MFCd to stable/11, so I didn't want to pollute it with other changes.

Open Question: do we need any additional locking to safely read zp->z_unlinked?

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The check for zp->z_unlocked was dropped in rS303763, although possibly only because of the changes to the locking. Maybe @avg can shed more light on it.

Confirmed working in the reproduction test case I provided in the PR. Will test soon on the production-like host on which I initially found the issue.

Change makes sense from an outsider's perspective. Can't comment on locking correctness.

This revision is now accepted and ready to land.Oct 8 2018, 12:18 PM

I am really happy that this simple change greatly reduces the chances of hitting the problem.
My understanding of it is in the bug report.

About my rationale for removing the z_unlinked check from zfs_zget.
I think that a znode's being unlinked should not prevent its lookup by ID.
However, I cannot think of any non-exotic situations where that would matter.
An example of an exotic situation could involve NFS over ZFS and passing a file descriptor between processes.

Given the retry loop in zfs_zget I would not worry about unlocked access to z_unlinked.

Also, the bug is not exclusive to unlinked files. The same race can happen with a linked ZFS vnode too.
Of course, it's harder to trigger a controlled reclamation of such a vnode and so it is much harder to write a reliable test case for it.

This revision was automatically updated to reflect the committed changes.