Page MenuHomeFreeBSD

Resolve a ZFS hang
ClosedPublic

Authored by allanjude on Oct 7 2018, 8:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Feb 27, 7:18 AM
Unknown Object (File)
Wed, Feb 26, 4:02 AM
Unknown Object (File)
Wed, Feb 26, 3:49 AM
Unknown Object (File)
Mon, Feb 24, 3:47 PM
Unknown Object (File)
Mon, Feb 24, 6:24 AM
Unknown Object (File)
Wed, Feb 5, 9:55 AM
Unknown Object (File)
Jan 24 2025, 5:42 PM
Unknown Object (File)
Jan 16 2025, 9:17 AM

Details

Summary

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

Over the course of many revisions to the vnode/znode locking code, the check in zfs_zget() for 'this vnode is being unlinked' was lost.

The PR above contains a reproduction case I used to investigate and solve this.

This patch brings the code closer inline with what is in IllumOS.
I plan a followup commit to cleanup some excess delta that has grown over time, but this fix should be MFCd to stable/11, so I didn't want to pollute it with other changes.

Open Question: do we need any additional locking to safely read zp->z_unlinked?

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The check for zp->z_unlocked was dropped in rS303763, although possibly only because of the changes to the locking. Maybe @avg can shed more light on it.

Confirmed working in the reproduction test case I provided in the PR. Will test soon on the production-like host on which I initially found the issue.

Change makes sense from an outsider's perspective. Can't comment on locking correctness.

This revision is now accepted and ready to land.Oct 8 2018, 12:18 PM

I am really happy that this simple change greatly reduces the chances of hitting the problem.
My understanding of it is in the bug report.

About my rationale for removing the z_unlinked check from zfs_zget.
I think that a znode's being unlinked should not prevent its lookup by ID.
However, I cannot think of any non-exotic situations where that would matter.
An example of an exotic situation could involve NFS over ZFS and passing a file descriptor between processes.

Given the retry loop in zfs_zget I would not worry about unlocked access to z_unlinked.

Also, the bug is not exclusive to unlinked files. The same race can happen with a linked ZFS vnode too.
Of course, it's harder to trigger a controlled reclamation of such a vnode and so it is much harder to write a reliable test case for it.

This revision was automatically updated to reflect the committed changes.