Page MenuHomeFreeBSD

Resolve a ZFS hang
ClosedPublic

Authored by allanjude on Oct 7 2018, 8:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Mar 9, 8:59 AM
Unknown Object (File)
Feb 2 2024, 12:06 PM
Unknown Object (File)
Dec 27 2023, 12:06 AM
Unknown Object (File)
Dec 20 2023, 4:28 AM
Unknown Object (File)
Dec 15 2023, 4:48 AM
Unknown Object (File)
Dec 11 2023, 1:22 AM
Unknown Object (File)
Dec 8 2023, 9:03 PM
Unknown Object (File)
Dec 8 2023, 3:10 PM

Details

Summary

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

Over the course of many revisions to the vnode/znode locking code, the check in zfs_zget() for 'this vnode is being unlinked' was lost.

The PR above contains a reproduction case I used to investigate and solve this.

This patch brings the code closer inline with what is in IllumOS.
I plan a followup commit to cleanup some excess delta that has grown over time, but this fix should be MFCd to stable/11, so I didn't want to pollute it with other changes.

Open Question: do we need any additional locking to safely read zp->z_unlinked?

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage
Build Status
Buildable 20058
Build 19557: arc lint + arc unit

Event Timeline

The check for zp->z_unlocked was dropped in rS303763, although possibly only because of the changes to the locking. Maybe @avg can shed more light on it.

Confirmed working in the reproduction test case I provided in the PR. Will test soon on the production-like host on which I initially found the issue.

Change makes sense from an outsider's perspective. Can't comment on locking correctness.

This revision is now accepted and ready to land.Oct 8 2018, 12:18 PM

I am really happy that this simple change greatly reduces the chances of hitting the problem.
My understanding of it is in the bug report.

About my rationale for removing the z_unlinked check from zfs_zget.
I think that a znode's being unlinked should not prevent its lookup by ID.
However, I cannot think of any non-exotic situations where that would matter.
An example of an exotic situation could involve NFS over ZFS and passing a file descriptor between processes.

Given the retry loop in zfs_zget I would not worry about unlocked access to z_unlinked.

Also, the bug is not exclusive to unlinked files. The same race can happen with a linked ZFS vnode too.
Of course, it's harder to trigger a controlled reclamation of such a vnode and so it is much harder to write a reliable test case for it.

This revision was automatically updated to reflect the committed changes.