Page MenuHomeFreeBSD

Resolve a ZFS hang
ClosedPublic

Authored by allanjude on Oct 7 2018, 8:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, May 29, 9:39 AM
Unknown Object (File)
Apr 11 2023, 10:52 PM
Unknown Object (File)
Apr 11 2023, 10:51 PM
Unknown Object (File)
Apr 11 2023, 10:33 PM
Unknown Object (File)
Mar 10 2023, 11:55 AM
Unknown Object (File)
Feb 9 2023, 4:43 AM
Unknown Object (File)
Jan 1 2023, 12:45 AM
Unknown Object (File)
Dec 31 2022, 10:59 AM

Details

Summary

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

Over the course of many revisions to the vnode/znode locking code, the check in zfs_zget() for 'this vnode is being unlinked' was lost.

The PR above contains a reproduction case I used to investigate and solve this.

This patch brings the code closer inline with what is in IllumOS.
I plan a followup commit to cleanup some excess delta that has grown over time, but this fix should be MFCd to stable/11, so I didn't want to pollute it with other changes.

Open Question: do we need any additional locking to safely read zp->z_unlinked?

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The check for zp->z_unlocked was dropped in rS303763, although possibly only because of the changes to the locking. Maybe @avg can shed more light on it.

Confirmed working in the reproduction test case I provided in the PR. Will test soon on the production-like host on which I initially found the issue.

Change makes sense from an outsider's perspective. Can't comment on locking correctness.

This revision is now accepted and ready to land.Oct 8 2018, 12:18 PM

I am really happy that this simple change greatly reduces the chances of hitting the problem.
My understanding of it is in the bug report.

About my rationale for removing the z_unlinked check from zfs_zget.
I think that a znode's being unlinked should not prevent its lookup by ID.
However, I cannot think of any non-exotic situations where that would matter.
An example of an exotic situation could involve NFS over ZFS and passing a file descriptor between processes.

Given the retry loop in zfs_zget I would not worry about unlocked access to z_unlinked.

Also, the bug is not exclusive to unlinked files. The same race can happen with a linked ZFS vnode too.
Of course, it's harder to trigger a controlled reclamation of such a vnode and so it is much harder to write a reliable test case for it.

This revision was automatically updated to reflect the committed changes.