
Resolve a ZFS hang

Authored by allanjude on Oct 7 2018, 8:24 PM.



Over the course of many revisions to the vnode/znode locking code, the check in zfs_zget() for 'this vnode is being unlinked' was lost.

The PR above contains a reproduction case I used to investigate and solve this.

This patch brings the code closer in line with what is in illumos.
I plan a follow-up commit to clean up some excess delta that has grown over time, but this fix should be MFCed to stable/11, so I didn't want to pollute it with other changes.

Open Question: do we need any additional locking to safely read zp->z_unlinked?
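
For readers unfamiliar with the check described in the summary, here is a minimal userland sketch of its shape. It is not the patch itself and not the kernel code: `toy_znode`, `toy_zget`, the `holds` counter, and the lookup table are hypothetical stand-ins, and no locking or vnode lifecycle is modeled. Only the `z_unlinked` flag and the `ENOENT` result mirror what the discussion refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a znode; only the flag discussed here is modeled. */
struct toy_znode {
	bool z_unlinked;	/* set once the file has been unlinked */
	int  holds;		/* toy reference count, stands in for a vnode hold */
};

/*
 * Toy lookup by ID. Returns 0 and a held znode in *zpp, or ENOENT when
 * the ID is out of range or the znode is already unlinked.
 * Assumption: single-threaded, so the open locking question is not modeled.
 */
static int
toy_zget(struct toy_znode *table, size_t nentries, size_t id,
    struct toy_znode **zpp)
{
	struct toy_znode *zp;

	assert(zpp != NULL);
	*zpp = NULL;

	if (id >= nentries)
		return (ENOENT);

	zp = &table[id];

	/* The check under discussion: do not hand out an unlinked znode. */
	if (zp->z_unlinked)
		return (ENOENT);

	zp->holds++;
	*zpp = zp;
	return (0);
}
```

In this toy model, looking up an entry whose `z_unlinked` flag is set fails with `ENOENT` and takes no hold, while a linked entry is returned with its hold count incremented. Whether the real flag can be read safely without extra locking is exactly the open question above, and the sketch does not answer it.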

Diff Detail

rS FreeBSD src repository - subversion

Event Timeline

The check for zp->z_unlinked was dropped in rS303763, although possibly only because of the changes to the locking. Maybe @avg can shed more light on it.

Confirmed working in the reproduction test case I provided in the PR. Will test soon on the production-like host on which I initially found the issue.

Change makes sense from an outsider's perspective. Can't comment on locking correctness.

This revision is now accepted and ready to land. Oct 8 2018, 12:18 PM

I am really happy that this simple change greatly reduces the chances of hitting the problem.
My understanding of it is in the bug report.

About my rationale for removing the z_unlinked check from zfs_zget:
I think that a znode being unlinked should not prevent its lookup by ID.
However, I cannot think of any non-exotic situations where that would matter.
An example of an exotic situation could involve NFS over ZFS and passing a file descriptor between processes.

Given the retry loop in zfs_zget I would not worry about unlocked access to z_unlinked.

Also, the bug is not exclusive to unlinked files. The same race can happen with a linked ZFS vnode too.
Of course, it's harder to trigger a controlled reclamation of such a vnode and so it is much harder to write a reliable test case for it.

This revision was automatically updated to reflect the committed changes.