Page MenuHomeFreeBSD

do not hold an extra reference on a root vnode while a filesystem is mounted
ClosedPublic

Authored by avg on Apr 23 2015, 11:31 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 17, 1:35 AM
Unknown Object (File)
Mon, Nov 11, 12:52 PM
Unknown Object (File)
Mon, Nov 11, 6:18 AM
Unknown Object (File)
Mon, Nov 11, 12:44 AM
Unknown Object (File)
Thu, Nov 7, 12:15 AM
Unknown Object (File)
Wed, Nov 6, 11:15 PM
Unknown Object (File)
Tue, Nov 5, 6:02 PM
Unknown Object (File)
Oct 19 2024, 1:50 AM
Subscribers

Details

Summary

At present zfs_domount() acquires a reference on the filesystem's root vnode
and that reference is kept until zfs_umount.
The latter calls vflush(rootrefs = 1) to dispose of the extra reference.

There is no explanation of why that reference is kept - what problem it
solves or what behavior it improves.
Also, that logic is FreeBSD specific.

There is one real problem with that reference, though.
zfs recv -F may receive a full, non-incremental stream to a mounted filesystem.
In that case the received root object is likely to have a different z_gen
attribute value. Because of that, zfs_rezget will leave the previous root znode
and vnode disassociated from the actual object (z_sa_hdl == NULL).
Thus, future calls to VFS_ROOT() -> zfs_root() will produce a new vnode-znode
pair, while the old one will be kept alive by the outstanding reference.
So, the outstanding reference will not actually be for the new root vnode
(or, more precisely, vnodes - because a root vnode may be recycled and a newer
one can be created).
As a result, when vflush(rootrefs = 1) s called there will be two problems:

  1. a leaked reference on the old root vnode preventing a graceful unmount
  2. insufficient references on the actual root vnode leading to a crash upon access to the vnode after it is destroyed by vgone() + vdrop()

The second issue will actually override the first one.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

avg retitled this revision from to do not hold an extra reference on a root vnode while a filesystem is mounted.
avg updated this object.
avg edited the test plan for this revision. (Show Details)

This should be fine, assuming that zfs code to instantiate vnodes correctly set VV_ROOT on the root vnode, and not delegate the work to VFS_ROOT() implementation.

In D2353#42815, @kostikbel wrote:

This should be fine, assuming that zfs code to instantiate vnodes correctly set VV_ROOT on the root vnode, and not delegate the work to VFS_ROOT() implementation.

Actually the flag is set in both places. It is done when a new vnode/znode is created and we know that it is for a filesystem root. But, for whatever reason, it is also done in zfs_root().
Is that a problem?

kib edited edge metadata.
In D2353#42833, @avg wrote:
In D2353#42815, @kostikbel wrote:

This should be fine, assuming that zfs code to instantiate vnodes correctly set VV_ROOT on the root vnode, and not delegate the work to VFS_ROOT() implementation.

Actually the flag is set in both places. It is done when a new vnode/znode is created and we know that it is for a filesystem root. But, for whatever reason, it is also done in zfs_root().
Is that a problem?

My only worry was about setting the flag on the vnode initialization (not a znode creation). Doing it in zfs_root() then should be safe but useless. I do not know ZFS code to see a unintended behavior there.

This revision is now accepted and ready to land.Apr 23 2015, 2:33 PM
avg edited edge metadata.

Now that the root vnode does not need any special treatment we no longer
a backdoor in zfs_root() that allowed that call to succeed even after
the filesystem was marked as unmounted.
Thus, ZFS_ENTER_NOERROR() is no longer needed. It was specific to FreeBSD.

This revision now requires review to proceed.Apr 25 2015, 5:59 PM
delphij edited edge metadata.

Looks good to me.

This revision is now accepted and ready to land.Apr 27 2015, 8:51 PM
smh edited edge metadata.

This looks good to me can pjd shed any light on this behaviour?

This revision was automatically updated to reflect the committed changes.