Description

zfs: do not hold an extra reference on a root vnode while a filesystem is mounted

At present zfs_domount() acquires a reference on the filesystem's root vnode
and that reference is kept until zfs_umount.
The latter calls vflush(rootrefs = 1) to dispose of the extra reference.

There is no explanation of why that reference is kept: what problem it
solves or what behavior it improves.
Also, that logic is FreeBSD specific.

There is one real problem with that reference, though.
zfs recv -F may receive a full, non-incremental stream to a mounted filesystem.
In that case the received root object is likely to have a different z_gen
attribute value. Because of that, zfs_rezget will leave the previous root znode
and vnode disassociated from the actual object (z_sa_hdl == NULL).
Thus, future calls to VFS_ROOT() -> zfs_root() will produce a new vnode-znode
pair, while the old one will be kept alive by the outstanding reference.
So, the outstanding reference will not actually be for the new root vnode
(or, more precisely, vnodes - because a root vnode may be recycled and a newer
one can be created).
As a result, when vflush(rootrefs = 1) is called there will be two problems:

  • a leaked reference on the old root vnode preventing a graceful unmount
  • insufficient references on the actual root vnode leading to a crash upon access to the vnode after it is destroyed by vgone() + vdrop()

The second issue will actually override the first one.
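
The reference accounting described above can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions, not
the kernel code: struct toy_vnode, struct toy_mount and all toy_* functions
are hypothetical names invented for the illustration, and toy_unmount()
merely mimics the assumption that rootrefs references are owned on whatever
vnode is the current root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode: a use count and a "destroyed" flag. */
struct toy_vnode {
	int usecount;
	int destroyed;
};

/* Hypothetical stand-in for a mount: remembers what a root lookup returns. */
struct toy_mount {
	struct toy_vnode *root;
};

static void
toy_vref(struct toy_vnode *vp)
{
	vp->usecount++;
}

static void
toy_vrele(struct toy_vnode *vp)
{
	vp->usecount--;
}

/* Old behavior: mounting takes one long-lived reference on the root. */
void
toy_mount_init(struct toy_mount *mp, struct toy_vnode *root)
{
	mp->root = root;
	toy_vref(root);
}

/*
 * Models a full receive into a mounted filesystem: later root lookups
 * yield a different vnode, while the old one keeps the mount-time reference.
 */
void
toy_recv_full(struct toy_mount *mp, struct toy_vnode *new_root)
{
	mp->root = new_root;
}

/*
 * Models an unmount that assumes it owns rootrefs references on the
 * current root: it destroys that vnode and drops rootrefs references.
 */
void
toy_unmount(struct toy_mount *mp, int rootrefs)
{
	struct toy_vnode *rootvp = mp->root;

	rootvp->destroyed = 1;
	for (; rootrefs > 0; rootrefs--)
		toy_vrele(rootvp);
}
```

Running toy_mount_init(), toy_recv_full() and toy_unmount(mp, 1) in that
order leaves the old root with a use count of 1 (the leaked reference) and
drives the new root's use count below zero after it has been destroyed,
which mirrors the two problems listed above.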

Differential Revision: https://reviews.freebsd.org/D2353
Reviewed by: delphij, kib, smh
MFC after: 17 days