Page MenuHomeFreeBSD

remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat
ClosedPublic

Authored by avg on Jun 12 2015, 1:59 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 8:23 AM
Unknown Object (File)
Tue, Jan 21, 11:43 AM
Unknown Object (File)
Dec 1 2024, 9:36 PM
Unknown Object (File)
Dec 1 2024, 9:36 PM
Unknown Object (File)
Dec 1 2024, 9:35 PM
Unknown Object (File)
Dec 1 2024, 9:35 PM
Unknown Object (File)
Dec 1 2024, 9:14 PM
Unknown Object (File)
Nov 28 2024, 11:20 PM
Subscribers

Details

Summary

On FreeBSD VFS_HOLD/VN_RELE ere mapped to MNT_REF/MNT_REL which manipulate
mnt_ref. But the job of properly maintaining the reference count is
already automatically performed by insmntque(9) and delmntque(9).
So, in effect all ZFS vnodes referenced the corresponding mountpoint
twice.

That was completely harmless, but we want to be very explicit about
what FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD
MNT_REF provide quite different guarantees with respect to the held
vfs_t / mountpoint.
On illumos VFS_HOLD is sufficient to guarantee that vfs_t.vfs_data stays
valid. On the other hand, on FreeBSD MNT_REF does *not* provide the same
guarantee about mnt_data. We have to use vfs_busy to get that guarantee.

As a result calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed.
VFS_HOLD calles are replaced with vfs_busy in the ioctl handlers.

And because vfs_busy has a richer interface that can not be dumbed down
in all case it's better to explicitly use it rather than trying to mask
it behind VFS_HOLD.

After these change there were no users left for VFS_HOLD and VFS_RELE.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

avg retitled this revision from to remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat.
avg updated this object.
avg edited the test plan for this revision. (Show Details)
avg added reviewers: kib, smh, delphij, will, gibbs.
avg added a subscriber: ZFS.

Forgot to add that this change fixes a panic that results from a race between zfs_umount() and zfs_ioc_rollback().
In the case that we observed zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still using.
At the moment there is no synchronization to prevent unmounting of a ZFS filesystem that is being between zfs_suspend_fs() and zfs_resume_fs().

Stacks of the racing threads follow.

panic: solaris assert: rrl->rr_writer == NULL, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c, line: 163
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff9de12bc610
kdb_backtrace() at kdb_backtrace+0x3a/frame 0xffffff9de12bc6d0
panic() at panic+0x21c/frame 0xffffff9de12bc7d0
assfail() at assfail+0x1d/frame 0xffffff9de12bc7e0
rrw_destroy() at rrw_destroy+0x39/frame 0xffffff9de12bc800
zfsvfs_free() at zfsvfs_free+0xae/frame 0xffffff9de12bc820
zfs_umount() at zfs_umount+0x290/frame 0xffffff9de12bc860
dounmount() at dounmount+0x73b/frame 0xffffff9de12bc8d0
sys_unmount() at sys_unmount+0x39c/frame 0xffffff9de12bc9d0
amd64_syscall() at amd64_syscall+0x2aa/frame 0xffffff9de12bcaf0
(kgdb) bt
0 cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1392
1 0xffffffff80cd88ef in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1374
2 0xffffffff80ce5c26 in trap (frame=0xffffff80003e9f30) at /usr/src/sys/amd64/amd64/trap.c:211
3 0xffffffff80ccec8f in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:505
4 0xffffffff8097b77b in cache_purgevfs (mp=0xfffffe0073b439a8) at /usr/src/sys/kern/vfs_cache.c:990
5 0xffffffff81b92e77 in zfsvfs_teardown (zfsvfs=0xfffffe0124a55000, unmounting=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1848
6 0xffffffff81b930de in zfs_suspend_fs (zfsvfs=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2224
7 0xffffffff81b824c8 in zfs_ioc_rollback (zc=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3567
8 0xffffffff81b86d6a in zfsdev_ioctl (dev=<value optimized out>, zcmd=<value optimized out>, arg=0xfffffe0c48b03b20 "\003", flag=3, td=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5948
9 0xffffffff807c9109 in devfs_ioctl_f (fp=0xfffffe0179904d20, com=3222821401, data=0xfffffe0c48b03b20, cred=<value optimized out>, td=0xfffffe0144360490) at /usr/src/sys/fs/devfs/devfs_vnops.c:757
10 0xffffffff80940232 in kern_ioctl (td=0xfffffe0144360490, fd=<value optimized out>, com=3222821401, data=0xfffffe0c48b03b20 "\003") at file.h:311
11 0xffffffff809403fa in sys_ioctl (td=0xfffffe0144360490, uap=0xffffff9de105aa70) at /usr/src/sys/kern/sys_generic.c:692
12 0xffffffff80ce4cfa in amd64_syscall (td=0xfffffe0144360490, traced=0) at subr_syscall.c:135

I do not know zfs code to review the patch.

Still, a note that must be made is that having a vnode on the struct mount vnode list does not prevent unmount. Only locked vnode would give the property, otherwise other thread might reclaim the vnode at any time and unmount then could succeed.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
1443

Is it safe ?

Note that blocking vfs_busy() is before any vnode locks on the corresponding mount, so no vnode locks must be owned by the calling thread at this point. Fine if so.

avg marked an inline comment as done.Jun 18 2015, 10:40 AM
In D2794#53573, @kib wrote:

Still, a note that must be made is that having a vnode on the struct mount vnode list does not prevent unmount. Only locked vnode would give the property, otherwise other thread might reclaim the vnode at any time and unmount then could succeed.

Yes, I know that. The code does not assume such a thing either with my change or without it.
I think that my comment D2794#53517 better described the problem I am trying to solve.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
1443

Yes, I believe that this is safe, because this function is called only from /dev/zfs ioctl handlers and they do not take any vnode locks.

smh requested changes to this revision.Dec 15 2015, 8:03 PM
smh edited edge metadata.
smh added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
3033

I believe this is wrong and should be unlocked for vfsp != NULL, as vfs_busy drops the lock only on successful completion (return 0).

Its also confusing to those unfamiliar with the internals of vfs_busy, so it might be good to add a comment to that effect.

This revision now requires changes to proceed.Dec 15 2015, 8:03 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
3033

vfsp == NULL indicates that either suitable vfsp was not found on the list, or that vfs_busy() failed. In both cases, list mutex is owned, and the unlock is due.

smh removed a reviewer: smh.
smh added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
3033

Thanks for clarifying, no idea what I was thinking when I read this.

avg marked 3 inline comments as done.Feb 22 2016, 3:58 PM
avg added a subscriber: smh.

"This revision now requires changes to proceed."
How can I clear this? Or is it only @smh that can do that?

avg marked an inline comment as not done.Feb 22 2016, 3:59 PM
kib edited edge metadata.
This revision is now accepted and ready to land.Feb 22 2016, 4:18 PM
smh added a reviewer: smh.
This revision was automatically updated to reflect the committed changes.