Page MenuHomeFreeBSD

add "mntfs" private vnodes for file systems to access their storage safely
ClosedPublic

Authored by chs on Feb 21 2020, 10:56 PM.

Details

Summary

This change is another bit of groundwork needed by the upcoming changes to
allow softdep to recovery gracefully from disk I/O errors.
From the comment in the code:

  • The "mntfs" VCHR vnodes implemented here provide a safe way for file systems
  • to access their disk devices. Using the normal devfs vnode has the problem
  • that if the device disappears, the devfs vnode is vgone'd as part of
  • removing it from the application-visible namespace, and some file systems
  • (notably FFS with softdep) get very unhappy if their dirty buffers are
  • invalidated out from under them. By using a separate, private vnode,
  • file systems are able to clean up their buffer state in a controlled fashion
  • when the underlying device disappears.

This diff adds the new mntfs vnode facilty and changes FFS to use mntfs vnodes.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sys/fs/mntfs/mntfs_vnops.c
6 ↗(On Diff #68656)

You can delete this line...

I would be slightly less worried about this hack if you added asserts that odevvp does not get any buffers on its bufobj queues.

Also, it is probably worth explaining why VOP_PANIC strategy works for mntfs vnodes, despite we allow buffers on it queues (unless I am wrong, because io really directed to geom avoiding buffer cache strategy ?).

There's no central place in the ufs code to add an assertion about not creating buffers on odevvp. What I could do is define a new BO_NOBUFS flag for bufobj bo_flag, set that flag on odevvp in ffs_mountfs() and clear the flag in ffs_unmount(), and then in buf_vlist_add() assert that (bo->bo_flag & BO_NOBUFS) == 0. Is that what you had in mind?

I copied the mntfs vop_strategy = VOP_PANIC from devfs_specops, and it works for the reason you suspected: VOP_STRATEGY() is not called for VCHR vnodes. g_vfs_open() overrides the devvp's bufobj's bo_ops with g_vfs_bufops, which changes bop_strategy from the default bufstrategy() (which uses VOP_STRATEGY()) to g_vfs_strategy(), thus bypassing VOP_STRATEGY() for file system I/O requests. (and FFS further overrides bo_ops to add its own wrapper around g_vfs_strategy()).

I am happy with this revision.

This revision is now accepted and ready to land.Feb 27 2020, 6:07 AM
In D23787#524311, @chs wrote:

There's no central place in the ufs code to add an assertion about not creating buffers on odevvp. What I could do is define a new BO_NOBUFS flag for bufobj bo_flag, set that flag on odevvp in ffs_mountfs() and clear the flag in ffs_unmount(), and then in buf_vlist_add() assert that (bo->bo_flag & BO_NOBUFS) == 0. Is that what you had in mind?

I am fine with BO_NOBUFS.

Another option is to check odevvp sometimes, e.g. on each call to ffs_geom_strategy() or any other place where odevvp is conveniently available.
Catching the situation where we mis-targeted the write is enough to notify about bug, providing precise information where it happen is separate.

I copied the mntfs vop_strategy = VOP_PANIC from devfs_specops, and it works for the reason you suspected: VOP_STRATEGY() is not called for VCHR vnodes. g_vfs_open() overrides the devvp's bufobj's bo_ops with g_vfs_bufops, which changes bop_strategy from the default bufstrategy() (which uses VOP_STRATEGY()) to g_vfs_strategy(), thus bypassing VOP_STRATEGY() for file system I/O requests. (and FFS further overrides bo_ops to add its own wrapper around g_vfs_strategy()).

I mean that this should be a comment somewhere. Perhaps near the assert discussed above.

update with changes suggested by kib and imp.

This revision now requires review to proceed.Mar 5 2020, 6:48 PM
kib added inline comments.
sys/ufs/ffs/ffs_vfsops.c
823 ↗(On Diff #69233)

Is this additional ref on dev needed after you introduced mntfs vp? The vnode already refs the device, and the scopes seems to be same.

This revision is now accepted and ready to land.Mar 5 2020, 9:14 PM
chs marked an inline comment as done.Mar 6 2020, 12:53 AM
chs added inline comments.
sys/ufs/ffs/ffs_vfsops.c
823 ↗(On Diff #69233)

It's not strictly needed, but as long as the ufsmount has its own pointer to the cdev it seems best for the ufsmount to have its own ref as well.

I was thinking I might make another pass later to clean up the redundant references from file systems to other objects further down the storage stack, and instead have file systems always go through the device bufobj to access any of those lower-level objects. But I want to get the softdep work committed first.