
vfs: add root vnode caching for mount points
ClosedPublic

Authored by mjg on Sep 13 2019, 7:32 PM.

Details

Summary

This is a working patch which may need small tweaks to add granularity for the vfs_op mechanism - right now write suspension and similar operations disable the feature. This can easily be dealt with by adding masks of what's prohibited. I'm posting this mostly to showcase additional uses for the feature added for pcpu counts.

With the patch in place, contention on tmpfs locks when crossing the mount point is eliminated. On a system with root on ZFS, all remaining contention is in access checking.

ZFS has an equivalent mechanism using rmlocks, but it never re-populates the vnode; I'll patch that later. The naming is a little cumbersome, but it is taken from the same scheme used for vnode ops - _lookup and _cachedlookup.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

mjg created this revision. Sep 13 2019, 7:32 PM
mjg edited the summary of this revision. Sep 13 2019, 7:33 PM
mjg updated this revision to Diff 62173. Sep 16 2019, 6:39 PM
  • rebase
  • drop unnecessary wait in vfs_cache_root_clear
  • add a comment about fence pairing in vfs_cache_root_fallback
jeff added inline comments. Sep 16 2019, 9:05 PM
sys/kern/vfs_subr.c
5678 ↗(On Diff #62173)

Do we want to assert that the same vnode is returned as is cached in the case of a collision? Otherwise we may return with a vnode locked that is not the same as the cached root.

mjg added inline comments. Sep 16 2019, 9:51 PM
sys/kern/vfs_subr.c
5678 ↗(On Diff #62173)

I can add the assert no problem.

If that's a possibility, it already exists in the current code (that is, two roughly concurrent calls to the _root routine ending up with different vnodes) and sounds like a solid bug, but we should probably audit whether this can happen under sufficiently wrong conditions. With a long enough delay the original thread may unlock the vnode and have it doomed, at which point the other thread might have found a different one; i.e., it may indeed be that the newly cached vnode is already doomed.

kib added a comment. Sep 17 2019, 2:48 PM

I think it is fine.

Several filesystems cache their root vnode, which becomes redundant with the caching done in the mount structure. I believe it is worth doing a simultaneous cleanup, in a separate patch.

mjg updated this revision to Diff 62428 (edited). Sep 22 2019, 7:22 PM
  • panic if the found vnode does not match ours and is not doomed

pho tested the patch. It had a printf to a similar effect as the above assertion; it was not triggered.

The condition is only plausible if we race with a forced unmount and this is the first time the vnode gets populated.

mjg added a comment (edited). Sep 22 2019, 7:23 PM
In D21646#473051, @kib wrote:

Several filesystems cache their root vnode, which becomes redundant with the caching done in the mount structure. I believe it is worth doing a simultaneous cleanup, in a separate patch.

I only have a patch for ZFS. Some filesystems use the vfs hash, but I don't think that should count, as it is used for other purposes as well.

I still think the name vfs_cachedroot is iffy, but I consider the patch committable otherwise.

sef added a subscriber: sef. Sep 24 2019, 10:26 PM

Perhaps I'm confused, but since you've added a new VFS operation, shouldn't there be a default implementation? You mention doing this for ZFS, but it's only there for tmpfs and UFS; does this need to be a VFS function, or should it be done as part of the mount structure and maintenance functions?

sys/sys/mount.h
704 ↗(On Diff #62428)

Should this not replace one of the vfs_spare slots, for ABI compatibility?

mjg added a comment. Sep 24 2019, 10:30 PM

I have a separate patch for ZFS, since it involves reverting its current support, which would only clutter this review.

There is no way to get a "default" implementation for this.

sys/sys/mount.h
704 ↗(On Diff #62428)

This can be done for a merge, not in head.

sef added a comment. Sep 24 2019, 10:32 PM

There is no way to get a "default" implementation for this.

So if it's not set, the system panics?

sys/sys/mount.h
704 ↗(On Diff #62428)

Why can it not be done in head?

mjg added a comment. Sep 24 2019, 10:35 PM

The new op is *not* called by anything but the new handler, which has to be explicitly set by filesystems that want the feature; i.e., the change is a no-op for anyone who does not opt in.

sys/sys/mount.h
704 ↗(On Diff #62428)

It wastes a spare. The spares are basically there so that people don't forget to add them when a new stable branch is created.

sef added a comment. Sep 24 2019, 10:39 PM
In D21646#475415, @mjg wrote:

The new op is *not* called by anything but the new handler, which has to be explicitly set by filesystems that want the feature; i.e., the change is a no-op for anyone who does not opt in.

Ah, I followed the indirections far enough this time. Thank you!

jeff accepted this revision. Sep 25 2019, 11:22 PM
jeff added inline comments.
sys/kern/vfs_subr.c
5689–5690 ↗(On Diff #62428)

This could be a single else if, indented one level less.

This revision is now accepted and ready to land. Sep 25 2019, 11:22 PM
kib added a comment. Sep 26 2019, 7:28 AM

I still hold the opinion that the mp root vnode caching should be better integrated with the existing code. Right now it feels like a side hack.

For instance, vflush() takes an argument specifying the number of root vnode refs to ignore for a successful flush. Your patch drops the reference manually before calling into VFS_UNMOUNT().

The fdescfs, devfs, and nfsclient filesystems all pass a rootrefs of 1 to vflush, which indicates that they also cache the root vnode by an independent mechanism.

mjg added a comment (edited). Sep 26 2019, 1:21 PM

It is wrapped around the current code on purpose - it provides all of the expected win for most users, is easy to opt into, and requires no changes for filesystems which don't participate. But more importantly, I consider it a temporary bandaid until the interaction of lookup and the namecache is reworked so that there is no need to VFS_ROOT or busy the mount in the common case.

Also note not all filesystems can really participate (e.g. fdescfs). Thus a full integration would have to look like this:

  1. the rootrefs argument is eliminated from vflush
  2. apart from vfs_root, there will have to be a vfs_unsetroot which defaults to doing nothing. Should someone cache their root vnode, this will return a pointer to an unlocked vnode with a guaranteed usecount of at least 1. Should vnode caching take more refs, they all get dropped in the routine. That way, after everything is completed, vflush can issue a single vrele.

I think it is more work that does not buy much and requires a lot of extra asserts. I can do it if you insist. (Or perhaps you see a different solution?)

mjg updated this revision to Diff 62948. Oct 5 2019, 10:54 PM
  • add missing support for sigdefer ops
  • add _set routine and use it in nfs and devfs
This revision now requires review to proceed. Oct 5 2019, 10:54 PM
kib accepted this revision. Oct 6 2019, 7:17 PM
kib added inline comments.
sys/kern/vfs_subr.c
5817 ↗(On Diff #62948)

So consumers of this function end up with the usecount for the root vnode set to 2.

This revision is now accepted and ready to land. Oct 6 2019, 7:17 PM
mjg added inline comments. Oct 6 2019, 8:03 PM
sys/kern/vfs_subr.c
5817 ↗(On Diff #62948)

I don't think that's avoidable without playing with vflush. It's not a problem, though.

This revision was automatically updated to reflect the committed changes.