
vfs: add root vnode caching for mount points
ClosedPublic

Authored by mjg on Sep 13 2019, 7:32 PM.

Details

Summary

This is a working patch which may need small tweaks to add granularity for the vfs_op mechanism - right now write suspension and similar operations disable the feature. This can easily be dealt with by adding masks of what's prohibited. I'm posting this mostly to showcase additional uses for the feature added for pcpu counts.

With the patch in place, contention on tmpfs locks when crossing the mount point is eliminated. On a system with root on ZFS, all remaining contention is in access checking.

ZFS has an equivalent mechanism using rmlocks, but it never re-populates the vnode; I'll patch that later. The naming is a little cumbersome, but it is taken from the same scheme used for vnode ops - _lookup and _cachedlookup.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

mjg created this revision. Sep 13 2019, 7:32 PM
mjg edited the summary of this revision. Sep 13 2019, 7:33 PM
mjg updated this revision to Diff 62173. Sep 16 2019, 6:39 PM
  • rebase
  • drop unnecessary wait in vfs_cache_root_clear
  • add a comment about fence pairing in vfs_cache_root_fallback
jeff added inline comments. Sep 16 2019, 9:05 PM
sys/kern/vfs_subr.c
5678 ↗(On Diff #62173)

Do we want to assert that the same vnode is returned as is cached in the case of a collision? Otherwise we may return with a vnode locked that is not the same as the cached root.

mjg added inline comments. Sep 16 2019, 9:51 PM
sys/kern/vfs_subr.c
5678 ↗(On Diff #62173)

I can add the assert no problem.

If that's a possibility, it already exists in the current code (that is, two roughly concurrent calls to the _root routine ending up with different vnodes) and sounds like a solid bug, but we should probably audit whether this can happen under sufficiently wrong conditions. With a long enough delay the original thread may unlock the vnode and have it doomed, at which point the other thread might have found a different one; i.e., it may indeed be that the newly cached vnode is already doomed.

kib added a comment. Sep 17 2019, 2:48 PM

I think it is fine.

Several filesystems cache their root vnode, which becomes redundant with the caching done in the mount structure. I believe it is worth doing a simultaneous cleanup, in a separate patch.

mjg updated this revision to Diff 62428 (edited). Sep 22 2019, 7:22 PM
  • panic if the found vnode does not match ours and is not doomed

pho tested the patch. It had a printf to a similar effect as the above assertion; it was not triggered.

The condition is only plausible if we race with a forced unmount and this is the first time the vnode gets populated.

mjg added a comment (edited). Sep 22 2019, 7:23 PM
In D21646#473051, @kib wrote:

Several filesystems cache their root vnode, which becomes redundant with the caching done in the mount structure. I believe it is worth doing a simultaneous cleanup, in a separate patch.

I only have a patch for ZFS. Some filesystems use the vfs hash, but I don't think that should count, as it is used for other purposes as well.

I still think the name vfs_cachedroot is iffy, but I consider the patch committable otherwise.

sef added a subscriber: sef. Sep 24 2019, 10:26 PM

Perhaps I'm confused, but since you've added a new VFS operation, shouldn't there be a default implementation? You mention doing this for ZFS, but it's only there for tmpfs and UFS; does this need to be a VFS function, or should it be done as part of the mount structure and maintenance functions?

sys/sys/mount.h
704 ↗(On Diff #62428)

Should this not replace one of the vfs_spare slots, for ABI compatibility?

mjg added a comment. Sep 24 2019, 10:30 PM

I have a separate patch for ZFS, since it involves reverting its current support, which would only clutter this review.

There is no way to get a "default" implementation for this.

sys/sys/mount.h
704 ↗(On Diff #62428)

This can be done for a merge, not in head.

sef added a comment. Sep 24 2019, 10:32 PM

There is no way to get a "default" implementation for this.

So if it's not set, the system panics?

sys/sys/mount.h
704 ↗(On Diff #62428)

Why can it not be done in head?

mjg added a comment. Sep 24 2019, 10:35 PM

The new op is *not* called by anything but the new handler, which has to be explicitly set by filesystems that want the feature; i.e., the change is a no-op for anyone who does not opt in.

sys/sys/mount.h
704 ↗(On Diff #62428)

It wastes a spare. The spares are basically there so that people don't forget to add them when a new stable branch is created.

sef added a comment. Sep 24 2019, 10:39 PM
In D21646#475415, @mjg wrote:

The new op is *not* called by anything but the new handler, which has to be explicitly set by filesystems that want the feature; i.e., the change is a no-op for anyone who does not opt in.

Ah, I followed the indirections far enough this time. Thank you!

jeff accepted this revision. Sep 25 2019, 11:22 PM
jeff added inline comments.
sys/kern/vfs_subr.c
5689–5690 ↗(On Diff #62428)

This could be a single else if, indented one level less.

This revision is now accepted and ready to land. Sep 25 2019, 11:22 PM
kib added a comment. Sep 26 2019, 7:28 AM

I still hold the opinion that the mp root vnode caching should be better integrated with the existing code. Right now it feels like a side hack.

For instance, vflush() takes an argument specifying the number of root vnode refs to ignore for a successful flush. Your patch drops the reference manually before calling into VFS_UNMOUNT().

The fdescfs, devfs, and nfsclient filesystems all pass a rootrefs of 1 to vflush, which indicates that they also cache the root vnode by an independent mechanism.

mjg added a comment (edited). Sep 26 2019, 1:21 PM

It is wrapped around the current code on purpose - it provides all of the expected win for most users, is easy to opt into, and requires no changes for filesystems which don't participate. But more importantly, I consider it a temporary bandaid until the interaction of lookup and the namecache is reworked so that there is no need to VFS_ROOT or busy the mount in the common case.

Also note not all filesystems can really participate (e.g. fdescfs). Thus a full integration would have to look like this:

  1. the rootrefs argument is eliminated from vflush
  2. apart from vfs_root, there will have to be a vfs_unsetroot which defaults to doing nothing. Should someone cache their root vnode, this will return a pointer to an unlocked vnode with a guaranteed usecount of at least 1. Should vnode caching take more refs, they all get dropped in the routine. That way, after everything is completed, vflush can issue a single vrele.

I think it is more work that does not buy much and requires a lot of extra asserts. I can do it if you insist. (Or perhaps you see a different solution?)

mjg updated this revision to Diff 62948. Oct 5 2019, 10:54 PM
  • add missing support for sigdefer ops
  • add _set routine and use it in nfs and devfs
This revision now requires review to proceed. Oct 5 2019, 10:54 PM
kib accepted this revision. Oct 6 2019, 7:17 PM
kib added inline comments.
sys/kern/vfs_subr.c
5817 ↗(On Diff #62948)

So consumers of this function end up with the usecount for the root vnode set to 2.

This revision is now accepted and ready to land. Oct 6 2019, 7:17 PM
mjg added inline comments. Oct 6 2019, 8:03 PM
sys/kern/vfs_subr.c
5817 ↗(On Diff #62948)

I don't think that's avoidable without playing with vflush. It's not a problem, though.

This revision was automatically updated to reflect the committed changes.