Details

Reviewers

kib
jeff

Commits

rS352424: vfs: manage mnt_ref with atomics

Summary

The previous posting included an rmlock which I considered only a temporary measure from the get go, included specifically to avoid discussions on how to specifically handle per-cpu counting. It turned out to work very differently, thus after a discussion with Jeff (who had reservations about even temporary inclusion of an rmlock) I'm posting an update review with more rationale.

The main problem to be solved is handling of struct mount counters. They are very frequently modified and almost never need to be checked, which makes them a natural fit for a per-cpu-based scheme. The approach implemented here effectively provides support for code section during which we are guaranteed the modifying party will wait for us to finish (and the other way around, we guarantee to not modify anything past a certain point). This automatically lends itself to solve other problems (most notably root vnode caching) and is therefore my first pick.

There are many different ways to do it of course. One proposed by jeff@ is to cmpset into the counter. I disagree with this method for the following reasons:

I find it a little cumbersome to use - it has to account for a case where the count would overflow/underflow which avoidably extends the common case
busy/unbusy and write suspension always modify two counters at the same time. with this approach all the provisions have to be done twice. with my approach we have to provide safety only once. iow this approach is slower for common consumers due to 2 atomic ops and more branches
but most importantly this does not provide the idea of code section which can be safely executed in face of changes to the struct, which I think is an unnecessary loss

The patch below was tested by me with stress2 and poudriere on amd64. On top of that I ran crossmp8 and suj11 from said suite on a powerpc64 box (talos), no problems seen.

Currently the code is rather coarse grained: either all per-cpu oepration is allowed or none is. This can be easily modified later with flags passed by the blocking thread and flags passed by threads entering.

root vnode caching would look like this:

if (!vfs_op_thread_enter(mp))
        return (vfs_cache_root_fallback(mp, flags, vpp));
vp = (struct vnode *)atomic_load_ptr(&mp->mnt_rootvnode);
if (vp == NULL || (vp->v_iflag & VI_DOOMED)) {
        vfs_op_thread_exit(mp);
        return (vfs_cache_root_fallback(mp, flags, vpp));
}
vrefact(vp);
vfs_op_thread_exit(mp);
error = vn_lock(vp, flags);
 .....

Should the vnode need to be cleared it can look like this (assuming only one thread can do it):

vp = mp->mnt_rootvnode;
if (vp == NULL)
        return (NULL);
mp->mnt_rootvnode = NULL;
atomic_thread_fence_seq_cst();
vfs_op_barrier_wait(mp);
vrele(vp);

This provides us with a way to get the vnode without locking anything and it's easy to reason about.

Test Plan

The complete patchset was tested by pho.

Apart from that it was used during several poudriere runs.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

mjg created this revision.Aug 26 2019, 11:08 PM

Herald added a subscriber: imp. · View Herald TranscriptAug 26 2019, 11:08 PM

Harbormaster completed remote builds in B26119: Diff 61323.Aug 26 2019, 11:08 PM

mjg added inline comments.Aug 26 2019, 11:11 PM

sys/kern/vfs_mount.c
1383	ops, this if statement is a remnant from per-cpu counters. The first caller to disable the fast path goes over all cpus and sums them up to update the counter in struct.

mjg edited the summary of this revision. (Show Details)Aug 26 2019, 11:13 PM

plug some nits

add vfs_fastpath_enable_mp_locked for safe access with the lock held

rebase on top of r351702

jeff added inline comments.Sep 6 2019, 6:53 PM

sys/sys/mount.h
951	Instead of rmlock could we not encode the flag bits in the counter field and use a cmpset loop to handle the whole business? If you wanted to do it with multiple cpus you need only set the bit in each counter.

mjg added inline comments.Sep 6 2019, 7:43 PM

sys/sys/mount.h
951	Note I said I don't want rmlocks later, they are here specifically because I wanted to avoid a discussion on whether given per-cpu routine works and/or is a good idea. Whatever ends up protecting the fast path is trivially replaceable. Main point of this review is to (dis)agree on the general approach - that is wrapping all code which requires an exact value with fast path enable/disable routines, with a guarantee everyone executing after the disabling routine returns goes for the current code. Assuming this is fine, I'll post an equivalent for busy/unbusy and write suspension. With this in in place we can discuss how to handle per-cpu, unless you are very opposed to rmlocks showing up in the first place even if temporarily (in which case I can elaborate on what scheme I had in mind).

mjg added inline comments.Sep 6 2019, 7:55 PM

sys/sys/mount.h
951	Perhaps I should add I do think making the counters atomic in initial commits is an important logical step, especially when it comes to write suspension -- it required a little patching in SU code. It really just needs something to keep the fast path. You can see the previous iteration here https://people.freebsd.org/~mjg/patches/atomic-mnt-counters.diff But again, if the intermediate work is not seen as necessary, I would be more than happy to go straight to per-cpu variants.

pho tested the combined diff which switches ref + lockref + writeopcount to atomics, it can be found here: https://people.freebsd.org/~mjg/patches/atomic-mnt-counters.diff

It does not include slight rework I did of fastpath disable but it should not matter. I don't think there is much to discuss here but to agree (or not) on the fastpath approach. The actual stuff to be of significance is handling per-cpu scheme which will be done in later reviews.

If this is fine I'll pots lockref + writeopcount in separate reviews before posting per-cpu.

I would use bools systematically.

Did you considered some less ugly name than fastpath ? An idea would be to distinguish between normal mp use and mount/unmount stage.

This revision is now accepted and ready to land.Sep 9 2019, 7:11 AM

change _retval to bool
rename fastpath to vfs_op and alter related routines

This revision now requires review to proceed.Sep 9 2019, 8:24 AM

mjg added a child revision: D21574: vfs: manage mnt_lockref with atomics.Sep 9 2019, 12:22 PM

mjg added a child revision: D21575: vfs: manage mnt_writeopcount with atomics.

see the rewritten summary

Harbormaster completed remote builds in B26409: Diff 61954.Sep 11 2019, 8:52 PM

mjg edited the summary of this revision. (Show Details)Sep 11 2019, 8:53 PM

mjg edited the summary of this revision. (Show Details)

mjg edited the test plan for this revision. (Show Details)Sep 13 2019, 12:39 AM

mjg added a child revision: D21637: vfs: convert struct mount counters to per-cpu.Sep 13 2019, 4:32 PM

kib added inline comments.Sep 14 2019, 5:07 PM

sys/kern/subr_pcpu.c
135 ↗	(On Diff #61954)	Remove one dot.
sys/kern/vfs_default.c
611	Can you eliminate these 'goto out'/'return's. Place the slow path into the 'else' block of the vfs_op_thread_enter().
626	So all VOP_GETWRITEMOUNT() implementations for supported fses never return an error from VOP_GETWRITEMOUNT. The only implementation that does return an error is unionfs. I think we can optimize vn_start_write().
sys/kern/vfs_mount.c
1393	The fence alone is nop, you need a reciprocal fence to sync with. I assume this is the fence in thread_enter() ? Please add a comment explaining it.
sys/sys/mount.h
284	This should be MNT_REF_FASTPATH() ?
291	If you change MNT_REL_FASTPATH into ext macro that returns a value, I think you can use it there too. Can these four macros changed to inline functions ?
953	Please use the 'sequentially consistent fence' and not 'full barrier' term.
969	Can this be inline function ?

kib added inline comments.Sep 14 2019, 5:10 PM

sys/sys/mount.h
273	I should admit that I dislike the _FASTPATH() suffix on principle, but I have no suggestions right now to replace it.

mjg updated this revision to Diff 62104.Sep 14 2019, 8:05 PM

mjg added inline comments.

sys/kern/vfs_default.c
626	Are you proposing to drop the error and only act on mp == NULL? I think this is orthogonal to the change and can be made in a different review regardless of what happens here.
sys/kern/vfs_mount.c
1393	It is explained (albeit briefly) above vfs_op_thread_enter. If necessary I can add a rundown showing cases of how concurrent execution can be handled with two columns (one for each routine).
sys/sys/mount.h
273	_UNLOCKED or (later) _PCPU?
284	See below.
291	I don't think changing this is worthwhile. The per-cpu variant will not have a way to return a meaningful value so this will have to go back to doing work on its own. In order to keep things consistent I would prefer avoid changing the _REF_ variant either. But I don't have a strong opinion here. As for inline, this requires adding several headers to struct mount.h and despite them being gated with _KERNEL it causes a lot of breakage during buildworld, e.g.: /usr/src/lib/libprocstat/common_kvm.h:34: error: conflicting types for 'dev2udev' /usr/obj/usr/src/sparc64.sparc64/tmp/usr/include/sys/systm.h:499: error: previous declaration of 'dev2udev' was here I think conversion to inlines should be a separate effort, if done at all.

kib added inline comments.Sep 14 2019, 8:17 PM

sys/kern/vfs_default.c
626	Most likely yes. I am not proposing to do it in this commit, of course, just something that I noted when I reviewed all implementations of the VOP for your change.
sys/kern/vfs_mount.c
1393	I only want to you point to th dual fence.
1394	s/barrier/fence/. vfs_op_thread_enter().
sys/sys/mount.h
273	unlocked sounds fine
291	Ok. You might move that to a new header.

change comment in vfs_op_enter
rename _FASTPATH to _UNLOCKED

remove spurious 'to' from vfs_op_thread_enter comment and change the ending a little

kib accepted this revision.Sep 14 2019, 8:55 PM

This revision is now accepted and ready to land.Sep 14 2019, 8:55 PM

mjg mentioned this in D21637: vfs: convert struct mount counters to per-cpu.Sep 14 2019, 9:14 PM

My only real feedback is on the naming.

vfs_op_ is too close to VOP and I find that confusing. These are mount ops. I believe it would be much more obvious if we called it mnt_op or mount_op_.

Closed by commit rS352424: vfs: manage mnt_ref with atomics (authored by mjg). · Explain WhySep 16 2019, 9:31 PM

This revision was automatically updated to reflect the committed changes.

mjg added a commit: rS352424: vfs: manage mnt_ref with atomics.

vfs: manage mnt_ref with atomics
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 61370

sys/cddl/compat/opensolaris/kern/opensolaris_vfs.c

sys/kern/vfs_default.c

sys/kern/vfs_mount.c

sys/kern/vfs_mountroot.c

sys/kern/vfs_subr.c

sys/sys/mount.h

vfs: manage mnt_ref with atomicsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 61370

sys/cddl/compat/opensolaris/kern/opensolaris_vfs.c

sys/kern/vfs_default.c

sys/kern/vfs_mount.c

sys/kern/vfs_mountroot.c

sys/kern/vfs_subr.c

sys/sys/mount.h

vfs: manage mnt_ref with atomics
ClosedPublic
Actions

Revision Contents
Changeset List