vfs: manage mnt_ref with atomics
Closed, Public

Authored by mjg on Aug 26 2019, 11:08 PM.
Details

Summary

The previous posting included an rmlock, which I considered a temporary measure from the get-go, included specifically to avoid a discussion about how to handle per-cpu counting. That worked out very differently than intended, so after a discussion with Jeff (who had reservations about even a temporary inclusion of an rmlock) I'm posting an updated review with more rationale.

The main problem to be solved is the handling of struct mount counters. They are very frequently modified and almost never need to be checked, which makes them a natural fit for a per-cpu-based scheme. The approach implemented here effectively provides a code section during which we are guaranteed that the modifying party will wait for us to finish (and, the other way around, that we will not modify anything past a certain point). This automatically lends itself to solving other problems (most notably root vnode caching) and is therefore my first pick.

There are of course many different ways to do it. One, proposed by jeff@, is to cmpset into the counter. I disagree with this method for the following reasons (a rough sketch of what it could look like follows the list):

  • I find it a little cumbersome to use: it has to account for the case where the count would overflow/underflow, which avoidably extends the common case
  • busy/unbusy and write suspension always modify two counters at the same time. With this approach all the provisions have to be made twice, whereas with my approach safety has to be provided only once. In other words, this approach is slower for common consumers due to two atomic ops and more branches
  • but most importantly, it does not provide the notion of a code section which can be safely executed in the face of changes to the struct, which I think is an unnecessary loss
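A rough sketch of the cmpset scheme (the MNTREF_BLOCKED bit and the helper name are made up for illustration; this is not from the patch):

#define	MNTREF_BLOCKED	0x80000000u	/* hypothetical "fast path disabled" bit */

static bool
mnt_ref_cmpset(volatile u_int *refp)
{
	u_int old, new;

	old = atomic_load_int(refp);
	for (;;) {
		if (__predict_false((old & MNTREF_BLOCKED) != 0))
			return (false);		/* take the locked slow path */
		new = old + 1;
		if (__predict_false((new & MNTREF_BLOCKED) != 0))
			return (false);		/* would overflow into the flag bit */
		if (atomic_fcmpset_int(refp, &old, new) != 0)
			return (true);
	}
}

Note how both the flag check and the overflow check sit in the common path, which is the cumbersomeness complained about above.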

The patch below was tested by me with stress2 and poudriere on amd64. On top of that I ran crossmp8 and suj11 from said suite on a powerpc64 box (talos); no problems were seen.

Currently the code is rather coarse-grained: either all per-cpu operation is allowed or none is. This can easily be modified later with flags passed by the blocking thread and flags passed by threads entering.

Root vnode caching would look like this:

if (!vfs_op_thread_enter(mp))
        return (vfs_cache_root_fallback(mp, flags, vpp));
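/* within this section the modifying side waits for us to call vfs_op_thread_exit() */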
vp = (struct vnode *)atomic_load_ptr(&mp->mnt_rootvnode);
if (vp == NULL || (vp->v_iflag & VI_DOOMED)) {
        vfs_op_thread_exit(mp);
        return (vfs_cache_root_fallback(mp, flags, vpp));
}
vrefact(vp);
vfs_op_thread_exit(mp);
error = vn_lock(vp, flags);
 .....

Should the vnode need to be cleared it can look like this (assuming only one thread can do it):

vp = mp->mnt_rootvnode;
if (vp == NULL)
        return (NULL);
mp->mnt_rootvnode = NULL;
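/* make the NULL store visible before waiting out unlocked users */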
atomic_thread_fence_seq_cst();
vfs_op_barrier_wait(mp);
vrele(vp);

This provides us with a way to get the vnode without locking anything and it's easy to reason about.

Test Plan

The complete patchset was tested by pho.

Apart from that it was used during several poudriere runs.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

sys/kern/vfs_mount.c
1383 ↗(On Diff #61323)

Oops, this if statement is a remnant from the per-cpu counters. The first caller to disable the fast path goes over all CPUs and sums them up to update the counter in the struct.
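A rough illustration of that summing step (the per-CPU storage shown here, a plain per-CPU array, is a stand-in and not the data structure from the patch; the caller is assumed to hold the mount interlock):

static void
mnt_ref_collapse(struct mount *mp, int *pcpu_ref)
{
	int cpu, sum;

	sum = 0;
	CPU_FOREACH(cpu) {
		sum += pcpu_ref[cpu];	/* fold each CPU's local count... */
		pcpu_ref[cpu] = 0;	/* ...into the counter in the struct */
	}
	mp->mnt_ref += sum;		/* interlock held in this sketch */
}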

mjg edited the summary of this revision.

plug some nits

  • add vfs_fastpath_enable_mp_locked for safe access with the lock held
  • rebase on top of r351702
sys/sys/mount.h
951 ↗(On Diff #61553)

Instead of an rmlock, could we not encode the flag bits in the counter field and use a cmpset loop to handle the whole business?

If you wanted to do it with multiple CPUs you would only need to set the bit in each counter.

sys/sys/mount.h
951 ↗(On Diff #61553)

Note I said I don't want rmlocks later; they are here specifically because I wanted to avoid a discussion on whether a given per-cpu routine works and/or is a good idea. Whatever ends up protecting the fast path is trivially replaceable.

The main point of *this* review is to (dis)agree on the general approach - that is, wrapping all code which requires an exact value with fast-path enable/disable routines, with a guarantee that everyone executing after the disabling routine returns takes the current (locked) code. Assuming this is fine, I'll post an equivalent for busy/unbusy and write suspension. With this in place we can discuss how to handle per-cpu, unless you are very opposed to rmlocks showing up in the first place even temporarily (in which case I can elaborate on the scheme I had in mind).
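A minimal sketch of the enable/disable pattern being described, using the vfs_op_* names this review later settles on (the consumer function itself is hypothetical):

static int
mnt_ref_exact(struct mount *mp)
{
	int ref;

	vfs_op_enter(mp);	/* unlocked users are drained; new ones take the locked path */
	ref = mp->mnt_ref;	/* the counter now holds an exact value */
	vfs_op_exit(mp);
	return (ref);
}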

sys/sys/mount.h
951 ↗(On Diff #61553)

Perhaps I should add that I do think making the counters atomic in the initial commits is an important logical step, especially when it comes to write suspension -- it required a little patching in the SU code. It really just needs *something* to keep the fast path. You can see the previous iteration here: https://people.freebsd.org/~mjg/patches/atomic-mnt-counters.diff

But again, if the intermediate work is not seen as necessary, I would be more than happy to go straight to per-cpu variants.

pho tested the combined diff which switches ref + lockref + writeopcount to atomics; it can be found here: https://people.freebsd.org/~mjg/patches/atomic-mnt-counters.diff

It does not include the slight rework I did of the fastpath disable, but that should not matter. I don't think there is much to discuss here other than agreeing (or not) on the fastpath approach. The part of real significance is the handling of the per-cpu scheme, which will be done in later reviews.

If this is fine I'll post lockref + writeopcount in separate reviews before posting per-cpu.

I would use bools systematically.

Did you consider some less ugly name than fastpath? An idea would be to distinguish between normal mp use and the mount/unmount stage.

This revision is now accepted and ready to land. Sep 9 2019, 7:11 AM
  • change _retval to bool
  • rename fastpath to vfs_op and alter related routines
This revision now requires review to proceed. Sep 9 2019, 8:24 AM
mjg edited the summary of this revision.
mjg set the repository for this revision to rS FreeBSD src repository - subversion.
  • see the rewritten summary
mjg edited the summary of this revision.
sys/kern/subr_pcpu.c
135 ↗(On Diff #61954)

Remove one dot.

sys/kern/vfs_default.c
614 ↗(On Diff #61954)

Can you eliminate these 'goto out'/'return's? Place the slow path into the 'else' block of the vfs_op_thread_enter().
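The shape being asked for, roughly (an illustrative fragment only, not the actual vop_stdgetwritemount() body; it assumes mp is already in scope and uses the _UNLOCKED names the review later adopts):

if (vfs_op_thread_enter(mp)) {
	MNT_REF_UNLOCKED(mp);		/* unlocked fast path */
	vfs_op_thread_exit(mp);
} else {
	MNT_ILOCK(mp);			/* locked slow path */
	MNT_REF(mp);
	MNT_IUNLOCK(mp);
}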

627 ↗(On Diff #61954)

So, for the supported filesystems, no VOP_GETWRITEMOUNT() implementation ever returns an error; the only implementation that does return one is unionfs. I think we can optimize vn_start_write().

sys/kern/vfs_mount.c
1392 ↗(On Diff #61954)

The fence alone is a nop; you need a reciprocal fence to synchronize with. I assume this is the fence in thread_enter()? Please add a comment explaining it.
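For reference, a userspace model of the dual-fence pairing being referred to (this is not the kernel code; the variable names merely mimic the mount fields and the atomic(9) routines are replaced with C11 equivalents):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int vfs_ops;	/* models the "fast path disabled" counter */
static atomic_int thread_in;	/* models the per-cpu "in fast path" flag */

/* Fast-path side: models vfs_op_thread_enter(). */
static bool
thread_enter(void)
{
	atomic_store_explicit(&thread_in, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&vfs_ops, memory_order_relaxed) != 0) {
		atomic_store_explicit(&thread_in, 0, memory_order_relaxed);
		return (false);		/* take the locked slow path */
	}
	return (true);
}

/* Blocking side: models vfs_op_enter() + vfs_op_barrier_wait(). */
static void
op_enter(void)
{
	atomic_fetch_add_explicit(&vfs_ops, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	while (atomic_load_explicit(&thread_in, memory_order_relaxed) != 0)
		;			/* wait for fast-path users to drain */
}

Each side stores its own flag, issues a sequentially consistent fence, then loads the other side's flag; the two fences are what make it impossible for both sides to miss each other.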

sys/sys/mount.h
282 ↗(On Diff #61954)

This should be MNT_REF_FASTPATH()?

289 ↗(On Diff #61954)

If you change MNT_REL_FASTPATH into an expression macro that returns a value, I think you can use it there too.

Can these four macros be changed to inline functions?
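What an expression-style release macro might look like (hypothetical; whether it is worthwhile is exactly what is debated below):

#define	MNT_REL_FASTPATH(mp)	(atomic_fetchadd_int(&(mp)->mnt_ref, -1) - 1)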

963 ↗(On Diff #61954)

Please use the term 'sequentially consistent fence' and not 'full barrier'.

979 ↗(On Diff #61954)

Can this be an inline function?

sys/sys/mount.h
271 ↗(On Diff #61954)

I should admit that I dislike the _FASTPATH() suffix on principle, but I have no suggestions right now to replace it.

mjg added inline comments.
sys/kern/vfs_default.c
627 ↗(On Diff #61954)

Are you proposing to drop the error and only act on mp == NULL? I think this is orthogonal to the change and can be made in a different review regardless of what happens here.

sys/kern/vfs_mount.c
1392 ↗(On Diff #61954)

It is explained (albeit briefly) above vfs_op_thread_enter. If necessary I can add a rundown of how concurrent execution is handled, with two columns (one for each routine).

sys/sys/mount.h
271 ↗(On Diff #61954)

_UNLOCKED or (later) _PCPU?

282 ↗(On Diff #61954)

See below.

289 ↗(On Diff #61954)

I don't think changing this is worthwhile. The per-cpu variant will not have a way to return a meaningful value, so this will have to go back to doing the work on its own. In order to keep things consistent I would prefer not to change the _REF_ variant either. But I don't have a strong opinion here.

As for inline, this requires adding several headers to sys/mount.h, and despite them being gated with _KERNEL it causes a lot of breakage during buildworld, e.g.:

/usr/src/lib/libprocstat/common_kvm.h:34: error: conflicting types for 'dev2udev'
/usr/obj/usr/src/sparc64.sparc64/tmp/usr/include/sys/systm.h:499: error: previous declaration of 'dev2udev' was here

I think conversion to inlines should be a separate effort, if done at all.

sys/kern/vfs_default.c
627 ↗(On Diff #61954)

Most likely yes. I am not proposing to do it in this commit, of course, just something that I noted when I reviewed all implementations of the VOP for your change.

sys/kern/vfs_mount.c
1393 ↗(On Diff #62104)

s/barrier/fence/. vfs_op_thread_enter().

1392 ↗(On Diff #61954)

I only want you to point to the dual fence.

sys/sys/mount.h
271 ↗(On Diff #61954)

unlocked sounds fine

289 ↗(On Diff #61954)

Ok.
You might move that to a new header.

  • change comment in vfs_op_enter
  • rename _FASTPATH to _UNLOCKED
  • remove spurious 'to' from vfs_op_thread_enter comment and change the ending a little
This revision is now accepted and ready to land. Sep 14 2019, 8:55 PM

My only real feedback is on the naming.

vfs_op_ is too close to VOP and I find that confusing. These are mount ops. I believe it would be much more obvious if we called it mnt_op or mount_op_.

This revision was automatically updated to reflect the committed changes.