Details

Reviewers

kib
jeff
jhb

Group Reviewers

manpages

Commits

rS356119: Add read-mostly sleepable locks

Summary

The code is basically complete but there is no documentation. I don't feel strongly about the name, but I do feel strongly about semantics.

What's currently known as "read-mostly locks" does not allow readers to sleep in the sleepable variant (sleep is only allowed for writers).

Part of the motivation is getting rid of the mild contention on eventhandler locks which shows up when running poudriere. They pop up everywhere for process and thread creation/destruction and some other things.

Primary motivation is taking care of the MAC problem. The standard way to do checks is the following:

#define MAC_POLICY_CHECK(check, args...) do {                           \
        struct mac_policy_conf *mpc;                                    \
                                                                        \
        error = 0;                                                      \
        LIST_FOREACH(mpc, &mac_static_policy_list, mpc_list) {          \
                if (mpc->mpc_ops->mpo_ ## check != NULL)                \
                        error = mac_error_select(                       \
                            mpc->mpc_ops->mpo_ ## check (args),         \
                            error);                                     \
        }                                                               \
        if (!LIST_EMPTY(&mac_policy_list)) {                            \
                mac_policy_slock_sleep();                               \
                LIST_FOREACH(mpc, &mac_policy_list, mpc_list) {         \
                        if (mpc->mpc_ops->mpo_ ## check != NULL)        \
                                error = mac_error_select(               \
                                    mpc->mpc_ops->mpo_ ## check (args), \
                                    error);                             \
                }                                                       \
                mac_policy_sunlock_sleep();                             \
        }                                                               \
} while (0)

That is, there is not even a scan over a list of modules which provide the given hook, but all modules are scanned to see if they provide it. Should there a policy be loaded, a global shared lock is taken.

Turns out the standard recommended setup today ends up loading mac_ntpd which triggers the following for standard MAC checks, most notably during file lookup. This turns out to significantly limit performance on arm64 as reported by @cperciva with this slock being found at the top of the profile (as a sample data point it increases -j 64 buildworld time from 10:45 to 14:00)

The following code was not yet benchmarked on arm64 but should do the trick.

Primary design goal is to provide cheap fast path for sleepable read-locking with main assumption being that write locking almost never happens. The code maintains a per-cpu reader counter and uses an IPI handler to switch to writing. It basically is a variation of what rmlocks are doing.

epoch was suggested but I disagree with its use due to much more expensive fast path with no advantages for the intended use case (that is, no writers almost ever)

Note both on stock head and with the current patch loading and unloading policies partially stalls execution. This can be fixed with no overhead for regular runtime but extra complexity for the modifying code and I consider it not worth it either since any of these ops should almost never happen anyway (compared to how often the lock is taken for reading).

Also note MAC should be modified to provide lists of modules with a given hooks instead. This would happen to have a side effect of not degrading performance in buildworld when mac_ntpd is loaded, but would still suffer if something callable was loaded. As such, the fix applies regardless.

Test Plan

Survived while true; do kldload mac_ntpd; kldunload mac_ntpd; done while -j 40 buildkernel was running. Also 40-way find loop over the fs; all of which constantly hit mac checks.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

mjg created this revision.Dec 15 2019, 6:01 PM

Herald added a subscriber: imp. · View Herald TranscriptDec 15 2019, 6:01 PM

Harbormaster completed remote builds in B28162: Diff 65683.Dec 15 2019, 6:01 PM

mjg edited the summary of this revision. (Show Details)Dec 15 2019, 6:05 PM

mjg edited the summary of this revision. (Show Details)Dec 15 2019, 6:09 PM

mjg edited the summary of this revision. (Show Details)

mjg edited the summary of this revision. (Show Details)Dec 15 2019, 6:12 PM

So this is a kind of Peterson lock, it seems (I find it somewhat curious).

'One writer' constraint looks very fragile. And the lack of documentation makes it almost impossible to use by anybody by you.

sys/kern/kern_rmlock.c
899	Why do you need this fence ? [Please answer with the code comment]
929	And this one.
947	s/if/while/
sys/sys/rmlock.h
136	I think this should be placed into sys/_rmlock.h or even sys/_rmslock.h from the beginning.
138	Why not bool ?

In D22823#499745, @kib wrote:

So this is a kind of Peterson lock, it seems (I find it somewhat curious).

'One writer' constraint looks very fragile. And the lack of documentation makes it almost impossible to use by anybody by you.

There is no documentation right now because I wanted to show the concept. Assuming this looks fine enough modulo fixups I'm going to document it properly. I don't find the constraint to be a problem. The assertion the lock is not already write locked can be made to also show up in production kernels. Any code using this will have reasons to wrap the entire thing with another lock anyway.

sys/kern/kern_rmlock.c
899	I'll comment everything later. As for now, the injected fence may be too early and this has to prevent the case where the count is updated after critical section is exited. Otherwise the loop transferring the counts to the central var can miss the update.
947	only last unlocking reader wakes this up and the invariant is no new readers get the lock until the writer exits, so "if" is sufficient. also note the assertion below
sys/sys/rmlock.h
136	ok
138	for consistency with other counters, does not change anything

kib added inline comments.Dec 16 2019, 4:59 PM

sys/kern/kern_rmlock.c
899	But what is the acquire this release syncs with ?
947	Well, there are things like spurious wakeups.

The read path cost here would be virtually identical with uma_smr. You would only have to do safe list link/unlink but if you serialize a single writer that is trivial.

In D22823#500167, @jeff wrote:

The read path cost here would be virtually identical with uma_smr. You would only have to do safe list link/unlink but if you serialize a single writer that is trivial.

Do you mean uma_smr_enter from D22586? That routine performs atomic_add_acq_int which would make it slower than rms_rlock at least on amd64. Note my function only uses atomic_thread_fence_rel which on this arch expands to a compiler barrier. Other archs do end up suffering a barrier but I don't know how much of a problem it is for them. If necessary I can make it completely fence-less for everyone.

move struct rmslock to sys/_rmlock.h
support concurrent writers
eliminate release fences from rlock/runlock. instead use compiler barriers + an IPI handler to do the work

The code as submitted is almost committable, can use some extra asserts. The missing part is updates to rmlock manual page which I can write later provided the semantics are fine.

sys/kern/kern_rmlock.c
885	I'm not very fond of a separate var here, it can use a flag in readers_pcpu instead but I don't think this is a big problem.
903	These accessors are pretty awful and should be fixed. Unfortunately pcpu stuff is oblivious to the type so it's hard to do any type checking for it. This sitill can be nicer if it can be zpcpu_get(rms->readers_influx, int) instead but I think that's for some other time.

mjg added inline comments.Dec 17 2019, 4:08 AM

sys/kern/kern_rmlock.c
1064	Regarding this condition and the one above being 'if' and not 'while', the code does not do any spurious lookups of this waitchannel so I think it's fine. I'm unfamiliar with anything which would just wake this up in other conditions.

use patched zpcpu accessors which don't require expilcit casting
use a bitmap for repeat IPI signalling

Except for lack of manpage changes I consider the code to be committable.

kib accepted this revision.Dec 17 2019, 8:03 PM

kib added inline comments.

sys/kern/kern_rmlock.c
1064	Our sleepqueues are like condvars in regard of spurious wakes.
sys/sys/_rmlock.h
71	I think this is a good reason to put the new structure definition into _rmslock.h.

This revision is now accepted and ready to land.Dec 17 2019, 8:03 PM

add manpage

This revision now requires review to proceed.Dec 18 2019, 12:37 PM

This revision was not accepted when it landed; it landed in state Needs Review.Dec 27 2019, 11:20 AM

Closed by commit rS356119: Add read-mostly sleepable locks (authored by mjg). · Explain Why

This revision was automatically updated to reflect the committed changes.

mjg added a commit: rS356119: Add read-mostly sleepable locks.

Herald added a reviewer: manpages. · View Herald TranscriptDec 27 2019, 11:20 AM

Add read-mostly sleepable locks
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 65747

sys/kern/kern_rmlock.c

sys/sys/_rmlock.h

sys/sys/rmlock.h

Add read-mostly sleepable locksClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 65747

sys/kern/kern_rmlock.c

sys/sys/_rmlock.h

sys/sys/rmlock.h

Add read-mostly sleepable locks
ClosedPublic
Actions

Revision Contents
Changeset List