The code is basically complete but there is no documentation. I don't feel strongly about the name, but I do feel strongly about semantics.
What's currently known as "read-mostly locks" does not allow readers to sleep in the sleepable variant (sleep is only allowed for writers).
Part of the motivation is getting rid of the mild contention on eventhandler locks which shows up when running poudriere. They pop up everywhere for process and thread creation/destruction and some other things.
Primary motivation is taking care of the MAC problem. The standard way to do checks is the following:
#define MAC_POLICY_CHECK(check, args...) do { \
	struct mac_policy_conf *mpc; \
	\
	error = 0; \
	LIST_FOREACH(mpc, &mac_static_policy_list, mpc_list) { \
		if (mpc->mpc_ops->mpo_ ## check != NULL) \
			error = mac_error_select( \
			    mpc->mpc_ops->mpo_ ## check (args), \
			    error); \
	} \
	if (!LIST_EMPTY(&mac_policy_list)) { \
		mac_policy_slock_sleep(); \
		LIST_FOREACH(mpc, &mac_policy_list, mpc_list) { \
			if (mpc->mpc_ops->mpo_ ## check != NULL) \
				error = mac_error_select( \
				    mpc->mpc_ops->mpo_ ## check (args), \
				    error); \
		} \
		mac_policy_sunlock_sleep(); \
	} \
} while (0)
That is, the check does not walk a list of only those modules which provide the given hook; instead every module is scanned to see whether it provides it. If any policy is present on mac_policy_list, a global shared lock is taken around that scan.
It turns out the standard recommended setup today ends up loading mac_ntpd, which triggers the locked path above for standard MAC checks, most notably during file lookup. This significantly limits performance on arm64, as reported by @cperciva, with this slock found at the top of the profile. As a sample data point, it increases -j 64 buildworld time from 10:45 to 14:00.
The following code was not yet benchmarked on arm64 but should do the trick.
The primary design goal is to provide a cheap fast path for sleepable read-locking, the main assumption being that write locking almost never happens. The code maintains a per-cpu reader counter and uses an IPI handler to switch to writing. It is basically a variation of what rmlocks are doing.
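To illustrate the idea of per-cpu reader counters, here is a minimal user-space sketch. It is not the code from the patch: all names (pcpu_rlock, NSLOTS, the function names) are hypothetical, a fixed array of slots stands in for per-cpu data, and sequentially consistent C11 atomics stand in for the IPI handler that the kernel code uses to switch to writing. It also assumes a reader exits on the same slot it entered on.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct pcpu_rlock {
	atomic_int readers[NSLOTS];	/* one reader counter per slot */
	atomic_bool writer;		/* true while a writer is pending or active */
};

static void
pcpu_rlock_init(struct pcpu_rlock *l)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&l->readers[i], 0);
	atomic_init(&l->writer, false);
}

/*
 * Reader fast path: bump the local counter, then check for a writer.
 * On failure the caller would fall back to a slow path (not shown).
 */
static bool
pcpu_rlock_read_try_enter(struct pcpu_rlock *l, int slot)
{
	atomic_fetch_add(&l->readers[slot], 1);
	if (!atomic_load(&l->writer))
		return (true);
	/* A writer showed up: undo the increment and back off. */
	atomic_fetch_sub(&l->readers[slot], 1);
	return (false);
}

static void
pcpu_rlock_read_exit(struct pcpu_rlock *l, int slot)
{
	int prev = atomic_fetch_sub(&l->readers[slot], 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Writer: announce intent, then wait for every counter to drain.
 * Readers arriving after the flag is set back off on their own.
 */
static void
pcpu_rlock_write_enter(struct pcpu_rlock *l)
{
	bool expected = false;

	while (!atomic_compare_exchange_strong(&l->writer, &expected, true)) {
		expected = false;	/* another writer is active, retry */
		sched_yield();
	}
	for (int i = 0; i < NSLOTS; i++)
		while (atomic_load(&l->readers[i]) != 0)
			sched_yield();
}

static void
pcpu_rlock_write_exit(struct pcpu_rlock *l)
{
	atomic_store(&l->writer, false);
}
```

In this sketch the reader pays for two atomic operations on a counter that no other slot touches, while the writer bears the whole cost of scanning all slots. Readers may sleep while holding the lock because the writer only waits for the counters, not for any particular thread to stay on a CPU.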
epoch was suggested, but I disagree with its use due to its much more expensive fast path with no advantages for the intended use case (that is, almost never any writers).
Note that both on stock head and with the current patch, loading and unloading policies partially stalls execution. This can be fixed with no overhead for regular runtime at the cost of extra complexity in the modifying code, but I consider it not worth it since any of these ops should almost never happen anyway (compared to how often the lock is taken for reading).
Also note MAC should be modified to provide per-hook lists of the modules which implement a given hook instead. This would have the side effect of not degrading performance in buildworld when mac_ntpd is loaded, but performance would still suffer if a module providing a frequently called hook was loaded. As such, the fix applies regardless.
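A rough sketch of what per-hook lists could look like, in plain user-space C. None of this is existing MAC framework code: the hook identifiers, struct layout, policy_register and hook_check are all hypothetical, and error combining is simplified to "first non-zero error wins" rather than what mac_error_select does.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POLICIES 8	/* hypothetical fixed capacity per hook */

/* Hypothetical hook identifiers, one list is kept per hook. */
enum hook_id { HOOK_VNODE_LOOKUP, HOOK_PROC_FORK, HOOK_COUNT };

typedef int (*hook_fn)(void *arg);

struct policy {
	const char *name;
	hook_fn ops[HOOK_COUNT];	/* NULL when the hook is not provided */
};

struct hook_list {
	hook_fn fns[MAX_POLICIES];
	size_t n;
};

static struct hook_list hook_lists[HOOK_COUNT];

/* Add the policy only to the lists of the hooks it actually provides. */
static int
policy_register(const struct policy *p)
{
	/* Check capacity first so a failure leaves no partial state. */
	for (int h = 0; h < HOOK_COUNT; h++)
		if (p->ops[h] != NULL && hook_lists[h].n == MAX_POLICIES)
			return (-1);
	for (int h = 0; h < HOOK_COUNT; h++)
		if (p->ops[h] != NULL)
			hook_lists[h].fns[hook_lists[h].n++] = p->ops[h];
	return (0);
}

/* Run only the providers of this hook; an empty list costs one compare. */
static int
hook_check(enum hook_id h, void *arg)
{
	int error = 0;

	assert(h < HOOK_COUNT);
	for (size_t i = 0; i < hook_lists[h].n; i++) {
		int e = hook_lists[h].fns[i](arg);

		if (error == 0)	/* simplified: first error wins */
			error = e;
	}
	return (error);
}

/* Example policy which provides the fork hook and nothing else. */
static int
deny_fork(void *arg)
{
	(void)arg;
	return (13);
}

static const struct policy fork_only = {
	.name = "fork_only",
	.ops = { [HOOK_PROC_FORK] = deny_fork },
};
```

With fork_only registered, hook_check(HOOK_VNODE_LOOKUP, NULL) iterates over an empty list, so a policy which does not implement the lookup hook adds nothing to the lookup path. Locking is deliberately left out here; the lists would still need protection against concurrent registration.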