The code is basically complete but there is no documentation. I don't feel strongly about the name, but I do feel strongly about semantics.
Part of the motivation is getting rid of the mild contention on eventhandler locks which shows up when running poudriere. They pop up everywhere for process and thread creation/destruction and some other things.
Primary motivation is taking care of the MAC problem. The standard way to do checks is the following:
```
#define MAC_POLICY_CHECK(check, args...) do { \
	struct mac_policy_conf *mpc; \
 \
	error = 0; \
	LIST_FOREACH(mpc, &mac_static_policy_list, mpc_list) { \
		if (mpc->mpc_ops->mpo_ ## check != NULL) \
			error = mac_error_select( \
			    mpc->mpc_ops->mpo_ ## check (args), \
			    error); \
	} \
	if (!LIST_EMPTY(&mac_policy_list)) { \
		mac_policy_slock_sleep(); \
		LIST_FOREACH(mpc, &mac_policy_list, mpc_list) { \
			if (mpc->mpc_ops->mpo_ ## check != NULL) \
				error = mac_error_select( \
				    mpc->mpc_ops->mpo_ ## check (args), \
				    error); \
		} \
		mac_policy_sunlock_sleep(); \
	} \
} while (0)
```
That is, there is no precomputed list of modules which provide a given hook; instead, every module is scanned to see whether it provides it. Should any policy be present on mac_policy_list, a global shared lock is taken.
It turns out the standard recommended setup today ends up loading mac_ntpd, which triggers the locked path above for standard MAC checks, most notably during file lookup. This significantly limits performance on arm64, as reported by @cperciva, with this slock at the top of the profile.
The following code has not yet been benchmarked on arm64 but should do the trick.
The primary design goal is to provide a cheap fast path for sleepable read-locking, with the main assumption being that write locking almost never happens. The code maintains a per-CPU reader counter and uses IPI-injected fences to synchronize against a writer. An important restriction, which may look weird but in my opinion provides an advantage, is that it is illegal to have competing writers on this lock. The primitive can be modified to allow for them, but I don't see any point. Writers should almost never be present to begin with, and even if they are, the total hold time should be as short as possible. Any code which ends up write locking this should first take other locks and prepare whatever is needed.
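To make the reader/writer handshake concrete, here is a rough userspace sketch of the idea, not the code under review. All names (`sketch_lock`, `sketch_rlock`, `NSLOTS` and so on) are hypothetical. It makes two loud simplifications: sequentially consistent C11 atomics stand in for the IPI-injected fences, which makes this reader path more expensive than the real one, and blocked threads spin with `sched_yield()` instead of sleeping. A reader must also unlock on the same slot it locked on.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct sketch_lock {
	atomic_int readers[NSLOTS];	/* one reader counter per "CPU" */
	atomic_bool writer;		/* true while a writer is pending or active */
};

static void
sketch_init(struct sketch_lock *l)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&l->readers[i], 0);
	atomic_init(&l->writer, false);
}

/* Reader fast path: announce ourselves first, then check for a writer. */
static bool
sketch_try_rlock(struct sketch_lock *l, int slot)
{
	atomic_fetch_add(&l->readers[slot], 1);
	if (atomic_load(&l->writer)) {
		/* A writer is present: back out so it can make progress. */
		atomic_fetch_sub(&l->readers[slot], 1);
		return (false);
	}
	return (true);
}

/* Slow path: the real primitive would sleep here, the sketch yields. */
static void
sketch_rlock(struct sketch_lock *l, int slot)
{
	while (!sketch_try_rlock(l, slot))
		sched_yield();
}

static void
sketch_runlock(struct sketch_lock *l, int slot)
{
	atomic_fetch_sub(&l->readers[slot], 1);
}

/* Writer: raise the flag, then wait for every slot to drain. */
static void
sketch_wlock(struct sketch_lock *l)
{
	bool had_writer = atomic_exchange(&l->writer, true);

	/* Competing writers are illegal; callers serialize beforehand. */
	assert(!had_writer);
	(void)had_writer;
	for (int i = 0; i < NSLOTS; i++)
		while (atomic_load(&l->readers[i]) != 0)
			sched_yield();
}

static void
sketch_wunlock(struct sketch_lock *l)
{
	atomic_store(&l->writer, false);
}
```

Because the reader increments its counter before checking the flag and the writer sets the flag before scanning the counters, at least one side always notices the other. Once the writer has seen a slot at zero, any later reader on that slot sees the flag and backs out.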
epoch was suggested, but I disagree with its use: its fast path is much more expensive and it offers no advantages for the intended use case (that is, almost never any writers).