The current eventhandler code does not scale at all: it relocks the list for each callback. This shows up in lock profiling as process_exec, thread_dtor/ctor and other locks. It is also at the top of the profile on a kernel with other changes when creating and destroying threads in a loop. A naive conversion to plain rms locks does not work, because the machinery supports things like a callback unregistering itself, which means the rms lock cannot be held across the entire invocation.
Because of that I implemented a new API which scales, is faster single-threaded and is less taxing on caches. It retains the "priority" argument, as callers apparently need it. One potential regression is that a handler which blocks indefinitely will also block modifications to the list. I don't think that's a problem worth addressing for the foreseeable future, though it can be done at the cost of more complexity.
The goal would be to replace all eventhandlers which are known at compile time with the new API. That conversion will be done separately.
The patch, modulo bugs to be shaken out and perhaps some cleanups, is what I intend to go forward with.
Thread handlers are converted here: https://people.freebsd.org/~mjg/thread-eventhandler.diff