The current eventhandler code does not scale at all, as it relocks the list for each callback. A naive conversion to rms locks does not work because the machinery supports things like a callback unregistering itself, meaning you can't hold the rms lock across the entire invocation.
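To illustrate the pattern described above, here is a minimal userspace sketch, not the kernel code. It assumes a pthread mutex as a stand-in for the list lock, and every name in it (`handler_register`, `handler_deregister`, `handlers_invoke`, the two sample handlers) is hypothetical. The lock is dropped around each callback, which is what lets a handler unregister itself and also what makes every invocation pay one extra lock round trip per handler.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dynamically managed handler list. */
#define MAX_HANDLERS 8

typedef void (*handler_fn)(void *arg);

struct handler {
	handler_fn fn;
	int dead;		/* set by deregister, skipped by invoke */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct handler handlers[MAX_HANDLERS];
static size_t nhandlers;

void
handler_register(handler_fn fn)
{
	pthread_mutex_lock(&list_lock);
	assert(nhandlers < MAX_HANDLERS);
	handlers[nhandlers].fn = fn;
	handlers[nhandlers].dead = 0;
	nhandlers++;
	pthread_mutex_unlock(&list_lock);
}

/* Entries are only marked dead so that a walk in progress stays valid. */
void
handler_deregister(handler_fn fn)
{
	size_t i;

	pthread_mutex_lock(&list_lock);
	for (i = 0; i < nhandlers; i++)
		if (handlers[i].fn == fn)
			handlers[i].dead = 1;
	pthread_mutex_unlock(&list_lock);
}

void
handlers_invoke(void *arg)
{
	handler_fn fn;
	size_t i;

	pthread_mutex_lock(&list_lock);
	for (i = 0; i < nhandlers; i++) {
		if (handlers[i].dead)
			continue;
		fn = handlers[i].fn;
		/* Drop the lock so the callback may modify the list. */
		pthread_mutex_unlock(&list_lock);
		fn(arg);
		/* Relock for every single callback: the scalability cost. */
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
}

/* Sample handlers: both bump a counter, the first also removes itself. */
void
oneshot_handler(void *arg)
{
	int *calls = arg;

	(*calls)++;
	handler_deregister(oneshot_handler);
}

void
count_handler(void *arg)
{
	int *calls = arg;

	(*calls)++;
}
```

If `handlers_invoke` held the lock across the whole walk instead, `oneshot_handler` would deadlock on the non-recursive mutex. That is the same reason a plain swap to a lock held for the entire invocation does not work.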
Because of that I implemented a new API which scales, is faster single-threaded and is less taxing on caches. It retains the "priority" argument as apparently callers need it. One potential regression is that a handler which blocks indefinitely will also block modifications to the list. I don't think that's a problem worth addressing for the foreseeable future; it can be fixed at the cost of more complexity.
As a demo, the thread_* handlers are converted along with 2 consumers. The goal is to replace all eventhandlers which are known at compilation time with the new API. That will be done separately.
The patch, modulo bugs to be shaken out and perhaps some cleanups, is what I intend to go forward with.