The routine needlessly locks a mutex from a small pool, which causes scalability issues. Instead, we can set the bit with atomics and resort to locking only if that fails. Since going to sleep would take a sleepq lock anyway, remove the mtx pool use and use sleepq locks directly.
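For illustration only, the fast-path/slow-path split can be sketched in userspace C. This is not the kernel code: the names (struct foff, FOFF_LOCKED, FOFF_WAITERS, foff_lock, foff_unlock) are hypothetical, and a pthread mutex plus condition variable stand in for the sleepq lock and sleep queue. The point is that the uncontended case touches only the flag word with a compare-and-swap, and the lock is taken only when a thread has to sleep or wake sleepers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FOFF_LOCKED  0x1u	/* the offset is owned by some thread */
#define FOFF_WAITERS 0x2u	/* at least one thread is (about to be) asleep */

/* Hypothetical stand-in for the per-file state; not the kernel layout. */
struct foff {
	atomic_uint	flags;
	pthread_mutex_t	mtx;	/* plays the role of the sleepq lock */
	pthread_cond_t	cv;	/* plays the role of the sleep queue */
};

static void
foff_init(struct foff *f)
{
	atomic_init(&f->flags, 0);
	pthread_mutex_init(&f->mtx, NULL);
	pthread_cond_init(&f->cv, NULL);
}

/* Fast path: a single CAS, no lock taken. */
static bool
foff_trylock(struct foff *f)
{
	unsigned int old = 0;

	return (atomic_compare_exchange_strong_explicit(&f->flags, &old,
	    FOFF_LOCKED, memory_order_acquire, memory_order_relaxed));
}

static void
foff_lock(struct foff *f)
{
	unsigned int old;

	if (foff_trylock(f))
		return;
	/* Slow path: the lock is needed only because we may sleep. */
	pthread_mutex_lock(&f->mtx);
	for (;;) {
		old = atomic_load(&f->flags);
		if ((old & FOFF_LOCKED) == 0) {
			if (atomic_compare_exchange_weak(&f->flags, &old,
			    old | FOFF_LOCKED))
				break;
			continue;
		}
		/* Announce that we will sleep; retry if the word changed. */
		if ((old & FOFF_WAITERS) == 0 &&
		    !atomic_compare_exchange_weak(&f->flags, &old,
		    old | FOFF_WAITERS))
			continue;
		pthread_cond_wait(&f->cv, &f->mtx);
	}
	pthread_mutex_unlock(&f->mtx);
}

static void
foff_unlock(struct foff *f)
{
	unsigned int old = FOFF_LOCKED;

	assert(atomic_load(&f->flags) & FOFF_LOCKED);
	/* Fast path: nobody is waiting, so just clear the bit. */
	if (atomic_compare_exchange_strong_explicit(&f->flags, &old, 0,
	    memory_order_release, memory_order_relaxed))
		return;
	/* Slow path: waiters exist; clear the word and wake them. */
	pthread_mutex_lock(&f->mtx);
	atomic_store(&f->flags, 0);
	pthread_cond_broadcast(&f->cv);
	pthread_mutex_unlock(&f->mtx);
}
```

In this sketch a waiter sets FOFF_WAITERS while holding the mutex, and the unlocker's lock-free CAS fails whenever that bit is set, so the wakeup cannot be lost. With no contention, neither foff_lock nor foff_unlock acquires the mutex at all.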
On a kernel with other fixes (including markj's vm page patch), I get the following results from will-it-scale when testing with 104 threads on Skylake:
test | before | after | diff |
lseek1_processes | 257118082 | 402258149 | +37% |
readseek1_processes | 75866480 | 101140043 | +25% |