The current vmmeter struct suffers from extreme false sharing because several atomic counters share cache lines. This severely reduces performance of fork/exec-heavy workloads on machines with high core counts.
pig1 (westmere 4 * 10 * 2) doing -j 80 buildkernel on tmpfs goes from:
3599.82s user 1313.64s system 5485% cpu 1:29.58 total
to:
3613.17s user 1124.48s system 5500% cpu 1:26.13 total
i.e. about 3.5 seconds less real time, with system time dropping from ~1314s to ~1124s (roughly 14%).
The patch goes for the most trivial approach of moving relevant counters out and annotating them with __exclusive_cache_line.
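For illustration only, a minimal userland sketch of the layout idea, not the patch itself. It uses C11 alignas as a stand-in for the kernel's __exclusive_cache_line annotation and assumes a 64-byte cache line. The names SKETCH_CACHE_LINE, packed_counters, padded_counter, hot_a, hot_b, shares_line and counter_bump are all hypothetical and do not come from the patch.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. The kernel uses its own constant. */
#define SKETCH_CACHE_LINE 64

/* Before: two hot atomic counters sit next to each other, so CPUs
 * updating different counters still fight over the same line. */
struct packed_counters {
	atomic_uint hot_a;	/* hypothetical counter */
	atomic_uint hot_b;	/* hypothetical counter */
};

/* After: a counter that is aligned to, and therefore padded out to,
 * one full cache line. */
struct padded_counter {
	alignas(SKETCH_CACHE_LINE) atomic_uint value;
};

static_assert(sizeof(struct padded_counter) == SKETCH_CACHE_LINE,
    "a padded counter must occupy exactly one cache line");

/* Aligned here only so that the demonstration is deterministic. */
static alignas(SKETCH_CACHE_LINE) struct packed_counters packed;

/* Each split counter lives on a line of its own. */
static struct padded_counter split_hot_a;
static struct padded_counter split_hot_b;

/* Return nonzero when both addresses fall within the same cache line. */
static int
shares_line(const void *a, const void *b)
{
	return ((uintptr_t)a / SKETCH_CACHE_LINE ==
	    (uintptr_t)b / SKETCH_CACHE_LINE);
}

/* Relaxed increment, as is typical for a statistics counter. */
static void
counter_bump(struct padded_counter *c)
{
	atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
}
```

With this layout, shares_line(&packed.hot_a, &packed.hot_b) is true while shares_line(&split_hot_a.value, &split_hot_b.value) is false. The cost is memory: each split counter takes a whole line instead of a few bytes, which is why only the relevant counters are worth moving out.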
I'm not attached to the new name prefix (vm_) or to the names of the padding fields. I would just like to see the split (or an equivalent) go in.