On non-trivial SMP systems, contention on the pmc_owner mutex causes a substantial fraction of the captured samples to come from the pmc process itself. This change (a) makes the buffers larger to avoid contention on the global list and (b) makes the working sample buffer per-CPU.
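The sketch below is a userland illustration of the per-CPU working buffer idea, not the hwpmc code. All names (sample_buffer, flush_buffer, record_sample, run_simulation) and sizes (NCPU, SAMPLES_PER_BUFFER) are hypothetical. Threads stand in for CPUs. Each one writes only to its own buffer without locking, and the shared mutex is taken only when a full buffer is handed to the global list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes for illustration; not the values used by hwpmc. */
#define NCPU 4
#define SAMPLES_PER_BUFFER 1024

struct sample {
	unsigned long pc;	/* sampled program counter */
	int cpu;		/* CPU that took the sample */
};

/* Working buffer: written only by the CPU that owns it. */
struct sample_buffer {
	struct sample samples[SAMPLES_PER_BUFFER];
	size_t count;
};

/* Shared list of full buffers; the only state guarded by a global lock. */
struct full_list {
	pthread_mutex_t lock;
	size_t nbuffers;	/* buffers handed off so far */
	size_t nsamples;	/* samples contained in those buffers */
};

static struct sample_buffer percpu_buf[NCPU];
static struct full_list global_full = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

/*
 * Hand a full buffer to the shared list. Simplification: we only count
 * it and reuse the buffer in place, where a real implementation would
 * queue it for the log writer and take a fresh one.
 */
static void flush_buffer(struct sample_buffer *buf)
{
	pthread_mutex_lock(&global_full.lock);
	global_full.nbuffers++;
	global_full.nsamples += buf->count;
	pthread_mutex_unlock(&global_full.lock);
	buf->count = 0;
}

/* Record one sample on `cpu`; no lock is taken unless the buffer fills. */
static void record_sample(int cpu, unsigned long pc)
{
	struct sample_buffer *buf;

	assert(cpu >= 0 && cpu < NCPU);
	buf = &percpu_buf[cpu];
	buf->samples[buf->count].pc = pc;
	buf->samples[buf->count].cpu = cpu;
	if (++buf->count == SAMPLES_PER_BUFFER)
		flush_buffer(buf);
}

struct worker_arg {
	int cpu;
	size_t nsamples;
};

/* Thread body standing in for one CPU's sampling interrupts. */
static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (size_t i = 0; i < a->nsamples; i++)
		record_sample(a->cpu, 0x1000UL + i);
	return NULL;
}

/* Run one thread per simulated CPU; returns 0 on success, -1 otherwise. */
static int run_simulation(size_t samples_per_cpu)
{
	pthread_t tid[NCPU];
	struct worker_arg args[NCPU];
	int created = 0, rc = 0;

	for (int c = 0; c < NCPU; c++) {
		args[c].cpu = c;
		args[c].nsamples = samples_per_cpu;
		if (pthread_create(&tid[c], NULL, worker, &args[c]) != 0) {
			rc = -1;
			break;
		}
		created++;
	}
	for (int c = 0; c < created; c++)
		pthread_join(tid[c], NULL);
	return rc;
}
```

The global lock is acquired once per SAMPLES_PER_BUFFER samples rather than once per sample. Raising the buffer size therefore lowers how often CPUs meet on that lock, which corresponds to part (a) of the change. In a kernel, the "only the owner writes" guarantee would typically come from the sampling interrupt running on the CPU that owns the buffer, not from threads.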
will-it-scale/page_fault1_processes -t 96 -s 30 >& /dev/null &
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O pf1.pmcstat sleep 10
pmcstat -R pf1.pmcstat -z100 -G pf1.stacks
before:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1_orig.svg
after:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1.svg