On non-trivial SMP systems the contention on the pmc_owner mutex leads to a substantial number of samples captured being from the pmc process itself. This change a) makes buffers larger to avoid contention on the global list b) makes the working sample buffer per cpu.
will-it-scale/page_fault1_processes -t 96 -s 30 >& /dev/null &
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O pf1.pmcstat sleep 10
pmcstat -R pf1.pmcstat -z100 -G pf1.stacks
before:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1_orig.svg
after:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1.svg
without pmcstat running:
mmacy@anarchy [~/devel/freebsd|0:09|32] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 2508.41s user 1047.14s system 6000% cpu 59.259 total
with pmcstat running before:
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O /dev/null sleep 600 &
[1] 8918
mmacy@anarchy [~/devel/freebsd|0:29|68] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total
with pmcstat running after:
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O /dev/null sleep 180 &
mmacy@anarchy [~/devel/freebsd|0:14|48] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 2688.13s user 1547.66s system 6053% cpu 1:09.97 total
fixing the plethora of false sharing in pmc drops it a bit further:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total
It turns out that the pmcstat mechanism for specifying rate is a bit brain damaged, it has to be specified _before_ the counter.
So the command line when testing sampling overhead is now:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &
This will sample at most 2000/s with a clock of 2.1Ghz.
baseline: make -j96 buildkernel without pmc running
pmc_head: make -j96 buildkernel with pmc running on HEAD
pmc_pcpu: make -j96 buildkernel with pmc running using the attached patch
Real time before and after: x pmc_head_real.log + pmc_pcpu_real.log +---------------------------------------------------------------------------+ |+ | |+ | |+ | |+ | |++ | |++ x | |++ xx x xxxx x x| |MA |___________M___A________________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 76.4 127.62 84.845 88.577 15.100031 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence -28.398 +/- 10.0344 -32.0602% +/- 7.69825% (Student's t, pooled s = 10.6794) system time before and after: x pmc_head_sys.log + pmc_pcpu_sys.log +---------------------------------------------------------------------------+ |+ | |+ | |+ | |+ | |+ | |+ | |++ x x | |++ xx x x xx x x| |A| |____________M____A________________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 2277.96 6948.53 2949.47 3341.492 1385.2677 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence -2277.47 +/- 920.425 -68.1574% +/- 8.77623% (Student's t, pooled s = 979.596) Overhead of pmc on head: x baseline_real.log + pmc_head_real.log +---------------------------------------------------------------------------+ |xx + | |xx + | |xx ++ + +++ + +| |AM |___________M___A_______________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 76.4 127.62 84.845 88.577 15.100031 Difference at 95.0% confidence 29.73 +/- 10.0335 50.5208% +/- 17.0525% (Student's t, pooled s = 10.6785) x baseline_sys.log + pmc_head_sys.log +---------------------------------------------------------------------------+ |x + + | |xx ++ + + ++ + +| |A |___________M____A________________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 2277.96 6948.53 2949.47 3341.492 1385.2677 Difference at 95.0% confidence 2309.97 +/- 920.443 223.937% +/- 89.3039% (Student's t, pooled s = 979.616) Overhead of updated pmc: x baseline_real.log + pmc_pcpu_real.log +---------------------------------------------------------------------------+ | x x + | |x x x xx x x x + + ++++ + + +| | |______AM_____| |_______MA________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 58.38 59.15 58.86 58.847 0.22504567 + 10 59.71 60.79 60.135 60.179 0.29957192 Difference at 95.0% confidence 1.332 +/- 0.248939 2.2635% +/- 0.426506% (Student's t, pooled s = 0.264942) x baseline_sys.log + pmc_pcpu_sys.log +---------------------------------------------------------------------------+ | x x + +| |x x x x x + x+x + + + +x + +| | |____________M_____A_________________||______________A______M_________| | +---------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 1010.15 1073.31 1025.465 1031.524 18.135705 + 10 1038.7 1081.06 1070.555 1064.017 15.85404 Difference at 95.0% confidence 32.493 +/- 16.0042 3.15% +/- 1.5794% (Student's t, pooled s = 17.0331)