On non-trivial SMP systems the contention on the pmc_owner mutex leads to a substantial number of samples captured being from the pmc process itself. This change a) makes buffers larger to avoid contention on the global list b) makes the working sample buffer per cpu.
will-it-scale/page_fault1_processes -t 96 -s 30 >& /dev/null &
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O pf1.pmcstat sleep 10
pmcstat -R pf1.pmcstat -z100 -G pf1.stacks
before:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1_orig.svg
after:
https://github.com/mattmacy/profiling/blob/master/2018.04.22/pf1.svg
without pmcstat running:
mmacy@anarchy [~/devel/freebsd|0:09|32] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 2508.41s user 1047.14s system 6000% cpu 59.259 total
with pmcstat running before:
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O /dev/null sleep 600 &
[1] 8918
mmacy@anarchy [~/devel/freebsd|0:29|68] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total
with pmcstat running after:
pmcstat -S UNHALTED_CORE_CYCLES -n 21000000 -O /dev/null sleep 180 &
mmacy@anarchy [~/devel/freebsd|0:14|48] time make -j96 buildkernel -s >& /dev/null
make -j96 buildkernel -s >&/dev/null 2688.13s user 1547.66s system 6053% cpu 1:09.97 total
fixing the plethora of false sharing in pmc drops it a bit further:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total
It turns out that the pmcstat mechanism for specifying rate is a bit brain damaged, it has to be specified _before_ the counter.
So the command line when testing sampling overhead is now:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &
This will sample at most 2000/s with a clock of 2.1Ghz.
baseline: ```make -j96 buildkernel``` without pmc running
pmc_head: ```make -j96 buildkernel``` with pmc running on HEAD
pmc_pcpu: ```make -j96 buildkernel``` with pmc running using the attached patch
```
Real time before and after:
x pmc_head_real.log
+ pmc_pcpu_real.log
+---------------------------------------------------------------------------+
|+ |
|+ |
|+ |
|+ |
|++ |
|++ x |
|++ xx x xxxx x x|
|MA |___________M___A________________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 76.4 127.62 84.845 88.577 15.100031
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
-28.398 +/- 10.0344
-32.0602% +/- 7.69825%
(Student's t, pooled s = 10.6794)
system time before and after:
x pmc_head_sys.log
+ pmc_pcpu_sys.log
+---------------------------------------------------------------------------+
|+ |
|+ |
|+ |
|+ |
|+ |
|+ |
|++ x x |
|++ xx x x xx x x|
|A| |____________M____A________________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 2277.96 6948.53 2949.47 3341.492 1385.2677
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
-2277.47 +/- 920.425
-68.1574% +/- 8.77623%
(Student's t, pooled s = 979.596)
Overhead of pmc on head:
x baseline_real.log
+ pmc_head_real.log
+---------------------------------------------------------------------------+
|xx + |
|xx + |
|xx ++ + +++ + +|
|AM |___________M___A_______________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 76.4 127.62 84.845 88.577 15.100031
Difference at 95.0% confidence
29.73 +/- 10.0335
50.5208% +/- 17.0525%
(Student's t, pooled s = 10.6785)
x baseline_sys.log
+ pmc_head_sys.log
+---------------------------------------------------------------------------+
|x + + |
|xx ++ + + ++ + +|
|A |___________M____A________________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 2277.96 6948.53 2949.47 3341.492 1385.2677
Difference at 95.0% confidence
2309.97 +/- 920.443
223.937% +/- 89.3039%
(Student's t, pooled s = 979.616)
Overhead of updated pmc:
x baseline_real.log
+ pmc_pcpu_real.log
+---------------------------------------------------------------------------+
| x x + |
|x x x xx x x x + + ++++ + + +|
| |______AM_____| |_______MA________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
1.332 +/- 0.248939
2.2635% +/- 0.426506%
(Student's t, pooled s = 0.264942)
x baseline_sys.log
+ pmc_pcpu_sys.log
+---------------------------------------------------------------------------+
| x x + +|
|x x x x x + x+x + + + +x + +|
| |____________M_____A_________________||______________A______M_________| |
+---------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
32.493 +/- 16.0042
3.15% +/- 1.5794%
(Student's t, pooled s = 17.0331)
```