The routine is called on every successful read and write, which on pipes happens a lot and with small sizes.
The precision provided by default is far greater than necessary, and it causes problems in VMs on amd64 (it issues rdtscp, which triggers a vmexit). getnanotime seems to provide precision roughly in line with Linux, so we should be good here.
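A minimal sketch of the change, assuming a pipe timestamping helper along these lines (the wrapper function name is hypothetical; vfs_timestamp() and getnanotime() are the real FreeBSD kernel interfaces):

```c
/*
 * Sketch only, not the actual patch. vfs_timestamp() (sys/vnode.h)
 * and getnanotime() (sys/time.h) are real FreeBSD KPIs; the wrapper
 * below is illustrative.
 */
#include <sys/param.h>
#include <sys/time.h>
#include <sys/vnode.h>

static void
pipe_timestamp(struct timespec *tsp)
{
	/*
	 * Before: vfs_timestamp() honors vfs.timestamp_precision and,
	 * at the default setting, ends up reading the hardware counter
	 * (rdtscp), which vmexits on every call inside a VM.
	 *
	 * vfs_timestamp(tsp);
	 *
	 * After: getnanotime() returns the cached timecounter value,
	 * avoiding the TSC read at some cost in precision.
	 */
	getnanotime(tsp);
}
```

With this, pipe read/write no longer pays the timecounter read per operation; lowering vfs.timestamp_precision, as done in the transcript below, has the same effect for the unpatched vfs_timestamp() path.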
Sample result from a VM running pipe1_processes on VirtualBox + Haswell:
[23:09] test:~/will-it-scale (130) # ./pipe1_processes -t 1 -s 5
testcase:pipe read/write
warmup
min:416236 max:416236 total:416236
min:427691 max:427691 total:427691
min:413516 max:413516 total:413516
min:426149 max:426149 total:426149
min:422719 max:422719 total:422719
min:433463 max:433463 total:433463
measurement
min:423289 max:423289 total:423289
min:434897 max:434897 total:434897
min:424951 max:424951 total:424951
min:429241 max:429241 total:429241
min:419946 max:419946 total:419946
average:426464
./pipe1_processes -t 1 -s 5 0.00s user 11.54s system 99% cpu 11.536 total
[23:09] test:~/will-it-scale (130) # sysctl vfs.timestamp_precision=1
vfs.timestamp_precision: 2 -> 1
[23:09] test:~/will-it-scale # ./pipe1_processes -t 1 -s 5
testcase:pipe read/write
warmup
min:3341419 max:3341419 total:3341419
min:3301849 max:3301849 total:3301849
min:3227739 max:3227739 total:3227739
min:3215604 max:3215604 total:3215604
min:3442148 max:3442148 total:3442148
min:3268131 max:3268131 total:3268131
measurement
min:3219759 max:3219759 total:3219759
min:3211822 max:3211822 total:3211822
min:3276166 max:3276166 total:3276166
min:3291221 max:3291221 total:3291221
min:3238138 max:3238138 total:3238138
average:3247421
./pipe1_processes -t 1 -s 5 1.19s user 10.11s system 100% cpu 11.299 total
That is 761% of the baseline (about a 7.6x speedup). KVM + Cascade Lake saw a similar win.