While there are probes which can be used to track all on/off cpu activity, there are 2 problems with them:
- probing on everything is quite expensive
- there is no general way to distinguish blocking because of contention/waiting on something in kernel vs waiting on user (poll, select, waitpid etc.)
The patch below remedies the problem, although I don't have strong opinion on any names used.
The probes can be used with a script like this:
#pragma D option dynvarsize=32m /* * The p_flag & 0x4 test filters out kernel threads. */ sched:::ksleep /(curthread->td_proc->p_flag & 0x4) == 0/ { self->ts = timestamp; self->mesg = arg1; } sched:::kwakeup /self->ts/ { @[stack(30), stringof(self->mesg)] = sum(timestamp - self->ts); self->ts = 0; self->mesg = 0; } dtrace:::END { printa("%k\n%s\n%@d\n\n", @); }
- rm offcpu ; dtrace -s kprof-offcpu.d -o offcpu
- cat offcpu | perl ~/FlameGraph/stackcollapse.pl | perl ~/FlameGraph/flamegraph.pl --color=io > /tmp/out-off.svg
https://people.freebsd.org/~mjg/fg/flix1-buildkernel-offcpu.svg
https://people.freebsd.org/~mjg/fg/flix1-buildworld-offcpu.svg
extra: -j 40 kernel over nfs:
https://people.freebsd.org/~mjg/fg/lynx4-builldkernel-nfs-offcpu.svg
If you hover over labels you will see "samples" count, which in this case is total time in nanoseconds. So for buildworld case that's 153 seconds total spent off cpu due to vm object. Note that a lot of it overlapped, so there is no immediate way to say how much real time it caused. Nonetheless, I find this very useful.