Page MenuHomeFreeBSD

[WIP] off cpu tracking
Needs ReviewPublic

Authored by mjg on Aug 9 2020, 4:47 AM.

Details

Reviewers
kib
markj
jeff
Summary

While there are probes which can be used to track all on/off cpu activity, there are 2 problems with them:

  • probing on everything is quite expensive
  • there is no general way to distinguish blocking because of contention/waiting on something in kernel vs waiting on user (poll, select, waitpid etc.)

The patch below remedies the problem, although I don't have strong opinion on any names used.

The probes can be used with a script like this:

#pragma D option dynvarsize=32m

/*
 * The p_flag & 0x4 test filters out kernel threads.
 */

kernel:profile:*:sleep
/(curthread->td_proc->p_flag & 0x4) == 0/
{

        self->ts = timestamp;
        self->mesg = arg1;
}

kernel:profile:*:wakeup
/self->ts/
{
        @[stack(30), stringof(self->mesg)] = sum(timestamp - self->ts);
        self->ts = 0;
        self->mesg = 0;
}

dtrace:::END
{
        printa("%k\n%s\n%@d\n\n", @);
}
  1. rm offcpu ; dtrace -s kprof-offcpu.d -o offcpu
  2. cat offcpu | perl ~/FlameGraph/stackcollapse.pl | perl ~/FlameGraph/flamegraph.pl --color=io > /tmp/out-off.svg

https://people.freebsd.org/~mjg/fg/flix1-buildkernel-offcpu.svg
https://people.freebsd.org/~mjg/fg/flix1-buildworld-offcpu.svg

extra: -j 40 kernel over nfs:
https://people.freebsd.org/~mjg/fg/lynx4-builldkernel-nfs-offcpu.svg

If you hover over labels you will see "samples" count, which in this case is total time in nanoseconds. So for buildworld case that's 153 seconds total spent off cpu due to vm object. Note that a lot of it overlapped, so there is no immediate way to say how much real time it caused. Nonetheless, I find this very useful.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 32884

Event Timeline

mjg created this revision.Aug 9 2020, 4:47 AM
mjg requested review of this revision.Aug 9 2020, 4:47 AM
mjg edited the summary of this revision. (Show Details)Aug 9 2020, 5:18 AM
mjg edited the summary of this revision. (Show Details)Aug 9 2020, 5:22 AM
kib added a comment.Aug 9 2020, 10:00 PM

Arguably user-sleeps are cv/mwait with PCATCH.

mjg added a comment.Aug 9 2020, 10:52 PM

This does not catch rfork: rfppwait is doing cv_timedwait(&p2->p_pwait, &p2->p_mtx, hz); , similarly there are spots in umtx and others

kib added a comment.Aug 9 2020, 11:38 PM
In D26011#577013, @mjg wrote:

This does not catch rfork: rfppwait is doing cv_timedwait(&p2->p_pwait, &p2->p_mtx, hz); , similarly there are spots in umtx and others

For vfork() I agree, you would need some kind of cv_timedwait_user() interface indeed.

But I do not see a place in kern_umtx.c where we would use non-interruptible sleep for dropping off cpu which can be blamed on user. Busying the keychain is arguably the in-kernel contention.

mjg added a comment.Aug 10 2020, 5:32 AM

So I had a more look around, I think PCATCH is too big of a hammer for this one. For example vn_start_write comes with PCATCH most of the time, but arguably it should be accounted. With my approach there is more work to do in the long run as newly discovered user problems have to be annotated, but nothing will be missed.

kib added a comment.Aug 10 2020, 9:52 AM

vn_start_write() is interruptible. Initially, I think, Tor made it interruptible to allow users to get out of numerous problems that the suspension implementation had. Since then most of the issues were fixed, and I think that the removal of PCATCH there is long overdue.

On the other hand, it could be argued that VFS syscalls become interruptible only on very exceptional condition, where _user_ initiated suspension, so there is no significant non-compliance. Also, since it is user who starts snapshots or unmounts, accounting that sleep time as requested by user is fair.

So I still very much dislike PUSERW.

markj added inline comments.Aug 10 2020, 5:55 PM
sys/kern/subr_sleepqueue.c
184

Why not put these under the sched provider? I think the proposed names are confusing. "kernel" is too generic, and we already have a "profile" provider.