The runtime value is multiplied by 1000000, but it is already in microseconds, resulting in a very large estimate.
Removing this extra factor of 1000000 should fix bug #235556.
Note that this estimate is only used for short-lived processes.
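For context, the expression being changed looks roughly like this (a userspace paraphrase; the identifiers are mine and the in-tree code may differ in detail):

#include <stdint.h>

/*
 * Sketch of the %CPU estimate computed at process exit.
 * runtime_us: the process's accumulated CPU time, in microseconds.
 * wall_us: wall-clock time since the process was created, in microseconds.
 */
static uint64_t
pcpu_exit_estimate(uint64_t runtime_us, uint64_t wall_us)
{

	if (wall_us == 0)
		return (0);
	return (runtime_us * 100 * 1000000 / wall_us);
}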
Event Timeline
The patch works like a charm! It fixes the case I described here (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235556#c2); the stats are now correct: https://www.bsdstore.ru/trash/racct.png
Tested on: FreeBSD 14.0-CURRENT #0 main-n247127-1976e079544-dirty
I would suggest noting in the description/commit log message that this estimate is used only for short-lived processes.
sys/kern/kern_racct.c:328
Style: missing parens around the return value.
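For reference, style(9) wants return values parenthesized, e.g.:

static int
f(int error)
{
	return (error);	/* not: return error; */
}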
I did some more testing, and now I think the change is wrong. In particular, note that the RACCT_PCPU resource has the RACCT_IN_MILLIONS attribute, which explains why the multiplication is there.
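That is, the stored RACCT_PCPU value is a percentage scaled by one million, so the factor the patch removes is doing scaling, not unit conversion. A quick units check (the helper name is mine, for illustration):

#include <stdint.h>

/*
 * Assuming the exit-time expression has the shape
 *
 *	runtime [us] * 100 * 1000000 / wallclock [us]
 *
 * the microseconds cancel, leaving percent scaled by 1000000 -- exactly
 * what a RACCT_IN_MILLIONS resource stores.
 */
static uint64_t
pct_to_in_millions(uint64_t pct)
{
	/* 100% CPU is stored as 100 * 1000000. */
	return (pct * 1000000);
}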
My test was to run a buildkernel in a jail, passing the number of CPUs to make(1)'s -j parameter. Since the build should keep every CPU busy, I'd expect to see a usage of roughly 3200 (100% per CPU) in this case. Without the change, it is much larger than that; with the change, it is too small. By the way, you can collect stats for the host by running rctl -u jail:0, in case you didn't already know that, so you don't have to actually create a jail.
Reading the code some more, I see a (the?) problem in racct_proc_exit(). There are two places where the PCPU resource is updated: periodically (once per second) by racctd, and when a process exits. In the latter case, we divide the total runtime of the process by the wall-clock time elapsed since the process was created, and convert the result to a percentage.

Consider what happens when a compiler process is created, does its work, and exits. Suppose it takes 0.1s to compile a file. Compilers are CPU-bound, so the process will be on a CPU for almost all of its lifetime, corresponding to 100% CPU. Now suppose we run 10 compiler instances back-to-back: won't that result in a reported usage of 1,000% CPU, even though only one CPU was ever busy? In other words, the estimate we use for short-lived processes doesn't make sense when they're CPU-bound.
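To make the arithmetic concrete, here's a small userspace simulation, assuming the exit-time formula sketched in the summary and that each exiting process's estimate is simply added into the jail's usage:

#include <stdint.h>
#include <stdio.h>

/* Exit-time %CPU estimate, scaled by 1000000 (shape as sketched above). */
static uint64_t
pcpu_exit_estimate(uint64_t runtime_us, uint64_t wall_us)
{
	return (wall_us == 0 ? 0 : runtime_us * 100 * 1000000 / wall_us);
}

int
main(void)
{
	uint64_t usage = 0;
	int i;

	/*
	 * Ten CPU-bound compiles run back-to-back, each alive for 0.1s
	 * and on-CPU for essentially all of it.
	 */
	for (i = 0; i < 10; i++)
		usage += pcpu_exit_estimate(100000, 100000);
	printf("reported: %ju%%\n", (uintmax_t)(usage / 1000000));
	return (0);
}

This prints "reported: 1000%" even though only one CPU was ever busy.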
The problem is that %CPU, as it's currently computed, isn't additive over short time periods.