I made a utility that shows all affected/related values in one table.
https://firk.cantconnect.ru/projects/jtop/jtop.tgz
Command line (run it on a large terminal because the table is wide; I'm using about 150x40):
./jtop +jail +proc +loop +esc +ext sort=rpct -kern
First table - jail list (on a test machine there will most likely be only one jail)
SUMCPU% - CPU% usage summed manually from the kinfo_proc->ki_pctcpu values of the jail's processes
ELAPSED, CPU% - elapsed CPU seconds and CPU% usage reported by the new accounting system (-ext turns this off)
RA.CPU, RA.PCPU - elapsed CPU seconds and CPU% usage reported by the existing rctl_get_racct()
Second table - process list (there won't be many on a test system, but it is limited to 20 anyway)
OLDCPU% - CPU% usage from sched_pctcpu() (was in kinfo_proc->ki_pctcpu, now moved to ki_sparelong[1] by a hack)
KICPU% - CPU% usage reported by the new accounting system in kinfo_proc->ki_pctcpu
RCPU% - CPU% usage computed manually by taking the difference between the previous and current elapsed time and dividing by the loop interval
ELAPSED - elapsed CPU seconds from kinfo_proc->ki_runtime
On an unpatched system, the command line should be
./jtop +jail +proc +loop +esc -ext sort=rpct -kern
ELAPSED and CPU% in the jail list will be zero
OLDCPU% in the process list will be zero
KICPU% in the process list will come from sched_pctcpu()
My tests:
---------
What I saw on a single-core VM shortly after a fresh 'make buildworld':
JID NAME SUMCPU% ELAPSED CPU%   RA.CPU RA.PCPU HOSTNAME IP4ADDRS  PATH
1   "1"  1.17%   32.129  96.76% 56     11090%  x        127.0.0.1 "/"
So, SUMCPU% is low because most of the CPU-time-consuming processes were short-lived and are no longer visible here
The ELAPSED and CPU% fields look like true data
RA.CPU is 2x more than real for some reason
RA.PCPU is 11090%, which is nonsense for a single-core system
---------
What I saw after 20 seconds of 'dd if=/dev/zero of=/dev/null':
JID NAME SUMCPU% ELAPSED CPU%   RA.CPU RA.PCPU HOSTNAME IP4ADDRS  PATH
2   "2"  99.75%  20.251  99.78% 19     83%     x        127.0.0.1 "/"

FL PID   PPID  PGID  TPGID SID   TSID  JID OLDCPU% KICPU% RCPU%   ELAPSED COMMAND
JM 40745 40200 40745 40745 39615 39615 2   88.96%  99.75%  101.17% 20.219 "dd" 0
Here everything is nearly fine, but the sched_pctcpu()-sourced OLDCPU% for the process and RA.PCPU for the jail are still wrong.
After 10 seconds (i.e. 10 seconds before this snapshot) it was worse.
---------
Test program (a "network daemon" waiting for external events and doing some idle work):
```
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <sys/types.h>
#include <sys/select.h>

int main(int argc, char **argv) {
	fd_set fds;
	struct timeval tv;
	int to, sl, j, k;

	to = (argc >= 2) ? atoi(argv[1]) : 1000;	/* select() timeout, microseconds */
	sl = (argc >= 3) ? atoi(argv[2]) : 0;		/* busy-loop iterations per wakeup */
	k = 0;
	for (;;) {
		/* wait for "external events" on stdin, with a short timeout */
		FD_ZERO(&fds);
		FD_SET(0, &fds);
		tv.tv_sec = 0;
		tv.tv_usec = to;
		select(1, &fds, NULL, NULL, &tv);
		/* simulate some idle work */
		for (j = 0; j < sl; j++)
			k = (int)sqrt(12345 + j + k);
	}
	return k;
}
```
After about 1 minute of './test-select 1000 0' (kern.hz switched from the VM default of 100 to 1000):
JID NAME SUMCPU% ELAPSED CPU%  RA.CPU RA.PCPU HOSTNAME IP4ADDRS  PATH
1   "1"  2.49%   2.017   2.50% 1      1%      x        127.0.0.1 "/"

FL PID   PPID  PGID  TPGID SID   TSID  JID OLDCPU% KICPU% RCPU% ELAPSED COMMAND
JM 95062 95017 95062 95017 93900 93900 1   1.46%   2.49%  2.50% 1.974   "test-select" 0
sched_pctcpu() reports only about half of the real CPU usage
(on another system it was 0.34% real and a flat 0.00% from sched_pctcpu(), but I can't reproduce that on the test VM)