Page MenuHomeFreeBSD

use counter(9) in vmmeter
ClosedPublic

Authored by glebius on Mar 28 2017, 12:24 AM.

Details

Summary

Using Benno's 4-year old patch[1] as foundation, do the following:

  • Remove 'struct vmmeter' from 'struct pcpu'.
  • Add early counter mech into pcpu, to allow to create a non-faulting counter that may be updated at early boot time. The early counts would be lost, however.
  • In the global vm_cnt, make all counters that were used in pcpu into counter(9). Use early mech to avoid early panics.
  • Don't include vmmeter.h into pcpu.h and fix all files affected.
  • Removes some entries from assym that weren't being used.

This has been tested on amd64 in bhyve and sparc64.
I need your help guys to review and test the following arches: powerpc,
mips, arm and sparc64.

[1] https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius created this revision.Mar 28 2017, 12:24 AM
glebius retitled this revision from Using Benno's 4-year old patch[1] as foundation, do the following: to use counter(9) in vmmeter.Mar 28 2017, 12:26 AM
glebius edited the summary of this revision. (Show Details)
glebius added reviewers: lidl, marius, andrew, kan, adrian, kib, benno.
gallatin accepted this revision.Mar 28 2017, 9:34 PM
This revision is now accepted and ready to land.Mar 28 2017, 9:34 PM
kib added inline comments.Mar 29 2017, 10:37 AM
sys/sparc64/sparc64/exception.S
2481 ↗(On Diff #26712)

Where is a replacement for this increment ?

glebius added inline comments.Mar 29 2017, 8:41 PM
sys/sparc64/sparc64/exception.S
2481 ↗(On Diff #26712)

sparc64 is the only arch that does this increment in assembly. I asked Kurt Lidl to look at that. This should be done in C, like on all other arches.

lidl added inline comments.Mar 29 2017, 10:07 PM
sys/sparc64/sparc64/exception.S
2481 ↗(On Diff #26712)

I had success in building r316099 + this patch, and
rebooting with the updated kernel and new /usr/bin/vmstat.

Currently the host is busy doing a fresh buildworld/buildkernel with an empty /usr/obj as a stress test.

By my reading of the changes, the increments that used to happen in the assembly for tl0_trap and tl1_trap, are now performed via the single call to VM_CNT_INC(v_trap) in the trap.c file.

lidl accepted this revision.Mar 30 2017, 3:56 AM

Survived a buildworld/buildkernel/installkernel/installworld cycle, running on the new kernel (on my V240).

'vmstat -i 1' give reasonable looking stats while the machine is idling, and running hard.

kib edited edge metadata.EditedMar 30 2017, 4:15 AM
In D10156#210714, @lidl wrote:

'vmstat -i 1' give reasonable looking stats while the machine is idling, and running hard.

I am not sure why you reference vmstat there. The patch most likely broke the vm.stats.sys.v_intr accounting, not the interrupt counters. trap() handles exceptions, while tl{0,1}_intr handle interrupts (registered by intr_setup() and called through intr_handlers[] pointers).

Also, proper testing of this change would ensure that all values seen in vm.stats sysctl output are reasonable and close to the values seen for the similarly timed run of the unpatched kernel.

Kostik is right. I lost counter here. The problem is that sparc64 doesn't have a C part of interrupt handler. It even hardcodes into assembly offset of ih_handler in the structure struct intr_handler. I could resurrect counter in assembly, but it seems to me correct way is to create C function, called from assembly, just like other arches do. Can anyone provide help with sparc64 assembly?

kib added a comment.EditedMar 31 2017, 9:50 AM

Kostik is right. I lost counter here. The problem is that sparc64 doesn't have a C part of interrupt handler. It even hardcodes into assembly offset of ih_handler in the structure struct intr_handler. I could resurrect counter in assembly, but it seems to me correct way is to create C function, called from assembly, just like other arches do. Can anyone provide help with sparc64 assembly?

I suggest slightly different approach there. Instead of providing a C wrapper around calls of the functions from intr_handler, create a C function just to increment the counter, say

void counter_intr_inc(void)
{

    counter_u64_add_protected(&counter, 1); /* Except that no CRITICAL_ASSERT should be asserted */
}

Then replace the assembler fragments which you removed, with something like

call counter_intr_inc
  nop

I forgot sparc assembler for real many years ago, so this might require some adjustments, but AFAIR the ABI is simple enough for void ()(void) calls.

glebius updated this revision to Diff 26888.Mar 31 2017, 5:09 PM
glebius edited edge metadata.

Provide C code in sparc64 to update v_intr.

This revision now requires review to proceed.Mar 31 2017, 5:09 PM

Kurt, can you please review/test the updated patch? When testing, just run 'vmstat 1', and look at 'in' column.

lidl added a comment.Apr 1 2017, 4:39 AM

Well, I reverted the last patch, and then applied the second patch to the /usr/src
tree and rebuilt (via a dependency build) everything.

First thing I noticed is that 'top' doesn't work anymore:

root@spork-137: top
top: sysctl(vm.stats.vm.v_swappgsin...) failed: Cannot allocate memory

Looking 'vmstat 1' output, in particular the 'in' column, the number of interrupts appears to decrease
when the machine is busy.

Here's a sample that starts when the machine (a 2 processor V240) is idle:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
0 0 0 355M 7.5G 0 0 0 0 123 10 0 0 2359 140 1168 0 4 96
0 0 0 355M 7.5G 0 0 0 0 0 10 0 0 2265 125 81 0 1 99

Then I start a 'make -j3' in /usr/src/bin/csh, and the vmstat out changes:

1 0 0 361M 7.5G 1818 0 0 0 2026 10 0 0 2097 2221 312 7 9 83
1 0 0 363M 7.5G 361 0 0 0 78 10 0 0 1623 4069 92 28 23 50
1 0 0 407M 7.5G 4342 0 0 0 2938 11 0 0 1519 5317 464 33 24 42
6 0 0 482M 7.5G 4695 0 1 0 3788 13 0 0 1549 3754 1797 43 28 30
4 0 0 524M 7.5G 5393 0 0 0 1692 16 0 0 375 3738 275 84 16 0
4 0 0 537M 7.5G 1534 0 0 0 0 22 0 0 79 530 178 96 4 0
4 0 0 541M 7.5G 891 0 0 0 0 25 0 0 31 161 142 98 2 0
4 0 0 541M 7.5G 2195 0 0 0 2356 26 0 0 160 1345 224 95 4 0
4 0 0 547M 7.5G 999 0 0 0 369 26 0 0 301 229 1356 85 15 0
5 0 0 500M 7.5G 3660 0 0 0 7570 28 0 0 274 2482 362 88 12 0
4 0 0 543M 7.5G 2794 0 0 0 0 22 0 0 182 1610 218 93 7 0
4 0 0 535M 7.5G 3983 0 0 0 4970 51 0 0 257 2319 316 87 13 0
4 0 0 545M 7.5G 1165 0 0 0 0 27 0 0 65 573 155 97 3 0
5 0 0 522M 7.5G 3292 0 0 0 6616 22 0 0 561 1833 1460 81 19 0
4 0 0 540M 7.5G 1751 0 0 0 0 25 0 0 124 1057 170 95 5 0
5 0 0 522M 7.5G 2903 0 0 0 4691 22 0 0 223 1650 286 92 8 0
4 0 0 544M 7.5G 2230 0 0 0 0 26 0 0 146 1215 187 94 6 0
4 0 0 541M 7.5G 2439 0 0 0 2526 26 0 0 174 1392 206 93 7 0
4 0 0 541M 7.5G 2238 0 0 0 2577 26 0 0 432 1424 1637 81 19 0
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
4 0 0 523M 7.5G 2850 0 0 0 4107 24 0 0 211 1678 248 92 8 0
4 0 0 553M 7.5G 2190 0 0 0 0 28 0 0 138 1145 174 95 5 0
4 0 0 553M 7.5G 2617 0 0 0 2304 28 0 0 162 1494 283 93 7 0
4 0 0 541M 7.5G 2438 0 0 0 3862 26 0 0 154 1374 186 94 6 0
4 0 0 513M 7.5G 908 0 0 0 2232 24 0 0 295 288 1519 87 13 0
4 0 0 518M 7.5G 4492 0 0 0 6883 19 0 0 291 2049 364 88 12 0
4 0 0 483M 7.5G 4621 0 0 0 5719 19 0 0 433 3928 344 79 21 0
4 0 0 497M 7.5G 3087 0 0 0 1819 22 0 0 195 2050 225 91 9 0
4 0 0 539M 7.5G 2360 0 0 0 0 25 0 0 158 1345 204 93 7 0
4 0 0 532M 7.5G 2019 0 0 0 3430 23 0 0 407 744 1754 86 14 0
4 0 0 531M 7.5G 4707 0 0 0 4790 23 0 0 371 3363 312 83 17 0
4 0 0 539M 7.5G 1587 0 0 0 0 26 0 0 60 207 167 97 3 0
4 0 0 545M 7.5G 2458 0 0 0 2092 27 0 0 155 1413 216 95 5 0
4 0 0 547M 7.5G 2509 0 0 0 2837 27 0 0 172 1431 217 93 7 0
5 0 0 543M 7.5G 1614 0 0 0 2470 25 0 0 363 808 1603 86 14 0
4 0 0 555M 7.4G 1898 0 0 0 0 29 0 0 130 780 266 93 7 0
4 0 0 555M 7.5G 2534 0 0 0 2707 28 0 0 169 1416 208 93 7 0
4 0 0 544M 7.5G 2212 0 0 0 4118 25 0 0 136 869 246 96 4 0
5 0 0 531M 7.5G 1693 0 0 0 2607 24 0 0 143 1063 209 95 5 0
4 0 0 549M 7.5G 2193 0 0 0 286 27 0 0 356 1102 1354 83 17 0
4 0 0 524M 7.5G 2890 0 0 0 4360 25 0 0 330 1646 1393 86 14 0
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
4 0 0 555M 7.5G 4005 0 0 0 1752 29 0 0 275 2430 260 88 12 0
4 0 0 537M 7.5G 5348 0 0 0 7395 24 0 0 387 3621 362 86 14 0
4 0 0 525M 7.5G 3081 0 0 0 3418 27 0 0 215 1703 272 90 10 0
4 0 0 508M 7.5G 4534 0 0 0 7187 29 0 0 608 2836 1449 72 28 0
4 0 0 534M 7.5G 2345 0 0 0 258 24 0 0 149 1404 333 94 6 0
4 0 0 530M 7.5G 4057 0 0 0 4259 24 0 0 297 2658 292 85 15 0
4 0 0 528M 7.5G 3705 0 0 0 4233 23 0 0 277 2518 270 90 10 0
3 0 0 497M 7.5G 3289 0 0 0 4763 21 0 0 299 2773 337 86 14 0
0 0 0 364M 7.5G 586 0 0 0 7680 11 0 0 2062 400 1747 16 13 71
0 0 0 364M 7.5G 0 0 0 0 16 11 0 0 2339 123 319 0 1 99
0 0 0 364M 7.5G 1245 0 0 0 1408 11 0 0 2261 2056 311 2 10 88

As you can see, the 'in' column drops from ~2300 to around 200 when the
compiler is running. This is, at least to me, counter-intuitive. I would have
expected the interrupt count to increase.

If I copy a file onto a remote machine, I do see an increase in interrupts in
the 'vmstat 1' output:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
0 0 0 355M 7.5G 0 0 0 0 0 10 0 0 2265 139 82 0 0 100
0 0 0 355M 7.5G 0 0 0 0 0 10 0 0 2263 130 88 0 0 100
0 0 0 390M 7.5G 0 0 0 0 0 11 0 0 2267 206 104 0 0 100
1 0 0 396M 7.5G 791 0 0 0 99 11 0 0 5970 8340 11152 26 51 23
0 0 0 355M 7.5G 67 0 0 0 1193 12 0 0 4328 5392 6309 14 24 63
0 0 0 355M 7.5G 0 0 0 0 0 10 0 0 2265 123 82 0 0 100
0 0 0 355M 7.5G 0 0 0 0 0 10 0 0 2259 123 80 0 0 100

The interrupts jump from ~2300 to > 5000 during the time the scp command
is running.

marius edited edge metadata.Apr 2 2017, 5:49 PM
In D10156#211135, @kib wrote:

Kostik is right. I lost counter here. The problem is that sparc64 doesn't have a C part of interrupt handler. It even hardcodes into assembly offset of ih_handler in the structure struct intr_handler. I could resurrect counter in assembly, but it seems to me correct way is to create C function, called from assembly, just like other arches do. Can anyone provide help with sparc64 assembly?

I suggest slightly different approach there. Instead of providing a C wrapper around calls of the functions from intr_handler, create a C function just to increment the counter, say

void counter_intr_inc(void)
{
    counter_u64_add_protected(&counter, 1); /* Except that no CRITICAL_ASSERT should be asserted */
}

Then replace the assembler fragments which you removed, with something like

call counter_intr_inc
  nop

I forgot sparc assembler for real many years ago, so this might require some adjustments, but AFAIR the ABI is simple enough for void ()(void) calls.

Technically, the above is correct but needing to call into C for that is fugly.
I'm more concerned about the EARLY_COUNTER hack, though. It's not exactly unsurprising that spar64 boots up with that.

sys/sparc64/sparc64/exception.S
2479 ↗(On Diff #26888)

This is missing a " nop" in the branch delay slot.

2992 ↗(On Diff #26888)

You should move the "stx %l5, [%l4]" from above into the branch delay slot of the added call.

glebius edited the summary of this revision. (Show Details)Apr 4 2017, 5:09 PM
glebius updated this revision to Diff 27042.Apr 4 2017, 5:10 PM
  • Improve sparc64 assembly per Marius suggestions.
  • Fix top(1).
glebius updated this revision to Diff 27044.Apr 4 2017, 5:28 PM
  • Fix rpc.rstatd.

Marius, I remember that pcpu structs aren't an array on sparc64, unlike all other archs. Is that still the case? Can you suggest better idea for EARLY_COUNTER for sparc64?

Marius, what if we just stash some amount of bss memory for that?

static struct pcpu dummy_pcpu[MAXCPU];
#define EARLY_COUNTER &dummy_pcpu[0].pc_early_dummy_counter

Ugly resource overutilization of course :(

marius added a comment.Apr 5 2017, 9:39 PM

Marius, what if we just stash some amount of bss memory for that?
static struct pcpu dummy_pcpu[MAXCPU];
#define EARLY_COUNTER &dummy_pcpu[0].pc_early_dummy_counter
Ugly resource overutilization of course :(

Correct, per-CPU data still isn't managed as an array for sparc64. In one of my
trees I've changed that but could not convince myself to committing that so far
as it just wastes memory at mediocre gains.
However, the real problem with your previous EARLY_COUNTER was that in the
sparc64 per-CPU data, struct pcpu is preceded by the initial/panic stack but you
were casting the entire MD per-CPU data to a struct pcpu, i. e. including that stack
data. Now the above avoids that, too, but just is an insane approach.
The initial/panic stacks could be separated from the struct pcpu data of course along
with moving to a per-CPU data array (I've only implemented the latter so far, though),
but neither of these choices in the current implementation are without reason. Also,
needing to bend existing MD code around a new supposedly MI design that much often
means that this design is flawed and e. g. doesn't take all MD requirements into account.
Why is EARLY_COUNTER now necessary in the first place? Your summary says "The early
counts would be lost, however". Isn't there a simpler way to drain these early counts
into some write-only location or get rid of them some other way than to make that
dummy part of the struct pcpu?

We need early counter for subsystems like VM, which start to run before UMA, before a counter can be allocated. The early counter should be a pointer that can be treated as count_u64_t, so that you can use counter_u64_add() on it from any CPU without overwriting any useful memory. For all architectures a fields with static struct pcpu works perfectly for that.

So, can you suggest anything better than static struct pcpu dummy_pcpu[MAXCPU]; for that purpose?

glebius updated this revision to Diff 27386.Apr 12 2017, 7:37 PM

Use dummy pcpu array for sparc64.

Guys, I consider the current version of the patch as final. Any objections on committing it?

jhibbits resigned from this revision.May 30 2017, 8:26 PM
glebius abandoned this revision.Jun 8 2017, 9:28 PM
This revision was automatically updated to reflect the committed changes.