Page MenuHomeFreeBSD

Simplify vm_reserv_break
ClosedPublic

Authored by dougm on May 13 2019, 9:36 PM.

Details

Summary

Reduce the code size and number of ffsl calls in vm_reserv_break. Use xor to find where free ranges begin and end.

Tested by: pho

Test Plan

alc developed performance test framework for this patch, measuring the cycle count for a vm_reserv_break operation.

For each distribution of free block in a reservation, I took 10 samples with the current vm_reserv_break implementation and 10 samples from the implementation offered by this patch. The first column is from the unmodified implementation.

alternating 15 pages allocated, 15 pages free:
cycles/break: 28112: 27952
cycles/break: 28152: 27979
cycles/break: 28177: 28052
cycles/break: 28199: 28014
cycles/break: 28057: 27985
cycles/break: 28143: 27971
cycles/break: 28221: 27963
cycles/break: 28144: 28029
cycles/break: 28162: 27908
cycles/break: 28058: 27999

alternating 1 page allocated, 1 page free:
cycles/break: 61955: 61065
cycles/break: 62050: 61241
cycles/break: 62066: 61164
cycles/break: 61958: 61150
cycles/break: 62068: 61097
cycles/break: 61987: 61291
cycles/break: 61996: 61298
cycles/break: 62123: 61151
cycles/break: 62011: 61220
cycles/break: 62040: 61113

alternating 170 pages allocated, 170 pages free:
cycles/break: 4260: 4274
cycles/break: 4228: 4322
cycles/break: 4220: 4264
cycles/break: 4227: 4266
cycles/break: 4252: 4289
cycles/break: 4215: 4272
cycles/break: 4249: 4277
cycles/break: 4236: 4294
cycles/break: 4231: 4279
cycles/break: 4226: 4277

Pages 0 and 511 allocated, the rest free:
cycles/break: 4233: 4226
cycles/break: 4236: 4251
cycles/break: 4218: 4204
cycles/break: 4248: 4219
cycles/break: 4241: 4233
cycles/break: 4195: 4232
cycles/break: 4251: 4280
cycles/break: 4193: 4222
cycles/break: 4281: 4221
cycles/break: 4240: 4202

The more fragmented the pages in the reservation, the better the modified code seems to perform a bit better. With big ranges of freed or allocated pages, there's no clear benefit in cycle count.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Modify vm_reserv_reclaim_contig to also use ffsl less, and to use xor to find 0-1 transitions in the bitstream.

dougm retitled this revision from Simplify vm_reserv_break to Simplify vm_reserv_break, vm_reserv_reclaim_config.May 14 2019, 5:03 AM
dougm edited the summary of this revision. (Show Details)

Sure happy to. Will I get credit for doing so?

vm_reserv.c
1034 ↗(On Diff #57386)

Is this the right ordering of "int" and "u_long"?

In D20256#436594, @pho wrote:

Sure happy to. Will I get credit for doing so?

I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will.

I'll try to do better.

In D20256#436594, @pho wrote:

Sure happy to. Will I get credit for doing so?

I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will.

I'll try to do better.

:)
Just having "Tested by: pho" in future commit logs, if I made major contributions, is all I hope for.

With D20256.57387.diff I got:

20190514 07:51:36 all (1/2): su.sh
witness_lock_list_get: witness exhausted
panic: vm_page_dequeue: queued unlocked page 0xfffff81019657208
cpuid = 15
time = 1557813179
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc6ae610
vpanic() at vpanic+0x19d/frame 0xfffffe00dc6ae660
panic() at panic+0x43/frame 0xfffffe00dc6ae6c0
vm_page_dequeue() at vm_page_dequeue+0x2e2/frame 0xfffffe00dc6ae700
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x29b/frame 0xfffffe00dc6ae770
vm_page_alloc() at vm_page_alloc+0x74/frame 0xfffffe00dc6ae7d0
vm_fault_hold() at vm_fault_hold+0x12d1/frame 0xfffffe00dc6ae910
vm_fault() at vm_fault+0x60/frame 0xfffffe00dc6ae950
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00dc6ae9a0
trap() at trap+0x46b/frame 0xfffffe00dc6aeab0
calltrap() at calltrap+0x8/frame 0xfffffe00dc6aeab0
--- trap 0xc, rip = 0x80020d3f6, rsp = 0x7fffffffc650, rbp = 0x7fffffffc6b0 ---

https://people.freebsd.org/~pho/stress/log/dougm028.txt

dougm retitled this revision from Simplify vm_reserv_break, vm_reserv_reclaim_config to Simplify vm_reserv_break,.

Just stick to the easy part of the patch, which I don't expect to change, to get something under review. The harder half can be done separately.

vm_reserv.c
1054 ↗(On Diff #57397)

I can't see how this works for a fully populated reservation. Suppose NPOMAP is 1. On both iterations of the outer loop, changes will be 1. On the first iteration, lo == hi == 0 and we set lo = 0. On the second iteration we will set lo = 64 but won't free any pages before exiting the loop. What am I missing?

vm_reserv.c
1054 ↗(On Diff #57397)

By "fully populated", I assume that you mean that all bits of rv->popmap[i] are set for all i.

On the first iteration of the loop:
changes = rv->popmap[i]; /* changes = 0xffffffff */
changes ^= (changes << 1) | (lo == hi); /* changes == 0 */

so we do not set lo to 0, but it remains 0.
On the second iteration, we set lo to 64.

I agree that we won't free any pages, but there are no pages to be freed, and this matches the current behavior.

vm_reserv.c
1054 ↗(On Diff #57397)

Just in case "fully populated" means all zeroes, we'll set lo to 0 on the first loop, and hi to 64 on the second, before freeing all 64 pages.

Initialize hi and lo to -1 to avoid confusion if lo gets set 0 immediately.

Avoid pointlessly assigning lo = NBPOPMAP * NBPOPMAP in the last loop iteration.

dougm retitled this revision from Simplify vm_reserv_break, to Simplify vm_reserv_break.May 15 2019, 1:43 AM

I'm a bit low on test H/W, so I ran D20256.57402.diff on an i386 test host and got this:

20190515 03:16:01 all (113/620): sendfile17.sh


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0xdeadc0ee
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x1089740
stack pointer           = 0x28:0x21216a3c
frame pointer           = 0x28:0x21216a50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 65852 (sendfile17)
trap number             = 12
panic: page fault
cpuid = 3
time = 1557883251
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,2121684c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x21216820
kdb_backtrace(7,3,3,212169fc,212169fc,...) at kdb_backtrace+0x2e/frame 0x21216880
vpanic(15bbd75,212168c4,212168c4,212168f8,1558676,...) at vpanic+0x121/frame 0x212168a4
panic(15bbd75,1649719,2db5000,0,deadc0ee,...) at panic+0x14/frame 0x212168b8
trap_fatal(13029a5,0,4,16480b0,332,...) at trap_fatal+0x356/frame 0x212168f8
trap_pfault(deadc0ee) at trap_pfault+0x51/frame 0x21216934
trap(212169fc,8,28,28,24582d80,...) at trap+0x3c0/frame 0x212169f0
calltrap() at 0xffc0316d/frame 0x212169f0
--- trap 0xc, eip = 0x1089740, esp = 0x21216a3c, ebp = 0x21216a50 ---
unp_dispose(21995618) at unp_dispose+0x80/frame 0x21216a50
sofree(21995618) at sofree+0x293/frame 0x21216a7c
soclose(21995618) at soclose+0x2f7/frame 0x21216ab4
soo_close(189eddc8,99d8a80) at soo_close+0x1f/frame 0x21216ac0
_fdrop(189eddc8,99d8a80) at _fdrop+0x18/frame 0x21216ad8
closef(189eddc8,99d8a80,189eddc8,99d8a80,21cb56f0,...) at closef+0x1e0/frame 0x21216b2c
fdescfree_fds(1) at fdescfree_fds+0x98/frame 0x21216b54
fdescfree(99d8a80) at fdescfree+0x389/frame 0x21216bc4
exit1(99d8a80,0,0,21216cdc,1558d79,...) at exit1+0x47a/frame 0x21216bf4
sys_sys_exit(99d8a80,99d8d08) at sys_sys_exit+0x12/frame 0x21216c08
syscall(21216ce8,3b,3b,3b,ffbfe808,...) at syscall+0x2d9/frame 0x21216cdc

https://people.freebsd.org/~pho/stress/log/dougm029.txt
There's no crash dump for this one.
I'll try this test on a pristine HEAD.

I have not been able to reproduce the page fault with or without your patch.
I have resumed testing with Diff 57406.

vm_reserv.c
1054 ↗(On Diff #57397)

Sorry for the noise, I had an inverted sense of what this function is supposed to do.

I ran into another problem, so now I'm switching from i386 to amd64.

20190515 21:24:35 all (167/620): snap5.sh
panic: Memory modified after free 0x2879d800(2048) val=0 @ 0x2879d9b4

cpuid = 3
time = 1557949826
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,93ff92c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x93ff900
kdb_backtrace(7,3,3,94b05e0,2879d9b4,...) at kdb_backtrace+0x2e/frame 0x93ff960
vpanic(16a8223,93ff9a4,93ff9a4,93ff9bc,12e1444,...) at vpanic+0x121/frame 0x93ff984
panic(16a8223,2879d800,800,0,2879d9b4,...) at panic+0x14/frame 0x93ff998
trash_ctor(2879d800,800,93ffa48,2) at trash_ctor+0x44/frame 0x93ff9bc
mb_ctor_pack(2a70de00,100,93ffa48,2) at mb_ctor_pack+0x35/frame 0x93ff9e8
uma_zalloc_arg(99997e0,93ffa48,2) at uma_zalloc_arg+0xa0b/frame 0x93ffa34
m_getm2(0,21c,2,1,0) at m_getm2+0xf8/frame 0x93ffa74
m_uiotombuf(93ffbd8,2,8148,0,0) at m_uiotombuf+0x5a/frame 0x93ffaa4
sosend_generic(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend_generic+0x2d7/frame 0x93ffafc
sosend(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend+0x50/frame 0x93ffb2c
soo_write(21040700,93ffbd8,99a4500,0,188b2700) at soo_write+0x32/frame 0x93ffb5c
dofilewrite(21040700,93ffbd8,ffffffff,ffffffff,0) at dofilewrite+0x86/frame 0x93ffb90
kern_writev(188b2700,3,93ffbd8) at kern_writev+0x3b/frame 0x93ffbbc
sys_write(188b2700,188b2988) at sys_write+0x48/frame 0x93ffc08
syscall(93ffce8,3b,3b,3b,20bf7000,...) at syscall+0x2d9/frame 0x93ffcdc

Peter, is there any data on how many reservations get broken in these tests? Without breaking reservations, this code isn't tested.

OK.
vm.reserv.broken: 164537
Uptime is 12 hours on amd64

I ran tests for 18 hours, including a buildworld.
Last vm.reserv.broken value was 247091.
No problems seen.

This revision is now accepted and ready to land.May 16 2019, 1:30 PM

I expect to commit this tonight, unless other reviewer wish to weigh in before then.

In D20256#437068, @pho wrote:

I ran tests for 18 hours, including a buildworld.
Last vm.reserv.broken value was 247091.
No problems seen.

I'm somewhat surprised that the value is that large. Peter, if you run a buildworld on a freshly rebooted amd64 machine, what are the reported values for vm.pmap.pde and vm.reserv before and after the buildworld.

Here's a buildworld from a single user mode boot:

# /usr/bin/time -h ./zzbuildworld.sh 
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r347793: Thu May 16 19:17:57 CEST 2019     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
vm.pmap.pde.promotions: 302
vm.pmap.pde.p_failures: 111
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 27
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1,  40944K,     22
     1,      -1,   4588K,      3

vm.reserv.fullpop: 135
vm.reserv.freed: 650
vm.reserv.broken: 0
make -j25 buildworld
vm.pmap.pde.promotions: 98586
vm.pmap.pde.p_failures: 5940
vm.pmap.pde.mappings: 146743
vm.pmap.pde.demotions: 18127
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1,  90936K,     74
     1,      -1,  17112K,     11

vm.reserv.fullpop: 143
vm.reserv.freed: 1125509
vm.reserv.broken: 0
Elapsed 01:00
        1h20.91s real           17h14m11.64s user               1h15m2.16s sys
#

Given that the new version is shorter (144 bytes less machine code on amd64), the performance results are good enough. If you add comments explaining how this new version works, I'm happy to see it committed.

Add comments to better explain how vm_reserv_break works.

This revision now requires review to proceed.May 27 2019, 9:33 PM
vm_reserv.c
1044 ↗(On Diff #57971)

I suggest quoting "changes" to make it more clear that you're referring to a variable.

1057 ↗(On Diff #57971)

Style: the comment formatting is off:

/*
 * If the ...
This revision is now accepted and ready to land.May 27 2019, 9:53 PM
This revision was automatically updated to reflect the committed changes.