Page MenuHomeFreeBSD

Eliminate adj_free field from vm_map_entry
ClosedPublic

Authored by dougm on Nov 1 2018, 3:04 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 20, 4:36 AM
Unknown Object (File)
Tue, Nov 19, 6:58 PM
Unknown Object (File)
Fri, Nov 15, 11:06 PM
Unknown Object (File)
Thu, Nov 14, 9:21 PM
Unknown Object (File)
Wed, Nov 13, 9:01 AM
Unknown Object (File)
Mon, Nov 11, 10:51 AM
Unknown Object (File)
Fri, Nov 8, 9:18 PM
Unknown Object (File)
Thu, Nov 7, 4:33 PM
Subscribers

Details

Summary

Drop the adj_free field from vm_map_entry_t. Refine the max_free field so that p->max_free is the size of the largest gap with one endpoint in the subtree rooted at p. Change vm_map_findspace so that, first, the address-based splay is restricted to tree nodes with large-enough max_free value, to avoid searching for the right starting point in a subtree where all the gaps are too small. Second, when the address search leads to a tree search for the first large-enough gap, that gap is the subject of a splay-search that brings the gap to the top of the tree, so that an immediate insertion will take constant time.

Break up the splay code into separate components, one for searching and breaking up the tree and another for reassembling it. Use these components, and not splay itself, for linking and unlinking. Drop the after-where parameter to link, as it is computed as a side-effect of the splay search.

Submitted by: dougm_rice.edu
Tested by: pho

Test Plan

Peter Holm has graciously stress-tested a previous version for two days, with no problems.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Reduce vm_map tree manipulations in case of fixup after interval resizing.

When only walking one spine of the tree in vm_map_entry_resize_free, be sure to include the other, un-traversed spine in the rebuilt tree, rather than using NULL.

Add a length parameter to vm_map_entry_splay_split, which treats a pointer to a node with max_free < length as a NULL pointer. Set it to 0 in all callers, except in vm_map_findspace, where it is set to the length parameter. Thus, the splay in findspace can stop without examining parts of the tree that cannot offer sufficient space.

Fix a left vs right mixup that surely breaks the last 2 or 3 diffs.

Make the search for first-fit space leave the range just before the found space at the root of the tree, so that insert-after-findspace doesn't have to walk the search path twice.

Add missing var definition.

D17794.54664.diff cannot boot. An NMI results in a reset.

CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (1995.24-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: Basic Features=0xda0400<SMM,INS/OUTS,TRUE>
        Pin-Based Controls=0x7f<ExtINT,NMI,VNMI,PreTmr>
        Primary Processor Controls=0xfff9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
        Secondary Processor Controls=0x4ff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,PAUSE-loop>
        Exit Controls=0xda0400<PAT-LD,EFER-SV,PTMR-SV>
        Entry Controls=0xda0400
        EPT Features=0x6134141<XO,PW4,UC,WB,2M,1G,INVEPT,single,all>
        VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
  TSC: P-state invariant, performance statistics
Data TLB0: 2-MByte or 4 MByte pages, 4-way set associative, 32 entries
Data TLB: 4 KB pages, 4-way set associative, 64 entries
Instruction TLB: 2M/4M pages, fully associative, 8 entries
Instruction TLB: 4KByte pages, 4-way set associative, 64 entries
64-Byte prefetching
Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
real memory  = 68719476736 (65536 MB)
Physical memory chunk(s):
0x0000000000010000 - 0x000000000008dfff, 516096 bytes (126 pages)
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002800000 - 0x00000000bb3c6fff, 3099357184 bytes (756679 pages)
0x00000000bdfaf000 - 0x00000000bdffffff, 331776 bytes (81 pages)
0x0000000100000000 - 0x0000000fd861cfff, 63759831040 bytes (15566365 pages)

Back out the changes related to findspace, so that the only difference between what worked (for 3 hours) and this are the (corrected) changes to reduce tree manipulation when intervals are resized. And wait to see if that boots.

D17794.54666.diff boots. Ran a few tests for 10 minutes without any problems.

Add the length parameter to vm_map_entry_splay_split, and pass 0 in all cases except the one for vm_map_findspace.

Add changes to 'splay' the found space to near the top of the tree/map.

Does not boot:

Data TLB0: 2-MByte or 4 MByte pages, 4-way set associative, 32 entries
Data TLB: 4 KB pages, 4-way set associative, 64 entries
Instruction TLB: 2M/4M pages, fully associative, 8 entries
Instruction TLB: 4KByte pages, 4-way set associative, 64 entries
64-Byte prefetching
Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
real memory  = 68719476736 (65536 MB)
Physical memory chunk(s):
0x0000000000010000 - 0x000000000008dfff, 516096 bytes (126 pages)
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002800000 - 0x00000000bb3c6fff, 3099357184 bytes (756679 pages)
0x00000000bdfaf000 - 0x00000000bdffffff, 331776 bytes (81 pages)
0x0000000100000000 - 0x0000000fd861cfff, 63759831040 bytes (15566365 pages)

Make sure the llist is null-terminated when the splay-for-first-free-space begins.

Have findspace return with the root at the start of the space found and root->right as its successor, so that calling link after findspace takes a constant amount of time.

Squeeze code into 80 columns.

Entering uma_startup2 with 0 boot pages left
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (1995.24-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 68719476736 (65536 MB)
panic: kmem_suballoc: bad status return of 3
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff827249c0
vpanic() at vpanic+0x1b4/frame 0xffffffff82724a20
panic() at panic+0x43/frame 0xffffffff82724a80
kmem_suballoc() at kmem_suballoc+0xb4/frame 0xffffffff82724ab0
vm_ksubmap_init() at vm_ksubmap_init+0x1f7/frame 0xffffffff82724af0
cpu_startup() at cpu_startup+0x21c/frame 0xffffffff82724b20
mi_startup() at mi_startup+0x25f/frame 0xffffffff82724b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #6 r344750M: Tue Mar  5 08:52:47 CET 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Guess that the tree-reassembly phase of vm_map_findspace was broken in the last update and undo it. Then wait to see whether it was actually the other half that broke it.

Add everything back, and a missing assignment "map->root = root".

I have only run a few tests and a buildworld with 54727. Do you need me to run the full set of tests (48 hours using one host)?

Changes to the appearance of the code, but not, hopefully, the function. I do not intended to offer any more functional changes; I would appreciate a thorough testing, so that I can then ask for review and analysis of performance impact.

I have tested D17794.id54771.diff with all of the stress2 tests on amd64. I ran a short two hour test on i386.
No problems seen.

dougm edited the test plan for this revision. (Show Details)
dougm added reviewers: kib, markj.

Include the changes from D17635 that reduce map manipulation when neighbors are merged.

Trying to mount root from ufs:/dev/da0p2 [rw]...
WARNING: WITNESS option enabled, expect reduced performance.
WARNING: DIAGNOSTIC option enabled, expect reduced performance.
Expensive timeout(9) function: 0xffffffff80a55be0(0xffffffff81af3d20) 0.006689896 s
uhub1: 4 ports with 4 removable, self powered
pid 49 (ps), jid 0, uid 0: exited on signal 11
panic: vm_map_entry_unlink: unlink object not mapped
cpuid = 11
time = 1552632343
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c7c7f430
vpanic() at vpanic+0x1b4/frame 0xfffffe00c7c7f490
panic() at panic+0x43/frame 0xfffffe00c7c7f4f0
vm_map_entry_unlink() at vm_map_entry_unlink+0x19a/frame 0xfffffe00c7c7f520
vm_map_entry_delete() at vm_map_entry_delete+0x21/frame 0xfffffe00c7c7f570
vm_map_delete() at vm_map_delete+0x344/frame 0xfffffe00c7c7f5e0
vm_map_remove() at vm_map_remove+0x81/frame 0xfffffe00c7c7f610
vmspace_exit() at vmspace_exit+0xd3/frame 0xfffffe00c7c7f650
exit1() at exit1+0x5ad/frame 0xfffffe00c7c7f6c0
sigexit() at sigexit+0x959/frame 0xfffffe00c7c7f9a0
postsig() at postsig+0x342/frame 0xfffffe00c7c7fa70
ast() at ast+0x4cf/frame 0xfffffe00c7c7fab0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffdb00
KDB: enter: panic
[ thread pid 49 tid 100223 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r345124M: Fri Mar 15 07:43:06 CET 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Don 't leak a bit of the tree when unlinking an entry with an adjoining neighbor.

This is much better. I ran tests for 2 hours without any problems seen.

I think that it would be useful to have a function which would be only used under INVARIANTS or even DIAGNOSTIC and which checks that:

  • list of vm_map entries is ordered by the start address;
  • the end of previous entry does not overlap with the start of the next one;
  • the tree satisfies the invariants about left/right values vs. root.

And call the checker after the splay tree manipulations.

(No real comments from me about the algorithms).

sys/vm/vm_map.c
921

The split and merge in the name of this function and the function below refer to the llist and rlist, and not to the split of some entry ? It sounds strange to name them vm_map_entry_something.

1125

Why do you need two switches ? I.e, why cannot the code from the second switch cases go straight into the cases of the first switch ? I understand that you do separate llist/rlists and root calculations, but I do not quite get the separation.

1481

Perhaps change the return type of the function to bool ? It got almost complete rewrite anyway.

sys/vm/vm_map.c
921

The split function divides the tree into an llist of nodes less than addr, an rlist of nodes greater than addr, and a root (possibly NULL) that contains addr, with left and right children less than and greater than addr.

I'm trying to follow convention with the naming. I don't object to dropping 'entry' from the names of the new functions, if that's what people want.

1125

I can get by with only one switch, but then a have to duplicate each of the three-line code snippets that start with resetting the new root value. Note that the NONE case in the first switch is likely to alter the value of 'op', so that some other case is taken in the second switch.

1481

If we're talking about changing the interface, how would you feel about changing the return type to vm_offset_t, eliminating the addr parameter, returning the addr, and returning the vm_map_max value for the failure case?

In response to reviewer feedback:
Shorten the names of splay helper functions.
Avoid using back-to-back switch statements.
Change the interface to vm_map_findspace, to return the found address directly, rather than by parameter.
Add a routine for testing the data consistency of the vm_map and invoke it after every update to map->root.

D17794.55338.patch does not boot:

startup_alloc from "UMA Hash", 2 boot pages left
startup_alloc from "UMA Zones", 1 boot pages left
Entering uma_startup1 with 0 boot pages left
Entering uma_startup2 with 0 boot pages left
panic: vtopte on a uva/gpa 0x0
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82725a10
vpanic() at vpanic+0x19d/frame 0xffffffff82725a60
panic() at panic+0x43/frame 0xffffffff82725ac0
pmap_qenter() at pmap_qenter+0x169/frame 0xffffffff82725ae0
kmem_init_zero_region() at kmem_init_zero_region+0x71/frame 0xffffffff82725b10
vm_mem_init() at vm_mem_init+0x5b/frame 0xffffffff82725b20
mi_startup() at mi_startup+0x210/frame 0xffffffff82725b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r345408M: Fri Mar 22 07:54:18 CET 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Keep the changes to function names and switch statements. Discard the changes to the vm_map_findspace interface and the assertions. I have no idea which of these led to failure, but I've the discarded the changes that seemed most dangerous.

D17794.55343.diff boots and survived a few tests.

Change vm_map_findspace to return the found address. Change two comparison errors from the last time this change was offered.

I ran tests on Diff 55349 for two hours. No problems seen.

Add in consistency-check assertions for the vm_map tree.

real memory = 68719476736 (65536 MB)
panic: map 0xfffff80840002000 max = 0, max_left = 0, max_right = 0.
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82725710
vpanic() at vpanic+0x19d/frame 0xffffffff82725760
panic() at panic+0x43/frame 0xffffffff827257c0
_vm_map_assert_consistent() at _vm_map_assert_consistent+0x347/frame 0xffffffff82725820
vm_map_entry_link() at vm_map_entry_link+0x15d/frame 0xffffffff82725880
vm_map_insert() at vm_map_insert+0x944/frame 0xffffffff82725970
vm_map_find() at vm_map_find+0x863/frame 0xffffffff82725a50
kmem_suballoc() at kmem_suballoc+0x56/frame 0xffffffff82725aa0
vm_ksubmap_init() at vm_ksubmap_init+0x1d4/frame 0xffffffff82725af0
cpu_startup() at cpu_startup+0x21c/frame 0xffffffff82725b20
mi_startup() at mi_startup+0x210/frame 0xffffffff82725b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> x/s version
version: FreeBSD 13.0-CURRENT #4 r345408M: Fri Mar 22 21:26:27 CET 2019\012 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Change '=' to '==' in assertion.

Change the vm_map_findspace failure return value so that the last "allocated" address is invalid, not the first one. This makes the test more like the max_addr tests that follow findspace calls.

Ran tests for 2 hours on D17794.55372.diff without seeing any problems.

Adapt to a recent ASLR fix (D19688).

I just completed a full stress2 test on amd64 with Diff 55389. No problems seen.

sys/vm/vm_map.c
683

This formats %ul break on ILP32. Use %jx and cast args to uintmax_t.

sys/vm/vm_map.h
404

Since the line is changed, remove space before '('.

Apply reviewer-recommended formatting changes.

This revision is now accepted and ready to land.Mar 28 2019, 2:43 PM

I'd appreciate a commit of this patch by kib. Thanks.

I ran random tests for three hours with 55537 on amd64 and a buildworld on i386. No problems seen.

markj added inline comments.
sys/vm/vm_map.c
683

Minor nit: the convention is not to add a period at the end of panic string or assertion messages.

Per reviewer suggestion, remove periods from ends of assert statements.

This revision now requires review to proceed.Mar 29 2019, 2:50 PM
This revision was not accepted when it landed; it landed in state Needs Review.Mar 29 2019, 4:54 PM
This revision was automatically updated to reflect the committed changes.