Page MenuHomeFreeBSD

Changes to vm_map_lookup_entry
Needs ReviewPublic

Authored by dougm on Jun 16 2019, 4:39 PM.
This revision needs review, but there are no reviewers specified.

Details

Summary

Since the only caller to vm_map_splay is vm_map_lookup_entry, move the implementation of vm_map_splay into vm_map_lookup_entry.

vm_map_lookup_entry returns the greatest entry less than or equal to a given address, but in many cases the caller wants the least entry greater than or equal to the address and uses the next pointer to get to it. Provide an alternative interface to lookup, vm_map_lookup_entry_greq, to provide the latter behavior, and let callers use one or the other rather than having them use the next pointer after a lookup miss to get what they really want.

In vm_map_growstack, the caller wants an entry that includes a given address, and either the preceding or next entry depending on the value of eflags in the first entry. Incorporate that behavior into vm_map_lookup_helper, the function that implements all of these lookups.

Eliminate the prev field from vm_map_entry.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

dougm created this revision.Jun 16 2019, 4:39 PM
dougm updated this revision to Diff 58757.Jun 18 2019, 3:56 AM
dougm edited the summary of this revision. (Show Details)

Too many places depend on the vm_map_entry_lookup name, so leave it alone and rename something else.

dougm updated this revision to Diff 58758.Jun 18 2019, 4:00 AM

... but keep the boolean_t to bool return value transition for vm_map_entry_lookup.

markj added inline comments.Jun 21 2019, 7:13 PM
vm_map.c
1176 ↗(On Diff #58758)

Should this comment be preserved?

1290 ↗(On Diff #58758)

"the entry provided"

1351 ↗(On Diff #58758)

All of the vm_map_splay_merge() calls are followed by this assert. Is there a reason to not just assert in vm_map_splay_merge() instead?

1403 ↗(On Diff #58758)

It would be a bit more consistent with other code to use a "_ge" suffix instead. We have vm_radix_lookup_ge() and SWAP_PCTRIE_LOOKUP_GE(), for example.

vm_map.h
418 ↗(On Diff #58758)

Remove the space after vm_map_lookup_entry while you're here?

dougm updated this revision to Diff 58877.Jun 21 2019, 10:47 PM
dougm marked 3 inline comments as done.

Apply most reviewer suggestions. Explain the one I didn't apply.

vm_map.c
1176 ↗(On Diff #58758)

I've updated it a bit and put it just before the first splay-specific code.

1351 ↗(On Diff #58758)

There is an exception in vm_map_findspace, and I don't think the consistency check would pass there. I could avoid calling vm_map_splay_merge there, and just copy a bit of code. Or I could stop trying to bring ranges on either side of the found space to the top of the map, and let the range to the right stay (possibly) near the bottom of the tree.

dougm updated this revision to Diff 58987.Jun 25 2019, 8:12 AM

Resolve a conflict.

kib added inline comments.Jun 25 2019, 6:10 PM
vm_map.c
1331 ↗(On Diff #58987)

rv typically denotes a value containing Mach error code, a 'Return Value' from some function.

1334 ↗(On Diff #58987)

This ';' should be on the new line and properly indented. Sometimes people even add a comment like /* Ignore */;

1367 ↗(On Diff #58987)

There should be a blank line before multi-line comment. Since you change a lot of code in the function, it makes sense to fix this style bug.

dougm updated this revision to Diff 59028.Jun 25 2019, 7:28 PM

Apply reviewer suggestions.

kib accepted this revision.Jun 25 2019, 7:53 PM
This revision is now accepted and ready to land.Jun 25 2019, 7:53 PM
dougm reopened this revision.Jun 26 2019, 3:17 AM

It broke boot, so I've reverted it. Now, while I look for the problem, I'll accept your shaming.

markj added a comment.Jun 26 2019, 3:31 AM

It broke boot, so I've reverted it. Now, while I look for the problem, I'll accept your shaming.

Welcome to the club. :)

I will note that by convention a reviewer is not added to the "Reviewed by:" tag in a commit unless they've clicked "accept" in the phabricator review. We usually use "Discussed with:" in cases where the reviewer did not click "accept" but provided some feedback in the review. I was meaning to bring this up after I saw the commit mail, and not because there happened to be a bug. Thanks for the quick revert.

pho added a subscriber: pho.Jun 26 2019, 5:31 AM

This is what I see on i386:

uhub3: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
panic: Bad entry start/end for new stack entry
cpuid = 3
time = 1561525410
KDB: stack backtrace:
db_trace_self_wrapper(e64bdd,1bd0e18,0,1251889c,bd29a1,...) at db_trace_self_wrapper+0x2a/frame 0x12518870
kdb_backtrace(7,3,3,ffbdf000,ffbff000,...) at kdb_backtrace+0x2e/frame 0x125188d0
vpanic(162b7f6,12518918,12518918,1251893c,12bd2ec,...) at vpanic+0x121/frame 0x125188f8
panic(162b7f6,17fc10c0,17fc10c0,fbbff000,2104fbac,...) at panic+0x14/frame 0x1251890c
vm_map_stack_locked(4000000,20000,3,7,1000) at vm_map_stack_locked+0x19c/frame 0x1251893c
vm_map_stack(2104fbac,fbbff000,4000000,3,7,1000) at vm_map_stack+0x9e/frame 0x12518968
exec_new_vmspace(12518a98,1c2c880) at exec_new_vmspace+0x2f6/frame 0x125189c0
exec_elf32_imgact(12518a98) at exec_elf32_imgact+0x7f6/frame 0x12518a4c
kern_execve(99d9a80,12518c70,0) at kern_execve+0x546/frame 0x12518c44
start_init(0,12518ce8) at start_init+0x190/frame 0x12518cb4
fork_exit(f6e2d0,0,12518ce8,0,0,...) at fork_exit+0x6c/frame 0x12518cd4
fork_trampoline() at 0xffc033ca/frame 0x12518cd4
--- trap 0, eip = 0, esp = 0x12518d20, ebp = 0 ---
(null)() at 0
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at      kdb_enter+0x35: movl    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r349393: Wed Jun 26 06:46:30 CEST 2019\012    pho@x4.osted.lan:/usr/src/sys/i386/compile/PHO\012
db>
dougm updated this revision to Diff 59044.Jun 26 2019, 5:37 AM

Fix vm_stack_locked by not modifying it. If I ever do modify it again, I should notice that new_entry is not the same as prev_entry->next until after vm_map_insert is called, which is a pretty big thing to have overlooked.

But don't go reviewing/testing it yet. There's still another thing wrong. Something about vm_map_lookup after vm_map_fault.

pho added a comment.Jun 26 2019, 7:32 AM

i386 booted with this patch.
On amd64 I got:

20190626 08:28:23 all (220/636): beneath.sh
Fatal double fault
rip 0xffffffff80ef9584 rsp 0xfffffe00a7c00f80 rbp 0xfffffe00a7c010b0
rax 0xfffff80397d07000 rdx 0x1 rbx 0
rcx 0 rsi 0xfffff803e0c70000 rdi 0
r8 0 r9 0xfffffe00a7c014f8 r10 0xfffffe00a7c014cc
r11 0xfffffe00a7c01527 r12 0x1 r13 0xfffff803979ae000
r14 0 r15 0xfffff803e0c70000 rflags 0x10282
cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0x8002438d0 gsbase 0xffffffff820c7480 kgsbase 0
cpuid = 9; apic id = 09
panic: double fault
cpuid = 9
time = 1561530504
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0006bd1db0
vpanic() at vpanic+0x19d/frame 0xfffffe0006bd1e00
panic() at panic+0x43/frame 0xfffffe0006bd1e60
dblfault_handler() at dblfault_handler+0x1de/frame 0xfffffe0006bd1f30
Xdblfault() at Xdblfault+0xc3/frame 0xfffffe0006bd1f30
--- trap 0x17, rip = 0xffffffff80ef9584, rsp = 0xfffffe00a7c00f80, rbp = 0xfffffe00a7c010b0 ---
vm_fault_hold() at vm_fault_hold+0x14/frame 0xfffffe00a7c010b0
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c010f0
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c01140
trap() at trap+0x2b4/frame 0xfffffe00a7c01250
calltrap() at calltrap+0x8/frame 0xfffffe00a7c01250
--- trap 0xc, rip = 0xffffffff80f06a14, rsp = 0xfffffe00a7c01320, rbp = 0xfffffe00a7c01400 ---
vm_map_lookup() at vm_map_lookup+0x294/frame 0xfffffe00a7c01400
vm_fault_hold() at vm_fault_hold+0x80/frame 0xfffffe00a7c01550
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c01590
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c015e0
trap() at trap+0x2b4/frame 0xfffffe00a7c016f0
calltrap() at calltrap+0x8/frame 0xfffffe00a7c016f0
--- trap 0xc, rip = 0x7ffffffcb001, rsp = 0xfffffe00a7c017c0, rbp = 0xfffffe00a7c01988 ---
??() at 0x7ffffffcb001/frame 0xfffffe00a7c01988
??() at 0xfffff803e0c70000/frame 0xffffffff820be480
??() at 0xfffff800036bf000/frame 0xfffff800036bf000
??() at 0xfffff800036b7000/frame 0xffffffff81ea54c0
ll() at 0xb0000/frame 0xffffffff81ea6138
KDB: enter: panic
[ thread pid 19330 tid 100497 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/dougm039.txt
I'll retry with vm_map.c compiled with '-O0' for more debug info.

pho added a comment.Jun 26 2019, 7:51 AM
(kgdb) bt
#0  doadump (textdump=0x0) at include/pcpu.h:246
#1  0xffffffff8049c4fb in db_dump (dummy=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at ../../../ddb/db_command.c:575                                                                                                                                             
#2  0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x0) at ../../../ddb/db_command.c:482
#3  0xffffffff804a1248 in db_script_exec (scriptname=<value optimized out>, warnifnotfound=<value optimized out>) at ../../../ddb/db_script.c:304
#4  0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x1) at ../../../ddb/db_command.c:482
#5  0xffffffff8049c044 in db_command_loop () at ../../../ddb/db_command.c:535
#6  0xffffffff8049f1ef in db_trap (type=<value optimized out>, code=<value optimized out>) at ../../../ddb/db_main.c:252
#7  0xffffffff80c1384c in kdb_trap (type=0x3, code=0x0, tf=<value optimized out>) at ../../../kern/subr_kdb.c:692
#8  0xffffffff8109ec31 in trap (frame=0xfffffe00ad617cd0) at ../../../amd64/amd64/trap.c:621
#9  0xffffffff810776d5 in calltrap () at ../../../amd64/amd64/exception.S:232
#10 0xffffffff80c12f5b in kdb_enter (why=0xffffffff81333b87 "panic", msg=<value optimized out>) at include/cpufunc.h:65
#11 0xffffffff80bca7ea in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at ../../../kern/kern_shutdown.c:894
#12 0xffffffff80bca563 in panic (fmt=<value optimized out>) at ../../../kern/kern_shutdown.c:832
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085                                                                                                      
#14 0x7fffffffffffffff in ?? ()
#15 0x0000000600000001 in ?? ()
#16 0xfffff8000ae825a0 in ?? ()
#17 0x7fffffffffffffff in ?? ()
#18 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb) f 13
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085                                                                                                      
1085                            SPLAY_LEFT_STEP(root, y, rlist,
(kgdb) l
1080            root = map->root;
1081            while (root != NULL && root->max_free >= length) {
1082                    KASSERT(llist->end <= root->start && root->end <= rlist->start,
1083                        ("%s: root not within tree bounds", __func__));
1084                    if (addr < root->start) {
1085                            SPLAY_LEFT_STEP(root, y, rlist,
1086                                y->max_free >= length && addr < y->start);
1087                    } else if (addr >= root->end) {
1088                            SPLAY_RIGHT_STEP(root, y, llist,
1089                                y->max_free >= length && addr >= y->end);
(kgdb) info loc
max_free = 0xfffff8000aee1002
llist = 0xaad617f00
rlist = 0x7fffffffffffffff
root = 0xfffffe00ad618460
y = 0x0
(kgdb) p *map
$1 = {header = {prev = 0xffffffff81eaa140, next = 0xfffff8000aee1000, left = 0x0, right = 0xfffff8000aee1010, start = 0x0, end = 0xffffffff81eaa670, next_read = 0x0, max_free = 0x0, object = {vm_object = 0x0, sub_map = 0x0}, offset = 0x0, eflags = 0x0, protection = 0x0, max_protection = 0x0,     
    inheritance = 0x0, read_ahead = 0x0, wired_count = 0x63fc98, cred = 0xfffff80003683f00, wiring_thread = 0xffffffff81e889a8}, lock = {lock_object = {lo_name = 0x0, lo_flags = 0x0, lo_data = 0x0, lo_witness = 0xfffff80006118880}, sx_lock = 0xfffff8000611b480}, system_mtx = {lock_object = {     
      lo_name = 0xfffff8000a9ff758 "", lo_flags = 0x6073700, lo_data = 0xfffff800, lo_witness = 0x18793}, mtx_lock = 0x0}, nentries = 0x0, size = 0x0, timestamp = 0x0, needs_wakeup = 0x0, system_map = 0x0, flags = 0x0, root = 0x0, pmap = 0x0, anon_loc = 0x0, busy = 0xae82678}                     
(kgdb)
dougm updated this revision to Diff 59045.Jun 26 2019, 8:31 AM

I do think I've identified and removed the broken parts. I haven't fixed them and reinstalled them yet.

pho added a comment.Jun 26 2019, 3:45 PM

This looks promising.
I ran tests on i386 for three hours and uptime for amd64 is 5 hours. I'll leave the amd64 tests running.

dougm updated this revision to Diff 59090.Jun 27 2019, 4:08 AM

This fixes everything I know of that was wrong before. It obviously deserves some robust testing.

Here are the things that were broken in what was checked in before, and are fixed here.

  1. In vm_map_stack_locked, assign map->root to new_entry so that the newly inserted entry is accessed, and not its successor.
  1. In vm_map_lookup, look up an element and its neighbor, and pass them both to vm_map_growstack, since that function won't call vm_map_lookup_helper the first time through.
  1. In vm_map_lookup_helper, after vm_map_splay_{findnext,findprev}, don't forget to null the root child in the direction searched; otherwise, you get a search tree with a loop in it, which is really bad.
pho added a comment.Jun 27 2019, 12:38 PM

I got this strange one while testing on i386. Not sure if it's related to your patch.

20190627 13:32:22 all (76/649): ext4fs.sh
panic: vm_fault_hold: fault on nofault entry, addr: 0x272d7000
cpuid = 1
time = 1561635283
KDB: stack backtrace:
db_trace_self_wrapper(e64bdd,1bd5e18,0,12518734,bd29a1,...) at db_trace_self_wrapper+0x2a/frame 0x12518708
kdb_backtrace(d,1,1,9849100,0,...) at kdb_backtrace+0x2e/frame 0x12518768
vpanic(16309d1,125187b0,125187b0,12518880,12b715f,...) at vpanic+0x121/frame 0x12518790
panic(16309d1,15871c8,272d7000,0,7c6915c,...) at panic+0x14/frame 0x125187a4
vm_fault_hold(2db5000,272d7000,2,0,0) at vm_fault_hold+0x24ff/frame 0x12518880
vm_fault(2db5000,272d7000,2,0) at vm_fault+0x4a/frame 0x125188a8
trap_pfault(272d74bc) at trap_pfault+0xf3/frame 0x125188f4
trap(125189b8,8,28,28,95ad1a0,...) at trap+0x3c0/frame 0x125189ac
calltrap() at 0xffc0316d/frame 0x125189ac
--- trap 0xc, eip = 0xf7d0a5, esp = 0x125189f8, ebp = 0x12518a10 ---
funsetownlst(18725190) at funsetownlst+0xa5/frame 0x12518a10
pgdelete(187251ac,31bcd9fc,31bcdab4,99d5ab4,12518a68,...) at pgdelete+0x60/frame 0x12518a30
leavepgrp(31bcd9fc) at leavepgrp+0xd5/frame 0x12518a48
proc_reap(99d9a80,31bcd9fc,12518b14,30) at proc_reap+0x44f/frame 0x12518a68
proc_to_reap(7,0,0,12518b14,30,0,0,0) at proc_to_reap+0x2b8/frame 0x12518a90
kern_wait6(99d9a80,7,0,0,12518b14,30,0,0) at kern_wait6+0x182/frame 0x12518ae8
sys_wait4(99d9a80,99d9d08) at sys_wait4+0x78/frame 0x12518c08
syscall(12518ce8,3b,3b,3b,ffbfe928,...) at syscall+0x2d9/frame 0x12518cdc

https://people.freebsd.org/~pho/stress/log/dougm040.txt

pho added a comment.Jun 29 2019, 1:49 AM

D20664.59090.diff completed tests on amd64 without any problems seen.

markj added a comment.Jun 30 2019, 5:45 PM
In D20664#449581, @pho wrote:

I got this strange one while testing on i386. Not sure if it's related to your patch.

I believe we have seen this one before. (I will take a look at a kernel dump for this panic if you have one.) I doubt it is related to this patch.

pho added a comment.Jun 30 2019, 7:55 PM

https://people.freebsd.org/~pho/vmcore.1136.gz
https://people.freebsd.org/~pho/src.1136.gz

-rw-r--r--   1 pho  pho    54M Jun 30 18:27 vmcore.1136.gz
-rw-r--r--   1 pho  pho   1.1G Jun 30 19:46 src.1136.gz
markj added a comment.Jul 2 2019, 7:33 PM
In D20664#450563, @pho wrote:

https://people.freebsd.org/~pho/vmcore.1136.gz
https://people.freebsd.org/~pho/src.1136.gz

-rw-r--r--   1 pho  pho    54M Jun 30 18:27 vmcore.1136.gz
-rw-r--r--   1 pho  pho   1.1G Jun 30 19:46 src.1136.gz

I'm not able to fetch the latter file (fetch reports that it's truncated).

pho added a comment.Jul 2 2019, 8:11 PM

Yes, fetch doesn't work well. Try using wget, which recovers from closed connections.

dougm updated this revision to Diff 61638.Sep 4 2019, 5:39 AM
dougm edited the summary of this revision. (Show Details)

Discard parts that overlap/conflict with changes applied since this change was proposed.
Extend this change to completely eliminate the 'prev' field from vm_map_entry.

pho added a comment.Sep 4 2019, 5:36 PM

I ran a brief 10 hour test on amd64 with D20664.61638.diff. No problems seen.

kib added inline comments.Sep 5 2019, 4:19 PM
sys/vm/vm_map.c
1305

OUT comment looks not very useful and somewhat cryptic now that the function definition is styled. I think the best is to drop it.

Instead, it might make sense to split a new paragraph in the herald comment and explain both entry and nbr results (i.e. move the last sentence to it, add entry descr).

dougm updated this revision to Diff 61715.Sep 6 2019, 5:33 AM

Rewrite comments describing vm_map_lookup_helper.

pho added a comment.Sep 6 2019, 3:19 PM

Tested D20664.61715.diff for 6 hours on amd64. No problems seen.

alc added a comment.Sep 15 2019, 4:01 AM

How does this perform on the microbenchmarks that you've previously used?

sys/vm/vm_map.c
1321

Please delete this blank line.

1332

Style: Insufficient indentation

1382

Ditto

2444

"precessor"?

dougm updated this revision to Diff 62121.Sep 15 2019, 4:35 AM
dougm marked 5 inline comments as done.

Address reviewer comments. Replace another stray 'precessor' with 'predecessor'.

dougm added a comment.EditedSep 15 2019, 6:59 AM

In D20664#472328, @alc wrote:

How does this perform on the microbenchmarks that you've previously used?

On the tests that were used to justify r345702 (eliminate adj_free field from vm_map_entry), this change improves running time very slightly, and increases dc-misses to a level about halfway back to where they were before r345702. There are two tests, A and B, run 5 times each for kernels with and without this change:

with change without change
time; dc-misses time; dc-misses
A A
18.823607; 695330764 18.962349; 648964303
18.900641; 691208862 18.923142; 650369015
18.871579; 687852257 19.10737; 647578310
18.831166; 684911854 18.960980; 646742162
18.918663; 710963249 18.964424; 643194206
B B
17.890866; 659940091 18.109838; 624340349
17.955391; 656291190 17.904203; 612007550
17.910073; 655110133 18.108180; 615058305
17.977516; 656605622 18.60593; 621306934
17.926519; 656003249 18.68998; 615660906

dougm updated this revision to Diff 62196.Sep 17 2019, 3:08 AM

Update after r352433 committed changes to vm_map_entry_unlink that were part of this change.

dougm removed reviewers: alc, markj, kib.Sat, Oct 12, 8:17 PM