Page MenuHomeFreeBSD

clip within the subtree rooted at entry
AbandonedPublic

Authored by dougm on Dec 12 2019, 5:14 AM.

Details

Reviewers
kib
markj
alc
Summary

Clipping an entry means inserting a map entry immediately before or after one we've already found, so there's no need to use the general-purpose search of vm_map_entry_link. This change replaces that with an implementation that only modifies one or two map entries after finding the successor or predecessor of the node being clipped.

The previous implementations of clip_start and clip_end corrupted the vm_map by changing start or end values of tree entries and invalidating the invariant condition on the max_free field for the last map entry on the search path for the subsequent unsuccessful search as part of vm_map_link_entry. Since this change removes those invalidations, the SPLAY_{LEFT,RIGHT}_STEP test to avoid a needless max_free lookup, which was tweaked to avoid the bottom-of-path problem, can be un-tweaked.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

sys/vm/vm_map.c
2377

Is it true that repeated applications of this function can result in the tree being arbitrarily imbalanced? Was that true before?

sys/vm/vm_map.c
2377

Yes. Yes.

If you have a single entry with start==0 and end ==100, and clip at 1, 2, 3, ... 99, then you end up with a left-leaning linear list of 100 entries, before this change and after as well.

Splaying doesn't guarantee a balanced tree, but only that an operation that takes more than log n work will improve the balance of the tree. These 99 operations would each take constant work, independent of tree size, and so could make the tree more unbalanced.

This revision is now accepted and ready to land.Dec 12 2019, 4:19 PM
sys/vm/vm_map.c
2377

To judge this change in the harshest possible light, I can see that in a series of clip_start operations, the time spend on looking up the predecessor of the same node over and over could grow linearly.

[100:200]

[0:1]

[1:2]

Clipping at 101, 102, ... would build a long chain of right children from [1:2], and each succeeding predecessor operation would have to walk the longer chain.

So that might be a reason to reject this. I'll develop a version that avoids this with a splay and still achieves my purpose of not corrupting the splay tree (by changing values of start, end) before searching it.

With D22777.65534.diff I see:

ses0: pass2,cd0 in 'Slot 01', SATA Slot: scbus2 target 0
WARNING: WITNESS option enabled, expect reduced performance.
WARNING: DIAGNOSTIC option enabled, expect reduced performance.
uhub1: 4 ports with 4 removable, self powered
panic: map 0xfffff80841282000 max = 7ff7df8e9000, max_left = 0, max_right = 0
cpuid = 10
time = 1576237136
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0130dd5670
vpanic() at vpanic+0x17e/frame 0xfffffe0130dd56d0
panic() at panic+0x43/frame 0xfffffe0130dd5730
_vm_map_assert_consistent() at _vm_map_assert_consistent+0x18c/frame 0xfffffe0130dd5750
vm_map_lookup_entry() at vm_map_lookup_entry+0x589/frame 0xfffffe0130dd57a0
vm_map_lookup() at vm_map_lookup+0xbd/frame 0xfffffe0130dd5880
vm_fault() at vm_fault+0xba/frame 0xfffffe0130dd5a00
vm_fault_trap() at vm_fault_trap+0x6e/frame 0xfffffe0130dd5a40
trap_pfault() at trap_pfault+0x1f3/frame 0xfffffe0130dd5ac0
trap() at trap+0x470/frame 0xfffffe0130dd5bf0
calltrap() at calltrap+0x8/frame 0xfffffe0130dd5bf0
--- trap 0xc, rip = 0x800242355, rsp = 0x7fffffffe1f8, rbp = 0x7fffffffe280 ---
KDB: enter: panic
[ thread pid 46 tid 100221 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1080086(%rip)
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r355711M: Fri Dec 13 12:33:37 CET 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>
dougm marked an inline comment as done.
dougm retitled this revision from clip without splaying to clip within the subtree rooted at entry.

Fix failure to update map->nentries. Fix problem updating max_free when entry and successor/predecessor are parent and child in the tree. Do path-halving when taking more than one step to find the successor/predecessor to avoid walking long paths when the same entry is clipped one piece at a time repeatedly.

This revision now requires review to proceed.Dec 14 2019, 4:41 AM

With D22777.65647.diff I got:

20191214 11:46:52 all (307/683): mmap18.sh
panic: _vm_map_clip_end: max_free invariant fails
cpuid = 19
time = 1576320420
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe099c404820
vpanic() at vpanic+0x17e/frame 0xfffffe099c404880
panic() at panic+0x43/frame 0xfffffe099c4048e0
_vm_map_clip_end() at _vm_map_clip_end+0x128/frame 0xfffffe099c404970
vm_map_wire_locked() at vm_map_wire_locked+0x24a/frame 0xfffffe099c404a40
vm_map_wire() at vm_map_wire+0x40/frame 0xfffffe099c404a70
kern_mlock() at kern_mlock+0x177/frame 0xfffffe099c404ac0
amd64_syscall() at amd64_syscall+0x2f1/frame 0xfffffe099c404bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe099c404bf0
--- syscall (203, FreeBSD ELF64, sys_mlock), rip = 0x80030630a, rsp = 0x7fffdfdfcf78, rbp = 0x7fffdfdfcfb0 ---
KDB: enter: panic
[ thread pid 36262 tid 103609 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10803b6(%rip)
db>

https://people.freebsd.org/~pho/stress/log/dougm066.txt

Carefully pick the end-of-list marker before starting the walk toward a successor in clip_end.

Suppose you're clipping the end of 'entry'. Suppose that the successor of 'entry', which must be found, is a left-descendant of entry->right, and that entry->right->right is not a right child of entry->right, but rather it's successor, an ancestor of 'entry'.

In order to satisfy an invariant about max_free values in SPLAY_LEFT_STEP, we must have the 'rlist' value match entry->right->right in that case, since otherwise the very large value of entry->right->right->max_free in what would otherwise be mistaken for a right child of entry->right triggers an assertion failure.

So use entry->right->right as the starting rlist value unless that will demonstrably break things, and use &map->header otherwise.

Then do the reverse of all that in clip_start.

We could just cancel the assertion, but it's valid in cases where the search starts from the tree root.

With D22777.65663.diff I see:

ukbd0: <Keyboard Interface> on usbus2
kbd2 at ukbd0
mountroot: waiting for device /dev/da0p2...
panic: vm_map_splay_split: max_free invariant fails
cpuid = 9
time = 1576352787
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe012ffbc590
vpanic() at vpanic+0x17e/frame 0xfffffe012ffbc5f0
panic() at panic+0x43/frame 0xfffffe012ffbc650
vm_map_splay() at vm_map_splay+0x3c4/frame 0xfffffe012ffbc750
vm_map_lookup_entry() at vm_map_lookup_entry+0xd1/frame 0xfffffe012ffbc7b0
vm_map_lookup() at vm_map_lookup+0x5b/frame 0xfffffe012ffbc880
vm_fault() at vm_fault+0xba/frame 0xfffffe012ffbca00
vm_fault_trap() at vm_fault_trap+0x6e/frame 0xfffffe012ffbca40
trap_pfault() at trap_pfault+0x1f3/frame 0xfffffe012ffbcac0
trap() at trap+0x470/frame 0xfffffe012ffbcbf0
calltrap() at calltrap+0x8/frame 0xfffffe012ffbcbf0
--- trap 0xc, rip = 0x800380f8f, rsp = 0x7fffffffc240, rbp = 0x7fffffffc290 ---
KDB: enter: panic
[ thread pid 48 tid 100222 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10803b6(%rip)
db> x/s version
version:        FreeBSD 13.0-CURRENT #2 r355746M: Sat Dec 14 20:43:38 CET 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Rework the path compression so that there's no need to consider entries not strictly between entry and next_entry (or prev_entry).

I ran tests with D22777.65668.diff for some 15 hours before running into this (unrelated ?) deadlock:
https://people.freebsd.org/~pho/stress/log/dougm067.txt

dougm edited the summary of this revision. (Show Details)

Add in the next planned change - the fix of the max_free test cases at the start of SPLAY_{LEFT,RIGHT}_STEP.

With D22777.65773.diff I got:

20191218 14:06:23 all (340/681): mmap18.sh
panic: vm_map_splay_findnext: max_free not copied from right
cpuid = 9
time = 1576674386
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe09f299d740
vpanic() at vpanic+0x17e/frame 0xfffffe09f299d7a0
panic() at panic+0x43/frame 0xfffffe09f299d800
vm_map_entry_unlink() at vm_map_entry_unlink+0xc67/frame 0xfffffe09f299d9d0
vm_map_try_merge_entries() at vm_map_try_merge_entries+0x6b/frame 0xfffffe09f299da00
vm_map_unwire() at vm_map_unwire+0x5c5/frame 0xfffffe09f299da90
kern_munlock() at kern_munlock+0x6d/frame 0xfffffe09f299dac0
amd64_syscall() at amd64_syscall+0x2f1/frame 0xfffffe09f299dbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe09f299dbf0
--- syscall (204, FreeBSD ELF64, sys_munlock), rip = 0x8003062ea, rsp = 0x7fffdfdfcf78, rbp = 0x7fffdfdfcfb0 ---
KDB: enter: panic
[ thread pid 26188 tid 117598 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1080a36(%rip)
db>

https://people.freebsd.org/~pho/stress/log/dougm068.txt

This approach has a fatal flaw. It may be impossible to distinguish parent from child without a complete search-from-root.