Page MenuHomeFreeBSD

reduce accesses to vm_map entries off the search path in updating max_free
ClosedPublic

Authored by dougm on Apr 5 2019, 4:10 AM.
Tags
None
Referenced Files
F106086744: D19826.id56126.diff
Wed, Dec 25, 4:26 AM
F106085345: D19826.id56351.diff
Wed, Dec 25, 3:52 AM
F106055451: D19826.id55885.diff
Tue, Dec 24, 3:13 PM
Unknown Object (File)
Mon, Dec 23, 6:33 AM
Unknown Object (File)
Sun, Dec 22, 7:58 PM
Unknown Object (File)
Sun, Dec 22, 7:53 PM
Unknown Object (File)
Sun, Dec 22, 7:41 PM
Unknown Object (File)
Sun, Dec 22, 7:40 PM
Subscribers

Details

Summary

The computations of vm_map_splay_split and vm_map_splay_merge touch both children of every entry on the search path as part of updating values of the max_free field. By comparing the max_free values of an entry and its child on the search path, the code can avoid accessing the child off the path in cases where the max_free value decreases along the path.

Specifically, this patch changes splay_split so that the max_free field of every entry on the search path is replaced, temporarily, by the max_free field from its child not on the search path or, if the child in that direction is NULL, then a difference between start and end values of two pointers already available in the split code, without following any next or prev pointers. However, to find that max_free value does not require looking toward that other child if either the child on the search path has a lower max_free value, or the current max_free value is zero, because in either case we know that the value of max_free for the other child is the value we already have. So, the changes to vm_entry_splay_split make sure that we know all the off-search-path entries we will need to complete the splay, without looking at all of them. There is an exception at the bottom of the search path where we cannot rely on the max_free value in the direction of the NULL pointer that ends the search, because of the behavior of entry-clipping code.

The corresponding change to vm_splay_entry_merge makes it simpler, since it's just reversing pointers and updating running maxima.

In a test intended to exercise vigorously the vm_map implementation, the effect of this change was to reduce the data cache miss rate by 10-14% and the running time by 5-7%.

MFC: 3 months

I would paste the above in as commit message when testing is complete and the patch can be checked in. If any mentors want to suggest improvements to this checkin message, please do.
Tested by: pho

Test Plan

Appealing to the kindness of Peter Holm for correctness testing.

I've run a performance test comparing the old and new vm_map.o files with a couple of applications that map and unmap many ranges to exercise the vm_map. The changes reduce the size of the vm_map.o file on amd64 from 463080 to 455704 bytes. Each test was run 5 times on a machine configured without multiple threads with times from gettimeofday and dc-miss counts from pmcstat.

For the original vm_map.c:
Test 1 Test2
Seconds dc-misses Seconds dc-misses
19.723195 735959319 19.268504 742186061
19.666366 749904436 19.272552 738918198
19.670189 754166550 19.215641 735077981
19.688910 749623489 19.251053 739168849
19.723270 750328650 19.309089 736750177

For the modified vm_map.c:
Test 1 Test2
Seconds dc-misses Seconds dc-misses
18.679360 682211158 17.745270 674801464
18.626152 687063686 17.689493 674667988
18.665722 680993737 17.791162 673663083
18.738495 705018224 17.763687 678144184
18.830796 731609219 17.904861 687706555

The tests 1 and 2 are uploaded here as files mapper3.c and mapper4.c.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This is what I got after a three hour test. Let me know if you need more.

debug.counters.max_free_read:    1074538828
debug.counters.max_free_zero:      97456702
debug.counters.max_free_declines: 821034845

Based on Peter's testing, I'm going back to the previous patch, without adding extra checks to avoid copying a 0 value from other-child to parent, since those checks would only pay off 4% of the time. Thanks for the testing, Peter.

sys/vm/vm_map.c
909 ↗(On Diff #56351)

vm_map_maxfree_left, please honor the symbols namespace.

914 ↗(On Diff #56351)

Blank line between functions definitions.

915 ↗(On Diff #56351)

namespace.

BTW, somewhat related to this review. I see that the addition of check_consistent uses a lot of cpu time on my crash boxes.

I believe that it should be handled same as vmem_check in subr_vmem.c: there should be a knob that makes the checker nop, and the knob should be default to 1 only for DIAGNOSTICS kernel.

Apply reviewer suggestions.

sys/vm/vm_map.c
676 ↗(On Diff #56432)

No need for space before '('

679 ↗(On Diff #56432)

It seems that #endif for DIAGNOSTIC is missed ?

Change the DIAGNOSTIC #ifdef, and comment the /* DIAGNOSTIC */ endif.

sys/vm/vm_map.c
679 ↗(On Diff #56432)

I meant that DIAGNOSTIC should only control the default value of enable_vmap_check. I do not have a strong preference there, though.

This revision is now accepted and ready to land.May 14 2019, 2:03 PM

I think that it would be worthwhile to write a test program that creates and destroys a large number of anonymous mappings and then, in particular, profile that program for data cache misses with pmcstat. The one tricky thing about writing this program is that you would need to ensure that the anonymous mappings can't coalesce by ensuring that there is always a gap between two mappings.

sys/vm/vm_map.c
912 ↗(On Diff #56483)

These functions operate on entries. Why has "entry_" been removed from the function name?

919 ↗(On Diff #56483)

vm_size_t, not size_t.

922 ↗(On Diff #56483)

Insert blank line.

926 ↗(On Diff #56483)

Ditto.

929 ↗(On Diff #56483)

Insert blank line.

1095 ↗(On Diff #56483)

With this change, I would have expected the removal of entry_ from the function's name. In general, there is increasing inconsistency in the use of vm_map_entry_ versus vm_map_.

1157 ↗(On Diff #56483)

Unnecessary parentheses.

1160 ↗(On Diff #56483)

Incorrect indentation.

dougm marked 8 inline comments as done.

Address alc comments.

This revision now requires review to proceed.May 14 2019, 6:00 PM
sys/vm/vm_map.c
912 ↗(On Diff #56483)

Because 80-character line limits and 8-space tabs make people desperate to avoid long names and nested loops. Neverthless, at the cost of wrapping 2 long lines, this has been changed.

{F4588426}Offer a test program such as the one alc has requested.

Update to account for overlapping changes that introduced conflicts.

dougm edited the summary of this revision. (Show Details)
dougm edited the test plan for this revision. (Show Details)
dougm added reviewers: kib, markj, alc.

Drop attempts to change vm_link to allow overlapping allocations, since that seemed to have a negative performance impact. Don't reply on max_free inference at the bottom of the search path, where start or end might be modified and the conditions on max_free violated.

This revision is now accepted and ready to land.Jun 1 2019, 5:40 PM
vm_map.c
1096–1100

The indentation here is wrong.

Change to standard indentation in swap function.

This revision now requires review to proceed.Jun 5 2019, 4:07 AM
vm_map.c
957
*      vm_map_entry_max_free_{left,right}:

Update the names of functions in an otherwise stale comment.

Replace ;; with ;; except in for(;;).

dougm marked 4 inline comments as done.Jun 5 2019, 4:27 AM

Provide some assertions and enhance the comments a bit to make the splay-splitting and merging more comprehensible.

With D19826.58291.diff I see:

Starting sshd.
Configuring vt: keymap blanktime.
Starting sendmail_submit.
panic: vm_map_splay_split: addr not within tree bounds
cpuid = 11
time = 1559803155
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00cb698330
vpanic() at vpanic+0x19d/frame 0xfffffe00cb698380
panic() at panic+0x43/frame 0xfffffe00cb6983e0
vm_map_splay_split() at vm_map_splay_split+0x35a/frame 0xfffffe00cb698430
vm_map_lookup_entry() at vm_map_lookup_entry+0xcc/frame 0xfffffe00cb698480
vm_map_lookup() at vm_map_lookup+0xe7/frame 0xfffffe00cb698570
vm_fault_hold() at vm_fault_hold+0x80/frame 0xfffffe00cb6986c0
vm_fault() at vm_fault+0x60/frame 0xfffffe00cb698700
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00cb698750
trap() at trap+0x2b4/frame 0xfffffe00cb698860
calltrap() at calltrap+0x8/frame 0xfffffe00cb698860
--- trap 0xc, rip = 0xffffffff810934cf, rsp = 0xfffffe00cb698930, rbp = 0xfffffe00cb698930 ---
copyinstr_nosmap() at copyinstr_nosmap+0x3f/frame 0xfffffe00cb698930
sys_setlogin() at sys_setlogin+0x62/frame 0xfffffe00cb698990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00cb698ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00cb698ab0
--- syscall (50, FreeBSD ELF64, sys_setlogin), rip = 0x80073d8aa, rsp = 0x7fffffffd8b8, rbp = 0x7fffffffec60 ---
KDB: enter: panic
[ thread pid 816 tid 100505 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r348726M: Thu Jun  6 08:34:05 CEST 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Optimize lookups for addresses not in the range of the vm_map.

Enhance a comment. Wrap a long line.

Move the test for address out-of-range from vm_map_entry_lookup to vm_map_lookup, in the belief that all the other callers to vm_map_entry_lookup have guards against passing out-of-range addresses.

I ran tests on D19826.58346.diff for 3 1/2 hours. This included a buildworld / installworld.
No problems seen.

vm_map.c
1043–1044

I would add that this code relies on the start and end addresses in &map->header.

1047

Why not pass "map" here instead of "root"? Then, the callers don't need to initialize their "io_{l,r}list" pointers.

1118

Style.

1125–1126

The existence of a blank line here was the result of there having been a "non-trivial comment" here, i.e., a comment of the form

/*
 * Blah, blah, ...
 */

Without such a comment, the blank line can be removed.

1130

Style.

1137

The existence of a blank line here was also the result of there having been a "non-trivial comment" above. Without such a comment, the blank line can be removed.

1677–1678

More often, there is no space between the ";"'s, when the second expression is empty and the first and/or third are not.

1678

This line is indented by one space too many.

4529–4530

Isn't this change unrelated to the rest of the patch?

In D19826#444019, @pho wrote:

I ran tests on D19826.58346.diff for 3 1/2 hours. This included a buildworld / installworld.
No problems seen.

Peter,

Was this testing done on a kernel with "options DIAGNOSTIC" or with the debug.vmmap_check sysctl set to one?

All testing I do is with:

options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
nooptions       DEADLKRES             # watchdogd handles this

and

debug.vmem_check=0
debug.vmmap_check=0

Should I do a full test (48 hours) with debug.vmmap_check=1?

In D19826#444469, @pho wrote:

All testing I do is with:

options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
nooptions       DEADLKRES             # watchdogd handles this

and

debug.vmem_check=0
debug.vmmap_check=0

Should I do a full test (48 hours) with debug.vmmap_check=1?

Yes, but I would suggest that you wait for Doug to update the patch based on my recent comments.

dougm marked 9 inline comments as done.

Apply reviewer comments. Pass 'map' to vm_map_splay_merge so that it can provide headers, and root can be assigned to map->root within the function.

vm_map.c
1047

I've some something similar for vm_map_splay_merge.

4529–4530

It's a good idea, and necessary to allow an assert I had in a couple of versions ago, but it's unrelated. I'll offer it in a separate patch.

Change one comment, one indentation, and swap one pair of function arguments in response to offline reviewer suggestions.

This revision is now accepted and ready to land.Jun 9 2019, 3:18 AM

I ran a full stress2 test with debug.vmmap_check=1. This included a buildworld / installworld.
No problems seen.

One month for the MFC timer would be okay.

This revision was automatically updated to reflect the committed changes.