Replace a few utility functions for vm_map manipulation with macros, and add a boolean status argument to give the compiler a chance to generate better code.
Details
On two tests that exercise the vm_map vigorously, these are the runtimes and dc-misses with and without the change in place:
After:

Test 1 seconds | Test 1 dc-misses | Test 2 seconds | Test 2 dc-misses
---|---|---|---
17.518129 | 719615104 | 16.613336 | 694618946
17.544017 | 717358163 | 16.560833 | 695521981
17.498316 | 722645549 | 16.519886 | 693421798
17.481360 | 723478804 | 16.559717 | 689488954
17.505018 | 720791939 | 16.494966 | 689094378

Before:

Test 1 seconds | Test 1 dc-misses | Test 2 seconds | Test 2 dc-misses
---|---|---|---
18.448940 | 697424367 | 17.487485 | 680852142
18.520613 | 700154192 | 17.488678 | 670644687
18.369913 | 696744015 | 17.553468 | 668572905
18.412856 | 704237817 | 17.457770 | 664390472
18.513684 | 700244077 | 17.480940 | 664748569
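
To compare the two sets of runs at a glance, the reported runtimes can be averaged and expressed as a relative reduction. The following is only a sketch: the array and function names are made up for illustration, and the values are copied from the runs listed above.

```c
#include <stddef.h>

#define	NRUNS	5

/* Runtimes in seconds, copied from the measurements above. */
const double test1_after[NRUNS] = {
	17.518129, 17.544017, 17.498316, 17.481360, 17.505018
};
const double test1_before[NRUNS] = {
	18.448940, 18.520613, 18.369913, 18.412856, 18.513684
};
const double test2_after[NRUNS] = {
	16.613336, 16.560833, 16.519886, 16.559717, 16.494966
};
const double test2_before[NRUNS] = {
	17.487485, 17.488678, 17.553468, 17.457770, 17.480940
};

/* Arithmetic mean of n samples. */
double
mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return (sum / (double)n);
}

/* Percentage by which 'after' is lower than 'before'. */
double
pct_reduction(double before, double after)
{
	return (100.0 * (before - after) / before);
}
```

Calling `pct_reduction(mean(test1_before, NRUNS), mean(test1_after, NRUNS))` gives the relative runtime change for Test 1; the dc-misses columns can be handled the same way.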
Diff Detail
- Lint: Lint Skipped
- Unit: Tests Skipped
Event Timeline
Do you get similar results by using C functions with the __always_inline attribute instead of macros?
Replace the macros with __always_inline functions, which preserves, and perhaps slightly enhances, the performance benefit.
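
For readers unfamiliar with the trade-off, here is a minimal sketch of the same tree-descent step written once as a macro and once as a forced-inline function. It is not the vm_map code: `struct node`, `SKETCH_STEP`, `sketch_step`, and both lookup functions are hypothetical, and `SKETCH_ALWAYS_INLINE` is a stand-in (assuming GCC or Clang) for the `__always_inline` annotation FreeBSD provides.

```c
#include <stddef.h>

/* Stand-in for FreeBSD's __always_inline; assumes GCC or Clang. */
#define	SKETCH_ALWAYS_INLINE	static inline __attribute__((__always_inline__))

/* Hypothetical binary search tree node. */
struct node {
	struct node *left;
	struct node *right;
	int key;
};

/* Macro form: move n one level toward key k, in place. */
#define	SKETCH_STEP(n, k)						\
	((n) = ((k) < (n)->key) ? (n)->left : (n)->right)

/* Function form: the same step, with type-checked arguments. */
SKETCH_ALWAYS_INLINE struct node *
sketch_step(struct node *n, int k)
{
	return (k < n->key ? n->left : n->right);
}

/* Lookup built on the macro. */
struct node *
lookup_macro(struct node *n, int k)
{
	while (n != NULL && n->key != k)
		SKETCH_STEP(n, k);
	return (n);
}

/* Lookup built on the forced-inline function. */
struct node *
lookup_inline(struct node *n, int k)
{
	while (n != NULL && n->key != k)
		n = sketch_step(n, k);
	return (n);
}
```

Both lookups return the same node for any tree and key. The function form evaluates each argument exactly once and is visible to a debugger, while the attribute asks the compiler to expand the body at every call site, as a macro would.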
sys/vm/vm_map.c | ||
---|---|---|
1093 | Isn't *found true iff root != NULL? Why do we need found at all? |
sys/vm/vm_map.c | ||
---|---|---|
1093 | I thought I was saving a root != NULL test and letting the break cases in LEFT_STEP/RIGHT_STEP go directly to handling the root != NULL cases without testing. But I guess I was wrong. After stripping out 'found', I get 17.482829 seconds and 719261050 dc-misses on Test 1, and 16.607015 seconds and 704332901 dc-misses on Test 2, which looks about the same. So I'll take it out. | |
1102 | I hear you ask - why not just keep this a simple while loop? Well, because it can't be a simple while loop in the threaded tree, and I want the final patch that switches from not-threaded to threaded to be one that can be easily understood, if I ever get there. |
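
The point raised at line 1093, that a separate `found` flag carries no information beyond `root != NULL`, can be illustrated with a small sketch. This is a hypothetical search routine, not the function under review: `struct entry`, `find_with_flag`, and `find` are names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node, standing in for a map entry. */
struct entry {
	struct entry *left;
	struct entry *right;
	int key;
};

/* Variant that reports success through an extra out-parameter. */
struct entry *
find_with_flag(struct entry *root, int key, bool *found)
{
	while (root != NULL && root->key != key)
		root = key < root->key ? root->left : root->right;
	*found = (root != NULL);	/* always mirrors the return value */
	return (root);
}

/* Variant without the flag: the caller tests the pointer instead. */
struct entry *
find(struct entry *root, int key)
{
	while (root != NULL && root->key != key)
		root = key < root->key ? root->left : root->right;
	return (root);
}
```

In this sketch `*found` is true exactly when the returned pointer is non-NULL, so a caller of `find` loses nothing by testing the pointer directly.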