Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
The splay performed by the initial call to vm_map_findspace() will produce a tree that has either (1) the entry before "*addr" at the root and the entry after "*addr" at the root's right child or (2) the entry after "*addr" at the root and the entry before "*addr" at the root's left child. Then, the splay performed by a redundant call to vm_map_findspace() will only perform a constant-time tweak to the tree structure. It will either change configuration #1 into configuration #2 or change configuration #2 into configuration #1. And, a configuration #1 will be handled by the following snippet of vm_map_findspace()
st = (start > map->root->end) ? start : map->root->end; if (length <= map->root->end + map->root->adj_free - st) { *addr = st; return (0);
and a configuration #2 will be handled by the following snippet of vm_map_findspace()
if (start + length <= map->root->start) { *addr = start; return (0);
If you are concerned about a search for a missing tree element flipping the top two tree nodes for no good reason, as for example when looking for 1 in the set (0 2) and flipping the tree back and forth between
(root)0 -> 2
and
0 <- 2(root)
then I think a change like the one below, which makes one splay iteration take two steps down the tree always, when the tree is deep enough, instead of taking only one step down in the zig-zag case, would address your problem.
diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c index 905ac1b9313..0762da6bda9 100644 --- a/sys/vm/vm_map.c +++ b/sys/vm/vm_map.c @@ -917,7 +917,9 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root) y = root->left; if (y == NULL) break; - if (addr < y->start && y->left != NULL) { + if (addr < y->start) { + if (y->left == NULL) + break; /* Rotate right and put y on rlist. */ root->left = y->right; y->right = root; @@ -925,17 +927,29 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root) root = y->left; y->left = rlist; rlist = y; + } else if (addr >= y->end) { + if (y->right == NULL) + break; + /* Put root on rlist, y on llist. */ + root->left = rlist; + rlist = root; + root = y->right; + y->right = llist; + llist = y; } else { /* Put root on rlist. */ root->left = rlist; rlist = root; root = y; + break; } } else if (addr >= root->end) { y = root->right; if (y == NULL) break; - if (addr >= y->end && y->right != NULL) { + if (addr >= y->end) { + if (y->right == NULL) + break; /* Rotate left and put y on llist. */ root->right = y->left; y->left = root; @@ -943,11 +957,21 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root) root = y->right; y->right = llist; llist = y; + } else if (addr < y->start) { + if (y->left == NULL) + break; + /* Put root on llist, y on rlist. */ + root->right = llist; + llist = root; + root = y->left; + y->left = rlist; + rlist = y; } else { /* Put root on llist. */ root->right = llist; llist = root; root = y; + break; } } else break;
vm/vm_map.c | ||
---|---|---|
1524 | After comparing my latest version and this patch, I think that the split of vm_map_find() into two functions provides much cleaner flow control. In your version, I would consider moving the loop to adjust *addr into a function, then break would transform into return (KERN_SUCCESS);. The vm_map_insert()/vm_map_stack() would be only called if result == KERN_SUCCESS. |
I've applied the following patch and run a JVM workload before applying this change and after applying this change.
Index: vm/vm_map.c =================================================================== --- vm/vm_map.c (revision 326323) +++ vm/vm_map.c (working copy) @@ -873,6 +873,9 @@ vm_map_entry_set_max_free(vm_map_entry_t entry) entry->max_free = entry->right->max_free; } +static long counter; +SYSCTL_ULONG(_debug, OID_AUTO, counter, CTLFLAG_RD, &counter, 0, ""); + /* * vm_map_entry_splay: * @@ -911,7 +914,7 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_ */ llist = NULL; rlist = NULL; - for (;;) { + for (; atomic_add_long(&counter, 1), true;) { /* root is never NULL in here. */ if (addr < root->start) { y = root->left;
As I would expect, there is no discernible difference in the number of splay iterations:
x before + after +------------------------------------------------------------------------------+ | x + x | |* x x ++*+ * x + +* + xx + x x + + + x x| | |__|_________________A_MM_______________|_____| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 15 44116 53654 48147 48088.6 2808.6388 + 15 44116 51873 48043 47870.267 2284.898 No difference proven at 95.0% confidence
vm/vm_map.c | ||
---|---|---|
1491 | Perhaps note that alignment == 0 implies superpage alignment. | |
1507 | This call under the assertion just repeats the preceding line in the caller. IMO it is excessive. Might be, instead, add a new function which checks that [start, start+length) fits into the map, with overflow checking. The function can be used also in vm_map_insert() and vm_map_stack_locked(), I believe. It might be done as the next step. | |
1514 | WIth the amount of implicitness for find_space, I think that vm_map_find() should start asserting the values for it. I.e., assert that the lowest byte of find_space if from the known list. |
vm/vm_map.c | ||
---|---|---|
1507 |
I expect (or fear) that someone else will use this function someday. |
vm/vm_map.c | ||
---|---|---|
1507 | Do you mean vm_map_alignspace() or proposed check function ? |
vm/vm_map.c | ||
---|---|---|
1507 |
vm_map_alignspace() |
vm/vm_map.c | ||
---|---|---|
1507 | The proposed function would provide the same correctness guarantees as the full vm_map_findspace() call. I do not insist, of course. |
vm/vm_map.c | ||
---|---|---|
1584 | Doesn't this also need to accept VMFS_ALIGNED_SPACE? |
Here are splay iteration counts for a "buildworld" before and after applying this change.
x buildworld_before + buildworld_after +------------------------------------------------------------------------------+ | + | |x+x + x * +* x ++ ++ x +| ||_|_________M_AM______A____|______________| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 7 4.5242531e+08 4.5257617e+08 4.5246820e+08 4.5247583e+08 50335.55 + 11 4.5242990e+08 4.5271037e+08 4.5248084e+08 4.5250497e+08 74015.642 No difference proven at 95.0% confidence
In short, the extra findspace calls are having no significant effect.
That said, as a follow up I want to try out Doug's suggestion. He is correct. We are, in fact, doing redundant comparisons, and I'm skeptical about any branch predictor being able to reliably optimize those redundant branches. However, we make assumptions about where the sought after "gap" appears in the binary search tree after a splay, so his initial proposed change above doesn't play nice with our code. He's given me a revised version from what is posted here.
vm/vm_map.c | ||
---|---|---|
1584 | You are absolutely correct. I hadn't yet run a test with openjdk8, which now uses MAP_ALIGNED() to implement the -XX:+UseLargePages option. |
Doug's suggested change reduces the number of iterations in vm_map_entry_splay() during a "buildworld" workload by 41.5%. To be clear, we're still touching the same map entries; we're just avoiding redundant comparisons.
Doug can you create a review for that patch.
Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:
p = curproc; map = &p->p_vmspace->vm_map; ... addr = 0; vm_object_reference(obj->vm_obj); rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,
I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?
Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().
No, I think that the gem code is fine as is. Rather, I think that the comment describing vm_map_find() should explicitly address the handling of an out-of-range "hint". However, I'd prefer not to make any more changes as part of this revision.