Handle address wrap errors in vm_map_find()
ClosedPublic

Authored by alc on Dec 2 2017, 8:49 PM.

Diff Detail

Repository
rS FreeBSD src repository
alc created this revision.Dec 2 2017, 8:49 PM
alc added a comment.EditedDec 2 2017, 9:34 PM

The splay performed by the initial call to vm_map_findspace() will produce a tree that has either (1) the entry before "*addr" as the root and the entry after "*addr" as the root's right child or (2) the entry after "*addr" as the root and the entry before "*addr" as the root's left child. Then, the splay performed by a redundant call to vm_map_findspace() will only perform a constant-time tweak to the tree structure: it will either change configuration #1 into configuration #2 or vice versa. Configuration #1 is handled by the following snippet of vm_map_findspace()

st = (start > map->root->end) ? start : map->root->end;
if (length <= map->root->end + map->root->adj_free - st) {
        *addr = st;
        return (0);
}

and a configuration #2 will be handled by the following snippet of vm_map_findspace()

if (start + length <= map->root->start) {
        *addr = start;
        return (0);
}

dougm_rice.edu added a comment.EditedDec 2 2017, 9:50 PM

If you are concerned about a search for a missing tree element flipping the top two tree nodes for no good reason, as for example when looking for 1 in the set (0 2) and flipping the tree back and forth between

(root)0 -> 2

and
0 <- 2(root)

then I think a change like the one below would address your problem: it makes each splay iteration always take two steps down the tree (when the tree is deep enough), instead of only one step in the zig-zag case.

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index 905ac1b9313..0762da6bda9 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -917,7 +917,9 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                        y = root->left;
                        if (y == NULL)
                                break;
-                       if (addr < y->start && y->left != NULL) {
+                       if (addr < y->start) {
+                               if (y->left == NULL)
+                                       break;
                                /* Rotate right and put y on rlist. */
                                root->left = y->right;
                                y->right = root;
@@ -925,17 +927,29 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                                root = y->left;
                                y->left = rlist;
                                rlist = y;
+                       } else if (addr >= y->end) {
+                               if (y->right == NULL)
+                                       break;
+                               /* Put root on rlist, y on llist. */
+                               root->left = rlist;
+                               rlist = root;
+                               root = y->right;
+                               y->right = llist;
+                               llist = y;
                        } else {
                                /* Put root on rlist. */
                                root->left = rlist;
                                rlist = root;
                                root = y;
+                               break;
                        }
                } else if (addr >= root->end) {
                        y = root->right;
                        if (y == NULL)
                                break;
-                       if (addr >= y->end && y->right != NULL) {
+                       if (addr >= y->end) {
+                               if (y->right == NULL)
+                                       break;
                                /* Rotate left and put y on llist. */
                                root->right = y->left;
                                y->left = root;
@@ -943,11 +957,21 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                                root = y->right;
                                y->right = llist;
                                llist = y;
+                       } else if (addr < y->start) {
+                               if (y->left == NULL)
+                                       break;
+                               /* Put root on llist, y on rlist. */
+                               root->right = llist;
+                               llist = root;
+                               root = y->left;
+                               y->left = rlist;
+                               rlist = y;
                        } else {
                                /* Put root on llist. */
                                root->right = llist;
                                llist = root;
                                root = y;
+                               break;
                        }
                } else
                        break;
kib added inline comments.Dec 3 2017, 9:33 AM
vm/vm_map.c
1524 ↗(On Diff #36119)

After comparing my latest version and this patch, I think that the split of vm_map_find() into two functions provides much cleaner flow control. In your version, I would consider moving the loop that adjusts *addr into a function; then break would become return (KERN_SUCCESS);, and vm_map_insert()/vm_map_stack() would only be called if result == KERN_SUCCESS.

alc added a comment.Dec 3 2017, 6:32 PM

I've applied the following patch and run a JVM workload both before and after applying this change.

Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c (revision 326323)
+++ vm/vm_map.c (working copy)
@@ -873,6 +873,9 @@ vm_map_entry_set_max_free(vm_map_entry_t entry)
                entry->max_free = entry->right->max_free;
 }
 
+static long counter;
+SYSCTL_ULONG(_debug, OID_AUTO, counter, CTLFLAG_RD, &counter, 0, "");
+
 /*
  *     vm_map_entry_splay:
  *
@@ -911,7 +914,7 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_
         */
        llist = NULL;
        rlist = NULL;
-       for (;;) {
+       for (; atomic_add_long(&counter, 1), true;) {
                /* root is never NULL in here. */
                if (addr < root->start) {
                        y = root->left;

As I would expect, there is no discernible difference in the number of splay iterations:

x before
+ after
+------------------------------------------------------------------------------+
|                      x            + x                                        |
|*     x x   ++*+    * x       + +* + xx +    x   x   +      +  +         x   x|
|         |__|_________________A_MM_______________|_____|                      |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  15         44116         53654         48147       48088.6     2808.6388
+  15         44116         51873         48043     47870.267      2284.898
No difference proven at 95.0% confidence
alc updated this revision to Diff 36166.Dec 3 2017, 8:48 PM

Refactor the alignment code.

alc updated this revision to Diff 36170.Dec 4 2017, 1:11 AM

Tidy up. I have no further changes planned.

kib added inline comments.Dec 4 2017, 10:04 AM
vm/vm_map.c
1491 ↗(On Diff #36170)

Perhaps note that alignment == 0 implies superpage alignment.

1507 ↗(On Diff #36170)

This call under the assertion just repeats the preceding line in the caller. IMO it is excessive.

Instead, perhaps add a new function that checks, with overflow checking, that [start, start+length) fits into the map. The function could also be used in vm_map_insert() and vm_map_stack_locked(), I believe. That might be done as a next step.

1514 ↗(On Diff #36170)

With the amount of implicitness for find_space, I think that vm_map_find() should start asserting the values for it, i.e., assert that the lowest byte of find_space is from the known list.

alc updated this revision to Diff 36196.Dec 4 2017, 6:22 PM

Update the comment at the head of the function.

alc marked an inline comment as done.Dec 4 2017, 6:22 PM
alc added inline comments.Dec 4 2017, 7:42 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

This call under the assertion just repeats the preceding line in the caller. IMO it is excessive.

I expect (or fear) that someone else will use this function someday.

kib added inline comments.Dec 4 2017, 8:24 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

Do you mean vm_map_alignspace() or the proposed check function?

emaste added a subscriber: emaste.Dec 5 2017, 3:13 PM
alc updated this revision to Diff 36246.Dec 5 2017, 4:52 PM

Assert that "find_space" has a defined value.

alc marked an inline comment as done.Dec 5 2017, 4:53 PM
alc added inline comments.Dec 5 2017, 4:58 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

Do you mean vm_map_alignspace() or the proposed check function?

vm_map_alignspace()

kib added inline comments.Dec 5 2017, 5:12 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

The proposed function would provide the same correctness guarantees as the full vm_map_findspace() call.

I do not insist, of course.

markj added inline comments.Dec 5 2017, 5:13 PM
vm/vm_map.c
1584 ↗(On Diff #36246)

Doesn't this also need to accept VMFS_ALIGNED_SPACE?

alc added a comment.Dec 5 2017, 5:33 PM

Here are splay iteration counts for a "buildworld" before and after applying this change.

x buildworld_before
+ buildworld_after
+------------------------------------------------------------------------------+
|               +                                                              |
|x+x   + x   * +*  x   ++ ++              x                                   +|
||_|_________M_AM______A____|______________|                                   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   7 4.5242531e+08 4.5257617e+08 4.5246820e+08 4.5247583e+08      50335.55
+  11 4.5242990e+08 4.5271037e+08 4.5248084e+08 4.5250497e+08     74015.642
No difference proven at 95.0% confidence

In short, the extra findspace calls are having no significant effect.

That said, as a follow-up I want to try out Doug's suggestion. He is correct: we are, in fact, doing redundant comparisons, and I'm skeptical that any branch predictor can reliably optimize those redundant branches. However, we make assumptions about where the sought-after "gap" appears in the binary search tree after a splay, so his initial proposed change above doesn't play nicely with our code. He's given me a revised version of what is posted here.

alc updated this revision to Diff 36247.Dec 5 2017, 5:41 PM

Correct the assertion in the last revision to this change.

alc marked an inline comment as done.Dec 5 2017, 5:44 PM
alc added inline comments.
vm/vm_map.c
1584 ↗(On Diff #36246)

You are absolutely correct. I hadn't yet run a test with openjdk8, which now uses MAP_ALIGNED() to implement the -XX:+UseLargePages option.

alc marked an inline comment as done.Dec 6 2017, 4:06 PM
alc added a subscriber: pho.Dec 6 2017, 4:11 PM

Peter, can you please test this change.

pho added a comment.Dec 6 2017, 4:35 PM
In D13346#279764, @alc wrote:

Peter, can you please test this change.

Happy to.

alc added a comment.Dec 6 2017, 6:16 PM

Doug's suggested change reduces the number of iterations in vm_map_entry_splay() during a "buildworld" workload by 41.5%. To be clear, we're still touching the same map entries; we're just avoiding redundant comparisons.

Doug, can you create a review for that patch?

alc added a comment.Dec 7 2017, 5:39 AM

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

alc added a comment.Dec 7 2017, 6:08 AM
In D13346#280027, @alc wrote:

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

kib added a comment.Dec 7 2017, 9:25 AM
In D13346#280028, @alc wrote:
In D13346#280027, @alc wrote:

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

So do you prefer to change gem code to do the same ?

alc added a comment.EditedDec 7 2017, 4:14 PM
In D13346#280055, @kib wrote:
In D13346#280028, @alc wrote:

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

So do you prefer to change gem code to do the same ?

No, I think that the gem code is fine as is. Rather, I think that the comment describing vm_map_find() should explicitly address the handling of an out-of-range "hint". However, I'd prefer not to make any more changes as part of this revision.

pho added a comment.Dec 7 2017, 7:18 PM
In D13346#279794, @pho wrote:
In D13346#279764, @alc wrote:

Peter, can you please test this change.

Happy to.

I ran all of the stress2 tests on amd64. No problems seen.

alc added a comment.Mon, Dec 25, 8:59 PM

Are there any other comments or questions about this change?

kib accepted this revision.Mon, Dec 25, 9:41 PM
This revision is now accepted and ready to land.Mon, Dec 25, 9:41 PM
This revision was automatically updated to reflect the committed changes.