Page MenuHomeFreeBSD

Handle address wrap errors in vm_map_find()
ClosedPublic

Authored by alc on Dec 2 2017, 8:49 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 21, 4:07 AM
Unknown Object (File)
Wed, Nov 20, 12:01 PM
Unknown Object (File)
Sun, Nov 17, 6:51 PM
Unknown Object (File)
Fri, Nov 15, 8:32 AM
Unknown Object (File)
Tue, Oct 29, 11:53 AM
Unknown Object (File)
Tue, Oct 29, 11:53 AM
Unknown Object (File)
Tue, Oct 29, 11:53 AM
Unknown Object (File)
Tue, Oct 29, 11:53 AM

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The splay performed by the initial call to vm_map_findspace() will produce a tree that has either (1) the entry before "*addr" at the root and the entry after "*addr" at the root's right child or (2) the entry after "*addr" at the root and the entry before "*addr" at the root's left child. Then, the splay performed by a redundant call to vm_map_findspace() will only perform a constant-time tweak to the tree structure. It will either change configuration #1 into configuration #2 or change configuration #2 into configuration #1. And, a configuration #1 will be handled by the following snippet of vm_map_findspace()

st = (start > map->root->end) ? start : map->root->end;
if (length <= map->root->end + map->root->adj_free - st) {
        *addr = st;
        return (0);

and a configuration #2 will be handled by the following snippet of vm_map_findspace()

if (start + length <= map->root->start) {
        *addr = start;
        return (0);

If you are concerned about a search for a missing tree element flipping the top two tree nodes for no good reason, as for example when looking for 1 in the set (0 2) and flipping the tree back and forth between

(root)0 -> 2

and
0 <- 2(root)

then I think a change like the one below, which makes one splay iteration take two steps down the tree always, when the tree is deep enough, instead of taking only one step down in the zig-zag case, would address your problem.

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index 905ac1b9313..0762da6bda9 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -917,7 +917,9 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                        y = root->left;
                        if (y == NULL)
                                break;
-                       if (addr < y->start && y->left != NULL) {
+                       if (addr < y->start) {
+                               if (y->left == NULL)
+                                       break;
                                /* Rotate right and put y on rlist. */
                                root->left = y->right;
                                y->right = root;
@@ -925,17 +927,29 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                                root = y->left;
                                y->left = rlist;
                                rlist = y;
+                       } else if (addr >= y->end) {
+                               if (y->right == NULL)
+                                       break;
+                               /* Put root on rlist, y on llist. */
+                               root->left = rlist;
+                               rlist = root;
+                               root = y->right;
+                               y->right = llist;
+                               llist = y;
                        } else {
                                /* Put root on rlist. */
                                root->left = rlist;
                                rlist = root;
                                root = y;
+                               break;
                        }
                } else if (addr >= root->end) {
                        y = root->right;
                        if (y == NULL)
                                break;
-                       if (addr >= y->end && y->right != NULL) {
+                       if (addr >= y->end) {
+                               if (y->right == NULL)
+                                       break;
                                /* Rotate left and put y on llist. */
                                root->right = y->left;
                                y->left = root;
@@ -943,11 +957,21 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root)
                                root = y->right;
                                y->right = llist;
                                llist = y;
+                       } else if (addr < y->start) {
+                               if (y->left == NULL)
+                                       break;
+                               /* Put root on llist, y on rlist. */
+                               root->right = llist;
+                               llist = root;
+                               root = y->left;
+                               y->left = rlist;
+                               rlist = y;
                        } else {
                                /* Put root on llist. */
                                root->right = llist;
                                llist = root;
                                root = y;
+                               break;
                        }
                } else
                        break;
This comment was removed by dougm.
This comment was removed by dougm.
vm/vm_map.c
1524 ↗(On Diff #36119)

After comparing my latest version and this patch, I think that the split of vm_map_find() into two functions provides much cleaner flow control. In your version, I would consider moving the loop to adjust *addr into a function, then break would transform into return (KERN_SUCCESS);. The vm_map_insert()/vm_map_stack() would be only called if result == KERN_SUCCESS.

I've applied the following patch and run a JVM workload before applying this change and after applying this change.

Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c (revision 326323)
+++ vm/vm_map.c (working copy)
@@ -873,6 +873,9 @@ vm_map_entry_set_max_free(vm_map_entry_t entry)
                entry->max_free = entry->right->max_free;
 }
 
+static long counter;
+SYSCTL_ULONG(_debug, OID_AUTO, counter, CTLFLAG_RD, &counter, 0, "");
+
 /*
  *     vm_map_entry_splay:
  *
@@ -911,7 +914,7 @@ vm_map_entry_splay(vm_offset_t addr, vm_map_entry_
         */
        llist = NULL;
        rlist = NULL;
-       for (;;) {
+       for (; atomic_add_long(&counter, 1), true;) {
                /* root is never NULL in here. */
                if (addr < root->start) {
                        y = root->left;

As I would expect, there is no discernible difference in the number of splay iterations:

x before
+ after
+------------------------------------------------------------------------------+
|                      x            + x                                        |
|*     x x   ++*+    * x       + +* + xx +    x   x   +      +  +         x   x|
|         |__|_________________A_MM_______________|_____|                      |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  15         44116         53654         48147       48088.6     2808.6388
+  15         44116         51873         48043     47870.267      2284.898
No difference proven at 95.0% confidence

Refactor the alignment code.

Tidy up. I have no further changes planned.

vm/vm_map.c
1491 ↗(On Diff #36170)

Perhaps note that alignment == 0 implies superpage alignment.

1507 ↗(On Diff #36170)

This call under the assertion just repeats the preceding line in the caller. IMO it is excessive.

Might be, instead, add a new function which checks that [start, start+length) fits into the map, with overflow checking. The function can be used also in vm_map_insert() and vm_map_stack_locked(), I believe. It might be done as the next step.

1514 ↗(On Diff #36170)

WIth the amount of implicitness for find_space, I think that vm_map_find() should start asserting the values for it. I.e., assert that the lowest byte of find_space if from the known list.

Update the comment at the head of the function.

alc marked an inline comment as done.Dec 4 2017, 6:22 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

This call under the assertion just repeats the preceding line in the caller. IMO it is excessive.

I expect (or fear) that someone else will use this function someday.

vm/vm_map.c
1507 ↗(On Diff #36170)

Do you mean vm_map_alignspace() or proposed check function ?

Assert that "find_space" has a defined value.

alc marked an inline comment as done.Dec 5 2017, 4:53 PM
vm/vm_map.c
1507 ↗(On Diff #36170)

Do you mean vm_map_alignspace() or proposed check function ?

vm_map_alignspace()

vm/vm_map.c
1507 ↗(On Diff #36170)

The proposed function would provide the same correctness guarantees as the full vm_map_findspace() call.

I do not insist, of course.

vm/vm_map.c
1584 ↗(On Diff #36246)

Doesn't this also need to accept VMFS_ALIGNED_SPACE?

Here are splay iteration counts for a "buildworld" before and after applying this change.

x buildworld_before
+ buildworld_after
+------------------------------------------------------------------------------+
|               +                                                              |
|x+x   + x   * +*  x   ++ ++              x                                   +|
||_|_________M_AM______A____|______________|                                   |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   7 4.5242531e+08 4.5257617e+08 4.5246820e+08 4.5247583e+08      50335.55
+  11 4.5242990e+08 4.5271037e+08 4.5248084e+08 4.5250497e+08     74015.642
No difference proven at 95.0% confidence

In short, the extra findspace calls are having no significant effect.

That said, as a follow up I want to try out Doug's suggestion. He is correct. We are, in fact, doing redundant comparisons, and I'm skeptical about any branch predictor being able to reliably optimize those redundant branches. However, we make assumptions about where the sought after "gap" appears in the binary search tree after a splay, so his initial proposed change above doesn't play nice with our code. He's given me a revised version from what is posted here.

Correct the assertion in the last revision to this change.

alc marked an inline comment as done.Dec 5 2017, 5:44 PM
alc added inline comments.
vm/vm_map.c
1584 ↗(On Diff #36246)

You are absolutely correct. I hadn't yet run a test with openjdk8, which now uses MAP_ALIGNED() to implement the -XX:+UseLargePages option.

alc marked an inline comment as done.Dec 6 2017, 4:06 PM

Peter, can you please test this change.

In D13346#279764, @alc wrote:

Peter, can you please test this change.

Happy to.

Doug's suggested change reduces the number of iterations in vm_map_entry_splay() during a "buildworld" workload by 41.5%. To be clear, we're still touching the same map entries; we're just avoiding redundant comparisons.

Doug can you create a review for that patch.

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

In D13346#280027, @alc wrote:

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

In D13346#280028, @alc wrote:
In D13346#280027, @alc wrote:

Kostik, I've been reviewing vm_map_find()'s callers, and I'm concerned about i915_gem_mmap_ioctl(). Consider this snippet:

        p = curproc;
        map = &p->p_vmspace->vm_map;
...
        addr = 0;
        vm_object_reference(obj->vm_obj);
        rv = vm_map_find(map, obj->vm_obj, args->offset, &addr, args->size, 0,

I believe that 0 is going to be out-of-range for a map on amd64, and so vm_map_find(), either with or without this revision, is going to return an error. Am I missing something here?

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

So do you prefer to change gem code to do the same ?

In D13346#280055, @kib wrote:
In D13346#280028, @alc wrote:

Nevermind. I forgot that vm_map_findspace() handles this case. It updates the given address to be at least vm_map_min(). In general, other callers don't pass 0, but themselves initialize the given address to vm_map_min().

So do you prefer to change gem code to do the same ?

No, I think that the gem code is fine as is. Rather, I think that the comment describing vm_map_find() should explicitly address the handling of an out-of-range "hint". However, I'd prefer not to make any more changes as part of this revision.

In D13346#279794, @pho wrote:
In D13346#279764, @alc wrote:

Peter, can you please test this change.

Happy to.

I ran all of the stress2 tests on amd64. No problems seen.

Are there any other comments or questions about this change?

This revision is now accepted and ready to land.Dec 25 2017, 9:41 PM
This revision was automatically updated to reflect the committed changes.