Complete zig-zag splay, rather than zigging and looping again
Needs ReviewPublic

Authored by dougm_rice.edu on Dec 6 2017, 6:42 PM.

Details

Reviewers
alc
kib
markj
Summary

The popular splay implementation adapted for use in vm_map.c can be made to do fewer comparisons and null checks, at the cost of a bit more code, by recognizing that when the next two steps are in opposite directions, the current and next nodes can be attached to left and right subtrees being assembled, rather than simply attaching the current and re-iterating on the next. This also means that a search for a missing element between the root and its predecessor or successor will not restructure the tree.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
dougm_rice.edu created this revision.Dec 6 2017, 6:42 PM
alc added inline comments.Dec 7 2017, 5:25 PM
sys/vm/vm_map.c
928

This can also be a redundant comparison if addr < y->start but y->left == NULL.

dougm_rice.edu added inline comments.Dec 7 2017, 6:26 PM
sys/vm/vm_map.c
928

This would get rid of your redundant comparison. Do you prefer it?

			if (addr < y->start) {
				if (y->left != NULL) {
					/* Rotate right and put y on rlist. */
					root->left = y->right;
					y->right = root;
					vm_map_entry_set_max_free(root);
					root = y->left;
					y->left = rlist;
					rlist = y;
					continue;
				}
			} else if (addr >= y->end) {
				if (y->right == NULL)
					break;
				/* Put root on rlist, y on llist. */
				root->left = rlist;
				rlist = root;
				root = y->right;
				y->right = llist;
				llist = y;
				continue;
			}
			/* Put root on rlist. */
			root->left = rlist;
			rlist = root;
			root = y;
			break;

Adding code to eliminate a redundant comparison.

alc added a comment.Wed, Dec 27, 8:50 PM

I've tested the patch with compiler.compiler from SPECjvm2008. I'm afraid that there is no discernible difference in the number of cycles spent in vm_map_entry_splay().

x before
+ after
+-------------------------------------------------------------------------------+
|                                                +                              |
|+                 + x   + x    +x      xx   x   +x x   x   +            +     +|
|                |___________|__________AM_A_____M__|________________|          |
+-------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   9      20905598      23828774      22610639      22557854      994238.6
+   9      19257738      25797772      23263976      22777334     2164232.2
No difference proven at 95.0% confidence
alc added a comment.Wed, Dec 27, 8:52 PM

That said, the increase in the variance is striking.

alc added a comment.Thu, Dec 28, 7:00 AM

I see an 18.15% reduction in cycles spent in vm_map_entry_splay() for buildworld.

x before_world
+ after_world
+------------------------------------------------------------------------------+
|+                                                                            x|
|+                                                                            x|
|+                                                                            x|
|++                                                                          xx|
|A|                                                                          |A|
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5 4.8476252e+10 4.8601982e+10 4.8593236e+10 4.8570385e+10      53214127
+   5 3.9698544e+10  3.983885e+10 3.9745925e+10 3.9755596e+10      51061053
Difference at 95.0% confidence
        -8.81479e+09 +/- 7.60559e+07
        -18.1485% +/- 0.156589%
        (Student's t, pooled s = 5.21487e+07)
alc added a comment.Fri, Dec 29, 9:48 PM

A problem with using any SPECjvm2008 test for evaluating this patch is that by default the SPECjvm2008 tests run for a fixed amount of time. Essentially, speeding up vm_map_entry_splay() just means that there will be more calls to vm_map_entry_splay(), and so it's not entirely surprisingly that there is no statistical difference in the number of cycles spent in vm_map_entry_splay().

alc added a comment.Fri, Dec 29, 9:54 PM

Here are results from a different Java application.

x javamark_1.5GB_before
+ javamark_1.5GB_after
+--------------------------------------------------------------------------------------------------+
|+                          + +                    +            x                    +xx   x x  x +|
|           |____________________________________A_M_______________________|__________|____M______||
+--------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   6       7180627       7524006       7463407     7416031.3     121934.89
+   6       6513706       7541927       7042828     7020509.2     391280.84
Difference at 95.0% confidence
        -395522 +/- 372781
        -5.33334% +/- 5.02669%
        (Student's t, pooled s = 289801)

On a failed lookup, stop trying to avoid pulling both neighbors of the missing item to the top of the tree.

Save the new 'root' value in a variable 'z' to avoid having to access the same child of y twice, once for a null check and later for assignment to root.

Fixup comments, and whitespace.

dougm_rice.edu added inline comments.Mon, Jan 1, 7:01 PM
sys/vm/vm_map.c
932

This case, and the symmetric 'rotate left' case below, are the only cases in this splay code in which a node is modified, but not put on either rlist or llist. We know that max_free is calculated for those nodes in the second pass up the tree. Do we need to recompute max_free for this case, for 'root'?

Yes. Suppose that at the top of the loop 'd' was root in a tree like this:
((A b C) d E)
with the max_free value at d being the max of the entry values b and d and the max_free values of the subtrees A C and E. That is, D = max(A,b,C,d,E). After the restructuring, d is the root of a smaller subtree:
(C d E)
and so the new max_free value for D = max(C,d,E). If either A or b were big, then D has been reduced by this transformation and must be recalculated.

We know that no changes are happening to the subtrees C and E the rest of the splay, as the next root value is b and further changes happen in only that subtree.

dougm_rice.edu added inline comments.Mon, Jan 1, 7:17 PM
sys/vm/vm_map.c
932

Change 'b' in the last line to 'a'.