Complete zig-zag splay, rather than zigging and looping again

Authored by on Dec 6 2017, 6:42 PM.



The popular splay implementation adapted for use in vm_map.c can be made to do fewer comparisons and null checks, at the cost of a bit more code, by recognizing that when the next two steps are in opposite directions, the current and next nodes can be attached to left and right subtrees being assembled, rather than simply attaching the current and re-iterating on the next. This also means that a search for a missing element between the root and its predecessor or successor will not restructure the tree.

Diff Detail

Lint Skipped
Unit Tests Skipped created this revision.Dec 6 2017, 6:42 PM
alc added inline comments.Dec 7 2017, 5:25 PM

This can also be a redundant comparison if addr < y->start but y->left == NULL. added inline comments.Dec 7 2017, 6:26 PM

This would get rid of your redundant comparison. Do you prefer it?

			if (addr < y->start) {
				if (y->left != NULL) {
					/* Rotate right and put y on rlist. */
					root->left = y->right;
					y->right = root;
					root = y->left;
					y->left = rlist;
					rlist = y;
			} else if (addr >= y->end) {
				if (y->right == NULL)
				/* Put root on rlist, y on llist. */
				root->left = rlist;
				rlist = root;
				root = y->right;
				y->right = llist;
				llist = y;
			/* Put root on rlist. */
			root->left = rlist;
			rlist = root;
			root = y;

Adding code to eliminate a redundant comparison.

alc added a comment.Dec 27 2017, 8:50 PM

I've tested the patch with compiler.compiler from SPECjvm2008. I'm afraid that there is no discernible difference in the number of cycles spent in vm_map_entry_splay().

x before
+ after
|                                                +                              |
|+                 + x   + x    +x      xx   x   +x x   x   +            +     +|
|                |___________|__________AM_A_____M__|________________|          |
    N           Min           Max        Median           Avg        Stddev
x   9      20905598      23828774      22610639      22557854      994238.6
+   9      19257738      25797772      23263976      22777334     2164232.2
No difference proven at 95.0% confidence
alc added a comment.Dec 27 2017, 8:52 PM

That said, the increase in the variance is striking.

alc added a comment.Dec 28 2017, 7:00 AM

I see an 18.15% reduction in cycles spent in vm_map_entry_splay() for buildworld.

x before_world
+ after_world
|+                                                                            x|
|+                                                                            x|
|+                                                                            x|
|++                                                                          xx|
|A|                                                                          |A|
    N           Min           Max        Median           Avg        Stddev
x   5 4.8476252e+10 4.8601982e+10 4.8593236e+10 4.8570385e+10      53214127
+   5 3.9698544e+10  3.983885e+10 3.9745925e+10 3.9755596e+10      51061053
Difference at 95.0% confidence
        -8.81479e+09 +/- 7.60559e+07
        -18.1485% +/- 0.156589%
        (Student's t, pooled s = 5.21487e+07)
alc added a comment.Dec 29 2017, 9:48 PM

A problem with using any SPECjvm2008 test for evaluating this patch is that by default the SPECjvm2008 tests run for a fixed amount of time. Essentially, speeding up vm_map_entry_splay() just means that there will be more calls to vm_map_entry_splay(), and so it's not entirely surprisingly that there is no statistical difference in the number of cycles spent in vm_map_entry_splay().

alc added a comment.Dec 29 2017, 9:54 PM

Here are results from a different Java application.

x javamark_1.5GB_before
+ javamark_1.5GB_after
|+                          + +                    +            x                    +xx   x x  x +|
|           |____________________________________A_M_______________________|__________|____M______||
    N           Min           Max        Median           Avg        Stddev
x   6       7180627       7524006       7463407     7416031.3     121934.89
+   6       6513706       7541927       7042828     7020509.2     391280.84
Difference at 95.0% confidence
        -395522 +/- 372781
        -5.33334% +/- 5.02669%
        (Student's t, pooled s = 289801)

On a failed lookup, stop trying to avoid pulling both neighbors of the missing item to the top of the tree.

Save the new 'root' value in a variable 'z' to avoid having to access the same child of y twice, once for a null check and later for assignment to root.

Fixup comments, and whitespace. added inline comments.Jan 1 2018, 7:01 PM

This case, and the symmetric 'rotate left' case below, are the only cases in this splay code in which a node is modified, but not put on either rlist or llist. We know that max_free is calculated for those nodes in the second pass up the tree. Do we need to recompute max_free for this case, for 'root'?

Yes. Suppose that at the top of the loop 'd' was root in a tree like this:
((A b C) d E)
with the max_free value at d being the max of the entry values b and d and the max_free values of the subtrees A C and E. That is, D = max(A,b,C,d,E). After the restructuring, d is the root of a smaller subtree:
(C d E)
and so the new max_free value for D = max(C,d,E). If either A or b were big, then D has been reduced by this transformation and must be recalculated.

We know that no changes are happening to the subtrees C and E the rest of the splay, as the next root value is b and further changes happen in only that subtree. added inline comments.Jan 1 2018, 7:17 PM

Change 'b' in the last line to 'a'. abandoned this revision.Jan 26 2018, 8:31 PM