Page MenuHomeFreeBSD

Complete zig-zag splay, rather than zigging and looping again
AbandonedPublic

Authored by dougm on Dec 6 2017, 6:42 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 17, 4:05 PM
Unknown Object (File)
Fri, Mar 29, 5:05 AM
Unknown Object (File)
Feb 28 2024, 8:11 AM
Unknown Object (File)
Dec 24 2023, 8:25 PM
Unknown Object (File)
Dec 24 2023, 4:53 PM
Unknown Object (File)
Dec 24 2023, 3:37 PM
Unknown Object (File)
Nov 29 2023, 11:52 PM
Unknown Object (File)
Nov 6 2023, 7:06 PM
Subscribers
None

Details

Reviewers
alc
kib
markj
Summary

The popular splay implementation adapted for use in vm_map.c can be made to do fewer comparisons and null checks, at the cost of a bit more code, by recognizing that when the next two steps are in opposite directions, the current and next nodes can be attached to left and right subtrees being assembled, rather than simply attaching the current and re-iterating on the next. This also means that a search for a missing element between the root and its predecessor or successor will not restructure the tree.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

sys/vm/vm_map.c
928

This can also be a redundant comparison if addr < y->start but y->left == NULL.

sys/vm/vm_map.c
928

This would get rid of your redundant comparison. Do you prefer it?

			if (addr < y->start) {
				if (y->left != NULL) {
					/* Rotate right and put y on rlist. */
					root->left = y->right;
					y->right = root;
					vm_map_entry_set_max_free(root);
					root = y->left;
					y->left = rlist;
					rlist = y;
					continue;
				}
			} else if (addr >= y->end) {
				if (y->right == NULL)
					break;
				/* Put root on rlist, y on llist. */
				root->left = rlist;
				rlist = root;
				root = y->right;
				y->right = llist;
				llist = y;
				continue;
			}
			/* Put root on rlist. */
			root->left = rlist;
			rlist = root;
			root = y;
			break;

Adding code to eliminate a redundant comparison.

I've tested the patch with compiler.compiler from SPECjvm2008. I'm afraid that there is no discernible difference in the number of cycles spent in vm_map_entry_splay().

x before
+ after
+-------------------------------------------------------------------------------+
|                                                +                              |
|+                 + x   + x    +x      xx   x   +x x   x   +            +     +|
|                |___________|__________AM_A_____M__|________________|          |
+-------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   9      20905598      23828774      22610639      22557854      994238.6
+   9      19257738      25797772      23263976      22777334     2164232.2
No difference proven at 95.0% confidence

That said, the increase in the variance is striking.

I see an 18.15% reduction in cycles spent in vm_map_entry_splay() for buildworld.

x before_world
+ after_world
+------------------------------------------------------------------------------+
|+                                                                            x|
|+                                                                            x|
|+                                                                            x|
|++                                                                          xx|
|A|                                                                          |A|
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5 4.8476252e+10 4.8601982e+10 4.8593236e+10 4.8570385e+10      53214127
+   5 3.9698544e+10  3.983885e+10 3.9745925e+10 3.9755596e+10      51061053
Difference at 95.0% confidence
        -8.81479e+09 +/- 7.60559e+07
        -18.1485% +/- 0.156589%
        (Student's t, pooled s = 5.21487e+07)

A problem with using any SPECjvm2008 test for evaluating this patch is that by default the SPECjvm2008 tests run for a fixed amount of time. Essentially, speeding up vm_map_entry_splay() just means that there will be more calls to vm_map_entry_splay(), and so it's not entirely surprisingly that there is no statistical difference in the number of cycles spent in vm_map_entry_splay().

Here are results from a different Java application.

x javamark_1.5GB_before
+ javamark_1.5GB_after
+--------------------------------------------------------------------------------------------------+
|+                          + +                    +            x                    +xx   x x  x +|
|           |____________________________________A_M_______________________|__________|____M______||
+--------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   6       7180627       7524006       7463407     7416031.3     121934.89
+   6       6513706       7541927       7042828     7020509.2     391280.84
Difference at 95.0% confidence
        -395522 +/- 372781
        -5.33334% +/- 5.02669%
        (Student's t, pooled s = 289801)

On a failed lookup, stop trying to avoid pulling both neighbors of the missing item to the top of the tree.

Save the new 'root' value in a variable 'z' to avoid having to access the same child of y twice, once for a null check and later for assignment to root.

Fixup comments, and whitespace.

sys/vm/vm_map.c
932

This case, and the symmetric 'rotate left' case below, are the only cases in this splay code in which a node is modified, but not put on either rlist or llist. We know that max_free is calculated for those nodes in the second pass up the tree. Do we need to recompute max_free for this case, for 'root'?

Yes. Suppose that at the top of the loop 'd' was root in a tree like this:
((A b C) d E)
with the max_free value at d being the max of the entry values b and d and the max_free values of the subtrees A C and E. That is, D = max(A,b,C,d,E). After the restructuring, d is the root of a smaller subtree:
(C d E)
and so the new max_free value for D = max(C,d,E). If either A or b were big, then D has been reduced by this transformation and must be recalculated.

We know that no changes are happening to the subtrees C and E the rest of the splay, as the next root value is b and further changes happen in only that subtree.

sys/vm/vm_map.c
932

Change 'b' in the last line to 'a'.