Replace the implementations of lookup_le and lookup_ge with ones that do not use a stack or climb back up the tree.
Details
Reviewers: alc, markj
Commits: rG6f251ef228e6: radix_trie: simplify ge, le lookups
On amd64, these changes reduce the size of the compiled lookup_ge and lookup_le routines by 112 bytes each.
A buildworld test suggests that this change speeds up lookup_le calls by 21.6%, and slows lookup_ge calls by 4.4%.
original timing:
52450.199u 1452.606s 59:09.38 1518.6% 73478+3095k 121089+33120io 110807pf+0w
vm.radix.le_cycles: 74538053872
vm.radix.le_calls: 341220107
le_cycles/call: 218.44566701340375
vm.radix.ge_cycles: 2167673543
vm.radix.ge_calls: 3540959
ge_cycles/call: 612.1713194081038
modified timing:
52599.911u 1454.881s 59:17.88 1519.2% 73479+3096k 121017+34297io 110758pf+0w
vm.radix.le_cycles: 58436422956
vm.radix.le_calls: 341219862
le_cycles/call: 171.25738992298167
vm.radix.ge_cycles: 2264687629
vm.radix.ge_calls: 3544083
ge_cycles/call: 639.0052459268026
le cycles/call ratio: 171.257 / 218.446 = 0.784, i.e., a 21.6% reduction
ge cycles/call ratio: 639.005 / 612.171 = 1.044, i.e., a 4.4% increase
Peter, can you test this?
Diff Detail
Repository: rG FreeBSD src repository
Event Timeline
Discard the effort to include a lock-free option, to simplify the code. Avoid counting leading or trailing zeroes within the initial search loop. Use __builtin_ctz() and __builtin_clz() to avoid zero-checks I know to be unnecessary.
Apply it all to subr_pctrie too.
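To illustrate the zero-check point: because the masked popmap is already known to be nonzero at these call sites, a bit-scan builtin can be used directly. The sketch below is hypothetical (the slot_ge/slot_le helpers and popmap_t type are illustrative names, not the vm_radix/pctrie code); only the mask-then-scan pattern is the point.

```c
/*
 * Hypothetical helpers, not the actual vm_radix/pctrie code: select the
 * nearest populated child slot on either side of 'slot'.  Callers are
 * assumed to have already checked that the masked popmap is nonzero, so
 * __builtin_ctz()/__builtin_clz() are never passed zero.
 */
#include <assert.h>

typedef unsigned int popmap_t;	/* one bit per child slot; 16 slots used */

/* Least populated slot >= 'slot' (the lookup_ge direction). */
static int
slot_ge(popmap_t popmap, int slot)
{
	popmap &= -1U << slot;		/* keep slots >= slot */
	assert(popmap != 0);		/* caller's responsibility */
	return (__builtin_ctz(popmap));
}

/* Greatest populated slot <= 'slot' (the lookup_le direction). */
static int
slot_le(popmap_t popmap, int slot)
{
	popmap &= (2U << slot) - 1;	/* keep slots <= slot */
	assert(popmap != 0);		/* caller's responsibility */
	return (31 - __builtin_clz(popmap));
}
```

Field and helper names in the actual sources differ; masking before scanning is what removes the need for a separate zero check.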
I ran tests for 16 hours with D40936.id124419.diff on:

[pho@mercat1 /usr/src/sys/amd64/compile/PHO]$ uname -a
FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n264144-1cd9788408aa9-dirty: Thu Jul 13 21:09:42 CEST 2023 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO amd64
No issues observed.
sys/vm/vm_radix.c
641: Add a comment above the if to the effect that no exact match was found.
sys/vm/vm_radix.c
567: I would add parentheses here for readability even if they aren't technically necessary.
sys/vm/vm_radix.c
605–608: I don't see the need for the words "stored at a position".
631: "node" should be "page", and I don't understand the second use of "max" in this comment.
637: "nodes" should be "pages".
643: "nodes" should be "pages".
646: "nodes" should be "pages".
652–653: This is not proper style for comments.
sys/vm/vm_radix.c
640: Shouldn't "past" be "before"?
sys/vm/vm_radix.c
618–620: "Descend the trie as if performing an ordinary lookup for the page with the specified pindex. However, unlike an ordinary lookup, as we descend the trie, we use "pred" to remember the last branching-off point, that is, the interior node under which the page with the greatest pindex that is both outside our current path down the trie and less than the specified pindex resides. (The node's popmap makes it fast and easy to recognize a branching-off point.) If our ordinary lookup fails to yield a page with a pindex that is less than or equal to the specified pindex, then we will exit this loop and perform a lookup starting from "pred". If "pred" is not NULL, then that lookup is guaranteed to succeed."
I find the new version quite a bit easier to understand. I haven't read it exhaustively. I do think Alan's suggested description, "Descend the trie as if performing an ordinary lookup..." would be nice to have, since otherwise no comment directly explains the idea behind the implementation.
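To make the approach concrete, here is a hedged, self-contained sketch of a stackless lookup_ge in the spirit of that description (the ge direction, mirroring the "pred" discussion above with a "succ" pointer). The struct tnode layout, the tslot/tmatches/tmin helpers, and all names are hypothetical simplifications, not the vm_radix or pctrie structures; only the shape of the algorithm is meant to match: an ordinary descent that records the deepest branching-off point via the popmap, followed by a minimum-leaf walk if the descent fails.

```c
/*
 * Hypothetical sketch of a stackless lookup_ge on a path-compressed radix
 * trie.  Simplified structures for illustration only.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	WIDTH	4			/* key bits consumed per level */
#define	SLOTS	(1 << WIDTH)

struct tnode {
	bool		leaf;		/* leaf: key is the stored key */
	uint64_t	key;		/* leaf key, or the node's prefix */
	int		shift;		/* bit position of this node's slots */
	uint16_t	popmap;		/* which child slots are non-NULL */
	struct tnode	*child[SLOTS];
};

/* Child slot of 'key' within interior node 'n'. */
static int
tslot(const struct tnode *n, uint64_t key)
{
	return ((key >> n->shift) & (SLOTS - 1));
}

/* Does 'key' share the prefix covered by interior node 'n'? */
static bool
tmatches(const struct tnode *n, uint64_t key)
{
	int hi = n->shift + WIDTH;

	return (hi >= 64 || (key >> hi) == (n->key >> hi));
}

/* Least leaf under 'n': repeatedly take the lowest populated slot. */
static struct tnode *
tmin(struct tnode *n)
{
	while (!n->leaf)
		n = n->child[__builtin_ctz(n->popmap)];
	return (n);
}

/*
 * Return the leaf with the least key >= 'key', or NULL.  No stack and no
 * climbing back up: 'succ' remembers the deepest branching-off point seen
 * during an ordinary descent, i.e. the subtree holding the least key that
 * is greater than 'key' but lies off the current path.
 */
static struct tnode *
lookup_ge(struct tnode *root, uint64_t key)
{
	struct tnode *n, *succ;
	unsigned int pop;
	int slot;

	succ = NULL;
	n = root;
	while (n != NULL && !n->leaf) {
		if (!tmatches(n, key)) {
			/* Every key under n lies on one side of 'key'. */
			if (n->key > key)
				return (tmin(n));
			break;
		}
		slot = tslot(n, key);
		/* Populated slots to the right mark a branching-off point. */
		pop = n->popmap >> (slot + 1);
		if (pop != 0)
			succ = n->child[slot + 1 + __builtin_ctz(pop)];
		n = n->child[slot];
	}
	if (n != NULL && n->leaf && n->key >= key)
		return (n);
	/* The ordinary lookup failed; resume at the branching-off point. */
	return (succ != NULL ? tmin(succ) : NULL);
}
```

lookup_le would be the mirror image: remember the nearest populated slot to the left as "pred", and finish with a maximum-leaf walk that follows the highest set popmap bit (__builtin_clz) instead of the lowest.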
> original timing:
> 52450.199u 1452.606s 59:09.38 1518.6% 73478+3095k 121089+33120io 110807pf+0w
> modified timing:
> 52599.911u 1454.881s 59:17.88 1519.2% 73479+3096k 121017+34297io 110758pf+0w

Do you have any idea why the timings are so different here? Is the difference consistent across multiple runs?
sys/vm/vm_radix.c
629: You could replace one of the copies (per file) with "See vm_radix_lookup_ge() for an explanation of the algorithm."
> Do you have any idea why the timings are so different here? Is the difference consistent across multiple runs?
I ran 3 more buildworld tests on the modified and unchanged kernels immediately after reboots, with no instrumentation
added to count performance stats for any particular function.
The results:
Modified:
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52357.285u 1448.444s 59:11.33 1515.0% 73477+3095k 120995+33438io 110815pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52553.118u 1456.257s 59:19.67 1517.2% 73472+3096k 121217+33609io 110821pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52547.476u 1452.889s 59:17.54 1517.9% 73470+3095k 121178+33245io 110870pf+0w
Original:
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52602.552u 1452.130s 59:20.49 1518.1% 73477+3096k 121068+33157io 110855pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52656.392u 1451.635s 59:27.48 1516.7% 73471+3096k 121118+33354io 110845pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52601.014u 1445.216s 59:23.55 1516.6% 73478+3095k 121121+33611io 110844pf+0w
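For a rough comparison, averaging the wall-clock times above gives: modified ≈ (3551.33 + 3559.67 + 3557.54) / 3 ≈ 3556.2 s (59:16.2); original ≈ (3560.49 + 3567.48 + 3563.55) / 3 ≈ 3563.8 s (59:23.8). The modified runs average roughly 7.7 seconds (about 0.2%) less wall-clock time.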
sys/vm/vm_radix.c
626–627: I think that's reasonable, yes.
In my experience, up to 10 seconds variance is normal between "buildworld" runs. These results are for -NODEBUG kernels, yes?