Page MenuHomeFreeBSD

radix_trie: use null leaves to speed searches
ClosedPublic

Authored by dougm on Jul 25 2023, 8:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, May 6, 3:10 AM
Unknown Object (File)
Mon, May 6, 3:10 AM
Unknown Object (File)
Mon, May 6, 3:10 AM
Unknown Object (File)
Mon, May 6, 3:10 AM
Unknown Object (File)
Feb 14 2024, 12:39 PM
Unknown Object (File)
Feb 14 2024, 12:39 PM
Unknown Object (File)
Feb 14 2024, 12:39 PM
Unknown Object (File)
Feb 14 2024, 12:38 PM
Subscribers

Details

Summary

Every path in a radix trie ends with a leaf or a NULL. By replacing NULL (non-leaf) pointers with NULL leaves, there is a NULL test removed from every iteration of an index-based search loop.

This is one of three major changes first presented in D40979, where cumulatively they improved lookup times by 17% to 23% in testing, and insert/remove times by about 5%. The other changes will follow this one.

Test Plan

A kernel has booted with this change in place and used to build another kernel, which rebooted too.

Peter Holm has tested a bigger patch that contains these ideas, expressed in a more Dijkstrian way.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Jul 25 2023, 8:09 AM
dougm created this revision.
dougm added a reviewer: alc.

Drop a redundant calculation from insert.

Expose the fact of the node type, and define the root as a pointer to a node. Put the NULL definitions in the header file and use them in the inline header functions.

As expected, this reduces the number of instructions in _vm_page_lookup by two. Before:

1f0: 48 89 f2                      movq    %rsi, %rdx
1f3: 48 d3 ea                      shrq    %cl, %rdx
1f6: 83 e2 0f                      andl    $0xf, %edx
1f9: 48 8b 44 d0 10                movq    0x10(%rax,%rdx,8), %rax
1fe: 48 85 c0                      testq   %rax, %rax
201: 74 34                         je      0x237 <vm_radix_lookup+0x57>
203: a8 01                         testb   $0x1, %al
205: 75 26                         jne     0x22d <vm_radix_lookup+0x4d>
207: 0f b6 50 0a                   movzbl  0xa(%rax), %edx
20b: 48 8d 0c 95 00 00 00 00       leaq    (,%rdx,4), %rcx
213: 48 83 fa 0e                   cmpq    $0xe, %rdx
217: 77 d7                         ja      0x1f0 <vm_radix_lookup+0x10>
219: 48 c7 c2 f0 ff ff ff          movq    $-0x10, %rdx
220: 48 d3 e2                      shlq    %cl, %rdx
223: 48 21 f2                      andq    %rsi, %rdx
226: 48 3b 10                      cmpq    (%rax), %rdx
229: 74 c5                         je      0x1f0 <vm_radix_lookup+0x10>

After:

2e0: 48 89 f2                      movq    %rsi, %rdx
2e3: 48 d3 ea                      shrq    %cl, %rdx
2e6: 83 e2 0f                      andl    $0xf, %edx
2e9: 48 8b 44 d0 10                movq    0x10(%rax,%rdx,8), %rax
2ee: a8 01                         testb   $0x1, %al
2f0: 75 26                         jne     0x318 <vm_radix_lookup+0x48>
2f2: 0f b6 50 0a                   movzbl  0xa(%rax), %edx
2f6: 48 8d 0c 95 00 00 00 00       leaq    (,%rdx,4), %rcx
2fe: 48 83 fa 0e                   cmpq    $0xe, %rdx
302: 77 dc                         ja      0x2e0 <vm_radix_lookup+0x10>
304: 48 c7 c2 f0 ff ff ff          movq    $-0x10, %rdx
30b: 48 d3 e2                      shlq    %cl, %rdx
30e: 48 21 f2                      andq    %rsi, %rdx
311: 48 3b 10                      cmpq    (%rax), %rdx
314: 74 ca                         je      0x2e0 <vm_radix_lookup+0x10>

The effect on arm64 is similar.

sys/kern/subr_pctrie.c
443

Using a break here is consistent with the control flow of the other functions.

sys/sys/pctrie.h
139
sys/vm/vm_radix.h
52
dougm marked 3 inline comments as done.
dougm edited the summary of this revision. (Show Details)
dougm added reviewers: markj, kib.
This revision is now accepted and ready to land.Jul 27 2023, 6:54 PM