Replace the count field in a trie node with a bitmap that identifies non-NULL children. In lookup_le, lookup_ge, and remove, use the bitmap to find the previous/next/only non-null child in constant time instead of looping across array elements.
Details
- Reviewers
alc markj - Commits
- rG8df38859d0f9: radix_trie: replace node count with popmap
It compiles. A GENERIC kernel with the patch was booted, and used to build a GENERIC-NODEBUG kernel, which was booted. I have seen no problems.
Peter, can you test this?
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Move the clearing of the last bit of the popmap from _remove to _node_get. If a node that has been 'put' has a last pointer, but the popmap bit for that pointer is clear, then the pointer might not be found. Only clear it when the last pointer gets nulled.
Now that popmap isn't zeroed prematurely, it duplicates data in the last field. So get rid of the last field.
sys/vm/vm_radix.c | ||
---|---|---|
295 | This change uses the popmap in ways that the old code didn't use the count. Consequently, I have to question whether the order of updates to the popmap and the node pointers matters in ways that the order of updates to the count and the node pointers didn't, for example, slot = ffs(rnode->rn_popmap) - 1; tmp = vm_radix_node_load(&rnode->rn_child[slot], LOCKED); I am not asserting that there is a memory ordering problem, only that I want to be convinced that there isn't, or alterations be made to this change to ensure that there aren't any memory ordering problems. |
sys/vm/vm_radix.c | ||
---|---|---|
295 | Is this concern restricted to lookup_ge and lookup_le? If so, I can revert those two functions so that they do not use popmap and address your concerns about them in a subsequent patch, provided that what remains of this patch is worthy of inclusion. If this concern applies to remove, then I understood that there was a writers lock that would limit reading or writing of popmap (after the change described in the previous paragraph) to one process at a time. |
sys/vm/vm_radix.c | ||
---|---|---|
295 | So I see that you quoted code from remove(), so that must be your concern. How are you convinced that there are not memory problems now? I can try to imagine ways that 'count' and the number of non-null children might come to disagree and cause trouble, but every one of my imaginings is stopped by the fact that two removals can't happen at once because there's a write lock. My attempts to imagine problems in the modified version come up against the same obstacle. The delayed flipping of a bit in popmask would cause trouble, but so would the delayed decrementing of 'count'. I don't see the distinction. |
sys/vm/vm_radix.c | ||
---|---|---|
126 | I think that it would be better to have a typedef for rn_popmap. The exact typedef could be conditioned on VM_RADIX_{WIDTH,COUNT}. | |
189 | We usually write an explicit != 0. | |
295 | My concern is not specific to remove or lookup_{ge,le}. | |
296 | We usually write an explicit != 0. | |
602–610 | Can you explain the use of -2 in a comment? | |
870 | This is so far removed from the radix node's popmap definition that in the absence of a typedef it would get overlooked if someone changed the width to 32. |
sys/kern/subr_pctrie.c | ||
---|---|---|
79 | I think that the more useful assertion is that the int type argument to ffs() is at least as wide as pn_popmap_t. |
sys/kern/subr_pctrie.c | ||
---|---|---|
77 | Because ffs() takes an int, this won't work. I would just use #error to emit unsupported width. |
I ran tests with D40775.124202.patch on both i386 and amd64 for 8 hours.
root@freebsd-vm:/usr/src/tools/test/stress2/misc # uname -a FreeBSD freebsd-vm 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263985-884eaacd24bdb-dirty: Thu Jul 6 07:18:28 CEST 2023 pho@mercat1.netperf.freebsd.org:/mnt25/obj/usr/src/i386.i386/sys/PHO i386 root@freebsd-vm:/usr/src/tools/test/stress2/misc # 16:26 /home/pho $ uname -a FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263985-884eaacd24bdb-dirty: Thu Jul 6 06:02:24 CEST 2023 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO amd64 16:27 /home/pho $
No issues observed.
A new kernel, with these changes, and an old one were both instrumented with changes to record the number of calls to several vm_radix functions and the time spent in those calls. A 'buildworld' test was run with each kernel. The functions are abbreviated below as rc (reclaim_all), rm (remove), le (access_le) and ge (access_ge).
old new new/old rc_cycles: 11480307830 10155416011 rc_calls: 1096007 1110962 rc_cycles/call: 10474.666521290466 9141.101145673749 0.872686603157613 rm_cycles: 12165802051 12350658344 rm_calls: 101934319 103140884 rm_cycles/call: 119.34942196454955 119.74551569676288 1.0033187737794909 le_cycles: 67156160585 64213736254 le_calls: 307094506 308610635 le_cycles/call: 218.68239018577557 208.07363380072758 0.9514878341322517 ge_cycles: 2038386746 2028175859 ge_calls: 3262276 3295155 ge_cycles/call: 624.835772938893 615.5024146056862 0.9850627016931063
So the results very from little change for ge to almost 13% less time per call for reclaim_all.
Another instrumentation, that sought to count the number of null pointers not examined by using the bitmap per array walk. Those numbers are 7.68 (rc), 12.94 (rm), 1.41 (le) and 4.53 (ge).