Page MenuHomeFreeBSD

radix_trie: replace node count with popmap
ClosedPublic

Authored by dougm on Jun 27 2023, 6:07 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sep 28 2024, 6:08 AM
Unknown Object (File)
Sep 24 2024, 9:11 AM
Unknown Object (File)
Sep 24 2024, 4:28 AM
Unknown Object (File)
Sep 24 2024, 3:30 AM
Unknown Object (File)
Sep 23 2024, 10:47 PM
Unknown Object (File)
Sep 17 2024, 1:47 AM
Unknown Object (File)
Sep 16 2024, 6:26 PM
Unknown Object (File)
Sep 2 2024, 12:57 AM
Subscribers

Details

Summary

Replace the count field in a trie node with a bitmap that identifies non-NULL children. In lookup_le, lookup_ge, and remove, use the bitmap to find the previous/next/only non-null child in constant time instead of looping across array elements.

Test Plan

It compiles. A GENERIC kernel with the patch was booted, and used to build a GENERIC-NODEBUG kernel, which was booted. I have seen no problems.

Peter, can you test this?

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Jun 27 2023, 6:07 PM
dougm created this revision.
dougm added a subscriber: pho.

Move the clearing of the last bit of the popmap from _remove to _node_get. If a node that has been 'put' has a last pointer, but the popmap bit for that pointer is clear, then the pointer might not be found. Only clear it when the last pointer gets nulled.

With the last change, a KASSERT must also change.

Now that popmap isn't zeroed prematurely, it duplicates data in the last field. So get rid of the last field.

Add a space between 'return' and '(' in one place.

I ran tests with D40775.123968.patch for a day without observing any issues.

sys/vm/vm_radix.c
295

This change uses the popmap in ways that the old code didn't use the count. Consequently, I have to question whether the order of updates to the popmap and the node pointers matters in ways that the order of updates to the count and the node pointers didn't, for example,

slot = ffs(rnode->rn_popmap) - 1;
tmp = vm_radix_node_load(&rnode->rn_child[slot], LOCKED);

I am not asserting that there is a memory ordering problem, only that I want to be convinced that there isn't, or alterations be made to this change to ensure that there aren't any memory ordering problems.

sys/vm/vm_radix.c
295

Is this concern restricted to lookup_ge and lookup_le? If so, I can revert those two functions so that they do not use popmap and address your concerns about them in a subsequent patch, provided that what remains of this patch is worthy of inclusion.

If this concern applies to remove, then I understood that there was a writers lock that would limit reading or writing of popmap (after the change described in the previous paragraph) to one process at a time.

sys/vm/vm_radix.c
295

So I see that you quoted code from remove(), so that must be your concern.

How are you convinced that there are not memory problems now? I can try to imagine ways that 'count' and the number of non-null children might come to disagree and cause trouble, but every one of my imaginings is stopped by the fact that two removals can't happen at once because there's a write lock. My attempts to imagine problems in the modified version come up against the same obstacle. The delayed flipping of a bit in popmask would cause trouble, but so would the delayed decrementing of 'count'. I don't see the distinction.

sys/vm/vm_radix.c
126

I think that it would be better to have a typedef for rn_popmap. The exact typedef could be conditioned on VM_RADIX_{WIDTH,COUNT}.

189

We usually write an explicit != 0.

295

My concern is not specific to remove or lookup_{ge,le}.

296

We usually write an explicit != 0.

602–610

Can you explain the use of -2 in a comment?

870

This is so far removed from the radix node's popmap definition that in the absence of a typedef it would get overlooked if someone changed the width to 32.

dougm marked 6 inline comments as done.
sys/kern/subr_pctrie.c
79

I think that the more useful assertion is that the int type argument to ffs() is at least as wide as pn_popmap_t.

sys/kern/subr_pctrie.c
77

Because ffs() takes an int, this won't work. I would just use #error to emit unsupported width.

dougm marked 2 inline comments as done.

Peter, can you please test this on an x86 32-bit machine?

I ran tests with D40775.123968.patch for a day without observing any issues.

Peter, can you please test this on an x86 32-bit machine?

Sure!

I ran tests with D40775.124202.patch on both i386 and amd64 for 8 hours.

root@freebsd-vm:/usr/src/tools/test/stress2/misc # uname -a
FreeBSD freebsd-vm 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263985-884eaacd24bdb-dirty: Thu Jul  6 07:18:28 CEST 2023     pho@mercat1.netperf.freebsd.org:/mnt25/obj/usr/src/i386.i386/sys/PHO i386
root@freebsd-vm:/usr/src/tools/test/stress2/misc # 

16:26 /home/pho $ uname -a
FreeBSD mercat1.netperf.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263985-884eaacd24bdb-dirty: Thu Jul  6 06:02:24 CEST 2023     pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO amd64
16:27 /home/pho $

No issues observed.

A new kernel, with these changes, and an old one were both instrumented with changes to record the number of calls to several vm_radix functions and the time spent in those calls. A 'buildworld' test was run with each kernel. The functions are abbreviated below as rc (reclaim_all), rm (remove), le (access_le) and ge (access_ge).

                old                     new                     new/old
rc_cycles:      11480307830             10155416011
rc_calls:       1096007                 1110962
rc_cycles/call: 10474.666521290466      9141.101145673749       0.872686603157613
rm_cycles:      12165802051             12350658344
rm_calls:       101934319               103140884
rm_cycles/call: 119.34942196454955      119.74551569676288      1.0033187737794909
le_cycles:      67156160585             64213736254
le_calls:       307094506               308610635
le_cycles/call: 218.68239018577557      208.07363380072758      0.9514878341322517
ge_cycles:      2038386746              2028175859
ge_calls:       3262276                 3295155
ge_cycles/call: 624.835772938893        615.5024146056862       0.9850627016931063

So the results very from little change for ge to almost 13% less time per call for reclaim_all.

Another instrumentation, that sought to count the number of null pointers not examined by using the bitmap per array walk. Those numbers are 7.68 (rc), 12.94 (rm), 1.41 (le) and 4.53 (ge).

sys/kern/subr_pctrie.c
77

The error message is not normally quoted.

80

Shouldn't this message be "too wide"?

dougm marked 2 inline comments as done.
This revision is now accepted and ready to land.Jul 7 2023, 4:01 PM
This revision was automatically updated to reflect the committed changes.