pctrie: move the root to the latest search node
Needs ReviewPublic
Actions

Authored by dougm on Jun 26 2025, 7:02 AM.

Details

Reviewers

alc
kib
markj

Summary

A pctrie search can begin at any node. If pctrie accesses tend to cluster in one index range and then another, starting one search from where the previous search ended tends to reduce search time. That's one of the reasons iterators are used for many pctrie operations. This change modifies non-iterator methods for insertion, removal, lookup and replacement to change the root node at the conclusion of the operation to the node where the search ended.

A disadvantage is that for a pctrie that is truly randomly accessed, climbing to near the top of the trie will be an extra cost not paid if the root were always at the top of tree. An advantage is that many uses of iterators could be dropped from the current code, and invalidation of iterators when locks are briefly released would no longer be an issue.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

dougm requested review of this revision.Jun 26 2025, 7:02 AM

dougm created this revision.

dougm added reviewers: alc, kib, markj.Jun 26 2025, 7:04 AM

Update some comments. Fix some formatting. Add a function for stepping from one leaf to the next in the trie.

Enhance more comments.

Beg me pardon please for the naive question.
What this revision basically does is changing the semantic of the pt_root. Instead being a real root of the trie, it becomes the starting point for search. The argument for not needed the real root is that we can always climb up to it from any node, following pn_parent. Am I right?

If I am right, what I do not understand is that there are loot of places where the code does something like this

if (parent == NULL)
     operate on pctrie_root(ptree)

which would now access some random node instead?

sys/kern/subr_pctrie.c
173	Would it be beneficial to check that node != *pctrie_root(ptree)? To avoid dirtying the cache line without the need.

In D51060#1167200, @kib wrote:

What this revision basically does is changing the semantic of the pt_root. Instead being a real root of the trie, it becomes the starting point for search. The argument for not needed the real root is that we can always climb up to it from any node, following pn_parent. Am I right?

You are right.

If I am right, what I do not understand is that there are loot of places where the code does something like this
if (parent == NULL)
     operate on pctrie_root(ptree)
which would now access some random node instead?

This change removes "if (parent == NULL)" from _pctrie_lookup_node(). It doesn't add that test anywhere.

This change adds "if (node != NULL)" in pctrie_root_store(). That condition is important for the case when the trie has no interior nodes and _pctrie_lookup_node() returns with parent set to NULL; the root is pointing to a (possibly null) leaf, and should not change. It also means that if a lookup_ge search fails because there is no larger element in the tree, the parent is set to NULL and so the root is not changes. Perhaps that's not the right choice for lookup_ge (or lookup_le).

This change adds "if (*parent_out ==NULL)" in _pctrie_insert_lookup, and sets the root to the node at the top of the trie in the case where pctrie_insert_node() is going to make a newly allocated node the new top of the trie and the old top of the trie one of its children.

If you are asking about other than these cases, let me know and I'll try to answer more precisely.

sys/kern/subr_pctrie.c
173	I haven't tested that, so I don't know how the read and comparison would affect the performance.

The name pt_root and the term root is confusing IMO. At least it is worth adding a comment explaining the pt_root is some node, not necessary root.

sys/kern/subr_pctrie.c
779	Could these three lines (from 781 and below) replaced by `return (pctrie_lookup_ge(ptree, *m + 1));` ?

dougm added inline comments.Jul 3 2025, 4:32 PM

sys/kern/subr_pctrie.c
779	That would avoid resetting the root. If the last search was for index=255 and left the root pointing to the parent of indices 240-255, then the change you suggest would find index=256, but leave the root unchanged. The change I have proposed would leave the root pointing to the parent of indices 256-271. That's all assuming a trie that includes all indices from 0 to 4095. Changing the root to follow the last index found is a feature.

In D51060#1167241, @kib wrote:

The name pt_root and the term root is confusing IMO. At least it is worth adding a comment explaining the pt_root is some node, not necessary root.

I don't mind changing the name from 'root' to ... what? 'start'? 'cursor'? Any suggestions are welcome.

In D51060#1167398, @dougm wrote:

In D51060#1167241, @kib wrote:

The name pt_root and the term root is confusing IMO. At least it is worth adding a comment explaining the pt_root is some node, not necessary root.

I don't mind changing the name from 'root' to ... what? 'start'? 'cursor'? Any suggestions are welcome.

'cursor' sounds reasonable for me. Perhaps it should be a follow-up change.

This revision is now accepted and ready to land.Jul 5 2025, 4:22 PM

I am rather concerned that the pathological case of having to walk up to the root and then back down will be common place. For example, consider a memory mapped file that is read sequentially. The first access, when the file is not yet memory resident, will leave the cursor at the end. Subsequent accesses well then have to walk all the way up, and all the way down to get to the first page.

While executables and shared libraries are not exactly accessed sequentially, I think the result will be similar.

In short, while I think something like this change will very likely be beneficial to performance, I am not convinced that the complete elimination of the root is a good idea. We eliminated two pointers from the VM object, adding one back for the sake of performance and simplicity of the code, i.e., eliminating some of the explicit iterators, would be worthwhile.

If a pointer to the top of the pctrie were preserved, when would a search begin from where the last search left off, and when would a search begin from the top of the pctrie?

I can understand that if the sequential construction of the pctrie left the search pointer at the bottom right of the pctrie, and if many readonly searches began from that place, then performance could be worse than if the searches started at the top of the pctrie. If that is an issue, then upon completing the construction of the pctrie, an operation could move the search pointer to the top of the pctrie (in vm_object_prepare_buf_pages, perhaps?) to benefit all those readonly searches.

For regular searches that scan the pctrie from start to finish, the average number of nodes to examine per search will be barely over one, and it's just the first one of them that will have a higher than average cost.

But, to return to the key question, if there is a pointer to top-of-pctrie preserved, when will it be used?

dougm mentioned this in D51237: vm: don't use iterators for page allocation.Jul 23 2025, 4:20 AM

Abandon an attempt to use the synchronization that comes with writing a new root after modifying the pctrie to avoid synchronization when performing the last write that modified a node of the pctrie, as this attempt is based on my misunderstanding of the memory consistency model, as reported verbally by @alc.

This revision now requires review to proceed.Jul 29 2025, 4:01 AM