
uma_core: Rely on domainset iterator to wait on M_WAITOK
ClosedPublic

Authored by olce on Sep 9 2025, 9:20 AM.

Details

Summary

Commit 8b987a77691d ("Use per-domain keg locks.") replaced the single
keg-wide lock with per-domain keg locks. In particular, it removed the
need to hold a lock while waiting for a domain to grow free memory.

Simplify keg_fetch_slab() and uma_prealloc() by removing the
M_WAITOK -> M_NOWAIT downgrade and the local call to vm_wait_doms()
(which used to require temporarily dropping the keg lock): the domainset
iterator machinery already handles waiting on M_WAITOK (and does so
compatibly with vm_domainset_iter_ignore(), although that does not
matter here).
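
For a concrete picture, here is a minimal sketch of the loop shape this change gives keg_fetch_slab(). It is illustrative only, not the committed diff: the helper names and exact signatures (vm_domainset_iter_policy_ref_init(), vm_domainset_iter_policy(), keg_fetch_free_slab(), keg_alloc_slab()) are abbreviated assumptions based on sys/vm/uma_core.c and sys/vm/vm_domainset.h.

/*
 * Illustrative sketch only, not the committed diff; helper names and
 * signatures are assumed/abbreviated.
 */
static uma_slab_t
keg_fetch_slab_sketch(uma_keg_t keg, uma_zone_t zone, int flags)
{
	struct vm_domainset_iter di;
	uma_slab_t slab;
	int domain;

	/*
	 * The iterator is seeded from the keg's domainset reference.  With
	 * M_WAITOK in 'flags', the iterator itself waits for free pages once
	 * every domain has been visited, so there is no longer any local
	 * M_NOWAIT downgrade, vm_wait_doms() call, or manual restart here.
	 */
	vm_domainset_iter_policy_ref_init(&di, &keg->uk_dr, &domain, &flags);
	do {
		/* Reuse a cached free slab, or grow the current domain. */
		slab = keg_fetch_free_slab(keg, domain, flags);
		if (slab == NULL)
			slab = keg_alloc_slab(keg, zone, domain, flags, flags);
		if (slab != NULL)
			return (slab);
	} while (vm_domainset_iter_policy(&di, &domain) == 0);

	/* Reached only for M_NOWAIT; on M_WAITOK the iterator keeps going. */
	return (NULL);
}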

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

olce requested review of this revision. Sep 9 2025, 9:20 AM
sys/vm/uma_core.c
4015

Hmm, but it's a bug that we don't actually honour this comment. There is no convenient lock to synchronize the iterator though. I wonder how hard it would be to create an atomic iterator?

5259

Indentation on this line is wrong.

olce added inline comments.
sys/vm/uma_core.c
4015

Hmm, but it's a bug that we don't actually honour this comment.

Admittedly, I'm not that familiar with this code, but a priori I don't think so, as explained in the commit message (this comment should have been removed back then).

Which accesses do you think need to be synchronized here?

sys/vm/uma_core.c
4015

Which accesses do you think need to be synchronized here?

Updates to keg->uk_dr.

sys/vm/uma_core.c
4015

Oh, this race. I had already noticed it in another setting. For the round-robin policy, it should mostly be benign: only truly concurrent allocations can fail to advance the iterator's index, and that is in any case trivial to fix. It is slightly more of a concern for the interleaving policy, because concurrent callers may disagree on which domain to try next (there is typically no problem when allocations in the first domain succeed), but that seems very easily fixable as well. Will post these changes in another review.
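
For illustration, the round-robin case above amounts to a lost update on a shared cursor. The sketch below is hypothetical: rr_cursor is invented for this example and is neither the actual keg->uk_dr layout nor the D52733 change; it only shows why a plain increment can skip an advance and how an atomic fetch-and-add would not.

/* Hypothetical sketch; not the actual keg->uk_dr layout nor D52733. */
#include <sys/types.h>
#include <machine/atomic.h>

struct rr_cursor {
	volatile u_int	cur;	/* next domain index to hand out */
};

/*
 * Racy variant: two concurrent callers can read the same 'cur', both store
 * cur + 1, and one advance is lost, so the same domain is tried twice.
 */
static int
rr_next_racy(struct rr_cursor *c, int ndomains)
{
	u_int v;

	v = c->cur;
	c->cur = v + 1;
	return (v % ndomains);
}

/*
 * Lock-free variant: atomic_fetchadd_int() never loses an advance, which is
 * the kind of "atomic iterator" alluded to earlier in this review.
 */
static int
rr_next_atomic(struct rr_cursor *c, int ndomains)
{
	return (atomic_fetchadd_int(&c->cur, 1) % ndomains);
}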

LGTM overall (aside from the race issues that you'll handle in a separate PR).
I tested the changes with stress2 in a NUMA VM for a couple of hours and didn't run into any obvious issues.

This revision is now accepted and ready to land. Sep 10 2025, 5:04 PM

For the race issues, please have a look at D52733.

I'm ok with this change landing.

sys/vm/uma_core.c
4011–4012

For now I'd prefer to see a comment here noting that this is subject to races.

4015

It's mostly benign, but over time the effect of this race could accumulate and result in unbalanced domains. (Consider, say, the ABD cache zone used by ZFS to allocate ARC buffers, which often consumes a large fraction of the system's RAM. If a policy says that allocations from this zone should be round-robin, we should try to ensure that they really are balanced.) At least, it's hard to convince myself that it's harmless.

olce marked 4 inline comments as done. Wed, Oct 8, 4:43 PM
olce added inline comments.
sys/vm/uma_core.c
4011–4012

I think D52733 and D52982 just obsolete that.

4015

It's mostly benign in the sense that it has very little chance of occurring, since the window of concurrency for the increment is tiny compared to the whole allocation path.

But anyway, we are fixing that with D52733 and D52982.

olce marked 2 inline comments as done. Wed, Oct 8, 4:43 PM