
uma_core: Rely on domainset iterator to wait on M_WAITOK
ClosedPublic

Authored by olce on Sep 9 2025, 9:20 AM.

Details

Summary

Commit 8b987a77691d ("Use per-domain keg locks.") replaced the single
keg-wide lock with per-domain keg locks. In particular, it removed the
need to hold a lock while waiting for a domain to grow free memory.

Simplify keg_fetch_slab() and uma_prealloc() by removing the
M_WAITOK -> M_NOWAIT downgrade and the local call to vm_wait_doms()
(which used to require temporarily dropping the keg lock): the domainset
iterator machinery already handles waiting on M_WAITOK (and does so
compatibly with vm_domainset_iter_ignore(), although that does not
matter here).
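
For a concrete picture, here is a minimal sketch of the loop shape this change gives keg_fetch_slab(). It is illustrative only, not the committed diff: the helper names and exact signatures (vm_domainset_iter_policy_ref_init(), vm_domainset_iter_policy(), keg_fetch_free_slab(), keg_alloc_slab()) are abbreviated assumptions based on sys/vm/uma_core.c and sys/vm/vm_domainset.h.

/*
 * Illustrative sketch only, not the committed diff; helper names and
 * signatures are assumed/abbreviated.
 */
static uma_slab_t
keg_fetch_slab_sketch(uma_keg_t keg, uma_zone_t zone, int flags)
{
	struct vm_domainset_iter di;
	uma_slab_t slab;
	int domain;

	/*
	 * The iterator is seeded from the keg's domainset reference.  With
	 * M_WAITOK in 'flags', the iterator itself waits for free pages once
	 * every domain has been visited, so there is no longer any local
	 * M_NOWAIT downgrade, vm_wait_doms() call, or manual restart here.
	 */
	vm_domainset_iter_policy_ref_init(&di, &keg->uk_dr, &domain, &flags);
	do {
		/* Reuse a cached free slab, or grow the current domain. */
		slab = keg_fetch_free_slab(keg, domain, flags);
		if (slab == NULL)
			slab = keg_alloc_slab(keg, zone, domain, flags, flags);
		if (slab != NULL)
			return (slab);
	} while (vm_domainset_iter_policy(&di, &domain) == 0);

	/* Reached only for M_NOWAIT; on M_WAITOK the iterator keeps going. */
	return (NULL);
}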

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

olce requested review of this revision. Sep 9 2025, 9:20 AM
sys/vm/uma_core.c
4015

Hmm, but it's a bug that we don't actually honour this comment. There is no convenient lock to synchronize the iterator though. I wonder how hard it would be to create an atomic iterator?

5259

Indentation on this line is wrong.

olce added inline comments.
sys/vm/uma_core.c
4015

Hmm, but it's a bug that we don't actually honour this comment.

Admittedly, I'm not that familiar with this code, but a priori I don't think so, as explained in the commit message (this comment should have been removed back then).

Which accesses do you think need to be synchronized here?

sys/vm/uma_core.c
4015

Which accesses do you think need to be synchronized here?

Updates to keg->uk_dr.

sys/vm/uma_core.c
4015

Oh, this race. I had already noticed it in another setting. For the round-robin policy, it should mostly be benign: only truly concurrent allocations can fail to advance the iterator's index, and that is in any case trivial to fix. It is slightly more of a concern for the interleaving policy, because concurrent callers may disagree on which domain to try next (there is typically no problem when allocations in the first domain succeed), but that seems very easily fixable as well. Will post these changes in another review.
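
For illustration, the round-robin case above amounts to a lost update on a shared cursor. The sketch below is hypothetical: rr_cursor is invented for this example and is neither the actual keg->uk_dr layout nor the D52733 change; it only shows why a plain increment can skip an advance and how an atomic fetch-and-add would not.

/* Hypothetical sketch; not the actual keg->uk_dr layout nor D52733. */
#include <sys/types.h>
#include <machine/atomic.h>

struct rr_cursor {
	volatile u_int	cur;	/* next domain index to hand out */
};

/*
 * Racy variant: two concurrent callers can read the same 'cur', both store
 * cur + 1, and one advance is lost, so the same domain is tried twice.
 */
static int
rr_next_racy(struct rr_cursor *c, int ndomains)
{
	u_int v;

	v = c->cur;
	c->cur = v + 1;
	return (v % ndomains);
}

/*
 * Lock-free variant: atomic_fetchadd_int() never loses an advance, which is
 * the kind of "atomic iterator" alluded to earlier in this review.
 */
static int
rr_next_atomic(struct rr_cursor *c, int ndomains)
{
	return (atomic_fetchadd_int(&c->cur, 1) % ndomains);
}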

LGTM overall (aside from the race issues that you'll handle in a separate PR).
I tested the changes with stress2 in a NUMA VM for a couple of hours and didn't run into any obvious issues.

This revision is now accepted and ready to land. Sep 10 2025, 5:04 PM

For the race issues, please have a look at D52733.

I'm ok with this change landing.

sys/vm/uma_core.c
4011–4012

For now I'd prefer to see a comment here noting that this is subject to races.

4015

It's mostly benign, but over time the effect of this race could accumulate and result in unbalanced domains. (Consider, say, the ABD cache zone used by ZFS to allocate ARC buffers, which often consumes a large fraction of the system's RAM. If a policy says that allocations from this zone should be round-robin, we should try to ensure that they really are balanced.) At least, it's hard to convince myself that it's harmless.

olce marked 4 inline comments as done. Wed, Oct 8, 4:43 PM
olce added inline comments.
sys/vm/uma_core.c
4011–4012

I think D52733 and D52982 just obsolete that.

4015

It's mostly benign in the sense that it has very little chance of occurring, since the window of concurrency for the increment is tiny compared to the whole allocation path.

But anyway, we are fixing that with D52733 and D52982.

olce marked 2 inline comments as done. Wed, Oct 8, 4:43 PM