
vm_domainset: Ensure round-robin works properly
ClosedPublic

Authored by olce on Sep 25 2025, 2:52 PM.
Details

Summary

All iterators that rely on an object's 'struct domainset_ref' (field
'domain' of 'struct vm_object'), which is the case for page allocations
with objects, are used with the corresponding object locked for writing,
so concurrent increments of the iterator index cannot be lost even
though they are performed without atomic operations. The only offender
was thread stack allocation, which has just been fixed in commit XXX.

However, the interleaved policy would still reset the iterator index
when restarting, losing track of the next domain to allocate from when
applying round-robin, which all allocation policies do if allocation
from the first domain fails.
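
For context, a rough caller-side sketch of that round-robin fallback; the
argument lists are simplified and the allocation helper is hypothetical,
only the iterate-on-failure shape matters:

```
/*
 * Sketch only: vm_domainset_iter_page_init()/vm_domainset_iter_page() take
 * more parameters than shown, and alloc_from_domain() is a hypothetical
 * stand-in for the actual allocation attempt.
 */
struct vm_domainset_iter di;
vm_page_t m;
int domain;

vm_domainset_iter_page_init(&di, object, pindex, &domain, &req);
do {
	/* Try the domain currently prescribed by the policy. */
	m = alloc_from_domain(object, pindex, domain, req);
	if (m != NULL)
		break;
	/*
	 * Failure: vm_domainset_iter_page() advances round-robin to the
	 * next allowed domain and returns non-zero once all were tried.
	 */
} while (vm_domainset_iter_page(&di, object, &domain) == 0);
```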

Fix this last round-robin problem by not resetting the shared index at
the iterator's phase init for DOMAINSET_POLICY_INTERLEAVE.
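
A minimal sketch of the intended behavior; the function and field names
below are illustrative and do not necessarily match vm_domainset.c:

```
/*
 * Illustrative names: di_policy and di_iter mirror the style of
 * struct vm_domainset_iter but are assumptions, not the exact fields.
 */
static void
vm_domainset_iter_phase_init_sketch(struct vm_domainset_iter *di)
{
	if (di->di_policy == DOMAINSET_POLICY_INTERLEAVE) {
		/*
		 * Previously something like "*di->di_iter = 0;" here threw
		 * away the shared round-robin position on every restart.
		 * The fix is simply to leave the shared index untouched so
		 * the next allocation continues where the last one stopped.
		 */
		return;
	}
	/* Other policies keep their existing initialization. */
}
```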

Add an assertion checking that an object, when passed, is write-locked,
in order to prevent the problem mentioned in the first paragraph from
reappearing.
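
A sketch of the kind of assertion meant here, using the existing
VM_OBJECT_ASSERT_WLOCKED() macro (the surrounding function is trimmed to
its relevant part):

```
/* Sketch: the real vm_domainset_iter_page_init() takes more parameters. */
void
vm_domainset_iter_page_init_sketch(struct vm_domainset_iter *di,
    struct vm_object *obj)
{
	/*
	 * Callers passing an object must hold its write lock; otherwise
	 * non-atomic updates of the shared iterator index could be lost.
	 */
	if (obj != NULL)
		VM_OBJECT_ASSERT_WLOCKED(obj);

	/* ... regular iterator initialization follows ... */
}
```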

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

olce requested review of this revision. Sep 25 2025, 2:52 PM

So, it seems fixing the races was even simpler than I expected, but in the end that depends on what we really want to be doing in such cases. A priori, it seems to me that what makes the most sense for handling concurrency, with an eye to preserving the spirit of today's policies (which may or may not be a great idea), is actually just to prevent any policy initialization from resetting the next position to try, and to make sure each try uses a different domain (all policies do round-robin, apart from the first domain they prescribe). What do you think?

Although using a fetch & add is not necessary when the iterator index is in fact per-thread, I'm not sure it's worth optimizing for this case (but, if you find it necessary, we could, relatively easily, without enlarging struct vm_domainset_iter).
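
For reference, the two variants under discussion would look roughly like this; treating di_iter as a pointer to the shared index is an assumption about the layout (and is also why struct vm_domainset_iter itself would not need to grow):

```
/*
 * Plain increment: sufficient when the index is effectively per-thread or
 * when the object write lock already serializes every user of it.
 */
(*di->di_iter)++;

/*
 * Atomic variant: a single fetch-and-add tolerates concurrent updaters at
 * the cost of an atomic read-modify-write per step.
 */
atomic_fetchadd_int(di->di_iter, 1);
```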

concurrent allocations might lead to lost increases of the iterator index

How so? Allocations within an object are serialized by the VM object write lock.

How so? Allocations within an object are serialized by the VM object write lock.

I didn't check initially, was just addressing your concern in D52441.

After looking a bit, it doesn't seem to be the case for the kstack_object/kstack_alt_object in vm_glue.c, does it?

Anyway, happy to drop the atomic increment if it's not necessary (and I'll add assertions in iterator initializations that if an object is passed it must be locked).

How so? Allocations within an object are serialized by the VM object write lock.

I didn't check initially, was just addressing your concern in D52441.

If it was this easy it would be fixed already. :)

After looking a bit, it doesn't seem to be the case for the kstack_object/kstack_alt_object in vm_glue.c, does it?

No, the object write lock is definitely held there too, the pages are allocated in vm_thread_stack_back(). The write lock serializes updates to the page trie.

Anyway, happy to drop the atomic increment if it's not necessary (and I'll add assertions in iterator initializations that if an object is passed it must be locked).

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

If it was this easy it would be fixed already. :)

Well, after some recent experiences, you'll have a very hard time making me buy that kind of argument. :-)

No, the object write lock is definitely held there too, the pages are allocated in vm_thread_stack_back(). The write lock serializes updates to the page trie.

Actually, it's vm_thread_stack_create() that I had in mind. I fail to see any locking here.

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

A single fetch & add isn't something I'd consider, per se, really penalizing in practice, given the amount of work an allocation already does, unless benchmarks show otherwise. But of course, let's remove it if it's not logically required. It would indeed be better if the iterator index increments could not race in any case.

There is probably a misunderstanding. I don't think there is another race in UMA, as explained in D52441's commit message. You then said there was a race on keg->uk_dr. I assumed the only problem was with the iterator index storage, which is fixed here. If, e.g., a keg can change its domainset policy, this would indeed not be enough. I'll have to dig deeper, but it would probably help a lot if you describe which kind of remaining races you have in mind.

Actually, it's vm_thread_stack_create() that I had in mind. I fail to see any locking here.

Thanks, I see now. This is a regression from commit 7a79d06697614. https://reviews.freebsd.org/D52982

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

A single fetch & add isn't something I'd consider, per se, really penalizing in practice, given the amount of work an allocation already does, unless benchmarks show otherwise. But of course, let's remove it if it's not logically required. It would indeed be better if the iterator index increments could not race in any case.

There is probably a misunderstanding. I don't think there is another race in UMA, as explained in D52441's commit message. You then said there was a race on keg->uk_dr. I assumed the only problem was with the iterator index storage, which is fixed here. If, e.g., a keg can change its domainset policy, this would indeed not be enough. I'll have to dig deeper, but it would probably help a lot if you describe which kind of remaining races you have in mind.

Indeed, that is the only problem in UMA. I just don't like this patch as a solution, as it's not required in the common case.

Indeed, that is the only problem in UMA.

Fine.

I just don't like this patch as a solution, as it's not required in the common case.

The atomic update is just one part of the patch; the other part is still necessary. Going to update this revision.

olce retitled this revision from vm_domainset: Make the iterators atomic to vm_domainset: Ensure round-robin works properly.
olce edited the summary of this revision.

Update after discussion and fixing the last use of vm_domainset_iter_page_init() with an object not write-locked in D52982.

This revision is now accepted and ready to land. Oct 9 2025, 1:31 PM
This revision was automatically updated to reflect the committed changes.