
vm_domainset: Ensure round-robin works properly
ClosedPublic

Authored by olce on Sep 25 2025, 2:52 PM.
Details

Summary

All iterators that rely on an object's 'struct domainset_ref' (field
'domain' of 'struct vm_object'), which is the case for page allocations
with objects, are used with the corresponding object locked for writing,
so concurrent increments of the iterator index cannot be lost even
though they are performed without atomic operations. The only offender
was thread stack allocation, which has just been fixed in commit XXX.

However, the interleaved policy would still reset the iterator index
when restarting, losing track of the next domain to allocate from when
applying round-robin, which all allocation policies do if allocation
from the first domain fails.
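
For context, a rough caller-side sketch of that round-robin fallback; the
argument lists are simplified and the allocation helper is hypothetical,
only the iterate-on-failure shape matters:

```
/*
 * Sketch only: vm_domainset_iter_page_init()/vm_domainset_iter_page() take
 * more parameters than shown, and alloc_from_domain() is a hypothetical
 * stand-in for the actual allocation attempt.
 */
struct vm_domainset_iter di;
vm_page_t m;
int domain;

vm_domainset_iter_page_init(&di, object, pindex, &domain, &req);
do {
	/* Try the domain currently prescribed by the policy. */
	m = alloc_from_domain(object, pindex, domain, req);
	if (m != NULL)
		break;
	/*
	 * Failure: vm_domainset_iter_page() advances round-robin to the
	 * next allowed domain and returns non-zero once all were tried.
	 */
} while (vm_domainset_iter_page(&di, object, &domain) == 0);
```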

Fix this last round-robin problem by not resetting the shared index at
the iterator's phase init for DOMAINSET_POLICY_INTERLEAVE.
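
A minimal sketch of the intended behavior; the function and field names
below are illustrative and do not necessarily match vm_domainset.c:

```
/*
 * Illustrative names: di_policy and di_iter mirror the style of
 * struct vm_domainset_iter but are assumptions, not the exact fields.
 */
static void
vm_domainset_iter_phase_init_sketch(struct vm_domainset_iter *di)
{
	if (di->di_policy == DOMAINSET_POLICY_INTERLEAVE) {
		/*
		 * Previously something like "*di->di_iter = 0;" here threw
		 * away the shared round-robin position on every restart.
		 * The fix is simply to leave the shared index untouched so
		 * the next allocation continues where the last one stopped.
		 */
		return;
	}
	/* Other policies keep their existing initialization. */
}
```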

Add an assertion checking that an object, when passed, is write-locked,
in order to prevent the problem mentioned in the first paragraph from
reappearing.
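
A sketch of the kind of assertion meant here, using the existing
VM_OBJECT_ASSERT_WLOCKED() macro (the surrounding function is trimmed to
its relevant part):

```
/* Sketch: the real vm_domainset_iter_page_init() takes more parameters. */
void
vm_domainset_iter_page_init_sketch(struct vm_domainset_iter *di,
    struct vm_object *obj)
{
	/*
	 * Callers passing an object must hold its write lock; otherwise
	 * non-atomic updates of the shared iterator index could be lost.
	 */
	if (obj != NULL)
		VM_OBJECT_ASSERT_WLOCKED(obj);

	/* ... regular iterator initialization follows ... */
}
```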

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

olce requested review of this revision. Sep 25 2025, 2:52 PM

So, it seems fixing the races was even simpler than I expected, but in the end that depends on what we really want to be doing in such cases. A priori, it seems to me that what makes the most sense for handling concurrency, with an eye to preserving the spirit of today's policies (which may or may not be a great idea), is actually just to prevent any policy initialization from resetting the next position to try, and to make sure each try uses a different domain (all policies do round-robin, apart from the first domain they prescribe). What do you think?

Although using a fetch & add is not necessary when the iterator index is in fact per-thread, I'm not sure it's worth optimizing for this case (but, if you find it necessary, we could, relatively easily, without enlarging struct vm_domainset_iter).
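
For reference, the two variants under discussion would look roughly like this; treating di_iter as a pointer to the shared index is an assumption about the layout (and is also why struct vm_domainset_iter itself would not need to grow):

```
/*
 * Plain increment: sufficient when the index is effectively per-thread or
 * when the object write lock already serializes every user of it.
 */
(*di->di_iter)++;

/*
 * Atomic variant: a single fetch-and-add tolerates concurrent updaters at
 * the cost of an atomic read-modify-write per step.
 */
atomic_fetchadd_int(di->di_iter, 1);
```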

concurrent allocations might lead to lost increases of the iterator index

How so? Allocations within an object are serialized by the VM object write lock.

How so? Allocations within an object are serialized by the VM object write lock.

I didn't check initially, was just addressing your concern in D52441.

After looking a bit, it doesn't seem to be the case for the kstack_object/kstack_alt_object in vm_glue.c, does it?

Anyway, happy to drop the atomic increment if it's not necessary (and I'll add assertions in iterator initializations that if an object is passed it must be locked).

How so? Allocations within an object are serialized by the VM object write lock.

I didn't check initially, was just addressing your concern in D52441.

If it was this easy it would be fixed already. :)

After looking a bit, it doesn't seem to be the case for the kstack_object/kstack_alt_object in vm_glue.c, does it?

No, the object write lock is definitely held there too, the pages are allocated in vm_thread_stack_back(). The write lock serializes updates to the page trie.

Anyway, happy to drop the atomic increment if it's not necessary (and I'll add assertions in iterator initializations that if an object is passed it must be locked).

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

If it was this easy it would be fixed already. :)

Well, after some recent experiences, you'll have a very hard time making me buy that kind of argument. :-)

No, the object write lock is definitely held there too, the pages are allocated in vm_thread_stack_back(). The write lock serializes updates to the page trie.

Actually, it's vm_thread_stack_create() that I had in mind. I fail to see any locking here.

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

A single fetch & add isn't something I'd consider, per se, really penalizing in practice, given the amount of work an allocation already does, unless benchmarks show otherwise. But of course, let's remove it if it's not logically required. It would indeed be better if the iterator index increments could not race in any case.

There is probably a misunderstanding. I don't think there is another race in UMA, as explained in D52441's commit message. You then said there was a race on keg->uk_dr. I assumed the only problem was with the iterator index storage, which is fixed here. If, e.g., a keg can change its domainset policy, this would indeed not be enough. I'll have to dig deeper, but it would probably help a lot if you describe which kind of remaining races you have in mind.

Actually, it's vm_thread_stack_create() that I had in mind. I fail to see any locking here.

Thanks, I see now. This is a regression from commit 7a79d06697614. https://reviews.freebsd.org/D52982

Yes, we cannot reasonably penalize the common case to address the race we talked about in UMA.

A single fetch & add isn't something I'd consider, per se, really penalizing in practice, given the amount of work an allocation already does, unless benchmarks show otherwise. But of course, let's remove it if it's not logically required. It would indeed be better if the iterator index increments could not race in any case.

There is probably a misunderstanding. I don't think there is another race in UMA, as explained in D52441's commit message. You then said there was a race on keg->uk_dr. I assumed the only problem was with the iterator index storage, which is fixed here. If, e.g., a keg can change its domainset policy, this would indeed not be enough. I'll have to dig deeper, but it would probably help a lot if you describe which kind of remaining races you have in mind.

Indeed, that is the only problem in UMA. I just don't like this patch as a solution, as it's not required in the common case.

Indeed, that is the only problem in UMA.

Fine.

I just don't like this patch as a solution, as it's not required in the common case.

The atomic update is just one part of the patch; the other part is still necessary. Going to update this revision.

olce retitled this revision from vm_domainset: Make the iterators atomic to vm_domainset: Ensure round-robin works properly.
olce edited the summary of this revision.

Update after discussion and fixing the last use of vm_domainset_iter_page_init() with an object not write-locked in D52982.

This revision is now accepted and ready to land. Oct 9 2025, 1:31 PM
This revision was automatically updated to reflect the committed changes.