
(umaperf 3/7) atomic zone limits
Closed, Public

Authored by jeff on Dec 15 2019, 11:37 PM.

Details

Summary

This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. It also reduces the cost of uma_zalloc()/uma_zfree() by approximately 30% each in my tests.

This uses an atomic add approach to uz_items. The top 20 bits store a sleeper count and the bottom 44 store the item count. Since sleepers are stored above items, any zone with sleepers will be considered above its zone limit, as expected. I kept the sleepers field because it changes less frequently and it is checked on every uma_zfree() call. It could become essentially a bool that is only updated once for any number of sleepers if this shows up as a problem in profiles. I suspect hitting your zone limits is rare enough that it should not be a problem.
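
For reference, the layout and the macro names that come up later in this review look roughly like this (a reconstruction for orientation, not the diff itself):

```c
/*
 * uz_items encoding (sketch):
 *
 *  63           44 43                                           0
 * +---------------+---------------------------------------------+
 * |   sleepers    |                 item count                  |
 * +---------------+---------------------------------------------+
 *
 * 2^44 (~1.8e13) items and 2^20 (~1M) sleepers are representable.
 */
#define	UZ_ITEMS_SLEEPER_SHIFT	44LL
#define	UZ_ITEMS_SLEEPER	(1LL << UZ_ITEMS_SLEEPER_SHIFT)
#define	UZ_ITEMS_COUNT_MASK	(UZ_ITEMS_SLEEPER - 1)
#define	UZ_ITEMS_COUNT(x)	((x) & UZ_ITEMS_COUNT_MASK)
#define	UZ_ITEMS_SLEEPERS(x)	((x) >> UZ_ITEMS_SLEEPER_SHIFT)
```

A single 64-bit atomic add can then adjust either field: adding n claims items, adding UZ_ITEMS_SLEEPER registers a sleeper, and any non-zero sleeper count makes the raw value compare above every valid 44-bit limit.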

This could be abstracted out into a generic api with some care and effort. UMA has somewhat particular properties that made me less inclined to do so.

I'm not totally in love with the wakeup approach, but I didn't really have the resources to test alternatives with a real workload. If someone does, I have a few proposals.

It looks like this will break 32-bit ppc and 32-bit mips. bdragon has proposed adding a hash of locks to support these cases. Warner suggested I could simply break them and let the maintainers catch up. For patches beyond this one I also need atomic_testandset() and atomic_testandclear() to be broadly supported. I would like people with an interest in these architectures to comment, and to approve this review if they are comfortable filling in the blanks, or to let me know if they have time to implement the missing features first.

Test Plan

I have tested this by purposefully exhausting zones, monitoring item and sleeper counts, and verifying forward progress.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 28285
Build 26395: arc lint + arc unit

Event Timeline

jeff edited the test plan for this revision.
jeff added reviewers: mav, markj, rlibby, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
sys/vm/uma_core.c
3338

Seems like zone_log_warning() and zone_maxaction() could be combined.

3352

Can you assert that the sleeper count doesn't overflow?

3368

I'm a bit confused: uz_sleepers is in the same cache line as uz_bucket_size.

3370

This can just be atomic_add_int()? Or _32 to be more precise.

3383

atomic_subtract_32()

3385

I don't understand this: count is the number of items we wish to allocate. In the fcmpset loop above, if sleepers were present, we only added UZ_ITEMS_SLEEPER to uz_items. What is count - UZ_ITEMS_SLEEPER?

Here and below it looks like the code was written under the assumption that count was already added to the item count. But if we went to sleep, we had only set uz_items = uz_items + UZ_ITEMS_SLEEPER.

3393

Maybe item_count or something else so that we don't have local variables called cnt and count?

3399

Redundant parens.

3438

Missing parens.

3445

KASSERT count > 0?

3537–3538

We are assuming here and in zone_alloc_item() that uz_max_items isn't going to transition between 0 and a non-zero number. That's probably a reasonable assumption in practice, but it'd also be easy to use a local var to record whether or not zone_alloc_limit() was called. Else we should be more careful to document how uma_zone_set_max() should be used.

3953

It is kind of weird that nitems is an int here given that you reserve 44 bits for the item count in uz_max_items.

sys/vm/uma_core.c
3385

Sorry, I see now.

sys/vm/uma_core.c
3214–3215

This comparison is wrong now; we need to mask off the sleepers.

4565–4567

We need to mask off the sleeper count.

sys/vm/uma_core.c
3352

Yes, that's a good suggestion.
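
Something like this, perhaps (a sketch; UZ_ITEMS_SLEEPERS_MAX is a hypothetical name for the 20-bit ceiling):

```c
KASSERT(UZ_ITEMS_SLEEPERS(old) < UZ_ITEMS_SLEEPERS_MAX,
    ("zone %s overflowed the sleeper count", zone->uz_name));
```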

3368

Items are adjusted much more frequently than sleepers. When there are sleepers we need to dirty the line so that callers of free can observe it. We don't need to dirty it for every sleeper, but it is simpler this way, and your perf sucks when you run out of memory anyway.

I didn't want every counter increment on uz_items to invalidate the first couple of cache lines that are accessed in the faster paths.
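
Schematically, the layout concern is this (field placement is illustrative; only uz_items, uz_sleepers, and uz_bucket_size are from the discussion):

```c
struct uma_zone_sketch {
	/* Hot, read-mostly line touched by every alloc/free. */
	uint32_t	uz_bucket_size;
	uint32_t	uz_sleepers;	/* Written only when a limit is hit. */

	/* Frequently incremented; kept off the hot line above. */
	volatile uint64_t uz_items __aligned(CACHE_LINE_SIZE);
};
```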

3370

Yeah, no need for fetchadd. I had a different algorithm here at one point.

3385

Possibly deserves a better comment.
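
Roughly, the loop in question is shaped like this (a reconstruction for orientation, not the diff itself; zone_alloc_limit_hard() is the slow path named elsewhere in this review):

```c
static int
zone_alloc_limit(uma_zone_t zone, int count, int flags)
{
	uint64_t old, new, max;

	max = zone->uz_max_items;
	old = zone->uz_items;
	do {
		/*
		 * Sleepers live above bit 44, so any sleeper already
		 * present pushes the raw value over every valid limit
		 * and we queue behind them rather than claiming items.
		 */
		if (old + count <= max)
			new = old + count;		/* Claim the items. */
		else
			new = old + UZ_ITEMS_SLEEPER;	/* Register a sleeper. */
	} while (atomic_fcmpset_64(&zone->uz_items, &old, new) == 0);
	if (new <= max)
		return (count);
	/*
	 * Only UZ_ITEMS_SLEEPER was added above, not count; the slow
	 * path must not subtract items it never claimed.
	 */
	return (zone_alloc_limit_hard(zone, count, flags));
}
```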

3537–3538

You currently cannot clear a limit by setting it to zero. There are a number of things that would have to be fixed for that to work.

3953

Yeah, I noticed this as well. Not sure how much the sysctl type is part of our ABI contract.

sys/vm/uma_core.c
3214–3215

This assert is not safe anymore anyway, since we now deliberately overflow and then detect it.

Review feedback. Don't evaluate uz_max_items multiple times.

Generally looking good, but I do have a few more comments.

sys/vm/uma_core.c
896

"cpu queue must be locked" I think this is stale now?

2112

"number _of_ items"

3361

This should be determined by whether we added UZ_ITEMS_SLEEPER. Otherwise if we race with the max being reduced, we might go down the sleep path without having added UZ_ITEMS_SLEEPER. We can either check new <= max (i.e. the cached value) or just new < UZ_ITEMS_SLEEPER.

3390

UZ_ITEMS_SLEEPER is signed... I don't think this is a problem in practice since we use cc -fwrapv, but it could be prettier.

3434

uz_sleeps is not being maintained anymore. Was that deliberate? If so, should comment that it's only left in place for ABI reasons. Else, should update it in zone_alloc_limit_hard().

3537–3538

I'm not sure that we can ever allow setting the limit from non-zero to zero and keep the unlocked checks. But we could convert a uma_zone_set_max(0) to a maximum limit, ((1LL << (UZ_ITEMS_SLEEPER_SHIFT - 1)) - 1). It would have more cost than never having set the limit, but it should be safe.
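
Something along these lines, maybe (a sketch of the suggestion only; the real uma_zone_set_max() also locks the zone and rounds to the bucket size):

```c
void
uma_zone_set_max(uma_zone_t zone, int nitems)
{
	/*
	 * Treat 0 as "no effective limit": clamp to the largest
	 * representable count, leaving headroom below the sleeper
	 * bits for fast-path overshoot.
	 */
	zone->uz_max_items = (nitems == 0) ?
	    (1LL << (UZ_ITEMS_SLEEPER_SHIFT - 1)) - 1 : nitems;
}
```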

3952–3953

(As commented elsewhere, we should handle nitems=0 in some way.)

3953

(Also, beware, we do need enough bits to be able to exceed the limit somewhat (up to several times uz_bucket_size), due to how the fast path works.)

3964–3968

I know this is not the interesting routine but...

We don't hold the sleepq lock here, and we send a wakeup(), and then we set the limit. So I think there is technically a race here, if a client were to lower a limit on a live zone.

4565–4570

Why do we do this at all? Can't we just always use the else case?

...ah, is this about uma_zone_reserve_kva()? And if so, do we care to reflect that here? And if so, shouldn't we check uk_kva instead?

sys/vm/uma_core.c
3964–3968

Er, I mean *raise* a limit on a live zone.

sys/vm/uma_core.c
3964–3968

So the nice thing about using the sleepq for synchronization this way is that you can just order your writes. I do need to move the wakeup last, but I do not need to protect the whole block with sleepq_lock().
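
That is, roughly (a sketch; the wait channel is illustrative):

```c
/*
 * Publish the raised limit before waking anyone.  A sleeper woken
 * by this wakeup() then re-checks uz_items against the new limit;
 * one that wakes spuriously beforehand just sees the old limit and
 * sleeps again.  No sleepq_lock() around the block.
 */
atomic_store_rel_64(&zone->uz_max_items, nitems);
wakeup(&zone->uz_max_items);
```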

4565–4570

I just preserved the old behavior. It really doesn't make a ton of sense to me either. I will see if gleb will comment.

Review feedback from rlibby.

sys/vm/uma_int.h
434

Possibly this should be uz_items because bkt_count is already in here.

Fixups look good. I agree with leaving an XXX and coming back later to the problem of transitioning between no-limit and limit.

This revision is now accepted and ready to land. Dec 23 2019, 9:33 PM

32-bit mips doesn't have 64-bit atomics...

jeff added reviewers: imp, kib, avg, kevans, bdragon, jhibbits.