
(umaperf 3/7) atomic zone limits
ClosedPublic

Authored by jeff on Dec 15 2019, 11:37 PM.
Details

Summary

This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. It also reduces the cost of uma_zalloc/zfree by approximately 30% each in my tests.

This uses an atomic add approach to uz_items. The top 20 bits store a sleeper count and the bottom 44 store the item count. Since sleepers are stored above items, any zone with sleepers will be considered above its zone limit, as expected. I kept the sleepers field because it changes less frequently and it is checked on every uma_zfree call. It could become essentially a bool that is only updated once for any number of sleepers if this shows up as a problem in profiles. I suspect hitting your zone limits is rare enough that it should not be.
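For illustration, a minimal sketch of the encoding and fast-path check described above, assuming the UZ_ITEMS_SLEEPER* names discussed later in this review; the helper itself is hypothetical and not the patch's actual code:

#include <sys/types.h>
#include <machine/atomic.h>

/* Sketch of the uz_items encoding: 20 sleeper bits above 44 item bits. */
#define	UZ_ITEMS_SLEEPER_SHIFT	44
#define	UZ_ITEMS_SLEEPER	(1ULL << UZ_ITEMS_SLEEPER_SHIFT)
#define	UZ_ITEMS_MASK		(UZ_ITEMS_SLEEPER - 1)

/*
 * Hypothetical fast path: optimistically reserve 'count' items with a
 * single atomic add.  Because sleeper bits sit above the item count, any
 * sleepers push 'new' past uz_max_items and force the slow path.
 */
static int
zone_limit_try(volatile uint64_t *uz_items, uint64_t uz_max_items, int count)
{
	uint64_t new;

	new = atomic_fetchadd_64(uz_items, count) + count;
	if (__predict_true(new <= uz_max_items))
		return (1);
	/* Over the limit or sleepers present: back out, take the slow path. */
	atomic_subtract_64(uz_items, count);
	return (0);
}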

This could be abstracted out into a generic API with some care and effort. UMA has somewhat particular properties that made me less inclined to do so.

I'm not totally in love with the wakeup approach but I didn't really have the resources to test alternatives with a real workload. If someone does I have a few proposals.

It looks like this will break 32-bit ppc and 32-bit mips. bdragon has proposed that a hash of locks be added to support this for those cases. Warner suggested I could simply break them and let the maintainers catch up. For patches other than this one I also need testandset and testandclear to be broadly supported. I would like people with an interest in these archs to comment and approve this review if they are comfortable filling in the blanks, or to let me know if they have time to implement the missing features first.
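For illustration only, a rough sketch of the hash-of-locks idea mentioned above; none of these names exist in the tree, and lock initialization (mtx_init() with MTX_SPIN) is omitted:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Hypothetical emulation of a 64-bit fetchadd with a small hash of spin locks. */
#define	A64_NLOCKS	16
static struct mtx a64_locks[A64_NLOCKS];	/* mtx_init(..., MTX_SPIN) done at boot */

static struct mtx *
a64_lock_for(volatile uint64_t *p)
{
	/* Hash the address; the shift drops low alignment bits. */
	return (&a64_locks[((uintptr_t)p >> 3) % A64_NLOCKS]);
}

static uint64_t
emul_fetchadd_64(volatile uint64_t *p, uint64_t v)
{
	struct mtx *m;
	uint64_t old;

	m = a64_lock_for(p);
	mtx_lock_spin(m);
	old = *p;
	*p = old + v;
	mtx_unlock_spin(m);
	return (old);
}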

Test Plan

I have tested this by purposefully exhausting zones, monitoring item and sleeper counts, and verifying forward progress.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jeff edited the test plan for this revision. (Show Details)
jeff added reviewers: mav, markj, rlibby, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
sys/vm/uma_core.c
3333 ↗(On Diff #65695)

Seems like zone_log_warning() and zone_maxaction() could be combined.

3347 ↗(On Diff #65695)

Can you assert that the sleeper count doesn't overflow?

3363 ↗(On Diff #65695)

I'm a bit confused: uz_sleepers is in the same cache line as uz_bucket_size.

3365 ↗(On Diff #65695)

This can just be atomic_add_int()? Or _32 to be more precise.

3378 ↗(On Diff #65695)

atomic_subtract_32()

3380 ↗(On Diff #65695)

I don't understand this: count is the number of items we wish to allocate. In the fcmpset loop above, if sleepers were present, we only added UZ_ITEMS_SLEEPER to uz_items. What is count - UZ_ITEMS_SLEEPER?

Here and below it looks like the code was written under the assumption that count was already added to the item count. But if we went to sleep, we had only set uz_items = uz_items + UZ_ITEMS_SLEEPER.

3388 ↗(On Diff #65695)

Maybe item_count or something else so that we don't have local variables called cnt and count?

3394 ↗(On Diff #65695)

Redundant parens.

3433 ↗(On Diff #65695)

Missing parens.

3440 ↗(On Diff #65695)

KASSERT count > 0?

3522 ↗(On Diff #65695)

We are assuming here and in zone_alloc_item() that uz_max_items isn't going to transition between 0 and a non-zero number. That's probably a reasonable assumption in practice, but it'd also be easy to use a local var to record whether or not zone_alloc_limit() was called. Else we should be more careful to document how uma_zone_set_max() should be used.

3938 ↗(On Diff #65695)

It is kind of weird that nitems is an int here given that you reserve 44 bits for the item count in uz_max_items.

sys/vm/uma_core.c
3380 ↗(On Diff #65695)

Sorry, I see now.

sys/vm/uma_core.c
3207 ↗(On Diff #65695)

This comparison is wrong now; we need to mask off sleepers.

4550 ↗(On Diff #65695)

We need to mask off the sleeper count.

sys/vm/uma_core.c
3347 ↗(On Diff #65695)

Yes that's a good suggestion.

3363 ↗(On Diff #65695)

Items are adjusted much more frequently than sleepers. When there are sleepers we need to dirty the line so callers to free can observe it. We don't need to dirty it for every sleeper, but it is simpler this way, and your perf sucks when you run out of memory anyway.

I didn't want every counter increment on uz_items to invalidate the first couple of cache lines that are accessed in faster paths.

3365 ↗(On Diff #65695)

Yeah, no need for fetchadd. I had a different algorithm here at one point.

3380 ↗(On Diff #65695)

Possibly deserves a better comment.

3522 ↗(On Diff #65695)

You currently cannot clear a limit by setting it to zero. There are a number of things that will have to be fixed for that to work.

3938 ↗(On Diff #65695)

Yeah, I noticed this as well. Not sure how much the sysctl type is part of our ABI contract.

sys/vm/uma_core.c
3207 ↗(On Diff #65695)

This assert is not safe anymore anyway since we now overflow and detect.

Review feedback. Don't evaluate uz_max_items multiple times.

Generally looking good, but I do have a few more comments.

sys/vm/uma_core.c
896 ↗(On Diff #65902)

"cpu queue must be locked" I think this is stale now?

2112 ↗(On Diff #65902)

"number _of_ items"

3361 ↗(On Diff #65902)

This should be determined by whether we added UZ_ITEMS_SLEEPER. Otherwise if we race with the max being reduced, we might go down the sleep path without having added UZ_ITEMS_SLEEPER. We can either check new <= max (i.e. the cached value) or just new < UZ_ITEMS_SLEEPER.

3390 ↗(On Diff #65902)

UZ_ITEMS_SLEEPER is signed... I don't think this is a problem in practice since we use cc -fwrapv, but it could be prettier.

3434 ↗(On Diff #65902)

uz_sleeps is not being maintained anymore. Was that deliberate? If so, should comment that it's only left in place for ABI reasons. Else, should update it in zone_alloc_limit_hard().

3952–3953 ↗(On Diff #65902)

(As commented elsewhere, we should handle nitems=0 in some way.)

3964–3968 ↗(On Diff #65902)

I know this is not the interesting routine but...

We don't hold the sleepq lock here, and we send a wakeup(), and then we set the limit. So I think there is technically a race here, if a client were to lower a limit on a live zone.

4565–4570 ↗(On Diff #65902)

Why do we do this at all? Can't we just always use the else case?

...ah, is this about uma_zone_reserve_kva()? And if so, do we care to reflect that here? And if so, shouldn't we check uk_kva instead?

3522 ↗(On Diff #65695)

I'm not sure that we can ever allow setting the limit from non-zero to zero and keep the unlocked checks. But we could convert a uma_zone_set_max(0) to a maximum limit, ((1LL << (UZ_ITEMS_SLEEPER_SHIFT - 1)) - 1). It would have more cost than never having set the limit, but it should be safe.
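A tiny sketch of that suggestion (hypothetical helper, not the committed code), reusing the UZ_ITEMS_SLEEPER_SHIFT constant named above:

/*
 * Hypothetical: map a uma_zone_set_max(zone, 0) request to an effectively
 * unlimited maximum instead of clearing the limit.
 */
static uint64_t
zone_clamp_max(int nitems)
{
	if (nitems <= 0)
		return ((1LL << (UZ_ITEMS_SLEEPER_SHIFT - 1)) - 1);
	return ((uint64_t)nitems);
}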

3938 ↗(On Diff #65695)

(Also, beware, we do need enough bits to be able to exceed the limit somewhat (up to several times uz_bucket_size), due to how the fast path works.)

sys/vm/uma_core.c
3964–3968 ↗(On Diff #65902)

Er, I mean *raise* a limit on a live zone.

sys/vm/uma_core.c
3964–3968 ↗(On Diff #65902)

So the nice thing about using the sleepq for synchronization this way is that you can just order your writes. I do need to move the wakeup last, but I do not need to protect the whole block with sleepq_lock().
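A hedged sketch of that ordering (field and wait-channel names assumed for illustration): store the raised limit first, then issue the wakeup.

/*
 * Illustrative only: publish the new maximum with release semantics, then
 * wake sleepers, so a woken thread always re-checks against the raised limit.
 */
static void
zone_raise_limit(uma_zone_t zone, uint64_t new_max)
{
	atomic_store_rel_64(&zone->uz_max_items, new_max);
	wakeup(&zone->uz_max_items);	/* wakeup last, as noted above */
}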

4565–4570 ↗(On Diff #65902)

I just preserved the old behavior. It really doesn't make a ton of sense to me either. I will see if gleb will comment.

Review feedback from rlibby.

sys/vm/uma_int.h
434 ↗(On Diff #65937)

Possibly this should be uz_items because bkt_count is already in here.

Fixups look good. I agree with leaving an XXX and coming back later to the problem of transitioning between no-limit and limit.

This revision is now accepted and ready to land. Dec 23 2019, 9:33 PM

32-bit mips doesn't have 64-bit atomics...

jeff added reviewers: imp, kib, avg, kevans, bdragon, jhibbits.