Page MenuHomeFreeBSD

UMA limits in zone & more

Authored by glebius on Oct 30 2018, 11:56 PM.



This review isn't supposed to be committed as a whole!

It consist UMA limits in zone with some extra UMA changes.
It also has changes to create pbuf zone. The latter depends
on having UMA limits in zone.

The review is posted to review UMA changes only.

o Account number of items a zone holds. Counter is updated whenever
items transition from keg to a bucket cache or directly to a consumer.

  • New fields uz_items, uz_maxitems. Deleted uk_maxpages.
  • Point to return NULL or sleep shifts from keg_fetch_slab() to uma_zalloc_arg()
  • zone_alloc_bucket() inlined
  • Add uz_count_max to prevent bucket cache to grow too big for zones with really small limits. See update to uma_zone_set_max().

o Keep counter of sleepers and wakeup them one by one.

o Optimize uma_zfree_arg() to keep lock locked during back jumps

o Allow to limit number of items in bucket cache.

  • UMA_ZONE_NOBUCKETCACHE removed, just use limit of 0.
  • zone_cache_bucket()/zone_remove_bucket() to keep count of items

o Drop support for multi-keg zones, lots stuff deleted and
simplified. Make sure uma_zone_t is now perfectly aligned.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius created this revision.Oct 30 2018, 11:56 PM
glebius updated this revision to Diff 50429.Nov 14 2018, 8:35 PM

Merge remote-tracking branch 'FreeBSD/master' into pbuf to take
in and resolve with recent D16666 check-in.

markj added inline comments.Nov 21 2018, 4:27 AM
489 ↗(On Diff #50429)

This assertion should go at the beginning of the function.

3095 ↗(On Diff #50429)

What's the point of deferring the unlock? We hold a critical section here and there's a free slot in uc_freebucket, so we're guaranteed to succeed during the restart. In all other cases we are jumping back with locked == false, right?

glebius marked 2 inline comments as done.Nov 26 2018, 7:46 PM
glebius updated this revision to Diff 51136.Nov 26 2018, 9:31 PM
  • Move assertion to beginning of the function.
  • Remove unlock avoidance in uma_zfree_arg(). Can't remember why it was

Can you explain how the zone/keg locking work after the simplification? I'm having trouble understanding it.

500 ↗(On Diff #51136)

"an upper limit" would sound more natural to me, and you use that term below.

600 ↗(On Diff #51136)

We want the zone lock here, not the keg lock.

2602 ↗(On Diff #51136)

Why did this get inlined? I can't see the reason.

2896 ↗(On Diff #51136)

The extra locking is rather unfortunate. Why do we do the check here and not somewhere where we already hold the zone lock?

3473 ↗(On Diff #51136)

Since you did it elsewhere, get rid of this return statement too?

318 ↗(On Diff #51136)

"This zone's keg"

glebius marked 4 inline comments as done.Nov 27 2018, 11:12 PM
glebius added inline comments.
500 ↗(On Diff #51136)

But existing uma_zone_set_max() comment already has this wording.

600 ↗(On Diff #51136)

In the new scheme if zone has a keg, they share lock.

The zone_timeout() is called for only keg-enabled zones, see zone_foreach(). So, KEG_LOCK() here is correct and is self-documenting.

glebius marked 2 inline comments as done.Nov 27 2018, 11:35 PM
glebius added inline comments.
2602 ↗(On Diff #51136)

I'm preparing a patch to put it back into separate function, and to be fair, I feel that it is more difficult to understand the code when a middle of uma_zalloc_arg() is separated from the rest of the function.

If you really insist I can roll this back. However, zone_alloc_bucket() would be modified. You see, the inlined code doesn't exactly match code of the function. It has few additions to rollback uz_items in case of allocation failure or strange bucket.

2896 ↗(On Diff #51136)

Because uz_import is not always zone_import(). For vm_page zones uz_import it is outside of UMA and doesn't lock zone. Right before uz_import is the only point where we can do bookkeeping of allocations.

The performance impact here is not measurable. We are already at a slow path and are about to acquire some kind of lock in uz_import. For regular zones, the same lock, so same cache line. Also, ratio of fastpath allocs to slowpath is expected to be super low. For example for vm_page we got one slowpath alloc for 10^7 immediate returns from per-CPU cache with zero jumps to zalloc_start.

markj added inline comments.Dec 3 2018, 2:45 PM
733 ↗(On Diff #51136)

Shouldn't we be adding bucket->ub_cnt to uz_frees?

2495 ↗(On Diff #51136)

I don't understand this check. In the common case the bucket will be placed in the per-CPU cache, so it won't count against uz_bktcount. And we already check this condition again before adding it to the bucket list.

2573 ↗(On Diff #51136)

Should assert that zone->uz_items >= max - bucket->ub_cnt before the subtraction.

2602 ↗(On Diff #51136)

I still think I prefer having a separate function. The details of how we fill a bucket aren't really relevant to the core allocation algorithm. At least, please pick a more specific variable name than "max".

2922 ↗(On Diff #51136)

What problem does this line fix?

3301 ↗(On Diff #51136)

It is weird to me that the patch is being so careful in general to do precise accounting, but then just adjusts the caller's request here. Why not simply disable per-CPU caching in this case?

3614 ↗(On Diff #51136)

The parens are not needed anymore.

226 ↗(On Diff #51136)


glebius marked 9 inline comments as done and an inline comment as not done.Dec 3 2018, 11:19 PM
glebius added inline comments.
2495 ↗(On Diff #51136)

You are absolutely correct. This came from Netflix tree and later my profiling showed that condition never happens.

3301 ↗(On Diff #51136)

Without per-CPU caching UMA isn't a useful allocator at all.

Note that uma_zone_set_max() never claimed to put a hard cap on a zone. With new code we are getting closer to requested cap than we used before.

226 ↗(On Diff #51136)

So that we may use single pointer for uz_keg/uz_lockptr. Should I put that in comments?

markj added inline comments.Dec 3 2018, 11:45 PM
3301 ↗(On Diff #51136)

And yet UMA_NOBUCKET exists. :)

I pointed this out because you are using UMA zones to manage pbufs, and the existing allocator is precise. For certain uses, where the max is a function of the about of physical memory, it may be that we don't care about per-CPU caching at all on small systems.

226 ↗(On Diff #51136)

Right, I see now. A comment would be helpful, I think.

glebius updated this revision to Diff 51560.Dec 3 2018, 11:49 PM
glebius marked 2 inline comments as done.

Address Mark's review.

glebius marked 3 inline comments as done.Dec 4 2018, 12:20 AM
glebius added inline comments.
3301 ↗(On Diff #51136)

UMA_NOBUCKET doesn't turn off CPU caches.

The existing allocator is precise, but maximums provided to it are very very rough guesses. "Mkay, 1/2 nswbufs might not be enough, but 3/4 would be fine..." For pbufs potential adjustment done by uma_zone_set_max() is entirely okay. I even see it as an autotuning. The more CPUs machine has, the more buffers it would have.

226 ↗(On Diff #51136)

Will do.

glebius updated this revision to Diff 51564.Dec 4 2018, 12:23 AM
glebius marked an inline comment as done.
  • Improve comment on why uk_lock needs to be first.
markj added inline comments.Dec 4 2018, 12:30 AM
3301 ↗(On Diff #51136)

Are you thinking of UMA_NOBUCKETCACHE? That is a different flag (and I'm glad it's gone with this change). It is a bug if UMA_NOBUCKET _doesn't_ turn off per-CPU caches.

glebius marked an inline comment as done.Dec 4 2018, 12:36 AM
glebius added inline comments.
3301 ↗(On Diff #51136)

Ah, sorry. Yes, I meant UMA_NOBUCKETCACHE.

markj added a comment.Dec 5 2018, 7:23 PM

Most of my inline comments are nits, but: I don't really understand how the wakeup() mechanism works with the per-CPU caches. Threads may remain asleep on "zonelimit" indefinitely if uz_items == uz_maxitems but active threads are able to avoid the slow path. Before, the ZFLAG_FULL check in the zfree fast path handled this. Is there anything preventing starvation?

2492 ↗(On Diff #51564)

I prefer an explicit comparison: if (zone->uz_maxitems > 0)

2493 ↗(On Diff #51564)

Why do we duplicate the logic from zone_alloc_item() instead of simply jumping to zone_alloc_item()?

2502 ↗(On Diff #51564)

msleep(9) is deprecated by mtx_sleep(9).

2514 ↗(On Diff #51564)

Explicit comparison: zone->uz_sleepers > 0.

2530 ↗(On Diff #51564)

Why do you not do a wakeup here?

2607 ↗(On Diff #51564)

Consider adding a KEG_LOCK_ASSERT()?

3075 ↗(On Diff #51564)

Explicit comparison: uz_sleepers > 0.

239 ↗(On Diff #51564)

I don't believe there's any padding here.

329 ↗(On Diff #51564)

To be consistent this should be uz_max_items.

342 ↗(On Diff #51564)

Again, I'd prefer consistency in the names: uz_bkt_count, uz_bkt_max.

430 ↗(On Diff #51564)

Seems to be a dup?

glebius updated this revision to Diff 51630.Dec 5 2018, 11:37 PM
glebius marked an inline comment as done.
  • Rewrite uma_zalloc_arg() so that there is a single sleeping point.

Few minor nits:

  • Rename new fields to have space between words.
  • Dup define.
  • Explicit comparison against 0.
  • There is no pad here.
glebius marked 10 inline comments as done.Dec 5 2018, 11:40 PM
markj added inline comments.Dec 6 2018, 7:04 PM
2555 ↗(On Diff #51630)

Why not have it in zone_alloc_item_locked()? With the current diff, uma_large_malloc(), for example, can bypass limits.

glebius updated this revision to Diff 51746.Dec 7 2018, 11:04 PM
  • Move the sleep into zone_alloc_item_locked() and put it into a loop.
glebius marked an inline comment as done.Dec 7 2018, 11:04 PM
markj accepted this revision.Dec 14 2018, 6:13 PM

The extra locking in zone_alloc_item() still makes me uncomfortable. For zones with small items it may be negligible, but large item zones such as those used by ZFS may be impacted more significantly.

I'd suggest asking Peter to stress test the patch.

3301 ↗(On Diff #51136)

I still don't really like this, but don't object to it.

2884 ↗(On Diff #51746)

I would comment, /* Returns with the zone unlocked. */

2906 ↗(On Diff #51746)

I'd fine this clearer as max_items - items > 1.

This revision is now accepted and ready to land.Dec 14 2018, 6:13 PM
glebius marked 2 inline comments as done.Dec 14 2018, 9:01 PM
glebius added inline comments.
2906 ↗(On Diff #51746)

For me your suggestion looks more obfuscated rather then clearer.

jeff added inline comments.Dec 18 2018, 7:35 AM
337 ↗(On Diff #51746)

This is now used for a single pointer and can be removed.

343 ↗(On Diff #51746)

Did you verify this offset?

glebius added inline comments.Jan 14 2019, 11:42 PM
337 ↗(On Diff #51746)

Yes, I noticed that. At Netflix we already have that substituted for an other field. I will address that later.

343 ↗(On Diff #51746)

I did. Will verify once again pre-commit.

This revision was automatically updated to reflect the committed changes.