
UMA limits in zone & more
Closed, Public

Authored by glebius on Oct 30 2018, 11:56 PM.

Details

Summary

This review isn't supposed to be committed as a whole!

It consists of UMA limits in zone together with some extra UMA changes.
It also has changes to create the pbuf zone. The latter depends
on having UMA limits in zone.

The review is posted to review the UMA changes only.

o Account the number of items a zone holds. The counter is updated whenever
items transition from the keg to the bucket cache or directly to a consumer.
A rough sketch of this accounting follows the list below.

  • New fields uz_items, uz_maxitems. Deleted uk_maxpages.
  • The point at which we return NULL or sleep shifts from keg_fetch_slab() to uma_zalloc_arg().
  • zone_alloc_bucket() is inlined.
  • Add uz_count_max to prevent the bucket cache from growing too big for zones with really small limits. See the update to uma_zone_set_max().
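A rough sketch of how that accounting might look on the allocation path. This is hypothetical, not the committed code: the uz_items/uz_maxitems fields and the ZONE_LOCK()/ZONE_UNLOCK() macros follow the patch and uma_int.h, while the helper name and exact placement are made up for illustration.

/*
 * Hypothetical sketch only, not the committed code.  Assumes the zone
 * lock protects uz_items and that uz_maxitems == 0 means "no limit".
 */
static bool
zone_charge_items(uma_zone_t zone, int count)
{

	ZONE_LOCK(zone);
	if (zone->uz_maxitems > 0 &&
	    zone->uz_items + count > zone->uz_maxitems) {
		/* Over the limit: the caller sleeps (M_WAITOK) or fails. */
		ZONE_UNLOCK(zone);
		return (false);
	}
	zone->uz_items += count;	/* Items are now charged to the zone. */
	ZONE_UNLOCK(zone);
	return (true);
}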

o Keep a counter of sleepers and wake them up one by one, as sketched below.
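A minimal sketch of that pairing, assuming the uz_sleepers/uz_items/uz_maxitems fields from the patch and the stock mtx_sleep(9)/wakeup_one(9) primitives; illustrative only, not the committed code.

/* Allocation side, zone lock held, when the limit has been reached: */
while (zone->uz_items >= zone->uz_maxitems) {
	zone->uz_sleepers++;
	mtx_sleep(zone, zone->uz_lockptr, PVM, "zonelimit", 0);
	zone->uz_sleepers--;
}

/* Free side, after uz_items has been decremented, zone lock held: */
if (zone->uz_sleepers > 0)
	wakeup_one(zone);	/* wake exactly one waiter, not the whole herd */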

o Optimize uma_zfree_arg() to keep the lock held across backward jumps.

o Allow limiting the number of items in the bucket cache; see the sketch after this list.

  • UMA_ZONE_NOBUCKETCACHE is removed; just use a limit of 0.
  • zone_cache_bucket()/zone_remove_bucket() keep the count of items.
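Assuming the existing uz_buckets list and ub_cnt bucket field, the two helpers might look roughly like this. This is a hypothetical sketch, not the committed code; uz_bktcount is the counter name used elsewhere in this review (later renamed uz_bkt_count).

/* Zone lock held for both helpers. */
static void
zone_cache_bucket(uma_zone_t zone, uma_bucket_t bucket)
{

	LIST_INSERT_HEAD(&zone->uz_buckets, bucket, ub_link);
	zone->uz_bktcount += bucket->ub_cnt;
}

static void
zone_remove_bucket(uma_zone_t zone, uma_bucket_t bucket)
{

	LIST_REMOVE(bucket, ub_link);
	zone->uz_bktcount -= bucket->ub_cnt;
}

A bucket is only cached while the count stays below the configured limit, so a limit of 0 gives the behaviour previously requested with UMA_ZONE_NOBUCKETCACHE.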

o Drop support for multi-keg zones; lots of stuff is deleted and
simplified. Make sure uma_zone_t is now perfectly aligned.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 21442
Build 20764: arc lint + arc unit

Event Timeline

Merge remote-tracking branch 'FreeBSD/master' into pbuf to pull in
and resolve against the recent D16666 check-in.

sys/vm/uma_core.c
493

This assertion should go at the beginning of the function.

3093

What's the point of deferring the unlock? We hold a critical section here and there's a free slot in uc_freebucket, so we're guaranteed to succeed during the restart. In all other cases we are jumping back with locked == false, right?

  • Move assertion to beginning of the function.
  • Remove unlock avoidance in uma_zfree_arg(). Can't remember why it was there.

Can you explain how the zone/keg locking works after the simplification? I'm having trouble understanding it.

sys/vm/uma.h
500

"an upper limit" would sound more natural to me, and you use that term below.

sys/vm/uma_core.c
602

We want the zone lock here, not the keg lock.

2602

Why did this get inlined? I can't see the reason.

2891–2912

The extra locking is rather unfortunate. Why do we do the check here and not somewhere where we already hold the zone lock?

3472–3473

Since you did it elsewhere, get rid of this return statement too?

sys/vm/uma_int.h
319

"This zone's keg"

glebius added inline comments.
sys/vm/uma.h
500

But the existing uma_zone_set_max() comment already has this wording.

sys/vm/uma_core.c
602

In the new scheme, if a zone has a keg, they share a lock.

zone_timeout() is called only for keg-enabled zones; see zone_foreach(). So KEG_LOCK() here is correct and self-documenting.
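For context, the relevant lock macros in sys/vm/uma_int.h look approximately like this (paraphrased, not copied verbatim): a keg-backed zone's uz_lockptr points at its keg's uk_lock, so ZONE_LOCK() and KEG_LOCK() take the same mutex.

#define	KEG_LOCK(k)	mtx_lock(&(k)->uk_lock)
#define	KEG_UNLOCK(k)	mtx_unlock(&(k)->uk_lock)
#define	ZONE_LOCK(z)	mtx_lock((z)->uz_lockptr)	/* == &keg->uk_lock */
#define	ZONE_UNLOCK(z)	mtx_unlock((z)->uz_lockptr)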

glebius added inline comments.
sys/vm/uma_core.c
2602

I'm preparing a patch to put it back into a separate function, and to be fair, I feel that it is more difficult to understand the code when the middle of uma_zalloc_arg() is separated from the rest of the function.

If you really insist, I can roll this back. However, zone_alloc_bucket() would be modified: the inlined code doesn't exactly match the code of the function. It has a few additions to roll back uz_items in case of an allocation failure or a strange bucket.

2891–2912

Because uz_import is not always zone_import(). For the vm_page zones, uz_import is outside of UMA and doesn't lock the zone. Right before uz_import is the only point where we can do the bookkeeping of allocations.

The performance impact here is not measurable. We are already on a slow path and are about to acquire some kind of lock in uz_import; for regular zones it is the same lock, so the same cache line. Also, the ratio of slow-path to fast-path allocations is expected to be very low. For example, for vm_page we got one slow-path allocation per 10^7 immediate returns from the per-CPU cache with zero jumps to zalloc_start.
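A sketch of the ordering being described here; illustrative only, with the uz_import call shown in a simplified form and the rollback mirroring the one mentioned for the inlined zone_alloc_bucket() code above.

/*
 * Charge the items to the zone before calling the import routine,
 * because uz_import may live outside UMA (e.g. the vm_page zones)
 * and will not take the zone lock itself.
 */
ZONE_LOCK(zone);
zone->uz_items += count;		/* bookkeeping under the zone lock */
ZONE_UNLOCK(zone);
imported = zone->uz_import(zone->uz_arg, items, count, flags); /* simplified */
if (imported < count) {
	ZONE_LOCK(zone);
	zone->uz_items -= count - imported;	/* roll back what we didn't get */
	ZONE_UNLOCK(zone);
}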

sys/vm/uma_core.c
735

Shouldn't we be adding bucket->ub_cnt to uz_frees?

2498

I don't understand this check. In the common case the bucket will be placed in the per-CPU cache, so it won't count against uz_bktcount. And we already check this condition again before adding it to the bucket list.

2553

Should assert that zone->uz_items >= max - bucket->ub_cnt before the subtraction.

2602

I still think I prefer having a separate function. The details of how we fill a bucket aren't really relevant to the core allocation algorithm. At least, please pick a more specific variable name than "max".

2919

What problem does this line fix?

3302

It is weird to me that the patch is being so careful in general to do precise accounting, but then just adjusts the caller's request here. Why not simply disable per-CPU caching in this case?

3613

The parens are not needed anymore.

sys/vm/uma_int.h
226–228

Why?

glebius marked 9 inline comments as done and an inline comment as not done. Dec 3 2018, 11:19 PM
glebius added inline comments.
sys/vm/uma_core.c
2498

You are absolutely correct. This came from the Netflix tree, and my later profiling showed that the condition never happens.

3302

Without per-CPU caching UMA isn't a useful allocator at all.

Note that uma_zone_set_max() never claimed to put a hard cap on a zone. With the new code we are getting closer to the requested cap than we used to.

sys/vm/uma_int.h
226–228

So that we may use a single pointer for uz_keg/uz_lockptr. Should I put that in a comment?

sys/vm/uma_core.c
3302

And yet UMA_NOBUCKET exists. :)

I pointed this out because you are using UMA zones to manage pbufs, and the existing allocator is precise. For certain uses, where the max is a function of the amount of physical memory, it may be that we don't care about per-CPU caching at all on small systems.

sys/vm/uma_int.h
226–228

Right, I see now. A comment would be helpful, I think.

glebius marked 2 inline comments as done.

Address Mark's review.

glebius added inline comments.
sys/vm/uma_core.c
3302

UMA_NOBUCKET doesn't turn off the per-CPU caches.

The existing allocator is precise, but the maximums provided to it are very rough guesses: "Mkay, 1/2 nswbufs might not be enough, but 3/4 would be fine..." For pbufs, the potential adjustment done by uma_zone_set_max() is entirely okay. I even see it as autotuning: the more CPUs a machine has, the more buffers it will have.

sys/vm/uma_int.h
226–228

Will do.

glebius marked an inline comment as done.
  • Improve comment on why uk_lock needs to be first.
sys/vm/uma_core.c
3302

Are you thinking of UMA_NOBUCKETCACHE? That is a different flag (and I'm glad it's gone with this change). It is a bug if UMA_NOBUCKET _doesn't_ turn off per-CPU caches.

glebius added inline comments.
sys/vm/uma_core.c
3302

Ah, sorry. Yes, I meant UMA_NOBUCKETCACHE.

Most of my inline comments are nits, but: I don't really understand how the wakeup() mechanism works with the per-CPU caches. Threads may remain asleep on "zonelimit" indefinitely if uz_items == uz_maxitems but active threads are able to avoid the slow path. Before, the ZFLAG_FULL check in the zfree fast path handled this. Is there anything preventing starvation?

sys/vm/uma_core.c
2495

I prefer an explicit comparison: if (zone->uz_maxitems > 0)

2496

Why do we duplicate the logic from zone_alloc_item() instead of simply jumping to zone_alloc_item()?

2505

msleep(9) is deprecated by mtx_sleep(9).

2517

Explicit comparison: zone->uz_sleepers > 0.

2517

Why do you not do a wakeup here?

2600

Consider adding a KEG_LOCK_ASSERT()?

3085

Explicit comparison: uz_sleepers > 0.

sys/vm/uma_int.h
238–239

I don't believe there's any padding here.

328

To be consistent this should be uz_max_items.

341

Again, I'd prefer consistency in the names: uz_bkt_count, uz_bkt_max.

430

Seems to be a dup?

glebius marked an inline comment as done.
  • Rewrite uma_zalloc_arg() so that there is a single sleeping point.

A few minor nits:

  • Rename new fields so that words are separated (uz_max_items, uz_bkt_count, uz_bkt_max).
  • Remove the duplicate define.
  • Use explicit comparisons against 0.
  • Add a KEG_LOCK_ASSERT().
  • There is no padding here.
sys/vm/uma_core.c
2557

Why not have it in zone_alloc_item_locked()? With the current diff, uma_large_malloc(), for example, can bypass limits.

  • Move the sleep into zone_alloc_item_locked() and put it into a loop.

The extra locking in zone_alloc_item() still makes me uncomfortable. For zones with small items it may be negligible, but large-item zones such as those used by ZFS may be impacted more significantly.

I'd suggest asking Peter to stress test the patch.

sys/vm/uma_core.c
2884

I would comment, /* Returns with the zone unlocked. */

2906

I'd find this clearer as max_items - items > 1.

3302

I still don't really like this, but don't object to it.

This revision is now accepted and ready to land. Dec 14 2018, 6:13 PM
glebius added inline comments.
sys/vm/uma_core.c
2906

To me, your suggestion looks more obfuscated rather than clearer.

sys/vm/uma_int.h
337

This is now used for a single pointer and can be removed.

343

Did you verify this offset?

sys/vm/uma_int.h
337

Yes, I noticed that. At Netflix we already have that substituted with another field. I will address that later.

343

I did. Will verify once again pre-commit.

This revision was automatically updated to reflect the committed changes.