
UMA limits in zone & more
Closed, Public

Authored by glebius on Oct 30 2018, 11:56 PM.

Details

Summary

This review isn't supposed to be committed as a whole!

It consists of UMA limits in zone together with some extra UMA changes.
It also has changes to create the pbuf zone. The latter depends
on having UMA limits in zone.

The review is posted to review the UMA changes only.

o Account the number of items a zone holds. The counter is updated whenever
items transition from the keg to the bucket cache or directly to a consumer.
A rough sketch of this accounting follows the list below.

  • New fields uz_items, uz_maxitems. Deleted uk_maxpages.
  • The point at which we return NULL or sleep shifts from keg_fetch_slab() to uma_zalloc_arg().
  • zone_alloc_bucket() is inlined.
  • Add uz_count_max to prevent the bucket cache from growing too big for zones with really small limits. See the update to uma_zone_set_max().
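A rough sketch of how that accounting might look on the allocation path. This is hypothetical, not the committed code: the uz_items/uz_maxitems fields and the ZONE_LOCK()/ZONE_UNLOCK() macros follow the patch and uma_int.h, while the helper name and exact placement are made up for illustration.

/*
 * Hypothetical sketch only, not the committed code.  Assumes the zone
 * lock protects uz_items and that uz_maxitems == 0 means "no limit".
 */
static bool
zone_charge_items(uma_zone_t zone, int count)
{

	ZONE_LOCK(zone);
	if (zone->uz_maxitems > 0 &&
	    zone->uz_items + count > zone->uz_maxitems) {
		/* Over the limit: the caller sleeps (M_WAITOK) or fails. */
		ZONE_UNLOCK(zone);
		return (false);
	}
	zone->uz_items += count;	/* Items are now charged to the zone. */
	ZONE_UNLOCK(zone);
	return (true);
}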

o Keep a counter of sleepers and wake them up one by one, as sketched below.
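A minimal sketch of that pairing, assuming the uz_sleepers/uz_items/uz_maxitems fields from the patch and the stock mtx_sleep(9)/wakeup_one(9) primitives; illustrative only, not the committed code.

/* Allocation side, zone lock held, when the limit has been reached: */
while (zone->uz_items >= zone->uz_maxitems) {
	zone->uz_sleepers++;
	mtx_sleep(zone, zone->uz_lockptr, PVM, "zonelimit", 0);
	zone->uz_sleepers--;
}

/* Free side, after uz_items has been decremented, zone lock held: */
if (zone->uz_sleepers > 0)
	wakeup_one(zone);	/* wake exactly one waiter, not the whole herd */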

o Optimize uma_zfree_arg() to keep the lock held across backward jumps.

o Allow limiting the number of items in the bucket cache; see the sketch after this list.

  • UMA_ZONE_NOBUCKETCACHE is removed; just use a limit of 0.
  • zone_cache_bucket()/zone_remove_bucket() keep the count of items.
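Assuming the existing uz_buckets list and ub_cnt bucket field, the two helpers might look roughly like this. This is a hypothetical sketch, not the committed code; uz_bktcount is the counter name used elsewhere in this review (later renamed uz_bkt_count).

/* Zone lock held for both helpers. */
static void
zone_cache_bucket(uma_zone_t zone, uma_bucket_t bucket)
{

	LIST_INSERT_HEAD(&zone->uz_buckets, bucket, ub_link);
	zone->uz_bktcount += bucket->ub_cnt;
}

static void
zone_remove_bucket(uma_zone_t zone, uma_bucket_t bucket)
{

	LIST_REMOVE(bucket, ub_link);
	zone->uz_bktcount -= bucket->ub_cnt;
}

A bucket is only cached while the count stays below the configured limit, so a limit of 0 gives the behaviour previously requested with UMA_ZONE_NOBUCKETCACHE.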

o Drop support for multi-keg zones; lots of stuff is deleted and
simplified. Make sure uma_zone_t is now perfectly aligned.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 21442
Build 20764: arc lint + arc unit

Event Timeline

Merge remote-tracking branch 'FreeBSD/master' into pbuf to pull in
and resolve against the recent D16666 check-in.

sys/vm/uma_core.c
493

This assertion should go at the beginning of the function.

3093

What's the point of deferring the unlock? We hold a critical section here and there's a free slot in uc_freebucket, so we're guaranteed to succeed during the restart. In all other cases we are jumping back with locked == false, right?

  • Move assertion to beginning of the function.
  • Remove unlock avoidance in uma_zfree_arg(). Can't remember why it was there.

Can you explain how the zone/keg locking works after the simplification? I'm having trouble understanding it.

sys/vm/uma.h
500

"an upper limit" would sound more natural to me, and you use that term below.

sys/vm/uma_core.c
602

We want the zone lock here, not the keg lock.

2602

Why did this get inlined? I can't see the reason.

2891–2912

The extra locking is rather unfortunate. Why do we do the check here and not somewhere where we already hold the zone lock?

3472–3473

Since you did it elsewhere, get rid of this return statement too?

sys/vm/uma_int.h
319

"This zone's keg"

glebius added inline comments.
sys/vm/uma.h
500

But the existing uma_zone_set_max() comment already has this wording.

sys/vm/uma_core.c
602

In the new scheme, if a zone has a keg, they share a lock.

zone_timeout() is called only for keg-enabled zones; see zone_foreach(). So KEG_LOCK() here is correct and self-documenting.
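For context, the relevant lock macros in sys/vm/uma_int.h look approximately like this (paraphrased, not copied verbatim): a keg-backed zone's uz_lockptr points at its keg's uk_lock, so ZONE_LOCK() and KEG_LOCK() take the same mutex.

#define	KEG_LOCK(k)	mtx_lock(&(k)->uk_lock)
#define	KEG_UNLOCK(k)	mtx_unlock(&(k)->uk_lock)
#define	ZONE_LOCK(z)	mtx_lock((z)->uz_lockptr)	/* == &keg->uk_lock */
#define	ZONE_UNLOCK(z)	mtx_unlock((z)->uz_lockptr)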

glebius added inline comments.
sys/vm/uma_core.c
2602

I'm preparing a patch to put it back into a separate function, and to be fair, I feel that it is more difficult to understand the code when the middle of uma_zalloc_arg() is separated from the rest of the function.

If you really insist, I can roll this back. However, zone_alloc_bucket() would be modified: the inlined code doesn't exactly match the code of the function. It has a few additions to roll back uz_items in case of an allocation failure or a strange bucket.

2891–2912

Because uz_import is not always zone_import(). For the vm_page zones, uz_import is outside of UMA and doesn't lock the zone. Right before uz_import is the only point where we can do the bookkeeping of allocations.

The performance impact here is not measurable. We are already on a slow path and are about to acquire some kind of lock in uz_import; for regular zones it is the same lock, so the same cache line. Also, the ratio of slow-path to fast-path allocations is expected to be very low. For example, for vm_page we got one slow-path allocation per 10^7 immediate returns from the per-CPU cache with zero jumps to zalloc_start.
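A sketch of the ordering being described here; illustrative only, with the uz_import call shown in a simplified form and the rollback mirroring the one mentioned for the inlined zone_alloc_bucket() code above.

/*
 * Charge the items to the zone before calling the import routine,
 * because uz_import may live outside UMA (e.g. the vm_page zones)
 * and will not take the zone lock itself.
 */
ZONE_LOCK(zone);
zone->uz_items += count;		/* bookkeeping under the zone lock */
ZONE_UNLOCK(zone);
imported = zone->uz_import(zone->uz_arg, items, count, flags); /* simplified */
if (imported < count) {
	ZONE_LOCK(zone);
	zone->uz_items -= count - imported;	/* roll back what we didn't get */
	ZONE_UNLOCK(zone);
}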

sys/vm/uma_core.c
735

Shouldn't we be adding bucket->ub_cnt to uz_frees?

2498

I don't understand this check. In the common case the bucket will be placed in the per-CPU cache, so it won't count against uz_bktcount. And we already check this condition again before adding it to the bucket list.

2553

Should assert that zone->uz_items >= max - bucket->ub_cnt before the subtraction.

2602

I still think I prefer having a separate function. The details of how we fill a bucket aren't really relevant to the core allocation algorithm. At least, please pick a more specific variable name than "max".

2919

What problem does this line fix?

3302

It is weird to me that the patch is being so careful in general to do precise accounting, but then just adjusts the caller's request here. Why not simply disable per-CPU caching in this case?

3613

The parens are not needed anymore.

sys/vm/uma_int.h
226–228

Why?

glebius marked 9 inline comments as done and an inline comment as not done. Dec 3 2018, 11:19 PM
glebius added inline comments.
sys/vm/uma_core.c
2498

You are absolutely correct. This came from the Netflix tree, and my later profiling showed that the condition never happens.

3302

Without per-CPU caching UMA isn't a useful allocator at all.

Note that uma_zone_set_max() never claimed to put a hard cap on a zone. With the new code we are getting closer to the requested cap than we used to.

sys/vm/uma_int.h
226–228

So that we may use a single pointer for uz_keg/uz_lockptr. Should I put that in a comment?

sys/vm/uma_core.c
3302

And yet UMA_NOBUCKET exists. :)

I pointed this out because you are using UMA zones to manage pbufs, and the existing allocator is precise. For certain uses, where the max is a function of the amount of physical memory, it may be that we don't care about per-CPU caching at all on small systems.

sys/vm/uma_int.h
226–228

Right, I see now. A comment would be helpful, I think.

glebius marked 2 inline comments as done.

Address Mark's review.

glebius added inline comments.
sys/vm/uma_core.c
3302

UMA_NOBUCKET doesn't turn off the per-CPU caches.

The existing allocator is precise, but the maximums provided to it are very rough guesses: "Mkay, 1/2 nswbufs might not be enough, but 3/4 would be fine..." For pbufs, the potential adjustment done by uma_zone_set_max() is entirely okay. I even see it as autotuning: the more CPUs a machine has, the more buffers it will have.

sys/vm/uma_int.h
226–228

Will do.

glebius marked an inline comment as done.
  • Improve comment on why uk_lock needs to be first.
sys/vm/uma_core.c
3302

Are you thinking of UMA_NOBUCKETCACHE? That is a different flag (and I'm glad it's gone with this change). It is a bug if UMA_NOBUCKET _doesn't_ turn off per-CPU caches.

glebius added inline comments.
sys/vm/uma_core.c
3302

Ah, sorry. Yes, I meant UMA_NOBUCKETCACHE.

Most of my inline comments are nits, but: I don't really understand how the wakeup() mechanism works with the per-CPU caches. Threads may remain asleep on "zonelimit" indefinitely if uz_items == uz_maxitems but active threads are able to avoid the slow path. Before, the ZFLAG_FULL check in the zfree fast path handled this. Is there anything preventing starvation?

sys/vm/uma_core.c
2495

I prefer an explicit comparison: if (zone->uz_maxitems > 0)

2496

Why do we duplicate the logic from zone_alloc_item() instead of simply jumping to zone_alloc_item()?

2505

msleep(9) is deprecated by mtx_sleep(9).

2517

Explicit comparison: zone->uz_sleepers > 0.

2517

Why do you not do a wakeup here?

2600

Consider adding a KEG_LOCK_ASSERT()?

3085

Explicit comparison: uz_sleepers > 0.

sys/vm/uma_int.h
238–239

I don't believe there's any padding here.

328

To be consistent this should be uz_max_items.

341

Again, I'd prefer consistency in the names: uz_bkt_count, uz_bkt_max.

430

Seems to be a dup?

glebius marked an inline comment as done.
  • Rewrite uma_zalloc_arg() so that there is a single sleeping point.

A few minor nits:

  • Rename new fields so that words are separated (uz_max_items, uz_bkt_count, uz_bkt_max).
  • Remove the duplicate define.
  • Use explicit comparisons against 0.
  • Add a KEG_LOCK_ASSERT().
  • There is no padding here.
sys/vm/uma_core.c
2557

Why not have it in zone_alloc_item_locked()? With the current diff, uma_large_malloc(), for example, can bypass limits.

  • Move the sleep into zone_alloc_item_locked() and put it into a loop.

The extra locking in zone_alloc_item() still makes me uncomfortable. For zones with small items it may be negligible, but large-item zones such as those used by ZFS may be impacted more significantly.

I'd suggest asking Peter to stress test the patch.

sys/vm/uma_core.c
2884

I would comment, /* Returns with the zone unlocked. */

2906

I'd find this clearer as max_items - items > 1.

3302

I still don't really like this, but don't object to it.

This revision is now accepted and ready to land. Dec 14 2018, 6:13 PM
glebius added inline comments.
sys/vm/uma_core.c
2906

To me, your suggestion looks more obfuscated rather than clearer.

sys/vm/uma_int.h
337

This is now used for a single pointer and can be removed.

343

Did you verify this offset?

sys/vm/uma_int.h
337

Yes, I noticed that. At Netflix we already have that substituted with another field. I will address that later.

343

I did. Will verify once again pre-commit.

This revision was automatically updated to reflect the committed changes.