
Use per-domain locks for the zone layer.
ClosedPublic

Authored by jeff on Feb 13 2020, 9:34 PM.

Details

Summary

I apologize for the length and complexity of this review. I did this in a flurry of commits over two days that were not well isolated and churned each other. I don't think it's worth taking apart now.

The main thrust is providing independent locking for the domains in the zone layer. This improved allocator throughput on a machine with 50 domains by 2x.

This moves all of the per-zone domain logic into zone_fetch_bucket() and zone_put_bucket(). This really simplifies callers and makes it easier to do other restructuring.

It moves drain into bucket_free() so that every caller doesn't have to do both.

It attempts to avoid the lock on the alloc path if we don't think we'll succeed.

I restructured zone and moved fields around where I thought was appropriate and to reduce gaps.

I now calculate the zone domain address rather than fetching a pointer.
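As a rough illustration of the last two points, here is a minimal sketch of per-domain state with the domain address computed from the zone base. The field names and fixed array are illustrative only, not the actual uma_core.c layout; only zone_domain() is a name from this review.

```c
#include <assert.h>
#include <stddef.h>

#define ZONE_MAXDOM 8

/* Hypothetical, simplified layout: one lock word per zone domain. */
struct zone_domain {
	int zd_lock;    /* stand-in for the real per-domain lock */
	int zd_nitems;  /* items cached in this domain */
};

struct zone {
	int                uz_ndomains;
	struct zone_domain uz_domain[ZONE_MAXDOM]; /* trails the zone */
};

/*
 * Compute the domain address arithmetically from the zone base
 * instead of loading it through a pointer field, so the lookup
 * itself touches no extra shared cache line.
 */
static inline struct zone_domain *
zone_domain(struct zone *z, int dom)
{
	assert(dom >= 0 && dom < z->uz_ndomains);
	return (&z->uz_domain[dom]);
}
```

The point of the address calculation is that each caller already knows the zone and the domain index, so no per-domain pointer needs to be stored or fetched.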

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 29416
Build 27298: arc lint + arc unit

Event Timeline

jeff added reviewers: rlibby, markj.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.

One more thing: round-robin domains now use all zone domains but load-balance across them. This is for lock contention reasons on large machines.
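A hedged sketch of that idea (the function and its arguments are invented for illustration, not the committed code): a ROUNDROBIN zone keeps a bucket cache in every zone domain and spreads threads across them purely to split lock contention.

```c
/*
 * On a ROUNDROBIN zone every zone domain keeps a bucket cache;
 * pick a starting domain from the CPU id and rotate on retries
 * so the per-domain locks share the load.
 */
static unsigned
zone_pick_domain(unsigned cpu, unsigned attempt, unsigned ndomains)
{
	return ((cpu + attempt) % ndomains);
}
```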

sys/vm/uma_core.c
564

Needs a comment.

690

Should be an inline.

721

Should be an inline.

875

comment

888

This nicely puts the poll outside the lock.

1204

Mark if you could review this section in particular. I think what I did is equivalent but maybe not.

1340–1341

I find this slightly questionable on SMR. It may block for a long time if we do preemptive sections. So this should probably poll first.
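The poll-first shape being suggested looks roughly like this toy model. All names here are invented; the real code would use the kernel's SMR interfaces rather than a bare counter.

```c
#include <stdbool.h>

/*
 * Toy model of a sequence check. Readers advance rd_seq; before
 * taking a blocking wait path, do a cheap non-blocking poll so
 * the common case never sleeps behind a preemptive section.
 */
static unsigned long rd_seq;

static bool
seq_poll(unsigned long goal)
{
	return (rd_seq >= goal);
}

static bool
try_reclaim(unsigned long goal)
{
	if (seq_poll(goal))
		return (true);	/* fast path: goal already reached */
	return (false);		/* real code would block/wait here */
}
```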

sys/vm/uma_int.h
504–505

This one pains me slightly since it is written so often for mbufs. Maybe it should just be with garbage at the end still.

525

I was hoping to fit this in 256b total but it just doesn't quite work.

This means there's a full 64 byte unused cacheline between here and the cpu array :(

586

Unused

jeff edited the summary of this revision.

Fix some imax bugs and address my own feedback.

I am mostly happy with this now and it is doing well on stress after a few small errors.

sys/vm/uma_core.c
630

This is a bug: cnt should be new. I made an error refactoring this into the function.

899–900

I should just move this up a level and not use the pointer.

2247

I'm unsure whether I will keep this.

ROUNDROBIN domains can now block on allocations when there are free items in the bucket caches of other 'domains'. It's unlikely due to the highest/lowest loops but it is possible. Even with a single domain a thread can block in page alloc while another frees to the domain layer. IIRC earlier versions of the allocator would check for this condition before re-entering vm_page_alloc().

This feature is the only reason for zone_domain().

sys/vm/uma_core.c
555

These three lines could be written as lockfail = ZDOM_OWNED(zdom).

559

The lack of precision seems ok, but this race makes it possible to exceed bucket_size_max. In practice it'll probably only go over by 1, but it could be more if you're unlucky. We should have a comment explaining why this is acceptable (or consider sharding this field as well, or ...).

714

I don't think we need this assertion anymore now that this function handles cache overflow.

3460

So, the reason we do this here is because we might lose the race and put the bucket back in the cache. There are no other failure modes.

Instead of waiting until you're certain of success, why can't we re-adjust imin when putting the bucket back in the cache after losing the race? This could cause the WSS estimate to be artificially inflated, but only temporarily, and it would simplify imin/imax management a lot.

In fact, if you call zone_put_bucket() with ws = true instead of false, I think we'll get the desired result. You just need to handle uz_bkt_max. If imax/imin are reset while the lock is dropped, we will end up with a larger WSS estimate than we should, but that's a narrow race and the estimate decays anyway.
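The watermark bookkeeping under discussion looks roughly like this simplified model. Field and function names only approximate the imin/imax idea in the review, not the exact code.

```c
/* Per-domain watermarks used to estimate the working-set size. */
struct zdom_wss {
	long imin;    /* low watermark of cached items */
	long imax;    /* high watermark of cached items */
	long nitems;  /* currently cached items */
};

/* Returning items with working-set accounting raises imax. */
static void
zdom_put(struct zdom_wss *zd, long cnt, int ws)
{
	zd->nitems += cnt;
	if (ws && zd->nitems > zd->imax)
		zd->imax = zd->nitems;
}

/* Taking items lowers imin. */
static void
zdom_get(struct zdom_wss *zd, long cnt)
{
	zd->nitems -= cnt;
	if (zd->nitems < zd->imin)
		zd->imin = zd->nitems;
}

/* The WSS estimate is the swing between the watermarks. */
static long
zdom_wss(struct zdom_wss *zd)
{
	return (zd->imax - zd->imin);
}
```

In this model, putting a bucket back with ws enabled after losing the race can only push imax up, temporarily inflating the estimate exactly as described above.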

sys/vm/uma_int.h
491

I dislike the mix of uz_bkt_ and uz_bucket_ in field names.

You might also amend the comment, "Maximum per-domain bucket cache size".

sys/vm/uma_core.c
559

I'd rather not shard it because it makes for some weird perf scenarios especially since round-robin may bounce between them.

I can also load it once and store a calculated value so it never exceeds max.
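That single-load, clamped update might look like the following (illustrative only; the real field lives in the zone and the race is tolerated by design):

```c
/*
 * Bump the racy bucket size field: load once, compute a value
 * that is already clamped to max, then store it. A lost race
 * can only replay a value <= max, never exceed it.
 */
static void
bucket_size_bump(int *sizep, int max)
{
	int size;

	size = *sizep;		/* single load of the shared field */
	if (size < max)
		*sizep = size + 1;	/* stored value never exceeds max */
}
```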

714

I guess it is possible for the overflow to come via cross domain flush where you don't know the value of ws.

3460

As discussed on IRC:

The issue is not the race but that we can call zone_alloc_bucket() now without ever having acquired a domain lock. This becomes increasingly important as we get more cores per domain.

sys/vm/uma_int.h
491

I will go ahead and rename it. I thought it odd as well.

This revision is now accepted and ready to land. Feb 19 2020, 5:37 PM
This revision was automatically updated to reflect the committed changes.