Details

Reviewers

mav
markj
rlibby
glebius
gallatin

Commits

rS356350: Sort cross-domain frees into per-domain buckets before inserting these

Summary

This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. It also reduces the cost of uma_zalloc/zfree by approximately 30% each in my tests.

This patch allocates a per-domain crossdomain free bucket for each zone. Prior to this patch, when a crossdomain free bucket filled, it had to be drained all the way back to the keg layer. This is a very expensive operation, and it means that if any significant fraction of your memory is freed on the wrong domain, frees serialize on the keg lock. To alleviate this, I sort per-CPU crossdomain buckets into per-domain buckets under a new zone lock. When a per-domain bucket is full, it goes on the normal free list. Nothing goes through the slab layer.
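The sorting step described above can be illustrated with a minimal userspace sketch. It is not the code from the patch: `struct item`, `struct bucket`, `struct zone`, `sort_cross_bucket()`, `NDOMAINS`, and `BUCKET_MAX` are all hypothetical names, the item carries its domain explicitly (the kernel derives it from the item's address), a counter stands in for the per-domain free list, and the new zone lock is only mentioned in a comment.

```c
#include <assert.h>
#include <stddef.h>

#define NDOMAINS   2	/* hypothetical number of NUMA domains */
#define BUCKET_MAX 4	/* hypothetical bucket capacity */

/* Hypothetical item that records the domain its memory belongs to. */
struct item {
	int domain;
};

struct bucket {
	int count;
	struct item *items[BUCKET_MAX];
};

struct zone {
	/* One partially filled cross bucket per owning domain. */
	struct bucket cross[NDOMAINS];
	/* Stand-in for the per-domain free list: counts full buckets handed off. */
	int full_handed_off[NDOMAINS];
};

/*
 * Move every item from a per-CPU cross bucket into the bucket of the
 * domain that owns it.  A real implementation would hold the zone's
 * cross lock around this loop.
 */
static void
sort_cross_bucket(struct zone *z, struct bucket *pcpu)
{
	while (pcpu->count > 0) {
		struct item *it = pcpu->items[--pcpu->count];
		struct bucket *b;

		assert(it->domain >= 0 && it->domain < NDOMAINS);
		b = &z->cross[it->domain];
		b->items[b->count++] = it;
		if (b->count == BUCKET_MAX) {
			/* Full bucket: hand it to the free list, start a new one. */
			z->full_handed_off[it->domain]++;
			b->count = 0;
		}
	}
}
```

In this sketch the only shared state touched per item is the per-domain bucket, which is the point of the change: nothing in the loop reaches the keg or slab layer.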

It would be possible to do this with a per-domain crossdomain lock if we allocated a full domain * domain matrix of cross buckets. With this patch I was able to saturate memory bandwidth with mbufs freed on the wrong domain half of the time, with only a 30% CPU penalty. Before this patch it was several hundred times slower. I believe the current performance is adequate, but if it is not, we can easily spend a little more memory and a small effort to make it so.
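To make the memory trade-off concrete, here is a small sketch of how the two layouts scale with the number of domains. The function names are hypothetical and only illustrate the arithmetic; the row-major indexing is one possible layout for such a matrix, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Layout in this patch: one cross bucket per owning domain, per zone. */
static size_t
cross_buckets_shared(size_t ndomains)
{
	return (ndomains);
}

/* Alternative: one cross bucket per (freeing domain, owning domain) pair. */
static size_t
cross_buckets_matrix(size_t ndomains)
{
	return (ndomains * ndomains);
}

/* Row-major slot for the bucket used when 'from' frees memory owned by 'to'. */
static size_t
cross_matrix_index(size_t from, size_t to, size_t ndomains)
{
	assert(from < ndomains && to < ndomains);
	return (from * ndomains + to);
}
```

With the matrix layout, each row would be touched only by CPUs in one freeing domain, which is what would allow a per-domain lock in place of the single zone-wide cross lock.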

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 28299
Build 26409: arc lint + arc unit

Event Timeline

jeff created this revision. Dec 15 2019, 11:39 PM

Harbormaster completed remote builds in B28170: Diff 65698. Dec 15 2019, 11:39 PM

jeff edited the summary of this revision. Dec 16 2019, 12:10 AM

jeff added reviewers: mav, markj, rlibby, glebius, gallatin.

jeff set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a subscriber: imp. Dec 16 2019, 12:10 AM

markj added inline comments. Dec 16 2019, 7:59 PM

sys/vm/uma_core.c
1098	Shouldn't zone_fetch_bucket() also grab the cross bucket when available?
3738	"sort" should be capitalized.
3787	This might overflow the bucket cache limit.
3794	Unnecessary return statement.

jeff added inline comments. Dec 16 2019, 8:48 PM

sys/vm/uma_core.c
1098	The bucket would not be full and it would slow down crossdomain frees. So I prefer to just leave it until it fills unless there is memory pressure.

markj added inline comments. Dec 17 2019, 3:24 PM

sys/vm/uma_core.c
1098	Then I think we should always reclaim from it first, not last: it is counted in the WSS but cannot be allocated to consumers.

jeff added inline comments. Dec 21 2019, 8:51 PM

sys/vm/uma_core.c
1098	I think that is ok here.
3787	The enforcement of this really complicates a lot of control flow. I wonder if we could enforce it only on alloc and let the timeout clean up any overage.

Free cross bucket more aggressively.

Harbormaster completed remote builds in B28295: Diff 65936. Dec 23 2019, 8:34 PM

markj added inline comments. Dec 23 2019, 10:06 PM

sys/vm/uma_core.c
3787	Yeah, this should become less of an issue once we periodically preen caches. We may trigger an assertion failure in the overflow case, though. You could call zone_put_bucket() with wss=false and add an XXX comment for now.

Fix bucket limit issues.

Harbormaster completed remote builds in B28299: Diff 65947. Dec 24 2019, 2:40 AM

rlibby added inline comments. Dec 24 2019, 8:09 PM

sys/vm/uma_core.c
3784–3789	Once this becomes true, don't we expect it to stay true for the rest of the buckets? Can't we just `if (bkt_count >= bkt_max) break;` and then drain/free the remainder in a second loop which doesn't take the zone lock?
3784–3795	I think you mean to be operating on `b` and not `bucket` here.

jeff added inline comments. Dec 24 2019, 8:31 PM

sys/vm/uma_core.c
3784–3789	I thought about this, but it actually takes a while to drain buckets, so it's not certain, and it makes the control flow uglier.
3784–3795	Yes, this is a copy-and-paste error.

rlibby added inline comments. Dec 24 2019, 10:09 PM

sys/vm/uma_core.c
1098–1100	Don't we need the ZONE_CROSS_LOCK for this? We have the ZONE_LOCK but that doesn't synchronize with the manipulation in zone_free_cross.
3784–3789	Well it doesn't have to make the control flow uglier (move the condition into the while loop test, delete the if block), but fair enough if you think it might be better to be re-testing in case concurrent allocs are consuming the frees fast enough.
3828–3836	Should we do something like this in zone_free_cross, too?
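
The two-loop variant discussed in the comments on lines 3784–3789 could look roughly like the sketch below. It is only an illustration of the control flow rlibby describes (limit check moved into the loop condition, remainder drained without the lock); `struct cache`, its fields, and `free_buckets_two_phase()` are hypothetical, and counters stand in for the zone lock and for actually draining a bucket.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-domain bucket cache state. */
struct cache {
	int bkt_count;		/* buckets currently cached */
	int bkt_max;		/* bucket cache limit */
	int lock_acquisitions;	/* how many times the zone lock was taken */
	int drained;		/* buckets drained and freed instead of cached */
};

static void
free_buckets_two_phase(struct cache *c, int nbuckets)
{
	int i = 0;

	/* Phase 1: cache buckets under the lock while below the limit. */
	while (i < nbuckets && c->bkt_count < c->bkt_max) {
		c->lock_acquisitions++;	/* stands in for lock/unlock */
		c->bkt_count++;
		i++;
	}

	/* Phase 2: drain whatever is left without taking the lock. */
	for (; i < nbuckets; i++)
		c->drained++;
}
```

The cost of this shape is the one jeff points out: once phase 1 ends, the limit is never re-tested, so buckets are drained even if concurrent allocations have made room in the cache in the meantime.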

Fix the cross locking bug.

Harbormaster completed remote builds in B28472: Diff 66334. Jan 4 2020, 5:01 AM

This revision was not accepted when it landed; it landed in state Needs Review. Jan 4 2020, 7:56 AM

Closed by commit rS356350: Sort cross-domain frees into per-domain buckets before inserting these (authored by jeff).

This revision was automatically updated to reflect the committed changes.

jeff added a commit: rS356350: Sort cross-domain frees into per-domain buckets before inserting these.

Latest revision LGTM.

(umaperf 6/7) Sort crossdomain free buckets into domain correct buckets before returning to the system.
Closed, Public

Revision Contents
Changeset List

Diff 65947

sys/vm/uma_core.c

sys/vm/uma_int.h
