Page MenuHomeFreeBSD

(umaperf 6/7) Sort crossdomain free buckets into domain correct buckets before returning to the system.
ClosedPublic

Authored by jeff on Dec 15 2019, 11:39 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 19, 7:27 PM
Unknown Object (File)
Sat, Jan 18, 5:43 PM
Unknown Object (File)
Thu, Jan 9, 12:04 PM
Unknown Object (File)
Sat, Jan 4, 9:26 AM
Unknown Object (File)
Thu, Dec 26, 4:42 PM
Unknown Object (File)
Dec 23 2024, 11:32 PM
Unknown Object (File)
Dec 23 2024, 11:00 PM
Unknown Object (File)
Dec 14 2024, 2:41 AM
Subscribers

Details

Summary

This is part of a series of patches intended to enable first-touch numa policies for UMA by default. It also reduces the cost of uma_zalloc/zfree by approximately 30% each in my tests.

This patch allocates a per-domain crossdomain free bucket for each zone. Prior to this patch, when a crossdomain free bucket is filled it has to be drained all the way back to the keg layer. This is a very expensive operation and means if any significant fraction of your memory is freed on the wrong domain frees serialize on the keg lock. To alleviate this, I sort per-cpu crossdomain buckets into per-domain buckets under a new zone lock. When this bucket is full it goes on the normal free list. Nothing goes through the slab layer.

It would be possible to do this with a per-domain cross domain lock if we allocated a full domain * domain matrix of cross buckets. With this patch I was able to saturate memory bandwidth with mbufs freed on the wrong domain half of the time with only a 30% cpu penalty. Before this patch it was several hundred times slower. I believe the current performance is adequate but if it is not we can easily spend a little more memory and a small effort to make it so.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jeff added reviewers: mav, markj, rlibby, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
sys/vm/uma_core.c
1092 ↗(On Diff #65698)

Shouldn't zone_fetch_bucket() also grab the cross bucket when available?

3715 ↗(On Diff #65698)

"sort" should be capitalized.

3764 ↗(On Diff #65698)

This might overflow the bucket cache limit.

3771 ↗(On Diff #65698)

Unnecessary return statement.

sys/vm/uma_core.c
1092 ↗(On Diff #65698)

The bucket would not be full and it would slow down crossdomain frees. So I prefer to just leave it until it fills unless there is memory pressure.

sys/vm/uma_core.c
1092 ↗(On Diff #65698)

Then I think we should always reclaim from it first, not last: it is counted in the WSS but cannot be allocated to consumers.

sys/vm/uma_core.c
1092 ↗(On Diff #65698)

I think that is ok here.

3764 ↗(On Diff #65698)

The enforcement of this really complicates a lot of control flow. I wonder if we could only enforce on alloc and let the timeout cleanup any overage.

Free cross bucket more aggressively.

sys/vm/uma_core.c
3764 ↗(On Diff #65698)

Yeah, this should become less of an issue once we periodically preen caches. We may trigger an assertion failure in the overflow case though. You could call zone_put_bucket() with wss=false and add a XXX comment for now.

Fix bucket limit issues.

sys/vm/uma_core.c
3784–3789 ↗(On Diff #65947)

Once this is true once, don't we expect it to be true for the rest? Can't we just if (bkt_count >= bkt_max) break; and then drain/free the remainder in a second loop which doesn't take the zone lock?

3784–3795 ↗(On Diff #65947)

I think you mean to be operating on b and not bucket here.

sys/vm/uma_core.c
3784–3789 ↗(On Diff #65947)

I thought about this but it actually takes a while to drain buckets so it's not certain and it makes the control flow uglier.

3784–3795 ↗(On Diff #65947)

yes this is a C&P error.

sys/vm/uma_core.c
1098–1100 ↗(On Diff #65947)

Don't we need the ZONE_CROSS_LOCK for this? We have the ZONE_LOCK but that doesn't synchronize with the manipulation in zone_free_cross.

3784–3789 ↗(On Diff #65947)

Well it doesn't have to make the control flow uglier (move the condition into the while loop test, delete the if block), but fair enough if you think it might be better to be re-testing in case concurrent allocs are consuming the frees fast enough.

3828–3836 ↗(On Diff #65947)

Should we do something like this in zone_free_cross, too?

Fix the cross locking bug.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 4 2020, 7:56 AM
This revision was automatically updated to reflect the committed changes.