Page MenuHomeFreeBSD

(umaperf 2/7) Store the zone flags and size in the uma per-cpu cache area.
ClosedPublic

Authored by jeff on Dec 15 2019, 11:36 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 19, 12:22 AM
Unknown Object (File)
Thu, Dec 19, 12:20 AM
Unknown Object (File)
Wed, Dec 11, 4:21 AM
Unknown Object (File)
Tue, Dec 10, 2:03 PM
Unknown Object (File)
Nov 5 2024, 2:44 PM
Unknown Object (File)
Nov 3 2024, 2:37 PM
Unknown Object (File)
Oct 19 2024, 8:20 AM
Unknown Object (File)
Sep 18 2024, 11:42 AM
Subscribers

Details

Summary

This is part of a series of patches intended to enable first-touch numa policies for UMA by default. It also reduces the cost of uma_zalloc/zfree by approximately 30% each in my tests.

This patch caches the zone size and flags in the per-cpu caches and adds a few flags so features can be tested without touching uma_zone. In a normal fast path allocation we will now only touch the cacheline for the per-cpu area and the bucket we're popping as a result.

The per-cpu cache buckets have 32bit of padding in them. It is slightly gross to use this but if I don't the per-cpu cache size will exceed 64 bytes which I would very much like to avoid. The limit code could be improved by setting and clearing the flag when we reach and drop below the limit, probably with some timeout to clear to limit hysteresis.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jeff added reviewers: mav, markj, rlibby, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
markj added inline comments.
sys/vm/uma_core.c
2178 ↗(On Diff #65694)

Maybe zone_update_cached_attrs() or something like that instead?

2182 ↗(On Diff #65694)

Why not CPU_FOREACH()?

2897 ↗(On Diff #65694)

I don't see why the added ifdefs are needed?

3530 ↗(On Diff #65694)

The two if-statements can be flattened.

sys/vm/uma_int.h
224 ↗(On Diff #65694)

You might consider adding a

struct uma_cached_attrs {
    uint32_t flags;
    uint32_t size;
};

and write your accessors in terms of that. Then you would need fewer accessors and the naming would be clearer. It would also be easier to extend this to make use of the third spare word in the cross bucket, if you ever want to do that. I see that in uma_zfree_arg() we always load the flags but only load the size in a rare branch; given that both words belong to the same cache line, I am not sure that it would be harmful to always load both into the attr struct above. And, in other places we always load both words anyway.

253 ↗(On Diff #65694)

Extra newline.

This revision is now accepted and ready to land.Dec 16 2019, 4:37 PM
sys/vm/uma_int.h
480 ↗(On Diff #65694)

PRINT_UMA_FLAGS below should be updated.

sys/vm/uma_core.c
2182 ↗(On Diff #65694)

At one point there were some startup issues that meant cpus were marked absent until later in the boot. I'm not sure what the situation is now. However, this construction lends itself to hotplug by initializing everything.

We should possibly make a CPU_FOREACH_ALL() or something.

sys/vm/uma_core.c
2897 ↗(On Diff #65694)

In some cases if you if (condition) KASSERT the compiler has generated some extra branch etc. It should not here.

Partially it just gives me visual clarity that nothing in this huge top block is doing anything on release builds. I can look at the assembly and see if it's required but I like it.

Looks good.

An alternative to relying on structure padding would be to use macro name pasting magic for your accessors. More explicit about layout, but less readable code. I think using the structure padding is okay.

Nits and a question inline.

sys/vm/uma_core.c
2843 ↗(On Diff #65694)

Long line...

2938–2940 ↗(On Diff #65694)

Long lines...

3660 ↗(On Diff #65694)

We don't care about this one? Because we're about to call bucket_alloc(zone,...) anyway? And similar below?

This revision now requires review to proceed.Dec 21 2019, 8:46 PM
sys/vm/uma_core.c
3530 ↗(On Diff #65694)

Flattening them would tell the compiler to test both conditions but place the jmp in such a way that the processor won't static predict it true and keep it away from the main function text. This means we'll potentially speculatively touch both lines.

Ordering it this way means that uz_sleepers is only accessed if the unlikely branch tests true.

I have not verified the assembly but I believe this should produce better results.

3660 ↗(On Diff #65694)

Yes we are basically into the slow layer so we're touching a bunch of zone fields anyway.

This revision is now accepted and ready to land.Dec 22 2019, 9:57 PM
markj added inline comments.
sys/vm/uma_core.c
3530 ↗(On Diff #65694)

I'd be surprised if the compiler treated the above any differently from:

if (__predict_false(uz_flags & UMA_ZFLAG_LIMIT) && zone->uz_sleepers > 0)
    ...

but stranger things have happened.