Page MenuHomeFreeBSD

Add a sysctl tree for UMA.
ClosedPublic

Authored by jeff on Tue, Nov 26, 12:45 AM.

Details

Summary

This has been sorely needed for a while. I have added a sysctl oid for every zone. Here is an example:

vm.uma.RADIX_NODE.stats.xdomain: 0
vm.uma.RADIX_NODE.stats.fails: 0
vm.uma.RADIX_NODE.stats.frees: 3404547
vm.uma.RADIX_NODE.stats.allocs: 3488279
vm.uma.RADIX_NODE.stats.current: 75216
vm.uma.RADIX_NODE.domain.0.wss: 233
vm.uma.RADIX_NODE.domain.0.imin: 22011
vm.uma.RADIX_NODE.domain.0.imax: 22011
vm.uma.RADIX_NODE.domain.0.nitems: 22011
vm.uma.RADIX_NODE.limit.sleeps: 0
vm.uma.RADIX_NODE.limit.sleepers: 0
vm.uma.RADIX_NODE.limit.max_items: 0
vm.uma.RADIX_NODE.limit.items: 0
vm.uma.RADIX_NODE.keg.free: 8
vm.uma.RADIX_NODE.keg.pages: 3925
vm.uma.RADIX_NODE.keg.align: 1
vm.uma.RADIX_NODE.keg.ipers: 27
vm.uma.RADIX_NODE.keg.ppera: 1
vm.uma.RADIX_NODE.keg.rsize: 144
vm.uma.RADIX_NODE.keg.name: RADIX NODE
vm.uma.RADIX_NODE.count_max: 253
vm.uma.RADIX_NODE.count: 203
vm.uma.RADIX_NODE.flags: 2147745792
vm.uma.RADIX_NODE.size: 144

Here's the keg portion of z ZSECOND zone
vm.uma.mbuf_packet.keg.free: 1
vm.uma.mbuf_packet.keg.pages: 1906
vm.uma.mbuf_packet.keg.align: 255
vm.uma.mbuf_packet.keg.ipers: 15
vm.uma.mbuf_packet.keg.ppera: 1
vm.uma.mbuf_packet.keg.rsize: 256
vm.uma.mbuf_packet.keg.name: mbuf

Both mbuf and mbuf_packet will print the same stats for the "mbuf" keg.

I had to preprocess and dup the name because we make zones with duplicate names and use illegal characters for sysctl. This means that it is really a human targeted interface.

I have not yet added any writable controls. Presumably you could update limit policies, bucket sizes, or whatever you like for experimentation. I also think we need some more stats. At one point I had counters for fast path vs slow path allocs etc.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jeff created this revision.Tue, Nov 26, 12:45 AM
jeff edited the summary of this revision. (Show Details)Tue, Nov 26, 12:50 AM
jeff added reviewers: markj, kib, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository.
jeff added inline comments.Tue, Nov 26, 12:53 AM
sys/vm/uma_core.c
1852 ↗(On Diff #64874)

I could break this up into multiple functions but it would serve no real purpose other than style. It is inherently very linear and verbose.

1870 ↗(On Diff #64874)

I think this needs to be +2. +1 for underscore and + 1 for null terminator.

sys/vm/uma_int.h
342 ↗(On Diff #64874)

I renamed this because it confused even me.

jeff added a reviewer: rlibby.Tue, Nov 26, 1:56 AM
rlibby added inline comments.Tue, Nov 26, 7:14 AM
sys/vm/uma_core.c
263 ↗(On Diff #64874)

It might have been nicer to have both, avoiding ugly unused params.

static void zone_foreach(void (*zfunc)(uma_zone_t));
static void zone_foreach_arg(void (*zfunc)(uma_zone_t, void *), void *);
616 ↗(On Diff #64874)

May want to add __unused ? Etc below.

1875–1883 ↗(On Diff #64874)

This scheme has a couple problems:

  • The name count algorithm doesn't munge the names, so we may not have detected a collision earlier, but collide here (e.g. "foo bar" and "foo_bar" collide).
  • The counting also doesn't strictly work:
z1 = zcreate("foo");   // "foo"
z2 = zcreate("foo");   // "foo_2"
zdestroy(z1);
z3 = zcreate("foo");   // "foo_2" (undetected collision)
z4 = zcreate("foo_2"); // "foo_2" (munging again)

These can lead to trying to reuse the sysctl name, which then would lead to a crash in this code.

We could consider just setting namecnt here in zone_alloc_sysctl according to the first name we are able to successfully insert into the tree. It doesn't look like the sysctl API is very hepful for us in doing that though. We could use sysctl_add_oid but we'd want to suppress the sysctl_warn_reuse spew (maybe with a CTLFLAG?).

jeff added inline comments.Tue, Nov 26, 7:24 AM
sys/vm/uma_core.c
1875–1883 ↗(On Diff #64874)

sysctl won't crash if there is a duplicate. It will print a warning about a re-used entry. I don't think we need to detect super pathological cases. You are right that I should be using count - 1.

rlibby added inline comments.Tue, Nov 26, 7:37 AM
sys/vm/uma_core.c
1875–1883 ↗(On Diff #64874)

Yes, sysctl won't crash on collision, but it'll return NULL, which we don't check for here.

I agree we don't need to chase down pathological cases, just make them harmless.

jeff added inline comments.Tue, Nov 26, 8:06 AM
sys/vm/uma_core.c
1875–1883 ↗(On Diff #64874)

For re-used nodes sysctl returns the original oid. It just doesn't return it for leafs. We never use the return value from a leaf oid add.

I originally tested this without any collision detection so I'm fairly sure it doesn't panic.

kib added a comment.Tue, Nov 26, 1:49 PM

Does this essentially obsoletes vm.zone_stats ? Wouldn't it be better to extend that interface and vmstat -z ?

jeff added a comment.Tue, Nov 26, 7:26 PM

Sysctl will allow us to make changes at runtime. For example limits. It is quicker to modify and doesn't create ABI issues. I view this as more of a debugging tool and vmstat as more for system administrators.

rlibby added inline comments.Wed, Nov 27, 7:05 AM
sys/vm/uma_core.c
1858 ↗(On Diff #64874)

const

1875–1883 ↗(On Diff #64874)

Yep, I'm on board now.

1902–1903 ↗(On Diff #64874)

Hmm, it's a union, should this be

if (zone->uz_lockptr != &zone->uz_lock && zone->uz_keg != NULL) {
        keg = zone->uz_keg;

?

1955–1956 ↗(On Diff #64874)

s/uz_domain[0]/uz_domain[i]/

1967 ↗(On Diff #64874)

s/minum/minimum/

2029–2045 ↗(On Diff #64874)

Just a thought. We don't NULL uz_ctlname/uz_oid. That's fine, we do bzero them. But should we, for consistency? I expect the compiler to remove the stores of zero after bzero anyway.

2220–2230 ↗(On Diff #64874)

We ought to sysctl_remove_oid earlier, to avoid a race on teardown. The sysctl handlers inspect the keg and the counters.

3801–3806 ↗(On Diff #64874)

Here and below, do we actually need the ZONE_LOCK() for the counter fetch? And, since we are not actually inspecting the uz_cpu cache buckets, do we really need it for that either?

3851–3857 ↗(On Diff #64874)

s/uz_allocs/uz_frees/

sys/vm/uma_int.h
342 ↗(On Diff #64874)

Unfortunately there's still a reference in userspace to clean up... tools/tools/umastat/umastat.c.

jeff updated this revision to Diff 64978.Wed, Nov 27, 8:52 PM

Address review feedback. Remove related dead code.

rlibby accepted this revision.Wed, Nov 27, 9:33 PM

Updates look good to me.

sys/vm/uma_core.c
1870 ↗(On Diff #64874)

The string literal also has a null terminator, so just + 1 for the underscore is fine. (I guess you determined this too since no change was made here.)

This revision is now accepted and ready to land.Wed, Nov 27, 9:33 PM