Page MenuHomeFreeBSD

Introduce stats(3)
Needs ReviewPublic

Authored by trasz on May 30 2019, 8:58 PM.

Details

Reviewers
sef
cem
Group Reviewers
manpages
Summary

Introduce stats(3), a flexible statistics gathering API.

This provides a framework to define a template describing
a set of "variables of interest" and the intended way for the
framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.

See the stats(3) manual page for details.

This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.

This depends on both the ARB tree macros:
https://reviews.freebsd.org/D20324

and the Q math macros:
https://reviews.freebsd.org/D20116.

The stats(3) framework is disabled by default for now,
except in the NOTES kernel (for QA); it is expected
to be enabled in amd64 GENERIC after a cool down period.

Sponsored By: Klara Inc, Netflix
Obtained from: Netflix

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25388
Build 24029: arc lint + arc unit

Event Timeline

trasz created this revision.May 30 2019, 8:58 PM
bcr added a subscriber: bcr.May 31 2019, 7:29 AM

AE vs. BE comment. The man page looks fine to me in general, though.

share/man/man3/stats.3
229

s/minimise/minimize/

In general, we use American English in FreeBSD's docs and man pages. I've seen some other instances like this in the file. Can you do a sweep over the man page to change these occurances?
Maybe textproc/igor can point them out to you.

sef added inline comments.Jun 3 2019, 9:03 PM
share/man/man3/stats.3
227

I think I'd like "Variables of Interest" to be capitalized there.

246

That's an _at least_ 8 byte header, so why say that it is 8 bytes there? I'd recommend something along the lines of, "The publicly-visible header is at least 8 bytes, currently defined as:")

sef added a comment.Jun 3 2019, 10:59 PM

I'd like to see some example usages.

share/man/man3/stats.3
259

I think that should be '.Va opaque' there.

275

This confuses me a bit, although I haven't dug into the implementation yet, so perhaps it's clearer there. Specifically... if it's opaque, why do I need to know the underlying memory allocation size? I can understand needing to know the size of the individual statsblob object, but shouldn't any thing object-specific be _in_ the opaque part? And that would include any difference between object size and opaque size.

716

"8b"?

sys/kern/subr_stats.c
48

style(9) says header files should be alphabetical; is there a compelling reason that can't be the case here?

301

The initializers here are very dense; how does it look with the trade-off of spaces compared to line length?

cem requested changes to this revision.Jun 7 2019, 2:30 AM
cem added a subscriber: cem.

So like, what's the motivation of this change? What are the goals? How was this design chosen? There's 6000 lines of diff present as a fait accompli without any kind of socialization on arch@ or anywhere else. Code is great, but the high level stuff is missing.

Comment density is "low" and copy-paste density is "high." Did you consider templating this, either with awk or C++?

"stats(3)" is a bit misleading; 3900 LoC are a new kernel file. What is the expected consumer? What is the general API surface? Where is the design doc?

Where is the stats(9) manual page? What do stats(3) consumers look like? Will there ever be any in tree? Can this live in ports?

This revision now requires changes to proceed.Jun 7 2019, 2:30 AM
In D20477#443980, @cem wrote:

So like, what's the motivation of this change? What are the goals? How was this design chosen? There's 6000 lines of diff present as a fait accompli without any kind of socialization on arch@ or anywhere else. Code is great, but the high level stuff is missing.
Comment density is "low" and copy-paste density is "high." Did you consider templating this, either with awk or C++?
"stats(3)" is a bit misleading; 3900 LoC are a new kernel file. What is the expected consumer? What is the general API surface? Where is the design doc?
Where is the stats(9) manual page? What do stats(3) consumers look like? Will there ever be any in tree? Can this live in ports?

The first consumer is the new TCP stats framework (review to be posted shortly). I'll get some more detail on the high level design to share.

trasz updated this revision to Diff 58639.Jun 14 2019, 8:59 PM
trasz marked 5 inline comments as done.

Fixes to problems found by bcr@ and sef@. I'll try to provide a basic
test case later on.

trasz added inline comments.Jun 14 2019, 9:00 PM
share/man/man3/stats.3
229

Hm, igor doesn't seem to do that. I've fixed a bunch of /initialis.*/, though.

246

To me this means the header is exactly 8 bytes; the variable length 'opaque' is not really part of the header - it follows the header.

275

It's because the API consumer can provide their own buffer to eg stats_blob_init().

sys/kern/subr_stats.c
48

It's because it uses the ARB macros from <sys/tree.h>. Do we have a policy of whether includes should themselves include the headers they depend on?

301

I've added spaces around the equal signs; this maxes out at 78 chars.

trasz added a comment.Jun 15 2019, 2:49 PM
In D20477#443980, @cem wrote:

So like, what's the motivation of this change? What are the goals? How was this design chosen? There's 6000 lines of diff present as a fait accompli without any kind of socialization on arch@ or anywhere else. Code is great, but the high level stuff is missing.
Comment density is "low" and copy-paste density is "high." Did you consider templating this, either with awk or C++?
"stats(3)" is a bit misleading; 3900 LoC are a new kernel file. What is the expected consumer? What is the general API surface? Where is the design doc?
Where is the stats(9) manual page? What do stats(3) consumers look like? Will there ever be any in tree? Can this live in ports?

The first consumer is the new TCP stats framework (review to be posted shortly). I'll get some more detail on the high level design to share.

And here it is: https://reviews.freebsd.org/D20655.

sef accepted this revision.Jun 28 2019, 6:33 PM

Still no examples, but linking to the consumer helps.

trasz updated this revision to Diff 59519.Jul 7 2019, 7:56 PM

Update to the new Q_INI(3) syntax; no functional changes.

lstewart added inline comments.
share/man/man3/stats.3
716

Oops, should have been caps B for SI bytes suffix. Similarly true for the annotated memory layout further down.

trasz edited the summary of this revision. (Show Details)Jul 15 2019, 7:38 PM
trasz edited the summary of this revision. (Show Details)Jul 15 2019, 7:42 PM
allanjude added inline comments.Jul 16 2019, 4:27 PM
share/man/man3/stats.3
763

I think the same '8b' vs '8B' or '8 bytes' change may be needed here.

trasz updated this revision to Diff 59857.Jul 17 2019, 7:43 PM

One more fix from Allan.

trasz marked an inline comment as done.Jul 17 2019, 7:43 PM
trasz edited the summary of this revision. (Show Details)Jul 19 2019, 12:53 PM
allanjude edited the summary of this revision. (Show Details)Aug 13 2019, 6:02 PM
sef accepted this revision.Aug 13 2019, 6:11 PM
cem added a comment.EditedSun, Sep 15, 9:29 PM

Some drive by comments — I haven't had time to read the 3-4 thousand lines of code in this review fully.

I was pointed this way from the arb tree review because "the digest code in stats(3) [justifies the arb data structure]." So I'll try to find and read that bit, but it looks like that's thousands of lines of code, so... it's still not an especially compelling case for ARB.

First impressions are that this is not only huge, but dense (low on comments even just showing the relation between algorithms and the source text, tdunning's t-digest paper), and the bits related to randomness are sloppy or incorrect.

Endian definitions for voistatdata_numeric etc imply this is intended to be an on-wire or disk structure? Or that you want portions of wider types to align with non-pad parts of other types in the same union? I didn't see where such an assumption was used, but if it is, accessing a union member that isn't the same one you most recently initialized is technically undefined behavior.

sys/kern/subr_stats.c
48

This concern can be resolved with arb macros living in the highly sorted arb.h now, right?

(In general, headers should NOT include their dependencies, but as you point out, there is an inherent tension with alpha-sort inclusion. You can resolve it in a number of unsatisfying ways, one of which would be header inclusion. Other options are things like a smaller sys/_foo.h header inclusion, the ugly strategy of renaming headers to get the alphasort right, or the perhaps uglier strategy of just dumping your definitions in one of the always-sorts-first headers like cdefs.h, types.h, or params.h.)

213–289

C99 has array initializers as well (you use C99 struct initializers a few lines down); just:

{
    [VS_STYPE_VOISTATE] = "VOISTATE",
...

For ex.

299

Standard C requires at least one element to be explicitly initialized. Generally { 0 } will do.

1041–1052

This talks about "true random" a lot and then uses random(3), which is NOT a random number generator. One or the other needs to be fixed.

Additionally, the formula used to reduce the range of random(3), even if were a real RNG, introduces bias.

Given the ability to provide an explicit deterministic seed value, my guess is that the intent would be better served by replacing random() with arc4random_uniform(101);.

2930–2933

This seems like a dangerous assumption? Maybe there should just be a more explicit ARB API for this intended use?

2970

Using random(3) is almost always incorrect. In fact, I think it's always incorrect, even if you just want a fast PRNG. This is a red flag.

cem added inline comments.Sun, Sep 15, 9:32 PM
lib/libstats/Makefile
5–12

Is here any userspace-only code that can be extracted from the giant subr_stats file and made its own file here?

Likewise for the kernel-only portions?

Without having scanned in exhaustive detail, it seems like at least some significant portions are #ifdef kernel or vice versa, and IMO it would significantly help legibility to have more, smaller files with more explicit purpose.

This comment was removed by rpokala.