Page MenuHomeFreeBSD

Make use of stats(3) in the TCP stack
Needs ReviewPublic

Authored by trasz on Jun 15 2019, 2:48 PM.

Details

Summary

Make use of the stats(3) framework in the TCP stack.

This makes it possible to retrieve per-connection statistical
information such as the receive window size, RTT, or goodput,
using a newly added TCP_STATS getsockopt(3) option, and extract them
using the stats_voistat_fetch(3) API.

See the net/tcprtt port for an example consumer of this API.

Compared to the existing TCP_INFO system, the main differences are that
this mechanism is easy to extend without breaking ABI, and provides
statistical information instead of raw "snapshots" of values at a given
point in time. stats(3) is more generic and can be used in both
userland and the kernel.

This work depends on:

the stats(3) framework:
https://reviews.freebsd.org/D20477

The qmath(3) macros:
https://reviews.freebsd.org/D20116

and the Array-base Red Black Tree implementation:
https://reviews.freebsd.org/D20324

An example consumer tool for using these TCP stats is here:
https://reviews.freebsd.org/D20656.

Sponsored By: Klara Inc, Netflix
Obtained from: Netflix

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 24870
Build 23611: arc lint + arc unit

Event Timeline

trasz created this revision.Jun 15 2019, 2:48 PM
trasz retitled this revision from Make use of stats(3) in the TCP stack. This depends on https://reviews.freebsd.org/D20477, https://reviews.freebsd.org/D20116, and https://reviews.freebsd.org/D20324. to Make use of stats(3) in the TCP stack.Jun 15 2019, 2:50 PM
trasz edited the summary of this revision. (Show Details)
trasz edited the summary of this revision. (Show Details)Jun 15 2019, 3:47 PM
trasz updated this revision to Diff 58673.Jun 15 2019, 3:58 PM

Drop unneeded diff.

allanjude added inline comments.
sys/netinet/tcp_usrreq.c
1533

should the fixup of missing TF_FASTOPEN bits be a separate commit?

thj added a subscriber: thj.Jun 19 2019, 10:18 AM
thj added inline comments.
sys/netinet/tcp.h
335

tcpsyninfo looks to be 16 bytes to me, shouldn't this update to

__tcpi_pad[10]

trasz added inline comments.Jun 28 2019, 8:52 PM
sys/netinet/tcp.h
335

But __tcpi_pad[] is u_int32_t, not a char?

allanjude added inline comments.Jun 29 2019, 3:21 PM
sys/netinet/tcp.h
335

indeed, tcpi_pad[] is a u_int32_t (so 4 bytes), so dropping it from 26 to 22 accounts for the 16 bytes. Unless I am missing something.

thj added inline comments.Jul 3 2019, 8:19 AM
sys/netinet/tcp.h
335

Ah okay, I missed that.

sys/netinet/tcp_input.c
321

Is the goodput measurement time (i.e. number of ms for the measurement window) reported by stats somewhere?

Does the calculation break with an rtt <1ms?

I think a comment here might be helpful for future readers of the code, something along the lines of

report goodput in bits per [time]
trasz edited the summary of this revision. (Show Details)Jul 15 2019, 4:30 PM
trasz edited the summary of this revision. (Show Details)Jul 15 2019, 6:43 PM
trasz added inline comments.Jul 17 2019, 8:06 PM
sys/netinet/tcp_input.c
321

I don't think the raw measurement time is reported - but there's the digest computed for goodput, which I'm guessing is what you'd want the times for?

Regarding rtt < 1ms - well, it's just 'capped from the bottom' with the max(), but I don't know how realistic times below 1ms are, given that the resolution of tcp_ts_getticks() seems to be 1ms?

As for the units - that's a good suggestion; let me do a little experiment to make sure I get this right and I'll add it to the VOI declarations in tcp.h.

trasz edited the summary of this revision. (Show Details)Jul 18 2019, 9:36 PM
bz added a subscriber: bz.

Adding TCP folks explicitly as reviewers.

share/man/man4/tcp.4
37

Please don't forget to update on commit.

bz added inline comments.Aug 9 2019, 3:21 PM
sys/netinet/tcp_var.h
753

The comment sounds wrong to me. One variable looks like a sysctl and with that should not need to be user land visible.
For the other, I don't know the new stats framework but I'd hope it'd use a proper "copyin/out" mechanism and not directly read kernel memory using libkvm with vnet variables.

jtl added a subscriber: jtl.Aug 9 2019, 6:48 PM
jtl added inline comments.
sys/netinet/tcp_var.h
753

I agree. I don't think any of these three lines need to be visible to userland. I think they can safely move within the #ifdef _KERNEL block. I suspect that this is a little cruft which managed to sneak in during the 3+ years of development on this project.

lstewart added inline comments.
sys/netinet/tcp_var.h
753

Actually, they do need to be visible to userland in the current way things work, but the comment does possibly need to be expanded to explain why (or my arguably hacky way of doing things needs to be improved).

The underlying issue here is the need for statsblob templates defined in the kernel or userland to be shared across both domains, such that a statsblob generated in one domain based on a particular template can be manipulated in the other domain. By shared, I don't mean in a direct memory access sense, but the statsblob template needs to exist or be discoverable in both domains (or in future, perhaps across machines).

The only direction that matters in the current code being upstreamed is for templates defined in kernel, and then statsblobs based on a kernel-defined template being exported to userspace, and manipulated in userspace e.g. for rendering to JSON for logging purposes.

The "right" solution would be to have an API for exporting/importing statsblob templates. I punted on doing it the "right" way and cheated.

Regarding the TCP-related stats code, tcp_stats_init() in tcp_stats.c defines and registers the "TCP_DEFAULT" statsblob template together with its associated variables of interest. tcp_stats.c is compiled into the kernel, and tcp_stats_init() is called from tcp_init() so that the kernel statsblob template registry contains the "TCP_DEFAULT" template for use by the TCP stack.

kern/subr_stats.c is both compiled into the kernel and into a userland library (similar to the way sbuf(9) is built for both domains), and in order to have the stats templates match between the kernel and userland, I compile tcp_stats.c into the userland library too with tcp_stats_init() tagged as a library constructor so that the "TCP_DEFAULT" template is defined the same as what's in the kernel, and is registered on userland library load. As a consequence, tcp_stats.c has a bit of #ifdef goo to make compilation in userland work correctly, and these tcp_var.h bits need to be visible to make the userland library compile.

This scheme does mean that the kernel and userland can get out of sync if a change is made to the template but only one of the kernel or userland library are recompiled against the new code. In such a situation, statsblobs generated e.g. in kernel, would have a template hash id which differs from the templates known in userland, and the userland code would fail any statsblob manipulation operations that required the template to be known (return an errno).

In practice, this scheme works well and has not caused problems in the Netflix use case. It was always the plan to eventually implement the API for template export/import and clean this up when that work is done, but I don't think the API is a requirement for upstreaming.

bz added inline comments.Aug 12 2019, 7:05 AM
sys/netinet/tcp_var.h
753

Can you point me at the userland code which "reads" these things? Given one is a sysctl already I'd think it'd be easy to adjust that. I am only wondering if the code was originally written with VIMAGE in mind?

lstewart added inline comments.Aug 12 2019, 7:54 AM
sys/netinet/tcp_var.h
753

@bz Oops, sorry for being unclear. The userland library does not read or interact with any of these tcp_var.h variables in kernel. If you look in tcp_stats.c you'll see where this all comes together.

tcp_stats_init() is executed in both kernel, and separately in the userland library as a library constructor. V_tcp_perconn_stats_dflt_tpl is used in tcp_stats_init() and therefore needs to be resolvable for compilation of stats code in both kernel and userland. In the userland library it just resolves to a library-scope global integer - no relation/access to the kernel variable of same name.

V_tcp_perconn_stats_enable does not strictly speaking need to be visible to userland as it's only referenced from kernel code, but it doesn't hurt having it here and keeping these bits together (though I would also not object to it moving elsewhere).

allanjude edited the summary of this revision. (Show Details)Aug 13 2019, 6:05 PM
jtl added inline comments.Mon, Aug 19, 1:39 PM
sys/netinet/tcp_var.h
753

FWIW, I think this dual-use (especially of the V_tcp_perconn_stats_dflt_tpl variable) makes things unclear. I think it would be nice to eliminate it, if possible, without adding code duplication. However, I understand that may not be possible.

I *suspect* you may be able to make the constructor be a static function in the user-land library. This would avoid the need to expose it via headers.

Also, as best I can tell, V_tcp_perconn_stats_dflt_tpl is not used in the userland library outside of tcp_stats_init. Is that true? If so, it might be possible to do a trivial rewrite of this function to use a local variable, and only update V_tcp_perconn_stats_dflt_tpl in the kernel.