Register a "tcp" eventlog(9) provider and emit per-connection events
from the in-tree TCP code paths. Each tcpcb gets a per-connection
eventlog session identified by inp_gencnt. When the user enables
eventlog on a socket via the new TCP_EVENTLOG sockopt, the session
is flagged for capture and its state is dumped to any subscribers. Sessions
follow stack swaps via tcp_ensure_eventlog_session_on_switch(), so a
connection moving between eventlog-capable stacks (e.g. base ↔ rack)
stays observable across the transition.
Schema (include/eventlog/tcp_eventlog_schema.src):
- SESSION/CONN - SESSION_CREATE, CONN_SET_IP_V[46], LOG_ID, CONN_PARAMS, MSS, CONN_OWNER, TCP_STACK_CHANGE, CC_ALGO_CHANGE
- DATA - IN, OUT, ACK, SACK, USER_SEND, SENDFILE
- CC / WINDOW - CWND, CC_PARAMS, ENTER/EXIT_RECOVERY, PRR_STATE, WINDOW_COLLAPSE, HYSTART, GOODPUT
- RTT - RTT, RTT_SAMPLE
- LOSS - RTO, RETRANSMIT_REASON, DSACK
- TIMER - TIMER_START / _CANCEL / _FIRED, SB_WAKE
- PACING - PACING_CALC, PACE_STATE, HW_PACE_RATE
- APP / OPT - SOCKET_OPT_UINT32, SOCKET_OPT_ERR, OUTPUT_DECISION
The RTT keyword is split out from CC because EVENT RTT (300) fires on
every SRTT update; consumers that want CC visibility (cwnd / PRR /
hystart / goodput) without that firehose can subscribe to CC alone.
STRUCT GOODPUT carries the per-goodput-window SRTT and the all-time-min
RTT alongside gp_bw / raw_bw so consumers can correlate throughput
against RTT without interleaving a separate RTT event stream.
Emission points: tcp_input.c (RTT update), tcp_subr.c (provider
registration, dump callback, dump_session, conn_params,
sendfile/log-id helpers, stack switch), tcp_syncache.c (CONN_SET_IP),
tcp_timer.c (RTO), tcp_usrreq.c (TCP_EVENTLOG sockopt, CONN_OWNER on
attach, CC_ALGO_CHANGE in tcp_set_cc_mod), tcp_output.c, the rack/bbr
stacks, the cubic/newreno CC modules, and kern_sendfile.c.
The new sockopt TCP_EVENTLOG (79) and the new t_flags2 bit
TF2_EVENTLOG_ENABLED gate per-connection capture. CONN_OWNER records
the owning PID: the socket(2) caller on active opens, and the
listener's owner (inherited) on passive accepts.
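
As a rough illustration of the userland side, the sketch below enables capture on a single socket. It assumes the option takes an int-sized boolean, and it defines TCP_EVENTLOG locally from the value given above because Python's socket module does not export it. enable_eventlog() is a hypothetical helper, not part of this change.

```python
import socket

# Value taken from this commit message; Python's socket module does not
# export it, so it is defined here by hand.
TCP_EVENTLOG = 79


def enable_eventlog(sock, on=True):
    """Try to switch per-connection eventlog capture on or off.

    Assumes an int-sized boolean payload. Returns False when the kernel
    rejects the option (for example, a kernel without this change).
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_EVENTLOG, 1 if on else 0)
    except OSError:
        return False
    return True
```

A caller would create a TCP socket, call enable_eventlog(sock) before or after connecting, and treat a False result as "capture not available on this kernel".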
The build system in sys/conf/{kern.pre.mk,kmod.mk} (introduced with
eventlog(9)) auto-generates <eventlog/tcp_eventlog.h> from the schema
on every relevant rebuild, so producers always see consistent
emission macros.
tcp(4):
- Document the new TCP_EVENTLOG sockopt and its dump-on-enable behaviour; cross-reference the kern.eventlog.tcp.default sysctl in eventlog(9) for the system-wide default.
- Add elog(1) and eventlog(9) to SEE ALSO.
elog(1):
- New flag -f / --format <type> (only "tcp" is currently accepted). When capturing with "-o dir=<path>", each file is renamed after capture using the first LOG_ID and CONN_SET_IP_V[46] events seen in the session, producing <log_id>_<lport>_<rip>_<rport>.elog. Missing fields are written as "unknown".
- struct session_file gains the per-session metadata fields used by the renamer.
- extract_tcp_metadata() captures the metadata; rename_session_file() runs once at session end (and during exit cleanup).
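
The naming convention above can be sketched as a small function, for example to predict the expected file name in a test. This is only an illustration of the stated pattern; session_filename() is a hypothetical name and is not how elog(1) implements the rename.

```python
def session_filename(log_id=None, lport=None, rip=None, rport=None):
    """Build <log_id>_<lport>_<rip>_<rport>.elog from session metadata.

    Any field that was never seen (None or empty) becomes "unknown",
    mirroring the behaviour described for the renamer.
    """
    fields = (log_id, lport, rip, rport)
    parts = ["unknown" if f is None or f == "" else str(f) for f in fields]
    return "_".join(parts) + ".elog"
```

For a session with log ID "conn42", local port 8080, and peer 192.0.2.1:51234, this yields conn42_8080_192.0.2.1_51234.elog.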
elog(1) man page (elog.1):
- Document -f / --format with the TCP file-naming convention.
- TCP-flavored examples now name "tcp" explicitly instead of the generic "<provider>" placeholder.
tests/sys/kern/elog_test.py:
- Replace the smoke-only test file with end-to-end coverage. Each test pins net.inet.tcp.functions_default to "freebsd" and sets kern.eventlog.tcp.default=1 around a loopback exchange so that captures contain stable, predictable events.
- Coverage:
  - basic capture from the tcp provider with no traffic and with a
    loopback exchange (per stack, per address family);
  - provider-tag and event-name presence in human-readable output;
  - dump-state replay of an existing connection's parameters;
  - keyword filtering (RX/TX);
  - level filtering and SESSION-keyword bypass;
  - binary capture round-trip via -r;
  - -o dir=<path> per-session split + -f tcp rename.
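
The "pin a sysctl around the exchange" pattern used by the tests can be sketched as a context manager. This is a minimal illustration that assumes sysctl(8) is driven through subprocess; pinned_sysctl() and the injectable run parameter are hypothetical, and the actual test file may do this differently.

```python
import contextlib
import subprocess


def _sysctl(args):
    """Run sysctl(8) with the given arguments and return its stdout."""
    proc = subprocess.run(["sysctl", *args], check=True,
                          capture_output=True, text=True)
    return proc.stdout.strip()


@contextlib.contextmanager
def pinned_sysctl(name, value, run=_sysctl):
    """Set a sysctl for the duration of a block, then restore it.

    `run` is injectable so the logic can be exercised without root.
    """
    old = run(["-n", name])          # read the current value
    run([f"{name}={value}"])         # pin the requested value
    try:
        yield old
    finally:
        run([f"{name}={old}"])       # restore even if the block raises
```

A test body would then nest two of these, one for net.inet.tcp.functions_default and one for kern.eventlog.tcp.default, around the loopback exchange.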
Sponsored by: Netflix
Signed-off-by: Nick Banks <nickbanks@netflix.com>
