Page MenuHomeFreeBSD

Add sysctl net.inet.tcp.ktlslist
ClosedPublic

Authored by kib on Mon, Jun 2, 9:36 PM.
Tags
None
Referenced Files
F121916491: D50653.diff
Mon, Jun 30, 3:17 PM
F121858259: D50653.id156666.diff
Mon, Jun 30, 1:56 AM
F121855503: D50653.id156670.diff
Mon, Jun 30, 1:15 AM
Unknown Object (File)
Sun, Jun 29, 12:39 PM
Unknown Object (File)
Sun, Jun 29, 10:17 AM
Unknown Object (File)
Sun, Jun 29, 10:17 AM
Unknown Object (File)
Sun, Jun 29, 10:17 AM
Unknown Object (File)
Sun, Jun 29, 10:17 AM

Details

Summary

The corresponding userspace tool is available at https://github.com/kostikbel/ktcpdump, and requires a lock-stepped update to Rust libc provided by the ktls branch in the fork https://github.com/kostikbel/libc.

There is also a method added to snd tags to allow drivers to say something about the state of the offload. We plan to use it for mlx5, but no code is included into this review.

I have no plans touching netstat(1). I am aware that KTLS can be used with unix sockets, but again I do not plan to add the similar facility for now.

Overall, this allows users to see the state of the KTLS offload for tcp connections, without resorting to kernel debugger.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Mon, Jun 2, 9:36 PM
sys/kern/uipc_ktls.c
3495

Presumably the caller should have a credential matching that of the caller? Or does something else prevent a user from dumping all of the session keys?

3498

Extra newline.

sys/netinet/in_pcb.h
326

Will you add pad bytes for this?

sys/netinet/tcp_subr.c
2700

How does xig get initialized?

sys/kern/uipc_ktls.c
3495

I missed the cr_canseeinpcb() call, but it still seems too relaxed.

I don't think KTLS actually works over unix domain sockets, just TCP?

For the tool, the name makes me think it is a special version of tcpdump that will decrypt the packets or some such, ktcplist or some such as a name might avoid confusion? Have you considered outputting names for algorithms in the tool btw? (E.g. "AES-CBC", "SHA256-HMAC", "AES-GCM")

sys/kern/uipc_ktls.c
3495

Looks like there is just a global sysctl and you can either dump all of the keys or none of them. Presumably a priv_check() of some sort should be called in the with-keys variant? (I don't think there is one today)

sys/netinet/tcp_subr.c
2756

Looks like this should move up before the first loop to answer Mark's question? (Except you still have to initialize xig_count here?)

2845

How is userspace expected to detect cnt vs cnt1 mismatches? Is it based on looking at the end of the buffer for a xig structure? In that case, can the count in the first record just be 0? (And then you can indeed just initialize xig before the first loop)

kib marked 8 inline comments as done.Tue, Jun 3, 7:18 PM
kib added inline comments.
sys/kern/uipc_ktls.c
3495

Then, if cr_canseeinpcb() is to weak, what other checks do you suggest? I did not come with any other suitable restrictions there. canseeinpcb is roughly equivalent to candebug, which means that the keys could be extracted with userspace debugger anyway.

sys/netinet/in_pcb.h
326

What do you mean? I do not reserve fixed space for keys, so I cannot express it in C. I can add a flexible array with data at the end of the structure, but its utility is not great.

sys/netinet/tcp_subr.c
2845

Intent is that the generation count is copied out in the prepended xig, and actual count is copied out in the appended xig. I messed this up eventually.

This is the natural approach IMO: we take the gen counts snapshots before looping, and filter the list based on the read gen count values. Then we know the new value for the element count at the end, and copy it out appending the xig. Also we can provide the suggested new gen count values to user so that userspace can detect a change if so willing.

kib marked 3 inline comments as done.

(Try to) un-mess the prepended xig vs appended xig.

External repo and tool are renamed ktcpdump->ktcplist.

jhb added inline comments.
sys/netinet/tcp_subr.c
2760

I think you can leave this out if you wish and just leave the count 0 in the prepended xig. I think this will match your intent (which I agree with), which is that userspace can compare the generation count of the two xig's to detect if the list changed while it was being generated, and the count is only meaningful in the final xig.

kib marked an inline comment as done.

Leave xig.count as zero in the prepended xig.

Eliminate a write-only var

Sorry, apparently I did not submit my previous comments. :(

sys/kern/uipc_ktls.c
3495

cr_canseeinpcb() is more permissive than candebug(): it lets an unprivileged user see any other user's socket connection metadata. I think something equivalent to p_candebug() would be ok.

sys/netinet/in_pcb.h
326

I misunderstood the comment.

sys/kern/uipc_ktls.c
3495

I added a place where all policy can be put, for now the only part to add I see is the ids subset check.

Extract code from p_candebug() into cr_xids_subset().
Add PRIV_NETINET_KTLSKEYS.
Define and use cr_canexport_ktlskeys() for export policy.

sys/kern/kern_prot.c
2188

Extra newline.

sys/kern/uipc_ktls.c
3458

In general I think we shouldn't provide the IV unless the canexportkeys predicate is true. It's not as sensitive as the session keys, but it should still be kept hidden.

sys/netinet/tcp_subr.c
2854

We should zero the buffer as well, e.g., with zfree().

sys/netinet/tcp_var.h
1238

Missing comments here.

kib marked 6 inline comments as done.

Do not export init vectors for !export_keys.
Use zfree() for buffer with keys.
Minor tweaks.

markj added inline comments.
sys/netinet/tcp_subr.c
2806

IMO it's a bit strange to set these lengths to 0. They are properties of the selected algorithm. I'd probably include key buffers in the KTLS struct and just zero them when the privilege check fails.

In that case I guess the buffer size should be TLS_MAX_PARAM_SIZE for consistency with the KTLS code, though, so maybe that's too fat.

sys/sys/ktls.h
152

This is just the largest IV length we use today? I'd suggest hard-coding it or defining a new "max KTLS IV length" constant instead.

sys/netinet/tcp_subr.c
2806

The buffers for keys cannot be fixed length, TLS_MAX_PARAM_SIZE is way to large to be practical even for modest amount of ktls sessions total. MAX is 1K, which means that wiring would need hundreds of KB if not MB.

sys/sys/ktls.h
152

Well, this is the same constant used by tls_session_param several lines below.

I added a dedicated defined symbol and bumped it to 32 (from 16).

Do no abuse CBC IV length constant.

markj added inline comments.
sys/sys/ktls.h
152

Right, but struct tls_session_params is not part of the user ABI.

gallatin added inline comments.
sys/kern/uipc_ktls.c
869

I really don't like adding an atomic to the critical path (and yes, connection establishment is a critical path for us).

What is the need for a global generation count?

sys/kern/uipc_ktls.c
869

The gen count is to make it possible for the sysctl to distinguish ktls sessions for before/after the gen count snapshot was taken.

The count is updated by atomic because there is no any other protection against parallel modifications for ktls_glob_gen.

Really, the ktls session creation typically programs a slow hardware. It takes several mutexes on the path, and pushes the initialization data into the dump sq (on mlx5). Do you actually see some effect from this atomic, or it is a generic dislike?

sys/kern/uipc_ktls.c
869

I noticed it in our internal code review when we brought in the last few weeks of FreeBSD's main branch.

The difference between this and the other slow stuff is that some of those mutexes are uncontended, and some are per-port (so load is at least sharded 4 ways). This is likely to be a contended atomic across 96 or more cores with more than 100k increments a second in times of heavy load.

I don't know much about the timecounter subsystem, but would it work to use getnanouptime() to record the creation of the session, and disambiguate them using address?

sys/kern/uipc_ktls.c
869

Please see D51000 for the proposed way to get rid of the atomic.
I suppose you do not care about the sysctl or about the parallel execution of the sysctl.