Paths

Table of Contentst

tcp: simplify stack switching protocol
Needs ReviewPublic
Actions

Authored by tuexen on Sun, May 19, 4:06 PM.

Details

Reviewers

rrs
rscheff
peter.lei_ieee.org
lstewart

Group Reviewers

transport

Summary

Before this patch, a stack (tfb) accepts a tcpcb (tp), if the tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL and tfb->tfb_tcp_handoff_ok(tp) returns 0.
After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp) returns 0. tfb->tfb_tcp_handoff_ok must always be provided.

For existing TCP stacks (FreeBSD, RACK and BBR) there is not functional change. However, the logic is simpler.

Test Plan

Ensure that

kldload tcp_rack tcp_bbr

still works.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

tuexen created this revision.Sun, May 19, 4:06 PM

Herald added a reviewer: transport. · View Herald TranscriptSun, May 19, 4:06 PM

Herald added subscribers: glebius, melifaro, imp. · View Herald Transcript

tuexen requested review of this revision.Sun, May 19, 4:06 PM

Nit: why require tfb_tcp_handoff_ok() be implemented by a stack rather than treat the absence of the hook's implementation to be an implicit ok? Many of the hook functions are optional so it seems a bit unexpected to require that this hook function be implemented.

This is what's currently documented in tcp_functions(9) which will also need an update along with whatever ultimately gets committed:

A user may select a new TCP stack before calling connect(2) or listen(2).
Optionally, a TCP stack may also allow a user to begin using the TCP
stack for a connection that is in a later state by setting a non-NULL
function pointer in the tfb_tcp_handoff_ok field.  If this field is non-
NULL and a user attempts to select that TCP stack after calling
connect(2) or listen(2) for that socket, the kernel will call the
function pointed to by the tfb_tcp_handoff_ok field.  The function should
return 0 if the user is allowed to switch the socket to use the TCP
stack.  Otherwise, the function should return an error code, which will
be returned to the user.  If the tfb_tcp_handoff_ok field is NULL and a
user attempts to select the TCP stack after calling connect(2) or
listen(2) for that socket, the operation will fail and the kernel will
return EINVAL.

In D45253#1032342, @lstewart wrote:

Nit: why require tfb_tcp_handoff_ok() be implemented by a stack rather than treat the absence of the hook's implementation to be an implicit ok? Many of the hook functions are optional so it seems a bit unexpected to require that this hook function be implemented.

Some functions are mandatory, some are optional. So I think it just needs to be clear what this function is.

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED.

But as said earlier: I can live with both semantics, as long as they are clearly specified and right now, every stack implements tfb_tcp_handoff_ok().

This is what's currently documented in tcp_functions(9) which will also need an update along with whatever ultimately gets committed:

A user may select a new TCP stack before calling connect(2) or listen(2).
Optionally, a TCP stack may also allow a user to begin using the TCP
stack for a connection that is in a later state by setting a non-NULL
function pointer in the tfb_tcp_handoff_ok field.  If this field is non-
NULL and a user attempts to select that TCP stack after calling
connect(2) or listen(2) for that socket, the kernel will call the
function pointed to by the tfb_tcp_handoff_ok field.  The function should
return 0 if the user is allowed to switch the socket to use the TCP
stack.  Otherwise, the function should return an error code, which will
be returned to the user.  If the tfb_tcp_handoff_ok field is NULL and a
user attempts to select the TCP stack after calling connect(2) or
listen(2) for that socket, the operation will fail and the kernel will
return EINVAL.

Yes, I'll update it once we have agreed how the API will be.

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

In D45253#1035179, @lstewart wrote:

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

In D45253#1035180, @tuexen wrote:

In D45253#1035179, @lstewart wrote:

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

👍

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

In D45253#1035494, @lstewart wrote:

In D45253#1035180, @tuexen wrote:

In D45253#1035179, @lstewart wrote:

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

👍

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

This is already part of the diff, I think. See tcp_subr.c:1186. tcp_subr.c:register_tcp_functions_as_names() would return EINVAL like in the case where some other mandatory function is not provided.

Update tcp_functions.9.

rrs accepted this revision.Thu, May 30, 10:57 AM

This revision is now accepted and ready to land.Thu, May 30, 10:57 AM

peter.lei_ieee.org added inline comments.Thu, May 30, 3:31 PM

sys/netinet/tcp_usrreq.c
1734	With this cleanup of the handoff_ok / fb_init flow, I think we should move this block to before the handoff_ok check?

Address Peter's comment.

This revision now requires review to proceed.Thu, May 30, 7:49 PM

tuexen marked an inline comment as done.Thu, May 30, 7:50 PM

tuexen added inline comments.

sys/netinet/tcp_usrreq.c
1734	Good catch. Fixed.

tuexen marked an inline comment as done.Fri, May 31, 11:22 AM

In D45253#1035534, @tuexen wrote:

In D45253#1035494, @lstewart wrote:

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

This is already part of the diff, I think. See tcp_subr.c:1186. tcp_subr.c:register_tcp_functions_as_names() would return EINVAL like in the case where some other mandatory function is not provided.

Derp, sorry I missed it while replying on the run.

lstewart added inline comments.Fri, May 31, 11:59 PM

share/man/man9/tcp_functions.9
277	"Otherwise, the function should should not have any side effects and return ..." Suggested edit: "If the user is not allowed to switch the socket, the function should undo any changes it made to the connection state configuration and return ..."