Page MenuHomeFreeBSD

tcp: simplify stack switching protocol
ClosedPublic

Authored by tuexen on May 19 2024, 4:06 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jun 6, 6:33 AM
Unknown Object (File)
May 22 2024, 5:59 PM
Unknown Object (File)
May 21 2024, 11:56 PM
Unknown Object (File)
May 21 2024, 11:19 PM
Unknown Object (File)
May 21 2024, 3:39 AM
Unknown Object (File)
May 20 2024, 1:25 AM
Unknown Object (File)
May 19 2024, 9:31 PM
Unknown Object (File)
May 19 2024, 9:22 PM

Details

Summary

Before this patch, a stack (tfb) accepts a tcpcb (tp), if the tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL and tfb->tfb_tcp_handoff_ok(tp) returns 0.
After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp) returns 0. tfb->tfb_tcp_handoff_ok must always be provided.

For existing TCP stacks (FreeBSD, RACK and BBR) there is not functional change. However, the logic is simpler.

Test Plan

Ensure that

kldload tcp_rack tcp_bbr

still works.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Nit: why require tfb_tcp_handoff_ok() be implemented by a stack rather than treat the absence of the hook's implementation to be an implicit ok? Many of the hook functions are optional so it seems a bit unexpected to require that this hook function be implemented.

This is what's currently documented in tcp_functions(9) which will also need an update along with whatever ultimately gets committed:

A user may select a new TCP stack before calling connect(2) or listen(2).
Optionally, a TCP stack may also allow a user to begin using the TCP
stack for a connection that is in a later state by setting a non-NULL
function pointer in the tfb_tcp_handoff_ok field.  If this field is non-
NULL and a user attempts to select that TCP stack after calling
connect(2) or listen(2) for that socket, the kernel will call the
function pointed to by the tfb_tcp_handoff_ok field.  The function should
return 0 if the user is allowed to switch the socket to use the TCP
stack.  Otherwise, the function should return an error code, which will
be returned to the user.  If the tfb_tcp_handoff_ok field is NULL and a
user attempts to select the TCP stack after calling connect(2) or
listen(2) for that socket, the operation will fail and the kernel will
return EINVAL.

Nit: why require tfb_tcp_handoff_ok() be implemented by a stack rather than treat the absence of the hook's implementation to be an implicit ok? Many of the hook functions are optional so it seems a bit unexpected to require that this hook function be implemented.

Some functions are mandatory, some are optional. So I think it just needs to be clear what this function is.

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED.

But as said earlier: I can live with both semantics, as long as they are clearly specified and right now, every stack implements tfb_tcp_handoff_ok().

This is what's currently documented in tcp_functions(9) which will also need an update along with whatever ultimately gets committed:

A user may select a new TCP stack before calling connect(2) or listen(2).
Optionally, a TCP stack may also allow a user to begin using the TCP
stack for a connection that is in a later state by setting a non-NULL
function pointer in the tfb_tcp_handoff_ok field.  If this field is non-
NULL and a user attempts to select that TCP stack after calling
connect(2) or listen(2) for that socket, the kernel will call the
function pointed to by the tfb_tcp_handoff_ok field.  The function should
return 0 if the user is allowed to switch the socket to use the TCP
stack.  Otherwise, the function should return an error code, which will
be returned to the user.  If the tfb_tcp_handoff_ok field is NULL and a
user attempts to select the TCP stack after calling connect(2) or
listen(2) for that socket, the operation will fail and the kernel will
return EINVAL.

Yes, I'll update it once we have agreed how the API will be.

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

👍

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

Right now, the absence of the function means that it is not OK to switch, except the state is TCPS_CLOSED. So I think keeping the semantic of the absence of the tfb_tcp_handoff_ok() is in tune with POLA for all states except TCPS_CLOSED. The other reason is that one could use tfb_tcp_handoff_ok() as the first step of the switch. In particular, one could call something like tcp_timer_stop_nounlock as the last step of tfb_tcp_handoff_ok(). This does not work, if the semantic of the absence of tfb_tcp_handoff_ok() is that it is OK to switch.

Your comments are entirely reasonable so I don't have a strong objection against what you've proposed (although the idea of mutating the connection state/set up in handoff_ok() seems sketchy at best, but we can't stop anyone from doing it either).

I think the protocol would require that if tfb_tcp_handoff_ok() returns no error, tfb_tcp_fb_init() must be called. So you can put something in tfb_tcp_handoff_ok(), which either fails and does not change the state or succeeds and potentially change the state in a way that tfb_tcp_fb_init() expects. But this is up to the user of the API and we cannot control what is done the in callbacks.

👍

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

This is already part of the diff, I think. See tcp_subr.c:1186. tcp_subr.c:register_tcp_functions_as_names() would return EINVAL like in the case where some other mandatory function is not provided.

This revision is now accepted and ready to land.Thu, May 30, 10:57 AM
sys/netinet/tcp_usrreq.c
1734

With this cleanup of the handoff_ok / fb_init flow, I think we should move this block to before the handoff_ok check?

This revision now requires review to proceed.Thu, May 30, 7:49 PM
tuexen added inline comments.
sys/netinet/tcp_usrreq.c
1734

Good catch. Fixed.

Might be worth adding a check in tcp_subr.c:register_tcp_functions_as_names() that the handoff_ok hook is not NULL and return an errno + fail to register the module if it is?

This is already part of the diff, I think. See tcp_subr.c:1186. tcp_subr.c:register_tcp_functions_as_names() would return EINVAL like in the case where some other mandatory function is not provided.

Derp, sorry I missed it while replying on the run.

share/man/man9/tcp_functions.9
277

"Otherwise, the function should should not have any side effects and return ..."

Suggested edit:

"If the user is not allowed to switch the socket, the function should undo any changes it made to the connection state configuration and return ..."

Use wording suggested by Lawrence.

tuexen added inline comments.
share/man/man9/tcp_functions.9
277

Done.

This revision is now accepted and ready to land.Tue, Jun 4, 4:12 AM
This revision was automatically updated to reflect the committed changes.
tuexen marked an inline comment as done.