Page MenuHomeFreeBSD

tcp: simplify endpoint creation at the passive side
ClosedPublic

Authored by tuexen on May 30 2024, 11:13 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jul 22, 3:32 PM
Unknown Object (File)
Mon, Jul 22, 11:55 AM
Unknown Object (File)
Mon, Jul 22, 7:42 AM
Unknown Object (File)
Sun, Jul 21, 6:53 PM
Unknown Object (File)
Sun, Jul 14, 10:42 AM
Unknown Object (File)
Sun, Jul 14, 10:42 AM
Unknown Object (File)
Sun, Jul 14, 10:42 AM
Unknown Object (File)
Sun, Jul 14, 10:42 AM

Details

Summary

Use the intended TCP stack when creating a TCP endpoint instead of creating it the endpoint the default stack first and after that switching it to use the intended TCP stack.

While there, fix some error cases, which would call tfb_tcp_fb_fini without tfb_tcp_fb_init having been called.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.May 30 2024, 3:23 PM

Should we check for TCP_FUNC_BEING_REMOVED?

In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't. I think for consistency with the code path in the setsockopt() handler for TCP_FUNCTION_BLK it makes sense to use:

	if (V_functions_inherit_listen_socket_stack)
		tfb = tcp_find_and_ref_fb(sototcpcb(lso)->t_fb);
	else
		tfb = tcp_find_and_ref_default_fb();
	KASSERT(tfb != NULL, ("Cannot find TCP function block."));
	if ((tfb->tfb_flags & TCP_FUNC_BEING_REMOVED) ||
	    (tp = tcp_newtcpcb(inp, tfb)) == NULL) {
		refcount_release(&tfb->tfb_refcnt);
		in_pcbfree(inp);
		sodealloc(so);
		goto allocfail;
	}

However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

This revision now requires review to proceed.May 31 2024, 11:24 AM
In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

I agree it won't cause a fatal problem (i.e. panic). But, it could cause problems where it is hard to remove TFBs.

Without the check, we could end up in a situation where:

  • Removal code marks TCP_FUNC_BEING_REMOVED.
  • Removal code walks TCBs to switch them off the TFB.
  • After removal code starts walking TCBs, syncache_socket() creates a new TCB using this TFB. (The removal code's TCB walk won't see this as the new TCB is added to the head of the list the removal code is walking.)
  • So, once it is done switching TCBs, removal code will see that TCBs are still using this TFB.

If we had a working, coordinated (i.e. lock-protected or atomic) check for TCP_FUNC_BEING_REMOVED in syncache_socket, this race could not happen.

Whether it is worth adding code to cover this case is another question. I think it depends on whether we expect this to be used primarily as a development feature or a user-facing feature. If the latter, its probably more important it be reliable.

sys/netinet/tcp_subr.c
447

I know it is unrelated, but it seems like this just devolves to rblk = blk if the caller owns a reference on blk (guaranteeing it will be in the list). There might be room for future optimization here.

If we wanted, this is probably where we could check for TCP_FUNC_BEING_REMOVED while we hold the tcp_function_lock. But, I haven't checked all callers to make sure that's appropriate.

In D45411#1036189, @jtl wrote:
In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

I agree it won't cause a fatal problem (i.e. panic). But, it could cause problems where it is hard to remove TFBs.

Without the check, we could end up in a situation where:

  • Removal code marks TCP_FUNC_BEING_REMOVED.
  • Removal code walks TCBs to switch them off the TFB.
  • After removal code starts walking TCBs, syncache_socket() creates a new TCB using this TFB. (The removal code's TCB walk won't see this as the new TCB is added to the head of the list the removal code is walking.)
  • So, once it is done switching TCBs, removal code will see that TCBs are still using this TFB.

Correct. That can happen. But I think this is not introduced by this patch or am I missing something?

If we had a working, coordinated (i.e. lock-protected or atomic) check for TCP_FUNC_BEING_REMOVED in syncache_socket, this race could not happen.

Are you proposing to hold the tcp_function_lock from before the looking up the TFB until it is set and the inp can be found? Wouldn't that be too long?

Whether it is worth adding code to cover this case is another question. I think it depends on whether we expect this to be used primarily as a development feature or a user-facing feature. If the latter, its probably more important it be reliable.

I think the use case is that you are using some stack (for example the RACK stack) even as the default stack and want to switch all endpoints to a different one (for example the freebsd stack). So I would do this by

  1. sudo sysctl net.inet.tcp.functions_default=freebsd
  2. sudo tcpsso -s LISTEN -Srack TCP_FUNCTION_BLK freebsd
  3. sudo tcpsso -Srack TCP_FUNCTION_BLK freebsd

You might need to run the second and third command multiple time, but after that you should be able to kldunload tcp_rack.

sys/netinet/tcp_subr.c
447

We can do that, but that does NOT protect the case, where the TCP_FUNC_BEING_REMOVED was not set inside this function, but gets marked as soon as the tcp_function_lock is released and before the called gets the return value. So where is the difference?

Would the following address your issue:

  1. pass the relevant information to tcp_newtcpcb().
  2. In tcp_newtcpcb, lock the tcp_function_lock, ensure that the TCP_FUNC_BEING_REMOVED flag is not set (else return NULL), increment the refcount, set the pointer and unlock the tcp_function_lock.

Increment reference count and set the TFB atomically. Ensure that the TFB is not being removed when assigned.

tuexen marked an inline comment as done.

jtl: Does the new version address your concern? It is much smaller and cleaner, so I went ahead and updated the proposed patch.

Thanks! I like this approach. I've added a few comments about potential enhancements.

sys/netinet/tcp_subr.c
2206

Why do we need to do find_tcp_fb_locked()? All it does is walk a list to return listening_tcb->t_fb. And, if it is in the process of being removed, we should catch that when TCP_FUNC_BEING_REMOVED is checked.

2210

Why not try the default stack? If TCP_FUNC_BEING_REMOVED is set, but the listening TCB was still using that stack, it seems likely this function is racing with the removal code which is trying to switch all the TCBs to the default stack.

sys/netinet/tcp_syncache.c
1029

I don't think this is sufficient. We really need to undo everything done in tcp_newtcpcb(), which includes (among other things) acquiring a refcount on the CC algo, initializing the CC code, and allocating a stats blob. I wonder if we instead should just run tcp_discardcb() here?

tuexen added inline comments.
sys/netinet/tcp_syncache.c
1029

You are correct, this is a generic issue not related to this patch. So I created D45749 to fix this separately.
Forcing the stack to go though error path, one can observe the memory leak...

jtl added inline comments.
sys/netinet/tcp_subr.c
2206

As we discussed, I agree that this review's usage of find_tcp_fb_locked() is consistent with current usage elsewhere and any change should be part of a larger change to the way we find stack pointers.

2210

As we discussed, this is a design decision. I personally would prefer that we use the default stack rather than fail. However, both choices will produce correct code.

sys/netinet/tcp_syncache.c
1029

Agreed. Thanks for doing that!

This revision is now accepted and ready to land.Wed, Jun 26, 7:29 PM
tuexen added inline comments.
sys/netinet/tcp_subr.c
2206

I looked into this today. One problem is in register_tcp_functions_as_names(). I registers a function block and in case of an error, it just removes some. It does NOT hold a lock in between... No TCP_FUNC_BEING_REMOVED being set...
I think we should make the adding and removing atomic. This means we need to do all allocations beforehand. I'll bring up a patch for review. Then we can cleanup the search I think.

This revision now requires review to proceed.Sat, Jun 29, 8:22 PM
sys/netinet/tcp_subr.c
2210

This was discussed in the FreeBSD transport VC. The consensus was to fail the call, if the TCP stack of the listener is to be removed instead of doing something automatically without knowing if this is the intention of the admin.

We can now avoid hunting the list to ensure the tfb is still on it.

sys/netinet/tcp_subr.c
447

I think the question is if we can assume or not that blk is already registered.

2206

This is now fixed and we know that the TCP function block is registered.

This revision is now accepted and ready to land.Sun, Jul 21, 6:17 PM