Page MenuHomeFreeBSD

tcp: simplify endpoint creation at the passive side
Needs ReviewPublic

Authored by tuexen on Thu, May 30, 11:13 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jun 10, 6:19 PM
Unknown Object (File)
Sat, Jun 1, 5:57 PM

Details

Summary

Use the intended TCP stack when creating a TCP endpoint instead of creating it the endpoint the default stack first and after that switching it to use the intended TCP stack.

While there, fix some error cases, which would call tfb_tcp_fb_fini without tfb_tcp_fb_init having been called.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This revision is now accepted and ready to land.Thu, May 30, 3:23 PM

Should we check for TCP_FUNC_BEING_REMOVED?

In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't. I think for consistency with the code path in the setsockopt() handler for TCP_FUNCTION_BLK it makes sense to use:

	if (V_functions_inherit_listen_socket_stack)
		tfb = tcp_find_and_ref_fb(sototcpcb(lso)->t_fb);
	else
		tfb = tcp_find_and_ref_default_fb();
	KASSERT(tfb != NULL, ("Cannot find TCP function block."));
	if ((tfb->tfb_flags & TCP_FUNC_BEING_REMOVED) ||
	    (tp = tcp_newtcpcb(inp, tfb)) == NULL) {
		refcount_release(&tfb->tfb_refcnt);
		in_pcbfree(inp);
		sodealloc(so);
		goto allocfail;
	}

However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

This revision now requires review to proceed.Fri, May 31, 11:24 AM
In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

I agree it won't cause a fatal problem (i.e. panic). But, it could cause problems where it is hard to remove TFBs.

Without the check, we could end up in a situation where:

  • Removal code marks TCP_FUNC_BEING_REMOVED.
  • Removal code walks TCBs to switch them off the TFB.
  • After removal code starts walking TCBs, syncache_socket() creates a new TCB using this TFB. (The removal code's TCB walk won't see this as the new TCB is added to the head of the list the removal code is walking.)
  • So, once it is done switching TCBs, removal code will see that TCBs are still using this TFB.

If we had a working, coordinated (i.e. lock-protected or atomic) check for TCP_FUNC_BEING_REMOVED in syncache_socket, this race could not happen.

Whether it is worth adding code to cover this case is another question. I think it depends on whether we expect this to be used primarily as a development feature or a user-facing feature. If the latter, its probably more important it be reliable.

sys/netinet/tcp_subr.c
445

I know it is unrelated, but it seems like this just devolves to rblk = blk if the caller owns a reference on blk (guaranteeing it will be in the list). There might be room for future optimization here.

If we wanted, this is probably where we could check for TCP_FUNC_BEING_REMOVED while we hold the tcp_function_lock. But, I haven't checked all callers to make sure that's appropriate.

In D45411#1036189, @jtl wrote:
In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...

I agree it won't cause a fatal problem (i.e. panic). But, it could cause problems where it is hard to remove TFBs.

Without the check, we could end up in a situation where:

  • Removal code marks TCP_FUNC_BEING_REMOVED.
  • Removal code walks TCBs to switch them off the TFB.
  • After removal code starts walking TCBs, syncache_socket() creates a new TCB using this TFB. (The removal code's TCB walk won't see this as the new TCB is added to the head of the list the removal code is walking.)
  • So, once it is done switching TCBs, removal code will see that TCBs are still using this TFB.

Correct. That can happen. But I think this is not introduced by this patch or am I missing something?

If we had a working, coordinated (i.e. lock-protected or atomic) check for TCP_FUNC_BEING_REMOVED in syncache_socket, this race could not happen.

Are you proposing to hold the tcp_function_lock from before the looking up the TFB until it is set and the inp can be found? Wouldn't that be too long?

Whether it is worth adding code to cover this case is another question. I think it depends on whether we expect this to be used primarily as a development feature or a user-facing feature. If the latter, its probably more important it be reliable.

I think the use case is that you are using some stack (for example the RACK stack) even as the default stack and want to switch all endpoints to a different one (for example the freebsd stack). So I would do this by

  1. sudo sysctl net.inet.tcp.functions_default=freebsd
  2. sudo tcpsso -s LISTEN -Srack TCP_FUNCTION_BLK freebsd
  3. sudo tcpsso -Srack TCP_FUNCTION_BLK freebsd

You might need to run the second and third command multiple time, but after that you should be able to kldunload tcp_rack.

sys/netinet/tcp_subr.c
445

We can do that, but that does NOT protect the case, where the TCP_FUNC_BEING_REMOVED was not set inside this function, but gets marked as soon as the tcp_function_lock is released and before the called gets the return value. So where is the difference?

Would the following address your issue:

  1. pass the relevant information to tcp_newtcpcb().
  2. In tcp_newtcpcb, lock the tcp_function_lock, ensure that the TCP_FUNC_BEING_REMOVED flag is not set (else return NULL), increment the refcount, set the pointer and unlock the tcp_function_lock.

Increment reference count and set the TFB atomically. Ensure that the TFB is not being removed when assigned.

tuexen marked an inline comment as done.

jtl: Does the new version address your concern? It is much smaller and cleaner, so I went ahead and updated the proposed patch.

Thanks! I like this approach. I've added a few comments about potential enhancements.

sys/netinet/tcp_subr.c
2202

Why do we need to do find_tcp_fb_locked()? All it does is walk a list to return listening_tcb->t_fb. And, if it is in the process of being removed, we should catch that when TCP_FUNC_BEING_REMOVED is checked.

2206

Why not try the default stack? If TCP_FUNC_BEING_REMOVED is set, but the listening TCB was still using that stack, it seems likely this function is racing with the removal code which is trying to switch all the TCBs to the default stack.

sys/netinet/tcp_syncache.c
1029

I don't think this is sufficient. We really need to undo everything done in tcp_newtcpcb(), which includes (among other things) acquiring a refcount on the CC algo, initializing the CC code, and allocating a stats blob. I wonder if we instead should just run tcp_discardcb() here?