Page MenuHomeFreeBSD

Fix a race condition in TCP timewait
ClosedPublic

Authored by jch on Sep 24 2014, 10:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 29, 7:44 PM
Unknown Object (File)
Mon, Apr 29, 7:31 PM
Unknown Object (File)
Apr 2 2024, 7:45 AM
Unknown Object (File)
Mar 15 2024, 10:54 AM
Unknown Object (File)
Jan 26 2024, 6:46 AM
Unknown Object (File)
Dec 27 2023, 8:50 PM
Unknown Object (File)
Dec 27 2023, 8:50 PM
Unknown Object (File)
Dec 20 2023, 3:21 PM
Subscribers
None

Details

Summary

Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan()
that drives unplanned timewait timeout cancellation. Also simplify implementation by
holding inpcb reference and removing tcptw reference counting.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

jch retitled this revision from to Fix a race condition in TCP timewait.
jch updated this object.
jch edited the test plan for this revision. (Show Details)
jch added a reviewer: jhb.

Note: This race condition between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan() is possible only on HEAD, not in stable/10. Thanks to jhb for this clarification.

On first pass it looks fine to me.

sys/netinet/tcp_timewait.c
107

Perhaps 'only while its inpcb is locked'? Though I'm not sure that is quite right either.. what do you mean by 'valid'? Is it that you can only dereference a tcptw pointer while you have the inpcb locked?

276

I've long wanted '*ref()' routines to return a pointer to the object being referenced so the owner of the reference is more obvious, e.g.:

tw->tw_inpcb = in_pcbref(inp);

That's a fight for another day though, this is fine for now.

663–664

You could perhaps reduce some of the code duplication here by changing tcp_twclose() to take the inp instead of the tw. It would then perform this shared bit of code at the start:

tcp_twclose(struct inpcb *inp, int reuse)
{
    INP_WLOCK(inp);
    tw = intotw();
    if (in_pcbrele_wlocked()) {
        ...
        return;
    }

    if (tw == NULL) {
        ...
        return;
    }

    ...
}

This would have the nice property of removing the asymmetry of the caller locking the INP but tcp_twclose() magically dropping the lock. OTOH, you would have to adjust sysctl_drop() in tcp_subr.c to take this into account (it would have to in_pcbref() / INP_WUNLOCK / tcp_twclose()).'

It is probably worth waiting to see what other folks on the review think about that change before doing it.

sys/netinet/tcp_timewait.c
107

Exactly, based on your comment I would propose something (hopefully) clearer:
- a tcptw is dereferenceable only while its inpcb is locked

276

That's indeed a good idea, I will make a small change to make the owner of the reference slightly more obvious:

/*
 * The tcptw will hold a reference on its inpcb until tcp_twclose
 * is called
 */
tw->tw_inpcb = inp;
in_pcbref(inp);  /* Reference from tw */
663–664

Interesting idea. For completeness changing tcp_twclose(struct tcptw *tw, ...) to tcp_twclose(struct inpcb *inp, ...) will require to adjust sysctl_drop() has you said, but also tcp_twcheck(). And adjusting tcp_twcheck() seems tricky in this case, at least from my perspective.

That said, I agree that there is code duplication between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan(), let me try if I can merge back these two functions together in a single tcp_tw_2msl_scan(int reuse) function.

jch edited edge metadata.

Apply current review comments

From jhb's review:

  • Merge tcp_tw_2msl_reuse() and tcp_tw_2msl_scan() in a single function
  • Make clearer that a tcptw owns a reference on its inpcb
  • Improve a tcptw usage rule comment

From rw's review:

  • Add comments about relational behind INP_INFO_WLOCK_ASSERT in both tcp_tw_2msl_scan() and tcp_twstart() functions

I think the updates look fine, but for some reason I can't get arc to show me the full diff now.

In D826#10, @jhb wrote:

I think the updates look fine, but for some reason I can't get arc to show me the full diff now.

You are right the command:

$ arc patch D826

only applies the last change, not all patches accumulated. As a workaround I found that you can apply the two diffs with:

$ arc patch --diff 1746
$ arc patch --diff 1911

But this is not handy, I am digging the arcanist documentation...

Just for your information, below steps to get a different perspective on this change: It is by comparing 10.0 with this change applied: For example if your are using git from https://github.com/freebsd/freebsd

$ git checkout master
$ arc patch --diff 1746
$ arc patch --diff 1911
$ git diff origin/releng/10.0 HEAD -- sys/netinet/tcp_timewait.c
# Or even a word based diff
$ git diff --color-words origin/releng/10.0 HEAD -- sys/netinet/tcp_timewait.c

As now the timewait code structure is much closer to its state in releng/10.0, it gives another perspective on this change.

Just for your information, this patch just passed our FreeBSD QA test suite with success.

adrian edited edge metadata.
This revision is now accepted and ready to land.Oct 27 2014, 3:04 PM
jhb edited edge metadata.

Julien, you should be good to commit this with 'Reviewed by: adrian, jhb' and possibly 'rwatson' depending on how much review Robert provided. Before merging to 10, we should restore the padding fields that the original commit removed from tcptw so that the ABI of tcptw is restored.