Restore multi threaded callouts in the TCP stack
AbandonedPublic
Actions

Authored by • hselasky on Jan 20 2015, 7:25 AM.

Details

Reviewers

lstewart
gnn
adrian
rwatson

Group Reviewers

network

Summary

Restore multi threaded callouts in the TCP stack. Use new callout_drain_async() API to avoid using callouts after freeing of the TCP timer context.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

• hselasky updated this revision to Diff 3263.Jan 20 2015, 7:25 AM

• hselasky retitled this revision from to Restore multi threaded callouts in the TCP stack.

• hselasky updated this object.

• hselasky edited the test plan for this revision. (Show Details)

• hselasky added reviewers: adrian, lstewart, rwatson, gnn.

• hselasky set the repository for this revision to rS FreeBSD src repository - subversion.

adrian added a reviewer: network.Jan 20 2015, 7:35 AM

emaste added a subscriber: emaste.Jan 20 2015, 2:10 PM

Don't like the magic values sprinked through out.
More commentary and analysis as to why these bugs are actually fixed and the workarounds can be removed is in order
since this is critical infrastructure code.
More information about why you are ditching the INFO lock locking in places is in order, either in the code, or in the commit message
The lack of this analysis makes me very uneasy as I (and others) can't check to make sure you haven't forgotten a case.
It would be nice to document also what regression testing was done in the absence of any formal TCP test suite in FreeBSD today.

sys/netinet/tcp_subr.c
911	I'm pretty sure that it isn't safe to drop this lock in this context. I don't think this is a viable solution.
936	Please don't use magic constants like this. It makes the code impossible to follow.
951	This is very hacky. Why was the assert before that this was locked insufficient?
993	why 6? I know I can count below, but some comment about this that's less lame than "set initial value" would be useful.
sys/netinet/tcp_timer.c
276	It would be good to explain in the commit message why these can't happen.
409	You should put back the assert.

rrs added a subscriber: rrs.Jan 22 2015, 5:28 AM

It would be sensible to ask Julien Charbon at Verisign to review this patch -- the code (and potential race conditions) here are remarkably subtle.

jch edited edge metadata.Apr 16 2015, 3:02 PM

jch added a subscriber: jch.

I believe D2079: Fix TCP timers use-after-free old race conditions fixed the same use-after-free race condition than this patch. The differences are:

It uses the old callout API only (no callout_drain_async())
It does not use inp_lock to protect callouts to avoid the INP_INFO_WLOCK/INP_WLOCK LOR management burden

That said:

I really like the new callout_drain_async() API. Once callout_drain_async() is adopted in -CURRENT, it will make sense to update rS281599 to use it
Using inp_lock to protect TCP timer callout is indeed a good idea, but I don't see how to implement it in a clean way as long a both INP_INFO and INP locks are required

Please drop this review as a new proposal will be required the in the wake of other changes.

gnn: What other changes are you referring to?

sys/netinet/tcp_subr.c
911	This is what the code is doing already inside one of the timer callbacks! Are you saying that current code is broken?
936	I'll define the three values used. No problem.
951	Because I allow this function being called not having V_tcpinfo locked initially, and in that case we should drop the lock before exiting again. I'll add a comment to clarify.
993	I'll add a comment to clarify what this value is based on.
sys/netinet/tcp_timer.c
276	They can't happen any more because callouts are cancelled atomically, because they are no longer so-called MPSAFE.
409	No problem.

BTW: Looks like my browser had cached a lot of comments :-)

A similar approach was implemented by r304218. Hopefully the "callout_init_rw()" for timers in the TCP stack will be implemented too. Abandoning for now.