The TCP timer code as been fraught with churn in the effort to fix its
racy use of timers. We have moved to ASYNC drain which cleans up a lot
of this but still have some hanging flags and such around that should be cleaned
up. Also the use of two locks at once in the timer code is problematic since the
timer code only really assures that one lock can be dealt with properly in the
drain functions. Therefore we will move to a "switch" locks type method so that
a future patch can get rid of the async drain all the way and move the lock
under the callout system (where it belongs).
Details
- Reviewers
- gallatin - hiren - jch 
- Group Reviewers
- transport 
- Commits
- rS304218: This cleans up the timer code in TCP and also makes it so we do not
Beat the heck out of it on a NF workload and hopefully get Versign to give it a go as well.
Note I do not intend for this to go into 11 but to wait until after head forks at some point unless
I hear a lot of pushback that we need it sooner
Diff Detail
- Lint
- Lint Skipped 
- Unit
- Tests Skipped 
Event Timeline
In addition to helping the callout correctness, this seems to improve lock contention on tcbinfo, since this lock is now taken only when needed, not when entering every routine (thereby blocking writers, or blocking when a writer has the lock).
I've been running this over a day with ~100K connections at ~80Gb/s at Netflix (in a FreeBSD-11 context) with no issues. I've also tested it on a WITNESS + INVARIANTS kernel, and saw no LORs and no kassert failures (at 5Gb/s :)
Fix it so we don't leak inp's.. the tcp_close() can return NULL in tp, this
would mean we would not do the reference count release. Instead use
the inp.
Also change the name to reflect what is actually being done (advice from Michael Tuexen :-D)
Turns out when tp is returned NULL by the drop/close that means
it also nicely released the INP_WLOCK. We need to re-acqurire the lock
in that case so we can drop our inp reference.