This is my version of a fix for the problem described in D12914. The
original text from Jonathan is copied below:
When local TCP connections transition to TIME_WAIT, we (by default) block those connections from actually entering the TIME_WAIT state. (This optimization is controlled by a sysctl.) This optimization makes sense because there is little-to-no chance that local packets are lost or reordered. However, this optimization relies on one critical assumption: that both ends of the connection will actually shut down correctly.
This assumption is violated when the optimization is enabled. When a local TCP session tries to transition from FINWAIT-2 to TIME_WAIT with the optimization enabled, the kernel does not actually send an ACK for the final FIN before closing the TCP connection. As a result, the other side must retransmit its FIN. As long as the blackhole option is not enabled, the kernel then responds with a RST because it no longer has a matching session. In the worst case, if the blackhole option is enabled, the remote side must retransmit its FIN many times before finally timing out the session.
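The effect on the remote side can be illustrated with a small, hypothetical C model. It assumes the peer has sent its FIN and is waiting in LAST_ACK, and that a retransmitted FIN reaching the local host either draws a RST or, with the blackhole option on, is silently dropped. The names `simulate_peer_close()`, `struct close_result`, and the retransmission limit `max_rexmt` are made up for illustration; the model ignores timers and treats blackhole as a simple on/off switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the peer's LAST_ACK state ends. */
enum peer_outcome {
    PEER_CLOSED_BY_ACK, /* its FIN was ACKed: clean close */
    PEER_CLOSED_BY_RST, /* a retransmitted FIN drew a RST */
    PEER_TIMED_OUT      /* every FIN was dropped; the peer gave up */
};

struct close_result {
    enum peer_outcome outcome;
    int fins_sent;      /* original FIN plus retransmissions */
};

/*
 * final_ack_sent: the local side ACKed the FIN before discarding its session.
 * blackhole:      the local side silently drops segments for sessions it no
 *                 longer has (modeled as a simple on/off switch).
 * max_rexmt:      assumed number of FIN retransmissions before the peer
 *                 times out.
 */
static struct close_result
simulate_peer_close(bool final_ack_sent, bool blackhole, int max_rexmt)
{
    assert(max_rexmt >= 0);

    if (final_ack_sent)
        return (struct close_result){ PEER_CLOSED_BY_ACK, 1 };

    /* No ACK arrived and the local session is gone, so each retransmitted
     * FIN reaches a host with no matching session. */
    if (!blackhole && max_rexmt > 0)
        return (struct close_result){ PEER_CLOSED_BY_RST, 2 };

    /* Blackhole on: every retransmission is dropped until the peer quits. */
    return (struct close_result){ PEER_TIMED_OUT, 1 + max_rexmt };
}
```

With an assumed limit of 12 retransmissions, for example, `simulate_peer_close(false, true, 12)` reports 13 FIN segments before the peer gives up, against a single FIN when the final ACK is sent.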
The solution is fairly simple: actually send an ACK for the final FIN before we close the session.
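The ordering implied by the fix can be sketched as a toy state machine for the local side. This is a hypothetical model, not the code in tcp_timewait.c: `struct conn`, `send_ack()`, and `handle_fin_in_fin_wait_2()` are invented names, and the `nolocaltimewait` flag merely stands in for the sysctl (net.inet.tcp.nolocaltimewait in FreeBSD). The point is only that the ACK goes out before the local-connection shortcut discards the session.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are not the FreeBSD data structures. */
enum tcp_state { FIN_WAIT_2, TIME_WAIT, CLOSED };

struct conn {
    enum tcp_state state;
    bool is_local;  /* both endpoints live on this host */
    int acks_sent;  /* ACK segments "sent" so far */
};

/* Hypothetical stand-in for the sysctl; enabled here. */
static bool nolocaltimewait = true;

/* Hypothetical output routine: just records that an ACK was emitted. */
static void
send_ack(struct conn *c)
{
    c->acks_sent++;
}

/* Process the peer's FIN in FIN_WAIT_2; returns the updated connection. */
static struct conn
handle_fin_in_fin_wait_2(struct conn c)
{
    assert(c.state == FIN_WAIT_2);

    /* The fix: ACK the peer's FIN first, so the peer can leave
     * LAST_ACK without retransmitting. */
    send_ack(&c);

    if (nolocaltimewait && c.is_local)
        c.state = CLOSED;     /* shortcut: skip TIME_WAIT */
    else
        c.state = TIME_WAIT;  /* normal path */
    return c;
}
```

Deleting the `send_ack()` call reproduces the original problem: a local connection jumps straight to CLOSED and the peer never receives an ACK for its FIN.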
As a side effect, this solution also (partially) fixes something noted in a comment in tcp_timewait.c: the timewait ACK should include timestamps. With this change, the first ACK includes timestamps. (Subsequent ones still do not.)
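For reference, RFC 7323 defines the TCP timestamp option as kind 8, length 10, followed by a 32-bit TSval and a 32-bit TSecr, and recommends preceding it with two NOP bytes so the option block stays 4-byte aligned. The C sketch below serializes that 12-byte layout; `tcp_build_ts_option()` and `struct ts_option` are hypothetical helpers for illustration, not the kernel's option-building code.

```c
#include <assert.h>
#include <stdint.h>

#define TCPOPT_NOP        1
#define TCPOPT_TIMESTAMP  8
#define TCPOLEN_TIMESTAMP 10  /* kind + length + TSval + TSecr */

/* <NOP, NOP, TSopt>: 12 bytes, a multiple of 4. */
struct ts_option {
    uint8_t bytes[12];
};
static_assert(sizeof(struct ts_option) == 12, "NOP, NOP, kind, len, TSval, TSecr");

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: build the padded timestamp option. */
static struct ts_option
tcp_build_ts_option(uint32_t tsval, uint32_t tsecr)
{
    struct ts_option o;

    o.bytes[0] = TCPOPT_NOP;
    o.bytes[1] = TCPOPT_NOP;
    o.bytes[2] = TCPOPT_TIMESTAMP;
    o.bytes[3] = TCPOLEN_TIMESTAMP;
    put_be32(&o.bytes[4], tsval);  /* sender's current timestamp clock */
    put_be32(&o.bytes[8], tsecr);  /* echo of the peer's recent TSval */
    return o;
}
```

The two NOPs carry no information; they only pad the option to a 32-bit boundary.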