Add a limit on how long TCP data may sit at the head of the output queue waiting to be acknowledged by the remote side.
While data sits in the output queue, it occupies socket-buffer space. When the remote side holds a connection in persist for an extended period, or consumes data slowly, this data can back up in the socket buffers. Furthermore, the data may even become stale.
User-space processes can handle part of this themselves through their own idle timers and similar mechanisms. However, a user-space process does not have as much visibility into what is happening in the TCP stack as the kernel does. Additionally, once a user-space process has transmitted its data and closed the connection, it loses the ability to monitor the connection at all and must rely on the kernel to manage it.
This feature is not covered by the TCP specification, although it is hinted at by the "user timeout" described in RFC 793. It is also equivalent to what a user-space application could choose to do on its own through more expensive user-space monitoring. Finally, it is an important capability for managing the buffer space used on a server.
The user-space API changes are:
- New sysctl (net.inet.tcp.maxunacktime), which provides a default value for this feature.
- New socket option (TCP_MAXUNACKTIME), which lets an application set it on a per-socket basis. (If set on the listen socket, connections accepted through the listen socket will inherit the setting.)
The feature is disabled by default.
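For illustration only, a minimal user-space sketch of applying the per-socket limit with the new option. The helper name is hypothetical, error handling is abbreviated, and the unit of the value (seconds vs. milliseconds) is an assumption that should be checked against the header/man page on the target system:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <err.h>

/*
 * Hypothetical helper: apply TCP_MAXUNACKTIME to a socket.  A value of 0
 * leaves the feature disabled (the default).  Setting it on a listen
 * socket is inherited by connections accepted through that socket.  The
 * unit of "limit" is an assumption to verify on the target system.
 */
void
set_maxunacktime(int fd, int limit)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_MAXUNACKTIME,
	    &limit, sizeof(limit)) == -1)
		warn("setsockopt(TCP_MAXUNACKTIME)");
}

The system-wide default can likewise be adjusted through the sysctl, e.g. "sysctl net.inet.tcp.maxunacktime=<value>".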
The mechanism is fairly simple:
- Record the time we add new data to the socket buffer or transmit new data on an idle connection.
- Update that time when the remote peer's cumulative ACK advances by more than one byte. (This avoids counting ACKed persist probes.)
- When the persist or retransmit timer notices that the data at the head of the output queue has not been acknowledged within maxunacktime, it resets the connection.
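A rough C sketch of those three steps, with hypothetical names and greatly simplified state; the real tracking lives in the TCP control block and in the timer and input paths, not in standalone helpers like these:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Illustrative stand-in for the relevant control-block fields. */
struct tcpcb_sketch {
	uint32_t t_maxunacktime;	/* limit; 0 = feature disabled (unit illustrative) */
	time_t	 t_acktime;		/* when head-of-queue data started waiting; 0 = none */
	size_t	 sb_cc;			/* bytes still sitting in the output queue */
};

/* Step 1: stamp the time when new data is added to the socket buffer or
 * first transmitted on an otherwise idle connection. */
void
maxunack_stamp(struct tcpcb_sketch *tp)
{
	if (tp->t_acktime == 0)
		tp->t_acktime = time(NULL);
}

/* Step 2: refresh the stamp only when the peer's cumulative ACK advances
 * by more than one byte, so ACKed persist probes are not counted. */
void
maxunack_ack(struct tcpcb_sketch *tp, uint32_t bytes_acked)
{
	if (bytes_acked > 1)
		tp->t_acktime = (tp->sb_cc > 0) ? time(NULL) : 0;
}

/* Step 3: consulted from the persist and retransmit timers; returns true
 * when the head-of-queue data has gone unacknowledged longer than the
 * limit, in which case the caller resets the connection. */
bool
maxunack_expired(const struct tcpcb_sketch *tp)
{
	return (tp->t_maxunacktime != 0 && tp->t_acktime != 0 &&
	    time(NULL) - tp->t_acktime >= (time_t)tp->t_maxunacktime);
}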