
Add a limit on how long TCP data can live at the head of the output queue
Abandoned, Public

Authored by jtl on Apr 6 2018, 7:48 PM.
Tags
None

Details

Reviewers
rrs
gnn
Group Reviewers
transport
Summary

Add a limit on how long TCP data can live at the head of the output queue before it is acknowledged by the remote side.

While data lives in the output queue, it takes up buffer space. When the remote side keeps a connection in the persist state for an extended period (by advertising a zero window) or consumes data slowly, this data can back up in socket buffers. The data may even become stale.

User-space processes can handle part of this themselves through their own idle timers and similar mechanisms. However, a user-space process has less visibility into what is happening in the TCP stack. Additionally, once a user-space process has transmitted its data and closed the connection, it can no longer monitor the connection and must rely on the kernel to manage it.

This feature is not covered by the TCP specification, although RFC 793 hints at it with its "user timeout" option. It is also equivalent to what a user-space application could choose to do on its own through more expensive user-space monitoring. Finally, it is an important capability for managing buffer space on a server.

The user-space API changes are:

  • New sysctl (net.inet.tcp.maxunacktime), which provides a default value for this feature.
  • New socket option (TCP_MAXUNACKTIME), which lets an application set it on a per-socket basis. (If set on the listen socket, connections accepted through the listen socket will inherit the setting.)
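A minimal user-space sketch of the per-socket option, assuming it is set at the IPPROTO_TCP level with an int value like other TCP options. The review does not state the unit of the value, so the example treats it as opaque. The helper name set_max_unack_time is hypothetical, and the code is guarded with #ifdef so it still compiles on systems whose <netinet/tcp.h> does not define TCP_MAXUNACKTIME.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>

/*
 * Hypothetical helper: set the per-socket limit on how long unacknowledged
 * data may sit at the head of the send queue. A value of 0 is assumed to
 * mean "disabled", matching the stated default. Per the summary, setting
 * this on a listen socket makes accepted connections inherit it.
 */
int
set_max_unack_time(int fd, int limit)
{
#ifdef TCP_MAXUNACKTIME
	return (setsockopt(fd, IPPROTO_TCP, TCP_MAXUNACKTIME, &limit,
	    sizeof(limit)));
#else
	/* The option is not available in this system's headers. */
	(void)fd;
	(void)limit;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

On a server, calling this on the listening socket before listen() would, per the description above, apply the limit to every connection accepted through it.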

The feature defaults to being disabled.
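The default and inheritance rules can be modeled in a few lines. This is only a sketch: struct unack_cfg, default_maxunacktime, and the unack_* functions are hypothetical stand-ins rather than kernel fields, and 0 is taken to mean "disabled", as stated above.

```c
#include <stdbool.h>

/* Hypothetical system-wide default (the proposed sysctl); 0 = disabled. */
static unsigned int default_maxunacktime = 0;

/* Hypothetical per-connection configuration. */
struct unack_cfg {
	unsigned int maxunacktime;	/* 0 means the feature is off */
};

/* A new socket starts from the system-wide default. */
void
unack_init(struct unack_cfg *cfg)
{
	cfg->maxunacktime = default_maxunacktime;
}

/* A connection accepted on a listen socket inherits the listener's value. */
void
unack_inherit(struct unack_cfg *child, const struct unack_cfg *listener)
{
	child->maxunacktime = listener->maxunacktime;
}

/* The check only runs when a non-zero limit is configured. */
bool
unack_enabled(const struct unack_cfg *cfg)
{
	return (cfg->maxunacktime != 0);
}
```

If the sysctl is dropped, as suggested later in the review, default_maxunacktime simply becomes a constant 0 and only the socket option can enable the feature.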

The mechanism is fairly simple:

  • Record the time we add new data to the socket buffer or transmit new data on an idle connection.
  • Update that time when the remote peer's cumulative ACK advances by more than one byte. (This avoids counting ACKed persist probes.)
  • When the persist or retransmit timer notices that the data at the head of the output queue has not been acknowledged within maxunacktime, it resets the connection.
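The three steps above can be modeled as a small state machine. This is a simplified sketch, not the patch itself: struct conn and the conn_* functions are hypothetical, time is an abstract monotonic counter supplied by the caller, the socket buffer is reduced to a byte count, and a true result from the timer check stands in for resetting the connection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not the real struct tcpcb). */
struct conn {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t sb_bytes;	/* bytes in the send buffer (unsent + unacked) */
	uint64_t unack_start;	/* when head-of-queue data started waiting */
	uint64_t maxunacktime;	/* limit, same unit as "now"; 0 = off */
};

/* Step 1: queueing data on an idle connection starts the clock. */
void
conn_enqueue(struct conn *c, uint32_t len, uint64_t now)
{
	if (c->sb_bytes == 0)
		c->unack_start = now;
	c->sb_bytes += len;
}

/* Step 2: a cumulative ACK advancing more than one byte restarts it. */
void
conn_ack(struct conn *c, uint32_t ack, uint64_t now)
{
	int32_t acked = (int32_t)(ack - c->snd_una);	/* wrap-safe delta */

	if (acked <= 0)
		return;		/* duplicate or old ACK */
	c->snd_una = ack;
	c->sb_bytes -= ((uint32_t)acked < c->sb_bytes) ?
	    (uint32_t)acked : c->sb_bytes;
	/* A 1-byte advance may just be an ACKed persist probe: ignore it. */
	if (acked > 1)
		c->unack_start = now;
}

/* Step 3: called from the modeled persist/retransmit timer. */
bool
conn_unack_expired(const struct conn *c, uint64_t now)
{
	return (c->maxunacktime != 0 && c->sb_bytes > 0 &&
	    now - c->unack_start >= c->maxunacktime);
}
```

The int32_t cast on the ACK delta keeps the comparison correct across sequence-number wraparound, the same serial-number arithmetic used by the SEQ_* macros in the TCP stack.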
Test Plan

Tested with a script that holds open a connection with data in the output buffer. The code behaves as described.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 16025
Build 16002: arc lint + arc unit

Event Timeline

I'm not sure it is a good idea to have a sysctl for the default value. I would prefer the default value to always be 0 (not changeable by a sysctl) and to require the application to set a non-default value via the socket option. Why is there a need to change the default from "off" on a system-wide basis?

Will play with the extension in packetdrill over the weekend...


I don't have a particular use case. I assumed it might be easier, in some environments, to change the default rather than change the application code. However, I don't feel strongly that we need to support a non-0 system-wide default.

Inline comment on sys/netinet/tcp_var.h, line 127:

This field was already added in r333041, so the patch no longer applies cleanly.

In D14993#320689, @jtl wrote:


I don't have a particular use case. I assumed it might be easier, in some environments, to change the default rather than change the application code. However, I don't feel strongly that we need to support a non-0 system-wide default.

That works well if only a single application runs on the system. But a value that suits your application might not be appropriate for, say, the sshd used to access that system...

OK, I tested this patch and think it is OK to commit, provided the sysctl variable net.inet.tcp.maxunacktime is removed.

The tcp(4) man page needs to be updated.

Thanks.

Functionally committed in D36716