Add optional TCP logging on sonewconn failures.
Needs ReviewPublic

Authored by mjoras on Jul 25 2017, 1:24 AM.

Details

Reviewers
rstone
glebius
Group Reviewers
network
transport
Summary

The existing "Listen queue overflow" log message is generated from
sonewconn. It logs the size of the listen queue and the so_pcb for which
the queue overflowed. For a TCP socket this is a pointer to the in_pcb.
There's currently no easy way in userspace to correlate this in_pcb
pointer to the process that created it, since netstat(1)/fstat(1) use
the tcpcb. net.inet.tcp.log_debug will cause a log message in this
situation, but it also logs other things that may be undesirable for
production systems. This adds a sysctl to optionally enable logging of
the involved TCP addresses in such error cases, as well as a ratelimit
for the logging.

Test Plan

Example output using backlog of 1:

Jul 24 17:29:52  kernel: sonewconn: pcb 0xfffff800254b4d40: Listen queue overflow: 2 already in queue awaiting acceptance (3 occurrences)
Jul 24 17:29:52  kernel: TCP: [10.7.223.246]:55613 to [10.7.223.201]:3333; syncache_socket: Socket create failed due to limits or memory shortage

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 10656
Build 11060: arc lint + arc unit
mjoras created this revision.Jul 25 2017, 1:24 AM
mjoras edited the test plan for this revision. (Show Details)Jul 25 2017, 1:27 AM
mjoras removed a reviewer: transport.
mjoras set the repository for this revision to rS FreeBSD src repository.

Have to say, I cannot imagine any circumstances in which the message that's on by default would be useful when the message that's off by default isn't printed. The PCB address is useless debugging information here, whereas the 4-tuple of the dropped connection is actually meaningful -- so I'd reverse them (or better yet, figure out a way to combine the two messages while hiding the PCB address under a debug flag).

Have to say, I cannot imagine any circumstances in which the message that's on by default would be useful when the message that's off by default isn't printed. The PCB address is useless debugging information here, whereas the 4-tuple of the dropped connection is actually meaningful -- so I'd reverse them (or better yet, figure out a way to combine the two messages while hiding the PCB address under a debug flag).

Yes, you're touching on the underlying awkwardness here which is that the original overflow logging happens in sonewconn, which is at the generic socket layer. I toyed with the idea of simply changing the message at the generic socket layer, but everything I came up with either had gross layering violations (e.g. having a socket layer function with knowlege of all the potential so_pcb types) or got fairly complicated with arguably poor abstractions (e.g. adding a protosw call for something like char *pr_getsockdescr). I'm definitely open to suggestions here though.

Yes, you're touching on the underlying awkwardness here which is that the original overflow logging happens in sonewconn, which is at the generic socket layer. I toyed with the idea of simply changing the message at the generic socket layer, but everything I came up with either had gross layering violations (e.g. having a socket layer function with knowlege of all the potential so_pcb types) or got fairly complicated with arguably poor abstractions (e.g. adding a protosw call for something like char *pr_getsockdescr). I'm definitely open to suggestions here though.

I wouldn't necessarily agree that that's a poor abstraction. The other alternative would be for sonewconn(9) to pass more information back to the caller on failure, and expect the caller to print a meaningful message -- and I'd want to see some evidence that that doesn't make the common case (no failure) any slower. That would also require every caller to implement its own rate-limiting.

I wouldn't necessarily agree that that's a poor abstraction. The other alternative would be for sonewconn(9) to pass more information back to the caller on failure, and expect the caller to print a meaningful message -- and I'd want to see some evidence that that doesn't make the common case (no failure) any slower. That would also require every caller to implement its own rate-limiting.

I should say that I personally don't think it's a bad idea to have an abstraction for getting a human-readable representation of a socket, but I figured that change might cause a bit more controversy since there's no precedent for it today,

The ideal for me would be a message like this:

"%s: pcb %p (%s): Listen queue overflow: %i already in queue awaiting acceptance (%d occurrences)\n"
sonewconn: pcb 0xfffff800254b4d40 ([10.7.223.246]:55613 to [10.7.223.201]:3333): Listen queue overflow: 2 already in queue awaiting acceptance (3 occurrences)

This would keep us mostly consistent with the current log message while adding meaningful information. We _could_ make the callers of sonewconn responsible for producing those exact messages by changing the sonewconn interface to return an errno corresponding to the type of error, but I'm not sure that's a better solution than having multiple log messages (as this revision does) or having a protocol-level function for getting a description of a socket.

ae added a subscriber: ae.Aug 3 2017, 6:50 PM

I proposed this patch for the discussed problem:
https://lists.freebsd.org/pipermail/freebsd-net/2016-December/046650.html

But glebius@ said that he will have better solution after "listening sockets revamp".

mjoras added a comment.Aug 3 2017, 6:57 PM
In D11725#245735, @ae wrote:

I proposed this patch for the discussed problem:
https://lists.freebsd.org/pipermail/freebsd-net/2016-December/046650.html

But glebius@ said that he will have better solution after "listening sockets revamp".

Interesting. I thought to do something similar as a hack, but figured it would get shot down as the socket layer generally eschews protocol specific information (this would be the first and only AF_* usage in uipc_socket.c). I'd personally prefer that we didn't have the socket layer knowing about the protocol families.

I wonder what the status of the better solution is since the listening socket revamp has gone in. There is some extra room in listening sockets relative to data sockets now, so perhaps we could use that for something.

dch added a subscriber: dch.Aug 6 2017, 6:31 PM

@mjoras +1 for your comments around improving the existing log format rather than adding additional lines

jch added a subscriber: jch.Aug 6 2017, 6:54 PM
ae added a comment.Feb 8 2018, 11:52 AM

It would be good have some progress with this review.

mjoras added a comment.Feb 8 2018, 6:11 PM
In D11725#298967, @ae wrote:

It would be good have some progress with this review.

I agree. glebius said that a good approach would be to use the now-available space in the listening socket to store some useful information (e.g. a process) that can be used by this logging. I haven't had the time to look at that very closely but it seems like a reasonable approach.

It would be really nice to see this fixed sooner rather than later. I can't tell from the phab UI who's put a block on this, but could we maybe move forward a bit? The current messages are both annoying and unhelpful.

rgrimes added a reviewer: glebius. rgrimes removed 1 blocking reviewer(s): transport.May 9 2018, 2:24 AM
rgrimes added a subscriber: rgrimes.

I have changed the transport reviewer to non-blocking, and added glebius as a reviewer since he seems to have some stake in this. Hopefully this can help it move forward.

This wasn't not making progress due to the blocking review, it was not making progress since the current diff probably can't land as is. It seems everyone is in favor of the listening socket stashing approach, but I just haven't looked into it yet.

I don't think it will be hard in principle. I'll try to take a closer look.