Differential D23577

divert: Add socket options for divert socket's send and receive buffers
Abandoned · Public

Authored by nc on Feb 8 2020, 5:52 AM.

Details

Reviewers

donner

Group Reviewers

manpages

Summary

Add socket options for divert socket's send and receive buffers.

This introduces two new socket options, IP_DIVSENDBUF and IP_DIVRECVBUF, which set the divert socket's send and receive buffer sizes, respectively. This allows an application author to modify these values as needed.

Submitted by: Neel Chauhan <neel at neelc.org>

Test Plan

We can test this with a C program:

unsigned long sndbuf = 8192, rcvbuf = 16384;
setsockopt(s, IPPROTO_DIVERT, IP_DIVSENDBUF, &sndbuf, sizeof(sndbuf));
setsockopt(s, IPPROTO_DIVERT, IP_DIVRECVBUF, &rcvbuf, sizeof(rcvbuf));

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

nc created this revision. Feb 8 2020, 5:52 AM

Herald added a reviewer: manpages. Feb 8 2020, 5:52 AM

Herald added subscribers: Contributor Reviews (src), imp.

nc edited the test plan for this revision. Feb 8 2020, 5:54 AM

The man page looks good; don't forget to bump the .Dd when you commit it.
Thank you for implementing this feature.

nc added a subscriber: network. Feb 12 2020, 4:10 AM

Please describe the use case for this change. Also, the defaults are not 65536 but (65536 + 100).

nc edited the summary of this revision. Feb 12 2020, 5:27 AM

In D23577#518755, @eugen_grosbein.net wrote:

Please describe the use case for this change. Also, the defaults are not 65536 but (65536 + 100).

This can be used to reduce divert memory consumption on "small" devices (e.g. low power routers) or increase divert performance on "big" devices (e.g. middleboxes, IDS).

I updated the description.

In D23577#518767, @neel_neelc.org wrote:

In D23577#518755, @eugen_grosbein.net wrote:

Please describe the use case for this change. Also, the defaults are not 65536 but (65536 + 100).

This can be used to reduce divert memory consumption on "small" devices (e.g. low power routers) or increase divert performance on "big" devices (e.g. middleboxes, IDS).

I updated the description.

Small and low-power routers do not need multiple divert sockets, and saving 64 Kbytes does not seem essential, yet the change adds extra sysctls to ALL FreeBSD systems, and sysctls have their costs, too. Also, small and low-power routers generally are in desperate need of reducing CPU overhead, so they should NOT use natd or similar divert-based pass-through daemons because of the significant system call overhead that divert sockets add. Luckily, these days we have better alternatives like "ipfw nat" or "ipfw netgraph" that eliminate the use of divert sockets in such extreme cases by processing transit traffic in the kernel.

The same applies to "big" systems. Do you really have an example of such a medium or big system, with performance measurements comparing default and non-default socket buffer sizes for a divert-based userland application?

As a side note, performance is gained from collapsing

ipfw -q add 100 divert natd ip from any to any in via wan0
ipfw -q add 1000 divert natd ip from any to any out via wan0

into

ipfw -q add 100 divert natd ip from any to any via wan0

But I do not see any code that handles a change to a new value afterwards, as your test plan indicates.
What is the expected outcome?

sys/netinet/ip_divert.c
573–575	Here is the only point where the values are used.
747	Due to the usage, a tunable_ro seems to be more appropriate.

afedorov added a subscriber: afedorov. Feb 12 2020, 10:05 AM

divert(4) sockets can be used not only with natd(8), so the changes look reasonable to me.

afedorov added inline comments. Feb 12 2020, 10:47 AM

sys/netinet/ip_divert.c
747	The variables div_sendspace and div_recvspace are used when creating a socket. Because of this, the changed values are applied to the newly created socket. And this is a good feature that allows you to change these variables at runtime.

donner added inline comments. Feb 12 2020, 11:22 AM

sys/netinet/ip_divert.c
747	After reading through sys/kern/uipc_socket.c, I understand that this is evaluated every time the user space process handling the divert (natd in this example) is (re)started. So I'm fine with the CTLFLAG_RW.
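
The behavior described in the two inline comments above can be pictured with a small user-space model. This is only a sketch and not the kernel code: toy_socket, toy_attach and toy_sysctl_set are hypothetical names, and the only figures taken from this review are the variable names div_sendspace and div_recvspace and the default of (65536 + 100) bytes. It shows why a changed value affects only sockets created afterwards.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-ins for the kernel globals named in the review. The default
 * of 65536 + 100 bytes is the figure quoted in the discussion.
 */
static unsigned long div_sendspace = 65536 + 100;
static unsigned long div_recvspace = 65536 + 100;

/* Hypothetical, simplified socket: only the reserved buffer sizes. */
struct toy_socket {
        unsigned long snd_space;
        unsigned long rcv_space;
};

/* Model of socket creation: the globals are read once, here. */
static void
toy_attach(struct toy_socket *so)
{
        assert(so != NULL);
        so->snd_space = div_sendspace;
        so->rcv_space = div_recvspace;
}

/* Model of a sysctl write: changes the globals, touches no socket. */
static void
toy_sysctl_set(unsigned long sendspace, unsigned long recvspace)
{
        div_sendspace = sendspace;
        div_recvspace = recvspace;
}
```

In this model a socket attached before toy_sysctl_set() keeps its old sizes, while one attached afterwards picks up the new ones, which matches the "(re)started" observation above.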

In D23577#518798, @aleksandr.fedorov_itglobal.com wrote:

divert(4) sockets can be used not only with natd(8), so the changes look reasonable to me.

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

In D23577#518833, @eugen_grosbein.net wrote:

In D23577#518798, @aleksandr.fedorov_itglobal.com wrote:

divert(4) sockets can be used not only with natd(8), so the changes look reasonable to me.

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I don't have a surefire example, but this could be used with, say, Squid as an HTTP-inspecting proxy.

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I agree with you that divert sockets have a known overhead. And I don't have a suitable example, because I don't use them anywhere. On the other hand, I see no reason why the initial socket buffer size should be hardcoded. My experience with other types of sockets indicates that large buffers can increase throughput. For example, netgraph sockets.

So, if some people want to use divert sockets, why not add the ability to resize the socket buffer?

In D23577#518833, @eugen_grosbein.net wrote:

In D23577#518798, @aleksandr.fedorov_itglobal.com wrote:

divert(4) sockets can be used not only with natd(8), so the changes look reasonable to me.

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I do use divert to handle DHCPv6 packets coming over ng* interfaces on an LNS. This is done in ipfw rules, because the alternative would be attaching a bpf filter to each of the thousand dynamically coming and going interfaces. Therefore the divert approach is much more efficient. So much for a different use case.

On the more interesting performance question, I have to abstain: the DHCPv6 rate is so low that handling by a simple perl script is sufficient. Socket buffers are not of any interest there.

Can you please explain what the issue is with the sysctl itself? It will help me understand the implications, which might be relevant for D23586, where memory might be traded for code complexity.

In D23577#519163, @aleksandr.fedorov_itglobal.com wrote:

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I agree with you that divert sockets have a known overhead. And I don't have a suitable example, because I don't use them anywhere. On the other hand, I see no reason why the initial socket buffer size should be hardcoded. My experience with other types of sockets indicates that large buffers can increase throughput. For example, netgraph sockets.

So, if some people want to use divert sockets, why not add the ability to resize the socket buffer?

Sysctl tree bloat is not good, and there should be a real reason for adding another sysctl.

In D23577#519182, @lutz_donnerhacke.de wrote:

Can you please explain what the issue is with the sysctl itself?

Sysctls are great tools and very handy, so our sysctl tree grows quickly; it is already bloated and needs increasing amounts of memory. I don't think we should add a new one just because it's easy and we can do it, without any practical use case.

In D23577#520226, @eugen_grosbein.net wrote:

In D23577#519182, @lutz_donnerhacke.de wrote:

Can you please explain what the issue is with the sysctl itself?

Sysctls are great tools and very handy, so our sysctl tree grows quickly; it is already bloated and needs increasing amounts of memory. I don't think we should add a new one just because it's easy and we can do it, without any practical use case.

So how about implementing an (existing) socket option to modify the desired behavior from the application side?

In D23577#520228, @lutz_donnerhacke.de wrote:

In D23577#520226, @eugen_grosbein.net wrote:

In D23577#519182, @lutz_donnerhacke.de wrote:

Can you please explain what the issue is with the sysctl itself?

Sysctls are great tools and very handy, so our sysctl tree grows quickly; it is already bloated and needs increasing amounts of memory. I don't think we should add a new one just because it's easy and we can do it, without any practical use case.

So how about implementing an (existing) socket option to modify the desired behavior from the application side?

Naturally, by using setsockopt() for SO_SNDBUF/SO_RCVBUF.

In D23577#519163, @aleksandr.fedorov_itglobal.com wrote:

divert sockets can be used with other software but present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I agree with you that divert sockets have a known overhead. And I don't have a suitable example, because I don't use them anywhere. On the other hand, I see no reason why the initial socket buffer size should be hardcoded. My experience with other types of sockets indicates that large buffers can increase throughput. For example, netgraph sockets.

So, if some people want to use divert sockets, why not add the ability to resize the socket buffer?

Only the defaults are hardcoded. If an application can benefit from larger socket buffers, it is already free to call setsockopt() to increase them.

In D23577#520236, @eugen_grosbein.net wrote:

Naturally, by using setsockopt() for SO_SNDBUF/SO_RCVBUF.

The current initialization ends in a call to chgsbsize (after checking limits). So a setsockopt approach is not only feasible, but also gains much finer granularity (per application) than a system-wide setting.
So please try this approach.
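
To illustrate the per-application granularity mentioned above, here is a minimal sketch built only on the standard SO_SNDBUF/SO_RCVBUF options. The helper name resize_sockbuf is hypothetical and not part of this revision. It requests a new size for one buffer of one socket and returns whatever size the kernel reports afterwards, which may differ from the request because kernels can round or clamp the value.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request a new size for one buffer of one socket
 * and return the size the kernel reports afterwards, or -1 on error
 * (errno is set by the failing call). opt must be SO_SNDBUF or SO_RCVBUF.
 */
static int
resize_sockbuf(int s, int opt, int bytes)
{
        int actual = 0;
        socklen_t len = sizeof(actual);

        assert(opt == SO_SNDBUF || opt == SO_RCVBUF);
        if (setsockopt(s, SOL_SOCKET, opt, &bytes, sizeof(bytes)) != 0)
                return (-1);
        if (getsockopt(s, SOL_SOCKET, opt, &actual, &len) != 0)
                return (-1);
        return (actual);
}
```

An application would call this once per descriptor, for example resize_sockbuf(s, SO_RCVBUF, 16384) on a descriptor obtained from socket(AF_INET, SOCK_RAW, IPPROTO_DIVERT), so two divert consumers on the same host can choose different sizes without any system-wide knob.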

This revision now requires changes to proceed. Feb 15 2020, 2:14 PM

Here, I switch to using socket options.

I stored the values in "struct socket" and use these values if they are set, otherwise use the default (which is used right now). Sorry if there's a better way to do this.
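
The "use the stored value if set, otherwise the default" rule described above could look roughly like the following. This is a guess at the shape of the logic, not the code from the diff: the struct, its field names, and the convention that 0 means "not set" are all assumptions made for illustration. Only the default of (65536 + 100) comes from this review.

```c
#include <assert.h>
#include <stddef.h>

/* Default quoted in the review discussion. */
#define TOY_DIV_DEFAULT (65536UL + 100UL)

/* Hypothetical per-socket storage; 0 is assumed to mean "not set". */
struct toy_divert_opts {
        unsigned long sendbuf;
        unsigned long recvbuf;
};

/* Return the per-socket value when one was set, else the default. */
static unsigned long
toy_pick(unsigned long requested)
{
        return (requested != 0 ? requested : TOY_DIV_DEFAULT);
}

/* Effective send buffer size for a socket under this model. */
static unsigned long
toy_effective_sendbuf(const struct toy_divert_opts *opts)
{
        assert(opts != NULL);
        return (toy_pick(opts->sendbuf));
}
```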

Herald added a subscriber: melifaro. Feb 16 2020, 5:29 AM

nc retitled this revision from divert: Add socket options divert socket send and receive buffers to divert: Add socket options for divert socket's send and receive buffers. Mar 16 2020, 1:01 AM

nc edited the summary of this revision.

I tried to use the already existing socket infrastructure to change the socket buffer values ...

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

int main(void) {
        int s;
        struct sockaddr_in bindPort;
        int size = 0;   /* SO_SNDBUF takes an int, not an unsigned long */
        socklen_t sizelen = sizeof(size);

        /* A divert socket is a raw socket with protocol IPPROTO_DIVERT. */
        s = socket(AF_INET, SOCK_RAW, IPPROTO_DIVERT);
        if (s == -1) {
                perror("socket");
                return (1);
        }

        /* Report the default send buffer size. */
        if (0 != getsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, &sizelen)) {
                perror("getsockopt");
                return (1);
        }
        printf("Socket %d has a sendbuffer of %d bytes.\n", s, size);

        /* Request a different send buffer size. */
        size = 9000;
        if (0 != setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size))) {
                perror("setsockopt");
                return (1);
        }

        /* Read the value back to see whether the request took effect. */
        size = 0;
        sizelen = sizeof(size);
        if (0 != getsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, &sizelen)) {
                perror("getsockopt");
                return (1);
        }
        printf("Socket %d has a sendbuffer of %d bytes.\n", s, size);

        /* Bind to divert port 12345 to check the socket is still usable. */
        memset(&bindPort, 0, sizeof(bindPort));
        bindPort.sin_family = AF_INET;
        bindPort.sin_port = htons(12345);
        bindPort.sin_addr.s_addr = INADDR_ANY;
        if (0 != bind(s, (struct sockaddr *)&bindPort, sizeof(bindPort))) {
                perror("bind");
                return (1);
        }

        return (0);
}

At first glance, it simply works.
Can you check that the functionality works as intended?

If yes, please document this method of changing the buffer size (as an example) in the man page and avoid additional code that replicates the behavior.

sys/netinet/in.h
501–503	That's not an IP property, but a socket property. So can we reuse the existing sys/socket.h definitions?
#define SO_SNDBUF 0x1001 /* send buffer size */
#define SO_RCVBUF 0x1002 /* receive buffer size */