divert: Add socket options for divert socket's send and receive buffers
Abandoned (Public)

Authored by neel_neelc.org on Feb 8 2020, 5:52 AM.

Details

Reviewers
lutz_donnerhacke.de
Group Reviewers
manpages
Summary

Add socket options for divert socket's send and receive buffers.

This introduces two new socket options, IP_DIVSENDBUF and IP_DIVRECVBUF, which respectively affect the divert socket's send and receive buffers. This allows an application author to modify these values as needed.

Submitted by: Neel Chauhan <neel at neelc.org>

Test Plan

We can test this with a C program:

unsigned long sndbuf = 8192, rcvbuf = 16384;
setsockopt(s, IPPROTO_DIVERT, IP_DIVSENDBUF, &sndbuf, sizeof(sndbuf));
setsockopt(s, IPPROTO_DIVERT, IP_DIVRECVBUF, &rcvbuf, sizeof(rcvbuf));

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

bcr added a subscriber: bcr.

Man page looks good, don't forget to bump the .Dd when you commit it.
Thank you for implementing this feature.

Please describe the use case for this change. Also, the defaults are not 65536 but (65536 + 100).

This can be used to reduce divert memory consumption on "small" devices (e.g. low power routers) or increase divert performance on "big" devices (e.g. middleboxes, IDS).

I updated the description.

Small, low-power routers do not need multiple divert sockets, and saving 64 KB does not seem essential, yet the change adds extra sysctls to ALL FreeBSD systems, and sysctls have their costs, too. Also, small, low-power routers are generally in desperate need of lower CPU overhead, so they should NOT use natd or similar divert-based pass-through daemons, because of the significant system call overhead that divert sockets add. Luckily, these days we have better alternatives such as "ipfw nat" or "ipfw netgraph", which process transit traffic in the kernel and eliminate the use of divert sockets in such extreme cases.

The same applies to "big" systems. Do you really have an example of such a mid-size or big system, with performance measurements comparing default and non-default socket buffer sizes for a divert-based userland application?
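
The per-packet cost mentioned above follows from the shape of a divert consumer's main loop: every diverted packet costs one receive and one send system call, each with a copy across the kernel boundary. A minimal sketch of one iteration is below. `relay_once` is a hypothetical helper name, not something taken from natd or from this revision, and it works on any datagram-style descriptor. For a divert socket, passing the address returned by recvfrom() back to sendto() is the usual way to reinject the packet, as described in divert(4).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: handle one packet on descriptor fd.
 * Returns the number of bytes relayed, or -1 on error.
 */
static ssize_t
relay_once(int fd, void *buf, size_t buflen)
{
	struct sockaddr_in from;
	socklen_t fromlen = sizeof(from);
	ssize_t n;

	/* System call 1: copy the packet from the kernel to userland. */
	n = recvfrom(fd, buf, buflen, 0, (struct sockaddr *)&from, &fromlen);
	if (n == -1)
		return (-1);

	/* A real daemon would inspect or rewrite the packet here. */

	/* System call 2: copy the packet back into the kernel. */
	if (sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&from,
	    fromlen) != n)
		return (-1);
	return (n);
}
```

Larger socket buffers only change how many packets can queue between the two calls; they do not reduce the number of system calls per packet.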

As a side note, performance is gained by collapsing

ipfw -q add 100 divert natd ip from any to any in via wan0
ipfw -q add 1000 divert natd ip from any to any out via wan0

to

ipfw -q add 100 divert natd ip from any to any via wan0

But I do not see any code that handles setting a new value afterwards, as your test plan suggests.
What is the expected outcome?

sys/netinet/ip_divert.c
573–575

Here is the only point where the values are used.

746

Due to the usage, a tunable_ro seems to be more appropriate.

divert(4) sockets can be used with more than just natd(8), so the changes look reasonable to me.

sys/netinet/ip_divert.c
746

The variables div_sendspace and div_recvspace are used when creating a socket, so changed values are applied to newly created sockets. This is a good feature: it lets you change these variables at runtime.

sys/netinet/ip_divert.c
746

After reading through sys/kern/uipc_socket.c, I understand that this is evaluated every time the userspace process handling the divert (natd in this example) is (re)started. So I'm fine with the CTLFLAG_RW.

In D23577#518798, @aleksandr.fedorov_itglobal.com wrote:

divert(4) sockets can be used with more than just natd(8), so the changes look reasonable to me.

divert sockets can be used with other software, but they present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I don't have a surefire example, but this could be used with, say, Squid as an HTTP-inspecting proxy.

I agree with you that divert sockets have a known overhead. And I don't have a suitable example, because I don't use them anywhere. On the other hand, I see no reason why the initial socket buffer size should be hardcoded. My experience with other types of sockets indicates that large buffers can increase throughput. For example, netgraph sockets.

So, if some people want to use divert sockets, why not add the ability to resize the socket buffer?

I do use divert to handle DHCPv6 packets coming in over ng* interfaces on an LNS. This is done in ipfw rules, because attaching a BPF filter to all of the thousand dynamically created and destroyed interfaces is not practical. The divert approach is therefore much more efficient. So much for a different use case.

On the more interesting performance question, I have to abstain: the DHCPv6 rate is so low that handling it with a simple Perl script is sufficient. Socket buffers are of no interest there.

Can you please explain what the issue is with the sysctl itself? It will help me understand the implications, which might be relevant for D23586, where memory might be traded for code complexity.

Sysctl tree bloat is not good, and there should be a real reason for adding another sysctl.

Can you please explain what the issue is with the sysctl itself?

Sysctls are great tools and very handy, so our sysctl tree grows quickly; it is already bloated and needs increasing amounts of memory. I don't think we should add a new one just because it's easy and we can, without any practical use case.

So how about using an (existing) socket option to modify the desired behavior from the application side?

Naturally, using setsockopt() for SO_SNDBUF/SO_RCVBUF.

In D23577#519163, @aleksandr.fedorov_itglobal.com wrote:

divert sockets can be used with other software, but they present exactly the same significant overhead.
Do you have an example where the suggested change really improves performance?

I agree with you that divert sockets have a known overhead. And I don't have a suitable example, because I don't use them anywhere. On the other hand, I see no reason why the initial socket buffer size should be hardcoded. My experience with other types of sockets indicates that large buffers can increase throughput. For example, netgraph sockets.

So, if some people want to use divert sockets, why not add the ability to resize the socket buffer?

Only the defaults are hardcoded. If an application can benefit from larger socket buffers, it is already free to call setsockopt() to increase them.

The current initialization ends in a call to chgsbsize (after checking limits). So a setsockopt approach is not only feasible, but also gives much finer granularity (per application) than a system-wide setting.
So please try this approach.
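
A per-application version of this could look like the sketch below. `set_sockbufs` is a hypothetical helper name; it relies only on the standard `SO_SNDBUF` and `SO_RCVBUF` options, which take `int` values. The kernel may clamp or adjust a request (on FreeBSD, kern.ipc.maxsockbuf bounds it), so the helper reads the effective sizes back instead of assuming the request was honored exactly.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request send/receive buffer sizes on socket s and
 * report the sizes the kernel actually applied. Returns 0 on success and
 * -1 on error (errno is set by the failing call).
 */
static int
set_sockbufs(int s, int sndbuf, int rcvbuf, int *snd_out, int *rcv_out)
{
	socklen_t len;

	/* Ask for the new sizes; either call may fail if a limit is hit. */
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf,
	    sizeof(sndbuf)) == -1 ||
	    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
	    sizeof(rcvbuf)) == -1)
		return (-1);

	/* Read back what the kernel settled on. */
	len = sizeof(*snd_out);
	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, snd_out, &len) == -1)
		return (-1);
	len = sizeof(*rcv_out);
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, rcv_out, &len) == -1)
		return (-1);
	return (0);
}
```

A divert daemon would call this right after creating its socket, for example `set_sockbufs(s, 8192, 16384, &snd, &rcv)` with the sizes from the test plan, and leave every other application on the system at the defaults.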

This revision now requires changes to proceed. Feb 15 2020, 2:14 PM
neel_neelc.org retitled this revision from divert: Add sysctls for divert socket send and receive buffers to divert: Add socket options divert socket send and receive buffers.
neel_neelc.org edited the summary of this revision. (Show Details)
neel_neelc.org edited the test plan for this revision. (Show Details)

Here, I switch to using socket options.

I stored the values in "struct socket" and use these values if they are set, otherwise use the default (which is used right now). Sorry if there's a better way to do this.

neel_neelc.org retitled this revision from divert: Add socket options divert socket send and receive buffers to divert: Add socket options for divert socket's send and receive buffers. Mar 16 2020, 1:01 AM
neel_neelc.org edited the summary of this revision. (Show Details)

I tried to use the already existing socket infrastructure to change the socket buffer values ...

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

int main(void) {
        int s;
        struct sockaddr_in bindPort;
        int size = 0;   /* SO_SNDBUF takes an int, not an unsigned long */
        socklen_t sizelen = sizeof(size);

        s = socket(AF_INET, SOCK_RAW, IPPROTO_DIVERT);
        if (s == -1) {
                perror("socket");
                return (1);
        }

        /* Read the default send buffer size. */
        if (0 != getsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, &sizelen)) {
                perror("getsockopt");
                return (1);
        }
        printf("Socket %d has a send buffer of %d bytes.\n", s, size);

        /* Request a smaller send buffer. */
        size = 9000;
        if (0 != setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size))) {
                perror("setsockopt");
                return (1);
        }

        /* Read the size back to confirm that the change took effect. */
        size = 0;
        sizelen = sizeof(size);
        if (0 != getsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, &sizelen)) {
                perror("getsockopt");
                return (1);
        }
        printf("Socket %d has a send buffer of %d bytes.\n", s, size);

        /* Bind the divert socket to port 12345 on any address. */
        memset(&bindPort, 0, sizeof(bindPort));
        bindPort.sin_family = AF_INET;
        bindPort.sin_port = htons(12345);
        bindPort.sin_addr.s_addr = htonl(INADDR_ANY);
        if (0 != bind(s, (struct sockaddr *)&bindPort, sizeof(bindPort))) {
                perror("bind");
                return (1);
        }

        return (0);
}

At first glance, it simply works.
Can you check that the functionality works as intended?

If yes, please document this method of changing the buffer size (as an example) in the man pages, and avoid additional code that replicates the behavior.

sys/netinet/in.h
501–503

That's not an IP property, but a socket property.
So can we reuse the existing sys/socket.h definitions?

#define SO_SNDBUF       0x1001          /* send buffer size */
#define SO_RCVBUF       0x1002          /* receive buffer size */

This revision now requires changes to proceed. Mar 16 2020, 9:40 AM