
unix/dgram: bump maximum datagram size limit to 8k
Closed, Public

Authored by glebius on Nov 29 2023, 4:03 AM.
Details

Summary

This is important for wpa_supplicant operation on a crowded network.

Note: we actually need an API to increase the maximum datagram size on a
socket. Previously SO_SNDBUF magically acted like that, but that was
an undocumented "feature".

Also move the comment to the proper line. Previously it was the receive
buffer that imposed the limit; now the notions of buffer size and maximum
datagram size are separate.

PR: 274990
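
For illustration, a minimal sketch of that old undocumented "feature" as an application would have used it (the helper name and the 64k value are made up for this example):

#include <sys/socket.h>

/*
 * On an AF_UNIX/SOCK_DGRAM socket, growing SO_SNDBUF used to raise the
 * effective maximum datagram size, because the limit of the (always
 * empty) send buffer was the only size check applied on write.
 */
int
raise_dgram_limit_old(int fd)
{
	int size = 65536;	/* arbitrary example value */

	return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)));
}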

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'll repeat my comment from D42558 here.

Yes, but it worked for almost 40 years, and it does on every other UNIX/Linux too. [..] The limits put into that file in the last millennium have not been adequate for the last 20 years (despite never having been enforced anyway).

Also the changes broke the previous syslogd fixes; this is not just wpa.

There are (commercial) libraries out there which, based on some postings, push MBs over these sockets.

We do not need an API; we had one, and it was and is the socket API. AF_UNIX sockets are sockets. Socket options are supposed to work on them.

In D42830#976836, @bz wrote:

> Yes, but it worked for almost 40 years, and it does on every other UNIX/Linux too. [..] The limits put into that file in the last millennium have not been adequate for the last 20 years (despite never having been enforced anyway).
>
> Also the changes broke the previous syslogd fixes; this is not just wpa.

Linux doesn't have a maximum datagram size limit at all. On Linux, SO_SNDBUF really controls the socket buffer size, just like it does on FreeBSD 14. Btw, their default buffer size is huge: over 200k. Linux also has a limit on how many datagrams are queued, which makes some sense.

I'm fine with bumping the maximum datagram size sysctl all the way up above 8k. I don't see the point of that limit at all. A single datagram of 100k and ten datagrams of 10k consume the same amount of mbufs; they are just linked in a different manner. The limit comes all the way from ancient times, so there is no explanation of why it could be useful:

https://github.com/sergev/4.4BSD-Lite2/blob/master/usr/src/sys/kern/uipc_usrreq.c#L312
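
(For scale, assuming the standard 2k mbuf clusters: a single 100k datagram sits in roughly fifty clusters chained via m_next, while ten 10k datagrams take roughly five clusters each, fifty in total, with the individual datagrams linked into the queue via m_nextpkt.)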

> There are (commercial) libraries out there which, based on some postings, push MBs over these sockets.

> We do not need an API; we had one, and it was and is the socket API. AF_UNIX sockets are sockets. Socket options are supposed to work on them.

And they work. SO_SNDBUF sets the size of the send buffer now.

> I'm fine with bumping the maximum datagram size sysctl all the way up above 8k. I don't see the point of that limit at all. A single datagram of 100k and ten datagrams of 10k consume the same amount of mbufs; they are just linked in a different manner. [...]

Question: Are send()/recv() atomic for these datagram sockets?

> Linux doesn't have a maximum datagram size limit at all. On Linux, SO_SNDBUF really controls the socket buffer size, just like it does on FreeBSD 14. Btw, their default buffer size is huge: over 200k. Linux also has a limit on how many datagrams are queued, which makes some sense.

[200k] that's what, 3 max-sized IP packets?
[datagram count limit] Yes, as otherwise, if you have no consumer, you can probably exhaust memory very quickly?

> I'm fine with bumping the maximum datagram size sysctl all the way up above 8k. I don't see the point of that limit at all.

That's why I do not understand why we started enforcing it as we did.

I think if we "bump" it to anything we may want to consider kern.ipc.maxsockbuf - overhead?

If I understood the old implementation (before your changes) correctly, the value passed to soreserve() was (despite its name) a hint, with the defaults to start with.
And the actual maximum enforced was based on the socket buffer, and with that kern.ipc.maxsockbuf (hence SO_SNDBUF worked).
Given this works for IP sockets, why should we not also go back and do the same for AF_UNIX?
And then have the additional check that the receiver needs to be able to take the data (and have that, what, 20-packet limit on that)? And then it makes a difference if it is 100 1k packets or 1 100k packet.

> A single datagram of 100k and ten datagrams of 10k consume the same amount of mbufs; they are just linked in a different manner. The limit comes all the way from ancient times, so there is no explanation of why it could be useful:
>
> https://github.com/sergev/4.4BSD-Lite2/blob/master/usr/src/sys/kern/uipc_usrreq.c#L312

[On history, see also my last email on that subject to you (and others) from 13 Nov 2023.]

Though I have not checked the code (please fill me in on today's implementation): do we have to reserve the mbufs upfront?

> And they work. SO_SNDBUF sets the size of the send buffer now.

True, but it doesn't help anymore ;-)

TL;DR: if you're fine with the proposed bump, please approve the revision. Or suggest a larger value to bump to.

In D42830#976982, @bz wrote:

>> I'm fine with bumping the maximum datagram size sysctl all the way up above 8k. I don't see the point of that limit at all.
>
> That's why I do not understand why we started enforcing it as we did.

We didn't start to enforce it differently. What has changed is that SO_SNDBUF now sets the socket buffer length, not the maximum datagram size. Details below, under your other question.

> I think if we "bump" it to anything we may want to consider kern.ipc.maxsockbuf - overhead?

The limit on the socket buffer is in action too, of course. We can bump maxdgram safely.

> If I understood the old implementation (before your changes) correctly, the value passed to soreserve() was (despite its name) a hint, with the defaults to start with.
> And the actual maximum enforced was based on the socket buffer, and with that kern.ipc.maxsockbuf (hence SO_SNDBUF worked).
> Given this works for IP sockets, why should we not also go back and do the same for AF_UNIX?

Nope, it was very different from IP sockets. In the original implementation the send buffer did not exist; the write()s went directly into the receive buffer of the other socket. HOWEVER, the datagram was checked against the socket buffer size limit of the sending socket. Since that buffer is always empty, the check effectively becomes a maximum datagram size check.

This was done on socket initialization:

sbreserve(so->so_snd, net.local.dgram.maxdgram);   /* send side: 2k */
sbreserve(so->so_rcv, net.local.dgram.recvspace);  /* receive side: 16k */

This was done on write:

if (dgram + so->so_snd.sb_cc > so->so_snd.sb_mbmax)  /* sb_cc is always zero! */
        return EMSGSIZE;          /* so this is effectively a pure maxdgram check */
so2 = so->unpcb->peer;
sbappend(dgram, so2->so_rcv);     /* internally does the same check as above */

Now an individual send buffer actually exists for connected AF_UNIX sockets; please check out the recent commit logs and unix(4) for details. A side effect is that setsockopt(SO_SNDBUF) now does what it is supposed to do, not the undocumented behavior that existed before. And on Linux it works the same: there is just no maxdgram limit at all, AFAIU. We should do the same unless anybody has a good explanation for such a limit.
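
To make the behavior change concrete, an illustrative userland sketch (not part of this change; it assumes the 8k maxdgram default proposed here, with the oversized send failing with EMSGSIZE per the semantics described above):

#include <sys/socket.h>
#include <err.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	int fds[2], sndbuf = 65536;
	socklen_t len = sizeof(sndbuf);
	char big[16384];	/* larger than the proposed 8k maxdgram */

	if (socketpair(PF_LOCAL, SOCK_DGRAM, 0, fds) == -1)
		err(1, "socketpair");

	/* SO_SNDBUF now sizes the send buffer, as documented... */
	if (setsockopt(fds[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) == -1)
		err(1, "setsockopt");
	if (getsockopt(fds[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
		printf("send buffer: %d\n", sndbuf);

	/*
	 * ...but it no longer raises the per-datagram limit: this send
	 * is expected to fail with EMSGSIZE.
	 */
	memset(big, 0, sizeof(big));
	if (send(fds[0], big, sizeof(big), 0) == -1)
		warn("send of %zu bytes", sizeof(big));

	return (0);
}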

> There is just no maxdgram limit at all, AFAIU. We should do the same unless anybody has a good explanation for such a limit.

Go for it.

Do we still make sure we cannot just buffer and buffer into the receiver? I.e., is there still a limit, and can we guarantee that the packet either gets there in full or not at all?

This revision is now accepted and ready to land. Dec 1 2023, 7:22 PM

> There is just no maxdgram limit at all, AFAIU. We should do the same unless anybody has a good explanation for such a limit.

I think there would be multiple denial-of-service situations if there were no datagram max size, but I haven't looked at it closely. It would depend on such things as packet limits on receive buffers.

About SO_SNDBUF: has anyone done a survey of usage in the tree? There are a lot of references.

Unless there is a use case for datagrams over 8K, I think this change is fine.