Details

Reviewers

melifaro
philip
kbowling

Group Reviewers

network
manpages
transport

Commits

rG664077e69e8f: socket: Implement SO_RERROR
rGf45271340836: socket: Implement SO_RERROR
rG7045b1603bdf: socket: Implement SO_RERROR

Summary

SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really important for programs that use route(4) to keep in sync
with the system. If we lose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
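
As an illustration of the opt-in behaviour described in the summary, here is a minimal sketch of how a program might request overflow reporting. The helper name `enable_rerror` is hypothetical and not part of the patch. The sketch assumes SO_RERROR is a boolean SOL_SOCKET option, as the summary implies, and falls back to an error on systems whose headers do not define it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of the patch): ask the kernel to report
 * receive buffer overflows on fd as errors.
 * Returns 0 on success, or -1 with errno set. When the system headers do
 * not define SO_RERROR, it fails with ENOPROTOOPT.
 */
int
enable_rerror(int fd)
{
	assert(fd >= 0);
#ifdef SO_RERROR
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_RERROR, &on, sizeof(on));
#else
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A caller that gets 0 back would then treat ENOBUFS from read(2) on that socket as "messages were lost" and reload its state. A caller that gets -1 keeps the historical behaviour, where overflows are silent.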

Test Plan

Download and compile this reproducer: https://xenity.marples.name/~roy/overflow.c

If it cannot detect overflow then your system suffers from it.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

roy_marples.name created this revision. Oct 3 2020, 6:53 PM

Herald added a reviewer: transport. Oct 3 2020, 6:53 PM

Herald added subscribers: melifaro, donner, bz and 2 others.

roy_marples.name requested review of this revision. Oct 3 2020, 6:53 PM

what other OSes share this API now?

In D26652#593846, @emaste wrote:

what other OSes share this API now?

I originally implemented on NetBSD and ported it to DragonFly BSD.
OpenBSD has an API which dhcpcd also uses which is specific to the route(4) API where it sends a RTM_DESYNC message.
The irony being it has to send a message on a socket which has already overflowed.

This API covers all sockets and is not limited to route(4) and thus has greater application outside of dhcpcd.
For example, a syslogd implementation could use it to notify the admin that potentially important messages have been lost - i.e. from over the network.

Linux has an equivalent socket option like this, but it only applies to their equivalent of route(4).

In D26652#593848, @roy_marples.name wrote:

In D26652#593846, @emaste wrote:

what other OSes share this API now?

I originally implemented on NetBSD and ported it to DragonFly BSD.
OpenBSD has an API which dhcpcd also uses which is specific to the route(4) API where it sends a RTM_DESYNC message.
The irony being it has to send a message on a socket which has already overflowed.

Maybe one can force that such a message (only one) is appended to the socket buffer even if it is full.

This API covers all sockets and is not limited to route(4) and thus has greater application outside of dhcpcd.

Well, I'm not sure how useful it is for, let's say, UDP-based communication. UDP is unreliable. No application can assume that it sees all messages. A receive buffer overflow is only one reason why a message cannot be delivered. So why should I care as an application writer?

For example, a syslogd implementation could use it to notify the admin that potentially important messages have been lost - i.e. from over the network.

How would you know that a packet is dropped in the network?

Linux has an equivalent socket option like this, but it only applies to their equivalent of route(4).

I think it makes sense to limit the scope...

In D26652#593850, @tuexen wrote:

In D26652#593848, @roy_marples.name wrote:

In D26652#593846, @emaste wrote:

what other OSes share this API now?

I originally implemented on NetBSD and ported it to DragonFly BSD.
OpenBSD has an API which dhcpcd also uses which is specific to the route(4) API where it sends a RTM_DESYNC message.
The irony being it has to send a message on a socket which has already overflowed.

Maybe one can force that such a message (only one) is appended to the socket buffer even if it is full.

I believe they have a system where if the message doesn't fit they flush some or all of the buffer to ensure it does.

This API covers all sockets and is not limited to route(4) and thus has greater application outside of dhcpcd.

Well, I'm not sure how useful it is for, let's say, UDP-based communication. UDP is unreliable. No application can assume that it sees all messages. A receive buffer overflow is only one reason why a message cannot be delivered. So why should I care as an application writer?

For example, a syslogd implementation could use it to notify the admin that potentially important messages have been lost - i.e. from over the network.

How would you know that a packet is dropped in the network?

It could also be dropped locally.
We recently increased the size of some buffers on NetBSD as we discovered boot time messages from some daemons were being discarded because some apps were too noisy on startup.
UDP is an unreliable network protocol, sure, but in this patch you will note that I have resolved some code comments such as /* should notify about lost packet */

Linux has an equivalent socket option like this, but it only applies to their equivalent of route(4).

I think it makes sense to limit the scope...

Well, any fix needs to go into the raw sockets code where the call to sbappendaddr() fails.
Once that is realised then it makes sense to try and cover all uses of sbappendaddr() with a generic API.
From a dhcpcd perspective I only care about route(4), but as an engineer I see a chance to do better.
Remember, this is opt-in.

In D26652#593850, @tuexen wrote:

How would you know that a packet is dropped in the network?

I don't know if a packet is dropped in the network.
We do know if a packet is dropped in the host, and this allows an action to be taken on that.
As it stands now everything is silently discarded.

Linux has an equivalent socket option like this, but it only applies to their equivalent of route(4).

I think it makes sense to limit the scope...

Why would you want to limit the scope?
How otherwise do you know that your buffers are full? You have no way of knowing this.
This addresses comments in the code that even state this needs fixing.

Minor man page nit.

lib/libc/sys/getsockopt.2
Line 521: You need a line break after the end of the sentence here.

In D26652#593846, @emaste wrote:

what other OSes share this API now?

Allow me to rephrase my overly verbose answer.

NetBSD, DragonFly BSD and Linux have the same API in that ENOBUFS is returned from read(2) when a message cannot fit in the receiver's buffer. This is allowed by POSIX.
NetBSD, DragonFly BSD and Linux enable the reporting of receive errors via setsockopt(2) - none of these are enabled by default.
NetBSD and DragonFly BSD cover sockets as a whole, whereas Linux is exclusive to a netlink socket.
BSD route(4) is equivalent to the superset of rt_netlink on Linux - netlink itself is a generic messaging service on a socket.
So aside from the socket option naming, the solutions for NetBSD, DragonFly BSD and Linux are equivalent.

OpenBSD is mentioned in passing in that it has a bespoke solution that is strictly limited to route(4) overflow by sending an RTM_DESYNC message rather than an error code.

In D26652#593856, @roy_marples.name wrote:

I don't know if a packet is dropped in the network.
We do know if a packet is dropped in the host, and this allows an action to be taken on that.
As it stands now everything is silently discarded.

Correct.

Linux has an equivalent socket option like this, but it only applies to their equivalent of route(4).

I think it makes sense to limit the scope...

Why would you want to limit the scope?

Because an application writer might get the impression that he/she will be notified if an incoming packet was dropped. This is not true since this patch only covers one of many reasons. If any application wants to detect this, it should add some sequence numbers to the data and it will know.

If you can guarantee that no other cause of packet loss is possible, then indicating ENOBUFS makes sense to me.

How otherwise do you know that your buffers are full? You have no way of knowing this.

My point is: there is a difference between (a) "your buffers are full" and (b) "one or more of your incoming packets were lost". An application is normally interested in (b). One possible reason for (b) is (a). I would suggest limiting the reporting to cases where, if (a) has not happened, you can conclude that (b) has not happened either. My impression is that in your primary use case (a) and (b) are equivalent. Is that true?

This addresses comments in the code that even state this needs fixing.

afedorov added a subscriber: afedorov. Oct 4 2020, 12:27 PM

While I applaud this idea for route(4) sockets, I think that applying it broadly to other socket types has issues that need to be considered. Has this patch been brought to arch@ yet? I think a narrower application (just to route(4)) followed by a discussion on arch@ is appropriate for this change.

In D26652#593910, @tuexen wrote:

Why would you want to limit the scope?

Because an application writer might get the impression that he/she will be notified if an incoming packet was dropped. This is not true since this patch only covers one of many reasons. If any application wants to detect this, it should add some sequence numbers to the data and it will know.

Sequence numbers do not help when there is no more data coming for a long time if the final packet was dropped.
Consider the case where a user inserts a USB network stick and RTM_IFANNOUNCE is lost.
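
To make the sequence number argument concrete, here is a minimal sketch of the kind of gap detection being suggested. All names are hypothetical, and it assumes the sender stamps each message with a 32-bit counter and that messages are delivered in order without duplicates. A gap is only noticed when a later message arrives, which is exactly the limitation raised in the reply above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver-side state for one sender. */
struct seq_tracker {
	bool     started;	/* false until the first message is seen */
	uint32_t expected;	/* sequence number the next message should carry */
};

/*
 * Record that a message with sequence number seq has arrived.
 * Returns how many messages are missing immediately before it
 * (0 if none). Assumes in-order delivery without duplicates.
 */
uint32_t
seq_tracker_observe(struct seq_tracker *t, uint32_t seq)
{
	uint32_t missing = 0;

	assert(t != NULL);
	if (t->started)
		missing = seq - t->expected;	/* unsigned math handles wrap */
	t->started = true;
	t->expected = seq + 1;
	return missing;
}
```

If the last message before a quiet period is dropped, `seq_tracker_observe()` is simply never called for it, so the receiver learns nothing until more traffic arrives.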

If you can guarantee that no other cause of packet loss is possible, then indicating ENOBUFS makes sense to me.

How otherwise do you know that your buffers are full? You have no way of knowing this.

My point is: there is a difference between (a) "your buffers are full" and (b) "one or more of your incoming packets were lost". An application is normally interested in (b). One possible reason for (b) is (a). I would suggest limiting the reporting to cases where, if (a) has not happened, you can conclude that (b) has not happened either. My impression is that in your primary use case (a) and (b) are equivalent. Is that true?

You are correct in that the primary use case is to detect a packet was lost.
The fact it reported buffers are full means that the administrator now knows that the buffer size can be increased to try to mitigate the problem.
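
As a sketch of the mitigation mentioned here, a program (or an administrator's tooling) could try to enlarge the receive buffer once an overflow has been reported. The helper name `grow_rcvbuf` and the doubling policy are assumptions for illustration only. The kernel may clamp or refuse the request depending on system limits, so the size actually granted is read back.

```c
#include <assert.h>
#include <limits.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: try to double the receive buffer of fd.
 * Returns the buffer size the kernel reports afterwards, or -1 on error.
 */
int
grow_rcvbuf(int fd)
{
	int cur, want;
	socklen_t len = sizeof(cur);

	assert(fd >= 0);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) == -1)
		return -1;
	if (cur <= 0 || cur > INT_MAX / 2)
		return -1;		/* refuse to overflow the request */
	want = cur * 2;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
		return -1;		/* the system limit may reject the request */
	len = sizeof(cur);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) == -1)
		return -1;
	return cur;
}
```

A larger buffer only makes overflow less likely. It does not remove the need to resynchronise when an overflow is reported.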

In D26652#593926, @gnn wrote:

While I applaud this idea for route(4) sockets, I think that applying it broadly to other socket types has issues that need to be considered.

Please consider syslog messages as well because this patch allows a syslogd implementation to also know that locally sent messages have been discarded.
We don't know the importance of these messages - it could be from the basic "this cron job ran" to the more important security messages.

Has this patch been brought to arch@ yet? I think a narrower application (just to route(4)) followed by a discussion on arch@ is appropriate for this change.

No. I will send an email.

I don't have enough competence to talk about the generic socket case, so I'll talk about route(4).

Firstly, both rtsock and netlink are indeed unreliable protocols, and the netlink(7) manual explicitly describes returning ENOBUFS as the default behaviour:

However, reliable transmissions from kernel to user are impossible in
       any case.  The kernel can't send a netlink message if the socket
       buffer is full: the message will be dropped and the kernel and the
       user-space process will no longer have the same view of kernel
       state.  It is up to the application to detect when this happens
       (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.

Let's look at how the most popular routing software handles this problem in general.

bird: has an option to periodically re-scan the routing table.
How it handles the Linux ENOBUFS feature:
bird nl_async_hook(), added 8 years ago:

      if (errno == ENOBUFS)
	{
	  /*
	   *  Netlink reports some packets have been thrown away.
	   *  One day we might react to it by asking for route table
	   *  scan in near future.
	   */
	  log(L_WARN "Kernel dropped some netlink messages, will resync on next scan.");
	  return 1;	/* More data are likely to be ready */
	}

FRR: no periodic scans.
How it handles the Linux ENOBUFS feature:
netlink_recv_msg(),

		if (errno == EWOULDBLOCK || errno == EAGAIN)
			return 0;
		flog_err(EC_ZEBRA_RECVMSG_OVERRUN, "%s recvmsg overrun: %s",
			 nl->name, safe_strerror(errno));
		/*
		 * In this case we are screwed. There is no good way to recover
		 * zebra at this point.
		 */
		exit(-1);

Original commit with more reasoning.

So it looks like, even though the feature has been present in Linux for 8+ years, it hasn't been adopted by the relevant software.

Also, as SO_RERROR has been implemented in other systems for a while now, I tried looking into the users of this socket option. The only place that I managed to find it was dhcpd, which is a bit surprising.

Maybe either the problem statement is different from what is described in the summary, or the solution should be different?
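
For contrast with the two excerpts quoted in this comment, here is a minimal sketch of a receive path that maps ENOBUFS to "reload full state" rather than logging or exiting. The enum, the function names and the choice of which errors count as transient are assumptions for illustration, not code from bird, FRR or this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcomes of one read from a routing socket. */
enum rx_action {
	RX_PROCESS,	/* a message was read, handle it */
	RX_EMPTY,	/* zero-length read */
	RX_RETRY,	/* transient error, try again later */
	RX_RESYNC,	/* ENOBUFS: messages were lost, reload full state */
	RX_FAIL		/* any other error */
};

/* Map the result of read(2) and the saved errno to an action. */
enum rx_action
classify_read(ssize_t n, int err)
{
	if (n > 0)
		return RX_PROCESS;
	if (n == 0)
		return RX_EMPTY;
	if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
		return RX_RETRY;
	if (err == ENOBUFS)
		return RX_RESYNC;
	return RX_FAIL;
}

/* Read once from fd into buf and say what the caller should do next. */
enum rx_action
rx_once(int fd, void *buf, size_t len, ssize_t *nread)
{
	assert(buf != NULL && nread != NULL);
	*nread = read(fd, buf, len);
	return classify_read(*nread, errno);
}
```

On RX_RESYNC the caller would discard its cached view and re-read the full state from the kernel, which is the recovery the summary describes.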

This revision now requires changes to proceed. Oct 4 2020, 3:06 PM

In D26652#593976, @melifaro wrote:

So it looks like, even though the feature has been present in Linux for 8+ years, it hasn't been adopted by the relevant software.

I wasn't aware that this was a popularity contest?
If a solution has been adopted once, it's relevant.

Also, as SO_RERROR has been implemented in other systems for a while now, I tried looking into the users of this socket option. The only place that I managed to find it was dhcpd, which is a bit surprising.

dhcpcd.
dhcpd is the ISC DHCP server.
A common misspelling :)

As noted, this addresses a specific failure case which dhcpcd can recover from.
As you have also noted, other applications state they have no way of recovering from it but do log it which is something at least.

I will also note that SO_USER_COOKIE is just as popular.

Maybe either the problem statement is different from what is described in the summary, or the solution should be different?

The problem statement accurately reflects the FreeBSD source code and commentary within.
Either a solution should be implemented or the commentary adjusted and in some cases added.

I had cases where quagga missed routing updates, which caused inconsistent routing between different bgp speakers. This is probably still possible with frr7, and therefore I would welcome a way to at least get some indication that data was lost.

Adjusted man page as requested.

roy_marples.name marked an inline comment as done. Oct 4 2020, 4:04 PM

Use SO_RERROR in route(8).
Warn on any errors returned by read(2) rather than assuming we always get a route message.

In D26652#593989, @pi wrote:

I had cases where quagga missed routing updates, which caused inconsistent routing between different bgp speakers. This is probably still possible with frr7, and therefore I would welcome a way to at least get some indication that data was lost.

PR for FRRouting submitted: https://github.com/FRRouting/frr/pull/7242

Move SO_RERROR so it sits within the correct place.

cy added a subscriber: cy. Oct 11 2020, 2:56 AM

All changes requested to the SO_RERROR approach have been made.
I have done as asked and queried this approach on the freebsd-arch mailing list. There were no replies, which I read as no-one having anything bad to say about the approach, but sadly nothing positive either.
I would really like to see some traction here in 2021 :)

I have also submitted a patch upstream to hostap to resync internal driver state to system interface state using SO_RERROR here:
http://lists.infradead.org/pipermail/hostap/2021-January/039213.html
FreeBSD uses wpa_supplicant in the base system so I see this as a win.

SO_RERROR for ntpd, which FreeBSD also uses.
https://bugs.ntp.org/show_bug.cgi?id=3714

I've been testing this after I found a pointer to this review on the hostapd mailing list.

Are there any remaining reasons not to merge this patch? It looks like activity has stalled in ~October.

Now dealing with FreeBSD bugzilla routing socket overflow reports
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253166

wpa_supplicant now supports SO_RERROR upstream:
https://w1.fi/cgit/hostap/commit/?id=a579642bc3c92c98daabadb5bb36c2da26ab893f

I guess I was wrong w.r.t. the adoption of the feature.
I'm going to commit the change on Tuesday, Feb 9 unless there are any objections.

@roy_marples.name any chance you could consider updating the patch? The current diff fails to apply due to the date clash in lib/libc/sys/getsockopt.2.

Adjust to latest git head.

I confirm that this compiles on latest main and doesn't blow up in my face. I'll do more testing with hostapd today but I don't expect problems.

driesm added a subscriber: driesm. Feb 14 2021, 12:19 PM

@melifaro can you merge it? IMO this should be in the 13.0 ABI.

LGTM for the man page part.

kbowling accepted this revision. Jul 28 2021, 3:59 PM

This revision is now accepted and ready to land. Jul 28 2021, 3:59 PM

Closed by commit rG7045b1603bdf: socket: Implement SO_RERROR (authored by roy_marples.name, committed by Kevin Bowling <kbowling@FreeBSD.org>). Jul 28 2021, 4:35 PM

This revision was automatically updated to reflect the committed changes.

Kevin Bowling <kbowling@FreeBSD.org> added a commit: rG7045b1603bdf: socket: Implement SO_RERROR.

In D26652#638569, @roy_marples.name wrote:

Adjust to latest git head.

@roy_marples.name, I can't tell why this was so circuitous. It's a straightforward change and LGTM. There are people running full routes on FreeBSD like Netflix and FRR users like Netgate so this is a really desirable improvement. I appreciate your work and would be happy to funnel in other improvements if you add me to the reviewers or PRs in the future!

In D26652#705993, @kbowling wrote:

In D26652#638569, @roy_marples.name wrote:

Adjust to latest git head.

@roy_marples.name, I can't tell why this was so circuitous. It's a straightforward change and LGTM. There are people running full routes on FreeBSD like Netflix and FRR users like Netgate so this is a really desirable improvement. I appreciate your work and would be happy to funnel in other improvements if you add me to the reviewers or PRs in the future!

Many thanks for getting this in!

If you fancy adding yourself to D23695 and pushing that forwards, that would be nice, as it would solve Bug 194485, which I filed 7 years ago along with a patch that has long since gone stale.

Kevin Bowling <kbowling@FreeBSD.org> added a commit: rGf45271340836: socket: Implement SO_RERROR. Aug 11 2021, 1:58 AM

Kevin Bowling <kbowling@FreeBSD.org> added a commit: rG664077e69e8f: socket: Implement SO_RERROR. Aug 11 2021, 2:34 AM

Implement SO_RERROR
ClosedPublic

Revision Contents
Changeset List

Diff 77845

lib/libc/sys/getsockopt.2

sys/kern/uipc_sockbuf.c

sys/kern/uipc_socket.c

sys/kern/uipc_usrreq.c

sys/net/raw_usrreq.c

sys/netgraph/bluetooth/socket/ng_btsocket_hci_raw.c

sys/netgraph/ng_socket.c

sys/netinet/ip_divert.c

sys/netinet/raw_ip.c

sys/netinet/udp_usrreq.c

sys/netinet6/icmp6.c

sys/netinet6/ip6_input.c

sys/netinet6/raw_ip6.c

sys/netinet6/send.c

sys/netinet6/udp6_usrreq.c

sys/netipsec/keysock.c

sys/sys/socket.h

sys/sys/socketvar.h
