Page MenuHomeFreeBSD

Fix bugs in routing socket use by userland tools

Authored by pkelsey on Apr 9 2017, 6:46 AM.



A brief overview of routing sockets:

  1. All route messages that are generated in the kernel, whether a response to a route message written to a routing socket from userland or an asynchronous route event from within the kernel, are queued to all open routing sockets (this is a slight oversimplification that ignores request-loopback and fib-based filtering, but those complications don't affect what follows).
  1. There are multiple route message formats and only the first three members of the message (length, version, and type) are common across all formats.

(1) means that when reading a routing socket looking for a response to
a request that was just written, there may be an arbitrary number of
route messages waiting in the route socket ahead of the response of
interest. (2) means that when processing messages read from a routing
socket, the type of the received message must first be tested before
evaulating format-specific fields in the message.

The following userland tools make invalid assumptions about the format
of messages read from a routing socket - that is, they don't ensure
the message is of a given format before interpreting message contents
according to that format:


I examined all users of struct rt_msghdr and PF_ROUTE in the userland
sources and have only identified issues in the above.

In addition to fixing the above programs, this patch includes the
following additional cleanup and doc:

  • In route.h, the route message types are annotated with the message format used
  • In route.h, the NULL pointer check has been removed from SA_SIZE() along with the comment casting doubt on its usefulness. There is no uncertainty - no consumer of SA_SIZE() expects it to work with a NULL pointer - and the presence of the check obscures the fact that the route message format does not encode non-present sockaddrs as sizeof(long) zeroes - it omits them entirely from the message contents.
  • In usr.sbin/arp/arp.c, in set(), there is no point to passing &sdl_m to rtmsg() for the initial RTM_GET as the gateway address is not used nor modified in any way for an RTM_GET. Pasing &sdl_m to RTM_GET requests only impedes understanding of the code. The same goes for the RTM_GET in delete().
  • In sbin/route/route.c, in rtmsg(), replaced the size computation in NEXTADDR() with the identical SA_SIZE().
  • In usr.bin/netsat/route.c, in p_rtentry_sysctl(), the address list processing logic introduced in r287351 winds up adding arbitrary offsets to the sa pointer, then dereferences the result to compute the size, following any gap in the list of possible addresses in the message, and there is always a gap. This hasn't corrupted output because coincidentally the only address fields of interest after this loop are the first three in the message, which are always present. The lack of coredumps is probably due to the random offsets being added to the sa pointer being 16 bits. I searched for all other uses of RTAX_MAX in the userland code to find other such address list processing loops, and found none of the other to be defective.
  • There is a fix included for contrib/traceroute/findsaddr-socket.c, although it appears to me that we don't include that file as part of our traceroute build. I'm also not sure of the upstream situation for traceroute.

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Looks good to me. Also I want to note, that all

} while()

loops are affected to the problem described in rS303374. It would be nice to fix this problem too. :)

292 ↗(On Diff #27251)

If you have modified all these lines, can you also fix the style and replace space to tab character after #define word?

This revision is now accepted and ready to land.Apr 9 2017, 3:36 PM
292 ↗(On Diff #27251)

Sure, I was kind of on the fence about that, so one suggestion to do it is enough to convince me :)

In D10330#213998, @ae wrote:

Looks good to me. Also I want to note, that all

} while()

loops are affected to the problem described in rS303374. It would be nice to fix this problem too. :)

Thanks for pointing out that commit - it looks to me like the explanation there (mbuf dropped in netisr) could be the root cause of, which is a bug I am sure isn't explained by the bad message parsing fixed here, but that I still had not yet figured out.

With respect to this issue it seems like a useful behavior would be to have route_output() append any response it generates to the requesting route socket directly so that any failure to append can be communicated to the application via the return value of route_output(). The copies of the response would then still go to all other routing sockets via netisr, which I think can be accomplished by using a protocol-specific field in the mbuf to indicate which socket not to send a copy to and by also adjusting raw_input_rts_cb() to filter accordingly. This kind of approach would avoid having to add timeout logic and magic timeout values to all route socket users who are looking for responses to their own requests, and would preserve the netisr queueing behavior for all route messages other than response-to-requester.

1242 ↗(On Diff #27251)

This break should actually be a continue

pkelsey edited edge metadata.
pkelsey edited the summary of this revision. (Show Details)

break -> continue in the addition to sbin/routed/table.c

Converted space to tab between #define and RTM_* in route.h

This revision now requires review to proceed.Apr 9 2017, 6:09 PM
This revision was automatically updated to reflect the committed changes.