diff --git a/share/man/man4/inet.4 b/share/man/man4/inet.4 index dbab301302b1..60b2e588500d 100644 --- a/share/man/man4/inet.4 +++ b/share/man/man4/inet.4 @@ -1,325 +1,336 @@ .\" Copyright (c) 1983, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" From: @(#)inet.4 8.1 (Berkeley) 6/5/93 .\" $FreeBSD$ .\" .Dd November 12, 2021 .Dt INET 4 .Os .Sh NAME .Nm inet .Nd Internet protocol family .Sh SYNOPSIS .In sys/types.h .In netinet/in.h .Sh DESCRIPTION The Internet protocol family is a collection of protocols layered atop the .Em Internet Protocol .Pq Tn IP transport layer, and utilizing the Internet address format. The Internet family provides protocol support for the .Dv SOCK_STREAM , SOCK_DGRAM , and .Dv SOCK_RAW socket types; the .Dv SOCK_RAW interface provides access to the .Tn IP protocol. .Sh ADDRESSING Internet addresses are four byte quantities, stored in network standard format (on little endian machines, such as the .Tn alpha , .Tn amd64 and .Tn i386 these are word and byte reversed). The include file .In netinet/in.h defines this address as a discriminated union. .Pp Sockets bound to the Internet protocol family utilize the following addressing structure, .Bd -literal -offset indent struct sockaddr_in { uint8_t sin_len; sa_family_t sin_family; in_port_t sin_port; struct in_addr sin_addr; char sin_zero[8]; }; .Ed .Pp Sockets may be created with the local address .Dv INADDR_ANY to affect .Dq wildcard matching on incoming messages. The address in a .Xr connect 2 or .Xr sendto 2 call may be given as .Dv INADDR_ANY to mean .Dq this host . The distinguished address .Dv INADDR_BROADCAST is allowed as a shorthand for the broadcast address on the primary network if the first network configured supports broadcast. .Sh PROTOCOLS The Internet protocol family is comprised of the .Tn IP network protocol, Internet Control Message Protocol .Pq Tn ICMP , Internet Group Management Protocol .Pq Tn IGMP , Transmission Control Protocol .Pq Tn TCP , and User Datagram Protocol .Pq Tn UDP . .Tn TCP is used to support the .Dv SOCK_STREAM abstraction while .Tn UDP is used to support the .Dv SOCK_DGRAM abstraction. A raw interface to .Tn IP is available by creating an Internet socket of type .Dv SOCK_RAW . The .Tn ICMP message protocol is accessible from a raw socket. .Pp The .Nm address on an interface consist of the address itself, the netmask, either broadcast address in case of a broadcast interface or peers address in case of point-to-point interface. The following .Xr ioctl 2 commands are provided for a datagram socket in the Internet domain: .Pp .Bl -tag -width ".Dv SIOCGIFBRDADDR" -offset indent -compact .It Dv SIOCAIFADDR Add address to an interface. The command requires .Ft struct in_aliasreq as argument. .It Dv SIOCDIFADDR Delete address from an interface. The command requires .Ft struct ifreq as argument. .It Dv SIOCGIFADDR .It Dv SIOCGIFBRDADDR .It Dv SIOCGIFDSTADDR .It Dv SIOCGIFNETMASK Return address information from interface. The returned value is in .Ft struct ifreq . This way of address information retrieval is obsoleted, a preferred way is to use .Xr getifaddrs 3 API. .El .Ss MIB Variables A number of variables are implemented in the net.inet branch of the .Xr sysctl 3 MIB. In addition to the variables supported by the transport protocols (for which the respective manual pages may be consulted), the following general variables are defined: .Bl -tag -width IPCTL_ACCEPTSOURCEROUTE .It Dv IPCTL_FORWARDING .Pq ip.forwarding Boolean: enable/disable forwarding of IP packets. Defaults to off. .It Dv IPCTL_SENDREDIRECTS .Pq ip.redirect Boolean: enable/disable sending of ICMP redirects in response to .Tn IP packets for which a better, and for the sender directly reachable, route and next hop is known. Defaults to on. .It Dv IPCTL_DEFTTL .Pq ip.ttl Integer: default time-to-live .Pq Dq TTL to use for outgoing .Tn IP packets. .It Dv IPCTL_ACCEPTSOURCEROUTE .Pq ip.accept_sourceroute Boolean: enable/disable accepting of source-routed IP packets (default false). .It Dv IPCTL_SOURCEROUTE .Pq ip.sourceroute Boolean: enable/disable forwarding of source-routed IP packets (default false). .It Va ip.process_options Integer: control IP options processing. By setting this variable to 0, all IP options in the incoming packets will be ignored, and the packets will be passed unmodified. By setting to 1, IP options in the incoming packets will be processed accordingly. By setting to 2, an .Tn ICMP .Dq "prohibited by filter" message will be sent back in response to incoming packets with IP options. Default is 1. This .Xr sysctl 8 variable affects packets destined for a local host as well as packets forwarded to some other host. .It Va ip.rfc1122_strong_es Boolean: in non-forwarding mode .Pq ip.forwarding is disabled partially implement the Strong End System model per RFC1122. If a packet with destination address that is local arrives on a different interface than the interface the address belongs to, the packet would be silently dropped. Enabling this option may break certain setups, e.g. having an alias address(es) on loopback that are expected to be reachable by outside traffic. Enabling some other network features, e.g. .Xr carp 4 or destination address rewriting .Xr pfil 4 filters may override and bypass this check. Disabled by default. .It Va ip.source_address_validation Boolean: perform source address validation for packets destined for the local host. Consider this as following Section 3.2 of RFC3704/BCP84, where we treat local host as our own infrastructure. This has no effect on packets to be forwarded, so don't consider it as anti-spoof feature for a router. Enabled by default. .It Va ip.rfc6864 Boolean: control IP IDs generation behaviour. True value enables RFC6864 support, which specifies that IP ID field of .Em atomic datagrams can be set to any value. The .Fx implementation sets it to zero. Enabled by default. .It Va ip.random_id Boolean: control IP IDs generation behaviour. Setting this .Xr sysctl 8 to 1 causes the ID field in .Em non-atomic IP datagrams (or all IP datagrams, if .Va ip.rfc6864 is disabled) to be randomized instead of incremented by 1 with each packet generated. This closes a minor information leak which allows remote observers to determine the rate of packet generation on the machine by watching the counter. At the same time, on high-speed links, it can decrease the ID reuse cycle greatly. Default is 0 (sequential IP IDs). IPv6 flow IDs and fragment IDs are always random. .It Va ip.maxfrags Integer: maximum number of fragments the host will accept and simultaneously hold across all reassembly queues in all VNETs. If set to 0, reassembly is disabled. If set to -1, this limit is not applied. This limit is recalculated when the number of mbuf clusters is changed. This is a global limit. .It Va ip.maxfragpackets Integer: maximum number of fragmented packets the host will accept and simultaneously hold in the reassembly queue for a particular VNET. 0 means that the host will not accept any fragmented packets for that VNET. \-1 means that the host will not apply this limit for that VNET. This limit is recalculated when the number of mbuf clusters is changed. This is a per-VNET limit. .It Va ip.maxfragbucketsize Integer: maximum number of reassembly queues per bucket. Fragmented packets are hashed to buckets. Each bucket has a list of reassembly queues. The system must compare the incoming packets to the existing reassembly queues in the bucket to find a matching reassembly queue. To preserve system resources, the system limits the number of reassembly queues allowed in each bucket. This limit is recalculated when the number of mbuf clusters is changed or when the value of .Va ip.maxfragpackets changes. This is a per-VNET limit. .It Va ip.maxfragsperpacket Integer: maximum number of fragments the host will accept and hold in the reassembly queue for a packet. 0 means that the host will not accept any fragmented packets for the VNET. This is a per-VNET limit. +.It Va ip.allow_net0 +Boolean: allow experimental use of addresses in 0.0.0.0/8 as endpoints, +and allow forwarding of packets with these addresses. +.It Va ip.allow_net240 +Boolean: allow experimental use of addresses in 240.0.0.0/4 as endpoints, +and allow forwarding of packets with these addresses. +.It Va ip.loopback_prefixlen +Integer: prefix length of the address space reserved for loopback purposes. +The default is 8, meaning that 127.0.0.0/8 is reserved for loopback, +and cannot be sent, received, or forwarded on a non-loopback interface. +Use of other values is experimental. .El .Sh SEE ALSO .Xr ioctl 2 , .Xr socket 2 , .Xr getifaddrs 3 , .Xr sysctl 3 , .Xr icmp 4 , .Xr intro 4 , .Xr ip 4 , .Xr ipfirewall 4 , .Xr route 4 , .Xr tcp 4 , .Xr udp 4 , .Xr pfil 9 .Rs .%T "An Introductory 4.3 BSD Interprocess Communication Tutorial" .%B PS1 .%N 7 .Re .Rs .%T "An Advanced 4.3 BSD Interprocess Communication Tutorial" .%B PS1 .%N 8 .Re .Sh HISTORY The .Nm protocol interface appeared in .Bx 4.2 . The .Dq protocol cloning code appeared in .Fx 2.1 . .Sh CAVEATS The Internet protocol support is subject to change as the Internet protocols develop. Users should not depend on details of the current implementation, but rather the services exported. diff --git a/sys/net/vnet.h b/sys/net/vnet.h index afb6857bbccc..d0ede39c0cb1 100644 --- a/sys/net/vnet.h +++ b/sys/net/vnet.h @@ -1,462 +1,463 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2006-2009 University of Zagreb * Copyright (c) 2006-2009 FreeBSD Foundation * All rights reserved. * * This software was developed by the University of Zagreb and the * FreeBSD Foundation under sponsorship by the Stichting NLnet and the * FreeBSD Foundation. * * Copyright (c) 2009 Jeffrey Roberson * Copyright (c) 2009 Robert N. M. Watson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /*- * This header file defines several sets of interfaces supporting virtualized * network stacks: * * - Definition of 'struct vnet' and functions and macros to allocate/free/ * manipulate it. * * - A virtual network stack memory allocator, which provides support for * virtualized global variables via a special linker set, set_vnet. * * - Virtualized sysinits/sysuninits, which allow constructors and * destructors to be run for each network stack subsystem as virtual * instances are created and destroyed. * * If VIMAGE isn't compiled into the kernel, virtualized global variables * compile to normal global variables, and virtualized sysinits to regular * sysinits. */ #ifndef _NET_VNET_H_ #define _NET_VNET_H_ /* * struct vnet describes a virtualized network stack, and is primarily a * pointer to storage for virtualized global variables. Expose to userspace * as required for libkvm. */ #if defined(_KERNEL) || defined(_WANT_VNET) +#include /* for CACHE_LINE_SIZE */ #include struct vnet { LIST_ENTRY(vnet) vnet_le; /* all vnets list */ u_int vnet_magic_n; u_int vnet_ifcnt; u_int vnet_sockcnt; u_int vnet_state; /* SI_SUB_* */ void *vnet_data_mem; uintptr_t vnet_data_base; bool vnet_shutdown; /* Shutdown in progress. */ } __aligned(CACHE_LINE_SIZE); #define VNET_MAGIC_N 0x5e4a6f28 /* * These two virtual network stack allocator definitions are also required * for libkvm so that it can evaluate virtualized global variables. */ #define VNET_SETNAME "set_vnet" #define VNET_SYMPREFIX "vnet_entry_" #endif #ifdef _KERNEL #define VNET_PCPUSTAT_DECLARE(type, name) \ VNET_DECLARE(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)]) #define VNET_PCPUSTAT_DEFINE(type, name) \ VNET_DEFINE(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)]) #define VNET_PCPUSTAT_DEFINE_STATIC(type, name) \ VNET_DEFINE_STATIC(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)]) #define VNET_PCPUSTAT_ALLOC(name, wait) \ COUNTER_ARRAY_ALLOC(VNET(name), \ sizeof(VNET(name)) / sizeof(counter_u64_t), (wait)) #define VNET_PCPUSTAT_FREE(name) \ COUNTER_ARRAY_FREE(VNET(name), sizeof(VNET(name)) / sizeof(counter_u64_t)) #define VNET_PCPUSTAT_ADD(type, name, f, v) \ counter_u64_add(VNET(name)[offsetof(type, f) / sizeof(uint64_t)], (v)) #define VNET_PCPUSTAT_FETCH(type, name, f) \ counter_u64_fetch(VNET(name)[offsetof(type, f) / sizeof(uint64_t)]) #define VNET_PCPUSTAT_SYSINIT(name) \ static void \ vnet_##name##_init(const void *unused) \ { \ VNET_PCPUSTAT_ALLOC(name, M_WAITOK); \ } \ VNET_SYSINIT(vnet_ ## name ## _init, SI_SUB_INIT_IF, \ SI_ORDER_FIRST, vnet_ ## name ## _init, NULL) #define VNET_PCPUSTAT_SYSUNINIT(name) \ static void \ vnet_##name##_uninit(const void *unused) \ { \ VNET_PCPUSTAT_FREE(name); \ } \ VNET_SYSUNINIT(vnet_ ## name ## _uninit, SI_SUB_INIT_IF, \ SI_ORDER_FIRST, vnet_ ## name ## _uninit, NULL) #ifdef SYSCTL_OID #define SYSCTL_VNET_PCPUSTAT(parent, nbr, name, type, array, desc) \ static int \ array##_sysctl(SYSCTL_HANDLER_ARGS) \ { \ type s; \ CTASSERT((sizeof(type) / sizeof(uint64_t)) == \ (sizeof(VNET(array)) / sizeof(counter_u64_t))); \ COUNTER_ARRAY_COPY(VNET(array), &s, sizeof(type) / sizeof(uint64_t));\ if (req->newptr) \ COUNTER_ARRAY_ZERO(VNET(array), \ sizeof(type) / sizeof(uint64_t)); \ return (SYSCTL_OUT(req, &s, sizeof(type))); \ } \ SYSCTL_PROC(parent, nbr, name, \ CTLFLAG_VNET | CTLTYPE_OPAQUE | CTLFLAG_RW | CTLFLAG_NEEDGIANT, \ NULL, 0, array ## _sysctl, "I", desc) #endif /* SYSCTL_OID */ #ifdef VIMAGE #include #include /* for struct thread */ #include #include /* * Location of the kernel's 'set_vnet' linker set. */ extern uintptr_t *__start_set_vnet; __GLOBL(__start_set_vnet); extern uintptr_t *__stop_set_vnet; __GLOBL(__stop_set_vnet); #define VNET_START (uintptr_t)&__start_set_vnet #define VNET_STOP (uintptr_t)&__stop_set_vnet /* * Functions to allocate and destroy virtual network stacks. */ struct vnet *vnet_alloc(void); void vnet_destroy(struct vnet *vnet); /* * The current virtual network stack -- we may wish to move this to struct * pcpu in the future. */ #define curvnet curthread->td_vnet /* * Various macros -- get and set the current network stack, but also * assertions. */ #if defined(INVARIANTS) || defined(VNET_DEBUG) #define VNET_ASSERT(exp, msg) do { \ if (!(exp)) \ panic msg; \ } while (0) #else #define VNET_ASSERT(exp, msg) do { \ } while (0) #endif #ifdef VNET_DEBUG void vnet_log_recursion(struct vnet *, const char *, int); #define CURVNET_SET_QUIET(arg) \ VNET_ASSERT((arg) != NULL && (arg)->vnet_magic_n == VNET_MAGIC_N, \ ("CURVNET_SET at %s:%d %s() curvnet=%p vnet=%p", \ __FILE__, __LINE__, __func__, curvnet, (arg))); \ struct vnet *saved_vnet = curvnet; \ const char *saved_vnet_lpush = curthread->td_vnet_lpush; \ curvnet = arg; \ curthread->td_vnet_lpush = __func__; #define CURVNET_SET_VERBOSE(arg) \ CURVNET_SET_QUIET(arg) \ if (saved_vnet) \ vnet_log_recursion(saved_vnet, saved_vnet_lpush, __LINE__); #define CURVNET_SET(arg) CURVNET_SET_VERBOSE(arg) #define CURVNET_RESTORE() \ VNET_ASSERT(curvnet != NULL && (saved_vnet == NULL || \ saved_vnet->vnet_magic_n == VNET_MAGIC_N), \ ("CURVNET_RESTORE at %s:%d %s() curvnet=%p saved_vnet=%p", \ __FILE__, __LINE__, __func__, curvnet, saved_vnet)); \ curvnet = saved_vnet; \ curthread->td_vnet_lpush = saved_vnet_lpush; #else /* !VNET_DEBUG */ #define CURVNET_SET_QUIET(arg) \ VNET_ASSERT((arg) != NULL && (arg)->vnet_magic_n == VNET_MAGIC_N, \ ("CURVNET_SET at %s:%d %s() curvnet=%p vnet=%p", \ __FILE__, __LINE__, __func__, curvnet, (arg))); \ struct vnet *saved_vnet = curvnet; \ curvnet = arg; #define CURVNET_SET_VERBOSE(arg) \ CURVNET_SET_QUIET(arg) #define CURVNET_SET(arg) CURVNET_SET_VERBOSE(arg) #define CURVNET_RESTORE() \ VNET_ASSERT(curvnet != NULL && (saved_vnet == NULL || \ saved_vnet->vnet_magic_n == VNET_MAGIC_N), \ ("CURVNET_RESTORE at %s:%d %s() curvnet=%p saved_vnet=%p", \ __FILE__, __LINE__, __func__, curvnet, saved_vnet)); \ curvnet = saved_vnet; #endif /* VNET_DEBUG */ #define CURVNET_ASSERT_SET() \ VNET_ASSERT(curvnet != NULL, ("vnet is not set at %s:%d %s()", \ __FILE__, __LINE__, __func__)) extern struct vnet *vnet0; #define IS_DEFAULT_VNET(arg) ((arg) == vnet0) #define CRED_TO_VNET(cr) (cr)->cr_prison->pr_vnet #define TD_TO_VNET(td) CRED_TO_VNET((td)->td_ucred) #define P_TO_VNET(p) CRED_TO_VNET((p)->p_ucred) /* * Global linked list of all virtual network stacks, along with read locks to * access it. If a caller may sleep while accessing the list, it must use * the sleepable lock macros. */ LIST_HEAD(vnet_list_head, vnet); extern struct vnet_list_head vnet_head; extern struct rwlock vnet_rwlock; extern struct sx vnet_sxlock; #define VNET_LIST_RLOCK() sx_slock(&vnet_sxlock) #define VNET_LIST_RLOCK_NOSLEEP() rw_rlock(&vnet_rwlock) #define VNET_LIST_RUNLOCK() sx_sunlock(&vnet_sxlock) #define VNET_LIST_RUNLOCK_NOSLEEP() rw_runlock(&vnet_rwlock) /* * Iteration macros to walk the global list of virtual network stacks. */ #define VNET_ITERATOR_DECL(arg) struct vnet *arg #define VNET_FOREACH(arg) LIST_FOREACH((arg), &vnet_head, vnet_le) /* * Virtual network stack memory allocator, which allows global variables to * be automatically instantiated for each network stack instance. */ #define VNET_NAME(n) vnet_entry_##n #define VNET_DECLARE(t, n) extern t VNET_NAME(n) /* struct _hack is to stop this from being used with static data */ #define VNET_DEFINE(t, n) \ struct _hack; t VNET_NAME(n) __section(VNET_SETNAME) __used #if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) \ || defined(__powerpc64__)) /* * As with DPCPU_DEFINE_STATIC we are unable to mark this data as static * in modules on some architectures. */ #define VNET_DEFINE_STATIC(t, n) \ t VNET_NAME(n) __section(VNET_SETNAME) __used #else #define VNET_DEFINE_STATIC(t, n) \ static t VNET_NAME(n) __section(VNET_SETNAME) __used #endif #define _VNET_PTR(b, n) (__typeof(VNET_NAME(n))*) \ ((b) + (uintptr_t)&VNET_NAME(n)) #define _VNET(b, n) (*_VNET_PTR(b, n)) /* * Virtualized global variable accessor macros. */ #define VNET_VNET_PTR(vnet, n) _VNET_PTR((vnet)->vnet_data_base, n) #define VNET_VNET(vnet, n) (*VNET_VNET_PTR((vnet), n)) #define VNET_PTR(n) VNET_VNET_PTR(curvnet, n) #define VNET(n) VNET_VNET(curvnet, n) /* * Virtual network stack allocator interfaces from the kernel linker. */ void *vnet_data_alloc(int size); void vnet_data_copy(void *start, int size); void vnet_data_free(void *start_arg, int size); /* * Virtual sysinit mechanism, allowing network stack components to declare * startup and shutdown methods to be run when virtual network stack * instances are created and destroyed. */ #include /* * SYSINIT/SYSUNINIT variants that provide per-vnet constructors and * destructors. */ struct vnet_sysinit { enum sysinit_sub_id subsystem; enum sysinit_elem_order order; sysinit_cfunc_t func; const void *arg; TAILQ_ENTRY(vnet_sysinit) link; }; #define VNET_SYSINIT(ident, subsystem, order, func, arg) \ CTASSERT((subsystem) > SI_SUB_VNET && \ (subsystem) <= SI_SUB_VNET_DONE); \ static struct vnet_sysinit ident ## _vnet_init = { \ subsystem, \ order, \ (sysinit_cfunc_t)(sysinit_nfunc_t)func, \ (arg) \ }; \ SYSINIT(vnet_init_ ## ident, subsystem, order, \ vnet_register_sysinit, &ident ## _vnet_init); \ SYSUNINIT(vnet_init_ ## ident, subsystem, order, \ vnet_deregister_sysinit, &ident ## _vnet_init) #define VNET_SYSUNINIT(ident, subsystem, order, func, arg) \ CTASSERT((subsystem) > SI_SUB_VNET && \ (subsystem) <= SI_SUB_VNET_DONE); \ static struct vnet_sysinit ident ## _vnet_uninit = { \ subsystem, \ order, \ (sysinit_cfunc_t)(sysinit_nfunc_t)func, \ (arg) \ }; \ SYSINIT(vnet_uninit_ ## ident, subsystem, order, \ vnet_register_sysuninit, &ident ## _vnet_uninit); \ SYSUNINIT(vnet_uninit_ ## ident, subsystem, order, \ vnet_deregister_sysuninit, &ident ## _vnet_uninit) /* * Run per-vnet sysinits or sysuninits during vnet creation/destruction. */ void vnet_sysinit(void); void vnet_sysuninit(void); /* * Interfaces for managing per-vnet constructors and destructors. */ void vnet_register_sysinit(void *arg); void vnet_register_sysuninit(void *arg); void vnet_deregister_sysinit(void *arg); void vnet_deregister_sysuninit(void *arg); /* * EVENTHANDLER(9) extensions. */ #include void vnet_global_eventhandler_iterator_func(void *, ...); #define VNET_GLOBAL_EVENTHANDLER_REGISTER_TAG(tag, name, func, arg, priority) \ do { \ if (IS_DEFAULT_VNET(curvnet)) { \ (tag) = vimage_eventhandler_register(NULL, #name, func, \ arg, priority, \ vnet_global_eventhandler_iterator_func); \ } \ } while(0) #define VNET_GLOBAL_EVENTHANDLER_REGISTER(name, func, arg, priority) \ do { \ if (IS_DEFAULT_VNET(curvnet)) { \ vimage_eventhandler_register(NULL, #name, func, \ arg, priority, \ vnet_global_eventhandler_iterator_func); \ } \ } while(0) #else /* !VIMAGE */ /* * Various virtual network stack macros compile to no-ops without VIMAGE. */ #define curvnet NULL #define VNET_ASSERT(exp, msg) #define CURVNET_SET(arg) #define CURVNET_SET_QUIET(arg) #define CURVNET_RESTORE() #define CURVNET_ASSERT_SET() \ #define VNET_LIST_RLOCK() #define VNET_LIST_RLOCK_NOSLEEP() #define VNET_LIST_RUNLOCK() #define VNET_LIST_RUNLOCK_NOSLEEP() #define VNET_ITERATOR_DECL(arg) #define VNET_FOREACH(arg) for (int _vn = 0; _vn == 0; _vn++) #define IS_DEFAULT_VNET(arg) 1 #define CRED_TO_VNET(cr) NULL #define TD_TO_VNET(td) NULL #define P_TO_VNET(p) NULL /* * Versions of the VNET macros that compile to normal global variables and * standard sysctl definitions. */ #define VNET_NAME(n) n #define VNET_DECLARE(t, n) extern t n #define VNET_DEFINE(t, n) struct _hack; t n #define VNET_DEFINE_STATIC(t, n) static t n #define _VNET_PTR(b, n) &VNET_NAME(n) /* * Virtualized global variable accessor macros. */ #define VNET_VNET_PTR(vnet, n) (&(n)) #define VNET_VNET(vnet, n) (n) #define VNET_PTR(n) (&(n)) #define VNET(n) (n) /* * When VIMAGE isn't compiled into the kernel, VNET_SYSINIT/VNET_SYSUNINIT * map into normal sysinits, which have the same ordering properties. */ #define VNET_SYSINIT(ident, subsystem, order, func, arg) \ SYSINIT(ident, subsystem, order, func, arg) #define VNET_SYSUNINIT(ident, subsystem, order, func, arg) \ SYSUNINIT(ident, subsystem, order, func, arg) /* * Without VIMAGE revert to the default implementation. */ #define VNET_GLOBAL_EVENTHANDLER_REGISTER_TAG(tag, name, func, arg, priority) \ (tag) = eventhandler_register(NULL, #name, func, arg, priority) #define VNET_GLOBAL_EVENTHANDLER_REGISTER(name, func, arg, priority) \ eventhandler_register(NULL, #name, func, arg, priority) #endif /* VIMAGE */ #endif /* _KERNEL */ #endif /* !_NET_VNET_H_ */ diff --git a/sys/netinet/in.c b/sys/netinet/in.c index 9e4b677cf7e1..c3880c4ba983 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -1,1751 +1,1797 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1991, 1993 * The Regents of the University of California. All rights reserved. * Copyright (C) 2001 WIDE Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in.c 8.4 (Berkeley) 1/9/95 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #define IN_HISTORICAL_NETS /* include class masks */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static int in_aifaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *); static int in_difaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *); static int in_gifaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *); static void in_socktrim(struct sockaddr_in *); static void in_purgemaddrs(struct ifnet *); static bool ia_need_loopback_route(const struct in_ifaddr *); VNET_DEFINE_STATIC(int, nosameprefix); #define V_nosameprefix VNET(nosameprefix) SYSCTL_INT(_net_inet_ip, OID_AUTO, no_same_prefix, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(nosameprefix), 0, "Refuse to create same prefixes on different interfaces"); VNET_DEFINE_STATIC(bool, broadcast_lowest); #define V_broadcast_lowest VNET(broadcast_lowest) SYSCTL_BOOL(_net_inet_ip, OID_AUTO, broadcast_lowest, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(broadcast_lowest), 0, "Treat lowest address on a subnet (host 0) as broadcast"); +VNET_DEFINE(bool, ip_allow_net240) = false; +#define V_ip_allow_net240 VNET(ip_allow_net240) +SYSCTL_BOOL(_net_inet_ip, OID_AUTO, allow_net240, + CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip_allow_net240), 0, + "Allow use of Experimental addresses, aka Class E (240/4)"); +/* see https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240 */ + +VNET_DEFINE(bool, ip_allow_net0) = false; +SYSCTL_BOOL(_net_inet_ip, OID_AUTO, allow_net0, + CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip_allow_net0), 0, + "Allow use of addresses in network 0/8"); +/* see https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0 */ + +VNET_DEFINE(uint32_t, in_loopback_mask) = IN_LOOPBACK_MASK_DFLT; +#define V_in_loopback_mask VNET(in_loopback_mask) +static int sysctl_loopback_prefixlen(SYSCTL_HANDLER_ARGS); +SYSCTL_PROC(_net_inet_ip, OID_AUTO, loopback_prefixlen, + CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, + NULL, 0, sysctl_loopback_prefixlen, "I", + "Prefix length of address space reserved for loopback"); +/* see https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127 */ + VNET_DECLARE(struct inpcbinfo, ripcbinfo); #define V_ripcbinfo VNET(ripcbinfo) static struct sx in_control_sx; SX_SYSINIT(in_control_sx, &in_control_sx, "in_control"); /* * Return 1 if an internet address is for a ``local'' host * (one to which we have a connection). */ int in_localaddr(struct in_addr in) { u_long i = ntohl(in.s_addr); struct in_ifaddr *ia; NET_EPOCH_ASSERT(); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { if ((i & ia->ia_subnetmask) == ia->ia_subnet) return (1); } return (0); } /* * Return 1 if an internet address is for the local host and configured * on one of its interfaces. */ bool in_localip(struct in_addr in) { struct in_ifaddr *ia; NET_EPOCH_ASSERT(); CK_LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr) return (true); return (false); } /* * Like in_localip(), but FIB-aware. */ bool in_localip_fib(struct in_addr in, uint16_t fib) { struct in_ifaddr *ia; NET_EPOCH_ASSERT(); CK_LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr && ia->ia_ifa.ifa_ifp->if_fib == fib) return (true); return (false); } /* * Return 1 if an internet address is configured on an interface. */ int in_ifhasaddr(struct ifnet *ifp, struct in_addr in) { struct ifaddr *ifa; struct in_ifaddr *ia; NET_EPOCH_ASSERT(); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != AF_INET) continue; ia = (struct in_ifaddr *)ifa; if (ia->ia_addr.sin_addr.s_addr == in.s_addr) return (1); } return (0); } /* * Return a reference to the interface address which is different to * the supplied one but with same IP address value. */ static struct in_ifaddr * in_localip_more(struct in_ifaddr *original_ia) { struct epoch_tracker et; in_addr_t original_addr = IA_SIN(original_ia)->sin_addr.s_addr; uint32_t original_fib = original_ia->ia_ifa.ifa_ifp->if_fib; struct in_ifaddr *ia; NET_EPOCH_ENTER(et); CK_LIST_FOREACH(ia, INADDR_HASH(original_addr), ia_hash) { in_addr_t addr = IA_SIN(ia)->sin_addr.s_addr; uint32_t fib = ia->ia_ifa.ifa_ifp->if_fib; if (!V_rt_add_addr_allfibs && (original_fib != fib)) continue; if ((original_ia != ia) && (original_addr == addr)) { ifa_ref(&ia->ia_ifa); NET_EPOCH_EXIT(et); return (ia); } } NET_EPOCH_EXIT(et); return (NULL); } /* * Tries to find first IPv4 address in the provided fib. * Prefers non-loopback addresses and return loopback IFF * @loopback_ok is set. * * Returns ifa or NULL. */ struct in_ifaddr * in_findlocal(uint32_t fibnum, bool loopback_ok) { struct in_ifaddr *ia = NULL, *ia_lo = NULL; NET_EPOCH_ASSERT(); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { uint32_t ia_fib = ia->ia_ifa.ifa_ifp->if_fib; if (!V_rt_add_addr_allfibs && (fibnum != ia_fib)) continue; if (!IN_LOOPBACK(ntohl(IA_SIN(ia)->sin_addr.s_addr))) break; if (loopback_ok) ia_lo = ia; } if (ia == NULL) ia = ia_lo; return (ia); } /* * Determine whether an IP address is in a reserved set of addresses * that may not be forwarded, or whether datagrams to that destination * may be forwarded. */ int in_canforward(struct in_addr in) { u_long i = ntohl(in.s_addr); - if (IN_EXPERIMENTAL(i) || IN_MULTICAST(i) || IN_LINKLOCAL(i) || - IN_ZERONET(i) || IN_LOOPBACK(i)) + if (IN_MULTICAST(i) || IN_LINKLOCAL(i) || IN_LOOPBACK(i)) + return (0); + if (IN_EXPERIMENTAL(i) && !V_ip_allow_net240) + return (0); + if (IN_ZERONET(i) && !V_ip_allow_net0) return (0); return (1); } +/* + * Sysctl to manage prefix of reserved loopback network; translate + * to/from mask. The mask is always contiguous high-order 1 bits + * followed by all 0 bits. + */ +static int +sysctl_loopback_prefixlen(SYSCTL_HANDLER_ARGS) +{ + int error, preflen; + + /* ffs is 1-based; compensate. */ + preflen = 33 - ffs(V_in_loopback_mask); + error = sysctl_handle_int(oidp, &preflen, 0, req); + if (error || !req->newptr) + return (error); + if (preflen < 8 || preflen > 32) + return (EINVAL); + V_in_loopback_mask = 0xffffffff << (32 - preflen); + return (0); +} + /* * Trim a mask in a sockaddr */ static void in_socktrim(struct sockaddr_in *ap) { char *cplim = (char *) &ap->sin_addr; char *cp = (char *) (&ap->sin_addr + 1); ap->sin_len = 0; while (--cp >= cplim) if (*cp) { (ap)->sin_len = cp - (char *) (ap) + 1; break; } } /* * Generic internet control operations (ioctl's). */ int in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td) { struct ifreq *ifr = (struct ifreq *)data; struct sockaddr_in *addr = (struct sockaddr_in *)&ifr->ifr_addr; struct epoch_tracker et; struct ifaddr *ifa; struct in_ifaddr *ia; int error; if (ifp == NULL) return (EADDRNOTAVAIL); /* * Filter out 4 ioctls we implement directly. Forward the rest * to specific functions and ifp->if_ioctl(). */ switch (cmd) { case SIOCGIFADDR: case SIOCGIFBRDADDR: case SIOCGIFDSTADDR: case SIOCGIFNETMASK: break; case SIOCGIFALIAS: sx_xlock(&in_control_sx); error = in_gifaddr_ioctl(cmd, data, ifp, td); sx_xunlock(&in_control_sx); return (error); case SIOCDIFADDR: sx_xlock(&in_control_sx); error = in_difaddr_ioctl(cmd, data, ifp, td); sx_xunlock(&in_control_sx); return (error); case OSIOCAIFADDR: /* 9.x compat */ case SIOCAIFADDR: sx_xlock(&in_control_sx); error = in_aifaddr_ioctl(cmd, data, ifp, td); sx_xunlock(&in_control_sx); return (error); case SIOCSIFADDR: case SIOCSIFBRDADDR: case SIOCSIFDSTADDR: case SIOCSIFNETMASK: /* We no longer support that old commands. */ return (EINVAL); default: if (ifp->if_ioctl == NULL) return (EOPNOTSUPP); return ((*ifp->if_ioctl)(ifp, cmd, data)); } if (addr->sin_addr.s_addr != INADDR_ANY && prison_check_ip4(td->td_ucred, &addr->sin_addr) != 0) return (EADDRNOTAVAIL); /* * Find address for this interface, if it exists. If an * address was specified, find that one instead of the * first one on the interface, if possible. */ NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != AF_INET) continue; ia = (struct in_ifaddr *)ifa; if (ia->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr) break; } if (ifa == NULL) CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) if (ifa->ifa_addr->sa_family == AF_INET) { ia = (struct in_ifaddr *)ifa; if (prison_check_ip4(td->td_ucred, &ia->ia_addr.sin_addr) == 0) break; } if (ifa == NULL) { NET_EPOCH_EXIT(et); return (EADDRNOTAVAIL); } error = 0; switch (cmd) { case SIOCGIFADDR: *addr = ia->ia_addr; break; case SIOCGIFBRDADDR: if ((ifp->if_flags & IFF_BROADCAST) == 0) { error = EINVAL; break; } *addr = ia->ia_broadaddr; break; case SIOCGIFDSTADDR: if ((ifp->if_flags & IFF_POINTOPOINT) == 0) { error = EINVAL; break; } *addr = ia->ia_dstaddr; break; case SIOCGIFNETMASK: *addr = ia->ia_sockmask; break; } NET_EPOCH_EXIT(et); return (error); } static int in_aifaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td) { const struct in_aliasreq *ifra = (struct in_aliasreq *)data; const struct sockaddr_in *addr = &ifra->ifra_addr; const struct sockaddr_in *broadaddr = &ifra->ifra_broadaddr; const struct sockaddr_in *mask = &ifra->ifra_mask; const struct sockaddr_in *dstaddr = &ifra->ifra_dstaddr; const int vhid = (cmd == SIOCAIFADDR) ? ifra->ifra_vhid : 0; struct epoch_tracker et; struct ifaddr *ifa; struct in_ifaddr *ia; bool iaIsFirst; int error = 0; error = priv_check(td, PRIV_NET_ADDIFADDR); if (error) return (error); /* * ifra_addr must be present and be of INET family. * ifra_broadaddr/ifra_dstaddr and ifra_mask are optional. */ if (addr->sin_len != sizeof(struct sockaddr_in) || addr->sin_family != AF_INET) return (EINVAL); if (broadaddr->sin_len != 0 && (broadaddr->sin_len != sizeof(struct sockaddr_in) || broadaddr->sin_family != AF_INET)) return (EINVAL); if (mask->sin_len != 0 && (mask->sin_len != sizeof(struct sockaddr_in) || mask->sin_family != AF_INET)) return (EINVAL); if ((ifp->if_flags & IFF_POINTOPOINT) && (dstaddr->sin_len != sizeof(struct sockaddr_in) || dstaddr->sin_addr.s_addr == INADDR_ANY)) return (EDESTADDRREQ); if (vhid != 0 && carp_attach_p == NULL) return (EPROTONOSUPPORT); /* * See whether address already exist. */ iaIsFirst = true; ia = NULL; NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct in_ifaddr *it; if (ifa->ifa_addr->sa_family != AF_INET) continue; it = (struct in_ifaddr *)ifa; if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr && prison_check_ip4(td->td_ucred, &addr->sin_addr) == 0) ia = it; else iaIsFirst = false; } NET_EPOCH_EXIT(et); if (ia != NULL) (void )in_difaddr_ioctl(cmd, data, ifp, td); ifa = ifa_alloc(sizeof(struct in_ifaddr), M_WAITOK); ia = (struct in_ifaddr *)ifa; ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr; ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr; ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask; callout_init_rw(&ia->ia_garp_timer, &ifp->if_addr_lock, CALLOUT_RETURNUNLOCKED); ia->ia_ifp = ifp; ia->ia_addr = *addr; if (mask->sin_len != 0) { ia->ia_sockmask = *mask; ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr); } else { in_addr_t i = ntohl(addr->sin_addr.s_addr); /* * If netmask isn't supplied, use historical default. * This is deprecated for interfaces other than loopback * or point-to-point; warn in other cases. In the future * we should return an error rather than warning. */ if ((ifp->if_flags & (IFF_POINTOPOINT | IFF_LOOPBACK)) == 0) printf("%s: set address: WARNING: network mask " "should be specified; using historical default\n", ifp->if_xname); if (IN_CLASSA(i)) ia->ia_subnetmask = IN_CLASSA_NET; else if (IN_CLASSB(i)) ia->ia_subnetmask = IN_CLASSB_NET; else ia->ia_subnetmask = IN_CLASSC_NET; ia->ia_sockmask.sin_addr.s_addr = htonl(ia->ia_subnetmask); } ia->ia_subnet = ntohl(addr->sin_addr.s_addr) & ia->ia_subnetmask; in_socktrim(&ia->ia_sockmask); if (ifp->if_flags & IFF_BROADCAST) { if (broadaddr->sin_len != 0) { ia->ia_broadaddr = *broadaddr; } else if (ia->ia_subnetmask == IN_RFC3021_MASK) { ia->ia_broadaddr.sin_addr.s_addr = INADDR_BROADCAST; ia->ia_broadaddr.sin_len = sizeof(struct sockaddr_in); ia->ia_broadaddr.sin_family = AF_INET; } else { ia->ia_broadaddr.sin_addr.s_addr = htonl(ia->ia_subnet | ~ia->ia_subnetmask); ia->ia_broadaddr.sin_len = sizeof(struct sockaddr_in); ia->ia_broadaddr.sin_family = AF_INET; } } if (ifp->if_flags & IFF_POINTOPOINT) ia->ia_dstaddr = *dstaddr; if (vhid != 0) { error = (*carp_attach_p)(&ia->ia_ifa, vhid); if (error) return (error); } /* if_addrhead is already referenced by ifa_alloc() */ IF_ADDR_WLOCK(ifp); CK_STAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link); IF_ADDR_WUNLOCK(ifp); ifa_ref(ifa); /* in_ifaddrhead */ sx_assert(&in_control_sx, SA_XLOCKED); CK_STAILQ_INSERT_TAIL(&V_in_ifaddrhead, ia, ia_link); CK_LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr), ia, ia_hash); /* * Give the interface a chance to initialize * if this is its first address, * and to validate the address if necessary. */ if (ifp->if_ioctl != NULL) { error = (*ifp->if_ioctl)(ifp, SIOCSIFADDR, (caddr_t)ia); if (error) goto fail1; } /* * Add route for the network. */ if (vhid == 0) { error = in_addprefix(ia); if (error) goto fail1; } /* * Add a loopback route to self. */ if (vhid == 0 && ia_need_loopback_route(ia)) { struct in_ifaddr *eia; eia = in_localip_more(ia); if (eia == NULL) { error = ifa_add_loopback_route((struct ifaddr *)ia, (struct sockaddr *)&ia->ia_addr); if (error) goto fail2; } else ifa_free(&eia->ia_ifa); } if (iaIsFirst && (ifp->if_flags & IFF_MULTICAST)) { struct in_addr allhosts_addr; struct in_ifinfo *ii; ii = ((struct in_ifinfo *)ifp->if_afdata[AF_INET]); allhosts_addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP); error = in_joingroup(ifp, &allhosts_addr, NULL, &ii->ii_allhosts); } /* * Note: we don't need extra reference for ifa, since we called * with sx lock held, and ifaddr can not be deleted in concurrent * thread. */ EVENTHANDLER_INVOKE(ifaddr_event_ext, ifp, ifa, IFADDR_EVENT_ADD); return (error); fail2: if (vhid == 0) (void )in_scrubprefix(ia, LLE_STATIC); fail1: if (ia->ia_ifa.ifa_carp) (*carp_detach_p)(&ia->ia_ifa, false); IF_ADDR_WLOCK(ifp); CK_STAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifaddr, ifa_link); IF_ADDR_WUNLOCK(ifp); ifa_free(&ia->ia_ifa); /* if_addrhead */ sx_assert(&in_control_sx, SA_XLOCKED); CK_STAILQ_REMOVE(&V_in_ifaddrhead, ia, in_ifaddr, ia_link); CK_LIST_REMOVE(ia, ia_hash); ifa_free(&ia->ia_ifa); /* in_ifaddrhead */ return (error); } static int in_difaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td) { const struct ifreq *ifr = (struct ifreq *)data; const struct sockaddr_in *addr = (const struct sockaddr_in *) &ifr->ifr_addr; struct ifaddr *ifa; struct in_ifaddr *ia; bool deleteAny, iaIsLast; int error; if (td != NULL) { error = priv_check(td, PRIV_NET_DELIFADDR); if (error) return (error); } if (addr->sin_len != sizeof(struct sockaddr_in) || addr->sin_family != AF_INET) deleteAny = true; else deleteAny = false; iaIsLast = true; ia = NULL; IF_ADDR_WLOCK(ifp); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct in_ifaddr *it; if (ifa->ifa_addr->sa_family != AF_INET) continue; it = (struct in_ifaddr *)ifa; if (deleteAny && ia == NULL && (td == NULL || prison_check_ip4(td->td_ucred, &it->ia_addr.sin_addr) == 0)) ia = it; if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr && (td == NULL || prison_check_ip4(td->td_ucred, &addr->sin_addr) == 0)) ia = it; if (it != ia) iaIsLast = false; } if (ia == NULL) { IF_ADDR_WUNLOCK(ifp); return (EADDRNOTAVAIL); } CK_STAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifaddr, ifa_link); IF_ADDR_WUNLOCK(ifp); ifa_free(&ia->ia_ifa); /* if_addrhead */ sx_assert(&in_control_sx, SA_XLOCKED); CK_STAILQ_REMOVE(&V_in_ifaddrhead, ia, in_ifaddr, ia_link); CK_LIST_REMOVE(ia, ia_hash); /* * in_scrubprefix() kills the interface route. */ in_scrubprefix(ia, LLE_STATIC); /* * in_ifadown gets rid of all the rest of * the routes. This is not quite the right * thing to do, but at least if we are running * a routing process they will come back. */ in_ifadown(&ia->ia_ifa, 1); if (ia->ia_ifa.ifa_carp) (*carp_detach_p)(&ia->ia_ifa, cmd == SIOCAIFADDR); /* * If this is the last IPv4 address configured on this * interface, leave the all-hosts group. * No state-change report need be transmitted. */ if (iaIsLast && (ifp->if_flags & IFF_MULTICAST)) { struct in_ifinfo *ii; ii = ((struct in_ifinfo *)ifp->if_afdata[AF_INET]); if (ii->ii_allhosts) { (void)in_leavegroup(ii->ii_allhosts, NULL); ii->ii_allhosts = NULL; } } IF_ADDR_WLOCK(ifp); if (callout_stop(&ia->ia_garp_timer) == 1) { ifa_free(&ia->ia_ifa); } IF_ADDR_WUNLOCK(ifp); EVENTHANDLER_INVOKE(ifaddr_event_ext, ifp, &ia->ia_ifa, IFADDR_EVENT_DEL); ifa_free(&ia->ia_ifa); /* in_ifaddrhead */ return (0); } static int in_gifaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td) { struct in_aliasreq *ifra = (struct in_aliasreq *)data; const struct sockaddr_in *addr = &ifra->ifra_addr; struct epoch_tracker et; struct ifaddr *ifa; struct in_ifaddr *ia; /* * ifra_addr must be present and be of INET family. */ if (addr->sin_len != sizeof(struct sockaddr_in) || addr->sin_family != AF_INET) return (EINVAL); /* * See whether address exist. */ ia = NULL; NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct in_ifaddr *it; if (ifa->ifa_addr->sa_family != AF_INET) continue; it = (struct in_ifaddr *)ifa; if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr && prison_check_ip4(td->td_ucred, &addr->sin_addr) == 0) { ia = it; break; } } if (ia == NULL) { NET_EPOCH_EXIT(et); return (EADDRNOTAVAIL); } ifra->ifra_mask = ia->ia_sockmask; if ((ifp->if_flags & IFF_POINTOPOINT) && ia->ia_dstaddr.sin_family == AF_INET) ifra->ifra_dstaddr = ia->ia_dstaddr; else if ((ifp->if_flags & IFF_BROADCAST) && ia->ia_broadaddr.sin_family == AF_INET) ifra->ifra_broadaddr = ia->ia_broadaddr; else memset(&ifra->ifra_broadaddr, 0, sizeof(ifra->ifra_broadaddr)); NET_EPOCH_EXIT(et); return (0); } static int in_match_ifaddr(const struct rtentry *rt, const struct nhop_object *nh, void *arg) { if (nh->nh_ifa == (struct ifaddr *)arg) return (1); return (0); } static int in_handle_prefix_route(uint32_t fibnum, int cmd, struct sockaddr_in *dst, struct sockaddr_in *netmask, struct ifaddr *ifa, struct ifnet *ifp) { NET_EPOCH_ASSERT(); /* Prepare gateway */ struct sockaddr_dl_short sdl = { .sdl_family = AF_LINK, .sdl_len = sizeof(struct sockaddr_dl_short), .sdl_type = ifa->ifa_ifp->if_type, .sdl_index = ifa->ifa_ifp->if_index, }; struct rt_addrinfo info = { .rti_ifa = ifa, .rti_ifp = ifp, .rti_flags = RTF_PINNED | ((netmask != NULL) ? 0 : RTF_HOST), .rti_info = { [RTAX_DST] = (struct sockaddr *)dst, [RTAX_NETMASK] = (struct sockaddr *)netmask, [RTAX_GATEWAY] = (struct sockaddr *)&sdl, }, /* Ensure we delete the prefix IFF prefix ifa matches */ .rti_filter = in_match_ifaddr, .rti_filterdata = ifa, }; return (rib_handle_ifaddr_info(fibnum, cmd, &info)); } /* * Routing table interaction with interface addresses. * * In general, two types of routes needs to be installed: * a) "interface" or "prefix" route, telling user that the addresses * behind the ifa prefix are reached directly. * b) "loopback" route installed for the ifa address, telling user that * the address belongs to local system. * * Handling for (a) and (b) differs in multi-fib aspects, hence they * are implemented in different functions below. * * The cases above may intersect - /32 interface aliases results in * the same prefix produced by (a) and (b). This blurs the definition * of the "loopback" route and complicate interactions. The interaction * table is defined below. The case numbers are used in the multiple * functions below to refer to the particular test case. * * There can be multiple options: * 1) Adding address with prefix on non-p2p/non-loopback interface. * Example: 192.0.2.1/24. Action: * * add "prefix" route towards 192.0.2.0/24 via @ia interface, * using @ia as an address source. * * add "loopback" route towards 192.0.2.1 via V_loif, saving * @ia ifp in the gateway and using @ia as an address source. * * 2) Adding address with /32 mask to non-p2p/non-loopback interface. * Example: 192.0.2.2/32. Action: * * add "prefix" host route via V_loif, using @ia as an address source. * * 3) Adding address with or without prefix to p2p interface. * Example: 10.0.0.1/24->10.0.0.2. Action: * * add "prefix" host route towards 10.0.0.2 via this interface, using @ia * as an address source. Note: no sense in installing full /24 as the interface * is point-to-point. * * add "loopback" route towards 10.0.9.1 via V_loif, saving * @ia ifp in the gateway and using @ia as an address source. * * 4) Adding address with or without prefix to loopback interface. * Example: 192.0.2.1/24. Action: * * add "prefix" host route via @ia interface, using @ia as an address source. * Note: Skip installing /24 prefix as it would introduce TTL loop * for the traffic destined to these addresses. */ /* * Checks if @ia needs to install loopback route to @ia address via * ifa_maintain_loopback_route(). * * Return true on success. */ static bool ia_need_loopback_route(const struct in_ifaddr *ia) { struct ifnet *ifp = ia->ia_ifp; /* Case 4: Skip loopback interfaces */ if ((ifp->if_flags & IFF_LOOPBACK) || (ia->ia_addr.sin_addr.s_addr == INADDR_ANY)) return (false); /* Clash avoidance: Skip p2p interfaces with both addresses are equal */ if ((ifp->if_flags & IFF_POINTOPOINT) && ia->ia_dstaddr.sin_addr.s_addr == ia->ia_addr.sin_addr.s_addr) return (false); /* Case 2: skip /32 prefixes */ if (!(ifp->if_flags & IFF_POINTOPOINT) && (ia->ia_sockmask.sin_addr.s_addr == INADDR_BROADCAST)) return (false); return (true); } /* * Calculate "prefix" route corresponding to @ia. */ static void ia_getrtprefix(const struct in_ifaddr *ia, struct in_addr *prefix, struct in_addr *mask) { if (ia->ia_ifp->if_flags & IFF_POINTOPOINT) { /* Case 3: return host route for dstaddr */ *prefix = ia->ia_dstaddr.sin_addr; mask->s_addr = INADDR_BROADCAST; } else if (ia->ia_ifp->if_flags & IFF_LOOPBACK) { /* Case 4: return host route for ifaddr */ *prefix = ia->ia_addr.sin_addr; mask->s_addr = INADDR_BROADCAST; } else { /* Cases 1,2: return actual ia prefix */ *prefix = ia->ia_addr.sin_addr; *mask = ia->ia_sockmask.sin_addr; prefix->s_addr &= mask->s_addr; } } /* * Adds or delete interface "prefix" route corresponding to @ifa. * Returns 0 on success or errno. */ int in_handle_ifaddr_route(int cmd, struct in_ifaddr *ia) { struct ifaddr *ifa = &ia->ia_ifa; struct in_addr daddr, maddr; struct sockaddr_in *pmask; struct epoch_tracker et; int error; ia_getrtprefix(ia, &daddr, &maddr); struct sockaddr_in mask = { .sin_family = AF_INET, .sin_len = sizeof(struct sockaddr_in), .sin_addr = maddr, }; pmask = (maddr.s_addr != INADDR_BROADCAST) ? &mask : NULL; struct sockaddr_in dst = { .sin_family = AF_INET, .sin_len = sizeof(struct sockaddr_in), .sin_addr.s_addr = daddr.s_addr & maddr.s_addr, }; struct ifnet *ifp = ia->ia_ifp; if ((maddr.s_addr == INADDR_BROADCAST) && (!(ia->ia_ifp->if_flags & (IFF_POINTOPOINT|IFF_LOOPBACK)))) { /* Case 2: host route on broadcast interface */ ifp = V_loif; } uint32_t fibnum = ifa->ifa_ifp->if_fib; NET_EPOCH_ENTER(et); error = in_handle_prefix_route(fibnum, cmd, &dst, pmask, ifa, ifp); NET_EPOCH_EXIT(et); return (error); } /* * Check if we have a route for the given prefix already. */ static bool in_hasrtprefix(struct in_ifaddr *target) { struct epoch_tracker et; struct in_ifaddr *ia; struct in_addr prefix, mask, p, m; bool result = false; ia_getrtprefix(target, &prefix, &mask); /* Look for an existing address with the same prefix, mask, and fib */ NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { ia_getrtprefix(ia, &p, &m); if (prefix.s_addr != p.s_addr || mask.s_addr != m.s_addr) continue; if (target->ia_ifp->if_fib != ia->ia_ifp->if_fib) continue; /* * If we got a matching prefix route inserted by other * interface address, we are done here. */ if (ia->ia_flags & IFA_ROUTE) { result = true; break; } } NET_EPOCH_EXIT(et); return (result); } int in_addprefix(struct in_ifaddr *target) { int error; if (in_hasrtprefix(target)) { if (V_nosameprefix) return (EEXIST); else { rt_addrmsg(RTM_ADD, &target->ia_ifa, target->ia_ifp->if_fib); return (0); } } /* * No-one seem to have this prefix route, so we try to insert it. */ rt_addrmsg(RTM_ADD, &target->ia_ifa, target->ia_ifp->if_fib); error = in_handle_ifaddr_route(RTM_ADD, target); if (!error) target->ia_flags |= IFA_ROUTE; return (error); } /* * Removes either all lle entries for given @ia, or lle * corresponding to @ia address. */ static void in_scrubprefixlle(struct in_ifaddr *ia, int all, u_int flags) { struct sockaddr_in addr, mask; struct sockaddr *saddr, *smask; struct ifnet *ifp; saddr = (struct sockaddr *)&addr; bzero(&addr, sizeof(addr)); addr.sin_len = sizeof(addr); addr.sin_family = AF_INET; smask = (struct sockaddr *)&mask; bzero(&mask, sizeof(mask)); mask.sin_len = sizeof(mask); mask.sin_family = AF_INET; mask.sin_addr.s_addr = ia->ia_subnetmask; ifp = ia->ia_ifp; if (all) { /* * Remove all L2 entries matching given prefix. * Convert address to host representation to avoid * doing this on every callback. ia_subnetmask is already * stored in host representation. */ addr.sin_addr.s_addr = ntohl(ia->ia_addr.sin_addr.s_addr); lltable_prefix_free(AF_INET, saddr, smask, flags); } else { /* Remove interface address only */ addr.sin_addr.s_addr = ia->ia_addr.sin_addr.s_addr; lltable_delete_addr(LLTABLE(ifp), LLE_IFADDR, saddr); } } /* * If there is no other address in the system that can serve a route to the * same prefix, remove the route. Hand over the route to the new address * otherwise. */ int in_scrubprefix(struct in_ifaddr *target, u_int flags) { struct epoch_tracker et; struct in_ifaddr *ia; struct in_addr prefix, mask, p, m; int error = 0; /* * Remove the loopback route to the interface address. */ if (ia_need_loopback_route(target) && (flags & LLE_STATIC)) { struct in_ifaddr *eia; eia = in_localip_more(target); if (eia != NULL) { error = ifa_switch_loopback_route((struct ifaddr *)eia, (struct sockaddr *)&target->ia_addr); ifa_free(&eia->ia_ifa); } else { error = ifa_del_loopback_route((struct ifaddr *)target, (struct sockaddr *)&target->ia_addr); } } ia_getrtprefix(target, &prefix, &mask); if ((target->ia_flags & IFA_ROUTE) == 0) { rt_addrmsg(RTM_DELETE, &target->ia_ifa, target->ia_ifp->if_fib); /* * Removing address from !IFF_UP interface or * prefix which exists on other interface (along with route). * No entries should exist here except target addr. * Given that, delete this entry only. */ in_scrubprefixlle(target, 0, flags); return (0); } NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { ia_getrtprefix(ia, &p, &m); if (prefix.s_addr != p.s_addr || mask.s_addr != m.s_addr) continue; if ((ia->ia_ifp->if_flags & IFF_UP) == 0) continue; /* * If we got a matching prefix address, move IFA_ROUTE and * the route itself to it. Make sure that routing daemons * get a heads-up. */ if ((ia->ia_flags & IFA_ROUTE) == 0) { ifa_ref(&ia->ia_ifa); NET_EPOCH_EXIT(et); error = in_handle_ifaddr_route(RTM_DELETE, target); if (error == 0) target->ia_flags &= ~IFA_ROUTE; else log(LOG_INFO, "in_scrubprefix: err=%d, old prefix delete failed\n", error); /* Scrub all entries IFF interface is different */ in_scrubprefixlle(target, target->ia_ifp != ia->ia_ifp, flags); error = in_handle_ifaddr_route(RTM_ADD, ia); if (error == 0) ia->ia_flags |= IFA_ROUTE; else log(LOG_INFO, "in_scrubprefix: err=%d, new prefix add failed\n", error); ifa_free(&ia->ia_ifa); return (error); } } NET_EPOCH_EXIT(et); /* * remove all L2 entries on the given prefix */ in_scrubprefixlle(target, 1, flags); /* * As no-one seem to have this prefix, we can remove the route. */ rt_addrmsg(RTM_DELETE, &target->ia_ifa, target->ia_ifp->if_fib); error = in_handle_ifaddr_route(RTM_DELETE, target); if (error == 0) target->ia_flags &= ~IFA_ROUTE; else log(LOG_INFO, "in_scrubprefix: err=%d, prefix delete failed\n", error); return (error); } void in_ifscrub_all(void) { struct ifnet *ifp; struct ifaddr *ifa, *nifa; struct ifaliasreq ifr; IFNET_RLOCK(); CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) { /* Cannot lock here - lock recursion. */ /* NET_EPOCH_ENTER(et); */ CK_STAILQ_FOREACH_SAFE(ifa, &ifp->if_addrhead, ifa_link, nifa) { if (ifa->ifa_addr->sa_family != AF_INET) continue; /* * This is ugly but the only way for legacy IP to * cleanly remove addresses and everything attached. */ bzero(&ifr, sizeof(ifr)); ifr.ifra_addr = *ifa->ifa_addr; if (ifa->ifa_dstaddr) ifr.ifra_broadaddr = *ifa->ifa_dstaddr; (void)in_control(NULL, SIOCDIFADDR, (caddr_t)&ifr, ifp, NULL); } /* NET_EPOCH_EXIT(et); */ in_purgemaddrs(ifp); igmp_domifdetach(ifp); } IFNET_RUNLOCK(); } int in_ifaddr_broadcast(struct in_addr in, struct in_ifaddr *ia) { return ((in.s_addr == ia->ia_broadaddr.sin_addr.s_addr || /* * Optionally check for old-style (host 0) broadcast, but * taking into account that RFC 3021 obsoletes it. */ (V_broadcast_lowest && ia->ia_subnetmask != IN_RFC3021_MASK && ntohl(in.s_addr) == ia->ia_subnet)) && /* * Check for an all one subnetmask. These * only exist when an interface gets a secondary * address. */ ia->ia_subnetmask != (u_long)0xffffffff); } /* * Return 1 if the address might be a local broadcast address. */ int in_broadcast(struct in_addr in, struct ifnet *ifp) { struct ifaddr *ifa; int found; NET_EPOCH_ASSERT(); if (in.s_addr == INADDR_BROADCAST || in.s_addr == INADDR_ANY) return (1); if ((ifp->if_flags & IFF_BROADCAST) == 0) return (0); found = 0; /* * Look through the list of addresses for a match * with a broadcast address. */ CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) if (ifa->ifa_addr->sa_family == AF_INET && in_ifaddr_broadcast(in, (struct in_ifaddr *)ifa)) { found = 1; break; } return (found); } /* * On interface removal, clean up IPv4 data structures hung off of the ifnet. */ void in_ifdetach(struct ifnet *ifp) { IN_MULTI_LOCK(); in_pcbpurgeif0(&V_ripcbinfo, ifp); in_pcbpurgeif0(&V_udbinfo, ifp); in_pcbpurgeif0(&V_ulitecbinfo, ifp); in_purgemaddrs(ifp); IN_MULTI_UNLOCK(); /* * Make sure all multicast deletions invoking if_ioctl() are * completed before returning. Else we risk accessing a freed * ifnet structure pointer. */ inm_release_wait(NULL); } /* * Delete all IPv4 multicast address records, and associated link-layer * multicast address records, associated with ifp. * XXX It looks like domifdetach runs AFTER the link layer cleanup. * XXX This should not race with ifma_protospec being set during * a new allocation, if it does, we have bigger problems. */ static void in_purgemaddrs(struct ifnet *ifp) { struct in_multi_head purgeinms; struct in_multi *inm; struct ifmultiaddr *ifma, *next; SLIST_INIT(&purgeinms); IN_MULTI_LIST_LOCK(); /* * Extract list of in_multi associated with the detaching ifp * which the PF_INET layer is about to release. * We need to do this as IF_ADDR_LOCK() may be re-acquired * by code further down. */ IF_ADDR_WLOCK(ifp); restart: CK_STAILQ_FOREACH_SAFE(ifma, &ifp->if_multiaddrs, ifma_link, next) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; inm_rele_locked(&purgeinms, inm); if (__predict_false(ifma_restart)) { ifma_restart = true; goto restart; } } IF_ADDR_WUNLOCK(ifp); inm_release_list_deferred(&purgeinms); igmp_ifdetach(ifp); IN_MULTI_LIST_UNLOCK(); } struct in_llentry { struct llentry base; }; #define IN_LLTBL_DEFAULT_HSIZE 32 #define IN_LLTBL_HASH(k, h) \ (((((((k >> 8) ^ k) >> 8) ^ k) >> 8) ^ k) & ((h) - 1)) /* * Do actual deallocation of @lle. */ static void in_lltable_destroy_lle_unlocked(epoch_context_t ctx) { struct llentry *lle; lle = __containerof(ctx, struct llentry, lle_epoch_ctx); LLE_LOCK_DESTROY(lle); LLE_REQ_DESTROY(lle); free(lle, M_LLTABLE); } /* * Called by LLE_FREE_LOCKED when number of references * drops to zero. */ static void in_lltable_destroy_lle(struct llentry *lle) { LLE_WUNLOCK(lle); NET_EPOCH_CALL(in_lltable_destroy_lle_unlocked, &lle->lle_epoch_ctx); } static struct llentry * in_lltable_new(struct in_addr addr4, u_int flags) { struct in_llentry *lle; lle = malloc(sizeof(struct in_llentry), M_LLTABLE, M_NOWAIT | M_ZERO); if (lle == NULL) /* NB: caller generates msg */ return NULL; /* * For IPv4 this will trigger "arpresolve" to generate * an ARP request. */ lle->base.la_expire = time_uptime; /* mark expired */ lle->base.r_l3addr.addr4 = addr4; lle->base.lle_refcnt = 1; lle->base.lle_free = in_lltable_destroy_lle; LLE_LOCK_INIT(&lle->base); LLE_REQ_INIT(&lle->base); callout_init(&lle->base.lle_timer, 1); return (&lle->base); } #define IN_ARE_MASKED_ADDR_EQUAL(d, a, m) ( \ ((((d).s_addr ^ (a).s_addr) & (m).s_addr)) == 0 ) static int in_lltable_match_prefix(const struct sockaddr *saddr, const struct sockaddr *smask, u_int flags, struct llentry *lle) { struct in_addr addr, mask, lle_addr; addr = ((const struct sockaddr_in *)saddr)->sin_addr; mask = ((const struct sockaddr_in *)smask)->sin_addr; lle_addr.s_addr = ntohl(lle->r_l3addr.addr4.s_addr); if (IN_ARE_MASKED_ADDR_EQUAL(lle_addr, addr, mask) == 0) return (0); if (lle->la_flags & LLE_IFADDR) { /* * Delete LLE_IFADDR records IFF address & flag matches. * Note that addr is the interface address within prefix * being matched. * Note also we should handle 'ifdown' cases without removing * ifaddr macs. */ if (addr.s_addr == lle_addr.s_addr && (flags & LLE_STATIC) != 0) return (1); return (0); } /* flags & LLE_STATIC means deleting both dynamic and static entries */ if ((flags & LLE_STATIC) || !(lle->la_flags & LLE_STATIC)) return (1); return (0); } static void in_lltable_free_entry(struct lltable *llt, struct llentry *lle) { size_t pkts_dropped; LLE_WLOCK_ASSERT(lle); KASSERT(llt != NULL, ("lltable is NULL")); /* Unlink entry from table if not already */ if ((lle->la_flags & LLE_LINKED) != 0) { IF_AFDATA_WLOCK_ASSERT(llt->llt_ifp); lltable_unlink_entry(llt, lle); } /* Drop hold queue */ pkts_dropped = llentry_free(lle); ARPSTAT_ADD(dropped, pkts_dropped); } static int in_lltable_rtcheck(struct ifnet *ifp, u_int flags, const struct sockaddr *l3addr) { struct nhop_object *nh; struct in_addr addr; KASSERT(l3addr->sa_family == AF_INET, ("sin_family %d", l3addr->sa_family)); addr = ((const struct sockaddr_in *)l3addr)->sin_addr; nh = fib4_lookup(ifp->if_fib, addr, 0, NHR_NONE, 0); if (nh == NULL) return (EINVAL); /* * If the gateway for an existing host route matches the target L3 * address, which is a special route inserted by some implementation * such as MANET, and the interface is of the correct type, then * allow for ARP to proceed. */ if (nh->nh_flags & NHF_GATEWAY) { if (!(nh->nh_flags & NHF_HOST) || nh->nh_ifp->if_type != IFT_ETHER || (nh->nh_ifp->if_flags & (IFF_NOARP | IFF_STATICARP)) != 0 || memcmp(nh->gw_sa.sa_data, l3addr->sa_data, sizeof(in_addr_t)) != 0) { return (EINVAL); } } /* * Make sure that at least the destination address is covered * by the route. This is for handling the case where 2 or more * interfaces have the same prefix. An incoming packet arrives * on one interface and the corresponding outgoing packet leaves * another interface. */ if ((nh->nh_ifp != ifp) && (nh->nh_flags & NHF_HOST) == 0) { struct in_ifaddr *ia = (struct in_ifaddr *)ifaof_ifpforaddr(l3addr, ifp); struct in_addr dst_addr, mask_addr; if (ia == NULL) return (EINVAL); /* * ifaof_ifpforaddr() returns _best matching_ IFA. * It is possible that ifa prefix does not cover our address. * Explicitly verify and fail if that's the case. */ dst_addr = IA_SIN(ia)->sin_addr; mask_addr.s_addr = htonl(ia->ia_subnetmask); if (!IN_ARE_MASKED_ADDR_EQUAL(dst_addr, addr, mask_addr)) return (EINVAL); } return (0); } static inline uint32_t in_lltable_hash_dst(const struct in_addr dst, uint32_t hsize) { return (IN_LLTBL_HASH(dst.s_addr, hsize)); } static uint32_t in_lltable_hash(const struct llentry *lle, uint32_t hsize) { return (in_lltable_hash_dst(lle->r_l3addr.addr4, hsize)); } static void in_lltable_fill_sa_entry(const struct llentry *lle, struct sockaddr *sa) { struct sockaddr_in *sin; sin = (struct sockaddr_in *)sa; bzero(sin, sizeof(*sin)); sin->sin_family = AF_INET; sin->sin_len = sizeof(*sin); sin->sin_addr = lle->r_l3addr.addr4; } static inline struct llentry * in_lltable_find_dst(struct lltable *llt, struct in_addr dst) { struct llentry *lle; struct llentries *lleh; u_int hashidx; hashidx = in_lltable_hash_dst(dst, llt->llt_hsize); lleh = &llt->lle_head[hashidx]; CK_LIST_FOREACH(lle, lleh, lle_next) { if (lle->la_flags & LLE_DELETED) continue; if (lle->r_l3addr.addr4.s_addr == dst.s_addr) break; } return (lle); } static void in_lltable_delete_entry(struct lltable *llt, struct llentry *lle) { lle->la_flags |= LLE_DELETED; EVENTHANDLER_INVOKE(lle_event, lle, LLENTRY_DELETED); #ifdef DIAGNOSTIC log(LOG_INFO, "ifaddr cache = %p is deleted\n", lle); #endif llentry_free(lle); } static struct llentry * in_lltable_alloc(struct lltable *llt, u_int flags, const struct sockaddr *l3addr) { const struct sockaddr_in *sin = (const struct sockaddr_in *)l3addr; struct ifnet *ifp = llt->llt_ifp; struct llentry *lle; char linkhdr[LLE_MAX_LINKHDR]; size_t linkhdrsize; int lladdr_off; KASSERT(l3addr->sa_family == AF_INET, ("sin_family %d", l3addr->sa_family)); /* * A route that covers the given address must have * been installed 1st because we are doing a resolution, * verify this. */ if (!(flags & LLE_IFADDR) && in_lltable_rtcheck(ifp, flags, l3addr) != 0) return (NULL); lle = in_lltable_new(sin->sin_addr, flags); if (lle == NULL) { log(LOG_INFO, "lla_lookup: new lle malloc failed\n"); return (NULL); } lle->la_flags = flags; if (flags & LLE_STATIC) lle->r_flags |= RLLE_VALID; if ((flags & LLE_IFADDR) == LLE_IFADDR) { linkhdrsize = LLE_MAX_LINKHDR; if (lltable_calc_llheader(ifp, AF_INET, IF_LLADDR(ifp), linkhdr, &linkhdrsize, &lladdr_off) != 0) { in_lltable_free_entry(llt, lle); return (NULL); } lltable_set_entry_addr(ifp, lle, linkhdr, linkhdrsize, lladdr_off); lle->la_flags |= LLE_STATIC; lle->r_flags |= (RLLE_VALID | RLLE_IFADDR); } return (lle); } /* * Return NULL if not found or marked for deletion. * If found return lle read locked. */ static struct llentry * in_lltable_lookup(struct lltable *llt, u_int flags, const struct sockaddr *l3addr) { const struct sockaddr_in *sin = (const struct sockaddr_in *)l3addr; struct llentry *lle; IF_AFDATA_LOCK_ASSERT(llt->llt_ifp); KASSERT(l3addr->sa_family == AF_INET, ("sin_family %d", l3addr->sa_family)); KASSERT((flags & (LLE_UNLOCKED | LLE_EXCLUSIVE)) != (LLE_UNLOCKED | LLE_EXCLUSIVE), ("wrong lle request flags: %#x", flags)); lle = in_lltable_find_dst(llt, sin->sin_addr); if (lle == NULL) return (NULL); if (flags & LLE_UNLOCKED) return (lle); if (flags & LLE_EXCLUSIVE) LLE_WLOCK(lle); else LLE_RLOCK(lle); /* * If the afdata lock is not held, the LLE may have been unlinked while * we were blocked on the LLE lock. Check for this case. */ if (__predict_false((lle->la_flags & LLE_LINKED) == 0)) { if (flags & LLE_EXCLUSIVE) LLE_WUNLOCK(lle); else LLE_RUNLOCK(lle); return (NULL); } return (lle); } static int in_lltable_dump_entry(struct lltable *llt, struct llentry *lle, struct sysctl_req *wr) { struct ifnet *ifp = llt->llt_ifp; /* XXX stack use */ struct { struct rt_msghdr rtm; struct sockaddr_in sin; struct sockaddr_dl sdl; } arpc; struct sockaddr_dl *sdl; int error; bzero(&arpc, sizeof(arpc)); /* skip deleted entries */ if ((lle->la_flags & LLE_DELETED) == LLE_DELETED) return (0); /* Skip if jailed and not a valid IP of the prison. */ lltable_fill_sa_entry(lle,(struct sockaddr *)&arpc.sin); if (prison_if(wr->td->td_ucred, (struct sockaddr *)&arpc.sin) != 0) return (0); /* * produce a msg made of: * struct rt_msghdr; * struct sockaddr_in; (IPv4) * struct sockaddr_dl; */ arpc.rtm.rtm_msglen = sizeof(arpc); arpc.rtm.rtm_version = RTM_VERSION; arpc.rtm.rtm_type = RTM_GET; arpc.rtm.rtm_flags = RTF_UP; arpc.rtm.rtm_addrs = RTA_DST | RTA_GATEWAY; /* publish */ if (lle->la_flags & LLE_PUB) arpc.rtm.rtm_flags |= RTF_ANNOUNCE; sdl = &arpc.sdl; sdl->sdl_family = AF_LINK; sdl->sdl_len = sizeof(*sdl); sdl->sdl_index = ifp->if_index; sdl->sdl_type = ifp->if_type; if ((lle->la_flags & LLE_VALID) == LLE_VALID) { sdl->sdl_alen = ifp->if_addrlen; bcopy(lle->ll_addr, LLADDR(sdl), ifp->if_addrlen); } else { sdl->sdl_alen = 0; bzero(LLADDR(sdl), ifp->if_addrlen); } arpc.rtm.rtm_rmx.rmx_expire = lle->la_flags & LLE_STATIC ? 0 : lle->la_expire; arpc.rtm.rtm_flags |= (RTF_HOST | RTF_LLDATA); if (lle->la_flags & LLE_STATIC) arpc.rtm.rtm_flags |= RTF_STATIC; if (lle->la_flags & LLE_IFADDR) arpc.rtm.rtm_flags |= RTF_PINNED; arpc.rtm.rtm_index = ifp->if_index; error = SYSCTL_OUT(wr, &arpc, sizeof(arpc)); return (error); } static void in_lltable_post_resolved(struct lltable *llt, struct llentry *lle) { struct ifnet *ifp = llt->llt_ifp; /* gratuitous ARP */ if ((lle->la_flags & LLE_PUB) != 0) arprequest(ifp, &lle->r_l3addr.addr4, &lle->r_l3addr.addr4, lle->ll_addr); } static struct lltable * in_lltattach(struct ifnet *ifp) { struct lltable *llt; llt = lltable_allocate_htbl(IN_LLTBL_DEFAULT_HSIZE); llt->llt_af = AF_INET; llt->llt_ifp = ifp; llt->llt_lookup = in_lltable_lookup; llt->llt_alloc_entry = in_lltable_alloc; llt->llt_delete_entry = in_lltable_delete_entry; llt->llt_dump_entry = in_lltable_dump_entry; llt->llt_hash = in_lltable_hash; llt->llt_fill_sa_entry = in_lltable_fill_sa_entry; llt->llt_free_entry = in_lltable_free_entry; llt->llt_match_prefix = in_lltable_match_prefix; llt->llt_mark_used = llentry_mark_used; llt->llt_post_resolved = in_lltable_post_resolved; lltable_link(llt); return (llt); } struct lltable * in_lltable_get(struct ifnet *ifp) { struct lltable *llt = NULL; void *afdata_ptr = ifp->if_afdata[AF_INET]; if (afdata_ptr != NULL) llt = ((struct in_ifinfo *)afdata_ptr)->ii_llt; return (llt); } void * in_domifattach(struct ifnet *ifp) { struct in_ifinfo *ii; ii = malloc(sizeof(struct in_ifinfo), M_IFADDR, M_WAITOK|M_ZERO); ii->ii_llt = in_lltattach(ifp); ii->ii_igmp = igmp_domifattach(ifp); return (ii); } void in_domifdetach(struct ifnet *ifp, void *aux) { struct in_ifinfo *ii = (struct in_ifinfo *)aux; igmp_domifdetach(ifp); lltable_free(ii->ii_llt); free(ii, M_IFADDR); } diff --git a/sys/netinet/in.h b/sys/netinet/in.h index 1fc5c173b33c..44d64190ed01 100644 --- a/sys/netinet/in.h +++ b/sys/netinet/in.h @@ -1,690 +1,708 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1990, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in.h 8.3 (Berkeley) 1/3/94 * $FreeBSD$ */ #ifndef _NETINET_IN_H_ #define _NETINET_IN_H_ #include #include #include /* Protocols common to RFC 1700, POSIX, and X/Open. */ #define IPPROTO_IP 0 /* dummy for IP */ #define IPPROTO_ICMP 1 /* control message protocol */ #define IPPROTO_TCP 6 /* tcp */ #define IPPROTO_UDP 17 /* user datagram protocol */ #define INADDR_ANY ((in_addr_t)0x00000000) #define INADDR_BROADCAST ((in_addr_t)0xffffffff) /* must be masked */ #ifndef _UINT8_T_DECLARED typedef __uint8_t uint8_t; #define _UINT8_T_DECLARED #endif #ifndef _UINT16_T_DECLARED typedef __uint16_t uint16_t; #define _UINT16_T_DECLARED #endif #ifndef _UINT32_T_DECLARED typedef __uint32_t uint32_t; #define _UINT32_T_DECLARED #endif #ifndef _IN_ADDR_T_DECLARED typedef uint32_t in_addr_t; #define _IN_ADDR_T_DECLARED #endif #ifndef _IN_PORT_T_DECLARED typedef uint16_t in_port_t; #define _IN_PORT_T_DECLARED #endif #ifndef _SA_FAMILY_T_DECLARED typedef __sa_family_t sa_family_t; #define _SA_FAMILY_T_DECLARED #endif /* Internet address (a structure for historical reasons). */ #ifndef _STRUCT_IN_ADDR_DECLARED struct in_addr { in_addr_t s_addr; }; #define _STRUCT_IN_ADDR_DECLARED #endif #ifndef _SOCKLEN_T_DECLARED typedef __socklen_t socklen_t; #define _SOCKLEN_T_DECLARED #endif #include /* Socket address, internet style. */ struct sockaddr_in { uint8_t sin_len; sa_family_t sin_family; in_port_t sin_port; struct in_addr sin_addr; char sin_zero[8]; }; #if !defined(_KERNEL) && __POSIX_VISIBLE >= 200112 #ifndef _BYTEORDER_PROTOTYPED #define _BYTEORDER_PROTOTYPED __BEGIN_DECLS uint32_t htonl(uint32_t); uint16_t htons(uint16_t); uint32_t ntohl(uint32_t); uint16_t ntohs(uint16_t); __END_DECLS #endif #ifndef _BYTEORDER_FUNC_DEFINED #define _BYTEORDER_FUNC_DEFINED #define htonl(x) __htonl(x) #define htons(x) __htons(x) #define ntohl(x) __ntohl(x) #define ntohs(x) __ntohs(x) #endif #endif /* !_KERNEL && __POSIX_VISIBLE >= 200112 */ #if __POSIX_VISIBLE >= 200112 #define IPPROTO_IPV6 41 /* IP6 header */ #define IPPROTO_RAW 255 /* raw IP packet */ #define INET_ADDRSTRLEN 16 #endif #if __BSD_VISIBLE /* * Constants and structures defined by the internet system, * Per RFC 790, September 1981, and numerous additions. */ /* * Protocols (RFC 1700) */ #define IPPROTO_HOPOPTS 0 /* IP6 hop-by-hop options */ #define IPPROTO_IGMP 2 /* group mgmt protocol */ #define IPPROTO_GGP 3 /* gateway^2 (deprecated) */ #define IPPROTO_IPV4 4 /* IPv4 encapsulation */ #define IPPROTO_IPIP IPPROTO_IPV4 /* for compatibility */ #define IPPROTO_ST 7 /* Stream protocol II */ #define IPPROTO_EGP 8 /* exterior gateway protocol */ #define IPPROTO_PIGP 9 /* private interior gateway */ #define IPPROTO_RCCMON 10 /* BBN RCC Monitoring */ #define IPPROTO_NVPII 11 /* network voice protocol*/ #define IPPROTO_PUP 12 /* pup */ #define IPPROTO_ARGUS 13 /* Argus */ #define IPPROTO_EMCON 14 /* EMCON */ #define IPPROTO_XNET 15 /* Cross Net Debugger */ #define IPPROTO_CHAOS 16 /* Chaos*/ #define IPPROTO_MUX 18 /* Multiplexing */ #define IPPROTO_MEAS 19 /* DCN Measurement Subsystems */ #define IPPROTO_HMP 20 /* Host Monitoring */ #define IPPROTO_PRM 21 /* Packet Radio Measurement */ #define IPPROTO_IDP 22 /* xns idp */ #define IPPROTO_TRUNK1 23 /* Trunk-1 */ #define IPPROTO_TRUNK2 24 /* Trunk-2 */ #define IPPROTO_LEAF1 25 /* Leaf-1 */ #define IPPROTO_LEAF2 26 /* Leaf-2 */ #define IPPROTO_RDP 27 /* Reliable Data */ #define IPPROTO_IRTP 28 /* Reliable Transaction */ #define IPPROTO_TP 29 /* tp-4 w/ class negotiation */ #define IPPROTO_BLT 30 /* Bulk Data Transfer */ #define IPPROTO_NSP 31 /* Network Services */ #define IPPROTO_INP 32 /* Merit Internodal */ #define IPPROTO_DCCP 33 /* Datagram Congestion Control Protocol */ #define IPPROTO_3PC 34 /* Third Party Connect */ #define IPPROTO_IDPR 35 /* InterDomain Policy Routing */ #define IPPROTO_XTP 36 /* XTP */ #define IPPROTO_DDP 37 /* Datagram Delivery */ #define IPPROTO_CMTP 38 /* Control Message Transport */ #define IPPROTO_TPXX 39 /* TP++ Transport */ #define IPPROTO_IL 40 /* IL transport protocol */ #define IPPROTO_SDRP 42 /* Source Demand Routing */ #define IPPROTO_ROUTING 43 /* IP6 routing header */ #define IPPROTO_FRAGMENT 44 /* IP6 fragmentation header */ #define IPPROTO_IDRP 45 /* InterDomain Routing*/ #define IPPROTO_RSVP 46 /* resource reservation */ #define IPPROTO_GRE 47 /* General Routing Encap. */ #define IPPROTO_MHRP 48 /* Mobile Host Routing */ #define IPPROTO_BHA 49 /* BHA */ #define IPPROTO_ESP 50 /* IP6 Encap Sec. Payload */ #define IPPROTO_AH 51 /* IP6 Auth Header */ #define IPPROTO_INLSP 52 /* Integ. Net Layer Security */ #define IPPROTO_SWIPE 53 /* IP with encryption */ #define IPPROTO_NHRP 54 /* Next Hop Resolution */ #define IPPROTO_MOBILE 55 /* IP Mobility */ #define IPPROTO_TLSP 56 /* Transport Layer Security */ #define IPPROTO_SKIP 57 /* SKIP */ #define IPPROTO_ICMPV6 58 /* ICMP6 */ #define IPPROTO_NONE 59 /* IP6 no next header */ #define IPPROTO_DSTOPTS 60 /* IP6 destination option */ #define IPPROTO_AHIP 61 /* any host internal protocol */ #define IPPROTO_CFTP 62 /* CFTP */ #define IPPROTO_HELLO 63 /* "hello" routing protocol */ #define IPPROTO_SATEXPAK 64 /* SATNET/Backroom EXPAK */ #define IPPROTO_KRYPTOLAN 65 /* Kryptolan */ #define IPPROTO_RVD 66 /* Remote Virtual Disk */ #define IPPROTO_IPPC 67 /* Pluribus Packet Core */ #define IPPROTO_ADFS 68 /* Any distributed FS */ #define IPPROTO_SATMON 69 /* Satnet Monitoring */ #define IPPROTO_VISA 70 /* VISA Protocol */ #define IPPROTO_IPCV 71 /* Packet Core Utility */ #define IPPROTO_CPNX 72 /* Comp. Prot. Net. Executive */ #define IPPROTO_CPHB 73 /* Comp. Prot. HeartBeat */ #define IPPROTO_WSN 74 /* Wang Span Network */ #define IPPROTO_PVP 75 /* Packet Video Protocol */ #define IPPROTO_BRSATMON 76 /* BackRoom SATNET Monitoring */ #define IPPROTO_ND 77 /* Sun net disk proto (temp.) */ #define IPPROTO_WBMON 78 /* WIDEBAND Monitoring */ #define IPPROTO_WBEXPAK 79 /* WIDEBAND EXPAK */ #define IPPROTO_EON 80 /* ISO cnlp */ #define IPPROTO_VMTP 81 /* VMTP */ #define IPPROTO_SVMTP 82 /* Secure VMTP */ #define IPPROTO_VINES 83 /* Banyon VINES */ #define IPPROTO_TTP 84 /* TTP */ #define IPPROTO_IGP 85 /* NSFNET-IGP */ #define IPPROTO_DGP 86 /* dissimilar gateway prot. */ #define IPPROTO_TCF 87 /* TCF */ #define IPPROTO_IGRP 88 /* Cisco/GXS IGRP */ #define IPPROTO_OSPFIGP 89 /* OSPFIGP */ #define IPPROTO_SRPC 90 /* Strite RPC protocol */ #define IPPROTO_LARP 91 /* Locus Address Resoloution */ #define IPPROTO_MTP 92 /* Multicast Transport */ #define IPPROTO_AX25 93 /* AX.25 Frames */ #define IPPROTO_IPEIP 94 /* IP encapsulated in IP */ #define IPPROTO_MICP 95 /* Mobile Int.ing control */ #define IPPROTO_SCCSP 96 /* Semaphore Comm. security */ #define IPPROTO_ETHERIP 97 /* Ethernet IP encapsulation */ #define IPPROTO_ENCAP 98 /* encapsulation header */ #define IPPROTO_APES 99 /* any private encr. scheme */ #define IPPROTO_GMTP 100 /* GMTP*/ #define IPPROTO_IPCOMP 108 /* payload compression (IPComp) */ #define IPPROTO_SCTP 132 /* SCTP */ #define IPPROTO_MH 135 /* IPv6 Mobility Header */ #define IPPROTO_UDPLITE 136 /* UDP-Lite */ #define IPPROTO_HIP 139 /* IP6 Host Identity Protocol */ #define IPPROTO_SHIM6 140 /* IP6 Shim6 Protocol */ /* 101-254: Partly Unassigned */ #define IPPROTO_PIM 103 /* Protocol Independent Mcast */ #define IPPROTO_CARP 112 /* CARP */ #define IPPROTO_PGM 113 /* PGM */ #define IPPROTO_MPLS 137 /* MPLS-in-IP */ #define IPPROTO_PFSYNC 240 /* PFSYNC */ #define IPPROTO_RESERVED_253 253 /* Reserved */ #define IPPROTO_RESERVED_254 254 /* Reserved */ /* 255: Reserved */ /* BSD Private, local use, namespace incursion, no longer used */ #define IPPROTO_OLD_DIVERT 254 /* OLD divert pseudo-proto */ #define IPPROTO_MAX 256 /* last return value of *_input(), meaning "all job for this pkt is done". */ #define IPPROTO_DONE 257 /* Only used internally, so can be outside the range of valid IP protocols. */ #define IPPROTO_DIVERT 258 /* divert pseudo-protocol */ #define IPPROTO_SEND 259 /* SeND pseudo-protocol */ /* * Defined to avoid confusion. The master value is defined by * PROTO_SPACER in sys/protosw.h. */ #define IPPROTO_SPACER 32767 /* spacer for loadable protos */ /* * Local port number conventions: * * When a user does a bind(2) or connect(2) with a port number of zero, * a non-conflicting local port address is chosen. * The default range is IPPORT_HIFIRSTAUTO through * IPPORT_HILASTAUTO, although that is settable by sysctl. * * A user may set the IPPROTO_IP option IP_PORTRANGE to change this * default assignment range. * * The value IP_PORTRANGE_DEFAULT causes the default behavior. * * The value IP_PORTRANGE_HIGH changes the range of candidate port numbers * into the "high" range. These are reserved for client outbound connections * which do not want to be filtered by any firewalls. * * The value IP_PORTRANGE_LOW changes the range to the "low" are * that is (by convention) restricted to privileged processes. This * convention is based on "vouchsafe" principles only. It is only secure * if you trust the remote host to restrict these ports. * * The default range of ports and the high range can be changed by * sysctl(3). (net.inet.ip.portrange.{hi,low,}{first,last}) * * Changing those values has bad security implications if you are * using a stateless firewall that is allowing packets outside of that * range in order to allow transparent outgoing connections. * * Such a firewall configuration will generally depend on the use of these * default values. If you change them, you may find your Security * Administrator looking for you with a heavy object. * * For a slightly more orthodox text view on this: * * ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers * * port numbers are divided into three ranges: * * 0 - 1023 Well Known Ports * 1024 - 49151 Registered Ports * 49152 - 65535 Dynamic and/or Private Ports * */ /* * Ports < IPPORT_RESERVED are reserved for * privileged processes (e.g. root). (IP_PORTRANGE_LOW) */ #define IPPORT_RESERVED 1024 /* * Default local port range, used by IP_PORTRANGE_DEFAULT */ #define IPPORT_EPHEMERALFIRST 10000 #define IPPORT_EPHEMERALLAST 65535 /* * Dynamic port range, used by IP_PORTRANGE_HIGH. */ #define IPPORT_HIFIRSTAUTO 49152 #define IPPORT_HILASTAUTO 65535 /* * Scanning for a free reserved port return a value below IPPORT_RESERVED, * but higher than IPPORT_RESERVEDSTART. Traditionally the start value was * 512, but that conflicts with some well-known-services that firewalls may * have a fit if we use. */ #define IPPORT_RESERVEDSTART 600 #define IPPORT_MAX 65535 /* * Historical definitions of bits in internet address integers * (pre-CIDR). Class A/B/C are long obsolete, and now deprecated. * Hide these definitions from the kernel unless IN_HISTORICAL_NETS * is defined. Provide the historical definitions to user level for now. */ #ifndef _KERNEL #define IN_HISTORICAL_NETS #endif #ifdef IN_HISTORICAL_NETS #define IN_CLASSA(i) (((in_addr_t)(i) & 0x80000000) == 0) #define IN_CLASSA_NET 0xff000000 #define IN_CLASSA_NSHIFT 24 #define IN_CLASSA_HOST 0x00ffffff #define IN_CLASSA_MAX 128 #define IN_CLASSB(i) (((in_addr_t)(i) & 0xc0000000) == 0x80000000) #define IN_CLASSB_NET 0xffff0000 #define IN_CLASSB_NSHIFT 16 #define IN_CLASSB_HOST 0x0000ffff #define IN_CLASSB_MAX 65536 #define IN_CLASSC(i) (((in_addr_t)(i) & 0xe0000000) == 0xc0000000) #define IN_CLASSC_NET 0xffffff00 #define IN_CLASSC_NSHIFT 8 #define IN_CLASSC_HOST 0x000000ff #endif /* IN_HISTORICAL_NETS */ #define IN_NETMASK_DEFAULT 0xffffff00 /* mask when forced to guess */ #define IN_MULTICAST(i) (((in_addr_t)(i) & 0xf0000000) == 0xe0000000) #ifdef IN_HISTORICAL_NETS #define IN_CLASSD(i) IN_MULTICAST(i) #define IN_CLASSD_NET 0xf0000000 /* These ones aren't really */ #define IN_CLASSD_NSHIFT 28 /* net and host fields, but */ #define IN_CLASSD_HOST 0x0fffffff /* routing needn't know. */ #endif /* IN_HISTORICAL_NETS */ #define IN_EXPERIMENTAL(i) (((in_addr_t)(i) & 0xf0000000) == 0xf0000000) #define IN_BADCLASS(i) (((in_addr_t)(i) & 0xf0000000) == 0xf0000000) #define IN_LINKLOCAL(i) (((in_addr_t)(i) & 0xffff0000) == 0xa9fe0000) +#ifdef _KERNEL +#define IN_LOOPBACK(i) \ + (((in_addr_t)(i) & V_in_loopback_mask) == 0x7f000000) +#define IN_LOOPBACK_MASK_DFLT 0xff000000 +#else #define IN_LOOPBACK(i) (((in_addr_t)(i) & 0xff000000) == 0x7f000000) +#endif #define IN_ZERONET(i) (((in_addr_t)(i) & 0xff000000) == 0) #define IN_PRIVATE(i) ((((in_addr_t)(i) & 0xff000000) == 0x0a000000) || \ (((in_addr_t)(i) & 0xfff00000) == 0xac100000) || \ (((in_addr_t)(i) & 0xffff0000) == 0xc0a80000)) #define IN_LOCAL_GROUP(i) (((in_addr_t)(i) & 0xffffff00) == 0xe0000000) #define IN_ANY_LOCAL(i) (IN_LINKLOCAL(i) || IN_LOCAL_GROUP(i)) #define INADDR_LOOPBACK ((in_addr_t)0x7f000001) #ifndef _KERNEL #define INADDR_NONE ((in_addr_t)0xffffffff) /* -1 return */ #endif #define INADDR_UNSPEC_GROUP ((in_addr_t)0xe0000000) /* 224.0.0.0 */ #define INADDR_ALLHOSTS_GROUP ((in_addr_t)0xe0000001) /* 224.0.0.1 */ #define INADDR_ALLRTRS_GROUP ((in_addr_t)0xe0000002) /* 224.0.0.2 */ #define INADDR_ALLRPTS_GROUP ((in_addr_t)0xe0000016) /* 224.0.0.22, IGMPv3 */ #define INADDR_CARP_GROUP ((in_addr_t)0xe0000012) /* 224.0.0.18 */ #define INADDR_PFSYNC_GROUP ((in_addr_t)0xe00000f0) /* 224.0.0.240 */ #define INADDR_ALLMDNS_GROUP ((in_addr_t)0xe00000fb) /* 224.0.0.251 */ #define INADDR_MAX_LOCAL_GROUP ((in_addr_t)0xe00000ff) /* 224.0.0.255 */ #ifdef IN_HISTORICAL_NETS #define IN_LOOPBACKNET 127 /* official! */ #endif /* IN_HISTORICAL_NETS */ #define IN_RFC3021_MASK ((in_addr_t)0xfffffffe) +#ifdef _KERNEL +#include + +VNET_DECLARE(bool, ip_allow_net0); +VNET_DECLARE(bool, ip_allow_net240); +/* Address space reserved for loopback */ +VNET_DECLARE(uint32_t, in_loopback_mask); +#define V_ip_allow_net0 VNET(ip_allow_net0) +#define V_ip_allow_net240 VNET(ip_allow_net240) +#define V_in_loopback_mask VNET(in_loopback_mask) +#endif + /* * Options for use with [gs]etsockopt at the IP level. * First word of comment is data type; bool is stored in int. */ #define IP_OPTIONS 1 /* buf/ip_opts; set/get IP options */ #define IP_HDRINCL 2 /* int; header is included with data */ #define IP_TOS 3 /* int; IP type of service and preced. */ #define IP_TTL 4 /* int; IP time to live */ #define IP_RECVOPTS 5 /* bool; receive all IP opts w/dgram */ #define IP_RECVRETOPTS 6 /* bool; receive IP opts for response */ #define IP_RECVDSTADDR 7 /* bool; receive IP dst addr w/dgram */ #define IP_SENDSRCADDR IP_RECVDSTADDR /* cmsg_type to set src addr */ #define IP_RETOPTS 8 /* ip_opts; set/get IP options */ #define IP_MULTICAST_IF 9 /* struct in_addr *or* struct ip_mreqn; * set/get IP multicast i/f */ #define IP_MULTICAST_TTL 10 /* u_char; set/get IP multicast ttl */ #define IP_MULTICAST_LOOP 11 /* u_char; set/get IP multicast loopback */ #define IP_ADD_MEMBERSHIP 12 /* ip_mreq; add an IP group membership */ #define IP_DROP_MEMBERSHIP 13 /* ip_mreq; drop an IP group membership */ #define IP_MULTICAST_VIF 14 /* set/get IP mcast virt. iface */ #define IP_RSVP_ON 15 /* enable RSVP in kernel */ #define IP_RSVP_OFF 16 /* disable RSVP in kernel */ #define IP_RSVP_VIF_ON 17 /* set RSVP per-vif socket */ #define IP_RSVP_VIF_OFF 18 /* unset RSVP per-vif socket */ #define IP_PORTRANGE 19 /* int; range to choose for unspec port */ #define IP_RECVIF 20 /* bool; receive reception if w/dgram */ /* for IPSEC */ #define IP_IPSEC_POLICY 21 /* int; set/get security policy */ /* unused; was IP_FAITH */ #define IP_ONESBCAST 23 /* bool: send all-ones broadcast */ #define IP_BINDANY 24 /* bool: allow bind to any address */ #define IP_BINDMULTI 25 /* bool: allow multiple listeners on a tuple */ #define IP_RSS_LISTEN_BUCKET 26 /* int; set RSS listen bucket */ #define IP_ORIGDSTADDR 27 /* bool: receive IP dst addr/port w/dgram */ #define IP_RECVORIGDSTADDR IP_ORIGDSTADDR /* * Options for controlling the firewall and dummynet. * Historical options (from 40 to 64) will eventually be * replaced by only two options, IP_FW3 and IP_DUMMYNET3. */ #define IP_FW_TABLE_ADD 40 /* add entry */ #define IP_FW_TABLE_DEL 41 /* delete entry */ #define IP_FW_TABLE_FLUSH 42 /* flush table */ #define IP_FW_TABLE_GETSIZE 43 /* get table size */ #define IP_FW_TABLE_LIST 44 /* list table contents */ #define IP_FW3 48 /* generic ipfw v.3 sockopts */ #define IP_DUMMYNET3 49 /* generic dummynet v.3 sockopts */ #define IP_FW_ADD 50 /* add a firewall rule to chain */ #define IP_FW_DEL 51 /* delete a firewall rule from chain */ #define IP_FW_FLUSH 52 /* flush firewall rule chain */ #define IP_FW_ZERO 53 /* clear single/all firewall counter(s) */ #define IP_FW_GET 54 /* get entire firewall rule chain */ #define IP_FW_RESETLOG 55 /* reset logging counters */ #define IP_FW_NAT_CFG 56 /* add/config a nat rule */ #define IP_FW_NAT_DEL 57 /* delete a nat rule */ #define IP_FW_NAT_GET_CONFIG 58 /* get configuration of a nat rule */ #define IP_FW_NAT_GET_LOG 59 /* get log of a nat rule */ #define IP_DUMMYNET_CONFIGURE 60 /* add/configure a dummynet pipe */ #define IP_DUMMYNET_DEL 61 /* delete a dummynet pipe from chain */ #define IP_DUMMYNET_FLUSH 62 /* flush dummynet */ #define IP_DUMMYNET_GET 64 /* get entire dummynet pipes */ #define IP_RECVTTL 65 /* bool; receive IP TTL w/dgram */ #define IP_MINTTL 66 /* minimum TTL for packet or drop */ #define IP_DONTFRAG 67 /* don't fragment packet */ #define IP_RECVTOS 68 /* bool; receive IP TOS w/dgram */ /* IPv4 Source Filter Multicast API [RFC3678] */ #define IP_ADD_SOURCE_MEMBERSHIP 70 /* join a source-specific group */ #define IP_DROP_SOURCE_MEMBERSHIP 71 /* drop a single source */ #define IP_BLOCK_SOURCE 72 /* block a source */ #define IP_UNBLOCK_SOURCE 73 /* unblock a source */ /* The following option is private; do not use it from user applications. */ #define IP_MSFILTER 74 /* set/get filter list */ /* The following option deals with the 802.1Q Ethernet Priority Code Point */ #define IP_VLAN_PCP 75 /* int; set/get PCP used for packet, */ /* -1 use interface default */ /* Protocol Independent Multicast API [RFC3678] */ #define MCAST_JOIN_GROUP 80 /* join an any-source group */ #define MCAST_LEAVE_GROUP 81 /* leave all sources for group */ #define MCAST_JOIN_SOURCE_GROUP 82 /* join a source-specific group */ #define MCAST_LEAVE_SOURCE_GROUP 83 /* leave a single source */ #define MCAST_BLOCK_SOURCE 84 /* block a source */ #define MCAST_UNBLOCK_SOURCE 85 /* unblock a source */ /* Flow and RSS definitions */ #define IP_FLOWID 90 /* get flow id for the given socket/inp */ #define IP_FLOWTYPE 91 /* get flow type (M_HASHTYPE) */ #define IP_RSSBUCKETID 92 /* get RSS flowid -> bucket mapping */ #define IP_RECVFLOWID 93 /* bool; receive IP flowid/flowtype w/ datagram */ #define IP_RECVRSSBUCKETID 94 /* bool; receive IP RSS bucket id w/ datagram */ /* * Defaults and limits for options */ #define IP_DEFAULT_MULTICAST_TTL 1 /* normally limit m'casts to 1 hop */ #define IP_DEFAULT_MULTICAST_LOOP 1 /* normally hear sends if a member */ /* * Limit for IPv4 multicast memberships */ #define IP_MAX_MEMBERSHIPS 4095 /* * Default resource limits for IPv4 multicast source filtering. * These may be modified by sysctl. */ #define IP_MAX_GROUP_SRC_FILTER 512 /* sources per group */ #define IP_MAX_SOCK_SRC_FILTER 128 /* sources per socket/group */ #define IP_MAX_SOCK_MUTE_FILTER 128 /* XXX no longer used */ /* * Argument structure for IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP. */ struct ip_mreq { struct in_addr imr_multiaddr; /* IP multicast address of group */ struct in_addr imr_interface; /* local IP address of interface */ }; /* * Modified argument structure for IP_MULTICAST_IF, obtained from Linux. * This is used to specify an interface index for multicast sends, as * the IPv4 legacy APIs do not support this (unless IP_SENDIF is available). */ struct ip_mreqn { struct in_addr imr_multiaddr; /* IP multicast address of group */ struct in_addr imr_address; /* local IP address of interface */ int imr_ifindex; /* Interface index; cast to uint32_t */ }; /* * Argument structure for IPv4 Multicast Source Filter APIs. [RFC3678] */ struct ip_mreq_source { struct in_addr imr_multiaddr; /* IP multicast address of group */ struct in_addr imr_sourceaddr; /* IP address of source */ struct in_addr imr_interface; /* local IP address of interface */ }; /* * Argument structures for Protocol-Independent Multicast Source * Filter APIs. [RFC3678] */ struct group_req { uint32_t gr_interface; /* interface index */ struct sockaddr_storage gr_group; /* group address */ }; struct group_source_req { uint32_t gsr_interface; /* interface index */ struct sockaddr_storage gsr_group; /* group address */ struct sockaddr_storage gsr_source; /* source address */ }; #ifndef __MSFILTERREQ_DEFINED #define __MSFILTERREQ_DEFINED /* * The following structure is private; do not use it from user applications. * It is used to communicate IP_MSFILTER/IPV6_MSFILTER information between * the RFC 3678 libc functions and the kernel. */ struct __msfilterreq { uint32_t msfr_ifindex; /* interface index */ uint32_t msfr_fmode; /* filter mode for group */ uint32_t msfr_nsrcs; /* # of sources in msfr_srcs */ struct sockaddr_storage msfr_group; /* group address */ struct sockaddr_storage *msfr_srcs; /* pointer to the first member * of a contiguous array of * sources to filter in full. */ }; #endif struct sockaddr; /* * Advanced (Full-state) APIs [RFC3678] * The RFC specifies uint_t for the 6th argument to [sg]etsourcefilter(). * We use uint32_t here to be consistent. */ int setipv4sourcefilter(int, struct in_addr, struct in_addr, uint32_t, uint32_t, struct in_addr *); int getipv4sourcefilter(int, struct in_addr, struct in_addr, uint32_t *, uint32_t *, struct in_addr *); int setsourcefilter(int, uint32_t, struct sockaddr *, socklen_t, uint32_t, uint32_t, struct sockaddr_storage *); int getsourcefilter(int, uint32_t, struct sockaddr *, socklen_t, uint32_t *, uint32_t *, struct sockaddr_storage *); /* * Filter modes; also used to represent per-socket filter mode internally. */ #define MCAST_UNDEFINED 0 /* fmode: not yet defined */ #define MCAST_INCLUDE 1 /* fmode: include these source(s) */ #define MCAST_EXCLUDE 2 /* fmode: exclude these source(s) */ /* * Argument for IP_PORTRANGE: * - which range to search when port is unspecified at bind() or connect() */ #define IP_PORTRANGE_DEFAULT 0 /* default range */ #define IP_PORTRANGE_HIGH 1 /* "high" - request firewall bypass */ #define IP_PORTRANGE_LOW 2 /* "low" - vouchsafe security */ /* * Identifiers for IP sysctl nodes */ #define IPCTL_FORWARDING 1 /* act as router */ #define IPCTL_SENDREDIRECTS 2 /* may send redirects when forwarding */ #define IPCTL_DEFTTL 3 /* default TTL */ #ifdef notyet #define IPCTL_DEFMTU 4 /* default MTU */ #endif /* IPCTL_RTEXPIRE 5 deprecated */ /* IPCTL_RTMINEXPIRE 6 deprecated */ /* IPCTL_RTMAXCACHE 7 deprecated */ #define IPCTL_SOURCEROUTE 8 /* may perform source routes */ #define IPCTL_DIRECTEDBROADCAST 9 /* may re-broadcast received packets */ #define IPCTL_INTRQMAXLEN 10 /* max length of netisr queue */ #define IPCTL_INTRQDROPS 11 /* number of netisr q drops */ #define IPCTL_STATS 12 /* ipstat structure */ #define IPCTL_ACCEPTSOURCEROUTE 13 /* may accept source routed packets */ #define IPCTL_FASTFORWARDING 14 /* use fast IP forwarding code */ /* 15, unused, was: IPCTL_KEEPFAITH */ #define IPCTL_GIF_TTL 16 /* default TTL for gif encap packet */ #define IPCTL_INTRDQMAXLEN 17 /* max length of direct netisr queue */ #define IPCTL_INTRDQDROPS 18 /* number of direct netisr q drops */ #endif /* __BSD_VISIBLE */ #ifdef _KERNEL struct ifnet; struct mbuf; /* forward declarations for Standard C */ struct in_ifaddr; int in_broadcast(struct in_addr, struct ifnet *); int in_ifaddr_broadcast(struct in_addr, struct in_ifaddr *); int in_canforward(struct in_addr); int in_localaddr(struct in_addr); bool in_localip(struct in_addr); bool in_localip_fib(struct in_addr, uint16_t); int in_ifhasaddr(struct ifnet *, struct in_addr); struct in_ifaddr *in_findlocal(uint32_t, bool); int inet_aton(const char *, struct in_addr *); /* in libkern */ char *inet_ntoa_r(struct in_addr ina, char *buf); /* in libkern */ char *inet_ntop(int, const void *, char *, socklen_t); /* in libkern */ int inet_pton(int af, const char *, void *); /* in libkern */ void in_ifdetach(struct ifnet *); #define in_hosteq(s, t) ((s).s_addr == (t).s_addr) #define in_nullhost(x) ((x).s_addr == INADDR_ANY) #define in_allhosts(x) ((x).s_addr == htonl(INADDR_ALLHOSTS_GROUP)) #define satosin(sa) ((struct sockaddr_in *)(sa)) #define sintosa(sin) ((struct sockaddr *)(sin)) #define ifatoia(ifa) ((struct in_ifaddr *)(ifa)) #endif /* _KERNEL */ /* INET6 stuff */ #if __POSIX_VISIBLE >= 200112 #define __KAME_NETINET_IN_H_INCLUDED_ #include #undef __KAME_NETINET_IN_H_INCLUDED_ #endif #endif /* !_NETINET_IN_H_*/ diff --git a/sys/netinet/ip_icmp.c b/sys/netinet/ip_icmp.c index 76bce4b38e24..a710cc2ba8cd 100644 --- a/sys/netinet/ip_icmp.c +++ b/sys/netinet/ip_icmp.c @@ -1,1158 +1,1158 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1988, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)ip_icmp.c 8.2 (Berkeley) 1/4/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif /* INET */ /* * ICMP routines: error generation, receive packet processing, and * routines to turnaround packets back to the originator, and * host table maintenance routines. */ VNET_DEFINE_STATIC(int, icmplim) = 200; #define V_icmplim VNET(icmplim) SYSCTL_INT(_net_inet_icmp, ICMPCTL_ICMPLIM, icmplim, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmplim), 0, "Maximum number of ICMP responses per second"); VNET_DEFINE_STATIC(int, icmplim_curr_jitter) = 0; #define V_icmplim_curr_jitter VNET(icmplim_curr_jitter) VNET_DEFINE_STATIC(int, icmplim_jitter) = 16; #define V_icmplim_jitter VNET(icmplim_jitter) SYSCTL_INT(_net_inet_icmp, OID_AUTO, icmplim_jitter, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmplim_jitter), 0, "Random icmplim jitter adjustment limit"); VNET_DEFINE_STATIC(int, icmplim_output) = 1; #define V_icmplim_output VNET(icmplim_output) SYSCTL_INT(_net_inet_icmp, OID_AUTO, icmplim_output, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmplim_output), 0, "Enable logging of ICMP response rate limiting"); #ifdef INET VNET_PCPUSTAT_DEFINE(struct icmpstat, icmpstat); VNET_PCPUSTAT_SYSINIT(icmpstat); SYSCTL_VNET_PCPUSTAT(_net_inet_icmp, ICMPCTL_STATS, stats, struct icmpstat, icmpstat, "ICMP statistics (struct icmpstat, netinet/icmp_var.h)"); #ifdef VIMAGE VNET_PCPUSTAT_SYSUNINIT(icmpstat); #endif /* VIMAGE */ VNET_DEFINE_STATIC(int, icmpmaskrepl) = 0; #define V_icmpmaskrepl VNET(icmpmaskrepl) SYSCTL_INT(_net_inet_icmp, ICMPCTL_MASKREPL, maskrepl, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmpmaskrepl), 0, "Reply to ICMP Address Mask Request packets"); VNET_DEFINE_STATIC(u_int, icmpmaskfake) = 0; #define V_icmpmaskfake VNET(icmpmaskfake) SYSCTL_UINT(_net_inet_icmp, OID_AUTO, maskfake, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmpmaskfake), 0, "Fake reply to ICMP Address Mask Request packets"); VNET_DEFINE(int, drop_redirect) = 0; #define V_drop_redirect VNET(drop_redirect) SYSCTL_INT(_net_inet_icmp, OID_AUTO, drop_redirect, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(drop_redirect), 0, "Ignore ICMP redirects"); VNET_DEFINE_STATIC(int, log_redirect) = 0; #define V_log_redirect VNET(log_redirect) SYSCTL_INT(_net_inet_icmp, OID_AUTO, log_redirect, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(log_redirect), 0, "Log ICMP redirects to the console"); VNET_DEFINE_STATIC(int, redirtimeout) = 60 * 10; /* 10 minutes */ #define V_redirtimeout VNET(redirtimeout) SYSCTL_INT(_net_inet_icmp, OID_AUTO, redirtimeout, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(redirtimeout), 0, "Delay in seconds before expiring redirect route"); VNET_DEFINE_STATIC(char, reply_src[IFNAMSIZ]); #define V_reply_src VNET(reply_src) SYSCTL_STRING(_net_inet_icmp, OID_AUTO, reply_src, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(reply_src), IFNAMSIZ, "ICMP reply source for non-local packets"); VNET_DEFINE_STATIC(int, icmp_rfi) = 0; #define V_icmp_rfi VNET(icmp_rfi) SYSCTL_INT(_net_inet_icmp, OID_AUTO, reply_from_interface, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmp_rfi), 0, "ICMP reply from incoming interface for non-local packets"); /* Router requirements RFC 1812 section 4.3.2.3 requires 576 - 28. */ VNET_DEFINE_STATIC(int, icmp_quotelen) = 548; #define V_icmp_quotelen VNET(icmp_quotelen) SYSCTL_INT(_net_inet_icmp, OID_AUTO, quotelen, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmp_quotelen), 0, "Number of bytes from original packet to quote in ICMP reply"); VNET_DEFINE_STATIC(int, icmpbmcastecho) = 0; #define V_icmpbmcastecho VNET(icmpbmcastecho) SYSCTL_INT(_net_inet_icmp, OID_AUTO, bmcastecho, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmpbmcastecho), 0, "Reply to multicast ICMP Echo Request and Timestamp packets"); VNET_DEFINE_STATIC(int, icmptstamprepl) = 1; #define V_icmptstamprepl VNET(icmptstamprepl) SYSCTL_INT(_net_inet_icmp, OID_AUTO, tstamprepl, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmptstamprepl), 0, "Respond to ICMP Timestamp packets"); VNET_DEFINE_STATIC(int, error_keeptags) = 0; #define V_error_keeptags VNET(error_keeptags) SYSCTL_INT(_net_inet_icmp, OID_AUTO, error_keeptags, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(error_keeptags), 0, "ICMP error response keeps copy of mbuf_tags of original packet"); #ifdef ICMPPRINTFS int icmpprintfs = 0; #endif static void icmp_reflect(struct mbuf *); static void icmp_send(struct mbuf *, struct mbuf *); static int icmp_verify_redirect_gateway(struct sockaddr_in *, struct sockaddr_in *, struct sockaddr_in *, u_int); extern struct protosw inetsw[]; /* * Kernel module interface for updating icmpstat. The argument is an index * into icmpstat treated as an array of u_long. While this encodes the * general layout of icmpstat into the caller, it doesn't encode its * location, so that future changes to add, for example, per-CPU stats * support won't cause binary compatibility problems for kernel modules. */ void kmod_icmpstat_inc(int statnum) { counter_u64_add(VNET(icmpstat)[statnum], 1); } /* * Generate an error packet of type error * in response to bad packet ip. */ void icmp_error(struct mbuf *n, int type, int code, uint32_t dest, int mtu) { struct ip *oip, *nip; struct icmp *icp; struct mbuf *m; unsigned icmplen, icmpelen, nlen, oiphlen; KASSERT((u_int)type <= ICMP_MAXTYPE, ("%s: illegal ICMP type", __func__)); if (type != ICMP_REDIRECT) ICMPSTAT_INC(icps_error); /* * Don't send error: * if the original packet was encrypted. * if not the first fragment of message. * in response to a multicast or broadcast packet. * if the old packet protocol was an ICMP error message. */ if (n->m_flags & M_DECRYPTED) goto freeit; if (n->m_flags & (M_BCAST|M_MCAST)) goto freeit; /* Drop if IP header plus 8 bytes is not contiguous in first mbuf. */ if (n->m_len < sizeof(struct ip) + ICMP_MINLEN) goto freeit; oip = mtod(n, struct ip *); oiphlen = oip->ip_hl << 2; if (n->m_len < oiphlen + ICMP_MINLEN) goto freeit; #ifdef ICMPPRINTFS if (icmpprintfs) printf("icmp_error(%p, %x, %d)\n", oip, type, code); #endif if (oip->ip_off & htons(~(IP_MF|IP_DF))) goto freeit; if (oip->ip_p == IPPROTO_ICMP && type != ICMP_REDIRECT && !ICMP_INFOTYPE(((struct icmp *)((caddr_t)oip + oiphlen))->icmp_type)) { ICMPSTAT_INC(icps_oldicmp); goto freeit; } /* * Calculate length to quote from original packet and * prevent the ICMP mbuf from overflowing. * Unfortunately this is non-trivial since ip_forward() * sends us truncated packets. */ nlen = m_length(n, NULL); if (oip->ip_p == IPPROTO_TCP) { struct tcphdr *th; int tcphlen; if (oiphlen + sizeof(struct tcphdr) > n->m_len && n->m_next == NULL) goto stdreply; if (n->m_len < oiphlen + sizeof(struct tcphdr) && (n = m_pullup(n, oiphlen + sizeof(struct tcphdr))) == NULL) goto freeit; oip = mtod(n, struct ip *); th = mtodo(n, oiphlen); tcphlen = th->th_off << 2; if (tcphlen < sizeof(struct tcphdr)) goto freeit; if (ntohs(oip->ip_len) < oiphlen + tcphlen) goto freeit; if (oiphlen + tcphlen > n->m_len && n->m_next == NULL) goto stdreply; if (n->m_len < oiphlen + tcphlen && (n = m_pullup(n, oiphlen + tcphlen)) == NULL) goto freeit; oip = mtod(n, struct ip *); icmpelen = max(tcphlen, min(V_icmp_quotelen, ntohs(oip->ip_len) - oiphlen)); } else if (oip->ip_p == IPPROTO_SCTP) { struct sctphdr *sh; struct sctp_chunkhdr *ch; if (ntohs(oip->ip_len) < oiphlen + sizeof(struct sctphdr)) goto stdreply; if (oiphlen + sizeof(struct sctphdr) > n->m_len && n->m_next == NULL) goto stdreply; if (n->m_len < oiphlen + sizeof(struct sctphdr) && (n = m_pullup(n, oiphlen + sizeof(struct sctphdr))) == NULL) goto freeit; oip = mtod(n, struct ip *); icmpelen = max(sizeof(struct sctphdr), min(V_icmp_quotelen, ntohs(oip->ip_len) - oiphlen)); sh = mtodo(n, oiphlen); if (ntohl(sh->v_tag) == 0 && ntohs(oip->ip_len) >= oiphlen + sizeof(struct sctphdr) + 8 && (n->m_len >= oiphlen + sizeof(struct sctphdr) + 8 || n->m_next != NULL)) { if (n->m_len < oiphlen + sizeof(struct sctphdr) + 8 && (n = m_pullup(n, oiphlen + sizeof(struct sctphdr) + 8)) == NULL) goto freeit; oip = mtod(n, struct ip *); sh = mtodo(n, oiphlen); ch = (struct sctp_chunkhdr *)(sh + 1); if (ch->chunk_type == SCTP_INITIATION) { icmpelen = max(sizeof(struct sctphdr) + 8, min(V_icmp_quotelen, ntohs(oip->ip_len) - oiphlen)); } } } else stdreply: icmpelen = max(8, min(V_icmp_quotelen, ntohs(oip->ip_len) - oiphlen)); icmplen = min(oiphlen + icmpelen, nlen); if (icmplen < sizeof(struct ip)) goto freeit; if (MHLEN > sizeof(struct ip) + ICMP_MINLEN + icmplen) m = m_gethdr(M_NOWAIT, MT_DATA); else m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) goto freeit; #ifdef MAC mac_netinet_icmp_reply(n, m); #endif icmplen = min(icmplen, M_TRAILINGSPACE(m) - sizeof(struct ip) - ICMP_MINLEN); m_align(m, sizeof(struct ip) + ICMP_MINLEN + icmplen); m->m_data += sizeof(struct ip); m->m_len = ICMP_MINLEN + icmplen; /* XXX MRT make the outgoing packet use the same FIB * that was associated with the incoming packet */ M_SETFIB(m, M_GETFIB(n)); icp = mtod(m, struct icmp *); ICMPSTAT_INC(icps_outhist[type]); icp->icmp_type = type; if (type == ICMP_REDIRECT) icp->icmp_gwaddr.s_addr = dest; else { icp->icmp_void = 0; /* * The following assignments assume an overlay with the * just zeroed icmp_void field. */ if (type == ICMP_PARAMPROB) { icp->icmp_pptr = code; code = 0; } else if (type == ICMP_UNREACH && code == ICMP_UNREACH_NEEDFRAG && mtu) { icp->icmp_nextmtu = htons(mtu); } } icp->icmp_code = code; /* * Copy the quotation into ICMP message and * convert quoted IP header back to network representation. */ m_copydata(n, 0, icmplen, (caddr_t)&icp->icmp_ip); nip = &icp->icmp_ip; /* * Set up ICMP message mbuf and copy old IP header (without options * in front of ICMP message. * If the original mbuf was meant to bypass the firewall, the error * reply should bypass as well. */ m->m_flags |= n->m_flags & M_SKIP_FIREWALL; KASSERT(M_LEADINGSPACE(m) >= sizeof(struct ip), ("insufficient space for ip header")); m->m_data -= sizeof(struct ip); m->m_len += sizeof(struct ip); m->m_pkthdr.len = m->m_len; m->m_pkthdr.rcvif = n->m_pkthdr.rcvif; nip = mtod(m, struct ip *); bcopy((caddr_t)oip, (caddr_t)nip, sizeof(struct ip)); nip->ip_len = htons(m->m_len); nip->ip_v = IPVERSION; nip->ip_hl = 5; nip->ip_p = IPPROTO_ICMP; nip->ip_tos = 0; nip->ip_off = 0; if (V_error_keeptags) m_tag_copy_chain(m, n, M_NOWAIT); icmp_reflect(m); freeit: m_freem(n); } /* * Process a received ICMP message. */ int icmp_input(struct mbuf **mp, int *offp, int proto) { struct icmp *icp; struct in_ifaddr *ia; struct mbuf *m = *mp; struct ip *ip = mtod(m, struct ip *); struct sockaddr_in icmpsrc, icmpdst, icmpgw; int hlen = *offp; int icmplen = ntohs(ip->ip_len) - *offp; int i, code; void (*ctlfunc)(int, struct sockaddr *, void *); int fibnum; NET_EPOCH_ASSERT(); *mp = NULL; /* * Locate icmp structure in mbuf, and check * that not corrupted and of at least minimum length. */ #ifdef ICMPPRINTFS if (icmpprintfs) { char srcbuf[INET_ADDRSTRLEN]; char dstbuf[INET_ADDRSTRLEN]; printf("icmp_input from %s to %s, len %d\n", inet_ntoa_r(ip->ip_src, srcbuf), inet_ntoa_r(ip->ip_dst, dstbuf), icmplen); } #endif if (icmplen < ICMP_MINLEN) { ICMPSTAT_INC(icps_tooshort); goto freeit; } i = hlen + min(icmplen, ICMP_ADVLENMIN); if (m->m_len < i && (m = m_pullup(m, i)) == NULL) { ICMPSTAT_INC(icps_tooshort); return (IPPROTO_DONE); } ip = mtod(m, struct ip *); m->m_len -= hlen; m->m_data += hlen; icp = mtod(m, struct icmp *); if (in_cksum(m, icmplen)) { ICMPSTAT_INC(icps_checksum); goto freeit; } m->m_len += hlen; m->m_data -= hlen; #ifdef ICMPPRINTFS if (icmpprintfs) printf("icmp_input, type %d code %d\n", icp->icmp_type, icp->icmp_code); #endif /* * Message type specific processing. */ if (icp->icmp_type > ICMP_MAXTYPE) goto raw; /* Initialize */ bzero(&icmpsrc, sizeof(icmpsrc)); icmpsrc.sin_len = sizeof(struct sockaddr_in); icmpsrc.sin_family = AF_INET; bzero(&icmpdst, sizeof(icmpdst)); icmpdst.sin_len = sizeof(struct sockaddr_in); icmpdst.sin_family = AF_INET; bzero(&icmpgw, sizeof(icmpgw)); icmpgw.sin_len = sizeof(struct sockaddr_in); icmpgw.sin_family = AF_INET; ICMPSTAT_INC(icps_inhist[icp->icmp_type]); code = icp->icmp_code; switch (icp->icmp_type) { case ICMP_UNREACH: switch (code) { case ICMP_UNREACH_NET: case ICMP_UNREACH_HOST: case ICMP_UNREACH_SRCFAIL: case ICMP_UNREACH_NET_UNKNOWN: case ICMP_UNREACH_HOST_UNKNOWN: case ICMP_UNREACH_ISOLATED: case ICMP_UNREACH_TOSNET: case ICMP_UNREACH_TOSHOST: case ICMP_UNREACH_HOST_PRECEDENCE: case ICMP_UNREACH_PRECEDENCE_CUTOFF: code = PRC_UNREACH_NET; break; case ICMP_UNREACH_NEEDFRAG: code = PRC_MSGSIZE; break; /* * RFC 1122, Sections 3.2.2.1 and 4.2.3.9. * Treat subcodes 2,3 as immediate RST */ case ICMP_UNREACH_PROTOCOL: code = PRC_UNREACH_PROTOCOL; break; case ICMP_UNREACH_PORT: code = PRC_UNREACH_PORT; break; case ICMP_UNREACH_NET_PROHIB: case ICMP_UNREACH_HOST_PROHIB: case ICMP_UNREACH_FILTER_PROHIB: code = PRC_UNREACH_ADMIN_PROHIB; break; default: goto badcode; } goto deliver; case ICMP_TIMXCEED: if (code > 1) goto badcode; code += PRC_TIMXCEED_INTRANS; goto deliver; case ICMP_PARAMPROB: if (code > 1) goto badcode; code = PRC_PARAMPROB; deliver: /* * Problem with datagram; advise higher level routines. */ if (icmplen < ICMP_ADVLENMIN || icmplen < ICMP_ADVLEN(icp) || icp->icmp_ip.ip_hl < (sizeof(struct ip) >> 2)) { ICMPSTAT_INC(icps_badlen); goto freeit; } /* Discard ICMP's in response to multicast packets */ if (IN_MULTICAST(ntohl(icp->icmp_ip.ip_dst.s_addr))) goto badcode; #ifdef ICMPPRINTFS if (icmpprintfs) printf("deliver to protocol %d\n", icp->icmp_ip.ip_p); #endif icmpsrc.sin_addr = icp->icmp_ip.ip_dst; /* * XXX if the packet contains [IPv4 AH TCP], we can't make a * notification to TCP layer. */ i = sizeof(struct ip) + min(icmplen, ICMP_ADVLENPREF(icp)); ip_stripoptions(m); if (m->m_len < i && (m = m_pullup(m, i)) == NULL) { /* This should actually not happen */ ICMPSTAT_INC(icps_tooshort); return (IPPROTO_DONE); } ip = mtod(m, struct ip *); icp = (struct icmp *)(ip + 1); /* * The upper layer handler can rely on: * - The outer IP header has no options. * - The outer IP header, the ICMP header, the inner IP header, * and the first n bytes of the inner payload are contiguous. * n is at least 8, but might be larger based on * ICMP_ADVLENPREF. See its definition in ip_icmp.h. */ ctlfunc = inetsw[ip_protox[icp->icmp_ip.ip_p]].pr_ctlinput; if (ctlfunc) (*ctlfunc)(code, (struct sockaddr *)&icmpsrc, (void *)&icp->icmp_ip); break; badcode: ICMPSTAT_INC(icps_badcode); break; case ICMP_ECHO: if (!V_icmpbmcastecho && (m->m_flags & (M_MCAST | M_BCAST)) != 0) { ICMPSTAT_INC(icps_bmcastecho); break; } if (badport_bandlim(BANDLIM_ICMP_ECHO) < 0) goto freeit; icp->icmp_type = ICMP_ECHOREPLY; goto reflect; case ICMP_TSTAMP: if (V_icmptstamprepl == 0) break; if (!V_icmpbmcastecho && (m->m_flags & (M_MCAST | M_BCAST)) != 0) { ICMPSTAT_INC(icps_bmcasttstamp); break; } if (icmplen < ICMP_TSLEN) { ICMPSTAT_INC(icps_badlen); break; } if (badport_bandlim(BANDLIM_ICMP_TSTAMP) < 0) goto freeit; icp->icmp_type = ICMP_TSTAMPREPLY; icp->icmp_rtime = iptime(); icp->icmp_ttime = icp->icmp_rtime; /* bogus, do later! */ goto reflect; case ICMP_MASKREQ: if (V_icmpmaskrepl == 0) break; /* * We are not able to respond with all ones broadcast * unless we receive it over a point-to-point interface. */ if (icmplen < ICMP_MASKLEN) break; switch (ip->ip_dst.s_addr) { case INADDR_BROADCAST: case INADDR_ANY: icmpdst.sin_addr = ip->ip_src; break; default: icmpdst.sin_addr = ip->ip_dst; } ia = (struct in_ifaddr *)ifaof_ifpforaddr( (struct sockaddr *)&icmpdst, m->m_pkthdr.rcvif); if (ia == NULL) break; if (ia->ia_ifp == NULL) break; icp->icmp_type = ICMP_MASKREPLY; if (V_icmpmaskfake == 0) icp->icmp_mask = ia->ia_sockmask.sin_addr.s_addr; else icp->icmp_mask = V_icmpmaskfake; if (ip->ip_src.s_addr == 0) { if (ia->ia_ifp->if_flags & IFF_BROADCAST) ip->ip_src = satosin(&ia->ia_broadaddr)->sin_addr; else if (ia->ia_ifp->if_flags & IFF_POINTOPOINT) ip->ip_src = satosin(&ia->ia_dstaddr)->sin_addr; } reflect: ICMPSTAT_INC(icps_reflect); ICMPSTAT_INC(icps_outhist[icp->icmp_type]); icmp_reflect(m); return (IPPROTO_DONE); case ICMP_REDIRECT: if (V_log_redirect) { u_long src, dst, gw; src = ntohl(ip->ip_src.s_addr); dst = ntohl(icp->icmp_ip.ip_dst.s_addr); gw = ntohl(icp->icmp_gwaddr.s_addr); printf("icmp redirect from %d.%d.%d.%d: " "%d.%d.%d.%d => %d.%d.%d.%d\n", (int)(src >> 24), (int)((src >> 16) & 0xff), (int)((src >> 8) & 0xff), (int)(src & 0xff), (int)(dst >> 24), (int)((dst >> 16) & 0xff), (int)((dst >> 8) & 0xff), (int)(dst & 0xff), (int)(gw >> 24), (int)((gw >> 16) & 0xff), (int)((gw >> 8) & 0xff), (int)(gw & 0xff)); } /* * RFC1812 says we must ignore ICMP redirects if we * are acting as router. */ if (V_drop_redirect || V_ipforwarding) break; if (code > 3) goto badcode; if (icmplen < ICMP_ADVLENMIN || icmplen < ICMP_ADVLEN(icp) || icp->icmp_ip.ip_hl < (sizeof(struct ip) >> 2)) { ICMPSTAT_INC(icps_badlen); break; } /* * Short circuit routing redirects to force * immediate change in the kernel's routing * tables. The message is also handed to anyone * listening on a raw socket (e.g. the routing * daemon for use in updating its tables). */ icmpgw.sin_addr = ip->ip_src; icmpdst.sin_addr = icp->icmp_gwaddr; #ifdef ICMPPRINTFS if (icmpprintfs) { char dstbuf[INET_ADDRSTRLEN]; char gwbuf[INET_ADDRSTRLEN]; printf("redirect dst %s to %s\n", inet_ntoa_r(icp->icmp_ip.ip_dst, dstbuf), inet_ntoa_r(icp->icmp_gwaddr, gwbuf)); } #endif icmpsrc.sin_addr = icp->icmp_ip.ip_dst; /* * RFC 1122 says network (code 0,2) redirects SHOULD * be treated identically to the host redirects. * Given that, ignore network masks. */ /* * Variable values: * icmpsrc: route destination * icmpdst: route gateway * icmpgw: message source */ if (icmp_verify_redirect_gateway(&icmpgw, &icmpsrc, &icmpdst, M_GETFIB(m)) != 0) { /* TODO: increment bad redirects here */ break; } for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) { rib_add_redirect(fibnum, (struct sockaddr *)&icmpsrc, (struct sockaddr *)&icmpdst, (struct sockaddr *)&icmpgw, m->m_pkthdr.rcvif, RTF_GATEWAY, V_redirtimeout); } pfctlinput(PRC_REDIRECT_HOST, (struct sockaddr *)&icmpsrc); break; /* * No kernel processing for the following; * just fall through to send to raw listener. */ case ICMP_ECHOREPLY: case ICMP_ROUTERADVERT: case ICMP_ROUTERSOLICIT: case ICMP_TSTAMPREPLY: case ICMP_IREQREPLY: case ICMP_MASKREPLY: case ICMP_SOURCEQUENCH: default: break; } raw: *mp = m; rip_input(mp, offp, proto); return (IPPROTO_DONE); freeit: m_freem(m); return (IPPROTO_DONE); } /* * Reflect the ip packet back to the source */ static void icmp_reflect(struct mbuf *m) { struct ip *ip = mtod(m, struct ip *); struct ifaddr *ifa; struct ifnet *ifp; struct in_ifaddr *ia; struct in_addr t; struct nhop_object *nh; struct mbuf *opts = NULL; int optlen = (ip->ip_hl << 2) - sizeof(struct ip); NET_EPOCH_ASSERT(); if (IN_MULTICAST(ntohl(ip->ip_src.s_addr)) || - IN_EXPERIMENTAL(ntohl(ip->ip_src.s_addr)) || - IN_ZERONET(ntohl(ip->ip_src.s_addr)) ) { + (IN_EXPERIMENTAL(ntohl(ip->ip_src.s_addr)) && !V_ip_allow_net240) || + (IN_ZERONET(ntohl(ip->ip_src.s_addr)) && !V_ip_allow_net0) ) { m_freem(m); /* Bad return address */ ICMPSTAT_INC(icps_badaddr); goto done; /* Ip_output() will check for broadcast */ } t = ip->ip_dst; ip->ip_dst = ip->ip_src; /* * Source selection for ICMP replies: * * If the incoming packet was addressed directly to one of our * own addresses, use dst as the src for the reply. */ CK_LIST_FOREACH(ia, INADDR_HASH(t.s_addr), ia_hash) { if (t.s_addr == IA_SIN(ia)->sin_addr.s_addr) { t = IA_SIN(ia)->sin_addr; goto match; } } /* * If the incoming packet was addressed to one of our broadcast * addresses, use the first non-broadcast address which corresponds * to the incoming interface. */ ifp = m->m_pkthdr.rcvif; if (ifp != NULL && ifp->if_flags & IFF_BROADCAST) { CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != AF_INET) continue; ia = ifatoia(ifa); if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr == t.s_addr) { t = IA_SIN(ia)->sin_addr; goto match; } } } /* * If the packet was transiting through us, use the address of * the interface the packet came through in. If that interface * doesn't have a suitable IP address, the normal selection * criteria apply. */ if (V_icmp_rfi && ifp != NULL) { CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != AF_INET) continue; ia = ifatoia(ifa); t = IA_SIN(ia)->sin_addr; goto match; } } /* * If the incoming packet was not addressed directly to us, use * designated interface for icmp replies specified by sysctl * net.inet.icmp.reply_src (default not set). Otherwise continue * with normal source selection. */ if (V_reply_src[0] != '\0' && (ifp = ifunit(V_reply_src))) { CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family != AF_INET) continue; ia = ifatoia(ifa); t = IA_SIN(ia)->sin_addr; goto match; } } /* * If the packet was transiting through us, use the address of * the interface that is the closest to the packet source. * When we don't have a route back to the packet source, stop here * and drop the packet. */ nh = fib4_lookup(M_GETFIB(m), ip->ip_dst, 0, NHR_NONE, 0); if (nh == NULL) { m_freem(m); ICMPSTAT_INC(icps_noroute); goto done; } t = IA_SIN(ifatoia(nh->nh_ifa))->sin_addr; match: #ifdef MAC mac_netinet_icmp_replyinplace(m); #endif ip->ip_src = t; ip->ip_ttl = V_ip_defttl; if (optlen > 0) { u_char *cp; int opt, cnt; u_int len; /* * Retrieve any source routing from the incoming packet; * add on any record-route or timestamp options. */ cp = (u_char *) (ip + 1); if ((opts = ip_srcroute(m)) == NULL && (opts = m_gethdr(M_NOWAIT, MT_DATA))) { opts->m_len = sizeof(struct in_addr); mtod(opts, struct in_addr *)->s_addr = 0; } if (opts) { #ifdef ICMPPRINTFS if (icmpprintfs) printf("icmp_reflect optlen %d rt %d => ", optlen, opts->m_len); #endif for (cnt = optlen; cnt > 0; cnt -= len, cp += len) { opt = cp[IPOPT_OPTVAL]; if (opt == IPOPT_EOL) break; if (opt == IPOPT_NOP) len = 1; else { if (cnt < IPOPT_OLEN + sizeof(*cp)) break; len = cp[IPOPT_OLEN]; if (len < IPOPT_OLEN + sizeof(*cp) || len > cnt) break; } /* * Should check for overflow, but it "can't happen" */ if (opt == IPOPT_RR || opt == IPOPT_TS || opt == IPOPT_SECURITY) { bcopy((caddr_t)cp, mtod(opts, caddr_t) + opts->m_len, len); opts->m_len += len; } } /* Terminate & pad, if necessary */ cnt = opts->m_len % 4; if (cnt) { for (; cnt < 4; cnt++) { *(mtod(opts, caddr_t) + opts->m_len) = IPOPT_EOL; opts->m_len++; } } #ifdef ICMPPRINTFS if (icmpprintfs) printf("%d\n", opts->m_len); #endif } ip_stripoptions(m); } m_tag_delete_nonpersistent(m); m->m_flags &= ~(M_BCAST|M_MCAST); icmp_send(m, opts); done: if (opts) (void)m_free(opts); } /* * Verifies if redirect message is valid, according to RFC 1122 * * @src: sockaddr with address of redirect originator * @dst: sockaddr with destination in question * @gateway: new proposed gateway * * Returns 0 on success. */ static int icmp_verify_redirect_gateway(struct sockaddr_in *src, struct sockaddr_in *dst, struct sockaddr_in *gateway, u_int fibnum) { struct nhop_object *nh; struct ifaddr *ifa; NET_EPOCH_ASSERT(); /* Verify the gateway is directly reachable. */ if ((ifa = ifa_ifwithnet((struct sockaddr *)gateway, 0, fibnum))==NULL) return (ENETUNREACH); /* TODO: fib-aware. */ if (ifa_ifwithaddr_check((struct sockaddr *)gateway)) return (EHOSTUNREACH); nh = fib4_lookup(fibnum, dst->sin_addr, 0, NHR_NONE, 0); if (nh == NULL) return (EINVAL); /* * If the redirect isn't from our current router for this dst, * it's either old or wrong. If it redirects us to ourselves, * we have a routing loop, perhaps as a result of an interface * going down recently. */ if (!sa_equal((struct sockaddr *)src, &nh->gw_sa)) return (EINVAL); if (nh->nh_ifa != ifa && ifa->ifa_addr->sa_family != AF_LINK) return (EINVAL); /* If host route already exists, ignore redirect. */ if (nh->nh_flags & NHF_HOST) return (EEXIST); /* If the prefix is directly reachable, ignore redirect. */ if (!(nh->nh_flags & NHF_GATEWAY)) return (EEXIST); return (0); } /* * Send an icmp packet back to the ip level, * after supplying a checksum. */ static void icmp_send(struct mbuf *m, struct mbuf *opts) { struct ip *ip = mtod(m, struct ip *); int hlen; struct icmp *icp; hlen = ip->ip_hl << 2; m->m_data += hlen; m->m_len -= hlen; icp = mtod(m, struct icmp *); icp->icmp_cksum = 0; icp->icmp_cksum = in_cksum(m, ntohs(ip->ip_len) - hlen); m->m_data -= hlen; m->m_len += hlen; m->m_pkthdr.rcvif = (struct ifnet *)0; #ifdef ICMPPRINTFS if (icmpprintfs) { char dstbuf[INET_ADDRSTRLEN]; char srcbuf[INET_ADDRSTRLEN]; printf("icmp_send dst %s src %s\n", inet_ntoa_r(ip->ip_dst, dstbuf), inet_ntoa_r(ip->ip_src, srcbuf)); } #endif (void) ip_output(m, opts, NULL, 0, NULL, NULL); } /* * Return milliseconds since 00:00 UTC in network format. */ uint32_t iptime(void) { struct timeval atv; u_long t; getmicrotime(&atv); t = (atv.tv_sec % (24*60*60)) * 1000 + atv.tv_usec / 1000; return (htonl(t)); } /* * Return the next larger or smaller MTU plateau (table from RFC 1191) * given current value MTU. If DIR is less than zero, a larger plateau * is returned; otherwise, a smaller value is returned. */ int ip_next_mtu(int mtu, int dir) { static int mtutab[] = { 65535, 32000, 17914, 8166, 4352, 2002, 1492, 1280, 1006, 508, 296, 68, 0 }; int i, size; size = (sizeof mtutab) / (sizeof mtutab[0]); if (dir >= 0) { for (i = 0; i < size; i++) if (mtu > mtutab[i]) return mtutab[i]; } else { for (i = size - 1; i >= 0; i--) if (mtu < mtutab[i]) return mtutab[i]; if (mtu == mtutab[0]) return mtutab[0]; } return 0; } #endif /* INET */ /* * badport_bandlim() - check for ICMP bandwidth limit * * Return 0 if it is ok to send an ICMP error response, -1 if we have * hit our bandwidth limit and it is not ok. * * If icmplim is <= 0, the feature is disabled and 0 is returned. * * For now we separate the TCP and UDP subsystems w/ different 'which' * values. We may eventually remove this separation (and simplify the * code further). * * Note that the printing of the error message is delayed so we can * properly print the icmp error rate that the system was trying to do * (i.e. 22000/100 pps, etc...). This can cause long delays in printing * the 'final' error, but it doesn't make sense to solve the printing * delay with more complex code. */ struct icmp_rate { const char *descr; struct counter_rate cr; }; VNET_DEFINE_STATIC(struct icmp_rate, icmp_rates[BANDLIM_MAX]) = { { "icmp unreach response" }, { "icmp ping response" }, { "icmp tstamp response" }, { "closed port RST response" }, { "open port RST response" }, { "icmp6 unreach response" }, { "sctp ootb response" } }; #define V_icmp_rates VNET(icmp_rates) static void icmp_bandlimit_init(void) { for (int i = 0; i < BANDLIM_MAX; i++) { V_icmp_rates[i].cr.cr_rate = counter_u64_alloc(M_WAITOK); V_icmp_rates[i].cr.cr_ticks = ticks; } } VNET_SYSINIT(icmp_bandlimit, SI_SUB_PROTO_DOMAIN, SI_ORDER_ANY, icmp_bandlimit_init, NULL); static void icmp_bandlimit_uninit(void) { for (int i = 0; i < BANDLIM_MAX; i++) counter_u64_free(V_icmp_rates[i].cr.cr_rate); } VNET_SYSUNINIT(icmp_bandlimit, SI_SUB_PROTO_DOMAIN, SI_ORDER_THIRD, icmp_bandlimit_uninit, NULL); int badport_bandlim(int which) { int64_t pps; if (V_icmplim == 0 || which == BANDLIM_UNLIMITED) return (0); KASSERT(which >= 0 && which < BANDLIM_MAX, ("%s: which %d", __func__, which)); if ((V_icmplim + V_icmplim_curr_jitter) <= 0) V_icmplim_curr_jitter = -V_icmplim + 1; pps = counter_ratecheck(&V_icmp_rates[which].cr, V_icmplim + V_icmplim_curr_jitter); if (pps > 0) { /* * Adjust limit +/- to jitter the measurement to deny a * side-channel port scan as in CVE-2020-25705 */ if (V_icmplim_jitter > 0) { int32_t inc = arc4random_uniform(V_icmplim_jitter * 2 +1) - V_icmplim_jitter; V_icmplim_curr_jitter = inc; } } if (pps == -1) return (-1); if (pps > 0 && V_icmplim_output) log(LOG_NOTICE, "Limiting %s from %jd to %d packets/sec\n", V_icmp_rates[which].descr, (intmax_t )pps, V_icmplim + V_icmplim_curr_jitter); return (0); }