Index: stable/12/share/man/man4/gif.4 =================================================================== --- stable/12/share/man/man4/gif.4 (revision 340534) +++ stable/12/share/man/man4/gif.4 (revision 340535) @@ -1,240 +1,231 @@ .\" $KAME: gif.4,v 1.28 2001/05/18 13:15:56 itojun Exp $ .\" .\" Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the project nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd June 5, 2018 +.Dd October 21, 2018 .Dt GIF 4 .Os .Sh NAME .Nm gif .Nd generic tunnel interface .Sh SYNOPSIS .Cd "device gif" .Sh DESCRIPTION The .Nm interface is a generic tunnelling device for IPv4 and IPv6. It can tunnel IPv[46] traffic over IPv[46]. Therefore, there can be four possible configurations. The behavior of .Nm is mainly based on RFC2893 IPv6-over-IPv4 configured tunnel. On .Nx , .Nm can also tunnel ISO traffic over IPv[46] using EON encapsulation. Note that .Nm does not perform GRE encapsulation; use .Xr gre 4 for GRE encapsulation. .Pp Each .Nm interface is created at runtime using interface cloning. This is most easily done with the .Dq Nm ifconfig Cm create command or using the .Va ifconfig_ Ns Aq Ar interface variable in .Xr rc.conf 5 . .Pp To use .Nm , the administrator needs to configure the protocol and addresses used for the outer header. This can be done by using .Xr ifconfig 8 .Cm tunnel , or .Dv SIOCSIFPHYADDR ioctl. The administrator also needs to configure the protocol and addresses for the inner header, with .Xr ifconfig 8 . Note that IPv6 link-local addresses (those that start with .Li fe80:: ) will be automatically configured whenever possible. You may need to remove IPv6 link-local addresses manually using .Xr ifconfig 8 , if you want to disable the use of IPv6 as the inner header (for example, if you need a pure IPv4-over-IPv6 tunnel). Finally, you must modify the routing table to route the packets through the .Nm interface. .Pp The .Nm device can be configured to be ECN friendly. This can be configured by .Dv IFF_LINK1 . .Ss ECN friendly behavior The .Nm device can be configured to be ECN friendly, as described in .Dv draft-ietf-ipsec-ecn-02.txt . This is turned off by default, and can be turned on by the .Dv IFF_LINK1 interface flag. .Pp Without .Dv IFF_LINK1 , .Nm will show normal behavior, as described in RFC2893. This can be summarized as follows: .Bl -tag -width "Ingress" -offset indent .It Ingress Set outer TOS bit to .Dv 0 . .It Egress Drop outer TOS bit. .El .Pp With .Dv IFF_LINK1 , .Nm will copy ECN bits .Dv ( 0x02 and .Dv 0x01 on IPv4 TOS byte or IPv6 traffic class byte) on egress and ingress, as follows: .Bl -tag -width "Ingress" -offset indent .It Ingress Copy TOS bits except for ECN CE (masked with .Dv 0xfe ) from inner to outer. Set ECN CE bit to .Dv 0 . .It Egress Use inner TOS bits with some change. If outer ECN CE bit is .Dv 1 , enable ECN CE bit on the inner. .El .Pp Note that the ECN friendly behavior violates RFC2893. This should be used in mutual agreement with the peer. .Ss Security A malicious party may try to circumvent security filters by using tunnelled packets. For better protection, .Nm performs both martian and ingress filtering against the outer source address on egress. Note that martian/ingress filters are in no way complete. You may want to secure your node by using packet filters. Ingress filtering can break tunnel operation in an asymmetrically routed network. It can be turned off by .Dv IFF_LINK2 bit. .Ss Miscellaneous By default, .Nm tunnels may not be nested. This behavior may be modified at runtime by setting the .Xr sysctl 8 variable .Va net.link.gif.max_nesting to the desired level of nesting. .Sh SEE ALSO .Xr gre 4 , .Xr inet 4 , .Xr inet6 4 , .Xr ifconfig 8 .Rs .%A R. Gilligan .%A E. Nordmark .%B RFC2893 .%T Transition Mechanisms for IPv6 Hosts and Routers .%D August 2000 .%U http://tools.ietf.org/html/rfc2893 .Re .Rs .%A Sally Floyd .%A David L. Black .%A K. K. Ramakrishnan .%T "IPsec Interactions with ECN" .%D December 1999 .%O draft-ietf-ipsec-ecn-02.txt .Re .\" .Sh HISTORY The .Nm device first appeared in the WIDE hydrangea IPv6 kit. .\" .Sh BUGS There are many tunnelling protocol specifications, all defined differently from each other. The .Nm device may not interoperate with peers which are based on different specifications, and are picky about outer header fields. For example, you cannot usually use .Nm to talk with IPsec devices that use IPsec tunnel mode. -.Pp -The current code does not check if the ingress address -(outer source address) -configured in the -.Nm -interface makes sense. -Make sure to specify an address which belongs to your node. -Otherwise, your node will not be able to receive packets from the peer, -and it will generate packets with a spoofed source address. .Pp If the outer protocol is IPv4, .Nm does not try to perform path MTU discovery for the encapsulated packet (DF bit is set to 0). .Pp If the outer protocol is IPv6, path MTU discovery for encapsulated packets may affect communication over the interface. The first bigger-than-pmtu packet may be lost. To avoid the problem, you may want to set the interface MTU for .Nm to 1240 or smaller, when the outer header is IPv6 and the inner header is IPv4. .Pp The .Nm device does not translate ICMP messages for the outer header into the inner header. .Pp In the past, .Nm had a multi-destination behavior, configurable via .Dv IFF_LINK0 flag. The behavior is obsolete and is no longer supported. Index: stable/12/sys/net/if_gif.c =================================================================== --- stable/12/sys/net/if_gif.c (revision 340534) +++ stable/12/sys/net/if_gif.c (revision 340535) @@ -1,701 +1,702 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * Copyright (c) 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: if_gif.c,v 1.87 2001/10/19 08:50:27 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif /* INET */ #ifdef INET6 #ifndef INET #include #endif #include #include #include #include #endif /* INET6 */ #include #include #include #include #include static const char gifname[] = "gif"; MALLOC_DEFINE(M_GIF, "gif", "Generic Tunnel Interface"); static struct sx gif_ioctl_sx; SX_SYSINIT(gif_ioctl_sx, &gif_ioctl_sx, "gif_ioctl"); void (*ng_gif_input_p)(struct ifnet *ifp, struct mbuf **mp, int af); void (*ng_gif_input_orphan_p)(struct ifnet *ifp, struct mbuf *m, int af); void (*ng_gif_attach_p)(struct ifnet *ifp); void (*ng_gif_detach_p)(struct ifnet *ifp); static void gif_delete_tunnel(struct gif_softc *); static int gif_ioctl(struct ifnet *, u_long, caddr_t); static int gif_transmit(struct ifnet *, struct mbuf *); static void gif_qflush(struct ifnet *); static int gif_clone_create(struct if_clone *, int, caddr_t); static void gif_clone_destroy(struct ifnet *); VNET_DEFINE_STATIC(struct if_clone *, gif_cloner); #define V_gif_cloner VNET(gif_cloner) SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, IFT_GIF, gif, CTLFLAG_RW, 0, "Generic Tunnel Interface"); #ifndef MAX_GIF_NEST /* * This macro controls the default upper limitation on nesting of gif tunnels. * Since, setting a large value to this macro with a careless configuration * may introduce system crash, we don't allow any nestings by default. * If you need to configure nested gif tunnels, you can define this macro * in your kernel configuration file. However, if you do so, please be * careful to configure the tunnels so that it won't make a loop. */ #define MAX_GIF_NEST 1 #endif VNET_DEFINE_STATIC(int, max_gif_nesting) = MAX_GIF_NEST; #define V_max_gif_nesting VNET(max_gif_nesting) SYSCTL_INT(_net_link_gif, OID_AUTO, max_nesting, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(max_gif_nesting), 0, "Max nested tunnels"); static int gif_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct gif_softc *sc; sc = malloc(sizeof(struct gif_softc), M_GIF, M_WAITOK | M_ZERO); sc->gif_fibnum = curthread->td_proc->p_fibnum; GIF2IFP(sc) = if_alloc(IFT_GIF); GIF2IFP(sc)->if_softc = sc; if_initname(GIF2IFP(sc), gifname, unit); GIF2IFP(sc)->if_addrlen = 0; GIF2IFP(sc)->if_mtu = GIF_MTU; GIF2IFP(sc)->if_flags = IFF_POINTOPOINT | IFF_MULTICAST; GIF2IFP(sc)->if_ioctl = gif_ioctl; GIF2IFP(sc)->if_transmit = gif_transmit; GIF2IFP(sc)->if_qflush = gif_qflush; GIF2IFP(sc)->if_output = gif_output; GIF2IFP(sc)->if_capabilities |= IFCAP_LINKSTATE; GIF2IFP(sc)->if_capenable |= IFCAP_LINKSTATE; if_attach(GIF2IFP(sc)); bpfattach(GIF2IFP(sc), DLT_NULL, sizeof(u_int32_t)); if (ng_gif_attach_p != NULL) (*ng_gif_attach_p)(GIF2IFP(sc)); return (0); } static void gif_clone_destroy(struct ifnet *ifp) { struct gif_softc *sc; sx_xlock(&gif_ioctl_sx); sc = ifp->if_softc; gif_delete_tunnel(sc); if (ng_gif_detach_p != NULL) (*ng_gif_detach_p)(ifp); bpfdetach(ifp); if_detach(ifp); ifp->if_softc = NULL; sx_xunlock(&gif_ioctl_sx); GIF_WAIT(); if_free(ifp); free(sc, M_GIF); } static void vnet_gif_init(const void *unused __unused) { V_gif_cloner = if_clone_simple(gifname, gif_clone_create, gif_clone_destroy, 0); #ifdef INET in_gif_init(); #endif #ifdef INET6 in6_gif_init(); #endif } VNET_SYSINIT(vnet_gif_init, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gif_init, NULL); static void vnet_gif_uninit(const void *unused __unused) { if_clone_detach(V_gif_cloner); #ifdef INET in_gif_uninit(); #endif #ifdef INET6 in6_gif_uninit(); #endif } VNET_SYSUNINIT(vnet_gif_uninit, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gif_uninit, NULL); static int gifmodevent(module_t mod, int type, void *data) { switch (type) { case MOD_LOAD: case MOD_UNLOAD: break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t gif_mod = { "if_gif", gifmodevent, 0 }; DECLARE_MODULE(if_gif, gif_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_gif, 1); struct gif_list * gif_hashinit(void) { struct gif_list *hash; int i; hash = malloc(sizeof(struct gif_list) * GIF_HASH_SIZE, M_GIF, M_WAITOK); for (i = 0; i < GIF_HASH_SIZE; i++) CK_LIST_INIT(&hash[i]); return (hash); } void gif_hashdestroy(struct gif_list *hash) { free(hash, M_GIF); } #define MTAG_GIF 1080679712 static int gif_transmit(struct ifnet *ifp, struct mbuf *m) { struct gif_softc *sc; struct etherip_header *eth; #ifdef INET struct ip *ip; #endif #ifdef INET6 struct ip6_hdr *ip6; uint32_t t; #endif uint32_t af; uint8_t proto, ecn; int error; GIF_RLOCK(); #ifdef MAC error = mac_ifnet_check_transmit(ifp, m); if (error) { m_freem(m); goto err; } #endif error = ENETDOWN; sc = ifp->if_softc; if ((ifp->if_flags & IFF_MONITOR) != 0 || (ifp->if_flags & IFF_UP) == 0 || + (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || sc->gif_family == 0 || (error = if_tunnel_check_nesting(ifp, m, MTAG_GIF, V_max_gif_nesting)) != 0) { m_freem(m); goto err; } /* Now pull back the af that we stashed in the csum_data. */ if (ifp->if_bridge) af = AF_LINK; else af = m->m_pkthdr.csum_data; m->m_flags &= ~(M_BCAST|M_MCAST); M_SETFIB(m, sc->gif_fibnum); BPF_MTAP2(ifp, &af, sizeof(af), m); if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_OBYTES, m->m_pkthdr.len); /* inner AF-specific encapsulation */ ecn = 0; switch (af) { #ifdef INET case AF_INET: proto = IPPROTO_IPV4; if (m->m_len < sizeof(struct ip)) m = m_pullup(m, sizeof(struct ip)); if (m == NULL) { error = ENOBUFS; goto err; } ip = mtod(m, struct ip *); ip_ecn_ingress((ifp->if_flags & IFF_LINK1) ? ECN_ALLOWED: ECN_NOCARE, &ecn, &ip->ip_tos); break; #endif #ifdef INET6 case AF_INET6: proto = IPPROTO_IPV6; if (m->m_len < sizeof(struct ip6_hdr)) m = m_pullup(m, sizeof(struct ip6_hdr)); if (m == NULL) { error = ENOBUFS; goto err; } t = 0; ip6 = mtod(m, struct ip6_hdr *); ip6_ecn_ingress((ifp->if_flags & IFF_LINK1) ? ECN_ALLOWED: ECN_NOCARE, &t, &ip6->ip6_flow); ecn = (ntohl(t) >> 20) & 0xff; break; #endif case AF_LINK: proto = IPPROTO_ETHERIP; M_PREPEND(m, sizeof(struct etherip_header), M_NOWAIT); if (m == NULL) { error = ENOBUFS; goto err; } eth = mtod(m, struct etherip_header *); eth->eip_resvh = 0; eth->eip_ver = ETHERIP_VERSION; eth->eip_resvl = 0; break; default: error = EAFNOSUPPORT; m_freem(m); goto err; } /* XXX should we check if our outer source is legal? */ /* dispatch to output logic based on outer AF */ switch (sc->gif_family) { #ifdef INET case AF_INET: error = in_gif_output(ifp, m, proto, ecn); break; #endif #ifdef INET6 case AF_INET6: error = in6_gif_output(ifp, m, proto, ecn); break; #endif default: m_freem(m); } err: if (error) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); GIF_RUNLOCK(); return (error); } static void gif_qflush(struct ifnet *ifp __unused) { } int gif_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro) { uint32_t af; if (dst->sa_family == AF_UNSPEC) bcopy(dst->sa_data, &af, sizeof(af)); else af = dst->sa_family; /* * Now save the af in the inbound pkt csum data, this is a cheat since * we are using the inbound csum_data field to carry the af over to * the gif_transmit() routine, avoiding using yet another mtag. */ m->m_pkthdr.csum_data = af; return (ifp->if_transmit(ifp, m)); } void gif_input(struct mbuf *m, struct ifnet *ifp, int proto, uint8_t ecn) { struct etherip_header *eip; #ifdef INET struct ip *ip; #endif #ifdef INET6 struct ip6_hdr *ip6; uint32_t t; #endif struct ether_header *eh; struct ifnet *oldifp; int isr, n, af; if (ifp == NULL) { /* just in case */ m_freem(m); return; } m->m_pkthdr.rcvif = ifp; m_clrprotoflags(m); switch (proto) { #ifdef INET case IPPROTO_IPV4: af = AF_INET; if (m->m_len < sizeof(struct ip)) m = m_pullup(m, sizeof(struct ip)); if (m == NULL) goto drop; ip = mtod(m, struct ip *); if (ip_ecn_egress((ifp->if_flags & IFF_LINK1) ? ECN_ALLOWED: ECN_NOCARE, &ecn, &ip->ip_tos) == 0) { m_freem(m); goto drop; } break; #endif #ifdef INET6 case IPPROTO_IPV6: af = AF_INET6; if (m->m_len < sizeof(struct ip6_hdr)) m = m_pullup(m, sizeof(struct ip6_hdr)); if (m == NULL) goto drop; t = htonl((uint32_t)ecn << 20); ip6 = mtod(m, struct ip6_hdr *); if (ip6_ecn_egress((ifp->if_flags & IFF_LINK1) ? ECN_ALLOWED: ECN_NOCARE, &t, &ip6->ip6_flow) == 0) { m_freem(m); goto drop; } break; #endif case IPPROTO_ETHERIP: af = AF_LINK; break; default: m_freem(m); goto drop; } #ifdef MAC mac_ifnet_create_mbuf(ifp, m); #endif if (bpf_peers_present(ifp->if_bpf)) { uint32_t af1 = af; bpf_mtap2(ifp->if_bpf, &af1, sizeof(af1), m); } if ((ifp->if_flags & IFF_MONITOR) != 0) { if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); m_freem(m); return; } if (ng_gif_input_p != NULL) { (*ng_gif_input_p)(ifp, &m, af); if (m == NULL) goto drop; } /* * Put the packet to the network layer input queue according to the * specified address family. * Note: older versions of gif_input directly called network layer * input functions, e.g. ip6_input, here. We changed the policy to * prevent too many recursive calls of such input functions, which * might cause kernel panic. But the change may introduce another * problem; if the input queue is full, packets are discarded. * The kernel stack overflow really happened, and we believed * queue-full rarely occurs, so we changed the policy. */ switch (af) { #ifdef INET case AF_INET: isr = NETISR_IP; break; #endif #ifdef INET6 case AF_INET6: isr = NETISR_IPV6; break; #endif case AF_LINK: n = sizeof(struct etherip_header) + sizeof(struct ether_header); if (n > m->m_len) m = m_pullup(m, n); if (m == NULL) goto drop; eip = mtod(m, struct etherip_header *); if (eip->eip_ver != ETHERIP_VERSION) { /* discard unknown versions */ m_freem(m); goto drop; } m_adj(m, sizeof(struct etherip_header)); m->m_flags &= ~(M_BCAST|M_MCAST); m->m_pkthdr.rcvif = ifp; if (ifp->if_bridge) { oldifp = ifp; eh = mtod(m, struct ether_header *); if (ETHER_IS_MULTICAST(eh->ether_dhost)) { if (ETHER_IS_BROADCAST(eh->ether_dhost)) m->m_flags |= M_BCAST; else m->m_flags |= M_MCAST; if_inc_counter(ifp, IFCOUNTER_IMCASTS, 1); } BRIDGE_INPUT(ifp, m); if (m != NULL && ifp != oldifp) { /* * The bridge gave us back itself or one of the * members for which the frame is addressed. */ ether_demux(ifp, m); return; } } if (m != NULL) m_freem(m); return; default: if (ng_gif_input_orphan_p != NULL) (*ng_gif_input_orphan_p)(ifp, m, af); else m_freem(m); return; } if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); M_SETFIB(m, ifp->if_fib); netisr_dispatch(isr, m); return; drop: if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); } static int gif_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq*)data; struct gif_softc *sc; u_int options; int error; switch (cmd) { case SIOCSIFADDR: ifp->if_flags |= IFF_UP; case SIOCADDMULTI: case SIOCDELMULTI: case SIOCGIFMTU: case SIOCSIFFLAGS: return (0); case SIOCSIFMTU: if (ifr->ifr_mtu < GIF_MTU_MIN || ifr->ifr_mtu > GIF_MTU_MAX) return (EINVAL); else ifp->if_mtu = ifr->ifr_mtu; return (0); } sx_xlock(&gif_ioctl_sx); sc = ifp->if_softc; if (sc == NULL) { error = ENXIO; goto bad; } error = 0; switch (cmd) { case SIOCDIFPHYADDR: if (sc->gif_family == 0) break; gif_delete_tunnel(sc); break; #ifdef INET case SIOCSIFPHYADDR: case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: error = in_gif_ioctl(sc, cmd, data); break; #endif #ifdef INET6 case SIOCSIFPHYADDR_IN6: case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: error = in6_gif_ioctl(sc, cmd, data); break; #endif case SIOCGTUNFIB: ifr->ifr_fib = sc->gif_fibnum; break; case SIOCSTUNFIB: if ((error = priv_check(curthread, PRIV_NET_GIF)) != 0) break; if (ifr->ifr_fib >= rt_numfibs) error = EINVAL; else sc->gif_fibnum = ifr->ifr_fib; break; case GIFGOPTS: options = sc->gif_options; error = copyout(&options, ifr_data_get_ptr(ifr), sizeof(options)); break; case GIFSOPTS: if ((error = priv_check(curthread, PRIV_NET_GIF)) != 0) break; error = copyin(ifr_data_get_ptr(ifr), &options, sizeof(options)); if (error) break; if (options & ~GIF_OPTMASK) { error = EINVAL; break; } if (sc->gif_options != options) { switch (sc->gif_family) { #ifdef INET case AF_INET: error = in_gif_setopts(sc, options); break; #endif #ifdef INET6 case AF_INET6: error = in6_gif_setopts(sc, options); break; #endif default: /* No need to invoke AF-handler */ sc->gif_options = options; } } break; default: error = EINVAL; break; } if (error == 0 && sc->gif_family != 0) { if ( #ifdef INET cmd == SIOCSIFPHYADDR || #endif #ifdef INET6 cmd == SIOCSIFPHYADDR_IN6 || #endif 0) { - ifp->if_drv_flags |= IFF_DRV_RUNNING; if_link_state_change(ifp, LINK_STATE_UP); } } bad: sx_xunlock(&gif_ioctl_sx); return (error); } static void gif_delete_tunnel(struct gif_softc *sc) { sx_assert(&gif_ioctl_sx, SA_XLOCKED); if (sc->gif_family != 0) { + CK_LIST_REMOVE(sc, srchash); CK_LIST_REMOVE(sc, chain); /* Wait until it become safe to free gif_hdr */ GIF_WAIT(); free(sc->gif_hdr, M_GIF); } sc->gif_family = 0; GIF2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; if_link_state_change(GIF2IFP(sc), LINK_STATE_DOWN); } Index: stable/12/sys/net/if_gif.h =================================================================== --- stable/12/sys/net/if_gif.h (revision 340534) +++ stable/12/sys/net/if_gif.h (revision 340535) @@ -1,130 +1,131 @@ /* $FreeBSD$ */ /* $KAME: if_gif.h,v 1.17 2000/09/11 11:36:41 sumikawa Exp $ */ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * Copyright (c) 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef _NET_IF_GIF_H_ #define _NET_IF_GIF_H_ #ifdef _KERNEL struct ip; struct ip6_hdr; extern void (*ng_gif_input_p)(struct ifnet *ifp, struct mbuf **mp, int af); extern void (*ng_gif_input_orphan_p)(struct ifnet *ifp, struct mbuf *m, int af); extern int (*ng_gif_output_p)(struct ifnet *ifp, struct mbuf **mp); extern void (*ng_gif_attach_p)(struct ifnet *ifp); extern void (*ng_gif_detach_p)(struct ifnet *ifp); struct gif_softc { struct ifnet *gif_ifp; int gif_family; int gif_flags; u_int gif_fibnum; u_int gif_options; void *gif_netgraph; /* netgraph node info */ union { void *hdr; struct ip *iphdr; struct ip6_hdr *ip6hdr; } gif_uhdr; CK_LIST_ENTRY(gif_softc) chain; + CK_LIST_ENTRY(gif_softc) srchash; }; CK_LIST_HEAD(gif_list, gif_softc); MALLOC_DECLARE(M_GIF); #ifndef GIF_HASH_SIZE #define GIF_HASH_SIZE (1 << 4) #endif #define GIF2IFP(sc) ((sc)->gif_ifp) #define gif_iphdr gif_uhdr.iphdr #define gif_hdr gif_uhdr.hdr #define gif_ip6hdr gif_uhdr.ip6hdr #define GIF_MTU (1280) /* Default MTU */ #define GIF_MTU_MIN (1280) /* Minimum MTU */ #define GIF_MTU_MAX (8192) /* Maximum MTU */ struct etherip_header { #if BYTE_ORDER == LITTLE_ENDIAN u_int eip_resvl:4, /* reserved */ eip_ver:4; /* version */ #endif #if BYTE_ORDER == BIG_ENDIAN u_int eip_ver:4, /* version */ eip_resvl:4; /* reserved */ #endif u_int8_t eip_resvh; /* reserved */ } __packed; #define ETHERIP_VERSION 0x3 /* mbuf adjust factor to force 32-bit alignment of IP header */ #define ETHERIP_ALIGN 2 #define GIF_RLOCK() struct epoch_tracker gif_et; epoch_enter_preempt(net_epoch_preempt, &gif_et) #define GIF_RUNLOCK() epoch_exit_preempt(net_epoch_preempt, &gif_et) #define GIF_WAIT() epoch_wait_preempt(net_epoch_preempt) /* Prototypes */ struct gif_list *gif_hashinit(void); void gif_hashdestroy(struct gif_list *); void gif_input(struct mbuf *, struct ifnet *, int, uint8_t); int gif_output(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); void in_gif_init(void); void in_gif_uninit(void); int in_gif_output(struct ifnet *, struct mbuf *, int, uint8_t); int in_gif_ioctl(struct gif_softc *, u_long, caddr_t); int in_gif_setopts(struct gif_softc *, u_int); void in6_gif_init(void); void in6_gif_uninit(void); int in6_gif_output(struct ifnet *, struct mbuf *, int, uint8_t); int in6_gif_ioctl(struct gif_softc *, u_long, caddr_t); int in6_gif_setopts(struct gif_softc *, u_int); #endif /* _KERNEL */ #define GIFGOPTS _IOWR('i', 150, struct ifreq) #define GIFSOPTS _IOW('i', 151, struct ifreq) #define GIF_IGNORE_SOURCE 0x0002 #define GIF_OPTMASK (GIF_IGNORE_SOURCE) #endif /* _NET_IF_GIF_H_ */ Index: stable/12/sys/net/if_gre.c =================================================================== --- stable/12/sys/net/if_gre.c (revision 340534) +++ stable/12/sys/net/if_gre.c (revision 340535) @@ -1,677 +1,679 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * IPv6-over-GRE contributed by Gert Doering * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: if_gre.c,v 1.49 2003/12/11 00:22:29 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #include #endif #ifdef INET6 #include #include #include #endif #include #include #include #include #include #define GREMTU 1476 static const char grename[] = "gre"; MALLOC_DEFINE(M_GRE, grename, "Generic Routing Encapsulation"); static struct sx gre_ioctl_sx; SX_SYSINIT(gre_ioctl_sx, &gre_ioctl_sx, "gre_ioctl"); static int gre_clone_create(struct if_clone *, int, caddr_t); static void gre_clone_destroy(struct ifnet *); VNET_DEFINE_STATIC(struct if_clone *, gre_cloner); #define V_gre_cloner VNET(gre_cloner) static void gre_qflush(struct ifnet *); static int gre_transmit(struct ifnet *, struct mbuf *); static int gre_ioctl(struct ifnet *, u_long, caddr_t); static int gre_output(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); static void gre_delete_tunnel(struct gre_softc *); SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, IFT_TUNNEL, gre, CTLFLAG_RW, 0, "Generic Routing Encapsulation"); #ifndef MAX_GRE_NEST /* * This macro controls the default upper limitation on nesting of gre tunnels. * Since, setting a large value to this macro with a careless configuration * may introduce system crash, we don't allow any nestings by default. * If you need to configure nested gre tunnels, you can define this macro * in your kernel configuration file. However, if you do so, please be * careful to configure the tunnels so that it won't make a loop. */ #define MAX_GRE_NEST 1 #endif VNET_DEFINE_STATIC(int, max_gre_nesting) = MAX_GRE_NEST; #define V_max_gre_nesting VNET(max_gre_nesting) SYSCTL_INT(_net_link_gre, OID_AUTO, max_nesting, CTLFLAG_RW | CTLFLAG_VNET, &VNET_NAME(max_gre_nesting), 0, "Max nested tunnels"); static void vnet_gre_init(const void *unused __unused) { V_gre_cloner = if_clone_simple(grename, gre_clone_create, gre_clone_destroy, 0); #ifdef INET in_gre_init(); #endif #ifdef INET6 in6_gre_init(); #endif } VNET_SYSINIT(vnet_gre_init, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gre_init, NULL); static void vnet_gre_uninit(const void *unused __unused) { if_clone_detach(V_gre_cloner); #ifdef INET in_gre_uninit(); #endif #ifdef INET6 in6_gre_uninit(); #endif } VNET_SYSUNINIT(vnet_gre_uninit, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gre_uninit, NULL); static int gre_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct gre_softc *sc; sc = malloc(sizeof(struct gre_softc), M_GRE, M_WAITOK | M_ZERO); sc->gre_fibnum = curthread->td_proc->p_fibnum; GRE2IFP(sc) = if_alloc(IFT_TUNNEL); GRE2IFP(sc)->if_softc = sc; if_initname(GRE2IFP(sc), grename, unit); GRE2IFP(sc)->if_mtu = GREMTU; GRE2IFP(sc)->if_flags = IFF_POINTOPOINT|IFF_MULTICAST; GRE2IFP(sc)->if_output = gre_output; GRE2IFP(sc)->if_ioctl = gre_ioctl; GRE2IFP(sc)->if_transmit = gre_transmit; GRE2IFP(sc)->if_qflush = gre_qflush; GRE2IFP(sc)->if_capabilities |= IFCAP_LINKSTATE; GRE2IFP(sc)->if_capenable |= IFCAP_LINKSTATE; if_attach(GRE2IFP(sc)); bpfattach(GRE2IFP(sc), DLT_NULL, sizeof(u_int32_t)); return (0); } static void gre_clone_destroy(struct ifnet *ifp) { struct gre_softc *sc; sx_xlock(&gre_ioctl_sx); sc = ifp->if_softc; gre_delete_tunnel(sc); bpfdetach(ifp); if_detach(ifp); ifp->if_softc = NULL; sx_xunlock(&gre_ioctl_sx); GRE_WAIT(); if_free(ifp); free(sc, M_GRE); } static int gre_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct gre_softc *sc; uint32_t opt; int error; switch (cmd) { case SIOCSIFMTU: /* XXX: */ if (ifr->ifr_mtu < 576) return (EINVAL); ifp->if_mtu = ifr->ifr_mtu; return (0); case SIOCSIFADDR: ifp->if_flags |= IFF_UP; case SIOCSIFFLAGS: case SIOCADDMULTI: case SIOCDELMULTI: return (0); case GRESADDRS: case GRESADDRD: case GREGADDRS: case GREGADDRD: case GRESPROTO: case GREGPROTO: return (EOPNOTSUPP); } sx_xlock(&gre_ioctl_sx); sc = ifp->if_softc; if (sc == NULL) { error = ENXIO; goto end; } error = 0; switch (cmd) { case SIOCDIFPHYADDR: if (sc->gre_family == 0) break; gre_delete_tunnel(sc); break; #ifdef INET case SIOCSIFPHYADDR: case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: error = in_gre_ioctl(sc, cmd, data); break; #endif #ifdef INET6 case SIOCSIFPHYADDR_IN6: case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: error = in6_gre_ioctl(sc, cmd, data); break; #endif case SIOCGTUNFIB: ifr->ifr_fib = sc->gre_fibnum; break; case SIOCSTUNFIB: if ((error = priv_check(curthread, PRIV_NET_GRE)) != 0) break; if (ifr->ifr_fib >= rt_numfibs) error = EINVAL; else sc->gre_fibnum = ifr->ifr_fib; break; case GRESKEY: case GRESOPTS: if ((error = priv_check(curthread, PRIV_NET_GRE)) != 0) break; if ((error = copyin(ifr_data_get_ptr(ifr), &opt, sizeof(opt))) != 0) break; if (cmd == GRESKEY) { if (sc->gre_key == opt) break; } else if (cmd == GRESOPTS) { if (opt & ~GRE_OPTMASK) { error = EINVAL; break; } if (sc->gre_options == opt) break; } switch (sc->gre_family) { #ifdef INET case AF_INET: in_gre_setopts(sc, cmd, opt); break; #endif #ifdef INET6 case AF_INET6: in6_gre_setopts(sc, cmd, opt); break; #endif default: if (cmd == GRESKEY) sc->gre_key = opt; else sc->gre_options = opt; break; } /* * XXX: Do we need to initiate change of interface * state here? */ break; case GREGKEY: error = copyout(&sc->gre_key, ifr_data_get_ptr(ifr), sizeof(sc->gre_key)); break; case GREGOPTS: error = copyout(&sc->gre_options, ifr_data_get_ptr(ifr), sizeof(sc->gre_options)); break; default: error = EINVAL; break; } if (error == 0 && sc->gre_family != 0) { if ( #ifdef INET cmd == SIOCSIFPHYADDR || #endif #ifdef INET6 cmd == SIOCSIFPHYADDR_IN6 || #endif 0) { - ifp->if_drv_flags |= IFF_DRV_RUNNING; if_link_state_change(ifp, LINK_STATE_UP); } } end: sx_xunlock(&gre_ioctl_sx); return (error); } static void gre_delete_tunnel(struct gre_softc *sc) { sx_assert(&gre_ioctl_sx, SA_XLOCKED); if (sc->gre_family != 0) { CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); sc->gre_family = 0; } GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; if_link_state_change(GRE2IFP(sc), LINK_STATE_DOWN); } struct gre_list * gre_hashinit(void) { struct gre_list *hash; int i; hash = malloc(sizeof(struct gre_list) * GRE_HASH_SIZE, M_GRE, M_WAITOK); for (i = 0; i < GRE_HASH_SIZE; i++) CK_LIST_INIT(&hash[i]); return (hash); } void gre_hashdestroy(struct gre_list *hash) { free(hash, M_GRE); } void gre_updatehdr(struct gre_softc *sc, struct grehdr *gh) { uint32_t *opts; uint16_t flags; sx_assert(&gre_ioctl_sx, SA_XLOCKED); flags = 0; opts = gh->gre_opts; if (sc->gre_options & GRE_ENABLE_CSUM) { flags |= GRE_FLAGS_CP; sc->gre_hlen += 2 * sizeof(uint16_t); *opts++ = 0; } if (sc->gre_key != 0) { flags |= GRE_FLAGS_KP; sc->gre_hlen += sizeof(uint32_t); *opts++ = htonl(sc->gre_key); } if (sc->gre_options & GRE_ENABLE_SEQ) { flags |= GRE_FLAGS_SP; sc->gre_hlen += sizeof(uint32_t); *opts++ = 0; } else sc->gre_oseq = 0; gh->gre_flags = htons(flags); } int gre_input(struct mbuf *m, int off, int proto, void *arg) { struct gre_softc *sc = arg; struct grehdr *gh; struct ifnet *ifp; uint32_t *opts; #ifdef notyet uint32_t key; #endif uint16_t flags; int hlen, isr, af; ifp = GRE2IFP(sc); hlen = off + sizeof(struct grehdr) + 4 * sizeof(uint32_t); if (m->m_pkthdr.len < hlen) goto drop; if (m->m_len < hlen) { m = m_pullup(m, hlen); if (m == NULL) goto drop; } gh = (struct grehdr *)mtodo(m, off); flags = ntohs(gh->gre_flags); if (flags & ~GRE_FLAGS_MASK) goto drop; opts = gh->gre_opts; hlen = 2 * sizeof(uint16_t); if (flags & GRE_FLAGS_CP) { /* reserved1 field must be zero */ if (((uint16_t *)opts)[1] != 0) goto drop; if (in_cksum_skip(m, m->m_pkthdr.len, off) != 0) goto drop; hlen += 2 * sizeof(uint16_t); opts++; } if (flags & GRE_FLAGS_KP) { #ifdef notyet /* * XXX: The current implementation uses the key only for outgoing * packets. But we can check the key value here, or even in the * encapcheck function. */ key = ntohl(*opts); #endif hlen += sizeof(uint32_t); opts++; } #ifdef notyet } else key = 0; if (sc->gre_key != 0 && (key != sc->gre_key || key != 0)) goto drop; #endif if (flags & GRE_FLAGS_SP) { #ifdef notyet seq = ntohl(*opts); #endif hlen += sizeof(uint32_t); } switch (ntohs(gh->gre_proto)) { case ETHERTYPE_WCCP: /* * For WCCP skip an additional 4 bytes if after GRE header * doesn't follow an IP header. */ if (flags == 0 && (*(uint8_t *)gh->gre_opts & 0xF0) != 0x40) hlen += sizeof(uint32_t); /* FALLTHROUGH */ case ETHERTYPE_IP: isr = NETISR_IP; af = AF_INET; break; case ETHERTYPE_IPV6: isr = NETISR_IPV6; af = AF_INET6; break; default: goto drop; } m_adj(m, off + hlen); m_clrprotoflags(m); m->m_pkthdr.rcvif = ifp; M_SETFIB(m, ifp->if_fib); #ifdef MAC mac_ifnet_create_mbuf(ifp, m); #endif BPF_MTAP2(ifp, &af, sizeof(af), m); if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); if ((ifp->if_flags & IFF_MONITOR) != 0) m_freem(m); else netisr_dispatch(isr, m); return (IPPROTO_DONE); drop: if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); m_freem(m); return (IPPROTO_DONE); } static int gre_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro) { uint32_t af; if (dst->sa_family == AF_UNSPEC) bcopy(dst->sa_data, &af, sizeof(af)); else af = dst->sa_family; /* * Now save the af in the inbound pkt csum data, this is a cheat since * we are using the inbound csum_data field to carry the af over to * the gre_transmit() routine, avoiding using yet another mtag. */ m->m_pkthdr.csum_data = af; return (ifp->if_transmit(ifp, m)); } static void gre_setseqn(struct grehdr *gh, uint32_t seq) { uint32_t *opts; uint16_t flags; opts = gh->gre_opts; flags = ntohs(gh->gre_flags); KASSERT((flags & GRE_FLAGS_SP) != 0, ("gre_setseqn called, but GRE_FLAGS_SP isn't set ")); if (flags & GRE_FLAGS_CP) opts++; if (flags & GRE_FLAGS_KP) opts++; *opts = htonl(seq); } #define MTAG_GRE 1307983903 static int gre_transmit(struct ifnet *ifp, struct mbuf *m) { + GRE_RLOCK_TRACKER; struct gre_softc *sc; struct grehdr *gh; uint32_t af; int error, len; uint16_t proto; len = 0; GRE_RLOCK(); #ifdef MAC error = mac_ifnet_check_transmit(ifp, m); if (error) { m_freem(m); goto drop; } #endif error = ENETDOWN; sc = ifp->if_softc; if ((ifp->if_flags & IFF_MONITOR) != 0 || (ifp->if_flags & IFF_UP) == 0 || + (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || sc->gre_family == 0 || (error = if_tunnel_check_nesting(ifp, m, MTAG_GRE, V_max_gre_nesting)) != 0) { m_freem(m); goto drop; } af = m->m_pkthdr.csum_data; BPF_MTAP2(ifp, &af, sizeof(af), m); m->m_flags &= ~(M_BCAST|M_MCAST); M_SETFIB(m, sc->gre_fibnum); M_PREPEND(m, sc->gre_hlen, M_NOWAIT); if (m == NULL) { error = ENOBUFS; goto drop; } bcopy(sc->gre_hdr, mtod(m, void *), sc->gre_hlen); /* Determine GRE proto */ switch (af) { #ifdef INET case AF_INET: proto = htons(ETHERTYPE_IP); break; #endif #ifdef INET6 case AF_INET6: proto = htons(ETHERTYPE_IPV6); break; #endif default: m_freem(m); error = ENETDOWN; goto drop; } /* Determine offset of GRE header */ switch (sc->gre_family) { #ifdef INET case AF_INET: len = sizeof(struct ip); break; #endif #ifdef INET6 case AF_INET6: len = sizeof(struct ip6_hdr); break; #endif default: m_freem(m); error = ENETDOWN; goto drop; } gh = (struct grehdr *)mtodo(m, len); gh->gre_proto = proto; if (sc->gre_options & GRE_ENABLE_SEQ) gre_setseqn(gh, sc->gre_oseq++); if (sc->gre_options & GRE_ENABLE_CSUM) { *(uint16_t *)gh->gre_opts = in_cksum_skip(m, m->m_pkthdr.len, len); } len = m->m_pkthdr.len - len; switch (sc->gre_family) { #ifdef INET case AF_INET: error = in_gre_output(m, af, sc->gre_hlen); break; #endif #ifdef INET6 case AF_INET6: error = in6_gre_output(m, af, sc->gre_hlen); break; #endif default: m_freem(m); error = ENETDOWN; } drop: if (error) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); else { if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_OBYTES, len); } GRE_RUNLOCK(); return (error); } static void gre_qflush(struct ifnet *ifp __unused) { } static int gremodevent(module_t mod, int type, void *data) { switch (type) { case MOD_LOAD: case MOD_UNLOAD: break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t gre_mod = { "if_gre", gremodevent, 0 }; DECLARE_MODULE(if_gre, gre_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_gre, 1); Index: stable/12/sys/net/if_gre.h =================================================================== --- stable/12/sys/net/if_gre.h (revision 340534) +++ stable/12/sys/net/if_gre.h (revision 340535) @@ -1,145 +1,147 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014 Andrey V. Elsukov * All rights reserved * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: if_gre.h,v 1.13 2003/11/10 08:51:52 wiz Exp $ * $FreeBSD$ */ #ifndef _NET_IF_GRE_H_ #define _NET_IF_GRE_H_ #ifdef _KERNEL /* GRE header according to RFC 2784 and RFC 2890 */ struct grehdr { uint16_t gre_flags; /* GRE flags */ #define GRE_FLAGS_CP 0x8000 /* checksum present */ #define GRE_FLAGS_KP 0x2000 /* key present */ #define GRE_FLAGS_SP 0x1000 /* sequence present */ #define GRE_FLAGS_MASK (GRE_FLAGS_CP|GRE_FLAGS_KP|GRE_FLAGS_SP) uint16_t gre_proto; /* protocol type */ uint32_t gre_opts[0]; /* optional fields */ } __packed; #ifdef INET struct greip { struct ip gi_ip; struct grehdr gi_gre; } __packed; #endif #ifdef INET6 struct greip6 { struct ip6_hdr gi6_ip6; struct grehdr gi6_gre; } __packed; #endif struct gre_softc { struct ifnet *gre_ifp; int gre_family; /* AF of delivery header */ uint32_t gre_iseq; uint32_t gre_oseq; uint32_t gre_key; uint32_t gre_options; u_int gre_fibnum; u_int gre_hlen; /* header size */ union { void *hdr; #ifdef INET struct greip *gihdr; #endif #ifdef INET6 struct greip6 *gi6hdr; #endif } gre_uhdr; CK_LIST_ENTRY(gre_softc) chain; + CK_LIST_ENTRY(gre_softc) srchash; }; CK_LIST_HEAD(gre_list, gre_softc); MALLOC_DECLARE(M_GRE); #ifndef GRE_HASH_SIZE #define GRE_HASH_SIZE (1 << 4) #endif #define GRE2IFP(sc) ((sc)->gre_ifp) -#define GRE_RLOCK() struct epoch_tracker gre_et; epoch_enter_preempt(net_epoch_preempt, &gre_et) +#define GRE_RLOCK_TRACKER struct epoch_tracker gre_et +#define GRE_RLOCK() epoch_enter_preempt(net_epoch_preempt, &gre_et) #define GRE_RUNLOCK() epoch_exit_preempt(net_epoch_preempt, &gre_et) #define GRE_WAIT() epoch_wait_preempt(net_epoch_preempt) #define gre_hdr gre_uhdr.hdr #define gre_gihdr gre_uhdr.gihdr #define gre_gi6hdr gre_uhdr.gi6hdr #define gre_oip gre_gihdr->gi_ip #define gre_oip6 gre_gi6hdr->gi6_ip6 struct gre_list *gre_hashinit(void); void gre_hashdestroy(struct gre_list *); int gre_input(struct mbuf *, int, int, void *); void gre_updatehdr(struct gre_softc *, struct grehdr *); void in_gre_init(void); void in_gre_uninit(void); void in_gre_setopts(struct gre_softc *, u_long, uint32_t); int in_gre_ioctl(struct gre_softc *, u_long, caddr_t); int in_gre_output(struct mbuf *, int, int); void in6_gre_init(void); void in6_gre_uninit(void); void in6_gre_setopts(struct gre_softc *, u_long, uint32_t); int in6_gre_ioctl(struct gre_softc *, u_long, caddr_t); int in6_gre_output(struct mbuf *, int, int); /* * CISCO uses special type for GRE tunnel created as part of WCCP * connection, while in fact those packets are just IPv4 encapsulated * into GRE. */ #define ETHERTYPE_WCCP 0x883E #endif /* _KERNEL */ #define GRESADDRS _IOW('i', 101, struct ifreq) #define GRESADDRD _IOW('i', 102, struct ifreq) #define GREGADDRS _IOWR('i', 103, struct ifreq) #define GREGADDRD _IOWR('i', 104, struct ifreq) #define GRESPROTO _IOW('i' , 105, struct ifreq) #define GREGPROTO _IOWR('i', 106, struct ifreq) #define GREGKEY _IOWR('i', 107, struct ifreq) #define GRESKEY _IOW('i', 108, struct ifreq) #define GREGOPTS _IOWR('i', 109, struct ifreq) #define GRESOPTS _IOW('i', 110, struct ifreq) #define GRE_ENABLE_CSUM 0x0001 #define GRE_ENABLE_SEQ 0x0002 #define GRE_OPTMASK (GRE_ENABLE_CSUM|GRE_ENABLE_SEQ) #endif /* _NET_IF_GRE_H_ */ Index: stable/12/sys/net/if_me.c =================================================================== --- stable/12/sys/net/if_me.c (revision 340534) +++ stable/12/sys/net/if_me.c (revision 340535) @@ -1,604 +1,663 @@ /*- * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define MEMTU (1500 - sizeof(struct mobhdr)) static const char mename[] = "me"; static MALLOC_DEFINE(M_IFME, mename, "Minimal Encapsulation for IP"); /* Minimal forwarding header RFC 2004 */ struct mobhdr { uint8_t mob_proto; /* protocol */ uint8_t mob_flags; /* flags */ #define MOB_FLAGS_SP 0x80 /* source present */ uint16_t mob_csum; /* header checksum */ struct in_addr mob_dst; /* original destination address */ struct in_addr mob_src; /* original source addr (optional) */ } __packed; struct me_softc { struct ifnet *me_ifp; u_int me_fibnum; struct in_addr me_src; struct in_addr me_dst; CK_LIST_ENTRY(me_softc) chain; + CK_LIST_ENTRY(me_softc) srchash; }; CK_LIST_HEAD(me_list, me_softc); #define ME2IFP(sc) ((sc)->me_ifp) #define ME_READY(sc) ((sc)->me_src.s_addr != 0) -#define ME_RLOCK() struct epoch_tracker me_et; epoch_enter_preempt(net_epoch_preempt, &me_et) +#define ME_RLOCK_TRACKER struct epoch_tracker me_et +#define ME_RLOCK() epoch_enter_preempt(net_epoch_preempt, &me_et) #define ME_RUNLOCK() epoch_exit_preempt(net_epoch_preempt, &me_et) #define ME_WAIT() epoch_wait_preempt(net_epoch_preempt) #ifndef ME_HASH_SIZE #define ME_HASH_SIZE (1 << 4) #endif VNET_DEFINE_STATIC(struct me_list *, me_hashtbl) = NULL; +VNET_DEFINE_STATIC(struct me_list *, me_srchashtbl) = NULL; #define V_me_hashtbl VNET(me_hashtbl) +#define V_me_srchashtbl VNET(me_srchashtbl) #define ME_HASH(src, dst) (V_me_hashtbl[\ me_hashval((src), (dst)) & (ME_HASH_SIZE - 1)]) +#define ME_SRCHASH(src) (V_me_srchashtbl[\ + fnv_32_buf(&(src), sizeof(src), FNV1_32_INIT) & (ME_HASH_SIZE - 1)]) static struct sx me_ioctl_sx; SX_SYSINIT(me_ioctl_sx, &me_ioctl_sx, "me_ioctl"); static int me_clone_create(struct if_clone *, int, caddr_t); static void me_clone_destroy(struct ifnet *); VNET_DEFINE_STATIC(struct if_clone *, me_cloner); #define V_me_cloner VNET(me_cloner) static void me_qflush(struct ifnet *); static int me_transmit(struct ifnet *, struct mbuf *); static int me_ioctl(struct ifnet *, u_long, caddr_t); static int me_output(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); static int me_input(struct mbuf *, int, int, void *); static int me_set_tunnel(struct me_softc *, in_addr_t, in_addr_t); static void me_delete_tunnel(struct me_softc *); SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, IFT_TUNNEL, me, CTLFLAG_RW, 0, "Minimal Encapsulation for IP (RFC 2004)"); #ifndef MAX_ME_NEST #define MAX_ME_NEST 1 #endif VNET_DEFINE_STATIC(int, max_me_nesting) = MAX_ME_NEST; #define V_max_me_nesting VNET(max_me_nesting) SYSCTL_INT(_net_link_me, OID_AUTO, max_nesting, CTLFLAG_RW | CTLFLAG_VNET, &VNET_NAME(max_me_nesting), 0, "Max nested tunnels"); static uint32_t me_hashval(in_addr_t src, in_addr_t dst) { uint32_t ret; ret = fnv_32_buf(&src, sizeof(src), FNV1_32_INIT); return (fnv_32_buf(&dst, sizeof(dst), ret)); } static struct me_list * me_hashinit(void) { struct me_list *hash; int i; hash = malloc(sizeof(struct me_list) * ME_HASH_SIZE, M_IFME, M_WAITOK); for (i = 0; i < ME_HASH_SIZE; i++) CK_LIST_INIT(&hash[i]); return (hash); } static void vnet_me_init(const void *unused __unused) { + V_me_cloner = if_clone_simple(mename, me_clone_create, me_clone_destroy, 0); } VNET_SYSINIT(vnet_me_init, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_me_init, NULL); static void vnet_me_uninit(const void *unused __unused) { - if (V_me_hashtbl != NULL) + if (V_me_hashtbl != NULL) { free(V_me_hashtbl, M_IFME); + V_me_hashtbl = NULL; + ME_WAIT(); + free(V_me_srchashtbl, M_IFME); + } if_clone_detach(V_me_cloner); } VNET_SYSUNINIT(vnet_me_uninit, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_me_uninit, NULL); static int me_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct me_softc *sc; sc = malloc(sizeof(struct me_softc), M_IFME, M_WAITOK | M_ZERO); sc->me_fibnum = curthread->td_proc->p_fibnum; ME2IFP(sc) = if_alloc(IFT_TUNNEL); ME2IFP(sc)->if_softc = sc; if_initname(ME2IFP(sc), mename, unit); ME2IFP(sc)->if_mtu = MEMTU;; ME2IFP(sc)->if_flags = IFF_POINTOPOINT|IFF_MULTICAST; ME2IFP(sc)->if_output = me_output; ME2IFP(sc)->if_ioctl = me_ioctl; ME2IFP(sc)->if_transmit = me_transmit; ME2IFP(sc)->if_qflush = me_qflush; ME2IFP(sc)->if_capabilities |= IFCAP_LINKSTATE; ME2IFP(sc)->if_capenable |= IFCAP_LINKSTATE; if_attach(ME2IFP(sc)); bpfattach(ME2IFP(sc), DLT_NULL, sizeof(u_int32_t)); return (0); } static void me_clone_destroy(struct ifnet *ifp) { struct me_softc *sc; sx_xlock(&me_ioctl_sx); sc = ifp->if_softc; me_delete_tunnel(sc); bpfdetach(ifp); if_detach(ifp); ifp->if_softc = NULL; sx_xunlock(&me_ioctl_sx); ME_WAIT(); if_free(ifp); free(sc, M_IFME); } static int me_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct sockaddr_in *src, *dst; struct me_softc *sc; int error; switch (cmd) { case SIOCSIFMTU: if (ifr->ifr_mtu < 576) return (EINVAL); ifp->if_mtu = ifr->ifr_mtu; return (0); case SIOCSIFADDR: ifp->if_flags |= IFF_UP; case SIOCSIFFLAGS: case SIOCADDMULTI: case SIOCDELMULTI: return (0); } sx_xlock(&me_ioctl_sx); sc = ifp->if_softc; if (sc == NULL) { error = ENXIO; goto end; } error = 0; switch (cmd) { case SIOCSIFPHYADDR: src = &((struct in_aliasreq *)data)->ifra_addr; dst = &((struct in_aliasreq *)data)->ifra_dstaddr; if (src->sin_family != dst->sin_family || src->sin_family != AF_INET || src->sin_len != dst->sin_len || src->sin_len != sizeof(struct sockaddr_in)) { error = EINVAL; break; } if (src->sin_addr.s_addr == INADDR_ANY || dst->sin_addr.s_addr == INADDR_ANY) { error = EADDRNOTAVAIL; break; } error = me_set_tunnel(sc, src->sin_addr.s_addr, dst->sin_addr.s_addr); break; case SIOCDIFPHYADDR: me_delete_tunnel(sc); break; case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: if (!ME_READY(sc)) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin_family = AF_INET; src->sin_len = sizeof(*src); switch (cmd) { case SIOCGIFPSRCADDR: src->sin_addr = sc->me_src; break; case SIOCGIFPDSTADDR: src->sin_addr = sc->me_dst; break; } error = prison_if(curthread->td_ucred, sintosa(src)); if (error != 0) memset(src, 0, sizeof(*src)); break; case SIOCGTUNFIB: ifr->ifr_fib = sc->me_fibnum; break; case SIOCSTUNFIB: if ((error = priv_check(curthread, PRIV_NET_GRE)) != 0) break; if (ifr->ifr_fib >= rt_numfibs) error = EINVAL; else sc->me_fibnum = ifr->ifr_fib; break; default: error = EINVAL; break; } end: sx_xunlock(&me_ioctl_sx); return (error); } static int me_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip *ip; struct me_softc *sc; if (V_me_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip = mtod(m, const struct ip *); CK_LIST_FOREACH(sc, &ME_HASH(ip->ip_dst.s_addr, ip->ip_src.s_addr), chain) { if (sc->me_src.s_addr == ip->ip_dst.s_addr && sc->me_dst.s_addr == ip->ip_src.s_addr) { if ((ME2IFP(sc)->if_flags & IFF_UP) == 0) return (0); *arg = sc; return (ENCAP_DRV_LOOKUP); } } return (0); } +/* + * Check that ingress address belongs to local host. + */ +static void +me_set_running(struct me_softc *sc) +{ + + if (in_localip(sc->me_src)) + ME2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + else + ME2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; +} + +/* + * ifaddr_event handler. + * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent + * source address spoofing. + */ +static void +me_srcaddr(void *arg __unused, const struct sockaddr *sa, + int event __unused) +{ + const struct sockaddr_in *sin; + struct me_softc *sc; + + /* Check that VNET is ready */ + if (V_me_hashtbl == NULL) + return; + + MPASS(in_epoch(net_epoch_preempt)); + sin = (const struct sockaddr_in *)sa; + CK_LIST_FOREACH(sc, &ME_SRCHASH(sin->sin_addr.s_addr), srchash) { + if (sc->me_src.s_addr != sin->sin_addr.s_addr) + continue; + me_set_running(sc); + } +} + static int me_set_tunnel(struct me_softc *sc, in_addr_t src, in_addr_t dst) { struct me_softc *tmp; sx_assert(&me_ioctl_sx, SA_XLOCKED); - if (V_me_hashtbl == NULL) + if (V_me_hashtbl == NULL) { V_me_hashtbl = me_hashinit(); + V_me_srchashtbl = me_hashinit(); + } if (sc->me_src.s_addr == src && sc->me_dst.s_addr == dst) return (0); CK_LIST_FOREACH(tmp, &ME_HASH(src, dst), chain) { if (tmp == sc) continue; if (tmp->me_src.s_addr == src && tmp->me_dst.s_addr == dst) return (EADDRNOTAVAIL); } me_delete_tunnel(sc); sc->me_dst.s_addr = dst; sc->me_src.s_addr = src; CK_LIST_INSERT_HEAD(&ME_HASH(src, dst), sc, chain); + CK_LIST_INSERT_HEAD(&ME_SRCHASH(src), sc, srchash); - ME2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + me_set_running(sc); if_link_state_change(ME2IFP(sc), LINK_STATE_UP); return (0); } static void me_delete_tunnel(struct me_softc *sc) { sx_assert(&me_ioctl_sx, SA_XLOCKED); if (ME_READY(sc)) { CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); ME_WAIT(); sc->me_src.s_addr = 0; sc->me_dst.s_addr = 0; ME2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; if_link_state_change(ME2IFP(sc), LINK_STATE_DOWN); } } static uint16_t me_in_cksum(uint16_t *p, int nwords) { uint32_t sum = 0; while (nwords-- > 0) sum += *p++; sum = (sum >> 16) + (sum & 0xffff); sum += (sum >> 16); return (~sum); } static int me_input(struct mbuf *m, int off, int proto, void *arg) { struct me_softc *sc = arg; struct mobhdr *mh; struct ifnet *ifp; struct ip *ip; int hlen; ifp = ME2IFP(sc); /* checks for short packets */ hlen = sizeof(struct mobhdr); if (m->m_pkthdr.len < sizeof(struct ip) + hlen) hlen -= sizeof(struct in_addr); if (m->m_len < sizeof(struct ip) + hlen) m = m_pullup(m, sizeof(struct ip) + hlen); if (m == NULL) goto drop; mh = (struct mobhdr *)mtodo(m, sizeof(struct ip)); /* check for wrong flags */ if (mh->mob_flags & (~MOB_FLAGS_SP)) { m_freem(m); goto drop; } if (mh->mob_flags) { if (hlen != sizeof(struct mobhdr)) { m_freem(m); goto drop; } } else hlen = sizeof(struct mobhdr) - sizeof(struct in_addr); /* check mobile header checksum */ if (me_in_cksum((uint16_t *)mh, hlen / sizeof(uint16_t)) != 0) { m_freem(m); goto drop; } #ifdef MAC mac_ifnet_create_mbuf(ifp, m); #endif ip = mtod(m, struct ip *); ip->ip_dst = mh->mob_dst; ip->ip_p = mh->mob_proto; ip->ip_sum = 0; ip->ip_len = htons(m->m_pkthdr.len - hlen); if (mh->mob_flags) ip->ip_src = mh->mob_src; memmove(mtodo(m, hlen), ip, sizeof(struct ip)); m_adj(m, hlen); m_clrprotoflags(m); m->m_pkthdr.rcvif = ifp; m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID); M_SETFIB(m, ifp->if_fib); hlen = AF_INET; BPF_MTAP2(ifp, &hlen, sizeof(hlen), m); if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); if ((ifp->if_flags & IFF_MONITOR) != 0) m_freem(m); else netisr_dispatch(NETISR_IP, m); return (IPPROTO_DONE); drop: if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); return (IPPROTO_DONE); } static int me_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro __unused) { uint32_t af; if (dst->sa_family == AF_UNSPEC) bcopy(dst->sa_data, &af, sizeof(af)); else af = dst->sa_family; m->m_pkthdr.csum_data = af; return (ifp->if_transmit(ifp, m)); } #define MTAG_ME 1414491977 static int me_transmit(struct ifnet *ifp, struct mbuf *m) { + ME_RLOCK_TRACKER; struct mobhdr mh; struct me_softc *sc; struct ip *ip; uint32_t af; int error, hlen, plen; ME_RLOCK(); #ifdef MAC error = mac_ifnet_check_transmit(ifp, m); if (error != 0) goto drop; #endif error = ENETDOWN; sc = ifp->if_softc; if (sc == NULL || !ME_READY(sc) || (ifp->if_flags & IFF_MONITOR) != 0 || (ifp->if_flags & IFF_UP) == 0 || + (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || (error = if_tunnel_check_nesting(ifp, m, MTAG_ME, V_max_me_nesting)) != 0) { m_freem(m); goto drop; } af = m->m_pkthdr.csum_data; if (af != AF_INET) { error = EAFNOSUPPORT; m_freem(m); goto drop; } if (m->m_len < sizeof(struct ip)) m = m_pullup(m, sizeof(struct ip)); if (m == NULL) { error = ENOBUFS; goto drop; } ip = mtod(m, struct ip *); /* Fragmented datagramms shouldn't be encapsulated */ if (ip->ip_off & htons(IP_MF | IP_OFFMASK)) { error = EINVAL; m_freem(m); goto drop; } mh.mob_proto = ip->ip_p; mh.mob_src = ip->ip_src; mh.mob_dst = ip->ip_dst; if (in_hosteq(sc->me_src, ip->ip_src)) { hlen = sizeof(struct mobhdr) - sizeof(struct in_addr); mh.mob_flags = 0; } else { hlen = sizeof(struct mobhdr); mh.mob_flags = MOB_FLAGS_SP; } BPF_MTAP2(ifp, &af, sizeof(af), m); plen = m->m_pkthdr.len; ip->ip_src = sc->me_src; ip->ip_dst = sc->me_dst; m->m_flags &= ~(M_BCAST|M_MCAST); M_SETFIB(m, sc->me_fibnum); M_PREPEND(m, hlen, M_NOWAIT); if (m == NULL) { error = ENOBUFS; goto drop; } if (m->m_len < sizeof(struct ip) + hlen) m = m_pullup(m, sizeof(struct ip) + hlen); if (m == NULL) { error = ENOBUFS; goto drop; } memmove(mtod(m, void *), mtodo(m, hlen), sizeof(struct ip)); ip = mtod(m, struct ip *); ip->ip_len = htons(m->m_pkthdr.len); ip->ip_p = IPPROTO_MOBILE; ip->ip_sum = 0; mh.mob_csum = 0; mh.mob_csum = me_in_cksum((uint16_t *)&mh, hlen / sizeof(uint16_t)); bcopy(&mh, mtodo(m, sizeof(struct ip)), hlen); error = ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL); drop: if (error) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); else { if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_OBYTES, plen); } ME_RUNLOCK(); return (error); } static void me_qflush(struct ifnet *ifp __unused) { } +static const struct srcaddrtab *me_srcaddrtab = NULL; static const struct encaptab *ecookie = NULL; static const struct encap_config me_encap_cfg = { .proto = IPPROTO_MOBILE, .min_length = sizeof(struct ip) + sizeof(struct mobhdr) - sizeof(in_addr_t), .exact_match = ENCAP_DRV_LOOKUP, .lookup = me_lookup, .input = me_input }; static int memodevent(module_t mod, int type, void *data) { switch (type) { case MOD_LOAD: + me_srcaddrtab = ip_encap_register_srcaddr(me_srcaddr, + NULL, M_WAITOK); ecookie = ip_encap_attach(&me_encap_cfg, NULL, M_WAITOK); break; case MOD_UNLOAD: ip_encap_detach(ecookie); + ip_encap_unregister_srcaddr(me_srcaddrtab); break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t me_mod = { "if_me", memodevent, 0 }; DECLARE_MODULE(if_me, me_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_me, 1); Index: stable/12/sys/netinet/in_gif.c =================================================================== --- stable/12/sys/netinet/in_gif.c (revision 340534) +++ stable/12/sys/netinet/in_gif.c (revision 340535) @@ -1,407 +1,466 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * Copyright (c) 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: in_gif.c,v 1.54 2001/05/14 14:02:16 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #endif #include #define GIF_TTL 30 VNET_DEFINE_STATIC(int, ip_gif_ttl) = GIF_TTL; #define V_ip_gif_ttl VNET(ip_gif_ttl) SYSCTL_INT(_net_inet_ip, IPCTL_GIF_TTL, gifttl, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip_gif_ttl), 0, "Default TTL value for encapsulated packets"); /* * We keep interfaces in a hash table using src+dst as key. * Interfaces with GIF_IGNORE_SOURCE flag are linked into plain list. */ VNET_DEFINE_STATIC(struct gif_list *, ipv4_hashtbl) = NULL; +VNET_DEFINE_STATIC(struct gif_list *, ipv4_srchashtbl) = NULL; VNET_DEFINE_STATIC(struct gif_list, ipv4_list) = CK_LIST_HEAD_INITIALIZER(); #define V_ipv4_hashtbl VNET(ipv4_hashtbl) +#define V_ipv4_srchashtbl VNET(ipv4_srchashtbl) #define V_ipv4_list VNET(ipv4_list) #define GIF_HASH(src, dst) (V_ipv4_hashtbl[\ in_gif_hashval((src), (dst)) & (GIF_HASH_SIZE - 1)]) +#define GIF_SRCHASH(src) (V_ipv4_srchashtbl[\ + fnv_32_buf(&(src), sizeof(src), FNV1_32_INIT) & (GIF_HASH_SIZE - 1)]) #define GIF_HASH_SC(sc) GIF_HASH((sc)->gif_iphdr->ip_src.s_addr,\ (sc)->gif_iphdr->ip_dst.s_addr) static uint32_t in_gif_hashval(in_addr_t src, in_addr_t dst) { uint32_t ret; ret = fnv_32_buf(&src, sizeof(src), FNV1_32_INIT); return (fnv_32_buf(&dst, sizeof(dst), ret)); } static int in_gif_checkdup(const struct gif_softc *sc, in_addr_t src, in_addr_t dst) { struct gif_softc *tmp; if (sc->gif_family == AF_INET && sc->gif_iphdr->ip_src.s_addr == src && sc->gif_iphdr->ip_dst.s_addr == dst) return (EEXIST); CK_LIST_FOREACH(tmp, &GIF_HASH(src, dst), chain) { if (tmp == sc) continue; if (tmp->gif_iphdr->ip_src.s_addr == src && tmp->gif_iphdr->ip_dst.s_addr == dst) return (EADDRNOTAVAIL); } return (0); } +/* + * Check that ingress address belongs to local host. + */ static void +in_gif_set_running(struct gif_softc *sc) +{ + + if (in_localip(sc->gif_iphdr->ip_src)) + GIF2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + else + GIF2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; +} + +/* + * ifaddr_event handler. + * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent + * source address spoofing. + */ +static void +in_gif_srcaddr(void *arg __unused, const struct sockaddr *sa, + int event __unused) +{ + const struct sockaddr_in *sin; + struct gif_softc *sc; + + /* Check that VNET is ready */ + if (V_ipv4_hashtbl == NULL) + return; + + MPASS(in_epoch(net_epoch_preempt)); + sin = (const struct sockaddr_in *)sa; + CK_LIST_FOREACH(sc, &GIF_SRCHASH(sin->sin_addr.s_addr), srchash) { + if (sc->gif_iphdr->ip_src.s_addr != sin->sin_addr.s_addr) + continue; + in_gif_set_running(sc); + } +} + +static void in_gif_attach(struct gif_softc *sc) { if (sc->gif_options & GIF_IGNORE_SOURCE) CK_LIST_INSERT_HEAD(&V_ipv4_list, sc, chain); else CK_LIST_INSERT_HEAD(&GIF_HASH_SC(sc), sc, chain); + + CK_LIST_INSERT_HEAD(&GIF_SRCHASH(sc->gif_iphdr->ip_src.s_addr), + sc, srchash); } int in_gif_setopts(struct gif_softc *sc, u_int options) { /* NOTE: we are protected with gif_ioctl_sx lock */ MPASS(sc->gif_family == AF_INET); MPASS(sc->gif_options != options); if ((options & GIF_IGNORE_SOURCE) != (sc->gif_options & GIF_IGNORE_SOURCE)) { + CK_LIST_REMOVE(sc, srchash); CK_LIST_REMOVE(sc, chain); sc->gif_options = options; in_gif_attach(sc); } return (0); } int in_gif_ioctl(struct gif_softc *sc, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct sockaddr_in *dst, *src; struct ip *ip; int error; /* NOTE: we are protected with gif_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR: src = &((struct in_aliasreq *)data)->ifra_addr; dst = &((struct in_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin_family != dst->sin_family || src->sin_family != AF_INET || src->sin_len != dst->sin_len || src->sin_len != sizeof(*src)) break; if (src->sin_addr.s_addr == INADDR_ANY || dst->sin_addr.s_addr == INADDR_ANY) { error = EADDRNOTAVAIL; break; } - if (V_ipv4_hashtbl == NULL) + if (V_ipv4_hashtbl == NULL) { V_ipv4_hashtbl = gif_hashinit(); + V_ipv4_srchashtbl = gif_hashinit(); + } error = in_gif_checkdup(sc, src->sin_addr.s_addr, dst->sin_addr.s_addr); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } ip = malloc(sizeof(*ip), M_GIF, M_WAITOK | M_ZERO); ip->ip_src.s_addr = src->sin_addr.s_addr; ip->ip_dst.s_addr = dst->sin_addr.s_addr; if (sc->gif_family != 0) { /* Detach existing tunnel first */ + CK_LIST_REMOVE(sc, srchash); CK_LIST_REMOVE(sc, chain); GIF_WAIT(); free(sc->gif_hdr, M_GIF); /* XXX: should we notify about link state change? */ } sc->gif_family = AF_INET; sc->gif_iphdr = ip; in_gif_attach(sc); + in_gif_set_running(sc); break; case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: if (sc->gif_family != AF_INET) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin_family = AF_INET; src->sin_len = sizeof(*src); src->sin_addr = (cmd == SIOCGIFPSRCADDR) ? sc->gif_iphdr->ip_src: sc->gif_iphdr->ip_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int in_gif_output(struct ifnet *ifp, struct mbuf *m, int proto, uint8_t ecn) { struct gif_softc *sc = ifp->if_softc; struct ip *ip; int len; /* prepend new IP header */ MPASS(in_epoch(net_epoch_preempt)); len = sizeof(struct ip); #ifndef __NO_STRICT_ALIGNMENT if (proto == IPPROTO_ETHERIP) len += ETHERIP_ALIGN; #endif M_PREPEND(m, len, M_NOWAIT); if (m == NULL) return (ENOBUFS); #ifndef __NO_STRICT_ALIGNMENT if (proto == IPPROTO_ETHERIP) { len = mtod(m, vm_offset_t) & 3; KASSERT(len == 0 || len == ETHERIP_ALIGN, ("in_gif_output: unexpected misalignment")); m->m_data += len; m->m_len -= ETHERIP_ALIGN; } #endif ip = mtod(m, struct ip *); MPASS(sc->gif_family == AF_INET); bcopy(sc->gif_iphdr, ip, sizeof(struct ip)); ip->ip_p = proto; /* version will be set in ip_output() */ ip->ip_ttl = V_ip_gif_ttl; ip->ip_len = htons(m->m_pkthdr.len); ip->ip_tos = ecn; return (ip_output(m, NULL, NULL, 0, NULL, NULL)); } static int in_gif_input(struct mbuf *m, int off, int proto, void *arg) { struct gif_softc *sc = arg; struct ifnet *gifp; struct ip *ip; uint8_t ecn; MPASS(in_epoch(net_epoch_preempt)); if (sc == NULL) { m_freem(m); KMOD_IPSTAT_INC(ips_nogif); return (IPPROTO_DONE); } gifp = GIF2IFP(sc); if ((gifp->if_flags & IFF_UP) != 0) { ip = mtod(m, struct ip *); ecn = ip->ip_tos; m_adj(m, off); gif_input(m, gifp, proto, ecn); } else { m_freem(m); KMOD_IPSTAT_INC(ips_nogif); } return (IPPROTO_DONE); } static int in_gif_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip *ip; struct gif_softc *sc; int ret; if (V_ipv4_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip = mtod(m, const struct ip *); /* * NOTE: it is safe to iterate without any locking here, because softc * can be reclaimed only when we are not within net_epoch_preempt * section, but ip_encap lookup+input are executed in epoch section. */ ret = 0; CK_LIST_FOREACH(sc, &GIF_HASH(ip->ip_dst.s_addr, ip->ip_src.s_addr), chain) { /* * This is an inbound packet, its ip_dst is source address * in softc. */ if (sc->gif_iphdr->ip_src.s_addr == ip->ip_dst.s_addr && sc->gif_iphdr->ip_dst.s_addr == ip->ip_src.s_addr) { ret = ENCAP_DRV_LOOKUP; goto done; } } /* * No exact match. * Check the list of interfaces with GIF_IGNORE_SOURCE flag. */ CK_LIST_FOREACH(sc, &V_ipv4_list, chain) { if (sc->gif_iphdr->ip_src.s_addr == ip->ip_dst.s_addr) { ret = 32 + 8; /* src + proto */ goto done; } } return (0); done: if ((GIF2IFP(sc)->if_flags & IFF_UP) == 0) return (0); /* ingress filters on outer source */ if ((GIF2IFP(sc)->if_flags & IFF_LINK2) == 0) { struct nhop4_basic nh4; struct in_addr dst; dst = ip->ip_src; if (fib4_lookup_nh_basic(sc->gif_fibnum, dst, 0, 0, &nh4) != 0) return (0); if (nh4.nh_ifp != m->m_pkthdr.rcvif) return (0); } *arg = sc; return (ret); } +static const struct srcaddrtab *ipv4_srcaddrtab; static struct { const struct encap_config encap; const struct encaptab *cookie; } ipv4_encap_cfg[] = { { .encap = { .proto = IPPROTO_IPV4, .min_length = 2 * sizeof(struct ip), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in_gif_lookup, .input = in_gif_input }, }, #ifdef INET6 { .encap = { .proto = IPPROTO_IPV6, .min_length = sizeof(struct ip) + sizeof(struct ip6_hdr), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in_gif_lookup, .input = in_gif_input }, }, #endif { .encap = { .proto = IPPROTO_ETHERIP, .min_length = sizeof(struct ip) + sizeof(struct etherip_header) + sizeof(struct ether_header), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in_gif_lookup, .input = in_gif_input }, } }; void in_gif_init(void) { int i; if (!IS_DEFAULT_VNET(curvnet)) return; + + ipv4_srcaddrtab = ip_encap_register_srcaddr(in_gif_srcaddr, + NULL, M_WAITOK); for (i = 0; i < nitems(ipv4_encap_cfg); i++) ipv4_encap_cfg[i].cookie = ip_encap_attach( &ipv4_encap_cfg[i].encap, NULL, M_WAITOK); } void in_gif_uninit(void) { int i; if (IS_DEFAULT_VNET(curvnet)) { for (i = 0; i < nitems(ipv4_encap_cfg); i++) ip_encap_detach(ipv4_encap_cfg[i].cookie); + ip_encap_unregister_srcaddr(ipv4_srcaddrtab); } - if (V_ipv4_hashtbl != NULL) + if (V_ipv4_hashtbl != NULL) { gif_hashdestroy(V_ipv4_hashtbl); + V_ipv4_hashtbl = NULL; + GIF_WAIT(); + gif_hashdestroy(V_ipv4_srchashtbl); + } } Index: stable/12/sys/netinet/ip_gre.c =================================================================== --- stable/12/sys/netinet/ip_gre.c (revision 340534) +++ stable/12/sys/netinet/ip_gre.c (revision 340535) @@ -1,300 +1,358 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-NetBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * IPv6-over-GRE contributed by Gert Doering * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: ip_gre.c,v 1.29 2003/09/05 23:02:43 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #endif #include #define GRE_TTL 30 VNET_DEFINE(int, ip_gre_ttl) = GRE_TTL; #define V_ip_gre_ttl VNET(ip_gre_ttl) SYSCTL_INT(_net_inet_ip, OID_AUTO, grettl, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip_gre_ttl), 0, "Default TTL value for encapsulated packets"); VNET_DEFINE_STATIC(struct gre_list *, ipv4_hashtbl) = NULL; +VNET_DEFINE_STATIC(struct gre_list *, ipv4_srchashtbl) = NULL; #define V_ipv4_hashtbl VNET(ipv4_hashtbl) +#define V_ipv4_srchashtbl VNET(ipv4_srchashtbl) #define GRE_HASH(src, dst) (V_ipv4_hashtbl[\ in_gre_hashval((src), (dst)) & (GRE_HASH_SIZE - 1)]) +#define GRE_SRCHASH(src) (V_ipv4_srchashtbl[\ + fnv_32_buf(&(src), sizeof(src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) #define GRE_HASH_SC(sc) GRE_HASH((sc)->gre_oip.ip_src.s_addr,\ (sc)->gre_oip.ip_dst.s_addr) static uint32_t in_gre_hashval(in_addr_t src, in_addr_t dst) { uint32_t ret; ret = fnv_32_buf(&src, sizeof(src), FNV1_32_INIT); return (fnv_32_buf(&dst, sizeof(dst), ret)); } static int in_gre_checkdup(const struct gre_softc *sc, in_addr_t src, in_addr_t dst) { struct gre_softc *tmp; if (sc->gre_family == AF_INET && sc->gre_oip.ip_src.s_addr == src && sc->gre_oip.ip_dst.s_addr == dst) return (EEXIST); CK_LIST_FOREACH(tmp, &GRE_HASH(src, dst), chain) { if (tmp == sc) continue; if (tmp->gre_oip.ip_src.s_addr == src && tmp->gre_oip.ip_dst.s_addr == dst) return (EADDRNOTAVAIL); } return (0); } static int in_gre_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip *ip; struct gre_softc *sc; if (V_ipv4_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip = mtod(m, const struct ip *); CK_LIST_FOREACH(sc, &GRE_HASH(ip->ip_dst.s_addr, ip->ip_src.s_addr), chain) { /* * This is an inbound packet, its ip_dst is source address * in softc. */ if (sc->gre_oip.ip_src.s_addr == ip->ip_dst.s_addr && sc->gre_oip.ip_dst.s_addr == ip->ip_src.s_addr) { if ((GRE2IFP(sc)->if_flags & IFF_UP) == 0) return (0); *arg = sc; return (ENCAP_DRV_LOOKUP); } } return (0); } +/* + * Check that ingress address belongs to local host. + */ static void +in_gre_set_running(struct gre_softc *sc) +{ + + if (in_localip(sc->gre_oip.ip_src)) + GRE2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + else + GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; +} + +/* + * ifaddr_event handler. + * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent + * source address spoofing. + */ +static void +in_gre_srcaddr(void *arg __unused, const struct sockaddr *sa, + int event __unused) +{ + const struct sockaddr_in *sin; + struct gre_softc *sc; + + /* Check that VNET is ready */ + if (V_ipv4_hashtbl == NULL) + return; + + MPASS(in_epoch(net_epoch_preempt)); + sin = (const struct sockaddr_in *)sa; + CK_LIST_FOREACH(sc, &GRE_SRCHASH(sin->sin_addr.s_addr), srchash) { + if (sc->gre_oip.ip_src.s_addr != sin->sin_addr.s_addr) + continue; + in_gre_set_running(sc); + } +} + +static void in_gre_attach(struct gre_softc *sc) { sc->gre_hlen = sizeof(struct greip); sc->gre_oip.ip_v = IPVERSION; sc->gre_oip.ip_hl = sizeof(struct ip) >> 2; sc->gre_oip.ip_p = IPPROTO_GRE; gre_updatehdr(sc, &sc->gre_gihdr->gi_gre); CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); + CK_LIST_INSERT_HEAD(&GRE_SRCHASH(sc->gre_oip.ip_src.s_addr), + sc, srchash); } void in_gre_setopts(struct gre_softc *sc, u_long cmd, uint32_t value) { MPASS(cmd == GRESKEY || cmd == GRESOPTS); /* NOTE: we are protected with gre_ioctl_sx lock */ MPASS(sc->gre_family == AF_INET); CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); if (cmd == GRESKEY) sc->gre_key = value; else sc->gre_options = value; in_gre_attach(sc); } int in_gre_ioctl(struct gre_softc *sc, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct sockaddr_in *dst, *src; struct ip *ip; int error; /* NOTE: we are protected with gre_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR: src = &((struct in_aliasreq *)data)->ifra_addr; dst = &((struct in_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin_family != dst->sin_family || src->sin_family != AF_INET || src->sin_len != dst->sin_len || src->sin_len != sizeof(*src)) break; if (src->sin_addr.s_addr == INADDR_ANY || dst->sin_addr.s_addr == INADDR_ANY) { error = EADDRNOTAVAIL; break; } - if (V_ipv4_hashtbl == NULL) + if (V_ipv4_hashtbl == NULL) { V_ipv4_hashtbl = gre_hashinit(); + V_ipv4_srchashtbl = gre_hashinit(); + } error = in_gre_checkdup(sc, src->sin_addr.s_addr, dst->sin_addr.s_addr); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } ip = malloc(sizeof(struct greip) + 3 * sizeof(uint32_t), M_GRE, M_WAITOK | M_ZERO); ip->ip_src.s_addr = src->sin_addr.s_addr; ip->ip_dst.s_addr = dst->sin_addr.s_addr; if (sc->gre_family != 0) { /* Detach existing tunnel first */ CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); /* XXX: should we notify about link state change? */ } sc->gre_family = AF_INET; sc->gre_hdr = ip; sc->gre_oseq = 0; sc->gre_iseq = UINT32_MAX; in_gre_attach(sc); + in_gre_set_running(sc); break; case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: if (sc->gre_family != AF_INET) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin_family = AF_INET; src->sin_len = sizeof(*src); src->sin_addr = (cmd == SIOCGIFPSRCADDR) ? sc->gre_oip.ip_src: sc->gre_oip.ip_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int in_gre_output(struct mbuf *m, int af, int hlen) { struct greip *gi; gi = mtod(m, struct greip *); switch (af) { case AF_INET: /* * gre_transmit() has used M_PREPEND() that doesn't guarantee * m_data is contiguous more than hlen bytes. Use m_copydata() * here to avoid m_pullup(). */ m_copydata(m, hlen + offsetof(struct ip, ip_tos), sizeof(u_char), &gi->gi_ip.ip_tos); m_copydata(m, hlen + offsetof(struct ip, ip_id), sizeof(u_short), (caddr_t)&gi->gi_ip.ip_id); break; #ifdef INET6 case AF_INET6: gi->gi_ip.ip_tos = 0; /* XXX */ ip_fillid(&gi->gi_ip); break; #endif } gi->gi_ip.ip_ttl = V_ip_gre_ttl; gi->gi_ip.ip_len = htons(m->m_pkthdr.len); return (ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL)); } +static const struct srcaddrtab *ipv4_srcaddrtab = NULL; static const struct encaptab *ecookie = NULL; static const struct encap_config ipv4_encap_cfg = { .proto = IPPROTO_GRE, .min_length = sizeof(struct greip) + sizeof(struct ip), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in_gre_lookup, .input = gre_input }; void in_gre_init(void) { if (!IS_DEFAULT_VNET(curvnet)) return; + ipv4_srcaddrtab = ip_encap_register_srcaddr(in_gre_srcaddr, + NULL, M_WAITOK); ecookie = ip_encap_attach(&ipv4_encap_cfg, NULL, M_WAITOK); } void in_gre_uninit(void) { - if (IS_DEFAULT_VNET(curvnet)) + if (IS_DEFAULT_VNET(curvnet)) { ip_encap_detach(ecookie); - if (V_ipv4_hashtbl != NULL) + ip_encap_unregister_srcaddr(ipv4_srcaddrtab); + } + if (V_ipv4_hashtbl != NULL) { gre_hashdestroy(V_ipv4_hashtbl); + V_ipv4_hashtbl = NULL; + GRE_WAIT(); + gre_hashdestroy(V_ipv4_srchashtbl); + } } Index: stable/12/sys/netinet6/in6_gif.c =================================================================== --- stable/12/sys/netinet6/in6_gif.c (revision 340534) +++ stable/12/sys/netinet6/in6_gif.c (revision 340535) @@ -1,429 +1,488 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * Copyright (c) 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: in6_gif.c,v 1.49 2001/05/14 14:02:17 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif #include #include #include #include #include #include #include #include #define GIF_HLIM 30 VNET_DEFINE_STATIC(int, ip6_gif_hlim) = GIF_HLIM; #define V_ip6_gif_hlim VNET(ip6_gif_hlim) SYSCTL_DECL(_net_inet6_ip6); SYSCTL_INT(_net_inet6_ip6, IPV6CTL_GIF_HLIM, gifhlim, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip6_gif_hlim), 0, "Default hop limit for encapsulated packets"); /* * We keep interfaces in a hash table using src+dst as key. * Interfaces with GIF_IGNORE_SOURCE flag are linked into plain list. */ VNET_DEFINE_STATIC(struct gif_list *, ipv6_hashtbl) = NULL; +VNET_DEFINE_STATIC(struct gif_list *, ipv6_srchashtbl) = NULL; VNET_DEFINE_STATIC(struct gif_list, ipv6_list) = CK_LIST_HEAD_INITIALIZER(); #define V_ipv6_hashtbl VNET(ipv6_hashtbl) +#define V_ipv6_srchashtbl VNET(ipv6_srchashtbl) #define V_ipv6_list VNET(ipv6_list) #define GIF_HASH(src, dst) (V_ipv6_hashtbl[\ in6_gif_hashval((src), (dst)) & (GIF_HASH_SIZE - 1)]) +#define GIF_SRCHASH(src) (V_ipv6_srchashtbl[\ + fnv_32_buf((src), sizeof(*src), FNV1_32_INIT) & (GIF_HASH_SIZE - 1)]) #define GIF_HASH_SC(sc) GIF_HASH(&(sc)->gif_ip6hdr->ip6_src,\ &(sc)->gif_ip6hdr->ip6_dst) static uint32_t in6_gif_hashval(const struct in6_addr *src, const struct in6_addr *dst) { uint32_t ret; ret = fnv_32_buf(src, sizeof(*src), FNV1_32_INIT); return (fnv_32_buf(dst, sizeof(*dst), ret)); } static int in6_gif_checkdup(const struct gif_softc *sc, const struct in6_addr *src, const struct in6_addr *dst) { struct gif_softc *tmp; if (sc->gif_family == AF_INET6 && IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_src, src) && IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_dst, dst)) return (EEXIST); CK_LIST_FOREACH(tmp, &GIF_HASH(src, dst), chain) { if (tmp == sc) continue; if (IN6_ARE_ADDR_EQUAL(&tmp->gif_ip6hdr->ip6_src, src) && IN6_ARE_ADDR_EQUAL(&tmp->gif_ip6hdr->ip6_dst, dst)) return (EADDRNOTAVAIL); } return (0); } +/* + * Check that ingress address belongs to local host. + */ static void +in6_gif_set_running(struct gif_softc *sc) +{ + + if (in6_localip(&sc->gif_ip6hdr->ip6_src)) + GIF2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + else + GIF2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; +} + +/* + * ifaddr_event handler. + * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent + * source address spoofing. + */ +static void +in6_gif_srcaddr(void *arg __unused, const struct sockaddr *sa, int event) +{ + const struct sockaddr_in6 *sin; + struct gif_softc *sc; + + /* Check that VNET is ready */ + if (V_ipv6_hashtbl == NULL) + return; + + MPASS(in_epoch(net_epoch_preempt)); + sin = (const struct sockaddr_in6 *)sa; + CK_LIST_FOREACH(sc, &GIF_SRCHASH(&sin->sin6_addr), srchash) { + if (IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_src, + &sin->sin6_addr) == 0) + continue; + in6_gif_set_running(sc); + } +} + +static void in6_gif_attach(struct gif_softc *sc) { if (sc->gif_options & GIF_IGNORE_SOURCE) CK_LIST_INSERT_HEAD(&V_ipv6_list, sc, chain); else CK_LIST_INSERT_HEAD(&GIF_HASH_SC(sc), sc, chain); + + CK_LIST_INSERT_HEAD(&GIF_SRCHASH(&sc->gif_ip6hdr->ip6_src), + sc, srchash); } int in6_gif_setopts(struct gif_softc *sc, u_int options) { /* NOTE: we are protected with gif_ioctl_sx lock */ MPASS(sc->gif_family == AF_INET6); MPASS(sc->gif_options != options); if ((options & GIF_IGNORE_SOURCE) != (sc->gif_options & GIF_IGNORE_SOURCE)) { + CK_LIST_REMOVE(sc, srchash); CK_LIST_REMOVE(sc, chain); sc->gif_options = options; in6_gif_attach(sc); } return (0); } int in6_gif_ioctl(struct gif_softc *sc, u_long cmd, caddr_t data) { struct in6_ifreq *ifr = (struct in6_ifreq *)data; struct sockaddr_in6 *dst, *src; struct ip6_hdr *ip6; int error; /* NOTE: we are protected with gif_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR_IN6: src = &((struct in6_aliasreq *)data)->ifra_addr; dst = &((struct in6_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin6_family != dst->sin6_family || src->sin6_family != AF_INET6 || src->sin6_len != dst->sin6_len || src->sin6_len != sizeof(*src)) break; if (IN6_IS_ADDR_UNSPECIFIED(&src->sin6_addr) || IN6_IS_ADDR_UNSPECIFIED(&dst->sin6_addr)) { error = EADDRNOTAVAIL; break; } /* * Check validity of the scope zone ID of the * addresses, and convert it into the kernel * internal form if necessary. */ if ((error = sa6_embedscope(src, 0)) != 0 || (error = sa6_embedscope(dst, 0)) != 0) break; - if (V_ipv6_hashtbl == NULL) + if (V_ipv6_hashtbl == NULL) { V_ipv6_hashtbl = gif_hashinit(); + V_ipv6_srchashtbl = gif_hashinit(); + } error = in6_gif_checkdup(sc, &src->sin6_addr, &dst->sin6_addr); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } ip6 = malloc(sizeof(*ip6), M_GIF, M_WAITOK | M_ZERO); ip6->ip6_src = src->sin6_addr; ip6->ip6_dst = dst->sin6_addr; ip6->ip6_vfc = IPV6_VERSION; if (sc->gif_family != 0) { /* Detach existing tunnel first */ + CK_LIST_REMOVE(sc, srchash); CK_LIST_REMOVE(sc, chain); GIF_WAIT(); free(sc->gif_hdr, M_GIF); /* XXX: should we notify about link state change? */ } sc->gif_family = AF_INET6; sc->gif_ip6hdr = ip6; in6_gif_attach(sc); + in6_gif_set_running(sc); break; case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: if (sc->gif_family != AF_INET6) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in6 *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin6_family = AF_INET6; src->sin6_len = sizeof(*src); src->sin6_addr = (cmd == SIOCGIFPSRCADDR_IN6) ? sc->gif_ip6hdr->ip6_src: sc->gif_ip6hdr->ip6_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error == 0) error = sa6_recoverscope(src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int in6_gif_output(struct ifnet *ifp, struct mbuf *m, int proto, uint8_t ecn) { struct gif_softc *sc = ifp->if_softc; struct ip6_hdr *ip6; int len; /* prepend new IP header */ MPASS(in_epoch(net_epoch_preempt)); len = sizeof(struct ip6_hdr); #ifndef __NO_STRICT_ALIGNMENT if (proto == IPPROTO_ETHERIP) len += ETHERIP_ALIGN; #endif M_PREPEND(m, len, M_NOWAIT); if (m == NULL) return (ENOBUFS); #ifndef __NO_STRICT_ALIGNMENT if (proto == IPPROTO_ETHERIP) { len = mtod(m, vm_offset_t) & 3; KASSERT(len == 0 || len == ETHERIP_ALIGN, ("in6_gif_output: unexpected misalignment")); m->m_data += len; m->m_len -= ETHERIP_ALIGN; } #endif ip6 = mtod(m, struct ip6_hdr *); MPASS(sc->gif_family == AF_INET6); bcopy(sc->gif_ip6hdr, ip6, sizeof(struct ip6_hdr)); ip6->ip6_flow |= htonl((uint32_t)ecn << 20); ip6->ip6_nxt = proto; ip6->ip6_hlim = V_ip6_gif_hlim; /* * force fragmentation to minimum MTU, to avoid path MTU discovery. * it is too painful to ask for resend of inner packet, to achieve * path MTU discovery for encapsulated packets. */ return (ip6_output(m, 0, NULL, IPV6_MINMTU, 0, NULL, NULL)); } static int in6_gif_input(struct mbuf *m, int off, int proto, void *arg) { struct gif_softc *sc = arg; struct ifnet *gifp; struct ip6_hdr *ip6; uint8_t ecn; MPASS(in_epoch(net_epoch_preempt)); if (sc == NULL) { m_freem(m); IP6STAT_INC(ip6s_nogif); return (IPPROTO_DONE); } gifp = GIF2IFP(sc); if ((gifp->if_flags & IFF_UP) != 0) { ip6 = mtod(m, struct ip6_hdr *); ecn = (ntohl(ip6->ip6_flow) >> 20) & 0xff; m_adj(m, off); gif_input(m, gifp, proto, ecn); } else { m_freem(m); IP6STAT_INC(ip6s_nogif); } return (IPPROTO_DONE); } static int in6_gif_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip6_hdr *ip6; struct gif_softc *sc; int ret; if (V_ipv6_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); /* * NOTE: it is safe to iterate without any locking here, because softc * can be reclaimed only when we are not within net_epoch_preempt * section, but ip_encap lookup+input are executed in epoch section. */ ip6 = mtod(m, const struct ip6_hdr *); ret = 0; CK_LIST_FOREACH(sc, &GIF_HASH(&ip6->ip6_dst, &ip6->ip6_src), chain) { /* * This is an inbound packet, its ip6_dst is source address * in softc. */ if (IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_src, &ip6->ip6_dst) && IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_dst, &ip6->ip6_src)) { ret = ENCAP_DRV_LOOKUP; goto done; } } /* * No exact match. * Check the list of interfaces with GIF_IGNORE_SOURCE flag. */ CK_LIST_FOREACH(sc, &V_ipv6_list, chain) { if (IN6_ARE_ADDR_EQUAL(&sc->gif_ip6hdr->ip6_src, &ip6->ip6_dst)) { ret = 128 + 8; /* src + proto */ goto done; } } return (0); done: if ((GIF2IFP(sc)->if_flags & IFF_UP) == 0) return (0); /* ingress filters on outer source */ if ((GIF2IFP(sc)->if_flags & IFF_LINK2) == 0) { struct nhop6_basic nh6; if (fib6_lookup_nh_basic(sc->gif_fibnum, &ip6->ip6_src, ntohs(in6_getscope(&ip6->ip6_src)), 0, 0, &nh6) != 0) return (0); if (nh6.nh_ifp != m->m_pkthdr.rcvif) return (0); } *arg = sc; return (ret); } +static const struct srcaddrtab *ipv6_srcaddrtab; static struct { const struct encap_config encap; const struct encaptab *cookie; } ipv6_encap_cfg[] = { #ifdef INET { .encap = { .proto = IPPROTO_IPV4, .min_length = sizeof(struct ip6_hdr) + sizeof(struct ip), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in6_gif_lookup, .input = in6_gif_input }, }, #endif { .encap = { .proto = IPPROTO_IPV6, .min_length = 2 * sizeof(struct ip6_hdr), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in6_gif_lookup, .input = in6_gif_input }, }, { .encap = { .proto = IPPROTO_ETHERIP, .min_length = sizeof(struct ip6_hdr) + sizeof(struct etherip_header) + sizeof(struct ether_header), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in6_gif_lookup, .input = in6_gif_input }, } }; void in6_gif_init(void) { int i; if (!IS_DEFAULT_VNET(curvnet)) return; + + ipv6_srcaddrtab = ip6_encap_register_srcaddr(in6_gif_srcaddr, + NULL, M_WAITOK); for (i = 0; i < nitems(ipv6_encap_cfg); i++) ipv6_encap_cfg[i].cookie = ip6_encap_attach( &ipv6_encap_cfg[i].encap, NULL, M_WAITOK); } void in6_gif_uninit(void) { int i; if (IS_DEFAULT_VNET(curvnet)) { for (i = 0; i < nitems(ipv6_encap_cfg); i++) ip6_encap_detach(ipv6_encap_cfg[i].cookie); + ip6_encap_unregister_srcaddr(ipv6_srcaddrtab); } - if (V_ipv6_hashtbl != NULL) + if (V_ipv6_hashtbl != NULL) { gif_hashdestroy(V_ipv6_hashtbl); + V_ipv6_hashtbl = NULL; + GIF_WAIT(); + gif_hashdestroy(V_ipv6_srchashtbl); + } } Index: stable/12/sys/netinet6/ip6_gre.c =================================================================== --- stable/12/sys/netinet6/ip6_gre.c (revision 340534) +++ stable/12/sys/netinet6/ip6_gre.c (revision 340535) @@ -1,288 +1,346 @@ /*- * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif #include #include #include #include #include #include VNET_DEFINE(int, ip6_gre_hlim) = IPV6_DEFHLIM; #define V_ip6_gre_hlim VNET(ip6_gre_hlim) SYSCTL_DECL(_net_inet6_ip6); SYSCTL_INT(_net_inet6_ip6, OID_AUTO, grehlim, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip6_gre_hlim), 0, "Default hop limit for encapsulated packets"); VNET_DEFINE_STATIC(struct gre_list *, ipv6_hashtbl) = NULL; +VNET_DEFINE_STATIC(struct gre_list *, ipv6_srchashtbl) = NULL; #define V_ipv6_hashtbl VNET(ipv6_hashtbl) +#define V_ipv6_srchashtbl VNET(ipv6_srchashtbl) #define GRE_HASH(src, dst) (V_ipv6_hashtbl[\ in6_gre_hashval((src), (dst)) & (GRE_HASH_SIZE - 1)]) +#define GRE_SRCHASH(src) (V_ipv6_srchashtbl[\ + fnv_32_buf((src), sizeof(*src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) #define GRE_HASH_SC(sc) GRE_HASH(&(sc)->gre_oip6.ip6_src,\ &(sc)->gre_oip6.ip6_dst) static uint32_t in6_gre_hashval(const struct in6_addr *src, const struct in6_addr *dst) { uint32_t ret; ret = fnv_32_buf(src, sizeof(*src), FNV1_32_INIT); return (fnv_32_buf(dst, sizeof(*dst), ret)); } static int in6_gre_checkdup(const struct gre_softc *sc, const struct in6_addr *src, const struct in6_addr *dst) { struct gre_softc *tmp; if (sc->gre_family == AF_INET6 && IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, src) && IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, dst)) return (EEXIST); CK_LIST_FOREACH(tmp, &GRE_HASH(src, dst), chain) { if (tmp == sc) continue; if (IN6_ARE_ADDR_EQUAL(&tmp->gre_oip6.ip6_src, src) && IN6_ARE_ADDR_EQUAL(&tmp->gre_oip6.ip6_dst, dst)) return (EADDRNOTAVAIL); } return (0); } static int in6_gre_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip6_hdr *ip6; struct gre_softc *sc; if (V_ipv6_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip6 = mtod(m, const struct ip6_hdr *); CK_LIST_FOREACH(sc, &GRE_HASH(&ip6->ip6_dst, &ip6->ip6_src), chain) { /* * This is an inbound packet, its ip6_dst is source address * in softc. */ if (IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, &ip6->ip6_dst) && IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, &ip6->ip6_src)) { if ((GRE2IFP(sc)->if_flags & IFF_UP) == 0) return (0); *arg = sc; return (ENCAP_DRV_LOOKUP); } } return (0); } +/* + * Check that ingress address belongs to local host. + */ static void +in6_gre_set_running(struct gre_softc *sc) +{ + + if (in6_localip(&sc->gre_oip6.ip6_src)) + GRE2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; + else + GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; +} + +/* + * ifaddr_event handler. + * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent + * source address spoofing. + */ +static void +in6_gre_srcaddr(void *arg __unused, const struct sockaddr *sa, + int event __unused) +{ + const struct sockaddr_in6 *sin; + struct gre_softc *sc; + + /* Check that VNET is ready */ + if (V_ipv6_hashtbl == NULL) + return; + + MPASS(in_epoch(net_epoch_preempt)); + sin = (const struct sockaddr_in6 *)sa; + CK_LIST_FOREACH(sc, &GRE_SRCHASH(&sin->sin6_addr), srchash) { + if (IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, + &sin->sin6_addr) == 0) + continue; + in6_gre_set_running(sc); + } +} + +static void in6_gre_attach(struct gre_softc *sc) { sc->gre_hlen = sizeof(struct greip6); sc->gre_oip6.ip6_vfc = IPV6_VERSION; sc->gre_oip6.ip6_nxt = IPPROTO_GRE; gre_updatehdr(sc, &sc->gre_gi6hdr->gi6_gre); CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); + CK_LIST_INSERT_HEAD(&GRE_SRCHASH(&sc->gre_oip6.ip6_src), sc, srchash); } void in6_gre_setopts(struct gre_softc *sc, u_long cmd, uint32_t value) { MPASS(cmd == GRESKEY || cmd == GRESOPTS); /* NOTE: we are protected with gre_ioctl_sx lock */ MPASS(sc->gre_family == AF_INET6); CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); if (cmd == GRESKEY) sc->gre_key = value; else sc->gre_options = value; in6_gre_attach(sc); } int in6_gre_ioctl(struct gre_softc *sc, u_long cmd, caddr_t data) { struct in6_ifreq *ifr = (struct in6_ifreq *)data; struct sockaddr_in6 *dst, *src; struct ip6_hdr *ip6; int error; /* NOTE: we are protected with gre_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR_IN6: src = &((struct in6_aliasreq *)data)->ifra_addr; dst = &((struct in6_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin6_family != dst->sin6_family || src->sin6_family != AF_INET6 || src->sin6_len != dst->sin6_len || src->sin6_len != sizeof(*src)) break; if (IN6_IS_ADDR_UNSPECIFIED(&src->sin6_addr) || IN6_IS_ADDR_UNSPECIFIED(&dst->sin6_addr)) { error = EADDRNOTAVAIL; break; } /* * Check validity of the scope zone ID of the * addresses, and convert it into the kernel * internal form if necessary. */ if ((error = sa6_embedscope(src, 0)) != 0 || (error = sa6_embedscope(dst, 0)) != 0) break; - if (V_ipv6_hashtbl == NULL) + if (V_ipv6_hashtbl == NULL) { V_ipv6_hashtbl = gre_hashinit(); + V_ipv6_srchashtbl = gre_hashinit(); + } error = in6_gre_checkdup(sc, &src->sin6_addr, &dst->sin6_addr); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } ip6 = malloc(sizeof(struct greip6) + 3 * sizeof(uint32_t), M_GRE, M_WAITOK | M_ZERO); ip6->ip6_src = src->sin6_addr; ip6->ip6_dst = dst->sin6_addr; if (sc->gre_family != 0) { /* Detach existing tunnel first */ CK_LIST_REMOVE(sc, chain); + CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); /* XXX: should we notify about link state change? */ } sc->gre_family = AF_INET6; sc->gre_hdr = ip6; sc->gre_oseq = 0; sc->gre_iseq = UINT32_MAX; in6_gre_attach(sc); + in6_gre_set_running(sc); break; case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: if (sc->gre_family != AF_INET6) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in6 *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin6_family = AF_INET6; src->sin6_len = sizeof(*src); src->sin6_addr = (cmd == SIOCGIFPSRCADDR_IN6) ? sc->gre_oip6.ip6_src: sc->gre_oip6.ip6_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error == 0) error = sa6_recoverscope(src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int in6_gre_output(struct mbuf *m, int af __unused, int hlen __unused) { struct greip6 *gi6; gi6 = mtod(m, struct greip6 *); gi6->gi6_ip6.ip6_hlim = V_ip6_gre_hlim; return (ip6_output(m, NULL, NULL, IPV6_MINMTU, NULL, NULL, NULL)); } +static const struct srcaddrtab *ipv6_srcaddrtab = NULL; static const struct encaptab *ecookie = NULL; static const struct encap_config ipv6_encap_cfg = { .proto = IPPROTO_GRE, .min_length = sizeof(struct greip6) + #ifdef INET sizeof(struct ip), #else sizeof(struct ip6_hdr), #endif .exact_match = ENCAP_DRV_LOOKUP, .lookup = in6_gre_lookup, .input = gre_input }; void in6_gre_init(void) { if (!IS_DEFAULT_VNET(curvnet)) return; + ipv6_srcaddrtab = ip6_encap_register_srcaddr(in6_gre_srcaddr, + NULL, M_WAITOK); ecookie = ip6_encap_attach(&ipv6_encap_cfg, NULL, M_WAITOK); } void in6_gre_uninit(void) { - if (IS_DEFAULT_VNET(curvnet)) + if (IS_DEFAULT_VNET(curvnet)) { ip6_encap_detach(ecookie); - if (V_ipv6_hashtbl != NULL) + ip6_encap_unregister_srcaddr(ipv6_srcaddrtab); + } + if (V_ipv6_hashtbl != NULL) { gre_hashdestroy(V_ipv6_hashtbl); + V_ipv6_hashtbl = NULL; + GRE_WAIT(); + gre_hashdestroy(V_ipv6_srchashtbl); + } } Index: stable/12 =================================================================== --- stable/12 (revision 340534) +++ stable/12 (revision 340535) Property changes on: stable/12 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head:r339551-339553,339649