Index: projects/ifnet/UPDATING =================================================================== --- projects/ifnet/UPDATING (revision 279031) +++ projects/ifnet/UPDATING (revision 279032) @@ -1,1156 +1,1156 @@ Updating Information for FreeBSD current users. This file is maintained and copyrighted by M. Warner Losh . See end of file for further details. For commonly done items, please see the COMMON ITEMS: section later in the file. These instructions assume that you basically know what you are doing. If not, then please consult the FreeBSD handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html Items affecting the ports and packages system can be found in /usr/ports/UPDATING. Please read that file before running portupgrade. NOTE: FreeBSD has switched from gcc to clang. If you have trouble bootstrapping from older versions of FreeBSD, try WITHOUT_CLANG and WITH_GCC to bootstrap to the tip of head, and then rebuild without this option. The bootstrap process from older version of current across the gcc/clang cutover is a bit fragile. NOTE TO PEOPLE WHO THINK THAT FreeBSD 11.x IS SLOW: FreeBSD 11.x has many debugging features turned on, in both the kernel and userland. These features attempt to detect incorrect use of system primitives, and encourage loud failure through extra sanity checking and fail stop semantics. They also substantially impact system performance. If you want to do performance measurement, benchmarking, and optimization, you'll want to turn them off. This includes various WITNESS- related kernel options, INVARIANTS, malloc debugging flags in userland, and various verbose features in the kernel. Many developers choose to disable these features on build machines to maximize performance. (To completely disable malloc debugging, define MALLOC_PRODUCTION in /etc/make.conf, or to merely disable the most expensive debugging functionality run "ln -s 'abort:false,junk:false' /etc/malloc.conf".) 
20150217: If you are running a -CURRENT kernel since r273872 (Oct 30th, 2014), but before r278950, the RNG was not seeded properly. Immediately upgrade the kernel to r278950 or later and regenerate any keys (e.g. ssh keys or openssl keys) that were generated w/ a kernel from that range. This does not effect programs that directly used /dev/random - or /dev/urandom. All userland uses of arc4random(3) are effected. + or /dev/urandom. All userland uses of arc4random(3) are affected. 20150210: The autofs(4) ABI was changed in order to restore binary compatibility with 10.1-RELEASE. The automountd(8) daemon needs to be rebuilt to work with the new kernel. 20150131: The powerpc64 kernel has been changed to a position-independent executable. This can only be booted with a new version of loader(8), so make sure to update both world and kernel before rebooting. 20150118: Clang and llvm have been upgraded to 3.5.1 release. This is a bugfix only release, no new features have been added. Please see the 20141231 entry below for information about prerequisites and upgrading, if you are not already using 3.5.0. 20150107: ELF tools addr2line, elfcopy (strip), nm, size, and strings are now taken from the ELF Tool Chain project rather than GNU binutils. They should be drop-in replacements, with the addition of arm64 support. The WITHOUT_ELFTOOLCHAIN_TOOLS= knob may be used to obtain the binutils tools, if necessary. 20150105: The default Unbound configuration now enables remote control using a local socket. Users who have already enabled the local_unbound service should regenerate their configuration by running "service local_unbound setup" as root. 20150102: The GNU texinfo and GNU info pages have been removed. To be able to view GNU info pages please install texinfo from ports. 20141231: Clang, llvm and lldb have been upgraded to 3.5.0 release. As of this release, a prerequisite for building clang, llvm and lldb is a C++11 capable compiler and C++11 standard library. 
This means that to be able to successfully build the cross-tools stage of buildworld, with clang as the bootstrap compiler, your system compiler or cross compiler should either be clang 3.3 or later, or gcc 4.8 or later, and your system C++ library should be libc++, or libstdc++ from gcc 4.8 or later. On any standard FreeBSD 10.x or 11.x installation, where clang and libc++ are on by default (that is, on x86 or arm), this should work out of the box. On 9.x installations where clang is enabled by default, e.g. on x86 and powerpc, libc++ will not be enabled by default, so libc++ should be built (with clang) and installed first. If both clang and libc++ are missing, build clang first, then use it to build libc++. On 8.x and earlier installations, upgrade to 9.x first, and then follow the instructions for 9.x above. Sparc64 and mips users are unaffected, as they still use gcc 4.2.1 by default, and do not build clang. Many embedded systems are resource constrained, and will not be able to build clang in a reasonable time, or in some cases at all. In those cases, cross building bootable systems on amd64 is a workaround. This new version of clang introduces a number of new warnings, of which the following are most likely to appear: -Wabsolute-value This warns in two cases, for both C and C++: * When the code is trying to take the absolute value of an unsigned quantity, which is effectively a no-op, and almost never what was intended. The code should be fixed, if at all possible. If you are sure that the unsigned quantity can be safely cast to signed, without loss of information or undefined behavior, you can add an explicit cast, or disable the warning. * When the code is trying to take an absolute value, but the called abs() variant is for the wrong type, which can lead to truncation. If you want to disable the warning instead of fixing the code, please make sure that truncation will not occur, or it might lead to unwanted side-effects. 
-Wtautological-undefined-compare and -Wundefined-bool-conversion These warn when C++ code is trying to compare 'this' against NULL, while 'this' should never be NULL in well-defined C++ code. However, there is some legacy (pre C++11) code out there, which actively abuses this feature, which was less strictly defined in previous C++ versions. Squid and openjdk do this, for example. The warning can be turned off for C++98 and earlier, but compiling the code in C++11 mode might result in unexpected behavior; for example, the parts of the program that are unreachable could be optimized away. 20141222: The old NFS client and server (kernel options NFSCLIENT, NFSSERVER) kernel sources have been removed. The .h files remain, since some utilities include them. This will need to be fixed later. If "mount -t oldnfs ..." is attempted, it will fail. If the "-o" option on mountd(8), nfsd(8) or nfsstat(1) is used, the utilities will report errors. 20141121: The handling of LOCAL_LIB_DIRS has been altered to skip addition of directories to top level SUBDIR variable when their parent directory is included in LOCAL_DIRS. Users with build systems with such hierarchies and without SUBDIR entries in the parent directory Makefiles should add them or add the directories to LOCAL_DIRS. 20141109: faith(4) and faithd(8) have been removed from the base system. Faith has been obsolete for a very long time. 20141104: vt(4), the new console driver, is enabled by default. It brings support for Unicode and double-width characters, as well as support for UEFI and integration with the KMS kernel video drivers. You may need to update your console settings in /etc/rc.conf, most probably the keymap. During boot, /etc/rc.d/syscons will indicate what you need to do. vt(4) still has issues and lacks some features compared to syscons(4). 
See the wiki for up-to-date information: https://wiki.freebsd.org/Newcons If you want to keep using syscons(4), you can do so by adding the following line to /boot/loader.conf: kern.vty=sc 20141102: pjdfstest has been integrated into kyua as an opt-in test suite. Please see share/doc/pjdfstest/README for more details on how to execute it. 20141009: gperf has been removed from the base system for architectures that use clang. Ports that require gperf will obtain it from the devel/gperf port. 20140923: pjdfstest has been moved from tools/regression/pjdfstest to contrib/pjdfstest. 20140922: At svn r271982, the default linux compat kernel ABI has been adjusted to 2.6.18 in support of the linux-c6 compat ports infrastructure update. If you wish to continue using the linux-f10 compat ports, add compat.linux.osrelease=2.6.16 to your local sysctl.conf. Users are encouraged to update their linux-compat packages to linux-c6 during their next update cycle. 20140729: The ofwfb driver, used to provide a graphics console on PowerPC when using vt(4), no longer allows mmap() of all physical memory. This will prevent Xorg on PowerPC with some ATI graphics cards from initializing properly unless x11-servers/xorg-server is updated to 1.12.4_8 or newer. 20140723: The xdev targets have been converted to using TARGET and TARGET_ARCH instead of XDEV and XDEV_ARCH. 20140719: The default unbound configuration has been modified to address issues with reverse lookups on networks that use private address ranges. If you use the local_unbound service, run "service local_unbound setup" as root to regenerate your configuration, then "service local_unbound reload" to load the new configuration. 20140709: The GNU texinfo and GNU info pages are no longer built and installed. The WITH_INFO knob has been added to allow them to be built and installed again. 
UPDATE: see 20150102 entry on texinfo's removal 20140708: The GNU readline library is now an INTERNALLIB - that is, it is statically linked into consumers (GDB and variants) in the base system, and the shared library is no longer installed. The devel/readline port is available for third party software that requires readline. 20140702: The Itanium architecture (ia64) has been removed from the list of known architectures. This is the first step in the removal of the architecture. 20140701: Commit r268115 has added NFSv4.1 server support, merged from projects/nfsv4.1-server. Since this includes changes to the internal interfaces between the NFS related modules, a full build of the kernel and modules will be necessary. __FreeBSD_version has been bumped. 20140629: The WITHOUT_VT_SUPPORT kernel config knob has been renamed WITHOUT_VT. (The other _SUPPORT knobs have a consistent meaning which differs from the behaviour controlled by this knob.) 20140619: Maximal length of the serial number in CTL was increased from 16 to 64 chars, that breaks ABI. All CTL-related tools, such as ctladm and ctld, need to be rebuilt to work with a new kernel. 20140606: The libatf-c and libatf-c++ major versions were downgraded to 0 and 1 respectively to match the upstream numbers. They were out of sync because, when they were originally added to FreeBSD, the upstream versions were not respected. These libraries are private and not yet built by default, so renumbering them should be a non-issue. However, unclean source trees will yield broken test programs once the operator executes "make delete-old-libs" after a "make installworld". Additionally, the atf-sh binary was made private by moving it into /usr/libexec/. Already-built shell test programs will keep the path to the old binary so they will break after "make delete-old" is run. If you are using WITH_TESTS=yes (not the default), wipe the object tree and rebuild from scratch to prevent spurious test failures. 
This is only needed once: the misnumbered libraries and misplaced binaries have been added to OptionalObsoleteFiles.inc so they will be removed during a clean upgrade. 20140512: Clang and llvm have been upgraded to 3.4.1 release. 20140508: We bogusly installed src.opts.mk in /usr/share/mk. This file should be removed to avoid issues in the future (and has been added to ObsoleteFiles.inc). 20140505: /etc/src.conf now affects only builds of the FreeBSD src tree. In the past, it affected all builds that used the bsd.*.mk files. The old behavior was a bug, but people may have relied upon it. To get this behavior back, you can .include /etc/src.conf from /etc/make.conf (which is still global and isn't changed). This also changes the behavior of incremental builds inside the tree of individual directories. Set MAKESYSPATH to ".../share/mk" to do that. Although this has survived make universe and some upgrade scenarios, other upgrade scenarios may have broken. At least one form of temporary breakage was fixed with MAKESYSPATH settings for buildworld as well... In cases where MAKESYSPATH isn't working with this setting, you'll need to set it to the full path to your tree. One side effect of all this cleaning up is that bsd.compiler.mk is no longer implicitly included by bsd.own.mk. If you wish to use COMPILER_TYPE, you must now explicitly include bsd.compiler.mk as well. 20140430: The lindev device has been removed since /dev/full has been made a standard device. __FreeBSD_version has been bumped. 20140424: The knob WITHOUT_VI was added to the base system, which controls building ex(1), vi(1), etc. Older releases of FreeBSD required ex(1) in order to reorder files share/termcap and didn't build ex(1) as a build tool, so building/installing with WITH_VI is highly advised for build hosts for older releases. This issue has been fixed in stable/9 and stable/10 in r277022 and r276991, respectively. 20140418: The YES_HESIOD knob has been removed. 
It has been obsolete for a decade. Please move to using WITH_HESIOD instead or your builds will silently lack HESIOD. 20140405: The uart(4) driver has been changed with respect to its handling of the low-level console. Previously the uart(4) driver prevented any process from changing the baudrate or the CLOCAL and HUPCL control flags. By removing the restrictions, operators can make changes to the serial console port without having to reboot. However, when getty(8) is started on the serial device that is associated with the low-level console, a misconfigured terminal line in /etc/ttys will now have a real impact. Before upgrading the kernel, make sure that /etc/ttys has the serial console device configured as 3wire without baudrate to preserve the previous behaviour. E.g: ttyu0 "/usr/libexec/getty 3wire" vt100 on secure 20140306: Support for libwrap (TCP wrappers) in rpcbind was disabled by default to improve performance. To re-enable it, if needed, run rpcbind with command line option -W. 20140226: Switched back to the GPL dtc compiler due to updates in the upstream dts files not being supported by the BSDL dtc compiler. You will need to rebuild your kernel toolchain to pick up the new compiler. Core dumps may result while building dtb files during a kernel build if you fail to do so. Set WITHOUT_GPL_DTC if you require the BSDL compiler. 20140216: Clang and llvm have been upgraded to 3.4 release. 20140216: The nve(4) driver has been removed. Please use the nfe(4) driver for NVIDIA nForce MCP Ethernet adapters instead. 20140212: An ABI incompatibility crept into the libc++ 3.4 import in r261283. This could cause certain C++ applications using shared libraries built against the previous version of libc++ to crash. The incompatibility has now been fixed, but any C++ applications or shared libraries built between r261283 and r261801 should be recompiled. 20140204: OpenSSH will now ignore errors caused by kernel lacking of Capsicum capability mode support. 
Please note that enabling the feature in kernel is still highly recommended. 20140131: OpenSSH is now built with sandbox support, and will use sandbox as the default privilege separation method. This requires Capsicum capability mode support in kernel. 20140128: The libelf and libdwarf libraries have been updated to newer versions from upstream. Shared library version numbers for these two libraries were bumped. Any ports or binaries requiring these two libraries should be recompiled. __FreeBSD_version is bumped to 1100006. 20140110: If a Makefile in a tests/ directory was auto-generating a Kyuafile instead of providing an explicit one, this would prevent such Makefile from providing its own Kyuafile in the future during NO_CLEAN builds. This has been fixed in the Makefiles but manual intervention is needed to clean an objdir if you use NO_CLEAN: # find /usr/obj -name Kyuafile | xargs rm -f 20131213: The behavior of gss_pseudo_random() for the krb5 mechanism has changed, for applications requesting a longer random string than produced by the underlying enctype's pseudo-random() function. In particular, the random string produced from a session key of enctype aes256-cts-hmac-sha1-96 or aes256-cts-hmac-sha1-96 will be different at the 17th octet and later, after this change. The counter used in the PRF+ construction is now encoded as a big-endian integer in accordance with RFC 4402. __FreeBSD_version is bumped to 1100004. 20131108: The WITHOUT_ATF build knob has been removed and its functionality has been subsumed into the more generic WITHOUT_TESTS. If you were using the former to disable the build of the ATF libraries, you should change your settings to use the latter. 20131025: The default version of mtree is nmtree which is obtained from NetBSD. The output is generally the same, but may vary slightly. If you found you need identical output adding "-F freebsd9" to the command line should do the trick. For the time being, the old mtree is available as fmtree. 
20131014: libbsdyml has been renamed to libyaml and moved to /usr/lib/private. This will break ports-mgmt/pkg. Rebuild the port, or upgrade to pkg 1.1.4_8 and verify bsdyml not linked in, before running "make delete-old-libs": # make -C /usr/ports/ports-mgmt/pkg build deinstall install clean or # pkg install pkg; ldd /usr/local/sbin/pkg | grep bsdyml 20131010: The rc.d/jail script has been updated to support jail(8) configuration file. The "jail__*" rc.conf(5) variables for per-jail configuration are automatically converted to /var/run/jail..conf before the jail(8) utility is invoked. This is transparently backward compatible. See below about some incompatibilities and rc.conf(5) manual page for more details. These variables are now deprecated in favor of jail(8) configuration file. One can use "rc.d/jail config " command to generate a jail(8) configuration file in /var/run/jail..conf without running the jail(8) utility. The default pathname of the configuration file is /etc/jail.conf and can be specified by using $jail_conf or $jail__conf variables. Please note that jail_devfs_ruleset accepts an integer at this moment. Please consider to rewrite the ruleset name with an integer. 20130930: BIND has been removed from the base system. If all you need is a local resolver, simply enable and start the local_unbound service instead. Otherwise, several versions of BIND are available in the ports tree. The dns/bind99 port is one example. With this change, nslookup(1) and dig(1) are no longer in the base system. Users should instead use host(1) and drill(1) which are in the base system. Alternatively, nslookup and dig can be obtained by installing the dns/bind-tools port. 20130916: With the addition of unbound(8), a new unbound user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20130911: OpenSSH is now built with DNSSEC support, and will by default silently trust signed SSHFP records. 
This can be controlled with the VerifyHostKeyDNS client configuration setting. DNSSEC support can be disabled entirely with the WITHOUT_LDNS option in src.conf. 20130906: The GNU Compiler Collection and C++ standard library (libstdc++) are no longer built by default on platforms where clang is the system compiler. You can enable them with the WITH_GCC and WITH_GNUCXX options in src.conf. 20130905: The PROCDESC kernel option is now part of the GENERIC kernel configuration and is required for rwhod(8) to work. If you are using a custom kernel configuration, you should include 'options PROCDESC'. 20130905: The API and ABI related to the Capsicum framework were modified in a backward-incompatible way. The userland libraries and programs have to be recompiled to work with the new kernel. This includes the following libraries and programs, but the whole buildworld is advised: libc, libprocstat, dhclient, tcpdump, hastd, hastctl, kdump, procstat, rwho, rwhod, uniq. 20130903: AES-NI intrinsic support has been added to gcc. The AES-NI module has been updated to use this support. A new gcc is required to build the aesni module on both i386 and amd64. 20130821: The PADLOCK_RNG and RDRAND_RNG kernel options are now devices. Thus "device padlock_rng" and "device rdrand_rng" should be used instead of "options PADLOCK_RNG" & "options RDRAND_RNG". 20130813: WITH_ICONV has been split into two feature sets. WITH_ICONV now enables just the iconv* functionality and is now on by default. WITH_LIBICONV_COMPAT enables the libiconv api and link time compatibility. Set WITHOUT_ICONV to build the old way. If you have been using WITH_ICONV before, you will very likely need to turn on WITH_LIBICONV_COMPAT. 20130806: The INVARIANTS option now enables DEBUG for code with OpenSolaris and Illumos origin, including ZFS. If you have INVARIANTS in your kernel configuration, then there is no need to set DEBUG or ZFS_DEBUG explicitly. 
DEBUG used to enable witness(9) tracking of OpenSolaris (mostly ZFS) locks if the WITNESS option was set. Because that generated a lot of witness(9) reports and all of them were believed to be false positives, this is no longer done. The new option OPENSOLARIS_WITNESS can be used to achieve the previous behavior. 20130806: Timer values in IPv6 data structures now use time_uptime instead of time_second. Although this is not a user-visible functional change, userland utilities which directly use them---ndp(8), rtadvd(8), and rtsold(8) in the base system---need to be updated to r253970 or later. 20130802: find -delete can now delete the pathnames given as arguments, instead of only files found below them or if the pathname did not contain any slashes. Formerly, the following error message would result: find: -delete: : relative path potentially not safe Deleting the pathnames given as arguments can be prevented without error messages using -mindepth 1 or by changing directory and passing "." as argument to find. This works in the old as well as the new version of find. 20130726: The behavior of devfs rules path matching has been changed. The pattern is now always matched against the fully qualified devfs path, and slash characters must be explicitly matched by slashes in the pattern (FNM_PATHNAME). Rulesets involving devfs subdirectories must be reviewed. 20130716: The default ARM ABI has changed to the ARM EABI. The old ABI is incompatible with the ARM EABI and all programs and modules will need to be rebuilt to work with a new kernel. To keep using the old ABI ensure the WITHOUT_ARM_EABI knob is set. NOTE: Support for the old ABI will be removed in the future and users are advised to upgrade. 20130709: pkg_install has been disconnected from the build. If you really need it, you should add WITH_PKGTOOLS to your src.conf(5). 20130709: Most network statistics structures were changed to be able to keep 64-bit counters. 
Thus all tools that work with networking statistics must be rebuilt (netstat(1), bsnmpd(1), etc.). 20130629: Fix targets that run multiple make's to use && rather than ; so that subsequent steps depend on the success of previous ones. NOTE: if building 'universe' with -j* on stable/8 or stable/9 it would be better to start the build using bmake, to avoid overloading the machine. 20130618: Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file. 20130615: CVS has been removed from the base system. An exact copy of the code is available from the devel/cvs port. 20130613: Some people report the following error after the switch to bmake: make: illegal option -- J usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable] ... *** [buildworld] Error code 2 This is likely due to an old instance of make in ${MAKEPATH} (${MAKEOBJDIRPREFIX}${.CURDIR}/make.${MACHINE}), which src/Makefile will use blindly if it exists. If you see the above error, rm -rf `make -V MAKEPATH` should resolve it. 20130516: Use bmake by default. Whereas before one could choose to build with bmake via -DWITH_BMAKE one must now use -DWITHOUT_BMAKE to use the old make. The goal is to remove these knobs for 10-RELEASE. It is worth noting that bmake (like gmake) treats the command line as the unit of failure, rather than statements within the command line. Thus '(cd some/where && dosomething)' is safer than 'cd some/where; dosomething'. The '()' allows consistent behavior in parallel build. 20130429: Fix a bug that allows NFS clients to issue READDIR on files. 20130426: The WITHOUT_IDEA option has been removed because the IDEA patent expired. 20130426: The sysctl which controls TRIM support under ZFS has been renamed from vfs.zfs.trim_disable to vfs.zfs.trim.enabled and has been enabled by default. 
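The rename in the 20130426 entry also flips the sense of the setting. As a rough sketch only, an old-style line in /boot/loader.conf (assuming that is where it was set) could be translated as follows. The helper name convert_trim_setting is hypothetical, and the value mapping (trim_disable=1 becoming trim.enabled=0) is an assumption inferred from the two names rather than something the entry states.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): translate one old-style
# vfs.zfs.trim_disable setting line to the renamed vfs.zfs.trim.enabled.
# Assumption: the boolean sense is inverted, as the two names suggest.
convert_trim_setting() {
    case $1 in
        'vfs.zfs.trim_disable=0'|'vfs.zfs.trim_disable="0"')
            printf '%s\n' 'vfs.zfs.trim.enabled="1"' ;;
        'vfs.zfs.trim_disable=1'|'vfs.zfs.trim_disable="1"')
            printf '%s\n' 'vfs.zfs.trim.enabled="0"' ;;
        *)
            # Any other line is passed through unchanged.
            printf '%s\n' "$1" ;;
    esac
}

# Example: convert a single loader.conf-style line.
convert_trim_setting 'vfs.zfs.trim_disable="1"'
```

Lines that do not mention the old name are echoed back untouched, so the function can be applied to every line of a file in a loop.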
20130425: The mergemaster command now uses the default MAKEOBJDIRPREFIX rather than creating its own in the temporary directory in order to allow access to bootstrapped versions of tools such as install and mtree. When upgrading from a version of FreeBSD where the install command does not support -l, you will need to install a new mergemaster command if mergemaster -p is required. This can be accomplished with the command (cd src/usr.sbin/mergemaster && make install). 20130404: The legacy ATA stack, disabled and replaced by the new CAM-based one since FreeBSD 9.0, has been completely removed from the sources. Kernel modules atadisk and atapi*, and user-level tools atacontrol and burncd, are removed. The functionality of `options ATA_CAM` is now permanently enabled and the kernel option itself has been removed. 20130319: SOCK_CLOEXEC and SOCK_NONBLOCK flags have been added to socket(2) and socketpair(2). Software, in particular Kerberos, may automatically detect and use these during building. The resulting binaries will not work on older kernels. 20130308: CTL_DISABLE has also been added to the sparc64 GENERIC (for further information, see the respective 20130304 entry). 20130304: Recent commits to callout(9) changed the size of struct callout, so the KBI is probably heavily disturbed. Also, some functions in callout(9)/sleep(9)/sleepqueue(9)/condvar(9) KPIs were replaced by macros. Kernel modules using them will not load, so a rebuild is required. The ctl device has been re-enabled in GENERIC for i386 and amd64, but does not initialize by default (because of the new CTL_DISABLE option) to save memory. To re-enable it, remove the CTL_DISABLE option from the kernel config file or set kern.cam.ctl.disable=0 in /boot/loader.conf. 20130301: The ctl device has been disabled in GENERIC for i386 and amd64. This was done due to the extra memory being allocated at system initialisation time by the ctl driver which was only used if a CAM target device was created. That allocation made a FreeBSD system unusable on 128MB or less of RAM. 
20130208: A new compression method (lz4) has been merged to -HEAD. Please refer to zpool-features(7) for more information. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20130129: A BSD-licensed patch(1) variant has been added and is installed as bsdpatch, with the GNU version remaining the default patch. To invert the logic and use the BSD-licensed one as default, while having the GNU version installed as gnupatch, rebuild and install world with the WITH_BSD_PATCH knob set. 20130121: Due to the use of the new -l option to install(1) during build and install, you must take care not to directly set the INSTALL make variable in your /etc/make.conf, /etc/src.conf, or on the command line. If you wish to use the -C flag for all installs you may be able to add INSTALL+=-C to /etc/make.conf or /etc/src.conf. 20130118: The install(1) option -M has changed meaning and now takes an argument that is a file or path to append logs to. In the unlikely event that -M was the last option on the command line and the command line contained at least two files and a target directory the first file will have logs appended to it. The -M option served little practical purpose in the last decade so its use is expected to be extremely rare. 20121223: After switching to Clang as the default compiler some users of ZFS on i386 systems started to experience stack overflow kernel panics. Please consider using 'options KSTACK_PAGES=4' in such configurations. 20121222: GEOM_LABEL now mangles label names read from file system metadata. Mangling affects labels containing spaces, non-printable characters, '%' or '"'. Device names in /etc/fstab and other places may need to be updated. 20121217: By default, only the 10 most recent kernel dumps will be saved. 
To restore the previous behaviour (no limit on the number of kernel dumps stored in the dump directory) add the following line to /etc/rc.conf: savecore_flags="" 20121201: With the addition of auditdistd(8), a new auditdistd user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20121117: The sin6_scope_id member variable in struct sockaddr_in6 is now filled by the kernel before passing the structure to the userland via sysctl or routing socket. This means the KAME-specific embedded scope id in sin6_addr.s6_addr[2] is always cleared in userland applications. This behavior can be controlled by net.inet6.ip6.deembed_scopeid. __FreeBSD_version is bumped to 1000025. 20121105: On i386 and amd64 systems WITH_CLANG_IS_CC is now the default. This means that the world and kernel will be compiled with clang and that clang will be installed as /usr/bin/cc, /usr/bin/c++, and /usr/bin/cpp. To disable this behavior and revert to building with gcc, compile with WITHOUT_CLANG_IS_CC. Really old versions of current may need to bootstrap WITHOUT_CLANG first if the clang build fails (its compatibility window doesn't extend to the 9 stable branch point). 20121102: The IPFIREWALL_FORWARD kernel option has been removed. Its functionality is now turned on by default. 20121023: The ZERO_COPY_SOCKET kernel option has been removed and split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP. NB: SOCKET_SEND_COW uses the VM page based copy-on-write mechanism which is not safe and may result in kernel crashes. NB: The SOCKET_RECV_PFLIP mechanism is useless as no current driver supports disposable external page sized mbuf storage. Proper replacements for both zero-copy mechanisms are under consideration and will eventually lead to complete removal of the two kernel options. 20121023: The IPv4 network stack has been converted to network byte order. 
The following modules need to be recompiled together with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4), pf(4), ipfw(4), ng_ipfw(4), stf(4). 20121022: Support for non-MPSAFE filesystems was removed from VFS. The VFS_VERSION was bumped, all filesystem modules shall be recompiled. 20121018: All the non-MPSAFE filesystems have been disconnected from the build. The full list includes: codafs, hpfs, ntfs, nwfs, portalfs, smbfs, xfs. 20121016: The interface cloning API and ABI has changed. The following modules need to be recompiled together with kernel: ipfw(4), pfsync(4), pflog(4), usb(4), wlan(4), stf(4), vlan(4), disc(4), edsc(4), if_bridge(4), gif(4), tap(4), faith(4), epair(4), enc(4), tun(4), if_lagg(4), gre(4). 20121015: The sdhci driver was split in two parts: sdhci (generic SD Host Controller logic) and sdhci_pci (actual hardware driver). No kernel config modifications are required, but if you load sdhc as a module you must switch to sdhci_pci instead. 20121014: Import the FUSE kernel and userland support into base system. 20121013: The GNU sort(1) program has been removed since the BSD-licensed sort(1) has been the default for quite some time and no serious problems have been reported. The corresponding WITH_GNU_SORT knob has also gone. 20121006: The pfil(9) API/ABI for AF_INET family has been changed. Packet filtering modules: pf(4), ipfw(4), ipfilter(4) need to be recompiled with new kernel. 20121001: The net80211(4) ABI has been changed to allow for improved driver PS-POLL and power-save support. All wireless drivers need to be recompiled to work with the new kernel. 20120913: The random(4) support for the VIA hardware random number generator (`PADLOCK') is no longer enabled unconditionally. Add the padlock_rng device in the custom kernel config if needed. The GENERIC kernels on i386 and amd64 do include the device, so the change only affects the custom kernel configurations. 20120908: The pf(4) packet filter ABI has been changed. 
pfctl(8) and snmp_pf module need to be recompiled to work with new kernel. 20120828: A new ZFS feature flag "com.delphix:empty_bpobj" has been merged to -HEAD. Pools that have empty_bpobj in active state can not be imported read-write with ZFS implementations that do not support this feature. For more information read the zpool-features(5) manual page. 20120727: The sparc64 ZFS loader has been changed to no longer try to auto- detect ZFS providers based on diskN aliases but now requires these to be explicitly listed in the OFW boot-device environment variable. 20120712: The OpenSSL has been upgraded to 1.0.1c. Any binaries requiring libcrypto.so.6 or libssl.so.6 must be recompiled. Also, there are configuration changes. Make sure to merge /etc/ssl/openssl.cnf. 20120712: The following sysctls and tunables have been renamed for consistency with other variables: kern.cam.da.da_send_ordered -> kern.cam.da.send_ordered kern.cam.ada.ada_send_ordered -> kern.cam.ada.send_ordered 20120628: The sort utility has been replaced with BSD sort. For now, GNU sort is also available as "gnusort" or the default can be set back to GNU sort by setting WITH_GNU_SORT. In this case, BSD sort will be installed as "bsdsort". 20120611: A new version of ZFS (pool version 5000) has been merged to -HEAD. Starting with this version the old system of ZFS pool versioning is superseded by "feature flags". This concept enables forward compatibility against certain future changes in functionality of ZFS pools. The first read-only compatible "feature flag" for ZFS pools is named "com.delphix:async_destroy". For more information read the new zpool-features(5) manual page. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20120417: The malloc(3) implementation embedded in libc now uses sources imported as contrib/jemalloc. The most disruptive API change is to /etc/malloc.conf. 
If your system has an old-style /etc/malloc.conf, delete it prior to installworld, and optionally re-create it using the new format after rebooting. See malloc.conf(5) for details (specifically the TUNING section and the "opt.*" entries in the MALLCTL NAMESPACE section). 20120328: Big-endian MIPS TARGET_ARCH values no longer end in "eb". mips64eb is now spelled mips64. mipsn32eb is now spelled mipsn32. mipseb is now spelled mips. This is to aid compatibility with third-party software that expects this naming scheme in uname(3). Little-endian settings are unchanged. If you are updating a big-endian mips64 machine from before this change, you may need to set MACHINE_ARCH=mips64 in your environment before the new build system will recognize your machine. 20120306: Disable by default the option VFS_ALLOW_NONMPSAFE for all supported platforms. 20120229: Now unix domain sockets behave "as expected" on nullfs(5). Previously nullfs(5) did not pass through all behaviours to the underlying layer, as a result if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. 20120211: The getifaddrs upgrade path broken with 20111215 has been restored. If you have upgraded in between 20111215 and 20120209 you need to recompile libc again with your kernel. You still need to recompile world to be able to configure CARP but this restriction already comes from 20111215. 20120114: The set_rcvar() function has been removed from /etc/rc.subr. All base and ports rc.d scripts have been updated, so if you have a port installed with a script in /usr/local/etc/rc.d you can either hand-edit the rcvar= line, or reinstall the port. 
An easy way to handle the mass-update of /etc/rc.d: rm /etc/rc.d/* && mergemaster -i 20120109: panic(9) now stops other CPUs in the SMP systems, disables interrupts on the current CPU and prevents other threads from running. This behavior can be reverted using the kern.stop_scheduler_on_panic tunable/sysctl. The new behavior can be incompatible with kern.sync_on_panic. 20111215: The carp(4) facility has been changed significantly. Configuration of the CARP protocol via ifconfig(8) has changed, as has the format of CARP events submitted to devd(8). See manual pages for more information. The arpbalance feature of carp(4) is currently not supported. The sizes of struct in_aliasreq and struct in6_aliasreq have changed. User utilities using SIOCAIFADDR, SIOCAIFADDR_IN6, e.g. ifconfig(8), need to be recompiled. 20111122: The acpi_wmi(4) status device /dev/wmistat has been renamed to /dev/wmistat0. 20111108: The VFS_ALLOW_NONMPSAFE option has been added in order to explicitly support non-MPSAFE filesystems. It is on by default for all supported platforms at present. 20111101: The broken amd(4) driver has been replaced with esp(4) in the amd64, i386 and pc98 GENERIC kernel configuration files. 20110930: sysinstall has been removed. 20110923: The stable/9 branch was created in subversion. This corresponds to the RELENG_9 branch in CVS. COMMON ITEMS: General Notes ------------- Avoid using make -j when upgrading. While generally safe, there are sometimes problems using -j to upgrade. If your upgrade fails with -j, please try again without -j. From time to time in the past there have been problems using -j with buildworld and/or installworld. This is especially true when upgrading between "distant" versions (e.g. one that crosses a major release boundary or several minor releases, or when several months have passed on the -current branch). Sometimes, obscure build problems are the result of environment poisoning. 
This can happen because the make utility reads its environment when searching for values for global variables. To run your build attempts in an "environmental clean room", prefix all make commands with 'env -i '. See the env(1) manual page for more details. When upgrading from one major version to another it is generally best to upgrade to the latest code in the currently installed branch first, then do an upgrade to the new branch. This is the best-tested upgrade path, and has the highest probability of being successful. Please try this approach before reporting problems with a major version upgrade. When upgrading a live system, having a root shell around before installing anything can help undo problems. Not having a root shell around can lead to problems if pam has changed too much from your starting point to allow continued authentication after the upgrade. ZFS notes --------- When upgrading the boot ZFS pool to a new version, always follow these two steps: 1.) recompile and reinstall the ZFS boot loader and boot block (this is part of "make buildworld" and "make installworld") 2.) update the ZFS boot block on your boot drive The following example updates the ZFS boot block on the first partition (freebsd-boot) of a GPT partitioned drive ada0: "gpart bootcode -p /boot/gptzfsboot -i 1 ada0" Non-boot pools do not need these updates. To build a kernel ----------------- If you are updating from a prior version of FreeBSD (even one just a few days old), you should follow this procedure. 
It is the most failsafe as it uses a /usr/obj tree with a fresh mini-buildworld, make kernel-toolchain make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE To test a kernel once --------------------- If you just want to boot a kernel once (because you are not sure if it works, or if you want to boot a known bad kernel to provide debugging information) run make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel nextboot -k testkernel To just build a kernel when you know that it won't mess you up -------------------------------------------------------------- This assumes you are already running a CURRENT system. Replace ${arch} with the architecture of your machine (e.g. "i386", "arm", "amd64", "ia64", "pc98", "sparc64", "powerpc", "mips", etc). cd src/sys/${arch}/conf config KERNEL_NAME_HERE cd ../compile/KERNEL_NAME_HERE make depend make make install If this fails, go to the "To build a kernel" section. To rebuild everything and install it on the current system. ----------------------------------------------------------- # Note: sometimes if you are running current you gotta do more than # is listed here if you are upgrading from a really old current. make buildworld make kernel KERNCONF=YOUR_KERNEL_HERE [1] [3] mergemaster -Fp [5] make installworld mergemaster -Fi [4] make delete-old [6] To cross-install current onto a separate partition -------------------------------------------------- # In this approach we use a separate partition to hold # current's root, 'usr', and 'var' directories. A partition # holding "/", "/usr" and "/var" should be about 2GB in # size. 
make buildworld make buildkernel KERNCONF=YOUR_KERNEL_HERE make installworld DESTDIR=${CURRENT_ROOT} -DDB_FROM_SRC make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT} cp /etc/fstab ${CURRENT_ROOT}/etc/fstab # if newfs'd To upgrade in-place from stable to current ---------------------------------------------- make buildworld [9] make kernel KERNCONF=YOUR_KERNEL_HERE [8] [1] [3] mergemaster -Fp [5] make installworld mergemaster -Fi [4] make delete-old [6] Make sure that you've read the UPDATING file to understand the tweaks to various things you need. At this point in the life cycle of current, things change often and you are on your own to cope. The defaults can also change, so please read ALL of the UPDATING entries. Also, if you are tracking -current, you must be subscribed to freebsd-current@freebsd.org. Make sure that, before you update your sources, you have read and understood all the recent messages there. If in doubt, please track -stable, which has far fewer pitfalls. [1] If you have third party modules, such as vmware, you should disable them at this point so they don't crash your system on reboot. [3] From the bootblocks, boot -s, and then do fsck -p mount -u / mount -a cd src adjkerntz -i # if CMOS is wall time Also, when doing a major release upgrade, it is required that you boot into single user mode to do the installworld. [4] Note: This step is non-optional. Failure to do this step can result in a significant reduction in the functionality of the system. Attempting to do it by hand is not recommended and those that pursue this avenue should read this file carefully, as well as the archives of freebsd-current and freebsd-hackers mailing lists for potential gotchas. The -U option is also useful to consider. See mergemaster(8) for more information. [5] Usually this step is a noop. However, from time to time you may need to do this if you get unknown user errors in the following step. 
It never hurts to do it all the time. You may need to install a new mergemaster (cd src/usr.sbin/mergemaster && make install) after the buildworld before this step if you last updated from current before 20130425 or from -stable before 20130430. [6] This only deletes old files and directories. Old libraries can be deleted by "make delete-old-libs", but you have to make sure that no program is using those libraries anymore. [8] In order to have a kernel that can run the 4.x binaries needed to do an installworld, you must include the COMPAT_FREEBSD4 option in your kernel. Failure to do so may leave you with a system that is hard to boot to recover. A similar kernel option COMPAT_FREEBSD5 is required to run the 5.x binaries on more recent kernels. And so on for COMPAT_FREEBSD6 and COMPAT_FREEBSD7. Make sure that you merge any new devices from GENERIC since the last time you updated your kernel config file. [9] When checking out sources, you must include the -P flag to have cvs prune empty directories. If CPUTYPE is defined in your /etc/make.conf, make sure to use the "?=" instead of the "=" assignment operator, so that buildworld can override the CPUTYPE if it needs to. MAKEOBJDIRPREFIX must be defined in an environment variable, and not on the command line, or in /etc/make.conf. buildworld will warn if it is improperly defined. FORMAT: This file contains a list, in reverse chronological order, of major breakages in tracking -current. It is not guaranteed to be a complete list of such breakages, and only contains entries since October 10, 2007. If you need to see UPDATING entries from before that date, you will need to fetch an UPDATING file from an older FreeBSD release. Copyright information: Copyright 1998-2009 M. Warner Losh. All Rights Reserved. Redistribution, publication, translation and use, with or without modification, in full or in part, in any form or format of this document are permitted without further permission from the author. 
THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Contact Warner Losh if you have any questions about your use of this document. $FreeBSD$ Index: projects/ifnet/contrib/ipfilter/ip_fil.c =================================================================== --- projects/ifnet/contrib/ipfilter/ip_fil.c (revision 279031) +++ projects/ifnet/contrib/ipfilter/ip_fil.c (revision 279032) @@ -1,881 +1,884 @@ /* $FreeBSD$ */ /* * Copyright (C) 2012 by Darren Reed. * * See the IPFILTER.LICENCE file for details on licencing. 
* * $Id$ */ #if !defined(lint) static const char sccsid[] = "@(#)ip_fil.c 2.41 6/5/96 (C) 1993-2000 Darren Reed"; static const char rcsid[] = "@(#)$Id$"; #endif #include "ipf.h" #include "md5.h" #include "ipt.h" ipf_main_softc_t ipfmain; static struct ifnet **ifneta = NULL; static int nifs = 0; struct rtentry; static void ipf_setifpaddr __P((struct ifnet *, char *)); void init_ifp __P((void)); #if defined(__sgi) && (IRIX < 60500) static int no_output __P((struct ifnet *, struct mbuf *, struct sockaddr *)); static int write_output __P((struct ifnet *, struct mbuf *, struct sockaddr *)); #else # if TRU64 >= 1885 static int no_output __P((struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry *, char *)); static int write_output __P((struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry *, char *)); # else static int no_output __P((struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry *)); static int write_output __P((struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry *)); # endif #endif +struct ifaddr { + struct sockaddr_storage ifa_addr; +}; int ipfattach(softc) ipf_main_softc_t *softc; { return 0; } int ipfdetach(softc) ipf_main_softc_t *softc; { return 0; } /* * Filter ioctl interface. 
*/ int ipfioctl(softc, dev, cmd, data, mode) ipf_main_softc_t *softc; int dev; ioctlcmd_t cmd; caddr_t data; int mode; { int error = 0, unit = 0, uid; uid = getuid(); unit = dev; SPL_NET(s); error = ipf_ioctlswitch(softc, unit, data, cmd, mode, uid, NULL); if (error != -1) { SPL_X(s); return error; } SPL_X(s); return error; } void ipf_forgetifp(softc, ifp) ipf_main_softc_t *softc; void *ifp; { register frentry_t *f; WRITE_ENTER(&softc->ipf_mutex); for (f = softc->ipf_acct[0][softc->ipf_active]; (f != NULL); f = f->fr_next) if (f->fr_ifa == ifp) f->fr_ifa = (void *)-1; for (f = softc->ipf_acct[1][softc->ipf_active]; (f != NULL); f = f->fr_next) if (f->fr_ifa == ifp) f->fr_ifa = (void *)-1; for (f = softc->ipf_rules[0][softc->ipf_active]; (f != NULL); f = f->fr_next) if (f->fr_ifa == ifp) f->fr_ifa = (void *)-1; for (f = softc->ipf_rules[1][softc->ipf_active]; (f != NULL); f = f->fr_next) if (f->fr_ifa == ifp) f->fr_ifa = (void *)-1; RWLOCK_EXIT(&softc->ipf_mutex); ipf_nat_sync(softc, ifp); ipf_lookup_sync(softc, ifp); } static int #if defined(__sgi) && (IRIX < 60500) no_output(ifp, m, s) #else # if TRU64 >= 1885 no_output (ifp, m, s, rt, cp) char *cp; # else no_output(ifp, m, s, rt) # endif struct rtentry *rt; #endif struct ifnet *ifp; struct mbuf *m; struct sockaddr *s; { return 0; } static int #if defined(__sgi) && (IRIX < 60500) write_output(ifp, m, s) #else # if TRU64 >= 1885 write_output (ifp, m, s, rt, cp) char *cp; # else write_output(ifp, m, s, rt) # endif struct rtentry *rt; #endif struct ifnet *ifp; struct mbuf *m; struct sockaddr *s; { char fname[32]; mb_t *mb; ip_t *ip; int fd; mb = (mb_t *)m; ip = MTOD(mb, ip_t *); #if (defined(NetBSD) && (NetBSD <= 1991011) && (NetBSD >= 199606)) || \ (defined(OpenBSD) && (OpenBSD >= 199603)) || defined(linux) || \ (defined(__FreeBSD__) && (__FreeBSD_version >= 501113)) sprintf(fname, "/tmp/%s", ifp->if_xname); #else sprintf(fname, "/tmp/%s%d", ifp->if_name, ifp->if_unit); #endif fd = open(fname, O_WRONLY|O_APPEND); if 
(fd == -1) { perror("open"); return -1; } write(fd, (char *)ip, ntohs(ip->ip_len)); close(fd); return 0; } static void ipf_setifpaddr(ifp, addr) struct ifnet *ifp; char *addr; { #ifdef __sgi struct in_ifaddr *ifa; #else struct ifaddr *ifa; #endif #if defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) if (ifp->if_addrlist.tqh_first != NULL) #else # ifdef __sgi if (ifp->in_ifaddr != NULL) # else if (ifp->if_addrlist != NULL) # endif #endif return; ifa = (struct ifaddr *)malloc(sizeof(*ifa)); #if defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) ifp->if_addrlist.tqh_first = ifa; #else # ifdef __sgi ifp->in_ifaddr = ifa; # else ifp->if_addrlist = ifa; # endif #endif if (ifa != NULL) { struct sockaddr_in *sin; #ifdef __sgi sin = (struct sockaddr_in *)&ifa->ia_addr; #else sin = (struct sockaddr_in *)&ifa->ifa_addr; #endif #ifdef USE_INET6 if (index(addr, ':') != NULL) { struct sockaddr_in6 *sin6; sin6 = (struct sockaddr_in6 *)&ifa->ifa_addr; sin6->sin6_family = AF_INET6; /* Abort if bad address. 
*/ switch (inet_pton(AF_INET6, addr, &sin6->sin6_addr)) { case 1: break; case -1: perror("inet_pton"); abort(); break; default: abort(); break; } } else #endif { sin->sin_family = AF_INET; sin->sin_addr.s_addr = inet_addr(addr); if (sin->sin_addr.s_addr == 0) abort(); } } } struct ifnet * get_unit(name, family) char *name; int family; { struct ifnet *ifp, **ifpp, **old_ifneta; char *addr; #if (defined(NetBSD) && (NetBSD <= 1991011) && (NetBSD >= 199606)) || \ (defined(OpenBSD) && (OpenBSD >= 199603)) || defined(linux) || \ (defined(__FreeBSD__) && (__FreeBSD_version >= 501113)) if (!*name) return NULL; if (name == NULL) name = "anon0"; addr = strchr(name, '='); if (addr != NULL) *addr++ = '\0'; for (ifpp = ifneta; ifpp && (ifp = *ifpp); ifpp++) { if (!strcmp(name, ifp->if_xname)) { if (addr != NULL) ipf_setifpaddr(ifp, addr); return ifp; } } #else char *s, ifname[LIFNAMSIZ+1]; if (name == NULL) name = "anon0"; addr = strchr(name, '='); if (addr != NULL) *addr++ = '\0'; for (ifpp = ifneta; ifpp && (ifp = *ifpp); ifpp++) { COPYIFNAME(family, ifp, ifname); if (!strcmp(name, ifname)) { if (addr != NULL) ipf_setifpaddr(ifp, addr); return ifp; } } #endif if (!ifneta) { ifneta = (struct ifnet **)malloc(sizeof(ifp) * 2); if (!ifneta) return NULL; ifneta[1] = NULL; ifneta[0] = (struct ifnet *)calloc(1, sizeof(*ifp)); if (!ifneta[0]) { free(ifneta); return NULL; } nifs = 1; } else { old_ifneta = ifneta; nifs++; ifneta = (struct ifnet **)realloc(ifneta, (nifs + 1) * sizeof(ifp)); if (!ifneta) { free(old_ifneta); nifs = 0; return NULL; } ifneta[nifs] = NULL; ifneta[nifs - 1] = (struct ifnet *)malloc(sizeof(*ifp)); if (!ifneta[nifs - 1]) { nifs--; return NULL; } } ifp = ifneta[nifs - 1]; #if defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) TAILQ_INIT(&ifp->if_addrlist); #endif #if (defined(NetBSD) && (NetBSD <= 1991011) && (NetBSD >= 199606)) || \ (defined(OpenBSD) && (OpenBSD >= 199603)) || defined(linux) || \ (defined(__FreeBSD__) && (__FreeBSD_version >= 
501113)) (void) strncpy(ifp->if_xname, name, sizeof(ifp->if_xname)); #else s = name + strlen(name) - 1; for (; s > name; s--) { if (!ISDIGIT(*s)) { s++; break; } } if ((s > name) && (*s != 0) && ISDIGIT(*s)) { ifp->if_unit = atoi(s); ifp->if_name = (char *)malloc(s - name + 1); (void) strncpy(ifp->if_name, name, s - name); ifp->if_name[s - name] = '\0'; } else { ifp->if_name = strdup(name); ifp->if_unit = -1; } #endif ifp->if_output = (void *)no_output; if (addr != NULL) { ipf_setifpaddr(ifp, addr); } return ifp; } char * get_ifname(ifp) struct ifnet *ifp; { static char ifname[LIFNAMSIZ]; #if defined(__OpenBSD__) || defined(__NetBSD__) || defined(linux) || \ (defined(__FreeBSD__) && (__FreeBSD_version >= 501113)) sprintf(ifname, "%s", ifp->if_xname); #else if (ifp->if_unit != -1) sprintf(ifname, "%s%d", ifp->if_name, ifp->if_unit); else strcpy(ifname, ifp->if_name); #endif return ifname; } void init_ifp() { struct ifnet *ifp, **ifpp; char fname[32]; int fd; #if (defined(NetBSD) && (NetBSD <= 1991011) && (NetBSD >= 199606)) || \ (defined(OpenBSD) && (OpenBSD >= 199603)) || defined(linux) || \ (defined(__FreeBSD__) && (__FreeBSD_version >= 501113)) for (ifpp = ifneta; ifpp && (ifp = *ifpp); ifpp++) { ifp->if_output = (void *)write_output; sprintf(fname, "/tmp/%s", ifp->if_xname); fd = open(fname, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600); if (fd == -1) perror("open"); else close(fd); } #else for (ifpp = ifneta; ifpp && (ifp = *ifpp); ifpp++) { ifp->if_output = (void *)write_output; sprintf(fname, "/tmp/%s%d", ifp->if_name, ifp->if_unit); fd = open(fname, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600); if (fd == -1) perror("open"); else close(fd); } #endif } int ipf_fastroute(m, mpp, fin, fdp) mb_t *m, **mpp; fr_info_t *fin; frdest_t *fdp; { struct ifnet *ifp; ip_t *ip = fin->fin_ip; frdest_t node; int error = 0; frentry_t *fr; void *sifp; int sout; sifp = fin->fin_ifp; sout = fin->fin_out; fr = fin->fin_fr; ip->ip_sum = 0; if (!(fr->fr_flags & FR_KEEPSTATE) && (fdp != NULL) && 
(fdp->fd_type == FRD_DSTLIST)) { bzero(&node, sizeof(node)); ipf_dstlist_select_node(fin, fdp->fd_ptr, NULL, &node); fdp = &node; } ifp = fdp->fd_ptr; if (ifp == NULL) return 0; /* no routing table out here */ if (fin->fin_out == 0) { fin->fin_ifp = ifp; fin->fin_out = 1; (void) ipf_acctpkt(fin, NULL); fin->fin_fr = NULL; if (!fr || !(fr->fr_flags & FR_RETMASK)) { u_32_t pass; (void) ipf_state_check(fin, &pass); } switch (ipf_nat_checkout(fin, NULL)) { case 0 : break; case 1 : ip->ip_sum = 0; break; case -1 : error = -1; goto done; break; } } m->mb_ifp = ifp; printpacket(fin->fin_out, m); #if defined(__sgi) && (IRIX < 60500) (*ifp->if_output)(ifp, (void *)ip, NULL); # if TRU64 >= 1885 (*ifp->if_output)(ifp, (void *)m, NULL, 0, 0); # else (*ifp->if_output)(ifp, (void *)m, NULL, 0); # endif #endif done: fin->fin_ifp = sifp; fin->fin_out = sout; return error; } int ipf_send_reset(fin) fr_info_t *fin; { ipfkverbose("- TCP RST sent\n"); return 0; } int ipf_send_icmp_err(type, fin, dst) int type; fr_info_t *fin; int dst; { ipfkverbose("- ICMP unreachable sent\n"); return 0; } void m_freem(m) mb_t *m; { return; } void m_copydata(m, off, len, cp) mb_t *m; int off, len; caddr_t cp; { bcopy((char *)m + off, cp, len); } int ipfuiomove(buf, len, rwflag, uio) caddr_t buf; int len, rwflag; struct uio *uio; { int left, ioc, num, offset; struct iovec *io; char *start; if (rwflag == UIO_READ) { left = len; ioc = 0; offset = uio->uio_offset; while ((left > 0) && (ioc < uio->uio_iovcnt)) { io = uio->uio_iov + ioc; num = io->iov_len; if (num > left) num = left; start = (char *)io->iov_base + offset; if (start > (char *)io->iov_base + io->iov_len) { offset -= io->iov_len; ioc++; continue; } bcopy(buf, start, num); uio->uio_resid -= num; uio->uio_offset += num; left -= num; if (left > 0) ioc++; } if (left > 0) return EFAULT; } return 0; } u_32_t ipf_newisn(fin) fr_info_t *fin; { static int iss_seq_off = 0; u_char hash[16]; u_32_t newiss; MD5_CTX ctx; /* * Compute the base value of the 
ISS. It is a hash * of (saddr, sport, daddr, dport, secret). */ MD5Init(&ctx); MD5Update(&ctx, (u_char *) &fin->fin_fi.fi_src, sizeof(fin->fin_fi.fi_src)); MD5Update(&ctx, (u_char *) &fin->fin_fi.fi_dst, sizeof(fin->fin_fi.fi_dst)); MD5Update(&ctx, (u_char *) &fin->fin_dat, sizeof(fin->fin_dat)); /* MD5Update(&ctx, ipf_iss_secret, sizeof(ipf_iss_secret)); */ MD5Final(hash, &ctx); memcpy(&newiss, hash, sizeof(newiss)); /* * Now increment our "timer", and add it in to * the computed value. * * XXX Use `addin'? * XXX TCP_ISSINCR too large to use? */ iss_seq_off += 0x00010000; newiss += iss_seq_off; return newiss; } /* ------------------------------------------------------------------------ */ /* Function: ipf_nextipid */ /* Returns: int - 0 == success, -1 == error (packet should be droppped) */ /* Parameters: fin(I) - pointer to packet information */ /* */ /* Returns the next IPv4 ID to use for this packet. */ /* ------------------------------------------------------------------------ */ INLINE u_short ipf_nextipid(fin) fr_info_t *fin; { static u_short ipid = 0; ipf_main_softc_t *softc = fin->fin_main_soft; u_short id; MUTEX_ENTER(&softc->ipf_rw); if (fin->fin_pktnum != 0) { /* * The -1 is for aligned test results. */ id = (fin->fin_pktnum - 1) & 0xffff; } else { } id = ipid++; MUTEX_EXIT(&softc->ipf_rw); return id; } INLINE int ipf_checkv4sum(fin) fr_info_t *fin; { if (fin->fin_flx & FI_SHORT) return 1; if (ipf_checkl4sum(fin) == -1) { fin->fin_flx |= FI_BAD; return -1; } return 0; } #ifdef USE_INET6 INLINE int ipf_checkv6sum(fin) fr_info_t *fin; { if (fin->fin_flx & FI_SHORT) return 1; if (ipf_checkl4sum(fin) == -1) { fin->fin_flx |= FI_BAD; return -1; } return 0; } #endif #if 0 /* * See above for description, except that all addressing is in user space. 
*/ int copyoutptr(softc, src, dst, size) void *src, *dst; size_t size; { caddr_t ca; bcopy(dst, (char *)&ca, sizeof(ca)); bcopy(src, ca, size); return 0; } /* * See above for description, except that all addressing is in user space. */ int copyinptr(src, dst, size) void *src, *dst; size_t size; { caddr_t ca; bcopy(src, (char *)&ca, sizeof(ca)); bcopy(ca, dst, size); return 0; } #endif /* * return the first IP Address associated with an interface */ int ipf_ifpaddr(softc, v, atype, ifptr, inp, inpmask) ipf_main_softc_t *softc; int v, atype; void *ifptr; i6addr_t *inp, *inpmask; { struct ifnet *ifp = ifptr; #ifdef __sgi struct in_ifaddr *ifa; #else struct ifaddr *ifa; #endif #if defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) ifa = ifp->if_addrlist.tqh_first; #else # ifdef __sgi ifa = (struct in_ifaddr *)ifp->in_ifaddr; # else ifa = ifp->if_addrlist; # endif #endif if (ifa != NULL) { if (v == 4) { struct sockaddr_in *sin, mask; mask.sin_addr.s_addr = 0xffffffff; #ifdef __sgi sin = (struct sockaddr_in *)&ifa->ia_addr; #else sin = (struct sockaddr_in *)&ifa->ifa_addr; #endif return ipf_ifpfillv4addr(atype, sin, &mask, &inp->in4, &inpmask->in4); } #ifdef USE_INET6 if (v == 6) { struct sockaddr_in6 *sin6, mask; sin6 = (struct sockaddr_in6 *)&ifa->ifa_addr; ((i6addr_t *)&mask.sin6_addr)->i6[0] = 0xffffffff; ((i6addr_t *)&mask.sin6_addr)->i6[1] = 0xffffffff; ((i6addr_t *)&mask.sin6_addr)->i6[2] = 0xffffffff; ((i6addr_t *)&mask.sin6_addr)->i6[3] = 0xffffffff; return ipf_ifpfillv6addr(atype, sin6, &mask, inp, inpmask); } #endif } return 0; } /* * This function is not meant to be random, rather just produce a * sequence of numbers that isn't linear to show "randomness". */ u_32_t ipf_random() { static unsigned int last = 0xa5a5a5a5; static int calls = 0; int number; calls++; /* * These are deliberately chosen to ensure that there is some * attempt to test whether the output covers the range in test n18. 
*/ switch (calls) { case 1 : number = 0; break; case 2 : number = 4; break; case 3 : number = 3999; break; case 4 : number = 4000; break; case 5 : number = 48999; break; case 6 : number = 49000; break; default : number = last; last *= calls; last++; number ^= last; break; } return number; } int ipf_verifysrc(fin) fr_info_t *fin; { return 1; } int ipf_inject(fin, m) fr_info_t *fin; mb_t *m; { FREE_MB_T(m); return 0; } u_int ipf_pcksum(fin, hlen, sum) fr_info_t *fin; int hlen; u_int sum; { u_short *sp; u_int sum2; int slen; slen = fin->fin_plen - hlen; sp = (u_short *)((u_char *)fin->fin_ip + hlen); for (; slen > 1; slen -= 2) sum += *sp++; if (slen) sum += ntohs(*(u_char *)sp << 8); while (sum > 0xffff) sum = (sum & 0xffff) + (sum >> 16); sum2 = (u_short)(~sum & 0xffff); return sum2; } void * ipf_pullup(m, fin, plen) mb_t *m; fr_info_t *fin; int plen; { if (M_LEN(m) >= plen) return fin->fin_ip; /* * Fake ipf_pullup failing */ fin->fin_reason = FRB_PULLUP; *fin->fin_mp = NULL; fin->fin_m = NULL; fin->fin_ip = NULL; return NULL; } Index: projects/ifnet/contrib/ipfilter/ipf.h =================================================================== --- projects/ifnet/contrib/ipfilter/ipf.h (revision 279031) +++ projects/ifnet/contrib/ipfilter/ipf.h (revision 279032) @@ -1,406 +1,403 @@ /* $FreeBSD$ */ /* * Copyright (C) 2012 by Darren Reed. * * See the IPFILTER.LICENCE file for details on licencing. * * @(#)ipf.h 1.12 6/5/96 * $Id$ */ #ifndef __IPF_H__ #define __IPF_H__ #if defined(__osf__) # define radix_mask ipf_radix_mask # define radix_node ipf_radix_node # define radix_node_head ipf_radix_node_head #endif #include #include #include /* * This is a workaround for troubles on FreeBSD, HPUX, OpenBSD. 
* Needed here because on some systems gets included by things * like */ #ifndef _KERNEL # define ADD_KERNEL # define _KERNEL # define KERNEL #endif #ifdef __OpenBSD__ struct file; #endif #include #ifdef ADD_KERNEL # undef _KERNEL # undef KERNEL #endif #include #include #include -#define _WANT_IFADDR -#include - #include #include #include #include #ifndef TCP_PAWS_IDLE /* IRIX */ # include #endif #include #include #include #include #include #include #include #include #if !defined(__SVR4) && !defined(__svr4__) && defined(sun) # include #endif #include #include #include "netinet/ip_compat.h" #include "netinet/ip_fil.h" #include "netinet/ip_nat.h" #include "netinet/ip_frag.h" #include "netinet/ip_state.h" #include "netinet/ip_proxy.h" #include "netinet/ip_auth.h" #include "netinet/ip_lookup.h" #include "netinet/ip_pool.h" #include "netinet/ip_scan.h" #include "netinet/ip_htable.h" #include "netinet/ip_sync.h" #include "netinet/ip_dstlist.h" #include "opts.h" #ifndef __P # ifdef __STDC__ # define __P(x) x # else # define __P(x) () # endif #endif #ifndef __STDC__ # undef const # define const #endif #ifndef U_32_T # define U_32_T 1 # if defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || \ defined(__sgi) typedef u_int32_t u_32_t; # else # if defined(__alpha__) || defined(__alpha) || defined(_LP64) typedef unsigned int u_32_t; # else # if SOLARIS2 >= 6 typedef uint32_t u_32_t; # else typedef unsigned int u_32_t; # endif # endif # endif /* __NetBSD__ || __OpenBSD__ || __FreeBSD__ || __sgi */ #endif /* U_32_T */ #ifndef MAXHOSTNAMELEN # define MAXHOSTNAMELEN 256 #endif #define MAX_ICMPCODE 16 #define MAX_ICMPTYPE 19 #define PRINTF (void)printf #define FPRINTF (void)fprintf struct ipopt_names { int on_value; int on_bit; int on_siz; char *on_name; }; typedef struct alist_s { struct alist_s *al_next; int al_not; int al_family; i6addr_t al_i6addr; i6addr_t al_i6mask; } alist_t; #define al_addr al_i6addr.in4_addr #define al_mask al_i6mask.in4_addr #define al_1 
al_addr #define al_2 al_mask typedef struct plist_s { struct plist_s *pl_next; int pl_compare; u_short pl_port1; u_short pl_port2; } plist_t; typedef struct { u_short fb_c; u_char fb_t; u_char fb_f; u_32_t fb_k; } fakebpf_t; typedef struct { char *it_name; int it_v4; int it_v6; } icmptype_t; typedef struct wordtab { char *w_word; int w_value; } wordtab_t; typedef struct namelist { struct namelist *na_next; char *na_name; int na_value; } namelist_t; typedef struct proxyrule { struct proxyrule *pr_next; char *pr_proxy; char *pr_conf; namelist_t *pr_names; int pr_proto; } proxyrule_t; #if defined(__NetBSD__) || defined(__OpenBSD__) || \ (_BSDI_VERSION >= 199701) || (__FreeBSD_version >= 300000) || \ SOLARIS || defined(__sgi) || defined(__osf__) || defined(linux) # include typedef int (* ioctlfunc_t) __P((int, ioctlcmd_t, ...)); #else typedef int (* ioctlfunc_t) __P((dev_t, ioctlcmd_t, void *)); #endif typedef int (* addfunc_t) __P((int, ioctlfunc_t, void *)); typedef int (* copyfunc_t) __P((void *, void *, size_t)); /* * SunOS4 */ #if defined(sun) && !defined(__SVR4) && !defined(__svr4__) extern int ioctl __P((int, int, void *)); #endif extern char thishost[]; extern char flagset[]; extern u_char flags[]; extern struct ipopt_names ionames[]; extern struct ipopt_names secclass[]; extern char *icmpcodes[MAX_ICMPCODE + 1]; extern char *icmptypes[MAX_ICMPTYPE + 1]; extern int use_inet6; extern int lineNum; extern int debuglevel; extern struct ipopt_names v6ionames[]; extern icmptype_t icmptypelist[]; extern wordtab_t statefields[]; extern wordtab_t natfields[]; extern wordtab_t poolfields[]; extern int addicmp __P((char ***, struct frentry *, int)); extern int addipopt __P((char *, struct ipopt_names *, int, char *)); extern int addkeep __P((char ***, struct frentry *, int)); extern alist_t *alist_new __P((int, char *)); extern void alist_free __P((alist_t *)); extern void assigndefined __P((char *)); extern void binprint __P((void *, size_t)); extern u_32_t buildopts 
__P((char *, char *, int)); extern int checkrev __P((char *)); extern int connecttcp __P((char *, int)); extern int count6bits __P((u_32_t *)); extern int count4bits __P((u_32_t)); extern char *fac_toname __P((int)); extern int fac_findname __P((char *)); extern const char *familyname __P((const int)); extern void fill6bits __P((int, u_int *)); extern wordtab_t *findword __P((wordtab_t *, char *)); extern int ftov __P((int)); extern char *ipf_geterror __P((int, ioctlfunc_t *)); extern int genmask __P((int, char *, i6addr_t *)); extern int gethost __P((int, char *, i6addr_t *)); extern int geticmptype __P((int, char *)); extern int getport __P((struct frentry *, char *, u_short *, char *)); extern int getportproto __P((char *, int)); extern int getproto __P((char *)); extern char *getnattype __P((struct nat *)); extern char *getsumd __P((u_32_t)); extern u_32_t getoptbyname __P((char *)); extern u_32_t getoptbyvalue __P((int)); extern u_32_t getv6optbyname __P((char *)); extern u_32_t getv6optbyvalue __P((int)); extern char *icmptypename __P((int, int)); extern void initparse __P((void)); extern void ipf_dotuning __P((int, char *, ioctlfunc_t)); extern int ipf_addrule __P((int, ioctlfunc_t, void *)); extern void ipf_mutex_clean __P((void)); extern int ipf_parsefile __P((int, addfunc_t, ioctlfunc_t *, char *)); extern int ipf_parsesome __P((int, addfunc_t, ioctlfunc_t *, FILE *)); extern void ipf_perror __P((int, char *)); extern int ipf_perror_fd __P(( int, ioctlfunc_t, char *)); extern void ipf_rwlock_clean __P((void)); extern char *ipf_strerror __P((int)); extern void ipferror __P((int, char *)); extern int ipmon_parsefile __P((char *)); extern int ipmon_parsesome __P((FILE *)); extern int ipnat_addrule __P((int, ioctlfunc_t, void *)); extern int ipnat_parsefile __P((int, addfunc_t, ioctlfunc_t, char *)); extern int ipnat_parsesome __P((int, addfunc_t, ioctlfunc_t, FILE *)); extern int ippool_parsefile __P((int, char *, ioctlfunc_t)); extern int ippool_parsesome 
__P((int, FILE *, ioctlfunc_t)); extern int kmemcpywrap __P((void *, void *, size_t)); extern char *kvatoname __P((ipfunc_t, ioctlfunc_t)); extern int load_dstlist __P((struct ippool_dst *, ioctlfunc_t, ipf_dstnode_t *)); extern int load_dstlistnode __P((int, char *, struct ipf_dstnode *, ioctlfunc_t)); extern alist_t *load_file __P((char *)); extern int load_hash __P((struct iphtable_s *, struct iphtent_s *, ioctlfunc_t)); extern int load_hashnode __P((int, char *, struct iphtent_s *, int, ioctlfunc_t)); extern alist_t *load_http __P((char *)); extern int load_pool __P((struct ip_pool_s *list, ioctlfunc_t)); extern int load_poolnode __P((int, char *, ip_pool_node_t *, int, ioctlfunc_t)); extern alist_t *load_url __P((char *)); extern alist_t *make_range __P((int, struct in_addr, struct in_addr)); extern void mb_hexdump __P((mb_t *, FILE *)); extern ipfunc_t nametokva __P((char *, ioctlfunc_t)); extern void nat_setgroupmap __P((struct ipnat *)); extern int ntomask __P((int, int, u_32_t *)); extern u_32_t optname __P((char ***, u_short *, int)); extern wordtab_t *parsefields __P((wordtab_t *, char *)); extern int *parseipfexpr __P((char *, char **)); extern int parsewhoisline __P((char *, addrfamily_t *, addrfamily_t *)); extern void pool_close __P((void)); extern int pool_fd __P((void)); extern int pool_ioctl __P((ioctlfunc_t, ioctlcmd_t, void *)); extern int pool_open __P((void)); extern char *portname __P((int, int)); extern int pri_findname __P((char *)); extern char *pri_toname __P((int)); extern void print_toif __P((int, char *, char *, struct frdest *)); extern void printaps __P((ap_session_t *, int, int)); extern void printaddr __P((int, int, char *, int, u_32_t *, u_32_t *)); extern void printbuf __P((char *, int, int)); extern void printfieldhdr __P((wordtab_t *, wordtab_t *)); extern void printfr __P((struct frentry *, ioctlfunc_t)); extern struct iphtable_s *printhash __P((struct iphtable_s *, copyfunc_t, char *, int, wordtab_t *)); extern struct 
iphtable_s *printhash_live __P((iphtable_t *, int, char *, int, wordtab_t *)); extern ippool_dst_t *printdstl_live __P((ippool_dst_t *, int, char *, int, wordtab_t *)); extern void printhashdata __P((iphtable_t *, int)); extern struct iphtent_s *printhashnode __P((struct iphtable_s *, struct iphtent_s *, copyfunc_t, int, wordtab_t *)); extern void printhost __P((int, u_32_t *)); extern void printhostmask __P((int, u_32_t *, u_32_t *)); extern void printip __P((int, u_32_t *)); extern void printlog __P((struct frentry *)); extern void printlookup __P((char *, i6addr_t *addr, i6addr_t *mask)); extern void printmask __P((int, u_32_t *)); extern void printnataddr __P((int, char *, nat_addr_t *, int)); extern void printnatfield __P((nat_t *, int)); extern void printnatside __P((char *, nat_stat_side_t *)); extern void printpacket __P((int, mb_t *)); extern void printpacket6 __P((int, mb_t *)); extern struct ippool_dst *printdstlist __P((struct ippool_dst *, copyfunc_t, char *, int, ipf_dstnode_t *, wordtab_t *)); extern void printdstlistdata __P((ippool_dst_t *, int)); extern ipf_dstnode_t *printdstlistnode __P((ipf_dstnode_t *, copyfunc_t, int, wordtab_t *)); extern void printdstlistpolicy __P((ippool_policy_t)); extern struct ip_pool_s *printpool __P((struct ip_pool_s *, copyfunc_t, char *, int, wordtab_t *)); extern struct ip_pool_s *printpool_live __P((struct ip_pool_s *, int, char *, int, wordtab_t *)); extern void printpooldata __P((ip_pool_t *, int)); extern void printpoolfield __P((void *, int, int)); extern struct ip_pool_node *printpoolnode __P((struct ip_pool_node *, int, wordtab_t *)); extern void printproto __P((struct protoent *, int, struct ipnat *)); extern void printportcmp __P((int, struct frpcmp *)); extern void printstatefield __P((ipstate_t *, int)); extern void printtqtable __P((ipftq_t *)); extern void printtunable __P((ipftune_t *)); extern void printunit __P((int)); extern void optprint __P((u_short *, u_long, u_long)); #ifdef USE_INET6 extern 
void optprintv6 __P((u_short *, u_long, u_long)); #endif extern int remove_hash __P((struct iphtable_s *, ioctlfunc_t)); extern int remove_hashnode __P((int, char *, struct iphtent_s *, ioctlfunc_t)); extern int remove_pool __P((ip_pool_t *, ioctlfunc_t)); extern int remove_poolnode __P((int, char *, ip_pool_node_t *, ioctlfunc_t)); extern u_char tcpflags __P((char *)); extern void printc __P((struct frentry *)); extern void printC __P((int)); extern void emit __P((int, int, void *, struct frentry *)); extern u_char secbit __P((int)); extern u_char seclevel __P((char *)); extern void printfraginfo __P((char *, struct ipfr *)); extern void printifname __P((char *, char *, void *)); extern char *hostname __P((int, void *)); extern struct ipstate *printstate __P((struct ipstate *, int, u_long)); extern void printsbuf __P((char *)); extern void printnat __P((struct ipnat *, int)); extern void printactiveaddress __P((int, char *, i6addr_t *, char *)); extern void printactivenat __P((struct nat *, int, u_long)); extern void printhostmap __P((struct hostmap *, u_int)); extern void printtcpflags __P((u_32_t, u_32_t)); extern void printipfexpr __P((int *)); extern void printstatefield __P((ipstate_t *, int)); extern void printstatefieldhdr __P((int)); extern int sendtrap_v1_0 __P((int, char *, char *, int, time_t)); extern int sendtrap_v2_0 __P((int, char *, char *, int)); extern int vtof __P((int)); extern void set_variable __P((char *, char *)); extern char *get_variable __P((char *, char **, int)); extern void resetlexer __P((void)); extern void debug __P((int, char *, ...)); extern void verbose __P((int, char *, ...)); extern void ipfkdebug __P((char *, ...)); extern void ipfkverbose __P((char *, ...)); #if SOLARIS extern int gethostname __P((char *, int )); extern void sync __P((void)); #endif #endif /* __IPF_H__ */ Index: projects/ifnet/contrib/ipfilter =================================================================== --- projects/ifnet/contrib/ipfilter (revision 
279031)
+++ projects/ifnet/contrib/ipfilter (revision 279032)
Property changes on: projects/ifnet/contrib/ipfilter
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head/contrib/ipfilter:r276838-279031
Index: projects/ifnet/lib/libc/sys/mmap.2
===================================================================
--- projects/ifnet/lib/libc/sys/mmap.2 (revision 279031)
+++ projects/ifnet/lib/libc/sys/mmap.2 (revision 279032)
@@ -1,464 +1,450 @@
 .\" Copyright (c) 1991, 1993
 .\" The Regents of the University of California. All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\" notice, this list of conditions and the following disclaimer.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\" notice, this list of conditions and the following disclaimer in the
 .\" documentation and/or other materials provided with the distribution.
 .\" 4. Neither the name of the University nor the names of its contributors
 .\" may be used to endorse or promote products derived from this software
 .\" without specific prior written permission.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 .\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)mmap.2 8.4 (Berkeley) 5/11/95 .\" $FreeBSD$ .\" -.Dd September 17, 2014 +.Dd February 18, 2015 .Dt MMAP 2 .Os .Sh NAME .Nm mmap .Nd allocate memory, or map files or devices into memory .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/mman.h .Ft void * .Fn mmap "void *addr" "size_t len" "int prot" "int flags" "int fd" "off_t offset" .Sh DESCRIPTION The .Fn mmap system call causes the pages starting at .Fa addr and continuing for at most .Fa len bytes to be mapped from the object described by .Fa fd , starting at byte offset .Fa offset . If .Fa len is not a multiple of the pagesize, the mapped region may extend past the specified range. Any such extension beyond the end of the mapped object will be zero-filled. .Pp If .Fa addr is non-zero, it is used as a hint to the system. (As a convenience to the system, the actual address of the region may differ from the address supplied.) If .Fa addr is zero, an address will be selected by the system. The actual starting address of the region is returned. A successful .Fa mmap deletes any previous mapping in the allocated address range. .Pp The protections (region accessibility) are specified in the .Fa prot argument by .Em or Ns 'ing the following values: .Pp .Bl -tag -width PROT_WRITE -compact .It Dv PROT_NONE Pages may not be accessed. .It Dv PROT_READ Pages may be read. .It Dv PROT_WRITE Pages may be written. .It Dv PROT_EXEC Pages may be executed. 
.El .Pp The .Fa flags argument specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. Sharing, mapping type and options are specified in the .Fa flags argument by .Em or Ns 'ing the following values: .Bl -tag -width MAP_PREFAULT_READ .It Dv MAP_32BIT Request a region in the first 2GB of the current process's address space. If a suitable region cannot be found, .Fn mmap will fail. This flag is only available on 64-bit platforms. .It Dv MAP_ALIGNED Ns Pq Fa n Align the region on a requested boundary. If a suitable region cannot be found, .Fn mmap will fail. The .Fa n argument specifies the binary logarithm of the desired alignment. .It Dv MAP_ALIGNED_SUPER Align the region to maximize the potential use of large .Pq Dq super pages. If a suitable region cannot be found, .Fn mmap will fail. The system will choose a suitable page size based on the size of mapping. The page size used as well as the alignment of the region may both be affected by properties of the file being mapped. In particular, the physical address of existing pages of a file may require a specific alignment. The region is not guaranteed to be aligned on any specific boundary. .It Dv MAP_ANON Map anonymous memory not associated with any specific file. The file descriptor used for creating .Dv MAP_ANON must be \-1. The .Fa offset argument must be 0. .\".It Dv MAP_FILE .\"Mapped from a regular file or character-special device memory. .It Dv MAP_ANONYMOUS This flag is identical to .Dv MAP_ANON and is provided for compatibility. .It Dv MAP_EXCL This flag can only be used in combination with .Dv MAP_FIXED . Please see the definition of .Dv MAP_FIXED for the description of its effect. .It Dv MAP_FIXED Do not permit the system to select a different address than the one specified. If the specified address cannot be used, .Fn mmap will fail. 
If .Dv MAP_FIXED is specified, .Fa addr must be a multiple of the pagesize. If .Dv MAP_EXCL -is not specified, a successfull +is not specified, a successful .Dv MAP_FIXED request replaces any previous mappings for the process' pages in the range from .Fa addr to .Fa addr + .Fa len . In contrast, if .Dv MAP_EXCL is specified, the request will fail if a mapping already exists within the range. .It Dv MAP_HASSEMAPHORE Notify the kernel that the region may contain semaphores and that special handling may be necessary. .It Dv MAP_NOCORE Region is not included in a core file. .It Dv MAP_NOSYNC Causes data dirtied via this VM map to be flushed to physical media only when necessary (usually by the pager) rather than gratuitously. Typically this prevents the update daemons from flushing pages dirtied through such maps and thus allows efficient sharing of memory across unassociated processes using a file-backed shared memory map. Without this option any VM pages you dirty may be flushed to disk every so often (every 30-60 seconds usually) which can create performance problems if you do not need that to occur (such as when you are using shared file-backed mmap regions for IPC purposes). Note that VM/file system coherency is maintained whether you use .Dv MAP_NOSYNC or not. This option is not portable across .Ux platforms (yet), though some may implement the same behavior by default. .Pp .Em WARNING ! Extending a file with .Xr ftruncate 2 , thus creating a big hole, and then filling the hole by modifying a shared .Fn mmap can lead to severe file fragmentation. In order to avoid such fragmentation you should always pre-allocate the file's backing store by .Fn write Ns ing zero's into the newly extended area prior to modifying the area via your .Fn mmap . The fragmentation problem is especially sensitive to .Dv MAP_NOSYNC pages, because pages may be flushed to disk in a totally random order. 
.Pp The same applies when using .Dv MAP_NOSYNC to implement a file-based shared memory store. It is recommended that you create the backing store by .Fn write Ns ing zero's to the backing file rather than .Fn ftruncate Ns ing it. You can test file fragmentation by observing the KB/t (kilobytes per transfer) results from an .Dq Li iostat 1 -while reading a large file sequentially, e.g.\& using +while reading a large file sequentially, e.g.,\& using .Dq Li dd if=filename of=/dev/null bs=32k . .Pp The .Xr fsync 2 system call will flush all dirty data and metadata associated with a file, including dirty NOSYNC VM data, to physical media. The .Xr sync 8 command and .Xr sync 2 system call generally do not flush dirty NOSYNC VM data. The .Xr msync 2 system call is usually not needed since .Bx implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later. .It Dv MAP_PREFAULT_READ Immediately update the calling process's lowest-level virtual address translation structures, such as its page table, so that every memory resident page within the region is mapped for read access. Ordinarily these structures are updated lazily. The effect of this option is to eliminate any soft faults that would otherwise occur on the initial read accesses to the region. Although this option does not preclude .Fa prot from including .Dv PROT_WRITE , it does not eliminate soft faults on the initial write accesses to the region. .It Dv MAP_PRIVATE Modifications are private. .It Dv MAP_SHARED Modifications are shared. .It Dv MAP_STACK .Dv MAP_STACK implies .Dv MAP_ANON , and .Fa offset of 0. The .Fa fd argument must be -1 and .Fa prot must include at least .Dv PROT_READ and .Dv PROT_WRITE . This option creates a memory region that grows to at most .Fa len bytes in size, starting from the stack top and growing down. 
The stack top is the starting address returned by the call, plus .Fa len bytes. The bottom of the stack at maximum growth is the starting address returned by the call. .El .Pp The .Xr close 2 system call does not unmap pages, see .Xr munmap 2 for further information. .Sh NOTES Although this implementation does not impose any alignment restrictions on the .Fa offset argument, a portable program must only use page-aligned values. .Pp Large page mappings require that the pages backing an object be aligned in matching blocks in both the virtual address space and RAM. The system will automatically attempt to use large page mappings when mapping an object that is already backed by large pages in RAM by aligning the mapping request in the virtual address space to match the alignment of the large physical pages. The system may also use large page mappings when mapping portions of an object that are not yet backed by pages in RAM. The .Dv MAP_ALIGNED_SUPER flag is an optimization that will align the mapping request to the size of a large page similar to .Dv MAP_ALIGNED , except that the system will override this alignment if an object already uses large pages so that the mapping will be consistent with the existing large pages. This flag is mostly useful for maximizing the use of large pages on the first mapping of objects that do not yet have pages present in RAM. .Sh RETURN VALUES Upon successful completion, .Fn mmap returns a pointer to the mapped region. Otherwise, a value of .Dv MAP_FAILED is returned and .Va errno is set to indicate the error. .Sh ERRORS The .Fn mmap system call will fail if: .Bl -tag -width Er .It Bq Er EACCES The flag .Dv PROT_READ was specified as part of the .Fa prot argument and .Fa fd was not open for reading. The flags .Dv MAP_SHARED and .Dv PROT_WRITE were specified as part of the .Fa flags and .Fa prot argument and .Fa fd was not open for writing. .It Bq Er EBADF The .Fa fd argument is not a valid open file descriptor. 
.It Bq Er EINVAL An invalid value was passed in the .Fa prot argument. .It Bq Er EINVAL An undefined option was set in the .Fa flags argument. .It Bq Er EINVAL Both .Dv MAP_PRIVATE and .Dv MAP_SHARED were specified. .It Bq Er EINVAL None of .Dv MAP_ANON , .Dv MAP_PRIVATE , .Dv MAP_SHARED , or .Dv MAP_STACK was specified. At least one of these flags must be included. .It Bq Er EINVAL .Dv MAP_FIXED was specified and the .Fa addr argument was not page aligned, or part of the desired address space resides out of the valid address space for a user process. .It Bq Er EINVAL Both .Dv MAP_FIXED and .Dv MAP_32BIT were specified and part of the desired address space resides outside of the first 2GB of user address space. .It Bq Er EINVAL The .Fa len argument was equal to zero. .It Bq Er EINVAL .Dv MAP_ALIGNED was specified and the desired alignment was either larger than the virtual address size of the machine or smaller than a page. .It Bq Er EINVAL .Dv MAP_ANON was specified and the .Fa fd argument was not -1. .It Bq Er EINVAL .Dv MAP_ANON was specified and the .Fa offset argument was not 0. .It Bq Er EINVAL Both .Dv MAP_FIXED and .Dv MAP_EXCL were specified, but the requested region is already used by a mapping. .It Bq Er EINVAL .Dv MAP_EXCL was specified, but .Dv MAP_FIXED was not. .It Bq Er ENODEV .Dv MAP_ANON has not been specified and .Fa fd did not reference a regular or character special file. .It Bq Er ENOMEM .Dv MAP_FIXED was specified and the .Fa addr argument was not available. .Dv MAP_ANON was specified and insufficient memory was available. .El .Sh SEE ALSO .Xr madvise 2 , .Xr mincore 2 , .Xr minherit 2 , .Xr mlock 2 , .Xr mprotect 2 , .Xr msync 2 , .Xr munlock 2 , .Xr munmap 2 , .Xr getpagesize 3 , .Xr getpagesizes 3 -.Sh BUGS -The -.Fa len -argument -is limited to the maximum file size or available userland address -space. 
-Files may not be able to be made more than 1TB large on 32 bit systems -due to file systems restrictions and bugs, but address space is far more -restrictive. -Larger files may be possible on 64 bit systems. -.Pp -The previous documented limit of 2GB was a documentation bug. -That limit has not existed since -.Fx 2.2 . Index: projects/ifnet/lib/libc =================================================================== --- projects/ifnet/lib/libc (revision 279031) +++ projects/ifnet/lib/libc (revision 279032) Property changes on: projects/ifnet/lib/libc ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/lib/libc:r278980-279031 Index: projects/ifnet/share/misc/committers-doc.dot =================================================================== --- projects/ifnet/share/misc/committers-doc.dot (revision 279031) +++ projects/ifnet/share/misc/committers-doc.dot (revision 279032) @@ -1,184 +1,186 @@ # $FreeBSD$ # This file is meant to list all FreeBSD doc+www committers and describe the # mentor-mentee relationships between them. # The graphical output can be generated from this file with the following # command: # $ dot -T png -o file.png committers-doc.dot # # The dot binary is part of the graphics/graphviz port. digraph doc { # Node definitions follow this example: # # foo [label="Foo Bar\nfoo@FreeBSD.org\n????/??/??"] # # ????/??/?? is the date when the commit bit was obtained, usually the one you # can find looking at svn logs for the svnadmin/access file. # Use YYYY/MM/DD format. # # For returned commit bits, the node definition will follow this example: # # foo [label="Foo Bar\nfoo@FreeBSD.org\n????/??/??\n????/??/??"] # # The first date is the same as for an active committer, the second date is # the date when the commit bit has been returned. Again, check svn logs. node [color=grey62, style=filled, bgcolor=black]; # Alumni go here. Try to keep things sorted. 
ache [label="Andrey Chernov\nache@FreeBSD.org\n1997/06/13\n2010/12/11"] bmah [label="Bruce A. Mah\nbmah@FreeBSD.org\n2000/08/22\n2009/09/13"] bvs [label="Vitaly Bogdanov\nbvs@FreeBSD.org\n2005/10/03\n2010/12/11"] ceri [label="Ceri Davies\nceri@FreeBSD.org\n2002/03/17\n2012/02/29"] den [label="Denis Peplin\nden@FreeBSD.org\n2003/09/13\n2009/07/09"] garys [label="Gary W. Swearingen\ngarys@FreeBSD.org\n2005/08/21\n2008/03/02"] jcamou [label="Jesus R. Camou\njcamou@FreeBSD.org\n2005/03/02\n2008/12/20"] jesusr [label="Jesus Rodriguez Cuesta\njesusr@FreeBSD.org\n1998/12/10\n2010/12/11"] jim [label="Jim Mock\njim@FreeBSD.org\n1999/08/11\n2003/12/15"] josef [label="Josef El-Rayes\njosef@FreeBSD.org\n2004/01/15\n2008/03/29"] marcel [label="Marcel Moolenaar\nmarcel@FreeBSD.org\n1999/07/03\n2012/04/25"] mheinen [label="Martin Heinen\nmheinen@FreeBSD.org\n2002/10/04\n2006/04/26"] murray [label="Murray Stokely\nmurray@FreeBSD.org\n2000/04/05\n2012/04/25"] nik [label="Nik Clayton\nnik@FreeBSD.org\n1998/02/26\n2008/12/20"] pgj [label="Gabor Pali\npgj@FreeBSD.org\n2008/04/21\n2010/12/01"] roam [label="Peter Pentchev\nroam@FreeBSD.org\n2003/02/14\n2012/02/29"] node [color=lightblue2, style=filled, bgcolor=black]; # Current doc committers go here. Try to keep things sorted. 
ale [label="Alex Dupre\nale@FreeBSD.org\n2003/12/22"] allanjude [label="Allan Jude\nallanjude@FreeBSD.org\n2014/05/17"] bcr [label="Benedict Reuschling\nbcr@FreeBSD.org\n2009/12/24"] +bhd [label="Björn Heidotting\nbhd@FreeBSD.org\n2014/10/14"] blackend [label="Marc Fonvieille\nblackend@FreeBSD.org\n2002/06/16"] brd [label="Brad Davis\nbrd@FreeBSD.org\n2005/06/01"] brueffer [label="Christian Brueffer\nbrueffer@FreeBSD.org\n2003/01/13"] chinsan [label="Chinsan Huang\nchinsan@FreeBSD.org\n2006/09/20"] crees [label="Chris Rees\ncrees@FreeBSD.org\n2013/05/27"] danger [label="Daniel Gerzo\ndanger@FreeBSD.org\n2006/08/20"] delphij [label="Xin Li\ndelphij@FreeBSD.org\n2004/09/14"] dru [label="Dru Lavigne\ndru@FreeBSD.org\n2013/01/22"] eadler [label="Eitan Adler\neadler@FreeBSD.org\n2012/10/15"] ebrandi [label="Edson Brandi\nebrandi@FreeBSD.org\n2012/09/13"] gabor [label="Gabor Kovesdan\ngabor@FreeBSD.org\n2007/02/02"] ganbold [label="Ganbold Tsagaankhuu\nganbold@FreeBSD.org\n2008/02/26"] gavin [label="Gavin Atkinson\ngavin@FreeBSD.org\n2011/07/18"] gjb [label="Glen Barber\ngjb@FreeBSD.org\n2010/09/01"] hrs [label="Hiroki Sato\nhrs@FreeBSD.org\n2000/07/06"] issyl0 [label="Isabell Long\nissyl0@FreeBSD.org\n2012/04/25"] jgh [label="Jason Helfman\njgh@FreeBSD.org\n2014/01/20"] jkois [label="Johann Kois\njkois@FreeBSD.org\n2004/11/11"] joel [label="Joel Dahl\njoel@FreeBSD.org\n2005/04/05"] keramida [label="Giorgos Keramidas\nkeramida@FreeBSD.org\n2001/10/12"] linimon [label="Mark Linimon\nlinimon@FreeBSD.org\n2004/03/31"] loader [label="Fukang Chen\nloader@FreeBSD.org\n2007/07/30"] manolis [label="Manolis Kiagias\nmanolis@FreeBSD.org\n2008/05/24"] marck [label="Dmitry Morozovsky\nmarck@FreeBSD.org\n2004/08/10"] maxim [label="Maxim Konovalov\nmaxim@FreeBSD.org\n2002/02/07"] miwi [label="Martin Wilke\nmiwi@FreeBSD.org\n2007/10/26"] pav [label="Pav Lucistnik\npav@FreeBSD.org\n2005/08/12"] pluknet [label="Sergey Kandaurov\npluknet@FreeBSD.org\n2012/02/14"] remko [label="Remko 
Lodder\nremko@FreeBSD.org\n2004/10/16"] rene [label="Rene Ladan\nrene@FreeBSD.org\n2008/11/03"] ryusuke [label="Ryusuke Suzuki\nryusuke@FreeBSD.org\n2009/12/21"] simon [label="Simon L. Nielsen\nsimon@FreeBSD.org\n2003/07/20"] skreuzer [label="Steven Kreuzer\nskreuzer@FreeBSD.org\n2014/01/15"] taras [label="Taras Korenko\ntaras@FreeBSD.org\n2010/06/25"] trhodes [label="Tom Rhodes\ntrhodes@FreeBSD.org\n2002/03/25"] wblock [label="Warren Block\nwblock@FreeBSD.org\n2011/09/12"] zeising [label="Niclas Zeising\nzeising@FreeBSD.org\n2012/07/03"] # Here are the mentor/mentee relationships. # Group together all the mentees for a particular mentor. # Keep the list sorted by mentor login. bcr -> gavin bcr -> wblock bcr -> eadler bcr -> dru bcr -> crees bcr -> jgh bcr -> allanjude +bcr -> bhd blackend -> ale brueffer -> joel ceri -> brd ceri -> brueffer ceri -> linimon ceri -> roam ceri -> simon den -> marck delphij -> chinsan delphij -> loader eadler -> allanjude gabor -> pgj gabor -> manolis gabor -> taras gabor -> issyl0 gabor -> ebrandi gjb -> wblock gjb -> rene gjb -> dru gjb -> crees hrs -> ryusuke hrs -> dru hrs -> skreuzer jesusr -> jcamou jim -> trhodes jkois -> miwi jkois -> bcr jkois -> gavin jkois -> gjb jkois -> eadler joel -> zeising keramida -> blackend keramida -> danger keramida -> gabor keramida -> ganbold keramida -> garys keramida -> gjb keramida -> pav marck -> bvs marck -> pluknet marck -> taras maxim -> taras mheinen -> jkois murray -> ceri murray -> delphij nik -> bmah nik -> keramida remko -> jkois remko -> rene remko -> jgh simon -> josef simon -> remko trhodes -> danger trhodes -> jcamou wblock -> jgh wblock -> allanjude } Index: projects/ifnet/share/mk/bsd.sys.mk =================================================================== --- projects/ifnet/share/mk/bsd.sys.mk (revision 279031) +++ projects/ifnet/share/mk/bsd.sys.mk (revision 279032) @@ -1,162 +1,165 @@ # $FreeBSD$ # # This file contains common settings used for building FreeBSD # sources. 
# Enable various levels of compiler warning checks. These may be # overridden (e.g. if using a non-gcc compiler) by defining MK_WARNS=no. # for GCC: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Warning-Options.html .include # the default is gnu99 for now CSTD?= gnu99 .if ${CSTD} == "k&r" CFLAGS+= -traditional .elif ${CSTD} == "c89" || ${CSTD} == "c90" CFLAGS+= -std=iso9899:1990 .elif ${CSTD} == "c94" || ${CSTD} == "c95" CFLAGS+= -std=iso9899:199409 .elif ${CSTD} == "c99" CFLAGS+= -std=iso9899:1999 .else # CSTD CFLAGS+= -std=${CSTD} .endif # CSTD # -pedantic is problematic because it also imposes namespace restrictions #CFLAGS+= -pedantic .if defined(WARNS) .if ${WARNS} >= 1 CWARNFLAGS+= -Wsystem-headers .if !defined(NO_WERROR) && !defined(NO_WERROR.${COMPILER_TYPE}) CWARNFLAGS+= -Werror .endif # !NO_WERROR && !NO_WERROR.${COMPILER_TYPE} .endif # WARNS >= 1 .if ${WARNS} >= 2 CWARNFLAGS+= -Wall -Wno-format-y2k .endif # WARNS >= 2 .if ${WARNS} >= 3 CWARNFLAGS+= -W -Wno-unused-parameter -Wstrict-prototypes\ -Wmissing-prototypes -Wpointer-arith .endif # WARNS >= 3 .if ${WARNS} >= 4 CWARNFLAGS+= -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow\ -Wunused-parameter .if !defined(NO_WCAST_ALIGN) && !defined(NO_WCAST_ALIGN.${COMPILER_TYPE}) CWARNFLAGS+= -Wcast-align .endif # !NO_WCAST_ALIGN !NO_WCAST_ALIGN.${COMPILER_TYPE} .endif # WARNS >= 4 # BDECFLAGS .if ${WARNS} >= 6 CWARNFLAGS+= -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls\ -Wold-style-definition .if !defined(NO_WMISSING_VARIABLE_DECLARATIONS) CWARNFLAGS.clang+= -Wmissing-variable-declarations .endif .if !defined(NO_WTHREAD_SAFETY) CWARNFLAGS.clang+= -Wthread-safety .endif .endif # WARNS >= 6 .if ${WARNS} >= 2 && ${WARNS} <= 4 # XXX Delete -Wuninitialized by default for now -- the compiler doesn't # XXX always get it right. 
CWARNFLAGS+= -Wno-uninitialized .endif # WARNS >=2 && WARNS <= 4 CWARNFLAGS+= -Wno-pointer-sign # Clang has more warnings enabled by default, and when using -Wall, so if WARNS # is set to low values, these have to be disabled explicitly. .if ${WARNS} <= 6 CWARNFLAGS.clang+= -Wno-empty-body -Wno-string-plus-int .if ${COMPILER_TYPE} == "clang" && ${COMPILER_VERSION} > 30300 CWARNFLAGS.clang+= -Wno-unused-const-variable .endif .endif # WARNS <= 6 .if ${WARNS} <= 3 CWARNFLAGS.clang+= -Wno-tautological-compare -Wno-unused-value\ -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion .endif # WARNS <= 3 .if ${WARNS} <= 2 CWARNFLAGS.clang+= -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter .endif # WARNS <= 2 .if ${WARNS} <= 1 CWARNFLAGS.clang+= -Wno-parentheses .endif # WARNS <= 1 .if defined(NO_WARRAY_BOUNDS) CWARNFLAGS.clang+= -Wno-array-bounds .endif # NO_WARRAY_BOUNDS .endif # WARNS .if defined(FORMAT_AUDIT) WFORMAT= 1 .endif # FORMAT_AUDIT .if defined(WFORMAT) .if ${WFORMAT} > 0 #CWARNFLAGS+= -Wformat-nonliteral -Wformat-security -Wno-format-extra-args CWARNFLAGS+= -Wformat=2 -Wno-format-extra-args .if ${WARNS} <= 3 CWARNFLAGS.clang+= -Wno-format-nonliteral .endif # WARNS <= 3 .if !defined(NO_WERROR) && !defined(NO_WERROR.${COMPILER_TYPE}) CWARNFLAGS+= -Werror .endif # !NO_WERROR && !NO_WERROR.${COMPILER_TYPE} .endif # WFORMAT > 0 .endif # WFORMAT .if defined(NO_WFORMAT) || defined(NO_WFORMAT.${COMPILER_TYPE}) CWARNFLAGS+= -Wno-format .endif # NO_WFORMAT || NO_WFORMAT.${COMPILER_TYPE} .if defined(IGNORE_PRAGMA) CWARNFLAGS+= -Wno-unknown-pragmas .endif # IGNORE_PRAGMA # We need this conditional because many places that use it # only enable it for some files with CLFAGS.$FILE+=${CLANG_NO_IAS}. # unconditionally, and can't easily use the CFLAGS.clang= # mechanism. 
 .if ${COMPILER_TYPE} == "clang"
 CLANG_NO_IAS= -no-integrated-as
 .endif
 CLANG_OPT_SMALL= -mstack-alignment=8 -mllvm -inline-threshold=3\
-	-mllvm -simplifycfg-dup-ret -mllvm -enable-gvn=false
+	-mllvm -simplifycfg-dup-ret -mllvm
+.if ${COMPILER_VERSION} > 30400
+CLANG_OPT_SMALL+= -enable-gvn=false
+.endif
 CFLAGS.clang+= -Qunused-arguments
 .if ${MACHINE_CPUARCH} == "sparc64"
 # Don't emit .cfi directives, since we must use GNU as on sparc64, for now.
 CFLAGS.clang+= -fno-dwarf2-cfi-asm
 .endif # SPARC64
 # The libc++ headers use c++11 extensions. These are normally silenced because
 # they are treated as system headers, but we explicitly disable that warning
 # suppression when building the base system to catch bugs in our headers.
 # Eventually we'll want to start building the base system C++ code as C++11,
 # but not yet.
 CXXFLAGS.clang+= -Wno-c++11-extensions
 .if ${MK_SSP} != "no" && \
 	${MACHINE_CPUARCH} != "arm" && ${MACHINE_CPUARCH} != "mips"
 # Don't use -Wstack-protector as it breaks world with -Werror.
 SSP_CFLAGS?= -fstack-protector
 CFLAGS+= ${SSP_CFLAGS}
 .endif # SSP && !ARM && !MIPS
 # Allow user-specified additional warning flags, plus compiler specific flag overrides.
 # Unless we've overriden this...
 .if ${MK_WARNS} != "no"
 CFLAGS+= ${CWARNFLAGS} ${CWARNFLAGS.${COMPILER_TYPE}}
 .endif
 CFLAGS+= ${CFLAGS.${COMPILER_TYPE}}
 CXXFLAGS+= ${CXXFLAGS.${COMPILER_TYPE}}
 # Tell bmake not to mistake standard targets for things to be searched for
 # or expect to ever be up-to-date.
 PHONY_NOTMAIN = afterdepend afterinstall all beforedepend beforeinstall \
 	beforelinking build build-tools buildfiles buildincludes \
 	checkdpadd clean cleandepend cleandir cleanobj configure \
 	depend dependall distclean distribute exe \
 	html includes install installfiles installincludes lint \
 	obj objlink objs objwarn realall realdepend \
 	realinstall regress subdir-all subdir-depend subdir-install \
 	tags whereobj
 .PHONY: ${PHONY_NOTMAIN}
 .NOTMAIN: ${PHONY_NOTMAIN}
Index: projects/ifnet/share
===================================================================
--- projects/ifnet/share	(revision 279031)
+++ projects/ifnet/share	(revision 279032)
Property changes on: projects/ifnet/share
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head/share:r278980-279031
Index: projects/ifnet/sys/arm/arm/db_trace.c
===================================================================
--- projects/ifnet/sys/arm/arm/db_trace.c	(revision 279031)
+++ projects/ifnet/sys/arm/arm/db_trace.c	(revision 279032)
@@ -1,183 +1,183 @@
 /* $NetBSD: db_trace.c,v 1.8 2003/01/17 22:28:48 thorpej Exp $ */
 /*-
  * Copyright (c) 2000, 2001 Ben Harris
  * Copyright (c) 1996 Scott K. Stevens
  *
  * Mach Operating System
  * Copyright (c) 1991,1990 Carnegie Mellon University
  * All Rights Reserved.
  *
  * Permission to use, copy, modify and distribute this software and its
  * documentation is hereby granted, provided that both the copyright
  * notice and this permission notice appear in all copies of the
  * software, derivative works or modified versions, and any portions
  * thereof, and that both notices appear in supporting documentation.
  *
  * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
  * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
  * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
  *
  * Carnegie Mellon requests users of this software to return to
  *
  * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
  * School of Computer Science
  * Carnegie Mellon University
  * Pittsburgh PA 15213-3890
  *
  * any improvements or extensions that they make and grant Carnegie the
  * rights to redistribute these changes.
  */
 #include
 __FBSDID("$FreeBSD$");
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 static void
 db_stack_trace_cmd(struct unwind_state *state)
 {
 	const char *name;
 	db_expr_t value;
 	db_expr_t offset;
 	c_db_sym_t sym;
 	u_int reg, i;
 	char *sep;
 	uint16_t upd_mask;
 	bool finished;
 	finished = false;
 	while (!finished) {
-		finished = unwind_stack_one(state);
+		finished = unwind_stack_one(state, 0);
 		/* Print the frame details */
 		sym = db_search_symbol(state->start_pc, DB_STGY_ANY, &offset);
 		if (sym == C_DB_SYM_NULL) {
 			value = 0;
 			name = "(null)";
 		} else
 			db_symbol_values(sym, &name, &value);
 		db_printf("%s() at ", name);
 		db_printsym(state->start_pc, DB_STGY_PROC);
 		db_printf("\n");
 		db_printf("\t pc = 0x%08x lr = 0x%08x (",
 		    state->start_pc, state->registers[LR]);
 		db_printsym(state->registers[LR], DB_STGY_PROC);
 		db_printf(")\n");
 		db_printf("\t sp = 0x%08x fp = 0x%08x",
 		    state->registers[SP], state->registers[FP]);
 		/* Don't print the registers we have already printed */
 		upd_mask = state->update_mask &
 		    ~((1 << SP) | (1 << FP) | (1 << LR) | (1 << PC));
 		sep = "\n\t";
 		for (i = 0, reg = 0; upd_mask != 0; upd_mask >>= 1, reg++) {
 			if ((upd_mask & 1) != 0) {
 				db_printf("%s%sr%d = 0x%08x", sep,
 				    (reg < 10) ? " " : "", reg,
 				    state->registers[reg]);
 				i++;
 				if (i == 2) {
 					sep = "\n\t";
 					i = 0;
 				} else
 					sep = " ";
 			}
 		}
 		db_printf("\n");
 		if (finished)
 			break;
 		/*
 		 * Stop if directed to do so, or if we've unwound back to the
 		 * kernel entry point, or if the unwind function didn't change
 		 * anything (to avoid getting stuck in this loop forever).
 		 * If the latter happens, it's an indication that the unwind
 		 * information is incorrect somehow for the function named in
 		 * the last frame printed before you see the unwind failure
 		 * message (maybe it needs a STOP_UNWINDING).
 		 */
 		if (state->registers[PC] < VM_MIN_KERNEL_ADDRESS) {
 			db_printf("Unable to unwind into user mode\n");
 			finished = true;
 		} else if (state->update_mask == 0) {
 			db_printf("Unwind failure (no registers changed)\n");
 			finished = true;
 		}
 	}
 }
 /* XXX stubs */
 void
 db_md_list_watchpoints()
 {
 }
 int
 db_md_clr_watchpoint(db_expr_t addr, db_expr_t size)
 {
 	return (0);
 }
 int
 db_md_set_watchpoint(db_expr_t addr, db_expr_t size)
 {
 	return (0);
 }
 int
 db_trace_thread(struct thread *thr, int count)
 {
 	struct unwind_state state;
 	struct pcb *ctx;
 	if (thr != curthread) {
 		ctx = kdb_thr_ctx(thr);
 		state.registers[FP] = ctx->pcb_regs.sf_r11;
 		state.registers[SP] = ctx->pcb_regs.sf_sp;
 		state.registers[LR] = ctx->pcb_regs.sf_lr;
 		state.registers[PC] = ctx->pcb_regs.sf_pc;
 		db_stack_trace_cmd(&state);
 	} else
 		db_trace_self();
 	return (0);
 }
 void
 db_trace_self(void)
 {
 	struct unwind_state state;
 	uint32_t sp;
 	/* Read the stack pointer */
 	__asm __volatile("mov %0, sp" : "=&r" (sp));
 	state.registers[FP] = (uint32_t)__builtin_frame_address(0);
 	state.registers[SP] = sp;
 	state.registers[LR] = (uint32_t)__builtin_return_address(0);
 	state.registers[PC] = (uint32_t)db_trace_self;
 	db_stack_trace_cmd(&state);
 }
Index: projects/ifnet/sys/arm/arm/unwind.c
===================================================================
--- projects/ifnet/sys/arm/arm/unwind.c	(revision 279031)
+++ projects/ifnet/sys/arm/arm/unwind.c	(revision 279032)
@@ -1,369 +1,420 @@
 /*
  * Copyright 2013-2014 Andrew Turner.
  * Copyright 2013-2014 Ian Lepore.
  * Copyright 2013-2014 Rui Paulo.
  * Copyright 2013 Eitan Adler.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
  *
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
  * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
  * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 #include
 __FBSDID("$FreeBSD$");
 #include
 #include
+#include
 #include
+#include "linker_if.h"
+
 /*
  * Definitions for the instruction interpreter.
  *
  * The ARM EABI specifies how to perform the frame unwinding in the
  * Exception Handling ABI for the ARM Architecture document. To perform
  * the unwind we need to know the initial frame pointer, stack pointer,
  * link register and program counter. We then find the entry within the
  * index table that points to the function the program counter is within.
  * This gives us either a list of three instructions to process, a 31-bit
  * relative offset to a table of instructions, or a value telling us
  * we can't unwind any further.
  *
  * When we have the instructions to process we need to decode them
  * following table 4 in section 9.3. This describes a collection of bit
  * patterns to encode that steps to take to update the stack pointer and
  * link register to the correct values at the start of the function.
  */
 /* A special case when we are unable to unwind past this function */
 #define EXIDX_CANTUNWIND 1
 /*
  * These are set in the linker script. Their addresses will be
  * either the start or end of the exception table or index.
  */
-extern int extab_start, extab_end, exidx_start, exidx_end;
+extern int exidx_start, exidx_end;
 /*
  * Entry types.
  * These are the only entry types that have been seen in the kernel.
  */
 #define ENTRY_MASK 0xff000000
 #define ENTRY_ARM_SU16 0x80000000
 #define ENTRY_ARM_LU16 0x81000000
 /* Instruction masks. */
 #define INSN_VSP_MASK 0xc0
 #define INSN_VSP_SIZE_MASK 0x3f
 #define INSN_STD_MASK 0xf0
 #define INSN_STD_DATA_MASK 0x0f
 #define INSN_POP_TYPE_MASK 0x08
 #define INSN_POP_COUNT_MASK 0x07
 #define INSN_VSP_LARGE_INC_MASK 0xff
 /* Instruction definitions */
 #define INSN_VSP_INC 0x00
 #define INSN_VSP_DEC 0x40
 #define INSN_POP_MASKED 0x80
 #define INSN_VSP_REG 0x90
 #define INSN_POP_COUNT 0xa0
 #define INSN_FINISH 0xb0
 #define INSN_POP_REGS 0xb1
 #define INSN_VSP_LARGE_INC 0xb2
 /* An item in the exception index table */
 struct unwind_idx {
 	uint32_t offset;
 	uint32_t insn;
 };
 /* Expand a 31-bit signed value to a 32-bit signed value */
 static __inline int32_t
 expand_prel31(uint32_t prel31)
 {
 	return ((int32_t)(prel31 & 0x7fffffffu) << 1) / 2;
 }
+struct search_context {
+	uint32_t addr;
+	caddr_t exidx_start;
+	caddr_t exidx_end;
+};
+
+static int
+module_search(linker_file_t lf, void *context)
+{
+	struct search_context *sc = context;
+	linker_symval_t symval;
+	c_linker_sym_t sym;
+
+	if (lf->address <= (caddr_t)sc->addr &&
+	    (lf->address + lf->size) >= (caddr_t)sc->addr) {
+		if ((LINKER_LOOKUP_SYMBOL(lf, "__exidx_start", &sym) == 0 ||
+		    LINKER_LOOKUP_SYMBOL(lf, "exidx_start", &sym) == 0) &&
+		    LINKER_SYMBOL_VALUES(lf, sym, &symval) == 0)
+			sc->exidx_start = symval.value;
+
+		if ((LINKER_LOOKUP_SYMBOL(lf, "__exidx_end", &sym) == 0 ||
+		    LINKER_LOOKUP_SYMBOL(lf, "exidx_end", &sym) == 0) &&
+		    LINKER_SYMBOL_VALUES(lf, sym, &symval) == 0)
+			sc->exidx_end = symval.value;
+
+		if (sc->exidx_start != NULL && sc->exidx_end != NULL)
+			return (1);
+		panic("Invalid module %s, no unwind tables\n", lf->filename);
+	}
+	return (0);
+}
+
 /*
  * Perform a binary search of the index table to find the function
  * with the largest address that doesn't exceed addr.
  */
 static struct unwind_idx *
-find_index(uint32_t addr)
+find_index(uint32_t addr, int search_modules)
 {
+	struct search_context sc;
+	caddr_t idx_start, idx_end;
 	unsigned int min, mid, max;
 	struct unwind_idx *start;
 	struct unwind_idx *item;
 	int32_t prel31_addr;
 	uint32_t func_addr;
 	start = (struct unwind_idx *)&exidx_start;
+	idx_start = (caddr_t)&exidx_start;
+	idx_end = (caddr_t)&exidx_end;
+	/* This may acquire a lock */
+	if (search_modules) {
+		bzero(&sc, sizeof(sc));
+		sc.addr = addr;
+		if (linker_file_foreach(module_search, &sc) != 0 &&
+		    sc.exidx_start != NULL && sc.exidx_end != NULL) {
+			start = (struct unwind_idx *)sc.exidx_start;
+			idx_start = sc.exidx_start;
+			idx_end = sc.exidx_end;
+		}
+	}
+
 	min = 0;
-	max = (&exidx_end - &exidx_start) / 2;
+	max = (idx_end - idx_start) / sizeof(struct unwind_idx);
 	while (min != max) {
 		mid = min + (max - min + 1) / 2;
 		item = &start[mid];
 		prel31_addr = expand_prel31(item->offset);
 		func_addr = (uint32_t)&item->offset + prel31_addr;
 		if (func_addr <= addr) {
 			min = mid;
 		} else {
 			max = mid - 1;
 		}
 	}
 	return &start[min];
 }
 /* Reads the next byte from the instruction list */
 static uint8_t
 unwind_exec_read_byte(struct unwind_state *state)
 {
 	uint8_t insn;
 	/* Read the unwind instruction */
 	insn = (*state->insn) >> (state->byte * 8);
 	/* Update the location of the next instruction */
 	if (state->byte == 0) {
 		state->byte = 3;
 		state->insn++;
 		state->entries--;
 	} else
 		state->byte--;
 	return insn;
 }
 /* Executes the next instruction on the list */
 static int
 unwind_exec_insn(struct unwind_state *state)
 {
 	unsigned int insn;
 	uint32_t *vsp = (uint32_t *)state->registers[SP];
 	int update_vsp = 0;
 	/* This should never happen */
 	if (state->entries == 0)
 		return 1;
 	/* Read the next instruction */
 	insn = unwind_exec_read_byte(state);
 	if ((insn & INSN_VSP_MASK) == INSN_VSP_INC) {
 		state->registers[SP] += ((insn & INSN_VSP_SIZE_MASK) << 2) + 4;
 	} else if ((insn & INSN_VSP_MASK) == INSN_VSP_DEC) {
 		state->registers[SP] -= ((insn & INSN_VSP_SIZE_MASK) << 2) + 4;
 	} else if ((insn & INSN_STD_MASK) == INSN_POP_MASKED) {
 		unsigned int mask, reg;
 		/* Load the mask */
 		mask = unwind_exec_read_byte(state);
 		mask |= (insn & INSN_STD_DATA_MASK) << 8;
 		/* We have a refuse to unwind instruction */
 		if (mask == 0)
 			return 1;
 		/* Update SP */
 		update_vsp = 1;
 		/* Load the registers */
 		for (reg = 4; mask && reg < 16; mask >>= 1, reg++) {
 			if (mask & 1) {
 				state->registers[reg] = *vsp++;
 				state->update_mask |= 1 << reg;
 				/* If we have updated SP kep its value */
 				if (reg == SP)
 					update_vsp = 0;
 			}
 		}
 	} else if ((insn & INSN_STD_MASK) == INSN_VSP_REG &&
 	    ((insn & INSN_STD_DATA_MASK) != 13) &&
 	    ((insn & INSN_STD_DATA_MASK) != 15)) {
 		/* sp = register */
 		state->registers[SP] =
 		    state->registers[insn & INSN_STD_DATA_MASK];
 	} else if ((insn & INSN_STD_MASK) == INSN_POP_COUNT) {
 		unsigned int count, reg;
 		/* Read how many registers to load */
 		count = insn & INSN_POP_COUNT_MASK;
 		/* Update sp */
 		update_vsp = 1;
 		/* Pop the registers */
 		for (reg = 4; reg <= 4 + count; reg++) {
 			state->registers[reg] = *vsp++;
 			state->update_mask |= 1 << reg;
 		}
 		/* Check if we are in the pop r14 version */
 		if ((insn & INSN_POP_TYPE_MASK) != 0) {
 			state->registers[14] = *vsp++;
 		}
 	} else if (insn == INSN_FINISH) {
 		/* Stop processing */
 		state->entries = 0;
 	} else if (insn == INSN_POP_REGS) {
 		unsigned int mask, reg;
 		mask = unwind_exec_read_byte(state);
 		if (mask == 0 || (mask & 0xf0) != 0)
 			return 1;
 		/* Update SP */
 		update_vsp = 1;
 		/* Load the registers */
 		for (reg = 0; mask && reg < 4; mask >>= 1, reg++) {
 			if (mask & 1) {
 				state->registers[reg] = *vsp++;
 				state->update_mask |= 1 << reg;
 			}
 		}
 	} else if ((insn & INSN_VSP_LARGE_INC_MASK) == INSN_VSP_LARGE_INC) {
 		unsigned int uleb128;
 		/* Read the increment value */
 		uleb128 = unwind_exec_read_byte(state);
 		state->registers[SP] += 0x204 + (uleb128 << 2);
 	} else {
 		/* We hit a new instruction that needs to be implemented */
 #if 0
 		db_printf("Unhandled instruction %.2x\n", insn);
 #endif
 		return 1;
 	}
 	if (update_vsp) {
 		state->registers[SP] = (uint32_t)vsp;
 	}
 #if 0
 	db_printf("fp = %08x, sp = %08x, lr = %08x, pc = %08x\n",
 	    state->registers[FP], state->registers[SP], state->registers[LR],
 	    state->registers[PC]);
 #endif
 	return 0;
 }
 /* Performs the unwind of a function */
 static int
 unwind_tab(struct unwind_state *state)
 {
 	uint32_t entry;
 	/* Set PC to a known value */
 	state->registers[PC] = 0;
 	/* Read the personality */
 	entry = *state->insn & ENTRY_MASK;
 	if (entry == ENTRY_ARM_SU16) {
 		state->byte = 2;
 		state->entries = 1;
 	} else if (entry == ENTRY_ARM_LU16) {
 		state->byte = 1;
 		state->entries = ((*state->insn >> 16) & 0xFF) + 1;
 	} else {
 #if 0
 		db_printf("Unknown entry: %x\n", entry);
 #endif
 		return 1;
 	}
 	while (state->entries > 0) {
 		if (unwind_exec_insn(state) != 0)
 			return 1;
 	}
 	/*
 	 * The program counter was not updated, load it from the link register.
 	 */
 	if (state->registers[PC] == 0) {
 		state->registers[PC] = state->registers[LR];
 		/*
 		 * If the program counter changed, flag it in the update mask.
 		 */
 		if (state->start_pc != state->registers[PC])
 			state->update_mask |= 1 << PC;
 	}
 	return 0;
 }
 int
-unwind_stack_one(struct unwind_state *state)
+unwind_stack_one(struct unwind_state *state, int can_lock)
 {
 	struct unwind_idx *index;
 	int finished;
 	/* Reset the mask of updated registers */
 	state->update_mask = 0;
 	/* The pc value is correct and will be overwritten, save it */
 	state->start_pc = state->registers[PC];
 	/* Find the item to run */
-	index = find_index(state->start_pc);
+	index = find_index(state->start_pc, can_lock);
 	finished = 0;
 	if (index->insn != EXIDX_CANTUNWIND) {
 		if (index->insn & (1U << 31)) {
 			/* The data is within the instruction */
 			state->insn = &index->insn;
 		} else {
 			/* A prel31 offset to the unwind table */
 			state->insn = (uint32_t *)
 			    ((uintptr_t)&index->insn +
 			    expand_prel31(index->insn));
 		}
 		/* Run the unwind function */
 		finished = unwind_tab(state);
 	}
 	/* This is the top of the stack, finish */
 	if (index->insn == EXIDX_CANTUNWIND)
 		finished = 1;
 	return (finished);
 }
Index: projects/ifnet/sys/arm/include/stack.h
===================================================================
--- projects/ifnet/sys/arm/include/stack.h	(revision 279031)
+++ projects/ifnet/sys/arm/include/stack.h	(revision 279032)
@@ -1,60 +1,60 @@
 /*-
  * Copyright (c) 2000, 2001 Ben Harris
  * Copyright (c) 1996 Scott K. Stevens
  *
  * Mach Operating System
  * Copyright (c) 1991,1990 Carnegie Mellon University
  * All Rights Reserved.
  *
  * Permission to use, copy, modify and distribute this software and its
  * documentation is hereby granted, provided that both the copyright
  * notice and this permission notice appear in all copies of the
  * software, derivative works or modified versions, and any portions
  * thereof, and that both notices appear in supporting documentation.
  *
  * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
  * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
  * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
  *
  * Carnegie Mellon requests users of this software to return to
  *
  * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
  * School of Computer Science
  * Carnegie Mellon University
  * Pittsburgh PA 15213-3890
  *
  * any improvements or extensions that they make and grant Carnegie the
  * rights to redistribute these changes.
  *
  * $FreeBSD$
  */
 #ifndef _MACHINE_STACK_H_
 #define _MACHINE_STACK_H_
 #define INKERNEL(va) (((vm_offset_t)(va)) >= VM_MIN_KERNEL_ADDRESS)
 #define FR_SCP (0)
 #define FR_RLV (-1)
 #define FR_RSP (-2)
 #define FR_RFP (-3)
 /* The state of the unwind process */
 struct unwind_state {
 	uint32_t registers[16];
 	uint32_t start_pc;
 	uint32_t *insn;
 	u_int entries;
 	u_int byte;
 	uint16_t update_mask;
 };
 /* The register names */
 #define FP 11
 #define SP 13
 #define LR 14
 #define PC 15
-int unwind_stack_one(struct unwind_state *);
+int unwind_stack_one(struct unwind_state *, int);
 #endif /* !_MACHINE_STACK_H_ */
Index: projects/ifnet/sys/cddl/dev/dtrace/arm/dtrace_isa.c
===================================================================
--- projects/ifnet/sys/cddl/dev/dtrace/arm/dtrace_isa.c	(revision 279031)
+++ projects/ifnet/sys/cddl/dev/dtrace/arm/dtrace_isa.c	(revision 279032)
@@ -1,356 +1,295 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License, Version 1.0 only
  * (the "License"). You may not use this file except in compliance
  * with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  *
  * $FreeBSD$
  */
 /*
  * Copyright 2005 Sun Microsystems, Inc. All rights reserved.
  * Use is subject to license terms.
  */
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include "regset.h"
 /*
  * Wee need some reasonable default to prevent backtrace code
  * from wandering too far
  */
 #define MAX_FUNCTION_SIZE 0x10000
 #define MAX_PROLOGUE_SIZE 0x100
 uint8_t dtrace_fuword8_nocheck(void *);
 uint16_t dtrace_fuword16_nocheck(void *);
 uint32_t dtrace_fuword32_nocheck(void *);
 uint64_t dtrace_fuword64_nocheck(void *);
 void
 dtrace_getpcstack(pc_t *pcstack, int pcstack_limit, int aframes,
     uint32_t *intrpc)
 {
-	u_int32_t *frame, *lastframe;
-	int scp_offset;
-	int depth = 0;
+	struct unwind_state state;
+	register_t sp;
+	int scp_offset;
+	int depth = 0;
 	pc_t caller = (pc_t) solaris_cpu[curcpu].cpu_dtrace_caller;
 	if (intrpc != 0)
 		pcstack[depth++] = (pc_t) intrpc;
 	aframes++;
-	frame = (u_int32_t *)__builtin_frame_address(0);;
-	lastframe = NULL;
-	scp_offset = -(get_pc_str_offset() >> 2);
+	__asm __volatile("mov %0, sp" : "=&r" (sp));
-	while ((frame != NULL) && (depth < pcstack_limit)) {
-		db_addr_t scp;
-#if 0
-		u_int32_t savecode;
-		int r;
-		u_int32_t *rp;
-#endif
+	state.registers[FP] = (uint32_t)__builtin_frame_address(0);
+	state.registers[SP] = sp;
+	state.registers[LR] = (uint32_t)__builtin_return_address(0);
+	state.registers[PC] = (uint32_t)dtrace_getpcstack;
-		/*
-		 * In theory, the SCP isn't guaranteed to be in the function
-		 * that generated the stack frame. We hope for the best.
-		 */
-		scp = frame[FR_SCP];
+	while (depth < pcstack_limit) {
+		int done;
+		done = unwind_stack_one(&state, 1);
+
 		if (aframes > 0) {
 			aframes--;
 			if ((aframes == 0) && (caller != 0)) {
 				pcstack[depth++] = caller;
 			}
 		} else {
-			pcstack[depth++] = scp;
+			pcstack[depth++] = state.registers[PC];
 		}
-#if 0
-		savecode = ((u_int32_t *)scp)[scp_offset];
-		if ((savecode & 0x0e100000) == 0x08000000) {
-			/* Looks like an STM */
-			rp = frame - 4;
-			for (r = 10; r >= 0; r--) {
-				if (savecode & (1 << r)) {
-					/* register r == *rp-- */
-				}
-			}
-		}
-#endif
-
-		/*
-		 * Switch to next frame up
-		 */
-		if (frame[FR_RFP] == 0)
-			break; /* Top of stack */
-
-		lastframe = frame;
-		frame = (u_int32_t *)(frame[FR_RFP]);
-
-		if (INKERNEL((int)frame)) {
-			/* staying in kernel */
-			if (frame <= lastframe) {
-				/* bad frame pointer */
-				break;
-			}
-		}
-		else
+		if (done)
 			break;
 	}
 	for (; depth < pcstack_limit; depth++) {
 		pcstack[depth] = 0;
 	}
 }
 void
 dtrace_getupcstack(uint64_t *pcstack, int pcstack_limit)
 {
 	printf("IMPLEMENT ME: %s\n", __func__);
 }
 int
 dtrace_getustackdepth(void)
 {
 	printf("IMPLEMENT ME: %s\n", __func__);
 	return (0);
 }
 void
 dtrace_getufpstack(uint64_t *pcstack, uint64_t *fpstack, int pcstack_limit)
 {
 	printf("IMPLEMENT ME: %s\n", __func__);
 }
 /*ARGSUSED*/
 uint64_t
 dtrace_getarg(int arg, int aframes)
 {
 /* struct arm_frame *fp = (struct arm_frame *)dtrace_getfp();*/
 	return (0);
 }
 int
 dtrace_getstackdepth(int aframes)
 {
-	u_int32_t *frame, *lastframe;
-	int scp_offset;
-	int depth = 1;
+	struct unwind_state state;
+	register_t sp;
+	int scp_offset;
+	int done = 0;
+	int depth = 1;
-	frame = (u_int32_t *)__builtin_frame_address(0);;
-	lastframe = NULL;
-	scp_offset = -(get_pc_str_offset() >> 2);
+	__asm __volatile("mov %0, sp" : "=&r" (sp));
-	while (frame != NULL) {
-		db_addr_t scp;
-#if 0
-		u_int32_t savecode;
-		int r;
-		u_int32_t *rp;
-#endif
+	state.registers[FP] = (uint32_t)__builtin_frame_address(0);
+	state.registers[SP] = sp;
+	state.registers[LR] = (uint32_t)__builtin_return_address(0);
+	state.registers[PC] = (uint32_t)dtrace_getstackdepth;
-		/*
-		 * In theory, the SCP isn't guaranteed to be in the function
-		 * that generated the stack frame. We hope for the best.
-		 */
-		scp = frame[FR_SCP];
-
+	do {
+		done = unwind_stack_one(&state, 1);
 		depth++;
+	} while (!done);
-		/*
-		 * Switch to next frame up
-		 */
-		if (frame[FR_RFP] == 0)
-			break; /* Top of stack */
-
-		lastframe = frame;
-		frame = (u_int32_t *)(frame[FR_RFP]);
-
-		if (INKERNEL((int)frame)) {
-			/* staying in kernel */
-			if (frame <= lastframe) {
-				/* bad frame pointer */
-				break;
-			}
-		}
-		else
-			break;
-	}
-
 	if (depth < aframes)
 		return 0;
 	else
 		return depth - aframes;
-
 }
 ulong_t
 dtrace_getreg(struct trapframe *rp, uint_t reg)
 {
 	printf("IMPLEMENT ME: %s\n", __func__);
 	return (0);
 }
 static int
 dtrace_copycheck(uintptr_t uaddr, uintptr_t kaddr, size_t size)
 {
 	if (uaddr + size > VM_MAXUSER_ADDRESS || uaddr + size < uaddr) {
 		DTRACE_CPUFLAG_SET(CPU_DTRACE_BADADDR);
 		cpu_core[curcpu].cpuc_dtrace_illval = uaddr;
 		return (0);
 	}
 	return (1);
 }
 void
 dtrace_copyin(uintptr_t uaddr, uintptr_t kaddr, size_t size,
     volatile uint16_t *flags)
 {
 	if (dtrace_copycheck(uaddr, kaddr, size))
 		dtrace_copy(uaddr, kaddr, size);
 }
 void
 dtrace_copyout(uintptr_t kaddr, uintptr_t uaddr, size_t size,
     volatile uint16_t *flags)
 {
 	if (dtrace_copycheck(uaddr, kaddr, size))
 		dtrace_copy(kaddr, uaddr, size);
 }
 void
 dtrace_copyinstr(uintptr_t uaddr, uintptr_t kaddr, size_t size,
     volatile uint16_t *flags)
 {
 	if (dtrace_copycheck(uaddr, kaddr, size))
 		dtrace_copystr(uaddr, kaddr, size, flags);
 }
 void
 dtrace_copyoutstr(uintptr_t kaddr, uintptr_t uaddr, size_t size,
     volatile uint16_t *flags)
 {
 	if (dtrace_copycheck(uaddr, kaddr, size))
 		dtrace_copystr(kaddr, uaddr, size, flags);
 }
 uint8_t
 dtrace_fuword8(void *uaddr)
 {
 	if ((uintptr_t)uaddr > VM_MAXUSER_ADDRESS) {
 		DTRACE_CPUFLAG_SET(CPU_DTRACE_BADADDR);
 		cpu_core[curcpu].cpuc_dtrace_illval = (uintptr_t)uaddr;
 		return (0);
 	}
 	return (dtrace_fuword8_nocheck(uaddr));
 }
 uint16_t
 dtrace_fuword16(void *uaddr)
 {
 	if ((uintptr_t)uaddr > VM_MAXUSER_ADDRESS) {
 		DTRACE_CPUFLAG_SET(CPU_DTRACE_BADADDR);
 		cpu_core[curcpu].cpuc_dtrace_illval = (uintptr_t)uaddr;
 		return (0);
 	}
 	return (dtrace_fuword16_nocheck(uaddr));
 }
 uint32_t
 dtrace_fuword32(void *uaddr)
 {
 	if ((uintptr_t)uaddr > VM_MAXUSER_ADDRESS) {
 		DTRACE_CPUFLAG_SET(CPU_DTRACE_BADADDR);
 		cpu_core[curcpu].cpuc_dtrace_illval = (uintptr_t)uaddr;
 		return (0);
 	}
 	return (dtrace_fuword32_nocheck(uaddr));
 }
 uint64_t
 dtrace_fuword64(void *uaddr)
 {
 	if ((uintptr_t)uaddr > VM_MAXUSER_ADDRESS) {
 		DTRACE_CPUFLAG_SET(CPU_DTRACE_BADADDR);
 		cpu_core[curcpu].cpuc_dtrace_illval = (uintptr_t)uaddr;
 		return (0);
 	}
 	return (dtrace_fuword64_nocheck(uaddr));
 }
 #define __with_interrupts_disabled(expr) \
 	do { \
 		u_int cpsr_save, tmp; \
 		\
 		__asm __volatile( \
 		    "mrs %0, cpsr;" \
 		    "orr %1, %0, %2;" \
 		    "msr cpsr_fsxc, %1;" \
 		    : "=r" (cpsr_save), "=r" (tmp) \
 		    : "I" (PSR_I | PSR_F) \
 		    : "cc" ); \
 		(expr); \
 		__asm __volatile( \
 		    "msr cpsr_fsxc, %0" \
 		    : /* no output */ \
 		    : "r" (cpsr_save) \
 		    : "cc" ); \
 	} while(0)
 uint32_t
 dtrace_cas32(uint32_t *target, uint32_t cmp, uint32_t new)
 {
 	return atomic_cmpset_32((uint32_t*)target, (uint32_t)cmp, (uint32_t)new);
 }
 void *
 dtrace_casptr(volatile void *target, volatile void *cmp, volatile void *new)
 {
 	return (void*)dtrace_cas32((uint32_t*)target, (uint32_t)cmp, (uint32_t)new);
 }
Index: projects/ifnet/sys/conf/NOTES
===================================================================
--- projects/ifnet/sys/conf/NOTES	(revision 279031)
+++ projects/ifnet/sys/conf/NOTES	(revision 279032)
@@ -1,2985 +1,2985 @@
 # $FreeBSD$
 #
 # NOTES -- Lines that can be cut/pasted into kernel and hints configs.
 #
 # Lines that begin with 'device', 'options', 'machine', 'ident', 'maxusers',
 # 'makeoptions', 'hints', etc. go into the kernel configuration that you
 # run config(8) with.
 #
 # Lines that begin with 'hint.' are NOT for config(8), they go into your
 # hints file. See /boot/device.hints and/or the 'hints' config(8) directive.
 #
 # Please use ``make LINT'' to create an old-style LINT file if you want to
 # do kernel test-builds.
 #
 # This file contains machine independent kernel configuration notes. For
 # machine dependent notes, look in /sys//conf/NOTES.
 #
 #
 # NOTES conventions and style guide:
 #
 # Large block comments should begin and end with a line containing only a
 # comment character.
 #
 # To describe a particular object, a block comment (if it exists) should
 # come first. Next should come device, options, and hints lines in that
 # order. All device and option lines must be described by a comment that
 # doesn't just expand the device or option name. Use only a concise
 # comment on the same line if possible. Very detailed descriptions of
 # devices and subsystems belong in man pages.
 #
 # A space followed by a tab separates 'options' from an option name. Two
 # spaces followed by a tab separate 'device' from a device name. Comments
 # after an option or device should use one space after the comment character.
 # To comment out a negative option that disables code and thus should not be
 # enabled for LINT builds, precede 'options' with "#!".
 #
 #
 # This is the ``identification'' of the kernel. Usually this should
 # be the same as the name of your kernel.
 #
 ident LINT
 #
 # The `maxusers' parameter controls the static sizing of a number of
 # internal system tables by a formula defined in subr_param.c.
 # Omitting this parameter or setting it to 0 will cause the system to
 # auto-size based on physical memory.
 #
 maxusers 10
 # To statically compile in device wiring instead of /boot/device.hints
 #hints "LINT.hints"	# Default places to look for devices.
 # Use the following to compile in values accessible to the kernel
 # through getenv() (or kenv(1) in userland). The format of the file
 # is 'variable=value', see kenv(1)
 #
 #env "LINT.env"
 #
 # The `makeoptions' parameter allows variables to be passed to the
 # generated Makefile in the build area.
 #
 # CONF_CFLAGS gives some extra compiler flags that are added to ${CFLAGS}
 # after most other flags. Here we use it to inhibit use of non-optimal
 # gcc built-in functions (e.g., memcmp).
 #
 # DEBUG happens to be magic.
 # The following is equivalent to 'config -g KERNELNAME' and creates
 # 'kernel.debug' compiled with -g debugging as well as a normal
 # 'kernel'. Use 'make install.debug' to install the debug kernel
 # but that isn't normally necessary as the debug symbols are not loaded
 # by the kernel and are not useful there anyway.
 #
 # KERNEL can be overridden so that you can change the default name of your
 # kernel.
 #
 # MODULES_OVERRIDE can be used to limit modules built to a specific list.
 #
 makeoptions CONF_CFLAGS=-fno-builtin	#Don't allow use of memcmp, etc.
 #makeoptions DEBUG=-g	#Build kernel with gdb(1) debug symbols
 #makeoptions KERNEL=foo	#Build kernel "foo" and install "/foo"
 # Only build ext2fs module plus those parts of the sound system I need.
 #makeoptions MODULES_OVERRIDE="ext2fs sound/sound sound/driver/maestro3"
 makeoptions DESTDIR=/tmp
 #
 # FreeBSD processes are subject to certain limits to their consumption
 # of system resources. See getrlimit(2) for more details. Each
 # resource limit has two values, a "soft" limit and a "hard" limit.
 # The soft limits can be modified during normal system operation, but
 # the hard limits are set at boot time. Their default values are
 # in sys//include/vmparam.h. There are two ways to change them:
 #
 # 1. Set the values at kernel build time. The options below are one
 #    way to allow that limit to grow to 1GB. They can be increased
 #    further by changing the parameters:
 #
 # 2. In /boot/loader.conf, set the tunables kern.maxswzone,
 #    kern.maxbcache, kern.maxtsiz, kern.dfldsiz, kern.maxdsiz,
 #    kern.dflssiz, kern.maxssiz and kern.sgrowsiz.
 #
 # The options in /boot/loader.conf override anything in the kernel
 # configuration file. See the function init_param1 in
 # sys/kern/subr_param.c for more details.
 #
 options 	MAXDSIZ=(1024UL*1024*1024)
 options 	MAXSSIZ=(128UL*1024*1024)
 options 	DFLDSIZ=(1024UL*1024*1024)
 #
 # BLKDEV_IOSIZE sets the default block size used in user block
 # device I/O. Note that this value will be overridden by the label
 # when specifying a block device from a label with a non-0
 # partition blocksize. The default is PAGE_SIZE.
 #
 options 	BLKDEV_IOSIZE=8192
 #
 # MAXPHYS and DFLTPHYS
 #
 # These are the maximal and safe 'raw' I/O block device access sizes.
 # Reads and writes will be split into MAXPHYS chunks for known good
 # devices and DFLTPHYS for the rest. Some applications have better
 # performance with larger raw I/O access sizes. Note that certain VM
 # parameters are derived from these values and making them too large
 # can make an unbootable kernel.
 #
 # The defaults are 64K and 128K respectively.
 options 	DFLTPHYS=(64*1024)
 options 	MAXPHYS=(128*1024)
 # This allows you to actually store this configuration file into
 # the kernel binary itself. See config(8) for more details.
 #
 options 	INCLUDE_CONFIG_FILE	# Include this file in kernel
 #
 # Compile-time defaults for various boot parameters
 #
 options 	BOOTVERBOSE=1
 options 	BOOTHOWTO=RB_MULTIPLE
 options 	GEOM_AES	# Don't use, use GEOM_BDE
 options 	GEOM_BDE	# Disk encryption.
 options 	GEOM_BSD	# BSD disklabels
 options 	GEOM_CACHE	# Disk cache.
 options 	GEOM_CONCAT	# Disk concatenation.
 options 	GEOM_ELI	# Disk encryption.
 options 	GEOM_FOX	# Redundant path mitigation
 options 	GEOM_GATE	# Userland services.
 options 	GEOM_JOURNAL	# Journaling.
 options 	GEOM_LABEL	# Providers labelization.
 options 	GEOM_LINUX_LVM	# Linux LVM2 volumes
 options 	GEOM_MBR	# DOS/MBR partitioning
 options 	GEOM_MIRROR	# Disk mirroring.
 options 	GEOM_MULTIPATH	# Disk multipath
 options 	GEOM_NOP	# Test class.
options GEOM_PART_APM # Apple partitioning options GEOM_PART_BSD # BSD disklabel options GEOM_PART_BSD64 # BSD disklabel64 options GEOM_PART_EBR # Extended Boot Records options GEOM_PART_EBR_COMPAT # Backward compatible partition names options GEOM_PART_GPT # GPT partitioning options GEOM_PART_LDM # Logical Disk Manager options GEOM_PART_MBR # MBR partitioning options GEOM_PART_PC98 # PC-9800 disk partitioning options GEOM_PART_VTOC8 # SMI VTOC8 disk label options GEOM_PC98 # NEC PC9800 partitioning options GEOM_RAID # Soft RAID functionality. options GEOM_RAID3 # RAID3 functionality. options GEOM_SHSEC # Shared secret. options GEOM_STRIPE # Disk striping. options GEOM_SUNLABEL # Sun/Solaris partitioning options GEOM_UZIP # Read-only compressed disks options GEOM_VINUM # Vinum logical volume manager options GEOM_VIRSTOR # Virtual storage. options GEOM_VOL # Volume names from UFS superblock options GEOM_ZERO # Performance testing helper. # # The root device and filesystem type can be compiled in; # this provides a fallback option if the root device cannot # be correctly guessed by the bootstrap code, or an override if # the RB_DFLTROOT flag (-r) is specified when booting the kernel. # options ROOTDEVNAME=\"ufs:da0s2e\" ##################################################################### # Scheduler options: # # Specifying one of SCHED_4BSD or SCHED_ULE is mandatory. These options # select which scheduler is compiled in. # # SCHED_4BSD is the historical, proven, BSD scheduler. It has a global run # queue and no CPU affinity which makes it suboptimal for SMP. It has very # good interactivity and priority selection. # # SCHED_ULE provides significant performance advantages over 4BSD on many # workloads on SMP machines. It supports cpu-affinity, per-cpu runqueues # and scheduler locks. It also has a stronger notion of interactivity # which leads to better responsiveness even on uniprocessor machines. This # is the default scheduler. 
# # SCHED_STATS is a debugging option which keeps some stats in the sysctl # tree at 'kern.sched.stats' and is useful for debugging scheduling decisions. # options SCHED_4BSD options SCHED_STATS #options SCHED_ULE ##################################################################### # SMP OPTIONS: # # SMP enables building of a Symmetric MultiProcessor Kernel. # Mandatory: options SMP # Symmetric MultiProcessor Kernel # MAXCPU defines the maximum number of CPUs that can boot in the system. # A default value should be already present, for every architecture. options MAXCPU=32 # MAXMEMDOM defines the maximum number of memory domains that can boot in the # system. A default value should already be defined by every architecture. options MAXMEMDOM=1 # ADAPTIVE_MUTEXES changes the behavior of blocking mutexes to spin # if the thread that currently owns the mutex is executing on another # CPU. This behavior is enabled by default, so this option can be used # to disable it. options NO_ADAPTIVE_MUTEXES # ADAPTIVE_RWLOCKS changes the behavior of reader/writer locks to spin # if the thread that currently owns the rwlock is executing on another # CPU. This behavior is enabled by default, so this option can be used # to disable it. options NO_ADAPTIVE_RWLOCKS # ADAPTIVE_SX changes the behavior of sx locks to spin if the thread that # currently owns the sx lock is executing on another CPU. # This behavior is enabled by default, so this option can be used to # disable it. options NO_ADAPTIVE_SX # MUTEX_NOINLINE forces mutex operations to call functions to perform each # operation rather than inlining the simple cases. This can be used to # shrink the size of the kernel text segment. Note that this behavior is # already implied by the INVARIANT_SUPPORT, INVARIANTS, KTR, LOCK_PROFILING, # and WITNESS options. options MUTEX_NOINLINE # RWLOCK_NOINLINE forces rwlock operations to call functions to perform each # operation rather than inlining the simple cases. 
This can be used to # shrink the size of the kernel text segment. Note that this behavior is # already implied by the INVARIANT_SUPPORT, INVARIANTS, KTR, LOCK_PROFILING, # and WITNESS options. options RWLOCK_NOINLINE # SX_NOINLINE forces sx lock operations to call functions to perform each # operation rather than inlining the simple cases. This can be used to # shrink the size of the kernel text segment. Note that this behavior is # already implied by the INVARIANT_SUPPORT, INVARIANTS, KTR, LOCK_PROFILING, # and WITNESS options. options SX_NOINLINE # SMP Debugging Options: # # CALLOUT_PROFILING enables rudimentary profiling of the callwheel data # structure used as backend in callout(9). # PREEMPTION allows the threads that are in the kernel to be preempted by # higher priority [interrupt] threads. It helps with interactivity # and allows interrupt threads to run sooner rather than waiting. # WARNING! Only tested on amd64 and i386. # FULL_PREEMPTION instructs the kernel to preempt non-realtime kernel # threads. Its sole use is to expose race conditions and other # bugs during development. Enabling this option will reduce # performance and increase the frequency of kernel panics by # design. If you aren't sure that you need it then you don't. # Relies on the PREEMPTION option. DON'T TURN THIS ON. # MUTEX_DEBUG enables various extra assertions in the mutex code. # SLEEPQUEUE_PROFILING enables rudimentary profiling of the hash table # used to hold active sleep queues as well as sleep wait message # frequency. # TURNSTILE_PROFILING enables rudimentary profiling of the hash table # used to hold active lock queues. # UMTX_PROFILING enables rudimentary profiling of the hash table used to hold active lock queues. # WITNESS enables the witness code which detects deadlocks and cycles # during locking operations. # WITNESS_KDB causes the witness code to drop into the kernel debugger if # a lock hierarchy violation occurs or if locks are held when going to # sleep. 
# WITNESS_SKIPSPIN disables the witness checks on spin mutexes. options PREEMPTION options FULL_PREEMPTION options MUTEX_DEBUG options WITNESS options WITNESS_KDB options WITNESS_SKIPSPIN # LOCK_PROFILING - Profiling locks. See LOCK_PROFILING(9) for details. options LOCK_PROFILING # Set the number of buffers and the hash size. The hash size MUST be larger # than the number of buffers. Hash size should be prime. options MPROF_BUFFERS="1536" options MPROF_HASH_SIZE="1543" # Profiling for the callout(9) backend. options CALLOUT_PROFILING # Profiling for internal hash tables. options SLEEPQUEUE_PROFILING options TURNSTILE_PROFILING options UMTX_PROFILING ##################################################################### # COMPATIBILITY OPTIONS # # Implement system calls compatible with 4.3BSD and older versions of # FreeBSD. You probably do NOT want to remove this as much current code # still relies on the 4.3 emulation. Note that some architectures that # are supported by FreeBSD do not include support for certain important # aspects of this compatibility option, namely those related to the # signal delivery mechanism. # options COMPAT_43 # Old tty interface. options COMPAT_43TTY # Note that as a general rule, COMPAT_FREEBSD depends on # COMPAT_FREEBSD, COMPAT_FREEBSD, etc. # Enable FreeBSD4 compatibility syscalls options COMPAT_FREEBSD4 # Enable FreeBSD5 compatibility syscalls options COMPAT_FREEBSD5 # Enable FreeBSD6 compatibility syscalls options COMPAT_FREEBSD6 # Enable FreeBSD7 compatibility syscalls options COMPAT_FREEBSD7 # Enable FreeBSD9 compatibility syscalls options COMPAT_FREEBSD9 # Enable FreeBSD10 compatibility syscalls options COMPAT_FREEBSD10 # # These three options provide support for System V Interface # Definition-style interprocess communication, in the form of shared # memory, semaphores, and message queues, respectively. 
# options SYSVSHM options SYSVSEM options SYSVMSG ##################################################################### # DEBUGGING OPTIONS # # Compile with kernel debugger related code. # options KDB # # Print a stack trace of the current thread on the console for a panic. # options KDB_TRACE # # Don't enter the debugger for a panic. Intended for unattended operation # where you may want to enter the debugger from the console, but still want # the machine to recover from a panic. # options KDB_UNATTENDED # # Enable the ddb debugger backend. # options DDB # # Print the numerical value of symbols in addition to the symbolic # representation. # options DDB_NUMSYM # # Enable the remote gdb debugger backend. # options GDB # # SYSCTL_DEBUG enables a 'sysctl' debug tree that can be used to dump the # contents of the registered sysctl nodes on the console. It is disabled by # default because it generates excessively verbose console output that can # interfere with serial console operation. # options SYSCTL_DEBUG # # Enable textdump by default, this disables kernel core dumps. # options TEXTDUMP_PREFERRED # # Enable extra debug messages while performing textdumps. # options TEXTDUMP_VERBOSE # # NO_SYSCTL_DESCR omits the sysctl node descriptions to save space in the # resulting kernel. options NO_SYSCTL_DESCR # # MALLOC_DEBUG_MAXZONES enables multiple uma zones for malloc(9) # allocations that are smaller than a page. The purpose is to isolate # different malloc types into hash classes, so that any buffer # overruns or use-after-free will usually only affect memory from # malloc types in that hash class. This is purely a debugging tool; # by varying the hash function and tracking which hash class was # corrupted, the intersection of the hash classes from each instance # will point to a single malloc type that is being misused. At this # point inspection or memguard(9) can be used to catch the offending # code. 
# options MALLOC_DEBUG_MAXZONES=8 # # DEBUG_MEMGUARD builds and enables memguard(9), a replacement allocator # for the kernel used to detect modify-after-free scenarios. See the # memguard(9) man page for more information on usage. # options DEBUG_MEMGUARD # # DEBUG_REDZONE enables buffer underflows and buffer overflows detection for # malloc(9). # options DEBUG_REDZONE # # EARLY_PRINTF enables support for calling a special printf (eprintf) # very early in the kernel (before cn_init() has been called). This # should only be used for debugging purposes early in boot. Normally, # it is not defined. It is commented out here because this feature # isn't generally available. And the required eputc() isn't defined. # #options EARLY_PRINTF # # KTRACE enables the system-call tracing facility ktrace(2). To be more # SMP-friendly, KTRACE uses a worker thread to process most trace events # asynchronously to the thread generating the event. This requires a # pre-allocated store of objects representing trace events. The # KTRACE_REQUEST_POOL option specifies the initial size of this store. # The size of the pool can be adjusted both at boottime and runtime via # the kern.ktrace_request_pool tunable and sysctl. # options KTRACE #kernel tracing options KTRACE_REQUEST_POOL=101 # # KTR is a kernel tracing facility imported from BSD/OS. It is # enabled with the KTR option. KTR_ENTRIES defines the number of # entries in the circular trace buffer; it may be an arbitrary number. # KTR_BOOT_ENTRIES defines the number of entries during the early boot, # before malloc(9) is functional. # KTR_COMPILE defines the mask of events to compile into the kernel as # defined by the KTR_* constants in . KTR_MASK defines the # initial value of the ktr_mask variable which determines at runtime # what events to trace. KTR_CPUMASK determines which CPU's log # events, with bit X corresponding to CPU X. 
The layout of the string # passed as KTR_CPUMASK must match a series of bitmasks each of them # separated by the "," character (ie: # KTR_CPUMASK=0xAF,0xFFFFFFFFFFFFFFFF). KTR_VERBOSE enables # dumping of KTR events to the console by default. This functionality # can be toggled via the debug.ktr_verbose sysctl and defaults to off # if KTR_VERBOSE is not defined. See ktr(4) and ktrdump(8) for details. # options KTR options KTR_BOOT_ENTRIES=1024 options KTR_ENTRIES=(128*1024) -options KTR_COMPILE=(KTR_INTR|KTR_PROC) +options KTR_COMPILE=(KTR_ALL) options KTR_MASK=KTR_INTR options KTR_CPUMASK=0x3 options KTR_VERBOSE # # ALQ(9) is a facility for the asynchronous queuing of records from the kernel # to a vnode, and is employed by services such as ktr(4) to produce trace # files based on a kernel event stream. Records are written asynchronously # in a worker thread. # options ALQ options KTR_ALQ # # The INVARIANTS option is used in a number of source files to enable # extra sanity checking of internal structures. This support is not # enabled by default because of the extra time it would take to check # for these conditions, which can only occur as a result of # programming errors. # options INVARIANTS # # The INVARIANT_SUPPORT option makes us compile in support for # verifying some of the internal structures. It is a prerequisite for # 'INVARIANTS', as enabling 'INVARIANTS' will make these functions be # called. The intent is that you can set 'INVARIANTS' for single # source files (by changing the source file or specifying it on the # command line) if you have 'INVARIANT_SUPPORT' enabled. Also, if you # wish to build a kernel module with 'INVARIANTS', then adding # 'INVARIANT_SUPPORT' to your kernel will provide all the necessary # infrastructure without the added overhead. # options INVARIANT_SUPPORT # # The DIAGNOSTIC option is used to enable extra debugging information # from some parts of the kernel. As this makes everything more noisy, # it is disabled by default. 
# options DIAGNOSTIC # # REGRESSION causes optional kernel interfaces necessary only for regression # testing to be enabled. These interfaces may constitute security risks # when enabled, as they permit processes to easily modify aspects of the # run-time environment to reproduce unlikely or unusual (possibly normally # impossible) scenarios. # options REGRESSION # # This option lets some drivers co-exist that can't co-exist in a running # system. This is used to be able to compile all kernel code in one go for # quality assurance purposes (like this file, which the option takes its name # from.) # options COMPILING_LINT # # STACK enables the stack(9) facility, allowing the capture of kernel stack # for the purpose of procinfo(1), etc. stack(9) will also be compiled in # automatically if DDB(4) is compiled into the kernel. # options STACK ##################################################################### # PERFORMANCE MONITORING OPTIONS # # The hwpmc driver allows the use of in-CPU performance monitoring # counters for performance monitoring. The base kernel needs to be configured # with the 'options' line, while the hwpmc device can be either compiled # in or loaded as a loadable kernel module. # # Additional configuration options may be required on specific architectures; # please see hwpmc(4). device hwpmc # Driver (also a loadable module) options HWPMC_HOOKS # Other necessary kernel hooks ##################################################################### # NETWORKING OPTIONS # # Protocol families # options INET #Internet communications protocols options INET6 #IPv6 communications protocols options ROUTETABLES=2 # allocated fibs, up to 65536; the default is 1. # Allocating the maximum would be a bad idea as they are large. options TCP_OFFLOAD # TCP offload support.
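# A minimal sketch of the protocol-family options above, for a dual-stack
# kernel that wants a few extra routing tables. The count of 4 is only an
# illustrative assumption; any value up to 65536 is accepted, but each
# table is large, so a small number is the safer choice.

```
options 	INET
options 	INET6
options 	ROUTETABLES=4	# hypothetical count; the default is 1
```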
# In order to enable IPSEC you MUST also add device crypto to # your kernel configuration options IPSEC #IP security (requires device crypto) #options IPSEC_DEBUG #debug for IP security # # #DEPRECATED# # Set IPSEC_FILTERTUNNEL to change the default of the sysctl to force packets # coming through a tunnel to be processed by any configured packet filtering # twice. The default is that packets coming out of a tunnel are _not_ processed; # they are assumed trusted. # # IPSEC history is preserved for such packets, and can be filtered # using ipfw(8)'s 'ipsec' keyword, when this option is enabled. # #options IPSEC_FILTERTUNNEL #filter ipsec packets from a tunnel # # Set IPSEC_NAT_T to enable NAT-Traversal support. This enables # optional UDP encapsulation of ESP packets. # options IPSEC_NAT_T #NAT-T support, UDP encap of ESP # # SMB/CIFS requester # NETSMB enables support for SMB protocol, it requires LIBMCHAIN and LIBICONV # options. options NETSMB #SMB/CIFS requester # mchain library. It can be either loaded as KLD or compiled into kernel options LIBMCHAIN # libalias library, performing NAT options LIBALIAS # flowtable cache options FLOWTABLE # # SCTP is a NEW transport protocol defined by # RFC2960 updated by RFC3309 and RFC3758.. and # soon to have a new base RFC and many many more # extensions. This release supports all the extensions # including many drafts (most about to become RFC's). # It is the reference implementation of SCTP # and is quite well tested. # # Note YOU MUST have both INET and INET6 defined. # You don't have to enable V6, but SCTP is # dual stacked and so far we have not torn apart # the V6 and V4.. since an association can span # both a V6 and V4 address at the SAME time :-) # options SCTP # There are bunches of options: # this one turns on all sorts of # nastily printing that you can # do. It's all controlled by a # bit mask (settable by socket opt and # by sysctl). Including will not cause # logging until you set the bits.. 
but it # can be quite verbose.. so without this # option we don't do any of the tests for # bits and prints.. which makes the code run # faster.. if you are not debugging don't use. options SCTP_DEBUG # # This option turns off the CRC32c checksum. Basically, # you will not be able to talk to anyone else who # has not done this. It's more for experimentation to # see how much CPU the CRC32c really takes. Most new # cards for TCP support checksum offload.. so this # option gives you a "view" into what SCTP would be # like with such an offload (which only exists in # high-end iSCSI boards so far). With the new # splitting 8's algorithm it's not as bad as it used # to be.. but it does speed things up. Try it only # in an isolated lab environment :-) options SCTP_WITH_NO_CSUM # # # All the options after that turn on specific types of # logging. You can monitor CWND growth, flight size # and all sorts of things. Go look at the code and # see. I have used this to produce interesting # charts and graphs as well :-> # # I have not yet committed the tools to get and print # the logs; I will do that eventually .. before then # if you want them send me an email rrs@freebsd.org # You basically must have ktr(4) enabled for these # and you then set the sysctl to turn on/off various # logging bits. Use ktrdump(8) to pull the log and run # it through a display program.. and graphs and other # things too. # options SCTP_LOCK_LOGGING options SCTP_MBUF_LOGGING options SCTP_MBCNT_LOGGING options SCTP_PACKET_LOGGING options SCTP_LTRACE_CHUNKS options SCTP_LTRACE_ERRORS # altq(9). Enable the base part of the hooks with the ALTQ option. # Individual disciplines must be built into the base system and cannot be # loaded as modules at this point. ALTQ requires a stable TSC so if yours is # broken or changes with CPU throttling then you must also have the ALTQ_NOPCC # option.
options ALTQ options ALTQ_CBQ # Class Based Queueing options ALTQ_RED # Random Early Detection options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler options ALTQ_CDNR # Traffic conditioner options ALTQ_PRIQ # Priority Queueing options ALTQ_NOPCC # Required if the TSC is unusable options ALTQ_DEBUG # netgraph(4). Enable the base netgraph code with the NETGRAPH option. # Individual node types can be enabled with the corresponding option # listed below; however, this is not strictly necessary as netgraph # will automatically load the corresponding KLD module if the node type # is not already compiled into the kernel. Each type below has a # corresponding man page, e.g., ng_async(8). options NETGRAPH # netgraph(4) system options NETGRAPH_DEBUG # enable extra debugging, this # affects netgraph(4) and nodes # Node types options NETGRAPH_ASYNC options NETGRAPH_ATMLLC options NETGRAPH_ATM_ATMPIF options NETGRAPH_BLUETOOTH # ng_bluetooth(4) options NETGRAPH_BLUETOOTH_BT3C # ng_bt3c(4) options NETGRAPH_BLUETOOTH_HCI # ng_hci(4) options NETGRAPH_BLUETOOTH_L2CAP # ng_l2cap(4) options NETGRAPH_BLUETOOTH_SOCKET # ng_btsocket(4) options NETGRAPH_BLUETOOTH_UBT # ng_ubt(4) options NETGRAPH_BLUETOOTH_UBTBCMFW # ubtbcmfw(4) options NETGRAPH_BPF options NETGRAPH_BRIDGE options NETGRAPH_CAR options NETGRAPH_CISCO options NETGRAPH_DEFLATE options NETGRAPH_DEVICE options NETGRAPH_ECHO options NETGRAPH_EIFACE options NETGRAPH_ETHER options NETGRAPH_FRAME_RELAY options NETGRAPH_GIF options NETGRAPH_GIF_DEMUX options NETGRAPH_HOLE options NETGRAPH_IFACE options NETGRAPH_IP_INPUT options NETGRAPH_IPFW options NETGRAPH_KSOCKET options NETGRAPH_L2TP options NETGRAPH_LMI # MPPC compression requires proprietary files (not included) #options NETGRAPH_MPPC_COMPRESSION options NETGRAPH_MPPC_ENCRYPTION options NETGRAPH_NETFLOW options NETGRAPH_NAT options NETGRAPH_ONE2MANY options NETGRAPH_PATCH options NETGRAPH_PIPE options NETGRAPH_PPP options NETGRAPH_PPPOE options 
NETGRAPH_PPTPGRE options NETGRAPH_PRED1 options NETGRAPH_RFC1490 options NETGRAPH_SOCKET options NETGRAPH_SPLIT options NETGRAPH_SPPP options NETGRAPH_TAG options NETGRAPH_TCPMSS options NETGRAPH_TEE options NETGRAPH_UI options NETGRAPH_VJC options NETGRAPH_VLAN # NgATM - Netgraph ATM options NGATM_ATM options NGATM_ATMBASE options NGATM_SSCOP options NGATM_SSCFU options NGATM_UNI options NGATM_CCATM device mn # Munich32x/Falc54 Nx64kbit/sec cards. # Network stack virtualization. #options VIMAGE #options VNET_DEBUG # debug for VIMAGE # # Network interfaces: # The `loop' device is MANDATORY when networking is enabled. device loop # The `ether' device provides generic code to handle # Ethernets; it is MANDATORY when an Ethernet device driver is # configured or token-ring is enabled. device ether # The `vlan' device implements the VLAN tagging of Ethernet frames # according to IEEE 802.1Q. device vlan # The `vxlan' device implements the VXLAN encapsulation of Ethernet # frames in UDP packets according to RFC7348. device vxlan # The `wlan' device provides generic code to support 802.11 # drivers, including host AP mode; it is MANDATORY for the wi, # and ath drivers and will eventually be required by all 802.11 drivers. device wlan options IEEE80211_DEBUG #enable debugging msgs options IEEE80211_AMPDU_AGE #age frames in AMPDU reorder q's options IEEE80211_SUPPORT_MESH #enable 802.11s D3.0 support options IEEE80211_SUPPORT_TDMA #enable TDMA support # The `wlan_wep', `wlan_tkip', and `wlan_ccmp' devices provide # support for WEP, TKIP, and AES-CCMP crypto protocols optionally # used with 802.11 devices that depend on the `wlan' module. device wlan_wep device wlan_ccmp device wlan_tkip # The `wlan_xauth' device provides support for external (i.e. user-mode) # authenticators for use with 802.11 drivers that use the `wlan' # module and support 802.1x and/or WPA security protocols. 
device wlan_xauth # The `wlan_acl' device provides a MAC-based access control mechanism # for use with 802.11 drivers operating in ap mode and using the # `wlan' module. # The 'wlan_amrr' device provides AMRR transmit rate control algorithm device wlan_acl device wlan_amrr # Generic TokenRing device token # The `fddi' device provides generic code to support FDDI. device fddi # The `arcnet' device provides generic code to support Arcnet. device arcnet # The `sppp' device serves a similar role for certain types # of synchronous PPP links (like `cx', `ar'). device sppp # The `bpf' device enables the Berkeley Packet Filter. Be # aware of the legal and administrative consequences of enabling this # option. DHCP requires bpf. device bpf # The `netmap' device implements memory-mapped access to network # devices from userspace, enabling wire-speed packet capture and # generation even at 10Gbit/s. Requires support in the device # driver. Supported drivers are ixgbe, e1000, re. device netmap # The `disc' device implements a minimal network interface, # which throws away all packets sent and never receives any. It is # included for testing and benchmarking purposes. device disc # The `epair' device implements a virtual back-to-back connected Ethernet # like interface pair. device epair # The `edsc' device implements a minimal Ethernet interface, # which discards all packets sent and receives none. device edsc # The `tap' device is a pty-like virtual Ethernet interface device tap # The `tun' device implements (user-)ppp and nos-tun(8) device tun # The `gif' device implements IPv6 over IP4 tunneling, # IPv4 over IPv6 tunneling, IPv4 over IPv4 tunneling and # IPv6 over IPv6 tunneling. # The `gre' device implements GRE (Generic Routing Encapsulation) tunneling, # as specified in the RFC 2784 and RFC 2890. # The `me' device implements Minimal Encapsulation within IPv4 as # specified in the RFC 2004. 
# The XBONEHACK option allows the same pair of addresses to be configured on # multiple gif interfaces. device gif device gre device me options XBONEHACK # The `stf' device implements 6to4 encapsulation. device stf # The pf packet filter consists of three devices: # The `pf' device provides /dev/pf and the firewall code itself. # The `pflog' device provides the pflog0 interface which logs packets. # The `pfsync' device provides the pfsync0 interface used for # synchronization of firewall state tables (over the net). device pf device pflog device pfsync # Bridge interface. device if_bridge # Common Address Redundancy Protocol. See carp(4) for more details. device carp # IPsec interface. device enc # Link aggregation interface. device lagg # # Internet family options: # # MROUTING enables the kernel multicast packet forwarder, which works # with mrouted and XORP. # # IPFIREWALL enables support for IP firewall construction, in # conjunction with the `ipfw' program. IPFIREWALL_VERBOSE sends # logged packets to the system logger. IPFIREWALL_VERBOSE_LIMIT # limits the number of times a matching entry can be logged. # # WARNING: IPFIREWALL defaults to a policy of "deny ip from any to any" # and if you do not add other rules during startup to allow access, # YOU WILL LOCK YOURSELF OUT. It is suggested that you set firewall_type=open # in /etc/rc.conf when first enabling this feature, then refining the # firewall rules in /etc/rc.firewall after you've tested that the new kernel # feature works properly. # # IPFIREWALL_DEFAULT_TO_ACCEPT causes the default rule (at boot) to # allow everything. Use with care, if a cracker can crash your # firewall machine, they can get to your protected machines. However, # if you are using it as an as-needed filter for specific problems as # they arise, then this may be for you. Changing the default to 'allow' # means that you won't get stuck if the kernel and /sbin/ipfw binary get # out of sync. 
# # IPDIVERT enables the divert IP sockets, used by ``ipfw divert''. It # depends on IPFIREWALL if compiled into the kernel. # # IPFIREWALL_NAT adds support for in kernel nat in ipfw, and it requires # LIBALIAS. # # IPSTEALTH enables code to support stealth forwarding (i.e., forwarding # packets without touching the TTL). This can be useful to hide firewalls # from traceroute and similar tools. # # PF_DEFAULT_TO_DROP causes the default pf(4) rule to deny everything. # # TCPDEBUG enables code which keeps traces of the TCP state machine # for sockets with the SO_DEBUG option set, which can then be examined # using the trpt(8) utility. # # RADIX_MPATH provides support for equal-cost multi-path routing. # options MROUTING # Multicast routing options IPFIREWALL #firewall options IPFIREWALL_VERBOSE #enable logging to syslogd(8) options IPFIREWALL_VERBOSE_LIMIT=100 #limit verbosity options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default options IPFIREWALL_NAT #ipfw kernel nat support options IPDIVERT #divert sockets options IPFILTER #ipfilter support options IPFILTER_LOG #ipfilter logging options IPFILTER_LOOKUP #ipfilter pools options IPFILTER_DEFAULT_BLOCK #block all packets by default options IPSTEALTH #support for stealth forwarding options PF_DEFAULT_TO_DROP #drop everything by default options TCPDEBUG options RADIX_MPATH # The MBUF_STRESS_TEST option enables options which create # various random failures / extreme cases related to mbuf # functions. See mbuf(9) for a list of available test cases. # MBUF_PROFILING enables code to profile the mbuf chains # exiting the system (via participating interfaces) and # return a logarithmic histogram of monitored parameters # (e.g. packet size, wasted space, number of mbufs in chain). options MBUF_STRESS_TEST options MBUF_PROFILING # Statically link in accept filters options ACCEPT_FILTER_DATA options ACCEPT_FILTER_DNS options ACCEPT_FILTER_HTTP # TCP_SIGNATURE adds support for RFC 2385 (TCP-MD5) digests. 
These are # carried in TCP option 19. This option is commonly used to protect # TCP sessions (e.g. BGP) where IPSEC is neither available nor desirable. # This is enabled on a per-socket basis using the TCP_MD5SIG socket option. # This requires the use of 'device crypto', 'options IPSEC' # or 'device cryptodev'. options TCP_SIGNATURE #include support for RFC 2385 # DUMMYNET enables the "dummynet" bandwidth limiter. You need IPFIREWALL # as well. See dummynet(4) and ipfw(8) for more info. When you run # DUMMYNET it is advisable to also have at least "options HZ=1000" to achieve # smooth scheduling of the traffic. options DUMMYNET ##################################################################### # FILESYSTEM OPTIONS # # Only the root filesystem needs to be statically compiled or preloaded # as a module; everything else will be automatically loaded at mount # time. Some people still prefer to statically compile other # filesystems as well. # # NB: The UNION filesystem was known to be buggy in the past. It is now # being actively maintained, although there are still some issues being # resolved.
# # One of these is mandatory: options FFS #Fast filesystem options NFSCL #Network File System client # The rest are optional: options AUTOFS #Automounter filesystem options CD9660 #ISO 9660 filesystem options FDESCFS #File descriptor filesystem options FUSE #FUSE support module options MSDOSFS #MS DOS File System (FAT, FAT32) options NFSLOCKD #Network Lock Manager options NFSD #Network Filesystem Server options KGSSAPI #Kernel GSSAPI implementation options NULLFS #NULL filesystem options PROCFS #Process filesystem (requires PSEUDOFS) options PSEUDOFS #Pseudo-filesystem framework options PSEUDOFS_TRACE #Debugging support for PSEUDOFS options SMBFS #SMB/CIFS filesystem options TMPFS #Efficient memory filesystem options UDF #Universal Disk Format options UNIONFS #Union filesystem # The xFS_ROOT options REQUIRE the associated ``options xFS'' options NFS_ROOT #NFS usable as root device # Soft updates is a technique for improving filesystem speed and # making abrupt shutdown less risky. # options SOFTUPDATES # Extended attributes allow additional data to be associated with files, # and is used for ACLs, Capabilities, and MAC labels. # See src/sys/ufs/ufs/README.extattr for more information. options UFS_EXTATTR options UFS_EXTATTR_AUTOSTART # Access Control List support for UFS filesystems. The current ACL # implementation requires extended attribute support, UFS_EXTATTR, # for the underlying filesystem. # See src/sys/ufs/ufs/README.acls for more information. options UFS_ACL # Directory hashing improves the speed of operations on very large # directories at the expense of some memory. options UFS_DIRHASH # Gjournal-based UFS journaling support. options UFS_GJOURNAL # Make space in the kernel for a root filesystem on a md device. # Define to the number of kilobytes to reserve for the filesystem. options MD_ROOT_SIZE=10 # Make the md device a potential root device, either with preloaded # images of type mfs_root or md_root. 
options MD_ROOT # Disk quotas are supported when this option is enabled. options QUOTA #enable disk quotas # If you are running a machine just as a fileserver for PC and MAC # users, using SAMBA, you may consider setting this option # and keeping all those users' directories on a filesystem that is # mounted with the suiddir option. This gives new files the same # ownership as the directory (similar to group). It's a security hole # if you let these users run programs, so confine it to file-servers # (but it'll save you lots of headaches in those cases). Root owned # directories are exempt and X bits are cleared. The suid bit must be # set on the directory as well; see chmod(1). PC owners can't see/set # ownerships so they keep getting their toes trodden on. This saves # you all the support calls as the filesystem it's used on will act as # they expect: "It's my dir so it must be my file". # options SUIDDIR # NFS options: options NFS_MINATTRTIMO=3 # VREG attrib cache timeout in sec options NFS_MAXATTRTIMO=60 options NFS_MINDIRATTRTIMO=30 # VDIR attrib cache timeout in sec options NFS_MAXDIRATTRTIMO=60 options NFS_DEBUG # Enable NFS Debugging # # Add support for the EXT2FS filesystem of Linux fame. Be a bit # careful with this - the ext2fs code has a tendency to lag behind # changes and not be exercised very much, so mounting read/write could # be dangerous (and even mounting read only could result in panics.) # options EXT2FS # # Add support for the ReiserFS filesystem (used in Linux). Currently, # this is limited to read-only access. # options REISERFS # Use real implementations of the aio_* system calls. There are numerous # stability and security issues in the current aio code that make it # unsuitable for inclusion on machines with untrusted local users. 
options VFS_AIO # Cryptographically secure random number generator; /dev/random device random # The system memory devices; /dev/mem, /dev/kmem device mem # The kernel symbol table device; /dev/ksyms device ksyms # Optional character code conversion support with LIBICONV. # Each option requires its base file system and LIBICONV. options CD9660_ICONV options MSDOSFS_ICONV options UDF_ICONV ##################################################################### # POSIX P1003.1B # Real time extensions added in the 1993 POSIX # _KPOSIX_PRIORITY_SCHEDULING: Build in _POSIX_PRIORITY_SCHEDULING options _KPOSIX_PRIORITY_SCHEDULING # p1003_1b_semaphores are very experimental; # users should be ready to assist in debugging if problems arise. options P1003_1B_SEMAPHORES # POSIX message queue options P1003_1B_MQUEUE ##################################################################### # SECURITY POLICY PARAMETERS # Support for BSM audit options AUDIT # Support for Mandatory Access Control (MAC): options MAC options MAC_BIBA options MAC_BSDEXTENDED options MAC_IFOFF options MAC_LOMAC options MAC_MLS options MAC_NONE options MAC_PARTITION options MAC_PORTACL options MAC_SEEOTHERUIDS options MAC_STUB options MAC_TEST # Support for Capsicum options CAPABILITIES # fine-grained rights on file descriptors options CAPABILITY_MODE # sandboxes with no global namespace access ##################################################################### # CLOCK OPTIONS # The granularity of operation is controlled by the kernel option HZ whose # default value (1000 on most architectures) means a granularity of 1ms # (1s/HZ). Historically, the default was 100, but finer granularity is # required for DUMMYNET and other systems on modern hardware.
There are # reasonable arguments that HZ should, in fact, still be 100; consider # that reducing the granularity too much might cause excessive overhead in # clock interrupt processing, potentially causing ticks to be missed and thus # actually reducing the accuracy of operation. options HZ=100 # Enable support for the kernel PLL to use an external PPS signal, # under supervision of [x]ntpd(8) # More info in ntpd documentation: http://www.eecis.udel.edu/~ntp options PPS_SYNC # Enable support for generic feed-forward clocks in the kernel. # The feed-forward clock support is an alternative to the feedback-oriented # ntpd/system clock approach, and is to be used with a feed-forward # synchronization algorithm such as the RADclock: # More info here: http://www.synclab.org/radclock options FFCLOCK ##################################################################### # SCSI DEVICES # SCSI DEVICE CONFIGURATION # The SCSI subsystem consists of the `base' SCSI code, a number of # high-level SCSI device `type' drivers, and the low-level host-adapter # device drivers. The host adapters are listed in the ISA and PCI # device configuration sections below. # # It is possible to wire down your SCSI devices so that a given bus, # target, and LUN always come on line as the same device unit. In # earlier versions the unit numbers were assigned in the order that # the devices were probed on the SCSI bus. This meant that if you # removed a disk drive, you may have had to rewrite your /etc/fstab # file, and also that you had to be careful when adding a new disk # as it may have been probed earlier and moved your device configuration # around. (See also option GEOM_VOL for a different solution to this # problem.) # This old behavior is maintained as the default behavior. The unit # assignment begins with the first non-wired down unit for a device # type. For example, if you wire a disk as "da3" then the first # non-wired disk will be assigned da4.
# The syntax for wiring down devices is: hint.scbus.0.at="ahc0" hint.scbus.1.at="ahc1" hint.scbus.1.bus="0" hint.scbus.3.at="ahc2" hint.scbus.3.bus="0" hint.scbus.2.at="ahc2" hint.scbus.2.bus="1" hint.da.0.at="scbus0" hint.da.0.target="0" hint.da.0.unit="0" hint.da.1.at="scbus3" hint.da.1.target="1" hint.da.2.at="scbus2" hint.da.2.target="3" hint.sa.1.at="scbus1" hint.sa.1.target="6" # "units" (SCSI logical unit number) that are not specified are # treated as if specified as LUN 0. # All SCSI devices allocate as many units as are required. # The ch driver drives SCSI Media Changer ("jukebox") devices. # # The da driver drives SCSI Direct Access ("disk") and Optical Media # ("WORM") devices. # # The sa driver drives SCSI Sequential Access ("tape") devices. # # The cd driver drives SCSI Read Only Direct Access ("cd") devices. # # The ses driver drives SCSI Environment Services ("ses") and # SAF-TE ("SCSI Accessible Fault-Tolerant Enclosure") devices. # # The pt driver drives SCSI Processor devices. # # The sg driver provides a passthrough API that is compatible with the # Linux SG driver. It will work in conjunction with the COMPAT_LINUX # option to run linux SG apps. It can also stand on its own and provide # source level API compatibility for porting apps to FreeBSD. # # Target Mode support is provided here but also requires that a SIM # (SCSI Host Adapter Driver) provide support as well. # # The targ driver provides target mode support as a Processor type device. # It exists to give the minimal context necessary to respond to Inquiry # commands. There is a sample user application that shows how the rest # of the command support might be done in /usr/share/examples/scsi_target. # # The targbh driver provides target mode support and exists to respond # to incoming commands that do not otherwise have a logical unit assigned # to them. # # The pass driver provides a passthrough API to access the CAM subsystem. 
device scbus #base SCSI code device ch #SCSI media changers device da #SCSI direct access devices (aka disks) device sa #SCSI tapes device cd #SCSI CD-ROMs device ses #Enclosure Services (SES and SAF-TE) device pt #SCSI processor device targ #SCSI Target Mode Code device targbh #SCSI Target Mode Blackhole Device device pass #CAM passthrough driver device sg #Linux SCSI passthrough device ctl #CAM Target Layer # CAM OPTIONS: # debugging options: # CAMDEBUG Compile in all possible debugging. # CAM_DEBUG_COMPILE Debug levels to compile in. # CAM_DEBUG_FLAGS Debug levels to enable on boot. # CAM_DEBUG_BUS Limit debugging to the given bus. # CAM_DEBUG_TARGET Limit debugging to the given target. # CAM_DEBUG_LUN Limit debugging to the given lun. # CAM_DEBUG_DELAY Delay in us after printing each debug line. # # CAM_MAX_HIGHPOWER: Maximum number of concurrent high power (start unit) cmds # SCSI_NO_SENSE_STRINGS: When defined disables sense descriptions # SCSI_NO_OP_STRINGS: When defined disables opcode descriptions # SCSI_DELAY: The number of MILLISECONDS to freeze the SIM (scsi adapter) # queue after a bus reset, and the number of milliseconds to # freeze the device queue after a bus device reset. This # can be changed at boot and runtime with the # kern.cam.scsi_delay tunable/sysctl. options CAMDEBUG options CAM_DEBUG_COMPILE=-1 options CAM_DEBUG_FLAGS=(CAM_DEBUG_INFO|CAM_DEBUG_PROBE|CAM_DEBUG_PERIPH) options CAM_DEBUG_BUS=-1 options CAM_DEBUG_TARGET=-1 options CAM_DEBUG_LUN=-1 options CAM_DEBUG_DELAY=1 options CAM_MAX_HIGHPOWER=4 options SCSI_NO_SENSE_STRINGS options SCSI_NO_OP_STRINGS options SCSI_DELAY=5000 # Be pessimistic about Joe SCSI device # Options for the CAM CDROM driver: # CHANGER_MIN_BUSY_SECONDS: Guaranteed minimum time quantum for a changer LUN # CHANGER_MAX_BUSY_SECONDS: Maximum time quantum per changer LUN, only # enforced if there is I/O waiting for another LUN # The compiled in defaults for these variables are 2 and 10 seconds, # respectively. 
# # These can also be changed on the fly with the following sysctl variables: # kern.cam.cd.changer.min_busy_seconds # kern.cam.cd.changer.max_busy_seconds # options CHANGER_MIN_BUSY_SECONDS=2 options CHANGER_MAX_BUSY_SECONDS=10 # Options for the CAM sequential access driver: # SA_IO_TIMEOUT: Timeout for read/write/wfm operations, in minutes # SA_SPACE_TIMEOUT: Timeout for space operations, in minutes # SA_REWIND_TIMEOUT: Timeout for rewind operations, in minutes # SA_ERASE_TIMEOUT: Timeout for erase operations, in minutes # SA_1FM_AT_EOD: Default to model which only has a default one filemark at EOT. options SA_IO_TIMEOUT=4 options SA_SPACE_TIMEOUT=60 options SA_REWIND_TIMEOUT=(2*60) options SA_ERASE_TIMEOUT=(4*60) options SA_1FM_AT_EOD # Optional timeout for the CAM processor target (pt) device # This is specified in seconds. The default is 60 seconds. options SCSI_PT_DEFAULT_TIMEOUT=60 # Optional enable of doing SES passthrough on other devices (e.g., disks) # # Normally disabled because a lot of newer SCSI disks report themselves # as having SES capabilities, but this can then clot up attempts to build # a topology with the SES device that's on the box these drives are in.... options SES_ENABLE_PASSTHROUGH ##################################################################### # MISCELLANEOUS DEVICES AND OPTIONS device pty #BSD-style compatibility pseudo ttys device nmdm #back-to-back tty devices device md #Memory/malloc disk device snp #Snoop device - to look at pty/vty/etc.. device ccd #Concatenated disk driver device firmware #firmware(9) support # Kernel side iconv library options LIBICONV # Size of the kernel message buffer. Should be N * pagesize. options MSGBUF_SIZE=40960 ##################################################################### # HARDWARE DEVICE CONFIGURATION # For ISA the required hints are listed. # EISA, MCA, PCI, CardBus, SD/MMC and pccard are self identifying buses, so # no hints are needed. 
# # Mandatory devices: # # These options are valid for other keyboard drivers as well. options KBD_DISABLE_KEYMAP_LOAD # refuse to load a keymap options KBD_INSTALL_CDEV # install a CDEV entry in /dev options FB_DEBUG # Frame buffer debugging device splash # Splash screen and screen saver support # Various screen savers. device blank_saver device daemon_saver device dragon_saver device fade_saver device fire_saver device green_saver device logo_saver device rain_saver device snake_saver device star_saver device warp_saver # The syscons console driver (SCO color console compatible). device sc hint.sc.0.at="isa" options MAXCONS=16 # number of virtual consoles options SC_ALT_MOUSE_IMAGE # simplified mouse cursor in text mode options SC_DFLT_FONT # compile font in makeoptions SC_DFLT_FONT=cp850 options SC_DISABLE_KDBKEY # disable `debug' key options SC_DISABLE_REBOOT # disable reboot key sequence options SC_HISTORY_SIZE=200 # number of history buffer lines options SC_MOUSE_CHAR=0x3 # char code for text mode mouse cursor options SC_PIXEL_MODE # add support for the raster text mode # The following options will let you change the default colors of syscons. options SC_NORM_ATTR=(FG_GREEN|BG_BLACK) options SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN) options SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK) options SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED) # The following options will let you change the default behavior of # cut-n-paste feature options SC_CUT_SPACES2TABS # convert leading spaces into tabs options SC_CUT_SEPCHARS=\"x09\" # set of characters that delimit words # (default is single space - \"x20\") # If you have a two button mouse, you may want to add the following option # to use the right button of the mouse to paste text. options SC_TWOBUTTON_MOUSE # You can selectively disable features in syscons. 
options SC_NO_CUTPASTE options SC_NO_FONT_LOADING options SC_NO_HISTORY options SC_NO_MODE_CHANGE options SC_NO_SYSMOUSE options SC_NO_SUSPEND_VTYSWITCH # `flags' for sc # 0x80 Put the video card in the VESA 800x600 dots, 16 color mode # 0x100 Probe for a keyboard device periodically if one is not present # Enable experimental features of the syscons terminal emulator (teken). options TEKEN_CONS25 # cons25-style terminal emulation options TEKEN_UTF8 # UTF-8 output handling # The vt video console driver. device vt options VT_ALT_TO_ESC_HACK=1 # Prepend ESC sequence to ALT keys options VT_MAXWINDOWS=16 # Number of virtual consoles options VT_TWOBUTTON_MOUSE # Use right mouse button to paste # The following options set the default framebuffer size. options VT_FB_DEFAULT_HEIGHT=480 options VT_FB_DEFAULT_WIDTH=640 # The following options will let you change the default vt terminal colors. options TERMINAL_NORM_ATTR=(FG_GREEN|BG_BLACK) options TERMINAL_KERN_ATTR=(FG_LIGHTRED|BG_BLACK) # # Optional devices: # # # SCSI host adapters: # # adv: All Narrow SCSI bus AdvanSys controllers. # adw: Second Generation AdvanSys controllers including the ADV940UW. # aha: Adaptec 154x/1535/1640 # ahb: Adaptec 174x EISA controllers # ahc: Adaptec 274x/284x/2910/293x/294x/394x/3950x/3960x/398X/4944/ # 19160x/29160x, aic7770/aic78xx # ahd: Adaptec 29320/39320 Controllers. # aic: Adaptec 6260/6360, APA-1460 (PC Card), NEC PC9801-100 (C-BUS) # bt: Most Buslogic controllers: including BT-445, BT-54x, BT-64x, BT-74x, # BT-75x, BT-946, BT-948, BT-956, BT-958, SDC3211B, SDC3211F, SDC3222F # esp: Emulex ESP, NCR 53C9x and QLogic FAS families based controllers # including the AMD Am53C974 (found on devices such as the Tekram # DC-390(T)) and the Sun ESP and FAS families of controllers # isp: Qlogic ISP 1020, 1040 and 1040B PCI SCSI host adapters, # ISP 1240 Dual Ultra SCSI, ISP 1080 and 1280 (Dual) Ultra2, # ISP 12160 Ultra3 SCSI, # Qlogic ISP 2100 and ISP 2200 1Gb Fibre Channel host adapters. 
# Qlogic ISP 2300 and ISP 2312 2Gb Fibre Channel host adapters. # Qlogic ISP 2322 and ISP 6322 2Gb Fibre Channel host adapters. # ispfw: Firmware module for Qlogic host adapters # mpt: LSI-Logic MPT/Fusion 53c1020 or 53c1030 Ultra4 # or FC9x9 Fibre Channel host adapters. # ncr: NCR 53C810, 53C825 self-contained SCSI host adapters. # sym: Symbios/Logic 53C8XX family of PCI-SCSI I/O processors: # 53C810, 53C810A, 53C815, 53C825, 53C825A, 53C860, 53C875, # 53C876, 53C885, 53C895, 53C895A, 53C896, 53C897, 53C1510D, # 53C1010-33, 53C1010-66. # trm: Tekram DC395U/UW/F DC315U adapters. # wds: WD7000 # # Note that the order is important in order for Buslogic ISA/EISA cards to be # probed correctly. # device bt hint.bt.0.at="isa" hint.bt.0.port="0x330" device adv hint.adv.0.at="isa" device adw device aha hint.aha.0.at="isa" device aic hint.aic.0.at="isa" device ahb device ahc device ahd device esp device iscsi_initiator device isp hint.isp.0.disable="1" hint.isp.0.role="3" hint.isp.0.prefer_iomap="1" hint.isp.0.prefer_memmap="1" hint.isp.0.fwload_disable="1" hint.isp.0.ignore_nvram="1" hint.isp.0.fullduplex="1" hint.isp.0.topology="lport" hint.isp.0.topology="nport" hint.isp.0.topology="lport-only" hint.isp.0.topology="nport-only" # we can't get u_int64_t types, nor can we get strings if it's got # a leading 0x, hence this silly dodge. hint.isp.0.portwnn="w50000000aaaa0000" hint.isp.0.nodewnn="w50000000aaaa0001" device ispfw device mpt device ncr device sym device trm device wds hint.wds.0.at="isa" hint.wds.0.port="0x350" hint.wds.0.irq="11" hint.wds.0.drq="6" # The aic7xxx driver will attempt to use memory mapped I/O for all PCI # controllers that have it configured only if this option is set. Unfortunately, # this doesn't work on some motherboards, which prevents it from being the # default. options AHC_ALLOW_MEMIO # Dump the contents of the ahc controller configuration PROM. options AHC_DUMP_EEPROM # Bitmap of units to enable targetmode operations. 
options AHC_TMODE_ENABLE # Compile in Aic7xxx Debugging code. options AHC_DEBUG # Aic7xxx driver debugging options. See sys/dev/aic7xxx/aic7xxx.h options AHC_DEBUG_OPTS # Print register bitfields in debug output. Adds ~128k to driver # See ahc(4). options AHC_REG_PRETTY_PRINT # Compile in aic79xx debugging code. options AHD_DEBUG # Aic79xx driver debugging options. Adds ~215k to driver. See ahd(4). options AHD_DEBUG_OPTS=0xFFFFFFFF # Print human-readable register definitions when debugging options AHD_REG_PRETTY_PRINT # Bitmap of units to enable targetmode operations. options AHD_TMODE_ENABLE # The adw driver will attempt to use memory mapped I/O for all PCI # controllers that have it configured only if this option is set. options ADW_ALLOW_MEMIO # Options used in dev/iscsi (Software iSCSI stack) # options ISCSI_INITIATOR_DEBUG=9 # Options used in dev/isp/ (Qlogic SCSI/FC driver). # # ISP_TARGET_MODE - enable target mode operation # options ISP_TARGET_MODE=1 # # ISP_DEFAULT_ROLES - default role # none=0 # target=1 # initiator=2 # both=3 (not supported currently) # # ISP_INTERNAL_TARGET (trivial internal disk target, for testing) # options ISP_DEFAULT_ROLES=0 # Options used in dev/sym/ (Symbios SCSI driver). #options SYM_SETUP_LP_PROBE_MAP #-Low Priority Probe Map (bits) # Allows the ncr to take precedence # 1 (1<<0) -> 810a, 860 # 2 (1<<1) -> 825a, 875, 885, 895 # 4 (1<<2) -> 895a, 896, 1510d #options SYM_SETUP_SCSI_DIFF #-HVD support for 825a, 875, 885 # disabled:0 (default), enabled:1 #options SYM_SETUP_PCI_PARITY #-PCI parity checking # disabled:0, enabled:1 (default) #options SYM_SETUP_MAX_LUN #-Number of LUNs supported # default:8, range:[1..64] # The 'dpt' driver provides support for old DPT controllers (http://www.dpt.com/). # These have hardware RAID-{0,1,5} support, and do multi-initiator I/O. 
# The DPT controllers are commonly re-licensed under other brand-names - # some controllers by Olivetti, Dec, HP, AT&T, SNI, AST, Alphatronic, NEC and # Compaq are actually DPT controllers. # # See src/sys/dev/dpt for debugging and other subtle options. # DPT_MEASURE_PERFORMANCE Enables a set of (semi)invasive metrics. Various # instruments are enabled. The tools in # /usr/sbin/dpt_* assume these to be enabled. # DPT_DEBUG_xxxx These are controllable from sys/dev/dpt/dpt.h # DPT_RESET_HBA Make "reset" actually reset the controller # instead of fudging it. Only enable this if you # are 100% certain you need it. device dpt # DPT options #!CAM# options DPT_MEASURE_PERFORMANCE options DPT_RESET_HBA # # Compaq "CISS" RAID controllers (SmartRAID 5* series) # These controllers have a SCSI-like interface, and require the # CAM infrastructure. # device ciss # # Intel Integrated RAID controllers. # This driver was developed and is maintained by Intel. Contacts # at Intel for this driver are # "Kannanthanam, Boji T" and # "Leubner, Achim" . # device iir # # Mylex AcceleRAID and eXtremeRAID controllers with v6 and later # firmware. These controllers have a SCSI-like interface, and require # the CAM infrastructure. # device mly # # Compaq Smart RAID, Mylex DAC960 and AMI MegaRAID controllers. Only # one entry is needed; the code will find and configure all supported # controllers. # device ida # Compaq Smart RAID device mlx # Mylex DAC960 device amr # AMI MegaRAID device amrp # SCSI Passthrough interface (optional, CAM req.) 
device mfi # LSI MegaRAID SAS device mfip # LSI MegaRAID SAS passthrough, requires CAM options MFI_DEBUG device mrsas # LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s # # 3ware ATA RAID # device twe # 3ware ATA RAID # # Serial ATA host controllers: # # ahci: Advanced Host Controller Interface (AHCI) compatible # mvs: Marvell 88SX50XX/88SX60XX/88SX70XX/SoC controllers # siis: SiliconImage SiI3124/SiI3132/SiI3531 controllers # # These drivers are part of the cam(4) subsystem. They supersede the less # featureful ata(4) subsystem drivers, which support the same hardware. device ahci device mvs device siis # # The 'ATA' driver supports all legacy ATA/ATAPI controllers, including # PC Card devices. You only need one "device ata" for it to find all # PCI and PC Card ATA/ATAPI devices on modern machines. # Alternatively, individual bus and chipset drivers may be chosen by using # the 'atacore' driver and then selecting the drivers on a per-vendor basis. # For example, to build a system which only supports a VIA chipset, # omit 'ata' and include the 'atacore', 'atapci' and 'atavia' drivers. device ata # Modular ATA #device atacore # Core ATA functionality #device atacard # CARDBUS support #device atabus # PC98 cbus support #device ataisa # ISA bus support #device atapci # PCI bus support; only generic chipset support # PCI ATA chipsets #device ataahci # AHCI SATA #device ataacard # ACARD #device ataacerlabs # Acer Labs Inc. (ALI) #device ataadaptec # Adaptec #device ataamd # Advanced Micro Devices (AMD) #device ataati # ATI #device atacenatek # Cenatek #device atacypress # Cypress #device atacyrix # Cyrix #device atahighpoint # HighPoint #device ataintel # Intel #device ataite # Integrated Technology Inc. (ITE) #device atajmicron # JMicron #device atamarvell # Marvell #device atamicron # Micron #device atanational # National #device atanetcell # NetCell #device atanvidia # nVidia #device atapromise # Promise #device ataserverworks # ServerWorks #device atasiliconimage # Silicon Image Inc.
(SiI) (formerly CMD) #device atasis # Silicon Integrated Systems Corp.(SiS) #device atavia # VIA Technologies Inc. # # For older non-PCI, non-PnPBIOS systems, these are the hint lines to add: hint.ata.0.at="isa" hint.ata.0.port="0x1f0" hint.ata.0.irq="14" hint.ata.1.at="isa" hint.ata.1.port="0x170" hint.ata.1.irq="15" # # The following options are valid on the ATA driver: # # ATA_STATIC_ID: controller numbering is static, i.e. depends on location; # otherwise the device numbers are dynamically allocated. # ATA_REQUEST_TIMEOUT: the number of seconds to wait for an ATA request # before timing out. options ATA_STATIC_ID #options ATA_REQUEST_TIMEOUT=10 # # Standard floppy disk controllers and floppy tapes; supports # the Y-E DATA External FDD (PC Card) # device fdc hint.fdc.0.at="isa" hint.fdc.0.port="0x3F0" hint.fdc.0.irq="6" hint.fdc.0.drq="2" # # FDC_DEBUG enables floppy debugging. Since the debug output is huge, you # also have to turn it on explicitly by setting the variable fd_debug # with DDB. options FDC_DEBUG # # Activate this line if you happen to have an Insight floppy tape. # Probing them proved to be dangerous for people with floppy disks only, # so it's "hidden" behind a flag: #hint.fdc.0.flags="1" # Specify floppy devices hint.fd.0.at="fdc0" hint.fd.0.drive="0" hint.fd.1.at="fdc0" hint.fd.1.drive="1" # # uart: newbusified driver for serial interfaces. It consolidates the sio(4), # sab(4) and zs(4) drivers. # device uart # Options for uart(4) options UART_PPS_ON_CTS # Do time pulse capturing using CTS # instead of DCD. options UART_POLL_FREQ # Set polling rate, used when hw has # no interrupt support (50 Hz default). # The following hint should only be used for pure ISA devices. It is not # needed otherwise. Use of hints is strongly discouraged. hint.uart.0.at="isa" # The following 3 hints are used when the UART is a system device (i.e., a # console or debug port), but only on platforms that don't have any other # means to pass the information to the kernel.
The unit number of the hint # is only used to bundle the hints together. There is no relation to the # unit number of the probed UART. hint.uart.0.port="0x3f8" hint.uart.0.flags="0x10" hint.uart.0.baud="115200" # `flags' for serial drivers that support consoles like sio(4) and uart(4): # 0x10 enable console support for this unit. Other console flags # (if applicable) are ignored unless this is set. Enabling # console support does not make the unit the preferred console. # Boot with -h or set boot_serial=YES in the loader. For sio(4) # specifically, the 0x20 flag can also be set (see above). # Currently, at most one unit can have console support; the # first one (in config file order) with this flag set is # preferred. Setting this flag for sio0 gives the old behavior. # 0x80 use this port for serial line gdb support in ddb. Also known # as debug port. # # Options for serial drivers that support consoles: options BREAK_TO_DEBUGGER # A BREAK/DBG on the console goes to # ddb, if available. # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. There are FreeBSD extensions: # CR ~ ^p requests force panic and CR ~ ^r requests a clean reboot. options ALT_BREAK_TO_DEBUGGER # Serial Communications Controller # Supports the Siemens SAB 82532 and Zilog Z8530 multi-channel # communications controllers. device scc # PCI Universal Communications driver # Supports various multi port PCI I/O cards. device puc # # Network interfaces: # # MII bus support is required for many PCI Ethernet NICs, # namely those which use MII-compliant transceivers or implement # transceiver control interfaces that operate like an MII. 
Adding # "device miibus" to the kernel config pulls in support for the generic # miibus API, the common support for bit-bang'ing the MII and all # of the PHY drivers, including a generic one for PHYs that aren't # specifically handled by an individual driver. Support for specific # PHYs may be built by adding "device mii", "device mii_bitbang" if # needed by the NIC driver and then adding the appropriate PHY driver. device mii # Minimal MII support device mii_bitbang # Common module for bit-bang'ing the MII device miibus # MII support w/ bit-bang'ing and all PHYs device acphy # Altima Communications AC101 device amphy # AMD AM79c873 / Davicom DM910{1,2} device atphy # Attansic/Atheros F1 device axphy # Asix Semiconductor AX88x9x device bmtphy # Broadcom BCM5201/BCM5202 and 3Com 3c905C device brgphy # Broadcom BCM54xx/57xx 1000baseTX device ciphy # Cicada/Vitesse CS/VSC8xxx device e1000phy # Marvell 88E1000 1000/100/10-BT device gentbi # Generic 10-bit 1000BASE-{LX,SX} fiber ifaces device icsphy # ICS ICS1889-1893 device ip1000phy # IC Plus IP1000A/IP1001 device jmphy # JMicron JMP211/JMP202 device lxtphy # Level One LXT-970 device mlphy # Micro Linear 6692 device nsgphy # NatSemi DP8361/DP83865/DP83891 device nsphy # NatSemi DP83840A device nsphyter # NatSemi DP83843/DP83815 device pnaphy # HomePNA device qsphy # Quality Semiconductor QS6612 device rdcphy # RDC Semiconductor R6040 device rgephy # RealTek 8169S/8110S/8211B/8211C device rlphy # RealTek 8139 device rlswitch # RealTek 8305 device smcphy # SMSC LAN91C111 device tdkphy # TDK 89Q2120 device tlphy # Texas Instruments ThunderLAN device truephy # LSI TruePHY device xmphy # XaQti XMAC II # an: Aironet 4500/4800 802.11 wireless adapters. Supports the PCMCIA, # PCI and ISA varieties. # ae: Support for fast ethernet adapters based on the Attansic/Atheros # L2 PCI Express controllers.
# age: Support for gigabit ethernet adapters based on the Attansic/Atheros # L1 PCI express gigabit ethernet controllers. # alc: Support for Atheros AR8131/AR8132 PCIe ethernet controllers. # ale: Support for Atheros AR8121/AR8113/AR8114 PCIe ethernet controllers. # ath: Atheros a/b/g WiFi adapters (requires ath_hal and wlan) # bce: Broadcom NetXtreme II (BCM5706/BCM5708) PCI/PCIe Gigabit Ethernet # adapters. # bfe: Broadcom BCM4401 Ethernet adapter. # bge: Support for gigabit ethernet adapters based on the Broadcom # BCM570x family of controllers, including the 3Com 3c996-T, # the Netgear GA302T, the SysKonnect SK-9D21 and SK-9D41, and # the embedded gigE NICs on Dell PowerEdge 2550 servers. # bxe: Broadcom NetXtreme II (BCM5771X/BCM578XX) PCIe 10Gb Ethernet # adapters. # bwi: Broadcom BCM430* and BCM431* family of wireless adapters. # bwn: Broadcom BCM43xx family of wireless adapters. # cas: Sun Cassini/Cassini+ and National Semiconductor DP83065 Saturn # cm: Arcnet SMC COM90c26 / SMC COM90c56 # (and SMC COM90c66 in '56 compatibility mode) adapters. # cxgb: Chelsio T3 based 1GbE/10GbE PCIe Ethernet adapters. # cxgbe:Chelsio T4 and T5 based 1GbE/10GbE/40GbE PCIe Ethernet adapters. # dc: Support for PCI fast ethernet adapters based on the DEC/Intel 21143 # and various workalikes including: # the ADMtek AL981 Comet and AN985 Centaur, the ASIX Electronics # AX88140A and AX88141, the Davicom DM9100 and DM9102, the Lite-On # 82c168 and 82c169 PNIC, the Lite-On/Macronix LC82C115 PNIC II # and the Macronix 98713/98713A/98715/98715A/98725 PMAC. This driver # replaces the old al, ax, dm, pn and mx drivers. List of brands: # Digital DE500-BA, Kingston KNE100TX, D-Link DFE-570TX, SOHOware SFA110, # SVEC PN102-TX, CNet Pro110B, 120A, and 120B, Compex RL100-TX, # LinkSys LNE100TX, LNE100TX V2.0, Jaton XpressNet, Alfa Inc GFC2204, # KNE110TX. # de: Digital Equipment DC21040 # em: Intel Pro/1000 Gigabit Ethernet 82542, 82543, 82544 based adapters. 
# igb: Intel Pro/1000 PCI Express Gigabit Ethernet: 82575 and later adapters. # ep: 3Com 3C509, 3C529, 3C556, 3C562D, 3C563D, 3C572, 3C574X, 3C579, 3C589 # and PC Card devices using these chipsets. # ex: Intel EtherExpress Pro/10 and other i82595-based adapters, # Olicom Ethernet PC Card devices. # fe: Fujitsu MB86960A/MB86965A Ethernet # fea: DEC DEFEA EISA FDDI adapter # fpa: Support for the Digital DEFPA PCI FDDI. `device fddi' is also needed. # fxp: Intel EtherExpress Pro/100B # (hint of prefer_iomap can be done to prefer I/O instead of Mem mapping) # gem: Apple GMAC/Sun ERI/Sun GEM # hme: Sun HME (Happy Meal Ethernet) # jme: JMicron JMC260 Fast Ethernet/JMC250 Gigabit Ethernet based adapters. # le: AMD Am7900 LANCE and Am79C9xx PCnet # lge: Support for PCI gigabit ethernet adapters based on the Level 1 # LXT1001 NetCellerator chipset. This includes the D-Link DGE-500SX, # SMC TigerCard 1000 (SMC9462SX), and some Addtron cards. # malo: Marvell Libertas wireless NICs. # mwl: Marvell 88W8363 802.11n wireless NICs. # Requires the mwl firmware module # mwlfw: Marvell 88W8363 firmware # msk: Support for gigabit ethernet adapters based on the Marvell/SysKonnect # Yukon II Gigabit controllers, including 88E8021, 88E8022, 88E8061, # 88E8062, 88E8035, 88E8036, 88E8038, 88E8050, 88E8052, 88E8053, # 88E8055, 88E8056 and D-Link 560T/550SX. # lmc: Support for the LMC/SBE wide-area network interface cards. # my: Myson Fast Ethernet (MTD80X, MTD89X) # nge: Support for PCI gigabit ethernet adapters based on the National # Semiconductor DP83820 and DP83821 chipset. This includes the # SMC EZ Card 1000 (SMC9462TX), D-Link DGE-500T, Asante FriendlyNet # GigaNIX 1000TA and 1000TPC, the Addtron AEG320T, the Surecom # EP-320G-TX and the Netgear GA622T. # oce: Emulex 10 Gbit adapters (OneConnect Ethernet) # pcn: Support for PCI fast ethernet adapters based on the AMD Am79c97x # PCnet-FAST, PCnet-FAST+, PCnet-FAST III, PCnet-PRO and PCnet-Home # chipsets. 
These can also be handled by the le(4) driver if the # pcn(4) driver is left out of the kernel. The le(4) driver does not # support the additional features like the MII bus and burst mode of # the PCnet-FAST and greater chipsets though. # ral: Ralink Technology IEEE 802.11 wireless adapter # re: RealTek 8139C+/8169/816xS/811xS/8101E PCI/PCIe Ethernet adapter # rl: Support for PCI fast ethernet adapters based on the RealTek 8129/8139 # chipset. Note that the RealTek driver defaults to using programmed # I/O to do register accesses because memory mapped mode seems to cause # severe lockups on SMP hardware. This driver also supports the # Accton EN1207D `Cheetah' adapter, which uses a chip called # the MPX 5030/5038, which is either a RealTek in disguise or a # RealTek workalike. Note that the D-Link DFE-530TX+ uses the RealTek # chipset and is supported by this driver, not the 'vr' driver. # sf: Support for Adaptec Duralink PCI fast ethernet adapters based on the # Adaptec AIC-6915 "starfire" controller. # This includes dual and quad port cards, as well as one 100baseFX card. # Most of these are 64-bit PCI devices, except for one single port # card which is 32-bit. # sge: Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet adapter # sis: Support for NICs based on the Silicon Integrated Systems SiS 900, # SiS 7016 and NS DP83815 PCI fast ethernet controller chips. # sk: Support for the SysKonnect SK-984x series PCI gigabit ethernet NICs. # This includes the SK-9841 and SK-9842 single port cards (single mode # and multimode fiber) and the SK-9843 and SK-9844 dual port cards # (also single mode and multimode). # The driver will autodetect the number of ports on the card and # attach each one as a separate network interface. # sn: Support for ISA and PC Card Ethernet devices using the # SMC91C90/92/94/95 chips. # ste: Sundance Technologies ST201 PCI fast ethernet controller, includes # the D-Link DFE-550TX. 
# stge: Support for gigabit ethernet adapters based on the Sundance/Tamarack # TC9021 family of controllers, including the Sundance ST2021/ST2023, # the Sundance/Tamarack TC9021, the D-Link DL-4000 and ASUS NX1101. # ti: Support for PCI gigabit ethernet NICs based on the Alteon Networks # Tigon 1 and Tigon 2 chipsets. This includes the Alteon AceNIC, the # 3Com 3c985, the Netgear GA620 and various others. Note that you will # probably want to bump up kern.ipc.nmbclusters a lot to use this driver. # tl: Support for the Texas Instruments TNETE100 series 'ThunderLAN' # cards and integrated ethernet controllers. This includes several # Compaq Netelligent 10/100 cards and the built-in ethernet controllers # in several Compaq Prosignia, Proliant and Deskpro systems. It also # supports several Olicom 10Mbps and 10/100 boards. # tx: SMC 9432 TX, BTX and FTX cards. (SMC EtherPower II series) # txp: Support for 3Com 3cR990 cards with the "Typhoon" chipset # vr: Support for various fast ethernet adapters based on the VIA # Technologies VT3043 `Rhine I' and VT86C100A `Rhine II' chips, # including the D-Link DFE520TX and D-Link DFE530TX (see 'rl' for # DFE530TX+), the Hawking Technologies PN102TX, and the AOpen/Acer ALN-320. # vte: DM&P Vortex86 RDC R6040 Fast Ethernet # vx: 3Com 3C590 and 3C595 # wb: Support for fast ethernet adapters based on the Winbond W89C840F chip. # Note: this is not the same as the Winbond W89C940F, which is a # NE2000 clone. # wi: Lucent WaveLAN/IEEE 802.11 PCMCIA adapters. Note: this supports both # the PCMCIA and ISA cards: the ISA card is really a PCMCIA to ISA # bridge with a PCMCIA adapter plugged into it. # xe: Xircom/Intel EtherExpress Pro100/16 PC Card ethernet controller, # Accton Fast EtherCard-16, Compaq Netelligent 10/100 PC Card, # Toshiba 10/100 Ethernet PC Card, Xircom 16-bit Ethernet + Modem 56 # xl: Support for the 3Com 3c900, 3c905, 3c905B and 3c905C (Fast) # Etherlink XL cards and integrated controllers. 
This includes the # integrated 3c905B-TX chips in certain Dell Optiplex and Dell # Precision desktop machines and the integrated 3c905-TX chips # in Dell Latitude laptop docking stations. # Also supported: 3Com 3c980(C)-TX, 3Com 3cSOHO100-TX, 3Com 3c450-TX # Order for ISA/EISA devices is important here device cm hint.cm.0.at="isa" hint.cm.0.port="0x2e0" hint.cm.0.irq="9" hint.cm.0.maddr="0xdc000" device ep device ex device fe hint.fe.0.at="isa" hint.fe.0.port="0x300" device fea device sn hint.sn.0.at="isa" hint.sn.0.port="0x300" hint.sn.0.irq="10" device an device wi device xe # PCI Ethernet NICs that use the common MII bus controller code. device ae # Attansic/Atheros L2 FastEthernet device age # Attansic/Atheros L1 Gigabit Ethernet device alc # Atheros AR8131/AR8132 Ethernet device ale # Atheros AR8121/AR8113/AR8114 Ethernet device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet device bfe # Broadcom BCM440x 10/100 Ethernet device bge # Broadcom BCM570xx Gigabit Ethernet device cas # Sun Cassini/Cassini+ and NS DP83065 Saturn device cxgb # Chelsio T3 10 Gigabit Ethernet device cxgb_t3fw # Chelsio T3 10 Gigabit Ethernet firmware device cxgbe # Chelsio T4 and T5 1GbE/10GbE/40GbE device dc # DEC/Intel 21143 and various workalikes device et # Agere ET1310 10/100/Gigabit Ethernet device fxp # Intel EtherExpress PRO/100B (82557, 82558) hint.fxp.0.prefer_iomap="0" device gem # Apple GMAC/Sun ERI/Sun GEM device hme # Sun HME (Happy Meal Ethernet) device jme # JMicron JMC250 Gigabit/JMC260 Fast Ethernet device lge # Level 1 LXT1001 gigabit Ethernet device msk # Marvell/SysKonnect Yukon II Gigabit Ethernet device my # Myson Fast Ethernet (MTD80X, MTD89X) device nge # NatSemi DP83820 gigabit Ethernet device re # RealTek 8139C+/8169/8169S/8110S device rl # RealTek 8129/8139 device pcn # AMD Am79C97x PCI 10/100 NICs device sf # Adaptec AIC-6915 (``Starfire'') device sge # Silicon Integrated Systems SiS190/191 device sis # Silicon Integrated Systems SiS 900/SiS 7016 device sk 
# SysKonnect SK-984x & SK-982x gigabit Ethernet device ste # Sundance ST201 (D-Link DFE-550TX) device stge # Sundance/Tamarack TC9021 gigabit Ethernet device tl # Texas Instruments ThunderLAN device tx # SMC EtherPower II (83c170 ``EPIC'') device vr # VIA Rhine, Rhine II device vte # DM&P Vortex86 RDC R6040 Fast Ethernet device wb # Winbond W89C840F device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') # PCI Ethernet NICs. device de # DEC/Intel DC21x4x (``Tulip'') device em # Intel Pro/1000 Gigabit Ethernet device igb # Intel Pro/1000 PCIE Gigabit Ethernet device ixgb # Intel Pro/10Gbe PCI-X Ethernet device ixgbe # Intel Pro/10Gbe PCIE Ethernet device le # AMD Am7900 LANCE and Am79C9xx PCnet device mxge # Myricom Myri-10G 10GbE NIC device nxge # Neterion Xframe 10GbE Server/Storage Adapter device oce # Emulex 10 GbE (OneConnect Ethernet) device ti # Alteon Networks Tigon I/II gigabit Ethernet device txp # 3Com 3cR990 (``Typhoon'') device vx # 3Com 3c590, 3c595 (``Vortex'') device vxge # Exar/Neterion XFrame 3100 10GbE # PCI FDDI NICs. device fpa # PCI WAN adapters. device lmc # PCI IEEE 802.11 Wireless NICs device ath # Atheros pci/cardbus NIC's device ath_hal # pci/cardbus chip support #device ath_ar5210 # AR5210 chips #device ath_ar5211 # AR5211 chips #device ath_ar5212 # AR5212 chips #device ath_rf2413 #device ath_rf2417 #device ath_rf2425 #device ath_rf5111 #device ath_rf5112 #device ath_rf5413 #device ath_ar5416 # AR5416 chips options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors # All of the AR5212 parts have a problem when paired with the AR71xx # CPUS. These parts have a bug that triggers a fatal bus error on the AR71xx # only. Details of the exact nature of the bug are sketchy, but some can be # found at https://forum.openwrt.org/viewtopic.php?pid=70060 on pages 4, 5 and # 6. This option enables this workaround. There is a performance penalty # for this work around, but without it things don't work at all. 
The DMA # from the card usually bursts 128 bytes, but on the affected CPUs, only # 4 are safe. options AH_RXCFG_SDMAMW_4BYTES #device ath_ar9160 # AR9160 chips #device ath_ar9280 # AR9280 chips #device ath_ar9285 # AR9285 chips device ath_rate_sample # SampleRate tx rate control for ath device bwi # Broadcom BCM430* BCM431* device bwn # Broadcom BCM43xx device malo # Marvell Libertas wireless NICs. device mwl # Marvell 88W8363 802.11n wireless NICs. device mwlfw device ral # Ralink Technology RT2500 wireless NICs. # Use sf_buf(9) interface for jumbo buffers on ti(4) controllers. #options TI_SF_BUF_JUMBO # Turn on the header splitting option for the ti(4) driver firmware. This # only works for Tigon II chips, and has no effect for Tigon I chips. # This option requires the TI_SF_BUF_JUMBO option above. #options TI_JUMBO_HDRSPLIT # These two options allow manipulating the mbuf cluster size and mbuf size, # respectively. Be very careful with NIC driver modules when changing # these from their default values, because that can potentially cause a # mismatch between the mbuf size assumed by the kernel and the mbuf size # assumed by a module. The only driver that currently has the ability to # detect a mismatch is ti(4). options MCLSHIFT=12 # mbuf cluster shift in bits, 12 == 4KB options MSIZE=512 # mbuf size in bytes # # ATM related options (Cranor version) # (note: this driver cannot be used with the HARP ATM stack) # # The `en' device provides support for Efficient Networks (ENI) # ENI-155 PCI midway cards, and the Adaptec 155Mbps PCI ATM cards (ANA-59x0). # # The `hatm' device provides support for Fore/Marconi HE155 and HE622 # ATM PCI cards. # # The `fatm' device provides support for Fore PCA200E ATM PCI cards. # # The `patm' device provides support for IDT77252 based cards like # ProSum's ProATM-155 and ProATM-25 and IDT's evaluation boards. # # atm device provides generic atm functions and is required for # atm devices. 
# NATM enables the netnatm protocol family that can be used to # bypass TCP/IP. # # utopia provides the access to the ATM PHY chips and is required for en, # hatm and fatm. # # the current driver supports only PVC operations (no atm-arp, no multicast). # for more details, please read the original documents at # http://www.ccrc.wustl.edu/pub/chuck/tech/bsdatm/bsdatm.html # device atm device en device fatm #Fore PCA200E device hatm #Fore/Marconi HE155/622 device patm #IDT77252 cards (ProATM and IDT) device utopia #ATM PHY driver options NATM #native ATM options LIBMBPOOL #needed by patm, iatm # # Sound drivers # # sound: The generic sound driver. # device sound # # snd_*: Device-specific drivers. # # The flags of the device tell the device a bit more info about the # device that normally is obtained through the PnP interface. # bit 2..0 secondary DMA channel; # bit 4 set if the board uses two dma channels; # bit 15..8 board type, overrides autodetection; leave it # zero if don't know what to put in (and you don't, # since this is unsupported at the moment...). # # snd_ad1816: Analog Devices AD1816 ISA PnP/non-PnP. # snd_als4000: Avance Logic ALS4000 PCI. # snd_atiixp: ATI IXP 200/300/400 PCI. # snd_audiocs: Crystal Semiconductor CS4231 SBus/EBus. Only # for sparc64. # snd_cmi: CMedia CMI8338/CMI8738 PCI. # snd_cs4281: Crystal Semiconductor CS4281 PCI. # snd_csa: Crystal Semiconductor CS461x/428x PCI. (except # 4281) # snd_ds1: Yamaha DS-1 PCI. # snd_emu10k1: Creative EMU10K1 PCI and EMU10K2 (Audigy) PCI. # snd_emu10kx: Creative SoundBlaster Live! and Audigy # snd_envy24: VIA Envy24 and compatible, needs snd_spicds. # snd_envy24ht: VIA Envy24HT and compatible, needs snd_spicds. # snd_es137x: Ensoniq AudioPCI ES137x PCI. # snd_ess: Ensoniq ESS ISA PnP/non-PnP, to be used in # conjunction with snd_sbc. # snd_fm801: Forte Media FM801 PCI. # snd_gusc: Gravis UltraSound ISA PnP/non-PnP. # snd_hda: Intel High Definition Audio (Controller) and # compatible. 
# snd_hdspe: RME HDSPe AIO and RayDAT. # snd_ich: Intel ICH AC'97 and some more audio controllers # embedded in a chipset, for example nVidia # nForce controllers. # snd_maestro: ESS Technology Maestro-1/2x PCI. # snd_maestro3: ESS Technology Maestro-3/Allegro PCI. # snd_mss: Microsoft Sound System ISA PnP/non-PnP. # snd_neomagic: Neomagic 256 AV/ZX PCI. # snd_sb16: Creative SoundBlaster16, to be used in # conjunction with snd_sbc. # snd_sb8: Creative SoundBlaster (pre-16), to be used in # conjunction with snd_sbc. # snd_sbc: Creative SoundBlaster ISA PnP/non-PnP. # Supports ESS and Avance ISA chips as well. # snd_solo: ESS Solo-1x PCI. # snd_spicds: SPI codec driver, needed by Envy24/Envy24HT drivers. # snd_t4dwave: Trident 4DWave DX/NX PCI, Sis 7018 PCI and Acer Labs # M5451 PCI. # snd_uaudio: USB audio. # snd_via8233: VIA VT8233x PCI. # snd_via82c686: VIA VT82C686A PCI. # snd_vibes: S3 Sonicvibes PCI. device snd_ad1816 device snd_als4000 device snd_atiixp #device snd_audiocs device snd_cmi device snd_cs4281 device snd_csa device snd_ds1 device snd_emu10k1 device snd_emu10kx device snd_envy24 device snd_envy24ht device snd_es137x device snd_ess device snd_fm801 device snd_gusc device snd_hda device snd_hdspe device snd_ich device snd_maestro device snd_maestro3 device snd_mss device snd_neomagic device snd_sb16 device snd_sb8 device snd_sbc device snd_solo device snd_spicds device snd_t4dwave device snd_uaudio device snd_via8233 device snd_via82c686 device snd_vibes # For non-PnP sound cards: hint.pcm.0.at="isa" hint.pcm.0.irq="10" hint.pcm.0.drq="1" hint.pcm.0.flags="0x0" hint.sbc.0.at="isa" hint.sbc.0.port="0x220" hint.sbc.0.irq="5" hint.sbc.0.drq="1" hint.sbc.0.flags="0x15" hint.gusc.0.at="isa" hint.gusc.0.port="0x220" hint.gusc.0.irq="5" hint.gusc.0.drq="1" hint.gusc.0.flags="0x13" # # Following options are intended for debugging/testing purposes: # # SND_DEBUG Enable extra debugging code that includes # sanity checking and possible increase of # verbosity. 
# # SND_DIAGNOSTIC Similar in a spirit of INVARIANTS/DIAGNOSTIC, # zero tolerance against inconsistencies. # # SND_FEEDER_MULTIFORMAT By default, only 16/32 bit feeders are compiled # in. This options enable most feeder converters # except for 8bit. WARNING: May bloat the kernel. # # SND_FEEDER_FULL_MULTIFORMAT Ditto, but includes 8bit feeders as well. # # SND_FEEDER_RATE_HP (feeder_rate) High precision 64bit arithmetic # as much as possible (the default trying to # avoid it). Possible slowdown. # # SND_PCM_64 (Only applicable for i386/32bit arch) # Process 32bit samples through 64bit # integer/arithmetic. Slight increase of dynamic # range at a cost of possible slowdown. # # SND_OLDSTEREO Only 2 channels are allowed, effectively # disabling multichannel processing. # options SND_DEBUG options SND_DIAGNOSTIC options SND_FEEDER_MULTIFORMAT options SND_FEEDER_FULL_MULTIFORMAT options SND_FEEDER_RATE_HP options SND_PCM_64 options SND_OLDSTEREO # # Miscellaneous hardware: # # scd: Sony CD-ROM using proprietary (non-ATAPI) interface # mcd: Mitsumi CD-ROM using proprietary (non-ATAPI) interface # bktr: Brooktree bt848/848a/849a/878/879 video capture and TV Tuner board # joy: joystick (including IO DATA PCJOY PC Card joystick) # cmx: OmniKey CardMan 4040 pccard smartcard reader # Mitsumi CD-ROM device mcd hint.mcd.0.at="isa" hint.mcd.0.port="0x300" # for the Sony CDU31/33A CDROM device scd hint.scd.0.at="isa" hint.scd.0.port="0x230" device joy # PnP aware, hints for non-PnP only hint.joy.0.at="isa" hint.joy.0.port="0x201" device cmx # # The 'bktr' device is a PCI video capture device using the Brooktree # bt848/bt848a/bt849a/bt878/bt879 chipset. When used with a TV Tuner it forms a # TV card, e.g. Miro PC/TV, Hauppauge WinCast/TV WinTV, VideoLogic Captivator, # Intel Smart Video III, AverMedia, IMS Turbo, FlyVideo. 
# # options OVERRIDE_CARD=xxx # options OVERRIDE_TUNER=xxx # options OVERRIDE_MSP=1 # options OVERRIDE_DBX=1 # These options can be used to override the auto detection # The current values for xxx are found in src/sys/dev/bktr/bktr_card.h # Using sysctl(8) run-time overrides on a per-card basis can be made # # options BROOKTREE_SYSTEM_DEFAULT=BROOKTREE_PAL # or # options BROOKTREE_SYSTEM_DEFAULT=BROOKTREE_NTSC # Specifies the default video capture mode. # This is required for Dual Crystal (28&35MHz) boards where PAL is used # to prevent hangs during initialization, e.g. VideoLogic Captivator PCI. # # options BKTR_USE_PLL # This is required for PAL or SECAM boards with a 28MHz crystal and no 35MHz # crystal, e.g. some new Bt878 cards. # # options BKTR_GPIO_ACCESS # This enables IOCTLs which give user level access to the GPIO port. # # options BKTR_NO_MSP_RESET # Prevents the MSP34xx reset. Good if you initialize the MSP in another OS first # # options BKTR_430_FX_MODE # Switch Bt878/879 cards into Intel 430FX chipset compatibility mode. # # options BKTR_SIS_VIA_MODE # Switch Bt878/879 cards into SIS/VIA chipset compatibility mode which is # needed for some old SiS and VIA chipset motherboards. # This also allows Bt878/879 chips to work on old OPTi (<1997) chipset # motherboards and motherboards with bad or incomplete PCI 2.1 support. # As a rough guess, old = before 1998 # # options BKTR_NEW_MSP34XX_DRIVER # Use new, more complete initialization scheme for the msp34* soundchip. # Should fix stereo autodetection if the old driver does only output # mono sound. # # options BKTR_USE_FREEBSD_SMBUS # Compile with FreeBSD SMBus implementation # # Brooktree driver has been ported to the new I2C framework. Thus, # you'll need to have the following 3 lines in the kernel config. # device smbus # device iicbus # device iicbb # device iicsmb # The iic and smb devices are only needed if you want to control other # I2C slaves connected to the external connector of some cards. 
# device bktr # # PC Card/PCMCIA and Cardbus # # cbb: pci/cardbus bridge implementing YENTA interface # pccard: pccard slots # cardbus: cardbus slots device cbb device pccard device cardbus # # MMC/SD # # mmc MMC/SD bus # mmcsd MMC/SD memory card # sdhci Generic PCI SD Host Controller # device mmc device mmcsd device sdhci # # SMB bus # # System Management Bus support is provided by the 'smbus' device. # Access to the SMBus device is via the 'smb' device (/dev/smb*), # which is a child of the 'smbus' device. # # Supported devices: # smb standard I/O through /dev/smb* # # Supported SMB interfaces: # iicsmb I2C to SMB bridge with any iicbus interface # bktr brooktree848 I2C hardware interface # intpm Intel PIIX4 (82371AB, 82443MX) Power Management Unit # alpm Acer Aladdin-IV/V/Pro2 Power Management Unit # ichsmb Intel ICH SMBus controller chips (82801AA, 82801AB, 82801BA) # viapm VIA VT82C586B/596B/686A and VT8233 Power Management Unit # amdpm AMD 756 Power Management Unit # amdsmb AMD 8111 SMBus 2.0 Controller # nfpm NVIDIA nForce Power Management Unit # nfsmb NVIDIA nForce2/3/4 MCP SMBus 2.0 Controller # ismt Intel SMBus 2.0 controller chips (on Atom S1200, C2000) # device smbus # Bus support, required for smb below. device intpm device alpm device ichsmb device viapm device amdpm device amdsmb device nfpm device nfsmb device ismt device smb # # I2C Bus # # Philips i2c bus support is provided by the `iicbus' device. # # Supported devices: # ic i2c network interface # iic i2c standard io # iicsmb i2c to smb bridge. Allow i2c i/o with smb commands. # iicoc simple polling driver for OpenCores I2C controller # # Supported interfaces: # bktr brooktree848 I2C software interface # # Other: # iicbb generic I2C bit-banging code (needed by lpbb, bktr) # device iicbus # Bus support, required for ic/iic/iicsmb below. 
device iicbb device ic device iic device iicsmb # smb over i2c bridge device iicoc # OpenCores I2C controller support # I2C peripheral devices # # ds133x Dallas Semiconductor DS1337, DS1338 and DS1339 RTC # ds1374 Dallas Semiconductor DS1374 RTC # ds1672 Dallas Semiconductor DS1672 RTC # s35390a Seiko Instruments S-35390A RTC # device ds133x device ds1374 device ds1672 device s35390a # Parallel-Port Bus # # Parallel port bus support is provided by the `ppbus' device. # Multiple devices may be attached to the parallel port, devices # are automatically probed and attached when found. # # Supported devices: # vpo Iomega Zip Drive # Requires SCSI disk support ('scbus' and 'da'), best # performance is achieved with ports in EPP 1.9 mode. # lpt Parallel Printer # plip Parallel network interface # ppi General-purpose I/O ("Geek Port") + IEEE1284 I/O # pps Pulse per second Timing Interface # lpbb Philips official parallel port I2C bit-banging interface # pcfclock Parallel port clock driver. # # Supported interfaces: # ppc ISA-bus parallel port interfaces. 
# options PPC_PROBE_CHIPSET # Enable chipset specific detection # (see flags in ppc(4)) options DEBUG_1284 # IEEE1284 signaling protocol debug options PERIPH_1284 # Makes your computer act as an IEEE1284 # compliant peripheral options DONTPROBE_1284 # Avoid boot detection of PnP parallel devices options VP0_DEBUG # ZIP/ZIP+ debug options LPT_DEBUG # Printer driver debug options PPC_DEBUG # Parallel chipset level debug options PLIP_DEBUG # Parallel network IP interface debug options PCFCLOCK_VERBOSE # Verbose pcfclock driver options PCFCLOCK_MAX_RETRIES=5 # Maximum read tries (default 10) device ppc hint.ppc.0.at="isa" hint.ppc.0.irq="7" device ppbus device vpo device lpt device plip device ppi device pps device lpbb device pcfclock # Kernel BOOTP support options BOOTP # Use BOOTP to obtain IP address/hostname # Requires NFSCL and NFS_ROOT options BOOTP_NFSROOT # NFS mount root filesystem using BOOTP info options BOOTP_NFSV3 # Use NFS v3 to NFS mount root options BOOTP_COMPAT # Workaround for broken bootp daemons. options BOOTP_WIRED_TO=fxp0 # Use interface fxp0 for BOOTP options BOOTP_BLOCKSIZE=8192 # Override NFS block size # # Add software watchdog routines. # options SW_WATCHDOG # # Add the software deadlock resolver thread. # options DEADLKRES # # Disable swapping of stack pages. This option removes all # code which actually performs swapping, so it's not possible to turn # it back on at run-time. # # This is sometimes usable for systems which don't have any swap space # (see also sysctls "vm.defer_swapspace_pageouts" and # "vm.disable_swapspace_pageouts") # #options NO_SWAPPING # Set the number of sf_bufs to allocate. sf_bufs are virtual buffers # for sendfile(2) that are used to map file VM pages, and normally # default to a quantity that is roughly 16*MAXUSERS+512. You would # typically want about 4 of these for each simultaneous file send. # options NSFBUFS=1024 # # Enable extra debugging code for locks. 
This stores the filename and # line of whatever acquired the lock in the lock itself, and changes a # number of function calls to pass around the relevant data. This is # not at all useful unless you are debugging lock code. Note that # modules should be recompiled as this option modifies KBI. # options DEBUG_LOCKS ##################################################################### # USB support # UHCI controller device uhci # OHCI controller device ohci # EHCI controller device ehci # XHCI controller device xhci # SL811 Controller #device slhci # General USB code (mandatory for USB) device usb # # USB Double Bulk Pipe devices device udbp # USB Fm Radio device ufm # USB LED device uled # Human Interface Device (anything with buttons and dials) device uhid # USB keyboard device ukbd # USB printer device ulpt # USB mass storage driver (Requires scbus and da) device umass # USB mass storage driver for device-side mode device usfs # USB support for Belkin F5U109 and Magic Control Technology serial adapters device umct # USB modem support device umodem # USB mouse device ums # USB touchpad(s) device atp device wsp # eGalax USB touch screen device uep # Diamond Rio 500 MP3 player device urio # # USB serial support device ucom # USB support for 3G modem cards by Option, Novatel, Huawei and Sierra device u3g # USB support for Technologies ARK3116 based serial adapters device uark # USB support for Belkin F5U103 and compatible serial adapters device ubsa # USB support for serial adapters based on the FT8U100AX and FT8U232AM device uftdi # USB support for some Windows CE based serial communication. device uipaq # USB support for Prolific PL-2303 serial adapters device uplcom # USB support for Silicon Laboratories CP2101/CP2102 based USB serial adapters device uslcom # USB Visor and Palm devices device uvisor # USB serial support for DDI pocket's PHS device uvscom # # ADMtek USB ethernet. 
Supports the LinkSys USB100TX, # the Billionton USB100, the Melco LU-ATX, the D-Link DSB-650TX # and the SMC 2202USB. Also works with the ADMtek AN986 Pegasus # eval board. device aue # ASIX Electronics AX88172 USB 2.0 ethernet driver. Used in the # LinkSys USB200M and various other adapters. device axe # ASIX Electronics AX88178A/AX88179 USB 2.0/3.0 gigabit ethernet driver. device axge # # Devices which communicate using Ethernet over USB, particularly # Communication Device Class (CDC) Ethernet specification. Supports # Sharp Zaurus PDAs, some DOCSIS cable modems and so on. device cdce # # CATC USB-EL1201A USB ethernet. Supports the CATC Netmate # and Netmate II, and the Belkin F5U111. device cue # # Kawasaki LSI ethernet. Supports the LinkSys USB10T, # Entrega USB-NET-E45, Peracom Ethernet Adapter, the # 3Com 3c19250, the ADS Technologies USB-10BT, the ATen UC10T, # the Netgear EA101, the D-Link DSB-650, the SMC 2102USB # and 2104USB, and the Corega USB-T. device kue # # RealTek RTL8150 USB to fast ethernet. Supports the Melco LUA-KTX # and the GREEN HOUSE GH-USB100B. device rue # # Davicom DM9601E USB to fast ethernet. Supports the Corega FEther USB-TXC. device udav # # Moschip MCS7730/MCS7840 USB to fast ethernet. Supports the Sitecom LN030. 
device mos # # HSxPA devices from Option N.V device uhso # Realtek RTL8188SU/RTL8191SU/RTL8192SU wireless driver device rsu # # Ralink Technology RT2501USB/RT2601USB wireless driver device rum # Ralink Technology RT2700U/RT2800U/RT3000U wireless driver device run # # Atheros AR5523 wireless driver device uath # # Conexant/Intersil PrismGT wireless driver device upgt # # Ralink Technology RT2500USB wireless driver device ural # # RNDIS USB ethernet driver device urndis # Realtek RTL8187B/L wireless driver device urtw # # Realtek RTL8188CU/RTL8192CU wireless driver device urtwn # # ZyDas ZD1211/ZD1211B wireless driver device zyd # # Sierra USB wireless driver device usie # # debugging options for the USB subsystem # options USB_DEBUG options U3G_DEBUG # options for ukbd: options UKBD_DFLT_KEYMAP # specify the built-in keymap makeoptions UKBD_DFLT_KEYMAP=jp.pc98 # options for uplcom: options UPLCOM_INTR_INTERVAL=100 # interrupt pipe interval # in milliseconds # options for uvscom: options UVSCOM_DEFAULT_OPKTSIZE=8 # default output packet size options UVSCOM_INTR_INTERVAL=100 # interrupt pipe interval # in milliseconds ##################################################################### # FireWire support device firewire # FireWire bus code device sbp # SCSI over Firewire (Requires scbus and da) device sbp_targ # SBP-2 Target mode (Requires scbus and targ) device fwe # Ethernet over FireWire (non-standard!) 
device fwip # IP over FireWire (RFC2734 and RFC3146) ##################################################################### # dcons support (Dumb Console Device) device dcons # dumb console driver device dcons_crom # FireWire attachment options DCONS_BUF_SIZE=16384 # buffer size options DCONS_POLL_HZ=100 # polling rate options DCONS_FORCE_CONSOLE=0 # force to be the primary console options DCONS_FORCE_GDB=1 # force to be the gdb device ##################################################################### # crypto subsystem # # This is a port of the OpenBSD crypto framework. Include this when # configuring IPSEC and when you have a h/w crypto device to accelerate # user applications that link to OpenSSL. # # Drivers are ports from OpenBSD with some simple enhancements that have # been fed back to OpenBSD. device crypto # core crypto support device cryptodev # /dev/crypto for access to h/w device rndtest # FIPS 140-2 entropy tester device hifn # Hifn 7951, 7781, etc. options HIFN_DEBUG # enable debugging support: hw.hifn.debug options HIFN_RNDTEST # enable rndtest support device ubsec # Broadcom 5501, 5601, 58xx options UBSEC_DEBUG # enable debugging support: hw.ubsec.debug options UBSEC_RNDTEST # enable rndtest support ##################################################################### # # Embedded system options: # # An embedded system might want to run something other than init. options INIT_PATH=/sbin/init:/rescue/init # Debug options options BUS_DEBUG # enable newbus debugging options DEBUG_VFS_LOCKS # enable VFS lock debugging options SOCKBUF_DEBUG # enable sockbuf last record/mb tail checking # # Verbose SYSINIT # # Make the SYSINIT process performed by mi_startup() verbose. This is very # useful when porting to a new architecture. If DDB is also enabled, this # will print function names instead of addresses. 
options VERBOSE_SYSINIT ##################################################################### # SYSV IPC KERNEL PARAMETERS # # Maximum number of System V semaphores that can be used on the system at # one time. options SEMMNI=11 # Total number of semaphores system wide options SEMMNS=61 # Total number of undo structures in system options SEMMNU=31 # Maximum number of System V semaphores that can be used by a single process # at one time. options SEMMSL=61 # Maximum number of operations that can be outstanding on a single System V # semaphore at one time. options SEMOPM=101 # Maximum number of undo operations that can be outstanding on a single # System V semaphore at one time. options SEMUME=11 # Maximum number of shared memory pages system wide. options SHMALL=1025 # Maximum size, in bytes, of a single System V shared memory region. options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1) options SHMMAXPGS=1025 # Minimum size, in bytes, of a single System V shared memory region. options SHMMIN=2 # Maximum number of shared memory regions that can be used on the system # at one time. options SHMMNI=33 # Maximum number of System V shared memory regions that can be attached to # a single process at one time. options SHMSEG=9 # Compress user core dumps. options COMPRESS_USER_CORES # required to compress file output from kernel for COMPRESS_USER_CORES. device gzio # Set the amount of time (in seconds) the system will wait before # rebooting automatically when a kernel panic occurs. If set to (-1), # the system will wait indefinitely until a key is pressed on the # console. options PANIC_REBOOT_WAIT_TIME=16 # Attempt to bypass the buffer cache and put data directly into the # userland buffer for read operation when O_DIRECT flag is set on the # file. Both offset and length of the read operation must be # multiples of the physical media sector size. # options DIRECTIO # Specify a lower limit for the number of swap I/O buffers. 
They are # (among other things) used when bypassing the buffer cache due to # DIRECTIO kernel option enabled and O_DIRECT flag set on file. # options NSWBUF_MIN=120 ##################################################################### # More undocumented options for linting. # Note that documenting these is not considered an affront. options CAM_DEBUG_DELAY # VFS cluster debugging. options CLUSTERDEBUG options DEBUG # Kernel filelock debugging. options LOCKF_DEBUG # System V compatible message queues # Please note that the values provided here are used to test kernel # building. The defaults in the sources provide almost the same numbers. # MSGSSZ must be a power of 2 between 8 and 1024. options MSGMNB=2049 # Max number of chars in queue options MSGMNI=41 # Max number of message queue identifiers options MSGSEG=2049 # Max number of message segments options MSGSSZ=16 # Size of a message segment options MSGTQL=41 # Max number of messages in system options NBUF=512 # Number of buffer headers options SCSI_NCR_DEBUG options SCSI_NCR_MAX_SYNC=10000 options SCSI_NCR_MAX_WIDE=1 options SCSI_NCR_MYADDR=7 options SC_DEBUG_LEVEL=5 # Syscons debug level options SC_RENDER_DEBUG # syscons rendering debugging options VFS_BIO_DEBUG # VFS buffer I/O debugging options KSTACK_MAX_PAGES=32 # Maximum pages to give the kernel stack options KSTACK_USAGE_PROF # Adaptec Array Controller driver options options AAC_DEBUG # Debugging levels: # 0 - quiet, only emit warnings # 1 - noisy, emit major function # points and things done # 2 - extremely noisy, emit trace # items in loops, etc. # Resource Accounting options RACCT # Resource Limits options RCTL # Yet more undocumented options for linting. # BKTR_ALLOC_PAGES has no effect except to cause warnings, and # BROOKTREE_ALLOC_PAGES hasn't actually been superseded by it, since the # driver still mostly spells this option BROOKTREE_ALLOC_PAGES. 
##options BKTR_ALLOC_PAGES=(217*4+1) options BROOKTREE_ALLOC_PAGES=(217*4+1) options MAXFILES=999 # Random number generator # Only ONE of the below two may be used; they are mutually exclusive. options RANDOM_YARROW # Yarrow CSPRNG (Default) #options RANDOM_FORTUNA # Fortuna CSPRNG options RANDOM_DEBUG # Debugging messages # Module to enable execution of application via emulators like QEMU options IMAGACT_BINMISC Index: projects/ifnet/sys/conf/files.arm =================================================================== --- projects/ifnet/sys/conf/files.arm (revision 279031) +++ projects/ifnet/sys/conf/files.arm (revision 279032) @@ -1,108 +1,108 @@ # $FreeBSD$ arm/arm/autoconf.c standard arm/arm/bcopy_page.S standard arm/arm/bcopyinout.S standard arm/arm/blockio.S standard arm/arm/bootconfig.c standard arm/arm/bus_space_asm_generic.S standard arm/arm/busdma_machdep.c optional !armv6 arm/arm/busdma_machdep-v6.c optional armv6 arm/arm/copystr.S standard arm/arm/cpufunc.c standard arm/arm/cpufunc_asm.S standard arm/arm/cpufunc_asm_armv4.S standard arm/arm/cpuinfo.c standard arm/arm/cpu_asm-v6.S optional armv6 arm/arm/db_disasm.c optional ddb arm/arm/db_interface.c optional ddb arm/arm/db_trace.c optional ddb arm/arm/devmap.c standard arm/arm/disassem.c optional ddb arm/arm/dump_machdep.c standard arm/arm/elf_machdep.c standard arm/arm/elf_note.S standard arm/arm/exception.S standard arm/arm/fiq.c standard arm/arm/fiq_subr.S standard arm/arm/fusu.S standard arm/arm/gdb_machdep.c optional gdb arm/arm/identcpu.c standard arm/arm/in_cksum.c optional inet | inet6 arm/arm/in_cksum_arm.S optional inet | inet6 arm/arm/intr.c standard arm/arm/locore.S standard no-obj arm/arm/machdep.c standard arm/arm/mem.c optional mem arm/arm/minidump_machdep.c optional mem arm/arm/mp_machdep.c optional smp arm/arm/nexus.c standard arm/arm/physmem.c standard arm/arm/pl190.c optional pl190 arm/arm/pl310.c optional pl310 arm/arm/platform.c optional platform arm/arm/platform_if.m optional 
platform arm/arm/pmap.c optional !armv6 arm/arm/pmap-v6.c optional armv6 arm/arm/sc_machdep.c optional sc arm/arm/setcpsr.S standard arm/arm/setstack.s standard arm/arm/stack_machdep.c optional ddb | stack arm/arm/stdatomic.c standard \ compile-with "${NORMAL_C:N-Wmissing-prototypes}" arm/arm/support.S standard arm/arm/swtch.S standard arm/arm/sys_machdep.c standard arm/arm/syscall.c standard arm/arm/trap.c optional !armv6 arm/arm/trap-v6.c optional armv6 arm/arm/uio_machdep.c standard arm/arm/undefined.c standard -arm/arm/unwind.c optional ddb +arm/arm/unwind.c optional ddb | kdtrace_hooks arm/arm/vm_machdep.c standard arm/arm/vfp.c standard board_id.h standard \ dependency "$S/arm/conf/genboardid.awk $S/arm/conf/mach-types" \ compile-with "${AWK} -f $S/arm/conf/genboardid.awk $S/arm/conf/mach-types > board_id.h" \ no-obj no-implicit-rule before-depend \ clean "board_id.h" cddl/compat/opensolaris/kern/opensolaris_atomic.c optional zfs compile-with "${ZFS_C}" crypto/blowfish/bf_enc.c optional crypto | ipsec crypto/des/des_enc.c optional crypto | ipsec | netsmb dev/fb/fb.c optional sc dev/fdt/fdt_arm_platform.c optional platform fdt dev/hwpmc/hwpmc_arm.c optional hwpmc dev/hwpmc/hwpmc_armv7.c optional hwpmc dev/kbd/kbd.c optional sc | vt dev/syscons/scgfbrndr.c optional sc dev/syscons/scterm-teken.c optional sc dev/syscons/scvtb.c optional sc dev/uart/uart_cpu_fdt.c optional uart fdt font.h optional sc \ compile-with "uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x16.fnt && file2c 'u_char dflt_font_16[16*256] = {' '};' < ${SC_DFLT_FONT}-8x16 > font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x14.fnt && file2c 'u_char dflt_font_14[14*256] = {' '};' < ${SC_DFLT_FONT}-8x14 >> font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x8.fnt && file2c 'u_char dflt_font_8[8*256] = {' '};' < ${SC_DFLT_FONT}-8x8 >> font.h" \ no-obj no-implicit-rule before-depend \ clean "font.h ${SC_DFLT_FONT}-8x14 ${SC_DFLT_FONT}-8x16 ${SC_DFLT_FONT}-8x8" 
kern/subr_busdma_bufalloc.c standard kern/subr_dummy_vdso_tc.c standard kern/subr_sfbuf.c standard libkern/arm/aeabi_unwind.c standard libkern/arm/divsi3.S standard libkern/arm/ffs.S standard libkern/arm/ldivmod.S standard libkern/arm/ldivmod_helper.c standard libkern/arm/memcpy.S standard libkern/arm/memset.S standard libkern/arm/muldi3.c standard libkern/ashldi3.c standard libkern/ashrdi3.c standard libkern/divdi3.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c standard libkern/flsll.c standard libkern/lshrdi3.c standard libkern/moddi3.c standard libkern/qdivrem.c standard libkern/ucmpdi2.c standard libkern/udivdi3.c standard libkern/umoddi3.c standard Index: projects/ifnet/sys/conf/kmod.mk =================================================================== --- projects/ifnet/sys/conf/kmod.mk (revision 279031) +++ projects/ifnet/sys/conf/kmod.mk (revision 279032) @@ -1,456 +1,457 @@ # From: @(#)bsd.prog.mk 5.26 (Berkeley) 6/25/91 # $FreeBSD$ # # The include file handles building and installing loadable # kernel modules. # # # +++ variables +++ # # CLEANFILES Additional files to remove for the clean and cleandir targets. # # EXPORT_SYMS A list of symbols that should be exported from the module, # or the name of a file containing a list of symbols, or YES # to export all symbols. If not defined, no symbols are # exported. # # KMOD The name of the kernel module to build. # # KMODDIR Base path for kernel modules (see kld(4)). [/boot/kernel] # # KMODOWN Module file owner. [${BINOWN}] # # KMODGRP Module file group. [${BINGRP}] # # KMODMODE Module file mode. [${BINMODE}] # # KMODLOAD Command to load a kernel module [/sbin/kldload] # # KMODUNLOAD Command to unload a kernel module [/sbin/kldunload] # # MFILES Optionally a list of interfaces used by the module. # This file contains a default list of interfaces. # # PROG The name of the kernel module to build. # If not supplied, ${KMOD}.ko is used. # # SRCS List of source files. 
# # FIRMWS List of firmware images in format filename:shortname:version # # FIRMWARE_LICENSE # Set to the name of the license the user has to agree on in # order to use this firmware. See /usr/share/doc/legal # # DESTDIR The tree where the module gets installed. [not set] # # +++ targets +++ # # install: # install the kernel module; if the Makefile # does not itself define the target install, the targets # beforeinstall and afterinstall may also be used to cause # actions immediately before and after the install target # is executed. # # load: # Load a module. # # unload: # Unload a module. # AWK?= awk KMODLOAD?= /sbin/kldload KMODUNLOAD?= /sbin/kldunload OBJCOPY?= objcopy .include # Grab all the options for a kernel build. For backwards compat, we need to # do this after bsd.own.mk. .include "kern.opts.mk" .include .include "config.mk" .SUFFIXES: .out .o .c .cc .cxx .C .y .l .s .S # amd64 and mips use direct linking for kmod, all others use shared binaries .if ${MACHINE_CPUARCH} != amd64 && ${MACHINE_CPUARCH} != mips __KLD_SHARED=yes .else __KLD_SHARED=no .endif .if !empty(CFLAGS:M-O[23s]) && empty(CFLAGS:M-fno-strict-aliasing) CFLAGS+= -fno-strict-aliasing .endif WERROR?= -Werror CFLAGS+= ${WERROR} CFLAGS+= -D_KERNEL CFLAGS+= -DKLD_MODULE # Don't use any standard or source-relative include directories. NOSTDINC= -nostdinc CFLAGS:= ${CFLAGS:N-I*} ${NOSTDINC} ${INCLMAGIC} ${CFLAGS:M-I*} .if defined(KERNBUILDDIR) CFLAGS+= -DHAVE_KERNEL_OPTION_HEADERS -include ${KERNBUILDDIR}/opt_global.h .endif # Add -I paths for system headers. Individual module makefiles don't # need any -I paths for this. Similar defaults for .PATH can't be # set because there are no standard paths for non-headers. CFLAGS+= -I. -I${SYSDIR} # Add -I path for altq headers as they are included via net/if_var.h # for example. 
CFLAGS+= -I${SYSDIR}/contrib/altq CFLAGS.gcc+= -finline-limit=${INLINE_LIMIT} CFLAGS.gcc+= -fms-extensions CFLAGS.gcc+= --param inline-unit-growth=100 CFLAGS.gcc+= --param large-function-growth=1000 # Disallow common variables, and if we end up with commons from # somewhere unexpected, allocate storage for them in the module itself. CFLAGS+= -fno-common LDFLAGS+= -d -warn-common CFLAGS+= ${DEBUG_FLAGS} .if ${MACHINE_CPUARCH} == amd64 CFLAGS+= -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer .endif # Temporary workaround for PR 196407, which contains the fascinating details. # Don't allow clang to use fpu instructions or registers in kernel modules. .if ${MACHINE_CPUARCH} == arm CFLAGS.clang+= -mllvm -arm-use-movt=0 CFLAGS.clang+= -mfpu=none +CFLAGS+= -funwind-tables .endif .if ${MACHINE_CPUARCH} == powerpc CFLAGS+= -mlongcall -fno-omit-frame-pointer .endif .if ${MACHINE_CPUARCH} == mips CFLAGS+= -G0 -fno-pic -mno-abicalls -mlong-calls .endif .if defined(DEBUG) || defined(DEBUG_FLAGS) CTFFLAGS+= -g .endif .if defined(FIRMWS) ${KMOD:S/$/.c/}: ${SYSDIR}/tools/fw_stub.awk ${AWK} -f ${SYSDIR}/tools/fw_stub.awk ${FIRMWS} -m${KMOD} -c${KMOD:S/$/.c/g} \ ${FIRMWARE_LICENSE:C/.+/-l/}${FIRMWARE_LICENSE} SRCS+= ${KMOD:S/$/.c/} CLEANFILES+= ${KMOD:S/$/.c/} .for _firmw in ${FIRMWS} ${_firmw:C/\:.*$/.fwo/}: ${_firmw:C/\:.*$//} @${ECHO} ${_firmw:C/\:.*$//} ${.ALLSRC:M*${_firmw:C/\:.*$//}} @if [ -e ${_firmw:C/\:.*$//} ]; then \ ${LD} -b binary --no-warn-mismatch ${_LDFLAGS} \ -r -d -o ${.TARGET} ${_firmw:C/\:.*$//}; \ else \ ln -s ${.ALLSRC:M*${_firmw:C/\:.*$//}} ${_firmw:C/\:.*$//}; \ ${LD} -b binary --no-warn-mismatch ${_LDFLAGS} \ -r -d -o ${.TARGET} ${_firmw:C/\:.*$//}; \ rm ${_firmw:C/\:.*$//}; \ fi OBJS+= ${_firmw:C/\:.*$/.fwo/} .endfor .endif # Conditionally include SRCS based on kernel config options. 
.for _o in ${KERN_OPTS} SRCS+=${SRCS.${_o}} .endfor OBJS+= ${SRCS:N*.h:R:S/$/.o/g} .if !defined(PROG) PROG= ${KMOD}.ko .endif .if !defined(DEBUG_FLAGS) FULLPROG= ${PROG} .else FULLPROG= ${PROG}.debug ${PROG}: ${FULLPROG} ${PROG}.symbols ${OBJCOPY} --strip-debug --add-gnu-debuglink=${PROG}.symbols\ ${FULLPROG} ${.TARGET} ${PROG}.symbols: ${FULLPROG} ${OBJCOPY} --only-keep-debug ${FULLPROG} ${.TARGET} .endif .if ${__KLD_SHARED} == yes ${FULLPROG}: ${KMOD}.kld ${LD} -Bshareable ${_LDFLAGS} -o ${.TARGET} ${KMOD}.kld .if !defined(DEBUG_FLAGS) ${OBJCOPY} --strip-debug ${.TARGET} .endif .endif EXPORT_SYMS?= NO .if ${EXPORT_SYMS} != YES CLEANFILES+= export_syms .endif .if ${__KLD_SHARED} == yes ${KMOD}.kld: ${OBJS} .else ${FULLPROG}: ${OBJS} .endif ${LD} ${_LDFLAGS} -r -d -o ${.TARGET} ${OBJS} .if ${MK_CTF} != "no" ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${OBJS} .endif .if defined(EXPORT_SYMS) .if ${EXPORT_SYMS} != YES .if ${EXPORT_SYMS} == NO :> export_syms .elif !exists(${.CURDIR}/${EXPORT_SYMS}) echo ${EXPORT_SYMS} > export_syms .else grep -v '^#' < ${EXPORT_SYMS} > export_syms .endif awk -f ${SYSDIR}/conf/kmod_syms.awk ${.TARGET} \ export_syms | xargs -J% ${OBJCOPY} % ${.TARGET} .endif .endif .if !defined(DEBUG_FLAGS) && ${__KLD_SHARED} == no ${OBJCOPY} --strip-debug ${.TARGET} .endif _ILINKS=machine .if ${MACHINE} != ${MACHINE_CPUARCH} _ILINKS+=${MACHINE_CPUARCH} .endif .if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64" _ILINKS+=x86 .endif CLEANFILES+=${_ILINKS} all: objwarn ${PROG} beforedepend: ${_ILINKS} # Ensure that the links exist without depending on it when it exists which # causes all the modules to be rebuilt when the directory pointed to changes. .for _link in ${_ILINKS} .if !exists(${.OBJDIR}/${_link}) ${OBJS}: ${.OBJDIR}/${_link} .endif .endfor # Search for kernel source tree in standard places. .for _dir in ${.CURDIR}/../.. ${.CURDIR}/../../.. 
/sys /usr/src/sys .if !defined(SYSDIR) && exists(${_dir}/kern/) SYSDIR= ${_dir} .endif .endfor .if !defined(SYSDIR) || !exists(${SYSDIR}/kern/) .error "can't find kernel source tree" .endif .for _link in ${_ILINKS} .PHONY: ${_link} ${_link}: ${.OBJDIR}/${_link} ${.OBJDIR}/${_link}: @case ${.TARGET:T} in \ machine) \ path=${SYSDIR}/${MACHINE}/include ;; \ *) \ path=${SYSDIR}/${.TARGET:T}/include ;; \ esac ; \ path=`(cd $$path && /bin/pwd)` ; \ ${ECHO} ${.TARGET:T} "->" $$path ; \ ln -sf $$path ${.TARGET:T} .endfor CLEANFILES+= ${PROG} ${KMOD}.kld ${OBJS} .if defined(DEBUG_FLAGS) CLEANFILES+= ${FULLPROG} ${PROG}.symbols .endif .if !target(install) _INSTALLFLAGS:= ${INSTALLFLAGS} .for ie in ${INSTALLFLAGS_EDIT} _INSTALLFLAGS:= ${_INSTALLFLAGS${ie}} .endfor .if !target(realinstall) realinstall: _kmodinstall .ORDER: beforeinstall _kmodinstall _kmodinstall: ${INSTALL} -o ${KMODOWN} -g ${KMODGRP} -m ${KMODMODE} \ ${_INSTALLFLAGS} ${PROG} ${DESTDIR}${KMODDIR} .if defined(DEBUG_FLAGS) && !defined(INSTALL_NODEBUG) && ${MK_KERNEL_SYMBOLS} != "no" ${INSTALL} -o ${KMODOWN} -g ${KMODGRP} -m ${KMODMODE} \ ${_INSTALLFLAGS} ${PROG}.symbols ${DESTDIR}${KMODDIR} .endif .include .if !defined(NO_XREF) afterinstall: _kldxref .ORDER: realinstall _kldxref .ORDER: _installlinks _kldxref _kldxref: @if type kldxref >/dev/null 2>&1; then \ ${ECHO} kldxref ${DESTDIR}${KMODDIR}; \ kldxref ${DESTDIR}${KMODDIR}; \ fi .endif .endif # !target(realinstall) .endif # !target(install) .if !target(load) load: ${PROG} ${KMODLOAD} -v ${.OBJDIR}/${PROG} .endif .if !target(unload) unload: ${KMODUNLOAD} -v ${PROG} .endif .if defined(KERNBUILDDIR) .PATH: ${KERNBUILDDIR} CFLAGS+= -I${KERNBUILDDIR} .for _src in ${SRCS:Mopt_*.h} CLEANFILES+= ${_src} .if !target(${_src}) ${_src}: ln -sf ${KERNBUILDDIR}/${_src} ${.TARGET} .endif .endfor .else .for _src in ${SRCS:Mopt_*.h} CLEANFILES+= ${_src} .if !target(${_src}) ${_src}: :> ${.TARGET} .endif .endfor .endif # Respect configuration-specific C flags. 
CFLAGS+= ${CONF_CFLAGS} MFILES?= dev/acpica/acpi_if.m dev/acpi_support/acpi_wmi_if.m \ dev/agp/agp_if.m dev/ata/ata_if.m dev/eisa/eisa_if.m \ dev/fb/fb_if.m dev/gpio/gpio_if.m dev/gpio/gpiobus_if.m \ dev/iicbus/iicbb_if.m dev/iicbus/iicbus_if.m \ dev/mbox/mbox_if.m dev/mmc/mmcbr_if.m dev/mmc/mmcbus_if.m \ dev/mii/miibus_if.m dev/mvs/mvs_if.m dev/ofw/ofw_bus_if.m \ dev/pccard/card_if.m dev/pccard/power_if.m dev/pci/pci_if.m \ dev/pci/pcib_if.m dev/ppbus/ppbus_if.m \ dev/sdhci/sdhci_if.m dev/smbus/smbus_if.m dev/spibus/spibus_if.m \ dev/sound/pci/hda/hdac_if.m \ dev/sound/pcm/ac97_if.m dev/sound/pcm/channel_if.m \ dev/sound/pcm/feeder_if.m dev/sound/pcm/mixer_if.m \ dev/sound/midi/mpu_if.m dev/sound/midi/mpufoi_if.m \ dev/sound/midi/synth_if.m dev/usb/usb_if.m isa/isa_if.m \ kern/bus_if.m kern/clock_if.m \ kern/cpufreq_if.m kern/device_if.m kern/serdev_if.m \ libkern/iconv_converter_if.m opencrypto/cryptodev_if.m \ pc98/pc98/canbus_if.m dev/etherswitch/mdio_if.m .for _srcsrc in ${MFILES} .for _ext in c h .for _src in ${SRCS:M${_srcsrc:T:R}.${_ext}} CLEANFILES+= ${_src} .if !target(${_src}) ${_src}: ${SYSDIR}/tools/makeobjops.awk ${SYSDIR}/${_srcsrc} ${AWK} -f ${SYSDIR}/tools/makeobjops.awk ${SYSDIR}/${_srcsrc} -${_ext} .endif .endfor # _src .endfor # _ext .endfor # _srcsrc .if !empty(SRCS:Mvnode_if.c) CLEANFILES+= vnode_if.c vnode_if.c: ${SYSDIR}/tools/vnode_if.awk ${SYSDIR}/kern/vnode_if.src ${AWK} -f ${SYSDIR}/tools/vnode_if.awk ${SYSDIR}/kern/vnode_if.src -c .endif .if !empty(SRCS:Mvnode_if.h) CLEANFILES+= vnode_if.h vnode_if_newproto.h vnode_if_typedef.h vnode_if.h vnode_if_newproto.h vnode_if_typedef.h: ${SYSDIR}/tools/vnode_if.awk \ ${SYSDIR}/kern/vnode_if.src vnode_if.h: vnode_if_newproto.h vnode_if_typedef.h ${AWK} -f ${SYSDIR}/tools/vnode_if.awk ${SYSDIR}/kern/vnode_if.src -h vnode_if_newproto.h: ${AWK} -f ${SYSDIR}/tools/vnode_if.awk ${SYSDIR}/kern/vnode_if.src -p vnode_if_typedef.h: ${AWK} -f ${SYSDIR}/tools/vnode_if.awk ${SYSDIR}/kern/vnode_if.src -q 
.endif .for _i in mii pccard .if !empty(SRCS:M${_i}devs.h) CLEANFILES+= ${_i}devs.h ${_i}devs.h: ${SYSDIR}/tools/${_i}devs2h.awk ${SYSDIR}/dev/${_i}/${_i}devs ${AWK} -f ${SYSDIR}/tools/${_i}devs2h.awk ${SYSDIR}/dev/${_i}/${_i}devs .endif .endfor # _i .if !empty(SRCS:Musbdevs.h) CLEANFILES+= usbdevs.h usbdevs.h: ${SYSDIR}/tools/usbdevs2h.awk ${SYSDIR}/dev/usb/usbdevs ${AWK} -f ${SYSDIR}/tools/usbdevs2h.awk ${SYSDIR}/dev/usb/usbdevs -h .endif .if !empty(SRCS:Musbdevs_data.h) CLEANFILES+= usbdevs_data.h usbdevs_data.h: ${SYSDIR}/tools/usbdevs2h.awk ${SYSDIR}/dev/usb/usbdevs ${AWK} -f ${SYSDIR}/tools/usbdevs2h.awk ${SYSDIR}/dev/usb/usbdevs -d .endif .if !empty(SRCS:Macpi_quirks.h) CLEANFILES+= acpi_quirks.h acpi_quirks.h: ${SYSDIR}/tools/acpi_quirks2h.awk ${SYSDIR}/dev/acpica/acpi_quirks ${AWK} -f ${SYSDIR}/tools/acpi_quirks2h.awk ${SYSDIR}/dev/acpica/acpi_quirks .endif .if !empty(SRCS:Massym.s) CLEANFILES+= assym.s genassym.o assym.s: genassym.o .if defined(KERNBUILDDIR) genassym.o: opt_global.h .endif assym.s: ${SYSDIR}/kern/genassym.sh sh ${SYSDIR}/kern/genassym.sh genassym.o > ${.TARGET} genassym.o: ${SYSDIR}/${MACHINE_CPUARCH}/${MACHINE_CPUARCH}/genassym.c genassym.o: ${SRCS:Mopt_*.h} ${CC} -c ${CFLAGS:N-fno-common} \ ${SYSDIR}/${MACHINE_CPUARCH}/${MACHINE_CPUARCH}/genassym.c .endif lint: ${SRCS} ${LINT} ${LINTKERNFLAGS} ${CFLAGS:M-[DILU]*} ${.ALLSRC:M*.c} .if defined(KERNBUILDDIR) ${OBJS}: opt_global.h .endif .include cleandepend: cleanilinks # .depend needs include links so we remove them only together. 
cleanilinks: rm -f ${_ILINKS} .if !exists(${.OBJDIR}/${DEPENDFILE}) ${OBJS}: ${SRCS:M*.h} .endif .include .include "kern.mk" Index: projects/ifnet/sys/conf =================================================================== --- projects/ifnet/sys/conf (revision 279031) +++ projects/ifnet/sys/conf (revision 279032) Property changes on: projects/ifnet/sys/conf ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/conf:r278980-279031 Index: projects/ifnet/sys/dev/ofw/ofw_cpu.c =================================================================== --- projects/ifnet/sys/dev/ofw/ofw_cpu.c (revision 279031) +++ projects/ifnet/sys/dev/ofw/ofw_cpu.c (revision 279032) @@ -1,265 +1,324 @@ /*- * Copyright (C) 2009 Nathan Whitehorn * Copyright (C) 2015 The FreeBSD Foundation * All rights reserved. * * Portions of this software were developed by Andrew Turner * under sponsorship from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include +#include static int ofw_cpulist_probe(device_t); static int ofw_cpulist_attach(device_t); static const struct ofw_bus_devinfo *ofw_cpulist_get_devinfo(device_t dev, device_t child); static MALLOC_DEFINE(M_OFWCPU, "ofwcpu", "OFW CPU device information"); struct ofw_cpulist_softc { pcell_t sc_addr_cells; }; static device_method_t ofw_cpulist_methods[] = { /* Device interface */ DEVMETHOD(device_probe, ofw_cpulist_probe), DEVMETHOD(device_attach, ofw_cpulist_attach), /* Bus interface */ DEVMETHOD(bus_add_child, bus_generic_add_child), DEVMETHOD(bus_child_pnpinfo_str, ofw_bus_gen_child_pnpinfo_str), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, ofw_cpulist_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), DEVMETHOD_END }; static driver_t ofw_cpulist_driver = { "cpulist", ofw_cpulist_methods, sizeof(struct ofw_cpulist_softc) }; static devclass_t ofw_cpulist_devclass; DRIVER_MODULE(ofw_cpulist, ofwbus, ofw_cpulist_driver, ofw_cpulist_devclass, 0, 0); static int ofw_cpulist_probe(device_t dev) { const char *name; name = ofw_bus_get_name(dev); if (name == NULL || strcmp(name, "cpus") != 0) return 
(ENXIO); device_set_desc(dev, "Open Firmware CPU Group"); return (0); } static int ofw_cpulist_attach(device_t dev) { struct ofw_cpulist_softc *sc; phandle_t root, child; device_t cdev; struct ofw_bus_devinfo *dinfo; sc = device_get_softc(dev); root = ofw_bus_get_node(dev); sc->sc_addr_cells = 1; OF_getencprop(root, "#address-cells", &sc->sc_addr_cells, sizeof(sc->sc_addr_cells)); for (child = OF_child(root); child != 0; child = OF_peer(child)) { dinfo = malloc(sizeof(*dinfo), M_OFWCPU, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(dinfo, child) != 0) { free(dinfo, M_OFWCPU); continue; } cdev = device_add_child(dev, NULL, -1); if (cdev == NULL) { device_printf(dev, "<%s>: device_add_child failed\n", dinfo->obd_name); ofw_bus_gen_destroy_devinfo(dinfo); free(dinfo, M_OFWCPU); continue; } device_set_ivars(cdev, dinfo); } return (bus_generic_attach(dev)); } static const struct ofw_bus_devinfo * ofw_cpulist_get_devinfo(device_t dev, device_t child) { return (device_get_ivars(child)); } static int ofw_cpu_probe(device_t); static int ofw_cpu_attach(device_t); static int ofw_cpu_read_ivar(device_t dev, device_t child, int index, uintptr_t *result); struct ofw_cpu_softc { struct pcpu *sc_cpu_pcpu; uint32_t sc_nominal_mhz; boolean_t sc_reg_valid; pcell_t sc_reg[2]; }; static device_method_t ofw_cpu_methods[] = { /* Device interface */ DEVMETHOD(device_probe, ofw_cpu_probe), DEVMETHOD(device_attach, ofw_cpu_attach), /* Bus interface */ DEVMETHOD(bus_add_child, bus_generic_add_child), DEVMETHOD(bus_read_ivar, ofw_cpu_read_ivar), DEVMETHOD(bus_setup_intr, bus_generic_setup_intr), DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), DEVMETHOD(bus_alloc_resource, bus_generic_alloc_resource), DEVMETHOD(bus_release_resource, bus_generic_release_resource), DEVMETHOD(bus_activate_resource,bus_generic_activate_resource), DEVMETHOD_END }; static driver_t ofw_cpu_driver = { "cpu", ofw_cpu_methods, sizeof(struct ofw_cpu_softc) }; static devclass_t ofw_cpu_devclass; 
DRIVER_MODULE(ofw_cpu, cpulist, ofw_cpu_driver, ofw_cpu_devclass, 0, 0); static int ofw_cpu_probe(device_t dev) { const char *type = ofw_bus_get_type(dev); if (type == NULL || strcmp(type, "cpu") != 0) return (ENXIO); device_set_desc(dev, "Open Firmware CPU"); return (0); } static int ofw_cpu_attach(device_t dev) { struct ofw_cpulist_softc *psc; struct ofw_cpu_softc *sc; phandle_t node; pcell_t cell; int rv; sc = device_get_softc(dev); psc = device_get_softc(device_get_parent(dev)); if (nitems(sc->sc_reg) < psc->sc_addr_cells) { if (bootverbose) device_printf(dev, "Too many address cells\n"); return (EINVAL); } node = ofw_bus_get_node(dev); /* Read and validate the reg property for use later */ sc->sc_reg_valid = false; rv = OF_getencprop(node, "reg", sc->sc_reg, sizeof(sc->sc_reg)); if (rv < 0) device_printf(dev, "missing 'reg' property\n"); else if ((rv % 4) != 0) { if (bootverbose) device_printf(dev, "Malformed reg property\n"); } else if ((rv / 4) != psc->sc_addr_cells) { if (bootverbose) device_printf(dev, "Invalid reg size %u\n", rv); } else sc->sc_reg_valid = true; sc->sc_cpu_pcpu = pcpu_find(device_get_unit(dev)); if (OF_getencprop(node, "clock-frequency", &cell, sizeof(cell)) < 0) { if (bootverbose) device_printf(dev, "missing 'clock-frequency' property\n"); } else sc->sc_nominal_mhz = cell / 1000000; /* convert to MHz */ bus_generic_probe(dev); return (bus_generic_attach(dev)); } static int ofw_cpu_read_ivar(device_t dev, device_t child, int index, uintptr_t *result) { + struct ofw_cpulist_softc *psc; struct ofw_cpu_softc *sc; sc = device_get_softc(dev); switch (index) { case CPU_IVAR_PCPU: *result = (uintptr_t)sc->sc_cpu_pcpu; return (0); case CPU_IVAR_NOMINAL_MHZ: if (sc->sc_nominal_mhz > 0) { *result = (uintptr_t)sc->sc_nominal_mhz; return (0); } break; + case CPU_IVAR_CPUID_SIZE: + psc = device_get_softc(device_get_parent(dev)); + *result = psc->sc_addr_cells; + return (0); + case CPU_IVAR_CPUID: + if (sc->sc_reg_valid) { + *result = 
(uintptr_t)sc->sc_reg; + return (0); + } + break; } return (ENOENT); } +int +ofw_cpu_early_foreach(ofw_cpu_foreach_cb callback, boolean_t only_runnable) +{ + phandle_t node, child; + pcell_t addr_cells, reg[2]; + char status[16]; + u_int id; + int count, rv; + + count = 0; + id = 0; + + node = OF_finddevice("/cpus"); + if (node == -1) + return (-1); + + /* Find the number of cells in the cpu register */ + if (OF_getencprop(node, "#address-cells", &addr_cells, + sizeof(addr_cells)) < 0) + return (-1); + + for (child = OF_child(node); child != 0; child = OF_peer(child), id++) { + /* + * If we are filtering by runnable then limit to only + * those that have been enabled. + */ + if (only_runnable) { + status[0] = '\0'; + OF_getprop(child, "status", status, sizeof(status)); + if (status[0] != '\0' && strcmp(status, "okay") != 0) + continue; + } + + /* + * Check we have a register to identify the cpu + */ + rv = OF_getencprop(child, "reg", reg, + addr_cells * sizeof(cell_t)); + if (rv != addr_cells * sizeof(cell_t)) + continue; + + if (callback == NULL || callback(id, child, addr_cells, reg)) + count++; + } + + return (only_runnable ? count : id); +} Index: projects/ifnet/sys/dev/ofw/ofw_cpu.h =================================================================== --- projects/ifnet/sys/dev/ofw/ofw_cpu.h (nonexistent) +++ projects/ifnet/sys/dev/ofw/ofw_cpu.h (revision 279032) @@ -0,0 +1,38 @@ +/*- + * Copyright (c) 2015 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Andrew Turner under + * sponsorship from the FreeBSD Foundation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _DEV_OFW_OFW_CPU_H_ +#define _DEV_OFW_OFW_CPU_H_ + +typedef boolean_t (*ofw_cpu_foreach_cb)(u_int, phandle_t, u_int, pcell_t *); +int ofw_cpu_early_foreach(ofw_cpu_foreach_cb, boolean_t); + +#endif /* _DEV_OFW_OFW_CPU_H_ */ Property changes on: projects/ifnet/sys/dev/ofw/ofw_cpu.h ___________________________________________________________________ Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: projects/ifnet/sys/kern/kern_ctf.c =================================================================== --- projects/ifnet/sys/kern/kern_ctf.c (revision 279031) +++ projects/ifnet/sys/kern/kern_ctf.c (revision 279032) @@ -1,340 +1,326 @@ /*- * Copyright (c) 2008 John Birrell * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Note this file is included by both link_elf.c and link_elf_obj.c. * * The CTF header structure definition can't be used here because it's * (annoyingly) covered by the CDDL. We will just use a few bytes from * it as an integer array where we 'know' what they mean. 
*/ #define CTF_HDR_SIZE 36 #define CTF_HDR_STRTAB_U32 7 #define CTF_HDR_STRLEN_U32 8 #ifdef DDB_CTF static void * z_alloc(void *nil, u_int items, u_int size) { void *ptr; ptr = malloc(items * size, M_TEMP, M_NOWAIT); return ptr; } static void z_free(void *nil, void *ptr) { free(ptr, M_TEMP); } #endif static int link_elf_ctf_get(linker_file_t lf, linker_ctf_t *lc) { #ifdef DDB_CTF Elf_Ehdr *hdr = NULL; Elf_Shdr *shdr = NULL; caddr_t ctftab = NULL; caddr_t raw = NULL; caddr_t shstrtab = NULL; elf_file_t ef = (elf_file_t) lf; int flags; int i; int nbytes; ssize_t resid; size_t sz; struct nameidata nd; struct thread *td = curthread; uint8_t ctf_hdr[CTF_HDR_SIZE]; #endif int error = 0; if (lf == NULL || lc == NULL) return (EINVAL); /* Set the defaults for no CTF present. That's not a crime! */ bzero(lc, sizeof(*lc)); #ifdef DDB_CTF /* * First check if we've tried to load CTF data previously and the * CTF ELF section wasn't found. We flag that condition by setting * ctfcnt to -1. See below. */ if (ef->ctfcnt < 0) return (EFTYPE); /* Now check if we've already loaded the CTF data. */ if (ef->ctfcnt > 0) { /* We only need to load once. */ lc->ctftab = ef->ctftab; lc->ctfcnt = ef->ctfcnt; lc->symtab = ef->ddbsymtab; lc->strtab = ef->ddbstrtab; lc->strcnt = ef->ddbstrcnt; lc->nsym = ef->ddbsymcnt; lc->ctfoffp = (uint32_t **) &ef->ctfoff; lc->typoffp = (uint32_t **) &ef->typoff; lc->typlenp = &ef->typlen; return (0); } /* * We need to try reading the CTF data. Flag no CTF data present * by default and if we actually succeed in reading it, we'll * update ctfcnt to the number of bytes read. */ ef->ctfcnt = -1; NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, lf->pathname, td); flags = FREAD; error = vn_open(&nd, &flags, 0, NULL); if (error) return (error); NDFREE(&nd, NDF_ONLY_PNBUF); /* Allocate memory for the ELF header.
*/ - if ((hdr = malloc(sizeof(*hdr), M_LINKER, M_WAITOK)) == NULL) { - error = ENOMEM; - goto out; - } + hdr = malloc(sizeof(*hdr), M_LINKER, M_WAITOK); /* Read the ELF header. */ if ((error = vn_rdwr(UIO_READ, nd.ni_vp, hdr, sizeof(*hdr), 0, UIO_SYSSPACE, IO_NODELOCKED, td->td_ucred, NOCRED, &resid, td)) != 0) goto out; /* Sanity check. */ if (!IS_ELF(*hdr)) { error = ENOEXEC; goto out; } nbytes = hdr->e_shnum * hdr->e_shentsize; if (nbytes == 0 || hdr->e_shoff == 0 || hdr->e_shentsize != sizeof(Elf_Shdr)) { error = ENOEXEC; goto out; } /* Allocate memory for all the section headers */ - if ((shdr = malloc(nbytes, M_LINKER, M_WAITOK)) == NULL) { - error = ENOMEM; - goto out; - } + shdr = malloc(nbytes, M_LINKER, M_WAITOK); /* Read all the section headers */ if ((error = vn_rdwr(UIO_READ, nd.ni_vp, (caddr_t)shdr, nbytes, hdr->e_shoff, UIO_SYSSPACE, IO_NODELOCKED, td->td_ucred, NOCRED, &resid, td)) != 0) goto out; /* * We need to search for the CTF section by name, so if the * section names aren't present, then we can't locate the * .SUNW_ctf section containing the CTF data. */ if (hdr->e_shstrndx == 0 || shdr[hdr->e_shstrndx].sh_type != SHT_STRTAB) { printf("%s(%d): module %s e_shstrndx is %d, sh_type is %d\n", __func__, __LINE__, lf->pathname, hdr->e_shstrndx, shdr[hdr->e_shstrndx].sh_type); error = EFTYPE; goto out; } /* Allocate memory to buffer the section header strings. */ - if ((shstrtab = malloc(shdr[hdr->e_shstrndx].sh_size, M_LINKER, - M_WAITOK)) == NULL) { - error = ENOMEM; - goto out; - } + shstrtab = malloc(shdr[hdr->e_shstrndx].sh_size, M_LINKER, M_WAITOK); /* Read the section header strings. */ if ((error = vn_rdwr(UIO_READ, nd.ni_vp, shstrtab, shdr[hdr->e_shstrndx].sh_size, shdr[hdr->e_shstrndx].sh_offset, UIO_SYSSPACE, IO_NODELOCKED, td->td_ucred, NOCRED, &resid, td)) != 0) goto out; /* Search for the section containing the CTF data. 
*/ for (i = 0; i < hdr->e_shnum; i++) if (strcmp(".SUNW_ctf", shstrtab + shdr[i].sh_name) == 0) break; /* Check if the CTF section wasn't found. */ if (i >= hdr->e_shnum) { printf("%s(%d): module %s has no .SUNW_ctf section\n", __func__, __LINE__, lf->pathname); error = EFTYPE; goto out; } /* Read the CTF header. */ if ((error = vn_rdwr(UIO_READ, nd.ni_vp, ctf_hdr, sizeof(ctf_hdr), shdr[i].sh_offset, UIO_SYSSPACE, IO_NODELOCKED, td->td_ucred, NOCRED, &resid, td)) != 0) goto out; /* Check the CTF magic number. (XXX check for big endian!) */ if (ctf_hdr[0] != 0xf1 || ctf_hdr[1] != 0xcf) { printf("%s(%d): module %s has invalid format\n", __func__, __LINE__, lf->pathname); error = EFTYPE; goto out; } /* Check if version 2. */ if (ctf_hdr[2] != 2) { printf("%s(%d): module %s CTF format version is %d " "(2 expected)\n", __func__, __LINE__, lf->pathname, ctf_hdr[2]); error = EFTYPE; goto out; } /* Check if the data is compressed. */ if ((ctf_hdr[3] & 0x1) != 0) { uint32_t *u32 = (uint32_t *) ctf_hdr; /* * The last two fields in the CTF header are the offset * from the end of the header to the start of the string * data and the length of that string data. Use this * information to determine the decompressed CTF data * buffer required. */ sz = u32[CTF_HDR_STRTAB_U32] + u32[CTF_HDR_STRLEN_U32] + sizeof(ctf_hdr); /* * Allocate memory for the compressed CTF data, including * the header (which isn't compressed). */ - if ((raw = malloc(shdr[i].sh_size, M_LINKER, M_WAITOK)) == NULL) { - error = ENOMEM; - goto out; - } + raw = malloc(shdr[i].sh_size, M_LINKER, M_WAITOK); } else { /* * The CTF data is not compressed, so the ELF section * size is the same as the buffer size required. */ sz = shdr[i].sh_size; } /* * Allocate memory to buffer the CTF data in its decompressed * form.
*/ - if ((ctftab = malloc(sz, M_LINKER, M_WAITOK)) == NULL) { - error = ENOMEM; - goto out; - } + ctftab = malloc(sz, M_LINKER, M_WAITOK); /* * Read the CTF data into the raw buffer if compressed, or * directly into the CTF buffer otherwise. */ if ((error = vn_rdwr(UIO_READ, nd.ni_vp, raw == NULL ? ctftab : raw, shdr[i].sh_size, shdr[i].sh_offset, UIO_SYSSPACE, IO_NODELOCKED, td->td_ucred, NOCRED, &resid, td)) != 0) goto out; /* Check if decompression is required. */ if (raw != NULL) { z_stream zs; int ret; /* * The header isn't compressed, so copy that into the * CTF buffer first. */ bcopy(ctf_hdr, ctftab, sizeof(ctf_hdr)); /* Initialise the zlib structure. */ bzero(&zs, sizeof(zs)); zs.zalloc = z_alloc; zs.zfree = z_free; if (inflateInit(&zs) != Z_OK) { error = EIO; goto out; } zs.avail_in = shdr[i].sh_size - sizeof(ctf_hdr); zs.next_in = ((uint8_t *) raw) + sizeof(ctf_hdr); zs.avail_out = sz - sizeof(ctf_hdr); zs.next_out = ((uint8_t *) ctftab) + sizeof(ctf_hdr); - if ((ret = inflate(&zs, Z_FINISH)) != Z_STREAM_END) { + ret = inflate(&zs, Z_FINISH); + inflateEnd(&zs); + if (ret != Z_STREAM_END) { printf("%s(%d): zlib inflate returned %d\n", __func__, __LINE__, ret); error = EIO; goto out; } } /* Got the CTF data! */ ef->ctftab = ctftab; ef->ctfcnt = shdr[i].sh_size; /* We'll retain the memory allocated for the CTF data. */ ctftab = NULL; /* Let the caller use the CTF data read. 
*/ lc->ctftab = ef->ctftab; lc->ctfcnt = ef->ctfcnt; lc->symtab = ef->ddbsymtab; lc->strtab = ef->ddbstrtab; lc->strcnt = ef->ddbstrcnt; lc->nsym = ef->ddbsymcnt; lc->ctfoffp = (uint32_t **) &ef->ctfoff; lc->typoffp = (uint32_t **) &ef->typoff; lc->typlenp = &ef->typlen; out: VOP_UNLOCK(nd.ni_vp, 0); vn_close(nd.ni_vp, FREAD, td->td_ucred, td); if (hdr != NULL) free(hdr, M_LINKER); if (shdr != NULL) free(shdr, M_LINKER); if (shstrtab != NULL) free(shstrtab, M_LINKER); if (ctftab != NULL) free(ctftab, M_LINKER); if (raw != NULL) free(raw, M_LINKER); #else error = EOPNOTSUPP; #endif return (error); } Index: projects/ifnet/sys/net/if_var.h =================================================================== --- projects/ifnet/sys/net/if_var.h (revision 279031) +++ projects/ifnet/sys/net/if_var.h (revision 279032) @@ -1,541 +1,535 @@ /*- * Copyright (c) 1982, 1986, 1989, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * From: @(#)if.h 8.1 (Berkeley) 6/10/93 * $FreeBSD$ */ #ifndef _NET_IF_VAR_H_ #define _NET_IF_VAR_H_ struct rtentry; /* ifa_rtrequest */ struct rt_addrinfo; /* ifa_rtrequest */ struct socket; struct carp_if; struct carp_softc; struct ifvlantrunk; struct ifmedia; struct netmap_adapter; #ifdef _KERNEL #include /* ifqueue only? */ #include #include #endif /* _KERNEL */ #include #include /* XXX */ #include /* struct ifqueue */ #include /* XXX */ #include /* XXX */ #include /* if_link_task */ #include TAILQ_HEAD(ifnethead, ifnet); /* we use TAILQs so that the order of */ TAILQ_HEAD(ifaddrhead, ifaddr); /* instantiation is preserved in the list */ TAILQ_HEAD(ifmultihead, ifmultiaddr); TAILQ_HEAD(ifgrouphead, ifg_group); #ifdef _KERNEL VNET_DECLARE(struct pfil_head, link_pfil_hook); /* packet filter hooks */ #define V_link_pfil_hook VNET(link_pfil_hook) #endif /* _KERNEL */ typedef void (*iftype_attach_t)(if_t ifp, struct if_attach_args *args); typedef void (*iftype_detach_t)(if_t ifp); struct iftype { const ifType ift_type; SLIST_ENTRY(iftype) ift_next; iftype_attach_t ift_attach; iftype_detach_t ift_detach; uint8_t ift_hdrlen; uint8_t ift_addrlen; uint32_t ift_dlt; uint32_t ift_dlt_hdrlen; struct ifops ift_ops; }; /* * Structure defining a network interface. * * (Would like to call this struct ``if'', but C isn't PL/1.) 
*/ struct ifnet { struct ifops *if_ops; /* driver ops (or overridden) */ void *if_softc; /* driver soft state */ struct ifdriver *if_drv; /* driver static definition */ struct iftype *if_type; /* if type static def (optional)*/ struct iftsomax *if_tsomax; /* TSO limits */ /* General book keeping of interface lists. */ TAILQ_ENTRY(ifnet) if_link; /* all struct ifnets are chained */ LIST_ENTRY(ifnet) if_clones; /* interfaces of a cloner */ TAILQ_HEAD(, ifg_list) if_groups; /* linked list of groups per if */ /* protected by if_addr_lock */ void *if_llsoftc; /* link layer softc */ void *if_l2com; /* pointer to protocol bits */ int if_dunit; /* unit or IF_DUNIT_NONE */ u_short if_index; /* numeric abbreviation for this if */ short if_index_reserved; /* spare space to grow if_index */ char if_xname[IFNAMSIZ]; /* external name (name + unit) */ char *if_description; /* interface description */ /* Variable fields that are touched by the stack and drivers. */ uint32_t if_flags; /* up/down, broadcast, etc. */ uint32_t if_capabilities;/* interface features & capabilities */ uint32_t if_capenable; /* enabled features & capabilities */ void *if_linkmib; /* link-type-specific MIB data */ size_t if_linkmiblen; /* length of above data */ u_int if_refcount; /* reference count */ u_int if_fib; /* interface FIB */ uint8_t if_link_state; /* current link state */ uint32_t if_mtu; /* maximum transmission unit */ uint32_t if_metric; /* routing metric (external only) */ uint64_t if_baudrate; /* linespeed */ uint64_t if_hwassist; /* HW offload capabilities, see IFCAP */ time_t if_epoch; /* uptime at attach or stat reset */ struct timeval if_lastchange; /* time of last administrative change */ struct task if_linktask; /* task for link change events */ /* Addresses of different protocol families assigned to this if. */ struct rwlock if_addr_lock; /* lock to protect address lists */ /* * if_addrhead is the list of all addresses associated to * an interface. 
* Some code in the kernel assumes that first element * of the list has type AF_LINK, and contains sockaddr_dl * addresses which store the link-level address and the name * of the interface. * However, access to the AF_LINK address through this * field is deprecated. Use if_addr or ifaddr_byindex() instead. */ struct ifaddrhead if_addrhead; /* linked list of addresses per if */ struct ifmultihead if_multiaddrs; /* multicast addresses configured */ int if_amcount; /* number of all-multicast requests */ struct ifaddr *if_addr; /* pointer to link-level address */ const u_int8_t *if_broadcastaddr; /* linklevel broadcast bytestring */ struct rwlock if_afdata_lock; void *if_afdata[AF_MAX]; int if_afdata_initialized; /* Additional features hung off the interface. */ struct ifqueue *if_snd; /* software send queue */ struct vnet *if_vnet; /* pointer to network stack instance */ struct vnet *if_home_vnet; /* where this ifnet originates from */ struct ifvlantrunk *if_vlantrunk; /* pointer to 802.1q data */ struct bpf_if *if_bpf; /* packet filter structure */ int if_pcount; /* number of promiscuous listeners */ void *if_bridge; /* bridge glue */ void *if_lagg; /* lagg glue */ void *if_pf_kif; /* pf glue */ struct carp_if *if_carp; /* carp interface structure */ struct label *if_label; /* interface MAC label */ struct netmap_adapter *if_netmap; /* netmap(4) softc */ counter_u64_t if_counters[IFCOUNTERS]; /* Statistics */ /* * Spare fields to be added before branching a stable branch, so * that structure can be enhanced without changing the kernel * binary interface. */ }; /* * Locks for address lists on the network interface. 
*/ #define IF_ADDR_LOCK_INIT(if) rw_init(&(if)->if_addr_lock, "if_addr_lock") #define IF_ADDR_LOCK_DESTROY(if) rw_destroy(&(if)->if_addr_lock) #define IF_ADDR_WLOCK(if) rw_wlock(&(if)->if_addr_lock) #define IF_ADDR_WUNLOCK(if) rw_wunlock(&(if)->if_addr_lock) #define IF_ADDR_RLOCK(if) rw_rlock(&(if)->if_addr_lock) #define IF_ADDR_RUNLOCK(if) rw_runlock(&(if)->if_addr_lock) #define IF_ADDR_LOCK_ASSERT(if) rw_assert(&(if)->if_addr_lock, RA_LOCKED) #define IF_ADDR_WLOCK_ASSERT(if) rw_assert(&(if)->if_addr_lock, RA_WLOCKED) #ifdef _KERNEL #ifdef _SYS_EVENTHANDLER_H_ /* interface link layer address change event */ typedef void (*iflladdr_event_handler_t)(void *, struct ifnet *); EVENTHANDLER_DECLARE(iflladdr_event, iflladdr_event_handler_t); /* interface address change event */ typedef void (*ifaddr_event_handler_t)(void *, struct ifnet *); EVENTHANDLER_DECLARE(ifaddr_event, ifaddr_event_handler_t); /* new interface arrival event */ typedef void (*ifnet_arrival_event_handler_t)(void *, struct ifnet *); EVENTHANDLER_DECLARE(ifnet_arrival_event, ifnet_arrival_event_handler_t); /* interface departure event */ typedef void (*ifnet_departure_event_handler_t)(void *, struct ifnet *); EVENTHANDLER_DECLARE(ifnet_departure_event, ifnet_departure_event_handler_t); /* Interface link state change event */ typedef void (*ifnet_link_event_handler_t)(void *, struct ifnet *, int); EVENTHANDLER_DECLARE(ifnet_link_event, ifnet_link_event_handler_t); #endif /* _SYS_EVENTHANDLER_H_ */ /* * interface groups */ struct ifg_group { char ifg_group[IFNAMSIZ]; u_int ifg_refcnt; void *ifg_pf_kif; TAILQ_HEAD(, ifg_member) ifg_members; TAILQ_ENTRY(ifg_group) ifg_next; }; struct ifg_member { TAILQ_ENTRY(ifg_member) ifgm_next; struct ifnet *ifgm_ifp; }; struct ifg_list { struct ifg_group *ifgl_group; TAILQ_ENTRY(ifg_list) ifgl_next; }; #ifdef _SYS_EVENTHANDLER_H_ /* group attach event */ typedef void (*group_attach_event_handler_t)(void *, struct ifg_group *); EVENTHANDLER_DECLARE(group_attach_event, 
group_attach_event_handler_t); /* group detach event */ typedef void (*group_detach_event_handler_t)(void *, struct ifg_group *); EVENTHANDLER_DECLARE(group_detach_event, group_detach_event_handler_t); /* group change event */ typedef void (*group_change_event_handler_t)(void *, const char *); EVENTHANDLER_DECLARE(group_change_event, group_change_event_handler_t); #endif /* _SYS_EVENTHANDLER_H_ */ #define IF_AFDATA_LOCK_INIT(ifp) \ rw_init(&(ifp)->if_afdata_lock, "if_afdata") #define IF_AFDATA_WLOCK(ifp) rw_wlock(&(ifp)->if_afdata_lock) #define IF_AFDATA_RLOCK(ifp) rw_rlock(&(ifp)->if_afdata_lock) #define IF_AFDATA_WUNLOCK(ifp) rw_wunlock(&(ifp)->if_afdata_lock) #define IF_AFDATA_RUNLOCK(ifp) rw_runlock(&(ifp)->if_afdata_lock) #define IF_AFDATA_LOCK(ifp) IF_AFDATA_WLOCK(ifp) #define IF_AFDATA_UNLOCK(ifp) IF_AFDATA_WUNLOCK(ifp) #define IF_AFDATA_TRYLOCK(ifp) rw_try_wlock(&(ifp)->if_afdata_lock) #define IF_AFDATA_DESTROY(ifp) rw_destroy(&(ifp)->if_afdata_lock) #define IF_AFDATA_LOCK_ASSERT(ifp) rw_assert(&(ifp)->if_afdata_lock, RA_LOCKED) #define IF_AFDATA_RLOCK_ASSERT(ifp) rw_assert(&(ifp)->if_afdata_lock, RA_RLOCKED) #define IF_AFDATA_WLOCK_ASSERT(ifp) rw_assert(&(ifp)->if_afdata_lock, RA_WLOCKED) #define IF_AFDATA_UNLOCK_ASSERT(ifp) rw_assert(&(ifp)->if_afdata_lock, RA_UNLOCKED) /* * 72 was chosen below because it is the size of a TCP/IP * header (40) + the minimum mss (32). */ #define IF_MINMTU 72 #define IF_MAXMTU 65535 #define TOEDEV(ifp) ((ifp)->if_llsoftc) -#endif /* _KERNEL */ - /* * The ifaddr structure contains information about one address * of an interface. They are maintained by the different address families, * are allocated and attached when an address is set, and are linked * together so all addresses for an interface can be located. * * NOTE: a 'struct ifaddr' is always at the beginning of a larger * chunk of malloc'ed memory, where we store the three addresses * (ifa_addr, ifa_dstaddr and ifa_netmask) referenced here. 
*/ -#if defined(_KERNEL) || defined(_WANT_IFADDR) struct ifaddr { struct sockaddr *ifa_addr; /* address of interface */ struct sockaddr *ifa_dstaddr; /* other end of p-to-p link */ #define ifa_broadaddr ifa_dstaddr /* broadcast address interface */ struct sockaddr *ifa_netmask; /* used to determine subnet */ struct ifnet *ifa_ifp; /* back-pointer to interface */ struct carp_softc *ifa_carp; /* pointer to CARP data */ TAILQ_ENTRY(ifaddr) ifa_link; /* queue macro glue */ void (*ifa_rtrequest) /* check or clean routes (+ or -)'d */ (int, struct rtentry *, struct rt_addrinfo *); u_short ifa_flags; /* mostly rt_flags for cloning */ +#define IFA_ROUTE RTF_UP /* route installed */ +#define IFA_RTSELF RTF_HOST /* loopback route to self installed */ u_int ifa_refcnt; /* references to this structure */ counter_u64_t ifa_ipackets; counter_u64_t ifa_opackets; counter_u64_t ifa_ibytes; counter_u64_t ifa_obytes; }; -#endif - -#ifdef _KERNEL -#define IFA_ROUTE RTF_UP /* route installed */ -#define IFA_RTSELF RTF_HOST /* loopback route to self installed */ /* For compatibility with other BSDs. SCTP uses it. */ #define ifa_list ifa_link struct ifaddr * ifa_alloc(size_t size, int flags); void ifa_free(struct ifaddr *ifa); void ifa_ref(struct ifaddr *ifa); #endif /* _KERNEL */ /* * Multicast address structure. This is analogous to the ifaddr * structure except that it keeps track of multicast addresses. 
*/ struct ifmultiaddr { TAILQ_ENTRY(ifmultiaddr) ifma_link; /* queue macro glue */ struct sockaddr *ifma_addr; /* address this membership is for */ struct sockaddr *ifma_lladdr; /* link-layer translation, if any */ struct ifnet *ifma_ifp; /* back-pointer to interface */ u_int ifma_refcount; /* reference count */ void *ifma_protospec; /* protocol-specific state, if any */ struct ifmultiaddr *ifma_llifma; /* pointer to ifma for ifma_lladdr */ }; #ifdef _KERNEL extern struct rwlock ifnet_rwlock; extern struct sx ifnet_sxlock; #define IFNET_WLOCK() do { \ sx_xlock(&ifnet_sxlock); \ rw_wlock(&ifnet_rwlock); \ } while (0) #define IFNET_WUNLOCK() do { \ rw_wunlock(&ifnet_rwlock); \ sx_xunlock(&ifnet_sxlock); \ } while (0) /* * To assert the ifnet lock, you must know not only whether it's for read or * write, but also whether it was acquired with sleep support or not. */ #define IFNET_RLOCK_ASSERT() sx_assert(&ifnet_sxlock, SA_SLOCKED) #define IFNET_RLOCK_NOSLEEP_ASSERT() rw_assert(&ifnet_rwlock, RA_RLOCKED) #define IFNET_WLOCK_ASSERT() do { \ sx_assert(&ifnet_sxlock, SA_XLOCKED); \ rw_assert(&ifnet_rwlock, RA_WLOCKED); \ } while (0) #define IFNET_RLOCK() sx_slock(&ifnet_sxlock) #define IFNET_RLOCK_NOSLEEP() rw_rlock(&ifnet_rwlock) #define IFNET_RUNLOCK() sx_sunlock(&ifnet_sxlock) #define IFNET_RUNLOCK_NOSLEEP() rw_runlock(&ifnet_rwlock) /* * Look up an ifnet given its index; the _ref variant also acquires a * reference that must be freed using if_rele(). It is almost always a bug * to call ifnet_byindex() instead of ifnet_byindex_ref(). */ struct ifnet *ifnet_byindex(u_short idx); struct ifnet *ifnet_byindex_locked(u_short idx); struct ifnet *ifnet_byindex_ref(u_short idx); /* * Given the index, ifaddr_byindex() returns the one and only * link-level ifaddr for the interface. You are not supposed to use * it to traverse the list of addresses associated to the interface.
*/ struct ifaddr *ifaddr_byindex(u_short idx); VNET_DECLARE(struct ifnethead, ifnet); VNET_DECLARE(struct ifgrouphead, ifg_head); VNET_DECLARE(int, if_index); VNET_DECLARE(struct ifnet *, loif); /* first loopback interface */ #define V_ifnet VNET(ifnet) #define V_ifg_head VNET(ifg_head) #define V_if_index VNET(if_index) #define V_loif VNET(loif) int if_addgroup(struct ifnet *, const char *); int if_delgroup(struct ifnet *, const char *); int if_addmulti(struct ifnet *, struct sockaddr *, struct ifmultiaddr **); int if_allmulti(struct ifnet *, int); int if_delmulti(struct ifnet *, struct sockaddr *); void if_delmulti_ifma(struct ifmultiaddr *); void if_vmove(struct ifnet *, struct vnet *); void if_purgeaddrs(struct ifnet *); void if_delallmulti(struct ifnet *); void if_down(struct ifnet *); struct ifmultiaddr * if_findmulti(struct ifnet *, struct sockaddr *); void if_ref(struct ifnet *); void if_rele(struct ifnet *); int if_setlladdr(struct ifnet *, const u_char *, int); void if_up(struct ifnet *); int ifioctl(struct socket *, u_long, caddr_t, struct thread *); int ifpromisc(struct ifnet *, int); struct ifnet *ifunit(const char *); struct ifnet *ifunit_ref(const char *); void iftype_register(struct iftype *); void iftype_unregister(struct iftype *); int ifa_add_loopback_route(struct ifaddr *, struct sockaddr *); int ifa_del_loopback_route(struct ifaddr *, struct sockaddr *); int ifa_switch_loopback_route(struct ifaddr *, struct sockaddr *, int fib); struct ifaddr *ifa_ifwithaddr(struct sockaddr *); int ifa_ifwithaddr_check(struct sockaddr *); struct ifaddr *ifa_ifwithbroadaddr(struct sockaddr *, int); struct ifaddr *ifa_ifwithdstaddr(struct sockaddr *, int); struct ifaddr *ifa_ifwithnet(struct sockaddr *, int, int); struct ifaddr *ifa_ifwithroute(int, struct sockaddr *, struct sockaddr *, u_int); struct ifaddr *ifaof_ifpforaddr(struct sockaddr *, struct ifnet *); int ifa_preferred(struct ifaddr *, struct ifaddr *); int if_simloop(struct ifnet *ifp, struct mbuf *m, 
int af, int hlen); void if_data_copy(struct ifnet *, struct if_data *); int if_getmtu_family(if_t ifp, int family); int if_setupmultiaddr(if_t ifp, void *mta, int *cnt, int max); int if_multiaddr_array(if_t ifp, void *mta, int *cnt, int max); int if_multiaddr_count(if_t ifp, int max); /* TSO */ void if_tsomax_common(const struct iftsomax *, struct iftsomax *); int if_tsomax_update(if_t ifp, const struct iftsomax *); #ifdef DEVICE_POLLING void if_poll_register(struct ifnet *ifp); void if_poll_deregister(struct ifnet *ifp); #endif /* * Wrappers around ifops. Some ops are optional and can be NULL, * others are mandatory. Those wrappers that a driver can invoke * itself are not inlined, but implemented in if.c. */ static inline void if_init(if_t ifp, void *sc) { if (ifp->if_ops->ifop_init != NULL) return (ifp->if_ops->ifop_init(sc)); } #undef if_input static inline void if_input(if_t ifp, struct mbuf *m) { return (ifp->if_ops->ifop_input(ifp, m)); } #undef if_transmit static inline int if_transmit(if_t ifp, struct mbuf *m) { return (ifp->if_ops->ifop_transmit(ifp, m)); } static inline void if_qflush(if_t ifp) { if (ifp->if_ops->ifop_qflush != NULL) ifp->if_ops->ifop_qflush(ifp); } static inline int if_output(if_t ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro) { return (ifp->if_ops->ifop_output(ifp, m, dst, ro)); } static inline int if_ioctl(if_t ifp, u_long cmd, void *data, struct thread *td) { int error = EOPNOTSUPP; if (ifp->if_ops->ifop_ioctl != NULL) error = ifp->if_ops->ifop_ioctl(ifp, cmd, data, td); if (error == EOPNOTSUPP && ifp->if_type != NULL && ifp->if_type->ift_ops.ifop_ioctl != NULL) error = ifp->if_type->ift_ops.ifop_ioctl(ifp, cmd, data, td); return (error); } static inline uint64_t if_get_counter(const if_t ifp, ift_counter cnt) { return (ifp->if_ops->ifop_get_counter(ifp, cnt)); } static inline int if_resolvemulti(if_t ifp, struct sockaddr **llsa, struct sockaddr *sa) { if (ifp->if_ops->ifop_resolvemulti != NULL) return
(ifp->if_ops->ifop_resolvemulti(ifp, llsa, sa)); else return (EOPNOTSUPP); } static inline void if_reassign(if_t ifp, struct vnet *new) { return (ifp->if_ops->ifop_reassign(ifp, new)); } #ifdef DEVICE_POLLING static inline int if_poll(if_t ifp, enum poll_cmd cmd, int count) { return (ifp->if_ops->ifop_poll(ifp, cmd, count)); } #endif /* * Inliners to shorten code, and make protocols more ifnet-agnostic. */ static inline ifType if_type(const if_t ifp) { return (ifp->if_drv->ifdrv_type); } static inline uint8_t if_addrlen(const if_t ifp) { return (ifp->if_drv->ifdrv_addrlen); } #endif /* _KERNEL */ #endif /* !_NET_IF_VAR_H_ */ Index: projects/ifnet/sys/netinet/igmp.c =================================================================== --- projects/ifnet/sys/netinet/igmp.c (revision 279031) +++ projects/ifnet/sys/netinet/igmp.c (revision 279032) @@ -1,3643 +1,3653 @@ /*- * Copyright (c) 2007-2009 Bruce Simpson. * Copyright (c) 1988 Stephen Deering. * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Stephen Deering of Stanford University. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)igmp.c 8.1 (Berkeley) 7/19/93 */ /* * Internet Group Management Protocol (IGMP) routines. * [RFC1112, RFC2236, RFC3376] * * Written by Steve Deering, Stanford, May 1988. * Modified by Rosen Sharma, Stanford, Aug 1994. * Modified by Bill Fenner, Xerox PARC, Feb 1995. * Modified to fully comply to IGMPv2 by Bill Fenner, Oct 1995. * Significantly rewritten for IGMPv3, VIMAGE, and SMP by Bruce Simpson. 
* * MULTICAST Revision: 3.5.1.4 */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef KTR_IGMPV3 #define KTR_IGMPV3 KTR_INET #endif -static struct igmp_ifinfo * +static struct igmp_ifsoftc * igi_alloc_locked(struct ifnet *); static void igi_delete_locked(const struct ifnet *); static void igmp_dispatch_queue(struct mbufq *, int, const int); static void igmp_fasttimo_vnet(void); -static void igmp_final_leave(struct in_multi *, struct igmp_ifinfo *); +static void igmp_final_leave(struct in_multi *, struct igmp_ifsoftc *); static int igmp_handle_state_change(struct in_multi *, - struct igmp_ifinfo *); -static int igmp_initial_join(struct in_multi *, struct igmp_ifinfo *); + struct igmp_ifsoftc *); +static int igmp_initial_join(struct in_multi *, struct igmp_ifsoftc *); static int igmp_input_v1_query(struct ifnet *, const struct ip *, const struct igmp *); static int igmp_input_v2_query(struct ifnet *, const struct ip *, const struct igmp *); static int igmp_input_v3_query(struct ifnet *, const struct ip *, /*const*/ struct igmpv3 *); static int igmp_input_v3_group_query(struct in_multi *, - struct igmp_ifinfo *, int, /*const*/ struct igmpv3 *); + struct igmp_ifsoftc *, int, /*const*/ struct igmpv3 *); static int igmp_input_v1_report(struct ifnet *, /*const*/ struct ip *, /*const*/ struct igmp *); static int igmp_input_v2_report(struct ifnet *, /*const*/ struct ip *, /*const*/ struct igmp *); static void igmp_intr(struct mbuf *); static int igmp_isgroupreported(const struct in_addr); static struct mbuf * igmp_ra_alloc(void); #ifdef KTR static char * igmp_rec_type_to_str(const int); #endif -static void igmp_set_version(struct igmp_ifinfo *, const int); +static void igmp_set_version(struct igmp_ifsoftc *, const int); static void igmp_slowtimo_vnet(void); static 
int igmp_v1v2_queue_report(struct in_multi *, const int); static void igmp_v1v2_process_group_timer(struct in_multi *, const int); -static void igmp_v1v2_process_querier_timers(struct igmp_ifinfo *); +static void igmp_v1v2_process_querier_timers(struct igmp_ifsoftc *); static void igmp_v2_update_group(struct in_multi *, const int); -static void igmp_v3_cancel_link_timers(struct igmp_ifinfo *); -static void igmp_v3_dispatch_general_query(struct igmp_ifinfo *); +static void igmp_v3_cancel_link_timers(struct igmp_ifsoftc *); +static void igmp_v3_dispatch_general_query(struct igmp_ifsoftc *); static struct mbuf * igmp_v3_encap_report(struct ifnet *, struct mbuf *); static int igmp_v3_enqueue_group_record(struct mbufq *, struct in_multi *, const int, const int, const int); static int igmp_v3_enqueue_filter_change(struct mbufq *, struct in_multi *); -static void igmp_v3_process_group_timers(struct igmp_ifinfo *, +static void igmp_v3_process_group_timers(struct igmp_ifsoftc *, struct mbufq *, struct mbufq *, struct in_multi *, const int); static int igmp_v3_merge_state_changes(struct in_multi *, struct mbufq *); static void igmp_v3_suppress_group_record(struct in_multi *); static int sysctl_igmp_default_version(SYSCTL_HANDLER_ARGS); static int sysctl_igmp_gsr(SYSCTL_HANDLER_ARGS); static int sysctl_igmp_ifinfo(SYSCTL_HANDLER_ARGS); static const struct netisr_handler igmp_nh = { .nh_name = "igmp", .nh_handler = igmp_intr, .nh_proto = NETISR_IGMP, .nh_policy = NETISR_POLICY_SOURCE, }; /* * System-wide globals. * * Unlocked access to these is OK, except for the global IGMP output * queue. The IGMP subsystem lock ends up being system-wide for the moment, * because all VIMAGEs have to share a global output queue, as netisrs * themselves are not virtualized. * * Locking: * * The permitted lock order is: IN_MULTI_LOCK, IGMP_LOCK, IF_ADDR_LOCK. * Any may be taken independently; if any are held at the same * time, the above lock order must be followed. 
* * All output is delegated to the netisr. * Now that Giant has been eliminated, the netisr may be inlined. * * IN_MULTI_LOCK covers in_multi. - * * IGMP_LOCK covers igmp_ifinfo and any global variables in this file, + * * IGMP_LOCK covers igmp_ifsoftc and any global variables in this file, * including the output queue. * * IF_ADDR_LOCK covers if_multiaddrs, which is used for a variety of * per-link state iterators. - * * igmp_ifinfo is valid as long as PF_INET is attached to the interface, + * * igmp_ifsoftc is valid as long as PF_INET is attached to the interface, * therefore it is not refcounted. - * We allow unlocked reads of igmp_ifinfo when accessed via in_multi. + * We allow unlocked reads of igmp_ifsoftc when accessed via in_multi. * * Reference counting * * IGMP acquires its own reference every time an in_multi is passed to * it and the group is being joined for the first time. * * IGMP releases its reference(s) on in_multi in a deferred way, * because the operations which process the release run as part of * a loop whose control variables are directly affected by the release * (that, and not recursing on the IF_ADDR_LOCK). * * VIMAGE: Each in_multi corresponds to an ifp, and each ifp corresponds * to a vnet in ifp->if_vnet. * * SMPng: XXX We may potentially race operations on ifma_protospec. * The problem is that we currently lack a clean way of taking the * IF_ADDR_LOCK() between the ifnet and in layers w/o recursing, * as anything which modifies ifma needs to be covered by that lock. * So check for ifma_protospec being NULL before proceeding. */ struct mtx igmp_mtx; struct mbuf *m_raopt; /* Router Alert option */ static MALLOC_DEFINE(M_IGMP, "igmp", "igmp state"); /* * VIMAGE-wide globals. * * The IGMPv3 timers themselves need to run per-image, however, * protosw timers run globally (see tcp). * An ifnet can only be in one vimage at a time, and the loopback * ifnet, loif, is itself virtualized. 
* It would otherwise be possible to seriously hose IGMP state, * and create inconsistencies in upstream multicast routing, if you have * multiple VIMAGEs running on the same link joining different multicast * groups, UNLESS the "primary IP address" is different. This is because * IGMP for IPv4 does not force link-local addresses to be used for each * node, unlike MLD for IPv6. * Obviously the IGMPv3 per-interface state has per-vimage granularity * also as a result. * * FUTURE: Stop using IFP_TO_IA/INADDR_ANY, and use source address selection * policy to control the address used by IGMP on the link. */ static VNET_DEFINE(int, interface_timers_running); /* IGMPv3 general * query response */ static VNET_DEFINE(int, state_change_timers_running); /* IGMPv3 state-change * retransmit */ static VNET_DEFINE(int, current_state_timers_running); /* IGMPv1/v2 host * report; IGMPv3 g/sg * query response */ #define V_interface_timers_running VNET(interface_timers_running) #define V_state_change_timers_running VNET(state_change_timers_running) #define V_current_state_timers_running VNET(current_state_timers_running) -static VNET_DEFINE(LIST_HEAD(, igmp_ifinfo), igi_head); +static VNET_DEFINE(LIST_HEAD(, igmp_ifsoftc), igi_head); static VNET_DEFINE(struct igmpstat, igmpstat) = { .igps_version = IGPS_VERSION_3, .igps_len = sizeof(struct igmpstat), }; static VNET_DEFINE(struct timeval, igmp_gsrdelay) = {10, 0}; #define V_igi_head VNET(igi_head) #define V_igmpstat VNET(igmpstat) #define V_igmp_gsrdelay VNET(igmp_gsrdelay) static VNET_DEFINE(int, igmp_recvifkludge) = 1; static VNET_DEFINE(int, igmp_sendra) = 1; static VNET_DEFINE(int, igmp_sendlocal) = 1; static VNET_DEFINE(int, igmp_v1enable) = 1; static VNET_DEFINE(int, igmp_v2enable) = 1; static VNET_DEFINE(int, igmp_legacysupp); static VNET_DEFINE(int, igmp_default_version) = IGMP_VERSION_3; #define V_igmp_recvifkludge VNET(igmp_recvifkludge) #define V_igmp_sendra VNET(igmp_sendra) #define V_igmp_sendlocal VNET(igmp_sendlocal) 
#define V_igmp_v1enable VNET(igmp_v1enable) #define V_igmp_v2enable VNET(igmp_v2enable) #define V_igmp_legacysupp VNET(igmp_legacysupp) #define V_igmp_default_version VNET(igmp_default_version) /* * Virtualized sysctls. */ SYSCTL_STRUCT(_net_inet_igmp, IGMPCTL_STATS, stats, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmpstat), igmpstat, ""); SYSCTL_INT(_net_inet_igmp, OID_AUTO, recvifkludge, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_recvifkludge), 0, "Rewrite IGMPv1/v2 reports from 0.0.0.0 to contain subnet address"); SYSCTL_INT(_net_inet_igmp, OID_AUTO, sendra, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_sendra), 0, "Send IP Router Alert option in IGMPv2/v3 messages"); SYSCTL_INT(_net_inet_igmp, OID_AUTO, sendlocal, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_sendlocal), 0, "Send IGMP membership reports for 224.0.0.0/24 groups"); SYSCTL_INT(_net_inet_igmp, OID_AUTO, v1enable, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_v1enable), 0, "Enable backwards compatibility with IGMPv1"); SYSCTL_INT(_net_inet_igmp, OID_AUTO, v2enable, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_v2enable), 0, "Enable backwards compatibility with IGMPv2"); SYSCTL_INT(_net_inet_igmp, OID_AUTO, legacysupp, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(igmp_legacysupp), 0, "Allow v1/v2 reports to suppress v3 group responses"); SYSCTL_PROC(_net_inet_igmp, OID_AUTO, default_version, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, &VNET_NAME(igmp_default_version), 0, sysctl_igmp_default_version, "I", "Default version of IGMP to run on each interface"); SYSCTL_PROC(_net_inet_igmp, OID_AUTO, gsrdelay, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, &VNET_NAME(igmp_gsrdelay.tv_sec), 0, sysctl_igmp_gsr, "I", "Rate limit for IGMPv3 Group-and-Source queries in seconds"); /* * Non-virtualized sysctls. 
*/ static SYSCTL_NODE(_net_inet_igmp, OID_AUTO, ifinfo, CTLFLAG_RD | CTLFLAG_MPSAFE, sysctl_igmp_ifinfo, "Per-interface IGMPv3 state"); static __inline void igmp_save_context(struct mbuf *m, struct ifnet *ifp) { #ifdef VIMAGE m->m_pkthdr.PH_loc.ptr = ifp->if_vnet; #endif /* VIMAGE */ m->m_pkthdr.flowid = ifp->if_index; } static __inline void igmp_scrub_context(struct mbuf *m) { m->m_pkthdr.PH_loc.ptr = NULL; m->m_pkthdr.flowid = 0; } #ifdef KTR static __inline char * inet_ntoa_haddr(in_addr_t haddr) { struct in_addr ia; ia.s_addr = htonl(haddr); return (inet_ntoa(ia)); } #endif /* * Restore context from a queued IGMP output chain. * Return saved ifindex. * * VIMAGE: The assertion is there to make sure that we * actually called CURVNET_SET() with what's in the mbuf chain. */ static __inline uint32_t igmp_restore_context(struct mbuf *m) { #ifdef notyet #if defined(VIMAGE) && defined(INVARIANTS) KASSERT(curvnet == (m->m_pkthdr.PH_loc.ptr), ("%s: called when curvnet was not restored", __func__)); #endif #endif return (m->m_pkthdr.flowid); } /* * Retrieve or set default IGMP version. * * VIMAGE: Assume curvnet set by caller. * SMPng: NOTE: Serialized by IGMP lock. */ static int sysctl_igmp_default_version(SYSCTL_HANDLER_ARGS) { int error; int new; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error) return (error); IGMP_LOCK(); new = V_igmp_default_version; error = sysctl_handle_int(oidp, &new, 0, req); if (error || !req->newptr) goto out_locked; if (new < IGMP_VERSION_1 || new > IGMP_VERSION_3) { error = EINVAL; goto out_locked; } CTR2(KTR_IGMPV3, "change igmp_default_version from %d to %d", V_igmp_default_version, new); V_igmp_default_version = new; out_locked: IGMP_UNLOCK(); return (error); } /* * Retrieve or set threshold between group-source queries in seconds. * * VIMAGE: Assume curvnet set by caller. * SMPng: NOTE: Serialized by IGMP lock. 
*/ static int sysctl_igmp_gsr(SYSCTL_HANDLER_ARGS) { int error; int i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error) return (error); IGMP_LOCK(); i = V_igmp_gsrdelay.tv_sec; error = sysctl_handle_int(oidp, &i, 0, req); if (error || !req->newptr) goto out_locked; if (i < -1 || i >= 60) { error = EINVAL; goto out_locked; } CTR2(KTR_IGMPV3, "change igmp_gsrdelay from %d to %d", V_igmp_gsrdelay.tv_sec, i); V_igmp_gsrdelay.tv_sec = i; out_locked: IGMP_UNLOCK(); return (error); } /* - * Expose struct igmp_ifinfo to userland, keyed by ifindex. + * Expose struct igmp_ifsoftc to userland, keyed by ifindex. * For use by ifmcstat(8). * * SMPng: NOTE: Does an unlocked ifindex space read. * VIMAGE: Assume curvnet set by caller. The node handler itself * is not directly virtualized. */ static int sysctl_igmp_ifinfo(SYSCTL_HANDLER_ARGS) { int *name; int error; u_int namelen; struct ifnet *ifp; - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; name = (int *)arg1; namelen = arg2; if (req->newptr != NULL) return (EPERM); if (namelen != 1) return (EINVAL); error = sysctl_wire_old_buffer(req, sizeof(struct igmp_ifinfo)); if (error) return (error); IN_MULTI_LOCK(); IGMP_LOCK(); if (name[0] <= 0 || name[0] > V_if_index) { error = ENOENT; goto out_locked; } error = ENOENT; ifp = ifnet_byindex(name[0]); if (ifp == NULL) goto out_locked; LIST_FOREACH(igi, &V_igi_head, igi_link) { if (ifp == igi->igi_ifp) { - error = SYSCTL_OUT(req, igi, - sizeof(struct igmp_ifinfo)); + struct igmp_ifinfo info; + + info.igi_version = igi->igi_version; + info.igi_v1_timer = igi->igi_v1_timer; + info.igi_v2_timer = igi->igi_v2_timer; + info.igi_v3_timer = igi->igi_v3_timer; + info.igi_flags = igi->igi_flags; + info.igi_rv = igi->igi_rv; + info.igi_qi = igi->igi_qi; + info.igi_qri = igi->igi_qri; + info.igi_uri = igi->igi_uri; + error = SYSCTL_OUT(req, &info, sizeof(info)); break; } } out_locked: IGMP_UNLOCK(); IN_MULTI_UNLOCK(); return (error); } /* * Dispatch an entire queue of pending 
packet chains * using the netisr. * VIMAGE: Assumes the vnet pointer has been set. */ static void igmp_dispatch_queue(struct mbufq *mq, int limit, const int loop) { struct mbuf *m; while ((m = mbufq_dequeue(mq)) != NULL) { CTR3(KTR_IGMPV3, "%s: dispatch %p from %p", __func__, mq, m); if (loop) m->m_flags |= M_IGMP_LOOP; netisr_dispatch(NETISR_IGMP, m); if (--limit == 0) break; } } /* * Filter outgoing IGMP report state by group. * * Reports are ALWAYS suppressed for ALL-HOSTS (224.0.0.1). * If the net.inet.igmp.sendlocal sysctl is 0, then IGMP reports are * disabled for all groups in the 224.0.0.0/24 link-local scope. However, * this may break certain IGMP snooping switches which rely on the old * report behaviour. * * Return zero if the given group is one for which IGMP reports * should be suppressed, or non-zero if reports should be issued. */ static __inline int igmp_isgroupreported(const struct in_addr addr) { if (in_allhosts(addr) || ((!V_igmp_sendlocal && IN_LOCAL_GROUP(ntohl(addr.s_addr))))) return (0); return (1); } /* * Construct a Router Alert option to use in outgoing packets. */ static struct mbuf * igmp_ra_alloc(void) { struct mbuf *m; struct ipoption *p; m = m_get(M_WAITOK, MT_DATA); p = mtod(m, struct ipoption *); p->ipopt_dst.s_addr = INADDR_ANY; p->ipopt_list[0] = IPOPT_RA; /* Router Alert Option */ p->ipopt_list[1] = 0x04; /* 4 bytes long */ p->ipopt_list[2] = IPOPT_EOL; /* End of IP option list */ p->ipopt_list[3] = 0x00; /* pad byte */ m->m_len = sizeof(p->ipopt_dst) + p->ipopt_list[1]; return (m); } /* * Attach IGMP when PF_INET is attached to an interface. 
*/ -struct igmp_ifinfo * +struct igmp_ifsoftc * igmp_domifattach(struct ifnet *ifp) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)", __func__, ifp, ifp->if_xname); IGMP_LOCK(); igi = igi_alloc_locked(ifp); if (!(ifp->if_flags & IFF_MULTICAST)) igi->igi_flags |= IGIF_SILENT; IGMP_UNLOCK(); return (igi); } /* * VIMAGE: assume curvnet set by caller. */ -static struct igmp_ifinfo * +static struct igmp_ifsoftc * igi_alloc_locked(/*const*/ struct ifnet *ifp) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; IGMP_LOCK_ASSERT(); - igi = malloc(sizeof(struct igmp_ifinfo), M_IGMP, M_NOWAIT|M_ZERO); + igi = malloc(sizeof(struct igmp_ifsoftc), M_IGMP, M_NOWAIT|M_ZERO); if (igi == NULL) goto out; igi->igi_ifp = ifp; igi->igi_version = V_igmp_default_version; igi->igi_flags = 0; igi->igi_rv = IGMP_RV_INIT; igi->igi_qi = IGMP_QI_INIT; igi->igi_qri = IGMP_QRI_INIT; igi->igi_uri = IGMP_URI_INIT; SLIST_INIT(&igi->igi_relinmhead); mbufq_init(&igi->igi_gq, IGMP_MAX_RESPONSE_PACKETS); LIST_INSERT_HEAD(&V_igi_head, igi, igi_link); - CTR2(KTR_IGMPV3, "allocate igmp_ifinfo for ifp %p(%s)", + CTR2(KTR_IGMPV3, "allocate igmp_ifsoftc for ifp %p(%s)", ifp, ifp->if_xname); out: return (igi); } /* * Hook for ifdetach. * * NOTE: Some finalization tasks need to run before the protocol domain * is detached, but also before the link layer does its cleanup. * * SMPNG: igmp_ifdetach() needs to take IF_ADDR_LOCK(). * XXX This is also bitten by unlocked ifma_protospec access. 
*/ void igmp_ifdetach(struct ifnet *ifp) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct ifmultiaddr *ifma; struct in_multi *inm, *tinm; CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)", __func__, ifp, ifp->if_xname); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; if (igi->igi_version == IGMP_VERSION_3) { IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; #if 0 KASSERT(ifma->ifma_protospec != NULL, ("%s: ifma_protospec is NULL", __func__)); #endif inm = (struct in_multi *)ifma->ifma_protospec; if (inm->inm_state == IGMP_LEAVING_MEMBER) { SLIST_INSERT_HEAD(&igi->igi_relinmhead, inm, inm_nrele); } inm_clear_recorded(inm); } IF_ADDR_RUNLOCK(ifp); /* * Free the in_multi reference(s) for this IGMP lifecycle. */ SLIST_FOREACH_SAFE(inm, &igi->igi_relinmhead, inm_nrele, tinm) { SLIST_REMOVE_HEAD(&igi->igi_relinmhead, inm_nrele); inm_release_locked(inm); } } IGMP_UNLOCK(); } /* * Hook for domifdetach. */ void igmp_domifdetach(struct ifnet *ifp) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)", __func__, ifp, ifp->if_xname); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; igi_delete_locked(ifp); IGMP_UNLOCK(); } static void igi_delete_locked(const struct ifnet *ifp) { - struct igmp_ifinfo *igi, *tigi; + struct igmp_ifsoftc *igi, *tigi; - CTR3(KTR_IGMPV3, "%s: freeing igmp_ifinfo for ifp %p(%s)", + CTR3(KTR_IGMPV3, "%s: freeing igmp_ifsoftc for ifp %p(%s)", __func__, ifp, ifp->if_xname); IGMP_LOCK_ASSERT(); LIST_FOREACH_SAFE(igi, &V_igi_head, igi_link, tigi) { if (igi->igi_ifp == ifp) { /* * Free deferred General Query responses. 
*/ mbufq_drain(&igi->igi_gq); LIST_REMOVE(igi, igi_link); KASSERT(SLIST_EMPTY(&igi->igi_relinmhead), ("%s: there are dangling in_multi references", __func__)); free(igi, M_IGMP); return; } } #ifdef INVARIANTS - panic("%s: igmp_ifinfo not found for ifp %p\n", __func__, ifp); + panic("%s: igmp_ifsoftc not found for ifp %p\n", __func__, ifp); #endif } /* * Process a received IGMPv1 query. * Return non-zero if the message should be dropped. * * VIMAGE: The curvnet pointer is derived from the input ifp. */ static int igmp_input_v1_query(struct ifnet *ifp, const struct ip *ip, const struct igmp *igmp) { struct ifmultiaddr *ifma; - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct in_multi *inm; /* * IGMPv1 Host Membership Queries SHOULD always be addressed to * 224.0.0.1. They are always treated as General Queries. * igmp_group is always ignored. Do not drop it as a userland * daemon may wish to see it. * XXX SMPng: unlocked increments in igmpstat assumed atomic. */ if (!in_allhosts(ip->ip_dst) || !in_nullhost(igmp->igmp_group)) { IGMPSTAT_INC(igps_rcv_badqueries); return (0); } IGMPSTAT_INC(igps_rcv_gen_queries); IN_MULTI_LOCK(); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; - KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp)); + KASSERT(igi != NULL, ("%s: no igmp_ifsoftc for ifp %p", __func__, ifp)); if (igi->igi_flags & IGIF_LOOPBACK) { CTR2(KTR_IGMPV3, "ignore v1 query on IGIF_LOOPBACK ifp %p(%s)", ifp, ifp->if_xname); goto out_locked; } /* * Switch to IGMPv1 host compatibility mode. */ igmp_set_version(igi, IGMP_VERSION_1); CTR2(KTR_IGMPV3, "process v1 query on ifp %p(%s)", ifp, ifp->if_xname); /* * Start the timers in all of our group records * for the interface on which the query arrived, * except those which are already running.
*/ IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; if (inm->inm_timer != 0) continue; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_REPORTING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_AWAKENING_MEMBER: inm->inm_state = IGMP_REPORTING_MEMBER; inm->inm_timer = IGMP_RANDOM_DELAY( IGMP_V1V2_MAX_RI * PR_FASTHZ); V_current_state_timers_running = 1; break; case IGMP_LEAVING_MEMBER: break; } } IF_ADDR_RUNLOCK(ifp); out_locked: IGMP_UNLOCK(); IN_MULTI_UNLOCK(); return (0); } /* * Process a received IGMPv2 general or group-specific query. */ static int igmp_input_v2_query(struct ifnet *ifp, const struct ip *ip, const struct igmp *igmp) { struct ifmultiaddr *ifma; - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct in_multi *inm; int is_general_query; uint16_t timer; is_general_query = 0; /* * Validate address fields upfront. * XXX SMPng: unlocked increments in igmpstat assumed atomic. */ if (in_nullhost(igmp->igmp_group)) { /* * IGMPv2 General Query. * If this was not sent to the all-hosts group, ignore it. */ if (!in_allhosts(ip->ip_dst)) return (0); IGMPSTAT_INC(igps_rcv_gen_queries); is_general_query = 1; } else { /* IGMPv2 Group-Specific Query. */ IGMPSTAT_INC(igps_rcv_group_queries); } IN_MULTI_LOCK(); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; - KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp)); + KASSERT(igi != NULL, ("%s: no igmp_ifsoftc for ifp %p", __func__, ifp)); if (igi->igi_flags & IGIF_LOOPBACK) { CTR2(KTR_IGMPV3, "ignore v2 query on IGIF_LOOPBACK ifp %p(%s)", ifp, ifp->if_xname); goto out_locked; } /* * Ignore v2 query if in v1 Compatibility Mode. 
*/ if (igi->igi_version == IGMP_VERSION_1) goto out_locked; igmp_set_version(igi, IGMP_VERSION_2); timer = igmp->igmp_code * PR_FASTHZ / IGMP_TIMER_SCALE; if (timer == 0) timer = 1; if (is_general_query) { /* * For each reporting group joined on this * interface, kick the report timer. */ CTR2(KTR_IGMPV3, "process v2 general query on ifp %p(%s)", ifp, ifp->if_xname); IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; igmp_v2_update_group(inm, timer); } IF_ADDR_RUNLOCK(ifp); } else { /* * Group-specific IGMPv2 query, we need only * look up the single group to process it. */ inm = inm_lookup(ifp, igmp->igmp_group); if (inm != NULL) { CTR3(KTR_IGMPV3, "process v2 query %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); igmp_v2_update_group(inm, timer); } } out_locked: IGMP_UNLOCK(); IN_MULTI_UNLOCK(); return (0); } /* * Update the report timer on a group in response to an IGMPv2 query. * * If we are becoming the reporting member for this group, start the timer. * If we already are the reporting member for this group, and timer is * below the threshold, reset it. * * We may be updating the group for the first time since we switched * to IGMPv3. If we are, then we must clear any recorded source lists, * and transition to REPORTING state; the group timer is overloaded * for group and group-source query responses. * * Unlike IGMPv3, the delay per group should be jittered * to avoid bursts of IGMPv2 reports. 
*/ static void igmp_v2_update_group(struct in_multi *inm, const int timer) { CTR4(KTR_IGMPV3, "%s: %s/%s timer=%d", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname, timer); IN_MULTI_LOCK_ASSERT(); switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: break; case IGMP_REPORTING_MEMBER: if (inm->inm_timer != 0 && inm->inm_timer <= timer) { CTR1(KTR_IGMPV3, "%s: REPORTING and timer running, " "skipping.", __func__); break; } /* FALLTHROUGH */ case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_AWAKENING_MEMBER: CTR1(KTR_IGMPV3, "%s: ->REPORTING", __func__); inm->inm_state = IGMP_REPORTING_MEMBER; inm->inm_timer = IGMP_RANDOM_DELAY(timer); V_current_state_timers_running = 1; break; case IGMP_SLEEPING_MEMBER: CTR1(KTR_IGMPV3, "%s: ->AWAKENING", __func__); inm->inm_state = IGMP_AWAKENING_MEMBER; break; case IGMP_LEAVING_MEMBER: break; } } /* * Process a received IGMPv3 general, group-specific or * group-and-source-specific query. * Assumes m has already been pulled up to the full IGMP message length. * Return 0 if successful, otherwise an appropriate error code is returned. */ static int igmp_input_v3_query(struct ifnet *ifp, const struct ip *ip, /*const*/ struct igmpv3 *igmpv3) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct in_multi *inm; int is_general_query; uint32_t maxresp, nsrc, qqi; uint16_t timer; uint8_t qrv; is_general_query = 0; CTR2(KTR_IGMPV3, "process v3 query on ifp %p(%s)", ifp, ifp->if_xname); maxresp = igmpv3->igmp_code; /* in 1/10ths of a second */ if (maxresp >= 128) { maxresp = IGMP_MANT(igmpv3->igmp_code) << (IGMP_EXP(igmpv3->igmp_code) + 3); } /* * Robustness must never be less than 2 for on-wire IGMPv3. * FUTURE: Check if ifp has IGIF_LOOPBACK set, as we will make * an exception for interfaces whose IGMPv3 state changes * are redirected to loopback (e.g. MANET). 
*/ qrv = IGMP_QRV(igmpv3->igmp_misc); if (qrv < 2) { CTR3(KTR_IGMPV3, "%s: clamping qrv %d to %d", __func__, qrv, IGMP_RV_INIT); qrv = IGMP_RV_INIT; } qqi = igmpv3->igmp_qqi; if (qqi >= 128) { qqi = IGMP_MANT(igmpv3->igmp_qqi) << (IGMP_EXP(igmpv3->igmp_qqi) + 3); } timer = maxresp * PR_FASTHZ / IGMP_TIMER_SCALE; if (timer == 0) timer = 1; nsrc = ntohs(igmpv3->igmp_numsrc); /* * Validate address fields and versions upfront before * accepting v3 query. * XXX SMPng: Unlocked access to igmpstat counters here. */ if (in_nullhost(igmpv3->igmp_group)) { /* * IGMPv3 General Query. * * General Queries SHOULD be directed to 224.0.0.1. * A general query with a source list has undefined * behaviour; discard it. */ IGMPSTAT_INC(igps_rcv_gen_queries); if (!in_allhosts(ip->ip_dst) || nsrc > 0) { IGMPSTAT_INC(igps_rcv_badqueries); return (0); } is_general_query = 1; } else { /* Group or group-source specific query. */ if (nsrc == 0) IGMPSTAT_INC(igps_rcv_group_queries); else IGMPSTAT_INC(igps_rcv_gsr_queries); } IN_MULTI_LOCK(); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; - KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp)); + KASSERT(igi != NULL, ("%s: no igmp_ifsoftc for ifp %p", __func__, ifp)); if (igi->igi_flags & IGIF_LOOPBACK) { CTR2(KTR_IGMPV3, "ignore v3 query on IGIF_LOOPBACK ifp %p(%s)", ifp, ifp->if_xname); goto out_locked; } /* * Discard the v3 query if we're in Compatibility Mode. * The RFC is not obviously worded that hosts need to stay in * compatibility mode until the Old Version Querier Present * timer expires. 
*/ if (igi->igi_version != IGMP_VERSION_3) { CTR3(KTR_IGMPV3, "ignore v3 query in v%d mode on ifp %p(%s)", igi->igi_version, ifp, ifp->if_xname); goto out_locked; } igmp_set_version(igi, IGMP_VERSION_3); igi->igi_rv = qrv; igi->igi_qi = qqi; igi->igi_qri = maxresp; CTR4(KTR_IGMPV3, "%s: qrv %d qi %d qri %d", __func__, qrv, qqi, maxresp); if (is_general_query) { /* * Schedule a current-state report on this ifp for * all groups, possibly containing source lists. * If there is a pending General Query response * scheduled earlier than the selected delay, do * not schedule any other reports. * Otherwise, reset the interface timer. */ CTR2(KTR_IGMPV3, "process v3 general query on ifp %p(%s)", ifp, ifp->if_xname); if (igi->igi_v3_timer == 0 || igi->igi_v3_timer >= timer) { igi->igi_v3_timer = IGMP_RANDOM_DELAY(timer); V_interface_timers_running = 1; } } else { /* * Group-source-specific queries are throttled on * a per-group basis to defeat denial-of-service attempts. * Queries for groups we are not a member of on this * link are simply ignored. */ inm = inm_lookup(ifp, igmpv3->igmp_group); if (inm == NULL) goto out_locked; if (nsrc > 0) { if (!ratecheck(&inm->inm_lastgsrtv, &V_igmp_gsrdelay)) { CTR1(KTR_IGMPV3, "%s: GS query throttled.", __func__); IGMPSTAT_INC(igps_drop_gsr_queries); goto out_locked; } } CTR3(KTR_IGMPV3, "process v3 %s query on ifp %p(%s)", inet_ntoa(igmpv3->igmp_group), ifp, ifp->if_xname); /* * If there is a pending General Query response * scheduled sooner than the selected delay, no * further report need be scheduled. * Otherwise, prepare to respond to the * group-specific or group-and-source query. */ if (igi->igi_v3_timer == 0 || igi->igi_v3_timer >= timer) igmp_input_v3_group_query(inm, igi, timer, igmpv3); } out_locked: IGMP_UNLOCK(); IN_MULTI_UNLOCK(); return (0); } /* * Process a received IGMPv3 group-specific or group-and-source-specific * query. * Return <0 if any error occurred. Currently this is ignored.
*/ static int -igmp_input_v3_group_query(struct in_multi *inm, struct igmp_ifinfo *igi, +igmp_input_v3_group_query(struct in_multi *inm, struct igmp_ifsoftc *igi, int timer, /*const*/ struct igmpv3 *igmpv3) { int retval; uint16_t nsrc; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); retval = 0; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_AWAKENING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LEAVING_MEMBER: return (retval); break; case IGMP_REPORTING_MEMBER: case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: break; } nsrc = ntohs(igmpv3->igmp_numsrc); /* * Deal with group-specific queries upfront. * If any group query is already pending, purge any recorded * source-list state if it exists, and schedule a query response * for this group-specific query. */ if (nsrc == 0) { if (inm->inm_state == IGMP_G_QUERY_PENDING_MEMBER || inm->inm_state == IGMP_SG_QUERY_PENDING_MEMBER) { inm_clear_recorded(inm); timer = min(inm->inm_timer, timer); } inm->inm_state = IGMP_G_QUERY_PENDING_MEMBER; inm->inm_timer = IGMP_RANDOM_DELAY(timer); V_current_state_timers_running = 1; return (retval); } /* * Deal with the case where a group-and-source-specific query has * been received but a group-specific query is already pending. */ if (inm->inm_state == IGMP_G_QUERY_PENDING_MEMBER) { timer = min(inm->inm_timer, timer); inm->inm_timer = IGMP_RANDOM_DELAY(timer); V_current_state_timers_running = 1; return (retval); } /* * Finally, deal with the case where a group-and-source-specific * query has been received, where a response to a previous g-s-r * query exists, or none exists. * In this case, we need to parse the source-list which the Querier * has provided us with and check if we have any source list filter * entries at T1 for these sources. If we do not, there is no need * schedule a report and the query may be dropped. 
* If we do, we must record them and schedule a current-state * report for those sources. * FIXME: Handling source lists larger than 1 mbuf requires that * we pass the mbuf chain pointer down to this function, and use * m_getptr() to walk the chain. */ if (inm->inm_nsrc > 0) { const struct in_addr *ap; int i, nrecorded; ap = (const struct in_addr *)(igmpv3 + 1); nrecorded = 0; for (i = 0; i < nsrc; i++, ap++) { retval = inm_record_source(inm, ap->s_addr); if (retval < 0) break; nrecorded += retval; } if (nrecorded > 0) { CTR1(KTR_IGMPV3, "%s: schedule response to SG query", __func__); inm->inm_state = IGMP_SG_QUERY_PENDING_MEMBER; inm->inm_timer = IGMP_RANDOM_DELAY(timer); V_current_state_timers_running = 1; } } return (retval); } /* * Process a received IGMPv1 host membership report. * * NOTE: 0.0.0.0 workaround breaks const correctness. */ static int igmp_input_v1_report(struct ifnet *ifp, /*const*/ struct ip *ip, /*const*/ struct igmp *igmp) { struct in_ifaddr *ia; struct in_multi *inm; IGMPSTAT_INC(igps_rcv_reports); if (ifp->if_flags & IFF_LOOPBACK) return (0); if (!IN_MULTICAST(ntohl(igmp->igmp_group.s_addr)) || !in_hosteq(igmp->igmp_group, ip->ip_dst)) { IGMPSTAT_INC(igps_rcv_badreports); return (EINVAL); } /* * RFC 3376, Section 4.2.13, 9.2, 9.3: * Booting clients may use the source address 0.0.0.0. Some * IGMP daemons may not know how to use IP_RECVIF to determine * the interface upon which this message was received. * Replace 0.0.0.0 with the subnet address if told to do so. */ if (V_igmp_recvifkludge && in_nullhost(ip->ip_src)) { IFP_TO_IA(ifp, ia); if (ia != NULL) { ip->ip_src.s_addr = htonl(ia->ia_subnet); ifa_free(&ia->ia_ifa); } } CTR3(KTR_IGMPV3, "process v1 report %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); /* * IGMPv1 report suppression. * If we are a member of this group, and our membership should be * reported, stop our group timer and transition to the 'lazy' state. 
*/ IN_MULTI_LOCK(); inm = inm_lookup(ifp, igmp->igmp_group); if (inm != NULL) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; igi = inm->inm_igi; if (igi == NULL) { KASSERT(igi != NULL, ("%s: no igi for ifp %p", __func__, ifp)); goto out_locked; } IGMPSTAT_INC(igps_rcv_ourreports); /* * If we are in IGMPv3 host mode, do not allow the * other host's IGMPv1 report to suppress our reports * unless explicitly configured to do so. */ if (igi->igi_version == IGMP_VERSION_3) { if (V_igmp_legacysupp) igmp_v3_suppress_group_record(inm); goto out_locked; } inm->inm_timer = 0; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: break; case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_AWAKENING_MEMBER: CTR3(KTR_IGMPV3, "report suppressed for %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); case IGMP_SLEEPING_MEMBER: inm->inm_state = IGMP_SLEEPING_MEMBER; break; case IGMP_REPORTING_MEMBER: CTR3(KTR_IGMPV3, "report suppressed for %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); if (igi->igi_version == IGMP_VERSION_1) inm->inm_state = IGMP_LAZY_MEMBER; else if (igi->igi_version == IGMP_VERSION_2) inm->inm_state = IGMP_SLEEPING_MEMBER; break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_LEAVING_MEMBER: break; } } out_locked: IN_MULTI_UNLOCK(); return (0); } /* * Process a received IGMPv2 host membership report. * * NOTE: 0.0.0.0 workaround breaks const correctness. */ static int igmp_input_v2_report(struct ifnet *ifp, /*const*/ struct ip *ip, /*const*/ struct igmp *igmp) { struct in_ifaddr *ia; struct in_multi *inm; /* * Make sure we don't hear our own membership report. Fast * leave requires knowing that we are the only member of a * group. 
*/ IFP_TO_IA(ifp, ia); if (ia != NULL && in_hosteq(ip->ip_src, IA_SIN(ia)->sin_addr)) { ifa_free(&ia->ia_ifa); return (0); } IGMPSTAT_INC(igps_rcv_reports); if (ifp->if_flags & IFF_LOOPBACK) { if (ia != NULL) ifa_free(&ia->ia_ifa); return (0); } if (!IN_MULTICAST(ntohl(igmp->igmp_group.s_addr)) || !in_hosteq(igmp->igmp_group, ip->ip_dst)) { if (ia != NULL) ifa_free(&ia->ia_ifa); IGMPSTAT_INC(igps_rcv_badreports); return (EINVAL); } /* * RFC 3376, Section 4.2.13, 9.2, 9.3: * Booting clients may use the source address 0.0.0.0. Some * IGMP daemons may not know how to use IP_RECVIF to determine * the interface upon which this message was received. * Replace 0.0.0.0 with the subnet address if told to do so. */ if (V_igmp_recvifkludge && in_nullhost(ip->ip_src)) { if (ia != NULL) ip->ip_src.s_addr = htonl(ia->ia_subnet); } if (ia != NULL) ifa_free(&ia->ia_ifa); CTR3(KTR_IGMPV3, "process v2 report %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); /* * IGMPv2 report suppression. * If we are a member of this group, and our membership should be * reported, and our group timer is pending or about to be reset, * stop our group timer by transitioning to the 'lazy' state. */ IN_MULTI_LOCK(); inm = inm_lookup(ifp, igmp->igmp_group); if (inm != NULL) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; igi = inm->inm_igi; KASSERT(igi != NULL, ("%s: no igi for ifp %p", __func__, ifp)); IGMPSTAT_INC(igps_rcv_ourreports); /* * If we are in IGMPv3 host mode, do not allow the * other host's IGMPv1 report to suppress our reports * unless explicitly configured to do so. 
*/ if (igi->igi_version == IGMP_VERSION_3) { if (V_igmp_legacysupp) igmp_v3_suppress_group_record(inm); goto out_locked; } inm->inm_timer = 0; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_SLEEPING_MEMBER: break; case IGMP_REPORTING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_AWAKENING_MEMBER: CTR3(KTR_IGMPV3, "report suppressed for %s on ifp %p(%s)", inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname); case IGMP_LAZY_MEMBER: inm->inm_state = IGMP_LAZY_MEMBER; break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_LEAVING_MEMBER: break; } } out_locked: IN_MULTI_UNLOCK(); return (0); } int igmp_input(struct mbuf **mp, int *offp, int proto) { int iphlen; struct ifnet *ifp; struct igmp *igmp; struct ip *ip; struct mbuf *m; int igmplen; int minlen; int queryver; CTR3(KTR_IGMPV3, "%s: called w/mbuf (%p,%d)", __func__, *mp, *offp); m = *mp; ifp = m->m_pkthdr.rcvif; *mp = NULL; IGMPSTAT_INC(igps_rcv_total); ip = mtod(m, struct ip *); iphlen = *offp; igmplen = ntohs(ip->ip_len) - iphlen; /* * Validate lengths. */ if (igmplen < IGMP_MINLEN) { IGMPSTAT_INC(igps_rcv_tooshort); m_freem(m); return (IPPROTO_DONE); } /* * Always pullup to the minimum size for v1/v2 or v3 * to amortize calls to m_pullup(). */ minlen = iphlen; if (igmplen >= IGMP_V3_QUERY_MINLEN) minlen += IGMP_V3_QUERY_MINLEN; else minlen += IGMP_MINLEN; if ((!M_WRITABLE(m) || m->m_len < minlen) && (m = m_pullup(m, minlen)) == 0) { IGMPSTAT_INC(igps_rcv_tooshort); return (IPPROTO_DONE); } ip = mtod(m, struct ip *); /* * Validate checksum. */ m->m_data += iphlen; m->m_len -= iphlen; igmp = mtod(m, struct igmp *); if (in_cksum(m, igmplen)) { IGMPSTAT_INC(igps_rcv_badsum); m_freem(m); return (IPPROTO_DONE); } m->m_data -= iphlen; m->m_len += iphlen; /* * IGMP control traffic is link-scope, and must have a TTL of 1. * DVMRP traffic (e.g. mrinfo, mtrace) is an exception; * probe packets may come from beyond the LAN. 
*/ if (igmp->igmp_type != IGMP_DVMRP && ip->ip_ttl != 1) { IGMPSTAT_INC(igps_rcv_badttl); m_freem(m); return (IPPROTO_DONE); } switch (igmp->igmp_type) { case IGMP_HOST_MEMBERSHIP_QUERY: if (igmplen == IGMP_MINLEN) { if (igmp->igmp_code == 0) queryver = IGMP_VERSION_1; else queryver = IGMP_VERSION_2; } else if (igmplen >= IGMP_V3_QUERY_MINLEN) { queryver = IGMP_VERSION_3; } else { IGMPSTAT_INC(igps_rcv_tooshort); m_freem(m); return (IPPROTO_DONE); } switch (queryver) { case IGMP_VERSION_1: IGMPSTAT_INC(igps_rcv_v1v2_queries); if (!V_igmp_v1enable) break; if (igmp_input_v1_query(ifp, ip, igmp) != 0) { m_freem(m); return (IPPROTO_DONE); } break; case IGMP_VERSION_2: IGMPSTAT_INC(igps_rcv_v1v2_queries); if (!V_igmp_v2enable) break; if (igmp_input_v2_query(ifp, ip, igmp) != 0) { m_freem(m); return (IPPROTO_DONE); } break; case IGMP_VERSION_3: { struct igmpv3 *igmpv3; uint16_t igmpv3len; uint16_t srclen; int nsrc; IGMPSTAT_INC(igps_rcv_v3_queries); igmpv3 = (struct igmpv3 *)igmp; /* * Validate length based on source count. */ nsrc = ntohs(igmpv3->igmp_numsrc); srclen = sizeof(struct in_addr) * nsrc; if (nsrc * sizeof(in_addr_t) > srclen) { IGMPSTAT_INC(igps_rcv_tooshort); return (IPPROTO_DONE); } /* * m_pullup() may modify m, so pullup in * this scope. 
*/ igmpv3len = iphlen + IGMP_V3_QUERY_MINLEN + srclen; if ((!M_WRITABLE(m) || m->m_len < igmpv3len) && (m = m_pullup(m, igmpv3len)) == NULL) { IGMPSTAT_INC(igps_rcv_tooshort); return (IPPROTO_DONE); } igmpv3 = (struct igmpv3 *)(mtod(m, uint8_t *) + iphlen); if (igmp_input_v3_query(ifp, ip, igmpv3) != 0) { m_freem(m); return (IPPROTO_DONE); } } break; } break; case IGMP_v1_HOST_MEMBERSHIP_REPORT: if (!V_igmp_v1enable) break; if (igmp_input_v1_report(ifp, ip, igmp) != 0) { m_freem(m); return (IPPROTO_DONE); } break; case IGMP_v2_HOST_MEMBERSHIP_REPORT: if (!V_igmp_v2enable) break; if (!ip_checkrouteralert(m)) IGMPSTAT_INC(igps_rcv_nora); if (igmp_input_v2_report(ifp, ip, igmp) != 0) { m_freem(m); return (IPPROTO_DONE); } break; case IGMP_v3_HOST_MEMBERSHIP_REPORT: /* * Hosts do not need to process IGMPv3 membership reports, * as report suppression is no longer required. */ if (!ip_checkrouteralert(m)) IGMPSTAT_INC(igps_rcv_nora); break; default: break; } /* * Pass all valid IGMP packets up to any process(es) listening on a * raw IGMP socket. */ *mp = m; return (rip_input(mp, offp, proto)); } /* * Fast timeout handler (global). * VIMAGE: Timeout handlers are expected to service all vimages. */ void igmp_fasttimo(void) { VNET_ITERATOR_DECL(vnet_iter); VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); igmp_fasttimo_vnet(); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); } /* * Fast timeout handler (per-vnet). * Sends are shuffled off to a netisr to deal with Giant. * * VIMAGE: Assume caller has set up our curvnet. */ static void igmp_fasttimo_vnet(void) { struct mbufq scq; /* State-change packets */ struct mbufq qrq; /* Query response packets */ struct ifnet *ifp; - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct ifmultiaddr *ifma; struct in_multi *inm; int loop, uri_fasthz; loop = 0; uri_fasthz = 0; /* * Quick check to see if any work needs to be done, in order to * minimize the overhead of fasttimo processing. 
* SMPng: XXX Unlocked reads. */ if (!V_current_state_timers_running && !V_interface_timers_running && !V_state_change_timers_running) return; IN_MULTI_LOCK(); IGMP_LOCK(); /* * IGMPv3 General Query response timer processing. */ if (V_interface_timers_running) { CTR1(KTR_IGMPV3, "%s: interface timers running", __func__); V_interface_timers_running = 0; LIST_FOREACH(igi, &V_igi_head, igi_link) { if (igi->igi_v3_timer == 0) { /* Do nothing. */ } else if (--igi->igi_v3_timer == 0) { igmp_v3_dispatch_general_query(igi); } else { V_interface_timers_running = 1; } } } if (!V_current_state_timers_running && !V_state_change_timers_running) goto out_locked; V_current_state_timers_running = 0; V_state_change_timers_running = 0; CTR1(KTR_IGMPV3, "%s: state change timers running", __func__); /* * IGMPv1/v2/v3 host report and state-change timer processing. * Note: Processing a v3 group timer may remove a node. */ LIST_FOREACH(igi, &V_igi_head, igi_link) { ifp = igi->igi_ifp; if (igi->igi_version == IGMP_VERSION_3) { loop = (igi->igi_flags & IGIF_LOOPBACK) ? 1 : 0; uri_fasthz = IGMP_RANDOM_DELAY(igi->igi_uri * PR_FASTHZ); mbufq_init(&qrq, IGMP_MAX_G_GS_PACKETS); mbufq_init(&scq, IGMP_MAX_STATE_CHANGE_PACKETS); } IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; switch (igi->igi_version) { case IGMP_VERSION_1: case IGMP_VERSION_2: igmp_v1v2_process_group_timer(inm, igi->igi_version); break; case IGMP_VERSION_3: igmp_v3_process_group_timers(igi, &qrq, &scq, inm, uri_fasthz); break; } } IF_ADDR_RUNLOCK(ifp); if (igi->igi_version == IGMP_VERSION_3) { struct in_multi *tinm; igmp_dispatch_queue(&qrq, 0, loop); igmp_dispatch_queue(&scq, 0, loop); /* * Free the in_multi reference(s) for this * IGMP lifecycle. 
*/ SLIST_FOREACH_SAFE(inm, &igi->igi_relinmhead, inm_nrele, tinm) { SLIST_REMOVE_HEAD(&igi->igi_relinmhead, inm_nrele); inm_release_locked(inm); } } } out_locked: IGMP_UNLOCK(); IN_MULTI_UNLOCK(); } /* * Update host report group timer for IGMPv1/v2. * Will update the global pending timer flags. */ static void igmp_v1v2_process_group_timer(struct in_multi *inm, const int version) { int report_timer_expired; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); if (inm->inm_timer == 0) { report_timer_expired = 0; } else if (--inm->inm_timer == 0) { report_timer_expired = 1; } else { V_current_state_timers_running = 1; return; } switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_AWAKENING_MEMBER: break; case IGMP_REPORTING_MEMBER: if (report_timer_expired) { inm->inm_state = IGMP_IDLE_MEMBER; (void)igmp_v1v2_queue_report(inm, (version == IGMP_VERSION_2) ? IGMP_v2_HOST_MEMBERSHIP_REPORT : IGMP_v1_HOST_MEMBERSHIP_REPORT); } break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_LEAVING_MEMBER: break; } } /* * Update a group's timers for IGMPv3. * Will update the global pending timer flags. * Note: Unlocked read from igi. */ static void -igmp_v3_process_group_timers(struct igmp_ifinfo *igi, +igmp_v3_process_group_timers(struct igmp_ifsoftc *igi, struct mbufq *qrq, struct mbufq *scq, struct in_multi *inm, const int uri_fasthz) { int query_response_timer_expired; int state_change_retransmit_timer_expired; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); query_response_timer_expired = 0; state_change_retransmit_timer_expired = 0; /* * During a transition from v1/v2 compatibility mode back to v3, * a group record in REPORTING state may still have its group * timer active. This is a no-op in this function; it is easier * to deal with it here than to complicate the slow-timeout path. 
*/ if (inm->inm_timer == 0) { query_response_timer_expired = 0; } else if (--inm->inm_timer == 0) { query_response_timer_expired = 1; } else { V_current_state_timers_running = 1; } if (inm->inm_sctimer == 0) { state_change_retransmit_timer_expired = 0; } else if (--inm->inm_sctimer == 0) { state_change_retransmit_timer_expired = 1; } else { V_state_change_timers_running = 1; } /* We are in fasttimo, so be quick about it. */ if (!state_change_retransmit_timer_expired && !query_response_timer_expired) return; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_AWAKENING_MEMBER: case IGMP_IDLE_MEMBER: break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: /* * Respond to a previously pending Group-Specific * or Group-and-Source-Specific query by enqueueing * the appropriate Current-State report for * immediate transmission. */ if (query_response_timer_expired) { int retval; retval = igmp_v3_enqueue_group_record(qrq, inm, 0, 1, (inm->inm_state == IGMP_SG_QUERY_PENDING_MEMBER)); CTR2(KTR_IGMPV3, "%s: enqueue record = %d", __func__, retval); inm->inm_state = IGMP_REPORTING_MEMBER; /* XXX Clear recorded sources for next time. */ inm_clear_recorded(inm); } /* FALLTHROUGH */ case IGMP_REPORTING_MEMBER: case IGMP_LEAVING_MEMBER: if (state_change_retransmit_timer_expired) { /* * State-change retransmission timer fired. * If there are any further pending retransmissions, * set the global pending state-change flag, and * reset the timer. */ if (--inm->inm_scrv > 0) { inm->inm_sctimer = uri_fasthz; V_state_change_timers_running = 1; } /* * Retransmit the previously computed state-change * report. If there are no further pending * retransmissions, the mbuf queue will be consumed. * Update T0 state to T1 as we have now sent * a state-change. 
*/ (void)igmp_v3_merge_state_changes(inm, scq); inm_commit(inm); CTR3(KTR_IGMPV3, "%s: T1 -> T0 for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); /* * If we are leaving the group for good, make sure * we release IGMP's reference to it. * This release must be deferred using a SLIST, * as we are called from a loop which traverses * the in_ifmultiaddr TAILQ. */ if (inm->inm_state == IGMP_LEAVING_MEMBER && inm->inm_scrv == 0) { inm->inm_state = IGMP_NOT_MEMBER; SLIST_INSERT_HEAD(&igi->igi_relinmhead, inm, inm_nrele); } } break; } } /* * Suppress a group's pending response to a group or source/group query. * * Do NOT suppress state changes. This leads to IGMPv3 inconsistency. * Do NOT update ST1/ST0 as this operation merely suppresses * the currently pending group record. * Do NOT suppress the response to a general query. It is possible but * it would require adding another state or flag. */ static void igmp_v3_suppress_group_record(struct in_multi *inm) { IN_MULTI_LOCK_ASSERT(); KASSERT(inm->inm_igi->igi_version == IGMP_VERSION_3, ("%s: not IGMPv3 mode on link", __func__)); if (inm->inm_state != IGMP_G_QUERY_PENDING_MEMBER || inm->inm_state != IGMP_SG_QUERY_PENDING_MEMBER) return; if (inm->inm_state == IGMP_SG_QUERY_PENDING_MEMBER) inm_clear_recorded(inm); inm->inm_timer = 0; inm->inm_state = IGMP_REPORTING_MEMBER; } /* * Switch to a different IGMP version on the given interface, * as per Section 7.2.1. */ static void -igmp_set_version(struct igmp_ifinfo *igi, const int version) +igmp_set_version(struct igmp_ifsoftc *igi, const int version) { int old_version_timer; IGMP_LOCK_ASSERT(); CTR4(KTR_IGMPV3, "%s: switching to v%d on ifp %p(%s)", __func__, version, igi->igi_ifp, igi->igi_ifp->if_xname); if (version == IGMP_VERSION_1 || version == IGMP_VERSION_2) { /* * Compute the "Older Version Querier Present" timer as per * Section 8.12. 
*/ old_version_timer = igi->igi_rv * igi->igi_qi + igi->igi_qri; old_version_timer *= PR_SLOWHZ; if (version == IGMP_VERSION_1) { igi->igi_v1_timer = old_version_timer; igi->igi_v2_timer = 0; } else if (version == IGMP_VERSION_2) { igi->igi_v1_timer = 0; igi->igi_v2_timer = old_version_timer; } } if (igi->igi_v1_timer == 0 && igi->igi_v2_timer > 0) { if (igi->igi_version != IGMP_VERSION_2) { igi->igi_version = IGMP_VERSION_2; igmp_v3_cancel_link_timers(igi); } } else if (igi->igi_v1_timer > 0) { if (igi->igi_version != IGMP_VERSION_1) { igi->igi_version = IGMP_VERSION_1; igmp_v3_cancel_link_timers(igi); } } } /* * Cancel pending IGMPv3 timers for the given link and all groups * joined on it; state-change, general-query, and group-query timers. * * Only ever called on a transition from v3 to Compatibility mode. Kill * the timers stone dead (this may be expensive for large N groups), they * will be restarted if Compatibility Mode deems that they must be due to * query processing. */ static void -igmp_v3_cancel_link_timers(struct igmp_ifinfo *igi) +igmp_v3_cancel_link_timers(struct igmp_ifsoftc *igi) { struct ifmultiaddr *ifma; struct ifnet *ifp; struct in_multi *inm, *tinm; CTR3(KTR_IGMPV3, "%s: cancel v3 timers on ifp %p(%s)", __func__, igi->igi_ifp, igi->igi_ifp->if_xname); IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); /* * Stop the v3 General Query Response on this link stone dead. * If fasttimo is woken up due to V_interface_timers_running, * the flag will be cleared if there are no pending link timers. */ igi->igi_v3_timer = 0; /* * Now clear the current-state and state-change report timers * for all memberships scoped to this link. 
*/ ifp = igi->igi_ifp; IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_AWAKENING_MEMBER: /* * These states are either not relevant in v3 mode, * or are unreported. Do nothing. */ break; case IGMP_LEAVING_MEMBER: /* * If we are leaving the group and switching to * compatibility mode, we need to release the final * reference held for issuing the INCLUDE {}, and * transition to REPORTING to ensure the host leave * message is sent upstream to the old querier -- * transition to NOT would lose the leave and race. */ SLIST_INSERT_HEAD(&igi->igi_relinmhead, inm, inm_nrele); /* FALLTHROUGH */ case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: inm_clear_recorded(inm); /* FALLTHROUGH */ case IGMP_REPORTING_MEMBER: inm->inm_state = IGMP_REPORTING_MEMBER; break; } /* * Always clear state-change and group report timers. * Free any pending IGMPv3 state-change records. */ inm->inm_sctimer = 0; inm->inm_timer = 0; mbufq_drain(&inm->inm_scq); } IF_ADDR_RUNLOCK(ifp); SLIST_FOREACH_SAFE(inm, &igi->igi_relinmhead, inm_nrele, tinm) { SLIST_REMOVE_HEAD(&igi->igi_relinmhead, inm_nrele); inm_release_locked(inm); } } /* * Update the Older Version Querier Present timers for a link. * See Section 7.2.1 of RFC 3376. */ static void -igmp_v1v2_process_querier_timers(struct igmp_ifinfo *igi) +igmp_v1v2_process_querier_timers(struct igmp_ifsoftc *igi) { IGMP_LOCK_ASSERT(); if (igi->igi_v1_timer == 0 && igi->igi_v2_timer == 0) { /* * IGMPv1 and IGMPv2 Querier Present timers expired. * * Revert to IGMPv3. 
*/ if (igi->igi_version != IGMP_VERSION_3) { CTR5(KTR_IGMPV3, "%s: transition from v%d -> v%d on %p(%s)", __func__, igi->igi_version, IGMP_VERSION_3, igi->igi_ifp, igi->igi_ifp->if_xname); igi->igi_version = IGMP_VERSION_3; } } else if (igi->igi_v1_timer == 0 && igi->igi_v2_timer > 0) { /* * IGMPv1 Querier Present timer expired, * IGMPv2 Querier Present timer running. * If IGMPv2 was disabled since last timeout, * revert to IGMPv3. * If IGMPv2 is enabled, revert to IGMPv2. */ if (!V_igmp_v2enable) { CTR5(KTR_IGMPV3, "%s: transition from v%d -> v%d on %p(%s)", __func__, igi->igi_version, IGMP_VERSION_3, igi->igi_ifp, igi->igi_ifp->if_xname); igi->igi_v2_timer = 0; igi->igi_version = IGMP_VERSION_3; } else { --igi->igi_v2_timer; if (igi->igi_version != IGMP_VERSION_2) { CTR5(KTR_IGMPV3, "%s: transition from v%d -> v%d on %p(%s)", __func__, igi->igi_version, IGMP_VERSION_2, igi->igi_ifp, igi->igi_ifp->if_xname); igi->igi_version = IGMP_VERSION_2; igmp_v3_cancel_link_timers(igi); } } } else if (igi->igi_v1_timer > 0) { /* * IGMPv1 Querier Present timer running. * Stop IGMPv2 timer if running. * * If IGMPv1 was disabled since last timeout, * revert to IGMPv3. * If IGMPv1 is enabled, reset IGMPv2 timer if running. */ if (!V_igmp_v1enable) { CTR5(KTR_IGMPV3, "%s: transition from v%d -> v%d on %p(%s)", __func__, igi->igi_version, IGMP_VERSION_3, igi->igi_ifp, igi->igi_ifp->if_xname); igi->igi_v1_timer = 0; igi->igi_version = IGMP_VERSION_3; } else { --igi->igi_v1_timer; } if (igi->igi_v2_timer > 0) { CTR3(KTR_IGMPV3, "%s: cancel v2 timer on %p(%s)", __func__, igi->igi_ifp, igi->igi_ifp->if_xname); igi->igi_v2_timer = 0; } } } /* * Global slowtimo handler. * VIMAGE: Timeout handlers are expected to service all vimages. */ void igmp_slowtimo(void) { VNET_ITERATOR_DECL(vnet_iter); VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); igmp_slowtimo_vnet(); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); } /* * Per-vnet slowtimo handler. 
*/ static void igmp_slowtimo_vnet(void) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; IGMP_LOCK(); LIST_FOREACH(igi, &V_igi_head, igi_link) { igmp_v1v2_process_querier_timers(igi); } IGMP_UNLOCK(); } /* * Dispatch an IGMPv1/v2 host report or leave message. * These are always small enough to fit inside a single mbuf. */ static int igmp_v1v2_queue_report(struct in_multi *inm, const int type) { struct ifnet *ifp; struct igmp *igmp; struct ip *ip; struct mbuf *m; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); ifp = inm->inm_ifp; m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return (ENOMEM); M_ALIGN(m, sizeof(struct ip) + sizeof(struct igmp)); m->m_pkthdr.len = sizeof(struct ip) + sizeof(struct igmp); m->m_data += sizeof(struct ip); m->m_len = sizeof(struct igmp); igmp = mtod(m, struct igmp *); igmp->igmp_type = type; igmp->igmp_code = 0; igmp->igmp_group = inm->inm_addr; igmp->igmp_cksum = 0; igmp->igmp_cksum = in_cksum(m, sizeof(struct igmp)); m->m_data -= sizeof(struct ip); m->m_len += sizeof(struct ip); ip = mtod(m, struct ip *); ip->ip_tos = 0; ip->ip_len = htons(sizeof(struct ip) + sizeof(struct igmp)); ip->ip_off = 0; ip->ip_p = IPPROTO_IGMP; ip->ip_src.s_addr = INADDR_ANY; if (type == IGMP_HOST_LEAVE_MESSAGE) ip->ip_dst.s_addr = htonl(INADDR_ALLRTRS_GROUP); else ip->ip_dst = inm->inm_addr; igmp_save_context(m, ifp); m->m_flags |= M_IGMPV2; if (inm->inm_igi->igi_flags & IGIF_LOOPBACK) m->m_flags |= M_IGMP_LOOP; CTR2(KTR_IGMPV3, "%s: netisr_dispatch(NETISR_IGMP, %p)", __func__, m); netisr_dispatch(NETISR_IGMP, m); return (0); } /* * Process a state change from the upper layer for the given IPv4 group. * * Each socket holds a reference on the in_multi in its own ip_moptions. * The socket layer will have made the necessary updates to.the group * state, it is now up to IGMP to issue a state change report if there * has been any change between T0 (when the last state-change was issued) * and T1 (now). * * We use the IGMPv3 state machine at group level. 
The IGMP module * however makes the decision as to which IGMP protocol version to speak. * A state change *from* INCLUDE {} always means an initial join. * A state change *to* INCLUDE {} always means a final leave. * * FUTURE: If IGIF_V3LITE is enabled for this interface, then we can * save ourselves a bunch of work; any exclusive mode groups need not * compute source filter lists. * * VIMAGE: curvnet should have been set by caller, as this routine * is called from the socket option handlers. */ int igmp_change_state(struct in_multi *inm) { - struct igmp_ifinfo *igi; + struct igmp_ifsoftc *igi; struct ifnet *ifp; int error; IN_MULTI_LOCK_ASSERT(); error = 0; /* * Try to detect if the upper layer just asked us to change state * for an interface which has now gone away. */ KASSERT(inm->inm_ifma != NULL, ("%s: no ifma", __func__)); ifp = inm->inm_ifma->ifma_ifp; /* * Sanity check that netinet's notion of ifp is the * same as net's. */ KASSERT(inm->inm_ifp == ifp, ("%s: bad ifp", __func__)); IGMP_LOCK(); igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp; - KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp)); + KASSERT(igi != NULL, ("%s: no igmp_ifsoftc for ifp %p", __func__, ifp)); /* * If we detect a state transition to or from MCAST_UNDEFINED * for this group, then we are starting or finishing an IGMP * life cycle for this group. 
*/ if (inm->inm_st[1].iss_fmode != inm->inm_st[0].iss_fmode) { CTR3(KTR_IGMPV3, "%s: inm transition %d -> %d", __func__, inm->inm_st[0].iss_fmode, inm->inm_st[1].iss_fmode); if (inm->inm_st[0].iss_fmode == MCAST_UNDEFINED) { CTR1(KTR_IGMPV3, "%s: initial join", __func__); error = igmp_initial_join(inm, igi); goto out_locked; } else if (inm->inm_st[1].iss_fmode == MCAST_UNDEFINED) { CTR1(KTR_IGMPV3, "%s: final leave", __func__); igmp_final_leave(inm, igi); goto out_locked; } } else { CTR1(KTR_IGMPV3, "%s: filter set change", __func__); } error = igmp_handle_state_change(inm, igi); out_locked: IGMP_UNLOCK(); return (error); } /* * Perform the initial join for an IGMP group. * * When joining a group: * If the group should have its IGMP traffic suppressed, do nothing. * IGMPv1 starts sending IGMPv1 host membership reports. * IGMPv2 starts sending IGMPv2 host membership reports. * IGMPv3 will schedule an IGMPv3 state-change report containing the * initial state of the membership. */ static int -igmp_initial_join(struct in_multi *inm, struct igmp_ifinfo *igi) +igmp_initial_join(struct in_multi *inm, struct igmp_ifsoftc *igi) { struct ifnet *ifp; struct mbufq *mq; int error, retval, syncstates; CTR4(KTR_IGMPV3, "%s: initial join %s on ifp %p(%s)", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp, inm->inm_ifp->if_xname); error = 0; syncstates = 1; ifp = inm->inm_ifp; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); KASSERT(igi && igi->igi_ifp == ifp, ("%s: inconsistent ifp", __func__)); /* * Groups joined on loopback or marked as 'not reported', * e.g. 224.0.0.1, enter the IGMP_SILENT_MEMBER state and * are never reported in any IGMP protocol exchanges. * All other groups enter the appropriate IGMP state machine * for the version in use on this link. * A link marked as IGIF_SILENT causes IGMP to be completely * disabled for the link. 
*/ if ((ifp->if_flags & IFF_LOOPBACK) || (igi->igi_flags & IGIF_SILENT) || !igmp_isgroupreported(inm->inm_addr)) { CTR1(KTR_IGMPV3, "%s: not kicking state machine for silent group", __func__); inm->inm_state = IGMP_SILENT_MEMBER; inm->inm_timer = 0; } else { /* * Deal with overlapping in_multi lifecycle. * If this group was LEAVING, then make sure * we drop the reference we picked up to keep the * group around for the final INCLUDE {} enqueue. */ if (igi->igi_version == IGMP_VERSION_3 && inm->inm_state == IGMP_LEAVING_MEMBER) inm_release_locked(inm); inm->inm_state = IGMP_REPORTING_MEMBER; switch (igi->igi_version) { case IGMP_VERSION_1: case IGMP_VERSION_2: inm->inm_state = IGMP_IDLE_MEMBER; error = igmp_v1v2_queue_report(inm, (igi->igi_version == IGMP_VERSION_2) ? IGMP_v2_HOST_MEMBERSHIP_REPORT : IGMP_v1_HOST_MEMBERSHIP_REPORT); if (error == 0) { inm->inm_timer = IGMP_RANDOM_DELAY( IGMP_V1V2_MAX_RI * PR_FASTHZ); V_current_state_timers_running = 1; } break; case IGMP_VERSION_3: /* * Defer update of T0 to T1, until the first copy * of the state change has been transmitted. */ syncstates = 0; /* * Immediately enqueue a State-Change Report for * this interface, freeing any previous reports. * Don't kick the timers if there is nothing to do, * or if an error occurred. */ mq = &inm->inm_scq; mbufq_drain(mq); retval = igmp_v3_enqueue_group_record(mq, inm, 1, 0, 0); CTR2(KTR_IGMPV3, "%s: enqueue record = %d", __func__, retval); if (retval <= 0) { error = retval * -1; break; } /* * Schedule transmission of pending state-change * report up to RV times for this link. The timer * will fire at the next igmp_fasttimo (~200ms), * giving us an opportunity to merge the reports. 
*/ if (igi->igi_flags & IGIF_LOOPBACK) { inm->inm_scrv = 1; } else { KASSERT(igi->igi_rv > 1, ("%s: invalid robustness %d", __func__, igi->igi_rv)); inm->inm_scrv = igi->igi_rv; } inm->inm_sctimer = 1; V_state_change_timers_running = 1; error = 0; break; } } /* * Only update the T0 state if state change is atomic, * i.e. we don't need to wait for a timer to fire before we * can consider the state change to have been communicated. */ if (syncstates) { inm_commit(inm); CTR3(KTR_IGMPV3, "%s: T1 -> T0 for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); } return (error); } /* * Issue an intermediate state change during the IGMP life-cycle. */ static int -igmp_handle_state_change(struct in_multi *inm, struct igmp_ifinfo *igi) +igmp_handle_state_change(struct in_multi *inm, struct igmp_ifsoftc *igi) { struct ifnet *ifp; int retval; CTR4(KTR_IGMPV3, "%s: state change for %s on ifp %p(%s)", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp, inm->inm_ifp->if_xname); ifp = inm->inm_ifp; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); KASSERT(igi && igi->igi_ifp == ifp, ("%s: inconsistent ifp", __func__)); if ((ifp->if_flags & IFF_LOOPBACK) || (igi->igi_flags & IGIF_SILENT) || !igmp_isgroupreported(inm->inm_addr) || (igi->igi_version != IGMP_VERSION_3)) { if (!igmp_isgroupreported(inm->inm_addr)) { CTR1(KTR_IGMPV3, "%s: not kicking state machine for silent group", __func__); } CTR1(KTR_IGMPV3, "%s: nothing to do", __func__); inm_commit(inm); CTR3(KTR_IGMPV3, "%s: T1 -> T0 for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); return (0); } mbufq_drain(&inm->inm_scq); retval = igmp_v3_enqueue_group_record(&inm->inm_scq, inm, 1, 0, 0); CTR2(KTR_IGMPV3, "%s: enqueue record = %d", __func__, retval); if (retval <= 0) return (-retval); /* * If record(s) were enqueued, start the state-change * report timer for this group. */ inm->inm_scrv = ((igi->igi_flags & IGIF_LOOPBACK) ? 
1 : igi->igi_rv); inm->inm_sctimer = 1; V_state_change_timers_running = 1; return (0); } /* * Perform the final leave for an IGMP group. * * When leaving a group: * IGMPv1 does nothing. * IGMPv2 sends a host leave message, if and only if we are the reporter. * IGMPv3 enqueues a state-change report containing a transition * to INCLUDE {} for immediate transmission. */ static void -igmp_final_leave(struct in_multi *inm, struct igmp_ifinfo *igi) +igmp_final_leave(struct in_multi *inm, struct igmp_ifsoftc *igi) { int syncstates; syncstates = 1; CTR4(KTR_IGMPV3, "%s: final leave %s on ifp %p(%s)", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp, inm->inm_ifp->if_xname); IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: case IGMP_LEAVING_MEMBER: /* Already leaving or left; do nothing. */ CTR1(KTR_IGMPV3, "%s: not kicking state machine for silent group", __func__); break; case IGMP_REPORTING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: if (igi->igi_version == IGMP_VERSION_2) { #ifdef INVARIANTS if (inm->inm_state == IGMP_G_QUERY_PENDING_MEMBER || inm->inm_state == IGMP_SG_QUERY_PENDING_MEMBER) panic("%s: IGMPv3 state reached, not IGMPv3 mode", __func__); #endif igmp_v1v2_queue_report(inm, IGMP_HOST_LEAVE_MESSAGE); inm->inm_state = IGMP_NOT_MEMBER; } else if (igi->igi_version == IGMP_VERSION_3) { /* * Stop group timer and all pending reports. * Immediately enqueue a state-change report * TO_IN {} to be sent on the next fast timeout, * giving us an opportunity to merge reports. 
*/ mbufq_drain(&inm->inm_scq); inm->inm_timer = 0; if (igi->igi_flags & IGIF_LOOPBACK) { inm->inm_scrv = 1; } else { inm->inm_scrv = igi->igi_rv; } CTR4(KTR_IGMPV3, "%s: Leaving %s/%s with %d " "pending retransmissions.", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname, inm->inm_scrv); if (inm->inm_scrv == 0) { inm->inm_state = IGMP_NOT_MEMBER; inm->inm_sctimer = 0; } else { int retval; inm_acquire_locked(inm); retval = igmp_v3_enqueue_group_record( &inm->inm_scq, inm, 1, 0, 0); KASSERT(retval != 0, ("%s: enqueue record = %d", __func__, retval)); inm->inm_state = IGMP_LEAVING_MEMBER; inm->inm_sctimer = 1; V_state_change_timers_running = 1; syncstates = 0; } break; } break; case IGMP_LAZY_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_AWAKENING_MEMBER: /* Our reports are suppressed; do nothing. */ break; } if (syncstates) { inm_commit(inm); CTR3(KTR_IGMPV3, "%s: T1 -> T0 for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); inm->inm_st[1].iss_fmode = MCAST_UNDEFINED; CTR3(KTR_IGMPV3, "%s: T1 now MCAST_UNDEFINED for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); } } /* * Enqueue an IGMPv3 group record to the given output queue. * * XXX This function could do with having the allocation code * split out, and the multiple-tree-walks coalesced into a single * routine as has been done in igmp_v3_enqueue_filter_change(). * * If is_state_change is zero, a current-state record is appended. * If is_state_change is non-zero, a state-change report is appended. * * If is_group_query is non-zero, an mbuf packet chain is allocated. * If is_group_query is zero, and if there is a packet with free space * at the tail of the queue, it will be appended to providing there * is enough free space. * Otherwise a new mbuf packet chain is allocated. * * If is_source_query is non-zero, each source is checked to see if * it was recorded for a Group-Source query, and will be omitted if * it is not both in-mode and recorded. 
* * The function will attempt to allocate leading space in the packet * for the IP/IGMP header to be prepended without fragmenting the chain. * * If successful the size of all data appended to the queue is returned, * otherwise an error code less than zero is returned, or zero if * no record(s) were appended. */ static int igmp_v3_enqueue_group_record(struct mbufq *mq, struct in_multi *inm, const int is_state_change, const int is_group_query, const int is_source_query) { struct igmp_grouprec ig; struct igmp_grouprec *pig; struct ifnet *ifp; struct ip_msource *ims, *nims; struct mbuf *m0, *m, *md; int error, is_filter_list_change; int minrec0len, m0srcs, msrcs, nbytes, off; int record_has_sources; int now; int type; in_addr_t naddr; uint8_t mode; IN_MULTI_LOCK_ASSERT(); error = 0; ifp = inm->inm_ifp; is_filter_list_change = 0; m = NULL; m0 = NULL; m0srcs = 0; msrcs = 0; nbytes = 0; nims = NULL; record_has_sources = 1; pig = NULL; type = IGMP_DO_NOTHING; mode = inm->inm_st[1].iss_fmode; /* * If we did not transition out of ASM mode during t0->t1, * and there are no source nodes to process, we can skip * the generation of source records. */ if (inm->inm_st[0].iss_asm > 0 && inm->inm_st[1].iss_asm > 0 && inm->inm_nsrc == 0) record_has_sources = 0; if (is_state_change) { /* * Queue a state change record. * If the mode did not change, and there are non-ASM * listeners or source filters present, * we potentially need to issue two records for the group. * If we are transitioning to MCAST_UNDEFINED, we need * not send any sources. * If there are ASM listeners, and there was no filter * mode transition of any kind, do nothing. 
*/ if (mode != inm->inm_st[0].iss_fmode) { if (mode == MCAST_EXCLUDE) { CTR1(KTR_IGMPV3, "%s: change to EXCLUDE", __func__); type = IGMP_CHANGE_TO_EXCLUDE_MODE; } else { CTR1(KTR_IGMPV3, "%s: change to INCLUDE", __func__); type = IGMP_CHANGE_TO_INCLUDE_MODE; if (mode == MCAST_UNDEFINED) record_has_sources = 0; } } else { if (record_has_sources) { is_filter_list_change = 1; } else { type = IGMP_DO_NOTHING; } } } else { /* * Queue a current state record. */ if (mode == MCAST_EXCLUDE) { type = IGMP_MODE_IS_EXCLUDE; } else if (mode == MCAST_INCLUDE) { type = IGMP_MODE_IS_INCLUDE; KASSERT(inm->inm_st[1].iss_asm == 0, ("%s: inm %p is INCLUDE but ASM count is %d", __func__, inm, inm->inm_st[1].iss_asm)); } } /* * Generate the filter list changes using a separate function. */ if (is_filter_list_change) return (igmp_v3_enqueue_filter_change(mq, inm)); if (type == IGMP_DO_NOTHING) { CTR3(KTR_IGMPV3, "%s: nothing to do for %s/%s", __func__, inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); return (0); } /* * If any sources are present, we must be able to fit at least * one in the trailing space of the tail packet's mbuf, * ideally more. */ minrec0len = sizeof(struct igmp_grouprec); if (record_has_sources) minrec0len += sizeof(in_addr_t); CTR4(KTR_IGMPV3, "%s: queueing %s for %s/%s", __func__, igmp_rec_type_to_str(type), inet_ntoa(inm->inm_addr), inm->inm_ifp->if_xname); /* * Check if we have a packet in the tail of the queue for this * group into which the first group record for this group will fit. * Otherwise allocate a new packet. * Always allocate leading space for IP+RA_OPT+IGMP+REPORT. * Note: Group records for G/GSR query responses MUST be sent * in their own packet. 
*/ m0 = mbufq_last(mq); if (!is_group_query && m0 != NULL && (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <= IGMP_V3_REPORT_MAXRECS) && (m0->m_pkthdr.len + minrec0len) < (ifp->if_mtu - IGMP_LEADINGSPACE)) { m0srcs = (ifp->if_mtu - m0->m_pkthdr.len - sizeof(struct igmp_grouprec)) / sizeof(in_addr_t); m = m0; CTR1(KTR_IGMPV3, "%s: use existing packet", __func__); } else { if (mbufq_full(mq)) { CTR1(KTR_IGMPV3, "%s: outbound queue full", __func__); return (-ENOMEM); } m = NULL; m0srcs = (ifp->if_mtu - IGMP_LEADINGSPACE - sizeof(struct igmp_grouprec)) / sizeof(in_addr_t); if (!is_state_change && !is_group_query) { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m) m->m_data += IGMP_LEADINGSPACE; } if (m == NULL) { m = m_gethdr(M_NOWAIT, MT_DATA); if (m) M_ALIGN(m, IGMP_LEADINGSPACE); } if (m == NULL) return (-ENOMEM); igmp_save_context(m, ifp); CTR1(KTR_IGMPV3, "%s: allocated first packet", __func__); } /* * Append group record. * If we have sources, we don't know how many yet. */ ig.ig_type = type; ig.ig_datalen = 0; ig.ig_numsrc = 0; ig.ig_group = inm->inm_addr; if (!m_append(m, sizeof(struct igmp_grouprec), (void *)&ig)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed.", __func__); return (-ENOMEM); } nbytes += sizeof(struct igmp_grouprec); /* * Append as many sources as will fit in the first packet. * If we are appending to a new packet, the chain allocation * may potentially use clusters; use m_getptr() in this case. * If we are appending to an existing packet, we need to obtain * a pointer to the group record after m_append(), in case a new * mbuf was allocated. * Only append sources which are in-mode at t1. If we are * transitioning to MCAST_UNDEFINED state on the group, do not * include source entries. * Only report recorded sources in our filter set when responding * to a group-source query. 
*/ if (record_has_sources) { if (m == m0) { md = m_last(m); pig = (struct igmp_grouprec *)(mtod(md, uint8_t *) + md->m_len - nbytes); } else { md = m_getptr(m, 0, &off); pig = (struct igmp_grouprec *)(mtod(md, uint8_t *) + off); } msrcs = 0; RB_FOREACH_SAFE(ims, ip_msource_tree, &inm->inm_srcs, nims) { CTR2(KTR_IGMPV3, "%s: visit node %s", __func__, inet_ntoa_haddr(ims->ims_haddr)); now = ims_get_mode(inm, ims, 1); CTR2(KTR_IGMPV3, "%s: node is %d", __func__, now); if ((now != mode) || (now == mode && mode == MCAST_UNDEFINED)) { CTR1(KTR_IGMPV3, "%s: skip node", __func__); continue; } if (is_source_query && ims->ims_stp == 0) { CTR1(KTR_IGMPV3, "%s: skip unrecorded node", __func__); continue; } CTR1(KTR_IGMPV3, "%s: append node", __func__); naddr = htonl(ims->ims_haddr); if (!m_append(m, sizeof(in_addr_t), (void *)&naddr)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed.", __func__); return (-ENOMEM); } nbytes += sizeof(in_addr_t); ++msrcs; if (msrcs == m0srcs) break; } CTR2(KTR_IGMPV3, "%s: msrcs is %d this packet", __func__, msrcs); pig->ig_numsrc = htons(msrcs); nbytes += (msrcs * sizeof(in_addr_t)); } if (is_source_query && msrcs == 0) { CTR1(KTR_IGMPV3, "%s: no recorded sources to report", __func__); if (m != m0) m_freem(m); return (0); } /* * We are good to go with first packet. */ if (m != m0) { CTR1(KTR_IGMPV3, "%s: enqueueing first packet", __func__); m->m_pkthdr.PH_vt.vt_nrecs = 1; mbufq_enqueue(mq, m); } else m->m_pkthdr.PH_vt.vt_nrecs++; /* * No further work needed if no source list in packet(s). */ if (!record_has_sources) return (nbytes); /* * Whilst sources remain to be announced, we need to allocate * a new packet and fill out as many sources as will fit. * Always try for a cluster first. 
*/ while (nims != NULL) { if (mbufq_full(mq)) { CTR1(KTR_IGMPV3, "%s: outbound queue full", __func__); return (-ENOMEM); } m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m) m->m_data += IGMP_LEADINGSPACE; if (m == NULL) { m = m_gethdr(M_NOWAIT, MT_DATA); if (m) M_ALIGN(m, IGMP_LEADINGSPACE); } if (m == NULL) return (-ENOMEM); igmp_save_context(m, ifp); md = m_getptr(m, 0, &off); pig = (struct igmp_grouprec *)(mtod(md, uint8_t *) + off); CTR1(KTR_IGMPV3, "%s: allocated next packet", __func__); if (!m_append(m, sizeof(struct igmp_grouprec), (void *)&ig)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed.", __func__); return (-ENOMEM); } m->m_pkthdr.PH_vt.vt_nrecs = 1; nbytes += sizeof(struct igmp_grouprec); m0srcs = (ifp->if_mtu - IGMP_LEADINGSPACE - sizeof(struct igmp_grouprec)) / sizeof(in_addr_t); msrcs = 0; RB_FOREACH_FROM(ims, ip_msource_tree, nims) { CTR2(KTR_IGMPV3, "%s: visit node %s", __func__, inet_ntoa_haddr(ims->ims_haddr)); now = ims_get_mode(inm, ims, 1); if ((now != mode) || (now == mode && mode == MCAST_UNDEFINED)) { CTR1(KTR_IGMPV3, "%s: skip node", __func__); continue; } if (is_source_query && ims->ims_stp == 0) { CTR1(KTR_IGMPV3, "%s: skip unrecorded node", __func__); continue; } CTR1(KTR_IGMPV3, "%s: append node", __func__); naddr = htonl(ims->ims_haddr); if (!m_append(m, sizeof(in_addr_t), (void *)&naddr)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed.", __func__); return (-ENOMEM); } ++msrcs; if (msrcs == m0srcs) break; } pig->ig_numsrc = htons(msrcs); nbytes += (msrcs * sizeof(in_addr_t)); CTR1(KTR_IGMPV3, "%s: enqueueing next packet", __func__); mbufq_enqueue(mq, m); } return (nbytes); } /* * Type used to mark record pass completion. * We exploit the fact we can cast to this easily from the * current filter modes on each ip_msource node. 
*/ typedef enum { REC_NONE = 0x00, /* MCAST_UNDEFINED */ REC_ALLOW = 0x01, /* MCAST_INCLUDE */ REC_BLOCK = 0x02, /* MCAST_EXCLUDE */ REC_FULL = REC_ALLOW | REC_BLOCK } rectype_t; /* * Enqueue an IGMPv3 filter list change to the given output queue. * * Source list filter state is held in an RB-tree. When the filter list * for a group is changed without changing its mode, we need to compute * the deltas between T0 and T1 for each source in the filter set, * and enqueue the appropriate ALLOW_NEW/BLOCK_OLD records. * * As we may potentially queue two record types, and the entire R-B tree * needs to be walked at once, we break this out into its own function * so we can generate a tightly packed queue of packets. * * XXX This could be written to only use one tree walk, although that makes * serializing into the mbuf chains a bit harder. For now we do two walks * which makes things easier on us, and it may or may not be harder on * the L2 cache. * * If successful the size of all data appended to the queue is returned, * otherwise an error code less than zero is returned, or zero if * no record(s) were appended. 
*/ static int igmp_v3_enqueue_filter_change(struct mbufq *mq, struct in_multi *inm) { static const int MINRECLEN = sizeof(struct igmp_grouprec) + sizeof(in_addr_t); struct ifnet *ifp; struct igmp_grouprec ig; struct igmp_grouprec *pig; struct ip_msource *ims, *nims; struct mbuf *m, *m0, *md; in_addr_t naddr; int m0srcs, nbytes, npbytes, off, rsrcs, schanged; int nallow, nblock; uint8_t mode, now, then; rectype_t crt, drt, nrt; IN_MULTI_LOCK_ASSERT(); if (inm->inm_nsrc == 0 || (inm->inm_st[0].iss_asm > 0 && inm->inm_st[1].iss_asm > 0)) return (0); ifp = inm->inm_ifp; /* interface */ mode = inm->inm_st[1].iss_fmode; /* filter mode at t1 */ crt = REC_NONE; /* current group record type */ drt = REC_NONE; /* mask of completed group record types */ nrt = REC_NONE; /* record type for current node */ m0srcs = 0; /* # source which will fit in current mbuf chain */ nbytes = 0; /* # of bytes appended to group's state-change queue */ npbytes = 0; /* # of bytes appended this packet */ rsrcs = 0; /* # sources encoded in current record */ schanged = 0; /* # nodes encoded in overall filter change */ nallow = 0; /* # of source entries in ALLOW_NEW */ nblock = 0; /* # of source entries in BLOCK_OLD */ nims = NULL; /* next tree node pointer */ /* * For each possible filter record mode. * The first kind of source we encounter tells us which * is the first kind of record we start appending. * If a node transitioned to UNDEFINED at t1, its mode is treated * as the inverse of the group's filter mode. 
*/ while (drt != REC_FULL) { do { m0 = mbufq_last(mq); if (m0 != NULL && (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <= IGMP_V3_REPORT_MAXRECS) && (m0->m_pkthdr.len + MINRECLEN) < (ifp->if_mtu - IGMP_LEADINGSPACE)) { m = m0; m0srcs = (ifp->if_mtu - m0->m_pkthdr.len - sizeof(struct igmp_grouprec)) / sizeof(in_addr_t); CTR1(KTR_IGMPV3, "%s: use previous packet", __func__); } else { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m) m->m_data += IGMP_LEADINGSPACE; if (m == NULL) { m = m_gethdr(M_NOWAIT, MT_DATA); if (m) M_ALIGN(m, IGMP_LEADINGSPACE); } if (m == NULL) { CTR1(KTR_IGMPV3, "%s: m_get*() failed", __func__); return (-ENOMEM); } m->m_pkthdr.PH_vt.vt_nrecs = 0; igmp_save_context(m, ifp); m0srcs = (ifp->if_mtu - IGMP_LEADINGSPACE - sizeof(struct igmp_grouprec)) / sizeof(in_addr_t); npbytes = 0; CTR1(KTR_IGMPV3, "%s: allocated new packet", __func__); } /* * Append the IGMP group record header to the * current packet's data area. * Recalculate pointer to free space for next * group record, in case m_append() allocated * a new mbuf or cluster. */ memset(&ig, 0, sizeof(ig)); ig.ig_group = inm->inm_addr; if (!m_append(m, sizeof(ig), (void *)&ig)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed", __func__); return (-ENOMEM); } npbytes += sizeof(struct igmp_grouprec); if (m != m0) { /* new packet; offset in chain */ md = m_getptr(m, npbytes - sizeof(struct igmp_grouprec), &off); pig = (struct igmp_grouprec *)(mtod(md, uint8_t *) + off); } else { /* current packet; offset from last append */ md = m_last(m); pig = (struct igmp_grouprec *)(mtod(md, uint8_t *) + md->m_len - sizeof(struct igmp_grouprec)); } /* * Begin walking the tree for this record type * pass, or continue from where we left off * previously if we had to allocate a new packet. * Only report deltas in-mode at t1. * We need not report included sources as allowed * if we are in inclusive mode on the group, * however the converse is not true. 
*/ rsrcs = 0; if (nims == NULL) nims = RB_MIN(ip_msource_tree, &inm->inm_srcs); RB_FOREACH_FROM(ims, ip_msource_tree, nims) { CTR2(KTR_IGMPV3, "%s: visit node %s", __func__, inet_ntoa_haddr(ims->ims_haddr)); now = ims_get_mode(inm, ims, 1); then = ims_get_mode(inm, ims, 0); CTR3(KTR_IGMPV3, "%s: mode: t0 %d, t1 %d", __func__, then, now); if (now == then) { CTR1(KTR_IGMPV3, "%s: skip unchanged", __func__); continue; } if (mode == MCAST_EXCLUDE && now == MCAST_INCLUDE) { CTR1(KTR_IGMPV3, "%s: skip IN src on EX group", __func__); continue; } nrt = (rectype_t)now; if (nrt == REC_NONE) nrt = (rectype_t)(~mode & REC_FULL); if (schanged++ == 0) { crt = nrt; } else if (crt != nrt) continue; naddr = htonl(ims->ims_haddr); if (!m_append(m, sizeof(in_addr_t), (void *)&naddr)) { if (m != m0) m_freem(m); CTR1(KTR_IGMPV3, "%s: m_append() failed", __func__); return (-ENOMEM); } nallow += !!(crt == REC_ALLOW); nblock += !!(crt == REC_BLOCK); if (++rsrcs == m0srcs) break; } /* * If we did not append any tree nodes on this * pass, back out of allocations. */ if (rsrcs == 0) { npbytes -= sizeof(struct igmp_grouprec); if (m != m0) { CTR1(KTR_IGMPV3, "%s: m_free(m)", __func__); m_freem(m); } else { CTR1(KTR_IGMPV3, "%s: m_adj(m, -ig)", __func__); m_adj(m, -((int)sizeof( struct igmp_grouprec))); } continue; } npbytes += (rsrcs * sizeof(in_addr_t)); if (crt == REC_ALLOW) pig->ig_type = IGMP_ALLOW_NEW_SOURCES; else if (crt == REC_BLOCK) pig->ig_type = IGMP_BLOCK_OLD_SOURCES; pig->ig_numsrc = htons(rsrcs); /* * Count the new group record, and enqueue this * packet if it wasn't already queued. 
*/ m->m_pkthdr.PH_vt.vt_nrecs++; if (m != m0) mbufq_enqueue(mq, m); nbytes += npbytes; } while (nims != NULL); drt |= crt; crt = (~crt & REC_FULL); } CTR3(KTR_IGMPV3, "%s: queued %d ALLOW_NEW, %d BLOCK_OLD", __func__, nallow, nblock); return (nbytes); } static int igmp_v3_merge_state_changes(struct in_multi *inm, struct mbufq *scq) { struct mbufq *gq; struct mbuf *m; /* pending state-change */ struct mbuf *m0; /* copy of pending state-change */ struct mbuf *mt; /* last state-change in packet */ int docopy, domerge; u_int recslen; docopy = 0; domerge = 0; recslen = 0; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); /* * If there are further pending retransmissions, make a writable * copy of each queued state-change message before merging. */ if (inm->inm_scrv > 0) docopy = 1; gq = &inm->inm_scq; #ifdef KTR if (mbufq_first(gq) == NULL) { CTR2(KTR_IGMPV3, "%s: WARNING: queue for inm %p is empty", __func__, inm); } #endif m = mbufq_first(gq); while (m != NULL) { /* * Only merge the report into the current packet if * there is sufficient space to do so; an IGMPv3 report * packet may only contain 65,535 group records. * Always use a simple mbuf chain concatenation to do this, * as large state changes for single groups may have * allocated clusters. 
*/ domerge = 0; mt = mbufq_last(scq); if (mt != NULL) { recslen = m_length(m, NULL); if ((mt->m_pkthdr.PH_vt.vt_nrecs + m->m_pkthdr.PH_vt.vt_nrecs <= IGMP_V3_REPORT_MAXRECS) && (mt->m_pkthdr.len + recslen <= (inm->inm_ifp->if_mtu - IGMP_LEADINGSPACE))) domerge = 1; } if (!domerge && mbufq_full(gq)) { CTR2(KTR_IGMPV3, "%s: outbound queue full, skipping whole packet %p", __func__, m); mt = m->m_nextpkt; if (!docopy) m_freem(m); m = mt; continue; } if (!docopy) { CTR2(KTR_IGMPV3, "%s: dequeueing %p", __func__, m); m0 = mbufq_dequeue(gq); m = m0->m_nextpkt; } else { CTR2(KTR_IGMPV3, "%s: copying %p", __func__, m); m0 = m_dup(m, M_NOWAIT); if (m0 == NULL) return (ENOMEM); m0->m_nextpkt = NULL; m = m->m_nextpkt; } if (!domerge) { CTR3(KTR_IGMPV3, "%s: queueing %p to scq %p)", __func__, m0, scq); mbufq_enqueue(scq, m0); } else { struct mbuf *mtl; /* last mbuf of packet mt */ CTR3(KTR_IGMPV3, "%s: merging %p with scq tail %p)", __func__, m0, mt); mtl = m_last(mt); m0->m_flags &= ~M_PKTHDR; mt->m_pkthdr.len += recslen; mt->m_pkthdr.PH_vt.vt_nrecs += m0->m_pkthdr.PH_vt.vt_nrecs; mtl->m_next = m0; } } return (0); } /* * Respond to a pending IGMPv3 General Query. 
*/ static void -igmp_v3_dispatch_general_query(struct igmp_ifinfo *igi) +igmp_v3_dispatch_general_query(struct igmp_ifsoftc *igi) { struct ifmultiaddr *ifma; struct ifnet *ifp; struct in_multi *inm; int retval, loop; IN_MULTI_LOCK_ASSERT(); IGMP_LOCK_ASSERT(); KASSERT(igi->igi_version == IGMP_VERSION_3, ("%s: called when version %d", __func__, igi->igi_version)); ifp = igi->igi_ifp; IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; KASSERT(ifp == inm->inm_ifp, ("%s: inconsistent ifp", __func__)); switch (inm->inm_state) { case IGMP_NOT_MEMBER: case IGMP_SILENT_MEMBER: break; case IGMP_REPORTING_MEMBER: case IGMP_IDLE_MEMBER: case IGMP_LAZY_MEMBER: case IGMP_SLEEPING_MEMBER: case IGMP_AWAKENING_MEMBER: inm->inm_state = IGMP_REPORTING_MEMBER; retval = igmp_v3_enqueue_group_record(&igi->igi_gq, inm, 0, 0, 0); CTR2(KTR_IGMPV3, "%s: enqueue record = %d", __func__, retval); break; case IGMP_G_QUERY_PENDING_MEMBER: case IGMP_SG_QUERY_PENDING_MEMBER: case IGMP_LEAVING_MEMBER: break; } } IF_ADDR_RUNLOCK(ifp); loop = (igi->igi_flags & IGIF_LOOPBACK) ? 1 : 0; igmp_dispatch_queue(&igi->igi_gq, IGMP_MAX_RESPONSE_BURST, loop); /* * Slew transmission of bursts over 500ms intervals. */ if (mbufq_first(&igi->igi_gq) != NULL) { igi->igi_v3_timer = 1 + IGMP_RANDOM_DELAY( IGMP_RESPONSE_BURST_INTERVAL); V_interface_timers_running = 1; } } /* * Transmit the next pending IGMP message in the output queue. * * We get called from netisr_processqueue(). A mutex private to igmpoq * will be acquired and released around this routine. * * VIMAGE: Needs to store/restore vnet pointer on a per-mbuf-chain basis. * MRT: Nothing needs to be done, as IGMP traffic is always local to * a link and uses a link-scope multicast address. 
*/ static void igmp_intr(struct mbuf *m) { struct ip_moptions imo; struct ifnet *ifp; struct mbuf *ipopts, *m0; int error; uint32_t ifindex; CTR2(KTR_IGMPV3, "%s: transmit %p", __func__, m); /* * Set VNET image pointer from enqueued mbuf chain * before doing anything else. Whilst we use interface * indexes to guard against interface detach, they are * unique to each VIMAGE and must be retrieved. */ CURVNET_SET((struct vnet *)(m->m_pkthdr.PH_loc.ptr)); ifindex = igmp_restore_context(m); /* * Check if the ifnet still exists. This limits the scope of * any race in the absence of a global ifp lock for low cost * (an array lookup). */ ifp = ifnet_byindex(ifindex); if (ifp == NULL) { CTR3(KTR_IGMPV3, "%s: dropped %p as ifindex %u went away.", __func__, m, ifindex); m_freem(m); IPSTAT_INC(ips_noroute); goto out; } ipopts = V_igmp_sendra ? m_raopt : NULL; imo.imo_multicast_ttl = 1; imo.imo_multicast_vif = -1; imo.imo_multicast_loop = (V_ip_mrouter != NULL); /* * If the user requested that IGMP traffic be explicitly * redirected to the loopback interface (e.g. they are running a * MANET interface and the routing protocol needs to see the * updates), handle this now. */ if (m->m_flags & M_IGMP_LOOP) imo.imo_multicast_ifp = V_loif; else imo.imo_multicast_ifp = ifp; if (m->m_flags & M_IGMPV2) { m0 = m; } else { m0 = igmp_v3_encap_report(ifp, m); if (m0 == NULL) { CTR2(KTR_IGMPV3, "%s: dropped %p", __func__, m); m_freem(m); IPSTAT_INC(ips_odropped); goto out; } } igmp_scrub_context(m0); m_clrprotoflags(m); m0->m_pkthdr.rcvif = V_loif; #ifdef MAC mac_netinet_igmp_send(ifp, m0); #endif error = ip_output(m0, ipopts, NULL, 0, &imo, NULL); if (error) { CTR3(KTR_IGMPV3, "%s: ip_output(%p) = %d", __func__, m0, error); goto out; } IGMPSTAT_INC(igps_snd_reports); out: /* * We must restore the existing vnet pointer before * continuing as we are run from netisr context. */ CURVNET_RESTORE(); } /* * Encapsulate an IGMPv3 report. 
* * The internal mbuf flag M_IGMPV3_HDR is used to indicate that the mbuf * chain has already had its IP/IGMPv3 header prepended. In this case * the function will not attempt to prepend; the lengths and checksums * will however be re-computed. * * Returns a pointer to the new mbuf chain head, or NULL if the * allocation failed. */ static struct mbuf * igmp_v3_encap_report(struct ifnet *ifp, struct mbuf *m) { struct igmp_report *igmp; struct ip *ip; int hdrlen, igmpreclen; KASSERT((m->m_flags & M_PKTHDR), ("%s: mbuf chain %p is !M_PKTHDR", __func__, m)); igmpreclen = m_length(m, NULL); hdrlen = sizeof(struct ip) + sizeof(struct igmp_report); if (m->m_flags & M_IGMPV3_HDR) { igmpreclen -= hdrlen; } else { M_PREPEND(m, hdrlen, M_NOWAIT); if (m == NULL) return (NULL); m->m_flags |= M_IGMPV3_HDR; } CTR2(KTR_IGMPV3, "%s: igmpreclen is %d", __func__, igmpreclen); m->m_data += sizeof(struct ip); m->m_len -= sizeof(struct ip); igmp = mtod(m, struct igmp_report *); igmp->ir_type = IGMP_v3_HOST_MEMBERSHIP_REPORT; igmp->ir_rsv1 = 0; igmp->ir_rsv2 = 0; igmp->ir_numgrps = htons(m->m_pkthdr.PH_vt.vt_nrecs); igmp->ir_cksum = 0; igmp->ir_cksum = in_cksum(m, sizeof(struct igmp_report) + igmpreclen); m->m_pkthdr.PH_vt.vt_nrecs = 0; m->m_data -= sizeof(struct ip); m->m_len += sizeof(struct ip); ip = mtod(m, struct ip *); ip->ip_tos = IPTOS_PREC_INTERNETCONTROL; ip->ip_len = htons(hdrlen + igmpreclen); ip->ip_off = htons(IP_DF); ip->ip_p = IPPROTO_IGMP; ip->ip_sum = 0; ip->ip_src.s_addr = INADDR_ANY; if (m->m_flags & M_IGMP_LOOP) { struct in_ifaddr *ia; IFP_TO_IA(ifp, ia); if (ia != NULL) { ip->ip_src = ia->ia_addr.sin_addr; ifa_free(&ia->ia_ifa); } } ip->ip_dst.s_addr = htonl(INADDR_ALLRPTS_GROUP); return (m); } #ifdef KTR static char * igmp_rec_type_to_str(const int type) { switch (type) { case IGMP_CHANGE_TO_EXCLUDE_MODE: return "TO_EX"; break; case IGMP_CHANGE_TO_INCLUDE_MODE: return "TO_IN"; break; case IGMP_MODE_IS_EXCLUDE: return "MODE_EX"; break; case IGMP_MODE_IS_INCLUDE: 
return "MODE_IN"; break; case IGMP_ALLOW_NEW_SOURCES: return "ALLOW_NEW"; break; case IGMP_BLOCK_OLD_SOURCES: return "BLOCK_OLD"; break; default: break; } return "unknown"; } #endif static void igmp_init(void *unused __unused) { CTR1(KTR_IGMPV3, "%s: initializing", __func__); IGMP_LOCK_INIT(); m_raopt = igmp_ra_alloc(); netisr_register(&igmp_nh); } SYSINIT(igmp_init, SI_SUB_PSEUDO, SI_ORDER_MIDDLE, igmp_init, NULL); static void igmp_uninit(void *unused __unused) { CTR1(KTR_IGMPV3, "%s: tearing down", __func__); netisr_unregister(&igmp_nh); m_free(m_raopt); m_raopt = NULL; IGMP_LOCK_DESTROY(); } SYSUNINIT(igmp_uninit, SI_SUB_PSEUDO, SI_ORDER_MIDDLE, igmp_uninit, NULL); static void vnet_igmp_init(const void *unused __unused) { CTR1(KTR_IGMPV3, "%s: initializing", __func__); LIST_INIT(&V_igi_head); } VNET_SYSINIT(vnet_igmp_init, SI_SUB_PSEUDO, SI_ORDER_ANY, vnet_igmp_init, NULL); static void vnet_igmp_uninit(const void *unused __unused) { CTR1(KTR_IGMPV3, "%s: tearing down", __func__); KASSERT(LIST_EMPTY(&V_igi_head), ("%s: igi list not empty; ifnets not detached?", __func__)); } VNET_SYSUNINIT(vnet_igmp_uninit, SI_SUB_PSEUDO, SI_ORDER_ANY, vnet_igmp_uninit, NULL); static int igmp_modevent(module_t mod, int type, void *unused __unused) { switch (type) { case MOD_LOAD: case MOD_UNLOAD: break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t igmp_mod = { "igmp", igmp_modevent, 0 }; DECLARE_MODULE(igmp, igmp_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); Index: projects/ifnet/sys/netinet/igmp_var.h =================================================================== --- projects/ifnet/sys/netinet/igmp_var.h (revision 279031) +++ projects/ifnet/sys/netinet/igmp_var.h (revision 279032) @@ -1,220 +1,231 @@ /*-a * Copyright (c) 1988 Stephen Deering. * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Stephen Deering of Stanford University. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)igmp_var.h 8.1 (Berkeley) 7/19/93 * $FreeBSD$ */ #ifndef _NETINET_IGMP_VAR_H_ #define _NETINET_IGMP_VAR_H_ /* * Internet Group Management Protocol (IGMP), * implementation-specific definitions. * * Written by Steve Deering, Stanford, May 1988. * * MULTICAST Revision: 3.5.1.3 */ -#ifndef BURN_BRIDGES /* - * Pre-IGMPV3 igmpstat structure. 
- */ -struct oigmpstat { - u_int igps_rcv_total; /* total IGMP messages received */ - u_int igps_rcv_tooshort; /* received with too few bytes */ - u_int igps_rcv_badsum; /* received with bad checksum */ - u_int igps_rcv_queries; /* received membership queries */ - u_int igps_rcv_badqueries; /* received invalid queries */ - u_int igps_rcv_reports; /* received membership reports */ - u_int igps_rcv_badreports; /* received invalid reports */ - u_int igps_rcv_ourreports; /* received reports for our groups */ - u_int igps_snd_reports; /* sent membership reports */ - u_int igps_rcv_toolong; /* received with too many bytes */ -}; -#endif - -/* * IGMPv3 protocol statistics. */ struct igmpstat { /* * Structure header (to insulate ABI changes). */ uint32_t igps_version; /* version of this structure */ uint32_t igps_len; /* length of this structure */ /* * Message statistics. */ uint64_t igps_rcv_total; /* total IGMP messages received */ uint64_t igps_rcv_tooshort; /* received with too few bytes */ uint64_t igps_rcv_badttl; /* received with ttl other than 1 */ uint64_t igps_rcv_badsum; /* received with bad checksum */ /* * Query statistics. */ uint64_t igps_rcv_v1v2_queries; /* received IGMPv1/IGMPv2 queries */ uint64_t igps_rcv_v3_queries; /* received IGMPv3 queries */ uint64_t igps_rcv_badqueries; /* received invalid queries */ uint64_t igps_rcv_gen_queries; /* received general queries */ uint64_t igps_rcv_group_queries;/* received group queries */ uint64_t igps_rcv_gsr_queries; /* received group-source queries */ uint64_t igps_drop_gsr_queries; /* dropped group-source queries */ /* * Report statistics. */ uint64_t igps_rcv_reports; /* received membership reports */ uint64_t igps_rcv_badreports; /* received invalid reports */ uint64_t igps_rcv_ourreports; /* received reports for our groups */ uint64_t igps_rcv_nora; /* received w/o Router Alert option */ uint64_t igps_snd_reports; /* sent membership reports */ /* * Padding for future additions. 
*/ uint64_t __igps_pad[4]; }; #define IGPS_VERSION_3 3 /* as of FreeBSD 8.x */ #define IGPS_VERSION3_LEN 168 - -#ifdef _KERNEL -#define IGMPSTAT_ADD(name, val) V_igmpstat.name += (val) -#define IGMPSTAT_INC(name) IGMPSTAT_ADD(name, 1) -#endif - #ifdef CTASSERT -CTASSERT(sizeof(struct igmpstat) == 168); +CTASSERT(sizeof(struct igmpstat) == IGPS_VERSION3_LEN); #endif -#ifdef _KERNEL -#define IGMP_RANDOM_DELAY(X) (random() % (X) + 1) +/* + * Identifiers for IGMP sysctl nodes + */ +#define IGMPCTL_STATS 1 /* statistics (read-only) */ +#define IGMP_RANDOM_DELAY(X) (random() % (X) + 1) #define IGMP_MAX_STATE_CHANGES 24 /* Max pending changes per group */ /* * IGMP per-group states. */ #define IGMP_NOT_MEMBER 0 /* Can garbage collect in_multi */ #define IGMP_SILENT_MEMBER 1 /* Do not perform IGMP for group */ #define IGMP_REPORTING_MEMBER 2 /* IGMPv1/2/3 we are reporter */ #define IGMP_IDLE_MEMBER 3 /* IGMPv1/2 we reported last */ #define IGMP_LAZY_MEMBER 4 /* IGMPv1/2 other member reporting */ #define IGMP_SLEEPING_MEMBER 5 /* IGMPv1/2 start query response */ #define IGMP_AWAKENING_MEMBER 6 /* IGMPv1/2 group timer will start */ #define IGMP_G_QUERY_PENDING_MEMBER 7 /* IGMPv3 group query pending */ #define IGMP_SG_QUERY_PENDING_MEMBER 8 /* IGMPv3 source query pending */ #define IGMP_LEAVING_MEMBER 9 /* IGMPv3 dying gasp (pending last */ /* retransmission of INCLUDE {}) */ /* * IGMP version tag. */ #define IGMP_VERSION_NONE 0 /* Invalid */ #define IGMP_VERSION_1 1 #define IGMP_VERSION_2 2 #define IGMP_VERSION_3 3 /* Default */ /* * IGMPv3 protocol control variables. 
*/ #define IGMP_RV_INIT 2 /* Robustness Variable */ #define IGMP_RV_MIN 1 #define IGMP_RV_MAX 7 #define IGMP_QI_INIT 125 /* Query Interval (s) */ #define IGMP_QI_MIN 1 #define IGMP_QI_MAX 255 #define IGMP_QRI_INIT 10 /* Query Response Interval (s) */ #define IGMP_QRI_MIN 1 #define IGMP_QRI_MAX 255 #define IGMP_URI_INIT 3 /* Unsolicited Report Interval (s) */ #define IGMP_URI_MIN 0 #define IGMP_URI_MAX 10 #define IGMP_MAX_G_GS_PACKETS 8 /* # of packets to answer G/GS */ #define IGMP_MAX_STATE_CHANGE_PACKETS 8 /* # of packets per state change */ #define IGMP_MAX_RESPONSE_PACKETS 16 /* # of packets for general query */ #define IGMP_MAX_RESPONSE_BURST 4 /* # of responses to send at once */ #define IGMP_RESPONSE_BURST_INTERVAL (PR_FASTHZ / 2) /* 500ms */ /* * IGMP-specific mbuf flags. */ #define M_IGMPV2 M_PROTO1 /* Packet is IGMPv2 */ #define M_IGMPV3_HDR M_PROTO2 /* Packet has IGMPv3 headers */ #define M_GROUPREC M_PROTO3 /* mbuf chain is a group record */ #define M_IGMP_LOOP M_PROTO4 /* transmit on loif, not real ifp */ /* * Default amount of leading space for IGMPv3 to allocate at the * beginning of its mbuf packet chains, to avoid fragmentation and * unnecessary allocation of leading mbufs. */ #define RAOPT_LEN 4 /* Length of IP Router Alert option */ #define IGMP_LEADINGSPACE \ (sizeof(struct ip) + RAOPT_LEN + sizeof(struct igmp_report)) /* + * Structure returned by net.inet.igmp.ifinfo sysctl. 
+ */ +struct igmp_ifinfo { + uint32_t igi_version; /* IGMPv3 Host Compatibility Mode */ + uint32_t igi_v1_timer; /* IGMPv1 Querier Present timer (s) */ + uint32_t igi_v2_timer; /* IGMPv2 Querier Present timer (s) */ + uint32_t igi_v3_timer; /* IGMPv3 General Query (interface) timer (s)*/ + uint32_t igi_flags; /* IGMP per-interface flags */ +#define IGIF_SILENT 0x00000001 /* Do not use IGMP on this ifp */ +#define IGIF_LOOPBACK 0x00000002 /* Send IGMP reports to loopback */ + uint32_t igi_rv; /* IGMPv3 Robustness Variable */ + uint32_t igi_qi; /* IGMPv3 Query Interval (s) */ + uint32_t igi_qri; /* IGMPv3 Query Response Interval (s) */ + uint32_t igi_uri; /* IGMPv3 Unsolicited Report Interval (s) */ +}; + +#ifdef _KERNEL +#define IGMPSTAT_ADD(name, val) V_igmpstat.name += (val) +#define IGMPSTAT_INC(name) IGMPSTAT_ADD(name, 1) + +/* * Subsystem lock macros. * The IGMP lock is only taken with IGMP. Currently it is system-wide. * VIMAGE: The lock could be pushed to per-VIMAGE granularity in future. */ #define IGMP_LOCK_INIT() mtx_init(&igmp_mtx, "igmp_mtx", NULL, MTX_DEF) #define IGMP_LOCK_DESTROY() mtx_destroy(&igmp_mtx) #define IGMP_LOCK() mtx_lock(&igmp_mtx) #define IGMP_LOCK_ASSERT() mtx_assert(&igmp_mtx, MA_OWNED) #define IGMP_UNLOCK() mtx_unlock(&igmp_mtx) #define IGMP_UNLOCK_ASSERT() mtx_assert(&igmp_mtx, MA_NOTOWNED) -struct igmp_ifinfo; +/* + * Per-interface IGMP router version information. 
+ */ +struct igmp_ifsoftc { + LIST_ENTRY(igmp_ifsoftc) igi_link; + struct ifnet *igi_ifp; /* pointer back to interface */ + uint32_t igi_version; /* IGMPv3 Host Compatibility Mode */ + uint32_t igi_v1_timer; /* IGMPv1 Querier Present timer (s) */ + uint32_t igi_v2_timer; /* IGMPv2 Querier Present timer (s) */ + uint32_t igi_v3_timer; /* IGMPv3 General Query (interface) timer (s)*/ + uint32_t igi_flags; /* IGMP per-interface flags */ + uint32_t igi_rv; /* IGMPv3 Robustness Variable */ + uint32_t igi_qi; /* IGMPv3 Query Interval (s) */ + uint32_t igi_qri; /* IGMPv3 Query Response Interval (s) */ + uint32_t igi_uri; /* IGMPv3 Unsolicited Report Interval (s) */ + SLIST_HEAD(,in_multi) igi_relinmhead; /* released groups */ + struct mbufq igi_gq; /* general query responses queue */ +}; int igmp_change_state(struct in_multi *); void igmp_fasttimo(void); -struct igmp_ifinfo * +struct igmp_ifsoftc * igmp_domifattach(struct ifnet *); void igmp_domifdetach(struct ifnet *); void igmp_ifdetach(struct ifnet *); int igmp_input(struct mbuf **, int *, int); void igmp_slowtimo(void); SYSCTL_DECL(_net_inet_igmp); #endif /* _KERNEL */ - -/* - * Identifiers for IGMP sysctl nodes - */ -#define IGMPCTL_STATS 1 /* statistics (read-only) */ - #endif Index: projects/ifnet/sys/netinet/in_mcast.c =================================================================== --- projects/ifnet/sys/netinet/in_mcast.c (revision 279031) +++ projects/ifnet/sys/netinet/in_mcast.c (revision 279032) @@ -1,3008 +1,3008 @@ /*- * Copyright (c) 2007-2009 Bruce Simpson. * Copyright (c) 2005 Robert N. M. Watson. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * IPv4 multicast socket, group, and socket option processing module. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef KTR_IGMPV3 #define KTR_IGMPV3 KTR_INET #endif #ifndef __SOCKUNION_DECLARED union sockunion { struct sockaddr_storage ss; struct sockaddr sa; struct sockaddr_dl sdl; struct sockaddr_in sin; }; typedef union sockunion sockunion_t; #define __SOCKUNION_DECLARED #endif /* __SOCKUNION_DECLARED */ static MALLOC_DEFINE(M_INMFILTER, "in_mfilter", "IPv4 multicast PCB-layer source filter"); static MALLOC_DEFINE(M_IPMADDR, "in_multi", "IPv4 multicast group"); static MALLOC_DEFINE(M_IPMOPTS, "ip_moptions", "IPv4 multicast options"); static MALLOC_DEFINE(M_IPMSOURCE, "ip_msource", "IPv4 multicast IGMP-layer source filter"); /* * Locking: * - Lock order is: Giant, INP_WLOCK, IN_MULTI_LOCK, IGMP_LOCK, IF_ADDR_LOCK. * - The IF_ADDR_LOCK is implicitly taken by inm_lookup() earlier, however * it can be taken by code in net/if.c also. * - ip_moptions and in_mfilter are covered by the INP_WLOCK. * * struct in_multi is covered by IN_MULTI_LOCK. There isn't strictly * any need for in_multi itself to be virtualized -- it is bound to an ifp * anyway no matter what happens. */ struct mtx in_multi_mtx; MTX_SYSINIT(in_multi_mtx, &in_multi_mtx, "in_multi_mtx", MTX_DEF); /* * Functions with non-static linkage defined in this file should be * declared in in_var.h: * imo_multi_filter() * in_addmulti() * in_delmulti() * in_joingroup() * in_joingroup_locked() * in_leavegroup() * in_leavegroup_locked() * and ip_var.h: * inp_freemoptions() * inp_getmoptions() * inp_setmoptions() * * XXX: Both carp and pf need to use the legacy (*,G) KPIs in_addmulti() * and in_delmulti(). 
*/ static void imf_commit(struct in_mfilter *); static int imf_get_source(struct in_mfilter *imf, const struct sockaddr_in *psin, struct in_msource **); static struct in_msource * imf_graft(struct in_mfilter *, const uint8_t, const struct sockaddr_in *); static void imf_leave(struct in_mfilter *); static int imf_prune(struct in_mfilter *, const struct sockaddr_in *); static void imf_purge(struct in_mfilter *); static void imf_rollback(struct in_mfilter *); static void imf_reap(struct in_mfilter *); static int imo_grow(struct ip_moptions *); static size_t imo_match_group(const struct ip_moptions *, const struct ifnet *, const struct sockaddr *); static struct in_msource * imo_match_source(const struct ip_moptions *, const size_t, const struct sockaddr *); static void ims_merge(struct ip_msource *ims, const struct in_msource *lims, const int rollback); static int in_getmulti(struct ifnet *, const struct in_addr *, struct in_multi **); static int inm_get_source(struct in_multi *inm, const in_addr_t haddr, const int noalloc, struct ip_msource **pims); #ifdef KTR static int inm_is_ifp_detached(const struct in_multi *); #endif static int inm_merge(struct in_multi *, /*const*/ struct in_mfilter *); static void inm_purge(struct in_multi *); static void inm_reap(struct in_multi *); static struct ip_moptions * inp_findmoptions(struct inpcb *); static void inp_freemoptions_internal(struct ip_moptions *); static void inp_gcmoptions(void *, int); static int inp_get_source_filters(struct inpcb *, struct sockopt *); static int inp_join_group(struct inpcb *, struct sockopt *); static int inp_leave_group(struct inpcb *, struct sockopt *); static struct ifnet * inp_lookup_mcast_ifp(const struct inpcb *, const struct sockaddr_in *, const struct in_addr); static int inp_block_unblock_source(struct inpcb *, struct sockopt *); static int inp_set_multicast_if(struct inpcb *, struct sockopt *); static int inp_set_source_filters(struct inpcb *, struct sockopt *); static int 
sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS); static SYSCTL_NODE(_net_inet_ip, OID_AUTO, mcast, CTLFLAG_RW, 0, "IPv4 multicast"); static u_long in_mcast_maxgrpsrc = IP_MAX_GROUP_SRC_FILTER; SYSCTL_ULONG(_net_inet_ip_mcast, OID_AUTO, maxgrpsrc, CTLFLAG_RWTUN, &in_mcast_maxgrpsrc, 0, "Max source filters per group"); static u_long in_mcast_maxsocksrc = IP_MAX_SOCK_SRC_FILTER; SYSCTL_ULONG(_net_inet_ip_mcast, OID_AUTO, maxsocksrc, CTLFLAG_RWTUN, &in_mcast_maxsocksrc, 0, "Max source filters per socket"); int in_mcast_loop = IP_DEFAULT_MULTICAST_LOOP; SYSCTL_INT(_net_inet_ip_mcast, OID_AUTO, loop, CTLFLAG_RWTUN, &in_mcast_loop, 0, "Loopback multicast datagrams by default"); static SYSCTL_NODE(_net_inet_ip_mcast, OID_AUTO, filters, CTLFLAG_RD | CTLFLAG_MPSAFE, sysctl_ip_mcast_filters, "Per-interface stack-wide source filters"); static STAILQ_HEAD(, ip_moptions) imo_gc_list = STAILQ_HEAD_INITIALIZER(imo_gc_list); static struct task imo_gc_task = TASK_INITIALIZER(0, inp_gcmoptions, NULL); #ifdef KTR /* * Inline function which wraps assertions for a valid ifp. * The ifnet layer will set the ifma's ifp pointer to NULL if the ifp * is detached. */ static int __inline inm_is_ifp_detached(const struct in_multi *inm) { struct ifnet *ifp; KASSERT(inm->inm_ifma != NULL, ("%s: no ifma", __func__)); ifp = inm->inm_ifma->ifma_ifp; if (ifp != NULL) { /* * Sanity check that netinet's notion of ifp is the * same as net's. */ KASSERT(inm->inm_ifp == ifp, ("%s: bad ifp", __func__)); } return (ifp == NULL); } #endif /* * Initialize an in_mfilter structure to a known state at t0, t1 * with an empty source filter list. */ static __inline void imf_init(struct in_mfilter *imf, const int st0, const int st1) { memset(imf, 0, sizeof(struct in_mfilter)); RB_INIT(&imf->imf_sources); imf->imf_st[0] = st0; imf->imf_st[1] = st1; } /* * Function for looking up an in_multi record for an IPv4 multicast address * on a given interface. ifp must be valid. If no record found, return NULL. 
* The IN_MULTI_LOCK and IF_ADDR_LOCK on ifp must be held. */ struct in_multi * inm_lookup_locked(struct ifnet *ifp, const struct in_addr ina) { struct ifmultiaddr *ifma; struct in_multi *inm; IN_MULTI_LOCK_ASSERT(); IF_ADDR_LOCK_ASSERT(ifp); inm = NULL; TAILQ_FOREACH(ifma, &((ifp)->if_multiaddrs), ifma_link) { if (ifma->ifma_addr->sa_family == AF_INET) { inm = (struct in_multi *)ifma->ifma_protospec; if (inm->inm_addr.s_addr == ina.s_addr) break; inm = NULL; } } return (inm); } /* * Wrapper for inm_lookup_locked(). * The IF_ADDR_LOCK will be taken on ifp and released on return. */ struct in_multi * inm_lookup(struct ifnet *ifp, const struct in_addr ina) { struct in_multi *inm; IN_MULTI_LOCK_ASSERT(); IF_ADDR_RLOCK(ifp); inm = inm_lookup_locked(ifp, ina); IF_ADDR_RUNLOCK(ifp); return (inm); } /* * Resize the ip_moptions vector to the next power-of-two minus 1. * May be called with locks held; do not sleep. */ static int imo_grow(struct ip_moptions *imo) { struct in_multi **nmships; struct in_multi **omships; struct in_mfilter *nmfilters; struct in_mfilter *omfilters; size_t idx; size_t newmax; size_t oldmax; nmships = NULL; nmfilters = NULL; omships = imo->imo_membership; omfilters = imo->imo_mfilters; oldmax = imo->imo_max_memberships; newmax = ((oldmax + 1) * 2) - 1; if (newmax <= IP_MAX_MEMBERSHIPS) { nmships = (struct in_multi **)realloc(omships, sizeof(struct in_multi *) * newmax, M_IPMOPTS, M_NOWAIT); nmfilters = (struct in_mfilter *)realloc(omfilters, sizeof(struct in_mfilter) * newmax, M_INMFILTER, M_NOWAIT); if (nmships != NULL && nmfilters != NULL) { /* Initialize newly allocated source filter heads. 
*/ for (idx = oldmax; idx < newmax; idx++) { imf_init(&nmfilters[idx], MCAST_UNDEFINED, MCAST_EXCLUDE); } imo->imo_max_memberships = newmax; imo->imo_membership = nmships; imo->imo_mfilters = nmfilters; } } if (nmships == NULL || nmfilters == NULL) { if (nmships != NULL) free(nmships, M_IPMOPTS); if (nmfilters != NULL) free(nmfilters, M_INMFILTER); return (ETOOMANYREFS); } return (0); } /* * Find an IPv4 multicast group entry for this ip_moptions instance * which matches the specified group, and optionally an interface. * Return its index into the array, or -1 if not found. */ static size_t imo_match_group(const struct ip_moptions *imo, const struct ifnet *ifp, const struct sockaddr *group) { const struct sockaddr_in *gsin; struct in_multi **pinm; int idx; int nmships; gsin = (const struct sockaddr_in *)group; /* The imo_membership array may be lazy allocated. */ if (imo->imo_membership == NULL || imo->imo_num_memberships == 0) return (-1); nmships = imo->imo_num_memberships; pinm = &imo->imo_membership[0]; for (idx = 0; idx < nmships; idx++, pinm++) { if (*pinm == NULL) continue; if ((ifp == NULL || ((*pinm)->inm_ifp == ifp)) && in_hosteq((*pinm)->inm_addr, gsin->sin_addr)) { break; } } if (idx >= nmships) idx = -1; return (idx); } /* * Find an IPv4 multicast source entry for this imo which matches * the given group index for this socket, and source address. * * NOTE: This does not check if the entry is in-mode, merely if * it exists, which may not be the desired behaviour. */ static struct in_msource * imo_match_source(const struct ip_moptions *imo, const size_t gidx, const struct sockaddr *src) { struct ip_msource find; struct in_mfilter *imf; struct ip_msource *ims; const sockunion_t *psa; KASSERT(src->sa_family == AF_INET, ("%s: !AF_INET", __func__)); KASSERT(gidx != -1 && gidx < imo->imo_num_memberships, ("%s: invalid index %d\n", __func__, (int)gidx)); /* The imo_mfilters array may be lazy allocated. 
*/ if (imo->imo_mfilters == NULL) return (NULL); imf = &imo->imo_mfilters[gidx]; /* Source trees are keyed in host byte order. */ psa = (const sockunion_t *)src; find.ims_haddr = ntohl(psa->sin.sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); return ((struct in_msource *)ims); } /* * Perform filtering for multicast datagrams on a socket by group and source. * * Returns 0 if a datagram should be allowed through, or various error codes * if the socket was not a member of the group, or the source was muted, etc. */ int imo_multi_filter(const struct ip_moptions *imo, const struct ifnet *ifp, const struct sockaddr *group, const struct sockaddr *src) { size_t gidx; struct in_msource *ims; int mode; KASSERT(ifp != NULL, ("%s: null ifp", __func__)); gidx = imo_match_group(imo, ifp, group); if (gidx == -1) return (MCAST_NOTGMEMBER); /* * Check if the source was included in an (S,G) join. * Allow reception on exclusive memberships by default, * reject reception on inclusive memberships by default. * Exclude source only if an in-mode exclude filter exists. * Include source only if an in-mode include filter exists. * NOTE: We are comparing group state here at IGMP t1 (now) * with socket-layer t0 (since last downcall). */ mode = imo->imo_mfilters[gidx].imf_st[1]; ims = imo_match_source(imo, gidx, src); if ((ims == NULL && mode == MCAST_INCLUDE) || (ims != NULL && ims->imsl_st[0] != mode)) return (MCAST_NOTSMEMBER); return (MCAST_PASS); } /* * Find and return a reference to an in_multi record for (ifp, group), * and bump its reference count. * If one does not exist, try to allocate it, and update link-layer multicast * filters on ifp to listen for group. * Assumes the IN_MULTI lock is held across the call. * Return 0 if successful, otherwise return an appropriate error code. 
*/ static int in_getmulti(struct ifnet *ifp, const struct in_addr *group, struct in_multi **pinm) { struct sockaddr_in gsin; struct ifmultiaddr *ifma; struct in_ifinfo *ii; struct in_multi *inm; int error; IN_MULTI_LOCK_ASSERT(); ii = (struct in_ifinfo *)ifp->if_afdata[AF_INET]; inm = inm_lookup(ifp, *group); if (inm != NULL) { /* * If we already joined this group, just bump the * refcount and return it. */ KASSERT(inm->inm_refcount >= 1, ("%s: bad refcount %d", __func__, inm->inm_refcount)); ++inm->inm_refcount; *pinm = inm; return (0); } memset(&gsin, 0, sizeof(gsin)); gsin.sin_family = AF_INET; gsin.sin_len = sizeof(struct sockaddr_in); gsin.sin_addr = *group; /* * Check if a link-layer group is already associated * with this network-layer group on the given ifnet. */ error = if_addmulti(ifp, (struct sockaddr *)&gsin, &ifma); if (error != 0) return (error); /* XXX ifma_protospec must be covered by IF_ADDR_LOCK */ IF_ADDR_WLOCK(ifp); /* * If something other than netinet is occupying the link-layer * group, print a meaningful error message and back out of * the allocation. * Otherwise, bump the refcount on the existing network-layer * group association and return it. */ if (ifma->ifma_protospec != NULL) { inm = (struct in_multi *)ifma->ifma_protospec; #ifdef INVARIANTS KASSERT(ifma->ifma_addr != NULL, ("%s: no ifma_addr", __func__)); KASSERT(ifma->ifma_addr->sa_family == AF_INET, ("%s: ifma not AF_INET", __func__)); KASSERT(inm != NULL, ("%s: no ifma_protospec", __func__)); if (inm->inm_ifma != ifma || inm->inm_ifp != ifp || !in_hosteq(inm->inm_addr, *group)) panic("%s: ifma %p is inconsistent with %p (%s)", __func__, ifma, inm, inet_ntoa(*group)); #endif ++inm->inm_refcount; *pinm = inm; IF_ADDR_WUNLOCK(ifp); return (0); } IF_ADDR_WLOCK_ASSERT(ifp); /* * A new in_multi record is needed; allocate and initialize it. * We DO NOT perform an IGMP join as the in_ layer may need to * push an initial source list down to IGMP to support SSM. 
* * The initial source filter state is INCLUDE, {} as per the RFC. */ inm = malloc(sizeof(*inm), M_IPMADDR, M_NOWAIT | M_ZERO); if (inm == NULL) { if_delmulti_ifma(ifma); IF_ADDR_WUNLOCK(ifp); return (ENOMEM); } inm->inm_addr = *group; inm->inm_ifp = ifp; inm->inm_igi = ii->ii_igmp; inm->inm_ifma = ifma; inm->inm_refcount = 1; inm->inm_state = IGMP_NOT_MEMBER; mbufq_init(&inm->inm_scq, IGMP_MAX_STATE_CHANGES); inm->inm_st[0].iss_fmode = MCAST_UNDEFINED; inm->inm_st[1].iss_fmode = MCAST_UNDEFINED; RB_INIT(&inm->inm_srcs); ifma->ifma_protospec = inm; *pinm = inm; IF_ADDR_WUNLOCK(ifp); return (0); } /* * Drop a reference to an in_multi record. * * If the refcount drops to 0, free the in_multi record and * delete the underlying link-layer membership. */ void inm_release_locked(struct in_multi *inm) { struct ifmultiaddr *ifma; IN_MULTI_LOCK_ASSERT(); CTR2(KTR_IGMPV3, "%s: refcount is %d", __func__, inm->inm_refcount); if (--inm->inm_refcount > 0) { CTR2(KTR_IGMPV3, "%s: refcount is now %d", __func__, inm->inm_refcount); return; } CTR2(KTR_IGMPV3, "%s: freeing inm %p", __func__, inm); ifma = inm->inm_ifma; /* XXX this access is not covered by IF_ADDR_LOCK */ CTR2(KTR_IGMPV3, "%s: purging ifma %p", __func__, ifma); KASSERT(ifma->ifma_protospec == inm, ("%s: ifma_protospec != inm", __func__)); ifma->ifma_protospec = NULL; inm_purge(inm); free(inm, M_IPMADDR); if_delmulti_ifma(ifma); } /* * Clear recorded source entries for a group. * Used by the IGMP code. Caller must hold the IN_MULTI lock. * FIXME: Should reap. */ void inm_clear_recorded(struct in_multi *inm) { struct ip_msource *ims; IN_MULTI_LOCK_ASSERT(); RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { if (ims->ims_stp) { ims->ims_stp = 0; --inm->inm_st[1].iss_rec; } } KASSERT(inm->inm_st[1].iss_rec == 0, ("%s: iss_rec %d not 0", __func__, inm->inm_st[1].iss_rec)); } /* * Record a source as pending for a Source-Group IGMPv3 query. * This lives here as it modifies the shared tree. * * inm is the group descriptor. 
* naddr is the address of the source to record in network-byte order. * * If the net.inet.igmp.sgalloc sysctl is non-zero, we will * lazy-allocate a source node in response to an SG query. * Otherwise, no allocation is performed. This saves some memory * with the trade-off that the source will not be reported to the * router if joined in the window between the query response and * the group actually being joined on the local host. * * VIMAGE: XXX: Currently the igmp_sgalloc feature has been removed. * This turns off the allocation of a recorded source entry if * the group has not been joined. * * Return 0 if the source didn't exist or was already marked as recorded. * Return 1 if the source was marked as recorded by this function. * Return <0 if any error occurred (negated errno code). */ int inm_record_source(struct in_multi *inm, const in_addr_t naddr) { struct ip_msource find; struct ip_msource *ims, *nims; IN_MULTI_LOCK_ASSERT(); find.ims_haddr = ntohl(naddr); ims = RB_FIND(ip_msource_tree, &inm->inm_srcs, &find); if (ims && ims->ims_stp) return (0); if (ims == NULL) { if (inm->inm_nsrc == in_mcast_maxgrpsrc) return (-ENOSPC); nims = malloc(sizeof(struct ip_msource), M_IPMSOURCE, M_NOWAIT | M_ZERO); if (nims == NULL) return (-ENOMEM); nims->ims_haddr = find.ims_haddr; RB_INSERT(ip_msource_tree, &inm->inm_srcs, nims); ++inm->inm_nsrc; ims = nims; } /* * Mark the source as recorded and update the recorded * source count. */ ++ims->ims_stp; ++inm->inm_st[1].iss_rec; return (1); } /* * Return a pointer to an in_msource owned by an in_mfilter, * given its source address. * Lazy-allocate if needed. If this is a new entry its filter state is * undefined at t0. * * imf is the filter set being modified. * psin is the source address in network byte-order; the tree key * derived from it is in *host* byte-order. * * SMPng: May be called with locks held; malloc must not block.
*/ static int imf_get_source(struct in_mfilter *imf, const struct sockaddr_in *psin, struct in_msource **plims) { struct ip_msource find; struct ip_msource *ims, *nims; struct in_msource *lims; int error; error = 0; ims = NULL; lims = NULL; /* key is host byte order */ find.ims_haddr = ntohl(psin->sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); lims = (struct in_msource *)ims; if (lims == NULL) { if (imf->imf_nsrc == in_mcast_maxsocksrc) return (ENOSPC); nims = malloc(sizeof(struct in_msource), M_INMFILTER, M_NOWAIT | M_ZERO); if (nims == NULL) return (ENOMEM); lims = (struct in_msource *)nims; lims->ims_haddr = find.ims_haddr; lims->imsl_st[0] = MCAST_UNDEFINED; RB_INSERT(ip_msource_tree, &imf->imf_sources, nims); ++imf->imf_nsrc; } *plims = lims; return (error); } /* * Graft a source entry into an existing socket-layer filter set, * maintaining any required invariants and checking allocations. * * The source is marked as being in the new filter mode at t1. * * Return the pointer to the new node, otherwise return NULL. */ static struct in_msource * imf_graft(struct in_mfilter *imf, const uint8_t st1, const struct sockaddr_in *psin) { struct ip_msource *nims; struct in_msource *lims; nims = malloc(sizeof(struct in_msource), M_INMFILTER, M_NOWAIT | M_ZERO); if (nims == NULL) return (NULL); lims = (struct in_msource *)nims; lims->ims_haddr = ntohl(psin->sin_addr.s_addr); lims->imsl_st[0] = MCAST_UNDEFINED; lims->imsl_st[1] = st1; RB_INSERT(ip_msource_tree, &imf->imf_sources, nims); ++imf->imf_nsrc; return (lims); } /* * Prune a source entry from an existing socket-layer filter set, * maintaining any required invariants and checking allocations. * * The source is marked as being left at t1, it is not freed. * * Return 0 if no error occurred, otherwise return an errno value. 
*/ static int imf_prune(struct in_mfilter *imf, const struct sockaddr_in *psin) { struct ip_msource find; struct ip_msource *ims; struct in_msource *lims; /* key is host byte order */ find.ims_haddr = ntohl(psin->sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); if (ims == NULL) return (ENOENT); lims = (struct in_msource *)ims; lims->imsl_st[1] = MCAST_UNDEFINED; return (0); } /* * Revert socket-layer filter set deltas at t1 to t0 state. */ static void imf_rollback(struct in_mfilter *imf) { struct ip_msource *ims, *tims; struct in_msource *lims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == lims->imsl_st[1]) { /* no change at t1 */ continue; } else if (lims->imsl_st[0] != MCAST_UNDEFINED) { /* revert change to existing source at t1 */ lims->imsl_st[1] = lims->imsl_st[0]; } else { /* revert source added t1 */ CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } } imf->imf_st[1] = imf->imf_st[0]; } /* * Mark socket-layer filter set as INCLUDE {} at t1. */ static void imf_leave(struct in_mfilter *imf) { struct ip_msource *ims; struct in_msource *lims; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; lims->imsl_st[1] = MCAST_UNDEFINED; } imf->imf_st[1] = MCAST_INCLUDE; } /* * Mark socket-layer filter set deltas as committed. */ static void imf_commit(struct in_mfilter *imf) { struct ip_msource *ims; struct in_msource *lims; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; lims->imsl_st[0] = lims->imsl_st[1]; } imf->imf_st[0] = imf->imf_st[1]; } /* * Reap unreferenced sources from socket-layer filter set. 
*/ static void imf_reap(struct in_mfilter *imf) { struct ip_msource *ims, *tims; struct in_msource *lims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { lims = (struct in_msource *)ims; if ((lims->imsl_st[0] == MCAST_UNDEFINED) && (lims->imsl_st[1] == MCAST_UNDEFINED)) { CTR2(KTR_IGMPV3, "%s: free lims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } } } /* * Purge socket-layer filter set. */ static void imf_purge(struct in_mfilter *imf) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } imf->imf_st[0] = imf->imf_st[1] = MCAST_UNDEFINED; KASSERT(RB_EMPTY(&imf->imf_sources), ("%s: imf_sources not empty", __func__)); } /* * Look up a source filter entry for a multicast group. * * inm is the group descriptor to work with. * haddr is the host-byte-order IPv4 address to look up. * noalloc may be non-zero to suppress allocation of sources. * *pims will be set to the address of the retrieved or allocated source. * * SMPng: NOTE: may be called with locks held. * Return 0 if successful, otherwise return a non-zero error code. 
*/ static int inm_get_source(struct in_multi *inm, const in_addr_t haddr, const int noalloc, struct ip_msource **pims) { struct ip_msource find; struct ip_msource *ims, *nims; #ifdef KTR struct in_addr ia; #endif find.ims_haddr = haddr; ims = RB_FIND(ip_msource_tree, &inm->inm_srcs, &find); if (ims == NULL && !noalloc) { if (inm->inm_nsrc == in_mcast_maxgrpsrc) return (ENOSPC); nims = malloc(sizeof(struct ip_msource), M_IPMSOURCE, M_NOWAIT | M_ZERO); if (nims == NULL) return (ENOMEM); nims->ims_haddr = haddr; RB_INSERT(ip_msource_tree, &inm->inm_srcs, nims); ++inm->inm_nsrc; ims = nims; #ifdef KTR ia.s_addr = htonl(haddr); CTR3(KTR_IGMPV3, "%s: allocated %s as %p", __func__, inet_ntoa(ia), ims); #endif } *pims = ims; return (0); } /* * Merge socket-layer source into IGMP-layer source. * If rollback is non-zero, perform the inverse of the merge. */ static void ims_merge(struct ip_msource *ims, const struct in_msource *lims, const int rollback) { int n = rollback ? -1 : 1; #ifdef KTR struct in_addr ia; ia.s_addr = htonl(ims->ims_haddr); #endif if (lims->imsl_st[0] == MCAST_EXCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 ex -= %d on %s", __func__, n, inet_ntoa(ia)); ims->ims_st[1].ex -= n; } else if (lims->imsl_st[0] == MCAST_INCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 in -= %d on %s", __func__, n, inet_ntoa(ia)); ims->ims_st[1].in -= n; } if (lims->imsl_st[1] == MCAST_EXCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 ex += %d on %s", __func__, n, inet_ntoa(ia)); ims->ims_st[1].ex += n; } else if (lims->imsl_st[1] == MCAST_INCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 in += %d on %s", __func__, n, inet_ntoa(ia)); ims->ims_st[1].in += n; } } /* * Atomically update the global in_multi state, when a membership's * filter list is being updated in any way. * * imf is the per-inpcb-membership group filter pointer. * A fake imf may be passed for in-kernel consumers. 
* * XXX This is a candidate for a set-symmetric-difference style loop * which would eliminate the repeated lookup from root of ims nodes, * as they share the same key space. * * If any error occurred this function will back out of refcounts * and return a non-zero value. */ static int inm_merge(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { struct ip_msource *ims, *nims; struct in_msource *lims; int schanged, error; int nsrc0, nsrc1; schanged = 0; error = 0; nsrc1 = nsrc0 = 0; /* * Update the source filters first, as this may fail. * Maintain count of in-mode filters at t0, t1. These are * used to work out if we transition into ASM mode or not. * Maintain a count of source filters whose state was * actually modified by this operation. */ RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == imf->imf_st[0]) nsrc0++; if (lims->imsl_st[1] == imf->imf_st[1]) nsrc1++; if (lims->imsl_st[0] == lims->imsl_st[1]) continue; error = inm_get_source(inm, lims->ims_haddr, 0, &nims); ++schanged; if (error) break; ims_merge(nims, lims, 0); } if (error) { struct ip_msource *bims; RB_FOREACH_REVERSE_FROM(ims, ip_msource_tree, nims) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == lims->imsl_st[1]) continue; (void)inm_get_source(inm, lims->ims_haddr, 1, &bims); if (bims == NULL) continue; ims_merge(bims, lims, 1); } goto out_reap; } CTR3(KTR_IGMPV3, "%s: imf filters in-mode: %d at t0, %d at t1", __func__, nsrc0, nsrc1); /* Handle transition between INCLUDE {n} and INCLUDE {} on socket. */ if (imf->imf_st[0] == imf->imf_st[1] && imf->imf_st[1] == MCAST_INCLUDE) { if (nsrc1 == 0) { CTR1(KTR_IGMPV3, "%s: --in on inm at t1", __func__); --inm->inm_st[1].iss_in; } } /* Handle filter mode transition on socket. 
*/ if (imf->imf_st[0] != imf->imf_st[1]) { CTR3(KTR_IGMPV3, "%s: imf transition %d to %d", __func__, imf->imf_st[0], imf->imf_st[1]); if (imf->imf_st[0] == MCAST_EXCLUDE) { CTR1(KTR_IGMPV3, "%s: --ex on inm at t1", __func__); --inm->inm_st[1].iss_ex; } else if (imf->imf_st[0] == MCAST_INCLUDE) { CTR1(KTR_IGMPV3, "%s: --in on inm at t1", __func__); --inm->inm_st[1].iss_in; } if (imf->imf_st[1] == MCAST_EXCLUDE) { CTR1(KTR_IGMPV3, "%s: ex++ on inm at t1", __func__); inm->inm_st[1].iss_ex++; } else if (imf->imf_st[1] == MCAST_INCLUDE && nsrc1 > 0) { CTR1(KTR_IGMPV3, "%s: in++ on inm at t1", __func__); inm->inm_st[1].iss_in++; } } /* * Track inm filter state in terms of listener counts. * If there are any exclusive listeners, stack-wide * membership is exclusive. * Otherwise, if only inclusive listeners, stack-wide is inclusive. * If no listeners remain, state is undefined at t1, * and the IGMP lifecycle for this group should finish. */ if (inm->inm_st[1].iss_ex > 0) { CTR1(KTR_IGMPV3, "%s: transition to EX", __func__); inm->inm_st[1].iss_fmode = MCAST_EXCLUDE; } else if (inm->inm_st[1].iss_in > 0) { CTR1(KTR_IGMPV3, "%s: transition to IN", __func__); inm->inm_st[1].iss_fmode = MCAST_INCLUDE; } else { CTR1(KTR_IGMPV3, "%s: transition to UNDEF", __func__); inm->inm_st[1].iss_fmode = MCAST_UNDEFINED; } /* Decrement ASM listener count on transition out of ASM mode. */ if (imf->imf_st[0] == MCAST_EXCLUDE && nsrc0 == 0) { if ((imf->imf_st[1] != MCAST_EXCLUDE) || (imf->imf_st[1] == MCAST_EXCLUDE && nsrc1 > 0)) CTR1(KTR_IGMPV3, "%s: --asm on inm at t1", __func__); --inm->inm_st[1].iss_asm; } /* Increment ASM listener count on transition to ASM mode. 
*/ if (imf->imf_st[1] == MCAST_EXCLUDE && nsrc1 == 0) { CTR1(KTR_IGMPV3, "%s: asm++ on inm at t1", __func__); inm->inm_st[1].iss_asm++; } CTR3(KTR_IGMPV3, "%s: merged imf %p to inm %p", __func__, imf, inm); inm_print(inm); out_reap: if (schanged > 0) { CTR1(KTR_IGMPV3, "%s: sources changed; reaping", __func__); inm_reap(inm); } return (error); } /* * Mark an in_multi's filter set deltas as committed. * Called by IGMP after a state change has been enqueued. */ void inm_commit(struct in_multi *inm) { struct ip_msource *ims; CTR2(KTR_IGMPV3, "%s: commit inm %p", __func__, inm); CTR1(KTR_IGMPV3, "%s: pre commit:", __func__); inm_print(inm); RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { ims->ims_st[0] = ims->ims_st[1]; } inm->inm_st[0] = inm->inm_st[1]; } /* * Reap unreferenced nodes from an in_multi's filter set. */ static void inm_reap(struct in_multi *inm) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &inm->inm_srcs, tims) { if (ims->ims_st[0].ex > 0 || ims->ims_st[0].in > 0 || ims->ims_st[1].ex > 0 || ims->ims_st[1].in > 0 || ims->ims_stp != 0) continue; CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &inm->inm_srcs, ims); free(ims, M_IPMSOURCE); inm->inm_nsrc--; } } /* * Purge all source nodes from an in_multi's filter set. */ static void inm_purge(struct in_multi *inm) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &inm->inm_srcs, tims) { CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &inm->inm_srcs, ims); free(ims, M_IPMSOURCE); inm->inm_nsrc--; } } /* * Join a multicast group; unlocked entry point. * * SMPng: XXX: in_joingroup() is called from in_control() when Giant * is not held. Fortunately, ifp is unlikely to have been detached * at this point, so we assume it's OK to recurse. 
*/ int in_joingroup(struct ifnet *ifp, const struct in_addr *gina, /*const*/ struct in_mfilter *imf, struct in_multi **pinm) { int error; IN_MULTI_LOCK(); error = in_joingroup_locked(ifp, gina, imf, pinm); IN_MULTI_UNLOCK(); return (error); } /* * Join a multicast group; real entry point. * * Only preserves atomicity at inm level. * NOTE: imf argument cannot be const due to sys/tree.h limitations. * * If the IGMP downcall fails, the group is not joined, and an error * code is returned. */ int in_joingroup_locked(struct ifnet *ifp, const struct in_addr *gina, /*const*/ struct in_mfilter *imf, struct in_multi **pinm) { struct in_mfilter timf; struct in_multi *inm; int error; IN_MULTI_LOCK_ASSERT(); CTR4(KTR_IGMPV3, "%s: join %s on %p(%s)", __func__, inet_ntoa(*gina), ifp, ifp->if_xname); error = 0; inm = NULL; /* * If no imf was specified (i.e. kernel consumer), * fake one up and assume it is an ASM join. */ if (imf == NULL) { imf_init(&timf, MCAST_UNDEFINED, MCAST_EXCLUDE); imf = &timf; } error = in_getmulti(ifp, gina, &inm); if (error) { CTR1(KTR_IGMPV3, "%s: in_getmulti() failure", __func__); return (error); } CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_inm_release; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) { CTR1(KTR_IGMPV3, "%s: failed to update source", __func__); goto out_inm_release; } out_inm_release: if (error) { CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm); inm_release_locked(inm); } else { *pinm = inm; } return (error); } /* * Leave a multicast group; unlocked entry point. */ int in_leavegroup(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { int error; IN_MULTI_LOCK(); error = in_leavegroup_locked(inm, imf); IN_MULTI_UNLOCK(); return (error); } /* * Leave a multicast group; real entry point. * All source filters will be expunged.
* * Only preserves atomicity at inm level. * * Holding the write lock for the INP which contains imf * is highly advisable. We can't assert for it as imf does not * contain a back-pointer to the owning inp. * * Note: This is not the same as inm_release(*) as this function also * makes a state change downcall into IGMP. */ int in_leavegroup_locked(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { struct in_mfilter timf; int error; error = 0; IN_MULTI_LOCK_ASSERT(); CTR5(KTR_IGMPV3, "%s: leave inm %p, %s/%s, imf %p", __func__, inm, inet_ntoa(inm->inm_addr), (inm_is_ifp_detached(inm) ? "null" : inm->inm_ifp->if_xname), imf); /* * If no imf was specified (i.e. kernel consumer), * fake one up and assume it is an ASM join. */ if (imf == NULL) { imf_init(&timf, MCAST_EXCLUDE, MCAST_UNDEFINED); imf = &timf; } /* * Begin state merge transaction at IGMP layer. * * As this particular invocation should not cause any memory * to be allocated, and there is no opportunity to roll back * the transaction, it MUST NOT fail. */ CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); KASSERT(error == 0, ("%s: failed to merge inm state", __func__)); CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); CURVNET_SET(inm->inm_ifp->if_vnet); error = igmp_change_state(inm); CURVNET_RESTORE(); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm); inm_release_locked(inm); return (error); } /*#ifndef BURN_BRIDGES*/ /* * Join an IPv4 multicast group in (*,G) exclusive mode. * The group must be a 224.0.0.0/24 link-scope group. * This KPI is for legacy kernel consumers only. 
*/ struct in_multi * in_addmulti(struct in_addr *ap, struct ifnet *ifp) { struct in_multi *pinm; int error; KASSERT(IN_LOCAL_GROUP(ntohl(ap->s_addr)), ("%s: %s not in 224.0.0.0/24", __func__, inet_ntoa(*ap))); error = in_joingroup(ifp, ap, NULL, &pinm); if (error != 0) pinm = NULL; return (pinm); } /* * Leave an IPv4 multicast group, assumed to be in exclusive (*,G) mode. * This KPI is for legacy kernel consumers only. */ void in_delmulti(struct in_multi *inm) { (void)in_leavegroup(inm, NULL); } /*#endif*/ /* * Block or unblock an ASM multicast source on an inpcb. * This implements the delta-based API described in RFC 3678. * * The delta-based API applies only to exclusive-mode memberships. * An IGMP downcall will be performed. * * SMPng: NOTE: Must take Giant as a join may create a new ifma. * * Return 0 if successful, otherwise return an appropriate error code. */ static int inp_block_unblock_source(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_msource *ims; struct in_multi *inm; size_t idx; uint16_t fmode; int error, doblock; ifp = NULL; error = 0; doblock = 0; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; ssa = (sockunion_t *)&gsr.gsr_source; switch (sopt->sopt_name) { case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: { struct ip_mreq_source mreqs; error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; ssa->sin.sin_family = AF_INET; ssa->sin.sin_len = sizeof(struct sockaddr_in); ssa->sin.sin_addr = mreqs.imr_sourceaddr; if (!in_nullhost(mreqs.imr_interface)) INADDR_TO_IFP(mreqs.imr_interface, ifp); if (sopt->sopt_name == IP_BLOCK_SOURCE) doblock = 1; CTR3(KTR_IGMPV3, "%s: imr_interface = %s, ifp = %p", __func__, 
inet_ntoa(mreqs.imr_interface), ifp); break; } case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); if (sopt->sopt_name == MCAST_BLOCK_SOURCE) doblock = 1; break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); /* * Check if we are actually a member of this group. */ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { error = EADDRNOTAVAIL; goto out_inp_locked; } KASSERT(imo->imo_mfilters != NULL, ("%s: imo_mfilters not allocated", __func__)); imf = &imo->imo_mfilters[idx]; inm = imo->imo_membership[idx]; /* * Attempting to use the delta-based API on a * non-exclusive-mode membership is an error. */ fmode = imf->imf_st[0]; if (fmode != MCAST_EXCLUDE) { error = EINVAL; goto out_inp_locked; } /* * Deal with error cases up-front: * Asked to block, but already blocked; or * Asked to unblock, but nothing to unblock. * If adding a new block entry, allocate it. */ ims = imo_match_source(imo, idx, &ssa->sa); if ((ims != NULL && doblock) || (ims == NULL && !doblock)) { CTR3(KTR_IGMPV3, "%s: source %s %spresent", __func__, inet_ntoa(ssa->sin.sin_addr), doblock ? "" : "not "); error = EADDRNOTAVAIL; goto out_inp_locked; } INP_WLOCK_ASSERT(inp); /* * Begin state merge transaction at socket layer.
*/ if (doblock) { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "block"); ims = imf_graft(imf, fmode, &ssa->sin); if (ims == NULL) error = ENOMEM; } else { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "allow"); error = imf_prune(imf, &ssa->sin); } if (error) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); goto out_imf_rollback; } /* * Begin state merge transaction at IGMP layer. */ IN_MULTI_LOCK(); CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); out_in_multi_locked: IN_MULTI_UNLOCK(); out_imf_rollback: if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Given an inpcb, return its multicast options structure pointer. Accepts * an unlocked inpcb pointer, but will return it locked. May sleep. * * SMPng: NOTE: Potentially calls malloc(M_WAITOK) with Giant held. * SMPng: NOTE: Returns with the INP write lock held. */ static struct ip_moptions * inp_findmoptions(struct inpcb *inp) { struct ip_moptions *imo; struct in_multi **immp; struct in_mfilter *imfp; size_t idx; INP_WLOCK(inp); if (inp->inp_moptions != NULL) return (inp->inp_moptions); INP_WUNLOCK(inp); imo = malloc(sizeof(*imo), M_IPMOPTS, M_WAITOK); immp = malloc(sizeof(*immp) * IP_MIN_MEMBERSHIPS, M_IPMOPTS, M_WAITOK | M_ZERO); imfp = malloc(sizeof(struct in_mfilter) * IP_MIN_MEMBERSHIPS, M_INMFILTER, M_WAITOK); imo->imo_multicast_ifp = NULL; imo->imo_multicast_addr.s_addr = INADDR_ANY; imo->imo_multicast_vif = -1; imo->imo_multicast_ttl = IP_DEFAULT_MULTICAST_TTL; imo->imo_multicast_loop = in_mcast_loop; imo->imo_num_memberships = 0; imo->imo_max_memberships = IP_MIN_MEMBERSHIPS; imo->imo_membership = immp; /* Initialize per-group source filters. 
*/ for (idx = 0; idx < IP_MIN_MEMBERSHIPS; idx++) imf_init(&imfp[idx], MCAST_UNDEFINED, MCAST_EXCLUDE); imo->imo_mfilters = imfp; INP_WLOCK(inp); if (inp->inp_moptions != NULL) { free(imfp, M_INMFILTER); free(immp, M_IPMOPTS); free(imo, M_IPMOPTS); return (inp->inp_moptions); } inp->inp_moptions = imo; return (imo); } /* * Discard the IP multicast options (and source filters). To minimize * the amount of work done while holding locks such as the INP's * pcbinfo lock (which is used in the receive path), the free * operation is performed asynchronously in a separate task. * * SMPng: NOTE: assumes INP write lock is held. */ void inp_freemoptions(struct ip_moptions *imo) { KASSERT(imo != NULL, ("%s: ip_moptions is NULL", __func__)); IN_MULTI_LOCK(); STAILQ_INSERT_TAIL(&imo_gc_list, imo, imo_link); IN_MULTI_UNLOCK(); taskqueue_enqueue(taskqueue_thread, &imo_gc_task); } static void inp_freemoptions_internal(struct ip_moptions *imo) { struct in_mfilter *imf; size_t idx, nmships; nmships = imo->imo_num_memberships; for (idx = 0; idx < nmships; ++idx) { imf = imo->imo_mfilters ? &imo->imo_mfilters[idx] : NULL; if (imf) imf_leave(imf); (void)in_leavegroup(imo->imo_membership[idx], imf); if (imf) imf_purge(imf); } if (imo->imo_mfilters) free(imo->imo_mfilters, M_INMFILTER); free(imo->imo_membership, M_IPMOPTS); free(imo, M_IPMOPTS); } static void inp_gcmoptions(void *context, int pending) { struct ip_moptions *imo; IN_MULTI_LOCK(); while (!STAILQ_EMPTY(&imo_gc_list)) { imo = STAILQ_FIRST(&imo_gc_list); STAILQ_REMOVE_HEAD(&imo_gc_list, imo_link); IN_MULTI_UNLOCK(); inp_freemoptions_internal(imo); IN_MULTI_LOCK(); } IN_MULTI_UNLOCK(); } /* * Atomically get source filters on a socket for an IPv4 multicast group. * Called with INP lock held; returns with lock released. 
*/ static int inp_get_source_filters(struct inpcb *inp, struct sockopt *sopt) { struct __msfilterreq msfr; sockunion_t *gsa; struct ifnet *ifp; struct ip_moptions *imo; struct in_mfilter *imf; struct ip_msource *ims; struct in_msource *lims; struct sockaddr_in *psin; struct sockaddr_storage *ptss; struct sockaddr_storage *tss; int error; size_t idx, nsrcs, ncsrcs; INP_WLOCK_ASSERT(inp); imo = inp->inp_moptions; KASSERT(imo != NULL, ("%s: null ip_moptions", __func__)); INP_WUNLOCK(inp); error = sooptcopyin(sopt, &msfr, sizeof(struct __msfilterreq), sizeof(struct __msfilterreq)); if (error) return (error); if (msfr.msfr_ifindex == 0 || V_if_index < msfr.msfr_ifindex) return (EINVAL); ifp = ifnet_byindex(msfr.msfr_ifindex); if (ifp == NULL) return (EINVAL); INP_WLOCK(inp); /* * Lookup group on the socket. */ gsa = (sockunion_t *)&msfr.msfr_group; idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { INP_WUNLOCK(inp); return (EADDRNOTAVAIL); } imf = &imo->imo_mfilters[idx]; /* * Ignore memberships which are in limbo. */ if (imf->imf_st[1] == MCAST_UNDEFINED) { INP_WUNLOCK(inp); return (EAGAIN); } msfr.msfr_fmode = imf->imf_st[1]; /* * If the user specified a buffer, copy out the source filter * entries to userland gracefully. * We only copy out the number of entries which userland * has asked for, but we always tell userland how big the * buffer really needs to be. */ if (msfr.msfr_nsrcs > in_mcast_maxsocksrc) msfr.msfr_nsrcs = in_mcast_maxsocksrc; tss = NULL; if (msfr.msfr_srcs != NULL && msfr.msfr_nsrcs > 0) { tss = malloc(sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs, M_TEMP, M_NOWAIT | M_ZERO); if (tss == NULL) { INP_WUNLOCK(inp); return (ENOBUFS); } } /* * Count number of sources in-mode at t0. * If buffer space exists and remains, copy out source entries. 
*/ nsrcs = msfr.msfr_nsrcs; ncsrcs = 0; ptss = tss; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == MCAST_UNDEFINED || lims->imsl_st[0] != imf->imf_st[0]) continue; ++ncsrcs; if (tss != NULL && nsrcs > 0) { psin = (struct sockaddr_in *)ptss; psin->sin_family = AF_INET; psin->sin_len = sizeof(struct sockaddr_in); psin->sin_addr.s_addr = htonl(lims->ims_haddr); psin->sin_port = 0; ++ptss; --nsrcs; } } INP_WUNLOCK(inp); if (tss != NULL) { error = copyout(tss, msfr.msfr_srcs, sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs); free(tss, M_TEMP); if (error) return (error); } msfr.msfr_nsrcs = ncsrcs; error = sooptcopyout(sopt, &msfr, sizeof(struct __msfilterreq)); return (error); } /* * Return the IP multicast options in response to user getsockopt(). */ int inp_getmoptions(struct inpcb *inp, struct sockopt *sopt) { struct ip_mreqn mreqn; struct ip_moptions *imo; struct ifnet *ifp; struct in_ifaddr *ia; int error, optval; u_char coptval; INP_WLOCK(inp); imo = inp->inp_moptions; /* * If socket is neither of type SOCK_RAW or SOCK_DGRAM, * or is a divert socket, reject it. 
*/ if (inp->inp_socket->so_proto->pr_protocol == IPPROTO_DIVERT || (inp->inp_socket->so_proto->pr_type != SOCK_RAW && inp->inp_socket->so_proto->pr_type != SOCK_DGRAM)) { INP_WUNLOCK(inp); return (EOPNOTSUPP); } error = 0; switch (sopt->sopt_name) { case IP_MULTICAST_VIF: if (imo != NULL) optval = imo->imo_multicast_vif; else optval = -1; INP_WUNLOCK(inp); error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MULTICAST_IF: memset(&mreqn, 0, sizeof(struct ip_mreqn)); if (imo != NULL) { ifp = imo->imo_multicast_ifp; if (!in_nullhost(imo->imo_multicast_addr)) { mreqn.imr_address = imo->imo_multicast_addr; } else if (ifp != NULL) { mreqn.imr_ifindex = ifp->if_index; IFP_TO_IA(ifp, ia); if (ia != NULL) { mreqn.imr_address = IA_SIN(ia)->sin_addr; ifa_free(&ia->ia_ifa); } } } INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(struct ip_mreqn)) { error = sooptcopyout(sopt, &mreqn, sizeof(struct ip_mreqn)); } else { error = sooptcopyout(sopt, &mreqn.imr_address, sizeof(struct in_addr)); } break; case IP_MULTICAST_TTL: if (imo == 0) optval = coptval = IP_DEFAULT_MULTICAST_TTL; else optval = coptval = imo->imo_multicast_ttl; INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(u_char)) error = sooptcopyout(sopt, &coptval, sizeof(u_char)); else error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MULTICAST_LOOP: if (imo == 0) optval = coptval = IP_DEFAULT_MULTICAST_LOOP; else optval = coptval = imo->imo_multicast_loop; INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(u_char)) error = sooptcopyout(sopt, &coptval, sizeof(u_char)); else error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MSFILTER: if (imo == NULL) { error = EADDRNOTAVAIL; INP_WUNLOCK(inp); } else { error = inp_get_source_filters(inp, sopt); } break; default: INP_WUNLOCK(inp); error = ENOPROTOOPT; break; } INP_UNLOCK_ASSERT(inp); return (error); } /* * Look up the ifnet to use for a multicast group membership, * given the IPv4 address of an interface, and the IPv4 group address. 
* * This routine exists to support legacy multicast applications * which do not understand that multicast memberships are scoped to * specific physical links in the networking stack, or which need * to join link-scope groups before IPv4 addresses are configured. * * If inp is non-NULL, use this socket's current FIB number for any * required FIB lookup. * If ina is INADDR_ANY, look up the group address in the unicast FIB, * and use its ifp; usually, this points to the default next-hop. * * If the FIB lookup fails, attempt to use the first non-loopback * interface with multicast capability in the system as a * last resort. The legacy IPv4 ASM API requires that we do * this in order to allow groups to be joined when the routing * table has not yet been populated during boot. * * Returns NULL if no ifp could be found. * * SMPng: TODO: Acquire the appropriate locks for INADDR_TO_IFP. * FUTURE: Implement IPv4 source-address selection. */ static struct ifnet * inp_lookup_mcast_ifp(const struct inpcb *inp, const struct sockaddr_in *gsin, const struct in_addr ina) { struct ifnet *ifp; KASSERT(gsin->sin_family == AF_INET, ("%s: not AF_INET", __func__)); KASSERT(IN_MULTICAST(ntohl(gsin->sin_addr.s_addr)), ("%s: not multicast", __func__)); ifp = NULL; if (!in_nullhost(ina)) { INADDR_TO_IFP(ina, ifp); } else { struct route ro; ro.ro_rt = NULL; memcpy(&ro.ro_dst, gsin, sizeof(struct sockaddr_in)); in_rtalloc_ign(&ro, 0, inp ? inp->inp_inc.inc_fibnum : 0); if (ro.ro_rt != NULL) { ifp = ro.ro_rt->rt_ifp; KASSERT(ifp != NULL, ("%s: null ifp", __func__)); RTFREE(ro.ro_rt); } else { struct in_ifaddr *ia; struct ifnet *mifp; mifp = NULL; IN_IFADDR_RLOCK(); TAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { mifp = ia->ia_ifp; if (!(mifp->if_flags & IFF_LOOPBACK) && (mifp->if_flags & IFF_MULTICAST)) { ifp = mifp; break; } } IN_IFADDR_RUNLOCK(); } } return (ifp); } /* * Join an IPv4 multicast group, possibly with a source. 
*/ static int inp_join_group(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_multi *inm; struct in_msource *lims; size_t idx; int error, is_new; ifp = NULL; imf = NULL; lims = NULL; error = 0; is_new = 0; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; gsa->ss.ss_family = AF_UNSPEC; ssa = (sockunion_t *)&gsr.gsr_source; ssa->ss.ss_family = AF_UNSPEC; switch (sopt->sopt_name) { case IP_ADD_MEMBERSHIP: case IP_ADD_SOURCE_MEMBERSHIP: { struct ip_mreq_source mreqs; if (sopt->sopt_name == IP_ADD_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq), sizeof(struct ip_mreq)); /* * Do argument switcharoo from ip_mreq into * ip_mreq_source to avoid using two instances. */ mreqs.imr_interface = mreqs.imr_sourceaddr; mreqs.imr_sourceaddr.s_addr = INADDR_ANY; } else if (sopt->sopt_name == IP_ADD_SOURCE_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); } if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; if (sopt->sopt_name == IP_ADD_SOURCE_MEMBERSHIP) { ssa->sin.sin_family = AF_INET; ssa->sin.sin_len = sizeof(struct sockaddr_in); ssa->sin.sin_addr = mreqs.imr_sourceaddr; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); ifp = inp_lookup_mcast_ifp(inp, &gsa->sin, mreqs.imr_interface); CTR3(KTR_IGMPV3, "%s: imr_interface = %s, ifp = %p", __func__, inet_ntoa(mreqs.imr_interface), ifp); break; } case MCAST_JOIN_GROUP: case MCAST_JOIN_SOURCE_GROUP: if (sopt->sopt_name == MCAST_JOIN_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_req), sizeof(struct group_req)); } else if (sopt->sopt_name == MCAST_JOIN_SOURCE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); } 
if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); /* * Overwrite the port field if present, as the sockaddr * being copied in may be matched with a binary comparison. */ gsa->sin.sin_port = 0; if (sopt->sopt_name == MCAST_JOIN_SOURCE_GROUP) { if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); ssa->sin.sin_port = 0; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) return (EADDRNOTAVAIL); imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1) { is_new = 1; } else { inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; if (ssa->ss.ss_family != AF_UNSPEC) { /* * MCAST_JOIN_SOURCE_GROUP on an exclusive membership * is an error. On an existing inclusive membership, * it just adds the source to the filter list. */ if (imf->imf_st[1] != MCAST_INCLUDE) { error = EINVAL; goto out_inp_locked; } /* * Throw out duplicates. * * XXX FIXME: This makes a naive assumption that * even if entries exist for *ssa in this imf, * they will be rejected as dupes, even if they * are not valid in the current mode (in-mode). * * in_msource is transactioned just as for anything * else in SSM -- but note naive use of inm_graft() * below for allocating new filter entries. * * This is only an issue if someone mixes the * full-state SSM API with the delta-based API, * which is discouraged in the relevant RFCs. 
*/ lims = imo_match_source(imo, idx, &ssa->sa); if (lims != NULL /*&& lims->imsl_st[1] == MCAST_INCLUDE*/) { error = EADDRNOTAVAIL; goto out_inp_locked; } } else { /* * MCAST_JOIN_GROUP on an existing exclusive * membership is an error; return EADDRINUSE * to preserve 4.4BSD API idempotence, and * avoid tedious detour to code below. * NOTE: This is bending RFC 3678 a bit. * * On an existing inclusive membership, this is also * an error; if you want to change filter mode, * you must use the userland API setsourcefilter(). * XXX We don't reject this for imf in UNDEFINED * state at t1, because allocation of a filter * is atomic with allocation of a membership. */ error = EINVAL; if (imf->imf_st[1] == MCAST_EXCLUDE) error = EADDRINUSE; goto out_inp_locked; } } /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); if (is_new) { if (imo->imo_num_memberships == imo->imo_max_memberships) { error = imo_grow(imo); if (error) goto out_inp_locked; } /* * Allocate the new slot upfront so we can deal with * grafting the new source filter in same code path * as for join-source on existing membership. */ idx = imo->imo_num_memberships; imo->imo_membership[idx] = NULL; imo->imo_num_memberships++; KASSERT(imo->imo_mfilters != NULL, ("%s: imf_mfilters vector was not allocated", __func__)); imf = &imo->imo_mfilters[idx]; KASSERT(RB_EMPTY(&imf->imf_sources), ("%s: imf_sources not empty", __func__)); } /* * Graft new source into filter list for this inpcb's * membership of the group. The in_multi may not have * been allocated yet if this is a new membership, however, * the in_mfilter slot will be allocated and must be initialized. * * Note: Grafting of exclusive mode filters doesn't happen * in this path. * XXX: Should check for non-NULL lims (node exists but may * not be in-mode) for interop with full-state API. 
*/ if (ssa->ss.ss_family != AF_UNSPEC) { /* Membership starts in IN mode */ if (is_new) { CTR1(KTR_IGMPV3, "%s: new join w/source", __func__); imf_init(imf, MCAST_UNDEFINED, MCAST_INCLUDE); } else { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "allow"); } lims = imf_graft(imf, MCAST_INCLUDE, &ssa->sin); if (lims == NULL) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); error = ENOMEM; goto out_imo_free; } } else { /* No address specified; Membership starts in EX mode */ if (is_new) { CTR1(KTR_IGMPV3, "%s: new join w/o source", __func__); imf_init(imf, MCAST_UNDEFINED, MCAST_EXCLUDE); } } /* * Begin state merge transaction at IGMP layer. */ IN_MULTI_LOCK(); if (is_new) { error = in_joingroup_locked(ifp, &gsa->sin.sin_addr, imf, &inm); if (error) { CTR1(KTR_IGMPV3, "%s: in_joingroup_locked failed", __func__); IN_MULTI_UNLOCK(); goto out_imo_free; } imo->imo_membership[idx] = inm; } else { CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) { CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); goto out_in_multi_locked; } } out_in_multi_locked: IN_MULTI_UNLOCK(); INP_WLOCK_ASSERT(inp); if (error) { imf_rollback(imf); if (is_new) imf_purge(imf); else imf_reap(imf); } else { imf_commit(imf); } out_imo_free: if (error && is_new) { imo->imo_membership[idx] = NULL; --imo->imo_num_memberships; } out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Leave an IPv4 multicast group on an inpcb, possibly with a source. 
*/ static int inp_leave_group(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; struct ip_mreq_source mreqs; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_msource *ims; struct in_multi *inm; size_t idx; int error, is_final; ifp = NULL; error = 0; is_final = 1; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; gsa->ss.ss_family = AF_UNSPEC; ssa = (sockunion_t *)&gsr.gsr_source; ssa->ss.ss_family = AF_UNSPEC; switch (sopt->sopt_name) { case IP_DROP_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: if (sopt->sopt_name == IP_DROP_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq), sizeof(struct ip_mreq)); /* * Swap interface and sourceaddr arguments, * as ip_mreq and ip_mreq_source are laid * out differently. */ mreqs.imr_interface = mreqs.imr_sourceaddr; mreqs.imr_sourceaddr.s_addr = INADDR_ANY; } else if (sopt->sopt_name == IP_DROP_SOURCE_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); } if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; if (sopt->sopt_name == IP_DROP_SOURCE_MEMBERSHIP) { ssa->sin.sin_family = AF_INET; ssa->sin.sin_len = sizeof(struct sockaddr_in); ssa->sin.sin_addr = mreqs.imr_sourceaddr; } /* * Attempt to look up hinted ifp from interface address. * Fallthrough with null ifp iff lookup fails, to * preserve 4.4BSD mcast API idempotence. * XXX NOTE WELL: The RFC 3678 API is preferred because * using an IPv4 address as a key is racy. 
*/ if (!in_nullhost(mreqs.imr_interface)) INADDR_TO_IFP(mreqs.imr_interface, ifp); CTR3(KTR_IGMPV3, "%s: imr_interface = %s, ifp = %p", __func__, inet_ntoa(mreqs.imr_interface), ifp); break; case MCAST_LEAVE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: if (sopt->sopt_name == MCAST_LEAVE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_req), sizeof(struct group_req)); } else if (sopt->sopt_name == MCAST_LEAVE_SOURCE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); } if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (sopt->sopt_name == MCAST_LEAVE_SOURCE_GROUP) { if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); } if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); if (ifp == NULL) return (EADDRNOTAVAIL); break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); /* * Find the membership in the membership array. */ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1) { error = EADDRNOTAVAIL; goto out_inp_locked; } inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; if (ssa->ss.ss_family != AF_UNSPEC) is_final = 0; /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); /* * If we were instructed only to leave a given source, do so. * MCAST_LEAVE_SOURCE_GROUP is only valid for inclusive memberships. 
*/ if (is_final) { imf_leave(imf); } else { if (imf->imf_st[0] == MCAST_EXCLUDE) { error = EADDRNOTAVAIL; goto out_inp_locked; } ims = imo_match_source(imo, idx, &ssa->sa); if (ims == NULL) { CTR3(KTR_IGMPV3, "%s: source %s %spresent", __func__, inet_ntoa(ssa->sin.sin_addr), "not "); error = EADDRNOTAVAIL; goto out_inp_locked; } CTR2(KTR_IGMPV3, "%s: %s source", __func__, "block"); error = imf_prune(imf, &ssa->sin); if (error) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); goto out_inp_locked; } } /* * Begin state merge transaction at IGMP layer. */ IN_MULTI_LOCK(); if (is_final) { /* * Give up the multicast address record to which * the membership points. */ (void)in_leavegroup_locked(inm, imf); } else { CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) { CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); } } out_in_multi_locked: IN_MULTI_UNLOCK(); if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); if (is_final) { /* Remove the gap in the membership and filter array. */ for (++idx; idx < imo->imo_num_memberships; ++idx) { imo->imo_membership[idx-1] = imo->imo_membership[idx]; imo->imo_mfilters[idx-1] = imo->imo_mfilters[idx]; } imo->imo_num_memberships--; } out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Select the interface for transmitting IPv4 multicast datagrams. * * Either an instance of struct in_addr or an instance of struct ip_mreqn * may be passed to this socket option. An address of INADDR_ANY or an * interface index of 0 is used to remove a previous selection. * When no interface is selected, one is chosen for every send. 
*/ static int inp_set_multicast_if(struct inpcb *inp, struct sockopt *sopt) { struct in_addr addr; struct ip_mreqn mreqn; struct ifnet *ifp; struct ip_moptions *imo; int error; if (sopt->sopt_valsize == sizeof(struct ip_mreqn)) { /* * An interface index was specified using the * Linux-derived ip_mreqn structure. */ error = sooptcopyin(sopt, &mreqn, sizeof(struct ip_mreqn), sizeof(struct ip_mreqn)); if (error) return (error); if (mreqn.imr_ifindex < 0 || V_if_index < mreqn.imr_ifindex) return (EINVAL); if (mreqn.imr_ifindex == 0) { ifp = NULL; } else { ifp = ifnet_byindex(mreqn.imr_ifindex); if (ifp == NULL) return (EADDRNOTAVAIL); } } else { /* * An interface was specified by IPv4 address. * This is the traditional BSD usage. */ error = sooptcopyin(sopt, &addr, sizeof(struct in_addr), sizeof(struct in_addr)); if (error) return (error); if (in_nullhost(addr)) { ifp = NULL; } else { INADDR_TO_IFP(addr, ifp); if (ifp == NULL) return (EADDRNOTAVAIL); } CTR3(KTR_IGMPV3, "%s: ifp = %p, addr = %s", __func__, ifp, inet_ntoa(addr)); } /* Reject interfaces which do not support multicast. */ if (ifp != NULL && (ifp->if_flags & IFF_MULTICAST) == 0) return (EOPNOTSUPP); imo = inp_findmoptions(inp); imo->imo_multicast_ifp = ifp; imo->imo_multicast_addr.s_addr = INADDR_ANY; INP_WUNLOCK(inp); return (0); } /* * Atomically set source filters on a socket for an IPv4 multicast group. * * SMPng: NOTE: Potentially calls malloc(M_WAITOK) with Giant held. 
*/ static int inp_set_source_filters(struct inpcb *inp, struct sockopt *sopt) { struct __msfilterreq msfr; sockunion_t *gsa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_multi *inm; size_t idx; int error; error = sooptcopyin(sopt, &msfr, sizeof(struct __msfilterreq), sizeof(struct __msfilterreq)); if (error) return (error); if (msfr.msfr_nsrcs > in_mcast_maxsocksrc) return (ENOBUFS); if ((msfr.msfr_fmode != MCAST_EXCLUDE && msfr.msfr_fmode != MCAST_INCLUDE)) return (EINVAL); if (msfr.msfr_group.ss_family != AF_INET || msfr.msfr_group.ss_len != sizeof(struct sockaddr_in)) return (EINVAL); gsa = (sockunion_t *)&msfr.msfr_group; if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); gsa->sin.sin_port = 0; /* ignore port */ if (msfr.msfr_ifindex == 0 || V_if_index < msfr.msfr_ifindex) return (EADDRNOTAVAIL); ifp = ifnet_byindex(msfr.msfr_ifindex); if (ifp == NULL) return (EADDRNOTAVAIL); /* * Take the INP write lock. * Check if this socket is a member of this group. */ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { error = EADDRNOTAVAIL; goto out_inp_locked; } inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); imf->imf_st[1] = msfr.msfr_fmode; /* * Apply any new source filters, if present. * Make a copy of the user-space source vector so * that we may copy them with a single copyin. This * allows us to deal with page faults up-front. 
*/ if (msfr.msfr_nsrcs > 0) { struct in_msource *lims; struct sockaddr_in *psin; struct sockaddr_storage *kss, *pkss; int i; INP_WUNLOCK(inp); CTR2(KTR_IGMPV3, "%s: loading %lu source list entries", __func__, (unsigned long)msfr.msfr_nsrcs); kss = malloc(sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs, M_TEMP, M_WAITOK); error = copyin(msfr.msfr_srcs, kss, sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs); if (error) { free(kss, M_TEMP); return (error); } INP_WLOCK(inp); /* * Mark all source filters as UNDEFINED at t1. * Restore new group filter mode, as imf_leave() * will set it to INCLUDE. */ imf_leave(imf); imf->imf_st[1] = msfr.msfr_fmode; /* * Update socket layer filters at t1, lazy-allocating * new entries. This saves a bunch of memory at the * cost of one RB_FIND() per source entry; duplicate * entries in the msfr_nsrcs vector are ignored. * If we encounter an error, rollback transaction. * * XXX This too could be replaced with a set-symmetric * difference like loop to avoid walking from root * every time, as the key space is common. */ for (i = 0, pkss = kss; i < msfr.msfr_nsrcs; i++, pkss++) { psin = (struct sockaddr_in *)pkss; if (psin->sin_family != AF_INET) { error = EAFNOSUPPORT; break; } if (psin->sin_len != sizeof(struct sockaddr_in)) { error = EINVAL; break; } error = imf_get_source(imf, psin, &lims); if (error) break; lims->imsl_st[1] = imf->imf_st[1]; } free(kss, M_TEMP); } if (error) goto out_imf_rollback; INP_WLOCK_ASSERT(inp); IN_MULTI_LOCK(); /* * Begin state merge transaction at IGMP layer. 
*/ CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); out_in_multi_locked: IN_MULTI_UNLOCK(); out_imf_rollback: if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Set the IP multicast options in response to user setsockopt(). * * Many of the socket options handled in this function duplicate the * functionality of socket options in the regular unicast API. However, * it is not possible to merge the duplicate code, because the idempotence * of the IPv4 multicast part of the BSD Sockets API must be preserved; * the effects of these options must be treated as separate and distinct. * * SMPng: XXX: Unlocked read of inp_socket believed OK. * FUTURE: The IP_MULTICAST_VIF option may be eliminated if MROUTING * is refactored to no longer use vifs. */ int inp_setmoptions(struct inpcb *inp, struct sockopt *sopt) { struct ip_moptions *imo; int error; error = 0; /* * If socket is neither of type SOCK_RAW or SOCK_DGRAM, * or is a divert socket, reject it. */ if (inp->inp_socket->so_proto->pr_protocol == IPPROTO_DIVERT || (inp->inp_socket->so_proto->pr_type != SOCK_RAW && inp->inp_socket->so_proto->pr_type != SOCK_DGRAM)) return (EOPNOTSUPP); switch (sopt->sopt_name) { case IP_MULTICAST_VIF: { int vifi; /* * Select a multicast VIF for transmission. * Only useful if multicast forwarding is active. 
*/ if (legal_vif_num == NULL) { error = EOPNOTSUPP; break; } error = sooptcopyin(sopt, &vifi, sizeof(int), sizeof(int)); if (error) break; if (!legal_vif_num(vifi) && (vifi != -1)) { error = EINVAL; break; } imo = inp_findmoptions(inp); imo->imo_multicast_vif = vifi; INP_WUNLOCK(inp); break; } case IP_MULTICAST_IF: error = inp_set_multicast_if(inp, sopt); break; case IP_MULTICAST_TTL: { u_char ttl; /* * Set the IP time-to-live for outgoing multicast packets. * The original multicast API required a char argument, * which is inconsistent with the rest of the socket API. * We allow either a char or an int. */ if (sopt->sopt_valsize == sizeof(u_char)) { error = sooptcopyin(sopt, &ttl, sizeof(u_char), sizeof(u_char)); if (error) break; } else { u_int ittl; error = sooptcopyin(sopt, &ittl, sizeof(u_int), sizeof(u_int)); if (error) break; if (ittl > 255) { error = EINVAL; break; } ttl = (u_char)ittl; } imo = inp_findmoptions(inp); imo->imo_multicast_ttl = ttl; INP_WUNLOCK(inp); break; } case IP_MULTICAST_LOOP: { u_char loop; /* * Set the loopback flag for outgoing multicast packets. * Must be zero or one. The original multicast API required a * char argument, which is inconsistent with the rest * of the socket API. We allow either a char or an int. 
*/ if (sopt->sopt_valsize == sizeof(u_char)) { error = sooptcopyin(sopt, &loop, sizeof(u_char), sizeof(u_char)); if (error) break; } else { u_int iloop; error = sooptcopyin(sopt, &iloop, sizeof(u_int), sizeof(u_int)); if (error) break; loop = (u_char)iloop; } imo = inp_findmoptions(inp); imo->imo_multicast_loop = !!loop; INP_WUNLOCK(inp); break; } case IP_ADD_MEMBERSHIP: case IP_ADD_SOURCE_MEMBERSHIP: case MCAST_JOIN_GROUP: case MCAST_JOIN_SOURCE_GROUP: error = inp_join_group(inp, sopt); break; case IP_DROP_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: case MCAST_LEAVE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: error = inp_leave_group(inp, sopt); break; case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: error = inp_block_unblock_source(inp, sopt); break; case IP_MSFILTER: error = inp_set_source_filters(inp, sopt); break; default: error = EOPNOTSUPP; break; } INP_UNLOCK_ASSERT(inp); return (error); } /* * Expose IGMP's multicast filter mode and source list(s) to userland, * keyed by (ifindex, group). * The filter mode is written out as a uint32_t, followed by * 0..n of struct in_addr. * For use by ifmcstat(8). * SMPng: NOTE: unlocked read of ifindex space. 
*/ static int sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS) { struct in_addr src, group; struct ifnet *ifp; struct ifmultiaddr *ifma; struct in_multi *inm; struct ip_msource *ims; int *name; int retval; u_int namelen; uint32_t fmode, ifindex; name = (int *)arg1; namelen = arg2; if (req->newptr != NULL) return (EPERM); if (namelen != 2) return (EINVAL); ifindex = name[0]; if (ifindex <= 0 || ifindex > V_if_index) { CTR2(KTR_IGMPV3, "%s: ifindex %u out of range", __func__, ifindex); return (ENOENT); } group.s_addr = name[1]; if (!IN_MULTICAST(ntohl(group.s_addr))) { CTR2(KTR_IGMPV3, "%s: group %s is not multicast", __func__, inet_ntoa(group)); return (EINVAL); } ifp = ifnet_byindex(ifindex); if (ifp == NULL) { CTR2(KTR_IGMPV3, "%s: no ifp for ifindex %u", __func__, ifindex); return (ENOENT); } retval = sysctl_wire_old_buffer(req, sizeof(uint32_t) + (in_mcast_maxgrpsrc * sizeof(struct in_addr))); if (retval) return (retval); IN_MULTI_LOCK(); IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; if (!in_hosteq(inm->inm_addr, group)) continue; fmode = inm->inm_st[1].iss_fmode; retval = SYSCTL_OUT(req, &fmode, sizeof(uint32_t)); if (retval != 0) break; RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { #ifdef KTR struct in_addr ina; ina.s_addr = htonl(ims->ims_haddr); CTR2(KTR_IGMPV3, "%s: visit node %s", __func__, inet_ntoa(ina)); #endif /* * Only copy-out sources which are in-mode. 
*/ if (fmode != ims_get_mode(inm, ims, 1)) { CTR1(KTR_IGMPV3, "%s: skip non-in-mode", __func__); continue; } src.s_addr = htonl(ims->ims_haddr); retval = SYSCTL_OUT(req, &src, sizeof(struct in_addr)); if (retval != 0) break; } } IF_ADDR_RUNLOCK(ifp); IN_MULTI_UNLOCK(); return (retval); } #if defined(KTR) && (KTR_COMPILE & KTR_IGMPV3) static const char *inm_modestrs[] = { "un", "in", "ex" }; static const char * inm_mode_str(const int mode) { if (mode >= MCAST_UNDEFINED && mode <= MCAST_EXCLUDE) return (inm_modestrs[mode]); return ("??"); } static const char *inm_statestrs[] = { "not-member", "silent", "idle", "lazy", "sleeping", "awakening", "query-pending", "sg-query-pending", "leaving" }; static const char * inm_state_str(const int state) { if (state >= IGMP_NOT_MEMBER && state <= IGMP_LEAVING_MEMBER) return (inm_statestrs[state]); return ("??"); } /* * Dump an in_multi structure to the console. */ void inm_print(const struct in_multi *inm) { int t; if ((ktr_mask & KTR_IGMPV3) == 0) return; printf("%s: --- begin inm %p ---\n", __func__, inm); printf("addr %s ifp %p(%s) ifma %p\n", inet_ntoa(inm->inm_addr), inm->inm_ifp, inm->inm_ifp->if_xname, inm->inm_ifma); printf("timer %u state %s refcount %u scq.len %u\n", inm->inm_timer, inm_state_str(inm->inm_state), inm->inm_refcount, - inm->inm_scq.ifq_len); + inm->inm_scq.mq_len); printf("igi %p nsrc %lu sctimer %u scrv %u\n", inm->inm_igi, inm->inm_nsrc, inm->inm_sctimer, inm->inm_scrv); for (t = 0; t < 2; t++) { printf("t%d: fmode %s asm %u ex %u in %u rec %u\n", t, inm_mode_str(inm->inm_st[t].iss_fmode), inm->inm_st[t].iss_asm, inm->inm_st[t].iss_ex, inm->inm_st[t].iss_in, inm->inm_st[t].iss_rec); } printf("%s: --- end inm %p ---\n", __func__, inm); } #else /* !KTR || !(KTR_COMPILE & KTR_IGMPV3) */ void inm_print(const struct in_multi *inm) { } #endif /* KTR && (KTR_COMPILE & KTR_IGMPV3) */ RB_GENERATE(ip_msource_tree, ip_msource, ims_link, ip_msource_cmp); Index: projects/ifnet/sys/netinet/in_var.h 
=================================================================== --- projects/ifnet/sys/netinet/in_var.h (revision 279031) +++ projects/ifnet/sys/netinet/in_var.h (revision 279032) @@ -1,439 +1,409 @@ /*- * Copyright (c) 1985, 1986, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in_var.h 8.2 (Berkeley) 1/9/95 * $FreeBSD$ */ #ifndef _NETINET_IN_VAR_H_ #define _NETINET_IN_VAR_H_ +/* + * Argument structure for SIOCAIFADDR. + */ +struct in_aliasreq { + char ifra_name[IFNAMSIZ]; /* if name, e.g. 
"en0" */ + struct sockaddr_in ifra_addr; + struct sockaddr_in ifra_broadaddr; +#define ifra_dstaddr ifra_broadaddr + struct sockaddr_in ifra_mask; + int ifra_vhid; +}; + +#ifdef _KERNEL #include #include #include -struct igmp_ifinfo; +struct igmp_ifsoftc; struct in_multi; struct lltable; /* * IPv4 per-interface state. */ struct in_ifinfo { struct lltable *ii_llt; /* ARP state */ - struct igmp_ifinfo *ii_igmp; /* IGMP state */ + struct igmp_ifsoftc *ii_igmp; /* IGMP state */ struct in_multi *ii_allhosts; /* 224.0.0.1 membership */ }; -#if defined(_KERNEL) || defined(_WANT_IFADDR) /* * Interface address, Internet version. One of these structures * is allocated for each Internet address on an interface. * The ifaddr structure contains the protocol-independent part * of the structure and is assumed to be first. */ struct in_ifaddr { struct ifaddr ia_ifa; /* protocol-independent info */ #define ia_ifp ia_ifa.ifa_ifp #define ia_flags ia_ifa.ifa_flags /* ia_subnet{,mask} in host order */ u_long ia_subnet; /* subnet address */ u_long ia_subnetmask; /* mask of subnet */ LIST_ENTRY(in_ifaddr) ia_hash; /* entry in bucket of inet addresses */ TAILQ_ENTRY(in_ifaddr) ia_link; /* list of internet addresses */ struct sockaddr_in ia_addr; /* reserve space for interface name */ struct sockaddr_in ia_dstaddr; /* reserve space for broadcast addr */ #define ia_broadaddr ia_dstaddr struct sockaddr_in ia_sockmask; /* reserve space for general netmask */ }; -#endif -struct in_aliasreq { - char ifra_name[IFNAMSIZ]; /* if name, e.g. "en0" */ - struct sockaddr_in ifra_addr; - struct sockaddr_in ifra_broadaddr; -#define ifra_dstaddr ifra_broadaddr - struct sockaddr_in ifra_mask; - int ifra_vhid; -}; /* * Given a pointer to an in_ifaddr (ifaddr), * return a pointer to the addr as a sockaddr_in. 
*/ #define IA_SIN(ia) (&(((struct in_ifaddr *)(ia))->ia_addr)) #define IA_DSTSIN(ia) (&(((struct in_ifaddr *)(ia))->ia_dstaddr)) #define IA_MASKSIN(ia) (&(((struct in_ifaddr *)(ia))->ia_sockmask)) #define IN_LNAOF(in, ifa) \ ((ntohl((in).s_addr) & ~((struct in_ifaddr *)(ifa)->ia_subnetmask)) - -#ifdef _KERNEL extern u_char inetctlerrmap[]; #define LLTABLE(ifp) \ ((struct in_ifinfo *)(ifp)->if_afdata[AF_INET])->ii_llt /* * Hash table for IP addresses. */ TAILQ_HEAD(in_ifaddrhead, in_ifaddr); LIST_HEAD(in_ifaddrhashhead, in_ifaddr); VNET_DECLARE(struct in_ifaddrhashhead *, in_ifaddrhashtbl); VNET_DECLARE(struct in_ifaddrhead, in_ifaddrhead); VNET_DECLARE(u_long, in_ifaddrhmask); /* mask for hash table */ #define V_in_ifaddrhashtbl VNET(in_ifaddrhashtbl) #define V_in_ifaddrhead VNET(in_ifaddrhead) #define V_in_ifaddrhmask VNET(in_ifaddrhmask) #define INADDR_NHASH_LOG2 9 #define INADDR_NHASH (1 << INADDR_NHASH_LOG2) #define INADDR_HASHVAL(x) fnv_32_buf((&(x)), sizeof(x), FNV1_32_INIT) #define INADDR_HASH(x) \ (&V_in_ifaddrhashtbl[INADDR_HASHVAL(x) & V_in_ifaddrhmask]) extern struct rwlock in_ifaddr_lock; #define IN_IFADDR_LOCK_ASSERT() rw_assert(&in_ifaddr_lock, RA_LOCKED) #define IN_IFADDR_RLOCK() rw_rlock(&in_ifaddr_lock) #define IN_IFADDR_RLOCK_ASSERT() rw_assert(&in_ifaddr_lock, RA_RLOCKED) #define IN_IFADDR_RUNLOCK() rw_runlock(&in_ifaddr_lock) #define IN_IFADDR_WLOCK() rw_wlock(&in_ifaddr_lock) #define IN_IFADDR_WLOCK_ASSERT() rw_assert(&in_ifaddr_lock, RA_WLOCKED) #define IN_IFADDR_WUNLOCK() rw_wunlock(&in_ifaddr_lock) /* * Macro for finding the internet address structure (in_ifaddr) * corresponding to one of our IP addresses (in_addr). 
*/ #define INADDR_TO_IFADDR(addr, ia) \ /* struct in_addr addr; */ \ /* struct in_ifaddr *ia; */ \ do { \ \ LIST_FOREACH(ia, INADDR_HASH((addr).s_addr), ia_hash) \ if (IA_SIN(ia)->sin_addr.s_addr == (addr).s_addr) \ break; \ } while (0) /* * Macro for finding the interface (ifnet structure) corresponding to one * of our IP addresses. */ #define INADDR_TO_IFP(addr, ifp) \ /* struct in_addr addr; */ \ /* struct ifnet *ifp; */ \ { \ struct in_ifaddr *ia; \ \ INADDR_TO_IFADDR(addr, ia); \ (ifp) = (ia == NULL) ? NULL : ia->ia_ifp; \ } /* * Macro for finding the internet address structure (in_ifaddr) corresponding * to a given interface (ifnet structure). */ #define IFP_TO_IA(ifp, ia) \ /* struct ifnet *ifp; */ \ /* struct in_ifaddr *ia; */ \ do { \ IN_IFADDR_RLOCK(); \ for ((ia) = TAILQ_FIRST(&V_in_ifaddrhead); \ (ia) != NULL && (ia)->ia_ifp != (ifp); \ (ia) = TAILQ_NEXT((ia), ia_link)) \ continue; \ if ((ia) != NULL) \ ifa_ref(&(ia)->ia_ifa); \ IN_IFADDR_RUNLOCK(); \ } while (0) -#endif /* * IP datagram reassembly. */ #define IPREASS_NHASH_LOG2 6 #define IPREASS_NHASH (1 << IPREASS_NHASH_LOG2) #define IPREASS_HMASK (IPREASS_NHASH - 1) #define IPREASS_HASH(x,y) \ (((((x) & 0xF) | ((((x) >> 8) & 0xF) << 4)) ^ (y)) & IPREASS_HMASK) /* * Legacy IPv4 IGMP per-link structure. */ struct router_info { struct ifnet *rti_ifp; int rti_type; /* type of router which is querier on this interface */ int rti_time; /* # of slow timeouts since last old query */ SLIST_ENTRY(router_info) rti_list; }; /* - * Per-interface IGMP router version information. 
- */ -struct igmp_ifinfo { - LIST_ENTRY(igmp_ifinfo) igi_link; - struct ifnet *igi_ifp; /* interface this instance belongs to */ - uint32_t igi_version; /* IGMPv3 Host Compatibility Mode */ - uint32_t igi_v1_timer; /* IGMPv1 Querier Present timer (s) */ - uint32_t igi_v2_timer; /* IGMPv2 Querier Present timer (s) */ - uint32_t igi_v3_timer; /* IGMPv3 General Query (interface) timer (s)*/ - uint32_t igi_flags; /* IGMP per-interface flags */ - uint32_t igi_rv; /* IGMPv3 Robustness Variable */ - uint32_t igi_qi; /* IGMPv3 Query Interval (s) */ - uint32_t igi_qri; /* IGMPv3 Query Response Interval (s) */ - uint32_t igi_uri; /* IGMPv3 Unsolicited Report Interval (s) */ - SLIST_HEAD(,in_multi) igi_relinmhead; /* released groups */ - struct mbufq igi_gq; /* queue of general query responses */ -}; - -#define IGIF_SILENT 0x00000001 /* Do not use IGMP on this ifp */ -#define IGIF_LOOPBACK 0x00000002 /* Send IGMP reports to loopback */ - -/* * IPv4 multicast IGMP-layer source entry. */ struct ip_msource { RB_ENTRY(ip_msource) ims_link; /* RB tree links */ in_addr_t ims_haddr; /* host byte order */ struct ims_st { uint16_t ex; /* # of exclusive members */ uint16_t in; /* # of inclusive members */ } ims_st[2]; /* state at t0, t1 */ uint8_t ims_stp; /* pending query */ }; /* * IPv4 multicast PCB-layer source entry. */ struct in_msource { RB_ENTRY(ip_msource) ims_link; /* RB tree links */ in_addr_t ims_haddr; /* host byte order */ uint8_t imsl_st[2]; /* state before/at commit */ }; RB_HEAD(ip_msource_tree, ip_msource); /* define struct ip_msource_tree */ static __inline int ip_msource_cmp(const struct ip_msource *a, const struct ip_msource *b) { if (a->ims_haddr < b->ims_haddr) return (-1); if (a->ims_haddr == b->ims_haddr) return (0); return (1); } RB_PROTOTYPE(ip_msource_tree, ip_msource, ims_link, ip_msource_cmp); /* * IPv4 multicast PCB-layer group filter descriptor. 
*/ struct in_mfilter { struct ip_msource_tree imf_sources; /* source list for (S,G) */ u_long imf_nsrc; /* # of source entries */ uint8_t imf_st[2]; /* state before/at commit */ }; /* * IPv4 group descriptor. * * For every entry on an ifnet's if_multiaddrs list which represents * an IP multicast group, there is one of these structures. * * If any source filters are present, then a node will exist in the RB-tree * to permit fast lookup by source whenever an operation takes place. * This permits pre-order traversal when we issue reports. * Source filter trees are kept separately from the socket layer to * greatly simplify locking. * * When IGMPv3 is active, inm_timer is the response to group query timer. * The state-change timer inm_sctimer is separate; whenever state changes * for the group the state change record is generated and transmitted, * and kept if retransmissions are necessary. * * FUTURE: inm_link is now only used when groups are being purged * on a detaching ifnet. It could be demoted to a SLIST_ENTRY, but * because it is at the very start of the struct, we can't do this * w/o breaking the ABI for ifmcstat. */ struct in_multi { LIST_ENTRY(in_multi) inm_link; /* to-be-released by in_ifdetach */ struct in_addr inm_addr; /* IP multicast address, convenience */ struct ifnet *inm_ifp; /* back pointer to ifnet */ struct ifmultiaddr *inm_ifma; /* back pointer to ifmultiaddr */ u_int inm_timer; /* IGMPv1/v2 group / v3 query timer */ u_int inm_state; /* state of the membership */ void *inm_rti; /* unused, legacy field */ u_int inm_refcount; /* reference count */ /* New fields for IGMPv3 follow. 
*/ - struct igmp_ifinfo *inm_igi; /* IGMP info */ + struct igmp_ifsoftc *inm_igi; /* IGMP info */ SLIST_ENTRY(in_multi) inm_nrele; /* to-be-released by IGMP */ struct ip_msource_tree inm_srcs; /* tree of sources */ u_long inm_nsrc; /* # of tree entries */ struct mbufq inm_scq; /* queue of pending * state-change packets */ struct timeval inm_lastgsrtv; /* Time of last G-S-R query */ uint16_t inm_sctimer; /* state-change timer */ uint16_t inm_scrv; /* state-change rexmit count */ /* * SSM state counters which track state at T0 (the time the last * state-change report's RV timer went to zero) and T1 * (time of pending report, i.e. now). * Used for computing IGMPv3 state-change reports. Several refcounts * are maintained here to optimize for common use-cases. */ struct inm_st { uint16_t iss_fmode; /* IGMP filter mode */ uint16_t iss_asm; /* # of ASM listeners */ uint16_t iss_ex; /* # of exclusive members */ uint16_t iss_in; /* # of inclusive members */ uint16_t iss_rec; /* # of recorded sources */ } inm_st[2]; /* state at t0, t1 */ }; /* * Helper function to derive the filter mode on a source entry * from its internal counters. Predicates are: * A source is only excluded if all listeners exclude it. * A source is only included if no listeners exclude it, * and at least one listener includes it. * May be used by ifmcstat(8). */ static __inline uint8_t ims_get_mode(const struct in_multi *inm, const struct ip_msource *ims, uint8_t t) { t = !!t; if (inm->inm_st[t].iss_ex > 0 && inm->inm_st[t].iss_ex == ims->ims_st[t].ex) return (MCAST_EXCLUDE); else if (ims->ims_st[t].in > 0 && ims->ims_st[t].ex == 0) return (MCAST_INCLUDE); return (MCAST_UNDEFINED); } -#ifdef _KERNEL - #ifdef SYSCTL_DECL SYSCTL_DECL(_net_inet); SYSCTL_DECL(_net_inet_ip); SYSCTL_DECL(_net_inet_raw); #endif /* * Lock macros for IPv4 layer multicast address lists. IPv4 lock goes * before link layer multicast locks in the lock order. 
In most cases, * consumers of IN_*_MULTI() macros should acquire the locks before * calling them; users of the in_{add,del}multi() functions should not. */ extern struct mtx in_multi_mtx; #define IN_MULTI_LOCK() mtx_lock(&in_multi_mtx) #define IN_MULTI_UNLOCK() mtx_unlock(&in_multi_mtx) #define IN_MULTI_LOCK_ASSERT() mtx_assert(&in_multi_mtx, MA_OWNED) #define IN_MULTI_UNLOCK_ASSERT() mtx_assert(&in_multi_mtx, MA_NOTOWNED) /* Acquire an in_multi record. */ static __inline void inm_acquire_locked(struct in_multi *inm) { IN_MULTI_LOCK_ASSERT(); ++inm->inm_refcount; } /* * Return values for imo_multi_filter(). */ #define MCAST_PASS 0 /* Pass */ #define MCAST_NOTGMEMBER 1 /* This host not a member of group */ #define MCAST_NOTSMEMBER 2 /* This host excluded source */ #define MCAST_MUTED 3 /* [deprecated] */ struct rtentry; struct route; struct ip_moptions; struct radix_node_head; struct in_multi *inm_lookup_locked(struct ifnet *, const struct in_addr); struct in_multi *inm_lookup(struct ifnet *, const struct in_addr); int imo_multi_filter(const struct ip_moptions *, const struct ifnet *, const struct sockaddr *, const struct sockaddr *); void inm_commit(struct in_multi *); void inm_clear_recorded(struct in_multi *); void inm_print(const struct in_multi *); int inm_record_source(struct in_multi *inm, const in_addr_t); void inm_release(struct in_multi *); void inm_release_locked(struct in_multi *); struct in_multi * in_addmulti(struct in_addr *, struct ifnet *); void in_delmulti(struct in_multi *); int in_joingroup(struct ifnet *, const struct in_addr *, /*const*/ struct in_mfilter *, struct in_multi **); int in_joingroup_locked(struct ifnet *, const struct in_addr *, /*const*/ struct in_mfilter *, struct in_multi **); int in_leavegroup(struct in_multi *, /*const*/ struct in_mfilter *); int in_leavegroup_locked(struct in_multi *, /*const*/ struct in_mfilter *); int in_control(struct socket *, u_long, caddr_t, struct ifnet *, struct thread *); int in_addprefix(struct 
in_ifaddr *, int); int in_scrubprefix(struct in_ifaddr *, u_int); void ip_input(struct mbuf *); void ip_direct_input(struct mbuf *); void in_ifadown(struct ifaddr *ifa, int); struct mbuf *ip_fastforward(struct mbuf *); void *in_domifattach(struct ifnet *); void in_domifdetach(struct ifnet *, void *); /* XXX */ void in_rtalloc_ign(struct route *ro, u_long ignflags, u_int fibnum); void in_rtalloc(struct route *ro, u_int fibnum); struct rtentry *in_rtalloc1(struct sockaddr *, int, u_long, u_int); void in_rtredirect(struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct sockaddr *, u_int); int in_rtrequest(int, struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct rtentry **, u_int); - -#if 0 -int in_rt_getifa(struct rt_addrinfo *, u_int fibnum); -int in_rtioctl(u_long, caddr_t, u_int); -int in_rtrequest1(int, struct rt_addrinfo *, struct rtentry **, u_int); -#endif #endif /* _KERNEL */ /* INET6 stuff */ #include #endif /* _NETINET_IN_VAR_H_ */ Index: projects/ifnet/sys/netinet6/in6_var.h =================================================================== --- projects/ifnet/sys/netinet6/in6_var.h (revision 279031) +++ projects/ifnet/sys/netinet6/in6_var.h (revision 279032) @@ -1,844 +1,841 @@ /*- * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: in6_var.h,v 1.56 2001/03/29 05:34:31 itojun Exp $ */ /*- * Copyright (c) 1985, 1986, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in_var.h 8.1 (Berkeley) 6/10/93 * $FreeBSD$ */ #ifndef _NETINET6_IN6_VAR_H_ #define _NETINET6_IN6_VAR_H_ #include #include #ifdef _KERNEL #include #include #endif /* * Interface address, Internet version. One of these structures * is allocated for each interface with an Internet address. * The ifaddr structure contains the protocol-independent part * of the structure and is assumed to be first. */ /* * pltime/vltime are just for future reference (required to implements 2 * hour rule for hosts). they should never be modified by nd6_timeout or * anywhere else. 
* userland -> kernel: accept pltime/vltime * kernel -> userland: throw up everything * in kernel: modify preferred/expire only */ struct in6_addrlifetime { time_t ia6t_expire; /* valid lifetime expiration time */ time_t ia6t_preferred; /* preferred lifetime expiration time */ u_int32_t ia6t_vltime; /* valid lifetime */ u_int32_t ia6t_pltime; /* prefix lifetime */ }; struct nd_ifinfo; struct scope6_id; struct lltable; -struct mld_ifinfo; +struct mld_ifsoftc; struct in6_ifextra { counter_u64_t *in6_ifstat; counter_u64_t *icmp6_ifstat; struct nd_ifinfo *nd_ifinfo; struct scope6_id *scope6_id; struct lltable *lltable; - struct mld_ifinfo *mld_ifinfo; + struct mld_ifsoftc *mld_ifinfo; }; #define LLTABLE6(ifp) (((struct in6_ifextra *)(ifp)->if_afdata[AF_INET6])->lltable) -#if defined(_KERNEL) || defined(_WANT_IFADDR) +#ifdef _KERNEL struct in6_ifaddr { struct ifaddr ia_ifa; /* protocol-independent info */ #define ia_ifp ia_ifa.ifa_ifp #define ia_flags ia_ifa.ifa_flags struct sockaddr_in6 ia_addr; /* interface address */ struct sockaddr_in6 ia_net; /* network number of interface */ struct sockaddr_in6 ia_dstaddr; /* space for destination addr */ struct sockaddr_in6 ia_prefixmask; /* prefix mask */ u_int32_t ia_plen; /* prefix length */ TAILQ_ENTRY(in6_ifaddr) ia_link; /* list of IPv6 addresses */ int ia6_flags; struct in6_addrlifetime ia6_lifetime; time_t ia6_createtime; /* the creation time of this address, which is * currently used for temporary addresses only. */ time_t ia6_updatetime; /* back pointer to the ND prefix (for autoconfigured addresses only) */ struct nd_prefix *ia6_ndpr; /* multicast addresses joined from the kernel */ LIST_HEAD(, in6_multi_mship) ia6_memberships; /* entry in bucket of inet6 addresses */ LIST_ENTRY(in6_ifaddr) ia6_hash; }; /* List of in6_ifaddr's. 
*/ TAILQ_HEAD(in6_ifaddrhead, in6_ifaddr); LIST_HEAD(in6_ifaddrlisthead, in6_ifaddr); -#endif +#endif /* _KERNEL */ /* control structure to manage address selection policy */ struct in6_addrpolicy { struct sockaddr_in6 addr; /* prefix address */ struct sockaddr_in6 addrmask; /* prefix mask */ int preced; /* precedence */ int label; /* matching label */ u_quad_t use; /* statistics */ }; /* * IPv6 interface statistics, as defined in RFC2465 Ipv6IfStatsEntry (p12). */ struct in6_ifstat { uint64_t ifs6_in_receive; /* # of total input datagram */ uint64_t ifs6_in_hdrerr; /* # of datagrams with invalid hdr */ uint64_t ifs6_in_toobig; /* # of datagrams exceeded MTU */ uint64_t ifs6_in_noroute; /* # of datagrams with no route */ uint64_t ifs6_in_addrerr; /* # of datagrams with invalid dst */ uint64_t ifs6_in_protounknown; /* # of datagrams with unknown proto */ /* NOTE: increment on final dst if */ uint64_t ifs6_in_truncated; /* # of truncated datagrams */ uint64_t ifs6_in_discard; /* # of discarded datagrams */ /* NOTE: fragment timeout is not here */ uint64_t ifs6_in_deliver; /* # of datagrams delivered to ULP */ /* NOTE: increment on final dst if */ uint64_t ifs6_out_forward; /* # of datagrams forwarded */ /* NOTE: increment on outgoing if */ uint64_t ifs6_out_request; /* # of outgoing datagrams from ULP */ /* NOTE: does not include forwrads */ uint64_t ifs6_out_discard; /* # of discarded datagrams */ uint64_t ifs6_out_fragok; /* # of datagrams fragmented */ uint64_t ifs6_out_fragfail; /* # of datagrams failed on fragment */ uint64_t ifs6_out_fragcreat; /* # of fragment datagrams */ /* NOTE: this is # after fragment */ uint64_t ifs6_reass_reqd; /* # of incoming fragmented packets */ /* NOTE: increment on final dst if */ uint64_t ifs6_reass_ok; /* # of reassembled packets */ /* NOTE: this is # after reass */ /* NOTE: increment on final dst if */ uint64_t ifs6_reass_fail; /* # of reass failures */ /* NOTE: may not be packet count */ /* NOTE: increment on final dst if */ 
uint64_t ifs6_in_mcast; /* # of inbound multicast datagrams */ uint64_t ifs6_out_mcast; /* # of outbound multicast datagrams */ }; /* * ICMPv6 interface statistics, as defined in RFC2466 Ipv6IfIcmpEntry. * XXX: I'm not sure if this file is the right place for this structure... */ struct icmp6_ifstat { /* * Input statistics */ /* ipv6IfIcmpInMsgs, total # of input messages */ uint64_t ifs6_in_msg; /* ipv6IfIcmpInErrors, # of input error messages */ uint64_t ifs6_in_error; /* ipv6IfIcmpInDestUnreachs, # of input dest unreach errors */ uint64_t ifs6_in_dstunreach; /* ipv6IfIcmpInAdminProhibs, # of input administratively prohibited errs */ uint64_t ifs6_in_adminprohib; /* ipv6IfIcmpInTimeExcds, # of input time exceeded errors */ uint64_t ifs6_in_timeexceed; /* ipv6IfIcmpInParmProblems, # of input parameter problem errors */ uint64_t ifs6_in_paramprob; /* ipv6IfIcmpInPktTooBigs, # of input packet too big errors */ uint64_t ifs6_in_pkttoobig; /* ipv6IfIcmpInEchos, # of input echo requests */ uint64_t ifs6_in_echo; /* ipv6IfIcmpInEchoReplies, # of input echo replies */ uint64_t ifs6_in_echoreply; /* ipv6IfIcmpInRouterSolicits, # of input router solicitations */ uint64_t ifs6_in_routersolicit; /* ipv6IfIcmpInRouterAdvertisements, # of input router advertisements */ uint64_t ifs6_in_routeradvert; /* ipv6IfIcmpInNeighborSolicits, # of input neighbor solicitations */ uint64_t ifs6_in_neighborsolicit; /* ipv6IfIcmpInNeighborAdvertisements, # of input neighbor advertisements */ uint64_t ifs6_in_neighboradvert; /* ipv6IfIcmpInRedirects, # of input redirects */ uint64_t ifs6_in_redirect; /* ipv6IfIcmpInGroupMembQueries, # of input MLD queries */ uint64_t ifs6_in_mldquery; /* ipv6IfIcmpInGroupMembResponses, # of input MLD reports */ uint64_t ifs6_in_mldreport; /* ipv6IfIcmpInGroupMembReductions, # of input MLD done */ uint64_t ifs6_in_mlddone; /* * Output statistics. We should solve unresolved routing problem... 
*/ /* ipv6IfIcmpOutMsgs, total # of output messages */ uint64_t ifs6_out_msg; /* ipv6IfIcmpOutErrors, # of output error messages */ uint64_t ifs6_out_error; /* ipv6IfIcmpOutDestUnreachs, # of output dest unreach errors */ uint64_t ifs6_out_dstunreach; /* ipv6IfIcmpOutAdminProhibs, # of output administratively prohibited errs */ uint64_t ifs6_out_adminprohib; /* ipv6IfIcmpOutTimeExcds, # of output time exceeded errors */ uint64_t ifs6_out_timeexceed; /* ipv6IfIcmpOutParmProblems, # of output parameter problem errors */ uint64_t ifs6_out_paramprob; /* ipv6IfIcmpOutPktTooBigs, # of output packet too big errors */ uint64_t ifs6_out_pkttoobig; /* ipv6IfIcmpOutEchos, # of output echo requests */ uint64_t ifs6_out_echo; /* ipv6IfIcmpOutEchoReplies, # of output echo replies */ uint64_t ifs6_out_echoreply; /* ipv6IfIcmpOutRouterSolicits, # of output router solicitations */ uint64_t ifs6_out_routersolicit; /* ipv6IfIcmpOutRouterAdvertisements, # of output router advertisements */ uint64_t ifs6_out_routeradvert; /* ipv6IfIcmpOutNeighborSolicits, # of output neighbor solicitations */ uint64_t ifs6_out_neighborsolicit; /* ipv6IfIcmpOutNeighborAdvertisements, # of output neighbor advertisements */ uint64_t ifs6_out_neighboradvert; /* ipv6IfIcmpOutRedirects, # of output redirects */ uint64_t ifs6_out_redirect; /* ipv6IfIcmpOutGroupMembQueries, # of output MLD queries */ uint64_t ifs6_out_mldquery; /* ipv6IfIcmpOutGroupMembResponses, # of output MLD reports */ uint64_t ifs6_out_mldreport; /* ipv6IfIcmpOutGroupMembReductions, # of output MLD done */ uint64_t ifs6_out_mlddone; }; struct in6_ifreq { char ifr_name[IFNAMSIZ]; union { struct sockaddr_in6 ifru_addr; struct sockaddr_in6 ifru_dstaddr; int ifru_flags; int ifru_flags6; int ifru_metric; caddr_t ifru_data; struct in6_addrlifetime ifru_lifetime; struct in6_ifstat ifru_stat; struct icmp6_ifstat ifru_icmp6stat; u_int32_t ifru_scope_id[16]; } ifr_ifru; }; struct in6_aliasreq { char ifra_name[IFNAMSIZ]; struct sockaddr_in6 
ifra_addr; struct sockaddr_in6 ifra_dstaddr; struct sockaddr_in6 ifra_prefixmask; int ifra_flags; struct in6_addrlifetime ifra_lifetime; int ifra_vhid; }; /* pre-10.x compat */ struct oin6_aliasreq { char ifra_name[IFNAMSIZ]; struct sockaddr_in6 ifra_addr; struct sockaddr_in6 ifra_dstaddr; struct sockaddr_in6 ifra_prefixmask; int ifra_flags; struct in6_addrlifetime ifra_lifetime; }; /* prefix type macro */ #define IN6_PREFIX_ND 1 #define IN6_PREFIX_RR 2 /* * prefix related flags passed between kernel(NDP related part) and * user land command(ifconfig) and daemon(rtadvd). */ struct in6_prflags { struct prf_ra { u_char onlink : 1; u_char autonomous : 1; u_char reserved : 6; } prf_ra; u_char prf_reserved1; u_short prf_reserved2; /* want to put this on 4byte offset */ struct prf_rr { u_char decrvalid : 1; u_char decrprefd : 1; u_char reserved : 6; } prf_rr; u_char prf_reserved3; u_short prf_reserved4; }; struct in6_prefixreq { char ipr_name[IFNAMSIZ]; u_char ipr_origin; u_char ipr_plen; u_int32_t ipr_vltime; u_int32_t ipr_pltime; struct in6_prflags ipr_flags; struct sockaddr_in6 ipr_prefix; }; #define PR_ORIG_RA 0 #define PR_ORIG_RR 1 #define PR_ORIG_STATIC 2 #define PR_ORIG_KERNEL 3 #define ipr_raf_onlink ipr_flags.prf_ra.onlink #define ipr_raf_auto ipr_flags.prf_ra.autonomous #define ipr_statef_onlink ipr_flags.prf_state.onlink #define ipr_rrf_decrvalid ipr_flags.prf_rr.decrvalid #define ipr_rrf_decrprefd ipr_flags.prf_rr.decrprefd struct in6_rrenumreq { char irr_name[IFNAMSIZ]; u_char irr_origin; u_char irr_m_len; /* match len for matchprefix */ u_char irr_m_minlen; /* minlen for matching prefix */ u_char irr_m_maxlen; /* maxlen for matching prefix */ u_char irr_u_uselen; /* uselen for adding prefix */ u_char irr_u_keeplen; /* keeplen from matching prefix */ struct irr_raflagmask { u_char onlink : 1; u_char autonomous : 1; u_char reserved : 6; } irr_raflagmask; u_int32_t irr_vltime; u_int32_t irr_pltime; struct in6_prflags irr_flags; struct sockaddr_in6 
irr_matchprefix; struct sockaddr_in6 irr_useprefix; }; #define irr_raf_mask_onlink irr_raflagmask.onlink #define irr_raf_mask_auto irr_raflagmask.autonomous #define irr_raf_mask_reserved irr_raflagmask.reserved #define irr_raf_onlink irr_flags.prf_ra.onlink #define irr_raf_auto irr_flags.prf_ra.autonomous #define irr_statef_onlink irr_flags.prf_state.onlink #define irr_rrf irr_flags.prf_rr #define irr_rrf_decrvalid irr_flags.prf_rr.decrvalid #define irr_rrf_decrprefd irr_flags.prf_rr.decrprefd /* * Given a pointer to an in6_ifaddr (ifaddr), * return a pointer to the addr as a sockaddr_in6 */ #define IA6_IN6(ia) (&((ia)->ia_addr.sin6_addr)) #define IA6_DSTIN6(ia) (&((ia)->ia_dstaddr.sin6_addr)) #define IA6_MASKIN6(ia) (&((ia)->ia_prefixmask.sin6_addr)) #define IA6_SIN6(ia) (&((ia)->ia_addr)) #define IA6_DSTSIN6(ia) (&((ia)->ia_dstaddr)) #define IFA_IN6(x) (&((struct sockaddr_in6 *)((x)->ifa_addr))->sin6_addr) #define IFA_DSTIN6(x) (&((struct sockaddr_in6 *)((x)->ifa_dstaddr))->sin6_addr) #define IFPR_IN6(x) (&((struct sockaddr_in6 *)((x)->ifpr_prefix))->sin6_addr) #ifdef _KERNEL #define IN6_ARE_MASKED_ADDR_EQUAL(d, a, m) ( \ (((d)->s6_addr32[0] ^ (a)->s6_addr32[0]) & (m)->s6_addr32[0]) == 0 && \ (((d)->s6_addr32[1] ^ (a)->s6_addr32[1]) & (m)->s6_addr32[1]) == 0 && \ (((d)->s6_addr32[2] ^ (a)->s6_addr32[2]) & (m)->s6_addr32[2]) == 0 && \ (((d)->s6_addr32[3] ^ (a)->s6_addr32[3]) & (m)->s6_addr32[3]) == 0 ) #define IN6_MASK_ADDR(a, m) do { \ (a)->s6_addr32[0] &= (m)->s6_addr32[0]; \ (a)->s6_addr32[1] &= (m)->s6_addr32[1]; \ (a)->s6_addr32[2] &= (m)->s6_addr32[2]; \ (a)->s6_addr32[3] &= (m)->s6_addr32[3]; \ } while (0) #endif #define SIOCSIFADDR_IN6 _IOW('i', 12, struct in6_ifreq) #define SIOCGIFADDR_IN6 _IOWR('i', 33, struct in6_ifreq) #ifdef _KERNEL /* * SIOCSxxx ioctls should be unused (see comments in in6.c), but * we do not shift numbers for binary compatibility. 
*/ #define SIOCSIFDSTADDR_IN6 _IOW('i', 14, struct in6_ifreq) #define SIOCSIFNETMASK_IN6 _IOW('i', 22, struct in6_ifreq) #endif #define SIOCGIFDSTADDR_IN6 _IOWR('i', 34, struct in6_ifreq) #define SIOCGIFNETMASK_IN6 _IOWR('i', 37, struct in6_ifreq) #define SIOCDIFADDR_IN6 _IOW('i', 25, struct in6_ifreq) #define OSIOCAIFADDR_IN6 _IOW('i', 26, struct oin6_aliasreq) #define SIOCAIFADDR_IN6 _IOW('i', 27, struct in6_aliasreq) #define SIOCSIFPHYADDR_IN6 _IOW('i', 70, struct in6_aliasreq) #define SIOCGIFPSRCADDR_IN6 _IOWR('i', 71, struct in6_ifreq) #define SIOCGIFPDSTADDR_IN6 _IOWR('i', 72, struct in6_ifreq) #define SIOCGIFAFLAG_IN6 _IOWR('i', 73, struct in6_ifreq) #define SIOCGDRLST_IN6 _IOWR('i', 74, struct in6_drlist) #ifdef _KERNEL /* XXX: SIOCGPRLST_IN6 is exposed in KAME but in6_oprlist is not. */ #define SIOCGPRLST_IN6 _IOWR('i', 75, struct in6_oprlist) #endif #ifdef _KERNEL #define OSIOCGIFINFO_IN6 _IOWR('i', 76, struct in6_ondireq) #endif #define SIOCGIFINFO_IN6 _IOWR('i', 108, struct in6_ndireq) #define SIOCSIFINFO_IN6 _IOWR('i', 109, struct in6_ndireq) #define SIOCSNDFLUSH_IN6 _IOWR('i', 77, struct in6_ifreq) #define SIOCGNBRINFO_IN6 _IOWR('i', 78, struct in6_nbrinfo) #define SIOCSPFXFLUSH_IN6 _IOWR('i', 79, struct in6_ifreq) #define SIOCSRTRFLUSH_IN6 _IOWR('i', 80, struct in6_ifreq) #define SIOCGIFALIFETIME_IN6 _IOWR('i', 81, struct in6_ifreq) #define SIOCSIFALIFETIME_IN6 _IOWR('i', 82, struct in6_ifreq) #define SIOCGIFSTAT_IN6 _IOWR('i', 83, struct in6_ifreq) #define SIOCGIFSTAT_ICMP6 _IOWR('i', 84, struct in6_ifreq) #define SIOCSDEFIFACE_IN6 _IOWR('i', 85, struct in6_ndifreq) #define SIOCGDEFIFACE_IN6 _IOWR('i', 86, struct in6_ndifreq) #define SIOCSIFINFO_FLAGS _IOWR('i', 87, struct in6_ndireq) /* XXX */ #define SIOCSSCOPE6 _IOW('i', 88, struct in6_ifreq) #define SIOCGSCOPE6 _IOWR('i', 89, struct in6_ifreq) #define SIOCGSCOPE6DEF _IOWR('i', 90, struct in6_ifreq) #define SIOCSIFPREFIX_IN6 _IOW('i', 100, struct in6_prefixreq) /* set */ #define SIOCGIFPREFIX_IN6 
_IOWR('i', 101, struct in6_prefixreq) /* get */ #define SIOCDIFPREFIX_IN6 _IOW('i', 102, struct in6_prefixreq) /* del */ #define SIOCAIFPREFIX_IN6 _IOW('i', 103, struct in6_rrenumreq) /* add */ #define SIOCCIFPREFIX_IN6 _IOW('i', 104, \ struct in6_rrenumreq) /* change */ #define SIOCSGIFPREFIX_IN6 _IOW('i', 105, \ struct in6_rrenumreq) /* set global */ #define SIOCGETSGCNT_IN6 _IOWR('u', 106, \ struct sioc_sg_req6) /* get s,g pkt cnt */ #define SIOCGETMIFCNT_IN6 _IOWR('u', 107, \ struct sioc_mif_req6) /* get pkt cnt per if */ #define SIOCAADDRCTL_POLICY _IOW('u', 108, struct in6_addrpolicy) #define SIOCDADDRCTL_POLICY _IOW('u', 109, struct in6_addrpolicy) #define IN6_IFF_ANYCAST 0x01 /* anycast address */ #define IN6_IFF_TENTATIVE 0x02 /* tentative address */ #define IN6_IFF_DUPLICATED 0x04 /* DAD detected duplicate */ #define IN6_IFF_DETACHED 0x08 /* may be detached from the link */ #define IN6_IFF_DEPRECATED 0x10 /* deprecated address */ #define IN6_IFF_NODAD 0x20 /* don't perform DAD on this address * (used only at first SIOC* call) */ #define IN6_IFF_AUTOCONF 0x40 /* autoconfigurable address. */ #define IN6_IFF_TEMPORARY 0x80 /* temporary (anonymous) address. */ #define IN6_IFF_PREFER_SOURCE 0x0100 /* preferred address for SAS */ #define IN6_IFF_NOPFX 0x8000 /* skip kernel prefix management. * XXX: this should be temporary. 
*/ /* do not input/output */ #define IN6_IFF_NOTREADY (IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED) #ifdef _KERNEL #define IN6_ARE_SCOPE_CMP(a,b) ((a)-(b)) #define IN6_ARE_SCOPE_EQUAL(a,b) ((a)==(b)) #endif #ifdef _KERNEL VNET_DECLARE(struct in6_ifaddrhead, in6_ifaddrhead); VNET_DECLARE(struct in6_ifaddrlisthead *, in6_ifaddrhashtbl); VNET_DECLARE(u_long, in6_ifaddrhmask); #define V_in6_ifaddrhead VNET(in6_ifaddrhead) #define V_in6_ifaddrhashtbl VNET(in6_ifaddrhashtbl) #define V_in6_ifaddrhmask VNET(in6_ifaddrhmask) #define IN6ADDR_NHASH_LOG2 8 #define IN6ADDR_NHASH (1 << IN6ADDR_NHASH_LOG2) #define IN6ADDR_HASHVAL(x) (in6_addrhash(x)) #define IN6ADDR_HASH(x) \ (&V_in6_ifaddrhashtbl[IN6ADDR_HASHVAL(x) & V_in6_ifaddrhmask]) static __inline uint32_t in6_addrhash(const struct in6_addr *in6) { uint32_t x; x = in6->s6_addr32[0] ^ in6->s6_addr32[1] ^ in6->s6_addr32[2] ^ in6->s6_addr32[3]; return (fnv_32_buf(&x, sizeof(x), FNV1_32_INIT)); } extern struct rwlock in6_ifaddr_lock; #define IN6_IFADDR_LOCK_ASSERT( ) rw_assert(&in6_ifaddr_lock, RA_LOCKED) #define IN6_IFADDR_RLOCK() rw_rlock(&in6_ifaddr_lock) #define IN6_IFADDR_RLOCK_ASSERT() rw_assert(&in6_ifaddr_lock, RA_RLOCKED) #define IN6_IFADDR_RUNLOCK() rw_runlock(&in6_ifaddr_lock) #define IN6_IFADDR_WLOCK() rw_wlock(&in6_ifaddr_lock) #define IN6_IFADDR_WLOCK_ASSERT() rw_assert(&in6_ifaddr_lock, RA_WLOCKED) #define IN6_IFADDR_WUNLOCK() rw_wunlock(&in6_ifaddr_lock) #define in6_ifstat_inc(ifp, tag) \ do { \ if (ifp) \ counter_u64_add(((struct in6_ifextra *) \ ((ifp)->if_afdata[AF_INET6]))->in6_ifstat[ \ offsetof(struct in6_ifstat, tag) / sizeof(uint64_t)], 1);\ } while (/*CONSTCOND*/ 0) extern u_char inet6ctlerrmap[]; VNET_DECLARE(unsigned long, in6_maxmtu); #define V_in6_maxmtu VNET(in6_maxmtu) #endif /* _KERNEL */ /* * IPv6 multicast MLD-layer source entry. 
*/ struct ip6_msource { RB_ENTRY(ip6_msource) im6s_link; /* RB tree links */ struct in6_addr im6s_addr; struct im6s_st { uint16_t ex; /* # of exclusive members */ uint16_t in; /* # of inclusive members */ } im6s_st[2]; /* state at t0, t1 */ uint8_t im6s_stp; /* pending query */ }; RB_HEAD(ip6_msource_tree, ip6_msource); /* * IPv6 multicast PCB-layer source entry. * * NOTE: overlapping use of struct ip6_msource fields at start. */ struct in6_msource { RB_ENTRY(ip6_msource) im6s_link; /* Common field */ struct in6_addr im6s_addr; /* Common field */ uint8_t im6sl_st[2]; /* state before/at commit */ }; #ifdef _KERNEL /* * IPv6 source tree comparison function. * * An ordered predicate is necessary; bcmp() is not documented to return * an indication of order, memcmp() is, and is an ISO C99 requirement. */ static __inline int ip6_msource_cmp(const struct ip6_msource *a, const struct ip6_msource *b) { return (memcmp(&a->im6s_addr, &b->im6s_addr, sizeof(struct in6_addr))); } RB_PROTOTYPE(ip6_msource_tree, ip6_msource, im6s_link, ip6_msource_cmp); -#endif /* _KERNEL */ /* * IPv6 multicast PCB-layer group filter descriptor. */ struct in6_mfilter { struct ip6_msource_tree im6f_sources; /* source list for (S,G) */ u_long im6f_nsrc; /* # of source entries */ uint8_t im6f_st[2]; /* state before/at commit */ }; /* * Legacy KAME IPv6 multicast membership descriptor. */ struct in6_multi_mship { struct in6_multi *i6mm_maddr; LIST_ENTRY(in6_multi_mship) i6mm_chain; }; /* * IPv6 group descriptor. * * For every entry on an ifnet's if_multiaddrs list which represents * an IP multicast group, there is one of these structures. * * If any source filters are present, then a node will exist in the RB-tree * to permit fast lookup by source whenever an operation takes place. * This permits pre-order traversal when we issue reports. * Source filter trees are kept separately from the socket layer to * greatly simplify locking. 
* * When MLDv2 is active, in6m_timer is the response to group query timer. * The state-change timer in6m_sctimer is separate; whenever state changes * for the group the state change record is generated and transmitted, * and kept if retransmissions are necessary. * * FUTURE: in6m_link is now only used when groups are being purged * on a detaching ifnet. It could be demoted to a SLIST_ENTRY, but * because it is at the very start of the struct, we can't do this * w/o breaking the ABI for ifmcstat. */ struct in6_multi { LIST_ENTRY(in6_multi) in6m_entry; /* list glue */ struct in6_addr in6m_addr; /* IPv6 multicast address */ struct ifnet *in6m_ifp; /* back pointer to ifnet */ struct ifmultiaddr *in6m_ifma; /* back pointer to ifmultiaddr */ u_int in6m_refcount; /* reference count */ u_int in6m_state; /* state of the membership */ u_int in6m_timer; /* MLD6 listener report timer */ /* New fields for MLDv2 follow. */ - struct mld_ifinfo *in6m_mli; /* MLD info */ + struct mld_ifsoftc *in6m_mli; /* MLD info */ SLIST_ENTRY(in6_multi) in6m_nrele; /* to-be-released by MLD */ struct ip6_msource_tree in6m_srcs; /* tree of sources */ u_long in6m_nsrc; /* # of tree entries */ struct mbufq in6m_scq; /* queue of pending * state-change packets */ struct timeval in6m_lastgsrtv; /* last G-S-R query */ uint16_t in6m_sctimer; /* state-change timer */ uint16_t in6m_scrv; /* state-change rexmit count */ /* * SSM state counters which track state at T0 (the time the last * state-change report's RV timer went to zero) and T1 * (time of pending report, i.e. now). * Used for computing MLDv2 state-change reports. Several refcounts * are maintained here to optimize for common use-cases. 
*/ struct in6m_st { uint16_t iss_fmode; /* MLD filter mode */ uint16_t iss_asm; /* # of ASM listeners */ uint16_t iss_ex; /* # of exclusive members */ uint16_t iss_in; /* # of inclusive members */ uint16_t iss_rec; /* # of recorded sources */ } in6m_st[2]; /* state at t0, t1 */ }; /* * Helper function to derive the filter mode on a source entry * from its internal counters. Predicates are: * A source is only excluded if all listeners exclude it. * A source is only included if no listeners exclude it, * and at least one listener includes it. * May be used by ifmcstat(8). */ static __inline uint8_t im6s_get_mode(const struct in6_multi *inm, const struct ip6_msource *ims, uint8_t t) { t = !!t; if (inm->in6m_st[t].iss_ex > 0 && inm->in6m_st[t].iss_ex == ims->im6s_st[t].ex) return (MCAST_EXCLUDE); else if (ims->im6s_st[t].in > 0 && ims->im6s_st[t].ex == 0) return (MCAST_INCLUDE); return (MCAST_UNDEFINED); } - -#ifdef _KERNEL /* * Lock macros for IPv6 layer multicast address lists. IPv6 lock goes * before link layer multicast locks in the lock order. In most cases, * consumers of IN_*_MULTI() macros should acquire the locks before * calling them; users of the in_{add,del}multi() functions should not. */ extern struct mtx in6_multi_mtx; #define IN6_MULTI_LOCK() mtx_lock(&in6_multi_mtx) #define IN6_MULTI_UNLOCK() mtx_unlock(&in6_multi_mtx) #define IN6_MULTI_LOCK_ASSERT() mtx_assert(&in6_multi_mtx, MA_OWNED) #define IN6_MULTI_UNLOCK_ASSERT() mtx_assert(&in6_multi_mtx, MA_NOTOWNED) /* * Look up an in6_multi record for an IPv6 multicast address * on the interface ifp. * If no record found, return NULL. * * SMPng: The IN6_MULTI_LOCK and IF_ADDR_LOCK on ifp must be held. 
*/ static __inline struct in6_multi * in6m_lookup_locked(struct ifnet *ifp, const struct in6_addr *mcaddr) { struct ifmultiaddr *ifma; struct in6_multi *inm; IN6_MULTI_LOCK_ASSERT(); IF_ADDR_LOCK_ASSERT(ifp); inm = NULL; TAILQ_FOREACH(ifma, &((ifp)->if_multiaddrs), ifma_link) { if (ifma->ifma_addr->sa_family == AF_INET6) { inm = (struct in6_multi *)ifma->ifma_protospec; if (IN6_ARE_ADDR_EQUAL(&inm->in6m_addr, mcaddr)) break; inm = NULL; } } return (inm); } /* * Wrapper for in6m_lookup_locked(). * * SMPng: Assumes that neither the IN6_MULTI_LOCK() nor the IF_ADDR_LOCK() is held. */ static __inline struct in6_multi * in6m_lookup(struct ifnet *ifp, const struct in6_addr *mcaddr) { struct in6_multi *inm; IN6_MULTI_LOCK(); IF_ADDR_RLOCK(ifp); inm = in6m_lookup_locked(ifp, mcaddr); IF_ADDR_RUNLOCK(ifp); IN6_MULTI_UNLOCK(); return (inm); } /* Acquire an in6_multi record. */ static __inline void in6m_acquire_locked(struct in6_multi *inm) { IN6_MULTI_LOCK_ASSERT(); ++inm->in6m_refcount; } struct ip6_moptions; struct sockopt; /* Multicast KPIs. */ int im6o_mc_filter(const struct ip6_moptions *, const struct ifnet *, const struct sockaddr *, const struct sockaddr *); int in6_mc_join(struct ifnet *, const struct in6_addr *, struct in6_mfilter *, struct in6_multi **, int); int in6_mc_join_locked(struct ifnet *, const struct in6_addr *, struct in6_mfilter *, struct in6_multi **, int); int in6_mc_leave(struct in6_multi *, struct in6_mfilter *); int in6_mc_leave_locked(struct in6_multi *, struct in6_mfilter *); void in6m_clear_recorded(struct in6_multi *); void in6m_commit(struct in6_multi *); void in6m_print(const struct in6_multi *); int in6m_record_source(struct in6_multi *, const struct in6_addr *); void in6m_release_locked(struct in6_multi *); void ip6_freemoptions(struct ip6_moptions *); int ip6_getmoptions(struct inpcb *, struct sockopt *); int ip6_setmoptions(struct inpcb *, struct sockopt *); /* Legacy KAME multicast KPIs. 
*/ struct in6_multi_mship * in6_joingroup(struct ifnet *, struct in6_addr *, int *, int); int in6_leavegroup(struct in6_multi_mship *); /* flags to in6_update_ifa */ #define IN6_IFAUPDATE_DADDELAY 0x1 /* first time to configure an address */ int in6_mask2len(struct in6_addr *, u_char *); int in6_control(struct socket *, u_long, caddr_t, struct ifnet *, struct thread *); int in6_update_ifa(struct ifnet *, struct in6_aliasreq *, struct in6_ifaddr *, int); void in6_prepare_ifra(struct in6_aliasreq *, const struct in6_addr *, const struct in6_addr *); void in6_purgeaddr(struct ifaddr *); int in6if_do_dad(struct ifnet *); void in6_savemkludge(struct in6_ifaddr *); void *in6_domifattach(struct ifnet *); void in6_domifdetach(struct ifnet *, void *); int in6_domifmtu(struct ifnet *); void in6_setmaxmtu(void); int in6_if2idlen(struct ifnet *); struct in6_ifaddr *in6ifa_ifpforlinklocal(struct ifnet *, int); struct in6_ifaddr *in6ifa_ifpwithaddr(struct ifnet *, struct in6_addr *); struct in6_ifaddr *in6ifa_ifwithaddr(const struct in6_addr *, uint32_t); struct in6_ifaddr *in6ifa_llaonifp(struct ifnet *); char *ip6_sprintf(char *, const struct in6_addr *); int in6_addr2zoneid(struct ifnet *, struct in6_addr *, u_int32_t *); int in6_matchlen(struct in6_addr *, struct in6_addr *); int in6_are_prefix_equal(struct in6_addr *, struct in6_addr *, int); void in6_prefixlen2mask(struct in6_addr *, int); int in6_prefix_ioctl(struct socket *, u_long, caddr_t, struct ifnet *); int in6_prefix_add_ifid(int, struct in6_ifaddr *); void in6_prefix_remove_ifid(int, struct in6_ifaddr *); void in6_purgeprefix(struct ifnet *); int in6_is_addr_deprecated(struct sockaddr_in6 *); int in6_src_ioctl(u_long, caddr_t); void in6_newaddrmsg(struct in6_ifaddr *, int); /* * Extended API for IPv6 FIB support. 
*/ void in6_rtredirect(struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct sockaddr *, u_int); int in6_rtrequest(int, struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct rtentry **, u_int); void in6_rtalloc(struct route_in6 *, u_int); void in6_rtalloc_ign(struct route_in6 *, u_long, u_int); struct rtentry *in6_rtalloc1(struct sockaddr *, int, u_long, u_int); #endif /* _KERNEL */ #endif /* _NETINET6_IN6_VAR_H_ */ Index: projects/ifnet/sys/netinet6/mld6.c =================================================================== --- projects/ifnet/sys/netinet6/mld6.c (revision 279031) +++ projects/ifnet/sys/netinet6/mld6.c (revision 279032) @@ -1,3301 +1,3310 @@ /*- * Copyright (c) 2009 Bruce Simpson. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: mld6.c,v 1.27 2001/04/04 05:17:30 itojun Exp $ */ /*- * Copyright (c) 1988 Stephen Deering. * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Stephen Deering of Stanford University. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)igmp.c 8.1 (Berkeley) 7/19/93 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef KTR_MLD #define KTR_MLD KTR_INET6 #endif -static struct mld_ifinfo * +static struct mld_ifsoftc * mli_alloc_locked(struct ifnet *); static void mli_delete_locked(const struct ifnet *); static void mld_dispatch_packet(struct mbuf *); static void mld_dispatch_queue(struct mbufq *, int); -static void mld_final_leave(struct in6_multi *, struct mld_ifinfo *); +static void mld_final_leave(struct in6_multi *, struct mld_ifsoftc *); static void mld_fasttimo_vnet(void); static int mld_handle_state_change(struct in6_multi *, - struct mld_ifinfo *); -static int mld_initial_join(struct in6_multi *, struct mld_ifinfo *, + struct mld_ifsoftc *); +static int mld_initial_join(struct in6_multi *, struct mld_ifsoftc *, const int); #ifdef KTR static char * mld_rec_type_to_str(const int); #endif -static void mld_set_version(struct mld_ifinfo *, const int); +static void mld_set_version(struct mld_ifsoftc *, const int); static void mld_slowtimo_vnet(void); static int mld_v1_input_query(struct ifnet *, const struct ip6_hdr *, /*const*/ struct mld_hdr *); static int mld_v1_input_report(struct ifnet *, const struct ip6_hdr *, /*const*/ 
struct mld_hdr *); -static void mld_v1_process_group_timer(struct mld_ifinfo *, +static void mld_v1_process_group_timer(struct mld_ifsoftc *, struct in6_multi *); -static void mld_v1_process_querier_timers(struct mld_ifinfo *); +static void mld_v1_process_querier_timers(struct mld_ifsoftc *); static int mld_v1_transmit_report(struct in6_multi *, const int); static void mld_v1_update_group(struct in6_multi *, const int); -static void mld_v2_cancel_link_timers(struct mld_ifinfo *); -static void mld_v2_dispatch_general_query(struct mld_ifinfo *); +static void mld_v2_cancel_link_timers(struct mld_ifsoftc *); +static void mld_v2_dispatch_general_query(struct mld_ifsoftc *); static struct mbuf * mld_v2_encap_report(struct ifnet *, struct mbuf *); static int mld_v2_enqueue_filter_change(struct mbufq *, struct in6_multi *); static int mld_v2_enqueue_group_record(struct mbufq *, struct in6_multi *, const int, const int, const int, const int); static int mld_v2_input_query(struct ifnet *, const struct ip6_hdr *, struct mbuf *, const int, const int); static int mld_v2_merge_state_changes(struct in6_multi *, struct mbufq *); -static void mld_v2_process_group_timers(struct mld_ifinfo *, +static void mld_v2_process_group_timers(struct mld_ifsoftc *, struct mbufq *, struct mbufq *, struct in6_multi *, const int); static int mld_v2_process_group_query(struct in6_multi *, - struct mld_ifinfo *mli, int, struct mbuf *, const int); + struct mld_ifsoftc *mli, int, struct mbuf *, const int); static int sysctl_mld_gsr(SYSCTL_HANDLER_ARGS); static int sysctl_mld_ifinfo(SYSCTL_HANDLER_ARGS); /* * Normative references: RFC 2710, RFC 3590, RFC 3810. * * Locking: * * The MLD subsystem lock ends up being system-wide for the moment, * but could be per-VIMAGE later on. * * The permitted lock order is: IN6_MULTI_LOCK, MLD_LOCK, IF_ADDR_LOCK. * Any may be taken independently; if any are held at the same * time, the above lock order must be followed. * * IN6_MULTI_LOCK covers in_multi. 
* * MLD_LOCK covers per-link state and any global variables in this file. * * IF_ADDR_LOCK covers if_multiaddrs, which is used for a variety of * per-link state iterators. * * XXX LOR PREVENTION * A special case for IPv6 is the in6_setscope() routine. ip6_output() * will not accept an ifp; it wants an embedded scope ID, unlike * ip_output(), which happily takes the ifp given to it. The embedded * scope ID is only used by MLD to select the outgoing interface. * * During interface attach and detach, MLD will take MLD_LOCK *after* * the IF_AFDATA_LOCK. * As in6_setscope() takes IF_AFDATA_LOCK then SCOPE_LOCK, we can't call * it with MLD_LOCK held without triggering an LOR. A netisr with indirect * dispatch could work around this, but we'd rather not do that, as it * can introduce other races. * * As such, we exploit the fact that the scope ID is just the interface * index, and embed it in the IPv6 destination address accordingly. * This is potentially NOT VALID for MLDv1 reports, as they * are always sent to the multicast group itself; as MLDv2 * reports are always sent to ff02::16, this is not an issue * when MLDv2 is in use. * * This does not however eliminate the LOR when ip6_output() itself * calls in6_setscope() internally whilst MLD_LOCK is held. This will * trigger a LOR warning in WITNESS when the ifnet is detached. * * The right answer is probably to make IF_AFDATA_LOCK an rwlock, given * how it's used across the network stack. Here we're simply exploiting * the fact that MLD runs at a similar layer in the stack to scope6.c. * * VIMAGE: * * Each in6_multi corresponds to an ifp, and each ifp corresponds * to a vnet in ifp->if_vnet. */ static struct mtx mld_mtx; static MALLOC_DEFINE(M_MLD, "mld", "mld state"); #define MLD_EMBEDSCOPE(pin6, zoneid) \ if (IN6_IS_SCOPE_LINKLOCAL(pin6) || \ IN6_IS_ADDR_MC_INTFACELOCAL(pin6)) \ (pin6)->s6_addr16[1] = htons((zoneid) & 0xFFFF) \ /* * VIMAGE-wide globals. 
*/ static VNET_DEFINE(struct timeval, mld_gsrdelay) = {10, 0}; -static VNET_DEFINE(LIST_HEAD(, mld_ifinfo), mli_head); +static VNET_DEFINE(LIST_HEAD(, mld_ifsoftc), mli_head); static VNET_DEFINE(int, interface_timers_running6); static VNET_DEFINE(int, state_change_timers_running6); static VNET_DEFINE(int, current_state_timers_running6); #define V_mld_gsrdelay VNET(mld_gsrdelay) #define V_mli_head VNET(mli_head) #define V_interface_timers_running6 VNET(interface_timers_running6) #define V_state_change_timers_running6 VNET(state_change_timers_running6) #define V_current_state_timers_running6 VNET(current_state_timers_running6) SYSCTL_DECL(_net_inet6); /* Note: Not in any common header. */ SYSCTL_NODE(_net_inet6, OID_AUTO, mld, CTLFLAG_RW, 0, "IPv6 Multicast Listener Discovery"); /* * Virtualized sysctls. */ SYSCTL_PROC(_net_inet6_mld, OID_AUTO, gsrdelay, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, &VNET_NAME(mld_gsrdelay.tv_sec), 0, sysctl_mld_gsr, "I", "Rate limit for MLDv2 Group-and-Source queries in seconds"); /* * Non-virtualized sysctls. */ static SYSCTL_NODE(_net_inet6_mld, OID_AUTO, ifinfo, CTLFLAG_RD | CTLFLAG_MPSAFE, sysctl_mld_ifinfo, "Per-interface MLDv2 state"); static int mld_v1enable = 1; SYSCTL_INT(_net_inet6_mld, OID_AUTO, v1enable, CTLFLAG_RWTUN, &mld_v1enable, 0, "Enable fallback to MLDv1"); static int mld_use_allow = 1; SYSCTL_INT(_net_inet6_mld, OID_AUTO, use_allow, CTLFLAG_RWTUN, &mld_use_allow, 0, "Use ALLOW/BLOCK for RFC 4604 SSM joins/leaves"); /* * Packed Router Alert option structure declaration. */ struct mld_raopt { struct ip6_hbh hbh; struct ip6_opt pad; struct ip6_opt_router ra; } __packed; /* * Router Alert hop-by-hop option header. 
*/ static struct mld_raopt mld_ra = { .hbh = { 0, 0 }, .pad = { .ip6o_type = IP6OPT_PADN, 0 }, .ra = { .ip6or_type = IP6OPT_ROUTER_ALERT, .ip6or_len = IP6OPT_RTALERT_LEN - 2, .ip6or_value[0] = ((IP6OPT_RTALERT_MLD >> 8) & 0xFF), .ip6or_value[1] = (IP6OPT_RTALERT_MLD & 0xFF) } }; static struct ip6_pktopts mld_po; static __inline void mld_save_context(struct mbuf *m, struct ifnet *ifp) { #ifdef VIMAGE m->m_pkthdr.PH_loc.ptr = ifp->if_vnet; #endif /* VIMAGE */ m->m_pkthdr.flowid = ifp->if_index; } static __inline void mld_scrub_context(struct mbuf *m) { m->m_pkthdr.PH_loc.ptr = NULL; m->m_pkthdr.flowid = 0; } /* * Restore context from a queued output chain. * Return saved ifindex. * * VIMAGE: The assertion is there to make sure that we * actually called CURVNET_SET() with what's in the mbuf chain. */ static __inline uint32_t mld_restore_context(struct mbuf *m) { #if defined(VIMAGE) && defined(INVARIANTS) KASSERT(curvnet == m->m_pkthdr.PH_loc.ptr, ("%s: called when curvnet was not restored", __func__)); #endif return (m->m_pkthdr.flowid); } /* * Retrieve or set threshold between group-source queries in seconds. * * VIMAGE: Assume curvnet set by caller. * SMPng: NOTE: Serialized by MLD lock. */ static int sysctl_mld_gsr(SYSCTL_HANDLER_ARGS) { int error; int i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error) return (error); MLD_LOCK(); i = V_mld_gsrdelay.tv_sec; error = sysctl_handle_int(oidp, &i, 0, req); if (error || !req->newptr) goto out_locked; if (i < -1 || i >= 60) { error = EINVAL; goto out_locked; } CTR2(KTR_MLD, "change mld_gsrdelay from %d to %d", V_mld_gsrdelay.tv_sec, i); V_mld_gsrdelay.tv_sec = i; out_locked: MLD_UNLOCK(); return (error); } /* - * Expose struct mld_ifinfo to userland, keyed by ifindex. + * Expose struct mld_ifsoftc to userland, keyed by ifindex. * For use by ifmcstat(8). * * SMPng: NOTE: Does an unlocked ifindex space read. * VIMAGE: Assume curvnet set by caller. The node handler itself * is not directly virtualized. 
*/ static int sysctl_mld_ifinfo(SYSCTL_HANDLER_ARGS) { int *name; int error; u_int namelen; struct ifnet *ifp; - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; name = (int *)arg1; namelen = arg2; if (req->newptr != NULL) return (EPERM); if (namelen != 1) return (EINVAL); error = sysctl_wire_old_buffer(req, sizeof(struct mld_ifinfo)); if (error) return (error); IN6_MULTI_LOCK(); MLD_LOCK(); if (name[0] <= 0 || name[0] > V_if_index) { error = ENOENT; goto out_locked; } error = ENOENT; ifp = ifnet_byindex(name[0]); if (ifp == NULL) goto out_locked; LIST_FOREACH(mli, &V_mli_head, mli_link) { if (ifp == mli->mli_ifp) { - error = SYSCTL_OUT(req, mli, - sizeof(struct mld_ifinfo)); + struct mld_ifinfo info; + + info.mli_version = mli->mli_version; + info.mli_v1_timer = mli->mli_v1_timer; + info.mli_v2_timer = mli->mli_v2_timer; + info.mli_flags = mli->mli_flags; + info.mli_rv = mli->mli_rv; + info.mli_qi = mli->mli_qi; + info.mli_qri = mli->mli_qri; + info.mli_uri = mli->mli_uri; + error = SYSCTL_OUT(req, &info, sizeof(info)); break; } } out_locked: MLD_UNLOCK(); IN6_MULTI_UNLOCK(); return (error); } /* * Dispatch an entire queue of pending packet chains. * VIMAGE: Assumes the vnet pointer has been set. */ static void mld_dispatch_queue(struct mbufq *mq, int limit) { struct mbuf *m; while ((m = mbufq_dequeue(mq)) != NULL) { CTR3(KTR_MLD, "%s: dispatch %p from %p", __func__, mq, m); mld_dispatch_packet(m); if (--limit == 0) break; } } /* * Filter outgoing MLD report state by group. * * Reports are ALWAYS suppressed for ALL-HOSTS (ff02::1) * and node-local addresses. However, kernel and socket consumers * always embed the KAME scope ID in the address provided, so strip it * when performing comparison. * Note: This is not the same as the *multicast* scope. * * Return zero if the given group is one for which MLD reports * should be suppressed, or non-zero if reports should be issued. 
*/ static __inline int mld_is_addr_reported(const struct in6_addr *addr) { KASSERT(IN6_IS_ADDR_MULTICAST(addr), ("%s: not multicast", __func__)); if (IPV6_ADDR_MC_SCOPE(addr) == IPV6_ADDR_SCOPE_NODELOCAL) return (0); if (IPV6_ADDR_MC_SCOPE(addr) == IPV6_ADDR_SCOPE_LINKLOCAL) { struct in6_addr tmp = *addr; in6_clearscope(&tmp); if (IN6_ARE_ADDR_EQUAL(&tmp, &in6addr_linklocal_allnodes)) return (0); } return (1); } /* * Attach MLD when PF_INET6 is attached to an interface. * * SMPng: Normally called with IF_AFDATA_LOCK held. */ -struct mld_ifinfo * +struct mld_ifsoftc * mld_domifattach(struct ifnet *ifp) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; CTR3(KTR_MLD, "%s: called for ifp %p(%s)", __func__, ifp, if_name(ifp)); MLD_LOCK(); mli = mli_alloc_locked(ifp); if (!(ifp->if_flags & IFF_MULTICAST)) mli->mli_flags |= MLIF_SILENT; if (mld_use_allow) mli->mli_flags |= MLIF_USEALLOW; MLD_UNLOCK(); return (mli); } /* * VIMAGE: assume curvnet set by caller. */ -static struct mld_ifinfo * +static struct mld_ifsoftc * mli_alloc_locked(/*const*/ struct ifnet *ifp) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; MLD_LOCK_ASSERT(); - mli = malloc(sizeof(struct mld_ifinfo), M_MLD, M_NOWAIT|M_ZERO); + mli = malloc(sizeof(struct mld_ifsoftc), M_MLD, M_NOWAIT|M_ZERO); if (mli == NULL) goto out; mli->mli_ifp = ifp; mli->mli_version = MLD_VERSION_2; mli->mli_flags = 0; mli->mli_rv = MLD_RV_INIT; mli->mli_qi = MLD_QI_INIT; mli->mli_qri = MLD_QRI_INIT; mli->mli_uri = MLD_URI_INIT; SLIST_INIT(&mli->mli_relinmhead); mbufq_init(&mli->mli_gq, MLD_MAX_RESPONSE_PACKETS); LIST_INSERT_HEAD(&V_mli_head, mli, mli_link); - CTR2(KTR_MLD, "allocate mld_ifinfo for ifp %p(%s)", + CTR2(KTR_MLD, "allocate mld_ifsoftc for ifp %p(%s)", ifp, if_name(ifp)); out: return (mli); } /* * Hook for ifdetach. * * NOTE: Some finalization tasks need to run before the protocol domain * is detached, but also before the link layer does its cleanup. 
* Run before link-layer cleanup; cleanup groups, but do not free MLD state. * * SMPng: Caller must hold IN6_MULTI_LOCK(). * Must take IF_ADDR_LOCK() to cover if_multiaddrs iterator. * XXX This routine is also bitten by unlocked ifma_protospec access. */ void mld_ifdetach(struct ifnet *ifp) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; struct ifmultiaddr *ifma; struct in6_multi *inm, *tinm; CTR3(KTR_MLD, "%s: called for ifp %p(%s)", __func__, ifp, if_name(ifp)); IN6_MULTI_LOCK_ASSERT(); MLD_LOCK(); mli = MLD_IFINFO(ifp); if (mli->mli_version == MLD_VERSION_2) { IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET6 || ifma->ifma_protospec == NULL) continue; inm = (struct in6_multi *)ifma->ifma_protospec; if (inm->in6m_state == MLD_LEAVING_MEMBER) { SLIST_INSERT_HEAD(&mli->mli_relinmhead, inm, in6m_nrele); } in6m_clear_recorded(inm); } IF_ADDR_RUNLOCK(ifp); SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, in6m_nrele, tinm) { SLIST_REMOVE_HEAD(&mli->mli_relinmhead, in6m_nrele); in6m_release_locked(inm); } } MLD_UNLOCK(); } /* * Hook for domifdetach. * Runs after link-layer cleanup; free MLD state. * * SMPng: Normally called with IF_AFDATA_LOCK held. */ void mld_domifdetach(struct ifnet *ifp) { CTR3(KTR_MLD, "%s: called for ifp %p(%s)", __func__, ifp, if_name(ifp)); MLD_LOCK(); mli_delete_locked(ifp); MLD_UNLOCK(); } static void mli_delete_locked(const struct ifnet *ifp) { - struct mld_ifinfo *mli, *tmli; + struct mld_ifsoftc *mli, *tmli; - CTR3(KTR_MLD, "%s: freeing mld_ifinfo for ifp %p(%s)", + CTR3(KTR_MLD, "%s: freeing mld_ifsoftc for ifp %p(%s)", __func__, ifp, if_name(ifp)); MLD_LOCK_ASSERT(); LIST_FOREACH_SAFE(mli, &V_mli_head, mli_link, tmli) { if (mli->mli_ifp == ifp) { /* * Free deferred General Query responses. 
*/ mbufq_drain(&mli->mli_gq); LIST_REMOVE(mli, mli_link); KASSERT(SLIST_EMPTY(&mli->mli_relinmhead), ("%s: there are dangling in_multi references", __func__)); free(mli, M_MLD); return; } } #ifdef INVARIANTS - panic("%s: mld_ifinfo not found for ifp %p\n", __func__, ifp); + panic("%s: mld_ifsoftc not found for ifp %p\n", __func__, ifp); #endif } /* * Process a received MLDv1 general or address-specific query. * Assumes that the query header has been pulled up to sizeof(mld_hdr). * * NOTE: Can't be fully const correct as we temporarily embed scope ID in * mld_addr. This is OK as we own the mbuf chain. */ static int mld_v1_input_query(struct ifnet *ifp, const struct ip6_hdr *ip6, /*const*/ struct mld_hdr *mld) { struct ifmultiaddr *ifma; - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; struct in6_multi *inm; int is_general_query; uint16_t timer; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif is_general_query = 0; if (!mld_v1enable) { CTR3(KTR_MLD, "ignore v1 query %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, if_name(ifp)); return (0); } /* * RFC3810 Section 6.2: MLD queries must originate from * a router's link-local address. */ if (!IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_src)) { CTR3(KTR_MLD, "ignore v1 query src %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &ip6->ip6_src), ifp, if_name(ifp)); return (0); } /* * Do address field validation upfront before we accept * the query. */ if (IN6_IS_ADDR_UNSPECIFIED(&mld->mld_addr)) { /* * MLDv1 General Query. * If this was not sent to the all-nodes group, ignore it. */ struct in6_addr dst; dst = ip6->ip6_dst; in6_clearscope(&dst); if (!IN6_ARE_ADDR_EQUAL(&dst, &in6addr_linklocal_allnodes)) return (EINVAL); is_general_query = 1; } else { /* * Embed scope ID of receiving interface in MLD query for * lookup whilst we don't hold other locks. */ in6_setscope(&mld->mld_addr, ifp, NULL); } IN6_MULTI_LOCK(); MLD_LOCK(); /* * Switch to MLDv1 host compatibility mode. 
*/ mli = MLD_IFINFO(ifp); - KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + KASSERT(mli != NULL, ("%s: no mld_ifsoftc for ifp %p", __func__, ifp)); mld_set_version(mli, MLD_VERSION_1); timer = (ntohs(mld->mld_maxdelay) * PR_FASTHZ) / MLD_TIMER_SCALE; if (timer == 0) timer = 1; IF_ADDR_RLOCK(ifp); if (is_general_query) { /* * For each reporting group joined on this * interface, kick the report timer. */ CTR2(KTR_MLD, "process v1 general query on ifp %p(%s)", ifp, if_name(ifp)); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET6 || ifma->ifma_protospec == NULL) continue; inm = (struct in6_multi *)ifma->ifma_protospec; mld_v1_update_group(inm, timer); } } else { /* * MLDv1 Group-Specific Query. * If this is a group-specific MLDv1 query, we need only * look up the single group to process it. */ inm = in6m_lookup_locked(ifp, &mld->mld_addr); if (inm != NULL) { CTR3(KTR_MLD, "process v1 query %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, if_name(ifp)); mld_v1_update_group(inm, timer); } /* XXX Clear embedded scope ID as userland won't expect it. */ in6_clearscope(&mld->mld_addr); } IF_ADDR_RUNLOCK(ifp); MLD_UNLOCK(); IN6_MULTI_UNLOCK(); return (0); } /* * Update the report timer on a group in response to an MLDv1 query. * * If we are becoming the reporting member for this group, start the timer. * If we already are the reporting member for this group, and timer is * below the threshold, reset it. * * We may be updating the group for the first time since we switched * to MLDv2. If we are, then we must clear any recorded source lists, * and transition to REPORTING state; the group timer is overloaded * for group and group-source query responses. * * Unlike MLDv2, the delay per group should be jittered * to avoid bursts of MLDv1 reports. 
*/ static void mld_v1_update_group(struct in6_multi *inm, const int timer) { #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif CTR4(KTR_MLD, "%s: %s/%s timer=%d", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp), timer); IN6_MULTI_LOCK_ASSERT(); switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: break; case MLD_REPORTING_MEMBER: if (inm->in6m_timer != 0 && inm->in6m_timer <= timer) { CTR1(KTR_MLD, "%s: REPORTING and timer running, " "skipping.", __func__); break; } /* FALLTHROUGH */ case MLD_SG_QUERY_PENDING_MEMBER: case MLD_G_QUERY_PENDING_MEMBER: case MLD_IDLE_MEMBER: case MLD_LAZY_MEMBER: case MLD_AWAKENING_MEMBER: CTR1(KTR_MLD, "%s: ->REPORTING", __func__); inm->in6m_state = MLD_REPORTING_MEMBER; inm->in6m_timer = MLD_RANDOM_DELAY(timer); V_current_state_timers_running6 = 1; break; case MLD_SLEEPING_MEMBER: CTR1(KTR_MLD, "%s: ->AWAKENING", __func__); inm->in6m_state = MLD_AWAKENING_MEMBER; break; case MLD_LEAVING_MEMBER: break; } } /* * Process a received MLDv2 general, group-specific or * group-and-source-specific query. * * Assumes that the query header has been pulled up to sizeof(mldv2_query). * * Return 0 if successful, otherwise an appropriate error code is returned. */ static int mld_v2_input_query(struct ifnet *ifp, const struct ip6_hdr *ip6, struct mbuf *m, const int off, const int icmp6len) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; struct mldv2_query *mld; struct in6_multi *inm; uint32_t maxdelay, nsrc, qqi; int is_general_query; uint16_t timer; uint8_t qrv; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif is_general_query = 0; /* * RFC3810 Section 6.2: MLD queries must originate from * a router's link-local address. 
*/ if (!IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_src)) { CTR3(KTR_MLD, "ignore v1 query src %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &ip6->ip6_src), ifp, if_name(ifp)); return (0); } CTR2(KTR_MLD, "input v2 query on ifp %p(%s)", ifp, if_name(ifp)); mld = (struct mldv2_query *)(mtod(m, uint8_t *) + off); maxdelay = ntohs(mld->mld_maxdelay); /* in 1/10ths of a second */ if (maxdelay >= 32768) { maxdelay = (MLD_MRC_MANT(maxdelay) | 0x1000) << (MLD_MRC_EXP(maxdelay) + 3); } timer = (maxdelay * PR_FASTHZ) / MLD_TIMER_SCALE; if (timer == 0) timer = 1; qrv = MLD_QRV(mld->mld_misc); if (qrv < 2) { CTR3(KTR_MLD, "%s: clamping qrv %d to %d", __func__, qrv, MLD_RV_INIT); qrv = MLD_RV_INIT; } qqi = mld->mld_qqi; if (qqi >= 128) { qqi = MLD_QQIC_MANT(mld->mld_qqi) << (MLD_QQIC_EXP(mld->mld_qqi) + 3); } nsrc = ntohs(mld->mld_numsrc); if (nsrc > MLD_MAX_GS_SOURCES) return (EMSGSIZE); if (icmp6len < sizeof(struct mldv2_query) + (nsrc * sizeof(struct in6_addr))) return (EMSGSIZE); /* * Do further input validation upfront to avoid resetting timers * should we need to discard this query. */ if (IN6_IS_ADDR_UNSPECIFIED(&mld->mld_addr)) { /* * A general query with a source list has undefined * behaviour; discard it. */ if (nsrc > 0) return (EINVAL); is_general_query = 1; } else { /* * Embed scope ID of receiving interface in MLD query for * lookup whilst we don't hold other locks (due to KAME * locking lameness). We own this mbuf chain just now. */ in6_setscope(&mld->mld_addr, ifp, NULL); } IN6_MULTI_LOCK(); MLD_LOCK(); mli = MLD_IFINFO(ifp); - KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + KASSERT(mli != NULL, ("%s: no mld_ifsoftc for ifp %p", __func__, ifp)); /* * Discard the v2 query if we're in Compatibility Mode. * The RFC is pretty clear that hosts need to stay in MLDv1 mode * until the Old Version Querier Present timer expires. 
*/ if (mli->mli_version != MLD_VERSION_2) goto out_locked; mld_set_version(mli, MLD_VERSION_2); mli->mli_rv = qrv; mli->mli_qi = qqi; mli->mli_qri = maxdelay; CTR4(KTR_MLD, "%s: qrv %d qi %d maxdelay %d", __func__, qrv, qqi, maxdelay); if (is_general_query) { /* * MLDv2 General Query. * * Schedule a current-state report on this ifp for * all groups, possibly containing source lists. * * If there is a pending General Query response * scheduled earlier than the selected delay, do * not schedule any other reports. * Otherwise, reset the interface timer. */ CTR2(KTR_MLD, "process v2 general query on ifp %p(%s)", ifp, if_name(ifp)); if (mli->mli_v2_timer == 0 || mli->mli_v2_timer >= timer) { mli->mli_v2_timer = MLD_RANDOM_DELAY(timer); V_interface_timers_running6 = 1; } } else { /* * MLDv2 Group-specific or Group-and-source-specific Query. * * Group-source-specific queries are throttled on * a per-group basis to defeat denial-of-service attempts. * Queries for groups we are not a member of on this * link are simply ignored. */ IF_ADDR_RLOCK(ifp); inm = in6m_lookup_locked(ifp, &mld->mld_addr); if (inm == NULL) { IF_ADDR_RUNLOCK(ifp); goto out_locked; } if (nsrc > 0) { if (!ratecheck(&inm->in6m_lastgsrtv, &V_mld_gsrdelay)) { CTR1(KTR_MLD, "%s: GS query throttled.", __func__); IF_ADDR_RUNLOCK(ifp); goto out_locked; } } CTR2(KTR_MLD, "process v2 group query on ifp %p(%s)", ifp, if_name(ifp)); /* * If there is a pending General Query response * scheduled sooner than the selected delay, no * further report need be scheduled. * Otherwise, prepare to respond to the * group-specific or group-and-source query. */ if (mli->mli_v2_timer == 0 || mli->mli_v2_timer >= timer) mld_v2_process_group_query(inm, mli, timer, m, off); /* XXX Clear embedded scope ID as userland won't expect it. */ in6_clearscope(&mld->mld_addr); IF_ADDR_RUNLOCK(ifp); } out_locked: MLD_UNLOCK(); IN6_MULTI_UNLOCK(); return (0); } /* * Process a received MLDv2 group-specific or group-and-source-specific * query.
* Return <0 if any error occurred. Currently this is ignored. */ static int -mld_v2_process_group_query(struct in6_multi *inm, struct mld_ifinfo *mli, +mld_v2_process_group_query(struct in6_multi *inm, struct mld_ifsoftc *mli, int timer, struct mbuf *m0, const int off) { struct mldv2_query *mld; int retval; uint16_t nsrc; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); retval = 0; mld = (struct mldv2_query *)(mtod(m0, uint8_t *) + off); switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_LAZY_MEMBER: case MLD_AWAKENING_MEMBER: case MLD_IDLE_MEMBER: case MLD_LEAVING_MEMBER: return (retval); break; case MLD_REPORTING_MEMBER: case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: break; } nsrc = ntohs(mld->mld_numsrc); /* * Deal with group-specific queries upfront. * If any group query is already pending, purge any recorded * source-list state if it exists, and schedule a query response * for this group-specific query. */ if (nsrc == 0) { if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER || inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER) { in6m_clear_recorded(inm); timer = min(inm->in6m_timer, timer); } inm->in6m_state = MLD_G_QUERY_PENDING_MEMBER; inm->in6m_timer = MLD_RANDOM_DELAY(timer); V_current_state_timers_running6 = 1; return (retval); } /* * Deal with the case where a group-and-source-specific query has * been received but a group-specific query is already pending. */ if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER) { timer = min(inm->in6m_timer, timer); inm->in6m_timer = MLD_RANDOM_DELAY(timer); V_current_state_timers_running6 = 1; return (retval); } /* * Finally, deal with the case where a group-and-source-specific * query has been received, where a response to a previous g-s-r * query exists, or none exists. * In this case, we need to parse the source-list which the Querier * has provided us with and check if we have any source list filter * entries at T1 for these sources.
If we do not, there is no need * to schedule a report and the query may be dropped. * If we do, we must record them and schedule a current-state * report for those sources. */ if (inm->in6m_nsrc > 0) { struct mbuf *m; uint8_t *sp; int i, nrecorded; int soff; m = m0; soff = off + sizeof(struct mldv2_query); nrecorded = 0; for (i = 0; i < nsrc; i++) { sp = mtod(m, uint8_t *) + soff; retval = in6m_record_source(inm, (const struct in6_addr *)sp); if (retval < 0) break; nrecorded += retval; soff += sizeof(struct in6_addr); if (soff >= m->m_len) { soff = soff - m->m_len; m = m->m_next; if (m == NULL) break; } } if (nrecorded > 0) { CTR1(KTR_MLD, "%s: schedule response to SG query", __func__); inm->in6m_state = MLD_SG_QUERY_PENDING_MEMBER; inm->in6m_timer = MLD_RANDOM_DELAY(timer); V_current_state_timers_running6 = 1; } } return (retval); } /* * Process a received MLDv1 host membership report. * Assumes mld points to mld_hdr in pulled up mbuf chain. * * NOTE: Can't be fully const correct as we temporarily embed scope ID in * mld_addr. This is OK as we own the mbuf chain. */ static int mld_v1_input_report(struct ifnet *ifp, const struct ip6_hdr *ip6, /*const*/ struct mld_hdr *mld) { struct in6_addr src, dst; struct in6_ifaddr *ia; struct in6_multi *inm; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif if (!mld_v1enable) { CTR3(KTR_MLD, "ignore v1 report %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, if_name(ifp)); return (0); } if (ifp->if_flags & IFF_LOOPBACK) return (0); /* * MLDv1 reports must originate from a host's link-local address, * or the unspecified address (when booting). */ src = ip6->ip6_src; in6_clearscope(&src); if (!IN6_IS_SCOPE_LINKLOCAL(&src) && !IN6_IS_ADDR_UNSPECIFIED(&src)) { CTR3(KTR_MLD, "ignore v1 query src %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &ip6->ip6_src), ifp, if_name(ifp)); return (EINVAL); } /* * RFC2710 Section 4: MLDv1 reports must pertain to a multicast * group, and must be directed to the group itself.
*/ dst = ip6->ip6_dst; in6_clearscope(&dst); if (!IN6_IS_ADDR_MULTICAST(&mld->mld_addr) || !IN6_ARE_ADDR_EQUAL(&mld->mld_addr, &dst)) { CTR3(KTR_MLD, "ignore v1 query dst %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &ip6->ip6_dst), ifp, if_name(ifp)); return (EINVAL); } /* * Make sure we don't hear our own membership report, as fast * leave requires knowing that we are the only member of a * group. Assume we used the link-local address if available, * otherwise look for ::. * * XXX Note that scope ID comparison is needed for the address * returned by in6ifa_ifpforlinklocal(), but SHOULD NOT be * performed for the on-wire address. */ ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST); if ((ia && IN6_ARE_ADDR_EQUAL(&ip6->ip6_src, IA6_IN6(ia))) || (ia == NULL && IN6_IS_ADDR_UNSPECIFIED(&src))) { if (ia != NULL) ifa_free(&ia->ia_ifa); return (0); } if (ia != NULL) ifa_free(&ia->ia_ifa); CTR3(KTR_MLD, "process v1 report %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, if_name(ifp)); /* * Embed scope ID of receiving interface in MLD query for lookup * whilst we don't hold other locks (due to KAME locking lameness). */ if (!IN6_IS_ADDR_UNSPECIFIED(&mld->mld_addr)) in6_setscope(&mld->mld_addr, ifp, NULL); IN6_MULTI_LOCK(); MLD_LOCK(); IF_ADDR_RLOCK(ifp); /* * MLDv1 report suppression. * If we are a member of this group, and our membership should be * reported, and our group timer is pending or about to be reset, * stop our group timer by transitioning to the 'lazy' state. */ inm = in6m_lookup_locked(ifp, &mld->mld_addr); if (inm != NULL) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; mli = inm->in6m_mli; KASSERT(mli != NULL, ("%s: no mli for ifp %p", __func__, ifp)); /* * If we are in MLDv2 host mode, do not allow the * other host's MLDv1 report to suppress our reports. 
*/ if (mli->mli_version == MLD_VERSION_2) goto out_locked; inm->in6m_timer = 0; switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_SLEEPING_MEMBER: break; case MLD_REPORTING_MEMBER: case MLD_IDLE_MEMBER: case MLD_AWAKENING_MEMBER: CTR3(KTR_MLD, "report suppressed for %s on ifp %p(%s)", ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, if_name(ifp)); case MLD_LAZY_MEMBER: inm->in6m_state = MLD_LAZY_MEMBER; break; case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: case MLD_LEAVING_MEMBER: break; } } out_locked: IF_ADDR_RUNLOCK(ifp); MLD_UNLOCK(); IN6_MULTI_UNLOCK(); /* XXX Clear embedded scope ID as userland won't expect it. */ in6_clearscope(&mld->mld_addr); return (0); } /* * MLD input path. * * Assume query messages which fit in a single ICMPv6 message header * have been pulled up. * Assume that userland will want to see the message, even if it * otherwise fails kernel input validation; do not free it. * Pullup may however free the mbuf chain m if it fails. * * Return IPPROTO_DONE if we freed m. Otherwise, return 0. */ int mld_input(struct mbuf *m, int off, int icmp6len) { struct ifnet *ifp; struct ip6_hdr *ip6; struct mld_hdr *mld; int mldlen; CTR3(KTR_MLD, "%s: called w/mbuf (%p,%d)", __func__, m, off); ifp = m->m_pkthdr.rcvif; ip6 = mtod(m, struct ip6_hdr *); /* Pullup to appropriate size. */ mld = (struct mld_hdr *)(mtod(m, uint8_t *) + off); if (mld->mld_type == MLD_LISTENER_QUERY && icmp6len >= sizeof(struct mldv2_query)) { mldlen = sizeof(struct mldv2_query); } else { mldlen = sizeof(struct mld_hdr); } IP6_EXTHDR_GET(mld, struct mld_hdr *, m, off, mldlen); if (mld == NULL) { ICMP6STAT_INC(icp6s_badlen); return (IPPROTO_DONE); } /* * Userland needs to see all of this traffic for implementing * the endpoint discovery portion of multicast routing. 
*/ switch (mld->mld_type) { case MLD_LISTENER_QUERY: icmp6_ifstat_inc(ifp, ifs6_in_mldquery); if (icmp6len == sizeof(struct mld_hdr)) { if (mld_v1_input_query(ifp, ip6, mld) != 0) return (0); } else if (icmp6len >= sizeof(struct mldv2_query)) { if (mld_v2_input_query(ifp, ip6, m, off, icmp6len) != 0) return (0); } break; case MLD_LISTENER_REPORT: icmp6_ifstat_inc(ifp, ifs6_in_mldreport); if (mld_v1_input_report(ifp, ip6, mld) != 0) return (0); break; case MLDV2_LISTENER_REPORT: icmp6_ifstat_inc(ifp, ifs6_in_mldreport); break; case MLD_LISTENER_DONE: icmp6_ifstat_inc(ifp, ifs6_in_mlddone); break; default: break; } return (0); } /* * Fast timeout handler (global). * VIMAGE: Timeout handlers are expected to service all vimages. */ void mld_fasttimo(void) { VNET_ITERATOR_DECL(vnet_iter); VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); mld_fasttimo_vnet(); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); } /* * Fast timeout handler (per-vnet). * * VIMAGE: Assume caller has set up our curvnet. */ static void mld_fasttimo_vnet(void) { struct mbufq scq; /* State-change packets */ struct mbufq qrq; /* Query response packets */ struct ifnet *ifp; - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; struct ifmultiaddr *ifma; struct in6_multi *inm, *tinm; int uri_fasthz; uri_fasthz = 0; /* * Quick check to see if any work needs to be done, in order to * minimize the overhead of fasttimo processing. * SMPng: XXX Unlocked reads. */ if (!V_current_state_timers_running6 && !V_interface_timers_running6 && !V_state_change_timers_running6) return; IN6_MULTI_LOCK(); MLD_LOCK(); /* * MLDv2 General Query response timer processing. */ if (V_interface_timers_running6) { CTR1(KTR_MLD, "%s: interface timers running", __func__); V_interface_timers_running6 = 0; LIST_FOREACH(mli, &V_mli_head, mli_link) { if (mli->mli_v2_timer == 0) { /* Do nothing. 
*/ } else if (--mli->mli_v2_timer == 0) { mld_v2_dispatch_general_query(mli); } else { V_interface_timers_running6 = 1; } } } if (!V_current_state_timers_running6 && !V_state_change_timers_running6) goto out_locked; V_current_state_timers_running6 = 0; V_state_change_timers_running6 = 0; CTR1(KTR_MLD, "%s: state change timers running", __func__); /* * MLD host report and state-change timer processing. * Note: Processing a v2 group timer may remove a node. */ LIST_FOREACH(mli, &V_mli_head, mli_link) { ifp = mli->mli_ifp; if (mli->mli_version == MLD_VERSION_2) { uri_fasthz = MLD_RANDOM_DELAY(mli->mli_uri * PR_FASTHZ); mbufq_init(&qrq, MLD_MAX_G_GS_PACKETS); mbufq_init(&scq, MLD_MAX_STATE_CHANGE_PACKETS); } IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET6 || ifma->ifma_protospec == NULL) continue; inm = (struct in6_multi *)ifma->ifma_protospec; switch (mli->mli_version) { case MLD_VERSION_1: mld_v1_process_group_timer(mli, inm); break; case MLD_VERSION_2: mld_v2_process_group_timers(mli, &qrq, &scq, inm, uri_fasthz); break; } } IF_ADDR_RUNLOCK(ifp); switch (mli->mli_version) { case MLD_VERSION_1: /* * Transmit reports for this lifecycle. This * is done while not holding IF_ADDR_LOCK * since this can call * in6ifa_ifpforlinklocal() which locks * IF_ADDR_LOCK internally as well as * ip6_output() to transmit a packet. */ SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, in6m_nrele, tinm) { SLIST_REMOVE_HEAD(&mli->mli_relinmhead, in6m_nrele); (void)mld_v1_transmit_report(inm, MLD_LISTENER_REPORT); } break; case MLD_VERSION_2: mld_dispatch_queue(&qrq, 0); mld_dispatch_queue(&scq, 0); /* * Free the in_multi reference(s) for * this lifecycle. */ SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, in6m_nrele, tinm) { SLIST_REMOVE_HEAD(&mli->mli_relinmhead, in6m_nrele); in6m_release_locked(inm); } break; } } out_locked: MLD_UNLOCK(); IN6_MULTI_UNLOCK(); } /* * Update host report group timer. 
* Will update the global pending timer flags. */ static void -mld_v1_process_group_timer(struct mld_ifinfo *mli, struct in6_multi *inm) +mld_v1_process_group_timer(struct mld_ifsoftc *mli, struct in6_multi *inm) { int report_timer_expired; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); if (inm->in6m_timer == 0) { report_timer_expired = 0; } else if (--inm->in6m_timer == 0) { report_timer_expired = 1; } else { V_current_state_timers_running6 = 1; return; } switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_IDLE_MEMBER: case MLD_LAZY_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_AWAKENING_MEMBER: break; case MLD_REPORTING_MEMBER: if (report_timer_expired) { inm->in6m_state = MLD_IDLE_MEMBER; SLIST_INSERT_HEAD(&mli->mli_relinmhead, inm, in6m_nrele); } break; case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: case MLD_LEAVING_MEMBER: break; } } /* * Update a group's timers for MLDv2. * Will update the global pending timer flags. * Note: Unlocked read from mli. */ static void -mld_v2_process_group_timers(struct mld_ifinfo *mli, +mld_v2_process_group_timers(struct mld_ifsoftc *mli, struct mbufq *qrq, struct mbufq *scq, struct in6_multi *inm, const int uri_fasthz) { int query_response_timer_expired; int state_change_retransmit_timer_expired; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); query_response_timer_expired = 0; state_change_retransmit_timer_expired = 0; /* * During a transition from compatibility mode back to MLDv2, * a group record in REPORTING state may still have its group * timer active. This is a no-op in this function; it is easier * to deal with it here than to complicate the slow-timeout path. 
*/ if (inm->in6m_timer == 0) { query_response_timer_expired = 0; } else if (--inm->in6m_timer == 0) { query_response_timer_expired = 1; } else { V_current_state_timers_running6 = 1; } if (inm->in6m_sctimer == 0) { state_change_retransmit_timer_expired = 0; } else if (--inm->in6m_sctimer == 0) { state_change_retransmit_timer_expired = 1; } else { V_state_change_timers_running6 = 1; } /* We are in fasttimo, so be quick about it. */ if (!state_change_retransmit_timer_expired && !query_response_timer_expired) return; switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_LAZY_MEMBER: case MLD_AWAKENING_MEMBER: case MLD_IDLE_MEMBER: break; case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: /* * Respond to a previously pending Group-Specific * or Group-and-Source-Specific query by enqueueing * the appropriate Current-State report for * immediate transmission. */ if (query_response_timer_expired) { int retval; retval = mld_v2_enqueue_group_record(qrq, inm, 0, 1, (inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER), 0); CTR2(KTR_MLD, "%s: enqueue record = %d", __func__, retval); inm->in6m_state = MLD_REPORTING_MEMBER; in6m_clear_recorded(inm); } /* FALLTHROUGH */ case MLD_REPORTING_MEMBER: case MLD_LEAVING_MEMBER: if (state_change_retransmit_timer_expired) { /* * State-change retransmission timer fired. * If there are any further pending retransmissions, * set the global pending state-change flag, and * reset the timer. */ if (--inm->in6m_scrv > 0) { inm->in6m_sctimer = uri_fasthz; V_state_change_timers_running6 = 1; } /* * Retransmit the previously computed state-change * report. If there are no further pending * retransmissions, the mbuf queue will be consumed. * Update T0 state to T1 as we have now sent * a state-change. 
*/ (void)mld_v2_merge_state_changes(inm, scq); in6m_commit(inm); CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); /* * If we are leaving the group for good, make sure * we release MLD's reference to it. * This release must be deferred using a SLIST, * as we are called from a loop which traverses * the in_ifmultiaddr TAILQ. */ if (inm->in6m_state == MLD_LEAVING_MEMBER && inm->in6m_scrv == 0) { inm->in6m_state = MLD_NOT_MEMBER; SLIST_INSERT_HEAD(&mli->mli_relinmhead, inm, in6m_nrele); } } break; } } /* * Switch to a different version on the given interface, * as per Section 9.12. */ static void -mld_set_version(struct mld_ifinfo *mli, const int version) +mld_set_version(struct mld_ifsoftc *mli, const int version) { int old_version_timer; MLD_LOCK_ASSERT(); CTR4(KTR_MLD, "%s: switching to v%d on ifp %p(%s)", __func__, version, mli->mli_ifp, if_name(mli->mli_ifp)); if (version == MLD_VERSION_1) { /* * Compute the "Older Version Querier Present" timer as per * Section 9.12. */ old_version_timer = (mli->mli_rv * mli->mli_qi) + mli->mli_qri; old_version_timer *= PR_SLOWHZ; mli->mli_v1_timer = old_version_timer; } if (mli->mli_v1_timer > 0 && mli->mli_version != MLD_VERSION_1) { mli->mli_version = MLD_VERSION_1; mld_v2_cancel_link_timers(mli); } } /* * Cancel pending MLDv2 timers for the given link and all groups * joined on it; state-change, general-query, and group-query timers. */ static void -mld_v2_cancel_link_timers(struct mld_ifinfo *mli) +mld_v2_cancel_link_timers(struct mld_ifsoftc *mli) { struct ifmultiaddr *ifma; struct ifnet *ifp; struct in6_multi *inm, *tinm; CTR3(KTR_MLD, "%s: cancel v2 timers on ifp %p(%s)", __func__, mli->mli_ifp, if_name(mli->mli_ifp)); IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); /* * Fast-track this potentially expensive operation * by checking all the global 'timer pending' flags. 
*/ if (!V_interface_timers_running6 && !V_state_change_timers_running6 && !V_current_state_timers_running6) return; mli->mli_v2_timer = 0; ifp = mli->mli_ifp; IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET6) continue; inm = (struct in6_multi *)ifma->ifma_protospec; switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_IDLE_MEMBER: case MLD_LAZY_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_AWAKENING_MEMBER: break; case MLD_LEAVING_MEMBER: /* * If we are leaving the group and switching * version, we need to release the final * reference held for issuing the INCLUDE {}. */ SLIST_INSERT_HEAD(&mli->mli_relinmhead, inm, in6m_nrele); /* FALLTHROUGH */ case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: in6m_clear_recorded(inm); /* FALLTHROUGH */ case MLD_REPORTING_MEMBER: inm->in6m_sctimer = 0; inm->in6m_timer = 0; inm->in6m_state = MLD_REPORTING_MEMBER; /* * Free any pending MLDv2 state-change records. */ mbufq_drain(&inm->in6m_scq); break; } } IF_ADDR_RUNLOCK(ifp); SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, in6m_nrele, tinm) { SLIST_REMOVE_HEAD(&mli->mli_relinmhead, in6m_nrele); in6m_release_locked(inm); } } /* * Global slowtimo handler. * VIMAGE: Timeout handlers are expected to service all vimages. */ void mld_slowtimo(void) { VNET_ITERATOR_DECL(vnet_iter); VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); mld_slowtimo_vnet(); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); } /* * Per-vnet slowtimo handler. */ static void mld_slowtimo_vnet(void) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; MLD_LOCK(); LIST_FOREACH(mli, &V_mli_head, mli_link) { mld_v1_process_querier_timers(mli); } MLD_UNLOCK(); } /* * Update the Older Version Querier Present timers for a link. * See Section 9.12 of RFC 3810. 
*/ static void -mld_v1_process_querier_timers(struct mld_ifinfo *mli) +mld_v1_process_querier_timers(struct mld_ifsoftc *mli) { MLD_LOCK_ASSERT(); if (mli->mli_version != MLD_VERSION_2 && --mli->mli_v1_timer == 0) { /* * MLDv1 Querier Present timer expired; revert to MLDv2. */ CTR5(KTR_MLD, "%s: transition from v%d -> v%d on %p(%s)", __func__, mli->mli_version, MLD_VERSION_2, mli->mli_ifp, if_name(mli->mli_ifp)); mli->mli_version = MLD_VERSION_2; } } /* * Transmit an MLDv1 report immediately. */ static int mld_v1_transmit_report(struct in6_multi *in6m, const int type) { struct ifnet *ifp; struct in6_ifaddr *ia; struct ip6_hdr *ip6; struct mbuf *mh, *md; struct mld_hdr *mld; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); ifp = in6m->in6m_ifp; ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST); /* ia may be NULL if link-local address is tentative. */ mh = m_gethdr(M_NOWAIT, MT_DATA); if (mh == NULL) { if (ia != NULL) ifa_free(&ia->ia_ifa); return (ENOMEM); } md = m_get(M_NOWAIT, MT_DATA); if (md == NULL) { m_free(mh); if (ia != NULL) ifa_free(&ia->ia_ifa); return (ENOMEM); } mh->m_next = md; /* * FUTURE: Consider increasing alignment by ETHER_HDR_LEN, so * that ether_output() does not need to allocate another mbuf * for the header in the most common case. */ M_ALIGN(mh, sizeof(struct ip6_hdr)); mh->m_pkthdr.len = sizeof(struct ip6_hdr) + sizeof(struct mld_hdr); mh->m_len = sizeof(struct ip6_hdr); ip6 = mtod(mh, struct ip6_hdr *); ip6->ip6_flow = 0; ip6->ip6_vfc &= ~IPV6_VERSION_MASK; ip6->ip6_vfc |= IPV6_VERSION; ip6->ip6_nxt = IPPROTO_ICMPV6; ip6->ip6_src = ia ? 
ia->ia_addr.sin6_addr : in6addr_any; ip6->ip6_dst = in6m->in6m_addr; md->m_len = sizeof(struct mld_hdr); mld = mtod(md, struct mld_hdr *); mld->mld_type = type; mld->mld_code = 0; mld->mld_cksum = 0; mld->mld_maxdelay = 0; mld->mld_reserved = 0; mld->mld_addr = in6m->in6m_addr; in6_clearscope(&mld->mld_addr); mld->mld_cksum = in6_cksum(mh, IPPROTO_ICMPV6, sizeof(struct ip6_hdr), sizeof(struct mld_hdr)); mld_save_context(mh, ifp); mh->m_flags |= M_MLDV1; mld_dispatch_packet(mh); if (ia != NULL) ifa_free(&ia->ia_ifa); return (0); } /* * Process a state change from the upper layer for the given IPv6 group. * * Each socket holds a reference on the in_multi in its own ip_moptions. * The socket layer will have made the necessary updates to the group * state, it is now up to MLD to issue a state change report if there * has been any change between T0 (when the last state-change was issued) * and T1 (now). * * We use the MLDv2 state machine at group level. The MLD module * however makes the decision as to which MLD protocol version to speak. * A state change *from* INCLUDE {} always means an initial join. * A state change *to* INCLUDE {} always means a final leave. * * If delay is non-zero, and the state change is an initial multicast * join, the state change report will be delayed by 'delay' ticks * in units of PR_FASTHZ if MLDv1 is active on the link; otherwise * the initial MLDv2 state change report will be delayed by whichever * is sooner, a pending state-change timer or delay itself. * * VIMAGE: curvnet should have been set by caller, as this routine * is called from the socket option handlers. */ int mld_change_state(struct in6_multi *inm, const int delay) { - struct mld_ifinfo *mli; + struct mld_ifsoftc *mli; struct ifnet *ifp; int error; IN6_MULTI_LOCK_ASSERT(); error = 0; /* * Try to detect if the upper layer just asked us to change state * for an interface which has now gone away.
*/ KASSERT(inm->in6m_ifma != NULL, ("%s: no ifma", __func__)); ifp = inm->in6m_ifma->ifma_ifp; if (ifp != NULL) { /* * Sanity check that netinet6's notion of ifp is the * same as net's. */ KASSERT(inm->in6m_ifp == ifp, ("%s: bad ifp", __func__)); } MLD_LOCK(); mli = MLD_IFINFO(ifp); - KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + KASSERT(mli != NULL, ("%s: no mld_ifsoftc for ifp %p", __func__, ifp)); /* * If we detect a state transition to or from MCAST_UNDEFINED * for this group, then we are starting or finishing an MLD * life cycle for this group. */ if (inm->in6m_st[1].iss_fmode != inm->in6m_st[0].iss_fmode) { CTR3(KTR_MLD, "%s: inm transition %d -> %d", __func__, inm->in6m_st[0].iss_fmode, inm->in6m_st[1].iss_fmode); if (inm->in6m_st[0].iss_fmode == MCAST_UNDEFINED) { CTR1(KTR_MLD, "%s: initial join", __func__); error = mld_initial_join(inm, mli, delay); goto out_locked; } else if (inm->in6m_st[1].iss_fmode == MCAST_UNDEFINED) { CTR1(KTR_MLD, "%s: final leave", __func__); mld_final_leave(inm, mli); goto out_locked; } } else { CTR1(KTR_MLD, "%s: filter set change", __func__); } error = mld_handle_state_change(inm, mli); out_locked: MLD_UNLOCK(); return (error); } /* * Perform the initial join for an MLD group. * * When joining a group: * If the group should have its MLD traffic suppressed, do nothing. * MLDv1 starts sending MLDv1 host membership reports. * MLDv2 will schedule an MLDv2 state-change report containing the * initial state of the membership. * * If the delay argument is non-zero, then we must delay sending the * initial state change for delay ticks (in units of PR_FASTHZ). 
*/ static int -mld_initial_join(struct in6_multi *inm, struct mld_ifinfo *mli, +mld_initial_join(struct in6_multi *inm, struct mld_ifsoftc *mli, const int delay) { struct ifnet *ifp; struct mbufq *mq; int error, retval, syncstates; int odelay; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif CTR4(KTR_MLD, "%s: initial join %s on ifp %p(%s)", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), inm->in6m_ifp, if_name(inm->in6m_ifp)); error = 0; syncstates = 1; ifp = inm->in6m_ifp; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); KASSERT(mli && mli->mli_ifp == ifp, ("%s: inconsistent ifp", __func__)); /* * Groups joined on loopback or marked as 'not reported', * enter the MLD_SILENT_MEMBER state and * are never reported in any protocol exchanges. * All other groups enter the appropriate state machine * for the version in use on this link. * A link marked as MLIF_SILENT causes MLD to be completely * disabled for the link. */ if ((ifp->if_flags & IFF_LOOPBACK) || (mli->mli_flags & MLIF_SILENT) || !mld_is_addr_reported(&inm->in6m_addr)) { CTR1(KTR_MLD, "%s: not kicking state machine for silent group", __func__); inm->in6m_state = MLD_SILENT_MEMBER; inm->in6m_timer = 0; } else { /* * Deal with overlapping in_multi lifecycle. * If this group was LEAVING, then make sure * we drop the reference we picked up to keep the * group around for the final INCLUDE {} enqueue. */ if (mli->mli_version == MLD_VERSION_2 && inm->in6m_state == MLD_LEAVING_MEMBER) in6m_release_locked(inm); inm->in6m_state = MLD_REPORTING_MEMBER; switch (mli->mli_version) { case MLD_VERSION_1: /* * If a delay was provided, only use it if * it is greater than the delay normally * used for an MLDv1 state change report, * and delay sending the initial MLDv1 report * by not transitioning to the IDLE state. 
*/ odelay = MLD_RANDOM_DELAY(MLD_V1_MAX_RI * PR_FASTHZ); if (delay) { inm->in6m_timer = max(delay, odelay); V_current_state_timers_running6 = 1; } else { inm->in6m_state = MLD_IDLE_MEMBER; error = mld_v1_transmit_report(inm, MLD_LISTENER_REPORT); if (error == 0) { inm->in6m_timer = odelay; V_current_state_timers_running6 = 1; } } break; case MLD_VERSION_2: /* * Defer update of T0 to T1, until the first copy * of the state change has been transmitted. */ syncstates = 0; /* * Immediately enqueue a State-Change Report for * this interface, freeing any previous reports. * Don't kick the timers if there is nothing to do, * or if an error occurred. */ mq = &inm->in6m_scq; mbufq_drain(mq); retval = mld_v2_enqueue_group_record(mq, inm, 1, 0, 0, (mli->mli_flags & MLIF_USEALLOW)); CTR2(KTR_MLD, "%s: enqueue record = %d", __func__, retval); if (retval <= 0) { error = retval * -1; break; } /* * Schedule transmission of pending state-change * report up to RV times for this link. The timer * will fire at the next mld_fasttimo (~200ms), * giving us an opportunity to merge the reports. * * If a delay was provided to this function, only * use this delay if sooner than the existing one. */ KASSERT(mli->mli_rv > 1, ("%s: invalid robustness %d", __func__, mli->mli_rv)); inm->in6m_scrv = mli->mli_rv; if (delay) { if (inm->in6m_sctimer > 1) { inm->in6m_sctimer = min(inm->in6m_sctimer, delay); } else inm->in6m_sctimer = delay; } else inm->in6m_sctimer = 1; V_state_change_timers_running6 = 1; error = 0; break; } } /* * Only update the T0 state if state change is atomic, * i.e. we don't need to wait for a timer to fire before we * can consider the state change to have been communicated. */ if (syncstates) { in6m_commit(inm); CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); } return (error); } /* * Issue an intermediate state change during the life-cycle. 
*/ static int -mld_handle_state_change(struct in6_multi *inm, struct mld_ifinfo *mli) +mld_handle_state_change(struct in6_multi *inm, struct mld_ifsoftc *mli) { struct ifnet *ifp; int retval; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif CTR4(KTR_MLD, "%s: state change for %s on ifp %p(%s)", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), inm->in6m_ifp, if_name(inm->in6m_ifp)); ifp = inm->in6m_ifp; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); KASSERT(mli && mli->mli_ifp == ifp, ("%s: inconsistent ifp", __func__)); if ((ifp->if_flags & IFF_LOOPBACK) || (mli->mli_flags & MLIF_SILENT) || !mld_is_addr_reported(&inm->in6m_addr) || (mli->mli_version != MLD_VERSION_2)) { if (!mld_is_addr_reported(&inm->in6m_addr)) { CTR1(KTR_MLD, "%s: not kicking state machine for silent group", __func__); } CTR1(KTR_MLD, "%s: nothing to do", __func__); in6m_commit(inm); CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); return (0); } mbufq_drain(&inm->in6m_scq); retval = mld_v2_enqueue_group_record(&inm->in6m_scq, inm, 1, 0, 0, (mli->mli_flags & MLIF_USEALLOW)); CTR2(KTR_MLD, "%s: enqueue record = %d", __func__, retval); if (retval <= 0) return (-retval); /* * If record(s) were enqueued, start the state-change * report timer for this group. */ inm->in6m_scrv = mli->mli_rv; inm->in6m_sctimer = 1; V_state_change_timers_running6 = 1; return (0); } /* * Perform the final leave for a multicast address. * * When leaving a group: * MLDv1 sends a DONE message, if and only if we are the reporter. * MLDv2 enqueues a state-change report containing a transition * to INCLUDE {} for immediate transmission. 
*/ static void -mld_final_leave(struct in6_multi *inm, struct mld_ifinfo *mli) +mld_final_leave(struct in6_multi *inm, struct mld_ifsoftc *mli) { int syncstates; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif syncstates = 1; CTR4(KTR_MLD, "%s: final leave %s on ifp %p(%s)", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), inm->in6m_ifp, if_name(inm->in6m_ifp)); IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: case MLD_LEAVING_MEMBER: /* Already leaving or left; do nothing. */ CTR1(KTR_MLD, "%s: not kicking state machine for silent group", __func__); break; case MLD_REPORTING_MEMBER: case MLD_IDLE_MEMBER: case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: if (mli->mli_version == MLD_VERSION_1) { #ifdef INVARIANTS if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER || inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER) panic("%s: MLDv2 state reached, not MLDv2 mode", __func__); #endif mld_v1_transmit_report(inm, MLD_LISTENER_DONE); inm->in6m_state = MLD_NOT_MEMBER; V_current_state_timers_running6 = 1; } else if (mli->mli_version == MLD_VERSION_2) { /* * Stop group timer and all pending reports. * Immediately enqueue a state-change report * TO_IN {} to be sent on the next fast timeout, * giving us an opportunity to merge reports. 
*/ mbufq_drain(&inm->in6m_scq); inm->in6m_timer = 0; inm->in6m_scrv = mli->mli_rv; CTR4(KTR_MLD, "%s: Leaving %s/%s with %d " "pending retransmissions.", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp), inm->in6m_scrv); if (inm->in6m_scrv == 0) { inm->in6m_state = MLD_NOT_MEMBER; inm->in6m_sctimer = 0; } else { int retval; in6m_acquire_locked(inm); retval = mld_v2_enqueue_group_record( &inm->in6m_scq, inm, 1, 0, 0, (mli->mli_flags & MLIF_USEALLOW)); KASSERT(retval != 0, ("%s: enqueue record = %d", __func__, retval)); inm->in6m_state = MLD_LEAVING_MEMBER; inm->in6m_sctimer = 1; V_state_change_timers_running6 = 1; syncstates = 0; } break; } break; case MLD_LAZY_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_AWAKENING_MEMBER: /* Our reports are suppressed; do nothing. */ break; } if (syncstates) { in6m_commit(inm); CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); inm->in6m_st[1].iss_fmode = MCAST_UNDEFINED; CTR3(KTR_MLD, "%s: T1 now MCAST_UNDEFINED for %p/%s", __func__, &inm->in6m_addr, if_name(inm->in6m_ifp)); } } /* * Enqueue an MLDv2 group record to the given output queue. * * If is_state_change is zero, a current-state record is appended. * If is_state_change is non-zero, a state-change report is appended. * * If is_group_query is non-zero, an mbuf packet chain is allocated. * If is_group_query is zero, and if there is a packet with free space * at the tail of the queue, it will be appended to providing there * is enough free space. * Otherwise a new mbuf packet chain is allocated. * * If is_source_query is non-zero, each source is checked to see if * it was recorded for a Group-Source query, and will be omitted if * it is not both in-mode and recorded. * * If use_block_allow is non-zero, state change reports for initial join * and final leave, on an inclusive mode group with a source list, will be * rewritten to use the ALLOW_NEW and BLOCK_OLD record types, respectively. 
* * The function will attempt to allocate leading space in the packet * for the IPv6+ICMP headers to be prepended without fragmenting the chain. * * If successful the size of all data appended to the queue is returned, * otherwise an error code less than zero is returned, or zero if * no record(s) were appended. */ static int mld_v2_enqueue_group_record(struct mbufq *mq, struct in6_multi *inm, const int is_state_change, const int is_group_query, const int is_source_query, const int use_block_allow) { struct mldv2_record mr; struct mldv2_record *pmr; struct ifnet *ifp; struct ip6_msource *ims, *nims; struct mbuf *m0, *m, *md; int error, is_filter_list_change; int minrec0len, m0srcs, msrcs, nbytes, off; int record_has_sources; int now; int type; uint8_t mode; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif IN6_MULTI_LOCK_ASSERT(); error = 0; ifp = inm->in6m_ifp; is_filter_list_change = 0; m = NULL; m0 = NULL; m0srcs = 0; msrcs = 0; nbytes = 0; nims = NULL; record_has_sources = 1; pmr = NULL; type = MLD_DO_NOTHING; mode = inm->in6m_st[1].iss_fmode; /* * If we did not transition out of ASM mode during t0->t1, * and there are no source nodes to process, we can skip * the generation of source records. */ if (inm->in6m_st[0].iss_asm > 0 && inm->in6m_st[1].iss_asm > 0 && inm->in6m_nsrc == 0) record_has_sources = 0; if (is_state_change) { /* * Queue a state change record. * If the mode did not change, and there are non-ASM * listeners or source filters present, * we potentially need to issue two records for the group. * If there are ASM listeners, and there was no filter * mode transition of any kind, do nothing. * * If we are transitioning to MCAST_UNDEFINED, we need * not send any sources. A transition to/from this state is * considered inclusive with some special treatment. * * If we are rewriting initial joins/leaves to use * ALLOW/BLOCK, and the group's membership is inclusive, * we need to send sources in all cases. 
*/ if (mode != inm->in6m_st[0].iss_fmode) { if (mode == MCAST_EXCLUDE) { CTR1(KTR_MLD, "%s: change to EXCLUDE", __func__); type = MLD_CHANGE_TO_EXCLUDE_MODE; } else { CTR1(KTR_MLD, "%s: change to INCLUDE", __func__); if (use_block_allow) { /* * XXX * Here we're interested in state * edges either direction between * MCAST_UNDEFINED and MCAST_INCLUDE. * Perhaps we should just check * the group state, rather than * the filter mode. */ if (mode == MCAST_UNDEFINED) { type = MLD_BLOCK_OLD_SOURCES; } else { type = MLD_ALLOW_NEW_SOURCES; } } else { type = MLD_CHANGE_TO_INCLUDE_MODE; if (mode == MCAST_UNDEFINED) record_has_sources = 0; } } } else { if (record_has_sources) { is_filter_list_change = 1; } else { type = MLD_DO_NOTHING; } } } else { /* * Queue a current state record. */ if (mode == MCAST_EXCLUDE) { type = MLD_MODE_IS_EXCLUDE; } else if (mode == MCAST_INCLUDE) { type = MLD_MODE_IS_INCLUDE; KASSERT(inm->in6m_st[1].iss_asm == 0, ("%s: inm %p is INCLUDE but ASM count is %d", __func__, inm, inm->in6m_st[1].iss_asm)); } } /* * Generate the filter list changes using a separate function. */ if (is_filter_list_change) return (mld_v2_enqueue_filter_change(mq, inm)); if (type == MLD_DO_NOTHING) { CTR3(KTR_MLD, "%s: nothing to do for %s/%s", __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); return (0); } /* * If any sources are present, we must be able to fit at least * one in the trailing space of the tail packet's mbuf, * ideally more. */ minrec0len = sizeof(struct mldv2_record); if (record_has_sources) minrec0len += sizeof(struct in6_addr); CTR4(KTR_MLD, "%s: queueing %s for %s/%s", __func__, mld_rec_type_to_str(type), ip6_sprintf(ip6tbuf, &inm->in6m_addr), if_name(inm->in6m_ifp)); /* * Check if we have a packet in the tail of the queue for this * group into which the first group record for this group will fit. * Otherwise allocate a new packet. * Always allocate leading space for IP6+RA+ICMPV6+REPORT. 
* Note: Group records for G/GSR query responses MUST be sent * in their own packet. */ m0 = mbufq_last(mq); if (!is_group_query && m0 != NULL && (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <= MLD_V2_REPORT_MAXRECS) && (m0->m_pkthdr.len + minrec0len) < (ifp->if_mtu - MLD_MTUSPACE)) { m0srcs = (ifp->if_mtu - m0->m_pkthdr.len - sizeof(struct mldv2_record)) / sizeof(struct in6_addr); m = m0; CTR1(KTR_MLD, "%s: use existing packet", __func__); } else { if (mbufq_full(mq)) { CTR1(KTR_MLD, "%s: outbound queue full", __func__); return (-ENOMEM); } m = NULL; m0srcs = (ifp->if_mtu - MLD_MTUSPACE - sizeof(struct mldv2_record)) / sizeof(struct in6_addr); if (!is_state_change && !is_group_query) m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return (-ENOMEM); mld_save_context(m, ifp); CTR1(KTR_MLD, "%s: allocated first packet", __func__); } /* * Append group record. * If we have sources, we don't know how many yet. */ mr.mr_type = type; mr.mr_datalen = 0; mr.mr_numsrc = 0; mr.mr_addr = inm->in6m_addr; in6_clearscope(&mr.mr_addr); if (!m_append(m, sizeof(struct mldv2_record), (void *)&mr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed.", __func__); return (-ENOMEM); } nbytes += sizeof(struct mldv2_record); /* * Append as many sources as will fit in the first packet. * If we are appending to a new packet, the chain allocation * may potentially use clusters; use m_getptr() in this case. * If we are appending to an existing packet, we need to obtain * a pointer to the group record after m_append(), in case a new * mbuf was allocated. * * Only append sources which are in-mode at t1. If we are * transitioning to MCAST_UNDEFINED state on the group, and * use_block_allow is zero, do not include source entries. * Otherwise, we need to include this source in the report. * * Only report recorded sources in our filter set when responding * to a group-source query. 
*/ if (record_has_sources) { if (m == m0) { md = m_last(m); pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + md->m_len - nbytes); } else { md = m_getptr(m, 0, &off); pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + off); } msrcs = 0; RB_FOREACH_SAFE(ims, ip6_msource_tree, &inm->in6m_srcs, nims) { CTR2(KTR_MLD, "%s: visit node %s", __func__, ip6_sprintf(ip6tbuf, &ims->im6s_addr)); now = im6s_get_mode(inm, ims, 1); CTR2(KTR_MLD, "%s: node is %d", __func__, now); if ((now != mode) || (now == mode && (!use_block_allow && mode == MCAST_UNDEFINED))) { CTR1(KTR_MLD, "%s: skip node", __func__); continue; } if (is_source_query && ims->im6s_stp == 0) { CTR1(KTR_MLD, "%s: skip unrecorded node", __func__); continue; } CTR1(KTR_MLD, "%s: append node", __func__); if (!m_append(m, sizeof(struct in6_addr), (void *)&ims->im6s_addr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed.", __func__); return (-ENOMEM); } nbytes += sizeof(struct in6_addr); ++msrcs; if (msrcs == m0srcs) break; } CTR2(KTR_MLD, "%s: msrcs is %d this packet", __func__, msrcs); pmr->mr_numsrc = htons(msrcs); nbytes += (msrcs * sizeof(struct in6_addr)); } if (is_source_query && msrcs == 0) { CTR1(KTR_MLD, "%s: no recorded sources to report", __func__); if (m != m0) m_freem(m); return (0); } /* * We are good to go with first packet. */ if (m != m0) { CTR1(KTR_MLD, "%s: enqueueing first packet", __func__); m->m_pkthdr.PH_vt.vt_nrecs = 1; mbufq_enqueue(mq, m); } else m->m_pkthdr.PH_vt.vt_nrecs++; /* * No further work needed if no source list in packet(s). */ if (!record_has_sources) return (nbytes); /* * Whilst sources remain to be announced, we need to allocate * a new packet and fill out as many sources as will fit. * Always try for a cluster first. 
*/ while (nims != NULL) { if (mbufq_full(mq)) { CTR1(KTR_MLD, "%s: outbound queue full", __func__); return (-ENOMEM); } m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return (-ENOMEM); mld_save_context(m, ifp); md = m_getptr(m, 0, &off); pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + off); CTR1(KTR_MLD, "%s: allocated next packet", __func__); if (!m_append(m, sizeof(struct mldv2_record), (void *)&mr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed.", __func__); return (-ENOMEM); } m->m_pkthdr.PH_vt.vt_nrecs = 1; nbytes += sizeof(struct mldv2_record); m0srcs = (ifp->if_mtu - MLD_MTUSPACE - sizeof(struct mldv2_record)) / sizeof(struct in6_addr); msrcs = 0; RB_FOREACH_FROM(ims, ip6_msource_tree, nims) { CTR2(KTR_MLD, "%s: visit node %s", __func__, ip6_sprintf(ip6tbuf, &ims->im6s_addr)); now = im6s_get_mode(inm, ims, 1); if ((now != mode) || (now == mode && (!use_block_allow && mode == MCAST_UNDEFINED))) { CTR1(KTR_MLD, "%s: skip node", __func__); continue; } if (is_source_query && ims->im6s_stp == 0) { CTR1(KTR_MLD, "%s: skip unrecorded node", __func__); continue; } CTR1(KTR_MLD, "%s: append node", __func__); if (!m_append(m, sizeof(struct in6_addr), (void *)&ims->im6s_addr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed.", __func__); return (-ENOMEM); } ++msrcs; if (msrcs == m0srcs) break; } pmr->mr_numsrc = htons(msrcs); nbytes += (msrcs * sizeof(struct in6_addr)); CTR1(KTR_MLD, "%s: enqueueing next packet", __func__); mbufq_enqueue(mq, m); } return (nbytes); } /* * Type used to mark record pass completion. * We exploit the fact we can cast to this easily from the * current filter modes on each ip_msource node. */ typedef enum { REC_NONE = 0x00, /* MCAST_UNDEFINED */ REC_ALLOW = 0x01, /* MCAST_INCLUDE */ REC_BLOCK = 0x02, /* MCAST_EXCLUDE */ REC_FULL = REC_ALLOW | REC_BLOCK } rectype_t; /* * Enqueue an MLDv2 filter list change to the given output queue. 
* * Source list filter state is held in an RB-tree. When the filter list * for a group is changed without changing its mode, we need to compute * the deltas between T0 and T1 for each source in the filter set, * and enqueue the appropriate ALLOW_NEW/BLOCK_OLD records. * * As we may potentially queue two record types, and the entire R-B tree * needs to be walked at once, we break this out into its own function * so we can generate a tightly packed queue of packets. * * XXX This could be written to only use one tree walk, although that makes * serializing into the mbuf chains a bit harder. For now we do two walks * which makes things easier on us, and it may or may not be harder on * the L2 cache. * * If successful the size of all data appended to the queue is returned, * otherwise an error code less than zero is returned, or zero if * no record(s) were appended. */ static int mld_v2_enqueue_filter_change(struct mbufq *mq, struct in6_multi *inm) { static const int MINRECLEN = sizeof(struct mldv2_record) + sizeof(struct in6_addr); struct ifnet *ifp; struct mldv2_record mr; struct mldv2_record *pmr; struct ip6_msource *ims, *nims; struct mbuf *m, *m0, *md; int m0srcs, nbytes, npbytes, off, rsrcs, schanged; int nallow, nblock; uint8_t mode, now, then; rectype_t crt, drt, nrt; #ifdef KTR char ip6tbuf[INET6_ADDRSTRLEN]; #endif IN6_MULTI_LOCK_ASSERT(); if (inm->in6m_nsrc == 0 || (inm->in6m_st[0].iss_asm > 0 && inm->in6m_st[1].iss_asm > 0)) return (0); ifp = inm->in6m_ifp; /* interface */ mode = inm->in6m_st[1].iss_fmode; /* filter mode at t1 */ crt = REC_NONE; /* current group record type */ drt = REC_NONE; /* mask of completed group record types */ nrt = REC_NONE; /* record type for current node */ m0srcs = 0; /* # source which will fit in current mbuf chain */ npbytes = 0; /* # of bytes appended this packet */ nbytes = 0; /* # of bytes appended to group's state-change queue */ rsrcs = 0; /* # sources encoded in current record */ schanged = 0; /* # nodes encoded in 
overall filter change */ nallow = 0; /* # of source entries in ALLOW_NEW */ nblock = 0; /* # of source entries in BLOCK_OLD */ nims = NULL; /* next tree node pointer */ /* * For each possible filter record mode. * The first kind of source we encounter tells us which * is the first kind of record we start appending. * If a node transitioned to UNDEFINED at t1, its mode is treated * as the inverse of the group's filter mode. */ while (drt != REC_FULL) { do { m0 = mbufq_last(mq); if (m0 != NULL && (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <= MLD_V2_REPORT_MAXRECS) && (m0->m_pkthdr.len + MINRECLEN) < (ifp->if_mtu - MLD_MTUSPACE)) { m = m0; m0srcs = (ifp->if_mtu - m0->m_pkthdr.len - sizeof(struct mldv2_record)) / sizeof(struct in6_addr); CTR1(KTR_MLD, "%s: use previous packet", __func__); } else { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) { CTR1(KTR_MLD, "%s: m_get*() failed", __func__); return (-ENOMEM); } m->m_pkthdr.PH_vt.vt_nrecs = 0; mld_save_context(m, ifp); m0srcs = (ifp->if_mtu - MLD_MTUSPACE - sizeof(struct mldv2_record)) / sizeof(struct in6_addr); npbytes = 0; CTR1(KTR_MLD, "%s: allocated new packet", __func__); } /* * Append the MLD group record header to the * current packet's data area. * Recalculate pointer to free space for next * group record, in case m_append() allocated * a new mbuf or cluster. 
*/ memset(&mr, 0, sizeof(mr)); mr.mr_addr = inm->in6m_addr; in6_clearscope(&mr.mr_addr); if (!m_append(m, sizeof(mr), (void *)&mr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed", __func__); return (-ENOMEM); } npbytes += sizeof(struct mldv2_record); if (m != m0) { /* new packet; offset in chain */ md = m_getptr(m, npbytes - sizeof(struct mldv2_record), &off); pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + off); } else { /* current packet; offset from last append */ md = m_last(m); pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + md->m_len - sizeof(struct mldv2_record)); } /* * Begin walking the tree for this record type * pass, or continue from where we left off * previously if we had to allocate a new packet. * Only report deltas in-mode at t1. * We need not report included sources as allowed * if we are in inclusive mode on the group, * however the converse is not true. */ rsrcs = 0; if (nims == NULL) { nims = RB_MIN(ip6_msource_tree, &inm->in6m_srcs); } RB_FOREACH_FROM(ims, ip6_msource_tree, nims) { CTR2(KTR_MLD, "%s: visit node %s", __func__, ip6_sprintf(ip6tbuf, &ims->im6s_addr)); now = im6s_get_mode(inm, ims, 1); then = im6s_get_mode(inm, ims, 0); CTR3(KTR_MLD, "%s: mode: t0 %d, t1 %d", __func__, then, now); if (now == then) { CTR1(KTR_MLD, "%s: skip unchanged", __func__); continue; } if (mode == MCAST_EXCLUDE && now == MCAST_INCLUDE) { CTR1(KTR_MLD, "%s: skip IN src on EX group", __func__); continue; } nrt = (rectype_t)now; if (nrt == REC_NONE) nrt = (rectype_t)(~mode & REC_FULL); if (schanged++ == 0) { crt = nrt; } else if (crt != nrt) continue; if (!m_append(m, sizeof(struct in6_addr), (void *)&ims->im6s_addr)) { if (m != m0) m_freem(m); CTR1(KTR_MLD, "%s: m_append() failed", __func__); return (-ENOMEM); } nallow += !!(crt == REC_ALLOW); nblock += !!(crt == REC_BLOCK); if (++rsrcs == m0srcs) break; } /* * If we did not append any tree nodes on this * pass, back out of allocations. 
*/ if (rsrcs == 0) { npbytes -= sizeof(struct mldv2_record); if (m != m0) { CTR1(KTR_MLD, "%s: m_free(m)", __func__); m_freem(m); } else { CTR1(KTR_MLD, "%s: m_adj(m, -mr)", __func__); m_adj(m, -((int)sizeof( struct mldv2_record))); } continue; } npbytes += (rsrcs * sizeof(struct in6_addr)); if (crt == REC_ALLOW) pmr->mr_type = MLD_ALLOW_NEW_SOURCES; else if (crt == REC_BLOCK) pmr->mr_type = MLD_BLOCK_OLD_SOURCES; pmr->mr_numsrc = htons(rsrcs); /* * Count the new group record, and enqueue this * packet if it wasn't already queued. */ m->m_pkthdr.PH_vt.vt_nrecs++; if (m != m0) mbufq_enqueue(mq, m); nbytes += npbytes; } while (nims != NULL); drt |= crt; crt = (~crt & REC_FULL); } CTR3(KTR_MLD, "%s: queued %d ALLOW_NEW, %d BLOCK_OLD", __func__, nallow, nblock); return (nbytes); } static int mld_v2_merge_state_changes(struct in6_multi *inm, struct mbufq *scq) { struct mbufq *gq; struct mbuf *m; /* pending state-change */ struct mbuf *m0; /* copy of pending state-change */ struct mbuf *mt; /* last state-change in packet */ int docopy, domerge; u_int recslen; docopy = 0; domerge = 0; recslen = 0; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); /* * If there are further pending retransmissions, make a writable * copy of each queued state-change message before merging. */ if (inm->in6m_scrv > 0) docopy = 1; gq = &inm->in6m_scq; #ifdef KTR if (mbufq_first(gq) == NULL) { CTR2(KTR_MLD, "%s: WARNING: queue for inm %p is empty", __func__, inm); } #endif m = mbufq_first(gq); while (m != NULL) { /* * Only merge the report into the current packet if * there is sufficient space to do so; an MLDv2 report * packet may only contain 65,535 group records. * Always use a simple mbuf chain concatentation to do this, * as large state changes for single groups may have * allocated clusters. 
*/ domerge = 0; mt = mbufq_last(scq); if (mt != NULL) { recslen = m_length(m, NULL); if ((mt->m_pkthdr.PH_vt.vt_nrecs + m->m_pkthdr.PH_vt.vt_nrecs <= MLD_V2_REPORT_MAXRECS) && (mt->m_pkthdr.len + recslen <= (inm->in6m_ifp->if_mtu - MLD_MTUSPACE))) domerge = 1; } if (!domerge && mbufq_full(gq)) { CTR2(KTR_MLD, "%s: outbound queue full, skipping whole packet %p", __func__, m); mt = m->m_nextpkt; if (!docopy) m_freem(m); m = mt; continue; } if (!docopy) { CTR2(KTR_MLD, "%s: dequeueing %p", __func__, m); m0 = mbufq_dequeue(gq); m = m0->m_nextpkt; } else { CTR2(KTR_MLD, "%s: copying %p", __func__, m); m0 = m_dup(m, M_NOWAIT); if (m0 == NULL) return (ENOMEM); m0->m_nextpkt = NULL; m = m->m_nextpkt; } if (!domerge) { CTR3(KTR_MLD, "%s: queueing %p to scq %p)", __func__, m0, scq); mbufq_enqueue(scq, m0); } else { struct mbuf *mtl; /* last mbuf of packet mt */ CTR3(KTR_MLD, "%s: merging %p with ifscq tail %p)", __func__, m0, mt); mtl = m_last(mt); m0->m_flags &= ~M_PKTHDR; mt->m_pkthdr.len += recslen; mt->m_pkthdr.PH_vt.vt_nrecs += m0->m_pkthdr.PH_vt.vt_nrecs; mtl->m_next = m0; } } return (0); } /* * Respond to a pending MLDv2 General Query. 
*/ static void -mld_v2_dispatch_general_query(struct mld_ifinfo *mli) +mld_v2_dispatch_general_query(struct mld_ifsoftc *mli) { struct ifmultiaddr *ifma; struct ifnet *ifp; struct in6_multi *inm; int retval; IN6_MULTI_LOCK_ASSERT(); MLD_LOCK_ASSERT(); KASSERT(mli->mli_version == MLD_VERSION_2, ("%s: called when version %d", __func__, mli->mli_version)); ifp = mli->mli_ifp; IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET6 || ifma->ifma_protospec == NULL) continue; inm = (struct in6_multi *)ifma->ifma_protospec; KASSERT(ifp == inm->in6m_ifp, ("%s: inconsistent ifp", __func__)); switch (inm->in6m_state) { case MLD_NOT_MEMBER: case MLD_SILENT_MEMBER: break; case MLD_REPORTING_MEMBER: case MLD_IDLE_MEMBER: case MLD_LAZY_MEMBER: case MLD_SLEEPING_MEMBER: case MLD_AWAKENING_MEMBER: inm->in6m_state = MLD_REPORTING_MEMBER; retval = mld_v2_enqueue_group_record(&mli->mli_gq, inm, 0, 0, 0, 0); CTR2(KTR_MLD, "%s: enqueue record = %d", __func__, retval); break; case MLD_G_QUERY_PENDING_MEMBER: case MLD_SG_QUERY_PENDING_MEMBER: case MLD_LEAVING_MEMBER: break; } } IF_ADDR_RUNLOCK(ifp); mld_dispatch_queue(&mli->mli_gq, MLD_MAX_RESPONSE_BURST); /* * Slew transmission of bursts over 500ms intervals. */ if (mbufq_first(&mli->mli_gq) != NULL) { mli->mli_v2_timer = 1 + MLD_RANDOM_DELAY( MLD_RESPONSE_BURST_INTERVAL); V_interface_timers_running6 = 1; } } /* * Transmit the next pending message in the output queue. * * VIMAGE: Needs to store/restore vnet pointer on a per-mbuf-chain basis. * MRT: Nothing needs to be done, as MLD traffic is always local to * a link and uses a link-scope multicast address. 
*/ static void mld_dispatch_packet(struct mbuf *m) { struct ip6_moptions im6o; struct ifnet *ifp; struct ifnet *oifp; struct mbuf *m0; struct mbuf *md; struct ip6_hdr *ip6; struct mld_hdr *mld; int error; int off; int type; uint32_t ifindex; CTR2(KTR_MLD, "%s: transmit %p", __func__, m); /* * Set VNET image pointer from enqueued mbuf chain * before doing anything else. Whilst we use interface * indexes to guard against interface detach, they are * unique to each VIMAGE and must be retrieved. */ ifindex = mld_restore_context(m); /* * Check if the ifnet still exists. This limits the scope of * any race in the absence of a global ifp lock for low cost * (an array lookup). */ ifp = ifnet_byindex(ifindex); if (ifp == NULL) { CTR3(KTR_MLD, "%s: dropped %p as ifindex %u went away.", __func__, m, ifindex); m_freem(m); IP6STAT_INC(ip6s_noroute); goto out; } im6o.im6o_multicast_hlim = 1; im6o.im6o_multicast_loop = (V_ip6_mrouter != NULL); im6o.im6o_multicast_ifp = ifp; if (m->m_flags & M_MLDV1) { m0 = m; } else { m0 = mld_v2_encap_report(ifp, m); if (m0 == NULL) { CTR2(KTR_MLD, "%s: dropped %p", __func__, m); IP6STAT_INC(ip6s_odropped); goto out; } } mld_scrub_context(m0); m_clrprotoflags(m); m0->m_pkthdr.rcvif = V_loif; ip6 = mtod(m0, struct ip6_hdr *); #if 0 (void)in6_setscope(&ip6->ip6_dst, ifp, NULL); /* XXX LOR */ #else /* * XXX XXX Break some KPI rules to prevent an LOR which would * occur if we called in6_setscope() at transmission. * See comments at top of file. */ MLD_EMBEDSCOPE(&ip6->ip6_dst, ifp->if_index); #endif /* * Retrieve the ICMPv6 type before handoff to ip6_output(), * so we can bump the stats. 
*/ md = m_getptr(m0, sizeof(struct ip6_hdr), &off); mld = (struct mld_hdr *)(mtod(md, uint8_t *) + off); type = mld->mld_type; error = ip6_output(m0, &mld_po, NULL, IPV6_UNSPECSRC, &im6o, &oifp, NULL); if (error) { CTR3(KTR_MLD, "%s: ip6_output(%p) = %d", __func__, m0, error); goto out; } ICMP6STAT_INC(icp6s_outhist[type]); if (oifp != NULL) { icmp6_ifstat_inc(oifp, ifs6_out_msg); switch (type) { case MLD_LISTENER_REPORT: case MLDV2_LISTENER_REPORT: icmp6_ifstat_inc(oifp, ifs6_out_mldreport); break; case MLD_LISTENER_DONE: icmp6_ifstat_inc(oifp, ifs6_out_mlddone); break; } } out: return; } /* * Encapsulate an MLDv2 report. * * KAME IPv6 requires that hop-by-hop options be passed separately, * and that the IPv6 header be prepended in a separate mbuf. * * Returns a pointer to the new mbuf chain head, or NULL if the * allocation failed. */ static struct mbuf * mld_v2_encap_report(struct ifnet *ifp, struct mbuf *m) { struct mbuf *mh; struct mldv2_report *mld; struct ip6_hdr *ip6; struct in6_ifaddr *ia; int mldreclen; KASSERT(ifp != NULL, ("%s: null ifp", __func__)); KASSERT((m->m_flags & M_PKTHDR), ("%s: mbuf chain %p is !M_PKTHDR", __func__, m)); /* * RFC3590: OK to send as :: or tentative during DAD. */ ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST); if (ia == NULL) CTR1(KTR_MLD, "%s: warning: ia is NULL", __func__); mh = m_gethdr(M_NOWAIT, MT_DATA); if (mh == NULL) { if (ia != NULL) ifa_free(&ia->ia_ifa); m_freem(m); return (NULL); } M_ALIGN(mh, sizeof(struct ip6_hdr) + sizeof(struct mldv2_report)); mldreclen = m_length(m, NULL); CTR2(KTR_MLD, "%s: mldreclen is %d", __func__, mldreclen); mh->m_len = sizeof(struct ip6_hdr) + sizeof(struct mldv2_report); mh->m_pkthdr.len = sizeof(struct ip6_hdr) + sizeof(struct mldv2_report) + mldreclen; ip6 = mtod(mh, struct ip6_hdr *); ip6->ip6_flow = 0; ip6->ip6_vfc &= ~IPV6_VERSION_MASK; ip6->ip6_vfc |= IPV6_VERSION; ip6->ip6_nxt = IPPROTO_ICMPV6; ip6->ip6_src = ia ? 
ia->ia_addr.sin6_addr : in6addr_any; if (ia != NULL) ifa_free(&ia->ia_ifa); ip6->ip6_dst = in6addr_linklocal_allv2routers; /* scope ID will be set in netisr */ mld = (struct mldv2_report *)(ip6 + 1); mld->mld_type = MLDV2_LISTENER_REPORT; mld->mld_code = 0; mld->mld_cksum = 0; mld->mld_v2_reserved = 0; mld->mld_v2_numrecs = htons(m->m_pkthdr.PH_vt.vt_nrecs); m->m_pkthdr.PH_vt.vt_nrecs = 0; mh->m_next = m; mld->mld_cksum = in6_cksum(mh, IPPROTO_ICMPV6, sizeof(struct ip6_hdr), sizeof(struct mldv2_report) + mldreclen); return (mh); } #ifdef KTR static char * mld_rec_type_to_str(const int type) { switch (type) { case MLD_CHANGE_TO_EXCLUDE_MODE: return "TO_EX"; break; case MLD_CHANGE_TO_INCLUDE_MODE: return "TO_IN"; break; case MLD_MODE_IS_EXCLUDE: return "MODE_EX"; break; case MLD_MODE_IS_INCLUDE: return "MODE_IN"; break; case MLD_ALLOW_NEW_SOURCES: return "ALLOW_NEW"; break; case MLD_BLOCK_OLD_SOURCES: return "BLOCK_OLD"; break; default: break; } return "unknown"; } #endif static void mld_init(void *unused __unused) { CTR1(KTR_MLD, "%s: initializing", __func__); MLD_LOCK_INIT(); ip6_initpktopts(&mld_po); mld_po.ip6po_hlim = 1; mld_po.ip6po_hbh = &mld_ra.hbh; mld_po.ip6po_prefer_tempaddr = IP6PO_TEMPADDR_NOTPREFER; mld_po.ip6po_flags = IP6PO_DONTFRAG; } SYSINIT(mld_init, SI_SUB_PSEUDO, SI_ORDER_MIDDLE, mld_init, NULL); static void mld_uninit(void *unused __unused) { CTR1(KTR_MLD, "%s: tearing down", __func__); MLD_LOCK_DESTROY(); } SYSUNINIT(mld_uninit, SI_SUB_PSEUDO, SI_ORDER_MIDDLE, mld_uninit, NULL); static void vnet_mld_init(const void *unused __unused) { CTR1(KTR_MLD, "%s: initializing", __func__); LIST_INIT(&V_mli_head); } VNET_SYSINIT(vnet_mld_init, SI_SUB_PSEUDO, SI_ORDER_ANY, vnet_mld_init, NULL); static void vnet_mld_uninit(const void *unused __unused) { CTR1(KTR_MLD, "%s: tearing down", __func__); KASSERT(LIST_EMPTY(&V_mli_head), ("%s: mli list not empty; ifnets not detached?", __func__)); } VNET_SYSUNINIT(vnet_mld_uninit, SI_SUB_PSEUDO, SI_ORDER_ANY, 
vnet_mld_uninit, NULL); static int mld_modevent(module_t mod, int type, void *unused __unused) { switch (type) { case MOD_LOAD: case MOD_UNLOAD: break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t mld_mod = { "mld", mld_modevent, 0 }; DECLARE_MODULE(mld, mld_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); Index: projects/ifnet/sys/netinet6/mld6_var.h =================================================================== --- projects/ifnet/sys/netinet6/mld6_var.h (revision 279031) +++ projects/ifnet/sys/netinet6/mld6_var.h (revision 279032) @@ -1,164 +1,177 @@ /*- * Copyright (c) 2009 Bruce Simpson. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _NETINET6_MLD6_VAR_H_ #define _NETINET6_MLD6_VAR_H_ /* * Multicast Listener Discovery (MLD) * implementation-specific definitions. */ -#ifdef _KERNEL - /* - * Per-link MLD state. - */ -struct mld_ifinfo { - LIST_ENTRY(mld_ifinfo) mli_link; - struct ifnet *mli_ifp; /* interface this instance belongs to */ - uint32_t mli_version; /* MLDv1 Host Compatibility Mode */ - uint32_t mli_v1_timer; /* MLDv1 Querier Present timer (s) */ - uint32_t mli_v2_timer; /* MLDv2 General Query (interface) timer (s)*/ - uint32_t mli_flags; /* MLD per-interface flags */ - uint32_t mli_rv; /* MLDv2 Robustness Variable */ - uint32_t mli_qi; /* MLDv2 Query Interval (s) */ - uint32_t mli_qri; /* MLDv2 Query Response Interval (s) */ - uint32_t mli_uri; /* MLDv2 Unsolicited Report Interval (s) */ - SLIST_HEAD(,in6_multi) mli_relinmhead; /* released groups */ - struct mbufq mli_gq; /* queue of general query responses */ -}; -#define MLIF_SILENT 0x00000001 /* Do not use MLD on this ifp */ -#define MLIF_USEALLOW 0x00000002 /* Use ALLOW/BLOCK for joins/leaves */ - -#define MLD_RANDOM_DELAY(X) (arc4random() % (X) + 1) -#define MLD_MAX_STATE_CHANGES 24 /* Max pending changes per group */ - -/* * MLD per-group states. 
*/ #define MLD_NOT_MEMBER 0 /* Can garbage collect group */ #define MLD_SILENT_MEMBER 1 /* Do not perform MLD for group */ #define MLD_REPORTING_MEMBER 2 /* MLDv1 we are reporter */ #define MLD_IDLE_MEMBER 3 /* MLDv1 we reported last */ #define MLD_LAZY_MEMBER 4 /* MLDv1 other member reporting */ #define MLD_SLEEPING_MEMBER 5 /* MLDv1 start query response */ #define MLD_AWAKENING_MEMBER 6 /* MLDv1 group timer will start */ #define MLD_G_QUERY_PENDING_MEMBER 7 /* MLDv2 group query pending */ #define MLD_SG_QUERY_PENDING_MEMBER 8 /* MLDv2 source query pending */ #define MLD_LEAVING_MEMBER 9 /* MLDv2 dying gasp (pending last */ /* retransmission of INCLUDE {}) */ /* * MLD version tag. */ #define MLD_VERSION_NONE 0 /* Invalid */ #define MLD_VERSION_1 1 #define MLD_VERSION_2 2 /* Default */ /* * MLDv2 protocol control variables. */ #define MLD_RV_INIT 2 /* Robustness Variable */ #define MLD_RV_MIN 1 #define MLD_RV_MAX 7 #define MLD_QI_INIT 125 /* Query Interval (s) */ #define MLD_QI_MIN 1 #define MLD_QI_MAX 255 #define MLD_QRI_INIT 10 /* Query Response Interval (s) */ #define MLD_QRI_MIN 1 #define MLD_QRI_MAX 255 #define MLD_URI_INIT 3 /* Unsolicited Report Interval (s) */ #define MLD_URI_MIN 0 #define MLD_URI_MAX 10 #define MLD_MAX_GS_SOURCES 256 /* # of sources in rx GS query */ #define MLD_MAX_G_GS_PACKETS 8 /* # of packets to answer G/GS */ #define MLD_MAX_STATE_CHANGE_PACKETS 8 /* # of packets per state change */ #define MLD_MAX_RESPONSE_PACKETS 16 /* # of packets for general query */ #define MLD_MAX_RESPONSE_BURST 4 /* # of responses to send at once */ #define MLD_RESPONSE_BURST_INTERVAL (PR_FASTHZ / 2) /* 500ms */ /* * MLD-specific mbuf flags. */ #define M_MLDV1 M_PROTO1 /* Packet is MLDv1 */ #define M_GROUPREC M_PROTO3 /* mbuf chain is a group record */ /* * Leading space for MLDv2 reports inside MTU. * * NOTE: This differs from IGMPv3 significantly. 
KAME IPv6 requires * that a fully formed mbuf chain *without* the Router Alert option * is passed to ip6_output(), however we must account for it in the * MTU if we need to split an MLDv2 report into several packets. * * We now put the MLDv2 report header in the initial mbuf containing * the IPv6 header. */ #define MLD_MTUSPACE (sizeof(struct ip6_hdr) + sizeof(struct mld_raopt) + \ sizeof(struct icmp6_hdr)) /* + * Structure returned by net.inet6.mld.ifinfo. + */ +struct mld_ifinfo { + uint32_t mli_version; /* MLDv1 Host Compatibility Mode */ + uint32_t mli_v1_timer; /* MLDv1 Querier Present timer (s) */ + uint32_t mli_v2_timer; /* MLDv2 General Query (interface) timer (s)*/ + uint32_t mli_flags; /* MLD per-interface flags */ +#define MLIF_SILENT 0x00000001 /* Do not use MLD on this ifp */ +#define MLIF_USEALLOW 0x00000002 /* Use ALLOW/BLOCK for joins/leaves */ + uint32_t mli_rv; /* MLDv2 Robustness Variable */ + uint32_t mli_qi; /* MLDv2 Query Interval (s) */ + uint32_t mli_qri; /* MLDv2 Query Response Interval (s) */ + uint32_t mli_uri; /* MLDv2 Unsolicited Report Interval (s) */ +}; + +#ifdef _KERNEL +/* + * Per-link MLD state. 
+ */ +struct mld_ifsoftc { + LIST_ENTRY(mld_ifsoftc) mli_link; + struct ifnet *mli_ifp; /* interface this instance belongs to */ + uint32_t mli_version; /* MLDv1 Host Compatibility Mode */ + uint32_t mli_v1_timer; /* MLDv1 Querier Present timer (s) */ + uint32_t mli_v2_timer; /* MLDv2 General Query (interface) timer (s)*/ + uint32_t mli_flags; /* MLD per-interface flags */ + uint32_t mli_rv; /* MLDv2 Robustness Variable */ + uint32_t mli_qi; /* MLDv2 Query Interval (s) */ + uint32_t mli_qri; /* MLDv2 Query Response Interval (s) */ + uint32_t mli_uri; /* MLDv2 Unsolicited Report Interval (s) */ + SLIST_HEAD(,in6_multi) mli_relinmhead; /* released groups */ + struct mbufq mli_gq; /* queue of general query responses */ +}; + +#define MLD_RANDOM_DELAY(X) (arc4random() % (X) + 1) +#define MLD_MAX_STATE_CHANGES 24 /* Max pending changes per group */ + +/* * Subsystem lock macros. * The MLD lock is only taken with MLD. Currently it is system-wide. * VIMAGE: The lock could be pushed to per-VIMAGE granularity in future. */ #define MLD_LOCK_INIT() mtx_init(&mld_mtx, "mld_mtx", NULL, MTX_DEF) #define MLD_LOCK_DESTROY() mtx_destroy(&mld_mtx) #define MLD_LOCK() mtx_lock(&mld_mtx) #define MLD_LOCK_ASSERT() mtx_assert(&mld_mtx, MA_OWNED) #define MLD_UNLOCK() mtx_unlock(&mld_mtx) #define MLD_UNLOCK_ASSERT() mtx_assert(&mld_mtx, MA_NOTOWNED) /* * Per-link MLD context. 
*/ #define MLD_IFINFO(ifp) \ (((struct in6_ifextra *)(ifp)->if_afdata[AF_INET6])->mld_ifinfo) int mld_change_state(struct in6_multi *, const int); -struct mld_ifinfo * +struct mld_ifsoftc * mld_domifattach(struct ifnet *); void mld_domifdetach(struct ifnet *); void mld_fasttimo(void); void mld_ifdetach(struct ifnet *); int mld_input(struct mbuf *, int, int); void mld_slowtimo(void); #ifdef SYSCTL_DECL SYSCTL_DECL(_net_inet6_mld); #endif #endif /* _KERNEL */ #endif /* _NETINET6_MLD6_VAR_H_ */ Index: projects/ifnet/sys/ofed/drivers/infiniband/core/cma.c =================================================================== --- projects/ifnet/sys/ofed/drivers/infiniband/core/cma.c (revision 279031) +++ projects/ifnet/sys/ofed/drivers/infiniband/core/cma.c (revision 279032) @@ -1,3694 +1,3700 @@ /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); #define CMA_CM_RESPONSE_TIMEOUT 20 #define CMA_MAX_CM_RETRIES 15 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24) #define CMA_IBOE_PACKET_LIFETIME 18 static int cma_response_timeout = CMA_CM_RESPONSE_TIMEOUT; module_param_named(cma_response_timeout, cma_response_timeout, int, 0644); MODULE_PARM_DESC(cma_response_timeout, "CMA_CM_RESPONSE_TIMEOUT (default=20)"); static int def_prec2sl = 3; module_param_named(def_prec2sl, def_prec2sl, int, 0644); MODULE_PARM_DESC(def_prec2sl, "Default value for SL priority with RoCE. Valid values 0 - 7"); static int debug_level = 0; #define cma_pr(level, priv, format, arg...) \ printk(level "CMA: %p: %s: " format, ((struct rdma_id_priv *) priv) , __func__, ## arg) #define cma_dbg(priv, format, arg...) \ do { if (debug_level) cma_pr(KERN_DEBUG, priv, format, ## arg); } while (0) #define cma_warn(priv, format, arg...) 
\ cma_pr(KERN_WARNING, priv, format, ## arg) #define CMA_GID_FMT "%2.2x%2.2x:%2.2x%2.2x" #define CMA_GID_RAW_ARG(gid) ((u8 *)(gid))[12],\ ((u8 *)(gid))[13],\ ((u8 *)(gid))[14],\ ((u8 *)(gid))[15] #define CMA_GID_ARG(gid) CMA_GID_RAW_ARG((gid).raw) #define cma_debug_path(priv, pfx, p) \ cma_dbg(priv, pfx "sgid=" CMA_GID_FMT ",dgid=" \ CMA_GID_FMT "\n", CMA_GID_ARG(p.sgid), \ CMA_GID_ARG(p.dgid)) #define cma_debug_gid(priv, g) \ cma_dbg(priv, "gid=" CMA_GID_FMT "\n", CMA_GID_ARG(g) module_param_named(debug_level, debug_level, int, 0644); MODULE_PARM_DESC(debug_level, "debug level default=0"); static void cma_add_one(struct ib_device *device); static void cma_remove_one(struct ib_device *device); static struct ib_client cma_client = { .name = "cma", .add = cma_add_one, .remove = cma_remove_one }; static struct ib_sa_client sa_client; static struct rdma_addr_client addr_client; static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; static struct workqueue_struct *cma_free_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); static DEFINE_IDR(udp_ps); static DEFINE_IDR(ipoib_ps); static DEFINE_IDR(ib_ps); struct cma_device { struct list_head list; struct ib_device *device; struct completion comp; atomic_t refcount; struct list_head id_list; }; struct rdma_bind_list { struct idr *ps; struct hlist_head owners; unsigned short port; }; enum { CMA_OPTION_AFONLY, }; /* * Device removal can occur at anytime, so we need extra handling to * serialize notifying the user of device removal with other callbacks. * We do this by disabling removal notification while a callback is in process, * and reporting it after the callback completes. 
*/ struct rdma_id_private { struct rdma_cm_id id; struct rdma_bind_list *bind_list; struct socket *sock; struct hlist_node node; struct list_head list; /* listen_any_list or cma_device.list */ struct list_head listen_list; /* per device listens */ struct cma_device *cma_dev; struct list_head mc_list; int internal_id; enum rdma_cm_state state; spinlock_t lock; spinlock_t cm_lock; struct mutex qp_mutex; struct completion comp; atomic_t refcount; struct mutex handler_mutex; struct work_struct work; /* garbage coll */ int backlog; int timeout_ms; struct ib_sa_query *query; int query_id; union { struct ib_cm_id *ib; struct iw_cm_id *iw; } cm_id; u32 seq_num; u32 qkey; u32 qp_num; pid_t owner; u32 options; u8 srq; u8 tos; u8 reuseaddr; u8 afonly; int qp_timeout; /* cache for mc record params */ struct ib_sa_mcmember_rec rec; int is_valid_rec; }; struct cma_multicast { struct rdma_id_private *id_priv; union { struct ib_sa_multicast *ib; } multicast; struct list_head list; void *context; struct sockaddr_storage addr; struct kref mcref; }; struct cma_work { struct work_struct work; struct rdma_id_private *id; enum rdma_cm_state old_state; enum rdma_cm_state new_state; struct rdma_cm_event event; }; struct cma_ndev_work { struct work_struct work; struct rdma_id_private *id; struct rdma_cm_event event; }; struct iboe_mcast_work { struct work_struct work; struct rdma_id_private *id; struct cma_multicast *mc; }; union cma_ip_addr { struct in6_addr ip6; struct { __be32 pad[3]; __be32 addr; } ip4; }; struct cma_hdr { u8 cma_version; u8 ip_version; /* IP version: 7:4 */ __be16 port; union cma_ip_addr src_addr; union cma_ip_addr dst_addr; }; struct sdp_hh { u8 bsdh[16]; u8 sdp_version; /* Major version: 7:4 */ u8 ip_version; /* IP version: 7:4 */ u8 sdp_specific1[10]; __be16 port; __be16 sdp_specific2; union cma_ip_addr src_addr; union cma_ip_addr dst_addr; }; struct sdp_hah { u8 bsdh[16]; u8 sdp_version; }; #define CMA_VERSION 0x00 #define SDP_MAJ_VERSION 0x2 static int 
cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp) { unsigned long flags; int ret; spin_lock_irqsave(&id_priv->lock, flags); ret = (id_priv->state == comp); spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } static int cma_comp_exch(struct rdma_id_private *id_priv, enum rdma_cm_state comp, enum rdma_cm_state exch) { unsigned long flags; int ret; spin_lock_irqsave(&id_priv->lock, flags); if ((ret = (id_priv->state == comp))) id_priv->state = exch; spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv, enum rdma_cm_state exch) { unsigned long flags; enum rdma_cm_state old; spin_lock_irqsave(&id_priv->lock, flags); old = id_priv->state; id_priv->state = exch; spin_unlock_irqrestore(&id_priv->lock, flags); return old; } static inline u8 cma_get_ip_ver(struct cma_hdr *hdr) { return hdr->ip_version >> 4; } static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) { hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); } static inline u8 sdp_get_majv(u8 sdp_version) { return sdp_version >> 4; } static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) { return hh->ip_version >> 4; } static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver) { hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF); } static void cma_attach_to_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { atomic_inc(&cma_dev->refcount); id_priv->cma_dev = cma_dev; id_priv->id.device = cma_dev->device; id_priv->id.route.addr.dev_addr.transport = rdma_node_get_transport(cma_dev->device->node_type); list_add_tail(&id_priv->list, &cma_dev->id_list); } static inline void cma_deref_dev(struct cma_device *cma_dev) { if (atomic_dec_and_test(&cma_dev->refcount)) complete(&cma_dev->comp); } static inline void release_mc(struct kref *kref) { struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref); kfree(mc->multicast.ib); kfree(mc); } static void cma_release_dev(struct 
rdma_id_private *id_priv) { mutex_lock(&lock); list_del(&id_priv->list); cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; mutex_unlock(&lock); } static int cma_set_qkey(struct rdma_id_private *id_priv) { struct ib_sa_mcmember_rec rec; int ret = 0; if (id_priv->qkey) return 0; switch (id_priv->id.ps) { case RDMA_PS_UDP: id_priv->qkey = RDMA_UDP_QKEY; break; case RDMA_PS_IPOIB: ib_addr_get_mgid(&id_priv->id.route.addr.dev_addr, &rec.mgid); ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, &rec.mgid, &rec); if (!ret) id_priv->qkey = be32_to_cpu(rec.qkey); break; default: break; } return ret; } static int find_gid_port(struct ib_device *device, union ib_gid *gid, u8 port_num) { int i; int err; struct ib_port_attr props; union ib_gid tmp; err = ib_query_port(device, port_num, &props); if (err) return 1; for (i = 0; i < props.gid_tbl_len; ++i) { err = ib_query_gid(device, port_num, i, &tmp); if (err) return 1; if (!memcmp(&tmp, gid, sizeof tmp)) return 0; } return -EAGAIN; } static int cma_acquire_dev(struct rdma_id_private *id_priv) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; struct cma_device *cma_dev; union ib_gid gid, iboe_gid; int ret = -ENODEV; u8 port; enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ? 
IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET; if (dev_ll != IB_LINK_LAYER_INFINIBAND && id_priv->id.ps == RDMA_PS_IPOIB) return -EINVAL; mutex_lock(&lock); rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &iboe_gid); memcpy(&gid, dev_addr->src_dev_addr + rdma_addr_gid_offset(dev_addr), sizeof gid); list_for_each_entry(cma_dev, &dev_list, list) { for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) { if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) { if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB && rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET) ret = find_gid_port(cma_dev->device, &iboe_gid, port); else ret = find_gid_port(cma_dev->device, &gid, port); if (!ret) { id_priv->id.port_num = port; goto out; } else if (ret == 1) break; } } } out: if (!ret) cma_attach_to_dev(id_priv, cma_dev); mutex_unlock(&lock); return ret; } static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) complete(&id_priv->comp); } static int cma_disable_callback(struct rdma_id_private *id_priv, enum rdma_cm_state state) { mutex_lock(&id_priv->handler_mutex); if (id_priv->state != state) { mutex_unlock(&id_priv->handler_mutex); return -EINVAL; } return 0; } struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps, enum ib_qp_type qp_type) { struct rdma_id_private *id_priv; id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); if (!id_priv) return ERR_PTR(-ENOMEM); id_priv->owner = curthread->td_proc->p_pid; id_priv->state = RDMA_CM_IDLE; id_priv->id.context = context; id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; id_priv->id.qp_type = qp_type; spin_lock_init(&id_priv->lock); spin_lock_init(&id_priv->cm_lock); mutex_init(&id_priv->qp_mutex); init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); mutex_init(&id_priv->handler_mutex); 
INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; } EXPORT_SYMBOL(rdma_create_id); static int cma_init_ud_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) return ret; ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); if (ret) return ret; qp_attr.qp_state = IB_QPS_RTR; ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE); if (ret) return ret; qp_attr.qp_state = IB_QPS_RTS; qp_attr.sq_psn = 0; ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_SQ_PSN); return ret; } static int cma_init_conn_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) return ret; return ib_modify_qp(qp, &qp_attr, qp_attr_mask); } int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr) { struct rdma_id_private *id_priv; struct ib_qp *qp; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (id->device != pd->device) return -EINVAL; qp = ib_create_qp(pd, qp_init_attr); if (IS_ERR(qp)) return PTR_ERR(qp); if (id->qp_type == IB_QPT_UD) ret = cma_init_ud_qp(id_priv, qp); else ret = cma_init_conn_qp(id_priv, qp); if (ret) goto err; id->qp = qp; id_priv->qp_num = qp->qp_num; id_priv->srq = (qp->srq != NULL); return 0; err: ib_destroy_qp(qp); return ret; } EXPORT_SYMBOL(rdma_create_qp); void rdma_destroy_qp(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); mutex_lock(&id_priv->qp_mutex); ib_destroy_qp(id_priv->id.qp); id_priv->id.qp = NULL; mutex_unlock(&id_priv->qp_mutex); } EXPORT_SYMBOL(rdma_destroy_qp); static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, 
struct rdma_conn_param *conn_param) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; union ib_gid sgid; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } /* Need to update QP attributes from default values. */ qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); if (ret) goto out; qp_attr.qp_state = IB_QPS_RTR; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, qp_attr.ah_attr.grh.sgid_index, &sgid); if (ret) goto out; if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB && rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) == IB_LINK_LAYER_ETHERNET) { ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL); if (ret) goto out; } if (conn_param) qp_attr.max_dest_rd_atomic = conn_param->responder_resources; ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_modify_qp_rts(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } qp_attr.qp_state = IB_QPS_RTS; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; if (conn_param) qp_attr.max_rd_atomic = conn_param->initiator_depth; if (id_priv->qp_timeout && id_priv->id.qp->qp_type == IB_QPT_RC) { qp_attr.timeout = id_priv->qp_timeout; qp_attr_mask |= IB_QP_TIMEOUT; } ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_modify_qp_err(struct rdma_id_private *id_priv) { struct ib_qp_attr qp_attr; int ret; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } qp_attr.qp_state = IB_QPS_ERR; ret = 
ib_modify_qp(id_priv->id.qp, &qp_attr, IB_QP_STATE); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; int ret; u16 pkey; if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) == IB_LINK_LAYER_INFINIBAND) pkey = ib_addr_get_pkey(dev_addr); else pkey = 0xffff; ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num, pkey, &qp_attr->pkey_index); if (ret) return ret; qp_attr->port_num = id_priv->id.port_num; *qp_attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT; if (id_priv->id.qp_type == IB_QPT_UD) { ret = cma_set_qkey(id_priv); if (ret) return ret; qp_attr->qkey = id_priv->qkey; *qp_attr_mask |= IB_QP_QKEY; } else { qp_attr->qp_access_flags = 0; *qp_attr_mask |= IB_QP_ACCESS_FLAGS; } return 0; } int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { struct rdma_id_private *id_priv; int ret = 0; id_priv = container_of(id, struct rdma_id_private, id); switch (rdma_node_get_transport(id_priv->id.device->node_type)) { case RDMA_TRANSPORT_IB: if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD)) ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask); else ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr, qp_attr_mask); if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: if (!id_priv->cm_id.iw) { qp_attr->qp_access_flags = 0; *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; } else ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr, qp_attr_mask); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_init_qp_attr); static inline int cma_zero_addr(struct sockaddr *addr) { struct in6_addr *ip6; if (addr->sa_family == AF_INET) return ipv4_is_zeronet( ((struct sockaddr_in *)addr)->sin_addr.s_addr); else { ip6 = 
&((struct sockaddr_in6 *) addr)->sin6_addr; return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; } } static inline int cma_loopback_addr(struct sockaddr *addr) { if (addr->sa_family == AF_INET) return ipv4_is_loopback( ((struct sockaddr_in *) addr)->sin_addr.s_addr); else return ipv6_addr_loopback( &((struct sockaddr_in6 *) addr)->sin6_addr); } static inline int cma_any_addr(struct sockaddr *addr) { return cma_zero_addr(addr) || cma_loopback_addr(addr); } static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst) { if (src->sa_family != dst->sa_family) return -1; switch (src->sa_family) { case AF_INET: return ((struct sockaddr_in *) src)->sin_addr.s_addr != ((struct sockaddr_in *) dst)->sin_addr.s_addr; default: return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr, &((struct sockaddr_in6 *) dst)->sin6_addr); } } static inline __be16 cma_port(struct sockaddr *addr) { if (addr->sa_family == AF_INET) return ((struct sockaddr_in *) addr)->sin_port; else return ((struct sockaddr_in6 *) addr)->sin6_port; } static inline int cma_any_port(struct sockaddr *addr) { return !cma_port(addr); } static int cma_get_net_info(void *hdr, enum rdma_port_space ps, u8 *ip_ver, __be16 *port, union cma_ip_addr **src, union cma_ip_addr **dst) { switch (ps) { case RDMA_PS_SDP: if (sdp_get_majv(((struct sdp_hh *) hdr)->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; *ip_ver = sdp_get_ip_ver(hdr); *port = ((struct sdp_hh *) hdr)->port; *src = &((struct sdp_hh *) hdr)->src_addr; *dst = &((struct sdp_hh *) hdr)->dst_addr; break; default: if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION) return -EINVAL; *ip_ver = cma_get_ip_ver(hdr); *port = ((struct cma_hdr *) hdr)->port; *src = &((struct cma_hdr *) hdr)->src_addr; *dst = &((struct cma_hdr *) hdr)->dst_addr; break; } if (*ip_ver != 4 && *ip_ver != 6) return -EINVAL; return 0; } static void cma_save_net_info(struct rdma_addr *addr, struct rdma_addr *listen_addr, u8 ip_ver, 
__be16 port, union cma_ip_addr *src, union cma_ip_addr *dst) { struct sockaddr_in *listen4, *ip4; struct sockaddr_in6 *listen6, *ip6; switch (ip_ver) { case 4: listen4 = (struct sockaddr_in *) &listen_addr->src_addr; ip4 = (struct sockaddr_in *) &addr->src_addr; ip4->sin_family = listen4->sin_family; ip4->sin_addr.s_addr = dst->ip4.addr; ip4->sin_port = listen4->sin_port; ip4 = (struct sockaddr_in *) &addr->dst_addr; ip4->sin_family = listen4->sin_family; ip4->sin_addr.s_addr = src->ip4.addr; ip4->sin_port = port; break; case 6: listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr; ip6 = (struct sockaddr_in6 *) &addr->src_addr; ip6->sin6_family = listen6->sin6_family; ip6->sin6_addr = dst->ip6; ip6->sin6_port = listen6->sin6_port; ip6 = (struct sockaddr_in6 *) &addr->dst_addr; ip6->sin6_family = listen6->sin6_family; ip6->sin6_addr = src->ip6; ip6->sin6_port = port; break; default: break; } } static inline int cma_user_data_offset(enum rdma_port_space ps) { switch (ps) { case RDMA_PS_SDP: return 0; default: return sizeof(struct cma_hdr); } } static void cma_cancel_route(struct rdma_id_private *id_priv) { switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) { case IB_LINK_LAYER_INFINIBAND: if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); break; default: break; } } static void cma_cancel_listens(struct rdma_id_private *id_priv) { struct rdma_id_private *dev_id_priv; /* * Remove from listen_any_list to prevent added devices from spawning * additional listen requests. 
*/ mutex_lock(&lock); list_del(&id_priv->list); while (!list_empty(&id_priv->listen_list)) { dev_id_priv = list_entry(id_priv->listen_list.next, struct rdma_id_private, listen_list); /* sync with device removal to avoid duplicate destruction */ list_del_init(&dev_id_priv->list); list_del(&dev_id_priv->listen_list); mutex_unlock(&lock); rdma_destroy_id(&dev_id_priv->id); mutex_lock(&lock); } mutex_unlock(&lock); } static void cma_cancel_operation(struct rdma_id_private *id_priv, enum rdma_cm_state state) { switch (state) { case RDMA_CM_ADDR_QUERY: rdma_addr_cancel(&id_priv->id.route.addr.dev_addr); break; case RDMA_CM_ROUTE_QUERY: cma_cancel_route(id_priv); break; case RDMA_CM_LISTEN: if (cma_any_addr((struct sockaddr *) &id_priv->id.route.addr.src_addr) && !id_priv->cma_dev) cma_cancel_listens(id_priv); break; default: break; } } static void cma_release_port(struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list; mutex_lock(&lock); bind_list = id_priv->bind_list; if (!bind_list) { mutex_unlock(&lock); return; } hlist_del(&id_priv->node); id_priv->bind_list = NULL; if (hlist_empty(&bind_list->owners)) { idr_remove(bind_list->ps, bind_list->port); kfree(bind_list); } mutex_unlock(&lock); } static void cma_leave_mc_groups(struct rdma_id_private *id_priv) { struct cma_multicast *mc; while (!list_empty(&id_priv->mc_list)) { mc = container_of(id_priv->mc_list.next, struct cma_multicast, list); list_del(&mc->list); switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); kfree(mc); break; case IB_LINK_LAYER_ETHERNET: kref_put(&mc->mcref, release_mc); break; default: break; } } } static void __rdma_free(struct work_struct *work) { struct rdma_id_private *id_priv; id_priv = container_of(work, struct rdma_id_private, work); wait_for_completion(&id_priv->comp); if (id_priv->internal_id) cma_deref_id(id_priv->id.context); kfree(id_priv->id.route.path_rec); 
kfree(id_priv); } void rdma_destroy_id(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; enum rdma_cm_state state; unsigned long flags; struct ib_cm_id *ib; id_priv = container_of(id, struct rdma_id_private, id); state = cma_exch(id_priv, RDMA_CM_DESTROYING); cma_cancel_operation(id_priv, state); /* * Wait for any active callback to finish. New callbacks will find * the id_priv state set to destroying and abort. */ mutex_lock(&id_priv->handler_mutex); mutex_unlock(&id_priv->handler_mutex); if (id_priv->cma_dev) { switch (rdma_node_get_transport(id_priv->id.device->node_type)) { case RDMA_TRANSPORT_IB: spin_lock_irqsave(&id_priv->cm_lock, flags); if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { ib = id_priv->cm_id.ib; id_priv->cm_id.ib = NULL; spin_unlock_irqrestore(&id_priv->cm_lock, flags); ib_destroy_cm_id(ib); } else spin_unlock_irqrestore(&id_priv->cm_lock, flags); break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: if (id_priv->cm_id.iw) iw_destroy_cm_id(id_priv->cm_id.iw); break; default: break; } cma_leave_mc_groups(id_priv); cma_release_dev(id_priv); } cma_release_port(id_priv); cma_deref_id(id_priv); INIT_WORK(&id_priv->work, __rdma_free); queue_work(cma_free_wq, &id_priv->work); } EXPORT_SYMBOL(rdma_destroy_id); static int cma_rep_recv(struct rdma_id_private *id_priv) { int ret; ret = cma_modify_qp_rtr(id_priv, NULL); if (ret) goto reject; ret = cma_modify_qp_rts(id_priv, NULL); if (ret) goto reject; cma_dbg(id_priv, "sending RTU\n"); ret = ib_send_cm_rtu(id_priv->cm_id.ib, NULL, 0); if (ret) goto reject; return 0; reject: cma_modify_qp_err(id_priv); cma_dbg(id_priv, "sending REJ\n"); ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); return ret; } static int cma_verify_rep(struct rdma_id_private *id_priv, void *data) { if (id_priv->id.ps == RDMA_PS_SDP && sdp_get_majv(((struct sdp_hah *) data)->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; return 0; } static void cma_set_rep_event_data(struct 
rdma_cm_event *event, struct ib_cm_rep_event_param *rep_data, void *private_data) { event->param.conn.private_data = private_data; event->param.conn.private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; event->param.conn.responder_resources = rep_data->responder_resources; event->param.conn.initiator_depth = rep_data->initiator_depth; event->param.conn.flow_control = rep_data->flow_control; event->param.conn.rnr_retry_count = rep_data->rnr_retry_count; event->param.conn.srq = rep_data->srq; event->param.conn.qp_num = rep_data->remote_qpn; } static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; struct rdma_cm_event event; int ret = 0; if ((ib_event->event != IB_CM_TIMEWAIT_EXIT && cma_disable_callback(id_priv, RDMA_CM_CONNECT)) || (ib_event->event == IB_CM_TIMEWAIT_EXIT && cma_disable_callback(id_priv, RDMA_CM_DISCONNECT))) return 0; memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -ETIMEDOUT; break; case IB_CM_REP_RECEIVED: event.status = cma_verify_rep(id_priv, ib_event->private_data); if (event.status) event.event = RDMA_CM_EVENT_CONNECT_ERROR; else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) { event.status = cma_rep_recv(id_priv); event.event = event.status ? 
RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; } else event.event = RDMA_CM_EVENT_CONNECT_RESPONSE; cma_set_rep_event_data(&event, &ib_event->param.rep_rcvd, ib_event->private_data); break; case IB_CM_RTU_RECEIVED: case IB_CM_USER_ESTABLISHED: event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: event.status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: if (!cma_comp_exch(id_priv, RDMA_CM_CONNECT, RDMA_CM_DISCONNECT)) goto out; event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: event.event = RDMA_CM_EVENT_TIMEWAIT_EXIT; break; case IB_CM_MRA_RECEIVED: /* ignore event */ goto out; case IB_CM_REJ_RECEIVED: cma_modify_qp_err(id_priv); event.status = ib_event->param.rej_rcvd.reason; event.event = RDMA_CM_EVENT_REJECTED; event.param.conn.private_data = ib_event->private_data; event.param.conn.private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d\n", ib_event->event); goto out; } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.ib = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } out: mutex_unlock(&id_priv->handler_mutex); return ret; } static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; struct rdma_route *rt; union cma_ip_addr *src, *dst; __be16 port; u8 ip_ver; int ret; if (cma_get_net_info(ib_event->private_data, listen_id->ps, &ip_ver, &port, &src, &dst)) return NULL; id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps, ib_event->param.req_rcvd.qp_type); if (IS_ERR(id)) return NULL; cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); rt = &id->route; rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1; rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); if (!rt->path_rec) goto err; rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; if (cma_any_addr((struct sockaddr *) &rt->addr.src_addr)) { rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND; rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid); ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); } else { ret = rdma_translate_ip((struct sockaddr *) &rt->addr.src_addr, &rt->addr.dev_addr, NULL); if (ret) goto err; } rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid); id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = RDMA_CM_CONNECT; return id_priv; err: rdma_destroy_id(id); return NULL; } static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; union cma_ip_addr *src, *dst; __be16 port; u8 ip_ver; int ret; id = rdma_create_id(listen_id->event_handler, listen_id->context, 
listen_id->ps, IB_QPT_UD); if (IS_ERR(id)) return NULL; if (cma_get_net_info(ib_event->private_data, listen_id->ps, &ip_ver, &port, &src, &dst)) goto err; cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) { ret = rdma_translate_ip((struct sockaddr *) &id->route.addr.src_addr, &id->route.addr.dev_addr, NULL); if (ret) goto err; } id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = RDMA_CM_CONNECT; return id_priv; err: rdma_destroy_id(id); return NULL; } static void cma_set_req_event_data(struct rdma_cm_event *event, struct ib_cm_req_event_param *req_data, void *private_data, int offset) { event->param.conn.private_data = private_data + offset; event->param.conn.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - offset; event->param.conn.responder_resources = req_data->responder_resources; event->param.conn.initiator_depth = req_data->initiator_depth; event->param.conn.flow_control = req_data->flow_control; event->param.conn.retry_count = req_data->retry_count; event->param.conn.rnr_retry_count = req_data->rnr_retry_count; event->param.conn.srq = req_data->srq; event->param.conn.qp_num = req_data->remote_qpn; } static int cma_check_req_qp_type(struct rdma_cm_id *id, struct ib_cm_event *ib_event) { return (((ib_event->event == IB_CM_REQ_RECEIVED) && (ib_event->param.req_rcvd.qp_type == id->qp_type)) || ((ib_event->event == IB_CM_SIDR_REQ_RECEIVED) && (id->qp_type == IB_QPT_UD)) || (!id->qp_type)); } static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *listen_id, *conn_id; struct rdma_cm_event event; int offset, ret; u8 smac[ETH_ALEN]; u8 alt_smac[ETH_ALEN]; u8 *psmac = smac; u8 *palt_smac = alt_smac; int is_iboe = ((rdma_node_get_transport(cm_id->device->node_type) == RDMA_TRANSPORT_IB) && (rdma_port_get_link_layer(cm_id->device, ib_event->param.req_rcvd.port) == IB_LINK_LAYER_ETHERNET)); int 
is_sidr = 0; listen_id = cm_id->context; if (!cma_check_req_qp_type(&listen_id->id, ib_event)) return -EINVAL; if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) return -ECONNABORTED; memset(&event, 0, sizeof event); offset = cma_user_data_offset(listen_id->id.ps); event.event = RDMA_CM_EVENT_CONNECT_REQUEST; if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED) { is_sidr = 1; conn_id = cma_new_udp_id(&listen_id->id, ib_event); event.param.ud.private_data = ib_event->private_data + offset; event.param.ud.private_data_len = IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset; } else { conn_id = cma_new_conn_id(&listen_id->id, ib_event); cma_set_req_event_data(&event, &ib_event->param.req_rcvd, ib_event->private_data, offset); } if (!conn_id) { ret = -ENOMEM; goto err1; } mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING); ret = cma_acquire_dev(conn_id); if (ret) goto err2; conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; /* * Protect against the user destroying conn_id from another thread * until we're done accessing it. */ atomic_inc(&conn_id->refcount); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) goto err3; if (is_iboe && !is_sidr) { if (ib_event->param.req_rcvd.primary_path != NULL) rdma_addr_find_smac_by_sgid( &ib_event->param.req_rcvd.primary_path->sgid, psmac, NULL); else psmac = NULL; if (ib_event->param.req_rcvd.alternate_path != NULL) rdma_addr_find_smac_by_sgid( &ib_event->param.req_rcvd.alternate_path->sgid, palt_smac, NULL); else palt_smac = NULL; } /* * Acquire mutex to prevent user executing rdma_destroy_id() * while we're accessing the cm_id. 
*/ mutex_lock(&lock); if (is_iboe && !is_sidr) ib_update_cm_av(cm_id, psmac, palt_smac); if (cma_comp(conn_id, RDMA_CM_CONNECT) && (conn_id->id.qp_type != IB_QPT_UD)) { cma_dbg(container_of(&conn_id->id, struct rdma_id_private, id), "sending MRA\n"); ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0); } mutex_unlock(&lock); mutex_unlock(&conn_id->handler_mutex); mutex_unlock(&listen_id->handler_mutex); cma_deref_id(conn_id); return 0; err3: cma_deref_id(conn_id); /* Destroy the CM ID by returning a non-zero value. */ conn_id->cm_id.ib = NULL; err2: cma_exch(conn_id, RDMA_CM_DESTROYING); mutex_unlock(&conn_id->handler_mutex); err1: mutex_unlock(&listen_id->handler_mutex); if (conn_id) rdma_destroy_id(&conn_id->id); return ret; } static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); } static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, struct ib_cm_compare_data *compare) { struct cma_hdr *cma_data, *cma_mask; struct sdp_hh *sdp_data, *sdp_mask; __be32 ip4_addr; struct in6_addr ip6_addr; memset(compare, 0, sizeof *compare); cma_data = (void *) compare->data; cma_mask = (void *) compare->mask; sdp_data = (void *) compare->data; sdp_mask = (void *) compare->mask; switch (addr->sa_family) { case AF_INET: ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr; if (ps == RDMA_PS_SDP) { sdp_set_ip_ver(sdp_data, 4); sdp_set_ip_ver(sdp_mask, 0xF); if (!cma_any_addr(addr)) { sdp_data->dst_addr.ip4.addr = ip4_addr; sdp_mask->dst_addr.ip4.addr = htonl(~0); } } else { cma_set_ip_ver(cma_data, 4); cma_set_ip_ver(cma_mask, 0xF); if (!cma_any_addr(addr)) { cma_data->dst_addr.ip4.addr = ip4_addr; cma_mask->dst_addr.ip4.addr = htonl(~0); } } break; case AF_INET6: ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr; if (ps == RDMA_PS_SDP) { sdp_set_ip_ver(sdp_data, 6); sdp_set_ip_ver(sdp_mask, 0xF); if (!cma_any_addr(addr)) { sdp_data->dst_addr.ip6 = 
ip6_addr; memset(&sdp_mask->dst_addr.ip6, 0xFF, sizeof(sdp_mask->dst_addr.ip6)); } } else { cma_set_ip_ver(cma_data, 6); cma_set_ip_ver(cma_mask, 0xF); if (!cma_any_addr(addr)) { cma_data->dst_addr.ip6 = ip6_addr; memset(&cma_mask->dst_addr.ip6, 0xFF, sizeof(cma_mask->dst_addr.ip6)); } } break; default: break; } } static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) { struct rdma_id_private *id_priv = iw_id->context; struct rdma_cm_event event; struct sockaddr_in *sin; int ret = 0; if (cma_disable_callback(id_priv, RDMA_CM_CONNECT)) return 0; memset(&event, 0, sizeof event); switch (iw_event->event) { case IW_CM_EVENT_CLOSE: event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IW_CM_EVENT_CONNECT_REPLY: sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; *sin = iw_event->local_addr; sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr; *sin = iw_event->remote_addr; switch ((int)iw_event->status) { case 0: event.event = RDMA_CM_EVENT_ESTABLISHED; event.param.conn.initiator_depth = iw_event->ird; event.param.conn.responder_resources = iw_event->ord; break; case -ECONNRESET: case -ECONNREFUSED: event.event = RDMA_CM_EVENT_REJECTED; break; case -ETIMEDOUT: event.event = RDMA_CM_EVENT_UNREACHABLE; break; default: event.event = RDMA_CM_EVENT_CONNECT_ERROR; break; } break; case IW_CM_EVENT_ESTABLISHED: event.event = RDMA_CM_EVENT_ESTABLISHED; event.param.conn.initiator_depth = iw_event->ird; event.param.conn.responder_resources = iw_event->ord; break; default: BUG_ON(1); } event.status = iw_event->status; event.param.conn.private_data = iw_event->private_data; event.param.conn.private_data_len = iw_event->private_data_len; ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.iw = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } mutex_unlock(&id_priv->handler_mutex); return ret; } static int iw_conn_req_handler(struct iw_cm_id *cm_id, struct iw_cm_event *iw_event) { struct rdma_cm_id *new_cm_id; struct rdma_id_private *listen_id, *conn_id; struct sockaddr_in *sin; struct net_device *dev = NULL; struct rdma_cm_event event; int ret; struct ib_device_attr attr; listen_id = cm_id->context; if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) return -ECONNABORTED; /* Create a new RDMA id for the new IW CM ID */ new_cm_id = rdma_create_id(listen_id->id.event_handler, listen_id->id.context, RDMA_PS_TCP, IB_QPT_RC); if (IS_ERR(new_cm_id)) { ret = -ENOMEM; goto out; } conn_id = container_of(new_cm_id, struct rdma_id_private, id); mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING); conn_id->state = RDMA_CM_CONNECT; dev = ip_dev_find(&init_net, iw_event->local_addr.sin_addr.s_addr); if (!dev) { ret = -EADDRNOTAVAIL; mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } ret = cma_acquire_dev(conn_id); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr; *sin = iw_event->local_addr; sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; *sin = iw_event->remote_addr; ret = ib_query_device(conn_id->id.device, &attr); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } memset(&event, 0, sizeof event); event.event = RDMA_CM_EVENT_CONNECT_REQUEST; event.param.conn.private_data = iw_event->private_data; event.param.conn.private_data_len = 
iw_event->private_data_len; event.param.conn.initiator_depth = iw_event->ird; event.param.conn.responder_resources = iw_event->ord; /* * Protect against the user destroying conn_id from another thread * until we're done accessing it. */ atomic_inc(&conn_id->refcount); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* User wants to destroy the CM ID */ conn_id->cm_id.iw = NULL; cma_exch(conn_id, RDMA_CM_DESTROYING); mutex_unlock(&conn_id->handler_mutex); cma_deref_id(conn_id); rdma_destroy_id(&conn_id->id); goto out; } mutex_unlock(&conn_id->handler_mutex); cma_deref_id(conn_id); out: if (dev) dev_put(dev); mutex_unlock(&listen_id->handler_mutex); return ret; } static int cma_ib_listen(struct rdma_id_private *id_priv) { struct ib_cm_compare_data compare_data; struct sockaddr *addr; struct ib_cm_id *id; __be64 svc_id; int ret; id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv); if (IS_ERR(id)) return PTR_ERR(id); id_priv->cm_id.ib = id; addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr; svc_id = cma_get_service_id(id_priv->id.ps, addr); if (cma_any_addr(addr) && !id_priv->afonly) ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL); else { cma_set_compare_data(id_priv->id.ps, addr, &compare_data); ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data); } if (ret) { ib_destroy_cm_id(id_priv->cm_id.ib); id_priv->cm_id.ib = NULL; } return ret; } static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog) { int ret; struct sockaddr_in *sin; struct iw_cm_id *id; id = iw_create_cm_id(id_priv->id.device, id_priv->sock, iw_conn_req_handler, id_priv); if (IS_ERR(id)) return PTR_ERR(id); id_priv->cm_id.iw = id; sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; id_priv->cm_id.iw->local_addr = *sin; ret = iw_cm_listen(id_priv->cm_id.iw, backlog); if (ret) { iw_destroy_cm_id(id_priv->cm_id.iw); id_priv->cm_id.iw = NULL; } return ret; } static int cma_listen_handler(struct rdma_cm_id *id, struct 
rdma_cm_event *event) { struct rdma_id_private *id_priv = id->context; id->context = id_priv->id.context; id->event_handler = id_priv->id.event_handler; return id_priv->id.event_handler(id, event); } static void cma_listen_on_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { struct rdma_id_private *dev_id_priv; struct rdma_cm_id *id; int ret; id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps, id_priv->id.qp_type); if (IS_ERR(id)) return; dev_id_priv = container_of(id, struct rdma_id_private, id); dev_id_priv->state = RDMA_CM_ADDR_BOUND; memcpy(&id->route.addr.src_addr, &id_priv->id.route.addr.src_addr, ip_addr_size((struct sockaddr *) &id_priv->id.route.addr.src_addr)); cma_attach_to_dev(dev_id_priv, cma_dev); list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list); atomic_inc(&id_priv->refcount); dev_id_priv->internal_id = 1; dev_id_priv->afonly = id_priv->afonly; ret = rdma_listen(id, id_priv->backlog); if (ret) cma_warn(id_priv, "cma_listen_on_dev, error %d, listening on device %s\n", ret, cma_dev->device->name); } static void cma_listen_on_all(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; mutex_lock(&lock); list_add_tail(&id_priv->list, &listen_any_list); list_for_each_entry(cma_dev, &dev_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); } void rdma_set_service_type(struct rdma_cm_id *id, int tos) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); id_priv->tos = (u8) tos; } EXPORT_SYMBOL(rdma_set_service_type); void rdma_set_timeout(struct rdma_cm_id *id, int timeout) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); id_priv->qp_timeout = (u8) timeout; } EXPORT_SYMBOL(rdma_set_timeout); static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec, void *context) { struct cma_work *work = context; struct rdma_route *route; route = &work->id->id.route; if (!status) { route->num_paths = 1; 
*route->path_rec = *path_rec; } else { work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; work->event.status = status; } queue_work(cma_wq, &work->work); } static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, struct cma_work *work) { struct rdma_addr *addr = &id_priv->id.route.addr; struct ib_sa_path_rec path_rec; ib_sa_comp_mask comp_mask; struct sockaddr_in6 *sin6; memset(&path_rec, 0, sizeof path_rec); rdma_addr_get_sgid(&addr->dev_addr, &path_rec.sgid); rdma_addr_get_dgid(&addr->dev_addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr)); path_rec.numb_path = 1; path_rec.reversible = 1; path_rec.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &addr->dst_addr); comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID; if (addr->src_addr.ss_family == AF_INET) { path_rec.qos_class = cpu_to_be16((u16) id_priv->tos); comp_mask |= IB_SA_PATH_REC_QOS_CLASS; } else { sin6 = (struct sockaddr_in6 *) &addr->src_addr; path_rec.traffic_class = (u8) (be32_to_cpu(sin6->sin6_flowinfo) >> 20); comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS; } id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, id_priv->id.port_num, &path_rec, comp_mask, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); return (id_priv->query_id < 0) ? 
id_priv->query_id : 0; } static void cma_work_handler(struct work_struct *_work) { struct cma_work *work = container_of(_work, struct cma_work, work); struct rdma_id_private *id_priv = work->id; int destroy = 0; mutex_lock(&id_priv->handler_mutex); if (!cma_comp_exch(id_priv, work->old_state, work->new_state)) goto out; if (id_priv->id.event_handler(&id_priv->id, &work->event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); destroy = 1; } out: mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); if (destroy) rdma_destroy_id(&id_priv->id); kfree(work); } static void cma_ndev_work_handler(struct work_struct *_work) { struct cma_ndev_work *work = container_of(_work, struct cma_ndev_work, work); struct rdma_id_private *id_priv = work->id; int destroy = 0; mutex_lock(&id_priv->handler_mutex); if (id_priv->state == RDMA_CM_DESTROYING || id_priv->state == RDMA_CM_DEVICE_REMOVAL) goto out; if (id_priv->id.event_handler(&id_priv->id, &work->event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); destroy = 1; } out: mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); if (destroy) rdma_destroy_id(&id_priv->id); kfree(work); } static int cma_resolve_ib_route(struct rdma_id_private *id_priv, int timeout_ms) { struct rdma_route *route = &id_priv->id.route; struct cma_work *work; int ret; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; route->path_rec = kmalloc(sizeof *route->path_rec, GFP_KERNEL); if (!route->path_rec) { ret = -ENOMEM; goto err1; } ret = cma_query_ib_route(id_priv, timeout_ms, work); if (ret) goto err2; return 0; err2: kfree(route->path_rec); route->path_rec = NULL; err1: kfree(work); return ret; } int rdma_set_ib_paths(struct rdma_cm_id *id, struct ib_sa_path_rec *path_rec, int num_paths) { struct rdma_id_private *id_priv; int ret; id_priv = 
container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_RESOLVED)) return -EINVAL; id->route.path_rec = kmemdup(path_rec, sizeof *path_rec * num_paths, GFP_KERNEL); if (!id->route.path_rec) { ret = -ENOMEM; goto err; } id->route.num_paths = num_paths; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_ADDR_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_set_ib_paths); static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) { struct cma_work *work; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; queue_work(cma_wq, &work->work); return 0; } static u8 tos_to_sl(u8 tos) { return def_prec2sl & 7; } static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) { struct rdma_route *route = &id_priv->id.route; struct rdma_addr *addr = &route->addr; struct cma_work *work; int ret; struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr; struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr; struct net_device *ndev = NULL; if (src_addr->sin_family != dst_addr->sin_family) return -EINVAL; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); route->path_rec = kzalloc(sizeof *route->path_rec, GFP_KERNEL); if (!route->path_rec) { ret = -ENOMEM; goto err1; } route->num_paths = 1; if (addr->dev_addr.bound_dev_if) ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if); if (!ndev) { ret = -ENODEV; goto err2; } route->path_rec->vlan_id = rdma_vlan_dev_vlan_id(ndev); memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, ETH_ALEN); memcpy(route->path_rec->smac, IF_LLADDR(ndev), ndev->if_addrlen); rdma_ip2gid((struct sockaddr 
*)&id_priv->id.route.addr.src_addr, &route->path_rec->sgid); rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr, &route->path_rec->dgid); route->path_rec->hop_limit = 1; route->path_rec->reversible = 1; route->path_rec->pkey = cpu_to_be16(0xffff); route->path_rec->mtu_selector = IB_SA_EQ; route->path_rec->sl = tos_to_sl(id_priv->tos); route->path_rec->mtu = iboe_get_mtu(ndev->if_mtu); route->path_rec->rate_selector = IB_SA_EQ; route->path_rec->rate = iboe_get_rate(ndev); dev_put(ndev); route->path_rec->packet_life_time_selector = IB_SA_EQ; route->path_rec->packet_life_time = CMA_IBOE_PACKET_LIFETIME; if (!route->path_rec->mtu) { ret = -EINVAL; goto err2; } work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; work->event.status = 0; queue_work(cma_wq, &work->work); return 0; err2: kfree(route->path_rec); route->path_rec = NULL; err1: kfree(work); return ret; } int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY)) return -EINVAL; atomic_inc(&id_priv->refcount); switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ret = cma_resolve_ib_route(id_priv, timeout_ms); break; case IB_LINK_LAYER_ETHERNET: ret = cma_resolve_iboe_route(id_priv); break; default: ret = -ENOSYS; } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_resolve_iw_route(id_priv, timeout_ms); break; default: ret = -ENOSYS; break; } if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ROUTE_QUERY, RDMA_CM_ADDR_RESOLVED); cma_deref_id(id_priv); return ret; } EXPORT_SYMBOL(rdma_resolve_route); int rdma_enable_apm(struct rdma_cm_id *id, enum alt_path_type alt_type) { /* APM is not supported 
yet */ return -EINVAL; } EXPORT_SYMBOL(rdma_enable_apm); static int cma_bind_loopback(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; struct ib_port_attr port_attr; union ib_gid gid; u16 pkey; int ret; u8 p; mutex_lock(&lock); if (list_empty(&dev_list)) { ret = -ENODEV; goto out; } list_for_each_entry(cma_dev, &dev_list, list) for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) if (!ib_query_port(cma_dev->device, p, &port_attr) && port_attr.state == IB_PORT_ACTIVE) goto port_found; p = 1; cma_dev = list_entry(dev_list.next, struct cma_device, list); port_found: ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); if (ret) goto out; ret = ib_get_cached_pkey(cma_dev->device, p, 0, &pkey); if (ret) goto out; id_priv->id.route.addr.dev_addr.dev_type = (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ? ARPHRD_INFINIBAND : ARPHRD_ETHER; rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid); ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); id_priv->id.port_num = p; cma_attach_to_dev(id_priv, cma_dev); out: mutex_unlock(&lock); return ret; } static void addr_handler(int status, struct sockaddr *src_addr, struct rdma_dev_addr *dev_addr, void *context) { struct rdma_id_private *id_priv = context; struct rdma_cm_event event; memset(&event, 0, sizeof event); mutex_lock(&id_priv->handler_mutex); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_RESOLVED)) goto out; memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); if (!status && !id_priv->cma_dev) status = cma_acquire_dev(id_priv); if (status) { if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ADDR_BOUND)) goto out; event.event = RDMA_CM_EVENT_ADDR_ERROR; event.status = status; } else event.event = RDMA_CM_EVENT_ADDR_RESOLVED; if (id_priv->id.event_handler(&id_priv->id, &event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); rdma_destroy_id(&id_priv->id); return; } out: 
mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); } static int cma_resolve_loopback(struct rdma_id_private *id_priv) { struct cma_work *work; struct sockaddr *src, *dst; union ib_gid gid; int ret; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; if (!id_priv->cma_dev) { ret = cma_bind_loopback(id_priv); if (ret) goto err; } rdma_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid); src = (struct sockaddr *) &id_priv->id.route.addr.src_addr; if (cma_zero_addr(src)) { dst = (struct sockaddr *) &id_priv->id.route.addr.dst_addr; if ((src->sa_family = dst->sa_family) == AF_INET) { ((struct sockaddr_in *)src)->sin_addr = ((struct sockaddr_in *)dst)->sin_addr; } else { ((struct sockaddr_in6 *)src)->sin6_addr = ((struct sockaddr_in6 *)dst)->sin6_addr; } } work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ADDR_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; queue_work(cma_wq, &work->work); return 0; err: kfree(work); return ret; } static int cma_resolve_scif(struct rdma_id_private *id_priv) { struct cma_work *work; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; /* we probably can leave it empty here */ work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ADDR_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; queue_work(cma_wq, &work->work); return 0; } static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr) { if (!src_addr || !src_addr->sa_family) { src_addr = (struct sockaddr *) &id->route.addr.src_addr; if ((src_addr->sa_family = dst_addr->sa_family) == AF_INET6) { ((struct sockaddr_in6 *) src_addr)->sin6_scope_id = ((struct sockaddr_in6 *) dst_addr)->sin6_scope_id; } } if (!cma_any_addr(src_addr)) return rdma_bind_addr(id, src_addr); else { struct 
sockaddr_in addr_in; memset(&addr_in, 0, sizeof addr_in); addr_in.sin_family = dst_addr->sa_family; addr_in.sin_len = sizeof addr_in; return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } } int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (id_priv->state == RDMA_CM_IDLE) { ret = cma_bind_addr(id, src_addr, dst_addr); if (ret) return ret; } if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) return -EINVAL; atomic_inc(&id_priv->refcount); memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); if (cma_any_addr(dst_addr)) ret = cma_resolve_loopback(id_priv); else if (id_priv->id.device && rdma_node_get_transport(id_priv->id.device->node_type) == RDMA_TRANSPORT_SCIF) ret = cma_resolve_scif(id_priv); else ret = rdma_resolve_ip(&addr_client, (struct sockaddr *) &id->route.addr.src_addr, dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND); cma_deref_id(id_priv); return ret; } EXPORT_SYMBOL(rdma_resolve_addr); int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse) { struct rdma_id_private *id_priv; unsigned long flags; int ret; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irqsave(&id_priv->lock, flags); if (id_priv->state == RDMA_CM_IDLE) { id_priv->reuseaddr = reuse; ret = 0; } else { ret = -EINVAL; } spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } EXPORT_SYMBOL(rdma_set_reuseaddr); int rdma_set_afonly(struct rdma_cm_id *id, int afonly) { struct rdma_id_private *id_priv; unsigned long flags; int ret; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irqsave(&id_priv->lock, flags); if (id_priv->state == RDMA_CM_IDLE || id_priv->state == RDMA_CM_ADDR_BOUND) { id_priv->options |= (1 << CMA_OPTION_AFONLY); 
id_priv->afonly = afonly; ret = 0; } else { ret = -EINVAL; } spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } EXPORT_SYMBOL(rdma_set_afonly); static void cma_bind_port(struct rdma_bind_list *bind_list, struct rdma_id_private *id_priv) { struct sockaddr_in *sin; sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; sin->sin_port = htons(bind_list->port); id_priv->bind_list = bind_list; hlist_add_head(&id_priv->node, &bind_list->owners); } static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, unsigned short snum) { struct rdma_bind_list *bind_list; int port, ret; bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; do { ret = idr_get_new_above(ps, bind_list, snum, &port); } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); if (ret) goto err1; if (port != snum) { ret = -EADDRNOTAVAIL; goto err2; } bind_list->ps = ps; bind_list->port = (unsigned short) port; cma_bind_port(bind_list, id_priv); return 0; err2: idr_remove(ps, port); err1: kfree(bind_list); return ret; } static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv) { static unsigned int last_used_port; int low, high, remaining; unsigned int rover; inet_get_local_port_range(&low, &high); remaining = (high - low) + 1; rover = random() % remaining + low; retry: if (last_used_port != rover && !idr_find(ps, (unsigned short) rover)) { int ret = cma_alloc_port(ps, id_priv, rover); /* * Remember previously used port number in order to avoid * re-using same port immediately after it is closed. */ if (!ret) last_used_port = rover; if (ret != -EADDRNOTAVAIL) return ret; } if (--remaining) { rover++; if ((rover < low) || (rover > high)) rover = low; goto retry; } return -EADDRNOTAVAIL; } /* * Check that the requested port is available. This is called when trying to * bind to a specific port, or when trying to listen on a bound port. 
In * the latter case, the provided id_priv may already be on the bind_list, but * we still need to check that it's okay to start listening. */ static int cma_check_port(struct rdma_bind_list *bind_list, struct rdma_id_private *id_priv, uint8_t reuseaddr) { struct rdma_id_private *cur_id; struct sockaddr *addr, *cur_addr; struct hlist_node *node; addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr; hlist_for_each_entry(cur_id, node, &bind_list->owners, node) { if (id_priv == cur_id) continue; if ((cur_id->state != RDMA_CM_LISTEN) && reuseaddr && cur_id->reuseaddr) continue; cur_addr = (struct sockaddr *) &cur_id->id.route.addr.src_addr; if (id_priv->afonly && cur_id->afonly && (addr->sa_family != cur_addr->sa_family)) continue; if (cma_any_addr(addr) || cma_any_addr(cur_addr)) return -EADDRNOTAVAIL; if (!cma_addr_cmp(addr, cur_addr)) return -EADDRINUSE; } return 0; } static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list; unsigned short snum; int ret; snum = ntohs(cma_port((struct sockaddr *) &id_priv->id.route.addr.src_addr)); bind_list = idr_find(ps, snum); if (!bind_list) { ret = cma_alloc_port(ps, id_priv, snum); } else { ret = cma_check_port(bind_list, id_priv, id_priv->reuseaddr); if (!ret) cma_bind_port(bind_list, id_priv); } return ret; } static int cma_bind_listen(struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list = id_priv->bind_list; int ret = 0; mutex_lock(&lock); if (bind_list->owners.first->next) ret = cma_check_port(bind_list, id_priv, 0); mutex_unlock(&lock); return ret; } static int cma_get_port(struct rdma_id_private *id_priv) { struct idr *ps; int ret; switch (id_priv->id.ps) { case RDMA_PS_SDP: ps = &sdp_ps; break; case RDMA_PS_TCP: ps = &tcp_ps; break; case RDMA_PS_UDP: ps = &udp_ps; break; case RDMA_PS_IPOIB: ps = &ipoib_ps; break; case RDMA_PS_IB: ps = &ib_ps; break; default: return -EPROTONOSUPPORT; } mutex_lock(&lock); if (cma_any_port((struct sockaddr *) 
&id_priv->id.route.addr.src_addr)) ret = cma_alloc_any_port(ps, id_priv); else ret = cma_use_port(ps, id_priv); mutex_unlock(&lock); return ret; } static int cma_check_linklocal(struct rdma_dev_addr *dev_addr, struct sockaddr *addr) { #if defined(INET6) struct sockaddr_in6 *sin6; if (addr->sa_family != AF_INET6) return 0; sin6 = (struct sockaddr_in6 *) addr; if (IN6_IS_SCOPE_LINKLOCAL(&sin6->sin6_addr) && !sin6->sin6_scope_id) return -EINVAL; dev_addr->bound_dev_if = sin6->sin6_scope_id; #endif return 0; } int rdma_listen(struct rdma_cm_id *id, int backlog) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (id_priv->state == RDMA_CM_IDLE) { ((struct sockaddr *) &id->route.addr.src_addr)->sa_family = AF_INET; ret = rdma_bind_addr(id, (struct sockaddr *) &id->route.addr.src_addr); if (ret) return ret; } if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN)) return -EINVAL; if (id_priv->reuseaddr) { ret = cma_bind_listen(id_priv); if (ret) goto err; } id_priv->backlog = backlog; if (id->device) { switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: ret = cma_ib_listen(id_priv); if (ret) goto err; break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_iw_listen(id_priv, backlog); if (ret) goto err; break; default: ret = -ENOSYS; goto err; } } else cma_listen_on_all(id_priv); return 0; err: id_priv->backlog = 0; cma_comp_exch(id_priv, RDMA_CM_LISTEN, RDMA_CM_ADDR_BOUND); return ret; } EXPORT_SYMBOL(rdma_listen); int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; int ret; +#if defined(INET6) int ipv6only; size_t var_size = sizeof(int); +#endif if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND)) return -EINVAL; ret = cma_check_linklocal(&id->route.addr.dev_addr, addr); 
if (ret) goto err1; memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); if (!cma_any_addr(addr)) { ret = rdma_translate_ip(addr, &id->route.addr.dev_addr, NULL); if (ret) goto err1; ret = cma_acquire_dev(id_priv); if (ret) goto err1; } if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) { if (addr->sa_family == AF_INET) id_priv->afonly = 1; #if defined(INET6) else if (addr->sa_family == AF_INET6) id_priv->afonly = kernel_sysctlbyname(&thread0, "net.inet6.ip6.v6only", &ipv6only, &var_size, NULL, 0, NULL, 0); #endif } ret = cma_get_port(id_priv); if (ret) goto err2; return 0; err2: if (id_priv->cma_dev) cma_release_dev(id_priv); err1: cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE); return ret; } EXPORT_SYMBOL(rdma_bind_addr); static int cma_format_hdr(void *hdr, enum rdma_port_space ps, struct rdma_route *route) { struct cma_hdr *cma_hdr; struct sdp_hh *sdp_hdr; if (route->addr.src_addr.ss_family == AF_INET) { struct sockaddr_in *src4, *dst4; src4 = (struct sockaddr_in *) &route->addr.src_addr; dst4 = (struct sockaddr_in *) &route->addr.dst_addr; switch (ps) { case RDMA_PS_SDP: sdp_hdr = hdr; if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; sdp_set_ip_ver(sdp_hdr, 4); sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; sdp_hdr->port = src4->sin_port; break; default: cma_hdr = hdr; cma_hdr->cma_version = CMA_VERSION; cma_set_ip_ver(cma_hdr, 4); cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; cma_hdr->port = src4->sin_port; break; } } else { struct sockaddr_in6 *src6, *dst6; src6 = (struct sockaddr_in6 *) &route->addr.src_addr; dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; switch (ps) { case RDMA_PS_SDP: sdp_hdr = hdr; if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; sdp_set_ip_ver(sdp_hdr, 6); sdp_hdr->src_addr.ip6 = src6->sin6_addr; sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; 
sdp_hdr->port = src6->sin6_port; break; default: cma_hdr = hdr; cma_hdr->cma_version = CMA_VERSION; cma_set_ip_ver(cma_hdr, 6); cma_hdr->src_addr.ip6 = src6->sin6_addr; cma_hdr->dst_addr.ip6 = dst6->sin6_addr; cma_hdr->port = src6->sin6_port; break; } } return 0; } static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; struct rdma_cm_event event; struct ib_cm_sidr_rep_event_param *rep = &ib_event->param.sidr_rep_rcvd; int ret = 0; if (cma_disable_callback(id_priv, RDMA_CM_CONNECT)) return 0; memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_SIDR_REQ_ERROR: event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -ETIMEDOUT; break; case IB_CM_SIDR_REP_RECEIVED: event.param.ud.private_data = ib_event->private_data; event.param.ud.private_data_len = IB_CM_SIDR_REP_PRIVATE_DATA_SIZE; if (rep->status != IB_SIDR_SUCCESS) { event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = ib_event->param.sidr_rep_rcvd.status; break; } ret = cma_set_qkey(id_priv); if (ret) { event.event = RDMA_CM_EVENT_ADDR_ERROR; event.status = -EINVAL; break; } if (id_priv->qkey != rep->qkey) { event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -EINVAL; break; } ib_init_ah_from_path(id_priv->id.device, id_priv->id.port_num, id_priv->id.route.path_rec, &event.param.ud.ah_attr); event.param.ud.qp_num = rep->qpn; event.param.ud.qkey = rep->qkey; event.event = RDMA_CM_EVENT_ESTABLISHED; event.status = 0; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d\n", ib_event->event); goto out; } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.ib = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } out: mutex_unlock(&id_priv->handler_mutex); return ret; } static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_sidr_req_param req; struct rdma_route *route; struct ib_cm_id *id; int ret; req.private_data_len = sizeof(struct cma_hdr) + conn_param->private_data_len; if (req.private_data_len < conn_param->private_data_len) return -EINVAL; req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC); if (!req.private_data) return -ENOMEM; if (conn_param->private_data && conn_param->private_data_len) memcpy((void *) req.private_data + sizeof(struct cma_hdr), conn_param->private_data, conn_param->private_data_len); route = &id_priv->id.route; ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route); if (ret) goto out; id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler, id_priv); if (IS_ERR(id)) { ret = PTR_ERR(id); goto out; } id_priv->cm_id.ib = id; req.path = route->path_rec; req.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &route->addr.dst_addr); req.timeout_ms = 1 << (cma_response_timeout - 8); req.max_cm_retries = CMA_MAX_CM_RETRIES; cma_dbg(id_priv, "sending SIDR\n"); ret = ib_send_cm_sidr_req(id_priv->cm_id.ib, &req); if (ret) { ib_destroy_cm_id(id_priv->cm_id.ib); id_priv->cm_id.ib = NULL; } out: kfree(req.private_data); return ret; } static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_req_param req; struct rdma_route *route; void *private_data; struct ib_cm_id *id; int offset, ret; memset(&req, 0, sizeof req); offset = cma_user_data_offset(id_priv->id.ps); req.private_data_len = offset + conn_param->private_data_len; if (req.private_data_len < conn_param->private_data_len) return -EINVAL; private_data = kzalloc(req.private_data_len, GFP_ATOMIC); if 
(!private_data) return -ENOMEM; if (conn_param->private_data && conn_param->private_data_len) memcpy(private_data + offset, conn_param->private_data, conn_param->private_data_len); id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv); if (IS_ERR(id)) { ret = PTR_ERR(id); goto out; } id_priv->cm_id.ib = id; route = &id_priv->id.route; ret = cma_format_hdr(private_data, id_priv->id.ps, route); if (ret) goto out; req.private_data = private_data; req.primary_path = &route->path_rec[0]; if (route->num_paths == 2) req.alternate_path = &route->path_rec[1]; req.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &route->addr.dst_addr); req.qp_num = id_priv->qp_num; req.qp_type = id_priv->id.qp_type; req.starting_psn = id_priv->seq_num; req.responder_resources = conn_param->responder_resources; req.initiator_depth = conn_param->initiator_depth; req.flow_control = conn_param->flow_control; req.retry_count = min_t(u8, 7, conn_param->retry_count); req.rnr_retry_count = min_t(u8, 7, conn_param->rnr_retry_count); req.remote_cm_response_timeout = cma_response_timeout; req.local_cm_response_timeout = cma_response_timeout; req.max_cm_retries = CMA_MAX_CM_RETRIES; req.srq = id_priv->srq ? 
1 : 0; cma_dbg(id_priv, "sending REQ\n"); ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: if (ret && !IS_ERR(id)) { ib_destroy_cm_id(id); id_priv->cm_id.ib = NULL; } kfree(private_data); return ret; } static int cma_connect_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct iw_cm_id *cm_id; struct sockaddr_in* sin; int ret; struct iw_cm_conn_param iw_param; cm_id = iw_create_cm_id(id_priv->id.device, id_priv->sock, cma_iw_handler, id_priv); if (IS_ERR(cm_id)) return PTR_ERR(cm_id); id_priv->cm_id.iw = cm_id; sin = (struct sockaddr_in*) &id_priv->id.route.addr.src_addr; cm_id->local_addr = *sin; sin = (struct sockaddr_in*) &id_priv->id.route.addr.dst_addr; cm_id->remote_addr = *sin; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) goto out; if (conn_param) { iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; iw_param.private_data = conn_param->private_data; iw_param.private_data_len = conn_param->private_data_len; iw_param.qpn = id_priv->id.qp ? 
id_priv->qp_num : conn_param->qp_num; } else { memset(&iw_param, 0, sizeof iw_param); iw_param.qpn = id_priv->qp_num; } ret = iw_cm_connect(cm_id, &iw_param); out: if (ret) { iw_destroy_cm_id(cm_id); id_priv->cm_id.iw = NULL; } return ret; } int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT)) return -EINVAL; if (!id->qp) { id_priv->qp_num = conn_param->qp_num; id_priv->srq = conn_param->srq; } switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) ret = cma_resolve_ib_udp(id_priv, conn_param); else ret = cma_connect_ib(id_priv, conn_param); break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_connect_iw(id_priv, conn_param); break; default: ret = -ENOSYS; break; } if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_CONNECT, RDMA_CM_ROUTE_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_connect); static int cma_accept_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; int ret; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) goto out; ret = cma_modify_qp_rts(id_priv, conn_param); if (ret) goto out; memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; rep.starting_psn = id_priv->seq_num; rep.private_data = conn_param->private_data; rep.private_data_len = conn_param->private_data_len; rep.responder_resources = conn_param->responder_resources; rep.initiator_depth = conn_param->initiator_depth; rep.failover_accepted = 0; rep.flow_control = conn_param->flow_control; rep.rnr_retry_count = min_t(u8, 7, conn_param->rnr_retry_count); rep.srq = id_priv->srq ? 
1 : 0; cma_dbg(id_priv, "sending REP\n"); ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); out: return ret; } static int cma_accept_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct iw_cm_conn_param iw_param; int ret; if (!conn_param) return -EINVAL; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) return ret; iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; iw_param.private_data = conn_param->private_data; iw_param.private_data_len = conn_param->private_data_len; if (id_priv->id.qp) { iw_param.qpn = id_priv->qp_num; } else iw_param.qpn = conn_param->qp_num; return iw_cm_accept(id_priv->cm_id.iw, &iw_param); } static int cma_send_sidr_rep(struct rdma_id_private *id_priv, enum ib_cm_sidr_status status, const void *private_data, int private_data_len) { struct ib_cm_sidr_rep_param rep; int ret; memset(&rep, 0, sizeof rep); rep.status = status; if (status == IB_SIDR_SUCCESS) { ret = cma_set_qkey(id_priv); if (ret) return ret; rep.qp_num = id_priv->qp_num; rep.qkey = id_priv->qkey; } rep.private_data = private_data; rep.private_data_len = private_data_len; cma_dbg(id_priv, "sending SIDR\n"); return ib_send_cm_sidr_rep(id_priv->cm_id.ib, &rep); } int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); id_priv->owner = curthread->td_proc->p_pid; if (!cma_comp(id_priv, RDMA_CM_CONNECT)) return -EINVAL; if (!id->qp && conn_param) { id_priv->qp_num = conn_param->qp_num; id_priv->srq = conn_param->srq; } switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) { if (conn_param) ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, conn_param->private_data, conn_param->private_data_len); else ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, NULL, 0); } else { if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = 
cma_rep_recv(id_priv); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_accept_iw(id_priv, conn_param); break; default: ret = -ENOSYS; break; } if (ret) goto reject; return 0; reject: cma_modify_qp_err(id_priv); rdma_reject(id, NULL, 0); return ret; } EXPORT_SYMBOL(rdma_accept); int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (id->device->node_type) { case RDMA_NODE_IB_CA: ret = ib_cm_notify(id_priv->cm_id.ib, event); break; default: ret = 0; break; } return ret; } EXPORT_SYMBOL(rdma_notify); int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, private_data, private_data_len); else { cma_dbg(id_priv, "sending REJ\n"); ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = iw_cm_reject(id_priv->cm_id.iw, private_data, private_data_len); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_reject); int rdma_disconnect(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: ret = cma_modify_qp_err(id_priv); if (ret) goto out; /* Initiate or respond to a disconnect. 
*/ cma_dbg(id_priv, "sending DREQ\n"); if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) { cma_dbg(id_priv, "sending DREP\n"); ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = iw_cm_disconnect(id_priv->cm_id.iw, 0); break; default: ret = -EINVAL; break; } out: return ret; } EXPORT_SYMBOL(rdma_disconnect); static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast) { struct rdma_id_private *id_priv; struct cma_multicast *mc = multicast->context; struct rdma_cm_event event; struct rdma_dev_addr *dev_addr; int ret; struct net_device *ndev = NULL; u16 vlan; id_priv = mc->id_priv; dev_addr = &id_priv->id.route.addr.dev_addr; if (cma_disable_callback(id_priv, RDMA_CM_ADDR_BOUND) && cma_disable_callback(id_priv, RDMA_CM_ADDR_RESOLVED)) return 0; mutex_lock(&id_priv->qp_mutex); if (!status && id_priv->id.qp) status = ib_attach_mcast(id_priv->id.qp, &multicast->rec.mgid, be16_to_cpu(multicast->rec.mlid)); mutex_unlock(&id_priv->qp_mutex); memset(&event, 0, sizeof event); event.status = status; event.param.ud.private_data = mc->context; ndev = dev_get_by_index(&init_net, dev_addr->bound_dev_if); if (!ndev) { status = -ENODEV; } else { vlan = rdma_vlan_dev_vlan_id(ndev); dev_put(ndev); } if (!status) { event.event = RDMA_CM_EVENT_MULTICAST_JOIN; ib_init_ah_from_mcmember(id_priv->id.device, id_priv->id.port_num, &multicast->rec, &event.param.ud.ah_attr); event.param.ud.ah_attr.vlan_id = vlan; event.param.ud.qp_num = 0xFFFFFF; event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey); } else { event.event = RDMA_CM_EVENT_MULTICAST_ERROR; /* mark that the cached record is no longer valid */ if (status != -ENETRESET && status != -EAGAIN) { spin_lock(&id_priv->lock); id_priv->is_valid_rec = 0; spin_unlock(&id_priv->lock); } } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return 
0; } mutex_unlock(&id_priv->handler_mutex); return 0; } static void cma_set_mgid(struct rdma_id_private *id_priv, struct sockaddr *addr, union ib_gid *mgid) { unsigned char mc_map[MAX_ADDR_LEN]; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; struct sockaddr_in *sin = (struct sockaddr_in *) addr; +#if defined(INET6) struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) addr; +#endif if (cma_any_addr(addr)) { memset(mgid, 0, sizeof *mgid); +#if defined(INET6) } else if ((addr->sa_family == AF_INET6) && ((be32_to_cpu(sin6->sin6_addr.s6_addr32[0]) & 0xFFF0FFFF) == 0xFF10A01B)) { /* IPv6 address is an SA assigned MGID. */ memcpy(mgid, &sin6->sin6_addr, sizeof *mgid); } else if (addr->sa_family == AF_INET6) { ipv6_ib_mc_map(&sin6->sin6_addr, dev_addr->broadcast, mc_map); if (id_priv->id.ps == RDMA_PS_UDP) mc_map[7] = 0x01; /* Use RDMA CM signature */ *mgid = *(union ib_gid *) (mc_map + 4); +#endif } else { ip_ib_mc_map(sin->sin_addr.s_addr, dev_addr->broadcast, mc_map); if (id_priv->id.ps == RDMA_PS_UDP) mc_map[7] = 0x01; /* Use RDMA CM signature */ *mgid = *(union ib_gid *) (mc_map + 4); } } static int cma_join_ib_multicast(struct rdma_id_private *id_priv, struct cma_multicast *mc) { struct ib_sa_mcmember_rec rec; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; ib_sa_comp_mask comp_mask; int ret = 0; ib_addr_get_mgid(dev_addr, &id_priv->rec.mgid); /* cache ipoib bc record */ spin_lock(&id_priv->lock); if (!id_priv->is_valid_rec) ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, &id_priv->rec.mgid, &id_priv->rec); if (ret) { id_priv->is_valid_rec = 0; spin_unlock(&id_priv->lock); return ret; } else { rec = id_priv->rec; id_priv->is_valid_rec = 1; } spin_unlock(&id_priv->lock); cma_set_mgid(id_priv, (struct sockaddr *) &mc->addr, &rec.mgid); if (id_priv->id.ps == RDMA_PS_UDP) rec.qkey = cpu_to_be32(RDMA_UDP_QKEY); rdma_addr_get_sgid(dev_addr, &rec.port_gid); rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); 
rec.join_state = 1; comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID | IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE | IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL | IB_SA_MCMEMBER_REC_FLOW_LABEL | IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; if (id_priv->id.ps == RDMA_PS_IPOIB) comp_mask |= IB_SA_MCMEMBER_REC_RATE | IB_SA_MCMEMBER_REC_RATE_SELECTOR | IB_SA_MCMEMBER_REC_MTU_SELECTOR | IB_SA_MCMEMBER_REC_MTU | IB_SA_MCMEMBER_REC_HOP_LIMIT; mc->multicast.ib = ib_sa_join_multicast(&sa_client, id_priv->id.device, id_priv->id.port_num, &rec, comp_mask, GFP_KERNEL, cma_ib_mc_handler, mc); return PTR_RET(mc->multicast.ib); } static void iboe_mcast_work_handler(struct work_struct *work) { struct iboe_mcast_work *mw = container_of(work, struct iboe_mcast_work, work); struct cma_multicast *mc = mw->mc; struct ib_sa_multicast *m = mc->multicast.ib; mc->multicast.ib->context = mc; cma_ib_mc_handler(0, m); kref_put(&mc->mcref, release_mc); kfree(mw); } static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid) { struct sockaddr_in *sin = (struct sockaddr_in *)addr; struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; if (cma_any_addr(addr)) { memset(mgid, 0, sizeof *mgid); } else if (addr->sa_family == AF_INET6) { memcpy(mgid, &sin6->sin6_addr, sizeof *mgid); } else { mgid->raw[0] = 0xff; mgid->raw[1] = 0x0e; mgid->raw[2] = 0; mgid->raw[3] = 0; mgid->raw[4] = 0; mgid->raw[5] = 0; mgid->raw[6] = 0; mgid->raw[7] = 0; mgid->raw[8] = 0; mgid->raw[9] = 0; mgid->raw[10] = 0xff; mgid->raw[11] = 0xff; *(__be32 *)(&mgid->raw[12]) = sin->sin_addr.s_addr; } } static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, struct cma_multicast *mc) { struct iboe_mcast_work *work; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; int err; struct sockaddr *addr = (struct sockaddr *)&mc->addr; struct net_device *ndev = NULL; if (cma_zero_addr((struct sockaddr *)&mc->addr)) return -EINVAL; work = kzalloc(sizeof *work, GFP_KERNEL); if 
(!work) return -ENOMEM; mc->multicast.ib = kzalloc(sizeof(struct ib_sa_multicast), GFP_KERNEL); if (!mc->multicast.ib) { err = -ENOMEM; goto out1; } cma_iboe_set_mgid(addr, &mc->multicast.ib->rec.mgid); mc->multicast.ib->rec.pkey = cpu_to_be16(0xffff); if (id_priv->id.ps == RDMA_PS_UDP) mc->multicast.ib->rec.qkey = cpu_to_be32(RDMA_UDP_QKEY); if (dev_addr->bound_dev_if) ndev = dev_get_by_index(&init_net, dev_addr->bound_dev_if); if (!ndev) { err = -ENODEV; goto out2; } mc->multicast.ib->rec.rate = iboe_get_rate(ndev); mc->multicast.ib->rec.hop_limit = 1; mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->if_mtu); dev_put(ndev); if (!mc->multicast.ib->rec.mtu) { err = -EINVAL; goto out2; } rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &mc->multicast.ib->rec.port_gid); work->id = id_priv; work->mc = mc; INIT_WORK(&work->work, iboe_mcast_work_handler); kref_get(&mc->mcref); queue_work(cma_wq, &work->work); return 0; out2: kfree(mc->multicast.ib); out1: kfree(work); return err; } int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, void *context) { struct rdma_id_private *id_priv; struct cma_multicast *mc; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp(id_priv, RDMA_CM_ADDR_BOUND) && !cma_comp(id_priv, RDMA_CM_ADDR_RESOLVED)) return -EINVAL; mc = kmalloc(sizeof *mc, GFP_KERNEL); if (!mc) return -ENOMEM; memcpy(&mc->addr, addr, ip_addr_size(addr)); mc->context = context; mc->id_priv = id_priv; spin_lock(&id_priv->lock); list_add(&mc->list, &id_priv->mc_list); spin_unlock(&id_priv->lock); switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ret = cma_join_ib_multicast(id_priv, mc); break; case IB_LINK_LAYER_ETHERNET: kref_init(&mc->mcref); ret = cma_iboe_join_multicast(id_priv, mc); break; default: ret = -EINVAL; } break; default: ret = -ENOSYS; break; } if (ret) { 
spin_lock_irq(&id_priv->lock); list_del(&mc->list); spin_unlock_irq(&id_priv->lock); kfree(mc); } return ret; } EXPORT_SYMBOL(rdma_join_multicast); void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; struct cma_multicast *mc; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irq(&id_priv->lock); list_for_each_entry(mc, &id_priv->mc_list, list) { if (!memcmp(&mc->addr, addr, ip_addr_size(addr))) { list_del(&mc->list); spin_unlock_irq(&id_priv->lock); if (id->qp) ib_detach_mcast(id->qp, &mc->multicast.ib->rec.mgid, be16_to_cpu(mc->multicast.ib->rec.mlid)); if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) { switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); kfree(mc); break; case IB_LINK_LAYER_ETHERNET: kref_put(&mc->mcref, release_mc); break; default: break; } } return; } } spin_unlock_irq(&id_priv->lock); } EXPORT_SYMBOL(rdma_leave_multicast); static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id_priv) { struct rdma_dev_addr *dev_addr; struct cma_ndev_work *work; dev_addr = &id_priv->id.route.addr.dev_addr; if ((dev_addr->bound_dev_if == ndev->if_index) && memcmp(dev_addr->src_dev_addr, IF_LLADDR(ndev), ndev->if_addrlen)) { printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n", ndev->if_xname, &id_priv->id); work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; INIT_WORK(&work->work, cma_ndev_work_handler); work->id = id_priv; work->event.event = RDMA_CM_EVENT_ADDR_CHANGE; atomic_inc(&id_priv->refcount); queue_work(cma_wq, &work->work); } return 0; } static int cma_netdev_callback(struct notifier_block *self, unsigned long event, void *ctx) { struct net_device *ndev = (struct net_device *)ctx; struct cma_device *cma_dev; struct rdma_id_private *id_priv; int ret = NOTIFY_DONE; /* BONDING related, commented out until the 
bonding is resolved */ #if 0 if (dev_net(ndev) != &init_net) return NOTIFY_DONE; if (event != NETDEV_BONDING_FAILOVER) return NOTIFY_DONE; if (!(ndev->flags & IFF_MASTER) || !(ndev->priv_flags & IFF_BONDING)) return NOTIFY_DONE; #endif if (event != NETDEV_DOWN && event != NETDEV_UNREGISTER) return NOTIFY_DONE; mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) list_for_each_entry(id_priv, &cma_dev->id_list, list) { ret = cma_netdev_change(ndev, id_priv); if (ret) goto out; } out: mutex_unlock(&lock); return ret; } static struct notifier_block cma_nb = { .notifier_call = cma_netdev_callback }; static void cma_add_one(struct ib_device *device) { struct cma_device *cma_dev; struct rdma_id_private *id_priv; cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL); if (!cma_dev) return; cma_dev->device = device; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); mutex_lock(&lock); list_add_tail(&cma_dev->list, &dev_list); list_for_each_entry(id_priv, &listen_any_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); } static int cma_remove_id_dev(struct rdma_id_private *id_priv) { struct rdma_cm_event event; enum rdma_cm_state state; int ret = 0; /* Record that we want to remove the device */ state = cma_exch(id_priv, RDMA_CM_DEVICE_REMOVAL); if (state == RDMA_CM_DESTROYING) return 0; cma_cancel_operation(id_priv, state); mutex_lock(&id_priv->handler_mutex); /* Check for destruction from another callback. 
*/ if (!cma_comp(id_priv, RDMA_CM_DEVICE_REMOVAL)) goto out; memset(&event, 0, sizeof event); event.event = RDMA_CM_EVENT_DEVICE_REMOVAL; ret = id_priv->id.event_handler(&id_priv->id, &event); out: mutex_unlock(&id_priv->handler_mutex); return ret; } static void cma_process_remove(struct cma_device *cma_dev) { struct rdma_id_private *id_priv; int ret; mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, struct rdma_id_private, list); list_del(&id_priv->listen_list); list_del_init(&id_priv->list); atomic_inc(&id_priv->refcount); mutex_unlock(&lock); ret = id_priv->internal_id ? 1 : cma_remove_id_dev(id_priv); cma_deref_id(id_priv); if (ret) rdma_destroy_id(&id_priv->id); mutex_lock(&lock); } mutex_unlock(&lock); cma_deref_dev(cma_dev); wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) { struct cma_device *cma_dev; cma_dev = ib_get_client_data(device, &cma_client); if (!cma_dev) return; mutex_lock(&lock); list_del(&cma_dev->list); mutex_unlock(&lock); cma_process_remove(cma_dev); kfree(cma_dev); } static int __init cma_init(void) { int ret = -ENOMEM; cma_wq = create_singlethread_workqueue("rdma_cm"); if (!cma_wq) return -ENOMEM; cma_free_wq = create_singlethread_workqueue("rdma_cm_fr"); if (!cma_free_wq) goto err1; ib_sa_register_client(&sa_client); rdma_addr_register_client(&addr_client); register_netdevice_notifier(&cma_nb); ret = ib_register_client(&cma_client); if (ret) goto err; return 0; err: unregister_netdevice_notifier(&cma_nb); rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_free_wq); err1: destroy_workqueue(cma_wq); return ret; } static void __exit cma_cleanup(void) { ib_unregister_client(&cma_client); unregister_netdevice_notifier(&cma_nb); rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); flush_workqueue(cma_free_wq); destroy_workqueue(cma_free_wq); destroy_workqueue(cma_wq); 
idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); idr_destroy(&udp_ps); idr_destroy(&ipoib_ps); idr_destroy(&ib_ps); } module_init(cma_init); module_exit(cma_cleanup); Index: projects/ifnet/sys/sys/cpu.h =================================================================== --- projects/ifnet/sys/sys/cpu.h (revision 279031) +++ projects/ifnet/sys/sys/cpu.h (revision 279032) @@ -1,173 +1,189 @@ /*- * Copyright (c) 2005-2007 Nate Lawson (SDG) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _SYS_CPU_H_ #define _SYS_CPU_H_ #include /* * CPU device support. 
*/ #define CPU_IVAR_PCPU 1 #define CPU_IVAR_NOMINAL_MHZ 2 +#define CPU_IVAR_CPUID_SIZE 3 +#define CPU_IVAR_CPUID 4 static __inline struct pcpu *cpu_get_pcpu(device_t dev) { uintptr_t v = 0; BUS_READ_IVAR(device_get_parent(dev), dev, CPU_IVAR_PCPU, &v); return ((struct pcpu *)v); } static __inline int32_t cpu_get_nominal_mhz(device_t dev) { uintptr_t v = 0; if (BUS_READ_IVAR(device_get_parent(dev), dev, CPU_IVAR_NOMINAL_MHZ, &v) != 0) return (-1); return ((int32_t)v); +} + +static __inline const uint32_t *cpu_get_cpuid(device_t dev, size_t *count) +{ + uintptr_t v = 0; + if (BUS_READ_IVAR(device_get_parent(dev), dev, + CPU_IVAR_CPUID_SIZE, &v) != 0) + return (NULL); + *count = (size_t)v; + + if (BUS_READ_IVAR(device_get_parent(dev), dev, + CPU_IVAR_CPUID, &v) != 0) + return (NULL); + return ((const uint32_t *)v); } /* * CPU frequency control interface. */ /* Each driver's CPU frequency setting is exported in this format. */ struct cf_setting { int freq; /* CPU clock in Mhz or 100ths of a percent. */ int volts; /* Voltage in mV. */ int power; /* Power consumed in mW. */ int lat; /* Transition latency in us. */ device_t dev; /* Driver providing this setting. */ int spec[4];/* Driver-specific storage for non-standard info. */ }; /* Maximum number of settings a given driver can have. */ #define MAX_SETTINGS 24 /* A combination of settings is a level. */ struct cf_level { struct cf_setting total_set; struct cf_setting abs_set; struct cf_setting rel_set[MAX_SETTINGS]; int rel_count; TAILQ_ENTRY(cf_level) link; }; TAILQ_HEAD(cf_level_lst, cf_level); /* Drivers should set all unknown values to this. */ #define CPUFREQ_VAL_UNKNOWN (-1) /* * Every driver offers a type of CPU control. Absolute levels are mutually * exclusive while relative levels modify the current absolute level. There * may be multiple absolute and relative drivers available on a given * system. 
* * For example, consider a system with two absolute drivers that provide * frequency settings of 100, 200 and 300, 400 and a relative driver that * provides settings of 50%, 100%. The cpufreq core would export frequency * levels of 50, 100, 150, 200, 300, 400. * * The "info only" flag signifies that settings returned by * CPUFREQ_DRV_SETTINGS cannot be passed to the CPUFREQ_DRV_SET method and * are only informational. This is for some drivers that can return * information about settings but rely on another machine-dependent driver * for actually performing the frequency transition (e.g., ACPI performance * states of type "functional fixed hardware.") */ #define CPUFREQ_TYPE_MASK 0xffff #define CPUFREQ_TYPE_RELATIVE (1<<0) #define CPUFREQ_TYPE_ABSOLUTE (1<<1) #define CPUFREQ_FLAG_INFO_ONLY (1<<16) /* * When setting a level, the caller indicates the priority of this request. * Priorities determine, among other things, whether a level can be * overridden by other callers. For example, if the user sets a level but * the system thermal driver needs to override it for emergency cooling, * the driver would use a higher priority. Once the event has passed, the * driver would call cpufreq to resume any previous level. */ #define CPUFREQ_PRIO_HIGHEST 1000000 #define CPUFREQ_PRIO_KERN 1000 #define CPUFREQ_PRIO_USER 100 #define CPUFREQ_PRIO_LOWEST 0 /* * Register and unregister a driver with the cpufreq core. Once a driver * is registered, it must support calls to its CPUFREQ_GET, CPUFREQ_GET_LEVEL, * and CPUFREQ_SET methods. It must also unregister before returning from * its DEVICE_DETACH method. */ int cpufreq_register(device_t dev); int cpufreq_unregister(device_t dev); /* * Notify the cpufreq core that the number of or values for settings have * changed. */ int cpufreq_settings_changed(device_t dev); /* * Eventhandlers that are called before and after a change in frequency. * The new level and the result of the change (0 is success) is passed in. 
* If the driver wishes to revoke the change from cpufreq_pre_change, it * stores a non-zero error code in the result parameter and the change will * not be made. If the post-change eventhandler gets a non-zero result, * no change was made and the previous level remains in effect. If a change * is revoked, the post-change eventhandler is still called with the error * value supplied by the revoking driver. This gives listeners who cached * some data in preparation for a level change a chance to clean up. */ typedef void (*cpufreq_pre_notify_fn)(void *, const struct cf_level *, int *); typedef void (*cpufreq_post_notify_fn)(void *, const struct cf_level *, int); EVENTHANDLER_DECLARE(cpufreq_pre_change, cpufreq_pre_notify_fn); EVENTHANDLER_DECLARE(cpufreq_post_change, cpufreq_post_notify_fn); /* * Eventhandler called when the available list of levels changed. * The unit number of the device (i.e. "cpufreq0") whose levels changed * is provided so the listener can retrieve the new list of levels. */ typedef void (*cpufreq_levels_notify_fn)(void *, int); EVENTHANDLER_DECLARE(cpufreq_levels_changed, cpufreq_levels_notify_fn); /* Allow values to be +/- a bit since sometimes we have to estimate. */ #define CPUFREQ_CMP(x, y) (abs((x) - (y)) < 25) /* * Machine-dependent functions. */ /* Estimate the current clock rate for the given CPU id. */ int cpu_est_clockrate(int cpu_id, uint64_t *rate); #endif /* !_SYS_CPU_H_ */ Index: projects/ifnet/sys/sys/mbuf.h =================================================================== --- projects/ifnet/sys/sys/mbuf.h (revision 279031) +++ projects/ifnet/sys/sys/mbuf.h (revision 279032) @@ -1,1299 +1,1298 @@ /*- * Copyright (c) 1982, 1986, 1988, 1993 * The Regents of the University of California. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)mbuf.h 8.5 (Berkeley) 2/19/95 * $FreeBSD$ */ #ifndef _SYS_MBUF_H_ #define _SYS_MBUF_H_ /* XXX: These includes suck. Sorry! */ #include #ifdef _KERNEL #include #include #ifdef WITNESS #include #endif #endif /* * Mbufs are of a single size, MSIZE (sys/param.h), which includes overhead. * An mbuf may add a single "mbuf cluster" of size MCLBYTES (also in * sys/param.h), which has no additional overhead and is used instead of the * internal data area; this is done when at least MINCLSIZE of data must be * stored. Additionally, it is possible to allocate a separate buffer * externally and attach it to the mbuf in a way similar to that of mbuf * clusters. 
* * NB: These calculation do not take actual compiler-induced alignment and * padding inside the complete struct mbuf into account. Appropriate * attention is required when changing members of struct mbuf. * * MLEN is data length in a normal mbuf. * MHLEN is data length in an mbuf with pktheader. * MINCLSIZE is a smallest amount of data that should be put into cluster. * * Compile-time assertions in uipc_mbuf.c test these values to ensure that * they are sensible. */ struct mbuf; #define MHSIZE offsetof(struct mbuf, m_dat) #define MPKTHSIZE offsetof(struct mbuf, m_pktdat) #define MLEN ((int)(MSIZE - MHSIZE)) #define MHLEN ((int)(MSIZE - MPKTHSIZE)) #define MINCLSIZE (MHLEN + 1) #ifdef _KERNEL /*- * Macro for type conversion: convert mbuf pointer to data pointer of correct * type: * * mtod(m, t) -- Convert mbuf pointer to data pointer of correct type. * mtodo(m, o) -- Same as above but with offset 'o' into data. */ #define mtod(m, t) ((t)((m)->m_data)) #define mtodo(m, o) ((void *)(((m)->m_data) + (o))) /* * Argument structure passed to UMA routines during mbuf and packet * allocations. */ struct mb_args { int flags; /* Flags for mbuf being allocated */ short type; /* Type of mbuf being allocated */ }; #endif /* _KERNEL */ /* * Packet tag structure (see below for details). */ struct m_tag { SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ u_int16_t m_tag_id; /* Tag ID */ u_int16_t m_tag_len; /* Length of data */ u_int32_t m_tag_cookie; /* ABI/Module ID */ void (*m_tag_free)(struct m_tag *); }; /* * Record/packet header in first mbuf of chain; valid only if M_PKTHDR is set. * Size ILP32: 48 * LP64: 56 * Compile-time assertions in uipc_mbuf.c test these values to ensure that * they are correct. */ struct pkthdr { struct ifnet *rcvif; /* rcv interface */ SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */ int32_t len; /* total packet length */ /* Layer crossing persistent information. 
*/ uint32_t flowid; /* packet's 4-tuple system */ uint64_t csum_flags; /* checksum and offload features */ uint16_t fibnum; /* this packet should use this fib */ uint8_t cosqos; /* class/quality of service */ uint8_t rsstype; /* hash type */ uint8_t l2hlen; /* layer 2 header length */ uint8_t l3hlen; /* layer 3 header length */ uint8_t l4hlen; /* layer 4 header length */ uint8_t l5hlen; /* layer 5 header length */ union { uint8_t eight[8]; uint16_t sixteen[4]; uint32_t thirtytwo[2]; uint64_t sixtyfour[1]; uintptr_t unintptr[1]; void *ptr; } PH_per; /* Layer specific non-persistent local storage for reassembly, etc. */ union { uint8_t eight[8]; uint16_t sixteen[4]; uint32_t thirtytwo[2]; uint64_t sixtyfour[1]; uintptr_t unintptr[1]; void *ptr; } PH_loc; }; #define ether_vtag PH_per.sixteen[0] #define PH_vt PH_per #define vt_nrecs sixteen[0] #define tso_segsz PH_per.sixteen[1] #define csum_phsum PH_per.sixteen[2] #define csum_data PH_per.thirtytwo[1] #define pkt_tcphdr PH_loc.ptr /* * Description of external storage mapped into mbuf; valid only if M_EXT is * set. * Size ILP32: 28 * LP64: 48 * Compile-time assertions in uipc_mbuf.c test these values to ensure that * they are correct. */ struct m_ext { volatile u_int *ext_cnt; /* pointer to ref count info */ caddr_t ext_buf; /* start of buffer */ uint32_t ext_size; /* size of buffer, for ext_free */ uint32_t ext_type:8, /* type of external storage */ ext_flags:24; /* external storage mbuf flags */ void (*ext_free) /* free routine if not the usual */ (struct mbuf *, void *, void *); void *ext_arg1; /* optional argument pointer */ void *ext_arg2; /* optional argument pointer */ }; /* * The core of the mbuf object along with some shortcut defines for practical * purposes. */ struct mbuf { /* * Header present at the beginning of every mbuf. * Size ILP32: 24 * LP64: 32 * Compile-time assertions in uipc_mbuf.c test these values to ensure * that they are correct. 
*/ union { /* next buffer in chain */ struct mbuf *m_next; SLIST_ENTRY(mbuf) m_slist; STAILQ_ENTRY(mbuf) m_stailq; }; union { /* next chain in queue/record */ struct mbuf *m_nextpkt; SLIST_ENTRY(mbuf) m_slistpkt; STAILQ_ENTRY(mbuf) m_stailqpkt; }; caddr_t m_data; /* location of data */ int32_t m_len; /* amount of data in this mbuf */ uint32_t m_type:8, /* type of data in this mbuf */ m_flags:24; /* flags; see below */ #if !defined(__LP64__) uint32_t m_pad; /* pad for 64bit alignment */ #endif /* * A set of optional headers (packet header, external storage header) * and internal data storage. Historically, these arrays were sized * to MHLEN (space left after a packet header) and MLEN (space left * after only a regular mbuf header); they are now variable size in * order to support future work on variable-size mbufs. */ union { struct { struct pkthdr m_pkthdr; /* M_PKTHDR set */ union { struct m_ext m_ext; /* M_EXT set */ char m_pktdat[0]; }; }; char m_dat[0]; /* !M_PKTHDR, !M_EXT */ }; }; /* * mbuf flags of global significance and layer crossing. * Those of only protocol/layer specific significance are to be mapped * to M_PROTO[1-12] and cleared at layer handoff boundaries. * NB: Limited to the lower 24 bits. 
*/ #define M_EXT 0x00000001 /* has associated external storage */ #define M_PKTHDR 0x00000002 /* start of record */ #define M_EOR 0x00000004 /* end of record */ #define M_RDONLY 0x00000008 /* associated data is marked read-only */ #define M_BCAST 0x00000010 /* send/received as link-level broadcast */ #define M_MCAST 0x00000020 /* send/received as link-level multicast */ #define M_PROMISC 0x00000040 /* packet was not for us */ #define M_VLANTAG 0x00000080 /* ether_vtag is valid */ #define M_UNUSED_8 0x00000100 /* --available-- */ #define M_NOFREE 0x00000200 /* do not free mbuf, embedded in cluster */ #define M_PROTO1 0x00001000 /* protocol-specific */ #define M_PROTO2 0x00002000 /* protocol-specific */ #define M_PROTO3 0x00004000 /* protocol-specific */ #define M_PROTO4 0x00008000 /* protocol-specific */ #define M_PROTO5 0x00010000 /* protocol-specific */ #define M_PROTO6 0x00020000 /* protocol-specific */ #define M_PROTO7 0x00040000 /* protocol-specific */ #define M_PROTO8 0x00080000 /* protocol-specific */ #define M_PROTO9 0x00100000 /* protocol-specific */ #define M_PROTO10 0x00200000 /* protocol-specific */ #define M_PROTO11 0x00400000 /* protocol-specific */ #define M_PROTO12 0x00800000 /* protocol-specific */ /* * Flags to purge when crossing layers. */ #define M_PROTOFLAGS \ (M_PROTO1|M_PROTO2|M_PROTO3|M_PROTO4|M_PROTO5|M_PROTO6|M_PROTO7|M_PROTO8|\ M_PROTO9|M_PROTO10|M_PROTO11|M_PROTO12) /* * Flags preserved when copying m_pkthdr. */ #define M_COPYFLAGS \ (M_PKTHDR|M_EOR|M_RDONLY|M_BCAST|M_MCAST|M_PROMISC|M_VLANTAG| \ M_PROTOFLAGS) /* * Mbuf flag description for use with printf(9) %b identifier. 
*/ #define M_FLAG_BITS \ "\20\1M_EXT\2M_PKTHDR\3M_EOR\4M_RDONLY\5M_BCAST\6M_MCAST" \ "\7M_PROMISC\10M_VLANTAG" #define M_FLAG_PROTOBITS \ "\15M_PROTO1\16M_PROTO2\17M_PROTO3\20M_PROTO4\21M_PROTO5" \ "\22M_PROTO6\23M_PROTO7\24M_PROTO8\25M_PROTO9\26M_PROTO10" \ "\27M_PROTO11\30M_PROTO12" #define M_FLAG_PRINTF (M_FLAG_BITS M_FLAG_PROTOBITS) /* * Network interface cards are able to hash protocol fields (such as IPv4 * addresses and TCP port numbers) classify packets into flows. These flows * can then be used to maintain ordering while delivering packets to the OS * via parallel input queues, as well as to provide a stateless affinity * model. NIC drivers can pass up the hash via m->m_pkthdr.flowid, and set * m_flag fields to indicate how the hash should be interpreted by the * network stack. * * Most NICs support RSS, which provides ordering and explicit affinity, and * use the hash m_flag bits to indicate what header fields were covered by * the hash. M_HASHTYPE_OPAQUE can be set by non-RSS cards or configurations * that provide an opaque flow identifier, allowing for ordering and * distribution without explicit affinity. 
*/ /* Microsoft RSS standard hash types */ #define M_HASHTYPE_NONE 0 #define M_HASHTYPE_RSS_IPV4 1 /* IPv4 2-tuple */ #define M_HASHTYPE_RSS_TCP_IPV4 2 /* TCPv4 4-tuple */ #define M_HASHTYPE_RSS_IPV6 3 /* IPv6 2-tuple */ #define M_HASHTYPE_RSS_TCP_IPV6 4 /* TCPv6 4-tuple */ #define M_HASHTYPE_RSS_IPV6_EX 5 /* IPv6 2-tuple + ext hdrs */ #define M_HASHTYPE_RSS_TCP_IPV6_EX 6 /* TCPv6 4-tiple + ext hdrs */ /* Non-standard RSS hash types */ #define M_HASHTYPE_RSS_UDP_IPV4 7 /* IPv4 UDP 4-tuple */ #define M_HASHTYPE_RSS_UDP_IPV4_EX 8 /* IPv4 UDP 4-tuple + ext hdrs */ #define M_HASHTYPE_RSS_UDP_IPV6 9 /* IPv6 UDP 4-tuple */ #define M_HASHTYPE_RSS_UDP_IPV6_EX 10 /* IPv6 UDP 4-tuple + ext hdrs */ #define M_HASHTYPE_OPAQUE 255 /* ordering, not affinity */ #define M_HASHTYPE_CLEAR(m) ((m)->m_pkthdr.rsstype = 0) #define M_HASHTYPE_GET(m) ((m)->m_pkthdr.rsstype) #define M_HASHTYPE_SET(m, v) ((m)->m_pkthdr.rsstype = (v)) #define M_HASHTYPE_TEST(m, v) (M_HASHTYPE_GET(m) == (v)) /* * COS/QOS class and quality of service tags. * It uses DSCP code points as base. */ #define QOS_DSCP_CS0 0x00 #define QOS_DSCP_DEF QOS_DSCP_CS0 #define QOS_DSCP_CS1 0x20 #define QOS_DSCP_AF11 0x28 #define QOS_DSCP_AF12 0x30 #define QOS_DSCP_AF13 0x38 #define QOS_DSCP_CS2 0x40 #define QOS_DSCP_AF21 0x48 #define QOS_DSCP_AF22 0x50 #define QOS_DSCP_AF23 0x58 #define QOS_DSCP_CS3 0x60 #define QOS_DSCP_AF31 0x68 #define QOS_DSCP_AF32 0x70 #define QOS_DSCP_AF33 0x78 #define QOS_DSCP_CS4 0x80 #define QOS_DSCP_AF41 0x88 #define QOS_DSCP_AF42 0x90 #define QOS_DSCP_AF43 0x98 #define QOS_DSCP_CS5 0xa0 #define QOS_DSCP_EF 0xb8 #define QOS_DSCP_CS6 0xc0 #define QOS_DSCP_CS7 0xe0 /* * External mbuf storage buffer types. 
*/ #define EXT_CLUSTER 1 /* mbuf cluster */ #define EXT_SFBUF 2 /* sendfile(2)'s sf_bufs */ #define EXT_JUMBOP 3 /* jumbo cluster 4096 bytes */ #define EXT_JUMBO9 4 /* jumbo cluster 9216 bytes */ #define EXT_JUMBO16 5 /* jumbo cluster 16184 bytes */ #define EXT_PACKET 6 /* mbuf+cluster from packet zone */ #define EXT_MBUF 7 /* external mbuf reference (M_IOVEC) */ #define EXT_VENDOR1 224 /* for vendor-internal use */ #define EXT_VENDOR2 225 /* for vendor-internal use */ #define EXT_VENDOR3 226 /* for vendor-internal use */ #define EXT_VENDOR4 227 /* for vendor-internal use */ #define EXT_EXP1 244 /* for experimental use */ #define EXT_EXP2 245 /* for experimental use */ #define EXT_EXP3 246 /* for experimental use */ #define EXT_EXP4 247 /* for experimental use */ #define EXT_NET_DRV 252 /* custom ext_buf provided by net driver(s) */ #define EXT_MOD_TYPE 253 /* custom module's ext_buf type */ #define EXT_DISPOSABLE 254 /* can throw this buffer away w/page flipping */ #define EXT_EXTREF 255 /* has externally maintained ext_cnt ptr */ /* * Flags for external mbuf buffer types. * NB: limited to the lower 24 bits. */ #define EXT_FLAG_EMBREF 0x000001 /* embedded ext_cnt, notyet */ #define EXT_FLAG_EXTREF 0x000002 /* external ext_cnt, notyet */ #define EXT_FLAG_NOFREE 0x000010 /* don't free mbuf to pool, notyet */ #define EXT_FLAG_VENDOR1 0x010000 /* for vendor-internal use */ #define EXT_FLAG_VENDOR2 0x020000 /* for vendor-internal use */ #define EXT_FLAG_VENDOR3 0x040000 /* for vendor-internal use */ #define EXT_FLAG_VENDOR4 0x080000 /* for vendor-internal use */ #define EXT_FLAG_EXP1 0x100000 /* for experimental use */ #define EXT_FLAG_EXP2 0x200000 /* for experimental use */ #define EXT_FLAG_EXP3 0x400000 /* for experimental use */ #define EXT_FLAG_EXP4 0x800000 /* for experimental use */ /* * EXT flag description for use with printf(9) %b identifier. 
*/ #define EXT_FLAG_BITS \ "\20\1EXT_FLAG_EMBREF\2EXT_FLAG_EXTREF\5EXT_FLAG_NOFREE" \ "\21EXT_FLAG_VENDOR1\22EXT_FLAG_VENDOR2\23EXT_FLAG_VENDOR3" \ "\24EXT_FLAG_VENDOR4\25EXT_FLAG_EXP1\26EXT_FLAG_EXP2\27EXT_FLAG_EXP3" \ "\30EXT_FLAG_EXP4" /* * External reference/free functions. */ void sf_ext_ref(void *, void *); void sf_ext_free(void *, void *); /* * Flags indicating checksum, segmentation and other offload work to be * done, or already done, by hardware or lower layers. It is split into * separate inbound and outbound flags. * * Outbound flags that are set by upper protocol layers requesting lower * layers, or ideally the hardware, to perform these offloading tasks. * For outbound packets this field and its flags can be directly tested * against ifnet if_hwassist. */ #define CSUM_IP 0x00000001 /* IP header checksum offload */ #define CSUM_IP_UDP 0x00000002 /* UDP checksum offload */ #define CSUM_IP_TCP 0x00000004 /* TCP checksum offload */ #define CSUM_IP_SCTP 0x00000008 /* SCTP checksum offload */ #define CSUM_IP_TSO 0x00000010 /* TCP segmentation offload */ #define CSUM_IP_ISCSI 0x00000020 /* iSCSI checksum offload */ #define CSUM_IP6_UDP 0x00000200 /* UDP checksum offload */ #define CSUM_IP6_TCP 0x00000400 /* TCP checksum offload */ #define CSUM_IP6_SCTP 0x00000800 /* SCTP checksum offload */ #define CSUM_IP6_TSO 0x00001000 /* TCP segmentation offload */ #define CSUM_IP6_ISCSI 0x00002000 /* iSCSI checksum offload */ /* Inbound checksum support where the checksum was verified by hardware. 
*/ #define CSUM_L3_CALC 0x01000000 /* calculated layer 3 csum */ #define CSUM_L3_VALID 0x02000000 /* checksum is correct */ #define CSUM_L4_CALC 0x04000000 /* calculated layer 4 csum */ #define CSUM_L4_VALID 0x08000000 /* checksum is correct */ #define CSUM_L5_CALC 0x10000000 /* calculated layer 5 csum */ #define CSUM_L5_VALID 0x20000000 /* checksum is correct */ #define CSUM_COALESED 0x40000000 /* contains merged segments */ /* * CSUM flag description for use with printf(9) %b identifier. */ #define CSUM_BITS \ "\20\1CSUM_IP\2CSUM_IP_UDP\3CSUM_IP_TCP\4CSUM_IP_SCTP\5CSUM_IP_TSO" \ "\6CSUM_IP_ISCSI" \ "\12CSUM_IP6_UDP\13CSUM_IP6_TCP\14CSUM_IP6_SCTP\15CSUM_IP6_TSO" \ "\16CSUM_IP6_ISCSI" \ "\31CSUM_L3_CALC\32CSUM_L3_VALID\33CSUM_L4_CALC\34CSUM_L4_VALID" \ "\35CSUM_L5_CALC\36CSUM_L5_VALID\37CSUM_COALESED" /* CSUM flags compatibility mappings. */ #define CSUM_IP_CHECKED CSUM_L3_CALC #define CSUM_IP_VALID CSUM_L3_VALID #define CSUM_DATA_VALID CSUM_L4_VALID #define CSUM_PSEUDO_HDR CSUM_L4_CALC #define CSUM_SCTP_VALID CSUM_L4_VALID #define CSUM_DELAY_DATA (CSUM_TCP|CSUM_UDP) #define CSUM_DELAY_IP CSUM_IP /* Only v4, no v6 IP hdr csum */ #define CSUM_DELAY_DATA_IPV6 (CSUM_TCP_IPV6|CSUM_UDP_IPV6) #define CSUM_DATA_VALID_IPV6 CSUM_DATA_VALID #define CSUM_TCP CSUM_IP_TCP #define CSUM_UDP CSUM_IP_UDP #define CSUM_SCTP CSUM_IP_SCTP #define CSUM_TSO (CSUM_IP_TSO|CSUM_IP6_TSO) #define CSUM_UDP_IPV6 CSUM_IP6_UDP #define CSUM_TCP_IPV6 CSUM_IP6_TCP #define CSUM_SCTP_IPV6 CSUM_IP6_SCTP /* * mbuf types describing the content of the mbuf (including external storage). */ #define MT_NOTMBUF 0 /* USED INTERNALLY ONLY! 
Object is not mbuf */ #define MT_DATA 1 /* dynamic (data) allocation */ #define MT_HEADER MT_DATA /* packet header, use M_PKTHDR instead */ #define MT_VENDOR1 4 /* for vendor-internal use */ #define MT_VENDOR2 5 /* for vendor-internal use */ #define MT_VENDOR3 6 /* for vendor-internal use */ #define MT_VENDOR4 7 /* for vendor-internal use */ #define MT_SONAME 8 /* socket name */ #define MT_EXP1 9 /* for experimental use */ #define MT_EXP2 10 /* for experimental use */ #define MT_EXP3 11 /* for experimental use */ #define MT_EXP4 12 /* for experimental use */ #define MT_CONTROL 14 /* extra-data protocol message */ #define MT_OOBDATA 15 /* expedited data */ #define MT_NTYPES 16 /* number of mbuf types for mbtypes[] */ #define MT_NOINIT 255 /* Not a type but a flag to allocate a non-initialized mbuf */ /* * String names of mbuf-related UMA(9) and malloc(9) types. Exposed to * !_KERNEL so that monitoring tools can look up the zones with * libmemstat(3). */ #define MBUF_MEM_NAME "mbuf" #define MBUF_CLUSTER_MEM_NAME "mbuf_cluster" #define MBUF_PACKET_MEM_NAME "mbuf_packet" #define MBUF_JUMBOP_MEM_NAME "mbuf_jumbo_page" #define MBUF_JUMBO9_MEM_NAME "mbuf_jumbo_9k" #define MBUF_JUMBO16_MEM_NAME "mbuf_jumbo_16k" #define MBUF_TAG_MEM_NAME "mbuf_tag" #define MBUF_EXTREFCNT_MEM_NAME "mbuf_ext_refcnt" #ifdef _KERNEL #ifdef WITNESS #define MBUF_CHECKSLEEP(how) do { \ if (how == M_WAITOK) \ WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, \ "Sleeping in \"%s\"", __func__); \ } while (0) #else #define MBUF_CHECKSLEEP(how) #endif /* * Network buffer allocation API * * The rest of it is defined in kern/kern_mbuf.c */ extern uma_zone_t zone_mbuf; extern uma_zone_t zone_clust; extern uma_zone_t zone_pack; extern uma_zone_t zone_jumbop; extern uma_zone_t zone_jumbo9; extern uma_zone_t zone_jumbo16; extern uma_zone_t zone_ext_refcnt; void mb_free_ext(struct mbuf *); int m_pkthdr_init(struct mbuf *, int); static __inline int m_gettype(int size) { int type; switch (size) { case MSIZE: type 
= EXT_MBUF; break; case MCLBYTES: type = EXT_CLUSTER; break; #if MJUMPAGESIZE != MCLBYTES case MJUMPAGESIZE: type = EXT_JUMBOP; break; #endif case MJUM9BYTES: type = EXT_JUMBO9; break; case MJUM16BYTES: type = EXT_JUMBO16; break; default: panic("%s: invalid cluster size %d", __func__, size); } return (type); } /* * Associated an external reference counted buffer with an mbuf. */ static __inline void m_extaddref(struct mbuf *m, caddr_t buf, u_int size, u_int *ref_cnt, void (*freef)(struct mbuf *, void *, void *), void *arg1, void *arg2) { KASSERT(ref_cnt != NULL, ("%s: ref_cnt not provided", __func__)); atomic_add_int(ref_cnt, 1); m->m_flags |= M_EXT; m->m_ext.ext_buf = buf; m->m_ext.ext_cnt = ref_cnt; m->m_data = m->m_ext.ext_buf; m->m_ext.ext_size = size; m->m_ext.ext_free = freef; m->m_ext.ext_arg1 = arg1; m->m_ext.ext_arg2 = arg2; m->m_ext.ext_type = EXT_EXTREF; } static __inline uma_zone_t m_getzone(int size) { uma_zone_t zone; switch (size) { case MCLBYTES: zone = zone_clust; break; #if MJUMPAGESIZE != MCLBYTES case MJUMPAGESIZE: zone = zone_jumbop; break; #endif case MJUM9BYTES: zone = zone_jumbo9; break; case MJUM16BYTES: zone = zone_jumbo16; break; default: panic("%s: invalid cluster size %d", __func__, size); } return (zone); } /* * Initialize an mbuf with linear storage. * * Inline because the consumer text overhead will be roughly the same to * initialize or call a function with this many parameters and M_PKTHDR * should go away with constant propagation for !MGETHDR. 
*/ static __inline int m_init(struct mbuf *m, uma_zone_t zone, int size, int how, short type, int flags) { int error; m->m_next = NULL; m->m_nextpkt = NULL; m->m_data = m->m_dat; m->m_len = 0; m->m_flags = flags; m->m_type = type; if (flags & M_PKTHDR) { if ((error = m_pkthdr_init(m, how)) != 0) return (error); } return (0); } static __inline struct mbuf * m_get(int how, short type) { struct mb_args args; args.flags = 0; args.type = type; return (uma_zalloc_arg(zone_mbuf, &args, how)); } /* * XXX This should be deprecated, very little use. */ static __inline struct mbuf * m_getclr(int how, short type) { struct mbuf *m; struct mb_args args; args.flags = 0; args.type = type; m = uma_zalloc_arg(zone_mbuf, &args, how); if (m != NULL) bzero(m->m_data, MLEN); return (m); } static __inline struct mbuf * m_gethdr(int how, short type) { struct mb_args args; args.flags = M_PKTHDR; args.type = type; return (uma_zalloc_arg(zone_mbuf, &args, how)); } static __inline struct mbuf * m_getcl(int how, short type, int flags) { struct mb_args args; args.flags = flags; args.type = type; return (uma_zalloc_arg(zone_pack, &args, how)); } static __inline int m_clget(struct mbuf *m, int how) { if (m->m_flags & M_EXT) printf("%s: %p mbuf already has external storage\n", __func__, m); m->m_ext.ext_buf = (char *)NULL; uma_zalloc_arg(zone_clust, m, how); /* * On a cluster allocation failure, drain the packet zone and retry, * we might be able to loosen a few clusters up on the drain. */ if ((how & M_NOWAIT) && (m->m_ext.ext_buf == NULL)) { zone_drain(zone_pack); uma_zalloc_arg(zone_clust, m, how); } return (m->m_flags & M_EXT); } /* * m_cljget() is different from m_clget() as it can allocate clusters without * attaching them to an mbuf. In that case the return value is the pointer * to the cluster of the requested size. If an mbuf was specified, it gets * the cluster attached to it and the return value can be safely ignored. * For size it takes MCLBYTES, MJUMPAGESIZE, MJUM9BYTES, MJUM16BYTES. 
*/ static __inline void * m_cljget(struct mbuf *m, int how, int size) { uma_zone_t zone; if (m && m->m_flags & M_EXT) printf("%s: %p mbuf already has external storage\n", __func__, m); if (m != NULL) m->m_ext.ext_buf = NULL; zone = m_getzone(size); return (uma_zalloc_arg(zone, m, how)); } static __inline void m_cljset(struct mbuf *m, void *cl, int type) { uma_zone_t zone; int size; switch (type) { case EXT_CLUSTER: size = MCLBYTES; zone = zone_clust; break; #if MJUMPAGESIZE != MCLBYTES case EXT_JUMBOP: size = MJUMPAGESIZE; zone = zone_jumbop; break; #endif case EXT_JUMBO9: size = MJUM9BYTES; zone = zone_jumbo9; break; case EXT_JUMBO16: size = MJUM16BYTES; zone = zone_jumbo16; break; default: panic("%s: unknown cluster type %d", __func__, type); break; } m->m_data = m->m_ext.ext_buf = cl; m->m_ext.ext_free = m->m_ext.ext_arg1 = m->m_ext.ext_arg2 = NULL; m->m_ext.ext_size = size; m->m_ext.ext_type = type; m->m_ext.ext_flags = 0; m->m_ext.ext_cnt = uma_find_refcnt(zone, cl); m->m_flags |= M_EXT; } static __inline void m_chtype(struct mbuf *m, short new_type) { m->m_type = new_type; } static __inline void m_clrprotoflags(struct mbuf *m) { while (m) { m->m_flags &= ~M_PROTOFLAGS; m = m->m_next; } } static __inline struct mbuf * m_last(struct mbuf *m) { while (m->m_next) m = m->m_next; return (m); } /* * mbuf, cluster, and external object allocation macros (for compatibility * purposes). 
*/ #define M_MOVE_PKTHDR(to, from) m_move_pkthdr((to), (from)) #define MGET(m, how, type) ((m) = m_get((how), (type))) #define MGETHDR(m, how, type) ((m) = m_gethdr((how), (type))) #define MCLGET(m, how) m_clget((m), (how)) #define MEXTADD(m, buf, size, free, arg1, arg2, flags, type) \ (void )m_extadd((m), (caddr_t)(buf), (size), (free), (arg1), (arg2),\ (flags), (type), M_NOWAIT) #define m_getm(m, len, how, type) \ m_getm2((m), (len), (how), (type), M_PKTHDR) /* * Evaluate TRUE if it's safe to write to the mbuf m's data region (this can * be both the local data payload, or an external buffer area, depending on * whether M_EXT is set). */ #define M_WRITABLE(m) (!((m)->m_flags & M_RDONLY) && \ (!(((m)->m_flags & M_EXT)) || \ (*((m)->m_ext.ext_cnt) == 1)) ) \ /* Check if the supplied mbuf has a packet header, or else panic. */ #define M_ASSERTPKTHDR(m) \ KASSERT((m) != NULL && (m)->m_flags & M_PKTHDR, \ ("%s: no mbuf packet header!", __func__)) /* * Ensure that the supplied mbuf is a valid, non-free mbuf. * * XXX: Broken at the moment. Need some UMA magic to make it work again. */ #define M_ASSERTVALID(m) \ KASSERT((((struct mbuf *)m)->m_flags & 0) == 0, \ ("%s: attempted use of a free mbuf!", __func__)) /* * Return the address of the start of the buffer associated with an mbuf, * handling external storage, packet-header mbufs, and regular data mbufs. */ #define M_START(m) \ (((m)->m_flags & M_EXT) ? (m)->m_ext.ext_buf : \ ((m)->m_flags & M_PKTHDR) ? &(m)->m_pktdat[0] : \ &(m)->m_dat[0]) /* * Return the size of the buffer associated with an mbuf, handling external * storage, packet-header mbufs, and regular data mbufs. */ #define M_SIZE(m) \ (((m)->m_flags & M_EXT) ? (m)->m_ext.ext_size : \ ((m)->m_flags & M_PKTHDR) ? MHLEN : \ MLEN) /* * Set the m_data pointer of a newly allocated mbuf to place an object of the * specified size at the end of the mbuf, longword aligned. 
* * NB: Historically, we had M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() as * separate macros, each asserting that it was called at the proper moment. * This required callers to themselves test the storage type and call the * right one. Rather than require callers to be aware of those layout * decisions, we centralize here. */ static __inline void m_align(struct mbuf *m, int len) { #ifdef INVARIANTS const char *msg = "%s: not a virgin mbuf"; #endif int adjust; KASSERT(m->m_data == M_START(m), (msg, __func__)); adjust = M_SIZE(m) - len; m->m_data += adjust &~ (sizeof(long)-1); } #define M_ALIGN(m, len) m_align(m, len) #define MH_ALIGN(m, len) m_align(m, len) #define MEXT_ALIGN(m, len) m_align(m, len) /* * Compute the amount of space available before the current start of data in * an mbuf. * * The M_WRITABLE() is a temporary, conservative safety measure: the burden * of checking writability of the mbuf data area rests solely with the caller. * * NB: In previous versions, M_LEADINGSPACE() would only check M_WRITABLE() * for mbufs with external storage. We now allow mbuf-embedded data to be * read-only as well. */ #define M_LEADINGSPACE(m) \ (M_WRITABLE(m) ? ((m)->m_data - M_START(m)) : 0) /* * Compute the amount of space available after the end of data in an mbuf. * * The M_WRITABLE() is a temporary, conservative safety measure: the burden * of checking writability of the mbuf data area rests solely with the caller. * * NB: In previous versions, M_TRAILINGSPACE() would only check M_WRITABLE() * for mbufs with external storage. We now allow mbuf-embedded data to be * read-only as well. */ #define M_TRAILINGSPACE(m) \ (M_WRITABLE(m) ? \ ((M_START(m) + M_SIZE(m)) - ((m)->m_data + (m)->m_len)) : 0) /* * Arrange to prepend space of size plen to mbuf m. If a new mbuf must be * allocated, how specifies whether to wait. If the allocation fails, the * original mbuf chain is freed and m is set to NULL. 
*/ #define M_PREPEND(m, plen, how) do { \ struct mbuf **_mmp = &(m); \ struct mbuf *_mm = *_mmp; \ int _mplen = (plen); \ int __mhow = (how); \ \ MBUF_CHECKSLEEP(how); \ if (M_LEADINGSPACE(_mm) >= _mplen) { \ _mm->m_data -= _mplen; \ _mm->m_len += _mplen; \ } else \ _mm = m_prepend(_mm, _mplen, __mhow); \ if (_mm != NULL && _mm->m_flags & M_PKTHDR) \ _mm->m_pkthdr.len += _mplen; \ *_mmp = _mm; \ } while (0) /* * Change mbuf to new type. This is a relatively expensive operation and * should be avoided. */ #define MCHTYPE(m, t) m_chtype((m), (t)) /* Length to m_copy to copy all. */ #define M_COPYALL 1000000000 /* Compatibility with 4.3. */ #define m_copy(m, o, l) m_copym((m), (o), (l), M_NOWAIT) extern int max_datalen; /* MHLEN - max_hdr */ extern int max_hdr; /* Largest link + protocol header */ extern int max_linkhdr; /* Largest link-level header */ extern int max_protohdr; /* Largest protocol header */ extern int nmbclusters; /* Maximum number of clusters */ struct uio; void m_adj(struct mbuf *, int); int m_apply(struct mbuf *, int, int, int (*)(void *, void *, u_int), void *); int m_append(struct mbuf *, int, c_caddr_t); void m_cat(struct mbuf *, struct mbuf *); void m_catpkt(struct mbuf *, struct mbuf *); int m_extadd(struct mbuf *, caddr_t, u_int, void (*)(struct mbuf *, void *, void *), void *, void *, int, int, int); struct mbuf *m_collapse(struct mbuf *, int, int); void m_copyback(struct mbuf *, int, int, c_caddr_t); void m_copydata(const struct mbuf *, int, int, caddr_t); struct mbuf *m_copym(struct mbuf *, int, int, int); struct mbuf *m_copypacket(struct mbuf *, int); void m_copy_pkthdr(struct mbuf *, struct mbuf *); struct mbuf *m_copyup(struct mbuf *, int, int); struct mbuf *m_defrag(struct mbuf *, int); void m_demote(struct mbuf *, int, int); struct mbuf *m_devget(char *, int, int, struct ifnet *, void (*)(char *, caddr_t, u_int)); struct mbuf *m_dup(struct mbuf *, int); int m_dup_pkthdr(struct mbuf *, struct mbuf *, int); u_int m_fixhdr(struct mbuf *); 
struct mbuf *m_fragment(struct mbuf *, int, int); void m_freem(struct mbuf *); struct mbuf *m_get2(int, int, short, int); struct mbuf *m_getjcl(int, short, int, int); struct mbuf *m_getm2(struct mbuf *, int, int, short, int); struct mbuf *m_getptr(struct mbuf *, int, int *); u_int m_length(struct mbuf *, struct mbuf **); int m_mbuftouio(struct uio *, struct mbuf *, int); void m_move_pkthdr(struct mbuf *, struct mbuf *); struct mbuf *m_prepend(struct mbuf *, int, int); void m_print(const struct mbuf *, int); struct mbuf *m_pulldown(struct mbuf *, int, int, int *); struct mbuf *m_pullup(struct mbuf *, int); int m_sanity(struct mbuf *, int); struct mbuf *m_split(struct mbuf *, int, int); struct mbuf *m_uiotombuf(struct uio *, int, int, int, int); struct mbuf *m_unshare(struct mbuf *, int); /*- * Network packets may have annotations attached by affixing a list of * "packet tags" to the pkthdr structure. Packet tags are dynamically * allocated semi-opaque data structures that have a fixed header * (struct m_tag) that specifies the size of the memory block and a * <cookie,type> pair that identifies it. The cookie is a 32-bit unique * unsigned value used to identify a module or ABI. By convention this value * is chosen as the date+time that the module is created, expressed as the * number of seconds since the epoch (e.g., using date -u +'%s'). The type * value is an ABI/module-specific value that identifies a particular * annotation and is private to the module. For compatibility with systems * like OpenBSD that define packet tags without an ABI/module cookie, the value * MTAG_ABI_COMPAT is used to implement m_tag_get and m_tag_find * compatibility shim functions and several tag types are defined below. * Users that do not require compatibility should use a private cookie value * so that packet tag-related definitions can be maintained privately. * * Note that the packet tag returned by m_tag_alloc has the default memory * alignment implemented by malloc.
To reference private data one can use a * construct like: * * struct m_tag *mtag = m_tag_alloc(...); * struct foo *p = (struct foo *)(mtag+1); * * if the alignment of struct m_tag is sufficient for referencing members of * struct foo. Otherwise it is necessary to embed struct m_tag within the * private data structure to ensure proper alignment; e.g., * * struct foo { * struct m_tag tag; * ... * }; * struct foo *p = (struct foo *) m_tag_alloc(...); * struct m_tag *mtag = &p->tag; */ /* * Persistent tags stay with an mbuf until the mbuf is reclaimed. Otherwise * tags are expected to ``vanish'' when they pass through a network * interface. For most interfaces this happens normally as the tags are * reclaimed when the mbuf is freed. However, in some special cases * reclaiming must be done manually. An example is packets that pass through * the loopback interface. Also, one must be careful to do this when * ``turning around'' packets (e.g., icmp_reflect). * * To mark a tag persistent, bit-or this flag in when defining the tag id. * The tag will then be treated as described above. */ #define MTAG_PERSISTENT 0x800 #define PACKET_TAG_NONE 0 /* None */ /* Packet tags for use with MTAG_ABI_COMPAT. */ #define PACKET_TAG_IPSEC_IN_DONE 1 /* IPsec applied, in */ #define PACKET_TAG_IPSEC_OUT_DONE 2 /* IPsec applied, out */ #define PACKET_TAG_IPSEC_IN_CRYPTO_DONE 3 /* NIC IPsec crypto done */ #define PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED 4 /* NIC IPsec crypto req'ed */ #define PACKET_TAG_IPSEC_IN_COULD_DO_CRYPTO 5 /* NIC notifies IPsec */ #define PACKET_TAG_IPSEC_PENDING_TDB 6 /* Reminder to do IPsec */ #define PACKET_TAG_BRIDGE 7 /* Bridge processing done */ #define PACKET_TAG_GIF 8 /* GIF processing done */ #define PACKET_TAG_GRE 9 /* GRE processing done */ #define PACKET_TAG_IN_PACKET_CHECKSUM 10 /* NIC checksumming done */ #define PACKET_TAG_ENCAP 11 /* Encap.
processing */ #define PACKET_TAG_IPSEC_SOCKET 12 /* IPSEC socket ref */ #define PACKET_TAG_IPSEC_HISTORY 13 /* IPSEC history */ #define PACKET_TAG_IPV6_INPUT 14 /* IPV6 input processing */ #define PACKET_TAG_DUMMYNET 15 /* dummynet info */ #define PACKET_TAG_DIVERT 17 /* divert info */ #define PACKET_TAG_IPFORWARD 18 /* ipforward info */ #define PACKET_TAG_MACLABEL (19 | MTAG_PERSISTENT) /* MAC label */ #define PACKET_TAG_PF (21 | MTAG_PERSISTENT) /* PF/ALTQ information */ #define PACKET_TAG_RTSOCKFAM 25 /* rtsock sa family */ #define PACKET_TAG_IPOPTIONS 27 /* Saved IP options */ #define PACKET_TAG_CARP 28 /* CARP info */ #define PACKET_TAG_IPSEC_NAT_T_PORTS 29 /* two uint16_t */ #define PACKET_TAG_ND_OUTGOING 30 /* ND outgoing */ /* Specific cookies and tags. */ /* Packet tag routines. */ struct m_tag *m_tag_alloc(u_int32_t, int, int, int); void m_tag_delete(struct mbuf *, struct m_tag *); void m_tag_delete_chain(struct mbuf *, struct m_tag *); void m_tag_free_default(struct m_tag *); struct m_tag *m_tag_locate(struct mbuf *, u_int32_t, int, struct m_tag *); struct m_tag *m_tag_copy(struct m_tag *, int); int m_tag_copy_chain(struct mbuf *, struct mbuf *, int); void m_tag_delete_nonpersistent(struct mbuf *); /* * Initialize the list of tags associated with an mbuf. */ static __inline void m_tag_init(struct mbuf *m) { SLIST_INIT(&m->m_pkthdr.tags); } /* * Set up the contents of a tag. Note that this does not fill in the free * method; the caller is expected to do that. * * XXX probably should be called m_tag_init, but that was already taken. */ static __inline void m_tag_setup(struct m_tag *t, u_int32_t cookie, int type, int len) { t->m_tag_id = type; t->m_tag_len = len; t->m_tag_cookie = cookie; } /* * Reclaim resources associated with a tag. */ static __inline void m_tag_free(struct m_tag *t) { (*t->m_tag_free)(t); } /* * Return the first tag associated with an mbuf. 
*/ static __inline struct m_tag * m_tag_first(struct mbuf *m) { return (SLIST_FIRST(&m->m_pkthdr.tags)); } /* * Return the next tag in the list of tags associated with an mbuf. */ static __inline struct m_tag * m_tag_next(struct mbuf *m, struct m_tag *t) { return (SLIST_NEXT(t, m_tag_link)); } /* * Prepend a tag to the list of tags associated with an mbuf. */ static __inline void m_tag_prepend(struct mbuf *m, struct m_tag *t) { SLIST_INSERT_HEAD(&m->m_pkthdr.tags, t, m_tag_link); } /* * Unlink a tag from the list of tags associated with an mbuf. */ static __inline void m_tag_unlink(struct mbuf *m, struct m_tag *t) { SLIST_REMOVE(&m->m_pkthdr.tags, t, m_tag, m_tag_link); } /* These are for OpenBSD compatibility. */ #define MTAG_ABI_COMPAT 0 /* compatibility ABI */ static __inline struct m_tag * m_tag_get(int type, int length, int wait) { return (m_tag_alloc(MTAG_ABI_COMPAT, type, length, wait)); } static __inline struct m_tag * m_tag_find(struct mbuf *m, int type, struct m_tag *start) { return (SLIST_EMPTY(&m->m_pkthdr.tags) ? 
(struct m_tag *)NULL : m_tag_locate(m, MTAG_ABI_COMPAT, type, start)); } static __inline struct mbuf * m_free(struct mbuf *m) { struct mbuf *n = m->m_next; if ((m->m_flags & (M_PKTHDR|M_NOFREE)) == (M_PKTHDR|M_NOFREE)) m_tag_delete_chain(m, NULL); if (m->m_flags & M_EXT) mb_free_ext(m); else if ((m->m_flags & M_NOFREE) == 0) uma_zfree(zone_mbuf, m); return (n); } static int inline rt_m_getfib(struct mbuf *m) { KASSERT(m->m_flags & M_PKTHDR , ("Attempt to get FIB from non header mbuf.")); return (m->m_pkthdr.fibnum); } #define M_GETFIB(_m) rt_m_getfib(_m) #define M_SETFIB(_m, _fib) do { \ KASSERT((_m)->m_flags & M_PKTHDR, ("Attempt to set FIB on non header mbuf.")); \ ((_m)->m_pkthdr.fibnum) = (_fib); \ } while (0) -#endif /* _KERNEL */ - #ifdef MBUF_PROFILING void m_profile(struct mbuf *m); #define M_PROFILE(m) m_profile(m) #else #define M_PROFILE(m) #endif struct mbufq { STAILQ_HEAD(, mbuf) mq_head; int mq_len; int mq_maxlen; }; static inline void mbufq_init(struct mbufq *mq, int maxlen) { STAILQ_INIT(&mq->mq_head); mq->mq_maxlen = maxlen; mq->mq_len = 0; } static inline struct mbuf * mbufq_flush(struct mbufq *mq) { struct mbuf *m; m = STAILQ_FIRST(&mq->mq_head); STAILQ_INIT(&mq->mq_head); mq->mq_len = 0; return (m); } static inline void mbufq_drain(struct mbufq *mq) { struct mbuf *m, *n; n = mbufq_flush(mq); while ((m = n) != NULL) { n = STAILQ_NEXT(m, m_stailqpkt); m_freem(m); } } static inline struct mbuf * mbufq_first(const struct mbufq *mq) { return (STAILQ_FIRST(&mq->mq_head)); } static inline struct mbuf * mbufq_last(const struct mbufq *mq) { return (STAILQ_LAST(&mq->mq_head, mbuf, m_stailqpkt)); } static inline int mbufq_full(const struct mbufq *mq) { return (mq->mq_len >= mq->mq_maxlen); } static inline int mbufq_len(const struct mbufq *mq) { return (mq->mq_len); } static inline int mbufq_enqueue(struct mbufq *mq, struct mbuf *m) { if (mbufq_full(mq)) return (ENOBUFS); STAILQ_INSERT_TAIL(&mq->mq_head, m, m_stailqpkt); mq->mq_len++; return (0); } static 
inline struct mbuf * mbufq_dequeue(struct mbufq *mq) { struct mbuf *m; m = STAILQ_FIRST(&mq->mq_head); if (m) { STAILQ_REMOVE_HEAD(&mq->mq_head, m_stailqpkt); mq->mq_len--; } return (m); } static inline void mbufq_prepend(struct mbufq *mq, struct mbuf *m) { STAILQ_INSERT_HEAD(&mq->mq_head, m, m_stailqpkt); mq->mq_len++; } +#endif /* _KERNEL */ #endif /* !_SYS_MBUF_H_ */ Index: projects/ifnet/sys =================================================================== --- projects/ifnet/sys (revision 279031) +++ projects/ifnet/sys (revision 279032) Property changes on: projects/ifnet/sys ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys:r278980-279031 Index: projects/ifnet/usr.bin/netstat/inet.c =================================================================== --- projects/ifnet/usr.bin/netstat/inet.c (revision 279031) +++ projects/ifnet/usr.bin/netstat/inet.c (revision 279032) @@ -1,1336 +1,1274 @@ /*- * Copyright (c) 1983, 1988, 1993, 1995 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #if 0 #ifndef lint static char sccsid[] = "@(#)inet.c 8.5 (Berkeley) 5/24/95"; #endif /* not lint */ #endif #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #endif /* INET6 */ #include #include #include #include #include #include #include #include #include #define TCPSTATES #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "netstat.h" char *inetname(struct in_addr *); void inetprint(struct in_addr *, int, const char *, int); #ifdef INET6 static int udp_done, tcp_done, sdp_done; #endif /* INET6 */ static int pcblist_sysctl(int proto, const char *name, char **bufp, int istcp __unused) { const char *mibvar; char *buf; size_t len; switch (proto) { case IPPROTO_TCP: mibvar = "net.inet.tcp.pcblist"; break; case IPPROTO_UDP: mibvar = "net.inet.udp.pcblist"; break; case IPPROTO_DIVERT: mibvar = "net.inet.divert.pcblist"; break; default: mibvar = "net.inet.raw.pcblist"; break; } if (strncmp(name, "sdp", 3) == 0) mibvar = "net.inet.sdp.pcblist"; len = 0; if (sysctlbyname(mibvar, 0, &len, 0, 0) < 0) { if 
(errno != ENOENT) warn("sysctl: %s", mibvar); return (0); } if ((buf = malloc(len)) == 0) { warnx("malloc %lu bytes", (u_long)len); return (0); } if (sysctlbyname(mibvar, buf, &len, 0, 0) < 0) { warn("sysctl: %s", mibvar); free(buf); return (0); } *bufp = buf; return (1); } /* * Copied directly from uipc_socket2.c. We leave out some fields that are in * nested structures that aren't used to avoid extra work. */ static void sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) { xsb->sb_cc = sb->sb_ccc; xsb->sb_hiwat = sb->sb_hiwat; xsb->sb_mbcnt = sb->sb_mbcnt; xsb->sb_mcnt = sb->sb_mcnt; xsb->sb_ccnt = sb->sb_ccnt; xsb->sb_mbmax = sb->sb_mbmax; xsb->sb_lowat = sb->sb_lowat; xsb->sb_flags = sb->sb_flags; xsb->sb_timeo = sb->sb_timeo; } int sotoxsocket(struct socket *so, struct xsocket *xso) { struct protosw proto; struct domain domain; bzero(xso, sizeof *xso); xso->xso_len = sizeof *xso; xso->xso_so = so; xso->so_type = so->so_type; xso->so_options = so->so_options; xso->so_linger = so->so_linger; xso->so_state = so->so_state; xso->so_pcb = so->so_pcb; if (kread((uintptr_t)so->so_proto, &proto, sizeof(proto)) != 0) return (-1); xso->xso_protocol = proto.pr_protocol; if (kread((uintptr_t)proto.pr_domain, &domain, sizeof(domain)) != 0) return (-1); xso->xso_family = domain.dom_family; xso->so_qlen = so->so_qlen; xso->so_incqlen = so->so_incqlen; xso->so_qlimit = so->so_qlimit; xso->so_timeo = so->so_timeo; xso->so_error = so->so_error; xso->so_oobmark = so->so_oobmark; sbtoxsockbuf(&so->so_snd, &xso->so_snd); sbtoxsockbuf(&so->so_rcv, &xso->so_rcv); return (0); } static int pcblist_kvm(u_long off, char **bufp, int istcp) { struct inpcbinfo pcbinfo; struct inpcbhead listhead; struct inpcb *inp; struct xinpcb xi; struct xinpgen xig; struct xtcpcb xt; struct socket so; struct xsocket *xso; char *buf, *p; size_t len; if (off == 0) return (0); kread(off, &pcbinfo, sizeof(pcbinfo)); if (istcp) len = 2 * sizeof(xig) + (pcbinfo.ipi_count + pcbinfo.ipi_count / 8) * 
sizeof(struct xtcpcb); else len = 2 * sizeof(xig) + (pcbinfo.ipi_count + pcbinfo.ipi_count / 8) * sizeof(struct xinpcb); if ((buf = malloc(len)) == 0) { warnx("malloc %lu bytes", (u_long)len); return (0); } p = buf; #define COPYOUT(obj, size) do { \ if (len < (size)) { \ warnx("buffer size exceeded"); \ goto fail; \ } \ bcopy((obj), p, (size)); \ len -= (size); \ p += (size); \ } while (0) #define KREAD(off, buf, len) do { \ if (kread((uintptr_t)(off), (buf), (len)) != 0) \ goto fail; \ } while (0) /* Write out header. */ xig.xig_len = sizeof xig; xig.xig_count = pcbinfo.ipi_count; xig.xig_gen = pcbinfo.ipi_gencnt; xig.xig_sogen = 0; COPYOUT(&xig, sizeof xig); /* Walk the PCB list. */ xt.xt_len = sizeof xt; xi.xi_len = sizeof xi; if (istcp) xso = &xt.xt_socket; else xso = &xi.xi_socket; KREAD(pcbinfo.ipi_listhead, &listhead, sizeof(listhead)); LIST_FOREACH(inp, &listhead, inp_list) { if (istcp) { KREAD(inp, &xt.xt_inp, sizeof(*inp)); inp = &xt.xt_inp; } else { KREAD(inp, &xi.xi_inp, sizeof(*inp)); inp = &xi.xi_inp; } if (inp->inp_gencnt > pcbinfo.ipi_gencnt) continue; if (istcp) { if (inp->inp_ppcb == NULL) bzero(&xt.xt_tp, sizeof xt.xt_tp); else if (inp->inp_flags & INP_TIMEWAIT) { bzero(&xt.xt_tp, sizeof xt.xt_tp); xt.xt_tp.t_state = TCPS_TIME_WAIT; } else KREAD(inp->inp_ppcb, &xt.xt_tp, sizeof xt.xt_tp); } if (inp->inp_socket) { KREAD(inp->inp_socket, &so, sizeof(so)); if (sotoxsocket(&so, xso) != 0) goto fail; } else { bzero(xso, sizeof(*xso)); if (istcp) xso->xso_protocol = IPPROTO_TCP; } if (istcp) COPYOUT(&xt, sizeof xt); else COPYOUT(&xi, sizeof xi); } /* Reread the pcbinfo and write out the footer. */ kread(off, &pcbinfo, sizeof(pcbinfo)); xig.xig_count = pcbinfo.ipi_count; xig.xig_gen = pcbinfo.ipi_gencnt; COPYOUT(&xig, sizeof xig); *bufp = buf; return (1); fail: free(buf); return (0); #undef COPYOUT #undef KREAD } /* * Print a summary of connections related to an Internet * protocol. For TCP, also give state of connection. 
* Listening processes (aflag) are suppressed unless the * -a (all) flag is specified. */ void protopr(u_long off, const char *name, int af1, int proto) { int istcp; static int first = 1; char *buf; const char *vchar; struct tcpcb *tp = NULL; struct inpcb *inp; struct xinpgen *xig, *oxig; struct xsocket *so; struct xtcp_timer *timer; istcp = 0; switch (proto) { case IPPROTO_TCP: #ifdef INET6 if (strncmp(name, "sdp", 3) != 0) { if (tcp_done != 0) return; else tcp_done = 1; } else { if (sdp_done != 0) return; else sdp_done = 1; } #endif istcp = 1; break; case IPPROTO_UDP: #ifdef INET6 if (udp_done != 0) return; else udp_done = 1; #endif break; } if (live) { if (!pcblist_sysctl(proto, name, &buf, istcp)) return; } else { if (!pcblist_kvm(off, &buf, istcp)) return; } oxig = xig = (struct xinpgen *)buf; for (xig = (struct xinpgen *)((char *)xig + xig->xig_len); xig->xig_len > sizeof(struct xinpgen); xig = (struct xinpgen *)((char *)xig + xig->xig_len)) { if (istcp) { timer = &((struct xtcpcb *)xig)->xt_timer; tp = &((struct xtcpcb *)xig)->xt_tp; inp = &((struct xtcpcb *)xig)->xt_inp; so = &((struct xtcpcb *)xig)->xt_socket; } else { inp = &((struct xinpcb *)xig)->xi_inp; so = &((struct xinpcb *)xig)->xi_socket; timer = NULL; } /* Ignore sockets for protocols other than the desired one. */ if (so->xso_protocol != proto) continue; /* Ignore PCBs which were freed during copyout. 
*/ if (inp->inp_gencnt > oxig->xig_gen) continue; if ((af1 == AF_INET && (inp->inp_vflag & INP_IPV4) == 0) #ifdef INET6 || (af1 == AF_INET6 && (inp->inp_vflag & INP_IPV6) == 0) #endif /* INET6 */ || (af1 == AF_UNSPEC && ((inp->inp_vflag & INP_IPV4) == 0 #ifdef INET6 && (inp->inp_vflag & INP_IPV6) == 0 #endif /* INET6 */ )) ) continue; if (!aflag && ( (istcp && tp->t_state == TCPS_LISTEN) || (af1 == AF_INET && inet_lnaof(inp->inp_laddr) == INADDR_ANY) #ifdef INET6 || (af1 == AF_INET6 && IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) #endif /* INET6 */ || (af1 == AF_UNSPEC && (((inp->inp_vflag & INP_IPV4) != 0 && inet_lnaof(inp->inp_laddr) == INADDR_ANY) #ifdef INET6 || ((inp->inp_vflag & INP_IPV6) != 0 && IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) #endif )) )) continue; if (first) { if (!Lflag) { printf("Active Internet connections"); if (aflag) printf(" (including servers)"); } else printf( "Current listen queue sizes (qlen/incqlen/maxqlen)"); putchar('\n'); if (Aflag) printf("%-*s ", 2 * (int)sizeof(void *), "Tcpcb"); if (Lflag) printf((Aflag && !Wflag) ? "%-5.5s %-14.14s %-18.18s" : "%-5.5s %-14.14s %-22.22s", "Proto", "Listen", "Local Address"); else if (Tflag) printf((Aflag && !Wflag) ? "%-5.5s %-6.6s %-6.6s %-6.6s %-18.18s %s" : "%-5.5s %-6.6s %-6.6s %-6.6s %-22.22s %s", "Proto", "Rexmit", "OOORcv", "0-win", "Local Address", "Foreign Address"); else { printf((Aflag && !Wflag) ? 
"%-5.5s %-6.6s %-6.6s %-18.18s %-18.18s" : "%-5.5s %-6.6s %-6.6s %-22.22s %-22.22s", "Proto", "Recv-Q", "Send-Q", "Local Address", "Foreign Address"); if (!xflag && !Rflag) printf(" (state)"); } if (xflag) { printf(" %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s %-6.6s", "R-MBUF", "S-MBUF", "R-CLUS", "S-CLUS", "R-HIWA", "S-HIWA", "R-LOWA", "S-LOWA", "R-BCNT", "S-BCNT", "R-BMAX", "S-BMAX"); printf(" %7.7s %7.7s %7.7s %7.7s %7.7s %7.7s", "rexmt", "persist", "keep", "2msl", "delack", "rcvtime"); } else if (Rflag) { printf (" %8.8s %5.5s", "flowid", "ftype"); } putchar('\n'); first = 0; } if (Lflag && so->so_qlimit == 0) continue; if (Aflag) { if (istcp) printf("%*lx ", 2 * (int)sizeof(void *), (u_long)inp->inp_ppcb); else printf("%*lx ", 2 * (int)sizeof(void *), (u_long)so->so_pcb); } #ifdef INET6 if ((inp->inp_vflag & INP_IPV6) != 0) vchar = ((inp->inp_vflag & INP_IPV4) != 0) ? "46" : "6 "; else #endif vchar = ((inp->inp_vflag & INP_IPV4) != 0) ? "4 " : " "; if (istcp && (tp->t_flags & TF_TOE) != 0) printf("%-3.3s%-2.2s ", "toe", vchar); else printf("%-3.3s%-2.2s ", name, vchar); if (Lflag) { char buf1[15]; snprintf(buf1, 15, "%d/%d/%d", so->so_qlen, so->so_incqlen, so->so_qlimit); printf("%-14.14s ", buf1); } else if (Tflag) { if (istcp) printf("%6u %6u %6u ", tp->t_sndrexmitpack, tp->t_rcvoopack, tp->t_sndzerowin); } else { printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); } if (numeric_port) { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, 1); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, 1); } /* else nothing printed now */ #endif /* INET6 */ } else if (inp->inp_flags & INP_ANONPORT) { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) 
inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, 0); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, 0); } /* else nothing printed now */ #endif /* INET6 */ } else { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 0); if (!Lflag) inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, inp->inp_lport != inp->inp_fport); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 0); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, inp->inp_lport != inp->inp_fport); } /* else nothing printed now */ #endif /* INET6 */ } if (xflag) { printf("%6u %6u %6u %6u %6u %6u %6u %6u %6u %6u %6u %6u", so->so_rcv.sb_mcnt, so->so_snd.sb_mcnt, so->so_rcv.sb_ccnt, so->so_snd.sb_ccnt, so->so_rcv.sb_hiwat, so->so_snd.sb_hiwat, so->so_rcv.sb_lowat, so->so_snd.sb_lowat, so->so_rcv.sb_mbcnt, so->so_snd.sb_mbcnt, so->so_rcv.sb_mbmax, so->so_snd.sb_mbmax); if (timer != NULL) printf(" %4d.%02d %4d.%02d %4d.%02d %4d.%02d %4d.%02d %4d.%02d", timer->tt_rexmt / 1000, (timer->tt_rexmt % 1000) / 10, timer->tt_persist / 1000, (timer->tt_persist % 1000) / 10, timer->tt_keep / 1000, (timer->tt_keep % 1000) / 10, timer->tt_2msl / 1000, (timer->tt_2msl % 1000) / 10, timer->tt_delack / 1000, (timer->tt_delack % 1000) / 10, timer->t_rcvtime / 1000, (timer->t_rcvtime % 1000) / 10); } if (istcp && !Lflag && !xflag && !Tflag && !Rflag) { if (tp->t_state < 0 || tp->t_state >= TCP_NSTATES) printf("%d", tp->t_state); else { printf("%s", tcpstates[tp->t_state]); #if defined(TF_NEEDSYN) && defined(TF_NEEDFIN) /* Show T/TCP `hidden state' */ if (tp->t_flags & (TF_NEEDSYN|TF_NEEDFIN)) putchar('*'); #endif /* defined(TF_NEEDSYN) && defined(TF_NEEDFIN) */ } } if (Rflag) { printf(" %08x %5d", inp->inp_flowid, inp->inp_flowtype); } putchar('\n'); } if (xig != oxig && xig->xig_gen 
!= oxig->xig_gen) { if (oxig->xig_count > xig->xig_count) { printf("Some %s sockets may have been deleted.\n", name); } else if (oxig->xig_count < xig->xig_count) { printf("Some %s sockets may have been created.\n", name); } else { printf( "Some %s sockets may have been created or deleted.\n", name); } } free(buf); } /* * Dump TCP statistics structure. */ void tcp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct tcpstat tcpstat, zerostat; size_t len = sizeof tcpstat; #ifdef INET6 if (tcp_done != 0) return; else tcp_done = 1; #endif if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.tcp.stats", &tcpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.tcp.stats"); return; } } else kread_counters(off, &tcpstat, len); printf ("%s:\n", name); #define p(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, (uintmax_t )tcpstat.f, plural(tcpstat.f)) #define p1a(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, (uintmax_t )tcpstat.f) #define p2(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \ printf(m, (uintmax_t )tcpstat.f1, plural(tcpstat.f1), \ (uintmax_t )tcpstat.f2, plural(tcpstat.f2)) #define p2a(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \ printf(m, (uintmax_t )tcpstat.f1, plural(tcpstat.f1), \ (uintmax_t )tcpstat.f2) #define p3(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, (uintmax_t )tcpstat.f, pluralies(tcpstat.f)) p(tcps_sndtotal, "\t%ju packet%s sent\n"); p2(tcps_sndpack,tcps_sndbyte, "\t\t%ju data packet%s (%ju byte%s)\n"); p2(tcps_sndrexmitpack, tcps_sndrexmitbyte, "\t\t%ju data packet%s (%ju byte%s) retransmitted\n"); p(tcps_sndrexmitbad, "\t\t%ju data packet%s unnecessarily retransmitted\n"); p(tcps_mturesent, "\t\t%ju resend%s initiated by MTU discovery\n"); p2a(tcps_sndacks, tcps_delack, "\t\t%ju ack-only packet%s (%ju delayed)\n"); p(tcps_sndurg, "\t\t%ju URG only packet%s\n"); p(tcps_sndprobe, "\t\t%ju window probe packet%s\n"); p(tcps_sndwinup, 
"\t\t%ju window update packet%s\n"); p(tcps_sndctrl, "\t\t%ju control packet%s\n"); p(tcps_rcvtotal, "\t%ju packet%s received\n"); p2(tcps_rcvackpack, tcps_rcvackbyte, "\t\t%ju ack%s (for %ju byte%s)\n"); p(tcps_rcvdupack, "\t\t%ju duplicate ack%s\n"); p(tcps_rcvacktoomuch, "\t\t%ju ack%s for unsent data\n"); p2(tcps_rcvpack, tcps_rcvbyte, "\t\t%ju packet%s (%ju byte%s) received in-sequence\n"); p2(tcps_rcvduppack, tcps_rcvdupbyte, "\t\t%ju completely duplicate packet%s (%ju byte%s)\n"); p(tcps_pawsdrop, "\t\t%ju old duplicate packet%s\n"); p2(tcps_rcvpartduppack, tcps_rcvpartdupbyte, "\t\t%ju packet%s with some dup. data (%ju byte%s duped)\n"); p2(tcps_rcvoopack, tcps_rcvoobyte, "\t\t%ju out-of-order packet%s (%ju byte%s)\n"); p2(tcps_rcvpackafterwin, tcps_rcvbyteafterwin, "\t\t%ju packet%s (%ju byte%s) of data after window\n"); p(tcps_rcvwinprobe, "\t\t%ju window probe%s\n"); p(tcps_rcvwinupd, "\t\t%ju window update packet%s\n"); p(tcps_rcvafterclose, "\t\t%ju packet%s received after close\n"); p(tcps_rcvbadsum, "\t\t%ju discarded for bad checksum%s\n"); p(tcps_rcvbadoff, "\t\t%ju discarded for bad header offset field%s\n"); p1a(tcps_rcvshort, "\t\t%ju discarded because packet too short\n"); p1a(tcps_rcvreassfull, "\t\t%ju discarded due to no space in reassembly queue\n"); p(tcps_connattempt, "\t%ju connection request%s\n"); p(tcps_accepts, "\t%ju connection accept%s\n"); p(tcps_badsyn, "\t%ju bad connection attempt%s\n"); p(tcps_listendrop, "\t%ju listen queue overflow%s\n"); p(tcps_badrst, "\t%ju ignored RSTs in the window%s\n"); p(tcps_connects, "\t%ju connection%s established (including accepts)\n"); p2(tcps_closed, tcps_drops, "\t%ju connection%s closed (including %ju drop%s)\n"); p(tcps_cachedrtt, "\t\t%ju connection%s updated cached RTT on close\n"); p(tcps_cachedrttvar, "\t\t%ju connection%s updated cached RTT variance on close\n"); p(tcps_cachedssthresh, "\t\t%ju connection%s updated cached ssthresh on close\n"); p(tcps_conndrops, "\t%ju embryonic 
connection%s dropped\n"); p2(tcps_rttupdated, tcps_segstimed, "\t%ju segment%s updated rtt (of %ju attempt%s)\n"); p(tcps_rexmttimeo, "\t%ju retransmit timeout%s\n"); p(tcps_timeoutdrop, "\t\t%ju connection%s dropped by rexmit timeout\n"); p(tcps_persisttimeo, "\t%ju persist timeout%s\n"); p(tcps_persistdrop, "\t\t%ju connection%s dropped by persist timeout\n"); p(tcps_finwait2_drops, "\t%ju Connection%s (fin_wait_2) dropped because of timeout\n"); p(tcps_keeptimeo, "\t%ju keepalive timeout%s\n"); p(tcps_keepprobe, "\t\t%ju keepalive probe%s sent\n"); p(tcps_keepdrops, "\t\t%ju connection%s dropped by keepalive\n"); p(tcps_predack, "\t%ju correct ACK header prediction%s\n"); p(tcps_preddat, "\t%ju correct data packet header prediction%s\n"); p3(tcps_sc_added, "\t%ju syncache entr%s added\n"); p1a(tcps_sc_retransmitted, "\t\t%ju retransmitted\n"); p1a(tcps_sc_dupsyn, "\t\t%ju dupsyn\n"); p1a(tcps_sc_dropped, "\t\t%ju dropped\n"); p1a(tcps_sc_completed, "\t\t%ju completed\n"); p1a(tcps_sc_bucketoverflow, "\t\t%ju bucket overflow\n"); p1a(tcps_sc_cacheoverflow, "\t\t%ju cache overflow\n"); p1a(tcps_sc_reset, "\t\t%ju reset\n"); p1a(tcps_sc_stale, "\t\t%ju stale\n"); p1a(tcps_sc_aborted, "\t\t%ju aborted\n"); p1a(tcps_sc_badack, "\t\t%ju badack\n"); p1a(tcps_sc_unreach, "\t\t%ju unreach\n"); p(tcps_sc_zonefail, "\t\t%ju zone failure%s\n"); p(tcps_sc_sendcookie, "\t%ju cookie%s sent\n"); p(tcps_sc_recvcookie, "\t%ju cookie%s received\n"); p3(tcps_hc_added, "\t%ju hostcache entr%s added\n"); p1a(tcps_hc_bucketoverflow, "\t\t%ju bucket overflow\n"); p(tcps_sack_recovery_episode, "\t%ju SACK recovery episode%s\n"); p(tcps_sack_rexmits, "\t%ju segment rexmit%s in SACK recovery episodes\n"); p(tcps_sack_rexmit_bytes, "\t%ju byte rexmit%s in SACK recovery episodes\n"); p(tcps_sack_rcv_blocks, "\t%ju SACK option%s (SACK blocks) received\n"); p(tcps_sack_send_blocks, "\t%ju SACK option%s (SACK blocks) sent\n"); p1a(tcps_sack_sboverflow, "\t%ju SACK scoreboard overflow\n"); 
p(tcps_ecn_ce, "\t%ju packet%s with ECN CE bit set\n"); p(tcps_ecn_ect0, "\t%ju packet%s with ECN ECT(0) bit set\n"); p(tcps_ecn_ect1, "\t%ju packet%s with ECN ECT(1) bit set\n"); p(tcps_ecn_shs, "\t%ju successful ECN handshake%s\n"); p(tcps_ecn_rcwnd, "\t%ju time%s ECN reduced the congestion window\n"); p(tcps_sig_rcvgoodsig, "\t%ju packet%s with valid tcp-md5 signature received\n"); p(tcps_sig_rcvbadsig, "\t%ju packet%s with invalid tcp-md5 signature received\n"); p(tcps_sig_err_buildsig, "\t%ju packet%s with tcp-md5 signature mismatch\n"); p(tcps_sig_err_sigopt, "\t%ju packet%s with unexpected tcp-md5 signature received\n"); p(tcps_sig_err_nosigopt, "\t%ju packet%s without expected tcp-md5 signature received\n"); #undef p #undef p1a #undef p2 #undef p2a #undef p3 } /* * Dump UDP statistics structure. */ void udp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct udpstat udpstat, zerostat; size_t len = sizeof udpstat; uint64_t delivered; #ifdef INET6 if (udp_done != 0) return; else udp_done = 1; #endif if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.udp.stats", &udpstat, &len, zflag ? &zerostat : NULL, zflag ? 
len : 0) < 0) { warn("sysctl: net.inet.udp.stats"); return; } } else kread_counters(off, &udpstat, len); printf("%s:\n", name); #define p(f, m) if (udpstat.f || sflag <= 1) \ printf("\t%ju " m, (uintmax_t)udpstat.f, plural(udpstat.f)) #define p1a(f, m) if (udpstat.f || sflag <= 1) \ printf("\t%ju " m, (uintmax_t)udpstat.f) p(udps_ipackets, "datagram%s received\n"); p1a(udps_hdrops, "with incomplete header\n"); p1a(udps_badlen, "with bad data length field\n"); p1a(udps_badsum, "with bad checksum\n"); p1a(udps_nosum, "with no checksum\n"); p1a(udps_noport, "dropped due to no socket\n"); p(udps_noportbcast, "broadcast/multicast datagram%s undelivered\n"); p1a(udps_fullsock, "dropped due to full socket buffers\n"); p1a(udpps_pcbhashmiss, "not for hashed pcb\n"); delivered = udpstat.udps_ipackets - udpstat.udps_hdrops - udpstat.udps_badlen - udpstat.udps_badsum - udpstat.udps_noport - udpstat.udps_noportbcast - udpstat.udps_fullsock; if (delivered || sflag <= 1) printf("\t%ju delivered\n", (uint64_t)delivered); p(udps_opackets, "datagram%s output\n"); /* the next statistic is cumulative in udps_noportbcast */ p(udps_filtermcast, "time%s multicast source filter matched\n"); #undef p #undef p1a } /* * Dump CARP statistics structure. */ void carp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct carpstats carpstat, zerostat; size_t len = sizeof(struct carpstats); if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.carp.stats", &carpstat, &len, zflag ? &zerostat : NULL, zflag ? 
len : 0) < 0) { if (errno != ENOENT) warn("sysctl: net.inet.carp.stats"); return; } } else { if (off == 0) return; kread_counters(off, &carpstat, len); } printf("%s:\n", name); #define p(f, m) if (carpstat.f || sflag <= 1) \ printf(m, (uintmax_t)carpstat.f, plural(carpstat.f)) #define p2(f, m) if (carpstat.f || sflag <= 1) \ printf(m, (uintmax_t)carpstat.f) p(carps_ipackets, "\t%ju packet%s received (IPv4)\n"); p(carps_ipackets6, "\t%ju packet%s received (IPv6)\n"); p(carps_badttl, "\t\t%ju packet%s discarded for wrong TTL\n"); p(carps_hdrops, "\t\t%ju packet%s shorter than header\n"); p(carps_badsum, "\t\t%ju discarded for bad checksum%s\n"); p(carps_badver, "\t\t%ju discarded packet%s with a bad version\n"); p2(carps_badlen, "\t\t%ju discarded because packet too short\n"); p2(carps_badauth, "\t\t%ju discarded for bad authentication\n"); p2(carps_badvhid, "\t\t%ju discarded for bad vhid\n"); p2(carps_badaddrs, "\t\t%ju discarded because of a bad address list\n"); p(carps_opackets, "\t%ju packet%s sent (IPv4)\n"); p(carps_opackets6, "\t%ju packet%s sent (IPv6)\n"); p2(carps_onomem, "\t\t%ju send failed due to mbuf memory error\n"); #if notyet p(carps_ostates, "\t\t%s state update%s sent\n"); #endif #undef p #undef p2 } /* * Dump IP statistics structure. */ void ip_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct ipstat ipstat, zerostat; size_t len = sizeof ipstat; if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.ip.stats", &ipstat, &len, zflag ? &zerostat : NULL, zflag ? 
len : 0) < 0) { warn("sysctl: net.inet.ip.stats"); return; } } else kread_counters(off, &ipstat, len); printf("%s:\n", name); #define p(f, m) if (ipstat.f || sflag <= 1) \ printf(m, (uintmax_t )ipstat.f, plural(ipstat.f)) #define p1a(f, m) if (ipstat.f || sflag <= 1) \ printf(m, (uintmax_t )ipstat.f) p(ips_total, "\t%ju total packet%s received\n"); p(ips_badsum, "\t%ju bad header checksum%s\n"); p1a(ips_toosmall, "\t%ju with size smaller than minimum\n"); p1a(ips_tooshort, "\t%ju with data size < data length\n"); p1a(ips_toolong, "\t%ju with ip length > max ip packet size\n"); p1a(ips_badhlen, "\t%ju with header length < data size\n"); p1a(ips_badlen, "\t%ju with data length < header length\n"); p1a(ips_badoptions, "\t%ju with bad options\n"); p1a(ips_badvers, "\t%ju with incorrect version number\n"); p(ips_fragments, "\t%ju fragment%s received\n"); p(ips_fragdropped, "\t%ju fragment%s dropped (dup or out of space)\n"); p(ips_fragtimeout, "\t%ju fragment%s dropped after timeout\n"); p(ips_reassembled, "\t%ju packet%s reassembled ok\n"); p(ips_delivered, "\t%ju packet%s for this host\n"); p(ips_noproto, "\t%ju packet%s for unknown/unsupported protocol\n"); p(ips_forward, "\t%ju packet%s forwarded"); p(ips_fastforward, " (%ju packet%s fast forwarded)"); if (ipstat.ips_forward || sflag <= 1) putchar('\n'); p(ips_cantforward, "\t%ju packet%s not forwardable\n"); p(ips_notmember, "\t%ju packet%s received for unknown multicast group\n"); p(ips_redirectsent, "\t%ju redirect%s sent\n"); p(ips_localout, "\t%ju packet%s sent from this host\n"); p(ips_rawout, "\t%ju packet%s sent with fabricated ip header\n"); p(ips_odropped, "\t%ju output packet%s dropped due to no bufs, etc.\n"); p(ips_noroute, "\t%ju output packet%s discarded due to no route\n"); p(ips_fragmented, "\t%ju output datagram%s fragmented\n"); p(ips_ofragments, "\t%ju fragment%s created\n"); p(ips_cantfrag, "\t%ju datagram%s that can't be fragmented\n"); p(ips_nogif, "\t%ju tunneling packet%s that can't find 
gif\n"); p(ips_badaddr, "\t%ju datagram%s with bad address in header\n"); #undef p #undef p1a } /* * Dump ARP statistics structure. */ void arp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct arpstat arpstat, zerostat; size_t len = sizeof(arpstat); if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.link.ether.arp.stats", &arpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.link.ether.arp.stats"); return; } } else kread_counters(off, &arpstat, len); printf("%s:\n", name); #define p(f, m) if (arpstat.f || sflag <= 1) \ printf("\t%ju " m, (uintmax_t)arpstat.f, plural(arpstat.f)) #define p2(f, m) if (arpstat.f || sflag <= 1) \ printf("\t%ju " m, (uintmax_t)arpstat.f, pluralies(arpstat.f)) p(txrequests, "ARP request%s sent\n"); p2(txreplies, "ARP repl%s sent\n"); p(rxrequests, "ARP request%s received\n"); p2(rxreplies, "ARP repl%s received\n"); p(received, "ARP packet%s received\n"); p(dropped, "total packet%s dropped due to no ARP entry\n"); p(timeouts, "ARP entry%s timed out\n"); p(dupips, "Duplicate IP%s seen\n"); #undef p #undef p2 } static const char *icmpnames[ICMP_MAXTYPE + 1] = { "echo reply", /* RFC 792 */ "#1", "#2", "destination unreachable", /* RFC 792 */ "source quench", /* RFC 792 */ "routing redirect", /* RFC 792 */ "#6", "#7", "echo", /* RFC 792 */ "router advertisement", /* RFC 1256 */ "router solicitation", /* RFC 1256 */ "time exceeded", /* RFC 792 */ "parameter problem", /* RFC 792 */ "time stamp", /* RFC 792 */ "time stamp reply", /* RFC 792 */ "information request", /* RFC 792 */ "information request reply", /* RFC 792 */ "address mask request", /* RFC 950 */ "address mask reply", /* RFC 950 */ "#19", "#20", "#21", "#22", "#23", "#24", "#25", "#26", "#27", "#28", "#29", "icmp traceroute", /* RFC 1393 */ "datagram conversion error", /* RFC 1475 */ "mobile host redirect", "IPv6 where-are-you", "IPv6 i-am-here", "mobile registration req", "mobile registration 
reply", "domain name request", /* RFC 1788 */ "domain name reply", /* RFC 1788 */ "icmp SKIP", "icmp photuris", /* RFC 2521 */ }; /* * Dump ICMP statistics. */ void icmp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct icmpstat icmpstat, zerostat; int i, first; size_t len; len = sizeof icmpstat; if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.icmp.stats", &icmpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.icmp.stats"); return; } } else kread_counters(off, &icmpstat, len); printf("%s:\n", name); #define p(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f, plural(icmpstat.f)) #define p1a(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f) #define p2(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f, plurales(icmpstat.f)) p(icps_error, "\t%lu call%s to icmp_error\n"); p(icps_oldicmp, "\t%lu error%s not generated in response to an icmp message\n"); for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++) if (icmpstat.icps_outhist[i] != 0) { if (first) { printf("\tOutput histogram:\n"); first = 0; } if (icmpnames[i] != NULL) printf("\t\t%s: %lu\n", icmpnames[i], icmpstat.icps_outhist[i]); else printf("\t\tunknown ICMP #%d: %lu\n", i, icmpstat.icps_outhist[i]); } p(icps_badcode, "\t%lu message%s with bad code fields\n"); p(icps_tooshort, "\t%lu message%s less than the minimum length\n"); p(icps_checksum, "\t%lu message%s with bad checksum\n"); p(icps_badlen, "\t%lu message%s with bad length\n"); p1a(icps_bmcastecho, "\t%lu multicast echo requests ignored\n"); p1a(icps_bmcasttstamp, "\t%lu multicast timestamp requests ignored\n"); for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++) if (icmpstat.icps_inhist[i] != 0) { if (first) { printf("\tInput histogram:\n"); first = 0; } if (icmpnames[i] != NULL) printf("\t\t%s: %lu\n", icmpnames[i], icmpstat.icps_inhist[i]); else printf("\t\tunknown ICMP #%d: %lu\n", i, icmpstat.icps_inhist[i]); } p(icps_reflect, 
"\t%lu message response%s generated\n"); p2(icps_badaddr, "\t%lu invalid return address%s\n"); p(icps_noroute, "\t%lu no return route%s\n"); #undef p #undef p1a #undef p2 if (live) { len = sizeof i; if (sysctlbyname("net.inet.icmp.maskrepl", &i, &len, NULL, 0) < 0) return; printf("\tICMP address mask responses are %sabled\n", i ? "en" : "dis"); } } -#ifndef BURN_BRIDGES /* - * Dump IGMP statistics structure (pre 8.x kernel). - */ -static void -igmp_stats_live_old(const char *name) -{ - struct oigmpstat oigmpstat, zerostat; - size_t len = sizeof(oigmpstat); - - if (zflag) - memset(&zerostat, 0, len); - if (sysctlbyname("net.inet.igmp.stats", &oigmpstat, &len, - zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { - warn("sysctl: net.inet.igmp.stats"); - return; - } - - printf("%s:\n", name); - -#define p(f, m) if (oigmpstat.f || sflag <= 1) \ - printf(m, oigmpstat.f, plural(oigmpstat.f)) -#define py(f, m) if (oigmpstat.f || sflag <= 1) \ - printf(m, oigmpstat.f, oigmpstat.f != 1 ? "ies" : "y") - p(igps_rcv_total, "\t%u message%s received\n"); - p(igps_rcv_tooshort, "\t%u message%s received with too few bytes\n"); - p(igps_rcv_badsum, "\t%u message%s received with bad checksum\n"); - py(igps_rcv_queries, "\t%u membership quer%s received\n"); - py(igps_rcv_badqueries, - "\t%u membership quer%s received with invalid field(s)\n"); - p(igps_rcv_reports, "\t%u membership report%s received\n"); - p(igps_rcv_badreports, - "\t%u membership report%s received with invalid field(s)\n"); - p(igps_rcv_ourreports, -"\t%u membership report%s received for groups to which we belong\n"); - p(igps_snd_reports, "\t%u membership report%s sent\n"); -#undef p -#undef py -} -#endif /* !BURN_BRIDGES */ - -/* * Dump IGMP statistics structure. */ void igmp_stats(u_long off, const char *name, int af1 __unused, int proto __unused) { struct igmpstat igmpstat, zerostat; size_t len; - -#ifndef BURN_BRIDGES - if (live) { - /* - * Detect if we are being run against a pre-IGMPv3 kernel. 
- * We cannot do this for a core file as the legacy - * struct igmpstat has no size field, nor does it - * export it in any readily-available symbols. - */ - len = 0; - if (sysctlbyname("net.inet.igmp.stats", NULL, &len, NULL, - 0) < 0) { - warn("sysctl: net.inet.igmp.stats"); - return; - } - if (len < sizeof(igmpstat)) { - igmp_stats_live_old(name); - return; - } - } -#endif /* !BURN_BRIDGES */ len = sizeof(igmpstat); if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.igmp.stats", &igmpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.igmp.stats"); return; } } else { len = sizeof(igmpstat); kread(off, &igmpstat, len); } if (igmpstat.igps_version != IGPS_VERSION_3) { warnx("%s: version mismatch (%d != %d)", __func__, igmpstat.igps_version, IGPS_VERSION_3); } if (igmpstat.igps_len != IGPS_VERSION3_LEN) { warnx("%s: size mismatch (%d != %d)", __func__, igmpstat.igps_len, IGPS_VERSION3_LEN); } printf("%s:\n", name); #define p64(f, m) if (igmpstat.f || sflag <= 1) \ printf(m, (uintmax_t) igmpstat.f, plural(igmpstat.f)) #define py64(f, m) if (igmpstat.f || sflag <= 1) \ printf(m, (uintmax_t) igmpstat.f, pluralies(igmpstat.f)) p64(igps_rcv_total, "\t%ju message%s received\n"); p64(igps_rcv_tooshort, "\t%ju message%s received with too few bytes\n"); p64(igps_rcv_badttl, "\t%ju message%s received with wrong TTL\n"); p64(igps_rcv_badsum, "\t%ju message%s received with bad checksum\n"); py64(igps_rcv_v1v2_queries, "\t%ju V1/V2 membership quer%s received\n"); py64(igps_rcv_v3_queries, "\t%ju V3 membership quer%s received\n"); py64(igps_rcv_badqueries, "\t%ju membership quer%s received with invalid field(s)\n"); py64(igps_rcv_gen_queries, "\t%ju general quer%s received\n"); py64(igps_rcv_group_queries, "\t%ju group quer%s received\n"); py64(igps_rcv_gsr_queries, "\t%ju group-source quer%s received\n"); py64(igps_drop_gsr_queries, "\t%ju group-source quer%s dropped\n"); p64(igps_rcv_reports, "\t%ju membership report%s 
received\n"); p64(igps_rcv_badreports, "\t%ju membership report%s received with invalid field(s)\n"); p64(igps_rcv_ourreports, "\t%ju membership report%s received for groups to which we belong\n"); p64(igps_rcv_nora, "\t%ju V3 report%s received without Router Alert\n"); p64(igps_snd_reports, "\t%ju membership report%s sent\n"); #undef p64 #undef py64 } /* * Dump PIM statistics structure. */ void pim_stats(u_long off __unused, const char *name, int af1 __unused, int proto __unused) { struct pimstat pimstat, zerostat; size_t len = sizeof pimstat; if (live) { if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.pim.stats", &pimstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { if (errno != ENOENT) warn("sysctl: net.inet.pim.stats"); return; } } else { if (off == 0) return; kread_counters(off, &pimstat, len); } printf("%s:\n", name); #define p(f, m) if (pimstat.f || sflag <= 1) \ printf(m, (uintmax_t)pimstat.f, plural(pimstat.f)) #define py(f, m) if (pimstat.f || sflag <= 1) \ printf(m, (uintmax_t)pimstat.f, pimstat.f != 1 ? "ies" : "y") p(pims_rcv_total_msgs, "\t%ju message%s received\n"); p(pims_rcv_total_bytes, "\t%ju byte%s received\n"); p(pims_rcv_tooshort, "\t%ju message%s received with too few bytes\n"); p(pims_rcv_badsum, "\t%ju message%s received with bad checksum\n"); p(pims_rcv_badversion, "\t%ju message%s received with bad version\n"); p(pims_rcv_registers_msgs, "\t%ju data register message%s received\n"); p(pims_rcv_registers_bytes, "\t%ju data register byte%s received\n"); p(pims_rcv_registers_wrongiif, "\t%ju data register message%s received on wrong iif\n"); p(pims_rcv_badregisters, "\t%ju bad register%s received\n"); p(pims_snd_registers_msgs, "\t%ju data register message%s sent\n"); p(pims_snd_registers_bytes, "\t%ju data register byte%s sent\n"); #undef p #undef py } /* * Pretty print an Internet address (net address + port). 
*/ void inetprint(struct in_addr *in, int port, const char *proto, int num_port) { struct servent *sp = 0; char line[80], *cp; int width; if (Wflag) sprintf(line, "%s.", inetname(in)); else sprintf(line, "%.*s.", (Aflag && !num_port) ? 12 : 16, inetname(in)); cp = strchr(line, '\0'); if (!num_port && port) sp = getservbyport((int)port, proto); if (sp || port == 0) sprintf(cp, "%.15s ", sp ? sp->s_name : "*"); else sprintf(cp, "%d ", ntohs((u_short)port)); width = (Aflag && !Wflag) ? 18 : 22; if (Wflag) printf("%-*s ", width, line); else printf("%-*.*s ", width, width, line); } /* * Construct an Internet address representation. * If numeric_addr has been supplied, give * numeric value, otherwise try for symbolic name. */ char * inetname(struct in_addr *inp) { char *cp; static char line[MAXHOSTNAMELEN]; struct hostent *hp; struct netent *np; cp = 0; if (!numeric_addr && inp->s_addr != INADDR_ANY) { int net = inet_netof(*inp); int lna = inet_lnaof(*inp); if (lna == INADDR_ANY) { np = getnetbyaddr(net, AF_INET); if (np) cp = np->n_name; } if (cp == 0) { hp = gethostbyaddr((char *)inp, sizeof (*inp), AF_INET); if (hp) { cp = hp->h_name; trimdomain(cp, strlen(cp)); } } } if (inp->s_addr == INADDR_ANY) strcpy(line, "*"); else if (cp) { strlcpy(line, cp, sizeof(line)); } else { inp->s_addr = ntohl(inp->s_addr); #define C(x) ((u_int)((x) & 0xff)) sprintf(line, "%u.%u.%u.%u", C(inp->s_addr >> 24), C(inp->s_addr >> 16), C(inp->s_addr >> 8), C(inp->s_addr)); } return (line); } Index: projects/ifnet/usr.sbin/ifmcstat/Makefile =================================================================== --- projects/ifnet/usr.sbin/ifmcstat/Makefile (revision 279031) +++ projects/ifnet/usr.sbin/ifmcstat/Makefile (revision 279032) @@ -1,23 +1,18 @@ # @(#)Makefile 8.1 (Berkeley) 6/5/93 # $FreeBSD$ .include PROG= ifmcstat SRCS= ifmcstat.c printb.c MAN= ifmcstat.8 BINMODE= 555 WARNS?= 2 .if ${MK_INET6_SUPPORT} != "no" CFLAGS+=-DINET6 .endif -.if ${MK_KVM_SUPPORT} != "no" -CFLAGS+=-DWITH_KVM 
-LIBADD= kvm -.endif - .include Index: projects/ifnet/usr.sbin/ifmcstat/ifmcstat.c =================================================================== --- projects/ifnet/usr.sbin/ifmcstat/ifmcstat.c (revision 279031) +++ projects/ifnet/usr.sbin/ifmcstat/ifmcstat.c (revision 279032) @@ -1,1247 +1,1247 @@ /* $KAME: ifmcstat.c,v 1.48 2006/11/15 05:13:59 itojun Exp $ */ /* * Copyright (c) 2007-2009 Bruce Simpson. * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include -#define _WANT_IFADDR #include #include #include #include #include #include #include #include #include -#define KERNEL -# include -#undef KERNEL -#define _KERNEL -#define SYSCTL_DECL(x) -# include -#undef SYSCTL_DECL -#undef _KERNEL +#include +#include #ifdef INET6 #include -#define _KERNEL -# include -#undef _KERNEL +#include #endif /* INET6 */ #include #include #include #include #include #include #include #include #include #include #include #include -#include #include #include -#include #include #include -/* XXX: This file currently assumes INET and KVM support in the base system. */ +#ifdef KVM +/* + * Currently the KVM build is broken. To be fixed it requires uncovering + * large amount of _KERNEL code in include files, and it is also very + * tentative to internal kernel ABI changes. If anyone wishes to restore + * it, please move it out of src/usr.sbin to src/tools/tools. + */ +#include +#include +#endif + +/* XXX: This file currently assumes INET support in the base system. */ #ifndef INET #define INET #endif extern void printb(const char *, unsigned int, const char *); union sockunion { struct sockaddr_storage ss; struct sockaddr sa; struct sockaddr_dl sdl; #ifdef INET struct sockaddr_in sin; #endif #ifdef INET6 struct sockaddr_in6 sin6; #endif }; typedef union sockunion sockunion_t; uint32_t ifindex = 0; int af = AF_UNSPEC; #ifdef WITH_KVM int Kflag = 0; #endif int vflag = 0; #define sa_dl_equal(a1, a2) \ ((((struct sockaddr_dl *)(a1))->sdl_len == \ ((struct sockaddr_dl *)(a2))->sdl_len) && \ (bcmp(LLADDR((struct sockaddr_dl *)(a1)), \ LLADDR((struct sockaddr_dl *)(a2)), \ ((struct sockaddr_dl *)(a1))->sdl_alen) == 0)) /* * Most of the code in this utility is to support the use of KVM for * post-mortem debugging of the multicast code. 
*/ #ifdef WITH_KVM #ifdef INET static void if_addrlist(struct ifaddr *); static struct in_multi * in_multientry(struct in_multi *); #endif /* INET */ #ifdef INET6 static void if6_addrlist(struct ifaddr *); static struct in6_multi * in6_multientry(struct in6_multi *); #endif /* INET6 */ static void kread(u_long, void *, int); static void ll_addrlist(struct ifaddr *); static int ifmcstat_kvm(const char *kernel, const char *core); #define KREAD(addr, buf, type) \ kread((u_long)addr, (void *)buf, sizeof(type)) kvm_t *kvmd; struct nlist nl[] = { { "_ifnet", 0, 0, 0, 0, }, { "", 0, 0, 0, 0, }, }; #define N_IFNET 0 #endif /* WITH_KVM */ static int ifmcstat_getifmaddrs(void); #ifdef INET static void in_ifinfo(struct igmp_ifinfo *); static const char * inm_mode(u_int mode); #endif #ifdef INET6 static void in6_ifinfo(struct mld_ifinfo *); static const char * inet6_n2a(struct in6_addr *, uint32_t); #endif int main(int, char **); static void usage() { fprintf(stderr, "usage: ifmcstat [-i interface] [-f address family]" " [-v]" #ifdef WITH_KVM " [-K] [-M core] [-N system]" #endif "\n"); exit(EX_USAGE); } static const char *options = "i:f:vM:N:" #ifdef WITH_KVM "K" #endif ; int main(int argc, char **argv) { int c, error; #ifdef WITH_KVM const char *kernel = NULL; const char *core = NULL; #endif while ((c = getopt(argc, argv, options)) != -1) { switch (c) { case 'i': if ((ifindex = if_nametoindex(optarg)) == 0) { fprintf(stderr, "%s: unknown interface\n", optarg); exit(EX_NOHOST); } break; case 'f': #ifdef INET if (strcmp(optarg, "inet") == 0) { af = AF_INET; break; } #endif #ifdef INET6 if (strcmp(optarg, "inet6") == 0) { af = AF_INET6; break; } #endif if (strcmp(optarg, "link") == 0) { af = AF_LINK; break; } fprintf(stderr, "%s: unknown address family\n", optarg); exit(EX_USAGE); /*NOTREACHED*/ break; #ifdef WITH_KVM case 'K': ++Kflag; break; #endif case 'v': ++vflag; break; #ifdef WITH_KVM case 'M': core = strdup(optarg); break; case 'N': kernel = strdup(optarg); break; #endif 
default: usage(); break; /*NOTREACHED*/ } } if (af == AF_LINK && vflag) usage(); #ifdef WITH_KVM if (Kflag) error = ifmcstat_kvm(kernel, core); /* * If KVM failed, and user did not explicitly specify a core file, * or force KVM backend to be disabled, try the sysctl backend. */ if (!Kflag || (error != 0 && (core == NULL && kernel == NULL))) #endif error = ifmcstat_getifmaddrs(); if (error != 0) exit(EX_OSERR); exit(EX_OK); /*NOTREACHED*/ } #ifdef INET static void in_ifinfo(struct igmp_ifinfo *igi) { printf("\t"); switch (igi->igi_version) { case IGMP_VERSION_1: case IGMP_VERSION_2: case IGMP_VERSION_3: printf("igmpv%d", igi->igi_version); break; default: printf("igmpv?(%d)", igi->igi_version); break; } if (igi->igi_flags) printb(" flags", igi->igi_flags, "\020\1SILENT\2LOOPBACK"); if (igi->igi_version == IGMP_VERSION_3) { printf(" rv %u qi %u qri %u uri %u", igi->igi_rv, igi->igi_qi, igi->igi_qri, igi->igi_uri); } if (vflag >= 2) { printf(" v1timer %u v2timer %u v3timer %u", igi->igi_v1_timer, igi->igi_v2_timer, igi->igi_v3_timer); } printf("\n"); } static const char *inm_modes[] = { "undefined", "include", "exclude", }; static const char * inm_mode(u_int mode) { if (mode >= MCAST_UNDEFINED && mode <= MCAST_EXCLUDE) return (inm_modes[mode]); return (NULL); } #endif /* INET */ #ifdef WITH_KVM static int ifmcstat_kvm(const char *kernel, const char *core) { char buf[_POSIX2_LINE_MAX], ifname[IFNAMSIZ]; struct ifnet *ifp, *nifp, ifnet; if ((kvmd = kvm_openfiles(kernel, core, NULL, O_RDONLY, buf)) == NULL) { perror("kvm_openfiles"); return (-1); } if (kvm_nlist(kvmd, nl) < 0) { perror("kvm_nlist"); return (-1); } if (nl[N_IFNET].n_value == 0) { printf("symbol %s not found\n", nl[N_IFNET].n_name); return (-1); } KREAD(nl[N_IFNET].n_value, &ifp, struct ifnet *); while (ifp) { KREAD(ifp, &ifnet, struct ifnet); nifp = ifnet.if_link.tqe_next; if (ifindex && ifindex != ifnet.if_index) goto next; printf("%s:\n", if_indextoname(ifnet.if_index, ifname)); #ifdef INET 
if_addrlist(TAILQ_FIRST(&ifnet.if_addrhead)); #endif #ifdef INET6 if6_addrlist(TAILQ_FIRST(&ifnet.if_addrhead)); #endif if (vflag) ll_addrlist(TAILQ_FIRST(&ifnet.if_addrhead)); next: ifp = nifp; } return (0); } static void kread(u_long addr, void *buf, int len) { if (kvm_read(kvmd, addr, buf, len) != len) { perror("kvm_read"); exit(EX_OSERR); } } static void ll_addrlist(struct ifaddr *ifap) { char addrbuf[NI_MAXHOST]; struct ifaddr ifa; struct sockaddr sa; struct sockaddr_dl sdl; struct ifaddr *ifap0; if (af && af != AF_LINK) return; ifap0 = ifap; while (ifap) { KREAD(ifap, &ifa, struct ifaddr); if (ifa.ifa_addr == NULL) goto nextifap; KREAD(ifa.ifa_addr, &sa, struct sockaddr); if (sa.sa_family != PF_LINK) goto nextifap; KREAD(ifa.ifa_addr, &sdl, struct sockaddr_dl); if (sdl.sdl_alen == 0) goto nextifap; addrbuf[0] = '\0'; getnameinfo((struct sockaddr *)&sdl, sdl.sdl_len, addrbuf, sizeof(addrbuf), NULL, 0, NI_NUMERICHOST); printf("\tlink %s\n", addrbuf); nextifap: ifap = ifa.ifa_link.tqe_next; } if (ifap0) { struct ifnet ifnet; struct ifmultiaddr ifm, *ifmp = 0; KREAD(ifap0, &ifa, struct ifaddr); KREAD(ifa.ifa_ifp, &ifnet, struct ifnet); if (TAILQ_FIRST(&ifnet.if_multiaddrs)) ifmp = TAILQ_FIRST(&ifnet.if_multiaddrs); while (ifmp) { KREAD(ifmp, &ifm, struct ifmultiaddr); if (ifm.ifma_addr == NULL) goto nextmulti; KREAD(ifm.ifma_addr, &sa, struct sockaddr); if (sa.sa_family != AF_LINK) goto nextmulti; KREAD(ifm.ifma_addr, &sdl, struct sockaddr_dl); addrbuf[0] = '\0'; getnameinfo((struct sockaddr *)&sdl, sdl.sdl_len, addrbuf, sizeof(addrbuf), NULL, 0, NI_NUMERICHOST); printf("\t\tgroup %s refcnt %d\n", addrbuf, ifm.ifma_refcount); nextmulti: ifmp = TAILQ_NEXT(&ifm, ifma_link); } } } #ifdef INET6 static void if6_addrlist(struct ifaddr *ifap) { struct ifnet ifnet; struct ifaddr ifa; struct sockaddr sa; struct in6_ifaddr if6a; struct ifaddr *ifap0; if (af && af != AF_INET6) return; ifap0 = ifap; while (ifap) { KREAD(ifap, &ifa, struct ifaddr); if (ifa.ifa_addr == NULL) 
goto nextifap; KREAD(ifa.ifa_addr, &sa, struct sockaddr); if (sa.sa_family != PF_INET6) goto nextifap; KREAD(ifap, &if6a, struct in6_ifaddr); printf("\tinet6 %s\n", inet6_n2a(&if6a.ia_addr.sin6_addr, if6a.ia_addr.sin6_scope_id)); /* * Print per-link MLD information, if available. */ if (ifa.ifa_ifp != NULL) { struct in6_ifextra ie; struct mld_ifinfo mli; KREAD(ifa.ifa_ifp, &ifnet, struct ifnet); KREAD(ifnet.if_afdata[AF_INET6], &ie, struct in6_ifextra); if (ie.mld_ifinfo != NULL) { KREAD(ie.mld_ifinfo, &mli, struct mld_ifinfo); in6_ifinfo(&mli); } } nextifap: ifap = ifa.ifa_link.tqe_next; } if (ifap0) { struct ifnet ifnet; struct ifmultiaddr ifm, *ifmp = 0; struct sockaddr_dl sdl; KREAD(ifap0, &ifa, struct ifaddr); KREAD(ifa.ifa_ifp, &ifnet, struct ifnet); if (TAILQ_FIRST(&ifnet.if_multiaddrs)) ifmp = TAILQ_FIRST(&ifnet.if_multiaddrs); while (ifmp) { KREAD(ifmp, &ifm, struct ifmultiaddr); if (ifm.ifma_addr == NULL) goto nextmulti; KREAD(ifm.ifma_addr, &sa, struct sockaddr); if (sa.sa_family != AF_INET6) goto nextmulti; (void)in6_multientry((struct in6_multi *) ifm.ifma_protospec); if (ifm.ifma_lladdr == 0) goto nextmulti; KREAD(ifm.ifma_lladdr, &sdl, struct sockaddr_dl); printf("\t\t\tmcast-macaddr %s refcnt %d\n", ether_ntoa((struct ether_addr *)LLADDR(&sdl)), ifm.ifma_refcount); nextmulti: ifmp = TAILQ_NEXT(&ifm, ifma_link); } } } static struct in6_multi * in6_multientry(struct in6_multi *mc) { struct in6_multi multi; KREAD(mc, &multi, struct in6_multi); printf("\t\tgroup %s", inet6_n2a(&multi.in6m_addr, 0)); printf(" refcnt %u\n", multi.in6m_refcount); return (multi.in6m_entry.le_next); } #endif /* INET6 */ #ifdef INET static void if_addrlist(struct ifaddr *ifap) { struct ifaddr ifa; struct ifnet ifnet; struct sockaddr sa; struct in_ifaddr ia; struct ifaddr *ifap0; if (af && af != AF_INET) return; ifap0 = ifap; while (ifap) { KREAD(ifap, &ifa, struct ifaddr); if (ifa.ifa_addr == NULL) goto nextifap; KREAD(ifa.ifa_addr, &sa, struct sockaddr); if (sa.sa_family != 
PF_INET) goto nextifap; KREAD(ifap, &ia, struct in_ifaddr); printf("\tinet %s\n", inet_ntoa(ia.ia_addr.sin_addr)); /* * Print per-link IGMP information, if available. */ if (ifa.ifa_ifp != NULL) { struct in_ifinfo ii; struct igmp_ifinfo igi; KREAD(ifa.ifa_ifp, &ifnet, struct ifnet); KREAD(ifnet.if_afdata[AF_INET], &ii, struct in_ifinfo); if (ii.ii_igmp != NULL) { KREAD(ii.ii_igmp, &igi, struct igmp_ifinfo); in_ifinfo(&igi); } } nextifap: ifap = ifa.ifa_link.tqe_next; } if (ifap0) { struct ifmultiaddr ifm, *ifmp = 0; struct sockaddr_dl sdl; KREAD(ifap0, &ifa, struct ifaddr); KREAD(ifa.ifa_ifp, &ifnet, struct ifnet); if (TAILQ_FIRST(&ifnet.if_multiaddrs)) ifmp = TAILQ_FIRST(&ifnet.if_multiaddrs); while (ifmp) { KREAD(ifmp, &ifm, struct ifmultiaddr); if (ifm.ifma_addr == NULL) goto nextmulti; KREAD(ifm.ifma_addr, &sa, struct sockaddr); if (sa.sa_family != AF_INET) goto nextmulti; (void)in_multientry((struct in_multi *) ifm.ifma_protospec); if (ifm.ifma_lladdr == 0) goto nextmulti; KREAD(ifm.ifma_lladdr, &sdl, struct sockaddr_dl); printf("\t\t\tmcast-macaddr %s refcnt %d\n", ether_ntoa((struct ether_addr *)LLADDR(&sdl)), ifm.ifma_refcount); nextmulti: ifmp = TAILQ_NEXT(&ifm, ifma_link); } } } static const char *inm_states[] = { "not-member", "silent", "idle", "lazy", "sleeping", "awakening", "query-pending", "sg-query-pending", "leaving" }; static const char * inm_state(u_int state) { if (state >= IGMP_NOT_MEMBER && state <= IGMP_LEAVING_MEMBER) return (inm_states[state]); return (NULL); } #if 0 static struct ip_msource * ims_min_kvm(struct in_multi *pinm) { struct ip_msource ims0; struct ip_msource *tmp, *parent; parent = NULL; tmp = RB_ROOT(&pinm->inm_srcs); while (tmp) { parent = tmp; KREAD(tmp, &ims0, struct ip_msource); tmp = RB_LEFT(&ims0, ims_link); } return (parent); /* kva */ } /* XXX This routine is buggy. See RB_NEXT in sys/tree.h. 
*/ static struct ip_msource * ims_next_kvm(struct ip_msource *ims) { struct ip_msource ims0, ims1; struct ip_msource *tmp; KREAD(ims, &ims0, struct ip_msource); if (RB_RIGHT(&ims0, ims_link)) { ims = RB_RIGHT(&ims0, ims_link); KREAD(ims, &ims1, struct ip_msource); while ((tmp = RB_LEFT(&ims1, ims_link))) { KREAD(tmp, &ims0, struct ip_msource); ims = RB_LEFT(&ims0, ims_link); } } else { tmp = RB_PARENT(&ims0, ims_link); if (tmp) { KREAD(tmp, &ims1, struct ip_msource); if (ims == RB_LEFT(&ims1, ims_link)) ims = tmp; } else { while ((tmp = RB_PARENT(&ims0, ims_link))) { KREAD(tmp, &ims1, struct ip_msource); if (ims == RB_RIGHT(&ims1, ims_link)) { ims = tmp; KREAD(ims, &ims0, struct ip_msource); } else break; } ims = RB_PARENT(&ims0, ims_link); } } return (ims); /* kva */ } static void inm_print_sources_kvm(struct in_multi *pinm) { struct ip_msource ims0; struct ip_msource *ims; struct in_addr src; int cnt; uint8_t fmode; cnt = 0; fmode = pinm->inm_st[1].iss_fmode; if (fmode == MCAST_UNDEFINED) return; for (ims = ims_min_kvm(pinm); ims != NULL; ims = ims_next_kvm(ims)) { if (cnt == 0) printf(" srcs "); KREAD(ims, &ims0, struct ip_msource); /* Only print sources in-mode at t1. */ if (fmode != ims_get_mode(pinm, ims, 1)) continue; src.s_addr = htonl(ims0.ims_haddr); printf("%s%s", (cnt++ == 0 ? 
"" : ","), inet_ntoa(src)); } } #endif static struct in_multi * in_multientry(struct in_multi *pinm) { struct in_multi inm; const char *state, *mode; KREAD(pinm, &inm, struct in_multi); printf("\t\tgroup %s", inet_ntoa(inm.inm_addr)); printf(" refcnt %u", inm.inm_refcount); state = inm_state(inm.inm_state); if (state) printf(" state %s", state); else printf(" state (%d)", inm.inm_state); mode = inm_mode(inm.inm_st[1].iss_fmode); if (mode) printf(" mode %s", mode); else printf(" mode (%d)", inm.inm_st[1].iss_fmode); if (vflag >= 2) { printf(" asm %u ex %u in %u rec %u", (u_int)inm.inm_st[1].iss_asm, (u_int)inm.inm_st[1].iss_ex, (u_int)inm.inm_st[1].iss_in, (u_int)inm.inm_st[1].iss_rec); } #if 0 /* Buggy. */ if (vflag) inm_print_sources_kvm(&inm); #endif printf("\n"); return (NULL); } #endif /* INET */ #endif /* WITH_KVM */ #ifdef INET6 static void in6_ifinfo(struct mld_ifinfo *mli) { printf("\t"); switch (mli->mli_version) { case MLD_VERSION_1: case MLD_VERSION_2: printf("mldv%d", mli->mli_version); break; default: printf("mldv?(%d)", mli->mli_version); break; } if (mli->mli_flags) printb(" flags", mli->mli_flags, "\020\1SILENT\2USEALLOW"); if (mli->mli_version == MLD_VERSION_2) { printf(" rv %u qi %u qri %u uri %u", mli->mli_rv, mli->mli_qi, mli->mli_qri, mli->mli_uri); } if (vflag >= 2) { printf(" v1timer %u v2timer %u", mli->mli_v1_timer, mli->mli_v2_timer); } printf("\n"); } static const char * inet6_n2a(struct in6_addr *p, uint32_t scope_id) { static char buf[NI_MAXHOST]; struct sockaddr_in6 sin6; const int niflags = NI_NUMERICHOST; memset(&sin6, 0, sizeof(sin6)); sin6.sin6_family = AF_INET6; sin6.sin6_len = sizeof(struct sockaddr_in6); sin6.sin6_addr = *p; sin6.sin6_scope_id = scope_id; if (getnameinfo((struct sockaddr *)&sin6, sin6.sin6_len, buf, sizeof(buf), NULL, 0, niflags) == 0) { return (buf); } else { return ("(invalid)"); } } #endif /* INET6 */ #ifdef INET /* * Retrieve per-group source filter mode and lists via sysctl. 
*/ static void inm_print_sources_sysctl(uint32_t ifindex, struct in_addr gina) { #define MAX_SYSCTL_TRY 5 int mib[7]; int ntry = 0; size_t mibsize; size_t len; size_t needed; size_t cnt; int i; char *buf; struct in_addr *pina; uint32_t *p; uint32_t fmode; const char *modestr; mibsize = sizeof(mib) / sizeof(mib[0]); if (sysctlnametomib("net.inet.ip.mcast.filters", mib, &mibsize) == -1) { perror("sysctlnametomib"); return; } needed = 0; mib[5] = ifindex; mib[6] = gina.s_addr; /* 32 bits wide */ mibsize = sizeof(mib) / sizeof(mib[0]); do { if (sysctl(mib, mibsize, NULL, &needed, NULL, 0) == -1) { perror("sysctl net.inet.ip.mcast.filters"); return; } if ((buf = malloc(needed)) == NULL) { perror("malloc"); return; } if (sysctl(mib, mibsize, buf, &needed, NULL, 0) == -1) { if (errno != ENOMEM || ++ntry >= MAX_SYSCTL_TRY) { perror("sysctl"); goto out_free; } free(buf); buf = NULL; } } while (buf == NULL); len = needed; if (len < sizeof(uint32_t)) { perror("sysctl"); goto out_free; } p = (uint32_t *)buf; fmode = *p++; len -= sizeof(uint32_t); modestr = inm_mode(fmode); if (modestr) printf(" mode %s", modestr); else printf(" mode (%u)", fmode); if (vflag == 0) goto out_free; cnt = len / sizeof(struct in_addr); pina = (struct in_addr *)p; for (i = 0; i < cnt; i++) { if (i == 0) printf(" srcs "); fprintf(stdout, "%s%s", (i == 0 ? "" : ","), inet_ntoa(*pina++)); len -= sizeof(struct in_addr); } if (len > 0) { fprintf(stderr, "warning: %u trailing bytes from %s\n", (unsigned int)len, "net.inet.ip.mcast.filters"); } out_free: free(buf); #undef MAX_SYSCTL_TRY } #endif /* INET */ #ifdef INET6 /* * Retrieve MLD per-group source filter mode and lists via sysctl. * * Note: The 128-bit IPv6 group address needs to be segmented into * 32-bit pieces for marshaling to sysctl. So the MIB name ends * up looking like this: * a.b.c.d.e.ifindex.g[0].g[1].g[2].g[3] * Assumes that pgroup originated from the kernel, so its components * are already in network-byte order. 
*/ static void in6m_print_sources_sysctl(uint32_t ifindex, struct in6_addr *pgroup) { #define MAX_SYSCTL_TRY 5 char addrbuf[INET6_ADDRSTRLEN]; int mib[10]; int ntry = 0; int *pi; size_t mibsize; size_t len; size_t needed; size_t cnt; int i; char *buf; struct in6_addr *pina; uint32_t *p; uint32_t fmode; const char *modestr; mibsize = sizeof(mib) / sizeof(mib[0]); if (sysctlnametomib("net.inet6.ip6.mcast.filters", mib, &mibsize) == -1) { perror("sysctlnametomib"); return; } needed = 0; mib[5] = ifindex; pi = (int *)pgroup; for (i = 0; i < 4; i++) mib[6 + i] = *pi++; mibsize = sizeof(mib) / sizeof(mib[0]); do { if (sysctl(mib, mibsize, NULL, &needed, NULL, 0) == -1) { perror("sysctl net.inet6.ip6.mcast.filters"); return; } if ((buf = malloc(needed)) == NULL) { perror("malloc"); return; } if (sysctl(mib, mibsize, buf, &needed, NULL, 0) == -1) { if (errno != ENOMEM || ++ntry >= MAX_SYSCTL_TRY) { perror("sysctl"); goto out_free; } free(buf); buf = NULL; } } while (buf == NULL); len = needed; if (len < sizeof(uint32_t)) { perror("sysctl"); goto out_free; } p = (uint32_t *)buf; fmode = *p++; len -= sizeof(uint32_t); modestr = inm_mode(fmode); if (modestr) printf(" mode %s", modestr); else printf(" mode (%u)", fmode); if (vflag == 0) goto out_free; cnt = len / sizeof(struct in6_addr); pina = (struct in6_addr *)p; for (i = 0; i < cnt; i++) { if (i == 0) printf(" srcs "); inet_ntop(AF_INET6, (const char *)pina++, addrbuf, INET6_ADDRSTRLEN); fprintf(stdout, "%s%s", (i == 0 ? 
"" : ","), addrbuf); len -= sizeof(struct in6_addr); } if (len > 0) { fprintf(stderr, "warning: %u trailing bytes from %s\n", (unsigned int)len, "net.inet6.ip6.mcast.filters"); } out_free: free(buf); #undef MAX_SYSCTL_TRY } #endif /* INET6 */ static int ifmcstat_getifmaddrs(void) { char thisifname[IFNAMSIZ]; char addrbuf[NI_MAXHOST]; struct ifaddrs *ifap, *ifa; struct ifmaddrs *ifmap, *ifma; sockunion_t lastifasa; sockunion_t *psa, *pgsa, *pllsa, *pifasa; char *pcolon; char *pafname; uint32_t lastifindex, thisifindex; int error; error = 0; ifap = NULL; ifmap = NULL; lastifindex = 0; thisifindex = 0; lastifasa.ss.ss_family = AF_UNSPEC; if (getifaddrs(&ifap) != 0) { warn("getifmaddrs"); return (-1); } if (getifmaddrs(&ifmap) != 0) { warn("getifmaddrs"); error = -1; goto out; } for (ifma = ifmap; ifma; ifma = ifma->ifma_next) { error = 0; if (ifma->ifma_name == NULL || ifma->ifma_addr == NULL) continue; psa = (sockunion_t *)ifma->ifma_name; if (psa->sa.sa_family != AF_LINK) { fprintf(stderr, "WARNING: Kernel returned invalid data.\n"); error = -1; break; } /* Filter on interface name. */ thisifindex = psa->sdl.sdl_index; if (ifindex != 0 && thisifindex != ifindex) continue; /* Filter on address family. */ pgsa = (sockunion_t *)ifma->ifma_addr; if (af != 0 && pgsa->sa.sa_family != af) continue; strlcpy(thisifname, link_ntoa(&psa->sdl), IFNAMSIZ); pcolon = strchr(thisifname, ':'); if (pcolon) *pcolon = '\0'; /* Only print the banner for the first ifmaddrs entry. */ if (lastifindex == 0 || lastifindex != thisifindex) { lastifindex = thisifindex; fprintf(stdout, "%s:\n", thisifname); } /* * Currently, multicast joins only take place on the * primary IPv4 address, and only on the link-local IPv6 * address, as per IGMPv2/3 and MLDv1/2 semantics. * Therefore, we only look up the primary address on * the first pass. 
*/ pifasa = NULL; for (ifa = ifap; ifa; ifa = ifa->ifa_next) { if ((strcmp(ifa->ifa_name, thisifname) != 0) || (ifa->ifa_addr == NULL) || (ifa->ifa_addr->sa_family != pgsa->sa.sa_family)) continue; /* * For AF_INET6 only the link-local address should * be returned. If built without IPv6 support, * skip this address entirely. */ pifasa = (sockunion_t *)ifa->ifa_addr; if (pifasa->sa.sa_family == AF_INET6 #ifdef INET6 && !IN6_IS_ADDR_LINKLOCAL(&pifasa->sin6.sin6_addr) #endif ) { pifasa = NULL; continue; } break; } if (pifasa == NULL) continue; /* primary address not found */ if (!vflag && pifasa->sa.sa_family == AF_LINK) continue; /* Parse and print primary address, if not already printed. */ if (lastifasa.ss.ss_family == AF_UNSPEC || ((lastifasa.ss.ss_family == AF_LINK && !sa_dl_equal(&lastifasa.sa, &pifasa->sa)) || !sa_equal(&lastifasa.sa, &pifasa->sa))) { switch (pifasa->sa.sa_family) { case AF_INET: pafname = "inet"; break; case AF_INET6: pafname = "inet6"; break; case AF_LINK: pafname = "link"; break; default: pafname = "unknown"; break; } switch (pifasa->sa.sa_family) { case AF_INET6: #ifdef INET6 { const char *p = inet6_n2a(&pifasa->sin6.sin6_addr, pifasa->sin6.sin6_scope_id); strlcpy(addrbuf, p, sizeof(addrbuf)); break; } #else /* FALLTHROUGH */ #endif case AF_INET: case AF_LINK: error = getnameinfo(&pifasa->sa, pifasa->sa.sa_len, addrbuf, sizeof(addrbuf), NULL, 0, NI_NUMERICHOST); if (error) perror("getnameinfo"); break; default: addrbuf[0] = '\0'; break; } fprintf(stdout, "\t%s %s", pafname, addrbuf); #ifdef INET6 if (pifasa->sa.sa_family == AF_INET6 && pifasa->sin6.sin6_scope_id) fprintf(stdout, " scopeid 0x%x", pifasa->sin6.sin6_scope_id); #endif fprintf(stdout, "\n"); #ifdef INET /* * Print per-link IGMP information, if available. 
*/ if (pifasa->sa.sa_family == AF_INET) { struct igmp_ifinfo igi; size_t mibsize, len; int mib[5]; mibsize = sizeof(mib) / sizeof(mib[0]); if (sysctlnametomib("net.inet.igmp.ifinfo", mib, &mibsize) == -1) { perror("sysctlnametomib"); goto next_ifnet; } mib[mibsize] = thisifindex; len = sizeof(struct igmp_ifinfo); if (sysctl(mib, mibsize + 1, &igi, &len, NULL, 0) == -1) { perror("sysctl net.inet.igmp.ifinfo"); goto next_ifnet; } in_ifinfo(&igi); } #endif /* INET */ #ifdef INET6 /* * Print per-link MLD information, if available. */ if (pifasa->sa.sa_family == AF_INET6) { struct mld_ifinfo mli; size_t mibsize, len; int mib[5]; mibsize = sizeof(mib) / sizeof(mib[0]); if (sysctlnametomib("net.inet6.mld.ifinfo", mib, &mibsize) == -1) { perror("sysctlnametomib"); goto next_ifnet; } mib[mibsize] = thisifindex; len = sizeof(struct mld_ifinfo); if (sysctl(mib, mibsize + 1, &mli, &len, NULL, 0) == -1) { perror("sysctl net.inet6.mld.ifinfo"); goto next_ifnet; } in6_ifinfo(&mli); } #endif /* INET6 */ #if defined(INET) || defined(INET6) next_ifnet: #endif lastifasa = *pifasa; } /* Print this group address. */ #ifdef INET6 if (pgsa->sa.sa_family == AF_INET6) { const char *p = inet6_n2a(&pgsa->sin6.sin6_addr, pgsa->sin6.sin6_scope_id); strlcpy(addrbuf, p, sizeof(addrbuf)); } else #endif { error = getnameinfo(&pgsa->sa, pgsa->sa.sa_len, addrbuf, sizeof(addrbuf), NULL, 0, NI_NUMERICHOST); if (error) perror("getnameinfo"); } fprintf(stdout, "\t\tgroup %s", addrbuf); #ifdef INET6 if (pgsa->sa.sa_family == AF_INET6 && pgsa->sin6.sin6_scope_id) fprintf(stdout, " scopeid 0x%x", pgsa->sin6.sin6_scope_id); #endif #ifdef INET if (pgsa->sa.sa_family == AF_INET) { inm_print_sources_sysctl(thisifindex, pgsa->sin.sin_addr); } #endif #ifdef INET6 if (pgsa->sa.sa_family == AF_INET6) { in6m_print_sources_sysctl(thisifindex, &pgsa->sin6.sin6_addr); } #endif fprintf(stdout, "\n"); /* Link-layer mapping, if present. 
*/ pllsa = (sockunion_t *)ifma->ifma_lladdr; if (pllsa != NULL) { error = getnameinfo(&pllsa->sa, pllsa->sa.sa_len, addrbuf, sizeof(addrbuf), NULL, 0, NI_NUMERICHOST); fprintf(stdout, "\t\t\tmcast-macaddr %s\n", addrbuf); } } out: if (ifmap != NULL) freeifmaddrs(ifmap); if (ifap != NULL) freeifaddrs(ifap); return (error); } Index: projects/ifnet/usr.sbin/syslogd/syslogd.c =================================================================== --- projects/ifnet/usr.sbin/syslogd/syslogd.c (revision 279031) +++ projects/ifnet/usr.sbin/syslogd/syslogd.c (revision 279032) @@ -1,2757 +1,2758 @@ /* * Copyright (c) 1983, 1988, 1993, 1994 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef lint static const char copyright[] = "@(#) Copyright (c) 1983, 1988, 1993, 1994\n\ The Regents of the University of California. All rights reserved.\n"; #endif /* not lint */ #ifndef lint #if 0 static char sccsid[] = "@(#)syslogd.c 8.3 (Berkeley) 4/4/94"; #endif #endif /* not lint */ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); /* * syslogd -- log system messages * * This program implements a system log. It takes a series of lines. * Each line may have a priority, signified as "<n>" as * the first characters of the line. If this is * not present, a default priority is used. * * To kill syslogd, send a signal 15 (terminate). A signal 1 (hup) will * cause it to reread its configuration file. * * Defined Constants: * * MAXLINE -- the maximum line length that can be handled. * DEFUPRI -- the default priority for user messages * DEFSPRI -- the default priority for kernel messages * * Author: Eric Allman * extensive changes by Ralph Campbell * more extensive changes by Eric Allman (again) * Extension to log by program name as well as facility and priority * by Peter da Silva. * -u and -v by Harlan Stenn. * Priority comparison code by Harlan Stenn.
*/ #define MAXLINE 1024 /* maximum line length */ #define MAXSVLINE 120 /* maximum saved line length */ #define DEFUPRI (LOG_USER|LOG_NOTICE) #define DEFSPRI (LOG_KERN|LOG_CRIT) #define TIMERINTVL 30 /* interval for checking flush, mark */ #define TTYMSGTIME 1 /* timeout passed to ttymsg */ #define RCVBUF_MINSIZE (80 * 1024) /* minimum size of dgram rcv buffer */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pathnames.h" #include "ttymsg.h" #define SYSLOG_NAMES #include const char *ConfFile = _PATH_LOGCONF; const char *PidFile = _PATH_LOGPID; const char ctty[] = _PATH_CONSOLE; #define dprintf if (Debug) printf #define MAXUNAMES 20 /* maximum number of user names */ /* * Unix sockets. * We have two default sockets, one with 666 permissions, * and one for privileged programs. */ struct funix { int s; const char *name; mode_t mode; STAILQ_ENTRY(funix) next; }; struct funix funix_secure = { -1, _PATH_LOG_PRIV, S_IRUSR | S_IWUSR, { NULL } }; struct funix funix_default = { -1, _PATH_LOG, DEFFILEMODE, { &funix_secure } }; STAILQ_HEAD(, funix) funixes = { &funix_default, &(funix_secure.next.stqe_next) }; /* * Flags to logmsg(). */ #define IGN_CONS 0x001 /* don't print on console */ #define SYNC_FILE 0x002 /* do fsync on file after printing */ #define ADDDATE 0x004 /* add a date to the message */ #define MARK 0x008 /* this message is a mark */ #define ISKERNEL 0x010 /* kernel generated message */ /* * This structure represents the files that will have log * copies printed. * We require f_file to be valid if f_type is F_FILE, F_CONSOLE, F_TTY * or if f_type is F_PIPE and f_pid > 0.
*/ struct filed { struct filed *f_next; /* next in linked list */ short f_type; /* entry type, see below */ short f_file; /* file descriptor */ time_t f_time; /* time this was last written */ char *f_host; /* host from which to recd. */ u_char f_pmask[LOG_NFACILITIES+1]; /* priority mask */ u_char f_pcmp[LOG_NFACILITIES+1]; /* compare priority */ #define PRI_LT 0x1 #define PRI_EQ 0x2 #define PRI_GT 0x4 char *f_program; /* program this applies to */ union { char f_uname[MAXUNAMES][MAXLOGNAME]; struct { char f_hname[MAXHOSTNAMELEN]; struct addrinfo *f_addr; } f_forw; /* forwarding address */ char f_fname[MAXPATHLEN]; struct { char f_pname[MAXPATHLEN]; pid_t f_pid; } f_pipe; } f_un; char f_prevline[MAXSVLINE]; /* last message logged */ char f_lasttime[16]; /* time of last occurrence */ char f_prevhost[MAXHOSTNAMELEN]; /* host from which recd. */ int f_prevpri; /* pri of f_prevline */ int f_prevlen; /* length of f_prevline */ int f_prevcount; /* repetition cnt of prevline */ u_int f_repeatcount; /* number of "repeated" msgs */ int f_flags; /* file-specific flags */ #define FFLAG_SYNC 0x01 #define FFLAG_NEEDSYNC 0x02 }; /* * Queue of about-to-be dead processes we should watch out for. */ TAILQ_HEAD(stailhead, deadq_entry) deadq_head; struct stailhead *deadq_headp; struct deadq_entry { pid_t dq_pid; int dq_timeout; TAILQ_ENTRY(deadq_entry) dq_entries; }; /* * The timeout to apply to processes waiting on the dead queue. Unit * of measure is `mark intervals', i.e. 20 minutes by default. * Processes on the dead queue will be terminated after that time. */ #define DQ_TIMO_INIT 2 typedef struct deadq_entry *dq_t; /* * Struct to hold records of network addresses that are allowed to log * to us. 
*/ struct allowedpeer { int isnumeric; u_short port; union { struct { struct sockaddr_storage addr; struct sockaddr_storage mask; } numeric; char *name; } u; #define a_addr u.numeric.addr #define a_mask u.numeric.mask #define a_name u.name }; /* * Intervals at which we flush out "message repeated" messages, * in seconds after previous message is logged. After each flush, * we move to the next interval until we reach the largest. */ int repeatinterval[] = { 30, 120, 600 }; /* # of secs before flush */ #define MAXREPEAT ((sizeof(repeatinterval) / sizeof(repeatinterval[0])) - 1) #define REPEATTIME(f) ((f)->f_time + repeatinterval[(f)->f_repeatcount]) #define BACKOFF(f) { if (++(f)->f_repeatcount > MAXREPEAT) \ (f)->f_repeatcount = MAXREPEAT; \ } /* values for f_type */ #define F_UNUSED 0 /* unused entry */ #define F_FILE 1 /* regular file */ #define F_TTY 2 /* terminal */ #define F_CONSOLE 3 /* console terminal */ #define F_FORW 4 /* remote machine */ #define F_USERS 5 /* list of users */ #define F_WALL 6 /* everyone logged on */ #define F_PIPE 7 /* pipe to program */ const char *TypeNames[8] = { "UNUSED", "FILE", "TTY", "CONSOLE", "FORW", "USERS", "WALL", "PIPE" }; static struct filed *Files; /* Log files that we write to */ static struct filed consfile; /* Console */ static int Debug; /* debug flag */ static int resolve = 1; /* resolve hostname */ static char LocalHostName[MAXHOSTNAMELEN]; /* our hostname */ static const char *LocalDomain; /* our local domain name */ static int *finet; /* Internet datagram socket */ static int fklog = -1; /* /dev/klog */ static int Initialized; /* set when we have initialized ourselves */ static int MarkInterval = 20 * 60; /* interval between marks in seconds */ static int MarkSeq; /* mark sequence number */ static int NoBind; /* don't bind() as suggested by RFC 3164 */ static int SecureMode; /* when true, receive only unix domain socks */ #ifdef INET6 static int family = PF_UNSPEC; /* protocol family (IPv4, IPv6 or both) */ #else 
static int family = PF_INET; /* protocol family (IPv4 only) */ #endif static int mask_C1 = 1; /* mask characters from 0x80 - 0x9F */ static int send_to_all; /* send message to all IPv4/IPv6 addresses */ static int use_bootfile; /* log entire bootfile for every kern msg */ static int no_compress; /* don't compress messages (1=pipes, 2=all) */ static int logflags = O_WRONLY|O_APPEND; /* flags used to open log files */ static char bootfile[MAXLINE+1]; /* booted kernel file */ struct allowedpeer *AllowedPeers; /* List of allowed peers */ static int NumAllowed; /* Number of entries in AllowedPeers */ static int RemoteAddDate; /* Always set the date on remote messages */ static int UniquePriority; /* Only log specified priority? */ static int LogFacPri; /* Put facility and priority in log message: */ /* 0=no, 1=numeric, 2=names */ static int KeepKernFac; /* Keep remotely logged kernel facility */ static int needdofsync = 0; /* Are any file(s) waiting to be fsynced? */ static struct pidfh *pfh; volatile sig_atomic_t MarkSet, WantDie; static int allowaddr(char *); static void cfline(const char *, struct filed *, const char *, const char *); static const char *cvthname(struct sockaddr *); static void deadq_enter(pid_t, const char *); static int deadq_remove(pid_t); static int decode(const char *, const CODE *); static void die(int); static void dodie(int); static void dofsync(void); static void domark(int); static void fprintlog(struct filed *, int, const char *); static int *socksetup(int, char *); static void init(int); static void logerror(const char *); static void logmsg(int, const char *, const char *, int); static void log_deadchild(pid_t, int, const char *); static void markit(void); static int skip_message(const char *, const char *, int); static void printline(const char *, char *, int); static void printsys(char *); static int p_open(const char *, pid_t *); static void readklog(void); static void reapchild(int); static void usage(void); static int validate(struct 
sockaddr *, const char *); static void unmapped(struct sockaddr *); static void wallmsg(struct filed *, struct iovec *, const int iovlen); static int waitdaemon(int, int, int); static void timedout(int); static void increase_rcvbuf(int); int main(int argc, char *argv[]) { int ch, i, fdsrmax = 0, l; struct sockaddr_un sunx, fromunix; struct sockaddr_storage frominet; fd_set *fdsr = NULL; char line[MAXLINE + 1]; char *bindhostname; const char *hname; struct timeval tv, *tvp; struct sigaction sact; struct funix *fx, *fx1; sigset_t mask; pid_t ppid = 1, spid; socklen_t len; if (madvise(NULL, 0, MADV_PROTECT) != 0) dprintf("madvise() failed: %s\n", strerror(errno)); bindhostname = NULL; while ((ch = getopt(argc, argv, "468Aa:b:cCdf:kl:m:nNop:P:sS:Tuv")) != -1) switch (ch) { case '4': family = PF_INET; break; #ifdef INET6 case '6': family = PF_INET6; break; #endif case '8': mask_C1 = 0; break; case 'A': send_to_all++; break; case 'a': /* allow specific network addresses only */ if (allowaddr(optarg) == -1) usage(); break; case 'b': bindhostname = optarg; break; case 'c': no_compress++; break; case 'C': logflags |= O_CREAT; break; case 'd': /* debug */ Debug++; break; case 'f': /* configuration file */ ConfFile = optarg; break; case 'k': /* keep remote kern fac */ KeepKernFac = 1; break; case 'l': { long perml; mode_t mode; char *name, *ep; if (optarg[0] == '/') { mode = DEFFILEMODE; name = optarg; } else if ((name = strchr(optarg, ':')) != NULL) { *name++ = '\0'; if (name[0] != '/') errx(1, "socket name must be absolute " "path"); if (isdigit(*optarg)) { perml = strtol(optarg, &ep, 8); if (*ep || perml < 0 || perml & ~(S_IRWXU|S_IRWXG|S_IRWXO)) errx(1, "invalid mode %s, exiting", optarg); mode = (mode_t )perml; } else errx(1, "invalid mode %s, exiting", optarg); } else /* doesn't begin with '/', and no ':' */ errx(1, "can't parse path %s", optarg); if (strlen(name) >= sizeof(sunx.sun_path)) errx(1, "%s path too long, exiting", name); if ((fx = malloc(sizeof(struct 
funix))) == NULL) errx(1, "malloc failed"); fx->s = -1; fx->name = name; fx->mode = mode; STAILQ_INSERT_TAIL(&funixes, fx, next); break; } case 'm': /* mark interval */ MarkInterval = atoi(optarg) * 60; break; case 'N': NoBind = 1; SecureMode = 1; break; case 'n': resolve = 0; break; case 'o': use_bootfile = 1; break; case 'p': /* path */ if (strlen(optarg) >= sizeof(sunx.sun_path)) errx(1, "%s path too long, exiting", optarg); funix_default.name = optarg; break; case 'P': /* path for alt. PID */ PidFile = optarg; break; case 's': /* no network mode */ SecureMode++; break; case 'S': /* path for privileged originator */ if (strlen(optarg) >= sizeof(sunx.sun_path)) errx(1, "%s path too long, exiting", optarg); funix_secure.name = optarg; break; case 'T': RemoteAddDate = 1; break; case 'u': /* only log specified priority */ UniquePriority++; break; case 'v': /* log facility and priority */ LogFacPri++; break; default: usage(); } if ((argc -= optind) != 0) usage(); pfh = pidfile_open(PidFile, 0600, &spid); if (pfh == NULL) { if (errno == EEXIST) errx(1, "syslogd already running, pid: %d", spid); warn("cannot open pid file"); } if (!Debug) { ppid = waitdaemon(0, 0, 30); if (ppid < 0) { warn("could not become daemon"); pidfile_remove(pfh); exit(1); } } else { setlinebuf(stdout); } if (NumAllowed) endservent(); consfile.f_type = F_CONSOLE; (void)strlcpy(consfile.f_un.f_fname, ctty + sizeof _PATH_DEV - 1, sizeof(consfile.f_un.f_fname)); (void)strlcpy(bootfile, getbootfile(), sizeof(bootfile)); (void)signal(SIGTERM, dodie); (void)signal(SIGINT, Debug ? dodie : SIG_IGN); (void)signal(SIGQUIT, Debug ? dodie : SIG_IGN); /* * We don't want the SIGCHLD and SIGHUP handlers to interfere * with each other; they are likely candidates for being called * simultaneously (SIGHUP closes pipe descriptor, process dies, * SIGCHLD happens). 
*/ sigemptyset(&mask); sigaddset(&mask, SIGHUP); sact.sa_handler = reapchild; sact.sa_mask = mask; sact.sa_flags = SA_RESTART; (void)sigaction(SIGCHLD, &sact, NULL); (void)signal(SIGALRM, domark); (void)signal(SIGPIPE, SIG_IGN); /* We'll catch EPIPE instead. */ (void)alarm(TIMERINTVL); TAILQ_INIT(&deadq_head); #ifndef SUN_LEN #define SUN_LEN(unp) (strlen((unp)->sun_path) + 2) #endif STAILQ_FOREACH_SAFE(fx, &funixes, next, fx1) { (void)unlink(fx->name); memset(&sunx, 0, sizeof(sunx)); sunx.sun_family = AF_LOCAL; (void)strlcpy(sunx.sun_path, fx->name, sizeof(sunx.sun_path)); fx->s = socket(PF_LOCAL, SOCK_DGRAM, 0); if (fx->s < 0 || bind(fx->s, (struct sockaddr *)&sunx, SUN_LEN(&sunx)) < 0 || chmod(fx->name, fx->mode) < 0) { (void)snprintf(line, sizeof line, "cannot create %s", fx->name); logerror(line); dprintf("cannot create %s (%d)\n", fx->name, errno); if (fx == &funix_default || fx == &funix_secure) die(0); else { STAILQ_REMOVE(&funixes, fx, funix, next); continue; } } increase_rcvbuf(fx->s); } if (SecureMode <= 1) finet = socksetup(family, bindhostname); if (finet) { if (SecureMode) { for (i = 0; i < *finet; i++) { - if (shutdown(finet[i+1], SHUT_RD) < 0) { + if (shutdown(finet[i+1], SHUT_RD) < 0 && + errno != ENOTCONN) { logerror("shutdown"); if (!Debug) die(0); } } } else { dprintf("listening on inet and/or inet6 socket\n"); } dprintf("sending on inet and/or inet6 socket\n"); } if ((fklog = open(_PATH_KLOG, O_RDONLY, 0)) >= 0) if (fcntl(fklog, F_SETFL, O_NONBLOCK) < 0) fklog = -1; if (fklog < 0) dprintf("can't open %s (%d)\n", _PATH_KLOG, errno); /* tuck my process id away */ pidfile_write(pfh); dprintf("off & running....\n"); init(0); /* prevent SIGHUP and SIGCHLD handlers from running in parallel */ sigemptyset(&mask); sigaddset(&mask, SIGCHLD); sact.sa_handler = init; sact.sa_mask = mask; sact.sa_flags = SA_RESTART; (void)sigaction(SIGHUP, &sact, NULL); tvp = &tv; tv.tv_sec = tv.tv_usec = 0; if (fklog != -1 && fklog > fdsrmax) fdsrmax = fklog; if (finet && 
!SecureMode) { for (i = 0; i < *finet; i++) { if (finet[i+1] != -1 && finet[i+1] > fdsrmax) fdsrmax = finet[i+1]; } } STAILQ_FOREACH(fx, &funixes, next) if (fx->s > fdsrmax) fdsrmax = fx->s; fdsr = (fd_set *)calloc(howmany(fdsrmax+1, NFDBITS), sizeof(fd_mask)); if (fdsr == NULL) errx(1, "calloc fd_set"); for (;;) { if (MarkSet) markit(); if (WantDie) die(WantDie); bzero(fdsr, howmany(fdsrmax+1, NFDBITS) * sizeof(fd_mask)); if (fklog != -1) FD_SET(fklog, fdsr); if (finet && !SecureMode) { for (i = 0; i < *finet; i++) { if (finet[i+1] != -1) FD_SET(finet[i+1], fdsr); } } STAILQ_FOREACH(fx, &funixes, next) FD_SET(fx->s, fdsr); i = select(fdsrmax+1, fdsr, NULL, NULL, needdofsync ? &tv : tvp); switch (i) { case 0: dofsync(); needdofsync = 0; if (tvp) { tvp = NULL; if (ppid != 1) kill(ppid, SIGALRM); } continue; case -1: if (errno != EINTR) logerror("select"); continue; } if (fklog != -1 && FD_ISSET(fklog, fdsr)) readklog(); if (finet && !SecureMode) { for (i = 0; i < *finet; i++) { if (FD_ISSET(finet[i+1], fdsr)) { len = sizeof(frominet); l = recvfrom(finet[i+1], line, MAXLINE, 0, (struct sockaddr *)&frominet, &len); if (l > 0) { line[l] = '\0'; hname = cvthname((struct sockaddr *)&frominet); unmapped((struct sockaddr *)&frominet); if (validate((struct sockaddr *)&frominet, hname)) printline(hname, line, RemoteAddDate ? 
ADDDATE : 0); } else if (l < 0 && errno != EINTR) logerror("recvfrom inet"); } } } STAILQ_FOREACH(fx, &funixes, next) { if (FD_ISSET(fx->s, fdsr)) { len = sizeof(fromunix); l = recvfrom(fx->s, line, MAXLINE, 0, (struct sockaddr *)&fromunix, &len); if (l > 0) { line[l] = '\0'; printline(LocalHostName, line, 0); } else if (l < 0 && errno != EINTR) logerror("recvfrom unix"); } } } if (fdsr) free(fdsr); } static void unmapped(struct sockaddr *sa) { struct sockaddr_in6 *sin6; struct sockaddr_in sin4; if (sa->sa_family != AF_INET6) return; if (sa->sa_len != sizeof(struct sockaddr_in6) || sizeof(sin4) > sa->sa_len) return; sin6 = (struct sockaddr_in6 *)sa; if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)) return; memset(&sin4, 0, sizeof(sin4)); sin4.sin_family = AF_INET; sin4.sin_len = sizeof(struct sockaddr_in); memcpy(&sin4.sin_addr, &sin6->sin6_addr.s6_addr[12], sizeof(sin4.sin_addr)); sin4.sin_port = sin6->sin6_port; memcpy(sa, &sin4, sin4.sin_len); } static void usage(void) { fprintf(stderr, "%s\n%s\n%s\n%s\n", "usage: syslogd [-468ACcdknosTuv] [-a allowed_peer]", " [-b bind_address] [-f config_file]", " [-l [mode:]path] [-m mark_interval]", " [-P pid_file] [-p log_socket]"); exit(1); } /* * Take a raw input line, decode the message, and print the message * on the appropriate log files. */ static void printline(const char *hname, char *msg, int flags) { char *p, *q; long n; int c, pri; char line[MAXLINE + 1]; /* test for special codes */ p = msg; pri = DEFUPRI; if (*p == '<') { errno = 0; n = strtol(p + 1, &q, 10); if (*q == '>' && n >= 0 && n < INT_MAX && errno == 0) { p = q + 1; pri = n; } } if (pri &~ (LOG_FACMASK|LOG_PRIMASK)) pri = DEFUPRI; /* * Don't allow users to log kernel messages. * NOTE: since LOG_KERN == 0 this will also match * messages with no facility specified. 
*/ if ((pri & LOG_FACMASK) == LOG_KERN && !KeepKernFac) pri = LOG_MAKEPRI(LOG_USER, LOG_PRI(pri)); q = line; while ((c = (unsigned char)*p++) != '\0' && q < &line[sizeof(line) - 4]) { if (mask_C1 && (c & 0x80) && c < 0xA0) { c &= 0x7F; *q++ = 'M'; *q++ = '-'; } if (isascii(c) && iscntrl(c)) { if (c == '\n') { *q++ = ' '; } else if (c == '\t') { *q++ = '\t'; } else { *q++ = '^'; *q++ = c ^ 0100; } } else { *q++ = c; } } *q = '\0'; logmsg(pri, line, hname, flags); } /* * Read /dev/klog while data are available, split into lines. */ static void readklog(void) { char *p, *q, line[MAXLINE + 1]; int len, i; len = 0; for (;;) { i = read(fklog, line + len, MAXLINE - 1 - len); if (i > 0) { line[i + len] = '\0'; } else { if (i < 0 && errno != EINTR && errno != EAGAIN) { logerror("klog"); fklog = -1; } break; } for (p = line; (q = strchr(p, '\n')) != NULL; p = q + 1) { *q = '\0'; printsys(p); } len = strlen(p); if (len >= MAXLINE - 1) { printsys(p); len = 0; } if (len > 0) memmove(line, p, len + 1); } if (len > 0) printsys(line); } /* * Take a raw input line from /dev/klog, format similar to syslog(). */ static void printsys(char *msg) { char *p, *q; long n; int flags, isprintf, pri; flags = ISKERNEL | SYNC_FILE | ADDDATE; /* fsync after write */ p = msg; pri = DEFSPRI; isprintf = 1; if (*p == '<') { errno = 0; n = strtol(p + 1, &q, 10); if (*q == '>' && n >= 0 && n < INT_MAX && errno == 0) { p = q + 1; pri = n; isprintf = 0; } } /* * Kernel printf's and LOG_CONSOLE messages have been displayed * on the console already. */ if (isprintf || (pri & LOG_FACMASK) == LOG_CONSOLE) flags |= IGN_CONS; if (pri &~ (LOG_FACMASK|LOG_PRIMASK)) pri = DEFSPRI; logmsg(pri, p, LocalHostName, flags); } static time_t now; /* * Match a program or host name against a specification. * Return a non-0 value if the message must be ignored * based on the specification. 
*/ static int skip_message(const char *name, const char *spec, int checkcase) { const char *s; char prev, next; int exclude = 0; /* Behaviour on explicit match */ if (spec == NULL) return 0; switch (*spec) { case '-': exclude = 1; /*FALLTHROUGH*/ case '+': spec++; break; default: break; } if (checkcase) s = strstr (spec, name); else s = strcasestr (spec, name); if (s != NULL) { prev = (s == spec ? ',' : *(s - 1)); next = *(s + strlen (name)); if (prev == ',' && (next == '\0' || next == ',')) /* Explicit match: skip iff the spec is an exclusive one. */ return exclude; } /* No explicit match for this name: skip the message iff the spec is an inclusive one. */ return !exclude; } /* * Log a message to the appropriate log files, users, etc. based on * the priority. */ static void logmsg(int pri, const char *msg, const char *from, int flags) { struct filed *f; int i, fac, msglen, omask, prilev; const char *timestamp; char prog[NAME_MAX+1]; char buf[MAXLINE+1]; dprintf("logmsg: pri %o, flags %x, from %s, msg %s\n", pri, flags, from, msg); omask = sigblock(sigmask(SIGHUP)|sigmask(SIGALRM)); /* * Check to see if msg looks non-standard. */ msglen = strlen(msg); if (msglen < 16 || msg[3] != ' ' || msg[6] != ' ' || msg[9] != ':' || msg[12] != ':' || msg[15] != ' ') flags |= ADDDATE; (void)time(&now); if (flags & ADDDATE) { timestamp = ctime(&now) + 4; } else { timestamp = msg; msg += 16; msglen -= 16; } /* skip leading blanks */ while (isspace(*msg)) { msg++; msglen--; } /* extract facility and priority level */ if (flags & MARK) fac = LOG_NFACILITIES; else fac = LOG_FAC(pri); /* Check maximum facility number. 
*/ if (fac > LOG_NFACILITIES) { (void)sigsetmask(omask); return; } prilev = LOG_PRI(pri); /* extract program name */ for (i = 0; i < NAME_MAX; i++) { if (!isprint(msg[i]) || msg[i] == ':' || msg[i] == '[' || msg[i] == '/' || isspace(msg[i])) break; prog[i] = msg[i]; } prog[i] = 0; /* add kernel prefix for kernel messages */ if (flags & ISKERNEL) { snprintf(buf, sizeof(buf), "%s: %s", use_bootfile ? bootfile : "kernel", msg); msg = buf; msglen = strlen(buf); } /* log the message to the particular outputs */ if (!Initialized) { f = &consfile; /* * Open in non-blocking mode to avoid hangs during open * and close(waiting for the port to drain). */ f->f_file = open(ctty, O_WRONLY | O_NONBLOCK, 0); if (f->f_file >= 0) { (void)strlcpy(f->f_lasttime, timestamp, sizeof(f->f_lasttime)); fprintlog(f, flags, msg); (void)close(f->f_file); } (void)sigsetmask(omask); return; } for (f = Files; f; f = f->f_next) { /* skip messages that are incorrect priority */ if (!(((f->f_pcmp[fac] & PRI_EQ) && (f->f_pmask[fac] == prilev)) ||((f->f_pcmp[fac] & PRI_LT) && (f->f_pmask[fac] < prilev)) ||((f->f_pcmp[fac] & PRI_GT) && (f->f_pmask[fac] > prilev)) ) || f->f_pmask[fac] == INTERNAL_NOPRI) continue; /* skip messages with the incorrect hostname */ if (skip_message(from, f->f_host, 0)) continue; /* skip messages with the incorrect program name */ if (skip_message(prog, f->f_program, 1)) continue; /* skip message to console if it has already been printed */ if (f->f_type == F_CONSOLE && (flags & IGN_CONS)) continue; /* don't output marks to recently written files */ if ((flags & MARK) && (now - f->f_time) < MarkInterval / 2) continue; /* * suppress duplicate lines to this file */ if (no_compress - (f->f_type != F_PIPE) < 1 && (flags & MARK) == 0 && msglen == f->f_prevlen && !strcmp(msg, f->f_prevline) && !strcasecmp(from, f->f_prevhost)) { (void)strlcpy(f->f_lasttime, timestamp, sizeof(f->f_lasttime)); f->f_prevcount++; dprintf("msg repeated %d times, %ld sec of %d\n", f->f_prevcount, 
(long)(now - f->f_time), repeatinterval[f->f_repeatcount]); /* * If domark would have logged this by now, * flush it now (so we don't hold isolated messages), * but back off so we'll flush less often * in the future. */ if (now > REPEATTIME(f)) { fprintlog(f, flags, (char *)NULL); BACKOFF(f); } } else { /* new line, save it */ if (f->f_prevcount) fprintlog(f, 0, (char *)NULL); f->f_repeatcount = 0; f->f_prevpri = pri; (void)strlcpy(f->f_lasttime, timestamp, sizeof(f->f_lasttime)); (void)strlcpy(f->f_prevhost, from, sizeof(f->f_prevhost)); if (msglen < MAXSVLINE) { f->f_prevlen = msglen; (void)strlcpy(f->f_prevline, msg, sizeof(f->f_prevline)); fprintlog(f, flags, (char *)NULL); } else { f->f_prevline[0] = 0; f->f_prevlen = 0; fprintlog(f, flags, msg); } } } (void)sigsetmask(omask); } static void dofsync(void) { struct filed *f; for (f = Files; f; f = f->f_next) { if ((f->f_type == F_FILE) && (f->f_flags & FFLAG_NEEDSYNC)) { f->f_flags &= ~FFLAG_NEEDSYNC; (void)fsync(f->f_file); } } } #define IOV_SIZE 7 static void fprintlog(struct filed *f, int flags, const char *msg) { struct iovec iov[IOV_SIZE]; struct iovec *v; struct addrinfo *r; int i, l, lsent = 0; char line[MAXLINE + 1], repbuf[80], greetings[200], *wmsg = NULL; char nul[] = "", space[] = " ", lf[] = "\n", crlf[] = "\r\n"; const char *msgret; v = iov; if (f->f_type == F_WALL) { v->iov_base = greetings; /* The time displayed is not synchronized with the other log * destinations (like messages). Following fragment was using * ctime(&now), which was updating the time every 30 sec. * With f_lasttime, time is synchronized correctly.
*/ v->iov_len = snprintf(greetings, sizeof greetings, "\r\n\7Message from syslogd@%s at %.24s ...\r\n", f->f_prevhost, f->f_lasttime); if (v->iov_len >= sizeof greetings) v->iov_len = sizeof greetings - 1; v++; v->iov_base = nul; v->iov_len = 0; v++; } else { v->iov_base = f->f_lasttime; v->iov_len = strlen(f->f_lasttime); v++; v->iov_base = space; v->iov_len = 1; v++; } if (LogFacPri) { static char fp_buf[30]; /* Hollow laugh */ int fac = f->f_prevpri & LOG_FACMASK; int pri = LOG_PRI(f->f_prevpri); const char *f_s = NULL; char f_n[5]; /* Hollow laugh */ const char *p_s = NULL; char p_n[5]; /* Hollow laugh */ if (LogFacPri > 1) { const CODE *c; for (c = facilitynames; c->c_name; c++) { if (c->c_val == fac) { f_s = c->c_name; break; } } for (c = prioritynames; c->c_name; c++) { if (c->c_val == pri) { p_s = c->c_name; break; } } } if (!f_s) { snprintf(f_n, sizeof f_n, "%d", LOG_FAC(fac)); f_s = f_n; } if (!p_s) { snprintf(p_n, sizeof p_n, "%d", pri); p_s = p_n; } snprintf(fp_buf, sizeof fp_buf, "<%s.%s> ", f_s, p_s); v->iov_base = fp_buf; v->iov_len = strlen(fp_buf); } else { v->iov_base = nul; v->iov_len = 0; } v++; v->iov_base = f->f_prevhost; v->iov_len = strlen(v->iov_base); v++; v->iov_base = space; v->iov_len = 1; v++; if (msg) { wmsg = strdup(msg); /* XXX iov_base needs a `const' sibling. 
*/ if (wmsg == NULL) { logerror("strdup"); exit(1); } v->iov_base = wmsg; v->iov_len = strlen(msg); } else if (f->f_prevcount > 1) { v->iov_base = repbuf; v->iov_len = snprintf(repbuf, sizeof repbuf, "last message repeated %d times", f->f_prevcount); } else { v->iov_base = f->f_prevline; v->iov_len = f->f_prevlen; } v++; dprintf("Logging to %s", TypeNames[f->f_type]); f->f_time = now; switch (f->f_type) { int port; case F_UNUSED: dprintf("\n"); break; case F_FORW: port = (int)ntohs(((struct sockaddr_in *) (f->f_un.f_forw.f_addr->ai_addr))->sin_port); if (port != 514) { dprintf(" %s:%d\n", f->f_un.f_forw.f_hname, port); } else { dprintf(" %s\n", f->f_un.f_forw.f_hname); } /* check for local vs remote messages */ if (strcasecmp(f->f_prevhost, LocalHostName)) l = snprintf(line, sizeof line - 1, "<%d>%.15s Forwarded from %s: %s", f->f_prevpri, (char *)iov[0].iov_base, f->f_prevhost, (char *)iov[5].iov_base); else l = snprintf(line, sizeof line - 1, "<%d>%.15s %s", f->f_prevpri, (char *)iov[0].iov_base, (char *)iov[5].iov_base); if (l < 0) l = 0; else if (l > MAXLINE) l = MAXLINE; if (finet) { for (r = f->f_un.f_forw.f_addr; r; r = r->ai_next) { for (i = 0; i < *finet; i++) { #if 0 /* * should we check AF first, or just * trial and error? 
FWD */ if (r->ai_family == address_family_of(finet[i+1])) #endif lsent = sendto(finet[i+1], line, l, 0, r->ai_addr, r->ai_addrlen); if (lsent == l) break; } if (lsent == l && !send_to_all) break; } dprintf("lsent/l: %d/%d\n", lsent, l); if (lsent != l) { int e = errno; logerror("sendto"); errno = e; switch (errno) { case ENOBUFS: case ENETDOWN: case ENETUNREACH: case EHOSTUNREACH: case EHOSTDOWN: case EADDRNOTAVAIL: break; /* case EBADF: */ /* case EACCES: */ /* case ENOTSOCK: */ /* case EFAULT: */ /* case EMSGSIZE: */ /* case EAGAIN: */ /* case ENOBUFS: */ /* case ECONNREFUSED: */ default: dprintf("removing entry: errno=%d\n", e); f->f_type = F_UNUSED; break; } } } break; case F_FILE: dprintf(" %s\n", f->f_un.f_fname); v->iov_base = lf; v->iov_len = 1; if (writev(f->f_file, iov, IOV_SIZE) < 0) { /* * If writev(2) fails for potentially transient errors * like the filesystem being full, ignore it. * Otherwise remove this logfile from the list. */ if (errno != ENOSPC) { int e = errno; (void)close(f->f_file); f->f_type = F_UNUSED; errno = e; logerror(f->f_un.f_fname); } } else if ((flags & SYNC_FILE) && (f->f_flags & FFLAG_SYNC)) { f->f_flags |= FFLAG_NEEDSYNC; needdofsync = 1; } break; case F_PIPE: dprintf(" %s\n", f->f_un.f_pipe.f_pname); v->iov_base = lf; v->iov_len = 1; if (f->f_un.f_pipe.f_pid == 0) { if ((f->f_file = p_open(f->f_un.f_pipe.f_pname, &f->f_un.f_pipe.f_pid)) < 0) { f->f_type = F_UNUSED; logerror(f->f_un.f_pipe.f_pname); break; } } if (writev(f->f_file, iov, IOV_SIZE) < 0) { int e = errno; (void)close(f->f_file); if (f->f_un.f_pipe.f_pid > 0) deadq_enter(f->f_un.f_pipe.f_pid, f->f_un.f_pipe.f_pname); f->f_un.f_pipe.f_pid = 0; errno = e; logerror(f->f_un.f_pipe.f_pname); } break; case F_CONSOLE: if (flags & IGN_CONS) { dprintf(" (ignored)\n"); break; } /* FALLTHROUGH */ case F_TTY: dprintf(" %s%s\n", _PATH_DEV, f->f_un.f_fname); v->iov_base = crlf; v->iov_len = 2; errno = 0; /* ttymsg() only sometimes returns an errno */ if ((msgret = ttymsg(iov, 
IOV_SIZE, f->f_un.f_fname, 10))) { f->f_type = F_UNUSED; logerror(msgret); } break; case F_USERS: case F_WALL: dprintf("\n"); v->iov_base = crlf; v->iov_len = 2; wallmsg(f, iov, IOV_SIZE); break; } f->f_prevcount = 0; free(wmsg); } /* * WALLMSG -- Write a message to the world at large * * Write the specified message to either the entire * world, or a list of approved users. */ static void wallmsg(struct filed *f, struct iovec *iov, const int iovlen) { static int reenter; /* avoid calling ourselves */ struct utmpx *ut; int i; const char *p; if (reenter++) return; setutxent(); /* NOSTRICT */ while ((ut = getutxent()) != NULL) { if (ut->ut_type != USER_PROCESS) continue; if (f->f_type == F_WALL) { if ((p = ttymsg(iov, iovlen, ut->ut_line, TTYMSGTIME)) != NULL) { errno = 0; /* already in msg */ logerror(p); } continue; } /* should we send the message to this user? */ for (i = 0; i < MAXUNAMES; i++) { if (!f->f_un.f_uname[i][0]) break; if (!strcmp(f->f_un.f_uname[i], ut->ut_user)) { if ((p = ttymsg(iov, iovlen, ut->ut_line, TTYMSGTIME)) != NULL) { errno = 0; /* already in msg */ logerror(p); } break; } } } endutxent(); reenter = 0; } static void reapchild(int signo __unused) { int status; pid_t pid; struct filed *f; while ((pid = wait3(&status, WNOHANG, (struct rusage *)NULL)) > 0) { if (!Initialized) /* Don't tell while we are initting. */ continue; /* First, look if it's a process from the dead queue. */ if (deadq_remove(pid)) goto oncemore; /* Now, look in list of active processes. */ for (f = Files; f; f = f->f_next) if (f->f_type == F_PIPE && f->f_un.f_pipe.f_pid == pid) { (void)close(f->f_file); f->f_un.f_pipe.f_pid = 0; log_deadchild(pid, status, f->f_un.f_pipe.f_pname); break; } oncemore: continue; } } /* * Return a printable representation of a host address. 
*/ static const char * cvthname(struct sockaddr *f) { int error, hl; sigset_t omask, nmask; static char hname[NI_MAXHOST], ip[NI_MAXHOST]; error = getnameinfo((struct sockaddr *)f, ((struct sockaddr *)f)->sa_len, ip, sizeof ip, NULL, 0, NI_NUMERICHOST); dprintf("cvthname(%s)\n", ip); if (error) { dprintf("Malformed from address %s\n", gai_strerror(error)); return ("???"); } if (!resolve) return (ip); sigemptyset(&nmask); sigaddset(&nmask, SIGHUP); sigprocmask(SIG_BLOCK, &nmask, &omask); error = getnameinfo((struct sockaddr *)f, ((struct sockaddr *)f)->sa_len, hname, sizeof hname, NULL, 0, NI_NAMEREQD); sigprocmask(SIG_SETMASK, &omask, NULL); if (error) { dprintf("Host name for your address (%s) unknown\n", ip); return (ip); } hl = strlen(hname); if (hl > 0 && hname[hl-1] == '.') hname[--hl] = '\0'; trimdomain(hname, hl); return (hname); } static void dodie(int signo) { WantDie = signo; } static void domark(int signo __unused) { MarkSet = 1; } /* * Print syslogd errors some place. */ static void logerror(const char *type) { char buf[512]; static int recursed = 0; /* If there's an error while trying to log an error, give up. */ if (recursed) return; recursed++; if (errno) (void)snprintf(buf, sizeof buf, "syslogd: %s: %s", type, strerror(errno)); else (void)snprintf(buf, sizeof buf, "syslogd: %s", type); errno = 0; dprintf("%s\n", buf); logmsg(LOG_SYSLOG|LOG_ERR, buf, LocalHostName, ADDDATE); recursed--; } static void die(int signo) { struct filed *f; struct funix *fx; int was_initialized; char buf[100]; was_initialized = Initialized; Initialized = 0; /* Don't log SIGCHLDs. 
*/ for (f = Files; f != NULL; f = f->f_next) { /* flush any pending output */ if (f->f_prevcount) fprintlog(f, 0, (char *)NULL); if (f->f_type == F_PIPE && f->f_un.f_pipe.f_pid > 0) { (void)close(f->f_file); f->f_un.f_pipe.f_pid = 0; } } Initialized = was_initialized; if (signo) { dprintf("syslogd: exiting on signal %d\n", signo); (void)snprintf(buf, sizeof(buf), "exiting on signal %d", signo); errno = 0; logerror(buf); } STAILQ_FOREACH(fx, &funixes, next) (void)unlink(fx->name); pidfile_remove(pfh); exit(1); } /* * INIT -- Initialize syslogd from configuration table */ static void init(int signo) { int i; FILE *cf; struct filed *f, *next, **nextp; char *p; char cline[LINE_MAX]; char prog[LINE_MAX]; char host[MAXHOSTNAMELEN]; char oldLocalHostName[MAXHOSTNAMELEN]; char hostMsg[2*MAXHOSTNAMELEN+40]; char bootfileMsg[LINE_MAX]; dprintf("init\n"); /* * Load hostname (may have changed). */ if (signo != 0) (void)strlcpy(oldLocalHostName, LocalHostName, sizeof(oldLocalHostName)); if (gethostname(LocalHostName, sizeof(LocalHostName))) err(EX_OSERR, "gethostname() failed"); if ((p = strchr(LocalHostName, '.')) != NULL) { *p++ = '\0'; LocalDomain = p; } else { LocalDomain = ""; } /* * Close all open log files. 
*/ Initialized = 0; for (f = Files; f != NULL; f = next) { /* flush any pending output */ if (f->f_prevcount) fprintlog(f, 0, (char *)NULL); switch (f->f_type) { case F_FILE: case F_FORW: case F_CONSOLE: case F_TTY: (void)close(f->f_file); break; case F_PIPE: if (f->f_un.f_pipe.f_pid > 0) { (void)close(f->f_file); deadq_enter(f->f_un.f_pipe.f_pid, f->f_un.f_pipe.f_pname); } f->f_un.f_pipe.f_pid = 0; break; } next = f->f_next; if (f->f_program) free(f->f_program); if (f->f_host) free(f->f_host); free((char *)f); } Files = NULL; nextp = &Files; /* open the configuration file */ if ((cf = fopen(ConfFile, "r")) == NULL) { dprintf("cannot open %s\n", ConfFile); *nextp = (struct filed *)calloc(1, sizeof(*f)); if (*nextp == NULL) { logerror("calloc"); exit(1); } cfline("*.ERR\t/dev/console", *nextp, "*", "*"); (*nextp)->f_next = (struct filed *)calloc(1, sizeof(*f)); if ((*nextp)->f_next == NULL) { logerror("calloc"); exit(1); } cfline("*.PANIC\t*", (*nextp)->f_next, "*", "*"); Initialized = 1; return; } /* * Foreach line in the conf table, open that file. */ f = NULL; (void)strlcpy(host, "*", sizeof(host)); (void)strlcpy(prog, "*", sizeof(prog)); while (fgets(cline, sizeof(cline), cf) != NULL) { /* * check for end-of-section, comments, strip off trailing * spaces and newline character. #!prog is treated specially: * following lines apply only to that program. */ for (p = cline; isspace(*p); ++p) continue; if (*p == 0) continue; if (*p == '#') { p++; if (*p != '!' && *p != '+' && *p != '-') continue; } if (*p == '+' || *p == '-') { host[0] = *p++; while (isspace(*p)) p++; if ((!*p) || (*p == '*')) { (void)strlcpy(host, "*", sizeof(host)); continue; } if (*p == '@') p = LocalHostName; for (i = 1; i < MAXHOSTNAMELEN - 1; i++) { if (!isalnum(*p) && *p != '.' 
&& *p != '-' && *p != ',' && *p != ':' && *p != '%') break; host[i] = *p++; } host[i] = '\0'; continue; } if (*p == '!') { p++; while (isspace(*p)) p++; if ((!*p) || (*p == '*')) { (void)strlcpy(prog, "*", sizeof(prog)); continue; } for (i = 0; i < LINE_MAX - 1; i++) { if (!isprint(p[i]) || isspace(p[i])) break; prog[i] = p[i]; } prog[i] = 0; continue; } for (p = cline + 1; *p != '\0'; p++) { if (*p != '#') continue; if (*(p - 1) == '\\') { strcpy(p - 1, p); p--; continue; } *p = '\0'; break; } for (i = strlen(cline) - 1; i >= 0 && isspace(cline[i]); i--) cline[i] = '\0'; f = (struct filed *)calloc(1, sizeof(*f)); if (f == NULL) { logerror("calloc"); exit(1); } *nextp = f; nextp = &f->f_next; cfline(cline, f, prog, host); } /* close the configuration file */ (void)fclose(cf); Initialized = 1; if (Debug) { int port; for (f = Files; f; f = f->f_next) { for (i = 0; i <= LOG_NFACILITIES; i++) if (f->f_pmask[i] == INTERNAL_NOPRI) printf("X "); else printf("%d ", f->f_pmask[i]); printf("%s: ", TypeNames[f->f_type]); switch (f->f_type) { case F_FILE: printf("%s", f->f_un.f_fname); break; case F_CONSOLE: case F_TTY: printf("%s%s", _PATH_DEV, f->f_un.f_fname); break; case F_FORW: port = (int)ntohs(((struct sockaddr_in *) (f->f_un.f_forw.f_addr->ai_addr))->sin_port); if (port != 514) { printf("%s:%d", f->f_un.f_forw.f_hname, port); } else { printf("%s", f->f_un.f_forw.f_hname); } break; case F_PIPE: printf("%s", f->f_un.f_pipe.f_pname); break; case F_USERS: for (i = 0; i < MAXUNAMES && *f->f_un.f_uname[i]; i++) printf("%s, ", f->f_un.f_uname[i]); break; } if (f->f_program) printf(" (%s)", f->f_program); printf("\n"); } } logmsg(LOG_SYSLOG|LOG_INFO, "syslogd: restart", LocalHostName, ADDDATE); dprintf("syslogd: restarted\n"); /* * Log a change in hostname, but only on a restart. 
*/ if (signo != 0 && strcmp(oldLocalHostName, LocalHostName) != 0) { (void)snprintf(hostMsg, sizeof(hostMsg), "syslogd: hostname changed, \"%s\" to \"%s\"", oldLocalHostName, LocalHostName); logmsg(LOG_SYSLOG|LOG_INFO, hostMsg, LocalHostName, ADDDATE); dprintf("%s\n", hostMsg); } /* * Log the kernel boot file if we aren't going to use it as * the prefix, and if this is *not* a restart. */ if (signo == 0 && !use_bootfile) { (void)snprintf(bootfileMsg, sizeof(bootfileMsg), "syslogd: kernel boot file is %s", bootfile); logmsg(LOG_KERN|LOG_INFO, bootfileMsg, LocalHostName, ADDDATE); dprintf("%s\n", bootfileMsg); } } /* * Crack a configuration file line */ static void cfline(const char *line, struct filed *f, const char *prog, const char *host) { struct addrinfo hints, *res; int error, i, pri, syncfile; const char *p, *q; char *bp; char buf[MAXLINE], ebuf[100]; dprintf("cfline(\"%s\", f, \"%s\", \"%s\")\n", line, prog, host); errno = 0; /* keep strerror() stuff out of logerror messages */ /* clear out file entry */ memset(f, 0, sizeof(*f)); for (i = 0; i <= LOG_NFACILITIES; i++) f->f_pmask[i] = INTERNAL_NOPRI; /* save hostname if any */ if (host && *host == '*') host = NULL; if (host) { int hl; f->f_host = strdup(host); if (f->f_host == NULL) { logerror("strdup"); exit(1); } hl = strlen(f->f_host); if (hl > 0 && f->f_host[hl-1] == '.') f->f_host[--hl] = '\0'; trimdomain(f->f_host, hl); } /* save program name if any */ if (prog && *prog == '*') prog = NULL; if (prog) { f->f_program = strdup(prog); if (f->f_program == NULL) { logerror("strdup"); exit(1); } } /* scan through the list of selectors */ for (p = line; *p && *p != '\t' && *p != ' ';) { int pri_done; int pri_cmp; int pri_invert; /* find the end of this facility name list */ for (q = p; *q && *q != '\t' && *q != ' ' && *q++ != '.'; ) continue; /* get the priority comparison */ pri_cmp = 0; pri_done = 0; pri_invert = 0; if (*q == '!') { pri_invert = 1; q++; } while (!pri_done) { switch (*q) { case '<': pri_cmp |= 
PRI_LT; q++; break; case '=': pri_cmp |= PRI_EQ; q++; break; case '>': pri_cmp |= PRI_GT; q++; break; default: pri_done++; break; } } /* collect priority name */ for (bp = buf; *q && !strchr("\t,; ", *q); ) *bp++ = *q++; *bp = '\0'; /* skip cruft */ while (strchr(",;", *q)) q++; /* decode priority name */ if (*buf == '*') { pri = LOG_PRIMASK; pri_cmp = PRI_LT | PRI_EQ | PRI_GT; } else { /* Ignore trailing spaces. */ for (i = strlen(buf) - 1; i >= 0 && buf[i] == ' '; i--) buf[i] = '\0'; pri = decode(buf, prioritynames); if (pri < 0) { errno = 0; (void)snprintf(ebuf, sizeof ebuf, "unknown priority name \"%s\"", buf); logerror(ebuf); return; } } if (!pri_cmp) pri_cmp = (UniquePriority) ? (PRI_EQ) : (PRI_EQ | PRI_GT) ; if (pri_invert) pri_cmp ^= PRI_LT | PRI_EQ | PRI_GT; /* scan facilities */ while (*p && !strchr("\t.; ", *p)) { for (bp = buf; *p && !strchr("\t,;. ", *p); ) *bp++ = *p++; *bp = '\0'; if (*buf == '*') { for (i = 0; i < LOG_NFACILITIES; i++) { f->f_pmask[i] = pri; f->f_pcmp[i] = pri_cmp; } } else { i = decode(buf, facilitynames); if (i < 0) { errno = 0; (void)snprintf(ebuf, sizeof ebuf, "unknown facility name \"%s\"", buf); logerror(ebuf); return; } f->f_pmask[i >> 3] = pri; f->f_pcmp[i >> 3] = pri_cmp; } while (*p == ',' || *p == ' ') p++; } p = q; } /* skip to action part */ while (*p == '\t' || *p == ' ') p++; if (*p == '-') { syncfile = 0; p++; } else syncfile = 1; switch (*p) { case '@': { char *tp; char endkey = ':'; /* * scan forward to see if there is a port defined. * so we can't use strlcpy.. 
*/ i = sizeof(f->f_un.f_forw.f_hname); tp = f->f_un.f_forw.f_hname; p++; /* * an ipv6 address should start with a '[' in that case * we should scan for a ']' */ if (*p == '[') { p++; endkey = ']'; } while (*p && (*p != endkey) && (i-- > 0)) { *tp++ = *p++; } if (endkey == ']' && *p == endkey) p++; *tp = '\0'; } /* See if we copied a domain and have a port */ if (*p == ':') p++; else p = NULL; memset(&hints, 0, sizeof(hints)); hints.ai_family = family; hints.ai_socktype = SOCK_DGRAM; error = getaddrinfo(f->f_un.f_forw.f_hname, p ? p : "syslog", &hints, &res); if (error) { logerror(gai_strerror(error)); break; } f->f_un.f_forw.f_addr = res; f->f_type = F_FORW; break; case '/': if ((f->f_file = open(p, logflags, 0600)) < 0) { f->f_type = F_UNUSED; logerror(p); break; } if (syncfile) f->f_flags |= FFLAG_SYNC; if (isatty(f->f_file)) { if (strcmp(p, ctty) == 0) f->f_type = F_CONSOLE; else f->f_type = F_TTY; (void)strlcpy(f->f_un.f_fname, p + sizeof(_PATH_DEV) - 1, sizeof(f->f_un.f_fname)); } else { (void)strlcpy(f->f_un.f_fname, p, sizeof(f->f_un.f_fname)); f->f_type = F_FILE; } break; case '|': f->f_un.f_pipe.f_pid = 0; (void)strlcpy(f->f_un.f_pipe.f_pname, p + 1, sizeof(f->f_un.f_pipe.f_pname)); f->f_type = F_PIPE; break; case '*': f->f_type = F_WALL; break; default: for (i = 0; i < MAXUNAMES && *p; i++) { for (q = p; *q && *q != ','; ) q++; (void)strncpy(f->f_un.f_uname[i], p, MAXLOGNAME - 1); if ((q - p) >= MAXLOGNAME) f->f_un.f_uname[i][MAXLOGNAME - 1] = '\0'; else f->f_un.f_uname[i][q - p] = '\0'; while (*q == ',' || *q == ' ') q++; p = q; } f->f_type = F_USERS; break; } } /* * Decode a symbolic name to a numeric value */ static int decode(const char *name, const CODE *codetab) { const CODE *c; char *p, buf[40]; if (isdigit(*name)) return (atoi(name)); for (p = buf; *name && p < &buf[sizeof(buf) - 1]; p++, name++) { if (isupper(*name)) *p = tolower(*name); else *p = *name; } *p = '\0'; for (c = codetab; c->c_name; c++) if (!strcmp(buf, c->c_name)) return 
(c->c_val); return (-1); } static void markit(void) { struct filed *f; dq_t q, next; now = time((time_t *)NULL); MarkSeq += TIMERINTVL; if (MarkSeq >= MarkInterval) { logmsg(LOG_INFO, "-- MARK --", LocalHostName, ADDDATE|MARK); MarkSeq = 0; } for (f = Files; f; f = f->f_next) { if (f->f_prevcount && now >= REPEATTIME(f)) { dprintf("flush %s: repeated %d times, %d sec.\n", TypeNames[f->f_type], f->f_prevcount, repeatinterval[f->f_repeatcount]); fprintlog(f, 0, (char *)NULL); BACKOFF(f); } } /* Walk the dead queue, and see if we should signal somebody. */ for (q = TAILQ_FIRST(&deadq_head); q != NULL; q = next) { next = TAILQ_NEXT(q, dq_entries); switch (q->dq_timeout) { case 0: /* Already signalled once, try harder now. */ if (kill(q->dq_pid, SIGKILL) != 0) (void)deadq_remove(q->dq_pid); break; case 1: /* * Timed out on dead queue, send terminate * signal. Note that we leave the removal * from the dead queue to reapchild(), which * will also log the event (unless the process * didn't even really exist, in which case we simply * drop it from the dead queue). */ if (kill(q->dq_pid, SIGTERM) != 0) (void)deadq_remove(q->dq_pid); /* FALLTHROUGH */ default: q->dq_timeout--; } } MarkSet = 0; (void)alarm(TIMERINTVL); } /* * fork off and become a daemon, but wait for the child to come online * before returning to the parent, or we get disk thrashing at boot etc. * Set a timer so we don't hang forever if it wedges. */ static int waitdaemon(int nochdir, int noclose, int maxwait) { int fd; int status; pid_t pid, childpid; switch (childpid = fork()) { case -1: return (-1); case 0: break; default: signal(SIGALRM, timedout); alarm(maxwait); while ((pid = wait3(&status, 0, NULL)) != -1) { if (WIFEXITED(status)) errx(1, "child pid %d exited with return code %d", pid, WEXITSTATUS(status)); if (WIFSIGNALED(status)) errx(1, "child pid %d exited on signal %d%s", pid, WTERMSIG(status), WCOREDUMP(status) ? " (core dumped)" : ""); if (pid == childpid) /* it's gone...
*/ break; } exit(0); } if (setsid() == -1) return (-1); if (!nochdir) (void)chdir("/"); if (!noclose && (fd = open(_PATH_DEVNULL, O_RDWR, 0)) != -1) { (void)dup2(fd, STDIN_FILENO); (void)dup2(fd, STDOUT_FILENO); (void)dup2(fd, STDERR_FILENO); if (fd > 2) (void)close (fd); } return (getppid()); } /* * We get a SIGALRM from the child when it's running and finished doing its * fsync()s or O_SYNC writes for all the boot messages. * * We also get a signal from the kernel if the timer expires, so check to * see what happened. */ static void timedout(int sig __unused) { int left; left = alarm(0); signal(SIGALRM, SIG_DFL); if (left == 0) errx(1, "timed out waiting for child"); else _exit(0); } /* * Add `s' to the list of allowable peer addresses to accept messages * from. * * `s' is a string in the form: * * [*]domainname[:{servicename|portnumber|*}] * * or * * netaddr/maskbits[:{servicename|portnumber|*}] * * Returns -1 on error, 0 if the argument was valid. */ static int allowaddr(char *s) { char *cp1, *cp2; struct allowedpeer ap; struct servent *se; int masklen = -1; struct addrinfo hints, *res; struct in_addr *addrp, *maskp; #ifdef INET6 int i; u_int32_t *addr6p, *mask6p; #endif char ip[NI_MAXHOST]; #ifdef INET6 if (*s != '[' || (cp1 = strchr(s + 1, ']')) == NULL) #endif cp1 = s; if ((cp1 = strrchr(cp1, ':'))) { /* service/port provided */ *cp1++ = '\0'; if (strlen(cp1) == 1 && *cp1 == '*') /* any port allowed */ ap.port = 0; else if ((se = getservbyname(cp1, "udp"))) { ap.port = ntohs(se->s_port); } else { ap.port = strtol(cp1, &cp2, 0); if (*cp2 != '\0') return (-1); /* port not numeric */ } } else { if ((se = getservbyname("syslog", "udp"))) ap.port = ntohs(se->s_port); else /* sanity, should not happen */ ap.port = 514; } if ((cp1 = strchr(s, '/')) != NULL && strspn(cp1 + 1, "0123456789") == strlen(cp1 + 1)) { *cp1 = '\0'; if ((masklen = atoi(cp1 + 1)) < 0) return (-1); } #ifdef INET6 if (*s == '[') { cp2 = s + strlen(s) - 1; if (*cp2 == ']') { ++s; *cp2 = '\0';
} else { cp2 = NULL; } } else { cp2 = NULL; } #endif memset(&hints, 0, sizeof(hints)); hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_DGRAM; hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST; if (getaddrinfo(s, NULL, &hints, &res) == 0) { ap.isnumeric = 1; memcpy(&ap.a_addr, res->ai_addr, res->ai_addrlen); memset(&ap.a_mask, 0, sizeof(ap.a_mask)); ap.a_mask.ss_family = res->ai_family; if (res->ai_family == AF_INET) { ap.a_mask.ss_len = sizeof(struct sockaddr_in); maskp = &((struct sockaddr_in *)&ap.a_mask)->sin_addr; addrp = &((struct sockaddr_in *)&ap.a_addr)->sin_addr; if (masklen < 0) { /* use default netmask */ if (IN_CLASSA(ntohl(addrp->s_addr))) maskp->s_addr = htonl(IN_CLASSA_NET); else if (IN_CLASSB(ntohl(addrp->s_addr))) maskp->s_addr = htonl(IN_CLASSB_NET); else maskp->s_addr = htonl(IN_CLASSC_NET); } else if (masklen <= 32) { /* convert masklen to netmask */ if (masklen == 0) maskp->s_addr = 0; else maskp->s_addr = htonl(~((1 << (32 - masklen)) - 1)); } else { freeaddrinfo(res); return (-1); } /* Lose any host bits in the network number. */ addrp->s_addr &= maskp->s_addr; } #ifdef INET6 else if (res->ai_family == AF_INET6 && masklen <= 128) { ap.a_mask.ss_len = sizeof(struct sockaddr_in6); if (masklen < 0) masklen = 128; mask6p = (u_int32_t *)&((struct sockaddr_in6 *)&ap.a_mask)->sin6_addr; /* convert masklen to netmask */ while (masklen > 0) { if (masklen < 32) { *mask6p = htonl(~(0xffffffff >> masklen)); break; } *mask6p++ = 0xffffffff; masklen -= 32; } /* Lose any host bits in the network number. 
*/ mask6p = (u_int32_t *)&((struct sockaddr_in6 *)&ap.a_mask)->sin6_addr; addr6p = (u_int32_t *)&((struct sockaddr_in6 *)&ap.a_addr)->sin6_addr; for (i = 0; i < 4; i++) addr6p[i] &= mask6p[i]; } #endif else { freeaddrinfo(res); return (-1); } freeaddrinfo(res); } else { /* arg `s' is domain name */ ap.isnumeric = 0; ap.a_name = s; if (cp1) *cp1 = '/'; #ifdef INET6 if (cp2) { *cp2 = ']'; --s; } #endif } if (Debug) { printf("allowaddr: rule %d: ", NumAllowed); if (ap.isnumeric) { printf("numeric, "); getnameinfo((struct sockaddr *)&ap.a_addr, ((struct sockaddr *)&ap.a_addr)->sa_len, ip, sizeof ip, NULL, 0, NI_NUMERICHOST); printf("addr = %s, ", ip); getnameinfo((struct sockaddr *)&ap.a_mask, ((struct sockaddr *)&ap.a_mask)->sa_len, ip, sizeof ip, NULL, 0, NI_NUMERICHOST); printf("mask = %s; ", ip); } else { printf("domainname = %s; ", ap.a_name); } printf("port = %d\n", ap.port); } if ((AllowedPeers = realloc(AllowedPeers, ++NumAllowed * sizeof(struct allowedpeer))) == NULL) { logerror("realloc"); exit(1); } memcpy(&AllowedPeers[NumAllowed - 1], &ap, sizeof(struct allowedpeer)); return (0); } /* * Validate that the remote peer has permission to log to us. 
 */
static int
validate(struct sockaddr *sa, const char *hname)
{
	int i;
	size_t l1, l2;
	char *cp, name[NI_MAXHOST], ip[NI_MAXHOST], port[NI_MAXSERV];
	struct allowedpeer *ap;
	struct sockaddr_in *sin4, *a4p = NULL, *m4p = NULL;
#ifdef INET6
	int j, reject;
	struct sockaddr_in6 *sin6, *a6p = NULL, *m6p = NULL;
#endif
	struct addrinfo hints, *res;
	u_short sport;

	if (NumAllowed == 0)
		/* traditional behaviour, allow everything */
		return (1);

	(void)strlcpy(name, hname, sizeof(name));
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_DGRAM;
	hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST;
	if (getaddrinfo(name, NULL, &hints, &res) == 0)
		freeaddrinfo(res);
	else if (strchr(name, '.') == NULL) {
		strlcat(name, ".", sizeof name);
		strlcat(name, LocalDomain, sizeof name);
	}
	if (getnameinfo(sa, sa->sa_len, ip, sizeof ip, port, sizeof port,
	    NI_NUMERICHOST | NI_NUMERICSERV) != 0)
		return (0);	/* for safety, should not occur */
	dprintf("validate: dgram from IP %s, port %s, name %s;\n",
	    ip, port, name);
	sport = atoi(port);

	/* now, walk down the list */
	for (i = 0, ap = AllowedPeers; i < NumAllowed; i++, ap++) {
		if (ap->port != 0 && ap->port != sport) {
			dprintf("rejected in rule %d due to port mismatch.\n", i);
			continue;
		}

		if (ap->isnumeric) {
			if (ap->a_addr.ss_family != sa->sa_family) {
				dprintf("rejected in rule %d due to address family mismatch.\n", i);
				continue;
			}
			if (ap->a_addr.ss_family == AF_INET) {
				sin4 = (struct sockaddr_in *)sa;
				a4p = (struct sockaddr_in *)&ap->a_addr;
				m4p = (struct sockaddr_in *)&ap->a_mask;
				if ((sin4->sin_addr.s_addr & m4p->sin_addr.s_addr)
				    != a4p->sin_addr.s_addr) {
					dprintf("rejected in rule %d due to IP mismatch.\n", i);
					continue;
				}
			}
#ifdef INET6
			else if (ap->a_addr.ss_family == AF_INET6) {
				sin6 = (struct sockaddr_in6 *)sa;
				a6p = (struct sockaddr_in6 *)&ap->a_addr;
				m6p = (struct sockaddr_in6 *)&ap->a_mask;
				if (a6p->sin6_scope_id != 0 &&
				    sin6->sin6_scope_id != a6p->sin6_scope_id) {
					dprintf("rejected in rule %d due to scope mismatch.\n", i);
					continue;
				}
				reject = 0;
				for (j = 0; j < 16; j += 4) {
					if ((*(u_int32_t *)&sin6->sin6_addr.s6_addr[j] & *(u_int32_t *)&m6p->sin6_addr.s6_addr[j])
					    != *(u_int32_t *)&a6p->sin6_addr.s6_addr[j]) {
						++reject;
						break;
					}
				}
				if (reject) {
					dprintf("rejected in rule %d due to IP mismatch.\n", i);
					continue;
				}
			}
#endif
			else
				continue;
		} else {
			cp = ap->a_name;
			l1 = strlen(name);
			if (*cp == '*') {
				/* allow wildmatch */
				cp++;
				l2 = strlen(cp);
				if (l2 > l1 || memcmp(cp, &name[l1 - l2], l2) != 0) {
					dprintf("rejected in rule %d due to name mismatch.\n", i);
					continue;
				}
			} else {
				/* exact match */
				l2 = strlen(cp);
				if (l2 != l1 || memcmp(cp, name, l1) != 0) {
					dprintf("rejected in rule %d due to name mismatch.\n", i);
					continue;
				}
			}
		}
		dprintf("accepted in rule %d.\n", i);
		return (1);	/* hooray! */
	}
	return (0);
}

/*
 * Fairly similar to popen(3), but returns an open descriptor, as
 * opposed to a FILE *.
 */
static int
p_open(const char *prog, pid_t *rpid)
{
	int pfd[2], nulldesc;
	pid_t pid;
	sigset_t omask, mask;
	char *argv[4]; /* sh -c cmd NULL */
	char errmsg[200];

	if (pipe(pfd) == -1)
		return (-1);
	if ((nulldesc = open(_PATH_DEVNULL, O_RDWR)) == -1)
		/* we are royally screwed anyway */
		return (-1);

	sigemptyset(&mask);
	sigaddset(&mask, SIGALRM);
	sigaddset(&mask, SIGHUP);
	sigprocmask(SIG_BLOCK, &mask, &omask);
	switch ((pid = fork())) {
	case -1:
		sigprocmask(SIG_SETMASK, &omask, 0);
		close(nulldesc);
		return (-1);

	case 0:
		argv[0] = strdup("sh");
		argv[1] = strdup("-c");
		argv[2] = strdup(prog);
		argv[3] = NULL;
		if (argv[0] == NULL || argv[1] == NULL || argv[2] == NULL) {
			logerror("strdup");
			exit(1);
		}

		alarm(0);
		(void)setsid();	/* Avoid catching SIGHUPs. */

		/*
		 * Throw away pending signals, and reset signal
		 * behaviour to standard values.
		 */
		signal(SIGALRM, SIG_IGN);
		signal(SIGHUP, SIG_IGN);
		sigprocmask(SIG_SETMASK, &omask, 0);
		signal(SIGPIPE, SIG_DFL);
		signal(SIGQUIT, SIG_DFL);
		signal(SIGALRM, SIG_DFL);
		signal(SIGHUP, SIG_DFL);

		dup2(pfd[0], STDIN_FILENO);
		dup2(nulldesc, STDOUT_FILENO);
		dup2(nulldesc, STDERR_FILENO);
		closefrom(3);

		(void)execvp(_PATH_BSHELL, argv);
		_exit(255);
	}

	sigprocmask(SIG_SETMASK, &omask, 0);
	close(nulldesc);
	close(pfd[0]);
	/*
	 * Avoid blocking on a hung pipe.  With O_NONBLOCK, we are
	 * supposed to get an EWOULDBLOCK on writev(2), which is
	 * caught by the logic above anyway, which will in turn close
	 * the pipe, and fork a new logging subprocess if necessary.
	 * The stale subprocess will be killed some time later unless
	 * it terminated itself due to closing its input pipe (so we
	 * get rid of really dead puppies).
	 */
	if (fcntl(pfd[1], F_SETFL, O_NONBLOCK) == -1) {
		/* This is bad. */
		(void)snprintf(errmsg, sizeof errmsg,
		    "Warning: cannot change pipe to PID %d to "
		    "non-blocking behaviour.",
		    (int)pid);
		logerror(errmsg);
	}
	*rpid = pid;
	return (pfd[1]);
}

static void
deadq_enter(pid_t pid, const char *name)
{
	dq_t p;
	int status;

	/*
	 * Be paranoid, if we can't signal the process, don't enter it
	 * into the dead queue (perhaps it's already dead).  If possible,
	 * we try to fetch and log the child's status.
	 */
	if (kill(pid, 0) != 0) {
		if (waitpid(pid, &status, WNOHANG) > 0)
			log_deadchild(pid, status, name);
		return;
	}

	p = malloc(sizeof(struct deadq_entry));
	if (p == NULL) {
		logerror("malloc");
		exit(1);
	}

	p->dq_pid = pid;
	p->dq_timeout = DQ_TIMO_INIT;
	TAILQ_INSERT_TAIL(&deadq_head, p, dq_entries);
}

static int
deadq_remove(pid_t pid)
{
	dq_t q;

	TAILQ_FOREACH(q, &deadq_head, dq_entries) {
		if (q->dq_pid == pid) {
			TAILQ_REMOVE(&deadq_head, q, dq_entries);
			free(q);
			return (1);
		}
	}

	return (0);
}

static void
log_deadchild(pid_t pid, int status, const char *name)
{
	int code;
	char buf[256];
	const char *reason;

	errno = 0; /* Keep strerror() stuff out of logerror messages. */
	if (WIFSIGNALED(status)) {
		reason = "due to signal";
		code = WTERMSIG(status);
	} else {
		reason = "with status";
		code = WEXITSTATUS(status);
		if (code == 0)
			return;
	}
	(void)snprintf(buf, sizeof buf,
	    "Logging subprocess %d (%s) exited %s %d.",
	    pid, name, reason, code);
	logerror(buf);
}

static int *
socksetup(int af, char *bindhostname)
{
	struct addrinfo hints, *res, *r;
	const char *bindservice;
	char *cp;
	int error, maxs, *s, *socks;

	/*
	 * We have to handle this case for backwards compatibility:
	 * If there are two (or more) colons but no '[' and ']',
	 * assume this is an inet6 address without a service.
	 */
	bindservice = "syslog";
	if (bindhostname != NULL) {
#ifdef INET6
		if (*bindhostname == '[' &&
		    (cp = strchr(bindhostname + 1, ']')) != NULL) {
			++bindhostname;
			*cp = '\0';
			if (cp[1] == ':' && cp[2] != '\0')
				bindservice = cp + 2;
		} else {
#endif
			cp = strchr(bindhostname, ':');
			if (cp != NULL && strchr(cp + 1, ':') == NULL) {
				*cp = '\0';
				if (cp[1] != '\0')
					bindservice = cp + 1;
				if (cp == bindhostname)
					bindhostname = NULL;
			}
#ifdef INET6
		}
#endif
	}

	memset(&hints, 0, sizeof(hints));
	hints.ai_flags = AI_PASSIVE;
	hints.ai_family = af;
	hints.ai_socktype = SOCK_DGRAM;
	error = getaddrinfo(bindhostname, bindservice, &hints, &res);
	if (error) {
		logerror(gai_strerror(error));
		errno = 0;
		die(0);
	}

	/* Count max number of sockets we may open */
	for (maxs = 0, r = res; r; r = r->ai_next, maxs++);
	socks = malloc((maxs+1) * sizeof(int));
	if (socks == NULL) {
		logerror("couldn't allocate memory for sockets");
		die(0);
	}

	*socks = 0;	/* num of sockets counter at start of array */
	s = socks + 1;
	for (r = res; r; r = r->ai_next) {
		int on = 1;
		*s = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
		if (*s < 0) {
			logerror("socket");
			continue;
		}
#ifdef INET6
		if (r->ai_family == AF_INET6) {
			if (setsockopt(*s, IPPROTO_IPV6, IPV6_V6ONLY,
			    (char *)&on, sizeof (on)) < 0) {
				logerror("setsockopt");
				close(*s);
				continue;
			}
		}
#endif
		if (setsockopt(*s, SOL_SOCKET, SO_REUSEADDR,
		    (char *)&on, sizeof (on)) < 0) {
			logerror("setsockopt");
			close(*s);
			continue;
		}
		/*
		 * RFC 3164 recommends that client side message
		 * should come from the privileged syslogd port.
		 *
		 * If the system administrator choose not to obey
		 * this, we can skip the bind() step so that the
		 * system will choose a port for us.
		 */
		if (!NoBind) {
			if (bind(*s, r->ai_addr, r->ai_addrlen) < 0) {
				logerror("bind");
				close(*s);
				continue;
			}

			if (!SecureMode)
				increase_rcvbuf(*s);
		}

		(*socks)++;
		s++;
	}

	if (*socks == 0) {
		free(socks);
		if (Debug)
			return (NULL);
		else
			die(0);
	}
	if (res)
		freeaddrinfo(res);

	return (socks);
}

static void
increase_rcvbuf(int fd)
{
	socklen_t len, slen;

	slen = sizeof(len);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &len, &slen) == 0) {
		if (len < RCVBUF_MINSIZE) {
			len = RCVBUF_MINSIZE;
			setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &len, sizeof(len));
		}
	}
}
Index: projects/ifnet
===================================================================
--- projects/ifnet	(revision 279031)
+++ projects/ifnet	(revision 279032)

Property changes on: projects/ifnet
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head:r278980-279031