Index: releng/10.2/UPDATING =================================================================== --- releng/10.2/UPDATING (revision 296954) +++ releng/10.2/UPDATING (revision 296955) @@ -1,2325 +1,2335 @@ Updating Information for FreeBSD current users This file is maintained and copyrighted by M. Warner Losh . See end of file for further details. For commonly done items, please see the COMMON ITEMS: section later in the file. These instructions assume that you basically know what you are doing. If not, then please consult the FreeBSD handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html Items affecting the ports and packages system can be found in /usr/ports/UPDATING. Please read that file before running portupgrade. NOTE: FreeBSD has switched from gcc to clang. If you have trouble bootstrapping from older versions of FreeBSD, try WITHOUT_CLANG to bootstrap to the tip of stable/10, and then rebuild without this option. The bootstrap process from older version of current is a bit fragile. +20160316 p14 FreeBSD-SA-16:14.openssh-xauth + FreeBSD-SA-16:15.sysarch + FreeBSD-EN-16:04.hyperv + FreeBSD-EN-16:05.hv_netvsc + + Fix OpenSSH xauth(1) command injection. [SA-16:14] + Fix incorrect argument validation in sysarch(2). [SA-16:15] + Fix Hyper-V KVP (Key-Value Pair) daemon indefinite sleep. [EN-16:04] + Fix hv_netvsc(4) incorrect TCP/IP checksums. [EN-16:05] + 20160303 p13 FreeBSD-SA-16:12.openssl Fix multiple vulnerabilities of OpenSSL. 20160130 p12 FreeBSD-SA-16:11.openssl Fix OpenSSL SSLv2 ciphersuite downgrade vulnerability. [SA-16:11] 20160127 p11 FreeBSD-SA-16:09.ntp FreeBSD-SA-16:10.linux Fix multiple vulnerabilities of ntp. [SA-16:09] Fix Linux compatibility layer issetugid(2) system call vulnerability. [SA-16:10] 20160114 p10 FreeBSD-SA-16:07.openssh Fix OpenSSH client information leak. [SA-16:07] 20160114 p9 FreeBSD-EN-16:01.filemon FreeBSD-EN-16:02.pf FreeBSD-EN-16:03.yplib FreeBSD-SA-16:01.sctp FreeBSD-SA-16:02.ntp FreeBSD-SA-16:03.linux FreeBSD-SA-16:04.linux FreeBSD-SA-16:05.tcp FreeBSD-SA-16:06.bsnmpd Fix multiple stability and locking problems in filemon(4). [EN-16:01] Fix pf(4) generating bad TCP checksums. [EN-16:02] Fix infinite loop in YP/NIS client library. [EN-16:03] Fix remote denial of service in SCTP. [SA-16:01] Update NTP to 4.2.8p5. [SA-16:02] Fix kernel memory diclosure in Linux compatibility layer. [SA-16:03] Fix kernel memory overwrite in Linux compatibility layer. [SA-16:04] Fix crash in TCP MD5 signatures. [SA-16:05] Fix insecure default permissions for snmpd.config. [SA-16:06] 20151205 p8 FreeBSD-SA-15:26.openssl Fix multiple OpenSSL vulnerabilities. [SA-15:26] 20151104 p7 FreeBSD-SA-15:25.ntp [revised] FreeBSD-EN-15:19.kqueue FreeBSD-EN-15:20.vm Fix regression in ntpd(8) lacking support for RAWDCF reference clock in 10.2-RELEASE-p6. [SA-15:25] Fix kqueue write events never fired for files greater 2GB. [EN-15:19] Fix applications exiting due to segmentation violation on a correct memory address. [EN-15:20.vm] 20151026: p6 FreeBSD-SA-15:25.ntp Fix multiple NTP vulnerabilities. New NTP version is 4.2.8p4. 20151002: p5 FreeBSD-SA-15:24.rpcbind [revised] Revised patch to address a regression that prevents NIS from working. 20150929: p4 FreeBSD-SA-15:24.rpcbind Fix rpcbind(8) remote denial of service. [SA-15:24] 20150916: p3 FreeBSD-EN-15:16.pw FreeBSD-EN-15:17.libc FreeBSD-EN-15:18.pkg Fix regression in pw(8) when creating numeric users or groups. [EN-15:16] Fix libc handling of signals for multi-threaded processes. [EN-15:17] Implement pubkey support for pkg(7) bootstrap. [EN-15:18] 20150825: p2 FreeBSD-SA-15:22.openssh FreeBSD-EN-15:15.pkg Fix OpenSSH multiple vulnerabilities. [SA-15:22] Fix insufficient check of unsupported pkg(7) signature methods. [EN-15:15] 20150818: p1 FreeBSD-SA-15:20.expat FreeBSD-EN-15:11.toolchain FreeBSD-EN-15:12.netstat FreeBSD-EN-15:13.vidcontrol Fix multiple integer overflows in expat (libbsdxml) XML parser. [SA-15:20] Fix make(1) syntax errors when upgrading from 9.x and earlier. [EN-15:11] Fix incorrect netstat(1) data handling on 32-bit systems. [EN-15:12] Allow size argument to vidcontrol(1) for syscons(4). [EN-15:13] 20150813: 10.2-RELEASE. 20150703: The default Unbound configuration now enables remote control using a local socket. Users who have already enabled the local_unbound service should regenerate their configuration by running "service local_unbound setup" as root. 20150624: An additional fix for the issue described in the 20150614 sendmail entry below has been been committed in revision 284786. 20150615: The fix for the issue described in the 20150614 sendmail entry below has been been committed in revision 284485. The work around described in that entry is no longer needed unless the default setting is overridden by a confDH_PARAMETERS configuration setting of '5' or pointing to a 512 bit DH parameter file. 20150614: The import of openssl to address the FreeBSD-SA-15:10.openssl security advisory includes a change which rejects handshakes with DH parameters below 768 bits. sendmail releases prior to 8.15.2 (not yet released), defaulted to a 512 bit DH parameter setting for client connections. To work around this interoperability, sendmail can be configured to use a 2048 bit DH parameter by: 1. Edit /etc/mail/`hostname`.mc 2. If a setting for confDH_PARAMETERS does not exist or exists and is set to a string beginning with '5', replace it with '2'. 3. If a setting for confDH_PARAMETERS exists and is set to a file path, create a new file with: openssl dhparam -out /path/to/file 2048 4. Rebuild the .cf file: cd /etc/mail/; make; make install 5. Restart sendmail: cd /etc/mail/; make restart A sendmail patch is coming, at which time this file will be updated. 20150601: chmod, chflags, chown and chgrp now affect symlinks in -R mode as defined in symlink(7); previously symlinks were silently ignored. 20150430: The const qualifier has been removed from iconv(3) to comply with POSIX. The ports tree is aware of this from r384038 onwards. 20141215: At svn r275807, The default linux compat kernel ABI has been adjusted to 2.6.18 in support of the linux-c6 compat ports infrastructure update. If you wish to continue using the linux-f10 compat ports, add compat.linux.osrelease=2.6.16 to your local sysctl.conf. Users are encouraged to update their linux-compat packages to linux-c6 during their next update cycle. See ports/UPDATING 20141209 and 20141215 on migration to CentOS 6 ports. 20141205: pjdfstest has been integrated into kyua as an opt-in test suite. Please see share/doc/pjdfstest/README for a more details on how to execute it. 20141118: 10.1-RELEASE. 20140904: The ofwfb driver, used to provide a graphics console on PowerPC when using vt(4), no longer allows mmap() of all of physical memory. This will prevent Xorg on PowerPC with some ATI graphics cards from initializing properly unless x11-servers/xorg-server is updated to 1.12.4_8 or newer. 20140831: The libatf-c and libatf-c++ major versions were downgraded to 0 and 1 respectively to match the upstream numbers. They were out of sync because, when they were originally added to FreeBSD, the upstream versions were not respected. These libraries are private and not yet built by default, so renumbering them should be a non-issue. However, unclean source trees will yield broken test programs once the operator executes "make delete-old-libs" after a "make installworld". Additionally, the atf-sh binary was made private by moving it into /usr/libexec/. Already-built shell test programs will keep the path to the old binary so they will break after "make delete-old" is run. If you are using WITH_TESTS=yes (not the default), wipe the object tree and rebuild from scratch to prevent spurious test failures. This is only needed once: the misnumbered libraries and misplaced binaries have been added to OptionalObsoleteFiles.inc so they will be removed during a clean upgrade. 20140814: The ixgbe tunables now match their sysctl counterparts, for example: hw.ixgbe.enable_aim => hw.ix.enable_aim Anyone using ixgbe tunables should ensure they update /boot/loader.conf. 20140801: The NFSv4.1 server committed by r269398 changes the internal function call interfaces used between the NFS and krpc modules. As such, __FreeBSD_version was bumped. 20140729: The default unbound configuration has been modified to address issues with reverse lookups on networks that use private address ranges. If you use the local_unbound service, run "service local_unbound setup" as root to regenerate your configuration, then "service local_unbound reload" to load the new configuration. 20140717: It is no longer necessary to include the dwarf version in your DEBUG options in your kernel config file. The bug that required it to be placed in the config file has bene fixed. DEBUG should now just contain -g. The build system will automatically update things to do the right thing. 20140715: Several ABI breaking changes were merged to CTL and new iSCSI code. All CTL and iSCSI-related tools, such as ctladm, ctld, iscsid and iscsictl need to be rebuilt to work with a new kernel. 20140708: The WITHOUT_VT_SUPPORT kernel config knob has been renamed WITHOUT_VT. (The other _SUPPORT knobs have a consistent meaning which differs from the behaviour controlled by this knob.) 20140608: On i386 and amd64 systems, the onifconsole flag is now set by default in /etc/ttys for ttyu0. This causes ttyu0 to be automatically enabled as a login TTY if it is set in the bootloader as an active kernel console. No changes in behavior should result otherwise. To revert to the previous behavior, set ttyu0 to "off" in /etc/ttys. 20140512: Clang and llvm have been upgraded to 3.4.1 release. 20140321: Clang and llvm have been upgraded to 3.4 release. 20140306: If a Makefile in a tests/ directory was auto-generating a Kyuafile instead of providing an explicit one, this would prevent such Makefile from providing its own Kyuafile in the future during NO_CLEAN builds. This has been fixed in the Makefiles but manual intervention is needed to clean an objdir if you use NO_CLEAN: # find /usr/obj -name Kyuafile | xargs rm -f 20140303: OpenSSH will now ignore errors caused by kernel lacking of Capsicum capability mode support. Please note that enabling the feature in kernel is still highly recommended. 20140227: OpenSSH is now built with sandbox support, and will use sandbox as the default privilege separation method. This requires Capsicum capability mode support in kernel. 20140216: The nve(4) driver for NVIDIA nForce MCP Ethernet adapters has been deprecated and will not be part of FreeBSD 11.0 and later releases. If you use this driver, please consider switching to the nfe(4) driver instead. 20140120: 10.0-RELEASE. 20131216: The behavior of gss_pseudo_random() for the krb5 mechanism has changed, for applications requesting a longer random string than produced by the underlying enctype's pseudo-random() function. In particular, the random string produced from a session key of enctype aes256-cts-hmac-sha1-96 or aes256-cts-hmac-sha1-96 will be different at the 17th octet and later, after this change. The counter used in the PRF+ construction is now encoded as a big-endian integer in accordance with RFC 4402. __FreeBSD_version is bumped to 1000701. 20131108: The WITHOUT_ATF build knob has been removed and its functionality has been subsumed into the more generic WITHOUT_TESTS. If you were using the former to disable the build of the ATF libraries, you should change your settings to use the latter. 20131031: The default version of mtree is nmtree which is obtained from NetBSD. The output is generally the same, but may vary slightly. If you found you need identical output adding "-F freebsd9" to the command line should do the trick. For the time being, the old mtree is available as fmtree. 20131014: libbsdyml has been renamed to libyaml and moved to /usr/lib/private. This will break ports-mgmt/pkg. Rebuild the port, or upgrade to pkg 1.1.4_8 and verify bsdyml not linked in, before running "make delete-old-libs": # make -C /usr/ports/ports-mgmt/pkg build deinstall install clean or # pkg install pkg; ldd /usr/local/sbin/pkg | grep bsdyml 20131010: The rc.d/jail script has been updated to support jail(8) configuration file. The "jail__*" rc.conf(5) variables for per-jail configuration are automatically converted to /var/run/jail..conf before the jail(8) utility is invoked. This is transparently backward compatible. See below about some incompatibilities and rc.conf(5) manual page for more details. These variables are now deprecated in favor of jail(8) configuration file. One can use "rc.d/jail config " command to generate a jail(8) configuration file in /var/run/jail..conf without running the jail(8) utility. The default pathname of the configuration file is /etc/jail.conf and can be specified by using $jail_conf or $jail__conf variables. Please note that jail_devfs_ruleset accepts an integer at this moment. Please consider to rewrite the ruleset name with an integer. 20130930: BIND has been removed from the base system. If all you need is a local resolver, simply enable and start the local_unbound service instead. Otherwise, several versions of BIND are available in the ports tree. The dns/bind99 port is one example. With this change, nslookup(1) and dig(1) are no longer in the base system. Users should instead use host(1) and drill(1) which are in the base system. Alternatively, nslookup and dig can be obtained by installing the dns/bind-tools port. 20130916: With the addition of unbound(8), a new unbound user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20130911: OpenSSH is now built with DNSSEC support, and will by default silently trust signed SSHFP records. This can be controlled with the VerifyHostKeyDNS client configuration setting. DNSSEC support can be disabled entirely with the WITHOUT_LDNS option in src.conf. 20130906: The GNU Compiler Collection and C++ standard library (libstdc++) are no longer built by default on platforms where clang is the system compiler. You can enable them with the WITH_GCC and WITH_GNUCXX options in src.conf. 20130905: The PROCDESC kernel option is now part of the GENERIC kernel configuration and is required for the rwhod(8) to work. If you are using custom kernel configuration, you should include 'options PROCDESC'. 20130905: The API and ABI related to the Capsicum framework was modified in backward incompatible way. The userland libraries and programs have to be recompiled to work with the new kernel. This includes the following libraries and programs, but the whole buildworld is advised: libc, libprocstat, dhclient, tcpdump, hastd, hastctl, kdump, procstat, rwho, rwhod, uniq. 20130903: AES-NI intrinsic support has been added to gcc. The AES-NI module has been updated to use this support. A new gcc is required to build the aesni module on both i386 and amd64. 20130821: The PADLOCK_RNG and RDRAND_RNG kernel options are now devices. Thus "device padlock_rng" and "device rdrand_rng" should be used instead of "options PADLOCK_RNG" & "options RDRAND_RNG". 20130813: WITH_ICONV has been split into two feature sets. WITH_ICONV now enables just the iconv* functionality and is now on by default. WITH_LIBICONV_COMPAT enables the libiconv api and link time compatability. Set WITHOUT_ICONV to build the old way. If you have been using WITH_ICONV before, you will very likely need to turn on WITH_LIBICONV_COMPAT. 20130806: INVARIANTS option now enables DEBUG for code with OpenSolaris and Illumos origin, including ZFS. If you have INVARIANTS in your kernel configuration, then there is no need to set DEBUG or ZFS_DEBUG explicitly. DEBUG used to enable witness(9) tracking of OpenSolaris (mostly ZFS) locks if WITNESS option was set. Because that generated a lot of witness(9) reports and all of them were believed to be false positives, this is no longer done. New option OPENSOLARIS_WITNESS can be used to achieve the previous behavior. 20130806: Timer values in IPv6 data structures now use time_uptime instead of time_second. Although this is not a user-visible functional change, userland utilities which directly use them---ndp(8), rtadvd(8), and rtsold(8) in the base system---need to be updated to r253970 or later. 20130802: find -delete can now delete the pathnames given as arguments, instead of only files found below them or if the pathname did not contain any slashes. Formerly, the following error message would result: find: -delete: : relative path potentially not safe Deleting the pathnames given as arguments can be prevented without error messages using -mindepth 1 or by changing directory and passing "." as argument to find. This works in the old as well as the new version of find. 20130726: Behavior of devfs rules path matching has been changed. Pattern is now always matched against fully qualified devfs path and slash characters must be explicitly matched by slashes in pattern (FNM_PATHNAME). Rulesets involving devfs subdirectories must be reviewed. 20130716: The default ARM ABI has changed to the ARM EABI. The old ABI is incompatible with the ARM EABI and all programs and modules will need to be rebuilt to work with a new kernel. To keep using the old ABI ensure the WITHOUT_ARM_EABI knob is set. NOTE: Support for the old ABI will be removed in the future and users are advised to upgrade. 20130709: pkg_install has been disconnected from the build if you really need it you should add WITH_PKGTOOLS in your src.conf(5). 20130709: Most of network statistics structures were changed to be able keep 64-bits counters. Thus all tools, that work with networking statistics, must be rebuilt (netstat(1), bsnmpd(1), etc.) 20130629: Fix targets that run multiple make's to use && rather than ; so that subsequent steps depend on success of previous. NOTE: if building 'universe' with -j* on stable/8 or stable/9 it would be better to start the build using bmake, to avoid overloading the machine. 20130618: Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file. 20130615: CVS has been removed from the base system. An exact copy of the code is available from the devel/cvs port. 20130613: Some people report the following error after the switch to bmake: make: illegal option -- J usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable] ... *** [buildworld] Error code 2 this likely due to an old instance of make in ${MAKEPATH} (${MAKEOBJDIRPREFIX}${.CURDIR}/make.${MACHINE}) which src/Makefile will use that blindly, if it exists, so if you see the above error: rm -rf `make -V MAKEPATH` should resolve it. 20130516: Use bmake by default. Whereas before one could choose to build with bmake via -DWITH_BMAKE one must now use -DWITHOUT_BMAKE to use the old make. The goal is to remove these knobs for 10-RELEASE. It is worth noting that bmake (like gmake) treats the command line as the unit of failure, rather than statements within the command line. Thus '(cd some/where && dosomething)' is safer than 'cd some/where; dosomething'. The '()' allows consistent behavior in parallel build. 20130429: Fix a bug that allows NFS clients to issue READDIR on files. 20130426: The WITHOUT_IDEA option has been removed because the IDEA patent expired. 20130426: The sysctl which controls TRIM support under ZFS has been renamed from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled and has been enabled by default. 20130425: The mergemaster command now uses the default MAKEOBJDIRPREFIX rather than creating it's own in the temporary directory in order allow access to bootstrapped versions of tools such as install and mtree. When upgrading from version of FreeBSD where the install command does not support -l, you will need to install a new mergemaster command if mergemaster -p is required. This can be accomplished with the command (cd src/usr.sbin/mergemaster && make install). 20130404: Legacy ATA stack, disabled and replaced by new CAM-based one since FreeBSD 9.0, completely removed from the sources. Kernel modules atadisk and atapi*, user-level tools atacontrol and burncd are removed. Kernel option `options ATA_CAM` is now permanently enabled and removed. 20130319: SOCK_CLOEXEC and SOCK_NONBLOCK flags have been added to socket(2) and socketpair(2). Software, in particular Kerberos, may automatically detect and use these during building. The resulting binaries will not work on older kernels. 20130308: CTL_DISABLE has also been added to the sparc64 GENERIC (for further information, see the respective 20130304 entry). 20130304: Recent commits to callout(9) changed the size of struct callout, so the KBI is probably heavily disturbed. Also, some functions in callout(9)/sleep(9)/sleepqueue(9)/condvar(9) KPIs were replaced by macros. Every kernel module using it won't load, so rebuild is requested. The ctl device has been re-enabled in GENERIC for i386 and amd64, but does not initialize by default (because of the new CTL_DISABLE option) to save memory. To re-enable it, remove the CTL_DISABLE option from the kernel config file or set kern.cam.ctl.disable=0 in /boot/loader.conf. 20130301: The ctl device has been disabled in GENERIC for i386 and amd64. This was done due to the extra memory being allocated at system initialisation time by the ctl driver which was only used if a CAM target device was created. This makes a FreeBSD system unusable on 128MB or less of RAM. 20130208: A new compression method (lz4) has been merged to -HEAD. Please refer to zpool-features(7) for more information. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20130129: A BSD-licensed patch(1) variant has been added and is installed as bsdpatch, being the GNU version the default patch. To inverse the logic and use the BSD-licensed one as default, while having the GNU version installed as gnupatch, rebuild and install world with the WITH_BSD_PATCH knob set. 20130121: Due to the use of the new -l option to install(1) during build and install, you must take care not to directly set the INSTALL make variable in your /etc/make.conf, /etc/src.conf, or on the command line. If you wish to use the -C flag for all installs you may be able to add INSTALL+=-C to /etc/make.conf or /etc/src.conf. 20130118: The install(1) option -M has changed meaning and now takes an argument that is a file or path to append logs to. In the unlikely event that -M was the last option on the command line and the command line contained at least two files and a target directory the first file will have logs appended to it. The -M option served little practical purpose in the last decade so its use is expected to be extremely rare. 20121223: After switching to Clang as the default compiler some users of ZFS on i386 systems started to experience stack overflow kernel panics. Please consider using 'options KSTACK_PAGES=4' in such configurations. 20121222: GEOM_LABEL now mangles label names read from file system metadata. Mangling affect labels containing spaces, non-printable characters, '%' or '"'. Device names in /etc/fstab and other places may need to be updated. 20121217: By default, only the 10 most recent kernel dumps will be saved. To restore the previous behaviour (no limit on the number of kernel dumps stored in the dump directory) add the following line to /etc/rc.conf: savecore_flags="" 20121201: With the addition of auditdistd(8), a new auditdistd user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20121117: The sin6_scope_id member variable in struct sockaddr_in6 is now filled by the kernel before passing the structure to the userland via sysctl or routing socket. This means the KAME-specific embedded scope id in sin6_addr.s6_addr[2] is always cleared in userland application. This behavior can be controlled by net.inet6.ip6.deembed_scopeid. __FreeBSD_version is bumped to 1000025. 20121105: On i386 and amd64 systems WITH_CLANG_IS_CC is now the default. This means that the world and kernel will be compiled with clang and that clang will be installed as /usr/bin/cc, /usr/bin/c++, and /usr/bin/cpp. To disable this behavior and revert to building with gcc, compile with WITHOUT_CLANG_IS_CC. Really old versions of current may need to bootstrap WITHOUT_CLANG first if the clang build fails (its compatibility window doesn't extend to the 9 stable branch point). 20121102: The IPFIREWALL_FORWARD kernel option has been removed. Its functionality now turned on by default. 20121023: The ZERO_COPY_SOCKET kernel option has been removed and split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP. NB: SOCKET_SEND_COW uses the VM page based copy-on-write mechanism which is not safe and may result in kernel crashes. NB: The SOCKET_RECV_PFLIP mechanism is useless as no current driver supports disposeable external page sized mbuf storage. Proper replacements for both zero-copy mechanisms are under consideration and will eventually lead to complete removal of the two kernel options. 20121023: The IPv4 network stack has been converted to network byte order. The following modules need to be recompiled together with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4), pf(4), ipfw(4), ng_ipfw(4), stf(4). 20121022: Support for non-MPSAFE filesystems was removed from VFS. The VFS_VERSION was bumped, all filesystem modules shall be recompiled. 20121018: All the non-MPSAFE filesystems have been disconnected from the build. The full list includes: codafs, hpfs, ntfs, nwfs, portalfs, smbfs, xfs. 20121016: The interface cloning API and ABI has changed. The following modules need to be recompiled together with kernel: ipfw(4), pfsync(4), pflog(4), usb(4), wlan(4), stf(4), vlan(4), disc(4), edsc(4), if_bridge(4), gif(4), tap(4), faith(4), epair(4), enc(4), tun(4), if_lagg(4), gre(4). 20121015: The sdhci driver was split in two parts: sdhci (generic SD Host Controller logic) and sdhci_pci (actual hardware driver). No kernel config modifications are required, but if you load sdhc as a module you must switch to sdhci_pci instead. 20121014: Import the FUSE kernel and userland support into base system. 20121013: The GNU sort(1) program has been removed since the BSD-licensed sort(1) has been the default for quite some time and no serious problems have been reported. The corresponding WITH_GNU_SORT knob has also gone. 20121006: The pfil(9) API/ABI for AF_INET family has been changed. Packet filtering modules: pf(4), ipfw(4), ipfilter(4) need to be recompiled with new kernel. 20121001: The net80211(4) ABI has been changed to allow for improved driver PS-POLL and power-save support. All wireless drivers need to be recompiled to work with the new kernel. 20120913: The random(4) support for the VIA hardware random number generator (`PADLOCK') is no longer enabled unconditionally. Add the padlock_rng device in the custom kernel config if needed. The GENERIC kernels on i386 and amd64 do include the device, so the change only affects the custom kernel configurations. 20120908: The pf(4) packet filter ABI has been changed. pfctl(8) and snmp_pf module need to be recompiled to work with new kernel. 20120828: A new ZFS feature flag "com.delphix:empty_bpobj" has been merged to -HEAD. Pools that have empty_bpobj in active state can not be imported read-write with ZFS implementations that do not support this feature. For more information read the zpool-features(5) manual page. 20120727: The sparc64 ZFS loader has been changed to no longer try to auto- detect ZFS providers based on diskN aliases but now requires these to be explicitly listed in the OFW boot-device environment variable. 20120712: The OpenSSL has been upgraded to 1.0.1c. Any binaries requiring libcrypto.so.6 or libssl.so.6 must be recompiled. Also, there are configuration changes. Make sure to merge /etc/ssl/openssl.cnf. 20120712: The following sysctls and tunables have been renamed for consistency with other variables: kern.cam.da.da_send_ordered -> kern.cam.da.send_ordered kern.cam.ada.ada_send_ordered -> kern.cam.ada.send_ordered 20120628: The sort utility has been replaced with BSD sort. For now, GNU sort is also available as "gnusort" or the default can be set back to GNU sort by setting WITH_GNU_SORT. In this case, BSD sort will be installed as "bsdsort". 20120611: A new version of ZFS (pool version 5000) has been merged to -HEAD. Starting with this version the old system of ZFS pool versioning is superseded by "feature flags". This concept enables forward compatibility against certain future changes in functionality of ZFS pools. The first read-only compatible "feature flag" for ZFS pools is named "com.delphix:async_destroy". For more information read the new zpool-features(5) manual page. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20120417: The malloc(3) implementation embedded in libc now uses sources imported as contrib/jemalloc. The most disruptive API change is to /etc/malloc.conf. If your system has an old-style /etc/malloc.conf, delete it prior to installworld, and optionally re-create it using the new format after rebooting. See malloc.conf(5) for details (specifically the TUNING section and the "opt.*" entries in the MALLCTL NAMESPACE section). 20120328: Big-endian MIPS TARGET_ARCH values no longer end in "eb". mips64eb is now spelled mips64. mipsn32eb is now spelled mipsn32. mipseb is now spelled mips. This is to aid compatibility with third-party software that expects this naming scheme in uname(3). Little-endian settings are unchanged. If you are updating a big-endian mips64 machine from before this change, you may need to set MACHINE_ARCH=mips64 in your environment before the new build system will recognize your machine. 20120306: Disable by default the option VFS_ALLOW_NONMPSAFE for all supported platforms. 20120229: Now unix domain sockets behave "as expected" on nullfs(5). Previously nullfs(5) did not pass through all behaviours to the underlying layer, as a result if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. 20120211: The getifaddrs upgrade path broken with 20111215 has been restored. If you have upgraded in between 20111215 and 20120209 you need to recompile libc again with your kernel. You still need to recompile world to be able to configure CARP but this restriction already comes from 20111215. 20120114: The set_rcvar() function has been removed from /etc/rc.subr. All base and ports rc.d scripts have been updated, so if you have a port installed with a script in /usr/local/etc/rc.d you can either hand-edit the rcvar= line, or reinstall the port. An easy way to handle the mass-update of /etc/rc.d: rm /etc/rc.d/* && mergemaster -i 20120109: panic(9) now stops other CPUs in the SMP systems, disables interrupts on the current CPU and prevents other threads from running. This behavior can be reverted using the kern.stop_scheduler_on_panic tunable/sysctl. The new behavior can be incompatible with kern.sync_on_panic. 20111215: The carp(4) facility has been changed significantly. Configuration of the CARP protocol via ifconfig(8) has changed, as well as format of CARP events submitted to devd(8) has changed. See manual pages for more information. The arpbalance feature of carp(4) is currently not supported anymore. Size of struct in_aliasreq, struct in6_aliasreq has changed. User utilities using SIOCAIFADDR, SIOCAIFADDR_IN6, e.g. ifconfig(8), need to be recompiled. 20111122: The acpi_wmi(4) status device /dev/wmistat has been renamed to /dev/wmistat0. 20111108: The option VFS_ALLOW_NONMPSAFE option has been added in order to explicitely support non-MPSAFE filesystems. It is on by default for all supported platform at this present time. 20111101: The broken amd(4) driver has been replaced with esp(4) in the amd64, i386 and pc98 GENERIC kernel configuration files. 20110930: sysinstall has been removed 20110923: The stable/9 branch created in subversion. This corresponds to the RELENG_9 branch in CVS. 20110913: This commit modifies vfs_register() so that it uses a hash calculation to set vfc_typenum, which is enabled by default. The first time a system is booted after this change, the vfc_typenum values will change for all file systems. The main effect of this is a change to the NFS server file handles for file systems that use vfc_typenum in their fsid, such as ZFS. It will, however, prevent vfc_typenum from changing when file systems are loaded in a different order for subsequent reboots. To disable this, you can set vfs.typenumhash=0 in /boot/loader.conf until you are ready to remount all NFS clients after a reboot. 20110828: Bump the shared library version numbers for libraries that do not use symbol versioning, have changed the ABI compared to stable/8 and which shared library version was not bumped. Done as part of 9.0-RELEASE cycle. 20110815: During the merge of Capsicum features, the fget(9) KPI was modified. This may require the rebuilding of out-of-tree device drivers -- issues have been reported specifically with the nVidia device driver. __FreeBSD_version is bumped to 900041. Also, there is a period between 20110811 and 20110814 where the special devices /dev/{stdin,stdout,stderr} did not work correctly. Building world from a kernel during that window may not work. 20110628: The packet filter (pf) code has been updated to OpenBSD 4.5. You need to update userland tools to be in sync with kernel. This update breaks backward compatibility with earlier pfsync(4) versions. Care must be taken when updating redundant firewall setups. 20110608: The following sysctls and tunables are retired on x86 platforms: machdep.hlt_cpus machdep.hlt_logical_cpus The following sysctl is retired: machdep.hyperthreading_allowed The sysctls were supposed to provide a way to dynamically offline and online selected CPUs on x86 platforms, but the implementation has not been reliable especially with SCHED_ULE scheduler. machdep.hyperthreading_allowed tunable is still available to ignore hyperthreading CPUs at OS level. Individual CPUs can be disabled using hint.lapic.X.disabled tunable, where X is an APIC ID of a CPU. Be advised, though, that disabling CPUs in non-uniform fashion will result in non-uniform topology and may lead to sub-optimal system performance with SCHED_ULE, which is a default scheduler. 20110607: cpumask_t type is retired and cpuset_t is used in order to describe a mask of CPUs. 20110531: Changes to ifconfig(8) for dynamic address family detection mandate that you are running a kernel of 20110525 or later. Make sure to follow the update procedure to boot a new kernel before installing world. 20110513: Support for sun4v architecture is officially dropped 20110503: Several KPI breaking changes have been committed to the mii(4) layer, the PHY drivers and consequently some Ethernet drivers using mii(4). This means that miibus.ko and the modules of the affected Ethernet drivers need to be recompiled. Note to kernel developers: Given that the OUI bit reversion problem was fixed as part of these changes all mii(4) commits related to OUIs, i.e. to sys/dev/mii/miidevs, PHY driver probing and vendor specific handling, no longer can be merged verbatim to stable/8 and previous branches. 20110430: Users of the Atheros AR71xx SoC code now need to add 'device ar71xx_pci' into their kernel configurations along with 'device pci'. 20110427: The default NFS client is now the new NFS client, so fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, it is recommended that the mount(8) and mount_nfs(8) commands be rebuilt from sources and that a link to mount_nfs called mount_oldnfs be created. The new client is compiled into the kernel with "options NFSCL" and this is needed for diskless root file systems. The GENERIC kernel configs have been changed to use NFSCL and NFSD (the new server) instead of NFSCLIENT and NFSSERVER. To use the regular/old client, you can "mount -t oldnfs ...". For a diskless root file system, you must also include a line like: vfs.root.mountfrom="oldnfs:" in the boot/loader.conf on the root fs on the NFS server to make a diskless root fs use the old client. 20110424: The GENERIC kernels for all architectures now default to the new CAM-based ATA stack. It means that all legacy ATA drivers were removed and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers starting from zero for each type in order of detection, unless configured otherwise with tunables, see cam(4)). There will be symbolic links created in /dev/ to map old adX devices to the respective adaY. They should provide basic compatibility for file systems mounting in most cases, but they do not support old user-level APIs and do not have respective providers in GEOM. Consider using updated management tools with new device names. It is possible to load devices ahci, ata, siis and mvs as modules, but option ATA_CAM should remain in kernel configuration to make ata module work as CAM driver supporting legacy ATA controllers. Device ata still can be used in modular fashion (atacore + ...). Modules atadisk and atapi* are not used and won't affect operation in ATA_CAM mode. Note that to use CAM-based ATA kernel should include CAM devices scbus, pass, da (or explicitly ada), cd and optionally others. All of them are parts of the cam module. ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load geom_raid kernel module and use graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX. No kernel config options or code have been removed, so if a problem arises, please report it and optionally revert to the old ATA stack. In order to do it you can remove from the kernel config: options ATA_CAM device ahci device mvs device siis , and instead add back: device atadisk # ATA disk drives device ataraid # ATA RAID drives device atapicd # ATAPI CDROM drives device atapifd # ATAPI floppy drives device atapist # ATAPI tape drives 20110423: The default NFS server has been changed to the new server, which was referred to as the experimental server. If you need to switch back to the old NFS server, you must now put the "-o" option on both the mountd and nfsd commands. This can be done using the mountd_flags and nfs_server_flags rc.conf variables until an update to the rc scripts is committed, which is coming soon. 20110418: The GNU Objective-C runtime library (libobjc), and other Objective-C related components have been removed from the base system. If you require an Objective-C library, please use one of the available ports. 20110331: ath(4) has been split into bus- and device- modules. if_ath contains the HAL, the TX rate control and the network device code. if_ath_pci contains the PCI bus glue. For Atheros MIPS embedded systems, if_ath_ahb contains the AHB glue. Users need to load both if_ath_pci and if_ath in order to use ath on everything else. TO REPEAT: if_ath_ahb is not needed for normal users. Normal users only need to load if_ath and if_ath_pci for ath(4) operation. 20110314: As part of the replacement of sysinstall, the process of building release media has changed significantly. For details, please re-read release(7), which has been updated to reflect the new build process. 20110218: GNU binutils 2.17.50 (as of 2007-07-03) has been merged to -HEAD. This is the last available version under GPLv2. It brings a number of new features, such as support for newer x86 CPU's (with SSE-3, SSSE-3, SSE 4.1 and SSE 4.2), better support for powerpc64, a number of new directives, and lots of other small improvements. See the ChangeLog file in contrib/binutils for the full details. 20110218: IPsec's HMAC_SHA256-512 support has been fixed to be RFC4868 compliant, and will now use half of hash for authentication. This will break interoperability with all stacks (including all actual FreeBSD versions) who implement draft-ietf-ipsec-ciph-sha-256-00 (they use 96 bits of hash for authentication). The only workaround with such peers is to use another HMAC algorithm for IPsec ("phase 2") authentication. 20110207: Remove the uio_yield prototype and symbol. This function has been misnamed since it was introduced and should not be globally exposed with this name. The equivalent functionality is now available using kern_yield(curthread->td_user_pri). The function remains undocumented. 20110112: A SYSCTL_[ADD_]UQUAD was added for unsigned uint64_t pointers, symmetric with the existing SYSCTL_[ADD_]QUAD. Type checking for scalar sysctls is defined but disabled. Code that needs UQUAD to pass the type checking that must compile on older systems where the define is not present can check against __FreeBSD_version >= 900030. The system dialog(1) has been replaced with a new version previously in ports as devel/cdialog. dialog(1) is mostly command-line compatible with the previous version, but the libdialog associated with it has a largely incompatible API. As such, the original version of libdialog will be kept temporarily as libodialog, until its base system consumers are replaced or updated. Bump __FreeBSD_version to 900030. 20110103: If you are trying to run make universe on a -stable system, and you get the following warning: "Makefile", line 356: "Target architecture for i386/conf/GENERIC unknown. config(8) likely too old." or something similar to it, then you must upgrade your -stable system to 8.2-Release or newer (really, any time after r210146 7/15/2010 in stable/8) or build the config from the latest stable/8 branch and install it on your system. Prior to this date, building a current universe on 8-stable system from between 7/15/2010 and 1/2/2011 would result in a weird shell parsing error in the first kernel build phase. A new config on those old systems will fix that problem for older versions of -current. 20101228: The TCP stack has been modified to allow Khelp modules to interact with it via helper hook points and store per-connection data in the TCP control block. Bump __FreeBSD_version to 900029. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20101114: Generic IEEE 802.3 annex 31B full duplex flow control support has been added to mii(4) and bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) as well as ip1000phy() have been converted to take advantage of it instead of using custom implementations. This means that these drivers now no longer unconditionally advertise support for flow control but only do so if flow control is a selected media option. This was implemented in the generic support that way in order to allow flow control to be switched on and off via ifconfig(8) with the PHY specific default to typically off in order to protect from unwanted effects. Consequently, if you used flow control with one of the above mentioned drivers you now need to explicitly enable it, for example via: ifconfig bge0 media auto mediaopt flowcontrol Along with the above mentioned changes generic support for setting 1000baseT master mode also has been added and brgphy(4), ciphy(4), e1000phy(4) as well as ip1000phy(4) have been converted to take advantage of it. This means that these drivers now no longer take the link0 parameter for selecting master mode but the master media option has to be used instead, for example like in the following: ifconfig bge0 media 1000baseT mediaopt full-duplex,master Selection of master mode now is also available with all other PHY drivers supporting 1000baseT. 20101111: The TCP stack has received a significant update to add support for modularised congestion control and generally improve the clarity of congestion control decisions. Bump __FreeBSD_version to 900025. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20101002: The man(1) utility has been replaced by a new version that no longer uses /etc/manpath.config. Please consult man.conf(5) for how to migrate local entries to the new format. 20100928: The copyright strings printed by login(1) and sshd(8) at the time of a new connection have been removed to follow other operating systems and upstream sshd. 20100915: A workaround for a fixed ld bug has been removed in kernel code, so make sure that your system ld is built from sources after revision 210245 from 2010-07-19 (r211583 if building head kernel on stable/8, r211584 for stable/7; both from 2010-08-21). A symptom of incorrect ld version is different addresses for set_pcpu section and __start_set_pcpu symbol in kernel and/or modules. 20100913: The $ipv6_prefer variable in rc.conf(5) has been split into $ip6addrctl_policy and $ipv6_activate_all_interfaces. The $ip6addrctl_policy is a variable to choose a pre-defined address selection policy set by ip6addrctl(8). A value "ipv4_prefer", "ipv6_prefer" or "AUTO" can be specified. The default is "AUTO". The $ipv6_activate_all_interfaces specifies whether IFDISABLED flag (see an entry of 20090926) is set on an interface with no corresponding $ifconfig_IF_ipv6 line. The default is "NO" for security reason. If you want IPv6 link-local address on all interfaces by default, set this to "YES". The old ipv6_prefer="YES" is equivalent to ipv6_activate_all_interfaces="YES" and ip6addrctl_policy="ipv6_prefer". 20100913: DTrace has grown support for userland tracing. Due to this, DTrace is now i386 and amd64 only. dtruss(1) is now installed by default on those systems and a new kernel module is needed for userland tracing: fasttrap. No changes to your kernel config file are necessary to enable userland tracing, but you might consider adding 'STRIP=' and 'CFLAGS+=-fno-omit-frame-pointer' to your make.conf if you want to have informative userland stack traces in DTrace (ustack). 20100725: The acpi_aiboost(4) driver has been removed in favor of the new aibs(4) driver. You should update your kernel configuration file. 20100722: BSD grep has been imported to the base system and it is built by default. It is completely BSD licensed, highly GNU-compatible, uses less memory than its GNU counterpart and has a small codebase. However, it is slower than its GNU counterpart, which is mostly noticeable for larger searches, for smaller ones it is measurable but not significant. The reason is complex, the most important factor is that we lack a modern and efficient regex library and GNU overcomes this by optimizing the searches internally. Future work on improving the regex performance is planned, for the meantime, users that need better performance, can build GNU grep instead by setting the WITH_GNU_GREP knob. 20100713: Due to the import of powerpc64 support, all existing powerpc kernel configuration files must be updated with a machine directive like this: machine powerpc powerpc In addition, an updated config(8) is required to build powerpc kernels after this change. 20100713: A new version of ZFS (version 15) has been merged to -HEAD. This version uses a python library for the following subcommands: zfs allow, zfs unallow, zfs groupspace, zfs userspace. For full functionality of these commands the following port must be installed: sysutils/py-zfs 20100429: 'vm_page's are now hashed by physical address to an array of mutexes. Currently this is only used to serialize access to hold_count. Over time the page queue mutex will be peeled away. This changes the size of pmap on every architecture. And requires all callers of vm_page_hold and vm_page_unhold to be updated. 20100402: WITH_CTF can now be specified in src.conf (not recommended, there are some problems with static executables), make.conf (would also affect ports which do not use GNU make and do not override the compile targets) or in the kernel config (via "makeoptions WITH_CTF=yes"). When WITH_CTF was specified there before this was silently ignored, so make sure that WITH_CTF is not used in places which could lead to unwanted behavior. 20100311: The kernel option COMPAT_IA32 has been replaced with COMPAT_FREEBSD32 to allow 32-bit compatibility on non-x86 platforms. All kernel configurations on amd64 and ia64 platforms using these options must be modified accordingly. 20100113: The utmp user accounting database has been replaced with utmpx, the user accounting interface standardized by POSIX. Unfortunately the semantics of utmp and utmpx don't match, making it practically impossible to support both interfaces. The user accounting database is used by tools like finger(1), last(1), talk(1), w(1) and ac(8). All applications in the base system use utmpx. This means only local binaries (e.g. from the ports tree) may still use these utmp database files. These applications must be rebuilt to make use of utmpx. After the system has been upgraded, it is safe to remove the old log files (/var/run/utmp, /var/log/lastlog and /var/log/wtmp*), assuming their contents is of no importance anymore. Old wtmp databases can only be used by last(1) and ac(8) after they have been converted to the new format using wtmpcvt(1). 20100108: Introduce the kernel thread "deadlock resolver" (which can be enabled via the DEADLKRES option, see NOTES for more details) and the sleepq_type() function for sleepqueues. 20091202: The rc.firewall and rc.firewall6 were unified, and rc.firewall6 and rc.d/ip6fw were removed. According to the removal of rc.d/ip6fw, ipv6_firewall_* rc variables are obsoleted. Instead, the following new rc variables are added to rc.d/ipfw: firewall_client_net_ipv6, firewall_simple_iif_ipv6, firewall_simple_inet_ipv6, firewall_simple_oif_ipv6, firewall_simple_onet_ipv6, firewall_trusted_ipv6 The meanings correspond to the relevant IPv4 variables. 20091125: 8.0-RELEASE. 20091113: The default terminal emulation for syscons(4) has been changed from cons25 to xterm on all platforms except pc98. This means that the /etc/ttys file needs to be updated to ensure correct operation of applications on the console. The terminal emulation style can be toggled per window by using vidcontrol(1)'s -T flag. The TEKEN_CONS25 kernel configuration options can be used to change the compile-time default back to cons25. To prevent graphical artifacts, make sure the TERM environment variable is set to match the terminal emulation that is being performed by syscons(4). 20091109: The layout of the structure ieee80211req_scan_result has changed. Applications that require wireless scan results (e.g. ifconfig(8)) from net80211 need to be recompiled. Applications such as wpa_supplicant(8) may require a full world build without using NO_CLEAN in order to get synchronized with the new structure. 20091025: The iwn(4) driver has been updated to support the 5000 and 5150 series. There's one kernel module for each firmware. Adding "device iwnfw" to the kernel configuration file means including all three firmware images inside the kernel. If you want to include just the one for your wireless card, use the devices iwn4965fw, iwn5000fw or iwn5150fw. 20090926: The rc.d/network_ipv6, IPv6 configuration script has been integrated into rc.d/netif. The changes are the following: 1. To use IPv6, simply define $ifconfig_IF_ipv6 like $ifconfig_IF for IPv4. For aliases, $ifconfig_IF_aliasN should be used. Note that both variables need the "inet6" keyword at the head. Do not set $ipv6_network_interfaces manually if you do not understand what you are doing. It is not needed in most cases. $ipv6_ifconfig_IF and $ipv6_ifconfig_IF_aliasN still work, but they are obsolete. 2. $ipv6_enable is obsolete. Use $ipv6_prefer and "inet6 accept_rtadv" keyword in ifconfig(8) instead. If you define $ipv6_enable=YES, it means $ipv6_prefer=YES and all configured interfaces have "inet6 accept_rtadv" in the $ifconfig_IF_ipv6. These are for backward compatibility. 3. A new variable $ipv6_prefer has been added. If NO, IPv6 functionality of interfaces with no corresponding $ifconfig_IF_ipv6 is disabled by using "inet6 ifdisabled" flag, and the default address selection policy of ip6addrctl(8) is the IPv4-preferred one (see rc.d/ip6addrctl for more details). Note that if you want to configure IPv6 functionality on the disabled interfaces after boot, first you need to clear the flag by using ifconfig(8) like: ifconfig em0 inet6 -ifdisabled If YES, the default address selection policy is set as IPv6-preferred. The default value of $ipv6_prefer is NO. 4. If your system need to receive Router Advertisement messages, define "inet6 accept_rtadv" in $ifconfig_IF_ipv6. The rc(8) scripts automatically invoke rtsol(8) when the interface becomes UP. The Router Advertisement messages are used for SLAAC (State-Less Address AutoConfiguration). 20090922: 802.11s D3.03 support was committed. This is incompatible with the previous code, which was based on D3.0. 20090912: A sysctl variable net.inet6.ip6.accept_rtadv now sets the default value of a per-interface flag ND6_IFF_ACCEPT_RTADV, not a global knob to control whether accepting Router Advertisement messages or not. Also, a per-interface flag ND6_IFF_AUTO_LINKLOCAL has been added and a sysctl variable net.inet6.ip6.auto_linklocal is its default value. The ifconfig(8) utility now supports these flags. 20090910: ZFS snapshots are now mounted with MNT_IGNORE flag. Use -v option for mount(8) and -a option for df(1) to see them. 20090825: The old tunable hw.bus.devctl_disable has been superseded by hw.bus.devctl_queue. hw.bus.devctl_disable=1 in loader.conf should be replaced by hw.bus.devctl_queue=0. The default for this new tunable is 1000. 20090813: Remove the option STOP_NMI. The default action is now to use NMI only for KDB via the newly introduced function stop_cpus_hard() and maintain stop_cpus() to just use a normal IPI_STOP on ia32 and amd64. 20090803: The stable/8 branch created in subversion. This corresponds to the RELENG_8 branch in CVS. 20090719: Bump the shared library version numbers for all libraries that do not use symbol versioning as part of the 8.0-RELEASE cycle. Bump __FreeBSD_version to 800105. 20090714: Due to changes in the implementation of virtual network stack support, all network-related kernel modules must be recompiled. As this change breaks the ABI, bump __FreeBSD_version to 800104. 20090713: The TOE interface to the TCP syncache has been modified to remove struct tcpopt () from the ABI of the network stack. The cxgb driver is the only TOE consumer affected by this change, and needs to be recompiled along with the kernel. As this change breaks the ABI, bump __FreeBSD_version to 800103. 20090712: Padding has been added to struct tcpcb, sackhint and tcpstat in to facilitate future MFCs and bug fixes whilst maintaining the ABI. However, this change breaks the ABI, so bump __FreeBSD_version to 800102. User space tools that rely on the size of any of these structs (e.g. sockstat) need to be recompiled. 20090630: The NFS_LEGACYRPC option has been removed along with the old kernel RPC implementation that this option selected. Kernel configurations may need to be adjusted. 20090629: The network interface device nodes at /dev/net/ have been removed. All ioctl operations can be performed the normal way using routing sockets. The kqueue functionality can generally be replaced with routing sockets. 20090628: The documentation from the FreeBSD Documentation Project (Handbook, FAQ, etc.) is now installed via packages by sysinstall(8) and under the /usr/local/share/doc/freebsd directory instead of /usr/share/doc. 20090624: The ABI of various structures related to the SYSV IPC API have been changed. As a result, the COMPAT_FREEBSD[456] and COMPAT_43 kernel options now all require COMPAT_FREEBSD7. Bump __FreeBSD_version to 800100. 20090622: Layout of struct vnet has changed as routing related variables were moved to their own Vimage module. Modules need to be recompiled. Bump __FreeBSD_version to 800099. 20090619: NGROUPS_MAX and NGROUPS have been increased from 16 to 1023 and 1024 respectively. As long as no more than 16 groups per process are used, no changes should be visible. When more than 16 groups are used, old binaries may fail if they call getgroups() or getgrouplist() with statically sized storage. Recompiling will work around this, but applications should be modified to use dynamically allocated storage for group arrays as POSIX.1-2008 does not cap an implementation's number of supported groups at NGROUPS_MAX+1 as previous versions did. NFS and portalfs mounts may also be affected as the list of groups is truncated to 16. Users of NFS who use more than 16 groups, should take care that negative group permissions are not used on the exported file systems as they will not be reliable unless a GSSAPI based authentication method is used. 20090616: The compiling option ADAPTIVE_LOCKMGRS has been introduced. This option compiles in the support for adaptive spinning for lockmgrs which want to enable it. The lockinit() function now accepts the flag LK_ADAPTIVE in order to make the lock object subject to adaptive spinning when both held in write and read mode. 20090613: The layout of the structure returned by IEEE80211_IOC_STA_INFO has changed. User applications that use this ioctl need to be rebuilt. 20090611: The layout of struct thread has changed. Kernel and modules need to be rebuilt. 20090608: The layout of structs ifnet, domain, protosw and vnet_net has changed. Kernel modules need to be rebuilt. Bump __FreeBSD_version to 800097. 20090602: window(1) has been removed from the base system. It can now be installed from ports. The port is called misc/window. 20090601: The way we are storing and accessing `routing table' entries has changed. Programs reading the FIB, like netstat, need to be re-compiled. 20090601: A new netisr implementation has been added for FreeBSD 8. Network file system modules, such as igmp, ipdivert, and others, should be rebuilt. Bump __FreeBSD_version to 800096. 20090530: Remove the tunable/sysctl debug.mpsafevfs as its initial purpose is no more valid. 20090530: Add VOP_ACCESSX(9). File system modules need to be rebuilt. Bump __FreeBSD_version to 800094. 20090529: Add mnt_xflag field to 'struct mount'. File system modules need to be rebuilt. Bump __FreeBSD_version to 800093. 20090528: The compiling option ADAPTIVE_SX has been retired while it has been introduced the option NO_ADAPTIVE_SX which handles the reversed logic. The KPI for sx_init_flags() changes as accepting flags: SX_ADAPTIVESPIN flag has been retired while the SX_NOADAPTIVE flag has been introduced in order to handle the reversed logic. Bump __FreeBSD_version to 800092. 20090527: Add support for hierarchical jails. Remove global securelevel. Bump __FreeBSD_version to 800091. 20090523: The layout of struct vnet_net has changed, therefore modules need to be rebuilt. Bump __FreeBSD_version to 800090. 20090523: The newly imported zic(8) produces a new format in the output. Please run tzsetup(8) to install the newly created data to /etc/localtime. 20090520: The sysctl tree for the usb stack has renamed from hw.usb2.* to hw.usb.* and is now consistent again with previous releases. 20090520: 802.11 monitor mode support was revised and driver api's were changed. Drivers dependent on net80211 now support DLT_IEEE802_11_RADIO instead of DLT_IEEE802_11. No user-visible data structures were changed but applications that use DLT_IEEE802_11 may require changes. Bump __FreeBSD_version to 800088. 20090430: The layout of the following structs has changed: sysctl_oid, socket, ifnet, inpcbinfo, tcpcb, syncache_head, vnet_inet, vnet_inet6 and vnet_ipfw. Most modules need to be rebuild or panics may be experienced. World rebuild is required for correctly checking networking state from userland. Bump __FreeBSD_version to 800085. 20090429: MLDv2 and Source-Specific Multicast (SSM) have been merged to the IPv6 stack. VIMAGE hooks are in but not yet used. The implementation of SSM within FreeBSD's IPv6 stack closely follows the IPv4 implementation. For kernel developers: * The most important changes are that the ip6_output() and ip6_input() paths no longer take the IN6_MULTI_LOCK, and this lock has been downgraded to a non-recursive mutex. * As with the changes to the IPv4 stack to support SSM, filtering of inbound multicast traffic must now be performed by transport protocols within the IPv6 stack. This does not apply to TCP and SCTP, however, it does apply to UDP in IPv6 and raw IPv6. * The KPIs used by IPv6 multicast are similar to those used by the IPv4 stack, with the following differences: * im6o_mc_filter() is analogous to imo_multicast_filter(). * The legacy KAME entry points in6_joingroup and in6_leavegroup() are shimmed to in6_mc_join() and in6_mc_leave() respectively. * IN6_LOOKUP_MULTI() has been deprecated and removed. * IPv6 relies on MLD for the DAD mechanism. KAME's internal KPIs for MLDv1 have an additional 'timer' argument which is used to jitter the initial membership report for the solicited-node multicast membership on-link. * This is not strictly needed for MLDv2, which already jitters its report transmissions. However, the 'timer' argument is preserved in case MLDv1 is active on the interface. * The KAME linked-list based IPv6 membership implementation has been refactored to use a vector similar to that used by the IPv4 stack. Code which maintains a list of its own multicast memberships internally, e.g. carp, has been updated to reflect the new semantics. * There is a known Lock Order Reversal (LOR) due to in6_setscope() acquiring the IF_AFDATA_LOCK and being called within ip6_output(). Whilst MLDv2 tries to avoid this otherwise benign LOR, it is an implementation constraint which needs to be addressed in HEAD. For application developers: * The changes are broadly similar to those made for the IPv4 stack. * The use of IPv4 and IPv6 multicast socket options on the same socket, using mapped addresses, HAS NOT been tested or supported. * There are a number of issues with the implementation of various IPv6 multicast APIs which need to be resolved in the API surface before the implementation is fully compatible with KAME userland use, and these are mostly to do with interface index treatment. * The literature available discusses the use of either the delta / ASM API with setsockopt(2)/getsockopt(2), or the full-state / ASM API using setsourcefilter(3)/getsourcefilter(3). For more information please refer to RFC 3768, 'Socket Interface Extensions for Multicast Source Filters'. * Applications which use the published RFC 3678 APIs should be fine. For systems administrators: * The mtest(8) utility has been refactored to support IPv6, in addition to IPv4. Interface addresses are no longer accepted as arguments, their names must be used instead. The utility will map the interface name to its first IPv4 address as returned by getifaddrs(3). * The ifmcstat(8) utility has also been updated to print the MLDv2 endpoint state and source filter lists via sysctl(3). * The net.inet6.ip6.mcast.loop sysctl may be tuned to 0 to disable loopback of IPv6 multicast datagrams by default; it defaults to 1 to preserve the existing behaviour. Disabling multicast loopback is recommended for optimal system performance. * The IPv6 MROUTING code has been changed to examine this sysctl instead of attempting to perform a group lookup before looping back forwarded datagrams. Bump __FreeBSD_version to 800084. 20090422: Implement low-level Bluetooth HCI API. Bump __FreeBSD_version to 800083. 20090419: The layout of struct malloc_type, used by modules to register new memory allocation types, has changed. Most modules will need to be rebuilt or panics may be experienced. Bump __FreeBSD_version to 800081. 20090415: Anticipate overflowing inp_flags - add inp_flags2. This changes most offsets in inpcb, so checking v4 connection state will require a world rebuild. Bump __FreeBSD_version to 800080. 20090415: Add an llentry to struct route and struct route_in6. Modules embedding a struct route will need to be recompiled. Bump __FreeBSD_version to 800079. 20090414: The size of rt_metrics_lite and by extension rtentry has changed. Networking administration apps will need to be recompiled. The route command now supports show as an alias for get, weighting of routes, sticky and nostick flags to alter the behavior of stateful load balancing. Bump __FreeBSD_version to 800078. 20090408: Do not use Giant for kbdmux(4) locking. This is wrong and apparently causing more problems than it solves. This will re-open the issue where interrupt handlers may race with kbdmux(4) in polling mode. Typical symptoms include (but not limited to) duplicated and/or missing characters when low level console functions (such as gets) are used while interrupts are enabled (for example geli password prompt, mountroot prompt etc.). Disabling kbdmux(4) may help. 20090407: The size of structs vnet_net, vnet_inet and vnet_ipfw has changed; kernel modules referencing any of the above need to be recompiled. Bump __FreeBSD_version to 800075. 20090320: GEOM_PART has become the default partition slicer for storage devices, replacing GEOM_MBR, GEOM_BSD, GEOM_PC98 and GEOM_GPT slicers. It introduces some changes: MSDOS/EBR: the devices created from MSDOS extended partition entries (EBR) can be named differently than with GEOM_MBR and are now symlinks to devices with offset-based names. fstabs may need to be modified. BSD: the "geometry does not match label" warning is harmless in most cases but it points to problems in file system misalignment with disk geometry. The "c" partition is now implicit, covers the whole top-level drive and cannot be (mis)used by users. General: Kernel dumps are now not allowed to be written to devices whose partition types indicate they are meant to be used for file systems (or, in case of MSDOS partitions, as something else than the "386BSD" type). Most of these changes date approximately from 200812. 20090319: The uscanner(4) driver has been removed from the kernel. This follows Linux removing theirs in 2.6 and making libusb the default interface (supported by sane). 20090319: The multicast forwarding code has been cleaned up. netstat(1) only relies on KVM now for printing bandwidth upcall meters. The IPv4 and IPv6 modules are split into ip_mroute_mod and ip6_mroute_mod respectively. The config(5) options for statically compiling this code remain the same, i.e. 'options MROUTING'. 20090315: Support for the IFF_NEEDSGIANT network interface flag has been removed, which means that non-MPSAFE network device drivers are no longer supported. In particular, if_ar, if_sr, and network device drivers from the old (legacy) USB stack can no longer be built or used. 20090313: POSIX.1 Native Language Support (NLS) has been enabled in libc and a bunch of new language catalog files have also been added. This means that some common libc messages are now localized and they depend on the LC_MESSAGES environmental variable. 20090313: The k8temp(4) driver has been renamed to amdtemp(4) since support for Family 10 and Family 11 CPU families was added. 20090309: IGMPv3 and Source-Specific Multicast (SSM) have been merged to the IPv4 stack. VIMAGE hooks are in but not yet used. For kernel developers, the most important changes are that the ip_output() and ip_input() paths no longer take the IN_MULTI_LOCK(), and this lock has been downgraded to a non-recursive mutex. Transport protocols (UDP, Raw IP) are now responsible for filtering inbound multicast traffic according to group membership and source filters. The imo_multicast_filter() KPI exists for this purpose. Transports which do not use multicast (SCTP, TCP) already reject multicast by default. Forwarding and receive performance may improve as a mutex acquisition is no longer needed in the ip_input() low-level input path. in_addmulti() and in_delmulti() are shimmed to new KPIs which exist to support SSM in-kernel. For application developers, it is recommended that loopback of multicast datagrams be disabled for best performance, as this will still cause the lock to be taken for each looped-back datagram transmission. The net.inet.ip.mcast.loop sysctl may be tuned to 0 to disable loopback by default; it defaults to 1 to preserve the existing behaviour. For systems administrators, to obtain best performance with multicast reception and multiple groups, it is always recommended that a card with a suitably precise hash filter is used. Hash collisions will still result in the lock being taken within the transport protocol input path to check group membership. If deploying FreeBSD in an environment with IGMP snooping switches, it is recommended that the net.inet.igmp.sendlocal sysctl remain enabled; this forces 224.0.0.0/24 group membership to be announced via IGMP. The size of 'struct igmpstat' has changed; netstat needs to be recompiled to reflect this. Bump __FreeBSD_version to 800070. 20090309: libusb20.so.1 is now installed as libusb.so.1 and the ports system updated to use it. This requires a buildworld/installworld in order to update the library and dependencies (usbconfig, etc). Its advisable to rebuild all ports which uses libusb. More specific directions are given in the ports collection UPDATING file. Any /etc/libmap.conf entries for libusb are no longer required and can be removed. 20090302: A workaround is committed to allow the creation of System V shared memory segment of size > 2 GB on the 64-bit architectures. Due to a limitation of the existing ABI, the shm_segsz member of the struct shmid_ds, returned by shmctl(IPC_STAT) call is wrong for large segments. Note that limits must be explicitly raised to allow such segments to be created. 20090301: The layout of struct ifnet has changed, requiring a rebuild of all network device driver modules. 20090227: The /dev handling for the new USB stack has changed, a buildworld/installworld is required for libusb20. 20090223: The new USB2 stack has now been permanently moved in and all kernel and module names reverted to their previous values (eg, usb, ehci, ohci, ums, ...). The old usb stack can be compiled in by prefixing the name with the letter 'o', the old usb modules have been removed. Updating entry 20090216 for xorg and 20090215 for libmap may still apply. 20090217: The rc.conf(5) option if_up_delay has been renamed to defaultroute_delay to better reflect its purpose. If you have customized this setting in /etc/rc.conf you need to update it to use the new name. 20090216: xorg 7.4 wants to configure its input devices via hald which does not yet work with USB2. If the keyboard/mouse does not work in xorg then add Option "AllowEmptyInput" "off" to your ServerLayout section. This will cause X to use the configured kbd and mouse sections from your xorg.conf. 20090215: The GENERIC kernels for all architectures now default to the new USB2 stack. No kernel config options or code have been removed so if a problem arises please report it and optionally revert to the old USB stack. If you are loading USB kernel modules or have a custom kernel that includes GENERIC then ensure that usb names are also changed over, eg uftdi -> usb2_serial_ftdi. Older programs linked against the ports libusb 0.1 need to be redirected to the new stack's libusb20. /etc/libmap.conf can be used for this: # Map old usb library to new one for usb2 stack libusb-0.1.so.8 libusb20.so.1 20090209: All USB ethernet devices now attach as interfaces under the name ueN (eg. ue0). This is to provide a predictable name as vendors often change usb chipsets in a product without notice. 20090203: The ichsmb(4) driver has been changed to require SMBus slave addresses be left-justified (xxxxxxx0b) rather than right-justified. All of the other SMBus controller drivers require left-justified slave addresses, so this change makes all the drivers provide the same interface. 20090201: INET6 statistics (struct ip6stat) was updated. netstat(1) needs to be recompiled. 20090119: NTFS has been removed from GENERIC kernel on amd64 to match GENERIC on i386. Should not cause any issues since mount_ntfs(8) will load ntfs.ko module automatically when NTFS support is actually needed, unless ntfs.ko is not installed or security level prohibits loading kernel modules. If either is the case, "options NTFS" has to be added into kernel config. 20090115: TCP Appropriate Byte Counting (RFC 3465) support added to kernel. New field in struct tcpcb breaks ABI, so bump __FreeBSD_version to 800061. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20081225: ng_tty(4) module updated to match the new TTY subsystem. Due to API change, user-level applications must be updated. New API support added to mpd5 CVS and expected to be present in next mpd5.3 release. 20081219: With __FreeBSD_version 800060 the makefs tool is part of the base system (it was a port). 20081216: The afdata and ifnet locks have been changed from mutexes to rwlocks, network modules will need to be re-compiled. 20081214: __FreeBSD_version 800059 incorporates the new arp-v2 rewrite. RTF_CLONING, RTF_LLINFO and RTF_WASCLONED flags are eliminated. The new code reduced struct rtentry{} by 16 bytes on 32-bit architecture and 40 bytes on 64-bit architecture. The userland applications "arp" and "ndp" have been updated accordingly. The output from "netstat -r" shows only routing entries and none of the L2 information. 20081130: __FreeBSD_version 800057 marks the switchover from the binary ath hal to source code. Users must add the line: options AH_SUPPORT_AR5416 to their kernel config files when specifying: device ath_hal The ath_hal module no longer exists; the code is now compiled together with the driver in the ath module. It is now possible to tailor chip support (i.e. reduce the set of chips and thereby the code size); consult ath_hal(4) for details. 20081121: __FreeBSD_version 800054 adds memory barriers to , new interfaces to ifnet to facilitate multiple hardware transmit queues for cards that support them, and a lock-less ring-buffer implementation to enable drivers to more efficiently manage queueing of packets. 20081117: A new version of ZFS (version 13) has been merged to -HEAD. This version has zpool attribute "listsnapshots" off by default, which means "zfs list" does not show snapshots, and is the same as Solaris behavior. 20081028: dummynet(4) ABI has changed. ipfw(8) needs to be recompiled. 20081009: The uhci, ohci, ehci and slhci USB Host controller drivers have been put into separate modules. If you load the usb module separately through loader.conf you will need to load the appropriate *hci module as well. E.g. for a UHCI-based USB 2.0 controller add the following to loader.conf: uhci_load="YES" ehci_load="YES" 20081009: The ABI used by the PMC toolset has changed. Please keep userland (libpmc(3)) and the kernel module (hwpmc(4)) in sync. 20081009: atapci kernel module now includes only generic PCI ATA driver. AHCI driver moved to ataahci kernel module. All vendor-specific code moved into separate kernel modules: ataacard, ataacerlabs, ataadaptec, ataamd, ataati, atacenatek, atacypress, atacyrix, atahighpoint, ataintel, ataite, atajmicron, atamarvell, atamicron, atanational, atanetcell, atanvidia, atapromise, ataserverworks, atasiliconimage, atasis, atavia 20080820: The TTY subsystem of the kernel has been replaced by a new implementation, which provides better scalability and an improved driver model. Most common drivers have been migrated to the new TTY subsystem, while others have not. The following drivers have not yet been ported to the new TTY layer: PCI/ISA: cy, digi, rc, rp, sio USB: ubser, ucycom Line disciplines: ng_h4, ng_tty, ppp, sl, snp Adding these drivers to your kernel configuration file shall cause compilation to fail. 20080818: ntpd has been upgraded to 4.2.4p5. 20080801: OpenSSH has been upgraded to 5.1p1. For many years, FreeBSD's version of OpenSSH preferred DSA over RSA for host and user authentication keys. With this upgrade, we've switched to the vendor's default of RSA over DSA. This may cause upgraded clients to warn about unknown host keys even for previously known hosts. Users should follow the usual procedure for verifying host keys before accepting the RSA key. This can be circumvented by setting the "HostKeyAlgorithms" option to "ssh-dss,ssh-rsa" in ~/.ssh/config or on the ssh command line. Please note that the sequence of keys offered for authentication has been changed as well. You may want to specify IdentityFile in a different order to revert this behavior. 20080713: The sio(4) driver has been removed from the i386 and amd64 kernel configuration files. This means uart(4) is now the default serial port driver on those platforms as well. To prevent collisions with the sio(4) driver, the uart(4) driver uses different names for its device nodes. This means the onboard serial port will now most likely be called "ttyu0" instead of "ttyd0". You may need to reconfigure applications to use the new device names. When using the serial port as a boot console, be sure to update /boot/device.hints and /etc/ttys before booting the new kernel. If you forget to do so, you can still manually specify the hints at the loader prompt: set hint.uart.0.at="isa" set hint.uart.0.port="0x3F8" set hint.uart.0.flags="0x10" set hint.uart.0.irq="4" boot -s 20080609: The gpt(8) utility has been removed. Use gpart(8) to partition disks instead. 20080603: The version that Linuxulator emulates was changed from 2.4.2 to 2.6.16. If you experience any problems with Linux binaries please try to set sysctl compat.linux.osrelease to 2.4.2 and if it fixes the problem contact emulation mailing list. 20080525: ISDN4BSD (I4B) was removed from the src tree. You may need to update a your kernel configuration and remove relevant entries. 20080509: I have checked in code to support multiple routing tables. See the man pages setfib(1) and setfib(2). This is a hopefully backwards compatible version, but to make use of it you need to compile your kernel with options ROUTETABLES=2 (or more up to 16). 20080420: The 802.11 wireless support was redone to enable multi-bss operation on devices that are capable. The underlying device is no longer used directly but instead wlanX devices are cloned with ifconfig. This requires changes to rc.conf files. For example, change: ifconfig_ath0="WPA DHCP" to wlans_ath0=wlan0 ifconfig_wlan0="WPA DHCP" see rc.conf(5) for more details. In addition, mergemaster of /etc/rc.d is highly recommended. Simultaneous update of userland and kernel wouldn't hurt either. As part of the multi-bss changes the wlan_scan_ap and wlan_scan_sta modules were merged into the base wlan module. All references to these modules (e.g. in kernel config files) must be removed. 20080408: psm(4) has gained write(2) support in native operation level. Arbitrary commands can be written to /dev/psm%d and status can be read back from it. Therefore, an application is responsible for status validation and error recovery. It is a no-op in other operation levels. 20080312: Support for KSE threading has been removed from the kernel. To run legacy applications linked against KSE libmap.conf may be used. The following libmap.conf may be used to ensure compatibility with any prior release: libpthread.so.1 libthr.so.1 libpthread.so.2 libthr.so.2 libkse.so.3 libthr.so.3 20080301: The layout of struct vmspace has changed. This affects libkvm and any executables that link against libkvm and use the kvm_getprocs() function. In particular, but not exclusively, it affects ps(1), fstat(1), pkill(1), systat(1), top(1) and w(1). The effects are minimal, but it's advisable to upgrade world nonetheless. 20080229: The latest em driver no longer has support in it for the 82575 adapter, this is now moved to the igb driver. The split was done to make new features that are incompatible with older hardware easier to do. 20080220: The new geom_lvm(4) geom class has been renamed to geom_linux_lvm(4), likewise the kernel option is now GEOM_LINUX_LVM. 20080211: The default NFS mount mode has changed from UDP to TCP for increased reliability. If you rely on (insecurely) NFS mounting across a firewall you may need to update your firewall rules. 20080208: Belatedly note the addition of m_collapse for compacting mbuf chains. 20080126: The fts(3) structures have been changed to use adequate integer types for their members and so to be able to cope with huge file trees. The old fts(3) ABI is preserved through symbol versioning in libc, so third-party binaries using fts(3) should still work, although they will not take advantage of the extended types. At the same time, some third-party software might fail to build after this change due to unportable assumptions made in its source code about fts(3) structure members. Such software should be fixed by its vendor or, in the worst case, in the ports tree. FreeBSD_version 800015 marks this change for the unlikely case that a portable fix is impossible. 20080123: To upgrade to -current after this date, you must be running FreeBSD not older than 6.0-RELEASE. Upgrading to -current from 5.x now requires a stop over at RELENG_6 or RELENG_7 systems. 20071128: The ADAPTIVE_GIANT kernel option has been retired because its functionality is the default now. 20071118: The AT keyboard emulation of sunkbd(4) has been turned on by default. In order to make the special symbols of the Sun keyboards driven by sunkbd(4) work under X these now have to be configured the same way as Sun USB keyboards driven by ukbd(4) (which also does AT keyboard emulation), f.e.: Option "XkbLayout" "us" Option "XkbRules" "xorg" Option "XkbSymbols" "pc(pc105)+sun_vndr/usb(sun_usb)+us" 20071024: It has been decided that it is desirable to provide ABI backwards compatibility to the FreeBSD 4/5/6 versions of the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken with the introduction of PCI domain support (see the 20070930 entry). Unfortunately, this required the ABI of PCIOCGETCONF to be broken again in order to be able to provide backwards compatibility to the old version of that IOCTL. Thus consumers of PCIOCGETCONF have to be recompiled again. As for prominent ports this affects neither pciutils nor xorg-server this time, the hal port needs to be rebuilt however. 20071020: The misnamed kthread_create() and friends have been renamed to kproc_create() etc. Many of the callers already used kproc_start().. I will return kthread_create() and friends in a while with implementations that actually create threads, not procs. Renaming corresponds with version 800002. 20071010: RELENG_7 branched. COMMON ITEMS: General Notes ------------- Avoid using make -j when upgrading. While generally safe, there are sometimes problems using -j to upgrade. If your upgrade fails with -j, please try again without -j. From time to time in the past there have been problems using -j with buildworld and/or installworld. This is especially true when upgrading between "distant" versions (eg one that cross a major release boundary or several minor releases, or when several months have passed on the -current branch). Sometimes, obscure build problems are the result of environment poisoning. This can happen because the make utility reads its environment when searching for values for global variables. To run your build attempts in an "environmental clean room", prefix all make commands with 'env -i '. See the env(1) manual page for more details. When upgrading from one major version to another it is generally best to upgrade to the latest code in the currently installed branch first, then do an upgrade to the new branch. This is the best-tested upgrade path, and has the highest probability of being successful. Please try this approach before reporting problems with a major version upgrade. When upgrading a live system, having a root shell around before installing anything can help undo problems. Not having a root shell around can lead to problems if pam has changed too much from your starting point to allow continued authentication after the upgrade. ZFS notes --------- When upgrading the boot ZFS pool to a new version, always follow these two steps: 1.) recompile and reinstall the ZFS boot loader and boot block (this is part of "make buildworld" and "make installworld") 2.) update the ZFS boot block on your boot drive The following example updates the ZFS boot block on the first partition (freebsd-boot) of a GPT partitioned drive ad0: "gpart bootcode -p /boot/gptzfsboot -i 1 ad0" Non-boot pools do not need these updates. To build a kernel ----------------- If you are updating from a prior version of FreeBSD (even one just a few days old), you should follow this procedure. It is the most failsafe as it uses a /usr/obj tree with a fresh mini-buildworld, make kernel-toolchain make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE To test a kernel once --------------------- If you just want to boot a kernel once (because you are not sure if it works, or if you want to boot a known bad kernel to provide debugging information) run make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel nextboot -k testkernel To just build a kernel when you know that it won't mess you up -------------------------------------------------------------- This assumes you are already running a CURRENT system. Replace ${arch} with the architecture of your machine (e.g. "i386", "arm", "amd64", "ia64", "pc98", "sparc64", "powerpc", "mips", etc). cd src/sys/${arch}/conf config KERNEL_NAME_HERE cd ../compile/KERNEL_NAME_HERE make depend make make install If this fails, go to the "To build a kernel" section. To rebuild everything and install it on the current system. ----------------------------------------------------------- # Note: sometimes if you are running current you gotta do more than # is listed here if you are upgrading from a really old current. make buildworld make kernel KERNCONF=YOUR_KERNEL_HERE [1] [3] mergemaster -p [5] make installworld mergemaster -i [4] make delete-old [6] To cross-install current onto a separate partition -------------------------------------------------- # In this approach we use a separate partition to hold # current's root, 'usr', and 'var' directories. A partition # holding "/", "/usr" and "/var" should be about 2GB in # size. make buildworld make buildkernel KERNCONF=YOUR_KERNEL_HERE make installworld DESTDIR=${CURRENT_ROOT} make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT} cp /etc/fstab ${CURRENT_ROOT}/etc/fstab # if newfs'd To upgrade in-place from stable to current ---------------------------------------------- make buildworld [9] make kernel KERNCONF=YOUR_KERNEL_HERE [8] [1] [3] mergemaster -p [5] make installworld mergemaster -i [4] make delete-old [6] Make sure that you've read the UPDATING file to understand the tweaks to various things you need. At this point in the life cycle of current, things change often and you are on your own to cope. The defaults can also change, so please read ALL of the UPDATING entries. Also, if you are tracking -current, you must be subscribed to freebsd-current@freebsd.org. Make sure that before you update your sources that you have read and understood all the recent messages there. If in doubt, please track -stable which has much fewer pitfalls. [1] If you have third party modules, such as vmware, you should disable them at this point so they don't crash your system on reboot. [3] From the bootblocks, boot -s, and then do fsck -p mount -u / mount -a cd src adjkerntz -i # if CMOS is wall time Also, when doing a major release upgrade, it is required that you boot into single user mode to do the installworld. [4] Note: This step is non-optional. Failure to do this step can result in a significant reduction in the functionality of the system. Attempting to do it by hand is not recommended and those that pursue this avenue should read this file carefully, as well as the archives of freebsd-current and freebsd-hackers mailing lists for potential gotchas. The -U option is also useful to consider. See mergemaster(8) for more information. [5] Usually this step is a noop. However, from time to time you may need to do this if you get unknown user in the following step. It never hurts to do it all the time. You may need to install a new mergemaster (cd src/usr.sbin/mergemaster && make install) after the buildworld before this step if you last updated from current before 20130425 or from -stable before 20130430. [6] This only deletes old files and directories. Old libraries can be deleted by "make delete-old-libs", but you have to make sure that no program is using those libraries anymore. [8] In order to have a kernel that can run the 4.x binaries needed to do an installworld, you must include the COMPAT_FREEBSD4 option in your kernel. Failure to do so may leave you with a system that is hard to boot to recover. A similar kernel option COMPAT_FREEBSD5 is required to run the 5.x binaries on more recent kernels. And so on for COMPAT_FREEBSD6 and COMPAT_FREEBSD7. Make sure that you merge any new devices from GENERIC since the last time you updated your kernel config file. [9] When checking out sources, you must include the -P flag to have cvs prune empty directories. If CPUTYPE is defined in your /etc/make.conf, make sure to use the "?=" instead of the "=" assignment operator, so that buildworld can override the CPUTYPE if it needs to. MAKEOBJDIRPREFIX must be defined in an environment variable, and not on the command line, or in /etc/make.conf. buildworld will warn if it is improperly defined. FORMAT: This file contains a list, in reverse chronological order, of major breakages in tracking -current. It is not guaranteed to be a complete list of such breakages, and only contains entries since October 10, 2007. If you need to see UPDATING entries from before that date, you will need to fetch an UPDATING file from an older FreeBSD release. Copyright information: Copyright 1998-2009 M. Warner Losh. All Rights Reserved. Redistribution, publication, translation and use, with or without modification, in full or in part, in any form or format of this document are permitted without further permission from the author. THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Contact Warner Losh if you have any questions about your use of this document. $FreeBSD$ Index: releng/10.2/crypto/openssh/session.c =================================================================== --- releng/10.2/crypto/openssh/session.c (revision 296954) +++ releng/10.2/crypto/openssh/session.c (revision 296955) @@ -1,2794 +1,2822 @@ /* $OpenBSD: session.c,v 1.270 2014/01/31 16:39:19 tedu Exp $ */ /* $FreeBSD$ */ /* * Copyright (c) 1995 Tatu Ylonen , Espoo, Finland * All rights reserved * * As far as I am concerned, the code I have written for this software * can be used freely for any purpose. Any derived versions of this * software must be clearly marked as such, and if the derived work is * incompatible with the protocol description in the RFC file, it must be * called by a name other than "ssh" or "Secure Shell". * * SSH2 support by Markus Friedl. * Copyright (c) 2000, 2001 Markus Friedl. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "includes.h" __RCSID("$FreeBSD$"); #include #include #ifdef HAVE_SYS_STAT_H # include #endif #include #include #include #include +#include #include #include #include #ifdef HAVE_PATHS_H #include #endif #include #include #include #include #include #include #include #include "openbsd-compat/sys-queue.h" #include "xmalloc.h" #include "ssh.h" #include "ssh1.h" #include "ssh2.h" #include "sshpty.h" #include "packet.h" #include "buffer.h" #include "match.h" #include "uidswap.h" #include "compat.h" #include "channels.h" #include "key.h" #include "cipher.h" #ifdef GSSAPI #include "ssh-gss.h" #endif #include "hostfile.h" #include "auth.h" #include "auth-options.h" #include "authfd.h" #include "pathnames.h" #include "log.h" #include "servconf.h" #include "sshlogin.h" #include "serverloop.h" #include "canohost.h" #include "misc.h" #include "session.h" #include "kex.h" #include "monitor_wrap.h" #include "sftp.h" #if defined(KRB5) && defined(USE_AFS) #include #endif #ifdef WITH_SELINUX #include #endif #define IS_INTERNAL_SFTP(c) \ (!strncmp(c, INTERNAL_SFTP_NAME, sizeof(INTERNAL_SFTP_NAME) - 1) && \ (c[sizeof(INTERNAL_SFTP_NAME) - 1] == '\0' || \ c[sizeof(INTERNAL_SFTP_NAME) - 1] == ' ' || \ c[sizeof(INTERNAL_SFTP_NAME) - 1] == '\t')) /* func */ Session *session_new(void); void session_set_fds(Session *, int, int, int, int, int); void session_pty_cleanup(Session *); void session_proctitle(Session *); int session_setup_x11fwd(Session *); int do_exec_pty(Session *, const char *); int do_exec_no_pty(Session *, const char *); int do_exec(Session *, const char *); void do_login(Session *, const char *); #ifdef LOGIN_NEEDS_UTMPX static void do_pre_login(Session *s); #endif void do_child(Session *, const char *); void do_motd(void); int check_quietlogin(Session *, const char *); static void do_authenticated1(Authctxt *); static void do_authenticated2(Authctxt *); static int session_pty_req(Session *); /* import */ extern ServerOptions options; extern char *__progname; extern int log_stderr; extern int debug_flag; extern u_int utmp_len; extern int startup_pipe; extern void destroy_sensitive_data(void); extern Buffer loginmsg; /* original command from peer. */ const char *original_command = NULL; /* data */ static int sessions_first_unused = -1; static int sessions_nalloc = 0; static Session *sessions = NULL; #define SUBSYSTEM_NONE 0 #define SUBSYSTEM_EXT 1 #define SUBSYSTEM_INT_SFTP 2 #define SUBSYSTEM_INT_SFTP_ERROR 3 #ifdef HAVE_LOGIN_CAP login_cap_t *lc; #endif static int is_child = 0; /* Name and directory of socket for authentication agent forwarding. */ static char *auth_sock_name = NULL; static char *auth_sock_dir = NULL; /* removes the agent forwarding socket */ static void auth_sock_cleanup_proc(struct passwd *pw) { if (auth_sock_name != NULL) { temporarily_use_uid(pw); unlink(auth_sock_name); rmdir(auth_sock_dir); auth_sock_name = NULL; restore_uid(); } } static int auth_input_request_forwarding(struct passwd * pw) { Channel *nc; int sock = -1; struct sockaddr_un sunaddr; if (auth_sock_name != NULL) { error("authentication forwarding requested twice."); return 0; } /* Temporarily drop privileged uid for mkdir/bind. */ temporarily_use_uid(pw); /* Allocate a buffer for the socket name, and format the name. */ auth_sock_dir = xstrdup("/tmp/ssh-XXXXXXXXXX"); /* Create private directory for socket */ if (mkdtemp(auth_sock_dir) == NULL) { packet_send_debug("Agent forwarding disabled: " "mkdtemp() failed: %.100s", strerror(errno)); restore_uid(); free(auth_sock_dir); auth_sock_dir = NULL; goto authsock_err; } xasprintf(&auth_sock_name, "%s/agent.%ld", auth_sock_dir, (long) getpid()); /* Create the socket. */ sock = socket(AF_UNIX, SOCK_STREAM, 0); if (sock < 0) { error("socket: %.100s", strerror(errno)); restore_uid(); goto authsock_err; } /* Bind it to the name. */ memset(&sunaddr, 0, sizeof(sunaddr)); sunaddr.sun_family = AF_UNIX; strlcpy(sunaddr.sun_path, auth_sock_name, sizeof(sunaddr.sun_path)); if (bind(sock, (struct sockaddr *)&sunaddr, sizeof(sunaddr)) < 0) { error("bind: %.100s", strerror(errno)); restore_uid(); goto authsock_err; } /* Restore the privileged uid. */ restore_uid(); /* Start listening on the socket. */ if (listen(sock, SSH_LISTEN_BACKLOG) < 0) { error("listen: %.100s", strerror(errno)); goto authsock_err; } /* * Allocate a channel for the authentication agent socket. * Ignore HPN on that one given no improvement expected. */ nc = channel_new("auth socket", SSH_CHANNEL_AUTH_SOCKET, sock, sock, -1, CHAN_X11_WINDOW_DEFAULT, CHAN_X11_PACKET_DEFAULT, 0, "auth socket", 1); nc->path = xstrdup(auth_sock_name); return 1; authsock_err: free(auth_sock_name); if (auth_sock_dir != NULL) { rmdir(auth_sock_dir); free(auth_sock_dir); } if (sock != -1) close(sock); auth_sock_name = NULL; auth_sock_dir = NULL; return 0; } static void display_loginmsg(void) { if (buffer_len(&loginmsg) > 0) { buffer_append(&loginmsg, "\0", 1); printf("%s", (char *)buffer_ptr(&loginmsg)); buffer_clear(&loginmsg); } } void do_authenticated(Authctxt *authctxt) { setproctitle("%s", authctxt->pw->pw_name); /* setup the channel layer */ if (no_port_forwarding_flag || (options.allow_tcp_forwarding & FORWARD_LOCAL) == 0) channel_disable_adm_local_opens(); else channel_permit_all_opens(); auth_debug_send(); if (compat20) do_authenticated2(authctxt); else do_authenticated1(authctxt); do_cleanup(authctxt); } +/* Check untrusted xauth strings for metacharacters */ +static int +xauth_valid_string(const char *s) +{ + size_t i; + + for (i = 0; s[i] != '\0'; i++) { + if (!isalnum((u_char)s[i]) && + s[i] != '.' && s[i] != ':' && s[i] != '/' && + s[i] != '-' && s[i] != '_') + return 0; + } + return 1; +} + /* * Prepares for an interactive session. This is called after the user has * been successfully authenticated. During this message exchange, pseudo * terminals are allocated, X11, TCP/IP, and authentication agent forwardings * are requested, etc. */ static void do_authenticated1(Authctxt *authctxt) { Session *s; char *command; int success, type, screen_flag; int enable_compression_after_reply = 0; u_int proto_len, data_len, dlen, compression_level = 0; s = session_new(); if (s == NULL) { error("no more sessions"); return; } s->authctxt = authctxt; s->pw = authctxt->pw; /* * We stay in this loop until the client requests to execute a shell * or a command. */ for (;;) { success = 0; /* Get a packet from the client. */ type = packet_read(); /* Process the packet. */ switch (type) { case SSH_CMSG_REQUEST_COMPRESSION: compression_level = packet_get_int(); packet_check_eom(); if (compression_level < 1 || compression_level > 9) { packet_send_debug("Received invalid compression level %d.", compression_level); break; } if (options.compression == COMP_NONE) { debug2("compression disabled"); break; } /* Enable compression after we have responded with SUCCESS. */ enable_compression_after_reply = 1; success = 1; break; case SSH_CMSG_REQUEST_PTY: success = session_pty_req(s); break; case SSH_CMSG_X11_REQUEST_FORWARDING: s->auth_proto = packet_get_string(&proto_len); s->auth_data = packet_get_string(&data_len); screen_flag = packet_get_protocol_flags() & SSH_PROTOFLAG_SCREEN_NUMBER; debug2("SSH_PROTOFLAG_SCREEN_NUMBER: %d", screen_flag); if (packet_remaining() == 4) { if (!screen_flag) debug2("Buggy client: " "X11 screen flag missing"); s->screen = packet_get_int(); } else { s->screen = 0; } packet_check_eom(); - success = session_setup_x11fwd(s); + if (xauth_valid_string(s->auth_proto) && + xauth_valid_string(s->auth_data)) + success = session_setup_x11fwd(s); + else { + success = 0; + error("Invalid X11 forwarding data"); + } if (!success) { free(s->auth_proto); free(s->auth_data); s->auth_proto = NULL; s->auth_data = NULL; } break; case SSH_CMSG_AGENT_REQUEST_FORWARDING: if (!options.allow_agent_forwarding || no_agent_forwarding_flag || compat13) { debug("Authentication agent forwarding not permitted for this authentication."); break; } debug("Received authentication agent forwarding request."); success = auth_input_request_forwarding(s->pw); break; case SSH_CMSG_PORT_FORWARD_REQUEST: if (no_port_forwarding_flag) { debug("Port forwarding not permitted for this authentication."); break; } if (!(options.allow_tcp_forwarding & FORWARD_REMOTE)) { debug("Port forwarding not permitted."); break; } debug("Received TCP/IP port forwarding request."); if (channel_input_port_forward_request(s->pw->pw_uid == 0, options.gateway_ports) < 0) { debug("Port forwarding failed."); break; } success = 1; break; case SSH_CMSG_MAX_PACKET_SIZE: if (packet_set_maxsize(packet_get_int()) > 0) success = 1; break; case SSH_CMSG_EXEC_SHELL: case SSH_CMSG_EXEC_CMD: if (type == SSH_CMSG_EXEC_CMD) { command = packet_get_string(&dlen); debug("Exec command '%.500s'", command); if (do_exec(s, command) != 0) packet_disconnect( "command execution failed"); free(command); } else { if (do_exec(s, NULL) != 0) packet_disconnect( "shell execution failed"); } packet_check_eom(); session_close(s); return; default: /* * Any unknown messages in this phase are ignored, * and a failure message is returned. */ logit("Unknown packet type received after authentication: %d", type); } packet_start(success ? SSH_SMSG_SUCCESS : SSH_SMSG_FAILURE); packet_send(); packet_write_wait(); /* Enable compression now that we have replied if appropriate. */ if (enable_compression_after_reply) { enable_compression_after_reply = 0; packet_start_compression(compression_level); } } } #define USE_PIPES 1 /* * This is called to fork and execute a command when we have no tty. This * will call do_child from the child, and server_loop from the parent after * setting up file descriptors and such. */ int do_exec_no_pty(Session *s, const char *command) { pid_t pid; #ifdef USE_PIPES int pin[2], pout[2], perr[2]; if (s == NULL) fatal("do_exec_no_pty: no session"); /* Allocate pipes for communicating with the program. */ if (pipe(pin) < 0) { error("%s: pipe in: %.100s", __func__, strerror(errno)); return -1; } if (pipe(pout) < 0) { error("%s: pipe out: %.100s", __func__, strerror(errno)); close(pin[0]); close(pin[1]); return -1; } if (pipe(perr) < 0) { error("%s: pipe err: %.100s", __func__, strerror(errno)); close(pin[0]); close(pin[1]); close(pout[0]); close(pout[1]); return -1; } #else int inout[2], err[2]; if (s == NULL) fatal("do_exec_no_pty: no session"); /* Uses socket pairs to communicate with the program. */ if (socketpair(AF_UNIX, SOCK_STREAM, 0, inout) < 0) { error("%s: socketpair #1: %.100s", __func__, strerror(errno)); return -1; } if (socketpair(AF_UNIX, SOCK_STREAM, 0, err) < 0) { error("%s: socketpair #2: %.100s", __func__, strerror(errno)); close(inout[0]); close(inout[1]); return -1; } #endif session_proctitle(s); /* Fork the child. */ switch ((pid = fork())) { case -1: error("%s: fork: %.100s", __func__, strerror(errno)); #ifdef USE_PIPES close(pin[0]); close(pin[1]); close(pout[0]); close(pout[1]); close(perr[0]); close(perr[1]); #else close(inout[0]); close(inout[1]); close(err[0]); close(err[1]); #endif return -1; case 0: is_child = 1; /* Child. Reinitialize the log since the pid has changed. */ log_init(__progname, options.log_level, options.log_facility, log_stderr); /* * Create a new session and process group since the 4.4BSD * setlogin() affects the entire process group. */ if (setsid() < 0) error("setsid failed: %.100s", strerror(errno)); #ifdef USE_PIPES /* * Redirect stdin. We close the parent side of the socket * pair, and make the child side the standard input. */ close(pin[1]); if (dup2(pin[0], 0) < 0) perror("dup2 stdin"); close(pin[0]); /* Redirect stdout. */ close(pout[0]); if (dup2(pout[1], 1) < 0) perror("dup2 stdout"); close(pout[1]); /* Redirect stderr. */ close(perr[0]); if (dup2(perr[1], 2) < 0) perror("dup2 stderr"); close(perr[1]); #else /* * Redirect stdin, stdout, and stderr. Stdin and stdout will * use the same socket, as some programs (particularly rdist) * seem to depend on it. */ close(inout[1]); close(err[1]); if (dup2(inout[0], 0) < 0) /* stdin */ perror("dup2 stdin"); if (dup2(inout[0], 1) < 0) /* stdout (same as stdin) */ perror("dup2 stdout"); close(inout[0]); if (dup2(err[0], 2) < 0) /* stderr */ perror("dup2 stderr"); close(err[0]); #endif #ifdef _UNICOS cray_init_job(s->pw); /* set up cray jid and tmpdir */ #endif /* Do processing for the child (exec command etc). */ do_child(s, command); /* NOTREACHED */ default: break; } #ifdef _UNICOS signal(WJSIGNAL, cray_job_termination_handler); #endif /* _UNICOS */ #ifdef HAVE_CYGWIN cygwin_set_impersonation_token(INVALID_HANDLE_VALUE); #endif s->pid = pid; /* Set interactive/non-interactive mode. */ packet_set_interactive(s->display != NULL, options.ip_qos_interactive, options.ip_qos_bulk); /* * Clear loginmsg, since it's the child's responsibility to display * it to the user, otherwise multiple sessions may accumulate * multiple copies of the login messages. */ buffer_clear(&loginmsg); #ifdef USE_PIPES /* We are the parent. Close the child sides of the pipes. */ close(pin[0]); close(pout[1]); close(perr[1]); if (compat20) { session_set_fds(s, pin[1], pout[0], perr[0], s->is_subsystem, 0); } else { /* Enter the interactive session. */ server_loop(pid, pin[1], pout[0], perr[0]); /* server_loop has closed pin[1], pout[0], and perr[0]. */ } #else /* We are the parent. Close the child sides of the socket pairs. */ close(inout[0]); close(err[0]); /* * Enter the interactive session. Note: server_loop must be able to * handle the case that fdin and fdout are the same. */ if (compat20) { session_set_fds(s, inout[1], inout[1], err[1], s->is_subsystem, 0); } else { server_loop(pid, inout[1], inout[1], err[1]); /* server_loop has closed inout[1] and err[1]. */ } #endif return 0; } /* * This is called to fork and execute a command when we have a tty. This * will call do_child from the child, and server_loop from the parent after * setting up file descriptors, controlling tty, updating wtmp, utmp, * lastlog, and other such operations. */ int do_exec_pty(Session *s, const char *command) { int fdout, ptyfd, ttyfd, ptymaster; pid_t pid; if (s == NULL) fatal("do_exec_pty: no session"); ptyfd = s->ptyfd; ttyfd = s->ttyfd; /* * Create another descriptor of the pty master side for use as the * standard input. We could use the original descriptor, but this * simplifies code in server_loop. The descriptor is bidirectional. * Do this before forking (and cleanup in the child) so as to * detect and gracefully fail out-of-fd conditions. */ if ((fdout = dup(ptyfd)) < 0) { error("%s: dup #1: %s", __func__, strerror(errno)); close(ttyfd); close(ptyfd); return -1; } /* we keep a reference to the pty master */ if ((ptymaster = dup(ptyfd)) < 0) { error("%s: dup #2: %s", __func__, strerror(errno)); close(ttyfd); close(ptyfd); close(fdout); return -1; } /* Fork the child. */ switch ((pid = fork())) { case -1: error("%s: fork: %.100s", __func__, strerror(errno)); close(fdout); close(ptymaster); close(ttyfd); close(ptyfd); return -1; case 0: is_child = 1; close(fdout); close(ptymaster); /* Child. Reinitialize the log because the pid has changed. */ log_init(__progname, options.log_level, options.log_facility, log_stderr); /* Close the master side of the pseudo tty. */ close(ptyfd); /* Make the pseudo tty our controlling tty. */ pty_make_controlling_tty(&ttyfd, s->tty); /* Redirect stdin/stdout/stderr from the pseudo tty. */ if (dup2(ttyfd, 0) < 0) error("dup2 stdin: %s", strerror(errno)); if (dup2(ttyfd, 1) < 0) error("dup2 stdout: %s", strerror(errno)); if (dup2(ttyfd, 2) < 0) error("dup2 stderr: %s", strerror(errno)); /* Close the extra descriptor for the pseudo tty. */ close(ttyfd); /* record login, etc. similar to login(1) */ #ifndef HAVE_OSF_SIA if (!(options.use_login && command == NULL)) { #ifdef _UNICOS cray_init_job(s->pw); /* set up cray jid and tmpdir */ #endif /* _UNICOS */ do_login(s, command); } # ifdef LOGIN_NEEDS_UTMPX else do_pre_login(s); # endif #endif /* * Do common processing for the child, such as execing * the command. */ do_child(s, command); /* NOTREACHED */ default: break; } #ifdef _UNICOS signal(WJSIGNAL, cray_job_termination_handler); #endif /* _UNICOS */ #ifdef HAVE_CYGWIN cygwin_set_impersonation_token(INVALID_HANDLE_VALUE); #endif s->pid = pid; /* Parent. Close the slave side of the pseudo tty. */ close(ttyfd); /* Enter interactive session. */ s->ptymaster = ptymaster; packet_set_interactive(1, options.ip_qos_interactive, options.ip_qos_bulk); if (compat20) { session_set_fds(s, ptyfd, fdout, -1, 1, 1); } else { server_loop(pid, ptyfd, fdout, -1); /* server_loop _has_ closed ptyfd and fdout. */ } return 0; } #ifdef LOGIN_NEEDS_UTMPX static void do_pre_login(Session *s) { socklen_t fromlen; struct sockaddr_storage from; pid_t pid = getpid(); /* * Get IP address of client. If the connection is not a socket, let * the address be 0.0.0.0. */ memset(&from, 0, sizeof(from)); fromlen = sizeof(from); if (packet_connection_is_on_socket()) { if (getpeername(packet_get_connection_in(), (struct sockaddr *)&from, &fromlen) < 0) { debug("getpeername: %.100s", strerror(errno)); cleanup_exit(255); } } record_utmp_only(pid, s->tty, s->pw->pw_name, get_remote_name_or_ip(utmp_len, options.use_dns), (struct sockaddr *)&from, fromlen); } #endif /* * This is called to fork and execute a command. If another command is * to be forced, execute that instead. */ int do_exec(Session *s, const char *command) { int ret; const char *forced = NULL; char session_type[1024], *tty = NULL; if (options.adm_forced_command) { original_command = command; command = options.adm_forced_command; forced = "(config)"; } else if (forced_command) { original_command = command; command = forced_command; forced = "(key-option)"; } if (forced != NULL) { if (IS_INTERNAL_SFTP(command)) { s->is_subsystem = s->is_subsystem ? SUBSYSTEM_INT_SFTP : SUBSYSTEM_INT_SFTP_ERROR; } else if (s->is_subsystem) s->is_subsystem = SUBSYSTEM_EXT; snprintf(session_type, sizeof(session_type), "forced-command %s '%.900s'", forced, command); } else if (s->is_subsystem) { snprintf(session_type, sizeof(session_type), "subsystem '%.900s'", s->subsys); } else if (command == NULL) { snprintf(session_type, sizeof(session_type), "shell"); } else { /* NB. we don't log unforced commands to preserve privacy */ snprintf(session_type, sizeof(session_type), "command"); } if (s->ttyfd != -1) { tty = s->tty; if (strncmp(tty, "/dev/", 5) == 0) tty += 5; } verbose("Starting session: %s%s%s for %s from %.200s port %d", session_type, tty == NULL ? "" : " on ", tty == NULL ? "" : tty, s->pw->pw_name, get_remote_ipaddr(), get_remote_port()); #ifdef SSH_AUDIT_EVENTS if (command != NULL) PRIVSEP(audit_run_command(command)); else if (s->ttyfd == -1) { char *shell = s->pw->pw_shell; if (shell[0] == '\0') /* empty shell means /bin/sh */ shell =_PATH_BSHELL; PRIVSEP(audit_run_command(shell)); } #endif if (s->ttyfd != -1) ret = do_exec_pty(s, command); else ret = do_exec_no_pty(s, command); original_command = NULL; /* * Clear loginmsg: it's the child's responsibility to display * it to the user, otherwise multiple sessions may accumulate * multiple copies of the login messages. */ buffer_clear(&loginmsg); return ret; } /* administrative, login(1)-like work */ void do_login(Session *s, const char *command) { socklen_t fromlen; struct sockaddr_storage from; struct passwd * pw = s->pw; pid_t pid = getpid(); /* * Get IP address of client. If the connection is not a socket, let * the address be 0.0.0.0. */ memset(&from, 0, sizeof(from)); fromlen = sizeof(from); if (packet_connection_is_on_socket()) { if (getpeername(packet_get_connection_in(), (struct sockaddr *)&from, &fromlen) < 0) { debug("getpeername: %.100s", strerror(errno)); cleanup_exit(255); } } /* Record that there was a login on that tty from the remote host. */ if (!use_privsep) record_login(pid, s->tty, pw->pw_name, pw->pw_uid, get_remote_name_or_ip(utmp_len, options.use_dns), (struct sockaddr *)&from, fromlen); #ifdef USE_PAM /* * If password change is needed, do it now. * This needs to occur before the ~/.hushlogin check. */ if (options.use_pam && !use_privsep && s->authctxt->force_pwchange) { display_loginmsg(); do_pam_chauthtok(); s->authctxt->force_pwchange = 0; /* XXX - signal [net] parent to enable forwardings */ } #endif if (check_quietlogin(s, command)) return; display_loginmsg(); do_motd(); } /* * Display the message of the day. */ void do_motd(void) { FILE *f; char buf[256]; if (options.print_motd) { #ifdef HAVE_LOGIN_CAP f = fopen(login_getcapstr(lc, "welcome", "/etc/motd", "/etc/motd"), "r"); #else f = fopen("/etc/motd", "r"); #endif if (f) { while (fgets(buf, sizeof(buf), f)) fputs(buf, stdout); fclose(f); } } } /* * Check for quiet login, either .hushlogin or command given. */ int check_quietlogin(Session *s, const char *command) { char buf[256]; struct passwd *pw = s->pw; struct stat st; /* Return 1 if .hushlogin exists or a command given. */ if (command != NULL) return 1; snprintf(buf, sizeof(buf), "%.200s/.hushlogin", pw->pw_dir); #ifdef HAVE_LOGIN_CAP if (login_getcapbool(lc, "hushlogin", 0) || stat(buf, &st) >= 0) return 1; #else if (stat(buf, &st) >= 0) return 1; #endif return 0; } /* * Sets the value of the given variable in the environment. If the variable * already exists, its value is overridden. */ void child_set_env(char ***envp, u_int *envsizep, const char *name, const char *value) { char **env; u_int envsize; u_int i, namelen; if (strchr(name, '=') != NULL) { error("Invalid environment variable \"%.100s\"", name); return; } /* * If we're passed an uninitialized list, allocate a single null * entry before continuing. */ if (*envp == NULL && *envsizep == 0) { *envp = xmalloc(sizeof(char *)); *envp[0] = NULL; *envsizep = 1; } /* * Find the slot where the value should be stored. If the variable * already exists, we reuse the slot; otherwise we append a new slot * at the end of the array, expanding if necessary. */ env = *envp; namelen = strlen(name); for (i = 0; env[i]; i++) if (strncmp(env[i], name, namelen) == 0 && env[i][namelen] == '=') break; if (env[i]) { /* Reuse the slot. */ free(env[i]); } else { /* New variable. Expand if necessary. */ envsize = *envsizep; if (i >= envsize - 1) { if (envsize >= 1000) fatal("child_set_env: too many env vars"); envsize += 50; env = (*envp) = xrealloc(env, envsize, sizeof(char *)); *envsizep = envsize; } /* Need to set the NULL pointer at end of array beyond the new slot. */ env[i + 1] = NULL; } /* Allocate space and format the variable in the appropriate slot. */ env[i] = xmalloc(strlen(name) + 1 + strlen(value) + 1); snprintf(env[i], strlen(name) + 1 + strlen(value) + 1, "%s=%s", name, value); } /* * Reads environment variables from the given file and adds/overrides them * into the environment. If the file does not exist, this does nothing. * Otherwise, it must consist of empty lines, comments (line starts with '#') * and assignments of the form name=value. No other forms are allowed. */ static void read_environment_file(char ***env, u_int *envsize, const char *filename) { FILE *f; char buf[4096]; char *cp, *value; u_int lineno = 0; f = fopen(filename, "r"); if (!f) return; while (fgets(buf, sizeof(buf), f)) { if (++lineno > 1000) fatal("Too many lines in environment file %s", filename); for (cp = buf; *cp == ' ' || *cp == '\t'; cp++) ; if (!*cp || *cp == '#' || *cp == '\n') continue; cp[strcspn(cp, "\n")] = '\0'; value = strchr(cp, '='); if (value == NULL) { fprintf(stderr, "Bad line %u in %.100s\n", lineno, filename); continue; } /* * Replace the equals sign by nul, and advance value to * the value string. */ *value = '\0'; value++; child_set_env(env, envsize, cp, value); } fclose(f); } #ifdef HAVE_ETC_DEFAULT_LOGIN /* * Return named variable from specified environment, or NULL if not present. */ static char * child_get_env(char **env, const char *name) { int i; size_t len; len = strlen(name); for (i=0; env[i] != NULL; i++) if (strncmp(name, env[i], len) == 0 && env[i][len] == '=') return(env[i] + len + 1); return NULL; } /* * Read /etc/default/login. * We pick up the PATH (or SUPATH for root) and UMASK. */ static void read_etc_default_login(char ***env, u_int *envsize, uid_t uid) { char **tmpenv = NULL, *var; u_int i, tmpenvsize = 0; u_long mask; /* * We don't want to copy the whole file to the child's environment, * so we use a temporary environment and copy the variables we're * interested in. */ read_environment_file(&tmpenv, &tmpenvsize, "/etc/default/login"); if (tmpenv == NULL) return; if (uid == 0) var = child_get_env(tmpenv, "SUPATH"); else var = child_get_env(tmpenv, "PATH"); if (var != NULL) child_set_env(env, envsize, "PATH", var); if ((var = child_get_env(tmpenv, "UMASK")) != NULL) if (sscanf(var, "%5lo", &mask) == 1) umask((mode_t)mask); for (i = 0; tmpenv[i] != NULL; i++) free(tmpenv[i]); free(tmpenv); } #endif /* HAVE_ETC_DEFAULT_LOGIN */ void copy_environment(char **source, char ***env, u_int *envsize) { char *var_name, *var_val; int i; if (source == NULL) return; for(i = 0; source[i] != NULL; i++) { var_name = xstrdup(source[i]); if ((var_val = strstr(var_name, "=")) == NULL) { free(var_name); continue; } *var_val++ = '\0'; debug3("Copy environment: %s=%s", var_name, var_val); child_set_env(env, envsize, var_name, var_val); free(var_name); } } static char ** do_setup_env(Session *s, const char *shell) { char buf[256]; u_int i, envsize; char **env, *laddr; struct passwd *pw = s->pw; #if !defined (HAVE_LOGIN_CAP) && !defined (HAVE_CYGWIN) char *path = NULL; #else extern char **environ; char **senv, **var; #endif /* Initialize the environment. */ envsize = 100; env = xcalloc(envsize, sizeof(char *)); env[0] = NULL; #ifdef HAVE_CYGWIN /* * The Windows environment contains some setting which are * important for a running system. They must not be dropped. */ { char **p; p = fetch_windows_environment(); copy_environment(p, &env, &envsize); free_windows_environment(p); } #endif if (getenv("TZ")) child_set_env(&env, &envsize, "TZ", getenv("TZ")); #ifdef GSSAPI /* Allow any GSSAPI methods that we've used to alter * the childs environment as they see fit */ ssh_gssapi_do_child(&env, &envsize); #endif if (!options.use_login) { /* Set basic environment. */ for (i = 0; i < s->num_env; i++) child_set_env(&env, &envsize, s->env[i].name, s->env[i].val); child_set_env(&env, &envsize, "USER", pw->pw_name); child_set_env(&env, &envsize, "LOGNAME", pw->pw_name); #ifdef _AIX child_set_env(&env, &envsize, "LOGIN", pw->pw_name); #endif child_set_env(&env, &envsize, "HOME", pw->pw_dir); snprintf(buf, sizeof buf, "%.200s/%.50s", _PATH_MAILDIR, pw->pw_name); child_set_env(&env, &envsize, "MAIL", buf); #ifdef HAVE_LOGIN_CAP child_set_env(&env, &envsize, "PATH", _PATH_STDPATH); child_set_env(&env, &envsize, "TERM", "su"); senv = environ; environ = xmalloc(sizeof(char *)); *environ = NULL; (void) setusercontext(lc, pw, pw->pw_uid, LOGIN_SETENV|LOGIN_SETPATH); copy_environment(environ, &env, &envsize); for (var = environ; *var != NULL; ++var) free(*var); free(environ); environ = senv; #else /* HAVE_LOGIN_CAP */ # ifndef HAVE_CYGWIN /* * There's no standard path on Windows. The path contains * important components pointing to the system directories, * needed for loading shared libraries. So the path better * remains intact here. */ # ifdef HAVE_ETC_DEFAULT_LOGIN read_etc_default_login(&env, &envsize, pw->pw_uid); path = child_get_env(env, "PATH"); # endif /* HAVE_ETC_DEFAULT_LOGIN */ if (path == NULL || *path == '\0') { child_set_env(&env, &envsize, "PATH", s->pw->pw_uid == 0 ? SUPERUSER_PATH : _PATH_STDPATH); } # endif /* HAVE_CYGWIN */ #endif /* HAVE_LOGIN_CAP */ /* Normal systems set SHELL by default. */ child_set_env(&env, &envsize, "SHELL", shell); } /* Set custom environment options from RSA authentication. */ if (!options.use_login) { while (custom_environment) { struct envstring *ce = custom_environment; char *str = ce->s; for (i = 0; str[i] != '=' && str[i]; i++) ; if (str[i] == '=') { str[i] = 0; child_set_env(&env, &envsize, str, str + i + 1); } custom_environment = ce->next; free(ce->s); free(ce); } } /* SSH_CLIENT deprecated */ snprintf(buf, sizeof buf, "%.50s %d %d", get_remote_ipaddr(), get_remote_port(), get_local_port()); child_set_env(&env, &envsize, "SSH_CLIENT", buf); laddr = get_local_ipaddr(packet_get_connection_in()); snprintf(buf, sizeof buf, "%.50s %d %.50s %d", get_remote_ipaddr(), get_remote_port(), laddr, get_local_port()); free(laddr); child_set_env(&env, &envsize, "SSH_CONNECTION", buf); if (s->ttyfd != -1) child_set_env(&env, &envsize, "SSH_TTY", s->tty); if (s->term) child_set_env(&env, &envsize, "TERM", s->term); if (s->display) child_set_env(&env, &envsize, "DISPLAY", s->display); if (original_command) child_set_env(&env, &envsize, "SSH_ORIGINAL_COMMAND", original_command); #ifdef _UNICOS if (cray_tmpdir[0] != '\0') child_set_env(&env, &envsize, "TMPDIR", cray_tmpdir); #endif /* _UNICOS */ /* * Since we clear KRB5CCNAME at startup, if it's set now then it * must have been set by a native authentication method (eg AIX or * SIA), so copy it to the child. */ { char *cp; if ((cp = getenv("KRB5CCNAME")) != NULL) child_set_env(&env, &envsize, "KRB5CCNAME", cp); } #ifdef _AIX { char *cp; if ((cp = getenv("AUTHSTATE")) != NULL) child_set_env(&env, &envsize, "AUTHSTATE", cp); read_environment_file(&env, &envsize, "/etc/environment"); } #endif #ifdef KRB5 if (s->authctxt->krb5_ccname) child_set_env(&env, &envsize, "KRB5CCNAME", s->authctxt->krb5_ccname); #endif #ifdef USE_PAM /* * Pull in any environment variables that may have * been set by PAM. */ if (options.use_pam) { char **p; p = fetch_pam_child_environment(); copy_environment(p, &env, &envsize); free_pam_environment(p); p = fetch_pam_environment(); copy_environment(p, &env, &envsize); free_pam_environment(p); } #endif /* USE_PAM */ if (auth_sock_name != NULL) child_set_env(&env, &envsize, SSH_AUTHSOCKET_ENV_NAME, auth_sock_name); /* read $HOME/.ssh/environment. */ if (options.permit_user_env && !options.use_login) { snprintf(buf, sizeof buf, "%.200s/.ssh/environment", strcmp(pw->pw_dir, "/") ? pw->pw_dir : ""); read_environment_file(&env, &envsize, buf); } if (debug_flag) { /* dump the environment */ fprintf(stderr, "Environment:\n"); for (i = 0; env[i]; i++) fprintf(stderr, " %.200s\n", env[i]); } return env; } /* * Run $HOME/.ssh/rc, /etc/ssh/sshrc, or xauth (whichever is found * first in this order). */ static void do_rc_files(Session *s, const char *shell) { FILE *f = NULL; char cmd[1024]; int do_xauth; struct stat st; do_xauth = s->display != NULL && s->auth_proto != NULL && s->auth_data != NULL; /* ignore _PATH_SSH_USER_RC for subsystems and admin forced commands */ if (!s->is_subsystem && options.adm_forced_command == NULL && !no_user_rc && stat(_PATH_SSH_USER_RC, &st) >= 0) { snprintf(cmd, sizeof cmd, "%s -c '%s %s'", shell, _PATH_BSHELL, _PATH_SSH_USER_RC); if (debug_flag) fprintf(stderr, "Running %s\n", cmd); f = popen(cmd, "w"); if (f) { if (do_xauth) fprintf(f, "%s %s\n", s->auth_proto, s->auth_data); pclose(f); } else fprintf(stderr, "Could not run %s\n", _PATH_SSH_USER_RC); } else if (stat(_PATH_SSH_SYSTEM_RC, &st) >= 0) { if (debug_flag) fprintf(stderr, "Running %s %s\n", _PATH_BSHELL, _PATH_SSH_SYSTEM_RC); f = popen(_PATH_BSHELL " " _PATH_SSH_SYSTEM_RC, "w"); if (f) { if (do_xauth) fprintf(f, "%s %s\n", s->auth_proto, s->auth_data); pclose(f); } else fprintf(stderr, "Could not run %s\n", _PATH_SSH_SYSTEM_RC); } else if (do_xauth && options.xauth_location != NULL) { /* Add authority data to .Xauthority if appropriate. */ if (debug_flag) { fprintf(stderr, "Running %.500s remove %.100s\n", options.xauth_location, s->auth_display); fprintf(stderr, "%.500s add %.100s %.100s %.100s\n", options.xauth_location, s->auth_display, s->auth_proto, s->auth_data); } snprintf(cmd, sizeof cmd, "%s -q -", options.xauth_location); f = popen(cmd, "w"); if (f) { fprintf(f, "remove %s\n", s->auth_display); fprintf(f, "add %s %s %s\n", s->auth_display, s->auth_proto, s->auth_data); pclose(f); } else { fprintf(stderr, "Could not run %s\n", cmd); } } } static void do_nologin(struct passwd *pw) { FILE *f = NULL; char buf[1024], *nl, *def_nl = _PATH_NOLOGIN; struct stat sb; #ifdef HAVE_LOGIN_CAP if (login_getcapbool(lc, "ignorenologin", 0) || pw->pw_uid == 0) return; nl = login_getcapstr(lc, "nologin", def_nl, def_nl); #else if (pw->pw_uid == 0) return; nl = def_nl; #endif if (stat(nl, &sb) == -1) { if (nl != def_nl) free(nl); return; } /* /etc/nologin exists. Print its contents if we can and exit. */ logit("User %.100s not allowed because %s exists", pw->pw_name, nl); if ((f = fopen(nl, "r")) != NULL) { while (fgets(buf, sizeof(buf), f)) fputs(buf, stderr); fclose(f); } exit(254); } /* * Chroot into a directory after checking it for safety: all path components * must be root-owned directories with strict permissions. */ static void safely_chroot(const char *path, uid_t uid) { const char *cp; char component[MAXPATHLEN]; struct stat st; if (*path != '/') fatal("chroot path does not begin at root"); if (strlen(path) >= sizeof(component)) fatal("chroot path too long"); /* * Descend the path, checking that each component is a * root-owned directory with strict permissions. */ for (cp = path; cp != NULL;) { if ((cp = strchr(cp, '/')) == NULL) strlcpy(component, path, sizeof(component)); else { cp++; memcpy(component, path, cp - path); component[cp - path] = '\0'; } debug3("%s: checking '%s'", __func__, component); if (stat(component, &st) != 0) fatal("%s: stat(\"%s\"): %s", __func__, component, strerror(errno)); if (st.st_uid != 0 || (st.st_mode & 022) != 0) fatal("bad ownership or modes for chroot " "directory %s\"%s\"", cp == NULL ? "" : "component ", component); if (!S_ISDIR(st.st_mode)) fatal("chroot path %s\"%s\" is not a directory", cp == NULL ? "" : "component ", component); } if (chdir(path) == -1) fatal("Unable to chdir to chroot path \"%s\": " "%s", path, strerror(errno)); if (chroot(path) == -1) fatal("chroot(\"%s\"): %s", path, strerror(errno)); if (chdir("/") == -1) fatal("%s: chdir(/) after chroot: %s", __func__, strerror(errno)); verbose("Changed root directory to \"%s\"", path); } /* Set login name, uid, gid, and groups. */ void do_setusercontext(struct passwd *pw) { char *chroot_path, *tmp; platform_setusercontext(pw); if (platform_privileged_uidswap()) { #ifdef HAVE_LOGIN_CAP if (setusercontext(lc, pw, pw->pw_uid, (LOGIN_SETALL & ~(LOGIN_SETENV|LOGIN_SETPATH|LOGIN_SETUSER))) < 0) { perror("unable to set user context"); exit(1); } #else if (setlogin(pw->pw_name) < 0) error("setlogin failed: %s", strerror(errno)); if (setgid(pw->pw_gid) < 0) { perror("setgid"); exit(1); } /* Initialize the group list. */ if (initgroups(pw->pw_name, pw->pw_gid) < 0) { perror("initgroups"); exit(1); } endgrent(); #endif platform_setusercontext_post_groups(pw); if (options.chroot_directory != NULL && strcasecmp(options.chroot_directory, "none") != 0) { tmp = tilde_expand_filename(options.chroot_directory, pw->pw_uid); chroot_path = percent_expand(tmp, "h", pw->pw_dir, "u", pw->pw_name, (char *)NULL); safely_chroot(chroot_path, pw->pw_uid); free(tmp); free(chroot_path); /* Make sure we don't attempt to chroot again */ free(options.chroot_directory); options.chroot_directory = NULL; } #ifdef HAVE_LOGIN_CAP if (setusercontext(lc, pw, pw->pw_uid, LOGIN_SETUSER) < 0) { perror("unable to set user context (setuser)"); exit(1); } /* * FreeBSD's setusercontext() will not apply the user's * own umask setting unless running with the user's UID. */ (void) setusercontext(lc, pw, pw->pw_uid, LOGIN_SETUMASK); #else # ifdef USE_LIBIAF if (set_id(pw->pw_name) != 0) { fatal("set_id(%s) Failed", pw->pw_name); } # endif /* USE_LIBIAF */ /* Permanently switch to the desired uid. */ permanently_set_uid(pw); #endif } else if (options.chroot_directory != NULL && strcasecmp(options.chroot_directory, "none") != 0) { fatal("server lacks privileges to chroot to ChrootDirectory"); } if (getuid() != pw->pw_uid || geteuid() != pw->pw_uid) fatal("Failed to set uids to %u.", (u_int) pw->pw_uid); } static void do_pwchange(Session *s) { fflush(NULL); fprintf(stderr, "WARNING: Your password has expired.\n"); if (s->ttyfd != -1) { fprintf(stderr, "You must change your password now and login again!\n"); #ifdef WITH_SELINUX setexeccon(NULL); #endif #ifdef PASSWD_NEEDS_USERNAME execl(_PATH_PASSWD_PROG, "passwd", s->pw->pw_name, (char *)NULL); #else execl(_PATH_PASSWD_PROG, "passwd", (char *)NULL); #endif perror("passwd"); } else { fprintf(stderr, "Password change required but no TTY available.\n"); } exit(1); } static void launch_login(struct passwd *pw, const char *hostname) { /* Launch login(1). */ execl(LOGIN_PROGRAM, "login", "-h", hostname, #ifdef xxxLOGIN_NEEDS_TERM (s->term ? s->term : "unknown"), #endif /* LOGIN_NEEDS_TERM */ #ifdef LOGIN_NO_ENDOPT "-p", "-f", pw->pw_name, (char *)NULL); #else "-p", "-f", "--", pw->pw_name, (char *)NULL); #endif /* Login couldn't be executed, die. */ perror("login"); exit(1); } static void child_close_fds(void) { extern AuthenticationConnection *auth_conn; if (auth_conn) { ssh_close_authentication_connection(auth_conn); auth_conn = NULL; } if (packet_get_connection_in() == packet_get_connection_out()) close(packet_get_connection_in()); else { close(packet_get_connection_in()); close(packet_get_connection_out()); } /* * Close all descriptors related to channels. They will still remain * open in the parent. */ /* XXX better use close-on-exec? -markus */ channel_close_all(); /* * Close any extra file descriptors. Note that there may still be * descriptors left by system functions. They will be closed later. */ endpwent(); /* * Close any extra open file descriptors so that we don't have them * hanging around in clients. Note that we want to do this after * initgroups, because at least on Solaris 2.3 it leaves file * descriptors open. */ closefrom(STDERR_FILENO + 1); } /* * Performs common processing for the child, such as setting up the * environment, closing extra file descriptors, setting the user and group * ids, and executing the command or shell. */ #define ARGV_MAX 10 void do_child(Session *s, const char *command) { extern char **environ; char **env; char *argv[ARGV_MAX]; const char *shell, *shell0, *hostname = NULL; struct passwd *pw = s->pw; int r = 0; /* remove hostkey from the child's memory */ destroy_sensitive_data(); /* Force a password change */ if (s->authctxt->force_pwchange) { do_setusercontext(pw); child_close_fds(); do_pwchange(s); exit(1); } /* login(1) is only called if we execute the login shell */ if (options.use_login && command != NULL) options.use_login = 0; #ifdef _UNICOS cray_setup(pw->pw_uid, pw->pw_name, command); #endif /* _UNICOS */ /* * Login(1) does this as well, and it needs uid 0 for the "-h" * switch, so we let login(1) to this for us. */ if (!options.use_login) { #ifdef HAVE_OSF_SIA session_setup_sia(pw, s->ttyfd == -1 ? NULL : s->tty); if (!check_quietlogin(s, command)) do_motd(); #else /* HAVE_OSF_SIA */ /* When PAM is enabled we rely on it to do the nologin check */ if (!options.use_pam) do_nologin(pw); do_setusercontext(pw); /* * PAM session modules in do_setusercontext may have * generated messages, so if this in an interactive * login then display them too. */ if (!check_quietlogin(s, command)) display_loginmsg(); #endif /* HAVE_OSF_SIA */ } #ifdef USE_PAM if (options.use_pam && !options.use_login && !is_pam_session_open()) { debug3("PAM session not opened, exiting"); display_loginmsg(); exit(254); } #endif /* * Get the shell from the password data. An empty shell field is * legal, and means /bin/sh. */ shell = (pw->pw_shell[0] == '\0') ? _PATH_BSHELL : pw->pw_shell; /* * Make sure $SHELL points to the shell from the password file, * even if shell is overridden from login.conf */ env = do_setup_env(s, shell); #ifdef HAVE_LOGIN_CAP shell = login_getcapstr(lc, "shell", (char *)shell, (char *)shell); #endif /* we have to stash the hostname before we close our socket. */ if (options.use_login) hostname = get_remote_name_or_ip(utmp_len, options.use_dns); /* * Close the connection descriptors; note that this is the child, and * the server will still have the socket open, and it is important * that we do not shutdown it. Note that the descriptors cannot be * closed before building the environment, as we call * get_remote_ipaddr there. */ child_close_fds(); /* * Must take new environment into use so that .ssh/rc, * /etc/ssh/sshrc and xauth are run in the proper environment. */ environ = env; #if defined(KRB5) && defined(USE_AFS) /* * At this point, we check to see if AFS is active and if we have * a valid Kerberos 5 TGT. If so, it seems like a good idea to see * if we can (and need to) extend the ticket into an AFS token. If * we don't do this, we run into potential problems if the user's * home directory is in AFS and it's not world-readable. */ if (options.kerberos_get_afs_token && k_hasafs() && (s->authctxt->krb5_ctx != NULL)) { char cell[64]; debug("Getting AFS token"); k_setpag(); if (k_afs_cell_of_file(pw->pw_dir, cell, sizeof(cell)) == 0) krb5_afslog(s->authctxt->krb5_ctx, s->authctxt->krb5_fwd_ccache, cell, NULL); krb5_afslog_home(s->authctxt->krb5_ctx, s->authctxt->krb5_fwd_ccache, NULL, NULL, pw->pw_dir); } #endif /* Change current directory to the user's home directory. */ if (chdir(pw->pw_dir) < 0) { /* Suppress missing homedir warning for chroot case */ #ifdef HAVE_LOGIN_CAP r = login_getcapbool(lc, "requirehome", 0); #endif if (r || options.chroot_directory == NULL || strcasecmp(options.chroot_directory, "none") == 0) fprintf(stderr, "Could not chdir to home " "directory %s: %s\n", pw->pw_dir, strerror(errno)); if (r) exit(1); } closefrom(STDERR_FILENO + 1); if (!options.use_login) do_rc_files(s, shell); /* restore SIGPIPE for child */ signal(SIGPIPE, SIG_DFL); if (s->is_subsystem == SUBSYSTEM_INT_SFTP_ERROR) { printf("This service allows sftp connections only.\n"); fflush(NULL); exit(1); } else if (s->is_subsystem == SUBSYSTEM_INT_SFTP) { extern int optind, optreset; int i; char *p, *args; setproctitle("%s@%s", s->pw->pw_name, INTERNAL_SFTP_NAME); args = xstrdup(command ? command : "sftp-server"); for (i = 0, (p = strtok(args, " ")); p; (p = strtok(NULL, " "))) if (i < ARGV_MAX - 1) argv[i++] = p; argv[i] = NULL; optind = optreset = 1; __progname = argv[0]; #ifdef WITH_SELINUX ssh_selinux_change_context("sftpd_t"); #endif exit(sftp_server_main(i, argv, s->pw)); } fflush(NULL); if (options.use_login) { launch_login(pw, hostname); /* NEVERREACHED */ } /* Get the last component of the shell name. */ if ((shell0 = strrchr(shell, '/')) != NULL) shell0++; else shell0 = shell; /* * If we have no command, execute the shell. In this case, the shell * name to be passed in argv[0] is preceded by '-' to indicate that * this is a login shell. */ if (!command) { char argv0[256]; /* Start the shell. Set initial character to '-'. */ argv0[0] = '-'; if (strlcpy(argv0 + 1, shell0, sizeof(argv0) - 1) >= sizeof(argv0) - 1) { errno = EINVAL; perror(shell); exit(1); } /* Execute the shell. */ argv[0] = argv0; argv[1] = NULL; execve(shell, argv, env); /* Executing the shell failed. */ perror(shell); exit(1); } /* * Execute the command using the user's shell. This uses the -c * option to execute the command. */ argv[0] = (char *) shell0; argv[1] = "-c"; argv[2] = (char *) command; argv[3] = NULL; execve(shell, argv, env); perror(shell); exit(1); } void session_unused(int id) { debug3("%s: session id %d unused", __func__, id); if (id >= options.max_sessions || id >= sessions_nalloc) { fatal("%s: insane session id %d (max %d nalloc %d)", __func__, id, options.max_sessions, sessions_nalloc); } memset(&sessions[id], 0, sizeof(*sessions)); sessions[id].self = id; sessions[id].used = 0; sessions[id].chanid = -1; sessions[id].ptyfd = -1; sessions[id].ttyfd = -1; sessions[id].ptymaster = -1; sessions[id].x11_chanids = NULL; sessions[id].next_unused = sessions_first_unused; sessions_first_unused = id; } Session * session_new(void) { Session *s, *tmp; if (sessions_first_unused == -1) { if (sessions_nalloc >= options.max_sessions) return NULL; debug2("%s: allocate (allocated %d max %d)", __func__, sessions_nalloc, options.max_sessions); tmp = xrealloc(sessions, sessions_nalloc + 1, sizeof(*sessions)); if (tmp == NULL) { error("%s: cannot allocate %d sessions", __func__, sessions_nalloc + 1); return NULL; } sessions = tmp; session_unused(sessions_nalloc++); } if (sessions_first_unused >= sessions_nalloc || sessions_first_unused < 0) { fatal("%s: insane first_unused %d max %d nalloc %d", __func__, sessions_first_unused, options.max_sessions, sessions_nalloc); } s = &sessions[sessions_first_unused]; if (s->used) { fatal("%s: session %d already used", __func__, sessions_first_unused); } sessions_first_unused = s->next_unused; s->used = 1; s->next_unused = -1; debug("session_new: session %d", s->self); return s; } static void session_dump(void) { int i; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; debug("dump: used %d next_unused %d session %d %p " "channel %d pid %ld", s->used, s->next_unused, s->self, s, s->chanid, (long)s->pid); } } int session_open(Authctxt *authctxt, int chanid) { Session *s = session_new(); debug("session_open: channel %d", chanid); if (s == NULL) { error("no more sessions"); return 0; } s->authctxt = authctxt; s->pw = authctxt->pw; if (s->pw == NULL || !authctxt->valid) fatal("no user for session %d", s->self); debug("session_open: session %d: link with channel %d", s->self, chanid); s->chanid = chanid; return 1; } Session * session_by_tty(char *tty) { int i; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->used && s->ttyfd != -1 && strcmp(s->tty, tty) == 0) { debug("session_by_tty: session %d tty %s", i, tty); return s; } } debug("session_by_tty: unknown tty %.100s", tty); session_dump(); return NULL; } static Session * session_by_channel(int id) { int i; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->used && s->chanid == id) { debug("session_by_channel: session %d channel %d", i, id); return s; } } debug("session_by_channel: unknown channel %d", id); session_dump(); return NULL; } static Session * session_by_x11_channel(int id) { int i, j; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->x11_chanids == NULL || !s->used) continue; for (j = 0; s->x11_chanids[j] != -1; j++) { if (s->x11_chanids[j] == id) { debug("session_by_x11_channel: session %d " "channel %d", s->self, id); return s; } } } debug("session_by_x11_channel: unknown channel %d", id); session_dump(); return NULL; } static Session * session_by_pid(pid_t pid) { int i; debug("session_by_pid: pid %ld", (long)pid); for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->used && s->pid == pid) return s; } error("session_by_pid: unknown pid %ld", (long)pid); session_dump(); return NULL; } static int session_window_change_req(Session *s) { s->col = packet_get_int(); s->row = packet_get_int(); s->xpixel = packet_get_int(); s->ypixel = packet_get_int(); packet_check_eom(); pty_change_window_size(s->ptyfd, s->row, s->col, s->xpixel, s->ypixel); return 1; } static int session_pty_req(Session *s) { u_int len; int n_bytes; if (no_pty_flag || !options.permit_tty) { debug("Allocating a pty not permitted for this authentication."); return 0; } if (s->ttyfd != -1) { packet_disconnect("Protocol error: you already have a pty."); return 0; } s->term = packet_get_string(&len); if (compat20) { s->col = packet_get_int(); s->row = packet_get_int(); } else { s->row = packet_get_int(); s->col = packet_get_int(); } s->xpixel = packet_get_int(); s->ypixel = packet_get_int(); if (strcmp(s->term, "") == 0) { free(s->term); s->term = NULL; } /* Allocate a pty and open it. */ debug("Allocating pty."); if (!PRIVSEP(pty_allocate(&s->ptyfd, &s->ttyfd, s->tty, sizeof(s->tty)))) { free(s->term); s->term = NULL; s->ptyfd = -1; s->ttyfd = -1; error("session_pty_req: session %d alloc failed", s->self); return 0; } debug("session_pty_req: session %d alloc %s", s->self, s->tty); /* for SSH1 the tty modes length is not given */ if (!compat20) n_bytes = packet_remaining(); tty_parse_modes(s->ttyfd, &n_bytes); if (!use_privsep) pty_setowner(s->pw, s->tty); /* Set window size from the packet. */ pty_change_window_size(s->ptyfd, s->row, s->col, s->xpixel, s->ypixel); packet_check_eom(); session_proctitle(s); return 1; } static int session_subsystem_req(Session *s) { struct stat st; u_int len; int success = 0; char *prog, *cmd; u_int i; s->subsys = packet_get_string(&len); packet_check_eom(); debug2("subsystem request for %.100s by user %s", s->subsys, s->pw->pw_name); for (i = 0; i < options.num_subsystems; i++) { if (strcmp(s->subsys, options.subsystem_name[i]) == 0) { prog = options.subsystem_command[i]; cmd = options.subsystem_args[i]; if (strcmp(INTERNAL_SFTP_NAME, prog) == 0) { s->is_subsystem = SUBSYSTEM_INT_SFTP; debug("subsystem: %s", prog); } else { if (stat(prog, &st) < 0) debug("subsystem: cannot stat %s: %s", prog, strerror(errno)); s->is_subsystem = SUBSYSTEM_EXT; debug("subsystem: exec() %s", cmd); } success = do_exec(s, cmd) == 0; break; } } if (!success) logit("subsystem request for %.100s by user %s failed, " "subsystem not found", s->subsys, s->pw->pw_name); return success; } static int session_x11_req(Session *s) { int success; if (s->auth_proto != NULL || s->auth_data != NULL) { error("session_x11_req: session %d: " "x11 forwarding already active", s->self); return 0; } s->single_connection = packet_get_char(); s->auth_proto = packet_get_string(NULL); s->auth_data = packet_get_string(NULL); s->screen = packet_get_int(); packet_check_eom(); - success = session_setup_x11fwd(s); + if (xauth_valid_string(s->auth_proto) && + xauth_valid_string(s->auth_data)) + success = session_setup_x11fwd(s); + else { + success = 0; + error("Invalid X11 forwarding data"); + } if (!success) { free(s->auth_proto); free(s->auth_data); s->auth_proto = NULL; s->auth_data = NULL; } return success; } static int session_shell_req(Session *s) { packet_check_eom(); return do_exec(s, NULL) == 0; } static int session_exec_req(Session *s) { u_int len, success; char *command = packet_get_string(&len); packet_check_eom(); success = do_exec(s, command) == 0; free(command); return success; } static int session_break_req(Session *s) { packet_get_int(); /* ignored */ packet_check_eom(); if (s->ptymaster == -1 || tcsendbreak(s->ptymaster, 0) < 0) return 0; return 1; } static int session_env_req(Session *s) { char *name, *val; u_int name_len, val_len, i; name = packet_get_cstring(&name_len); val = packet_get_cstring(&val_len); packet_check_eom(); /* Don't set too many environment variables */ if (s->num_env > 128) { debug2("Ignoring env request %s: too many env vars", name); goto fail; } for (i = 0; i < options.num_accept_env; i++) { if (match_pattern(name, options.accept_env[i])) { debug2("Setting env %d: %s=%s", s->num_env, name, val); s->env = xrealloc(s->env, s->num_env + 1, sizeof(*s->env)); s->env[s->num_env].name = name; s->env[s->num_env].val = val; s->num_env++; return (1); } } debug2("Ignoring env request %s: disallowed name", name); fail: free(name); free(val); return (0); } static int session_auth_agent_req(Session *s) { static int called = 0; packet_check_eom(); if (no_agent_forwarding_flag || !options.allow_agent_forwarding) { debug("session_auth_agent_req: no_agent_forwarding_flag"); return 0; } if (called) { return 0; } else { called = 1; return auth_input_request_forwarding(s->pw); } } int session_input_channel_req(Channel *c, const char *rtype) { int success = 0; Session *s; if ((s = session_by_channel(c->self)) == NULL) { logit("session_input_channel_req: no session %d req %.100s", c->self, rtype); return 0; } debug("session_input_channel_req: session %d req %s", s->self, rtype); /* * a session is in LARVAL state until a shell, a command * or a subsystem is executed */ if (c->type == SSH_CHANNEL_LARVAL) { if (strcmp(rtype, "shell") == 0) { success = session_shell_req(s); } else if (strcmp(rtype, "exec") == 0) { success = session_exec_req(s); } else if (strcmp(rtype, "pty-req") == 0) { success = session_pty_req(s); } else if (strcmp(rtype, "x11-req") == 0) { success = session_x11_req(s); } else if (strcmp(rtype, "auth-agent-req@openssh.com") == 0) { success = session_auth_agent_req(s); } else if (strcmp(rtype, "subsystem") == 0) { success = session_subsystem_req(s); } else if (strcmp(rtype, "env") == 0) { success = session_env_req(s); } } if (strcmp(rtype, "window-change") == 0) { success = session_window_change_req(s); } else if (strcmp(rtype, "break") == 0) { success = session_break_req(s); } return success; } void session_set_fds(Session *s, int fdin, int fdout, int fderr, int ignore_fderr, int is_tty) { if (!compat20) fatal("session_set_fds: called for proto != 2.0"); /* * now that have a child and a pipe to the child, * we can activate our channel and register the fd's */ if (s->chanid == -1) fatal("no channel for session %d", s->self); if (options.hpn_disabled) channel_set_fds(s->chanid, fdout, fdin, fderr, ignore_fderr ? CHAN_EXTENDED_IGNORE : CHAN_EXTENDED_READ, 1, is_tty, CHAN_SES_WINDOW_DEFAULT); else channel_set_fds(s->chanid, fdout, fdin, fderr, ignore_fderr ? CHAN_EXTENDED_IGNORE : CHAN_EXTENDED_READ, 1, is_tty, options.hpn_buffer_size); } /* * Function to perform pty cleanup. Also called if we get aborted abnormally * (e.g., due to a dropped connection). */ void session_pty_cleanup2(Session *s) { if (s == NULL) { error("session_pty_cleanup: no session"); return; } if (s->ttyfd == -1) return; debug("session_pty_cleanup: session %d release %s", s->self, s->tty); /* Record that the user has logged out. */ if (s->pid != 0) record_logout(s->pid, s->tty, s->pw->pw_name); /* Release the pseudo-tty. */ if (getuid() == 0) pty_release(s->tty); /* * Close the server side of the socket pairs. We must do this after * the pty cleanup, so that another process doesn't get this pty * while we're still cleaning up. */ if (s->ptymaster != -1 && close(s->ptymaster) < 0) error("close(s->ptymaster/%d): %s", s->ptymaster, strerror(errno)); /* unlink pty from session */ s->ttyfd = -1; } void session_pty_cleanup(Session *s) { PRIVSEP(session_pty_cleanup2(s)); } static char * sig2name(int sig) { #define SSH_SIG(x) if (sig == SIG ## x) return #x SSH_SIG(ABRT); SSH_SIG(ALRM); SSH_SIG(FPE); SSH_SIG(HUP); SSH_SIG(ILL); SSH_SIG(INT); SSH_SIG(KILL); SSH_SIG(PIPE); SSH_SIG(QUIT); SSH_SIG(SEGV); SSH_SIG(TERM); SSH_SIG(USR1); SSH_SIG(USR2); #undef SSH_SIG return "SIG@openssh.com"; } static void session_close_x11(int id) { Channel *c; if ((c = channel_by_id(id)) == NULL) { debug("session_close_x11: x11 channel %d missing", id); } else { /* Detach X11 listener */ debug("session_close_x11: detach x11 channel %d", id); channel_cancel_cleanup(id); if (c->ostate != CHAN_OUTPUT_CLOSED) chan_mark_dead(c); } } static void session_close_single_x11(int id, void *arg) { Session *s; u_int i; debug3("session_close_single_x11: channel %d", id); channel_cancel_cleanup(id); if ((s = session_by_x11_channel(id)) == NULL) fatal("session_close_single_x11: no x11 channel %d", id); for (i = 0; s->x11_chanids[i] != -1; i++) { debug("session_close_single_x11: session %d: " "closing channel %d", s->self, s->x11_chanids[i]); /* * The channel "id" is already closing, but make sure we * close all of its siblings. */ if (s->x11_chanids[i] != id) session_close_x11(s->x11_chanids[i]); } free(s->x11_chanids); s->x11_chanids = NULL; free(s->display); s->display = NULL; free(s->auth_proto); s->auth_proto = NULL; free(s->auth_data); s->auth_data = NULL; free(s->auth_display); s->auth_display = NULL; } static void session_exit_message(Session *s, int status) { Channel *c; if ((c = channel_lookup(s->chanid)) == NULL) fatal("session_exit_message: session %d: no channel %d", s->self, s->chanid); debug("session_exit_message: session %d channel %d pid %ld", s->self, s->chanid, (long)s->pid); if (WIFEXITED(status)) { channel_request_start(s->chanid, "exit-status", 0); packet_put_int(WEXITSTATUS(status)); packet_send(); } else if (WIFSIGNALED(status)) { channel_request_start(s->chanid, "exit-signal", 0); packet_put_cstring(sig2name(WTERMSIG(status))); #ifdef WCOREDUMP packet_put_char(WCOREDUMP(status)? 1 : 0); #else /* WCOREDUMP */ packet_put_char(0); #endif /* WCOREDUMP */ packet_put_cstring(""); packet_put_cstring(""); packet_send(); } else { /* Some weird exit cause. Just exit. */ packet_disconnect("wait returned status %04x.", status); } /* disconnect channel */ debug("session_exit_message: release channel %d", s->chanid); /* * Adjust cleanup callback attachment to send close messages when * the channel gets EOF. The session will be then be closed * by session_close_by_channel when the childs close their fds. */ channel_register_cleanup(c->self, session_close_by_channel, 1); /* * emulate a write failure with 'chan_write_failed', nobody will be * interested in data we write. * Note that we must not call 'chan_read_failed', since there could * be some more data waiting in the pipe. */ if (c->ostate != CHAN_OUTPUT_CLOSED) chan_write_failed(c); } void session_close(Session *s) { u_int i; debug("session_close: session %d pid %ld", s->self, (long)s->pid); if (s->ttyfd != -1) session_pty_cleanup(s); free(s->term); free(s->display); free(s->x11_chanids); free(s->auth_display); free(s->auth_data); free(s->auth_proto); free(s->subsys); if (s->env != NULL) { for (i = 0; i < s->num_env; i++) { free(s->env[i].name); free(s->env[i].val); } free(s->env); } session_proctitle(s); session_unused(s->self); } void session_close_by_pid(pid_t pid, int status) { Session *s = session_by_pid(pid); if (s == NULL) { debug("session_close_by_pid: no session for pid %ld", (long)pid); return; } if (s->chanid != -1) session_exit_message(s, status); if (s->ttyfd != -1) session_pty_cleanup(s); s->pid = 0; } /* * this is called when a channel dies before * the session 'child' itself dies */ void session_close_by_channel(int id, void *arg) { Session *s = session_by_channel(id); u_int i; if (s == NULL) { debug("session_close_by_channel: no session for id %d", id); return; } debug("session_close_by_channel: channel %d child %ld", id, (long)s->pid); if (s->pid != 0) { debug("session_close_by_channel: channel %d: has child", id); /* * delay detach of session, but release pty, since * the fd's to the child are already closed */ if (s->ttyfd != -1) session_pty_cleanup(s); return; } /* detach by removing callback */ channel_cancel_cleanup(s->chanid); /* Close any X11 listeners associated with this session */ if (s->x11_chanids != NULL) { for (i = 0; s->x11_chanids[i] != -1; i++) { session_close_x11(s->x11_chanids[i]); s->x11_chanids[i] = -1; } } s->chanid = -1; session_close(s); } void session_destroy_all(void (*closefunc)(Session *)) { int i; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->used) { if (closefunc != NULL) closefunc(s); else session_close(s); } } } static char * session_tty_list(void) { static char buf[1024]; int i; char *cp; buf[0] = '\0'; for (i = 0; i < sessions_nalloc; i++) { Session *s = &sessions[i]; if (s->used && s->ttyfd != -1) { if (strncmp(s->tty, "/dev/", 5) != 0) { cp = strrchr(s->tty, '/'); cp = (cp == NULL) ? s->tty : cp + 1; } else cp = s->tty + 5; if (buf[0] != '\0') strlcat(buf, ",", sizeof buf); strlcat(buf, cp, sizeof buf); } } if (buf[0] == '\0') strlcpy(buf, "notty", sizeof buf); return buf; } void session_proctitle(Session *s) { if (s->pw == NULL) error("no user for session %d", s->self); else setproctitle("%s@%s", s->pw->pw_name, session_tty_list()); } int session_setup_x11fwd(Session *s) { struct stat st; char display[512], auth_display[512]; char hostname[MAXHOSTNAMELEN]; u_int i; if (no_x11_forwarding_flag) { packet_send_debug("X11 forwarding disabled in user configuration file."); return 0; } if (!options.x11_forwarding) { debug("X11 forwarding disabled in server configuration file."); return 0; } if (!options.xauth_location || (stat(options.xauth_location, &st) == -1)) { packet_send_debug("No xauth program; cannot forward with spoofing."); return 0; } if (options.use_login) { packet_send_debug("X11 forwarding disabled; " "not compatible with UseLogin=yes."); return 0; } if (s->display != NULL) { debug("X11 display already set."); return 0; } if (x11_create_display_inet(options.x11_display_offset, options.x11_use_localhost, s->single_connection, &s->display_number, &s->x11_chanids) == -1) { debug("x11_create_display_inet failed."); return 0; } for (i = 0; s->x11_chanids[i] != -1; i++) { channel_register_cleanup(s->x11_chanids[i], session_close_single_x11, 0); } /* Set up a suitable value for the DISPLAY variable. */ if (gethostname(hostname, sizeof(hostname)) < 0) fatal("gethostname: %.100s", strerror(errno)); /* * auth_display must be used as the displayname when the * authorization entry is added with xauth(1). This will be * different than the DISPLAY string for localhost displays. */ if (options.x11_use_localhost) { snprintf(display, sizeof display, "localhost:%u.%u", s->display_number, s->screen); snprintf(auth_display, sizeof auth_display, "unix:%u.%u", s->display_number, s->screen); s->display = xstrdup(display); s->auth_display = xstrdup(auth_display); } else { #ifdef IPADDR_IN_DISPLAY struct hostent *he; struct in_addr my_addr; he = gethostbyname(hostname); if (he == NULL) { error("Can't get IP address for X11 DISPLAY."); packet_send_debug("Can't get IP address for X11 DISPLAY."); return 0; } memcpy(&my_addr, he->h_addr_list[0], sizeof(struct in_addr)); snprintf(display, sizeof display, "%.50s:%u.%u", inet_ntoa(my_addr), s->display_number, s->screen); #else snprintf(display, sizeof display, "%.400s:%u.%u", hostname, s->display_number, s->screen); #endif s->display = xstrdup(display); s->auth_display = xstrdup(display); } return 1; } static void do_authenticated2(Authctxt *authctxt) { server_loop2(authctxt); } void do_cleanup(Authctxt *authctxt) { static int called = 0; debug("do_cleanup"); /* no cleanup if we're in the child for login shell */ if (is_child) return; /* avoid double cleanup */ if (called) return; called = 1; if (authctxt == NULL) return; #ifdef USE_PAM if (options.use_pam) { sshpam_cleanup(); sshpam_thread_cleanup(); } #endif if (!authctxt->authenticated) return; #ifdef KRB5 if (options.kerberos_ticket_cleanup && authctxt->krb5_ctx) krb5_cleanup_proc(authctxt); #endif #ifdef GSSAPI if (compat20 && options.gss_cleanup_creds) ssh_gssapi_cleanup_creds(); #endif /* remove agent socket */ auth_sock_cleanup_proc(authctxt->pw); /* * Cleanup ptys/utmp only if privsep is disabled, * or if running in monitor. */ if (!use_privsep || mm_is_monitor()) session_destroy_all(session_pty_cleanup2); } Index: releng/10.2/sys/amd64/amd64/sys_machdep.c =================================================================== --- releng/10.2/sys/amd64/amd64/sys_machdep.c (revision 296954) +++ releng/10.2/sys/amd64/amd64/sys_machdep.c (revision 296955) @@ -1,754 +1,754 @@ /*- * Copyright (c) 2003 Peter Wemm. * Copyright (c) 1990 The Regents of the University of California. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)sys_machdep.c 5.5 (Berkeley) 1/19/91 */ #include __FBSDID("$FreeBSD$"); #include "opt_capsicum.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for kernel_map */ #include #include #include #include #include #include #include #include #include #define MAX_LD 8192 int max_ldt_segment = 1024; SYSCTL_INT(_machdep, OID_AUTO, max_ldt_segment, CTLFLAG_RDTUN, &max_ldt_segment, 0, "Maximum number of allowed LDT segments in the single address space"); static void max_ldt_segment_init(void *arg __unused) { TUNABLE_INT_FETCH("machdep.max_ldt_segment", &max_ldt_segment); if (max_ldt_segment <= 0) max_ldt_segment = 1; if (max_ldt_segment > MAX_LD) max_ldt_segment = MAX_LD; } SYSINIT(maxldt, SI_SUB_VM_CONF, SI_ORDER_ANY, max_ldt_segment_init, NULL); #ifdef notyet #ifdef SMP static void set_user_ldt_rv(struct vmspace *vmsp); #endif #endif static void user_ldt_derefl(struct proc_ldt *pldt); #ifndef _SYS_SYSPROTO_H_ struct sysarch_args { int op; char *parms; }; #endif int sysarch_ldt(struct thread *td, struct sysarch_args *uap, int uap_space) { struct i386_ldt_args *largs, la; struct user_segment_descriptor *lp; int error = 0; /* * XXXKIB check that the BSM generation code knows to encode * the op argument. */ AUDIT_ARG_CMD(uap->op); if (uap_space == UIO_USERSPACE) { error = copyin(uap->parms, &la, sizeof(struct i386_ldt_args)); if (error != 0) return (error); largs = &la; } else largs = (struct i386_ldt_args *)uap->parms; switch (uap->op) { case I386_GET_LDT: error = amd64_get_ldt(td, largs); break; case I386_SET_LDT: if (largs->descs != NULL && largs->num > max_ldt_segment) return (EINVAL); set_pcb_flags(td->td_pcb, PCB_FULL_IRET); if (largs->descs != NULL) { lp = malloc(largs->num * sizeof(struct user_segment_descriptor), M_TEMP, M_WAITOK); error = copyin(largs->descs, lp, largs->num * sizeof(struct user_segment_descriptor)); if (error == 0) error = amd64_set_ldt(td, largs, lp); free(lp, M_TEMP); } else { error = amd64_set_ldt(td, largs, NULL); } break; } return (error); } void update_gdt_gsbase(struct thread *td, uint32_t base) { struct user_segment_descriptor *sd; if (td != curthread) return; set_pcb_flags(td->td_pcb, PCB_FULL_IRET); critical_enter(); sd = PCPU_GET(gs32p); sd->sd_lobase = base & 0xffffff; sd->sd_hibase = (base >> 24) & 0xff; critical_exit(); } void update_gdt_fsbase(struct thread *td, uint32_t base) { struct user_segment_descriptor *sd; if (td != curthread) return; set_pcb_flags(td->td_pcb, PCB_FULL_IRET); critical_enter(); sd = PCPU_GET(fs32p); sd->sd_lobase = base & 0xffffff; sd->sd_hibase = (base >> 24) & 0xff; critical_exit(); } int sysarch(td, uap) struct thread *td; register struct sysarch_args *uap; { int error = 0; struct pcb *pcb = curthread->td_pcb; uint32_t i386base; uint64_t a64base; struct i386_ioperm_args iargs; struct i386_get_xfpustate i386xfpu; struct amd64_get_xfpustate a64xfpu; #ifdef CAPABILITY_MODE /* * When adding new operations, add a new case statement here to * explicitly indicate whether or not the operation is safe to * perform in capability mode. */ if (IN_CAPABILITY_MODE(td)) { switch (uap->op) { case I386_GET_LDT: case I386_SET_LDT: case I386_GET_IOPERM: case I386_GET_FSBASE: case I386_SET_FSBASE: case I386_GET_GSBASE: case I386_SET_GSBASE: case I386_GET_XFPUSTATE: case AMD64_GET_FSBASE: case AMD64_SET_FSBASE: case AMD64_GET_GSBASE: case AMD64_SET_GSBASE: case AMD64_GET_XFPUSTATE: break; case I386_SET_IOPERM: default: #ifdef KTRACE if (KTRPOINT(td, KTR_CAPFAIL)) ktrcapfail(CAPFAIL_SYSCALL, NULL, NULL); #endif return (ECAPMODE); } } #endif if (uap->op == I386_GET_LDT || uap->op == I386_SET_LDT) return (sysarch_ldt(td, uap, UIO_USERSPACE)); /* * XXXKIB check that the BSM generation code knows to encode * the op argument. */ AUDIT_ARG_CMD(uap->op); switch (uap->op) { case I386_GET_IOPERM: case I386_SET_IOPERM: if ((error = copyin(uap->parms, &iargs, sizeof(struct i386_ioperm_args))) != 0) return (error); break; case I386_GET_XFPUSTATE: if ((error = copyin(uap->parms, &i386xfpu, sizeof(struct i386_get_xfpustate))) != 0) return (error); a64xfpu.addr = (void *)(uintptr_t)i386xfpu.addr; a64xfpu.len = i386xfpu.len; break; case AMD64_GET_XFPUSTATE: if ((error = copyin(uap->parms, &a64xfpu, sizeof(struct amd64_get_xfpustate))) != 0) return (error); break; default: break; } switch (uap->op) { case I386_GET_IOPERM: error = amd64_get_ioperm(td, &iargs); if (error == 0) error = copyout(&iargs, uap->parms, sizeof(struct i386_ioperm_args)); break; case I386_SET_IOPERM: error = amd64_set_ioperm(td, &iargs); break; case I386_GET_FSBASE: i386base = pcb->pcb_fsbase; error = copyout(&i386base, uap->parms, sizeof(i386base)); break; case I386_SET_FSBASE: error = copyin(uap->parms, &i386base, sizeof(i386base)); if (!error) { pcb->pcb_fsbase = i386base; td->td_frame->tf_fs = _ufssel; update_gdt_fsbase(td, i386base); } break; case I386_GET_GSBASE: i386base = pcb->pcb_gsbase; error = copyout(&i386base, uap->parms, sizeof(i386base)); break; case I386_SET_GSBASE: error = copyin(uap->parms, &i386base, sizeof(i386base)); if (!error) { pcb->pcb_gsbase = i386base; td->td_frame->tf_gs = _ugssel; update_gdt_gsbase(td, i386base); } break; case AMD64_GET_FSBASE: error = copyout(&pcb->pcb_fsbase, uap->parms, sizeof(pcb->pcb_fsbase)); break; case AMD64_SET_FSBASE: error = copyin(uap->parms, &a64base, sizeof(a64base)); if (!error) { if (a64base < VM_MAXUSER_ADDRESS) { pcb->pcb_fsbase = a64base; set_pcb_flags(pcb, PCB_FULL_IRET); td->td_frame->tf_fs = _ufssel; } else error = EINVAL; } break; case AMD64_GET_GSBASE: error = copyout(&pcb->pcb_gsbase, uap->parms, sizeof(pcb->pcb_gsbase)); break; case AMD64_SET_GSBASE: error = copyin(uap->parms, &a64base, sizeof(a64base)); if (!error) { if (a64base < VM_MAXUSER_ADDRESS) { pcb->pcb_gsbase = a64base; set_pcb_flags(pcb, PCB_FULL_IRET); td->td_frame->tf_gs = _ugssel; } else error = EINVAL; } break; case I386_GET_XFPUSTATE: case AMD64_GET_XFPUSTATE: if (a64xfpu.len > cpu_max_ext_state_size - sizeof(struct savefpu)) return (EINVAL); fpugetregs(td); error = copyout((char *)(get_pcb_user_save_td(td) + 1), a64xfpu.addr, a64xfpu.len); break; default: error = EINVAL; break; } return (error); } int amd64_set_ioperm(td, uap) struct thread *td; struct i386_ioperm_args *uap; { int i, error; char *iomap; struct amd64tss *tssp; struct system_segment_descriptor *tss_sd; u_long *addr; struct pcb *pcb; if ((error = priv_check(td, PRIV_IO)) != 0) return (error); if ((error = securelevel_gt(td->td_ucred, 0)) != 0) return (error); if (uap->start + uap->length > IOPAGES * PAGE_SIZE * NBBY) return (EINVAL); /* * XXX * While this is restricted to root, we should probably figure out * whether any other driver is using this i/o address, as so not to * cause confusion. This probably requires a global 'usage registry'. */ pcb = td->td_pcb; if (pcb->pcb_tssp == NULL) { tssp = (struct amd64tss *)kmem_malloc(kernel_arena, ctob(IOPAGES+1), M_WAITOK); if (tssp == NULL) return (ENOMEM); iomap = (char *)&tssp[1]; addr = (u_long *)iomap; for (i = 0; i < (ctob(IOPAGES) + 1) / sizeof(u_long); i++) *addr++ = ~0; critical_enter(); /* Takes care of tss_rsp0. */ memcpy(tssp, &common_tss[PCPU_GET(cpuid)], sizeof(struct amd64tss)); tssp->tss_iobase = sizeof(*tssp); pcb->pcb_tssp = tssp; tss_sd = PCPU_GET(tss); tss_sd->sd_lobase = (u_long)tssp & 0xffffff; tss_sd->sd_hibase = ((u_long)tssp >> 24) & 0xfffffffffful; tss_sd->sd_type = SDT_SYSTSS; ltr(GSEL(GPROC0_SEL, SEL_KPL)); PCPU_SET(tssp, tssp); critical_exit(); } else iomap = (char *)&pcb->pcb_tssp[1]; for (i = uap->start; i < uap->start + uap->length; i++) { if (uap->enable) iomap[i >> 3] &= ~(1 << (i & 7)); else iomap[i >> 3] |= (1 << (i & 7)); } return (error); } int amd64_get_ioperm(td, uap) struct thread *td; struct i386_ioperm_args *uap; { int i, state; char *iomap; if (uap->start >= IOPAGES * PAGE_SIZE * NBBY) return (EINVAL); if (td->td_pcb->pcb_tssp == NULL) { uap->length = 0; goto done; } iomap = (char *)&td->td_pcb->pcb_tssp[1]; i = uap->start; state = (iomap[i >> 3] >> (i & 7)) & 1; uap->enable = !state; uap->length = 1; for (i = uap->start + 1; i < IOPAGES * PAGE_SIZE * NBBY; i++) { if (state != ((iomap[i >> 3] >> (i & 7)) & 1)) break; uap->length++; } done: return (0); } /* * Update the GDT entry pointing to the LDT to point to the LDT of the * current process. */ void set_user_ldt(struct mdproc *mdp) { critical_enter(); *PCPU_GET(ldt) = mdp->md_ldt_sd; lldt(GSEL(GUSERLDT_SEL, SEL_KPL)); critical_exit(); } #ifdef notyet #ifdef SMP static void set_user_ldt_rv(struct vmspace *vmsp) { struct thread *td; td = curthread; if (vmsp != td->td_proc->p_vmspace) return; set_user_ldt(&td->td_proc->p_md); } #endif #endif struct proc_ldt * user_ldt_alloc(struct proc *p, int force) { struct proc_ldt *pldt, *new_ldt; struct mdproc *mdp; struct soft_segment_descriptor sldt; mtx_assert(&dt_lock, MA_OWNED); mdp = &p->p_md; if (!force && mdp->md_ldt != NULL) return (mdp->md_ldt); mtx_unlock(&dt_lock); new_ldt = malloc(sizeof(struct proc_ldt), M_SUBPROC, M_WAITOK); new_ldt->ldt_base = (caddr_t)kmem_malloc(kernel_arena, max_ldt_segment * sizeof(struct user_segment_descriptor), M_WAITOK | M_ZERO); if (new_ldt->ldt_base == NULL) { FREE(new_ldt, M_SUBPROC); mtx_lock(&dt_lock); return (NULL); } new_ldt->ldt_refcnt = 1; sldt.ssd_base = (uint64_t)new_ldt->ldt_base; sldt.ssd_limit = max_ldt_segment * sizeof(struct user_segment_descriptor) - 1; sldt.ssd_type = SDT_SYSLDT; sldt.ssd_dpl = SEL_KPL; sldt.ssd_p = 1; sldt.ssd_long = 0; sldt.ssd_def32 = 0; sldt.ssd_gran = 0; mtx_lock(&dt_lock); pldt = mdp->md_ldt; if (pldt != NULL && !force) { kmem_free(kernel_arena, (vm_offset_t)new_ldt->ldt_base, max_ldt_segment * sizeof(struct user_segment_descriptor)); free(new_ldt, M_SUBPROC); return (pldt); } if (pldt != NULL) { bcopy(pldt->ldt_base, new_ldt->ldt_base, max_ldt_segment * sizeof(struct user_segment_descriptor)); user_ldt_derefl(pldt); } ssdtosyssd(&sldt, &p->p_md.md_ldt_sd); atomic_store_rel_ptr((volatile uintptr_t *)&mdp->md_ldt, (uintptr_t)new_ldt); if (p == curproc) set_user_ldt(mdp); return (mdp->md_ldt); } void user_ldt_free(struct thread *td) { struct proc *p = td->td_proc; struct mdproc *mdp = &p->p_md; struct proc_ldt *pldt; mtx_assert(&dt_lock, MA_OWNED); if ((pldt = mdp->md_ldt) == NULL) { mtx_unlock(&dt_lock); return; } mdp->md_ldt = NULL; bzero(&mdp->md_ldt_sd, sizeof(mdp->md_ldt_sd)); if (td == curthread) lldt(GSEL(GNULL_SEL, SEL_KPL)); user_ldt_deref(pldt); } static void user_ldt_derefl(struct proc_ldt *pldt) { if (--pldt->ldt_refcnt == 0) { kmem_free(kernel_arena, (vm_offset_t)pldt->ldt_base, max_ldt_segment * sizeof(struct user_segment_descriptor)); free(pldt, M_SUBPROC); } } void user_ldt_deref(struct proc_ldt *pldt) { mtx_assert(&dt_lock, MA_OWNED); user_ldt_derefl(pldt); mtx_unlock(&dt_lock); } /* * Note for the authors of compat layers (linux, etc): copyout() in * the function below is not a problem since it presents data in * arch-specific format (i.e. i386-specific in this case), not in * the OS-specific one. */ int amd64_get_ldt(td, uap) struct thread *td; struct i386_ldt_args *uap; { int error = 0; struct proc_ldt *pldt; int num; struct user_segment_descriptor *lp; #ifdef DEBUG printf("amd64_get_ldt: start=%d num=%d descs=%p\n", uap->start, uap->num, (void *)uap->descs); #endif if ((pldt = td->td_proc->p_md.md_ldt) != NULL) { lp = &((struct user_segment_descriptor *)(pldt->ldt_base)) [uap->start]; num = min(uap->num, max_ldt_segment); } else return (EINVAL); if ((uap->start > (unsigned int)max_ldt_segment) || ((unsigned int)num > (unsigned int)max_ldt_segment) || ((unsigned int)(uap->start + num) > (unsigned int)max_ldt_segment)) return(EINVAL); error = copyout(lp, uap->descs, num * sizeof(struct user_segment_descriptor)); if (!error) td->td_retval[0] = num; return(error); } int amd64_set_ldt(td, uap, descs) struct thread *td; struct i386_ldt_args *uap; struct user_segment_descriptor *descs; { - int error = 0, i; - int largest_ld; + int error = 0; + unsigned int largest_ld, i; struct mdproc *mdp = &td->td_proc->p_md; struct proc_ldt *pldt; struct user_segment_descriptor *dp; struct proc *p; #ifdef DEBUG printf("amd64_set_ldt: start=%d num=%d descs=%p\n", uap->start, uap->num, (void *)uap->descs); #endif set_pcb_flags(td->td_pcb, PCB_FULL_IRET); p = td->td_proc; if (descs == NULL) { /* Free descriptors */ if (uap->start == 0 && uap->num == 0) uap->num = max_ldt_segment; if (uap->num == 0) return (EINVAL); if ((pldt = mdp->md_ldt) == NULL || uap->start >= max_ldt_segment) return (0); largest_ld = uap->start + uap->num; if (largest_ld > max_ldt_segment) largest_ld = max_ldt_segment; i = largest_ld - uap->start; mtx_lock(&dt_lock); bzero(&((struct user_segment_descriptor *)(pldt->ldt_base)) [uap->start], sizeof(struct user_segment_descriptor) * i); mtx_unlock(&dt_lock); return (0); } if (!(uap->start == LDT_AUTO_ALLOC && uap->num == 1)) { /* verify range of descriptors to modify */ largest_ld = uap->start + uap->num; if (uap->start >= max_ldt_segment || largest_ld > max_ldt_segment) return (EINVAL); } /* Check descriptors for access violations */ for (i = 0; i < uap->num; i++) { dp = &descs[i]; switch (dp->sd_type) { case SDT_SYSNULL: /* system null */ dp->sd_p = 0; break; case SDT_SYS286TSS: case SDT_SYSLDT: case SDT_SYS286BSY: case SDT_SYS286CGT: case SDT_SYSTASKGT: case SDT_SYS286IGT: case SDT_SYS286TGT: case SDT_SYSNULL2: case SDT_SYSTSS: case SDT_SYSNULL3: case SDT_SYSBSY: case SDT_SYSCGT: case SDT_SYSNULL4: case SDT_SYSIGT: case SDT_SYSTGT: /* I can't think of any reason to allow a user proc * to create a segment of these types. They are * for OS use only. */ return (EACCES); /*NOTREACHED*/ /* memory segment types */ case SDT_MEMEC: /* memory execute only conforming */ case SDT_MEMEAC: /* memory execute only accessed conforming */ case SDT_MEMERC: /* memory execute read conforming */ case SDT_MEMERAC: /* memory execute read accessed conforming */ /* Must be "present" if executable and conforming. */ if (dp->sd_p == 0) return (EACCES); break; case SDT_MEMRO: /* memory read only */ case SDT_MEMROA: /* memory read only accessed */ case SDT_MEMRW: /* memory read write */ case SDT_MEMRWA: /* memory read write accessed */ case SDT_MEMROD: /* memory read only expand dwn limit */ case SDT_MEMRODA: /* memory read only expand dwn lim accessed */ case SDT_MEMRWD: /* memory read write expand dwn limit */ case SDT_MEMRWDA: /* memory read write expand dwn lim acessed */ case SDT_MEME: /* memory execute only */ case SDT_MEMEA: /* memory execute only accessed */ case SDT_MEMER: /* memory execute read */ case SDT_MEMERA: /* memory execute read accessed */ break; default: return(EINVAL); /*NOTREACHED*/ } /* Only user (ring-3) descriptors may be present. */ if ((dp->sd_p != 0) && (dp->sd_dpl != SEL_UPL)) return (EACCES); } if (uap->start == LDT_AUTO_ALLOC && uap->num == 1) { /* Allocate a free slot */ mtx_lock(&dt_lock); pldt = user_ldt_alloc(p, 0); if (pldt == NULL) { mtx_unlock(&dt_lock); return (ENOMEM); } /* * start scanning a bit up to leave room for NVidia and * Wine, which still user the "Blat" method of allocation. */ i = 16; dp = &((struct user_segment_descriptor *)(pldt->ldt_base))[i]; for (; i < max_ldt_segment; ++i, ++dp) { if (dp->sd_type == SDT_SYSNULL) break; } if (i >= max_ldt_segment) { mtx_unlock(&dt_lock); return (ENOSPC); } uap->start = i; error = amd64_set_ldt_data(td, i, 1, descs); mtx_unlock(&dt_lock); } else { largest_ld = uap->start + uap->num; if (largest_ld > max_ldt_segment) return (EINVAL); mtx_lock(&dt_lock); if (user_ldt_alloc(p, 0) != NULL) { error = amd64_set_ldt_data(td, uap->start, uap->num, descs); } mtx_unlock(&dt_lock); } if (error == 0) td->td_retval[0] = uap->start; return (error); } int amd64_set_ldt_data(struct thread *td, int start, int num, struct user_segment_descriptor *descs) { struct mdproc *mdp = &td->td_proc->p_md; struct proc_ldt *pldt = mdp->md_ldt; mtx_assert(&dt_lock, MA_OWNED); /* Fill in range */ bcopy(descs, &((struct user_segment_descriptor *)(pldt->ldt_base))[start], num * sizeof(struct user_segment_descriptor)); return (0); } Index: releng/10.2/sys/conf/newvers.sh =================================================================== --- releng/10.2/sys/conf/newvers.sh (revision 296954) +++ releng/10.2/sys/conf/newvers.sh (revision 296955) @@ -1,224 +1,224 @@ #!/bin/sh - # # Copyright (c) 1984, 1986, 1990, 1993 # The Regents of the University of California. All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # 4. Neither the name of the University nor the names of its contributors # may be used to endorse or promote products derived from this software # without specific prior written permission. # # THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # @(#)newvers.sh 8.1 (Berkeley) 4/20/94 # $FreeBSD$ TYPE="FreeBSD" REVISION="10.2" -BRANCH="RELEASE-p13" +BRANCH="RELEASE-p14" if [ "X${BRANCH_OVERRIDE}" != "X" ]; then BRANCH=${BRANCH_OVERRIDE} fi RELEASE="${REVISION}-${BRANCH}" VERSION="${TYPE} ${RELEASE}" if [ "X${SYSDIR}" = "X" ]; then SYSDIR=$(dirname $0)/.. fi if [ "X${PARAMFILE}" != "X" ]; then RELDATE=$(awk '/__FreeBSD_version.*propagated to newvers/ {print $3}' \ ${PARAMFILE}) else RELDATE=$(awk '/__FreeBSD_version.*propagated to newvers/ {print $3}' \ ${SYSDIR}/sys/param.h) fi b=share/examples/etc/bsd-style-copyright year=$(sed -Ee '/^Copyright .* The FreeBSD Project/!d;s/^.*1992-([0-9]*) .*$/\1/g' ${SYSDIR}/../COPYRIGHT) # look for copyright template for bsd_copyright in ../$b ../../$b ../../../$b /usr/src/$b /usr/$b do if [ -r "$bsd_copyright" ]; then COPYRIGHT=`sed \ -e "s/\[year\]/1992-$year/" \ -e 's/\[your name here\]\.* /The FreeBSD Project./' \ -e 's/\[your name\]\.*/The FreeBSD Project./' \ -e '/\[id for your version control system, if any\]/d' \ $bsd_copyright` break fi done # no copyright found, use a dummy if [ X"$COPYRIGHT" = X ]; then COPYRIGHT="/*- * Copyright (c) 1992-$year The FreeBSD Project. * All rights reserved. * */" fi # add newline COPYRIGHT="$COPYRIGHT " LC_ALL=C; export LC_ALL if [ ! -r version ] then echo 0 > version fi touch version v=`cat version` u=${USER:-root} d=`pwd` h=${HOSTNAME:-`hostname`} t=`date` i=`${MAKE:-make} -V KERN_IDENT` compiler_v=$($(${MAKE:-make} -V CC) -v 2>&1 | grep 'version') for dir in /usr/bin /usr/local/bin; do if [ ! -z "${svnversion}" ] ; then break fi if [ -x "${dir}/svnversion" ] && [ -z ${svnversion} ] ; then # Run svnversion from ${dir} on this script; if return code # is not zero, the checkout might not be compatible with the # svnversion being used. ${dir}/svnversion $(realpath ${0}) >/dev/null 2>&1 if [ $? -eq 0 ]; then svnversion=${dir}/svnversion break fi fi done if [ -z "${svnversion}" ] && [ -x /usr/bin/svnliteversion ] ; then /usr/bin/svnliteversion $(realpath ${0}) >/dev/null 2>&1 if [ $? -eq 0 ]; then svnversion=/usr/bin/svnliteversion else svnversion= fi fi for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/p4" ] && [ -z ${p4_cmd} ] ; then p4_cmd=${dir}/p4 fi done if [ -d "${SYSDIR}/../.git" ] ; then for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/git" ] ; then git_cmd="${dir}/git --git-dir=${SYSDIR}/../.git" break fi done fi if [ -d "${SYSDIR}/../.hg" ] ; then for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/hg" ] ; then hg_cmd="${dir}/hg -R ${SYSDIR}/.." break fi done fi if [ -n "$svnversion" ] ; then svn=`cd ${SYSDIR} && $svnversion 2>/dev/null` case "$svn" in [0-9]*) svn=" r${svn}" ;; *) unset svn ;; esac fi if [ -n "$git_cmd" ] ; then git=`$git_cmd rev-parse --verify --short HEAD 2>/dev/null` svn=`$git_cmd svn find-rev $git 2>/dev/null` if [ -n "$svn" ] ; then svn=" r${svn}" git="=${git}" else svn=`$git_cmd log | fgrep 'git-svn-id:' | head -1 | \ sed -n 's/^.*@\([0-9][0-9]*\).*$/\1/p'` if [ -z "$svn" ] ; then svn=`$git_cmd log --format='format:%N' | \ grep '^svn ' | head -1 | \ sed -n 's/^.*revision=\([0-9][0-9]*\).*$/\1/p'` fi if [ -n "$svn" ] ; then svn=" r${svn}" git="+${git}" else git=" ${git}" fi fi git_b=`$git_cmd rev-parse --abbrev-ref HEAD` if [ -n "$git_b" ] ; then git="${git}(${git_b})" fi if $git_cmd --work-tree=${SYSDIR}/.. diff-index \ --name-only HEAD | read dummy; then git="${git}-dirty" fi fi if [ -n "$p4_cmd" ] ; then p4version=`cd ${SYSDIR} && $p4_cmd changes -m1 "./...#have" 2>&1 | \ awk '{ print $2 }'` case "$p4version" in [0-9]*) p4version=" ${p4version}" p4opened=`cd ${SYSDIR} && $p4_cmd opened ./... 2>&1` case "$p4opened" in File*) ;; //*) p4version="${p4version}+edit" ;; esac ;; *) unset p4version ;; esac fi if [ -n "$hg_cmd" ] ; then hg=`$hg_cmd id 2>/dev/null` svn=`$hg_cmd svn info 2>/dev/null | \ awk -F': ' '/Revision/ { print $2 }'` if [ -n "$svn" ] ; then svn=" r${svn}" fi if [ -n "$hg" ] ; then hg=" ${hg}" fi fi cat << EOF > vers.c $COPYRIGHT #define SCCSSTR "@(#)${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}" #define VERSTR "${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}\\n ${u}@${h}:${d}\\n" #define RELSTR "${RELEASE}" char sccs[sizeof(SCCSSTR) > 128 ? sizeof(SCCSSTR) : 128] = SCCSSTR; char version[sizeof(VERSTR) > 256 ? sizeof(VERSTR) : 256] = VERSTR; char compiler_version[] = "${compiler_v}"; char ostype[] = "${TYPE}"; char osrelease[sizeof(RELSTR) > 32 ? sizeof(RELSTR) : 32] = RELSTR; int osreldate = ${RELDATE}; char kern_ident[] = "${i}"; EOF echo $((v + 1)) > version Index: releng/10.2/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c =================================================================== --- releng/10.2/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 296954) +++ releng/10.2/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 296955) @@ -1,1310 +1,1320 @@ /*- * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (c) 2004-2006 Kip Macy * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet6.h" #include "opt_inet.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "hv_net_vsc.h" #include "hv_rndis.h" #include "hv_rndis_filter.h" /* Short for Hyper-V network interface */ #define NETVSC_DEVNAME "hn" /* * It looks like offset 0 of buf is reserved to hold the softc pointer. * The sc pointer evidently not needed, and is not presently populated. * The packet offset is where the netvsc_packet starts in the buffer. */ #define HV_NV_SC_PTR_OFFSET_IN_BUF 0 #define HV_NV_PACKET_OFFSET_IN_BUF 16 +/* + * A unified flag for all outbound check sum flags is useful, + * and it helps avoiding unnecessary check sum calculation in + * network forwarding scenario. + */ +#define HV_CSUM_FOR_OUTBOUND \ + (CSUM_IP|CSUM_IP_UDP|CSUM_IP_TCP|CSUM_IP_SCTP|CSUM_IP_TSO| \ + CSUM_IP_ISCSI|CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP| \ + CSUM_IP6_TSO|CSUM_IP6_ISCSI) /* * Data types */ struct hv_netvsc_driver_context { uint32_t drv_inited; }; /* * Be aware that this sleepable mutex will exhibit WITNESS errors when * certain TCP and ARP code paths are taken. This appears to be a * well-known condition, as all other drivers checked use a sleeping * mutex to protect their transmit paths. * Also Be aware that mutexes do not play well with semaphores, and there * is a conflicting semaphore in a certain channel code path. */ #define NV_LOCK_INIT(_sc, _name) \ mtx_init(&(_sc)->hn_lock, _name, MTX_NETWORK_LOCK, MTX_DEF) #define NV_LOCK(_sc) mtx_lock(&(_sc)->hn_lock) #define NV_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->hn_lock, MA_OWNED) #define NV_UNLOCK(_sc) mtx_unlock(&(_sc)->hn_lock) #define NV_LOCK_DESTROY(_sc) mtx_destroy(&(_sc)->hn_lock) /* * Globals */ int hv_promisc_mode = 0; /* normal mode by default */ /* The one and only one */ static struct hv_netvsc_driver_context g_netvsc_drv; /* * Forward declarations */ static void hn_stop(hn_softc_t *sc); static void hn_ifinit_locked(hn_softc_t *sc); static void hn_ifinit(void *xsc); static int hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data); static int hn_start_locked(struct ifnet *ifp); static void hn_start(struct ifnet *ifp); /* * NetVsc get message transport protocol type */ static uint32_t get_transport_proto_type(struct mbuf *m_head) { uint32_t ret_val = TRANSPORT_TYPE_NOT_IP; uint16_t ether_type = 0; int ether_len = 0; struct ether_vlan_header *eh; #ifdef INET struct ip *iph; #endif #ifdef INET6 struct ip6_hdr *ip6; #endif eh = mtod(m_head, struct ether_vlan_header*); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; ether_type = eh->evl_proto; } else { ether_len = ETHER_HDR_LEN; ether_type = eh->evl_encap_proto; } switch (ntohs(ether_type)) { #ifdef INET6 case ETHERTYPE_IPV6: ip6 = (struct ip6_hdr *)(m_head->m_data + ether_len); if (IPPROTO_TCP == ip6->ip6_nxt) { ret_val = TRANSPORT_TYPE_IPV6_TCP; } else if (IPPROTO_UDP == ip6->ip6_nxt) { ret_val = TRANSPORT_TYPE_IPV6_UDP; } break; #endif #ifdef INET case ETHERTYPE_IP: iph = (struct ip *)(m_head->m_data + ether_len); if (IPPROTO_TCP == iph->ip_p) { ret_val = TRANSPORT_TYPE_IPV4_TCP; } else if (IPPROTO_UDP == iph->ip_p) { ret_val = TRANSPORT_TYPE_IPV4_UDP; } break; #endif default: ret_val = TRANSPORT_TYPE_NOT_IP; break; } return (ret_val); } /* * NetVsc driver initialization * Note: Filter init is no longer required */ static int netvsc_drv_init(void) { return (0); } /* * NetVsc global initialization entry point */ static void netvsc_init(void) { if (bootverbose) printf("Netvsc initializing... "); /* * XXXKYS: cleanup initialization */ if (!cold && !g_netvsc_drv.drv_inited) { g_netvsc_drv.drv_inited = 1; netvsc_drv_init(); if (bootverbose) printf("done!\n"); } else if (bootverbose) printf("Already initialized!\n"); } /* {F8615163-DF3E-46c5-913F-F2D2F965ED0E} */ static const hv_guid g_net_vsc_device_type = { .data = {0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46, 0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E} }; /* * Standard probe entry point. * */ static int netvsc_probe(device_t dev) { const char *p; p = vmbus_get_type(dev); if (!memcmp(p, &g_net_vsc_device_type.data, sizeof(hv_guid))) { device_set_desc(dev, "Synthetic Network Interface"); if (bootverbose) printf("Netvsc probe... DONE \n"); return (BUS_PROBE_DEFAULT); } return (ENXIO); } /* * Standard attach entry point. * * Called when the driver is loaded. It allocates needed resources, * and initializes the "hardware" and software. */ static int netvsc_attach(device_t dev) { struct hv_device *device_ctx = vmbus_get_devctx(dev); netvsc_device_info device_info; hn_softc_t *sc; int unit = device_get_unit(dev); struct ifnet *ifp; int ret; netvsc_init(); sc = device_get_softc(dev); if (sc == NULL) { return (ENOMEM); } bzero(sc, sizeof(hn_softc_t)); sc->hn_unit = unit; sc->hn_dev = dev; NV_LOCK_INIT(sc, "NetVSCLock"); sc->hn_dev_obj = device_ctx; ifp = sc->hn_ifp = sc->arpcom.ac_ifp = if_alloc(IFT_ETHER); ifp->if_softc = sc; if_initname(ifp, device_get_name(dev), device_get_unit(dev)); ifp->if_dunit = unit; ifp->if_dname = NETVSC_DEVNAME; ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST; ifp->if_ioctl = hn_ioctl; ifp->if_start = hn_start; ifp->if_init = hn_ifinit; /* needed by hv_rf_on_device_add() code */ ifp->if_mtu = ETHERMTU; IFQ_SET_MAXLEN(&ifp->if_snd, 512); ifp->if_snd.ifq_drv_maxlen = 511; IFQ_SET_READY(&ifp->if_snd); /* * Tell upper layers that we support full VLAN capability. */ ifp->if_data.ifi_hdrlen = sizeof(struct ether_vlan_header); ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO; ifp->if_capenable |= IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO; /* * Only enable UDP checksum offloading when it is on 2012R2 or * later. UDP checksum offloading doesn't work on earlier * Windows releases. */ if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1) ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_TSO; else ifp->if_hwassist = CSUM_TCP | CSUM_TSO; ret = hv_rf_on_device_add(device_ctx, &device_info); if (ret != 0) { if_free(ifp); return (ret); } if (device_info.link_state == 0) { sc->hn_carrier = 1; } ether_ifattach(ifp, device_info.mac_addr); return (0); } /* * Standard detach entry point */ static int netvsc_detach(device_t dev) { struct hv_device *hv_device = vmbus_get_devctx(dev); if (bootverbose) printf("netvsc_detach\n"); /* * XXXKYS: Need to clean up all our * driver state; this is the driver * unloading. */ /* * XXXKYS: Need to stop outgoing traffic and unregister * the netdevice. */ hv_rf_on_device_remove(hv_device, HV_RF_NV_DESTROY_CHANNEL); return (0); } /* * Standard shutdown entry point */ static int netvsc_shutdown(device_t dev) { return (0); } /* * Send completion processing * * Note: It looks like offset 0 of buf is reserved to hold the softc * pointer. The sc pointer is not currently needed in this function, and * it is not presently populated by the TX function. */ void netvsc_xmit_completion(void *context) { netvsc_packet *packet = (netvsc_packet *)context; struct mbuf *mb; uint8_t *buf; mb = (struct mbuf *)(uintptr_t)packet->compl.send.send_completion_tid; buf = ((uint8_t *)packet) - HV_NV_PACKET_OFFSET_IN_BUF; free(buf, M_NETVSC); if (mb != NULL) { m_freem(mb); } } /* * Start a transmit of one or more packets */ static int hn_start_locked(struct ifnet *ifp) { hn_softc_t *sc = ifp->if_softc; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); netvsc_dev *net_dev = sc->net_dev; device_t dev = device_ctx->device; uint8_t *buf; netvsc_packet *packet; struct mbuf *m_head, *m; struct mbuf *mc_head = NULL; struct ether_vlan_header *eh; rndis_msg *rndis_mesg; rndis_packet *rndis_pkt; rndis_per_packet_info *rppi; ndis_8021q_info *rppi_vlan_info; rndis_tcp_ip_csum_info *csum_info; rndis_tcp_tso_info *tso_info; int ether_len; int i; int num_frags; int len; int retries = 0; int ret = 0; uint32_t rndis_msg_size = 0; uint32_t trans_proto_type; uint32_t send_buf_section_idx = NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; while (!IFQ_DRV_IS_EMPTY(&sc->hn_ifp->if_snd)) { IFQ_DRV_DEQUEUE(&sc->hn_ifp->if_snd, m_head); if (m_head == NULL) { break; } len = 0; num_frags = 0; /* Walk the mbuf list computing total length and num frags */ for (m = m_head; m != NULL; m = m->m_next) { if (m->m_len != 0) { num_frags++; len += m->m_len; } } /* * Reserve the number of pages requested. Currently, * one page is reserved for the message in the RNDIS * filter packet */ num_frags += HV_RF_NUM_TX_RESERVED_PAGE_BUFS; /* If exceeds # page_buffers in netvsc_packet */ if (num_frags > NETVSC_PACKET_MAXPAGE) { device_printf(dev, "exceed max page buffers,%d,%d\n", num_frags, NETVSC_PACKET_MAXPAGE); m_freem(m_head); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (EINVAL); } /* * Allocate a buffer with space for a netvsc packet plus a * number of reserved areas. First comes a (currently 16 * bytes, currently unused) reserved data area. Second is * the netvsc_packet. Third is an area reserved for an * rndis_filter_packet struct. Fourth (optional) is a * rndis_per_packet_info struct. * Changed malloc to M_NOWAIT to avoid sleep under spin lock. * No longer reserving extra space for page buffers, as they * are already part of the netvsc_packet. */ buf = malloc(HV_NV_PACKET_OFFSET_IN_BUF + sizeof(netvsc_packet) + sizeof(rndis_msg) + RNDIS_VLAN_PPI_SIZE + RNDIS_TSO_PPI_SIZE + RNDIS_CSUM_PPI_SIZE, M_NETVSC, M_ZERO | M_NOWAIT); if (buf == NULL) { device_printf(dev, "hn:malloc packet failed\n"); m_freem(m_head); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (ENOMEM); } packet = (netvsc_packet *)(buf + HV_NV_PACKET_OFFSET_IN_BUF); *(vm_offset_t *)buf = HV_NV_SC_PTR_OFFSET_IN_BUF; packet->is_data_pkt = TRUE; /* Set up the rndis header */ packet->page_buf_count = num_frags; /* Initialize it from the mbuf */ packet->tot_data_buf_len = len; /* * extension points to the area reserved for the * rndis_filter_packet, which is placed just after * the netvsc_packet (and rppi struct, if present; * length is updated later). */ packet->rndis_mesg = packet + 1; rndis_mesg = (rndis_msg *)packet->rndis_mesg; rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG; rndis_pkt = &rndis_mesg->msg.packet; rndis_pkt->data_offset = sizeof(rndis_packet); rndis_pkt->data_length = packet->tot_data_buf_len; rndis_pkt->per_pkt_info_offset = sizeof(rndis_packet); rndis_msg_size = RNDIS_MESSAGE_SIZE(rndis_packet); /* * If the Hyper-V infrastructure needs to embed a VLAN tag, * initialize netvsc_packet and rppi struct values as needed. */ if (m_head->m_flags & M_VLANTAG) { /* * set up some additional fields so the Hyper-V infrastructure will stuff the VLAN tag * into the frame. */ packet->vlan_tci = m_head->m_pkthdr.ether_vtag; rndis_msg_size += RNDIS_VLAN_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_VLAN_PPI_SIZE, ieee_8021q_info); /* VLAN info immediately follows rppi struct */ rppi_vlan_info = (ndis_8021q_info *)((char*)rppi + rppi->per_packet_info_offset); /* FreeBSD does not support CFI or priority */ rppi_vlan_info->u1.s1.vlan_id = packet->vlan_tci & 0xfff; } - if (0 == m_head->m_pkthdr.csum_flags) { + /* Only check the flags for outbound and ignore the ones for inbound */ + if (0 == (m_head->m_pkthdr.csum_flags & HV_CSUM_FOR_OUTBOUND)) { goto pre_send; } eh = mtod(m_head, struct ether_vlan_header*); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; } else { ether_len = ETHER_HDR_LEN; } trans_proto_type = get_transport_proto_type(m_head); if (TRANSPORT_TYPE_NOT_IP == trans_proto_type) { goto pre_send; } /* * TSO packet needless to setup the send side checksum * offload. */ if (m_head->m_pkthdr.csum_flags & CSUM_TSO) { goto do_tso; } /* setup checksum offload */ rndis_msg_size += RNDIS_CSUM_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_CSUM_PPI_SIZE, tcpip_chksum_info); csum_info = (rndis_tcp_ip_csum_info *)((char*)rppi + rppi->per_packet_info_offset); if (trans_proto_type & (TYPE_IPV4 << 16)) { csum_info->xmit.is_ipv4 = 1; } else { csum_info->xmit.is_ipv6 = 1; } if (trans_proto_type & TYPE_TCP) { csum_info->xmit.tcp_csum = 1; csum_info->xmit.tcp_header_offset = 0; } else if (trans_proto_type & TYPE_UDP) { csum_info->xmit.udp_csum = 1; } goto pre_send; do_tso: /* setup TCP segmentation offload */ rndis_msg_size += RNDIS_TSO_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_TSO_PPI_SIZE, tcp_large_send_info); tso_info = (rndis_tcp_tso_info *)((char *)rppi + rppi->per_packet_info_offset); tso_info->lso_v2_xmit.type = RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE; #ifdef INET if (trans_proto_type & (TYPE_IPV4 << 16)) { struct ip *ip = (struct ip *)(m_head->m_data + ether_len); unsigned long iph_len = ip->ip_hl << 2; struct tcphdr *th = (struct tcphdr *)((caddr_t)ip + iph_len); tso_info->lso_v2_xmit.ip_version = RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4; ip->ip_len = 0; ip->ip_sum = 0; th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(IPPROTO_TCP)); } #endif #if defined(INET6) && defined(INET) else #endif #ifdef INET6 { struct ip6_hdr *ip6 = (struct ip6_hdr *)(m_head->m_data + ether_len); struct tcphdr *th = (struct tcphdr *)(ip6 + 1); tso_info->lso_v2_xmit.ip_version = RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6; ip6->ip6_plen = 0; th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0); } #endif tso_info->lso_v2_xmit.tcp_header_offset = 0; tso_info->lso_v2_xmit.mss = m_head->m_pkthdr.tso_segsz; pre_send: rndis_mesg->msg_len = packet->tot_data_buf_len + rndis_msg_size; packet->tot_data_buf_len = rndis_mesg->msg_len; /* send packet with send buffer */ if (packet->tot_data_buf_len < net_dev->send_section_size) { send_buf_section_idx = hv_nv_get_next_send_section(net_dev); if (send_buf_section_idx != NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) { char *dest = ((char *)net_dev->send_buf + send_buf_section_idx * net_dev->send_section_size); memcpy(dest, rndis_mesg, rndis_msg_size); dest += rndis_msg_size; for (m = m_head; m != NULL; m = m->m_next) { if (m->m_len) { memcpy(dest, (void *)mtod(m, vm_offset_t), m->m_len); dest += m->m_len; } } packet->send_buf_section_idx = send_buf_section_idx; packet->send_buf_section_size = packet->tot_data_buf_len; packet->page_buf_count = 0; goto do_send; } } /* send packet with page buffer */ packet->page_buffers[0].pfn = atop(hv_get_phys_addr(rndis_mesg)); packet->page_buffers[0].offset = (unsigned long)rndis_mesg & PAGE_MASK; packet->page_buffers[0].length = rndis_msg_size; /* * Fill the page buffers with mbuf info starting at index * HV_RF_NUM_TX_RESERVED_PAGE_BUFS. */ i = HV_RF_NUM_TX_RESERVED_PAGE_BUFS; for (m = m_head; m != NULL; m = m->m_next) { if (m->m_len) { vm_offset_t paddr = vtophys(mtod(m, vm_offset_t)); packet->page_buffers[i].pfn = paddr >> PAGE_SHIFT; packet->page_buffers[i].offset = paddr & (PAGE_SIZE - 1); packet->page_buffers[i].length = m->m_len; i++; } } packet->send_buf_section_idx = NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; packet->send_buf_section_size = 0; do_send: /* * If bpf, copy the mbuf chain. This is less expensive than * it appears; the mbuf clusters are not copied, only their * reference counts are incremented. * Needed to avoid a race condition where the completion * callback is invoked, freeing the mbuf chain, before the * bpf_mtap code has a chance to run. */ if (ifp->if_bpf) { mc_head = m_copypacket(m_head, M_DONTWAIT); } retry_send: /* Set the completion routine */ packet->compl.send.on_send_completion = netvsc_xmit_completion; packet->compl.send.send_completion_context = packet; packet->compl.send.send_completion_tid = (uint64_t)(uintptr_t)m_head; /* Removed critical_enter(), does not appear necessary */ ret = hv_nv_on_send(device_ctx, packet); if (ret == 0) { ifp->if_opackets++; /* if bpf && mc_head, call bpf_mtap code */ if (mc_head) { ETHER_BPF_MTAP(ifp, mc_head); } } else { retries++; if (retries < 4) { goto retry_send; } IF_PREPEND(&ifp->if_snd, m_head); ifp->if_drv_flags |= IFF_DRV_OACTIVE; /* * Null the mbuf pointer so the completion function * does not free the mbuf chain. We just pushed the * mbuf chain back on the if_snd queue. */ packet->compl.send.send_completion_tid = 0; /* * Release the resources since we will not get any * send completion */ netvsc_xmit_completion(packet); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); } /* if bpf && mc_head, free the mbuf chain copy */ if (mc_head) { m_freem(mc_head); } } return (ret); } /* * Link up/down notification */ void netvsc_linkstatus_callback(struct hv_device *device_obj, uint32_t status) { hn_softc_t *sc = device_get_softc(device_obj->device); if (sc == NULL) { return; } if (status == 1) { sc->hn_carrier = 1; } else { sc->hn_carrier = 0; } } /* * Append the specified data to the indicated mbuf chain, * Extend the mbuf chain if the new data does not fit in * existing space. * * This is a minor rewrite of m_append() from sys/kern/uipc_mbuf.c. * There should be an equivalent in the kernel mbuf code, * but there does not appear to be one yet. * * Differs from m_append() in that additional mbufs are * allocated with cluster size MJUMPAGESIZE, and filled * accordingly. * * Return 1 if able to complete the job; otherwise 0. */ static int hv_m_append(struct mbuf *m0, int len, c_caddr_t cp) { struct mbuf *m, *n; int remainder, space; for (m = m0; m->m_next != NULL; m = m->m_next) ; remainder = len; space = M_TRAILINGSPACE(m); if (space > 0) { /* * Copy into available space. */ if (space > remainder) space = remainder; bcopy(cp, mtod(m, caddr_t) + m->m_len, space); m->m_len += space; cp += space; remainder -= space; } while (remainder > 0) { /* * Allocate a new mbuf; could check space * and allocate a cluster instead. */ n = m_getjcl(M_DONTWAIT, m->m_type, 0, MJUMPAGESIZE); if (n == NULL) break; n->m_len = min(MJUMPAGESIZE, remainder); bcopy(cp, mtod(n, caddr_t), n->m_len); cp += n->m_len; remainder -= n->m_len; m->m_next = n; m = n; } if (m0->m_flags & M_PKTHDR) m0->m_pkthdr.len += len - remainder; return (remainder == 0); } /* * Called when we receive a data packet from the "wire" on the * specified device * * Note: This is no longer used as a callback */ int netvsc_recv(struct hv_device *device_ctx, netvsc_packet *packet, rndis_tcp_ip_csum_info *csum_info) { hn_softc_t *sc = (hn_softc_t *)device_get_softc(device_ctx->device); struct mbuf *m_new; struct ifnet *ifp; device_t dev = device_ctx->device; int size; if (sc == NULL) { return (0); /* TODO: KYS how can this be! */ } ifp = sc->hn_ifp; ifp = sc->arpcom.ac_ifp; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { return (0); } /* * Bail out if packet contains more data than configured MTU. */ if (packet->tot_data_buf_len > (ifp->if_mtu + ETHER_HDR_LEN)) { return (0); } /* * Get an mbuf with a cluster. For packets 2K or less, * get a standard 2K cluster. For anything larger, get a * 4K cluster. Any buffers larger than 4K can cause problems * if looped around to the Hyper-V TX channel, so avoid them. */ size = MCLBYTES; if (packet->tot_data_buf_len > MCLBYTES) { /* 4096 */ size = MJUMPAGESIZE; } m_new = m_getjcl(M_DONTWAIT, MT_DATA, M_PKTHDR, size); if (m_new == NULL) { device_printf(dev, "alloc mbuf failed.\n"); return (0); } hv_m_append(m_new, packet->tot_data_buf_len, packet->data); m_new->m_pkthdr.rcvif = ifp; /* receive side checksum offload */ m_new->m_pkthdr.csum_flags = 0; if (NULL != csum_info) { /* IP csum offload */ if (csum_info->receive.ip_csum_succeeded) { m_new->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID); } /* TCP csum offload */ if (csum_info->receive.tcp_csum_succeeded) { m_new->m_pkthdr.csum_flags |= (CSUM_DATA_VALID | CSUM_PSEUDO_HDR); m_new->m_pkthdr.csum_data = 0xffff; } } if ((packet->vlan_tci != 0) && (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0) { m_new->m_pkthdr.ether_vtag = packet->vlan_tci; m_new->m_flags |= M_VLANTAG; } /* * Note: Moved RX completion back to hv_nv_on_receive() so all * messages (not just data messages) will trigger a response. */ ifp->if_ipackets++; /* We're not holding the lock here, so don't release it */ (*ifp->if_input)(ifp, m_new); return (0); } /* * Rules for using sc->temp_unusable: * 1. sc->temp_unusable can only be read or written while holding NV_LOCK() * 2. code reading sc->temp_unusable under NV_LOCK(), and finding * sc->temp_unusable set, must release NV_LOCK() and exit * 3. to retain exclusive control of the interface, * sc->temp_unusable must be set by code before releasing NV_LOCK() * 4. only code setting sc->temp_unusable can clear sc->temp_unusable * 5. code setting sc->temp_unusable must eventually clear sc->temp_unusable */ /* * Standard ioctl entry point. Called when the user wants to configure * the interface. */ static int hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { hn_softc_t *sc = ifp->if_softc; struct ifreq *ifr = (struct ifreq *)data; #ifdef INET struct ifaddr *ifa = (struct ifaddr *)data; #endif netvsc_device_info device_info; struct hv_device *hn_dev; int mask, error = 0; int retry_cnt = 500; switch(cmd) { case SIOCSIFADDR: #ifdef INET if (ifa->ifa_addr->sa_family == AF_INET) { ifp->if_flags |= IFF_UP; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) hn_ifinit(sc); arp_ifinit(ifp, ifa); } else #endif error = ether_ioctl(ifp, cmd, data); break; case SIOCSIFMTU: hn_dev = vmbus_get_devctx(sc->hn_dev); /* Check MTU value change */ if (ifp->if_mtu == ifr->ifr_mtu) break; if (ifr->ifr_mtu > NETVSC_MAX_CONFIGURABLE_MTU) { error = EINVAL; break; } /* Obtain and record requested MTU */ ifp->if_mtu = ifr->ifr_mtu; do { NV_LOCK(sc); if (!sc->temp_unusable) { sc->temp_unusable = TRUE; retry_cnt = -1; } NV_UNLOCK(sc); if (retry_cnt > 0) { retry_cnt--; DELAY(5 * 1000); } } while (retry_cnt > 0); if (retry_cnt == 0) { error = EINVAL; break; } /* We must remove and add back the device to cause the new * MTU to take effect. This includes tearing down, but not * deleting the channel, then bringing it back up. */ error = hv_rf_on_device_remove(hn_dev, HV_RF_NV_RETAIN_CHANNEL); if (error) { NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; } error = hv_rf_on_device_add(hn_dev, &device_info); if (error) { NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; } hn_ifinit_locked(sc); NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; case SIOCSIFFLAGS: do { NV_LOCK(sc); if (!sc->temp_unusable) { sc->temp_unusable = TRUE; retry_cnt = -1; } NV_UNLOCK(sc); if (retry_cnt > 0) { retry_cnt--; DELAY(5 * 1000); } } while (retry_cnt > 0); if (retry_cnt == 0) { error = EINVAL; break; } if (ifp->if_flags & IFF_UP) { /* * If only the state of the PROMISC flag changed, * then just use the 'set promisc mode' command * instead of reinitializing the entire NIC. Doing * a full re-init means reloading the firmware and * waiting for it to start up, which may take a * second or two. */ #ifdef notyet /* Fixme: Promiscuous mode? */ if (ifp->if_drv_flags & IFF_DRV_RUNNING && ifp->if_flags & IFF_PROMISC && !(sc->hn_if_flags & IFF_PROMISC)) { /* do something here for Hyper-V */ } else if (ifp->if_drv_flags & IFF_DRV_RUNNING && !(ifp->if_flags & IFF_PROMISC) && sc->hn_if_flags & IFF_PROMISC) { /* do something here for Hyper-V */ } else #endif hn_ifinit_locked(sc); } else { if (ifp->if_drv_flags & IFF_DRV_RUNNING) { hn_stop(sc); } } NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); sc->hn_if_flags = ifp->if_flags; error = 0; break; case SIOCSIFCAP: mask = ifr->ifr_reqcap ^ ifp->if_capenable; if (mask & IFCAP_TXCSUM) { if (IFCAP_TXCSUM & ifp->if_capenable) { ifp->if_capenable &= ~IFCAP_TXCSUM; ifp->if_hwassist &= ~(CSUM_TCP | CSUM_UDP); } else { ifp->if_capenable |= IFCAP_TXCSUM; /* * Only enable UDP checksum offloading on * Windows Server 2012R2 or later releases. */ if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1) { ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP); } else { ifp->if_hwassist |= CSUM_TCP; } } } if (mask & IFCAP_RXCSUM) { if (IFCAP_RXCSUM & ifp->if_capenable) { ifp->if_capenable &= ~IFCAP_RXCSUM; } else { ifp->if_capenable |= IFCAP_RXCSUM; } } if (mask & IFCAP_TSO4) { ifp->if_capenable ^= IFCAP_TSO4; ifp->if_hwassist ^= CSUM_IP_TSO; } if (mask & IFCAP_TSO6) { ifp->if_capenable ^= IFCAP_TSO6; ifp->if_hwassist ^= CSUM_IP6_TSO; } error = 0; break; case SIOCADDMULTI: case SIOCDELMULTI: #ifdef notyet /* Fixme: Multicast mode? */ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { NV_LOCK(sc); netvsc_setmulti(sc); NV_UNLOCK(sc); error = 0; } #endif /* FALLTHROUGH */ case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = EINVAL; break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } /* * */ static void hn_stop(hn_softc_t *sc) { struct ifnet *ifp; int ret; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); ifp = sc->hn_ifp; if (bootverbose) printf(" Closing Device ...\n"); ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if_link_state_change(ifp, LINK_STATE_DOWN); sc->hn_initdone = 0; ret = hv_rf_on_close(device_ctx); } /* * FreeBSD transmit entry point */ static void hn_start(struct ifnet *ifp) { hn_softc_t *sc; sc = ifp->if_softc; NV_LOCK(sc); if (sc->temp_unusable) { NV_UNLOCK(sc); return; } hn_start_locked(ifp); NV_UNLOCK(sc); } /* * */ static void hn_ifinit_locked(hn_softc_t *sc) { struct ifnet *ifp; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); int ret; ifp = sc->hn_ifp; if (ifp->if_drv_flags & IFF_DRV_RUNNING) { return; } hv_promisc_mode = 1; ret = hv_rf_on_open(device_ctx); if (ret != 0) { return; } else { sc->hn_initdone = 1; } ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if_link_state_change(ifp, LINK_STATE_UP); } /* * */ static void hn_ifinit(void *xsc) { hn_softc_t *sc = xsc; NV_LOCK(sc); if (sc->temp_unusable) { NV_UNLOCK(sc); return; } sc->temp_unusable = TRUE; NV_UNLOCK(sc); hn_ifinit_locked(sc); NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); } #ifdef LATER /* * */ static void hn_watchdog(struct ifnet *ifp) { hn_softc_t *sc; sc = ifp->if_softc; printf("hn%d: watchdog timeout -- resetting\n", sc->hn_unit); hn_ifinit(sc); /*???*/ ifp->if_oerrors++; } #endif static device_method_t netvsc_methods[] = { /* Device interface */ DEVMETHOD(device_probe, netvsc_probe), DEVMETHOD(device_attach, netvsc_attach), DEVMETHOD(device_detach, netvsc_detach), DEVMETHOD(device_shutdown, netvsc_shutdown), { 0, 0 } }; static driver_t netvsc_driver = { NETVSC_DEVNAME, netvsc_methods, sizeof(hn_softc_t) }; static devclass_t netvsc_devclass; DRIVER_MODULE(hn, vmbus, netvsc_driver, netvsc_devclass, 0, 0); MODULE_VERSION(hn, 1); MODULE_DEPEND(hn, vmbus, 1, 1, 1); SYSINIT(netvsc_initx, SI_SUB_KTHREAD_IDLE, SI_ORDER_MIDDLE + 1, netvsc_init, NULL); Index: releng/10.2/sys/dev/hyperv/utilities/hv_kvp.c =================================================================== --- releng/10.2/sys/dev/hyperv/utilities/hv_kvp.c (revision 296954) +++ releng/10.2/sys/dev/hyperv/utilities/hv_kvp.c (revision 296955) @@ -1,1002 +1,1011 @@ /*- * Copyright (c) 2014 Microsoft Corp. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /* * Author: Sainath Varanasi. * Date: 4/2012 * Email: bsdic@microsoft.com */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "unicode.h" #include "hv_kvp.h" /* hv_kvp defines */ #define BUFFERSIZE sizeof(struct hv_kvp_msg) #define KVP_SUCCESS 0 #define KVP_ERROR 1 #define kvp_hdr hdr.kvp_hdr /* hv_kvp debug control */ static int hv_kvp_log = 0; SYSCTL_INT(_dev, OID_AUTO, hv_kvp_log, CTLFLAG_RW, &hv_kvp_log, 0, "hv_kvp log"); #define hv_kvp_log_error(...) do { \ if (hv_kvp_log > 0) \ log(LOG_ERR, "hv_kvp: " __VA_ARGS__); \ } while (0) #define hv_kvp_log_info(...) do { \ if (hv_kvp_log > 1) \ log(LOG_INFO, "hv_kvp: " __VA_ARGS__); \ } while (0) /* character device prototypes */ static d_open_t hv_kvp_dev_open; static d_close_t hv_kvp_dev_close; static d_read_t hv_kvp_dev_daemon_read; static d_write_t hv_kvp_dev_daemon_write; static d_poll_t hv_kvp_dev_daemon_poll; /* hv_kvp prototypes */ static int hv_kvp_req_in_progress(void); static void hv_kvp_transaction_init(uint32_t, hv_vmbus_channel *, uint64_t, uint8_t *); static void hv_kvp_send_msg_to_daemon(void); static void hv_kvp_process_request(void *context); /* hv_kvp character device structure */ static struct cdevsw hv_kvp_cdevsw = { .d_version = D_VERSION, .d_open = hv_kvp_dev_open, .d_close = hv_kvp_dev_close, .d_read = hv_kvp_dev_daemon_read, .d_write = hv_kvp_dev_daemon_write, .d_poll = hv_kvp_dev_daemon_poll, .d_name = "hv_kvp_dev", }; static struct cdev *hv_kvp_dev; static struct hv_kvp_msg *hv_kvp_dev_buf; struct proc *daemon_task; +static struct selinfo hv_kvp_selinfo; + /* * Global state to track and synchronize multiple * KVP transaction requests from the host. */ static struct { /* Pre-allocated work item for queue */ hv_work_item work_item; /* Unless specified the pending mutex should be * used to alter the values of the following paramters: * 1. req_in_progress * 2. req_timed_out * 3. pending_reqs. */ struct mtx pending_mutex; /* To track if transaction is active or not */ boolean_t req_in_progress; /* Tracks if daemon did not reply back in time */ boolean_t req_timed_out; /* Tracks if daemon is serving a request currently */ boolean_t daemon_busy; /* Count of KVP requests from Hyper-V. */ uint64_t pending_reqs; /* Length of host message */ uint32_t host_msg_len; /* Pointer to channel */ hv_vmbus_channel *channelp; /* Host message id */ uint64_t host_msg_id; /* Current kvp message from the host */ struct hv_kvp_msg *host_kvp_msg; /* Current kvp message for daemon */ struct hv_kvp_msg daemon_kvp_msg; /* Rcv buffer for communicating with the host*/ uint8_t *rcv_buf; /* Device semaphore to control communication */ struct sema dev_sema; /* Indicates if daemon registered with driver */ boolean_t register_done; /* Character device status */ boolean_t dev_accessed; } kvp_globals; /* global vars */ MALLOC_DECLARE(M_HV_KVP_DEV_BUF); MALLOC_DEFINE(M_HV_KVP_DEV_BUF, "hv_kvp_dev buffer", "buffer for hv_kvp_dev module"); /* * hv_kvp low level functions */ /* * Check if kvp transaction is in progres */ static int hv_kvp_req_in_progress(void) { return (kvp_globals.req_in_progress); } /* * This routine is called whenever a message is received from the host */ static void hv_kvp_transaction_init(uint32_t rcv_len, hv_vmbus_channel *rcv_channel, uint64_t request_id, uint8_t *rcv_buf) { /* Store all the relevant message details in the global structure */ /* Do not need to use mutex for req_in_progress here */ kvp_globals.req_in_progress = true; kvp_globals.host_msg_len = rcv_len; kvp_globals.channelp = rcv_channel; kvp_globals.host_msg_id = request_id; kvp_globals.rcv_buf = rcv_buf; kvp_globals.host_kvp_msg = (struct hv_kvp_msg *)&rcv_buf[ sizeof(struct hv_vmbus_pipe_hdr) + sizeof(struct hv_vmbus_icmsg_hdr)]; } /* * hv_kvp - version neogtiation function */ static void hv_kvp_negotiate_version(struct hv_vmbus_icmsg_hdr *icmsghdrp, struct hv_vmbus_icmsg_negotiate *negop, uint8_t *buf) { int icframe_vercnt; int icmsg_vercnt; icmsghdrp->icmsgsize = 0x10; negop = (struct hv_vmbus_icmsg_negotiate *)&buf[ sizeof(struct hv_vmbus_pipe_hdr) + sizeof(struct hv_vmbus_icmsg_hdr)]; icframe_vercnt = negop->icframe_vercnt; icmsg_vercnt = negop->icmsg_vercnt; /* * Select the framework version number we will support */ if ((icframe_vercnt >= 2) && (negop->icversion_data[1].major == 3)) { icframe_vercnt = 3; if (icmsg_vercnt > 2) icmsg_vercnt = 4; else icmsg_vercnt = 3; } else { icframe_vercnt = 1; icmsg_vercnt = 1; } negop->icframe_vercnt = 1; negop->icmsg_vercnt = 1; negop->icversion_data[0].major = icframe_vercnt; negop->icversion_data[0].minor = 0; negop->icversion_data[1].major = icmsg_vercnt; negop->icversion_data[1].minor = 0; } /* * Convert ip related info in umsg from utf8 to utf16 and store in hmsg */ static int hv_kvp_convert_utf8_ipinfo_to_utf16(struct hv_kvp_msg *umsg, struct hv_kvp_ip_msg *host_ip_msg) { int err_ip, err_subnet, err_gway, err_dns, err_adap; int UNUSED_FLAG = 1; utf8_to_utf16((uint16_t *)host_ip_msg->kvp_ip_val.ip_addr, MAX_IP_ADDR_SIZE, (char *)umsg->body.kvp_ip_val.ip_addr, strlen((char *)umsg->body.kvp_ip_val.ip_addr), UNUSED_FLAG, &err_ip); utf8_to_utf16((uint16_t *)host_ip_msg->kvp_ip_val.sub_net, MAX_IP_ADDR_SIZE, (char *)umsg->body.kvp_ip_val.sub_net, strlen((char *)umsg->body.kvp_ip_val.sub_net), UNUSED_FLAG, &err_subnet); utf8_to_utf16((uint16_t *)host_ip_msg->kvp_ip_val.gate_way, MAX_GATEWAY_SIZE, (char *)umsg->body.kvp_ip_val.gate_way, strlen((char *)umsg->body.kvp_ip_val.gate_way), UNUSED_FLAG, &err_gway); utf8_to_utf16((uint16_t *)host_ip_msg->kvp_ip_val.dns_addr, MAX_IP_ADDR_SIZE, (char *)umsg->body.kvp_ip_val.dns_addr, strlen((char *)umsg->body.kvp_ip_val.dns_addr), UNUSED_FLAG, &err_dns); utf8_to_utf16((uint16_t *)host_ip_msg->kvp_ip_val.adapter_id, MAX_IP_ADDR_SIZE, (char *)umsg->body.kvp_ip_val.adapter_id, strlen((char *)umsg->body.kvp_ip_val.adapter_id), UNUSED_FLAG, &err_adap); host_ip_msg->kvp_ip_val.dhcp_enabled = umsg->body.kvp_ip_val.dhcp_enabled; host_ip_msg->kvp_ip_val.addr_family = umsg->body.kvp_ip_val.addr_family; return (err_ip | err_subnet | err_gway | err_dns | err_adap); } /* * Convert ip related info in hmsg from utf16 to utf8 and store in umsg */ static int hv_kvp_convert_utf16_ipinfo_to_utf8(struct hv_kvp_ip_msg *host_ip_msg, struct hv_kvp_msg *umsg) { int err_ip, err_subnet, err_gway, err_dns, err_adap; int UNUSED_FLAG = 1; int guid_index; struct hv_device *hv_dev; /* GUID Data Structure */ hn_softc_t *sc; /* hn softc structure */ char if_name[4]; unsigned char guid_instance[40]; char *guid_data = NULL; char buf[39]; struct guid_extract { char a1[2]; char a2[2]; char a3[2]; char a4[2]; char b1[2]; char b2[2]; char c1[2]; char c2[2]; char d[4]; char e[12]; }; struct guid_extract *id; device_t *devs; int devcnt; /* IP Address */ utf16_to_utf8((char *)umsg->body.kvp_ip_val.ip_addr, MAX_IP_ADDR_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.ip_addr, MAX_IP_ADDR_SIZE, UNUSED_FLAG, &err_ip); /* Adapter ID : GUID */ utf16_to_utf8((char *)umsg->body.kvp_ip_val.adapter_id, MAX_ADAPTER_ID_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.adapter_id, MAX_ADAPTER_ID_SIZE, UNUSED_FLAG, &err_adap); if (devclass_get_devices(devclass_find("hn"), &devs, &devcnt) == 0) { for (devcnt = devcnt - 1; devcnt >= 0; devcnt--) { sc = device_get_softc(devs[devcnt]); /* Trying to find GUID of Network Device */ hv_dev = sc->hn_dev_obj; for (guid_index = 0; guid_index < 16; guid_index++) { sprintf(&guid_instance[guid_index * 2], "%02x", hv_dev->device_id.data[guid_index]); } guid_data = (char *)guid_instance; id = (struct guid_extract *)guid_data; snprintf(buf, sizeof(buf), "{%.2s%.2s%.2s%.2s-%.2s%.2s-%.2s%.2s-%.4s-%s}", id->a4, id->a3, id->a2, id->a1, id->b2, id->b1, id->c2, id->c1, id->d, id->e); guid_data = NULL; sprintf(if_name, "%s%d", "hn", device_get_unit(devs[devcnt])); if (strncmp(buf, (char *)umsg->body.kvp_ip_val.adapter_id, 39) == 0) { strcpy((char *)umsg->body.kvp_ip_val.adapter_id, if_name); break; } } free(devs, M_TEMP); } /* Address Family , DHCP , SUBNET, Gateway, DNS */ umsg->kvp_hdr.operation = host_ip_msg->operation; umsg->body.kvp_ip_val.addr_family = host_ip_msg->kvp_ip_val.addr_family; umsg->body.kvp_ip_val.dhcp_enabled = host_ip_msg->kvp_ip_val.dhcp_enabled; utf16_to_utf8((char *)umsg->body.kvp_ip_val.sub_net, MAX_IP_ADDR_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.sub_net, MAX_IP_ADDR_SIZE, UNUSED_FLAG, &err_subnet); utf16_to_utf8((char *)umsg->body.kvp_ip_val.gate_way, MAX_GATEWAY_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.gate_way, MAX_GATEWAY_SIZE, UNUSED_FLAG, &err_gway); utf16_to_utf8((char *)umsg->body.kvp_ip_val.dns_addr, MAX_IP_ADDR_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.dns_addr, MAX_IP_ADDR_SIZE, UNUSED_FLAG, &err_dns); return (err_ip | err_subnet | err_gway | err_dns | err_adap); } /* * Prepare a user kvp msg based on host kvp msg (utf16 to utf8) * Ensure utf16_utf8 takes care of the additional string terminating char!! */ static void hv_kvp_convert_hostmsg_to_usermsg(void) { int utf_err = 0; uint32_t value_type; struct hv_kvp_ip_msg *host_ip_msg = (struct hv_kvp_ip_msg *) kvp_globals.host_kvp_msg; struct hv_kvp_msg *hmsg = kvp_globals.host_kvp_msg; struct hv_kvp_msg *umsg = &kvp_globals.daemon_kvp_msg; memset(umsg, 0, sizeof(struct hv_kvp_msg)); umsg->kvp_hdr.operation = hmsg->kvp_hdr.operation; umsg->kvp_hdr.pool = hmsg->kvp_hdr.pool; switch (umsg->kvp_hdr.operation) { case HV_KVP_OP_SET_IP_INFO: hv_kvp_convert_utf16_ipinfo_to_utf8(host_ip_msg, umsg); break; case HV_KVP_OP_GET_IP_INFO: utf16_to_utf8((char *)umsg->body.kvp_ip_val.adapter_id, MAX_ADAPTER_ID_SIZE, (uint16_t *)host_ip_msg->kvp_ip_val.adapter_id, MAX_ADAPTER_ID_SIZE, 1, &utf_err); umsg->body.kvp_ip_val.addr_family = host_ip_msg->kvp_ip_val.addr_family; break; case HV_KVP_OP_SET: value_type = hmsg->body.kvp_set.data.value_type; switch (value_type) { case HV_REG_SZ: umsg->body.kvp_set.data.value_size = utf16_to_utf8( (char *)umsg->body.kvp_set.data.msg_value.value, HV_KVP_EXCHANGE_MAX_VALUE_SIZE - 1, (uint16_t *)hmsg->body.kvp_set.data.msg_value.value, hmsg->body.kvp_set.data.value_size, 1, &utf_err); /* utf8 encoding */ umsg->body.kvp_set.data.value_size = umsg->body.kvp_set.data.value_size / 2; break; case HV_REG_U32: umsg->body.kvp_set.data.value_size = sprintf(umsg->body.kvp_set.data.msg_value.value, "%d", hmsg->body.kvp_set.data.msg_value.value_u32) + 1; break; case HV_REG_U64: umsg->body.kvp_set.data.value_size = sprintf(umsg->body.kvp_set.data.msg_value.value, "%llu", (unsigned long long) hmsg->body.kvp_set.data.msg_value.value_u64) + 1; break; } umsg->body.kvp_set.data.key_size = utf16_to_utf8( umsg->body.kvp_set.data.key, HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1, (uint16_t *)hmsg->body.kvp_set.data.key, hmsg->body.kvp_set.data.key_size, 1, &utf_err); /* utf8 encoding */ umsg->body.kvp_set.data.key_size = umsg->body.kvp_set.data.key_size / 2; break; case HV_KVP_OP_GET: umsg->body.kvp_get.data.key_size = utf16_to_utf8(umsg->body.kvp_get.data.key, HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1, (uint16_t *)hmsg->body.kvp_get.data.key, hmsg->body.kvp_get.data.key_size, 1, &utf_err); /* utf8 encoding */ umsg->body.kvp_get.data.key_size = umsg->body.kvp_get.data.key_size / 2; break; case HV_KVP_OP_DELETE: umsg->body.kvp_delete.key_size = utf16_to_utf8(umsg->body.kvp_delete.key, HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1, (uint16_t *)hmsg->body.kvp_delete.key, hmsg->body.kvp_delete.key_size, 1, &utf_err); /* utf8 encoding */ umsg->body.kvp_delete.key_size = umsg->body.kvp_delete.key_size / 2; break; case HV_KVP_OP_ENUMERATE: umsg->body.kvp_enum_data.index = hmsg->body.kvp_enum_data.index; break; default: hv_kvp_log_info("%s: daemon_kvp_msg: Invalid operation : %d\n", __func__, umsg->kvp_hdr.operation); } } /* * Prepare a host kvp msg based on user kvp msg (utf8 to utf16) */ static int hv_kvp_convert_usermsg_to_hostmsg(void) { int hkey_len = 0, hvalue_len = 0, utf_err = 0; struct hv_kvp_exchg_msg_value *host_exchg_data; char *key_name, *value; struct hv_kvp_msg *umsg = &kvp_globals.daemon_kvp_msg; struct hv_kvp_msg *hmsg = kvp_globals.host_kvp_msg; struct hv_kvp_ip_msg *host_ip_msg = (struct hv_kvp_ip_msg *)hmsg; switch (hmsg->kvp_hdr.operation) { case HV_KVP_OP_GET_IP_INFO: return (hv_kvp_convert_utf8_ipinfo_to_utf16(umsg, host_ip_msg)); case HV_KVP_OP_SET_IP_INFO: case HV_KVP_OP_SET: case HV_KVP_OP_DELETE: return (KVP_SUCCESS); case HV_KVP_OP_ENUMERATE: host_exchg_data = &hmsg->body.kvp_enum_data.data; key_name = umsg->body.kvp_enum_data.data.key; hkey_len = utf8_to_utf16((uint16_t *)host_exchg_data->key, ((HV_KVP_EXCHANGE_MAX_KEY_SIZE / 2) - 2), key_name, strlen(key_name), 1, &utf_err); /* utf16 encoding */ host_exchg_data->key_size = 2 * (hkey_len + 1); value = umsg->body.kvp_enum_data.data.msg_value.value; hvalue_len = utf8_to_utf16( (uint16_t *)host_exchg_data->msg_value.value, ((HV_KVP_EXCHANGE_MAX_VALUE_SIZE / 2) - 2), value, strlen(value), 1, &utf_err); host_exchg_data->value_size = 2 * (hvalue_len + 1); host_exchg_data->value_type = HV_REG_SZ; if ((hkey_len < 0) || (hvalue_len < 0)) return (HV_KVP_E_FAIL); return (KVP_SUCCESS); case HV_KVP_OP_GET: host_exchg_data = &hmsg->body.kvp_get.data; value = umsg->body.kvp_get.data.msg_value.value; hvalue_len = utf8_to_utf16( (uint16_t *)host_exchg_data->msg_value.value, ((HV_KVP_EXCHANGE_MAX_VALUE_SIZE / 2) - 2), value, strlen(value), 1, &utf_err); /* Convert value size to uft16 */ host_exchg_data->value_size = 2 * (hvalue_len + 1); /* Use values by string */ host_exchg_data->value_type = HV_REG_SZ; if ((hkey_len < 0) || (hvalue_len < 0)) return (HV_KVP_E_FAIL); return (KVP_SUCCESS); default: return (HV_KVP_E_FAIL); } } /* * Send the response back to the host. */ static void hv_kvp_respond_host(int error) { struct hv_vmbus_icmsg_hdr *hv_icmsg_hdrp; hv_icmsg_hdrp = (struct hv_vmbus_icmsg_hdr *) &kvp_globals.rcv_buf[sizeof(struct hv_vmbus_pipe_hdr)]; if (error) error = HV_KVP_E_FAIL; hv_icmsg_hdrp->status = error; hv_icmsg_hdrp->icflags = HV_ICMSGHDRFLAG_TRANSACTION | HV_ICMSGHDRFLAG_RESPONSE; error = hv_vmbus_channel_send_packet(kvp_globals.channelp, kvp_globals.rcv_buf, kvp_globals.host_msg_len, kvp_globals.host_msg_id, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, 0); if (error) hv_kvp_log_info("%s: hv_kvp_respond_host: sendpacket error:%d\n", __func__, error); } /* * This is the main kvp kernel process that interacts with both user daemon * and the host */ static void hv_kvp_send_msg_to_daemon(void) { /* Prepare kvp_msg to be sent to user */ hv_kvp_convert_hostmsg_to_usermsg(); /* Send the msg to user via function deamon_read - setting sema */ sema_post(&kvp_globals.dev_sema); + + /* We should wake up the daemon, in case it's doing poll() */ + selwakeup(&hv_kvp_selinfo); } /* * Function to read the kvp request buffer from host * and interact with daemon */ static void hv_kvp_process_request(void *context) { uint8_t *kvp_buf; hv_vmbus_channel *channel = context; uint32_t recvlen = 0; uint64_t requestid; struct hv_vmbus_icmsg_hdr *icmsghdrp; int ret = 0; uint64_t pending_cnt = 1; hv_kvp_log_info("%s: entering hv_kvp_process_request\n", __func__); kvp_buf = receive_buffer[HV_KVP]; ret = hv_vmbus_channel_recv_packet(channel, kvp_buf, 2 * PAGE_SIZE, &recvlen, &requestid); /* * We start counting only after the daemon registers * and therefore there could be requests pending in * the VMBus that are not reflected in pending_cnt. * Therefore we continue reading as long as either of * the below conditions is true. */ while ((pending_cnt>0) || ((ret == 0) && (recvlen > 0))) { if ((ret == 0) && (recvlen>0)) { icmsghdrp = (struct hv_vmbus_icmsg_hdr *) &kvp_buf[sizeof(struct hv_vmbus_pipe_hdr)]; hv_kvp_transaction_init(recvlen, channel, requestid, kvp_buf); if (icmsghdrp->icmsgtype == HV_ICMSGTYPE_NEGOTIATE) { hv_kvp_negotiate_version(icmsghdrp, NULL, kvp_buf); hv_kvp_respond_host(ret); /* * It is ok to not acquire the mutex before setting * req_in_progress here because negotiation is the * first thing that happens and hence there is no * chance of a race condition. */ kvp_globals.req_in_progress = false; hv_kvp_log_info("%s :version negotiated\n", __func__); } else { if (!kvp_globals.daemon_busy) { hv_kvp_log_info("%s: issuing qury to daemon\n", __func__); mtx_lock(&kvp_globals.pending_mutex); kvp_globals.req_timed_out = false; kvp_globals.daemon_busy = true; mtx_unlock(&kvp_globals.pending_mutex); hv_kvp_send_msg_to_daemon(); hv_kvp_log_info("%s: waiting for daemon\n", __func__); } /* Wait 5 seconds for daemon to respond back */ tsleep(&kvp_globals, 0, "kvpworkitem", 5 * hz); hv_kvp_log_info("%s: came out of wait\n", __func__); } } mtx_lock(&kvp_globals.pending_mutex); /* Notice that once req_timed_out is set to true * it will remain true until the next request is * sent to the daemon. The response from daemon * is forwarded to host only when this flag is * false. */ kvp_globals.req_timed_out = true; /* * Cancel request if so need be. */ if (hv_kvp_req_in_progress()) { hv_kvp_log_info("%s: request was still active after wait so failing\n", __func__); hv_kvp_respond_host(HV_KVP_E_FAIL); kvp_globals.req_in_progress = false; } /* * Decrement pending request count and */ if (kvp_globals.pending_reqs>0) { kvp_globals.pending_reqs = kvp_globals.pending_reqs - 1; } pending_cnt = kvp_globals.pending_reqs; mtx_unlock(&kvp_globals.pending_mutex); /* * Try reading next buffer */ recvlen = 0; ret = hv_vmbus_channel_recv_packet(channel, kvp_buf, 2 * PAGE_SIZE, &recvlen, &requestid); hv_kvp_log_info("%s: read: context %p, pending_cnt %llu ret =%d, recvlen=%d\n", __func__, context, (unsigned long long)pending_cnt, ret, recvlen); } } /* * Callback routine that gets called whenever there is a message from host */ void hv_kvp_callback(void *context) { uint64_t pending_cnt = 0; if (kvp_globals.register_done == false) { kvp_globals.channelp = context; } else { mtx_lock(&kvp_globals.pending_mutex); kvp_globals.pending_reqs = kvp_globals.pending_reqs + 1; pending_cnt = kvp_globals.pending_reqs; mtx_unlock(&kvp_globals.pending_mutex); if (pending_cnt == 1) { hv_kvp_log_info("%s: Queuing work item\n", __func__); hv_queue_work_item( service_table[HV_KVP].work_queue, hv_kvp_process_request, context ); } } } /* * This function is called by the hv_kvp_init - * creates character device hv_kvp_dev * allocates memory to hv_kvp_dev_buf * */ static int hv_kvp_dev_init(void) { int error = 0; /* initialize semaphore */ sema_init(&kvp_globals.dev_sema, 0, "hv_kvp device semaphore"); /* create character device */ error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &hv_kvp_dev, &hv_kvp_cdevsw, 0, UID_ROOT, GID_WHEEL, 0640, "hv_kvp_dev"); if (error != 0) return (error); /* * Malloc with M_WAITOK flag will never fail. */ hv_kvp_dev_buf = malloc(sizeof(*hv_kvp_dev_buf), M_HV_KVP_DEV_BUF, M_WAITOK | M_ZERO); return (0); } /* * This function is called by the hv_kvp_deinit - * destroy character device */ static void hv_kvp_dev_destroy(void) { if (daemon_task != NULL) { PROC_LOCK(daemon_task); kern_psignal(daemon_task, SIGKILL); PROC_UNLOCK(daemon_task); } destroy_dev(hv_kvp_dev); free(hv_kvp_dev_buf, M_HV_KVP_DEV_BUF); return; } static int hv_kvp_dev_open(struct cdev *dev, int oflags, int devtype, struct thread *td) { hv_kvp_log_info("%s: Opened device \"hv_kvp_device\" successfully.\n", __func__); if (kvp_globals.dev_accessed) return (-EBUSY); daemon_task = curproc; kvp_globals.dev_accessed = true; kvp_globals.daemon_busy = false; return (0); } static int hv_kvp_dev_close(struct cdev *dev __unused, int fflag __unused, int devtype __unused, struct thread *td __unused) { hv_kvp_log_info("%s: Closing device \"hv_kvp_device\".\n", __func__); kvp_globals.dev_accessed = false; kvp_globals.register_done = false; return (0); } /* * hv_kvp_daemon read invokes this function * acts as a send to daemon */ static int hv_kvp_dev_daemon_read(struct cdev *dev __unused, struct uio *uio, int ioflag __unused) { size_t amt; int error = 0; /* Check hv_kvp daemon registration status*/ if (!kvp_globals.register_done) return (KVP_ERROR); sema_wait(&kvp_globals.dev_sema); memcpy(hv_kvp_dev_buf, &kvp_globals.daemon_kvp_msg, sizeof(struct hv_kvp_msg)); amt = MIN(uio->uio_resid, uio->uio_offset >= BUFFERSIZE + 1 ? 0 : BUFFERSIZE + 1 - uio->uio_offset); if ((error = uiomove(hv_kvp_dev_buf, amt, uio)) != 0) hv_kvp_log_info("%s: hv_kvp uiomove read failed!\n", __func__); return (error); } /* * hv_kvp_daemon write invokes this function * acts as a recieve from daemon */ static int hv_kvp_dev_daemon_write(struct cdev *dev __unused, struct uio *uio, int ioflag __unused) { size_t amt; int error = 0; uio->uio_offset = 0; amt = MIN(uio->uio_resid, BUFFERSIZE); error = uiomove(hv_kvp_dev_buf, amt, uio); if (error != 0) return (error); memcpy(&kvp_globals.daemon_kvp_msg, hv_kvp_dev_buf, sizeof(struct hv_kvp_msg)); if (kvp_globals.register_done == false) { if (kvp_globals.daemon_kvp_msg.kvp_hdr.operation == HV_KVP_OP_REGISTER) { kvp_globals.register_done = true; if (kvp_globals.channelp) { hv_kvp_callback(kvp_globals.channelp); } } else { hv_kvp_log_info("%s, KVP Registration Failed\n", __func__); return (KVP_ERROR); } } else { mtx_lock(&kvp_globals.pending_mutex); if(!kvp_globals.req_timed_out) { hv_kvp_convert_usermsg_to_hostmsg(); hv_kvp_respond_host(KVP_SUCCESS); wakeup(&kvp_globals); kvp_globals.req_in_progress = false; } kvp_globals.daemon_busy = false; mtx_unlock(&kvp_globals.pending_mutex); } return (error); } /* * hv_kvp_daemon poll invokes this function to check if data is available * for daemon to read. */ static int -hv_kvp_dev_daemon_poll(struct cdev *dev __unused, int events, struct thread *td __unused) +hv_kvp_dev_daemon_poll(struct cdev *dev __unused, int events, struct thread *td) { int revents = 0; mtx_lock(&kvp_globals.pending_mutex); /* * We check global flag daemon_busy for the data availiability for * userland to read. Deamon_busy is set to true before driver has data * for daemon to read. It is set to false after daemon sends * then response back to driver. */ if (kvp_globals.daemon_busy == true) revents = POLLIN; + else + selrecord(td, &hv_kvp_selinfo); + mtx_unlock(&kvp_globals.pending_mutex); return (revents); } /* * hv_kvp initialization function * called from hv_util service. * */ int hv_kvp_init(hv_vmbus_service *srv) { int error = 0; hv_work_queue *work_queue = NULL; memset(&kvp_globals, 0, sizeof(kvp_globals)); work_queue = hv_work_queue_create("KVP Service"); if (work_queue == NULL) { hv_kvp_log_info("%s: Work queue alloc failed\n", __func__); error = ENOMEM; hv_kvp_log_error("%s: ENOMEM\n", __func__); goto Finish; } srv->work_queue = work_queue; error = hv_kvp_dev_init(); mtx_init(&kvp_globals.pending_mutex, "hv-kvp pending mutex", NULL, MTX_DEF); kvp_globals.pending_reqs = 0; Finish: return (error); } void hv_kvp_deinit(void) { hv_kvp_dev_destroy(); mtx_destroy(&kvp_globals.pending_mutex); return; } Index: releng/10.2 =================================================================== --- releng/10.2 (revision 296954) +++ releng/10.2 (revision 296955) Property changes on: releng/10.2 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,2 ## Merged /head:r291156,292258 Merged /stable/10:r292438-292439