Index: releng/10.3/UPDATING =================================================================== --- releng/10.3/UPDATING (revision 303983) +++ releng/10.3/UPDATING (revision 303984) @@ -1,2319 +1,2335 @@ Updating Information for FreeBSD current users This file is maintained and copyrighted by M. Warner Losh . See end of file for further details. For commonly done items, please see the COMMON ITEMS: section later in the file. These instructions assume that you basically know what you are doing. If not, then please consult the FreeBSD handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html Items affecting the ports and packages system can be found in /usr/ports/UPDATING. Please read that file before running portupgrade. NOTE: FreeBSD has switched from gcc to clang. If you have trouble bootstrapping from older versions of FreeBSD, try WITHOUT_CLANG to bootstrap to the tip of stable/10, and then rebuild without this option. The bootstrap process from older version of current is a bit fragile. +20160811 p7 FreeBSD-EN-16:10.dhclient + FreeBSD-EN-16:11.vmbus + FreeBSD-EN-16:12.hv_storvsc + FreeBSD-EN-16:13.vmbus + FreeBSD-EN-16:14.hv_storvsc + FreeBSD-EN-16:15.vmbus + FreeBSD-EN-16:16.hv_storvsc + + Fix handling of unknown options from a DHCP server. [EN-16:10] + Fix a panic in hv_vmbus(4). [EN-16:11] + Fix missing hotplugged disk in hv_storvsc(4). [EN-16:12] + Fix the timecounter emulation in hv_vmbus(4). [EN-16:13] + Fix callout(9) handling in hv_storvsc(4). [EN-16:14] + Fix memory allocation issues in hv_vmbus(4). [EN-16:15] + Fix SCSI command handling in hv_storvsc(4). [EN-16:16] + 20160725 p6 FreeBSD-SA-16:25.bspatch FreeBSD-EN-16:09.freebsd-update Fix bspatch heap overflow vulnerability. [SA-16:25] Fix freebsd-update(8) support of FreeBSD 11.0 release distribution. [EN-16:09] 20160604 p5 FreeBSD-SA-16:24.ntp Fix multiple vulnerabilities of ntp. 
20160531 p4 FreeBSD-SA-16:20.linux FreeBSD-SA-16:21.43bsd FreeBSD-SA-16:22.libarchive Fix kernel stack disclosure in Linux compatibility layer. [SA-16:20] Fix kernel stack disclosure in 4.3BSD compatibility layer. [SA-16:21] Fix directory traversal in cpio(1). [SA-16:22] 20160517 p3 FreeBSD-SA-16:18.atkbd FreeBSD-SA-16:19.sendmsg Fix buffer overflow in keyboard driver. [SA-16:18] Fix incorrect argument handling in sendmsg(2). [SA-16:19] 20160504 p2 FreeBSD-SA-16:17.openssl FreeBSD-EN-16:06.libc FreeBSD-EN-16:07.ipi FreeBSD-EN-16:08.zfs Fix multiple OpenSSL vulnerabilities. [SA-16:17] Fix performance regression in libc hash(3). [EN-16:06] Fix excessive latency in x86 IPI delivery. [EN-16:07] Fix memory leak in ZFS. [EN-16:08] 20160429 p1 FreeBSD-SA-16:16.ntp Fix multiple vulnerabilities of ntp. 20160329: 10.3-RELEASE. 20160124: The NONE and HPN patches have been removed from OpenSSH. They are still available in the security/openssh-portable port. 20151214: r292223 changed the internal interface between the nfsd.ko and nfscommon.ko modules. As such, they must both be upgraded together. __FreeBSD_version has been bumped because of this. 20151113: Qlogic 24xx/25xx firmware images were updated from 5.5.0 to 7.3.0. Kernel modules isp_2400_multi and isp_2500_multi were removed and should be replaced with isp_2400 and isp_2500 modules respectively. 20150806: The menu.rc and loader.rc files will now be replaced during upgrades. Please migrate local changes to menu.rc.local and loader.rc.local instead. 20151026: NTP has been upgraded to 4.2.8p4. 20151025: ALLOW_DEPRECATED_ATF_TOOLS/ATFFILE support has been removed from atf.test.mk (included from bsd.test.mk). Please upgrade devel/atf and devel/kyua to version 0.20+ and adjust any calling code to work with Kyuafile and kyua. 20150823: The polarity of Pulse Per Second (PPS) capture events with the uart(4) driver has been corrected.
Prior to this change the PPS "assert" event corresponded to the trailing edge of a positive PPS pulse and the "clear" event was the leading edge of the next pulse. As the width of a PPS pulse in a typical GPS receiver is on the order of 1 millisecond, most users will not notice any significant difference with this change. Anyone who has compensated for the historical polarity reversal by configuring a negative offset equal to the pulse width will need to remove that workaround. 20150822: Support for SATA controllers handled by the more functional drivers ahci(4), siis(4) and mvs(4) was removed from the legacy ata(4) driver. Kernel modules ataahci and ataadaptec were removed completely, replaced by ahci and mvs modules respectively. 20150813: 10.2-RELEASE. 20150731: As ZFS requires more kernel stack pages than is the default on some architectures e.g. i386, it now warns if KSTACK_PAGES is less than ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing). Please consider using 'options KSTACK_PAGES=X' where X is greater than or equal to ZFS_MIN_KSTACK_PAGES i.e. 4 in such configurations. 20150703: The default Unbound configuration now enables remote control using a local socket. Users who have already enabled the local_unbound service should regenerate their configuration by running "service local_unbound setup" as root. 20150624: An additional fix for the issue described in the 20150614 sendmail entry below has been committed in revision 284786. 20150615: The fix for the issue described in the 20150614 sendmail entry below has been committed in revision 284485. The workaround described in that entry is no longer needed unless the default setting is overridden by a confDH_PARAMETERS configuration setting of '5' or pointing to a 512 bit DH parameter file. 20150614: The import of openssl to address the FreeBSD-SA-15:10.openssl security advisory includes a change which rejects handshakes with DH parameters below 768 bits.
sendmail releases prior to 8.15.2 (not yet released) defaulted to a 512 bit DH parameter setting for client connections. To work around this interoperability problem, sendmail can be configured to use a 2048 bit DH parameter by: 1. Edit /etc/mail/`hostname`.mc 2. If a setting for confDH_PARAMETERS does not exist or exists and is set to a string beginning with '5', replace it with '2'. 3. If a setting for confDH_PARAMETERS exists and is set to a file path, create a new file with: openssl dhparam -out /path/to/file 2048 4. Rebuild the .cf file: cd /etc/mail/; make; make install 5. Restart sendmail: cd /etc/mail/; make restart A sendmail patch is coming, at which time this file will be updated. 20150601: chmod, chflags, chown and chgrp now affect symlinks in -R mode as defined in symlink(7); previously symlinks were silently ignored. 20150430: The const qualifier has been removed from iconv(3) to comply with POSIX. The ports tree is aware of this from r384038 onwards. 20141215: At svn r275807, the default linux compat kernel ABI has been adjusted to 2.6.18 in support of the linux-c6 compat ports infrastructure update. If you wish to continue using the linux-f10 compat ports, add compat.linux.osrelease=2.6.16 to your local sysctl.conf. Users are encouraged to update their linux-compat packages to linux-c6 during their next update cycle. See ports/UPDATING 20141209 and 20141215 on migration to CentOS 6 ports. 20141205: pjdfstest has been integrated into kyua as an opt-in test suite. Please see share/doc/pjdfstest/README for more details on how to execute it. 20141118: 10.1-RELEASE. 20140904: The ofwfb driver, used to provide a graphics console on PowerPC when using vt(4), no longer allows mmap() of all of physical memory. This will prevent Xorg on PowerPC with some ATI graphics cards from initializing properly unless x11-servers/xorg-server is updated to 1.12.4_8 or newer.
20140831: The libatf-c and libatf-c++ major versions were downgraded to 0 and 1 respectively to match the upstream numbers. They were out of sync because, when they were originally added to FreeBSD, the upstream versions were not respected. These libraries are private and not yet built by default, so renumbering them should be a non-issue. However, unclean source trees will yield broken test programs once the operator executes "make delete-old-libs" after a "make installworld". Additionally, the atf-sh binary was made private by moving it into /usr/libexec/. Already-built shell test programs will keep the path to the old binary so they will break after "make delete-old" is run. If you are using WITH_TESTS=yes (not the default), wipe the object tree and rebuild from scratch to prevent spurious test failures. This is only needed once: the misnumbered libraries and misplaced binaries have been added to OptionalObsoleteFiles.inc so they will be removed during a clean upgrade. 20140814: The ixgbe tunables now match their sysctl counterparts, for example: hw.ixgbe.enable_aim => hw.ix.enable_aim Anyone using ixgbe tunables should ensure they update /boot/loader.conf. 20140801: The NFSv4.1 server committed by r269398 changes the internal function call interfaces used between the NFS and krpc modules. As such, __FreeBSD_version was bumped. 20140729: The default unbound configuration has been modified to address issues with reverse lookups on networks that use private address ranges. If you use the local_unbound service, run "service local_unbound setup" as root to regenerate your configuration, then "service local_unbound reload" to load the new configuration. 20140717: It is no longer necessary to include the dwarf version in your DEBUG options in your kernel config file. The bug that required it to be placed in the config file has been fixed. DEBUG should now just contain -g. The build system will automatically update things to do the right thing.
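For the 20140717 entry, a custom kernel configuration file would then carry only the plain debug flag. This is a minimal sketch, assuming a config derived from GENERIC; the commented-out line is only an illustration of the kind of explicit DWARF setting that is no longer needed, not a quote from any particular config:

```
# Hypothetical old form with an explicit dwarf version (no longer needed):
#makeoptions	DEBUG="-g -gdwarf-2"

# Current form: just -g, the build system adds the rest.
makeoptions	DEBUG=-g
```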
20140715: Several ABI breaking changes were merged to CTL and new iSCSI code. All CTL and iSCSI-related tools, such as ctladm, ctld, iscsid and iscsictl need to be rebuilt to work with a new kernel. 20140708: The WITHOUT_VT_SUPPORT kernel config knob has been renamed WITHOUT_VT. (The other _SUPPORT knobs have a consistent meaning which differs from the behaviour controlled by this knob.) 20140608: On i386 and amd64 systems, the onifconsole flag is now set by default in /etc/ttys for ttyu0. This causes ttyu0 to be automatically enabled as a login TTY if it is set in the bootloader as an active kernel console. No changes in behavior should result otherwise. To revert to the previous behavior, set ttyu0 to "off" in /etc/ttys. 20140512: Clang and llvm have been upgraded to 3.4.1 release. 20140321: Clang and llvm have been upgraded to 3.4 release. 20140306: If a Makefile in a tests/ directory was auto-generating a Kyuafile instead of providing an explicit one, this would prevent such Makefile from providing its own Kyuafile in the future during NO_CLEAN builds. This has been fixed in the Makefiles but manual intervention is needed to clean an objdir if you use NO_CLEAN: # find /usr/obj -name Kyuafile | xargs rm -f 20140303: OpenSSH will now ignore errors caused by kernel lacking of Capsicum capability mode support. Please note that enabling the feature in kernel is still highly recommended. 20140227: OpenSSH is now built with sandbox support, and will use sandbox as the default privilege separation method. This requires Capsicum capability mode support in kernel. 20140216: The nve(4) driver for NVIDIA nForce MCP Ethernet adapters has been deprecated and will not be part of FreeBSD 11.0 and later releases. If you use this driver, please consider switching to the nfe(4) driver instead. 20140120: 10.0-RELEASE. 
20131216: The behavior of gss_pseudo_random() for the krb5 mechanism has changed, for applications requesting a longer random string than produced by the underlying enctype's pseudo-random() function. In particular, the random string produced from a session key of enctype aes128-cts-hmac-sha1-96 or aes256-cts-hmac-sha1-96 will be different at the 17th octet and later, after this change. The counter used in the PRF+ construction is now encoded as a big-endian integer in accordance with RFC 4402. __FreeBSD_version is bumped to 1000701. 20131108: The WITHOUT_ATF build knob has been removed and its functionality has been subsumed into the more generic WITHOUT_TESTS. If you were using the former to disable the build of the ATF libraries, you should change your settings to use the latter. 20131031: The default version of mtree is nmtree which is obtained from NetBSD. The output is generally the same, but may vary slightly. If you find you need identical output, adding "-F freebsd9" to the command line should do the trick. For the time being, the old mtree is available as fmtree. 20131014: libbsdyml has been renamed to libyaml and moved to /usr/lib/private. This will break ports-mgmt/pkg. Rebuild the port, or upgrade to pkg 1.1.4_8 and verify bsdyml is not linked in, before running "make delete-old-libs": # make -C /usr/ports/ports-mgmt/pkg build deinstall install clean or # pkg install pkg; ldd /usr/local/sbin/pkg | grep bsdyml 20131010: The rc.d/jail script has been updated to support jail(8) configuration file. The "jail__*" rc.conf(5) variables for per-jail configuration are automatically converted to /var/run/jail..conf before the jail(8) utility is invoked. This is transparently backward compatible. See below about some incompatibilities and rc.conf(5) manual page for more details. These variables are now deprecated in favor of jail(8) configuration file.
One can use "rc.d/jail config " command to generate a jail(8) configuration file in /var/run/jail..conf without running the jail(8) utility. The default pathname of the configuration file is /etc/jail.conf and can be specified by using $jail_conf or $jail__conf variables. Please note that jail_devfs_ruleset accepts an integer at this moment. Please consider to rewrite the ruleset name with an integer. 20130930: BIND has been removed from the base system. If all you need is a local resolver, simply enable and start the local_unbound service instead. Otherwise, several versions of BIND are available in the ports tree. The dns/bind99 port is one example. With this change, nslookup(1) and dig(1) are no longer in the base system. Users should instead use host(1) and drill(1) which are in the base system. Alternatively, nslookup and dig can be obtained by installing the dns/bind-tools port. 20130916: With the addition of unbound(8), a new unbound user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20130911: OpenSSH is now built with DNSSEC support, and will by default silently trust signed SSHFP records. This can be controlled with the VerifyHostKeyDNS client configuration setting. DNSSEC support can be disabled entirely with the WITHOUT_LDNS option in src.conf. 20130906: The GNU Compiler Collection and C++ standard library (libstdc++) are no longer built by default on platforms where clang is the system compiler. You can enable them with the WITH_GCC and WITH_GNUCXX options in src.conf. 20130905: The PROCDESC kernel option is now part of the GENERIC kernel configuration and is required for the rwhod(8) to work. If you are using custom kernel configuration, you should include 'options PROCDESC'. 20130905: The API and ABI related to the Capsicum framework was modified in backward incompatible way. The userland libraries and programs have to be recompiled to work with the new kernel. 
This includes the following libraries and programs, but the whole buildworld is advised: libc, libprocstat, dhclient, tcpdump, hastd, hastctl, kdump, procstat, rwho, rwhod, uniq. 20130903: AES-NI intrinsic support has been added to gcc. The AES-NI module has been updated to use this support. A new gcc is required to build the aesni module on both i386 and amd64. 20130821: The PADLOCK_RNG and RDRAND_RNG kernel options are now devices. Thus "device padlock_rng" and "device rdrand_rng" should be used instead of "options PADLOCK_RNG" & "options RDRAND_RNG". 20130813: WITH_ICONV has been split into two feature sets. WITH_ICONV now enables just the iconv* functionality and is now on by default. WITH_LIBICONV_COMPAT enables the libiconv api and link time compatibility. Set WITHOUT_ICONV to build the old way. If you have been using WITH_ICONV before, you will very likely need to turn on WITH_LIBICONV_COMPAT. 20130806: INVARIANTS option now enables DEBUG for code with OpenSolaris and Illumos origin, including ZFS. If you have INVARIANTS in your kernel configuration, then there is no need to set DEBUG or ZFS_DEBUG explicitly. DEBUG used to enable witness(9) tracking of OpenSolaris (mostly ZFS) locks if WITNESS option was set. Because that generated a lot of witness(9) reports and all of them were believed to be false positives, this is no longer done. New option OPENSOLARIS_WITNESS can be used to achieve the previous behavior. 20130806: Timer values in IPv6 data structures now use time_uptime instead of time_second. Although this is not a user-visible functional change, userland utilities which directly use them---ndp(8), rtadvd(8), and rtsold(8) in the base system---need to be updated to r253970 or later. 20130802: find -delete can now delete the pathnames given as arguments, instead of only files found below them or if the pathname did not contain any slashes.
Formerly, the following error message would result: find: -delete: : relative path potentially not safe Deleting the pathnames given as arguments can be prevented without error messages using -mindepth 1 or by changing directory and passing "." as argument to find. This works in the old as well as the new version of find. 20130726: Behavior of devfs rules path matching has been changed. Pattern is now always matched against fully qualified devfs path and slash characters must be explicitly matched by slashes in pattern (FNM_PATHNAME). Rulesets involving devfs subdirectories must be reviewed. 20130716: The default ARM ABI has changed to the ARM EABI. The old ABI is incompatible with the ARM EABI and all programs and modules will need to be rebuilt to work with a new kernel. To keep using the old ABI ensure the WITHOUT_ARM_EABI knob is set. NOTE: Support for the old ABI will be removed in the future and users are advised to upgrade. 20130709: pkg_install has been disconnected from the build if you really need it you should add WITH_PKGTOOLS in your src.conf(5). 20130709: Most of network statistics structures were changed to be able keep 64-bits counters. Thus all tools, that work with networking statistics, must be rebuilt (netstat(1), bsnmpd(1), etc.) 20130618: Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file. 20130615: CVS has been removed from the base system. An exact copy of the code is available from the devel/cvs port. 20130613: Some people report the following error after the switch to bmake: make: illegal option -- J usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable] ... 
*** [buildworld] Error code 2 This is likely due to an old instance of make in ${MAKEPATH} (${MAKEOBJDIRPREFIX}${.CURDIR}/make.${MACHINE}), which src/Makefile will use blindly if it exists. If you see the above error: rm -rf `make -V MAKEPATH` should resolve it. 20130516: Use bmake by default. Whereas before one could choose to build with bmake via -DWITH_BMAKE one must now use -DWITHOUT_BMAKE to use the old make. The goal is to remove these knobs for 10-RELEASE. It is worth noting that bmake (like gmake) treats the command line as the unit of failure, rather than statements within the command line. Thus '(cd some/where && dosomething)' is safer than 'cd some/where; dosomething'. The '()' allows consistent behavior in parallel build. 20130429: Fix a bug that allows NFS clients to issue READDIR on files. 20130426: The WITHOUT_IDEA option has been removed because the IDEA patent expired. 20130426: The sysctl which controls TRIM support under ZFS has been renamed from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled and has been enabled by default. 20130425: The mergemaster command now uses the default MAKEOBJDIRPREFIX rather than creating its own in the temporary directory in order to allow access to bootstrapped versions of tools such as install and mtree. When upgrading from a version of FreeBSD where the install command does not support -l, you will need to install a new mergemaster command if mergemaster -p is required. This can be accomplished with the command (cd src/usr.sbin/mergemaster && make install). 20130404: The legacy ATA stack, disabled and replaced by the new CAM-based one since FreeBSD 9.0, has been completely removed from the sources. Kernel modules atadisk and atapi*, user-level tools atacontrol and burncd are removed. Kernel option `options ATA_CAM` is now permanently enabled and removed. 20130319: SOCK_CLOEXEC and SOCK_NONBLOCK flags have been added to socket(2) and socketpair(2).
Software, in particular Kerberos, may automatically detect and use these during building. The resulting binaries will not work on older kernels. 20130308: CTL_DISABLE has also been added to the sparc64 GENERIC (for further information, see the respective 20130304 entry). 20130304: Recent commits to callout(9) changed the size of struct callout, so the KBI is probably heavily disturbed. Also, some functions in callout(9)/sleep(9)/sleepqueue(9)/condvar(9) KPIs were replaced by macros. Every kernel module using it won't load, so rebuild is requested. The ctl device has been re-enabled in GENERIC for i386 and amd64, but does not initialize by default (because of the new CTL_DISABLE option) to save memory. To re-enable it, remove the CTL_DISABLE option from the kernel config file or set kern.cam.ctl.disable=0 in /boot/loader.conf. 20130301: The ctl device has been disabled in GENERIC for i386 and amd64. This was done due to the extra memory being allocated at system initialisation time by the ctl driver which was only used if a CAM target device was created. This makes a FreeBSD system unusable on 128MB or less of RAM. 20130208: A new compression method (lz4) has been merged to -HEAD. Please refer to zpool-features(7) for more information. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20130129: A BSD-licensed patch(1) variant has been added and is installed as bsdpatch, being the GNU version the default patch. To inverse the logic and use the BSD-licensed one as default, while having the GNU version installed as gnupatch, rebuild and install world with the WITH_BSD_PATCH knob set. 20130121: Due to the use of the new -l option to install(1) during build and install, you must take care not to directly set the INSTALL make variable in your /etc/make.conf, /etc/src.conf, or on the command line. If you wish to use the -C flag for all installs you may be able to add INSTALL+=-C to /etc/make.conf or /etc/src.conf. 
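The 20130121 advice on the INSTALL variable can be sketched as a make.conf fragment. This is an illustration only; the commented-out assignment is a hypothetical example of the direct setting the entry warns against, on the reading that a direct assignment replaces the value the build relies on:

```
# /etc/make.conf or /etc/src.conf

# Avoid: setting INSTALL directly (hypothetical example of the old habit).
#INSTALL=install -C

# Prefer: append the flag to whatever the build system has already set.
INSTALL+=-C
```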
20130118: The install(1) option -M has changed meaning and now takes an argument that is a file or path to append logs to. In the unlikely event that -M was the last option on the command line and the command line contained at least two files and a target directory the first file will have logs appended to it. The -M option served little practical purpose in the last decade so its use is expected to be extremely rare. 20121223: After switching to Clang as the default compiler some users of ZFS on i386 systems started to experience stack overflow kernel panics. Please consider using 'options KSTACK_PAGES=4' in such configurations. 20121222: GEOM_LABEL now mangles label names read from file system metadata. Mangling affects labels containing spaces, non-printable characters, '%' or '"'. Device names in /etc/fstab and other places may need to be updated. 20121217: By default, only the 10 most recent kernel dumps will be saved. To restore the previous behaviour (no limit on the number of kernel dumps stored in the dump directory) add the following line to /etc/rc.conf: savecore_flags="" 20121201: With the addition of auditdistd(8), a new auditdistd user is now required during installworld. "mergemaster -p" can be used to add the user prior to installworld, as documented in the handbook. 20121117: The sin6_scope_id member variable in struct sockaddr_in6 is now filled by the kernel before passing the structure to the userland via sysctl or routing socket. This means the KAME-specific embedded scope id in sin6_addr.s6_addr[2] is always cleared in userland application. This behavior can be controlled by net.inet6.ip6.deembed_scopeid. __FreeBSD_version is bumped to 1000025. 20121105: On i386 and amd64 systems WITH_CLANG_IS_CC is now the default. This means that the world and kernel will be compiled with clang and that clang will be installed as /usr/bin/cc, /usr/bin/c++, and /usr/bin/cpp.
To disable this behavior and revert to building with gcc, compile with WITHOUT_CLANG_IS_CC. Really old versions of current may need to bootstrap WITHOUT_CLANG first if the clang build fails (its compatibility window doesn't extend to the 9 stable branch point). 20121102: The IPFIREWALL_FORWARD kernel option has been removed. Its functionality is now turned on by default. 20121023: The ZERO_COPY_SOCKET kernel option has been removed and split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP. NB: SOCKET_SEND_COW uses the VM page based copy-on-write mechanism which is not safe and may result in kernel crashes. NB: The SOCKET_RECV_PFLIP mechanism is useless as no current driver supports disposable external page sized mbuf storage. Proper replacements for both zero-copy mechanisms are under consideration and will eventually lead to complete removal of the two kernel options. 20121023: The IPv4 network stack has been converted to network byte order. The following modules need to be recompiled together with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4), pf(4), ipfw(4), ng_ipfw(4), stf(4). 20121022: Support for non-MPSAFE filesystems was removed from VFS. The VFS_VERSION was bumped, all filesystem modules shall be recompiled. 20121018: All the non-MPSAFE filesystems have been disconnected from the build. The full list includes: codafs, hpfs, ntfs, nwfs, portalfs, smbfs, xfs. 20121016: The interface cloning API and ABI has changed. The following modules need to be recompiled together with kernel: ipfw(4), pfsync(4), pflog(4), usb(4), wlan(4), stf(4), vlan(4), disc(4), edsc(4), if_bridge(4), gif(4), tap(4), faith(4), epair(4), enc(4), tun(4), if_lagg(4), gre(4). 20121015: The sdhci driver was split in two parts: sdhci (generic SD Host Controller logic) and sdhci_pci (actual hardware driver). No kernel config modifications are required, but if you load sdhc as a module you must switch to sdhci_pci instead. 20121014: Import the FUSE kernel and userland support into base system.
20121013: The GNU sort(1) program has been removed since the BSD-licensed sort(1) has been the default for quite some time and no serious problems have been reported. The corresponding WITH_GNU_SORT knob has also gone. 20121006: The pfil(9) API/ABI for AF_INET family has been changed. Packet filtering modules: pf(4), ipfw(4), ipfilter(4) need to be recompiled with new kernel. 20121001: The net80211(4) ABI has been changed to allow for improved driver PS-POLL and power-save support. All wireless drivers need to be recompiled to work with the new kernel. 20120913: The random(4) support for the VIA hardware random number generator (`PADLOCK') is no longer enabled unconditionally. Add the padlock_rng device in the custom kernel config if needed. The GENERIC kernels on i386 and amd64 do include the device, so the change only affects the custom kernel configurations. 20120908: The pf(4) packet filter ABI has been changed. pfctl(8) and snmp_pf module need to be recompiled to work with new kernel. 20120828: A new ZFS feature flag "com.delphix:empty_bpobj" has been merged to -HEAD. Pools that have empty_bpobj in active state can not be imported read-write with ZFS implementations that do not support this feature. For more information read the zpool-features(5) manual page. 20120727: The sparc64 ZFS loader has been changed to no longer try to auto- detect ZFS providers based on diskN aliases but now requires these to be explicitly listed in the OFW boot-device environment variable. 20120712: The OpenSSL has been upgraded to 1.0.1c. Any binaries requiring libcrypto.so.6 or libssl.so.6 must be recompiled. Also, there are configuration changes. Make sure to merge /etc/ssl/openssl.cnf. 20120712: The following sysctls and tunables have been renamed for consistency with other variables: kern.cam.da.da_send_ordered -> kern.cam.da.send_ordered kern.cam.ada.ada_send_ordered -> kern.cam.ada.send_ordered 20120628: The sort utility has been replaced with BSD sort. 
For now, GNU sort is also available as "gnusort" or the default can be set back to GNU sort by setting WITH_GNU_SORT. In this case, BSD sort will be installed as "bsdsort". 20120611: A new version of ZFS (pool version 5000) has been merged to -HEAD. Starting with this version the old system of ZFS pool versioning is superseded by "feature flags". This concept enables forward compatibility against certain future changes in functionality of ZFS pools. The first read-only compatible "feature flag" for ZFS pools is named "com.delphix:async_destroy". For more information read the new zpool-features(5) manual page. Please refer to the "ZFS notes" section of this file for information on upgrading boot ZFS pools. 20120417: The malloc(3) implementation embedded in libc now uses sources imported as contrib/jemalloc. The most disruptive API change is to /etc/malloc.conf. If your system has an old-style /etc/malloc.conf, delete it prior to installworld, and optionally re-create it using the new format after rebooting. See malloc.conf(5) for details (specifically the TUNING section and the "opt.*" entries in the MALLCTL NAMESPACE section). 20120328: Big-endian MIPS TARGET_ARCH values no longer end in "eb". mips64eb is now spelled mips64. mipsn32eb is now spelled mipsn32. mipseb is now spelled mips. This is to aid compatibility with third-party software that expects this naming scheme in uname(3). Little-endian settings are unchanged. If you are updating a big-endian mips64 machine from before this change, you may need to set MACHINE_ARCH=mips64 in your environment before the new build system will recognize your machine. 20120306: Disable by default the option VFS_ALLOW_NONMPSAFE for all supported platforms. 20120229: Now unix domain sockets behave "as expected" on nullfs(5). 
Previously nullfs(5) did not pass through all behaviours to the underlying layer, as a result if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless of which layer path one binds to. 20120211: The getifaddrs upgrade path broken with 20111215 has been restored. If you have upgraded in between 20111215 and 20120209 you need to recompile libc again with your kernel. You still need to recompile world to be able to configure CARP but this restriction already comes from 20111215. 20120114: The set_rcvar() function has been removed from /etc/rc.subr. All base and ports rc.d scripts have been updated, so if you have a port installed with a script in /usr/local/etc/rc.d you can either hand-edit the rcvar= line, or reinstall the port. An easy way to handle the mass-update of /etc/rc.d: rm /etc/rc.d/* && mergemaster -i 20120109: panic(9) now stops other CPUs in the SMP systems, disables interrupts on the current CPU and prevents other threads from running. This behavior can be reverted using the kern.stop_scheduler_on_panic tunable/sysctl. The new behavior can be incompatible with kern.sync_on_panic. 20111215: The carp(4) facility has been changed significantly. Configuration of the CARP protocol via ifconfig(8) has changed, as well as format of CARP events submitted to devd(8) has changed. See manual pages for more information. The arpbalance feature of carp(4) is currently not supported anymore. Size of struct in_aliasreq, struct in6_aliasreq has changed. User utilities using SIOCAIFADDR, SIOCAIFADDR_IN6, e.g. ifconfig(8), need to be recompiled. 20111122: The acpi_wmi(4) status device /dev/wmistat has been renamed to /dev/wmistat0. 20111108: The VFS_ALLOW_NONMPSAFE option has been added in order to explicitly support non-MPSAFE filesystems.
It is on by default for all supported platforms at present. 20111101: The broken amd(4) driver has been replaced with esp(4) in the amd64, i386 and pc98 GENERIC kernel configuration files. 20110930: sysinstall has been removed. 20110923: The stable/9 branch was created in subversion. This corresponds to the RELENG_9 branch in CVS. 20110913: This commit modifies vfs_register() so that it uses a hash calculation to set vfc_typenum, which is enabled by default. The first time a system is booted after this change, the vfc_typenum values will change for all file systems. The main effect of this is a change to the NFS server file handles for file systems that use vfc_typenum in their fsid, such as ZFS. It will, however, prevent vfc_typenum from changing when file systems are loaded in a different order for subsequent reboots. To disable this, you can set vfs.typenumhash=0 in /boot/loader.conf until you are ready to remount all NFS clients after a reboot. 20110828: Bump the shared library version numbers for libraries that do not use symbol versioning, have changed the ABI compared to stable/8 and whose shared library version had not been bumped. Done as part of the 9.0-RELEASE cycle. 20110815: During the merge of Capsicum features, the fget(9) KPI was modified. This may require the rebuilding of out-of-tree device drivers -- issues have been reported specifically with the nVidia device driver. __FreeBSD_version is bumped to 900041. Also, there was a period between 20110811 and 20110814 where the special devices /dev/{stdin,stdout,stderr} did not work correctly. Building world from a kernel during that window may not work. 20110628: The packet filter (pf) code has been updated to OpenBSD 4.5. You need to update userland tools to be in sync with the kernel. This update breaks backward compatibility with earlier pfsync(4) versions. Care must be taken when updating redundant firewall setups.
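The vfs.typenumhash setting from the 20110913 entry above can be added to a loader.conf file without duplicating it on repeated runs. The following is a minimal sh sketch and not part of the original notes: the helper name set_loader_tunable is hypothetical, and it takes the file path as an argument so that it can be tried on a scratch copy before touching /boot/loader.conf.

```sh
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): append NAME=VALUE to FILE
# unless FILE already has a line starting with NAME=.
# Usage: set_loader_tunable FILE NAME VALUE
set_loader_tunable() {
    _file=$1
    _name=$2
    _value=$3
    # Dots in NAME act as regex wildcards here; that only makes the
    # match slightly looser and is harmless for tunable names.
    if grep -q "^${_name}=" "$_file" 2>/dev/null; then
        return 0    # an existing setting is left untouched
    fi
    printf '%s=%s\n' "$_name" "$_value" >> "$_file"
}

# Example on a scratch file; calling it twice adds the line only once.
scratch=$(mktemp)
set_loader_tunable "$scratch" vfs.typenumhash 0
set_loader_tunable "$scratch" vfs.typenumhash 0
cat "$scratch"
```

Because an existing line is never rewritten, a value the administrator has already chosen is preserved; remove the line by hand once all NFS clients have been remounted.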
20110608: The following sysctls and tunables are retired on x86 platforms: machdep.hlt_cpus and machdep.hlt_logical_cpus. The following sysctl is retired: machdep.hyperthreading_allowed. The sysctls were supposed to provide a way to dynamically offline and online selected CPUs on x86 platforms, but the implementation has not been reliable, especially with the SCHED_ULE scheduler. The machdep.hyperthreading_allowed tunable is still available to ignore hyperthreading CPUs at the OS level. Individual CPUs can be disabled using the hint.lapic.X.disabled tunable, where X is the APIC ID of a CPU. Be advised, though, that disabling CPUs in a non-uniform fashion will result in a non-uniform topology and may lead to sub-optimal system performance with SCHED_ULE, which is the default scheduler. 20110607: The cpumask_t type is retired and cpuset_t is used in order to describe a mask of CPUs. 20110531: Changes to ifconfig(8) for dynamic address family detection mandate that you are running a kernel of 20110525 or later. Make sure to follow the update procedure to boot a new kernel before installing world. 20110513: Support for the sun4v architecture is officially dropped. 20110503: Several KPI breaking changes have been committed to the mii(4) layer, the PHY drivers and consequently some Ethernet drivers using mii(4). This means that miibus.ko and the modules of the affected Ethernet drivers need to be recompiled. Note to kernel developers: Given that the OUI bit reversion problem was fixed as part of these changes, all mii(4) commits related to OUIs, i.e. to sys/dev/mii/miidevs, PHY driver probing and vendor specific handling, can no longer be merged verbatim to stable/8 and previous branches. 20110430: Users of the Atheros AR71xx SoC code now need to add 'device ar71xx_pci' into their kernel configurations along with 'device pci'. 20110427: The default NFS client is now the new NFS client, so fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland changes, it is recommended that the mount(8) and mount_nfs(8) commands be rebuilt from sources and that a link to mount_nfs called mount_oldnfs be created. The new client is compiled into the kernel with "options NFSCL" and this is needed for diskless root file systems. The GENERIC kernel configs have been changed to use NFSCL and NFSD (the new server) instead of NFSCLIENT and NFSSERVER. To use the regular/old client, you can "mount -t oldnfs ...". For a diskless root file system, you must also include a line like: vfs.root.mountfrom="oldnfs:" in the boot/loader.conf on the root fs on the NFS server to make a diskless root fs use the old client. 20110424: The GENERIC kernels for all architectures now default to the new CAM-based ATA stack. It means that all legacy ATA drivers were removed and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers starting from zero for each type in order of detection, unless configured otherwise with tunables, see cam(4)). There will be symbolic links created in /dev/ to map old adX devices to the respective adaY. They should provide basic compatibility for file systems mounting in most cases, but they do not support old user-level APIs and do not have respective providers in GEOM. Consider using updated management tools with new device names. It is possible to load devices ahci, ata, siis and mvs as modules, but option ATA_CAM should remain in kernel configuration to make ata module work as CAM driver supporting legacy ATA controllers. Device ata still can be used in modular fashion (atacore + ...). Modules atadisk and atapi* are not used and won't affect operation in ATA_CAM mode. 
Note that to use CAM-based ATA, the kernel should include the CAM devices scbus, pass, da (or explicitly ada), cd and optionally others. All of them are parts of the cam module. ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load the geom_raid kernel module and use the graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX. No kernel config options or code have been removed, so if a problem arises, please report it and optionally revert to the old ATA stack. To do so, remove from the kernel config:
	options ATA_CAM
	device ahci
	device mvs
	device siis
and instead add back:
	device atadisk # ATA disk drives
	device ataraid # ATA RAID drives
	device atapicd # ATAPI CDROM drives
	device atapifd # ATAPI floppy drives
	device atapist # ATAPI tape drives
20110423: The default NFS server has been changed to the new server, which was referred to as the experimental server. If you need to switch back to the old NFS server, you must now put the "-o" option on both the mountd and nfsd commands. This can be done using the mountd_flags and nfs_server_flags rc.conf variables until an update to the rc scripts is committed, which is coming soon. 20110418: The GNU Objective-C runtime library (libobjc), and other Objective-C related components have been removed from the base system. If you require an Objective-C library, please use one of the available ports. 20110331: ath(4) has been split into bus and device modules. if_ath contains the HAL, the TX rate control and the network device code. if_ath_pci contains the PCI bus glue. For Atheros MIPS embedded systems, if_ath_ahb contains the AHB glue. Users need to load both if_ath_pci and if_ath in order to use ath on everything else. TO REPEAT: if_ath_ahb is not needed for normal users. Normal users only need to load if_ath and if_ath_pci for ath(4) operation. 20110314: As part of the replacement of sysinstall, the process of building release media has changed significantly.
For details, please re-read release(7), which has been updated to reflect the new build process. 20110218: GNU binutils 2.17.50 (as of 2007-07-03) has been merged to -HEAD. This is the last available version under GPLv2. It brings a number of new features, such as support for newer x86 CPUs (with SSE-3, SSSE-3, SSE 4.1 and SSE 4.2), better support for powerpc64, a number of new directives, and lots of other small improvements. See the ChangeLog file in contrib/binutils for the full details. 20110218: IPsec's HMAC_SHA256-512 support has been fixed to be RFC4868 compliant, and will now use half of the hash for authentication. This will break interoperability with all stacks (including all current FreeBSD versions) that implement draft-ietf-ipsec-ciph-sha-256-00 (they use 96 bits of hash for authentication). The only workaround with such peers is to use another HMAC algorithm for IPsec ("phase 2") authentication. 20110207: Remove the uio_yield prototype and symbol. This function has been misnamed since it was introduced and should not be globally exposed with this name. The equivalent functionality is now available using kern_yield(curthread->td_user_pri). The function remains undocumented. 20110112: A SYSCTL_[ADD_]UQUAD was added for unsigned uint64_t pointers, symmetric with the existing SYSCTL_[ADD_]QUAD. Type checking for scalar sysctls is defined but disabled. Code that needs UQUAD to pass the type checking but must also compile on older systems where the define is not present can check against __FreeBSD_version >= 900030. The system dialog(1) has been replaced with a new version previously in ports as devel/cdialog. dialog(1) is mostly command-line compatible with the previous version, but the libdialog associated with it has a largely incompatible API. As such, the original version of libdialog will be kept temporarily as libodialog, until its base system consumers are replaced or updated. Bump __FreeBSD_version to 900030.
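Scripts can apply the same kind of gate as the __FreeBSD_version >= 900030 check in the 20110112 entry above. The following is a minimal sh sketch, assuming the version number is obtained from the kern.osreldate sysctl and that the installed headers match the running kernel. The function name version_at_least is hypothetical; C code would use the preprocessor check from the entry instead.

```sh
#!/bin/sh
# Hypothetical helper: succeed when CURRENT is at least MINIMUM.
# Usage: version_at_least CURRENT MINIMUM
version_at_least() {
    [ "$1" -ge "$2" ]
}

# On FreeBSD the value can come from: sysctl -n kern.osreldate
# A fixed sample value is used here so the sketch runs anywhere.
osreldate=900030
if version_at_least "$osreldate" 900030; then
    echo "SYSCTL_[ADD_]UQUAD should be available"
else
    echo "SYSCTL_[ADD_]UQUAD should not be assumed"
fi
```

The same helper applies to any of the version bumps recorded in this file, by substituting the number given in the relevant entry.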
20110103: If you are trying to run make universe on a -stable system, and you get the following warning: "Makefile", line 356: "Target architecture for i386/conf/GENERIC unknown. config(8) likely too old." or something similar to it, then you must upgrade your -stable system to 8.2-Release or newer (really, any time after r210146 7/15/2010 in stable/8) or build the config from the latest stable/8 branch and install it on your system. Prior to this date, building a current universe on an 8-stable system from between 7/15/2010 and 1/2/2011 would result in a weird shell parsing error in the first kernel build phase. A new config on those old systems will fix that problem for older versions of -current. 20101228: The TCP stack has been modified to allow Khelp modules to interact with it via helper hook points and store per-connection data in the TCP control block. Bump __FreeBSD_version to 900029. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20101114: Generic IEEE 802.3 annex 31B full duplex flow control support has been added to mii(4), and bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4), e1000phy(4) as well as ip1000phy(4) have been converted to take advantage of it instead of using custom implementations. This means that these drivers no longer unconditionally advertise support for flow control but only do so if flow control is a selected media option. The generic support was implemented that way in order to allow flow control to be switched on and off via ifconfig(8), with the PHY-specific default typically being off in order to protect against unwanted effects.
Consequently, if you used flow control with one of the above mentioned drivers you now need to explicitly enable it, for example via: ifconfig bge0 media auto mediaopt flowcontrol Along with the above mentioned changes generic support for setting 1000baseT master mode also has been added and brgphy(4), ciphy(4), e1000phy(4) as well as ip1000phy(4) have been converted to take advantage of it. This means that these drivers now no longer take the link0 parameter for selecting master mode but the master media option has to be used instead, for example like in the following: ifconfig bge0 media 1000baseT mediaopt full-duplex,master Selection of master mode now is also available with all other PHY drivers supporting 1000baseT. 20101111: The TCP stack has received a significant update to add support for modularised congestion control and generally improve the clarity of congestion control decisions. Bump __FreeBSD_version to 900025. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20101002: The man(1) utility has been replaced by a new version that no longer uses /etc/manpath.config. Please consult man.conf(5) for how to migrate local entries to the new format. 20100928: The copyright strings printed by login(1) and sshd(8) at the time of a new connection have been removed to follow other operating systems and upstream sshd. 20100915: A workaround for a fixed ld bug has been removed in kernel code, so make sure that your system ld is built from sources after revision 210245 from 2010-07-19 (r211583 if building head kernel on stable/8, r211584 for stable/7; both from 2010-08-21). A symptom of incorrect ld version is different addresses for set_pcpu section and __start_set_pcpu symbol in kernel and/or modules. 20100913: The $ipv6_prefer variable in rc.conf(5) has been split into $ip6addrctl_policy and $ipv6_activate_all_interfaces. 
The $ip6addrctl_policy is a variable to choose a pre-defined address selection policy set by ip6addrctl(8). The value "ipv4_prefer", "ipv6_prefer" or "AUTO" can be specified. The default is "AUTO". The $ipv6_activate_all_interfaces specifies whether the IFDISABLED flag (see the 20090926 entry) is set on an interface with no corresponding $ifconfig_IF_ipv6 line. The default is "NO" for security reasons. If you want IPv6 link-local addresses on all interfaces by default, set this to "YES". The old ipv6_prefer="YES" is equivalent to ipv6_activate_all_interfaces="YES" and ip6addrctl_policy="ipv6_prefer". 20100913: DTrace has grown support for userland tracing. Due to this, DTrace is now i386 and amd64 only. dtruss(1) is now installed by default on those systems and a new kernel module is needed for userland tracing: fasttrap. No changes to your kernel config file are necessary to enable userland tracing, but you might consider adding 'STRIP=' and 'CFLAGS+=-fno-omit-frame-pointer' to your make.conf if you want to have informative userland stack traces in DTrace (ustack). 20100725: The acpi_aiboost(4) driver has been removed in favor of the new aibs(4) driver. You should update your kernel configuration file. 20100722: BSD grep has been imported to the base system and it is built by default. It is completely BSD licensed, highly GNU-compatible, uses less memory than its GNU counterpart and has a small codebase. However, it is slower than its GNU counterpart. The difference is most noticeable for larger searches; for smaller ones it is measurable but not significant. The reasons are complex; the most important factor is that we lack a modern and efficient regex library, and GNU grep overcomes this by optimizing the searches internally. Future work on improving the regex performance is planned. In the meantime, users who need better performance can build GNU grep instead by setting the WITH_GNU_GREP knob.
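For the $ipv6_prefer split described in the first 20100913 entry above, the stated equivalence can be written as an rc.conf(5) fragment. This is only a sketch of the migration for a system that previously had ipv6_prefer="YES"; rc.conf uses sh syntax, so the lines are plain variable assignments.

```sh
# /etc/rc.conf fragment replacing the old ipv6_prefer="YES"
ipv6_activate_all_interfaces="YES"   # do not set IFDISABLED on interfaces without an $ifconfig_IF_ipv6 line
ip6addrctl_policy="ipv6_prefer"      # select the IPv6-preferred address selection policy
```

Systems that only want the address selection policy, without enabling IPv6 on every interface, would keep the default of "NO" for the first variable.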
20100713: Due to the import of powerpc64 support, all existing powerpc kernel configuration files must be updated with a machine directive like this: machine powerpc powerpc In addition, an updated config(8) is required to build powerpc kernels after this change. 20100713: A new version of ZFS (version 15) has been merged to -HEAD. This version uses a python library for the following subcommands: zfs allow, zfs unallow, zfs groupspace, zfs userspace. For full functionality of these commands the following port must be installed: sysutils/py-zfs 20100429: 'vm_page's are now hashed by physical address to an array of mutexes. Currently this is only used to serialize access to hold_count. Over time the page queue mutex will be peeled away. This changes the size of pmap on every architecture. And requires all callers of vm_page_hold and vm_page_unhold to be updated. 20100402: WITH_CTF can now be specified in src.conf (not recommended, there are some problems with static executables), make.conf (would also affect ports which do not use GNU make and do not override the compile targets) or in the kernel config (via "makeoptions WITH_CTF=yes"). When WITH_CTF was specified there before this was silently ignored, so make sure that WITH_CTF is not used in places which could lead to unwanted behavior. 20100311: The kernel option COMPAT_IA32 has been replaced with COMPAT_FREEBSD32 to allow 32-bit compatibility on non-x86 platforms. All kernel configurations on amd64 and ia64 platforms using these options must be modified accordingly. 20100113: The utmp user accounting database has been replaced with utmpx, the user accounting interface standardized by POSIX. Unfortunately the semantics of utmp and utmpx don't match, making it practically impossible to support both interfaces. The user accounting database is used by tools like finger(1), last(1), talk(1), w(1) and ac(8). All applications in the base system use utmpx. This means only local binaries (e.g. 
from the ports tree) may still use these utmp database files. These applications must be rebuilt to make use of utmpx. After the system has been upgraded, it is safe to remove the old log files (/var/run/utmp, /var/log/lastlog and /var/log/wtmp*), assuming their contents are no longer of any importance. Old wtmp databases can only be used by last(1) and ac(8) after they have been converted to the new format using wtmpcvt(1). 20100108: Introduce the kernel thread "deadlock resolver" (which can be enabled via the DEADLKRES option, see NOTES for more details) and the sleepq_type() function for sleepqueues. 20091202: The rc.firewall and rc.firewall6 were unified, and rc.firewall6 and rc.d/ip6fw were removed. With the removal of rc.d/ip6fw, the ipv6_firewall_* rc variables are obsolete. Instead, the following new rc variables are added to rc.d/ipfw: firewall_client_net_ipv6, firewall_simple_iif_ipv6, firewall_simple_inet_ipv6, firewall_simple_oif_ipv6, firewall_simple_onet_ipv6, firewall_trusted_ipv6. Their meanings correspond to the relevant IPv4 variables. 20091125: 8.0-RELEASE. 20091113: The default terminal emulation for syscons(4) has been changed from cons25 to xterm on all platforms except pc98. This means that the /etc/ttys file needs to be updated to ensure correct operation of applications on the console. The terminal emulation style can be toggled per window by using vidcontrol(1)'s -T flag. The TEKEN_CONS25 kernel configuration option can be used to change the compile-time default back to cons25. To prevent graphical artifacts, make sure the TERM environment variable is set to match the terminal emulation that is being performed by syscons(4). 20091109: The layout of the structure ieee80211req_scan_result has changed. Applications that require wireless scan results (e.g. ifconfig(8)) from net80211 need to be recompiled.
Applications such as wpa_supplicant(8) may require a full world build without using NO_CLEAN in order to get synchronized with the new structure. 20091025: The iwn(4) driver has been updated to support the 5000 and 5150 series. There's one kernel module for each firmware. Adding "device iwnfw" to the kernel configuration file means including all three firmware images inside the kernel. If you want to include just the one for your wireless card, use the devices iwn4965fw, iwn5000fw or iwn5150fw. 20090926: The rc.d/network_ipv6, IPv6 configuration script has been integrated into rc.d/netif. The changes are the following: 1. To use IPv6, simply define $ifconfig_IF_ipv6 like $ifconfig_IF for IPv4. For aliases, $ifconfig_IF_aliasN should be used. Note that both variables need the "inet6" keyword at the head. Do not set $ipv6_network_interfaces manually if you do not understand what you are doing. It is not needed in most cases. $ipv6_ifconfig_IF and $ipv6_ifconfig_IF_aliasN still work, but they are obsolete. 2. $ipv6_enable is obsolete. Use $ipv6_prefer and "inet6 accept_rtadv" keyword in ifconfig(8) instead. If you define $ipv6_enable=YES, it means $ipv6_prefer=YES and all configured interfaces have "inet6 accept_rtadv" in the $ifconfig_IF_ipv6. These are for backward compatibility. 3. A new variable $ipv6_prefer has been added. If NO, IPv6 functionality of interfaces with no corresponding $ifconfig_IF_ipv6 is disabled by using "inet6 ifdisabled" flag, and the default address selection policy of ip6addrctl(8) is the IPv4-preferred one (see rc.d/ip6addrctl for more details). Note that if you want to configure IPv6 functionality on the disabled interfaces after boot, first you need to clear the flag by using ifconfig(8) like: ifconfig em0 inet6 -ifdisabled If YES, the default address selection policy is set as IPv6-preferred. The default value of $ipv6_prefer is NO. 4. 
If your system needs to receive Router Advertisement messages, define "inet6 accept_rtadv" in $ifconfig_IF_ipv6. The rc(8) scripts automatically invoke rtsol(8) when the interface becomes UP. The Router Advertisement messages are used for SLAAC (State-Less Address AutoConfiguration). 20090922: 802.11s D3.03 support was committed. This is incompatible with the previous code, which was based on D3.0. 20090912: A sysctl variable net.inet6.ip6.accept_rtadv now sets the default value of a per-interface flag ND6_IFF_ACCEPT_RTADV, rather than acting as a global knob that controls whether Router Advertisement messages are accepted. Also, a per-interface flag ND6_IFF_AUTO_LINKLOCAL has been added and a sysctl variable net.inet6.ip6.auto_linklocal sets its default value. The ifconfig(8) utility now supports these flags. 20090910: ZFS snapshots are now mounted with the MNT_IGNORE flag. Use the -v option for mount(8) and the -a option for df(1) to see them. 20090825: The old tunable hw.bus.devctl_disable has been superseded by hw.bus.devctl_queue. hw.bus.devctl_disable=1 in loader.conf should be replaced by hw.bus.devctl_queue=0. The default for this new tunable is 1000. 20090813: Remove the option STOP_NMI. The default action is now to use NMI only for KDB via the newly introduced function stop_cpus_hard() and maintain stop_cpus() to just use a normal IPI_STOP on ia32 and amd64. 20090803: The stable/8 branch was created in subversion. This corresponds to the RELENG_8 branch in CVS. 20090719: Bump the shared library version numbers for all libraries that do not use symbol versioning as part of the 8.0-RELEASE cycle. Bump __FreeBSD_version to 800105. 20090714: Due to changes in the implementation of virtual network stack support, all network-related kernel modules must be recompiled. As this change breaks the ABI, bump __FreeBSD_version to 800104. 20090713: The TOE interface to the TCP syncache has been modified to remove struct tcpopt from the ABI of the network stack.
The cxgb driver is the only TOE consumer affected by this change, and needs to be recompiled along with the kernel. As this change breaks the ABI, bump __FreeBSD_version to 800103. 20090712: Padding has been added to struct tcpcb, sackhint and tcpstat in tcp_var.h to facilitate future MFCs and bug fixes whilst maintaining the ABI. However, this change breaks the ABI, so bump __FreeBSD_version to 800102. User space tools that rely on the size of any of these structs (e.g. sockstat) need to be recompiled. 20090630: The NFS_LEGACYRPC option has been removed along with the old kernel RPC implementation that this option selected. Kernel configurations may need to be adjusted. 20090629: The network interface device nodes at /dev/net/ have been removed. All ioctl operations can be performed the normal way using routing sockets. The kqueue functionality can generally be replaced with routing sockets. 20090628: The documentation from the FreeBSD Documentation Project (Handbook, FAQ, etc.) is now installed via packages by sysinstall(8), under the /usr/local/share/doc/freebsd directory instead of /usr/share/doc. 20090624: The ABI of various structures related to the SYSV IPC API has been changed. As a result, the COMPAT_FREEBSD[456] and COMPAT_43 kernel options now all require COMPAT_FREEBSD7. Bump __FreeBSD_version to 800100. 20090622: Layout of struct vnet has changed as routing related variables were moved to their own Vimage module. Modules need to be recompiled. Bump __FreeBSD_version to 800099. 20090619: NGROUPS_MAX and NGROUPS have been increased from 16 to 1023 and 1024 respectively. As long as no more than 16 groups per process are used, no changes should be visible. When more than 16 groups are used, old binaries may fail if they call getgroups() or getgrouplist() with statically sized storage.
Recompiling will work around this, but applications should be modified to use dynamically allocated storage for group arrays as POSIX.1-2008 does not cap an implementation's number of supported groups at NGROUPS_MAX+1 as previous versions did. NFS and portalfs mounts may also be affected as the list of groups is truncated to 16. Users of NFS who use more than 16 groups, should take care that negative group permissions are not used on the exported file systems as they will not be reliable unless a GSSAPI based authentication method is used. 20090616: The compiling option ADAPTIVE_LOCKMGRS has been introduced. This option compiles in the support for adaptive spinning for lockmgrs which want to enable it. The lockinit() function now accepts the flag LK_ADAPTIVE in order to make the lock object subject to adaptive spinning when both held in write and read mode. 20090613: The layout of the structure returned by IEEE80211_IOC_STA_INFO has changed. User applications that use this ioctl need to be rebuilt. 20090611: The layout of struct thread has changed. Kernel and modules need to be rebuilt. 20090608: The layout of structs ifnet, domain, protosw and vnet_net has changed. Kernel modules need to be rebuilt. Bump __FreeBSD_version to 800097. 20090602: window(1) has been removed from the base system. It can now be installed from ports. The port is called misc/window. 20090601: The way we are storing and accessing `routing table' entries has changed. Programs reading the FIB, like netstat, need to be re-compiled. 20090601: A new netisr implementation has been added for FreeBSD 8. Network file system modules, such as igmp, ipdivert, and others, should be rebuilt. Bump __FreeBSD_version to 800096. 20090530: Remove the tunable/sysctl debug.mpsafevfs as its initial purpose is no more valid. 20090530: Add VOP_ACCESSX(9). File system modules need to be rebuilt. Bump __FreeBSD_version to 800094. 20090529: Add mnt_xflag field to 'struct mount'. 
File system modules need to be rebuilt. Bump __FreeBSD_version to 800093. 20090528: The compile-time option ADAPTIVE_SX has been retired and the option NO_ADAPTIVE_SX, which handles the reversed logic, has been introduced. The flags accepted by the sx_init_flags() KPI change accordingly: the SX_ADAPTIVESPIN flag has been retired and the SX_NOADAPTIVE flag has been introduced in order to handle the reversed logic. Bump __FreeBSD_version to 800092. 20090527: Add support for hierarchical jails. Remove global securelevel. Bump __FreeBSD_version to 800091. 20090523: The layout of struct vnet_net has changed, therefore modules need to be rebuilt. Bump __FreeBSD_version to 800090. 20090523: The newly imported zic(8) produces a new format in the output. Please run tzsetup(8) to install the newly created data to /etc/localtime. 20090520: The sysctl tree for the usb stack has been renamed from hw.usb2.* to hw.usb.* and is now consistent again with previous releases. 20090520: 802.11 monitor mode support was revised and driver APIs were changed. Drivers dependent on net80211 now support DLT_IEEE802_11_RADIO instead of DLT_IEEE802_11. No user-visible data structures were changed but applications that use DLT_IEEE802_11 may require changes. Bump __FreeBSD_version to 800088. 20090430: The layout of the following structs has changed: sysctl_oid, socket, ifnet, inpcbinfo, tcpcb, syncache_head, vnet_inet, vnet_inet6 and vnet_ipfw. Most modules need to be rebuilt or panics may be experienced. A world rebuild is required for correctly checking networking state from userland. Bump __FreeBSD_version to 800085. 20090429: MLDv2 and Source-Specific Multicast (SSM) have been merged to the IPv6 stack. VIMAGE hooks are in but not yet used. The implementation of SSM within FreeBSD's IPv6 stack closely follows the IPv4 implementation.
For kernel developers: * The most important changes are that the ip6_output() and ip6_input() paths no longer take the IN6_MULTI_LOCK, and this lock has been downgraded to a non-recursive mutex. * As with the changes to the IPv4 stack to support SSM, filtering of inbound multicast traffic must now be performed by transport protocols within the IPv6 stack. This does not apply to TCP and SCTP; however, it does apply to UDP in IPv6 and raw IPv6. * The KPIs used by IPv6 multicast are similar to those used by the IPv4 stack, with the following differences: * im6o_mc_filter() is analogous to imo_multicast_filter(). * The legacy KAME entry points in6_joingroup() and in6_leavegroup() are shimmed to in6_mc_join() and in6_mc_leave() respectively. * IN6_LOOKUP_MULTI() has been deprecated and removed. * IPv6 relies on MLD for the DAD mechanism. KAME's internal KPIs for MLDv1 have an additional 'timer' argument which is used to jitter the initial membership report for the solicited-node multicast membership on-link. * This is not strictly needed for MLDv2, which already jitters its report transmissions. However, the 'timer' argument is preserved in case MLDv1 is active on the interface. * The KAME linked-list based IPv6 membership implementation has been refactored to use a vector similar to that used by the IPv4 stack. Code which maintains a list of its own multicast memberships internally, e.g. carp, has been updated to reflect the new semantics. * There is a known Lock Order Reversal (LOR) due to in6_setscope() acquiring the IF_AFDATA_LOCK and being called within ip6_output(). Whilst MLDv2 tries to avoid this otherwise benign LOR, it is an implementation constraint which needs to be addressed in HEAD. For application developers: * The changes are broadly similar to those made for the IPv4 stack. * The use of IPv4 and IPv6 multicast socket options on the same socket, using mapped addresses, HAS NOT been tested or supported.
* There are a number of issues with the implementation of various IPv6 multicast APIs which need to be resolved in the API surface before the implementation is fully compatible with KAME userland use, and these are mostly to do with interface index treatment. * The literature available discusses the use of either the delta / ASM API with setsockopt(2)/getsockopt(2), or the full-state / ASM API using setsourcefilter(3)/getsourcefilter(3). For more information please refer to RFC 3678, 'Socket Interface Extensions for Multicast Source Filters'. * Applications which use the published RFC 3678 APIs should be fine. For systems administrators: * The mtest(8) utility has been refactored to support IPv6, in addition to IPv4. Interface addresses are no longer accepted as arguments; interface names must be used instead. The utility will map the interface name to its first IPv4 address as returned by getifaddrs(3). * The ifmcstat(8) utility has also been updated to print the MLDv2 endpoint state and source filter lists via sysctl(3). * The net.inet6.ip6.mcast.loop sysctl may be tuned to 0 to disable loopback of IPv6 multicast datagrams by default; it defaults to 1 to preserve the existing behaviour. Disabling multicast loopback is recommended for optimal system performance. * The IPv6 MROUTING code has been changed to examine this sysctl instead of attempting to perform a group lookup before looping back forwarded datagrams. Bump __FreeBSD_version to 800084. 20090422: Implement low-level Bluetooth HCI API. Bump __FreeBSD_version to 800083. 20090419: The layout of struct malloc_type, used by modules to register new memory allocation types, has changed. Most modules will need to be rebuilt or panics may be experienced. Bump __FreeBSD_version to 800081. 20090415: Anticipate overflowing inp_flags - add inp_flags2. This changes most offsets in inpcb, so checking v4 connection state will require a world rebuild. Bump __FreeBSD_version to 800080.
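The IPv6 multicast loopback default mentioned for systems administrators in the 20090429 entry above can be inspected before it is changed. The following is a minimal sh sketch, assuming sysctl(8) is in the PATH. The function name mcast_loop_state is hypothetical, and the function only reads the value.

```sh
#!/bin/sh
# Hypothetical helper: print the current net.inet6.ip6.mcast.loop value,
# or "unavailable" when the sysctl cannot be read (for example on a
# kernel that predates this change, or on a non-FreeBSD system).
mcast_loop_state() {
    _val=$(sysctl -n net.inet6.ip6.mcast.loop 2>/dev/null) || _val=""
    if [ -n "$_val" ]; then
        echo "$_val"
    else
        echo "unavailable"
    fi
}

mcast_loop_state
# To disable loopback by default, as the entry suggests:
#   sysctl net.inet6.ip6.mcast.loop=0
```

Reading the value first makes it easy to record the previous setting before following the recommendation to disable loopback.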
20090415: Add an llentry to struct route and struct route_in6. Modules embedding a struct route will need to be recompiled. Bump __FreeBSD_version to 800079. 20090414: The size of rt_metrics_lite and by extension rtentry has changed. Networking administration apps will need to be recompiled. The route command now supports show as an alias for get, weighting of routes, sticky and nostick flags to alter the behavior of stateful load balancing. Bump __FreeBSD_version to 800078. 20090408: Do not use Giant for kbdmux(4) locking. This is wrong and apparently causing more problems than it solves. This will re-open the issue where interrupt handlers may race with kbdmux(4) in polling mode. Typical symptoms include (but not limited to) duplicated and/or missing characters when low level console functions (such as gets) are used while interrupts are enabled (for example geli password prompt, mountroot prompt etc.). Disabling kbdmux(4) may help. 20090407: The size of structs vnet_net, vnet_inet and vnet_ipfw has changed; kernel modules referencing any of the above need to be recompiled. Bump __FreeBSD_version to 800075. 20090320: GEOM_PART has become the default partition slicer for storage devices, replacing GEOM_MBR, GEOM_BSD, GEOM_PC98 and GEOM_GPT slicers. It introduces some changes: MSDOS/EBR: the devices created from MSDOS extended partition entries (EBR) can be named differently than with GEOM_MBR and are now symlinks to devices with offset-based names. fstabs may need to be modified. BSD: the "geometry does not match label" warning is harmless in most cases but it points to problems in file system misalignment with disk geometry. The "c" partition is now implicit, covers the whole top-level drive and cannot be (mis)used by users. General: Kernel dumps are now not allowed to be written to devices whose partition types indicate they are meant to be used for file systems (or, in case of MSDOS partitions, as something else than the "386BSD" type). 
Most of these changes date approximately from 200812. 20090319: The uscanner(4) driver has been removed from the kernel. This follows Linux removing theirs in 2.6 and making libusb the default interface (supported by sane). 20090319: The multicast forwarding code has been cleaned up. netstat(1) only relies on KVM now for printing bandwidth upcall meters. The IPv4 and IPv6 modules are split into ip_mroute_mod and ip6_mroute_mod respectively. The config(5) options for statically compiling this code remain the same, i.e. 'options MROUTING'. 20090315: Support for the IFF_NEEDSGIANT network interface flag has been removed, which means that non-MPSAFE network device drivers are no longer supported. In particular, if_ar, if_sr, and network device drivers from the old (legacy) USB stack can no longer be built or used. 20090313: POSIX.1 Native Language Support (NLS) has been enabled in libc and a bunch of new language catalog files have also been added. This means that some common libc messages are now localized and they depend on the LC_MESSAGES environmental variable. 20090313: The k8temp(4) driver has been renamed to amdtemp(4) since support for Family 10 and Family 11 CPU families was added. 20090309: IGMPv3 and Source-Specific Multicast (SSM) have been merged to the IPv4 stack. VIMAGE hooks are in but not yet used. For kernel developers, the most important changes are that the ip_output() and ip_input() paths no longer take the IN_MULTI_LOCK(), and this lock has been downgraded to a non-recursive mutex. Transport protocols (UDP, Raw IP) are now responsible for filtering inbound multicast traffic according to group membership and source filters. The imo_multicast_filter() KPI exists for this purpose. Transports which do not use multicast (SCTP, TCP) already reject multicast by default. Forwarding and receive performance may improve as a mutex acquisition is no longer needed in the ip_input() low-level input path. 
in_addmulti() and in_delmulti() are shimmed to new KPIs which exist to support SSM in-kernel. For application developers, it is recommended that loopback of multicast datagrams be disabled for best performance, as loopback will still cause the lock to be taken for each looped-back datagram transmission. The net.inet.ip.mcast.loop sysctl may be tuned to 0 to disable loopback by default; it defaults to 1 to preserve the existing behaviour. For systems administrators, to obtain best performance with multicast reception and multiple groups, it is always recommended that a card with a suitably precise hash filter is used. Hash collisions will still result in the lock being taken within the transport protocol input path to check group membership. If deploying FreeBSD in an environment with IGMP snooping switches, it is recommended that the net.inet.igmp.sendlocal sysctl remain enabled; this forces 224.0.0.0/24 group membership to be announced via IGMP. The size of 'struct igmpstat' has changed; netstat needs to be recompiled to reflect this. Bump __FreeBSD_version to 800070. 20090309: libusb20.so.1 is now installed as libusb.so.1 and the ports system has been updated to use it. This requires a buildworld/installworld in order to update the library and dependencies (usbconfig, etc). It is advisable to rebuild all ports which use libusb. More specific directions are given in the ports collection UPDATING file. Any /etc/libmap.conf entries for libusb are no longer required and can be removed. 20090302: A workaround has been committed to allow the creation of System V shared memory segments of size > 2 GB on the 64-bit architectures. Due to a limitation of the existing ABI, the shm_segsz member of the struct shmid_ds, as returned by the shmctl(IPC_STAT) call, is wrong for large segments. Note that limits must be explicitly raised to allow such segments to be created. 20090301: The layout of struct ifnet has changed, requiring a rebuild of all network device driver modules.
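For the 20090309 IGMPv3 entry above: besides changing the net.inet.ip.mcast.loop default, an application can turn loopback off on an individual socket with the standard IP_MULTICAST_LOOP socket option. The following is a minimal C sketch, not code from the tree. The function names udp_socket_noloop() and multicast_loop_enabled() are hypothetical, and error handling is reduced to return codes.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an IPv4 UDP socket with multicast loopback
 * disabled.  Returns the descriptor, or -1 on failure.
 */
static int
udp_socket_noloop(void)
{
	unsigned char loop = 0;	/* 0 = do not loop back our own datagrams */
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_LOOP, &loop,
	    sizeof(loop)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/*
 * Hypothetical helper: read the option back.  Returns 1 if loopback is
 * enabled, 0 if it is disabled, or -1 on error.
 */
static int
multicast_loop_enabled(int s)
{
	unsigned char loop;
	socklen_t len = sizeof(loop);

	assert(s >= 0);
	if (getsockopt(s, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, &len) == -1)
		return (-1);
	return (loop != 0);
}
```

Setting the option per socket leaves the system-wide default alone, so other applications that rely on receiving their own multicast transmissions keep working.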
20090227: The /dev handling for the new USB stack has changed, a buildworld/installworld is required for libusb20. 20090223: The new USB2 stack has now been permanently moved in and all kernel and module names reverted to their previous values (eg, usb, ehci, ohci, ums, ...). The old usb stack can be compiled in by prefixing the name with the letter 'o', the old usb modules have been removed. Updating entry 20090216 for xorg and 20090215 for libmap may still apply. 20090217: The rc.conf(5) option if_up_delay has been renamed to defaultroute_delay to better reflect its purpose. If you have customized this setting in /etc/rc.conf you need to update it to use the new name. 20090216: xorg 7.4 wants to configure its input devices via hald which does not yet work with USB2. If the keyboard/mouse does not work in xorg then add Option "AllowEmptyInput" "off" to your ServerLayout section. This will cause X to use the configured kbd and mouse sections from your xorg.conf. 20090215: The GENERIC kernels for all architectures now default to the new USB2 stack. No kernel config options or code have been removed so if a problem arises please report it and optionally revert to the old USB stack. If you are loading USB kernel modules or have a custom kernel that includes GENERIC then ensure that usb names are also changed over, eg uftdi -> usb2_serial_ftdi. Older programs linked against the ports libusb 0.1 need to be redirected to the new stack's libusb20. /etc/libmap.conf can be used for this: # Map old usb library to new one for usb2 stack libusb-0.1.so.8 libusb20.so.1 20090209: All USB ethernet devices now attach as interfaces under the name ueN (eg. ue0). This is to provide a predictable name as vendors often change usb chipsets in a product without notice. 20090203: The ichsmb(4) driver has been changed to require SMBus slave addresses be left-justified (xxxxxxx0b) rather than right-justified. 
All of the other SMBus controller drivers require left-justified slave addresses, so this change makes all the drivers provide the same interface. 20090201: INET6 statistics (struct ip6stat) was updated. netstat(1) needs to be recompiled. 20090119: NTFS has been removed from GENERIC kernel on amd64 to match GENERIC on i386. Should not cause any issues since mount_ntfs(8) will load ntfs.ko module automatically when NTFS support is actually needed, unless ntfs.ko is not installed or security level prohibits loading kernel modules. If either is the case, "options NTFS" has to be added into kernel config. 20090115: TCP Appropriate Byte Counting (RFC 3465) support added to kernel. New field in struct tcpcb breaks ABI, so bump __FreeBSD_version to 800061. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled. 20081225: ng_tty(4) module updated to match the new TTY subsystem. Due to API change, user-level applications must be updated. New API support added to mpd5 CVS and expected to be present in next mpd5.3 release. 20081219: With __FreeBSD_version 800060 the makefs tool is part of the base system (it was a port). 20081216: The afdata and ifnet locks have been changed from mutexes to rwlocks, network modules will need to be re-compiled. 20081214: __FreeBSD_version 800059 incorporates the new arp-v2 rewrite. RTF_CLONING, RTF_LLINFO and RTF_WASCLONED flags are eliminated. The new code reduced struct rtentry{} by 16 bytes on 32-bit architecture and 40 bytes on 64-bit architecture. The userland applications "arp" and "ndp" have been updated accordingly. The output from "netstat -r" shows only routing entries and none of the L2 information. 20081130: __FreeBSD_version 800057 marks the switchover from the binary ath hal to source code. 
Users must add the line: options AH_SUPPORT_AR5416 to their kernel config files when specifying: device ath_hal The ath_hal module no longer exists; the code is now compiled together with the driver in the ath module. It is now possible to tailor chip support (i.e. reduce the set of chips and thereby the code size); consult ath_hal(4) for details. 20081121: __FreeBSD_version 800054 adds memory barriers to , new interfaces to ifnet to facilitate multiple hardware transmit queues for cards that support them, and a lock-less ring-buffer implementation to enable drivers to more efficiently manage queueing of packets. 20081117: A new version of ZFS (version 13) has been merged to -HEAD. This version has zpool attribute "listsnapshots" off by default, which means "zfs list" does not show snapshots, and is the same as Solaris behavior. 20081028: dummynet(4) ABI has changed. ipfw(8) needs to be recompiled. 20081009: The uhci, ohci, ehci and slhci USB Host controller drivers have been put into separate modules. If you load the usb module separately through loader.conf you will need to load the appropriate *hci module as well. E.g. for a UHCI-based USB 2.0 controller add the following to loader.conf: uhci_load="YES" ehci_load="YES" 20081009: The ABI used by the PMC toolset has changed. Please keep userland (libpmc(3)) and the kernel module (hwpmc(4)) in sync. 20081009: atapci kernel module now includes only generic PCI ATA driver. AHCI driver moved to ataahci kernel module. All vendor-specific code moved into separate kernel modules: ataacard, ataacerlabs, ataadaptec, ataamd, ataati, atacenatek, atacypress, atacyrix, atahighpoint, ataintel, ataite, atajmicron, atamarvell, atamicron, atanational, atanetcell, atanvidia, atapromise, ataserverworks, atasiliconimage, atasis, atavia 20080820: The TTY subsystem of the kernel has been replaced by a new implementation, which provides better scalability and an improved driver model. 
Most common drivers have been migrated to the new TTY subsystem, while others have not. The following drivers have not yet been ported to the new TTY layer: PCI/ISA: cy, digi, rc, rp, sio USB: ubser, ucycom Line disciplines: ng_h4, ng_tty, ppp, sl, snp Adding these drivers to your kernel configuration file shall cause compilation to fail. 20080818: ntpd has been upgraded to 4.2.4p5. 20080801: OpenSSH has been upgraded to 5.1p1. For many years, FreeBSD's version of OpenSSH preferred DSA over RSA for host and user authentication keys. With this upgrade, we've switched to the vendor's default of RSA over DSA. This may cause upgraded clients to warn about unknown host keys even for previously known hosts. Users should follow the usual procedure for verifying host keys before accepting the RSA key. This can be circumvented by setting the "HostKeyAlgorithms" option to "ssh-dss,ssh-rsa" in ~/.ssh/config or on the ssh command line. Please note that the sequence of keys offered for authentication has been changed as well. You may want to specify IdentityFile in a different order to revert this behavior. 20080713: The sio(4) driver has been removed from the i386 and amd64 kernel configuration files. This means uart(4) is now the default serial port driver on those platforms as well. To prevent collisions with the sio(4) driver, the uart(4) driver uses different names for its device nodes. This means the onboard serial port will now most likely be called "ttyu0" instead of "ttyd0". You may need to reconfigure applications to use the new device names. When using the serial port as a boot console, be sure to update /boot/device.hints and /etc/ttys before booting the new kernel. If you forget to do so, you can still manually specify the hints at the loader prompt: set hint.uart.0.at="isa" set hint.uart.0.port="0x3F8" set hint.uart.0.flags="0x10" set hint.uart.0.irq="4" boot -s 20080609: The gpt(8) utility has been removed. Use gpart(8) to partition disks instead. 
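For the 20080713 sio(4) entry above: configuration that names serial device nodes has to follow the rename from "ttyd0" to "ttyu0". The following is a minimal C sketch of that mapping, not code from the tree. The function name sio_to_uart_name() is hypothetical, and the sketch assumes that only the "ttyd" prefix changes while the unit number is kept, which the entry itself describes only as the most likely outcome.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: translate an sio(4)-style node name such as
 * "ttyd0" into the uart(4)-style "ttyu0".  Names that do not start with
 * "ttyd" are copied unchanged.  Returns 0 on success, or -1 if the
 * result does not fit in the caller's buffer.
 */
static int
sio_to_uart_name(const char *old, char *buf, size_t buflen)
{
	int n;

	assert(old != NULL && buf != NULL);
	if (strncmp(old, "ttyd", 4) == 0)
		n = snprintf(buf, buflen, "ttyu%s", old + 4);
	else
		n = snprintf(buf, buflen, "%s", old);
	/* snprintf() returns the length it wanted to write. */
	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}
```

A caller would pass each device name found in its own configuration through the helper and check the result against the nodes actually present under /dev before relying on it.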
20080603: The Linux kernel version that the Linuxulator emulates was changed from 2.4.2 to 2.6.16. If you experience any problems with Linux binaries, try setting the compat.linux.osrelease sysctl to 2.4.2 and, if that fixes the problem, contact the emulation mailing list. 20080525: ISDN4BSD (I4B) was removed from the src tree. You may need to update your kernel configuration and remove relevant entries. 20080509: I have checked in code to support multiple routing tables. See the man pages setfib(1) and setfib(2). This is a hopefully backwards compatible version, but to make use of it you need to compile your kernel with options ROUTETABLES=2 (or more, up to 16). 20080420: The 802.11 wireless support was redone to enable multi-bss operation on devices that are capable. The underlying device is no longer used directly but instead wlanX devices are cloned with ifconfig. This requires changes to rc.conf files. For example, change: ifconfig_ath0="WPA DHCP" to wlans_ath0=wlan0 ifconfig_wlan0="WPA DHCP" See rc.conf(5) for more details. In addition, mergemaster of /etc/rc.d is highly recommended. Simultaneous update of userland and kernel wouldn't hurt either. As part of the multi-bss changes the wlan_scan_ap and wlan_scan_sta modules were merged into the base wlan module. All references to these modules (e.g. in kernel config files) must be removed. 20080408: psm(4) has gained write(2) support in native operation level. Arbitrary commands can be written to /dev/psm%d and status can be read back from it. Therefore, an application is responsible for status validation and error recovery. It is a no-op in other operation levels. 20080312: Support for KSE threading has been removed from the kernel. To run legacy applications linked against KSE, libmap.conf may be used. The following libmap.conf may be used to ensure compatibility with any prior release: libpthread.so.1 libthr.so.1 libpthread.so.2 libthr.so.2 libkse.so.3 libthr.so.3 20080301: The layout of struct vmspace has changed.
This affects libkvm and any executables that link against libkvm and use the kvm_getprocs() function. In particular, but not exclusively, it affects ps(1), fstat(1), pkill(1), systat(1), top(1) and w(1). The effects are minimal, but it's advisable to upgrade world nonetheless. 20080229: The latest em driver no longer has support in it for the 82575 adapter, this is now moved to the igb driver. The split was done to make new features that are incompatible with older hardware easier to do. 20080220: The new geom_lvm(4) geom class has been renamed to geom_linux_lvm(4), likewise the kernel option is now GEOM_LINUX_LVM. 20080211: The default NFS mount mode has changed from UDP to TCP for increased reliability. If you rely on (insecurely) NFS mounting across a firewall you may need to update your firewall rules. 20080208: Belatedly note the addition of m_collapse for compacting mbuf chains. 20080126: The fts(3) structures have been changed to use adequate integer types for their members and so to be able to cope with huge file trees. The old fts(3) ABI is preserved through symbol versioning in libc, so third-party binaries using fts(3) should still work, although they will not take advantage of the extended types. At the same time, some third-party software might fail to build after this change due to unportable assumptions made in its source code about fts(3) structure members. Such software should be fixed by its vendor or, in the worst case, in the ports tree. FreeBSD_version 800015 marks this change for the unlikely case that a portable fix is impossible. 20080123: To upgrade to -current after this date, you must be running FreeBSD not older than 6.0-RELEASE. Upgrading to -current from 5.x now requires a stop over at RELENG_6 or RELENG_7 systems. 20071128: The ADAPTIVE_GIANT kernel option has been retired because its functionality is the default now. 20071118: The AT keyboard emulation of sunkbd(4) has been turned on by default. 
In order to make the special symbols of the Sun keyboards driven by sunkbd(4) work under X, these now have to be configured the same way as Sun USB keyboards driven by ukbd(4) (which also does AT keyboard emulation), e.g.: Option "XkbLayout" "us" Option "XkbRules" "xorg" Option "XkbSymbols" "pc(pc105)+sun_vndr/usb(sun_usb)+us" 20071024: It has been decided that it is desirable to provide ABI backwards compatibility to the FreeBSD 4/5/6 versions of the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken with the introduction of PCI domain support (see the 20070930 entry). Unfortunately, this required the ABI of PCIOCGETCONF to be broken again in order to be able to provide backwards compatibility to the old version of that IOCTL. Thus consumers of PCIOCGETCONF have to be recompiled again. As for prominent ports, this affects neither pciutils nor xorg-server this time; the hal port, however, needs to be rebuilt. 20071020: The misnamed kthread_create() and friends have been renamed to kproc_create() etc. Many of the callers already used kproc_start(). I will return kthread_create() and friends in a while with implementations that actually create threads, not procs. Renaming corresponds with version 800002. 20071010: RELENG_7 branched. COMMON ITEMS: General Notes ------------- Avoid using make -j when upgrading. While generally safe, there are sometimes problems using -j to upgrade. If your upgrade fails with -j, please try again without -j. From time to time in the past there have been problems using -j with buildworld and/or installworld. This is especially true when upgrading between "distant" versions (e.g. ones that cross a major release boundary or several minor releases, or when several months have passed on the -current branch). Sometimes, obscure build problems are the result of environment poisoning. This can happen because the make utility reads its environment when searching for values for global variables.
To run your build attempts in an "environmental clean room", prefix all make commands with 'env -i '. See the env(1) manual page for more details. When upgrading from one major version to another it is generally best to upgrade to the latest code in the currently installed branch first, then do an upgrade to the new branch. This is the best-tested upgrade path, and has the highest probability of being successful. Please try this approach before reporting problems with a major version upgrade. When upgrading a live system, having a root shell around before installing anything can help undo problems. Not having a root shell around can lead to problems if pam has changed too much from your starting point to allow continued authentication after the upgrade. ZFS notes --------- When upgrading the boot ZFS pool to a new version, always follow these two steps: 1.) recompile and reinstall the ZFS boot loader and boot block (this is part of "make buildworld" and "make installworld") 2.) update the ZFS boot block on your boot drive The following example updates the ZFS boot block on the first partition (freebsd-boot) of a GPT partitioned drive ad0: "gpart bootcode -p /boot/gptzfsboot -i 1 ad0" Non-boot pools do not need these updates. To build a kernel ----------------- If you are updating from a prior version of FreeBSD (even one just a few days old), you should follow this procedure. 
It is the most failsafe as it uses a /usr/obj tree with a fresh mini-buildworld, make kernel-toolchain make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE To test a kernel once --------------------- If you just want to boot a kernel once (because you are not sure if it works, or if you want to boot a known bad kernel to provide debugging information) run make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel nextboot -k testkernel To just build a kernel when you know that it won't mess you up -------------------------------------------------------------- This assumes you are already running a CURRENT system. Replace ${arch} with the architecture of your machine (e.g. "i386", "arm", "amd64", "ia64", "pc98", "sparc64", "powerpc", "mips", etc). cd src/sys/${arch}/conf config KERNEL_NAME_HERE cd ../compile/KERNEL_NAME_HERE make depend make make install If this fails, go to the "To build a kernel" section. To rebuild everything and install it on the current system. ----------------------------------------------------------- # Note: sometimes if you are running current you gotta do more than # is listed here if you are upgrading from a really old current. make buildworld make kernel KERNCONF=YOUR_KERNEL_HERE [1] [3] mergemaster -p [5] make installworld mergemaster -i [4] make delete-old [6] To cross-install current onto a separate partition -------------------------------------------------- # In this approach we use a separate partition to hold # current's root, 'usr', and 'var' directories. A partition # holding "/", "/usr" and "/var" should be about 2GB in # size. 
make buildworld make buildkernel KERNCONF=YOUR_KERNEL_HERE make installworld DESTDIR=${CURRENT_ROOT} make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT} cp /etc/fstab ${CURRENT_ROOT}/etc/fstab # if newfs'd To upgrade in-place from stable to current ---------------------------------------------- make buildworld [9] make kernel KERNCONF=YOUR_KERNEL_HERE [8] [1] [3] mergemaster -p [5] make installworld mergemaster -i [4] make delete-old [6] Make sure that you've read the UPDATING file to understand the tweaks to various things you need. At this point in the life cycle of current, things change often and you are on your own to cope. The defaults can also change, so please read ALL of the UPDATING entries. Also, if you are tracking -current, you must be subscribed to freebsd-current@freebsd.org. Make sure that before you update your sources that you have read and understood all the recent messages there. If in doubt, please track -stable which has much fewer pitfalls. [1] If you have third party modules, such as vmware, you should disable them at this point so they don't crash your system on reboot. [3] From the bootblocks, boot -s, and then do fsck -p mount -u / mount -a cd src adjkerntz -i # if CMOS is wall time Also, when doing a major release upgrade, it is required that you boot into single user mode to do the installworld. [4] Note: This step is non-optional. Failure to do this step can result in a significant reduction in the functionality of the system. Attempting to do it by hand is not recommended and those that pursue this avenue should read this file carefully, as well as the archives of freebsd-current and freebsd-hackers mailing lists for potential gotchas. The -U option is also useful to consider. See mergemaster(8) for more information. [5] Usually this step is a noop. However, from time to time you may need to do this if you get unknown user in the following step. 
It never hurts to do it all the time. You may need to install a new mergemaster (cd src/usr.sbin/mergemaster && make install) after the buildworld before this step if you last updated from current before 20130425 or from -stable before 20130430. [6] This only deletes old files and directories. Old libraries can be deleted by "make delete-old-libs", but you have to make sure that no program is using those libraries anymore. [8] In order to have a kernel that can run the 4.x binaries needed to do an installworld, you must include the COMPAT_FREEBSD4 option in your kernel. Failure to do so may leave you with a system that is hard to boot to recover. A similar kernel option COMPAT_FREEBSD5 is required to run the 5.x binaries on more recent kernels. And so on for COMPAT_FREEBSD6 and COMPAT_FREEBSD7. Make sure that you merge any new devices from GENERIC since the last time you updated your kernel config file. [9] When checking out sources, you must include the -P flag to have cvs prune empty directories. If CPUTYPE is defined in your /etc/make.conf, make sure to use the "?=" instead of the "=" assignment operator, so that buildworld can override the CPUTYPE if it needs to. MAKEOBJDIRPREFIX must be defined in an environment variable, and not on the command line, or in /etc/make.conf. buildworld will warn if it is improperly defined. FORMAT: This file contains a list, in reverse chronological order, of major breakages in tracking -current. It is not guaranteed to be a complete list of such breakages, and only contains entries since October 10, 2007. If you need to see UPDATING entries from before that date, you will need to fetch an UPDATING file from an older FreeBSD release. Copyright information: Copyright 1998-2009 M. Warner Losh. All Rights Reserved. Redistribution, publication, translation and use, with or without modification, in full or in part, in any form or format of this document are permitted without further permission from the author. 
THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Contact Warner Losh if you have any questions about your use of this document. $FreeBSD$ Index: releng/10.3/sbin/dhclient/dhclient.c =================================================================== --- releng/10.3/sbin/dhclient/dhclient.c (revision 303983) +++ releng/10.3/sbin/dhclient/dhclient.c (revision 303984) @@ -1,2762 +1,2763 @@ /* $OpenBSD: dhclient.c,v 1.63 2005/02/06 17:10:13 krw Exp $ */ /* * Copyright 2004 Henning Brauer * Copyright (c) 1995, 1996, 1997, 1998, 1999 * The Internet Software Consortium. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of The Internet Software Consortium nor the names * of its contributors may be used to endorse or promote products derived * from this software without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE INTERNET SOFTWARE CONSORTIUM AND * CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE INTERNET SOFTWARE CONSORTIUM OR * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * This software has been written for the Internet Software Consortium * by Ted Lemon in cooperation with Vixie * Enterprises. To learn more about the Internet Software Consortium, * see ``http://www.vix.com/isc''. To learn more about Vixie * Enterprises, see ``http://www.vix.com''. * * This client was substantially modified and enhanced by Elliot Poger * for use on Linux while he was working on the MosquitoNet project at * Stanford. * * The current version owes much to Elliot's Linux enhancements, but * was substantially reorganized and partially rewritten by Ted Lemon * so as to use the same networking framework that the Internet Software * Consortium DHCP server uses. Much system-specific configuration code * was moved into a shell script so that as support for more operating * systems is added, it will not be necessary to port and maintain * system-specific configuration code to these operating systems - instead, * the shell script can invoke the native tools to accomplish the same * purpose. 
*/ #include __FBSDID("$FreeBSD$"); #include #include "dhcpd.h" #include "privsep.h" #include #include #ifndef _PATH_VAREMPTY #define _PATH_VAREMPTY "/var/empty" #endif #define PERIOD 0x2e #define hyphenchar(c) ((c) == 0x2d) #define bslashchar(c) ((c) == 0x5c) #define periodchar(c) ((c) == PERIOD) #define asterchar(c) ((c) == 0x2a) #define alphachar(c) (((c) >= 0x41 && (c) <= 0x5a) || \ ((c) >= 0x61 && (c) <= 0x7a)) #define digitchar(c) ((c) >= 0x30 && (c) <= 0x39) #define whitechar(c) ((c) == ' ' || (c) == '\t') #define borderchar(c) (alphachar(c) || digitchar(c)) #define middlechar(c) (borderchar(c) || hyphenchar(c)) #define domainchar(c) ((c) > 0x20 && (c) < 0x7f) #define CLIENT_PATH "PATH=/usr/bin:/usr/sbin:/bin:/sbin" time_t cur_time; time_t default_lease_time = 43200; /* 12 hours... */ char *path_dhclient_conf = _PATH_DHCLIENT_CONF; char *path_dhclient_db = NULL; int log_perror = 1; int privfd; int nullfd = -1; char hostname[_POSIX_HOST_NAME_MAX + 1]; struct iaddr iaddr_broadcast = { 4, { 255, 255, 255, 255 } }; struct in_addr inaddr_any, inaddr_broadcast; char *path_dhclient_pidfile; struct pidfh *pidfile; /* * ASSERT_STATE() does nothing now; it used to be * assert (state_is == state_shouldbe). */ #define ASSERT_STATE(state_is, state_shouldbe) {} #define TIME_MAX 2147483647 int log_priority; int no_daemon; int unknown_ok = 1; int routefd; struct interface_info *ifi; int findproto(char *, int); struct sockaddr *get_ifa(char *, int); void routehandler(struct protocol *); void usage(void); int check_option(struct client_lease *l, int option); int check_classless_option(unsigned char *data, int len); int ipv4addrs(char * buf); int res_hnok(const char *dn); int check_search(const char *srch); char *option_as_string(unsigned int code, unsigned char *data, int len); int fork_privchld(int, int); #define ROUNDUP(a) \ ((a) > 0 ? 
(1 + (((a) - 1) | (sizeof(long) - 1))) : sizeof(long)) #define ADVANCE(x, n) (x += ROUNDUP((n)->sa_len)) static time_t scripttime; int findproto(char *cp, int n) { struct sockaddr *sa; int i; if (n == 0) return -1; for (i = 1; i; i <<= 1) { if (i & n) { sa = (struct sockaddr *)cp; switch (i) { case RTA_IFA: case RTA_DST: case RTA_GATEWAY: case RTA_NETMASK: if (sa->sa_family == AF_INET) return AF_INET; if (sa->sa_family == AF_INET6) return AF_INET6; break; case RTA_IFP: break; } ADVANCE(cp, sa); } } return (-1); } struct sockaddr * get_ifa(char *cp, int n) { struct sockaddr *sa; int i; if (n == 0) return (NULL); for (i = 1; i; i <<= 1) if (i & n) { sa = (struct sockaddr *)cp; if (i == RTA_IFA) return (sa); ADVANCE(cp, sa); } return (NULL); } struct iaddr defaddr = { 4 }; uint8_t curbssid[6]; static void disassoc(void *arg) { struct interface_info *ifi = arg; /* * Clear existing state. */ if (ifi->client->active != NULL) { script_init("EXPIRE", NULL); script_write_params("old_", ifi->client->active); if (ifi->client->alias) script_write_params("alias_", ifi->client->alias); script_go(); } ifi->client->state = S_INIT; } /* ARGSUSED */ void routehandler(struct protocol *p) { char msg[2048], *addr; struct rt_msghdr *rtm; struct if_msghdr *ifm; struct ifa_msghdr *ifam; struct if_announcemsghdr *ifan; struct ieee80211_join_event *jev; struct client_lease *l; time_t t = time(NULL); struct sockaddr *sa; struct iaddr a; ssize_t n; int linkstat; n = read(routefd, &msg, sizeof(msg)); rtm = (struct rt_msghdr *)msg; if (n < sizeof(rtm->rtm_msglen) || n < rtm->rtm_msglen || rtm->rtm_version != RTM_VERSION) return; switch (rtm->rtm_type) { case RTM_NEWADDR: case RTM_DELADDR: ifam = (struct ifa_msghdr *)rtm; if (ifam->ifam_index != ifi->index) break; if (findproto((char *)(ifam + 1), ifam->ifam_addrs) != AF_INET) break; if (scripttime == 0 || t < scripttime + 10) break; sa = get_ifa((char *)(ifam + 1), ifam->ifam_addrs); if (sa == NULL) break; if ((a.len = sizeof(struct in_addr)) > 
sizeof(a.iabuf)) error("king bula sez: len mismatch"); memcpy(a.iabuf, &((struct sockaddr_in *)sa)->sin_addr, a.len); if (addr_eq(a, defaddr)) break; for (l = ifi->client->active; l != NULL; l = l->next) if (addr_eq(a, l->address)) break; if (l == NULL) /* added/deleted addr is not the one we set */ break; addr = inet_ntoa(((struct sockaddr_in *)sa)->sin_addr); if (rtm->rtm_type == RTM_NEWADDR) { /* * XXX: If someone other than us adds our address, * should we assume they are taking over from us, * delete the lease record, and exit without modifying * the interface? */ warning("My address (%s) was re-added", addr); } else { warning("My address (%s) was deleted, dhclient exiting", addr); goto die; } break; case RTM_IFINFO: ifm = (struct if_msghdr *)rtm; if (ifm->ifm_index != ifi->index) break; if ((rtm->rtm_flags & RTF_UP) == 0) { warning("Interface %s is down, dhclient exiting", ifi->name); goto die; } linkstat = interface_link_status(ifi->name); if (linkstat != ifi->linkstat) { debug("%s link state %s -> %s", ifi->name, ifi->linkstat ? "up" : "down", linkstat ? "up" : "down"); ifi->linkstat = linkstat; if (linkstat) state_reboot(ifi); } break; case RTM_IFANNOUNCE: ifan = (struct if_announcemsghdr *)rtm; if (ifan->ifan_what == IFAN_DEPARTURE && ifan->ifan_index == ifi->index) { warning("Interface %s is gone, dhclient exiting", ifi->name); goto die; } break; case RTM_IEEE80211: ifan = (struct if_announcemsghdr *)rtm; if (ifan->ifan_index != ifi->index) break; switch (ifan->ifan_what) { case RTM_IEEE80211_ASSOC: case RTM_IEEE80211_REASSOC: /* * Use assoc/reassoc event to kick state machine * in case we roam. Otherwise fall back to the * normal state machine just like a wired network. 
*/ jev = (struct ieee80211_join_event *) &ifan[1]; if (memcmp(curbssid, jev->iev_addr, 6)) { disassoc(ifi); state_reboot(ifi); } memcpy(curbssid, jev->iev_addr, 6); break; } break; default: break; } return; die: script_init("FAIL", NULL); if (ifi->client->alias) script_write_params("alias_", ifi->client->alias); script_go(); if (pidfile != NULL) pidfile_remove(pidfile); exit(1); } int main(int argc, char *argv[]) { extern char *__progname; int ch, fd, quiet = 0, i = 0; int pipe_fd[2]; int immediate_daemon = 0; struct passwd *pw; pid_t otherpid; cap_rights_t rights; /* Initially, log errors to stderr as well as to syslogd. */ openlog(__progname, LOG_PID | LOG_NDELAY, DHCPD_LOG_FACILITY); setlogmask(LOG_UPTO(LOG_DEBUG)); while ((ch = getopt(argc, argv, "bc:dl:p:qu")) != -1) switch (ch) { case 'b': immediate_daemon = 1; break; case 'c': path_dhclient_conf = optarg; break; case 'd': no_daemon = 1; break; case 'l': path_dhclient_db = optarg; break; case 'p': path_dhclient_pidfile = optarg; break; case 'q': quiet = 1; break; case 'u': unknown_ok = 0; break; default: usage(); } argc -= optind; argv += optind; if (argc != 1) usage(); if (path_dhclient_pidfile == NULL) { asprintf(&path_dhclient_pidfile, "%sdhclient.%s.pid", _PATH_VARRUN, *argv); if (path_dhclient_pidfile == NULL) error("asprintf"); } pidfile = pidfile_open(path_dhclient_pidfile, 0600, &otherpid); if (pidfile == NULL) { if (errno == EEXIST) error("dhclient already running, pid: %d.", otherpid); if (errno == EAGAIN) error("dhclient already running."); warning("Cannot open or create pidfile: %m"); } if ((ifi = calloc(1, sizeof(struct interface_info))) == NULL) error("calloc"); if (strlcpy(ifi->name, argv[0], IFNAMSIZ) >= IFNAMSIZ) error("Interface name too long"); if (path_dhclient_db == NULL && asprintf(&path_dhclient_db, "%s.%s", _PATH_DHCLIENT_DB, ifi->name) == -1) error("asprintf"); if (quiet) log_perror = 0; tzset(); time(&cur_time); inaddr_broadcast.s_addr = INADDR_BROADCAST; inaddr_any.s_addr = 
INADDR_ANY; read_client_conf(); /* The next bit is potentially very time-consuming, so write out the pidfile right away. We will write it out again with the correct pid after daemonizing. */ if (pidfile != NULL) pidfile_write(pidfile); if (!interface_link_status(ifi->name)) { fprintf(stderr, "%s: no link ...", ifi->name); fflush(stderr); sleep(1); while (!interface_link_status(ifi->name)) { fprintf(stderr, "."); fflush(stderr); if (++i > 10) { fprintf(stderr, " giving up\n"); exit(1); } sleep(1); } fprintf(stderr, " got link\n"); } ifi->linkstat = 1; if ((nullfd = open(_PATH_DEVNULL, O_RDWR, 0)) == -1) error("cannot open %s: %m", _PATH_DEVNULL); if ((pw = getpwnam("_dhcp")) == NULL) { warning("no such user: _dhcp, falling back to \"nobody\""); if ((pw = getpwnam("nobody")) == NULL) error("no such user: nobody"); } /* * Obtain hostname before entering capability mode - it won't be * possible then, as reading kern.hostname is not permitted. */ if (gethostname(hostname, sizeof(hostname)) < 0) hostname[0] = '\0'; priv_script_init("PREINIT", NULL); if (ifi->client->alias) priv_script_write_params("alias_", ifi->client->alias); priv_script_go(); /* set up the interface */ discover_interfaces(ifi); if (pipe(pipe_fd) == -1) error("pipe"); fork_privchld(pipe_fd[0], pipe_fd[1]); close(ifi->ufdesc); ifi->ufdesc = -1; close(ifi->wfdesc); ifi->wfdesc = -1; close(pipe_fd[0]); privfd = pipe_fd[1]; cap_rights_init(&rights, CAP_READ, CAP_WRITE); if (cap_rights_limit(privfd, &rights) < 0 && errno != ENOSYS) error("can't limit private descriptor: %m"); if ((fd = open(path_dhclient_db, O_RDONLY|O_EXLOCK|O_CREAT, 0)) == -1) error("can't open and lock %s: %m", path_dhclient_db); read_client_leases(); rewrite_client_leases(); close(fd); if ((routefd = socket(PF_ROUTE, SOCK_RAW, 0)) != -1) add_protocol("AF_ROUTE", routefd, routehandler, ifi); if (shutdown(routefd, SHUT_WR) < 0) error("can't shutdown route socket: %m"); cap_rights_init(&rights, CAP_EVENT, CAP_READ); if 
(cap_rights_limit(routefd, &rights) < 0 && errno != ENOSYS) error("can't limit route socket: %m"); if (chroot(_PATH_VAREMPTY) == -1) error("chroot"); if (chdir("/") == -1) error("chdir(\"/\")"); if (setgroups(1, &pw->pw_gid) || setegid(pw->pw_gid) || setgid(pw->pw_gid) || seteuid(pw->pw_uid) || setuid(pw->pw_uid)) error("can't drop privileges: %m"); endpwent(); setproctitle("%s", ifi->name); if (cap_enter() < 0 && errno != ENOSYS) error("can't enter capability mode: %m"); if (immediate_daemon) go_daemon(); ifi->client->state = S_INIT; state_reboot(ifi); bootp_packet_handler = do_packet; dispatch(); /* not reached */ return (0); } void usage(void) { extern char *__progname; fprintf(stderr, "usage: %s [-bdqu] ", __progname); fprintf(stderr, "[-c conffile] [-l leasefile] interface\n"); exit(1); } /* * Individual States: * * Each routine is called from the dhclient_state_machine() in one of * these conditions: * -> entering INIT state * -> recvpacket_flag == 0: timeout in this state * -> otherwise: received a packet in this state * * Return conditions as handled by dhclient_state_machine(): * Returns 1, sendpacket_flag = 1: send packet, reset timer. * Returns 1, sendpacket_flag = 0: just reset the timer (wait for a milestone). * Returns 0: finish the nap which was interrupted for no good reason. * * Several per-interface variables are used to keep track of the process: * active_lease: the lease that is being used on the interface * (null pointer if not configured yet). * offered_leases: leases corresponding to DHCPOFFER messages that have * been sent to us by DHCP servers. * acked_leases: leases corresponding to DHCPACK messages that have been * sent to us by DHCP servers. * sendpacket: DHCP packet we're trying to send. * destination: IP address to send sendpacket to * In addition, there are several relevant per-lease variables. 
* T1_expiry, T2_expiry, lease_expiry: lease milestones * In the active lease, these control the process of renewing the lease; * In leases on the acked_leases list, this simply determines when we * can no longer legitimately use the lease. */ void state_reboot(void *ipp) { struct interface_info *ip = ipp; /* If we don't remember an active lease, go straight to INIT. */ if (!ip->client->active || ip->client->active->is_bootp) { state_init(ip); return; } /* We are in the rebooting state. */ ip->client->state = S_REBOOTING; /* make_request doesn't initialize xid because it normally comes from the DHCPDISCOVER, but we haven't sent a DHCPDISCOVER, so pick an xid now. */ ip->client->xid = arc4random(); /* Make a DHCPREQUEST packet, and set appropriate per-interface flags. */ make_request(ip, ip->client->active); ip->client->destination = iaddr_broadcast; ip->client->first_sending = cur_time; ip->client->interval = ip->client->config->initial_interval; /* Zap the medium list... */ ip->client->medium = NULL; /* Send out the first DHCPREQUEST packet. */ send_request(ip); } /* * Called when a lease has completely expired and we've * been unable to renew it. */ void state_init(void *ipp) { struct interface_info *ip = ipp; ASSERT_STATE(state, S_INIT); /* Make a DHCPDISCOVER packet, and set appropriate per-interface flags. */ make_discover(ip, ip->client->active); ip->client->xid = ip->client->packet.xid; ip->client->destination = iaddr_broadcast; ip->client->state = S_SELECTING; ip->client->first_sending = cur_time; ip->client->interval = ip->client->config->initial_interval; /* Add an immediate timeout to cause the first DHCPDISCOVER packet to go out. */ send_discover(ip); } /* * state_selecting is called when one or more DHCPOFFER packets * have been received and a configurable period of time has passed. 
*/ void state_selecting(void *ipp) { struct interface_info *ip = ipp; struct client_lease *lp, *next, *picked; ASSERT_STATE(state, S_SELECTING); /* Cancel state_selecting and send_discover timeouts, since either one could have got us here. */ cancel_timeout(state_selecting, ip); cancel_timeout(send_discover, ip); /* We have received one or more DHCPOFFER packets. Currently, the only criterion by which we judge leases is whether or not we get a response when we arp for them. */ picked = NULL; for (lp = ip->client->offered_leases; lp; lp = next) { next = lp->next; /* Check to see if we got an ARPREPLY for the address in this particular lease. */ if (!picked) { script_init("ARPCHECK", lp->medium); script_write_params("check_", lp); /* If the ARPCHECK code detects another machine using the offered address, it exits nonzero. We need to send a DHCPDECLINE and toss the lease. */ if (script_go()) { make_decline(ip, lp); send_decline(ip); goto freeit; } picked = lp; picked->next = NULL; } else { freeit: free_client_lease(lp); } } ip->client->offered_leases = NULL; /* If we just tossed all the leases we were offered, go back to square one. */ if (!picked) { ip->client->state = S_INIT; state_init(ip); return; } /* If it was a BOOTREPLY, we can just take the address right now. */ if (!picked->options[DHO_DHCP_MESSAGE_TYPE].len) { ip->client->new = picked; /* Make up some lease expiry times XXX these should be configurable. */ ip->client->new->expiry = cur_time + 12000; ip->client->new->renewal += cur_time + 8000; ip->client->new->rebind += cur_time + 10000; ip->client->state = S_REQUESTING; /* Bind to the address we received. */ bind_lease(ip); return; } /* Go to the REQUESTING state. */ ip->client->destination = iaddr_broadcast; ip->client->state = S_REQUESTING; ip->client->first_sending = cur_time; ip->client->interval = ip->client->config->initial_interval; /* Make a DHCPREQUEST packet from the lease we picked. 
*/ make_request(ip, picked); ip->client->xid = ip->client->packet.xid; /* Toss the lease we picked - we'll get it back in a DHCPACK. */ free_client_lease(picked); /* Add an immediate timeout to send the first DHCPREQUEST packet. */ send_request(ip); } /* state_requesting is called when we receive a DHCPACK message after having sent out one or more DHCPREQUEST packets. */ void dhcpack(struct packet *packet) { struct interface_info *ip = packet->interface; struct client_lease *lease; /* If we're not receptive to an offer right now, or if the offer has an unrecognizable transaction id, then just drop it. */ if (packet->interface->client->xid != packet->raw->xid || (packet->interface->hw_address.hlen != packet->raw->hlen) || (memcmp(packet->interface->hw_address.haddr, packet->raw->chaddr, packet->raw->hlen))) return; if (ip->client->state != S_REBOOTING && ip->client->state != S_REQUESTING && ip->client->state != S_RENEWING && ip->client->state != S_REBINDING) return; note("DHCPACK from %s", piaddr(packet->client_addr)); lease = packet_to_lease(packet); if (!lease) { note("packet_to_lease failed."); return; } ip->client->new = lease; /* Stop resending DHCPREQUEST. */ cancel_timeout(send_request, ip); /* Figure out the lease time. */ if (ip->client->new->options[DHO_DHCP_LEASE_TIME].data) ip->client->new->expiry = getULong( ip->client->new->options[DHO_DHCP_LEASE_TIME].data); else ip->client->new->expiry = default_lease_time; /* A number that looks negative here is really just very large, because the lease expiry offset is unsigned. */ if (ip->client->new->expiry < 0) ip->client->new->expiry = TIME_MAX; /* XXX should be fixed by resetting the client state */ if (ip->client->new->expiry < 60) ip->client->new->expiry = 60; /* Take the server-provided renewal time if there is one; otherwise figure it out according to the spec. 
*/ if (ip->client->new->options[DHO_DHCP_RENEWAL_TIME].len) ip->client->new->renewal = getULong( ip->client->new->options[DHO_DHCP_RENEWAL_TIME].data); else ip->client->new->renewal = ip->client->new->expiry / 2; /* Same deal with the rebind time. */ if (ip->client->new->options[DHO_DHCP_REBINDING_TIME].len) ip->client->new->rebind = getULong( ip->client->new->options[DHO_DHCP_REBINDING_TIME].data); else ip->client->new->rebind = ip->client->new->renewal + ip->client->new->renewal / 2 + ip->client->new->renewal / 4; ip->client->new->expiry += cur_time; /* Lease lengths can never be negative. */ if (ip->client->new->expiry < cur_time) ip->client->new->expiry = TIME_MAX; ip->client->new->renewal += cur_time; if (ip->client->new->renewal < cur_time) ip->client->new->renewal = TIME_MAX; ip->client->new->rebind += cur_time; if (ip->client->new->rebind < cur_time) ip->client->new->rebind = TIME_MAX; bind_lease(ip); } void bind_lease(struct interface_info *ip) { /* Remember the medium. */ ip->client->new->medium = ip->client->medium; /* Write out the new lease. */ write_client_lease(ip, ip->client->new, 0); /* Run the client script with the new parameters. */ script_init((ip->client->state == S_REQUESTING ? "BOUND" : (ip->client->state == S_RENEWING ? "RENEW" : (ip->client->state == S_REBOOTING ? "REBOOT" : "REBIND"))), ip->client->new->medium); if (ip->client->active && ip->client->state != S_REBOOTING) script_write_params("old_", ip->client->active); script_write_params("new_", ip->client->new); if (ip->client->alias) script_write_params("alias_", ip->client->alias); script_go(); /* Replace the old active lease with the new one. */ if (ip->client->active) free_client_lease(ip->client->active); ip->client->active = ip->client->new; ip->client->new = NULL; /* Set up a timeout to start the renewal process. 
*/ add_timeout(ip->client->active->renewal, state_bound, ip); note("bound to %s -- renewal in %d seconds.", piaddr(ip->client->active->address), (int)(ip->client->active->renewal - cur_time)); ip->client->state = S_BOUND; reinitialize_interfaces(); go_daemon(); } /* * state_bound is called when we've successfully bound to a particular * lease, but the renewal time on that lease has expired. We are * expected to unicast a DHCPREQUEST to the server that gave us our * original lease. */ void state_bound(void *ipp) { struct interface_info *ip = ipp; ASSERT_STATE(state, S_BOUND); /* T1 has expired. */ make_request(ip, ip->client->active); ip->client->xid = ip->client->packet.xid; if (ip->client->active->options[DHO_DHCP_SERVER_IDENTIFIER].len == 4) { memcpy(ip->client->destination.iabuf, ip->client->active-> options[DHO_DHCP_SERVER_IDENTIFIER].data, 4); ip->client->destination.len = 4; } else ip->client->destination = iaddr_broadcast; ip->client->first_sending = cur_time; ip->client->interval = ip->client->config->initial_interval; ip->client->state = S_RENEWING; /* Send the first packet immediately. */ send_request(ip); } void bootp(struct packet *packet) { struct iaddrlist *ap; if (packet->raw->op != BOOTREPLY) return; /* If there's a reject list, make sure this packet's sender isn't on it. */ for (ap = packet->interface->client->config->reject_list; ap; ap = ap->next) { if (addr_eq(packet->client_addr, ap->addr)) { note("BOOTREPLY from %s rejected.", piaddr(ap->addr)); return; } } dhcpoffer(packet); } void dhcp(struct packet *packet) { struct iaddrlist *ap; void (*handler)(struct packet *); char *type; switch (packet->packet_type) { case DHCPOFFER: handler = dhcpoffer; type = "DHCPOFFER"; break; case DHCPNAK: handler = dhcpnak; type = "DHCPNACK"; break; case DHCPACK: handler = dhcpack; type = "DHCPACK"; break; default: return; } /* If there's a reject list, make sure this packet's sender isn't on it. 
*/ for (ap = packet->interface->client->config->reject_list; ap; ap = ap->next) { if (addr_eq(packet->client_addr, ap->addr)) { note("%s from %s rejected.", type, piaddr(ap->addr)); return; } } (*handler)(packet); } void dhcpoffer(struct packet *packet) { struct interface_info *ip = packet->interface; struct client_lease *lease, *lp; int i; int arp_timeout_needed, stop_selecting; char *name = packet->options[DHO_DHCP_MESSAGE_TYPE].len ? "DHCPOFFER" : "BOOTREPLY"; /* If we're not receptive to an offer right now, or if the offer has an unrecognizable transaction id, then just drop it. */ if (ip->client->state != S_SELECTING || packet->interface->client->xid != packet->raw->xid || (packet->interface->hw_address.hlen != packet->raw->hlen) || (memcmp(packet->interface->hw_address.haddr, packet->raw->chaddr, packet->raw->hlen))) return; note("%s from %s", name, piaddr(packet->client_addr)); /* If this lease doesn't supply the minimum required parameters, blow it off. */ for (i = 0; ip->client->config->required_options[i]; i++) { if (!packet->options[ip->client->config-> required_options[i]].len) { note("%s isn't satisfactory.", name); return; } } /* If we've already seen this lease, don't record it again. */ for (lease = ip->client->offered_leases; lease; lease = lease->next) { if (lease->address.len == sizeof(packet->raw->yiaddr) && !memcmp(lease->address.iabuf, &packet->raw->yiaddr, lease->address.len)) { debug("%s already seen.", name); return; } } lease = packet_to_lease(packet); if (!lease) { note("packet_to_lease failed."); return; } /* If this lease was acquired through a BOOTREPLY, record that fact. */ if (!packet->options[DHO_DHCP_MESSAGE_TYPE].len) lease->is_bootp = 1; /* Record the medium under which this lease was offered. */ lease->medium = ip->client->medium; /* Send out an ARP Request for the offered IP address. 
*/ script_init("ARPSEND", lease->medium); script_write_params("check_", lease); /* If the script can't send an ARP request without waiting, we'll be waiting when we do the ARPCHECK, so don't wait now. */ if (script_go()) arp_timeout_needed = 0; else arp_timeout_needed = 2; /* Figure out when we're supposed to stop selecting. */ stop_selecting = ip->client->first_sending + ip->client->config->select_interval; /* If this is the lease we asked for, put it at the head of the list, and don't mess with the arp request timeout. */ if (lease->address.len == ip->client->requested_address.len && !memcmp(lease->address.iabuf, ip->client->requested_address.iabuf, ip->client->requested_address.len)) { lease->next = ip->client->offered_leases; ip->client->offered_leases = lease; } else { /* If we already have an offer, and arping for this offer would take us past the selection timeout, then don't extend the timeout - just hope for the best. */ if (ip->client->offered_leases && (cur_time + arp_timeout_needed) > stop_selecting) arp_timeout_needed = 0; /* Put the lease at the end of the list. */ lease->next = NULL; if (!ip->client->offered_leases) ip->client->offered_leases = lease; else { for (lp = ip->client->offered_leases; lp->next; lp = lp->next) ; /* nothing */ lp->next = lease; } } /* If we're supposed to stop selecting before we've had time to wait for the ARPREPLY, add some delay to wait for the ARPREPLY. */ if (stop_selecting - cur_time < arp_timeout_needed) stop_selecting = cur_time + arp_timeout_needed; /* If the selecting interval has expired, go immediately to state_selecting(). Otherwise, time out into state_selecting at the select interval. */ if (stop_selecting <= 0) state_selecting(ip); else { add_timeout(stop_selecting, state_selecting, ip); cancel_timeout(send_discover, ip); } } /* Allocate a client_lease structure and initialize it from the parameters in the specified packet. 
*/ struct client_lease * packet_to_lease(struct packet *packet) { struct client_lease *lease; int i; lease = malloc(sizeof(struct client_lease)); if (!lease) { warning("dhcpoffer: no memory to record lease."); return (NULL); } memset(lease, 0, sizeof(*lease)); /* Copy the lease options. */ for (i = 0; i < 256; i++) { if (packet->options[i].len) { lease->options[i].data = malloc(packet->options[i].len + 1); if (!lease->options[i].data) { warning("dhcpoffer: no memory for option %d", i); free_client_lease(lease); return (NULL); } else { memcpy(lease->options[i].data, packet->options[i].data, packet->options[i].len); lease->options[i].len = packet->options[i].len; lease->options[i].data[lease->options[i].len] = 0; } if (!check_option(lease,i)) { /* ignore a bogus lease offer */ warning("Invalid lease option - ignoring offer"); free_client_lease(lease); return (NULL); } } } lease->address.len = sizeof(packet->raw->yiaddr); memcpy(lease->address.iabuf, &packet->raw->yiaddr, lease->address.len); lease->nextserver.len = sizeof(packet->raw->siaddr); memcpy(lease->nextserver.iabuf, &packet->raw->siaddr, lease->nextserver.len); /* If the server name was filled out, copy it. Do not attempt to validate the server name as a host name. RFC 2131 merely states that sname is NUL-terminated (which we do not assume) and that it is the server's host name. Since the ISC client and server allow arbitrary characters, we do as well. */ if ((!packet->options[DHO_DHCP_OPTION_OVERLOAD].len || !(packet->options[DHO_DHCP_OPTION_OVERLOAD].data[0] & 2)) && packet->raw->sname[0]) { lease->server_name = malloc(DHCP_SNAME_LEN + 1); if (!lease->server_name) { warning("dhcpoffer: no memory for server name."); free_client_lease(lease); return (NULL); } memcpy(lease->server_name, packet->raw->sname, DHCP_SNAME_LEN); lease->server_name[DHCP_SNAME_LEN]='\0'; } /* Ditto for the filename.
*/ if ((!packet->options[DHO_DHCP_OPTION_OVERLOAD].len || !(packet->options[DHO_DHCP_OPTION_OVERLOAD].data[0] & 1)) && packet->raw->file[0]) { /* Don't count on the NUL terminator. */ lease->filename = malloc(DHCP_FILE_LEN + 1); if (!lease->filename) { warning("dhcpoffer: no memory for filename."); free_client_lease(lease); return (NULL); } memcpy(lease->filename, packet->raw->file, DHCP_FILE_LEN); lease->filename[DHCP_FILE_LEN]='\0'; } return lease; } void dhcpnak(struct packet *packet) { struct interface_info *ip = packet->interface; /* If we're not receptive to an offer right now, or if the offer has an unrecognizable transaction id, then just drop it. */ if (packet->interface->client->xid != packet->raw->xid || (packet->interface->hw_address.hlen != packet->raw->hlen) || (memcmp(packet->interface->hw_address.haddr, packet->raw->chaddr, packet->raw->hlen))) return; if (ip->client->state != S_REBOOTING && ip->client->state != S_REQUESTING && ip->client->state != S_RENEWING && ip->client->state != S_REBINDING) return; note("DHCPNAK from %s", piaddr(packet->client_addr)); if (!ip->client->active) { note("DHCPNAK with no active lease.\n"); return; } free_client_lease(ip->client->active); ip->client->active = NULL; /* Stop sending DHCPREQUEST packets... */ cancel_timeout(send_request, ip); ip->client->state = S_INIT; state_init(ip); } /* Send out a DHCPDISCOVER packet, and set a timeout to send out another one after the right interval has expired. If we don't get an offer by the time we reach the panic interval, call the panic function. */ void send_discover(void *ipp) { struct interface_info *ip = ipp; int interval, increase = 1; /* Figure out how long it's been since we started transmitting. */ interval = cur_time - ip->client->first_sending; /* If we're past the panic timeout, call the script and tell it we haven't found anything for this interface yet. 
*/ if (interval > ip->client->config->timeout) { state_panic(ip); return; } /* If we're selecting media, try the whole list before doing the exponential backoff, but if we've already received an offer, stop looping, because we obviously have it right. */ if (!ip->client->offered_leases && ip->client->config->media) { int fail = 0; again: if (ip->client->medium) { ip->client->medium = ip->client->medium->next; increase = 0; } if (!ip->client->medium) { if (fail) error("No valid media types for %s!", ip->name); ip->client->medium = ip->client->config->media; increase = 1; } note("Trying medium \"%s\" %d", ip->client->medium->string, increase); script_init("MEDIUM", ip->client->medium); if (script_go()) goto again; } /* * If we're supposed to increase the interval, do so. If it's * currently zero (i.e., we haven't sent any packets yet), set * it to one; otherwise, add to it a random number between zero * and two times itself. On average, this means that it will * double with every transmission. */ if (increase) { if (!ip->client->interval) ip->client->interval = ip->client->config->initial_interval; else { ip->client->interval += (arc4random() >> 2) % (2 * ip->client->interval); } /* Don't backoff past cutoff. */ if (ip->client->interval > ip->client->config->backoff_cutoff) ip->client->interval = ((ip->client->config->backoff_cutoff / 2) + ((arc4random() >> 2) % ip->client->config->backoff_cutoff)); } else if (!ip->client->interval) ip->client->interval = ip->client->config->initial_interval; /* If the backoff would take us to the panic timeout, just use that as the interval. */ if (cur_time + ip->client->interval > ip->client->first_sending + ip->client->config->timeout) ip->client->interval = (ip->client->first_sending + ip->client->config->timeout) - cur_time + 1; /* Record the number of seconds since we started sending. 
*/ if (interval < 65536) ip->client->packet.secs = htons(interval); else ip->client->packet.secs = htons(65535); ip->client->secs = ip->client->packet.secs; note("DHCPDISCOVER on %s to %s port %d interval %d", ip->name, inet_ntoa(inaddr_broadcast), REMOTE_PORT, (int)ip->client->interval); /* Send out a packet. */ send_packet_unpriv(privfd, &ip->client->packet, ip->client->packet_length, inaddr_any, inaddr_broadcast); add_timeout(cur_time + ip->client->interval, send_discover, ip); } /* * state_panic gets called if we haven't received any offers in a preset * amount of time. When this happens, we try to use existing leases * that haven't yet expired, and failing that, we call the client script * and hope it can do something. */ void state_panic(void *ipp) { struct interface_info *ip = ipp; struct client_lease *loop = ip->client->active; struct client_lease *lp; note("No DHCPOFFERS received."); /* We may not have an active lease, but we may have some predefined leases that we can try. */ if (!ip->client->active && ip->client->leases) goto activate_next; /* Run through the list of leases and see if one can be used. */ while (ip->client->active) { if (ip->client->active->expiry > cur_time) { note("Trying recorded lease %s", piaddr(ip->client->active->address)); /* Run the client script with the existing parameters. */ script_init("TIMEOUT", ip->client->active->medium); script_write_params("new_", ip->client->active); if (ip->client->alias) script_write_params("alias_", ip->client->alias); /* If the old lease is still good and doesn't yet need renewal, go into BOUND state and timeout at the renewal time. 
*/ if (!script_go()) { if (cur_time < ip->client->active->renewal) { ip->client->state = S_BOUND; note("bound: renewal in %d seconds.", (int)(ip->client->active->renewal - cur_time)); add_timeout( ip->client->active->renewal, state_bound, ip); } else { ip->client->state = S_BOUND; note("bound: immediate renewal."); state_bound(ip); } reinitialize_interfaces(); go_daemon(); return; } } /* If there are no other leases, give up. */ if (!ip->client->leases) { ip->client->leases = ip->client->active; ip->client->active = NULL; break; } activate_next: /* Otherwise, put the active lease at the end of the lease list, and try another lease. */ for (lp = ip->client->leases; lp->next; lp = lp->next) ; lp->next = ip->client->active; if (lp->next) lp->next->next = NULL; ip->client->active = ip->client->leases; ip->client->leases = ip->client->leases->next; /* If we already tried this lease, we've exhausted the set of leases, so we might as well give up for now. */ if (ip->client->active == loop) break; else if (!loop) loop = ip->client->active; } /* No leases were available, or what was available didn't work, so tell the shell script that we failed to allocate an address, and try again later. */ note("No working leases in persistent database - sleeping.\n"); script_init("FAIL", NULL); if (ip->client->alias) script_write_params("alias_", ip->client->alias); script_go(); ip->client->state = S_INIT; add_timeout(cur_time + ip->client->config->retry_interval, state_init, ip); go_daemon(); } void send_request(void *ipp) { struct interface_info *ip = ipp; struct in_addr from, to; int interval; /* Figure out how long it's been since we started transmitting. */ interval = cur_time - ip->client->first_sending; /* If we're in the INIT-REBOOT or REQUESTING state and we're past the reboot timeout, go to INIT and see if we can DISCOVER an address...
*/ /* XXX In the INIT-REBOOT state, if we don't get an ACK, it means either that we're on a network with no DHCP server, or that our server is down. In the latter case, assuming that there is a backup DHCP server, DHCPDISCOVER will get us a new address, but we could also have successfully reused our old address. In the former case, we're hosed anyway. This is not a win-prone situation. */ if ((ip->client->state == S_REBOOTING || ip->client->state == S_REQUESTING) && interval > ip->client->config->reboot_timeout) { cancel: ip->client->state = S_INIT; cancel_timeout(send_request, ip); state_init(ip); return; } /* If we're in the reboot state, make sure the media is set up correctly. */ if (ip->client->state == S_REBOOTING && !ip->client->medium && ip->client->active->medium ) { script_init("MEDIUM", ip->client->active->medium); /* If the medium we chose won't fly, go to INIT state. */ if (script_go()) goto cancel; /* Record the medium. */ ip->client->medium = ip->client->active->medium; } /* If the lease has expired, relinquish the address and go back to the INIT state. */ if (ip->client->state != S_REQUESTING && cur_time > ip->client->active->expiry) { /* Run the client script with the new parameters. */ script_init("EXPIRE", NULL); script_write_params("old_", ip->client->active); if (ip->client->alias) script_write_params("alias_", ip->client->alias); script_go(); /* Now do a preinit on the interface so that we can discover a new address. */ script_init("PREINIT", NULL); if (ip->client->alias) script_write_params("alias_", ip->client->alias); script_go(); ip->client->state = S_INIT; state_init(ip); return; } /* Do the exponential backoff... */ if (!ip->client->interval) ip->client->interval = ip->client->config->initial_interval; else ip->client->interval += ((arc4random() >> 2) % (2 * ip->client->interval)); /* Don't backoff past cutoff. 
*/ if (ip->client->interval > ip->client->config->backoff_cutoff) ip->client->interval = ((ip->client->config->backoff_cutoff / 2) + ((arc4random() >> 2) % ip->client->interval)); /* If the backoff would take us to the expiry time, just set the timeout to the expiry time. */ if (ip->client->state != S_REQUESTING && cur_time + ip->client->interval > ip->client->active->expiry) ip->client->interval = ip->client->active->expiry - cur_time + 1; /* If the lease T2 time has elapsed, or if we're not yet bound, broadcast the DHCPREQUEST rather than unicasting. */ if (ip->client->state == S_REQUESTING || ip->client->state == S_REBOOTING || cur_time > ip->client->active->rebind) to.s_addr = INADDR_BROADCAST; else memcpy(&to.s_addr, ip->client->destination.iabuf, sizeof(to.s_addr)); if (ip->client->state != S_REQUESTING) memcpy(&from, ip->client->active->address.iabuf, sizeof(from)); else from.s_addr = INADDR_ANY; /* Record the number of seconds since we started sending. */ if (ip->client->state == S_REQUESTING) ip->client->packet.secs = ip->client->secs; else { if (interval < 65536) ip->client->packet.secs = htons(interval); else ip->client->packet.secs = htons(65535); } note("DHCPREQUEST on %s to %s port %d", ip->name, inet_ntoa(to), REMOTE_PORT); /* Send out a packet. */ send_packet_unpriv(privfd, &ip->client->packet, ip->client->packet_length, from, to); add_timeout(cur_time + ip->client->interval, send_request, ip); } void send_decline(void *ipp) { struct interface_info *ip = ipp; note("DHCPDECLINE on %s to %s port %d", ip->name, inet_ntoa(inaddr_broadcast), REMOTE_PORT); /* Send out a packet. 
*/ send_packet_unpriv(privfd, &ip->client->packet, ip->client->packet_length, inaddr_any, inaddr_broadcast); } void make_discover(struct interface_info *ip, struct client_lease *lease) { unsigned char discover = DHCPDISCOVER; struct tree_cache *options[256]; struct tree_cache option_elements[256]; int i; memset(option_elements, 0, sizeof(option_elements)); memset(options, 0, sizeof(options)); memset(&ip->client->packet, 0, sizeof(ip->client->packet)); /* Set DHCP_MESSAGE_TYPE to DHCPDISCOVER */ i = DHO_DHCP_MESSAGE_TYPE; options[i] = &option_elements[i]; options[i]->value = &discover; options[i]->len = sizeof(discover); options[i]->buf_size = sizeof(discover); options[i]->timeout = 0xFFFFFFFF; /* Request the options we want */ i = DHO_DHCP_PARAMETER_REQUEST_LIST; options[i] = &option_elements[i]; options[i]->value = ip->client->config->requested_options; options[i]->len = ip->client->config->requested_option_count; options[i]->buf_size = ip->client->config->requested_option_count; options[i]->timeout = 0xFFFFFFFF; /* If we had an address, try to get it again. */ if (lease) { ip->client->requested_address = lease->address; i = DHO_DHCP_REQUESTED_ADDRESS; options[i] = &option_elements[i]; options[i]->value = lease->address.iabuf; options[i]->len = lease->address.len; options[i]->buf_size = lease->address.len; options[i]->timeout = 0xFFFFFFFF; } else ip->client->requested_address.len = 0; /* Send any options requested in the config file. */ for (i = 0; i < 256; i++) if (!options[i] && ip->client->config->send_options[i].data) { options[i] = &option_elements[i]; options[i]->value = ip->client->config->send_options[i].data; options[i]->len = ip->client->config->send_options[i].len; options[i]->buf_size = ip->client->config->send_options[i].len; options[i]->timeout = 0xFFFFFFFF; } /* send host name if not set via config file. 
*/ if (!options[DHO_HOST_NAME]) { if (hostname[0] != '\0') { size_t len; char* posDot = strchr(hostname, '.'); if (posDot != NULL) len = posDot - hostname; else len = strlen(hostname); options[DHO_HOST_NAME] = &option_elements[DHO_HOST_NAME]; options[DHO_HOST_NAME]->value = hostname; options[DHO_HOST_NAME]->len = len; options[DHO_HOST_NAME]->buf_size = len; options[DHO_HOST_NAME]->timeout = 0xFFFFFFFF; } } /* set unique client identifier */ char client_ident[sizeof(struct hardware)]; if (!options[DHO_DHCP_CLIENT_IDENTIFIER]) { int hwlen = (ip->hw_address.hlen < sizeof(client_ident)-1) ? ip->hw_address.hlen : sizeof(client_ident)-1; client_ident[0] = ip->hw_address.htype; memcpy(&client_ident[1], ip->hw_address.haddr, hwlen); options[DHO_DHCP_CLIENT_IDENTIFIER] = &option_elements[DHO_DHCP_CLIENT_IDENTIFIER]; options[DHO_DHCP_CLIENT_IDENTIFIER]->value = client_ident; options[DHO_DHCP_CLIENT_IDENTIFIER]->len = hwlen+1; options[DHO_DHCP_CLIENT_IDENTIFIER]->buf_size = hwlen+1; options[DHO_DHCP_CLIENT_IDENTIFIER]->timeout = 0xFFFFFFFF; } /* Set up the option buffer... */ ip->client->packet_length = cons_options(NULL, &ip->client->packet, 0, options, 0, 0, 0, NULL, 0); if (ip->client->packet_length < BOOTP_MIN_LEN) ip->client->packet_length = BOOTP_MIN_LEN; ip->client->packet.op = BOOTREQUEST; ip->client->packet.htype = ip->hw_address.htype; ip->client->packet.hlen = ip->hw_address.hlen; ip->client->packet.hops = 0; ip->client->packet.xid = arc4random(); ip->client->packet.secs = 0; /* filled in by send_discover. 
*/ ip->client->packet.flags = 0; memset(&(ip->client->packet.ciaddr), 0, sizeof(ip->client->packet.ciaddr)); memset(&(ip->client->packet.yiaddr), 0, sizeof(ip->client->packet.yiaddr)); memset(&(ip->client->packet.siaddr), 0, sizeof(ip->client->packet.siaddr)); memset(&(ip->client->packet.giaddr), 0, sizeof(ip->client->packet.giaddr)); memcpy(ip->client->packet.chaddr, ip->hw_address.haddr, ip->hw_address.hlen); } void make_request(struct interface_info *ip, struct client_lease * lease) { unsigned char request = DHCPREQUEST; struct tree_cache *options[256]; struct tree_cache option_elements[256]; int i; memset(options, 0, sizeof(options)); memset(&ip->client->packet, 0, sizeof(ip->client->packet)); /* Set DHCP_MESSAGE_TYPE to DHCPREQUEST */ i = DHO_DHCP_MESSAGE_TYPE; options[i] = &option_elements[i]; options[i]->value = &request; options[i]->len = sizeof(request); options[i]->buf_size = sizeof(request); options[i]->timeout = 0xFFFFFFFF; /* Request the options we want */ i = DHO_DHCP_PARAMETER_REQUEST_LIST; options[i] = &option_elements[i]; options[i]->value = ip->client->config->requested_options; options[i]->len = ip->client->config->requested_option_count; options[i]->buf_size = ip->client->config->requested_option_count; options[i]->timeout = 0xFFFFFFFF; /* If we are requesting an address that hasn't yet been assigned to us, use the DHCP Requested Address option. */ if (ip->client->state == S_REQUESTING) { /* Send back the server identifier... 
*/ i = DHO_DHCP_SERVER_IDENTIFIER; options[i] = &option_elements[i]; options[i]->value = lease->options[i].data; options[i]->len = lease->options[i].len; options[i]->buf_size = lease->options[i].len; options[i]->timeout = 0xFFFFFFFF; } if (ip->client->state == S_REQUESTING || ip->client->state == S_REBOOTING) { ip->client->requested_address = lease->address; i = DHO_DHCP_REQUESTED_ADDRESS; options[i] = &option_elements[i]; options[i]->value = lease->address.iabuf; options[i]->len = lease->address.len; options[i]->buf_size = lease->address.len; options[i]->timeout = 0xFFFFFFFF; } else ip->client->requested_address.len = 0; /* Send any options requested in the config file. */ for (i = 0; i < 256; i++) if (!options[i] && ip->client->config->send_options[i].data) { options[i] = &option_elements[i]; options[i]->value = ip->client->config->send_options[i].data; options[i]->len = ip->client->config->send_options[i].len; options[i]->buf_size = ip->client->config->send_options[i].len; options[i]->timeout = 0xFFFFFFFF; } /* send host name if not set via config file. */ if (!options[DHO_HOST_NAME]) { if (hostname[0] != '\0') { size_t len; char* posDot = strchr(hostname, '.'); if (posDot != NULL) len = posDot - hostname; else len = strlen(hostname); options[DHO_HOST_NAME] = &option_elements[DHO_HOST_NAME]; options[DHO_HOST_NAME]->value = hostname; options[DHO_HOST_NAME]->len = len; options[DHO_HOST_NAME]->buf_size = len; options[DHO_HOST_NAME]->timeout = 0xFFFFFFFF; } } /* set unique client identifier */ char client_ident[sizeof(struct hardware)]; if (!options[DHO_DHCP_CLIENT_IDENTIFIER]) { int hwlen = (ip->hw_address.hlen < sizeof(client_ident)-1) ? 
ip->hw_address.hlen : sizeof(client_ident)-1; client_ident[0] = ip->hw_address.htype; memcpy(&client_ident[1], ip->hw_address.haddr, hwlen); options[DHO_DHCP_CLIENT_IDENTIFIER] = &option_elements[DHO_DHCP_CLIENT_IDENTIFIER]; options[DHO_DHCP_CLIENT_IDENTIFIER]->value = client_ident; options[DHO_DHCP_CLIENT_IDENTIFIER]->len = hwlen+1; options[DHO_DHCP_CLIENT_IDENTIFIER]->buf_size = hwlen+1; options[DHO_DHCP_CLIENT_IDENTIFIER]->timeout = 0xFFFFFFFF; } /* Set up the option buffer... */ ip->client->packet_length = cons_options(NULL, &ip->client->packet, 0, options, 0, 0, 0, NULL, 0); if (ip->client->packet_length < BOOTP_MIN_LEN) ip->client->packet_length = BOOTP_MIN_LEN; ip->client->packet.op = BOOTREQUEST; ip->client->packet.htype = ip->hw_address.htype; ip->client->packet.hlen = ip->hw_address.hlen; ip->client->packet.hops = 0; ip->client->packet.xid = ip->client->xid; ip->client->packet.secs = 0; /* Filled in by send_request. */ /* If we own the address we're requesting, put it in ciaddr; otherwise set ciaddr to zero. 
*/ if (ip->client->state == S_BOUND || ip->client->state == S_RENEWING || ip->client->state == S_REBINDING) { memcpy(&ip->client->packet.ciaddr, lease->address.iabuf, lease->address.len); ip->client->packet.flags = 0; } else { memset(&ip->client->packet.ciaddr, 0, sizeof(ip->client->packet.ciaddr)); ip->client->packet.flags = 0; } memset(&ip->client->packet.yiaddr, 0, sizeof(ip->client->packet.yiaddr)); memset(&ip->client->packet.siaddr, 0, sizeof(ip->client->packet.siaddr)); memset(&ip->client->packet.giaddr, 0, sizeof(ip->client->packet.giaddr)); memcpy(ip->client->packet.chaddr, ip->hw_address.haddr, ip->hw_address.hlen); } void make_decline(struct interface_info *ip, struct client_lease *lease) { struct tree_cache *options[256], message_type_tree; struct tree_cache requested_address_tree; struct tree_cache server_id_tree, client_id_tree; unsigned char decline = DHCPDECLINE; int i; memset(options, 0, sizeof(options)); memset(&ip->client->packet, 0, sizeof(ip->client->packet)); /* Set DHCP_MESSAGE_TYPE to DHCPDECLINE */ i = DHO_DHCP_MESSAGE_TYPE; options[i] = &message_type_tree; options[i]->value = &decline; options[i]->len = sizeof(decline); options[i]->buf_size = sizeof(decline); options[i]->timeout = 0xFFFFFFFF; /* Send back the server identifier... */ i = DHO_DHCP_SERVER_IDENTIFIER; options[i] = &server_id_tree; options[i]->value = lease->options[i].data; options[i]->len = lease->options[i].len; options[i]->buf_size = lease->options[i].len; options[i]->timeout = 0xFFFFFFFF; /* Send back the address we're declining. */ i = DHO_DHCP_REQUESTED_ADDRESS; options[i] = &requested_address_tree; options[i]->value = lease->address.iabuf; options[i]->len = lease->address.len; options[i]->buf_size = lease->address.len; options[i]->timeout = 0xFFFFFFFF; /* Send the uid if the user supplied one. 
*/ i = DHO_DHCP_CLIENT_IDENTIFIER; if (ip->client->config->send_options[i].len) { options[i] = &client_id_tree; options[i]->value = ip->client->config->send_options[i].data; options[i]->len = ip->client->config->send_options[i].len; options[i]->buf_size = ip->client->config->send_options[i].len; options[i]->timeout = 0xFFFFFFFF; } /* Set up the option buffer... */ ip->client->packet_length = cons_options(NULL, &ip->client->packet, 0, options, 0, 0, 0, NULL, 0); if (ip->client->packet_length < BOOTP_MIN_LEN) ip->client->packet_length = BOOTP_MIN_LEN; ip->client->packet.op = BOOTREQUEST; ip->client->packet.htype = ip->hw_address.htype; ip->client->packet.hlen = ip->hw_address.hlen; ip->client->packet.hops = 0; ip->client->packet.xid = ip->client->xid; ip->client->packet.secs = 0; /* Filled in by send_request. */ ip->client->packet.flags = 0; /* ciaddr must always be zero. */ memset(&ip->client->packet.ciaddr, 0, sizeof(ip->client->packet.ciaddr)); memset(&ip->client->packet.yiaddr, 0, sizeof(ip->client->packet.yiaddr)); memset(&ip->client->packet.siaddr, 0, sizeof(ip->client->packet.siaddr)); memset(&ip->client->packet.giaddr, 0, sizeof(ip->client->packet.giaddr)); memcpy(ip->client->packet.chaddr, ip->hw_address.haddr, ip->hw_address.hlen); } void free_client_lease(struct client_lease *lease) { int i; if (lease->server_name) free(lease->server_name); if (lease->filename) free(lease->filename); for (i = 0; i < 256; i++) { if (lease->options[i].len) free(lease->options[i].data); } free(lease); } FILE *leaseFile; void rewrite_client_leases(void) { struct client_lease *lp; cap_rights_t rights; if (!leaseFile) { leaseFile = fopen(path_dhclient_db, "w"); if (!leaseFile) error("can't create %s: %m", path_dhclient_db); cap_rights_init(&rights, CAP_FCNTL, CAP_FSTAT, CAP_FSYNC, CAP_FTRUNCATE, CAP_SEEK, CAP_WRITE); if (cap_rights_limit(fileno(leaseFile), &rights) < 0 && errno != ENOSYS) { error("can't limit lease descriptor: %m"); } if (cap_fcntls_limit(fileno(leaseFile), 
CAP_FCNTL_GETFL) < 0 && errno != ENOSYS) { error("can't limit lease descriptor fcntls: %m"); } } else { fflush(leaseFile); rewind(leaseFile); } for (lp = ifi->client->leases; lp; lp = lp->next) write_client_lease(ifi, lp, 1); if (ifi->client->active) write_client_lease(ifi, ifi->client->active, 1); fflush(leaseFile); ftruncate(fileno(leaseFile), ftello(leaseFile)); fsync(fileno(leaseFile)); } void write_client_lease(struct interface_info *ip, struct client_lease *lease, int rewrite) { static int leases_written; struct tm *t; int i; if (!rewrite) { if (leases_written++ > 20) { rewrite_client_leases(); leases_written = 0; } } /* If the lease came from the config file, we don't need to stash a copy in the lease database. */ if (lease->is_static) return; if (!leaseFile) { /* XXX */ leaseFile = fopen(path_dhclient_db, "w"); if (!leaseFile) error("can't create %s: %m", path_dhclient_db); } fprintf(leaseFile, "lease {\n"); if (lease->is_bootp) fprintf(leaseFile, " bootp;\n"); fprintf(leaseFile, " interface \"%s\";\n", ip->name); fprintf(leaseFile, " fixed-address %s;\n", piaddr(lease->address)); if (lease->nextserver.len == sizeof(inaddr_any) && 0 != memcmp(lease->nextserver.iabuf, &inaddr_any, sizeof(inaddr_any))) fprintf(leaseFile, " next-server %s;\n", piaddr(lease->nextserver)); if (lease->filename) fprintf(leaseFile, " filename \"%s\";\n", lease->filename); if (lease->server_name) fprintf(leaseFile, " server-name \"%s\";\n", lease->server_name); if (lease->medium) fprintf(leaseFile, " medium \"%s\";\n", lease->medium->string); for (i = 0; i < 256; i++) if (lease->options[i].len) fprintf(leaseFile, " option %s %s;\n", dhcp_options[i].name, pretty_print_option(i, lease->options[i].data, lease->options[i].len, 1, 1)); t = gmtime(&lease->renewal); fprintf(leaseFile, " renew %d %d/%d/%d %02d:%02d:%02d;\n", t->tm_wday, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec); t = gmtime(&lease->rebind); fprintf(leaseFile, " rebind %d %d/%d/%d 
%02d:%02d:%02d;\n", t->tm_wday, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec); t = gmtime(&lease->expiry); fprintf(leaseFile, " expire %d %d/%d/%d %02d:%02d:%02d;\n", t->tm_wday, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec); fprintf(leaseFile, "}\n"); fflush(leaseFile); } void script_init(char *reason, struct string_list *medium) { size_t len, mediumlen = 0; struct imsg_hdr hdr; struct buf *buf; int errs; if (medium != NULL && medium->string != NULL) mediumlen = strlen(medium->string); hdr.code = IMSG_SCRIPT_INIT; hdr.len = sizeof(struct imsg_hdr) + sizeof(size_t) + mediumlen + sizeof(size_t) + strlen(reason); if ((buf = buf_open(hdr.len)) == NULL) error("buf_open: %m"); errs = 0; errs += buf_add(buf, &hdr, sizeof(hdr)); errs += buf_add(buf, &mediumlen, sizeof(mediumlen)); if (mediumlen > 0) errs += buf_add(buf, medium->string, mediumlen); len = strlen(reason); errs += buf_add(buf, &len, sizeof(len)); errs += buf_add(buf, reason, len); if (errs) error("buf_add: %m"); if (buf_close(privfd, buf) == -1) error("buf_close: %m"); } void priv_script_init(char *reason, char *medium) { struct interface_info *ip = ifi; if (ip) { ip->client->scriptEnvsize = 100; if (ip->client->scriptEnv == NULL) ip->client->scriptEnv = malloc(ip->client->scriptEnvsize * sizeof(char *)); if (ip->client->scriptEnv == NULL) error("script_init: no memory for environment"); ip->client->scriptEnv[0] = strdup(CLIENT_PATH); if (ip->client->scriptEnv[0] == NULL) error("script_init: no memory for environment"); ip->client->scriptEnv[1] = NULL; script_set_env(ip->client, "", "interface", ip->name); if (medium) script_set_env(ip->client, "", "medium", medium); script_set_env(ip->client, "", "reason", reason); } } void priv_script_write_params(char *prefix, struct client_lease *lease) { struct interface_info *ip = ifi; u_int8_t dbuf[1500], *dp = NULL; int i, len; char tbuf[128]; script_set_env(ip->client, prefix, "ip_address", 
piaddr(lease->address)); if (ip->client->config->default_actions[DHO_SUBNET_MASK] == ACTION_SUPERSEDE) { dp = ip->client->config->defaults[DHO_SUBNET_MASK].data; len = ip->client->config->defaults[DHO_SUBNET_MASK].len; } else { dp = lease->options[DHO_SUBNET_MASK].data; len = lease->options[DHO_SUBNET_MASK].len; } if (len && (len < sizeof(lease->address.iabuf))) { struct iaddr netmask, subnet, broadcast; memcpy(netmask.iabuf, dp, len); netmask.len = len; subnet = subnet_number(lease->address, netmask); if (subnet.len) { script_set_env(ip->client, prefix, "network_number", piaddr(subnet)); if (!lease->options[DHO_BROADCAST_ADDRESS].len) { broadcast = broadcast_addr(subnet, netmask); if (broadcast.len) script_set_env(ip->client, prefix, "broadcast_address", piaddr(broadcast)); } } } if (lease->filename) script_set_env(ip->client, prefix, "filename", lease->filename); if (lease->server_name) script_set_env(ip->client, prefix, "server_name", lease->server_name); for (i = 0; i < 256; i++) { len = 0; if (ip->client->config->defaults[i].len) { if (lease->options[i].len) { switch ( ip->client->config->default_actions[i]) { case ACTION_DEFAULT: dp = lease->options[i].data; len = lease->options[i].len; break; case ACTION_SUPERSEDE: supersede: dp = ip->client-> config->defaults[i].data; len = ip->client-> config->defaults[i].len; break; case ACTION_PREPEND: len = ip->client-> config->defaults[i].len + lease->options[i].len; if (len >= sizeof(dbuf)) { warning("no space to %s %s", "prepend option", dhcp_options[i].name); goto supersede; } dp = dbuf; memcpy(dp, ip->client-> config->defaults[i].data, ip->client-> config->defaults[i].len); memcpy(dp + ip->client-> config->defaults[i].len, lease->options[i].data, lease->options[i].len); dp[len] = '\0'; break; case ACTION_APPEND: /* * When we append, we assume that we're * appending to text. Some MS servers * include a NUL byte at the end of * the search string provided. 
*/ len = ip->client-> config->defaults[i].len + lease->options[i].len; if (len >= sizeof(dbuf)) { warning("no space to %s %s", "append option", dhcp_options[i].name); goto supersede; } memcpy(dbuf, lease->options[i].data, lease->options[i].len); for (dp = dbuf + lease->options[i].len; dp > dbuf; dp--, len--) if (dp[-1] != '\0') break; memcpy(dp, ip->client-> config->defaults[i].data, ip->client-> config->defaults[i].len); dp = dbuf; dp[len] = '\0'; } } else { dp = ip->client-> config->defaults[i].data; len = ip->client-> config->defaults[i].len; } } else if (lease->options[i].len) { len = lease->options[i].len; dp = lease->options[i].data; } else { len = 0; } if (len) { char name[256]; if (dhcp_option_ev_name(name, sizeof(name), &dhcp_options[i])) script_set_env(ip->client, prefix, name, pretty_print_option(i, dp, len, 0, 0)); } } snprintf(tbuf, sizeof(tbuf), "%d", (int)lease->expiry); script_set_env(ip->client, prefix, "expiry", tbuf); } void script_write_params(char *prefix, struct client_lease *lease) { size_t fn_len = 0, sn_len = 0, pr_len = 0; struct imsg_hdr hdr; struct buf *buf; int errs, i; if (lease->filename != NULL) fn_len = strlen(lease->filename); if (lease->server_name != NULL) sn_len = strlen(lease->server_name); if (prefix != NULL) pr_len = strlen(prefix); hdr.code = IMSG_SCRIPT_WRITE_PARAMS; hdr.len = sizeof(hdr) + sizeof(struct client_lease) + sizeof(size_t) + fn_len + sizeof(size_t) + sn_len + sizeof(size_t) + pr_len; for (i = 0; i < 256; i++) hdr.len += sizeof(int) + lease->options[i].len; scripttime = time(NULL); if ((buf = buf_open(hdr.len)) == NULL) error("buf_open: %m"); errs = 0; errs += buf_add(buf, &hdr, sizeof(hdr)); errs += buf_add(buf, lease, sizeof(struct client_lease)); errs += buf_add(buf, &fn_len, sizeof(fn_len)); errs += buf_add(buf, lease->filename, fn_len); errs += buf_add(buf, &sn_len, sizeof(sn_len)); errs += buf_add(buf, lease->server_name, sn_len); errs += buf_add(buf, &pr_len, sizeof(pr_len)); errs += buf_add(buf, prefix, 
 	    pr_len);
 	for (i = 0; i < 256; i++) {
 		errs += buf_add(buf, &lease->options[i].len,
 		    sizeof(lease->options[i].len));
 		errs += buf_add(buf, lease->options[i].data,
 		    lease->options[i].len);
 	}

 	if (errs)
 		error("buf_add: %m");

 	if (buf_close(privfd, buf) == -1)
 		error("buf_close: %m");
 }

 int
 script_go(void)
 {
 	struct imsg_hdr	 hdr;
 	struct buf	*buf;
 	int		 ret;

 	hdr.code = IMSG_SCRIPT_GO;
 	hdr.len = sizeof(struct imsg_hdr);

 	if ((buf = buf_open(hdr.len)) == NULL)
 		error("buf_open: %m");

 	if (buf_add(buf, &hdr, sizeof(hdr)))
 		error("buf_add: %m");

 	if (buf_close(privfd, buf) == -1)
 		error("buf_close: %m");

 	bzero(&hdr, sizeof(hdr));
 	buf_read(privfd, &hdr, sizeof(hdr));
 	if (hdr.code != IMSG_SCRIPT_GO_RET)
 		error("unexpected msg type %u", hdr.code);
 	if (hdr.len != sizeof(hdr) + sizeof(int))
 		error("received corrupted message");
 	buf_read(privfd, &ret, sizeof(ret));

 	scripttime = time(NULL);

 	return (ret);
 }

 int
 priv_script_go(void)
 {
 	char *scriptName, *argv[2], **envp, *epp[3], reason[] = "REASON=NBI";
 	static char client_path[] = CLIENT_PATH;
 	struct interface_info *ip = ifi;
 	int pid, wpid, wstatus;

 	scripttime = time(NULL);

 	if (ip) {
 		scriptName = ip->client->config->script_name;
 		envp = ip->client->scriptEnv;
 	} else {
 		scriptName = top_level_config.script_name;
 		epp[0] = reason;
 		epp[1] = client_path;
 		epp[2] = NULL;
 		envp = epp;
 	}

 	argv[0] = scriptName;
 	argv[1] = NULL;

 	pid = fork();
 	if (pid < 0) {
 		error("fork: %m");
 		wstatus = 0;
 	} else if (pid) {
 		do {
 			wpid = wait(&wstatus);
 		} while (wpid != pid && wpid > 0);
 		if (wpid < 0) {
 			error("wait: %m");
 			wstatus = 0;
 		}
 	} else {
 		execve(scriptName, argv, envp);
 		error("execve (%s, ...): %m", scriptName);
 	}

 	if (ip)
 		script_flush_env(ip->client);

 	return (wstatus & 0xff);
 }

 void
 script_set_env(struct client_state *client, const char *prefix,
     const char *name, const char *value)
 {
 	int i, j, namelen;

+	/* No `` or $() command substitution allowed in environment values! */
+	for (j=0; j < strlen(value); j++)
+		switch (value[j]) {
+		case '`':
+		case '$':
+			warning("illegal character (%c) in value '%s'",
+			    value[j], value);
+			/* Ignore this option */
+			return;
+		}
+
 	namelen = strlen(name);

 	for (i = 0; client->scriptEnv[i]; i++)
 		if (strncmp(client->scriptEnv[i], name, namelen) == 0 &&
 		    client->scriptEnv[i][namelen] == '=')
 			break;

 	if (client->scriptEnv[i])
 		/* Reuse the slot. */
 		free(client->scriptEnv[i]);
 	else {
 		/* New variable. Expand if necessary. */
 		if (i >= client->scriptEnvsize - 1) {
 			char **newscriptEnv;
 			int newscriptEnvsize = client->scriptEnvsize + 50;

 			newscriptEnv = realloc(client->scriptEnv,
 			    newscriptEnvsize);
 			if (newscriptEnv == NULL) {
 				free(client->scriptEnv);
 				client->scriptEnv = NULL;
 				client->scriptEnvsize = 0;
 				error("script_set_env: no memory for variable");
 			}
 			client->scriptEnv = newscriptEnv;
 			client->scriptEnvsize = newscriptEnvsize;
 		}
 		/* need to set the NULL pointer at end of array beyond
 		   the new slot. */
 		client->scriptEnv[i + 1] = NULL;
 	}
 	/* Allocate space and format the variable in the appropriate slot. */
 	client->scriptEnv[i] = malloc(strlen(prefix) + strlen(name) + 1 +
 	    strlen(value) + 1);
 	if (client->scriptEnv[i] == NULL)
 		error("script_set_env: no memory for variable assignment");
-
-	/* No `` or $() command substitution allowed in environment values! */
-	for (j=0; j < strlen(value); j++)
-		switch (value[j]) {
-		case '`':
-		case '$':
-			error("illegal character (%c) in value '%s'", value[j],
-			    value);
-			/* not reached */
-		}
 	snprintf(client->scriptEnv[i], strlen(prefix) + strlen(name) +
 	    1 + strlen(value) + 1, "%s%s=%s", prefix, name, value);
 }

 void
 script_flush_env(struct client_state *client)
 {
 	int i;

 	for (i = 0; client->scriptEnv[i]; i++) {
 		free(client->scriptEnv[i]);
 		client->scriptEnv[i] = NULL;
 	}
 	client->scriptEnvsize = 0;
 }

 int
 dhcp_option_ev_name(char *buf, size_t buflen, struct option *option)
 {
 	int i;

 	for (i = 0; option->name[i]; i++) {
 		if (i + 1 == buflen)
 			return 0;
 		if (option->name[i] == '-')
 			buf[i] = '_';
 		else
 			buf[i] = option->name[i];
 	}

 	buf[i] = 0;
 	return 1;
 }

 void
 go_daemon(void)
 {
 	static int state = 0;
 	cap_rights_t rights;

 	if (no_daemon || state)
 		return;

 	state = 1;

 	/* Stop logging to stderr... */
 	log_perror = 0;

 	if (daemon(1, 0) == -1)
 		error("daemon");

 	cap_rights_init(&rights);

 	if (pidfile != NULL) {
 		pidfile_write(pidfile);
 		if (cap_rights_limit(pidfile_fileno(pidfile), &rights) < 0 &&
 		    errno != ENOSYS) {
 			error("can't limit pidfile descriptor: %m");
 		}
 	}

 	/* we are chrooted, daemon(3) fails to open /dev/null */
 	if (nullfd != -1) {
 		dup2(nullfd, STDIN_FILENO);
 		dup2(nullfd, STDOUT_FILENO);
 		dup2(nullfd, STDERR_FILENO);
 		close(nullfd);
 		nullfd = -1;
 	}

 	if (cap_rights_limit(STDIN_FILENO, &rights) < 0 && errno != ENOSYS)
 		error("can't limit stdin: %m");
 	cap_rights_init(&rights, CAP_WRITE);
 	if (cap_rights_limit(STDOUT_FILENO, &rights) < 0 && errno != ENOSYS)
 		error("can't limit stdout: %m");
 	if (cap_rights_limit(STDERR_FILENO, &rights) < 0 && errno != ENOSYS)
 		error("can't limit stderr: %m");
 }

 int
 check_option(struct client_lease *l, int option)
 {
 	char *opbuf;
 	char *sbuf;

 	/* we use this, since this is what gets passed to dhclient-script */

 	opbuf = pretty_print_option(option, l->options[option].data,
 	    l->options[option].len, 0, 0);

 	sbuf = option_as_string(option, l->options[option].data,
 	    l->options[option].len);
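Editorial note (not part of the patch): the `script_set_env()` hunk in this diff moves the check for backquote and dollar-sign characters ahead of the slot allocation and downgrades it from `error()` to `warning()` followed by `return`, so an offending value is skipped instead of ending in the call marked `/* not reached */`. The following is a minimal sketch of that check in isolation, assuming only the character test matters. The helper name `env_value_ok` is hypothetical and does not exist in dhclient, and `strpbrk(3)` is swapped in for the explicit loop used in the patch.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of dhclient: returns 1 when the value
 * contains neither '`' nor '$' (the two characters the patch rejects),
 * and 0 when the caller should warn and skip the variable.
 */
int
env_value_ok(const char *value)
{
	/* strpbrk() returns NULL when none of the listed characters occur. */
	return (strpbrk(value, "`$") == NULL);
}
```

Under these assumptions a caller would set the environment variable only when `env_value_ok(value)` returns 1, and otherwise log a warning and continue with the remaining options.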
switch (option) { case DHO_SUBNET_MASK: case DHO_TIME_SERVERS: case DHO_NAME_SERVERS: case DHO_ROUTERS: case DHO_DOMAIN_NAME_SERVERS: case DHO_LOG_SERVERS: case DHO_COOKIE_SERVERS: case DHO_LPR_SERVERS: case DHO_IMPRESS_SERVERS: case DHO_RESOURCE_LOCATION_SERVERS: case DHO_SWAP_SERVER: case DHO_BROADCAST_ADDRESS: case DHO_NIS_SERVERS: case DHO_NTP_SERVERS: case DHO_NETBIOS_NAME_SERVERS: case DHO_NETBIOS_DD_SERVER: case DHO_FONT_SERVERS: case DHO_DHCP_SERVER_IDENTIFIER: case DHO_NISPLUS_SERVERS: case DHO_MOBILE_IP_HOME_AGENT: case DHO_SMTP_SERVER: case DHO_POP_SERVER: case DHO_NNTP_SERVER: case DHO_WWW_SERVER: case DHO_FINGER_SERVER: case DHO_IRC_SERVER: case DHO_STREETTALK_SERVER: case DHO_STREETTALK_DA_SERVER: if (!ipv4addrs(opbuf)) { warning("Invalid IP address in option: %s", opbuf); return (0); } return (1) ; case DHO_HOST_NAME: case DHO_NIS_DOMAIN: case DHO_NISPLUS_DOMAIN: case DHO_TFTP_SERVER_NAME: if (!res_hnok(sbuf)) { warning("Bogus Host Name option %d: %s (%s)", option, sbuf, opbuf); l->options[option].len = 0; free(l->options[option].data); } return (1); case DHO_DOMAIN_NAME: case DHO_DOMAIN_SEARCH: if (!res_hnok(sbuf)) { if (!check_search(sbuf)) { warning("Bogus domain search list %d: %s (%s)", option, sbuf, opbuf); l->options[option].len = 0; free(l->options[option].data); } } return (1); case DHO_PAD: case DHO_TIME_OFFSET: case DHO_BOOT_SIZE: case DHO_MERIT_DUMP: case DHO_ROOT_PATH: case DHO_EXTENSIONS_PATH: case DHO_IP_FORWARDING: case DHO_NON_LOCAL_SOURCE_ROUTING: case DHO_POLICY_FILTER: case DHO_MAX_DGRAM_REASSEMBLY: case DHO_DEFAULT_IP_TTL: case DHO_PATH_MTU_AGING_TIMEOUT: case DHO_PATH_MTU_PLATEAU_TABLE: case DHO_INTERFACE_MTU: case DHO_ALL_SUBNETS_LOCAL: case DHO_PERFORM_MASK_DISCOVERY: case DHO_MASK_SUPPLIER: case DHO_ROUTER_DISCOVERY: case DHO_ROUTER_SOLICITATION_ADDRESS: case DHO_STATIC_ROUTES: case DHO_TRAILER_ENCAPSULATION: case DHO_ARP_CACHE_TIMEOUT: case DHO_IEEE802_3_ENCAPSULATION: case DHO_DEFAULT_TCP_TTL: case 
DHO_TCP_KEEPALIVE_INTERVAL: case DHO_TCP_KEEPALIVE_GARBAGE: case DHO_VENDOR_ENCAPSULATED_OPTIONS: case DHO_NETBIOS_NODE_TYPE: case DHO_NETBIOS_SCOPE: case DHO_X_DISPLAY_MANAGER: case DHO_DHCP_REQUESTED_ADDRESS: case DHO_DHCP_LEASE_TIME: case DHO_DHCP_OPTION_OVERLOAD: case DHO_DHCP_MESSAGE_TYPE: case DHO_DHCP_PARAMETER_REQUEST_LIST: case DHO_DHCP_MESSAGE: case DHO_DHCP_MAX_MESSAGE_SIZE: case DHO_DHCP_RENEWAL_TIME: case DHO_DHCP_REBINDING_TIME: case DHO_DHCP_CLASS_IDENTIFIER: case DHO_DHCP_CLIENT_IDENTIFIER: case DHO_BOOTFILE_NAME: case DHO_DHCP_USER_CLASS_ID: case DHO_END: return (1); case DHO_CLASSLESS_ROUTES: return (check_classless_option(l->options[option].data, l->options[option].len)); default: warning("unknown dhcp option value 0x%x", option); return (unknown_ok); } } /* RFC 3442 The Classless Static Routes option checks */ int check_classless_option(unsigned char *data, int len) { int i = 0; unsigned char width; in_addr_t addr, mask; if (len < 5) { warning("Too small length: %d", len); return (0); } while(i < len) { width = data[i++]; if (width == 0) { i += 4; continue; } else if (width < 9) { addr = (in_addr_t)(data[i] << 24); i += 1; } else if (width < 17) { addr = (in_addr_t)(data[i] << 24) + (in_addr_t)(data[i + 1] << 16); i += 2; } else if (width < 25) { addr = (in_addr_t)(data[i] << 24) + (in_addr_t)(data[i + 1] << 16) + (in_addr_t)(data[i + 2] << 8); i += 3; } else if (width < 33) { addr = (in_addr_t)(data[i] << 24) + (in_addr_t)(data[i + 1] << 16) + (in_addr_t)(data[i + 2] << 8) + data[i + 3]; i += 4; } else { warning("Incorrect subnet width: %d", width); return (0); } mask = (in_addr_t)(~0) << (32 - width); addr = ntohl(addr); mask = ntohl(mask); /* * From RFC 3442: * ... After deriving a subnet number and subnet mask * from each destination descriptor, the DHCP client * MUST zero any bits in the subnet number where the * corresponding bit in the mask is zero... 
*/ if ((addr & mask) != addr) { addr &= mask; data[i - 1] = (unsigned char)( (addr >> (((32 - width)/8)*8)) & 0xFF); } i += 4; } if (i > len) { warning("Incorrect data length: %d (must be %d)", len, i); return (0); } return (1); } int res_hnok(const char *dn) { int pch = PERIOD, ch = *dn++; while (ch != '\0') { int nch = *dn++; if (periodchar(ch)) { ; } else if (periodchar(pch)) { if (!borderchar(ch)) return (0); } else if (periodchar(nch) || nch == '\0') { if (!borderchar(ch)) return (0); } else { if (!middlechar(ch)) return (0); } pch = ch, ch = nch; } return (1); } int check_search(const char *srch) { int pch = PERIOD, ch = *srch++; int domains = 1; /* 256 char limit re resolv.conf(5) */ if (strlen(srch) > 256) return (0); while (whitechar(ch)) ch = *srch++; while (ch != '\0') { int nch = *srch++; if (periodchar(ch) || whitechar(ch)) { ; } else if (periodchar(pch)) { if (!borderchar(ch)) return (0); } else if (periodchar(nch) || nch == '\0') { if (!borderchar(ch)) return (0); } else { if (!middlechar(ch)) return (0); } if (!whitechar(ch)) { pch = ch; } else { while (whitechar(nch)) { nch = *srch++; } if (nch != '\0') domains++; pch = PERIOD; } ch = nch; } /* 6 domain limit re resolv.conf(5) */ if (domains > 6) return (0); return (1); } /* Does buf consist only of dotted decimal ipv4 addrs? 
* return how many if so, * otherwise, return 0 */ int ipv4addrs(char * buf) { struct in_addr jnk; int count = 0; while (inet_aton(buf, &jnk) == 1){ count++; while (periodchar(*buf) || digitchar(*buf)) buf++; if (*buf == '\0') return (count); while (*buf == ' ') buf++; } return (0); } char * option_as_string(unsigned int code, unsigned char *data, int len) { static char optbuf[32768]; /* XXX */ char *op = optbuf; int opleft = sizeof(optbuf); unsigned char *dp = data; if (code > 255) error("option_as_string: bad code %d", code); for (; dp < data + len; dp++) { if (!isascii(*dp) || !isprint(*dp)) { if (dp + 1 != data + len || *dp != 0) { snprintf(op, opleft, "\\%03o", *dp); op += 4; opleft -= 4; } } else if (*dp == '"' || *dp == '\'' || *dp == '$' || *dp == '`' || *dp == '\\') { *op++ = '\\'; *op++ = *dp; opleft -= 2; } else { *op++ = *dp; opleft--; } } if (opleft < 1) goto toobig; *op = 0; return optbuf; toobig: warning("dhcp option too large"); return ""; } int fork_privchld(int fd, int fd2) { struct pollfd pfd[1]; int nfds; switch (fork()) { case -1: error("cannot fork"); case 0: break; default: return (0); } setproctitle("%s [priv]", ifi->name); setsid(); dup2(nullfd, STDIN_FILENO); dup2(nullfd, STDOUT_FILENO); dup2(nullfd, STDERR_FILENO); close(nullfd); close(fd2); close(ifi->rfdesc); ifi->rfdesc = -1; for (;;) { pfd[0].fd = fd; pfd[0].events = POLLIN; if ((nfds = poll(pfd, 1, INFTIM)) == -1) if (errno != EINTR) error("poll error"); if (nfds == 0 || !(pfd[0].revents & POLLIN)) continue; dispatch_imsg(ifi, fd); } } Index: releng/10.3/sys/conf/newvers.sh =================================================================== --- releng/10.3/sys/conf/newvers.sh (revision 303983) +++ releng/10.3/sys/conf/newvers.sh (revision 303984) @@ -1,232 +1,232 @@ #!/bin/sh - # # Copyright (c) 1984, 1986, 1990, 1993 # The Regents of the University of California. All rights reserved. 
# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # 4. Neither the name of the University nor the names of its contributors # may be used to endorse or promote products derived from this software # without specific prior written permission. # # THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # @(#)newvers.sh 8.1 (Berkeley) 4/20/94 # $FreeBSD$ TYPE="FreeBSD" REVISION="10.3" -BRANCH="RELEASE-p6" +BRANCH="RELEASE-p7" if [ "X${BRANCH_OVERRIDE}" != "X" ]; then BRANCH=${BRANCH_OVERRIDE} fi RELEASE="${REVISION}-${BRANCH}" VERSION="${TYPE} ${RELEASE}" if [ "X${SYSDIR}" = "X" ]; then SYSDIR=$(dirname $0)/.. 
fi if [ "X${PARAMFILE}" != "X" ]; then RELDATE=$(awk '/__FreeBSD_version.*propagated to newvers/ {print $3}' \ ${PARAMFILE}) else RELDATE=$(awk '/__FreeBSD_version.*propagated to newvers/ {print $3}' \ ${SYSDIR}/sys/param.h) fi b=share/examples/etc/bsd-style-copyright year=$(sed -Ee '/^Copyright .* The FreeBSD Project/!d;s/^.*1992-([0-9]*) .*$/\1/g' ${SYSDIR}/../COPYRIGHT) # look for copyright template for bsd_copyright in ../$b ../../$b ../../../$b /usr/src/$b /usr/$b do if [ -r "$bsd_copyright" ]; then COPYRIGHT=`sed \ -e "s/\[year\]/1992-$year/" \ -e 's/\[your name here\]\.* /The FreeBSD Project./' \ -e 's/\[your name\]\.*/The FreeBSD Project./' \ -e '/\[id for your version control system, if any\]/d' \ $bsd_copyright` break fi done # no copyright found, use a dummy if [ X"$COPYRIGHT" = X ]; then COPYRIGHT="/*- * Copyright (c) 1992-$year The FreeBSD Project. * All rights reserved. * */" fi # add newline COPYRIGHT="$COPYRIGHT " LC_ALL=C; export LC_ALL if [ ! -r version ] then echo 0 > version fi touch version v=`cat version` u=${USER:-root} d=`pwd` h=${HOSTNAME:-`hostname`} if [ -n "$SOURCE_DATE_EPOCH" ]; then if ! t=`date -r $SOURCE_DATE_EPOCH 2>/dev/null`; then echo "Invalid SOURCE_DATE_EPOCH" >&2 exit 1 fi else t=`date` fi i=`${MAKE:-make} -V KERN_IDENT` compiler_v=$($(${MAKE:-make} -V CC) -v 2>&1 | grep 'version') for dir in /usr/bin /usr/local/bin; do if [ ! -z "${svnversion}" ] ; then break fi if [ -x "${dir}/svnversion" ] && [ -z ${svnversion} ] ; then # Run svnversion from ${dir} on this script; if return code # is not zero, the checkout might not be compatible with the # svnversion being used. ${dir}/svnversion $(realpath ${0}) >/dev/null 2>&1 if [ $? -eq 0 ]; then svnversion=${dir}/svnversion break fi fi done if [ -z "${svnversion}" ] && [ -x /usr/bin/svnliteversion ] ; then /usr/bin/svnliteversion $(realpath ${0}) >/dev/null 2>&1 if [ $? 
-eq 0 ]; then svnversion=/usr/bin/svnliteversion else svnversion= fi fi for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/p4" ] && [ -z ${p4_cmd} ] ; then p4_cmd=${dir}/p4 fi done if [ -d "${SYSDIR}/../.git" ] ; then for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/git" ] ; then git_cmd="${dir}/git --git-dir=${SYSDIR}/../.git" break fi done fi if [ -d "${SYSDIR}/../.hg" ] ; then for dir in /usr/bin /usr/local/bin; do if [ -x "${dir}/hg" ] ; then hg_cmd="${dir}/hg -R ${SYSDIR}/.." break fi done fi if [ -n "$svnversion" ] ; then svn=`cd ${SYSDIR} && $svnversion 2>/dev/null` case "$svn" in [0-9]*) svn=" r${svn}" ;; *) unset svn ;; esac fi if [ -n "$git_cmd" ] ; then git=`$git_cmd rev-parse --verify --short HEAD 2>/dev/null` svn=`$git_cmd svn find-rev $git 2>/dev/null` if [ -n "$svn" ] ; then svn=" r${svn}" git="=${git}" else svn=`$git_cmd log | fgrep 'git-svn-id:' | head -1 | \ sed -n 's/^.*@\([0-9][0-9]*\).*$/\1/p'` if [ -z "$svn" ] ; then svn=`$git_cmd log --format='format:%N' | \ grep '^svn ' | head -1 | \ sed -n 's/^.*revision=\([0-9][0-9]*\).*$/\1/p'` fi if [ -n "$svn" ] ; then svn=" r${svn}" git="+${git}" else git=" ${git}" fi fi git_b=`$git_cmd rev-parse --abbrev-ref HEAD` if [ -n "$git_b" ] ; then git="${git}(${git_b})" fi if $git_cmd --work-tree=${SYSDIR}/.. diff-index \ --name-only HEAD | read dummy; then git="${git}-dirty" fi fi if [ -n "$p4_cmd" ] ; then p4version=`cd ${SYSDIR} && $p4_cmd changes -m1 "./...#have" 2>&1 | \ awk '{ print $2 }'` case "$p4version" in [0-9]*) p4version=" ${p4version}" p4opened=`cd ${SYSDIR} && $p4_cmd opened ./... 
2>&1` case "$p4opened" in File*) ;; //*) p4version="${p4version}+edit" ;; esac ;; *) unset p4version ;; esac fi if [ -n "$hg_cmd" ] ; then hg=`$hg_cmd id 2>/dev/null` svn=`$hg_cmd svn info 2>/dev/null | \ awk -F': ' '/Revision/ { print $2 }'` if [ -n "$svn" ] ; then svn=" r${svn}" fi if [ -n "$hg" ] ; then hg=" ${hg}" fi fi cat << EOF > vers.c $COPYRIGHT #define SCCSSTR "@(#)${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}" #define VERSTR "${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}\\n ${u}@${h}:${d}\\n" #define RELSTR "${RELEASE}" char sccs[sizeof(SCCSSTR) > 128 ? sizeof(SCCSSTR) : 128] = SCCSSTR; char version[sizeof(VERSTR) > 256 ? sizeof(VERSTR) : 256] = VERSTR; char compiler_version[] = "${compiler_v}"; char ostype[] = "${TYPE}"; char osrelease[sizeof(RELSTR) > 32 ? sizeof(RELSTR) : 32] = RELSTR; int osreldate = ${RELDATE}; char kern_ident[] = "${i}"; EOF echo $((v + 1)) > version Index: releng/10.3/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c =================================================================== --- releng/10.3/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c (revision 303983) +++ releng/10.3/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c (revision 303984) @@ -1,2145 +1,2195 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /** * StorVSC driver for Hyper-V. This driver presents a SCSI HBA interface * to the Common Access Method (CAM) layer. CAM control blocks (CCBs) are * converted into VSCSI protocol messages which are delivered to the parent * partition StorVSP driver over the Hyper-V VMBUS.
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "hv_vstorage.h" #define STORVSC_RINGBUFFER_SIZE (20*PAGE_SIZE) #define STORVSC_MAX_LUNS_PER_TARGET (64) #define STORVSC_MAX_IO_REQUESTS (STORVSC_MAX_LUNS_PER_TARGET * 2) #define BLKVSC_MAX_IDE_DISKS_PER_TARGET (1) #define BLKVSC_MAX_IO_REQUESTS STORVSC_MAX_IO_REQUESTS #define STORVSC_MAX_TARGETS (2) -#define STORVSC_WIN7_MAJOR 4 -#define STORVSC_WIN7_MINOR 2 - -#define STORVSC_WIN8_MAJOR 5 -#define STORVSC_WIN8_MINOR 1 - #define VSTOR_PKT_SIZE (sizeof(struct vstor_packet) - vmscsi_size_delta) #define HV_ALIGN(x, a) roundup2(x, a) struct storvsc_softc; struct hv_sgl_node { LIST_ENTRY(hv_sgl_node) link; struct sglist *sgl_data; }; struct hv_sgl_page_pool{ LIST_HEAD(, hv_sgl_node) in_use_sgl_list; LIST_HEAD(, hv_sgl_node) free_sgl_list; boolean_t is_init; } g_hv_sgl_page_pool; #define STORVSC_MAX_SG_PAGE_CNT STORVSC_MAX_IO_REQUESTS * HV_MAX_MULTIPAGE_BUFFER_COUNT enum storvsc_request_type { WRITE_TYPE, READ_TYPE, UNKNOWN_TYPE }; struct hv_storvsc_request { LIST_ENTRY(hv_storvsc_request) link; struct vstor_packet vstor_packet; hv_vmbus_multipage_buffer data_buf; void *sense_data; uint8_t sense_info_len; uint8_t retries; union ccb *ccb; struct storvsc_softc *softc; struct callout callout; struct sema synch_sema; /*Synchronize the request/response if needed */ struct sglist *bounce_sgl; unsigned int bounce_sgl_count; uint64_t not_aligned_seg_bits; }; struct storvsc_softc { struct hv_device *hs_dev; LIST_HEAD(, hv_storvsc_request) hs_free_list; struct mtx hs_lock; struct storvsc_driver_props *hs_drv_props; int hs_unit; uint32_t hs_frozen; struct cam_sim *hs_sim; struct cam_path *hs_path; uint32_t hs_num_out_reqs; 
boolean_t hs_destroy; boolean_t hs_drain_notify; boolean_t hs_open_multi_channel; struct sema hs_drain_sema; struct hv_storvsc_request hs_init_req; struct hv_storvsc_request hs_reset_req; }; /** * HyperV storvsc timeout testing cases: * a. IO returned after first timeout; * b. IO returned after second timeout and queue freeze; * c. IO returned while timer handler is running * The first can be tested by "sg_senddiag -vv /dev/daX", * and the second and third can be done by * "sg_wr_mode -v -p 08 -c 0,1a -m 0,ff /dev/daX". */ #define HVS_TIMEOUT_TEST 0 /* * Bus/adapter reset functionality on the Hyper-V host is * buggy and it will be disabled until * it can be further tested. */ #define HVS_HOST_RESET 0 struct storvsc_driver_props { char *drv_name; char *drv_desc; uint8_t drv_max_luns_per_target; uint8_t drv_max_ios_per_target; uint32_t drv_ringbuffer_size; }; enum hv_storage_type { DRIVER_BLKVSC, DRIVER_STORVSC, DRIVER_UNKNOWN }; #define HS_MAX_ADAPTERS 10 #define HV_STORAGE_SUPPORTS_MULTI_CHANNEL 0x1 /* {ba6163d9-04a1-4d29-b605-72e2ffb1dc7f} */ static const hv_guid gStorVscDeviceType={ .data = {0xd9, 0x63, 0x61, 0xba, 0xa1, 0x04, 0x29, 0x4d, 0xb6, 0x05, 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f} }; /* {32412632-86cb-44a2-9b5c-50d1417354f5} */ static const hv_guid gBlkVscDeviceType={ .data = {0x32, 0x26, 0x41, 0x32, 0xcb, 0x86, 0xa2, 0x44, 0x9b, 0x5c, 0x50, 0xd1, 0x41, 0x73, 0x54, 0xf5} }; static struct storvsc_driver_props g_drv_props_table[] = { {"blkvsc", "Hyper-V IDE Storage Interface", BLKVSC_MAX_IDE_DISKS_PER_TARGET, BLKVSC_MAX_IO_REQUESTS, STORVSC_RINGBUFFER_SIZE}, {"storvsc", "Hyper-V SCSI Storage Interface", STORVSC_MAX_LUNS_PER_TARGET, STORVSC_MAX_IO_REQUESTS, STORVSC_RINGBUFFER_SIZE} }; /* * Sense buffer size changed in win8; have a run-time * variable to track the size we should use. */ -static int sense_buffer_size; +static int sense_buffer_size = PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE; /* * The size of the vmscsi_request has changed in win8. 
The * additional size is for the newly added elements in the * structure. These elements are valid only when we are talking * to a win8 host. * Track the correct size we need to apply. */ static int vmscsi_size_delta; +/* + * The storage protocol version is determined during the + * initial exchange with the host. It will indicate which + * storage functionality is available in the host. +*/ +static int vmstor_proto_version; -static int storvsc_current_major; -static int storvsc_current_minor; +struct vmstor_proto { + int proto_version; + int sense_buffer_size; + int vmscsi_size_delta; +}; +static const struct vmstor_proto vmstor_proto_list[] = { + { + VMSTOR_PROTOCOL_VERSION_WIN10, + POST_WIN7_STORVSC_SENSE_BUFFER_SIZE, + 0 + }, + { + VMSTOR_PROTOCOL_VERSION_WIN8_1, + POST_WIN7_STORVSC_SENSE_BUFFER_SIZE, + 0 + }, + { + VMSTOR_PROTOCOL_VERSION_WIN8, + POST_WIN7_STORVSC_SENSE_BUFFER_SIZE, + 0 + }, + { + VMSTOR_PROTOCOL_VERSION_WIN7, + PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE, + sizeof(struct vmscsi_win8_extension), + }, + { + VMSTOR_PROTOCOL_VERSION_WIN6, + PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE, + sizeof(struct vmscsi_win8_extension), + } +}; + /* static functions */ static int storvsc_probe(device_t dev); static int storvsc_attach(device_t dev); static int storvsc_detach(device_t dev); static void storvsc_poll(struct cam_sim * sim); static void storvsc_action(struct cam_sim * sim, union ccb * ccb); static int create_storvsc_request(union ccb *ccb, struct hv_storvsc_request *reqp); static void storvsc_free_request(struct storvsc_softc *sc, struct hv_storvsc_request *reqp); static enum hv_storage_type storvsc_get_storage_type(device_t dev); static void hv_storvsc_rescan_target(struct storvsc_softc *sc); static void hv_storvsc_on_channel_callback(void *context); static void hv_storvsc_on_iocompletion( struct storvsc_softc *sc, struct vstor_packet *vstor_packet, struct hv_storvsc_request *request); static int hv_storvsc_connect_vsp(struct hv_device *device); static void 
storvsc_io_done(struct hv_storvsc_request *reqp); static void storvsc_copy_sgl_to_bounce_buf(struct sglist *bounce_sgl, bus_dma_segment_t *orig_sgl, unsigned int orig_sgl_count, uint64_t seg_bits); void storvsc_copy_from_bounce_buf_to_sgl(bus_dma_segment_t *dest_sgl, unsigned int dest_sgl_count, struct sglist* src_sgl, uint64_t seg_bits); static device_method_t storvsc_methods[] = { /* Device interface */ DEVMETHOD(device_probe, storvsc_probe), DEVMETHOD(device_attach, storvsc_attach), DEVMETHOD(device_detach, storvsc_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD_END }; static driver_t storvsc_driver = { "storvsc", storvsc_methods, sizeof(struct storvsc_softc), }; static devclass_t storvsc_devclass; DRIVER_MODULE(storvsc, vmbus, storvsc_driver, storvsc_devclass, 0, 0); MODULE_VERSION(storvsc, 1); MODULE_DEPEND(storvsc, vmbus, 1, 1, 1); /** * The host is capable of sending messages to us that are * completely unsolicited. So, we need to address the race * condition where we may be in the process of unloading the * driver when the host may send us an unsolicited message. * We address this issue by implementing a sequentially * consistent protocol: * * 1. Channel callback is invoked while holding the channel lock * and an unloading driver will reset the channel callback under * the protection of this channel lock. * * 2. To ensure bounded wait time for unloading a driver, we don't * permit outgoing traffic once the device is marked as being * destroyed. * * 3. Once the device is marked as being destroyed, we only * permit incoming traffic to properly account for * packets already sent out. */ static inline struct storvsc_softc * get_stor_device(struct hv_device *device, boolean_t outbound) { struct storvsc_softc *sc; sc = device_get_softc(device->device); if (sc == NULL) { return NULL; } if (outbound) { /* * Here we permit outgoing I/O only * if the device is not being destroyed.
*/ if (sc->hs_destroy) { sc = NULL; } } else { /* * inbound case; if being destroyed * only permit to account for * messages already sent out. */ if (sc->hs_destroy && (sc->hs_num_out_reqs == 0)) { sc = NULL; } } return sc; } /** * @brief Callback handler, will be invoked when receive mutil-channel offer * * @param context new multi-channel */ static void storvsc_handle_sc_creation(void *context) { hv_vmbus_channel *new_channel; struct hv_device *device; struct storvsc_softc *sc; struct vmstor_chan_props props; int ret = 0; new_channel = (hv_vmbus_channel *)context; device = new_channel->primary_channel->device; sc = get_stor_device(device, TRUE); if (sc == NULL) return; if (FALSE == sc->hs_open_multi_channel) return; memset(&props, 0, sizeof(props)); ret = hv_vmbus_channel_open(new_channel, sc->hs_drv_props->drv_ringbuffer_size, sc->hs_drv_props->drv_ringbuffer_size, (void *)&props, sizeof(struct vmstor_chan_props), hv_storvsc_on_channel_callback, new_channel); return; } /** * @brief Send multi-channel creation request to host * * @param device a Hyper-V device pointer * @param max_chans the max channels supported by vmbus */ static void storvsc_send_multichannel_request(struct hv_device *dev, int max_chans) { struct storvsc_softc *sc; struct hv_storvsc_request *request; struct vstor_packet *vstor_packet; int request_channels_cnt = 0; int ret; /* get multichannels count that need to create */ request_channels_cnt = MIN(max_chans, mp_ncpus); sc = get_stor_device(dev, TRUE); if (sc == NULL) { printf("Storvsc_error: get sc failed while send mutilchannel " "request\n"); return; } request = &sc->hs_init_req; /* Establish a handler for multi-channel */ dev->channel->sc_creation_callback = storvsc_handle_sc_creation; /* request the host to create multi-channel */ memset(request, 0, sizeof(struct hv_storvsc_request)); sema_init(&request->synch_sema, 0, ("stor_synch_sema")); vstor_packet = &request->vstor_packet; vstor_packet->operation = 
VSTOR_OPERATION_CREATE_MULTI_CHANNELS; vstor_packet->flags = REQUEST_COMPLETION_FLAG; vstor_packet->u.multi_channels_cnt = request_channels_cnt; ret = hv_vmbus_channel_send_packet( dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); /* wait for 5 seconds */ ret = sema_timedwait(&request->synch_sema, 5 * hz); if (ret != 0) { printf("Storvsc_error: create multi-channel timeout, %d\n", ret); return; } if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO || vstor_packet->status != 0) { printf("Storvsc_error: create multi-channel invalid operation " "(%d) or statue (%u)\n", vstor_packet->operation, vstor_packet->status); return; } sc->hs_open_multi_channel = TRUE; if (bootverbose) printf("Storvsc create multi-channel success!\n"); } /** * @brief initialize channel connection to parent partition * * @param dev a Hyper-V device pointer * @returns 0 on success, non-zero error on failure */ static int hv_storvsc_channel_init(struct hv_device *dev) { - int ret = 0; + int ret = 0, i; struct hv_storvsc_request *request; struct vstor_packet *vstor_packet; struct storvsc_softc *sc; uint16_t max_chans = 0; boolean_t support_multichannel = FALSE; max_chans = 0; support_multichannel = FALSE; sc = get_stor_device(dev, TRUE); if (sc == NULL) return (ENODEV); request = &sc->hs_init_req; memset(request, 0, sizeof(struct hv_storvsc_request)); vstor_packet = &request->vstor_packet; request->softc = sc; /** * Initiate the vsc/vsp initialization protocol on the open channel */ sema_init(&request->synch_sema, 0, ("stor_synch_sema")); vstor_packet->operation = VSTOR_OPERATION_BEGININITIALIZATION; vstor_packet->flags = REQUEST_COMPLETION_FLAG; ret = hv_vmbus_channel_send_packet( dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) goto cleanup; /* wait 5 seconds */ 
ret = sema_timedwait(&request->synch_sema, 5 * hz); if (ret != 0) goto cleanup; if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO || vstor_packet->status != 0) { goto cleanup; } - /* reuse the packet for version range supported */ + for (i = 0; i < nitems(vmstor_proto_list); i++) { + /* reuse the packet for version range supported */ - memset(vstor_packet, 0, sizeof(struct vstor_packet)); - vstor_packet->operation = VSTOR_OPERATION_QUERYPROTOCOLVERSION; - vstor_packet->flags = REQUEST_COMPLETION_FLAG; + memset(vstor_packet, 0, sizeof(struct vstor_packet)); + vstor_packet->operation = VSTOR_OPERATION_QUERYPROTOCOLVERSION; + vstor_packet->flags = REQUEST_COMPLETION_FLAG; - vstor_packet->u.version.major_minor = - VMSTOR_PROTOCOL_VERSION(storvsc_current_major, storvsc_current_minor); + vstor_packet->u.version.major_minor = + vmstor_proto_list[i].proto_version; - /* revision is only significant for Windows guests */ - vstor_packet->u.version.revision = 0; + /* revision is only significant for Windows guests */ + vstor_packet->u.version.revision = 0; - ret = hv_vmbus_channel_send_packet( + ret = hv_vmbus_channel_send_packet( dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); - if (ret != 0) - goto cleanup; + if (ret != 0) + goto cleanup; - /* wait 5 seconds */ - ret = sema_timedwait(&request->synch_sema, 5 * hz); + /* wait 5 seconds */ + ret = sema_timedwait(&request->synch_sema, 5 * hz); - if (ret) - goto cleanup; + if (ret) + goto cleanup; - /* TODO: Check returned version */ - if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO || - vstor_packet->status != 0) - goto cleanup; + if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO) { + ret = EINVAL; + goto cleanup; + } + if (vstor_packet->status == 0) { + vmstor_proto_version = + vmstor_proto_list[i].proto_version; + sense_buffer_size = + vmstor_proto_list[i].sense_buffer_size; + 
vmscsi_size_delta = + vmstor_proto_list[i].vmscsi_size_delta; + break; + } + } + if (vstor_packet->status != 0) { + ret = EINVAL; + goto cleanup; + } /** * Query channel properties */ memset(vstor_packet, 0, sizeof(struct vstor_packet)); vstor_packet->operation = VSTOR_OPERATION_QUERYPROPERTIES; vstor_packet->flags = REQUEST_COMPLETION_FLAG; ret = hv_vmbus_channel_send_packet( dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if ( ret != 0) goto cleanup; /* wait 5 seconds */ ret = sema_timedwait(&request->synch_sema, 5 * hz); if (ret != 0) goto cleanup; /* TODO: Check returned version */ if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO || vstor_packet->status != 0) { goto cleanup; } /* multi-channels feature is supported by WIN8 and above version */ max_chans = vstor_packet->u.chan_props.max_channel_cnt; if ((hv_vmbus_protocal_version != HV_VMBUS_VERSION_WIN7) && (hv_vmbus_protocal_version != HV_VMBUS_VERSION_WS2008) && (vstor_packet->u.chan_props.flags & HV_STORAGE_SUPPORTS_MULTI_CHANNEL)) { support_multichannel = TRUE; } memset(vstor_packet, 0, sizeof(struct vstor_packet)); vstor_packet->operation = VSTOR_OPERATION_ENDINITIALIZATION; vstor_packet->flags = REQUEST_COMPLETION_FLAG; ret = hv_vmbus_channel_send_packet( dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) { goto cleanup; } /* wait 5 seconds */ ret = sema_timedwait(&request->synch_sema, 5 * hz); if (ret != 0) goto cleanup; if (vstor_packet->operation != VSTOR_OPERATION_COMPLETEIO || vstor_packet->status != 0) goto cleanup; /* * If multi-channel is supported, send multichannel create * request to host. 
*/ if (support_multichannel) storvsc_send_multichannel_request(dev, max_chans); cleanup: sema_destroy(&request->synch_sema); return (ret); } /** * @brief Open channel connection to parent partition StorVSP driver * * Open and initialize channel connection to parent partition StorVSP driver. * * @param pointer to a Hyper-V device * @returns 0 on success, non-zero error on failure */ static int hv_storvsc_connect_vsp(struct hv_device *dev) { int ret = 0; struct vmstor_chan_props props; struct storvsc_softc *sc; sc = device_get_softc(dev->device); memset(&props, 0, sizeof(struct vmstor_chan_props)); /* * Open the channel */ ret = hv_vmbus_channel_open( dev->channel, sc->hs_drv_props->drv_ringbuffer_size, sc->hs_drv_props->drv_ringbuffer_size, (void *)&props, sizeof(struct vmstor_chan_props), hv_storvsc_on_channel_callback, dev->channel); if (ret != 0) { return ret; } ret = hv_storvsc_channel_init(dev); return (ret); } #if HVS_HOST_RESET static int hv_storvsc_host_reset(struct hv_device *dev) { int ret = 0; struct storvsc_softc *sc; struct hv_storvsc_request *request; struct vstor_packet *vstor_packet; sc = get_stor_device(dev, TRUE); if (sc == NULL) { return ENODEV; } request = &sc->hs_reset_req; request->softc = sc; vstor_packet = &request->vstor_packet; sema_init(&request->synch_sema, 0, "stor synch sema"); vstor_packet->operation = VSTOR_OPERATION_RESETBUS; vstor_packet->flags = REQUEST_COMPLETION_FLAG; ret = hv_vmbus_channel_send_packet(dev->channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)&sc->hs_reset_req, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) { goto cleanup; } ret = sema_timedwait(&request->synch_sema, 5 * hz); /* KYS 5 seconds */ if (ret) { goto cleanup; } /* * At this point, all outstanding requests in the adapter * should have been flushed out and returned to us */ cleanup: sema_destroy(&request->synch_sema); return (ret); } #endif /* HVS_HOST_RESET */ /** * @brief Function to initiate an
I/O request * * @param device Hyper-V device pointer * @param request pointer to a request structure * @returns 0 on success, non-zero error on failure */ static int hv_storvsc_io_request(struct hv_device *device, struct hv_storvsc_request *request) { struct storvsc_softc *sc; struct vstor_packet *vstor_packet = &request->vstor_packet; struct hv_vmbus_channel* outgoing_channel = NULL; int ret = 0; sc = get_stor_device(device, TRUE); if (sc == NULL) { return ENODEV; } vstor_packet->flags |= REQUEST_COMPLETION_FLAG; vstor_packet->u.vm_srb.length = VSTOR_PKT_SIZE; vstor_packet->u.vm_srb.sense_info_len = sense_buffer_size; vstor_packet->u.vm_srb.transfer_len = request->data_buf.length; vstor_packet->operation = VSTOR_OPERATION_EXECUTESRB; outgoing_channel = vmbus_select_outgoing_channel(device->channel); mtx_unlock(&request->softc->hs_lock); if (request->data_buf.length) { ret = hv_vmbus_channel_send_packet_multipagebuffer( outgoing_channel, &request->data_buf, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request); } else { ret = hv_vmbus_channel_send_packet( outgoing_channel, vstor_packet, VSTOR_PKT_SIZE, (uint64_t)(uintptr_t)request, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); } mtx_lock(&request->softc->hs_lock); if (ret != 0) { printf("Unable to send packet %p ret %d", vstor_packet, ret); } else { atomic_add_int(&sc->hs_num_out_reqs, 1); } return (ret); } /** * Process IO_COMPLETION_OPERATION and ready * the result to be completed for upper layer * processing by the CAM layer. */ static void hv_storvsc_on_iocompletion(struct storvsc_softc *sc, struct vstor_packet *vstor_packet, struct hv_storvsc_request *request) { struct vmscsi_req *vm_srb; vm_srb = &vstor_packet->u.vm_srb; + /* + * Copy some fields of the host's response into the request structure, + * because the fields will be used later in storvsc_io_done(). 
+ */ + request->vstor_packet.u.vm_srb.scsi_status = vm_srb->scsi_status; + request->vstor_packet.u.vm_srb.transfer_len = vm_srb->transfer_len; + if (((vm_srb->scsi_status & 0xFF) == SCSI_STATUS_CHECK_COND) && (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID)) { /* Autosense data available */ KASSERT(vm_srb->sense_info_len <= request->sense_info_len, ("vm_srb->sense_info_len <= " "request->sense_info_len")); memcpy(request->sense_data, vm_srb->u.sense_data, vm_srb->sense_info_len); request->sense_info_len = vm_srb->sense_info_len; } /* Complete request by passing to the CAM layer */ storvsc_io_done(request); atomic_subtract_int(&sc->hs_num_out_reqs, 1); if (sc->hs_drain_notify && (sc->hs_num_out_reqs == 0)) { sema_post(&sc->hs_drain_sema); } } static void hv_storvsc_rescan_target(struct storvsc_softc *sc) { path_id_t pathid; target_id_t targetid; union ccb *ccb; pathid = cam_sim_path(sc->hs_sim); targetid = CAM_TARGET_WILDCARD; /* * Allocate a CCB and schedule a rescan. */ ccb = xpt_alloc_ccb_nowait(); if (ccb == NULL) { printf("unable to alloc CCB for rescan\n"); return; } if (xpt_create_path(&ccb->ccb_h.path, NULL, pathid, targetid, CAM_LUN_WILDCARD) != CAM_REQ_CMP) { printf("unable to create path for rescan, pathid: %d," "targetid: %d\n", pathid, targetid); xpt_free_ccb(ccb); return; } if (targetid == CAM_TARGET_WILDCARD) ccb->ccb_h.func_code = XPT_SCAN_BUS; else ccb->ccb_h.func_code = XPT_SCAN_TGT; xpt_rescan(ccb); } static void hv_storvsc_on_channel_callback(void *context) { int ret = 0; hv_vmbus_channel *channel = (hv_vmbus_channel *)context; struct hv_device *device = NULL; struct storvsc_softc *sc; uint32_t bytes_recvd; uint64_t request_id; uint8_t packet[roundup2(sizeof(struct vstor_packet), 8)]; struct hv_storvsc_request *request; struct vstor_packet *vstor_packet; if (channel->primary_channel != NULL){ device = channel->primary_channel->device; } else { device = channel->device; } KASSERT(device, ("device is NULL")); sc = get_stor_device(device, FALSE); 
if (sc == NULL) { printf("Storvsc_error: get stor device failed.\n"); return; } ret = hv_vmbus_channel_recv_packet( channel, packet, roundup2(VSTOR_PKT_SIZE, 8), &bytes_recvd, &request_id); while ((ret == 0) && (bytes_recvd > 0)) { request = (struct hv_storvsc_request *)(uintptr_t)request_id; if ((request == &sc->hs_init_req) || (request == &sc->hs_reset_req)) { memcpy(&request->vstor_packet, packet, sizeof(struct vstor_packet)); sema_post(&request->synch_sema); } else { vstor_packet = (struct vstor_packet *)packet; switch(vstor_packet->operation) { case VSTOR_OPERATION_COMPLETEIO: if (request == NULL) panic("VMBUS: storvsc received a " "packet with NULL request id in " "COMPLETEIO operation."); hv_storvsc_on_iocompletion(sc, vstor_packet, request); break; case VSTOR_OPERATION_REMOVEDEVICE: printf("VMBUS: storvsc operation %d not " "implemented.\n", vstor_packet->operation); /* TODO: implement */ break; case VSTOR_OPERATION_ENUMERATE_BUS: hv_storvsc_rescan_target(sc); break; default: break; } } ret = hv_vmbus_channel_recv_packet( channel, packet, roundup2(VSTOR_PKT_SIZE, 8), &bytes_recvd, &request_id); } } /** * @brief StorVSC probe function * * Device probe function. Returns 0 if the input device is a StorVSC * device. Otherwise, a ENXIO is returned. If the input device is * for BlkVSC (paravirtual IDE) device and this support is disabled in * favor of the emulated ATA/IDE device, return ENXIO. 
* * @param a device * @returns 0 on success, ENXIO if not a matching StorVSC device */ static int storvsc_probe(device_t dev) { int ata_disk_enable = 0; int ret = ENXIO; - if (hv_vmbus_protocal_version == HV_VMBUS_VERSION_WS2008 || - hv_vmbus_protocal_version == HV_VMBUS_VERSION_WIN7) { - sense_buffer_size = PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE; - vmscsi_size_delta = sizeof(struct vmscsi_win8_extension); - storvsc_current_major = STORVSC_WIN7_MAJOR; - storvsc_current_minor = STORVSC_WIN7_MINOR; - } else { - sense_buffer_size = POST_WIN7_STORVSC_SENSE_BUFFER_SIZE; - vmscsi_size_delta = 0; - storvsc_current_major = STORVSC_WIN8_MAJOR; - storvsc_current_minor = STORVSC_WIN8_MINOR; - } - switch (storvsc_get_storage_type(dev)) { case DRIVER_BLKVSC: if(bootverbose) device_printf(dev, "DRIVER_BLKVSC-Emulated ATA/IDE probe\n"); if (!getenv_int("hw.ata.disk_enable", &ata_disk_enable)) { if(bootverbose) device_printf(dev, "Enlightened ATA/IDE detected\n"); ret = BUS_PROBE_DEFAULT; } else if(bootverbose) device_printf(dev, "Emulated ATA/IDE set (hw.ata.disk_enable set)\n"); break; case DRIVER_STORVSC: if(bootverbose) device_printf(dev, "Enlightened SCSI device detected\n"); ret = BUS_PROBE_DEFAULT; break; default: ret = ENXIO; } return (ret); } /** * @brief StorVSC attach function * * Function responsible for allocating per-device structures, * setting up CAM interfaces and scanning for available LUNs to * be used for SCSI device peripherals. * * @param a device * @returns 0 on success or an error on failure */ static int storvsc_attach(device_t dev) { struct hv_device *hv_dev = vmbus_get_devctx(dev); enum hv_storage_type stor_type; struct storvsc_softc *sc; struct cam_devq *devq; int ret, i, j; struct hv_storvsc_request *reqp; struct root_hold_token *root_mount_token = NULL; struct hv_sgl_node *sgl_node = NULL; void *tmp_buff = NULL; /* * We need to serialize storvsc attach calls.
*/ root_mount_token = root_mount_hold("storvsc"); sc = device_get_softc(dev); if (sc == NULL) { ret = ENOMEM; goto cleanup; } stor_type = storvsc_get_storage_type(dev); if (stor_type == DRIVER_UNKNOWN) { ret = ENODEV; goto cleanup; } bzero(sc, sizeof(struct storvsc_softc)); /* fill in driver specific properties */ sc->hs_drv_props = &g_drv_props_table[stor_type]; /* fill in device specific properties */ sc->hs_unit = device_get_unit(dev); sc->hs_dev = hv_dev; device_set_desc(dev, g_drv_props_table[stor_type].drv_desc); LIST_INIT(&sc->hs_free_list); mtx_init(&sc->hs_lock, "hvslck", NULL, MTX_DEF); for (i = 0; i < sc->hs_drv_props->drv_max_ios_per_target; ++i) { reqp = malloc(sizeof(struct hv_storvsc_request), M_DEVBUF, M_WAITOK|M_ZERO); reqp->softc = sc; LIST_INSERT_HEAD(&sc->hs_free_list, reqp, link); } /* create sg-list page pool */ if (FALSE == g_hv_sgl_page_pool.is_init) { g_hv_sgl_page_pool.is_init = TRUE; LIST_INIT(&g_hv_sgl_page_pool.in_use_sgl_list); LIST_INIT(&g_hv_sgl_page_pool.free_sgl_list); /* * Pre-create SG list, each SG list with * HV_MAX_MULTIPAGE_BUFFER_COUNT segments, each * segment has one page buffer */ for (i = 0; i < STORVSC_MAX_IO_REQUESTS; i++) { sgl_node = malloc(sizeof(struct hv_sgl_node), M_DEVBUF, M_WAITOK|M_ZERO); sgl_node->sgl_data = sglist_alloc(HV_MAX_MULTIPAGE_BUFFER_COUNT, M_WAITOK|M_ZERO); for (j = 0; j < HV_MAX_MULTIPAGE_BUFFER_COUNT; j++) { tmp_buff = malloc(PAGE_SIZE, M_DEVBUF, M_WAITOK|M_ZERO); sgl_node->sgl_data->sg_segs[j].ss_paddr = (vm_paddr_t)tmp_buff; } LIST_INSERT_HEAD(&g_hv_sgl_page_pool.free_sgl_list, sgl_node, link); } } sc->hs_destroy = FALSE; sc->hs_drain_notify = FALSE; sc->hs_open_multi_channel = FALSE; sema_init(&sc->hs_drain_sema, 0, "Store Drain Sema"); ret = hv_storvsc_connect_vsp(hv_dev); if (ret != 0) { goto cleanup; } /* * Create the device queue. 
* Hyper-V maps each target to one SCSI HBA */ devq = cam_simq_alloc(sc->hs_drv_props->drv_max_ios_per_target); if (devq == NULL) { device_printf(dev, "Failed to alloc device queue\n"); ret = ENOMEM; goto cleanup; } sc->hs_sim = cam_sim_alloc(storvsc_action, storvsc_poll, sc->hs_drv_props->drv_name, sc, sc->hs_unit, &sc->hs_lock, 1, sc->hs_drv_props->drv_max_ios_per_target, devq); if (sc->hs_sim == NULL) { device_printf(dev, "Failed to alloc sim\n"); cam_simq_free(devq); ret = ENOMEM; goto cleanup; } mtx_lock(&sc->hs_lock); /* bus_id is set to 0, need to get it from VMBUS channel query? */ if (xpt_bus_register(sc->hs_sim, dev, 0) != CAM_SUCCESS) { cam_sim_free(sc->hs_sim, /*free_devq*/TRUE); mtx_unlock(&sc->hs_lock); device_printf(dev, "Unable to register SCSI bus\n"); ret = ENXIO; goto cleanup; } if (xpt_create_path(&sc->hs_path, /*periph*/NULL, cam_sim_path(sc->hs_sim), CAM_TARGET_WILDCARD, CAM_LUN_WILDCARD) != CAM_REQ_CMP) { xpt_bus_deregister(cam_sim_path(sc->hs_sim)); cam_sim_free(sc->hs_sim, /*free_devq*/TRUE); mtx_unlock(&sc->hs_lock); device_printf(dev, "Unable to create path\n"); ret = ENXIO; goto cleanup; } mtx_unlock(&sc->hs_lock); root_mount_rel(root_mount_token); return (0); cleanup: root_mount_rel(root_mount_token); while (!LIST_EMPTY(&sc->hs_free_list)) { reqp = LIST_FIRST(&sc->hs_free_list); LIST_REMOVE(reqp, link); free(reqp, M_DEVBUF); } while (!LIST_EMPTY(&g_hv_sgl_page_pool.free_sgl_list)) { sgl_node = LIST_FIRST(&g_hv_sgl_page_pool.free_sgl_list); LIST_REMOVE(sgl_node, link); for (j = 0; j < HV_MAX_MULTIPAGE_BUFFER_COUNT; j++) { if (NULL != (void*)sgl_node->sgl_data->sg_segs[j].ss_paddr) { free((void*)sgl_node->sgl_data->sg_segs[j].ss_paddr, M_DEVBUF); } } sglist_free(sgl_node->sgl_data); free(sgl_node, M_DEVBUF); } return (ret); } /** * @brief StorVSC device detach function * * This function is responsible for safely detaching a * StorVSC device. 
This includes waiting for inbound responses * to complete and freeing associated per-device structures. * * @param dev a device * returns 0 on success */ static int storvsc_detach(device_t dev) { struct storvsc_softc *sc = device_get_softc(dev); struct hv_storvsc_request *reqp = NULL; struct hv_device *hv_device = vmbus_get_devctx(dev); struct hv_sgl_node *sgl_node = NULL; int j = 0; mtx_lock(&hv_device->channel->inbound_lock); sc->hs_destroy = TRUE; mtx_unlock(&hv_device->channel->inbound_lock); /* * At this point, all outbound traffic should be disabled. We * only allow inbound traffic (responses) to proceed so that * outstanding requests can be completed. */ sc->hs_drain_notify = TRUE; sema_wait(&sc->hs_drain_sema); sc->hs_drain_notify = FALSE; /* * Since we have already drained, we don't need to busy wait. * The call to close the channel will reset the callback * under the protection of the incoming channel lock. */ hv_vmbus_channel_close(hv_device->channel); mtx_lock(&sc->hs_lock); while (!LIST_EMPTY(&sc->hs_free_list)) { reqp = LIST_FIRST(&sc->hs_free_list); LIST_REMOVE(reqp, link); free(reqp, M_DEVBUF); } mtx_unlock(&sc->hs_lock); while (!LIST_EMPTY(&g_hv_sgl_page_pool.free_sgl_list)) { sgl_node = LIST_FIRST(&g_hv_sgl_page_pool.free_sgl_list); LIST_REMOVE(sgl_node, link); for (j = 0; j < HV_MAX_MULTIPAGE_BUFFER_COUNT; j++){ if (NULL != (void*)sgl_node->sgl_data->sg_segs[j].ss_paddr) { free((void*)sgl_node->sgl_data->sg_segs[j].ss_paddr, M_DEVBUF); } } sglist_free(sgl_node->sgl_data); free(sgl_node, M_DEVBUF); } return (0); } #if HVS_TIMEOUT_TEST /** * @brief unit test for timed out operations * * This function provides unit testing capability to simulate * timed out operations. Recompilation with HV_TIMEOUT_TEST=1 * is required. 
* * @param reqp pointer to a request structure * @param opcode SCSI operation being performed * @param wait if 1, wait for I/O to complete */ static void storvsc_timeout_test(struct hv_storvsc_request *reqp, uint8_t opcode, int wait) { int ret; union ccb *ccb = reqp->ccb; struct storvsc_softc *sc = reqp->softc; if (reqp->vstor_packet.vm_srb.cdb[0] != opcode) { return; } if (wait) { mtx_lock(&reqp->event.mtx); } ret = hv_storvsc_io_request(sc->hs_dev, reqp); if (ret != 0) { if (wait) { mtx_unlock(&reqp->event.mtx); } printf("%s: io_request failed with %d.\n", __func__, ret); ccb->ccb_h.status = CAM_PROVIDE_FAIL; mtx_lock(&sc->hs_lock); storvsc_free_request(sc, reqp); xpt_done(ccb); mtx_unlock(&sc->hs_lock); return; } if (wait) { xpt_print(ccb->ccb_h.path, "%u: %s: waiting for IO return.\n", ticks, __func__); ret = cv_timedwait(&reqp->event.cv, &reqp->event.mtx, 60*hz); mtx_unlock(&reqp->event.mtx); xpt_print(ccb->ccb_h.path, "%u: %s: %s.\n", ticks, __func__, (ret == 0)? "IO return detected" : "IO return not detected"); /* * Now both the timer handler and io done are running * simultaneously. We want to confirm the io done always * finishes after the timer handler exits. So reqp used by * timer handler is not freed or stale. Do busy loop for * another 1/10 second to make sure io done does * wait for the timer handler to complete. */ DELAY(100*1000); mtx_lock(&sc->hs_lock); xpt_print(ccb->ccb_h.path, "%u: %s: finishing, queue frozen %d, " "ccb status 0x%x scsi_status 0x%x.\n", ticks, __func__, sc->hs_frozen, ccb->ccb_h.status, ccb->csio.scsi_status); mtx_unlock(&sc->hs_lock); } } #endif /* HVS_TIMEOUT_TEST */ +#ifdef notyet /** * @brief timeout handler for requests * * This function is called as a result of a callout expiring. 
* * @param arg pointer to a request */ static void storvsc_timeout(void *arg) { struct hv_storvsc_request *reqp = arg; struct storvsc_softc *sc = reqp->softc; union ccb *ccb = reqp->ccb; if (reqp->retries == 0) { mtx_lock(&sc->hs_lock); xpt_print(ccb->ccb_h.path, "%u: IO timed out (req=0x%p), wait for another %u secs.\n", ticks, reqp, ccb->ccb_h.timeout / 1000); cam_error_print(ccb, CAM_ESF_ALL, CAM_EPF_ALL); mtx_unlock(&sc->hs_lock); reqp->retries++; callout_reset_sbt(&reqp->callout, SBT_1MS * ccb->ccb_h.timeout, 0, storvsc_timeout, reqp, 0); #if HVS_TIMEOUT_TEST storvsc_timeout_test(reqp, SEND_DIAGNOSTIC, 0); #endif return; } mtx_lock(&sc->hs_lock); xpt_print(ccb->ccb_h.path, "%u: IO (reqp = 0x%p) did not return for %u seconds, %s.\n", ticks, reqp, ccb->ccb_h.timeout * (reqp->retries+1) / 1000, (sc->hs_frozen == 0)? "freezing the queue" : "the queue is already frozen"); if (sc->hs_frozen == 0) { sc->hs_frozen = 1; xpt_freeze_simq(xpt_path_sim(ccb->ccb_h.path), 1); } mtx_unlock(&sc->hs_lock); #if HVS_TIMEOUT_TEST storvsc_timeout_test(reqp, MODE_SELECT_10, 1); #endif } +#endif /** * @brief StorVSC device poll function * * This function is responsible for servicing requests when * interrupts are disabled (i.e when we are dumping core.) * * @param sim a pointer to a CAM SCSI interface module */ static void storvsc_poll(struct cam_sim *sim) { struct storvsc_softc *sc = cam_sim_softc(sim); mtx_assert(&sc->hs_lock, MA_OWNED); mtx_unlock(&sc->hs_lock); hv_storvsc_on_channel_callback(sc->hs_dev->channel); mtx_lock(&sc->hs_lock); } /** * @brief StorVSC device action function * * This function is responsible for handling SCSI operations which * are passed from the CAM layer. The requests are in the form of * CAM control blocks which indicate the action being performed. * Not all actions require converting the request to a VSCSI protocol * message - these actions can be responded to by this driver. 
* Requests which are destined for a backend storage device are converted * to a VSCSI protocol message and sent on the channel connection associated * with this device. * * @param sim pointer to a CAM SCSI interface module * @param ccb pointer to a CAM control block */ static void storvsc_action(struct cam_sim *sim, union ccb *ccb) { struct storvsc_softc *sc = cam_sim_softc(sim); int res; mtx_assert(&sc->hs_lock, MA_OWNED); switch (ccb->ccb_h.func_code) { case XPT_PATH_INQ: { struct ccb_pathinq *cpi = &ccb->cpi; cpi->version_num = 1; cpi->hba_inquiry = PI_TAG_ABLE|PI_SDTR_ABLE; cpi->target_sprt = 0; cpi->hba_misc = PIM_NOBUSRESET; cpi->hba_eng_cnt = 0; cpi->max_target = STORVSC_MAX_TARGETS; cpi->max_lun = sc->hs_drv_props->drv_max_luns_per_target; cpi->initiator_id = cpi->max_target; cpi->bus_id = cam_sim_bus(sim); cpi->base_transfer_speed = 300000; cpi->transport = XPORT_SAS; cpi->transport_version = 0; cpi->protocol = PROTO_SCSI; cpi->protocol_version = SCSI_REV_SPC2; strncpy(cpi->sim_vid, "FreeBSD", SIM_IDLEN); strncpy(cpi->hba_vid, sc->hs_drv_props->drv_name, HBA_IDLEN); strncpy(cpi->dev_name, cam_sim_name(sim), DEV_IDLEN); cpi->unit_number = cam_sim_unit(sim); ccb->ccb_h.status = CAM_REQ_CMP; xpt_done(ccb); return; } case XPT_GET_TRAN_SETTINGS: { struct ccb_trans_settings *cts = &ccb->cts; cts->transport = XPORT_SAS; cts->transport_version = 0; cts->protocol = PROTO_SCSI; cts->protocol_version = SCSI_REV_SPC2; /* enable tag queuing and disconnected mode */ cts->proto_specific.valid = CTS_SCSI_VALID_TQ; cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ; cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB; cts->xport_specific.valid = CTS_SPI_VALID_DISC; cts->xport_specific.spi.flags = CTS_SPI_FLAGS_DISC_ENB; ccb->ccb_h.status = CAM_REQ_CMP; xpt_done(ccb); return; } case XPT_SET_TRAN_SETTINGS: { ccb->ccb_h.status = CAM_REQ_CMP; xpt_done(ccb); return; } case XPT_CALC_GEOMETRY:{ cam_calc_geometry(&ccb->ccg, 1); xpt_done(ccb); return; } case XPT_RESET_BUS: case 
XPT_RESET_DEV:{ #if HVS_HOST_RESET if ((res = hv_storvsc_host_reset(sc->hs_dev)) != 0) { xpt_print(ccb->ccb_h.path, "hv_storvsc_host_reset failed with %d\n", res); ccb->ccb_h.status = CAM_PROVIDE_FAIL; xpt_done(ccb); return; } ccb->ccb_h.status = CAM_REQ_CMP; xpt_done(ccb); return; #else xpt_print(ccb->ccb_h.path, "%s reset not supported.\n", (ccb->ccb_h.func_code == XPT_RESET_BUS)? "bus" : "dev"); ccb->ccb_h.status = CAM_REQ_INVALID; xpt_done(ccb); return; #endif /* HVS_HOST_RESET */ } case XPT_SCSI_IO: case XPT_IMMED_NOTIFY: { struct hv_storvsc_request *reqp = NULL; if (ccb->csio.cdb_len == 0) { panic("cdl_len is 0\n"); } if (LIST_EMPTY(&sc->hs_free_list)) { ccb->ccb_h.status = CAM_REQUEUE_REQ; if (sc->hs_frozen == 0) { sc->hs_frozen = 1; xpt_freeze_simq(sim, /* count*/1); } xpt_done(ccb); return; } reqp = LIST_FIRST(&sc->hs_free_list); LIST_REMOVE(reqp, link); bzero(reqp, sizeof(struct hv_storvsc_request)); reqp->softc = sc; ccb->ccb_h.status |= CAM_SIM_QUEUED; if ((res = create_storvsc_request(ccb, reqp)) != 0) { ccb->ccb_h.status = CAM_REQ_INVALID; xpt_done(ccb); return; } +#ifdef notyet if (ccb->ccb_h.timeout != CAM_TIME_INFINITY) { callout_init(&reqp->callout, CALLOUT_MPSAFE); callout_reset_sbt(&reqp->callout, SBT_1MS * ccb->ccb_h.timeout, 0, storvsc_timeout, reqp, 0); #if HVS_TIMEOUT_TEST cv_init(&reqp->event.cv, "storvsc timeout cv"); mtx_init(&reqp->event.mtx, "storvsc timeout mutex", NULL, MTX_DEF); switch (reqp->vstor_packet.vm_srb.cdb[0]) { case MODE_SELECT_10: case SEND_DIAGNOSTIC: /* To have timer send the request. 
*/ return; default: break; } #endif /* HVS_TIMEOUT_TEST */ } +#endif if ((res = hv_storvsc_io_request(sc->hs_dev, reqp)) != 0) { xpt_print(ccb->ccb_h.path, "hv_storvsc_io_request failed with %d\n", res); ccb->ccb_h.status = CAM_PROVIDE_FAIL; storvsc_free_request(sc, reqp); xpt_done(ccb); return; } return; } default: ccb->ccb_h.status = CAM_REQ_INVALID; xpt_done(ccb); return; } } /** * @brief destroy bounce buffer * * This function is responsible for destroy a Scatter/Gather list * that create by storvsc_create_bounce_buffer() * * @param sgl- the Scatter/Gather need be destroy * @param sg_count- page count of the SG list. * */ static void storvsc_destroy_bounce_buffer(struct sglist *sgl) { struct hv_sgl_node *sgl_node = NULL; sgl_node = LIST_FIRST(&g_hv_sgl_page_pool.in_use_sgl_list); LIST_REMOVE(sgl_node, link); if (NULL == sgl_node) { printf("storvsc error: not enough in use sgl\n"); return; } sgl_node->sgl_data = sgl; LIST_INSERT_HEAD(&g_hv_sgl_page_pool.free_sgl_list, sgl_node, link); } /** * @brief create bounce buffer * * This function is responsible for create a Scatter/Gather list, * which hold several pages that can be aligned with page size. * * @param seg_count- SG-list segments count * @param write - if WRITE_TYPE, set SG list page used size to 0, * otherwise set used size to page size. * * return NULL if create failed */ static struct sglist * storvsc_create_bounce_buffer(uint16_t seg_count, int write) { int i = 0; struct sglist *bounce_sgl = NULL; unsigned int buf_len = ((write == WRITE_TYPE) ? 
0 : PAGE_SIZE); struct hv_sgl_node *sgl_node = NULL; /* get struct sglist from free_sgl_list */ sgl_node = LIST_FIRST(&g_hv_sgl_page_pool.free_sgl_list); LIST_REMOVE(sgl_node, link); if (NULL == sgl_node) { printf("storvsc error: not enough free sgl\n"); return NULL; } bounce_sgl = sgl_node->sgl_data; LIST_INSERT_HEAD(&g_hv_sgl_page_pool.in_use_sgl_list, sgl_node, link); bounce_sgl->sg_maxseg = seg_count; if (write == WRITE_TYPE) bounce_sgl->sg_nseg = 0; else bounce_sgl->sg_nseg = seg_count; for (i = 0; i < seg_count; i++) bounce_sgl->sg_segs[i].ss_len = buf_len; return bounce_sgl; } /** * @brief copy data from SG list to bounce buffer * * This function is responsible for copy data from one SG list's segments * to another SG list which used as bounce buffer. * * @param bounce_sgl - the destination SG list * @param orig_sgl - the segment of the source SG list. * @param orig_sgl_count - the count of segments. * @param orig_sgl_count - indicate which segment need bounce buffer, * set 1 means need. * */ static void storvsc_copy_sgl_to_bounce_buf(struct sglist *bounce_sgl, bus_dma_segment_t *orig_sgl, unsigned int orig_sgl_count, uint64_t seg_bits) { int src_sgl_idx = 0; for (src_sgl_idx = 0; src_sgl_idx < orig_sgl_count; src_sgl_idx++) { if (seg_bits & (1 << src_sgl_idx)) { memcpy((void*)bounce_sgl->sg_segs[src_sgl_idx].ss_paddr, (void*)orig_sgl[src_sgl_idx].ds_addr, orig_sgl[src_sgl_idx].ds_len); bounce_sgl->sg_segs[src_sgl_idx].ss_len = orig_sgl[src_sgl_idx].ds_len; } } } /** * @brief copy data from SG list which used as bounce to another SG list * * This function is responsible for copy data from one SG list with bounce * buffer to another SG list's segments. * * @param dest_sgl - the destination SG list's segments * @param dest_sgl_count - the count of destination SG list's segment. * @param src_sgl - the source SG list. * @param seg_bits - indicate which segment used bounce buffer of src SG-list. 
* */ void storvsc_copy_from_bounce_buf_to_sgl(bus_dma_segment_t *dest_sgl, unsigned int dest_sgl_count, struct sglist* src_sgl, uint64_t seg_bits) { int sgl_idx = 0; for (sgl_idx = 0; sgl_idx < dest_sgl_count; sgl_idx++) { if (seg_bits & (1 << sgl_idx)) { memcpy((void*)(dest_sgl[sgl_idx].ds_addr), (void*)(src_sgl->sg_segs[sgl_idx].ss_paddr), src_sgl->sg_segs[sgl_idx].ss_len); } } } /** * @brief check SG list with bounce buffer or not * * This function is responsible for check if need bounce buffer for SG list. * * @param sgl - the SG list's segments * @param sg_count - the count of SG list's segment. * @param bits - segmengs number that need bounce buffer * * return -1 if SG list needless bounce buffer */ static int storvsc_check_bounce_buffer_sgl(bus_dma_segment_t *sgl, unsigned int sg_count, uint64_t *bits) { int i = 0; int offset = 0; uint64_t phys_addr = 0; uint64_t tmp_bits = 0; boolean_t found_hole = FALSE; boolean_t pre_aligned = TRUE; if (sg_count < 2){ return -1; } *bits = 0; phys_addr = vtophys(sgl[0].ds_addr); offset = phys_addr - trunc_page(phys_addr); if (offset != 0) { pre_aligned = FALSE; tmp_bits |= 1; } for (i = 1; i < sg_count; i++) { phys_addr = vtophys(sgl[i].ds_addr); offset = phys_addr - trunc_page(phys_addr); if (offset == 0) { if (FALSE == pre_aligned){ /* * This segment is aligned, if the previous * one is not aligned, find a hole */ found_hole = TRUE; } pre_aligned = TRUE; } else { tmp_bits |= 1 << i; if (!pre_aligned) { if (phys_addr != vtophys(sgl[i-1].ds_addr + sgl[i-1].ds_len)) { /* * Check whether connect to previous * segment,if not, find the hole */ found_hole = TRUE; } } else { found_hole = TRUE; } pre_aligned = FALSE; } } if (!found_hole) { return (-1); } else { *bits = tmp_bits; return 0; } } /** * @brief Fill in a request structure based on a CAM control block * * Fills in a request structure based on the contents of a CAM control * block. The request structure holds the payload information for * VSCSI protocol request. 
* * @param ccb pointer to a CAM contorl block * @param reqp pointer to a request structure */ static int create_storvsc_request(union ccb *ccb, struct hv_storvsc_request *reqp) { struct ccb_scsiio *csio = &ccb->csio; uint64_t phys_addr; uint32_t bytes_to_copy = 0; uint32_t pfn_num = 0; uint32_t pfn; uint64_t not_aligned_seg_bits = 0; /* refer to struct vmscsi_req for meanings of these two fields */ reqp->vstor_packet.u.vm_srb.port = cam_sim_unit(xpt_path_sim(ccb->ccb_h.path)); reqp->vstor_packet.u.vm_srb.path_id = cam_sim_bus(xpt_path_sim(ccb->ccb_h.path)); reqp->vstor_packet.u.vm_srb.target_id = ccb->ccb_h.target_id; reqp->vstor_packet.u.vm_srb.lun = ccb->ccb_h.target_lun; reqp->vstor_packet.u.vm_srb.cdb_len = csio->cdb_len; if(ccb->ccb_h.flags & CAM_CDB_POINTER) { memcpy(&reqp->vstor_packet.u.vm_srb.u.cdb, csio->cdb_io.cdb_ptr, csio->cdb_len); } else { memcpy(&reqp->vstor_packet.u.vm_srb.u.cdb, csio->cdb_io.cdb_bytes, csio->cdb_len); } switch (ccb->ccb_h.flags & CAM_DIR_MASK) { case CAM_DIR_OUT: reqp->vstor_packet.u.vm_srb.data_in = WRITE_TYPE; break; case CAM_DIR_IN: reqp->vstor_packet.u.vm_srb.data_in = READ_TYPE; break; case CAM_DIR_NONE: reqp->vstor_packet.u.vm_srb.data_in = UNKNOWN_TYPE; break; default: reqp->vstor_packet.u.vm_srb.data_in = UNKNOWN_TYPE; break; } reqp->sense_data = &csio->sense_data; reqp->sense_info_len = csio->sense_len; reqp->ccb = ccb; if (0 == csio->dxfer_len) { return (0); } reqp->data_buf.length = csio->dxfer_len; switch (ccb->ccb_h.flags & CAM_DATA_MASK) { case CAM_DATA_VADDR: { bytes_to_copy = csio->dxfer_len; phys_addr = vtophys(csio->data_ptr); reqp->data_buf.offset = phys_addr & PAGE_MASK; while (bytes_to_copy != 0) { int bytes, page_offset; phys_addr = vtophys(&csio->data_ptr[reqp->data_buf.length - bytes_to_copy]); pfn = phys_addr >> PAGE_SHIFT; reqp->data_buf.pfn_array[pfn_num] = pfn; page_offset = phys_addr & PAGE_MASK; bytes = min(PAGE_SIZE - page_offset, bytes_to_copy); bytes_to_copy -= bytes; pfn_num++; } break; } case 
CAM_DATA_SG: { int i = 0; int offset = 0; int ret; bus_dma_segment_t *storvsc_sglist = (bus_dma_segment_t *)ccb->csio.data_ptr; u_int16_t storvsc_sg_count = ccb->csio.sglist_cnt; printf("Storvsc: get SG I/O operation, %d\n", reqp->vstor_packet.u.vm_srb.data_in); if (storvsc_sg_count > HV_MAX_MULTIPAGE_BUFFER_COUNT){ printf("Storvsc: %d segments is too much, " "only support %d segments\n", storvsc_sg_count, HV_MAX_MULTIPAGE_BUFFER_COUNT); return (EINVAL); } /* * We create our own bounce buffer function currently. Idealy * we should use BUS_DMA(9) framework. But with current BUS_DMA * code there is no callback API to check the page alignment of * middle segments before busdma can decide if a bounce buffer * is needed for particular segment. There is callback, * "bus_dma_filter_t *filter", but the parrameters are not * sufficient for storvsc driver. * TODO: * Add page alignment check in BUS_DMA(9) callback. Once * this is complete, switch the following code to use * BUS_DMA(9) for storvsc bounce buffer support. 
*/ /* check if we need to create bounce buffer */ ret = storvsc_check_bounce_buffer_sgl(storvsc_sglist, storvsc_sg_count, &not_aligned_seg_bits); if (ret != -1) { reqp->bounce_sgl = storvsc_create_bounce_buffer(storvsc_sg_count, reqp->vstor_packet.u.vm_srb.data_in); if (NULL == reqp->bounce_sgl) { printf("Storvsc_error: " "create bounce buffer failed.\n"); return (ENOMEM); } reqp->bounce_sgl_count = storvsc_sg_count; reqp->not_aligned_seg_bits = not_aligned_seg_bits; /* * if it is write, we need copy the original data *to bounce buffer */ if (WRITE_TYPE == reqp->vstor_packet.u.vm_srb.data_in) { storvsc_copy_sgl_to_bounce_buf( reqp->bounce_sgl, storvsc_sglist, storvsc_sg_count, reqp->not_aligned_seg_bits); } /* transfer virtual address to physical frame number */ if (reqp->not_aligned_seg_bits & 0x1){ phys_addr = vtophys(reqp->bounce_sgl->sg_segs[0].ss_paddr); }else{ phys_addr = vtophys(storvsc_sglist[0].ds_addr); } reqp->data_buf.offset = phys_addr & PAGE_MASK; pfn = phys_addr >> PAGE_SHIFT; reqp->data_buf.pfn_array[0] = pfn; for (i = 1; i < storvsc_sg_count; i++) { if (reqp->not_aligned_seg_bits & (1 << i)) { phys_addr = vtophys(reqp->bounce_sgl->sg_segs[i].ss_paddr); } else { phys_addr = vtophys(storvsc_sglist[i].ds_addr); } pfn = phys_addr >> PAGE_SHIFT; reqp->data_buf.pfn_array[i] = pfn; } } else { phys_addr = vtophys(storvsc_sglist[0].ds_addr); reqp->data_buf.offset = phys_addr & PAGE_MASK; for (i = 0; i < storvsc_sg_count; i++) { phys_addr = vtophys(storvsc_sglist[i].ds_addr); pfn = phys_addr >> PAGE_SHIFT; reqp->data_buf.pfn_array[i] = pfn; } /* check the last segment cross boundary or not */ offset = phys_addr & PAGE_MASK; if (offset) { phys_addr = vtophys(storvsc_sglist[i-1].ds_addr + PAGE_SIZE - offset); pfn = phys_addr >> PAGE_SHIFT; reqp->data_buf.pfn_array[i] = pfn; } reqp->bounce_sgl_count = 0; } break; } default: printf("Unknow flags: %d\n", ccb->ccb_h.flags); return(EINVAL); } return(0); } /* - * Modified based on scsi_print_inquiry which is
responsible to - * print the detail information for scsi_inquiry_data. - * + * SCSI Inquiry checks qualifier and type. + * If qualifier is 011b, means the device server is not capable + * of supporting a peripheral device on this logical unit, and + * the type should be set to 1Fh. + * * Return 1 if it is valid, 0 otherwise. */ static inline int is_inquiry_valid(const struct scsi_inquiry_data *inq_data) { uint8_t type; - char vendor[16], product[48], revision[16]; - - /* - * Check device type and qualifier - */ - if (!(SID_QUAL_IS_VENDOR_UNIQUE(inq_data) || - SID_QUAL(inq_data) == SID_QUAL_LU_CONNECTED)) + if (SID_QUAL(inq_data) != SID_QUAL_LU_CONNECTED) { return (0); - + } type = SID_TYPE(inq_data); - switch (type) { - case T_DIRECT: - case T_SEQUENTIAL: - case T_PRINTER: - case T_PROCESSOR: - case T_WORM: - case T_CDROM: - case T_SCANNER: - case T_OPTICAL: - case T_CHANGER: - case T_COMM: - case T_STORARRAY: - case T_ENCLOSURE: - case T_RBC: - case T_OCRW: - case T_OSD: - case T_ADC: - break; - case T_NODEVICE: - default: + if (type == T_NODEVICE) { return (0); } - - /* - * Check vendor, product, and revision - */ - cam_strvis(vendor, inq_data->vendor, sizeof(inq_data->vendor), - sizeof(vendor)); - cam_strvis(product, inq_data->product, sizeof(inq_data->product), - sizeof(product)); - cam_strvis(revision, inq_data->revision, sizeof(inq_data->revision), - sizeof(revision)); - if (strlen(vendor) == 0 || - strlen(product) == 0 || - strlen(revision) == 0) - return (0); - return (1); } /** * @brief completion function before returning to CAM * * I/O process has been completed and the result needs * to be passed to the CAM layer. * Free resources related to this request. 
* * @param reqp pointer to a request structure */ static void storvsc_io_done(struct hv_storvsc_request *reqp) { union ccb *ccb = reqp->ccb; struct ccb_scsiio *csio = &ccb->csio; struct storvsc_softc *sc = reqp->softc; struct vmscsi_req *vm_srb = &reqp->vstor_packet.u.vm_srb; bus_dma_segment_t *ori_sglist = NULL; int ori_sg_count = 0; /* destroy bounce buffer if it is used */ if (reqp->bounce_sgl_count) { ori_sglist = (bus_dma_segment_t *)ccb->csio.data_ptr; ori_sg_count = ccb->csio.sglist_cnt; /* * If it is READ operation, we should copy back the data * to original SG list. */ if (READ_TYPE == reqp->vstor_packet.u.vm_srb.data_in) { storvsc_copy_from_bounce_buf_to_sgl(ori_sglist, ori_sg_count, reqp->bounce_sgl, reqp->not_aligned_seg_bits); } storvsc_destroy_bounce_buffer(reqp->bounce_sgl); reqp->bounce_sgl_count = 0; } if (reqp->retries > 0) { mtx_lock(&sc->hs_lock); #if HVS_TIMEOUT_TEST xpt_print(ccb->ccb_h.path, "%u: IO returned after timeout, " "waking up timer handler if any.\n", ticks); mtx_lock(&reqp->event.mtx); cv_signal(&reqp->event.cv); mtx_unlock(&reqp->event.mtx); #endif reqp->retries = 0; xpt_print(ccb->ccb_h.path, "%u: IO returned after timeout, " "stopping timer if any.\n", ticks); mtx_unlock(&sc->hs_lock); } +#ifdef notyet /* * callout_drain() will wait for the timer handler to finish * if it is running. So we don't need any lock to synchronize * between this routine and the timer handler. * Note that we need to make sure reqp is not freed when timer * handler is using or will use it. */ if (ccb->ccb_h.timeout != CAM_TIME_INFINITY) { callout_drain(&reqp->callout); } +#endif ccb->ccb_h.status &= ~CAM_SIM_QUEUED; ccb->ccb_h.status &= ~CAM_STATUS_MASK; if (vm_srb->scsi_status == SCSI_STATUS_OK) { const struct scsi_generic *cmd; - /* * Check whether the data for INQUIRY cmd is valid or * not. Windows 10 and Windows 2016 send all zero * inquiry data to VM even for unpopulated slots. 
*/ cmd = (const struct scsi_generic *) ((ccb->ccb_h.flags & CAM_CDB_POINTER) ? csio->cdb_io.cdb_ptr : csio->cdb_io.cdb_bytes); - if (cmd->opcode == INQUIRY && - is_inquiry_valid( - (const struct scsi_inquiry_data *)csio->data_ptr) == 0) { + if (cmd->opcode == INQUIRY) { + /* + * The host of Windows 10 or 2016 server will response + * the inquiry request with invalid data for unexisted device: + [0x7f 0x0 0x5 0x2 0x1f ... ] + * But on windows 2012 R2, the response is: + [0x7f 0x0 0x0 0x0 0x0 ] + * That is why here wants to validate the inquiry response. + * The validation will skip the INQUIRY whose response is short, + * which is less than SHORT_INQUIRY_LENGTH (36). + * + * For more information about INQUIRY, please refer to: + * ftp://ftp.avc-pioneer.com/Mtfuji_7/Proposal/Jun09/INQUIRY.pdf + */ + const struct scsi_inquiry_data *inq_data = + (const struct scsi_inquiry_data *)csio->data_ptr; + uint8_t* resp_buf = (uint8_t*)csio->data_ptr; + /* Get the buffer length reported by host */ + int resp_xfer_len = vm_srb->transfer_len; + /* Get the available buffer length */ + int resp_buf_len = resp_xfer_len >= 5 ? resp_buf[4] + 5 : 0; + int data_len = (resp_buf_len < resp_xfer_len) ? 
resp_buf_len : resp_xfer_len; + if (data_len < SHORT_INQUIRY_LENGTH) { + ccb->ccb_h.status |= CAM_REQ_CMP; + if (bootverbose && data_len >= 5) { + mtx_lock(&sc->hs_lock); + xpt_print(ccb->ccb_h.path, + "storvsc skips the validation for short inquiry (%d)" + " [%x %x %x %x %x]\n", + data_len,resp_buf[0],resp_buf[1],resp_buf[2], + resp_buf[3],resp_buf[4]); + mtx_unlock(&sc->hs_lock); + } + } else if (is_inquiry_valid(inq_data) == 0) { ccb->ccb_h.status |= CAM_DEV_NOT_THERE; + if (bootverbose && data_len >= 5) { + mtx_lock(&sc->hs_lock); + xpt_print(ccb->ccb_h.path, + "storvsc uninstalled invalid device" + " [%x %x %x %x %x]\n", + resp_buf[0],resp_buf[1],resp_buf[2],resp_buf[3],resp_buf[4]); + mtx_unlock(&sc->hs_lock); + } + } else { + ccb->ccb_h.status |= CAM_REQ_CMP; if (bootverbose) { mtx_lock(&sc->hs_lock); xpt_print(ccb->ccb_h.path, - "storvsc uninstalled device\n"); + "storvsc has passed inquiry response (%d) validation\n", + data_len); mtx_unlock(&sc->hs_lock); } + } } else { ccb->ccb_h.status |= CAM_REQ_CMP; } } else { mtx_lock(&sc->hs_lock); xpt_print(ccb->ccb_h.path, "storvsc scsi_status = %d\n", vm_srb->scsi_status); mtx_unlock(&sc->hs_lock); ccb->ccb_h.status |= CAM_SCSI_STATUS_ERROR; } ccb->csio.scsi_status = (vm_srb->scsi_status & 0xFF); ccb->csio.resid = ccb->csio.dxfer_len - vm_srb->transfer_len; if (reqp->sense_info_len != 0) { csio->sense_resid = csio->sense_len - reqp->sense_info_len; ccb->ccb_h.status |= CAM_AUTOSNS_VALID; } mtx_lock(&sc->hs_lock); if (reqp->softc->hs_frozen == 1) { xpt_print(ccb->ccb_h.path, "%u: storvsc unfreezing softc 0x%p.\n", ticks, reqp->softc); ccb->ccb_h.status |= CAM_RELEASE_SIMQ; reqp->softc->hs_frozen = 0; } storvsc_free_request(sc, reqp); xpt_done(ccb); mtx_unlock(&sc->hs_lock); } /** * @brief Free a request structure * * Free a request structure by returning it to the free list * * @param sc pointer to a softc * @param reqp pointer to a request structure */ static void storvsc_free_request(struct storvsc_softc *sc, 
struct hv_storvsc_request *reqp) { LIST_INSERT_HEAD(&sc->hs_free_list, reqp, link); } /** * @brief Determine type of storage device from GUID * * Using the type GUID, determine if this is a StorVSC (paravirtual * SCSI or BlkVSC (paravirtual IDE) device. * * @param dev a device * returns an enum */ static enum hv_storage_type storvsc_get_storage_type(device_t dev) { const char *p = vmbus_get_type(dev); if (!memcmp(p, &gBlkVscDeviceType, sizeof(hv_guid))) { return DRIVER_BLKVSC; } else if (!memcmp(p, &gStorVscDeviceType, sizeof(hv_guid))) { return DRIVER_STORVSC; } return (DRIVER_UNKNOWN); } Index: releng/10.3/sys/dev/hyperv/storvsc/hv_vstorage.h =================================================================== --- releng/10.3/sys/dev/hyperv/storvsc/hv_vstorage.h (revision 303983) +++ releng/10.3/sys/dev/hyperv/storvsc/hv_vstorage.h (revision 303984) @@ -1,266 +1,271 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef __HV_VSTORAGE_H__ #define __HV_VSTORAGE_H__ /* * Major/minor macros. Minor version is in LSB, meaning that earlier flat * version numbers will be interpreted as "0.x" (i.e., 1 becomes 0.1). */ #define VMSTOR_PROTOCOL_MAJOR(VERSION_) (((VERSION_) >> 8) & 0xff) #define VMSTOR_PROTOCOL_MINOR(VERSION_) (((VERSION_) ) & 0xff) #define VMSTOR_PROTOCOL_VERSION(MAJOR_, MINOR_) ((((MAJOR_) & 0xff) << 8) | \ (((MINOR_) & 0xff) )) +#define VMSTOR_PROTOCOL_VERSION_WIN6 VMSTOR_PROTOCOL_VERSION(2, 0) +#define VMSTOR_PROTOCOL_VERSION_WIN7 VMSTOR_PROTOCOL_VERSION(4, 2) +#define VMSTOR_PROTOCOL_VERSION_WIN8 VMSTOR_PROTOCOL_VERSION(5, 1) +#define VMSTOR_PROTOCOL_VERSION_WIN8_1 VMSTOR_PROTOCOL_VERSION(6, 0) +#define VMSTOR_PROTOCOL_VERSION_WIN10 VMSTOR_PROTOCOL_VERSION(6, 2) /* * Invalid version. */ #define VMSTOR_INVALID_PROTOCOL_VERSION -1 /* * Version history: * V1 Beta 0.1 * V1 RC < 2008/1/31 1.0 * V1 RC > 2008/1/31 2.0 * Win7: 4.2 * Win8: 5.1 */ #define VMSTOR_PROTOCOL_VERSION_CURRENT VMSTOR_PROTOCOL_VERSION(5, 1) /** * Packet structure ops describing virtual storage requests. 
*/ enum vstor_packet_ops { VSTOR_OPERATION_COMPLETEIO = 1, VSTOR_OPERATION_REMOVEDEVICE = 2, VSTOR_OPERATION_EXECUTESRB = 3, VSTOR_OPERATION_RESETLUN = 4, VSTOR_OPERATION_RESETADAPTER = 5, VSTOR_OPERATION_RESETBUS = 6, VSTOR_OPERATION_BEGININITIALIZATION = 7, VSTOR_OPERATION_ENDINITIALIZATION = 8, VSTOR_OPERATION_QUERYPROTOCOLVERSION = 9, VSTOR_OPERATION_QUERYPROPERTIES = 10, VSTOR_OPERATION_ENUMERATE_BUS = 11, VSTOR_OPERATION_FCHBA_DATA = 12, VSTOR_OPERATION_CREATE_MULTI_CHANNELS = 13, VSTOR_OPERATION_MAXIMUM = 13 }; /* * Platform neutral description of a scsi request - * this remains the same across the write regardless of 32/64 bit * note: it's patterned off the Windows DDK SCSI_PASS_THROUGH structure */ #define CDB16GENERIC_LENGTH 0x10 #define SENSE_BUFFER_SIZE 0x14 #define MAX_DATA_BUFFER_LENGTH_WITH_PADDING 0x14 #define POST_WIN7_STORVSC_SENSE_BUFFER_SIZE 0x14 #define PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE 0x12 struct vmscsi_win8_extension { /* * The following were added in Windows 8 */ uint16_t reserve; uint8_t queue_tag; uint8_t queue_action; uint32_t srb_flags; uint32_t time_out_value; uint32_t queue_sort_ey; } __packed; struct vmscsi_req { uint16_t length; uint8_t srb_status; uint8_t scsi_status; /* HBA number, set to the order number detected by initiator. */ uint8_t port; /* SCSI bus number or bus_id, different from CAM's path_id. */ uint8_t path_id; uint8_t target_id; uint8_t lun; uint8_t cdb_len; uint8_t sense_info_len; uint8_t data_in; uint8_t reserved; uint32_t transfer_len; union { uint8_t cdb[CDB16GENERIC_LENGTH]; uint8_t sense_data[SENSE_BUFFER_SIZE]; uint8_t reserved_array[MAX_DATA_BUFFER_LENGTH_WITH_PADDING]; } u; /* * The following was added in win8. */ struct vmscsi_win8_extension win8_extension; } __packed; /** * This structure is sent during the initialization phase to get the different * properties of the channel. 
*/ struct vmstor_chan_props { uint16_t proto_ver; uint8_t path_id; uint8_t target_id; uint16_t max_channel_cnt; /** * Note: port number is only really known on the client side */ uint16_t port; uint32_t flags; uint32_t max_transfer_bytes; /** * This id is unique for each channel and will correspond with * vendor specific data in the inquiry_ata */ uint64_t unique_id; } __packed; /** * This structure is sent during the storage protocol negotiations. */ struct vmstor_proto_ver { /** * Major (MSW) and minor (LSW) version numbers. */ uint16_t major_minor; uint16_t revision; /* always zero */ } __packed; /** * Channel Property Flags */ #define STORAGE_CHANNEL_REMOVABLE_FLAG 0x1 #define STORAGE_CHANNEL_EMULATED_IDE_FLAG 0x2 struct vstor_packet { /** * Requested operation type */ enum vstor_packet_ops operation; /* * Flags - see below for values */ uint32_t flags; /** * Status of the request returned from the server side. */ uint32_t status; union { /** * Structure used to forward SCSI commands from the client to * the server. */ struct vmscsi_req vm_srb; /** * Structure used to query channel properties. */ struct vmstor_chan_props chan_props; /** * Used during version negotiations. */ struct vmstor_proto_ver version; /** * Number of multichannels to create */ uint16_t multi_channels_cnt; } u; } __packed; /** * SRB (SCSI Request Block) Status Codes */ #define SRB_STATUS_PENDING 0x00 #define SRB_STATUS_SUCCESS 0x01 #define SRB_STATUS_ABORTED 0x02 #define SRB_STATUS_ABORT_FAILED 0x03 #define SRB_STATUS_ERROR 0x04 #define SRB_STATUS_BUSY 0x05 /** * SRB Status Masks (can be combined with above status codes) */ #define SRB_STATUS_QUEUE_FROZEN 0x40 #define SRB_STATUS_AUTOSENSE_VALID 0x80 /** * Packet flags */ /** * This flag indicates that the server should send back a completion for this * packet. 
*/ #define REQUEST_COMPLETION_FLAG 0x1 /** * This is the set of flags that the vsc can set in any packets it sends */ #define VSC_LEGAL_FLAGS (REQUEST_COMPLETION_FLAG) #endif /* __HV_VSTORAGE_H__ */ Index: releng/10.3/sys/dev/hyperv/vmbus/hv_channel.c =================================================================== --- releng/10.3/sys/dev/hyperv/vmbus/hv_channel.c (revision 303983) +++ releng/10.3/sys/dev/hyperv/vmbus/hv_channel.c (revision 303984) @@ -1,882 +1,882 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include "hv_vmbus_priv.h" static int vmbus_channel_create_gpadl_header( /* must be phys and virt contiguous*/ void* contig_buffer, /* page-size multiple */ uint32_t size, hv_vmbus_channel_msg_info** msg_info, uint32_t* message_count); static void vmbus_channel_set_event(hv_vmbus_channel* channel); /** * @brief Trigger an event notification on the specified channel */ static void vmbus_channel_set_event(hv_vmbus_channel *channel) { hv_vmbus_monitor_page *monitor_page; if (channel->offer_msg.monitor_allocated) { /* Each uint32_t represents 32 channels */ synch_set_bit((channel->offer_msg.child_rel_id & 31), ((uint32_t *)hv_vmbus_g_connection.send_interrupt_page + ((channel->offer_msg.child_rel_id >> 5)))); monitor_page = (hv_vmbus_monitor_page *) hv_vmbus_g_connection.monitor_pages; monitor_page++; /* Get the child to parent monitor page */ synch_set_bit(channel->monitor_bit, (uint32_t *)&monitor_page-> trigger_group[channel->monitor_group].u.pending); } else { hv_vmbus_set_event(channel); } } /** * @brief Open the specified channel */ int hv_vmbus_channel_open( hv_vmbus_channel* new_channel, uint32_t send_ring_buffer_size, uint32_t recv_ring_buffer_size, void* user_data, uint32_t user_data_len, hv_vmbus_pfn_channel_callback pfn_on_channel_callback, void* context) { int ret = 0; void *in, *out; hv_vmbus_channel_open_channel* open_msg; hv_vmbus_channel_msg_info* open_info; mtx_lock(&new_channel->sc_lock); if (new_channel->state == HV_CHANNEL_OPEN_STATE) { new_channel->state = HV_CHANNEL_OPENING_STATE; } else { mtx_unlock(&new_channel->sc_lock); if(bootverbose) printf("VMBUS: Trying to open channel <%p> which in " "%d state.\n", new_channel, new_channel->state); return (EINVAL); } mtx_unlock(&new_channel->sc_lock); new_channel->on_channel_callback = pfn_on_channel_callback; new_channel->channel_callback_context = context; /* Allocate 
the ring buffer */ out = contigmalloc((send_ring_buffer_size + recv_ring_buffer_size), M_DEVBUF, M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); KASSERT(out != NULL, ("Error VMBUS: contigmalloc failed to allocate Ring Buffer!")); if (out == NULL) return (ENOMEM); in = ((uint8_t *) out + send_ring_buffer_size); new_channel->ring_buffer_pages = out; new_channel->ring_buffer_page_count = (send_ring_buffer_size + recv_ring_buffer_size) >> PAGE_SHIFT; new_channel->ring_buffer_size = send_ring_buffer_size + recv_ring_buffer_size; hv_vmbus_ring_buffer_init( &new_channel->outbound, out, send_ring_buffer_size); hv_vmbus_ring_buffer_init( &new_channel->inbound, in, recv_ring_buffer_size); /** * Establish the gpadl for the ring buffer */ new_channel->ring_buffer_gpadl_handle = 0; ret = hv_vmbus_channel_establish_gpadl(new_channel, new_channel->outbound.ring_buffer, send_ring_buffer_size + recv_ring_buffer_size, &new_channel->ring_buffer_gpadl_handle); /** * Create and init the channel open message */ open_info = (hv_vmbus_channel_msg_info*) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_open_channel), M_DEVBUF, M_NOWAIT); KASSERT(open_info != NULL, ("Error VMBUS: malloc failed to allocate Open Channel message!")); if (open_info == NULL) return (ENOMEM); sema_init(&open_info->wait_sema, 0, "Open Info Sema"); open_msg = (hv_vmbus_channel_open_channel*) open_info->msg; open_msg->header.message_type = HV_CHANNEL_MESSAGE_OPEN_CHANNEL; open_msg->open_id = new_channel->offer_msg.child_rel_id; open_msg->child_rel_id = new_channel->offer_msg.child_rel_id; open_msg->ring_buffer_gpadl_handle = new_channel->ring_buffer_gpadl_handle; open_msg->downstream_ring_buffer_page_offset = send_ring_buffer_size >> PAGE_SHIFT; open_msg->target_vcpu = new_channel->target_vcpu; if (user_data_len) memcpy(open_msg->user_data, user_data, user_data_len); - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL( 
&hv_vmbus_g_connection.channel_msg_anchor, open_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message( open_msg, sizeof(hv_vmbus_channel_open_channel)); if (ret != 0) goto cleanup; ret = sema_timedwait(&open_info->wait_sema, 5 * hz); /* KYS 5 seconds */ if (ret) { if(bootverbose) printf("VMBUS: channel <%p> open timeout.\n", new_channel); goto cleanup; } if (open_info->response.open_result.status == 0) { new_channel->state = HV_CHANNEL_OPENED_STATE; if(bootverbose) printf("VMBUS: channel <%p> open success.\n", new_channel); } else { if(bootverbose) printf("Error VMBUS: channel <%p> open failed - %d!\n", new_channel, open_info->response.open_result.status); } cleanup: - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE( &hv_vmbus_g_connection.channel_msg_anchor, open_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&open_info->wait_sema); free(open_info, M_DEVBUF); return (ret); } /** * @brief Create a gpadl for the specified buffer */ static int vmbus_channel_create_gpadl_header( void* contig_buffer, uint32_t size, /* page-size multiple */ hv_vmbus_channel_msg_info** msg_info, uint32_t* message_count) { int i; int page_count; unsigned long long pfn; uint32_t msg_size; hv_vmbus_channel_gpadl_header* gpa_header; hv_vmbus_channel_gpadl_body* gpadl_body; hv_vmbus_channel_msg_info* msg_header; hv_vmbus_channel_msg_info* msg_body; int pfnSum, pfnCount, pfnLeft, pfnCurr, pfnSize; page_count = size >> PAGE_SHIFT; pfn = hv_get_phys_addr(contig_buffer) >> PAGE_SHIFT; /*do we need a gpadl body msg */ pfnSize = HV_MAX_SIZE_CHANNEL_MESSAGE - sizeof(hv_vmbus_channel_gpadl_header) - sizeof(hv_gpa_range); pfnCount = pfnSize / sizeof(uint64_t); if (page_count > pfnCount) { /* if(we need a gpadl body) 
*/ /* fill in the header */ msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_header) + sizeof(hv_gpa_range) + pfnCount * sizeof(uint64_t); msg_header = malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( msg_header != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Message!")); if (msg_header == NULL) return (ENOMEM); TAILQ_INIT(&msg_header->sub_msg_list_anchor); msg_header->message_size = msg_size; gpa_header = (hv_vmbus_channel_gpadl_header*) msg_header->msg; gpa_header->range_count = 1; gpa_header->range_buf_len = sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); gpa_header->range[0].byte_offset = 0; gpa_header->range[0].byte_count = size; for (i = 0; i < pfnCount; i++) { gpa_header->range[0].pfn_array[i] = pfn + i; } *msg_info = msg_header; *message_count = 1; pfnSum = pfnCount; pfnLeft = page_count - pfnCount; /* * figure out how many pfns we can fit */ pfnSize = HV_MAX_SIZE_CHANNEL_MESSAGE - sizeof(hv_vmbus_channel_gpadl_body); pfnCount = pfnSize / sizeof(uint64_t); /* * fill in the body */ while (pfnLeft) { if (pfnLeft > pfnCount) { pfnCurr = pfnCount; } else { pfnCurr = pfnLeft; } msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_body) + pfnCurr * sizeof(uint64_t); msg_body = malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( msg_body != NULL, ("Error VMBUS: malloc failed to allocate Gpadl msg_body!")); if (msg_body == NULL) return (ENOMEM); msg_body->message_size = msg_size; (*message_count)++; gpadl_body = (hv_vmbus_channel_gpadl_body*) msg_body->msg; /* * gpadl_body->gpadl = kbuffer; */ for (i = 0; i < pfnCurr; i++) { gpadl_body->pfn[i] = pfn + pfnSum + i; } TAILQ_INSERT_TAIL( &msg_header->sub_msg_list_anchor, msg_body, msg_list_entry); pfnSum += pfnCurr; pfnLeft -= pfnCurr; } } else { /* else everything fits in a header */ msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_header) + sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); msg_header = 
malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( msg_header != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Message!")); if (msg_header == NULL) return (ENOMEM); msg_header->message_size = msg_size; gpa_header = (hv_vmbus_channel_gpadl_header*) msg_header->msg; gpa_header->range_count = 1; gpa_header->range_buf_len = sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); gpa_header->range[0].byte_offset = 0; gpa_header->range[0].byte_count = size; for (i = 0; i < page_count; i++) { gpa_header->range[0].pfn_array[i] = pfn + i; } *msg_info = msg_header; *message_count = 1; } return (0); } /** * @brief Establish a GPADL for the specified buffer */ int hv_vmbus_channel_establish_gpadl( hv_vmbus_channel* channel, void* contig_buffer, uint32_t size, /* page-size multiple */ uint32_t* gpadl_handle) { int ret = 0; hv_vmbus_channel_gpadl_header* gpadl_msg; hv_vmbus_channel_gpadl_body* gpadl_body; hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_info* sub_msg_info; uint32_t msg_count; hv_vmbus_channel_msg_info* curr; uint32_t next_gpadl_handle; next_gpadl_handle = hv_vmbus_g_connection.next_gpadl_handle; atomic_add_int((int*) &hv_vmbus_g_connection.next_gpadl_handle, 1); ret = vmbus_channel_create_gpadl_header( contig_buffer, size, &msg_info, &msg_count); if(ret != 0) { /* if(allocation failed) return immediately */ /* reverse atomic_add_int above */ atomic_subtract_int((int*) &hv_vmbus_g_connection.next_gpadl_handle, 1); return ret; } sema_init(&msg_info->wait_sema, 0, "Open Info Sema"); gpadl_msg = (hv_vmbus_channel_gpadl_header*) msg_info->msg; gpadl_msg->header.message_type = HV_CHANNEL_MESSAGEL_GPADL_HEADER; gpadl_msg->child_rel_id = channel->offer_msg.child_rel_id; gpadl_msg->gpadl = next_gpadl_handle; - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL( &hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); - 
mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message( gpadl_msg, msg_info->message_size - (uint32_t) sizeof(hv_vmbus_channel_msg_info)); if (ret != 0) goto cleanup; if (msg_count > 1) { TAILQ_FOREACH(curr, &msg_info->sub_msg_list_anchor, msg_list_entry) { sub_msg_info = curr; gpadl_body = (hv_vmbus_channel_gpadl_body*) sub_msg_info->msg; gpadl_body->header.message_type = HV_CHANNEL_MESSAGE_GPADL_BODY; gpadl_body->gpadl = next_gpadl_handle; ret = hv_vmbus_post_message( gpadl_body, sub_msg_info->message_size - (uint32_t) sizeof(hv_vmbus_channel_msg_info)); /* if (the post message failed) give up and clean up */ if(ret != 0) goto cleanup; } } ret = sema_timedwait(&msg_info->wait_sema, 5 * hz); /* KYS 5 seconds*/ if (ret != 0) goto cleanup; *gpadl_handle = gpadl_msg->gpadl; cleanup: - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE(&hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&msg_info->wait_sema); free(msg_info, M_DEVBUF); return (ret); } /** * @brief Teardown the specified GPADL handle */ int hv_vmbus_channel_teardown_gpdal( hv_vmbus_channel* channel, uint32_t gpadl_handle) { int ret = 0; hv_vmbus_channel_gpadl_teardown* msg; hv_vmbus_channel_msg_info* info; info = (hv_vmbus_channel_msg_info *) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_teardown), M_DEVBUF, M_NOWAIT); KASSERT(info != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Teardown Msg!")); if (info == NULL) { ret = ENOMEM; goto cleanup; } sema_init(&info->wait_sema, 0, "Open Info Sema"); msg = (hv_vmbus_channel_gpadl_teardown*) info->msg; msg->header.message_type = HV_CHANNEL_MESSAGE_GPADL_TEARDOWN; msg->child_rel_id = channel->offer_msg.child_rel_id; msg->gpadl = 
gpadl_handle; - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL(&hv_vmbus_g_connection.channel_msg_anchor, info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message(msg, sizeof(hv_vmbus_channel_gpadl_teardown)); if (ret != 0) goto cleanup; ret = sema_timedwait(&info->wait_sema, 5 * hz); /* KYS 5 seconds */ cleanup: /* * Received a torndown response */ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE(&hv_vmbus_g_connection.channel_msg_anchor, info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&info->wait_sema); free(info, M_DEVBUF); return (ret); } static void hv_vmbus_channel_close_internal(hv_vmbus_channel *channel) { int ret = 0; hv_vmbus_channel_close_channel* msg; hv_vmbus_channel_msg_info* info; channel->state = HV_CHANNEL_OPEN_STATE; channel->sc_creation_callback = NULL; /* * Grab the lock to prevent race condition when a packet received * and unloading driver is in the process. 
*/ mtx_lock(&channel->inbound_lock); channel->on_channel_callback = NULL; mtx_unlock(&channel->inbound_lock); /** * Send a closing message */ info = (hv_vmbus_channel_msg_info *) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_close_channel), M_DEVBUF, M_NOWAIT); KASSERT(info != NULL, ("VMBUS: malloc failed hv_vmbus_channel_close!")); if(info == NULL) return; msg = (hv_vmbus_channel_close_channel*) info->msg; msg->header.message_type = HV_CHANNEL_MESSAGE_CLOSE_CHANNEL; msg->child_rel_id = channel->offer_msg.child_rel_id; ret = hv_vmbus_post_message( msg, sizeof(hv_vmbus_channel_close_channel)); /* Tear down the gpadl for the channel's ring buffer */ if (channel->ring_buffer_gpadl_handle) { hv_vmbus_channel_teardown_gpdal(channel, channel->ring_buffer_gpadl_handle); } /* TODO: Send a msg to release the childRelId */ /* cleanup the ring buffers for this channel */ hv_ring_buffer_cleanup(&channel->outbound); hv_ring_buffer_cleanup(&channel->inbound); contigfree(channel->ring_buffer_pages, channel->ring_buffer_size, M_DEVBUF); free(info, M_DEVBUF); } /** * @brief Close the specified channel */ void hv_vmbus_channel_close(hv_vmbus_channel *channel) { hv_vmbus_channel* sub_channel; if (channel->primary_channel != NULL) { /* * We only close multi-channels when the primary is * closed. */ return; } /* * Close all multi-channels first. */ TAILQ_FOREACH(sub_channel, &channel->sc_list_anchor, sc_list_entry) { if (sub_channel->state != HV_CHANNEL_OPENED_STATE) continue; hv_vmbus_channel_close_internal(sub_channel); } /* * Then close the primary channel. 
*/ hv_vmbus_channel_close_internal(channel); } /** * @brief Send the specified buffer on the given channel */ int hv_vmbus_channel_send_packet( hv_vmbus_channel* channel, void* buffer, uint32_t buffer_len, uint64_t request_id, hv_vmbus_packet_type type, uint32_t flags) { int ret = 0; hv_vm_packet_descriptor desc; uint32_t packet_len; uint64_t aligned_data; uint32_t packet_len_aligned; boolean_t need_sig; hv_vmbus_sg_buffer_list buffer_list[3]; packet_len = sizeof(hv_vm_packet_descriptor) + buffer_len; packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); aligned_data = 0; /* Setup the descriptor */ desc.type = type; /* HV_VMBUS_PACKET_TYPE_DATA_IN_BAND; */ desc.flags = flags; /* HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED */ /* in 8-bytes granularity */ desc.data_offset8 = sizeof(hv_vm_packet_descriptor) >> 3; desc.length8 = (uint16_t) (packet_len_aligned >> 3); desc.transaction_id = request_id; buffer_list[0].data = &desc; buffer_list[0].length = sizeof(hv_vm_packet_descriptor); buffer_list[1].data = buffer; buffer_list[1].length = buffer_len; buffer_list[2].data = &aligned_data; buffer_list[2].length = packet_len_aligned - packet_len; ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 3, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Send a range of single-page buffer packets using * a GPADL Direct packet type */ int hv_vmbus_channel_send_packet_pagebuffer( hv_vmbus_channel* channel, hv_vmbus_page_buffer page_buffers[], uint32_t page_count, void* buffer, uint32_t buffer_len, uint64_t request_id) { int ret = 0; int i = 0; boolean_t need_sig; uint32_t packet_len; uint32_t packetLen_aligned; hv_vmbus_sg_buffer_list buffer_list[3]; hv_vmbus_channel_packet_page_buffer desc; uint32_t descSize; uint64_t alignedData = 0; if (page_count > HV_MAX_PAGE_BUFFER_COUNT) return (EINVAL); /* * Adjust the size down since 
hv_vmbus_channel_packet_page_buffer * is the largest size we support */ descSize = sizeof(hv_vmbus_channel_packet_page_buffer) - ((HV_MAX_PAGE_BUFFER_COUNT - page_count) * sizeof(hv_vmbus_page_buffer)); packet_len = descSize + buffer_len; packetLen_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); /* Setup the descriptor */ desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT; desc.flags = HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; desc.data_offset8 = descSize >> 3; /* in 8-bytes granularity */ desc.length8 = (uint16_t) (packetLen_aligned >> 3); desc.transaction_id = request_id; desc.range_count = page_count; for (i = 0; i < page_count; i++) { desc.range[i].length = page_buffers[i].length; desc.range[i].offset = page_buffers[i].offset; desc.range[i].pfn = page_buffers[i].pfn; } buffer_list[0].data = &desc; buffer_list[0].length = descSize; buffer_list[1].data = buffer; buffer_list[1].length = buffer_len; buffer_list[2].data = &alignedData; buffer_list[2].length = packetLen_aligned - packet_len; ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 3, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Send a multi-page buffer packet using a GPADL Direct packet type */ int hv_vmbus_channel_send_packet_multipagebuffer( hv_vmbus_channel* channel, hv_vmbus_multipage_buffer* multi_page_buffer, void* buffer, uint32_t buffer_len, uint64_t request_id) { int ret = 0; uint32_t desc_size; boolean_t need_sig; uint32_t packet_len; uint32_t packet_len_aligned; uint32_t pfn_count; uint64_t aligned_data = 0; hv_vmbus_sg_buffer_list buffer_list[3]; hv_vmbus_channel_packet_multipage_buffer desc; pfn_count = HV_NUM_PAGES_SPANNED( multi_page_buffer->offset, multi_page_buffer->length); if ((pfn_count == 0) || (pfn_count > HV_MAX_MULTIPAGE_BUFFER_COUNT)) return (EINVAL); /* * Adjust the size down since hv_vmbus_channel_packet_multipage_buffer * is the largest size we 
support */ desc_size = sizeof(hv_vmbus_channel_packet_multipage_buffer) - ((HV_MAX_MULTIPAGE_BUFFER_COUNT - pfn_count) * sizeof(uint64_t)); packet_len = desc_size + buffer_len; packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); /* * Setup the descriptor */ desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT; desc.flags = HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; desc.data_offset8 = desc_size >> 3; /* in 8-bytes granularity */ desc.length8 = (uint16_t) (packet_len_aligned >> 3); desc.transaction_id = request_id; desc.range_count = 1; desc.range.length = multi_page_buffer->length; desc.range.offset = multi_page_buffer->offset; memcpy(desc.range.pfn_array, multi_page_buffer->pfn_array, pfn_count * sizeof(uint64_t)); buffer_list[0].data = &desc; buffer_list[0].length = desc_size; buffer_list[1].data = buffer; buffer_list[1].length = buffer_len; buffer_list[2].data = &aligned_data; buffer_list[2].length = packet_len_aligned - packet_len; ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 3, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Retrieve the user packet on the specified channel */ int hv_vmbus_channel_recv_packet( hv_vmbus_channel* channel, void* Buffer, uint32_t buffer_len, uint32_t* buffer_actual_len, uint64_t* request_id) { int ret; uint32_t user_len; uint32_t packet_len; hv_vm_packet_descriptor desc; *buffer_actual_len = 0; *request_id = 0; ret = hv_ring_buffer_peek(&channel->inbound, &desc, sizeof(hv_vm_packet_descriptor)); if (ret != 0) return (0); packet_len = desc.length8 << 3; user_len = packet_len - (desc.data_offset8 << 3); *buffer_actual_len = user_len; if (user_len > buffer_len) return (EINVAL); *request_id = desc.transaction_id; /* Copy over the packet to the user buffer */ ret = hv_ring_buffer_read(&channel->inbound, Buffer, user_len, (desc.data_offset8 << 3)); return (0); } /** * @brief Retrieve the raw 
packet on the specified channel */ int hv_vmbus_channel_recv_packet_raw( hv_vmbus_channel* channel, void* buffer, uint32_t buffer_len, uint32_t* buffer_actual_len, uint64_t* request_id) { int ret; uint32_t packetLen; uint32_t userLen; hv_vm_packet_descriptor desc; *buffer_actual_len = 0; *request_id = 0; ret = hv_ring_buffer_peek( &channel->inbound, &desc, sizeof(hv_vm_packet_descriptor)); if (ret != 0) return (0); packetLen = desc.length8 << 3; userLen = packetLen - (desc.data_offset8 << 3); *buffer_actual_len = packetLen; if (packetLen > buffer_len) return (ENOBUFS); *request_id = desc.transaction_id; /* Copy over the entire packet to the user buffer */ ret = hv_ring_buffer_read(&channel->inbound, buffer, packetLen, 0); return (0); } Index: releng/10.3/sys/dev/hyperv/vmbus/hv_channel_mgmt.c =================================================================== --- releng/10.3/sys/dev/hyperv/vmbus/hv_channel_mgmt.c (revision 303983) +++ releng/10.3/sys/dev/hyperv/vmbus/hv_channel_mgmt.c (revision 303984) @@ -1,851 +1,851 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include "hv_vmbus_priv.h" /* * Internal functions */ static void vmbus_channel_on_offer(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_open_result(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_offer_rescind(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_gpadl_created(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_gpadl_torndown(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_offers_delivered(hv_vmbus_channel_msg_header* hdr); static void vmbus_channel_on_version_response(hv_vmbus_channel_msg_header* hdr); /** * Channel message dispatch table */ hv_vmbus_channel_msg_table_entry g_channel_message_table[HV_CHANNEL_MESSAGE_COUNT] = { { HV_CHANNEL_MESSAGE_INVALID, 0, NULL }, { HV_CHANNEL_MESSAGE_OFFER_CHANNEL, 0, vmbus_channel_on_offer }, { HV_CHANNEL_MESSAGE_RESCIND_CHANNEL_OFFER, 0, vmbus_channel_on_offer_rescind }, { HV_CHANNEL_MESSAGE_REQUEST_OFFERS, 0, NULL }, { HV_CHANNEL_MESSAGE_ALL_OFFERS_DELIVERED, 1, vmbus_channel_on_offers_delivered }, { HV_CHANNEL_MESSAGE_OPEN_CHANNEL, 0, NULL }, { HV_CHANNEL_MESSAGE_OPEN_CHANNEL_RESULT, 1, vmbus_channel_on_open_result }, { HV_CHANNEL_MESSAGE_CLOSE_CHANNEL, 0, NULL }, { HV_CHANNEL_MESSAGEL_GPADL_HEADER, 0, NULL }, { HV_CHANNEL_MESSAGE_GPADL_BODY, 0, NULL }, { HV_CHANNEL_MESSAGE_GPADL_CREATED, 1, vmbus_channel_on_gpadl_created }, { HV_CHANNEL_MESSAGE_GPADL_TEARDOWN, 0, NULL }, { 
HV_CHANNEL_MESSAGE_GPADL_TORNDOWN, 1, vmbus_channel_on_gpadl_torndown }, { HV_CHANNEL_MESSAGE_REL_ID_RELEASED, 0, NULL }, { HV_CHANNEL_MESSAGE_INITIATED_CONTACT, 0, NULL }, { HV_CHANNEL_MESSAGE_VERSION_RESPONSE, 1, vmbus_channel_on_version_response }, { HV_CHANNEL_MESSAGE_UNLOAD, 0, NULL } }; /** * Implementation of the work abstraction. */ static void work_item_callback(void *work, int pending) { struct hv_work_item *w = (struct hv_work_item *)work; /* * Serialize work execution. */ if (w->wq->work_sema != NULL) { sema_wait(w->wq->work_sema); } w->callback(w->context); if (w->wq->work_sema != NULL) { sema_post(w->wq->work_sema); } free(w, M_DEVBUF); } struct hv_work_queue* hv_work_queue_create(char* name) { static unsigned int qid = 0; char qname[64]; int pri; struct hv_work_queue* wq; wq = malloc(sizeof(struct hv_work_queue), M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(wq != NULL, ("Error VMBUS: Failed to allocate work_queue\n")); if (wq == NULL) return (NULL); /* * We use work abstraction to handle messages * coming from the host and these are typically offers. * Some FreeBsd drivers appear to have a concurrency issue * where probe/attach needs to be serialized. We ensure that * by having only one thread process work elements in a * specific queue by serializing work execution. * */ if (strcmp(name, "vmbusQ") == 0) { pri = PI_DISK; } else { /* control */ pri = PI_NET; /* * Initialize semaphore for this queue by pointing * to the globale semaphore used for synchronizing all * control messages. */ wq->work_sema = &hv_vmbus_g_connection.control_sema; } sprintf(qname, "hv_%s_%u", name, qid); /* * Fixme: FreeBSD 8.2 has a different prototype for * taskqueue_create(), and for certain other taskqueue functions. * We need to research the implications of these changes. * Fixme: Not sure when the changes were introduced. 
*/ wq->queue = taskqueue_create(qname, M_NOWAIT, taskqueue_thread_enqueue, &wq->queue #if __FreeBSD_version < 800000 , &wq->proc #endif ); if (wq->queue == NULL) { free(wq, M_DEVBUF); return (NULL); } if (taskqueue_start_threads(&wq->queue, 1, pri, "%s taskq", qname)) { taskqueue_free(wq->queue); free(wq, M_DEVBUF); return (NULL); } qid++; return (wq); } void hv_work_queue_close(struct hv_work_queue *wq) { /* * KYS: Need to drain the taskqueue * before we close the hv_work_queue. */ /*KYS: taskqueue_drain(wq->tq, ); */ taskqueue_free(wq->queue); free(wq, M_DEVBUF); } /** * @brief Create work item */ int hv_queue_work_item( struct hv_work_queue *wq, void (*callback)(void *), void *context) { struct hv_work_item *w = malloc(sizeof(struct hv_work_item), M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(w != NULL, ("Error VMBUS: Failed to allocate WorkItem\n")); if (w == NULL) return (ENOMEM); w->callback = callback; w->context = context; w->wq = wq; TASK_INIT(&w->work, 0, work_item_callback, w); return (taskqueue_enqueue(wq->queue, &w->work)); } /** * @brief Allocate and initialize a vmbus channel object */ hv_vmbus_channel* hv_vmbus_allocate_channel(void) { hv_vmbus_channel* channel; channel = (hv_vmbus_channel*) malloc( sizeof(hv_vmbus_channel), M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(channel != NULL, ("Error VMBUS: Failed to allocate channel!")); if (channel == NULL) return (NULL); mtx_init(&channel->inbound_lock, "channel inbound", NULL, MTX_DEF); mtx_init(&channel->sc_lock, "vmbus multi channel", NULL, MTX_DEF); TAILQ_INIT(&channel->sc_list_anchor); return (channel); } /** * @brief Release the vmbus channel object itself */ static inline void ReleaseVmbusChannel(void *context) { hv_vmbus_channel* channel = (hv_vmbus_channel*) context; free(channel, M_DEVBUF); } /** * @brief Release the resources used by the vmbus channel object */ void hv_vmbus_free_vmbus_channel(hv_vmbus_channel* channel) { mtx_destroy(&channel->sc_lock); mtx_destroy(&channel->inbound_lock); /* * We have to 
release the channel's workqueue/thread in * the vmbus's workqueue/thread context * ie we can't destroy ourselves */ hv_queue_work_item(hv_vmbus_g_connection.work_queue, ReleaseVmbusChannel, (void *) channel); } /** * @brief Process the offer by creating a channel/device * associated with this offer */ static void vmbus_channel_process_offer(hv_vmbus_channel *new_channel) { boolean_t f_new; hv_vmbus_channel* channel; int ret; uint32_t relid; f_new = TRUE; channel = NULL; relid = new_channel->offer_msg.child_rel_id; /* * Make sure this is a new offer */ mtx_lock(&hv_vmbus_g_connection.channel_lock); hv_vmbus_g_connection.channels[relid] = new_channel; TAILQ_FOREACH(channel, &hv_vmbus_g_connection.channel_anchor, list_entry) { if (memcmp(&channel->offer_msg.offer.interface_type, &new_channel->offer_msg.offer.interface_type, sizeof(hv_guid)) == 0 && memcmp(&channel->offer_msg.offer.interface_instance, &new_channel->offer_msg.offer.interface_instance, sizeof(hv_guid)) == 0) { f_new = FALSE; break; } } if (f_new) { /* Insert at tail */ TAILQ_INSERT_TAIL( &hv_vmbus_g_connection.channel_anchor, new_channel, list_entry); } mtx_unlock(&hv_vmbus_g_connection.channel_lock); /*XXX add new channel to percpu_list */ if (!f_new) { /* * Check if this is a sub channel. */ if (new_channel->offer_msg.offer.sub_channel_index != 0) { /* * It is a sub channel offer, process it. */ new_channel->primary_channel = channel; mtx_lock(&channel->sc_lock); TAILQ_INSERT_TAIL( &channel->sc_list_anchor, new_channel, sc_list_entry); mtx_unlock(&channel->sc_lock); /* Insert new channel into channel_anchor. 
*/ printf("VMBUS get multi-channel offer, rel=%u,sub=%u\n", new_channel->offer_msg.child_rel_id, new_channel->offer_msg.offer.sub_channel_index); mtx_lock(&hv_vmbus_g_connection.channel_lock); TAILQ_INSERT_TAIL(&hv_vmbus_g_connection.channel_anchor, new_channel, list_entry); mtx_unlock(&hv_vmbus_g_connection.channel_lock); if(bootverbose) printf("VMBUS: new multi-channel offer <%p>, " "its primary channel is <%p>.\n", new_channel, new_channel->primary_channel); /*XXX add it to percpu_list */ new_channel->state = HV_CHANNEL_OPEN_STATE; if (channel->sc_creation_callback != NULL) { channel->sc_creation_callback(new_channel); } return; } hv_vmbus_free_vmbus_channel(new_channel); return; } new_channel->state = HV_CHANNEL_OPEN_STATE; /* * Start the process of binding this offer to the driver * (We need to set the device field before calling * hv_vmbus_child_device_add()) */ new_channel->device = hv_vmbus_child_device_create( new_channel->offer_msg.offer.interface_type, new_channel->offer_msg.offer.interface_instance, new_channel); /* * Add the new device to the bus. This will kick off device-driver * binding which eventually invokes the device driver's AddDevice() * method. */ ret = hv_vmbus_child_device_register(new_channel->device); if (ret != 0) { mtx_lock(&hv_vmbus_g_connection.channel_lock); TAILQ_REMOVE( &hv_vmbus_g_connection.channel_anchor, new_channel, list_entry); mtx_unlock(&hv_vmbus_g_connection.channel_lock); hv_vmbus_free_vmbus_channel(new_channel); } } /** * Array of device guids that are performance critical. We try to distribute * the interrupt load for these devices across all online cpus. */ static const hv_guid high_perf_devices[] = { {HV_NIC_GUID, }, {HV_IDE_GUID, }, {HV_SCSI_GUID, }, }; enum { PERF_CHN_NIC = 0, PERF_CHN_IDE, PERF_CHN_SCSI, MAX_PERF_CHN, }; /* * We use this static number to distribute the channel interrupt load. 
*/ static uint32_t next_vcpu; /** * Starting with Win8, we can statically distribute the incoming * channel interrupt load by binding a channel to VCPU. We * implement here a simple round robin scheme for distributing * the interrupt load. * We will bind channels that are not performance critical to cpu 0 and * performance critical channels (IDE, SCSI and Network) will be uniformly * distributed across all available CPUs. */ static void vmbus_channel_select_cpu(hv_vmbus_channel *channel, hv_guid *guid) { uint32_t current_cpu; int i; boolean_t is_perf_channel = FALSE; for (i = PERF_CHN_NIC; i < MAX_PERF_CHN; i++) { if (memcmp(guid->data, high_perf_devices[i].data, sizeof(hv_guid)) == 0) { is_perf_channel = TRUE; break; } } if ((hv_vmbus_protocal_version == HV_VMBUS_VERSION_WS2008) || (hv_vmbus_protocal_version == HV_VMBUS_VERSION_WIN7) || (!is_perf_channel)) { /* Host's view of guest cpu */ channel->target_vcpu = 0; /* Guest's own view of cpu */ channel->target_cpu = 0; return; } /* mp_ncpus should have the number cpus currently online */ current_cpu = (++next_vcpu % mp_ncpus); channel->target_cpu = current_cpu; channel->target_vcpu = hv_vmbus_g_context.hv_vcpu_index[current_cpu]; if (bootverbose) printf("VMBUS: Total online cpus %d, assign perf channel %d " "to vcpu %d, cpu %d\n", mp_ncpus, i, channel->target_vcpu, current_cpu); } /** * @brief Handler for channel offers from Hyper-V/Azure * * Handler for channel offers from vmbus in parent partition. We ignore * all offers except network and storage offers. 
For each network and storage * offers, we create a channel object and queue a work item to the channel * object to process the offer synchronously */ static void vmbus_channel_on_offer(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_offer_channel* offer; hv_vmbus_channel* new_channel; offer = (hv_vmbus_channel_offer_channel*) hdr; hv_guid *guidType; hv_guid *guidInstance; guidType = &offer->offer.interface_type; guidInstance = &offer->offer.interface_instance; /* Allocate the channel object and save this offer */ new_channel = hv_vmbus_allocate_channel(); if (new_channel == NULL) return; /* * By default we setup state to enable batched * reading. A specific service can choose to * disable this prior to opening the channel. */ new_channel->batched_reading = TRUE; new_channel->signal_event_param = (hv_vmbus_input_signal_event *) (HV_ALIGN_UP((unsigned long) &new_channel->signal_event_buffer, HV_HYPERCALL_PARAM_ALIGN)); new_channel->signal_event_param->connection_id.as_uint32_t = 0; new_channel->signal_event_param->connection_id.u.id = HV_VMBUS_EVENT_CONNECTION_ID; new_channel->signal_event_param->flag_number = 0; new_channel->signal_event_param->rsvd_z = 0; if (hv_vmbus_protocal_version != HV_VMBUS_VERSION_WS2008) { new_channel->is_dedicated_interrupt = (offer->is_dedicated_interrupt != 0); new_channel->signal_event_param->connection_id.u.id = offer->connection_id; } /* * Bind the channel to a chosen cpu. */ vmbus_channel_select_cpu(new_channel, &offer->offer.interface_type); memcpy(&new_channel->offer_msg, offer, sizeof(hv_vmbus_channel_offer_channel)); new_channel->monitor_group = (uint8_t) offer->monitor_id / 32; new_channel->monitor_bit = (uint8_t) offer->monitor_id % 32; vmbus_channel_process_offer(new_channel); } /** * @brief Rescind offer handler. 
* * We queue a work item to process this offer * synchronously */ static void vmbus_channel_on_offer_rescind(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_rescind_offer* rescind; hv_vmbus_channel* channel; rescind = (hv_vmbus_channel_rescind_offer*) hdr; channel = hv_vmbus_g_connection.channels[rescind->child_rel_id]; if (channel == NULL) return; hv_vmbus_child_device_unregister(channel->device); mtx_lock(&hv_vmbus_g_connection.channel_lock); hv_vmbus_g_connection.channels[rescind->child_rel_id] = NULL; mtx_unlock(&hv_vmbus_g_connection.channel_lock); } /** * * @brief Invoked when all offers have been delivered. */ static void vmbus_channel_on_offers_delivered(hv_vmbus_channel_msg_header* hdr) { } /** * @brief Open result handler. * * This is invoked when we received a response * to our channel open request. Find the matching request, copy the * response and signal the requesting thread. */ static void vmbus_channel_on_open_result(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_open_result* result; hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_header* requestHeader; hv_vmbus_channel_open_channel* openMsg; result = (hv_vmbus_channel_open_result*) hdr; /* * Find the open msg, copy the result and signal/unblock the wait event */ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_FOREACH(msg_info, &hv_vmbus_g_connection.channel_msg_anchor, msg_list_entry) { requestHeader = (hv_vmbus_channel_msg_header*) msg_info->msg; if (requestHeader->message_type == HV_CHANNEL_MESSAGE_OPEN_CHANNEL) { openMsg = (hv_vmbus_channel_open_channel*) msg_info->msg; if (openMsg->child_rel_id == result->child_rel_id && openMsg->open_id == result->open_id) { memcpy(&msg_info->response.open_result, result, sizeof(hv_vmbus_channel_open_result)); sema_post(&msg_info->wait_sema); break; } } } - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); } 
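The handler above follows a pattern that recurs in the other response handlers in this patch: take channel_msg_lock (now an ordinary MTX_DEF mutex rather than a spin mutex), walk the list of pending requests, copy the response into the matching entry, and post that entry's semaphore. The following is a minimal userland sketch of that pattern, not the driver code. It assumes a pthread mutex as a stand-in for mtx(9) and a POSIX semaphore as a stand-in for sema(9); the names pending_req, add_pending and on_open_result are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response payload, loosely modelled on an open result. */
struct open_result {
	uint32_t child_rel_id;
	uint32_t open_id;
	uint32_t status;
};

/* Hypothetical pending request: one entry per thread awaiting a reply. */
struct pending_req {
	uint32_t child_rel_id;
	uint32_t open_id;
	struct open_result response;	/* filled in by the handler */
	sem_t wait_sema;		/* posted once response is valid */
	struct pending_req *next;
};

static struct pending_req *pending_head;
/* Userland stand-in for a default (non-spin) kernel mutex. */
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a request before sending it, so a fast reply is not lost. */
void
add_pending(struct pending_req *req)
{
	assert(req != NULL);
	pthread_mutex_lock(&pending_lock);
	req->next = pending_head;
	pending_head = req;
	pthread_mutex_unlock(&pending_lock);
}

/*
 * Find the request matching the response, copy the response into it
 * and wake the waiter. Returns 1 if a waiter was signalled, else 0.
 */
int
on_open_result(const struct open_result *res)
{
	struct pending_req *req;
	int matched = 0;

	assert(res != NULL);
	pthread_mutex_lock(&pending_lock);
	for (req = pending_head; req != NULL; req = req->next) {
		if (req->child_rel_id == res->child_rel_id &&
		    req->open_id == res->open_id) {
			req->response = *res;
			sem_post(&req->wait_sema);
			matched = 1;
			break;
		}
	}
	pthread_mutex_unlock(&pending_lock);
	return (matched);
}
```

In this sketch a requester would initialize wait_sema with sem_init(), call add_pending(), send its request, and then block in sem_wait() or a timed variant, much as hv_vmbus_negotiate_version() in this patch waits with sema_timedwait().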
/** * @brief GPADL created handler. * * This is invoked when we received a response * to our gpadl create request. Find the matching request, copy the * response and signal the requesting thread. */ static void vmbus_channel_on_gpadl_created(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_gpadl_created* gpadl_created; hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_header* request_header; hv_vmbus_channel_gpadl_header* gpadl_header; gpadl_created = (hv_vmbus_channel_gpadl_created*) hdr; /* Find the establish msg, copy the result and signal/unblock * the wait event */ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_FOREACH(msg_info, &hv_vmbus_g_connection.channel_msg_anchor, msg_list_entry) { request_header = (hv_vmbus_channel_msg_header*) msg_info->msg; if (request_header->message_type == HV_CHANNEL_MESSAGEL_GPADL_HEADER) { gpadl_header = (hv_vmbus_channel_gpadl_header*) request_header; if ((gpadl_created->child_rel_id == gpadl_header->child_rel_id) && (gpadl_created->gpadl == gpadl_header->gpadl)) { memcpy(&msg_info->response.gpadl_created, gpadl_created, sizeof(hv_vmbus_channel_gpadl_created)); sema_post(&msg_info->wait_sema); break; } } } - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); } /** * @brief GPADL torndown handler. * * This is invoked when we received a respons * to our gpadl teardown request. Find the matching request, copy the * response and signal the requesting thread */ static void vmbus_channel_on_gpadl_torndown(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_gpadl_torndown* gpadl_torndown; hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_header* requestHeader; hv_vmbus_channel_gpadl_teardown* gpadlTeardown; gpadl_torndown = (hv_vmbus_channel_gpadl_torndown*)hdr; /* * Find the open msg, copy the result and signal/unblock the * wait event. 
*/ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_FOREACH(msg_info, &hv_vmbus_g_connection.channel_msg_anchor, msg_list_entry) { requestHeader = (hv_vmbus_channel_msg_header*) msg_info->msg; if (requestHeader->message_type == HV_CHANNEL_MESSAGE_GPADL_TEARDOWN) { gpadlTeardown = (hv_vmbus_channel_gpadl_teardown*) requestHeader; if (gpadl_torndown->gpadl == gpadlTeardown->gpadl) { memcpy(&msg_info->response.gpadl_torndown, gpadl_torndown, sizeof(hv_vmbus_channel_gpadl_torndown)); sema_post(&msg_info->wait_sema); break; } } } - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); } /** * @brief Version response handler. * * This is invoked when we received a response * to our initiate contact request. Find the matching request, copy th * response and signal the requesting thread. */ static void vmbus_channel_on_version_response(hv_vmbus_channel_msg_header* hdr) { hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_header* requestHeader; hv_vmbus_channel_initiate_contact* initiate; hv_vmbus_channel_version_response* versionResponse; versionResponse = (hv_vmbus_channel_version_response*)hdr; - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_FOREACH(msg_info, &hv_vmbus_g_connection.channel_msg_anchor, msg_list_entry) { requestHeader = (hv_vmbus_channel_msg_header*) msg_info->msg; if (requestHeader->message_type == HV_CHANNEL_MESSAGE_INITIATED_CONTACT) { initiate = (hv_vmbus_channel_initiate_contact*) requestHeader; memcpy(&msg_info->response.version_response, versionResponse, sizeof(hv_vmbus_channel_version_response)); sema_post(&msg_info->wait_sema); } } - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); } /** * @brief Handler for channel protocol messages. 
* * This is invoked in the vmbus worker thread context. */ void hv_vmbus_on_channel_message(void *context) { hv_vmbus_message* msg; hv_vmbus_channel_msg_header* hdr; int size; msg = (hv_vmbus_message*) context; hdr = (hv_vmbus_channel_msg_header*) msg->u.payload; size = msg->header.payload_size; if (hdr->message_type >= HV_CHANNEL_MESSAGE_COUNT) { free(msg, M_DEVBUF); return; } if (g_channel_message_table[hdr->message_type].messageHandler) { g_channel_message_table[hdr->message_type].messageHandler(hdr); } /* Free the msg that was allocated in VmbusOnMsgDPC() */ free(msg, M_DEVBUF); } /** * @brief Send a request to get all our pending offers. */ int hv_vmbus_request_channel_offers(void) { int ret; hv_vmbus_channel_msg_header* msg; hv_vmbus_channel_msg_info* msg_info; msg_info = (hv_vmbus_channel_msg_info *) malloc(sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_msg_header), M_DEVBUF, M_NOWAIT); if (msg_info == NULL) { if(bootverbose) printf("Error VMBUS: malloc failed for Request Offers\n"); return (ENOMEM); } msg = (hv_vmbus_channel_msg_header*) msg_info->msg; msg->message_type = HV_CHANNEL_MESSAGE_REQUEST_OFFERS; ret = hv_vmbus_post_message(msg, sizeof(hv_vmbus_channel_msg_header)); if (msg_info) free(msg_info, M_DEVBUF); return (ret); } /** * @brief Release channels that are unattached/unconnected (i.e., no drivers associated) */ void hv_vmbus_release_unattached_channels(void) { hv_vmbus_channel *channel; mtx_lock(&hv_vmbus_g_connection.channel_lock); while (!TAILQ_EMPTY(&hv_vmbus_g_connection.channel_anchor)) { channel = TAILQ_FIRST(&hv_vmbus_g_connection.channel_anchor); TAILQ_REMOVE(&hv_vmbus_g_connection.channel_anchor, channel, list_entry); hv_vmbus_child_device_unregister(channel->device); hv_vmbus_free_vmbus_channel(channel); } bzero(hv_vmbus_g_connection.channels, sizeof(hv_vmbus_channel*) * HV_CHANNEL_MAX_COUNT); mtx_unlock(&hv_vmbus_g_connection.channel_lock); } /** * @brief Select the best outgoing channel * * The channel whose vcpu 
binding is closest to the currect vcpu will * be selected. * If no multi-channel, always select primary channel * * @param primary - primary channel */ struct hv_vmbus_channel * vmbus_select_outgoing_channel(struct hv_vmbus_channel *primary) { hv_vmbus_channel *new_channel = NULL; hv_vmbus_channel *outgoing_channel = primary; int old_cpu_distance = 0; int new_cpu_distance = 0; int cur_vcpu = 0; int smp_pro_id = PCPU_GET(cpuid); if (TAILQ_EMPTY(&primary->sc_list_anchor)) { return outgoing_channel; } if (smp_pro_id >= MAXCPU) { return outgoing_channel; } cur_vcpu = hv_vmbus_g_context.hv_vcpu_index[smp_pro_id]; TAILQ_FOREACH(new_channel, &primary->sc_list_anchor, sc_list_entry) { if (new_channel->state != HV_CHANNEL_OPENED_STATE){ continue; } if (new_channel->target_vcpu == cur_vcpu){ return new_channel; } old_cpu_distance = ((outgoing_channel->target_vcpu > cur_vcpu) ? (outgoing_channel->target_vcpu - cur_vcpu) : (cur_vcpu - outgoing_channel->target_vcpu)); new_cpu_distance = ((new_channel->target_vcpu > cur_vcpu) ? (new_channel->target_vcpu - cur_vcpu) : (cur_vcpu - new_channel->target_vcpu)); if (old_cpu_distance < new_cpu_distance) { continue; } outgoing_channel = new_channel; } return(outgoing_channel); } Index: releng/10.3/sys/dev/hyperv/vmbus/hv_connection.c =================================================================== --- releng/10.3/sys/dev/hyperv/vmbus/hv_connection.c (revision 303983) +++ releng/10.3/sys/dev/hyperv/vmbus/hv_connection.c (revision 303984) @@ -1,522 +1,526 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "hv_vmbus_priv.h" /* * Globals */ hv_vmbus_connection hv_vmbus_g_connection = { .connect_state = HV_DISCONNECTED, .next_gpadl_handle = 0xE1E10, }; uint32_t hv_vmbus_protocal_version = HV_VMBUS_VERSION_WS2008; static uint32_t hv_vmbus_get_next_version(uint32_t current_ver) { switch (current_ver) { case (HV_VMBUS_VERSION_WIN7): return(HV_VMBUS_VERSION_WS2008); case (HV_VMBUS_VERSION_WIN8): return(HV_VMBUS_VERSION_WIN7); case (HV_VMBUS_VERSION_WIN8_1): return(HV_VMBUS_VERSION_WIN8); case (HV_VMBUS_VERSION_WS2008): default: return(HV_VMBUS_VERSION_INVALID); } } /** * Negotiate the highest supported hypervisor version. 
*/ static int hv_vmbus_negotiate_version(hv_vmbus_channel_msg_info *msg_info, uint32_t version) { int ret = 0; hv_vmbus_channel_initiate_contact *msg; sema_init(&msg_info->wait_sema, 0, "Msg Info Sema"); msg = (hv_vmbus_channel_initiate_contact*) msg_info->msg; msg->header.message_type = HV_CHANNEL_MESSAGE_INITIATED_CONTACT; msg->vmbus_version_requested = version; msg->interrupt_page = hv_get_phys_addr( hv_vmbus_g_connection.interrupt_page); msg->monitor_page_1 = hv_get_phys_addr( hv_vmbus_g_connection.monitor_pages); msg->monitor_page_2 = hv_get_phys_addr( ((uint8_t *) hv_vmbus_g_connection.monitor_pages + PAGE_SIZE)); /** * Add to list before we send the request since we may receive the * response before returning from this routine */ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL( &hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message( msg, sizeof(hv_vmbus_channel_initiate_contact)); if (ret != 0) { - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE( &hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); return (ret); } /** * Wait for the connection response */ ret = sema_timedwait(&msg_info->wait_sema, 5 * hz); /* KYS 5 seconds */ - mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_lock(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE( &hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); - mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); + mtx_unlock(&hv_vmbus_g_connection.channel_msg_lock); /** * Check if successful */ if (msg_info->response.version_response.version_supported) { 
hv_vmbus_g_connection.connect_state = HV_CONNECTED; } else { ret = ECONNREFUSED; } return (ret); } /** * Send a connect request on the partition service connection */ int hv_vmbus_connect(void) { int ret = 0; uint32_t version; hv_vmbus_channel_msg_info* msg_info = NULL; /** * Make sure we are not connecting or connected */ if (hv_vmbus_g_connection.connect_state != HV_DISCONNECTED) { return (-1); } /** * Initialize the vmbus connection */ hv_vmbus_g_connection.connect_state = HV_CONNECTING; hv_vmbus_g_connection.work_queue = hv_work_queue_create("vmbusQ"); sema_init(&hv_vmbus_g_connection.control_sema, 1, "control_sema"); TAILQ_INIT(&hv_vmbus_g_connection.channel_msg_anchor); mtx_init(&hv_vmbus_g_connection.channel_msg_lock, "vmbus channel msg", - NULL, MTX_SPIN); + NULL, MTX_DEF); TAILQ_INIT(&hv_vmbus_g_connection.channel_anchor); mtx_init(&hv_vmbus_g_connection.channel_lock, "vmbus channel", NULL, MTX_DEF); /** * Setup the vmbus event connection for channel interrupt abstraction * stuff */ hv_vmbus_g_connection.interrupt_page = contigmalloc( PAGE_SIZE, M_DEVBUF, M_NOWAIT | M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); KASSERT(hv_vmbus_g_connection.interrupt_page != NULL, ("Error VMBUS: malloc failed to allocate Channel" " Request Event message!")); if (hv_vmbus_g_connection.interrupt_page == NULL) { ret = ENOMEM; goto cleanup; } hv_vmbus_g_connection.recv_interrupt_page = hv_vmbus_g_connection.interrupt_page; hv_vmbus_g_connection.send_interrupt_page = ((uint8_t *) hv_vmbus_g_connection.interrupt_page + (PAGE_SIZE >> 1)); /** * Set up the monitor notification facility. 
The 1st page for * parent->child and the 2nd page for child->parent */ hv_vmbus_g_connection.monitor_pages = contigmalloc( 2 * PAGE_SIZE, M_DEVBUF, M_NOWAIT | M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); KASSERT(hv_vmbus_g_connection.monitor_pages != NULL, ("Error VMBUS: malloc failed to allocate Monitor Pages!")); if (hv_vmbus_g_connection.monitor_pages == NULL) { ret = ENOMEM; goto cleanup; } msg_info = (hv_vmbus_channel_msg_info*) malloc(sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_initiate_contact), M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(msg_info != NULL, ("Error VMBUS: malloc failed for Initiate Contact message!")); if (msg_info == NULL) { ret = ENOMEM; goto cleanup; } hv_vmbus_g_connection.channels = malloc(sizeof(hv_vmbus_channel*) * HV_CHANNEL_MAX_COUNT, M_DEVBUF, M_WAITOK | M_ZERO); /* * Find the highest vmbus version number we can support. */ version = HV_VMBUS_VERSION_CURRENT; do { ret = hv_vmbus_negotiate_version(msg_info, version); if (ret == EWOULDBLOCK) { /* * We timed out. */ goto cleanup; } if (hv_vmbus_g_connection.connect_state == HV_CONNECTED) break; version = hv_vmbus_get_next_version(version); } while (version != HV_VMBUS_VERSION_INVALID); hv_vmbus_protocal_version = version; if (bootverbose) printf("VMBUS: Protocol Version: %d.%d\n", version >> 16, version & 0xFFFF); sema_destroy(&msg_info->wait_sema); free(msg_info, M_DEVBUF); return (0); /* * Cleanup after failure! 
*/ cleanup: hv_vmbus_g_connection.connect_state = HV_DISCONNECTED; hv_work_queue_close(hv_vmbus_g_connection.work_queue); sema_destroy(&hv_vmbus_g_connection.control_sema); mtx_destroy(&hv_vmbus_g_connection.channel_lock); mtx_destroy(&hv_vmbus_g_connection.channel_msg_lock); if (hv_vmbus_g_connection.interrupt_page != NULL) { contigfree( hv_vmbus_g_connection.interrupt_page, PAGE_SIZE, M_DEVBUF); hv_vmbus_g_connection.interrupt_page = NULL; } if (hv_vmbus_g_connection.monitor_pages != NULL) { contigfree( hv_vmbus_g_connection.monitor_pages, 2 * PAGE_SIZE, M_DEVBUF); hv_vmbus_g_connection.monitor_pages = NULL; } if (msg_info) { sema_destroy(&msg_info->wait_sema); free(msg_info, M_DEVBUF); } free(hv_vmbus_g_connection.channels, M_DEVBUF); return (ret); } /** * Send a disconnect request on the partition service connection */ int hv_vmbus_disconnect(void) { int ret = 0; hv_vmbus_channel_unload* msg; msg = malloc(sizeof(hv_vmbus_channel_unload), M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(msg != NULL, ("Error VMBUS: malloc failed to allocate Channel Unload Msg!")); if (msg == NULL) return (ENOMEM); msg->message_type = HV_CHANNEL_MESSAGE_UNLOAD; ret = hv_vmbus_post_message(msg, sizeof(hv_vmbus_channel_unload)); contigfree(hv_vmbus_g_connection.interrupt_page, PAGE_SIZE, M_DEVBUF); mtx_destroy(&hv_vmbus_g_connection.channel_msg_lock); hv_work_queue_close(hv_vmbus_g_connection.work_queue); sema_destroy(&hv_vmbus_g_connection.control_sema); free(hv_vmbus_g_connection.channels, M_DEVBUF); hv_vmbus_g_connection.connect_state = HV_DISCONNECTED; free(msg, M_DEVBUF); return (ret); } /** * Process a channel event notification */ static void VmbusProcessChannelEvent(uint32_t relid) { void* arg; uint32_t bytes_to_read; hv_vmbus_channel* channel; boolean_t is_batched_reading; /** * Find the channel based on this relid and invokes * the channel callback to process the event */ channel = hv_vmbus_g_connection.channels[relid]; if (channel == NULL) { return; } /** * To deal with the race 
condition where we might * receive a packet while the relevant driver is * being unloaded, dispatch the callback while * holding the channel lock. The unloading driver * will acquire the same channel lock to set the * callback to NULL. This closes the window. */ /* * Disable the lock due to newly added WITNESS check in r277723. * Will seek other way to avoid race condition. * -- whu */ // mtx_lock(&channel->inbound_lock); if (channel->on_channel_callback != NULL) { arg = channel->channel_callback_context; is_batched_reading = channel->batched_reading; /* * Optimize host to guest signaling by ensuring: * 1. While reading the channel, we disable interrupts from * host. * 2. Ensure that we process all posted messages from the host * before returning from this callback. * 3. Once we return, enable signaling from the host. Once this * state is set we check to see if additional packets are * available to read. In this case we repeat the process. */ do { if (is_batched_reading) hv_ring_buffer_read_begin(&channel->inbound); channel->on_channel_callback(arg); if (is_batched_reading) bytes_to_read = hv_ring_buffer_read_end(&channel->inbound); else bytes_to_read = 0; } while (is_batched_reading && (bytes_to_read != 0)); } // mtx_unlock(&channel->inbound_lock); } /** * Handler for events */ void hv_vmbus_on_events(void *arg) { int bit; int cpu; int dword; void *page_addr; uint32_t* recv_interrupt_page = NULL; int rel_id; int maxdword; hv_vmbus_synic_event_flags *event; /* int maxdword = PAGE_SIZE >> 3; */ cpu = (int)(long)arg; KASSERT(cpu <= mp_maxid, ("VMBUS: hv_vmbus_on_events: " "cpu out of range!")); if ((hv_vmbus_protocal_version == HV_VMBUS_VERSION_WS2008) || (hv_vmbus_protocal_version == HV_VMBUS_VERSION_WIN7)) { maxdword = HV_MAX_NUM_CHANNELS_SUPPORTED >> 5; /* * receive size is 1/2 page and divide that by 4 bytes */ recv_interrupt_page = hv_vmbus_g_connection.recv_interrupt_page; } else { /* * On Host with Win8 or above, the event page can be * checked directly to get 
the id of the channel * that has the pending interrupt. */ maxdword = HV_EVENT_FLAGS_DWORD_COUNT; page_addr = hv_vmbus_g_context.syn_ic_event_page[cpu]; event = (hv_vmbus_synic_event_flags *) page_addr + HV_VMBUS_MESSAGE_SINT; recv_interrupt_page = event->flags32; } /* * Check events */ if (recv_interrupt_page != NULL) { for (dword = 0; dword < maxdword; dword++) { if (recv_interrupt_page[dword]) { for (bit = 0; bit < HV_CHANNEL_DWORD_LEN; bit++) { if (synch_test_and_clear_bit(bit, (uint32_t *) &recv_interrupt_page[dword])) { rel_id = (dword << 5) + bit; if (rel_id == 0) { /* * Special case - * vmbus channel protocol msg. */ continue; } else { VmbusProcessChannelEvent(rel_id); } } } } } } return; } /** * Send a msg on the vmbus's message connection */ -int hv_vmbus_post_message(void *buffer, size_t bufferLen) { - int ret = 0; +int hv_vmbus_post_message(void *buffer, size_t bufferLen) +{ hv_vmbus_connection_id connId; - unsigned retries = 0; + sbintime_t time = SBT_1MS; + int retries; + int ret; - /* NetScaler delays from previous code were consolidated here */ - static int delayAmount[] = {100, 100, 100, 500, 500, 5000, 5000, 5000}; + connId.as_uint32_t = 0; + connId.u.id = HV_VMBUS_MESSAGE_CONNECTION_ID; - /* for(each entry in delayAmount) try to post message, - * delay a little bit before retrying + /* + * We retry to cope with transient failures caused by host side's + * insufficient resources. 20 times should suffice in practice. 
*/ - for (retries = 0; - retries < sizeof(delayAmount)/sizeof(delayAmount[0]); retries++) { - connId.as_uint32_t = 0; - connId.u.id = HV_VMBUS_MESSAGE_CONNECTION_ID; - ret = hv_vmbus_post_msg_via_msg_ipc(connId, 1, buffer, bufferLen); - if (ret != HV_STATUS_INSUFFICIENT_BUFFERS) - break; - /* TODO: KYS We should use a blocking wait call */ - DELAY(delayAmount[retries]); + for (retries = 0; retries < 20; retries++) { + ret = hv_vmbus_post_msg_via_msg_ipc(connId, 1, buffer, + bufferLen); + if (ret == HV_STATUS_SUCCESS) + return (0); + + pause_sbt("pstmsg", time, 0, C_HARDCLOCK); + if (time < SBT_1S * 2) + time *= 2; } - KASSERT(ret == 0, ("Error VMBUS: Message Post Failed\n")); + KASSERT(ret == HV_STATUS_SUCCESS, + ("Error VMBUS: Message Post Failed, ret=%d\n", ret)); - return (ret); + return (EAGAIN); } /** * Send an event notification to the parent */ int hv_vmbus_set_event(hv_vmbus_channel *channel) { int ret = 0; uint32_t child_rel_id = channel->offer_msg.child_rel_id; /* Each uint32_t represents 32 channels */ synch_set_bit(child_rel_id & 31, (((uint32_t *)hv_vmbus_g_connection.send_interrupt_page + (child_rel_id >> 5)))); ret = hv_vmbus_signal_event(channel->signal_event_param); return (ret); } Index: releng/10.3/sys/dev/hyperv/vmbus/hv_hv.c =================================================================== --- releng/10.3/sys/dev/hyperv/vmbus/hv_hv.c (revision 303983) +++ releng/10.3/sys/dev/hyperv/vmbus/hv_hv.c (revision 303984) @@ -1,429 +1,521 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /** * Implements low-level interactions with Hyper-V/Azure */ #include __FBSDID("$FreeBSD$"); #include +#include #include #include #include #include #include #include #include #include "hv_vmbus_priv.h" #define HV_NANOSECONDS_PER_SEC 1000000000L static u_int hv_get_timecount(struct timecounter *tc); +u_int hyperv_features; +u_int hyperv_recommends; + /** * Globals */ hv_vmbus_context hv_vmbus_g_context = { .syn_ic_initialized = FALSE, .hypercall_page = NULL, }; static struct timecounter hv_timecounter = { hv_get_timecount, 0, ~0u, HV_NANOSECONDS_PER_SEC/100, "Hyper-V", HV_NANOSECONDS_PER_SEC/100 }; static u_int hv_get_timecount(struct timecounter *tc) { u_int now = rdmsr(HV_X64_MSR_TIME_REF_COUNT); return (now); } /** * @brief Query the cpuid for presence of Windows hypervisor */ int hv_vmbus_query_hypervisor_presence(void) { if (vm_guest != VM_GUEST_HV) return (0); return (hv_high >= HV_X64_CPUID_MIN && hv_high <= HV_X64_CPUID_MAX); } /** * @brief Get version of the Windows hypervisor */ static int hv_vmbus_get_hypervisor_version(void)
{ u_int regs[4]; unsigned int maxLeaf; unsigned int op; /* * It is assumed that this is called after confirming that * Viridian is present. * Query id and revision. */ op = HV_CPU_ID_FUNCTION_HV_VENDOR_AND_MAX_FUNCTION; do_cpuid(op, regs); maxLeaf = regs[0]; op = HV_CPU_ID_FUNCTION_HV_INTERFACE; do_cpuid(op, regs); if (maxLeaf >= HV_CPU_ID_FUNCTION_MS_HV_VERSION) { op = HV_CPU_ID_FUNCTION_MS_HV_VERSION; do_cpuid(op, regs); } return (maxLeaf); } /** * @brief Invoke the specified hypercall */ static uint64_t hv_vmbus_do_hypercall(uint64_t control, void* input, void* output) { #ifdef __x86_64__ uint64_t hv_status = 0; uint64_t input_address = (input) ? hv_get_phys_addr(input) : 0; uint64_t output_address = (output) ? hv_get_phys_addr(output) : 0; volatile void* hypercall_page = hv_vmbus_g_context.hypercall_page; __asm__ __volatile__ ("mov %0, %%r8" : : "r" (output_address): "r8"); __asm__ __volatile__ ("call *%3" : "=a"(hv_status): "c" (control), "d" (input_address), "m" (hypercall_page)); return (hv_status); #else uint32_t control_high = control >> 32; uint32_t control_low = control & 0xFFFFFFFF; uint32_t hv_status_high = 1; uint32_t hv_status_low = 1; uint64_t input_address = (input) ? hv_get_phys_addr(input) : 0; uint32_t input_address_high = input_address >> 32; uint32_t input_address_low = input_address & 0xFFFFFFFF; uint64_t output_address = (output) ? hv_get_phys_addr(output) : 0; uint32_t output_address_high = output_address >> 32; uint32_t output_address_low = output_address & 0xFFFFFFFF; volatile void* hypercall_page = hv_vmbus_g_context.hypercall_page; __asm__ __volatile__ ("call *%8" : "=d"(hv_status_high), "=a"(hv_status_low) : "d" (control_high), "a" (control_low), "b" (input_address_high), "c" (input_address_low), "D"(output_address_high), "S"(output_address_low), "m" (hypercall_page)); return (hv_status_low | ((uint64_t)hv_status_high << 32)); #endif /* __x86_64__ */ } /** * @brief Main initialization routine.
* * This routine must be called * before any other routines in here are called */ int hv_vmbus_init(void) { int max_leaf; hv_vmbus_x64_msr_hypercall_contents hypercall_msr; void* virt_addr = 0; memset( hv_vmbus_g_context.syn_ic_event_page, 0, sizeof(hv_vmbus_handle) * MAXCPU); memset( hv_vmbus_g_context.syn_ic_msg_page, 0, sizeof(hv_vmbus_handle) * MAXCPU); if (vm_guest != VM_GUEST_HV) goto cleanup; max_leaf = hv_vmbus_get_hypervisor_version(); /* * Write our OS info */ uint64_t os_guest_info = HV_FREEBSD_GUEST_ID; wrmsr(HV_X64_MSR_GUEST_OS_ID, os_guest_info); hv_vmbus_g_context.guest_id = os_guest_info; /* * See if the hypercall page is already set */ hypercall_msr.as_uint64_t = rdmsr(HV_X64_MSR_HYPERCALL); virt_addr = malloc(PAGE_SIZE, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT(virt_addr != NULL, ("Error VMBUS: malloc failed to allocate page during init!")); if (virt_addr == NULL) goto cleanup; hypercall_msr.u.enable = 1; hypercall_msr.u.guest_physical_address = (hv_get_phys_addr(virt_addr) >> PAGE_SHIFT); wrmsr(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64_t); /* * Confirm that hypercall page did get set up */ hypercall_msr.as_uint64_t = 0; hypercall_msr.as_uint64_t = rdmsr(HV_X64_MSR_HYPERCALL); if (!hypercall_msr.u.enable) goto cleanup; hv_vmbus_g_context.hypercall_page = virt_addr; - tc_init(&hv_timecounter); /* register virtual timecount */ - hv_et_init(); return (0); cleanup: if (virt_addr != NULL) { if (hypercall_msr.u.enable) { hypercall_msr.as_uint64_t = 0; wrmsr(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64_t); } free(virt_addr, M_DEVBUF); } return (ENOTSUP); } /** * @brief Cleanup routine, called normally during driver unloading or exiting */ void hv_vmbus_cleanup(void) { hv_vmbus_x64_msr_hypercall_contents hypercall_msr; if (hv_vmbus_g_context.guest_id == HV_FREEBSD_GUEST_ID) { if (hv_vmbus_g_context.hypercall_page != NULL) { hypercall_msr.as_uint64_t = 0; wrmsr(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64_t); free(hv_vmbus_g_context.hypercall_page, 
M_DEVBUF); hv_vmbus_g_context.hypercall_page = NULL; } } } /** * @brief Post a message using the hypervisor message IPC. * (This involves a hypercall.) */ hv_vmbus_status hv_vmbus_post_msg_via_msg_ipc( hv_vmbus_connection_id connection_id, hv_vmbus_msg_type message_type, void* payload, size_t payload_size) { struct alignedinput { uint64_t alignment8; hv_vmbus_input_post_message msg; }; hv_vmbus_input_post_message* aligned_msg; hv_vmbus_status status; size_t addr; if (payload_size > HV_MESSAGE_PAYLOAD_BYTE_COUNT) return (EMSGSIZE); addr = (size_t) malloc(sizeof(struct alignedinput), M_DEVBUF, M_ZERO | M_NOWAIT); KASSERT(addr != 0, ("Error VMBUS: malloc failed to allocate message buffer!")); if (addr == 0) return (ENOMEM); aligned_msg = (hv_vmbus_input_post_message*) (HV_ALIGN_UP(addr, HV_HYPERCALL_PARAM_ALIGN)); aligned_msg->connection_id = connection_id; aligned_msg->message_type = message_type; aligned_msg->payload_size = payload_size; memcpy((void*) aligned_msg->payload, payload, payload_size); status = hv_vmbus_do_hypercall( HV_CALL_POST_MESSAGE, aligned_msg, 0) & 0xFFFF; free((void *) addr, M_DEVBUF); return (status); } /** * @brief Signal an event on the specified connection using the hypervisor * event IPC. (This involves a hypercall.) 
*/ hv_vmbus_status hv_vmbus_signal_event(void *con_id) { hv_vmbus_status status; status = hv_vmbus_do_hypercall( HV_CALL_SIGNAL_EVENT, con_id, 0) & 0xFFFF; return (status); } /** * @brief hv_vmbus_synic_init */ void hv_vmbus_synic_init(void *arg) { int cpu; uint64_t hv_vcpu_index; hv_vmbus_synic_simp simp; hv_vmbus_synic_siefp siefp; hv_vmbus_synic_scontrol sctrl; hv_vmbus_synic_sint shared_sint; uint64_t version; hv_setup_args* setup_args = (hv_setup_args *)arg; cpu = PCPU_GET(cpuid); if (hv_vmbus_g_context.hypercall_page == NULL) return; /* * TODO: Check the version */ version = rdmsr(HV_X64_MSR_SVERSION); hv_vmbus_g_context.syn_ic_msg_page[cpu] = setup_args->page_buffers[2 * cpu]; hv_vmbus_g_context.syn_ic_event_page[cpu] = setup_args->page_buffers[2 * cpu + 1]; /* * Setup the Synic's message page */ simp.as_uint64_t = rdmsr(HV_X64_MSR_SIMP); simp.u.simp_enabled = 1; simp.u.base_simp_gpa = ((hv_get_phys_addr( hv_vmbus_g_context.syn_ic_msg_page[cpu])) >> PAGE_SHIFT); wrmsr(HV_X64_MSR_SIMP, simp.as_uint64_t); /* * Setup the Synic's event page */ siefp.as_uint64_t = rdmsr(HV_X64_MSR_SIEFP); siefp.u.siefp_enabled = 1; siefp.u.base_siefp_gpa = ((hv_get_phys_addr( hv_vmbus_g_context.syn_ic_event_page[cpu])) >> PAGE_SHIFT); wrmsr(HV_X64_MSR_SIEFP, siefp.as_uint64_t); /*HV_SHARED_SINT_IDT_VECTOR + 0x20; */ shared_sint.as_uint64_t = 0; shared_sint.u.vector = setup_args->vector; shared_sint.u.masked = FALSE; shared_sint.u.auto_eoi = TRUE; wrmsr(HV_X64_MSR_SINT0 + HV_VMBUS_MESSAGE_SINT, shared_sint.as_uint64_t); /* Enable the global synic bit */ sctrl.as_uint64_t = rdmsr(HV_X64_MSR_SCONTROL); sctrl.u.enable = 1; wrmsr(HV_X64_MSR_SCONTROL, sctrl.as_uint64_t); hv_vmbus_g_context.syn_ic_initialized = TRUE; /* * Set up the cpuid mapping from Hyper-V to FreeBSD. * The array is indexed using FreeBSD cpuid. 
*/ hv_vcpu_index = rdmsr(HV_X64_MSR_VP_INDEX); hv_vmbus_g_context.hv_vcpu_index[cpu] = (uint32_t)hv_vcpu_index; return; } /** * @brief Cleanup routine for hv_vmbus_synic_init() */ void hv_vmbus_synic_cleanup(void *arg) { hv_vmbus_synic_sint shared_sint; hv_vmbus_synic_simp simp; hv_vmbus_synic_siefp siefp; if (!hv_vmbus_g_context.syn_ic_initialized) return; shared_sint.as_uint64_t = rdmsr( HV_X64_MSR_SINT0 + HV_VMBUS_MESSAGE_SINT); shared_sint.u.masked = 1; /* * Disable the interrupt */ wrmsr( HV_X64_MSR_SINT0 + HV_VMBUS_MESSAGE_SINT, shared_sint.as_uint64_t); simp.as_uint64_t = rdmsr(HV_X64_MSR_SIMP); simp.u.simp_enabled = 0; simp.u.base_simp_gpa = 0; wrmsr(HV_X64_MSR_SIMP, simp.as_uint64_t); siefp.as_uint64_t = rdmsr(HV_X64_MSR_SIEFP); siefp.u.siefp_enabled = 0; siefp.u.base_siefp_gpa = 0; wrmsr(HV_X64_MSR_SIEFP, siefp.as_uint64_t); } +static bool +hyperv_identify(void) +{ + u_int regs[4]; + unsigned int maxLeaf; + unsigned int op; + + if (vm_guest != VM_GUEST_HV) + return (false); + + op = HV_CPU_ID_FUNCTION_HV_VENDOR_AND_MAX_FUNCTION; + do_cpuid(op, regs); + maxLeaf = regs[0]; + if (maxLeaf < HV_CPU_ID_FUNCTION_MS_HV_IMPLEMENTATION_LIMITS) + return (false); + + op = HV_CPU_ID_FUNCTION_HV_INTERFACE; + do_cpuid(op, regs); + if (regs[0] != 0x31237648 /* HV#1 */) + return (false); + + op = HV_CPU_ID_FUNCTION_MS_HV_FEATURES; + do_cpuid(op, regs); + if ((regs[0] & HV_FEATURE_MSR_HYPERCALL) == 0) { + /* + * Hyper-V w/o Hypercall is impossible; someone + * is faking Hyper-V. 
+ */ + return (false); + } + hyperv_features = regs[0]; + + op = HV_CPU_ID_FUNCTION_MS_HV_VERSION; + do_cpuid(op, regs); + printf("Hyper-V Version: %d.%d.%d [SP%d]\n", + regs[1] >> 16, regs[1] & 0xffff, regs[0], regs[2]); + + printf(" Features: 0x%b\n", hyperv_features, + "\020" + "\001VPRUNTIME" + "\002TMREFCNT" + "\003SYNCIC" + "\004SYNCTM" + "\005APIC" + "\006HYERCALL" + "\007VPINDEX" + "\010RESET" + "\011STATS" + "\012REFTSC" + "\013IDLE" + "\014TMFREQ" + "\015DEBUG"); + + op = HV_CPU_ID_FUNCTION_MS_HV_ENLIGHTENMENT_INFORMATION; + do_cpuid(op, regs); + hyperv_recommends = regs[0]; + if (bootverbose) + printf(" Recommends: %08x %08x\n", regs[0], regs[1]); + + op = HV_CPU_ID_FUNCTION_MS_HV_IMPLEMENTATION_LIMITS; + do_cpuid(op, regs); + if (bootverbose) { + printf(" Limits: Vcpu:%d Lcpu:%d Int:%d\n", + regs[0], regs[1], regs[2]); + } + + if (maxLeaf >= HV_CPU_ID_FUNCTION_MS_HV_HARDWARE_FEATURE) { + op = HV_CPU_ID_FUNCTION_MS_HV_HARDWARE_FEATURE; + do_cpuid(op, regs); + if (bootverbose) { + printf(" HW Features: %08x AMD: %08x\n", + regs[0], regs[3]); + } + } + + return (true); +} + +static void +hyperv_init(void *dummy __unused) +{ + if (!hyperv_identify()) + return; + + if (hyperv_features & HV_FEATURE_MSR_TIME_REFCNT) { + /* Register virtual timecount */ + tc_init(&hv_timecounter); + } +} +SYSINIT(hyperv_initialize, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, hyperv_init, NULL); Index: releng/10.3/sys/dev/hyperv/vmbus/hv_vmbus_priv.h =================================================================== --- releng/10.3/sys/dev/hyperv/vmbus/hv_vmbus_priv.h (revision 303983) +++ releng/10.3/sys/dev/hyperv/vmbus/hv_vmbus_priv.h (revision 303984) @@ -1,771 +1,782 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef __HYPERV_PRIV_H__ #define __HYPERV_PRIV_H__ #include #include #include #include #include /* * Status codes for hypervisor operations. */ typedef uint16_t hv_vmbus_status; #define HV_MESSAGE_SIZE (256) #define HV_MESSAGE_PAYLOAD_BYTE_COUNT (240) #define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30) #define HV_ANY_VP (0xFFFFFFFF) /* * Synthetic interrupt controller flag constants. */ #define HV_EVENT_FLAGS_COUNT (256 * 8) #define HV_EVENT_FLAGS_BYTE_COUNT (256) #define HV_EVENT_FLAGS_DWORD_COUNT (256 / sizeof(uint32_t)) /** * max channel count <== event_flags_dword_count * bit_of_dword */ #define HV_CHANNEL_DWORD_LEN (32) #define HV_CHANNEL_MAX_COUNT \ ((HV_EVENT_FLAGS_DWORD_COUNT) * HV_CHANNEL_DWORD_LEN) /* * MessageId: HV_STATUS_INSUFFICIENT_BUFFERS * MessageText: * You did not supply enough message buffers to send a message. 
*/ +#define HV_STATUS_SUCCESS ((uint16_t)0) #define HV_STATUS_INSUFFICIENT_BUFFERS ((uint16_t)0x0013) typedef void (*hv_vmbus_channel_callback)(void *context); typedef struct { void* data; uint32_t length; } hv_vmbus_sg_buffer_list; typedef struct { uint32_t current_interrupt_mask; uint32_t current_read_index; uint32_t current_write_index; uint32_t bytes_avail_to_read; uint32_t bytes_avail_to_write; } hv_vmbus_ring_buffer_debug_info; typedef struct { uint32_t rel_id; hv_vmbus_channel_state state; hv_guid interface_type; hv_guid interface_instance; uint32_t monitor_id; uint32_t server_monitor_pending; uint32_t server_monitor_latency; uint32_t server_monitor_connection_id; uint32_t client_monitor_pending; uint32_t client_monitor_latency; uint32_t client_monitor_connection_id; hv_vmbus_ring_buffer_debug_info inbound; hv_vmbus_ring_buffer_debug_info outbound; } hv_vmbus_channel_debug_info; typedef union { hv_vmbus_channel_version_supported version_supported; hv_vmbus_channel_open_result open_result; hv_vmbus_channel_gpadl_torndown gpadl_torndown; hv_vmbus_channel_gpadl_created gpadl_created; hv_vmbus_channel_version_response version_response; } hv_vmbus_channel_msg_response; /* * Represents each channel msg on the vmbus connection * This is a variable-size data structure depending on * the msg type itself */ typedef struct hv_vmbus_channel_msg_info { /* * Bookkeeping stuff */ TAILQ_ENTRY(hv_vmbus_channel_msg_info) msg_list_entry; /* * So far, this is only used to handle * gpadl body message */ TAILQ_HEAD(, hv_vmbus_channel_msg_info) sub_msg_list_anchor; /* * Synchronize the request/response if * needed. * KYS: Use a semaphore for now. * Not perf critical. */ struct sema wait_sema; hv_vmbus_channel_msg_response response; uint32_t message_size; /** * The channel message that goes out on * the "wire". It will contain at * minimum the * hv_vmbus_channel_msg_header * header. 
*/ unsigned char msg[0]; } hv_vmbus_channel_msg_info; /* * The format must be the same as hv_vm_data_gpa_direct */ typedef struct hv_vmbus_channel_packet_page_buffer { uint16_t type; uint16_t data_offset8; uint16_t length8; uint16_t flags; uint64_t transaction_id; uint32_t reserved; uint32_t range_count; hv_vmbus_page_buffer range[HV_MAX_PAGE_BUFFER_COUNT]; } __packed hv_vmbus_channel_packet_page_buffer; /* * The format must be the same as hv_vm_data_gpa_direct */ typedef struct hv_vmbus_channel_packet_multipage_buffer { uint16_t type; uint16_t data_offset8; uint16_t length8; uint16_t flags; uint64_t transaction_id; uint32_t reserved; uint32_t range_count; /* Always 1 in this case */ hv_vmbus_multipage_buffer range; } __packed hv_vmbus_channel_packet_multipage_buffer; enum { HV_VMBUS_MESSAGE_CONNECTION_ID = 1, HV_VMBUS_MESSAGE_PORT_ID = 1, HV_VMBUS_EVENT_CONNECTION_ID = 2, HV_VMBUS_EVENT_PORT_ID = 2, HV_VMBUS_MONITOR_CONNECTION_ID = 3, HV_VMBUS_MONITOR_PORT_ID = 3, HV_VMBUS_MESSAGE_SINT = 2 }; #define HV_PRESENT_BIT 0x80000000 #define HV_HYPERCALL_PARAM_ALIGN sizeof(uint64_t) typedef struct { uint64_t guest_id; void* hypercall_page; hv_bool_uint8_t syn_ic_initialized; hv_vmbus_handle syn_ic_msg_page[MAXCPU]; hv_vmbus_handle syn_ic_event_page[MAXCPU]; /* * For FreeBSD cpuid to Hyper-V vcpuid mapping. */ uint32_t hv_vcpu_index[MAXCPU]; /* * Each cpu has its own software interrupt handler for channel * event and msg handling. */ struct intr_event *hv_event_intr_event[MAXCPU]; struct intr_event *hv_msg_intr_event[MAXCPU]; void *event_swintr[MAXCPU]; void *msg_swintr[MAXCPU]; /* * Host uses this vector to interrupt the guest for vmbus channel * event and msg.
*/ unsigned int hv_cb_vector; } hv_vmbus_context; /* * Define hypervisor message types */ typedef enum { HV_MESSAGE_TYPE_NONE = 0x00000000, /* * Memory access messages */ HV_MESSAGE_TYPE_UNMAPPED_GPA = 0x80000000, HV_MESSAGE_TYPE_GPA_INTERCEPT = 0x80000001, /* * Timer notification messages */ HV_MESSAGE_TIMER_EXPIRED = 0x80000010, /* * Error messages */ HV_MESSAGE_TYPE_INVALID_VP_REGISTER_VALUE = 0x80000020, HV_MESSAGE_TYPE_UNRECOVERABLE_EXCEPTION = 0x80000021, HV_MESSAGE_TYPE_UNSUPPORTED_FEATURE = 0x80000022, /* * Trace buffer complete messages */ HV_MESSAGE_TYPE_EVENT_LOG_BUFFER_COMPLETE = 0x80000040, /* * Platform-specific processor intercept messages */ HV_MESSAGE_TYPE_X64_IO_PORT_INTERCEPT = 0x80010000, HV_MESSAGE_TYPE_X64_MSR_INTERCEPT = 0x80010001, HV_MESSAGE_TYPE_X64_CPU_INTERCEPT = 0x80010002, HV_MESSAGE_TYPE_X64_EXCEPTION_INTERCEPT = 0x80010003, HV_MESSAGE_TYPE_X64_APIC_EOI = 0x80010004, HV_MESSAGE_TYPE_X64_LEGACY_FP_ERROR = 0x80010005 } hv_vmbus_msg_type; /* * Define port identifier type */ typedef union _hv_vmbus_port_id { uint32_t as_uint32_t; struct { uint32_t id:24; uint32_t reserved:8; } u ; } hv_vmbus_port_id; /* * Define synthetic interrupt controller message flag */ typedef union { uint8_t as_uint8_t; struct { uint8_t message_pending:1; uint8_t reserved:7; } u; } hv_vmbus_msg_flags; typedef uint64_t hv_vmbus_partition_id; /* * Define synthetic interrupt controller message header */ typedef struct { hv_vmbus_msg_type message_type; uint8_t payload_size; hv_vmbus_msg_flags message_flags; uint8_t reserved[2]; union { hv_vmbus_partition_id sender; hv_vmbus_port_id port; } u; } hv_vmbus_msg_header; /* * Define synthetic interrupt controller message format */ typedef struct { hv_vmbus_msg_header header; union { uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT]; } u ; } hv_vmbus_message; /* * Maximum channels is determined by the size of the interrupt * page which is PAGE_SIZE. 
1/2 of PAGE_SIZE is for * send endpoint interrupt and the other is receive * endpoint interrupt. * * Note: (PAGE_SIZE >> 1) << 3 allocates 16384 channels */ #define HV_MAX_NUM_CHANNELS (PAGE_SIZE >> 1) << 3 /* * (The value here must be a multiple of 32) */ #define HV_MAX_NUM_CHANNELS_SUPPORTED 256 /* * VM Bus connection states */ typedef enum { HV_DISCONNECTED, HV_CONNECTING, HV_CONNECTED, HV_DISCONNECTING } hv_vmbus_connect_state; #define HV_MAX_SIZE_CHANNEL_MESSAGE HV_MESSAGE_PAYLOAD_BYTE_COUNT typedef struct { hv_vmbus_connect_state connect_state; uint32_t next_gpadl_handle; /** * Represents channel interrupts. Each bit position * represents a channel. * When a channel sends an interrupt via VMBUS, it * finds its bit in the send_interrupt_page, sets it and * calls Hv to generate a port event. The other end * receives the port event and parses the * recv_interrupt_page to see which bit is set */ void *interrupt_page; void *send_interrupt_page; void *recv_interrupt_page; /* * 2 pages - 1st page for parent->child * notification and 2nd is child->parent * notification */ void *monitor_pages; TAILQ_HEAD(, hv_vmbus_channel_msg_info) channel_msg_anchor; struct mtx channel_msg_lock; /** * List of primary channels. Sub channels will be linked * under their primary channel. */ TAILQ_HEAD(, hv_vmbus_channel) channel_anchor; struct mtx channel_lock; /** * channel table for fast lookup through id. */ hv_vmbus_channel **channels; hv_vmbus_handle work_queue; struct sema control_sema; } hv_vmbus_connection; typedef union { uint64_t as_uint64_t; struct { uint64_t build_number : 16; uint64_t service_version : 8; /* Service Pack, etc.
*/ uint64_t minor_version : 8; uint64_t major_version : 8; /* * HV_GUEST_OS_MICROSOFT_IDS (If Vendor=MS) * HV_GUEST_OS_VENDOR */ uint64_t os_id : 8; uint64_t vendor_id : 16; } u; } hv_vmbus_x64_msr_guest_os_id_contents; typedef union { uint64_t as_uint64_t; struct { uint64_t enable :1; uint64_t reserved :11; uint64_t guest_physical_address :52; } u; } hv_vmbus_x64_msr_hypercall_contents; typedef union { uint32_t as_uint32_t; struct { uint32_t group_enable :4; uint32_t rsvd_z :28; } u; } hv_vmbus_monitor_trigger_state; typedef union { uint64_t as_uint64_t; struct { uint32_t pending; uint32_t armed; } u; } hv_vmbus_monitor_trigger_group; typedef struct { hv_vmbus_connection_id connection_id; uint16_t flag_number; uint16_t rsvd_z; } hv_vmbus_monitor_parameter; /* * hv_vmbus_monitor_page Layout * ------------------------------------------------------ * | 0 | trigger_state (4 bytes) | Rsvd1 (4 bytes) | * | 8 | trigger_group[0] | * | 10 | trigger_group[1] | * | 18 | trigger_group[2] | * | 20 | trigger_group[3] | * | 28 | Rsvd2[0] | * | 30 | Rsvd2[1] | * | 38 | Rsvd2[2] | * | 40 | next_check_time[0][0] | next_check_time[0][1] | * | ... | * | 240 | latency[0][0..3] | * | 340 | Rsvz3[0] | * | 440 | parameter[0][0] | * | 448 | parameter[0][1] | * | ... | * | 840 | Rsvd4[0] | * ------------------------------------------------------ */ typedef struct { hv_vmbus_monitor_trigger_state trigger_state; uint32_t rsvd_z1; hv_vmbus_monitor_trigger_group trigger_group[4]; uint64_t rsvd_z2[3]; int32_t next_check_time[4][32]; uint16_t latency[4][32]; uint64_t rsvd_z3[32]; hv_vmbus_monitor_parameter parameter[4][32]; uint8_t rsvd_z4[1984]; } hv_vmbus_monitor_page; /* * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent * is set by CPUID(HV_CPU_ID_FUNCTION_VERSION_AND_FEATURES). 
*/ typedef enum { HV_CPU_ID_FUNCTION_VERSION_AND_FEATURES = 0x00000001, HV_CPU_ID_FUNCTION_HV_VENDOR_AND_MAX_FUNCTION = 0x40000000, HV_CPU_ID_FUNCTION_HV_INTERFACE = 0x40000001, /* * The remaining functions depend on the value * of hv_cpu_id_function_interface */ HV_CPU_ID_FUNCTION_MS_HV_VERSION = 0x40000002, HV_CPU_ID_FUNCTION_MS_HV_FEATURES = 0x40000003, HV_CPU_ID_FUNCTION_MS_HV_ENLIGHTENMENT_INFORMATION = 0x40000004, - HV_CPU_ID_FUNCTION_MS_HV_IMPLEMENTATION_LIMITS = 0x40000005 - + HV_CPU_ID_FUNCTION_MS_HV_IMPLEMENTATION_LIMITS = 0x40000005, + HV_CPU_ID_FUNCTION_MS_HV_HARDWARE_FEATURE = 0x40000006 } hv_vmbus_cpuid_function; +#define HV_FEATURE_MSR_TIME_REFCNT (1 << 1) +#define HV_FEATURE_MSR_SYNCIC (1 << 2) +#define HV_FEATURE_MSR_STIMER (1 << 3) +#define HV_FEATURE_MSR_APIC (1 << 4) +#define HV_FEATURE_MSR_HYPERCALL (1 << 5) +#define HV_FEATURE_MSR_GUEST_IDLE (1 << 10) + /* * Define the format of the SIMP register */ typedef union { uint64_t as_uint64_t; struct { uint64_t simp_enabled : 1; uint64_t preserved : 11; uint64_t base_simp_gpa : 52; } u; } hv_vmbus_synic_simp; /* * Define the format of the SIEFP register */ typedef union { uint64_t as_uint64_t; struct { uint64_t siefp_enabled : 1; uint64_t preserved : 11; uint64_t base_siefp_gpa : 52; } u; } hv_vmbus_synic_siefp; /* * Define synthetic interrupt source */ typedef union { uint64_t as_uint64_t; struct { uint64_t vector : 8; uint64_t reserved1 : 8; uint64_t masked : 1; uint64_t auto_eoi : 1; uint64_t reserved2 : 46; } u; } hv_vmbus_synic_sint; /* * Timer configuration register. 
*/ union hv_timer_config { uint64_t as_uint64; struct { uint64_t enable:1; uint64_t periodic:1; uint64_t lazy:1; uint64_t auto_enable:1; uint64_t reserved_z0:12; uint64_t sintx:4; uint64_t reserved_z1:44; }; }; /* * Define syn_ic control register */ typedef union _hv_vmbus_synic_scontrol { uint64_t as_uint64_t; struct { uint64_t enable : 1; uint64_t reserved : 63; } u; } hv_vmbus_synic_scontrol; /* * Define the hv_vmbus_post_message hypercall input structure */ typedef struct { hv_vmbus_connection_id connection_id; uint32_t reserved; hv_vmbus_msg_type message_type; uint32_t payload_size; uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT]; } hv_vmbus_input_post_message; /* * Define the synthetic interrupt controller event flags format */ typedef union { uint8_t flags8[HV_EVENT_FLAGS_BYTE_COUNT]; uint32_t flags32[HV_EVENT_FLAGS_DWORD_COUNT]; } hv_vmbus_synic_event_flags; #define HV_X64_CPUID_MIN (0x40000005) #define HV_X64_CPUID_MAX (0x4000ffff) /* * Declare the MSR used to identify the guest OS */ #define HV_X64_MSR_GUEST_OS_ID (0x40000000) /* * Declare the MSR used to setup pages used to communicate with the hypervisor */ #define HV_X64_MSR_HYPERCALL (0x40000001) /* MSR used to provide vcpu index */ #define HV_X64_MSR_VP_INDEX (0x40000002) #define HV_X64_MSR_TIME_REF_COUNT (0x40000020) /* * Define synthetic interrupt controller model specific registers */ #define HV_X64_MSR_SCONTROL (0x40000080) #define HV_X64_MSR_SVERSION (0x40000081) #define HV_X64_MSR_SIEFP (0x40000082) #define HV_X64_MSR_SIMP (0x40000083) #define HV_X64_MSR_EOM (0x40000084) #define HV_X64_MSR_SINT0 (0x40000090) #define HV_X64_MSR_SINT1 (0x40000091) #define HV_X64_MSR_SINT2 (0x40000092) #define HV_X64_MSR_SINT3 (0x40000093) #define HV_X64_MSR_SINT4 (0x40000094) #define HV_X64_MSR_SINT5 (0x40000095) #define HV_X64_MSR_SINT6 (0x40000096) #define HV_X64_MSR_SINT7 (0x40000097) #define HV_X64_MSR_SINT8 (0x40000098) #define HV_X64_MSR_SINT9 (0x40000099) #define HV_X64_MSR_SINT10 (0x4000009A) #define 
HV_X64_MSR_SINT11 (0x4000009B) #define HV_X64_MSR_SINT12 (0x4000009C) #define HV_X64_MSR_SINT13 (0x4000009D) #define HV_X64_MSR_SINT14 (0x4000009E) #define HV_X64_MSR_SINT15 (0x4000009F) /* * Synthetic Timer MSRs. Four timers per vcpu. */ #define HV_X64_MSR_STIMER0_CONFIG 0x400000B0 #define HV_X64_MSR_STIMER0_COUNT 0x400000B1 #define HV_X64_MSR_STIMER1_CONFIG 0x400000B2 #define HV_X64_MSR_STIMER1_COUNT 0x400000B3 #define HV_X64_MSR_STIMER2_CONFIG 0x400000B4 #define HV_X64_MSR_STIMER2_COUNT 0x400000B5 #define HV_X64_MSR_STIMER3_CONFIG 0x400000B6 #define HV_X64_MSR_STIMER3_COUNT 0x400000B7 /* * Declare the various hypercall operations */ typedef enum { HV_CALL_POST_MESSAGE = 0x005c, HV_CALL_SIGNAL_EVENT = 0x005d, } hv_vmbus_call_code; /** * Global variables */ extern hv_vmbus_context hv_vmbus_g_context; extern hv_vmbus_connection hv_vmbus_g_connection; + +extern u_int hyperv_features; +extern u_int hyperv_recommends; typedef void (*vmbus_msg_handler)(hv_vmbus_channel_msg_header *msg); typedef struct hv_vmbus_channel_msg_table_entry { hv_vmbus_channel_msg_type messageType; bool handler_no_sleep; /* true: the handler doesn't sleep */ vmbus_msg_handler messageHandler; } hv_vmbus_channel_msg_table_entry; extern hv_vmbus_channel_msg_table_entry g_channel_message_table[]; /* * Private, VM Bus functions */ int hv_vmbus_ring_buffer_init( hv_vmbus_ring_buffer_info *ring_info, void *buffer, uint32_t buffer_len); void hv_ring_buffer_cleanup( hv_vmbus_ring_buffer_info *ring_info); int hv_ring_buffer_write( hv_vmbus_ring_buffer_info *ring_info, hv_vmbus_sg_buffer_list sg_buffers[], uint32_t sg_buff_count, boolean_t *need_sig); int hv_ring_buffer_peek( hv_vmbus_ring_buffer_info *ring_info, void *buffer, uint32_t buffer_len); int hv_ring_buffer_read( hv_vmbus_ring_buffer_info *ring_info, void *buffer, uint32_t buffer_len, uint32_t offset); uint32_t hv_vmbus_get_ring_buffer_interrupt_mask( hv_vmbus_ring_buffer_info *ring_info); void hv_vmbus_dump_ring_info( hv_vmbus_ring_buffer_info 
*ring_info, char *prefix); void hv_ring_buffer_read_begin( hv_vmbus_ring_buffer_info *ring_info); uint32_t hv_ring_buffer_read_end( hv_vmbus_ring_buffer_info *ring_info); hv_vmbus_channel* hv_vmbus_allocate_channel(void); void hv_vmbus_free_vmbus_channel(hv_vmbus_channel *channel); void hv_vmbus_on_channel_message(void *context); int hv_vmbus_request_channel_offers(void); void hv_vmbus_release_unattached_channels(void); int hv_vmbus_init(void); void hv_vmbus_cleanup(void); uint16_t hv_vmbus_post_msg_via_msg_ipc( hv_vmbus_connection_id connection_id, hv_vmbus_msg_type message_type, void *payload, size_t payload_size); uint16_t hv_vmbus_signal_event(void *con_id); void hv_vmbus_synic_init(void *irq_arg); void hv_vmbus_synic_cleanup(void *arg); int hv_vmbus_query_hypervisor_presence(void); struct hv_device* hv_vmbus_child_device_create( hv_guid device_type, hv_guid device_instance, hv_vmbus_channel *channel); int hv_vmbus_child_device_register( struct hv_device *child_dev); int hv_vmbus_child_device_unregister( struct hv_device *child_dev); /** * Connection interfaces */ int hv_vmbus_connect(void); int hv_vmbus_disconnect(void); int hv_vmbus_post_message(void *buffer, size_t buf_size); int hv_vmbus_set_event(hv_vmbus_channel *channel); void hv_vmbus_on_events(void *); /** * Event Timer interfaces */ void hv_et_init(void); void hv_et_intr(struct trapframe*); /* * The guest OS needs to register the guest ID with the hypervisor. * The guest ID is a 64 bit entity and the structure of this ID is * specified in the Hyper-V specification: * * http://msdn.microsoft.com/en-us/library/windows/ * hardware/ff542653%28v=vs.85%29.aspx * * While the current guideline does not specify how FreeBSD guest ID(s) * need to be generated, our plan is to publish the guidelines for * FreeBSD and other guest operating systems that currently are hosted * on Hyper-V. The implementation here conforms to this yet * unpublished guidelines. 
* * Bit(s) * 63 - Indicates if the OS is Open Source or not; 1 is Open Source * 62:56 - Os Type; Linux is 0x100, FreeBSD is 0x200 * 55:48 - Distro specific identification * 47:16 - FreeBSD kernel version number * 15:0 - Distro specific identification * */ #define HV_FREEBSD_VENDOR_ID 0x8200 #define HV_FREEBSD_GUEST_ID hv_generate_guest_id(0,0) static inline uint64_t hv_generate_guest_id( uint8_t distro_id_part1, uint16_t distro_id_part2) { uint64_t guest_id; guest_id = (((uint64_t)HV_FREEBSD_VENDOR_ID) << 48); guest_id |= (((uint64_t)(distro_id_part1)) << 48); guest_id |= (((uint64_t)(__FreeBSD_version)) << 16); /* in param.h */ guest_id |= ((uint64_t)(distro_id_part2)); return guest_id; } typedef struct { unsigned int vector; void *page_buffers[2 * MAXCPU]; } hv_setup_args; #endif /* __HYPERV_PRIV_H__ */ Index: releng/10.3 =================================================================== --- releng/10.3 (revision 303983) +++ releng/10.3 (revision 303984) Property changes on: releng/10.3 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,2 ## Merged /head:r297219,297635,297802-297804,298038,298385,299505,302541,302605 Merged /stable/10:r299153,299156,300656,301924-301925,301942,302863
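Note on the new retry loop in hv_vmbus_post_message(): the change replaces the fixed DELAY() table with a sleeping pause that starts at SBT_1MS and doubles after each failed attempt until it reaches SBT_1S * 2, for at most 20 attempts. The following is a minimal userland sketch of that schedule, not the kernel code itself. It assumes plain milliseconds in place of sbintime_t, and the names post_msg_backoff, POST_MSG_RETRIES, PAUSE_START_MS and PAUSE_CAP_MS are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel values used by the patch; plain
 * milliseconds replace sbintime_t so the sketch builds in userland.
 */
#define	POST_MSG_RETRIES	20	/* loop bound in hv_vmbus_post_message() */
#define	PAUSE_START_MS		1	/* stands in for SBT_1MS */
#define	PAUSE_CAP_MS		2000	/* stands in for SBT_1S * 2 */

/*
 * Record the pause that would follow each failed attempt in sched[] and
 * return the sum, i.e. the worst-case time spent sleeping.
 */
static uint64_t
post_msg_backoff(uint64_t *sched, size_t n)
{
	uint64_t pause_ms = PAUSE_START_MS;
	uint64_t total_ms = 0;
	size_t i;

	assert(sched != NULL);
	for (i = 0; i < n; i++) {
		sched[i] = pause_ms;
		total_ms += pause_ms;
		/* Same shape as the patch: double until the cap is reached. */
		if (pause_ms < PAUSE_CAP_MS)
			pause_ms *= 2;
	}
	return (total_ms);
}
```

Calling post_msg_backoff() with an array of POST_MSG_RETRIES entries shows that the pause doubles from 1 ms up to 2048 ms and then stays there, because the cap is tested before doubling. Under these assumptions the pauses add up to roughly 20 seconds if every attempt fails. The real figure will differ somewhat, since the sketch ignores the time spent in the hypercall and any rounding that pause_sbt() applies when C_HARDCLOCK is passed.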