Index: head/UPDATING
===================================================================
--- head/UPDATING	(revision 358019)
+++ head/UPDATING	(revision 358020)
@@ -1,2199 +1,2204 @@
  Updating Information for FreeBSD current users.
 
 This file is maintained and copyrighted by M. Warner Losh <imp@freebsd.org>.
 See end of file for further details.  For commonly done items, please see the
 COMMON ITEMS: section later in the file.  These instructions assume that you
 basically know what you are doing.  If not, then please consult the FreeBSD
 handbook:
 
     https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html
 
 Items affecting the ports and packages system can be found in
 /usr/ports/UPDATING.  Please read that file before running portupgrade.
 
 NOTE TO PEOPLE WHO THINK THAT FreeBSD 13.x IS SLOW:
 	FreeBSD 13.x has many debugging features turned on, in both the kernel
 	and userland.  These features attempt to detect incorrect use of
 	system primitives, and encourage loud failure through extra sanity
 	checking and fail stop semantics.  They also substantially impact
 	system performance.  If you want to do performance measurement,
 	benchmarking, and optimization, you'll want to turn them off.  This
 	includes various WITNESS-related kernel options, INVARIANTS, malloc
 	debugging flags in userland, and various verbose features in the
 	kernel.  Many developers choose to disable these features on build
 	machines to maximize performance.  (To completely disable malloc
 	debugging, define MALLOC_PRODUCTION in /etc/make.conf, or to merely
 	disable the most expensive debugging functionality run
 	"ln -s 'abort:false,junk:false' /etc/malloc.conf".)
 
+20200217:
+	The size of struct vnet and the magic cookie have changed.
+	Users need to recompile libkvm and all modules using VIMAGE
+	together with their new kernel.
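+
+	A minimal in-place rebuild sketch (the kernel config name and the
+	quick libkvm rebuild below are illustrative; a full buildworld and
+	installworld also covers it):
+
+		cd /usr/src
+		make buildkernel installkernel KERNCONF=GENERIC
+		cd lib/libkvm && make all install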
+
 20200212:
 	Defining the long deprecated NO_CTF, NO_DEBUG_FILES, NO_INSTALLLIB,
 	NO_MAN, NO_PROFILE, and NO_WARNS variables is now an error.  Update
 	your Makefiles and scripts to define MK_<var>=no instead as required.
 
 	One exception to this is that program or library Makefiles should
 	define MAN to empty rather than setting MK_MAN=no.
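 
 	For example, a Makefile or script that previously defined NO_PROFILE
 	(used purely as an illustration) should now set:
 
 		MK_PROFILE=no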
 
 20200108:
 	Clang/LLVM is now the default compiler and LLD the default
 	linker for riscv64.
 
 20200107:
 	make universe no longer uses GCC 4.2.1 on any architectures.
 	Architectures not supported by in-tree Clang/LLVM require an
 	external toolchain package.
 
 20200104:
 	GCC 4.2.1 is now not built by default, as part of the GCC 4.2.1
 	retirement plan.  Specifically, the GCC, GCC_BOOTSTRAP, and GNUCXX
 	options default to off for all supported CPU architectures.  As a
 	short-term transition aid they may be enabled via WITH_* options.
 	GCC 4.2.1 is expected to be removed from the tree on 2020-03-31.
 
 20200102:
 	Support for armv5 has been disconnected and is being removed. The
 	machine combination MACHINE=arm MACHINE_ARCH=arm is no longer valid.
 	You must now use a MACHINE_ARCH of armv6 or armv7. The default
 	MACHINE_ARCH for MACHINE=arm is now armv7.
 
 20191226:
 	Clang/LLVM is now the default compiler for all powerpc architectures.
 	LLD is now the default linker for powerpc64.  The change for powerpc64
 	also includes a change to the ELFv2 ABI, incompatible with the existing
 	ABI.
 
 20191226:
 	Kernel-loadable random(4) modules are no longer unloadable.
 
 20191222:
 	Clang, llvm, lld, lldb, compiler-rt, libc++, libunwind and openmp have
 	been upgraded to 9.0.1.  Please see the 20141231 entry below for
 	information about prerequisites and upgrading, if you are not already
 	using clang 3.5.0 or higher.
 
 20191212:
 	r355677 has modified the internal interface used between the
 	NFS modules in the kernel. As such, they must all be upgraded
 	simultaneously. A __FreeBSD_version bump accompanies this change.
 
 20191205:
 	The root certificates of the Mozilla CA Certificate Store have been
 	imported into the base system and can be managed with the certctl(8)
 	utility.  If you have installed the security/ca_root_nss port or package
 	with the ETCSYMLINK option (the default), be advised that there may be
 	differences between those included in the port and those included in
 	base due to differences in nss branch used as well as general update
 	frequency.  Note also that certctl(8) cannot manage certs in the
 	format used by the security/ca_root_nss port.
 
 20191120:
 	The amd(8) automount daemon has been disabled by default, and will be
 	removed in the future.  As of FreeBSD 10.1, autofs(5) is available
 	for automounting.
 
 20191107:
 	The nctgpio and wbwd drivers have been moved to the superio bus.
 	If you have one of these drivers in a kernel configuration, then
 	you should add device superio to it.  If you use one of these drivers
 	as a module and you compile a custom set of modules, then you should
 	add superio to the set.
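 
 	For example, a custom kernel configuration using the nctgpio driver
 	would now contain both lines:
 
 		device	superio
 		device	nctgpio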
 
 20191021:
 	KPIs for network drivers to access interface addresses have changed.
 	Users need to recompile NIC driver modules together with kernel.
 
 20191021:
 	The net.link.tap.user_open sysctl no longer prevents user opening of
 	already created /dev/tapNN devices.  Access is still controlled by
 	node permissions, just like tun devices.  The net.link.tap.user_open
 	sysctl is now used only to allow users to perform devfs cloning of
 	tap devices, and the subsequent open may not succeed if the user is not
 	in the appropriate group.  This sysctl may be deprecated/removed
 	completely in the future.
 
 20191009:
 	mips, powerpc, and sparc64 are no longer built as part of
 	universe / tinderbox unless MAKE_OBSOLETE_GCC is defined. If
 	not defined, mips, powerpc, and sparc64 builds will look for
 	the xtoolchain binaries and, if installed, use them for universe
 	builds. As llvm 9.0 becomes vetted for these architectures, they
 	will be removed from the list.
 
 20191009:
 	Clang, llvm, lld, lldb, compiler-rt, libc++, libunwind and openmp have
 	been upgraded to 9.0.0.  Please see the 20141231 entry below for
 	information about prerequisites and upgrading, if you are not already
 	using clang 3.5.0 or higher.
 
 20191003:
 	The hpt27xx, hptmv, hptnr, and hptrr drivers have been removed from
 	GENERIC.  They are available as modules and can be loaded by adding
 	to /boot/loader.conf hpt27xx_load="YES", hptmv_load="YES",
 	hptnr_load="YES", or hptrr_load="YES", respectively.
 
 20190913:
 	ntpd no longer locks its pages in memory by default, allowing them
 	to be paged out by the kernel. Use rlimit memlock to restore
 	historic BSD behaviour. For example, add "rlimit memlock 32"
 	to ntp.conf to lock up to 32 MB of ntpd address space in memory.
 
 20190823:
 	Several of ping6's options have been renamed for better consistency
 	with ping.  If you use any of -ARWXaghmrtwx, you must update your
 	scripts.  See ping6(8) for details.
 
 20190727:
 	The vfs.fusefs.sync_unmount and vfs.fusefs.init_backgrounded sysctls
 	and the "-o sync_unmount" and "-o init_backgrounded" mount options have
 	been removed from mount_fusefs(8).  You can safely remove them from
 	your scripts, because they had no effect.
 
 	The vfs.fusefs.fix_broken_io, vfs.fusefs.sync_resize,
 	vfs.fusefs.refresh_size, vfs.fusefs.mmap_enable,
 	vfs.fusefs.reclaim_revoked, and vfs.fusefs.data_cache_invalidate
 	sysctls have been removed.  If you felt the need to set any of them to
 	a non-default value, please tell asomers@FreeBSD.org why.
 
 20190713:
 	Default permissions on the /var/account/acct file (and copies of it
 	rotated by periodic daily scripts) are changed from 0644 to 0640
 	because the file contains sensitive information that should not be
 	world-readable.  If the /var/account directory must be created by
 	rc.d/accounting, the mode used is now 0750.  Admins who use the
 	accounting feature are encouraged to change the mode of an existing
 	/var/account directory to 0750 or 0700.
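 
 	For an existing installation, the permissions can be tightened by
 	hand, for example (the glob also catches the rotated copies):
 
 		chmod 0750 /var/account
 		chmod 0640 /var/account/acct*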
 
 20190620:
 	Entropy collection and the /dev/random device are no longer optional
 	components.  The "device random" option has been removed.
 	Implementations of distilling algorithms can still be made loadable
 	with "options RANDOM_LOADABLE" (e.g., random_fortuna.ko).
 
 20190612:
 	Clang, llvm, lld, lldb, compiler-rt, libc++, libunwind and openmp have
 	been upgraded to 8.0.1.  Please see the 20141231 entry below for
 	information about prerequisites and upgrading, if you are not already
 	using clang 3.5.0 or higher.
 
 20190608:
 	A fix was applied to i386 kernel modules to avoid panics with
 	dpcpu or vnet.  Users need to recompile i386 kernel modules
 	having pcpu or vnet sections or they will refuse to load.
 
 20190513:
 	User-wired pages now have their own counter,
 	vm.stats.vm.v_user_wire_count.  The vm.max_wired sysctl was renamed
 	to vm.max_user_wired and changed from an unsigned int to an unsigned
 	long.  bhyve VMs wired with the -S option are now subject to the user
 	wiring limit; the vm.max_user_wired sysctl may need to be tuned to
 	avoid running into the limit.
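 
 	If the limit is hit, the new sysctl can be raised, e.g. via
 	/etc/sysctl.conf (the value below is purely illustrative):
 
 		vm.max_user_wired=8388608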
 
 20190507:
 	The IPSEC option has been removed from GENERIC.  Users requiring
 	ipsec(4) must now load the ipsec(4) kernel module.
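 
 	If you need it loaded at boot, a minimal sketch is to add the
 	following to /boot/loader.conf:
 
 		ipsec_load="YES"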
 
 20190507:
 	The tap(4) driver has been folded into tun(4), and the module has been
 	renamed to tuntap.  You should update any kld_list="if_tap" or
 	kld_list="if_tun" entries in /etc/rc.conf, if_tap_load="YES" or
 	if_tun_load="YES" entries in /boot/loader.conf to load the if_tuntap
 	module instead, and "device tap" or "device tun" entries in kernel
 	config files to select the tuntap device instead.
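 
 	For example, an rc.conf entry of kld_list="if_tap" becomes:
 
 		kld_list="if_tuntap"
 
 	and a loader.conf entry of if_tun_load="YES" becomes:
 
 		if_tuntap_load="YES"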
 
 20190418:
 	The following knobs have been added related to tradeoffs between
 	safe use of the random device and availability in the absence of
 	entropy:
 
 	kern.random.initial_seeding.bypass_before_seeding: tunable; set
 	non-zero to bypass the random device prior to seeding, or zero to
 	block random requests until the random device is initially seeded.
 	For now, set to 1 (unsafe) by default to restore pre-r346250 boot
 	availability properties.
 
 	kern.random.initial_seeding.read_random_bypassed_before_seeding:
 	read-only diagnostic sysctl that is set when bypass is enabled and
 	read_random(9) is bypassed, to enable programmatic handling of this
 	initial condition, if desired.
 
 	kern.random.initial_seeding.arc4random_bypassed_before_seeding:
 	Similar to the above, but for arc4random(9) initial seeding.
 
 	kern.random.initial_seeding.disable_bypass_warnings: tunable; set
 	non-zero to disable warnings in dmesg when the same conditions are
 	met as for the diagnostic sysctls above.  Defaults to zero, i.e.,
 	produce warnings in dmesg when the conditions are met.
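 
 	For example, to restore the blocking (safe) behaviour, set the
 	tunable in /boot/loader.conf:
 
 		kern.random.initial_seeding.bypass_before_seeding=0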
 
 20190416:
 	The loadable random module KPI has changed; the random_infra_init()
 	routine now requires a 3rd function pointer for a bool (*)(void)
 	method that returns true if the random device is seeded (and
 	therefore unblocked).
 
 20190404:
 	r345895 reverts r320698. This implies that an nfsuserd(8) daemon
 	built from head sources between r320757 (July 6, 2017) and
 	r338192 (Aug. 22, 2018) will not work unless the "-use-udpsock"
 	option is added to the command line.
 	nfsuserd daemons built from head sources that are post-r338192 are
 	not affected and should continue to work.
 
 20190320:
 	The fuse(4) module has been renamed to fusefs(4) for consistency with
 	other filesystems.  You should update any kld_list="fuse" entries in
 	/etc/rc.conf, fuse_load="YES" entries in /boot/loader.conf, and
 	"options FUSE" entries in kernel config files.
 
 20190304:
 	Clang, llvm, lld, lldb, compiler-rt and libc++ have been upgraded to
 	8.0.0.  Please see the 20141231 entry below for information about
 	prerequisites and upgrading, if you are not already using clang 3.5.0
 	or higher.
 
 20190226:
 	geom_uzip(4) depends on the new module xz.  If geom_uzip is statically
 	compiled into your custom kernel, add the 'device xz' statement to the
 	kernel config.
 
 20190219:
 	drm and drm2 have been removed from the tree. Please see
 	https://wiki.freebsd.org/Graphics for the latest information on
 	migrating to the drm ports.
 
 20190131:
 	Iflib is no longer unconditionally compiled into the kernel.  Drivers
 	that use iflib and are statically compiled into the kernel now require
 	the 'device iflib' config option.  For the same drivers loaded as
 	modules on kernels not having 'device iflib', the iflib.ko module
 	is loaded automatically.
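 
 	For example, a custom kernel configuration that statically compiles
 	em(4) would need both lines:
 
 		device	iflib
 		device	em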
 
 20190125:
 	The IEEE80211_AMPDU_AGE and AH_SUPPORT_AR5416 kernel configuration
 	options no longer exist since r343219 and r343427 respectively;
 	nothing uses them, so they should be just removed from custom
 	kernel config files.
 
 20181230:
 	r342635 changes the way efibootmgr(8) works by requiring users to add
 	the -b (bootnum) parameter for commands where the bootnum was previously
 	specified with each option. For example 'efibootmgr -B 0001' is now
 	'efibootmgr -B -b 0001'.
 
 20181220:
 	r342286 modifies the NFSv4 server so that it obeys vfs.nfsd.nfs_privport
 	in the same way as it is applied to NFSv2 and 3.  This implies that NFSv4
 	servers that have vfs.nfsd.nfs_privport set will only allow mounts
 	from clients using a reserved port#. Since both the FreeBSD and Linux
 	NFSv4 clients use reserved port#s by default, this should not affect
 	most NFSv4 mounts.
 
 20181219:
 	The XLP config has been removed. We can't support 64-bit atomics in this
 	kernel because it is running in 32-bit mode. XLP users must transition
 	to running a 64-bit kernel (XLP64 or XLPN32).
 
 	The mips GXEMUL support has been removed from FreeBSD. MALTA* + qemu is
 	the preferred emulator today and we don't need two different ones.
 
 	The old sibyte / swarm / Broadcom BCM1250 support has been
 	removed from the mips port.
 
 20181211:
 	Clang, llvm, lld, lldb, compiler-rt and libc++ have been upgraded to
 	7.0.1.  Please see the 20141231 entry below for information about
 	prerequisites and upgrading, if you are not already using clang 3.5.0
 	or higher.
 
 20181211:
 	Remove the timed and netdate programs from the base tree.  Setting
 	the time with these daemons has been obsolete for over a decade.
 
 20181126:
 	On amd64, arm64 and armv7 (architectures that install LLVM's ld.lld
 	linker as /usr/bin/ld) GNU ld is no longer installed as ld.bfd, as
 	it produces broken binaries when ifuncs are in use.  Users needing
 	GNU ld should install the binutils port or package.
 
 20181123:
 	The BSD crtbegin and crtend code has been enabled by default. It has
 	had extensive testing on amd64, arm64, and i386. It can be disabled
 	by building a world with -DWITHOUT_BSD_CRTBEGIN.
 
 20181115:
 	The set of CTM commands (ctm, ctm_smail, ctm_rmail, ctm_dequeue)
 	has been converted to a port (misc/ctm) and will be removed from
 	FreeBSD-13.  It is available as a package (ctm) for all supported
 	FreeBSD versions.
 
 20181110:
 	The default newsyslog.conf(5) file has been changed to only include
 	files in /etc/newsyslog.conf.d/ and /usr/local/etc/newsyslog.conf.d/ if
 	the filenames end in '.conf' and do not begin with a '.'.
 
 	You should check that the configuration files in these two directories
 	match this naming convention. You can verify which configuration files
 	are being included using the command:
 		$ newsyslog -Nrv
 
 20181015:
 	Ports for the DRM modules have been simplified. Now, amd64 users should
 	just install the drm-kmod port. All others should install
 	drm-legacy-kmod.
 
 	Graphics hardware that's newer than about 2010 usually works with
 	drm-kmod.  For hardware older than 2013, however, some users will need
 	to use drm-legacy-kmod if drm-kmod doesn't work for them. Hardware older
 	than 2008 usually only works in drm-legacy-kmod. The graphics team can
 	only commit to hardware made since 2013 due to the complexity of the
 	market and difficulty to test all the older cards effectively. If you
 	have hardware supported by drm-kmod, you are strongly encouraged to use
 	that as you will get better support.
 
 	Other than KPI chasing, drm-legacy-kmod will not be updated. As outlined
 	elsewhere, the drm and drm2 modules will be eliminated from the src base
 	soon (with a limited exception for arm). Please update to the package
 	asap and report any issues to x11@freebsd.org.
 
 	Generally, anybody using the drm*-kmod packages should add
 	WITHOUT_DRM_MODULE=t and WITHOUT_DRM2_MODULE=t to avoid nasty
 	cross-threading surprises, especially with automatic driver
 	loading from X11 startup. These will become the defaults in 13-current
 	shortly.
 
 20181012:
 	The ixlv(4) driver has been renamed to iavf(4).  As a consequence,
 	custom kernel and module loading configuration files must be updated
 	accordingly.  Moreover, interfaces previously presented as ixlvN to the
 	system are now exposed as iavfN and network configuration files must
 	be adjusted as necessary.
 
 20181009:
 	OpenSSL has been updated to version 1.1.1.  This update includes
 	various API changes throughout the base system.  It is
 	important to rebuild third-party software after upgrading.  The value
 	of __FreeBSD_version has been bumped accordingly.
 
 20181006:
 	The legacy DRM modules and drivers have now been added to the loader's
 	module blacklist, in favor of loading them with kld_list in rc.conf(5).
 	The module blacklist may be overridden with the loader.conf(5)
 	'module_blacklist' variable, but loading them via rc.conf(5) is strongly
 	encouraged.
 
 20181002:
 	The cam(4) based nda(4) driver will be used over nvd(4) by default on
 	powerpc64. You may set 'options NVME_USE_NVD=1' in your kernel conf or
 	loader tunable 'hw.nvme.use_nvd=1' if you wish to use the existing
 	driver.  Make sure to edit /boot/etc/kboot.conf and fstab to use the
 	nda device name.
 
 20180913:
 	Reproducible build mode is now on by default, in preparation for
 	FreeBSD 12.0.  This eliminates build metadata such as the user,
 	host, and time from the kernel (and uname), unless the working tree
 	corresponds to a modified checkout from a version control system.
 	The previous behavior can be obtained by setting the /etc/src.conf
 	knob WITHOUT_REPRODUCIBLE_BUILD.
 
 20180826:
 	The Yarrow CSPRNG has been removed from the kernel as it has not been
 	supported by its designers since at least 2003. Fortuna has been the
 	default since FreeBSD-11.
 
 20180822:
 	devctl freeze/thaw have gone into the tree; the rc scripts have been
 	updated to use them, and devmatch has been changed.  You should update
 	kernel, userland and rc scripts all at the same time.
 
 20180818:
 	The default interpreter has been switched from 4th to Lua.
 	LOADER_DEFAULT_INTERP, documented in build(7), will override the default
 	interpreter.  If you have custom FORTH code you will need to set
 	LOADER_DEFAULT_INTERP=4th (valid values are 4th, lua or simp) in
 	src.conf for the build.  This will create default hard links between
 	loader and loader_4th instead of loader and loader_lua, the new default.
 	If you are using UEFI it will create the proper hard link to loader.efi.
 
 	bhyve uses userboot.so. It remains 4th-only until issues regarding
 	coexistence with multiple versions of FreeBSD are resolved.
 
 20180815:
 	ls(1) now respects the COLORTERM environment variable used in other
 	systems and software to indicate that a colored terminal is both
 	supported and desired.  If ls(1) is suddenly emitting colors, they may
 	be disabled again by either removing the unwanted COLORTERM from your
 	environment, or using `ls --color=never`.  The ls(1) specific CLICOLOR
 	may not be observed in a future release.
 
 20180808:
 	The default pager for most commands has been changed to "less".  To
 	restore the old behavior, set PAGER="more" and MANPAGER="more -s" in
 	your environment.
 
 20180731:
 	The jedec_ts(4) driver has been removed. A superset of its functionality
 	is available in the jedec_dimm(4) driver, and the manpage for that
 	driver includes migration instructions. If you have "device jedec_ts"
 	in your kernel configuration file, it must be removed.
 
 20180730:
 	amd64/GENERIC now has EFI runtime services, EFIRT, enabled by default.
 	This should have no effect if the kernel is booted via BIOS/legacy boot.
 	EFIRT may be disabled via a loader tunable, efi.rt.disabled, if a system
 	has a buggy firmware that prevents a successful boot due to use of
 	runtime services.
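 
 	For example, a sketch of disabling EFI runtime services via
 	/boot/loader.conf:
 
 		efi.rt.disabled=1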
 
 20180727:
 	Atmel AT91RM9200 and AT91SAM9, Cavium CNS 11xx and XScale
 	support has been removed from the tree. These ports were
 	obsolete and/or known to be broken for many years.
 
 20180723:
 	loader.efi has been augmented to participate more fully in the
 	UEFI boot manager protocol. loader.efi will now look at the
 	BootXXXX environment variable to determine if a specific kernel
 	or root partition was specified. XXXX is derived from BootCurrent.
 	efibootmgr(8) manages these standard UEFI variables.
 
 20180720:
 	zfsloader's functionality has now been folded into loader.
 	zfsloader is no longer necessary once you've updated your
 	boot blocks. For a transition period, we will install a
 	hardlink for zfsloader to loader to allow a smooth transition
 	until the boot blocks can be updated (hard link because old
 	zfs boot blocks don't understand symlinks).
 
 20180719:
 	ARM64 now has efifb support. If you want to keep the serial console
 	on your arm64 board when a screen is connected and the bootloader
 	sets up a frame buffer for the kernel to use, just add:
 
 		boot_serial=YES
 		boot_multicons=YES
 
 	to /boot/loader.conf.
 	For Raspberry Pi 3 (RPI) users, this is needed even if you don't have
 	a screen connected, as the firmware will set up a frame buffer that
 	u-boot will expose as an EFI frame buffer.
 
 20180719:
 	New uid:gid added, ntpd:ntpd (123:123).  Be sure to run mergemaster
 	or take steps to update /etc/passwd before doing installworld on
 	existing systems.  Do not skip the "mergemaster -Fp" step before
 	installworld, as described in the update procedures near the bottom
 	of this document.  Also, rc.d/ntpd now starts ntpd(8) as user ntpd
 	if the new mac_ntpd(4) policy is available, unless ntpd_flags or
 	the ntp config file contain options that change file/dir locations.
 	When such options (e.g., "statsdir" or "crypto") are used, ntpd can
 	still be run as non-root by setting ntpd_user=ntpd in rc.conf, after
 	taking steps to ensure that all required files/dirs are accessible
 	by the ntpd user.
 
 20180717:
 	Big endian arm support has been removed.
 
 20180711:
 	The static environment setup in kernel configs is no longer mutually
 	exclusive with the loader(8) environment by default.  In order to
 	restore the previous default behavior of disabling the loader(8)
 	environment if a static environment is present, you must specify
 	loader_env.disabled=1 in the static environment.
 
 20180705:
 	The ABI of syscalls used by management tools like sockstat and
 	netstat has been broken to allow 32-bit binaries to work on
 	64-bit kernels without modification.  These programs will need
 	to match the kernel in order to function.  External programs may
 	require minor modifications to accommodate a change of type in
 	structures from pointers to 64-bit virtual addresses.
 
 20180702:
 	On i386 and amd64 atomics are now inlined. Out of tree modules using
 	atomics will need to be rebuilt.
 
 20180701:
 	The '%I' format in the kern.corefile sysctl limits the number of
 	core files that a process can generate to the number stored in the
 	debug.ncores sysctl. The '%I' format is replaced by a single-digit
 	index. Previously, if all indexes were taken, the kernel would only
 	overwrite the core file with the highest index in its filename.
 	Now the system will create a new core file if there is a free index,
 	or overwrite the oldest one if all slots are taken.
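 
 	For example, a corefile pattern using the index (the path and
 	pattern are purely illustrative; %N expands to the process name):
 
 		sysctl kern.corefile=%N.%I.core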
 
 20180630:
 	Clang, llvm, lld, lldb, compiler-rt and libc++ have been upgraded to
 	6.0.1.  Please see the 20141231 entry below for information about
 	prerequisites and upgrading, if you are not already using clang 3.5.0
 	or higher.
 
 20180628:
 	r335753 introduced a new quoting method. However, etc/devd/devmatch.conf
 	needed to be changed to work with it. This change was made with r335763
 	and requires a mergemaster / etcupdate / etc to update the installed
 	file.
 
 20180612:
 	r334930 changed the interface between the NFS modules, so they all
 	need to be rebuilt.  r335018 did a __FreeBSD_version bump for this.
 
 20180530:
 	As of r334391 lld is the default amd64 system linker; it is installed
 	as /usr/bin/ld.  Kernel build workarounds (see 20180510 entry) are no
 	longer necessary.
 
 20180530:
 	The kernel / userland interface for devinfo changed, so you'll
 	need a new kernel and userland as a pair for it to work (rebuilding
 	lib/libdevinfo is all that's required). When there is a mismatch,
 	devinfo and devmatch will not work, but everything else will.
 
 20180523:
 	The on-disk format for hwpmc callchain records has changed to include
 	threadid corresponding to a given record. This changes the field offsets
 	and thus requires that libpmcstat be rebuilt before using a kernel
 	later than r334108.
 
 20180517:
 	The vxge(4) driver has been removed.  This driver was introduced into
 	HEAD one week before Exar left the Ethernet market and is not
 	known to be used.  If you have device vxge in your kernel config file
 	it must be removed.
 
 20180510:
 	The amd64 kernel now requires a ld that supports ifunc to produce a
 	working kernel, either lld or a newer binutils. lld is built by default
 	on amd64, and the 'buildkernel' target uses it automatically. However,
 	it is not the default linker, so building the kernel the traditional
 	way requires LD=ld.lld on the command line (or LD=/usr/local/bin/ld for
 	binutils port/package). lld will soon be default, and this requirement
 	will go away.
 
 	NOTE: As of r334391 lld is the default system linker on amd64, and no
 	workaround is necessary.
 
 20180508:
 	The nxge(4) driver has been removed.  This driver was for PCI-X 10g
 	cards made by s2io/Neterion.  The company was acquired by Exar and
 	no longer sells or supports Ethernet products.  If you have device
 	nxge in your kernel config file it must be removed.
 
 20180504:
 	The tz database (tzdb) has been updated to 2018e.  This version more
 	correctly models time stamps in time zones with negative DST such as
 	Europe/Dublin (from 1971 on), Europe/Prague (1946/7), and
 	Africa/Windhoek (1994/2017).  This does not affect the UT offsets, only
 	time zone abbreviations and the tm_isdst flag.
 
 20180502:
 	The ixgb(4) driver has been removed.  This driver was for an early and
 	uncommon legacy PCI 10GbE for a single ASIC, Intel 82597EX. Intel
 	quickly shifted to the long lived ixgbe family.  If you have device
 	ixgb in your kernel config file it must be removed.
 
 20180501:
 	The lmc(4) driver has been removed.  This was a WAN interface
 	card that was already reportedly rare in 2003, and had an ambiguous
 	license.  If you have device lmc in your kernel config file it must
 	be removed.
 
 20180413:
 	Support for Arcnet networks has been removed.  If you have device
 	arcnet or device cm in your kernel config file they must be
 	removed.
 
 20180411:
 	Support for FDDI networks has been removed.  If you have device
 	fddi or device fpa in your kernel config file they must be
 	removed.
 
 20180406:
 	In addition to supporting RFC 3164 formatted messages, the
 	syslogd(8) service is now capable of parsing RFC 5424 formatted
 	log messages. The main benefit of using RFC 5424 is that clients
 	may now send log messages with timestamps containing year numbers,
 	microseconds and time zone offsets.
 
 	Similarly, the syslog(3) C library function has been altered to
 	send RFC 5424 formatted messages to the local system logging
 	daemon. On systems using syslogd(8), this change should have no
 	negative impact, as long as syslogd(8) and the C library are
 	updated at the same time. On systems using a different system
 	logging daemon, it may be necessary to make configuration
 	adjustments, depending on the software used.
 
 	When using syslog-ng, add the 'syslog-protocol' flag to local
 	input sources to enable parsing of RFC 5424 formatted messages:
 
 		source src {
 			unix-dgram("/var/run/log" flags(syslog-protocol));
 		}
 
 	When using rsyslog, disable the 'SysSock.UseSpecialParser' option
 	of the 'imuxsock' module to let messages be processed by the
 	regular RFC 3164/5424 parsing pipeline:
 
 		module(load="imuxsock" SysSock.UseSpecialParser="off")
 
 	Do note that these changes only affect communication between local
 	applications and syslogd(8). The format that syslogd(8) uses to
 	store messages on disk or forward messages to other systems
 	remains unchanged. syslogd(8) still uses RFC 3164 for these
 	purposes. Options to customize this behaviour will be added in the
 	future. Utilities that process log files stored in /var/log are
 	thus expected to continue to function as before.
 
 	__FreeBSD_version has been incremented to 1200061 to denote this
 	change.
 
 20180328:
 	Support for token ring networks has been removed. If you
 	have "device token" in your kernel config you should remove
 	it. No device drivers supported token ring.
 
 20180323:
 	makefs was modified to be able to tag ISO9660 El Torito boot catalog
 	entries as EFI instead of overloading the i386 tag as done previously.
 	The amd64 mkisoimages.sh script used to build amd64 ISO images for
 	release was updated to use this. This may mean that makefs must be
 	updated before "make cdrom" can be run in the release directory. This
 	should be as simple as:
 
 		$ cd $SRCDIR/usr.sbin/makefs
 		$ make depend all install
 
 20180212:
 	The FreeBSD boot loader has been enhanced with Lua scripting. It's
 	purely opt-in for now, by building WITH_LOADER_LUA and WITHOUT_FORTH
 	in /etc/src.conf.
 	Co-existence for the transition period will come shortly. Booting is a
 	complex environment and test coverage for Lua-enabled loaders has been
 	thin, so it would be prudent to assume it might not work and make
 	provisions for backup boot methods.
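 
 	A minimal /etc/src.conf sketch to opt in:
 
 		WITH_LOADER_LUA=yes
 		WITHOUT_FORTH=yes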
 
 20180211:
 	devmatch functionality has been turned on in devd. It will automatically
 	load drivers for unattached devices. This may cause unexpected drivers
 	to be loaded. Please report any problems to current@ and
 	imp@freebsd.org.
 
 20180114:
 	Clang, llvm, lld, lldb, compiler-rt and libc++ have been upgraded to
 	6.0.0.  Please see the 20141231 entry below for information about
 	prerequisites and upgrading, if you are not already using clang 3.5.0
 	or higher.
 
 20180110:
 	LLVM's lld linker is now used as the FreeBSD/amd64 bootstrap linker.
 	This means it is used to link the kernel and userland libraries and
 	executables, but is not yet installed as /usr/bin/ld by default.
 
 	To revert to ld.bfd as the bootstrap linker, in /etc/src.conf set
 	WITHOUT_LLD_BOOTSTRAP=yes
 
 20180110:
 	On i386, pmtimer has been removed. Its functionality has been folded
 	into apm. It was a no-op on ACPI in current for a while now (but was
 	still needed on i386 in FreeBSD 11 and earlier). Users may need to
 	remove it from kernel config files.
 
 20180104:
 	The use of RSS hash from the network card aka flowid has been
 	disabled by default for lagg(4) as it's currently incompatible with
 	the lacp and loadbalance protocols.
 
 	This can be re-enabled by setting the following in loader.conf:
 	net.link.lagg.default_use_flowid="1"
 
 20180102:
 	The SW_WATCHDOG option is no longer necessary to enable the
 	hardclock-based software watchdog if no hardware watchdog is
 	configured. As before, SW_WATCHDOG will cause the software
 	watchdog to be enabled even if a hardware watchdog is configured.
 
 20171215:
 	r326887 fixes the issue described in the 20171214 UPDATING entry.
 	r326888 flips the switch back to building GELI support always.
 
 20171214:
 	r326593 broke ZFS + GELI support for reasons unknown. However,
 	it also broke ZFS support generally, so GELI has been turned off
 	by default as the lesser evil in r326857. If you boot off ZFS and/or
 	GELI, it might not be a good time to update.
 
 20171125:
 	PowerPC users must update loader(8) by rebuilding world before
 	installing a new kernel, as the protocol connecting them has
 	changed. Without the update, loader metadata will not be passed
 	successfully to the kernel and users will have to enter their
 	root partition at the kernel mountroot prompt to continue booting.
 	Newer versions of loader can boot old kernels without issue.
 
 20171110:
 	The LOADER_FIREWIRE_SUPPORT build variable has been renamed to
 	WITH/OUT_LOADER_FIREWIRE. LOADER_{NO_,}GELI_SUPPORT has been renamed
 	to WITH/OUT_LOADER_GELI.
 
 20171106:
 	The naive and non-compliant support of posix_fallocate(2) in ZFS
 	has been removed as of r325320.  The system call now returns EINVAL
 	when used on a ZFS file.  Although the new behavior complies with the
 	standard, some consumers are not prepared to cope with it.
 	One known victim is lld prior to r325420.
 
 20171102:
 	Building in a FreeBSD src checkout will automatically create object
 	directories now rather than store files in the current directory if
 	'make obj' was not run.  Calling 'make obj' is no longer necessary.
 	This feature can be disabled by setting WITHOUT_AUTO_OBJ=yes in
 	/etc/src-env.conf (not /etc/src.conf), or passing the option in the
 	environment.
 
 20171101:
 	The default MAKEOBJDIR has changed from /usr/obj/<srcdir> for native
 	builds, and /usr/obj/<arch>/<srcdir> for cross-builds, to a unified
 	/usr/obj/<srcdir>/<arch>.  This behavior can be changed to the old
 	format by setting WITHOUT_UNIFIED_OBJDIR=yes in /etc/src-env.conf,
 	the environment, or with -DWITHOUT_UNIFIED_OBJDIR when building.
 	The UNIFIED_OBJDIR option is a transitional feature that will be
 	removed for the 12.0 release; please migrate any tools to the new
 	format by looking up the OBJDIR with 'make -V .OBJDIR' rather
 	than hardcoding paths.
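 
 	For example, to find the object directory for a given source
 	directory (bin/ls is purely illustrative):
 
 		cd /usr/src/bin/ls && make -V .OBJDIR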
 
 20171028:
 	The native-xtools target no longer installs the files by default to the
 	OBJDIR.  Use the native-xtools-install target with a DESTDIR to install
 	to ${DESTDIR}/${NXTP} where NXTP defaults to /nxb-bin.
 
 20171021:
 	As part of the boot loader infrastructure cleanup, LOADER_*_SUPPORT
 	options are changing from controlling the build if defined / undefined
 	to controlling the build with explicit 'yes' or 'no' values. They will
 	shift to WITH/WITHOUT options to match other options in the system.
 
 20171010:
 	libstand has become a private library for sys/boot use only.
 	It is no longer supported as a public interface outside of sys/boot.
 
 20171005:
 	The arm port has split armv6 into armv6 and armv7. armv7 is now
 	a valid TARGET_ARCH/MACHINE_ARCH setting. If you have an armv7 system
 	and are running a kernel from before r324363, you will need to add
 	MACHINE_ARCH=armv7 to 'make buildworld' to do a native build.
 
 20171003:
 	When building multiple kernels using KERNCONF, non-existent KERNCONF
 	files will produce an error and buildkernel will fail. Previously,
 	missing KERNCONF files silently failed, giving no indication as to
 	why, and it was only during installkernel that one discovered the
 	desired kernel had never been built in the first place.
 
 20170912:
 	The default serial number format for CTL LUNs has changed.  This will
 	affect users who use /dev/diskid/* device nodes, or whose FibreChannel
 	or iSCSI clients care about their LUNs' serial numbers.  Users who
 	require serial number stability should hardcode serial numbers in
 	/etc/ctl.conf .
 
 20170912:
 	For 32-bit arm compiled for hard-float support, soft-floating point
 	binaries now always get their shared libraries from
 	LD_SOFT_LIBRARY_PATH (in the past, this was only used if
 	/usr/libsoft also existed). Only users with a hard-float ld.so, but
 	soft-float everything else should be affected.
 
 20170826:
 	The geli password typed at boot is now hidden.  To restore the previous
 	behavior, see geli(8) for configuration options.
 
 20170825:
 	Move PMTUD blackhole counters to TCPSTATS and remove them from bare
 	sysctl values.  Minor nit, but requires a rebuild of both world/kernel
 	to complete.
 
 20170814:
 	"make check" behavior (made in ^/head@r295380) has been changed to
 	execute from a limited sandbox, as opposed to executing from
 	${TESTSDIR}.
 
 	Behavioral changes:
 	- The "beforecheck" and "aftercheck" targets are now specified.
 	- ${CHECKDIR} (added in commit noted above) has been removed.
 	- Legacy behavior can be enabled by setting
 	  WITHOUT_MAKE_CHECK_USE_SANDBOX in src.conf(5) or the environment.
 
 	If the limited sandbox mode is enabled, "make check" will execute
 	"make distribution", then install, execute the tests, and clean up the
 	sandbox if successful.
 
 	The "make distribution" and "make install" targets are typically run as
 	root to set appropriate permissions and ownership at installation time.
 	The end-user should set "WITH_INSTALL_AS_USER" in src.conf(5) or the
 	environment if executing "make check" with limited sandbox mode using
 	an unprivileged user.
 
 20170808:
 	Since the switch to GPT disk labels, fsck for UFS/FFS has been
 	unable to automatically find alternate superblocks. As of r322297,
 	the information needed to find alternate superblocks has been
 	moved to the end of the area reserved for the boot block.
 	Filesystems created with a newfs of this vintage or later
 	will create the recovery information. If you have a filesystem
 	created prior to this change and wish to have a recovery block
 	created for your filesystem, you can do so by running fsck in
 	foreground mode (i.e., do not use the -p or -y options). As it
 	starts, fsck will ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS''
 	to which you should answer yes.
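 
 	For example, on an unmounted filesystem (the device name is purely
 	illustrative):
 
 		fsck_ffs /dev/ada0p2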
 
 20170728:
 	As of r321665, an NFSv4 server configuration that services
 	Kerberos mounts or clients that do not support the uid/gid in
 	owner/owner_group string capability, must explicitly enable
 	the nfsuserd daemon by adding nfsuserd_enable="YES" to the
 	machine's /etc/rc.conf file.
 
 20170722:
 	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 5.0.0.
 	Please see the 20141231 entry below for information about prerequisites
 	and upgrading, if you are not already using clang 3.5.0 or higher.
 
 20170701:
 	WITHOUT_RCMDS is now the default. Set WITH_RCMDS if you need the
 	r-commands (rlogin, rsh, etc.) to be built with the base system.
 
 20170625:
 	The FreeBSD/powerpc platform now uses a 64-bit type for time_t.  This is
 	a very major ABI incompatible change, so users of FreeBSD/powerpc must
 	be careful when performing source upgrades.  It is best to run
 	'make installworld' from an alternate root system, either a live
 	CD/memory stick, or a temporary root partition.  Additionally, all ports
 	must be recompiled.  powerpc64 is largely unaffected, except in the case
 	of 32-bit compatibility.  All 32-bit binaries will be affected.
 
 20170623:
 	Forward compatibility for the "ino64" project has been committed. This
 	will allow most new binaries to run on older kernels in a limited
 	fashion.  This prevents many of the common foot-shooting actions in the
 	upgrade as well as the limited ability to roll back the kernel across
 	the ino64 upgrade. Complicated use cases may not work properly, though
 	enough simpler ones work to allow recovery in most situations.
 
 20170620:
 	Switch back to the BSDL dtc (Device Tree Compiler). Set WITH_GPL_DTC
 	if you require the GPL compiler.
 
 20170618:
 	The internal ABI used for communication between the NFS kernel modules
 	was changed by r320085, so __FreeBSD_version was bumped to
 	ensure all the NFS related modules are updated together.
 
 20170617:
 	The ABI of struct event was changed by extending the data
 	member to 64 bits and adding ext fields.  For an upgrade, the same
 	precautions as for the 20170523 "ino64" entry must be
 	followed.
 
 20170531:
 	The GNU roff toolchain has been removed from base. To render manpages
 	which are not supported by mandoc(1), man(1) can fall back on GNU roff
 	from ports (and will recommend installing it).
 	To render roff(7) documents, consider using GNU roff or the heirloom
 	doctools roff toolchain from ports, via 'pkg install groff' or
 	'pkg install heirloom-doctools'.
 
 20170524:
 	The ath(4) and ath_hal(4) modules now build piecemeal to allow for
 	smaller runtime footprint builds.  This is useful for embedded systems
 	which only require support for one chipset.
 
 	If you load it as a module, make sure this is in /boot/loader.conf:
 
 	if_ath_load="YES"
 
 	This will load the HAL, all chip/RF backends and if_ath_pci.
 	If you have if_ath_pci in /boot/loader.conf, ensure it is after
 	if_ath or it will not load any HAL chipset support.
 
 	If you want to selectively load things (eg on ye cheape ARM/MIPS
 	platforms where RAM is at a premium) you should:
 
 	* load ath_hal
 	* load the chip modules in question
 	* load ath_rate, ath_dfs
 	* load ath_main
 	* load if_ath_pci and/or if_ath_ahb depending upon your particular
 	  bus bind type - this is where probe/attach is done.
 
 	For further comments/feedback, poke adrian@ .
 
 20170523:
 	The "ino64" 64-bit inode project has been committed, which extends
 	a number of types to 64 bits.  Upgrading in place requires care and
 	adherence to the documented upgrade procedure.
 
 	If using a custom kernel configuration ensure that the
 	COMPAT_FREEBSD11 option is included (as during the upgrade the
 	system will be running the ino64 kernel with the existing world).
 
 	For the safest in-place upgrade begin by removing previous build
 	artifacts via "rm -rf /usr/obj/*".  Then, carefully follow the full
 	procedure documented below under the heading "To rebuild everything and
 	install it on the current system."  Specifically, a reboot is required
 	after installing the new kernel before installing world. While an
 	installworld normally works by accident from multiuser after rebooting
 	the proper kernel, there are many cases where this will fail across this
 	upgrade and installworld from single user is required.
 
 20170424:
 	The NATM framework including the en(4), fatm(4), hatm(4), and
 	patm(4) devices has been removed.  Consumers should plan a
 	migration before the end-of-life date for FreeBSD 11.
 
 20170420:
 	GNU diff has been replaced by a BSD licensed diff. Some features of GNU
 	diff have not been implemented; if those are needed, a newer version of
 	GNU diff is available via the diffutils package under the gdiff name.
 
 20170413:
 	As of r316810 for ipfilter, keep frags is no longer assumed when
 	keep state is specified in a rule. r316810 aligns ipfilter with
 	documentation in man pages separating keep frags from keep state.
 	This allows keep state to be specified without forcing keep frags
 	and allows keep frags to be specified independently of keep state.
 	To maintain previous behaviour, also specify keep frags with
 	keep state (as documented in ipf.conf.5).
 
 20170407:
 	arm64 builds now use the base system LLD 4.0.0 linker by default,
 	instead of requiring that the aarch64-binutils port or package be
 	installed. To continue using aarch64-binutils, set
 	CROSS_BINUTILS_PREFIX=/usr/local/aarch64-freebsd/bin .
 
 20170405:
 	The UDP optimization in entry 20160818 that added the sysctl
 	net.inet.udp.require_l2_bcast has been reverted.  L2 broadcast
 	packets will no longer be treated as L3 broadcast packets.
 
 20170331:
 	Binds and sends to the loopback addresses, IPv6 and IPv4, will now
 	use any explicitly assigned loopback address available in the jail
 	instead of using the first assigned address of the jail.
 
 20170329:
 	The ctl.ko module no longer implements the iSCSI target frontend:
 	cfiscsi.ko does instead.
 
 	If building cfiscsi.ko as a kernel module, the module can be loaded
 	via one of the following methods:
 	- `cfiscsi_load="YES"` in loader.conf(5).
 	- Add `cfiscsi` to `$kld_list` in rc.conf(5).
 	- ctladm(8)/ctld(8), when compiled with iSCSI support
 	  (`WITH_ISCSI=yes` in src.conf(5))
 
 	Please see cfiscsi(4) for more details.
 
 20170316:
 	The mmcsd.ko module now additionally depends on geom_flashmap.ko.
 	Also, mmc.ko and mmcsd.ko need to be a matching pair built from the
 	same source (previously, the dependency of mmcsd.ko on mmc.ko was
 	missing, but mmcsd.ko now will refuse to load if it is incompatible
 	with mmc.ko).
 
 20170315:
 	The syntax of ipfw(8) named states was changed to avoid ambiguity.
 	If you have used named states in the firewall rules, you need to modify
 	them after installworld and before rebooting. Named states must now
 	be prefixed with a colon.
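 
 	For example, rules that previously used a named state of HTTP would
 	now be written along these lines (the rule bodies are illustrative):
 
 		ipfw add check-state :HTTP
 		ipfw add allow tcp from me to any dst-port 80 setup keep-state :HTTP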
 
 20170311:
 	The old drm (sys/dev/drm/) drivers for i915 and radeon have been
 	removed as the userland we provide cannot use them. The KMS version
 	(sys/dev/drm2) supports the same hardware.
 
 20170302:
 	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 4.0.0.
 	Please see the 20141231 entry below for information about prerequisites
 	and upgrading, if you are not already using clang 3.5.0 or higher.
 
 20170221:
 	The code that provides support for ZFS .zfs/ directory functionality
 	has been reimplemented.  It is no longer possible to create a snapshot
 	by running mkdir under .zfs/snapshot/.  That should be the only user
 	visible change.
 
 20170216:
 	EISA bus support has been removed. The WITH_EISA option is no longer
 	valid.
 
 20170215:
 	MCA bus support has been removed.
 
 20170127:
 	The WITH_LLD_AS_LD / WITHOUT_LLD_AS_LD build knobs have been renamed
 	WITH_LLD_IS_LD / WITHOUT_LLD_IS_LD, for consistency with CLANG_IS_CC.
 
 20170112:
 	The EM_MULTIQUEUE kernel configuration option is deprecated now that
 	the em(4) driver conforms to iflib specifications.
 
 20170109:
 	The igb(4), em(4) and lem(4) ethernet drivers are now implemented via
 	IFLIB.  If you have a custom kernel configuration that excludes em(4)
 	but you use igb(4), you need to re-add em(4) to your custom
 	configuration.
 
 20161217:
 	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 3.9.1.
 	Please see the 20141231 entry below for information about prerequisites
 	and upgrading, if you are not already using clang 3.5.0 or higher.
 
 20161124:
 	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 3.9.0.
 	Please see the 20141231 entry below for information about prerequisites
 	and upgrading, if you are not already using clang 3.5.0 or higher.
 
 20161119:
 	The layout of the pmap structure has changed for powerpc to put the pmap
 	statistics at the front for all CPU variations.  libkvm(3) and all tools
 	that link against it need to be recompiled.
 
 20161030:
 	isl(4) and cyapa(4) drivers now require a new driver,
 	chromebook_platform(4), to work properly on Chromebook-class hardware.
 	On other types of hardware the drivers may need to be configured using
 	device hints.  Please see the corresponding manual pages for details.
 
 20161017:
 	The urtwn(4) driver was merged into rtwn(4) and now consists of
 	rtwn(4) main module + rtwn_usb(4) and rtwn_pci(4) bus-specific
 	parts.
 	Also, firmware for RTL8188CE was renamed due to a possible name
 	conflict (rtwnrtl8192cU(B) -> rtwnrtl8192cE(B)).
 
 20161015:
 	GNU rcs has been removed from base.  It is available as packages:
 	- rcs: Latest GPLv3 GNU rcs version.
 	- rcs57: Copy of the latest version of GNU rcs (GPLv2) before it was
 	removed from base.
 
 20161008:
 	Use of the cc_cdg, cc_chd, cc_hd, or cc_vegas congestion control
 	modules now requires that the kernel configuration contain the
 	TCP_HHOOK option. (This option is included in the GENERIC kernel.)
 
 20161003:
 	The WITHOUT_ELFCOPY_AS_OBJCOPY src.conf(5) knob has been retired.
 	ELF Tool Chain's elfcopy is always installed as /usr/bin/objcopy.
 
 20160924:
 	Relocatable object files with the extension of .So have been renamed
 	to use an extension of .pico instead.  The purpose of this change is
 	to avoid a name clash with shared libraries on case-insensitive file
 	systems.  On those file systems, foo.So is the same file as foo.so.
 
 20160918:
 	GNU rcs has been turned off by default.  It can (temporarily) be built
 	again by adding the WITH_RCS knob to src.conf.
 	Otherwise, GNU rcs is available from packages:
 	- rcs: Latest GPLv3 GNU rcs version.
 	- rcs57: Copy of the latest version of GNU rcs (GPLv2) from base.
 
 20160918:
 	The backup_uses_rcs functionality has been removed from rc.subr.
 
 20160908:
 	The queue(3) debugging macro, QUEUE_MACRO_DEBUG, has been split into
 	two separate components, QUEUE_MACRO_DEBUG_TRACE and
 	QUEUE_MACRO_DEBUG_TRASH.  Define both for the original
 	QUEUE_MACRO_DEBUG behavior.
 
 20160824:
 	r304787 changed some ioctl interfaces between the iSCSI userspace
 	programs and the kernel.  ctladm, ctld, iscsictl, and iscsid must be
 	rebuilt to work with new kernels.  __FreeBSD_version has been bumped
 	to 1200005.
 
 20160818:
 	The UDP receive code has been updated to only treat incoming UDP
 	packets that were addressed to an L2 broadcast address as L3
 	broadcast packets.  It is not expected that this will affect any
 	standards-conforming UDP application.  The new behaviour can be
 	disabled by setting the sysctl net.inet.udp.require_l2_bcast to
 	0.
 
 20160818:
 	Remove the openbsd_poll system call.
 	__FreeBSD_version has been bumped because of this.
 
 20160708:
 	The stable/11 branch has been created from head@r302406.
 
 20160622:
 	The libc stub for the pipe(2) system call has been replaced with
 	a wrapper that calls the pipe2(2) system call and the pipe(2)
 	system call is now only implemented by the kernels that include
 	"options COMPAT_FREEBSD10" in their config file (this is the
 	default).  Users should ensure that this option is enabled in
 	their kernel or upgrade userspace to r302092 before upgrading their
 	kernel.
 
 20160527:
 	CAM will now strip leading spaces from SCSI disks' serial numbers.
 	This will affect users who create UFS filesystems on SCSI disks using
 	those disk's diskid device nodes.  For example, if /etc/fstab
 	previously contained a line like
 	"/dev/diskid/DISK-%20%20%20%20%20%20%20ABCDEFG0123456", you should
 	change it to "/dev/diskid/DISK-ABCDEFG0123456".  Users of geom
 	transforms like gmirror may also be affected.  ZFS users should
 	generally be fine.
 
 20160523:
 	The bitstring(3) API has been updated with new functionality and
 	improved performance.  But it is binary-incompatible with the old API.
 	Objects built with the new headers may not be linked against objects
 	built with the old headers.
 
 20160520:
 	The brk and sbrk functions have been removed from libc on arm64.
 	Binutils from ports has been updated to not link to these
 	functions and should be updated to the latest version before
 	installing a new libc.
 
 20160517:
 	The armv6 port now defaults to hard float ABI. Limited support
 	for running both hardfloat and soft float on the same system
 	is available using the libraries installed with -DWITH_LIBSOFT.
 	This has only been tested as an upgrade path for installworld
 	and packages may fail or need manual intervention to run. New
 	packages will be needed.
 
 	To update an existing self-hosted armv6hf system, you must add
 	TARGET_ARCH=armv6 on the make command line for both the build
 	and the install steps.
 
 20160510:
 	Kernel modules compiled outside of a kernel build now default to
 	installing to /boot/modules instead of /boot/kernel.  Many kernel
 	modules built this way (such as those in ports) already overrode
 	KMODDIR explicitly to install into /boot/modules.  However,
 	manually building and installing a module from /sys/modules will
 	now install to /boot/modules instead of /boot/kernel.
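 
 	To keep the old behaviour for a one-off manual build, KMODDIR can
 	still be overridden (the module name is purely illustrative):
 
 		cd /sys/modules/cd9660 && make install KMODDIR=/boot/kernel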
 
 20160414:
 	The CAM I/O scheduler has been committed to the kernel. There should be
 	no user visible impact. This does enable NCQ Trim on ada SSDs. The
 	list of known rogues that claim support for this but actually corrupt
 	data is believed to be complete, but be on the lookout for data
 	corruption anyway:
 
 		o Crucial MX100, M550 drives with MU01 firmware.
 		o Micron M510 and M550 drives with MU01 firmware.
 		o Micron M500 prior to MU07 firmware
 		o Samsung 830, 840, and 850 all firmwares
 		o FCCT M500 all firmwares
 
 	Crucial has firmware http://www.crucial.com/usa/en/support-ssd-firmware
 	with working NCQ TRIM. For Micron branded drives, see your sales rep for
 	updated firmware. Blacklisted drives will still work correctly, since
 	they function properly so long as no NCQ TRIMs are sent to them. Given
 	this list is the same as found in Linux, it's believed there are no
 	other rogues in the market place. All other models from the above
 	vendors work.
 
 	To be safe, if you are at all concerned, you can quirk each of your
 	drives to prevent NCQ from being sent by setting:
 		kern.cam.ada.X.quirks="0x2"
 	in loader.conf. If the drive requires the 4k sector quirk, set the
 	quirks entry to 0x3.
 
 20160330:
 	The FAST_DEPEND build option has been removed and its functionality is
 	now the one true way.  The old mkdep(1) style of 'make depend' has
 	been removed.  See 20160311 for further details.
 
 20160317:
 	Resource range types have grown from unsigned long to uintmax_t.  All
 	drivers, and anything using libdevinfo, need to be recompiled.
 
 20160311:
 	WITH_FAST_DEPEND is now enabled by default for in-tree and out-of-tree
 	builds.  It no longer runs mkdep(1) during 'make depend', and the
 	'make depend' stage can now safely be skipped, as it is run automatically
 	when building 'make all' and will generate all SRCS and DPSRCS before
 	building anything else.  Dependencies are gathered at compile time with
 	-MF flags kept in separate .depend files per object file.  Users should
 	run 'make cleandepend' once if using -DNO_CLEAN to clean out older
 	stale .depend files.
 
 20160306:
 	On amd64, clang 3.8.0 can now insert sections of type AMD64_UNWIND into
 	kernel modules.  Therefore, if you load any kernel modules at boot time,
 	please install the boot loaders after you install the kernel, but before
 	rebooting, e.g.:
 
 	make buildworld
 	make buildkernel KERNCONF=YOUR_KERNEL_HERE
 	make installkernel KERNCONF=YOUR_KERNEL_HERE
 	make -C sys/boot install
 	<reboot in single user>
 
 	Then follow the usual steps, described in the General Notes section,
 	below.
 
 20160305:
 	Clang, llvm, lldb and compiler-rt have been upgraded to 3.8.0.  Please
 	see the 20141231 entry below for information about prerequisites and
 	upgrading, if you are not already using clang 3.5.0 or higher.
 
 20160301:
 	The AIO subsystem is now a standard part of the kernel.  The
 	VFS_AIO kernel option and aio.ko kernel module have been removed.
 	Due to stability concerns, asynchronous I/O requests are only
 	permitted on sockets and raw disks by default.  To enable
 	asynchronous I/O requests on all file types, set the
 	vfs.aio.enable_unsafe sysctl to a non-zero value.
 
 20160226:
 	The ELF object manipulation tool objcopy is now provided by the
 	ELF Tool Chain project rather than by GNU binutils. It should be a
 	drop-in replacement, with the addition of arm64 support. The
 	(temporary) src.conf knob WITHOUT_ELFCOPY_AS_OBJCOPY may be set
 	to obtain the GNU version if necessary.
 
 20160129:
 	Building ZFS pools on top of zvols is prohibited by default.  That
 	feature has never worked safely; it's always been prone to deadlocks.
 	Using a zvol as the backing store for a VM guest's virtual disk will
 	still work, even if the guest is using ZFS.  Legacy behavior can be
 	restored by setting vfs.zfs.vol.recursive=1.
 
 20160119:
 	The NONE and HPN patches have been removed from OpenSSH.  They are
 	still available in the security/openssh-portable port.
 
 20160113:
 	With the addition of ypldap(8), a new _ypldap user is now required
 	during installworld. "mergemaster -p" can be used to add the user
 	prior to installworld, as documented in the handbook.
 
 20151216:
 	The tftp loader (pxeboot) now uses the option root-path directive. As a
 	consequence it no longer looks for a pxeboot.4th file on the tftp
 	server. Instead it uses the regular /boot infrastructure as with the
 	other loaders.
 
 20151211:
 	The code to start recording plug and play data into the modules has
 	been committed. While the old tools will properly build a new kernel,
 	a number of warnings about "unknown metadata record 4" will be produced
 	for an older kldxref. To avoid such warnings, make sure to rebuild
 	the kernel toolchain (or world). Make sure that you have r292078 or
 	later when trying to build 292077 or later before rebuilding.
 
 20151207:
 	Debug data files are now built by default with 'make buildworld' and
 	installed with 'make installworld'. This facilitates debugging but
 	requires more disk space both during the build and for the installed
 	world. Debug files may be disabled by setting WITHOUT_DEBUG_FILES=yes
 	in src.conf(5).
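 
 	For example, add the following line to /etc/src.conf:
 
 	WITHOUT_DEBUG_FILES=yes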
 
 20151130:
 	r291527 changed the internal interface between the nfsd.ko and
 	nfscommon.ko modules.  As such, they must both be upgraded together.
 	__FreeBSD_version has been bumped because of this.
 
 20151108:
 	The addition of unicode collation string support changes the sort
 	order of, for example, files listed by ls(1).  To get back to the
 	old behaviour, set the LC_COLLATE environment variable to "C".
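 
 	For example, to use the old order in the current shell session
 	(sh(1) and csh(1) syntax respectively):
 
 	export LC_COLLATE=C
 	setenv LC_COLLATE C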
 
 	Database administrators will need to reindex their databases, since
 	collation results will differ.
 
 	Due to a bug in install(1) it is recommended to remove the ancient
 	locales before running make installworld.
 
 	rm -rf /usr/share/locale/*
 
 20151030:
 	OpenSSL has been upgraded to 1.0.2d.  Any binaries requiring
 	libcrypto.so.7 or libssl.so.7 must be recompiled.
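 
 	A rough, illustrative way to spot affected third-party binaries
 	(adjust the directories to your installation):
 
 	for f in /usr/local/bin/* /usr/local/sbin/*; do
 		ldd "$f" 2>/dev/null |
 		    grep -Eq 'libcrypto\.so\.7|libssl\.so\.7' && echo "$f"
 	done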
 
 20151020:
 	Qlogic 24xx/25xx firmware images were updated from 5.5.0 to 7.3.0.
 	Kernel modules isp_2400_multi and isp_2500_multi were removed and
 	should be replaced with isp_2400 and isp_2500 modules respectively.
 
 20151017:
 	The build previously allowed using 'make -n' to not recurse into
 	sub-directories while showing what commands would be executed, and
 	'make -n -n' to recursively show commands.  Now 'make -n' will recurse
 	and 'make -N' will not.
 
 20151012:
 	If you specify SENDMAIL_MC or SENDMAIL_CF in make.conf, mergemaster
 	and etcupdate will now use this file. A custom sendmail.cf is now
 	updated via this mechanism rather than via installworld.  If you had
 	excluded sendmail.cf in mergemaster.rc or etcupdate.conf, you may
 	want to remove the exclusion or change it to "always install".
 	/etc/mail/sendmail.cf is now managed the same way regardless of
 	whether SENDMAIL_MC/SENDMAIL_CF is used.  If you are not using
 	SENDMAIL_MC/SENDMAIL_CF there should be no change in behavior.
 
 20151011:
 	Compatibility shims for legacy ATA device names have been removed.
 	This includes the ATA_STATIC_ID kernel option, the
 	kern.cam.ada.legacy_aliases and kern.geom.raid.legacy_aliases loader
 	tunables, the kern.devalias.* environment variables, and the
 	/dev/ad* and /dev/ar* symbolic links.
 
 20151006:
 	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 3.7.0.
 	Please see the 20141231 entry below for information about prerequisites
 	and upgrading, if you are not already using clang 3.5.0 or higher.
 
 20150924:
 	Kernel debug files have been moved to /usr/lib/debug/boot/kernel/,
 	and renamed from .symbols to .debug. This reduces the size requirements
 	on the boot partition or file system and provides consistency with
 	userland debug files.
 
 	When using the supported kernel installation method the
 	/usr/lib/debug/boot/kernel directory will be renamed (to kernel.old)
 	as is done with /boot/kernel.
 
 	Developers wishing to maintain the historical behavior of installing
 	debug files in /boot/kernel/ can set KERN_DEBUGDIR="" in src.conf(5).
 
 20150827:
 	The wireless drivers have undergone changes that remove the 'parent
 	interface' from the ifconfig -l output.  The rc.d network scripts
 	used to check for the presence of a parent interface in the list,
 	so old scripts would fail to start wireless networking.  Thus, an
 	etcupdate(8) or mergemaster(8) run is required after a kernel
 	update, to update your rc.d scripts in /etc.
 
 20150827:
 	pf no longer supports 'scrub fragment crop' or 'scrub fragment drop-ovl'.
 	These configurations are now automatically interpreted as
 	'scrub fragment reassemble'.
 
 20150817:
 	Kernel-loadable modules for the random(4) device are back. To use
 	them, the kernel must have
 
 	device	random
 	options	RANDOM_LOADABLE
 
 	kldload(8) can then be used to load random_fortuna.ko
 	or random_yarrow.ko. Please note that due to the indirect
 	function calls that the loadable modules need to provide,
 	the built-in variants will be slightly more efficient.
 
 	The random(4) kernel option RANDOM_DUMMY has been retired due to
 	unpopularity. It was not all that useful anyway.
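 
 	For example, to load Fortuna by hand:
 
 	kldload random_fortuna
 
 	and, to load it at boot, add the following to /boot/loader.conf
 	(assuming the usual <module>_load="YES" convention):
 
 	random_fortuna_load="YES"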
 
 20150813:
 	The WITHOUT_ELFTOOLCHAIN_TOOLS src.conf(5) knob has been retired.
 	Control over building the ELF Tool Chain tools is now provided by
 	the WITHOUT_TOOLCHAIN knob.
 
 20150810:
 	The polarity of Pulse Per Second (PPS) capture events with the
 	uart(4) driver has been corrected.  Prior to this change the PPS
 	"assert" event corresponded to the trailing edge of a positive PPS
 	pulse and the "clear" event was the leading edge of the next pulse.
 
 	As the width of a PPS pulse in a typical GPS receiver is on the
 	order of 1 millisecond, most users will not notice any significant
 	difference with this change.
 
 	Anyone who has compensated for the historical polarity reversal by
 	configuring a negative offset equal to the pulse width will need to
 	remove that workaround.
 
 20150809:
 	The default group assigned to /dev/dri entries has been changed
 	from 'wheel' to 'video' with the id of '44'. If you want to have
 	access to the dri devices please add yourself to the video group
 	with:
 
 	# pw groupmod video -m $USER
 
 20150806:
 	The menu.rc and loader.rc files will now be replaced during
 	upgrades. Please migrate local changes to menu.rc.local and
 	loader.rc.local instead.
 
 20150805:
 	GNU Binutils versions of addr2line, c++filt, nm, readelf, size,
 	strings and strip have been removed. The src.conf(5) knob
 	WITHOUT_ELFTOOLCHAIN_TOOLS no longer provides the binutils tools.
 
 20150728:
 	As ZFS requires more kernel stack pages than is the default on some
 	architectures, e.g. i386, it now warns if KSTACK_PAGES is less than
 	ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing).
 
 	Please consider using 'options KSTACK_PAGES=X' where X is greater
 	than or equal to ZFS_MIN_KSTACK_PAGES, i.e. 4, in such configurations.
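 
 	For example, a custom i386 kernel configuration for a ZFS machine
 	might contain (the ident name is only illustrative):
 
 	include		GENERIC
 	ident		MYZFSKERNEL
 	options 	KSTACK_PAGES=4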
 
 20150706:
 	sendmail has been updated to 8.15.2.  Starting with FreeBSD 11.0
 	and sendmail 8.15, sendmail uses uncompressed IPv6 addresses by
 	default, i.e., they will not contain "::".  For example, instead
 	of ::1, it will be 0:0:0:0:0:0:0:1.  This permits a zero subnet
 	to have a more specific match, such as different map entries for
 	IPv6:0:0 vs IPv6:0.  This change requires that configuration
 	data (including maps, files, classes, custom rulesets, etc.)
 	use the same format, so make certain such configuration data is
 	upgraded as well.  As a very simple check, search for patterns like
 	'IPv6:[0-9a-fA-F:]*::' and 'IPv6::'.  To return to the old
 	behavior, set the m4 option confUSE_COMPRESSED_IPV6_ADDRESSES or
 	the cf option UseCompressedIPv6Addresses.
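 
 	For example, a rough check over the sendmail configuration
 	directory (adjust the path to wherever your maps and classes live):
 
 	grep -Er 'IPv6:[0-9a-fA-F:]*::|IPv6::' /etc/mail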
 
 20150630:
 	The default kernel entropy-processing algorithm is now
 	Fortuna, replacing Yarrow.
 
 	Assuming you have 'device random' in your kernel config
 	file, the configurations allow a kernel option to override
 	this default. You may choose *ONE* of:
 
 	options	RANDOM_YARROW	# Legacy /dev/random algorithm.
 	options	RANDOM_DUMMY	# Blocking-only driver.
 
 	If you have neither, you get Fortuna.  For most people,
 	read no further, Fortuna will give a /dev/random that works
 	like it always used to, and the difference will be irrelevant.
 
 	If you remove 'device random', you get *NO* kernel-processed
 	entropy at all. This may be acceptable to folks building
 	embedded systems, but has complications. Carry on reading,
 	and it is assumed you know what you need.
 
 	*PLEASE* read random(4) and random(9) if you are in the
 	habit of tweaking kernel configs, and/or if you are a member
 	of the embedded community, wanting specific and not-usual
 	behaviour from your security subsystems.
 
 	NOTE!! If you use RANDOM_DUMMY and/or have no 'device
 	random', you will NOT have a functioning /dev/random, and
 	many cryptographic features will not work, including SSH.
 	You may also find strange behaviour from the random(3) set
 	of library functions, in particular sranddev(3), srandomdev(3)
 	and arc4random(3). The reason for this is that the KERN_ARND
 	sysctl only returns entropy if it thinks it has some to
 	share, and with RANDOM_DUMMY or no 'device random' this
 	will never happen.
 
 20150623:
 	An additional fix for the issue described in the 20150614 sendmail
 	entry below has been committed in revision 284717.
 
 20150616:
 	FreeBSD's old make (fmake) has been removed from the system. It is
 	available as the devel/fmake port or via pkg install fmake.
 
 20150615:
 	The fix for the issue described in the 20150614 sendmail entry
 	below has been committed in revision 284436.  The workaround
 	described in that entry is no longer needed unless the
 	default setting is overridden by a confDH_PARAMETERS configuration
 	setting of '5' or pointing to a 512 bit DH parameter file.
 
 20150614:
 	ALLOW_DEPRECATED_ATF_TOOLS/ATFFILE support has been removed from
 	atf.test.mk (included from bsd.test.mk). Please upgrade devel/atf
 	and devel/kyua to version 0.20+ and adjust any calling code to work
 	with Kyuafile and kyua.
 
 20150614:
 	The import of openssl to address the FreeBSD-SA-15:10.openssl
 	security advisory includes a change which rejects handshakes
 	with DH parameters below 768 bits.  sendmail releases prior
 	to 8.15.2 (not yet released), defaulted to a 512 bit
 	DH parameter setting for client connections.  To work around
 	this interoperability issue, sendmail can be configured to use a
 	2048 bit DH parameter by:
 
 	1. Edit /etc/mail/`hostname`.mc
 	2. If a setting for confDH_PARAMETERS does not exist or
 	   exists and is set to a string beginning with '5',
 	   replace it with '2'.
 	3. If a setting for confDH_PARAMETERS exists and is set to
 	   a file path, create a new file with:
 		openssl dhparam -out /path/to/file 2048
 	4. Rebuild the .cf file:
 		cd /etc/mail/; make; make install
 	5. Restart sendmail:
 		cd /etc/mail/; make restart
 
 	A sendmail patch is coming, at which time this file will be
 	updated.
 
 20150604:
 	Generation of legacy formatted entries has been disabled by default
 	in pwd_mkdb(8), as all base system consumers of the legacy format
 	were converted to use the new, machine-independent format, which has
 	been available and supported since FreeBSD 5.x.
 
 	Please see the pwd_mkdb(8) manual page for further details.
 
 20150525:
 	Clang and llvm have been upgraded to 3.6.1 release.  Please see the
 	20141231 entry below for information about prerequisites and upgrading,
 	if you are not already using 3.5.0 or higher.
 
 20150521:
 	TI platform code switched to using vendor DTS files and this update
 	may break existing systems running on Beaglebone, Beaglebone Black,
 	and Pandaboard:
 
 	- dtb files should be regenerated/reinstalled.  Filenames are the
 	  same but the content is different now
 	- GPIO addressing has changed: each GPIO bank (32 pins per bank) now
 	  has its own /dev/gpiocX device, e.g. pin 121 on /dev/gpioc0 in the
 	  old addressing scheme is now pin 25 on /dev/gpioc3 (see the
 	  example below)
 	- Pandaboard: /etc/ttys should be updated, the serial console device
 	  is now /dev/ttyu2, not /dev/ttyu0
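 
 	For example, a pin that used to be reached through /dev/gpioc0 is
 	now manipulated through its bank device (the pin numbers below are
 	only illustrative):
 
 	gpioctl -f /dev/gpioc3 -l		# list pins on bank 3
 	gpioctl -f /dev/gpioc3 25 1		# drive pin 25 high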
 
 20150501:
 	soelim(1) from gnu/usr.bin/groff has been replaced by usr.bin/soelim.
 	If you need the GNU extension from groff soelim(1), install groff
 	from package: pkg install groff, or via ports: textproc/groff.
 
 20150423:
 	chmod, chflags, chown and chgrp now affect symlinks in -R mode as
 	defined in symlink(7); previously symlinks were silently ignored.
 
 20150416:
 	Libraries specified by LIBADD in Makefiles must have a corresponding
 	DPADD_<lib> variable to ensure correct dependencies.  This is now
 	enforced in src.libnames.mk.
 
 20150415:
 	The const qualifier has been removed from iconv(3) to comply with
 	POSIX.  The ports tree is aware of this from r384038 onwards.
 
 20150324:
 	Support for SATA controllers handled by the more functional ahci(4),
 	siis(4) and mvs(4) drivers has been removed from the legacy ata(4)
 	driver.  The ataahci and ataadaptec kernel modules were removed
 	completely, replaced by the ahci and mvs modules respectively.
 
 20150315:
 	Clang, llvm and lldb have been upgraded to 3.6.0 release.  Please see
 	the 20141231 entry below for information about prerequisites and
 	upgrading, if you are not already using 3.5.0 or higher.
 
 20150307:
 	The 32-bit PowerPC kernel has been changed to a position-independent
 	executable. This can only be booted with a version of loader(8)
 	newer than January 31, 2015, so make sure to update both world and
 	kernel before rebooting.
 
 20150217:
 	If you are running a -CURRENT kernel from r273872 (Oct 30th, 2014)
 	or later, but before r278950, the RNG was not seeded properly.
 	Immediately upgrade the kernel to r278950 or later and regenerate
 	any keys (e.g. ssh keys or openssl keys) that were generated with a
 	kernel from that range.  This does not affect programs that directly
 	used /dev/random
 	or /dev/urandom.  All userland uses of arc4random(3) are affected.
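 
 	For example, SSH host keys can be regenerated roughly as follows
 	(an illustrative sketch; review and back up what you remove first):
 
 	rm /etc/ssh/ssh_host_*key*
 	service sshd keygen		# recreates the missing host keys
 	service sshd restart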
 
 20150210:
 	The autofs(4) ABI was changed in order to restore binary compatibility
 	with 10.1-RELEASE.  The automountd(8) daemon needs to be rebuilt to work
 	with the new kernel.
 
 20150131:
 	The powerpc64 kernel has been changed to a position-independent
 	executable. This can only be booted with a new version of loader(8),
 	so make sure to update both world and kernel before rebooting.
 
 20150118:
 	Clang and llvm have been upgraded to 3.5.1 release.  This is a bugfix
 	only release, no new features have been added.  Please see the 20141231
 	entry below for information about prerequisites and upgrading, if you
 	are not already using 3.5.0.
 
 20150107:
 	ELF tools addr2line, elfcopy (strip), nm, size, and strings are now
 	taken from the ELF Tool Chain project rather than GNU binutils. They
 	should be drop-in replacements, with the addition of arm64 support.
 	The WITHOUT_ELFTOOLCHAIN_TOOLS= knob may be used to obtain the
 	binutils tools, if necessary. See 20150805 for updated information.
 
 20150105:
 	The default Unbound configuration now enables remote control
 	using a local socket.  Users who have already enabled the
 	local_unbound service should regenerate their configuration
 	by running "service local_unbound setup" as root.
 
 20150102:
 	The GNU texinfo and GNU info pages have been removed.
 	To be able to view GNU info pages please install texinfo from ports.
 
 20141231:
 	Clang, llvm and lldb have been upgraded to 3.5.0 release.
 
 	As of this release, a prerequisite for building clang, llvm and lldb is
 	a C++11 capable compiler and C++11 standard library.  This means that to
 	be able to successfully build the cross-tools stage of buildworld, with
 	clang as the bootstrap compiler, your system compiler or cross compiler
 	should either be clang 3.3 or later, or gcc 4.8 or later, and your
 	system C++ library should be libc++, or libstdc++ from gcc 4.8 or
 	later.
 
 	On any standard FreeBSD 10.x or 11.x installation, where clang and
 	libc++ are on by default (that is, on x86 or arm), this should work out
 	of the box.
 
 	On 9.x installations where clang is enabled by default, e.g. on x86 and
 	powerpc, libc++ will not be enabled by default, so libc++ should be
 	built (with clang) and installed first.  If both clang and libc++ are
 	missing, build clang first, then use it to build libc++.
 
 	On 8.x and earlier installations, upgrade to 9.x first, and then follow
 	the instructions for 9.x above.
 
 	Sparc64 and mips users are unaffected, as they still use gcc 4.2.1 by
 	default, and do not build clang.
 
 	Many embedded systems are resource constrained, and will not be able to
 	build clang in a reasonable time, or in some cases at all.  In those
 	cases, cross building bootable systems on amd64 is a workaround.
 
 	This new version of clang introduces a number of new warnings, of which
 	the following are most likely to appear:
 
 	-Wabsolute-value
 
 	This warns in two cases, for both C and C++:
 	* When the code is trying to take the absolute value of an unsigned
 	  quantity, which is effectively a no-op, and almost never what was
 	  intended.  The code should be fixed, if at all possible.  If you are
 	  sure that the unsigned quantity can be safely cast to signed, without
 	  loss of information or undefined behavior, you can add an explicit
 	  cast, or disable the warning.
 
 	* When the code is trying to take an absolute value, but the called
 	  abs() variant is for the wrong type, which can lead to truncation.
 	  If you want to disable the warning instead of fixing the code, please
 	  make sure that truncation will not occur, or it might lead to unwanted
 	  side-effects.
 
 	-Wtautological-undefined-compare and
 	-Wundefined-bool-conversion
 
 	These warn when C++ code is trying to compare 'this' against NULL, while
 	'this' should never be NULL in well-defined C++ code.  However, there is
 	some legacy (pre C++11) code out there, which actively abuses this
 	feature, which was less strictly defined in previous C++ versions.
 
 	Squid and openjdk do this, for example.  The warning can be turned off
 	for C++98 and earlier, but compiling the code in C++11 mode might result
 	in unexpected behavior; for example, the parts of the program that are
 	unreachable could be optimized away.
 
 20141222:
 	The old NFS client and server (kernel options NFSCLIENT, NFSSERVER)
 	kernel sources have been removed. The .h files remain, since some
 	utilities include them. This will need to be fixed later.
 	If "mount -t oldnfs ..." is attempted, it will fail.
 	If the "-o" option on mountd(8), nfsd(8) or nfsstat(1) is used,
 	the utilities will report errors.
 
 20141121:
 	The handling of LOCAL_LIB_DIRS has been altered to skip addition of
 	directories to top level SUBDIR variable when their parent
 	directory is included in LOCAL_DIRS.  Users with build systems with
 	such hierarchies and without SUBDIR entries in the parent
 	directory Makefiles should add them or add the directories to
 	LOCAL_DIRS.
 
 20141109:
 	faith(4) and faithd(8) have been removed from the base system. Faith
 	has been obsolete for a very long time.
 
 20141104:
 	vt(4), the new console driver, is enabled by default. It brings
 	support for Unicode and double-width characters, as well as
 	support for UEFI and integration with the KMS kernel video
 	drivers.
 
 	You may need to update your console settings in /etc/rc.conf,
 	most probably the keymap. During boot, /etc/rc.d/syscons will
 	indicate what you need to do.
 
 	vt(4) still has issues and lacks some features compared to
 	syscons(4). See the wiki for up-to-date information:
 	  https://wiki.freebsd.org/Newcons
 
 	If you want to keep using syscons(4), you can do so by adding
 	the following line to /boot/loader.conf:
 	  kern.vty=sc
 
 20141102:
 	pjdfstest has been integrated into kyua as an opt-in test suite.
 	Please see share/doc/pjdfstest/README for more details on how to
 	execute it.
 
 20141009:
 	gperf has been removed from the base system for architectures
 	that use clang. Ports that require gperf will obtain it from the
 	devel/gperf port.
 
 20140923:
 	pjdfstest has been moved from tools/regression/pjdfstest to
 	contrib/pjdfstest .
 
 20140922:
 	At svn r271982, the default linux compat kernel ABI has been adjusted
 	to 2.6.18 in support of the linux-c6 compat ports infrastructure
 	update.  If you wish to continue using the linux-f10 compat ports,
 	add compat.linux.osrelease=2.6.16 to your local sysctl.conf.  Users are
 	encouraged to update their linux-compat packages to linux-c6 during
 	their next update cycle.
 
 20140729:
 	The ofwfb driver, used to provide a graphics console on PowerPC when
 	using vt(4), no longer allows mmap() of all physical memory. This
 	will prevent Xorg on PowerPC with some ATI graphics cards from
 	initializing properly unless x11-servers/xorg-server is updated to
 	1.12.4_8 or newer.
 
 20140723:
 	The xdev targets have been converted to using TARGET and
 	TARGET_ARCH instead of XDEV and XDEV_ARCH.
 
 20140719:
 	The default unbound configuration has been modified to address
 	issues with reverse lookups on networks that use private
 	address ranges.  If you use the local_unbound service, run
 	"service local_unbound setup" as root to regenerate your
 	configuration, then "service local_unbound reload" to load the
 	new configuration.
 
 20140709:
 	The GNU texinfo and GNU info pages are no longer built and installed
 	by default.  A WITH_INFO knob has been added to allow building and
 	installing them again.
 	UPDATE: see the 20150102 entry on texinfo's removal
 
 20140708:
 	The GNU readline library is now an INTERNALLIB - that is, it is
 	statically linked into consumers (GDB and variants) in the base
 	system, and the shared library is no longer installed.  The
 	devel/readline port is available for third party software that
 	requires readline.
 
 20140702:
 	The Itanium architecture (ia64) has been removed from the list of
 	known architectures. This is the first step in the removal of the
 	architecture.
 
 20140701:
 	Commit r268115 has added NFSv4.1 server support, merged from
 	projects/nfsv4.1-server.  Since this includes changes to the
 	internal interfaces between the NFS related modules, a full
 	build of the kernel and modules will be necessary.
 	__FreeBSD_version has been bumped.
 
 20140629:
 	The WITHOUT_VT_SUPPORT kernel config knob has been renamed
 	WITHOUT_VT.  (The other _SUPPORT knobs have a consistent meaning
 	which differs from the behaviour controlled by this knob.)
 
 20140619:
 	The maximum length of the serial number in CTL was increased from 16
 	to 64 chars, which breaks the ABI.  All CTL-related tools, such as
 	ctladm and ctld, need to be rebuilt to work with the new kernel.
 
 20140606:
 	The libatf-c and libatf-c++ major versions were downgraded to 0 and
 	1 respectively to match the upstream numbers.  They were out of
 	sync because, when they were originally added to FreeBSD, the
 	upstream versions were not respected.  These libraries are private
 	and not yet built by default, so renumbering them should be a
 	non-issue.  However, unclean source trees will yield broken test
 	programs once the operator executes "make delete-old-libs" after a
 	"make installworld".
 
 	Additionally, the atf-sh binary was made private by moving it into
 	/usr/libexec/.  Already-built shell test programs will keep the
 	path to the old binary so they will break after "make delete-old"
 	is run.
 
 	If you are using WITH_TESTS=yes (not the default), wipe the object
 	tree and rebuild from scratch to prevent spurious test failures.
 	This is only needed once: the misnumbered libraries and misplaced
 	binaries have been added to OptionalObsoleteFiles.inc so they will
 	be removed during a clean upgrade.
 
 20140512:
 	Clang and llvm have been upgraded to 3.4.1 release.
 
 20140508:
 	We bogusly installed src.opts.mk in /usr/share/mk. This file should
 	be removed to avoid issues in the future (and has been added to
 	ObsoleteFiles.inc).
 
 20140505:
 	/etc/src.conf now affects only builds of the FreeBSD src tree. In the
 	past, it affected all builds that used the bsd.*.mk files. The old
 	behavior was a bug, but people may have relied upon it. To get this
 	behavior back, you can .include /etc/src.conf from /etc/make.conf
 	(which is still global and isn't changed). This also changes the
 	behavior of incremental builds inside the tree of individual
 	directories. Set MAKESYSPATH to ".../share/mk" to do that.
 	Although this has survived make universe and some upgrade scenarios,
 	other upgrade scenarios may have broken. At least one form of
 	temporary breakage was fixed with MAKESYSPATH settings for buildworld
 	as well... In cases where MAKESYSPATH isn't working with this
 	setting, you'll need to set it to the full path to your tree.
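 
 	For example, to restore the old global behaviour (only do this if
 	you understand the consequences described above), add the following
 	to /etc/make.conf:
 
 	.include "/etc/src.conf"
 
 	and, for incremental builds of individual directories inside the
 	tree, run make with MAKESYSPATH set in the environment, e.g.:
 
 	env MAKESYSPATH=.../share/mk make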
 
 	One side effect of all this cleaning up is that bsd.compiler.mk
 	is no longer implicitly included by bsd.own.mk. If you wish to
 	use COMPILER_TYPE, you must now explicitly include bsd.compiler.mk
 	as well.
 
 20140430:
 	The lindev device has been removed since /dev/full has been made a
 	standard device.  __FreeBSD_version has been bumped.
 
 20140424:
 	The knob WITHOUT_VI was added to the base system, which controls
 	building ex(1), vi(1), etc. Older releases of FreeBSD required ex(1)
 	in order to reorder the share/termcap file and did not build ex(1) as a
 	build tool, so building/installing with WITH_VI is highly advised for
 	build hosts for older releases.
 
 	This issue has been fixed in stable/9 and stable/10 in r277022 and
 	r276991, respectively.
 
 20140418:
 	The YES_HESIOD knob has been removed. It has been obsolete for
 	a decade. Please move to using WITH_HESIOD instead or your builds
 	will silently lack HESIOD.
 
 20140405:
 	The uart(4) driver has been changed with respect to its handling
 	of the low-level console. Previously the uart(4) driver prevented
 	any process from changing the baudrate or the CLOCAL and HUPCL
 	control flags. By removing the restrictions, operators can make
 	changes to the serial console port without having to reboot.
 	However, when getty(8) is started on the serial device that is
 	associated with the low-level console, a misconfigured terminal
 	line in /etc/ttys will now have a real impact.
 	Before upgrading the kernel, make sure that /etc/ttys has the
 	serial console device configured as 3wire without baudrate to
 	preserve the previous behaviour. E.g:
 	    ttyu0  "/usr/libexec/getty 3wire"  vt100  on  secure
 
 20140306:
 	Support for libwrap (TCP wrappers) in rpcbind was disabled by default
 	to improve performance.  To re-enable it, if needed, run rpcbind
 	with command line option -W.
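 
 	For example, assuming rpcbind is enabled in rc.conf(5) and no other
 	flags are already set, add the following line to /etc/rc.conf:
 
 	rpcbind_flags="-W"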
 
 20140226:
 	Switched back to the GPL dtc compiler due to updates in the upstream
 	dts files not being supported by the BSDL dtc compiler. You will need
 	to rebuild your kernel toolchain to pick up the new compiler. Core dumps
 	may result while building dtb files during a kernel build if you fail
 	to do so. Set WITHOUT_GPL_DTC if you require the BSDL compiler.
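 
 	For example:
 
 	cd /usr/src && make kernel-toolchain
 
 	or, to keep using the BSDL compiler, add the following line to
 	/etc/src.conf:
 
 	WITHOUT_GPL_DTC=yes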
 
 20140216:
 	Clang and llvm have been upgraded to 3.4 release.
 
 20140216:
 	The nve(4) driver has been removed.  Please use the nfe(4) driver
 	for NVIDIA nForce MCP Ethernet adapters instead.
 
 20140212:
 	An ABI incompatibility crept into the libc++ 3.4 import in r261283.
 	This could cause certain C++ applications using shared libraries built
 	against the previous version of libc++ to crash.  The incompatibility
 	has now been fixed, but any C++ applications or shared libraries built
 	between r261283 and r261801 should be recompiled.
 
 20140204:
 	OpenSSH will now ignore errors caused by the kernel lacking Capsicum
 	capability mode support.  Please note that enabling the feature in
 	the kernel is still highly recommended.
 
 20140131:
 	OpenSSH is now built with sandbox support, and will use sandbox as
 	the default privilege separation method.  This requires Capsicum
 	capability mode support in kernel.
 
 20140128:
 	The libelf and libdwarf libraries have been updated to newer
 	versions from upstream. Shared library version numbers for
 	these two libraries were bumped. Any ports or binaries
 	requiring these two libraries should be recompiled.
 	__FreeBSD_version is bumped to 1100006.
 
 20140110:
 	If a Makefile in a tests/ directory was auto-generating a Kyuafile
 	instead of providing an explicit one, this would prevent such
 	Makefile from providing its own Kyuafile in the future during
 	NO_CLEAN builds.  This has been fixed in the Makefiles but manual
 	intervention is needed to clean an objdir if you use NO_CLEAN:
 	  # find /usr/obj -name Kyuafile | xargs rm -f
 
 20131213:
 	The behavior of gss_pseudo_random() for the krb5 mechanism
 	has changed, for applications requesting a longer random string
 	than produced by the underlying enctype's pseudo-random() function.
 	In particular, the random string produced from a session key of
 	enctype aes256-cts-hmac-sha1-96 will
 	be different at the 17th octet and later, after this change.
 	The counter used in the PRF+ construction is now encoded as a
 	big-endian integer in accordance with RFC 4402.
 	__FreeBSD_version is bumped to 1100004.
 
 20131108:
 	The WITHOUT_ATF build knob has been removed and its functionality
 	has been subsumed into the more generic WITHOUT_TESTS.  If you were
 	using the former to disable the build of the ATF libraries, you
 	should change your settings to use the latter.
 
 20131025:
 	The default version of mtree is nmtree which is obtained from
 	NetBSD.  The output is generally the same, but may vary
 	slightly.  If you find you need identical output, adding
 	"-F freebsd9" to the command line should do the trick.  For the
 	time being, the old mtree is available as fmtree.
 
 20131014:
 	libbsdyml has been renamed to libyaml and moved to /usr/lib/private.
 	This will break ports-mgmt/pkg. Rebuild the port, or upgrade to pkg
 	1.1.4_8 and verify that bsdyml is not linked in, before running "make
 	delete-old-libs":
 	  # make -C /usr/ports/ports-mgmt/pkg build deinstall install clean
 	  or
 	  # pkg install pkg; ldd /usr/local/sbin/pkg | grep bsdyml
 
 20131010:
 	The stable/10 branch has been created in subversion from head
 	revision r256279.
 
 COMMON ITEMS:
 
 	General Notes
 	-------------
 	Sometimes, obscure build problems are the result of environment
 	poisoning.  This can happen because the make utility reads its
 	environment when searching for values for global variables.  To run
 	your build attempts in an "environmental clean room", prefix all make
 	commands with 'env -i '.  See the env(1) manual page for more details.
 	Occasionally a build failure will occur with "make -j" due to a race
 	condition.  If this happens try building again without -j, and please
 	report a bug if it happens consistently.
 
 	When upgrading from one major version to another it is generally best to
 	upgrade to the latest code in the currently installed branch first, then
 	do an upgrade to the new branch. This is the best-tested upgrade path,
 	and has the highest probability of being successful.  Please try this
 	approach if you encounter problems with a major version upgrade.  Since
 	the stable 4.x branch point, one has generally been able to upgrade from
 	anywhere in the most recent stable branch to head / current (or even the
 	last couple of stable branches). See the top of this file when there's
 	an exception.
 
 	When upgrading a live system, having a root shell around before
 	installing anything can help undo problems. Not having a root shell
 	around can lead to problems if pam has changed too much from your
 	starting point to allow continued authentication after the upgrade.
 
 	This file should be read as a log of events. When a later event changes
 	information of a prior event, the prior event should not be deleted.
 	Instead, a pointer to the entry with the new information should be
 	placed in the old entry. Readers of this file should also sanity check
 	older entries before relying on them blindly. Authors of new entries
 	should write them with this in mind.
 
 	ZFS notes
 	---------
 	When upgrading the boot ZFS pool to a new version, always follow
 	these two steps:
 
 	1.) recompile and reinstall the ZFS boot loader and boot block
 	(this is part of "make buildworld" and "make installworld")
 
 	2.) update the ZFS boot block on your boot drive
 
 	The following example updates the ZFS boot block on the first
 	partition (freebsd-boot) of a GPT partitioned drive ada0:
 	"gpart bootcode -p /boot/gptzfsboot -i 1 ada0"
 
 	Non-boot pools do not need these updates.
 
 	To build a kernel
 	-----------------
 	If you are updating from a prior version of FreeBSD (even one just
 	a few days old), you should follow this procedure.  It is the most
 	failsafe as it uses a /usr/obj tree with a fresh mini-buildworld:
 
 	make kernel-toolchain
 	make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE
 	make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE
 
 	To test a kernel once
 	---------------------
 	If you just want to boot a kernel once (because you are not sure
 	if it works, or if you want to boot a known bad kernel to provide
 	debugging information) run
 	make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel
 	nextboot -k testkernel
 
 	To rebuild everything and install it on the current system.
 	-----------------------------------------------------------
 	# Note: sometimes if you are running current you gotta do more than
 	# is listed here if you are upgrading from a really old current.
 
 	<make sure you have good level 0 dumps>
 	make buildworld
 	make buildkernel KERNCONF=YOUR_KERNEL_HERE
 	make installkernel KERNCONF=YOUR_KERNEL_HERE
 							[1]
 	<reboot in single user>				[3]
 	mergemaster -Fp					[5]
 	make installworld
 	mergemaster -Fi					[4]
 	make delete-old					[6]
 	<reboot>
 
 	To cross-install current onto a separate partition
 	--------------------------------------------------
 	# In this approach we use a separate partition to hold
 	# current's root, 'usr', and 'var' directories.   A partition
 	# holding "/", "/usr" and "/var" should be about 2GB in
 	# size.
 
 	<make sure you have good level 0 dumps>
 	<boot into -stable>
 	make buildworld
 	make buildkernel KERNCONF=YOUR_KERNEL_HERE
 	<maybe newfs current's root partition>
 	<mount current's root partition on directory ${CURRENT_ROOT}>
 	make installworld DESTDIR=${CURRENT_ROOT} -DDB_FROM_SRC
 	make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd
 	make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT}
 	cp /etc/fstab ${CURRENT_ROOT}/etc/fstab 		   # if newfs'd
 	<edit ${CURRENT_ROOT}/etc/fstab to mount "/" from the correct partition>
 	<reboot into current>
 	<do a "native" rebuild/install as described in the previous section>
 	<maybe install compatibility libraries from ports/misc/compat*>
 	<reboot>
 
 
 	To upgrade in-place from stable to current
 	----------------------------------------------
 	<make sure you have good level 0 dumps>
 	make buildworld					[9]
 	make buildkernel KERNCONF=YOUR_KERNEL_HERE	[8]
 	make installkernel KERNCONF=YOUR_KERNEL_HERE
 							[1]
 	<reboot in single user>				[3]
 	mergemaster -Fp					[5]
 	make installworld
 	mergemaster -Fi					[4]
 	make delete-old					[6]
 	<reboot>
 
 	Make sure that you've read the UPDATING file to understand the
 	tweaks to various things you need.  At this point in the life
 	cycle of current, things change often and you are on your own
 	to cope.  The defaults can also change, so please read ALL of
 	the UPDATING entries.
 
 	Also, if you are tracking -current, you must be subscribed to
 	freebsd-current@freebsd.org.  Make sure that before you update
 	your sources that you have read and understood all the recent
 	messages there.  If in doubt, please track -stable which has
 	far fewer pitfalls.
 
 	[1] If you have third party modules, such as vmware, you
 	should disable them at this point so they don't crash your
 	system on reboot.
 
 	[3] From the bootblocks, boot -s, and then do
 		fsck -p
 		mount -u /
 		mount -a
 		sh /etc/rc.d/zfs start	# mount zfs filesystem, if needed
 		cd src			# full path to source
 		adjkerntz -i		# if CMOS is wall time
 	Also, when doing a major release upgrade, it is required that
 	you boot into single user mode to do the installworld.
 
 	[4] Note: This step is non-optional.  Failure to do this step
 	can result in a significant reduction in the functionality of the
 	system.  Attempting to do it by hand is not recommended and those
 	that pursue this avenue should read this file carefully, as well
 	as the archives of freebsd-current and freebsd-hackers mailing lists
 	for potential gotchas.  The -U option is also useful to consider.
 	See mergemaster(8) for more information.
 
 	[5] Usually this step is a no-op.  However, from time to time
 	you may need to do this if you get unknown user in the following
 	step.  It never hurts to do it all the time.  You may need to
 	install a new mergemaster (cd src/usr.sbin/mergemaster && make
 	install) after the buildworld before this step if you last updated
 	from current before 20130425 or from -stable before 20130430.
 
 	[6] This only deletes old files and directories. Old libraries
 	can be deleted by "make delete-old-libs", but you have to make
 	sure that no program is using those libraries anymore.
 
 	[8] The new kernel must be able to run existing binaries used by an
 	installworld.  When upgrading across major versions, the new kernel's
 	configuration must include the correct COMPAT_FREEBSD<n> option for
 	existing binaries (e.g. COMPAT_FREEBSD11 to run 11.x binaries).  Failure
 	to do so may leave you with a system that is hard to boot to recover. A
 	GENERIC kernel will include suitable compatibility options to run
 	binaries from older branches.  Note that the ability to run binaries
 	from unsupported branches is not guaranteed.
 
 	Make sure that you merge any new devices from GENERIC since the
 	last time you updated your kernel config file. Options also
 	change over time, so you may need to adjust your custom kernels
 	for these as well.
 
 	[9] If CPUTYPE is defined in your /etc/make.conf, make sure to use the
 	"?=" instead of the "=" assignment operator, so that buildworld can
 	override the CPUTYPE if it needs to.
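 
 	For example, in /etc/make.conf (the CPU type shown is only
 	illustrative):
 
 	CPUTYPE?=core2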
 
 	MAKEOBJDIRPREFIX must be defined in an environment variable, and
 	not on the command line, or in /etc/make.conf.  buildworld will
 	warn if it is improperly defined.
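 
 	For example (sh(1) syntax; the path is only illustrative):
 
 	MAKEOBJDIRPREFIX=/usr/obj.amd64; export MAKEOBJDIRPREFIX
 	make buildworld
 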
 FORMAT:
 
 This file contains a list, in reverse chronological order, of major
 breakages in tracking -current.  It is not guaranteed to be a complete
 list of such breakages, and only contains entries since September 23, 2011.
 If you need to see UPDATING entries from before that date, you will need
 to fetch an UPDATING file from an older FreeBSD release.
 
 Copyright information:
 
 Copyright 1998-2009 M. Warner Losh <imp@FreeBSD.org>
 
 Redistribution, publication, translation and use, with or without
 modification, in full or in part, in any form or format of this
 document are permitted without further permission from the author.
 
 THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR
 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT,
 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
 
 Contact Warner Losh if you have any questions about your use of
 this document.
 
 $FreeBSD$
Index: head/lib/libkvm/kvm.c
===================================================================
--- head/lib/libkvm/kvm.c	(revision 358019)
+++ head/lib/libkvm/kvm.c	(revision 358020)
@@ -1,531 +1,532 @@
 /*-
  * SPDX-License-Identifier: BSD-3-Clause
  *
  * Copyright (c) 1989, 1992, 1993
  *	The Regents of the University of California.  All rights reserved.
  *
  * This code is derived from software developed by the Computer Systems
  * Engineering group at Lawrence Berkeley Laboratory under DARPA contract
  * BG 91-66 and contributed to Berkeley.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 __SCCSID("@(#)kvm.c	8.2 (Berkeley) 2/13/94");
 
 #include <sys/param.h>
 #include <sys/fnv_hash.h>
 
 #define	_WANT_VNET
 
 #include <sys/user.h>
 #include <sys/linker.h>
 #include <sys/pcpu.h>
 #include <sys/stat.h>
 #include <sys/sysctl.h>
 #include <sys/mman.h>
 
+#include <stdbool.h>
 #include <net/vnet.h>
 
 #include <fcntl.h>
 #include <kvm.h>
 #include <limits.h>
 #include <paths.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 #include "kvm_private.h"
 
 SET_DECLARE(kvm_arch, struct kvm_arch);
 
 static char _kd_is_null[] = "";
 
 char *
 kvm_geterr(kvm_t *kd)
 {
 
 	if (kd == NULL)
 		return (_kd_is_null);
 	return (kd->errbuf);
 }
 
 static int
 _kvm_read_kernel_ehdr(kvm_t *kd)
 {
 	Elf *elf;
 
 	if (elf_version(EV_CURRENT) == EV_NONE) {
 		_kvm_err(kd, kd->program, "Unsupported libelf");
 		return (-1);
 	}
 	elf = elf_begin(kd->nlfd, ELF_C_READ, NULL);
 	if (elf == NULL) {
 		_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 		return (-1);
 	}
 	if (elf_kind(elf) != ELF_K_ELF) {
 		_kvm_err(kd, kd->program, "kernel is not an ELF file");
 		return (-1);
 	}
 	if (gelf_getehdr(elf, &kd->nlehdr) == NULL) {
 		_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 		elf_end(elf);
 		return (-1);
 	}
 	elf_end(elf);
 
 	switch (kd->nlehdr.e_ident[EI_DATA]) {
 	case ELFDATA2LSB:
 	case ELFDATA2MSB:
 		return (0);
 	default:
 		_kvm_err(kd, kd->program,
 		    "unsupported ELF data encoding for kernel");
 		return (-1);
 	}
 }
 
 static kvm_t *
 _kvm_open(kvm_t *kd, const char *uf, const char *mf, int flag, char *errout)
 {
 	struct kvm_arch **parch;
 	struct stat st;
 
 	kd->vmfd = -1;
 	kd->pmfd = -1;
 	kd->nlfd = -1;
 	kd->vmst = NULL;
 	kd->procbase = NULL;
 	kd->argspc = NULL;
 	kd->argv = NULL;
 
 	if (uf == NULL)
 		uf = getbootfile();
 	else if (strlen(uf) >= MAXPATHLEN) {
 		_kvm_err(kd, kd->program, "exec file name too long");
 		goto failed;
 	}
 	if (flag & ~O_RDWR) {
 		_kvm_err(kd, kd->program, "bad flags arg");
 		goto failed;
 	}
 	if (mf == NULL)
 		mf = _PATH_MEM;
 
 	if ((kd->pmfd = open(mf, flag | O_CLOEXEC, 0)) < 0) {
 		_kvm_syserr(kd, kd->program, "%s", mf);
 		goto failed;
 	}
 	if (fstat(kd->pmfd, &st) < 0) {
 		_kvm_syserr(kd, kd->program, "%s", mf);
 		goto failed;
 	}
 	if (S_ISREG(st.st_mode) && st.st_size <= 0) {
 		errno = EINVAL;
 		_kvm_syserr(kd, kd->program, "empty file");
 		goto failed;
 	}
 	if (S_ISCHR(st.st_mode)) {
 		/*
 		 * If this is a character special device, then check that
 		 * it's /dev/mem.  If so, open kmem too.  (Maybe we should
 		 * make it work for either /dev/mem or /dev/kmem -- in either
 		 * case you're working with a live kernel.)
 		 */
 		if (strcmp(mf, _PATH_DEVNULL) == 0) {
 			kd->vmfd = open(_PATH_DEVNULL, O_RDONLY | O_CLOEXEC);
 			return (kd);
 		} else if (strcmp(mf, _PATH_MEM) == 0) {
 			if ((kd->vmfd = open(_PATH_KMEM, flag | O_CLOEXEC)) <
 			    0) {
 				_kvm_syserr(kd, kd->program, "%s", _PATH_KMEM);
 				goto failed;
 			}
 			return (kd);
 		}
 	}
 
 	/*
 	 * This is either a crash dump or a remote live system with its physical
 	 * memory fully accessible via a special device.
 	 * Open the namelist fd and determine the architecture.
 	 */
 	if ((kd->nlfd = open(uf, O_RDONLY | O_CLOEXEC, 0)) < 0) {
 		_kvm_syserr(kd, kd->program, "%s", uf);
 		goto failed;
 	}
 	if (_kvm_read_kernel_ehdr(kd) < 0)
 		goto failed;
 	if (strncmp(mf, _PATH_FWMEM, strlen(_PATH_FWMEM)) == 0 ||
 	    strncmp(mf, _PATH_DEVVMM, strlen(_PATH_DEVVMM)) == 0) {
 		kd->rawdump = 1;
 		kd->writable = 1;
 	}
 	SET_FOREACH(parch, kvm_arch) {
 		if ((*parch)->ka_probe(kd)) {
 			kd->arch = *parch;
 			break;
 		}
 	}
 	if (kd->arch == NULL) {
 		_kvm_err(kd, kd->program, "unsupported architecture");
 		goto failed;
 	}
 
 	/*
 	 * Non-native kernels require a symbol resolver.
 	 */
 	if (!kd->arch->ka_native(kd) && kd->resolve_symbol == NULL) {
 		_kvm_err(kd, kd->program,
 		    "non-native kernel requires a symbol resolver");
 		goto failed;
 	}
 
 	/*
 	 * Initialize the virtual address translation machinery.
 	 */
 	if (kd->arch->ka_initvtop(kd) < 0)
 		goto failed;
 	return (kd);
 failed:
 	/*
 	 * Copy out the error if doing sane error semantics.
 	 */
 	if (errout != NULL)
 		strlcpy(errout, kd->errbuf, _POSIX2_LINE_MAX);
 	(void)kvm_close(kd);
 	return (NULL);
 }
 
 kvm_t *
 kvm_openfiles(const char *uf, const char *mf, const char *sf __unused, int flag,
     char *errout)
 {
 	kvm_t *kd;
 
 	if ((kd = calloc(1, sizeof(*kd))) == NULL) {
 		if (errout != NULL)
 			(void)strlcpy(errout, strerror(errno),
 			    _POSIX2_LINE_MAX);
 		return (NULL);
 	}
 	return (_kvm_open(kd, uf, mf, flag, errout));
 }
 
 kvm_t *
 kvm_open(const char *uf, const char *mf, const char *sf __unused, int flag,
     const char *errstr)
 {
 	kvm_t *kd;
 
 	if ((kd = calloc(1, sizeof(*kd))) == NULL) {
 		if (errstr != NULL)
 			(void)fprintf(stderr, "%s: %s\n",
 				      errstr, strerror(errno));
 		return (NULL);
 	}
 	kd->program = errstr;
 	return (_kvm_open(kd, uf, mf, flag, NULL));
 }
 
 kvm_t *
 kvm_open2(const char *uf, const char *mf, int flag, char *errout,
     int (*resolver)(const char *, kvaddr_t *))
 {
 	kvm_t *kd;
 
 	if ((kd = calloc(1, sizeof(*kd))) == NULL) {
 		if (errout != NULL)
 			(void)strlcpy(errout, strerror(errno),
 			    _POSIX2_LINE_MAX);
 		return (NULL);
 	}
 	kd->resolve_symbol = resolver;
 	return (_kvm_open(kd, uf, mf, flag, errout));
 }
 
 int
 kvm_close(kvm_t *kd)
 {
 	int error = 0;
 
 	if (kd == NULL) {
 		errno = EINVAL;
 		return (-1);
 	}
 	if (kd->vmst != NULL)
 		kd->arch->ka_freevtop(kd);
 	if (kd->pmfd >= 0)
 		error |= close(kd->pmfd);
 	if (kd->vmfd >= 0)
 		error |= close(kd->vmfd);
 	if (kd->nlfd >= 0)
 		error |= close(kd->nlfd);
 	if (kd->procbase != 0)
 		free((void *)kd->procbase);
 	if (kd->argbuf != 0)
 		free((void *) kd->argbuf);
 	if (kd->argspc != 0)
 		free((void *) kd->argspc);
 	if (kd->argv != 0)
 		free((void *)kd->argv);
 	if (kd->pt_map != NULL)
 		free(kd->pt_map);
 	if (kd->page_map != NULL)
 		free(kd->page_map);
 	if (kd->sparse_map != MAP_FAILED)
 		munmap(kd->sparse_map, kd->pt_sparse_size);
 	free((void *)kd);
 
 	return (error);
 }
 
 int
 kvm_nlist2(kvm_t *kd, struct kvm_nlist *nl)
 {
 
 	/*
 	 * If called via the public interface, permit initialization of
 	 * further virtualized modules on demand.
 	 */
 	return (_kvm_nlist(kd, nl, 1));
 }
 
 int
 kvm_nlist(kvm_t *kd, struct nlist *nl)
 {
 	struct kvm_nlist *kl;
 	int count, i, nfail;
 
 	/*
 	 * Avoid reporting truncated addresses by failing for non-native
 	 * cores.
 	 */
 	if (!kvm_native(kd)) {
 		_kvm_err(kd, kd->program, "kvm_nlist of non-native vmcore");
 		return (-1);
 	}
 
 	for (count = 0; nl[count].n_name != NULL && nl[count].n_name[0] != '\0';
 	     count++)
 		;
 	if (count == 0)
 		return (0);
 	kl = calloc(count + 1, sizeof(*kl));
 	for (i = 0; i < count; i++)
 		kl[i].n_name = nl[i].n_name;
 	nfail = kvm_nlist2(kd, kl);
 	for (i = 0; i < count; i++) {
 		nl[i].n_type = kl[i].n_type;
 		nl[i].n_other = 0;
 		nl[i].n_desc = 0;
 		nl[i].n_value = kl[i].n_value;
 	}
 	return (nfail);
 }
 
 ssize_t
 kvm_read(kvm_t *kd, u_long kva, void *buf, size_t len)
 {
 
 	return (kvm_read2(kd, kva, buf, len));
 }
 
 ssize_t
 kvm_read2(kvm_t *kd, kvaddr_t kva, void *buf, size_t len)
 {
 	int cc;
 	ssize_t cr;
 	off_t pa;
 	char *cp;
 
 	if (ISALIVE(kd)) {
 		/*
 		 * We're using /dev/kmem.  Just read straight from the
 		 * device and let the active kernel do the address translation.
 		 */
 		errno = 0;
 		if (lseek(kd->vmfd, (off_t)kva, 0) == -1 && errno != 0) {
 			_kvm_err(kd, 0, "invalid address (0x%jx)",
 			    (uintmax_t)kva);
 			return (-1);
 		}
 		cr = read(kd->vmfd, buf, len);
 		if (cr < 0) {
 			_kvm_syserr(kd, 0, "kvm_read");
 			return (-1);
 		} else if (cr < (ssize_t)len)
 			_kvm_err(kd, kd->program, "short read");
 		return (cr);
 	}
 
 	cp = buf;
 	while (len > 0) {
 		cc = kd->arch->ka_kvatop(kd, kva, &pa);
 		if (cc == 0)
 			return (-1);
 		if (cc > (ssize_t)len)
 			cc = len;
 		errno = 0;
 		if (lseek(kd->pmfd, pa, 0) == -1 && errno != 0) {
 			_kvm_syserr(kd, 0, _PATH_MEM);
 			break;
 		}
 		cr = read(kd->pmfd, cp, cc);
 		if (cr < 0) {
 			_kvm_syserr(kd, kd->program, "kvm_read");
 			break;
 		}
 		/*
 		 * If ka_kvatop returns a bogus value or our core file is
 		 * truncated, we might wind up seeking beyond the end of the
 		 * core file in which case the read will return 0 (EOF).
 		 */
 		if (cr == 0)
 			break;
 		cp += cr;
 		kva += cr;
 		len -= cr;
 	}
 
 	return (cp - (char *)buf);
 }
 
 ssize_t
 kvm_write(kvm_t *kd, u_long kva, const void *buf, size_t len)
 {
 	int cc;
 	ssize_t cw;
 	off_t pa;
 	const char *cp;
 
 	if (!ISALIVE(kd) && !kd->writable) {
 		_kvm_err(kd, kd->program,
 		    "kvm_write not implemented for dead kernels");
 		return (-1);
 	}
 
 	if (ISALIVE(kd)) {
 		/*
 		 * Just like kvm_read, only we write.
 		 */
 		errno = 0;
 		if (lseek(kd->vmfd, (off_t)kva, 0) == -1 && errno != 0) {
 			_kvm_err(kd, 0, "invalid address (%lx)", kva);
 			return (-1);
 		}
 		cc = write(kd->vmfd, buf, len);
 		if (cc < 0) {
 			_kvm_syserr(kd, 0, "kvm_write");
 			return (-1);
 		} else if ((size_t)cc < len)
 			_kvm_err(kd, kd->program, "short write");
 		return (cc);
 	}
 
 	cp = buf;
 	while (len > 0) {
 		cc = kd->arch->ka_kvatop(kd, kva, &pa);
 		if (cc == 0)
 			return (-1);
 		if (cc > (ssize_t)len)
 			cc = len;
 		errno = 0;
 		if (lseek(kd->pmfd, pa, 0) == -1 && errno != 0) {
 			_kvm_syserr(kd, 0, _PATH_MEM);
 			break;
 		}
 		cw = write(kd->pmfd, cp, cc);
 		if (cw < 0) {
 			_kvm_syserr(kd, kd->program, "kvm_write");
 			break;
 		}
 		/*
 		 * If ka_kvatop returns a bogus value or our core file is
 		 * truncated, we might wind up seeking beyond the end of the
 		 * core file in which case the read will return 0 (EOF).
 		 */
 		if (cw == 0)
 			break;
 		cp += cw;
 		kva += cw;
 		len -= cw;
 	}
 
 	return (cp - (const char *)buf);
 }
 
 int
 kvm_native(kvm_t *kd)
 {
 
 	if (ISALIVE(kd))
 		return (1);
 	return (kd->arch->ka_native(kd));
 }
 
 int
 kvm_walk_pages(kvm_t *kd, kvm_walk_pages_cb_t *cb, void *closure)
 {
 
 	if (kd->arch->ka_walk_pages == NULL)
 		return (0);
 
 	return (kd->arch->ka_walk_pages(kd, cb, closure));
 }
 
 kssize_t
 kvm_kerndisp(kvm_t *kd)
 {
 	unsigned long kernbase, rel_kernbase;
 	size_t kernbase_len = sizeof(kernbase);
 	size_t rel_kernbase_len = sizeof(rel_kernbase);
 
 	if (ISALIVE(kd)) {
 		if (sysctlbyname("kern.base_address", &kernbase,
 		    &kernbase_len, NULL, 0) == -1) {
 			_kvm_syserr(kd, kd->program,
 				"failed to get kernel base address");
 			return (0);
 		}
 		if (sysctlbyname("kern.relbase_address", &rel_kernbase,
 		    &rel_kernbase_len, NULL, 0) == -1) {
 			_kvm_syserr(kd, kd->program,
 				"failed to get relocated kernel base address");
 			return (0);
 		}
 		return (rel_kernbase - kernbase);
 	}
 
 	if (kd->arch->ka_kerndisp == NULL)
 		return (0);
 
 	return (kd->arch->ka_kerndisp(kd));
 }
Index: head/lib/libkvm/kvm_private.c
===================================================================
--- head/lib/libkvm/kvm_private.c	(revision 358019)
+++ head/lib/libkvm/kvm_private.c	(revision 358020)
@@ -1,768 +1,769 @@
 /*-
  * Copyright (c) 1989, 1992, 1993
  *	The Regents of the University of California.  All rights reserved.
  *
  * This code is derived from software developed by the Computer Systems
  * Engineering group at Lawrence Berkeley Laboratory under DARPA contract
  * BG 91-66 and contributed to Berkeley.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/fnv_hash.h>
 
 #define	_WANT_VNET
 
 #include <sys/user.h>
 #include <sys/linker.h>
 #include <sys/pcpu.h>
 #include <sys/stat.h>
 #include <sys/mman.h>
 
+#include <stdbool.h>
 #include <net/vnet.h>
 
 #include <assert.h>
 #include <fcntl.h>
 #include <vm/vm.h>
 #include <kvm.h>
 #include <limits.h>
 #include <paths.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <stdarg.h>
 #include <inttypes.h>
 
 #include "kvm_private.h"
 
 /*
  * Routines private to libkvm.
  */
 
 /* from src/lib/libc/gen/nlist.c */
 int __fdnlist(int, struct nlist *);
 
 /*
  * Report an error using printf style arguments.  "program" is kd->program
  * on hard errors, and 0 on soft errors, so that under sun error emulation,
  * only hard errors are printed out (otherwise, programs like gdb will
  * generate tons of error messages when trying to access bogus pointers).
  */
 void
 _kvm_err(kvm_t *kd, const char *program, const char *fmt, ...)
 {
 	va_list ap;
 
 	va_start(ap, fmt);
 	if (program != NULL) {
 		(void)fprintf(stderr, "%s: ", program);
 		(void)vfprintf(stderr, fmt, ap);
 		(void)fputc('\n', stderr);
 	} else
 		(void)vsnprintf(kd->errbuf,
 		    sizeof(kd->errbuf), fmt, ap);
 
 	va_end(ap);
 }
 
 void
 _kvm_syserr(kvm_t *kd, const char *program, const char *fmt, ...)
 {
 	va_list ap;
 	int n;
 
 	va_start(ap, fmt);
 	if (program != NULL) {
 		(void)fprintf(stderr, "%s: ", program);
 		(void)vfprintf(stderr, fmt, ap);
 		(void)fprintf(stderr, ": %s\n", strerror(errno));
 	} else {
 		char *cp = kd->errbuf;
 
 		(void)vsnprintf(cp, sizeof(kd->errbuf), fmt, ap);
 		n = strlen(cp);
 		(void)snprintf(&cp[n], sizeof(kd->errbuf) - n, ": %s",
 		    strerror(errno));
 	}
 	va_end(ap);
 }
 
 void *
 _kvm_malloc(kvm_t *kd, size_t n)
 {
 	void *p;
 
 	if ((p = calloc(n, sizeof(char))) == NULL)
 		_kvm_err(kd, kd->program, "can't allocate %zu bytes: %s",
 			 n, strerror(errno));
 	return (p);
 }
 
 int
 _kvm_probe_elf_kernel(kvm_t *kd, int class, int machine)
 {
 
 	return (kd->nlehdr.e_ident[EI_CLASS] == class &&
 	    ((machine == EM_PPC || machine == EM_PPC64) ?
 	     kd->nlehdr.e_type == ET_DYN : kd->nlehdr.e_type == ET_EXEC) &&
 	    kd->nlehdr.e_machine == machine);
 }
 
 int
 _kvm_is_minidump(kvm_t *kd)
 {
 	char minihdr[8];
 
 	if (kd->rawdump)
 		return (0);
 	if (pread(kd->pmfd, &minihdr, 8, 0) == 8 &&
 	    memcmp(&minihdr, "minidump", 8) == 0)
 		return (1);
 	return (0);
 }
 
 /*
  * The powerpc backend has a hack to strip a leading kerneldump
  * header from the core before treating it as an ELF header.
  *
  * We can add that here if we can get a change to libelf to support
  * an initial offset into the file.  Alternatively we could patch
  * savecore to extract cores from a regular file instead.
  */
 int
 _kvm_read_core_phdrs(kvm_t *kd, size_t *phnump, GElf_Phdr **phdrp)
 {
 	GElf_Ehdr ehdr;
 	GElf_Phdr *phdr;
 	Elf *elf;
 	size_t i, phnum;
 
 	elf = elf_begin(kd->pmfd, ELF_C_READ, NULL);
 	if (elf == NULL) {
 		_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 		return (-1);
 	}
 	if (elf_kind(elf) != ELF_K_ELF) {
 		_kvm_err(kd, kd->program, "invalid core");
 		goto bad;
 	}
 	if (gelf_getclass(elf) != kd->nlehdr.e_ident[EI_CLASS]) {
 		_kvm_err(kd, kd->program, "invalid core");
 		goto bad;
 	}
 	if (gelf_getehdr(elf, &ehdr) == NULL) {
 		_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 		goto bad;
 	}
 	if (ehdr.e_type != ET_CORE) {
 		_kvm_err(kd, kd->program, "invalid core");
 		goto bad;
 	}
 	if (ehdr.e_machine != kd->nlehdr.e_machine) {
 		_kvm_err(kd, kd->program, "invalid core");
 		goto bad;
 	}
 
 	if (elf_getphdrnum(elf, &phnum) == -1) {
 		_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 		goto bad;
 	}
 
 	phdr = calloc(phnum, sizeof(*phdr));
 	if (phdr == NULL) {
 		_kvm_err(kd, kd->program, "failed to allocate phdrs");
 		goto bad;
 	}
 
 	for (i = 0; i < phnum; i++) {
 		if (gelf_getphdr(elf, i, &phdr[i]) == NULL) {
 			free(phdr);
 			_kvm_err(kd, kd->program, "%s", elf_errmsg(0));
 			goto bad;
 		}
 	}
 	elf_end(elf);
 	*phnump = phnum;
 	*phdrp = phdr;
 	return (0);
 
 bad:
 	elf_end(elf);
 	return (-1);
 }
 
 /*
  * Transform v such that only bits [bit0, bitN) may be set.  Generates a
   * bitmask covering (bitN - bit0) bits, then shifts it left so bit0 is the
   * first bit covered.
  */
 static uint64_t
 bitmask_range(uint64_t v, uint64_t bit0, uint64_t bitN)
 {
 	if (bit0 == 0 && bitN == BITS_IN(v))
 		return (v);
 
 	return (v & (((1ULL << (bitN - bit0)) - 1ULL) << bit0));
 }
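
  /*
   * Worked example for bitmask_range() above: with v = 0xff, bit0 = 2 and
   * bitN = 6, the generated mask is ((1 << 4) - 1) << 2 = 0x3c, so the
   * function returns 0xff & 0x3c = 0x3c, i.e. only bits [2, 6) survive.
   */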
 
 /*
   * Returns the number of set bits in the given range [bit0, bitN) of the
   * byte array at addr.  bit0 may be non-zero when a caller counts
   * backwards from bitN.
  */
 static uint64_t
 popcount_bytes(uint64_t *addr, uint32_t bit0, uint32_t bitN)
 {
 	uint32_t res = bitN - bit0;
 	uint64_t count = 0;
 	uint32_t bound;
 
 	/* Align to 64-bit boundary on the left side if needed. */
 	if ((bit0 % BITS_IN(*addr)) != 0) {
 		bound = MIN(bitN, roundup2(bit0, BITS_IN(*addr)));
 		count += __bitcount64(bitmask_range(*addr, bit0, bound));
 		res -= (bound - bit0);
 		addr++;
 	}
 
 	while (res > 0) {
 		bound = MIN(res, BITS_IN(*addr));
 		count += __bitcount64(bitmask_range(*addr, 0, bound));
 		res -= bound;
 		addr++;
 	}
 
 	return (count);
 }
 
 void *
 _kvm_pmap_get(kvm_t *kd, u_long idx, size_t len)
 {
 	uintptr_t off = idx * len;
 
 	if ((off_t)off >= kd->pt_sparse_off)
 		return (NULL);
 	return (void *)((uintptr_t)kd->page_map + off);
 }
 
 void *
 _kvm_map_get(kvm_t *kd, u_long pa, unsigned int page_size)
 {
 	off_t off;
 	uintptr_t addr;
 
 	off = _kvm_pt_find(kd, pa, page_size);
 	if (off == -1)
 		return NULL;
 
 	addr = (uintptr_t)kd->page_map + off;
 	if (off >= kd->pt_sparse_off)
 		addr = (uintptr_t)kd->sparse_map + (off - kd->pt_sparse_off);
 	return (void *)addr;
 }
 
 int
 _kvm_pt_init(kvm_t *kd, size_t map_len, off_t map_off, off_t sparse_off,
     int page_size, int word_size)
 {
 	uint64_t *addr;
 	uint32_t *popcount_bin;
 	int bin_popcounts = 0;
 	uint64_t pc_bins, res;
 	ssize_t rd;
 
 	/*
 	 * Map the bitmap specified by the arguments.
 	 */
 	kd->pt_map = _kvm_malloc(kd, map_len);
 	if (kd->pt_map == NULL) {
 		_kvm_err(kd, kd->program, "cannot allocate %zu bytes for bitmap",
 		    map_len);
 		return (-1);
 	}
 	rd = pread(kd->pmfd, kd->pt_map, map_len, map_off);
 	if (rd < 0 || rd != (ssize_t)map_len) {
 		_kvm_err(kd, kd->program, "cannot read %zu bytes for bitmap",
 		    map_len);
 		return (-1);
 	}
 	kd->pt_map_size = map_len;
 
 	/*
 	 * Generate a popcount cache for every POPCOUNT_BITS in the bitmap,
 	 * so lookups only have to calculate the number of bits set between
 	 * a cache point and their bit.  This reduces lookups to O(1),
 	 * without significantly increasing memory requirements.
 	 *
 	 * Round up the number of bins so that 'upper half' lookups work for
 	 * the final bin, if needed.  The first popcount is 0, since no bits
 	 * precede bit 0, so add 1 for that also.  Without this, extra work
 	 * would be needed to handle the first PTEs in _kvm_pt_find().
 	 */
 	addr = kd->pt_map;
 	res = map_len;
 	pc_bins = 1 + (res * NBBY + POPCOUNT_BITS / 2) / POPCOUNT_BITS;
 	kd->pt_popcounts = calloc(pc_bins, sizeof(uint32_t));
 	if (kd->pt_popcounts == NULL) {
 		_kvm_err(kd, kd->program, "cannot allocate popcount bins");
 		return (-1);
 	}
 
 	for (popcount_bin = &kd->pt_popcounts[1]; res > 0;
 	    addr++, res -= sizeof(*addr)) {
 		*popcount_bin += popcount_bytes(addr, 0,
 		    MIN(res * NBBY, BITS_IN(*addr)));
 		if (++bin_popcounts == POPCOUNTS_IN(*addr)) {
 			popcount_bin++;
 			*popcount_bin = *(popcount_bin - 1);
 			bin_popcounts = 0;
 		}
 	}
 
 	assert(pc_bins * sizeof(*popcount_bin) ==
 	    ((uintptr_t)popcount_bin - (uintptr_t)kd->pt_popcounts));
 
 	kd->pt_sparse_off = sparse_off;
 	kd->pt_sparse_size = (uint64_t)*popcount_bin * page_size;
 	kd->pt_page_size = page_size;
 	kd->pt_word_size = word_size;
 
 	/*
 	 * Map the sparse page array.  This is useful for performing point
 	 * lookups of specific pages, e.g. for kvm_walk_pages.  Generally,
 	 * this is much larger than is reasonable to read in up front, so
 	 * mmap it in instead.
 	 */
 	kd->sparse_map = mmap(NULL, kd->pt_sparse_size, PROT_READ,
 	    MAP_PRIVATE, kd->pmfd, kd->pt_sparse_off);
 	if (kd->sparse_map == MAP_FAILED) {
 		_kvm_err(kd, kd->program, "cannot map %" PRIu64
 		    " bytes from fd %d offset %jd for sparse map: %s",
 		    kd->pt_sparse_size, kd->pmfd,
 		    (intmax_t)kd->pt_sparse_off, strerror(errno));
 		return (-1);
 	}
 	return (0);
 }
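
  /*
   * Illustration of the cache layout built above (POPCOUNT_BITS is taken
   * to be 1024 here purely for the arithmetic; see kvm_private.h for the
   * real value): pt_popcounts[j] ends up holding the number of bits set
   * in bits [0, j * POPCOUNT_BITS) of the bitmap, with pt_popcounts[0]
   * staying 0, and a 1 MiB bitmap (8388608 page bits) would get
   * pc_bins = 1 + (8388608 + 512) / 1024 = 8193 bins.
   */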
 
 int
 _kvm_pmap_init(kvm_t *kd, uint32_t pmap_size, off_t pmap_off)
 {
 	ssize_t exp_len = pmap_size;
 
 	kd->page_map_size = pmap_size;
 	kd->page_map_off = pmap_off;
 	kd->page_map = _kvm_malloc(kd, pmap_size);
 	if (kd->page_map == NULL) {
 		_kvm_err(kd, kd->program, "cannot allocate %u bytes "
 		    "for page map", pmap_size);
 		return (-1);
 	}
 	if (pread(kd->pmfd, kd->page_map, pmap_size, pmap_off) != exp_len) {
 		_kvm_err(kd, kd->program, "cannot read %d bytes from "
 		    "offset %jd for page map", pmap_size, (intmax_t)pmap_off);
 		return (-1);
 	}
 	return (0);
 }
 
 /*
  * Find the offset for the given physical page address; returns -1 otherwise.
  *
  * A page's offset is represented by the sparse page base offset plus the
  * number of bits set before its bit multiplied by page size.  This means
  * that if a page exists in the dump, it's necessary to know how many pages
  * in the dump precede it.  Reduce this O(n) counting to O(1) by caching the
  * number of bits set at POPCOUNT_BITS intervals.
  *
  * Then to find the number of pages before the requested address, simply
  * index into the cache and count the number of bits set between that cache
  * bin and the page's bit.  Halve the number of bytes that have to be
  * checked by also counting down from the next higher bin if it's closer.
  */
 off_t
 _kvm_pt_find(kvm_t *kd, uint64_t pa, unsigned int page_size)
 {
 	uint64_t *bitmap = kd->pt_map;
 	uint64_t pte_bit_id = pa / page_size;
 	uint64_t pte_u64 = pte_bit_id / BITS_IN(*bitmap);
 	uint64_t popcount_id = pte_bit_id / POPCOUNT_BITS;
 	uint64_t pte_mask = 1ULL << (pte_bit_id % BITS_IN(*bitmap));
 	uint64_t bitN;
 	uint32_t count;
 
 	/* Check whether the page address requested is in the dump. */
 	if (pte_bit_id >= (kd->pt_map_size * NBBY) ||
 	    (bitmap[pte_u64] & pte_mask) == 0)
 		return (-1);
 
 	/*
 	 * Add/sub popcounts from the bitmap until the PTE's bit is reached.
 	 * For bits that are in the upper half between the calculated
 	 * popcount id and the next one, use the next one and subtract to
 	 * minimize the number of popcounts required.
 	 */
 	if ((pte_bit_id % POPCOUNT_BITS) < (POPCOUNT_BITS / 2)) {
 		count = kd->pt_popcounts[popcount_id] + popcount_bytes(
 		    bitmap + popcount_id * POPCOUNTS_IN(*bitmap),
 		    0, pte_bit_id - popcount_id * POPCOUNT_BITS);
 	} else {
 		/*
 		 * Counting in reverse is trickier, since we must avoid
 		 * reading from bytes that are not in range, and invert.
 		 */
 		uint64_t pte_u64_bit_off = pte_u64 * BITS_IN(*bitmap);
 
 		popcount_id++;
 		bitN = MIN(popcount_id * POPCOUNT_BITS,
 		    kd->pt_map_size * BITS_IN(uint8_t));
 		count = kd->pt_popcounts[popcount_id] - popcount_bytes(
 		    bitmap + pte_u64,
 		    pte_bit_id - pte_u64_bit_off, bitN - pte_u64_bit_off);
 	}
 
 	/*
 	 * This can only happen if the core is truncated.  Treat these
 	 * entries as if they don't exist, since their backing doesn't.
 	 */
 	if (count >= (kd->pt_sparse_size / page_size))
 		return (-1);
 
 	return (kd->pt_sparse_off + (uint64_t)count * page_size);
 }
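
  /*
   * Worked example for the lookup above, with made-up bitmap contents:
   * for page_size = 4096 and pa = 0x5000, pte_bit_id is 5.  If bit 5 is
   * set and exactly three of bits 0-4 are set, then three dump pages
   * precede this one and its data lives at pt_sparse_off + 3 * 4096.
   * The popcount cache only short-circuits the counting; the result is
   * the same as popcounting every bit below bit 5.
   */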
 
 static int
 kvm_fdnlist(kvm_t *kd, struct kvm_nlist *list)
 {
 	kvaddr_t addr;
 	int error, nfail;
 
 	if (kd->resolve_symbol == NULL) {
 		struct nlist *nl;
 		int count, i;
 
 		for (count = 0; list[count].n_name != NULL &&
 		     list[count].n_name[0] != '\0'; count++)
 			;
 		nl = calloc(count + 1, sizeof(*nl));
 		for (i = 0; i < count; i++)
 			nl[i].n_name = list[i].n_name;
 		nfail = __fdnlist(kd->nlfd, nl);
 		for (i = 0; i < count; i++) {
 			list[i].n_type = nl[i].n_type;
 			list[i].n_value = nl[i].n_value;
 		}
 		free(nl);
 		return (nfail);
 	}
 
 	nfail = 0;
 	while (list->n_name != NULL && list->n_name[0] != '\0') {
 		error = kd->resolve_symbol(list->n_name, &addr);
 		if (error != 0) {
 			nfail++;
 			list->n_value = 0;
 			list->n_type = 0;
 		} else {
 			list->n_value = addr;
 			list->n_type = N_DATA | N_EXT;
 		}
 		list++;
 	}
 	return (nfail);
 }
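
  /*
   * A minimal caller-side sketch (assuming the public kvm_nlist2(3) entry
   * point, which funnels into _kvm_nlist()/kvm_fdnlist(); the symbol name
   * is only an example):
   *
   *	struct kvm_nlist nl[] = {
   *		{ .n_name = "_allproc" },
   *		{ .n_name = NULL },
   *	};
   *
   *	if (kvm_nlist2(kd, nl) != 0 || nl[0].n_value == 0)
   *		errx(1, "allproc not found");
   */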
 
 /*
  * Walk the list of unresolved symbols, generate a new list and prefix the
  * symbol names, try again, and merge back what we could resolve.
  */
 static int
 kvm_fdnlist_prefix(kvm_t *kd, struct kvm_nlist *nl, int missing,
     const char *prefix, kvaddr_t (*validate_fn)(kvm_t *, kvaddr_t))
 {
 	struct kvm_nlist *n, *np, *p;
 	char *cp, *ce;
 	const char *ccp;
 	size_t len;
 	int slen, unresolved;
 
 	/*
 	 * Calculate the space we need to malloc for nlist and names.
  	 * We are going to store each name twice for later lookups: once
  	 * with the prefix and once as the unmodified name, delimited by \0.
 	 */
 	len = 0;
 	unresolved = 0;
 	for (p = nl; p->n_name && p->n_name[0]; ++p) {
 		if (p->n_type != N_UNDF)
 			continue;
 		len += sizeof(struct kvm_nlist) + strlen(prefix) +
 		    2 * (strlen(p->n_name) + 1);
 		unresolved++;
 	}
 	if (unresolved == 0)
 		return (unresolved);
 	/* Add space for the terminating nlist entry. */
 	len += sizeof(struct kvm_nlist);
 	unresolved++;
 
 	/* Alloc one chunk for (nlist, [names]) and setup pointers. */
  	n = np = malloc(len);
  	if (n == NULL)
  		return (missing);
  	bzero(n, len);
 	cp = ce = (char *)np;
 	cp += unresolved * sizeof(struct kvm_nlist);
 	ce += len;
 
 	/* Generate shortened nlist with special prefix. */
 	unresolved = 0;
 	for (p = nl; p->n_name && p->n_name[0]; ++p) {
 		if (p->n_type != N_UNDF)
 			continue;
 		*np = *p;
 		/* Save the new\0orig. name so we can later match it again. */
 		slen = snprintf(cp, ce - cp, "%s%s%c%s", prefix,
 		    (prefix[0] != '\0' && p->n_name[0] == '_') ?
 			(p->n_name + 1) : p->n_name, '\0', p->n_name);
 		if (slen < 0 || slen >= ce - cp)
 			continue;
 		np->n_name = cp;
 		cp += slen + 1;
 		np++;
 		unresolved++;
 	}
 
 	/* Do lookup on the reduced list. */
 	np = n;
 	unresolved = kvm_fdnlist(kd, np);
 
 	/* Check if we could resolve further symbols and update the list. */
 	if (unresolved >= 0 && unresolved < missing) {
 		/* Find the first freshly resolved entry. */
 		for (; np->n_name && np->n_name[0]; np++)
 			if (np->n_type != N_UNDF)
 				break;
 		/*
 		 * The lists are both in the same order,
 		 * so we can walk them in parallel.
 		 */
 		for (p = nl; np->n_name && np->n_name[0] &&
 		    p->n_name && p->n_name[0]; ++p) {
 			if (p->n_type != N_UNDF)
 				continue;
 			/* Skip expanded name and compare to orig. one. */
 			ccp = np->n_name + strlen(np->n_name) + 1;
 			if (strcmp(ccp, p->n_name) != 0)
 				continue;
 			/* Update nlist with new, translated results. */
 			p->n_type = np->n_type;
 			if (validate_fn)
 				p->n_value = (*validate_fn)(kd, np->n_value);
 			else
 				p->n_value = np->n_value;
 			missing--;
 			/* Find next freshly resolved entry. */
 			for (np++; np->n_name && np->n_name[0]; np++)
 				if (np->n_type != N_UNDF)
 					break;
 		}
 	}
 	/* We could assert missing = unresolved here. */
 
 	free(n);
 	return (unresolved);
 }
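
  /*
   * Example of the packed "new\0orig" layout used above, taking the vnet
   * prefix VNET_SYMPREFIX to be "vnet_entry_" (see net/vnet.h): for an
   * unresolved entry "_tcbinfo" the buffer holds
   * "vnet_entry_tcbinfo\0_tcbinfo\0"; the first string is what gets looked
   * up, the second is compared against the original list when the results
   * are merged back.
   */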
 
 int
 _kvm_nlist(kvm_t *kd, struct kvm_nlist *nl, int initialize)
 {
 	struct kvm_nlist *p;
 	int nvalid;
 	struct kld_sym_lookup lookup;
 	int error;
 	const char *prefix = "";
 	char symname[1024]; /* XXX-BZ symbol name length limit? */
 	int tried_vnet, tried_dpcpu;
 
 	/*
 	 * If we can't use the kld symbol lookup, revert to the
 	 * slow library call.
 	 */
 	if (!ISALIVE(kd)) {
 		error = kvm_fdnlist(kd, nl);
 		if (error <= 0)			/* Hard error or success. */
 			return (error);
 
 		if (_kvm_vnet_initialized(kd, initialize))
 			error = kvm_fdnlist_prefix(kd, nl, error,
 			    VNET_SYMPREFIX, _kvm_vnet_validaddr);
 
 		if (error > 0 && _kvm_dpcpu_initialized(kd, initialize))
 			error = kvm_fdnlist_prefix(kd, nl, error,
 			    DPCPU_SYMPREFIX, _kvm_dpcpu_validaddr);
 
 		return (error);
 	}
 
 	/*
 	 * We can use the kld lookup syscall.  Go through each nlist entry
 	 * and look it up with a kldsym(2) syscall.
 	 */
 	nvalid = 0;
 	tried_vnet = 0;
 	tried_dpcpu = 0;
 again:
 	for (p = nl; p->n_name && p->n_name[0]; ++p) {
 		if (p->n_type != N_UNDF)
 			continue;
 
 		lookup.version = sizeof(lookup);
 		lookup.symvalue = 0;
 		lookup.symsize = 0;
 
 		error = snprintf(symname, sizeof(symname), "%s%s", prefix,
 		    (prefix[0] != '\0' && p->n_name[0] == '_') ?
 			(p->n_name + 1) : p->n_name);
 		if (error < 0 || error >= (int)sizeof(symname))
 			continue;
 		lookup.symname = symname;
 		if (lookup.symname[0] == '_')
 			lookup.symname++;
 
 		if (kldsym(0, KLDSYM_LOOKUP, &lookup) != -1) {
 			p->n_type = N_TEXT;
 			if (_kvm_vnet_initialized(kd, initialize) &&
 			    strcmp(prefix, VNET_SYMPREFIX) == 0)
 				p->n_value =
 				    _kvm_vnet_validaddr(kd, lookup.symvalue);
 			else if (_kvm_dpcpu_initialized(kd, initialize) &&
 			    strcmp(prefix, DPCPU_SYMPREFIX) == 0)
 				p->n_value =
 				    _kvm_dpcpu_validaddr(kd, lookup.symvalue);
 			else
 				p->n_value = lookup.symvalue;
 			++nvalid;
 			/* lookup.symsize */
 		}
 	}
 
 	/*
 	 * Check the number of entries that weren't found. If they exist,
 	 * try again with a prefix for virtualized or DPCPU symbol names.
 	 */
 	error = ((p - nl) - nvalid);
 	if (error && _kvm_vnet_initialized(kd, initialize) && !tried_vnet) {
 		tried_vnet = 1;
 		prefix = VNET_SYMPREFIX;
 		goto again;
 	}
 	if (error && _kvm_dpcpu_initialized(kd, initialize) && !tried_dpcpu) {
 		tried_dpcpu = 1;
 		prefix = DPCPU_SYMPREFIX;
 		goto again;
 	}
 
 	/*
 	 * Return the number of entries that weren't found. If they exist,
 	 * also fill internal error buffer.
 	 */
 	error = ((p - nl) - nvalid);
 	if (error)
 		_kvm_syserr(kd, kd->program, "kvm_nlist");
 	return (error);
 }
 
 int
 _kvm_bitmap_init(struct kvm_bitmap *bm, u_long bitmapsize, u_long *idx)
 {
 
 	*idx = ULONG_MAX;
 	bm->map = calloc(bitmapsize, sizeof *bm->map);
 	if (bm->map == NULL)
 		return (0);
 	bm->size = bitmapsize;
 	return (1);
 }
 
 void
 _kvm_bitmap_set(struct kvm_bitmap *bm, u_long pa, unsigned int page_size)
 {
 	u_long bm_index = pa / page_size;
 	uint8_t *byte = &bm->map[bm_index / 8];
 
 	*byte |= (1UL << (bm_index % 8));
 }
 
 int
 _kvm_bitmap_next(struct kvm_bitmap *bm, u_long *idx)
 {
 	u_long first_invalid = bm->size * CHAR_BIT;
 
 	if (*idx == ULONG_MAX)
 		*idx = 0;
 	else
 		(*idx)++;
 
  	/* Find the next set bit, i.e. the next valid idx. */
  	for (; *idx < first_invalid; (*idx)++) {
  		uint8_t mask = 1 << (*idx % CHAR_BIT);
  		if ((bm->map[*idx / CHAR_BIT] & mask) != 0)
  			break;
  	}
 
 	return (*idx < first_invalid);
 }
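
  /*
   * Iteration sketch for the helpers above: _kvm_bitmap_init() seeds *idx
   * to ULONG_MAX so that the first _kvm_bitmap_next() call starts at bit 0,
   * and callers loop until it returns 0:
   *
   *	if (!_kvm_bitmap_init(&bm, bitmapsize, &idx))
   *		return (0);
   *	while (_kvm_bitmap_next(&bm, &idx)) {
   *		(handle the page whose bit number is idx)
   *	}
   *	_kvm_bitmap_deinit(&bm);
   */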
 
 void
 _kvm_bitmap_deinit(struct kvm_bitmap *bm)
 {
 
 	free(bm->map);
 }
 
 int
 _kvm_visit_cb(kvm_t *kd, kvm_walk_pages_cb_t *cb, void *arg, u_long pa,
     u_long kmap_vaddr, u_long dmap_vaddr, vm_prot_t prot, size_t len,
     unsigned int page_size)
 {
 	unsigned int pgsz = page_size ? page_size : len;
 	struct kvm_page p = {
 		.kp_version = LIBKVM_WALK_PAGES_VERSION,
 		.kp_paddr = pa,
 		.kp_kmap_vaddr = kmap_vaddr,
 		.kp_dmap_vaddr = dmap_vaddr,
 		.kp_prot = prot,
 		.kp_offset = _kvm_pt_find(kd, pa, pgsz),
 		.kp_len = len,
 	};
 
 	return cb(&p, arg);
 }
Index: head/lib/libkvm/kvm_vnet.c
===================================================================
--- head/lib/libkvm/kvm_vnet.c	(revision 358019)
+++ head/lib/libkvm/kvm_vnet.c	(revision 358020)
@@ -1,246 +1,247 @@
 /*-
  * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
  *
  * Copyright (c) 2009 Robert N. M. Watson
  * Copyright (c) 2009 Bjoern A. Zeeb <bz@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 
 #define	_WANT_PRISON
 #define	_WANT_UCRED
 #define	_WANT_VNET
 
 #include <sys/_lock.h>
 #include <sys/_mutex.h>
 #include <sys/_task.h>
 #include <sys/jail.h>
 #include <sys/proc.h>
 #include <sys/types.h>
 
+#include <stdbool.h>
 #include <net/vnet.h>
 
 #include <kvm.h>
 #include <limits.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 #include "kvm_private.h"
 
 /*
  * Set up libkvm to handle virtual network stack symbols by selecting a
  * starting pid.
  */
 int
 _kvm_vnet_selectpid(kvm_t *kd, pid_t pid)
 {
 	struct proc proc;
 	struct ucred cred;
 	struct prison prison;
 	struct vnet vnet;
 	struct kvm_nlist nl[] = {
 		/*
 		 * Note: kvm_nlist strips the first '_' so add an extra one
 		 * here to __{start,stop}_set_vnet.
 		 */
 #define	NLIST_START_VNET	0
 		{ .n_name = "___start_" VNET_SETNAME },
 #define	NLIST_STOP_VNET		1
 		{ .n_name = "___stop_" VNET_SETNAME },
 #define	NLIST_VNET_HEAD		2
 		{ .n_name = "vnet_head" },
 #define	NLIST_ALLPROC		3
 		{ .n_name = "allproc" },
 #define	NLIST_DUMPTID		4
 		{ .n_name = "dumptid" },
 #define	NLIST_PROC0		5
 		{ .n_name = "proc0" },
 		{ .n_name = NULL },
 	};
 	uintptr_t procp, credp;
 #define	VMCORE_VNET_OF_PROC0
 #ifndef VMCORE_VNET_OF_PROC0
 	struct thread td;
 	uintptr_t tdp;
 #endif
 	lwpid_t dumptid;
 
 	/*
 	 * XXX: This only works for native kernels for now.
 	 */
 	if (!kvm_native(kd))
 		return (-1);
 
 	/*
 	 * Locate and cache locations of important symbols
 	 * using the internal version of _kvm_nlist, turning
 	 * off initialization to avoid recursion in case of
  	 * unresolvable symbols.
 	 */
 	if (_kvm_nlist(kd, nl, 0) != 0) {
 		/*
 		 * XXX-BZ: ___start_/___stop_VNET_SETNAME may fail.
  		 * For now do not report an error here, as we are called
  		 * internally and in `void context' until the functionality
  		 * to optionally activate this is merged into programs.
  		 * By then we can fail properly and let the callers
  		 * handle the error.
 		 */
 		/* _kvm_err(kd, kd->program, "%s: no namelist", __func__); */
 		return (-1);
 	}
 
 	/*
 	 * Auto-detect if this is a crashdump by reading dumptid.
 	 */
 	dumptid = 0;
 	if (nl[NLIST_DUMPTID].n_value) {
 		if (kvm_read(kd, nl[NLIST_DUMPTID].n_value, &dumptid,
 		    sizeof(dumptid)) != sizeof(dumptid)) {
 			_kvm_err(kd, kd->program, "%s: dumptid", __func__);
 			return (-1);
 		}
 	}
 
 	/*
  	 * First, find the process for this pid.  If we are working on a
  	 * dump, either locate the thread that dumptid refers to or use
  	 * proc0.  In either case, take the address of the ucred.
 	 */
 	credp = 0;
 
 	procp = nl[NLIST_ALLPROC].n_value;
 #ifdef VMCORE_VNET_OF_PROC0
 	if (dumptid > 0) {
 		procp = nl[NLIST_PROC0].n_value;
 		pid = 0;
 	}
 #endif
 	while (procp != 0) {
 		if (kvm_read(kd, procp, &proc, sizeof(proc)) != sizeof(proc)) {
 			_kvm_err(kd, kd->program, "%s: proc", __func__);
 			return (-1);
 		}
 #ifndef VMCORE_VNET_OF_PROC0
 		if (dumptid > 0) {
 			tdp = (uintptr_t)TAILQ_FIRST(&proc.p_threads);
 			while (tdp != 0) {
 				if (kvm_read(kd, tdp, &td, sizeof(td)) !=
 				    sizeof(td)) {
 					_kvm_err(kd, kd->program, "%s: thread",
 					    __func__);
 					return (-1);
 				}
 				if (td.td_tid == dumptid) {
 					credp = (uintptr_t)td.td_ucred;
 					break;
 				}
 				tdp = (uintptr_t)TAILQ_NEXT(&td, td_plist);
 			}
 		} else
 #endif
 		if (proc.p_pid == pid)
 			credp = (uintptr_t)proc.p_ucred;
 		if (credp != 0)
 			break;
 		procp = (uintptr_t)LIST_NEXT(&proc, p_list);
 	}
 	if (credp == 0) {
 		_kvm_err(kd, kd->program, "%s: pid/tid not found", __func__);
 		return (-1);
 	}
 	if (kvm_read(kd, (uintptr_t)credp, &cred, sizeof(cred)) !=
 	    sizeof(cred)) {
 		_kvm_err(kd, kd->program, "%s: cred", __func__);
 		return (-1);
 	}
 	if (cred.cr_prison == NULL) {
 		_kvm_err(kd, kd->program, "%s: no jail", __func__);
 		return (-1);
 	}
 	if (kvm_read(kd, (uintptr_t)cred.cr_prison, &prison, sizeof(prison)) !=
 	    sizeof(prison)) {
 		_kvm_err(kd, kd->program, "%s: prison", __func__);
 		return (-1);
 	}
 	if (prison.pr_vnet == NULL) {
 		_kvm_err(kd, kd->program, "%s: no vnet", __func__);
 		return (-1);
 	}
 	if (kvm_read(kd, (uintptr_t)prison.pr_vnet, &vnet, sizeof(vnet)) !=
 	    sizeof(vnet)) {
 		_kvm_err(kd, kd->program, "%s: vnet", __func__);
 		return (-1);
 	}
 	if (vnet.vnet_magic_n != VNET_MAGIC_N) {
 		_kvm_err(kd, kd->program, "%s: invalid vnet magic#", __func__);
 		return (-1);
 	}
 	kd->vnet_initialized = 1;
 	kd->vnet_start = nl[NLIST_START_VNET].n_value;
 	kd->vnet_stop = nl[NLIST_STOP_VNET].n_value;
 	kd->vnet_current = (uintptr_t)prison.pr_vnet;
 	kd->vnet_base = vnet.vnet_data_base;
 	return (0);
 }
 
 /*
  * Check whether the vnet module has been initialized successfully
   * or not, and initialize it if permitted.
  */
 int
  _kvm_vnet_initialized(kvm_t *kd, int initialize)
 {
 
  	if (kd->vnet_initialized || !initialize)
 		return (kd->vnet_initialized);
 
 	(void) _kvm_vnet_selectpid(kd, getpid());
 
 	return (kd->vnet_initialized);
 }
 
 /*
  * Check whether the value is within the vnet symbol range and
   * only then adjust it relative to the current vnet data base.
  */
 kvaddr_t
 _kvm_vnet_validaddr(kvm_t *kd, kvaddr_t value)
 {
 
 	if (value == 0)
 		return (value);
 
 	if (!kd->vnet_initialized)
 		return (value);
 
 	if (value < kd->vnet_start || value >= kd->vnet_stop)
 		return (value);
 
 	return (kd->vnet_base + value);
 }
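
  /*
   * Worked example with made-up numbers: if kd->vnet_start = 0x1000,
   * kd->vnet_stop = 0x2000 and kd->vnet_base = 0xb0000, then
   * _kvm_vnet_validaddr(kd, 0x1040) returns 0xb1040, while a value such
   * as 0x3000 that lies outside [vnet_start, vnet_stop) is returned
   * unchanged.
   */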
Index: head/sys/net/if.c
===================================================================
--- head/sys/net/if.c	(revision 358019)
+++ head/sys/net/if.c	(revision 358020)
@@ -1,4562 +1,4575 @@
 /*-
  * SPDX-License-Identifier: BSD-3-Clause
  *
  * Copyright (c) 1980, 1986, 1993
  *	The Regents of the University of California.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  *	@(#)if.c	8.5 (Berkeley) 1/9/95
  * $FreeBSD$
  */
 
 #include "opt_bpf.h"
 #include "opt_inet6.h"
 #include "opt_inet.h"
 
 #include <sys/param.h>
 #include <sys/conf.h>
 #include <sys/eventhandler.h>
 #include <sys/malloc.h>
 #include <sys/domainset.h>
 #include <sys/sbuf.h>
 #include <sys/bus.h>
 #include <sys/epoch.h>
 #include <sys/mbuf.h>
 #include <sys/systm.h>
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/protosw.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/refcount.h>
 #include <sys/module.h>
 #include <sys/rwlock.h>
 #include <sys/sockio.h>
 #include <sys/syslog.h>
 #include <sys/sysctl.h>
 #include <sys/sysent.h>
 #include <sys/taskqueue.h>
 #include <sys/domain.h>
 #include <sys/jail.h>
 #include <sys/priv.h>
 
 #include <machine/stdarg.h>
 #include <vm/uma.h>
 
 #include <net/bpf.h>
 #include <net/ethernet.h>
 #include <net/if.h>
 #include <net/if_arp.h>
 #include <net/if_clone.h>
 #include <net/if_dl.h>
 #include <net/if_types.h>
 #include <net/if_var.h>
 #include <net/if_media.h>
 #include <net/if_vlan_var.h>
 #include <net/radix.h>
 #include <net/route.h>
 #include <net/vnet.h>
 
 #if defined(INET) || defined(INET6)
 #include <net/ethernet.h>
 #include <netinet/in.h>
 #include <netinet/in_var.h>
 #include <netinet/ip.h>
 #include <netinet/ip_carp.h>
 #ifdef INET
 #include <net/debugnet.h>
 #include <netinet/if_ether.h>
 #endif /* INET */
 #ifdef INET6
 #include <netinet6/in6_var.h>
 #include <netinet6/in6_ifattach.h>
 #endif /* INET6 */
 #endif /* INET || INET6 */
 
 #include <security/mac/mac_framework.h>
 
 /*
  * Consumers of struct ifreq such as tcpdump assume no pad between ifr_name
  * and ifr_ifru when it is used in SIOCGIFCONF.
  */
 _Static_assert(sizeof(((struct ifreq *)0)->ifr_name) ==
     offsetof(struct ifreq, ifr_ifru), "gap between ifr_name and ifr_ifru");
 
 __read_mostly epoch_t net_epoch_preempt;
 #ifdef COMPAT_FREEBSD32
 #include <sys/mount.h>
 #include <compat/freebsd32/freebsd32.h>
 
 struct ifreq_buffer32 {
 	uint32_t	length;		/* (size_t) */
 	uint32_t	buffer;		/* (void *) */
 };
 
 /*
  * Interface request structure used for socket
  * ioctl's.  All interface ioctl's must have parameter
  * definitions which begin with ifr_name.  The
  * remainder may be interface specific.
  */
 struct ifreq32 {
 	char	ifr_name[IFNAMSIZ];		/* if name, e.g. "en0" */
 	union {
 		struct sockaddr	ifru_addr;
 		struct sockaddr	ifru_dstaddr;
 		struct sockaddr	ifru_broadaddr;
 		struct ifreq_buffer32 ifru_buffer;
 		short		ifru_flags[2];
 		short		ifru_index;
 		int		ifru_jid;
 		int		ifru_metric;
 		int		ifru_mtu;
 		int		ifru_phys;
 		int		ifru_media;
 		uint32_t	ifru_data;
 		int		ifru_cap[2];
 		u_int		ifru_fib;
 		u_char		ifru_vlan_pcp;
 	} ifr_ifru;
 };
 CTASSERT(sizeof(struct ifreq) == sizeof(struct ifreq32));
 CTASSERT(__offsetof(struct ifreq, ifr_ifru) ==
     __offsetof(struct ifreq32, ifr_ifru));
 
 struct ifgroupreq32 {
 	char	ifgr_name[IFNAMSIZ];
 	u_int	ifgr_len;
 	union {
 		char		ifgru_group[IFNAMSIZ];
 		uint32_t	ifgru_groups;
 	} ifgr_ifgru;
 };
 
 struct ifmediareq32 {
 	char		ifm_name[IFNAMSIZ];
 	int		ifm_current;
 	int		ifm_mask;
 	int		ifm_status;
 	int		ifm_active;
 	int		ifm_count;
 	uint32_t	ifm_ulist;	/* (int *) */
 };
 #define	SIOCGIFMEDIA32	_IOC_NEWTYPE(SIOCGIFMEDIA, struct ifmediareq32)
 #define	SIOCGIFXMEDIA32	_IOC_NEWTYPE(SIOCGIFXMEDIA, struct ifmediareq32)
 
 #define	_CASE_IOC_IFGROUPREQ_32(cmd)				\
     _IOC_NEWTYPE((cmd), struct ifgroupreq32): case
 #else /* !COMPAT_FREEBSD32 */
 #define _CASE_IOC_IFGROUPREQ_32(cmd)
 #endif /* !COMPAT_FREEBSD32 */
 
 #define CASE_IOC_IFGROUPREQ(cmd)	\
     _CASE_IOC_IFGROUPREQ_32(cmd)	\
     (cmd)
 
 union ifreq_union {
 	struct ifreq	ifr;
 #ifdef COMPAT_FREEBSD32
 	struct ifreq32	ifr32;
 #endif
 };
 
 union ifgroupreq_union {
 	struct ifgroupreq ifgr;
 #ifdef COMPAT_FREEBSD32
 	struct ifgroupreq32 ifgr32;
 #endif
 };
 
 SYSCTL_NODE(_net, PF_LINK, link, CTLFLAG_RW, 0, "Link layers");
 SYSCTL_NODE(_net_link, 0, generic, CTLFLAG_RW, 0, "Generic link-management");
 
 SYSCTL_INT(_net_link, OID_AUTO, ifqmaxlen, CTLFLAG_RDTUN,
     &ifqmaxlen, 0, "max send queue size");
 
 /* Log link state change events */
 static int log_link_state_change = 1;
 
 SYSCTL_INT(_net_link, OID_AUTO, log_link_state_change, CTLFLAG_RW,
 	&log_link_state_change, 0,
 	"log interface link state change events");
 
 /* Log promiscuous mode change events */
 static int log_promisc_mode_change = 1;
 
 SYSCTL_INT(_net_link, OID_AUTO, log_promisc_mode_change, CTLFLAG_RDTUN,
 	&log_promisc_mode_change, 1,
 	"log promiscuous mode change events");
 
 /* Interface description */
 static unsigned int ifdescr_maxlen = 1024;
 SYSCTL_UINT(_net, OID_AUTO, ifdescr_maxlen, CTLFLAG_RW,
 	&ifdescr_maxlen, 0,
 	"administrative maximum length for interface description");
 
 static MALLOC_DEFINE(M_IFDESCR, "ifdescr", "ifnet descriptions");
 
 /* global sx for non-critical path ifdescr */
 static struct sx ifdescr_sx;
 SX_SYSINIT(ifdescr_sx, &ifdescr_sx, "ifnet descr");
 
 void	(*ng_ether_link_state_p)(struct ifnet *ifp, int state);
 void	(*lagg_linkstate_p)(struct ifnet *ifp, int state);
 /* These are external hooks for CARP. */
 void	(*carp_linkstate_p)(struct ifnet *ifp);
 void	(*carp_demote_adj_p)(int, char *);
 int	(*carp_master_p)(struct ifaddr *);
 #if defined(INET) || defined(INET6)
 int	(*carp_forus_p)(struct ifnet *ifp, u_char *dhost);
 int	(*carp_output_p)(struct ifnet *ifp, struct mbuf *m,
     const struct sockaddr *sa);
 int	(*carp_ioctl_p)(struct ifreq *, u_long, struct thread *);   
 int	(*carp_attach_p)(struct ifaddr *, int);
 void	(*carp_detach_p)(struct ifaddr *, bool);
 #endif
 #ifdef INET
 int	(*carp_iamatch_p)(struct ifaddr *, uint8_t **);
 #endif
 #ifdef INET6
 struct ifaddr *(*carp_iamatch6_p)(struct ifnet *ifp, struct in6_addr *taddr6);
 caddr_t	(*carp_macmatch6_p)(struct ifnet *ifp, struct mbuf *m,
     const struct in6_addr *taddr);
 #endif
 
 struct mbuf *(*tbr_dequeue_ptr)(struct ifaltq *, int) = NULL;
 
 /*
  * XXX: Style; these should be sorted alphabetically, and unprototyped
  * static functions should be prototyped. Currently they are sorted by
  * declaration order.
  */
 static void	if_attachdomain(void *);
 static void	if_attachdomain1(struct ifnet *);
 static int	ifconf(u_long, caddr_t);
 static void	*if_grow(void);
 static void	if_input_default(struct ifnet *, struct mbuf *);
 static int	if_requestencap_default(struct ifnet *, struct if_encap_req *);
 static void	if_route(struct ifnet *, int flag, int fam);
 static int	if_setflag(struct ifnet *, int, int, int *, int);
 static int	if_transmit(struct ifnet *ifp, struct mbuf *m);
 static void	if_unroute(struct ifnet *, int flag, int fam);
 static int	if_delmulti_locked(struct ifnet *, struct ifmultiaddr *, int);
 static void	do_link_state_change(void *, int);
 static int	if_getgroup(struct ifgroupreq *, struct ifnet *);
 static int	if_getgroupmembers(struct ifgroupreq *);
 static void	if_delgroups(struct ifnet *);
 static void	if_attach_internal(struct ifnet *, int, struct if_clone *);
 static int	if_detach_internal(struct ifnet *, int, struct if_clone **);
 static void	if_siocaddmulti(void *, int);
 #ifdef VIMAGE
 static int	if_vmove(struct ifnet *, struct vnet *);
 #endif
 
 #ifdef INET6
 /*
  * XXX: declare here to avoid to include many inet6 related files..
  * should be more generalized?
  */
 extern void	nd6_setmtu(struct ifnet *);
 #endif
 
 /* ipsec helper hooks */
 VNET_DEFINE(struct hhook_head *, ipsec_hhh_in[HHOOK_IPSEC_COUNT]);
 VNET_DEFINE(struct hhook_head *, ipsec_hhh_out[HHOOK_IPSEC_COUNT]);
 
 VNET_DEFINE(int, if_index);
 int	ifqmaxlen = IFQ_MAXLEN;
 VNET_DEFINE(struct ifnethead, ifnet);	/* depend on static init XXX */
 VNET_DEFINE(struct ifgrouphead, ifg_head);
 
 VNET_DEFINE_STATIC(int, if_indexlim) = 8;
 
 /* Table of ifnet by index. */
 VNET_DEFINE(struct ifnet **, ifindex_table);
 
 #define	V_if_indexlim		VNET(if_indexlim)
 #define	V_ifindex_table		VNET(ifindex_table)
 
 /*
  * The global network interface list (V_ifnet) and related state (such as
  * if_index, if_indexlim, and ifindex_table) are protected by an sxlock and
   * an rwlock.  Either may be acquired shared to stabilize the list, but both
   * must be acquired writable to modify the list.  This model allows us both
   * to stabilize the interface list during interrupt thread processing and to
   * stabilize it over long-running ioctls, without introducing priority
   * inversions and deadlocks.
  */
 struct rwlock ifnet_rwlock;
 RW_SYSINIT_FLAGS(ifnet_rw, &ifnet_rwlock, "ifnet_rw", RW_RECURSE);
 struct sx ifnet_sxlock;
 SX_SYSINIT_FLAGS(ifnet_sx, &ifnet_sxlock, "ifnet_sx", SX_RECURSE);
 
 /*
  * The allocation of network interfaces is a rather non-atomic affair; we
  * need to select an index before we are ready to expose the interface for
   * use, so we will use this pointer value to indicate reservation.
  */
 #define	IFNET_HOLD	(void *)(uintptr_t)(-1)
 
+#ifdef VIMAGE
+#define	VNET_IS_SHUTTING_DOWN(_vnet)					\
+    ((_vnet)->vnet_shutdown && (_vnet)->vnet_state < SI_SUB_VNET_DONE)
+#endif
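 +
 +/*
 + * A vnet only counts as shutting down here while both conditions hold:
 + * its vnet_shutdown flag is set and its recorded state is still below
 + * SI_SUB_VNET_DONE, the subsystem level at which vnet_if_return() below
 + * is registered.
 + */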
+
 static	if_com_alloc_t *if_com_alloc[256];
 static	if_com_free_t *if_com_free[256];
 
 static MALLOC_DEFINE(M_IFNET, "ifnet", "interface internals");
 MALLOC_DEFINE(M_IFADDR, "ifaddr", "interface address");
 MALLOC_DEFINE(M_IFMADDR, "ether_multi", "link-level multicast address");
 
 struct ifnet *
 ifnet_byindex(u_short idx)
 {
 	struct ifnet *ifp;
 
 	if (__predict_false(idx > V_if_index))
 		return (NULL);
 
 	ifp = *(struct ifnet * const volatile *)(V_ifindex_table + idx);
 	return (__predict_false(ifp == IFNET_HOLD) ? NULL : ifp);
 }
 
 struct ifnet *
 ifnet_byindex_ref(u_short idx)
 {
 	struct ifnet *ifp;
 
 	NET_EPOCH_ASSERT();
 
 	ifp = ifnet_byindex(idx);
 	if (ifp == NULL || (ifp->if_flags & IFF_DYING))
 		return (NULL);
 	if_ref(ifp);
 	return (ifp);
 }
 
 /*
   * Allocate an ifindex array entry; return the new index on success, or
   * USHRT_MAX on failure, in which case if_grow() has been called and the
   * caller must free *old after an epoch wait and then retry.
  */
 static u_short
 ifindex_alloc(void **old)
 {
 	u_short idx;
 
 	IFNET_WLOCK_ASSERT();
 	/*
 	 * Try to find an empty slot below V_if_index.  If we fail, take the
 	 * next slot.
 	 */
 	for (idx = 1; idx <= V_if_index; idx++) {
 		if (V_ifindex_table[idx] == NULL)
 			break;
 	}
 
 	/* Catch if_index overflow. */
 	if (idx >= V_if_indexlim) {
 		*old = if_grow();
 		return (USHRT_MAX);
 	}
 	if (idx > V_if_index)
 		V_if_index = idx;
 	return (idx);
 }
 
 static void
 ifindex_free_locked(u_short idx)
 {
 
 	IFNET_WLOCK_ASSERT();
 
 	V_ifindex_table[idx] = NULL;
 	while (V_if_index > 0 &&
 	    V_ifindex_table[V_if_index] == NULL)
 		V_if_index--;
 }
 
 static void
 ifindex_free(u_short idx)
 {
 
 	IFNET_WLOCK();
 	ifindex_free_locked(idx);
 	IFNET_WUNLOCK();
 }
 
 static void
 ifnet_setbyindex(u_short idx, struct ifnet *ifp)
 {
 
 	V_ifindex_table[idx] = ifp;
 }
 
 struct ifaddr *
 ifaddr_byindex(u_short idx)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa = NULL;
 
 	NET_EPOCH_ASSERT();
 
 	ifp = ifnet_byindex(idx);
 	if (ifp != NULL && (ifa = ifp->if_addr) != NULL)
 		ifa_ref(ifa);
 	return (ifa);
 }
 
 /*
  * Network interface utility routines.
  *
  * Routines with ifa_ifwith* names take sockaddr *'s as
  * parameters.
  */
 
 static void
 vnet_if_init(const void *unused __unused)
 {
 	void *old;
 
 	CK_STAILQ_INIT(&V_ifnet);
 	CK_STAILQ_INIT(&V_ifg_head);
 	IFNET_WLOCK();
 	old = if_grow();				/* create initial table */
 	IFNET_WUNLOCK();
 	epoch_wait_preempt(net_epoch_preempt);
 	free(old, M_IFNET);
 	vnet_if_clone_init();
 }
 VNET_SYSINIT(vnet_if_init, SI_SUB_INIT_IF, SI_ORDER_SECOND, vnet_if_init,
     NULL);
 
 #ifdef VIMAGE
 static void
 vnet_if_uninit(const void *unused __unused)
 {
 
 	VNET_ASSERT(CK_STAILQ_EMPTY(&V_ifnet), ("%s:%d tailq &V_ifnet=%p "
 	    "not empty", __func__, __LINE__, &V_ifnet));
 	VNET_ASSERT(CK_STAILQ_EMPTY(&V_ifg_head), ("%s:%d tailq &V_ifg_head=%p "
 	    "not empty", __func__, __LINE__, &V_ifg_head));
 
 	free((caddr_t)V_ifindex_table, M_IFNET);
 }
 VNET_SYSUNINIT(vnet_if_uninit, SI_SUB_INIT_IF, SI_ORDER_FIRST,
     vnet_if_uninit, NULL);
 
 static void
 vnet_if_return(const void *unused __unused)
 {
 	struct ifnet *ifp, *nifp;
 
 	/* Return all inherited interfaces to their parent vnets. */
 	CK_STAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) {
 		if (ifp->if_home_vnet != ifp->if_vnet)
 			if_vmove(ifp, ifp->if_home_vnet);
 	}
 }
 VNET_SYSUNINIT(vnet_if_return, SI_SUB_VNET_DONE, SI_ORDER_ANY,
     vnet_if_return, NULL);
 #endif
 
 
 static void *
 if_grow(void)
 {
 	int oldlim;
 	u_int n;
 	struct ifnet **e;
 	void *old;
 
 	old = NULL;
 	IFNET_WLOCK_ASSERT();
 	oldlim = V_if_indexlim;
 	IFNET_WUNLOCK();
 	n = (oldlim << 1) * sizeof(*e);
 	e = malloc(n, M_IFNET, M_WAITOK | M_ZERO);
 	IFNET_WLOCK();
 	if (V_if_indexlim != oldlim) {
 		free(e, M_IFNET);
 		return (NULL);
 	}
 	if (V_ifindex_table != NULL) {
 		memcpy((caddr_t)e, (caddr_t)V_ifindex_table, n/2);
 		old = V_ifindex_table;
 	}
 	V_if_indexlim <<= 1;
 	V_ifindex_table = e;
 	return (old);
 }
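
  /*
   * Note on the contract of if_grow() above: it doubles the ifindex table
   * while the ifnet write lock is briefly dropped for the M_WAITOK
   * allocation.  If another thread grew the table in the meantime, the
   * fresh allocation is freed and NULL is returned; otherwise the new
   * table is installed and the previous one is handed back so the caller
   * can free() it after an epoch_wait_preempt() (see vnet_if_init() and
   * if_alloc_domain()).
   */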
 
 /*
  * Allocate a struct ifnet and an index for an interface.  A layer 2
  * common structure will also be allocated if an allocation routine is
  * registered for the passed type.
  */
 struct ifnet *
 if_alloc_domain(u_char type, int numa_domain)
 {
 	struct ifnet *ifp;
 	u_short idx;
 	void *old;
 
 	KASSERT(numa_domain <= IF_NODOM, ("numa_domain too large"));
 	if (numa_domain == IF_NODOM)
 		ifp = malloc(sizeof(struct ifnet), M_IFNET,
 		    M_WAITOK | M_ZERO);
 	else
 		ifp = malloc_domainset(sizeof(struct ifnet), M_IFNET,
 		    DOMAINSET_PREF(numa_domain), M_WAITOK | M_ZERO);
  restart:
 	IFNET_WLOCK();
 	idx = ifindex_alloc(&old);
 	if (__predict_false(idx == USHRT_MAX)) {
 		IFNET_WUNLOCK();
 		epoch_wait_preempt(net_epoch_preempt);
 		free(old, M_IFNET);
 		goto restart;
 	}
 	ifnet_setbyindex(idx, IFNET_HOLD);
 	IFNET_WUNLOCK();
 	ifp->if_index = idx;
 	ifp->if_type = type;
 	ifp->if_alloctype = type;
 	ifp->if_numa_domain = numa_domain;
 #ifdef VIMAGE
 	ifp->if_vnet = curvnet;
 #endif
 	/* XXX */
 	ifp->if_flags |= IFF_NEEDSEPOCH;
 	if (if_com_alloc[type] != NULL) {
 		ifp->if_l2com = if_com_alloc[type](type, ifp);
 		if (ifp->if_l2com == NULL) {
 			free(ifp, M_IFNET);
 			ifindex_free(idx);
 			return (NULL);
 		}
 	}
 
 	IF_ADDR_LOCK_INIT(ifp);
 	TASK_INIT(&ifp->if_linktask, 0, do_link_state_change, ifp);
 	TASK_INIT(&ifp->if_addmultitask, 0, if_siocaddmulti, ifp);
 	ifp->if_afdata_initialized = 0;
 	IF_AFDATA_LOCK_INIT(ifp);
 	CK_STAILQ_INIT(&ifp->if_addrhead);
 	CK_STAILQ_INIT(&ifp->if_multiaddrs);
 	CK_STAILQ_INIT(&ifp->if_groups);
 #ifdef MAC
 	mac_ifnet_init(ifp);
 #endif
 	ifq_init(&ifp->if_snd, ifp);
 
 	refcount_init(&ifp->if_refcount, 1);	/* Index reference. */
 	for (int i = 0; i < IFCOUNTERS; i++)
 		ifp->if_counters[i] = counter_u64_alloc(M_WAITOK);
 	ifp->if_get_counter = if_get_counter_default;
 	ifp->if_pcp = IFNET_PCP_NONE;
 	ifnet_setbyindex(ifp->if_index, ifp);
 	return (ifp);
 }
 
 struct ifnet *
 if_alloc_dev(u_char type, device_t dev)
 {
 	int numa_domain;
 
 	if (dev == NULL || bus_get_domain(dev, &numa_domain) != 0)
 		return (if_alloc_domain(type, IF_NODOM));
 	return (if_alloc_domain(type, numa_domain));
 }
 
 struct ifnet *
 if_alloc(u_char type)
 {
 
 	return (if_alloc_domain(type, IF_NODOM));
 }
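
  /*
   * A minimal driver-side usage sketch for the allocation KPI above.  The
   * softc, unit, callbacks and "example" name are illustrative only, and
   * if_initname(), ether_ifattach() and ether_ifdetach() live elsewhere in
   * the stack:
   *
   *	ifp = if_alloc(IFT_ETHER);
   *	if_initname(ifp, "example", unit);
   *	ifp->if_softc = sc;
   *	ifp->if_init = example_init;
   *	ifp->if_ioctl = example_ioctl;
   *	ifp->if_transmit = example_transmit;
   *	ifp->if_qflush = example_qflush;
   *	ether_ifattach(ifp, sc->lladdr);
   *	...
   *	ether_ifdetach(ifp);
   *	if_free(ifp);
   */
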
 /*
   * Do the actual work of freeing a struct ifnet and its layer 2 common
  * structure.  This call is made when the last reference to an
  * interface is released.
  */
 static void
 if_free_internal(struct ifnet *ifp)
 {
 
 	KASSERT((ifp->if_flags & IFF_DYING),
 	    ("if_free_internal: interface not dying"));
 
 	if (if_com_free[ifp->if_alloctype] != NULL)
 		if_com_free[ifp->if_alloctype](ifp->if_l2com,
 		    ifp->if_alloctype);
 
 #ifdef MAC
 	mac_ifnet_destroy(ifp);
 #endif /* MAC */
 	IF_AFDATA_DESTROY(ifp);
 	IF_ADDR_LOCK_DESTROY(ifp);
 	ifq_delete(&ifp->if_snd);
 
 	for (int i = 0; i < IFCOUNTERS; i++)
 		counter_u64_free(ifp->if_counters[i]);
 
 	free(ifp->if_description, M_IFDESCR);
 	free(ifp->if_hw_addr, M_IFADDR);
 	if (ifp->if_numa_domain == IF_NODOM)
 		free(ifp, M_IFNET);
 	else
 		free_domain(ifp, M_IFNET);
 }
 
 static void
 if_destroy(epoch_context_t ctx)
 {
 	struct ifnet *ifp;
 
 	ifp = __containerof(ctx, struct ifnet, if_epoch_ctx);
 	if_free_internal(ifp);
 }
 
 /*
  * Deregister an interface and free the associated storage.
  */
 void
 if_free(struct ifnet *ifp)
 {
 
 	ifp->if_flags |= IFF_DYING;			/* XXX: Locking */
 
 	CURVNET_SET_QUIET(ifp->if_vnet);
 	IFNET_WLOCK();
 	KASSERT(ifp == ifnet_byindex(ifp->if_index),
 	    ("%s: freeing unallocated ifnet", ifp->if_xname));
 
 	ifindex_free_locked(ifp->if_index);
 	IFNET_WUNLOCK();
 
 	if (refcount_release(&ifp->if_refcount))
 		NET_EPOCH_CALL(if_destroy, &ifp->if_epoch_ctx);
 	CURVNET_RESTORE();
 }
 
 /*
  * Interfaces to keep an ifnet type-stable despite the possibility of the
  * driver calling if_free().  If there are additional references, we defer
  * freeing the underlying data structure.
  */
 void
 if_ref(struct ifnet *ifp)
 {
 
 	/* We don't assert the ifnet list lock here, but arguably should. */
 	refcount_acquire(&ifp->if_refcount);
 }
 
 void
 if_rele(struct ifnet *ifp)
 {
 
 	if (!refcount_release(&ifp->if_refcount))
 		return;
 	NET_EPOCH_CALL(if_destroy, &ifp->if_epoch_ctx);
 }
 
 void
 ifq_init(struct ifaltq *ifq, struct ifnet *ifp)
 {
 	
 	mtx_init(&ifq->ifq_mtx, ifp->if_xname, "if send queue", MTX_DEF);
 
 	if (ifq->ifq_maxlen == 0) 
 		ifq->ifq_maxlen = ifqmaxlen;
 
 	ifq->altq_type = 0;
 	ifq->altq_disc = NULL;
 	ifq->altq_flags &= ALTQF_CANTCHANGE;
 	ifq->altq_tbr  = NULL;
 	ifq->altq_ifp  = ifp;
 }
 
 void
 ifq_delete(struct ifaltq *ifq)
 {
 	mtx_destroy(&ifq->ifq_mtx);
 }
 
 /*
  * Perform generic interface initialization tasks and attach the interface
   * to the list of "active" interfaces.  If the vmove flag is set on entry
   * to if_attach_internal(), perform only a limited subset of initialization
   * tasks, given that we are moving an ifnet that has already been fully
   * initialized from one vnet to another.
  *
  * Note that if_detach_internal() removes group membership unconditionally
   * even when the vmove flag is set, and if_attach_internal() adds only
   * IFG_ALL.  Thus, when if_vmove() is applied to a cloned interface,
   * membership in the group named ifc->ifc_name (which every cloned
   * interface joins) is lost.  To recover it after if_detach_internal()
   * and if_attach_internal(), the cloner should be passed to
   * if_attach_internal() via ifc.  If it is non-NULL, if_attach_internal()
   * attempts to join a group whose name is ifc->ifc_name.
  *
  * XXX:
  *  - The decision to return void and thus require this function to
  *    succeed is questionable.
   *  - We should probably do more sanity checking.  For instance we don't
   *    do anything to ensure if_xname is unique or non-empty.
  */
 void
 if_attach(struct ifnet *ifp)
 {
 
 	if_attach_internal(ifp, 0, NULL);
 }
 
 /*
  * Compute the least common TSO limit.
  */
 void
 if_hw_tsomax_common(if_t ifp, struct ifnet_hw_tsomax *pmax)
 {
 	/*
 	 * 1) If there is no limit currently, take the limit from
 	 * the network adapter.
 	 *
 	 * 2) If the network adapter has a limit below the current
 	 * limit, apply it.
 	 */
 	if (pmax->tsomaxbytes == 0 || (ifp->if_hw_tsomax != 0 &&
 	    ifp->if_hw_tsomax < pmax->tsomaxbytes)) {
 		pmax->tsomaxbytes = ifp->if_hw_tsomax;
 	}
 	if (pmax->tsomaxsegcount == 0 || (ifp->if_hw_tsomaxsegcount != 0 &&
 	    ifp->if_hw_tsomaxsegcount < pmax->tsomaxsegcount)) {
 		pmax->tsomaxsegcount = ifp->if_hw_tsomaxsegcount;
 	}
 	if (pmax->tsomaxsegsize == 0 || (ifp->if_hw_tsomaxsegsize != 0 &&
 	    ifp->if_hw_tsomaxsegsize < pmax->tsomaxsegsize)) {
 		pmax->tsomaxsegsize = ifp->if_hw_tsomaxsegsize;
 	}
 }
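
  /*
   * Worked example with made-up adapter limits: starting from a zeroed
   * ifnet_hw_tsomax, a first port advertising 65536/35/2048 fills in all
   * three fields; a second port advertising 32768 bytes, no segment count
   * limit (0) and 4096-byte segments only lowers tsomaxbytes to 32768,
   * since a zero adapter value never overrides an existing limit and 4096
   * is not smaller than 2048.  The common result is 32768/35/2048.
   */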
 
 /*
   * Update the TSO limit of a network adapter.
   *
   * Returns zero if nothing changed, non-zero otherwise.
  */
 int
 if_hw_tsomax_update(if_t ifp, struct ifnet_hw_tsomax *pmax)
 {
 	int retval = 0;
 	if (ifp->if_hw_tsomax != pmax->tsomaxbytes) {
 		ifp->if_hw_tsomax = pmax->tsomaxbytes;
 		retval++;
 	}
 	if (ifp->if_hw_tsomaxsegsize != pmax->tsomaxsegsize) {
 		ifp->if_hw_tsomaxsegsize = pmax->tsomaxsegsize;
 		retval++;
 	}
 	if (ifp->if_hw_tsomaxsegcount != pmax->tsomaxsegcount) {
 		ifp->if_hw_tsomaxsegcount = pmax->tsomaxsegcount;
 		retval++;
 	}
 	return (retval);
 }
 
 static void
 if_attach_internal(struct ifnet *ifp, int vmove, struct if_clone *ifc)
 {
 	unsigned socksize, ifasize;
 	int namelen, masklen;
 	struct sockaddr_dl *sdl;
 	struct ifaddr *ifa;
 
 	if (ifp->if_index == 0 || ifp != ifnet_byindex(ifp->if_index))
 		panic ("%s: BUG: if_attach called without if_alloc'd input()\n",
 		    ifp->if_xname);
 
 #ifdef VIMAGE
 	ifp->if_vnet = curvnet;
 	if (ifp->if_home_vnet == NULL)
 		ifp->if_home_vnet = curvnet;
 #endif
 
 	if_addgroup(ifp, IFG_ALL);
 
 	/* Restore group membership for cloned interfaces. */
 	if (vmove && ifc != NULL)
 		if_clone_addgroup(ifp, ifc);
 
 	getmicrotime(&ifp->if_lastchange);
 	ifp->if_epoch = time_uptime;
 
 	KASSERT((ifp->if_transmit == NULL && ifp->if_qflush == NULL) ||
 	    (ifp->if_transmit != NULL && ifp->if_qflush != NULL),
 	    ("transmit and qflush must both either be set or both be NULL"));
 	if (ifp->if_transmit == NULL) {
 		ifp->if_transmit = if_transmit;
 		ifp->if_qflush = if_qflush;
 	}
 	if (ifp->if_input == NULL)
 		ifp->if_input = if_input_default;
 
 	if (ifp->if_requestencap == NULL)
 		ifp->if_requestencap = if_requestencap_default;
 
 	if (!vmove) {
 #ifdef MAC
 		mac_ifnet_create(ifp);
 #endif
 
 		/*
 		 * Create a Link Level name for this device.
 		 */
 		namelen = strlen(ifp->if_xname);
 		/*
  		 * Always save enough space for any possible name so we
 		 * can do a rename in place later.
 		 */
 		masklen = offsetof(struct sockaddr_dl, sdl_data[0]) + IFNAMSIZ;
 		socksize = masklen + ifp->if_addrlen;
 		if (socksize < sizeof(*sdl))
 			socksize = sizeof(*sdl);
 		socksize = roundup2(socksize, sizeof(long));
 		ifasize = sizeof(*ifa) + 2 * socksize;
 		ifa = ifa_alloc(ifasize, M_WAITOK);
 		sdl = (struct sockaddr_dl *)(ifa + 1);
 		sdl->sdl_len = socksize;
 		sdl->sdl_family = AF_LINK;
 		bcopy(ifp->if_xname, sdl->sdl_data, namelen);
 		sdl->sdl_nlen = namelen;
 		sdl->sdl_index = ifp->if_index;
 		sdl->sdl_type = ifp->if_type;
 		ifp->if_addr = ifa;
 		ifa->ifa_ifp = ifp;
 		ifa->ifa_addr = (struct sockaddr *)sdl;
 		sdl = (struct sockaddr_dl *)(socksize + (caddr_t)sdl);
 		ifa->ifa_netmask = (struct sockaddr *)sdl;
 		sdl->sdl_len = masklen;
 		while (namelen != 0)
 			sdl->sdl_data[--namelen] = 0xff;
 		CK_STAILQ_INSERT_HEAD(&ifp->if_addrhead, ifa, ifa_link);
 		/* Reliably crash if used uninitialized. */
 		ifp->if_broadcastaddr = NULL;
 
 		if (ifp->if_type == IFT_ETHER) {
 			ifp->if_hw_addr = malloc(ifp->if_addrlen, M_IFADDR,
 			    M_WAITOK | M_ZERO);
 		}
 
 #if defined(INET) || defined(INET6)
 		/* Use defaults for TSO, if nothing is set */
 		if (ifp->if_hw_tsomax == 0 &&
 		    ifp->if_hw_tsomaxsegcount == 0 &&
 		    ifp->if_hw_tsomaxsegsize == 0) {
 			/*
  			 * The TSO defaults need to be such that an
 			 * NFS mbuf list of 35 mbufs totalling just
 			 * below 64K works and that a chain of mbufs
 			 * can be defragged into at most 32 segments:
 			 */
 			ifp->if_hw_tsomax = min(IP_MAXPACKET, (32 * MCLBYTES) -
 			    (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN));
 			ifp->if_hw_tsomaxsegcount = 35;
 			ifp->if_hw_tsomaxsegsize = 2048;	/* 2K */
 
 			/* XXX some drivers set IFCAP_TSO after ethernet attach */
 			if (ifp->if_capabilities & IFCAP_TSO) {
 				if_printf(ifp, "Using defaults for TSO: %u/%u/%u\n",
 				    ifp->if_hw_tsomax,
 				    ifp->if_hw_tsomaxsegcount,
 				    ifp->if_hw_tsomaxsegsize);
 			}
 		}
 #endif
 	}
 #ifdef VIMAGE
 	else {
 		/*
 		 * Update the interface index in the link layer address
 		 * of the interface.
 		 */
 		for (ifa = ifp->if_addr; ifa != NULL;
 		    ifa = CK_STAILQ_NEXT(ifa, ifa_link)) {
 			if (ifa->ifa_addr->sa_family == AF_LINK) {
 				sdl = (struct sockaddr_dl *)ifa->ifa_addr;
 				sdl->sdl_index = ifp->if_index;
 			}
 		}
 	}
 #endif
 
 	IFNET_WLOCK();
 	CK_STAILQ_INSERT_TAIL(&V_ifnet, ifp, if_link);
 #ifdef VIMAGE
 	curvnet->vnet_ifcnt++;
 #endif
 	IFNET_WUNLOCK();
 
 	if (domain_init_status >= 2)
 		if_attachdomain1(ifp);
 
 	EVENTHANDLER_INVOKE(ifnet_arrival_event, ifp);
 	if (IS_DEFAULT_VNET(curvnet))
 		devctl_notify("IFNET", ifp->if_xname, "ATTACH", NULL);
 
 	/* Announce the interface. */
 	rt_ifannouncemsg(ifp, IFAN_ARRIVAL);
 }
 
 static void
 if_epochalloc(void *dummy __unused)
 {
 
 	net_epoch_preempt = epoch_alloc("Net preemptible", EPOCH_PREEMPT);
 }
 SYSINIT(ifepochalloc, SI_SUB_EPOCH, SI_ORDER_ANY, if_epochalloc, NULL);
 
 static void
 if_attachdomain(void *dummy)
 {
 	struct ifnet *ifp;
 
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link)
 		if_attachdomain1(ifp);
 }
 SYSINIT(domainifattach, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_SECOND,
     if_attachdomain, NULL);
 
 static void
 if_attachdomain1(struct ifnet *ifp)
 {
 	struct domain *dp;
 
 	/*
 	 * Since dp->dom_ifattach calls malloc() with M_WAITOK, we
  	 * cannot fully lock ifp->if_afdata initialization.
 	 */
 	IF_AFDATA_LOCK(ifp);
 	if (ifp->if_afdata_initialized >= domain_init_status) {
 		IF_AFDATA_UNLOCK(ifp);
 		log(LOG_WARNING, "%s called more than once on %s\n",
 		    __func__, ifp->if_xname);
 		return;
 	}
 	ifp->if_afdata_initialized = domain_init_status;
 	IF_AFDATA_UNLOCK(ifp);
 
 	/* address family dependent data region */
 	bzero(ifp->if_afdata, sizeof(ifp->if_afdata));
 	for (dp = domains; dp; dp = dp->dom_next) {
 		if (dp->dom_ifattach)
 			ifp->if_afdata[dp->dom_family] =
 			    (*dp->dom_ifattach)(ifp);
 	}
 }
 
 /*
  * Remove any unicast or broadcast network addresses from an interface.
  */
 void
 if_purgeaddrs(struct ifnet *ifp)
 {
 	struct ifaddr *ifa;
 
 	while (1) {
 		struct epoch_tracker et;
 
 		NET_EPOCH_ENTER(et);
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			if (ifa->ifa_addr->sa_family != AF_LINK)
 				break;
 		}
 		NET_EPOCH_EXIT(et);
 
 		if (ifa == NULL)
 			break;
 #ifdef INET
 		/* XXX: Ugly!! ad hoc just for INET */
 		if (ifa->ifa_addr->sa_family == AF_INET) {
 			struct ifaliasreq ifr;
 
 			bzero(&ifr, sizeof(ifr));
 			ifr.ifra_addr = *ifa->ifa_addr;
 			if (ifa->ifa_dstaddr)
 				ifr.ifra_broadaddr = *ifa->ifa_dstaddr;
 			if (in_control(NULL, SIOCDIFADDR, (caddr_t)&ifr, ifp,
 			    NULL) == 0)
 				continue;
 		}
 #endif /* INET */
 #ifdef INET6
 		if (ifa->ifa_addr->sa_family == AF_INET6) {
 			in6_purgeaddr(ifa);
 			/* ifp_addrhead is already updated */
 			continue;
 		}
 #endif /* INET6 */
 		IF_ADDR_WLOCK(ifp);
 		CK_STAILQ_REMOVE(&ifp->if_addrhead, ifa, ifaddr, ifa_link);
 		IF_ADDR_WUNLOCK(ifp);
 		ifa_free(ifa);
 	}
 }
 
 /*
  * Remove any multicast network addresses from an interface when an ifnet
  * is going away.
  */
 static void
 if_purgemaddrs(struct ifnet *ifp)
 {
 	struct ifmultiaddr *ifma;
 
 	IF_ADDR_WLOCK(ifp);
 	while (!CK_STAILQ_EMPTY(&ifp->if_multiaddrs)) {
 		ifma = CK_STAILQ_FIRST(&ifp->if_multiaddrs);
 		CK_STAILQ_REMOVE(&ifp->if_multiaddrs, ifma, ifmultiaddr, ifma_link);
 		if_delmulti_locked(ifp, ifma, 1);
 	}
 	IF_ADDR_WUNLOCK(ifp);
 }
 
 /*
  * Detach an interface, removing it from the list of "active" interfaces.
  * If vmove flag is set on entry to if_detach_internal(), perform only a
  * limited subset of cleanup tasks, given that we are moving an ifnet from
  * one vnet to another, where it must be fully operational.
  *
  * XXXRW: There are some significant questions about event ordering, and
  * how to prevent things from starting to use the interface during detach.
  */
 void
 if_detach(struct ifnet *ifp)
 {
 
 	CURVNET_SET_QUIET(ifp->if_vnet);
 	if_detach_internal(ifp, 0, NULL);
 	CURVNET_RESTORE();
 }
 
 /*
  * The vmove flag, if set, indicates that we are called from a callpath
  * that is moving an interface to a different vnet instance.
  *
   * The shutdown flag, if set, indicates that we are called in the
   * process of shutting down a vnet instance; it is derived from
   * VNET_IS_SHUTTING_DOWN() on the interface's vnet.  Note: we can be
   * called on a vnet instance shutdown without this flag being set,
   * e.g., when the cloned interfaces are destroyed as the first step of
   * teardown.
  */
 static int
 if_detach_internal(struct ifnet *ifp, int vmove, struct if_clone **ifcp)
 {
 	struct ifaddr *ifa;
 	int i;
 	struct domain *dp;
  	struct ifnet *iter;
  	int found = 0;
 #ifdef VIMAGE
 	bool shutdown;
 
-	shutdown = ifp->if_vnet->vnet_shutdown;
+	shutdown = VNET_IS_SHUTTING_DOWN(ifp->if_vnet);
 #endif
 	IFNET_WLOCK();
 	CK_STAILQ_FOREACH(iter, &V_ifnet, if_link)
 		if (iter == ifp) {
 			CK_STAILQ_REMOVE(&V_ifnet, ifp, ifnet, if_link);
 			if (!vmove)
 				ifp->if_flags |= IFF_DYING;
 			found = 1;
 			break;
 		}
 	IFNET_WUNLOCK();
 	if (!found) {
 		/*
 		 * While we would want to panic here, we cannot
 		 * guarantee that the interface is indeed still on
 		 * the list given we don't hold locks all the way.
 		 */
 		return (ENOENT);
 #if 0
 		if (vmove)
 			panic("%s: ifp=%p not on the ifnet tailq %p",
 			    __func__, ifp, &V_ifnet);
 		else
 			return; /* XXX this should panic as well? */
 #endif
 	}
 
 	/*
 	 * At this point we know the interface still was on the ifnet list
 	 * and we removed it so we are in a stable state.
 	 */
 #ifdef VIMAGE
 	curvnet->vnet_ifcnt--;
 #endif
 	epoch_wait_preempt(net_epoch_preempt);
 
 	/*
 	 * Ensure all pending EPOCH(9) callbacks have been executed. This
  	 * fixes issues with the late destruction of multicast options,
  	 * which leads to leave-group calls that in turn access the
  	 * ifnet structure they belong to.
 	 */
 	epoch_drain_callbacks(net_epoch_preempt);
 
 	/*
 	 * In any case (destroy or vmove) detach us from the groups
 	 * and remove/wait for pending events on the taskq.
 	 * XXX-BZ in theory an interface could still enqueue a taskq change?
 	 */
 	if_delgroups(ifp);
 
 	taskqueue_drain(taskqueue_swi, &ifp->if_linktask);
 	taskqueue_drain(taskqueue_swi, &ifp->if_addmultitask);
 
 	/*
  	 * Check if this is a cloned interface or not.  This must be done
  	 * even if we are shutting down, as an if_vmove_reclaim() would
  	 * move the ifp and if_clone_addgroup() would otherwise be handed
  	 * a corrupted string through a gibberish pointer.
 	 */
 	if (vmove && ifcp != NULL)
 		*ifcp = if_clone_findifc(ifp);
 
 	if_down(ifp);
 
 #ifdef VIMAGE
 	/*
 	 * On VNET shutdown abort here as the stack teardown will do all
 	 * the work top-down for us.
 	 */
 	if (shutdown) {
 		/* Give interface users the chance to clean up. */
 		EVENTHANDLER_INVOKE(ifnet_departure_event, ifp);
 
 		/*
 		 * In case of a vmove we are done here without error.
 		 * If we signaled an error it would lead to the same
 		 * abort as if we had not found the ifnet anymore.
 		 * if_detach() calls us in void context and does not care
 		 * about an early abort notification, so life is splendid :)
 		 */
 		goto finish_vnet_shutdown;
 	}
 #endif
 
 	/*
 	 * At this point we are not tearing down a VNET and are either
 	 * going to destroy or vmove the interface and have to cleanup
 	 * accordingly.
 	 */
 
 	/*
 	 * Remove routes and flush queues.
 	 */
 #ifdef ALTQ
 	if (ALTQ_IS_ENABLED(&ifp->if_snd))
 		altq_disable(&ifp->if_snd);
 	if (ALTQ_IS_ATTACHED(&ifp->if_snd))
 		altq_detach(&ifp->if_snd);
 #endif
 
 	if_purgeaddrs(ifp);
 
 #ifdef INET
 	in_ifdetach(ifp);
 #endif
 
 #ifdef INET6
 	/*
 	 * Remove all IPv6 kernel structs related to ifp.  This should be done
 	 * before removing routing entries below, since IPv6 interface direct
 	 * routes are expected to be removed by the IPv6-specific kernel API.
 	 * Otherwise, the kernel will detect an inconsistency and complain.
 	 */
 	in6_ifdetach(ifp);
 #endif
 	if_purgemaddrs(ifp);
 
 	/* Announce that the interface is gone. */
 	rt_ifannouncemsg(ifp, IFAN_DEPARTURE);
 	EVENTHANDLER_INVOKE(ifnet_departure_event, ifp);
 	if (IS_DEFAULT_VNET(curvnet))
 		devctl_notify("IFNET", ifp->if_xname, "DETACH", NULL);
 
 	if (!vmove) {
 		/*
 		 * Prevent further calls into the device driver via ifnet.
 		 */
 		if_dead(ifp);
 
 		/*
 		 * Clean up all addresses.
 		 */
 		IF_ADDR_WLOCK(ifp);
 		if (!CK_STAILQ_EMPTY(&ifp->if_addrhead)) {
 			ifa = CK_STAILQ_FIRST(&ifp->if_addrhead);
 			CK_STAILQ_REMOVE(&ifp->if_addrhead, ifa, ifaddr, ifa_link);
 			IF_ADDR_WUNLOCK(ifp);
 			ifa_free(ifa);
 		} else
 			IF_ADDR_WUNLOCK(ifp);
 	}
 
 	rt_flushifroutes(ifp);
 
 #ifdef VIMAGE
 finish_vnet_shutdown:
 #endif
 	/*
 	 * We cannot hold the lock over dom_ifdetach calls as they might
 	 * sleep, for example when trying to drain a callout, thus opening
 	 * up a theoretical race with re-attaching.
 	 */
 	IF_AFDATA_LOCK(ifp);
 	i = ifp->if_afdata_initialized;
 	ifp->if_afdata_initialized = 0;
 	IF_AFDATA_UNLOCK(ifp);
 	for (dp = domains; i > 0 && dp; dp = dp->dom_next) {
 		if (dp->dom_ifdetach && ifp->if_afdata[dp->dom_family]) {
 			(*dp->dom_ifdetach)(ifp,
 			    ifp->if_afdata[dp->dom_family]);
 			ifp->if_afdata[dp->dom_family] = NULL;
 		}
 	}
 
 	return (0);
 }
 
 #ifdef VIMAGE
 /*
  * if_vmove() performs a limited version of if_detach() in current
  * vnet and if_attach()es the ifnet to the vnet specified as 2nd arg.
  * An attempt is made to shrink if_index in the current vnet, find an
  * unused if_index in the target vnet (calling if_grow() if necessary),
  * and finally find an unused if_xname for the target vnet.
  */
 static int
 if_vmove(struct ifnet *ifp, struct vnet *new_vnet)
 {
 	struct if_clone *ifc;
 #ifdef DEV_BPF
 	u_int bif_dlt, bif_hdrlen;
 #endif
 	void *old;
 	int rc;
 
 #ifdef DEV_BPF
  	/*
 	 * if_detach_internal() will call the eventhandler to notify
 	 * interface departure.  That will detach if_bpf.  We need to
 	 * save the dlt and hdrlen so we can re-attach it later.
 	 */
 	bpf_get_bp_params(ifp->if_bpf, &bif_dlt, &bif_hdrlen);
 #endif
 
 	/*
 	 * Detach from current vnet, but preserve LLADDR info, do not
 	 * mark as dead etc. so that the ifnet can be reattached later.
 	 * If we cannot find it, we lost the race to someone else.
 	 */
 	rc = if_detach_internal(ifp, 1, &ifc);
 	if (rc != 0)
 		return (rc);
 
 	/*
 	 * Unlink the ifnet from ifindex_table[] in current vnet, and shrink
 	 * the if_index for that vnet if possible.
 	 *
 	 * NOTE: IFNET_WLOCK/IFNET_WUNLOCK() are assumed to be unvirtualized,
 	 * or we'd lock on one vnet and unlock on another.
 	 */
 	IFNET_WLOCK();
 	ifindex_free_locked(ifp->if_index);
 	IFNET_WUNLOCK();
 
 	/*
 	 * Perform interface-specific reassignment tasks, if provided by
 	 * the driver.
 	 */
 	if (ifp->if_reassign != NULL)
 		ifp->if_reassign(ifp, new_vnet, NULL);
 
 	/*
 	 * Switch to the context of the target vnet.
 	 */
 	CURVNET_SET_QUIET(new_vnet);
  restart:
 	IFNET_WLOCK();
 	ifp->if_index = ifindex_alloc(&old);
 	if (__predict_false(ifp->if_index == USHRT_MAX)) {
 		IFNET_WUNLOCK();
 		epoch_wait_preempt(net_epoch_preempt);
 		free(old, M_IFNET);
 		goto restart;
 	}
 	ifnet_setbyindex(ifp->if_index, ifp);
 	IFNET_WUNLOCK();
 
 	if_attach_internal(ifp, 1, ifc);
 
 #ifdef DEV_BPF
 	if (ifp->if_bpf == NULL)
 		bpfattach(ifp, bif_dlt, bif_hdrlen);
 #endif
 
 	CURVNET_RESTORE();
 	return (0);
 }
 
 /*
  * Move an ifnet to or from another child prison/vnet, specified by the jail id.
  */
 static int
 if_vmove_loan(struct thread *td, struct ifnet *ifp, char *ifname, int jid)
 {
 	struct prison *pr;
 	struct ifnet *difp;
 	int error;
+	bool shutdown;
 
 	/* Try to find the prison within our visibility. */
 	sx_slock(&allprison_lock);
 	pr = prison_find_child(td->td_ucred->cr_prison, jid);
 	sx_sunlock(&allprison_lock);
 	if (pr == NULL)
 		return (ENXIO);
 	prison_hold_locked(pr);
 	mtx_unlock(&pr->pr_mtx);
 
 	/* Do not try to move the iface from and to the same prison. */
 	if (pr->pr_vnet == ifp->if_vnet) {
 		prison_free(pr);
 		return (EEXIST);
 	}
 
 	/* Make sure the named iface does not exist in the dst. prison/vnet. */
 	/* XXX Lock interfaces to avoid races. */
 	CURVNET_SET_QUIET(pr->pr_vnet);
 	difp = ifunit(ifname);
 	if (difp != NULL) {
 		CURVNET_RESTORE();
 		prison_free(pr);
 		return (EEXIST);
 	}
 
 	/* Make sure the VNET is stable. */
-	if (ifp->if_vnet->vnet_shutdown) {
+	shutdown = VNET_IS_SHUTTING_DOWN(ifp->if_vnet);
+	if (shutdown) {
 		CURVNET_RESTORE();
 		prison_free(pr);
 		return (EBUSY);
 	}
 	CURVNET_RESTORE();
 
 	/* Move the interface into the child jail/vnet. */
 	error = if_vmove(ifp, pr->pr_vnet);
 
 	/* Report the new if_xname back to the userland on success. */
 	if (error == 0)
 		sprintf(ifname, "%s", ifp->if_xname);
 
 	prison_free(pr);
 	return (error);
 }
 
 static int
 if_vmove_reclaim(struct thread *td, char *ifname, int jid)
 {
 	struct prison *pr;
 	struct vnet *vnet_dst;
 	struct ifnet *ifp;
 	int error;
+ 	bool shutdown;
 
 	/* Try to find the prison within our visibility. */
 	sx_slock(&allprison_lock);
 	pr = prison_find_child(td->td_ucred->cr_prison, jid);
 	sx_sunlock(&allprison_lock);
 	if (pr == NULL)
 		return (ENXIO);
 	prison_hold_locked(pr);
 	mtx_unlock(&pr->pr_mtx);
 
 	/* Make sure the named iface exists in the source prison/vnet. */
 	CURVNET_SET(pr->pr_vnet);
 	ifp = ifunit(ifname);		/* XXX Lock to avoid races. */
 	if (ifp == NULL) {
 		CURVNET_RESTORE();
 		prison_free(pr);
 		return (ENXIO);
 	}
 
 	/* Do not try to move the iface from and to the same prison. */
 	vnet_dst = TD_TO_VNET(td);
 	if (vnet_dst == ifp->if_vnet) {
 		CURVNET_RESTORE();
 		prison_free(pr);
 		return (EEXIST);
 	}
 
 	/* Make sure the VNET is stable. */
-	if (ifp->if_vnet->vnet_shutdown) {
+	shutdown = VNET_IS_SHUTTING_DOWN(ifp->if_vnet);
+	if (shutdown) {
 		CURVNET_RESTORE();
 		prison_free(pr);
 		return (EBUSY);
 	}
 
 	/* Get interface back from child jail/vnet. */
 	error = if_vmove(ifp, vnet_dst);
 	CURVNET_RESTORE();
 
 	/* Report the new if_xname back to the userland on success. */
 	if (error == 0)
 		sprintf(ifname, "%s", ifp->if_xname);
 
 	prison_free(pr);
 	return (error);
 }
 #endif /* VIMAGE */
 
 /*
  * Add a group to an interface
  */
 int
 if_addgroup(struct ifnet *ifp, const char *groupname)
 {
 	struct ifg_list		*ifgl;
 	struct ifg_group	*ifg = NULL;
 	struct ifg_member	*ifgm;
 	int 			 new = 0;
 
 	if (groupname[0] && groupname[strlen(groupname) - 1] >= '0' &&
 	    groupname[strlen(groupname) - 1] <= '9')
 		return (EINVAL);
 
 	IFNET_WLOCK();
 	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next)
 		if (!strcmp(ifgl->ifgl_group->ifg_group, groupname)) {
 			IFNET_WUNLOCK();
 			return (EEXIST);
 		}
 
 	if ((ifgl = malloc(sizeof(*ifgl), M_TEMP, M_NOWAIT)) == NULL) {
 	    	IFNET_WUNLOCK();
 		return (ENOMEM);
 	}
 
 	if ((ifgm = malloc(sizeof(*ifgm), M_TEMP, M_NOWAIT)) == NULL) {
 		free(ifgl, M_TEMP);
 		IFNET_WUNLOCK();
 		return (ENOMEM);
 	}
 
 	CK_STAILQ_FOREACH(ifg, &V_ifg_head, ifg_next)
 		if (!strcmp(ifg->ifg_group, groupname))
 			break;
 
 	if (ifg == NULL) {
 		if ((ifg = malloc(sizeof(*ifg), M_TEMP, M_NOWAIT)) == NULL) {
 			free(ifgl, M_TEMP);
 			free(ifgm, M_TEMP);
 			IFNET_WUNLOCK();
 			return (ENOMEM);
 		}
 		strlcpy(ifg->ifg_group, groupname, sizeof(ifg->ifg_group));
 		ifg->ifg_refcnt = 0;
 		CK_STAILQ_INIT(&ifg->ifg_members);
 		CK_STAILQ_INSERT_TAIL(&V_ifg_head, ifg, ifg_next);
 		new = 1;
 	}
 
 	ifg->ifg_refcnt++;
 	ifgl->ifgl_group = ifg;
 	ifgm->ifgm_ifp = ifp;
 
 	IF_ADDR_WLOCK(ifp);
 	CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
 	CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
 	IF_ADDR_WUNLOCK(ifp);
 
 	IFNET_WUNLOCK();
 
 	if (new)
 		EVENTHANDLER_INVOKE(group_attach_event, ifg);
 	EVENTHANDLER_INVOKE(group_change_event, groupname);
 
 	return (0);
 }
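
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: a cloner or
  * driver would typically place its interfaces in a named group and drop
  * the membership again on teardown.  The group name "example" and the
  * helper below are hypothetical.
  */
 #if 0
 static int
 example_join_group(struct ifnet *ifp)
 {
 	int error;

 	/* Group names must not end in a digit; see if_addgroup() above. */
 	error = if_addgroup(ifp, "example");
 	if (error != 0)
 		return (error);

 	/* ... later, on teardown ... */
 	return (if_delgroup(ifp, "example"));
 }
 #endif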
 
 /*
  * Helper function to remove a group out of an interface.  Expects the global
  * ifnet lock to be write-locked, and drops it before returning.
  */
 static void
 _if_delgroup_locked(struct ifnet *ifp, struct ifg_list *ifgl,
     const char *groupname)
 {
 	struct ifg_member *ifgm;
 	bool freeifgl;
 
 	IFNET_WLOCK_ASSERT();
 
 	IF_ADDR_WLOCK(ifp);
 	CK_STAILQ_REMOVE(&ifp->if_groups, ifgl, ifg_list, ifgl_next);
 	IF_ADDR_WUNLOCK(ifp);
 
 	CK_STAILQ_FOREACH(ifgm, &ifgl->ifgl_group->ifg_members, ifgm_next) {
 		if (ifgm->ifgm_ifp == ifp) {
 			CK_STAILQ_REMOVE(&ifgl->ifgl_group->ifg_members, ifgm,
 			    ifg_member, ifgm_next);
 			break;
 		}
 	}
 
 	if (--ifgl->ifgl_group->ifg_refcnt == 0) {
 		CK_STAILQ_REMOVE(&V_ifg_head, ifgl->ifgl_group, ifg_group,
 		    ifg_next);
 		freeifgl = true;
 	} else {
 		freeifgl = false;
 	}
 	IFNET_WUNLOCK();
 
 	epoch_wait_preempt(net_epoch_preempt);
 	if (freeifgl) {
 		EVENTHANDLER_INVOKE(group_detach_event, ifgl->ifgl_group);
 		free(ifgl->ifgl_group, M_TEMP);
 	}
 	free(ifgm, M_TEMP);
 	free(ifgl, M_TEMP);
 
 	EVENTHANDLER_INVOKE(group_change_event, groupname);
 }
 
 /*
  * Remove a group from an interface
  */
 int
 if_delgroup(struct ifnet *ifp, const char *groupname)
 {
 	struct ifg_list *ifgl;
 
 	IFNET_WLOCK();
 	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next)
 		if (strcmp(ifgl->ifgl_group->ifg_group, groupname) == 0)
 			break;
 	if (ifgl == NULL) {
 		IFNET_WUNLOCK();
 		return (ENOENT);
 	}
 
 	_if_delgroup_locked(ifp, ifgl, groupname);
 
 	return (0);
 }
 
 /*
  * Remove an interface from all groups
  */
 static void
 if_delgroups(struct ifnet *ifp)
 {
 	struct ifg_list *ifgl;
 	char groupname[IFNAMSIZ];
 
 	IFNET_WLOCK();
 	while ((ifgl = CK_STAILQ_FIRST(&ifp->if_groups)) != NULL) {
 		strlcpy(groupname, ifgl->ifgl_group->ifg_group, IFNAMSIZ);
 		_if_delgroup_locked(ifp, ifgl, groupname);
 		IFNET_WLOCK();
 	}
 	IFNET_WUNLOCK();
 }
 
 static char *
 ifgr_group_get(void *ifgrp)
 {
 	union ifgroupreq_union *ifgrup;
 
 	ifgrup = ifgrp;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		return (&ifgrup->ifgr32.ifgr_ifgru.ifgru_group[0]);
 #endif
 	return (&ifgrup->ifgr.ifgr_ifgru.ifgru_group[0]);
 }
 
 static struct ifg_req *
 ifgr_groups_get(void *ifgrp)
 {
 	union ifgroupreq_union *ifgrup;
 
 	ifgrup = ifgrp;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		return ((struct ifg_req *)(uintptr_t)
 		    ifgrup->ifgr32.ifgr_ifgru.ifgru_groups);
 #endif
 	return (ifgrup->ifgr.ifgr_ifgru.ifgru_groups);
 }
 
 /*
  * Stores all groups from an interface in memory pointed to by ifgr.
  */
 static int
 if_getgroup(struct ifgroupreq *ifgr, struct ifnet *ifp)
 {
 	int			 len, error;
 	struct ifg_list		*ifgl;
 	struct ifg_req		 ifgrq, *ifgp;
 
 	NET_EPOCH_ASSERT();
 
 	if (ifgr->ifgr_len == 0) {
 		CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next)
 			ifgr->ifgr_len += sizeof(struct ifg_req);
 		return (0);
 	}
 
 	len = ifgr->ifgr_len;
 	ifgp = ifgr_groups_get(ifgr);
 	/* XXX: wire */
 	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
 		if (len < sizeof(ifgrq))
 			return (EINVAL);
 		bzero(&ifgrq, sizeof ifgrq);
 		strlcpy(ifgrq.ifgrq_group, ifgl->ifgl_group->ifg_group,
 		    sizeof(ifgrq.ifgrq_group));
 		if ((error = copyout(&ifgrq, ifgp, sizeof(struct ifg_req))))
 			return (error);
 		len -= sizeof(ifgrq);
 		ifgp++;
 	}
 
 	return (0);
 }
 
 /*
  * Stores all members of a group in memory pointed to by ifgr.
  */
 static int
 if_getgroupmembers(struct ifgroupreq *ifgr)
 {
 	struct ifg_group	*ifg;
 	struct ifg_member	*ifgm;
 	struct ifg_req		 ifgrq, *ifgp;
 	int			 len, error;
 
 	IFNET_RLOCK();
 	CK_STAILQ_FOREACH(ifg, &V_ifg_head, ifg_next)
 		if (strcmp(ifg->ifg_group, ifgr->ifgr_name) == 0)
 			break;
 	if (ifg == NULL) {
 		IFNET_RUNLOCK();
 		return (ENOENT);
 	}
 
 	if (ifgr->ifgr_len == 0) {
 		CK_STAILQ_FOREACH(ifgm, &ifg->ifg_members, ifgm_next)
 			ifgr->ifgr_len += sizeof(ifgrq);
 		IFNET_RUNLOCK();
 		return (0);
 	}
 
 	len = ifgr->ifgr_len;
 	ifgp = ifgr_groups_get(ifgr);
 	CK_STAILQ_FOREACH(ifgm, &ifg->ifg_members, ifgm_next) {
 		if (len < sizeof(ifgrq)) {
 			IFNET_RUNLOCK();
 			return (EINVAL);
 		}
 		bzero(&ifgrq, sizeof ifgrq);
 		strlcpy(ifgrq.ifgrq_member, ifgm->ifgm_ifp->if_xname,
 		    sizeof(ifgrq.ifgrq_member));
 		if ((error = copyout(&ifgrq, ifgp, sizeof(struct ifg_req)))) {
 			IFNET_RUNLOCK();
 			return (error);
 		}
 		len -= sizeof(ifgrq);
 		ifgp++;
 	}
 	IFNET_RUNLOCK();
 
 	return (0);
 }
 
 /*
  * Return counter values from counter(9)s stored in ifnet.
  */
 uint64_t
 if_get_counter_default(struct ifnet *ifp, ift_counter cnt)
 {
 
 	KASSERT(cnt < IFCOUNTERS, ("%s: invalid cnt %d", __func__, cnt));
 
 	return (counter_u64_fetch(ifp->if_counters[cnt]));
 }
 
 /*
  * Increase an ifnet counter. Usually used for counters shared
  * between the stack and a driver, but function supports them all.
  */
 void
 if_inc_counter(struct ifnet *ifp, ift_counter cnt, int64_t inc)
 {
 
 	KASSERT(cnt < IFCOUNTERS, ("%s: invalid cnt %d", __func__, cnt));
 
 	counter_u64_add(ifp->if_counters[cnt], inc);
 }
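
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: a driver's
  * receive path would account a dropped frame against the counters shared
  * with the stack roughly like this.  The helper name is hypothetical.
  */
 #if 0
 static void
 example_rx_drop(struct ifnet *ifp)
 {

 	/* One more input error and one more input queue drop. */
 	if_inc_counter(ifp, IFCOUNTER_IERRORS, 1);
 	if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1);
 }
 #endif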
 
 /*
  * Copy data from ifnet to userland API structure if_data.
  */
 void
 if_data_copy(struct ifnet *ifp, struct if_data *ifd)
 {
 
 	ifd->ifi_type = ifp->if_type;
 	ifd->ifi_physical = 0;
 	ifd->ifi_addrlen = ifp->if_addrlen;
 	ifd->ifi_hdrlen = ifp->if_hdrlen;
 	ifd->ifi_link_state = ifp->if_link_state;
 	ifd->ifi_vhid = 0;
 	ifd->ifi_datalen = sizeof(struct if_data);
 	ifd->ifi_mtu = ifp->if_mtu;
 	ifd->ifi_metric = ifp->if_metric;
 	ifd->ifi_baudrate = ifp->if_baudrate;
 	ifd->ifi_hwassist = ifp->if_hwassist;
 	ifd->ifi_epoch = ifp->if_epoch;
 	ifd->ifi_lastchange = ifp->if_lastchange;
 
 	ifd->ifi_ipackets = ifp->if_get_counter(ifp, IFCOUNTER_IPACKETS);
 	ifd->ifi_ierrors = ifp->if_get_counter(ifp, IFCOUNTER_IERRORS);
 	ifd->ifi_opackets = ifp->if_get_counter(ifp, IFCOUNTER_OPACKETS);
 	ifd->ifi_oerrors = ifp->if_get_counter(ifp, IFCOUNTER_OERRORS);
 	ifd->ifi_collisions = ifp->if_get_counter(ifp, IFCOUNTER_COLLISIONS);
 	ifd->ifi_ibytes = ifp->if_get_counter(ifp, IFCOUNTER_IBYTES);
 	ifd->ifi_obytes = ifp->if_get_counter(ifp, IFCOUNTER_OBYTES);
 	ifd->ifi_imcasts = ifp->if_get_counter(ifp, IFCOUNTER_IMCASTS);
 	ifd->ifi_omcasts = ifp->if_get_counter(ifp, IFCOUNTER_OMCASTS);
 	ifd->ifi_iqdrops = ifp->if_get_counter(ifp, IFCOUNTER_IQDROPS);
 	ifd->ifi_oqdrops = ifp->if_get_counter(ifp, IFCOUNTER_OQDROPS);
 	ifd->ifi_noproto = ifp->if_get_counter(ifp, IFCOUNTER_NOPROTO);
 }
 
 /*
  * Initialization, destruction and refcounting functions for ifaddrs.
  */
 struct ifaddr *
 ifa_alloc(size_t size, int flags)
 {
 	struct ifaddr *ifa;
 
 	KASSERT(size >= sizeof(struct ifaddr),
 	    ("%s: invalid size %zu", __func__, size));
 
 	ifa = malloc(size, M_IFADDR, M_ZERO | flags);
 	if (ifa == NULL)
 		return (NULL);
 
 	if ((ifa->ifa_opackets = counter_u64_alloc(flags)) == NULL)
 		goto fail;
 	if ((ifa->ifa_ipackets = counter_u64_alloc(flags)) == NULL)
 		goto fail;
 	if ((ifa->ifa_obytes = counter_u64_alloc(flags)) == NULL)
 		goto fail;
 	if ((ifa->ifa_ibytes = counter_u64_alloc(flags)) == NULL)
 		goto fail;
 
 	refcount_init(&ifa->ifa_refcnt, 1);
 
 	return (ifa);
 
 fail:
 	/* free(NULL) is okay */
 	counter_u64_free(ifa->ifa_opackets);
 	counter_u64_free(ifa->ifa_ipackets);
 	counter_u64_free(ifa->ifa_obytes);
 	counter_u64_free(ifa->ifa_ibytes);
 	free(ifa, M_IFADDR);
 
 	return (NULL);
 }
 
 void
 ifa_ref(struct ifaddr *ifa)
 {
 
 	refcount_acquire(&ifa->ifa_refcnt);
 }
 
 static void
 ifa_destroy(epoch_context_t ctx)
 {
 	struct ifaddr *ifa;
 
 	ifa = __containerof(ctx, struct ifaddr, ifa_epoch_ctx);
 	counter_u64_free(ifa->ifa_opackets);
 	counter_u64_free(ifa->ifa_ipackets);
 	counter_u64_free(ifa->ifa_obytes);
 	counter_u64_free(ifa->ifa_ibytes);
 	free(ifa, M_IFADDR);
 }
 
 void
 ifa_free(struct ifaddr *ifa)
 {
 
 	if (refcount_release(&ifa->ifa_refcnt))
 		NET_EPOCH_CALL(ifa_destroy, &ifa->ifa_epoch_ctx);
 }
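
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: protocols
  * embed struct ifaddr at the start of their own address structure and
  * manage its lifetime with the helpers above.  "struct example_ifaddr"
  * and the helpers are hypothetical.
  */
 #if 0
 struct example_ifaddr {
 	struct ifaddr	ex_ifa;		/* must be first */
 	int		ex_flags;
 };

 static struct example_ifaddr *
 example_ifaddr_create(void)
 {
 	struct ifaddr *ifa;

 	/* May sleep with M_WAITOK; the embedded counters are set up too. */
 	ifa = ifa_alloc(sizeof(struct example_ifaddr), M_WAITOK);
 	return ((struct example_ifaddr *)ifa);
 }

 static void
 example_ifaddr_destroy(struct example_ifaddr *eifa)
 {

 	/* The last reference goes away via an epoch-deferred free. */
 	ifa_free(&eifa->ex_ifa);
 }
 #endif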
 
 
 static int
 ifa_maintain_loopback_route(int cmd, const char *otype, struct ifaddr *ifa,
     struct sockaddr *ia)
 {
 	struct epoch_tracker et;
 	int error;
 	struct rt_addrinfo info;
 	struct sockaddr_dl null_sdl;
 	struct ifnet *ifp;
 	struct ifaddr *rti_ifa = NULL;
 
 	ifp = ifa->ifa_ifp;
 
 	bzero(&info, sizeof(info));
 	if (cmd != RTM_DELETE)
 		info.rti_ifp = V_loif;
 	if (cmd == RTM_ADD) {
 		/* explicitly specify (loopback) ifa */
 		if (info.rti_ifp != NULL) {
 			NET_EPOCH_ENTER(et);
 			rti_ifa = ifaof_ifpforaddr(ifa->ifa_addr, info.rti_ifp);
 			if (rti_ifa != NULL)
 				ifa_ref(rti_ifa);
 			info.rti_ifa = rti_ifa;
 			NET_EPOCH_EXIT(et);
 		}
 	}
 	info.rti_flags = ifa->ifa_flags | RTF_HOST | RTF_STATIC | RTF_PINNED;
 	info.rti_info[RTAX_DST] = ia;
 	info.rti_info[RTAX_GATEWAY] = (struct sockaddr *)&null_sdl;
 	link_init_sdl(ifp, (struct sockaddr *)&null_sdl, ifp->if_type);
 
 	error = rtrequest1_fib(cmd, &info, NULL, ifp->if_fib);
 
 	if (rti_ifa != NULL)
 		ifa_free(rti_ifa);
 
 	if (error == 0 ||
 	    (cmd == RTM_ADD && error == EEXIST) ||
 	    (cmd == RTM_DELETE && (error == ENOENT || error == ESRCH)))
 		return (error);
 
 	log(LOG_DEBUG, "%s: %s failed for interface %s: %u\n",
 		__func__, otype, if_name(ifp), error);
 
 	return (error);
 }
 
 int
 ifa_add_loopback_route(struct ifaddr *ifa, struct sockaddr *ia)
 {
 
 	return (ifa_maintain_loopback_route(RTM_ADD, "insertion", ifa, ia));
 }
 
 int
 ifa_del_loopback_route(struct ifaddr *ifa, struct sockaddr *ia)
 {
 
 	return (ifa_maintain_loopback_route(RTM_DELETE, "deletion", ifa, ia));
 }
 
 int
 ifa_switch_loopback_route(struct ifaddr *ifa, struct sockaddr *ia)
 {
 
 	return (ifa_maintain_loopback_route(RTM_CHANGE, "switch", ifa, ia));
 }
 
 /*
  * XXX: Because sockaddr_dl has deeper structure than the sockaddr
  * structs used to represent other address families, it is necessary
  * to perform a different comparison.
  */
 
 #define	sa_dl_equal(a1, a2)	\
 	((((const struct sockaddr_dl *)(a1))->sdl_len ==		\
 	 ((const struct sockaddr_dl *)(a2))->sdl_len) &&		\
 	 (bcmp(CLLADDR((const struct sockaddr_dl *)(a1)),		\
 	       CLLADDR((const struct sockaddr_dl *)(a2)),		\
 	       ((const struct sockaddr_dl *)(a1))->sdl_alen) == 0))
 
 /*
  * Locate an interface based on a complete address.
  */
 /*ARGSUSED*/
 struct ifaddr *
 ifa_ifwithaddr(const struct sockaddr *addr)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 
 	NET_EPOCH_ASSERT();
 
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			if (ifa->ifa_addr->sa_family != addr->sa_family)
 				continue;
 			if (sa_equal(addr, ifa->ifa_addr)) {
 				goto done;
 			}
 			/* IP6 doesn't have broadcast */
 			if ((ifp->if_flags & IFF_BROADCAST) &&
 			    ifa->ifa_broadaddr &&
 			    ifa->ifa_broadaddr->sa_len != 0 &&
 			    sa_equal(ifa->ifa_broadaddr, addr)) {
 				goto done;
 			}
 		}
 	}
 	ifa = NULL;
 done:
 	return (ifa);
 }
 
 int
 ifa_ifwithaddr_check(const struct sockaddr *addr)
 {
 	struct epoch_tracker et;
 	int rc;
 
 	NET_EPOCH_ENTER(et);
 	rc = (ifa_ifwithaddr(addr) != NULL);
 	NET_EPOCH_EXIT(et);
 	return (rc);
 }
 
 /*
  * Locate an interface based on the broadcast address.
  */
 /* ARGSUSED */
 struct ifaddr *
 ifa_ifwithbroadaddr(const struct sockaddr *addr, int fibnum)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 
 	NET_EPOCH_ASSERT();
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		if ((fibnum != RT_ALL_FIBS) && (ifp->if_fib != fibnum))
 			continue;
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			if (ifa->ifa_addr->sa_family != addr->sa_family)
 				continue;
 			if ((ifp->if_flags & IFF_BROADCAST) &&
 			    ifa->ifa_broadaddr &&
 			    ifa->ifa_broadaddr->sa_len != 0 &&
 			    sa_equal(ifa->ifa_broadaddr, addr)) {
 				goto done;
 			}
 		}
 	}
 	ifa = NULL;
 done:
 	return (ifa);
 }
 
 /*
  * Locate the point to point interface with a given destination address.
  */
 /*ARGSUSED*/
 struct ifaddr *
 ifa_ifwithdstaddr(const struct sockaddr *addr, int fibnum)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 
 	NET_EPOCH_ASSERT();
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
 			continue;
 		if ((fibnum != RT_ALL_FIBS) && (ifp->if_fib != fibnum))
 			continue;
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			if (ifa->ifa_addr->sa_family != addr->sa_family)
 				continue;
 			if (ifa->ifa_dstaddr != NULL &&
 			    sa_equal(addr, ifa->ifa_dstaddr)) {
 				goto done;
 			}
 		}
 	}
 	ifa = NULL;
 done:
 	return (ifa);
 }
 
 /*
  * Find an interface on a specific network.  If there are many, the
  * most specific match found is chosen.
  */
 struct ifaddr *
 ifa_ifwithnet(const struct sockaddr *addr, int ignore_ptp, int fibnum)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 	struct ifaddr *ifa_maybe = NULL;
 	u_int af = addr->sa_family;
 	const char *addr_data = addr->sa_data, *cplim;
 
 	NET_EPOCH_ASSERT();
 	/*
 	 * AF_LINK addresses can be looked up directly by their index number,
 	 * so do that if we can.
 	 */
 	if (af == AF_LINK) {
 	    const struct sockaddr_dl *sdl = (const struct sockaddr_dl *)addr;
 	    if (sdl->sdl_index && sdl->sdl_index <= V_if_index)
 		return (ifaddr_byindex(sdl->sdl_index));
 	}
 
 	/*
 	 * Scan though each interface, looking for ones that have addresses
 	 * in this address family and the requested fib.
 	 */
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		if ((fibnum != RT_ALL_FIBS) && (ifp->if_fib != fibnum))
 			continue;
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			const char *cp, *cp2, *cp3;
 
 			if (ifa->ifa_addr->sa_family != af)
 next:				continue;
 			if (af == AF_INET && 
 			    ifp->if_flags & IFF_POINTOPOINT && !ignore_ptp) {
 				/*
 				 * This is a bit broken as it doesn't
 				 * take into account that the remote end may
 				 * be a single node in the network we are
 				 * looking for.
 				 * The trouble is that we don't know the
 				 * netmask for the remote end.
 				 */
 				if (ifa->ifa_dstaddr != NULL &&
 				    sa_equal(addr, ifa->ifa_dstaddr)) {
 					goto done;
 				}
 			} else {
 				/*
 				 * Scan all the bits in the ifa's address.
 				 * If a bit disagrees with what we are
 				 * looking for, mask it with the netmask
 				 * to see if it really matters.
 				 * (A byte at a time)
 				 */
 				if (ifa->ifa_netmask == 0)
 					continue;
 				cp = addr_data;
 				cp2 = ifa->ifa_addr->sa_data;
 				cp3 = ifa->ifa_netmask->sa_data;
 				cplim = ifa->ifa_netmask->sa_len
 					+ (char *)ifa->ifa_netmask;
 				while (cp3 < cplim)
 					if ((*cp++ ^ *cp2++) & *cp3++)
 						goto next; /* next address! */
 				/*
 				 * If the netmask of what we just found
 				 * is more specific than what we had before
 				 * (if we had one), or if the virtual status
 				 * of the new prefix is better than that of
 				 * the old one, then remember the new one
 				 * before continuing to search for a better one.
 				 */
 				if (ifa_maybe == NULL ||
 				    ifa_preferred(ifa_maybe, ifa) ||
 				    rn_refines((caddr_t)ifa->ifa_netmask,
 				    (caddr_t)ifa_maybe->ifa_netmask)) {
 					ifa_maybe = ifa;
 				}
 			}
 		}
 	}
 	ifa = ifa_maybe;
 	ifa_maybe = NULL;
 done:
 	return (ifa);
 }
 
 /*
  * Find an interface address specific to an interface best matching
  * a given address.
  */
 struct ifaddr *
 ifaof_ifpforaddr(const struct sockaddr *addr, struct ifnet *ifp)
 {
 	struct ifaddr *ifa;
 	const char *cp, *cp2, *cp3;
 	char *cplim;
 	struct ifaddr *ifa_maybe = NULL;
 	u_int af = addr->sa_family;
 
 	if (af >= AF_MAX)
 		return (NULL);
 
 	NET_EPOCH_ASSERT();
 	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 		if (ifa->ifa_addr->sa_family != af)
 			continue;
 		if (ifa_maybe == NULL)
 			ifa_maybe = ifa;
 		if (ifa->ifa_netmask == 0) {
 			if (sa_equal(addr, ifa->ifa_addr) ||
 			    (ifa->ifa_dstaddr &&
 			    sa_equal(addr, ifa->ifa_dstaddr)))
 				goto done;
 			continue;
 		}
 		if (ifp->if_flags & IFF_POINTOPOINT) {
 			if (sa_equal(addr, ifa->ifa_dstaddr))
 				goto done;
 		} else {
 			cp = addr->sa_data;
 			cp2 = ifa->ifa_addr->sa_data;
 			cp3 = ifa->ifa_netmask->sa_data;
 			cplim = ifa->ifa_netmask->sa_len + (char *)ifa->ifa_netmask;
 			for (; cp3 < cplim; cp3++)
 				if ((*cp++ ^ *cp2++) & *cp3)
 					break;
 			if (cp3 == cplim)
 				goto done;
 		}
 	}
 	ifa = ifa_maybe;
 done:
 	return (ifa);
 }
 
 /*
  * See whether new ifa is better than current one:
  * 1) A non-virtual one is preferred over virtual.
  * 2) A virtual one in master state is preferred over any other state.
  *
  * Used in several address selecting functions.
  */
 int
 ifa_preferred(struct ifaddr *cur, struct ifaddr *next)
 {
 
 	return (cur->ifa_carp && (!next->ifa_carp ||
 	    ((*carp_master_p)(next) && !(*carp_master_p)(cur))));
 }
 
 struct sockaddr_dl *
 link_alloc_sdl(size_t size, int flags)
 {
 
 	return (malloc(size, M_TEMP, flags));
 }
 
 void
 link_free_sdl(struct sockaddr *sa)
 {
 	free(sa, M_TEMP);
 }
 
 /*
  * Fills in given sdl with interface basic info.
  * Returns pointer to filled sdl.
  */
 struct sockaddr_dl *
 link_init_sdl(struct ifnet *ifp, struct sockaddr *paddr, u_char iftype)
 {
 	struct sockaddr_dl *sdl;
 
 	sdl = (struct sockaddr_dl *)paddr;
 	memset(sdl, 0, sizeof(struct sockaddr_dl));
 	sdl->sdl_len = sizeof(struct sockaddr_dl);
 	sdl->sdl_family = AF_LINK;
 	sdl->sdl_index = ifp->if_index;
 	sdl->sdl_type = iftype;
 
 	return (sdl);
 }
 
 /*
  * Mark an interface down and notify protocols of
  * the transition.
  */
 static void
 if_unroute(struct ifnet *ifp, int flag, int fam)
 {
 	struct ifaddr *ifa;
 
 	KASSERT(flag == IFF_UP, ("if_unroute: flag != IFF_UP"));
 
 	ifp->if_flags &= ~flag;
 	getmicrotime(&ifp->if_lastchange);
 	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
 		if (fam == PF_UNSPEC || (fam == ifa->ifa_addr->sa_family))
 			pfctlinput(PRC_IFDOWN, ifa->ifa_addr);
 	ifp->if_qflush(ifp);
 
 	if (ifp->if_carp)
 		(*carp_linkstate_p)(ifp);
 	rt_ifmsg(ifp);
 }
 
 /*
  * Mark an interface up and notify protocols of
  * the transition.
  */
 static void
 if_route(struct ifnet *ifp, int flag, int fam)
 {
 	struct ifaddr *ifa;
 
 	KASSERT(flag == IFF_UP, ("if_route: flag != IFF_UP"));
 
 	ifp->if_flags |= flag;
 	getmicrotime(&ifp->if_lastchange);
 	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
 		if (fam == PF_UNSPEC || (fam == ifa->ifa_addr->sa_family))
 			pfctlinput(PRC_IFUP, ifa->ifa_addr);
 	if (ifp->if_carp)
 		(*carp_linkstate_p)(ifp);
 	rt_ifmsg(ifp);
 #ifdef INET6
 	in6_if_up(ifp);
 #endif
 }
 
 void	(*vlan_link_state_p)(struct ifnet *);	/* XXX: private from if_vlan */
 void	(*vlan_trunk_cap_p)(struct ifnet *);		/* XXX: private from if_vlan */
 struct ifnet *(*vlan_trunkdev_p)(struct ifnet *);
 struct	ifnet *(*vlan_devat_p)(struct ifnet *, uint16_t);
 int	(*vlan_tag_p)(struct ifnet *, uint16_t *);
 int	(*vlan_pcp_p)(struct ifnet *, uint16_t *);
 int	(*vlan_setcookie_p)(struct ifnet *, void *);
 void	*(*vlan_cookie_p)(struct ifnet *);
 
 /*
  * Handle a change in the interface link state. To avoid LORs
  * between the driver lock and upper layer locks, as well as possible
  * recursions, we post the event to a taskqueue, and all the work
  * is done in the static do_link_state_change().
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_state)
 {
 	/* Return if state hasn't changed. */
 	if (ifp->if_link_state == link_state)
 		return;
 
 	ifp->if_link_state = link_state;
 
 	/* XXXGL: reference ifp? */
 	taskqueue_enqueue(taskqueue_swi, &ifp->if_linktask);
 }
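
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: a driver's
  * link status handler reports media changes with a single call and the
  * heavy lifting then happens in do_link_state_change() on the taskqueue.
  * "struct example_softc" and the handler are hypothetical.
  */
 #if 0
 struct example_softc {
 	struct ifnet	*sc_ifp;
 };

 static void
 example_link_intr(struct example_softc *sc, bool link_up)
 {

 	if_link_state_change(sc->sc_ifp,
 	    link_up ? LINK_STATE_UP : LINK_STATE_DOWN);
 }
 #endif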
 
 static void
 do_link_state_change(void *arg, int pending)
 {
 	struct ifnet *ifp;
 	int link_state;
 
 	ifp = arg;
 	link_state = ifp->if_link_state;
 
 	CURVNET_SET(ifp->if_vnet);
 	rt_ifmsg(ifp);
 	if (ifp->if_vlantrunk != NULL)
 		(*vlan_link_state_p)(ifp);
 
 	if ((ifp->if_type == IFT_ETHER || ifp->if_type == IFT_L2VLAN) &&
 	    ifp->if_l2com != NULL)
 		(*ng_ether_link_state_p)(ifp, link_state);
 	if (ifp->if_carp)
 		(*carp_linkstate_p)(ifp);
 	if (ifp->if_bridge)
 		ifp->if_bridge_linkstate(ifp);
 	if (ifp->if_lagg)
 		(*lagg_linkstate_p)(ifp, link_state);
 
 	if (IS_DEFAULT_VNET(curvnet))
 		devctl_notify("IFNET", ifp->if_xname,
 		    (link_state == LINK_STATE_UP) ? "LINK_UP" : "LINK_DOWN",
 		    NULL);
 	if (pending > 1)
 		if_printf(ifp, "%d link states coalesced\n", pending);
 	if (log_link_state_change)
 		if_printf(ifp, "link state changed to %s\n",
 		    (link_state == LINK_STATE_UP) ? "UP" : "DOWN" );
 	EVENTHANDLER_INVOKE(ifnet_link_event, ifp, link_state);
 	CURVNET_RESTORE();
 }
 
 /*
  * Mark an interface down and notify protocols of
  * the transition.
  */
 void
 if_down(struct ifnet *ifp)
 {
 
 	EVENTHANDLER_INVOKE(ifnet_event, ifp, IFNET_EVENT_DOWN);
 	if_unroute(ifp, IFF_UP, AF_UNSPEC);
 }
 
 /*
  * Mark an interface up and notify protocols of
  * the transition.
  */
 void
 if_up(struct ifnet *ifp)
 {
 
 	if_route(ifp, IFF_UP, AF_UNSPEC);
 	EVENTHANDLER_INVOKE(ifnet_event, ifp, IFNET_EVENT_UP);
 }
 
 /*
  * Flush an interface queue.
  */
 void
 if_qflush(struct ifnet *ifp)
 {
 	struct mbuf *m, *n;
 	struct ifaltq *ifq;
 	
 	ifq = &ifp->if_snd;
 	IFQ_LOCK(ifq);
 #ifdef ALTQ
 	if (ALTQ_IS_ENABLED(ifq))
 		ALTQ_PURGE(ifq);
 #endif
 	n = ifq->ifq_head;
 	while ((m = n) != NULL) {
 		n = m->m_nextpkt;
 		m_freem(m);
 	}
 	ifq->ifq_head = 0;
 	ifq->ifq_tail = 0;
 	ifq->ifq_len = 0;
 	IFQ_UNLOCK(ifq);
 }
 
 /*
  * Map interface name to interface structure pointer, with or without
  * returning a reference.
  */
 struct ifnet *
 ifunit_ref(const char *name)
 {
 	struct epoch_tracker et;
 	struct ifnet *ifp;
 
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		if (strncmp(name, ifp->if_xname, IFNAMSIZ) == 0 &&
 		    !(ifp->if_flags & IFF_DYING))
 			break;
 	}
 	if (ifp != NULL)
 		if_ref(ifp);
 	NET_EPOCH_EXIT(et);
 	return (ifp);
 }
 
 struct ifnet *
 ifunit(const char *name)
 {
 	struct epoch_tracker et;
 	struct ifnet *ifp;
 
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		if (strncmp(name, ifp->if_xname, IFNAMSIZ) == 0)
 			break;
 	}
 	NET_EPOCH_EXIT(et);
 	return (ifp);
 }
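
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: callers that
  * need the ifnet beyond the current epoch section take a reference with
  * ifunit_ref() and drop it with if_rele() when done.  The consumer below
  * is hypothetical.
  */
 #if 0
 static int
 example_poke_interface(const char *name)
 {
 	struct ifnet *ifp;

 	ifp = ifunit_ref(name);
 	if (ifp == NULL)
 		return (ENXIO);
 	/* ... the ifnet may be used safely here ... */
 	if_rele(ifp);
 	return (0);
 }
 #endif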
 
 static void *
 ifr_buffer_get_buffer(void *data)
 {
 	union ifreq_union *ifrup;
 
 	ifrup = data;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		return ((void *)(uintptr_t)
 		    ifrup->ifr32.ifr_ifru.ifru_buffer.buffer);
 #endif
 	return (ifrup->ifr.ifr_ifru.ifru_buffer.buffer);
 }
 
 static void
 ifr_buffer_set_buffer_null(void *data)
 {
 	union ifreq_union *ifrup;
 
 	ifrup = data;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		ifrup->ifr32.ifr_ifru.ifru_buffer.buffer = 0;
 	else
 #endif
 		ifrup->ifr.ifr_ifru.ifru_buffer.buffer = NULL;
 }
 
 static size_t
 ifr_buffer_get_length(void *data)
 {
 	union ifreq_union *ifrup;
 
 	ifrup = data;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		return (ifrup->ifr32.ifr_ifru.ifru_buffer.length);
 #endif
 	return (ifrup->ifr.ifr_ifru.ifru_buffer.length);
 }
 
 static void
 ifr_buffer_set_length(void *data, size_t len)
 {
 	union ifreq_union *ifrup;
 
 	ifrup = data;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		ifrup->ifr32.ifr_ifru.ifru_buffer.length = len;
 	else
 #endif
 		ifrup->ifr.ifr_ifru.ifru_buffer.length = len;
 }
 
 void *
 ifr_data_get_ptr(void *ifrp)
 {
 	union ifreq_union *ifrup;
 
 	ifrup = ifrp;
 #ifdef COMPAT_FREEBSD32
 	if (SV_CURPROC_FLAG(SV_ILP32))
 		return ((void *)(uintptr_t)
 		    ifrup->ifr32.ifr_ifru.ifru_data);
 #endif
 	return (ifrup->ifr.ifr_ifru.ifru_data);
 }
 
 /*
  * Hardware specific interface ioctls.
  */
 int
 ifhwioctl(u_long cmd, struct ifnet *ifp, caddr_t data, struct thread *td)
 {
 	struct ifreq *ifr;
 	int error = 0, do_ifup = 0;
 	int new_flags, temp_flags;
 	size_t namelen, onamelen;
 	size_t descrlen;
 	char *descrbuf, *odescrbuf;
 	char new_name[IFNAMSIZ];
 	struct ifaddr *ifa;
 	struct sockaddr_dl *sdl;
 
 	ifr = (struct ifreq *)data;
 	switch (cmd) {
 	case SIOCGIFINDEX:
 		ifr->ifr_index = ifp->if_index;
 		break;
 
 	case SIOCGIFFLAGS:
 		temp_flags = ifp->if_flags | ifp->if_drv_flags;
 		ifr->ifr_flags = temp_flags & 0xffff;
 		ifr->ifr_flagshigh = temp_flags >> 16;
 		break;
 
 	case SIOCGIFCAP:
 		ifr->ifr_reqcap = ifp->if_capabilities;
 		ifr->ifr_curcap = ifp->if_capenable;
 		break;
 
 #ifdef MAC
 	case SIOCGIFMAC:
 		error = mac_ifnet_ioctl_get(td->td_ucred, ifr, ifp);
 		break;
 #endif
 
 	case SIOCGIFMETRIC:
 		ifr->ifr_metric = ifp->if_metric;
 		break;
 
 	case SIOCGIFMTU:
 		ifr->ifr_mtu = ifp->if_mtu;
 		break;
 
 	case SIOCGIFPHYS:
 		/* XXXGL: did this ever work? */
 		ifr->ifr_phys = 0;
 		break;
 
 	case SIOCGIFDESCR:
 		error = 0;
 		sx_slock(&ifdescr_sx);
 		if (ifp->if_description == NULL)
 			error = ENOMSG;
 		else {
 			/* space for terminating nul */
 			descrlen = strlen(ifp->if_description) + 1;
 			if (ifr_buffer_get_length(ifr) < descrlen)
 				ifr_buffer_set_buffer_null(ifr);
 			else
 				error = copyout(ifp->if_description,
 				    ifr_buffer_get_buffer(ifr), descrlen);
 			ifr_buffer_set_length(ifr, descrlen);
 		}
 		sx_sunlock(&ifdescr_sx);
 		break;
 
 	case SIOCSIFDESCR:
 		error = priv_check(td, PRIV_NET_SETIFDESCR);
 		if (error)
 			return (error);
 
 		/*
 		 * Copy only (length-1) bytes to make sure that
 		 * if_description is always nul terminated.  The
 		 * length parameter is supposed to include the
 		 * terminating nul.
 		 */
 		if (ifr_buffer_get_length(ifr) > ifdescr_maxlen)
 			return (ENAMETOOLONG);
 		else if (ifr_buffer_get_length(ifr) == 0)
 			descrbuf = NULL;
 		else {
 			descrbuf = malloc(ifr_buffer_get_length(ifr),
 			    M_IFDESCR, M_WAITOK | M_ZERO);
 			error = copyin(ifr_buffer_get_buffer(ifr), descrbuf,
 			    ifr_buffer_get_length(ifr) - 1);
 			if (error) {
 				free(descrbuf, M_IFDESCR);
 				break;
 			}
 		}
 
 		sx_xlock(&ifdescr_sx);
 		odescrbuf = ifp->if_description;
 		ifp->if_description = descrbuf;
 		sx_xunlock(&ifdescr_sx);
 
 		getmicrotime(&ifp->if_lastchange);
 		free(odescrbuf, M_IFDESCR);
 		break;
 
 	case SIOCGIFFIB:
 		ifr->ifr_fib = ifp->if_fib;
 		break;
 
 	case SIOCSIFFIB:
 		error = priv_check(td, PRIV_NET_SETIFFIB);
 		if (error)
 			return (error);
 		if (ifr->ifr_fib >= rt_numfibs)
 			return (EINVAL);
 
 		ifp->if_fib = ifr->ifr_fib;
 		break;
 
 	case SIOCSIFFLAGS:
 		error = priv_check(td, PRIV_NET_SETIFFLAGS);
 		if (error)
 			return (error);
 		/*
 		 * Currently, no driver owned flags pass the IFF_CANTCHANGE
 		 * check, so we don't need special handling here yet.
 		 */
 		new_flags = (ifr->ifr_flags & 0xffff) |
 		    (ifr->ifr_flagshigh << 16);
 		if (ifp->if_flags & IFF_UP &&
 		    (new_flags & IFF_UP) == 0) {
 			if_down(ifp);
 		} else if (new_flags & IFF_UP &&
 		    (ifp->if_flags & IFF_UP) == 0) {
 			do_ifup = 1;
 		}
 		/* See if permanently promiscuous mode bit is about to flip */
 		if ((ifp->if_flags ^ new_flags) & IFF_PPROMISC) {
 			if (new_flags & IFF_PPROMISC)
 				ifp->if_flags |= IFF_PROMISC;
 			else if (ifp->if_pcount == 0)
 				ifp->if_flags &= ~IFF_PROMISC;
 			if (log_promisc_mode_change)
                                 if_printf(ifp, "permanently promiscuous mode %s\n",
                                     ((new_flags & IFF_PPROMISC) ?
                                      "enabled" : "disabled"));
 		}
 		ifp->if_flags = (ifp->if_flags & IFF_CANTCHANGE) |
 			(new_flags &~ IFF_CANTCHANGE);
 		if (ifp->if_ioctl) {
 			(void) (*ifp->if_ioctl)(ifp, cmd, data);
 		}
 		if (do_ifup)
 			if_up(ifp);
 		getmicrotime(&ifp->if_lastchange);
 		break;
 
 	case SIOCSIFCAP:
 		error = priv_check(td, PRIV_NET_SETIFCAP);
 		if (error)
 			return (error);
 		if (ifp->if_ioctl == NULL)
 			return (EOPNOTSUPP);
 		if (ifr->ifr_reqcap & ~ifp->if_capabilities)
 			return (EINVAL);
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 		if (error == 0)
 			getmicrotime(&ifp->if_lastchange);
 		break;
 
 #ifdef MAC
 	case SIOCSIFMAC:
 		error = mac_ifnet_ioctl_set(td->td_ucred, ifr, ifp);
 		break;
 #endif
 
 	case SIOCSIFNAME:
 		error = priv_check(td, PRIV_NET_SETIFNAME);
 		if (error)
 			return (error);
 		error = copyinstr(ifr_data_get_ptr(ifr), new_name, IFNAMSIZ,
 		    NULL);
 		if (error != 0)
 			return (error);
 		if (new_name[0] == '\0')
 			return (EINVAL);
 		if (new_name[IFNAMSIZ-1] != '\0') {
 			new_name[IFNAMSIZ-1] = '\0';
 			if (strlen(new_name) == IFNAMSIZ-1)
 				return (EINVAL);
 		}
 		if (strcmp(new_name, ifp->if_xname) == 0)
 			break;
 		if (ifunit(new_name) != NULL)
 			return (EEXIST);
 
 		/*
 		 * XXX: Locking.  Nothing else seems to lock if_flags,
 		 * and there are numerous other races with the
 		 * ifunit() checks not being atomic with namespace
 		 * changes (renames, vmoves, if_attach, etc).
 		 */
 		ifp->if_flags |= IFF_RENAMING;
 		
 		/* Announce the departure of the interface. */
 		rt_ifannouncemsg(ifp, IFAN_DEPARTURE);
 		EVENTHANDLER_INVOKE(ifnet_departure_event, ifp);
 
 		if_printf(ifp, "changing name to '%s'\n", new_name);
 
 		IF_ADDR_WLOCK(ifp);
 		strlcpy(ifp->if_xname, new_name, sizeof(ifp->if_xname));
 		ifa = ifp->if_addr;
 		sdl = (struct sockaddr_dl *)ifa->ifa_addr;
 		namelen = strlen(new_name);
 		onamelen = sdl->sdl_nlen;
 		/*
 		 * Move the address if needed.  This is safe because we
 		 * allocate space for a name of length IFNAMSIZ when we
 		 * create this in if_attach().
 		 */
 		if (namelen != onamelen) {
 			bcopy(sdl->sdl_data + onamelen,
 			    sdl->sdl_data + namelen, sdl->sdl_alen);
 		}
 		bcopy(new_name, sdl->sdl_data, namelen);
 		sdl->sdl_nlen = namelen;
 		sdl = (struct sockaddr_dl *)ifa->ifa_netmask;
 		bzero(sdl->sdl_data, onamelen);
 		while (namelen != 0)
 			sdl->sdl_data[--namelen] = 0xff;
 		IF_ADDR_WUNLOCK(ifp);
 
 		EVENTHANDLER_INVOKE(ifnet_arrival_event, ifp);
 		/* Announce the return of the interface. */
 		rt_ifannouncemsg(ifp, IFAN_ARRIVAL);
 
 		ifp->if_flags &= ~IFF_RENAMING;
 		break;
 
 #ifdef VIMAGE
 	case SIOCSIFVNET:
 		error = priv_check(td, PRIV_NET_SETIFVNET);
 		if (error)
 			return (error);
 		error = if_vmove_loan(td, ifp, ifr->ifr_name, ifr->ifr_jid);
 		break;
 #endif
 
 	case SIOCSIFMETRIC:
 		error = priv_check(td, PRIV_NET_SETIFMETRIC);
 		if (error)
 			return (error);
 		ifp->if_metric = ifr->ifr_metric;
 		getmicrotime(&ifp->if_lastchange);
 		break;
 
 	case SIOCSIFPHYS:
 		error = priv_check(td, PRIV_NET_SETIFPHYS);
 		if (error)
 			return (error);
 		if (ifp->if_ioctl == NULL)
 			return (EOPNOTSUPP);
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 		if (error == 0)
 			getmicrotime(&ifp->if_lastchange);
 		break;
 
 	case SIOCSIFMTU:
 	{
 		u_long oldmtu = ifp->if_mtu;
 
 		error = priv_check(td, PRIV_NET_SETIFMTU);
 		if (error)
 			return (error);
 		if (ifr->ifr_mtu < IF_MINMTU || ifr->ifr_mtu > IF_MAXMTU)
 			return (EINVAL);
 		if (ifp->if_ioctl == NULL)
 			return (EOPNOTSUPP);
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 		if (error == 0) {
 			getmicrotime(&ifp->if_lastchange);
 			rt_ifmsg(ifp);
 #ifdef INET
 			DEBUGNET_NOTIFY_MTU(ifp);
 #endif
 		}
 		/*
 		 * If the link MTU changed, do network layer specific procedure.
 		 */
 		if (ifp->if_mtu != oldmtu) {
 #ifdef INET6
 			nd6_setmtu(ifp);
 #endif
 			rt_updatemtu(ifp);
 		}
 		break;
 	}
 
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		if (cmd == SIOCADDMULTI)
 			error = priv_check(td, PRIV_NET_ADDMULTI);
 		else
 			error = priv_check(td, PRIV_NET_DELMULTI);
 		if (error)
 			return (error);
 
 		/* Don't allow group membership on non-multicast interfaces. */
 		if ((ifp->if_flags & IFF_MULTICAST) == 0)
 			return (EOPNOTSUPP);
 
 		/* Don't let users screw up protocols' entries. */
 		if (ifr->ifr_addr.sa_family != AF_LINK)
 			return (EINVAL);
 
 		if (cmd == SIOCADDMULTI) {
 			struct epoch_tracker et;
 			struct ifmultiaddr *ifma;
 
 			/*
 			 * Userland is only permitted to join groups once
 			 * via the if_addmulti() KPI, because it cannot hold
 			 * struct ifmultiaddr * between calls. It may also
 			 * lose a race while we check if the membership
 			 * already exists.
 			 */
 			NET_EPOCH_ENTER(et);
 			ifma = if_findmulti(ifp, &ifr->ifr_addr);
 			NET_EPOCH_EXIT(et);
 			if (ifma != NULL)
 				error = EADDRINUSE;
 			else
 				error = if_addmulti(ifp, &ifr->ifr_addr, &ifma);
 		} else {
 			error = if_delmulti(ifp, &ifr->ifr_addr);
 		}
 		if (error == 0)
 			getmicrotime(&ifp->if_lastchange);
 		break;
 
 	case SIOCSIFPHYADDR:
 	case SIOCDIFPHYADDR:
 #ifdef INET6
 	case SIOCSIFPHYADDR_IN6:
 #endif
 	case SIOCSIFMEDIA:
 	case SIOCSIFGENERIC:
 		error = priv_check(td, PRIV_NET_HWIOCTL);
 		if (error)
 			return (error);
 		if (ifp->if_ioctl == NULL)
 			return (EOPNOTSUPP);
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 		if (error == 0)
 			getmicrotime(&ifp->if_lastchange);
 		break;
 
 	case SIOCGIFSTATUS:
 	case SIOCGIFPSRCADDR:
 	case SIOCGIFPDSTADDR:
 	case SIOCGIFMEDIA:
 	case SIOCGIFXMEDIA:
 	case SIOCGIFGENERIC:
 	case SIOCGIFRSSKEY:
 	case SIOCGIFRSSHASH:
 	case SIOCGIFDOWNREASON:
 		if (ifp->if_ioctl == NULL)
 			return (EOPNOTSUPP);
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 		break;
 
 	case SIOCSIFLLADDR:
 		error = priv_check(td, PRIV_NET_SETLLADDR);
 		if (error)
 			return (error);
 		error = if_setlladdr(ifp,
 		    ifr->ifr_addr.sa_data, ifr->ifr_addr.sa_len);
 		break;
 
 	case SIOCGHWADDR:
 		error = if_gethwaddr(ifp, ifr);
 		break;
 
 	case CASE_IOC_IFGROUPREQ(SIOCAIFGROUP):
 		error = priv_check(td, PRIV_NET_ADDIFGROUP);
 		if (error)
 			return (error);
 		if ((error = if_addgroup(ifp,
 		    ifgr_group_get((struct ifgroupreq *)data))))
 			return (error);
 		break;
 
 	case CASE_IOC_IFGROUPREQ(SIOCGIFGROUP):
 	{
 		struct epoch_tracker et;
 
 		NET_EPOCH_ENTER(et);
 		error = if_getgroup((struct ifgroupreq *)data, ifp);
 		NET_EPOCH_EXIT(et);
 		break;
 	}
 
 	case CASE_IOC_IFGROUPREQ(SIOCDIFGROUP):
 		error = priv_check(td, PRIV_NET_DELIFGROUP);
 		if (error)
 			return (error);
 		if ((error = if_delgroup(ifp,
 		    ifgr_group_get((struct ifgroupreq *)data))))
 			return (error);
 		break;
 
 	default:
 		error = ENOIOCTL;
 		break;
 	}
 	return (error);
 }
 
 #ifdef COMPAT_FREEBSD32
 struct ifconf32 {
 	int32_t	ifc_len;
 	union {
 		uint32_t	ifcu_buf;
 		uint32_t	ifcu_req;
 	} ifc_ifcu;
 };
 #define	SIOCGIFCONF32	_IOWR('i', 36, struct ifconf32)
 #endif
 
 #ifdef COMPAT_FREEBSD32
 static void
 ifmr_init(struct ifmediareq *ifmr, caddr_t data)
 {
 	struct ifmediareq32 *ifmr32;
 
 	ifmr32 = (struct ifmediareq32 *)data;
 	memcpy(ifmr->ifm_name, ifmr32->ifm_name,
 	    sizeof(ifmr->ifm_name));
 	ifmr->ifm_current = ifmr32->ifm_current;
 	ifmr->ifm_mask = ifmr32->ifm_mask;
 	ifmr->ifm_status = ifmr32->ifm_status;
 	ifmr->ifm_active = ifmr32->ifm_active;
 	ifmr->ifm_count = ifmr32->ifm_count;
 	ifmr->ifm_ulist = (int *)(uintptr_t)ifmr32->ifm_ulist;
 }
 
 static void
 ifmr_update(const struct ifmediareq *ifmr, caddr_t data)
 {
 	struct ifmediareq32 *ifmr32;
 
 	ifmr32 = (struct ifmediareq32 *)data;
 	ifmr32->ifm_current = ifmr->ifm_current;
 	ifmr32->ifm_mask = ifmr->ifm_mask;
 	ifmr32->ifm_status = ifmr->ifm_status;
 	ifmr32->ifm_active = ifmr->ifm_active;
 	ifmr32->ifm_count = ifmr->ifm_count;
 }
 #endif
 
 /*
  * Interface ioctls.
  */
 int
 ifioctl(struct socket *so, u_long cmd, caddr_t data, struct thread *td)
 {
 #ifdef COMPAT_FREEBSD32
 	caddr_t saved_data = NULL;
 	struct ifmediareq ifmr;
 	struct ifmediareq *ifmrp = NULL;
 #endif
 	struct ifnet *ifp;
 	struct ifreq *ifr;
 	int error;
 	int oif_flags;
+#ifdef VIMAGE
+	bool shutdown;
+#endif
 
 	CURVNET_SET(so->so_vnet);
 #ifdef VIMAGE
 	/* Make sure the VNET is stable. */
-	if (so->so_vnet->vnet_shutdown) {
+	shutdown = VNET_IS_SHUTTING_DOWN(so->so_vnet);
+	if (shutdown) {
 		CURVNET_RESTORE();
 		return (EBUSY);
 	}
 #endif
 
 	switch (cmd) {
 	case SIOCGIFCONF:
 		error = ifconf(cmd, data);
 		goto out_noref;
 
 #ifdef COMPAT_FREEBSD32
 	case SIOCGIFCONF32:
 		{
 			struct ifconf32 *ifc32;
 			struct ifconf ifc;
 
 			ifc32 = (struct ifconf32 *)data;
 			ifc.ifc_len = ifc32->ifc_len;
 			ifc.ifc_buf = PTRIN(ifc32->ifc_buf);
 
 			error = ifconf(SIOCGIFCONF, (void *)&ifc);
 			if (error == 0)
 				ifc32->ifc_len = ifc.ifc_len;
 			goto out_noref;
 		}
 #endif
 	}
 
 #ifdef COMPAT_FREEBSD32
 	switch (cmd) {
 	case SIOCGIFMEDIA32:
 	case SIOCGIFXMEDIA32:
 		ifmrp = &ifmr;
 		ifmr_init(ifmrp, data);
 		cmd = _IOC_NEWTYPE(cmd, struct ifmediareq);
 		saved_data = data;
 		data = (caddr_t)ifmrp;
 	}
 #endif
 
 	ifr = (struct ifreq *)data;
 	switch (cmd) {
 #ifdef VIMAGE
 	case SIOCSIFRVNET:
 		error = priv_check(td, PRIV_NET_SETIFVNET);
 		if (error == 0)
 			error = if_vmove_reclaim(td, ifr->ifr_name,
 			    ifr->ifr_jid);
 		goto out_noref;
 #endif
 	case SIOCIFCREATE:
 	case SIOCIFCREATE2:
 		error = priv_check(td, PRIV_NET_IFCREATE);
 		if (error == 0)
 			error = if_clone_create(ifr->ifr_name,
 			    sizeof(ifr->ifr_name), cmd == SIOCIFCREATE2 ?
 			    ifr_data_get_ptr(ifr) : NULL);
 		goto out_noref;
 	case SIOCIFDESTROY:
 		error = priv_check(td, PRIV_NET_IFDESTROY);
 		if (error == 0)
 			error = if_clone_destroy(ifr->ifr_name);
 		goto out_noref;
 
 	case SIOCIFGCLONERS:
 		error = if_clone_list((struct if_clonereq *)data);
 		goto out_noref;
 
 	case CASE_IOC_IFGROUPREQ(SIOCGIFGMEMB):
 		error = if_getgroupmembers((struct ifgroupreq *)data);
 		goto out_noref;
 
 #if defined(INET) || defined(INET6)
 	case SIOCSVH:
 	case SIOCGVH:
 		if (carp_ioctl_p == NULL)
 			error = EPROTONOSUPPORT;
 		else
 			error = (*carp_ioctl_p)(ifr, cmd, td);
 		goto out_noref;
 #endif
 	}
 
 	ifp = ifunit_ref(ifr->ifr_name);
 	if (ifp == NULL) {
 		error = ENXIO;
 		goto out_noref;
 	}
 
 	error = ifhwioctl(cmd, ifp, data, td);
 	if (error != ENOIOCTL)
 		goto out_ref;
 
 	oif_flags = ifp->if_flags;
 	if (so->so_proto == NULL) {
 		error = EOPNOTSUPP;
 		goto out_ref;
 	}
 
 	/*
 	 * Pass the request on to the socket control method, and if the
 	 * latter returns EOPNOTSUPP, directly to the interface.
 	 *
 	 * Make an exception for the legacy SIOCSIF* requests.  Drivers
 	 * trust SIOCSIFADDR et al to come from an already privileged
 	 * layer, and do not perform any credentials checks or input
 	 * validation.
 	 */
 	error = ((*so->so_proto->pr_usrreqs->pru_control)(so, cmd, data,
 	    ifp, td));
 	if (error == EOPNOTSUPP && ifp != NULL && ifp->if_ioctl != NULL &&
 	    cmd != SIOCSIFADDR && cmd != SIOCSIFBRDADDR &&
 	    cmd != SIOCSIFDSTADDR && cmd != SIOCSIFNETMASK)
 		error = (*ifp->if_ioctl)(ifp, cmd, data);
 
 	if ((oif_flags ^ ifp->if_flags) & IFF_UP) {
 #ifdef INET6
 		if (ifp->if_flags & IFF_UP)
 			in6_if_up(ifp);
 #endif
 	}
 
 out_ref:
 	if_rele(ifp);
 out_noref:
 #ifdef COMPAT_FREEBSD32
 	if (ifmrp != NULL) {
 		KASSERT((cmd == SIOCGIFMEDIA || cmd == SIOCGIFXMEDIA),
 		    ("ifmrp non-NULL, but cmd is not an ifmedia req 0x%lx",
 		     cmd));
 		data = saved_data;
 		ifmr_update(ifmrp, data);
 	}
 #endif
 	CURVNET_RESTORE();
 	return (error);
 }
 
 /*
  * The code common to handling reference counted flags,
  * e.g., in ifpromisc() and if_allmulti().
  * The "pflag" argument can specify a permanent mode flag to check,
  * such as IFF_PPROMISC for promiscuous mode; should be 0 if none.
  *
  * Only to be used on stack-owned flags, not driver-owned flags.
  */
 static int
 if_setflag(struct ifnet *ifp, int flag, int pflag, int *refcount, int onswitch)
 {
 	struct ifreq ifr;
 	int error;
 	int oldflags, oldcount;
 
 	/* Sanity checks to catch programming errors */
 	KASSERT((flag & (IFF_DRV_OACTIVE|IFF_DRV_RUNNING)) == 0,
 	    ("%s: setting driver-owned flag %d", __func__, flag));
 
 	if (onswitch)
 		KASSERT(*refcount >= 0,
 		    ("%s: increment negative refcount %d for flag %d",
 		    __func__, *refcount, flag));
 	else
 		KASSERT(*refcount > 0,
 		    ("%s: decrement non-positive refcount %d for flag %d",
 		    __func__, *refcount, flag));
 
 	/* In case this mode is permanent, just touch refcount */
 	if (ifp->if_flags & pflag) {
 		*refcount += onswitch ? 1 : -1;
 		return (0);
 	}
 
 	/* Save ifnet parameters in case if_ioctl() fails */
 	oldcount = *refcount;
 	oldflags = ifp->if_flags;
 	
 	/*
 	 * See if we are not the only user, in which case touching the
 	 * refcount is enough.  Actually toggle the interface flag only
 	 * if we are the first or last.
 	 */
 	if (onswitch) {
 		if ((*refcount)++)
 			return (0);
 		ifp->if_flags |= flag;
 	} else {
 		if (--(*refcount))
 			return (0);
 		ifp->if_flags &= ~flag;
 	}
 
 	/* Call down the driver since we've changed interface flags */
 	if (ifp->if_ioctl == NULL) {
 		error = EOPNOTSUPP;
 		goto recover;
 	}
 	ifr.ifr_flags = ifp->if_flags & 0xffff;
 	ifr.ifr_flagshigh = ifp->if_flags >> 16;
 	error = (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifr);
 	if (error)
 		goto recover;
 	/* Notify userland that interface flags have changed */
 	rt_ifmsg(ifp);
 	return (0);
 
 recover:
 	/* Recover after driver error */
 	*refcount = oldcount;
 	ifp->if_flags = oldflags;
 	return (error);
 }
 
 /*
  * Set/clear promiscuous mode on interface ifp based on the truth value
  * of pswitch.  The calls are reference counted so that only the first
  * "on" request actually has an effect, as does the final "off" request.
  * Results are undefined if the "off" and "on" requests are not matched.
  */
 int
 ifpromisc(struct ifnet *ifp, int pswitch)
 {
 	int error;
 	int oldflags = ifp->if_flags;
 
 	error = if_setflag(ifp, IFF_PROMISC, IFF_PPROMISC,
 			   &ifp->if_pcount, pswitch);
 	/* If promiscuous mode status has changed, log a message */
 	if (error == 0 && ((ifp->if_flags ^ oldflags) & IFF_PROMISC) &&
             log_promisc_mode_change)
 		if_printf(ifp, "promiscuous mode %s\n",
 		    (ifp->if_flags & IFF_PROMISC) ? "enabled" : "disabled");
 	return (error);
 }
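
 /*
  * Illustrative sketch, kept under "#if 0" and not compiled: because the
  * requests are reference counted, every "on" must be paired with a
  * matching "off", and only the first and last calls actually toggle
  * IFF_PROMISC on the interface.  The wrappers are hypothetical.
  */
 #if 0
 static int
 example_sniff_start(struct ifnet *ifp)
 {

 	return (ifpromisc(ifp, 1));
 }

 static void
 example_sniff_stop(struct ifnet *ifp)
 {

 	(void)ifpromisc(ifp, 0);
 }
 #endif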
 
 /*
  * Return the interface configuration of the system.  The list may be
  * used in later ioctls (above) to get other information.
  */
 /*ARGSUSED*/
 static int
 ifconf(u_long cmd, caddr_t data)
 {
 	struct ifconf *ifc = (struct ifconf *)data;
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 	struct ifreq ifr;
 	struct sbuf *sb;
 	int error, full = 0, valid_len, max_len;
 
 	/* Limit initial buffer size to MAXPHYS to avoid DoS from userspace. */
 	max_len = MAXPHYS - 1;
 
 	/* Prevent hostile input from being able to crash the system */
 	if (ifc->ifc_len <= 0)
 		return (EINVAL);
 
 again:
 	if (ifc->ifc_len <= max_len) {
 		max_len = ifc->ifc_len;
 		full = 1;
 	}
 	sb = sbuf_new(NULL, NULL, max_len + 1, SBUF_FIXEDLEN);
 	max_len = 0;
 	valid_len = 0;
 
 	IFNET_RLOCK();
 	CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
 		struct epoch_tracker et;
 		int addrs;
 
 		/*
 		 * Zero the ifr to make sure we don't disclose the contents
 		 * of the stack.
 		 */
 		memset(&ifr, 0, sizeof(ifr));
 
 		if (strlcpy(ifr.ifr_name, ifp->if_xname, sizeof(ifr.ifr_name))
 		    >= sizeof(ifr.ifr_name)) {
 			sbuf_delete(sb);
 			IFNET_RUNLOCK();
 			return (ENAMETOOLONG);
 		}
 
 		addrs = 0;
 		NET_EPOCH_ENTER(et);
 		CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 			struct sockaddr *sa = ifa->ifa_addr;
 
 			if (prison_if(curthread->td_ucred, sa) != 0)
 				continue;
 			addrs++;
 			if (sa->sa_len <= sizeof(*sa)) {
 				if (sa->sa_len < sizeof(*sa)) {
 					memset(&ifr.ifr_ifru.ifru_addr, 0,
 					    sizeof(ifr.ifr_ifru.ifru_addr));
 					memcpy(&ifr.ifr_ifru.ifru_addr, sa,
 					    sa->sa_len);
 				} else
 					ifr.ifr_ifru.ifru_addr = *sa;
 				sbuf_bcat(sb, &ifr, sizeof(ifr));
 				max_len += sizeof(ifr);
 			} else {
 				sbuf_bcat(sb, &ifr,
 				    offsetof(struct ifreq, ifr_addr));
 				max_len += offsetof(struct ifreq, ifr_addr);
 				sbuf_bcat(sb, sa, sa->sa_len);
 				max_len += sa->sa_len;
 			}
 
 			if (sbuf_error(sb) == 0)
 				valid_len = sbuf_len(sb);
 		}
 		NET_EPOCH_EXIT(et);
 		if (addrs == 0) {
 			sbuf_bcat(sb, &ifr, sizeof(ifr));
 			max_len += sizeof(ifr);
 
 			if (sbuf_error(sb) == 0)
 				valid_len = sbuf_len(sb);
 		}
 	}
 	IFNET_RUNLOCK();
 
 	/*
 	 * If we didn't allocate enough space (uncommon), try again.  If
 	 * we have already allocated as much space as we are allowed,
 	 * return what we've got.
 	 */
 	if (valid_len != max_len && !full) {
 		sbuf_delete(sb);
 		goto again;
 	}
 
 	ifc->ifc_len = valid_len;
 	sbuf_finish(sb);
 	error = copyout(sbuf_data(sb), ifc->ifc_req, ifc->ifc_len);
 	sbuf_delete(sb);
 	return (error);
 }
 
 /*
  * Just like ifpromisc(), but for all-multicast-reception mode.
  */
 int
 if_allmulti(struct ifnet *ifp, int onswitch)
 {
 
 	return (if_setflag(ifp, IFF_ALLMULTI, 0, &ifp->if_amcount, onswitch));
 }
 
 struct ifmultiaddr *
 if_findmulti(struct ifnet *ifp, const struct sockaddr *sa)
 {
 	struct ifmultiaddr *ifma;
 
 	IF_ADDR_LOCK_ASSERT(ifp);
 
 	CK_STAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 		if (sa->sa_family == AF_LINK) {
 			if (sa_dl_equal(ifma->ifma_addr, sa))
 				break;
 		} else {
 			if (sa_equal(ifma->ifma_addr, sa))
 				break;
 		}
 	}
 
 	return ifma;
 }
 
 /*
  * Allocate a new ifmultiaddr and initialize based on passed arguments.  We
  * make copies of passed sockaddrs.  The ifmultiaddr will not be added to
  * the ifnet multicast address list here, so the caller must do that and
  * other setup work (such as notifying the device driver).  The reference
  * count is initialized to 1.
  */
 static struct ifmultiaddr *
 if_allocmulti(struct ifnet *ifp, struct sockaddr *sa, struct sockaddr *llsa,
     int mflags)
 {
 	struct ifmultiaddr *ifma;
 	struct sockaddr *dupsa;
 
 	ifma = malloc(sizeof *ifma, M_IFMADDR, mflags |
 	    M_ZERO);
 	if (ifma == NULL)
 		return (NULL);
 
 	dupsa = malloc(sa->sa_len, M_IFMADDR, mflags);
 	if (dupsa == NULL) {
 		free(ifma, M_IFMADDR);
 		return (NULL);
 	}
 	bcopy(sa, dupsa, sa->sa_len);
 	ifma->ifma_addr = dupsa;
 
 	ifma->ifma_ifp = ifp;
 	ifma->ifma_refcount = 1;
 	ifma->ifma_protospec = NULL;
 
 	if (llsa == NULL) {
 		ifma->ifma_lladdr = NULL;
 		return (ifma);
 	}
 
 	dupsa = malloc(llsa->sa_len, M_IFMADDR, mflags);
 	if (dupsa == NULL) {
 		free(ifma->ifma_addr, M_IFMADDR);
 		free(ifma, M_IFMADDR);
 		return (NULL);
 	}
 	bcopy(llsa, dupsa, llsa->sa_len);
 	ifma->ifma_lladdr = dupsa;
 
 	return (ifma);
 }
 
 /*
  * if_freemulti: free ifmultiaddr structure and possibly attached related
  * addresses.  The caller is responsible for implementing reference
  * counting, notifying the driver, handling routing messages, and releasing
  * any dependent link layer state.
  */
 #ifdef MCAST_VERBOSE
 extern void kdb_backtrace(void);
 #endif
 static void
 if_freemulti_internal(struct ifmultiaddr *ifma)
 {
 
 	KASSERT(ifma->ifma_refcount == 0, ("if_freemulti: refcount %d",
 	    ifma->ifma_refcount));
 
 	if (ifma->ifma_lladdr != NULL)
 		free(ifma->ifma_lladdr, M_IFMADDR);
 #ifdef MCAST_VERBOSE
 	kdb_backtrace();
 	printf("%s freeing ifma: %p\n", __func__, ifma);
 #endif
 	free(ifma->ifma_addr, M_IFMADDR);
 	free(ifma, M_IFMADDR);
 }
 
 static void
 if_destroymulti(epoch_context_t ctx)
 {
 	struct ifmultiaddr *ifma;
 
 	ifma = __containerof(ctx, struct ifmultiaddr, ifma_epoch_ctx);
 	if_freemulti_internal(ifma);
 }
 
 void
 if_freemulti(struct ifmultiaddr *ifma)
 {
 	KASSERT(ifma->ifma_refcount == 0, ("if_freemulti_epoch: refcount %d",
 	    ifma->ifma_refcount));
 
 	NET_EPOCH_CALL(if_destroymulti, &ifma->ifma_epoch_ctx);
 }
 
 
 /*
  * Register an additional multicast address with a network interface.
  *
  * - If the address is already present, bump the reference count on the
  *   address and return.
  * - If the address is not link-layer, look up a link layer address.
  * - Allocate address structures for one or both addresses, and attach to the
  *   multicast address list on the interface.  If automatically adding a link
  *   layer address, the protocol address will own a reference to the link
  *   layer address, to be freed when it is freed.
  * - Notify the network device driver of an addition to the multicast address
  *   list.
  *
  * 'sa' points to caller-owned memory with the desired multicast address.
  *
  * 'retifma' will be used to return a pointer to the resulting multicast
  * address reference, if desired.
  */
 int
 if_addmulti(struct ifnet *ifp, struct sockaddr *sa,
     struct ifmultiaddr **retifma)
 {
 	struct ifmultiaddr *ifma, *ll_ifma;
 	struct sockaddr *llsa;
 	struct sockaddr_dl sdl;
 	int error;
 
 #ifdef INET
 	IN_MULTI_LIST_UNLOCK_ASSERT();
 #endif
 #ifdef INET6
 	IN6_MULTI_LIST_UNLOCK_ASSERT();
 #endif
 	/*
 	 * If the address is already present, return a new reference to it;
 	 * otherwise, allocate storage and set up a new address.
 	 */
 	IF_ADDR_WLOCK(ifp);
 	ifma = if_findmulti(ifp, sa);
 	if (ifma != NULL) {
 		ifma->ifma_refcount++;
 		if (retifma != NULL)
 			*retifma = ifma;
 		IF_ADDR_WUNLOCK(ifp);
 		return (0);
 	}
 
 	/*
 	 * The address isn't already present; resolve the protocol address
 	 * into a link layer address, look that up as well, and bump its
 	 * refcount or allocate an ifma for it too.
 	 * Most link layer resolving functions return address data which
 	 * fits inside the default sockaddr_dl structure.  However, the
 	 * callback can allocate another sockaddr structure, in which case
 	 * we need to free it later.
 	 */
 	llsa = NULL;
 	ll_ifma = NULL;
 	if (ifp->if_resolvemulti != NULL) {
 		/* Provide called function with buffer size information */
 		sdl.sdl_len = sizeof(sdl);
 		llsa = (struct sockaddr *)&sdl;
 		error = ifp->if_resolvemulti(ifp, &llsa, sa);
 		if (error)
 			goto unlock_out;
 	}
 
 	/*
 	 * Allocate the new address.  Don't hook it up yet, as we may also
 	 * need to allocate a link layer multicast address.
 	 */
 	ifma = if_allocmulti(ifp, sa, llsa, M_NOWAIT);
 	if (ifma == NULL) {
 		error = ENOMEM;
 		goto free_llsa_out;
 	}
 
 	/*
 	 * If a link layer address is found, we'll need to see if it's
 	 * already present in the address list, or allocate it as well.
 	 * When this block finishes, the link layer address will be on the
 	 * list.
 	 */
 	if (llsa != NULL) {
 		ll_ifma = if_findmulti(ifp, llsa);
 		if (ll_ifma == NULL) {
 			ll_ifma = if_allocmulti(ifp, llsa, NULL, M_NOWAIT);
 			if (ll_ifma == NULL) {
 				--ifma->ifma_refcount;
 				if_freemulti(ifma);
 				error = ENOMEM;
 				goto free_llsa_out;
 			}
 			ll_ifma->ifma_flags |= IFMA_F_ENQUEUED;
 			CK_STAILQ_INSERT_HEAD(&ifp->if_multiaddrs, ll_ifma,
 			    ifma_link);
 		} else
 			ll_ifma->ifma_refcount++;
 		ifma->ifma_llifma = ll_ifma;
 	}
 
 	/*
 	 * We now have a new multicast address, ifma, and possibly a new or
 	 * referenced link layer address.  Add the primary address to the
 	 * ifnet address list.
 	 */
 	ifma->ifma_flags |= IFMA_F_ENQUEUED;
 	CK_STAILQ_INSERT_HEAD(&ifp->if_multiaddrs, ifma, ifma_link);
 
 	if (retifma != NULL)
 		*retifma = ifma;
 
 	/*
 	 * Must generate the message while holding the lock so that 'ifma'
 	 * pointer is still valid.
 	 */
 	rt_newmaddrmsg(RTM_NEWMADDR, ifma);
 	IF_ADDR_WUNLOCK(ifp);
 
 	/*
 	 * We are certain we have added something, so call down to the
 	 * interface to let them know about it.
 	 */
 	if (ifp->if_ioctl != NULL) {
 		if (THREAD_CAN_SLEEP())
 			(void )(*ifp->if_ioctl)(ifp, SIOCADDMULTI, 0);
 		else
 			taskqueue_enqueue(taskqueue_swi, &ifp->if_addmultitask);
 	}
 
 	if ((llsa != NULL) && (llsa != (struct sockaddr *)&sdl))
 		link_free_sdl(llsa);
 
 	return (0);
 
 free_llsa_out:
 	if ((llsa != NULL) && (llsa != (struct sockaddr *)&sdl))
 		link_free_sdl(llsa);
 
 unlock_out:
 	IF_ADDR_WUNLOCK(ifp);
 	return (error);
 }
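 
 /*
  * Usage sketch (illustrative only, not part of this file): a consumer that
  * wants to join a link-layer multicast group would typically build an
  * AF_LINK sockaddr_dl and keep the returned reference for a later
  * if_delmulti_ifma().  The names 'group_mac' and 'sc_ifma' below are
  * placeholders for this example.
  *
  *	struct ifmultiaddr *sc_ifma;
  *	struct sockaddr_dl sdl;
  *	int error;
  *
  *	bzero(&sdl, sizeof(sdl));
  *	sdl.sdl_len = sizeof(sdl);
  *	sdl.sdl_family = AF_LINK;
  *	sdl.sdl_type = IFT_ETHER;
  *	sdl.sdl_alen = ETHER_ADDR_LEN;
  *	bcopy(group_mac, LLADDR(&sdl), ETHER_ADDR_LEN);
  *	error = if_addmulti(ifp, (struct sockaddr *)&sdl, &sc_ifma);
  *	if (error == 0)
  *		... later: if_delmulti_ifma(sc_ifma);
  */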
 
 static void
 if_siocaddmulti(void *arg, int pending)
 {
 	struct ifnet *ifp;
 
 	ifp = arg;
 #ifdef DIAGNOSTIC
 	if (pending > 1)
 		if_printf(ifp, "%d SIOCADDMULTI coalesced\n", pending);
 #endif
 	CURVNET_SET(ifp->if_vnet);
 	(void )(*ifp->if_ioctl)(ifp, SIOCADDMULTI, 0);
 	CURVNET_RESTORE();
 }
 
 /*
  * Delete a multicast group membership by network-layer group address.
  *
  * Returns ENOENT if the entry could not be found. If ifp no longer
  * exists, results are undefined. This entry point should only be used
  * from subsystems which do appropriate locking to hold ifp for the
  * duration of the call.
  * Network-layer protocol domains must use if_delmulti_ifma().
  */
 int
 if_delmulti(struct ifnet *ifp, struct sockaddr *sa)
 {
 	struct ifmultiaddr *ifma;
 	int lastref;
 
 	KASSERT(ifp, ("%s: NULL ifp", __func__));
 
 	IF_ADDR_WLOCK(ifp);
 	lastref = 0;
 	ifma = if_findmulti(ifp, sa);
 	if (ifma != NULL)
 		lastref = if_delmulti_locked(ifp, ifma, 0);
 	IF_ADDR_WUNLOCK(ifp);
 
 	if (ifma == NULL)
 		return (ENOENT);
 
 	if (lastref && ifp->if_ioctl != NULL) {
 		(void)(*ifp->if_ioctl)(ifp, SIOCDELMULTI, 0);
 	}
 
 	return (0);
 }
 
 /*
  * Delete all multicast group membership for an interface.
  * Should be used to quickly flush all multicast filters.
  */
 void
 if_delallmulti(struct ifnet *ifp)
 {
 	struct ifmultiaddr *ifma;
 	struct ifmultiaddr *next;
 
 	IF_ADDR_WLOCK(ifp);
 	CK_STAILQ_FOREACH_SAFE(ifma, &ifp->if_multiaddrs, ifma_link, next)
 		if_delmulti_locked(ifp, ifma, 0);
 	IF_ADDR_WUNLOCK(ifp);
 }
 
 void
 if_delmulti_ifma(struct ifmultiaddr *ifma)
 {
 	if_delmulti_ifma_flags(ifma, 0);
 }
 
 /*
  * Delete a multicast group membership by group membership pointer.
  * Network-layer protocol domains must use this routine.
  *
  * It is safe to call this routine if the ifp disappeared.
  */
 void
 if_delmulti_ifma_flags(struct ifmultiaddr *ifma, int flags)
 {
 	struct ifnet *ifp;
 	int lastref;
 	MCDPRINTF("%s freeing ifma: %p\n", __func__, ifma);
 #ifdef INET
 	IN_MULTI_LIST_UNLOCK_ASSERT();
 #endif
 	ifp = ifma->ifma_ifp;
 #ifdef DIAGNOSTIC
 	if (ifp == NULL) {
 		printf("%s: ifma_ifp seems to be detached\n", __func__);
 	} else {
 		struct epoch_tracker et;
 		struct ifnet *oifp;
 
 		NET_EPOCH_ENTER(et);
 		CK_STAILQ_FOREACH(oifp, &V_ifnet, if_link)
 			if (ifp == oifp)
 				break;
 		NET_EPOCH_EXIT(et);
 		if (ifp != oifp)
 			ifp = NULL;
 	}
 #endif
 	/*
 	 * If and only if the ifnet instance exists: Acquire the address lock.
 	 */
 	if (ifp != NULL)
 		IF_ADDR_WLOCK(ifp);
 
 	lastref = if_delmulti_locked(ifp, ifma, flags);
 
 	if (ifp != NULL) {
 		/*
 		 * If and only if the ifnet instance exists:
 		 *  Release the address lock.
 		 *  If the group was left: update the hardware hash filter.
 		 */
 		IF_ADDR_WUNLOCK(ifp);
 		if (lastref && ifp->if_ioctl != NULL) {
 			(void)(*ifp->if_ioctl)(ifp, SIOCDELMULTI, 0);
 		}
 	}
 }
 
 /*
  * Perform deletion of network-layer and/or link-layer multicast address.
  *
  * Return 0 if the reference count was decremented.
  * Return 1 if the final reference was released, indicating that the
  * hardware hash filter should be reprogrammed.
  */
 static int
 if_delmulti_locked(struct ifnet *ifp, struct ifmultiaddr *ifma, int detaching)
 {
 	struct ifmultiaddr *ll_ifma;
 
 	if (ifp != NULL && ifma->ifma_ifp != NULL) {
 		KASSERT(ifma->ifma_ifp == ifp,
 		    ("%s: inconsistent ifp %p", __func__, ifp));
 		IF_ADDR_WLOCK_ASSERT(ifp);
 	}
 
 	ifp = ifma->ifma_ifp;
 	MCDPRINTF("%s freeing %p from %s \n", __func__, ifma, ifp ? ifp->if_xname : "");
 
 	/*
 	 * If the ifnet is detaching, null out references to ifnet,
 	 * so that upper protocol layers will notice, and not attempt
 	 * to obtain locks for an ifnet which no longer exists. The
 	 * routing socket announcement must happen before the ifnet
 	 * instance is detached from the system.
 	 */
 	if (detaching) {
 #ifdef DIAGNOSTIC
 		printf("%s: detaching ifnet instance %p\n", __func__, ifp);
 #endif
 		/*
 		 * ifp may already be nulled out if we are being reentered
 		 * to delete the ll_ifma.
 		 */
 		if (ifp != NULL) {
 			rt_newmaddrmsg(RTM_DELMADDR, ifma);
 			ifma->ifma_ifp = NULL;
 		}
 	}
 
 	if (--ifma->ifma_refcount > 0)
 		return 0;
 
 	if (ifp != NULL && detaching == 0 && (ifma->ifma_flags & IFMA_F_ENQUEUED)) {
 		CK_STAILQ_REMOVE(&ifp->if_multiaddrs, ifma, ifmultiaddr, ifma_link);
 		ifma->ifma_flags &= ~IFMA_F_ENQUEUED;
 	}
 	/*
 	 * If this ifma is a network-layer ifma, a link-layer ifma may
 	 * have been associated with it. Release it first if so.
 	 */
 	ll_ifma = ifma->ifma_llifma;
 	if (ll_ifma != NULL) {
 		KASSERT(ifma->ifma_lladdr != NULL,
 		    ("%s: llifma w/o lladdr", __func__));
 		if (detaching)
 			ll_ifma->ifma_ifp = NULL;	/* XXX */
 		if (--ll_ifma->ifma_refcount == 0) {
 			if (ifp != NULL) {
 				if (ll_ifma->ifma_flags & IFMA_F_ENQUEUED) {
 					CK_STAILQ_REMOVE(&ifp->if_multiaddrs, ll_ifma, ifmultiaddr,
 						ifma_link);
 					ll_ifma->ifma_flags &= ~IFMA_F_ENQUEUED;
 				}
 			}
 			if_freemulti(ll_ifma);
 		}
 	}
 #ifdef INVARIANTS
 	if (ifp) {
 		struct ifmultiaddr *ifmatmp;
 
 		CK_STAILQ_FOREACH(ifmatmp, &ifp->if_multiaddrs, ifma_link)
 			MPASS(ifma != ifmatmp);
 	}
 #endif
 	if_freemulti(ifma);
 	/*
 	 * The last reference to this instance of struct ifmultiaddr
 	 * was released; the hardware should be notified of this change.
 	 */
 	return 1;
 }
 
 /*
  * Set the link layer address on an interface.
  *
  * At this time we only support certain types of interfaces,
  * and we don't allow the length of the address to change.
  *
  * Set noinline to be dtrace-friendly
  */
 __noinline int
 if_setlladdr(struct ifnet *ifp, const u_char *lladdr, int len)
 {
 	struct sockaddr_dl *sdl;
 	struct ifaddr *ifa;
 	struct ifreq ifr;
 
 	ifa = ifp->if_addr;
 	if (ifa == NULL)
 		return (EINVAL);
 
 	sdl = (struct sockaddr_dl *)ifa->ifa_addr;
 	if (sdl == NULL)
 		return (EINVAL);
 
 	if (len != sdl->sdl_alen)	/* don't allow length to change */
 		return (EINVAL);
 
 	switch (ifp->if_type) {
 	case IFT_ETHER:
 	case IFT_XETHER:
 	case IFT_L2VLAN:
 	case IFT_BRIDGE:
 	case IFT_IEEE8023ADLAG:
 		bcopy(lladdr, LLADDR(sdl), len);
 		break;
 	default:
 		return (ENODEV);
 	}
 
 	/*
 	 * If the interface is already up, we need
 	 * to re-init it in order to reprogram its
 	 * address filter.
 	 */
 	if ((ifp->if_flags & IFF_UP) != 0) {
 		if (ifp->if_ioctl) {
 			ifp->if_flags &= ~IFF_UP;
 			ifr.ifr_flags = ifp->if_flags & 0xffff;
 			ifr.ifr_flagshigh = ifp->if_flags >> 16;
 			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifr);
 			ifp->if_flags |= IFF_UP;
 			ifr.ifr_flags = ifp->if_flags & 0xffff;
 			ifr.ifr_flagshigh = ifp->if_flags >> 16;
 			(*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)&ifr);
 		}
 	}
 	EVENTHANDLER_INVOKE(iflladdr_event, ifp);
 
 	return (0);
 }
 
 /*
  * Compat function for handling basic encapsulation requests.
  * Unconverted stacks (FDDI, IB, ...) support the traditional
  * output model: ARP (and other similar L2 protocols) are handled
  * inside the output routine, and arpresolve()/nd6_resolve() return
  * the MAC address instead of a full prepend.
  *
  * This function builds the calculated header (== MAC) for IPv4/IPv6 and
  * returns EAFNOSUPPORT (which is then handled in the ARP code) for other
  * address families.
  */
 static int
 if_requestencap_default(struct ifnet *ifp, struct if_encap_req *req)
 {
 
 	if (req->rtype != IFENCAP_LL)
 		return (EOPNOTSUPP);
 
 	if (req->bufsize < req->lladdr_len)
 		return (ENOMEM);
 
 	switch (req->family) {
 	case AF_INET:
 	case AF_INET6:
 		break;
 	default:
 		return (EAFNOSUPPORT);
 	}
 
 	/* Copy lladdr to storage as is */
 	memmove(req->buf, req->lladdr, req->lladdr_len);
 	req->bufsize = req->lladdr_len;
 	req->lladdr_off = 0;
 
 	return (0);
 }
 
 /*
  * Tunnel interfaces can nest, and when misconfigured they may cause
  * infinitely recursive calls.  We prevent this by detecting loops.
  * A high nesting level may also exhaust the stack, so we enforce an
  * upper limit.
  *
  * Return 0 if the tunnel nesting count is less than or equal to the limit.
  */
 int
 if_tunnel_check_nesting(struct ifnet *ifp, struct mbuf *m, uint32_t cookie,
     int limit)
 {
 	struct m_tag *mtag;
 	int count;
 
 	count = 1;
 	mtag = NULL;
 	while ((mtag = m_tag_locate(m, cookie, 0, mtag)) != NULL) {
 		if (*(struct ifnet **)(mtag + 1) == ifp) {
 			log(LOG_NOTICE, "%s: loop detected\n", if_name(ifp));
 			return (EIO);
 		}
 		count++;
 	}
 	if (count > limit) {
 		log(LOG_NOTICE,
 		    "%s: if_output recursively called too many times(%d)\n",
 		    if_name(ifp), count);
 		return (EIO);
 	}
 	mtag = m_tag_alloc(cookie, 0, sizeof(struct ifnet *), M_NOWAIT);
 	if (mtag == NULL)
 		return (ENOMEM);
 	*(struct ifnet **)(mtag + 1) = ifp;
 	m_tag_prepend(m, mtag);
 	return (0);
 }
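 
 /*
  * Usage sketch (illustrative assumption, not from this file): a tunnel
  * driver's transmit path would typically guard itself with a driver-private
  * mtag cookie and nesting limit, e.g.
  *
  *	error = if_tunnel_check_nesting(ifp, m, MTAG_MY_TUNNEL,
  *	    my_max_nesting);
  *	if (error != 0) {
  *		m_freem(m);
  *		return (error);
  *	}
  *
  * MTAG_MY_TUNNEL and my_max_nesting are placeholder names.
  */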
 
 /*
  * Get the link layer address that was read from the hardware at attach.
  *
  * This is only set by Ethernet NICs (IFT_ETHER), but laggX interfaces re-type
  * their component interfaces as IFT_IEEE8023ADLAG.
  */
 int
 if_gethwaddr(struct ifnet *ifp, struct ifreq *ifr)
 {
 
 	if (ifp->if_hw_addr == NULL)
 		return (ENODEV);
 
 	switch (ifp->if_type) {
 	case IFT_ETHER:
 	case IFT_IEEE8023ADLAG:
 		bcopy(ifp->if_hw_addr, ifr->ifr_addr.sa_data, ifp->if_addrlen);
 		return (0);
 	default:
 		return (ENODEV);
 	}
 }
 
 /*
  * The name argument must be a pointer to storage which will last as
  * long as the interface does.  For physical devices, the result of
  * device_get_name(dev) is a good choice and for pseudo-devices a
  * static string works well.
  */
 void
 if_initname(struct ifnet *ifp, const char *name, int unit)
 {
 	ifp->if_dname = name;
 	ifp->if_dunit = unit;
 	if (unit != IF_DUNIT_NONE)
 		snprintf(ifp->if_xname, IFNAMSIZ, "%s%d", name, unit);
 	else
 		strlcpy(ifp->if_xname, name, IFNAMSIZ);
 }
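 
 /*
  * Example (illustrative only): a typical device attach routine calls
  *
  *	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
  *
  * which produces if_xname values such as "em0"; pseudo-interfaces may pass
  * a static name and IF_DUNIT_NONE instead.
  */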
 
 int
 if_printf(struct ifnet *ifp, const char *fmt, ...)
 {
 	char if_fmt[256];
 	va_list ap;
 
 	snprintf(if_fmt, sizeof(if_fmt), "%s: %s", ifp->if_xname, fmt);
 	va_start(ap, fmt);
 	vlog(LOG_INFO, if_fmt, ap);
 	va_end(ap);
 	return (0);
 }
 
 void
 if_start(struct ifnet *ifp)
 {
 
 	(*(ifp)->if_start)(ifp);
 }
 
 /*
  * Backwards-compatibility interface for drivers
  * that have not implemented if_transmit.
  */
 static int
 if_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	int error;
 
 	IFQ_HANDOFF(ifp, m, error);
 	return (error);
 }
 
 static void
 if_input_default(struct ifnet *ifp __unused, struct mbuf *m)
 {
 
 	m_freem(m);
 }
 
 int
 if_handoff(struct ifqueue *ifq, struct mbuf *m, struct ifnet *ifp, int adjust)
 {
 	int active = 0;
 
 	IF_LOCK(ifq);
 	if (_IF_QFULL(ifq)) {
 		IF_UNLOCK(ifq);
 		if_inc_counter(ifp, IFCOUNTER_OQDROPS, 1);
 		m_freem(m);
 		return (0);
 	}
 	if (ifp != NULL) {
 		if_inc_counter(ifp, IFCOUNTER_OBYTES, m->m_pkthdr.len + adjust);
 		if (m->m_flags & (M_BCAST|M_MCAST))
 			if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 		active = ifp->if_drv_flags & IFF_DRV_OACTIVE;
 	}
 	_IF_ENQUEUE(ifq, m);
 	IF_UNLOCK(ifq);
 	if (ifp != NULL && !active)
 		(*(ifp)->if_start)(ifp);
 	return (1);
 }
 
 void
 if_register_com_alloc(u_char type,
     if_com_alloc_t *a, if_com_free_t *f)
 {
 	
 	KASSERT(if_com_alloc[type] == NULL,
 	    ("if_register_com_alloc: %d already registered", type));
 	KASSERT(if_com_free[type] == NULL,
 	    ("if_register_com_alloc: %d free already registered", type));
 
 	if_com_alloc[type] = a;
 	if_com_free[type] = f;
 }
 
 void
 if_deregister_com_alloc(u_char type)
 {
 	
 	KASSERT(if_com_alloc[type] != NULL,
 	    ("if_deregister_com_alloc: %d not registered", type));
 	KASSERT(if_com_free[type] != NULL,
 	    ("if_deregister_com_alloc: %d free not registered", type));
 	if_com_alloc[type] = NULL;
 	if_com_free[type] = NULL;
 }
 
 /* API for driver access to the network-stack-owned ifnet. */
 uint64_t
 if_setbaudrate(struct ifnet *ifp, uint64_t baudrate)
 {
 	uint64_t oldbrate;
 
 	oldbrate = ifp->if_baudrate;
 	ifp->if_baudrate = baudrate;
 	return (oldbrate);
 }
 
 uint64_t
 if_getbaudrate(if_t ifp)
 {
 
 	return (((struct ifnet *)ifp)->if_baudrate);
 }
 
 int
 if_setcapabilities(if_t ifp, int capabilities)
 {
 	((struct ifnet *)ifp)->if_capabilities = capabilities;
 	return (0);
 }
 
 int
 if_setcapabilitiesbit(if_t ifp, int setbit, int clearbit)
 {
 	((struct ifnet *)ifp)->if_capabilities |= setbit;
 	((struct ifnet *)ifp)->if_capabilities &= ~clearbit;
 
 	return (0);
 }
 
 int
 if_getcapabilities(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_capabilities;
 }
 
 int 
 if_setcapenable(if_t ifp, int capabilities)
 {
 	((struct ifnet *)ifp)->if_capenable = capabilities;
 	return (0);
 }
 
 int 
 if_setcapenablebit(if_t ifp, int setcap, int clearcap)
 {
 	if(setcap) 
 		((struct ifnet *)ifp)->if_capenable |= setcap;
 	if(clearcap)
 		((struct ifnet *)ifp)->if_capenable &= ~clearcap;
 
 	return (0);
 }
 
 const char *
 if_getdname(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_dname;
 }
 
 int 
 if_togglecapenable(if_t ifp, int togglecap)
 {
 	((struct ifnet *)ifp)->if_capenable ^= togglecap;
 	return (0);
 }
 
 int
 if_getcapenable(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_capenable;
 }
 
 /*
  * This is largely undesirable because it ties ifnet to a device, but does
  * provide flexibility for an embedded product vendor. Should be used with
  * the understanding that it violates the interface boundaries, and should be
  * a last resort only.
  */
 int
 if_setdev(if_t ifp, void *dev)
 {
 	return (0);
 }
 
 int
 if_setdrvflagbits(if_t ifp, int set_flags, int clear_flags)
 {
 	((struct ifnet *)ifp)->if_drv_flags |= set_flags;
 	((struct ifnet *)ifp)->if_drv_flags &= ~clear_flags;
 
 	return (0);
 }
 
 int
 if_getdrvflags(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_drv_flags;
 }
  
 int
 if_setdrvflags(if_t ifp, int flags)
 {
 	((struct ifnet *)ifp)->if_drv_flags = flags;
 	return (0);
 }
 
 
 int
 if_setflags(if_t ifp, int flags)
 {
 	/* XXX Temporary */
 	((struct ifnet *)ifp)->if_flags = flags | IFF_NEEDSEPOCH;
 	return (0);
 }
 
 int
 if_setflagbits(if_t ifp, int set, int clear)
 {
 	((struct ifnet *)ifp)->if_flags |= set;
 	((struct ifnet *)ifp)->if_flags &= ~clear;
 
 	return (0);
 }
 
 int
 if_getflags(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_flags;
 }
 
 int
 if_clearhwassist(if_t ifp)
 {
 	((struct ifnet *)ifp)->if_hwassist = 0;
 	return (0);
 }
 
 int
 if_sethwassistbits(if_t ifp, int toset, int toclear)
 {
 	((struct ifnet *)ifp)->if_hwassist |= toset;
 	((struct ifnet *)ifp)->if_hwassist &= ~toclear;
 
 	return (0);
 }
 
 int
 if_sethwassist(if_t ifp, int hwassist_bit)
 {
 	((struct ifnet *)ifp)->if_hwassist = hwassist_bit;
 	return (0);
 }
 
 int
 if_gethwassist(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_hwassist;
 }
 
 int
 if_setmtu(if_t ifp, int mtu)
 {
 	((struct ifnet *)ifp)->if_mtu = mtu;
 	return (0);
 }
 
 int
 if_getmtu(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_mtu;
 }
 
 int
 if_getmtu_family(if_t ifp, int family)
 {
 	struct domain *dp;
 
 	for (dp = domains; dp; dp = dp->dom_next) {
 		if (dp->dom_family == family && dp->dom_ifmtu != NULL)
 			return (dp->dom_ifmtu((struct ifnet *)ifp));
 	}
 
 	return (((struct ifnet *)ifp)->if_mtu);
 }
 
 /*
  * Methods for drivers to access interface unicast and multicast
  * link level addresses.  Drivers shall not know about 'struct ifaddr' or
  * 'struct ifmultiaddr'.
  */
 u_int
 if_lladdr_count(if_t ifp)
 {
 	struct epoch_tracker et;
 	struct ifaddr *ifa;
 	u_int count;
 
 	count = 0;
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
 		if (ifa->ifa_addr->sa_family == AF_LINK)
 			count++;
 	NET_EPOCH_EXIT(et);
 
 	return (count);
 }
 
 u_int
 if_foreach_lladdr(if_t ifp, iflladdr_cb_t cb, void *cb_arg)
 {
 	struct epoch_tracker et;
 	struct ifaddr *ifa;
 	u_int count;
 
 	MPASS(cb);
 
 	count = 0;
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 		if (ifa->ifa_addr->sa_family != AF_LINK)
 			continue;
 		count += (*cb)(cb_arg, (struct sockaddr_dl *)ifa->ifa_addr,
 		    count);
 	}
 	NET_EPOCH_EXIT(et);
 
 	return (count);
 }
 
 u_int
 if_llmaddr_count(if_t ifp)
 {
 	struct epoch_tracker et;
 	struct ifmultiaddr *ifma;
 	int count;
 
 	count = 0;
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link)
 		if (ifma->ifma_addr->sa_family == AF_LINK)
 			count++;
 	NET_EPOCH_EXIT(et);
 
 	return (count);
 }
 
 u_int
 if_foreach_llmaddr(if_t ifp, iflladdr_cb_t cb, void *cb_arg)
 {
 	struct epoch_tracker et;
 	struct ifmultiaddr *ifma;
 	u_int count;
 
 	MPASS(cb);
 
 	count = 0;
 	NET_EPOCH_ENTER(et);
 	CK_STAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 		if (ifma->ifma_addr->sa_family != AF_LINK)
 			continue;
 		count += (*cb)(cb_arg, (struct sockaddr_dl *)ifma->ifma_addr,
 		    count);
 	}
 	NET_EPOCH_EXIT(et);
 
 	return (count);
 }
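 
 /*
  * Usage sketch (illustrative, hypothetical driver code): a driver can
  * reprogram its hardware multicast filter by handing a callback to
  * if_foreach_llmaddr().  The foo_* names are placeholders.
  *
  *	static u_int
  *	foo_hash_maddr(void *arg, struct sockaddr_dl *sdl, u_int cnt)
  *	{
  *		struct foo_softc *sc = arg;
  *
  *		foo_set_hash_bit(sc, LLADDR(sdl));
  *		return (1);
  *	}
  *
  *	mcnt = if_foreach_llmaddr(ifp, foo_hash_maddr, sc);
  */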
 
 int
 if_setsoftc(if_t ifp, void *softc)
 {
 	((struct ifnet *)ifp)->if_softc = softc;
 	return (0);
 }
 
 void *
 if_getsoftc(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_softc;
 }
 
 void 
 if_setrcvif(struct mbuf *m, if_t ifp)
 {
 
 	MPASS((m->m_pkthdr.csum_flags & CSUM_SND_TAG) == 0);
 	m->m_pkthdr.rcvif = (struct ifnet *)ifp;
 }
 
 void 
 if_setvtag(struct mbuf *m, uint16_t tag)
 {
 	m->m_pkthdr.ether_vtag = tag;	
 }
 
 uint16_t
 if_getvtag(struct mbuf *m)
 {
 
 	return (m->m_pkthdr.ether_vtag);
 }
 
 int
 if_sendq_empty(if_t ifp)
 {
 	return IFQ_DRV_IS_EMPTY(&((struct ifnet *)ifp)->if_snd);
 }
 
 struct ifaddr *
 if_getifaddr(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_addr;
 }
 
 int
 if_getamcount(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_amcount;
 }
 
 
 int
 if_setsendqready(if_t ifp)
 {
 	IFQ_SET_READY(&((struct ifnet *)ifp)->if_snd);
 	return (0);
 }
 
 int
 if_setsendqlen(if_t ifp, int tx_desc_count)
 {
 	IFQ_SET_MAXLEN(&((struct ifnet *)ifp)->if_snd, tx_desc_count);
 	((struct ifnet *)ifp)->if_snd.ifq_drv_maxlen = tx_desc_count;
 
 	return (0);
 }
 
 int
 if_vlantrunkinuse(if_t ifp)
 {
 	return ((struct ifnet *)ifp)->if_vlantrunk != NULL?1:0;
 }
 
 int
 if_input(if_t ifp, struct mbuf* sendmp)
 {
 	(*((struct ifnet *)ifp)->if_input)((struct ifnet *)ifp, sendmp);
 	return (0);
 
 }
 
 struct mbuf *
 if_dequeue(if_t ifp)
 {
 	struct mbuf *m;
 	IFQ_DRV_DEQUEUE(&((struct ifnet *)ifp)->if_snd, m);
 
 	return (m);
 }
 
 int
 if_sendq_prepend(if_t ifp, struct mbuf *m)
 {
 	IFQ_DRV_PREPEND(&((struct ifnet *)ifp)->if_snd, m);
 	return (0);
 }
 
 int
 if_setifheaderlen(if_t ifp, int len)
 {
 	((struct ifnet *)ifp)->if_hdrlen = len;
 	return (0);
 }
 
 caddr_t
 if_getlladdr(if_t ifp)
 {
 	return (IF_LLADDR((struct ifnet *)ifp));
 }
 
 void *
 if_gethandle(u_char type)
 {
 	return (if_alloc(type));
 }
 
 void
 if_bpfmtap(if_t ifh, struct mbuf *m)
 {
 	struct ifnet *ifp = (struct ifnet *)ifh;
 
 	BPF_MTAP(ifp, m);
 }
 
 void
 if_etherbpfmtap(if_t ifh, struct mbuf *m)
 {
 	struct ifnet *ifp = (struct ifnet *)ifh;
 
 	ETHER_BPF_MTAP(ifp, m);
 }
 
 void
 if_vlancap(if_t ifh)
 {
 	struct ifnet *ifp = (struct ifnet *)ifh;
 	VLAN_CAPABILITIES(ifp);
 }
 
 int
 if_sethwtsomax(if_t ifp, u_int if_hw_tsomax)
 {
 
 	((struct ifnet *)ifp)->if_hw_tsomax = if_hw_tsomax;
 	return (0);
 }
 
 int
 if_sethwtsomaxsegcount(if_t ifp, u_int if_hw_tsomaxsegcount)
 {
 
 	((struct ifnet *)ifp)->if_hw_tsomaxsegcount = if_hw_tsomaxsegcount;
 	return (0);
 }
 
 int
 if_sethwtsomaxsegsize(if_t ifp, u_int if_hw_tsomaxsegsize)
 {
 
 	((struct ifnet *)ifp)->if_hw_tsomaxsegsize = if_hw_tsomaxsegsize;
 	return (0);
 }
 
 u_int
 if_gethwtsomax(if_t ifp)
 {
 
 	return (((struct ifnet *)ifp)->if_hw_tsomax);
 }
 
 u_int
 if_gethwtsomaxsegcount(if_t ifp)
 {
 
 	return (((struct ifnet *)ifp)->if_hw_tsomaxsegcount);
 }
 
 u_int
 if_gethwtsomaxsegsize(if_t ifp)
 {
 
 	return (((struct ifnet *)ifp)->if_hw_tsomaxsegsize);
 }
 
 void
 if_setinitfn(if_t ifp, void (*init_fn)(void *))
 {
 	((struct ifnet *)ifp)->if_init = init_fn;
 }
 
 void
 if_setioctlfn(if_t ifp, int (*ioctl_fn)(if_t, u_long, caddr_t))
 {
 	((struct ifnet *)ifp)->if_ioctl = (void *)ioctl_fn;
 }
 
 void
 if_setstartfn(if_t ifp, void (*start_fn)(if_t))
 {
 	((struct ifnet *)ifp)->if_start = (void *)start_fn;
 }
 
 void
 if_settransmitfn(if_t ifp, if_transmit_fn_t start_fn)
 {
 	((struct ifnet *)ifp)->if_transmit = start_fn;
 }
 
 void
 if_setqflushfn(if_t ifp, if_qflush_fn_t flush_fn)
 {
 	((struct ifnet *)ifp)->if_qflush = flush_fn;
 }
 
 void
 if_setgetcounterfn(if_t ifp, if_get_counter_t fn)
 {
 
 	ifp->if_get_counter = fn;
 }
 
 /* Revisit these; they were originally inline functions. */
 int
 drbr_inuse_drv(if_t ifh, struct buf_ring *br)
 {
 	return drbr_inuse(ifh, br);
 }
 
 struct mbuf*
 drbr_dequeue_drv(if_t ifh, struct buf_ring *br)
 {
 	return drbr_dequeue(ifh, br);
 }
 
 int
 drbr_needs_enqueue_drv(if_t ifh, struct buf_ring *br)
 {
 	return drbr_needs_enqueue(ifh, br);
 }
 
 int
 drbr_enqueue_drv(if_t ifh, struct buf_ring *br, struct mbuf *m)
 {
 	return drbr_enqueue(ifh, br, m);
 
 }
Index: head/sys/net/vnet.c
===================================================================
--- head/sys/net/vnet.c	(revision 358019)
+++ head/sys/net/vnet.c	(revision 358020)
@@ -1,800 +1,808 @@
 /*-
  * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
  *
  * Copyright (c) 2004-2009 University of Zagreb
  * Copyright (c) 2006-2009 FreeBSD Foundation
  * All rights reserved.
  *
  * This software was developed by the University of Zagreb and the
  * FreeBSD Foundation under sponsorship by the Stichting NLnet and the
  * FreeBSD Foundation.
  *
  * Copyright (c) 2009 Jeffrey Roberson <jeff@freebsd.org>
  * Copyright (c) 2009 Robert N. M. Watson
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_ddb.h"
 #include "opt_kdb.h"
 
 #include <sys/param.h>
 #include <sys/kdb.h>
 #include <sys/kernel.h>
 #include <sys/jail.h>
 #include <sys/sdt.h>
 #include <sys/systm.h>
 #include <sys/sysctl.h>
 #include <sys/eventhandler.h>
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/proc.h>
 #include <sys/socket.h>
 #include <sys/sx.h>
 #include <sys/sysctl.h>
 
 #include <machine/stdarg.h>
 
 #ifdef DDB
 #include <ddb/ddb.h>
 #include <ddb/db_sym.h>
 #endif
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/vnet.h>
 
 /*-
  * This file implements core functions for virtual network stacks:
  *
  * - Virtual network stack management functions.
  *
  * - Virtual network stack memory allocator, which virtualizes global
  *   variables in the network stack.
  *
  * - Virtualized SYSINIT's/SYSUNINIT's, which allow network stack subsystems
  *   to register startup/shutdown events to be run for each virtual network
  *   stack instance.
  */
 
 FEATURE(vimage, "VIMAGE kernel virtualization");
 
 static MALLOC_DEFINE(M_VNET, "vnet", "network stack control block");
 
 /*
  * The virtual network stack list has two read-write locks, one sleepable and
  * the other not, so that the list can be stabilized and walked in a variety
  * of network stack contexts.  Both must be acquired exclusively to modify
  * the list, but a read lock of either lock is sufficient to walk the list.
  */
 struct rwlock		vnet_rwlock;
 struct sx		vnet_sxlock;
 
 #define	VNET_LIST_WLOCK() do {						\
 	sx_xlock(&vnet_sxlock);						\
 	rw_wlock(&vnet_rwlock);						\
 } while (0)
 
 #define	VNET_LIST_WUNLOCK() do {					\
 	rw_wunlock(&vnet_rwlock);					\
 	sx_xunlock(&vnet_sxlock);					\
 } while (0)
 
 struct vnet_list_head vnet_head;
 struct vnet *vnet0;
 
 /*
  * The virtual network stack allocator provides storage for virtualized
  * global variables.  These variables are defined/declared using the
  * VNET_DEFINE()/VNET_DECLARE() macros, which place them in the 'set_vnet'
  * linker set.  The details of the implementation are somewhat subtle, but
  * allow the majority of network subsystems to remain
  * virtualization-agnostic.
  *
  * The virtual network stack allocator handles variables in the base kernel
  * vs. modules in similar but different ways.  In both cases, virtualized
  * global variables are marked as such by being declared to be part of the
  * vnet linker set.  These "master" copies of global variables serve two
  * functions:
  *
  * (1) They contain static initialization or "default" values for global
  *     variables which will be propagated to each virtual network stack
  *     instance when created.  As with normal global variables, they default
  *     to zero-filled.
  *
  * (2) They act as unique global names by which the variable can be referred
  *     to, regardless of network stack instance.  The single global symbol
  *     will be used to calculate the location of a per-virtual instance
  *     variable at run-time.
  *
  * Each virtual network stack instance has a complete copy of each
  * virtualized global variable, stored in a malloc'd block of memory
  * referred to by vnet->vnet_data_mem.  Critical to the design is that each
  * per-instance memory block is laid out identically to the master block so
  * that the offset of each global variable is the same across all blocks.  To
  * optimize run-time access, a precalculated 'base' address,
  * vnet->vnet_data_base, is stored in each vnet, and is the amount that can
  * be added to the address of a 'master' instance of a variable to get to the
  * per-vnet instance.
  *
  * Virtualized global variables in modules are handled similarly, but as
  * each module has its own 'set_vnet' linker set, and we want to keep all
  * virtualized globals together, we reserve space in the kernel's linker set
  * for potential module variables using a per-vnet character array,
  * 'modspace'.  The virtual network stack allocator maintains a free list to
  * track what space in the array is free (all, initially) and as modules are
  * linked, allocates portions of the space to specific globals.  The kernel
  * module linker queries the virtual network stack allocator and will
  * bind references of the global to the location during linking.  It also
  * calls into the virtual network stack allocator, once the memory is
  * initialized, in order to propagate the new static initializations to all
  * existing virtual network stack instances so that the soon-to-be executing
  * module will find every network stack instance with proper default values.
  */
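 
 /*
  * Illustrative sketch (not part of this file): a subsystem virtualizes a
  * global by defining it in the vnet linker set and accessing it through
  * the VNET() accessor, conventionally behind a V_ macro, e.g.
  *
  *	VNET_DEFINE(int, foo_enable) = 1;
  *	#define	V_foo_enable	VNET(foo_enable)
  *
  * after which V_foo_enable resolves to the copy of the variable belonging
  * to curvnet.  The name 'foo_enable' is made up for this example.
  */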
 
 /*
  * Number of bytes of data in the 'set_vnet' linker set, and hence the total
  * size of all kernel virtualized global variables, and the malloc(9) type
  * that will be used to allocate it.
  */
 #define	VNET_BYTES	(VNET_STOP - VNET_START)
 
 static MALLOC_DEFINE(M_VNET_DATA, "vnet_data", "VNET data");
 
 /*
  * VNET_MODMIN is the minimum number of bytes we will reserve for the sum of
  * global variables across all loaded modules.  As this actually sizes an
  * array declared as a virtualized global variable in the kernel itself, and
  * we want the virtualized global variable space to be page-sized, we may
  * have more space than that in practice.
  */
 #define	VNET_MODMIN	(8 * PAGE_SIZE)
 #define	VNET_SIZE	roundup2(VNET_BYTES, PAGE_SIZE)
 
 /*
  * Space to store virtualized global variables from loadable kernel modules,
  * and the free list to manage it.
  */
 VNET_DEFINE_STATIC(char, modspace[VNET_MODMIN] __aligned(__alignof(void *)));
 
 /*
  * Global lists of subsystem constructor and destructors for vnets.  They are
  * registered via VNET_SYSINIT() and VNET_SYSUNINIT().  Both lists are
  * protected by the vnet_sysinit_sxlock global lock.
  */
 static TAILQ_HEAD(vnet_sysinit_head, vnet_sysinit) vnet_constructors =
 	TAILQ_HEAD_INITIALIZER(vnet_constructors);
 static TAILQ_HEAD(vnet_sysuninit_head, vnet_sysinit) vnet_destructors =
 	TAILQ_HEAD_INITIALIZER(vnet_destructors);
 
 struct sx		vnet_sysinit_sxlock;
 
 #define	VNET_SYSINIT_WLOCK()	sx_xlock(&vnet_sysinit_sxlock);
 #define	VNET_SYSINIT_WUNLOCK()	sx_xunlock(&vnet_sysinit_sxlock);
 #define	VNET_SYSINIT_RLOCK()	sx_slock(&vnet_sysinit_sxlock);
 #define	VNET_SYSINIT_RUNLOCK()	sx_sunlock(&vnet_sysinit_sxlock);
 
 struct vnet_data_free {
 	uintptr_t	vnd_start;
 	int		vnd_len;
 	TAILQ_ENTRY(vnet_data_free) vnd_link;
 };
 
 static MALLOC_DEFINE(M_VNET_DATA_FREE, "vnet_data_free",
     "VNET resource accounting");
 static TAILQ_HEAD(, vnet_data_free) vnet_data_free_head =
 	    TAILQ_HEAD_INITIALIZER(vnet_data_free_head);
 static struct sx vnet_data_free_lock;
 
 SDT_PROVIDER_DEFINE(vnet);
 SDT_PROBE_DEFINE1(vnet, functions, vnet_alloc, entry, "int");
 SDT_PROBE_DEFINE2(vnet, functions, vnet_alloc, alloc, "int",
     "struct vnet *");
 SDT_PROBE_DEFINE2(vnet, functions, vnet_alloc, return,
     "int", "struct vnet *");
 SDT_PROBE_DEFINE2(vnet, functions, vnet_destroy, entry,
     "int", "struct vnet *");
 SDT_PROBE_DEFINE1(vnet, functions, vnet_destroy, return,
     "int");
 
 #ifdef DDB
 static void db_show_vnet_print_vs(struct vnet_sysinit *, int);
 #endif
 
 /*
  * Allocate a virtual network stack.
  */
 struct vnet *
 vnet_alloc(void)
 {
 	struct vnet *vnet;
 
 	SDT_PROBE1(vnet, functions, vnet_alloc, entry, __LINE__);
 	vnet = malloc(sizeof(struct vnet), M_VNET, M_WAITOK | M_ZERO);
 	vnet->vnet_magic_n = VNET_MAGIC_N;
 	SDT_PROBE2(vnet, functions, vnet_alloc, alloc, __LINE__, vnet);
 
 	/*
 	 * Allocate storage for virtualized global variables and copy in
 	 * initial values from our 'master' copy.
 	 */
 	vnet->vnet_data_mem = malloc(VNET_SIZE, M_VNET_DATA, M_WAITOK);
 	memcpy(vnet->vnet_data_mem, (void *)VNET_START, VNET_BYTES);
 
 	/*
 	 * All use of vnet-specific data will immediately subtract VNET_START
 	 * from the base memory pointer, so pre-calculate that now to avoid
 	 * it on each use.
 	 */
 	vnet->vnet_data_base = (uintptr_t)vnet->vnet_data_mem - VNET_START;
 
 	/* Initialize / attach vnet module instances. */
 	CURVNET_SET_QUIET(vnet);
 	vnet_sysinit();
 	CURVNET_RESTORE();
 
 	VNET_LIST_WLOCK();
 	LIST_INSERT_HEAD(&vnet_head, vnet, vnet_le);
 	VNET_LIST_WUNLOCK();
 
 	SDT_PROBE2(vnet, functions, vnet_alloc, return, __LINE__, vnet);
 	return (vnet);
 }
 
 /*
  * Destroy a virtual network stack.
  */
 void
 vnet_destroy(struct vnet *vnet)
 {
 
 	SDT_PROBE2(vnet, functions, vnet_destroy, entry, __LINE__, vnet);
 	KASSERT(vnet->vnet_sockcnt == 0,
 	    ("%s: vnet still has sockets", __func__));
 
 	VNET_LIST_WLOCK();
 	LIST_REMOVE(vnet, vnet_le);
 	VNET_LIST_WUNLOCK();
 
+	/* Signal that the VNET is being shut down. */
+	vnet->vnet_shutdown = true;
+
 	CURVNET_SET_QUIET(vnet);
 	vnet_sysuninit();
 	CURVNET_RESTORE();
 
 	/*
 	 * Release storage for the virtual network stack instance.
 	 */
 	free(vnet->vnet_data_mem, M_VNET_DATA);
 	vnet->vnet_data_mem = NULL;
 	vnet->vnet_data_base = 0;
 	vnet->vnet_magic_n = 0xdeadbeef;
 	free(vnet, M_VNET);
 	SDT_PROBE1(vnet, functions, vnet_destroy, return, __LINE__);
 }
 
 /*
  * Boot time initialization and allocation of virtual network stacks.
  */
 static void
 vnet_init_prelink(void *arg __unused)
 {
 
 	rw_init(&vnet_rwlock, "vnet_rwlock");
 	sx_init(&vnet_sxlock, "vnet_sxlock");
 	sx_init(&vnet_sysinit_sxlock, "vnet_sysinit_sxlock");
 	LIST_INIT(&vnet_head);
 }
 SYSINIT(vnet_init_prelink, SI_SUB_VNET_PRELINK, SI_ORDER_FIRST,
     vnet_init_prelink, NULL);
 
 static void
 vnet0_init(void *arg __unused)
 {
 
 	if (bootverbose)
 		printf("VIMAGE (virtualized network stack) enabled\n");
 
 	/*
 	 * We MUST clear curvnet in vnet_init_done() before going SMP,
 	 * otherwise CURVNET_SET() macros would scream about unnecessary
 	 * curvnet recursions.
 	 */
 	curvnet = prison0.pr_vnet = vnet0 = vnet_alloc();
 }
 SYSINIT(vnet0_init, SI_SUB_VNET, SI_ORDER_FIRST, vnet0_init, NULL);
 
 static void
 vnet_init_done(void *unused __unused)
 {
 
 	curvnet = NULL;
 }
 SYSINIT(vnet_init_done, SI_SUB_VNET_DONE, SI_ORDER_ANY, vnet_init_done,
     NULL);
 
 /*
  * Once at boot, initialize the modspace free list to cover modspace entirely.
  */
 static void
 vnet_data_startup(void *dummy __unused)
 {
 	struct vnet_data_free *df;
 
 	df = malloc(sizeof(*df), M_VNET_DATA_FREE, M_WAITOK | M_ZERO);
 	df->vnd_start = (uintptr_t)&VNET_NAME(modspace);
 	df->vnd_len = VNET_MODMIN;
 	TAILQ_INSERT_HEAD(&vnet_data_free_head, df, vnd_link);
 	sx_init(&vnet_data_free_lock, "vnet_data alloc lock");
 }
 SYSINIT(vnet_data, SI_SUB_KLD, SI_ORDER_FIRST, vnet_data_startup, NULL);
 
+/* Dummy VNET_SYSINIT to make sure we always reach the final end state. */
 static void
-vnet_sysuninit_shutdown(void *unused __unused)
+vnet_sysinit_done(void *unused __unused)
 {
 
-	/* Signal that VNET is being shutdown. */
-	curvnet->vnet_shutdown = 1;
+	return;
 }
-VNET_SYSUNINIT(vnet_sysuninit_shutdown, SI_SUB_VNET_DONE, SI_ORDER_FIRST,
-    vnet_sysuninit_shutdown, NULL);
+VNET_SYSINIT(vnet_sysinit_done, SI_SUB_VNET_DONE, SI_ORDER_ANY,
+    vnet_sysinit_done, NULL);
 
 /*
  * When a module is loaded and requires storage for a virtualized global
  * variable, allocate space from the modspace free list.  This interface
  * should be used only by the kernel linker.
  */
 void *
 vnet_data_alloc(int size)
 {
 	struct vnet_data_free *df;
 	void *s;
 
 	s = NULL;
 	size = roundup2(size, sizeof(void *));
 	sx_xlock(&vnet_data_free_lock);
 	TAILQ_FOREACH(df, &vnet_data_free_head, vnd_link) {
 		if (df->vnd_len < size)
 			continue;
 		if (df->vnd_len == size) {
 			s = (void *)df->vnd_start;
 			TAILQ_REMOVE(&vnet_data_free_head, df, vnd_link);
 			free(df, M_VNET_DATA_FREE);
 			break;
 		}
 		s = (void *)df->vnd_start;
 		df->vnd_len -= size;
 		df->vnd_start = df->vnd_start + size;
 		break;
 	}
 	sx_xunlock(&vnet_data_free_lock);
 
 	return (s);
 }
 
 /*
  * Free space for a virtualized global variable on module unload.
  */
 void
 vnet_data_free(void *start_arg, int size)
 {
 	struct vnet_data_free *df;
 	struct vnet_data_free *dn;
 	uintptr_t start;
 	uintptr_t end;
 
 	size = roundup2(size, sizeof(void *));
 	start = (uintptr_t)start_arg;
 	end = start + size;
 	/*
 	 * Free a region of space and merge it with as many neighbors as
 	 * possible.  Keeping the list sorted simplifies this operation.
 	 */
 	sx_xlock(&vnet_data_free_lock);
 	TAILQ_FOREACH(df, &vnet_data_free_head, vnd_link) {
 		if (df->vnd_start > end)
 			break;
 		/*
 		 * If we expand at the end of an entry we may have to merge
 		 * it with the one following it as well.
 		 */
 		if (df->vnd_start + df->vnd_len == start) {
 			df->vnd_len += size;
 			dn = TAILQ_NEXT(df, vnd_link);
 			if (dn != NULL &&
 			    df->vnd_start + df->vnd_len == dn->vnd_start) {
 				df->vnd_len += dn->vnd_len;
 				TAILQ_REMOVE(&vnet_data_free_head, dn,
 				    vnd_link);
 				free(dn, M_VNET_DATA_FREE);
 			}
 			sx_xunlock(&vnet_data_free_lock);
 			return;
 		}
 		if (df->vnd_start == end) {
 			df->vnd_start = start;
 			df->vnd_len += size;
 			sx_xunlock(&vnet_data_free_lock);
 			return;
 		}
 	}
 	dn = malloc(sizeof(*df), M_VNET_DATA_FREE, M_WAITOK | M_ZERO);
 	dn->vnd_start = start;
 	dn->vnd_len = size;
 	if (df)
 		TAILQ_INSERT_BEFORE(df, dn, vnd_link);
 	else
 		TAILQ_INSERT_TAIL(&vnet_data_free_head, dn, vnd_link);
 	sx_xunlock(&vnet_data_free_lock);
 }
 
 /*
  * When a new virtualized global variable has been allocated, propagate its
  * initial value to each already-allocated virtual network stack instance.
  */
 void
 vnet_data_copy(void *start, int size)
 {
 	struct vnet *vnet;
 
 	VNET_LIST_RLOCK();
 	LIST_FOREACH(vnet, &vnet_head, vnet_le)
 		memcpy((void *)((uintptr_t)vnet->vnet_data_base +
 		    (uintptr_t)start), start, size);
 	VNET_LIST_RUNLOCK();
 }
 
 /*
  * Support for special SYSINIT handlers registered via VNET_SYSINIT()
  * and VNET_SYSUNINIT().
  */
 void
 vnet_register_sysinit(void *arg)
 {
 	struct vnet_sysinit *vs, *vs2;	
 	struct vnet *vnet;
 
 	vs = arg;
 	KASSERT(vs->subsystem > SI_SUB_VNET, ("vnet sysinit too early"));
 
 	/* Add the constructor to the global list of vnet constructors. */
 	VNET_SYSINIT_WLOCK();
 	TAILQ_FOREACH(vs2, &vnet_constructors, link) {
 		if (vs2->subsystem > vs->subsystem)
 			break;
 		if (vs2->subsystem == vs->subsystem && vs2->order > vs->order)
 			break;
 	}
 	if (vs2 != NULL)
 		TAILQ_INSERT_BEFORE(vs2, vs, link);
 	else
 		TAILQ_INSERT_TAIL(&vnet_constructors, vs, link);
 
 	/*
 	 * Invoke the constructor on all the existing vnets when it is
 	 * registered.
 	 */
 	VNET_FOREACH(vnet) {
 		CURVNET_SET_QUIET(vnet);
 		vs->func(vs->arg);
 		CURVNET_RESTORE();
 	}
 	VNET_SYSINIT_WUNLOCK();
 }
 
 void
 vnet_deregister_sysinit(void *arg)
 {
 	struct vnet_sysinit *vs;
 
 	vs = arg;
 
 	/* Remove the constructor from the global list of vnet constructors. */
 	VNET_SYSINIT_WLOCK();
 	TAILQ_REMOVE(&vnet_constructors, vs, link);
 	VNET_SYSINIT_WUNLOCK();
 }
 
 void
 vnet_register_sysuninit(void *arg)
 {
 	struct vnet_sysinit *vs, *vs2;
 
 	vs = arg;
 
 	/* Add the destructor to the global list of vnet destructors. */
 	VNET_SYSINIT_WLOCK();
 	TAILQ_FOREACH(vs2, &vnet_destructors, link) {
 		if (vs2->subsystem > vs->subsystem)
 			break;
 		if (vs2->subsystem == vs->subsystem && vs2->order > vs->order)
 			break;
 	}
 	if (vs2 != NULL)
 		TAILQ_INSERT_BEFORE(vs2, vs, link);
 	else
 		TAILQ_INSERT_TAIL(&vnet_destructors, vs, link);
 	VNET_SYSINIT_WUNLOCK();
 }
 
 void
 vnet_deregister_sysuninit(void *arg)
 {
 	struct vnet_sysinit *vs;
 	struct vnet *vnet;
 
 	vs = arg;
 
 	/*
 	 * Invoke the destructor on all the existing vnets when it is
 	 * deregistered.
 	 */
 	VNET_SYSINIT_WLOCK();
 	VNET_FOREACH(vnet) {
 		CURVNET_SET_QUIET(vnet);
 		vs->func(vs->arg);
 		CURVNET_RESTORE();
 	}
 
 	/* Remove the destructor from the global list of vnet destructors. */
 	TAILQ_REMOVE(&vnet_destructors, vs, link);
 	VNET_SYSINIT_WUNLOCK();
 }
 
 /*
  * Invoke all registered vnet constructors on the current vnet.  Used during
  * vnet construction.  The caller is responsible for ensuring the new vnet is
  * the current vnet and that the vnet_sysinit_sxlock lock is locked.
  */
 void
 vnet_sysinit(void)
 {
 	struct vnet_sysinit *vs;
 
 	VNET_SYSINIT_RLOCK();
-	TAILQ_FOREACH(vs, &vnet_constructors, link)
+	TAILQ_FOREACH(vs, &vnet_constructors, link) {
+		curvnet->vnet_state = vs->subsystem;
 		vs->func(vs->arg);
+	}
 	VNET_SYSINIT_RUNLOCK();
 }
 
 /*
  * Invoke all registered vnet destructors on the current vnet.  Used during
  * vnet destruction.  The caller is responsible for ensuring the dying vnet
  * is the current vnet and that the vnet_sysinit_sxlock lock is locked.
  */
 void
 vnet_sysuninit(void)
 {
 	struct vnet_sysinit *vs;
 
 	VNET_SYSINIT_RLOCK();
 	TAILQ_FOREACH_REVERSE(vs, &vnet_destructors, vnet_sysuninit_head,
-	    link)
+	    link) {
+		curvnet->vnet_state = vs->subsystem;
 		vs->func(vs->arg);
+	}
 	VNET_SYSINIT_RUNLOCK();
 }
 
 /*
  * EVENTHANDLER(9) extensions.
  */
 /*
  * Invoke the eventhandler function originally registered with the possibly
  * registered argument for all virtual network stack instances.
  *
  * This iterator can only be used for eventhandlers that do not take any
  * additional arguments, as we ignore the variadic arguments from the
  * EVENTHANDLER_INVOKE() call.
  */
 void
 vnet_global_eventhandler_iterator_func(void *arg, ...)
 {
 	VNET_ITERATOR_DECL(vnet_iter);
 	struct eventhandler_entry_vimage *v_ee;
 
 	/*
 	 * There is a bug here in that we should actually cast things to
 	 * (struct eventhandler_entry_ ## name *)  but that's not easily
 	 * possible in here, so we just re-use the variadic version we
 	 * defined for the generic vimage case.
 	 */
 	v_ee = arg;
 	VNET_LIST_RLOCK();
 	VNET_FOREACH(vnet_iter) {
 		CURVNET_SET(vnet_iter);
 		((vimage_iterator_func_t)v_ee->func)(v_ee->ee_arg);
 		CURVNET_RESTORE();
 	}
 	VNET_LIST_RUNLOCK();
 }
 
 #ifdef VNET_DEBUG
 struct vnet_recursion {
 	SLIST_ENTRY(vnet_recursion)	 vnr_le;
 	const char			*prev_fn;
 	const char			*where_fn;
 	int				 where_line;
 	struct vnet			*old_vnet;
 	struct vnet			*new_vnet;
 };
 
 static SLIST_HEAD(, vnet_recursion) vnet_recursions =
     SLIST_HEAD_INITIALIZER(vnet_recursions);
 
 static void
 vnet_print_recursion(struct vnet_recursion *vnr, int brief)
 {
 
 	if (!brief)
 		printf("CURVNET_SET() recursion in ");
 	printf("%s() line %d, prev in %s()", vnr->where_fn, vnr->where_line,
 	    vnr->prev_fn);
 	if (brief)
 		printf(", ");
 	else
 		printf("\n    ");
 	printf("%p -> %p\n", vnr->old_vnet, vnr->new_vnet);
 }
 
 void
 vnet_log_recursion(struct vnet *old_vnet, const char *old_fn, int line)
 {
 	struct vnet_recursion *vnr;
 
 	/* Skip already logged recursion events. */
 	SLIST_FOREACH(vnr, &vnet_recursions, vnr_le)
 		if (vnr->prev_fn == old_fn &&
 		    vnr->where_fn == curthread->td_vnet_lpush &&
 		    vnr->where_line == line &&
 		    (vnr->old_vnet == vnr->new_vnet) == (curvnet == old_vnet))
 			return;
 
 	vnr = malloc(sizeof(*vnr), M_VNET, M_NOWAIT | M_ZERO);
 	if (vnr == NULL)
 		panic("%s: malloc failed", __func__);
 	vnr->prev_fn = old_fn;
 	vnr->where_fn = curthread->td_vnet_lpush;
 	vnr->where_line = line;
 	vnr->old_vnet = old_vnet;
 	vnr->new_vnet = curvnet;
 
 	SLIST_INSERT_HEAD(&vnet_recursions, vnr, vnr_le);
 
 	vnet_print_recursion(vnr, 0);
 #ifdef KDB
 	kdb_backtrace();
 #endif
 }
 #endif /* VNET_DEBUG */
 
 /*
  * DDB(4).
  */
 #ifdef DDB
 static void
 db_vnet_print(struct vnet *vnet)
 {
 
 	db_printf("vnet            = %p\n", vnet);
 	db_printf(" vnet_magic_n   = %#08x (%s, orig %#08x)\n",
 	    vnet->vnet_magic_n,
 	    (vnet->vnet_magic_n == VNET_MAGIC_N) ?
 		"ok" : "mismatch", VNET_MAGIC_N);
 	db_printf(" vnet_ifcnt     = %u\n", vnet->vnet_ifcnt);
 	db_printf(" vnet_sockcnt   = %u\n", vnet->vnet_sockcnt);
 	db_printf(" vnet_data_mem  = %p\n", vnet->vnet_data_mem);
 	db_printf(" vnet_data_base = %#jx\n",
 	    (uintmax_t)vnet->vnet_data_base);
-	db_printf(" vnet_shutdown  = %#08x\n", vnet->vnet_shutdown);
+	db_printf(" vnet_state     = %#08x\n", vnet->vnet_state);
+	db_printf(" vnet_shutdown  = %#03x\n", vnet->vnet_shutdown);
 	db_printf("\n");
 }
 
 DB_SHOW_ALL_COMMAND(vnets, db_show_all_vnets)
 {
 	VNET_ITERATOR_DECL(vnet_iter);
 
 	VNET_FOREACH(vnet_iter) {
 		db_vnet_print(vnet_iter);
 		if (db_pager_quit)
 			break;
 	}
 }
 
 DB_SHOW_COMMAND(vnet, db_show_vnet)
 {
 
 	if (!have_addr) {
 		db_printf("usage: show vnet <struct vnet *>\n");
 		return;
 	}
 
 	db_vnet_print((struct vnet *)addr);
 }
 
 static void
 db_show_vnet_print_vs(struct vnet_sysinit *vs, int ddb)
 {
 	const char *vsname, *funcname;
 	c_db_sym_t sym;
 	db_expr_t  offset;
 
 #define xprint(...)							\
 	if (ddb)							\
 		db_printf(__VA_ARGS__);					\
 	else								\
 		printf(__VA_ARGS__)
 
 	if (vs == NULL) {
 		xprint("%s: no vnet_sysinit * given\n", __func__);
 		return;
 	}
 
 	sym = db_search_symbol((vm_offset_t)vs, DB_STGY_ANY, &offset);
 	db_symbol_values(sym, &vsname, NULL);
 	sym = db_search_symbol((vm_offset_t)vs->func, DB_STGY_PROC, &offset);
 	db_symbol_values(sym, &funcname, NULL);
 	xprint("%s(%p)\n", (vsname != NULL) ? vsname : "", vs);
 	xprint("  %#08x %#08x\n", vs->subsystem, vs->order);
 	xprint("  %p(%s)(%p)\n",
 	    vs->func, (funcname != NULL) ? funcname : "", vs->arg);
 #undef xprint
 }
 
 DB_SHOW_COMMAND(vnet_sysinit, db_show_vnet_sysinit)
 {
 	struct vnet_sysinit *vs;
 
 	db_printf("VNET_SYSINIT vs Name(Ptr)\n");
 	db_printf("  Subsystem  Order\n");
 	db_printf("  Function(Name)(Arg)\n");
 	TAILQ_FOREACH(vs, &vnet_constructors, link) {
 		db_show_vnet_print_vs(vs, 1);
 		if (db_pager_quit)
 			break;
 	}
 }
 
 DB_SHOW_COMMAND(vnet_sysuninit, db_show_vnet_sysuninit)
 {
 	struct vnet_sysinit *vs;
 
 	db_printf("VNET_SYSUNINIT vs Name(Ptr)\n");
 	db_printf("  Subsystem  Order\n");
 	db_printf("  Function(Name)(Arg)\n");
 	TAILQ_FOREACH_REVERSE(vs, &vnet_destructors, vnet_sysuninit_head,
 	    link) {
 		db_show_vnet_print_vs(vs, 1);
 		if (db_pager_quit)
 			break;
 	}
 }
 
 #ifdef VNET_DEBUG
 DB_SHOW_COMMAND(vnetrcrs, db_show_vnetrcrs)
 {
 	struct vnet_recursion *vnr;
 
 	SLIST_FOREACH(vnr, &vnet_recursions, vnr_le)
 		vnet_print_recursion(vnr, 1);
 }
 #endif
 #endif /* DDB */
Index: head/sys/net/vnet.h
===================================================================
--- head/sys/net/vnet.h	(revision 358019)
+++ head/sys/net/vnet.h	(revision 358020)
@@ -1,455 +1,456 @@
 /*-
  * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
  *
  * Copyright (c) 2006-2009 University of Zagreb
  * Copyright (c) 2006-2009 FreeBSD Foundation
  * All rights reserved.
  *
  * This software was developed by the University of Zagreb and the
  * FreeBSD Foundation under sponsorship by the Stichting NLnet and the
  * FreeBSD Foundation.
  *
  * Copyright (c) 2009 Jeffrey Roberson <jeff@freebsd.org>
  * Copyright (c) 2009 Robert N. M. Watson
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 /*-
  * This header file defines several sets of interfaces supporting virtualized
  * network stacks:
  *
  * - Definition of 'struct vnet' and functions and macros to allocate/free/
  *   manipulate it.
  *
  * - A virtual network stack memory allocator, which provides support for
  *   virtualized global variables via a special linker set, set_vnet.
  *
  * - Virtualized sysinits/sysuninits, which allow constructors and
  *   destructors to be run for each network stack subsystem as virtual
  *   instances are created and destroyed.
  *
  * If VIMAGE isn't compiled into the kernel, virtualized global variables
  * compile to normal global variables, and virtualized sysinits to regular
  * sysinits.
  */
 
 #ifndef _NET_VNET_H_
 #define	_NET_VNET_H_
 
 /*
  * struct vnet describes a virtualized network stack, and is primarily a
  * pointer to storage for virtualized global variables.  Expose to userspace
  * as required for libkvm.
  */
 #if defined(_KERNEL) || defined(_WANT_VNET)
 #include <sys/queue.h>
 
 struct vnet {
 	LIST_ENTRY(vnet)	 vnet_le;	/* all vnets list */
 	u_int			 vnet_magic_n;
 	u_int			 vnet_ifcnt;
 	u_int			 vnet_sockcnt;
-	u_int			 vnet_shutdown; /* Shutdown in progress. */
+	u_int			 vnet_state;	/* SI_SUB_* */
 	void			*vnet_data_mem;
 	uintptr_t		 vnet_data_base;
-};
-#define	VNET_MAGIC_N	0x3e0d8f29
+	bool			 vnet_shutdown;	/* Shutdown in progress. */
+} __aligned(CACHE_LINE_SIZE);
+#define	VNET_MAGIC_N	0x5e4a6f28
 
 /*
  * These two virtual network stack allocator definitions are also required
  * for libkvm so that it can evaluate virtualized global variables.
  */
 #define	VNET_SETNAME		"set_vnet"
 #define	VNET_SYMPREFIX		"vnet_entry_"
 #endif
 
 #ifdef _KERNEL
 
 #define	VNET_PCPUSTAT_DECLARE(type, name)	\
     VNET_DECLARE(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)])
 
 #define	VNET_PCPUSTAT_DEFINE(type, name)	\
     VNET_DEFINE(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)])
 #define	VNET_PCPUSTAT_DEFINE_STATIC(type, name)	\
     VNET_DEFINE_STATIC(counter_u64_t, name[sizeof(type) / sizeof(uint64_t)])
 
 #define	VNET_PCPUSTAT_ALLOC(name, wait)	\
     COUNTER_ARRAY_ALLOC(VNET(name), \
 	sizeof(VNET(name)) / sizeof(counter_u64_t), (wait))
 
 #define	VNET_PCPUSTAT_FREE(name)	\
     COUNTER_ARRAY_FREE(VNET(name), sizeof(VNET(name)) / sizeof(counter_u64_t))
 
 #define	VNET_PCPUSTAT_ADD(type, name, f, v)	\
     counter_u64_add(VNET(name)[offsetof(type, f) / sizeof(uint64_t)], (v))
 
 #define	VNET_PCPUSTAT_FETCH(type, name, f)	\
     counter_u64_fetch(VNET(name)[offsetof(type, f) / sizeof(uint64_t)])
 
 #define	VNET_PCPUSTAT_SYSINIT(name)	\
 static void				\
 vnet_##name##_init(const void *unused)	\
 {					\
 	VNET_PCPUSTAT_ALLOC(name, M_WAITOK);	\
 }					\
 VNET_SYSINIT(vnet_ ## name ## _init, SI_SUB_INIT_IF,			\
     SI_ORDER_FIRST, vnet_ ## name ## _init, NULL)
 
 #define	VNET_PCPUSTAT_SYSUNINIT(name)					\
 static void								\
 vnet_##name##_uninit(const void *unused)				\
 {									\
 	VNET_PCPUSTAT_FREE(name);					\
 }									\
 VNET_SYSUNINIT(vnet_ ## name ## _uninit, SI_SUB_INIT_IF,		\
     SI_ORDER_FIRST, vnet_ ## name ## _uninit, NULL)
 
 #ifdef SYSCTL_OID
 #define	SYSCTL_VNET_PCPUSTAT(parent, nbr, name, type, array, desc)	\
 static int								\
 array##_sysctl(SYSCTL_HANDLER_ARGS)					\
 {									\
 	type s;								\
 	CTASSERT((sizeof(type) / sizeof(uint64_t)) ==			\
 	    (sizeof(VNET(array)) / sizeof(counter_u64_t)));		\
 	COUNTER_ARRAY_COPY(VNET(array), &s, sizeof(type) / sizeof(uint64_t));\
 	if (req->newptr)						\
 		COUNTER_ARRAY_ZERO(VNET(array),				\
 		    sizeof(type) / sizeof(uint64_t));			\
 	return (SYSCTL_OUT(req, &s, sizeof(type)));			\
 }									\
 SYSCTL_PROC(parent, nbr, name, CTLFLAG_VNET | CTLTYPE_OPAQUE | CTLFLAG_RW, \
     NULL, 0, array ## _sysctl, "I", desc)
 #endif /* SYSCTL_OID */
 
 #ifdef VIMAGE
 #include <sys/lock.h>
 #include <sys/proc.h>			/* for struct thread */
 #include <sys/rwlock.h>
 #include <sys/sx.h>
 
 /*
  * Location of the kernel's 'set_vnet' linker set.
  */
 extern uintptr_t	*__start_set_vnet;
 __GLOBL(__start_set_vnet);
 extern uintptr_t	*__stop_set_vnet;
 __GLOBL(__stop_set_vnet);
 
 #define	VNET_START	(uintptr_t)&__start_set_vnet
 #define	VNET_STOP	(uintptr_t)&__stop_set_vnet
 
 /*
  * Functions to allocate and destroy virtual network stacks.
  */
 struct vnet *vnet_alloc(void);
 void	vnet_destroy(struct vnet *vnet);
 
 /*
  * The current virtual network stack -- we may wish to move this to struct
  * pcpu in the future.
  */
 #define	curvnet	curthread->td_vnet
 
 /*
  * Various macros to get and set the current network stack, as well as
  * related assertions.
  */
 #if defined(INVARIANTS) || defined(VNET_DEBUG)
 #define	VNET_ASSERT(exp, msg)	do {					\
 	if (!(exp))							\
 		panic msg;						\
 } while (0)
 #else
 #define	VNET_ASSERT(exp, msg)	do {					\
 } while (0)
 #endif
 
 #ifdef VNET_DEBUG
 void vnet_log_recursion(struct vnet *, const char *, int);
 
 #define	CURVNET_SET_QUIET(arg)						\
 	VNET_ASSERT((arg) != NULL && (arg)->vnet_magic_n == VNET_MAGIC_N, \
 	    ("CURVNET_SET at %s:%d %s() curvnet=%p vnet=%p",		\
 	    __FILE__, __LINE__, __func__, curvnet, (arg)));		\
 	struct vnet *saved_vnet = curvnet;				\
 	const char *saved_vnet_lpush = curthread->td_vnet_lpush;	\
 	curvnet = arg;							\
 	curthread->td_vnet_lpush = __func__;
  
 #define	CURVNET_SET_VERBOSE(arg)					\
 	CURVNET_SET_QUIET(arg)						\
 	if (saved_vnet)							\
 		vnet_log_recursion(saved_vnet, saved_vnet_lpush, __LINE__);
 
 #define	CURVNET_SET(arg)	CURVNET_SET_VERBOSE(arg)
  
 #define	CURVNET_RESTORE()						\
 	VNET_ASSERT(curvnet != NULL && (saved_vnet == NULL ||		\
 	    saved_vnet->vnet_magic_n == VNET_MAGIC_N),			\
 	    ("CURVNET_RESTORE at %s:%d %s() curvnet=%p saved_vnet=%p",	\
 	    __FILE__, __LINE__, __func__, curvnet, saved_vnet));	\
 	curvnet = saved_vnet;						\
 	curthread->td_vnet_lpush = saved_vnet_lpush;
 #else /* !VNET_DEBUG */
 
 #define	CURVNET_SET_QUIET(arg)						\
 	VNET_ASSERT((arg) != NULL && (arg)->vnet_magic_n == VNET_MAGIC_N, \
 	    ("CURVNET_SET at %s:%d %s() curvnet=%p vnet=%p",		\
 	    __FILE__, __LINE__, __func__, curvnet, (arg)));		\
 	struct vnet *saved_vnet = curvnet;				\
 	curvnet = arg;	
  
 #define	CURVNET_SET_VERBOSE(arg)					\
 	CURVNET_SET_QUIET(arg)
 
 #define	CURVNET_SET(arg)	CURVNET_SET_VERBOSE(arg)
  
 #define	CURVNET_RESTORE()						\
 	VNET_ASSERT(curvnet != NULL && (saved_vnet == NULL ||		\
 	    saved_vnet->vnet_magic_n == VNET_MAGIC_N),			\
 	    ("CURVNET_RESTORE at %s:%d %s() curvnet=%p saved_vnet=%p",	\
 	    __FILE__, __LINE__, __func__, curvnet, saved_vnet));	\
 	curvnet = saved_vnet;
 #endif /* VNET_DEBUG */
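 
 /*
  * Example (illustrative): code running without a vnet context, such as a
  * callout or taskqueue handler, adopts the vnet of the object it operates
  * on before calling into the network stack:
  *
  *	CURVNET_SET(ifp->if_vnet);
  *	... per-vnet work ...
  *	CURVNET_RESTORE();
  */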
 
 extern struct vnet *vnet0;
 #define	IS_DEFAULT_VNET(arg)	((arg) == vnet0)
 
 #define	CRED_TO_VNET(cr)	(cr)->cr_prison->pr_vnet
 #define	TD_TO_VNET(td)		CRED_TO_VNET((td)->td_ucred)
 #define	P_TO_VNET(p)		CRED_TO_VNET((p)->p_ucred)
 
 /*
  * Global linked list of all virtual network stacks, along with read locks to
  * access it.  If a caller may sleep while accessing the list, it must use
  * the sleepable lock macros.
  */
 LIST_HEAD(vnet_list_head, vnet);
 extern struct vnet_list_head vnet_head;
 extern struct rwlock vnet_rwlock;
 extern struct sx vnet_sxlock;
 
 #define	VNET_LIST_RLOCK()		sx_slock(&vnet_sxlock)
 #define	VNET_LIST_RLOCK_NOSLEEP()	rw_rlock(&vnet_rwlock)
 #define	VNET_LIST_RUNLOCK()		sx_sunlock(&vnet_sxlock)
 #define	VNET_LIST_RUNLOCK_NOSLEEP()	rw_runlock(&vnet_rwlock)
 
 /*
  * Iteration macros to walk the global list of virtual network stacks.
  */
 #define	VNET_ITERATOR_DECL(arg)	struct vnet *arg
 #define	VNET_FOREACH(arg)	LIST_FOREACH((arg), &vnet_head, vnet_le)
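 
 /*
  * Example (illustrative): the usual pattern for walking every vnet from a
  * non-sleeping context:
  *
  *	VNET_ITERATOR_DECL(vnet_iter);
  *
  *	VNET_LIST_RLOCK_NOSLEEP();
  *	VNET_FOREACH(vnet_iter) {
  *		CURVNET_SET(vnet_iter);
  *		... per-vnet work that must not sleep ...
  *		CURVNET_RESTORE();
  *	}
  *	VNET_LIST_RUNLOCK_NOSLEEP();
  */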
 
 /*
  * Virtual network stack memory allocator, which allows global variables to
  * be automatically instantiated for each network stack instance.
  */
 #define	VNET_NAME(n)		vnet_entry_##n
 #define	VNET_DECLARE(t, n)	extern t VNET_NAME(n)
 /* struct _hack is to stop this from being used with static data */
 #define	VNET_DEFINE(t, n)	\
     struct _hack; t VNET_NAME(n) __section(VNET_SETNAME) __used
 #if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) \
 		|| defined(__powerpc64__))
 /*
  * As with DPCPU_DEFINE_STATIC we are unable to mark this data as static
  * in modules on some architectures.
  */
 #define	VNET_DEFINE_STATIC(t, n) \
     t VNET_NAME(n) __section(VNET_SETNAME) __used
 #else
 #define	VNET_DEFINE_STATIC(t, n) \
     static t VNET_NAME(n) __section(VNET_SETNAME) __used
 #endif
 #define	_VNET_PTR(b, n)		(__typeof(VNET_NAME(n))*)		\
 				    ((b) + (uintptr_t)&VNET_NAME(n))
 
 #define	_VNET(b, n)		(*_VNET_PTR(b, n))
 
 /*
  * Virtualized global variable accessor macros.
  */
 #define	VNET_VNET_PTR(vnet, n)		_VNET_PTR((vnet)->vnet_data_base, n)
 #define	VNET_VNET(vnet, n)		(*VNET_VNET_PTR((vnet), n))
 
 #define	VNET_PTR(n)		VNET_VNET_PTR(curvnet, n)
 #define	VNET(n)			VNET_VNET(curvnet, n)
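 
 /*
  * Example (illustrative; "foo_enable" is a hypothetical variable): by
  * convention a virtualized global is defined once and then accessed
  * through a V_-prefixed macro wrapping VNET():
  *
  *	VNET_DEFINE(int, foo_enable);
  *	#define	V_foo_enable	VNET(foo_enable)
  *
  *	if (V_foo_enable)
  *		...;
  */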
 
 /*
  * Virtual network stack allocator interfaces from the kernel linker.
  */
 void	*vnet_data_alloc(int size);
 void	 vnet_data_copy(void *start, int size);
 void	 vnet_data_free(void *start_arg, int size);
 
 /*
  * Virtual sysinit mechanism, allowing network stack components to declare
  * startup and shutdown methods to be run when virtual network stack
  * instances are created and destroyed.
  */
 #include <sys/kernel.h>
 
 /*
  * SYSINIT/SYSUNINIT variants that provide per-vnet constructors and
  * destructors.
  */
 struct vnet_sysinit {
 	enum sysinit_sub_id	subsystem;
 	enum sysinit_elem_order	order;
 	sysinit_cfunc_t		func;
 	const void		*arg;
 	TAILQ_ENTRY(vnet_sysinit) link;
 };
 
 #define	VNET_SYSINIT(ident, subsystem, order, func, arg)		\
 	CTASSERT((subsystem) > SI_SUB_VNET &&				\
 	    (subsystem) <= SI_SUB_VNET_DONE);				\
 	static struct vnet_sysinit ident ## _vnet_init = {		\
 		subsystem,						\
 		order,							\
 		(sysinit_cfunc_t)(sysinit_nfunc_t)func,			\
 		(arg)							\
 	};								\
 	SYSINIT(vnet_init_ ## ident, subsystem, order,			\
 	    vnet_register_sysinit, &ident ## _vnet_init);		\
 	SYSUNINIT(vnet_init_ ## ident, subsystem, order,		\
 	    vnet_deregister_sysinit, &ident ## _vnet_init)
 
 #define	VNET_SYSUNINIT(ident, subsystem, order, func, arg)		\
 	CTASSERT((subsystem) > SI_SUB_VNET &&				\
 	    (subsystem) <= SI_SUB_VNET_DONE);				\
 	static struct vnet_sysinit ident ## _vnet_uninit = {		\
 		subsystem,						\
 		order,							\
 		(sysinit_cfunc_t)(sysinit_nfunc_t)func,			\
 		(arg)							\
 	};								\
 	SYSINIT(vnet_uninit_ ## ident, subsystem, order,		\
 	    vnet_register_sysuninit, &ident ## _vnet_uninit);		\
 	SYSUNINIT(vnet_uninit_ ## ident, subsystem, order,		\
 	    vnet_deregister_sysuninit, &ident ## _vnet_uninit)
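 
 /*
  * Example (illustrative; the names are hypothetical): a constructor run
  * once for each vnet as it is created.  The subsystem must lie between
  * SI_SUB_VNET and SI_SUB_VNET_DONE, as the CTASSERT above enforces;
  * SI_SUB_PROTO_DOMAIN is one such value.
  *
  *	static void
  *	vnet_foo_init(const void *unused)
  *	{
  *		... initialize this vnet's foo state ...
  *	}
  *	VNET_SYSINIT(vnet_foo_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_ANY,
  *	    vnet_foo_init, NULL);
  */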
 
 /*
  * Run per-vnet sysinits or sysuninits during vnet creation/destruction.
  */
 void	 vnet_sysinit(void);
 void	 vnet_sysuninit(void);
 
 /*
  * Interfaces for managing per-vnet constructors and destructors.
  */
 void	vnet_register_sysinit(void *arg);
 void	vnet_register_sysuninit(void *arg);
 void	vnet_deregister_sysinit(void *arg);
 void	vnet_deregister_sysuninit(void *arg);
 
 /*
  * EVENTHANDLER(9) extensions.
  */
 #include <sys/eventhandler.h>
 
 void	vnet_global_eventhandler_iterator_func(void *, ...);
 #define VNET_GLOBAL_EVENTHANDLER_REGISTER_TAG(tag, name, func, arg, priority) \
 do {									\
 	if (IS_DEFAULT_VNET(curvnet)) {					\
 		(tag) = vimage_eventhandler_register(NULL, #name, func,	\
 		    arg, priority,					\
 		    vnet_global_eventhandler_iterator_func);		\
 	}								\
 } while(0)
 #define VNET_GLOBAL_EVENTHANDLER_REGISTER(name, func, arg, priority)	\
 do {									\
 	if (IS_DEFAULT_VNET(curvnet)) {					\
 		vimage_eventhandler_register(NULL, #name, func,		\
 		    arg, priority,					\
 		    vnet_global_eventhandler_iterator_func);		\
 	}								\
 } while(0)
 
 #else /* !VIMAGE */
 
 /*
  * Various virtual network stack macros compile to no-ops without VIMAGE.
  */
 #define	curvnet			NULL
 
 #define	VNET_ASSERT(exp, msg)
 #define	CURVNET_SET(arg)
 #define	CURVNET_SET_QUIET(arg)
 #define	CURVNET_RESTORE()
 
 #define	VNET_LIST_RLOCK()
 #define	VNET_LIST_RLOCK_NOSLEEP()
 #define	VNET_LIST_RUNLOCK()
 #define	VNET_LIST_RUNLOCK_NOSLEEP()
 #define	VNET_ITERATOR_DECL(arg)
 #define	VNET_FOREACH(arg)
 
 #define	IS_DEFAULT_VNET(arg)	1
 #define	CRED_TO_VNET(cr)	NULL
 #define	TD_TO_VNET(td)		NULL
 #define	P_TO_VNET(p)		NULL
 
 /*
  * Versions of the VNET macros that compile to normal global variables and
  * standard sysctl definitions.
  */
 #define	VNET_NAME(n)		n
 #define	VNET_DECLARE(t, n)	extern t n
 #define	VNET_DEFINE(t, n)	struct _hack; t n
 #define	VNET_DEFINE_STATIC(t, n)	static t n
 #define	_VNET_PTR(b, n)		&VNET_NAME(n)
 
 /*
  * Virtualized global variable accessor macros.
  */
 #define	VNET_VNET_PTR(vnet, n)		(&(n))
 #define	VNET_VNET(vnet, n)		(n)
 
 #define	VNET_PTR(n)		(&(n))
 #define	VNET(n)			(n)
 
 /*
  * When VIMAGE isn't compiled into the kernel, VNET_SYSINIT/VNET_SYSUNINIT
  * map into normal sysinits, which have the same ordering properties.
  */
 #define	VNET_SYSINIT(ident, subsystem, order, func, arg)		\
 	SYSINIT(ident, subsystem, order, func, arg)
 #define	VNET_SYSUNINIT(ident, subsystem, order, func, arg)		\
 	SYSUNINIT(ident, subsystem, order, func, arg)
 
 /*
  * Without VIMAGE, revert to the default implementation.
  */
 #define VNET_GLOBAL_EVENTHANDLER_REGISTER_TAG(tag, name, func, arg, priority) \
 	(tag) = eventhandler_register(NULL, #name, func, arg, priority)
 #define VNET_GLOBAL_EVENTHANDLER_REGISTER(name, func, arg, priority)	\
 	eventhandler_register(NULL, #name, func, arg, priority)
 #endif /* VIMAGE */
 #endif /* _KERNEL */
 
 #endif /* !_NET_VNET_H_ */
Index: head/sys/sys/param.h
===================================================================
--- head/sys/sys/param.h	(revision 358019)
+++ head/sys/sys/param.h	(revision 358020)
@@ -1,368 +1,368 @@
 /*-
  * SPDX-License-Identifier: BSD-3-Clause
  *
  * Copyright (c) 1982, 1986, 1989, 1993
  *	The Regents of the University of California.  All rights reserved.
  * (c) UNIX System Laboratories, Inc.
  * All or some portions of this file are derived from material licensed
  * to the University of California by American Telephone and Telegraph
  * Co. or Unix System Laboratories, Inc. and are reproduced herein with
  * the permission of UNIX System Laboratories, Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  *	@(#)param.h	8.3 (Berkeley) 4/4/95
  * $FreeBSD$
  */
 
 #ifndef _SYS_PARAM_H_
 #define _SYS_PARAM_H_
 
 #include <sys/_null.h>
 
 #define	BSD	199506		/* System version (year & month). */
 #define BSD4_3	1
 #define BSD4_4	1
 
 /*
  * __FreeBSD_version numbers are documented in the Porter's Handbook.
  * If you bump the version for any reason, you should update the documentation
  * there.
  * Currently this lives here in the doc/ repository:
  *
  *	head/en_US.ISO8859-1/books/porters-handbook/versions/chapter.xml
  *
  * scheme is:  <major><two digit minor>Rxx
  *		'R' is in the range 0 to 4 if this is a release branch or
  *		X.0-CURRENT before releng/X.0 is created, otherwise 'R' is
  *		in the range 5 to 9.
  */
 #undef __FreeBSD_version
-#define __FreeBSD_version 1300077	/* Master, propagated to newvers */
+#define __FreeBSD_version 1300078	/* Master, propagated to newvers */
 
 /*
  * __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,
  * which by definition is always true on FreeBSD. This macro is also defined
  * on other systems that use the kernel of FreeBSD, such as GNU/kFreeBSD.
  *
  * It is tempting to use this macro in userland code when we want to enable
  * kernel-specific routines, and in fact it's fine to do this in code that
  * is part of FreeBSD itself.  However, be aware that as presence of this
  * macro is still not widespread (e.g. older FreeBSD versions, 3rd party
  * compilers, etc), it is STRONGLY DISCOURAGED to check for this macro in
  * external applications without also checking for __FreeBSD__ as an
  * alternative.
  */
 #undef __FreeBSD_kernel__
 #define __FreeBSD_kernel__
 
 #if defined(_KERNEL) || defined(IN_RTLD)
 #define	P_OSREL_SIGWAIT			700000
 #define	P_OSREL_SIGSEGV			700004
 #define	P_OSREL_MAP_ANON		800104
 #define	P_OSREL_MAP_FSTRICT		1100036
 #define	P_OSREL_SHUTDOWN_ENOTCONN	1100077
 #define	P_OSREL_MAP_GUARD		1200035
 #define	P_OSREL_WRFSBASE		1200041
 #define	P_OSREL_CK_CYLGRP		1200046
 #define	P_OSREL_VMTOTAL64		1200054
 #define	P_OSREL_CK_SUPERBLOCK		1300000
 #define	P_OSREL_CK_INODE		1300005
 #define	P_OSREL_POWERPC_NEW_AUX_ARGS	1300070
 
 #define	P_OSREL_MAJOR(x)		((x) / 100000)
 #endif
 
 #ifndef LOCORE
 #include <sys/types.h>
 #endif
 
 /*
  * Machine-independent constants (some used in following include files).
  * Redefined constants are from POSIX 1003.1 limits file.
  *
  * MAXCOMLEN should be >= sizeof(ac_comm) (see <acct.h>)
  */
 #include <sys/syslimits.h>
 
 #define	MAXCOMLEN	19		/* max command name remembered */
 #define	MAXINTERP	PATH_MAX	/* max interpreter file name length */
 #define	MAXLOGNAME	33		/* max login name length (incl. NUL) */
 #define	MAXUPRC		CHILD_MAX	/* max simultaneous processes */
 #define	NCARGS		ARG_MAX		/* max bytes for an exec function */
 #define	NGROUPS		(NGROUPS_MAX+1)	/* max number groups */
 #define	NOFILE		OPEN_MAX	/* max open files per process */
 #define	NOGROUP		65535		/* marker for empty group set member */
 #define MAXHOSTNAMELEN	256		/* max hostname size */
 #define SPECNAMELEN	255		/* max length of device name */
 
 /* More types and definitions used throughout the kernel. */
 #ifdef _KERNEL
 #include <sys/cdefs.h>
 #include <sys/errno.h>
 #ifndef LOCORE
 #include <sys/time.h>
 #include <sys/priority.h>
 #endif
 
 #ifndef FALSE
 #define	FALSE	0
 #endif
 #ifndef TRUE
 #define	TRUE	1
 #endif
 #endif
 
 #ifndef _KERNEL
 /* Signals. */
 #include <sys/signal.h>
 #endif
 
 /* Machine type dependent parameters. */
 #include <machine/param.h>
 #ifndef _KERNEL
 #include <sys/limits.h>
 #endif
 
 #ifndef DEV_BSHIFT
 #define	DEV_BSHIFT	9		/* log2(DEV_BSIZE) */
 #endif
 #define	DEV_BSIZE	(1<<DEV_BSHIFT)
 
 #ifndef BLKDEV_IOSIZE
 #define BLKDEV_IOSIZE  PAGE_SIZE	/* default block device I/O size */
 #endif
 #ifndef DFLTPHYS
 #define DFLTPHYS	(64 * 1024)	/* default max raw I/O transfer size */
 #endif
 #ifndef MAXPHYS
 #define MAXPHYS		(128 * 1024)	/* max raw I/O transfer size */
 #endif
 #ifndef MAXDUMPPGS
 #define MAXDUMPPGS	(DFLTPHYS/PAGE_SIZE)
 #endif
 
 /*
  * Constants related to network buffer management.
  * MCLBYTES must be no larger than PAGE_SIZE.
  */
 #ifndef	MSIZE
 #define	MSIZE		256		/* size of an mbuf */
 #endif
 
 #ifndef	MCLSHIFT
 #define MCLSHIFT	11		/* convert bytes to mbuf clusters */
 #endif	/* MCLSHIFT */
 
 #define MCLBYTES	(1 << MCLSHIFT)	/* size of an mbuf cluster */
 
 #if PAGE_SIZE < 2048
 #define	MJUMPAGESIZE	MCLBYTES
 #elif PAGE_SIZE <= 8192
 #define	MJUMPAGESIZE	PAGE_SIZE
 #else
 #define	MJUMPAGESIZE	(8 * 1024)
 #endif
 
 #define	MJUM9BYTES	(9 * 1024)	/* jumbo cluster 9k */
 #define	MJUM16BYTES	(16 * 1024)	/* jumbo cluster 16k */
 
 /*
  * Some macros for units conversion
  */
 
 /* clicks to bytes */
 #ifndef ctob
 #define ctob(x)	((x)<<PAGE_SHIFT)
 #endif
 
 /* bytes to clicks */
 #ifndef btoc
 #define btoc(x)	(((vm_offset_t)(x)+PAGE_MASK)>>PAGE_SHIFT)
 #endif
 
 /*
  * btodb() is messy and perhaps slow because `bytes' may be an off_t.  We
  * want to shift an unsigned type to avoid sign extension and we don't
  * want to widen `bytes' unnecessarily.  Assume that the result fits in
  * a daddr_t.
  */
 #ifndef btodb
 #define btodb(bytes)	 		/* calculates (bytes / DEV_BSIZE) */ \
 	(sizeof (bytes) > sizeof(long) \
 	 ? (daddr_t)((unsigned long long)(bytes) >> DEV_BSHIFT) \
 	 : (daddr_t)((unsigned long)(bytes) >> DEV_BSHIFT))
 #endif
 
 #ifndef dbtob
 #define dbtob(db)			/* calculates (db * DEV_BSIZE) */ \
 	((off_t)(db) << DEV_BSHIFT)
 #endif
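 
 /*
  * For example, with the default DEV_BSHIFT of 9 (DEV_BSIZE == 512),
  * btodb(65536) == 128 and dbtob(128) == 65536.
  */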
 
 #define	PRIMASK	0x0ff
 #define	PCATCH	0x100		/* OR'd with pri for tsleep to check signals */
 #define	PDROP	0x200	/* OR'd with pri to stop re-entry of interlock mutex */
 
 #define	NZERO	0		/* default "nice" */
 
 #define	NBBY	8		/* number of bits in a byte */
 #define	NBPW	sizeof(int)	/* number of bytes per word (integer) */
 
 #define	CMASK	022		/* default file mask: S_IWGRP|S_IWOTH */
 
 #define	NODEV	(dev_t)(-1)	/* non-existent device */
 
 /*
  * File system parameters and macros.
  *
  * MAXBSIZE -	Filesystems are made out of blocks of at most MAXBSIZE bytes
  *		per block.  MAXBSIZE may be made larger without affecting
  *		any existing filesystems as long as it does not exceed MAXPHYS,
  *		and may be made smaller at the risk of not being able to use
  *		filesystems which require a block size exceeding MAXBSIZE.
  *
  * MAXBCACHEBUF - Maximum size of a buffer in the buffer cache.  This must
  *		be >= MAXBSIZE and can be set differently for different
  *		architectures by defining it in <machine/param.h>.
  *		Making this larger allows NFS to do larger reads/writes.
  *
  * BKVASIZE -	Nominal buffer space per buffer, in bytes.  BKVASIZE is the
  *		minimum KVM memory reservation the kernel is willing to make.
  *		Filesystems can of course request smaller chunks.  Actual
  *		backing memory uses a chunk size of a page (PAGE_SIZE).
  *		The default value here can be overridden on a per-architecture
  *		basis by defining it in <machine/param.h>.
  *
  *		If you make BKVASIZE too small you risk seriously fragmenting
  *		the buffer KVM map which may slow things down a bit.  If you
  *		make it too big the kernel will not be able to optimally use
  *		the KVM memory reserved for the buffer cache and will wind
  *		up with too-few buffers.
  *
  *		The default is 16384, roughly 2x the block size used by a
  *		normal UFS filesystem.
  */
 #define MAXBSIZE	65536	/* must be power of 2 */
 #ifndef	MAXBCACHEBUF
 #define	MAXBCACHEBUF	MAXBSIZE /* must be a power of 2 >= MAXBSIZE */
 #endif
 #ifndef	BKVASIZE
 #define BKVASIZE	16384	/* must be power of 2 */
 #endif
 #define BKVAMASK	(BKVASIZE-1)
 
 /*
  * MAXPATHLEN defines the longest permissible path length after expanding
  * symbolic links. It is used to allocate a temporary buffer from the buffer
  * pool in which to do the name expansion, hence should be a power of two,
  * and must be less than or equal to MAXBSIZE.  MAXSYMLINKS defines the
  * maximum number of symbolic links that may be expanded in a path name.
  * It should be set high enough to allow all legitimate uses, but halt
  * infinite loops reasonably quickly.
  */
 #define	MAXPATHLEN	PATH_MAX
 #define MAXSYMLINKS	32
 
 /* Bit map related macros. */
 #define	setbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] |= 1<<((i)%NBBY))
 #define	clrbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] &= ~(1<<((i)%NBBY)))
 #define	isset(a,i)							\
 	(((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY)))
 #define	isclr(a,i)							\
 	((((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY))) == 0)
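 
 /*
  * For example, setbit(a, 10) ORs 1 << (10 % NBBY), i.e. 0x04, into byte
  * 10 / NBBY == 1 of the map; isset(a, 10) is then non-zero and
  * isclr(a, 10) is zero.
  */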
 
 /* Macros for counting and rounding. */
 #ifndef howmany
 #define	howmany(x, y)	(((x)+((y)-1))/(y))
 #endif
 #define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
 #define	rounddown(x, y)	(((x)/(y))*(y))
 #define	rounddown2(x, y) ((x)&(~((y)-1)))          /* if y is power of two */
 #define	roundup(x, y)	((((x)+((y)-1))/(y))*(y))  /* to any y */
 #define	roundup2(x, y)	(((x)+((y)-1))&(~((y)-1))) /* if y is a power of two */
 #define powerof2(x)	((((x)-1)&(x))==0)
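 
 /*
  * For example, howmany(37, 16) == 3, rounddown(37, 16) == 32,
  * roundup2(37, 16) == 48 and powerof2(64) == 1; the *2 variants are only
  * valid when y is a power of two.
  */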
 
 /* Macros for min/max. */
 #define	MIN(a,b) (((a)<(b))?(a):(b))
 #define	MAX(a,b) (((a)>(b))?(a):(b))
 
 #ifdef _KERNEL
 /*
  * Basic byte order function prototypes for non-inline functions.
  */
 #ifndef LOCORE
 #ifndef _BYTEORDER_PROTOTYPED
 #define	_BYTEORDER_PROTOTYPED
 __BEGIN_DECLS
 __uint32_t	 htonl(__uint32_t);
 __uint16_t	 htons(__uint16_t);
 __uint32_t	 ntohl(__uint32_t);
 __uint16_t	 ntohs(__uint16_t);
 __END_DECLS
 #endif
 #endif
 
 #ifndef _BYTEORDER_FUNC_DEFINED
 #define	_BYTEORDER_FUNC_DEFINED
 #define	htonl(x)	__htonl(x)
 #define	htons(x)	__htons(x)
 #define	ntohl(x)	__ntohl(x)
 #define	ntohs(x)	__ntohs(x)
 #endif /* !_BYTEORDER_FUNC_DEFINED */
 #endif /* _KERNEL */
 
 /*
  * Scale factor for scaled integers used to count %cpu time and load avgs.
  *
  * The number of CPU `tick's that map to a unique `%age' can be expressed
  * by the formula (1 / (2 ^ (FSHIFT - 11))).  The maximum load average that
  * can be calculated (assuming 32 bits) can be closely approximated using
  * the formula (2 ^ (2 * (16 - FSHIFT))) for (FSHIFT < 15).
  *
  * For the scheduler to maintain a 1:1 mapping of CPU `tick' to `%age',
  * FSHIFT must be at least 11; this gives us a maximum load avg of ~1024.
  */
 #define	FSHIFT	11		/* bits to right of fixed binary point */
 #define FSCALE	(1<<FSHIFT)
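 
 /*
  * For example, with FSHIFT == 11 a load average of 0.50 is stored as
  * 0.50 * FSCALE == 1024, and the approximate maximum representable load
  * average is 2^(2 * (16 - 11)) == 1024.
  */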
 
 #define dbtoc(db)			/* calculates devblks to pages */ \
 	((db + (ctodb(1) - 1)) >> (PAGE_SHIFT - DEV_BSHIFT))
 
 #define ctodb(db)			/* calculates pages to devblks */ \
 	((db) << (PAGE_SHIFT - DEV_BSHIFT))
 
 /*
  * Old spelling of __containerof().
  */
 #define	member2struct(s, m, x)						\
 	((struct s *)(void *)((char *)(x) - offsetof(struct s, m)))
 
 /*
  * Access a variable length array that has been declared as a fixed
  * length array.
  */
 #define __PAST_END(array, offset) (((__typeof__(*(array)) *)(array))[offset])
 
 #endif	/* _SYS_PARAM_H_ */