Index: projects/clang390-import/UPDATING
===================================================================
--- projects/clang390-import/UPDATING	(revision 305686)
+++ projects/clang390-import/UPDATING	(revision 305687)
@@ -1,1641 +1,1647 @@
Updating Information for FreeBSD current users.

This file is maintained and copyrighted by M. Warner Losh <imp@village.org>.
See end of file for further details.  For commonly done items, please see the
COMMON ITEMS: section later in the file.  These instructions assume that you
basically know what you are doing.  If not, then please consult the FreeBSD
handbook:

	http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html

Items affecting the ports and packages system can be found in
/usr/ports/UPDATING.  Please read that file before running portupgrade.

NOTE: FreeBSD has switched from gcc to clang.  If you have trouble
bootstrapping from older versions of FreeBSD, try WITHOUT_CLANG and WITH_GCC
to bootstrap to the tip of head, and then rebuild without this option.  The
bootstrap process from older versions of current across the gcc/clang cutover
is a bit fragile.

NOTE TO PEOPLE WHO THINK THAT FreeBSD 12.x IS SLOW:
	FreeBSD 12.x has many debugging features turned on, in both the kernel
	and userland.  These features attempt to detect incorrect use of
	system primitives, and encourage loud failure through extra sanity
	checking and fail stop semantics.  They also substantially impact
	system performance.  If you want to do performance measurement,
	benchmarking, and optimization, you'll want to turn them off.  This
	includes various WITNESS-related kernel options, INVARIANTS, malloc
	debugging flags in userland, and various verbose features in the
	kernel.  Many developers choose to disable these features on build
	machines to maximize performance.  (To completely disable malloc
	debugging, define MALLOC_PRODUCTION in /etc/make.conf, or to merely
	disable the most expensive debugging functionality run
	"ln -s 'abort:false,junk:false' /etc/malloc.conf".)

+20160908:
+	The queue(3) debugging macro, QUEUE_MACRO_DEBUG, has been split into
+	two separate components, QUEUE_MACRO_DEBUG_TRACE and
+	QUEUE_MACRO_DEBUG_TRASH.  Define both for the original
+	QUEUE_MACRO_DEBUG behavior.
+
20160824:
	r304787 changed some ioctl interfaces between the iSCSI userspace
	programs and the kernel.  ctladm, ctld, iscsictl, and iscsid must be
	rebuilt to work with new kernels.  __FreeBSD_version has been bumped
	to 1200005.

20160818:
	The UDP receive code has been updated to only treat incoming UDP
	packets that were addressed to an L2 broadcast address as L3
	broadcast packets.  It is not expected that this will affect any
	standards-conforming UDP application.  The new behaviour can be
	disabled by setting the sysctl net.inet.udp.require_l2_bcast to 0.

20160818:
	Remove the openbsd_poll system call.  __FreeBSD_version has been
	bumped because of this.

20160622:
	The libc stub for the pipe(2) system call has been replaced with
	a wrapper that calls the pipe2(2) system call, and the pipe(2)
	system call is now only implemented by the kernels that include
	"options COMPAT_FREEBSD10" in their config file (this is the
	default).  Users should ensure that this option is enabled in their
	kernel or upgrade userspace to r302092 before upgrading their kernel.

20160527:
	CAM will now strip leading spaces from SCSI disks' serial numbers.
	This will affect users who create UFS filesystems on SCSI disks
	using those disks' diskid device nodes.
	For example, if /etc/fstab previously contained a line like
	"/dev/diskid/DISK-%20%20%20%20%20%20%20ABCDEFG0123456", you should
	change it to "/dev/diskid/DISK-ABCDEFG0123456".  Users of geom
	transforms like gmirror may also be affected.  ZFS users should
	generally be fine.

20160523:
	The bitstring(3) API has been updated with new functionality and
	improved performance, but it is binary-incompatible with the old API.
	Objects built with the new headers may not be linked against objects
	built with the old headers.

20160520:
	The brk and sbrk functions have been removed from libc on arm64.
	Binutils from ports has been updated to not link to these functions
	and should be updated to the latest version before installing a new
	libc.

20160517:
	The armv6 port now defaults to hard float ABI.  Limited support for
	running both hardfloat and soft float on the same system is available
	using the libraries installed with -DWITH_LIBSOFT.  This has only been
	tested as an upgrade path for installworld and packages may fail or
	need manual intervention to run.  New packages will be needed.

	To update an existing self-hosted armv6hf system, you must add
	TARGET_ARCH=armv6 on the make command line for both the build and
	the install steps.

20160510:
	Kernel modules compiled outside of a kernel build now default to
	installing to /boot/modules instead of /boot/kernel.  Many kernel
	modules built this way (such as those in ports) already overrode
	KMODDIR explicitly to install into /boot/modules.  However, manually
	building and installing a module from /sys/modules will now install
	to /boot/modules instead of /boot/kernel.

20160414:
	The CAM I/O scheduler has been committed to the kernel.  There should
	be no user visible impact.  This does enable NCQ Trim on ada SSDs.
	While the list of known rogues that claim support for this but
	actually corrupt data is believed to be complete, be on the lookout
	for data corruption.  The known rogues are:

	o Crucial MX100, M550 drives with MU01 firmware.
	o Micron M510 and M550 drives with MU01 firmware.
	o Micron M500 prior to MU07 firmware.
	o Samsung 830, 840, and 850, all firmwares.
	o FCCT M500, all firmwares.

	Crucial has firmware http://www.crucial.com/usa/en/support-ssd-firmware
	with working NCQ TRIM.  For Micron branded drives, see your sales rep
	for updated firmware.  Blacklisted drives will work correctly, since
	no NCQ TRIMs are ever sent to them.  Given that this list is the same
	as found in Linux, it is believed there are no other rogues in the
	marketplace.  All other models from the above vendors work.

	To be safe, if you are at all concerned, you can quirk each of your
	drives to prevent NCQ from being sent by setting:

		kern.cam.ada.X.quirks="0x2"

	in loader.conf.  If the drive requires the 4k sector quirk, set the
	quirks entry to 0x3.

20160330:
	The FAST_DEPEND build option has been removed and its functionality
	is now the one true way.  The old mkdep(1) style of 'make depend'
	has been removed.  See 20160311 for further details.

20160317:
	Resource range types have grown from unsigned long to uintmax_t.  All
	drivers, and anything using libdevinfo, need to be recompiled.

20160311:
	WITH_FAST_DEPEND is now enabled by default for in-tree and out-of-tree
	builds.  It no longer runs mkdep(1) during 'make depend', and the
	'make depend' stage can safely be skipped now, as it is run
	automatically when building 'make all' and will generate all SRCS and
	DPSRCS before building anything else.
	Dependencies are gathered at compile time with -MF flags kept in
	separate .depend files per object file.  Users should run
	'make cleandepend' once if using -DNO_CLEAN to clean out older stale
	.depend files.

20160306:
	On amd64, clang 3.8.0 can now insert sections of type AMD64_UNWIND
	into kernel modules.  Therefore, if you load any kernel modules at
	boot time, please install the boot loaders after you install the
	kernel, but before rebooting, e.g.:

	make buildworld
	make kernel KERNCONF=YOUR_KERNEL_HERE
	make -C sys/boot install

	Then follow the usual steps, described in the General Notes section,
	below.

20160305:
	Clang, llvm, lldb and compiler-rt have been upgraded to 3.8.0.  Please
	see the 20141231 entry below for information about prerequisites and
	upgrading, if you are not already using clang 3.5.0 or higher.

20160301:
	The AIO subsystem is now a standard part of the kernel.  The VFS_AIO
	kernel option and aio.ko kernel module have been removed.  Due to
	stability concerns, asynchronous I/O requests are only permitted on
	sockets and raw disks by default.  To enable asynchronous I/O requests
	on all file types, set the vfs.aio.enable_unsafe sysctl to a non-zero
	value.

20160226:
	The ELF object manipulation tool objcopy is now provided by the ELF
	Tool Chain project rather than by GNU binutils.  It should be a
	drop-in replacement, with the addition of arm64 support.  The
	(temporary) src.conf(5) knob WITHOUT_ELFCOPY_AS_OBJCOPY may be set to
	obtain the GNU version if necessary.

20160129:
	Building ZFS pools on top of zvols is prohibited by default.  That
	feature has never worked safely; it's always been prone to deadlocks.
	Using a zvol as the backing store for a VM guest's virtual disk will
	still work, even if the guest is using ZFS.  Legacy behavior can be
	restored by setting vfs.zfs.vol.recursive=1.

20160119:
	The NONE and HPN patches have been removed from OpenSSH.  They are
	still available in the security/openssh-portable port.

20160113:
	With the addition of ypldap(8), a new _ypldap user is now required
	during installworld.  "mergemaster -p" can be used to add the user
	prior to installworld, as documented in the handbook.

20151216:
	The tftp loader (pxeboot) now uses the option root-path directive.
	As a consequence it no longer looks for a pxeboot.4th file on the
	tftp server.  Instead it uses the regular /boot infrastructure as
	with the other loaders.

20151211:
	The code to start recording plug and play data into the modules has
	been committed.  While the old tools will properly build a new kernel,
	a number of warnings about "unknown metadata record 4" will be
	produced by an older kldxref.  To avoid such warnings, make sure to
	rebuild the kernel toolchain (or world).  Make sure that you have
	r292078 or later when trying to build r292077 or later before
	rebuilding.

20151207:
	Debug data files are now built by default with 'make buildworld' and
	installed with 'make installworld'.  This facilitates debugging but
	requires more disk space both during the build and for the installed
	world.  Debug files may be disabled by setting WITHOUT_DEBUG_FILES=yes
	in src.conf(5).

20151130:
	r291527 changed the internal interface between the nfsd.ko and
	nfscommon.ko modules.  As such, they must both be upgraded together.
	__FreeBSD_version has been bumped because of this.

20151108:
	Added support for Unicode collation strings leads to a change in the
	order of files listed by ls(1), for example.  To get back to the old
	behaviour, set the LC_COLLATE environment variable to "C".
	Database administrators will need to reindex their databases, since
	collation results will be different.

	Due to a bug in install(1) it is recommended to remove the ancient
	locales before running make installworld:

	rm -rf /usr/share/locale/*

20151030:
	OpenSSL has been upgraded to 1.0.2d.  Any binaries requiring
	libcrypto.so.7 or libssl.so.7 must be recompiled.

20151020:
	Qlogic 24xx/25xx firmware images were updated from 5.5.0 to 7.3.0.
	Kernel modules isp_2400_multi and isp_2500_multi were removed and
	should be replaced with the isp_2400 and isp_2500 modules
	respectively.

20151017:
	The build previously allowed using 'make -n' to not recurse into
	sub-directories while showing what commands would be executed, and
	'make -n -n' to recursively show commands.  Now 'make -n' will recurse
	and 'make -N' will not.

20151012:
	If you specify SENDMAIL_MC or SENDMAIL_CF in make.conf, mergemaster
	and etcupdate will now use this file.  A custom sendmail.cf is now
	updated via this mechanism rather than via installworld.  If you had
	excluded sendmail.cf in mergemaster.rc or etcupdate.conf, you may want
	to remove the exclusion or change it to "always install".
	/etc/mail/sendmail.cf is now managed the same way regardless of
	whether SENDMAIL_MC/SENDMAIL_CF is used.  If you are not using
	SENDMAIL_MC/SENDMAIL_CF there should be no change in behavior.

20151011:
	Compatibility shims for legacy ATA device names have been removed.
	This includes the ATA_STATIC_ID kernel option, the
	kern.cam.ada.legacy_aliases and kern.geom.raid.legacy_aliases loader
	tunables, the kern.devalias.* environment variables, and the /dev/ad*
	and /dev/ar* symbolic links.

20151006:
	Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to
	3.7.0.  Please see the 20141231 entry below for information about
	prerequisites and upgrading, if you are not already using clang 3.5.0
	or higher.

20150924:
	Kernel debug files have been moved to /usr/lib/debug/boot/kernel/,
	and renamed from .symbols to .debug.  This reduces the size
	requirements on the boot partition or file system and provides
	consistency with userland debug files.

	When using the supported kernel installation method the
	/usr/lib/debug/boot/kernel directory will be renamed (to kernel.old)
	as is done with /boot/kernel.

	Developers wishing to maintain the historical behavior of installing
	debug files in /boot/kernel/ can set KERN_DEBUGDIR="" in src.conf(5).

20150827:
	The wireless drivers have undergone changes that remove the 'parent
	interface' from the ifconfig -l output.  The rc.d network scripts
	used to check for the presence of a parent interface in the list, so
	old scripts would fail to start wireless networking.  Thus, an
	etcupdate(8) or mergemaster(8) run is required after a kernel update,
	to update your rc.d scripts in /etc.

20150827:
	pf no longer supports 'scrub fragment crop' or 'scrub fragment
	drop-ovl'.  These configurations are now automatically interpreted as
	'scrub fragment reassemble'.

20150817:
	Kernel-loadable modules for the random(4) device are back.  To use
	them, the kernel must have

	device	random
	options	RANDOM_LOADABLE

	kldload(8) can then be used to load random_fortuna.ko or
	random_yarrow.ko.  Please note that due to the indirect function calls
	that the loadable modules need to provide, the built-in variants will
	be slightly more efficient.

	The random(4) kernel option RANDOM_DUMMY has been retired due to
	unpopularity.  It was not all that useful anyway.

20150813:
	The WITHOUT_ELFTOOLCHAIN_TOOLS src.conf(5) knob has been retired.
	Control over building the ELF Tool Chain tools is now provided by
	the WITHOUT_TOOLCHAIN knob.
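	A minimal src.conf(5) sketch for this change; note that
	WITHOUT_TOOLCHAIN is a much broader knob that disables the entire
	toolchain, not just the ELF Tool Chain tools:

	# /etc/src.conf
	#WITHOUT_ELFTOOLCHAIN_TOOLS=yes	# retired knob, no longer honored
	WITHOUT_TOOLCHAIN=yes		# current knob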
20150810:
	The polarity of Pulse Per Second (PPS) capture events with the
	uart(4) driver has been corrected.  Prior to this change the PPS
	"assert" event corresponded to the trailing edge of a positive PPS
	pulse and the "clear" event was the leading edge of the next pulse.

	As the width of a PPS pulse in a typical GPS receiver is on the order
	of 1 millisecond, most users will not notice any significant
	difference with this change.

	Anyone who has compensated for the historical polarity reversal by
	configuring a negative offset equal to the pulse width will need to
	remove that workaround.

20150809:
	The default group assigned to /dev/dri entries has been changed from
	'wheel' to 'video' with the id of '44'.  If you want to have access
	to the dri devices please add yourself to the video group with:

	# pw groupmod video -m $USER

20150806:
	The menu.rc and loader.rc files will now be replaced during upgrades.
	Please migrate local changes to menu.rc.local and loader.rc.local
	instead.

20150805:
	GNU Binutils versions of addr2line, c++filt, nm, readelf, size,
	strings and strip have been removed.  The src.conf(5) knob
	WITHOUT_ELFTOOLCHAIN_TOOLS no longer provides the binutils tools.

20150728:
	As ZFS requires more kernel stack pages than is the default on some
	architectures, e.g. i386, it now warns if KSTACK_PAGES is less than
	ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing).

	Please consider using 'options KSTACK_PAGES=X', where X is greater
	than or equal to ZFS_MIN_KSTACK_PAGES, i.e. 4, in such
	configurations.

20150706:
	sendmail has been updated to 8.15.2.  Starting with FreeBSD 11.0 and
	sendmail 8.15, sendmail uses uncompressed IPv6 addresses by default,
	i.e., they will not contain "::".  For example, instead of ::1, it
	will be 0:0:0:0:0:0:0:1.  This permits a zero subnet to have a more
	specific match, such as different map entries for IPv6:0:0 vs
	IPv6:0.  This change requires that configuration data (including
	maps, files, classes, custom ruleset, etc.) must use the same format,
	so make certain such configuration data is upgraded.  As a very
	simple check, search for patterns like 'IPv6:[0-9a-fA-F:]*::' and
	'IPv6::'.  To return to the old behavior, set the m4 option
	confUSE_COMPRESSED_IPV6_ADDRESSES or the cf option
	UseCompressedIPv6Addresses.

20150630:
	The default kernel entropy-processing algorithm is now Fortuna,
	replacing Yarrow.

	Assuming you have 'device random' in your kernel config file, the
	configurations allow a kernel option to override this default.  You
	may choose *ONE* of:

	options	RANDOM_YARROW	# Legacy /dev/random algorithm.
	options	RANDOM_DUMMY	# Blocking-only driver.

	If you have neither, you get Fortuna.  For most people, read no
	further: Fortuna will give a /dev/random that works like it always
	used to, and the difference will be irrelevant.

	If you remove 'device random', you get *NO* kernel-processed entropy
	at all.  This may be acceptable to folks building embedded systems,
	but has complications.  Carry on reading, and it is assumed you know
	what you need.

	*PLEASE* read random(4) and random(9) if you are in the habit of
	tweaking kernel configs, and/or if you are a member of the embedded
	community, wanting specific and not-usual behaviour from your
	security subsystems.

	NOTE!!  If you use RANDOM_DUMMY and/or have no 'device random', you
	will NOT have a functioning /dev/random, and many cryptographic
	features will not work, including SSH.  You may also find strange
	behaviour from the random(3) set of library functions, in particular
	sranddev(3), srandomdev(3) and arc4random(3).
	The reason for this is that the KERN_ARND sysctl only returns
	entropy if it thinks it has some to share, and with RANDOM_DUMMY or
	no 'device random' this will never happen.

20150623:
	An additional fix for the issue described in the 20150614 sendmail
	entry below has been committed in revision 284717.

20150616:
	FreeBSD's old make (fmake) has been removed from the system.  It is
	available as the devel/fmake port or via pkg install fmake.

20150615:
	The fix for the issue described in the 20150614 sendmail entry below
	has been committed in revision 284436.  The work around described in
	that entry is no longer needed unless the default setting is
	overridden by a confDH_PARAMETERS configuration setting of '5' or
	pointing to a 512 bit DH parameter file.

20150614:
	ALLOW_DEPRECATED_ATF_TOOLS/ATFFILE support has been removed from
	atf.test.mk (included from bsd.test.mk).  Please upgrade devel/atf
	and devel/kyua to version 0.20+ and adjust any calling code to work
	with Kyuafile and kyua.

20150614:
	The import of openssl to address the FreeBSD-SA-15:10.openssl
	security advisory includes a change which rejects handshakes with DH
	parameters below 768 bits.  sendmail releases prior to 8.15.2 (not
	yet released) defaulted to a 512 bit DH parameter setting for client
	connections.  To work around this interoperability issue, sendmail
	can be configured to use a 2048 bit DH parameter by:

	1. Edit /etc/mail/`hostname`.mc
	2. If a setting for confDH_PARAMETERS does not exist or exists and is
	   set to a string beginning with '5', replace it with '2'.
	3. If a setting for confDH_PARAMETERS exists and is set to a file
	   path, create a new file with:
	   openssl dhparam -out /path/to/file 2048
	4. Rebuild the .cf file:
	   cd /etc/mail/; make; make install
	5. Restart sendmail:
	   cd /etc/mail/; make restart

	A sendmail patch is coming, at which time this file will be updated.

20150604:
	Generation of legacy formatted entries has been disabled by default
	in pwd_mkdb(8), as all base system consumers of the legacy formatted
	entries were converted to use the new format by default when the new,
	machine-independent format was added and supported since FreeBSD 5.x.

	Please see the pwd_mkdb(8) manual page for further details.

20150525:
	Clang and llvm have been upgraded to 3.6.1 release.  Please see the
	20141231 entry below for information about prerequisites and
	upgrading, if you are not already using 3.5.0 or higher.

20150521:
	TI platform code switched to using vendor DTS files and this update
	may break existing systems running on Beaglebone, Beaglebone Black,
	and Pandaboard:

	- dtb files should be regenerated/reinstalled.  Filenames are the
	  same but content is different now
	- GPIO addressing was changed, now each GPIO bank (32 pins per bank)
	  has its own /dev/gpiocX device, e.g. pin 121 on /dev/gpioc0 in the
	  old addressing scheme is now pin 25 on /dev/gpioc3.
	- Pandaboard: /etc/ttys should be updated, serial console device is
	  now /dev/ttyu2, not /dev/ttyu0

20150501:
	soelim(1) from gnu/usr.bin/groff has been replaced by usr.bin/soelim.
	If you need the GNU extension from groff soelim(1), install groff
	from package: pkg install groff, or via ports: textproc/groff.

20150423:
	chmod, chflags, chown and chgrp now affect symlinks in -R mode as
	defined in symlink(7); previously symlinks were silently ignored.
	(A short sketch follows the 20150415 entry below.)

20150415:
	The const qualifier has been removed from iconv(3) to comply with
	POSIX.  The ports tree is aware of this from r384038 onwards.
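	A short sketch of the 20150423 change, using chown(8) and a
	throwaway directory; the -R pass now operates on the symlink itself
	instead of silently skipping it:

	mkdir -p /tmp/t/dir
	ln -s dir /tmp/t/link
	chown -R nobody /tmp/t	# now also changes the owner of the link itself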
20150416:
	Libraries specified by LIBADD in Makefiles must have a corresponding
	DPADD_<lib> variable to ensure correct dependencies.  This is now
	enforced in src.libnames.mk.

20150324:
	Support for SATA controllers that are handled by the more functional
	ahci(4), siis(4) and mvs(4) drivers has been removed from the legacy
	ata(4) driver.  The kernel modules ataahci and ataadaptec were
	removed completely, replaced by the ahci and mvs modules
	respectively.

20150315:
	Clang, llvm and lldb have been upgraded to 3.6.0 release.  Please see
	the 20141231 entry below for information about prerequisites and
	upgrading, if you are not already using 3.5.0 or higher.

20150307:
	The 32-bit PowerPC kernel has been changed to a position-independent
	executable.  This can only be booted with a version of loader(8)
	newer than January 31, 2015, so make sure to update both world and
	kernel before rebooting.

20150217:
	If you are running a -CURRENT kernel since r273872 (Oct 30th, 2014),
	but before r278950, the RNG was not seeded properly.  Immediately
	upgrade the kernel to r278950 or later and regenerate any keys (e.g.
	ssh keys or openssl keys) that were generated with a kernel from that
	range.  This does not affect programs that directly used /dev/random
	or /dev/urandom.  All userland uses of arc4random(3) are affected.

20150210:
	The autofs(4) ABI was changed in order to restore binary
	compatibility with 10.1-RELEASE.  The automountd(8) daemon needs to
	be rebuilt to work with the new kernel.

20150131:
	The powerpc64 kernel has been changed to a position-independent
	executable.  This can only be booted with a new version of loader(8),
	so make sure to update both world and kernel before rebooting.

20150118:
	Clang and llvm have been upgraded to 3.5.1 release.  This is a bugfix
	only release, no new features have been added.  Please see the
	20141231 entry below for information about prerequisites and
	upgrading, if you are not already using 3.5.0.

20150107:
	ELF tools addr2line, elfcopy (strip), nm, size, and strings are now
	taken from the ELF Tool Chain project rather than GNU binutils.  They
	should be drop-in replacements, with the addition of arm64 support.
	The WITHOUT_ELFTOOLCHAIN_TOOLS= knob may be used to obtain the
	binutils tools, if necessary.  See 20150805 for updated information.

20150105:
	The default Unbound configuration now enables remote control using a
	local socket.  Users who have already enabled the local_unbound
	service should regenerate their configuration by running
	"service local_unbound setup" as root.

20150102:
	The GNU texinfo and GNU info pages have been removed.  To be able to
	view GNU info pages please install texinfo from ports.

20141231:
	Clang, llvm and lldb have been upgraded to 3.5.0 release.

	As of this release, a prerequisite for building clang, llvm and lldb
	is a C++11 capable compiler and C++11 standard library.  This means
	that to be able to successfully build the cross-tools stage of
	buildworld, with clang as the bootstrap compiler, your system
	compiler or cross compiler should either be clang 3.3 or later, or
	gcc 4.8 or later, and your system C++ library should be libc++, or
	libstdc++ from gcc 4.8 or later.

	On any standard FreeBSD 10.x or 11.x installation, where clang and
	libc++ are on by default (that is, on x86 or arm), this should work
	out of the box.

	On 9.x installations where clang is enabled by default, e.g. on x86
	and powerpc, libc++ will not be enabled by default, so libc++ should
	be built (with clang) and installed first.  If both clang and libc++
	are missing, build clang first, then use it to build libc++.
	On 8.x and earlier installations, upgrade to 9.x first, and then
	follow the instructions for 9.x above.

	Sparc64 and mips users are unaffected, as they still use gcc 4.2.1 by
	default, and do not build clang.

	Many embedded systems are resource constrained, and will not be able
	to build clang in a reasonable time, or in some cases at all.  In
	those cases, cross building bootable systems on amd64 is a
	workaround.

	This new version of clang introduces a number of new warnings, of
	which the following are most likely to appear:

	-Wabsolute-value

	This warns in two cases, for both C and C++:
	* When the code is trying to take the absolute value of an unsigned
	  quantity, which is effectively a no-op, and almost never what was
	  intended.  The code should be fixed, if at all possible.  If you
	  are sure that the unsigned quantity can be safely cast to signed,
	  without loss of information or undefined behavior, you can add an
	  explicit cast, or disable the warning.

	* When the code is trying to take an absolute value, but the called
	  abs() variant is for the wrong type, which can lead to truncation.
	  If you want to disable the warning instead of fixing the code,
	  please make sure that truncation will not occur, or it might lead
	  to unwanted side-effects.

	-Wtautological-undefined-compare and
	-Wundefined-bool-conversion

	These warn when C++ code is trying to compare 'this' against NULL,
	while 'this' should never be NULL in well-defined C++ code.  However,
	there is some legacy (pre-C++11) code out there, which actively
	abuses this feature, which was less strictly defined in previous C++
	versions.

	Squid and openjdk do this, for example.  The warning can be turned
	off for C++98 and earlier, but compiling the code in C++11 mode might
	result in unexpected behavior; for example, the parts of the program
	that are unreachable could be optimized away.

20141222:
	The old NFS client and server (kernel options NFSCLIENT, NFSSERVER)
	kernel sources have been removed.  The .h files remain, since some
	utilities include them.  This will need to be fixed later.
	If "mount -t oldnfs ..." is attempted, it will fail.
	If the "-o" option on mountd(8), nfsd(8) or nfsstat(1) is used, the
	utilities will report errors.

20141121:
	The handling of LOCAL_LIB_DIRS has been altered to skip addition of
	directories to the top level SUBDIR variable when their parent
	directory is included in LOCAL_DIRS.  Users with build systems with
	such hierarchies and without SUBDIR entries in the parent directory
	Makefiles should add them or add the directories to LOCAL_DIRS.

20141109:
	faith(4) and faithd(8) have been removed from the base system.  Faith
	has been obsolete for a very long time.

20141104:
	vt(4), the new console driver, is enabled by default.  It brings
	support for Unicode and double-width characters, as well as support
	for UEFI and integration with the KMS kernel video drivers.

	You may need to update your console settings in /etc/rc.conf, most
	probably the keymap.  During boot, /etc/rc.d/syscons will indicate
	what you need to do.

	vt(4) still has issues and lacks some features compared to
	syscons(4).  See the wiki for up-to-date information:
	  https://wiki.freebsd.org/Newcons

	If you want to keep using syscons(4), you can do so by adding the
	following line to /boot/loader.conf:
	  kern.vty=sc

20141102:
	pjdfstest has been integrated into kyua as an opt-in test suite.
	Please see share/doc/pjdfstest/README for more details on how to
	execute it.

20141009:
	gperf has been removed from the base system for architectures that
	use clang.
	Ports that require gperf will obtain it from the devel/gperf port.

20140923:
	pjdfstest has been moved from tools/regression/pjdfstest to
	contrib/pjdfstest.

20140922:
	At svn r271982, the default linux compat kernel ABI has been adjusted
	to 2.6.18 in support of the linux-c6 compat ports infrastructure
	update.  If you wish to continue using the linux-f10 compat ports,
	add compat.linux.osrelease=2.6.16 to your local sysctl.conf.  Users
	are encouraged to update their linux-compat packages to linux-c6
	during their next update cycle.

20140729:
	The ofwfb driver, used to provide a graphics console on PowerPC when
	using vt(4), no longer allows mmap() of all physical memory.  This
	will prevent Xorg on PowerPC with some ATI graphics cards from
	initializing properly unless x11-servers/xorg-server is updated to
	1.12.4_8 or newer.

20140723:
	The xdev targets have been converted to using TARGET and TARGET_ARCH
	instead of XDEV and XDEV_ARCH.

20140719:
	The default unbound configuration has been modified to address issues
	with reverse lookups on networks that use private address ranges.  If
	you use the local_unbound service, run "service local_unbound setup"
	as root to regenerate your configuration, then
	"service local_unbound reload" to load the new configuration.

20140709:
	The GNU texinfo and GNU info pages are no longer built and installed;
	the WITH_INFO knob has been added to allow building and installing
	them again.

	UPDATE: see the 20150102 entry on texinfo's removal

20140708:
	The GNU readline library is now an INTERNALLIB - that is, it is
	statically linked into consumers (GDB and variants) in the base
	system, and the shared library is no longer installed.  The
	devel/readline port is available for third party software that
	requires readline.

20140702:
	The Itanium architecture (ia64) has been removed from the list of
	known architectures.  This is the first step in the removal of the
	architecture.

20140701:
	Commit r268115 has added NFSv4.1 server support, merged from
	projects/nfsv4.1-server.  Since this includes changes to the internal
	interfaces between the NFS related modules, a full build of the
	kernel and modules will be necessary.  __FreeBSD_version has been
	bumped.

20140629:
	The WITHOUT_VT_SUPPORT kernel config knob has been renamed
	WITHOUT_VT.  (The other _SUPPORT knobs have a consistent meaning
	which differs from the behaviour controlled by this knob.)

20140619:
	The maximal length of the serial number in CTL was increased from 16
	to 64 chars, which breaks the ABI.  All CTL-related tools, such as
	ctladm and ctld, need to be rebuilt to work with a new kernel.

20140606:
	The libatf-c and libatf-c++ major versions were downgraded to 0 and 1
	respectively to match the upstream numbers.  They were out of sync
	because, when they were originally added to FreeBSD, the upstream
	versions were not respected.  These libraries are private and not yet
	built by default, so renumbering them should be a non-issue.
	However, unclean source trees will yield broken test programs once
	the operator executes "make delete-old-libs" after a
	"make installworld".

	Additionally, the atf-sh binary was made private by moving it into
	/usr/libexec/.  Already-built shell test programs will keep the path
	to the old binary so they will break after "make delete-old" is run.

	If you are using WITH_TESTS=yes (not the default), wipe the object
	tree and rebuild from scratch to prevent spurious test failures.
	This is only needed once: the misnumbered libraries and misplaced
	binaries have been added to OptionalObsoleteFiles.inc so they will
	be removed during a clean upgrade.
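	The usual cleanup sequence for this change, run from the top of the
	source tree once the new world is installed ("make delete-old-libs"
	only after checking that nothing still uses the old libraries):

	cd /usr/src
	make installworld
	make delete-old
	make delete-old-libs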
20140512:
	Clang and llvm have been upgraded to 3.4.1 release.

20140508:
	We bogusly installed src.opts.mk in /usr/share/mk.  This file should
	be removed to avoid issues in the future (and has been added to
	ObsoleteFiles.inc).

20140505:
	/etc/src.conf now affects only builds of the FreeBSD src tree.  In
	the past, it affected all builds that used the bsd.*.mk files.  The
	old behavior was a bug, but people may have relied upon it.  To get
	this behavior back, you can .include /etc/src.conf from
	/etc/make.conf (which is still global and isn't changed).  This also
	changes the behavior of incremental builds inside the tree of
	individual directories.  Set MAKESYSPATH to ".../share/mk" to do
	that.  Although this has survived make universe and some upgrade
	scenarios, other upgrade scenarios may have broken.  At least one
	form of temporary breakage was fixed with MAKESYSPATH settings for
	buildworld as well...  In cases where MAKESYSPATH isn't working with
	this setting, you'll need to set it to the full path to your tree.

	One side effect of all this cleaning up is that bsd.compiler.mk is no
	longer implicitly included by bsd.own.mk.  If you wish to use
	COMPILER_TYPE, you must now explicitly include bsd.compiler.mk as
	well.

20140430:
	The lindev device has been removed since /dev/full has been made a
	standard device.  __FreeBSD_version has been bumped.

20140424:
	The knob WITHOUT_VI was added to the base system, which controls
	building ex(1), vi(1), etc.  Older releases of FreeBSD required ex(1)
	in order to reorder files share/termcap and didn't build ex(1) as a
	build tool, so building/installing with WITH_VI is highly advised for
	build hosts for older releases.

	This issue has been fixed in stable/9 and stable/10 in r277022 and
	r276991, respectively.

20140418:
	The YES_HESIOD knob has been removed.  It has been obsolete for a
	decade.  Please move to using WITH_HESIOD instead or your builds will
	silently lack HESIOD.

20140405:
	The uart(4) driver has been changed with respect to its handling of
	the low-level console.  Previously the uart(4) driver prevented any
	process from changing the baudrate or the CLOCAL and HUPCL control
	flags.  By removing the restrictions, operators can make changes to
	the serial console port without having to reboot.  However, when
	getty(8) is started on the serial device that is associated with the
	low-level console, a misconfigured terminal line in /etc/ttys will
	now have a real impact.

	Before upgrading the kernel, make sure that /etc/ttys has the serial
	console device configured as 3wire without baudrate to preserve the
	previous behaviour.  E.g:

	ttyu0 "/usr/libexec/getty 3wire" vt100 on secure

20140306:
	Support for libwrap (TCP wrappers) in rpcbind was disabled by default
	to improve performance.  To re-enable it, if needed, run rpcbind with
	command line option -W.

20140226:
	Switched back to the GPL dtc compiler due to updates in the upstream
	dts files not being supported by the BSDL dtc compiler.  You will
	need to rebuild your kernel toolchain to pick up the new compiler.
	Core dumps may result while building dtb files during a kernel build
	if you fail to do so.  Set WITHOUT_GPL_DTC if you require the BSDL
	compiler.

20140216:
	Clang and llvm have been upgraded to 3.4 release.

20140216:
	The nve(4) driver has been removed.  Please use the nfe(4) driver for
	NVIDIA nForce MCP Ethernet adapters instead.

20140212:
	An ABI incompatibility crept into the libc++ 3.4 import in r261283.
	This could cause certain C++ applications using shared libraries
	built against the previous version of libc++ to crash.
	The incompatibility has now been fixed, but any C++ applications or
	shared libraries built between r261283 and r261801 should be
	recompiled.

20140204:
	OpenSSH will now ignore errors caused by the kernel lacking Capsicum
	capability mode support.  Please note that enabling the feature in
	the kernel is still highly recommended.

20140131:
	OpenSSH is now built with sandbox support, and will use sandbox as
	the default privilege separation method.  This requires Capsicum
	capability mode support in the kernel.

20140128:
	The libelf and libdwarf libraries have been updated to newer versions
	from upstream.  Shared library version numbers for these two
	libraries were bumped.  Any ports or binaries requiring these two
	libraries should be recompiled.  __FreeBSD_version is bumped to
	1100006.

20140110:
	If a Makefile in a tests/ directory was auto-generating a Kyuafile
	instead of providing an explicit one, this would prevent such a
	Makefile from providing its own Kyuafile in the future during
	NO_CLEAN builds.  This has been fixed in the Makefiles but manual
	intervention is needed to clean an objdir if you use NO_CLEAN:

	# find /usr/obj -name Kyuafile | xargs rm -f

20131213:
	The behavior of gss_pseudo_random() for the krb5 mechanism has
	changed, for applications requesting a longer random string than
	produced by the underlying enctype's pseudo-random() function.  In
	particular, the random string produced from a session key of enctype
	aes128-cts-hmac-sha1-96 or aes256-cts-hmac-sha1-96 will be different
	at the 17th octet and later, after this change.  The counter used in
	the PRF+ construction is now encoded as a big-endian integer in
	accordance with RFC 4402.  __FreeBSD_version is bumped to 1100004.

20131108:
	The WITHOUT_ATF build knob has been removed and its functionality has
	been subsumed into the more generic WITHOUT_TESTS.  If you were using
	the former to disable the build of the ATF libraries, you should
	change your settings to use the latter.

20131025:
	The default version of mtree is nmtree which is obtained from NetBSD.
	The output is generally the same, but may vary slightly.  If you find
	you need identical output, adding "-F freebsd9" to the command line
	should do the trick.  For the time being, the old mtree is available
	as fmtree.

20131014:
	libbsdyml has been renamed to libyaml and moved to /usr/lib/private.
	This will break ports-mgmt/pkg.  Rebuild the port, or upgrade to pkg
	1.1.4_8 and verify that bsdyml is not linked in, before running
	"make delete-old-libs":

	# make -C /usr/ports/ports-mgmt/pkg build deinstall install clean
	or
	# pkg install pkg; ldd /usr/local/sbin/pkg | grep bsdyml

20131010:
	The stable/10 branch has been created in subversion from head
	revision r256279.

20131010:
	The rc.d/jail script has been updated to support the jail(8)
	configuration file.  The "jail_<jname>_*" rc.conf(5) variables for
	per-jail configuration are automatically converted to
	/var/run/jail.<jname>.conf before the jail(8) utility is invoked.
	This is transparently backward compatible.  See below about some
	incompatibilities and the rc.conf(5) manual page for more details.

	These variables are now deprecated in favor of the jail(8)
	configuration file.  One can use the "rc.d/jail config <jname>"
	command to generate a jail(8) configuration file in
	/var/run/jail.<jname>.conf without running the jail(8) utility.  The
	default pathname of the configuration file is /etc/jail.conf and can
	be specified by using the $jail_conf or $jail_<jname>_conf variables.

	Please note that jail_devfs_ruleset accepts an integer at this
	moment.  Please consider rewriting the ruleset name as an integer.
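	A sketch of the new mechanism for a hypothetical jail named "www":
	generate a jail(8) configuration file from the legacy jail_www_*
	rc.conf(5) variables without starting the jail, then inspect the
	result:

	service jail config www
	cat /var/run/jail.www.conf

	A minimal hand-written /etc/jail.conf equivalent would look something
	like:

	www {
		path = "/usr/jails/www";
		host.hostname = "www.example.org";
		ip4.addr = 192.0.2.10;
		exec.start = "/bin/sh /etc/rc";
		exec.stop = "/bin/sh /etc/rc.shutdown";
		mount.devfs;
	}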
20130930:
	BIND has been removed from the base system.  If all you need is a
	local resolver, simply enable and start the local_unbound service
	instead.  Otherwise, several versions of BIND are available in the
	ports tree.  The dns/bind99 port is one example.

	With this change, nslookup(1) and dig(1) are no longer in the base
	system.  Users should instead use host(1) and drill(1), which are in
	the base system.  Alternatively, nslookup and dig can be obtained by
	installing the dns/bind-tools port.

20130916:
	With the addition of unbound(8), a new unbound user is now required
	during installworld.  "mergemaster -p" can be used to add the user
	prior to installworld, as documented in the handbook.

20130911:
	OpenSSH is now built with DNSSEC support, and will by default
	silently trust signed SSHFP records.  This can be controlled with
	the VerifyHostKeyDNS client configuration setting.  DNSSEC support
	can be disabled entirely with the WITHOUT_LDNS option in src.conf.

20130906:
	The GNU Compiler Collection and C++ standard library (libstdc++) are
	no longer built by default on platforms where clang is the system
	compiler.  You can enable them with the WITH_GCC and WITH_GNUCXX
	options in src.conf.

20130905:
	The PROCDESC kernel option is now part of the GENERIC kernel
	configuration and is required for rwhod(8) to work.  If you are
	using a custom kernel configuration, you should include
	'options PROCDESC'.

20130905:
	The API and ABI related to the Capsicum framework was modified in a
	backward-incompatible way.  The userland libraries and programs have
	to be recompiled to work with the new kernel.  This includes the
	following libraries and programs, but a full buildworld is advised:
	libc, libprocstat, dhclient, tcpdump, hastd, hastctl, kdump,
	procstat, rwho, rwhod, uniq.

20130903:
	AES-NI intrinsic support has been added to gcc.  The AES-NI module
	has been updated to use this support.  A new gcc is required to build
	the aesni module on both i386 and amd64.

20130821:
	The PADLOCK_RNG and RDRAND_RNG kernel options are now devices.  Thus
	"device padlock_rng" and "device rdrand_rng" should be used instead
	of "options PADLOCK_RNG" and "options RDRAND_RNG".

20130813:
	WITH_ICONV has been split into two feature sets.  WITH_ICONV now
	enables just the iconv* functionality and is now on by default.
	WITH_LIBICONV_COMPAT enables the libiconv API and link time
	compatibility.  Set WITHOUT_ICONV to build the old way.  If you have
	been using WITH_ICONV before, you will very likely need to turn on
	WITH_LIBICONV_COMPAT.

20130806:
	The INVARIANTS option now enables DEBUG for code with OpenSolaris and
	Illumos origin, including ZFS.  If you have INVARIANTS in your kernel
	configuration, then there is no need to set DEBUG or ZFS_DEBUG
	explicitly.

	DEBUG used to enable witness(9) tracking of OpenSolaris (mostly ZFS)
	locks if the WITNESS option was set.  Because that generated a lot of
	witness(9) reports and all of them were believed to be false
	positives, this is no longer done.  A new option OPENSOLARIS_WITNESS
	can be used to achieve the previous behavior.

20130806:
	Timer values in IPv6 data structures now use time_uptime instead of
	time_second.  Although this is not a user-visible functional change,
	userland utilities which directly use them---ndp(8), rtadvd(8), and
	rtsold(8) in the base system---need to be updated to r253970 or
	later.

20130802:
	find -delete can now delete the pathnames given as arguments, instead
	of only files found below them or if the pathname did not contain any
	slashes.
	Formerly, the following error message would result:

	find: -delete: <path>: relative path potentially not safe

	Deleting the pathnames given as arguments can be prevented without
	error messages using -mindepth 1 or by changing directory and passing
	"." as argument to find.  This works in the old as well as the new
	version of find.

20130726:
	Behavior of devfs rules path matching has been changed.  Patterns are
	now always matched against the fully qualified devfs path, and slash
	characters must be explicitly matched by slashes in the pattern
	(FNM_PATHNAME).  Rulesets involving devfs subdirectories must be
	reviewed.

20130716:
	The default ARM ABI has changed to the ARM EABI.  The old ABI is
	incompatible with the ARM EABI and all programs and modules will need
	to be rebuilt to work with a new kernel.

	To keep using the old ABI, ensure the WITHOUT_ARM_EABI knob is set.

	NOTE: Support for the old ABI will be removed in the future and users
	are advised to upgrade.

20130709:
	pkg_install has been disconnected from the build.  If you really need
	it, add WITH_PKGTOOLS to your src.conf(5).

20130709:
	Most of the network statistics structures were changed to be able to
	keep 64-bit counters.  Thus all tools that work with networking
	statistics must be rebuilt (netstat(1), bsnmpd(1), etc.)

20130618:
	Fix a bug that allowed a tracing process (e.g. gdb) to write to a
	memory-mapped file in the traced process's address space even if
	neither the traced process nor the tracing process had write access
	to that file.

20130615:
	CVS has been removed from the base system.  An exact copy of the code
	is available from the devel/cvs port.

20130613:
	Some people report the following error after the switch to bmake:

	make: illegal option -- J
	usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable] ...
	*** [buildworld] Error code 2

	This is likely due to an old instance of make in ${MAKEPATH}
	(${MAKEOBJDIRPREFIX}${.CURDIR}/make.${MACHINE}), which src/Makefile
	will use blindly if it exists.  If you see the above error:

	rm -rf `make -V MAKEPATH`

	should resolve it.

20130516:
	Use bmake by default.  Whereas before one could choose to build with
	bmake via -DWITH_BMAKE, one must now use -DWITHOUT_BMAKE to use the
	old make.  The goal is to remove these knobs for 10-RELEASE.

	It is worth noting that bmake (like gmake) treats the command line as
	the unit of failure, rather than statements within the command line.
	Thus '(cd some/where && dosomething)' is safer than
	'cd some/where; dosomething'.  The '()' allows consistent behavior in
	a parallel build.

20130429:
	Fix a bug that allowed NFS clients to issue READDIR on files.

20130426:
	The WITHOUT_IDEA option has been removed because the IDEA patent
	expired.

20130426:
	The sysctl which controls TRIM support under ZFS has been renamed
	from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled and has been
	enabled by default.

20130425:
	The mergemaster command now uses the default MAKEOBJDIRPREFIX rather
	than creating its own in the temporary directory, in order to allow
	access to bootstrapped versions of tools such as install and mtree.
	When upgrading from a version of FreeBSD where the install command
	does not support -l, you will need to install a new mergemaster
	command if mergemaster -p is required.  This can be accomplished with
	the command (cd src/usr.sbin/mergemaster && make install).

20130404:
	The legacy ATA stack, disabled and replaced by the new CAM-based one
	since FreeBSD 9.0, has been completely removed from the sources.  The
	kernel modules atadisk and atapi*, and the user-level tools atacontrol
	and burncd have been removed.
	The kernel option `options ATA_CAM` is now permanently enabled and
	has been removed as an option.

20130319:
	SOCK_CLOEXEC and SOCK_NONBLOCK flags have been added to socket(2)
	and socketpair(2).  Software, in particular Kerberos, may
	automatically detect and use these during building.  The resulting
	binaries will not work on older kernels.

20130308:
	CTL_DISABLE has also been added to the sparc64 GENERIC (for further
	information, see the respective 20130304 entry).

20130304:
	Recent commits to callout(9) changed the size of struct callout, so
	the KBI is probably heavily disturbed.  Also, some functions in the
	callout(9)/sleep(9)/sleepqueue(9)/condvar(9) KPIs were replaced by
	macros.  Every kernel module using them won't load, so a rebuild is
	required.

	The ctl device has been re-enabled in GENERIC for i386 and amd64,
	but does not initialize by default (because of the new CTL_DISABLE
	option) to save memory.  To re-enable it, remove the CTL_DISABLE
	option from the kernel config file or set kern.cam.ctl.disable=0 in
	/boot/loader.conf.

20130301:
	The ctl device has been disabled in GENERIC for i386 and amd64.
	This was done due to the extra memory being allocated at system
	initialisation time by the ctl driver, which was only used if a CAM
	target device was created.  This made a FreeBSD system unusable on
	128MB or less of RAM.

20130208:
	A new compression method (lz4) has been merged to -HEAD.  Please
	refer to zpool-features(7) for more information.

	Please refer to the "ZFS notes" section of this file for information
	on upgrading boot ZFS pools.

20130129:
	A BSD-licensed patch(1) variant has been added and is installed as
	bsdpatch, with the GNU version remaining the default patch.  To
	invert the logic and use the BSD-licensed one as default, while
	having the GNU version installed as gnupatch, rebuild and install
	world with the WITH_BSD_PATCH knob set.

20130121:
	Due to the use of the new -l option to install(1) during build and
	install, you must take care not to directly set the INSTALL make
	variable in your /etc/make.conf, /etc/src.conf, or on the command
	line.  If you wish to use the -C flag for all installs you may be
	able to add INSTALL+=-C to /etc/make.conf or /etc/src.conf.

20130118:
	The install(1) option -M has changed meaning and now takes an
	argument that is a file or path to append logs to.  In the unlikely
	event that -M was the last option on the command line and the command
	line contained at least two files and a target directory the first
	file will have logs appended to it.  The -M option served little
	practical purpose in the last decade so its use is expected to be
	extremely rare.

20121223:
	After switching to Clang as the default compiler some users of ZFS
	on i386 systems started to experience stack overflow kernel panics.
	Please consider using 'options KSTACK_PAGES=4' in such
	configurations.

20121222:
	GEOM_LABEL now mangles label names read from file system metadata.
	Mangling affects labels containing spaces, non-printable characters,
	'%' or '"'.  Device names in /etc/fstab and other places may need to
	be updated (a sketch follows the 20121201 entry below).

20121217:
	By default, only the 10 most recent kernel dumps will be saved.  To
	restore the previous behaviour (no limit on the number of kernel
	dumps stored in the dump directory) add the following line to
	/etc/rc.conf:

	savecore_flags=""

20121201:
	With the addition of auditdistd(8), a new auditdistd user is now
	required during installworld.  "mergemaster -p" can be used to add
	the user prior to installworld, as documented in the handbook.
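	A sketch of the 20121222 GEOM_LABEL change, assuming a hypothetical
	UFS label "My Disk"; the mangled name uses %-escapes in the same
	style as the diskid example in the 20160527 entry above, so the
	corresponding /etc/fstab device field would now need to read
	roughly:

	/dev/ufs/My%20Disk	/data	ufs	rw	2	2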
20121117:
	The sin6_scope_id member variable in struct sockaddr_in6 is now
	filled by the kernel before passing the structure to the userland via
	sysctl or routing socket.  This means the KAME-specific embedded
	scope id in sin6_addr.s6_addr[2] is always cleared in userland
	applications.  This behavior can be controlled by
	net.inet6.ip6.deembed_scopeid.  __FreeBSD_version is bumped to
	1000025.

20121105:
	On i386 and amd64 systems WITH_CLANG_IS_CC is now the default.  This
	means that the world and kernel will be compiled with clang and that
	clang will be installed as /usr/bin/cc, /usr/bin/c++, and
	/usr/bin/cpp.  To disable this behavior and revert to building with
	gcc, compile with WITHOUT_CLANG_IS_CC.  Really old versions of
	current may need to bootstrap WITHOUT_CLANG first if the clang build
	fails (its compatibility window doesn't extend to the 9 stable branch
	point).

20121102:
	The IPFIREWALL_FORWARD kernel option has been removed.  Its
	functionality is now turned on by default.

20121023:
	The ZERO_COPY_SOCKET kernel option has been removed and split into
	SOCKET_SEND_COW and SOCKET_RECV_PFLIP.
	NB: SOCKET_SEND_COW uses the VM page based copy-on-write mechanism
	which is not safe and may result in kernel crashes.
	NB: The SOCKET_RECV_PFLIP mechanism is useless as no current driver
	supports disposable external page sized mbuf storage.
	Proper replacements for both zero-copy mechanisms are under
	consideration and will eventually lead to complete removal of the two
	kernel options.

20121023:
	The IPv4 network stack has been converted to network byte order.  The
	following modules need to be recompiled together with the kernel:
	carp(4), divert(4), gif(4), siftr(4), gre(4), pf(4), ipfw(4),
	ng_ipfw(4), stf(4).

20121022:
	Support for non-MPSAFE filesystems was removed from VFS.  The
	VFS_VERSION was bumped; all filesystem modules shall be recompiled.

20121018:
	All the non-MPSAFE filesystems have been disconnected from the build.
	The full list includes: codafs, hpfs, ntfs, nwfs, portalfs, smbfs,
	xfs.

20121016:
	The interface cloning API and ABI has changed.  The following modules
	need to be recompiled together with the kernel: ipfw(4), pfsync(4),
	pflog(4), usb(4), wlan(4), stf(4), vlan(4), disc(4), edsc(4),
	if_bridge(4), gif(4), tap(4), faith(4), epair(4), enc(4), tun(4),
	if_lagg(4), gre(4).

20121015:
	The sdhci driver was split into two parts: sdhci (generic SD Host
	Controller logic) and sdhci_pci (actual hardware driver).  No kernel
	config modifications are required, but if you load sdhci as a module
	you must switch to sdhci_pci instead.

20121014:
	Import the FUSE kernel and userland support into the base system.

20121013:
	The GNU sort(1) program has been removed since the BSD-licensed
	sort(1) has been the default for quite some time and no serious
	problems have been reported.  The corresponding WITH_GNU_SORT knob
	has also gone.

20121006:
	The pfil(9) API/ABI for the AF_INET family has been changed.  The
	packet filtering modules pf(4), ipfw(4) and ipfilter(4) need to be
	recompiled with the new kernel.

20121001:
	The net80211(4) ABI has been changed to allow for improved driver
	PS-POLL and power-save support.  All wireless drivers need to be
	recompiled to work with the new kernel.

20120913:
	The random(4) support for the VIA hardware random number generator
	(`PADLOCK') is no longer enabled unconditionally.  Add the
	padlock_rng device in the custom kernel config if needed.  The
	GENERIC kernels on i386 and amd64 do include the device, so the
	change only affects custom kernel configurations.

20120908:
	The pf(4) packet filter ABI has been changed.
	pfctl(8) and the snmp_pf module need to be recompiled to work with
	the new kernel.

20120828:
	A new ZFS feature flag "com.delphix:empty_bpobj" has been merged to
	-HEAD.  Pools that have empty_bpobj in active state cannot be
	imported read-write with ZFS implementations that do not support
	this feature.  For more information read the zpool-features(5)
	manual page.

20120727:
	The sparc64 ZFS loader has been changed to no longer try to
	auto-detect ZFS providers based on diskN aliases but now requires
	these to be explicitly listed in the OFW boot-device environment
	variable.

20120712:
	OpenSSL has been upgraded to 1.0.1c.  Any binaries requiring
	libcrypto.so.6 or libssl.so.6 must be recompiled.  Also, there are
	configuration changes.  Make sure to merge /etc/ssl/openssl.cnf.

20120712:
	The following sysctls and tunables have been renamed for consistency
	with other variables:

	kern.cam.da.da_send_ordered	-> kern.cam.da.send_ordered
	kern.cam.ada.ada_send_ordered	-> kern.cam.ada.send_ordered

20120628:
	The sort utility has been replaced with BSD sort.  For now, GNU sort
	is also available as "gnusort" or the default can be set back to GNU
	sort by setting WITH_GNU_SORT.  In this case, BSD sort will be
	installed as "bsdsort".

20120611:
	A new version of ZFS (pool version 5000) has been merged to -HEAD.
	Starting with this version the old system of ZFS pool versioning is
	superseded by "feature flags".  This concept enables forward
	compatibility against certain future changes in functionality of ZFS
	pools.  The first read-only compatible "feature flag" for ZFS pools
	is named "com.delphix:async_destroy".  For more information read the
	new zpool-features(5) manual page.

	Please refer to the "ZFS notes" section of this file for information
	on upgrading boot ZFS pools.

20120417:
	The malloc(3) implementation embedded in libc now uses sources
	imported as contrib/jemalloc.  The most disruptive API change is to
	/etc/malloc.conf.  If your system has an old-style /etc/malloc.conf,
	delete it prior to installworld, and optionally re-create it using
	the new format after rebooting.  See malloc.conf(5) for details
	(specifically the TUNING section and the "opt.*" entries in the
	MALLCTL NAMESPACE section).

20120328:
	Big-endian MIPS TARGET_ARCH values no longer end in "eb".  mips64eb
	is now spelled mips64.  mipsn32eb is now spelled mipsn32.  mipseb is
	now spelled mips.  This is to aid compatibility with third-party
	software that expects this naming scheme in uname(3).  Little-endian
	settings are unchanged.  If you are updating a big-endian mips64
	machine from before this change, you may need to set
	MACHINE_ARCH=mips64 in your environment before the new build system
	will recognize your machine.

20120306:
	Disable by default the option VFS_ALLOW_NONMPSAFE for all supported
	platforms.

20120229:
	Now unix domain sockets behave "as expected" on nullfs(5).
	Previously nullfs(5) did not pass through all behaviours to the
	underlying layer; as a result, if we bound to a socket on the lower
	layer we could connect only to the lower path, and if we bound to
	the upper layer we could connect only to the upper path.  The new
	behavior is that one can connect to both the lower and the upper
	paths regardless of which layer path one binds to.

20120211:
	The getifaddrs upgrade path broken with 20111215 has been restored.
	If you have upgraded in between 20111215 and 20120209 you need to
	recompile libc again with your kernel.  You still need to recompile
	world to be able to configure CARP, but this restriction already
	comes from 20111215.
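	One way to rebuild just libc for this change, assuming a source tree
	in /usr/src (a full buildworld/installworld also works):

	cd /usr/src/lib/libc
	make obj && make depend && make all install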
20120114:
	The set_rcvar() function has been removed from /etc/rc.subr.  All
	base and ports rc.d scripts have been updated, so if you have a port
	installed with a script in /usr/local/etc/rc.d you can either
	hand-edit the rcvar= line, or reinstall the port.

	An easy way to handle the mass-update of /etc/rc.d:

	rm /etc/rc.d/* && mergemaster -i

20120109:
	panic(9) now stops other CPUs in SMP systems, disables interrupts on
	the current CPU and prevents other threads from running.  This
	behavior can be reverted using the kern.stop_scheduler_on_panic
	tunable/sysctl.  The new behavior can be incompatible with
	kern.sync_on_panic.

20111215:
	The carp(4) facility has been changed significantly.  Configuration
	of the CARP protocol via ifconfig(8) has changed, as has the format
	of CARP events submitted to devd(8).  See the manual pages for more
	information.  The arpbalance feature of carp(4) is no longer
	supported.  The size of struct in_aliasreq and struct in6_aliasreq
	has changed.  User utilities using SIOCAIFADDR and SIOCAIFADDR_IN6,
	e.g. ifconfig(8), need to be recompiled.

20111122:
	The acpi_wmi(4) status device /dev/wmistat has been renamed to
	/dev/wmistat0.

20111108:
	The VFS_ALLOW_NONMPSAFE option has been added in order to explicitly
	support non-MPSAFE filesystems.  It is on by default for all
	supported platforms at this time.

20111101:
	The broken amd(4) driver has been replaced with esp(4) in the amd64,
	i386 and pc98 GENERIC kernel configuration files.

20110930:
	sysinstall has been removed.

20110923:
	The stable/9 branch was created in subversion.  This corresponds to
	the RELENG_9 branch in CVS.

COMMON ITEMS:

	General Notes
	-------------
	Avoid using make -j when upgrading.  While generally safe, there are
	sometimes problems using -j to upgrade.  If your upgrade fails with
	-j, please try again without -j.  From time to time in the past there
	have been problems using -j with buildworld and/or installworld.
	This is especially true when upgrading between "distant" versions
	(e.g. ones that cross a major release boundary or several minor
	releases, or when several months have passed on the -current branch).

	Sometimes, obscure build problems are the result of environment
	poisoning.  This can happen because the make utility reads its
	environment when searching for values for global variables.  To run
	your build attempts in an "environmental clean room", prefix all make
	commands with 'env -i '.  See the env(1) manual page for more
	details.  (A sketch of such an invocation follows at the end of
	these notes.)

	When upgrading from one major version to another it is generally
	best to upgrade to the latest code in the currently installed branch
	first, then do an upgrade to the new branch.  This is the best-tested
	upgrade path, and has the highest probability of being successful.
	Please try this approach before reporting problems with a major
	version upgrade.

	When upgrading a live system, having a root shell around before
	installing anything can help undo problems.  Not having a root shell
	around can lead to problems if pam has changed too much from your
	starting point to allow continued authentication after the upgrade.

	This file should be read as a log of events.  When a later event
	changes information of a prior event, the prior event should not be
	deleted.  Instead, a pointer to the entry with the new information
	should be placed in the old entry.  Readers of this file should also
	sanity check older entries before relying on them blindly.  Authors
	of new entries should write them with this in mind.
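	A sketch of such an "environmental clean room" invocation, using the
	usual default values for PATH and MAKEOBJDIRPREFIX (adjust to taste;
	env -i sets these as environment variables, which is also what the
	MAKEOBJDIRPREFIX note below requires):

	cd /usr/src
	env -i PATH=/sbin:/bin:/usr/sbin:/usr/bin MAKEOBJDIRPREFIX=/usr/obj \
		make buildworld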
ZFS notes --------- When upgrading the boot ZFS pool to a new version, always follow these two steps: 1.) recompile and reinstall the ZFS boot loader and boot block (this is part of "make buildworld" and "make installworld") 2.) update the ZFS boot block on your boot drive The following example updates the ZFS boot block on the first partition (freebsd-boot) of a GPT partitioned drive ada0: "gpart bootcode -p /boot/gptzfsboot -i 1 ada0" Non-boot pools do not need these updates. To build a kernel ----------------- If you are updating from a prior version of FreeBSD (even one just a few days old), you should follow this procedure. It is the most failsafe procedure, as it uses a /usr/obj tree with a fresh mini-buildworld: make kernel-toolchain make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE To test a kernel once --------------------- If you just want to boot a kernel once (because you are not sure if it works, or if you want to boot a known bad kernel to provide debugging information), run make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel nextboot -k testkernel To just build a kernel when you know that it won't mess you up -------------------------------------------------------------- This assumes you are already running a CURRENT system. Replace ${arch} with the architecture of your machine (e.g. "i386", "arm", "amd64", "ia64", "pc98", "sparc64", "powerpc", "mips", etc). cd src/sys/${arch}/conf config KERNEL_NAME_HERE cd ../compile/KERNEL_NAME_HERE make depend make make install If this fails, go to the "To build a kernel" section. To rebuild everything and install it on the current system. ----------------------------------------------------------- # Note: sometimes if you are running current you may have to do more than # is listed here if you are upgrading from a really old current. make buildworld make kernel KERNCONF=YOUR_KERNEL_HERE [1] [3] mergemaster -Fp [5] make installworld mergemaster -Fi [4] make delete-old [6] To cross-install current onto a separate partition -------------------------------------------------- # In this approach we use a separate partition to hold # current's root, 'usr', and 'var' directories. A partition # holding "/", "/usr" and "/var" should be about 2GB in # size. make buildworld make buildkernel KERNCONF=YOUR_KERNEL_HERE make installworld DESTDIR=${CURRENT_ROOT} -DDB_FROM_SRC make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT} cp /etc/fstab ${CURRENT_ROOT}/etc/fstab # if newfs'd To upgrade in-place from stable to current ---------------------------------------------- make buildworld [9] make kernel KERNCONF=YOUR_KERNEL_HERE [8] [1] [3] mergemaster -Fp [5] make installworld mergemaster -Fi [4] make delete-old [6] Make sure that you've read the UPDATING file to understand the tweaks to various things you need. At this point in the life cycle of current, things change often and you are on your own to cope. The defaults can also change, so please read ALL of the UPDATING entries. Also, if you are tracking -current, you must be subscribed to freebsd-current@freebsd.org. Make sure that you have read and understood all the recent messages there before you update your sources. If in doubt, please track -stable, which has far fewer pitfalls. [1] If you have third party modules, such as vmware, you should disable them at this point so they don't crash your system on reboot.
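For example, if the port enabled its kernel module via loader.conf(5), comment that line out before rebooting ("vmmon_load" below is only an illustration; the knob name varies by port):

	sed -i.bak -e 's/^vmmon_load="YES"/#vmmon_load="YES"/' /boot/loader.conf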
[3] From the bootblocks, boot -s, and then do fsck -p mount -u / mount -a cd src adjkerntz -i # if CMOS is wall time Also, when doing a major release upgrade, it is required that you boot into single user mode to do the installworld. [4] Note: This step is non-optional. Failure to do this step can result in a significant reduction in the functionality of the system. Attempting to do it by hand is not recommended and those who pursue this avenue should read this file carefully, as well as the archives of the freebsd-current and freebsd-hackers mailing lists for potential gotchas. The -U option is also useful to consider. See mergemaster(8) for more information. [5] Usually this step is a noop. However, from time to time you may need to do this if you get an "unknown user" error in the following step. It never hurts to do it all the time. You may need to install a new mergemaster (cd src/usr.sbin/mergemaster && make install) after the buildworld before this step if you last updated from current before 20130425 or from -stable before 20130430. [6] This only deletes old files and directories. Old libraries can be deleted by "make delete-old-libs", but you have to make sure that no program is using those libraries anymore. [8] In order to have a kernel that can run the 4.x binaries needed to do an installworld, you must include the COMPAT_FREEBSD4 option in your kernel. Failure to do so may leave you with a system that is hard to boot to recover. A similar kernel option COMPAT_FREEBSD5 is required to run the 5.x binaries on more recent kernels. And so on for COMPAT_FREEBSD6 and COMPAT_FREEBSD7. Make sure that you merge any new devices from GENERIC since the last time you updated your kernel config file. [9] When checking out sources, you must include the -P flag to have cvs prune empty directories. If CPUTYPE is defined in your /etc/make.conf, make sure to use the "?=" instead of the "=" assignment operator, so that buildworld can override the CPUTYPE if it needs to. MAKEOBJDIRPREFIX must be defined in the environment, and not on the command line or in /etc/make.conf. buildworld will warn if it is improperly defined. FORMAT: This file contains a list, in reverse chronological order, of major breakages in tracking -current. It is not guaranteed to be a complete list of such breakages, and only contains entries since September 23, 2011. If you need to see UPDATING entries from before that date, you will need to fetch an UPDATING file from an older FreeBSD release. Copyright information: Copyright 1998-2009 M. Warner Losh. All Rights Reserved. Redistribution, publication, translation and use, with or without modification, in full or in part, in any form or format of this document are permitted without further permission from the author. THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contact Warner Losh if you have any questions about your use of this document. $FreeBSD$ Index: projects/clang390-import/contrib/bmake/ChangeLog =================================================================== --- projects/clang390-import/contrib/bmake/ChangeLog (revision 305686) +++ projects/clang390-import/contrib/bmake/ChangeLog (revision 305687) @@ -1,1936 +1,1965 @@ +2016-08-18 Simon J. Gerraty + + * Makefile (_MAKE_VERSION): 20160818 + its a neater number; pick up whitespace fixes to man page. + +2016-08-17 Simon J. Gerraty + + * Makefile (_MAKE_VERSION): 20160817 + Merge with NetBSD make, pick up + o meta.c: move handling of .MAKE.META.IGNORE_* to meta_ignore() + so we can call it before adding entries to missingFiles. + Thus we do not track files we have been told to ignore. + +2016-08-15 Simon J. Gerraty + + * Makefile (_MAKE_VERSION): 20160815 + Merge with NetBSD make, pick up + o meta_oodate: apply .MAKE.META.IGNORE_FILTER (if defined) to + pathnames, and skip if the expansion is empty. + Useful for dirdeps.mk when checking DIRDEPS_CACHE. + +2016-08-12 Simon J. Gerraty + + * Makefile (_MAKE_VERSION): 20160812 + Merge with NetBSD make, pick up + o meta.c: remove all missingFiles entries that match a deleted + dir. + o main.c: set .ERROR_CMD if possible. + 2016-06-06 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160606 Merge with NetBSD make, pick up o dir.c: extend mtimes cache to others via cached_stat() 2016-06-04 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160604 Merge with NetBSD make, pick up o meta.c: missing filemon data is only relevant if we read a meta file. Also do not return oodate for a missing metafile if gn->path points to .CURDIR 2016-06-02 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160602 Merge with NetBSD make, pick up o cached_realpath(): avoid hitting filesystem more than necessary. o meta.c: refactor need_meta decision, add knobs for missing meta file and filemon data wrt out-of-datedness. 2016-05-28 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160528 * boot-strap, make-bootstrap.sh.in: Makefile now uses _MAKE_VERSION 2016-05-12 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160512 Merge with NetBSD make, pick up o meta.c: ignore paths that match .MAKE.META.IGNORE_PATTERNS this is useful for gcov builds. o propagate errors from filemon(4). 2016-05-09 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160509 Merge with NetBSD make, pick up o remove use of non-standard types u_int etc. o meta.c: apply realpath() before matching against metaIgnorePaths 2016-04-04 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160404 Merge with NetBSD make, pick up o allow makefile to set .MAKE.JOBS * Makefile (PROG_NAME): use ${_MAKE_VERSION} 2016-03-15 Simon J. Gerraty * Makefile (_MAKE_VERSION): 20160315 Merge with NetBSD make, pick up o fix handling of archive members 2016-03-13 Simon J. Gerraty * Makefile (_MAKE_VERSION): rename variable to avoid interference with checks for ${MAKE_VERSION} 2016-03-10 Simon J. Gerraty * Makefile (MAKE_VERSION): 20160310 Merge with NetBSD make, pick up o meta.c: treat missing Read file same as Write, incase we Delete it. 2016-03-07 Simon J. Gerraty * Makefile (MAKE_VERSION): 20160307 Merge with NetBSD make, pick up o var.c: fix :ts\nnn to be octal by default. o meta.c: meta_finish() to cleanup memory. 2016-02-26 Simon J. Gerraty * Makefile (MAKE_VERSION): 20160226 Merge with NetBSD make, pick up o meta.c: allow meta file for makeDepend if makefiles want it. 2016-02-19 Simon J. 
Gerraty * var.c: default .MAKE.SAVE_DOLLARS to FALSE for backwards compatability. * Makefile (MAKE_VERSION): 20160220 Merge with NetBSD make, pick up o var.c: add knob to control handling of '$$' in := 2016-02-18 Simon J. Gerraty * Makefile (MAKE_VERSION): 20160218 Merge with NetBSD make, pick up o var.c: add .export-literal allows us to fix sys.clean-env.mk post the changes to Var_Subst. Var_Subst now takes flags, and does not consume '$$' in := 2016-02-17 Simon J. Gerraty * Makefile (MAKE_VERSION): 20160217 Merge with NetBSD make, pick up o var.c: preserve '$$' in := o parse.c: add .dinclude for handling included makefile like .depend 2015-12-20 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151220 Merge with NetBSD make, pick up o suff.c: re-initialize suffNull when clearing suffixes. 2015-12-01 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151201 Merge with NetBSD make, pick up o cond.c: CondCvtArg: avoid access beyond end of empty buffer. o meta.c: meta_oodate: use lstat(2) for checking link target in case it is a symlink. o var.c: avoid calling brk_string and Var_Export1 with empty strings. 2015-11-26 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151126 Merge with NetBSD make, pick up o parse.c: ParseTrackInput don't access beyond end of old value. 2015-10-22 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151022 * Add support for BSD/OS which lacks inttypes.h and really needs sys/param.h for sys/sysctl.h also 'type' is not a shell builtin. * var.c: eliminate uint32_t and need for inttypes.h * main.c: PrintOnError flush stdout before run .ERROR * parse.c: cope with _SC_PAGESIZE not being defined. 2015-10-20 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151020 Merge with NetBSD make, pick up o var.c: fix uninitialized var 2015-10-12 Simon J. Gerraty * var.c: the conditional expressions used with ':?' can be expensive, if already discarding do not evaluate or expand anything. 2015-10-10 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151010 Merge with NetBSD make, pick up o Add Boolean wantit flag to Var_Subst and Var_Parse when FALSE we know we are discarding the result and can skip operations like Cmd_Exec. 2015-10-09 Simon J. Gerraty * Makefile (MAKE_VERSION): 20151009 Merge with NetBSD make, pick up o var.c: don't check for NULL before free() o meta.c: meta_oodate, do not hard code ignore of makeDependfile 2015-09-10 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150910 Merge with NetBSD make, pick up o main.c: with -w print Enter/Leaving messages for objdir too if necessary. o centralize shell metachar handling * FILES: add metachar.[ch] 2015-06-06 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150606 Merge with NetBSD make, pick up o make.1: document .OBJDIR target 2015-05-05 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150505 Merge with NetBSD make, pick up o cond.c: be strict about lhs of comparison when evaluating .if but less so when called from variable expansion. o unit-tests/cond2.mk: test various error conditions 2015-05-04 Simon J. Gerraty * machine.sh (MACHINE): Add Bitrig patch from joerg@netbsd.org 2015-04-18 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150418 Merge with NetBSD make, pick up o job.c: use memmove() rather than memcpy() * unit-tests/varshell.mk: SunOS cannot handle the TERMINATED_BY_SIGNAL case, so skip it. 2015-04-11 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150411 bump version - only mk/ changes. 2015-04-10 Simon J. 
Gerraty * Makefile (MAKE_VERSION): 20150410 Merge with NetBSD make, pick up o document different handling of '-' in jobs mode vs compat o fix jobs mode so that '-' only applies to whole job when shell lacks hasErrCtl o meta.c: use separate vars to track lcwd and latestdir (read) per process 2015-04-01 Simon J. Gerraty * Makefile (MAKE_VERSION): 20150401 Merge with NetBSD make, pick up o meta.c: close meta file in child * Makefile: use BINDIR.bmake if set. Same for MANDIR and SHAREDIR Handy for testing release candidates in various environments. 2015-03-26 Simon J. Gerraty * move initialization of savederr to block where it is used to avoid spurious warning from gcc5 2014-11-11 Simon J. Gerraty * Makefile (MAKE_VERSION): 20141111 just a cooler number 2014-11-05 Simon J. Gerraty * Makefile (MAKE_VERSION): 20141105 Merge with NetBSD make, pick up o revert major overhaul of suffix handling and POSIX compliance - too much breakage and impossible to make backwards compatible. o we still have the new unit test structure which is ok. o meta.c ensure "-- filemon" is at start of line. 2014-09-17 Simon J. Gerraty * configure.in: test that result of getconf PATH_MAX is numeric and discard if not. Apparently needed for Hurd. 2014-08-30 Simon J. Gerraty * Makefile (MAKE_VERSION): 20140830 Merge with NetBSD make, pick up o major overhaul of suffix handling o improved POSIX compliance o overhauled unit-tests 2014-06-20 Simon J. Gerraty * Makefile (MAKE_VERSION): 20140620 Merge with NetBSD make, pick up o var.c return varNoError rather than var_Error for ::= modifiers. 2014-05-22 Simon J. Gerraty * Makefile (MAKE_VERSION): 20140522 Merge with NetBSD make, pick up o var.c detect some parse errors. 2014-04-05 Simon J. Gerraty * Fix spelling errors - patch from Pedro Giffuni 2014-02-14 Simon J. Gerraty * Makefile (MAKE_VERSION): 20140214 Merge with NetBSD make, pick up o .INCLUDEFROM* o use Var_Value to get MAKEOBJDIR[PREFIX] o reduced realloc'ign in brk_string. * configure.in: add a check for compiler supporting __func__ 2014-01-03 Simon J. Gerraty * boot-strap: ignore mksrc=none 2014-01-02 Simon J. Gerraty * Makefile (DEFAULT_SYS_PATH?): use just ${prefix}/share/mk 2014-01-01 Simon J. Gerraty * Makefile (MAKE_VERSION): 20140101 * configure.in: set bmake_path_max to min(_SC_PATH_MAX,1024) * Makefile.config: defined BMAKE_PATH_MAX to bmake_path_max * make.h: use BMAKE_PATH_MAX if MAXPATHLEN not defined (needed for Hurd) * configure.in: Add AC_PREREQ and check for sysctl; patch from Andrew Shadura andrewsh at debian.org 2013-10-16 Simon J. Gerraty * Makefile (MAKE_VERSION): 20131010 * lose the const from arg to systcl to avoid problems on older BSDs. 2013-10-01 Simon J. Gerraty * Makefile (MAKE_VERSION): 20131001 Merge with NetBSD make, pick up o main.c: for NATIVE build sysctl to get MACHINE_ARCH from hw.machine_arch if necessary. o meta.c: meta_oodate - need to look at src of Link and target of Move as well. * main.c: check that CTL_HW and HW_MACHINE_ARCH exist. provide __arraycount() if needed. 2013-09-04 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130904 Merge with NetBSD make, pick up o Add VAR_INTERNAL context, so that internal setting of MAKEFILE does not override value set by makefiles. 2013-09-02 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130902 Merge with NetBSD make, pick up o CompatRunCommand: only apply shellErrFlag when errCheck is true 2013-08-28 Simon J. 
Gerraty * Makefile (MAKE_VERSION): 20130828 Merge with NetBSD make, pick up o Fix VAR :sh = syntax from Will Andrews at freebsd.org o Call Job_SetPrefix() from Job_Init() so makefiles have opportunity to set .MAKE.JOB.PREFIX 2013-07-30 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130730 Merge with NetBSD make, pick up o Allow suppression of --- job -- tokens by setting .MAKE.JOB.PREFIX empty. 2013-07-16 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130716 Merge with NetBSD make, pick up o number of gmake compatibility tweaks -w for gmake style entering/leaving messages if .MAKE.LEVEL > 0 indicate it in progname "make[1]" etc. handle MAKEFLAGS containing only letters. o when overriding a GLOBAL variable on the command line, delete it from GLOBAL context so -V doesn't show the wrong value. 2013-07-06 Simon J. Gerraty * configure.in: We don't need MAKE_LEVEL_SAFE anymore. * Makefile (MAKE_VERSION): 20130706 Merge with NetBSD make, pick up o Shell_Init(): export shellErrFlag if commandShell hasErrCtl is true so that CompatRunCommand() can use it, to ensure consistent behavior with jobs mode. o use MAKE_LEVEL_ENV to define the variable to propagate .MAKE.LEVEL - currently set to MAKELEVEL (same as gmake). o meta.c: use .MAKE.META.IGNORE_PATHS to allow customization of paths to ignore. 2013-06-04 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130604 Merge with NetBSD make, pick up o job.c: JobCreatePipe: do fcntl() after any tweaking of fd's to avoid leaking descriptors. 2013-05-28 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130528 Merge with NetBSD make, pick up o var.c: cleanup some left-overs in VarHash() 2013-05-20 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130520 generate manifest from component FILES rather than have to update FILES when mk/FILES changes. 2013-05-18 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130518 Merge with NetBSD make, pick up o suff.c: don't skip all processing for .PHONY targets else wildcard srcs do not get expanded. o var.c: expand name of variable to delete if necessary. 2013-03-30 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130330 Merge with NetBSD make, pick up o meta.c: refine the handling of .OODATE in commands. Rather than suppress command comparison for the entire script as though .NOMETA_CMP had been used, only suppress it for the one command line. This allows something like ${.OODATE:M.NOMETA_CMP} to be used to suppress comparison of a command without otherwise affecting it. o make.1: document that 2013-03-22 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130321 yes, not quite right but it's a cooler number. Merge with NetBSD make, pick up o parse.c: fix ParseGmakeExport to be portable and add a unit-test. * meta.c: call meta_init() before makefiles are read and if built with filemon support set .MAKE.PATH_FILEMON to _PATH_FILEMON this lets makefiles test for support. Call meta_mode_init() to process .MAKE.MODE. 2013-03-13 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130305 Merge with NetBSD make, pick up o run .STALE: target when a dependency from .depend is missing. o job.c: add Job_RunTarget() for the above and .BEGIN 2013-03-03 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130303 Merge with NetBSD make, pick up o main.c: set .MAKE.OS to utsname.sysname o job.c: more checks for read and poll errors o var.c: lose VarChangeCase() saves 4% time 2013-03-02 Simon J. Gerraty * boot-strap: remove MAKEOBJDIRPREFIX from environment since we want to use MAKEOBJDIR 2013-01-27 Simon J.
Gerraty * Merge with NetBSD make, pick up o make.1: more info on how shell commands are handled. o job.c,main.c: detect write errors to job pipes. 2013-01-25 Simon J. Gerraty * Makefile (MAKE_VERSION): 20130123 Merge with NetBSD make, pick up o meta.c: if script uses .OODATE and meta_oodate() decides rebuild is needed, .OODATE will be empty - set it to .ALLSRC. o var.c: in debug output indicate which variabale modifiers apply to. o remove Check_Cwd logic the makefiles have been fixed. 2012-12-12 Simon J. Gerraty * makefile.in: add a simple makefile for folk who insist on ./configure; make; make install it just runs boot-strap * include mk/* to accommodate the above * boot-strap: re-work to accommodate the above mksrc defaults to $Mydir/mk allow op={configure,build,install,clean,all} add options to facilitate install * Makefile.config.in: just the bits set by configure * Makefile: bump version to 20121212 abandon Makefile.in (NetBSD Makefile) leverage mk/* instead * configure.in: ensure srcdir is absolute 2012-11-11 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20121111 fix generation of bmake.cat1 2012-11-09 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20121109 Merge with NetBSD make, pick up o make.c: MakeBuildChild: return 0 so search continues if a .ORDER dependency is detected. o unit-tests/order: test the above 2012-11-02 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20121102 Merge with NetBSD make, pick up o cond.c: allow cond_state[] to grow. In meta mode with a very large tree, we can hit the limit while processing dirdeps. 2012-10-25 Simon J. Gerraty * Makefile.in: we need to use ${srcdir} not ${.CURDIR} 2012-10-10 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20121010 o protect syntax that only bmake parses correctly. o remove auto setting of FORCE_MACHINE, use configure's --with-force-machine=whatever if that is desired. 2012-10-08 Simon J. Gerraty * Makefile.in: do not lose history from make.1 when generating bmake.1 2012-10-07 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20121007 Merge with NetBSD make, pick up o compat.c: ignore empty commands - same as jobs mode. o make.1: document meta chars that cause use of shell 2012-09-11 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120911 * bsd.after-import.mk: include Makefile.inc early and allow it to override PROG 2012-08-31 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120831 Merge with NetBSD make, pick up o cast sizeof() to int for comparison o minor make.1 tweak 2012-08-30 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120830 Merge with NetBSD make, pick up o .MAKE.EXPAND_VARIABLES knob can control default behavior of -V o debug flag -dV causes -V to show raw value regardless. 2012-07-05 Simon J. Gerraty * bsd.after-import.mk (after-import): ensure unit-tests/Makefile gets SRCTOP set. 2012-07-04 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120704 Merge with NetBSD make, pick up o Job_ParseShell should call Shell_Init if it has been previously called. * Makefile.in: set USE_META based on configure result. also .PARSEDIR is safer indicator of bmake. 2012-06-26 Simon J. Gerraty * Makefile.in: bump version to 20120626 ensure CPPFLAGS is in CFLAGS * meta.c: avoid nested externs * bsd.after-import.mk: avoid ${.CURDIR}/Makefile as target 2012-06-20 Simon J. 
Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120620 Merge with NetBSD make, pick up o make_malloc.c: avoid including make_malloc.h again * Makefile.in: avoid bmake only syntax or protect with .if defined(.MAKE.LEVEL) * bsd.after-import.mk: replace .-include with .sinclude ensure SRCTOP gets a value * configure.in: look for filemon.h in /usr/include/dev/filemon first. 2012-06-19 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120612 Merge with NetBSD make, pick up o use MAKE_ATTR_* rather than those defined by cdefs.h or compiler for greater portability. o unit-tests/forloop: check that .for works as expected wrt number of times and with "quoted strings". 2012-06-06 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120606 Merge with NetBSD make, pick up o compat.c: use kill(2) rather than raise(3). * configure.in: look for sys/dev/filemon * bsd.after-import.mk: add a .-include "Makefile.inc" to Makefile and pass BOOTSTRAP_XTRAS to boot-strap. 2012-06-04 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120604 Merge with NetBSD make, pick up o util.c and var.c share same var for tracking if environ has been reallocated. o util.c provide getenv with setenv. * Add MAKE_LEVEL_SAFE as an alternate means of passing MAKE_LEVEL when the shell actively strips .MAKE.* from the environment. We still refer to the variable always as .MAKE.LEVEL * util.c fix bug in findenv() was finding prefix of name. * compat.c: re-raising SIGINT etc after running .INTERRUPT results in more reliable termination of all activity on many platforms. 2012-06-02 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120602 Merge with NetBSD make, pick up o for.c: handle quoted items in .for list 2012-05-30 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120530 Merge with NetBSD make, pick up o compat.c: ignore empty command. 2012-05-24 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120524 * FILES: add bsd.after-import.mk: A simple means of integrating bmake into a BSD build system. 2012-05-20 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120520 Merge with NetBSD make, pick up o increased limit for nested conditionals. 2012-05-18 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120518 Merge with NetBSD make, pick up o use _exit(2) in signal handler o Don't use the [dir] cache when building nodes that might have changed since the last exec. o Avoid nested extern declaration warnings. 2012-04-27 Simon J. Gerraty * meta.c (fgetLine): avoid %z - not portable. * parse.c: Since we moved include of sys/mman.h and def's of MAP_COPY etc. we got dups from a merge. 2012-04-24 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120420 Merge with NetBSD make, pick up o restore duplicate suppression in .MAKE.MAKEFILES runtime saving can be significant. o Var_Subst() uses Buf_DestroyCompact() to reduce memory consumption up to 20%. 2012-04-20 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120420 Merge with NetBSD make, pick up o remove duplicate suppression in .MAKE.MAKEFILES o improved dir cache behavior o gmake'ish export command 2012-03-25 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20120325 Merge with NetBSD make, pick up o fix parsing of :[#] in conditionals. 2012-02-10 Simon J. Gerraty * Makefile.in: replace use of .Nx in bmake.1 with NetBSD since some systems cannot cope with .Nx 2011-11-14 Simon J.
Gerraty * Makefile.in (MAKE_VERSION): bump version to 20111111 Merge with NetBSD make, pick up o debug output for .PARSEDIR and .PARSEFILE 2011-10-10 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20111010 2011-10-09 Simon J. Gerraty * boot-strap: check for an expected file in the dirs we look for. * make-bootstrap.sh: pass on LDSTATIC 2011-10-01 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20111001 Merge with NetBSD make, pick up o ensure .PREFIX is set for .PHONY and .TARGET set for .PHONY run via .END o __dead used consistently 2011-09-10 Simon J. Gerraty * Makefile.in (MAKE_VERSION): 20110909 is a better number ;-) 2011-09-05 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110905 Merge with NetBSD make, pick up o meta_oodate: ignore makeDependfile 2011-08-28 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110828 Merge with NetBSD make, pick up o silent=yes in .MAKE.MODE causes meta mode to mark targets as SILENT if a .meta file is created 2011-08-18 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110818 Merge with NetBSD make, pick up o in meta mode, if target flagged .META a missing .meta file means target is out-of-date o fixes for gcc 4.5 warnings o simplify job printing code 2011-08-09 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110808 Merge with NetBSD make, pick up o do not touch OP_SPECIAL targets when doing make -t 2011-06-22 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110622 Merge with NetBSD make, pick up o meta_oodate detect corrupted .meta file and declare oodate. * configure.in: add check for setsid 2011-06-07 Simon J. Gerraty * Merge with NetBSD make, pick up o unit-tests/modts now works on MirBSD 2011-06-04 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110606 Merge with NetBSD make, pick up o ApplyModifiers: when we parse a variable which is not the entire modifier string, or not followed by ':', do not consider it as containing modifiers. o loadfile: ensure newline at end of mapped file. 2011-05-05 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110505 Merge with NetBSD make, pick up o .MAKE.META.BAILIWICK - list of prefixes which define the scope of make's control. In meta mode, any generated file within said bailiwick, which is found to be missing, causes current target to be out-of-date. 2011-04-11 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110411 Merge with NetBSD make, pick up o when long modifiers fail to match, check sysV style. - add a test case 2011-04-10 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110410 Merge with NetBSD make, pick up o :hash - cheap 32bit hash of value o :localtime, :gmtime - use value as format string for strftime. 2011-03-30 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110330 mostly because its a cooler version. Merge with NetBSD make, pick up o NetBSD tags for meta.[ch] o job.c call meta_job_finish() after meta_job_error(). o meta_job_error() should call meta_job_finish() to ensure .meta file is closed, and safe to copy - if .ERROR target wants. meta_job_finish() is safe to call repeatedly. 2011-03-29 Simon J. Gerraty * unit-tests/modts: use printf if it is a builtin, to save us from MirBSD * Makefile.in (MAKE_VERSION): bump version to 20110329 Merge with NetBSD make, pick up o fix for use after free() in CondDoExists(). o meta_oodate() report extra commands and return earlier. 2011-03-27 Simon J. 
Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110327 Merge with NetBSD make, pick up o meta.c, if .MAKE.MODE contains curdirOk=yes allow creating .meta files in .CURDIR * boot-strap (TOOL_DIFF): apparently at least one Linux distro formats the output of 'type' differently - so eat any "()" 2011-03-06 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110306 Merge with NetBSD make, pick up o meta.c, only do getcwd() once 2011-03-05 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110305 Merge with NetBSD make, pick up o correct sysV substitution handling of empty lhs and variable o correct exists() check for dir with trailing / o correct handling of modifiers for non-existent variables during evaluation of conditionals. o ensure MAP_FILE is defined. o meta.c use curdir[] now exported by main.c 2011-02-25 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110225 Merge with NetBSD make, pick up o fix for incorrect .PARSEDIR when .OBJDIR is re-computed after makefiles have been read. o fix example of :? modifier in man page. 2011-02-13 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110214 Merge with NetBSD make, pick up o meta.c handle realpath() failing when generating meta file name. * sigcompat.c: convert to ansi so we can use higher warning levels. 2011-02-07 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110207 Merge with NetBSD make, pick up o fix for bug in meta mode. 2011-01-03 Simon J. Gerraty * parse.c: SunOS 5.8 at least does not have MAP_FILE 2011-01-01 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20110101 Merge with NetBSD make, pick up o use mmap(2) if available, for reading makefiles 2010-12-15 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20101215 Merge with NetBSD make, pick up o ensure meta_job_error() does not report a previous .meta file as being culprit. 2010-12-10 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20101210 Merge with NetBSD make, pick up o meta_oodate: track cwd per process, and only consider target out-of-date if missing file is outside make's CWD. Ignore files in /tmp/ etc. o to ensure unit-tests results match, need to control LC_ALL as well as LANG. o fix for parsing bug in var.c 2010-11-26 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20101126 Merge with NetBSD make, pick up o if stale dependency is an IMPSRC, search via .PATH o meta_oodate: if a referenced file is missing, target is out-of-date. o meta_oodate: if a target uses .OODATE in its commands, it (.OODATE) needs to be recomputed. o keep a pointer to youngest child node, rather than just its mtime. 2010-11-02 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20101101 2010-10-16 Simon J. Gerraty * machine.sh: like os.sh, allow for uname -p producing useless drivel 2010-09-13 Simon J. Gerraty * boot-strap: document configure knobs for meta and filemon. * Makefile.in (MAKE_VERSION): bump version to 20100911 Merge with NetBSD make, pick up o meta.c - meta mode * make-bootstrap.sh.in: handle meta.c * configure.in: add knobs for use_meta and filemon_h also, look for dirname, str[e]sep and strlcpy * util.c: add simple err[x] and warn[x] 2010-08-08 Simon J. Gerraty * boot-strap (TOOL_DIFF): set this to ensure tests use the same version of diff that configure tested * Makefile.in (MAKE_VERSION): bump version to 20100808 Merge with NetBSD make, pick up o in jobs mode, when we discover we cannot make something, call PrintOnError before exit.
2010-08-06 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100806 Merge with NetBSD make, pick up o formatting fixes for ignored errors o ensure jobs are cleaned up regardless of where wait() was called. 2010-06-28 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100618 * os.sh (MACHINE_ARCH): watch out for drivel from uname -p 2010-06-16 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100616 Merge with NetBSD make, pick up o man page update o call PrintOnError from JobFinish when we detect an error we are not ignoring. 2010-06-06 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100606 Merge with NetBSD make, pick up o man page update 2010-06-05 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100605 Merge with NetBSD make, pick up o use bmake_signal() which is a wrapper around sigaction() in place of signal() o add .export-env to allow exporting variables to environment without tracking (so no re-export when the internal value is changed). 2010-05-24 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100524 Merge with NetBSD make, pick up o fix for .info et al being greedy. 2010-05-23 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100520 Merge with NetBSD make, pick up o back to using realpath on argv[0] but only if contains '/' and does not start with '/'. 2010-05-10 Simon J. Gerraty * boot-strap: use absolute path for bmake when running tests. * Makefile.in (MAKE_VERSION): bump version to 20100510 Merge with NetBSD make, pick up o revert use of realpath on argv[0] too many corner cases. o print MAKE_PRINT_VAR_ON_ERROR before running .ERROR target. 2010-05-05 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100505 Merge with NetBSD make, pick up o fix for missed SIGCHLD when compiled with SunPRO actually for bmake, defining FORCE_POSIX_SIGNALS would have done the job. 2010-04-30 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100430 Merge with NetBSD make, pick up o fflush stdout before writing to stdout 2010-04-23 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100423 Merge with NetBSD make, pick up o updated unit tests for Haiku (this time for sure). * boot-strap: based on patch from joerg honor --with-default-sys-path better. * boot-strap: remove mention of --with-prefix-sys-path 2010-04-22 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100422 * Merge with NetBSD make, pick up o fix for vfork() on Darwin. o fix for bogus $TMPDIR. o set .MAKE.MODE=compat for -B o set .MAKE.JOBS=max_jobs for -j max_jobs o allow unit-tests to run without any *.mk o unit-tests/modmisc be more conservative in dirs presumed to exist. * boot-strap: ignore /usr/share/mk except on NetBSD. * unit-tests/Makefile.in: set LANG=C when running unit-tests to ensure sort(1) behaves as expected. 2010-04-21 Simon J. Gerraty * boot-strap: add FindHereOrAbove so we can use -m .../mk 2010-04-20 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100420 * Merge with NetBSD make, pick up o fix for variable realpath() behavior. we have to stat(2) the result to be sure. o fix for .export (all) when nested vars use :sh 2010-04-14 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100414 * Merge with NetBSD make, pick up o use realpath to resolve argv[0] (for .MAKE) if needed. o add realpath from libc. o add :tA to resolve variable via realpath(3) if possible. 2010-04-08 Simon J. 
Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100408 * Merge with NetBSD make, pick up o unit tests for .ERROR, .error o fix for .ERROR to ensure it cannot be default target. 2010-04-06 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100406 * Merge with NetBSD make, pick up o fix for compat mode "Error code" going to debug_file. o fix for .ALLSRC being populated twice. o support for .info, .warning and .error directives o .MAKE.MODE to control make's operational mode o .MAKE.MAKEFILE_PREFERENCE to control the preferred makefile name(s). o .MAKE.DEPENDFILE to control the name of the depend file o .ERROR target - run on failure. 2010-03-18 Simon J. Gerraty * make-bootstrap.sh.in: extract MAKE_VERSION from Makefile * os.sh,arch.c: patch for Haiku from joerg at netbsd 2010-03-17 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100222 * Merge with NetBSD make, pick up o better error msg for .for with multiple inter vars * boot-strap: o use make-bootstrap.sh from joerg at netbsd to avoid the need for a native make when bootstrapping. o add "" everywhere ;-) o if /usr/share/tmac/andoc.tmac exists install nroff bmake.1 otherwise the pre-formatted version. 2010-01-04 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20100102 * Merge with NetBSD make, pick up: o fix for -m .../ 2009-11-18 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20091118 * Merge with NetBSD make, pick up: o .unexport o report lines that start with '.' and should have ':' (catch typos of .el*if). 2009-10-30 Simon J. Gerraty * configure.in: Ensure that srcdir and mksrc are absolute paths. 2009-10-09 Simon J. Gerraty * Makefile.in (MAKE_VERSION): fix version to 20091007 2009-10-07 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 200910007 * Merge with NetBSD make, pick up: o fix for parsing of :S;...;...; applied to .for loop iterator appearing in a dependency line. 2009-09-09 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20090909 * Merge with NetBSD make, pick up: o fix for -C, .CURDIR and .OBJDIR * boot-strap: o allow share_dir to be set independent of prefix. o select default share_dir better when prefix ends in $HOST_TARGET o if FORCE_BSD_MK etc were set, include them in the suggested install-mk command. 2009-09-08 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20090908 * Merge with NetBSD make, pick up: o .MAKE.LEVEL for recursion tracking o fix for :M scanning \: 2009-09-03 Simon J. Gerraty * configure.in: Don't -D__EXTENSIONS__ if AC_USE_SYSTEM_EXTENSIONS says "no". 2009-08-26 Simon J. Gerraty * Makefile.in (MAKE_VERSION): bump version to 20090826 Simplify MAKE_VERSION to just the bare date. * Merge with NetBSD make, pick up: o -C directory support. o support for SIGINFO o use $TMPDIR for temp files. o child of vfork should be careful about modifying parent's state. 2009-03-26 Simon J. Gerraty * Apply some patches for MiNT from David Brownlee 2009-02-26 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20090222 * Merge with NetBSD make, pick up: o Possible null pointer de-ref in Var_Set. 2009-02-08 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20090204 * Merge with NetBSD make, pick up: o bmake_malloc et al moved to their own .c o Count both () and {} when looking for the end of a :M pattern o Change 'Buffer' so that it is the actual struct, not a pointer to it. o strlist.c - functions for processing extendable arrays of pointers to strings.
o ClientData replaced with void *, so const void * can be used. o New debug flag C for DEBUG_CWD 2008-11-11 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20081111 Apply patch from Joerg Sonnenberge to configure.in: o remove some redundant checks o check for emlloc etc only in libutil and require the whole family. util.c: o remove [v]asprintf which is no longer used. 2008-11-04 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20081101 * Merge with NetBSD make, pick up: o util.c: avoid use of putenv() - christos 2008-10-30 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20081030 pick up man page tweaks. 2008-10-29 Simon J. Gerraty * Makefile.in: move processing of LIBOBJS to after is definition! thus we'll have getenv.c in SRCS only if needed. * make.1: add examples of how to use :? * Makefile.in (BMAKE_VERSION): bump version to 20081029 * Merge with NetBSD make, pick up: o fix for .END processing with -j o segfault from Parse_Error when no makefile is open o handle numeric expressions in any variable expansion o debug output now defaults to stderr, -dF to change it - apb o make now uses bmake_malloc etc so that it can build natively on A/UX - wasn't an issue for bmake, but we want to keep in sync. 2008-09-27 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20080808 * Merge with NetBSD make, pick up: o fix for PR/38840: Pierre Pronchery: make crashes while parsing long lines in Makefiles o optimizations for VarQuote by joerg o fix for PR/38756: dominik: make dumps core on invalid makefile 2008-05-15 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20080515 * Merge with NetBSD make, pick up: o fix skip setting vars in VAR_GLOBAL context, to handle cases where VAR_CMD is used for other than command line vars. 2008-05-14 Simon J. Gerraty * boot-strap (make_version): we may need to look in $prefix/share/mk for sys.mk * Makefile.in (BMAKE_VERSION): bump version to 20080514 * Merge with NetBSD make, pick up: o skip setting vars in VAR_GLOBAL context, when already set in VAR_CMD which takes precedence. 2008-03-30 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20080330 * Merge with NetBSD make, pick up: o fix for ?= when LHS contains variable reference. 2008-02-15 Simon J. Gerraty * merge some patches from NetBSD pkgsrc. * makefile.boot.in (BOOTSTRAP_SYS_PATH): Allow better control of the MAKSYSPATH used during bootstrap. * Makefile.in (BMAKE_VERSION): bump version to 20080215 * Merge with NetBSD make, pick up: o warn if non-space chars follow 'empty' in a conditional. 2008-01-18 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20080118 * Merge with NetBSD make, pick up: o consider dependencies read from .depend as optional - dsl o remember when buffer for reading makefile grows - dsl o add -dl (aka LOUD) - David O'Brien 2007-10-22 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20071022 * Merge with NetBSD make, pick up: o Allow .PATH to be used for .include "" * boot-strap: source default settings from .bmake-boot-strap.rc 2007-10-16 Simon J. Gerraty * Makefile.in: fix maninstall on various systems provided that our man.mk is used. For non-BSD systems we install the preformatted page into $MANDIR/cat1 2007-10-15 Simon J. Gerraty * boot-strap: make bmake.1 too, so maninstall works. 2007-10-14 Simon J. 
Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20071014 * Merge with NetBSD make, pick up: o revamped handling of defshell - configure no longer needs to know the content of the shells array - apb o stop Var_Subst modifying its input - apb o avoid calling ParseTrackInput too often - dsl 2007-10-11 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20071011 * Merge with NetBSD make, pick up: o fix Shell_Init for case that _BASENAME_DEFSHELL is absolute path. * sigcompat.c: some tweaks for HP-UX 11.x based on patch from Tobias Nygren * configure.in: update handling of --with-defshell to match new make behavior. --with-defshell=/usr/xpg4/bin/sh will now do what one might hope - provided the chosen shell behaves enough like sh. 2007-10-08 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20071008 * Merge with NetBSD make, pick up: o .MAKE.JOB.PREFIX - control the token output before jobs - sjg o .export/.MAKE.EXPORTED - export of variables - sjg o .MAKE.MAKEFILES - track all makefiles read - sjg o performance improvements - dsl o revamp parallel job scheduling - dsl 2006-07-28 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060728 * Merge with NetBSD make, pick up: o extra debug info during variable and cond processing - sjg o shell definition now covers newline - rillig o minor mem leak in PrintOnError - sjg 2006-05-11 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060511 * Merge with NetBSD make, pick up: o more memory leaks - coverity o possible overflow in ArchFindMember - coverity o extract variable modifier code out of Var_Parse() so it can be called recursively - sjg o unit-tests/moderrs - sjg 2006-04-12 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060412 * Merge with NetBSD make, pick up: o fixes for some memory leaks - coverity o only read first sys.mk etc when searching sysIncPath - sjg * main.c (ReadMakefile): remove hack for __INTERIX that prevented setting ${MAKEFILE} - OBATA Akio 2006-03-18 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060318 * Merge with NetBSD make, pick up: o cleanup of job.c to remove remote handling, distcc is more useful and this code was likely bit-rotting - dsl o fix for :P modifier - sjg * boot-strap: set default prefix to something reasonable (for me anyway). 2006-03-01 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060301 * Merge with NetBSD make, pick up: o make .WAIT apply recursively, document and test case - apb o allow variable modifiers in a variable appear anywhere in modifier list, document and test case - sjg 2006-02-22 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20060222 * Merge with NetBSD make, pick up: o improved job token handling - dsl o SIG_DFL the correct signal before exec - dsl o more debug info during parsing - dsl o allow variable modifiers to be specified via variable - sjg * boot-strap: explain why we died if no mksrc 2005-11-05 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20051105 * configure.in: always set default_sys_path default is ${prefix}/share/mk - remove prefix_sys_path, anyone wanting more than above needs to set it manually. 2005-11-04 Simon J. Gerraty * boot-strap: make this a bit easier for pkgsrc folk. bootstrap still fails on IRIX64 since MACHINE_ARCH gets set to 'mips' while pkgsrc wants 'mipseb' or 'mipsel' 2005-11-02 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20051102 * job.c (JobFinish): fix likely ancient merge lossage fix from Todd Vierling. 
* boot-strap (srcdir): allow setting mksrc=none 2005-10-31 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20051031 * ranlib.h: skip on OSF too. (NetBSD PR 31864) 2005-10-10 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20051002 fix a silly typo 2005-10-09 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20051001 support for UnixWare and some other systems, based on patches from pkgsrc/bootstrap 2005-09-03 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20050901 * Merge with NetBSD make, pick up: o possible parse error causing us to wander off. 2005-06-06 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20050606 * Merge with NetBSD make, pick up: o :0x modifier for randomizing a list o fixes for a number of -Wuninitialized issues. 2005-05-30 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20050530 * Merge with NetBSD make, pick up: o Handle dependencies for .BEGIN, .END and .INTERRUPT * README: was seriously out of date. 2005-03-22 Simon J. Gerraty * Important to use .MAKE rather than MAKE. 2005-03-15 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20050315 * Merge with NetBSD make, pick up: o don't mistake .elsefoo for .else o use suffix-specific search path correctly o bunch of style nits 2004-05-11 Simon J. Gerraty * boot-strap: o ensure that args to --src and --with-mksrc are resolved before giving them to configure. o add -o "objdir" so that builder can control it, default is $OS as determined by os.sh o add -q to suppress all the install instructions. 2004-05-08 Simon J. Gerraty * Remove __IDSTRING() * Makefile.in (BMAKE_VERSION): bump to 20040508 * Merge with NetBSD make, pick up: o posix fixes - remove '-e' from compat mode - add support for '+' command-line prefix. o fix for handling '--' on command-line. o fix include in lst.lib/lstInt.h to simplify '-I's o we also picked up replacement of MAKE_BOOTSTRAP with !MAKE_NATIVE which is a noop, but possibly confusing. 2004-04-14 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20040414 * Merge with NetBSD make, pick up: o allow quoted strings on lhs of conditionals o issue warning when extra .else is seen o print line numer when errors encountered during parsing from string. 2004-02-20 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20040220 * Merge with NetBSD make, pick up: o fix for old :M parsing bug. o re-jigged unit-tests 2004-02-15 Simon J. Gerraty * Makefile.in (accept test): use ${.MAKE:S,^./,${.CURDIR}/,} so that './bmake -f Makefile test' works. 2004-02-14 Simon J. Gerraty * Makefile.in: (BMAKE_VERSION): bump to 20040214 * Merge with NetBSD make, pick up: o search upwards for *.mk o fix for double free of var substitution buffers o use of getopt replaced with custom code, since the usage (re-scanning) isn't posix compatible. 2004-02-12 Simon J. Gerraty * arch.c: don't include ranlib.h on ELF systems (thanks to Chuck Cranor ). 2004-01-18 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump to 20040118 * boot-strap (while): export vars we assign to on cmdline * unit-test/Makefile.in: ternary is .PHONY 2004-01-08 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20040108 * Merge with NetBSD make, pick up: o fix for ternary modifier 2004-01-06 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20040105 * Merge with NetBSD make, pick up: o fix for cond.c to handle compound expressions better o variable expansion within sysV style replacements 2003-12-22 Simon J. 
Gerraty * Make portable snprintf safer - output to /dev/null first to check space needed. * Makefile.in (BMAKE_VERSION): bump version to 20031222 * Merge with NetBSD make, pick up: o -dg3 to show input graph when things go wrong. o explicitly look for makefiles in objdir if not found in curdir so that errors in .depend etc will be reported accurarely. o avoid use of -e in shell scripts in jobs mode, use '|| exit $?' instead as it more accurately reflects the expected behavior and is more consistently implemented. o avoid use of asprintf. 2003-09-28 Simon J. Gerraty * util.c: Add asprintf and vasprintf. * Makefile.in (BMAKE_VERSION): bump version to 20030928 * Merge with NetBSD make, pick up: :[] modifier - allows picking words from a variable. :tW modifier - allows treating value as one big word. W flag for :C and :S - allows treating value as one big word. 2003-09-12 Simon J. Gerraty * Merge with NetBSD make pick up -de flag to enable printing failed command. don't skip 1st two dir entries (normally . and ..) since coda does not have them. 2003-09-09 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20030909 * Merge with NetBSD make, pick up: - changes for -V '${VAR}' to print fully expanded value cf. -V VAR - CompatRunCommand now prints the command that failed. - several files got updated 3 clause Berkeley license. 2003-08-02 Simon J. Gerraty * boot-strap: Allow setting configure args on command line. 2003-07-31 Simon J. Gerraty * configure.in: add --with-defshell to allow sh or ksh to be selected as default shell. * Makefile.in: bump version to 20030731 * Merge with NetBSD make Pick up .SHELL spec for ksh and associate man page changes. Also compat mode now uses the same shell specs. 2003-07-29 Simon J. Gerraty * var.c (Var_Parse): ensure delim is initialized. * unit-tests/Makefile.in: use single quotes to avoid problems from some shells. * makefile.boot.in: Run the unit-tests as part of the bootstrap procedure. 2003-07-28 Simon J. Gerraty * unit-tests/Makefile.in: always force complaints from ${TEST_MAKE} to be from 'make'. * configure.in: add check for 'diff -u' also fix some old autoconf'isms * Makefile.in (BMAKE_VERSION): bump version to 20030728. if using GCC add -Wno-cast-qual to CFLAGS for var.o * Merge with NetBSD make Pick up fix for :ts parsing error in some cases. Pick unit-tests. 2003-07-23 Simon J. Gerraty * Makefile.in (BMAKE_VERSION): bump version to 20030723. * var.c (Var_Parse): fix bug in :ts modifier, after const correctness fixes, must pass nstr to VarModify. 2003-07-14 Simon J. Gerraty * Makefile.in: BMAKE_VERSION switch to a date based version. We'll generally use the date of last import from NetBSD. * Merge with NetBSD make Pick up fixes for const-correctness, now passes WARNS=3 on NetBSD. Pick up :ts modifier, allows controlling the separator used between words in variable expansion. 2003-07-11 Simon J. Gerraty * FILES: include boot-strap and os.sh * Makefile.in: only set WARNS if we are NetBSD, the effect on FreeBSD is known to be bad. * makefile.boot.in (bootstrap): make this the default target. * Makefile.in: bump version to 3.1.19 * machine.sh: avoid A-Z with tr as it is bound to lose. 2003-07-10 Simon J. Gerraty * Merge with NetBSD make Pick up fix for PR/19781 - unhelpful error msg on unclosed ${var:foo Plus some doc fixes. 2003-04-27 Simon J. Gerraty * Merge with NetBSD make Pick up fix for PR/1523 - don't count a library as built, if there is no way to build it * Bump version to 3.1.18 2003-03-23 Simon J. 
Gerraty * Merge with NetBSD make Pick up fix for ParseDoSpecialSrc - we only use it if .WAIT appears in src list. 2003-03-21 Simon J. Gerraty * Merge with NetBSD make (mmm 10th anniversary!) pick up fix for .WAIT in srcs that refer to $@ or $* (PR#20828) pick up -X which tells us to not export VAR=val via setenv if we are already doing so via MAKEFLAGS. This saves valuable env space on systems like Darwin. set MAKE_VERSION to 3.1.17 * parse.c: pix up fix for suffix rules 2003-03-06 Simon J. Gerraty * Merge with NetBSD make. pick up fix for propagating -B via MAKEFLAGS. set MAKE_VERSION to 3.1.16 * Apply some patches from pkgsrc-bootstrap/bmake Originally by Grant Beattie I may have missed some - since they are based on bmake-3.1.12 2002-12-03 Simon J. Gerraty * makefile.boot.in (bmake): update install targets for those that use them, also clear MAKEFLAGS when invoking bmake.boot to avoid havoc from gmake -w. Thanks to Harlan Stenn . * bmake.cat1: update the pre-formatted man page! 2002-11-30 Simon J. Gerraty * Merge with NetBSD make. pick up fix for premature free of pointer used in call to Dir_InitCur(). set MAKE_VERSION to 3.1.15 2002-11-26 Simon J. Gerraty * configure.in: determine suitable value for MKSRC. override using --with-mksrc=PATH. * machine.sh: use `uname -p` for MACHINE_ARCH on modern SunOS systems. configs(8) will use 'sun4' as an alias for 'sparc'. 2002-11-25 Simon J. Gerraty * Merge with NetBSD make. pick up ${.PATH} pick up fix for finding ../cat.c via .PATH when .CURDIR=.. set MAKE_VERSION to 3.1.14 add configure checks for killpg and sys/socket.h 2002-09-16 Simon J. Gerraty * tag bmake-3-1-13 * makefile.boot.in (bmake): use install-mk Also setup ./mk before trying to invoke bmake.boot incase we needed install-mk to create a sys.mk for us. * configure.in: If we need to add -I${srcdir}/missing, make it an absolute path so that it works for lst.lib too. * make.h: always include sys/cdefs.h since we provide one if the host does not. * Makefile.in (install-mk): use MKSRC/install-mk which will do the right thing. use uname -p for ARCH if possible. since install-mk will setup links bsd.prog.mk -> prog.mk if needed, just .include bsd.prog.mk * Merge with NetBSD make (NetBSD-1.6) Code is ansi-C only now. Bug in handling of dotLast is fixed. Can now assign .OBJDIR and make will reset its notions of life. New modifiers :tu :tl for toUpper and toLower. Tue Oct 16 12:18:42 2001 Simon J. Gerraty * Merge with NetBSD make pick up fix for .END failure in compat mode. pick up fix for extra va_end() in ParseVErrorInternal. Thu Oct 11 13:20:06 2001 Simon J. Gerraty * configure.in: for systems that have sys/cdefs.h check if it is compatible. If not, include the one under missing, but tell it to include the native one too - necessary on Linux. * missing/sys/cdefs.h: if NEED_HOST_CDEFS_H is defined, use include_next (for gcc) to get the native sys/cdefs.h Tue Aug 21 02:29:34 2001 Simon J. Gerraty * job.c (JobFinish): Fix an earlier merge bug that resulted in leaking descriptors when using -jN. * job.c (JobPrintCommand): See if "curdir" exists before attempting to chdir(). Doing the chdir directly in make (when in compat mode) fails silently, so let the -jN version do the same. This can happen when building kernels in an object tree and playing clever games to reset .CURDIR. * Merged with NetBSD make pick up .USEBEFORE Tue Jun 26 23:45:11 2001 Simon J. Gerraty * makefile.boot.in: Give bmake.boot a MAKESYSPATH that might work. Tue Jun 12 16:48:57 2001 Simon J. 
Gerraty * var.c (Var_Set): Add 4th (flags) arg so VarLoopExpand can tell us not to export the iterator variable when using VAR_CMD context. Sun Jun 10 21:55:21 2001 Simon J. Gerraty * job.c (Job_CatchChildren): don't call Job_CatchOutput() here, it's the wrong "fix". Sat Jun 9 00:11:24 2001 Simon J. Gerraty * Redesigned export of VAR_CMD's via MAKEFLAGS. We now simply append the variable names to .MAKEOVERRIDES, and handle duplicate suppression and quoting in ExportMAKEFLAGS using: ${.MAKEOVERRIDES:O:u:@v@$v=${$v:Q}@} Apart from fixing quoting bugs in previous version, this allows us to export vars to the environment by simply doing: .MAKEOVERRIDES+= PATH Merged again with NetBSD make, but the above is the only change. * configure.in: added --disable-pwd-override disable $PWD overriding getcwd() --disable-check-make-chdir disable make trying to guess when it should automatically cd ${.CURDIR} * Merge with NetBSD make, changes include: parse.c (ParseDoDependency): Spot that the syntax error is caused by an unresolved cvs/rcs conflict and say so. var.c: most of Var* functions now take a ctxt as 1st arg. now does variable substitution on rhs of sysv style modifiers. * var.c (Var_Set): exporting of command line variables (VAR_CMD) is now done here. We append the name='value' to .MAKEOVERRIDES rather than directly into MAKEFLAGS as this allows a Makefile to use .MAKEOVERRIDES= to disable this behaviour. GNU make uses a very similar mechanism. Note that in adding name='value' to .MAKEOVERRIDES we do the moral equivalent of: .MAKEOVERRIDES:= ${.MAKEOVERRIDES:Nname=*} name='val' Fri Jun 1 14:08:02 2001 Simon J. Gerraty * make-conf.h (USE_IOVEC): make it conditional on HAVE_SYS_UIO_H * Merged with NetBSD make make -dx can now be used to run commands via sh -x better error messages on exec failures. Thu May 31 01:44:54 2001 Simon J. Gerraty * Makefile.in (main.o): depends on ${SRCS} ${MAKEFILE} so that MAKE_VERSION gets updated. Also don't use ?= for MAKE_VERSION, MACHINE etc otherwise they propagate from the previous bmake. * configure.in (machine): allow --with-machine=generic to make configure use machine.sh to set MACHINE. * job.c (JobInterrupt): convert to using WAIT_T and friends. * Makefile.in: mention in bmake.1 that we use autoconf. * make.1: mention MAKE_PRINT_VAR_ON_ERROR. Wed May 30 23:17:18 2001 Simon J. Gerraty * main.c (ReadMakefile): don't set MAKEFILE if reading ".depend" as that rather defeats the usefulness of ${MAKEFILE}. * main.c (MainParseArgs): append command line variable assignments to MAKEFLAGS so that they get propagated to child makes. Apparently this is required POSIX behaviour? It's useful anyway. Tue May 29 02:20:07 2001 Simon J. Gerraty * compat.c (CompatRunCommand): don't use perror() since stdio may cause problems in child of vfork(). * compat.c, main.c: Call PrintOnError() when we are going to bail. This routine prints out the .curdir where we stopped and will also display any vars listed in ${MAKE_PRINT_VAR_ON_ERROR}. * main.c: add ${.newline} to hold a "\n" - sometimes handy in :@ expansion. * var.c: VarLoopExpand: ignore addSpace if a \n is present. * Added RCSid's for the files we've touched. Thu May 24 15:41:37 2001 Simon J. Gerraty * configure.in: Thanks to some clues from mdb@juniper.net, added autoconf magic to control setting of MACHINE, MACHINE_ARCH as well as what ends up in _PATH_DEFSYSPATH.
We now have: --with-machine=MACHINE explicitly set MACHINE --with-force-machine=MACHINE set FORCE_MACHINE --with-machine_arch=MACHINE_ARCH explicitly set MACHINE_ARCH --with-default-sys-path=PATH:DIR:LIST use an explicit _PATH_DEFSYSPATH --with-prefix-sys-path=PATH:DIR:LIST prefix _PATH_PREFIX_SYSPATH --with-path-objdirprefix=PATH override _PATH_OBJDIRPREFIX If _PATH_OBJDIRPREFIX is set to "no" we won't define it. * makefile: added a pathetically simple makefile to drive bootstrapping. Running configure by hand is more useful. * Makefile.in: added MAKE_VERSION, and reworked things to be less dependent on NetBSD bsd.*.mk * pathnames.h: allow NO_PATH_OBJDIRPREFIX to stop us defining _PATH_OBJDIRPREFIX for those that don't want a default. construct _PATH_DEFSYSPATH from the info we get from configure. * main.c: allow for no _PATH_OBJDIRPREFIX, set ${MAKE_VERSION} if MAKE_VERSION is defined. * compat.c: when we bail, print out the .CURDIR we were in. Sat May 12 00:34:12 2001 Simon J. Gerraty * Merged with NetBSD make * var.c: fixed a bug in the handling of the modifier :P if the node was found but the path was null, we segfaulted trying to duplicate it. Mon Mar 5 16:20:33 2001 Simon J. Gerraty * Merged with NetBSD make * make.c: Make_OODate's test for a library out of date was using cmtime where it should have used mtime (my bug). * compat.c: Use perror() to tell us what really went wrong when we cannot exec a command. Fri Dec 15 10:11:08 2000 Simon J. Gerraty * Merged with NetBSD make Sat Jun 10 10:11:08 2000 Simon J. Gerraty * Merged with NetBSD make Thu Jun 1 10:11:08 2000 Simon J. Gerraty * Merged with NetBSD make Tue May 30 10:11:08 2000 Simon J. Gerraty * Merged with NetBSD make Thu Apr 27 00:07:47 2000 Simon J. Gerraty * util.c: don't provide signal() since we use sigcompat.c * Makefile.in: added a build target. * var.c (Var_Parse): added ODE modifiers :U, :D, :L, :P, :@ and :! These allow some quite clever magic. * main.c (main): added support for getenv(MAKESYSPATH). Mon Apr 2 16:25:13 2000 Simon J. Gerraty * Disable $PWD overriding getcwd() if MAKEOBJDIRPREFIX is set. This avoids objdir having a different value depending on how a directory was reached (via command line, or subdir.mk). * If FORCE_MACHINE is defined, ignore getenv("MACHINE"). Mon Apr 2 23:15:31 2000 Simon J. Gerraty * Do a chdir(${.CURDIR}) before invoking ${.MAKE} or ${.MAKE:T} if MAKEOBJDIRPREFIX is set and NOCHECKMAKECHDIR is not. I've been testing this in NetBSD's make for some weeks. * Turn Makefile into Makefile.in and make it useful. Tue Feb 29 22:08:00 2000 Simon J. Gerraty * Imported NetBSD's -current make(1) and resolve conflicts.
* Applied autoconf patches from bmake v2 * Imported clean code base from NetBSD-1.0 Index: projects/clang390-import/contrib/bmake/Makefile =================================================================== --- projects/clang390-import/contrib/bmake/Makefile (revision 305686) +++ projects/clang390-import/contrib/bmake/Makefile (revision 305687) @@ -1,222 +1,222 @@ -# $Id: Makefile,v 1.67 2016/06/07 00:46:12 sjg Exp $ +# $Id: Makefile,v 1.72 2016/08/18 23:02:26 sjg Exp $ # Base version on src date -_MAKE_VERSION= 20160606 +_MAKE_VERSION= 20160818 PROG= bmake SRCS= \ arch.c \ buf.c \ compat.c \ cond.c \ dir.c \ for.c \ hash.c \ job.c \ main.c \ make.c \ make_malloc.c \ meta.c \ metachar.c \ parse.c \ str.c \ strlist.c \ suff.c \ targ.c \ trace.c \ util.c \ var.c # from lst.lib/ SRCS+= \ lstAppend.c \ lstAtEnd.c \ lstAtFront.c \ lstClose.c \ lstConcat.c \ lstDatum.c \ lstDeQueue.c \ lstDestroy.c \ lstDupl.c \ lstEnQueue.c \ lstFind.c \ lstFindFrom.c \ lstFirst.c \ lstForEach.c \ lstForEachFrom.c \ lstInit.c \ lstInsert.c \ lstIsAtEnd.c \ lstIsEmpty.c \ lstLast.c \ lstMember.c \ lstNext.c \ lstOpen.c \ lstPrev.c \ lstRemove.c \ lstReplace.c \ lstSucc.c # this file gets generated by configure .-include "Makefile.config" .if !empty(LIBOBJS) SRCS+= ${LIBOBJS:T:.o=.c} .endif # just in case prefix?= /usr srcdir?= ${.CURDIR} DEFAULT_SYS_PATH?= ${prefix}/share/mk CPPFLAGS+= -DUSE_META CFLAGS+= ${CPPFLAGS} CFLAGS+= -D_PATH_DEFSYSPATH=\"${DEFAULT_SYS_PATH}\" CFLAGS+= -I. -I${srcdir} ${XDEFS} -DMAKE_NATIVE CFLAGS+= ${COPTS.${.ALLSRC:M*.c:T:u}} COPTS.main.c+= "-DMAKE_VERSION=\"${_MAKE_VERSION}\"" # meta mode can be useful even without filemon FILEMON_H ?= /usr/include/dev/filemon/filemon.h .if exists(${FILEMON_H}) && ${FILEMON_H:T} == "filemon.h" COPTS.meta.c += -DHAVE_FILEMON_H -I${FILEMON_H:H} .endif .PATH: ${srcdir} .PATH: ${srcdir}/lst.lib .if make(obj) || make(clean) SUBDIR+= unit-tests .endif # start-delete1 for bsd.after-import.mk # we skip a lot of this when building as part of FreeBSD etc. # list of OS's which are derived from BSD4.4 BSD44_LIST= NetBSD FreeBSD OpenBSD DragonFly MirBSD Bitrig # we are... OS!= uname -s # are we 4.4BSD ? isBSD44:=${BSD44_LIST:M${OS}} .if ${isBSD44} == "" MANTARGET= cat INSTALL?=${srcdir}/install-sh .if (${MACHINE} == "sun386") # even I don't have one of these anymore :-) CFLAGS+= -DPORTAR .elif (${MACHINE} != "sunos") SRCS+= sigcompat.c CFLAGS+= -DSIGNAL_FLAGS=SA_RESTART .endif .else MANTARGET?= man .endif # turn this on by default - ignored if we are root WITH_INSTALL_AS_USER= # suppress with -DWITHOUT_* OPTIONS_DEFAULT_YES+= \ AUTOCONF_MK \ INSTALL_MK \ PROG_LINK OPTIONS_DEFAULT_NO+= \ PROG_VERSION # process options now .include .if ${MK_PROG_VERSION} == "yes" PROG_NAME= ${PROG}-${_MAKE_VERSION} .if ${MK_PROG_LINK} == "yes" SYMLINKS+= ${PROG_NAME} ${BINDIR}/${PROG} .endif .endif EXTRACT_MAN=no # end-delete1 MAN= ${PROG}.1 MAN1= ${MAN} .if (${PROG} != "make") CLEANFILES+= my.history .if make(${MAN}) || !exists(${srcdir}/${MAN}) my.history: ${MAKEFILE} @(echo ".Nm"; \ echo "is derived from NetBSD"; \ echo ".Xr make 1 ."; \ echo "It uses autoconf to facilitate portability to other platforms."; \ echo ".Pp") > $@ .NOPATH: ${MAN} ${MAN}: make.1 my.history @echo making $@ @sed -e 's/^.Nx/NetBSD/' -e '/^.Nm/s/make/${PROG}/' \ -e '/^.Sh HISTORY/rmy.history' \ -e '/^.Sh HISTORY/,$$s,^.Nm,make,' ${srcdir}/make.1 > $@ all beforeinstall: ${MAN} _mfromdir=.
.endif .endif MANTARGET?= cat MANDEST?= ${MANDIR}/${MANTARGET}1 .if ${MANTARGET} == "cat" _mfromdir=${srcdir} .endif .include CPPFLAGS+= -DMAKE_NATIVE -DHAVE_CONFIG_H COPTS.var.c += -Wno-cast-qual COPTS.job.c += -Wno-format-nonliteral COPTS.parse.c += -Wno-format-nonliteral COPTS.var.c += -Wno-format-nonliteral # Force these SHAREDIR= ${SHAREDIR.bmake:U${prefix}/share} BINDIR= ${BINDIR.bmake:U${prefix}/bin} MANDIR= ${MANDIR.bmake:U${SHAREDIR}/man} .if !exists(.depend) ${OBJS}: config.h .endif # make sure that MAKE_VERSION gets updated. main.o: ${SRCS} ${MAKEFILE} # start-delete2 for bsd.after-import.mk .if ${MK_AUTOCONF_MK} == "yes" .include .endif SHARE_MK?=${SHAREDIR}/mk MKSRC=${srcdir}/mk INSTALL?=${srcdir}/install-sh .if ${MK_INSTALL_MK} == "yes" install: install-mk .endif beforeinstall: test -d ${DESTDIR}${BINDIR} || ${INSTALL} -m 775 -d ${DESTDIR}${BINDIR} test -d ${DESTDIR}${MANDEST} || ${INSTALL} -m 775 -d ${DESTDIR}${MANDEST} install-mk: .if exists(${MKSRC}/install-mk) test -d ${DESTDIR}${SHARE_MK} || ${INSTALL} -m 775 -d ${DESTDIR}${SHARE_MK} sh ${MKSRC}/install-mk -v -m 644 ${DESTDIR}${SHARE_MK} .else @echo need to unpack mk.tar.gz under ${srcdir} or set MKSRC; false .endif # end-delete2 # A simple unit-test driver to help catch regressions accept test: cd ${.CURDIR}/unit-tests && MAKEFLAGS= ${.MAKE} -r -m / TEST_MAKE=${TEST_MAKE:U${.OBJDIR}/${PROG:T}} ${.TARGET} Index: projects/clang390-import/contrib/bmake/bmake.1 =================================================================== --- projects/clang390-import/contrib/bmake/bmake.1 (revision 305686) +++ projects/clang390-import/contrib/bmake/bmake.1 (revision 305687) @@ -1,2331 +1,2346 @@ -.\" $NetBSD: make.1,v 1.259 2016/06/03 07:07:37 wiz Exp $ +.\" $NetBSD: make.1,v 1.262 2016/08/18 19:23:20 wiz Exp $ .\" .\" Copyright (c) 1990, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" from: @(#)make.1 8.4 (Berkeley) 3/19/94 .\" -.Dd June 2, 2016 +.Dd August 15, 2016 .Dt MAKE 1 .Os .Sh NAME .Nm bmake .Nd maintain program dependencies .Sh SYNOPSIS .Nm .Op Fl BeikNnqrstWwX .Op Fl C Ar directory .Op Fl D Ar variable .Op Fl d Ar flags .Op Fl f Ar makefile .Op Fl I Ar directory .Op Fl J Ar private .Op Fl j Ar max_jobs .Op Fl m Ar directory .Op Fl T Ar file .Op Fl V Ar variable .Op Ar variable=value .Op Ar target ... .Sh DESCRIPTION .Nm is a program designed to simplify the maintenance of other programs. Its input is a list of specifications as to the files upon which programs and other files depend. If no .Fl f Ar makefile makefile option is given, .Nm will try to open .Ql Pa makefile then .Ql Pa Makefile in order to find the specifications. If the file .Ql Pa .depend exists, it is read (see .Xr mkdep 1 ) . .Pp This manual page is intended as a reference document only. For a more thorough description of .Nm and makefiles, please refer to .%T "PMake \- A Tutorial" . .Pp .Nm will prepend the contents of the .Va MAKEFLAGS environment variable to the command line arguments before parsing them. .Pp The options are as follows: .Bl -tag -width Ds .It Fl B Try to be backwards compatible by executing a single shell per command and by executing the commands to make the sources of a dependency line in sequence. .It Fl C Ar directory Change to .Ar directory before reading the makefiles or doing anything else. If multiple .Fl C options are specified, each is interpreted relative to the previous one: .Fl C Pa / Fl C Pa etc is equivalent to .Fl C Pa /etc . .It Fl D Ar variable Define .Ar variable to be 1, in the global context. .It Fl d Ar [-]flags Turn on debugging, and specify which portions of .Nm are to print debugging information. Unless the flags are preceded by .Ql \- they are added to the .Va MAKEFLAGS environment variable and will be processed by any child make processes. By default, debugging information is printed to standard error, but this can be changed using the .Ar F debugging flag. The debugging output is always unbuffered; in addition, if debugging is enabled but debugging output is not directed to standard output, then the standard output is line buffered. .Ar Flags is one or more of the following: .Bl -tag -width Ds .It Ar A Print all possible debugging information; equivalent to specifying all of the debugging flags. .It Ar a Print debugging information about archive searching and caching. .It Ar C Print debugging information about current working directory. .It Ar c Print debugging information about conditional evaluation. .It Ar d Print debugging information about directory searching and caching. .It Ar e Print debugging information about failed commands and targets. .It Ar F Ns Oo Sy \&+ Oc Ns Ar filename Specify where debugging output is written. This must be the last flag, because it consumes the remainder of the argument. If the character immediately after the .Ql F flag is .Ql \&+ , then the file will be opened in append mode; otherwise the file will be overwritten. If the file name is .Ql stdout or .Ql stderr then debugging output will be written to the standard output or standard error output file descriptors respectively (and the .Ql \&+ option has no effect). Otherwise, the output will be written to the named file. If the file name ends .Ql .%d then the .Ql %d is replaced by the pid. .It Ar f Print debugging information about loop evaluation. .It Ar "g1" Print the input graph before making anything. 
.It Ar "g2" Print the input graph after making everything, or before exiting on error. .It Ar "g3" Print the input graph before exiting on error. .It Ar j Print debugging information about running multiple shells. .It Ar l Print commands in Makefiles regardless of whether or not they are prefixed by .Ql @ or other "quiet" flags. Also known as "loud" behavior. .It Ar M Print debugging information about "meta" mode decisions about targets. .It Ar m Print debugging information about making targets, including modification dates. .It Ar n Don't delete the temporary command scripts created when running commands. These temporary scripts are created in the directory referred to by the .Ev TMPDIR environment variable, or in .Pa /tmp if .Ev TMPDIR is unset or set to the empty string. The temporary scripts are created by .Xr mkstemp 3 , and have names of the form .Pa makeXXXXXX . .Em NOTE : This can create many files in .Ev TMPDIR or .Pa /tmp , so use with care. .It Ar p Print debugging information about makefile parsing. .It Ar s Print debugging information about suffix-transformation rules. .It Ar t Print debugging information about target list maintenance. .It Ar V Force the .Fl V option to print raw values of variables. .It Ar v Print debugging information about variable assignment. .It Ar x Run shell commands with .Fl x so the actual commands are printed as they are executed. .El .It Fl e Specify that environment variables override macro assignments within makefiles. .It Fl f Ar makefile Specify a makefile to read instead of the default .Ql Pa makefile . If .Ar makefile is .Ql Fl , standard input is read. Multiple makefiles may be specified, and are read in the order specified. .It Fl I Ar directory Specify a directory in which to search for makefiles and included makefiles. The system makefile directory (or directories, see the .Fl m option) is automatically included as part of this list. .It Fl i Ignore non-zero exit of shell commands in the makefile. Equivalent to specifying .Ql Fl before each command line in the makefile. .It Fl J Ar private This option should .Em not be specified by the user. .Pp When the .Ar j option is in use in a recursive build, this option is passed by a make to child makes to allow all the make processes in the build to cooperate to avoid overloading the system. .It Fl j Ar max_jobs Specify the maximum number of jobs that .Nm may have running at any one time. The value is saved in .Va .MAKE.JOBS . Turns compatibility mode off, unless the .Ar B flag is also specified. When compatibility mode is off, all commands associated with a target are executed in a single shell invocation as opposed to the traditional one shell invocation per line. This can break traditional scripts which change directories on each command invocation and then expect to start with a fresh environment on the next line. It is more efficient to correct the scripts rather than turn backwards compatibility on. .It Fl k Continue processing after errors are encountered, but only on those targets that do not depend on the target whose creation caused the error. .It Fl m Ar directory Specify a directory in which to search for sys.mk and makefiles included via the .Ao Ar file Ac Ns -style include statement. The .Fl m option can be used multiple times to form a search path. This path will override the default system include path: /usr/share/mk. Furthermore the system include path will be appended to the search path used for .Qo Ar file Qc Ns -style include statements (see the .Fl I option). 
.Pp If a file or directory name in the .Fl m argument (or the .Ev MAKESYSPATH environment variable) starts with the string .Qq \&.../ then .Nm will search for the specified file or directory named in the remaining part of the argument string. The search starts with the current directory of the Makefile and then works upward towards the root of the file system. If the search is successful, then the resulting directory replaces the .Qq \&.../ specification in the .Fl m argument. If used, this feature allows .Nm to easily search in the current source tree for customized sys.mk files (e.g., by using .Qq \&.../mk/sys.mk as an argument). .It Fl n Display the commands that would have been executed, but do not actually execute them unless the target depends on the .MAKE special source (see below). .It Fl N Display the commands which would have been executed, but do not actually execute any of them; useful for debugging top-level makefiles without descending into subdirectories. .It Fl q Do not execute any commands, but exit 0 if the specified targets are up-to-date and 1 otherwise. .It Fl r Do not use the built-in rules specified in the system makefile. .It Fl s Do not echo any commands as they are executed. Equivalent to specifying .Ql Ic @ before each command line in the makefile. .It Fl T Ar tracefile When used with the .Fl j flag, append a trace record to .Ar tracefile for each job started and completed. .It Fl t Rather than re-building a target as specified in the makefile, create it or update its modification time to make it appear up-to-date. .It Fl V Ar variable Print .Nm Ns 's idea of the value of .Ar variable , in the global context. Do not build any targets. Multiple instances of this option may be specified; the variables will be printed one per line, with a blank line for each null or undefined variable. If .Ar variable contains a .Ql \&$ then the value will be expanded before printing. .It Fl W Treat any warnings during makefile parsing as errors. .It Fl w Print entering and leaving directory messages, pre- and post-processing. .It Fl X Don't export variables passed on the command line to the environment individually. Variables passed on the command line are still exported via the .Va MAKEFLAGS environment variable. This option may be useful on systems which have a small limit on the size of command arguments. .It Ar variable=value Set the value of the variable .Ar variable to .Ar value . Normally, all values passed on the command line are also exported to sub-makes in the environment. The .Fl X flag disables this behavior. Variable assignments should follow options for POSIX compatibility but no ordering is enforced. .El .Pp There are seven different types of lines in a makefile: file dependency specifications, shell commands, variable assignments, include statements, conditional directives, for loops, and comments. .Pp In general, lines may be continued from one line to the next by ending them with a backslash .Pq Ql \e . The trailing newline character and initial whitespace on the following line are compressed into a single space. .Sh FILE DEPENDENCY SPECIFICATIONS Dependency lines consist of one or more targets, an operator, and zero or more sources. This creates a relationship where the targets .Dq depend on the sources and are usually created from them. The exact relationship between the target and the source is determined by the operator that separates them.
The three operators are as follows: .Bl -tag -width flag .It Ic \&: A target is considered out-of-date if its modification time is less than those of any of its sources. Sources for a target accumulate over dependency lines when this operator is used. The target is removed if .Nm is interrupted. .It Ic \&! Targets are always re-created, but not until all sources have been examined and re-created as necessary. Sources for a target accumulate over dependency lines when this operator is used. The target is removed if .Nm is interrupted. .It Ic \&:: If no sources are specified, the target is always re-created. Otherwise, a target is considered out-of-date if any of its sources has been modified more recently than the target. Sources for a target do not accumulate over dependency lines when this operator is used. The target will not be removed if .Nm is interrupted. .El .Pp Targets and sources may contain the shell wildcard values .Ql \&? , .Ql * , .Ql [] , and .Ql {} . The values .Ql \&? , .Ql * , and .Ql [] may only be used as part of the final component of the target or source, and must be used to describe existing files. The value .Ql {} need not necessarily be used to describe existing files. Expansion is in directory order, not alphabetically as done in the shell. .Sh SHELL COMMANDS Each target may have associated with it one or more lines of shell commands, normally used to create the target. Each of the lines in this script .Em must be preceded by a tab. (For historical reasons, spaces are not accepted.) While targets can appear in many dependency lines if desired, by default only one of these rules may be followed by a creation script. If the .Ql Ic \&:: operator is used, however, all rules may include scripts and the scripts are executed in the order found. .Pp Each line is treated as a separate shell command, unless the end of line is escaped with a backslash .Pq Ql \e in which case that line and the next are combined. .\" The escaped newline is retained and passed to the shell, which .\" normally ignores it. .\" However, the tab at the beginning of the following line is removed. If the first characters of the command are any combination of .Ql Ic @ , .Ql Ic + , or .Ql Ic \- , the command is treated specially. A .Ql Ic @ causes the command not to be echoed before it is executed. A .Ql Ic + causes the command to be executed even when .Fl n is given. This is similar to the effect of the .MAKE special source, except that the effect can be limited to a single line of a script. A .Ql Ic \- in compatibility mode causes any non-zero exit status of the command line to be ignored. .Pp When .Nm is run in jobs mode with .Fl j Ar max_jobs , the entire script for the target is fed to a single instance of the shell. In compatibility (non-jobs) mode, each command is run in a separate process. If the command contains any shell meta characters .Pq Ql #=|^(){};&<>*?[]:$`\e\en it will be passed to the shell; otherwise .Nm will attempt direct execution. If a line starts with .Ql Ic \- and the shell has ErrCtl enabled then failure of the command line will be ignored as in compatibility mode. Otherwise .Ql Ic \- affects the entire job; the script will stop at the first command line that fails, but the target will not be deemed to have failed. .Pp Makefiles should be written so that the mode of .Nm operation does not change their behavior. 
For example, any command which needs to use .Dq cd or .Dq chdir without potentially changing the directory for subsequent commands should be put in parentheses so it executes in a subshell. To force the use of one shell, escape the line breaks so as to make the whole script one command. For example: .Bd -literal -offset indent avoid-chdir-side-effects: @echo Building $@ in `pwd` @(cd ${.CURDIR} && ${MAKE} $@) @echo Back in `pwd` ensure-one-shell-regardless-of-mode: @echo Building $@ in `pwd`; \e (cd ${.CURDIR} && ${MAKE} $@); \e echo Back in `pwd` .Ed .Pp Since .Nm will .Xr chdir 2 to .Ql Va .OBJDIR before executing any targets, each child process starts with that as its current working directory. .Sh VARIABLE ASSIGNMENTS Variables in make are much like variables in the shell, and, by tradition, consist of all upper-case letters. .Ss Variable assignment modifiers The five operators that can be used to assign values to variables are as follows: .Bl -tag -width Ds .It Ic \&= Assign the value to the variable. Any previous value is overridden. .It Ic \&+= Append the value to the current value of the variable. .It Ic \&?= Assign the value to the variable if it is not already defined. .It Ic \&:= Assign with expansion, i.e., expand the value before assigning it to the variable. Normally, expansion is not done until the variable is referenced. .Em NOTE : References to undefined variables are .Em not expanded. This can cause problems when variable modifiers are used. .It Ic \&!= Expand the value and pass it to the shell for execution and assign the result to the variable. Any newlines in the result are replaced with spaces. .El .Pp Any white-space before the assigned .Ar value is removed; if the value is being appended, a single space is inserted between the previous contents of the variable and the appended value. .Pp Variables are expanded by surrounding the variable name with either curly braces .Pq Ql {} or parentheses .Pq Ql () and preceding it with a dollar sign .Pq Ql \&$ . If the variable name contains only a single letter, the surrounding braces or parentheses are not required. This shorter form is not recommended. .Pp If the variable name contains a dollar, then the name itself is expanded first. This allows almost arbitrary variable names; however, names containing dollar, braces, parentheses, or whitespace are really best avoided! .Pp If the result of expanding a variable contains a dollar sign .Pq Ql \&$ the string is expanded again. .Pp Variable substitution occurs at three distinct times, depending on where the variable is being used. .Bl -enum .It Variables in dependency lines are expanded as the line is read. .It Variables in shell commands are expanded when the shell command is executed. .It .Dq .for loop index variables are expanded on each loop iteration. Note that other variables are not expanded inside loops, so the following example code: .Bd -literal -offset indent .Dv .for i in 1 2 3 a+= ${i} j= ${i} b+= ${j} .Dv .endfor all: @echo ${a} @echo ${b} .Ed will print: .Bd -literal -offset indent 1 2 3 3 3 3 .Ed Because while ${a} contains .Dq 1 2 3 after the loop is executed, ${b} contains .Dq ${j} ${j} ${j} which expands to .Dq 3 3 3 since after the loop completes ${j} contains .Dq 3 . .El .Ss Variable classes The four different classes of variables (in order of increasing precedence) are: .Bl -tag -width Ds .It Environment variables Variables defined as part of .Nm Ns 's environment. .It Global variables Variables defined in the makefile or in included makefiles.
.It Command line variables Variables defined as part of the command line. .It Local variables Variables that are defined specific to a certain target. .El .Pp Local variables are all built in and their values vary magically from target to target. It is not currently possible to define new local variables. The seven local variables are as follows: .Bl -tag -width ".ARCHIVE" -offset indent .It Va .ALLSRC The list of all sources for this target; also known as .Ql Va \&\*[Gt] . .It Va .ARCHIVE The name of the archive file; also known as .Ql Va \&! . .It Va .IMPSRC In suffix-transformation rules, the name/path of the source from which the target is to be transformed (the .Dq implied source); also known as .Ql Va \&\*[Lt] . It is not defined in explicit rules. .It Va .MEMBER The name of the archive member; also known as .Ql Va % . .It Va .OODATE The list of sources for this target that were deemed out-of-date; also known as .Ql Va \&? . .It Va .PREFIX The file prefix of the target, containing only the file portion, no suffix or preceding directory components; also known as .Ql Va * . The suffix must be one of the known suffixes declared with .Ic .SUFFIXES or it will not be recognized. .It Va .TARGET The name of the target; also known as .Ql Va @ . For compatibility with other makes this is an alias for .Ic .ARCHIVE in archive member rules. .El .Pp The shorter forms .Ql ( Va \*[Gt] , .Ql Va \&! , .Ql Va \*[Lt] , .Ql Va % , .Ql Va \&? , .Ql Va * , and .Ql Va @ ) are permitted for backward compatibility with historical makefiles and legacy POSIX make and are not recommended. .Pp Variants of these variables with the punctuation followed immediately by .Ql D or .Ql F , e.g. .Ql Va $(@D) , are legacy forms equivalent to using the .Ql :H and .Ql :T modifiers. These forms are accepted for compatibility with .At V makefiles and POSIX but are not recommended. .Pp Four of the local variables may be used in sources on dependency lines because they expand to the proper value for each target on the line. These variables are .Ql Va .TARGET , .Ql Va .PREFIX , .Ql Va .ARCHIVE , and .Ql Va .MEMBER . .Ss Additional built-in variables In addition, .Nm sets or knows about the following variables: .Bl -tag -width .MAKEOVERRIDES .It Va \&$ A single dollar sign .Ql \&$ , i.e. .Ql \&$$ expands to a single dollar sign. .It Va .ALLTARGETS The list of all targets encountered in the Makefile. If evaluated during Makefile parsing, lists only those targets encountered thus far. .It Va .CURDIR A path to the directory where .Nm was executed. Refer to the description of .Ql Ev PWD for more details. .It Va .INCLUDEDFROMDIR The directory of the file this Makefile was included from. .It Va .INCLUDEDFROMFILE The filename of the file this Makefile was included from. .It Ev MAKE The name that .Nm was executed with .Pq Va argv[0] . For compatibility .Nm also sets .Va .MAKE with the same value. The preferred variable to use is the environment variable .Ev MAKE because it is more compatible with other versions of .Nm and cannot be confused with the special target with the same name. .It Va .MAKE.DEPENDFILE Names the makefile (default .Ql Pa .depend ) from which generated dependencies are read. .It Va .MAKE.EXPAND_VARIABLES A boolean that controls the default behavior of the .Fl V option. .It Va .MAKE.EXPORTED The list of variables exported by .Nm . .It Va .MAKE.JOBS The argument to the .Fl j option. 
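.Pp
For example, a makefile can consume this value and fall back to 1 when
.Fl j
was not given (a minimal sketch; the
.Ql JOBS
name is illustrative and the
.Ql Cm \&:U
modifier is described below):
.Bd -literal -offset indent
# defaults to 1 unless make was invoked with -j
JOBS= ${.MAKE.JOBS:U1}
.Ed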
.It Va .MAKE.JOB.PREFIX If .Nm is run with .Ar j then output for each target is prefixed with a token .Ql --- target --- the first part of which can be controlled via .Va .MAKE.JOB.PREFIX . If .Va .MAKE.JOB.PREFIX is empty, no token is printed. .br For example: .Li .MAKE.JOB.PREFIX=${.newline}---${.MAKE:T}[${.MAKE.PID}] would produce tokens like .Ql ---make[1234] target --- making it easier to track the degree of parallelism being achieved. .It Ev MAKEFLAGS The environment variable .Ql Ev MAKEFLAGS may contain anything that may be specified on .Nm Ns 's command line. Anything specified on .Nm Ns 's command line is appended to the .Ql Ev MAKEFLAGS variable which is then entered into the environment for all programs which .Nm executes. .It Va .MAKE.LEVEL The recursion depth of .Nm . The initial instance of .Nm will be 0, and an incremented value is put into the environment to be seen by the next generation. This allows tests like: .Li .if ${.MAKE.LEVEL} == 0 to protect things which should only be evaluated in the initial instance of .Nm . .It Va .MAKE.MAKEFILE_PREFERENCE The ordered list of makefile names (default .Ql Pa makefile , .Ql Pa Makefile ) that .Nm will look for. .It Va .MAKE.MAKEFILES The list of makefiles read by .Nm , which is useful for tracking dependencies. Each makefile is recorded only once, regardless of the number of times read. .It Va .MAKE.MODE Processed after reading all makefiles. Can affect the mode that .Nm runs in. It can contain a number of keywords: .Bl -hang -width missing-filemon=bf. .It Pa compat Like .Fl B , puts .Nm into "compat" mode. .It Pa meta Puts .Nm into "meta" mode, where meta files are created for each target to capture the command run, the output generated, and, if .Xr filemon 4 is available, the system calls which are of interest to .Nm . The captured output can be very useful when diagnosing errors. .It Pa curdirOk= Ar bf Normally .Nm will not create .meta files in .Ql Va .CURDIR . This can be overridden by setting .Va bf to a value which represents True. .It Pa missing-meta= Ar bf If .Va bf is True, then a missing .meta file makes the target out-of-date. .It Pa missing-filemon= Ar bf If .Va bf is True, then missing filemon data makes the target out-of-date. .It Pa nofilemon Do not use .Xr filemon 4 . .It Pa env For debugging, it can be useful to include the environment in the .meta file. .It Pa verbose If in "meta" mode, print a clue about the target being built. This is useful if the build is otherwise running silently. The message printed is the value of .Va .MAKE.META.PREFIX . .It Pa ignore-cmd Some makefiles have commands which are simply not stable. This keyword causes them to be ignored for determining whether a target is out of date in "meta" mode. See also .Ic .NOMETA_CMP . .It Pa silent= Ar bf If .Va bf is True, when a .meta file is created, mark the target .Ic .SILENT . .El .It Va .MAKE.META.BAILIWICK In "meta" mode, provides a list of prefixes which match the directories controlled by .Nm . If a file that was generated outside of .Va .OBJDIR but within said bailiwick is missing, the current target is considered out-of-date. .It Va .MAKE.META.CREATED In "meta" mode, this variable contains a list of all the meta files updated. If not empty, it can be used to trigger processing of .Va .MAKE.META.FILES . .It Va .MAKE.META.FILES In "meta" mode, this variable contains a list of all the meta files used (updated or not). This list can be used to process the meta files to extract dependency information.
.It Va .MAKE.META.IGNORE_PATHS Provides a list of path prefixes that should be ignored, because the contents are expected to change over time. The default list includes: .Ql Pa /dev /etc /proc /tmp /var/run /var/tmp .It Va .MAKE.META.IGNORE_PATTERNS Provides a list of patterns to match against pathnames. Ignore any that match. +.It Va .MAKE.META.IGNORE_FILTER +Provides a list of variable modifiers to apply to each pathname. +Ignore if the expansion is an empty string. .It Va .MAKE.META.PREFIX Defines the message printed for each meta file updated in "meta verbose" mode. The default value is: .Dl Building ${.TARGET:H:tA}/${.TARGET:T} .It Va .MAKEOVERRIDES This variable is used to record the names of variables assigned to on the command line, so that they may be exported as part of .Ql Ev MAKEFLAGS . This behavior can be disabled by assigning an empty value to .Ql Va .MAKEOVERRIDES within a makefile. Extra variables can be exported from a makefile by appending their names to .Ql Va .MAKEOVERRIDES . .Ql Ev MAKEFLAGS is re-exported whenever .Ql Va .MAKEOVERRIDES is modified. .It Va .MAKE.PATH_FILEMON If .Nm was built with .Xr filemon 4 support, this is set to the path of the device node. This allows makefiles to test for this support. .It Va .MAKE.PID The process-id of .Nm . .It Va .MAKE.PPID The parent process-id of .Nm . .It Va .MAKE.SAVE_DOLLARS value should be a boolean that controls whether .Ql $$ are preserved when doing .Ql := assignments. The default is false, for backwards compatibility. Set to true for compatibility with other makes. If set to false, .Ql $$ becomes .Ql $ per normal evaluation rules. .It Va MAKE_PRINT_VAR_ON_ERROR When .Nm -stops due to an error, it prints its name and the value of +stops due to an error, it sets +.Ql Va .ERROR_TARGET +to the name of the target that failed, +.Ql Va .ERROR_CMD +to the commands of the failed target, +and in "meta" mode, it also sets +.Ql Va .ERROR_CWD +to the +.Xr getcwd 3 , +and +.Ql Va .ERROR_META_FILE +to the path of the meta file (if any) describing the failed target. +It then prints its name and the value of .Ql Va .CURDIR as well as the value of any variables named in .Ql Va MAKE_PRINT_VAR_ON_ERROR . .It Va .newline This variable is simply assigned a newline character as its value. This allows expansions using the .Cm \&:@ modifier to put a newline between iterations of the loop rather than a space. For example, the printing of .Ql Va MAKE_PRINT_VAR_ON_ERROR could be done as ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@}. .It Va .OBJDIR A path to the directory where the targets are built. Its value is determined by trying to .Xr chdir 2 to the following directories in order and using the first match: .Bl -enum .It .Ev ${MAKEOBJDIRPREFIX}${.CURDIR} .Pp (Only if .Ql Ev MAKEOBJDIRPREFIX is set in the environment or on the command line.) .It .Ev ${MAKEOBJDIR} .Pp (Only if .Ql Ev MAKEOBJDIR is set in the environment or on the command line.) .It .Ev ${.CURDIR} Ns Pa /obj. Ns Ev ${MACHINE} .It .Ev ${.CURDIR} Ns Pa /obj .It .Pa /usr/obj/ Ns Ev ${.CURDIR} .It .Ev ${.CURDIR} .El .Pp Variable expansion is performed on the value before it's used, so expressions such as .Dl ${.CURDIR:S,^/usr/src,/var/obj,} may be used. This is especially useful with .Ql Ev MAKEOBJDIR . .Pp .Ql Va .OBJDIR may be modified in the makefile via the special target .Ql Ic .OBJDIR . In all cases, .Nm will .Xr chdir 2 to the specified directory if it exists, and set .Ql Va .OBJDIR and .Ql Ev PWD to that directory before executing any targets. .
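.Pp
For example (an illustrative sketch; the paths are hypothetical), with
.Ql Ev MAKEOBJDIRPREFIX
set to
.Pa /usr/obj
and
.Ql Va .CURDIR
being
.Pa /usr/src/bin/ls ,
.Nm
will use
.Pa /usr/obj/usr/src/bin/ls
as
.Ql Va .OBJDIR
if that directory exists.
A trivial target can confirm the directory in use:
.Bd -literal -offset indent
show-objdir:
	@echo ${.OBJDIR}
.Ed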
.It Va .PARSEDIR A path to the directory of the current .Ql Pa Makefile being parsed. .It Va .PARSEFILE The basename of the current .Ql Pa Makefile being parsed. This variable and .Ql Va .PARSEDIR are both set only while the .Ql Pa Makefiles are being parsed. If you want to retain their current values, assign them to a variable using assignment with expansion: .Pq Ql Cm \&:= . .It Va .PATH A variable that represents the list of directories that .Nm will search for files. The search list should be updated using the target .Ql Va .PATH rather than the variable. .It Ev PWD Alternate path to the current directory. .Nm normally sets .Ql Va .CURDIR to the canonical path given by .Xr getcwd 3 . However, if the environment variable .Ql Ev PWD is set and gives a path to the current directory, then .Nm sets .Ql Va .CURDIR to the value of .Ql Ev PWD instead. This behavior is disabled if .Ql Ev MAKEOBJDIRPREFIX is set or .Ql Ev MAKEOBJDIR contains a variable transform. .Ql Ev PWD is set to the value of .Ql Va .OBJDIR for all programs which .Nm executes. .It Ev .TARGETS The list of targets explicitly specified on the command line, if any. .It Ev VPATH Colon-separated .Pq Dq \&: lists of directories that .Nm will search for files. The variable is supported for compatibility with old make programs only; use .Ql Va .PATH instead. .El .Ss Variable modifiers Variable expansion may be modified to select or modify each word of the variable (where a .Dq word is a white-space delimited sequence of characters). The general format of a variable expansion is as follows: .Pp .Dl ${variable[:modifier[:...]]} .Pp Each modifier begins with a colon, which may be escaped with a backslash .Pq Ql \e . .Pp A set of modifiers can be specified via a variable, as follows: .Pp .Dl modifier_variable=modifier[:...] .Dl ${variable:${modifier_variable}[:...]} .Pp In this case the first modifier in the modifier_variable does not start with a colon, since that must appear in the referencing variable. If any of the modifiers in the modifier_variable contain a dollar sign .Pq Ql $ , these must be doubled to avoid early expansion. .Pp The supported modifiers are: .Bl -tag -width EEE .It Cm \&:E Replaces each word in the variable with its suffix. .It Cm \&:H Replaces each word in the variable with everything but the last component. .It Cm \&:M Ns Ar pattern Select only those words that match .Ar pattern . The standard shell wildcard characters .Pf ( Ql * , .Ql \&? , and .Ql Oo Oc ) may be used. The wildcard characters may be escaped with a backslash .Pq Ql \e . As a consequence of the way values are split into words, matched, and then joined, a construct like .Dl ${VAR:M*} will normalize the inter-word spacing, removing all leading and trailing space, and converting multiple consecutive spaces to single spaces. . .It Cm \&:N Ns Ar pattern This is identical to .Ql Cm \&:M , but selects all words which do not match .Ar pattern . .It Cm \&:O Order every word in variable alphabetically. To sort words in reverse order use the .Ql Cm \&:O:[-1..1] combination of modifiers. .It Cm \&:Ox Randomize words in variable. The results will be different each time you are referring to the modified variable; use the assignment with expansion .Pq Ql Cm \&:= to prevent such behavior.
For example, .Bd -literal -offset indent LIST= uno due tre quattro RANDOM_LIST= ${LIST:Ox} STATIC_RANDOM_LIST:= ${LIST:Ox} all: @echo "${RANDOM_LIST}" @echo "${RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" .Ed may produce output similar to: .Bd -literal -offset indent quattro due tre uno tre due quattro uno due uno quattro tre due uno quattro tre .Ed .It Cm \&:Q Quotes every shell meta-character in the variable, so that it can be passed safely through recursive invocations of .Nm . .It Cm \&:R Replaces each word in the variable with everything but its suffix. .It Cm \&:gmtime The value is a format string for .Xr strftime 3 , using the current .Xr gmtime 3 . .It Cm \&:hash Compute a 32-bit hash of the value and encode it as hex digits. .It Cm \&:localtime The value is a format string for .Xr strftime 3 , using the current .Xr localtime 3 . .It Cm \&:tA Attempt to convert variable to an absolute path using .Xr realpath 3 ; if that fails, the value is unchanged. .It Cm \&:tl Converts variable to lower-case letters. .It Cm \&:ts Ns Ar c Words in the variable are normally separated by a space on expansion. This modifier sets the separator to the character .Ar c . If .Ar c is omitted, then no separator is used. The common escapes (including octal numeric codes) work as expected. .It Cm \&:tu Converts variable to upper-case letters. .It Cm \&:tW Causes the value to be treated as a single word (possibly containing embedded white space). See also .Ql Cm \&:[*] . .It Cm \&:tw Causes the value to be treated as a sequence of words delimited by white space. See also .Ql Cm \&:[@] . .Sm off .It Cm \&:S No \&/ Ar old_string No \&/ Ar new_string No \&/ Op Cm 1gW .Sm on Modify the first occurrence of .Ar old_string in the variable's value, replacing it with .Ar new_string . If a .Ql g is appended to the last slash of the pattern, all occurrences in each word are replaced. If a .Ql 1 is appended to the last slash of the pattern, only the first word is affected. If a .Ql W is appended to the last slash of the pattern, then the value is treated as a single word (possibly containing embedded white space). If .Ar old_string begins with a caret .Pq Ql ^ , .Ar old_string is anchored at the beginning of each word. If .Ar old_string ends with a dollar sign .Pq Ql \&$ , it is anchored at the end of each word. Inside .Ar new_string , an ampersand .Pq Ql \*[Am] is replaced by .Ar old_string (without any .Ql ^ or .Ql \&$ ) . Any character may be used as a delimiter for the parts of the modifier string. The anchoring, ampersand and delimiter characters may be escaped with a backslash .Pq Ql \e . .Pp Variable expansion occurs in the normal fashion inside both .Ar old_string and .Ar new_string with the single exception that a backslash is used to prevent the expansion of a dollar sign .Pq Ql \&$ , not a preceding dollar sign as is usual. .Sm off .It Cm \&:C No \&/ Ar pattern No \&/ Ar replacement No \&/ Op Cm 1gW .Sm on The .Cm \&:C modifier is just like the .Cm \&:S modifier except that the old and new strings, instead of being simple strings, are an extended regular expression (see .Xr regex 3 ) string .Ar pattern and an .Xr ed 1 Ns \-style string .Ar replacement . Normally, the first occurrence of the pattern .Ar pattern in each word of the value is substituted with .Ar replacement .
The .Ql 1 modifier causes the substitution to apply to at most one word; the .Ql g modifier causes the substitution to apply to as many instances of the search pattern .Ar pattern as occur in the word or words it is found in; the .Ql W modifier causes the value to be treated as a single word (possibly containing embedded white space). Note that .Ql 1 and .Ql g are orthogonal; the former specifies whether multiple words are potentially affected, the latter whether multiple substitutions can potentially occur within each affected word. .Pp As for the .Cm \&:S modifier, the .Ar pattern and .Ar replacement are subjected to variable expansion before being parsed as regular expressions. .It Cm \&:T Replaces each word in the variable with its last component. .It Cm \&:u Remove adjacent duplicate words (like .Xr uniq 1 ) . .Sm off .It Cm \&:\&? Ar true_string Cm \&: Ar false_string .Sm on If the variable name (not its value), when parsed as a .if conditional expression, evaluates to true, return as its value the .Ar true_string , otherwise return the .Ar false_string . Since the variable name is used as the expression, \&:\&? must be the first modifier after the variable name itself - which will, of course, usually contain variable expansions. A common error is trying to use expressions like .Dl ${NUMBERS:M42:?match:no} which actually tests defined(NUMBERS); to determine if any words match "42" you need to use something like: .Dl ${"${NUMBERS:M42}" != \&"\&":?match:no} . .It Ar :old_string=new_string This is the .At V style variable substitution. It must be the last modifier specified. If .Ar old_string or .Ar new_string do not contain the pattern matching character .Ar % then it is assumed that they are anchored at the end of each word, so only suffixes or entire words may be replaced. Otherwise .Ar % is the substring of .Ar old_string to be replaced in .Ar new_string . .Pp Variable expansion occurs in the normal fashion inside both .Ar old_string and .Ar new_string with the single exception that a backslash is used to prevent the expansion of a dollar sign .Pq Ql \&$ , not a preceding dollar sign as is usual. .Sm off .It Cm \&:@ Ar temp Cm @ Ar string Cm @ .Sm on This is the loop expansion mechanism from the OSF Development Environment (ODE) make. Unlike .Cm \&.for loops, expansion occurs at the time of reference. Assign .Ar temp to each word in the variable and evaluate .Ar string . The ODE convention is that .Ar temp should start and end with a period. For example: .Dl ${LINKS:@.LINK.@${LN} ${TARGET} ${.LINK.}@} .Pp However a single character variable is often more readable: .Dl ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@} .It Cm \&:U Ns Ar newval If the variable is undefined .Ar newval is the value. If the variable is defined, the existing value is returned. This is another ODE make feature. It is handy for setting per-target CFLAGS for instance: .Dl ${_${.TARGET:T}_CFLAGS:U${DEF_CFLAGS}} If a value is only required if the variable is undefined, use: .Dl ${VAR:D:Unewval} .It Cm \&:D Ns Ar newval If the variable is defined .Ar newval is the value. .It Cm \&:L The name of the variable is the value. .It Cm \&:P The path of the node which has the same name as the variable is the value. If no such node exists or its path is null, then the name of the variable is used. In order for this modifier to work, the name (node) must at least have appeared on the rhs of a dependency. .Sm off .It Cm \&:\&! Ar cmd Cm \&! .Sm on The output of running .Ar cmd is the value.
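.Pp
For example (a minimal sketch;
.Xr date 1
is just an illustration, and the empty variable name before the modifier is assumed to be accepted as a placeholder, as with
.Ql Cm \&:U ) :
.Bd -literal -offset indent
# date(1) runs each time ${NOW} is referenced
NOW= ${:!date!}
.Ed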
.It Cm \&:sh If the variable is non-empty it is run as a command and the output becomes the new value. .It Cm \&::= Ns Ar str The variable is assigned the value .Ar str after substitution. This modifier and its variations are useful in obscure situations such as wanting to set a variable when shell commands are being parsed. These assignment modifiers always expand to nothing, so if they appear in a rule line by themselves they should be preceded with something to keep .Nm happy. .Pp The .Ql Cm \&:: helps avoid false matches with the .At V style .Cm \&:= modifier, and since substitution always occurs the .Cm \&::= form is vaguely appropriate. .It Cm \&::?= Ns Ar str As for .Cm \&::= but only if the variable does not already have a value. .It Cm \&::+= Ns Ar str Append .Ar str to the variable. .It Cm \&::!= Ns Ar cmd Assign the output of .Ar cmd to the variable. .It Cm \&:\&[ Ns Ar range Ns Cm \&] Selects one or more words from the value, or performs other operations related to the way in which the value is divided into words. .Pp Ordinarily, a value is treated as a sequence of words delimited by white space. Some modifiers suppress this behavior, causing a value to be treated as a single word (possibly containing embedded white space). An empty value, or a value that consists entirely of white-space, is treated as a single word. For the purposes of the .Ql Cm \&:[] modifier, the words are indexed both forwards using positive integers (where index 1 represents the first word), and backwards using negative integers (where index \-1 represents the last word). .Pp The .Ar range is subjected to variable expansion, and the expanded result is then interpreted as follows: .Bl -tag -width index .\" :[n] .It Ar index Selects a single word from the value. .\" :[start..end] .It Ar start Ns Cm \&.. Ns Ar end Selects all words from .Ar start to .Ar end , inclusive. For example, .Ql Cm \&:[2..-1] selects all words from the second word to the last word. If .Ar start is greater than .Ar end , then the words are output in reverse order. For example, .Ql Cm \&:[-1..1] selects all the words from last to first. .\" :[*] .It Cm \&* Causes subsequent modifiers to treat the value as a single word (possibly containing embedded white space). Analogous to the effect of \&"$*\&" in Bourne shell. .\" :[0] .It 0 Means the same as .Ql Cm \&:[*] . .\" :[@] .It Cm \&@ Causes subsequent modifiers to treat the value as a sequence of words delimited by white space. Analogous to the effect of \&"$@\&" in Bourne shell. .\" :[#] .It Cm \&# Returns the number of words in the value. .El \" :[range] .El .Sh INCLUDE STATEMENTS, CONDITIONALS AND FOR LOOPS Makefile inclusion, conditional structures and for loops reminiscent of the C programming language are provided in .Nm . All such structures are identified by a line beginning with a single dot .Pq Ql \&. character. Files are included with either .Cm \&.include Aq Ar file or .Cm \&.include Pf \*q Ar file Ns \*q . Variables between the angle brackets or double quotes are expanded to form the file name. If angle brackets are used, the included makefile is expected to be in the system makefile directory. If double quotes are used, the including makefile's directory and any directories specified using the .Fl I option are searched before the system makefile directory. For compatibility with other versions of .Nm .Ql include file ... is also accepted. .Pp If the include statement is written as .Cm .-include or as .Cm .sinclude then errors locating and/or opening include files are ignored.
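.Pp
For example (only
.Pa sys.mk
is a real system makefile here; the other names are illustrative):
.Bd -literal -offset indent
\&.include <sys.mk>
\&.include \*qdefs.mk\*q
\&.-include \*qlocal.mk\*q
.Ed
.Pp
The first is searched for in the system makefile directory, the second relative to the including makefile (and any
.Fl I
directories), and the third is skipped without error if it cannot be found.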
.Pp If the include statement is written as .Cm .dinclude not only are errors locating and/or opening include files ignored, but stale dependencies within the included file will be ignored just like .Va .MAKE.DEPENDFILE . .Pp Conditional expressions are also preceded by a single dot as the first character of a line. The possible conditionals are as follows: .Bl -tag -width Ds .It Ic .error Ar message The message is printed along with the name of the makefile and line number, then .Nm will exit. .It Ic .export Ar variable ... Export the specified global variable. If no variable list is provided, all globals are exported except for internal variables (those that start with .Ql \&. ) . This is not affected by the .Fl X flag, so should be used with caution. For compatibility with other .Nm programs .Ql export variable=value is also accepted. .Pp Appending a variable name to .Va .MAKE.EXPORTED is equivalent to exporting a variable. .It Ic .export-env Ar variable ... The same as .Ql .export , except that the variable is not appended to .Va .MAKE.EXPORTED . This allows exporting a value to the environment which is different from that used by .Nm internally. .It Ic .export-literal Ar variable ... The same as .Ql .export-env , except that variables in the value are not expanded. .It Ic .info Ar message The message is printed along with the name of the makefile and line number. .It Ic .undef Ar variable Un-define the specified global variable. Only global variables may be un-defined. .It Ic .unexport Ar variable ... The opposite of .Ql .export . The specified global .Va variable will be removed from .Va .MAKE.EXPORTED . If no variable list is provided, all globals are unexported, and .Va .MAKE.EXPORTED deleted. .It Ic .unexport-env Unexport all globals previously exported and clear the environment inherited from the parent. This operation will cause a memory leak of the original environment, so should be used sparingly. Testing for .Va .MAKE.LEVEL being 0 would make sense. Also note that any variables which originated in the parent environment should be explicitly preserved if desired. For example: .Bd -literal -offset indent .Li .if ${.MAKE.LEVEL} == 0 PATH := ${PATH} .Li .unexport-env .Li .export PATH .Li .endif .Pp .Ed Would result in an environment containing only .Ql Ev PATH , which is the minimal useful environment. Actually .Ql Ev .MAKE.LEVEL will also be pushed into the new environment. .It Ic .warning Ar message The message prefixed by .Ql Pa warning: is printed along with the name of the makefile and line number. .It Ic \&.if Oo \&! Oc Ns Ar expression Op Ar operator expression ... Test the value of an expression. .It Ic .ifdef Oo \&! Oc Ns Ar variable Op Ar operator variable ... Test the value of a variable. .It Ic .ifndef Oo \&! Oc Ns Ar variable Op Ar operator variable ... Test the value of a variable. .It Ic .ifmake Oo \&! Oc Ns Ar target Op Ar operator target ... Test the target being built. .It Ic .ifnmake Oo \&! Ns Oc Ar target Op Ar operator target ... Test the target being built. .It Ic .else Reverse the sense of the last conditional. .It Ic .elif Oo \&! Ns Oc Ar expression Op Ar operator expression ... A combination of .Ql Ic .else followed by .Ql Ic .if . .It Ic .elifdef Oo \&! Oc Ns Ar variable Op Ar operator variable ... A combination of .Ql Ic .else followed by .Ql Ic .ifdef . .It Ic .elifndef Oo \&! Oc Ns Ar variable Op Ar operator variable ... A combination of .Ql Ic .else followed by .Ql Ic .ifndef . .It Ic .elifmake Oo \&! Oc Ns Ar target Op Ar operator target ...
A combination of .Ql Ic .else followed by .Ql Ic .ifmake . .It Ic .elifnmake Oo \&! Oc Ns Ar target Op Ar operator target ... A combination of .Ql Ic .else followed by .Ql Ic .ifnmake . .It Ic .endif End the body of the conditional. .El .Pp The .Ar operator may be any one of the following: .Bl -tag -width "Cm XX" .It Cm \&|\&| Logical OR. .It Cm \&\*[Am]\*[Am] Logical .Tn AND ; of higher precedence than .Dq \&|\&| . .El .Pp As in C, .Nm will only evaluate a conditional as far as is necessary to determine its value. Parentheses may be used to change the order of evaluation. The boolean operator .Ql Ic \&! may be used to logically negate an entire conditional. It is of higher precedence than .Ql Ic \&\*[Am]\*[Am] . .Pp The value of .Ar expression may be any of the following: .Bl -tag -width defined .It Ic defined Takes a variable name as an argument and evaluates to true if the variable has been defined. .It Ic make Takes a target name as an argument and evaluates to true if the target was specified as part of .Nm Ns 's command line or was declared the default target (either implicitly or explicitly, see .Va .MAIN ) before the line containing the conditional. .It Ic empty Takes a variable, with possible modifiers, and evaluates to true if the expansion of the variable would result in an empty string. .It Ic exists Takes a file name as an argument and evaluates to true if the file exists. The file is searched for on the system search path (see .Va .PATH ) . .It Ic target Takes a target name as an argument and evaluates to true if the target has been defined. .It Ic commands Takes a target name as an argument and evaluates to true if the target has been defined and has commands associated with it. .El .Pp .Ar Expression may also be an arithmetic or string comparison. Variable expansion is performed on both sides of the comparison, after which the integral values are compared. A value is interpreted as hexadecimal if it is preceded by 0x, otherwise it is decimal; octal numbers are not supported. The standard C relational operators are all supported. If after variable expansion, either the left or right hand side of a .Ql Ic == or .Ql Ic "!=" operator is not an integral value, then string comparison is performed between the expanded variables. If no relational operator is given, it is assumed that the expanded variable is being compared against 0 or an empty string in the case of a string comparison. .Pp When .Nm is evaluating one of these conditional expressions, and it encounters a (white-space separated) word it doesn't recognize, either the .Dq make or .Dq defined expression is applied to it, depending on the form of the conditional. If the form is .Ql Ic .ifdef , .Ql Ic .ifndef , or .Ql Ic .if the .Dq defined expression is applied. Similarly, if the form is .Ql Ic .ifmake or .Ql Ic .ifnmake , the .Dq make expression is applied. .Pp If the conditional evaluates to true the parsing of the makefile continues as before. If it evaluates to false, the following lines are skipped. In both cases this continues until a .Ql Ic .else or .Ql Ic .endif is found. .Pp For loops are typically used to apply a set of rules to a list of files. The syntax of a for loop is: .Pp .Bl -tag -compact -width Ds .It Ic \&.for Ar variable Oo Ar variable ... Oc Ic in Ar expression .It Aq make-rules .It Ic \&.endfor .El .Pp After the for .Ic expression is evaluated, it is split into words. 
On each iteration of the loop, one word is taken and assigned to each .Ic variable , in order, and these .Ic variables are substituted into the .Ic make-rules inside the body of the for loop. The number of words must come out even; that is, if there are three iteration variables, the number of words provided must be a multiple of three. .Sh COMMENTS Comments begin with a hash .Pq Ql \&# character, anywhere but in a shell command line, and continue to the end of an unescaped new line. .Sh SPECIAL SOURCES (ATTRIBUTES) .Bl -tag -width .IGNOREx .It Ic .EXEC Target is never out of date, but always execute commands anyway. .It Ic .IGNORE Ignore any errors from the commands associated with this target, exactly as if they all were preceded by a dash .Pq Ql \- . .\" .It Ic .INVISIBLE .\" XXX .\" .It Ic .JOIN .\" XXX .It Ic .MADE Mark all sources of this target as being up-to-date. .It Ic .MAKE Execute the commands associated with this target even if the .Fl n or .Fl t options were specified. Normally used to mark recursive .Nm Ns s . .It Ic .META Create a meta file for the target, even if it is flagged as .Ic .PHONY , .Ic .MAKE , or .Ic .SPECIAL . Usage in conjunction with .Ic .MAKE is the most likely case. In "meta" mode, the target is out-of-date if the meta file is missing. .It Ic .NOMETA Do not create a meta file for the target. Meta files are also not created for .Ic .PHONY , .Ic .MAKE , or .Ic .SPECIAL targets. .It Ic .NOMETA_CMP Ignore differences in commands when deciding if target is out of date. This is useful if the command contains a value which always changes. If the number of commands change, though, the target will still be out of date. The same effect applies to any command line that uses the variable .Va .OODATE , which can be used for that purpose even when not otherwise needed or desired: .Bd -literal -offset indent skip-compare-for-some: @echo this will be compared @echo this will not ${.OODATE:M.NOMETA_CMP} @echo this will also be compared .Ed The .Cm \&:M pattern suppresses any expansion of the unwanted variable. .It Ic .NOPATH Do not search for the target in the directories specified by .Ic .PATH . .It Ic .NOTMAIN Normally .Nm selects the first target it encounters as the default target to be built if no target was specified. This source prevents this target from being selected. .It Ic .OPTIONAL If a target is marked with this attribute and .Nm can't figure out how to create it, it will ignore this fact and assume the file isn't needed or already exists. .It Ic .PHONY The target does not correspond to an actual file; it is always considered to be out of date, and will not be created with the .Fl t option. Suffix-transformation rules are not applied to .Ic .PHONY targets. .It Ic .PRECIOUS When .Nm is interrupted, it normally removes any partially made targets. This source prevents the target from being removed. .It Ic .RECURSIVE Synonym for .Ic .MAKE . .It Ic .SILENT Do not echo any of the commands associated with this target, exactly as if they all were preceded by an at sign .Pq Ql @ . .It Ic .USE Turn the target into .Nm Ns 's version of a macro. When the target is used as a source for another target, the other target acquires the commands, sources, and attributes (except for .Ic .USE ) of the source. If the target already has commands, the .Ic .USE target's commands are appended to them. .It Ic .USEBEFORE Exactly like .Ic .USE , but prepend the .Ic .USEBEFORE target commands to the target. 
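.Pp
For example (an illustrative sketch; the target and directory names
are hypothetical), a
.Ic .USE
target acts as a reusable command macro:
.Bd -literal -offset indent
MKDIR_RULE: .USE
	mkdir \-p ${.TARGET}

obj/docs obj/tests: MKDIR_RULE
.Ed
.Pp
Both
.Ql obj/docs
and
.Ql obj/tests
acquire the
.Ql mkdir
command, with
.Va .TARGET
expanding to each target's own name;
.Ic .USEBEFORE
behaves the same way except that the acquired commands are prepended.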
.It Ic .WAIT If .Ic .WAIT appears in a dependency line, the sources that precede it are made before the sources that succeed it in the line. Since the dependents of files are not made until the file itself could be made, this also stops the dependents being built unless they are needed for another branch of the dependency tree. So given: .Bd -literal x: a .WAIT b echo x a: echo a b: b1 echo b b1: echo b1 .Ed the output is always .Ql a , .Ql b1 , .Ql b , .Ql x . .br The ordering imposed by .Ic .WAIT is only relevant for parallel makes. .El .Sh SPECIAL TARGETS Special targets may not be included with other targets, i.e. they must be the only target specified. .Bl -tag -width .BEGINx .It Ic .BEGIN Any command lines attached to this target are executed before anything else is done. .It Ic .DEFAULT This is sort of a .Ic .USE rule for any target (that was used only as a source) that .Nm can't figure out any other way to create. Only the shell script is used. The .Ic .IMPSRC variable of a target that inherits .Ic .DEFAULT Ns 's commands is set to the target's own name. .It Ic .END Any command lines attached to this target are executed after everything else is done. .It Ic .ERROR Any command lines attached to this target are executed when another target fails. The .Ic .ERROR_TARGET variable is set to the target that failed. See also .Ic MAKE_PRINT_VAR_ON_ERROR . .It Ic .IGNORE Mark each of the sources with the .Ic .IGNORE attribute. If no sources are specified, this is the equivalent of specifying the .Fl i option. .It Ic .INTERRUPT If .Nm is interrupted, the commands for this target will be executed. .It Ic .MAIN If no target is specified when .Nm is invoked, this target will be built. .It Ic .MAKEFLAGS This target provides a way to specify flags for .Nm when the makefile is used. The flags are as if typed to the shell, though the .Fl f option will have no effect. .\" XXX: NOT YET!!!! .\" .It Ic .NOTPARALLEL .\" The named targets are executed in non parallel mode. .\" If no targets are .\" specified, then all targets are executed in non parallel mode. .It Ic .NOPATH Apply the .Ic .NOPATH attribute to any specified sources. .It Ic .NOTPARALLEL Disable parallel mode. .It Ic .NO_PARALLEL Synonym for .Ic .NOTPARALLEL , for compatibility with other pmake variants. .It Ic .OBJDIR The source is a new value for .Ql Va .OBJDIR . If it exists, .Nm will .Xr chdir 2 to it and update the value of .Ql Va .OBJDIR . .It Ic .ORDER The named targets are made in sequence. This ordering does not add targets to the list of targets to be made. Since the dependents of a target do not get built until the target itself could be built, unless .Ql a is built by another part of the dependency graph, the following is a dependency loop: .Bd -literal \&.ORDER: b a b: a .Ed .Pp The ordering imposed by .Ic .ORDER is only relevant for parallel makes. .\" XXX: NOT YET!!!! .\" .It Ic .PARALLEL .\" The named targets are executed in parallel mode. .\" If no targets are .\" specified, then all targets are executed in parallel mode. .It Ic .PATH The sources are directories which are to be searched for files not found in the current directory. If no sources are specified, any previously specified directories are deleted. If the source is the special .Ic .DOTLAST target, then the current working directory is searched last. .It Ic .PATH. Ns Va suffix Like .Ic .PATH but applies only to files with a particular suffix. The suffix must have been previously declared with .Ic .SUFFIXES . 
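.Pp
For example (a sketch; the directory name is illustrative), to have
only C sources searched for in a sibling directory:
.Bd -literal -offset indent
\&.SUFFIXES: .c .o
\&.PATH.c: ${.CURDIR}/../common
.Ed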
.It Ic .PHONY Apply the .Ic .PHONY attribute to any specified sources. .It Ic .PRECIOUS Apply the .Ic .PRECIOUS attribute to any specified sources. If no sources are specified, the .Ic .PRECIOUS attribute is applied to every target in the file. .It Ic .SHELL Sets the shell that .Nm will use to execute commands. The sources are a set of .Ar field=value pairs. .Bl -tag -width hasErrCtls .It Ar name This is the minimal specification, used to select one of the built-in shell specs; .Ar sh , .Ar ksh , and .Ar csh . .It Ar path Specifies the path to the shell. .It Ar hasErrCtl Indicates whether the shell supports exit on error. .It Ar check The command to turn on error checking. .It Ar ignore The command to disable error checking. .It Ar echo The command to turn on echoing of commands executed. .It Ar quiet The command to turn off echoing of commands executed. .It Ar filter The output to filter after issuing the .Ar quiet command. It is typically identical to .Ar quiet . .It Ar errFlag The flag to pass the shell to enable error checking. .It Ar echoFlag The flag to pass the shell to enable command echoing. .It Ar newline The string literal to pass the shell that results in a single newline character when used outside of any quoting characters. .El Example: .Bd -literal \&.SHELL: name=ksh path=/bin/ksh hasErrCtl=true \e check="set \-e" ignore="set +e" \e echo="set \-v" quiet="set +v" filter="set +v" \e echoFlag=v errFlag=e newline="'\en'" .Ed .It Ic .SILENT Apply the .Ic .SILENT attribute to any specified sources. If no sources are specified, the .Ic .SILENT attribute is applied to every command in the file. .It Ic .STALE This target gets run when a dependency file contains stale entries, having .Va .ALLSRC set to the name of that dependency file. .It Ic .SUFFIXES Each source specifies a suffix to .Nm . If no sources are specified, any previously specified suffixes are deleted. It allows the creation of suffix-transformation rules. .Pp Example: .Bd -literal \&.SUFFIXES: .o \&.c.o: cc \-o ${.TARGET} \-c ${.IMPSRC} .Ed .El .Sh ENVIRONMENT .Nm uses the following environment variables, if they exist: .Ev MACHINE , .Ev MACHINE_ARCH , .Ev MAKE , .Ev MAKEFLAGS , .Ev MAKEOBJDIR , .Ev MAKEOBJDIRPREFIX , .Ev MAKESYSPATH , .Ev PWD , and .Ev TMPDIR . .Pp .Ev MAKEOBJDIRPREFIX and .Ev MAKEOBJDIR may only be set in the environment or on the command line to .Nm and not as makefile variables; see the description of .Ql Va .OBJDIR for more details. .Sh FILES .Bl -tag -width /usr/share/mk -compact .It .depend list of dependencies .It Makefile list of dependencies .It makefile list of dependencies .It sys.mk system makefile .It /usr/share/mk system makefile directory .El .Sh COMPATIBILITY The basic make syntax is compatible between different versions of make; however the special variables, variable modifiers and conditionals are not. .Ss Older versions An incomplete list of changes in older versions of .Nm : .Pp The way that .for loop variables are substituted changed after NetBSD 5.0 so that they still appear to be variable expansions. In particular this stops them being treated as syntax, and removes some obscure problems using them in .if statements. .Pp The way that parallel makes are scheduled changed in NetBSD 4.0 so that .ORDER and .WAIT apply recursively to the dependent nodes. The algorithms used may change again in the future. .Ss Other make dialects Other make dialects (GNU make, SVR4 make, POSIX make, etc.) do not support most of the features of .Nm as described in this manual. 
Most notably: .Bl -bullet -offset indent .It The .Ic .WAIT and .Ic .ORDER declarations and most functionality pertaining to parallelization. (GNU make supports parallelization but lacks these features needed to control it effectively.) .It Directives, including for loops and conditionals and most of the forms of include files. (GNU make has its own incompatible and less powerful syntax for conditionals.) .It All built-in variables that begin with a dot. .It Most of the special sources and targets that begin with a dot, with the notable exception of .Ic .PHONY , .Ic .PRECIOUS , and .Ic .SUFFIXES . .It Variable modifiers, except for the .Dl :old=new string substitution, which does not portably support globbing with .Ql % and historically only works on declared suffixes. .It The .Ic $> variable even in its short form; most makes support this functionality but its name varies. .El .Pp Some features are somewhat more portable, such as assignment with .Ic += , .Ic ?= , and .Ic != . The .Ic .PATH functionality is based on an older feature .Ic VPATH found in GNU make and many versions of SVR4 make; however, historically its behavior is too ill-defined (and too buggy) to rely upon. .Pp The .Ic $@ and .Ic $< variables are more or less universally portable, as is the .Ic $(MAKE) variable. Basic use of suffix rules (for files only in the current directory, not trying to chain transformations together, etc.) is also reasonably portable. .Sh SEE ALSO .Xr mkdep 1 .Sh HISTORY .Nm is derived from NetBSD .Xr make 1 . It uses autoconf to facilitate portability to other platforms. .Pp A make command appeared in .At v7 . This make implementation is based on Adam De Boor's pmake program which was written for Sprite at Berkeley. It was designed to be a parallel distributed make running jobs on different machines using a daemon called .Dq customs . .Pp Historically the target/dependency .Dq FRC has been used to FoRCe rebuilding (since the target/dependency does not exist... unless someone creates an .Dq FRC file). .Sh BUGS The make syntax is difficult to parse without actually acting on the data. For instance, finding the end of a variable use should involve scanning each of the modifiers, using the correct terminator for each field. In many places make just counts {} and () in order to find the end of a variable expansion. .Pp There is no way of escaping a space character in a filename. Index: projects/clang390-import/contrib/bmake/bmake.cat1 =================================================================== --- projects/clang390-import/contrib/bmake/bmake.cat1 (revision 305686) +++ projects/clang390-import/contrib/bmake/bmake.cat1 (revision 305687) @@ -1,1492 +1,1501 @@ MAKE(1) NetBSD General Commands Manual MAKE(1) NNAAMMEE bbmmaakkee -- maintain program dependencies SSYYNNOOPPSSIISS bbmmaakkee [--BBeeiikkNNnnqqrrssttWWwwXX] [--CC _d_i_r_e_c_t_o_r_y] [--DD _v_a_r_i_a_b_l_e] [--dd _f_l_a_g_s] [--ff _m_a_k_e_f_i_l_e] [--II _d_i_r_e_c_t_o_r_y] [--JJ _p_r_i_v_a_t_e] [--jj _m_a_x___j_o_b_s] [--mm _d_i_r_e_c_t_o_r_y] [--TT _f_i_l_e] [--VV _v_a_r_i_a_b_l_e] [_v_a_r_i_a_b_l_e_=_v_a_l_u_e] [_t_a_r_g_e_t _._._.] DDEESSCCRRIIPPTTIIOONN bbmmaakkee is a program designed to simplify the maintenance of other pro- grams. Its input is a list of specifications as to the files upon which programs and other files depend. If no --ff _m_a_k_e_f_i_l_e makefile option is given, bbmmaakkee will try to open `_m_a_k_e_f_i_l_e' then `_M_a_k_e_f_i_l_e' in order to find the specifications.
If the file `_._d_e_p_e_n_d' exists, it is read (see mkdep(1)). This manual page is intended as a reference document only. For a more thorough description of bbmmaakkee and makefiles, please refer to _P_M_a_k_e _- _A _T_u_t_o_r_i_a_l. bbmmaakkee will prepend the contents of the _M_A_K_E_F_L_A_G_S environment variable to the command line arguments before parsing them. The options are as follows: --BB Try to be backwards compatible by executing a single shell per command and by executing the commands to make the sources of a dependency line in sequence. --CC _d_i_r_e_c_t_o_r_y Change to _d_i_r_e_c_t_o_r_y before reading the makefiles or doing any- thing else. If multiple --CC options are specified, each is inter- preted relative to the previous one: --CC _/ --CC _e_t_c is equivalent to --CC _/_e_t_c. --DD _v_a_r_i_a_b_l_e Define _v_a_r_i_a_b_l_e to be 1, in the global context. --dd _[_-_]_f_l_a_g_s Turn on debugging, and specify which portions of bbmmaakkee are to print debugging information. Unless the flags are preceded by `-' they are added to the _M_A_K_E_F_L_A_G_S environment variable and will be processed by any child make processes. By default, debugging information is printed to standard error, but this can be changed using the _F debugging flag. The debugging output is always unbuffered; in addition, if debugging is enabled but debugging output is not directed to standard output, then the standard out- put is line buffered. _F_l_a_g_s is one or more of the following: _A Print all possible debugging information; equivalent to specifying all of the debugging flags. _a Print debugging information about archive searching and caching. _C Print debugging information about current working direc- tory. _c Print debugging information about conditional evaluation. _d Print debugging information about directory searching and caching. _e Print debugging information about failed commands and targets. _F[++]_f_i_l_e_n_a_m_e Specify where debugging output is written. This must be the last flag, because it consumes the remainder of the argument. If the character immediately after the `F' flag is `+', then the file will be opened in append mode; otherwise the file will be overwritten. If the file name is `stdout' or `stderr' then debugging output will be written to the standard output or standard error output file descriptors respectively (and the `+' option has no effect). Otherwise, the output will be written to the named file. If the file name ends `.%d' then the `%d' is replaced by the pid. _f Print debugging information about loop evaluation. _g_1 Print the input graph before making anything. _g_2 Print the input graph after making everything, or before exiting on error. _g_3 Print the input graph before exiting on error. _j Print debugging information about running multiple shells. _l Print commands in Makefiles regardless of whether or not they are prefixed by `@' or other "quiet" flags. Also known as "loud" behavior. _M Print debugging information about "meta" mode decisions about targets. _m Print debugging information about making targets, includ- ing modification dates. _n Don't delete the temporary command scripts created when running commands. These temporary scripts are created in the directory referred to by the TMPDIR environment vari- able, or in _/_t_m_p if TMPDIR is unset or set to the empty string. The temporary scripts are created by mkstemp(3), and have names of the form _m_a_k_e_X_X_X_X_X_X. _N_O_T_E: This can create many files in TMPDIR or _/_t_m_p, so use with care. 
_p Print debugging information about makefile parsing. _s Print debugging information about suffix-transformation rules. _t Print debugging information about target list mainte- nance. _V Force the --VV option to print raw values of variables. _v Print debugging information about variable assignment. _x Run shell commands with --xx so the actual commands are printed as they are executed. --ee Specify that environment variables override macro assignments within makefiles. --ff _m_a_k_e_f_i_l_e Specify a makefile to read instead of the default `_m_a_k_e_f_i_l_e'. If _m_a_k_e_f_i_l_e is `--', standard input is read. Multiple makefiles may be specified, and are read in the order specified. --II _d_i_r_e_c_t_o_r_y Specify a directory in which to search for makefiles and included makefiles. The system makefile directory (or directories, see the --mm option) is automatically included as part of this list. --ii Ignore non-zero exit of shell commands in the makefile. Equiva- lent to specifying `--' before each command line in the makefile. --JJ _p_r_i_v_a_t_e This option should _n_o_t be specified by the user. When the _j option is in use in a recursive build, this option is passed by a make to child makes to allow all the make processes in the build to cooperate to avoid overloading the system. --jj _m_a_x___j_o_b_s Specify the maximum number of jobs that bbmmaakkee may have running at any one time. The value is saved in _._M_A_K_E_._J_O_B_S. Turns compati- bility mode off, unless the _B flag is also specified. When com- patibility mode is off, all commands associated with a target are executed in a single shell invocation as opposed to the tradi- tional one shell invocation per line. This can break traditional scripts which change directories on each command invocation and then expect to start with a fresh environment on the next line. It is more efficient to correct the scripts rather than turn backwards compatibility on. --kk Continue processing after errors are encountered, but only on those targets that do not depend on the target whose creation caused the error. --mm _d_i_r_e_c_t_o_r_y Specify a directory in which to search for sys.mk and makefiles included via the <_f_i_l_e>-style include statement. The --mm option can be used multiple times to form a search path. This path will override the default system include path: /usr/share/mk. Fur- thermore the system include path will be appended to the search path used for "_f_i_l_e"-style include statements (see the --II option). If a file or directory name in the --mm argument (or the MAKESYSPATH environment variable) starts with the string ".../" then bbmmaakkee will search for the specified file or directory named in the remaining part of the argument string. The search starts with the current directory of the Makefile and then works upward towards the root of the file system. If the search is success- ful, then the resulting directory replaces the ".../" specifica- tion in the --mm argument. If used, this feature allows bbmmaakkee to easily search in the current source tree for customized sys.mk files (e.g., by using ".../mk/sys.mk" as an argument). --nn Display the commands that would have been executed, but do not actually execute them unless the target depends on the .MAKE spe- cial source (see below). --NN Display the commands which would have been executed, but do not actually execute any of them; useful for debugging top-level makefiles without descending into subdirectories. 
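           For example (the target name is illustrative), to preview an
           entire top-level build without recursing into subdirectories:

                 bmake -N all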
--qq Do not execute any commands, but exit 0 if the specified targets are up-to-date and 1, otherwise. --rr Do not use the built-in rules specified in the system makefile. --ss Do not echo any commands as they are executed. Equivalent to specifying `@@' before each command line in the makefile. --TT _t_r_a_c_e_f_i_l_e When used with the --jj flag, append a trace record to _t_r_a_c_e_f_i_l_e for each job started and completed. --tt Rather than re-building a target as specified in the makefile, create it or update its modification time to make it appear up- to-date. --VV _v_a_r_i_a_b_l_e Print bbmmaakkee's idea of the value of _v_a_r_i_a_b_l_e, in the global con- text. Do not build any targets. Multiple instances of this option may be specified; the variables will be printed one per line, with a blank line for each null or undefined variable. If _v_a_r_i_a_b_l_e contains a `$' then the value will be expanded before printing. --WW Treat any warnings during makefile parsing as errors. --ww Print entering and leaving directory messages, pre and post pro- cessing. --XX Don't export variables passed on the command line to the environ- ment individually. Variables passed on the command line are still exported via the _M_A_K_E_F_L_A_G_S environment variable. This option may be useful on systems which have a small limit on the size of command arguments. _v_a_r_i_a_b_l_e_=_v_a_l_u_e Set the value of the variable _v_a_r_i_a_b_l_e to _v_a_l_u_e. Normally, all values passed on the command line are also exported to sub-makes in the environment. The --XX flag disables this behavior. Vari- able assignments should follow options for POSIX compatibility but no ordering is enforced. There are seven different types of lines in a makefile: file dependency specifications, shell commands, variable assignments, include statements, conditional directives, for loops, and comments. In general, lines may be continued from one line to the next by ending them with a backslash (`\'). The trailing newline character and initial whitespace on the following line are compressed into a single space. FFIILLEE DDEEPPEENNDDEENNCCYY SSPPEECCIIFFIICCAATTIIOONNSS Dependency lines consist of one or more targets, an operator, and zero or more sources. This creates a relationship where the targets ``depend'' on the sources and are usually created from them. The exact relationship between the target and the source is determined by the operator that sep- arates them. The three operators are as follows: :: A target is considered out-of-date if its modification time is less than those of any of its sources. Sources for a target accumulate over dependency lines when this operator is used. The target is removed if bbmmaakkee is interrupted. !! Targets are always re-created, but not until all sources have been examined and re-created as necessary. Sources for a target accumu- late over dependency lines when this operator is used. The target is removed if bbmmaakkee is interrupted. :::: If no sources are specified, the target is always re-created. Oth- erwise, a target is considered out-of-date if any of its sources has been modified more recently than the target. Sources for a target do not accumulate over dependency lines when this operator is used. The target will not be removed if bbmmaakkee is interrupted. Targets and sources may contain the shell wildcard values `?', `*', `[]', and `{}'. The values `?', `*', and `[]' may only be used as part of the final component of the target or source, and must be used to describe existing files. 
The value `{}' need not necessarily be used to describe existing files. Expansion is in directory order, not alphabetically as done in the shell. SSHHEELLLL CCOOMMMMAANNDDSS Each target may have associated with it one or more lines of shell com- mands, normally used to create the target. Each of the lines in this script _m_u_s_t be preceded by a tab. (For historical reasons, spaces are not accepted.) While targets can appear in many dependency lines if desired, by default only one of these rules may be followed by a creation script. If the `::::' operator is used, however, all rules may include scripts and the scripts are executed in the order found. Each line is treated as a separate shell command, unless the end of line is escaped with a backslash (`\') in which case that line and the next are combined. If the first characters of the command are any combination of `@@', `++', or `--', the command is treated specially. A `@@' causes the command not to be echoed before it is executed. A `++' causes the command to be executed even when --nn is given. This is similar to the effect of the .MAKE special source, except that the effect can be limited to a sin- gle line of a script. A `--' in compatibility mode causes any non-zero exit status of the command line to be ignored. When bbmmaakkee is run in jobs mode with --jj _m_a_x___j_o_b_s, the entire script for the target is fed to a single instance of the shell. In compatibility (non-jobs) mode, each command is run in a separate process. If the com- mand contains any shell meta characters (`#=|^(){};&<>*?[]:$`\\n') it will be passed to the shell; otherwise bbmmaakkee will attempt direct execu- tion. If a line starts with `--' and the shell has ErrCtl enabled then failure of the command line will be ignored as in compatibility mode. Otherwise `--' affects the entire job; the script will stop at the first command line that fails, but the target will not be deemed to have failed. Makefiles should be written so that the mode of bbmmaakkee operation does not change their behavior. For example, any command which needs to use ``cd'' or ``chdir'' without potentially changing the directory for subse- quent commands should be put in parentheses so it executes in a subshell. To force the use of one shell, escape the line breaks so as to make the whole script one command. For example: avoid-chdir-side-effects: @echo Building $@ in `pwd` @(cd ${.CURDIR} && ${MAKE} $@) @echo Back in `pwd` ensure-one-shell-regardless-of-mode: @echo Building $@ in `pwd`; \ (cd ${.CURDIR} && ${MAKE} $@); \ echo Back in `pwd` Since bbmmaakkee will chdir(2) to `_._O_B_J_D_I_R' before executing any targets, each child process starts with that as its current working directory. VVAARRIIAABBLLEE AASSSSIIGGNNMMEENNTTSS Variables in make are much like variables in the shell, and, by tradi- tion, consist of all upper-case letters. VVaarriiaabbllee aassssiiggnnmmeenntt mmooddiiffiieerrss The five operators that can be used to assign values to variables are as follows: == Assign the value to the variable. Any previous value is overrid- den. ++== Append the value to the current value of the variable. ??== Assign the value to the variable if it is not already defined. ::== Assign with expansion, i.e. expand the value before assigning it to the variable. Normally, expansion is not done until the vari- able is referenced. _N_O_T_E: References to undefined variables are _n_o_t expanded. This can cause problems when variable modifiers are used. 
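           For example (variable names are illustrative), `::==' captures
           the value at the moment of assignment:

                 X= before
                 Y:= ${X}
                 X= after

           Here `${Y}' still expands to ``before''; had `Y' been assigned
           with plain `==', it would expand to ``after''.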
!!== Expand the value and pass it to the shell for execution and assign the result to the variable. Any newlines in the result are replaced with spaces. Any white-space before the assigned _v_a_l_u_e is removed; if the value is being appended, a single space is inserted between the previous contents of the variable and the appended value. Variables are expanded by surrounding the variable name with either curly braces (`{}') or parentheses (`()') and preceding it with a dollar sign (`$'). If the variable name contains only a single letter, the surround- ing braces or parentheses are not required. This shorter form is not recommended. If the variable name contains a dollar, then the name itself is expanded first. This allows almost arbitrary variable names, however names con- taining dollar, braces, parenthesis, or whitespace are really best avoided! If the result of expanding a variable contains a dollar sign (`$') the string is expanded again. Variable substitution occurs at three distinct times, depending on where the variable is being used. 1. Variables in dependency lines are expanded as the line is read. 2. Variables in shell commands are expanded when the shell command is executed. 3. ``.for'' loop index variables are expanded on each loop iteration. Note that other variables are not expanded inside loops so the fol- lowing example code: .for i in 1 2 3 a+= ${i} j= ${i} b+= ${j} .endfor all: @echo ${a} @echo ${b} will print: 1 2 3 3 3 3 Because while ${a} contains ``1 2 3'' after the loop is executed, ${b} contains ``${j} ${j} ${j}'' which expands to ``3 3 3'' since after the loop completes ${j} contains ``3''. VVaarriiaabbllee ccllaasssseess The four different classes of variables (in order of increasing prece- dence) are: Environment variables Variables defined as part of bbmmaakkee's environment. Global variables Variables defined in the makefile or in included makefiles. Command line variables Variables defined as part of the command line. Local variables Variables that are defined specific to a certain target. Local variables are all built in and their values vary magically from target to target. It is not currently possible to define new local vari- ables. The seven local variables are as follows: _._A_L_L_S_R_C The list of all sources for this target; also known as `_>'. _._A_R_C_H_I_V_E The name of the archive file; also known as `_!'. _._I_M_P_S_R_C In suffix-transformation rules, the name/path of the source from which the target is to be transformed (the ``implied'' source); also known as `_<'. It is not defined in explicit rules. _._M_E_M_B_E_R The name of the archive member; also known as `_%'. _._O_O_D_A_T_E The list of sources for this target that were deemed out- of-date; also known as `_?'. _._P_R_E_F_I_X The file prefix of the target, containing only the file portion, no suffix or preceding directory components; also known as `_*'. The suffix must be one of the known suffixes declared with ..SSUUFFFFIIXXEESS or it will not be recog- nized. _._T_A_R_G_E_T The name of the target; also known as `_@'. For compati- bility with other makes this is an alias for ..AARRCCHHIIVVEE in archive member rules. The shorter forms (`_>', `_!', `_<', `_%', `_?', `_*', and `_@') are permitted for backward compatibility with historical makefiles and legacy POSIX make and are not recommended. Variants of these variables with the punctuation followed immediately by `D' or `F', e.g. `_$_(_@_D_)', are legacy forms equivalent to using the `:H' and `:T' modifiers. 
These forms are accepted for compatibility with AT&T System V UNIX makefiles and POSIX but are not recommended. Four of the local variables may be used in sources on dependency lines because they expand to the proper value for each target on the line. These variables are `_._T_A_R_G_E_T', `_._P_R_E_F_I_X', `_._A_R_C_H_I_V_E', and `_._M_E_M_B_E_R'. AAddddiittiioonnaall bbuuiilltt--iinn vvaarriiaabblleess In addition, bbmmaakkee sets or knows about the following variables: _$ A single dollar sign `$', i.e. `$$' expands to a single dollar sign. _._A_L_L_T_A_R_G_E_T_S The list of all targets encountered in the Makefile. If evaluated during Makefile parsing, lists only those tar- gets encountered thus far. _._C_U_R_D_I_R A path to the directory where bbmmaakkee was executed. Refer to the description of `PWD' for more details. _._I_N_C_L_U_D_E_D_F_R_O_M_D_I_R The directory of the file this Makefile was included from. _._I_N_C_L_U_D_E_D_F_R_O_M_F_I_L_E The filename of the file this Makefile was included from. MAKE The name that bbmmaakkee was executed with (_a_r_g_v_[_0_]). For compatibility bbmmaakkee also sets _._M_A_K_E with the same value. The preferred variable to use is the environment variable MAKE because it is more compatible with other versions of bbmmaakkee and cannot be confused with the special target with the same name. _._M_A_K_E_._D_E_P_E_N_D_F_I_L_E Names the makefile (default `_._d_e_p_e_n_d') from which gener- ated dependencies are read. _._M_A_K_E_._E_X_P_A_N_D___V_A_R_I_A_B_L_E_S A boolean that controls the default behavior of the --VV option. _._M_A_K_E_._E_X_P_O_R_T_E_D The list of variables exported by bbmmaakkee. _._M_A_K_E_._J_O_B_S The argument to the --jj option. _._M_A_K_E_._J_O_B_._P_R_E_F_I_X If bbmmaakkee is run with _j then output for each target is prefixed with a token `--- target ---' the first part of which can be controlled via _._M_A_K_E_._J_O_B_._P_R_E_F_I_X. If _._M_A_K_E_._J_O_B_._P_R_E_F_I_X is empty, no token is printed. For example: .MAKE.JOB.PREFIX=${.newline}---${.MAKE:T}[${.MAKE.PID}] would produce tokens like `---make[1234] target ---' mak- ing it easier to track the degree of parallelism being achieved. MAKEFLAGS The environment variable `MAKEFLAGS' may contain anything that may be specified on bbmmaakkee's command line. Anything specified on bbmmaakkee's command line is appended to the `MAKEFLAGS' variable which is then entered into the envi- ronment for all programs which bbmmaakkee executes. _._M_A_K_E_._L_E_V_E_L The recursion depth of bbmmaakkee. The initial instance of bbmmaakkee will be 0, and an incremented value is put into the environment to be seen by the next generation. This allows tests like: .if ${.MAKE.LEVEL} == 0 to protect things which should only be evaluated in the initial instance of bbmmaakkee. _._M_A_K_E_._M_A_K_E_F_I_L_E___P_R_E_F_E_R_E_N_C_E The ordered list of makefile names (default `_m_a_k_e_f_i_l_e', `_M_a_k_e_f_i_l_e') that bbmmaakkee will look for. _._M_A_K_E_._M_A_K_E_F_I_L_E_S The list of makefiles read by bbmmaakkee, which is useful for tracking dependencies. Each makefile is recorded only once, regardless of the number of times read. _._M_A_K_E_._M_O_D_E Processed after reading all makefiles. Can affect the mode that bbmmaakkee runs in. It can contain a number of key- words: _c_o_m_p_a_t Like --BB, puts bbmmaakkee into "compat" mode. 
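                               For example, a makefile can force
                               "compat" mode for itself (a minimal
                               sketch):

                                     .MAKE.MODE= compat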
_m_e_t_a Puts bbmmaakkee into "meta" mode, where meta files are created for each tar- get to capture the command run, the output generated and, if filemon(4) is available, the system calls which are of interest to bbmmaakkee. The cap- tured output can be very useful when diagnosing errors. _c_u_r_d_i_r_O_k_= _b_f Normally bbmmaakkee will not create .meta files in `_._C_U_R_D_I_R'. This can be overridden by setting _b_f to a value which represents True. _m_i_s_s_i_n_g_-_m_e_t_a_= _b_f If _b_f is True, then a missing .meta file makes the target out-of-date. _m_i_s_s_i_n_g_-_f_i_l_e_m_o_n_= _b_f If _b_f is True, then missing filemon data makes the target out-of-date. _n_o_f_i_l_e_m_o_n Do not use filemon(4). _e_n_v For debugging, it can be useful to include the environment in the .meta file. _v_e_r_b_o_s_e If in "meta" mode, print a clue about the target being built. This is useful if the build is otherwise running silently. The message printed is the value of _._M_A_K_E_._M_E_T_A_._P_R_E_F_I_X. _i_g_n_o_r_e_-_c_m_d Some makefiles have commands which are simply not stable. This keyword causes them to be ignored for deter- mining whether a target is out of date in "meta" mode. See also ..NNOOMMEETTAA__CCMMPP. _s_i_l_e_n_t_= _b_f If _b_f is True, when a .meta file is created, mark the target ..SSIILLEENNTT. _._M_A_K_E_._M_E_T_A_._B_A_I_L_I_W_I_C_K In "meta" mode, provides a list of prefixes which match the directories controlled by bbmmaakkee. If a file that was generated outside of _._O_B_J_D_I_R but within said bailiwick is missing, the current target is considered out-of-date. _._M_A_K_E_._M_E_T_A_._C_R_E_A_T_E_D In "meta" mode, this variable contains a list of all the meta files updated. If not empty, it can be used to trigger processing of _._M_A_K_E_._M_E_T_A_._F_I_L_E_S. _._M_A_K_E_._M_E_T_A_._F_I_L_E_S In "meta" mode, this variable contains a list of all the meta files used (updated or not). This list can be used to process the meta files to extract dependency informa- tion. _._M_A_K_E_._M_E_T_A_._I_G_N_O_R_E___P_A_T_H_S Provides a list of path prefixes that should be ignored, because the contents are expected to change over time. The default list includes: `_/_d_e_v _/_e_t_c _/_p_r_o_c _/_t_m_p _/_v_a_r_/_r_u_n _/_v_a_r_/_t_m_p' _._M_A_K_E_._M_E_T_A_._I_G_N_O_R_E___P_A_T_T_E_R_N_S Provides a list of patterns to match against pathnames. Ignore any that match. + _._M_A_K_E_._M_E_T_A_._I_G_N_O_R_E___F_I_L_T_E_R + Provides a list of variable modifiers to apply to each + pathname. Ignore if the expansion is an empty string. + _._M_A_K_E_._M_E_T_A_._P_R_E_F_I_X Defines the message printed for each meta file updated in "meta verbose" mode. The default value is: Building ${.TARGET:H:tA}/${.TARGET:T} _._M_A_K_E_O_V_E_R_R_I_D_E_S This variable is used to record the names of variables assigned to on the command line, so that they may be exported as part of `MAKEFLAGS'. This behavior can be disabled by assigning an empty value to `_._M_A_K_E_O_V_E_R_R_I_D_E_S' within a makefile. Extra variables can be exported from a makefile by appending their names to `_._M_A_K_E_O_V_E_R_R_I_D_E_S'. `MAKEFLAGS' is re-exported whenever `_._M_A_K_E_O_V_E_R_R_I_D_E_S' is modified. _._M_A_K_E_._P_A_T_H___F_I_L_E_M_O_N If bbmmaakkee was built with filemon(4) support, this is set to the path of the device node. This allows makefiles to test for this support. _._M_A_K_E_._P_I_D The process-id of bbmmaakkee. _._M_A_K_E_._P_P_I_D The parent process-id of bbmmaakkee.
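                        For example (the variable name and path are
                        illustrative), a per-invocation scratch file can
                        be derived from the process-id:

                              SCRATCH= ${TMPDIR:U/tmp}/build.${.MAKE.PID}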
_._M_A_K_E_._S_A_V_E___D_O_L_L_A_R_S value should be a boolean that controls whether `$$' are preserved when doing `:=' assignments. The default is false, for backwards compatibility. Set to true for com- patibility with other makes. If set to false, `$$' becomes `$' per normal evaluation rules. _M_A_K_E___P_R_I_N_T___V_A_R___O_N___E_R_R_O_R - When bbmmaakkee stops due to an error, it prints its name and - the value of `_._C_U_R_D_I_R' as well as the value of any vari- - ables named in `_M_A_K_E___P_R_I_N_T___V_A_R___O_N___E_R_R_O_R'. + When bbmmaakkee stops due to an error, it sets `_._E_R_R_O_R___T_A_R_G_E_T' + to the name of the target that failed, `_._E_R_R_O_R___C_M_D' to + the commands of the failed target, and in "meta" mode, it + also sets `_._E_R_R_O_R___C_W_D' to the getcwd(3), and + `_._E_R_R_O_R___M_E_T_A___F_I_L_E' to the path of the meta file (if any) + describing the failed target. It then prints its name + and the value of `_._C_U_R_D_I_R' as well as the value of any + variables named in `_M_A_K_E___P_R_I_N_T___V_A_R___O_N___E_R_R_O_R'. _._n_e_w_l_i_n_e This variable is simply assigned a newline character as its value. This allows expansions using the ::@@ modifier to put a newline between iterations of the loop rather than a space. For example, the printing of `_M_A_K_E___P_R_I_N_T___V_A_R___O_N___E_R_R_O_R' could be done as ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@}. _._O_B_J_D_I_R A path to the directory where the targets are built. Its value is determined by trying to chdir(2) to the follow- ing directories in order and using the first match: 1. ${MAKEOBJDIRPREFIX}${.CURDIR} (Only if `MAKEOBJDIRPREFIX' is set in the environ- ment or on the command line.) 2. ${MAKEOBJDIR} (Only if `MAKEOBJDIR' is set in the environment or on the command line.) 3. ${.CURDIR}_/_o_b_j_.${MACHINE} 4. ${.CURDIR}_/_o_b_j 5. _/_u_s_r_/_o_b_j_/${.CURDIR} 6. ${.CURDIR} Variable expansion is performed on the value before it's used, so expressions such as ${.CURDIR:S,^/usr/src,/var/obj,} may be used. This is especially useful with `MAKEOBJDIR'. `_._O_B_J_D_I_R' may be modified in the makefile via the special target `..OOBBJJDDIIRR'. In all cases, bbmmaakkee will chdir(2) to the specified directory if it exists, and set `_._O_B_J_D_I_R' and `PWD' to that directory before executing any targets. _._P_A_R_S_E_D_I_R A path to the directory of the current `_M_a_k_e_f_i_l_e' being parsed. _._P_A_R_S_E_F_I_L_E The basename of the current `_M_a_k_e_f_i_l_e' being parsed. This variable and `_._P_A_R_S_E_D_I_R' are both set only while the `_M_a_k_e_f_i_l_e_s' are being parsed. If you want to retain their current values, assign them to a variable using assignment with expansion: (`::=='). _._P_A_T_H A variable that represents the list of directories that bbmmaakkee will search for files. The search list should be updated using the target `_._P_A_T_H' rather than the vari- able. PWD Alternate path to the current directory. bbmmaakkee normally sets `_._C_U_R_D_I_R' to the canonical path given by getcwd(3). However, if the environment variable `PWD' is set and gives a path to the current directory, then bbmmaakkee sets `_._C_U_R_D_I_R' to the value of `PWD' instead. This behavior is disabled if `MAKEOBJDIRPREFIX' is set or `MAKEOBJDIR' contains a variable transform. `PWD' is set to the value of `_._O_B_J_D_I_R' for all programs which bbmmaakkee executes. .TARGETS The list of targets explicitly specified on the command line, if any.
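                        For example (a sketch; the target and variable
                        names are illustrative), a makefile can react to
                        the targets requested on the command line:

                              .if !empty(.TARGETS:Minstall)
                              SUBDIR+= doc
                              .endif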
VPATH Colon-separated (``:'') lists of directories that bbmmaakkee will search for files. The variable is supported for compatibility with old make programs only; use `_._P_A_T_H' instead. VVaarriiaabbllee mmooddiiffiieerrss Variable expansion may be modified to select or modify each word of the variable (where a ``word'' is a white-space delimited sequence of charac- ters). The general format of a variable expansion is as follows: ${variable[:modifier[:...]]} Each modifier begins with a colon, which may be escaped with a backslash (`\'). A set of modifiers can be specified via a variable, as follows: modifier_variable=modifier[:...] ${variable:${modifier_variable}[:...]} In this case the first modifier in the modifier_variable does not start with a colon, since that must appear in the referencing variable. If any of the modifiers in the modifier_variable contain a dollar sign (`$'), these must be doubled to avoid early expansion. The supported modifiers are: ::EE Replaces each word in the variable with its suffix. ::HH Replaces each word in the variable with everything but the last com- ponent. ::MM_p_a_t_t_e_r_n Select only those words that match _p_a_t_t_e_r_n. The standard shell wildcard characters (`*', `?', and `[]') may be used. The wildcard characters may be escaped with a backslash (`\'). As a consequence of the way values are split into words, matched, and then joined, a construct like ${VAR:M*} will normalize the inter-word spacing, removing all leading and trailing space, and converting multiple consecutive spaces to single spaces. ::NN_p_a_t_t_e_r_n This is identical to `::MM', but selects all words which do not match _p_a_t_t_e_r_n. ::OO Order every word in variable alphabetically. To sort words in reverse order use the `::OO::[[--11....11]]' combination of modifiers. ::OOxx Randomize words in variable. The results will be different each time you are referring to the modified variable; use the assignment with expansion (`::==') to prevent such behavior. For example, LIST= uno due tre quattro RANDOM_LIST= ${LIST:Ox} STATIC_RANDOM_LIST:= ${LIST:Ox} all: @echo "${RANDOM_LIST}" @echo "${RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" may produce output similar to: quattro due tre uno tre due quattro uno due uno quattro tre due uno quattro tre ::QQ Quotes every shell meta-character in the variable, so that it can be passed safely through recursive invocations of bbmmaakkee. ::RR Replaces each word in the variable with everything but its suffix. ::ggmmttiimmee The value is a format string for strftime(3), using the current gmtime(3). ::hhaasshh Compute a 32-bit hash of the value and encode it as hex digits. ::llooccaallttiimmee The value is a format string for strftime(3), using the current localtime(3). ::ttAA Attempt to convert variable to an absolute path using realpath(3); if that fails, the value is unchanged. ::ttll Converts variable to lower-case letters. ::ttss_c Words in the variable are normally separated by a space on expan- sion. This modifier sets the separator to the character _c. If _c is omitted, then no separator is used. The common escapes (including octal numeric codes) work as expected. ::ttuu Converts variable to upper-case letters. ::ttWW Causes the value to be treated as a single word (possibly containing embedded white space). See also `::[[**]]'. ::ttww Causes the value to be treated as a sequence of words delimited by white space. See also `::[[@@]]'.
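        For example (the variable name is illustrative), the `::ttss'
        modifier described above can join the words of a list with a
        custom separator:

              DIRS= /bin /usr/bin /usr/local/bin
              # ${DIRS:ts,} expands to ``/bin,/usr/bin,/usr/local/bin''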
::SS/_o_l_d___s_t_r_i_n_g/_n_e_w___s_t_r_i_n_g/[11ggWW] Modify the first occurrence of _o_l_d___s_t_r_i_n_g in the variable's value, replacing it with _n_e_w___s_t_r_i_n_g. If a `g' is appended to the last slash of the pattern, all occurrences in each word are replaced. If a `1' is appended to the last slash of the pattern, only the first word is affected. If a `W' is appended to the last slash of the pattern, then the value is treated as a single word (possibly con- taining embedded white space). If _o_l_d___s_t_r_i_n_g begins with a caret (`^'), _o_l_d___s_t_r_i_n_g is anchored at the beginning of each word. If _o_l_d___s_t_r_i_n_g ends with a dollar sign (`$'), it is anchored at the end of each word. Inside _n_e_w___s_t_r_i_n_g, an ampersand (`&') is replaced by _o_l_d___s_t_r_i_n_g (without any `^' or `$'). Any character may be used as a delimiter for the parts of the modifier string. The anchoring, ampersand and delimiter characters may be escaped with a backslash (`\'). Variable expansion occurs in the normal fashion inside both _o_l_d___s_t_r_i_n_g and _n_e_w___s_t_r_i_n_g with the single exception that a backslash is used to prevent the expansion of a dollar sign (`$'), not a pre- ceding dollar sign as is usual. ::CC/_p_a_t_t_e_r_n/_r_e_p_l_a_c_e_m_e_n_t/[11ggWW] The ::CC modifier is just like the ::SS modifier except that the old and new strings, instead of being simple strings, are an extended regu- lar expression (see regex(3)) string _p_a_t_t_e_r_n and an ed(1)-style string _r_e_p_l_a_c_e_m_e_n_t. Normally, the first occurrence of the pattern _p_a_t_t_e_r_n in each word of the value is substituted with _r_e_p_l_a_c_e_m_e_n_t. The `1' modifier causes the substitution to apply to at most one word; the `g' modifier causes the substitution to apply to as many instances of the search pattern _p_a_t_t_e_r_n as occur in the word or words it is found in; the `W' modifier causes the value to be treated as a single word (possibly containing embedded white space). Note that `1' and `g' are orthogonal; the former specifies whether multiple words are potentially affected, the latter whether multiple substitutions can potentially occur within each affected word. As for the ::SS modifier, the _p_a_t_t_e_r_n and _r_e_p_l_a_c_e_m_e_n_t are subjected to variable expansion before being parsed as regular expressions. ::TT Replaces each word in the variable with its last component. ::uu Remove adjacent duplicate words (like uniq(1)). ::??_t_r_u_e___s_t_r_i_n_g::_f_a_l_s_e___s_t_r_i_n_g If the variable name (not its value), when parsed as a .if condi- tional expression, evaluates to true, return as its value the _t_r_u_e___s_t_r_i_n_g, otherwise return the _f_a_l_s_e___s_t_r_i_n_g. Since the variable name is used as the expression, :? must be the first modifier after the variable name itself - which will, of course, usually contain variable expansions. A common error is trying to use expressions like ${NUMBERS:M42:?match:no} which actually tests defined(NUMBERS); to determine if any words match "42", you need to use something like: ${"${NUMBERS:M42}" != "":?match:no}. _:_o_l_d___s_t_r_i_n_g_=_n_e_w___s_t_r_i_n_g This is the AT&T System V UNIX style variable substitution. It must be the last modifier specified. If _o_l_d___s_t_r_i_n_g or _n_e_w___s_t_r_i_n_g do not contain the pattern matching character _% then it is assumed that they are anchored at the end of each word, so only suffixes or entire words may be replaced.
Otherwise _% is the substring of _o_l_d___s_t_r_i_n_g to be replaced in _n_e_w___s_t_r_i_n_g. Variable expansion occurs in the normal fashion inside both _o_l_d___s_t_r_i_n_g and _n_e_w___s_t_r_i_n_g with the single exception that a backslash is used to prevent the expansion of a dollar sign (`$'), not a pre- ceding dollar sign as is usual. ::@@_t_e_m_p@@_s_t_r_i_n_g@@ This is the loop expansion mechanism from the OSF Development Envi- ronment (ODE) make. Unlike ..ffoorr loops expansion occurs at the time of reference. Assign _t_e_m_p to each word in the variable and evaluate _s_t_r_i_n_g. The ODE convention is that _t_e_m_p should start and end with a period. For example. ${LINKS:@.LINK.@${LN} ${TARGET} ${.LINK.}@} However a single character variable is often more readable: ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@} ::UU_n_e_w_v_a_l If the variable is undefined _n_e_w_v_a_l is the value. If the variable is defined, the existing value is returned. This is another ODE make feature. It is handy for setting per-target CFLAGS for instance: ${_${.TARGET:T}_CFLAGS:U${DEF_CFLAGS}} If a value is only required if the variable is undefined, use: ${VAR:D:Unewval} ::DD_n_e_w_v_a_l If the variable is defined _n_e_w_v_a_l is the value. ::LL The name of the variable is the value. ::PP The path of the node which has the same name as the variable is the value. If no such node exists or its path is null, then the name of the variable is used. In order for this modifier to work, the name (node) must at least have appeared on the rhs of a dependency. ::!!_c_m_d!! The output of running _c_m_d is the value. ::sshh If the variable is non-empty it is run as a command and the output becomes the new value. ::::==_s_t_r The variable is assigned the value _s_t_r after substitution. This modifier and its variations are useful in obscure situations such as wanting to set a variable when shell commands are being parsed. These assignment modifiers always expand to nothing, so if appearing in a rule line by themselves should be preceded with something to keep bbmmaakkee happy. The `::::' helps avoid false matches with the AT&T System V UNIX style ::== modifier and since substitution always occurs the ::::== form is vaguely appropriate. ::::??==_s_t_r As for ::::== but only if the variable does not already have a value. ::::++==_s_t_r Append _s_t_r to the variable. ::::!!==_c_m_d Assign the output of _c_m_d to the variable. ::[[_r_a_n_g_e]] Selects one or more words from the value, or performs other opera- tions related to the way in which the value is divided into words. Ordinarily, a value is treated as a sequence of words delimited by white space. Some modifiers suppress this behavior, causing a value to be treated as a single word (possibly containing embedded white space). An empty value, or a value that consists entirely of white- space, is treated as a single word. For the purposes of the `::[[]]' modifier, the words are indexed both forwards using positive inte- gers (where index 1 represents the first word), and backwards using negative integers (where index -1 represents the last word). The _r_a_n_g_e is subjected to variable expansion, and the expanded result is then interpreted as follows: _i_n_d_e_x Selects a single word from the value. _s_t_a_r_t...._e_n_d Selects all words from _s_t_a_r_t to _e_n_d, inclusive. For example, `::[[22....--11]]' selects all words from the second word to the last word. If _s_t_a_r_t is greater than _e_n_d, then the words are out- put in reverse order. 
For example, `::[[--11....11]]' selects all the words from last to first. ** Causes subsequent modifiers to treat the value as a single word (possibly containing embedded white space). Analogous to the effect of "$*" in Bourne shell. 0 Means the same as `::[[**]]'. @@ Causes subsequent modifiers to treat the value as a sequence of words delimited by white space. Analogous to the effect of "$@" in Bourne shell. ## Returns the number of words in the value. IINNCCLLUUDDEE SSTTAATTEEMMEENNTTSS,, CCOONNDDIITTIIOONNAALLSS AANNDD FFOORR LLOOOOPPSS Makefile inclusion, conditional structures and for loops reminiscent of the C programming language are provided in bbmmaakkee. All such structures are identified by a line beginning with a single dot (`.') character. Files are included with either ..iinncclluuddee <_f_i_l_e> or ..iinncclluuddee "_f_i_l_e". Vari- ables between the angle brackets or double quotes are expanded to form the file name. If angle brackets are used, the included makefile is expected to be in the system makefile directory. If double quotes are used, the including makefile's directory and any directories specified using the --II option are searched before the system makefile directory. For compatibility with other versions of bbmmaakkee `include file ...' is also accepted. If the include statement is written as ..--iinncclluuddee or as ..ssiinncclluuddee then errors locating and/or opening include files are ignored. If the include statement is written as ..ddiinncclluuddee not only are errors locating and/or opening include files ignored, but stale dependencies within the included file will be ignored just like _._M_A_K_E_._D_E_P_E_N_D_F_I_L_E. Conditional expressions are also preceded by a single dot as the first character of a line. The possible conditionals are as follows: ..eerrrroorr _m_e_s_s_a_g_e The message is printed along with the name of the makefile and line number, then bbmmaakkee will exit. ..eexxppoorrtt _v_a_r_i_a_b_l_e _._._. Export the specified global variable. If no variable list is provided, all globals are exported except for internal variables (those that start with `.'). This is not affected by the --XX flag, so should be used with caution. For compatibility with other bbmmaakkee programs `export variable=value' is also accepted. Appending a variable name to _._M_A_K_E_._E_X_P_O_R_T_E_D is equivalent to exporting a variable. ..eexxppoorrtt--eennvv _v_a_r_i_a_b_l_e _._._. The same as `.export', except that the variable is not appended to _._M_A_K_E_._E_X_P_O_R_T_E_D. This allows exporting a value to the environ- ment which is different from that used by bbmmaakkee internally. ..eexxppoorrtt--lliitteerraall _v_a_r_i_a_b_l_e _._._. The same as `.export-env', except that variables in the value are not expanded. ..iinnffoo _m_e_s_s_a_g_e The message is printed along with the name of the makefile and line number. ..uunnddeeff _v_a_r_i_a_b_l_e Un-define the specified global variable. Only global variables may be un-defined. ..uunneexxppoorrtt _v_a_r_i_a_b_l_e _._._. The opposite of `.export'. The specified global _v_a_r_i_a_b_l_e will be removed from _._M_A_K_E_._E_X_P_O_R_T_E_D. If no variable list is provided, all globals are unexported, and _._M_A_K_E_._E_X_P_O_R_T_E_D deleted. ..uunneexxppoorrtt--eennvv Unexport all globals previously exported and clear the environ- ment inherited from the parent. This operation will cause a mem- ory leak of the original environment, so should be used spar- ingly. Testing for _._M_A_K_E_._L_E_V_E_L being 0, would make sense. 
           Also note that any variables which originated in the parent
           environment should be explicitly preserved if desired.  For
           example:

                 .if ${.MAKE.LEVEL} == 0
                 PATH := ${PATH}
                 .unexport-env
                 .export PATH
                 .endif

           Would result in an environment containing only `PATH', which
           is the minimal useful environment.  Actually `.MAKE.LEVEL'
           will also be pushed into the new environment.

   .warning message
           The message prefixed by `warning:' is printed along with the
           name of the makefile and line number.

   .if [!]expression [operator expression ...]
           Test the value of an expression.

   .ifdef [!]variable [operator variable ...]
           Test the value of a variable.

   .ifndef [!]variable [operator variable ...]
           Test the value of a variable.

   .ifmake [!]target [operator target ...]
           Test the target being built.

   .ifnmake [!]target [operator target ...]
           Test the target being built.

   .else   Reverse the sense of the last conditional.

   .elif [!]expression [operator expression ...]
           A combination of `.else' followed by `.if'.

   .elifdef [!]variable [operator variable ...]
           A combination of `.else' followed by `.ifdef'.

   .elifndef [!]variable [operator variable ...]
           A combination of `.else' followed by `.ifndef'.

   .elifmake [!]target [operator target ...]
           A combination of `.else' followed by `.ifmake'.

   .elifnmake [!]target [operator target ...]
           A combination of `.else' followed by `.ifnmake'.

   .endif  End the body of the conditional.

   The operator may be any one of the following:

   ||      Logical OR.

   &&      Logical AND; of higher precedence than ``||''.

   As in C, bmake will only evaluate a conditional as far as is
   necessary to determine its value.  Parentheses may be used to change
   the order of evaluation.  The boolean operator `!' may be used to
   logically negate an entire conditional.  It is of higher precedence
   than `&&'.

   The value of expression may be any of the following:

   defined   Takes a variable name as an argument and evaluates to true
             if the variable has been defined.

   make      Takes a target name as an argument and evaluates to true if
             the target was specified as part of bmake's command line or
             was declared the default target (either implicitly or
             explicitly, see .MAIN) before the line containing the
             conditional.

   empty     Takes a variable, with possible modifiers, and evaluates to
             true if the expansion of the variable would result in an
             empty string.

   exists    Takes a file name as an argument and evaluates to true if
             the file exists.  The file is searched for on the system
             search path (see .PATH).

   target    Takes a target name as an argument and evaluates to true if
             the target has been defined.

   commands  Takes a target name as an argument and evaluates to true if
             the target has been defined and has commands associated
             with it.

   Expression may also be an arithmetic or string comparison.  Variable
   expansion is performed on both sides of the comparison, after which
   the integral values are compared.  A value is interpreted as
   hexadecimal if it is preceded by 0x, otherwise it is decimal; octal
   numbers are not supported.
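   As an illustration (this fragment is not part of the original manual;
   the variable LEVEL and the flag values are made up), the comparison
   rules above can be exercised as follows:

         # LEVEL is compared numerically; 0x10 is read as hexadecimal 16.
         .if defined(LEVEL) && ${LEVEL} >= 0x10
         CFLAGS+= -DBIG
         .elif empty(LEVEL)
         CFLAGS+= -DSMALL
         .endif

   Because the conditional is only evaluated as far as is necessary, the
   numeric comparison after `&&' is skipped when LEVEL is undefined.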
   The standard C relational operators are all supported.  If after
   variable expansion, either the left or right hand side of a `==' or
   `!=' operator is not an integral value, then string comparison is
   performed between the expanded variables.  If no relational operator
   is given, it is assumed that the expanded variable is being compared
   against 0 or an empty string in the case of a string comparison.

   When bmake is evaluating one of these conditional expressions, and it
   encounters a (white-space separated) word it doesn't recognize,
   either the ``make'' or ``defined'' expression is applied to it,
   depending on the form of the conditional.  If the form is `.ifdef',
   `.ifndef', or `.if' the ``defined'' expression is applied.
   Similarly, if the form is `.ifmake' or `.ifnmake', the ``make''
   expression is applied.

   If the conditional evaluates to true the parsing of the makefile
   continues as before.  If it evaluates to false, the following lines
   are skipped.  In both cases this continues until a `.else' or
   `.endif' is found.

   For loops are typically used to apply a set of rules to a list of
   files.  The syntax of a for loop is:

         .for variable [variable ...] in expression
         <make-rules>
         .endfor

   After the for expression is evaluated, it is split into words.  On
   each iteration of the loop, one word is taken and assigned to each
   variable, in order, and these variables are substituted into the
   make-rules inside the body of the for loop.  The number of words must
   come out even; that is, if there are three iteration variables, the
   number of words provided must be a multiple of three.

COMMENTS
   Comments begin with a hash (`#') character, anywhere but in a shell
   command line, and continue to the end of an unescaped new line.

SPECIAL SOURCES (ATTRIBUTES)
   .EXEC   Target is never out of date, but always execute commands
           anyway.

   .IGNORE Ignore any errors from the commands associated with this
           target, exactly as if they all were preceded by a dash (`-').

   .MADE   Mark all sources of this target as being up-to-date.

   .MAKE   Execute the commands associated with this target even if the
           -n or -t options were specified.  Normally used to mark
           recursive bmakes.

   .META   Create a meta file for the target, even if it is flagged as
           .PHONY, .MAKE, or .SPECIAL.  Usage in conjunction with .MAKE
           is the most likely case.  In "meta" mode, the target is
           out-of-date if the meta file is missing.

   .NOMETA Do not create a meta file for the target.  Meta files are
           also not created for .PHONY, .MAKE, or .SPECIAL targets.

   .NOMETA_CMP
           Ignore differences in commands when deciding if target is out
           of date.  This is useful if the command contains a value
           which always changes.  If the number of commands change,
           though, the target will still be out of date.  The same
           effect applies to any command line that uses the variable
           .OODATE, which can be used for that purpose even when not
           otherwise needed or desired:

                 skip-compare-for-some:
                         @echo this will be compared
                         @echo this will not ${.OODATE:M.NOMETA_CMP}
                         @echo this will also be compared

           The :M pattern suppresses any expansion of the unwanted
           variable.

   .NOPATH Do not search for the target in the directories specified by
           .PATH.
   .NOTMAIN
           Normally bmake selects the first target it encounters as the
           default target to be built if no target was specified.  This
           source prevents this target from being selected.

   .OPTIONAL
           If a target is marked with this attribute and bmake can't
           figure out how to create it, it will ignore this fact and
           assume the file isn't needed or already exists.

   .PHONY  The target does not correspond to an actual file; it is
           always considered to be out of date, and will not be created
           with the -t option.  Suffix-transformation rules are not
           applied to .PHONY targets.

   .PRECIOUS
           When bmake is interrupted, it normally removes any partially
           made targets.  This source prevents the target from being
           removed.

   .RECURSIVE
           Synonym for .MAKE.

   .SILENT Do not echo any of the commands associated with this target,
           exactly as if they all were preceded by an at sign (`@').

   .USE    Turn the target into bmake's version of a macro.  When the
           target is used as a source for another target, the other
           target acquires the commands, sources, and attributes (except
           for .USE) of the source.  If the target already has commands,
           the .USE target's commands are appended to them.

   .USEBEFORE
           Exactly like .USE, but prepend the .USEBEFORE target commands
           to the target.

   .WAIT   If .WAIT appears in a dependency line, the sources that
           precede it are made before the sources that succeed it in the
           line.  Since the dependents of files are not made until the
           file itself could be made, this also stops the dependents
           being built unless they are needed for another branch of the
           dependency tree.  So given:

                 x: a .WAIT b
                         echo x
                 a:
                         echo a
                 b: b1
                         echo b
                 b1:
                         echo b1

           the output is always `a', `b1', `b', `x'.  The ordering
           imposed by .WAIT is only relevant for parallel makes.

SPECIAL TARGETS
   Special targets may not be included with other targets, i.e. they
   must be the only target specified.

   .BEGIN  Any command lines attached to this target are executed before
           anything else is done.

   .DEFAULT
           This is sort of a .USE rule for any target (that was used
           only as a source) that bmake can't figure out any other way
           to create.  Only the shell script is used.  The .IMPSRC
           variable of a target that inherits .DEFAULT's commands is set
           to the target's own name.

   .END    Any command lines attached to this target are executed after
           everything else is done.

   .ERROR  Any command lines attached to this target are executed when
           another target fails.  The .ERROR_TARGET variable is set to
           the target that failed.  See also MAKE_PRINT_VAR_ON_ERROR.

   .IGNORE Mark each of the sources with the .IGNORE attribute.  If no
           sources are specified, this is the equivalent of specifying
           the -i option.

   .INTERRUPT
           If bmake is interrupted, the commands for this target will be
           executed.

   .MAIN   If no target is specified when bmake is invoked, this target
           will be built.

   .MAKEFLAGS
           This target provides a way to specify flags for bmake when
           the makefile is used.  The flags are as if typed to the
           shell, though the -f option will have no effect.

   .NOPATH Apply the .NOPATH attribute to any specified sources.

   .NOTPARALLEL
           Disable parallel mode.

   .NO_PARALLEL
           Synonym for .NOTPARALLEL, for compatibility with other pmake
           variants.

   .OBJDIR The source is a new value for `.OBJDIR'.
           If it exists, bmake will chdir(2) to it and update the value
           of `.OBJDIR'.

   .ORDER  The named targets are made in sequence.  This ordering does
           not add targets to the list of targets to be made.  Since the
           dependents of a target do not get built until the target
           itself could be built, unless `a' is built by another part of
           the dependency graph, the following is a dependency loop:

                 .ORDER: b a
                 b: a

           The ordering imposed by .ORDER is only relevant for parallel
           makes.

   .PATH   The sources are directories which are to be searched for
           files not found in the current directory.  If no sources are
           specified, any previously specified directories are deleted.
           If the source is the special .DOTLAST target, then the
           current working directory is searched last.  (An illustrative
           sketch follows the ENVIRONMENT section below.)

   .PATH.suffix
           Like .PATH but applies only to files with a particular
           suffix.  The suffix must have been previously declared with
           .SUFFIXES.

   .PHONY  Apply the .PHONY attribute to any specified sources.

   .PRECIOUS
           Apply the .PRECIOUS attribute to any specified sources.  If
           no sources are specified, the .PRECIOUS attribute is applied
           to every target in the file.

   .SHELL  Sets the shell that bmake will use to execute commands.  The
           sources are a set of field=value pairs.

           name       This is the minimal specification, used to select
                      one of the built-in shell specs; sh, ksh, and csh.

           path       Specifies the path to the shell.

           hasErrCtl  Indicates whether the shell supports exit on
                      error.

           check      The command to turn on error checking.

           ignore     The command to disable error checking.

           echo       The command to turn on echoing of commands
                      executed.

           quiet      The command to turn off echoing of commands
                      executed.

           filter     The output to filter after issuing the quiet
                      command.  It is typically identical to quiet.

           errFlag    The flag to pass the shell to enable error
                      checking.

           echoFlag   The flag to pass the shell to enable command
                      echoing.

           newline    The string literal to pass the shell that results
                      in a single newline character when used outside of
                      any quoting characters.

           Example:

           .SHELL: name=ksh path=/bin/ksh hasErrCtl=true \
                   check="set -e" ignore="set +e" \
                   echo="set -v" quiet="set +v" filter="set +v" \
                   echoFlag=v errFlag=e newline="'\n'"

   .SILENT Apply the .SILENT attribute to any specified sources.  If no
           sources are specified, the .SILENT attribute is applied to
           every command in the file.

   .STALE  This target gets run when a dependency file contains stale
           entries, having .ALLSRC set to the name of that dependency
           file.

   .SUFFIXES
           Each source specifies a suffix to bmake.  If no sources are
           specified, any previously specified suffixes are deleted.  It
           allows the creation of suffix-transformation rules.

           Example:

                 .SUFFIXES: .o
                 .c.o:
                         cc -o ${.TARGET} -c ${.IMPSRC}

ENVIRONMENT
   bmake uses the following environment variables, if they exist:
   MACHINE, MACHINE_ARCH, MAKE, MAKEFLAGS, MAKEOBJDIR,
   MAKEOBJDIRPREFIX, MAKESYSPATH, PWD, and TMPDIR.

   MAKEOBJDIRPREFIX and MAKEOBJDIR may only be set in the environment or
   on the command line to bmake and not as makefile variables; see the
   description of `.OBJDIR' for more details.
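   The following fragment is an illustrative sketch, not part of the
   original manual (the directory names are hypothetical).  It shows the
   .PATH and .PATH.suffix targets described under SPECIAL TARGETS above:

         .SUFFIXES: .c .h
         # Search ../common for any file not found in the current
         # directory, and ../include for headers (.h files) only.
         .PATH: ${.CURDIR}/../common
         .PATH.h: ${.CURDIR}/../include

   With this in place, a dependency such as `prog: util.c' would find
   util.c in ../common when it is not present in the current directory.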
FILES
   .depend        list of dependencies
   Makefile       list of dependencies
   makefile       list of dependencies
   sys.mk         system makefile
   /usr/share/mk  system makefile directory

COMPATIBILITY
   The basic make syntax is compatible between different versions of
   make; however the special variables, variable modifiers and
   conditionals are not.

 Older versions
   An incomplete list of changes in older versions of bmake:

   The way that .for loop variables are substituted changed after NetBSD
   5.0 so that they still appear to be variable expansions.  In
   particular this stops them being treated as syntax, and removes some
   obscure problems using them in .if statements.

   The way that parallel makes are scheduled changed in NetBSD 4.0 so
   that .ORDER and .WAIT apply recursively to the dependent nodes.  The
   algorithms used may change again in the future.

 Other make dialects
   Other make dialects (GNU make, SVR4 make, POSIX make, etc.) do not
   support most of the features of bmake as described in this manual.
   Most notably:

         o  The .WAIT and .ORDER declarations and most functionality
            pertaining to parallelization.  (GNU make supports
            parallelization but lacks these features needed to control
            it effectively.)

         o  Directives, including for loops and conditionals and most of
            the forms of include files.  (GNU make has its own
            incompatible and less powerful syntax for conditionals.)

         o  All built-in variables that begin with a dot.

         o  Most of the special sources and targets that begin with a
            dot, with the notable exception of .PHONY, .PRECIOUS, and
            .SUFFIXES.

         o  Variable modifiers, except for the :old=new string
            substitution, which does not portably support globbing with
            `%' and historically only works on declared suffixes.

         o  The $> variable even in its short form; most makes support
            this functionality but its name varies.

   Some features are somewhat more portable, such as assignment with +=,
   ?=, and != (see the sketch following the BUGS section below).  The
   .PATH functionality is based on an older feature VPATH found in GNU
   make and many versions of SVR4 make; however, historically its
   behavior is too ill-defined (and too buggy) to rely upon.  The $@ and
   $< variables are more or less universally portable, as is the
   $(MAKE) variable.  Basic use of suffix rules (for files only in the
   current directory, not trying to chain transformations together,
   etc.) is also reasonably portable.

SEE ALSO
   mkdep(1)

HISTORY
   bmake is derived from NetBSD make(1).  It uses autoconf to facilitate
   portability to other platforms.

   A make command appeared in Version 7 AT&T UNIX.  This make
   implementation is based on Adam De Boor's pmake program which was
   written for Sprite at Berkeley.  It was designed to be a parallel
   distributed make running jobs on different machines using a daemon
   called ``customs''.

   Historically the target/dependency ``FRC'' has been used to FoRCe
   rebuilding (since the target/dependency does not exist... unless
   someone creates an ``FRC'' file).

BUGS
   The make syntax is difficult to parse without actually acting on the
   data.  For instance finding the end of a variable use should involve
   scanning each of the modifiers using the correct terminator for each
   field.  In many places make just counts {} and () in order to find
   the end of a variable expansion.

   There is no way of escaping a space character in a filename.
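   As an illustrative sketch of the comparatively portable subset noted
   under COMPATIBILITY above (not part of the original manual; the flag
   values are arbitrary), the following fragment limits itself to `?='
   and `+=' assignment, a declared suffix, a basic suffix rule, and the
   `$<' and `$(CFLAGS)' forms:

         # Conditional default, then append; both are fairly portable.
         CFLAGS?=  -O
         CFLAGS+=  -g

         .SUFFIXES: .c .o
         .c.o:
                 cc $(CFLAGS) -c $<

   How portable such a fragment really is depends on the dialect;
   `?=' and `+=' in particular are not universal.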
-NetBSD 5.1 June 2, 2016 NetBSD 5.1 +NetBSD 5.1 August 15, 2016 NetBSD 5.1 Index: projects/clang390-import/contrib/bmake/main.c =================================================================== --- projects/clang390-import/contrib/bmake/main.c (revision 305686) +++ projects/clang390-import/contrib/bmake/main.c (revision 305687) @@ -1,2103 +1,2112 @@ -/* $NetBSD: main.c,v 1.247 2016/06/05 01:39:17 christos Exp $ */ +/* $NetBSD: main.c,v 1.250 2016/08/11 19:53:17 sjg Exp $ */ /* * Copyright (c) 1988, 1989, 1990, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Adam de Boor. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * Copyright (c) 1989 by Berkeley Softworks * All rights reserved. * * This code is derived from software contributed to Berkeley by * Adam de Boor. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef MAKE_NATIVE -static char rcsid[] = "$NetBSD: main.c,v 1.247 2016/06/05 01:39:17 christos Exp $"; +static char rcsid[] = "$NetBSD: main.c,v 1.250 2016/08/11 19:53:17 sjg Exp $"; #else #include #ifndef lint __COPYRIGHT("@(#) Copyright (c) 1988, 1989, 1990, 1993\ The Regents of the University of California. All rights reserved."); #endif /* not lint */ #ifndef lint #if 0 static char sccsid[] = "@(#)main.c 8.3 (Berkeley) 3/19/94"; #else -__RCSID("$NetBSD: main.c,v 1.247 2016/06/05 01:39:17 christos Exp $"); +__RCSID("$NetBSD: main.c,v 1.250 2016/08/11 19:53:17 sjg Exp $"); #endif #endif /* not lint */ #endif /*- * main.c -- * The main file for this entire program. Exit routines etc * reside here. * * Utility functions defined in this file: * Main_ParseArgLine Takes a line of arguments, breaks them and * treats them as if they were given when first * invoked. Used by the parse module to implement * the .MFLAGS target. * * Error Print a tagged error message. The global * MAKE variable must have been defined. This * takes a format string and optional arguments * for it. * * Fatal Print an error message and exit. Also takes * a format string and arguments for it. * * Punt Aborts all jobs and exits with a message. Also * takes a format string and arguments for it. * * Finish Finish things up by printing the number of * errors which occurred, as passed to it, and * exiting. 
*/ #include #include #include #include #include #if defined(MAKE_NATIVE) && defined(HAVE_SYSCTL) #include #endif #include #include "wait.h" #include #include #include #include #include #include #include #include "make.h" #include "hash.h" #include "dir.h" #include "job.h" #include "pathnames.h" #include "trace.h" #ifdef USE_IOVEC #include #endif #ifndef DEFMAXLOCAL #define DEFMAXLOCAL DEFMAXJOBS #endif /* DEFMAXLOCAL */ #ifndef __arraycount # define __arraycount(__x) (sizeof(__x) / sizeof(__x[0])) #endif Lst create; /* Targets to be made */ time_t now; /* Time at start of make */ GNode *DEFAULT; /* .DEFAULT node */ Boolean allPrecious; /* .PRECIOUS given on line by itself */ static Boolean noBuiltins; /* -r flag */ static Lst makefiles; /* ordered list of makefiles to read */ static Boolean printVars; /* print value of one or more vars */ static Lst variables; /* list of variables to print */ int maxJobs; /* -j argument */ static int maxJobTokens; /* -j argument */ Boolean compatMake; /* -B argument */ int debug; /* -d argument */ Boolean debugVflag; /* -dV */ Boolean noExecute; /* -n flag */ Boolean noRecursiveExecute; /* -N flag */ Boolean keepgoing; /* -k flag */ Boolean queryFlag; /* -q flag */ Boolean touchFlag; /* -t flag */ Boolean enterFlag; /* -w flag */ Boolean enterFlagObj; /* -w and objdir != srcdir */ Boolean ignoreErrors; /* -i flag */ Boolean beSilent; /* -s flag */ Boolean oldVars; /* variable substitution style */ Boolean checkEnvFirst; /* -e flag */ Boolean parseWarnFatal; /* -W flag */ Boolean jobServer; /* -J flag */ static int jp_0 = -1, jp_1 = -1; /* ends of parent job pipe */ Boolean varNoExportEnv; /* -X flag */ Boolean doing_depend; /* Set while reading .depend */ static Boolean jobsRunning; /* TRUE if the jobs might be running */ static const char * tracefile; static void MainParseArgs(int, char **); static int ReadMakefile(const void *, const void *); static void usage(void) MAKE_ATTR_DEAD; static Boolean ignorePWD; /* if we use -C, PWD is meaningless */ static char objdir[MAXPATHLEN + 1]; /* where we chdir'ed to */ char curdir[MAXPATHLEN + 1]; /* Startup directory */ char *progname; /* the program name */ char *makeDependfile; pid_t myPid; int makelevel; Boolean forceJobs = FALSE; /* * On some systems MACHINE is defined as something other than * what we want. */ #ifdef FORCE_MACHINE # undef MACHINE # define MACHINE FORCE_MACHINE #endif extern Lst parseIncPath; /* * For compatibility with the POSIX version of MAKEFLAGS that includes * all the options with out -, convert flags to -f -l -a -g -s. 
*/ static char * explode(const char *flags) { size_t len; char *nf, *st; const char *f; if (flags == NULL) return NULL; for (f = flags; *f; f++) if (!isalpha((unsigned char)*f)) break; if (*f) return bmake_strdup(flags); len = strlen(flags); st = nf = bmake_malloc(len * 3 + 1); while (*flags) { *nf++ = '-'; *nf++ = *flags++; *nf++ = ' '; } *nf = '\0'; return st; } static void parse_debug_options(const char *argvalue) { const char *modules; const char *mode; char *fname; int len; for (modules = argvalue; *modules; ++modules) { switch (*modules) { case 'A': debug = ~0; break; case 'a': debug |= DEBUG_ARCH; break; case 'C': debug |= DEBUG_CWD; break; case 'c': debug |= DEBUG_COND; break; case 'd': debug |= DEBUG_DIR; break; case 'e': debug |= DEBUG_ERROR; break; case 'f': debug |= DEBUG_FOR; break; case 'g': if (modules[1] == '1') { debug |= DEBUG_GRAPH1; ++modules; } else if (modules[1] == '2') { debug |= DEBUG_GRAPH2; ++modules; } else if (modules[1] == '3') { debug |= DEBUG_GRAPH3; ++modules; } break; case 'j': debug |= DEBUG_JOB; break; case 'l': debug |= DEBUG_LOUD; break; case 'M': debug |= DEBUG_META; break; case 'm': debug |= DEBUG_MAKE; break; case 'n': debug |= DEBUG_SCRIPT; break; case 'p': debug |= DEBUG_PARSE; break; case 's': debug |= DEBUG_SUFF; break; case 't': debug |= DEBUG_TARG; break; case 'V': debugVflag = TRUE; break; case 'v': debug |= DEBUG_VAR; break; case 'x': debug |= DEBUG_SHELL; break; case 'F': if (debug_file != stdout && debug_file != stderr) fclose(debug_file); if (*++modules == '+') { modules++; mode = "a"; } else mode = "w"; if (strcmp(modules, "stdout") == 0) { debug_file = stdout; goto debug_setbuf; } if (strcmp(modules, "stderr") == 0) { debug_file = stderr; goto debug_setbuf; } len = strlen(modules); fname = malloc(len + 20); memcpy(fname, modules, len + 1); /* Let the filename be modified by the pid */ if (strcmp(fname + len - 3, ".%d") == 0) snprintf(fname + len - 2, 20, "%d", getpid()); debug_file = fopen(fname, mode); if (!debug_file) { fprintf(stderr, "Cannot open debug file %s\n", fname); usage(); } free(fname); goto debug_setbuf; default: (void)fprintf(stderr, "%s: illegal argument to d option -- %c\n", progname, *modules); usage(); } } debug_setbuf: /* * Make the debug_file unbuffered, and make * stdout line buffered (unless debugfile == stdout). */ setvbuf(debug_file, NULL, _IONBF, 0); if (debug_file != stdout) { setvbuf(stdout, NULL, _IOLBF, 0); } } /*- * MainParseArgs -- * Parse a given argument vector. Called from main() and from * Main_ParseArgLine() when the .MAKEFLAGS target is used. 
* * XXX: Deal with command line overriding .MAKEFLAGS in makefile * * Results: * None * * Side Effects: * Various global and local flags will be set depending on the flags * given */ static void MainParseArgs(int argc, char **argv) { char *p; int c = '?'; int arginc; char *argvalue; const char *getopt_def; char *optscan; Boolean inOption, dashDash = FALSE; char found_path[MAXPATHLEN + 1]; /* for searching for sys.mk */ #define OPTFLAGS "BC:D:I:J:NST:V:WXd:ef:ij:km:nqrstw" /* Can't actually use getopt(3) because rescanning is not portable */ getopt_def = OPTFLAGS; rearg: inOption = FALSE; optscan = NULL; while(argc > 1) { char *getopt_spec; if(!inOption) optscan = argv[1]; c = *optscan++; arginc = 0; if(inOption) { if(c == '\0') { ++argv; --argc; inOption = FALSE; continue; } } else { if (c != '-' || dashDash) break; inOption = TRUE; c = *optscan++; } /* '-' found at some earlier point */ getopt_spec = strchr(getopt_def, c); if(c != '\0' && getopt_spec != NULL && getopt_spec[1] == ':') { /* - found, and should have an arg */ inOption = FALSE; arginc = 1; argvalue = optscan; if(*argvalue == '\0') { if (argc < 3) goto noarg; argvalue = argv[2]; arginc = 2; } } else { argvalue = NULL; } switch(c) { case '\0': arginc = 1; inOption = FALSE; break; case 'B': compatMake = TRUE; Var_Append(MAKEFLAGS, "-B", VAR_GLOBAL); Var_Set(MAKE_MODE, "compat", VAR_GLOBAL, 0); break; case 'C': if (chdir(argvalue) == -1) { (void)fprintf(stderr, "%s: chdir %s: %s\n", progname, argvalue, strerror(errno)); exit(1); } if (getcwd(curdir, MAXPATHLEN) == NULL) { (void)fprintf(stderr, "%s: %s.\n", progname, strerror(errno)); exit(2); } ignorePWD = TRUE; break; case 'D': if (argvalue == NULL || argvalue[0] == 0) goto noarg; Var_Set(argvalue, "1", VAR_GLOBAL, 0); Var_Append(MAKEFLAGS, "-D", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); break; case 'I': if (argvalue == NULL) goto noarg; Parse_AddIncludeDir(argvalue); Var_Append(MAKEFLAGS, "-I", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); break; case 'J': if (argvalue == NULL) goto noarg; if (sscanf(argvalue, "%d,%d", &jp_0, &jp_1) != 2) { (void)fprintf(stderr, "%s: internal error -- J option malformed (%s)\n", progname, argvalue); usage(); } if ((fcntl(jp_0, F_GETFD, 0) < 0) || (fcntl(jp_1, F_GETFD, 0) < 0)) { #if 0 (void)fprintf(stderr, "%s: ###### warning -- J descriptors were closed!\n", progname); exit(2); #endif jp_0 = -1; jp_1 = -1; compatMake = TRUE; } else { Var_Append(MAKEFLAGS, "-J", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); jobServer = TRUE; } break; case 'N': noExecute = TRUE; noRecursiveExecute = TRUE; Var_Append(MAKEFLAGS, "-N", VAR_GLOBAL); break; case 'S': keepgoing = FALSE; Var_Append(MAKEFLAGS, "-S", VAR_GLOBAL); break; case 'T': if (argvalue == NULL) goto noarg; tracefile = bmake_strdup(argvalue); Var_Append(MAKEFLAGS, "-T", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); break; case 'V': if (argvalue == NULL) goto noarg; printVars = TRUE; (void)Lst_AtEnd(variables, argvalue); Var_Append(MAKEFLAGS, "-V", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); break; case 'W': parseWarnFatal = TRUE; break; case 'X': varNoExportEnv = TRUE; Var_Append(MAKEFLAGS, "-X", VAR_GLOBAL); break; case 'd': if (argvalue == NULL) goto noarg; /* If '-d-opts' don't pass to children */ if (argvalue[0] == '-') argvalue++; else { Var_Append(MAKEFLAGS, "-d", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); } parse_debug_options(argvalue); break; case 'e': checkEnvFirst = TRUE; Var_Append(MAKEFLAGS, "-e", 
VAR_GLOBAL); break; case 'f': if (argvalue == NULL) goto noarg; (void)Lst_AtEnd(makefiles, argvalue); break; case 'i': ignoreErrors = TRUE; Var_Append(MAKEFLAGS, "-i", VAR_GLOBAL); break; case 'j': if (argvalue == NULL) goto noarg; forceJobs = TRUE; maxJobs = strtol(argvalue, &p, 0); if (*p != '\0' || maxJobs < 1) { (void)fprintf(stderr, "%s: illegal argument to -j -- must be positive integer!\n", progname); exit(1); } Var_Append(MAKEFLAGS, "-j", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); Var_Set(".MAKE.JOBS", argvalue, VAR_GLOBAL, 0); maxJobTokens = maxJobs; break; case 'k': keepgoing = TRUE; Var_Append(MAKEFLAGS, "-k", VAR_GLOBAL); break; case 'm': if (argvalue == NULL) goto noarg; /* look for magic parent directory search string */ if (strncmp(".../", argvalue, 4) == 0) { if (!Dir_FindHereOrAbove(curdir, argvalue+4, found_path, sizeof(found_path))) break; /* nothing doing */ (void)Dir_AddDir(sysIncPath, found_path); } else { (void)Dir_AddDir(sysIncPath, argvalue); } Var_Append(MAKEFLAGS, "-m", VAR_GLOBAL); Var_Append(MAKEFLAGS, argvalue, VAR_GLOBAL); break; case 'n': noExecute = TRUE; Var_Append(MAKEFLAGS, "-n", VAR_GLOBAL); break; case 'q': queryFlag = TRUE; /* Kind of nonsensical, wot? */ Var_Append(MAKEFLAGS, "-q", VAR_GLOBAL); break; case 'r': noBuiltins = TRUE; Var_Append(MAKEFLAGS, "-r", VAR_GLOBAL); break; case 's': beSilent = TRUE; Var_Append(MAKEFLAGS, "-s", VAR_GLOBAL); break; case 't': touchFlag = TRUE; Var_Append(MAKEFLAGS, "-t", VAR_GLOBAL); break; case 'w': enterFlag = TRUE; Var_Append(MAKEFLAGS, "-w", VAR_GLOBAL); break; case '-': dashDash = TRUE; break; default: case '?': #ifndef MAKE_NATIVE fprintf(stderr, "getopt(%s) -> %d (%c)\n", OPTFLAGS, c, c); #endif usage(); } argv += arginc; argc -= arginc; } oldVars = TRUE; /* * See if the rest of the arguments are variable assignments and * perform them if so. Else take them to be targets and stuff them * on the end of the "create" list. */ for (; argc > 1; ++argv, --argc) if (Parse_IsVar(argv[1])) { Parse_DoVar(argv[1], VAR_CMD); } else { if (!*argv[1]) Punt("illegal (null) argument."); if (*argv[1] == '-' && !dashDash) goto rearg; (void)Lst_AtEnd(create, bmake_strdup(argv[1])); } return; noarg: (void)fprintf(stderr, "%s: option requires an argument -- %c\n", progname, c); usage(); } /*- * Main_ParseArgLine -- * Used by the parse module when a .MFLAGS or .MAKEFLAGS target * is encountered and by main() when reading the .MAKEFLAGS envariable. * Takes a line of arguments and breaks it into its * component words and passes those words and the number of them to the * MainParseArgs function. * The line should have all its leading whitespace removed. * * Input: * line Line to fracture * * Results: * None * * Side Effects: * Only those that come from the various arguments. 
*/ void Main_ParseArgLine(const char *line) { char **argv; /* Manufactured argument vector */ int argc; /* Number of arguments in argv */ char *args; /* Space used by the args */ char *buf, *p1; char *argv0 = Var_Value(".MAKE", VAR_GLOBAL, &p1); size_t len; if (line == NULL) return; for (; *line == ' '; ++line) continue; if (!*line) return; #ifndef POSIX { /* * $MAKE may simply be naming the make(1) binary */ char *cp; if (!(cp = strrchr(line, '/'))) cp = line; if ((cp = strstr(cp, "make")) && strcmp(cp, "make") == 0) return; } #endif buf = bmake_malloc(len = strlen(line) + strlen(argv0) + 2); (void)snprintf(buf, len, "%s %s", argv0, line); free(p1); argv = brk_string(buf, &argc, TRUE, &args); if (argv == NULL) { Error("Unterminated quoted string [%s]", buf); free(buf); return; } free(buf); MainParseArgs(argc, argv); free(args); free(argv); } Boolean Main_SetObjdir(const char *path) { struct stat sb; char *p = NULL; char buf[MAXPATHLEN + 1]; Boolean rc = FALSE; /* expand variable substitutions */ if (strchr(path, '$') != 0) { snprintf(buf, MAXPATHLEN, "%s", path); path = p = Var_Subst(NULL, buf, VAR_GLOBAL, VARF_WANTRES); } if (path[0] != '/') { snprintf(buf, MAXPATHLEN, "%s/%s", curdir, path); path = buf; } /* look for the directory and try to chdir there */ if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode)) { if (chdir(path)) { (void)fprintf(stderr, "make warning: %s: %s.\n", path, strerror(errno)); } else { strncpy(objdir, path, MAXPATHLEN); Var_Set(".OBJDIR", objdir, VAR_GLOBAL, 0); setenv("PWD", objdir, 1); Dir_InitDot(); rc = TRUE; if (enterFlag && strcmp(objdir, curdir) != 0) enterFlagObj = TRUE; } } free(p); return rc; } /*- * ReadAllMakefiles -- * wrapper around ReadMakefile() to read all. * * Results: * TRUE if ok, FALSE on error */ static int ReadAllMakefiles(const void *p, const void *q) { return (ReadMakefile(p, q) == 0); } int str2Lst_Append(Lst lp, char *str, const char *sep) { char *cp; int n; if (!sep) sep = " \t"; for (n = 0, cp = strtok(str, sep); cp; cp = strtok(NULL, sep)) { (void)Lst_AtEnd(lp, cp); n++; } return (n); } #ifdef SIGINFO /*ARGSUSED*/ static void siginfo(int signo MAKE_ATTR_UNUSED) { char dir[MAXPATHLEN]; char str[2 * MAXPATHLEN]; int len; if (getcwd(dir, sizeof(dir)) == NULL) return; len = snprintf(str, sizeof(str), "%s: Working in: %s\n", progname, dir); if (len > 0) (void)write(STDERR_FILENO, str, (size_t)len); } #endif /* * Allow makefiles some control over the mode we run in. */ void MakeMode(const char *mode) { char *mp = NULL; if (!mode) mode = mp = Var_Subst(NULL, "${" MAKE_MODE ":tl}", VAR_GLOBAL, VARF_WANTRES); if (mode && *mode) { if (strstr(mode, "compat")) { compatMake = TRUE; forceJobs = FALSE; } #if USE_META if (strstr(mode, "meta")) meta_mode_init(mode); #endif } free(mp); } /*- * main -- * The main function, for obvious reasons. Initializes variables * and a few modules, then parses the arguments give it in the * environment and on the command line. Reads the system makefile * followed by either Makefile, makefile or the file given by the * -f argument. Sets the .MAKEFLAGS PMake variable based on all the * flags it has received by then uses either the Make or the Compat * module to create the initial list of targets. * * Results: * If -q was given, exits -1 if anything was out-of-date. Else it exits * 0. * * Side Effects: * The program exits when done. Targets are created. etc. etc. etc. 
*/ int main(int argc, char **argv) { Lst targs; /* target nodes to create -- passed to Make_Init */ Boolean outOfDate = FALSE; /* FALSE if all targets up to date */ struct stat sb, sa; char *p1, *path; char mdpath[MAXPATHLEN]; #ifdef FORCE_MACHINE const char *machine = FORCE_MACHINE; #else const char *machine = getenv("MACHINE"); #endif const char *machine_arch = getenv("MACHINE_ARCH"); char *syspath = getenv("MAKESYSPATH"); Lst sysMkPath; /* Path of sys.mk */ char *cp = NULL, *start; /* avoid faults on read-only strings */ static char defsyspath[] = _PATH_DEFSYSPATH; char found_path[MAXPATHLEN + 1]; /* for searching for sys.mk */ struct timeval rightnow; /* to initialize random seed */ struct utsname utsname; /* default to writing debug to stderr */ debug_file = stderr; #ifdef SIGINFO (void)bmake_signal(SIGINFO, siginfo); #endif /* * Set the seed to produce a different random sequence * on each program execution. */ gettimeofday(&rightnow, NULL); srandom(rightnow.tv_sec + rightnow.tv_usec); if ((progname = strrchr(argv[0], '/')) != NULL) progname++; else progname = argv[0]; #if defined(MAKE_NATIVE) || (defined(HAVE_SETRLIMIT) && defined(RLIMIT_NOFILE)) /* * get rid of resource limit on file descriptors */ { struct rlimit rl; if (getrlimit(RLIMIT_NOFILE, &rl) != -1 && rl.rlim_cur != rl.rlim_max) { rl.rlim_cur = rl.rlim_max; (void)setrlimit(RLIMIT_NOFILE, &rl); } } #endif if (uname(&utsname) == -1) { (void)fprintf(stderr, "%s: uname failed (%s).\n", progname, strerror(errno)); exit(2); } /* * Get the name of this type of MACHINE from utsname * so we can share an executable for similar machines. * (i.e. m68k: amiga hp300, mac68k, sun3, ...) * * Note that both MACHINE and MACHINE_ARCH are decided at * run-time. */ if (!machine) { #ifdef MAKE_NATIVE machine = utsname.machine; #else #ifdef MAKE_MACHINE machine = MAKE_MACHINE; #else machine = "unknown"; #endif #endif } if (!machine_arch) { #if defined(MAKE_NATIVE) && defined(HAVE_SYSCTL) && defined(CTL_HW) && defined(HW_MACHINE_ARCH) static char machine_arch_buf[sizeof(utsname.machine)]; int mib[2] = { CTL_HW, HW_MACHINE_ARCH }; size_t len = sizeof(machine_arch_buf); if (sysctl(mib, __arraycount(mib), machine_arch_buf, &len, NULL, 0) < 0) { (void)fprintf(stderr, "%s: sysctl failed (%s).\n", progname, strerror(errno)); exit(2); } machine_arch = machine_arch_buf; #else #ifndef MACHINE_ARCH #ifdef MAKE_MACHINE_ARCH machine_arch = MAKE_MACHINE_ARCH; #else machine_arch = "unknown"; #endif #else machine_arch = MACHINE_ARCH; #endif #endif } myPid = getpid(); /* remember this for vFork() */ /* * Just in case MAKEOBJDIR wants us to do something tricky. */ Var_Init(); /* Initialize the lists of variables for * parsing arguments */ Var_Set(".MAKE.OS", utsname.sysname, VAR_GLOBAL, 0); Var_Set("MACHINE", machine, VAR_GLOBAL, 0); Var_Set("MACHINE_ARCH", machine_arch, VAR_GLOBAL, 0); #ifdef MAKE_VERSION Var_Set("MAKE_VERSION", MAKE_VERSION, VAR_GLOBAL, 0); #endif Var_Set(".newline", "\n", VAR_GLOBAL, 0); /* handy for :@ loops */ /* * This is the traditional preference for makefiles. 
*/ #ifndef MAKEFILE_PREFERENCE_LIST # define MAKEFILE_PREFERENCE_LIST "makefile Makefile" #endif Var_Set(MAKEFILE_PREFERENCE, MAKEFILE_PREFERENCE_LIST, VAR_GLOBAL, 0); Var_Set(MAKE_DEPENDFILE, ".depend", VAR_GLOBAL, 0); create = Lst_Init(FALSE); makefiles = Lst_Init(FALSE); printVars = FALSE; debugVflag = FALSE; variables = Lst_Init(FALSE); beSilent = FALSE; /* Print commands as executed */ ignoreErrors = FALSE; /* Pay attention to non-zero returns */ noExecute = FALSE; /* Execute all commands */ noRecursiveExecute = FALSE; /* Execute all .MAKE targets */ keepgoing = FALSE; /* Stop on error */ allPrecious = FALSE; /* Remove targets when interrupted */ queryFlag = FALSE; /* This is not just a check-run */ noBuiltins = FALSE; /* Read the built-in rules */ touchFlag = FALSE; /* Actually update targets */ debug = 0; /* No debug verbosity, please. */ jobsRunning = FALSE; maxJobs = DEFMAXLOCAL; /* Set default local max concurrency */ maxJobTokens = maxJobs; compatMake = FALSE; /* No compat mode */ ignorePWD = FALSE; /* * Initialize the parsing, directory and variable modules to prepare * for the reading of inclusion paths and variable settings on the * command line */ /* * Initialize various variables. * MAKE also gets this name, for compatibility * .MAKEFLAGS gets set to the empty string just in case. * MFLAGS also gets initialized empty, for compatibility. */ Parse_Init(); if (argv[0][0] == '/' || strchr(argv[0], '/') == NULL) { /* * Leave alone if it is an absolute path, or if it does * not contain a '/' in which case we need to find it in * the path, like execvp(3) and the shells do. */ p1 = argv[0]; } else { /* * A relative path, canonicalize it. */ p1 = cached_realpath(argv[0], mdpath); if (!p1 || *p1 != '/' || stat(p1, &sb) < 0) { p1 = argv[0]; /* realpath failed */ } } Var_Set("MAKE", p1, VAR_GLOBAL, 0); Var_Set(".MAKE", p1, VAR_GLOBAL, 0); Var_Set(MAKEFLAGS, "", VAR_GLOBAL, 0); Var_Set(MAKEOVERRIDES, "", VAR_GLOBAL, 0); Var_Set("MFLAGS", "", VAR_GLOBAL, 0); Var_Set(".ALLTARGETS", "", VAR_GLOBAL, 0); /* some makefiles need to know this */ Var_Set(MAKE_LEVEL ".ENV", MAKE_LEVEL_ENV, VAR_CMD, 0); /* * Set some other useful macros */ { char tmp[64], *ep; makelevel = ((ep = getenv(MAKE_LEVEL_ENV)) && *ep) ? atoi(ep) : 0; if (makelevel < 0) makelevel = 0; snprintf(tmp, sizeof(tmp), "%d", makelevel); Var_Set(MAKE_LEVEL, tmp, VAR_GLOBAL, 0); snprintf(tmp, sizeof(tmp), "%u", myPid); Var_Set(".MAKE.PID", tmp, VAR_GLOBAL, 0); snprintf(tmp, sizeof(tmp), "%u", getppid()); Var_Set(".MAKE.PPID", tmp, VAR_GLOBAL, 0); } if (makelevel > 0) { char pn[1024]; snprintf(pn, sizeof(pn), "%s[%d]", progname, makelevel); progname = bmake_strdup(pn); } #ifdef USE_META meta_init(); #endif /* * First snag any flags out of the MAKE environment variable. * (Note this is *not* MAKEFLAGS since /bin/make uses that and it's * in a different format). */ #ifdef POSIX p1 = explode(getenv("MAKEFLAGS")); Main_ParseArgLine(p1); free(p1); #else Main_ParseArgLine(getenv("MAKE")); #endif /* * Find where we are (now). * We take care of PWD for the automounter below... */ if (getcwd(curdir, MAXPATHLEN) == NULL) { (void)fprintf(stderr, "%s: getcwd: %s.\n", progname, strerror(errno)); exit(2); } MainParseArgs(argc, argv); if (enterFlag) printf("%s: Entering directory `%s'\n", progname, curdir); /* * Verify that cwd is sane. 
*/ if (stat(curdir, &sa) == -1) { (void)fprintf(stderr, "%s: %s: %s.\n", progname, curdir, strerror(errno)); exit(2); } /* * All this code is so that we know where we are when we start up * on a different machine with pmake. * Overriding getcwd() with $PWD totally breaks MAKEOBJDIRPREFIX * since the value of curdir can vary depending on how we got * here. Ie sitting at a shell prompt (shell that provides $PWD) * or via subdir.mk in which case its likely a shell which does * not provide it. * So, to stop it breaking this case only, we ignore PWD if * MAKEOBJDIRPREFIX is set or MAKEOBJDIR contains a transform. */ #ifndef NO_PWD_OVERRIDE if (!ignorePWD) { char *pwd, *ptmp1 = NULL, *ptmp2 = NULL; if ((pwd = getenv("PWD")) != NULL && Var_Value("MAKEOBJDIRPREFIX", VAR_CMD, &ptmp1) == NULL) { const char *makeobjdir = Var_Value("MAKEOBJDIR", VAR_CMD, &ptmp2); if (makeobjdir == NULL || !strchr(makeobjdir, '$')) { if (stat(pwd, &sb) == 0 && sa.st_ino == sb.st_ino && sa.st_dev == sb.st_dev) (void)strncpy(curdir, pwd, MAXPATHLEN); } } free(ptmp1); free(ptmp2); } #endif Var_Set(".CURDIR", curdir, VAR_GLOBAL, 0); /* * Find the .OBJDIR. If MAKEOBJDIRPREFIX, or failing that, * MAKEOBJDIR is set in the environment, try only that value * and fall back to .CURDIR if it does not exist. * * Otherwise, try _PATH_OBJDIR.MACHINE, _PATH_OBJDIR, and * finally _PATH_OBJDIRPREFIX`pwd`, in that order. If none * of these paths exist, just use .CURDIR. */ Dir_Init(curdir); (void)Main_SetObjdir(curdir); if ((path = Var_Value("MAKEOBJDIRPREFIX", VAR_CMD, &p1)) != NULL) { (void)snprintf(mdpath, MAXPATHLEN, "%s%s", path, curdir); (void)Main_SetObjdir(mdpath); free(p1); } else if ((path = Var_Value("MAKEOBJDIR", VAR_CMD, &p1)) != NULL) { (void)Main_SetObjdir(path); free(p1); } else { (void)snprintf(mdpath, MAXPATHLEN, "%s.%s", _PATH_OBJDIR, machine); if (!Main_SetObjdir(mdpath) && !Main_SetObjdir(_PATH_OBJDIR)) { (void)snprintf(mdpath, MAXPATHLEN, "%s%s", _PATH_OBJDIRPREFIX, curdir); (void)Main_SetObjdir(mdpath); } } /* * Initialize archive, target and suffix modules in preparation for * parsing the makefile(s) */ Arch_Init(); Targ_Init(); Suff_Init(); Trace_Init(tracefile); DEFAULT = NULL; (void)time(&now); Trace_Log(MAKESTART, NULL); /* * Set up the .TARGETS variable to contain the list of targets to be * created. If none specified, make the variable empty -- the parser * will fill the thing in with the default or .MAIN target. */ if (!Lst_IsEmpty(create)) { LstNode ln; for (ln = Lst_First(create); ln != NULL; ln = Lst_Succ(ln)) { char *name = (char *)Lst_Datum(ln); Var_Append(".TARGETS", name, VAR_GLOBAL); } } else Var_Set(".TARGETS", "", VAR_GLOBAL, 0); /* * If no user-supplied system path was given (through the -m option) * add the directories from the DEFSYSPATH (more than one may be given * as dir1:...:dirn) to the system include path. 
*/ if (syspath == NULL || *syspath == '\0') syspath = defsyspath; else syspath = bmake_strdup(syspath); for (start = syspath; *start != '\0'; start = cp) { for (cp = start; *cp != '\0' && *cp != ':'; cp++) continue; if (*cp == ':') { *cp++ = '\0'; } /* look for magic parent directory search string */ if (strncmp(".../", start, 4) != 0) { (void)Dir_AddDir(defIncPath, start); } else { if (Dir_FindHereOrAbove(curdir, start+4, found_path, sizeof(found_path))) { (void)Dir_AddDir(defIncPath, found_path); } } } if (syspath != defsyspath) free(syspath); /* * Read in the built-in rules first, followed by the specified * makefile, if it was (makefile != NULL), or the default * makefile and Makefile, in that order, if it wasn't. */ if (!noBuiltins) { LstNode ln; sysMkPath = Lst_Init(FALSE); Dir_Expand(_PATH_DEFSYSMK, Lst_IsEmpty(sysIncPath) ? defIncPath : sysIncPath, sysMkPath); if (Lst_IsEmpty(sysMkPath)) Fatal("%s: no system rules (%s).", progname, _PATH_DEFSYSMK); ln = Lst_Find(sysMkPath, NULL, ReadMakefile); if (ln == NULL) Fatal("%s: cannot open %s.", progname, (char *)Lst_Datum(ln)); } if (!Lst_IsEmpty(makefiles)) { LstNode ln; ln = Lst_Find(makefiles, NULL, ReadAllMakefiles); if (ln != NULL) Fatal("%s: cannot open %s.", progname, (char *)Lst_Datum(ln)); } else { p1 = Var_Subst(NULL, "${" MAKEFILE_PREFERENCE "}", VAR_CMD, VARF_WANTRES); if (p1) { (void)str2Lst_Append(makefiles, p1, NULL); (void)Lst_Find(makefiles, NULL, ReadMakefile); free(p1); } } /* In particular suppress .depend for '-r -V .OBJDIR -f /dev/null' */ if (!noBuiltins || !printVars) { makeDependfile = Var_Subst(NULL, "${.MAKE.DEPENDFILE:T}", VAR_CMD, VARF_WANTRES); doing_depend = TRUE; (void)ReadMakefile(makeDependfile, NULL); doing_depend = FALSE; } if (enterFlagObj) printf("%s: Entering directory `%s'\n", progname, objdir); MakeMode(NULL); Var_Append("MFLAGS", Var_Value(MAKEFLAGS, VAR_GLOBAL, &p1), VAR_GLOBAL); free(p1); if (!forceJobs && !compatMake && Var_Exists(".MAKE.JOBS", VAR_GLOBAL)) { char *value; int n; value = Var_Subst(NULL, "${.MAKE.JOBS}", VAR_GLOBAL, VARF_WANTRES); n = strtol(value, NULL, 0); if (n < 1) { (void)fprintf(stderr, "%s: illegal value for .MAKE.JOBS -- must be positive integer!\n", progname); exit(1); } if (n != maxJobs) { Var_Append(MAKEFLAGS, "-j", VAR_GLOBAL); Var_Append(MAKEFLAGS, value, VAR_GLOBAL); } maxJobs = n; maxJobTokens = maxJobs; forceJobs = TRUE; free(value); } /* * Be compatible if user did not specify -j and did not explicitly * turned compatibility on */ if (!compatMake && !forceJobs) { compatMake = TRUE; } if (!compatMake) Job_ServerStart(maxJobTokens, jp_0, jp_1); if (DEBUG(JOB)) fprintf(debug_file, "job_pipe %d %d, maxjobs %d, tokens %d, compat %d\n", jp_0, jp_1, maxJobs, maxJobTokens, compatMake); Main_ExportMAKEFLAGS(TRUE); /* initial export */ /* * For compatibility, look at the directories in the VPATH variable * and add them to the search path, if the variable is defined. The * variable's value is in the same format as the PATH envariable, i.e. * ::... 
*/ if (Var_Exists("VPATH", VAR_CMD)) { char *vpath, savec; /* * GCC stores string constants in read-only memory, but * Var_Subst will want to write this thing, so store it * in an array */ static char VPATH[] = "${VPATH}"; vpath = Var_Subst(NULL, VPATH, VAR_CMD, VARF_WANTRES); path = vpath; do { /* skip to end of directory */ for (cp = path; *cp != ':' && *cp != '\0'; cp++) continue; /* Save terminator character so know when to stop */ savec = *cp; *cp = '\0'; /* Add directory to search path */ (void)Dir_AddDir(dirSearchPath, path); *cp = savec; path = cp + 1; } while (savec == ':'); free(vpath); } /* * Now that all search paths have been read for suffixes et al, it's * time to add the default search path to their lists... */ Suff_DoPaths(); /* * Propagate attributes through :: dependency lists. */ Targ_Propagate(); /* print the initial graph, if the user requested it */ if (DEBUG(GRAPH1)) Targ_PrintGraph(1); /* print the values of any variables requested by the user */ if (printVars) { LstNode ln; Boolean expandVars; if (debugVflag) expandVars = FALSE; else expandVars = getBoolean(".MAKE.EXPAND_VARIABLES", FALSE); for (ln = Lst_First(variables); ln != NULL; ln = Lst_Succ(ln)) { char *var = (char *)Lst_Datum(ln); char *value; if (strchr(var, '$')) { value = p1 = Var_Subst(NULL, var, VAR_GLOBAL, VARF_WANTRES); } else if (expandVars) { char tmp[128]; if (snprintf(tmp, sizeof(tmp), "${%s}", var) >= (int)(sizeof(tmp))) Fatal("%s: variable name too big: %s", progname, var); value = p1 = Var_Subst(NULL, tmp, VAR_GLOBAL, VARF_WANTRES); } else { value = Var_Value(var, VAR_GLOBAL, &p1); } printf("%s\n", value ? value : ""); free(p1); } } else { /* * Have now read the entire graph and need to make a list of * targets to create. If none was given on the command line, * we consult the parsing module to find the main target(s) * to create. */ if (Lst_IsEmpty(create)) targs = Parse_MainName(); else targs = Targ_FindList(create, TARG_CREATE); if (!compatMake) { /* * Initialize job module before traversing the graph * now that any .BEGIN and .END targets have been read. * This is done only if the -q flag wasn't given * (to prevent the .BEGIN from being executed should * it exist). */ if (!queryFlag) { Job_Init(); jobsRunning = TRUE; } /* Traverse the graph, checking on all the targets */ outOfDate = Make_Run(targs); } else { /* * Compat_Init will take care of creating all the * targets as well as initializing the module. */ Compat_Run(targs); } } #ifdef CLEANUP Lst_Destroy(targs, NULL); Lst_Destroy(variables, NULL); Lst_Destroy(makefiles, NULL); Lst_Destroy(create, (FreeProc *)free); #endif /* print the graph now it's been processed if the user requested it */ if (DEBUG(GRAPH2)) Targ_PrintGraph(2); Trace_Log(MAKEEND, 0); if (enterFlagObj) printf("%s: Leaving directory `%s'\n", progname, objdir); if (enterFlag) printf("%s: Leaving directory `%s'\n", progname, curdir); #ifdef USE_META meta_finish(); #endif Suff_End(); Targ_End(); Arch_End(); Var_End(); Parse_End(); Dir_End(); Job_End(); Trace_End(); return outOfDate ? 1 : 0; } /*- * ReadMakefile -- * Open and parse the given makefile. * * Results: * 0 if ok. -1 if couldn't open file. 
* * Side Effects: * lots */ static int ReadMakefile(const void *p, const void *q MAKE_ATTR_UNUSED) { const char *fname = p; /* makefile to read */ int fd; size_t len = MAXPATHLEN; char *name, *path = bmake_malloc(len); if (!strcmp(fname, "-")) { Parse_File(NULL /*stdin*/, -1); Var_Set("MAKEFILE", "", VAR_INTERNAL, 0); } else { /* if we've chdir'd, rebuild the path name */ if (strcmp(curdir, objdir) && *fname != '/') { size_t plen = strlen(curdir) + strlen(fname) + 2; if (len < plen) path = bmake_realloc(path, len = 2 * plen); (void)snprintf(path, len, "%s/%s", curdir, fname); fd = open(path, O_RDONLY); if (fd != -1) { fname = path; goto found; } /* If curdir failed, try objdir (ala .depend) */ plen = strlen(objdir) + strlen(fname) + 2; if (len < plen) path = bmake_realloc(path, len = 2 * plen); (void)snprintf(path, len, "%s/%s", objdir, fname); fd = open(path, O_RDONLY); if (fd != -1) { fname = path; goto found; } } else { fd = open(fname, O_RDONLY); if (fd != -1) goto found; } /* look in -I and system include directories. */ name = Dir_FindFile(fname, parseIncPath); if (!name) name = Dir_FindFile(fname, Lst_IsEmpty(sysIncPath) ? defIncPath : sysIncPath); if (!name || (fd = open(name, O_RDONLY)) == -1) { free(name); free(path); return(-1); } fname = name; /* * set the MAKEFILE variable desired by System V fans -- the * placement of the setting here means it gets set to the last * makefile specified, as it is set by SysV make. */ found: if (!doing_depend) Var_Set("MAKEFILE", fname, VAR_INTERNAL, 0); Parse_File(fname, fd); } free(path); return(0); } /*- * Cmd_Exec -- * Execute the command in cmd, and return the output of that command * in a string. * * Results: * A string containing the output of the command, or the empty string * If errnum is not NULL, it contains the reason for the command failure * * Side Effects: * The string must be freed by the caller. */ char * Cmd_Exec(const char *cmd, const char **errnum) { const char *args[4]; /* Args for invoking the shell */ int fds[2]; /* Pipe streams */ int cpid; /* Child PID */ int pid; /* PID from wait() */ char *res; /* result */ WAIT_T status; /* command exit status */ Buffer buf; /* buffer to store the result */ char *cp; int cc; /* bytes read, or -1 */ int savederr; /* saved errno */ *errnum = NULL; if (!shellName) Shell_Init(); /* * Set up arguments for shell */ args[0] = shellName; args[1] = "-c"; args[2] = cmd; args[3] = NULL; /* * Open a pipe for fetching its output */ if (pipe(fds) == -1) { *errnum = "Couldn't create pipe for \"%s\""; goto bad; } /* * Fork */ switch (cpid = vFork()) { case 0: /* * Close input side of pipe */ (void)close(fds[0]); /* * Duplicate the output stream to the shell's output, then * shut the extra thing down. Note we don't fetch the error * stream...why not? Why? */ (void)dup2(fds[1], 1); (void)close(fds[1]); Var_ExportVars(); (void)execv(shellPath, UNCONST(args)); _exit(1); /*NOTREACHED*/ case -1: *errnum = "Couldn't exec \"%s\""; goto bad; default: /* * No need for the writing half */ (void)close(fds[1]); savederr = 0; Buf_Init(&buf, 0); do { char result[BUFSIZ]; cc = read(fds[0], result, sizeof(result)); if (cc > 0) Buf_AddBytes(&buf, cc, result); } while (cc > 0 || (cc == -1 && errno == EINTR)); if (cc == -1) savederr = errno; /* * Close the input side of the pipe. */ (void)close(fds[0]); /* * Wait for the process to exit. 
*/ while(((pid = waitpid(cpid, &status, 0)) != cpid) && (pid >= 0)) { JobReapChild(pid, status, FALSE); continue; } cc = Buf_Size(&buf); res = Buf_Destroy(&buf, FALSE); if (savederr != 0) *errnum = "Couldn't read shell's output for \"%s\""; if (WIFSIGNALED(status)) *errnum = "\"%s\" exited on a signal"; else if (WEXITSTATUS(status) != 0) *errnum = "\"%s\" returned non-zero status"; /* * Null-terminate the result, convert newlines to spaces and * install it in the variable. */ res[cc] = '\0'; cp = &res[cc]; if (cc > 0 && *--cp == '\n') { /* * A final newline is just stripped */ *cp-- = '\0'; } while (cp >= res) { if (*cp == '\n') { *cp = ' '; } cp--; } break; } return res; bad: res = bmake_malloc(1); *res = '\0'; return res; } /*- * Error -- * Print an error message given its format. * * Results: * None. * * Side Effects: * The message is printed. */ /* VARARGS */ void Error(const char *fmt, ...) { va_list ap; FILE *err_file; err_file = debug_file; if (err_file == stdout) err_file = stderr; (void)fflush(stdout); for (;;) { va_start(ap, fmt); fprintf(err_file, "%s: ", progname); (void)vfprintf(err_file, fmt, ap); va_end(ap); (void)fprintf(err_file, "\n"); (void)fflush(err_file); if (err_file == stderr) break; err_file = stderr; } } /*- * Fatal -- * Produce a Fatal error message. If jobs are running, waits for them * to finish. * * Results: * None * * Side Effects: * The program exits */ /* VARARGS */ void Fatal(const char *fmt, ...) { va_list ap; va_start(ap, fmt); if (jobsRunning) Job_Wait(); (void)fflush(stdout); (void)vfprintf(stderr, fmt, ap); va_end(ap); (void)fprintf(stderr, "\n"); (void)fflush(stderr); PrintOnError(NULL, NULL); if (DEBUG(GRAPH2) || DEBUG(GRAPH3)) Targ_PrintGraph(2); Trace_Log(MAKEERROR, 0); exit(2); /* Not 1 so -q can distinguish error */ } /* * Punt -- * Major exception once jobs are being created. Kills all jobs, prints * a message and exits. * * Results: * None * * Side Effects: * All children are killed indiscriminately and the program Lib_Exits */ /* VARARGS */ void Punt(const char *fmt, ...) { va_list ap; va_start(ap, fmt); (void)fflush(stdout); (void)fprintf(stderr, "%s: ", progname); (void)vfprintf(stderr, fmt, ap); va_end(ap); (void)fprintf(stderr, "\n"); (void)fflush(stderr); PrintOnError(NULL, NULL); DieHorribly(); } /*- * DieHorribly -- * Exit without giving a message. * * Results: * None * * Side Effects: * A big one... */ void DieHorribly(void) { if (jobsRunning) Job_AbortAll(); if (DEBUG(GRAPH2)) Targ_PrintGraph(2); Trace_Log(MAKEERROR, 0); exit(2); /* Not 1, so -q can distinguish error */ } /* * Finish -- * Called when aborting due to errors in child shell to signal * abnormal exit. * * Results: * None * * Side Effects: * The program exits */ void Finish(int errors) /* number of errors encountered in Make_Make */ { Fatal("%d error%s", errors, errors == 1 ? "" : "s"); } /* * eunlink -- * Remove a file carefully, avoiding directories. */ int eunlink(const char *file) { struct stat st; if (lstat(file, &st) == -1) return -1; if (S_ISDIR(st.st_mode)) { errno = EISDIR; return -1; } return unlink(file); } /* * execError -- * Print why exec failed, avoiding stdio. 
*/ void execError(const char *af, const char *av) { #ifdef USE_IOVEC int i = 0; struct iovec iov[8]; #define IOADD(s) \ (void)(iov[i].iov_base = UNCONST(s), \ iov[i].iov_len = strlen(iov[i].iov_base), \ i++) #else #define IOADD(s) (void)write(2, s, strlen(s)) #endif IOADD(progname); IOADD(": "); IOADD(af); IOADD("("); IOADD(av); IOADD(") failed ("); IOADD(strerror(errno)); IOADD(")\n"); #ifdef USE_IOVEC while (writev(2, iov, 8) == -1 && errno == EAGAIN) continue; #endif } /* * usage -- * exit with usage message */ static void usage(void) { char *p; if ((p = strchr(progname, '[')) != NULL) *p = '\0'; (void)fprintf(stderr, "usage: %s [-BeikNnqrstWwX] \n\ [-C directory] [-D variable] [-d flags] [-f makefile]\n\ [-I directory] [-J private] [-j max_jobs] [-m directory] [-T file]\n\ [-V variable] [variable=value] [target ...]\n", progname); exit(2); } /* * realpath(3) can get expensive, cache results... */ char * cached_realpath(const char *pathname, char *resolved) { static GNode *cache; char *rp, *cp; if (!pathname || !pathname[0]) return NULL; if (!cache) { cache = Targ_NewGN("Realpath"); #ifndef DEBUG_REALPATH_CACHE cache->flags = INTERNAL; #endif } - rp = Var_Value(pathname, cache, &cp); - if (rp) { + if ((rp = Var_Value(pathname, cache, &cp)) != NULL) { /* a hit */ strlcpy(resolved, rp, MAXPATHLEN); - } else if ((rp = realpath(pathname, resolved))) { + } else if ((rp = realpath(pathname, resolved)) != NULL) { Var_Set(pathname, rp, cache, 0); } free(cp); return rp ? resolved : NULL; } int PrintAddr(void *a, void *b) { printf("%lx ", (unsigned long) a); return b ? 0 : 0; } +static int +addErrorCMD(void *cmdp, void *gnp) +{ + if (cmdp == NULL) + return 1; /* stop */ + Var_Append(".ERROR_CMD", cmdp, VAR_GLOBAL); + return 0; +} void PrintOnError(GNode *gn, const char *s) { static GNode *en = NULL; char tmp[64]; char *cp; if (s) printf("%s", s); printf("\n%s: stopped in %s\n", progname, curdir); if (en) return; /* we've been here! */ if (gn) { /* * We can print this even if there is no .ERROR target. */ Var_Set(".ERROR_TARGET", gn->name, VAR_GLOBAL, 0); + Var_Delete(".ERROR_CMD", VAR_GLOBAL); + Lst_ForEach(gn->commands, addErrorCMD, gn); } strncpy(tmp, "${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'\n@}", sizeof(tmp) - 1); cp = Var_Subst(NULL, tmp, VAR_GLOBAL, VARF_WANTRES); if (cp) { if (*cp) printf("%s", cp); free(cp); } fflush(stdout); /* * Finally, see if there is a .ERROR target, and run it if so. */ en = Targ_FindNode(".ERROR", TARG_NOCREATE); if (en) { en->type |= OP_SPECIAL; Compat_Make(en, en); } } void Main_ExportMAKEFLAGS(Boolean first) { static int once = 1; char tmp[64]; char *s; if (once != first) return; once = 0; strncpy(tmp, "${.MAKEFLAGS} ${.MAKEOVERRIDES:O:u:@v@$v=${$v:Q}@}", sizeof(tmp)); s = Var_Subst(NULL, tmp, VAR_CMD, VARF_WANTRES); if (s && *s) { #ifdef POSIX setenv("MAKEFLAGS", s, 1); #else setenv("MAKE", s, 1); #endif } } char * getTmpdir(void) { static char *tmpdir = NULL; if (!tmpdir) { struct stat st; /* * Honor $TMPDIR but only if it is valid. * Ensure it ends with /. */ tmpdir = Var_Subst(NULL, "${TMPDIR:tA:U" _PATH_TMP "}/", VAR_GLOBAL, VARF_WANTRES); if (stat(tmpdir, &st) < 0 || !S_ISDIR(st.st_mode)) { free(tmpdir); tmpdir = bmake_strdup(_PATH_TMP); } } return tmpdir; } /* * Create and open a temp file using "pattern". * If "fnamep" is provided set it to a copy of the filename created. * Otherwise unlink the file once open. 
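 * Returns the open file descriptor; on failure Punt() is called,
 * so the caller never sees an error return.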
*/ int mkTempFile(const char *pattern, char **fnamep) { static char *tmpdir = NULL; char tfile[MAXPATHLEN]; int fd; if (!pattern) pattern = TMPPAT; if (!tmpdir) tmpdir = getTmpdir(); if (pattern[0] == '/') { snprintf(tfile, sizeof(tfile), "%s", pattern); } else { snprintf(tfile, sizeof(tfile), "%s%s", tmpdir, pattern); } if ((fd = mkstemp(tfile)) < 0) Punt("Could not create temporary file %s: %s", tfile, strerror(errno)); if (fnamep) { *fnamep = bmake_strdup(tfile); } else { unlink(tfile); /* we just want the descriptor */ } return fd; } /* * Convert a string representation of a boolean. * Anything that looks like "No", "False", "Off", "0" etc, * is FALSE, otherwise TRUE. */ Boolean s2Boolean(const char *s, Boolean bf) { if (s) { switch(*s) { case '\0': /* not set - the default wins */ break; case '0': case 'F': case 'f': case 'N': case 'n': bf = FALSE; break; case 'O': case 'o': switch (s[1]) { case 'F': case 'f': bf = FALSE; break; default: bf = TRUE; break; } break; default: bf = TRUE; break; } } return (bf); } /* * Return a Boolean based on setting of a knob. * * If the knob is not set, the supplied default is the return value. * If set, anything that looks or smells like "No", "False", "Off", "0" etc, * is FALSE, otherwise TRUE. */ Boolean getBoolean(const char *name, Boolean bf) { char tmp[64]; char *cp; if (snprintf(tmp, sizeof(tmp), "${%s:U:tl}", name) < (int)(sizeof(tmp))) { cp = Var_Subst(NULL, tmp, VAR_GLOBAL, VARF_WANTRES); if (cp) { bf = s2Boolean(cp, bf); free(cp); } } return (bf); } Index: projects/clang390-import/contrib/bmake/make.1 =================================================================== --- projects/clang390-import/contrib/bmake/make.1 (revision 305686) +++ projects/clang390-import/contrib/bmake/make.1 (revision 305687) @@ -1,2337 +1,2352 @@ -.\" $NetBSD: make.1,v 1.259 2016/06/03 07:07:37 wiz Exp $ +.\" $NetBSD: make.1,v 1.262 2016/08/18 19:23:20 wiz Exp $ .\" .\" Copyright (c) 1990, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" from: @(#)make.1 8.4 (Berkeley) 3/19/94 .\" -.Dd June 2, 2016 +.Dd August 15, 2016 .Dt MAKE 1 .Os .Sh NAME .Nm make .Nd maintain program dependencies .Sh SYNOPSIS .Nm .Op Fl BeikNnqrstWwX .Op Fl C Ar directory .Op Fl D Ar variable .Op Fl d Ar flags .Op Fl f Ar makefile .Op Fl I Ar directory .Op Fl J Ar private .Op Fl j Ar max_jobs .Op Fl m Ar directory .Op Fl T Ar file .Op Fl V Ar variable .Op Ar variable=value .Op Ar target ... .Sh DESCRIPTION .Nm is a program designed to simplify the maintenance of other programs. Its input is a list of specifications as to the files upon which programs and other files depend. If no .Fl f Ar makefile makefile option is given, .Nm will try to open .Ql Pa makefile then .Ql Pa Makefile in order to find the specifications. If the file .Ql Pa .depend exists, it is read (see .Xr mkdep 1 ) . .Pp This manual page is intended as a reference document only. For a more thorough description of .Nm and makefiles, please refer to .%T "PMake \- A Tutorial" . .Pp .Nm will prepend the contents of the .Va MAKEFLAGS environment variable to the command line arguments before parsing them. .Pp The options are as follows: .Bl -tag -width Ds .It Fl B Try to be backwards compatible by executing a single shell per command and by executing the commands to make the sources of a dependency line in sequence. .It Fl C Ar directory Change to .Ar directory before reading the makefiles or doing anything else. If multiple .Fl C options are specified, each is interpreted relative to the previous one: .Fl C Pa / Fl C Pa etc is equivalent to .Fl C Pa /etc . .It Fl D Ar variable Define .Ar variable to be 1, in the global context. .It Fl d Ar [-]flags Turn on debugging, and specify which portions of .Nm are to print debugging information. Unless the flags are preceded by .Ql \- they are added to the .Va MAKEFLAGS environment variable and will be processed by any child make processes. By default, debugging information is printed to standard error, but this can be changed using the .Ar F debugging flag. The debugging output is always unbuffered; in addition, if debugging is enabled but debugging output is not directed to standard output, then the standard output is line buffered. .Ar Flags is one or more of the following: .Bl -tag -width Ds .It Ar A Print all possible debugging information; equivalent to specifying all of the debugging flags. .It Ar a Print debugging information about archive searching and caching. .It Ar C Print debugging information about current working directory. .It Ar c Print debugging information about conditional evaluation. .It Ar d Print debugging information about directory searching and caching. .It Ar e Print debugging information about failed commands and targets. .It Ar F Ns Oo Sy \&+ Oc Ns Ar filename Specify where debugging output is written. This must be the last flag, because it consumes the remainder of the argument. If the character immediately after the .Ql F flag is .Ql \&+ , then the file will be opened in append mode; otherwise the file will be overwritten. If the file name is .Ql stdout or .Ql stderr then debugging output will be written to the standard output or standard error output file descriptors respectively (and the .Ql \&+ option has no effect). Otherwise, the output will be written to the named file. If the file name ends .Ql .%d then the .Ql %d is replaced by the pid. .It Ar f Print debugging information about loop evaluation. .It Ar "g1" Print the input graph before making anything. 
.It Ar "g2" Print the input graph after making everything, or before exiting on error. .It Ar "g3" Print the input graph before exiting on error. .It Ar j Print debugging information about running multiple shells. .It Ar l Print commands in Makefiles regardless of whether or not they are prefixed by .Ql @ or other "quiet" flags. Also known as "loud" behavior. .It Ar M Print debugging information about "meta" mode decisions about targets. .It Ar m Print debugging information about making targets, including modification dates. .It Ar n Don't delete the temporary command scripts created when running commands. These temporary scripts are created in the directory referred to by the .Ev TMPDIR environment variable, or in .Pa /tmp if .Ev TMPDIR is unset or set to the empty string. The temporary scripts are created by .Xr mkstemp 3 , and have names of the form .Pa makeXXXXXX . .Em NOTE : This can create many files in .Ev TMPDIR or .Pa /tmp , so use with care. .It Ar p Print debugging information about makefile parsing. .It Ar s Print debugging information about suffix-transformation rules. .It Ar t Print debugging information about target list maintenance. .It Ar V Force the .Fl V option to print raw values of variables. .It Ar v Print debugging information about variable assignment. .It Ar x Run shell commands with .Fl x so the actual commands are printed as they are executed. .El .It Fl e Specify that environment variables override macro assignments within makefiles. .It Fl f Ar makefile Specify a makefile to read instead of the default .Ql Pa makefile . If .Ar makefile is .Ql Fl , standard input is read. Multiple makefiles may be specified, and are read in the order specified. .It Fl I Ar directory Specify a directory in which to search for makefiles and included makefiles. The system makefile directory (or directories, see the .Fl m option) is automatically included as part of this list. .It Fl i Ignore non-zero exit of shell commands in the makefile. Equivalent to specifying .Ql Fl before each command line in the makefile. .It Fl J Ar private This option should .Em not be specified by the user. .Pp When the .Ar j option is in use in a recursive build, this option is passed by a make to child makes to allow all the make processes in the build to cooperate to avoid overloading the system. .It Fl j Ar max_jobs Specify the maximum number of jobs that .Nm may have running at any one time. The value is saved in .Va .MAKE.JOBS . Turns compatibility mode off, unless the .Ar B flag is also specified. When compatibility mode is off, all commands associated with a target are executed in a single shell invocation as opposed to the traditional one shell invocation per line. This can break traditional scripts which change directories on each command invocation and then expect to start with a fresh environment on the next line. It is more efficient to correct the scripts rather than turn backwards compatibility on. .It Fl k Continue processing after errors are encountered, but only on those targets that do not depend on the target whose creation caused the error. .It Fl m Ar directory Specify a directory in which to search for sys.mk and makefiles included via the .Ao Ar file Ac Ns -style include statement. The .Fl m option can be used multiple times to form a search path. This path will override the default system include path: /usr/share/mk. Furthermore the system include path will be appended to the search path used for .Qo Ar file Qc Ns -style include statements (see the .Fl I option). 
.Pp If a file or directory name in the .Fl m argument (or the .Ev MAKESYSPATH environment variable) starts with the string .Qq \&.../ then .Nm will search for the specified file or directory named in the remaining part of the argument string. The search starts with the current directory of the Makefile and then works upward towards the root of the file system. If the search is successful, then the resulting directory replaces the .Qq \&.../ specification in the .Fl m argument. If used, this feature allows .Nm to easily search in the current source tree for customized sys.mk files (e.g., by using .Qq \&.../mk/sys.mk as an argument). .It Fl n Display the commands that would have been executed, but do not actually execute them unless the target depends on the .MAKE special source (see below). .It Fl N Display the commands which would have been executed, but do not actually execute any of them; useful for debugging top-level makefiles without descending into subdirectories. .It Fl q Do not execute any commands, but exit 0 if the specified targets are up-to-date and 1, otherwise. .It Fl r Do not use the built-in rules specified in the system makefile. .It Fl s Do not echo any commands as they are executed. Equivalent to specifying .Ql Ic @ before each command line in the makefile. .It Fl T Ar tracefile When used with the .Fl j flag, append a trace record to .Ar tracefile for each job started and completed. .It Fl t Rather than re-building a target as specified in the makefile, create it or update its modification time to make it appear up-to-date. .It Fl V Ar variable Print .Nm Ns 's idea of the value of .Ar variable , in the global context. Do not build any targets. Multiple instances of this option may be specified; the variables will be printed one per line, with a blank line for each null or undefined variable. If .Ar variable contains a .Ql \&$ then the value will be expanded before printing. .It Fl W Treat any warnings during makefile parsing as errors. .It Fl w Print entering and leaving directory messages, pre and post processing. .It Fl X Don't export variables passed on the command line to the environment individually. Variables passed on the command line are still exported via the .Va MAKEFLAGS environment variable. This option may be useful on systems which have a small limit on the size of command arguments. .It Ar variable=value Set the value of the variable .Ar variable to .Ar value . Normally, all values passed on the command line are also exported to sub-makes in the environment. The .Fl X flag disables this behavior. Variable assignments should follow options for POSIX compatibility but no ordering is enforced. .El .Pp There are seven different types of lines in a makefile: file dependency specifications, shell commands, variable assignments, include statements, conditional directives, for loops, and comments. .Pp In general, lines may be continued from one line to the next by ending them with a backslash .Pq Ql \e . The trailing newline character and initial whitespace on the following line are compressed into a single space. .Sh FILE DEPENDENCY SPECIFICATIONS Dependency lines consist of one or more targets, an operator, and zero or more sources. This creates a relationship where the targets .Dq depend on the sources and are usually created from them. The exact relationship between the target and the source is determined by the operator that separates them. 
The three operators are as follows: .Bl -tag -width flag .It Ic \&: A target is considered out-of-date if its modification time is less than those of any of its sources. Sources for a target accumulate over dependency lines when this operator is used. The target is removed if .Nm is interrupted. .It Ic \&! Targets are always re-created, but not until all sources have been examined and re-created as necessary. Sources for a target accumulate over dependency lines when this operator is used. The target is removed if .Nm is interrupted. .It Ic \&:: If no sources are specified, the target is always re-created. Otherwise, a target is considered out-of-date if any of its sources has been modified more recently than the target. Sources for a target do not accumulate over dependency lines when this operator is used. The target will not be removed if .Nm is interrupted. .El .Pp Targets and sources may contain the shell wildcard values .Ql \&? , .Ql * , .Ql [] , and .Ql {} . The values .Ql \&? , .Ql * , and .Ql [] may only be used as part of the final component of the target or source, and must be used to describe existing files. The value .Ql {} need not necessarily be used to describe existing files. Expansion is in directory order, not alphabetically as done in the shell. .Sh SHELL COMMANDS Each target may have associated with it one or more lines of shell commands, normally used to create the target. Each of the lines in this script .Em must be preceded by a tab. (For historical reasons, spaces are not accepted.) While targets can appear in many dependency lines if desired, by default only one of these rules may be followed by a creation script. If the .Ql Ic \&:: operator is used, however, all rules may include scripts and the scripts are executed in the order found. .Pp Each line is treated as a separate shell command, unless the end of line is escaped with a backslash .Pq Ql \e in which case that line and the next are combined. .\" The escaped newline is retained and passed to the shell, which .\" normally ignores it. .\" However, the tab at the beginning of the following line is removed. If the first characters of the command are any combination of .Ql Ic @ , .Ql Ic + , or .Ql Ic \- , the command is treated specially. A .Ql Ic @ causes the command not to be echoed before it is executed. A .Ql Ic + causes the command to be executed even when .Fl n is given. This is similar to the effect of the .MAKE special source, except that the effect can be limited to a single line of a script. A .Ql Ic \- in compatibility mode causes any non-zero exit status of the command line to be ignored. .Pp When .Nm is run in jobs mode with .Fl j Ar max_jobs , the entire script for the target is fed to a single instance of the shell. In compatibility (non-jobs) mode, each command is run in a separate process. If the command contains any shell meta characters .Pq Ql #=|^(){};&<>*?[]:$`\e\en it will be passed to the shell; otherwise .Nm will attempt direct execution. If a line starts with .Ql Ic \- and the shell has ErrCtl enabled then failure of the command line will be ignored as in compatibility mode. Otherwise .Ql Ic \- affects the entire job; the script will stop at the first command line that fails, but the target will not be deemed to have failed. .Pp Makefiles should be written so that the mode of .Nm operation does not change their behavior. 
For example, any command which needs to use .Dq cd or .Dq chdir without potentially changing the directory for subsequent commands should be put in parentheses so it executes in a subshell. To force the use of one shell, escape the line breaks so as to make the whole script one command. For example: .Bd -literal -offset indent avoid-chdir-side-effects: @echo Building $@ in `pwd` @(cd ${.CURDIR} && ${MAKE} $@) @echo Back in `pwd` ensure-one-shell-regardless-of-mode: @echo Building $@ in `pwd`; \e (cd ${.CURDIR} && ${MAKE} $@); \e echo Back in `pwd` .Ed .Pp Since .Nm will .Xr chdir 2 to .Ql Va .OBJDIR before executing any targets, each child process starts with that as its current working directory. .Sh VARIABLE ASSIGNMENTS Variables in make are much like variables in the shell, and, by tradition, consist of all upper-case letters. .Ss Variable assignment modifiers The five operators that can be used to assign values to variables are as follows: .Bl -tag -width Ds .It Ic \&= Assign the value to the variable. Any previous value is overridden. .It Ic \&+= Append the value to the current value of the variable. .It Ic \&?= Assign the value to the variable if it is not already defined. .It Ic \&:= Assign with expansion, i.e. expand the value before assigning it to the variable. Normally, expansion is not done until the variable is referenced. .Em NOTE : References to undefined variables are .Em not expanded. This can cause problems when variable modifiers are used. .It Ic \&!= Expand the value and pass it to the shell for execution and assign the result to the variable. Any newlines in the result are replaced with spaces. .El .Pp Any white-space before the assigned .Ar value is removed; if the value is being appended, a single space is inserted between the previous contents of the variable and the appended value. .Pp Variables are expanded by surrounding the variable name with either curly braces .Pq Ql {} or parentheses .Pq Ql () and preceding it with a dollar sign .Pq Ql \&$ . If the variable name contains only a single letter, the surrounding braces or parentheses are not required. This shorter form is not recommended. .Pp If the variable name contains a dollar, then the name itself is expanded first. This allows almost arbitrary variable names; however, names containing dollar signs, braces, parentheses, or whitespace are best avoided. .Pp If the result of expanding a variable contains a dollar sign .Pq Ql \&$ the string is expanded again. .Pp Variable substitution occurs at three distinct times, depending on where the variable is being used. .Bl -enum .It Variables in dependency lines are expanded as the line is read. .It Variables in shell commands are expanded when the shell command is executed. .It .Dq .for loop index variables are expanded on each loop iteration. Note that other variables are not expanded inside loops, so the following example code: .Bd -literal -offset indent .Dv .for i in 1 2 3 a+= ${i} j= ${i} b+= ${j} .Dv .endfor all: @echo ${a} @echo ${b} .Ed will print: .Bd -literal -offset indent 1 2 3 3 3 3 .Ed This is because, while ${a} contains .Dq 1 2 3 after the loop is executed, ${b} contains .Dq ${j} ${j} ${j} which expands to .Dq 3 3 3 since after the loop completes ${j} contains .Dq 3 . .El .Ss Variable classes The four different classes of variables (in order of increasing precedence) are: .Bl -tag -width Ds .It Environment variables Variables defined as part of .Nm Ns 's environment. .It Global variables Variables defined in the makefile or in included makefiles.
.It Command line variables Variables defined as part of the command line. .It Local variables Variables that are defined specific to a certain target. .El .Pp Local variables are all built in and their values vary magically from target to target. It is not currently possible to define new local variables. The seven local variables are as follows: .Bl -tag -width ".ARCHIVE" -offset indent .It Va .ALLSRC The list of all sources for this target; also known as .Ql Va \&\*[Gt] . .It Va .ARCHIVE The name of the archive file; also known as .Ql Va \&! . .It Va .IMPSRC In suffix-transformation rules, the name/path of the source from which the target is to be transformed (the .Dq implied source); also known as .Ql Va \&\*[Lt] . It is not defined in explicit rules. .It Va .MEMBER The name of the archive member; also known as .Ql Va % . .It Va .OODATE The list of sources for this target that were deemed out-of-date; also known as .Ql Va \&? . .It Va .PREFIX The file prefix of the target, containing only the file portion, no suffix or preceding directory components; also known as .Ql Va * . The suffix must be one of the known suffixes declared with .Ic .SUFFIXES or it will not be recognized. .It Va .TARGET The name of the target; also known as .Ql Va @ . For compatibility with other makes this is an alias for .Ic .ARCHIVE in archive member rules. .El .Pp The shorter forms .Ql ( Va \*[Gt] , .Ql Va \&! , .Ql Va \*[Lt] , .Ql Va % , .Ql Va \&? , .Ql Va * , and .Ql Va @ ) are permitted for backward compatibility with historical makefiles and legacy POSIX make and are not recommended. .Pp Variants of these variables with the punctuation followed immediately by .Ql D or .Ql F , e.g. .Ql Va $(@D) , are legacy forms equivalent to using the .Ql :H and .Ql :T modifiers. These forms are accepted for compatibility with .At V makefiles and POSIX but are not recommended. .Pp Four of the local variables may be used in sources on dependency lines because they expand to the proper value for each target on the line. These variables are .Ql Va .TARGET , .Ql Va .PREFIX , .Ql Va .ARCHIVE , and .Ql Va .MEMBER . .Ss Additional built-in variables In addition, .Nm sets or knows about the following variables: .Bl -tag -width .MAKEOVERRIDES .It Va \&$ A single dollar sign .Ql \&$ , i.e. .Ql \&$$ expands to a single dollar sign. .It Va .ALLTARGETS The list of all targets encountered in the Makefile. If evaluated during Makefile parsing, lists only those targets encountered thus far. .It Va .CURDIR A path to the directory where .Nm was executed. Refer to the description of .Ql Ev PWD for more details. .It Va .INCLUDEDFROMDIR The directory of the file this Makefile was included from. .It Va .INCLUDEDFROMFILE The filename of the file this Makefile was included from. .It Ev MAKE The name that .Nm was executed with .Pq Va argv[0] . For compatibility .Nm also sets .Va .MAKE with the same value. The preferred variable to use is the environment variable .Ev MAKE because it is more compatible with other versions of .Nm and cannot be confused with the special target with the same name. .It Va .MAKE.ALWAYS_PASS_JOB_QUEUE Tells .Nm whether to pass the descriptors of the job token queue even if the target is not tagged with .Ic .MAKE . The default is .Ql Pa yes for backwards compatibility with .Fx 9.0 and earlier. .It Va .MAKE.DEPENDFILE Names the makefile (default .Ql Pa .depend ) from which generated dependencies are read. .It Va .MAKE.EXPAND_VARIABLES A boolean that controls the default behavior of the .Fl V option.
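.Pp
For example (an illustrative sketch; the makefile content is hypothetical):
.Bd -literal -offset indent
FIRST=  foo
FULL=   ${FIRST} bar
.Ed
.Pp
.Ql make -V FULL
prints the value of
.Va FULL
subject to this default, while
.Ql make -dV -V FULL
always prints the raw value
.Ql ${FIRST} bar .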
.It Va .MAKE.EXPORTED The list of variables exported by .Nm . .It Va .MAKE.JOBS The argument to the .Fl j option. .It Va .MAKE.JOB.PREFIX If .Nm is run with .Ar j then output for each target is prefixed with a token .Ql --- target --- the first part of which can be controlled via .Va .MAKE.JOB.PREFIX . If .Va .MAKE.JOB.PREFIX is empty, no token is printed. .br For example: .Li .MAKE.JOB.PREFIX=${.newline}---${.MAKE:T}[${.MAKE.PID}] would produce tokens like .Ql ---make[1234] target --- making it easier to track the degree of parallelism being achieved. .It Ev MAKEFLAGS The environment variable .Ql Ev MAKEFLAGS may contain anything that may be specified on .Nm Ns 's command line. Anything specified on .Nm Ns 's command line is appended to the .Ql Ev MAKEFLAGS variable which is then entered into the environment for all programs which .Nm executes. .It Va .MAKE.LEVEL The recursion depth of .Nm . The initial instance of .Nm has level 0, and an incremented value is put into the environment to be seen by the next generation. This allows tests like: .Li .if ${.MAKE.LEVEL} == 0 to protect things which should only be evaluated in the initial instance of .Nm . .It Va .MAKE.MAKEFILE_PREFERENCE The ordered list of makefile names (default .Ql Pa makefile , .Ql Pa Makefile ) that .Nm will look for. .It Va .MAKE.MAKEFILES The list of makefiles read by .Nm , which is useful for tracking dependencies. Each makefile is recorded only once, regardless of the number of times read. .It Va .MAKE.MODE Processed after reading all makefiles. Can affect the mode that .Nm runs in. It can contain a number of keywords: .Bl -hang -width missing-filemon=bf. .It Pa compat Like .Fl B , puts .Nm into "compat" mode. .It Pa meta Puts .Nm into "meta" mode, where meta files are created for each target to capture the command run, the output generated and, if .Xr filemon 4 is available, the system calls which are of interest to .Nm . The captured output can be very useful when diagnosing errors. .It Pa curdirOk= Ar bf Normally .Nm will not create .meta files in .Ql Va .CURDIR . This can be overridden by setting .Va bf to a value which represents True. .It Pa missing-meta= Ar bf If .Va bf is True, then a missing .meta file makes the target out-of-date. .It Pa missing-filemon= Ar bf If .Va bf is True, then missing filemon data makes the target out-of-date. .It Pa nofilemon Do not use .Xr filemon 4 . .It Pa env For debugging, it can be useful to include the environment in the .meta file. .It Pa verbose If in "meta" mode, print a clue about the target being built. This is useful if the build is otherwise running silently. The message printed is the value of .Va .MAKE.META.PREFIX . .It Pa ignore-cmd Some makefiles have commands which are simply not stable. This keyword causes them to be ignored for determining whether a target is out of date in "meta" mode. See also .Ic .NOMETA_CMP . .It Pa silent= Ar bf If .Va bf is True, when a .meta file is created, mark the target .Ic .SILENT . .El .It Va .MAKE.META.BAILIWICK In "meta" mode, provides a list of prefixes which match the directories controlled by .Nm . If a file that was generated outside of .Va .OBJDIR but within said bailiwick is missing, the current target is considered out-of-date. .It Va .MAKE.META.CREATED In "meta" mode, this variable contains a list of all the meta files updated. If not empty, it can be used to trigger processing of .Va .MAKE.META.FILES . .It Va .MAKE.META.FILES In "meta" mode, this variable contains a list of all the meta files used (updated or not).
This list can be used to process the meta files to extract dependency information. .It Va .MAKE.META.IGNORE_PATHS Provides a list of path prefixes that should be ignored because the contents are expected to change over time. The default list includes: .Ql Pa /dev /etc /proc /tmp /var/run /var/tmp .It Va .MAKE.META.IGNORE_PATTERNS Provides a list of patterns to match against pathnames. Ignore any that match. +.It Va .MAKE.META.IGNORE_FILTER +Provides a list of variable modifiers to apply to each pathname. +Ignore if the expansion is an empty string. .It Va .MAKE.META.PREFIX Defines the message printed for each meta file updated in "meta verbose" mode. The default value is: .Dl Building ${.TARGET:H:tA}/${.TARGET:T} .It Va .MAKEOVERRIDES This variable is used to record the names of variables assigned to on the command line, so that they may be exported as part of .Ql Ev MAKEFLAGS . This behavior can be disabled by assigning an empty value to .Ql Va .MAKEOVERRIDES within a makefile. Extra variables can be exported from a makefile by appending their names to .Ql Va .MAKEOVERRIDES . .Ql Ev MAKEFLAGS is re-exported whenever .Ql Va .MAKEOVERRIDES is modified. .It Va .MAKE.PATH_FILEMON If .Nm was built with .Xr filemon 4 support, this is set to the path of the device node. This allows makefiles to test for this support. .It Va .MAKE.PID The process-id of .Nm . .It Va .MAKE.PPID The parent process-id of .Nm . .It Va .MAKE.SAVE_DOLLARS Its value should be a boolean that controls whether .Ql $$ are preserved when doing .Ql := assignments. The default is false, for backwards compatibility. Set to true for compatibility with other makes. If set to false, .Ql $$ becomes .Ql $ per normal evaluation rules. .It Va MAKE_PRINT_VAR_ON_ERROR When .Nm -stops due to an error, it prints its name and the value of +stops due to an error, it sets +.Ql Va .ERROR_TARGET +to the name of the target that failed, +.Ql Va .ERROR_CMD +to the commands of the failed target, +and in "meta" mode, it also sets +.Ql Va .ERROR_CWD +to the value of +.Xr getcwd 3 , +and +.Ql Va .ERROR_META_FILE +to the path of the meta file (if any) describing the failed target. +It then prints its name and the value of .Ql Va .CURDIR as well as the value of any variables named in .Ql Va MAKE_PRINT_VAR_ON_ERROR . .It Va .newline This variable is simply assigned a newline character as its value. This allows expansions using the .Cm \&:@ modifier to put a newline between iterations of the loop rather than a space. For example, the printing of .Ql Va MAKE_PRINT_VAR_ON_ERROR could be done as ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@}. .It Va .OBJDIR A path to the directory where the targets are built. Its value is determined by trying to .Xr chdir 2 to the following directories in order and using the first match: .Bl -enum .It .Ev ${MAKEOBJDIRPREFIX}${.CURDIR} .Pp (Only if .Ql Ev MAKEOBJDIRPREFIX is set in the environment or on the command line.) .It .Ev ${MAKEOBJDIR} .Pp (Only if .Ql Ev MAKEOBJDIR is set in the environment or on the command line.) .It .Ev ${.CURDIR} Ns Pa /obj. Ns Ev ${MACHINE} .It .Ev ${.CURDIR} Ns Pa /obj .It .Pa /usr/obj/ Ns Ev ${.CURDIR} .It .Ev ${.CURDIR} .El .Pp Variable expansion is performed on the value before it's used, so expressions such as .Dl ${.CURDIR:S,^/usr/src,/var/obj,} may be used. This is especially useful with .Ql Ev MAKEOBJDIR . .Pp .Ql Va .OBJDIR may be modified in the makefile via the special target .Ql Ic .OBJDIR .
In all cases, .Nm will .Xr chdir 2 to the specified directory if it exists, and set .Ql Va .OBJDIR and .Ql Ev PWD to that directory before executing any targets. . .It Va .PARSEDIR A path to the directory of the current .Ql Pa Makefile being parsed. .It Va .PARSEFILE The basename of the current .Ql Pa Makefile being parsed. This variable and .Ql Va .PARSEDIR are both set only while the .Ql Pa Makefiles are being parsed. If you want to retain their current values, assign them to a variable using assignment with expansion: .Pq Ql Cm \&:= . .It Va .PATH A variable that represents the list of directories that .Nm will search for files. The search list should be updated using the target .Ql Va .PATH rather than the variable. .It Ev PWD Alternate path to the current directory. .Nm normally sets .Ql Va .CURDIR to the canonical path given by .Xr getcwd 3 . However, if the environment variable .Ql Ev PWD is set and gives a path to the current directory, then .Nm sets .Ql Va .CURDIR to the value of .Ql Ev PWD instead. This behavior is disabled if .Ql Ev MAKEOBJDIRPREFIX is set or .Ql Ev MAKEOBJDIR contains a variable transform. .Ql Ev PWD is set to the value of .Ql Va .OBJDIR for all programs which .Nm executes. .It Ev .TARGETS The list of targets explicitly specified on the command line, if any. .It Ev VPATH Colon-separated .Pq Dq \&: lists of directories that .Nm will search for files. The variable is supported for compatibility with old make programs only, use .Ql Va .PATH instead. .El .Ss Variable modifiers Variable expansion may be modified to select or modify each word of the variable (where a .Dq word is white-space delimited sequence of characters). The general format of a variable expansion is as follows: .Pp .Dl ${variable[:modifier[:...]]} .Pp Each modifier begins with a colon, which may be escaped with a backslash .Pq Ql \e . .Pp A set of modifiers can be specified via a variable, as follows: .Pp .Dl modifier_variable=modifier[:...] .Dl ${variable:${modifier_variable}[:...]} .Pp In this case the first modifier in the modifier_variable does not start with a colon, since that must appear in the referencing variable. If any of the modifiers in the modifier_variable contain a dollar sign .Pq Ql $ , these must be doubled to avoid early expansion. .Pp The supported modifiers are: .Bl -tag -width EEE .It Cm \&:E Replaces each word in the variable with its suffix. .It Cm \&:H Replaces each word in the variable with everything but the last component. .It Cm \&:M Ns Ar pattern Select only those words that match .Ar pattern . The standard shell wildcard characters .Pf ( Ql * , .Ql \&? , and .Ql Oo Oc ) may be used. The wildcard characters may be escaped with a backslash .Pq Ql \e . As a consequence of the way values are split into words, matched, and then joined, a construct like .Dl ${VAR:M*} will normalize the inter-word spacing, removing all leading and trailing space, and converting multiple consecutive spaces to single spaces. . .It Cm \&:N Ns Ar pattern This is identical to .Ql Cm \&:M , but selects all words which do not match .Ar pattern . .It Cm \&:O Order every word in variable alphabetically. To sort words in reverse order use the .Ql Cm \&:O:[-1..1] combination of modifiers. .It Cm \&:Ox Randomize words in variable. The results will be different each time you are referring to the modified variable; use the assignment with expansion .Pq Ql Cm \&:= to prevent such behavior. 
For example, .Bd -literal -offset indent LIST= uno due tre quattro RANDOM_LIST= ${LIST:Ox} STATIC_RANDOM_LIST:= ${LIST:Ox} all: @echo "${RANDOM_LIST}" @echo "${RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" @echo "${STATIC_RANDOM_LIST}" .Ed may produce output similar to: .Bd -literal -offset indent quattro due tre uno tre due quattro uno due uno quattro tre due uno quattro tre .Ed .It Cm \&:Q Quotes every shell meta-character in the variable, so that it can be passed safely through recursive invocations of .Nm . .It Cm \&:R Replaces each word in the variable with everything but its suffix. .It Cm \&:gmtime The value is a format string for .Xr strftime 3 , using the current .Xr gmtime 3 . .It Cm \&:hash Compute a 32-bit hash of the value and encode it as hex digits. .It Cm \&:localtime The value is a format string for .Xr strftime 3 , using the current .Xr localtime 3 . .It Cm \&:tA Attempt to convert variable to an absolute path using .Xr realpath 3 , if that fails, the value is unchanged. .It Cm \&:tl Converts variable to lower-case letters. .It Cm \&:ts Ns Ar c Words in the variable are normally separated by a space on expansion. This modifier sets the separator to the character .Ar c . If .Ar c is omitted, then no separator is used. The common escapes (including octal numeric codes), work as expected. .It Cm \&:tu Converts variable to upper-case letters. .It Cm \&:tW Causes the value to be treated as a single word (possibly containing embedded white space). See also .Ql Cm \&:[*] . .It Cm \&:tw Causes the value to be treated as a sequence of words delimited by white space. See also .Ql Cm \&:[@] . .Sm off .It Cm \&:S No \&/ Ar old_string No \&/ Ar new_string No \&/ Op Cm 1gW .Sm on Modify the first occurrence of .Ar old_string in the variable's value, replacing it with .Ar new_string . If a .Ql g is appended to the last slash of the pattern, all occurrences in each word are replaced. If a .Ql 1 is appended to the last slash of the pattern, only the first word is affected. If a .Ql W is appended to the last slash of the pattern, then the value is treated as a single word (possibly containing embedded white space). If .Ar old_string begins with a caret .Pq Ql ^ , .Ar old_string is anchored at the beginning of each word. If .Ar old_string ends with a dollar sign .Pq Ql \&$ , it is anchored at the end of each word. Inside .Ar new_string , an ampersand .Pq Ql \*[Am] is replaced by .Ar old_string (without any .Ql ^ or .Ql \&$ ) . Any character may be used as a delimiter for the parts of the modifier string. The anchoring, ampersand and delimiter characters may be escaped with a backslash .Pq Ql \e . .Pp Variable expansion occurs in the normal fashion inside both .Ar old_string and .Ar new_string with the single exception that a backslash is used to prevent the expansion of a dollar sign .Pq Ql \&$ , not a preceding dollar sign as is usual. .Sm off .It Cm \&:C No \&/ Ar pattern No \&/ Ar replacement No \&/ Op Cm 1gW .Sm on The .Cm \&:C modifier is just like the .Cm \&:S modifier except that the old and new strings, instead of being simple strings, are an extended regular expression (see .Xr regex 3 ) string .Ar pattern and an .Xr ed 1 Ns \-style string .Ar replacement . Normally, the first occurrence of the pattern .Ar pattern in each word of the value is substituted with .Ar replacement . 
The .Ql 1 modifier causes the substitution to apply to at most one word; the .Ql g modifier causes the substitution to apply to as many instances of the search pattern .Ar pattern as occur in the word or words it is found in; the .Ql W modifier causes the value to be treated as a single word (possibly containing embedded white space). Note that .Ql 1 and .Ql g are orthogonal; the former specifies whether multiple words are potentially affected, the latter whether multiple substitutions can potentially occur within each affected word. .Pp As for the .Cm \&:S modifier, the .Ar pattern and .Ar replacement are subjected to variable expansion before being parsed as regular expressions. .It Cm \&:T Replaces each word in the variable with its last component. .It Cm \&:u Remove adjacent duplicate words (like .Xr uniq 1 ) . .Sm off .It Cm \&:\&? Ar true_string Cm \&: Ar false_string .Sm on If the variable name (not its value), when parsed as a .if conditional expression, evaluates to true, return as its value the .Ar true_string , otherwise return the .Ar false_string . Since the variable name is used as the expression, \&:\&? must be the first modifier after the variable name itself - which will, of course, usually contain variable expansions. A common error is trying to use expressions like .Dl ${NUMBERS:M42:?match:no} which actually tests defined(NUMBERS); to determine if any words match "42", you need to use something like: .Dl ${"${NUMBERS:M42}" != \&"\&":?match:no} . .It Ar :old_string=new_string This is the .At V style variable substitution. It must be the last modifier specified. If .Ar old_string or .Ar new_string do not contain the pattern matching character .Ar % then it is assumed that they are anchored at the end of each word, so only suffixes or entire words may be replaced. Otherwise .Ar % is the substring of .Ar old_string to be replaced in .Ar new_string . .Pp Variable expansion occurs in the normal fashion inside both .Ar old_string and .Ar new_string with the single exception that a backslash is used to prevent the expansion of a dollar sign .Pq Ql \&$ , not a preceding dollar sign as is usual. .Sm off .It Cm \&:@ Ar temp Cm @ Ar string Cm @ .Sm on This is the loop expansion mechanism from the OSF Development Environment (ODE) make. Unlike .Cm \&.for loops, expansion occurs at the time of reference. Assign .Ar temp to each word in the variable and evaluate .Ar string . The ODE convention is that .Ar temp should start and end with a period. For example: .Dl ${LINKS:@.LINK.@${LN} ${TARGET} ${.LINK.}@} .Pp However, a single-character variable is often more readable: .Dl ${MAKE_PRINT_VAR_ON_ERROR:@v@$v='${$v}'${.newline}@} .It Cm \&:U Ns Ar newval If the variable is undefined .Ar newval is the value. If the variable is defined, the existing value is returned. This is another ODE make feature. It is handy for setting per-target CFLAGS, for instance: .Dl ${_${.TARGET:T}_CFLAGS:U${DEF_CFLAGS}} If a value is only required if the variable is undefined, use: .Dl ${VAR:D:Unewval} .It Cm \&:D Ns Ar newval If the variable is defined .Ar newval is the value. .It Cm \&:L The name of the variable is the value. .It Cm \&:P The path of the node which has the same name as the variable is the value. If no such node exists or its path is null, then the name of the variable is used. In order for this modifier to work, the name (node) must at least have appeared on the rhs of a dependency. .Sm off .It Cm \&:\&! Ar cmd Cm \&! .Sm on The output of running .Ar cmd is the value.
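.Pp
For example (a hedged illustration; the command shown is arbitrary):
.Dl ${X:!date +%Y%m%d!}
expands to the output of
.Xr date 1 ,
regardless of the current value of
.Va X .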
.It Cm \&:sh If the variable is non-empty it is run as a command and the output becomes the new value. .It Cm \&::= Ns Ar str The variable is assigned the value .Ar str after substitution. This modifier and its variations are useful in obscure situations such as wanting to set a variable when shell commands are being parsed. These assignment modifiers always expand to nothing, so if they appear in a rule line by themselves they should be preceded with something to keep .Nm happy. .Pp The .Ql Cm \&:: helps avoid false matches with the .At V style .Cm \&:= modifier and, since substitution always occurs, the .Cm \&::= form is vaguely appropriate. .It Cm \&::?= Ns Ar str As for .Cm \&::= but only if the variable does not already have a value. .It Cm \&::+= Ns Ar str Append .Ar str to the variable. .It Cm \&::!= Ns Ar cmd Assign the output of .Ar cmd to the variable. .It Cm \&:\&[ Ns Ar range Ns Cm \&] Selects one or more words from the value, or performs other operations related to the way in which the value is divided into words. .Pp Ordinarily, a value is treated as a sequence of words delimited by white space. Some modifiers suppress this behavior, causing a value to be treated as a single word (possibly containing embedded white space). An empty value, or a value that consists entirely of white-space, is treated as a single word. For the purposes of the .Ql Cm \&:[] modifier, the words are indexed both forwards using positive integers (where index 1 represents the first word), and backwards using negative integers (where index \-1 represents the last word). .Pp The .Ar range is subjected to variable expansion, and the expanded result is then interpreted as follows: .Bl -tag -width index .\" :[n] .It Ar index Selects a single word from the value. .\" :[start..end] .It Ar start Ns Cm \&.. Ns Ar end Selects all words from .Ar start to .Ar end , inclusive. For example, .Ql Cm \&:[2..-1] selects all words from the second word to the last word. If .Ar start is greater than .Ar end , then the words are output in reverse order. For example, .Ql Cm \&:[-1..1] selects all the words from last to first. .\" :[*] .It Cm \&* Causes subsequent modifiers to treat the value as a single word (possibly containing embedded white space). Analogous to the effect of \&"$*\&" in Bourne shell. .\" :[0] .It 0 Means the same as .Ql Cm \&:[*] . .\" :[@] .It Cm \&@ Causes subsequent modifiers to treat the value as a sequence of words delimited by white space. Analogous to the effect of \&"$@\&" in Bourne shell. .\" :[#] .It Cm \&# Returns the number of words in the value. .El \" :[range] .El .Sh INCLUDE STATEMENTS, CONDITIONALS AND FOR LOOPS Makefile inclusion, conditional structures and for loops reminiscent of the C programming language are provided in .Nm . All such structures are identified by a line beginning with a single dot .Pq Ql \&. character. Files are included with either .Cm \&.include Aq Ar file or .Cm \&.include Pf \*q Ar file Ns \*q . Variables between the angle brackets or double quotes are expanded to form the file name. If angle brackets are used, the included makefile is expected to be in the system makefile directory. If double quotes are used, the including makefile's directory and any directories specified using the .Fl I option are searched before the system makefile directory. For compatibility with other versions of .Nm .Ql include file ... is also accepted. .Pp If the include statement is written as .Cm .-include or as .Cm .sinclude then errors locating and/or opening include files are ignored.
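.Pp
For example (an illustrative sketch; the file names are hypothetical):
.Bd -literal -offset indent
\&.include <sys.mk>
\&.include "${.CURDIR}/local.mk"
\&.sinclude "optional.mk"
.Ed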
.Pp If the include statement is written as .Cm .dinclude not only are errors locating and/or opening include files ignored, but stale dependencies within the included file will be ignored just like .Va .MAKE.DEPENDFILE . .Pp Conditional expressions are also preceded by a single dot as the first character of a line. The possible conditionals are as follows: .Bl -tag -width Ds .It Ic .error Ar message The message is printed along with the name of the makefile and line number, then .Nm will exit. .It Ic .export Ar variable ... Export the specified global variable. If no variable list is provided, all globals are exported except for internal variables (those that start with .Ql \&. ) . This is not affected by the .Fl X flag, so should be used with caution. For compatibility with other .Nm programs .Ql export variable=value is also accepted. .Pp Appending a variable name to .Va .MAKE.EXPORTED is equivalent to exporting a variable. .It Ic .export-env Ar variable ... The same as .Ql .export , except that the variable is not appended to .Va .MAKE.EXPORTED . This allows exporting a value to the environment which is different from that used by .Nm internally. .It Ic .export-literal Ar variable ... The same as .Ql .export-env , except that variables in the value are not expanded. .It Ic .info Ar message The message is printed along with the name of the makefile and line number. .It Ic .undef Ar variable Un-define the specified global variable. Only global variables may be un-defined. .It Ic .unexport Ar variable ... The opposite of .Ql .export . The specified global .Va variable will be removed from .Va .MAKE.EXPORTED . If no variable list is provided, all globals are unexported, and .Va .MAKE.EXPORTED deleted. .It Ic .unexport-env Unexport all globals previously exported and clear the environment inherited from the parent. This operation will cause a memory leak of the original environment, so should be used sparingly. Testing for .Va .MAKE.LEVEL being 0, would make sense. Also note that any variables which originated in the parent environment should be explicitly preserved if desired. For example: .Bd -literal -offset indent .Li .if ${.MAKE.LEVEL} == 0 PATH := ${PATH} .Li .unexport-env .Li .export PATH .Li .endif .Pp .Ed Would result in an environment containing only .Ql Ev PATH , which is the minimal useful environment. Actually .Ql Ev .MAKE.LEVEL will also be pushed into the new environment. .It Ic .warning Ar message The message prefixed by .Ql Pa warning: is printed along with the name of the makefile and line number. .It Ic \&.if Oo \&! Oc Ns Ar expression Op Ar operator expression ... Test the value of an expression. .It Ic .ifdef Oo \&! Oc Ns Ar variable Op Ar operator variable ... Test the value of a variable. .It Ic .ifndef Oo \&! Oc Ns Ar variable Op Ar operator variable ... Test the value of a variable. .It Ic .ifmake Oo \&! Oc Ns Ar target Op Ar operator target ... Test the target being built. .It Ic .ifnmake Oo \&! Ns Oc Ar target Op Ar operator target ... Test the target being built. .It Ic .else Reverse the sense of the last conditional. .It Ic .elif Oo \&! Ns Oc Ar expression Op Ar operator expression ... A combination of .Ql Ic .else followed by .Ql Ic .if . .It Ic .elifdef Oo \&! Oc Ns Ar variable Op Ar operator variable ... A combination of .Ql Ic .else followed by .Ql Ic .ifdef . .It Ic .elifndef Oo \&! Oc Ns Ar variable Op Ar operator variable ... A combination of .Ql Ic .else followed by .Ql Ic .ifndef . .It Ic .elifmake Oo \&! Oc Ns Ar target Op Ar operator target ... 
A combination of .Ql Ic .else followed by .Ql Ic .ifmake . .It Ic .elifnmake Oo \&! Oc Ns Ar target Op Ar operator target ... A combination of .Ql Ic .else followed by .Ql Ic .ifnmake . .It Ic .endif End the body of the conditional. .El .Pp The .Ar operator may be any one of the following: .Bl -tag -width "Cm XX" .It Cm \&|\&| Logical OR. .It Cm \&\*[Am]\*[Am] Logical .Tn AND ; of higher precedence than .Dq \&|\&| . .El .Pp As in C, .Nm will only evaluate a conditional as far as is necessary to determine its value. Parentheses may be used to change the order of evaluation. The boolean operator .Ql Ic \&! may be used to logically negate an entire conditional. It is of higher precedence than .Ql Ic \&\*[Am]\*[Am] . .Pp The value of .Ar expression may be any of the following: .Bl -tag -width defined .It Ic defined Takes a variable name as an argument and evaluates to true if the variable has been defined. .It Ic make Takes a target name as an argument and evaluates to true if the target was specified as part of .Nm Ns 's command line or was declared the default target (either implicitly or explicitly, see .Va .MAIN ) before the line containing the conditional. .It Ic empty Takes a variable, with possible modifiers, and evaluates to true if the expansion of the variable would result in an empty string. .It Ic exists Takes a file name as an argument and evaluates to true if the file exists. The file is searched for on the system search path (see .Va .PATH ) . .It Ic target Takes a target name as an argument and evaluates to true if the target has been defined. .It Ic commands Takes a target name as an argument and evaluates to true if the target has been defined and has commands associated with it. .El .Pp .Ar Expression may also be an arithmetic or string comparison. Variable expansion is performed on both sides of the comparison, after which the integral values are compared. A value is interpreted as hexadecimal if it is preceded by 0x, otherwise it is decimal; octal numbers are not supported. The standard C relational operators are all supported. If after variable expansion, either the left or right hand side of a .Ql Ic == or .Ql Ic "!=" operator is not an integral value, then string comparison is performed between the expanded variables. If no relational operator is given, it is assumed that the expanded variable is being compared against 0 or an empty string in the case of a string comparison. .Pp When .Nm is evaluating one of these conditional expressions, and it encounters a (white-space separated) word it doesn't recognize, either the .Dq make or .Dq defined expression is applied to it, depending on the form of the conditional. If the form is .Ql Ic .ifdef , .Ql Ic .ifndef , or .Ql Ic .if the .Dq defined expression is applied. Similarly, if the form is .Ql Ic .ifmake or .Ql Ic .ifnmake , the .Dq make expression is applied. .Pp If the conditional evaluates to true the parsing of the makefile continues as before. If it evaluates to false, the following lines are skipped. In both cases this continues until a .Ql Ic .else or .Ql Ic .endif is found. .Pp For loops are typically used to apply a set of rules to a list of files. The syntax of a for loop is: .Pp .Bl -tag -compact -width Ds .It Ic \&.for Ar variable Oo Ar variable ... Oc Ic in Ar expression .It Aq make-rules .It Ic \&.endfor .El .Pp After the for .Ic expression is evaluated, it is split into words. 
On each iteration of the loop, one word is taken and assigned to each .Ic variable , in order, and these .Ic variables are substituted into the .Ic make-rules inside the body of the for loop. The number of words must come out even; that is, if there are three iteration variables, the number of words provided must be a multiple of three. .Sh COMMENTS Comments begin with a hash .Pq Ql \&# character, anywhere but in a shell command line, and continue to the end of an unescaped new line. .Sh SPECIAL SOURCES (ATTRIBUTES) .Bl -tag -width .IGNOREx .It Ic .EXEC Target is never out of date, but always execute commands anyway. .It Ic .IGNORE Ignore any errors from the commands associated with this target, exactly as if they all were preceded by a dash .Pq Ql \- . .\" .It Ic .INVISIBLE .\" XXX .\" .It Ic .JOIN .\" XXX .It Ic .MADE Mark all sources of this target as being up-to-date. .It Ic .MAKE Execute the commands associated with this target even if the .Fl n or .Fl t options were specified. Normally used to mark recursive .Nm Ns s . .It Ic .META Create a meta file for the target, even if it is flagged as .Ic .PHONY , .Ic .MAKE , or .Ic .SPECIAL . Usage in conjunction with .Ic .MAKE is the most likely case. In "meta" mode, the target is out-of-date if the meta file is missing. .It Ic .NOMETA Do not create a meta file for the target. Meta files are also not created for .Ic .PHONY , .Ic .MAKE , or .Ic .SPECIAL targets. .It Ic .NOMETA_CMP Ignore differences in commands when deciding if target is out of date. This is useful if the command contains a value which always changes. If the number of commands change, though, the target will still be out of date. The same effect applies to any command line that uses the variable .Va .OODATE , which can be used for that purpose even when not otherwise needed or desired: .Bd -literal -offset indent skip-compare-for-some: @echo this will be compared @echo this will not ${.OODATE:M.NOMETA_CMP} @echo this will also be compared .Ed The .Cm \&:M pattern suppresses any expansion of the unwanted variable. .It Ic .NOPATH Do not search for the target in the directories specified by .Ic .PATH . .It Ic .NOTMAIN Normally .Nm selects the first target it encounters as the default target to be built if no target was specified. This source prevents this target from being selected. .It Ic .OPTIONAL If a target is marked with this attribute and .Nm can't figure out how to create it, it will ignore this fact and assume the file isn't needed or already exists. .It Ic .PHONY The target does not correspond to an actual file; it is always considered to be out of date, and will not be created with the .Fl t option. Suffix-transformation rules are not applied to .Ic .PHONY targets. .It Ic .PRECIOUS When .Nm is interrupted, it normally removes any partially made targets. This source prevents the target from being removed. .It Ic .RECURSIVE Synonym for .Ic .MAKE . .It Ic .SILENT Do not echo any of the commands associated with this target, exactly as if they all were preceded by an at sign .Pq Ql @ . .It Ic .USE Turn the target into .Nm Ns 's version of a macro. When the target is used as a source for another target, the other target acquires the commands, sources, and attributes (except for .Ic .USE ) of the source. If the target already has commands, the .Ic .USE target's commands are appended to them. .It Ic .USEBEFORE Exactly like .Ic .USE , but prepend the .Ic .USEBEFORE target commands to the target. 
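.Pp
For example (an illustrative sketch; the target names are hypothetical):
.Bd -literal -offset indent
announce: .USEBEFORE
	@echo building ${.TARGET}

prog: announce
	cc \-o prog prog.c
.Ed
.Pp
Here
.Ql prog
runs the
.Ql announce
commands before its own.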
.It Ic .WAIT If .Ic .WAIT appears in a dependency line, the sources that precede it are made before the sources that succeed it in the line. Since the dependents of files are not made until the file itself could be made, this also stops the dependents being built unless they are needed for another branch of the dependency tree. So given: .Bd -literal x: a .WAIT b echo x a: echo a b: b1 echo b b1: echo b1 .Ed the output is always .Ql a , .Ql b1 , .Ql b , .Ql x . .br The ordering imposed by .Ic .WAIT is only relevant for parallel makes. .El .Sh SPECIAL TARGETS Special targets may not be included with other targets, i.e. they must be the only target specified. .Bl -tag -width .BEGINx .It Ic .BEGIN Any command lines attached to this target are executed before anything else is done. .It Ic .DEFAULT This is sort of a .Ic .USE rule for any target (that was used only as a source) that .Nm can't figure out any other way to create. Only the shell script is used. The .Ic .IMPSRC variable of a target that inherits .Ic .DEFAULT Ns 's commands is set to the target's own name. .It Ic .END Any command lines attached to this target are executed after everything else is done. .It Ic .ERROR Any command lines attached to this target are executed when another target fails. The .Ic .ERROR_TARGET variable is set to the target that failed. See also .Ic MAKE_PRINT_VAR_ON_ERROR . .It Ic .IGNORE Mark each of the sources with the .Ic .IGNORE attribute. If no sources are specified, this is the equivalent of specifying the .Fl i option. .It Ic .INTERRUPT If .Nm is interrupted, the commands for this target will be executed. .It Ic .MAIN If no target is specified when .Nm is invoked, this target will be built. .It Ic .MAKEFLAGS This target provides a way to specify flags for .Nm when the makefile is used. The flags are as if typed to the shell, though the .Fl f option will have no effect. .\" XXX: NOT YET!!!! .\" .It Ic .NOTPARALLEL .\" The named targets are executed in non parallel mode. .\" If no targets are .\" specified, then all targets are executed in non parallel mode. .It Ic .NOPATH Apply the .Ic .NOPATH attribute to any specified sources. .It Ic .NOTPARALLEL Disable parallel mode. .It Ic .NO_PARALLEL Synonym for .Ic .NOTPARALLEL , for compatibility with other pmake variants. .It Ic .OBJDIR The source is a new value for .Ql Va .OBJDIR . If it exists, .Nm will .Xr chdir 2 to it and update the value of .Ql Va .OBJDIR . .It Ic .ORDER The named targets are made in sequence. This ordering does not add targets to the list of targets to be made. Since the dependents of a target do not get built until the target itself could be built, unless .Ql a is built by another part of the dependency graph, the following is a dependency loop: .Bd -literal \&.ORDER: b a b: a .Ed .Pp The ordering imposed by .Ic .ORDER is only relevant for parallel makes. .\" XXX: NOT YET!!!! .\" .It Ic .PARALLEL .\" The named targets are executed in parallel mode. .\" If no targets are .\" specified, then all targets are executed in parallel mode. .It Ic .PATH The sources are directories which are to be searched for files not found in the current directory. If no sources are specified, any previously specified directories are deleted. If the source is the special .Ic .DOTLAST target, then the current working directory is searched last. .It Ic .PATH. Ns Va suffix Like .Ic .PATH but applies only to files with a particular suffix. The suffix must have been previously declared with .Ic .SUFFIXES . 
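.Pp
For example (an illustrative sketch), to search a sibling directory for
.Pa .c
files only:
.Bd -literal -offset indent
\&.SUFFIXES: .c
\&.PATH.c: ${.CURDIR}/../common
.Ed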
.It Ic .PHONY Apply the .Ic .PHONY attribute to any specified sources. .It Ic .PRECIOUS Apply the .Ic .PRECIOUS attribute to any specified sources. If no sources are specified, the .Ic .PRECIOUS attribute is applied to every target in the file. .It Ic .SHELL Sets the shell that .Nm will use to execute commands. The sources are a set of .Ar field=value pairs. .Bl -tag -width hasErrCtls .It Ar name This is the minimal specification, used to select one of the built-in shell specs; .Ar sh , .Ar ksh , and .Ar csh . .It Ar path Specifies the path to the shell. .It Ar hasErrCtl Indicates whether the shell supports exit on error. .It Ar check The command to turn on error checking. .It Ar ignore The command to disable error checking. .It Ar echo The command to turn on echoing of commands executed. .It Ar quiet The command to turn off echoing of commands executed. .It Ar filter The output to filter after issuing the .Ar quiet command. It is typically identical to .Ar quiet . .It Ar errFlag The flag to pass the shell to enable error checking. .It Ar echoFlag The flag to pass the shell to enable command echoing. .It Ar newline The string literal to pass the shell that results in a single newline character when used outside of any quoting characters. .El Example: .Bd -literal \&.SHELL: name=ksh path=/bin/ksh hasErrCtl=true \e check="set \-e" ignore="set +e" \e echo="set \-v" quiet="set +v" filter="set +v" \e echoFlag=v errFlag=e newline="'\en'" .Ed .It Ic .SILENT Apply the .Ic .SILENT attribute to any specified sources. If no sources are specified, the .Ic .SILENT attribute is applied to every command in the file. .It Ic .STALE This target gets run when a dependency file contains stale entries, having .Va .ALLSRC set to the name of that dependency file. .It Ic .SUFFIXES Each source specifies a suffix to .Nm . If no sources are specified, any previously specified suffixes are deleted. It allows the creation of suffix-transformation rules. .Pp Example: .Bd -literal \&.SUFFIXES: .o \&.c.o: cc \-o ${.TARGET} \-c ${.IMPSRC} .Ed .El .Sh ENVIRONMENT .Nm uses the following environment variables, if they exist: .Ev MACHINE , .Ev MACHINE_ARCH , .Ev MAKE , .Ev MAKEFLAGS , .Ev MAKEOBJDIR , .Ev MAKEOBJDIRPREFIX , .Ev MAKESYSPATH , .Ev PWD , and .Ev TMPDIR . .Pp .Ev MAKEOBJDIRPREFIX and .Ev MAKEOBJDIR may only be set in the environment or on the command line to .Nm and not as makefile variables; see the description of .Ql Va .OBJDIR for more details. .Sh FILES .Bl -tag -width /usr/share/mk -compact .It .depend list of dependencies .It Makefile list of dependencies .It makefile list of dependencies .It sys.mk system makefile .It /usr/share/mk system makefile directory .El .Sh COMPATIBILITY The basic make syntax is compatible between different versions of make; however the special variables, variable modifiers and conditionals are not. .Ss Older versions An incomplete list of changes in older versions of .Nm : .Pp The way that .for loop variables are substituted changed after .Nx 5.0 so that they still appear to be variable expansions. In particular this stops them being treated as syntax, and removes some obscure problems using them in .if statements. .Pp The way that parallel makes are scheduled changed in .Nx 4.0 so that .ORDER and .WAIT apply recursively to the dependent nodes. The algorithms used may change again in the future. .Ss Other make dialects Other make dialects (GNU make, SVR4 make, POSIX make, etc.) do not support most of the features of .Nm as described in this manual. 
Most notably: .Bl -bullet -offset indent .It The .Ic .WAIT and .Ic .ORDER declarations and most functionality pertaining to parallelization. (GNU make supports parallelization but lacks these features needed to control it effectively.) .It Directives, including for loops and conditionals and most of the forms of include files. (GNU make has its own incompatible and less powerful syntax for conditionals.) .It All built-in variables that begin with a dot. .It Most of the special sources and targets that begin with a dot, with the notable exception of .Ic .PHONY , .Ic .PRECIOUS , and .Ic .SUFFIXES . .It Variable modifiers, except for the .Dl :old=new string substitution, which does not portably support globbing with .Ql % and historically only works on declared suffixes. .It The .Ic $> variable even in its short form; most makes support this functionality but its name varies. .El .Pp Some features are somewhat more portable, such as assignment with .Ic += , .Ic ?= , and .Ic != . The .Ic .PATH functionality is based on an older feature .Ic VPATH found in GNU make and many versions of SVR4 make; however, historically its behavior is too ill-defined (and too buggy) to rely upon. .Pp The .Ic $@ and .Ic $< variables are more or less universally portable, as is the .Ic $(MAKE) variable. Basic use of suffix rules (for files only in the current directory, not trying to chain transformations together, etc.) is also reasonably portable. .Sh SEE ALSO .Xr mkdep 1 .Sh HISTORY A .Nm command appeared in .At v7 . This .Nm implementation is based on Adam De Boor's pmake program which was written for Sprite at Berkeley. It was designed to be a parallel distributed make running jobs on different machines using a daemon called .Dq customs . .Pp Historically the target/dependency .Dq FRC has been used to FoRCe rebuilding (since the target/dependency does not exist... unless someone creates an .Dq FRC file). .Sh BUGS The .Nm syntax is difficult to parse without actually acting on the data. For instance finding the end of a variable use should involve scanning each of the modifiers using the correct terminator for each field. In many places .Nm just counts {} and () in order to find the end of a variable expansion. .Pp There is no way of escaping a space character in a filename. Index: projects/clang390-import/contrib/bmake/meta.c =================================================================== --- projects/clang390-import/contrib/bmake/meta.c (revision 305686) +++ projects/clang390-import/contrib/bmake/meta.c (revision 305687) @@ -1,1555 +1,1633 @@ -/* $NetBSD: meta.c,v 1.61 2016/06/07 00:40:00 sjg Exp $ */ +/* $NetBSD: meta.c,v 1.67 2016/08/17 15:52:42 sjg Exp $ */ /* * Implement 'meta' mode. * Adapted from John Birrell's patches to FreeBSD make. * --sjg */ /* * Copyright (c) 2009-2016, Juniper Networks, Inc. * Portions Copyright (c) 2009, John Birrell. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution.
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #if defined(USE_META) #ifdef HAVE_CONFIG_H # include "config.h" #endif #include <sys/stat.h> #include <sys/ioctl.h> #ifdef HAVE_LIBGEN_H #include <libgen.h> #elif !defined(HAVE_DIRNAME) char * dirname(char *); #endif #include <errno.h> #if !defined(HAVE_CONFIG_H) || defined(HAVE_ERR_H) #include <err.h> #endif #include "make.h" #include "job.h" #ifdef HAVE_FILEMON_H # include <filemon.h> #endif #if !defined(USE_FILEMON) && defined(FILEMON_SET_FD) # define USE_FILEMON #endif static BuildMon Mybm; /* for compat */ static Lst metaBailiwick; /* our scope of control */ static char *metaBailiwickStr; /* string storage for the list */ static Lst metaIgnorePaths; /* paths we deliberately ignore */ static char *metaIgnorePathsStr; /* string storage for the list */ #ifndef MAKE_META_IGNORE_PATHS #define MAKE_META_IGNORE_PATHS ".MAKE.META.IGNORE_PATHS" #endif #ifndef MAKE_META_IGNORE_PATTERNS #define MAKE_META_IGNORE_PATTERNS ".MAKE.META.IGNORE_PATTERNS" #endif +#ifndef MAKE_META_IGNORE_FILTER +#define MAKE_META_IGNORE_FILTER ".MAKE.META.IGNORE_FILTER" +#endif Boolean useMeta = FALSE; static Boolean useFilemon = FALSE; static Boolean writeMeta = FALSE; static Boolean metaMissing = FALSE; /* oodate if missing */ static Boolean filemonMissing = FALSE; /* oodate if missing */ static Boolean metaEnv = FALSE; /* don't save env unless asked */ static Boolean metaVerbose = FALSE; static Boolean metaIgnoreCMDs = FALSE; /* ignore CMDs in .meta files */ static Boolean metaIgnorePatterns = FALSE; /* do we need to do pattern matches */ +static Boolean metaIgnoreFilter = FALSE; /* do we have more complex filtering? */ static Boolean metaCurdirOk = FALSE; /* write .meta in .CURDIR Ok? */ static Boolean metaSilent = FALSE; /* if we have a .meta be SILENT */ extern Boolean forceJobs; extern Boolean compatMake; extern char **environ; #define MAKE_META_PREFIX ".MAKE.META.PREFIX" #ifndef N2U # define N2U(n, u) (((n) + ((u) - 1)) / (u)) #endif #ifndef ROUNDUP # define ROUNDUP(n, u) (N2U((n), (u)) * (u)) #endif #if !defined(HAVE_STRSEP) # define strsep(s, d) stresep((s), (d), 0) #endif /* * Filemon is a kernel module which snoops certain syscalls. * * C chdir * E exec * F [v]fork * L [sym]link * M rename * R read * W write * S stat * * See meta_oodate below - we mainly care about 'E' and 'R'. * * We can still use meta mode without filemon, but * the benefits are more limited. */ #ifdef USE_FILEMON # ifndef _PATH_FILEMON # define _PATH_FILEMON "/dev/filemon" # endif /* * Open the filemon device.
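 * Note: the descriptor opened here is handed to the FILEMON_SET_FD ioctl
 * below, and meta_job_child() later registers the child process with the
 * FILEMON_SET_PID ioctl.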
*/ static void filemon_open(BuildMon *pbm) { int retry; pbm->mon_fd = pbm->filemon_fd = -1; if (!useFilemon) return; for (retry = 5; retry >= 0; retry--) { if ((pbm->filemon_fd = open(_PATH_FILEMON, O_RDWR)) >= 0) break; } if (pbm->filemon_fd < 0) { useFilemon = FALSE; warn("Could not open %s", _PATH_FILEMON); return; } /* * We use a file outside of '.' * to avoid a FreeBSD kernel bug where unlink invalidates * cwd causing getcwd to do a lot more work. * We only care about the descriptor. */ pbm->mon_fd = mkTempFile("filemon.XXXXXX", NULL); if (ioctl(pbm->filemon_fd, FILEMON_SET_FD, &pbm->mon_fd) < 0) { err(1, "Could not set filemon file descriptor!"); } /* we don't need these once we exec */ (void)fcntl(pbm->mon_fd, F_SETFD, FD_CLOEXEC); (void)fcntl(pbm->filemon_fd, F_SETFD, FD_CLOEXEC); } /* * Read the build monitor output file and write records to the target's * metadata file. */ static int filemon_read(FILE *mfp, int fd) { char buf[BUFSIZ]; int n; int error; /* Check if we're not writing to a meta data file.*/ if (mfp == NULL) { if (fd >= 0) close(fd); /* not interested */ return 0; } /* rewind */ (void)lseek(fd, (off_t)0, SEEK_SET); error = 0; fprintf(mfp, "\n-- filemon acquired metadata --\n"); while ((n = read(fd, buf, sizeof(buf))) > 0) { if ((int)fwrite(buf, 1, n, mfp) < n) error = EIO; } fflush(mfp); if (close(fd) < 0) error = errno; return error; } #endif /* * when realpath() fails, * we use this, to clean up ./ and ../ */ static void eat_dots(char *buf, size_t bufsz, int dots) { char *cp; char *cp2; const char *eat; size_t eatlen; switch (dots) { case 1: eat = "/./"; eatlen = 2; break; case 2: eat = "/../"; eatlen = 3; break; default: return; } do { cp = strstr(buf, eat); if (cp) { cp2 = cp + eatlen; if (dots == 2 && cp > buf) { do { cp--; } while (cp > buf && *cp != '/'); } if (*cp == '/') { strlcpy(cp, cp2, bufsz - (cp - buf)); } else { return; /* can't happen? */ } } } while (cp); } static char * meta_name(struct GNode *gn, char *mname, size_t mnamelen, const char *dname, const char *tname, const char *cwd) { char buf[MAXPATHLEN]; char *rp; char *cp; char *tp; /* * Weed out relative paths from the target file name. * We have to be careful though since if target is a * symlink, the result will be unstable. * So we use realpath() just to get the dirname, and leave the * basename as given to us. */ if ((cp = strrchr(tname, '/'))) { if (cached_realpath(tname, buf)) { if ((rp = strrchr(buf, '/'))) { rp++; cp++; if (strcmp(cp, rp) != 0) strlcpy(rp, cp, sizeof(buf) - (rp - buf)); } tname = buf; } else { /* * We likely have a directory which is about to be made. * We pretend realpath() succeeded, to have a chance * of generating the same meta file name that we will * next time through. */ if (tname[0] == '/') { strlcpy(buf, tname, sizeof(buf)); } else { snprintf(buf, sizeof(buf), "%s/%s", cwd, tname); } eat_dots(buf, sizeof(buf), 1); /* ./ */ eat_dots(buf, sizeof(buf), 2); /* ../ */ tname = buf; } } /* on some systems dirname may modify its arg */ tp = bmake_strdup(tname); if (strcmp(dname, dirname(tp)) == 0) snprintf(mname, mnamelen, "%s.meta", tname); else { snprintf(mname, mnamelen, "%s/%s.meta", dname, tname); /* * Replace path separators in the file name after the * current object directory path. 
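 * That is, every '/' after the object directory prefix becomes '_',
 * so a target outside .OBJDIR still gets a single, flat .meta file
 * name inside .OBJDIR.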
*/ cp = mname + strlen(dname) + 1; while (*cp != '\0') { if (*cp == '/') *cp = '_'; cp++; } } free(tp); return (mname); } /* * Return true if running ${.MAKE} * Bypassed if target is flagged .MAKE */ static int is_submake(void *cmdp, void *gnp) { static char *p_make = NULL; static int p_len; char *cmd = cmdp; GNode *gn = gnp; char *mp = NULL; char *cp; char *cp2; int rc = 0; /* keep looking */ if (!p_make) { p_make = Var_Value(".MAKE", gn, &cp); p_len = strlen(p_make); } cp = strchr(cmd, '$'); if ((cp)) { mp = Var_Subst(NULL, cmd, gn, VARF_WANTRES); cmd = mp; } cp2 = strstr(cmd, p_make); if ((cp2)) { switch (cp2[p_len]) { case '\0': case ' ': case '\t': case '\n': rc = 1; break; } if (cp2 > cmd && rc > 0) { switch (cp2[-1]) { case ' ': case '\t': case '\n': break; default: rc = 0; /* no match */ break; } } } free(mp); return (rc); } typedef struct meta_file_s { FILE *fp; GNode *gn; } meta_file_t; static int printCMD(void *cmdp, void *mfpp) { meta_file_t *mfp = mfpp; char *cmd = cmdp; char *cp = NULL; if (strchr(cmd, '$')) { cmd = cp = Var_Subst(NULL, cmd, mfp->gn, VARF_WANTRES); } fprintf(mfp->fp, "CMD %s\n", cmd); free(cp); return 0; } /* * Certain node types never get a .meta file */ #define SKIP_META_TYPE(_type) do { \ if ((gn->type & __CONCAT(OP_, _type))) { \ if (verbose) { \ fprintf(debug_file, "Skipping meta for %s: .%s\n", \ gn->name, __STRING(_type)); \ } \ return FALSE; \ } \ } while (0) /* * Do we need/want a .meta file ? */ static Boolean meta_needed(GNode *gn, const char *dname, const char *tname, char *objdir, int verbose) { struct stat fs; if (verbose) verbose = DEBUG(META); /* This may be a phony node which we don't want meta data for... */ /* Skip .meta for .BEGIN, .END, .ERROR etc as well. */ /* Or it may be explicitly flagged as .NOMETA */ SKIP_META_TYPE(NOMETA); /* Unless it is explicitly flagged as .META */ if (!(gn->type & OP_META)) { SKIP_META_TYPE(PHONY); SKIP_META_TYPE(SPECIAL); SKIP_META_TYPE(MAKE); } /* Check if there are no commands to execute. */ if (Lst_IsEmpty(gn->commands)) { if (verbose) fprintf(debug_file, "Skipping meta for %s: no commands\n", gn->name); return FALSE; } if ((gn->type & (OP_META|OP_SUBMAKE)) == OP_SUBMAKE) { /* OP_SUBMAKE is a bit too aggressive */ if (Lst_ForEach(gn->commands, is_submake, gn)) { if (DEBUG(META)) fprintf(debug_file, "Skipping meta for %s: .SUBMAKE\n", gn->name); return FALSE; } } /* The object directory may not exist. Check it.. */ if (cached_stat(dname, &fs) != 0) { if (verbose) fprintf(debug_file, "Skipping meta for %s: no .OBJDIR\n", gn->name); return FALSE; } /* make sure these are canonical */ if (cached_realpath(dname, objdir)) dname = objdir; /* If we aren't in the object directory, don't create a meta file. 
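 * (That is, .OBJDIR and .CURDIR are the same directory; this check can
 * be relaxed with the 'curdirok=' token parsed by meta_mode_init() below.)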
*/ if (!metaCurdirOk && strcmp(curdir, dname) == 0) { if (verbose) fprintf(debug_file, "Skipping meta for %s: .OBJDIR == .CURDIR\n", gn->name); return FALSE; } return TRUE; } static FILE * meta_create(BuildMon *pbm, GNode *gn) { meta_file_t mf; char buf[MAXPATHLEN]; char objdir[MAXPATHLEN]; char **ptr; const char *dname; const char *tname; char *fname; const char *cp; char *p[4]; /* >= possible uses */ int i; mf.fp = NULL; i = 0; dname = Var_Value(".OBJDIR", gn, &p[i++]); tname = Var_Value(TARGET, gn, &p[i++]); /* if this succeeds objdir is realpath of dname */ if (!meta_needed(gn, dname, tname, objdir, TRUE)) goto out; dname = objdir; if (metaVerbose) { char *mp; /* Describe the target we are building */ mp = Var_Subst(NULL, "${" MAKE_META_PREFIX "}", gn, VARF_WANTRES); if (*mp) fprintf(stdout, "%s\n", mp); free(mp); } /* Get the basename of the target */ if ((cp = strrchr(tname, '/')) == NULL) { cp = tname; } else { cp++; } fflush(stdout); if (!writeMeta) /* Don't create meta data. */ goto out; fname = meta_name(gn, pbm->meta_fname, sizeof(pbm->meta_fname), dname, tname, objdir); #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "meta_create: %s\n", fname); #endif if ((mf.fp = fopen(fname, "w")) == NULL) err(1, "Could not open meta file '%s'", fname); fprintf(mf.fp, "# Meta data file %s\n", fname); mf.gn = gn; Lst_ForEach(gn->commands, printCMD, &mf); fprintf(mf.fp, "CWD %s\n", getcwd(buf, sizeof(buf))); fprintf(mf.fp, "TARGET %s\n", tname); if (metaEnv) { for (ptr = environ; *ptr != NULL; ptr++) fprintf(mf.fp, "ENV %s\n", *ptr); } fprintf(mf.fp, "-- command output --\n"); fflush(mf.fp); Var_Append(".MAKE.META.FILES", fname, VAR_GLOBAL); Var_Append(".MAKE.META.CREATED", fname, VAR_GLOBAL); gn->type |= OP_META; /* in case anyone wants to know */ if (metaSilent) { gn->type |= OP_SILENT; } out: for (i--; i >= 0; i--) { free(p[i]); } return (mf.fp); } static Boolean boolValue(char *s) { switch(*s) { case '0': case 'N': case 'n': case 'F': case 'f': return FALSE; } return TRUE; } /* * Initialization we need before reading makefiles. */ void meta_init(void) { #ifdef USE_FILEMON /* this allows makefiles to test if we have filemon support */ Var_Set(".MAKE.PATH_FILEMON", _PATH_FILEMON, VAR_GLOBAL, 0); #endif } #define get_mode_bf(bf, token) \ if ((cp = strstr(make_mode, token))) \ bf = boolValue(&cp[sizeof(token) - 1]) /* * Initialization we need after reading makefiles. */ void meta_mode_init(const char *make_mode) { static int once = 0; char *cp; useMeta = TRUE; useFilemon = TRUE; writeMeta = TRUE; if (make_mode) { if (strstr(make_mode, "env")) metaEnv = TRUE; if (strstr(make_mode, "verb")) metaVerbose = TRUE; if (strstr(make_mode, "read")) writeMeta = FALSE; if (strstr(make_mode, "nofilemon")) useFilemon = FALSE; if (strstr(make_mode, "ignore-cmd")) metaIgnoreCMDs = TRUE; if (useFilemon) get_mode_bf(filemonMissing, "missing-filemon="); get_mode_bf(metaCurdirOk, "curdirok="); get_mode_bf(metaMissing, "missing-meta="); get_mode_bf(metaSilent, "silent="); } if (metaVerbose && !Var_Exists(MAKE_META_PREFIX, VAR_GLOBAL)) { /* * The default value for MAKE_META_PREFIX * prints the absolute path of the target. * This works because :H will generate '.' if there is no / * and :tA will resolve that to cwd.
*/ Var_Set(MAKE_META_PREFIX, "Building ${.TARGET:H:tA}/${.TARGET:T}", VAR_GLOBAL, 0); } if (once) return; once = 1; memset(&Mybm, 0, sizeof(Mybm)); /* * We consider ourselves master of all within ${.MAKE.META.BAILIWICK} */ metaBailiwick = Lst_Init(FALSE); metaBailiwickStr = Var_Subst(NULL, "${.MAKE.META.BAILIWICK:O:u:tA}", VAR_GLOBAL, VARF_WANTRES); if (metaBailiwickStr) { str2Lst_Append(metaBailiwick, metaBailiwickStr, NULL); } /* * We ignore any paths that start with ${.MAKE.META.IGNORE_PATHS} */ metaIgnorePaths = Lst_Init(FALSE); Var_Append(MAKE_META_IGNORE_PATHS, "/dev /etc /proc /tmp /var/run /var/tmp ${TMPDIR}", VAR_GLOBAL); metaIgnorePathsStr = Var_Subst(NULL, "${" MAKE_META_IGNORE_PATHS ":O:u:tA}", VAR_GLOBAL, VARF_WANTRES); if (metaIgnorePathsStr) { str2Lst_Append(metaIgnorePaths, metaIgnorePathsStr, NULL); } /* * We ignore any paths that match ${.MAKE.META.IGNORE_PATTERNS} */ cp = NULL; if (Var_Value(MAKE_META_IGNORE_PATTERNS, VAR_GLOBAL, &cp)) { metaIgnorePatterns = TRUE; free(cp); } + cp = NULL; + if (Var_Value(MAKE_META_IGNORE_FILTER, VAR_GLOBAL, &cp)) { + metaIgnoreFilter = TRUE; + free(cp); + } } /* * In each case below we allow for job==NULL */ void meta_job_start(Job *job, GNode *gn) { BuildMon *pbm; if (job != NULL) { pbm = &job->bm; } else { pbm = &Mybm; } pbm->mfp = meta_create(pbm, gn); #ifdef USE_FILEMON_ONCE /* compat mode we open the filemon dev once per command */ if (job == NULL) return; #endif #ifdef USE_FILEMON if (pbm->mfp != NULL && useFilemon) { filemon_open(pbm); } else { pbm->mon_fd = pbm->filemon_fd = -1; } #endif } /* * The child calls this before doing anything. * It does not disturb our state. */ void meta_job_child(Job *job) { #ifdef USE_FILEMON BuildMon *pbm; if (job != NULL) { pbm = &job->bm; } else { pbm = &Mybm; } if (pbm->mfp != NULL) { close(fileno(pbm->mfp)); if (useFilemon) { pid_t pid; pid = getpid(); if (ioctl(pbm->filemon_fd, FILEMON_SET_PID, &pid) < 0) { err(1, "Could not set filemon pid!"); } } } #endif } void meta_job_error(Job *job, GNode *gn, int flags, int status) { char cwd[MAXPATHLEN]; BuildMon *pbm; if (job != NULL) { pbm = &job->bm; if (!gn) gn = job->node; } else { pbm = &Mybm; } if (pbm->mfp != NULL) { fprintf(pbm->mfp, "*** Error code %d%s\n", status, (flags & JOB_IGNERR) ? "(ignored)" : ""); } if (gn) { Var_Set(".ERROR_TARGET", gn->path ? 
gn->path : gn->name, VAR_GLOBAL, 0); } getcwd(cwd, sizeof(cwd)); Var_Set(".ERROR_CWD", cwd, VAR_GLOBAL, 0); if (pbm->meta_fname[0]) { Var_Set(".ERROR_META_FILE", pbm->meta_fname, VAR_GLOBAL, 0); } meta_job_finish(job); } void meta_job_output(Job *job, char *cp, const char *nl) { BuildMon *pbm; if (job != NULL) { pbm = &job->bm; } else { pbm = &Mybm; } if (pbm->mfp != NULL) { if (metaVerbose) { static char *meta_prefix = NULL; static int meta_prefix_len; if (!meta_prefix) { char *cp2; meta_prefix = Var_Subst(NULL, "${" MAKE_META_PREFIX "}", VAR_GLOBAL, VARF_WANTRES); if ((cp2 = strchr(meta_prefix, '$'))) meta_prefix_len = cp2 - meta_prefix; else meta_prefix_len = strlen(meta_prefix); } if (strncmp(cp, meta_prefix, meta_prefix_len) == 0) { cp = strchr(cp+1, '\n'); if (!cp++) return; } } fprintf(pbm->mfp, "%s%s", cp, nl); } } int meta_cmd_finish(void *pbmp) { int error = 0; #ifdef USE_FILEMON BuildMon *pbm = pbmp; int x; if (!pbm) pbm = &Mybm; if (pbm->filemon_fd >= 0) { if (close(pbm->filemon_fd) < 0) error = errno; x = filemon_read(pbm->mfp, pbm->mon_fd); if (error == 0 && x != 0) error = x; pbm->filemon_fd = pbm->mon_fd = -1; } #endif return error; } int meta_job_finish(Job *job) { BuildMon *pbm; int error = 0; int x; if (job != NULL) { pbm = &job->bm; } else { pbm = &Mybm; } if (pbm->mfp != NULL) { error = meta_cmd_finish(pbm); x = fclose(pbm->mfp); if (error == 0 && x != 0) error = errno; pbm->mfp = NULL; pbm->meta_fname[0] = '\0'; } return error; } void meta_finish(void) { Lst_Destroy(metaBailiwick, NULL); free(metaBailiwickStr); Lst_Destroy(metaIgnorePaths, NULL); free(metaIgnorePathsStr); } /* * Fetch a full line from fp - growing bufp if needed * Return length in bufp. */ static int fgetLine(char **bufp, size_t *szp, int o, FILE *fp) { char *buf = *bufp; size_t bufsz = *szp; struct stat fs; int x; if (fgets(&buf[o], bufsz - o, fp) != NULL) { check_newline: x = o + strlen(&buf[o]); if (buf[x - 1] == '\n') return x; /* * We need to grow the buffer. * The meta file can give us a clue. */ if (fstat(fileno(fp), &fs) == 0) { size_t newsz; char *p; newsz = ROUNDUP((fs.st_size / 2), BUFSIZ); if (newsz <= bufsz) newsz = ROUNDUP(fs.st_size, BUFSIZ); if (DEBUG(META)) fprintf(debug_file, "growing buffer %u -> %u\n", (unsigned)bufsz, (unsigned)newsz); p = bmake_realloc(buf, newsz); if (p) { *bufp = buf = p; *szp = bufsz = newsz; /* fetch the rest */ if (!fgets(&buf[x], bufsz - x, fp)) return x; /* truncated! 
*/ goto check_newline; } } } return 0; } +/* Lst_ForEach wants 1 to stop search */ static int prefix_match(void *p, void *q) { const char *prefix = p; const char *path = q; size_t n = strlen(prefix); return (0 == strncmp(path, prefix, n)); } +/* + * looking for exact or prefix/ match to + * Lst_Find wants 0 to stop search + */ static int +path_match(const void *p, const void *q) +{ + const char *prefix = q; + const char *path = p; + size_t n = strlen(prefix); + int rc; + + if ((rc = strncmp(path, prefix, n)) == 0) { + switch (path[n]) { + case '\0': + case '/': + break; + default: + rc = 1; + break; + } + } + return rc; +} + +/* Lst_Find wants 0 to stop search */ +static int string_match(const void *p, const void *q) { const char *p1 = p; const char *p2 = q; return strcmp(p1, p2); } +static int +meta_ignore(GNode *gn, const char *p) +{ + char fname[MAXPATHLEN]; + + if (p == NULL) + return TRUE; + + if (*p == '/') { + cached_realpath(p, fname); /* clean it up */ + if (Lst_ForEach(metaIgnorePaths, prefix_match, fname)) { +#ifdef DEBUG_META_MODE + if (DEBUG(META)) + fprintf(debug_file, "meta_oodate: ignoring path: %s\n", + p); +#endif + return TRUE; + } + } + + if (metaIgnorePatterns) { + char *pm; + + snprintf(fname, sizeof(fname), + "${%s:@m@${%s:L:M$m}@}", + MAKE_META_IGNORE_PATTERNS, p); + pm = Var_Subst(NULL, fname, gn, VARF_WANTRES); + if (*pm) { +#ifdef DEBUG_META_MODE + if (DEBUG(META)) + fprintf(debug_file, "meta_oodate: ignoring pattern: %s\n", + p); +#endif + free(pm); + return TRUE; + } + free(pm); + } + + if (metaIgnoreFilter) { + char *fm; + + /* skip if filter result is empty */ + snprintf(fname, sizeof(fname), + "${%s:L:${%s:ts:}}", + p, MAKE_META_IGNORE_FILTER); + fm = Var_Subst(NULL, fname, gn, VARF_WANTRES); + if (*fm == '\0') { +#ifdef DEBUG_META_MODE + if (DEBUG(META)) + fprintf(debug_file, "meta_oodate: ignoring filtered: %s\n", + p); +#endif + free(fm); + return TRUE; + } + free(fm); + } + return FALSE; +} + /* * When running with 'meta' functionality, a target can be out-of-date * if any of the references in its meta data file is more recent. * We have to track the latestdir on a per-process basis. */ #define LCWD_VNAME_FMT ".meta.%d.lcwd" #define LDIR_VNAME_FMT ".meta.%d.ldir" /* * It is possible that a .meta file is corrupted, * if we detect this we want to reproduce it. * Setting oodate TRUE will have that effect. */ #define CHECK_VALID_META(p) if (!(p && *p)) { \ warnx("%s: %d: malformed", fname, lineno); \ oodate = TRUE; \ continue; \ } #define DEQUOTE(p) if (*p == '\'') { \ char *ep; \ p++; \ if ((ep = strchr(p, '\''))) \ *ep = '\0'; \ } Boolean meta_oodate(GNode *gn, Boolean oodate) { static char *tmpdir = NULL; static char cwd[MAXPATHLEN]; char lcwd_vname[64]; char ldir_vname[64]; char lcwd[MAXPATHLEN]; char latestdir[MAXPATHLEN]; char fname[MAXPATHLEN]; char fname1[MAXPATHLEN]; char fname2[MAXPATHLEN]; char fname3[MAXPATHLEN]; const char *dname; const char *tname; char *p; char *cp; char *link_src; char *move_target; static size_t cwdlen = 0; static size_t tmplen = 0; FILE *fp; Boolean needOODATE = FALSE; Lst missingFiles; char *pa[4]; /* >= possible uses */ int i; int have_filemon = FALSE; if (oodate) return oodate; /* we're done */ i = 0; dname = Var_Value(".OBJDIR", gn, &pa[i++]); tname = Var_Value(TARGET, gn, &pa[i++]); /* if this succeeds fname3 is realpath of dname */ if (!meta_needed(gn, dname, tname, fname3, FALSE)) goto oodate_out; dname = fname3; missingFiles = Lst_Init(FALSE); /* * We need to check if the target is out-of-date. 
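 * (References matched by meta_ignore() above, via .MAKE.META.IGNORE_PATHS,
 * .MAKE.META.IGNORE_PATTERNS or the new .MAKE.META.IGNORE_FILTER, are
 * skipped; e.g. an ignore filter of 'M*Makefile.depend*', to pick an
 * illustrative value, would limit the checking to Makefile.depend files.)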
This includes * checking if the expanded command has changed. This in turn * requires that all variables are set in the same way that they * would be if the target needs to be re-built. */ Make_DoAllVar(gn); meta_name(gn, fname, sizeof(fname), dname, tname, dname); #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "meta_oodate: %s\n", fname); #endif if ((fp = fopen(fname, "r")) != NULL) { static char *buf = NULL; static size_t bufsz; int lineno = 0; int lastpid = 0; int pid; int x; LstNode ln; struct stat fs; if (!buf) { bufsz = 8 * BUFSIZ; buf = bmake_malloc(bufsz); } if (!cwdlen) { if (getcwd(cwd, sizeof(cwd)) == NULL) err(1, "Could not get current working directory"); cwdlen = strlen(cwd); } strlcpy(lcwd, cwd, sizeof(lcwd)); strlcpy(latestdir, cwd, sizeof(latestdir)); if (!tmpdir) { tmpdir = getTmpdir(); tmplen = strlen(tmpdir); } /* we want to track all the .meta we read */ Var_Append(".MAKE.META.FILES", fname, VAR_GLOBAL); ln = Lst_First(gn->commands); while (!oodate && (x = fgetLine(&buf, &bufsz, 0, fp)) > 0) { lineno++; if (buf[x - 1] == '\n') buf[x - 1] = '\0'; else { warnx("%s: %d: line truncated at %u", fname, lineno, x); oodate = TRUE; break; } link_src = NULL; move_target = NULL; /* Find the start of the build monitor section. */ if (!have_filemon) { if (strncmp(buf, "-- filemon", 10) == 0) { have_filemon = TRUE; continue; } if (strncmp(buf, "# buildmon", 10) == 0) { have_filemon = TRUE; continue; } } /* Delimit the record type. */ p = buf; #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: %s\n", fname, lineno, buf); #endif strsep(&p, " "); if (have_filemon) { /* * We are in the 'filemon' output section. * Each record from filemon follows the general form: * * * * Where: * is a single letter, denoting the syscall. * is the process that made the syscall. * is the arguments (of interest). */ switch(buf[0]) { case '#': /* comment */ case 'V': /* version */ break; default: /* * We need to track pathnames per-process. * * Each process run by make, starts off in the 'CWD' * recorded in the .meta file, if it chdirs ('C') * elsewhere we need to track that - but only for * that process. If it forks ('F'), we initialize * the child to have the same cwd as its parent. * * We also need to track the 'latestdir' of * interest. This is usually the same as cwd, but * not if a process is reading directories. * * Each time we spot a different process ('pid') * we save the current value of 'latestdir' in a * variable qualified by 'lastpid', and * re-initialize 'latestdir' to any pre-saved * value for the current 'pid' and 'CWD' if none. */ CHECK_VALID_META(p); pid = atoi(p); if (pid > 0 && pid != lastpid) { char *ldir; char *tp; if (lastpid > 0) { /* We need to remember these. */ Var_Set(lcwd_vname, lcwd, VAR_GLOBAL, 0); Var_Set(ldir_vname, latestdir, VAR_GLOBAL, 0); } snprintf(lcwd_vname, sizeof(lcwd_vname), LCWD_VNAME_FMT, pid); snprintf(ldir_vname, sizeof(ldir_vname), LDIR_VNAME_FMT, pid); lastpid = pid; ldir = Var_Value(ldir_vname, VAR_GLOBAL, &tp); if (ldir) { strlcpy(latestdir, ldir, sizeof(latestdir)); free(tp); } ldir = Var_Value(lcwd_vname, VAR_GLOBAL, &tp); if (ldir) { strlcpy(lcwd, ldir, sizeof(lcwd)); free(tp); } } /* Skip past the pid. */ if (strsep(&p, " ") == NULL) continue; #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: %d: %c: cwd=%s lcwd=%s ldir=%s\n", fname, lineno, pid, buf[0], cwd, lcwd, latestdir); #endif break; } CHECK_VALID_META(p); /* Process according to record type. 
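 * ('X' exit, 'F' fork, 'C' chdir, 'M' rename, 'D' unlink, 'L' link,
 * 'W' write, 'R' read, 'E' exec; compare the filemon letter table above.)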
*/ switch (buf[0]) { case 'X': /* eXit */ Var_Delete(lcwd_vname, VAR_GLOBAL); Var_Delete(ldir_vname, VAR_GLOBAL); lastpid = 0; /* no need to save ldir_vname */ break; case 'F': /* [v]Fork */ { char cldir[64]; int child; child = atoi(p); if (child > 0) { snprintf(cldir, sizeof(cldir), LCWD_VNAME_FMT, child); Var_Set(cldir, lcwd, VAR_GLOBAL, 0); snprintf(cldir, sizeof(cldir), LDIR_VNAME_FMT, child); Var_Set(cldir, latestdir, VAR_GLOBAL, 0); #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: %d: cwd=%s lcwd=%s ldir=%s\n", fname, lineno, child, cwd, lcwd, latestdir); #endif } } break; case 'C': /* Chdir */ /* Update lcwd and latest directory. */ strlcpy(latestdir, p, sizeof(latestdir)); strlcpy(lcwd, p, sizeof(lcwd)); Var_Set(lcwd_vname, lcwd, VAR_GLOBAL, 0); Var_Set(ldir_vname, lcwd, VAR_GLOBAL, 0); #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: cwd=%s ldir=%s\n", fname, lineno, cwd, lcwd); #endif break; case 'M': /* renaMe */ /* * For 'M'oves we want to check * the src as for 'R'ead * and the target as for 'W'rite. */ cp = p; /* save this for a second */ /* now get target */ if (strsep(&p, " ") == NULL) continue; CHECK_VALID_META(p); move_target = p; p = cp; /* 'L' and 'M' put single quotes around the args */ DEQUOTE(p); DEQUOTE(move_target); /* FALLTHROUGH */ case 'D': /* unlink */ if (*p == '/' && !Lst_IsEmpty(missingFiles)) { - /* remove p from the missingFiles list if present */ - if ((ln = Lst_Find(missingFiles, p, string_match)) != NULL) { - char *tp = Lst_Datum(ln); - Lst_Remove(missingFiles, ln); - free(tp); - ln = NULL; /* we're done with it */ + /* remove any missingFiles entries that match p */ + if ((ln = Lst_Find(missingFiles, p, + path_match)) != NULL) { + LstNode nln; + char *tp; + + do { + nln = Lst_FindFrom(missingFiles, Lst_Succ(ln), + p, path_match); + tp = Lst_Datum(ln); + Lst_Remove(missingFiles, ln); + free(tp); + } while ((ln = nln) != NULL); } } if (buf[0] == 'M') { /* the target of the mv is a file 'W'ritten */ #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "meta_oodate: M %s -> %s\n", p, move_target); #endif p = move_target; goto check_write; } break; case 'L': /* Link */ /* * For 'L'inks check * the src as for 'R'ead * and the target as for 'W'rite. */ link_src = p; /* now get target */ if (strsep(&p, " ") == NULL) continue; CHECK_VALID_META(p); /* 'L' and 'M' put single quotes around the args */ DEQUOTE(p); DEQUOTE(link_src); #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "meta_oodate: L %s -> %s\n", link_src, p); #endif /* FALLTHROUGH */ case 'W': /* Write */ check_write: /* * If a file we generated within our bailiwick * but outside of .OBJDIR is missing, * we need to do it again. 
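 * ("Our bailiwick" means the directories listed in ${.MAKE.META.BAILIWICK},
 * collected into metaBailiwick by meta_mode_init().)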
*/ /* ignore non-absolute paths */ if (*p != '/') break; if (Lst_IsEmpty(metaBailiwick)) break; /* ignore cwd - normal dependencies handle those */ if (strncmp(p, cwd, cwdlen) == 0) break; if (!Lst_ForEach(metaBailiwick, prefix_match, p)) break; /* tmpdir might be within */ if (tmplen > 0 && strncmp(p, tmpdir, tmplen) == 0) break; /* ignore anything containing the string "tmp" */ if ((strstr("tmp", p))) break; if ((link_src != NULL && cached_lstat(p, &fs) < 0) || (link_src == NULL && cached_stat(p, &fs) < 0)) { - if (Lst_Find(missingFiles, p, string_match) == NULL) + if (!meta_ignore(gn, p)) { + if (Lst_Find(missingFiles, p, string_match) == NULL) Lst_AtEnd(missingFiles, bmake_strdup(p)); + } } break; check_link_src: p = link_src; link_src = NULL; #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "meta_oodate: L src %s\n", p); #endif /* FALLTHROUGH */ case 'R': /* Read */ case 'E': /* Exec */ /* * Check for runtime files that can't * be part of the dependencies because * they are _expected_ to change. */ - if (*p == '/') { - cached_realpath(p, fname1); /* clean it up */ - if (Lst_ForEach(metaIgnorePaths, prefix_match, fname1)) { -#ifdef DEBUG_META_MODE - if (DEBUG(META)) - fprintf(debug_file, "meta_oodate: ignoring path: %s\n", - p); -#endif - break; - } - } - - if (metaIgnorePatterns) { - char *pm; - - snprintf(fname1, sizeof(fname1), - "${%s:@m@${%s:L:M$m}@}", - MAKE_META_IGNORE_PATTERNS, p); - pm = Var_Subst(NULL, fname1, gn, VARF_WANTRES); - if (*pm) { -#ifdef DEBUG_META_MODE - if (DEBUG(META)) - fprintf(debug_file, "meta_oodate: ignoring pattern: %s\n", - p); -#endif - free(pm); - break; - } - free(pm); - } - + if (meta_ignore(gn, p)) + break; + /* * The rest of the record is the file name. * Check if it's not an absolute path. */ { char *sdirs[4]; char **sdp; int sdx = 0; int found = 0; if (*p == '/') { sdirs[sdx++] = p; /* done */ } else { if (strcmp(".", p) == 0) continue; /* no point */ /* Check vs latestdir */ snprintf(fname1, sizeof(fname1), "%s/%s", latestdir, p); sdirs[sdx++] = fname1; if (strcmp(latestdir, lcwd) != 0) { /* Check vs lcwd */ snprintf(fname2, sizeof(fname2), "%s/%s", lcwd, p); sdirs[sdx++] = fname2; } if (strcmp(lcwd, cwd) != 0) { /* Check vs cwd */ snprintf(fname3, sizeof(fname3), "%s/%s", cwd, p); sdirs[sdx++] = fname3; } } sdirs[sdx++] = NULL; for (sdp = sdirs; *sdp && !found; sdp++) { #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: looking for: %s\n", fname, lineno, *sdp); #endif if (cached_stat(*sdp, &fs) == 0) { found = 1; p = *sdp; } } if (found) { #ifdef DEBUG_META_MODE if (DEBUG(META)) fprintf(debug_file, "%s: %d: found: %s\n", fname, lineno, p); #endif if (!S_ISDIR(fs.st_mode) && fs.st_mtime > gn->mtime) { if (DEBUG(META)) fprintf(debug_file, "%s: %d: file '%s' is newer than the target...\n", fname, lineno, p); oodate = TRUE; } else if (S_ISDIR(fs.st_mode)) { /* Update the latest directory. */ cached_realpath(p, latestdir); } } else if (errno == ENOENT && *p == '/' && strncmp(p, cwd, cwdlen) != 0) { /* * A referenced file outside of CWD is missing. * We cannot catch every eventuality here... */ if (Lst_Find(missingFiles, p, string_match) == NULL) Lst_AtEnd(missingFiles, bmake_strdup(p)); } } if (buf[0] == 'E') { /* previous latestdir is no longer relevant */ strlcpy(latestdir, lcwd, sizeof(latestdir)); } break; default: break; } if (!oodate && buf[0] == 'L' && link_src != NULL) goto check_link_src; } else if (strcmp(buf, "CMD") == 0) { /* * Compare the current command with the one in the * meta data file. 
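 * (Commands that expand .OODATE cannot be compared meaningfully, and
 * targets flagged .NOMETA_CMP, i.e. OP_NOMETA_CMP, skip the comparison
 * entirely; see the .NOMETA_CMP description in the manual page above.)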
*/ if (ln == NULL) { if (DEBUG(META)) fprintf(debug_file, "%s: %d: there were more build commands in the meta data file than there are now...\n", fname, lineno); oodate = TRUE; } else { char *cmd = (char *)Lst_Datum(ln); Boolean hasOODATE = FALSE; if (strstr(cmd, "$?")) hasOODATE = TRUE; else if ((cp = strstr(cmd, ".OODATE"))) { /* check for $[{(].OODATE[:)}] */ if (cp > cmd + 2 && cp[-2] == '$') hasOODATE = TRUE; } if (hasOODATE) { needOODATE = TRUE; if (DEBUG(META)) fprintf(debug_file, "%s: %d: cannot compare command using .OODATE\n", fname, lineno); } cmd = Var_Subst(NULL, cmd, gn, VARF_WANTRES|VARF_UNDEFERR); if ((cp = strchr(cmd, '\n'))) { int n; /* * This command contains newlines, we need to * fetch more from the .meta file before we * attempt a comparison. */ /* first put the newline back at buf[x - 1] */ buf[x - 1] = '\n'; do { /* now fetch the next line */ if ((n = fgetLine(&buf, &bufsz, x, fp)) <= 0) break; x = n; lineno++; if (buf[x - 1] != '\n') { warnx("%s: %d: line truncated at %u", fname, lineno, x); break; } cp = strchr(++cp, '\n'); } while (cp); if (buf[x - 1] == '\n') buf[x - 1] = '\0'; } if (!hasOODATE && !(gn->type & OP_NOMETA_CMP) && strcmp(p, cmd) != 0) { if (DEBUG(META)) fprintf(debug_file, "%s: %d: a build command has changed\n%s\nvs\n%s\n", fname, lineno, p, cmd); if (!metaIgnoreCMDs) oodate = TRUE; } free(cmd); ln = Lst_Succ(ln); } } else if (strcmp(buf, "CWD") == 0) { /* * Check if there are extra commands now * that weren't in the meta data file. */ if (!oodate && ln != NULL) { if (DEBUG(META)) fprintf(debug_file, "%s: %d: there are extra build commands now that weren't in the meta data file\n", fname, lineno); oodate = TRUE; } if (strcmp(p, cwd) != 0) { if (DEBUG(META)) fprintf(debug_file, "%s: %d: the current working directory has changed from '%s' to '%s'\n", fname, lineno, p, curdir); oodate = TRUE; } } } fclose(fp); if (!Lst_IsEmpty(missingFiles)) { if (DEBUG(META)) fprintf(debug_file, "%s: missing files: %s...\n", fname, (char *)Lst_Datum(Lst_First(missingFiles))); oodate = TRUE; } if (!oodate && !have_filemon && filemonMissing) { if (DEBUG(META)) fprintf(debug_file, "%s: missing filemon data\n", fname); oodate = TRUE; } } else { if (writeMeta && metaMissing) { cp = NULL; /* if target is in .CURDIR we do not need a meta file */ if (gn->path && (cp = strrchr(gn->path, '/')) && cp > gn->path) { if (strncmp(curdir, gn->path, (cp - gn->path)) != 0) { cp = NULL; /* not in .CURDIR */ } } if (!cp) { if (DEBUG(META)) fprintf(debug_file, "%s: required but missing\n", fname); oodate = TRUE; needOODATE = TRUE; /* assume the worst */ } } } Lst_Destroy(missingFiles, (FreeProc *)free); if (oodate && needOODATE) { /* * Target uses .OODATE which is empty; or we wouldn't be here. * We have decided it is oodate, so .OODATE needs to be set. * All we can sanely do is set it to .ALLSRC. */ Var_Delete(OODATE, gn); Var_Set(OODATE, Var_Value(ALLSRC, gn, &cp), gn, 0); free(cp); } oodate_out: for (i--; i >= 0; i--) { free(pa[i]); } return oodate; } /* support for compat mode */ static int childPipe[2]; void meta_compat_start(void) { #ifdef USE_FILEMON_ONCE /* * We need to re-open filemon for each cmd. 
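 * (In compat mode each command runs in its own shell and the previous
 * command's monitor has already been drained by meta_cmd_finish(), so a
 * fresh filemon instance is needed per command.)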
*/ BuildMon *pbm = &Mybm; if (pbm->mfp != NULL && useFilemon) { filemon_open(pbm); } else { pbm->mon_fd = pbm->filemon_fd = -1; } #endif if (pipe(childPipe) < 0) Punt("Cannot create pipe: %s", strerror(errno)); /* Set close-on-exec flag for both */ (void)fcntl(childPipe[0], F_SETFD, FD_CLOEXEC); (void)fcntl(childPipe[1], F_SETFD, FD_CLOEXEC); } void meta_compat_child(void) { meta_job_child(NULL); if (dup2(childPipe[1], 1) < 0 || dup2(1, 2) < 0) { execError("dup2", "pipe"); _exit(1); } } void meta_compat_parent(void) { FILE *fp; char buf[BUFSIZ]; close(childPipe[1]); /* child side */ fp = fdopen(childPipe[0], "r"); while (fgets(buf, sizeof(buf), fp)) { meta_job_output(NULL, buf, ""); printf("%s", buf); - (void)fflush(stdout); + fflush(stdout); } fclose(fp); } #endif /* USE_META */ Index: projects/clang390-import/contrib/bmake/mk/ChangeLog =================================================================== --- projects/clang390-import/contrib/bmake/mk/ChangeLog (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/ChangeLog (revision 305687) @@ -1,1180 +1,1211 @@ +2016-08-15 Simon J. Gerraty + + * install-mk (MK_VERSION): 20160815 + + * dirdeps.mk (.MAKE.META.IGNORE_FILTER): set filter to only + consider Makefile.depend* when checking if DIRDEPS_CACHE is up-to-date. + +2016-08-13 Simon J. Gerraty + + * meta.sys.mk (.MAKE.META.IGNORE_PATHS): + in meta mode we can ignore the mtime of makefiles + +2016-08-02 Simon J. Gerraty + + * install-mk (MK_VERSION): 20160802 + + * lib.mk (libinstall): depends on beforeinstall + + * prog.mk (proginstall): depends on beforeinstall + patch from Lauri Tirkkonen + + * dirdeps.mk (bootstrap): When bootstrapping; create + .MAKE.DEPENDFILE_DEFAULT and allow additional filtering via + .MAKE.DEPENDFILE_BOOTSTRAP_SED + + * dirdeps.mk: move some comments to where they make sense. + +2016-07-27 Simon J. Gerraty + + * dirdeps.mk (DIRDEPS_CACHE): no dirname. + 2016-06-02 Simon J. Gerraty * install-mk (MK_VERSION): 20160602 * meta.autodep.mk: when passing META_FILES to gendirdeps.mk do not apply :T to META_XTRAS patch from Bryan Drewery at FreeBSD.org. 2016-05-30 Simon J. Gerraty * install-mk (MK_VERSION): 20160530 * meta.stage.mk: we assume ${CLEANFILES} gets .NOPATH make it so. 2016-05-12 Simon J. Gerraty * install-mk (MK_VERSION): 20160512 * dpadd.mk: always include local.dpadd.mk if it exists remove some things that better belong in local.dpadd.mk skip INCLUDES_* for staged libs unless SRC_* defined. * own.mk: add INCLUDEDIR 2016-04-18 Simon J. Gerraty * dirdeps.mk: when doing -f dirdeps.mk if target suppies no TARGET_MACHINE - :E will be empty or match part of path, use ${MACHINE} 2016-04-07 Simon J. Gerraty * meta.autodep.mk: issue a warning if UPDATE_DEPENDFILE=NO due to NO_FILEMON_COOKIE * dirdeps.mk: move the logic that allows for make -f dirdeps.mk some/dir.${TARGET_SPEC} inside the check for !target(_DIRDEP_USE) 2016-04-04 Simon J. Gerraty * Use <> when including local*.mk and others which may exist elsewhere so that user can better control what they get. * meta.autodep.mk (NO_FILEMON_COOKIE): create a cookie if we ever build dir with nofilemon so that UPDATE_DEPENDFILE will be forced to NO until cleaned. 2016-04-01 Simon J. Gerraty * install-mk (MK_VERSION): 20160401 * meta2deps.py: fix old print statement when debugging. * gendirdeps.mk: META2DEPS_CMD append M2D_EXCLUDES with -X patch from Bryan Drewery 2016-03-22 Simon J. Gerraty * install-mk (MK_VERSION): 20160317 (St.
Pats) * warnings.mk: g++ does not like -Wimplicit * sys.mk sys/*.mk lib.mk prog.mk: use CXX_SUFFIXES to handle the pelthora of common suffixes for C++ * lib.mk: use .So for shared objects 2016-03-15 Simon J. Gerraty * install-mk (MK_VERSION): 20160315 * meta.stage.mk (LN_CP_SCRIPT): do not ln(1) if we have to chmod(1) normally only applies to scripts. * dirdeps.mk: NO_DIRDEPS_BELOW to supress DIRDEPS below RELDIR as well as outside it. 2016-03-10 Simon J. Gerraty * install-mk (MK_VERSION): 20160310 * dirdeps.mk: use targets rather than a list to track DIRDEPS that we have processed; the list gets very inefficient as number of DIRDEPS gets large. * sys.dependfile.mk: fix comment wrt MACHINE * meta.autodep.mk: ignore staged DPADDs when bootstrapping. patch from Bryan Drewery 2016-03-02 Simon J. Gerraty * meta2deps.sh: don't ignore subdirs. patch from Bryan Drewery 2016-02-26 Simon J. Gerraty * install-mk (MK_VERSION): 20160226 * gendirdeps.mk: mark _DEPENDFILE .NOMETA 2016-02-20 Simon J. Gerraty * dirdeps.mk: we shouldn't normally include .depend but if we do use .dinclude if we can. 2016-02-18 Simon J. Gerraty * install-mk (MK_VERSION): 20160218 * sys.clean-env.mk: with recent change to Var_Subst() we cannot use the '$$' trick, but .export-literal does the job we need. * auto.dep.mk: make use .dinclude if we can. 2016-02-05 Simon J. Gerraty * dirdeps.mk: Add _build_all_dirs such that local.dirdeps.mk can add fully qualified dirs to it. These will be built normally but the current DEP_RELDIR will not depend on then (to avoid cycles). This makes it easy to hook things like unit-tests into build. 2016-01-21 Simon J. Gerraty * dirdeps.mk: add bootstrap-empty 2015-12-12 Simon J. Gerraty * install-mk (MK_VERSION): 20151212 * auto.obj.mk: do not require MAKEOBJDIRPREFIX to exist. only apply :tA to __objdir when comparing to .OBJDIR 2015-11-14 Simon J. Gerraty * install-mk (MK_VERSION): 20151111 * meta.sys.mk: include sys.dependfile.mk * sys.mk (OPTIONS_DEFAULT_NO): use options.mk to set MK_AUTO_OBJ and MK_DIRDEPS_BUILD include local.sys.env.mk early include local.sys.mk later * own.mk (OPTIONS_DEFAULT_NO): AUTO_OBJ etc moved to sys.mk 2015-11-13 Simon J. Gerraty * meta.sys.mk (META_COOKIE_TOUCH): add ${META_COOKIE_TOUCH} to the end of scripts to touch cookie * meta.stage.mk: stage_libs should ignore SYMLINKS. 2015-10-23 Simon J. Gerraty * install-mk (MK_VERSION): 20151022 * sys.mk: BSD/OS does not have 'type' as a shell builtin. 2015-10-20 Simon J. Gerraty * install-mk (MK_VERSION): 20151020 * dirdeps.mk: Add logic for make -f dirdeps.mk some/dir.${TARGET_SPEC} 2015-10-14 Simon J. Gerraty * install-mk (MK_VERSION): 20151010 2015-10-02 Simon J. Gerraty * meta.stage.mk: use staging: ${STAGE_TARGETS:... to have stage_lins run last in non-jobs mode. Use .ORDER only for jobs mode. 2015-09-02 Simon J. Gerraty * rst2htm.mk: allow for per target flags etc. 2015-09-01 Simon J. Gerraty * install-mk (MK_VERSION): 20150901 * doc.mk: create dir if needed use DOC_INSTALL_OWN 2015-06-15 Simon J. Gerraty * install-mk (MK_VERSION): 20150615 * auto.obj.mk: allow use of MAKEOBJDIRPREFIX too. Follow make's normal precedence rules. * gendirdeps.mk: allow customization of the header. eg. for FreeBSD: GENDIRDEPS_HEADER= echo '\# ${FreeBSD:L:@v@$$$v$$ @:M*F*}'; * meta.autodep.mk: ignore dirdeps.cache* * meta.stage.mk: when bootstrapping options it can be handy to throw warnings rather than errors for staging conflicts. * meta.sys.mk: include local.meta.sys.mk for customization 2015-06-06 Simon J. 
Gerraty * install-mk (MK_VERSION): 20150606 * dirdeps.mk: don't rely on manually maintained Makefile.depend to set DEP_RELDIR and reset DIRDEPS. By setting DEP_RELDIR ourselves we can skip :tA * gendirdeps.mk: skip setting DEP_RELDIR. 2015-05-24 Simon J. Gerraty * dirdeps.mk: avoid wildcards like make(bootstrap*) 2015-05-20 Simon J. Gerraty * install-mk (MK_VERSION): 20150520 * dirdeps.mk: when we are building dirdeps cache file we *want* meta_oodate to look at all the Makefile.depend files, so set .MAKE.DEPENDFILE to something that won't match. * meta.stage.mk: for STAGE_AS_* basename of file may not be unique so first use absolute path as key. Also skip staging at level 0. 2015-04-30 Simon J. Gerraty * install-mk (MK_VERSION): 20150430 * dirdeps.mk: fix _count_dirdeps for non-cache case. 2015-04-16 Simon J. Gerraty * install-mk (MK_VERSION): 20150411 bump version * own.mk: put AUTO_OBJ in OPTIONS_DEFAULT_NO rather than YES. it is here mainly for documentation purposes, since if using auto.obj.mk it is better done via sys.mk 2015-04-01 Simon J. Gerraty * install-mk (MK_VERSION): 20150401 * meta2deps.sh: support @list * meta2deps.py: updates from Juniper o add EXCLUDES o skip bogus input files. o treat 'M' and 'L' as both an 'R' and a 'W' 2015-03-03 Simon J. Gerraty * install-mk (MK_VERSION): 20150303 * dirdeps.mk: if MK_DIRDEPS_CACHE is yes, use dirdeps-cache which is built via sub-make so we have a .meta file to tell if it is out-of-date. The dirdeps-cache contains the same dependency rules that we normaly construct on the fly. This adds a few seconds overhead when the cache is out of date, but for a large target, the savings can be significant (10-20min). 2014-11-18 Simon J. Gerraty * install-mk (MK_VERSION): 20141118 * meta.stage.mk: add stale_staged * dirdeps.mk (_DIRDEP_USE_LEVEL): allow this to be tweaked only useful under very rare conditions such as FreeBSD's make universe. * auto.obj.mk: Allow MK_AUTO_OBJ to set MKOBJDIRS=auto 2014-11-11 Simon J. Gerraty * install-mk (MK_VERSION): 20141111 * mkopt.sh: use consistent semantics for _mk_opt and _mk_opts 2014-11-09 Simon J. Gerraty * FILES: include mkopt.sh which allows handling options in shell scripts in a manner compatible with options.mk 2014-10-12 Simon J. Gerraty * meta.stage.mk: ensure only _STAGED_DIRS under objroot are used for GENDIRDEPS_FILTER to avoid surprises. 2014-10-10 Simon J. Gerraty * dirdeps.mk (NSkipHostDir): this needs SRCTOP prepended since by the time it is applied to __depdirs they have. * dirdeps.mk fix filtering of _machines since M_dep_qual_fixes expects patterns like *.${MACHINE} * cython.mk (pyprefix?): use pyprefix to find python bits since prefix might be something else (where we install our stuff) 2014-09-11 Simon J. Gerraty * install-mk (MK_VERSION): 20140911 * dirdeps.mk: add bootstrap target to simplify adding support for new MACHINE. 2014-09-01 Simon J. Gerraty * gendirdeps.mk: Add handling of GENDIRDEPS_FILTER_DIR_VARS and GENDIRDEPS_FILTER_VARS to make it easier to produce sharable Makefile.depend files. 2014-08-28 Simon J. Gerraty * install-mk (MK_VERSION): 20140828 * cython.mk: capture logic for building python extension modules with Cython. 2014-08-08 Simon J. Gerraty * meta.stage.mk (_STAGE_AS_BASENAME_USE): Add StageAs variant 2014-08-02 Simon J. Gerraty * install-mk (MK_VERSION): 20140801 * dep.mk: use explicit MKDEP_MK rather than overload MKDEP to identify the autodep.mk variant. 
* sys.dependfile.mk: delete .MAKE.DEPENDFILE if its initial value does not match .MAKE.DEPENDFILE_PREFIX * meta.autodep.mk: if _bootstrap_dirdeps add RELDIR to DIRDEPS 2014-05-22 Simon J. Gerraty * install-mk (MK_VERSION): 20140522 * lib.mk: use CC to link shlib for linux too patch from Brendan MacDonell 2014-05-05 Simon J. Gerraty * meta.autodep.mk: add _reldir_{finish,failed} for gathering stats if WITH_META_STATS is defined. 2014-05-02 Simon J. Gerraty * dirdeps.mk: accept -DWITHOUT_DIRDEPS (same a as -DNO_DIRDEPS) to supress dirdeps outside of .CURDIR. 2014-04-05 Simon J. Gerraty * Fix spelling errors - patch from Pedro Giffuni 2014-03-14 Simon J. Gerraty * install-mk (MK_VERSION): 20140314 * dirdeps.mk (beforedirdeps): a handy hook * dirdeps.mk (DIRDEP_MAKE): allow the actual command we run to visit leaf dirs to be intercepted (eg. for distributed build). * dirdeps.mk (__depdirs): ensure // don't sneak in * gendirdeps.mk (DIRDEPS): ensure // don't sneak in 2014-02-21 Simon J. Gerraty * rst2htm.mk (RST2PDF): add support for rst2pdf 2014-02-14 Simon J. Gerraty * install-mk (MK_VERSION): bump version * dirdeps.mk (_last_dependfile): use .INCLUDEDFROMFILE if available. 2014-02-10 Simon J. Gerraty * options.mk: avoid :U so this isn't bmake dependent 2014-02-09 Simon J. Gerraty * options.mk: cleanup and simplify semanitcs NO_* dominates all, if both WITH_* and WITHOUT_* are defined then result is DOMINATE_* which defaults to "no". Ie. WITHOUT_ normally wins. 2013-12-12 Simon J. Gerraty * install-mk (MK_VERSION): bump version * meta2deps.py: convert to print function for python3 compat. we also need to open files with mode 'r' rather than 'rb' otherwise we get bytes instead of strings. 2013-10-10 Simon J. Gerraty * install-mk (MK_VERSION): bump version * dirdeps.mk: when TARGET_SPEC_VARS is more than just MACHINE apply the same filtering (M_dep_qual_fixes) when setting _machines as _build_dirs. Also fix the filtering of Makefile.depend files - for reporting what we are looking for (M_dep_qual_fixes can get confused by Makefile.depend) Add some more debug info. 2013-09-04 Simon J. Gerraty * gendirdeps.mk (_objtops): fix typo also while processing M2D_OBJROOTS to gather qualdir_list qualify $ql with loop iterator to ensure correct results. 2013-08-01 Simon J. Gerraty * install-mk (MK_VERSION): 20130801 * libs.mk: update to match progs.mk 2013-07-26 Simon J. Gerraty * install-mk (MK_VERSION): 20130726 some updates from Juniper and FreeBSD o meta2deps.py: indicate file and line number when we hit parse errors also allow @file to provide huge list of .meta files. * meta2deps.py: add try_parse() to cleanup the above. 2013-07-16 Simon J. Gerraty * install-mk (MK_VERSION): 20130716 * own.mk: add GPROG as an option * prog.mk: honor MK_GPROF==yes 2013-05-10 Simon J. Gerraty * install-mk (MK_VERSION): 20130505 * gendirdeps.mk, meta2deps.py, meta2deps.sh: handle $TARGET_SPEC for when $MACHINE isn't enough for objdir distinction. Bring meta2deps.sh closer to par with meta2deps.py. 2013-04-18 Simon J. Gerraty * meta.stage.mk: set INSTALL to STAGE_INSTALL when making 'all' also if the target 'beforeinstall' exists, make it depend on .dirdep (incase it uses STAGE_INSTALL). 2013-04-17 Simon J. Gerraty * install-mk (MK_VERSION): 20130401 ;-) * meta.stage.mk (STAGE_INSTALL_SH): add stage-install.sh as wrapper around install(1). * options.mk (OPTION_PREFIX): Allow a prefix other than MK_ 2013-03-30 Simon J. Gerraty * meta2deps.py (MetaFile.__init__): ensure self.cwd is initialized. 
* install-mk (MK_VERSION): bump version 2013-03-21 Simon J. Gerraty * install-mk (MK_VERSION): bump version * gendirdeps.mk: do not apply :tA to DPADD entries, since we lose any trailing /., rather apply :tA only when needed. * gendirdeps.mk: better mimic meta2deps handling of .dirdep files. * meta.stage.mk (LN_CP_SCRIPT): Add LnCp to do the ln||cp dance consistently. * dirdeps.mk: better describe the dance in sys.mk for TARGET_SPEC. 2013-03-18 Simon J. Gerraty * gendirdeps.mk: revert the dance around .MAKE.DEPENDFILE_DEFAULT; it is simpler to just not update when, say, building for "host" (where we know we apply filters to DIRDEPS), and using a non-machine qualified dependfile. 2013-03-16 Simon J. Gerraty * dirdeps.mk: improve DIRDEPS filtering by allowing DEP_SKIP_DIR and DEP_DIRDEPS_FILTER to vary by DEP_MACHINE and DEP_TARGET_SPEC * gendirdeps.mk: ensure _objroot has trailing / if it needs it. * meta2deps.py: if machine is "host", then also trim self.host_target from any OBJROOTS. 2013-03-11 Simon J. Gerraty * gendirdeps.mk: if .MAKE.DEPENDFILE_DEFAULT is not machine qualified but _DEPENDFILE is, and .MAKE.DEPENDFILE_DEFAULT exists but _DEPENDFILE does not, compare the new _DEPENDFILE against .MAKE.DEPENDFILE_DEFAULT and discard if the same. 2013-03-08 Simon J. Gerraty * meta.stage.mk: use STAGE_TARGETS to control .ORDER and hook to all: via staging: 2013-03-07 Simon J. Gerraty * sys.dependfile.mk (.MAKE.DEPENDFILE_DEFAULT): use a separate variable for the default .MAKE.DEPENDFILE value so that it can be controlled independently of .MAKE.DEPENDFILE_PREFERENCE * meta.stage.mk: throw error if cp fails etc. Stage*() return early if passed no args. .ORDER stage_* 2013-03-03 Simon J. Gerraty * install-mk (MK_VERSION): bump version * gendirdeps.mk: handle multiple M2D_OBJROOTS better. 2013-02-10 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20130210 * import latest dirdeps.mk, gendirdeps.mk and meta2deps.py from Juniper. o dirdeps.mk now fully supports TARGET_SPEC consisting of more than just MACHINE. o no longer use DEP_MACHINE from Makefile.depend* so remove it. 2013-01-23 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20130123 * meta.stage.mk: add stage_links (hard links). If doing hard links, we add dest to link as well. Default the stage dir for [sym]links to STAGE_OBJTOP since these are typically specified as absolute paths. Add -m "mode" flag to StageFiles and StageAs. 2012-11-11 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20121111 * autoconf.mk: avoid meta mode seeing changed commands for config.status * meta.autodep.mk: pass resolved MAKESYSPATH to gendirdeps in case we were found via .../mk * sys.clean-env.mk: move it from examples, we and others use it "as is". * FILES: add srctop.mk and options.mk * own.mk: convert to using options.mk which is modeled after FreeBSD's handling of MK_* but more flexible. This allows MK_* for boolean knobs to not be confused with MK* which can be commands. * examples/sys.clean-env.mk: add WITH[OUT]_ to MAKE_ENV_SAVE_PREFIX_LIST. Mention that HOME=/var/empty might be a good idea. 2012-11-08 Simon J. Gerraty * sys.dependfile.mk: if no depend file exists, $MACHINE specific ones are supported but not the default, check if any exist and follow suit. 2012-11-06 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20121106 2012-11-05 Simon J. Gerraty * import latest dirdeps.mk and meta2deps.py from Juniper.
* progs.mk: add MAN and CXXFLAGS to PROG_VARS; also add PROGS_TARGETS and pass on PROG_CXX if it seems appropriate. 2012-11-04 Simon J. Gerraty * meta.stage.mk: update CLEANFILES; remove redundant cp of .dirdep from STAGE_AS_SCRIPT. * progs.mk: Add LDADD to PROG_VARS 2012-10-12 Simon J. Gerraty * meta.stage.mk (STAGE_DIR_FILTER): track dirs we stage to in _STAGED_DIRS so that these can be turned into filters for GENDIRDEPS_FILTER. 2012-10-10 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20121010 * meta.stage.mk (STAGE_DIRDEP_SCRIPT): check that an existing target.dirdep matches .dirdep 2012-08-08 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120808 * import latest meta2deps.py from Juniper. 2012-07-11 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120711 * dep.mk: add explicit dependencies on SRCS after applying SRCS_DEP_FILTER * meta.autodep.mk: add explicit dependencies on SRCS after applying SRCS_DEP_FILTER * meta.autodep.mk: ensure GENDIRDEPS_FILTER is exported if needed. 2012-06-26 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120626 * meta.sys.mk: ignore PYTHON if it does not exist; comparing ${.MAKE.DEPENDFILE:E} against ${MACHINE} is more reliable. * meta.stage.mk: examine .MAKE.DEPENDFILE_PREFERENCE for any entries ending in .${MACHINE} to decide if qualified _dirdep is needed. * gendirdeps.mk: only produce unqualified deps if no .MAKE.DEPENDFILE_PREFERENCE ends in .${MACHINE} * meta.subdir.mk: apply SUBDIRDEPS_FILTER 2012-04-20 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120420 * add sys.dependfile.mk so we can experiment with .MAKE.DEPENDFILE_PREFERENCE * meta.autodep.mk: _DEPENDFILE is precious! 2012-03-15 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120315 * install-new.mk: avoid being interrupted 2012-02-26 Simon J. Gerraty * man.mk: MAN might have multiple values so be careful with exists(). 2012-01-19 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20120112 * fix examples/sys.clean-env.mk so that MAKEOBJDIR is handled as: MAKEOBJDIR='${.CURDIR:S,${SRCTOP},${OBJTOP},}' 2011-12-03 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20111201 * import dirdeps.mk from Juniper sjg@ o more consistent handling of DEP_MACHINE, especially when dealing with an odd Makefile.depend, when normally using Makefile.depend.${MACHINE} 2011-11-22 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20111122 * meta.autodep.mk: add some debug output, be more crisp about updating. Use ${.ALLTARGETS:M*.o} as a clue for .depend 2011-11-13 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20111111 it's too cool to miss * import meta* updates from Juniper sjg@ o dirdeps.mk set DEP_MACHINE for Makefile.depend (when we are normally using Makefile.depend.${MACHINE}), handy for read-only manually maintained dependencies. o meta2deps.py add a clear 'ERROR:' token if an exception is raised. o gendirdeps.mk if ERROR: from meta2deps.py do not update anything. 2011-10-30 Simon J. Gerraty * install-new.mk: separate the cmp and copy logic into its own function. 2011-10-28 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20111028 * sys.mk: include auto.obj.mk if MKOBJDIRS is set to auto * subdir.mk: ensure _SUBDIRUSE is provided * meta.autodep.mk: remove dependency of gendirdeps.mk on auto.obj.mk * meta.subdir.mk: always allow for Makefile.depend 2011-10-10 Simon J.
Gerraty * install-mk (MK_VERSION): bump version to 20111010 o minor tweak to *dirdeps.mk from Juniper sjg@ 2011-10-01 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20111001 o add meta2deps.py from Juniper sjg@ o tweak gendirdeps.mk to work with meta2deps.py when not cross-building * autoconf.mk: add autoconf-input as a hook for regenerating AUTOCONF_INPUTS (configure). 2011-08-24 Simon J. Gerraty * meta.autodep.mk: if we do not have OBJS, .depend isn't a useful trigger for updating Makefile.depend* 2011-08-08 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110808 * obj.mk: minor cleanup * auto.obj.mk: improve description of Mkdirs and honor NO_OBJ too. 2011-08-01 Simon J. Gerraty * auto.obj.mk (.OBJDIR): throw an error if we cannot use the specified dir. 2011-06-28 Simon J. Gerraty * meta.autodep.mk: if XMAKE_META_FILE is set the makefile uses a foreign make, and so dependencies can only be gathered from a clean tree build. 2011-06-24 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110622 * meta.autodep.mk: improve bootstrapping 2011-06-10 Simon J. Gerraty * yacc.mk: handle the corner case of .c being removed while .h remains. 2011-06-08 Simon J. Gerraty * yacc.mk: do .y.h and .y.c separately 2011-06-04 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110606 * don't store SRC_DIRDEPS in Makefile.depend* by default; not everyone needs it. 2011-05-04 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110505 first release including meta mode makefiles 2011-05-02 Simon J. Gerraty * meta.stage.mk: add STAGE_AS_SETS and stage_as for things that need to be staged with different names. 2011-05-01 Simon J. Gerraty * meta.stage.mk: add notion of STAGE_SETS so a makefile can stage to multiple dirs 2011-04-03 Simon J. Gerraty * rst2htm.mk: convert rst to s5 (slides) or plain html depending on target name. 2011-03-30 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110330 2011-03-29 Simon J. Gerraty * sys.mk (_DEBUG_MAKE_FLAGS): use indirection so that DEBUG_MAKE_FLAGS0 can be used to debug level 0 only and DEBUG_MAKE_FLAGS for the rest. * sys.mk: re-define M_whence in terms of M_type. M_type is useful for checking if something is a builtin. 2011-03-16 Simon J. Gerraty * meta.stage.mk: add stage_symlinks and leverage StageLinks for stage_libs 2011-03-10 Simon J. Gerraty * dirdeps.mk: correct value for _depdir_files depends on .MAKE.DEPENDFILE. Add our copyright - just to make it clear we have frobbed this quite a bit. DEP_MACHINE needs to be set to MACHINE each time, if using only Makefile.depend (cf. Makefile.depend.${MACHINE}) * meta.stage.mk: meta mode version of staging * init.mk, final.mk: include local.*.mk to simplify customization 2011-03-03 Simon J. Gerraty * auto.obj.mk: just because we are doing mk destroy, we should still set .OBJDIR correctly if it exists. * install-mk (mksrc): do not exclude meta.sys.mk 2011-03-01 Simon J. Gerraty * host-target.mk: set/export _HOST_ARCH etc separately, catch junk resulting from uname -p, so we can find sys/Linux.mk correctly. 2011-02-18 Simon J. Gerraty * meta.sys.mk: throw an error if /dev/filemon is missing and we expected to be updating Makefile.depend* 2011-02-14 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20110214 * meta.subdir.mk: add support for -DBOOTSTRAP_DEPENDFILES 2010-09-25 Simon J. Gerraty * meta.sys.mk: not valid for older bmake 2010-09-24 Simon J.
Gerraty * install-mk (MK_VERSION): bump version to 20100919 include dirdeps.mk et al from Juniper Networks, for meta mode - requires filemon(9). * sys.mk, subdir.mk: Add hooks for meta mode. We do this as meta.sys.mk, meta.autodep.mk and meta.subdir.mk to make turning it on/off simple. 2010-06-16 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100616 * fix typo in sys.mk 2010-06-12 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100612 * lib.mk: remove duplicate addition to SOBJS 2010-06-10 Simon J. Gerraty * sys.mk: Add a means of selectively turning on debug flags. Eg. DEBUG_MAKE_FLAGS=-dv DEBUG_MAKE_DIRS="*lib/sjg" will act as if we did make -dv if .CURDIR ends in lib/sjg DEBUG_MAKE_SYS_DIRS does the same thing, but we set the flags at the start of sys.mk rather than the end. This only makes sense for leaf dirs, so we check that .MAKE.LEVEL > 0 2010-06-09 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100608 * sys.mk: include sys.env.mk later so it can use M_ListToSkip et al. * examples/sys.clean-env.mk: require MAKE_VERSION >= 20100606; also make it easier for folk to tweak 2010-06-08 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100606 do not install examples/* * FILES: add examples/sys.clean-env.mk * examples/sys.clean-env.mk: use .export-env to handle MAKEOBJDIR; this requires bmake-20100606 or later to work. 2010-05-13 Simon J. Gerraty * sys.mk (M_tA): better simulate the result of :tA if not available. 2010-05-04 Simon J. Gerraty * sys.mk: canonicalize MAKE_VERSION; old versions reported bmake-<version> build-<date> whereas we only care about <version> 2010-04-25 Simon J. Gerraty * install-mk: just warn about FORCE_{BSD,SYS}_MK being ignored * lib.mk: we only build the shared lib if SHLIB_FULLVERSION is !empty 2010-04-22 Simon J. Gerraty * dpadd.mk: use LDADD_* if defined. 2010-04-21 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100420 * sys/NetBSD.mk: add MACHINE_CPU to keep netbsd makefiles happy * autoconf.mk: allow AUTO_AUTOCONF 2010-04-19 Simon J. Gerraty * obj.mk: add objwarn to keep freebsd makefiles happy * auto.obj.mk: ensure Mkdirs is available. * FILES: add auto.dep.mk - a simpler version of autodep.mk * dep.mk: auto.dep.mk does not do 'make depend' so ignore it if asked to do that. fix/simplify the tests for when to run mkdep. * auto.dep.mk: add some explanation of how/what we do. * autodep.mk: skip the .OPTIONAL frobbing of .depend; bmake's FROM_DEPEND flag makes it redundant. 2010-04-13 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100404 * subdir.mk: protect from multiple inclusion using _SUBDIRUSE. * obj.mk: protect from multiple inclusion even as bsd.obj.mk. Also create a target _SUBDIRUSE so that we can be used without subdir.mk 2010-04-12 Simon J. Gerraty * dep.mk: use <> when .including so we can override. 2010-01-11 Simon J. Gerraty * lib.mk (SHLIB_LINKS): ensure a string comparison. 2010-01-04 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20100102 * own.mk: ensure PRINTOBJDIR works * autoconf.mk: pass on CONFIGURE_ARGS * init.mk: handle COPTS.${.IMPSRC:T} etc. * lib.mk: allow sys.mk to control SHLIB_FULLVERSION; fix handling of symlinks for darwin * libnames.mk: add DSHLIBEXT for libs which only exist as shared. * man.mk: suppress chown when not root. * rst2htm.mk: allow srcs from multiple locations. * sys.mk: M_whence, stop after 1st line of output. * sys/Darwin.mk: Use .dylib for DSHLIBEXT and HOST_LIBEXT * sys/SunOS.mk: we need to export PATH 2009-12-23 Simon J.
Gerraty * install-mk (MK_VERSION): bump version; include rst2htm.mk 2009-12-17 Simon J. Gerraty * sys.mk, libnames.mk: add .-include; this allows local customization without the need to edit the distributed files. 2009-12-14 Simon J. Gerraty * dpadd.mk (__dpadd_libdirs): order -L's to avoid picking up older versions already installed. 2009-12-13 Simon J. Gerraty * stage.mk (.stage-install): generalize lib.mk's .libinstall * rules.mk: rules for generic Makefile. * inc.mk: install for includes. 2009-12-11 Simon J. Gerraty * sys/NetBSD.mk (MAKE_VERSION): some of our *.mk want to check this, so provide it if using native make. 2009-12-10 Simon J. Gerraty * FILES: move all the platform *.sys.mk files to sys/*.mk * Rename Generic.sys.mk to sys.mk - we always want it. 2009-11-17 Simon J. Gerraty * install-mk (MK_VERSION): bump version * host-target.mk: only export the expensive stuff * Generic.sys.mk (sys_mk): for SunOS we need to look for ${HOST_OS}.${HOST_OSMAJOR} too! 2009-11-07 Simon J. Gerraty * install-mk (MK_VERSION): bump version * lib.mk: if sys.mk doesn't give us an lorder, don't use it. based on patch from Greg Olszewski. * Generic.sys.mk: if we have nothing to work with, set LORDER etc. only if we can find it. 2009-09-08 Simon J. Gerraty * install-mk (MK_VERSION): bump version * man.mk: cleanman: remove CLEANMAN if defined. 2009-09-04 Simon J. Gerraty * SunOS.5.sys.mk (CC): Use ?= like the other *sys.mk 2009-07-17 Simon J. Gerraty * install-mk (MK_VERSION): bump version; include auto.obj.mk 2009-03-26 Simon J. Gerraty * prog.mk,lib.mk: ensure test of USE_DPADD_MK doesn't fail. 2008-11-11 Simon J. Gerraty * install-mk (MK_VERSION): bump version * man.mk: ensure we generate *.cat1 etc in . 2008-07-16 Simon J. Gerraty * install-mk (MK_VERSION): bump version; add prlist.mk 2007-11-25 Simon J. Gerraty * Generic.sys.mk: Allow os specific sys.mk to be in a subdir of ${.PARSEDIR} 2007-11-22 Simon J. Gerraty * install-mk (MK_VERSION): bump version * general cleanup * dpadd.mk: introduce DPMAGIC_LIBS_* 2007-04-30 Simon J. Gerraty * install-mk (MK_VERSION): bump version * libs.mk, progs.mk, autodep.mk: allow for per lib/prog depend files and ensure clean is called for each lib/prog. 2007-03-27 Simon J. Gerraty * autodep.mk (.depend): delete lines that do not start with space and do not contain ':' 2007-02-16 Simon J. Gerraty * autodep.mk (.depend): gcc may wrap lines if pathnames are long so make sure the transform for .OPTIONAL copes. 2007-02-03 Simon J. Gerraty * install-mk (MK_VERSION): bump version * own.mk: make sure RM and LN are defined. * obj.mk: fix a typo, and objlink target. 2006-12-30 Simon J. Gerraty * install-mk (MK_VERSION): bump version * added libs.mk - analogous to progs.mk; make both of them always include {lib,prog}.mk 2006-12-28 Simon J. Gerraty * progs.mk: add a means of building multiple apps in one dir. 2006-11-26 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20061126 * warnings.mk: detect invalid WARNINGS_SET * warnings.mk: use ${.TARGET:T:R}.o when looking for target specific warnings. * For .cc sources, turn off warnings that g++ vomits on. 2006-11-08 Simon J. Gerraty * own.mk: if __initialized__ target doesn't exist and we are FreeBSD we got here directly from sys.mk 2006-11-06 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20061106; add scripts.mk 2006-03-18 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20060318 * autodep.mk: avoid := when modifying OBJS into __dependsrcs 2006-03-02 Simon J.
Gerraty * install-mk (MK_VERSION): bump version to 20060302 * autodep.mk: use -MF et al to help gcc+ccache DTRT. 2006-03-01 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20060301 * autodep.mk (.depend): if MAKE_VERSION is newer than 20050530 we can make .END depend on .depend and make .depend depend on __depsrcs that exist. * dpadd.mk: add SRC_PATHADD 2005-11-04 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20051104 * prog.mk: remove all the LIBC?= junk, use .-include libnames.mk instead (none by default). Also, if USE_DPADD_MK is set, include that. 2005-10-09 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20051001 Add UnixWare.sys.mk from Klaus Heinz. 2005-04-05 Simon J. Gerraty * install-mk: always install *.sys.mk and if need be symlink one to sys.mk 2005-03-22 Simon J. Gerraty * subdir.mk, own.mk: use .MAKE rather than MAKE 2004-02-15 Simon J. Gerraty * own.mk: don't use NetBSD's _SRC_TOP_; it can cause confusion. Also don't take just 'mk' as a srctop indicator. 2004-02-14 Simon J. Gerraty * warnings.mk: overhauled, now very powerful. 2004-02-03 Simon J. Gerraty * Generic.sys.mk: need to use ${.PARSEDIR} with exists(). 2004-02-01 Simon J. Gerraty * install-mk (MK_VERSION): bump version to 20040201 * extract HOST_TARGET stuff to host-target.mk so own.mk and Generic.sys.mk can share. * fix typo in autodep.mk: _SUBDIRUSE not _SUBDIR. 2003-09-30 Simon J. Gerraty * install-mk (MK_VERSION): 20030930 * rename generic.sys.mk to Generic.sys.mk so that it does not get installed (unless being used as sys.mk) * set OS and ROOT_GROUP for those that we know the value for. For others (eg. Generic.sys.mk) wrap the != in an .ifndef so we don't do it again for each sub-make. 2003-09-28 Simon J. Gerraty * install-mk (MK_VERSION): 20030928 Add some extra *.sys.mk from bootstrap-pkgsrc; some of these likely still need work. Make everything default to root:wheel ownership, sys.mk can set ROOT_GROUP accordingly. 2003-08-07 Simon J. Gerraty * install-mk: if FORCE_BSD_MK={cp,ln} use the ones in SYS_MK_DIR not the portable ones. 2003-07-31 Simon J. Gerraty * install-mk: add ability to use cp -f when updating destination .mk files. Also now possible to play games with FORCE_SYS_MK=ln etc on *BSD machines to link /usr/share/mk/sys.mk into dest - not recommended unless you seriously want to. 2003-07-28 Simon J. Gerraty * own.mk (IMPFLAGS): add support for COPTS.${IMPSRC:T} etc for semi-compatibility with NetBSD. 2003-07-23 Simon J. Gerraty * install-mk: add a version indicator 2003-07-22 Simon J. Gerraty * prog.mk: don't try to use ${LIBCRT0} if it's /dev/null * install-mk: Allow FORCE_SYS_MK to come from env Index: projects/clang390-import/contrib/bmake/mk/dirdeps.mk =================================================================== --- projects/clang390-import/contrib/bmake/mk/dirdeps.mk (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/dirdeps.mk (revision 305687) @@ -1,711 +1,724 @@ -# $Id: dirdeps.mk,v 1.67 2016/04/18 21:50:47 sjg Exp $ +# $Id: dirdeps.mk,v 1.73 2016/08/15 19:28:13 sjg Exp $ # Copyright (c) 2010-2013, Juniper Networks, Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2.
Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # Much of the complexity here is for supporting cross-building. # If a tree does not support that, simply using plain Makefile.depend # should provide sufficient clue. # Otherwise the recommendation is to use Makefile.depend.${MACHINE} # as expected below. # Note: this file gets multiply included. # This is what we do with DIRDEPS # DIRDEPS: # This is a list of directories - relative to SRCTOP, it is # normally only of interest to .MAKE.LEVEL 0. # In some cases the entry may be qualified with a .<machine> # or .<target_spec> suffix (see TARGET_SPEC_VARS below), # for example to force building something for the pseudo # machines "host" or "common" regardless of current ${MACHINE}. # # All unqualified entries end up being qualified with .${TARGET_SPEC} # and partially qualified (if TARGET_SPEC_VARS has multiple # entries) are also expanded to a full .<target_spec>. # The _DIRDEP_USE target uses the suffix to set TARGET_SPEC # correctly when visiting each entry. # # The fully qualified directory entries are used to construct a # dependency graph that will drive the build later. # # Also, for each fully qualified directory target, we will search # using ${.MAKE.DEPENDFILE_PREFERENCE} to find additional # dependencies. We use Makefile.depend (default value for # .MAKE.DEPENDFILE_PREFIX) to refer to these makefiles to # distinguish them from others. # # Each Makefile.depend file sets DEP_RELDIR to be # the RELDIR (path relative to SRCTOP) for its directory, and # since each Makefile.depend file includes dirdeps.mk, this # processing is recursive and results in .MAKE.LEVEL 0 learning the # dependencies of the tree wrt the initial directory (_DEP_RELDIR). # # BUILD_AT_LEVEL0 # Indicates whether .MAKE.LEVEL 0 builds anything: # if "no" sub-makes are used to build everything, # if "yes" sub-makes are only used to build for other machines. # It is best to use "no", but this can require fixing some # makefiles to not do anything at .MAKE.LEVEL 0. # # TARGET_SPEC_VARS # The default value is just MACHINE, and for most environments # this is sufficient. The _DIRDEP_USE target actually sets # both MACHINE and TARGET_SPEC to the suffix of the current # target so that in the general case TARGET_SPEC can be ignored. # # If more than MACHINE is needed then sys.mk needs to decompose # TARGET_SPEC and set the relevant variables accordingly. # It is important that MACHINE be included in and actually be # the first member of TARGET_SPEC_VARS. This allows other # variables to be considered optional, and some of the treatment # below relies on MACHINE being the first entry.
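#
# For example (a sketch only - the directory names are hypothetical),
# a typical Makefile.depend is little more than:
#
#	DIRDEPS = \
#		lib/libfoo \
#		usr.bin/tool.host \
#
#	.include <dirdeps.mk>
#
# The bare entry is qualified with .${TARGET_SPEC}, the .host entry
# is left as is, and the .include is what makes the processing
# recursive as described above.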
# Note: TARGET_SPEC cannot contain any '.'s so the target # triple used by compiler folk won't work (directly anyway). # # For example: # # # Always list MACHINE first, # # other variables might be optional. # TARGET_SPEC_VARS = MACHINE TARGET_OS # .if ${TARGET_SPEC:Uno:M*,*} != "" # _tspec := ${TARGET_SPEC:S/,/ /g} # MACHINE := ${_tspec:[1]} # TARGET_OS := ${_tspec:[2]} # # etc. # # We need to stop TARGET_SPEC from affecting any submakes # # and deal with MACHINE=${TARGET_SPEC} in the environment. # TARGET_SPEC = # # export but do not track # .export-env TARGET_SPEC # .export ${TARGET_SPEC_VARS} # .for v in ${TARGET_SPEC_VARS:O:u} # .if empty($v) # .undef $v # .endif # .endfor # .endif # # make sure we know what TARGET_SPEC is # # as we may need it to find Makefile.depend* # TARGET_SPEC = ${TARGET_SPEC_VARS:@v@${$v:U}@:ts,} # # touch this at your peril _DIRDEP_USE_LEVEL?= 0 .if ${.MAKE.LEVEL} == ${_DIRDEP_USE_LEVEL} # only the first instance is interested in all this .if !target(_DIRDEP_USE) +# do some setup we only need once +_CURDIR ?= ${.CURDIR} +_OBJDIR ?= ${.OBJDIR} + +now_utc = ${%s:L:gmtime} +.if !defined(start_utc) +start_utc := ${now_utc} +.endif + .if ${MAKEFILE:T} == ${.PARSEFILE} && empty(DIRDEPS) && ${.TARGETS:Uall:M*/*} != "" # This little trick lets us do # # mk -f dirdeps.mk some/dir.${TARGET_SPEC} # all: ${.TARGETS:Nall}: all DIRDEPS := ${.TARGETS:M*[/.]*} # so that -DNO_DIRDEPS works DEP_RELDIR := ${DIRDEPS:[1]:R} # this will become DEP_MACHINE below TARGET_MACHINE := ${DIRDEPS:[1]:E:C/,.*//} .if ${TARGET_MACHINE:N*/*} == "" TARGET_MACHINE := ${MACHINE} .endif # disable DIRDEPS_CACHE as it does not like this trick MK_DIRDEPS_CACHE = no .endif # make sure we get the behavior we expect .MAKE.SAVE_DOLLARS = no -# do some setup we only need once -_CURDIR ?= ${.CURDIR} -_OBJDIR ?= ${.OBJDIR} - -now_utc = ${%s:L:gmtime} -.if !defined(start_utc) -start_utc := ${now_utc} -.endif - # make sure these are empty to start with _DEP_TARGET_SPEC = # If TARGET_SPEC_VARS is other than just MACHINE # it should be set by sys.mk or similar by now. # TARGET_SPEC must not contain any '.'s. TARGET_SPEC_VARS ?= MACHINE # this is what we started with TARGET_SPEC = ${TARGET_SPEC_VARS:@v@${$v:U}@:ts,} # this is what we mostly use below DEP_TARGET_SPEC = ${TARGET_SPEC_VARS:S,^,DEP_,:@v@${$v:U}@:ts,} # make sure we have defaults .for v in ${TARGET_SPEC_VARS} DEP_$v ?= ${$v} .endfor .if ${TARGET_SPEC_VARS:[#]} > 1 # Ok, this gets more complex (putting it mildly). # In order to stay sane, we need to ensure that all the build_dirs # we compute below are fully qualified wrt DEP_TARGET_SPEC. # The makefiles may only partially specify (eg. MACHINE only), # so we need to construct a set of modifiers to fill in the gaps. # jot 10 should output 1 2 3 .. 10 JOT ?= jot _tspec_x := ${${JOT} ${TARGET_SPEC_VARS:[#]}:L:sh} # this handles unqualified entries M_dep_qual_fixes = C;(/[^/.,]+)$$;\1.$${DEP_TARGET_SPEC}; # there needs to be at least one item missing for these to make sense .for i in ${_tspec_x:[2..-1]} _tspec_m$i := ${TARGET_SPEC_VARS:[2..$i]:@w@[^,]+@:ts,} _tspec_a$i := ,${TARGET_SPEC_VARS:[$i..-1]:@v@$$$${DEP_$v}@:ts,} M_dep_qual_fixes += C;(\.${_tspec_m$i})$$;\1${_tspec_a$i}; .endfor .else # A harmless? default.
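# (harmless because a bare :U, applied as a modifier below, leaves an
# already-set value untouched - which is all we need from a default)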
M_dep_qual_fixes = U .endif .if !defined(.MAKE.DEPENDFILE_PREFERENCE) # .MAKE.DEPENDFILE_PREFERENCE makes the logic below neater? # you really want this set by sys.mk or similar .MAKE.DEPENDFILE_PREFERENCE = ${_CURDIR}/${.MAKE.DEPENDFILE:T} .if ${.MAKE.DEPENDFILE:E} == "${TARGET_SPEC}" .if ${TARGET_SPEC} != ${MACHINE} .MAKE.DEPENDFILE_PREFERENCE += ${_CURDIR}/${.MAKE.DEPENDFILE:T:R}.$${MACHINE} .endif .MAKE.DEPENDFILE_PREFERENCE += ${_CURDIR}/${.MAKE.DEPENDFILE:T:R} .endif .endif _default_dependfile := ${.MAKE.DEPENDFILE_PREFERENCE:[1]:T} _machine_dependfiles := ${.MAKE.DEPENDFILE_PREFERENCE:T:M*${MACHINE}*} # for machine specific dependfiles we require ${MACHINE} to be at the end # also for the sake of sanity we require a common prefix .if !defined(.MAKE.DEPENDFILE_PREFIX) # knowing .MAKE.DEPENDFILE_PREFIX helps .if !empty(_machine_dependfiles) .MAKE.DEPENDFILE_PREFIX := ${_machine_dependfiles:[1]:T:R} .else .MAKE.DEPENDFILE_PREFIX := ${_default_dependfile:T} .endif .endif # this is how we identify non-machine specific dependfiles N_notmachine := ${.MAKE.DEPENDFILE_PREFERENCE:E:N*${MACHINE}*:${M_ListToSkip}} .endif # !target(_DIRDEP_USE) +# First off, we want to know what ${MACHINE} to build for. +# This can be complicated if we are using a mixture of ${MACHINE} specific +# and non-specific Makefile.depend* + # if we were included recursively _DEP_TARGET_SPEC should be valid. .if empty(_DEP_TARGET_SPEC) # we may or may not have included a dependfile yet .if defined(.INCLUDEDFROMFILE) _last_dependfile := ${.INCLUDEDFROMFILE:M${.MAKE.DEPENDFILE_PREFIX}*} .else _last_dependfile := ${.MAKE.MAKEFILES:M*/${.MAKE.DEPENDFILE_PREFIX}*:[-1]} .endif .if ${_debug_reldir:U0} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: _last_dependfile='${_last_dependfile}' .endif .if empty(_last_dependfile) || ${_last_dependfile:E:${N_notmachine}} == "" # this is all we have to work with DEP_MACHINE = ${TARGET_MACHINE:U${MACHINE}} _DEP_TARGET_SPEC := ${DEP_TARGET_SPEC} .else _DEP_TARGET_SPEC = ${_last_dependfile:${M_dep_qual_fixes:ts:}:E} .endif .if !empty(_last_dependfile) # record that we've read dependfile for this _dirdeps_checked.${_CURDIR}.${TARGET_SPEC}: .endif .endif # by now _DEP_TARGET_SPEC should be set, parse it. .if ${TARGET_SPEC_VARS:[#]} > 1 # we need to parse DEP_MACHINE may or may not contain more info _tspec := ${_DEP_TARGET_SPEC:S/,/ /g} .for i in ${_tspec_x} DEP_${TARGET_SPEC_VARS:[$i]} := ${_tspec:[$i]} .endfor .for v in ${TARGET_SPEC_VARS:O:u} .if empty(DEP_$v) .undef DEP_$v .endif .endfor .else DEP_MACHINE := ${_DEP_TARGET_SPEC} .endif # reset each time through _build_all_dirs = # the first time we are included the _DIRDEP_USE target will not be defined # we can use this as a clue to do initialization and other one time things. .if !target(_DIRDEP_USE) # make sure this target exists dirdeps: beforedirdeps .WAIT beforedirdeps: # We normally expect to be included by Makefile.depend.* # which sets the DEP_* macros below. DEP_RELDIR ?= ${RELDIR} # this can cause lots of output! # set to a set of glob expressions that might match RELDIR DEBUG_DIRDEPS ?= no # remember the initial value of DEP_RELDIR - we test for it below. _DEP_RELDIR := ${DEP_RELDIR} .endif # pickup customizations # as below you can use !target(_DIRDEP_USE) to protect things # which should only be done once. 
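# For example, a tree-level customization file might contain (a
# sketch; the directory name is hypothetical):
#
#	.if !target(_DIRDEP_USE)
#	SKIP_DIRDEPS += tools/experimental
#	.endif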
.-include <local.dirdeps.mk> .if !target(_DIRDEP_USE) # things we skip for host tools SKIP_HOSTDIR ?= NSkipHostDir = ${SKIP_HOSTDIR:N*.host*:S,$,.host*,:N.host*:S,^,${SRCTOP}/,:${M_ListToSkip}} # things we always skip # SKIP_DIRDEPS allows for adding entries on command line. SKIP_DIR += .host *.WAIT ${SKIP_DIRDEPS} SKIP_DIR.host += ${SKIP_HOSTDIR} DEP_SKIP_DIR = ${SKIP_DIR} \ ${SKIP_DIR.${DEP_TARGET_SPEC}:U} \ ${SKIP_DIR.${DEP_MACHINE}:U} \ ${SKIP_DIRDEPS.${DEP_MACHINE}:U} NSkipDir = ${DEP_SKIP_DIR:${M_ListToSkip}} .if defined(NODIRDEPS) || defined(WITHOUT_DIRDEPS) NO_DIRDEPS = .elif defined(WITHOUT_DIRDEPS_BELOW) NO_DIRDEPS_BELOW = .endif .if defined(NO_DIRDEPS) # confine ourselves to the original dir and below. DIRDEPS_FILTER += M${_DEP_RELDIR}* .elif defined(NO_DIRDEPS_BELOW) DIRDEPS_FILTER += M${_DEP_RELDIR} .endif # this is what we run below DIRDEP_MAKE?= ${.MAKE} # we suppress SUBDIR when visiting the leaves # we assume sys.mk will set MACHINE_ARCH # you can add extras to DIRDEP_USE_ENV # if there is no makefile in the target directory, we skip it. _DIRDEP_USE: .USE .MAKE @for m in ${.MAKE.MAKEFILE_PREFERENCE}; do \ test -s ${.TARGET:R}/$$m || continue; \ echo "${TRACER}Checking ${.TARGET:R} for ${.TARGET:E} ..."; \ MACHINE_ARCH= NO_SUBDIR=1 ${DIRDEP_USE_ENV} \ TARGET_SPEC=${.TARGET:E} \ MACHINE=${.TARGET:E} \ ${DIRDEP_MAKE} -C ${.TARGET:R} || exit 1; \ break; \ done .ifdef ALL_MACHINES # this is how you limit it to only the machines we have been built for # previously. .if empty(ONLY_MACHINE_LIST) .if !empty(ALL_MACHINE_LIST) # ALL_MACHINE_LIST is the list of all legal machines - ignore anything else _machine_list != cd ${_CURDIR} && 'ls' -1 ${ALL_MACHINE_LIST:O:u:@m@${.MAKE.DEPENDFILE:T:R}.$m@} 2> /dev/null; echo .else _machine_list != 'ls' -1 ${_CURDIR}/${.MAKE.DEPENDFILE_PREFIX}.* 2> /dev/null; echo .endif _only_machines := ${_machine_list:${NIgnoreFiles:UN*.bak}:E:O:u} .else _only_machines := ${ONLY_MACHINE_LIST} .endif .if empty(_only_machines) # we must be boot-strapping _only_machines := ${TARGET_MACHINE:U${ALL_MACHINE_LIST:U${DEP_MACHINE}}} .endif .else # ! ALL_MACHINES # if ONLY_MACHINE_LIST is set, we are limited to that # if TARGET_MACHINE is set - it is really the same as ONLY_MACHINE_LIST # otherwise DEP_MACHINE is it - so DEP_MACHINE will match. _only_machines := ${ONLY_MACHINE_LIST:U${TARGET_MACHINE:U${DEP_MACHINE}}:M${DEP_MACHINE}} .endif .if !empty(NOT_MACHINE_LIST) _only_machines := ${_only_machines:${NOT_MACHINE_LIST:${M_ListToSkip}}} .endif # make sure we have a starting place? DIRDEPS ?= ${RELDIR} .endif # target # if repeatedly building the same target, # we can avoid the overhead of re-computing the tree dependencies.
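# For example (a sketch - the knob is consulted below; where you set
# it is up to the tree, e.g. the command line or a local *.sys.mk):
#
#	MK_DIRDEPS_CACHE = yes
#
# makes the level 0 make build ${DIRDEPS_CACHE} via a sub-make (so
# there is a .meta file to tell when it is stale) and replay it on
# later runs until a Makefile.depend* changes.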
MK_DIRDEPS_CACHE ?= no BUILD_DIRDEPS_CACHE ?= no BUILD_DIRDEPS ?= yes .if !defined(NO_DIRDEPS) && !defined(NO_DIRDEPS_BELOW) .if ${MK_DIRDEPS_CACHE} == "yes" # this is where we will cache all our work -DIRDEPS_CACHE?= ${_OBJDIR}/dirdeps.cache${.TARGETS:Nall:O:u:ts-:S,/,_,g:S,^,.,:N.} +DIRDEPS_CACHE?= ${_OBJDIR:tA}/dirdeps.cache${.TARGETS:Nall:O:u:ts-:S,/,_,g:S,^,.,:N.} # just ensure this exists build-dirdeps: M_oneperline = @x@\\${.newline} $$x@ .if ${BUILD_DIRDEPS_CACHE} == "no" .if !target(dirdeps-cached) # we do this via sub-make BUILD_DIRDEPS = no +# ignore anything but these +.MAKE.META.IGNORE_FILTER = M*/${.MAKE.DEPENDFILE_PREFIX}* + dirdeps: dirdeps-cached dirdeps-cached: ${DIRDEPS_CACHE} .MAKE @echo "${TRACER}Using ${DIRDEPS_CACHE}" @MAKELEVEL=${.MAKE.LEVEL} ${.MAKE} -C ${_CURDIR} -f ${DIRDEPS_CACHE} \ dirdeps MK_DIRDEPS_CACHE=no BUILD_DIRDEPS=no # these should generally do BUILD_DIRDEPS_MAKEFILE ?= ${MAKEFILE} BUILD_DIRDEPS_TARGETS ?= ${.TARGETS} # we need the .meta file to ensure we update if # any of the Makefile.depend* changed. # We do not want to compare the command line though. ${DIRDEPS_CACHE}: .META .NOMETA_CMP +@{ echo '# Autogenerated - do NOT edit!'; echo; \ echo 'BUILD_DIRDEPS=no'; echo; \ echo '.include <dirdeps.mk>'; \ } > ${.TARGET}.new +@MAKELEVEL=${.MAKE.LEVEL} DIRDEPS_CACHE=${DIRDEPS_CACHE} \ DIRDEPS="${DIRDEPS}" \ MAKEFLAGS= ${.MAKE} -C ${_CURDIR} -f ${BUILD_DIRDEPS_MAKEFILE} \ ${BUILD_DIRDEPS_TARGETS} BUILD_DIRDEPS_CACHE=yes \ .MAKE.DEPENDFILE=.none \ ${.MAKEFLAGS:tW:S,-D ,-D,g:tw:M*WITH*} \ 3>&1 1>&2 | sed 's,${SRCTOP},$${SRCTOP},g' >> ${.TARGET}.new && \ mv ${.TARGET}.new ${.TARGET} .endif .elif !target(_count_dirdeps) # we want to capture the dirdeps count in the cache .END: _count_dirdeps _count_dirdeps: .NOMETA @echo '.info $${.newline}$${TRACER}Makefiles read: total=${.MAKE.MAKEFILES:[#]} depend=${.MAKE.MAKEFILES:M*depend*:[#]} dirdeps=${.ALLTARGETS:M${SRCTOP}*:O:u:[#]}' >&3 .endif .elif !make(dirdeps) && !target(_count_dirdeps) beforedirdeps: _count_dirdeps _count_dirdeps: .NOMETA @echo "${TRACER}Makefiles read: total=${.MAKE.MAKEFILES:[#]} depend=${.MAKE.MAKEFILES:M*depend*:[#]} dirdeps=${.ALLTARGETS:M${SRCTOP}*:O:u:[#]} seconds=`expr ${now_utc} - ${start_utc}`" .endif .endif .if ${BUILD_DIRDEPS} == "yes" .if ${DEBUG_DIRDEPS:@x@${DEP_RELDIR:M$x}${${DEP_RELDIR}.${DEP_MACHINE}:L:M$x}@} != "" _debug_reldir = 1 .else _debug_reldir = 0 .endif .if ${DEBUG_DIRDEPS:@x@${DEP_RELDIR:M$x}${${DEP_RELDIR}.depend:L:M$x}@} != "" _debug_search = 1 .else _debug_search = 0 .endif # the rest is done repeatedly for every Makefile.depend we read. # if we are anything but the original dir we care only about the # machine type we were included for.. .if ${DEP_RELDIR} == "." _this_dir := ${SRCTOP} .else _this_dir := ${SRCTOP}/${DEP_RELDIR} .endif # on rare occasions, there can be a need for extra help _dep_hack := ${_this_dir}/${.MAKE.DEPENDFILE_PREFIX}.inc .-include <${_dep_hack}> .if ${DEP_RELDIR} != ${_DEP_RELDIR} || ${DEP_TARGET_SPEC} != ${TARGET_SPEC} # this should be all _machines := ${DEP_MACHINE} .else # this is the machine list we actually use below _machines := ${_only_machines} .if defined(HOSTPROG) || ${DEP_MACHINE} == "host" # we need to build this guy's dependencies for host as well. _machines += host .endif _machines := ${_machines:O:u} .endif .if ${TARGET_SPEC_VARS:[#]} > 1 # we need to tweak _machines _dm := ${DEP_MACHINE} # apply the same filtering that we do when qualifying DIRDEPS. # M_dep_qual_fixes expects .${MACHINE}* so add (and remove) '.'
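# (illustrative: with TARGET_SPEC_VARS = MACHINE TARGET_OS as in the
# example near the top of this file and DEP_TARGET_SPEC = amd64,linux,
# a bare "amd64" in _machines would be completed to "amd64,linux")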
_machines := ${_machines:@DEP_MACHINE@${DEP_TARGET_SPEC}@:S,^,.,:${M_dep_qual_fixes:ts:}:O:u:S,^.,,} DEP_MACHINE := ${_dm} .endif # reset each time through _build_dirs = .if ${DEP_RELDIR} == ${_DEP_RELDIR} # pickup other machines for this dir if necessary .if ${BUILD_AT_LEVEL0:Uyes} == "no" _build_dirs += ${_machines:@m@${_CURDIR}.$m@} .else _build_dirs += ${_machines:N${DEP_TARGET_SPEC}:@m@${_CURDIR}.$m@} .if ${DEP_TARGET_SPEC} == ${TARGET_SPEC} # pickup local dependencies now .if ${MAKE_VERSION} < 20160220 .-include <.depend> .else .dinclude <.depend> .endif .endif .endif .endif .if ${_debug_reldir} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: DIRDEPS='${DIRDEPS}' .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: _machines='${_machines}' .endif .if !empty(DIRDEPS) # these we reset each time through as they can depend on DEP_MACHINE DEP_DIRDEPS_FILTER = \ ${DIRDEPS_FILTER.${DEP_TARGET_SPEC}:U} \ ${DIRDEPS_FILTER.${DEP_MACHINE}:U} \ ${DIRDEPS_FILTER:U} .if empty(DEP_DIRDEPS_FILTER) # something harmless DEP_DIRDEPS_FILTER = U .endif # this is what we start with __depdirs := ${DIRDEPS:${NSkipDir}:${DEP_DIRDEPS_FILTER:ts:}:C,//+,/,g:O:u:@d@${SRCTOP}/$d@} # some entries may be qualified with .<machine> # the :M*/*/*.* just tries to limit the dirs we check to likely ones. # the ${d:E:M*/*} ensures we don't consider junos/usr.sbin/mgd __qual_depdirs := ${__depdirs:M*/*/*.*:@d@${exists($d):?:${"${d:E:M*/*}":?:${exists(${d:R}):?$d:}}}@} __unqual_depdirs := ${__depdirs:${__qual_depdirs:Uno:${M_ListToSkip}}} .if ${DEP_RELDIR} == ${_DEP_RELDIR} # if it was called out - we likely need it. __hostdpadd := ${DPADD:U.:M${HOST_OBJTOP}/*:S,${HOST_OBJTOP}/,,:H:${NSkipDir}:${DIRDEPS_FILTER:ts:}:S,$,.host,:N.*:@d@${SRCTOP}/$d@} __qual_depdirs += ${__hostdpadd} .endif .if ${_debug_reldir} .info depdirs=${__depdirs} .info qualified=${__qual_depdirs} .info unqualified=${__unqual_depdirs} .endif # _build_dirs is what we will feed to _DIRDEP_USE _build_dirs += \ ${__qual_depdirs:M*.host:${NSkipHostDir}:N.host} \ ${__qual_depdirs:N*.host} \ ${_machines:Mhost*:@m@${__unqual_depdirs:@d@$d.$m@}@:${NSkipHostDir}:N.host} \ ${_machines:Nhost*:@m@${__unqual_depdirs:@d@$d.$m@}@} # qualify everything now _build_dirs := ${_build_dirs:${M_dep_qual_fixes:ts:}:O:u} _build_all_dirs += ${_build_dirs} _build_all_dirs := ${_build_all_dirs:O:u} .endif # empty DIRDEPS # Normally if doing make -V something, # we do not want to waste time chasing DIRDEPS # but if we want to count the number of Makefile.depend* read, we do.
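# (so e.g. "make -V DIRDEPS" can answer quickly; if you do want the
# tree walked despite -V, define _V_READ_DIRDEPS so that the pattern
# -V${_V_READ_DIRDEPS} matches none of the actual flags - inferred
# from the :M-V${_V_READ_DIRDEPS} test that follows)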
.if ${.MAKEFLAGS:M-V${_V_READ_DIRDEPS}} == "" .if !empty(_build_all_dirs) .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '\# ${DEP_RELDIR}.${DEP_TARGET_SPEC}'; \ echo 'dirdeps: ${_build_all_dirs:${M_oneperline}}'; echo; } >&3; echo x!= { ${_build_all_dirs:@x@${target($x):?:echo '$x: _DIRDEP_USE';}@} echo; } >&3; echo .else # this makes it all happen dirdeps: ${_build_all_dirs} .endif ${_build_all_dirs}: _DIRDEP_USE .if ${_debug_reldir} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: needs: ${_build_dirs} .endif # this builds the dependency graph .for m in ${_machines} # it would be nice to do :N${.TARGET} .if !empty(__qual_depdirs) .for q in ${__qual_depdirs:${M_dep_qual_fixes:ts:}:E:O:u:N$m} .if ${_debug_reldir} || ${DEBUG_DIRDEPS:@x@${${DEP_RELDIR}.$m:L:M$x}${${DEP_RELDIR}.$q:L:M$x}@} != "" .info ${DEP_RELDIR}.$m: graph: ${_build_dirs:M*.$q} .endif .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '${_this_dir}.$m: ${_build_dirs:M*.$q:${M_oneperline}}'; echo; } >&3; echo .else ${_this_dir}.$m: ${_build_dirs:M*.$q} .endif .endfor .endif .if ${_debug_reldir} .info ${DEP_RELDIR}.$m: graph: ${_build_dirs:M*.$m:N${_this_dir}.$m} .endif .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '${_this_dir}.$m: ${_build_dirs:M*.$m:N${_this_dir}.$m:${M_oneperline}}'; echo; } >&3; echo .else ${_this_dir}.$m: ${_build_dirs:M*.$m:N${_this_dir}.$m} .endif .endfor .endif # Now find more dependencies - and recurse. .for d in ${_build_all_dirs} .if !target(_dirdeps_checked.$d) # once only _dirdeps_checked.$d: .if ${_debug_search} .info checking $d .endif # Note: _build_all_dirs is fully qualified so d:R is always the directory .if exists(${d:R}) # Warning: there is an assumption here that MACHINE is always # the first entry in TARGET_SPEC_VARS. # If TARGET_SPEC and MACHINE are insufficient, you have a problem. _m := ${.MAKE.DEPENDFILE_PREFERENCE:T:S;${TARGET_SPEC}$;${d:E};:S;${MACHINE};${d:E:C/,.*//};:@m@${exists(${d:R}/$m):?${d:R}/$m:}@:[1]} .if !empty(_m) # M_dep_qual_fixes isn't geared to Makefile.depend _qm := ${_m:C;(\.depend)$;\1.${d:E};:${M_dep_qual_fixes:ts:}} .if ${_debug_search} .info Looking for ${_qm} .endif # we pass _DEP_TARGET_SPEC to tell the next step what we want _DEP_TARGET_SPEC := ${d:E} # some makefiles may still look at this _DEP_MACHINE := ${d:E:C/,.*//} # set this "just in case" # we can skip :tA since we computed the path above DEP_RELDIR := ${_m:H:S,${SRCTOP}/,,} # and reset this DIRDEPS = .if ${_debug_reldir} && ${_qm} != ${_m} .info loading ${_m} for ${d:E} .endif .include <${_m}> .endif .endif .endif .endfor .endif # -V .endif # BUILD_DIRDEPS .elif ${.MAKE.LEVEL} > 42 .error You should have stopped recursing by now. .else # we are building something DEP_RELDIR := ${RELDIR} _DEP_RELDIR := ${RELDIR} # pickup local dependencies .if ${MAKE_VERSION} < 20160220 .-include <.depend> .else .dinclude <.depend> .endif .endif # bootstrapping new dependencies made easy?
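# (a sketch of the intended use, based on the targets below: in a
# directory with no dependfile yet, something like
#
#	cd ${SRCTOP}/some/new/dir && ${.MAKE} -f dirdeps.mk bootstrap
#
# seeds Makefile.depend* there and in the directories it depends on;
# bootstrap-empty just creates a minimal one. The path is hypothetical.)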
.if !target(bootstrap) && (make(bootstrap) || \ make(bootstrap-this) || \ make(bootstrap-recurse) || \ make(bootstrap-empty)) -.if exists(${.CURDIR}/${.MAKE.DEPENDFILE:T}) +# if we are bootstrapping, create the default +_want = ${.CURDIR}/${.MAKE.DEPENDFILE_DEFAULT:T} + +.if exists(${_want}) # stop here ${.TARGETS:Mboot*}: .elif !make(bootstrap-empty) # find a Makefile.depend to use as _src _src != cd ${.CURDIR} && for m in ${.MAKE.DEPENDFILE_PREFERENCE:T:S,${MACHINE},*,}; do test -s $$m || continue; echo $$m; break; done; echo .if empty(_src) .error cannot find any of ${.MAKE.DEPENDFILE_PREFERENCE:T}${.newline}Use: bootstrap-empty .endif -_src?= ${.MAKE.DEPENDFILE:T} +_src?= ${.MAKE.DEPENDFILE} +.MAKE.DEPENDFILE_BOOTSTRAP_SED+= -e 's,${_src:E},${MACHINE},g' + # just create Makefile.depend* for this dir bootstrap-this: .NOTMAIN - @echo Bootstrapping ${RELDIR}/${.MAKE.DEPENDFILE:T} from ${_src:T} - (cd ${.CURDIR} && sed 's,${_src:E},${MACHINE},g' ${_src} > ${.MAKE.DEPENDFILE:T}) + @echo Bootstrapping ${RELDIR}/${_want:T} from ${_src:T}; \ + echo You need to build ${RELDIR} to correctly populate it. +.if ${_src:T} != ${.MAKE.DEPENDFILE_PREFIX:T} + (cd ${.CURDIR} && sed ${.MAKE.DEPENDFILE_BOOTSTRAP_SED} ${_src} > ${_want}) +.else + cp ${.CURDIR}/${_src} ${_want} +.endif # create Makefile.depend* for this dir and its dependencies bootstrap: bootstrap-recurse bootstrap-recurse: bootstrap-this _mf := ${.PARSEFILE} bootstrap-recurse: .NOTMAIN .MAKE @cd ${SRCTOP} && \ for d in `cd ${RELDIR} && ${.MAKE} -B -f ${"${.MAKEFLAGS:M-n}":?${_src}:${.MAKE.DEPENDFILE:T}} -V DIRDEPS`; do \ test -d $$d || d=$${d%.*}; \ test -d $$d || continue; \ echo "Checking $$d for bootstrap ..."; \ (cd $$d && ${.MAKE} -f ${_mf} bootstrap-recurse); \ done .endif # create an empty Makefile.depend* to get the ball rolling. bootstrap-empty: .NOTMAIN .NOMETA - @echo Creating empty ${RELDIR}/${.MAKE.DEPENDFILE:T}; \ + @echo Creating empty ${RELDIR}/${_want:T}; \ echo You need to build ${RELDIR} to correctly populate it. - @{ echo DIRDEPS=; echo ".include <dirdeps.mk>"; } > ${.CURDIR}/${.MAKE.DEPENDFILE:T} + @{ echo DIRDEPS=; echo ".include <dirdeps.mk>"; } > ${_want} .endif Index: projects/clang390-import/contrib/bmake/mk/install-mk =================================================================== --- projects/clang390-import/contrib/bmake/mk/install-mk (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/install-mk (revision 305687) @@ -1,185 +1,185 @@ : # NAME: # install-mk - install mk files # # SYNOPSIS: # install-mk [options] [var=val] [dest] # # DESCRIPTION: # This tool installs mk files in a semi-intelligent manner into # "dest". # # Options: # # -n just say what we want to do, but don't touch anything. # # -f use -f when copying sys.mk. # # -v be verbose # # -q be quiet # # -m "mode" # Use "mode" for installed files (444). # # -o "owner" # Use "owner" for installed files. # # -g "group" # Use "group" for installed files. # # var=val # Set "var" to "val". See below. # # All our *.mk files are copied to "dest" with appropriate # ownership and permissions. # # By default if a sys.mk can be found in a standard location # (that bmake will find) then no sys.mk will be put in "dest". # # SKIP_SYS_MK: # If set, we will avoid installing our 'sys.mk' # This is probably a bad idea. # # SKIP_BSD_MK: # If set, we will skip making bsd.*.mk links to *.mk # # sys.mk: # # By default (and provided we are not installing to the system # mk dir - '/usr/share/mk') we install our own 'sys.mk' which # includes a sys specific file, or a generic one.
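#
# EXAMPLE:
#	An illustrative invocation using the options above (the
#	destination directory is arbitrary):
#
#	install-mk -n -v -m 444 /usr/local/share/mk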
# # # AUTHOR: # Simon J. Gerraty # RCSid: -# $Id: install-mk,v 1.128 2016/06/03 17:22:32 sjg Exp $ +# $Id: install-mk,v 1.130 2016/08/15 19:28:13 sjg Exp $ # # @(#) Copyright (c) 1994 Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # -MK_VERSION=20160602 +MK_VERSION=20160815 OWNER= GROUP= MODE=444 BINMODE=555 ECHO=: SKIP= cp_f=-f while : do case "$1" in *=*) eval "$1"; shift;; +f) cp_f=; shift;; -f) cp_f=-f; shift;; -m) MODE=$2; shift 2;; -o) OWNER=$2; shift 2;; -g) GROUP=$2; shift 2;; -v) ECHO=echo; shift;; -q) ECHO=:; shift;; -n) ECHO=echo SKIP=:; shift;; --) shift; break;; *) break;; esac done case $# in 0) echo "$0 [options] <dest> [<os> [<osrel>]]" echo "eg." echo "$0 -o bin -g bin -m 444 /usr/local/share/mk" exit 1 ;; esac dest=$1 os=${2:-`uname`} osrel=${3:-`uname -r`} Do() { $ECHO "$@" $SKIP "$@" } Error() { echo "ERROR: $@" >&2 exit 1 } Warning() { echo "WARNING: $@" >&2 } [ "$FORCE_SYS_MK" ] && Warning "ignoring: FORCE_{BSD,SYS}_MK (no longer supported)" SYS_MK_DIR=${SYS_MK_DIR:-/usr/share/mk} SYS_MK=${SYS_MK:-$SYS_MK_DIR/sys.mk} realpath() { [ -d $1 ] && cd $1 && 'pwd' && return echo $1 } if [ -s $SYS_MK -a -d $dest ]; then # if this is a BSD system we don't want to touch $SYS_MK dest=`realpath $dest` sys_mk_dir=`realpath $SYS_MK_DIR` if [ $dest = $sys_mk_dir ]; then case "$os" in *BSD*) SKIP_SYS_MK=: SKIP_BSD_MK=: ;; *) # could be fake? if [ ! -d $dest/sys -a ! -s $dest/Generic.sys.mk ]; then SKIP_SYS_MK=: # play safe SKIP_BSD_MK=: fi ;; esac fi fi [ -d $dest/sys ] || Do mkdir -p $dest/sys [ -d $dest/sys ] || Do mkdir $dest/sys || exit 1 [ -z "$SKIP" ] && dest=`realpath $dest` cd `dirname $0` mksrc=`'pwd'` if [ $mksrc = $dest ]; then SKIP_MKFILES=: else # we do not install the examples mk_files=`grep '^[a-z].*\.mk' FILES | egrep -v '(examples/|^sys\.mk|sys/)'` mk_scripts=`egrep '^[a-z].*\.(sh|py)' FILES | egrep -v '/'` sys_mk_files=`grep 'sys/.*\.mk' FILES` SKIP_MKFILES= [ -z "$SKIP_SYS_MK" ] && mk_files="sys.mk $mk_files" fi $SKIP_MKFILES Do cp $cp_f $mk_files $dest $SKIP_MKFILES Do cp $cp_f $sys_mk_files $dest/sys $SKIP_MKFILES Do cp $cp_f $mk_scripts $dest $SKIP cd $dest $SKIP_MKFILES Do chmod $MODE $mk_files $sys_mk_files $SKIP_MKFILES Do chmod $BINMODE $mk_scripts [ "$GROUP" ] && $SKIP_MKFILES Do chgrp $GROUP $mk_files $sys_mk_files [ "$OWNER" ] && $SKIP_MKFILES Do chown $OWNER $mk_files $sys_mk_files # if this is a BSD system the bsd.*.mk should exist and be used. if [ -z "$SKIP_BSD_MK" ]; then for f in dep doc init lib links man nls obj own prog subdir do b=bsd.$f.mk [ -s $b ] || Do ln -s $f.mk $b done fi exit 0 Index: projects/clang390-import/contrib/bmake/mk/lib.mk =================================================================== --- projects/clang390-import/contrib/bmake/mk/lib.mk (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/lib.mk (revision 305687) @@ -1,603 +1,604 @@ -# $Id: lib.mk,v 1.53 2016/03/22 20:45:14 sjg Exp $ +# $Id: lib.mk,v 1.54 2016/08/02 20:52:17 sjg Exp $ .if !target(__${.PARSEFILE}__) __${.PARSEFILE}__: .include <init.mk> .if ${OBJECT_FMT} == "ELF" NEED_SOLINKS?= yes .endif .if exists(${.CURDIR}/shlib_version) SHLIB_MAJOR != . ${.CURDIR}/shlib_version ; echo $$major SHLIB_MINOR != .
${.CURDIR}/shlib_version ; echo $$minor .endif print-shlib-major: .if defined(SHLIB_MAJOR) && ${MK_PIC} != "no" @echo ${SHLIB_MAJOR} .else @false .endif print-shlib-minor: .if defined(SHLIB_MINOR) && ${MK_PIC} != "no" @echo ${SHLIB_MINOR} .else @false .endif print-shlib-teeny: .if defined(SHLIB_TEENY) && ${MK_PIC} != "no" @echo ${SHLIB_TEENY} .else @false .endif SHLIB_FULLVERSION ?= ${${SHLIB_MAJOR} ${SHLIB_MINOR} ${SHLIB_TEENY}:L:ts.} SHLIB_FULLVERSION := ${SHLIB_FULLVERSION} # add additional suffixes not exported. # .po is used for profiling object files. # .So is used for PIC object files. .SUFFIXES: .out .a .ln .So .po .o .s .S .c .cc .C .m .F .f .r .y .l .cl .p .h .SUFFIXES: .sh .m4 .m CFLAGS+= ${COPTS} # Derived from NetBSD-1.6 # Set PICFLAGS to cc flags for producing position-independent code, # if not already set. Includes -DPIC, if required. # Data-driven table using make variables to control how shared libraries # are built for different platforms and object formats. # OBJECT_FMT: currently either "ELF" or "a.out", from <bsd.own.mk> # SHLIB_SOVERSION: version number to be compiled into a shared library # via -soname. Usually ${SHLIB_MAJOR} on ELF. # NetBSD/pmax used to use ${SHLIB_MAJOR}[.${SHLIB_MINOR} # [.${SHLIB_TEENY}]] # SHLIB_SHFLAGS: Flags to tell ${LD} to emit shared library. # with ELF, also set shared-lib version for ld.so. # SHLIB_LDSTARTFILE: support .o file, call C++ file-level constructors # SHLIB_LDENDFILE: support .o file, call C++ file-level destructors # FPICFLAGS: flags for ${FC} to compile .[fF] files to .So objects. # CPPICFLAGS: flags for ${CPP} to preprocess .[sS] files for ${AS} # CPICFLAGS: flags for ${CC} to compile .[cC] files to .So objects. # CAPICFLAGS: flags for ${CC} to compile .[Ss] files # (usually just ${CPPPICFLAGS} ${CPICFLAGS}) # APICFLAGS: flags for ${AS} to assemble .[sS] to .So objects. .if ${TARGET_OSNAME} == "NetBSD" .if ${MACHINE_ARCH} == "alpha" # Alpha-specific shared library flags FPICFLAGS ?= -fPIC CPICFLAGS ?= -fPIC -DPIC CPPPICFLAGS?= -DPIC CAPICFLAGS?= ${CPPPICFLAGS} ${CPICFLAGS} APICFLAGS ?= .elif ${MACHINE_ARCH} == "mipsel" || ${MACHINE_ARCH} == "mipseb" # mips-specific shared library flags # On mips, all libs are compiled with ABIcalls, not just sharedlibs. MKPICLIB= no # so turn shlib PIC flags on for ${AS}. AINC+=-DABICALLS AFLAGS+= -fPIC AS+= -KPIC .elif ${MACHINE_ARCH} == "vax" && ${OBJECT_FMT} == "ELF" # On the VAX, all objects are PIC by default, not just sharedlibs.
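# (i.e. no separate lib${LIB}_pic.a is needed on such platforms: the
# regular objects are already position independent, so the shared
# library can be linked from the normal archive - compare the SOLIB
# selection under MK_PICLIB further down)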
MKPICLIB= no .elif (${MACHINE_ARCH} == "sparc" || ${MACHINE_ARCH} == "sparc64") && \ ${OBJECT_FMT} == "ELF" # If you use -fPIC you need to define BIGPIC to turn on 32-bit # relocations in asm code FPICFLAGS ?= -fPIC CPICFLAGS ?= -fPIC -DPIC CPPPICFLAGS?= -DPIC -DBIGPIC CAPICFLAGS?= ${CPPPICFLAGS} ${CPICFLAGS} APICFLAGS ?= -KPIC .else # Platform-independent flags for NetBSD a.out shared libraries SHLIB_SOVERSION=${SHLIB_FULLVERSION} SHLIB_SHFLAGS= FPICFLAGS ?= -fPIC CPICFLAGS?= -fPIC -DPIC CPPPICFLAGS?= -DPIC CAPICFLAGS?= ${CPPPICFLAGS} ${CPICFLAGS} APICFLAGS?= -k .endif # Platform-independent linker flags for ELF shared libraries .if ${OBJECT_FMT} == "ELF" SHLIB_SOVERSION= ${SHLIB_MAJOR} SHLIB_SHFLAGS= -soname lib${LIB}.so.${SHLIB_SOVERSION} SHLIB_LDSTARTFILE?= /usr/lib/crtbeginS.o SHLIB_LDENDFILE?= /usr/lib/crtendS.o .endif # for compatibility with the following CC_PIC?= ${CPICFLAGS} LD_shared=${SHLIB_SHFLAGS} .endif # NetBSD .if ${TARGET_OSNAME} == "FreeBSD" .if ${OBJECT_FMT} == "ELF" SHLIB_SOVERSION= ${SHLIB_MAJOR} SHLIB_SHFLAGS= -soname lib${LIB}.so.${SHLIB_SOVERSION} .else SHLIB_SHFLAGS= -assert pure-text .endif SHLIB_LDSTARTFILE= SHLIB_LDENDFILE= CC_PIC?= -fpic LD_shared=${SHLIB_SHFLAGS} .endif # FreeBSD MKPICLIB?= yes # sys.mk can override these LD_X?=-X LD_x?=-x LD_r?=-r # Non BSD machines will be using bmake. .if ${TARGET_OSNAME} == "SunOS" LD_shared=-assert pure-text .if ${OBJECT_FMT} == "ELF" || ${MACHINE} == "solaris" # Solaris LD_shared=-h lib${LIB}.so.${SHLIB_MAJOR} -G .endif .elif ${TARGET_OSNAME} == "HP-UX" LD_shared=-b LD_so=sl DLLIB= # HPsUX lorder does not grok anything but .o LD_sobjs=`${LORDER} ${OBJS} | ${TSORT} | sed 's,\.o,.So,'` LD_pobjs=`${LORDER} ${OBJS} | ${TSORT} | sed 's,\.o,.po,'` .elif ${TARGET_OSNAME} == "OSF1" LD_shared= -msym -shared -expect_unresolved '*' LD_solib= -all lib${LIB}_pic.a DLLIB= # lorder does not grok anything but .o LD_sobjs=`${LORDER} ${OBJS} | ${TSORT} | sed 's,\.o,.So,'` LD_pobjs=`${LORDER} ${OBJS} | ${TSORT} | sed 's,\.o,.po,'` AR_cq= -cqs .elif ${TARGET_OSNAME} == "FreeBSD" LD_solib= lib${LIB}_pic.a .elif ${TARGET_OSNAME} == "Linux" SHLIB_LD = ${CC} # this is ambiguous of course LD_shared=-shared -Wl,"-h lib${LIB}.so.${SHLIB_MAJOR}" LD_solib= -Wl,--whole-archive lib${LIB}_pic.a -Wl,--no-whole-archive # Linux uses GNU ld, which is a multi-pass linker # so we don't need to use lorder or tsort LD_objs = ${OBJS} LD_pobjs = ${POBJS} LD_sobjs = ${SOBJS} .elif ${TARGET_OSNAME} == "Darwin" SHLIB_LD = ${CC} SHLIB_INSTALL_VERSION ?= ${SHLIB_MAJOR} SHLIB_COMPATABILITY_VERSION ?= ${SHLIB_MAJOR}.${SHLIB_MINOR:U0} SHLIB_COMPATABILITY ?= \ -compatibility_version ${SHLIB_COMPATABILITY_VERSION} \ -current_version ${SHLIB_FULLVERSION} LD_shared = -dynamiclib \ -flat_namespace -undefined suppress \ -install_name ${LIBDIR}/lib${LIB}.${SHLIB_INSTALL_VERSION}.${LD_solink} \ ${SHLIB_COMPATABILITY} SHLIB_LINKS = .for v in ${SHLIB_COMPATABILITY_VERSION} ${SHLIB_INSTALL_VERSION} .if "$v" != "${SHLIB_FULLVERSION}" SHLIB_LINKS += lib${LIB}.$v.${LD_solink} .endif .endfor .if ${MK_LINKLIB} != "no" SHLIB_LINKS += lib${LIB}.${LD_solink} .endif LD_so = ${SHLIB_FULLVERSION}.dylib LD_sobjs = ${SOBJS:O:u} LD_solib = ${LD_sobjs} SOLIB = ${LD_sobjs} LD_solink = dylib .if ${MACHINE_ARCH} == "i386" PICFLAG ?= -fPIC .else PICFLAG ?= -fPIC -fno-common .endif RANLIB = : .endif SHLIB_LD ?= ${LD} .if !empty(SHLIB_MAJOR) .if ${NEED_SOLINKS} && empty(SHLIB_LINKS) .if ${MK_LINKLIB} != "no" SHLIB_LINKS = lib${LIB}.${LD_solink} .endif .if "${SHLIB_FULLVERSION}" != "${SHLIB_MAJOR}" 
SHLIB_LINKS += lib${LIB}.${LD_solink}.${SHLIB_MAJOR} .endif .endif .endif LIBTOOL?=libtool LD_shared ?= -Bshareable -Bforcearchive LD_so ?= so.${SHLIB_FULLVERSION} LD_solink ?= so .if empty(LORDER) LD_objs ?= ${OBJS} LD_pobjs ?= ${POBJS} LD_sobjs ?= ${SOBJS} .else LD_objs ?= `${LORDER} ${OBJS} | ${TSORT}` LD_sobjs ?= `${LORDER} ${SOBJS} | ${TSORT}` LD_pobjs ?= `${LORDER} ${POBJS} | ${TSORT}` .endif LD_solib ?= ${LD_sobjs} AR_cq ?= cq .if exists(/netbsd) && exists(${DESTDIR}/usr/lib/libdl.so) DLLIB ?= -ldl .endif # some libs have lots of objects, and scanning all .o, .po and .So meta files # is a waste of time, this tells meta.autodep.mk to just pick one # (typically .So) # yes, 42 is a random number. .if ${MK_DIRDEPS_BUILD} == "yes" && ${SRCS:Uno:[\#]} > 42 OPTIMIZE_OBJECT_META_FILES ?= yes .endif .if ${MK_LIBTOOL} == "yes" # because libtool is so fascist about naming the object files, # we cannot (yet) build profiled libs MK_PROFILE=no _LIBS=lib${LIB}.a .if exists(${.CURDIR}/shlib_version) SHLIB_AGE != . ${.CURDIR}/shlib_version ; echo $$age .endif .else # for the normal .a we do not want to strip symbols .c.o: ${COMPILE.c} ${.IMPSRC} # for the normal .a we do not want to strip symbols ${CXX_SUFFIXES:%=%.o}: ${COMPILE.cc} ${.IMPSRC} .S.o .s.o: @echo ${COMPILE.S} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} @${COMPILE.S} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} .if (${LD_X} == "") .c.po: ${COMPILE.c} ${CC_PG} ${PROFFLAGS} ${.IMPSRC} -o ${.TARGET} ${CXX_SUFFIXES:%=%.po}: ${COMPILE.cc} -pg ${.IMPSRC} -o ${.TARGET} .S.So .s.So: ${COMPILE.S} ${PICFLAG} ${CC_PIC} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET} .else .c.po: @echo ${COMPILE.c} ${CC_PG} ${PROFFLAGS} ${.IMPSRC} -o ${.TARGET} @${COMPILE.c} ${CC_PG} ${PROFFLAGS} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_X} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o ${CXX_SUFFIXES:%=%.po}: @echo ${COMPILE.cc} ${CXX_PG} ${PROFFLAGS} ${.IMPSRC} -o ${.TARGET} @${COMPILE.cc} ${CXX_PG} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_X} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o .S.So .s.So: @echo ${COMPILE.S} ${PICFLAG} ${CC_PIC} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET} @${COMPILE.S} ${PICFLAG} ${CC_PIC} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_x} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o .endif .if (${LD_x} == "") .c.So: ${COMPILE.c} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET} ${CXX_SUFFIXES:%=%.So}: ${COMPILE.cc} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET} .S.po .s.po: ${COMPILE.S} ${PROFFLAGS} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET} .else .c.So: @echo ${COMPILE.c} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET} @${COMPILE.c} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_x} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o ${CXX_SUFFIXES:%=%.So}: @echo ${COMPILE.cc} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET} @${COMPILE.cc} ${PICFLAG} ${CC_PIC} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_x} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o .S.po .s.po: @echo ${COMPILE.S} ${PROFFLAGS} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET} @${COMPILE.S} ${PROFFLAGS} ${CFLAGS:M-[ID]*} ${AINC} ${.IMPSRC} -o ${.TARGET}.o @${LD} ${LD_X} ${LD_r} ${.TARGET}.o -o ${.TARGET} @rm -f ${.TARGET}.o .endif .endif .c.ln: ${LINT} ${LINTFLAGS} ${CFLAGS:M-[IDU]*} -i ${.IMPSRC} .if ${MK_LIBTOOL} != "yes" .if !defined(PICFLAG) PICFLAG=-fpic .endif _LIBS= .if ${MK_ARCHIVE} != "no" _LIBS += lib${LIB}.a .endif .if ${MK_PROFILE} != "no" _LIBS+=lib${LIB}_p.a POBJS+=${OBJS:.o=.po} .endif .if ${MK_PIC} 
!= "no" .if ${MK_PICLIB} == "no" SOLIB ?= lib${LIB}.a .else SOLIB=lib${LIB}_pic.a _LIBS+=${SOLIB} .endif .if !empty(SHLIB_FULLVERSION) _LIBS+=lib${LIB}.${LD_so} .endif .endif .if ${MK_LINT} != "no" _LIBS+=llib-l${LIB}.ln .endif # here is where you can define what LIB* are .-include .if ${MK_DPADD_MK} == "yes" # lots of cool magic, but might not suit everyone. .include .endif .if !defined(_SKIP_BUILD) all: prebuild .WAIT ${_LIBS} # a hook for things that must be done early prebuild: .if !defined(.PARSEDIR) # no-op is the best we can do if not bmake. .WAIT: .endif .endif all: _SUBDIRUSE .for s in ${SRCS:N*.h:M*/*} ${.o .So .po .lo:L:@o@${s:T:R}$o@}: $s .endfor OBJS+= ${SRCS:T:N*.h:R:S/$/.o/g} .NOPATH: ${OBJS} .if ${MK_LIBTOOL} == "yes" .if ${MK_PIC} == "no" LT_STATIC=-static .else LT_STATIC= .endif SHLIB_AGE?=0 # .lo's are created as a side effect .s.o .S.o .c.o: ${LIBTOOL} --mode=compile ${CC} ${LT_STATIC} ${CFLAGS} ${CPPFLAGS} ${IMPFLAGS} -c ${.IMPSRC} # can't really do profiled libs with libtool - its too fascist about # naming the output... lib${LIB}.a:: ${OBJS} @rm -f ${.TARGET} ${LIBTOOL} --mode=link ${CC} ${LT_STATIC} -o ${.TARGET:.a=.la} ${OBJS:.o=.lo} -rpath ${SHLIBDIR}:/usr/lib -version-info ${SHLIB_MAJOR}:${SHLIB_MINOR}:${SHLIB_AGE} @ln .libs/${.TARGET} . lib${LIB}.${LD_so}:: lib${LIB}.a @[ -s ${.TARGET}.${SHLIB_AGE} ] || { ln -s .libs/lib${LIB}.${LD_so}* . 2>/dev/null; : } @[ -s ${.TARGET} ] || ln -s ${.TARGET}.${SHLIB_AGE} ${.TARGET} .else # MK_LIBTOOL=yes lib${LIB}.a:: ${OBJS} @echo building standard ${LIB} library @rm -f ${.TARGET} @${AR} ${AR_cq} ${.TARGET} ${LD_objs} ${RANLIB} ${.TARGET} POBJS+= ${OBJS:.o=.po} .NOPATH: ${POBJS} lib${LIB}_p.a:: ${POBJS} @echo building profiled ${LIB} library @rm -f ${.TARGET} @${AR} ${AR_cq} ${.TARGET} ${LD_pobjs} ${RANLIB} ${.TARGET} SOBJS+= ${OBJS:.o=.So} .NOPATH: ${SOBJS} lib${LIB}_pic.a:: ${SOBJS} @echo building shared object ${LIB} library @rm -f ${.TARGET} @${AR} ${AR_cq} ${.TARGET} ${LD_sobjs} ${RANLIB} ${.TARGET} #SHLIB_LDADD?= ${LDADD} # bound to be non-portable... 
# this is known to work for NetBSD 1.6 and FreeBSD 4.2 lib${LIB}.${LD_so}: ${SOLIB} ${DPADD} @echo building shared ${LIB} library \(version ${SHLIB_FULLVERSION}\) @rm -f ${.TARGET} .if ${TARGET_OSNAME} == "NetBSD" || ${TARGET_OSNAME} == "FreeBSD" .if ${OBJECT_FMT} == "ELF" ${SHLIB_LD} -x -shared ${SHLIB_SHFLAGS} -o ${.TARGET} \ ${SHLIB_LDSTARTFILE} \ --whole-archive ${SOLIB} --no-whole-archive ${SHLIB_LDADD} \ ${SHLIB_LDENDFILE} .else ${SHLIB_LD} ${LD_x} ${LD_shared} \ -o ${.TARGET} ${SOLIB} ${SHLIB_LDADD} .endif .else ${SHLIB_LD} -o ${.TARGET} ${LD_shared} ${LD_solib} ${DLLIB} ${SHLIB_LDADD} .endif .endif .if !empty(SHLIB_LINKS) rm -f ${SHLIB_LINKS}; ${SHLIB_LINKS:O:u:@x@ln -s ${.TARGET} $x;@} .endif LOBJS+= ${LSRCS:.c=.ln} ${SRCS:M*.c:.c=.ln} .NOPATH: ${LOBJS} LLIBS?= -lc llib-l${LIB}.ln: ${LOBJS} @echo building llib-l${LIB}.ln @rm -f llib-l${LIB}.ln @${LINT} -C${LIB} ${LOBJS} ${LLIBS} .if !target(clean) cleanlib: .PHONY rm -f a.out [Ee]rrs mklog core *.core ${CLEANFILES} rm -f lib${LIB}.a ${OBJS} rm -f lib${LIB}_p.a ${POBJS} rm -f lib${LIB}_pic.a lib${LIB}.so.*.* ${SOBJS} rm -f llib-l${LIB}.ln ${LOBJS} .if !empty(SHLIB_LINKS) rm -f ${SHLIB_LINKS} .endif clean: _SUBDIRUSE cleanlib cleandir: _SUBDIRUSE cleanlib .else cleandir: _SUBDIRUSE clean .endif .if defined(SRCS) && (!defined(MKDEP) || ${MKDEP} != autodep) afterdepend: .depend @(TMP=/tmp/_depend$$$$; \ sed -e 's/^\([^\.]*\).o[ ]*:/\1.o \1.po \1.So \1.ln:/' \ < .depend > $$TMP; \ mv $$TMP .depend) .endif .if !target(install) .if !target(beforeinstall) beforeinstall: .endif .if !empty(LIBOWN) LIB_INSTALL_OWN ?= -o ${LIBOWN} -g ${LIBGRP} .endif .include .if !target(realinstall) realinstall: libinstall .endif .if !target(libinstall) libinstall: [ -d ${DESTDIR}/${LIBDIR} ] || \ ${INSTALL} -d ${LIB_INSTALL_OWN} -m 775 ${DESTDIR}${LIBDIR} .if ${MK_ARCHIVE} != "no" ${INSTALL} ${COPY} ${LIB_INSTALL_OWN} -m 600 lib${LIB}.a \ ${DESTDIR}${LIBDIR} ${RANLIB} ${DESTDIR}${LIBDIR}/lib${LIB}.a chmod ${LIBMODE} ${DESTDIR}${LIBDIR}/lib${LIB}.a .endif .if ${MK_PROFILE} != "no" ${INSTALL} ${COPY} ${LIB_INSTALL_OWN} -m 600 \ lib${LIB}_p.a ${DESTDIR}${LIBDIR} ${RANLIB} ${DESTDIR}${LIBDIR}/lib${LIB}_p.a chmod ${LIBMODE} ${DESTDIR}${LIBDIR}/lib${LIB}_p.a .endif .if ${MK_PIC} != "no" .if ${MK_PICLIB} != "no" ${INSTALL} ${COPY} ${LIB_INSTALL_OWN} -m 600 \ lib${LIB}_pic.a ${DESTDIR}${LIBDIR} ${RANLIB} ${DESTDIR}${LIBDIR}/lib${LIB}_pic.a chmod ${LIBMODE} ${DESTDIR}${LIBDIR}/lib${LIB}_pic.a .endif .if !empty(SHLIB_MAJOR) ${INSTALL} ${COPY} ${LIB_INSTALL_OWN} -m ${LIBMODE} \ lib${LIB}.${LD_so} ${DESTDIR}${LIBDIR} .if !empty(SHLIB_LINKS) (cd ${DESTDIR}${LIBDIR} && { rm -f ${SHLIB_LINKS}; ${SHLIB_LINKS:O:u:@x@ln -s lib${LIB}.${LD_so} $x;@} }) .endif .endif .endif .if ${MK_LINT} != "no" && ${MK_LINKLIB} != "no" && !empty(LOBJS) ${INSTALL} ${COPY} ${LIB_INSTALL_OWN} -m ${LIBMODE} \ llib-l${LIB}.ln ${DESTDIR}${LINTLIBDIR} .endif .if defined(LINKS) && !empty(LINKS) @set ${LINKS}; ${_LINKS_SCRIPT} .endif .endif install: maninstall _SUBDIRUSE maninstall: afterinstall afterinstall: realinstall +libinstall: beforeinstall realinstall: beforeinstall .endif .if ${MK_MAN} != "no" .include .endif .if ${MK_NLS} != "no" .include .endif .include .include .include .include .endif # during building we usually need/want to install libs somewhere central # note that we do NOT ch{own,grp} as that would likely fail at this point. 
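# (sketch: while staging into a central location, plain copy semantics
# are enough; ownership and group are applied by the real install later.)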
# otherwise it is the same as realinstall # Note that we don't need this when using dpadd.mk .libinstall: ${_LIBS} test -d ${DESTDIR}${LIBDIR} || ${INSTALL} -d -m775 ${DESTDIR}${LIBDIR} .for _lib in ${_LIBS:M*.a} ${INSTALL} ${COPY} -m 644 ${_lib} ${DESTDIR}${LIBDIR} ${RANLIB} ${DESTDIR}${LIBDIR}/${_lib} .endfor .for _lib in ${_LIBS:M*.${LD_solink}*:O:u} ${INSTALL} ${COPY} -m ${LIBMODE} ${_lib} ${DESTDIR}${LIBDIR} .if !empty(SHLIB_LINKS) (cd ${DESTDIR}${LIBDIR} && { ${SHLIB_LINKS:O:u:@x@ln -sf ${_lib} $x;@}; }) .endif .endfor @touch ${.TARGET} .include .endif Index: projects/clang390-import/contrib/bmake/mk/meta.sys.mk =================================================================== --- projects/clang390-import/contrib/bmake/mk/meta.sys.mk (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/meta.sys.mk (revision 305687) @@ -1,153 +1,166 @@ -# $Id: meta.sys.mk,v 1.28 2016/04/05 15:58:37 sjg Exp $ +# $Id: meta.sys.mk,v 1.29 2016/08/13 17:51:45 sjg Exp $ # # @(#) Copyright (c) 2010, Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # # include this if you want to enable meta mode # for maximum benefit, requires filemon(4) driver. .if ${MAKE_VERSION:U0} > 20100901 .if !target(.ERROR) .-include # absolute path to what we are reading. _PARSEDIR = ${.PARSEDIR:tA} +.if !defined(SYS_MK_DIR) +SYS_MK_DIR := ${_PARSEDIR} +.endif + META_MODE += meta verbose .MAKE.MODE ?= ${META_MODE} .if ${.MAKE.LEVEL} == 0 _make_mode := ${.MAKE.MODE} ${META_MODE} .if ${_make_mode:M*read*} != "" || ${_make_mode:M*nofilemon*} != "" # tell everyone we are not updating Makefile.depend* UPDATE_DEPENDFILE = NO .export UPDATE_DEPENDFILE .endif .if ${UPDATE_DEPENDFILE:Uyes:tl} == "no" && !exists(/dev/filemon) # we should not get upset META_MODE += nofilemon .export META_MODE .endif .endif .if !defined(NO_SILENT) .if ${MAKE_VERSION} > 20110818 # only be silent when we have a .meta file META_MODE += silent=yes .else .SILENT: .endif .endif # we use the pseudo machine "host" for the build host. # this should be taken care of before we get here .if ${OBJTOP:Ua} == ${HOST_OBJTOP:Ub} MACHINE = host .endif .if ${.MAKE.LEVEL} == 0 # it can be handy to know which MACHINE kicked off the build # for example, if using Makefile.depend for multiple machines, # allowing only MACHINE0 to update can keep things simple.
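# (a hedged sketch, not part of the original file: a Makefile.depend
# could guard its own regeneration with
#	.if ${MACHINE} != ${MACHINE0:Uno}
#	UPDATE_DEPENDFILE = no
#	.endif
# so that only the machine which kicked off the build updates it.)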
MACHINE0 := ${MACHINE} .export MACHINE0 .if defined(PYTHON) && exists(${PYTHON}) # we prefer the python version of this - it is much faster META2DEPS ?= ${.PARSEDIR}/meta2deps.py .else META2DEPS ?= ${.PARSEDIR}/meta2deps.sh .endif META2DEPS := ${META2DEPS} .export META2DEPS .endif MAKE_PRINT_VAR_ON_ERROR += \ .ERROR_TARGET \ .ERROR_META_FILE \ .MAKE.LEVEL \ MAKEFILE \ .MAKE.MODE .if !defined(SB) && defined(SRCTOP) SB = ${SRCTOP:H} .endif ERROR_LOGDIR ?= ${SB}/error meta_error_log = ${ERROR_LOGDIR}/meta-${.MAKE.PID}.log # we are not interested in make telling us a failure happened elsewhere .ERROR: _metaError _metaError: .NOMETA .NOTMAIN -@[ "${.ERROR_META_FILE}" ] && { \ grep -q 'failure has been detected in another branch' ${.ERROR_META_FILE} && exit 0; \ mkdir -p ${meta_error_log:H}; \ cp ${.ERROR_META_FILE} ${meta_error_log}; \ echo "ERROR: log ${meta_error_log}" >&2; }; : .endif META_COOKIE_TOUCH= # some targets need to be .PHONY in non-meta mode META_NOPHONY= .PHONY # Are we, after all, in meta mode? .if ${.MAKE.MODE:Uno:Mmeta*} != "" MKDEP_MK = meta.autodep.mk .if ${.MAKE.MAKEFILES:M*sys.dependfile.mk} == "" # this does all the smarts of setting .MAKE.DEPENDFILE .-include # check if we got anything sane .if ${.MAKE.DEPENDFILE} == ".depend" .undef .MAKE.DEPENDFILE .endif .MAKE.DEPENDFILE ?= Makefile.depend .endif # we can afford to use cookies to prevent some targets # re-running needlessly META_COOKIE_TOUCH= touch ${COOKIE.${.TARGET}:U${.OBJDIR}/${.TARGET}} META_NOPHONY= + +# some targets involve old pre-built targets +# ignore mtime of shell +# and mtime of makefiles does not matter in meta mode +.MAKE.META.IGNORE_PATHS += \ + ${MAKEFILE} \ + ${SHELL} \ + ${SYS_MK_DIR} + .if ${UPDATE_DEPENDFILE:Uyes:tl} != "no" .if ${.MAKEFLAGS:Uno:M-k} != "" # make this more obvious .warning Setting UPDATE_DEPENDFILE=NO due to -k UPDATE_DEPENDFILE= NO .export UPDATE_DEPENDFILE .elif !exists(/dev/filemon) .error ${.newline}ERROR: The filemon module (/dev/filemon) is not loaded. .endif .endif .if ${.MAKE.LEVEL} == 0 # make sure dirdeps target exists and do it first all: dirdeps .WAIT dirdeps: .NOPATH: dirdeps .if defined(ALL_MACHINES) # the first .MAIN: is what counts # by default dirdeps is all we want at level0 .MAIN: dirdeps # tell dirdeps.mk what we want BUILD_AT_LEVEL0 = no .endif .if ${.TARGETS:Nall} == "" # it works best if we do everything via sub-makes BUILD_AT_LEVEL0 ?= no .endif .endif .endif .endif Index: projects/clang390-import/contrib/bmake/mk/prog.mk =================================================================== --- projects/clang390-import/contrib/bmake/mk/prog.mk (revision 305686) +++ projects/clang390-import/contrib/bmake/mk/prog.mk (revision 305687) @@ -1,222 +1,223 @@ -# $Id: prog.mk,v 1.26 2016/03/22 20:45:14 sjg Exp $ +# $Id: prog.mk,v 1.27 2016/08/02 20:52:17 sjg Exp $ .if !target(__${.PARSEFILE}__) __${.PARSEFILE}__: .include # FreeBSD at least expects MAN8 etc. 
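# (illustrative: with MAN=make.1 the :E modifier extracts the suffix,
# so _sect becomes 1 and the assignment below yields MAN1=make.1.)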
.if defined(MAN) && !empty(MAN) _sect:=${MAN:E} MAN${_sect}=${MAN} .endif .SUFFIXES: .out .o .c .cc .C .y .l .s .8 .7 .6 .5 .4 .3 .2 .1 .0 CFLAGS+= ${COPTS} .if ${TARGET_OSNAME} == "NetBSD" .if ${MACHINE_ARCH} == "sparc64" CFLAGS+= -mcmodel=medlow .endif # ELF platforms depend on crtbegin.o and crtend.o .if ${OBJECT_FMT} == "ELF" .ifndef LIBCRTBEGIN LIBCRTBEGIN= ${DESTDIR}/usr/lib/crtbegin.o .MADE: ${LIBCRTBEGIN} .endif .ifndef LIBCRTEND LIBCRTEND= ${DESTDIR}/usr/lib/crtend.o .MADE: ${LIBCRTEND} .endif _SHLINKER= ${SHLINKDIR}/ld.elf_so .else LIBCRTBEGIN?= LIBCRTEND?= _SHLINKER= ${SHLINKDIR}/ld.so .endif .ifndef LIBCRT0 LIBCRT0= ${DESTDIR}/usr/lib/crt0.o .MADE: ${LIBCRT0} .endif .endif # NetBSD # here is where you can define what LIB* are .-include .if ${MK_DPADD_MK} == "yes" # lots of cool magic, but might not suit everyone. .include .endif .if ${MK_GPROF} == "yes" CFLAGS+= ${CC_PG} ${PROFFLAGS} LDADD+= ${CC_PG} .if ${MK_DPADD_MK} == "no" LDADD_LIBC_P?= -lc_p LDADD_LAST+= ${LDADD_LIBC_P} .endif .endif .if defined(SHAREDSTRINGS) CLEANFILES+=strings .c.o: ${CC} -E ${CFLAGS} ${.IMPSRC} | xstr -c - @${CC} ${CFLAGS} -c x.c -o ${.TARGET} @rm -f x.c ${CXX_SUFFIXES:%=%.o}: ${CXX} -E ${CXXFLAGS} ${.IMPSRC} | xstr -c - @mv -f x.c x.cc @${CXX} ${CXXFLAGS} -c x.cc -o ${.TARGET} @rm -f x.cc .endif .if defined(PROG) SRCS?= ${PROG}.c .for s in ${SRCS:N*.h:N*.sh:M*/*} ${.o .po .lo:L:@o@${s:T:R}$o@}: $s .endfor .if !empty(SRCS:N*.h:N*.sh) OBJS+= ${SRCS:T:N*.h:N*.sh:R:S/$/.o/g} LOBJS+= ${LSRCS:.c=.ln} ${SRCS:M*.c:.c=.ln} .endif .if defined(OBJS) && !empty(OBJS) .NOPATH: ${OBJS} ${PROG} ${SRCS:M*.[ly]:C/\..$/.c/} ${YHEADER:D${SRCS:M*.y:.y=.h}} # this is known to work for NetBSD 1.6 and FreeBSD 4.2 .if ${TARGET_OSNAME} == "NetBSD" || ${TARGET_OSNAME} == "FreeBSD" _PROGLDOPTS= .if ${SHLINKDIR} != "/usr/libexec" # XXX: change or remove if ld.so moves _PROGLDOPTS+= -Wl,-dynamic-linker=${_SHLINKER} .endif .if defined(LIBDIR) && ${SHLIBDIR} != ${LIBDIR} _PROGLDOPTS+= -Wl,-rpath-link,${DESTDIR}${SHLIBDIR}:${DESTDIR}/usr/lib \ -L${DESTDIR}${SHLIBDIR} .endif _PROGLDOPTS+= -Wl,-rpath,${SHLIBDIR}:/usr/lib .if defined(PROG_CXX) _CCLINK= ${CXX} _SUPCXX= -lstdc++ -lm .endif .endif # NetBSD _CCLINK?= ${CC} .if defined(DESTDIR) && exists(${LIBCRT0}) && ${LIBCRT0} != "/dev/null" ${PROG}: ${LIBCRT0} ${OBJS} ${LIBC} ${DPADD} ${_CCLINK} ${LDFLAGS} ${LDSTATIC} -o ${.TARGET} -nostdlib ${_PROGLDOPTS} -L${DESTDIR}/usr/lib ${LIBCRT0} ${LIBCRTBEGIN} ${OBJS} ${LDADD} -L${DESTDIR}/usr/lib ${_SUPCXX} -lgcc -lc -lgcc ${LIBCRTEND} .else ${PROG}: ${LIBCRT0} ${OBJS} ${LIBC} ${DPADD} ${_CCLINK} ${LDFLAGS} ${LDSTATIC} -o ${.TARGET} ${_PROGLDOPTS} ${OBJS} ${LDADD} .endif # defined(DESTDIR) .endif # defined(OBJS) && !empty(OBJS) .if !defined(MAN) MAN= ${PROG}.1 .endif # !defined(MAN) .endif # defined(PROG) .if !defined(_SKIP_BUILD) all: ${PROG} .endif all: _SUBDIRUSE .if !target(clean) cleanprog: rm -f a.out [Ee]rrs mklog core *.core \ ${PROG} ${OBJS} ${LOBJS} ${CLEANFILES} clean: _SUBDIRUSE cleanprog cleandir: _SUBDIRUSE cleanprog .else cleandir: _SUBDIRUSE clean .endif .if defined(SRCS) && (!defined(MKDEP) || ${MKDEP} != autodep) afterdepend: .depend @(TMP=/tmp/_depend$$$$; \ sed -e 's/^\([^\.]*\).o[ ]*:/\1.o \1.ln:/' \ < .depend > $$TMP; \ mv $$TMP .depend) .endif .if !target(install) .if !target(beforeinstall) beforeinstall: .endif .if !target(afterinstall) afterinstall: .endif .if !empty(BINOWN) PROG_INSTALL_OWN ?= -o ${BINOWN} -g ${BINGRP} .endif .if !target(realinstall) realinstall: proginstall .endif .if !target(proginstall) 
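# (a hedged example of the default rule below: "make install" with
# PROG=cat and BINDIR=/usr/bin copies ./cat to ${DESTDIR}/usr/bin/cat,
# creating the target directory first if it does not exist.)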
proginstall: .if defined(PROG) [ -d ${DESTDIR}${BINDIR} ] || \ ${INSTALL} -d ${PROG_INSTALL_OWN} -m 775 ${DESTDIR}${BINDIR} ${INSTALL} ${COPY} ${STRIP_FLAG} ${PROG_INSTALL_OWN} -m ${BINMODE} \ ${PROG} ${DESTDIR}${BINDIR}/${PROG_NAME} .endif .if defined(HIDEGAME) (cd ${DESTDIR}/usr/games; rm -f ${PROG}; ln -s dm ${PROG}) .endif .endif .include install: maninstall install_links _SUBDIRUSE install_links: .if !empty(SYMLINKS) @set ${SYMLINKS}; ${_SYMLINKS_SCRIPT} .endif .if !empty(LINKS) @set ${LINKS}; ${_LINKS_SCRIPT} .endif maninstall: afterinstall afterinstall: realinstall +proginstall: beforeinstall realinstall: beforeinstall .endif .if !target(lint) lint: ${LOBJS} .if defined(LOBJS) && !empty(LOBJS) @${LINT} ${LINTFLAGS} ${LDFLAGS:M-L*} ${LOBJS} ${LDADD} .endif .endif .NOPATH: ${PROG} .if defined(OBJS) && !empty(OBJS) .NOPATH: ${OBJS} .endif .if ${MK_MAN} != "no" .include .endif .if ${MK_NLS} != "no" .include .endif .include .include .include .include .endif Index: projects/clang390-import/contrib/bmake/os.sh =================================================================== --- projects/clang390-import/contrib/bmake/os.sh (revision 305686) +++ projects/clang390-import/contrib/bmake/os.sh (revision 305687) @@ -1,242 +1,243 @@ : # NAME: # os.sh - operating system specifics # # DESCRIPTION: # This file is included at the start of processing. Its role is # to set the variables OS, OSREL, OSMAJOR, MACHINE and MACHINE_ARCH to # reflect the current system. # # It also sets variables such as MAILER, LOCAL_FS, PS_AXC to hide # certain aspects of different UNIX flavours. # # SEE ALSO: # site.sh,funcs.sh # # AUTHOR: # Simon J. Gerraty # RCSid: -# $Id: os.sh,v 1.50 2015/12/17 17:06:29 sjg Exp $ +# $Id: os.sh,v 1.52 2016/06/17 05:15:14 sjg Exp $ # # @(#) Copyright (c) 1994 Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # # this lets us skip sourcing it again _OS_SH=: OS=`uname` OSREL=`uname -r` OSMAJOR=`IFS=.; set $OSREL; echo $1` MACHINE=`uname -m` MACHINE_ARCH=`uname -p 2>/dev/null || echo $MACHINE` # there is at least one case of `uname -p` outputting # a bunch of useless drivel case "$MACHINE_ARCH" in unknown|*[!A-Za-z0-9_-]*) MACHINE_ARCH="$MACHINE";; esac # we need this here, and it is not always available... Which() { case "$1" in -*) t=$1; shift;; *) t=-x;; esac case "$1" in /*) test $t $1 && echo $1;; *) # some shells cannot correctly handle `IFS` # in conjunction with the for loop. _dirs=`IFS=:; echo ${2:-$PATH}` for d in $_dirs do test $t $d/$1 && { echo $d/$1; break; } done ;; esac } # tr is insanely non-portable wrt char classes, so we need to # spell out the alphabet. sed y/// would work too. toUpper() { ${TR:-tr} abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ } toLower() { ${TR:-tr} ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz } K= case $OS in AIX) # everyone loves to be different... OSMAJOR=`uname -v` OSREL="$OSMAJOR.`uname -r`" LOCAL_FS=jfs PS_AXC=-e SHARE_ARCH=$OS/$OSMAJOR.X ;; SunOS) CHOWN=`Which chown /usr/etc:/usr/bin` export CHOWN # Great! Solaris keeps moving arch(1) # should just bite the bullet and use uname -p arch=`Which arch /usr/bin:/usr/ucb` MAILER=/usr/ucb/Mail LOCAL_FS=4.2 case "$OSREL" in 4.0*) # uname -m just says sun which could be anything # so use arch(1).
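# (hypothetical example: on a sun4 workstation arch(1) prints "sun4",
# so MACHINE and MACHINE_ARCH both become sun4 below.)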
MACHINE_ARCH=`arch` MACHINE=$MACHINE_ARCH ;; 4*) MACHINE_ARCH=`arch` ;; 5*) K=-k LOCAL_FS=ufs MAILER=mailx PS_AXC=-e # can you believe that ln on Solaris defaults to # overwriting an existing file!!!!! We want one that works! test -x /usr/xpg4/bin/ln && LN=${LN:-/usr/xpg4/bin/ln} # wonderful, 5.8's tr again requires []'s # but /usr/xpg4/bin/tr causes problems if LC_COLLATE is set! # use toUpper/toLower instead. ;; esac case "$OS/$MACHINE_ARCH" in *sun386) SHARE_ARCH=$MACHINE_ARCH;; esac ;; *BSD) K=-k MAILER=/usr/bin/Mail LOCAL_FS=local : $-,$ENV case "$-,$ENV" in *i*,*) ;; *,|*ENVFILE*) ;; *) ENV=;; esac # NetBSD at least has good backward compatibility # so NetBSD/i386 is good enough case $OS in NetBSD) HOST_ARCH=$MACHINE - SHARE_ARCH=$OS/$HOST + SHARE_ARCH=$OS/$HOST_ARCH ;; OpenBSD) arch=`Which arch /usr/bin:/usr/ucb:$PATH` MACHINE_ARCH=`$arch -s` ;; esac NAWK=awk export NAWK ;; HP-UX) TMP_DIRS="/tmp /usr/tmp" LOCAL_FS=hfs MAILER=mailx # don't rely on /bin/sh, it's broken _shell=/bin/ksh; ENV= # also, no one would be interested in OSMAJOR=A case "$OSREL" in ?.09*) OSMAJOR=9; PS_AXC=-e;; ?.10*) OSMAJOR=10; PS_AXC=-e;; esac ;; IRIX) LOCAL_FS=efs ;; Interix) MACHINE=i386 MACHINE_ARCH=i386 ;; UnixWare) OSREL=`uname -v` OSMAJOR=`IFS=.; set $OSREL; echo $1` MACHINE_ARCH=`uname -m` ;; Linux) # Not really any such thing as Linux, but # this covers red-hat and hopefully others. case $MACHINE in i?86) MACHINE_ARCH=i386;; # we don't care about i686 vs i586 esac LOCAL_FS=ext2 PS_AXC=axc [ -x /usr/bin/md5sum ] && { MD5=/usr/bin/md5sum; export MD5; } ;; QNX) case $MACHINE in x86pc) MACHINE_ARCH=i386;; esac ;; Haiku) case $MACHINE in BeBox) MACHINE_ARCH=powerpc;; BeMac) MACHINE_ARCH=powerpc;; BePC) MACHINE_ARCH=i386;; esac ;; esac HOSTNAME=${HOSTNAME:-`( hostname ) 2>/dev/null`} HOSTNAME=${HOSTNAME:-`( uname -n ) 2>/dev/null`} case "$HOSTNAME" in *.*) HOST=`IFS=.; set -- $HOSTNAME; echo $1`;; *) HOST=$HOSTNAME;; esac TMP_DIRS=${TMP_DIRS:-"/tmp /var/tmp"} MACHINE_ARCH=${MACHINE_ARCH:-$MACHINE} HOST_ARCH=${HOST_ARCH:-$MACHINE_ARCH} # we mount server:/share/arch/$SHARE_ARCH as /usr/local -SHARE_ARCH=${SHARE_ARCH:-$OS/$OSMAJOR.X/$HOST_ARCH} +SHARE_ARCH_DEFAULT=$OS/$OSMAJOR.X/$HOST_ARCH +SHARE_ARCH=${SHARE_ARCH:-$SHARE_ARCH_DEFAULT} LN=${LN:-ln} TR=${TR:-tr} # Some people like to have /share/$HOST_TARGET/bin etc. HOST_TARGET=`echo ${OS}${OSMAJOR}-$HOST_ARCH | tr -d / | toLower` export HOST_TARGET case `echo -n .` in -n*) N=; C="\c";; *) N=-n; C=;; esac Echo() { case "$1" in -n) _n=$N _c=$C; shift;; *) _n= _c=;; esac echo $_n "$@" $_c } export HOSTNAME HOST export OS MACHINE MACHINE_ARCH OSREL OSMAJOR LOCAL_FS TMP_DIRS MAILER N C K PS_AXC export LN SHARE_ARCH TR case /$0 in */os.sh) for v in $* do eval vv=\$$v echo "$v='$vv'" done ;; esac Index: projects/clang390-import/contrib/bmake/suff.c =================================================================== --- projects/clang390-import/contrib/bmake/suff.c (revision 305686) +++ projects/clang390-import/contrib/bmake/suff.c (revision 305687) @@ -1,2667 +1,2675 @@ -/* $NetBSD: suff.c,v 1.81 2016/03/15 18:30:14 matthias Exp $ */ +/* $NetBSD: suff.c,v 1.84 2016/06/30 05:34:04 dholland Exp $ */ /* * Copyright (c) 1988, 1989, 1990, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Adam de Boor. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1.
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * Copyright (c) 1989 by Berkeley Softworks * All rights reserved. * * This code is derived from software contributed to Berkeley by * Adam de Boor. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #ifndef MAKE_NATIVE -static char rcsid[] = "$NetBSD: suff.c,v 1.81 2016/03/15 18:30:14 matthias Exp $"; +static char rcsid[] = "$NetBSD: suff.c,v 1.84 2016/06/30 05:34:04 dholland Exp $"; #else #include #ifndef lint #if 0 static char sccsid[] = "@(#)suff.c 8.4 (Berkeley) 3/21/94"; #else -__RCSID("$NetBSD: suff.c,v 1.81 2016/03/15 18:30:14 matthias Exp $"); +__RCSID("$NetBSD: suff.c,v 1.84 2016/06/30 05:34:04 dholland Exp $"); #endif #endif /* not lint */ #endif /*- * suff.c -- * Functions to maintain suffix lists and find implicit dependents * using suffix transformation rules * * Interface: * Suff_Init Initialize all things to do with suffixes. * * Suff_End Cleanup the module * * Suff_DoPaths This function is used to make life easier * when searching for a file according to its * suffix. It takes the global search path, * as defined using the .PATH: target, and appends * its directories to the path of each of the * defined suffixes, as specified using * .PATH: targets. In addition, all * directories given for suffixes labeled as * include files or libraries, using the .INCLUDES * or .LIBS targets, are played with using * Dir_MakeFlags to create the .INCLUDES and * .LIBS global variables. * * Suff_ClearSuffixes Clear out all the suffixes and defined * transformations. * * Suff_IsTransform Return TRUE if the passed string is the lhs * of a transformation rule. * * Suff_AddSuffix Add the passed string as another known suffix. * * Suff_GetPath Return the search path for the given suffix. * * Suff_AddInclude Mark the given suffix as denoting an include * file. * * Suff_AddLib Mark the given suffix as denoting a library. * * Suff_AddTransform Add another transformation to the suffix * graph. Returns GNode suitable for framing, I * mean, tacking commands, attributes, etc. on. * * Suff_SetNull Define the suffix to consider the suffix of * any file that doesn't have a known one. * * Suff_FindDeps Find implicit sources for and the location of * a target based on its suffix. Returns the * bottom-most node added to the graph or NULL * if the target had no implicit sources. * * Suff_FindPath Return the appropriate path to search in * order to find the node. */ #include #include "make.h" #include "hash.h" #include "dir.h" static Lst sufflist; /* Lst of suffixes */ #ifdef CLEANUP static Lst suffClean; /* Lst of suffixes to be cleaned */ #endif static Lst srclist; /* Lst of sources */ static Lst transforms; /* Lst of transformation rules */ static int sNum = 0; /* Counter for assigning suffix numbers */ /* * Structure describing an individual suffix. */ typedef struct _Suff { char *name; /* The suffix itself */ int nameLen; /* Length of the suffix */ short flags; /* Type of suffix */ #define SUFF_INCLUDE 0x01 /* One which is #include'd */ #define SUFF_LIBRARY 0x02 /* One which contains a library */ #define SUFF_NULL 0x04 /* The empty suffix */ Lst searchPath; /* The path along which files of this suffix * may be found */ int sNum; /* The suffix number */ int refCount; /* Reference count of list membership */ Lst parents; /* Suffixes we have a transformation to */ Lst children; /* Suffixes we have a transformation from */ Lst ref; /* List of lists this suffix is referenced */ } Suff; /* * for SuffSuffIsSuffix */ typedef struct { char *ename; /* The end of the name */ int len; /* Length of the name */ } SuffixCmpData; /* * Structure used in the search for implied sources. 
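 * (An illustrative note, not from the original source: when looking for
 * implied sources of "foo.o", the prefix is "foo", and every suffix with
 * a transformation to ".o" contributes a candidate Src such as "foo.c"
 * or "foo.y".)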
*/ typedef struct _Src { char *file; /* The file to look for */ char *pref; /* Prefix from which file was formed */ Suff *suff; /* The suffix on the file */ struct _Src *parent; /* The Src for which this is a source */ GNode *node; /* The node describing the file */ int children; /* Count of existing children (so we don't free * this thing too early or never nuke it) */ #ifdef DEBUG_SRC Lst cp; /* Debug; children list */ #endif } Src; /* * A structure for passing more than one argument to the Lst-library-invoked * function... */ typedef struct { Lst l; Src *s; } LstSrc; typedef struct { GNode **gn; Suff *s; Boolean r; } GNodeSuff; static Suff *suffNull; /* The NULL suffix for this run */ static Suff *emptySuff; /* The empty suffix required for POSIX * single-suffix transformation rules */ static const char *SuffStrIsPrefix(const char *, const char *); static char *SuffSuffIsSuffix(const Suff *, const SuffixCmpData *); static int SuffSuffIsSuffixP(const void *, const void *); static int SuffSuffHasNameP(const void *, const void *); static int SuffSuffIsPrefix(const void *, const void *); static int SuffGNHasNameP(const void *, const void *); static void SuffUnRef(void *, void *); static void SuffFree(void *); static void SuffInsert(Lst, Suff *); static void SuffRemove(Lst, Suff *); static Boolean SuffParseTransform(char *, Suff **, Suff **); static int SuffRebuildGraph(void *, void *); static int SuffScanTargets(void *, void *); static int SuffAddSrc(void *, void *); static int SuffRemoveSrc(Lst); static void SuffAddLevel(Lst, Src *); static Src *SuffFindThem(Lst, Lst); static Src *SuffFindCmds(Src *, Lst); static void SuffExpandChildren(LstNode, GNode *); static void SuffExpandWildcards(LstNode, GNode *); static Boolean SuffApplyTransform(GNode *, GNode *, Suff *, Suff *); static void SuffFindDeps(GNode *, Lst); static void SuffFindArchiveDeps(GNode *, Lst); static void SuffFindNormalDeps(GNode *, Lst); static int SuffPrintName(void *, void *); static int SuffPrintSuff(void *, void *); static int SuffPrintTrans(void *, void *); /*************** Lst Predicates ****************/ /*- *----------------------------------------------------------------------- * SuffStrIsPrefix -- * See if pref is a prefix of str. * * Input: * pref possible prefix * str string to check * * Results: * NULL if it ain't, pointer to character in str after prefix if so * * Side Effects: * None *----------------------------------------------------------------------- */ static const char * SuffStrIsPrefix(const char *pref, const char *str) { while (*str && *pref == *str) { pref++; str++; } return (*pref ? NULL : str); } /*- *----------------------------------------------------------------------- * SuffSuffIsSuffix -- * See if suff is a suffix of str. sd->ename should point to THE END * of the string to check. (THE END == the null byte) * * Input: * s possible suffix * sd string to examine * * Results: * NULL if it ain't, pointer to character in str before suffix if * it is. * * Side Effects: * None *----------------------------------------------------------------------- */ static char * SuffSuffIsSuffix(const Suff *s, const SuffixCmpData *sd) { char *p1; /* Pointer into suffix name */ char *p2; /* Pointer into string being examined */ if (sd->len < s->nameLen) return NULL; /* this string is shorter than the suffix */ p1 = s->name + s->nameLen; p2 = sd->ename; while (p1 >= s->name && *p1 == *p2) { p1--; p2--; } return (p1 == s->name - 1 ? 
p2 : NULL); } /*- *----------------------------------------------------------------------- * SuffSuffIsSuffixP -- * Predicate form of SuffSuffIsSuffix. Passed as the callback function * to Lst_Find. * * Results: * 0 if the suffix is the one desired, non-zero if not. * * Side Effects: * None. * *----------------------------------------------------------------------- */ static int SuffSuffIsSuffixP(const void *s, const void *sd) { return(!SuffSuffIsSuffix(s, sd)); } /*- *----------------------------------------------------------------------- * SuffSuffHasNameP -- * Callback procedure for finding a suffix based on its name. Used by * Suff_GetPath. * * Input: * s Suffix to check * sd Desired name * * Results: * 0 if the suffix is of the given name. non-zero otherwise. * * Side Effects: * None *----------------------------------------------------------------------- */ static int SuffSuffHasNameP(const void *s, const void *sname) { return (strcmp(sname, ((const Suff *)s)->name)); } /*- *----------------------------------------------------------------------- * SuffSuffIsPrefix -- * See if the suffix described by s is a prefix of the string. Care * must be taken when using this to search for transformations and * what-not, since there could well be two suffixes, one of which * is a prefix of the other... * * Input: * s suffix to compare * str string to examine * * Results: * 0 if s is a prefix of str. non-zero otherwise * * Side Effects: * None *----------------------------------------------------------------------- */ static int SuffSuffIsPrefix(const void *s, const void *str) { return SuffStrIsPrefix(((const Suff *)s)->name, str) == NULL; } /*- *----------------------------------------------------------------------- * SuffGNHasNameP -- * See if the graph node has the desired name * * Input: * gn current node we're looking at * name name we're looking for * * Results: * 0 if it does. non-zero if it doesn't * * Side Effects: * None *----------------------------------------------------------------------- */ static int SuffGNHasNameP(const void *gn, const void *name) { return (strcmp(name, ((const GNode *)gn)->name)); } /*********** Maintenance Functions ************/ static void SuffUnRef(void *lp, void *sp) { Lst l = (Lst) lp; LstNode ln = Lst_Member(l, sp); if (ln != NULL) { Lst_Remove(l, ln); ((Suff *)sp)->refCount--; } } /*- *----------------------------------------------------------------------- * SuffFree -- * Free up all memory associated with the given suffix structure. 
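 * (The name, the search path, the parent and child lists, and the
 * back-reference list are all released.)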
* * Results: * none * * Side Effects: * the suffix entry is destroyed *----------------------------------------------------------------------- */ static void SuffFree(void *sp) { Suff *s = (Suff *)sp; if (s == suffNull) suffNull = NULL; if (s == emptySuff) emptySuff = NULL; #ifdef notdef /* We don't delete suffixes in order, so we cannot use this */ if (s->refCount) Punt("Internal error deleting suffix `%s' with refcount = %d", s->name, s->refCount); #endif Lst_Destroy(s->ref, NULL); Lst_Destroy(s->children, NULL); Lst_Destroy(s->parents, NULL); Lst_Destroy(s->searchPath, Dir_Destroy); free(s->name); free(s); } /*- *----------------------------------------------------------------------- * SuffRemove -- * Remove the suffix from the list * * Results: * None * * Side Effects: * The reference count for the suffix is decremented and the * suffix is possibly freed *----------------------------------------------------------------------- */ static void SuffRemove(Lst l, Suff *s) { SuffUnRef(l, s); if (s->refCount == 0) { SuffUnRef(sufflist, s); SuffFree(s); } } /*- *----------------------------------------------------------------------- * SuffInsert -- * Insert the suffix into the list keeping the list ordered by suffix * numbers. * * Input: * l the list in which s should be inserted * s the suffix to insert * * Results: * None * * Side Effects: * The reference count of the suffix is incremented *----------------------------------------------------------------------- */ static void SuffInsert(Lst l, Suff *s) { LstNode ln; /* current element in l we're examining */ Suff *s2 = NULL; /* the suffix descriptor in this element */ if (Lst_Open(l) == FAILURE) { return; } while ((ln = Lst_Next(l)) != NULL) { s2 = (Suff *)Lst_Datum(ln); if (s2->sNum >= s->sNum) { break; } } Lst_Close(l); if (DEBUG(SUFF)) { fprintf(debug_file, "inserting %s(%d)...", s->name, s->sNum); } if (ln == NULL) { if (DEBUG(SUFF)) { fprintf(debug_file, "at end of list\n"); } (void)Lst_AtEnd(l, s); s->refCount++; (void)Lst_AtEnd(s->ref, l); } else if (s2->sNum != s->sNum) { if (DEBUG(SUFF)) { fprintf(debug_file, "before %s(%d)\n", s2->name, s2->sNum); } (void)Lst_InsertBefore(l, ln, s); s->refCount++; (void)Lst_AtEnd(s->ref, l); } else if (DEBUG(SUFF)) { fprintf(debug_file, "already there\n"); } } /*- *----------------------------------------------------------------------- * Suff_ClearSuffixes -- * This is gross. Nuke the list of suffixes but keep all transformation * rules around. The transformation graph is destroyed in this process, * but we leave the list of rules so when a new graph is formed the rules * will remain. * This function is called from the parse module when a * .SUFFIXES:\n line is encountered.
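 * (Hedged example: a bare ".SUFFIXES:" line wipes the known suffixes;
 * a later ".SUFFIXES: .c .o" re-adds them, and retained rules such as
 * a .c.o transformation are relinked into the new graph.)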
* * Results: * none * * Side Effects: * the sufflist and its graph nodes are destroyed *----------------------------------------------------------------------- */ void Suff_ClearSuffixes(void) { #ifdef CLEANUP Lst_Concat(suffClean, sufflist, LST_CONCLINK); #endif sufflist = Lst_Init(FALSE); sNum = 0; if (suffNull) SuffFree(suffNull); emptySuff = suffNull = bmake_malloc(sizeof(Suff)); suffNull->name = bmake_strdup(""); suffNull->nameLen = 0; suffNull->searchPath = Lst_Init(FALSE); Dir_Concat(suffNull->searchPath, dirSearchPath); suffNull->children = Lst_Init(FALSE); suffNull->parents = Lst_Init(FALSE); suffNull->ref = Lst_Init(FALSE); suffNull->sNum = sNum++; suffNull->flags = SUFF_NULL; suffNull->refCount = 1; } /*- *----------------------------------------------------------------------- * SuffParseTransform -- * Parse a transformation string to find its two component suffixes. * * Input: * str String being parsed * srcPtr Place to store source of trans. * targPtr Place to store target of trans. * * Results: * TRUE if the string is a valid transformation and FALSE otherwise. * * Side Effects: * The passed pointers are overwritten. * *----------------------------------------------------------------------- */ static Boolean SuffParseTransform(char *str, Suff **srcPtr, Suff **targPtr) { LstNode srcLn; /* element in suffix list of trans source*/ Suff *src; /* Source of transformation */ LstNode targLn; /* element in suffix list of trans target*/ char *str2; /* Extra pointer (maybe target suffix) */ LstNode singleLn; /* element in suffix list of any suffix * that exactly matches str */ Suff *single = NULL;/* Source of possible transformation to * null suffix */ srcLn = NULL; singleLn = NULL; /* * Loop looking first for a suffix that matches the start of the * string and then for one that exactly matches the rest of it. If * we can find two that meet these criteria, we've successfully * parsed the string. */ for (;;) { if (srcLn == NULL) { srcLn = Lst_Find(sufflist, str, SuffSuffIsPrefix); } else { srcLn = Lst_FindFrom(sufflist, Lst_Succ(srcLn), str, SuffSuffIsPrefix); } if (srcLn == NULL) { /* * Ran out of source suffixes -- no such rule */ if (singleLn != NULL) { /* * Not so fast Mr. Smith! There was a suffix that encompassed * the entire string, so we assume it was a transformation * to the null suffix (thank you POSIX). We still prefer to * find a double rule over a singleton, hence we leave this * check until the end. * * XXX: Use emptySuff over suffNull? */ *srcPtr = single; *targPtr = suffNull; return(TRUE); } return (FALSE); } src = (Suff *)Lst_Datum(srcLn); str2 = str + src->nameLen; if (*str2 == '\0') { single = src; singleLn = srcLn; } else { targLn = Lst_Find(sufflist, str2, SuffSuffHasNameP); if (targLn != NULL) { *srcPtr = src; *targPtr = (Suff *)Lst_Datum(targLn); return (TRUE); } } } } /*- *----------------------------------------------------------------------- * Suff_IsTransform -- * Return TRUE if the given string is a transformation rule * * * Input: * str string to check * * Results: * TRUE if the string is a concatenation of two known suffixes. 
* FALSE otherwise * * Side Effects: * None *----------------------------------------------------------------------- */ Boolean Suff_IsTransform(char *str) { Suff *src, *targ; return (SuffParseTransform(str, &src, &targ)); } /*- *----------------------------------------------------------------------- * Suff_AddTransform -- * Add the transformation rule described by the line to the * list of rules and place the transformation itself in the graph * * Input: * line name of transformation to add * * Results: * The node created for the transformation in the transforms list * * Side Effects: * The node is placed on the end of the transforms Lst and links are * made between the two suffixes mentioned in the target name *----------------------------------------------------------------------- */ GNode * Suff_AddTransform(char *line) { GNode *gn; /* GNode of transformation rule */ Suff *s, /* source suffix */ *t; /* target suffix */ LstNode ln; /* Node for existing transformation */ ln = Lst_Find(transforms, line, SuffGNHasNameP); if (ln == NULL) { /* * Make a new graph node for the transformation. It will be filled in * by the Parse module. */ gn = Targ_NewGN(line); (void)Lst_AtEnd(transforms, gn); } else { /* * New specification for transformation rule. Just nuke the old list * of commands so they can be filled in again... We don't actually * free the commands themselves, because a given command can be * attached to several different transformations. */ gn = (GNode *)Lst_Datum(ln); Lst_Destroy(gn->commands, NULL); Lst_Destroy(gn->children, NULL); gn->commands = Lst_Init(FALSE); gn->children = Lst_Init(FALSE); } gn->type = OP_TRANSFORM; (void)SuffParseTransform(line, &s, &t); /* * link the two together in the proper relationship and order */ if (DEBUG(SUFF)) { fprintf(debug_file, "defining transformation from `%s' to `%s'\n", s->name, t->name); } SuffInsert(t->children, s); SuffInsert(s->parents, t); return (gn); } /*- *----------------------------------------------------------------------- * Suff_EndTransform -- * Handle the finish of a transformation definition, removing the * transformation from the graph if it has neither commands nor * sources. This is a callback procedure for the Parse module via * Lst_ForEach * * Input: * gnp Node for transformation * dummy Node for transformation * * Results: * === 0 * * Side Effects: * If the node has no commands or children, the children and parents * lists of the affected suffixes are altered. * *----------------------------------------------------------------------- */ int Suff_EndTransform(void *gnp, void *dummy) { GNode *gn = (GNode *)gnp; + (void)dummy; + if ((gn->type & OP_DOUBLEDEP) && !Lst_IsEmpty (gn->cohorts)) gn = (GNode *)Lst_Datum(Lst_Last(gn->cohorts)); if ((gn->type & OP_TRANSFORM) && Lst_IsEmpty(gn->commands) && Lst_IsEmpty(gn->children)) { Suff *s, *t; /* * SuffParseTransform() may fail for special rules which are not * actual transformation rules. (e.g. .DEFAULT) */ if (SuffParseTransform(gn->name, &s, &t)) { Lst p; if (DEBUG(SUFF)) { fprintf(debug_file, "deleting transformation from `%s' to `%s'\n", s->name, t->name); } /* * Store s->parents because s could be deleted in SuffRemove */ p = s->parents; /* * Remove the source from the target's children list. We check for a * nil return to handle a beanhead saying something like * .c.o .c.o: * * We'll be called twice when the next target is seen, but .c and .o * are only linked once... 
*/ SuffRemove(t->children, s); /* * Remove the target from the source's parents list */ SuffRemove(p, t); } } else if ((gn->type & OP_TRANSFORM) && DEBUG(SUFF)) { fprintf(debug_file, "transformation %s complete\n", gn->name); } - return(dummy ? 0 : 0); + return 0; } /*- *----------------------------------------------------------------------- * SuffRebuildGraph -- * Called from Suff_AddSuffix via Lst_ForEach to search through the * list of existing transformation rules and rebuild the transformation * graph when it has been destroyed by Suff_ClearSuffixes. If the * given rule is a transformation involving this suffix and another, * existing suffix, the proper relationship is established between * the two. * * Input: * transformp Transformation to test * sp Suffix to rebuild * * Results: * Always 0. * * Side Effects: * The appropriate links will be made between this suffix and * others if transformation rules exist for it. * *----------------------------------------------------------------------- */ static int SuffRebuildGraph(void *transformp, void *sp) { GNode *transform = (GNode *)transformp; Suff *s = (Suff *)sp; char *cp; LstNode ln; Suff *s2; SuffixCmpData sd; /* * First see if it is a transformation from this suffix. */ cp = UNCONST(SuffStrIsPrefix(s->name, transform->name)); if (cp != NULL) { ln = Lst_Find(sufflist, cp, SuffSuffHasNameP); if (ln != NULL) { /* * Found target. Link in and return, since it can't be anything * else. */ s2 = (Suff *)Lst_Datum(ln); SuffInsert(s2->children, s); SuffInsert(s->parents, s2); return(0); } } /* * Not from, maybe to? */ sd.len = strlen(transform->name); sd.ename = transform->name + sd.len; cp = SuffSuffIsSuffix(s, &sd); if (cp != NULL) { /* * Null-terminate the source suffix in order to find it. */ cp[1] = '\0'; ln = Lst_Find(sufflist, transform->name, SuffSuffHasNameP); /* * Replace the start of the target suffix */ cp[1] = s->name[0]; if (ln != NULL) { /* * Found it -- establish the proper relationship */ s2 = (Suff *)Lst_Datum(ln); SuffInsert(s->children, s2); SuffInsert(s2->parents, s); } } return(0); } /*- *----------------------------------------------------------------------- * SuffScanTargets -- * Called from Suff_AddSuffix via Lst_ForEach to search through the * list of existing targets and find if any of the existing targets * can be turned into a transformation rule. * * Results: * 1 if a new main target has been selected, 0 otherwise. * * Side Effects: * If such a target is found and the target is the current main * target, the main target is set to NULL and the next target * examined (if that exists) becomes the main target. 
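 * (Illustrative: adding ".c" via .SUFFIXES can turn an ordinary target
 * named ".c.o" into a transformation rule; if that target was the main
 * target, the next eligible target becomes the main one instead.)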
* *----------------------------------------------------------------------- */ static int SuffScanTargets(void *targetp, void *gsp) { GNode *target = (GNode *)targetp; GNodeSuff *gs = (GNodeSuff *)gsp; Suff *s, *t; char *ptr; if (*gs->gn == NULL && gs->r && (target->type & OP_NOTARGET) == 0) { *gs->gn = target; Targ_SetMain(target); return 1; } if ((unsigned int)target->type == OP_TRANSFORM) return 0; if ((ptr = strstr(target->name, gs->s->name)) == NULL || ptr == target->name) return 0; if (SuffParseTransform(target->name, &s, &t)) { if (*gs->gn == target) { gs->r = TRUE; *gs->gn = NULL; Targ_SetMain(NULL); } Lst_Destroy(target->children, NULL); target->children = Lst_Init(FALSE); target->type = OP_TRANSFORM; /* * link the two together in the proper relationship and order */ if (DEBUG(SUFF)) { fprintf(debug_file, "defining transformation from `%s' to `%s'\n", s->name, t->name); } SuffInsert(t->children, s); SuffInsert(s->parents, t); } return 0; } /*- *----------------------------------------------------------------------- * Suff_AddSuffix -- * Add the suffix in string to the end of the list of known suffixes. * Should we restructure the suffix graph? Make doesn't... * * Input: * str the name of the suffix to add * * Results: * None * * Side Effects: * A GNode is created for the suffix and a Suff structure is created and * added to the suffixes list unless the suffix was already known. * The mainNode passed can be modified if a target mutated into a * transform and that target happened to be the main target. *----------------------------------------------------------------------- */ void Suff_AddSuffix(char *str, GNode **gn) { Suff *s; /* new suffix descriptor */ LstNode ln; GNodeSuff gs; ln = Lst_Find(sufflist, str, SuffSuffHasNameP); if (ln == NULL) { s = bmake_malloc(sizeof(Suff)); s->name = bmake_strdup(str); s->nameLen = strlen(s->name); s->searchPath = Lst_Init(FALSE); s->children = Lst_Init(FALSE); s->parents = Lst_Init(FALSE); s->ref = Lst_Init(FALSE); s->sNum = sNum++; s->flags = 0; s->refCount = 1; (void)Lst_AtEnd(sufflist, s); /* * We also look at our existing targets list to see if adding * this suffix will make one of our current targets mutate into * a suffix rule. This is ugly, but other makes treat all targets * that start with a . as suffix rules. */ gs.gn = gn; gs.s = s; gs.r = FALSE; Lst_ForEach(Targ_List(), SuffScanTargets, &gs); /* * Look for any existing transformations from or to this suffix. * XXX: Only do this after a Suff_ClearSuffixes? */ Lst_ForEach(transforms, SuffRebuildGraph, s); } } /*- *----------------------------------------------------------------------- * Suff_GetPath -- * Return the search path for the given suffix, if it's defined. * * Results: * The searchPath for the desired suffix or NULL if the suffix isn't * defined. * * Side Effects: * None *----------------------------------------------------------------------- */ Lst Suff_GetPath(char *sname) { LstNode ln; Suff *s; ln = Lst_Find(sufflist, sname, SuffSuffHasNameP); if (ln == NULL) { return NULL; } else { s = (Suff *)Lst_Datum(ln); return (s->searchPath); } } /*- *----------------------------------------------------------------------- * Suff_DoPaths -- * Extend the search paths for all suffixes to include the default * search path. * * Results: * None. * * Side Effects: * The searchPath field of all the suffixes is extended by the * directories in dirSearchPath. 
If paths were specified for the * ".h" suffix, the directories are stuffed into a global variable * called ".INCLUDES" with each directory preceded by a -I. The same * is done for the ".a" suffix, except the variable is called * ".LIBS" and the flag is -L. *----------------------------------------------------------------------- */ void Suff_DoPaths(void) { Suff *s; LstNode ln; char *ptr; Lst inIncludes; /* Cumulative .INCLUDES path */ Lst inLibs; /* Cumulative .LIBS path */ if (Lst_Open(sufflist) == FAILURE) { return; } inIncludes = Lst_Init(FALSE); inLibs = Lst_Init(FALSE); while ((ln = Lst_Next(sufflist)) != NULL) { s = (Suff *)Lst_Datum(ln); if (!Lst_IsEmpty (s->searchPath)) { #ifdef INCLUDES if (s->flags & SUFF_INCLUDE) { Dir_Concat(inIncludes, s->searchPath); } #endif /* INCLUDES */ #ifdef LIBRARIES if (s->flags & SUFF_LIBRARY) { Dir_Concat(inLibs, s->searchPath); } #endif /* LIBRARIES */ Dir_Concat(s->searchPath, dirSearchPath); } else { Lst_Destroy(s->searchPath, Dir_Destroy); s->searchPath = Lst_Duplicate(dirSearchPath, Dir_CopyDir); } } Var_Set(".INCLUDES", ptr = Dir_MakeFlags("-I", inIncludes), VAR_GLOBAL, 0); free(ptr); Var_Set(".LIBS", ptr = Dir_MakeFlags("-L", inLibs), VAR_GLOBAL, 0); free(ptr); Lst_Destroy(inIncludes, Dir_Destroy); Lst_Destroy(inLibs, Dir_Destroy); Lst_Close(sufflist); } /*- *----------------------------------------------------------------------- * Suff_AddInclude -- * Add the given suffix as a type of file which gets included. * Called from the parse module when a .INCLUDES line is parsed. * The suffix must have already been defined. * * Input: * sname Name of the suffix to mark * * Results: * None. * * Side Effects: * The SUFF_INCLUDE bit is set in the suffix's flags field * *----------------------------------------------------------------------- */ void Suff_AddInclude(char *sname) { LstNode ln; Suff *s; ln = Lst_Find(sufflist, sname, SuffSuffHasNameP); if (ln != NULL) { s = (Suff *)Lst_Datum(ln); s->flags |= SUFF_INCLUDE; } } /*- *----------------------------------------------------------------------- * Suff_AddLib -- * Add the given suffix as a type of file which is a library. * Called from the parse module when parsing a .LIBS line. The * suffix must have been defined via .SUFFIXES before this is * called. * * Input: * sname Name of the suffix to mark * * Results: * None. * * Side Effects: * The SUFF_LIBRARY bit is set in the suffix's flags field * *----------------------------------------------------------------------- */ void Suff_AddLib(char *sname) { LstNode ln; Suff *s; ln = Lst_Find(sufflist, sname, SuffSuffHasNameP); if (ln != NULL) { s = (Suff *)Lst_Datum(ln); s->flags |= SUFF_LIBRARY; } } /********** Implicit Source Search Functions *********/ /*- *----------------------------------------------------------------------- * SuffAddSrc -- * Add a suffix as a Src structure to the given list with its parent * being the given Src structure. If the suffix is the null suffix, * the prefix is used unaltered as the file name in the Src structure. 
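 * (Illustrative: for prefix "foo" and suffix ".c" the candidate file is
 * "foo.c"; for the null suffix the candidate is simply "foo".)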
* * Input: * sp suffix for which to create a Src structure * lsp list and parent for the new Src * * Results: * always returns 0 * * Side Effects: * A Src structure is created and tacked onto the end of the list *----------------------------------------------------------------------- */ static int SuffAddSrc(void *sp, void *lsp) { Suff *s = (Suff *)sp; LstSrc *ls = (LstSrc *)lsp; Src *s2; /* new Src structure */ Src *targ; /* Target structure */ targ = ls->s; if ((s->flags & SUFF_NULL) && (*s->name != '\0')) { /* * If the suffix has been marked as the NULL suffix, also create a Src * structure for a file with no suffix attached. Two birds, and all * that... */ s2 = bmake_malloc(sizeof(Src)); s2->file = bmake_strdup(targ->pref); s2->pref = targ->pref; s2->parent = targ; s2->node = NULL; s2->suff = s; s->refCount++; s2->children = 0; targ->children += 1; (void)Lst_AtEnd(ls->l, s2); #ifdef DEBUG_SRC s2->cp = Lst_Init(FALSE); Lst_AtEnd(targ->cp, s2); - fprintf(debug_file, "1 add %x %x to %x:", targ, s2, ls->l); + fprintf(debug_file, "1 add %p %p to %p:", targ, s2, ls->l); Lst_ForEach(ls->l, PrintAddr, NULL); fprintf(debug_file, "\n"); #endif } s2 = bmake_malloc(sizeof(Src)); s2->file = str_concat(targ->pref, s->name, 0); s2->pref = targ->pref; s2->parent = targ; s2->node = NULL; s2->suff = s; s->refCount++; s2->children = 0; targ->children += 1; (void)Lst_AtEnd(ls->l, s2); #ifdef DEBUG_SRC s2->cp = Lst_Init(FALSE); Lst_AtEnd(targ->cp, s2); - fprintf(debug_file, "2 add %x %x to %x:", targ, s2, ls->l); + fprintf(debug_file, "2 add %p %p to %p:", targ, s2, ls->l); Lst_ForEach(ls->l, PrintAddr, NULL); fprintf(debug_file, "\n"); #endif return(0); } /*- *----------------------------------------------------------------------- * SuffAddLevel -- * Add all the children of targ as Src structures to the given list * * Input: * l list to which to add the new level * targ Src structure to use as the parent * * Results: * None * * Side Effects: * Lots of structures are created and added to the list *----------------------------------------------------------------------- */ static void SuffAddLevel(Lst l, Src *targ) { LstSrc ls; ls.s = targ; ls.l = l; Lst_ForEach(targ->suff->children, SuffAddSrc, &ls); } /*- *---------------------------------------------------------------------- * SuffRemoveSrc -- * Free all src structures in list that don't have a reference count * * Results: * True if a src was removed * * Side Effects: * The memory is free'd.
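 * (Note: each call removes at most one childless Src and returns
 * immediately, so callers invoke it in a loop until it returns 0.)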
*---------------------------------------------------------------------- */ static int SuffRemoveSrc(Lst l) { LstNode ln; Src *s; int t = 0; if (Lst_Open(l) == FAILURE) { return 0; } #ifdef DEBUG_SRC fprintf(debug_file, "cleaning %lx: ", (unsigned long) l); Lst_ForEach(l, PrintAddr, NULL); fprintf(debug_file, "\n"); #endif while ((ln = Lst_Next(l)) != NULL) { s = (Src *)Lst_Datum(ln); if (s->children == 0) { free(s->file); if (!s->parent) free(s->pref); else { #ifdef DEBUG_SRC - LstNode ln = Lst_Member(s->parent->cp, s); - if (ln != NULL) - Lst_Remove(s->parent->cp, ln); + LstNode ln2 = Lst_Member(s->parent->cp, s); + if (ln2 != NULL) + Lst_Remove(s->parent->cp, ln2); #endif --s->parent->children; } #ifdef DEBUG_SRC - fprintf(debug_file, "free: [l=%x] p=%x %d\n", l, s, s->children); + fprintf(debug_file, "free: [l=%p] p=%p %d\n", l, s, s->children); Lst_Destroy(s->cp, NULL); #endif Lst_Remove(l, ln); free(s); t |= 1; Lst_Close(l); return TRUE; } #ifdef DEBUG_SRC else { - fprintf(debug_file, "keep: [l=%x] p=%x %d: ", l, s, s->children); + fprintf(debug_file, "keep: [l=%p] p=%p %d: ", l, s, s->children); Lst_ForEach(s->cp, PrintAddr, NULL); fprintf(debug_file, "\n"); } #endif } Lst_Close(l); return t; } /*- *----------------------------------------------------------------------- * SuffFindThem -- * Find the first existing file/target in the list srcs * * Input: * srcs list of Src structures to search through * * Results: * The lowest structure in the chain of transformations * * Side Effects: * None *----------------------------------------------------------------------- */ static Src * SuffFindThem(Lst srcs, Lst slst) { Src *s; /* current Src */ Src *rs; /* returned Src */ char *ptr; rs = NULL; while (!Lst_IsEmpty (srcs)) { s = (Src *)Lst_DeQueue(srcs); if (DEBUG(SUFF)) { fprintf(debug_file, "\ttrying %s...", s->file); } /* * A file is considered to exist if either a node exists in the * graph for it or the file actually exists. */ if (Targ_FindNode(s->file, TARG_NOCREATE) != NULL) { #ifdef DEBUG_SRC - fprintf(debug_file, "remove %x from %x\n", s, srcs); + fprintf(debug_file, "remove %p from %p\n", s, srcs); #endif rs = s; break; } if ((ptr = Dir_FindFile(s->file, s->suff->searchPath)) != NULL) { rs = s; #ifdef DEBUG_SRC - fprintf(debug_file, "remove %x from %x\n", s, srcs); + fprintf(debug_file, "remove %p from %p\n", s, srcs); #endif free(ptr); break; } if (DEBUG(SUFF)) { fprintf(debug_file, "not there\n"); } SuffAddLevel(srcs, s); Lst_AtEnd(slst, s); } if (DEBUG(SUFF) && rs) { fprintf(debug_file, "got it\n"); } return (rs); } /*- *----------------------------------------------------------------------- * SuffFindCmds -- * See if any of the children of the target in the Src structure is * one from which the target can be transformed. If there is one, * a Src structure is put together for it and returned. * * Input: * targ Src structure to play with * * Results: * The Src structure of the "winning" child, or NULL if no such beast. * * Side Effects: * A Src structure may be allocated. 
* *----------------------------------------------------------------------- */ static Src * SuffFindCmds(Src *targ, Lst slst) { LstNode ln; /* General-purpose list node */ GNode *t, /* Target GNode */ *s; /* Source GNode */ int prefLen;/* The length of the defined prefix */ Suff *suff; /* Suffix on matching beastie */ Src *ret; /* Return value */ char *cp; t = targ->node; (void)Lst_Open(t->children); prefLen = strlen(targ->pref); for (;;) { ln = Lst_Next(t->children); if (ln == NULL) { Lst_Close(t->children); return NULL; } s = (GNode *)Lst_Datum(ln); if (s->type & OP_OPTIONAL && Lst_IsEmpty(t->commands)) { /* * We haven't looked to see if .OPTIONAL files exist yet, so * don't use one as the implicit source. * This allows us to use .OPTIONAL in .depend files so make won't * complain "don't know how to make xxx.h" when a dependent file * has been moved/deleted. */ continue; } cp = strrchr(s->name, '/'); if (cp == NULL) { cp = s->name; } else { cp++; } if (strncmp(cp, targ->pref, prefLen) != 0) continue; /* * The node matches the prefix ok, see if it has a known * suffix. */ ln = Lst_Find(sufflist, &cp[prefLen], SuffSuffHasNameP); if (ln == NULL) continue; /* * It even has a known suffix, see if there's a transformation * defined between the node's suffix and the target's suffix. * * XXX: Handle multi-stage transformations here, too. */ suff = (Suff *)Lst_Datum(ln); if (Lst_Member(suff->parents, targ->suff) != NULL) break; } /* * Hot Damn! Create a new Src structure to describe * this transformation (making sure to duplicate the * source node's name so Suff_FindDeps can free it * again (ick)), and return the new structure. */ ret = bmake_malloc(sizeof(Src)); ret->file = bmake_strdup(s->name); ret->pref = targ->pref; ret->suff = suff; suff->refCount++; ret->parent = targ; ret->node = s; ret->children = 0; targ->children += 1; #ifdef DEBUG_SRC ret->cp = Lst_Init(FALSE); - fprintf(debug_file, "3 add %x %x\n", targ, ret); + fprintf(debug_file, "3 add %p %p\n", targ, ret); Lst_AtEnd(targ->cp, ret); #endif Lst_AtEnd(slst, ret); if (DEBUG(SUFF)) { fprintf(debug_file, "\tusing existing source %s\n", s->name); } return (ret); } /*- *----------------------------------------------------------------------- * SuffExpandChildren -- * Expand the names of any children of a given node that contain * variable invocations or file wildcards into actual targets. * * Input: * cln Child to examine * pgn Parent node being processed * * Results: * None. * * Side Effects: * The expanded node is removed from the parent's list of children, * and the parent's unmade counter is decremented, but other nodes * may be added. * *----------------------------------------------------------------------- */ static void SuffExpandChildren(LstNode cln, GNode *pgn) { GNode *cgn = (GNode *)Lst_Datum(cln); GNode *gn; /* New source 8) */ char *cp; /* Expanded value */ if (!Lst_IsEmpty(cgn->order_pred) || !Lst_IsEmpty(cgn->order_succ)) /* It is all too hard to process the result of .ORDER */ return; if (cgn->type & OP_WAIT) /* Ignore these (& OP_PHONY ?) */ return; /* * First do variable expansion -- this takes precedence over * wildcard expansion. If the result contains wildcards, they'll be gotten * to later since the resulting words are tacked on to the end of * the children list.
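* A hypothetical illustration (invented names, not part of this change): given * SRCS= main.c lex*.c * prog: ${SRCS} * the child "${SRCS}" is first expanded into the words "main.c" and "lex*.c"; each word becomes a new child of "prog", and "lex*.c" is then caught by the wildcard pass (SuffExpandWildcards) in its turn.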
*/ if (strchr(cgn->name, '$') == NULL) { SuffExpandWildcards(cln, pgn); return; } if (DEBUG(SUFF)) { fprintf(debug_file, "Expanding \"%s\"...", cgn->name); } cp = Var_Subst(NULL, cgn->name, pgn, VARF_UNDEFERR|VARF_WANTRES); if (cp != NULL) { Lst members = Lst_Init(FALSE); if (cgn->type & OP_ARCHV) { /* * Node was an archive(member) target, so we want to call * on the Arch module to find the nodes for us, expanding * variables in the parent's context. */ char *sacrifice = cp; (void)Arch_ParseArchive(&sacrifice, members, pgn); } else { /* * Break the result into a vector of strings whose nodes * we can find, then add those nodes to the members list. * Unfortunately, we can't use brk_string b/c it * doesn't understand about variable specifications with * spaces in them... */ char *start; char *initcp = cp; /* For freeing... */ for (start = cp; *start == ' ' || *start == '\t'; start++) continue; for (cp = start; *cp != '\0'; cp++) { if (*cp == ' ' || *cp == '\t') { /* * White-space -- terminate element, find the node, * add it, skip any further spaces. */ *cp++ = '\0'; gn = Targ_FindNode(start, TARG_CREATE); (void)Lst_AtEnd(members, gn); while (*cp == ' ' || *cp == '\t') { cp++; } /* * Adjust cp for increment at start of loop, but * set start to first non-space. */ start = cp--; } else if (*cp == '$') { /* * Start of a variable spec -- contact variable module * to find the end so we can skip over it. */ char *junk; int len; void *freeIt; junk = Var_Parse(cp, pgn, VARF_UNDEFERR|VARF_WANTRES, &len, &freeIt); if (junk != var_Error) { cp += len - 1; } free(freeIt); - } else if (*cp == '\\' && *cp != '\0') { + } else if (*cp == '\\' && cp[1] != '\0') { /* * Escaped something -- skip over it */ cp++; } } if (cp != start) { /* * Stuff left over -- add it to the list too */ gn = Targ_FindNode(start, TARG_CREATE); (void)Lst_AtEnd(members, gn); } /* * Point cp back at the beginning again so the variable value * can be freed. */ cp = initcp; } /* * Add all elements of the members list to the parent node. */ while(!Lst_IsEmpty(members)) { gn = (GNode *)Lst_DeQueue(members); if (DEBUG(SUFF)) { fprintf(debug_file, "%s...", gn->name); } /* Add gn to the parents child list before the original child */ (void)Lst_InsertBefore(pgn->children, cln, gn); (void)Lst_AtEnd(gn->parents, pgn); pgn->unmade++; /* Expand wildcards on new node */ SuffExpandWildcards(Lst_Prev(cln), pgn); } Lst_Destroy(members, NULL); /* * Free the result */ free(cp); } if (DEBUG(SUFF)) { fprintf(debug_file, "\n"); } /* * Now the source is expanded, remove it from the list of children to * keep it from being processed. 
*/ pgn->unmade--; Lst_Remove(pgn->children, cln); Lst_Remove(cgn->parents, Lst_Member(cgn->parents, pgn)); } static void SuffExpandWildcards(LstNode cln, GNode *pgn) { GNode *cgn = (GNode *)Lst_Datum(cln); GNode *gn; /* New source 8) */ char *cp; /* Expanded value */ Lst explist; /* List of expansions */ if (!Dir_HasWildcards(cgn->name)) return; /* * Expand the word along the chosen path */ explist = Lst_Init(FALSE); Dir_Expand(cgn->name, Suff_FindPath(cgn), explist); while (!Lst_IsEmpty(explist)) { /* * Fetch next expansion off the list and find its GNode */ cp = (char *)Lst_DeQueue(explist); if (DEBUG(SUFF)) { fprintf(debug_file, "%s...", cp); } gn = Targ_FindNode(cp, TARG_CREATE); /* Add gn to the parents child list before the original child */ (void)Lst_InsertBefore(pgn->children, cln, gn); (void)Lst_AtEnd(gn->parents, pgn); pgn->unmade++; } /* * Nuke what's left of the list */ Lst_Destroy(explist, NULL); if (DEBUG(SUFF)) { fprintf(debug_file, "\n"); } /* * Now the source is expanded, remove it from the list of children to * keep it from being processed. */ pgn->unmade--; Lst_Remove(pgn->children, cln); Lst_Remove(cgn->parents, Lst_Member(cgn->parents, pgn)); } /*- *----------------------------------------------------------------------- * Suff_FindPath -- * Find a path along which to expand the node. * * If the word has a known suffix, use that path. * If it has no known suffix, use the default system search path. * * Input: * gn Node being examined * * Results: * The appropriate path to search for the GNode. * * Side Effects: * XXX: We could set the suffix here so that we don't have to scan * again. * *----------------------------------------------------------------------- */ Lst Suff_FindPath(GNode* gn) { Suff *suff = gn->suffix; if (suff == NULL) { SuffixCmpData sd; /* Search string data */ LstNode ln; sd.len = strlen(gn->name); sd.ename = gn->name + sd.len; ln = Lst_Find(sufflist, &sd, SuffSuffIsSuffixP); if (DEBUG(SUFF)) { fprintf(debug_file, "Wildcard expanding \"%s\"...", gn->name); } if (ln != NULL) suff = (Suff *)Lst_Datum(ln); /* XXX: Here we can save the suffix so we don't have to do this again */ } if (suff != NULL) { if (DEBUG(SUFF)) { fprintf(debug_file, "suffix is \"%s\"...", suff->name); } return suff->searchPath; } else { /* * Use default search path */ return dirSearchPath; } } /*- *----------------------------------------------------------------------- * SuffApplyTransform -- * Apply a transformation rule, given the source and target nodes * and suffixes. * * Input: * tGn Target node * sGn Source node * t Target suffix * s Source suffix * * Results: * TRUE if successful, FALSE if not. * * Side Effects: * The source and target are linked and the commands from the * transformation are added to the target node's commands list. * All attributes but OP_DEPMASK and OP_TRANSFORM are applied * to the target. The target also inherits all the sources for * the transformation rule. * *----------------------------------------------------------------------- */ static Boolean SuffApplyTransform(GNode *tGn, GNode *sGn, Suff *t, Suff *s) { LstNode ln, nln; /* General node */ char *tname; /* Name of transformation rule */ GNode *gn; /* Node for same */ /* * Form the proper links between the target and source. 
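* (Illustrative example, invented names: applying a ".c.o" rule to build "foo.o" (tGn) from "foo.c" (sGn) makes "foo.c" a child of "foo.o" and "foo.o" a parent of "foo.c"; the rule node looked up below is named by concatenating the source and target suffixes, i.e. ".c.o".)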
*/ (void)Lst_AtEnd(tGn->children, sGn); (void)Lst_AtEnd(sGn->parents, tGn); tGn->unmade += 1; /* * Locate the transformation rule itself */ tname = str_concat(s->name, t->name, 0); ln = Lst_Find(transforms, tname, SuffGNHasNameP); free(tname); if (ln == NULL) { /* * Not really such a transformation rule (can happen when we're * called to link an OP_MEMBER and OP_ARCHV node), so return * FALSE. */ return(FALSE); } gn = (GNode *)Lst_Datum(ln); if (DEBUG(SUFF)) { fprintf(debug_file, "\tapplying %s -> %s to \"%s\"\n", s->name, t->name, tGn->name); } /* * Record last child for expansion purposes */ ln = Lst_Last(tGn->children); /* * Pass the buck to Make_HandleUse to apply the rule */ (void)Make_HandleUse(gn, tGn); /* * Deal with wildcards and variables in any acquired sources */ for (ln = Lst_Succ(ln); ln != NULL; ln = nln) { nln = Lst_Succ(ln); SuffExpandChildren(ln, tGn); } /* * Keep track of another parent to which this beast is transformed so * the .IMPSRC variable can be set correctly for the parent. */ (void)Lst_AtEnd(sGn->iParents, tGn); return(TRUE); } /*- *----------------------------------------------------------------------- * SuffFindArchiveDeps -- * Locate dependencies for an OP_ARCHV node. * * Input: * gn Node for which to locate dependencies * * Results: * None * * Side Effects: * Same as Suff_FindDeps * *----------------------------------------------------------------------- */ static void SuffFindArchiveDeps(GNode *gn, Lst slst) { char *eoarch; /* End of archive portion */ char *eoname; /* End of member portion */ GNode *mem; /* Node for member */ static const char *copy[] = { /* Variables to be copied from the member node */ TARGET, /* Must be first */ PREFIX, /* Must be second */ }; LstNode ln, nln; /* Next suffix node to check */ int i; /* Index into copy and vals */ Suff *ms; /* Suffix descriptor for member */ char *name; /* Start of member's name */ /* * The node is an archive(member) pair, so we must find a * suffix for both of them. */ eoarch = strchr(gn->name, '('); eoname = strchr(eoarch, ')'); *eoname = '\0'; /* Nuke parentheses during suffix search */ *eoarch = '\0'; /* So a suffix can be found */ name = eoarch + 1; /* * To simplify things, call Suff_FindDeps recursively on the member now, * so we can simply compare the member's .PREFIX and .TARGET variables * to locate its suffix. This allows us to figure out the suffix to * use for the archive without having to do a quadratic search over the * suffix list, backtracking for each one... */ mem = Targ_FindNode(name, TARG_CREATE); SuffFindDeps(mem, slst); /* * Create the link between the two nodes right off */ (void)Lst_AtEnd(gn->children, mem); (void)Lst_AtEnd(mem->parents, gn); gn->unmade += 1; /* * Copy in the variables from the member node to this one. */ for (i = (sizeof(copy)/sizeof(copy[0]))-1; i >= 0; i--) { char *p1; Var_Set(copy[i], Var_Value(copy[i], mem, &p1), gn, 0); free(p1); } ms = mem->suffix; if (ms == NULL) { /* * Didn't know what it was -- use .NULL suffix if not in make mode */ if (DEBUG(SUFF)) { fprintf(debug_file, "using null suffix\n"); } ms = suffNull; } /* * Set the other two local variables required for this target. */ Var_Set(MEMBER, name, gn, 0); Var_Set(ARCHIVE, gn->name, gn, 0); /* * Set $@ for compatibility with other makes */ Var_Set(TARGET, gn->name, gn, 0); /* * Now we've got the important local variables set, expand any sources * that still contain variables or wildcards in their names.
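* (An invented example: for the target "libfoo.a(bar.o)", .MEMBER is now "bar.o", .ARCHIVE is "libfoo.a", and .TARGET is likewise "libfoo.a" while the parentheses are still nulled out -- matching what other makes provide for archive members -- so a source such as "${.MEMBER:.o=.c}" can be expanded sensibly in the loop below.)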
*/ for (ln = Lst_First(gn->children); ln != NULL; ln = nln) { nln = Lst_Succ(ln); SuffExpandChildren(ln, gn); } if (ms != NULL) { /* * Member has a known suffix, so look for a transformation rule from * it to a possible suffix of the archive. Rather than searching * through the entire list, we just look at suffixes to which the * member's suffix may be transformed... */ SuffixCmpData sd; /* Search string data */ /* * Use first matching suffix... */ sd.len = eoarch - gn->name; sd.ename = eoarch; ln = Lst_Find(ms->parents, &sd, SuffSuffIsSuffixP); if (ln != NULL) { /* * Got one -- apply it */ if (!SuffApplyTransform(gn, mem, (Suff *)Lst_Datum(ln), ms) && DEBUG(SUFF)) { fprintf(debug_file, "\tNo transformation from %s -> %s\n", ms->name, ((Suff *)Lst_Datum(ln))->name); } } } /* * Replace the opening and closing parens now we've no need of the separate * pieces. */ *eoarch = '('; *eoname = ')'; /* * Pretend gn appeared to the left of a dependency operator so * the user needn't provide a transformation from the member to the * archive. */ if (OP_NOP(gn->type)) { gn->type |= OP_DEPENDS; } /* * Flag the member as such so we remember to look in the archive for * its modification time. The OP_JOIN | OP_MADE is needed because this * target should never get made. */ mem->type |= OP_MEMBER | OP_JOIN | OP_MADE; } /*- *----------------------------------------------------------------------- * SuffFindNormalDeps -- * Locate implicit dependencies for regular targets. * * Input: * gn Node for which to find sources * * Results: * None. * * Side Effects: * Same as Suff_FindDeps... * *----------------------------------------------------------------------- */ static void SuffFindNormalDeps(GNode *gn, Lst slst) { char *eoname; /* End of name */ char *sopref; /* Start of prefix */ LstNode ln, nln; /* Next suffix node to check */ Lst srcs; /* List of sources at which to look */ Lst targs; /* List of targets to which things can be * transformed. They all have the same file, * but different suff and pref fields */ Src *bottom; /* Start of found transformation path */ Src *src; /* General Src pointer */ char *pref; /* Prefix to use */ Src *targ; /* General Src target pointer */ SuffixCmpData sd; /* Search string data */ sd.len = strlen(gn->name); sd.ename = eoname = gn->name + sd.len; sopref = gn->name; /* * Begin at the beginning... */ ln = Lst_First(sufflist); srcs = Lst_Init(FALSE); targs = Lst_Init(FALSE); /* * We're caught in a catch-22 here. On the one hand, we want to use any * transformation implied by the target's sources, but we can't examine * the sources until we've expanded any variables/wildcards they may hold, * and we can't do that until we've set up the target's local variables * and we can't do that until we know what the proper suffix for the * target is (in case there are two suffixes one of which is a suffix of * the other) and we can't know that until we've found its implied * source, which we may not want to use if there's an existing source * that implies a different transformation. * * In an attempt to get around this, which may not work all the time, * but should work most of the time, we look for implied sources first, * checking transformations to all possible suffixes of the target, * use what we find to set the target's local variables, expand the * children, then look for any overriding transformations they imply. * Should we find one, we discard the one we found before. 
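* (A hypothetical walk-through: for a target "foo.o" with ".o" and ".c" known and a ".c.o" rule defined, the search below first proposes "foo.c" as the implied source; if expanding the children later reveals, say, a "foo.y" child with a ".y.o" rule, that transformation overrides the first and the "foo.c" path is discarded.)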
*/ bottom = NULL; targ = NULL; if (!(gn->type & OP_PHONY)) { while (ln != NULL) { /* * Look for next possible suffix... */ ln = Lst_FindFrom(sufflist, ln, &sd, SuffSuffIsSuffixP); if (ln != NULL) { int prefLen; /* Length of the prefix */ /* * Allocate a Src structure to which things can be transformed */ targ = bmake_malloc(sizeof(Src)); targ->file = bmake_strdup(gn->name); targ->suff = (Suff *)Lst_Datum(ln); targ->suff->refCount++; targ->node = gn; targ->parent = NULL; targ->children = 0; #ifdef DEBUG_SRC targ->cp = Lst_Init(FALSE); #endif /* * Allocate room for the prefix, whose end is found by * subtracting the length of the suffix from * the end of the name. */ prefLen = (eoname - targ->suff->nameLen) - sopref; targ->pref = bmake_malloc(prefLen + 1); memcpy(targ->pref, sopref, prefLen); targ->pref[prefLen] = '\0'; /* * Add nodes from which the target can be made */ SuffAddLevel(srcs, targ); /* * Record the target so we can nuke it */ (void)Lst_AtEnd(targs, targ); /* * Search from this suffix's successor... */ ln = Lst_Succ(ln); } } /* * Handle target of unknown suffix... */ if (Lst_IsEmpty(targs) && suffNull != NULL) { if (DEBUG(SUFF)) { fprintf(debug_file, "\tNo known suffix on %s. Using .NULL suffix\n", gn->name); } targ = bmake_malloc(sizeof(Src)); targ->file = bmake_strdup(gn->name); targ->suff = suffNull; targ->suff->refCount++; targ->node = gn; targ->parent = NULL; targ->children = 0; targ->pref = bmake_strdup(sopref); #ifdef DEBUG_SRC targ->cp = Lst_Init(FALSE); #endif /* * Only use the default suffix rules if we don't have commands * defined for this gnode; traditional make programs used to * not define suffix rules if the gnode had children but we * don't do this anymore. */ if (Lst_IsEmpty(gn->commands)) SuffAddLevel(srcs, targ); else { if (DEBUG(SUFF)) fprintf(debug_file, "not "); } if (DEBUG(SUFF)) fprintf(debug_file, "adding suffix rules\n"); (void)Lst_AtEnd(targs, targ); } /* * Using the list of possible sources built up from the target * suffix(es), try and find an existing file/target that matches. */ bottom = SuffFindThem(srcs, slst); if (bottom == NULL) { /* * No known transformations -- use the first suffix found * for setting the local variables. */ if (!Lst_IsEmpty(targs)) { targ = (Src *)Lst_Datum(Lst_First(targs)); } else { targ = NULL; } } else { /* * Work up the transformation path to find the suffix of the * target to which the transformation was made. */ for (targ = bottom; targ->parent != NULL; targ = targ->parent) continue; } } Var_Set(TARGET, gn->path ? gn->path : gn->name, gn, 0); pref = (targ != NULL) ? targ->pref : gn->name; Var_Set(PREFIX, pref, gn, 0); /* * Now we've got the important local variables set, expand any sources * that still contain variables or wildcards in their names. */ for (ln = Lst_First(gn->children); ln != NULL; ln = nln) { nln = Lst_Succ(ln); SuffExpandChildren(ln, gn); } if (targ == NULL) { if (DEBUG(SUFF)) { fprintf(debug_file, "\tNo valid suffix on %s\n", gn->name); } sfnd_abort: /* * Deal with finding the thing on the default search path. We * always do that, not only if the node is only a source (not * on the lhs of a dependency operator or [XXX] it has neither * children or commands) as the old pmake did. */ if ((gn->type & (OP_PHONY|OP_NOPATH)) == 0) { free(gn->path); gn->path = Dir_FindFile(gn->name, (targ == NULL ? 
dirSearchPath : targ->suff->searchPath)); if (gn->path != NULL) { char *ptr; Var_Set(TARGET, gn->path, gn, 0); if (targ != NULL) { /* * Suffix known for the thing -- trim the suffix off * the path to form the proper .PREFIX variable. */ int savep = strlen(gn->path) - targ->suff->nameLen; char savec; if (gn->suffix) gn->suffix->refCount--; gn->suffix = targ->suff; gn->suffix->refCount++; savec = gn->path[savep]; gn->path[savep] = '\0'; if ((ptr = strrchr(gn->path, '/')) != NULL) ptr++; else ptr = gn->path; Var_Set(PREFIX, ptr, gn, 0); gn->path[savep] = savec; } else { /* * The .PREFIX gets the full path if the target has * no known suffix. */ if (gn->suffix) gn->suffix->refCount--; gn->suffix = NULL; if ((ptr = strrchr(gn->path, '/')) != NULL) ptr++; else ptr = gn->path; Var_Set(PREFIX, ptr, gn, 0); } } } goto sfnd_return; } /* * If the suffix indicates that the target is a library, mark that in * the node's type field. */ if (targ->suff->flags & SUFF_LIBRARY) { gn->type |= OP_LIB; } /* * Check for overriding transformation rule implied by sources */ if (!Lst_IsEmpty(gn->children)) { src = SuffFindCmds(targ, slst); if (src != NULL) { /* * Free up all the Src structures in the transformation path * up to, but not including, the parent node. */ while (bottom && bottom->parent != NULL) { if (Lst_Member(slst, bottom) == NULL) { Lst_AtEnd(slst, bottom); } bottom = bottom->parent; } bottom = src; } } if (bottom == NULL) { /* * No idea from where it can come -- return now. */ goto sfnd_abort; } /* * We now have a list of Src structures headed by 'bottom' and linked via * their 'parent' pointers. What we do next is create links between * source and target nodes (which may or may not have been created) * and set the necessary local variables in each target. The * commands for each target are set from the commands of the * transformation rule used to get from the src suffix to the targ * suffix. Note that this causes the commands list of the original * node, gn, to be replaced by the commands of the final * transformation rule. Also, the unmade field of gn is incremented. * Etc. */ if (bottom->node == NULL) { bottom->node = Targ_FindNode(bottom->file, TARG_CREATE); } for (src = bottom; src->parent != NULL; src = src->parent) { targ = src->parent; if (src->node->suffix) src->node->suffix->refCount--; src->node->suffix = src->suff; src->node->suffix->refCount++; if (targ->node == NULL) { targ->node = Targ_FindNode(targ->file, TARG_CREATE); } SuffApplyTransform(targ->node, src->node, targ->suff, src->suff); if (targ->node != gn) { /* * Finish off the dependency-search process for any nodes * between bottom and gn (no point in questing around the * filesystem for their implicit source when it's already * known). Note that the node can't have any sources that * need expanding, since SuffFindThem will stop on an existing * node, so all we need to do is set the standard and System V * variables. */ targ->node->type |= OP_DEPS_FOUND; Var_Set(PREFIX, targ->pref, targ->node, 0); Var_Set(TARGET, targ->node->name, targ->node, 0); } } if (gn->suffix) gn->suffix->refCount--; gn->suffix = src->suff; gn->suffix->refCount++; /* * Nuke the transformation path and the Src structures left over in the * two lists. 
*/ sfnd_return: if (bottom) if (Lst_Member(slst, bottom) == NULL) Lst_AtEnd(slst, bottom); while (SuffRemoveSrc(srcs) || SuffRemoveSrc(targs)) continue; Lst_Concat(slst, srcs, LST_CONCLINK); Lst_Concat(slst, targs, LST_CONCLINK); } /*- *----------------------------------------------------------------------- * Suff_FindDeps -- * Find implicit sources for the target described by the graph node * gn * * Results: * Nothing. * * Side Effects: * Nodes are added to the graph below the passed-in node. The nodes * are marked to have their IMPSRC variable filled in. The * PREFIX variable is set for the given node and all its * implied children. * * Notes: * The path found by this target is the shortest path in the * transformation graph, which may pass through non-existent targets, * to an existing target. The search continues on all paths from the * root suffix until a file is found. I.e. if there's a path * .o -> .c -> .l -> .l,v from the root and the .l,v file exists but * the .c and .l files don't, the search will branch out in * all directions from .o and again from all the nodes on the * next level until the .l,v node is encountered. * *----------------------------------------------------------------------- */ void Suff_FindDeps(GNode *gn) { SuffFindDeps(gn, srclist); while (SuffRemoveSrc(srclist)) continue; } /* * Input: * gn node we're dealing with * */ static void SuffFindDeps(GNode *gn, Lst slst) { if (gn->type & OP_DEPS_FOUND) { /* * If dependencies already found, no need to do it again... */ return; } else { gn->type |= OP_DEPS_FOUND; } /* * Make sure we have these set, may get revised below. */ Var_Set(TARGET, gn->path ? gn->path : gn->name, gn, 0); Var_Set(PREFIX, gn->name, gn, 0); if (DEBUG(SUFF)) { fprintf(debug_file, "SuffFindDeps (%s)\n", gn->name); } if (gn->type & OP_ARCHV) { SuffFindArchiveDeps(gn, slst); } else if (gn->type & OP_LIB) { /* * If the node is a library, it is the arch module's job to find it * and set the TARGET variable accordingly. We merely provide the * search path, assuming all libraries end in ".a" (if the suffix * hasn't been defined, there's nothing we can do for it, so we just * set the TARGET variable to the node's name in order to give it a * value). */ LstNode ln; Suff *s; ln = Lst_Find(sufflist, LIBSUFF, SuffSuffHasNameP); if (gn->suffix) gn->suffix->refCount--; if (ln != NULL) { gn->suffix = s = (Suff *)Lst_Datum(ln); gn->suffix->refCount++; Arch_FindLib(gn, s->searchPath); } else { gn->suffix = NULL; Var_Set(TARGET, gn->name, gn, 0); } /* * Because a library (-lfoo) target doesn't follow the standard * filesystem conventions, we don't set the regular variables for * the thing. .PREFIX is simply made empty... */ Var_Set(PREFIX, "", gn, 0); } else { SuffFindNormalDeps(gn, slst); } } /*- *----------------------------------------------------------------------- * Suff_SetNull -- * Define which suffix is the null suffix. * * Input: * name Name of null suffix * * Results: * None. * * Side Effects: * 'suffNull' is altered. * * Notes: * Need to handle the changing of the null suffix gracefully so the * old transformation rules don't just go away. 
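* (Background, illustrative only: the null suffix is what backs POSIX single-suffix rules, e.g. a ".c:" rule that builds "foo" directly from "foo.c"; a line such as ".NULL: .out" merely moves that role to another suffix, and ideally rules attached to the old null suffix should survive the switch.)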
* *----------------------------------------------------------------------- */ void Suff_SetNull(char *name) { Suff *s; LstNode ln; ln = Lst_Find(sufflist, name, SuffSuffHasNameP); if (ln != NULL) { s = (Suff *)Lst_Datum(ln); if (suffNull != NULL) { suffNull->flags &= ~SUFF_NULL; } s->flags |= SUFF_NULL; /* * XXX: Here's where the transformation mangling would take place */ suffNull = s; } else { Parse_Error(PARSE_WARNING, "Desired null suffix %s not defined.", name); } } /*- *----------------------------------------------------------------------- * Suff_Init -- * Initialize suffixes module * * Results: * None * * Side Effects: * Many *----------------------------------------------------------------------- */ void Suff_Init(void) { #ifdef CLEANUP suffClean = Lst_Init(FALSE); #endif srclist = Lst_Init(FALSE); transforms = Lst_Init(FALSE); /* * Create null suffix for single-suffix rules (POSIX). The thing doesn't * actually go on the suffix list or everyone will think that's its * suffix. */ Suff_ClearSuffixes(); } /*- *---------------------------------------------------------------------- * Suff_End -- * Clean up this module * * Results: * None * * Side Effects: * The memory is free'd. *---------------------------------------------------------------------- */ void Suff_End(void) { #ifdef CLEANUP Lst_Destroy(sufflist, SuffFree); Lst_Destroy(suffClean, SuffFree); if (suffNull) SuffFree(suffNull); Lst_Destroy(srclist, NULL); Lst_Destroy(transforms, NULL); #endif } /********************* DEBUGGING FUNCTIONS **********************/ static int SuffPrintName(void *s, void *dummy) { + (void)dummy; + fprintf(debug_file, "%s ", ((Suff *)s)->name); - return (dummy ? 0 : 0); + return 0; } static int SuffPrintSuff(void *sp, void *dummy) { Suff *s = (Suff *)sp; int flags; int flag; + (void)dummy; + fprintf(debug_file, "# `%s' [%d] ", s->name, s->refCount); flags = s->flags; if (flags) { fputs(" (", debug_file); while (flags) { flag = 1 << (ffs(flags) - 1); flags &= ~flag; switch (flag) { case SUFF_NULL: fprintf(debug_file, "NULL"); break; case SUFF_INCLUDE: fprintf(debug_file, "INCLUDE"); break; case SUFF_LIBRARY: fprintf(debug_file, "LIBRARY"); break; } fputc(flags ? '|' : ')', debug_file); } } fputc('\n', debug_file); fprintf(debug_file, "#\tTo: "); Lst_ForEach(s->parents, SuffPrintName, NULL); fputc('\n', debug_file); fprintf(debug_file, "#\tFrom: "); Lst_ForEach(s->children, SuffPrintName, NULL); fputc('\n', debug_file); fprintf(debug_file, "#\tSearch Path: "); Dir_PrintPath(s->searchPath); fputc('\n', debug_file); - return (dummy ? 0 : 0); + return 0; } static int SuffPrintTrans(void *tp, void *dummy) { GNode *t = (GNode *)tp; + (void)dummy; + fprintf(debug_file, "%-16s: ", t->name); Targ_PrintType(t->type); fputc('\n', debug_file); Lst_ForEach(t->commands, Targ_PrintCmd, NULL); fputc('\n', debug_file); - return(dummy ?
0 : 0); + return 0; } void Suff_PrintAll(void) { fprintf(debug_file, "#*** Suffixes:\n"); Lst_ForEach(sufflist, SuffPrintSuff, NULL); fprintf(debug_file, "#*** Transformations:\n"); Lst_ForEach(transforms, SuffPrintTrans, NULL); } Index: projects/clang390-import/contrib/bmake =================================================================== --- projects/clang390-import/contrib/bmake (revision 305686) +++ projects/clang390-import/contrib/bmake (revision 305687) Property changes on: projects/clang390-import/contrib/bmake ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,2 ## Merged /head/contrib/bmake:r303250-305686 Merged /vendor/NetBSD/bmake/dist:r301637-305632 Index: projects/clang390-import/etc/mtree/BSD.tests.dist =================================================================== --- projects/clang390-import/etc/mtree/BSD.tests.dist (revision 305686) +++ projects/clang390-import/etc/mtree/BSD.tests.dist (revision 305687) @@ -1,680 +1,692 @@ # $FreeBSD$ # # Please see the file src/etc/mtree/README before making changes to this file. # /set type=dir uname=root gname=wheel mode=0755 . bin cat .. date .. dd .. expr .. ls .. mv .. pax .. pkill .. sh builtins .. errors .. execution .. expansion .. parameters .. parser .. set-e .. .. sleep .. test .. .. cddl lib .. sbin .. usr.bin .. usr.sbin dtrace common aggs .. arithmetic .. arrays .. assocs .. begin .. bitfields .. buffering .. builtinvar .. cg .. clauses .. cpc .. decls .. drops .. dtraceUtil .. end .. enum .. error .. exit .. fbtprovider .. funcs .. grammar .. include .. inline .. io .. ip .. java_api .. json .. lexer .. llquantize .. mdb .. mib .. misc .. multiaggs .. offsetof .. operators .. pid .. plockstat .. pointers .. pragma .. predicates .. preprocessor .. print .. printa .. printf .. privs .. probes .. proc .. profile-n .. providers .. raise .. rates .. safety .. scalars .. sched .. scripting .. sdt .. sizeof .. speculation .. stability .. stack .. stackdepth .. stop .. strlen .. strtoll .. struct .. sugar .. syscall .. sysevent .. tick-n .. trace .. tracemem .. translators .. typedef .. types .. uctf .. union .. usdt .. ustack .. vars .. version .. .. .. zfsd .. .. .. etc rc.d .. .. games .. gnu lib .. usr.bin diff .. .. .. lib atf libatf-c detail .. .. libatf-c++ detail .. .. test-programs .. .. libarchive .. libc c063 .. db .. gen execve .. posix_spawn .. .. hash data .. .. iconv .. inet .. locale .. net getaddrinfo data .. .. .. nss .. regex data .. .. resolv .. rpc .. ssp .. setjmp .. stdio .. stdlib .. string .. sys .. time .. tls dso .. .. termios .. ttyio .. .. + libcasper + services + cap_dns + .. + cap_grp + .. + cap_pwd + .. + cap_sysctl + .. + .. + .. libcrypt .. libdevdctl .. libmp .. libnv .. libpam .. libproc .. librt .. libthr dlopen .. .. libutil .. libxo .. msun .. .. libexec atf atf-check .. atf-sh .. .. rtld-elf .. .. sbin dhclient .. devd .. growfs .. ifconfig .. mdconfig .. .. secure lib .. libexec .. usr.bin .. usr.sbin .. .. share examples tests atf .. plain .. .. .. .. sys acl .. aio .. fifo .. file .. geom class concat .. eli .. gate .. gpt .. mirror .. nop .. raid3 .. shsec .. stripe .. uzip etalon .. .. .. .. kern acct .. execve .. pipe .. .. kqueue libkqueue .. .. mac bsdextended .. portacl .. .. mqueue .. netinet .. opencrypto .. pjdfstest chflags .. chmod .. chown .. ftruncate .. granular .. link .. mkdir .. mkfifo .. mknod .. open .. rename .. rmdir .. symlink .. truncate .. unlink .. .. posixshm .. sys .. vfs .. vm .. .. usr.bin apply .. 
basename .. bmake archives fmt_44bsd .. fmt_44bsd_mod .. fmt_oldbsd .. .. basic t0 .. t1 .. t2 .. t3 .. .. execution ellipsis .. empty .. joberr .. plus .. .. shell builtin .. meta .. path .. path_select .. replace .. select .. .. suffixes basic .. src_wild1 .. src_wild2 .. .. syntax directive-t0 .. enl .. funny-targets .. semi .. .. sysmk t0 2 1 .. .. mk .. .. t1 2 1 .. .. mk .. .. t2 2 1 .. .. mk .. .. .. variables modifier_M .. modifier_t .. opt_V .. t0 .. .. .. bsdcat .. calendar .. cmp .. cpio .. col .. comm .. cut .. dirname .. file2c .. grep .. gzip .. ident .. join .. jot .. lastcomm .. limits .. m4 .. mkimg .. ncal .. opensm .. printf .. sdiff .. sed regress.multitest.out .. .. soelim .. tar .. timeout .. tr .. truncate .. units .. uudecode .. uuencode .. xargs .. xinstall .. xo .. yacc yacc .. .. .. usr.sbin chown .. etcupdate .. extattr .. fstyp .. makefs .. newsyslog .. nmtree .. pw .. rpcbind .. sa .. .. .. # vim: set expandtab ts=4 sw=4: Index: projects/clang390-import/lib/libc/aarch64/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/aarch64/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/aarch64/sys/Makefile.inc (revision 305687) @@ -1,23 +1,15 @@ # $FreeBSD$ MIASM:= ${MIASM:Nfreebsd[467]_*} SRCS+= __vdso_gettc.c MDASM= cerror.S \ shmat.S \ sigreturn.S \ syscall.S \ vfork.S # Don't generate default code for these syscalls: -NOASM= break.o \ - exit.o \ - getlogin.o \ - sbrk.o \ - sstk.o \ - vfork.o \ - yield.o - -PSEUDO= _exit.o \ - _getlogin.o +NOASM+= sbrk.o \ + vfork.o Index: projects/clang390-import/lib/libc/amd64/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/amd64/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/amd64/sys/Makefile.inc (revision 305687) @@ -1,13 +1,11 @@ # from: Makefile.inc,v 1.1 1993/09/03 19:04:23 jtc Exp # $FreeBSD$ SRCS+= amd64_get_fsbase.c amd64_get_gsbase.c amd64_set_fsbase.c \ amd64_set_gsbase.c MDASM= vfork.S brk.S cerror.S exect.S getcontext.S \ sbrk.S setlogin.S sigreturn.S # Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o vfork.o yield.o - -PSEUDO= _getlogin.o _exit.o +NOASM+= vfork.o Index: projects/clang390-import/lib/libc/arm/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/arm/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/arm/sys/Makefile.inc (revision 305687) @@ -1,10 +1,8 @@ # $FreeBSD$ SRCS+= __vdso_gettc.c MDASM= Ovfork.S brk.S cerror.S sbrk.S shmat.S sigreturn.S syscall.S # Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o vfork.o yield.o - -PSEUDO= _exit.o _getlogin.o +NOASM+= vfork.o Index: projects/clang390-import/lib/libc/i386/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/i386/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/i386/sys/Makefile.inc (revision 305687) @@ -1,23 +1,20 @@ # from: Makefile.inc,v 1.1 1993/09/03 19:04:23 jtc Exp # $FreeBSD$ .if !defined(COMPAT_32BIT) SRCS+= i386_clr_watch.c i386_set_watch.c i386_vm86.c .endif SRCS+= i386_get_fsbase.c i386_get_gsbase.c i386_get_ioperm.c i386_get_ldt.c \ i386_set_fsbase.c i386_set_gsbase.c i386_set_ioperm.c i386_set_ldt.c MDASM= Ovfork.S brk.S cerror.S exect.S getcontext.S \ sbrk.S setlogin.S 
sigreturn.S syscall.S -# Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o vfork.o yield.o - -PSEUDO= _getlogin.o _exit.o +NOASM+= vfork.o MAN+= i386_get_ioperm.2 i386_get_ldt.2 i386_vm86.2 MAN+= i386_set_watch.3 MLINKS+=i386_get_ioperm.2 i386_set_ioperm.2 MLINKS+=i386_get_ldt.2 i386_set_ldt.2 MLINKS+=i386_set_watch.3 i386_clr_watch.3 Index: projects/clang390-import/lib/libc/mips/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/mips/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/mips/sys/Makefile.inc (revision 305687) @@ -1,11 +1,9 @@ # $FreeBSD$ SRCS+= trivial-vdso_tc.c MDASM= Ovfork.S brk.S cerror.S exect.S \ sbrk.S syscall.S # Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o vfork.o yield.o - -PSEUDO= _exit.o _getlogin.o +NOASM+= vfork.o Index: projects/clang390-import/lib/libc/powerpc/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/powerpc/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/powerpc/sys/Makefile.inc (revision 305687) @@ -1,8 +1,3 @@ # $FreeBSD$ MDASM+= brk.S cerror.S exect.S sbrk.S setlogin.S - -# Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o yield.o - -PSEUDO= _getlogin.o _exit.o Index: projects/clang390-import/lib/libc/powerpc64/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/powerpc64/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/powerpc64/sys/Makefile.inc (revision 305687) @@ -1,8 +1,3 @@ # $FreeBSD$ MDASM+= brk.S cerror.S exect.S sbrk.S setlogin.S - -# Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o yield.o - -PSEUDO= _getlogin.o _exit.o Index: projects/clang390-import/lib/libc/riscv/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/riscv/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/riscv/sys/Makefile.inc (revision 305687) @@ -1,22 +1,13 @@ # $FreeBSD$ SRCS+= trivial-vdso_tc.c #MDASM= ptrace.S MDASM= cerror.S \ shmat.S \ sigreturn.S \ syscall.S \ vfork.S # Don't generate default code for these syscalls: -NOASM= break.o \ - exit.o \ - getlogin.o \ - sbrk.o \ - sstk.o \ - vfork.o \ - yield.o - -PSEUDO= _exit.o \ - _getlogin.o +NOASM+= vfork.o Index: projects/clang390-import/lib/libc/sparc64/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/sparc64/sys/Makefile.inc (revision 305686) +++ projects/clang390-import/lib/libc/sparc64/sys/Makefile.inc (revision 305687) @@ -1,20 +1,15 @@ # $FreeBSD$ SRCS+= __sparc_sigtramp_setup.c \ __sparc_utrap.c \ __sparc_utrap_align.c \ __sparc_utrap_emul.c \ __sparc_utrap_fp_disabled.S \ __sparc_utrap_gen.S \ __sparc_utrap_install.c \ __sparc_utrap_setup.c \ sigcode.S CFLAGS+= -I${LIBC_SRCTOP}/sparc64/fpu MDASM+= brk.S cerror.S exect.S sbrk.S setlogin.S sigaction1.S - -# Don't generate default code for these syscalls: -NOASM= break.o exit.o getlogin.o sstk.o yield.o - -PSEUDO= _getlogin.o _exit.o Index: projects/clang390-import/lib/libc/sys/Makefile.inc =================================================================== --- projects/clang390-import/lib/libc/sys/Makefile.inc (revision 305686) +++ 
projects/clang390-import/lib/libc/sys/Makefile.inc (revision 305687) @@ -1,468 +1,479 @@ # @(#)Makefile.inc 8.3 (Berkeley) 10/24/94 # $FreeBSD$ # sys sources .PATH: ${LIBC_SRCTOP}/${LIBC_ARCH}/sys ${LIBC_SRCTOP}/sys # Include the generated makefile containing the *complete* list # of syscall names in MIASM. .include "${LIBC_SRCTOP}/../../sys/sys/syscall.mk" # Include machine dependent definitions. # # MDASM names override the default syscall names in MIASM. # NOASM will prevent the default syscall code from being generated. +# PSEUDO generates _<sys>() and __sys_<sys>() symbols, but not <sys>(). # +# While historically machine dependent, all architectures have the following +# declarations in common: +NOASM= break.o \ + exit.o \ + getlogin.o \ + sstk.o \ + yield.o +PSEUDO= _exit.o \ + _getlogin.o .sinclude "${LIBC_SRCTOP}/${LIBC_ARCH}/sys/Makefile.inc" SRCS+= clock_gettime.c gettimeofday.c __vdso_gettimeofday.c NOASM+= clock_gettime.o gettimeofday.o PSEUDO+= _clock_gettime.o _gettimeofday.o # Sources common to both syscall interfaces: SRCS+= \ __error.c \ interposing_table.c SRCS+= futimens.c utimensat.c NOASM+= futimens.o utimensat.o PSEUDO+= _futimens.o _utimensat.o SRCS+= pipe.c INTERPOSED = \ accept \ accept4 \ aio_suspend \ close \ connect \ fcntl \ fdatasync \ fsync \ fork \ kevent \ msync \ nanosleep \ open \ openat \ poll \ ppoll \ pselect \ ptrace \ read \ readv \ recvfrom \ recvmsg \ select \ sendmsg \ sendto \ setcontext \ sigprocmask \ sigsuspend \ sigtimedwait \ sigwait \ sigwaitinfo \ swapcontext \ wait4 \ wait6 \ write \ writev .if ${MACHINE_CPUARCH} == "sparc64" SRCS+= sigaction.c NOASM+= sigaction.o .else INTERPOSED+= sigaction .endif SRCS+= ${INTERPOSED:S/$/.c/} NOASM+= ${INTERPOSED:S/$/.o/} PSEUDO+= ${INTERPOSED:C/^.*$/_&.o/} # Add machine dependent asm sources: SRCS+=${MDASM} # Look through the complete list of syscalls (MIASM) for names that are # not defined with machine dependent implementations (MDASM) and are # not declared for no generation of default code (NOASM). Add each # syscall that satisfies these conditions to the ASM list.
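# (A hypothetical illustration: if MIASM lists getpid.o, sbrk.o and vfork.o while an architecture's Makefile.inc sets MDASM= sbrk.S and NOASM+= vfork.o, only getpid.o survives the filter below, and a stock getpid.S stub is then generated from the RSYSCALL() template further down.)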
.for _asm in ${MIASM} .if (${MDASM:R:M${_asm:R}} == "") .if (${NOASM:R:M${_asm:R}} == "") ASM+=$(_asm) .endif .endif .endfor SASM= ${ASM:S/.o/.S/} SPSEUDO= ${PSEUDO:S/.o/.S/} SRCS+= ${SASM} ${SPSEUDO} SYM_MAPS+= ${LIBC_SRCTOP}/sys/Symbol.map # Generated files CLEANFILES+= ${SASM} ${SPSEUDO} .if ${MACHINE_CPUARCH} == "amd64" || ${MACHINE_CPUARCH} == "i386" || \ ${MACHINE_CPUARCH} == "powerpc" || ${MACHINE_ARCH:Marmv6*} NOTE_GNU_STACK='\t.section .note.GNU-stack,"",%%progbits\n' .else NOTE_GNU_STACK='' .endif ${SASM}: printf '#include "compat.h"\n' > ${.TARGET} printf '#include "SYS.h"\nRSYSCALL(${.PREFIX})\n' >> ${.TARGET} printf ${NOTE_GNU_STACK} >>${.TARGET} ${SPSEUDO}: printf '#include "compat.h"\n' > ${.TARGET} printf '#include "SYS.h"\nPSEUDO(${.PREFIX:S/_//})\n' \ >> ${.TARGET} printf ${NOTE_GNU_STACK} >>${.TARGET} MAN+= abort2.2 \ accept.2 \ access.2 \ acct.2 \ adjtime.2 \ aio_cancel.2 \ aio_error.2 \ aio_fsync.2 \ aio_mlock.2 \ aio_read.2 \ aio_return.2 \ aio_suspend.2 \ aio_waitcomplete.2 \ aio_write.2 \ bind.2 \ bindat.2 \ brk.2 \ cap_enter.2 \ cap_fcntls_limit.2 \ cap_ioctls_limit.2 \ cap_rights_limit.2 \ chdir.2 \ chflags.2 \ chmod.2 \ chown.2 \ chroot.2 \ clock_gettime.2 \ close.2 \ closefrom.2 \ connect.2 \ connectat.2 \ cpuset.2 \ cpuset_getaffinity.2 \ dup.2 \ execve.2 \ _exit.2 \ extattr_get_file.2 \ fcntl.2 \ ffclock.2 \ fhopen.2 \ flock.2 \ fork.2 \ fsync.2 \ getdirentries.2 \ getdtablesize.2 \ getfh.2 \ getfsstat.2 \ getgid.2 \ getgroups.2 \ getitimer.2 \ getlogin.2 \ getloginclass.2 \ getpeername.2 \ getpgrp.2 \ getpid.2 \ getpriority.2 \ getrlimit.2 \ getrusage.2 \ getsid.2 \ getsockname.2 \ getsockopt.2 \ gettimeofday.2 \ getuid.2 \ intro.2 \ ioctl.2 \ issetugid.2 \ jail.2 \ kenv.2 \ kill.2 \ kldfind.2 \ kldfirstmod.2 \ kldload.2 \ kldnext.2 \ kldstat.2 \ kldsym.2 \ kldunload.2 \ kqueue.2 \ ktrace.2 \ link.2 \ lio_listio.2 \ listen.2 \ lseek.2 \ madvise.2 \ mincore.2 \ minherit.2 \ mkdir.2 \ mkfifo.2 \ mknod.2 \ mlock.2 \ mlockall.2 \ mmap.2 \ modfind.2 \ modnext.2 \ modstat.2 \ mount.2 \ mprotect.2 \ mq_close.2 \ mq_getattr.2 \ mq_notify.2 \ mq_open.2 \ mq_receive.2 \ mq_send.2 \ mq_setattr.2 \ msgctl.2 \ msgget.2 \ msgrcv.2 \ msgsnd.2 \ msync.2 \ munmap.2 \ nanosleep.2 \ nfssvc.2 \ ntp_adjtime.2 \ numa_getaffinity.2 \ open.2 \ pathconf.2 \ pdfork.2 \ pipe.2 \ poll.2 \ posix_fadvise.2 \ posix_fallocate.2 \ posix_openpt.2 \ procctl.2 \ profil.2 \ pselect.2 \ ptrace.2 \ quotactl.2 \ read.2 \ readlink.2 \ reboot.2 \ recv.2 \ rename.2 \ revoke.2 \ rfork.2 \ rmdir.2 \ rtprio.2 .if !defined(NO_P1003_1B) MAN+= sched_get_priority_max.2 \ sched_setparam.2 \ sched_setscheduler.2 \ sched_yield.2 .endif MAN+= sctp_generic_recvmsg.2 \ sctp_generic_sendmsg.2 \ sctp_peeloff.2 \ select.2 \ semctl.2 \ semget.2 \ semop.2 \ send.2 \ setfib.2 \ sendfile.2 \ setgroups.2 \ setpgid.2 \ setregid.2 \ setresuid.2 \ setreuid.2 \ setsid.2 \ setuid.2 \ shmat.2 \ shmctl.2 \ shmget.2 \ shm_open.2 \ shutdown.2 \ sigaction.2 \ sigaltstack.2 \ sigpending.2 \ sigprocmask.2 \ sigqueue.2 \ sigreturn.2 \ sigstack.2 \ sigsuspend.2 \ sigwait.2 \ sigwaitinfo.2 \ socket.2 \ socketpair.2 \ stat.2 \ statfs.2 \ swapon.2 \ symlink.2 \ sync.2 \ sysarch.2 \ syscall.2 \ thr_exit.2 \ thr_kill.2 \ thr_new.2 \ thr_self.2 \ thr_set_name.2 \ timer_create.2 \ timer_delete.2 \ timer_settime.2 \ truncate.2 \ umask.2 \ undelete.2 \ unlink.2 \ utimensat.2 \ utimes.2 \ utrace.2 \ uuidgen.2 \ vfork.2 \ wait.2 \ write.2 \ _umtx_op.2 MLINKS+=accept.2 accept4.2 MLINKS+=access.2 eaccess.2 \ access.2 faccessat.2 MLINKS+=brk.2 
sbrk.2 MLINKS+=cap_enter.2 cap_getmode.2 MLINKS+=cap_fcntls_limit.2 cap_fcntls_get.2 MLINKS+=cap_ioctls_limit.2 cap_ioctls_get.2 MLINKS+=cap_rights_limit.2 cap_rights_get.2 MLINKS+=chdir.2 fchdir.2 MLINKS+=chflags.2 chflagsat.2 \ chflags.2 fchflags.2 \ chflags.2 lchflags.2 MLINKS+=chmod.2 fchmod.2 \ chmod.2 fchmodat.2 \ chmod.2 lchmod.2 MLINKS+=chown.2 fchown.2 \ chown.2 fchownat.2 \ chown.2 lchown.2 MLINKS+=clock_gettime.2 clock_getres.2 \ clock_gettime.2 clock_settime.2 MLINKS+=cpuset.2 cpuset_getid.2 \ cpuset.2 cpuset_setid.2 MLINKS+=cpuset_getaffinity.2 cpuset_setaffinity.2 MLINKS+=dup.2 dup2.2 MLINKS+=execve.2 fexecve.2 MLINKS+=extattr_get_file.2 extattr.2 \ extattr_get_file.2 extattr_delete_fd.2 \ extattr_get_file.2 extattr_delete_file.2 \ extattr_get_file.2 extattr_delete_link.2 \ extattr_get_file.2 extattr_get_fd.2 \ extattr_get_file.2 extattr_get_link.2 \ extattr_get_file.2 extattr_list_fd.2 \ extattr_get_file.2 extattr_list_file.2 \ extattr_get_file.2 extattr_list_link.2 \ extattr_get_file.2 extattr_set_fd.2 \ extattr_get_file.2 extattr_set_file.2 \ extattr_get_file.2 extattr_set_link.2 MLINKS+=ffclock.2 ffclock_getcounter.2 \ ffclock.2 ffclock_getestimate.2 \ ffclock.2 ffclock_setestimate.2 MLINKS+=fhopen.2 fhstat.2 fhopen.2 fhstatfs.2 MLINKS+=fsync.2 fdatasync.2 MLINKS+=getdirentries.2 getdents.2 MLINKS+=getfh.2 lgetfh.2 MLINKS+=getgid.2 getegid.2 MLINKS+=getitimer.2 setitimer.2 MLINKS+=getlogin.2 getlogin_r.3 MLINKS+=getlogin.2 setlogin.2 MLINKS+=getloginclass.2 setloginclass.2 MLINKS+=getpgrp.2 getpgid.2 MLINKS+=getpid.2 getppid.2 MLINKS+=getpriority.2 setpriority.2 MLINKS+=getrlimit.2 setrlimit.2 MLINKS+=getsockopt.2 setsockopt.2 MLINKS+=gettimeofday.2 settimeofday.2 MLINKS+=getuid.2 geteuid.2 MLINKS+=intro.2 errno.2 MLINKS+=jail.2 jail_attach.2 \ jail.2 jail_get.2 \ jail.2 jail_remove.2 \ jail.2 jail_set.2 MLINKS+=kldunload.2 kldunloadf.2 MLINKS+=kqueue.2 kevent.2 \ kqueue.2 EV_SET.3 MLINKS+=link.2 linkat.2 MLINKS+=madvise.2 posix_madvise.2 MLINKS+=mkdir.2 mkdirat.2 MLINKS+=mkfifo.2 mkfifoat.2 MLINKS+=mknod.2 mknodat.2 MLINKS+=mlock.2 munlock.2 MLINKS+=mlockall.2 munlockall.2 MLINKS+=modnext.2 modfnext.2 MLINKS+=mount.2 nmount.2 \ mount.2 unmount.2 MLINKS+=mq_receive.2 mq_timedreceive.2 MLINKS+=mq_send.2 mq_timedsend.2 MLINKS+=ntp_adjtime.2 ntp_gettime.2 MLINKS+=numa_getaffinity.2 numa_setaffinity.2 MLINKS+=open.2 openat.2 MLINKS+=pathconf.2 fpathconf.2 MLINKS+=pathconf.2 lpathconf.2 MLINKS+=pdfork.2 pdgetpid.2\ pdfork.2 pdkill.2 \ pdfork.2 pdwait4.2 MLINKS+=pipe.2 pipe2.2 MLINKS+=poll.2 ppoll.2 MLINKS+=read.2 pread.2 \ read.2 preadv.2 \ read.2 readv.2 MLINKS+=readlink.2 readlinkat.2 MLINKS+=recv.2 recvfrom.2 \ recv.2 recvmsg.2 MLINKS+=rename.2 renameat.2 MLINKS+=rtprio.2 rtprio_thread.2 .if !defined(NO_P1003_1B) MLINKS+=sched_get_priority_max.2 sched_get_priority_min.2 \ sched_get_priority_max.2 sched_rr_get_interval.2 MLINKS+=sched_setparam.2 sched_getparam.2 MLINKS+=sched_setscheduler.2 sched_getscheduler.2 .endif MLINKS+=select.2 FD_CLR.3 \ select.2 FD_ISSET.3 \ select.2 FD_SET.3 \ select.2 FD_ZERO.3 MLINKS+=send.2 sendmsg.2 \ send.2 sendto.2 MLINKS+=setpgid.2 setpgrp.2 MLINKS+=setresuid.2 getresgid.2 \ setresuid.2 getresuid.2 \ setresuid.2 setresgid.2 MLINKS+=setuid.2 setegid.2 \ setuid.2 seteuid.2 \ setuid.2 setgid.2 MLINKS+=shmat.2 shmdt.2 MLINKS+=shm_open.2 shm_unlink.2 MLINKS+=sigwaitinfo.2 sigtimedwait.2 MLINKS+=stat.2 fstat.2 \ stat.2 fstatat.2 \ stat.2 lstat.2 MLINKS+=statfs.2 fstatfs.2 MLINKS+=swapon.2 swapoff.2 MLINKS+=symlink.2 symlinkat.2 MLINKS+=syscall.2 
__syscall.2 MLINKS+=timer_settime.2 timer_getoverrun.2 \ timer_settime.2 timer_gettime.2 MLINKS+=thr_kill.2 thr_kill2.2 MLINKS+=truncate.2 ftruncate.2 MLINKS+=unlink.2 unlinkat.2 MLINKS+=utimensat.2 futimens.2 MLINKS+=utimes.2 futimes.2 \ utimes.2 futimesat.2 \ utimes.2 lutimes.2 MLINKS+=wait.2 wait3.2 \ wait.2 wait4.2 \ wait.2 waitpid.2 \ wait.2 waitid.2 \ wait.2 wait6.2 MLINKS+=write.2 pwrite.2 \ write.2 pwritev.2 \ write.2 writev.2 Index: projects/clang390-import/lib/libc/sys/_exit.2 =================================================================== --- projects/clang390-import/lib/libc/sys/_exit.2 (revision 305686) +++ projects/clang390-import/lib/libc/sys/_exit.2 (revision 305687) @@ -1,123 +1,125 @@ .\" Copyright (c) 1980, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 4. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)_exit.2 8.1 (Berkeley) 6/4/93 .\" $FreeBSD$ .\" -.Dd June 4, 1993 +.Dd September 8, 2016 .Dt EXIT 2 .Os .Sh NAME .Nm _exit .Nd terminate the calling process .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft void .Fn _exit "int status" .Sh DESCRIPTION The .Fn _exit system call terminates a process with the following consequences: .Bl -bullet .It All of the descriptors open in the calling process are closed. This may entail delays, for example, waiting for output to drain; a process in this state may not be killed, as it is already dying. .It If the parent process of the calling process has an outstanding .Xr wait 2 call or catches the .Dv SIGCHLD signal, it is notified of the calling process's termination and the .Fa status is set as defined by .Xr wait 2 . .It The parent process-ID of all of the calling process's existing child -processes are set to 1; the initialization process +processes are set to the process-ID of the calling process's reaper; +the reaper (normally the initialization process) inherits each of these processes (see +.Xr procctl 2 , .Xr init 8 and the .Sx DEFINITIONS section of .Xr intro 2 ) . 
.It If the termination of the process causes any process group to become orphaned (usually because the parents of all members of the group have now exited; see .Dq orphaned process group in .Xr intro 2 ) , and if any member of the orphaned group is stopped, the .Dv SIGHUP signal and the .Dv SIGCONT signal are sent to all members of the newly-orphaned process group. .It If the process is a controlling process (see .Xr intro 2 ) , the .Dv SIGHUP signal is sent to the foreground process group of the controlling terminal, and all current access to the controlling terminal is revoked. .El .Pp Most C programs call the library routine .Xr exit 3 , which flushes buffers, closes streams, unlinks temporary files, etc., before calling .Fn _exit . .Sh RETURN VALUES The .Fn _exit system call can never return. .Sh SEE ALSO .Xr fork 2 , .Xr sigaction 2 , .Xr wait 2 , .Xr exit 3 , .Xr init 8 .Sh STANDARDS The .Fn _exit system call is expected to conform to .St -p1003.1-90 . .Sh HISTORY The .Fn _exit function appeared in .At v7 . Index: projects/clang390-import/lib/libc/sys/intro.2 =================================================================== --- projects/clang390-import/lib/libc/sys/intro.2 (revision 305686) +++ projects/clang390-import/lib/libc/sys/intro.2 (revision 305687) @@ -1,751 +1,754 @@ .\" Copyright (c) 1980, 1983, 1986, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 4. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)intro.2 8.5 (Berkeley) 2/27/95 .\" $FreeBSD$ .\" -.Dd May 4, 2013 +.Dd September 8, 2016 .Dt INTRO 2 .Os .Sh NAME .Nm intro .Nd introduction to system calls and error numbers .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In errno.h .Sh DESCRIPTION This section provides an overview of the system calls, their error returns, and other common definitions and concepts. .\".Pp .\".Sy System call restart .\".Pp .\"(more later...) .Sh RETURN VALUES Nearly all of the system calls provide an error number referenced via the external identifier errno. 
This identifier is defined in .In sys/errno.h as .Pp .Dl extern int * __error(); .Dl #define errno (* __error()) .Pp The .Va __error() function returns a pointer to a field in the thread specific structure for threads other than the initial thread. For the initial thread and non-threaded processes, .Va __error() returns a pointer to a global .Va errno variable that is compatible with the previous definition. .Pp When a system call detects an error, it returns an integer value indicating failure (usually -1) and sets the variable .Va errno accordingly. (This allows interpretation of the failure on receiving a -1 and to take action accordingly.) Successful calls never set .Va errno ; once set, it remains until another error occurs. It should only be examined after an error. Note that a number of system calls overload the meanings of these error numbers, and that the meanings must be interpreted according to the type and circumstances of the call. .Pp The following is a complete list of the errors and their names as given in .In sys/errno.h . .Bl -hang -width Ds .It Er 0 Em "Undefined error: 0" . Not used. .It Er 1 EPERM Em "Operation not permitted" . An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resources. .It Er 2 ENOENT Em "No such file or directory" . A component of a specified pathname did not exist, or the pathname was an empty string. .It Er 3 ESRCH Em "No such process" . No process could be found corresponding to that specified by the given process ID. .It Er 4 EINTR Em "Interrupted system call" . An asynchronous signal (such as .Dv SIGINT or .Dv SIGQUIT ) was caught by the process during the execution of an interruptible function. If the signal handler performs a normal return, the interrupted system call will seem to have returned the error condition. .It Er 5 EIO Em "Input/output error" . Some physical input or output error occurred. This error will not be reported until a subsequent operation on the same file descriptor and may be lost (over written) by any subsequent errors. .It Er 6 ENXIO Em "Device not configured" . Input or output on a special file referred to a device that did not exist, or made a request beyond the limits of the device. This error may also occur when, for example, a tape drive is not online or no disk pack is loaded on a drive. .It Er 7 E2BIG Em "Argument list too long" . The number of bytes used for the argument and environment list of the new process exceeded the current limit .Dv ( NCARGS in .In sys/param.h ) . .It Er 8 ENOEXEC Em "Exec format error" . A request was made to execute a file that, although it has the appropriate permissions, was not in the format required for an executable file. .It Er 9 EBADF Em "Bad file descriptor" . A file descriptor argument was out of range, referred to no open file, or a read (write) request was made to a file that was only open for writing (reading). .It Er 10 ECHILD Em "\&No child processes" . A .Xr wait 2 or .Xr waitpid 2 function was executed by a process that had no existing or unwaited-for child processes. .It Er 11 EDEADLK Em "Resource deadlock avoided" . An attempt was made to lock a system resource that would have resulted in a deadlock situation. .It Er 12 ENOMEM Em "Cannot allocate memory" . The new process image required more memory than was allowed by the hardware or by system-imposed memory management constraints. A lack of swap space is normally temporary; however, a lack of core is not. 
Soft limits may be increased to their corresponding hard limits.
.It Er 13 EACCES Em "Permission denied" .
An attempt was made to access a file in a way forbidden by its file access permissions.
.It Er 14 EFAULT Em "Bad address" .
The system detected an invalid address in attempting to use an argument of a call.
.It Er 15 ENOTBLK Em "Block device required" .
A block device operation was attempted on a non-block device or file.
.It Er 16 EBUSY Em "Device busy" .
An attempt was made to use a system resource that was in use at the time, in a manner that would have conflicted with the request.
.It Er 17 EEXIST Em "File exists" .
An existing file was mentioned in an inappropriate context, for instance, as the new link name in a
.Xr link 2
system call.
.It Er 18 EXDEV Em "Cross-device link" .
A hard link to a file on another file system was attempted.
.It Er 19 ENODEV Em "Operation not supported by device" .
An attempt was made to apply an inappropriate function to a device, for example, trying to read a write-only device such as a printer.
.It Er 20 ENOTDIR Em "Not a directory" .
A component of the specified pathname existed, but it was not a directory, when a directory was expected.
.It Er 21 EISDIR Em "Is a directory" .
An attempt was made to open a directory with write mode specified.
.It Er 22 EINVAL Em "Invalid argument" .
Some invalid argument was supplied.
(For example, specifying an undefined signal to a
.Xr signal 3
function or a
.Xr kill 2
system call).
.It Er 23 ENFILE Em "Too many open files in system" .
The maximum number of open files allowable on the system has been reached and requests for an open cannot be satisfied until at least one has been closed.
.It Er 24 EMFILE Em "Too many open files" .
The maximum number of file descriptors allowable in the process has been reached and requests for an open cannot be satisfied until at least one has been closed.
The
.Xr getdtablesize 2
system call will obtain the current limit.
.It Er 25 ENOTTY Em "Inappropriate ioctl for device" .
A control function (see
.Xr ioctl 2 )
was attempted for a file or special device for which the operation was inappropriate.
.It Er 26 ETXTBSY Em "Text file busy" .
The new process was a pure procedure (shared text) file which was open for writing by another process, or while the pure procedure file was being executed an
.Xr open 2
call requested write access.
.It Er 27 EFBIG Em "File too large" .
The size of a file exceeded the maximum.
.It Er 28 ENOSPC Em "No space left on device" .
A
.Xr write 2
to an ordinary file, the creation of a directory or symbolic link, or the creation of a directory entry failed because no more disk blocks were available on the file system, or the allocation of an inode for a newly created file failed because no more inodes were available on the file system.
.It Er 29 ESPIPE Em "Illegal seek" .
An
.Xr lseek 2
system call was issued on a socket, pipe or
.Tn FIFO .
.It Er 30 EROFS Em "Read-only file system" .
An attempt was made to modify a file or directory on a file system that was read-only at the time.
.It Er 31 EMLINK Em "Too many links" .
The maximum allowable number of hard links to a single file has been exceeded (limit of 32767 hard links per file).
.It Er 32 EPIPE Em "Broken pipe" .
A write was attempted on a pipe, socket or
.Tn FIFO
for which there is no process to read the data.
.It Er 33 EDOM Em "Numerical argument out of domain" .
A numerical input argument was outside the defined domain of the mathematical function.
.It Er 34 ERANGE Em "Result too large" .
A numerical result of the function was too large to fit in the available space (perhaps exceeded precision). .It Er 35 EAGAIN Em "Resource temporarily unavailable" . This is a temporary condition and later calls to the same routine may complete normally. .It Er 36 EINPROGRESS Em "Operation now in progress" . An operation that takes a long time to complete (such as a .Xr connect 2 ) was attempted on a non-blocking object (see .Xr fcntl 2 ) . .It Er 37 EALREADY Em "Operation already in progress" . An operation was attempted on a non-blocking object that already had an operation in progress. .It Er 38 ENOTSOCK Em "Socket operation on non-socket" . Self-explanatory. .It Er 39 EDESTADDRREQ Em "Destination address required" . A required address was omitted from an operation on a socket. .It Er 40 EMSGSIZE Em "Message too long" . A message sent on a socket was larger than the internal message buffer or some other network limit. .It Er 41 EPROTOTYPE Em "Protocol wrong type for socket" . A protocol was specified that does not support the semantics of the socket type requested. For example, you cannot use the .Tn ARPA Internet .Tn UDP protocol with type .Dv SOCK_STREAM . .It Er 42 ENOPROTOOPT Em "Protocol not available" . A bad option or level was specified in a .Xr getsockopt 2 or .Xr setsockopt 2 call. .It Er 43 EPROTONOSUPPORT Em "Protocol not supported" . The protocol has not been configured into the system or no implementation for it exists. .It Er 44 ESOCKTNOSUPPORT Em "Socket type not supported" . The support for the socket type has not been configured into the system or no implementation for it exists. .It Er 45 EOPNOTSUPP Em "Operation not supported" . The attempted operation is not supported for the type of object referenced. Usually this occurs when a file descriptor refers to a file or socket that cannot support this operation, for example, trying to .Em accept a connection on a datagram socket. .It Er 46 EPFNOSUPPORT Em "Protocol family not supported" . The protocol family has not been configured into the system or no implementation for it exists. .It Er 47 EAFNOSUPPORT Em "Address family not supported by protocol family" . An address incompatible with the requested protocol was used. For example, you should not necessarily expect to be able to use .Tn NS addresses with .Tn ARPA Internet protocols. .It Er 48 EADDRINUSE Em "Address already in use" . Only one usage of each address is normally permitted. .It Er 49 EADDRNOTAVAIL Em "Can't assign requested address" . Normally results from an attempt to create a socket with an address not on this machine. .It Er 50 ENETDOWN Em "Network is down" . A socket operation encountered a dead network. .It Er 51 ENETUNREACH Em "Network is unreachable" . A socket operation was attempted to an unreachable network. .It Er 52 ENETRESET Em "Network dropped connection on reset" . The host you were connected to crashed and rebooted. .It Er 53 ECONNABORTED Em "Software caused connection abort" . A connection abort was caused internal to your host machine. .It Er 54 ECONNRESET Em "Connection reset by peer" . A connection was forcibly closed by a peer. This normally results from a loss of the connection on the remote socket due to a timeout or a reboot. .It Er 55 ENOBUFS Em "\&No buffer space available" . An operation on a socket or pipe was not performed because the system lacked sufficient buffer space or because a queue was full. .It Er 56 EISCONN Em "Socket is already connected" . 
A
.Xr connect 2
request was made on an already connected socket; or, a
.Xr sendto 2
or
.Xr sendmsg 2
request on a connected socket specified a destination when already connected.
.It Er 57 ENOTCONN Em "Socket is not connected" .
A request to send or receive data was disallowed because the socket was not connected and (when sending on a datagram socket) no address was supplied.
.It Er 58 ESHUTDOWN Em "Can't send after socket shutdown" .
A request to send data was disallowed because the socket had already been shut down with a previous
.Xr shutdown 2
call.
.It Er 60 ETIMEDOUT Em "Operation timed out" .
A
.Xr connect 2
or
.Xr send 2
request failed because the connected party did not properly respond after a period of time.
(The timeout period is dependent on the communication protocol.)
.It Er 61 ECONNREFUSED Em "Connection refused" .
No connection could be made because the target machine actively refused it.
This usually results from trying to connect to a service that is inactive on the foreign host.
.It Er 62 ELOOP Em "Too many levels of symbolic links" .
A path name lookup involved more than 32
.Pq Dv MAXSYMLINKS
symbolic links.
.It Er 63 ENAMETOOLONG Em "File name too long" .
A component of a path name exceeded
.Brq Dv NAME_MAX
characters, or an entire path name exceeded
.Brq Dv PATH_MAX
characters.
(See also the description of
.Dv _PC_NO_TRUNC
in
.Xr pathconf 2 . )
.It Er 64 EHOSTDOWN Em "Host is down" .
A socket operation failed because the destination host was down.
.It Er 65 EHOSTUNREACH Em "No route to host" .
A socket operation was attempted to an unreachable host.
.It Er 66 ENOTEMPTY Em "Directory not empty" .
A directory with entries other than
.Ql .\&
and
.Ql ..\&
was supplied to a remove directory or rename call.
.It Er 67 EPROCLIM Em "Too many processes" .
.It Er 68 EUSERS Em "Too many users" .
The quota system ran out of table entries.
.It Er 69 EDQUOT Em "Disc quota exceeded" .
A
.Xr write 2
to an ordinary file, the creation of a directory or symbolic link, or the creation of a directory entry failed because the user's quota of disk blocks was exhausted, or the allocation of an inode for a newly created file failed because the user's quota of inodes was exhausted.
.It Er 70 ESTALE Em "Stale NFS file handle" .
An attempt was made to access an open file (on an
.Tn NFS
file system) which is now unavailable as referenced by the file descriptor.
This may indicate the file was deleted on the
.Tn NFS
server or some other catastrophic event occurred.
.It Er 72 EBADRPC Em "RPC struct is bad" .
Exchange of
.Tn RPC
information was unsuccessful.
.It Er 73 ERPCMISMATCH Em "RPC version wrong" .
The version of
.Tn RPC
on the remote peer is not compatible with the local version.
.It Er 74 EPROGUNAVAIL Em "RPC prog. not avail" .
The requested program is not registered on the remote host.
.It Er 75 EPROGMISMATCH Em "Program version wrong" .
The requested version of the program is not available on the remote host
.Pq Tn RPC .
.It Er 76 EPROCUNAVAIL Em "Bad procedure for program" .
An
.Tn RPC
call was attempted for a procedure which does not exist in the remote program.
.It Er 77 ENOLCK Em "No locks available" .
A system-imposed limit on the number of simultaneous file locks was reached.
.It Er 78 ENOSYS Em "Function not implemented" .
Attempted a system call that is not available on this system.
.It Er 79 EFTYPE Em "Inappropriate file type or format" .
The file was the wrong type for the operation, or a data file had the wrong format.
.It Er 80 EAUTH Em "Authentication error" .
Attempted to use an invalid authentication ticket to mount an
.Tn NFS
file system.
.It Er 81 ENEEDAUTH Em "Need authenticator" .
An authentication ticket must be obtained before the given
.Tn NFS
file system may be mounted.
.It Er 82 EIDRM Em "Identifier removed" .
An IPC identifier was removed while the current process was waiting on it.
.It Er 83 ENOMSG Em "No message of desired type" .
An IPC message queue does not contain a message of the desired type, or a message catalog does not contain the requested message.
.It Er 84 EOVERFLOW Em "Value too large to be stored in data type" .
A numerical result of the function was too large to be stored in the caller-provided space.
.It Er 85 ECANCELED Em "Operation canceled" .
The scheduled operation was canceled.
.It Er 86 EILSEQ Em "Illegal byte sequence" .
While decoding a multibyte character, the function encountered an invalid or incomplete sequence of bytes, or the given wide character is invalid.
.It Er 87 ENOATTR Em "Attribute not found" .
The specified extended attribute does not exist.
.It Er 88 EDOOFUS Em "Programming error" .
A function or API is being abused in a way which could only be detected at run-time.
.It Er 89 EBADMSG Em "Bad message" .
A corrupted message was detected.
.It Er 90 EMULTIHOP Em "Multihop attempted" .
This error code is unused, but present for compatibility with other systems.
.It Er 91 ENOLINK Em "Link has been severed" .
This error code is unused, but present for compatibility with other systems.
.It Er 92 EPROTO Em "Protocol error" .
A device or socket encountered an unrecoverable protocol error.
.It Er 93 ENOTCAPABLE Em "Capabilities insufficient" .
An operation on a capability file descriptor requires greater privilege than the capability allows.
.It Er 94 ECAPMODE Em "Not permitted in capability mode" .
The system call or operation is not permitted for capability mode processes.
.It Er 95 ENOTRECOVERABLE Em "State not recoverable" .
The state protected by a robust mutex is not recoverable.
.It Er 96 EOWNERDEAD Em "Previous owner died" .
The owner of a robust mutex terminated while holding the mutex lock.
.El
.Sh DEFINITIONS
.Bl -tag -width Ds
.It Process ID .
Each active process in the system is uniquely identified by a non-negative integer called a process ID.
The range of this ID is from 0 to 99999.
.It Parent process ID
A new process is created by a currently active process (see
.Xr fork 2 ) .
The parent process ID of a process is initially the process ID of its creator.
If the creating process exits,
-the parent process ID of each child is set to the ID of a system process,
+the parent process ID of each child is set to the ID of the calling process's
+reaper (see
+.Xr procctl 2 ) ,
+normally
.Xr init 8 .
.It Process Group
Each active process is a member of a process group that is identified by a non-negative integer called the process group ID.
This is the process ID of the group leader.
This grouping permits the signaling of related processes (see
.Xr termios 4 )
and the job control mechanisms of
.Xr csh 1 .
.It Session
A session is a set of one or more process groups.
A session is created by a successful call to
.Xr setsid 2 ,
which causes the caller to become the only member of the only process group in the new session.
.It Session leader
A process that has created a new session by a successful call to
.Xr setsid 2
is known as a session leader.
Only a session leader may acquire a terminal as its controlling terminal (see
.Xr termios 4 ) .
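.Pp
The reaper facility mentioned above under
.Em "Parent process ID"
is acquired through
.Xr procctl 2 .
A minimal sketch (error handling abbreviated; the process remains the reaper for its descendants until it exits or releases the status):
.Bd -literal -offset indent
#include <sys/procctl.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	/*
	 * Make the current process the reaper for its descendants;
	 * orphaned grandchildren are then reparented to this
	 * process rather than to init(8).
	 */
	if (procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL) == -1)
		return (1);
	/* ... fork children; wait(2) also collects orphans ... */
	return (0);
}
.Ed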
.It Controlling process
A session leader with a controlling terminal is a controlling process.
.It Controlling terminal
A terminal that is associated with a session is known as the controlling terminal for that session and its members.
.It "Terminal Process Group ID"
A terminal may be acquired by a session leader as its controlling terminal.
Once a terminal is associated with a session, any of the process groups within the session may be placed into the foreground by setting the terminal process group ID to the ID of the process group.
This facility is used to arbitrate between multiple jobs contending for the same terminal (see
.Xr csh 1
and
.Xr tty 4 ) .
.It "Orphaned Process Group"
A process group is considered to be
.Em orphaned
if it is not under the control of a job control shell.
More precisely, a process group is orphaned when none of its members has a parent process that is in the same session as the group, but is in a different process group.
Note that when a process exits, the parent process for its children
-is changed to be
+is normally changed to be
.Xr init 8 ,
which is in a separate session.
Not all members of an orphaned process group are necessarily orphaned processes (those whose creating process has exited).
The process group of a session leader is orphaned by definition.
.It "Real User ID and Real Group ID"
Each user on the system is identified by a positive integer termed the real user ID.
.Pp
Each user is also a member of one or more groups.
One of these groups is distinguished from others and used in implementing accounting facilities.
The positive integer corresponding to this distinguished group is termed the real group ID.
.Pp
All processes have a real user ID and real group ID.
These are initialized from the equivalent attributes of the process that created them.
.It "Effective User Id, Effective Group Id, and Group Access List"
Access to system resources is governed by two values: the effective user ID, and the group access list.
The first member of the group access list is also known as the effective group ID.
(In POSIX.1, the group access list is known as the set of supplementary group IDs, and it is unspecified whether the effective group ID is a member of the list.)
.Pp
The effective user ID and effective group ID are initially the process's real user ID and real group ID respectively.
Either may be modified through execution of a set-user-ID or set-group-ID file (possibly by one of its ancestors) (see
.Xr execve 2 ) .
By convention, the effective group ID (the first member of the group access list) is duplicated, so that the execution of a set-group-ID program does not result in the loss of the original (real) group ID.
.Pp
The group access list is a set of group IDs used only in determining resource accessibility.
Access checks are performed as described below in ``File Access Permissions''.
.It "Saved Set User ID and Saved Set Group ID"
When a process executes a new file, the effective user ID is set to the owner of the file if the file is set-user-ID, and the effective group ID (first element of the group access list) is set to the group of the file if the file is set-group-ID.
The effective user ID of the process is then recorded as the saved set-user-ID, and the effective group ID of the process is recorded as the saved set-group-ID.
These values may be used to regain those values as the effective user or group ID after reverting to the real ID (see
.Xr setuid 2 ) .
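.Pp
For example, a set-user-ID program may drop its privileges for most of its run and regain them only when needed (a minimal sketch; error checking omitted):
.Bd -literal -offset indent
#include <unistd.h>

int
main(void)
{
	uid_t euid;

	euid = geteuid();	/* the privileged, saved set-user-ID */
	(void)seteuid(getuid());	/* revert to the real user ID */
	/* ... unprivileged work ... */
	(void)seteuid(euid);	/* regain the saved set-user-ID */
	return (0);
}
.Ed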
.Pp
(In POSIX.1, the saved set-user-ID and saved set-group-ID are optional, and are used in setuid and setgid, but this does not work as desired for the super-user.)
.It Super-user
A process is recognized as a
.Em super-user
process and is granted special privileges if its effective user ID is 0.
.It Descriptor
An integer assigned by the system when a file is referenced by
.Xr open 2
or
.Xr dup 2 ,
or when a pipe or socket is created by
.Xr pipe 2 ,
.Xr socket 2
or
.Xr socketpair 2 ,
which uniquely identifies an access path to that file or socket from a given process or any of its children.
.It File Name
Names consisting of up to
.Brq Dv NAME_MAX
characters may be used to name an ordinary file, special file, or directory.
.Pp
These characters may be arbitrary eight-bit values, excluding
.Dv NUL
.Tn ( ASCII
0) and the
.Ql \&/
character (slash,
.Tn ASCII
47).
.Pp
Note that it is generally unwise to use
.Ql \&* ,
.Ql \&? ,
.Ql \&[
or
.Ql \&]
as part of file names because of the special meaning attached to these characters by the shell.
.It Path Name
A path name is a
.Dv NUL Ns -terminated
character string starting with an optional slash
.Ql \&/ ,
followed by zero or more directory names separated by slashes, optionally followed by a file name.
The total length of a path name must be less than
.Brq Dv PATH_MAX
characters.
(On some systems, this limit may be infinite.)
.Pp
If a path name begins with a slash, the path search begins at the
.Em root
directory.
Otherwise, the search begins from the current working directory.
A slash by itself names the root directory.
An empty pathname refers to the current directory.
.It Directory
A directory is a special type of file that contains entries that are references to other files.
Directory entries are called links.
By convention, a directory contains at least two links,
.Ql .\&
and
.Ql \&.. ,
referred to as
.Em dot
and
.Em dot-dot
respectively.
Dot refers to the directory itself and dot-dot refers to its parent directory.
.It "Root Directory and Current Working Directory"
Each process has associated with it a concept of a root directory and a current working directory for the purpose of resolving path name searches.
A process's root directory need not be the root directory of the root file system.
.It File Access Permissions
Every file in the file system has a set of access permissions.
These permissions are used in determining whether a process may perform a requested operation on the file (such as opening a file for writing).
Access permissions are established at the time a file is created.
They may be changed at some later time through the
.Xr chmod 2
call.
.Pp
File access is broken down according to whether a file may be: read, written, or executed.
Directory files use the execute permission to control whether the directory may be searched.
.Pp
File access permissions are interpreted by the system as they apply to three different classes of users: the owner of the file, those users in the file's group, anyone else.
Every file has an independent set of access permissions for each of these classes.
When an access check is made, the system decides if permission should be granted by checking the access information applicable to the caller.
.Pp
Read, write, and execute/search permissions on a file are granted to a process if:
.Pp
The process's effective user ID is that of the super-user.
(Note: even the super-user cannot execute a non-executable file.)
.Pp The process's effective user ID matches the user ID of the owner of the file and the owner permissions allow the access. .Pp The process's effective user ID does not match the user ID of the owner of the file, and either the process's effective group ID matches the group ID of the file, or the group ID of the file is in the process's group access list, and the group permissions allow the access. .Pp Neither the effective user ID nor effective group ID and group access list of the process match the corresponding user ID and group ID of the file, but the permissions for ``other users'' allow access. .Pp Otherwise, permission is denied. .It Sockets and Address Families A socket is an endpoint for communication between processes. Each socket has queues for sending and receiving data. .Pp Sockets are typed according to their communications properties. These properties include whether messages sent and received at a socket require the name of the partner, whether communication is reliable, the format used in naming message recipients, etc. .Pp Each instance of the system supports some collection of socket types; consult .Xr socket 2 for more information about the types available and their properties. .Pp Each instance of the system supports some number of sets of communications protocols. Each protocol set supports addresses of a certain format. An Address Family is the set of addresses for a specific group of protocols. Each socket has an address chosen from the address family in which the socket was created. .El .Sh SEE ALSO .Xr intro 3 , .Xr perror 3 Index: projects/clang390-import/lib/libcasper/services/cap_dns/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_dns/Makefile (revision 305686) +++ projects/clang390-import/lib/libcasper/services/cap_dns/Makefile (revision 305687) @@ -1,18 +1,24 @@ # $FreeBSD$ +.include + PACKAGE=libcasper LIB= cap_dns SHLIB_MAJOR= 0 SHLIBDIR?= /lib/casper INCSDIR?= ${INCLUDEDIR}/casper SRCS= cap_dns.c INCS= cap_dns.h LIBADD= nv CFLAGS+=-I${.CURDIR} + +.if ${MK_TESTS} != "no" +SUBDIR+= tests +.endif .include Index: projects/clang390-import/lib/libcasper/services/cap_dns/tests/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_dns/tests/Makefile (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_dns/tests/Makefile (revision 305687) @@ -0,0 +1,11 @@ +# $FreeBSD$ + +TAP_TESTS_C= dns_test + +LIBADD+= casper +LIBADD+= cap_dns +LIBADD+= nv + +WARNS?= 3 + +.include Property changes on: projects/clang390-import/lib/libcasper/services/cap_dns/tests/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_dns/tests/dns_test.c =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_dns/tests/dns_test.c (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_dns/tests/dns_test.c (revision 305687) @@ -0,0 +1,702 @@ +/*- + * Copyright (c) 2013 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Pawel Jakub Dawidek under sponsorship from + * the FreeBSD Foundation. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +static int ntest = 1; + +#define CHECK(expr) do { \ + if ((expr)) \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + else \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + ntest++; \ +} while (0) +#define CHECKX(expr) do { \ + if ((expr)) { \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + } else { \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + exit(1); \ + } \ + ntest++; \ +} while (0) + +#define GETHOSTBYNAME 0x01 +#define GETHOSTBYNAME2_AF_INET 0x02 +#define GETHOSTBYNAME2_AF_INET6 0x04 +#define GETHOSTBYADDR_AF_INET 0x08 +#define GETHOSTBYADDR_AF_INET6 0x10 +#define GETADDRINFO_AF_UNSPEC 0x20 +#define GETADDRINFO_AF_INET 0x40 +#define GETADDRINFO_AF_INET6 0x80 + +static bool +addrinfo_compare(struct addrinfo *ai0, struct addrinfo *ai1) +{ + struct addrinfo *at0, *at1; + + if (ai0 == NULL && ai1 == NULL) + return (true); + if (ai0 == NULL || ai1 == NULL) + return (false); + + at0 = ai0; + at1 = ai1; + while (true) { + if ((at0->ai_flags == at1->ai_flags) && + (at0->ai_family == at1->ai_family) && + (at0->ai_socktype == at1->ai_socktype) && + (at0->ai_protocol == at1->ai_protocol) && + (at0->ai_addrlen == at1->ai_addrlen) && + (memcmp(at0->ai_addr, at1->ai_addr, + at0->ai_addrlen) == 0)) { + if (at0->ai_canonname != NULL && + at1->ai_canonname != NULL) { + if (strcmp(at0->ai_canonname, + at1->ai_canonname) != 0) { + return (false); + } + } + + if (at0->ai_canonname == NULL && + at1->ai_canonname != NULL) { + return (false); + } + if (at0->ai_canonname != NULL && + at1->ai_canonname == NULL) { + return (false); + } + + if (at0->ai_next == NULL && at1->ai_next == NULL) + return (true); + if (at0->ai_next == NULL || at1->ai_next == NULL) + return (false); + + at0 = at0->ai_next; + at1 = at1->ai_next; + } else { + return (false); + } + } + + /* NOTREACHED */ + fprintf(stderr, "Dead code reached in addrinfo_compare()\n"); + exit(1); +} + +static bool +hostent_aliases_compare(char **aliases0, char **aliases1) +{ + int i0, i1; + + if (aliases0 == NULL && aliases1 == NULL) + return (true); + 
if (aliases0 == NULL || aliases1 == NULL) + return (false); + + for (i0 = 0; aliases0[i0] != NULL; i0++) { + for (i1 = 0; aliases1[i1] != NULL; i1++) { + if (strcmp(aliases0[i0], aliases1[i1]) == 0) + break; + } + if (aliases1[i1] == NULL) + return (false); + } + + return (true); +} + +static bool +hostent_addr_list_compare(char **addr_list0, char **addr_list1, int length) +{ + int i0, i1; + + if (addr_list0 == NULL && addr_list1 == NULL) + return (true); + if (addr_list0 == NULL || addr_list1 == NULL) + return (false); + + for (i0 = 0; addr_list0[i0] != NULL; i0++) { + for (i1 = 0; addr_list1[i1] != NULL; i1++) { + if (memcmp(addr_list0[i0], addr_list1[i1], length) == 0) + break; + } + if (addr_list1[i1] == NULL) + return (false); + } + + return (true); +} + +static bool +hostent_compare(const struct hostent *hp0, const struct hostent *hp1) +{ + + if (hp0 == NULL && hp1 != NULL) + return (true); + + if (hp0 == NULL || hp1 == NULL) + return (false); + + if (hp0->h_name != NULL || hp1->h_name != NULL) { + if (hp0->h_name == NULL || hp1->h_name == NULL) + return (false); + if (strcmp(hp0->h_name, hp1->h_name) != 0) + return (false); + } + + if (!hostent_aliases_compare(hp0->h_aliases, hp1->h_aliases)) + return (false); + if (!hostent_aliases_compare(hp1->h_aliases, hp0->h_aliases)) + return (false); + + if (hp0->h_addrtype != hp1->h_addrtype) + return (false); + + if (hp0->h_length != hp1->h_length) + return (false); + + if (!hostent_addr_list_compare(hp0->h_addr_list, hp1->h_addr_list, + hp0->h_length)) { + return (false); + } + if (!hostent_addr_list_compare(hp1->h_addr_list, hp0->h_addr_list, + hp0->h_length)) { + return (false); + } + + return (true); +} + +static unsigned int +runtest(cap_channel_t *capdns) +{ + unsigned int result; + struct addrinfo *ais, *aic, hints, *hintsp; + struct hostent *hps, *hpc; + struct in_addr ip4; + struct in6_addr ip6; + + result = 0; + + hps = gethostbyname("example.com"); + if (hps == NULL) + fprintf(stderr, "Unable to resolve %s IPv4.\n", "example.com"); + hpc = cap_gethostbyname(capdns, "example.com"); + if (hostent_compare(hps, hpc)) + result |= GETHOSTBYNAME; + + hps = gethostbyname2("example.com", AF_INET); + if (hps == NULL) + fprintf(stderr, "Unable to resolve %s IPv4.\n", "example.com"); + hpc = cap_gethostbyname2(capdns, "example.com", AF_INET); + if (hostent_compare(hps, hpc)) + result |= GETHOSTBYNAME2_AF_INET; + + hps = gethostbyname2("example.com", AF_INET6); + if (hps == NULL) + fprintf(stderr, "Unable to resolve %s IPv6.\n", "example.com"); + hpc = cap_gethostbyname2(capdns, "example.com", AF_INET6); + if (hostent_compare(hps, hpc)) + result |= GETHOSTBYNAME2_AF_INET6; + + hints.ai_flags = 0; + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = 0; + hints.ai_protocol = 0; + hints.ai_addrlen = 0; + hints.ai_addr = NULL; + hints.ai_canonname = NULL; + hints.ai_next = NULL; + + hintsp = &hints; + + if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { + fprintf(stderr, + "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", + gai_strerror(errno)); + } + if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { + if (addrinfo_compare(ais, aic)) + result |= GETADDRINFO_AF_UNSPEC; + freeaddrinfo(ais); + freeaddrinfo(aic); + } + + hints.ai_family = AF_INET; + if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { + fprintf(stderr, + "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", + gai_strerror(errno)); + } + if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { + if 
(addrinfo_compare(ais, aic)) + result |= GETADDRINFO_AF_INET; + freeaddrinfo(ais); + freeaddrinfo(aic); + } + + hints.ai_family = AF_INET6; + if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { + fprintf(stderr, + "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", + gai_strerror(errno)); + } + if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { + if (addrinfo_compare(ais, aic)) + result |= GETADDRINFO_AF_INET6; + freeaddrinfo(ais); + freeaddrinfo(aic); + } + + /* + * 8.8.178.135 is IPv4 address of freefall.freebsd.org + * as of 27 October 2013. + */ + inet_pton(AF_INET, "8.8.178.135", &ip4); + hps = gethostbyaddr(&ip4, sizeof(ip4), AF_INET); + if (hps == NULL) + fprintf(stderr, "Unable to resolve %s.\n", "8.8.178.135"); + hpc = cap_gethostbyaddr(capdns, &ip4, sizeof(ip4), AF_INET); + if (hostent_compare(hps, hpc)) + result |= GETHOSTBYADDR_AF_INET; + + /* + * 2001:1900:2254:206c::16:87 is IPv6 address of freefall.freebsd.org + * as of 27 October 2013. + */ + inet_pton(AF_INET6, "2001:1900:2254:206c::16:87", &ip6); + hps = gethostbyaddr(&ip6, sizeof(ip6), AF_INET6); + if (hps == NULL) { + fprintf(stderr, "Unable to resolve %s.\n", + "2001:1900:2254:206c::16:87"); + } + hpc = cap_gethostbyaddr(capdns, &ip6, sizeof(ip6), AF_INET6); + if (hostent_compare(hps, hpc)) + result |= GETHOSTBYADDR_AF_INET6; + + return (result); +} + +int +main(void) +{ + cap_channel_t *capcas, *capdns, *origcapdns; + const char *types[2]; + int families[2]; + + printf("1..91\n"); + + capcas = cap_init(); + CHECKX(capcas != NULL); + + origcapdns = capdns = cap_service_open(capcas, "system.dns"); + CHECKX(capdns != NULL); + + cap_close(capcas); + + /* No limits set. */ + + CHECK(runtest(capdns) == + (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6 | + GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_UNSPEC | GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); + + /* + * Allow: + * type: NAME, ADDR + * family: AF_INET, AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + + CHECK(runtest(capdns) == + (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6 | + GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); + + cap_close(capdns); + + /* + * Allow: + * type: NAME + * family: AF_INET, AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + + CHECK(runtest(capdns) == + (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6)); + + cap_close(capdns); + + /* + * Allow: + * type: ADDR + * family: AF_INET, AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + 
families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + + CHECK(runtest(capdns) == + (GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); + + cap_close(capdns); + + /* + * Allow: + * type: NAME, ADDR + * family: AF_INET + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == + (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYADDR_AF_INET | + GETADDRINFO_AF_INET)); + + cap_close(capdns); + + /* + * Allow: + * type: NAME, ADDR + * family: AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == + (GETHOSTBYNAME2_AF_INET6 | GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_INET6)); + + cap_close(capdns); + + /* Below we also test further limiting capability. */ + + /* + * Allow: + * type: NAME + * family: AF_INET + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET)); + + cap_close(capdns); + + /* + * Allow: + * type: NAME + * family: AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == 
GETHOSTBYNAME2_AF_INET6); + + cap_close(capdns); + + /* + * Allow: + * type: ADDR + * family: AF_INET + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET | GETADDRINFO_AF_INET)); + + cap_close(capdns); + + /* + * Allow: + * type: ADDR + * family: AF_INET6 + */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == 0); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == 0); + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + types[1] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + families[1] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_INET6)); + + cap_close(capdns); + + /* Trying to raise the limits. */ + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(cap_dns_type_limit(capdns, NULL, 0) == -1 && + errno == ENOTCAPABLE); + CHECK(cap_dns_family_limit(capdns, NULL, 0) == -1 && + errno == ENOTCAPABLE); + + /* Do the limits still hold?
*/ + CHECK(runtest(capdns) == (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET)); + + cap_close(capdns); + + capdns = cap_clone(origcapdns); + CHECK(capdns != NULL); + + types[0] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 1) == 0); + families[0] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 1) == 0); + + types[0] = "NAME"; + types[1] = "ADDR"; + CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + families[1] = AF_INET6; + CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && + errno == ENOTCAPABLE); + + types[0] = "NAME"; + CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && + errno == ENOTCAPABLE); + families[0] = AF_INET; + CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(cap_dns_type_limit(capdns, NULL, 0) == -1 && + errno == ENOTCAPABLE); + CHECK(cap_dns_family_limit(capdns, NULL, 0) == -1 && + errno == ENOTCAPABLE); + + /* Do the limits still hold? */ + CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET6 | + GETADDRINFO_AF_INET6)); + + cap_close(capdns); + + cap_close(origcapdns); + + exit(0); +} Property changes on: projects/clang390-import/lib/libcasper/services/cap_dns/tests/dns_test.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_grp/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_grp/Makefile (revision 305686) +++ projects/clang390-import/lib/libcasper/services/cap_grp/Makefile (revision 305687) @@ -1,18 +1,24 @@ # $FreeBSD$ +.include + PACKAGE=libcasper LIB= cap_grp SHLIB_MAJOR= 0 SHLIBDIR?= /lib/casper INCSDIR?= ${INCLUDEDIR}/casper SRCS= cap_grp.c INCS= cap_grp.h LIBADD= nv CFLAGS+=-I${.CURDIR} + +.if ${MK_TESTS} != "no" +SUBDIR+= tests +.endif .include Index: projects/clang390-import/lib/libcasper/services/cap_grp/tests/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_grp/tests/Makefile (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_grp/tests/Makefile (revision 305687) @@ -0,0 +1,11 @@ +# $FreeBSD$ + +TAP_TESTS_C= grp_test + +LIBADD+= casper +LIBADD+= cap_grp +LIBADD+= nv + +WARNS?= 3 + +.include Property changes on: projects/clang390-import/lib/libcasper/services/cap_grp/tests/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_grp/tests/grp_test.c =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_grp/tests/grp_test.c (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_grp/tests/grp_test.c (revision 305687) @@ -0,0 +1,1550 @@ +/*- + * Copyright (c) 2013 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Pawel Jakub Dawidek under sponsorship from + * the FreeBSD Foundation. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +static int ntest = 1; + +#define CHECK(expr) do { \ + if ((expr)) \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + else \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + ntest++; \ +} while (0) +#define CHECKX(expr) do { \ + if ((expr)) { \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + } else { \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + exit(1); \ + } \ + ntest++; \ +} while (0) + +#define GID_WHEEL 0 +#define GID_OPERATOR 5 + +#define GETGRENT0 0x0001 +#define GETGRENT1 0x0002 +#define GETGRENT2 0x0004 +#define GETGRENT (GETGRENT0 | GETGRENT1 | GETGRENT2) +#define GETGRENT_R0 0x0008 +#define GETGRENT_R1 0x0010 +#define GETGRENT_R2 0x0020 +#define GETGRENT_R (GETGRENT_R0 | GETGRENT_R1 | GETGRENT_R2) +#define GETGRNAM 0x0040 +#define GETGRNAM_R 0x0080 +#define GETGRGID 0x0100 +#define GETGRGID_R 0x0200 +#define SETGRENT 0x0400 + +static bool +group_mem_compare(char **mem0, char **mem1) +{ + int i0, i1; + + if (mem0 == NULL && mem1 == NULL) + return (true); + if (mem0 == NULL || mem1 == NULL) + return (false); + + for (i0 = 0; mem0[i0] != NULL; i0++) { + for (i1 = 0; mem1[i1] != NULL; i1++) { + if (strcmp(mem0[i0], mem1[i1]) == 0) + break; + } + if (mem1[i1] == NULL) + return (false); + } + + return (true); +} + +static bool +group_compare(const struct group *grp0, const struct group *grp1) +{ + + if (grp0 == NULL && grp1 == NULL) + return (true); + if (grp0 == NULL || grp1 == NULL) + return (false); + + if (strcmp(grp0->gr_name, grp1->gr_name) != 0) + return (false); + + if (grp0->gr_passwd != NULL || grp1->gr_passwd != NULL) { + if (grp0->gr_passwd == NULL || grp1->gr_passwd == NULL) + return (false); + if (strcmp(grp0->gr_passwd, grp1->gr_passwd) != 0) + return (false); + } + + if (grp0->gr_gid != grp1->gr_gid) + return (false); + + if (!group_mem_compare(grp0->gr_mem, grp1->gr_mem)) + return (false); + + return (true); +} + +static unsigned int +runtest_cmds(cap_channel_t *capgrp) +{ + char bufs[1024], bufc[1024]; + unsigned int result; + struct group *grps, *grpc; + struct group sts, stc; + + 
result = 0; + + (void)setgrent(); + if (cap_setgrent(capgrp) == 1) + result |= SETGRENT; + + grps = getgrent(); + grpc = cap_getgrent(capgrp); + if (group_compare(grps, grpc)) { + result |= GETGRENT0; + grps = getgrent(); + grpc = cap_getgrent(capgrp); + if (group_compare(grps, grpc)) + result |= GETGRENT1; + } + + getgrent_r(&sts, bufs, sizeof(bufs), &grps); + cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); + if (group_compare(grps, grpc)) { + result |= GETGRENT_R0; + getgrent_r(&sts, bufs, sizeof(bufs), &grps); + cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); + if (group_compare(grps, grpc)) + result |= GETGRENT_R1; + } + + (void)setgrent(); + if (cap_setgrent(capgrp) == 1) + result |= SETGRENT; + + getgrent_r(&sts, bufs, sizeof(bufs), &grps); + cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); + if (group_compare(grps, grpc)) + result |= GETGRENT_R2; + + grps = getgrent(); + grpc = cap_getgrent(capgrp); + if (group_compare(grps, grpc)) + result |= GETGRENT2; + + grps = getgrnam("wheel"); + grpc = cap_getgrnam(capgrp, "wheel"); + if (group_compare(grps, grpc)) { + grps = getgrnam("operator"); + grpc = cap_getgrnam(capgrp, "operator"); + if (group_compare(grps, grpc)) + result |= GETGRNAM; + } + + getgrnam_r("wheel", &sts, bufs, sizeof(bufs), &grps); + cap_getgrnam_r(capgrp, "wheel", &stc, bufc, sizeof(bufc), &grpc); + if (group_compare(grps, grpc)) { + getgrnam_r("operator", &sts, bufs, sizeof(bufs), &grps); + cap_getgrnam_r(capgrp, "operator", &stc, bufc, sizeof(bufc), + &grpc); + if (group_compare(grps, grpc)) + result |= GETGRNAM_R; + } + + grps = getgrgid(GID_WHEEL); + grpc = cap_getgrgid(capgrp, GID_WHEEL); + if (group_compare(grps, grpc)) { + grps = getgrgid(GID_OPERATOR); + grpc = cap_getgrgid(capgrp, GID_OPERATOR); + if (group_compare(grps, grpc)) + result |= GETGRGID; + } + + getgrgid_r(GID_WHEEL, &sts, bufs, sizeof(bufs), &grps); + cap_getgrgid_r(capgrp, GID_WHEEL, &stc, bufc, sizeof(bufc), &grpc); + if (group_compare(grps, grpc)) { + getgrgid_r(GID_OPERATOR, &sts, bufs, sizeof(bufs), &grps); + cap_getgrgid_r(capgrp, GID_OPERATOR, &stc, bufc, sizeof(bufc), + &grpc); + if (group_compare(grps, grpc)) + result |= GETGRGID_R; + } + + return (result); +} + +static void +test_cmds(cap_channel_t *origcapgrp) +{ + cap_channel_t *capgrp; + const char *cmds[7], *fields[4], *names[5]; + gid_t gids[5]; + + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + + names[0] = "wheel"; + names[1] = "daemon"; + names[2] = "kmem"; + names[3] = "sys"; + names[4] = "operator"; + + gids[0] = 0; + gids[1] = 1; + gids[2] = 2; + gids[3] = 3; + gids[4] = 5; + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == 0); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * 
fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == 0); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: setgrent + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "getgrent"; + cmds[1] = "getgrent_r"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "setgrent"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (GETGRENT0 | GETGRENT1 | GETGRENT_R0 | + GETGRENT_R1 | GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: setgrent + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "getgrent"; + cmds[1] = "getgrent_r"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "setgrent"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (GETGRENT0 | GETGRENT1 | GETGRENT_R0 | + GETGRENT_R1 | GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrent + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent_r"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + 
CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrent"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT_R2 | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrent + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent_r"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrent"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT_R2 | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrent_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrent_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT0 | GETGRENT1 | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrnam, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrent_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrnam"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrent_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) 
== 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT0 | GETGRENT1 | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrnam + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrnam"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam_r, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrnam + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam_r"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrnam"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM_R | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, + * getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrnam_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrnam_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, + 
* getgrgid, getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrnam_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrgid"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrnam_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRGID | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrgid + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrgid"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid_r + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrgid + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrgid"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID_R)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: wheel, daemon, kmem, sys, operator + * gids: + * Disallow: + * cmds: getgrgid_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + 
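/* + * The pattern throughout test_cmds(): clone the unrestricted channel, + * allow a six-command subset, then verify that widening the set back + * to all seven commands, or naming the dropped command on its own, + * fails with ENOTCAPABLE before runtest_cmds() probes what still + * works. + */ +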
CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID)); + + cap_close(capgrp); + + /* + * Allow: + * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, + * getgrgid + * fields: gr_name, gr_passwd, gr_gid, gr_mem + * groups: + * names: + * gids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getgrgid_r + * fields: + * groups: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); + cmds[0] = "setgrent"; + cmds[1] = "getgrent"; + cmds[2] = "getgrent_r"; + cmds[3] = "getgrnam"; + cmds[4] = "getgrnam_r"; + cmds[5] = "getgrgid"; + cmds[6] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getgrgid_r"; + CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID)); + + cap_close(capgrp); +} + +#define GR_NAME 0x01 +#define GR_PASSWD 0x02 +#define GR_GID 0x04 +#define GR_MEM 0x08 + +static unsigned int +group_fields(const struct group *grp) +{ + unsigned int result; + + result = 0; + + if (grp->gr_name != NULL && grp->gr_name[0] != '\0') + result |= GR_NAME; + + if (grp->gr_passwd != NULL && grp->gr_passwd[0] != '\0') + result |= GR_PASSWD; + + if (grp->gr_gid != (gid_t)-1) + result |= GR_GID; + + if (grp->gr_mem != NULL && grp->gr_mem[0] != NULL) + result |= GR_MEM; + + return (result); +} + +static bool +runtest_fields(cap_channel_t *capgrp, unsigned int expected) +{ + char buf[1024]; + struct group *grp; + struct group st; + + (void)cap_setgrent(capgrp); + grp = cap_getgrent(capgrp); + if (group_fields(grp) != expected) + return (false); + + (void)cap_setgrent(capgrp); + cap_getgrent_r(capgrp, &st, buf, sizeof(buf), &grp); + if (group_fields(grp) != expected) + return (false); + + grp = cap_getgrnam(capgrp, "wheel"); + if (group_fields(grp) != expected) + return (false); + + cap_getgrnam_r(capgrp, "wheel", &st, buf, sizeof(buf), &grp); + if (group_fields(grp) != expected) + return (false); + + grp = cap_getgrgid(capgrp, GID_WHEEL); + if (group_fields(grp) != expected) + return (false); + + cap_getgrgid_r(capgrp, GID_WHEEL, &st, buf, sizeof(buf), &grp); + if (group_fields(grp) != expected) + return (false); + + return (true); +} + +static void +test_fields(cap_channel_t *origcapgrp) +{ + cap_channel_t *capgrp; + const char *fields[4]; + + /* No limits. 
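With no cap_grp_limit_fields() call on the channel every gr_* member is expected to come back populated from all six lookup functions.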
*/ + + CHECK(runtest_fields(origcapgrp, GR_NAME | GR_PASSWD | GR_GID | GR_MEM)); + + /* + * Allow: + * fields: gr_name, gr_passwd, gr_gid, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_GID | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_passwd, gr_gid, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_passwd"; + fields[1] = "gr_gid"; + fields[2] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_PASSWD | GR_GID | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_gid, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_gid"; + fields[2] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_passwd"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_GID | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_passwd, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_passwd, gr_gid + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_GID)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_passwd + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_gid + */ + capgrp = cap_clone(origcapgrp); + 
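/* + * Field limits may only shrink: with the clone restricted to two + * fields, asking for the full four-field set again, or for an excluded + * field such as gr_mem on its own, must fail with ENOTCAPABLE. + */ +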
CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_GID)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_name, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_name"; + fields[1] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_passwd"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_NAME | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_passwd, gr_gid + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_passwd"; + fields[1] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_PASSWD | GR_GID)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_passwd, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_passwd"; + fields[1] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_gid"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_PASSWD | GR_MEM)); + + cap_close(capgrp); + + /* + * Allow: + * fields: gr_gid, gr_mem + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + fields[0] = "gr_gid"; + fields[1] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); + fields[0] = "gr_name"; + fields[1] = "gr_passwd"; + fields[2] = "gr_gid"; + fields[3] = "gr_mem"; + CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "gr_passwd"; + CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(capgrp, GR_GID | GR_MEM)); + + cap_close(capgrp); +} + +static bool +runtest_groups(cap_channel_t *capgrp, const char **names, const gid_t *gids, + size_t ngroups) +{ + char buf[1024]; + struct group *grp; + struct group st; + unsigned int i, got; + + (void)cap_setgrent(capgrp); + got = 0; + for (;;) { + grp = cap_getgrent(capgrp); + if (grp == NULL) + break; + got++; + for (i = 0; i < ngroups; i++) { + if (strcmp(names[i], grp->gr_name) == 0 && + gids[i] == grp->gr_gid) { + break; + } + } + if (i == ngroups) + return (false); + } + if (got != ngroups) + return (false); + + (void)cap_setgrent(capgrp); + got = 0; + for (;;) { + cap_getgrent_r(capgrp, &st, buf, sizeof(buf), &grp); + if (grp == NULL) + break; + got++; + for (i = 0; i < ngroups; i++) { + if 
(strcmp(names[i], grp->gr_name) == 0 && + gids[i] == grp->gr_gid) { + break; + } + } + if (i == ngroups) + return (false); + } + if (got != ngroups) + return (false); + + for (i = 0; i < ngroups; i++) { + grp = cap_getgrnam(capgrp, names[i]); + if (grp == NULL) + return (false); + } + + for (i = 0; i < ngroups; i++) { + cap_getgrnam_r(capgrp, names[i], &st, buf, sizeof(buf), &grp); + if (grp == NULL) + return (false); + } + + for (i = 0; i < ngroups; i++) { + grp = cap_getgrgid(capgrp, gids[i]); + if (grp == NULL) + return (false); + } + + for (i = 0; i < ngroups; i++) { + cap_getgrgid_r(capgrp, gids[i], &st, buf, sizeof(buf), &grp); + if (grp == NULL) + return (false); + } + + return (true); +} + +static void +test_groups(cap_channel_t *origcapgrp) +{ + cap_channel_t *capgrp; + const char *names[5]; + gid_t gids[5]; + + /* + * Allow: + * groups: + * names: wheel, daemon, kmem, sys, tty + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "wheel"; + names[1] = "daemon"; + names[2] = "kmem"; + names[3] = "sys"; + names[4] = "tty"; + CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); + gids[0] = 0; + gids[1] = 1; + gids[2] = 2; + gids[3] = 3; + gids[4] = 4; + + CHECK(runtest_groups(capgrp, names, gids, 5)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: kmem, sys, tty + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "kmem"; + names[1] = "sys"; + names[2] = "tty"; + CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == 0); + names[3] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 4, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "kmem"; + gids[0] = 2; + gids[1] = 3; + gids[2] = 4; + + CHECK(runtest_groups(capgrp, names, gids, 3)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: wheel, kmem, tty + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "wheel"; + names[1] = "kmem"; + names[2] = "tty"; + CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == 0); + names[3] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 4, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "wheel"; + gids[0] = 0; + gids[1] = 2; + gids[2] = 4; + + CHECK(runtest_groups(capgrp, names, gids, 3)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: + * gids: 2, 3, 4 + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "kmem"; + names[1] = "sys"; + names[2] = "tty"; + gids[0] = 2; + gids[1] = 3; + gids[2] = 4; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == 0); + gids[3] = 0; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 4) == -1 && + errno == ENOTCAPABLE); + gids[0] = 0; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && + errno == ENOTCAPABLE); + gids[0] = 2; + + CHECK(runtest_groups(capgrp, names, gids, 3)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: + * gids: 0, 2, 4 + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "wheel"; + names[1] = "kmem"; + names[2] = "tty"; + gids[0] = 0; + gids[1] = 2; + gids[2] = 4; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == 0); + gids[3] = 1; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 4) == -1 && + errno == ENOTCAPABLE); + gids[0] = 1; + 
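/* gid 1 (daemon) is outside the allowed {0, 2, 4} set, so even a single-entry limit naming it must be rejected. */ +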
CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && + errno == ENOTCAPABLE); + gids[0] = 0; + + CHECK(runtest_groups(capgrp, names, gids, 3)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: kmem + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "kmem"; + CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == 0); + names[1] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 2, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "kmem"; + gids[0] = 2; + + CHECK(runtest_groups(capgrp, names, gids, 1)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: wheel, tty + * gids: + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "wheel"; + names[1] = "tty"; + CHECK(cap_grp_limit_groups(capgrp, names, 2, NULL, 0) == 0); + names[2] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "wheel"; + gids[0] = 0; + gids[1] = 4; + + CHECK(runtest_groups(capgrp, names, gids, 2)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: + * gids: 2 + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "kmem"; + gids[0] = 2; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == 0); + gids[1] = 1; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 2) == -1 && + errno == ENOTCAPABLE); + gids[0] = 1; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && + errno == ENOTCAPABLE); + gids[0] = 2; + + CHECK(runtest_groups(capgrp, names, gids, 1)); + + cap_close(capgrp); + + /* + * Allow: + * groups: + * names: + * gids: 0, 4 + */ + capgrp = cap_clone(origcapgrp); + CHECK(capgrp != NULL); + + names[0] = "wheel"; + names[1] = "tty"; + gids[0] = 0; + gids[1] = 4; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 2) == 0); + gids[2] = 1; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == -1 && + errno == ENOTCAPABLE); + gids[0] = 1; + CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && + errno == ENOTCAPABLE); + gids[0] = 0; + + CHECK(runtest_groups(capgrp, names, gids, 2)); + + cap_close(capgrp); +} + +int +main(void) +{ + cap_channel_t *capcas, *capgrp; + + printf("1..199\n"); + + capcas = cap_init(); + CHECKX(capcas != NULL); + + capgrp = cap_service_open(capcas, "system.grp"); + CHECKX(capgrp != NULL); + + cap_close(capcas); + + /* No limits. 
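A fresh system.grp channel starts with every command allowed; a real consumer would typically narrow it straight after cap_service_open(), along the lines of this sketch (cmds being a hypothetical array of allowed command names): + * + *	cmds[0] = "getgrnam"; + *	cap_grp_limit_cmds(capgrp, cmds, 1); + * + * The baseline below instead verifies the full command set first.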
*/ + + CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | + GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); + + test_cmds(capgrp); + + test_fields(capgrp); + + test_groups(capgrp); + + cap_close(capgrp); + + exit(0); +} Property changes on: projects/clang390-import/lib/libcasper/services/cap_grp/tests/grp_test.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_pwd/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_pwd/Makefile (revision 305686) +++ projects/clang390-import/lib/libcasper/services/cap_pwd/Makefile (revision 305687) @@ -1,18 +1,24 @@ # $FreeBSD$ +.include <src.opts.mk> + PACKAGE=libcasper LIB= cap_pwd SHLIB_MAJOR= 0 SHLIBDIR?= /lib/casper INCSDIR?= ${INCLUDEDIR}/casper SRCS= cap_pwd.c INCS= cap_pwd.h LIBADD= nv CFLAGS+=-I${.CURDIR} + +.if ${MK_TESTS} != "no" +SUBDIR+= tests +.endif .include <bsd.lib.mk> Index: projects/clang390-import/lib/libcasper/services/cap_pwd/tests/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_pwd/tests/Makefile (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_pwd/tests/Makefile (revision 305687) @@ -0,0 +1,11 @@ +# $FreeBSD$ + +TAP_TESTS_C= pwd_test + +LIBADD+= casper +LIBADD+= cap_pwd +LIBADD+= nv + +WARNS?= 3 + +.include <bsd.test.mk> Property changes on: projects/clang390-import/lib/libcasper/services/cap_pwd/tests/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_pwd/tests/pwd_test.c =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_pwd/tests/pwd_test.c (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_pwd/tests/pwd_test.c (revision 305687) @@ -0,0 +1,1536 @@ +/*- + * Copyright (c) 2013 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Pawel Jakub Dawidek under sponsorship from + * the FreeBSD Foundation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/capsicum.h> + +#include <errno.h> +#include <pwd.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <libcasper.h> + +#include <casper/cap_pwd.h> + +static int ntest = 1; + +#define CHECK(expr) do { \ + if ((expr)) \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + else \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__);\ + ntest++; \ +} while (0) +#define CHECKX(expr) do { \ + if ((expr)) { \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + } else { \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__);\ + exit(1); \ + } \ + ntest++; \ +} while (0) + +#define UID_ROOT 0 +#define UID_OPERATOR 2 + +#define GETPWENT0 0x0001 +#define GETPWENT1 0x0002 +#define GETPWENT2 0x0004 +#define GETPWENT (GETPWENT0 | GETPWENT1 | GETPWENT2) +#define GETPWENT_R0 0x0008 +#define GETPWENT_R1 0x0010 +#define GETPWENT_R2 0x0020 +#define GETPWENT_R (GETPWENT_R0 | GETPWENT_R1 | GETPWENT_R2) +#define GETPWNAM 0x0040 +#define GETPWNAM_R 0x0080 +#define GETPWUID 0x0100 +#define GETPWUID_R 0x0200 + +static bool +passwd_compare(const struct passwd *pwd0, const struct passwd *pwd1) +{ + + if (pwd0 == NULL && pwd1 == NULL) + return (true); + if (pwd0 == NULL || pwd1 == NULL) + return (false); + + if (strcmp(pwd0->pw_name, pwd1->pw_name) != 0) + return (false); + + if (pwd0->pw_passwd != NULL || pwd1->pw_passwd != NULL) { + if (pwd0->pw_passwd == NULL || pwd1->pw_passwd == NULL) + return (false); + if (strcmp(pwd0->pw_passwd, pwd1->pw_passwd) != 0) + return (false); + } + + if (pwd0->pw_uid != pwd1->pw_uid) + return (false); + + if (pwd0->pw_gid != pwd1->pw_gid) + return (false); + + if (pwd0->pw_change != pwd1->pw_change) + return (false); + + if (pwd0->pw_class != NULL || pwd1->pw_class != NULL) { + if (pwd0->pw_class == NULL || pwd1->pw_class == NULL) + return (false); + if (strcmp(pwd0->pw_class, pwd1->pw_class) != 0) + return (false); + } + + if (pwd0->pw_gecos != NULL || pwd1->pw_gecos != NULL) { + if (pwd0->pw_gecos == NULL || pwd1->pw_gecos == NULL) + return (false); + if (strcmp(pwd0->pw_gecos, pwd1->pw_gecos) != 0) + return (false); + } + + if (pwd0->pw_dir != NULL || pwd1->pw_dir != NULL) { + if (pwd0->pw_dir == NULL || pwd1->pw_dir == NULL) + return (false); + if (strcmp(pwd0->pw_dir, pwd1->pw_dir) != 0) + return (false); + } + + if (pwd0->pw_shell != NULL || pwd1->pw_shell != NULL) { + if (pwd0->pw_shell == NULL || pwd1->pw_shell == NULL) + return (false); + if (strcmp(pwd0->pw_shell, pwd1->pw_shell) != 0) + return (false); + } + + if (pwd0->pw_expire != pwd1->pw_expire) + return (false); + + if (pwd0->pw_fields != pwd1->pw_fields) + return (false); + + return (true); +} + +static unsigned int +runtest_cmds(cap_channel_t *cappwd) +{ + char bufs[1024], bufc[1024]; + unsigned int result; + struct passwd *pwds, *pwdc; + struct passwd sts, stc; + + result = 0; + + setpwent(); + cap_setpwent(cappwd); + + pwds = getpwent(); + pwdc = cap_getpwent(cappwd); + if (passwd_compare(pwds, pwdc)) { + result |= GETPWENT0; + pwds = getpwent(); + pwdc =
cap_getpwent(cappwd); + if (passwd_compare(pwds, pwdc)) + result |= GETPWENT1; + } + + getpwent_r(&sts, bufs, sizeof(bufs), &pwds); + cap_getpwent_r(cappwd, &stc, bufc, sizeof(bufc), &pwdc); + if (passwd_compare(pwds, pwdc)) { + result |= GETPWENT_R0; + getpwent_r(&sts, bufs, sizeof(bufs), &pwds); + cap_getpwent_r(cappwd, &stc, bufc, sizeof(bufc), &pwdc); + if (passwd_compare(pwds, pwdc)) + result |= GETPWENT_R1; + } + + setpwent(); + cap_setpwent(cappwd); + + getpwent_r(&sts, bufs, sizeof(bufs), &pwds); + cap_getpwent_r(cappwd, &stc, bufc, sizeof(bufc), &pwdc); + if (passwd_compare(pwds, pwdc)) + result |= GETPWENT_R2; + + pwds = getpwent(); + pwdc = cap_getpwent(cappwd); + if (passwd_compare(pwds, pwdc)) + result |= GETPWENT2; + + pwds = getpwnam("root"); + pwdc = cap_getpwnam(cappwd, "root"); + if (passwd_compare(pwds, pwdc)) { + pwds = getpwnam("operator"); + pwdc = cap_getpwnam(cappwd, "operator"); + if (passwd_compare(pwds, pwdc)) + result |= GETPWNAM; + } + + getpwnam_r("root", &sts, bufs, sizeof(bufs), &pwds); + cap_getpwnam_r(cappwd, "root", &stc, bufc, sizeof(bufc), &pwdc); + if (passwd_compare(pwds, pwdc)) { + getpwnam_r("operator", &sts, bufs, sizeof(bufs), &pwds); + cap_getpwnam_r(cappwd, "operator", &stc, bufc, sizeof(bufc), + &pwdc); + if (passwd_compare(pwds, pwdc)) + result |= GETPWNAM_R; + } + + pwds = getpwuid(UID_ROOT); + pwdc = cap_getpwuid(cappwd, UID_ROOT); + if (passwd_compare(pwds, pwdc)) { + pwds = getpwuid(UID_OPERATOR); + pwdc = cap_getpwuid(cappwd, UID_OPERATOR); + if (passwd_compare(pwds, pwdc)) + result |= GETPWUID; + } + + getpwuid_r(UID_ROOT, &sts, bufs, sizeof(bufs), &pwds); + cap_getpwuid_r(cappwd, UID_ROOT, &stc, bufc, sizeof(bufc), &pwdc); + if (passwd_compare(pwds, pwdc)) { + getpwuid_r(UID_OPERATOR, &sts, bufs, sizeof(bufs), &pwds); + cap_getpwuid_r(cappwd, UID_OPERATOR, &stc, bufc, sizeof(bufc), + &pwdc); + if (passwd_compare(pwds, pwdc)) + result |= GETPWUID_R; + } + + return (result); +} + +static void +test_cmds(cap_channel_t *origcappwd) +{ + cap_channel_t *cappwd; + const char *cmds[7], *fields[10], *names[6]; + uid_t uids[5]; + + fields[0] = "pw_name"; + fields[1] = "pw_passwd"; + fields[2] = "pw_uid"; + fields[3] = "pw_gid"; + fields[4] = "pw_change"; + fields[5] = "pw_class"; + fields[6] = "pw_gecos"; + fields[7] = "pw_dir"; + fields[8] = "pw_shell"; + fields[9] = "pw_expire"; + + names[0] = "root"; + names[1] = "toor"; + names[2] = "daemon"; + names[3] = "operator"; + names[4] = "bin"; + names[5] = "kmem"; + + uids[0] = 0; + uids[1] = 1; + uids[2] = 2; + uids[3] = 3; + uids[4] = 5; + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == 0); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = 
"getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == 0); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: setpwent + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cap_setpwent(cappwd); + + cmds[0] = "getpwent"; + cmds[1] = "getpwent_r"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "setpwent"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | GETPWENT_R0 | + GETPWENT_R1 | GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: setpwent + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cap_setpwent(cappwd); + + cmds[0] = "getpwent"; + cmds[1] = "getpwent_r"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "setpwent"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | GETPWENT_R0 | + GETPWENT_R1 | GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: getpwent + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent_r"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwent"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, 
fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT_R2 | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwent + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent_r"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwent"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT_R2 | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: getpwent_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwent_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwnam, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwent_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwnam"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwent_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | + GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + 
* Disallow: + * cmds: getpwnam + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwnam"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam_r, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwnam + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam_r"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwnam"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM_R | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, + * getpwuid, getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: getpwnam_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwnam_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, + * getpwuid, getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwnam_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwuid"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = 
"getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwnam_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWUID | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid_r + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: getpwuid + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwuid"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid_r + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwuid + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwuid"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID_R)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid + * users: + * names: root, toor, daemon, operator, bin, kmem + * uids: + * Disallow: + * cmds: getpwuid_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 
0) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID)); + + cap_close(cappwd); + + /* + * Allow: + * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, + * getpwuid + * users: + * names: + * uids: 0, 1, 2, 3, 5 + * Disallow: + * cmds: getpwuid_r + * users: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); + cmds[0] = "setpwent"; + cmds[1] = "getpwent"; + cmds[2] = "getpwent_r"; + cmds[3] = "getpwnam"; + cmds[4] = "getpwnam_r"; + cmds[5] = "getpwuid"; + cmds[6] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); + cmds[0] = "getpwuid_r"; + CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | + GETPWNAM | GETPWNAM_R | GETPWUID)); + + cap_close(cappwd); +} + +#define PW_NAME _PWF_NAME +#define PW_PASSWD _PWF_PASSWD +#define PW_UID _PWF_UID +#define PW_GID _PWF_GID +#define PW_CHANGE _PWF_CHANGE +#define PW_CLASS _PWF_CLASS +#define PW_GECOS _PWF_GECOS +#define PW_DIR _PWF_DIR +#define PW_SHELL _PWF_SHELL +#define PW_EXPIRE _PWF_EXPIRE + +/* + * Map the fields present in a passwd entry onto the PW_* mask, using + * pw_fields to recognize fields that are present but empty. + */ +static unsigned int +passwd_fields(const struct passwd *pwd) +{ + unsigned int result; + + result = 0; + + if (pwd->pw_name != NULL && pwd->pw_name[0] != '\0') + result |= PW_NAME; + + if (pwd->pw_passwd != NULL && pwd->pw_passwd[0] != '\0') + result |= PW_PASSWD; + else if ((pwd->pw_fields & _PWF_PASSWD) != 0) + result |= PW_PASSWD; + + if (pwd->pw_uid != (uid_t)-1) + result |= PW_UID; + + if (pwd->pw_gid != (gid_t)-1) + result |= PW_GID; + + if (pwd->pw_change != 0 || (pwd->pw_fields & _PWF_CHANGE) != 0) + result |= PW_CHANGE; + + if (pwd->pw_class != NULL && pwd->pw_class[0] != '\0') + result |= PW_CLASS; + else if ((pwd->pw_fields & _PWF_CLASS) != 0) + result |= PW_CLASS; + + if (pwd->pw_gecos != NULL && pwd->pw_gecos[0] != '\0') + result |= PW_GECOS; + else if ((pwd->pw_fields & _PWF_GECOS) != 0) + result |= PW_GECOS; + + if (pwd->pw_dir != NULL && pwd->pw_dir[0] != '\0') + result |= PW_DIR; + else if ((pwd->pw_fields & _PWF_DIR) != 0) + result |= PW_DIR; + + if (pwd->pw_shell != NULL && pwd->pw_shell[0] != '\0') + result |= PW_SHELL; + else if ((pwd->pw_fields & _PWF_SHELL) != 0) + result |= PW_SHELL; + + if (pwd->pw_expire != 0 || (pwd->pw_fields & _PWF_EXPIRE) != 0) + result |= PW_EXPIRE; + + return (result); +} + +static bool +runtest_fields(cap_channel_t *cappwd, unsigned int expected) +{ + char buf[1024]; + struct passwd *pwd; + struct passwd st; + + cap_setpwent(cappwd); + pwd = cap_getpwent(cappwd); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + cap_setpwent(cappwd); + cap_getpwent_r(cappwd, &st, buf, sizeof(buf), &pwd); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + pwd = cap_getpwnam(cappwd, "root"); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + cap_getpwnam_r(cappwd, "root", &st, buf, sizeof(buf), &pwd); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + pwd = cap_getpwuid(cappwd, UID_ROOT); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + cap_getpwuid_r(cappwd, UID_ROOT, &st, buf, sizeof(buf), &pwd); + if ((passwd_fields(pwd) & ~expected) != 0) + return (false); + + return (true); +} + +static void +test_fields(cap_channel_t *origcappwd) +{ + cap_channel_t *cappwd; + const char *fields[10]; + + /* No limits. */ + + CHECK(runtest_fields(origcappwd, PW_NAME | PW_PASSWD | PW_UID | + PW_GID | PW_CHANGE | PW_CLASS | PW_GECOS | PW_DIR | PW_SHELL | + PW_EXPIRE)); + + /* + * Allow: + * fields: pw_name, pw_passwd, pw_uid, pw_gid, pw_change, pw_class, + * pw_gecos, pw_dir, pw_shell, pw_expire + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_name"; + fields[1] = "pw_passwd"; + fields[2] = "pw_uid"; + fields[3] = "pw_gid"; + fields[4] = "pw_change"; + fields[5] = "pw_class"; + fields[6] = "pw_gecos"; + fields[7] = "pw_dir"; + fields[8] = "pw_shell"; + fields[9] = "pw_expire"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); + + CHECK(runtest_fields(cappwd, PW_NAME | PW_PASSWD | PW_UID | + PW_GID | PW_CHANGE | PW_CLASS | PW_GECOS | PW_DIR | PW_SHELL | + PW_EXPIRE)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_name, pw_passwd, pw_uid, pw_gid, pw_change + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_name"; + fields[1] = "pw_passwd"; + fields[2] = "pw_uid"; + fields[3] = "pw_gid"; + fields[4] = "pw_change"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); + fields[5] = "pw_class"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && + errno == ENOTCAPABLE); + fields[0] = "pw_class"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_NAME | PW_PASSWD | PW_UID | + PW_GID | PW_CHANGE)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_class, pw_gecos, pw_dir, pw_shell, pw_expire + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_class"; + fields[1] = "pw_gecos"; + fields[2] = "pw_dir"; + fields[3] = "pw_shell"; + fields[4] = "pw_expire"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); + fields[5] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && +
errno == ENOTCAPABLE); + fields[0] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_CLASS | PW_GECOS | PW_DIR | + PW_SHELL | PW_EXPIRE)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_name, pw_uid, pw_change, pw_gecos, pw_shell + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_name"; + fields[1] = "pw_uid"; + fields[2] = "pw_change"; + fields[3] = "pw_gecos"; + fields[4] = "pw_shell"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); + fields[5] = "pw_class"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && + errno == ENOTCAPABLE); + fields[0] = "pw_class"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_NAME | PW_UID | PW_CHANGE | + PW_GECOS | PW_SHELL)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_passwd, pw_gid, pw_class, pw_dir, pw_expire + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_passwd"; + fields[1] = "pw_gid"; + fields[2] = "pw_class"; + fields[3] = "pw_dir"; + fields[4] = "pw_expire"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); + fields[5] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && + errno == ENOTCAPABLE); + fields[0] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_PASSWD | PW_GID | PW_CLASS | + PW_DIR | PW_EXPIRE)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_uid, pw_class, pw_shell + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_uid"; + fields[1] = "pw_class"; + fields[2] = "pw_shell"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 3) == 0); + fields[3] = "pw_change"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 4) == -1 && + errno == ENOTCAPABLE); + fields[0] = "pw_change"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_UID | PW_CLASS | PW_SHELL)); + + cap_close(cappwd); + + /* + * Allow: + * fields: pw_change + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + fields[0] = "pw_change"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == 0); + fields[1] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 2) == -1 && + errno == ENOTCAPABLE); + fields[0] = "pw_uid"; + CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && + errno == ENOTCAPABLE); + + CHECK(runtest_fields(cappwd, PW_CHANGE)); + + cap_close(cappwd); +} + +static bool +runtest_users(cap_channel_t *cappwd, const char **names, const uid_t *uids, + size_t nusers) +{ + char buf[1024]; + struct passwd *pwd; + struct passwd st; + unsigned int i, got; + + cap_setpwent(cappwd); + got = 0; + for (;;) { + pwd = cap_getpwent(cappwd); + if (pwd == NULL) + break; + got++; + for (i = 0; i < nusers; i++) { + if (strcmp(names[i], pwd->pw_name) == 0 && + uids[i] == pwd->pw_uid) { + break; + } + } + if (i == nusers) + return (false); + } + if (got != nusers) + return (false); + + cap_setpwent(cappwd); + got = 0; + for (;;) { + cap_getpwent_r(cappwd, &st, buf, sizeof(buf), &pwd); + if (pwd == NULL) + break; + got++; + for (i = 0; i < nusers; i++) { + if (strcmp(names[i], pwd->pw_name) == 0 && + uids[i] == pwd->pw_uid) { + break; + } + } + if (i == nusers) + return (false); + } + if (got != nusers) + return (false); + + for (i = 0; i < nusers; i++) { + pwd = cap_getpwnam(cappwd, names[i]); + if (pwd == NULL) + return (false); + } 
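/* + * The same users must also be reachable through the reentrant and + * uid-based lookups below. + */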
+ + for (i = 0; i < nusers; i++) { + cap_getpwnam_r(cappwd, names[i], &st, buf, sizeof(buf), &pwd); + if (pwd == NULL) + return (false); + } + + for (i = 0; i < nusers; i++) { + pwd = cap_getpwuid(cappwd, uids[i]); + if (pwd == NULL) + return (false); + } + + for (i = 0; i < nusers; i++) { + cap_getpwuid_r(cappwd, uids[i], &st, buf, sizeof(buf), &pwd); + if (pwd == NULL) + return (false); + } + + return (true); +} + +static void +test_users(cap_channel_t *origcappwd) +{ + cap_channel_t *cappwd; + const char *names[6]; + uid_t uids[6]; + + /* + * Allow: + * users: + * names: root, toor, daemon, operator, bin, tty + * uids: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "root"; + names[1] = "toor"; + names[2] = "daemon"; + names[3] = "operator"; + names[4] = "bin"; + names[5] = "tty"; + CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); + uids[0] = 0; + uids[1] = 0; + uids[2] = 1; + uids[3] = 2; + uids[4] = 3; + uids[5] = 4; + + CHECK(runtest_users(cappwd, names, uids, 6)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: daemon, operator, bin + * uids: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "operator"; + names[2] = "bin"; + CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == 0); + names[3] = "tty"; + CHECK(cap_pwd_limit_users(cappwd, names, 4, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "tty"; + CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + uids[0] = 1; + uids[1] = 2; + uids[2] = 3; + + CHECK(runtest_users(cappwd, names, uids, 3)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: daemon, bin, tty + * uids: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "bin"; + names[2] = "tty"; + CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == 0); + names[3] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 4, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + uids[0] = 1; + uids[1] = 3; + uids[2] = 4; + + CHECK(runtest_users(cappwd, names, uids, 3)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: + * uids: 1, 2, 3 + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "operator"; + names[2] = "bin"; + uids[0] = 1; + uids[1] = 2; + uids[2] = 3; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == 0); + uids[3] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 4) == -1 && + errno == ENOTCAPABLE); + uids[0] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && + errno == ENOTCAPABLE); + uids[0] = 1; + + CHECK(runtest_users(cappwd, names, uids, 3)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: + * uids: 1, 3, 4 + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "bin"; + names[2] = "tty"; + uids[0] = 1; + uids[1] = 3; + uids[2] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == 0); + uids[3] = 5; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 4) == -1 && + errno == ENOTCAPABLE); + uids[0] = 5; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && + errno == ENOTCAPABLE); + uids[0] = 1; + + CHECK(runtest_users(cappwd, names, uids, 3)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: bin + * uids: + */ + cappwd = 
cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "bin"; + CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == 0); + names[1] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 2, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "bin"; + uids[0] = 3; + + CHECK(runtest_users(cappwd, names, uids, 1)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: daemon, tty + * uids: + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "tty"; + CHECK(cap_pwd_limit_users(cappwd, names, 2, NULL, 0) == 0); + names[2] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "operator"; + CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && + errno == ENOTCAPABLE); + names[0] = "daemon"; + uids[0] = 1; + uids[1] = 4; + + CHECK(runtest_users(cappwd, names, uids, 2)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: + * uids: 3 + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "bin"; + uids[0] = 3; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == 0); + uids[1] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 2) == -1 && + errno == ENOTCAPABLE); + uids[0] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && + errno == ENOTCAPABLE); + uids[0] = 3; + + CHECK(runtest_users(cappwd, names, uids, 1)); + + cap_close(cappwd); + + /* + * Allow: + * users: + * names: + * uids: 1, 4 + */ + cappwd = cap_clone(origcappwd); + CHECK(cappwd != NULL); + + names[0] = "daemon"; + names[1] = "tty"; + uids[0] = 1; + uids[1] = 4; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 2) == 0); + uids[2] = 3; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == -1 && + errno == ENOTCAPABLE); + uids[0] = 3; + CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && + errno == ENOTCAPABLE); + uids[0] = 1; + + CHECK(runtest_users(cappwd, names, uids, 2)); + + cap_close(cappwd); +} + +int +main(void) +{ + cap_channel_t *capcas, *cappwd; + + printf("1..188\n"); + + capcas = cap_init(); + CHECKX(capcas != NULL); + + cappwd = cap_service_open(capcas, "system.pwd"); + CHECKX(cappwd != NULL); + + cap_close(capcas); + + /* No limits. 
*/ + + CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | GETPWNAM | + GETPWNAM_R | GETPWUID | GETPWUID_R)); + + test_cmds(cappwd); + + test_fields(cappwd); + + test_users(cappwd); + + cap_close(cappwd); + + exit(0); +} Property changes on: projects/clang390-import/lib/libcasper/services/cap_pwd/tests/pwd_test.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_sysctl/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_sysctl/Makefile (revision 305686) +++ projects/clang390-import/lib/libcasper/services/cap_sysctl/Makefile (revision 305687) @@ -1,18 +1,24 @@ # $FreeBSD$ +.include <src.opts.mk> + PACKAGE=libcasper LIB= cap_sysctl SHLIB_MAJOR= 0 SHLIBDIR?= /lib/casper INCSDIR?= ${INCLUDEDIR}/casper SRCS= cap_sysctl.c INCS= cap_sysctl.h LIBADD= nv CFLAGS+=-I${.CURDIR} + +.if ${MK_TESTS} != "no" +SUBDIR+= tests +.endif .include <bsd.lib.mk> Index: projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/Makefile =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/Makefile (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/Makefile (revision 305687) @@ -0,0 +1,11 @@ +# $FreeBSD$ + +TAP_TESTS_C= sysctl_test + +LIBADD+= casper +LIBADD+= cap_sysctl +LIBADD+= nv + +WARNS?= 3 + +.include <bsd.test.mk> Property changes on: projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/sysctl_test.c =================================================================== --- projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/sysctl_test.c (nonexistent) +++ projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/sysctl_test.c (revision 305687) @@ -0,0 +1,1510 @@ +/*- + * Copyright (c) 2013 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Pawel Jakub Dawidek under sponsorship from + * the FreeBSD Foundation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/types.h> +#include <sys/capsicum.h> +#include <sys/sysctl.h> +#include <sys/nv.h> + +#include <assert.h> +#include <err.h> +#include <errno.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include <libcasper.h> + +#include <casper/cap_sysctl.h> + +/* + * We need some sysctls to perform the tests on. + * We remember their values and restore them after the test is done. + */ +#define SYSCTL0_PARENT "kern" +#define SYSCTL0_NAME "kern.sync_on_panic" +#define SYSCTL1_PARENT "debug" +#define SYSCTL1_NAME "debug.minidump" + +static int ntest = 1; + +#define CHECK(expr) do { \ + if ((expr)) \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + else \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + ntest++; \ +} while (0) +#define CHECKX(expr) do { \ + if ((expr)) { \ + printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + } else { \ + printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ + exit(1); \ + } \ + ntest++; \ +} while (0) + +#define SYSCTL0_READ0 0x0001 +#define SYSCTL0_READ1 0x0002 +#define SYSCTL0_READ2 0x0004 +#define SYSCTL0_WRITE 0x0008 +#define SYSCTL0_READ_WRITE 0x0010 +#define SYSCTL1_READ0 0x0020 +#define SYSCTL1_READ1 0x0040 +#define SYSCTL1_READ2 0x0080 +#define SYSCTL1_WRITE 0x0100 +#define SYSCTL1_READ_WRITE 0x0200 + +static unsigned int +runtest(cap_channel_t *capsysctl) +{ + unsigned int result; + int oldvalue, newvalue; + size_t oldsize; + + result = 0; + + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, &oldsize, + NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue)) + result |= SYSCTL0_READ0; + } + + newvalue = 123; + if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, NULL, NULL, &newvalue, + sizeof(newvalue)) == 0) { + result |= SYSCTL0_WRITE; + } + + if ((result & SYSCTL0_WRITE) != 0) { + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, + &oldsize, NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 123) + result |= SYSCTL0_READ1; + } + } + + oldsize = sizeof(oldvalue); + newvalue = 4567; + if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, &oldsize, + &newvalue, sizeof(newvalue)) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 123) + result |= SYSCTL0_READ_WRITE; + } + + if ((result & SYSCTL0_READ_WRITE) != 0) { + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, + &oldsize, NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 4567) + result |= SYSCTL0_READ2; + } + } + + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, &oldsize, + NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue)) + result |= SYSCTL1_READ0; + } + + newvalue = 506; + if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, NULL, NULL, &newvalue, + sizeof(newvalue)) == 0) { + result |= SYSCTL1_WRITE; + } + + if ((result & SYSCTL1_WRITE) != 0) { + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, + &oldsize, NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 506) + result |=
SYSCTL1_READ1; + } + } + + oldsize = sizeof(oldvalue); + newvalue = 7008; + if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, &oldsize, + &newvalue, sizeof(newvalue)) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 506) + result |= SYSCTL1_READ_WRITE; + } + + if ((result & SYSCTL1_READ_WRITE) != 0) { + oldsize = sizeof(oldvalue); + if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, + &oldsize, NULL, 0) == 0) { + if (oldsize == sizeof(oldvalue) && oldvalue == 7008) + result |= SYSCTL1_READ2; + } + } + + return (result); +} + +static void +test_operation(cap_channel_t *origcapsysctl) +{ + cap_channel_t *capsysctl; + nvlist_t *limits; + + /* + * Allow: + * SYSCTL0_PARENT/RDWR/RECURSIVE + * SYSCTL1_PARENT/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, "foo.bar", + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, "foo.bar", + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == SYSCTL0_READ0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/RDWR/RECURSIVE + * SYSCTL1_NAME/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, 
SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/RDWR + * SYSCTL1_PARENT/RDWR + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/RDWR + * SYSCTL1_NAME/RDWR + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + 
* SYSCTL0_PARENT/RDWR + * SYSCTL1_PARENT/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | + SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/RDWR + * SYSCTL1_NAME/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ/RECURSIVE + * SYSCTL1_PARENT/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == 
ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ/RECURSIVE + * SYSCTL1_NAME/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ + * SYSCTL1_PARENT/READ + */ + + capsysctl = 
cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ + * SYSCTL1_NAME/READ + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + 
CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ + * SYSCTL1_PARENT/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_READ0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ + * SYSCTL1_NAME/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/WRITE/RECURSIVE + * SYSCTL1_PARENT/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | 
CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/WRITE/RECURSIVE + * SYSCTL1_NAME/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == 
ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/WRITE + * SYSCTL1_PARENT/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/WRITE + * SYSCTL1_NAME/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + 
CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/WRITE + * SYSCTL1_PARENT/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_WRITE); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/WRITE + * SYSCTL1_NAME/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ/RECURSIVE + * SYSCTL1_PARENT/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, 
SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ/RECURSIVE + * SYSCTL1_NAME/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ + * SYSCTL1_PARENT/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ + * SYSCTL1_NAME/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ + * SYSCTL1_PARENT/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == SYSCTL1_WRITE); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_NAME/READ + * SYSCTL1_NAME/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); + + cap_close(capsysctl); +} + +static void +test_names(cap_channel_t *origcapsysctl) +{ + cap_channel_t *capsysctl; + nvlist_t *limits; + + /* + * Allow: + * SYSCTL0_PARENT/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, 
CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL0_READ0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/READ/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_READ0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL0_WRITE); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/WRITE/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + 
nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_WRITE); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/RDWR/RECURSIVE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + nvlist_add_number(limits, SYSCTL1_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, + CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | + SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/READ + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, 
SYSCTL1_PARENT, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/READ + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_READ0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/WRITE + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == SYSCTL1_WRITE); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL0_PARENT/RDWR + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == 0); + + cap_close(capsysctl); + + /* + * Allow: + * SYSCTL1_NAME/RDWR + */ + + capsysctl = cap_clone(origcapsysctl); + CHECK(capsysctl != NULL); + + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == 0); + limits = nvlist_create(0); + nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + limits = nvlist_create(0); + 
nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); + CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); + + CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | + SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); + + cap_close(capsysctl); +} + +int +main(void) +{ + cap_channel_t *capcas, *capsysctl; + int scvalue0, scvalue1; + size_t scsize; + + printf("1..256\n"); + + scsize = sizeof(scvalue0); + CHECKX(sysctlbyname(SYSCTL0_NAME, &scvalue0, &scsize, NULL, 0) == 0); + CHECKX(scsize == sizeof(scvalue0)); + scsize = sizeof(scvalue1); + CHECKX(sysctlbyname(SYSCTL1_NAME, &scvalue1, &scsize, NULL, 0) == 0); + CHECKX(scsize == sizeof(scvalue1)); + + capcas = cap_init(); + CHECKX(capcas != NULL); + + capsysctl = cap_service_open(capcas, "system.sysctl"); + CHECKX(capsysctl != NULL); + + cap_close(capcas); + + /* No limits set. */ + + CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | + SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | + SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | + SYSCTL1_READ_WRITE)); + + test_operation(capsysctl); + + test_names(capsysctl); + + cap_close(capsysctl); + + CHECK(sysctlbyname(SYSCTL0_NAME, NULL, NULL, &scvalue0, + sizeof(scvalue0)) == 0); + CHECK(sysctlbyname(SYSCTL1_NAME, NULL, NULL, &scvalue1, + sizeof(scvalue1)) == 0); + + exit(0); +} Property changes on: projects/clang390-import/lib/libcasper/services/cap_sysctl/tests/sysctl_test.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/share/man/man3/Makefile =================================================================== --- projects/clang390-import/share/man/man3/Makefile (revision 305686) +++ projects/clang390-import/share/man/man3/Makefile (revision 305687) @@ -1,340 +1,341 @@ # @(#)Makefile 8.2 (Berkeley) 12/13/93 # $FreeBSD$ .include PACKAGE=runtime-manuals MAN= assert.3 \ ATOMIC_VAR_INIT.3 \ bitstring.3 \ end.3 \ fpgetround.3 \ intro.3 \ makedev.3 \ offsetof.3 \ ${PTHREAD_MAN} \ queue.3 \ sigevent.3 \ siginfo.3 \ stdarg.3 \ sysexits.3 \ tgmath.3 \ timeradd.3 \ tree.3 MLINKS= ATOMIC_VAR_INIT.3 atomic_compare_exchange_strong.3 \ ATOMIC_VAR_INIT.3 atomic_compare_exchange_strong_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_compare_exchange_weak.3 \ ATOMIC_VAR_INIT.3 atomic_compare_exchange_weak_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_exchange.3 \ ATOMIC_VAR_INIT.3 atomic_exchange_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_add.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_add_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_and.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_and_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_or.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_or_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_sub.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_sub_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_xor.3 \ ATOMIC_VAR_INIT.3 atomic_fetch_xor_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_init.3 \ ATOMIC_VAR_INIT.3 atomic_is_lock_free.3 \ ATOMIC_VAR_INIT.3 atomic_load.3 \ ATOMIC_VAR_INIT.3 atomic_load_explicit.3 \ ATOMIC_VAR_INIT.3 atomic_store.3 \ ATOMIC_VAR_INIT.3 atomic_store_explicit.3 MLINKS+= bitstring.3 bit_alloc.3 \ bitstring.3 bit_clear.3 \ bitstring.3 bit_decl.3 \ bitstring.3 bit_ffc.3 \ bitstring.3 bit_ffc_at.3 \ bitstring.3 bit_ffs.3 \ bitstring.3 bit_ffs_at.3 \ bitstring.3 bit_nclear.3 \ bitstring.3 
bit_nset.3 \ bitstring.3 bit_set.3 \ bitstring.3 bitstr_size.3 \ bitstring.3 bit_test.3 MLINKS+= end.3 edata.3 \ end.3 etext.3 MLINKS+= fpgetround.3 fpgetmask.3 \ fpgetround.3 fpgetprec.3 \ fpgetround.3 fpgetsticky.3 \ fpgetround.3 fpresetsticky.3 \ fpgetround.3 fpsetmask.3 \ fpgetround.3 fpsetprec.3 \ fpgetround.3 fpsetround.3 MLINKS+= makedev.3 major.3 \ makedev.3 minor.3 MLINKS+= ${PTHREAD_MLINKS} MLINKS+= queue.3 LIST_CLASS_ENTRY.3 \ queue.3 LIST_CLASS_HEAD.3 \ queue.3 LIST_EMPTY.3 \ queue.3 LIST_ENTRY.3 \ queue.3 LIST_FIRST.3 \ queue.3 LIST_FOREACH.3 \ queue.3 LIST_FOREACH_FROM.3 \ queue.3 LIST_FOREACH_FROM_SAFE.3 \ queue.3 LIST_FOREACH_SAFE.3 \ queue.3 LIST_HEAD.3 \ queue.3 LIST_HEAD_INITIALIZER.3 \ queue.3 LIST_INIT.3 \ queue.3 LIST_INSERT_AFTER.3 \ queue.3 LIST_INSERT_BEFORE.3 \ queue.3 LIST_INSERT_HEAD.3 \ queue.3 LIST_NEXT.3 \ queue.3 LIST_PREV.3 \ queue.3 LIST_REMOVE.3 \ queue.3 LIST_SWAP.3 \ queue.3 SLIST_CLASS_ENTRY.3 \ queue.3 SLIST_CLASS_HEAD.3 \ queue.3 SLIST_EMPTY.3 \ queue.3 SLIST_ENTRY.3 \ queue.3 SLIST_FIRST.3 \ queue.3 SLIST_FOREACH.3 \ queue.3 SLIST_FOREACH_FROM.3 \ queue.3 SLIST_FOREACH_FROM_SAFE.3 \ queue.3 SLIST_FOREACH_SAFE.3 \ queue.3 SLIST_HEAD.3 \ queue.3 SLIST_HEAD_INITIALIZER.3 \ queue.3 SLIST_INIT.3 \ queue.3 SLIST_INSERT_AFTER.3 \ queue.3 SLIST_INSERT_HEAD.3 \ queue.3 SLIST_NEXT.3 \ queue.3 SLIST_REMOVE.3 \ queue.3 SLIST_REMOVE_AFTER.3 \ queue.3 SLIST_REMOVE_HEAD.3 \ + queue.3 SLIST_REMOVE_PREVPTR.3 \ queue.3 SLIST_SWAP.3 \ queue.3 STAILQ_CLASS_ENTRY.3 \ queue.3 STAILQ_CLASS_HEAD.3 \ queue.3 STAILQ_CONCAT.3 \ queue.3 STAILQ_EMPTY.3 \ queue.3 STAILQ_ENTRY.3 \ queue.3 STAILQ_FIRST.3 \ queue.3 STAILQ_FOREACH.3 \ queue.3 STAILQ_FOREACH_FROM.3 \ queue.3 STAILQ_FOREACH_FROM_SAFE.3 \ queue.3 STAILQ_FOREACH_SAFE.3 \ queue.3 STAILQ_HEAD.3 \ queue.3 STAILQ_HEAD_INITIALIZER.3 \ queue.3 STAILQ_INIT.3 \ queue.3 STAILQ_INSERT_AFTER.3 \ queue.3 STAILQ_INSERT_HEAD.3 \ queue.3 STAILQ_INSERT_TAIL.3 \ queue.3 STAILQ_LAST.3 \ queue.3 STAILQ_NEXT.3 \ queue.3 STAILQ_REMOVE.3 \ queue.3 STAILQ_REMOVE_AFTER.3 \ queue.3 STAILQ_REMOVE_HEAD.3 \ queue.3 STAILQ_SWAP.3 \ queue.3 TAILQ_CLASS_ENTRY.3 \ queue.3 TAILQ_CLASS_HEAD.3 \ queue.3 TAILQ_CONCAT.3 \ queue.3 TAILQ_EMPTY.3 \ queue.3 TAILQ_ENTRY.3 \ queue.3 TAILQ_FIRST.3 \ queue.3 TAILQ_FOREACH.3 \ queue.3 TAILQ_FOREACH_FROM.3 \ queue.3 TAILQ_FOREACH_FROM_SAFE.3 \ queue.3 TAILQ_FOREACH_REVERSE.3 \ queue.3 TAILQ_FOREACH_REVERSE_FROM.3 \ queue.3 TAILQ_FOREACH_REVERSE_FROM_SAFE.3 \ queue.3 TAILQ_FOREACH_REVERSE_SAFE.3 \ queue.3 TAILQ_FOREACH_SAFE.3 \ queue.3 TAILQ_HEAD.3 \ queue.3 TAILQ_HEAD_INITIALIZER.3 \ queue.3 TAILQ_INIT.3 \ queue.3 TAILQ_INSERT_AFTER.3 \ queue.3 TAILQ_INSERT_BEFORE.3 \ queue.3 TAILQ_INSERT_HEAD.3 \ queue.3 TAILQ_INSERT_TAIL.3 \ queue.3 TAILQ_LAST.3 \ queue.3 TAILQ_NEXT.3 \ queue.3 TAILQ_PREV.3 \ queue.3 TAILQ_REMOVE.3 \ queue.3 TAILQ_SWAP.3 MLINKS+= stdarg.3 va_arg.3 \ stdarg.3 va_copy.3 \ stdarg.3 va_end.3 \ stdarg.3 varargs.3 \ stdarg.3 va_start.3 MLINKS+= timeradd.3 timerclear.3 \ timeradd.3 timercmp.3 \ timeradd.3 timerisset.3 \ timeradd.3 timersub.3 MLINKS+= tree.3 RB_EMPTY.3 \ tree.3 RB_ENTRY.3 \ tree.3 RB_FIND.3 \ tree.3 RB_FOREACH.3 \ tree.3 RB_FOREACH_REVERSE.3 \ tree.3 RB_GENERATE.3 \ tree.3 RB_GENERATE_STATIC.3 \ tree.3 RB_HEAD.3 \ tree.3 RB_INIT.3 \ tree.3 RB_INITIALIZER.3 \ tree.3 RB_INSERT.3 \ tree.3 RB_LEFT.3 \ tree.3 RB_MAX.3 \ tree.3 RB_MIN.3 \ tree.3 RB_NEXT.3 \ tree.3 RB_NFIND.3 \ tree.3 RB_PARENT.3 \ tree.3 RB_PREV.3 \ tree.3 RB_PROTOTYPE.3 \ tree.3 RB_PROTOTYPE_STATIC.3 \ tree.3 RB_REMOVE.3 \ 
tree.3 RB_RIGHT.3 \ tree.3 RB_ROOT.3 \ tree.3 SPLAY_EMPTY.3 \ tree.3 SPLAY_ENTRY.3 \ tree.3 SPLAY_FIND.3 \ tree.3 SPLAY_FOREACH.3 \ tree.3 SPLAY_GENERATE.3 \ tree.3 SPLAY_HEAD.3 \ tree.3 SPLAY_INIT.3 \ tree.3 SPLAY_INITIALIZER.3 \ tree.3 SPLAY_INSERT.3 \ tree.3 SPLAY_LEFT.3 \ tree.3 SPLAY_MAX.3 \ tree.3 SPLAY_MIN.3 \ tree.3 SPLAY_NEXT.3 \ tree.3 SPLAY_PROTOTYPE.3 \ tree.3 SPLAY_REMOVE.3 \ tree.3 SPLAY_RIGHT.3 \ tree.3 SPLAY_ROOT.3 .if ${MK_LIBTHR} != "no" PTHREAD_MAN= pthread.3 \ pthread_affinity_np.3 \ pthread_atfork.3 \ pthread_attr.3 \ pthread_attr_affinity_np.3 \ pthread_attr_get_np.3 \ pthread_attr_setcreatesuspend_np.3 \ pthread_barrierattr.3 \ pthread_barrier_destroy.3 \ pthread_cancel.3 \ pthread_cleanup_pop.3 \ pthread_cleanup_push.3 \ pthread_condattr.3 \ pthread_cond_broadcast.3 \ pthread_cond_destroy.3 \ pthread_cond_init.3 \ pthread_cond_signal.3 \ pthread_cond_timedwait.3 \ pthread_cond_wait.3 \ pthread_create.3 \ pthread_detach.3 \ pthread_equal.3 \ pthread_exit.3 \ pthread_getconcurrency.3 \ pthread_getcpuclockid.3 \ pthread_getspecific.3 \ pthread_getthreadid_np.3 \ pthread_join.3 \ pthread_key_create.3 \ pthread_key_delete.3 \ pthread_kill.3 \ pthread_main_np.3 \ pthread_multi_np.3 \ pthread_mutexattr.3 \ pthread_mutexattr_getkind_np.3 \ pthread_mutex_consistent.3 \ pthread_mutex_destroy.3 \ pthread_mutex_init.3 \ pthread_mutex_lock.3 \ pthread_mutex_timedlock.3 \ pthread_mutex_trylock.3 \ pthread_mutex_unlock.3 \ pthread_once.3 \ pthread_resume_all_np.3 \ pthread_resume_np.3 \ pthread_rwlockattr_destroy.3 \ pthread_rwlockattr_getpshared.3 \ pthread_rwlockattr_init.3 \ pthread_rwlockattr_setpshared.3 \ pthread_rwlock_destroy.3 \ pthread_rwlock_init.3 \ pthread_rwlock_rdlock.3 \ pthread_rwlock_timedrdlock.3 \ pthread_rwlock_timedwrlock.3 \ pthread_rwlock_unlock.3 \ pthread_rwlock_wrlock.3 \ pthread_schedparam.3 \ pthread_self.3 \ pthread_set_name_np.3 \ pthread_setspecific.3 \ pthread_sigmask.3 \ pthread_spin_init.3 \ pthread_spin_lock.3 \ pthread_suspend_all_np.3 \ pthread_suspend_np.3 \ pthread_switch_add_np.3 \ pthread_testcancel.3 \ pthread_yield.3 PTHREAD_MLINKS= pthread_affinity_np.3 pthread_getaffinity_np.3 \ pthread_affinity_np.3 pthread_setaffinity_np.3 PTHREAD_MLINKS+=pthread_attr.3 pthread_attr_destroy.3 \ pthread_attr.3 pthread_attr_getdetachstate.3 \ pthread_attr.3 pthread_attr_getguardsize.3 \ pthread_attr.3 pthread_attr_getinheritsched.3 \ pthread_attr.3 pthread_attr_getschedparam.3 \ pthread_attr.3 pthread_attr_getschedpolicy.3 \ pthread_attr.3 pthread_attr_getscope.3 \ pthread_attr.3 pthread_attr_getstack.3 \ pthread_attr.3 pthread_attr_getstackaddr.3 \ pthread_attr.3 pthread_attr_getstacksize.3 \ pthread_attr.3 pthread_attr_init.3 \ pthread_attr.3 pthread_attr_setdetachstate.3 \ pthread_attr.3 pthread_attr_setguardsize.3 \ pthread_attr.3 pthread_attr_setinheritsched.3 \ pthread_attr.3 pthread_attr_setschedparam.3 \ pthread_attr.3 pthread_attr_setschedpolicy.3 \ pthread_attr.3 pthread_attr_setscope.3 \ pthread_attr.3 pthread_attr_setstack.3 \ pthread_attr.3 pthread_attr_setstackaddr.3 \ pthread_attr.3 pthread_attr_setstacksize.3 PTHREAD_MLINKS+=pthread_attr_affinity_np.3 pthread_attr_getaffinity_np.3 \ pthread_attr_affinity_np.3 pthread_attr_setaffinity_np.3 PTHREAD_MLINKS+=pthread_barrierattr.3 pthread_barrierattr_destroy.3 \ pthread_barrierattr.3 pthread_barrierattr_getpshared.3 \ pthread_barrierattr.3 pthread_barrierattr_init.3 \ pthread_barrierattr.3 pthread_barrierattr_setpshared.3 PTHREAD_MLINKS+=pthread_barrier_destroy.3 pthread_barrier_init.3 \ 
pthread_barrier_destroy.3 pthread_barrier_wait.3 PTHREAD_MLINKS+=pthread_condattr.3 pthread_condattr_destroy.3 \ pthread_condattr.3 pthread_condattr_init.3 \ pthread_condattr.3 pthread_condattr_getclock.3 \ pthread_condattr.3 pthread_condattr_setclock.3 \ pthread_condattr.3 pthread_condattr_getpshared.3 \ pthread_condattr.3 pthread_condattr_setpshared.3 PTHREAD_MLINKS+=pthread_getconcurrency.3 pthread_setconcurrency.3 PTHREAD_MLINKS+=pthread_multi_np.3 pthread_single_np.3 PTHREAD_MLINKS+=pthread_mutexattr.3 pthread_mutexattr_destroy.3 \ pthread_mutexattr.3 pthread_mutexattr_getprioceiling.3 \ pthread_mutexattr.3 pthread_mutexattr_getprotocol.3 \ pthread_mutexattr.3 pthread_mutexattr_getrobust.3 \ pthread_mutexattr.3 pthread_mutexattr_gettype.3 \ pthread_mutexattr.3 pthread_mutexattr_init.3 \ pthread_mutexattr.3 pthread_mutexattr_setprioceiling.3 \ pthread_mutexattr.3 pthread_mutexattr_setprotocol.3 \ pthread_mutexattr.3 pthread_mutexattr_setrobust.3 \ pthread_mutexattr.3 pthread_mutexattr_settype.3 PTHREAD_MLINKS+=pthread_mutexattr_getkind_np.3 pthread_mutexattr_setkind_np.3 PTHREAD_MLINKS+=pthread_rwlock_rdlock.3 pthread_rwlock_tryrdlock.3 PTHREAD_MLINKS+=pthread_rwlock_wrlock.3 pthread_rwlock_trywrlock.3 PTHREAD_MLINKS+=pthread_schedparam.3 pthread_getschedparam.3 \ pthread_schedparam.3 pthread_setschedparam.3 PTHREAD_MLINKS+=pthread_spin_init.3 pthread_spin_destroy.3 \ pthread_spin_lock.3 pthread_spin_trylock.3 \ pthread_spin_lock.3 pthread_spin_unlock.3 PTHREAD_MLINKS+=pthread_switch_add_np.3 pthread_switch_delete_np.3 PTHREAD_MLINKS+=pthread_testcancel.3 pthread_setcancelstate.3 \ pthread_testcancel.3 pthread_setcanceltype.3 PTHREAD_MLINKS+=pthread_join.3 pthread_timedjoin_np.3 .endif .include Index: projects/clang390-import/share/man/man3/queue.3 =================================================================== --- projects/clang390-import/share/man/man3/queue.3 (revision 305686) +++ projects/clang390-import/share/man/man3/queue.3 (revision 305687) @@ -1,1293 +1,1337 @@ .\" Copyright (c) 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)queue.3 8.2 (Berkeley) 1/24/94 .\" $FreeBSD$ .\" -.Dd August 15, 2016 +.Dd September 8, 2016 .Dt QUEUE 3 .Os .Sh NAME .Nm SLIST_CLASS_ENTRY , .Nm SLIST_CLASS_HEAD , .Nm SLIST_CONCAT , .Nm SLIST_EMPTY , .Nm SLIST_ENTRY , .Nm SLIST_FIRST , .Nm SLIST_FOREACH , .Nm SLIST_FOREACH_FROM , .Nm SLIST_FOREACH_FROM_SAFE , .Nm SLIST_FOREACH_SAFE , .Nm SLIST_HEAD , .Nm SLIST_HEAD_INITIALIZER , .Nm SLIST_INIT , .Nm SLIST_INSERT_AFTER , .Nm SLIST_INSERT_HEAD , .Nm SLIST_NEXT , .Nm SLIST_REMOVE , .Nm SLIST_REMOVE_AFTER , .Nm SLIST_REMOVE_HEAD , .Nm SLIST_SWAP , .Nm STAILQ_CLASS_ENTRY , .Nm STAILQ_CLASS_HEAD , .Nm STAILQ_CONCAT , .Nm STAILQ_EMPTY , .Nm STAILQ_ENTRY , .Nm STAILQ_FIRST , .Nm STAILQ_FOREACH , .Nm STAILQ_FOREACH_FROM , .Nm STAILQ_FOREACH_FROM_SAFE , .Nm STAILQ_FOREACH_SAFE , .Nm STAILQ_HEAD , .Nm STAILQ_HEAD_INITIALIZER , .Nm STAILQ_INIT , .Nm STAILQ_INSERT_AFTER , .Nm STAILQ_INSERT_HEAD , .Nm STAILQ_INSERT_TAIL , .Nm STAILQ_LAST , .Nm STAILQ_NEXT , .Nm STAILQ_REMOVE , .Nm STAILQ_REMOVE_AFTER , .Nm STAILQ_REMOVE_HEAD , .Nm STAILQ_SWAP , .Nm LIST_CLASS_ENTRY , .Nm LIST_CLASS_HEAD , .Nm LIST_CONCAT , .Nm LIST_EMPTY , .Nm LIST_ENTRY , .Nm LIST_FIRST , .Nm LIST_FOREACH , .Nm LIST_FOREACH_FROM , .Nm LIST_FOREACH_FROM_SAFE , .Nm LIST_FOREACH_SAFE , .Nm LIST_HEAD , .Nm LIST_HEAD_INITIALIZER , .Nm LIST_INIT , .Nm LIST_INSERT_AFTER , .Nm LIST_INSERT_BEFORE , .Nm LIST_INSERT_HEAD , .Nm LIST_NEXT , .Nm LIST_PREV , .Nm LIST_REMOVE , .Nm LIST_SWAP , .Nm TAILQ_CLASS_ENTRY , .Nm TAILQ_CLASS_HEAD , .Nm TAILQ_CONCAT , .Nm TAILQ_EMPTY , .Nm TAILQ_ENTRY , .Nm TAILQ_FIRST , .Nm TAILQ_FOREACH , .Nm TAILQ_FOREACH_FROM , .Nm TAILQ_FOREACH_FROM_SAFE , .Nm TAILQ_FOREACH_REVERSE , .Nm TAILQ_FOREACH_REVERSE_FROM , .Nm TAILQ_FOREACH_REVERSE_FROM_SAFE , .Nm TAILQ_FOREACH_REVERSE_SAFE , .Nm TAILQ_FOREACH_SAFE , .Nm TAILQ_HEAD , .Nm TAILQ_HEAD_INITIALIZER , .Nm TAILQ_INIT , .Nm TAILQ_INSERT_AFTER , .Nm TAILQ_INSERT_BEFORE , .Nm TAILQ_INSERT_HEAD , .Nm TAILQ_INSERT_TAIL , .Nm TAILQ_LAST , .Nm TAILQ_NEXT , .Nm TAILQ_PREV , .Nm TAILQ_REMOVE , .Nm TAILQ_SWAP .Nd implementations of singly-linked lists, singly-linked tail queues, lists and tail queues .Sh SYNOPSIS .In sys/queue.h .\" .Fn SLIST_CLASS_ENTRY "CLASSTYPE" .Fn SLIST_CLASS_HEAD "HEADNAME" "CLASSTYPE" .Fn SLIST_CONCAT "SLIST_HEAD *head1" "SLIST_HEAD *head2" "TYPE" "SLIST_ENTRY NAME" .Fn SLIST_EMPTY "SLIST_HEAD *head" .Fn SLIST_ENTRY "TYPE" .Fn SLIST_FIRST "SLIST_HEAD *head" .Fn SLIST_FOREACH "TYPE *var" "SLIST_HEAD *head" "SLIST_ENTRY NAME" .Fn SLIST_FOREACH_FROM "TYPE *var" "SLIST_HEAD *head" "SLIST_ENTRY NAME" .Fn SLIST_FOREACH_FROM_SAFE "TYPE *var" "SLIST_HEAD *head" "SLIST_ENTRY NAME" "TYPE *temp_var" .Fn SLIST_FOREACH_SAFE "TYPE *var" "SLIST_HEAD *head" "SLIST_ENTRY NAME" "TYPE *temp_var" .Fn SLIST_HEAD "HEADNAME" "TYPE" .Fn SLIST_HEAD_INITIALIZER "SLIST_HEAD head" .Fn SLIST_INIT "SLIST_HEAD *head" .Fn SLIST_INSERT_AFTER "TYPE *listelm" "TYPE *elm" "SLIST_ENTRY NAME" .Fn SLIST_INSERT_HEAD "SLIST_HEAD *head" "TYPE *elm" 
"SLIST_ENTRY NAME" .Fn SLIST_NEXT "TYPE *elm" "SLIST_ENTRY NAME" .Fn SLIST_REMOVE "SLIST_HEAD *head" "TYPE *elm" "TYPE" "SLIST_ENTRY NAME" .Fn SLIST_REMOVE_AFTER "TYPE *elm" "SLIST_ENTRY NAME" .Fn SLIST_REMOVE_HEAD "SLIST_HEAD *head" "SLIST_ENTRY NAME" .Fn SLIST_SWAP "SLIST_HEAD *head1" "SLIST_HEAD *head2" "TYPE" .\" .Fn STAILQ_CLASS_ENTRY "CLASSTYPE" .Fn STAILQ_CLASS_HEAD "HEADNAME" "CLASSTYPE" .Fn STAILQ_CONCAT "STAILQ_HEAD *head1" "STAILQ_HEAD *head2" .Fn STAILQ_EMPTY "STAILQ_HEAD *head" .Fn STAILQ_ENTRY "TYPE" .Fn STAILQ_FIRST "STAILQ_HEAD *head" .Fn STAILQ_FOREACH "TYPE *var" "STAILQ_HEAD *head" "STAILQ_ENTRY NAME" .Fn STAILQ_FOREACH_FROM "TYPE *var" "STAILQ_HEAD *head" "STAILQ_ENTRY NAME" .Fn STAILQ_FOREACH_FROM_SAFE "TYPE *var" "STAILQ_HEAD *head" "STAILQ_ENTRY NAME" "TYPE *temp_var" .Fn STAILQ_FOREACH_SAFE "TYPE *var" "STAILQ_HEAD *head" "STAILQ_ENTRY NAME" "TYPE *temp_var" .Fn STAILQ_HEAD "HEADNAME" "TYPE" .Fn STAILQ_HEAD_INITIALIZER "STAILQ_HEAD head" .Fn STAILQ_INIT "STAILQ_HEAD *head" .Fn STAILQ_INSERT_AFTER "STAILQ_HEAD *head" "TYPE *listelm" "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_INSERT_HEAD "STAILQ_HEAD *head" "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_INSERT_TAIL "STAILQ_HEAD *head" "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_LAST "STAILQ_HEAD *head" "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_NEXT "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_REMOVE "STAILQ_HEAD *head" "TYPE *elm" "TYPE" "STAILQ_ENTRY NAME" .Fn STAILQ_REMOVE_AFTER "STAILQ_HEAD *head" "TYPE *elm" "STAILQ_ENTRY NAME" .Fn STAILQ_REMOVE_HEAD "STAILQ_HEAD *head" "STAILQ_ENTRY NAME" .Fn STAILQ_SWAP "STAILQ_HEAD *head1" "STAILQ_HEAD *head2" "TYPE" .\" .Fn LIST_CLASS_ENTRY "CLASSTYPE" .Fn LIST_CLASS_HEAD "HEADNAME" "CLASSTYPE" .Fn LIST_CONCAT "LIST_HEAD *head1" "LIST_HEAD *head2" "TYPE" "LIST_ENTRY NAME" .Fn LIST_EMPTY "LIST_HEAD *head" .Fn LIST_ENTRY "TYPE" .Fn LIST_FIRST "LIST_HEAD *head" .Fn LIST_FOREACH "TYPE *var" "LIST_HEAD *head" "LIST_ENTRY NAME" .Fn LIST_FOREACH_FROM "TYPE *var" "LIST_HEAD *head" "LIST_ENTRY NAME" .Fn LIST_FOREACH_FROM_SAFE "TYPE *var" "LIST_HEAD *head" "LIST_ENTRY NAME" "TYPE *temp_var" .Fn LIST_FOREACH_SAFE "TYPE *var" "LIST_HEAD *head" "LIST_ENTRY NAME" "TYPE *temp_var" .Fn LIST_HEAD "HEADNAME" "TYPE" .Fn LIST_HEAD_INITIALIZER "LIST_HEAD head" .Fn LIST_INIT "LIST_HEAD *head" .Fn LIST_INSERT_AFTER "TYPE *listelm" "TYPE *elm" "LIST_ENTRY NAME" .Fn LIST_INSERT_BEFORE "TYPE *listelm" "TYPE *elm" "LIST_ENTRY NAME" .Fn LIST_INSERT_HEAD "LIST_HEAD *head" "TYPE *elm" "LIST_ENTRY NAME" .Fn LIST_NEXT "TYPE *elm" "LIST_ENTRY NAME" .Fn LIST_PREV "TYPE *elm" "LIST_HEAD *head" "TYPE" "LIST_ENTRY NAME" .Fn LIST_REMOVE "TYPE *elm" "LIST_ENTRY NAME" .Fn LIST_SWAP "LIST_HEAD *head1" "LIST_HEAD *head2" "TYPE" "LIST_ENTRY NAME" .\" .Fn TAILQ_CLASS_ENTRY "CLASSTYPE" .Fn TAILQ_CLASS_HEAD "HEADNAME" "CLASSTYPE" .Fn TAILQ_CONCAT "TAILQ_HEAD *head1" "TAILQ_HEAD *head2" "TAILQ_ENTRY NAME" .Fn TAILQ_EMPTY "TAILQ_HEAD *head" .Fn TAILQ_ENTRY "TYPE" .Fn TAILQ_FIRST "TAILQ_HEAD *head" .Fn TAILQ_FOREACH "TYPE *var" "TAILQ_HEAD *head" "TAILQ_ENTRY NAME" .Fn TAILQ_FOREACH_FROM "TYPE *var" "TAILQ_HEAD *head" "TAILQ_ENTRY NAME" .Fn TAILQ_FOREACH_FROM_SAFE "TYPE *var" "TAILQ_HEAD *head" "TAILQ_ENTRY NAME" "TYPE *temp_var" .Fn TAILQ_FOREACH_REVERSE "TYPE *var" "TAILQ_HEAD *head" "HEADNAME" "TAILQ_ENTRY NAME" .Fn TAILQ_FOREACH_REVERSE_FROM "TYPE *var" "TAILQ_HEAD *head" "HEADNAME" "TAILQ_ENTRY NAME" .Fn TAILQ_FOREACH_REVERSE_FROM_SAFE "TYPE *var" "TAILQ_HEAD *head" "HEADNAME" "TAILQ_ENTRY NAME" "TYPE *temp_var" .Fn 
TAILQ_FOREACH_REVERSE_SAFE "TYPE *var" "TAILQ_HEAD *head" "HEADNAME" "TAILQ_ENTRY NAME" "TYPE *temp_var" .Fn TAILQ_FOREACH_SAFE "TYPE *var" "TAILQ_HEAD *head" "TAILQ_ENTRY NAME" "TYPE *temp_var" .Fn TAILQ_HEAD "HEADNAME" "TYPE" .Fn TAILQ_HEAD_INITIALIZER "TAILQ_HEAD head" .Fn TAILQ_INIT "TAILQ_HEAD *head" .Fn TAILQ_INSERT_AFTER "TAILQ_HEAD *head" "TYPE *listelm" "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_INSERT_BEFORE "TYPE *listelm" "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_INSERT_HEAD "TAILQ_HEAD *head" "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_INSERT_TAIL "TAILQ_HEAD *head" "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_LAST "TAILQ_HEAD *head" "HEADNAME" .Fn TAILQ_NEXT "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_PREV "TYPE *elm" "HEADNAME" "TAILQ_ENTRY NAME" .Fn TAILQ_REMOVE "TAILQ_HEAD *head" "TYPE *elm" "TAILQ_ENTRY NAME" .Fn TAILQ_SWAP "TAILQ_HEAD *head1" "TAILQ_HEAD *head2" "TYPE" "TAILQ_ENTRY NAME" .\" .Sh DESCRIPTION These macros define and operate on four types of data structures which can be used in both C and C++ source code: .Bl -enum -compact -offset indent .It Lists .It Singly-linked lists .It Singly-linked tail queues .It Tail queues .El All four structures support the following functionality: .Bl -enum -compact -offset indent .It Insertion of a new entry at the head of the list. .It Insertion of a new entry after any element in the list. .It O(1) removal of an entry from the head of the list. .It Forward traversal through the list. .It Swapping the contents of two lists. .El .Pp Singly-linked lists are the simplest of the four data structures. Singly-linked lists are ideal for applications with large datasets and few or no removals, or for implementing a LIFO queue. Singly-linked lists add the following functionality: .Bl -enum -compact -offset indent .It O(n) removal of any entry in the list. .It O(n) concatenation of two lists. .El .Pp Singly-linked tail queues add the following functionality: .Bl -enum -compact -offset indent .It Entries can be added at the end of a list. .It O(n) removal of any entry in the list. .It They may be concatenated. .El However: .Bl -enum -compact -offset indent .It All list insertions must specify the head of the list. .It Each head entry requires two pointers rather than one. .It Code size is about 15% greater and operations run about 20% slower than singly-linked lists. .El .Pp Singly-linked tail queues are ideal for applications with large datasets and few or no removals, or for implementing a FIFO queue. .Pp All doubly linked types of data structures (lists and tail queues) additionally allow: .Bl -enum -compact -offset indent .It Insertion of a new entry before any element in the list. .It O(1) removal of any entry in the list. .El However: .Bl -enum -compact -offset indent .It Each element requires two pointers rather than one. .It Code size and execution time of operations (except for removal) are about twice those of the singly-linked data structures. .El .Pp Linked lists are the simplest of the doubly linked data structures. They add the following functionality over the above: .Bl -enum -compact -offset indent .It O(n) concatenation of two lists. .It They may be traversed backwards. .El However: .Bl -enum -compact -offset indent .It To traverse backwards, an entry to begin the traversal and the list in which it is contained must be specified. .El .Pp Tail queues add the following functionality: .Bl -enum -compact -offset indent .It Entries can be added at the end of a list.
.It They may be traversed backwards, from tail to head. .It They may be concatenated. .El However: .Bl -enum -compact -offset indent .It All list insertions and removals must specify the head of the list. .It Each head entry requires two pointers rather than one. .It Code size is about 15% greater and operations run about 20% slower than singly-linked lists. .El .Pp In the macro definitions, .Fa TYPE is the name of a user defined structure. The structure must contain a field called .Fa NAME which is of type .Li SLIST_ENTRY , .Li STAILQ_ENTRY , .Li LIST_ENTRY , or .Li TAILQ_ENTRY . In the macro definitions, .Fa CLASSTYPE is the name of a user defined class. The class must contain a field called .Fa NAME which is of type .Li SLIST_CLASS_ENTRY , .Li STAILQ_CLASS_ENTRY , .Li LIST_CLASS_ENTRY , or .Li TAILQ_CLASS_ENTRY . The argument .Fa HEADNAME is the name of a user defined structure that must be declared using the macros .Li SLIST_HEAD , .Li SLIST_CLASS_HEAD , .Li STAILQ_HEAD , .Li STAILQ_CLASS_HEAD , .Li LIST_HEAD , .Li LIST_CLASS_HEAD , .Li TAILQ_HEAD , or .Li TAILQ_CLASS_HEAD . See the examples below for further explanation of how these macros are used. .Sh SINGLY-LINKED LISTS A singly-linked list is headed by a structure defined by the .Nm SLIST_HEAD macro. This structure contains a single pointer to the first element on the list. The elements are singly linked for minimum space and pointer manipulation overhead at the expense of O(n) removal for arbitrary elements. New elements can be added to the list after an existing element or at the head of the list. An .Fa SLIST_HEAD structure is declared as follows: .Bd -literal -offset indent SLIST_HEAD(HEADNAME, TYPE) head; .Ed .Pp where .Fa HEADNAME is the name of the structure to be defined, and .Fa TYPE is the type of the elements to be linked into the list. A pointer to the head of the list can later be declared as: .Bd -literal -offset indent struct HEADNAME *headp; .Ed .Pp (The names .Li head and .Li headp are user selectable.) .Pp The macro .Nm SLIST_HEAD_INITIALIZER evaluates to an initializer for the list .Fa head . .Pp The macro .Nm SLIST_CONCAT concatenates the list headed by .Fa head2 onto the end of the one headed by .Fa head1 removing all entries from the former. Use of this macro should be avoided as it traverses the entirety of the .Fa head1 list. A singly-linked tail queue should be used if this macro is needed in high-usage code paths or to operate on long lists. .Pp The macro .Nm SLIST_EMPTY evaluates to true if there are no elements in the list. .Pp The macro .Nm SLIST_ENTRY declares a structure that connects the elements in the list. .Pp The macro .Nm SLIST_FIRST returns the first element in the list or NULL if the list is empty. .Pp The macro .Nm SLIST_FOREACH traverses the list referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . .Pp The macro .Nm SLIST_FOREACH_FROM behaves identically to .Nm SLIST_FOREACH when .Fa var is NULL, else it treats .Fa var as a previously found SLIST element and begins the loop at .Fa var instead of the first element in the SLIST referenced by .Fa head . .Pp The macro .Nm SLIST_FOREACH_SAFE traverses the list referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . However, unlike .Fn SLIST_FOREACH here it is permitted to both remove .Fa var as well as free it from within the loop safely without interfering with the traversal. 
.Pp The macro .Nm SLIST_FOREACH_FROM_SAFE behaves identically to .Nm SLIST_FOREACH_SAFE when .Fa var is NULL, else it treats .Fa var as a previously found SLIST element and begins the loop at .Fa var instead of the first element in the SLIST referenced by .Fa head . .Pp The macro .Nm SLIST_INIT initializes the list referenced by .Fa head . .Pp The macro .Nm SLIST_INSERT_HEAD inserts the new element .Fa elm at the head of the list. .Pp The macro .Nm SLIST_INSERT_AFTER inserts the new element .Fa elm after the element .Fa listelm . .Pp The macro .Nm SLIST_NEXT returns the next element in the list. .Pp The macro .Nm SLIST_REMOVE_AFTER removes the element after .Fa elm from the list. Unlike .Fa SLIST_REMOVE , this macro does not traverse the entire list. .Pp The macro .Nm SLIST_REMOVE_HEAD removes the element at the head of the list. For optimum efficiency, elements being removed from the head of the list should explicitly use this macro instead of the generic .Fa SLIST_REMOVE macro. .Pp The macro .Nm SLIST_REMOVE removes the element .Fa elm from the list. Use of this macro should be avoided as it traverses the entire list. A doubly-linked list should be used if this macro is needed in high-usage code paths or to operate on long lists. .Pp The macro .Nm SLIST_SWAP swaps the contents of .Fa head1 and .Fa head2 . .Sh SINGLY-LINKED LIST EXAMPLE .Bd -literal SLIST_HEAD(slisthead, entry) head = SLIST_HEAD_INITIALIZER(head); struct slisthead *headp; /* Singly-linked List head. */ struct entry { ... SLIST_ENTRY(entry) entries; /* Singly-linked List. */ ... } *n1, *n2, *n3, *np, *np_temp; SLIST_INIT(&head); /* Initialize the list. */ n1 = malloc(sizeof(struct entry)); /* Insert at the head. */ SLIST_INSERT_HEAD(&head, n1, entries); n2 = malloc(sizeof(struct entry)); /* Insert after. */ SLIST_INSERT_AFTER(n1, n2, entries); SLIST_REMOVE(&head, n2, entry, entries);/* Deletion. */ free(n2); n3 = SLIST_FIRST(&head); SLIST_REMOVE_HEAD(&head, entries); /* Deletion from the head. */ free(n3); /* Forward traversal. */ SLIST_FOREACH(np, &head, entries) np-> ... /* Safe forward traversal. */ SLIST_FOREACH_SAFE(np, &head, entries, np_temp) { np->do_stuff(); ... SLIST_REMOVE(&head, np, entry, entries); free(np); } while (!SLIST_EMPTY(&head)) { /* List Deletion. */ n1 = SLIST_FIRST(&head); SLIST_REMOVE_HEAD(&head, entries); free(n1); } .Ed .Sh SINGLY-LINKED TAIL QUEUES A singly-linked tail queue is headed by a structure defined by the .Nm STAILQ_HEAD macro. This structure contains a pair of pointers, one to the first element in the tail queue and the other to the last element in the tail queue. The elements are singly linked for minimum space and pointer manipulation overhead at the expense of O(n) removal for arbitrary elements. New elements can be added to the tail queue after an existing element, at the head of the tail queue, or at the end of the tail queue. A .Fa STAILQ_HEAD structure is declared as follows: .Bd -literal -offset indent STAILQ_HEAD(HEADNAME, TYPE) head; .Ed .Pp where .Li HEADNAME is the name of the structure to be defined, and .Li TYPE is the type of the elements to be linked into the tail queue. A pointer to the head of the tail queue can later be declared as: .Bd -literal -offset indent struct HEADNAME *headp; .Ed .Pp (The names .Li head and .Li headp are user selectable.) .Pp The macro .Nm STAILQ_HEAD_INITIALIZER evaluates to an initializer for the tail queue .Fa head .
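.Pp
A minimal sketch of compile-time initialization (the structure and
variable names here are hypothetical): because
.Nm STAILQ_HEAD_INITIALIZER
expands to a static initializer, a file-scope tail queue head can be
made usable without a run-time
.Nm STAILQ_INIT
call:
.Bd -literal -offset indent
#include <sys/queue.h>

struct entry {
	int value;
	STAILQ_ENTRY(entry) link;
};

/* Initialized at compile time; no STAILQ_INIT needed. */
static STAILQ_HEAD(entryhead, entry) pending =
    STAILQ_HEAD_INITIALIZER(pending);
.Ed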
.Pp The macro .Nm STAILQ_CONCAT concatenates the tail queue headed by .Fa head2 onto the end of the one headed by .Fa head1 removing all entries from the former. .Pp The macro .Nm STAILQ_EMPTY evaluates to true if there are no items on the tail queue. .Pp The macro .Nm STAILQ_ENTRY declares a structure that connects the elements in the tail queue. .Pp The macro .Nm STAILQ_FIRST returns the first item on the tail queue or NULL if the tail queue is empty. .Pp The macro .Nm STAILQ_FOREACH traverses the tail queue referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . .Pp The macro .Nm STAILQ_FOREACH_FROM behaves identically to .Nm STAILQ_FOREACH when .Fa var is NULL, else it treats .Fa var as a previously found STAILQ element and begins the loop at .Fa var instead of the first element in the STAILQ referenced by .Fa head . .Pp The macro .Nm STAILQ_FOREACH_SAFE traverses the tail queue referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . However, unlike .Fn STAILQ_FOREACH here it is permitted to both remove .Fa var as well as free it from within the loop safely without interfering with the traversal. .Pp The macro .Nm STAILQ_FOREACH_FROM_SAFE behaves identically to .Nm STAILQ_FOREACH_SAFE when .Fa var is NULL, else it treats .Fa var as a previously found STAILQ element and begins the loop at .Fa var instead of the first element in the STAILQ referenced by .Fa head . .Pp The macro .Nm STAILQ_INIT initializes the tail queue referenced by .Fa head . .Pp The macro .Nm STAILQ_INSERT_HEAD inserts the new element .Fa elm at the head of the tail queue. .Pp The macro .Nm STAILQ_INSERT_TAIL inserts the new element .Fa elm at the end of the tail queue. .Pp The macro .Nm STAILQ_INSERT_AFTER inserts the new element .Fa elm after the element .Fa listelm . .Pp The macro .Nm STAILQ_LAST returns the last item on the tail queue. If the tail queue is empty the return value is .Dv NULL . .Pp The macro .Nm STAILQ_NEXT returns the next item on the tail queue, or NULL if this item is the last. .Pp The macro .Nm STAILQ_REMOVE_AFTER removes the element after .Fa elm from the tail queue. Unlike .Fa STAILQ_REMOVE , this macro does not traverse the entire tail queue. .Pp The macro .Nm STAILQ_REMOVE_HEAD removes the element at the head of the tail queue. For optimum efficiency, elements being removed from the head of the tail queue should use this macro explicitly rather than the generic .Fa STAILQ_REMOVE macro. .Pp The macro .Nm STAILQ_REMOVE removes the element .Fa elm from the tail queue. Use of this macro should be avoided as it traverses the entire tail queue. A doubly-linked tail queue should be used if this macro is needed in high-usage code paths or to operate on long tail queues. .Pp The macro .Nm STAILQ_SWAP swaps the contents of .Fa head1 and .Fa head2 . .Sh SINGLY-LINKED TAIL QUEUE EXAMPLE .Bd -literal STAILQ_HEAD(stailhead, entry) head = STAILQ_HEAD_INITIALIZER(head); struct stailhead *headp; /* Singly-linked tail queue head. */ struct entry { ... STAILQ_ENTRY(entry) entries; /* Tail queue. */ ... } *n1, *n2, *n3, *np, *np_temp; STAILQ_INIT(&head); /* Initialize the queue. */ n1 = malloc(sizeof(struct entry)); /* Insert at the head. */ STAILQ_INSERT_HEAD(&head, n1, entries); n1 = malloc(sizeof(struct entry)); /* Insert at the tail. */ STAILQ_INSERT_TAIL(&head, n1, entries); n2 = malloc(sizeof(struct entry)); /* Insert after. */ STAILQ_INSERT_AFTER(&head, n1, n2, entries); /* Deletion.
*/ STAILQ_REMOVE(&head, n2, entry, entries); free(n2); /* Deletion from the head. */ n3 = STAILQ_FIRST(&head); STAILQ_REMOVE_HEAD(&head, entries); free(n3); /* Forward traversal. */ STAILQ_FOREACH(np, &head, entries) np-> ... /* Safe forward traversal. */ STAILQ_FOREACH_SAFE(np, &head, entries, np_temp) { np->do_stuff(); ... STAILQ_REMOVE(&head, np, entry, entries); free(np); } /* TailQ Deletion. */ while (!STAILQ_EMPTY(&head)) { n1 = STAILQ_FIRST(&head); STAILQ_REMOVE_HEAD(&head, entries); free(n1); } /* Faster TailQ Deletion. */ n1 = STAILQ_FIRST(&head); while (n1 != NULL) { n2 = STAILQ_NEXT(n1, entries); free(n1); n1 = n2; } STAILQ_INIT(&head); .Ed .Sh LISTS A list is headed by a structure defined by the .Nm LIST_HEAD macro. This structure contains a single pointer to the first element on the list. The elements are doubly linked so that an arbitrary element can be removed without traversing the list. New elements can be added to the list after an existing element, before an existing element, or at the head of the list. A .Fa LIST_HEAD structure is declared as follows: .Bd -literal -offset indent LIST_HEAD(HEADNAME, TYPE) head; .Ed .Pp where .Fa HEADNAME is the name of the structure to be defined, and .Fa TYPE is the type of the elements to be linked into the list. A pointer to the head of the list can later be declared as: .Bd -literal -offset indent struct HEADNAME *headp; .Ed .Pp (The names .Li head and .Li headp are user selectable.) .Pp The macro .Nm LIST_HEAD_INITIALIZER evaluates to an initializer for the list .Fa head . .Pp The macro .Nm LIST_CONCAT concatenates the list headed by .Fa head2 onto the end of the one headed by .Fa head1 removing all entries from the former. Use of this macro should be avoided as it traverses the entirety of the .Fa head1 list. A tail queue should be used if this macro is needed in high-usage code paths or to operate on long lists. .Pp The macro .Nm LIST_EMPTY evaluates to true if there are no elements in the list. .Pp The macro .Nm LIST_ENTRY declares a structure that connects the elements in the list. .Pp The macro .Nm LIST_FIRST returns the first element in the list or NULL if the list is empty. .Pp The macro .Nm LIST_FOREACH traverses the list referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . .Pp The macro .Nm LIST_FOREACH_FROM behaves identically to .Nm LIST_FOREACH when .Fa var is NULL, else it treats .Fa var as a previously found LIST element and begins the loop at .Fa var instead of the first element in the LIST referenced by .Fa head . .Pp The macro .Nm LIST_FOREACH_SAFE traverses the list referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . However, unlike .Fn LIST_FOREACH here it is permitted to both remove .Fa var as well as free it from within the loop safely without interfering with the traversal. .Pp The macro .Nm LIST_FOREACH_FROM_SAFE behaves identically to .Nm LIST_FOREACH_SAFE when .Fa var is NULL, else it treats .Fa var as a previously found LIST element and begins the loop at .Fa var instead of the first element in the LIST referenced by .Fa head . .Pp The macro .Nm LIST_INIT initializes the list referenced by .Fa head . .Pp The macro .Nm LIST_INSERT_HEAD inserts the new element .Fa elm at the head of the list. .Pp The macro .Nm LIST_INSERT_AFTER inserts the new element .Fa elm after the element .Fa listelm . .Pp The macro .Nm LIST_INSERT_BEFORE inserts the new element .Fa elm before the element .Fa listelm . 
.Pp The macro .Nm LIST_NEXT returns the next element in the list, or NULL if this is the last. .Pp The macro .Nm LIST_PREV returns the previous element in the list, or NULL if this is the first. List .Fa head must contain element .Fa elm . .Pp The macro .Nm LIST_REMOVE removes the element .Fa elm from the list. .Pp The macro .Nm LIST_SWAP swaps the contents of .Fa head1 and .Fa head2 . .Sh LIST EXAMPLE .Bd -literal LIST_HEAD(listhead, entry) head = LIST_HEAD_INITIALIZER(head); struct listhead *headp; /* List head. */ struct entry { ... LIST_ENTRY(entry) entries; /* List. */ ... } *n1, *n2, *n3, *np, *np_temp; LIST_INIT(&head); /* Initialize the list. */ n1 = malloc(sizeof(struct entry)); /* Insert at the head. */ LIST_INSERT_HEAD(&head, n1, entries); n2 = malloc(sizeof(struct entry)); /* Insert after. */ LIST_INSERT_AFTER(n1, n2, entries); n3 = malloc(sizeof(struct entry)); /* Insert before. */ LIST_INSERT_BEFORE(n2, n3, entries); LIST_REMOVE(n2, entries); /* Deletion. */ free(n2); /* Forward traversal. */ LIST_FOREACH(np, &head, entries) np-> ... /* Safe forward traversal. */ LIST_FOREACH_SAFE(np, &head, entries, np_temp) { np->do_stuff(); ... LIST_REMOVE(np, entries); free(np); } while (!LIST_EMPTY(&head)) { /* List Deletion. */ n1 = LIST_FIRST(&head); LIST_REMOVE(n1, entries); free(n1); } n1 = LIST_FIRST(&head); /* Faster List Deletion. */ while (n1 != NULL) { n2 = LIST_NEXT(n1, entries); free(n1); n1 = n2; } LIST_INIT(&head); .Ed .Sh TAIL QUEUES A tail queue is headed by a structure defined by the .Nm TAILQ_HEAD macro. This structure contains a pair of pointers, one to the first element in the tail queue and the other to the last element in the tail queue. The elements are doubly linked so that an arbitrary element can be removed without traversing the tail queue. New elements can be added to the tail queue after an existing element, before an existing element, at the head of the tail queue, or at the end of the tail queue. A .Fa TAILQ_HEAD structure is declared as follows: .Bd -literal -offset indent TAILQ_HEAD(HEADNAME, TYPE) head; .Ed .Pp where .Li HEADNAME is the name of the structure to be defined, and .Li TYPE is the type of the elements to be linked into the tail queue. A pointer to the head of the tail queue can later be declared as: .Bd -literal -offset indent struct HEADNAME *headp; .Ed .Pp (The names .Li head and .Li headp are user selectable.) .Pp The macro .Nm TAILQ_HEAD_INITIALIZER evaluates to an initializer for the tail queue .Fa head . .Pp The macro .Nm TAILQ_CONCAT concatenates the tail queue headed by .Fa head2 onto the end of the one headed by .Fa head1 removing all entries from the former. .Pp The macro .Nm TAILQ_EMPTY evaluates to true if there are no items on the tail queue. .Pp The macro .Nm TAILQ_ENTRY declares a structure that connects the elements in the tail queue. .Pp The macro .Nm TAILQ_FIRST returns the first item on the tail queue or NULL if the tail queue is empty. .Pp The macro .Nm TAILQ_FOREACH traverses the tail queue referenced by .Fa head in the forward direction, assigning each element in turn to .Fa var . .Fa var is set to .Dv NULL if the loop completes normally, or if there were no elements. .Pp The macro .Nm TAILQ_FOREACH_FROM behaves identically to .Nm TAILQ_FOREACH when .Fa var is NULL, else it treats .Fa var as a previously found TAILQ element and begins the loop at .Fa var instead of the first element in the TAILQ referenced by .Fa head . 
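.Pp
A minimal sketch of the resume pattern (the types and names here are
hypothetical): because
.Nm TAILQ_FOREACH_FROM
behaves like
.Nm TAILQ_FOREACH
when
.Fa var
is NULL, one loop can start from the head on the first call and
continue from a saved element on later calls:
.Bd -literal -offset indent
#include <stddef.h>
#include <sys/queue.h>

struct entry {
	int value;
	TAILQ_ENTRY(entry) link;
};
TAILQ_HEAD(entryhead, entry);

/* Scan for 'value', resuming at 'start' when it is not NULL,
 * or starting from the head of the queue otherwise. */
static struct entry *
find_from(struct entryhead *head, struct entry *start, int value)
{
	struct entry *np = start;

	TAILQ_FOREACH_FROM(np, head, link) {
		if (np->value == value)
			return (np);
	}
	return (NULL);
}
.Ed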
.Pp The macro .Nm TAILQ_FOREACH_REVERSE traverses the tail queue referenced by .Fa head in the reverse direction, assigning each element in turn to .Fa var . .Pp The macro .Nm TAILQ_FOREACH_REVERSE_FROM behaves identically to .Nm TAILQ_FOREACH_REVERSE when .Fa var is NULL, else it treats .Fa var as a previously found TAILQ element and begins the reverse loop at .Fa var instead of the last element in the TAILQ referenced by .Fa head . .Pp The macros .Nm TAILQ_FOREACH_SAFE and .Nm TAILQ_FOREACH_REVERSE_SAFE traverse the list referenced by .Fa head in the forward or reverse direction respectively, assigning each element in turn to .Fa var . However, unlike their unsafe counterparts .Nm TAILQ_FOREACH and .Nm TAILQ_FOREACH_REVERSE , they permit .Fa var to be removed and freed from within the loop safely without interfering with the traversal. .Pp The macro .Nm TAILQ_FOREACH_FROM_SAFE behaves identically to .Nm TAILQ_FOREACH_SAFE when .Fa var is NULL, else it treats .Fa var as a previously found TAILQ element and begins the loop at .Fa var instead of the first element in the TAILQ referenced by .Fa head . .Pp The macro .Nm TAILQ_FOREACH_REVERSE_FROM_SAFE behaves identically to .Nm TAILQ_FOREACH_REVERSE_SAFE when .Fa var is NULL, else it treats .Fa var as a previously found TAILQ element and begins the reverse loop at .Fa var instead of the last element in the TAILQ referenced by .Fa head . .Pp The macro .Nm TAILQ_INIT initializes the tail queue referenced by .Fa head . .Pp The macro .Nm TAILQ_INSERT_HEAD inserts the new element .Fa elm at the head of the tail queue. .Pp The macro .Nm TAILQ_INSERT_TAIL inserts the new element .Fa elm at the end of the tail queue. .Pp The macro .Nm TAILQ_INSERT_AFTER inserts the new element .Fa elm after the element .Fa listelm . .Pp The macro .Nm TAILQ_INSERT_BEFORE inserts the new element .Fa elm before the element .Fa listelm . .Pp The macro .Nm TAILQ_LAST returns the last item on the tail queue. If the tail queue is empty the return value is .Dv NULL . .Pp The macro .Nm TAILQ_NEXT returns the next item on the tail queue, or NULL if this item is the last. .Pp The macro .Nm TAILQ_PREV returns the previous item on the tail queue, or NULL if this item is the first. .Pp The macro .Nm TAILQ_REMOVE removes the element .Fa elm from the tail queue. .Pp The macro .Nm TAILQ_SWAP swaps the contents of .Fa head1 and .Fa head2 . .Sh TAIL QUEUE EXAMPLE .Bd -literal TAILQ_HEAD(tailhead, entry) head = TAILQ_HEAD_INITIALIZER(head); struct tailhead *headp; /* Tail queue head. */ struct entry { ... TAILQ_ENTRY(entry) entries; /* Tail queue. */ ... } *n1, *n2, *n3, *np, *np_temp; TAILQ_INIT(&head); /* Initialize the queue. */ n1 = malloc(sizeof(struct entry)); /* Insert at the head. */ TAILQ_INSERT_HEAD(&head, n1, entries); n1 = malloc(sizeof(struct entry)); /* Insert at the tail. */ TAILQ_INSERT_TAIL(&head, n1, entries); n2 = malloc(sizeof(struct entry)); /* Insert after. */ TAILQ_INSERT_AFTER(&head, n1, n2, entries); n3 = malloc(sizeof(struct entry)); /* Insert before. */ TAILQ_INSERT_BEFORE(n2, n3, entries); TAILQ_REMOVE(&head, n2, entries); /* Deletion. */ free(n2); /* Forward traversal. */ TAILQ_FOREACH(np, &head, entries) np-> ... /* Safe forward traversal. */ TAILQ_FOREACH_SAFE(np, &head, entries, np_temp) { np->do_stuff(); ... TAILQ_REMOVE(&head, np, entries); free(np); } /* Reverse traversal. */ TAILQ_FOREACH_REVERSE(np, &head, tailhead, entries) np-> ... /* TailQ Deletion.
*/ while (!TAILQ_EMPTY(&head)) { n1 = TAILQ_FIRST(&head); TAILQ_REMOVE(&head, n1, entries); free(n1); } /* Faster TailQ Deletion. */ n1 = TAILQ_FIRST(&head); while (n1 != NULL) { n2 = TAILQ_NEXT(n1, entries); free(n1); n1 = n2; } TAILQ_INIT(&head); .Ed +.Sh DIAGNOSTICS +When debugging +.Nm queue(3) , +it can be useful to trace queue changes. +To enable tracing, define the macro +.Va QUEUE_MACRO_DEBUG_TRACE +at compile time. +.Pp +It can also be useful to trash pointers that have been unlinked from a queue, +to detect use after removal. +To enable pointer trashing, define the macro +.Va QUEUE_MACRO_DEBUG_TRASH +at compile time. +The macro +.Fn QMD_IS_TRASHED "void *ptr" +returns true if +.Fa ptr +has been trashed by the +.Va QUEUE_MACRO_DEBUG_TRASH +option. +.Pp +In the kernel (with +.Va INVARIANTS +enabled), the +.Fn SLIST_REMOVE_PREVPTR +macro is available to aid debugging: +.Bl -hang -offset indent +.It Fn SLIST_REMOVE_PREVPTR "TYPE **prev" "TYPE *elm" "SLIST_ENTRY NAME" +.Pp +Removes +.Fa elm , +which must directly follow the element whose +.Va &SLIST_NEXT() +is +.Fa prev , +from the SLIST. +This macro validates that +.Fa elm +follows +.Fa prev +in +.Va INVARIANTS +mode. +.El .Sh SEE ALSO .Xr tree 3 .Sh HISTORY The .Nm queue functions first appeared in .Bx 4.4 . Index: projects/clang390-import/share/man/man4/ddb.4 =================================================================== --- projects/clang390-import/share/man/man4/ddb.4 (revision 305686) +++ projects/clang390-import/share/man/man4/ddb.4 (revision 305687) @@ -1,1565 +1,1576 @@ .\" .\" Mach Operating System .\" Copyright (c) 1991,1990 Carnegie Mellon University .\" Copyright (c) 2007 Robert N. M. Watson .\" All Rights Reserved. .\" .\" Permission to use, copy, modify and distribute this software and its .\" documentation is hereby granted, provided that both the copyright .\" notice and this permission notice appear in all copies of the .\" software, derivative works or modified versions, and any portions .\" thereof, and that both notices appear in supporting documentation. .\" .\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" .\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR .\" ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. .\" .\" Carnegie Mellon requests users of this software to return to .\" .\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU .\" School of Computer Science .\" Carnegie Mellon University .\" Pittsburgh PA 15213-3890 .\" .\" any improvements or extensions that they make and grant Carnegie Mellon .\" the rights to redistribute these changes. .\" .\" changed a \# to #, since groff choked on it. .\" .\" HISTORY .\" ddb.4,v .\" Revision 1.1 1993/07/15 18:41:02 brezak .\" Man page for DDB .\" .\" Revision 2.6 92/04/08 08:52:57 rpd .\" Changes from OSF. .\" [92/01/17 14:19:22 jsb] .\" Changes for OSF debugger modifications. .\" [91/12/12 tak] .\" .\" Revision 2.5 91/06/25 13:50:22 rpd .\" Added some watchpoint explanation. .\" [91/06/25 rpd] .\" .\" Revision 2.4 91/06/17 15:47:31 jsb .\" Added documentation for continue/c, match, search, and watchpoints. .\" I've not actually explained what a watchpoint is; maybe Rich can .\" do that (hint, hint). .\" [91/06/17 10:58:08 jsb] .\" .\" Revision 2.3 91/05/14 17:04:23 mrt .\" Correcting copyright .\" .\" Revision 2.2 91/02/14 14:10:06 mrt .\" Changed to new Mach copyright .\" [91/02/12 18:10:12 mrt] .\" .\" Revision 2.2 90/08/30 14:23:15 dbg .\" Created. 
.\" [90/08/30 dbg] .\" .\" $FreeBSD$ .\" .Dd July 13, 2016 .Dt DDB 4 .Os .Sh NAME .Nm ddb .Nd interactive kernel debugger .Sh SYNOPSIS In order to enable kernel debugging facilities include: .Bd -ragged -offset indent .Cd options KDB .Cd options DDB .Ed .Pp To prevent activation of the debugger on kernel .Xr panic 9 : .Bd -ragged -offset indent .Cd options KDB_UNATTENDED .Ed .Pp In order to print a stack trace of the current thread on the console for a panic: .Bd -ragged -offset indent .Cd options KDB_TRACE .Ed .Pp To print the numerical value of symbols in addition to the symbolic representation, define: .Bd -ragged -offset indent .Cd options DDB_NUMSYM .Ed .Pp To enable the .Xr gdb 1 backend, so that remote debugging with .Xr kgdb 1 is possible, include: .Bd -ragged -offset indent .Cd options GDB .Ed .Sh DESCRIPTION The .Nm kernel debugger is an interactive debugger with a syntax inspired by .Xr gdb 1 . If linked into the running kernel, it can be invoked locally with the .Ql debug .Xr keymap 5 action. The debugger is also invoked on kernel .Xr panic 9 if the .Va debug.debugger_on_panic .Xr sysctl 8 MIB variable is set non-zero, which is the default unless the .Dv KDB_UNATTENDED option is specified. .Pp The current location is called .Va dot . The .Va dot is displayed with a hexadecimal format at a prompt. The commands .Ic examine and .Ic write update .Va dot to the address of the last line examined or the last location modified, and set .Va next to the address of the next location to be examined or changed. Other commands do not change .Va dot , and set .Va next to be the same as .Va dot . .Pp The general command syntax is: .Ar command Ns Op Li / Ns Ar modifier -.Ar address Ns Op Li , Ns Ar count +.Oo Ar addr Oc Ns Op Li , Ns Ar count .Pp A blank line repeats the previous command from the address .Va next with count 1 and no modifiers. Specifying -.Ar address +.Ar addr sets .Va dot to the address. Omitting -.Ar address +.Ar addr uses .Va dot . A missing .Ar count is taken to be 1 for printing commands or infinity for stack traces. +A +.Ar count +of -1 is equivalent to a missing +.Ar count . +Options that are supplied but not supported by the given +.Ar command +are usually ignored. .Pp The .Nm debugger has a pager feature (like the .Xr more 1 command) for the output. If an output line exceeds the number set in the .Va lines variable, it displays .Dq Li --More-- and waits for a response. The valid responses for it are: .Pp .Bl -tag -compact -width ".Li SPC" .It Li SPC one more page .It Li RET one more line .It Li q abort the current command, and return to the command input mode .El .Pp Finally, .Nm provides a small (currently 10 items) command history, and offers simple .Nm emacs Ns -style command line editing capabilities. In addition to the .Nm emacs control keys, the usual .Tn ANSI arrow keys may be used to browse through the history buffer, and move the cursor within the current line. .Sh COMMANDS .Bl -tag -width indent -compact -.It Ic examine -.It Ic x +.It Xo +.Ic examine Ns Op Li / Ns Cm AISabcdghilmorsuxz ... +.Oo Ar addr Oc Ns Op Li , Ns Ar count +.Xc +.It Xo +.Ic x Ns Op Li / Ns Cm AISabcdghilmorsuxz ... +.Oo Ar addr Oc Ns Op Li , Ns Ar count +.Xc Display the addressed locations according to the formats in the modifier. Multiple modifier formats display multiple locations. If no format is specified, the last format specified for this command is used. 
.Pp The format characters are: .Bl -tag -compact -width indent .It Cm b look at by bytes (8 bits) .It Cm h look at by half words (16 bits) .It Cm l look at by long words (32 bits) .It Cm g look at by quad words (64 bits) .It Cm a print the location being displayed .It Cm A print the location with a line number if possible .It Cm x display in unsigned hex .It Cm z display in signed hex .It Cm o display in unsigned octal .It Cm d display in signed decimal .It Cm u display in unsigned decimal .It Cm r display in current radix, signed .It Cm c display low 8 bits as a character. Non-printing characters are displayed as an octal escape code (e.g., .Ql \e000 ) . .It Cm s display the null-terminated string at the location. Non-printing characters are displayed as octal escapes. .It Cm m display in unsigned hex with character dump at the end of each line. The location is also displayed in hex at the beginning of each line. .It Cm i display as an instruction .It Cm I display as an instruction with possible alternate formats depending on the -machine, but none of the supported architectures have an alternate format. +machine, but none of the supported architectures have an alternate format .It Cm S display a symbol name for the pointer stored at the address .El .Pp .It Ic xf Examine forward: execute an .Ic examine command with the last specified parameters to it except that the next address displayed by it is used as the start address. .Pp .It Ic xb Examine backward: execute an .Ic examine command with the last specified parameters to it except that the last start address subtracted by the size displayed by it is used as the start address. .Pp .It Ic print Ns Op Li / Ns Cm acdoruxz .It Ic p Ns Op Li / Ns Cm acdoruxz Print .Ar addr Ns s according to the modifier character (as described above for .Cm examine ) . Valid formats are: .Cm a , x , z , o , d , u , r , and .Cm c . If no modifier is specified, the last one specified to it is used. The argument .Ar addr can be a string, in which case it is printed as it is. For example: .Bd -literal -offset indent print/x "eax = " $eax "\enecx = " $ecx "\en" .Ed .Pp will print like: .Bd -literal -offset indent eax = xxxxxx ecx = yyyyyy .Ed .Pp .It Xo .Ic write Ns Op Li / Ns Cm bhl .Ar addr expr1 Op Ar expr2 ... .Xc .It Xo .Ic w Ns Op Li / Ns Cm bhl .Ar addr expr1 Op Ar expr2 ... .Xc Write the expressions specified after .Ar addr on the command line at succeeding locations starting with .Ar addr . The write unit size can be specified in the modifier with a letter .Cm b (byte), .Cm h (half word) or .Cm l (long word) respectively. If omitted, long word is assumed. .Pp .Sy Warning : since there is no delimiter between expressions, strange things may happen. It is best to enclose each expression in parentheses. .Pp .It Ic set Li $ Ns Ar variable Oo Li = Oc Ar expr Set the named variable or register with the value of .Ar expr . Valid variable names are described below. .Pp -.It Ic break Ns Op Li / Ns Cm u -.It Ic b Ns Op Li / Ns Cm u +.It Ic break Ns Oo Li / Ns Cm u Oc Oo Ar addr Oc Ns Op Li , Ns Ar count +.It Ic b Ns Oo Li / Ns Cm u Oc Oo Ar addr Oc Ns Op Li , Ns Ar count Set a break point at .Ar addr . If .Ar count -is supplied, continues +is supplied, the +.Ic continue +command will not stop at this break point on the first .Ar count -\- 1 times before stopping at the -break point. +\- 1 times that it is hit. If the break point is set, a break point number is printed with .Ql # . This number can be used in deleting the break point or adding conditions to it. 
.Pp If the .Cm u modifier is specified, this command sets a break point in user address space. Without the .Cm u option, the address is considered to be in the kernel space, and a wrong space address is rejected with an error message. This modifier can be used only if it is supported by machine dependent routines. .Pp .Sy Warning : If a user text is shadowed by a normal user space debugger, user space break points may not work correctly. Setting a break point at the low-level code paths may also cause strange behavior. .Pp -.It Ic delete Ar addr -.It Ic d Ar addr +.It Ic delete Op Ar addr +.It Ic d Op Ar addr .It Ic delete Li # Ns Ar number -.It Ic d Li # Ns Ar number -Delete the break point. -The target break point can be specified by a +.It Ic d Li # Ns Ar number +Delete the specified break point. +The break point can be specified by a break point number with .Ql # , or by using the same .Ar addr specified in the original .Ic break -command. +command, or by omitting +.Ar addr +to get the default address of +.Va dot . .Pp -.It Ic watch Ar addr Ns Li , Ns Ar size +.It Ic watch Oo Ar addr Oc Ns Op Li , Ns Ar size Set a watchpoint for a region. Execution stops when an attempt to modify the region occurs. The .Ar size argument defaults to 4. If you specify a wrong space address, the request is rejected with an error message. .Pp .Sy Warning : Attempts to watch wired kernel memory may cause unrecoverable error in some systems such as i386. Watchpoints on user addresses work best. .Pp -.It Ic hwatch Ar addr Ns Li , Ns Ar size +.It Ic hwatch Oo Ar addr Oc Ns Op Li , Ns Ar size Set a hardware watchpoint for a region if supported by the architecture. Execution stops when an attempt to modify the region occurs. The .Ar size argument defaults to 4. .Pp .Sy Warning : The hardware debug facilities do not have a concept of separate address spaces like the watch command does. Use .Ic hwatch for setting watchpoints on kernel address locations only, and avoid its use on user mode address spaces. .Pp -.It Ic dhwatch Ar addr Ns Li , Ns Ar size +.It Ic dhwatch Oo Ar addr Oc Ns Op Li , Ns Ar size Delete specified hardware watchpoint. .Pp -.It Ic step Ns Op Li / Ns Cm p -.It Ic s Ns Op Li / Ns Cm p +.It Ic step Ns Oo Li / Ns Cm p Oc Ns Op Li , Ns Ar count +.It Ic s Ns Oo Li / Ns Cm p Oc Ns Op Li , Ns Ar count Single step .Ar count -times (the comma is a mandatory part of the syntax). +times. If the .Cm p modifier is specified, print each instruction at each step. Otherwise, only print the last instruction. .Pp .Sy Warning : depending on machine type, it may not be possible to single-step through some low-level code paths or user space code. On machines with software-emulated single-stepping (e.g., pmax), stepping through code executed by interrupt handlers will probably do the wrong thing. .Pp .It Ic continue Ns Op Li / Ns Cm c .It Ic c Ns Op Li / Ns Cm c Continue execution until a breakpoint or watchpoint. If the .Cm c modifier is specified, count instructions while executing. Some machines (e.g., pmax) also count loads and stores. .Pp .Sy Warning : when counting, the debugger is really silently single-stepping. This means that single-stepping on low-level code may cause strange behavior. .Pp .It Ic until Ns Op Li / Ns Cm p Stop at the next call or return instruction. If the .Cm p modifier is specified, print the call nesting depth and the cumulative instruction count at each call or return. Otherwise, only print when the matching return is hit. 
.Pp .It Ic next Ns Op Li / Ns Cm p .It Ic match Ns Op Li / Ns Cm p Stop at the matching return instruction. If the .Cm p modifier is specified, print the call nesting depth and the cumulative instruction count at each call or return. Otherwise, only print when the matching return is hit. .Pp .It Xo .Ic trace Ns Op Li / Ns Cm u -.Op Ar pid | tid +.Op Ar pid | tid Ns .Op Li , Ns Ar count .Xc .It Xo .Ic t Ns Op Li / Ns Cm u -.Op Ar pid | tid +.Op Ar pid | tid Ns .Op Li , Ns Ar count .Xc .It Xo .Ic where Ns Op Li / Ns Cm u -.Op Ar pid | tid +.Op Ar pid | tid Ns .Op Li , Ns Ar count .Xc .It Xo .Ic bt Ns Op Li / Ns Cm u -.Op Ar pid | tid +.Op Ar pid | tid Ns .Op Li , Ns Ar count .Xc Stack trace. The .Cm u option traces user space; if omitted, .Ic trace only traces kernel space. The optional argument .Ar count is the number of frames to be traced. If .Ar count is omitted, all frames are printed. .Pp .Sy Warning : User space stack trace is valid only if the machine dependent code supports it. .Pp .It Xo .Ic search Ns Op Li / Ns Cm bhl .Ar addr .Ar value -.Op Ar mask +.Op Ar mask Ns .Op Li , Ns Ar count .Xc Search memory for .Ar value . -This command might fail in interesting -ways if it does not find the searched-for value. -This is because -.Nm -does not always recover from touching bad memory. The optional .Ar count argument limits the search. .\" .Pp .It Xo .Ic findstack .Ar addr .Xc Prints the address of the thread whose kernel-mode stack contains the specified address. If no such thread is found, searches the thread stack cache and prints the cached stack address. Otherwise, prints nothing. .Pp .It Ic show Cm all procs Ns Op Li / Ns Cm m .It Ic ps Ns Op Li / Ns Cm m Display all process information. The process information may not be shown if it is not supported on the machine, or if the bottom of the stack of the target process is not in main memory at that time. The .Cm m modifier will alter the display to show VM map addresses for the process and not show other information. .\" .Pp .It Ic show Cm all trace .It Ic alltrace -.Xc Show a stack trace for every thread in the system. .Pp .It Ic show Cm all ttys Show all TTYs within the system. Output is similar to .Xr pstat 8 , but also includes the address of the TTY structure. .\" .Pp .It Ic show Cm all vnets Show the same output as "show vnet" does, but for all virtualized network stacks within the system. .\" .Pp .It Ic show Cm allchains Show the same information as "show lockchain" does, but for every thread in the system. .\" .Pp .It Ic show Cm alllocks Show all locks that are currently held. This command is only available if .Xr witness 4 is included in the kernel. .\" .Pp .It Ic show Cm allpcpu The same as "show pcpu", but for every CPU present in the system. .\" .Pp .It Ic show Cm allrman Show information related to resource management, including interrupt request lines, DMA request lines, I/O ports, I/O memory addresses, and Resource IDs. .\" .Pp .It Ic show Cm apic Dump data about APIC IDT vector mappings. .\" .Pp .It Ic show Cm breaks Show breakpoints set with the "break" command. .\" .Pp .It Ic show Cm bio Ar addr Show information about the bio structure .Vt struct bio present at .Ar addr . See the .Pa sys/bio.h header file and .Xr g_bio 9 for more details on the exact meaning of the structure fields. .\" .Pp .It Ic show Cm buffer Ar addr Show information about the buf structure .Vt struct buf present at .Ar addr . See the .Pa sys/buf.h header file for more details on the exact meaning of the structure fields.
.\" .Pp .It Ic show Cm callout Ar addr Show information about the callout structure .Vt struct callout present at .Ar addr . .\" .Pp .It Ic show Cm cbstat Show brief information about the TTY subsystem. .\" .Pp .It Ic show Cm cdev Without argument, show the list of all created cdev's, consisting of devfs node name and struct cdev address. When address of cdev is supplied, show some internal devfs state of the cdev. .\" .Pp .It Ic show Cm conifhk Lists hooks currently waiting for completion in run_interrupt_driven_config_hooks(). .\" .Pp .It Ic show Cm cpusets Print numbered root and assigned CPU affinity sets. See .Xr cpuset 2 for more details. .\" .Pp .It Ic show Cm cyrixreg Show registers specific to the Cyrix processor. .\" .Pp .It Ic show Cm devmap Prints the contents of the static device mapping table. Currently only available on the ARM architecture. .\" .Pp .It Ic show Cm domain Ar addr Print protocol domain structure .Vt struct domain at address .Ar addr . See the .Pa sys/domain.h header file for more details on the exact meaning of the structure fields. .\" .Pp .It Ic show Cm ffs Op Ar addr Show brief information about ffs mount at the address .Ar addr , if argument is given. Otherwise, provides the summary about each ffs mount. .\" .Pp .It Ic show Cm file Ar addr Show information about the file structure .Vt struct file present at address .Ar addr . .\" .Pp .It Ic show Cm files Show information about every file structure in the system. .\" .Pp .It Ic show Cm freepages Show the number of physical pages in each of the free lists. .\" .Pp .It Ic show Cm geom Op Ar addr If the .Ar addr argument is not given, displays the entire GEOM topology. If .Ar addr is given, displays details about the given GEOM object (class, geom, provider or consumer). .\" .Pp .It Ic show Cm idt Show IDT layout. The first column specifies the IDT vector. The second one is the name of the interrupt/trap handler. Those functions are machine dependent. .\" .Pp .It Ic show Cm igi_list Ar addr Show information about the IGMP structure .Vt struct igmp_ifsoftc present at .Ar addr . .\" .Pp .It Ic show Cm inodedeps Op Ar addr Show brief information about each inodedep structure. If .Ar addr is given, only inodedeps belonging to the fs located at the supplied address are shown. .\" .Pp .It Ic show Cm inpcb Ar addr Show information on IP Control Block .Vt struct in_pcb present at .Ar addr . .\" .Pp .It Ic show Cm intr Dump information about interrupt handlers. .\" .Pp .It Ic show Cm intrcnt Dump the interrupt statistics. .\" .Pp .It Ic show Cm irqs Show interrupt lines and their respective kernel threads. .\" .Pp .It Ic show Cm jails Show the list of .Xr jail 8 instances. In addition to what .Xr jls 8 shows, also list kernel internal details. .\" .Pp .It Ic show Cm lapic Show information from the local APIC registers for this CPU. .\" .Pp .It Ic show Cm lock Ar addr Show lock structure. The output format is as follows: .Bl -tag -width "flags" .It Ic class: Class of the lock. Possible types include .Xr mutex 9 , .Xr rmlock 9 , .Xr rwlock 9 , .Xr sx 9 . .It Ic name: Name of the lock. .It Ic flags: Flags passed to the lock initialization function. For exact possibilities see manual pages of possible lock types. .It Ic state: Current state of a lock. As well as .Ic flags it's lock-specific. .It Ic owner: Lock owner. .El .\" .Pp .It Ic show Cm lockchain Ar addr Show all threads a particular thread at address .Ar addr is waiting on based on non-sleepable and non-spin locks. 
.\" .Pp .It Ic show Cm lockedbufs Show the same information as "show buf", but for every locked .Vt struct buf object. .\" .Pp .It Ic show Cm lockedvnods List all locked vnodes in the system. .\" .Pp .It Ic show Cm locks Prints all locks that are currently acquired. This command is only available if .Xr witness 4 is included in the kernel. .\" .Pp .It Ic show Cm locktree .\" .Pp .It Ic show Cm malloc Prints .Xr malloc 9 memory allocator statistics. The output format is as follows: .Pp .Bl -tag -compact -offset indent -width "Requests" .It Ic Type Specifies a type of memory. It is the same as a description string used while defining the given memory type with .Xr MALLOC_DECLARE 9 . .It Ic InUse Number of memory allocations of the given type, for which .Xr free 9 has not been called yet. .It Ic MemUse Total memory consumed by the given allocation type. .It Ic Requests Number of memory allocation requests for the given memory type. .El .Pp The same information can be gathered in userspace with .Dq Nm vmstat Fl m . .\" .Pp .It Ic show Cm map Ns Oo Li / Ns Cm f Oc Ar addr Prints the VM map at .Ar addr . If the .Cm f modifier is specified the complete map is printed. .\" .Pp .It Ic show Cm msgbuf Print the system's message buffer. It is the same output as in the .Dq Nm dmesg case. It is useful if you got a kernel panic, attached a serial cable to the machine and want to get the boot messages from before the system hang. .\" .It Ic show Cm mount Displays short info about all currently mounted file systems. .Pp .It Ic show Cm mount Ar addr Displays details about the given mount point. .\" .Pp .It Ic show Cm object Ns Oo Li / Ns Cm f Oc Ar addr Prints the VM object at .Ar addr . If the .Cm f option is specified the complete object is printed. .\" .Pp .It Ic show Cm panic Print the panic message if set. .\" .Pp .It Ic show Cm page Show statistics on VM pages. .\" .Pp .It Ic show Cm pageq Show statistics on VM page queues. .\" .Pp .It Ic show Cm pciregs Print PCI bus registers. The same information can be gathered in userspace by running .Dq Nm pciconf Fl lv . .\" .Pp .It Ic show Cm pcpu Print current processor state. The output format is as follows: .Pp .Bl -tag -compact -offset indent -width "spin locks held:" .It Ic cpuid Processor identifier. .It Ic curthread Thread pointer, process identifier and the name of the process. .It Ic curpcb Control block pointer. .It Ic fpcurthread FPU thread pointer. .It Ic idlethread Idle thread pointer. .It Ic APIC ID CPU identifier coming from APIC. .It Ic currentldt LDT pointer. .It Ic spin locks held Names of spin locks held. .El .\" .Pp .It Ic show Cm pgrpdump Dump process groups present within the system. .\" .Pp .It Ic show Cm proc Op Ar addr If no .Op Ar addr is specified, print information about the current process. Otherwise, show information about the process at address .Ar addr . .\" .Pp .It Ic show Cm procvm Show process virtual memory layout. .\" .Pp .It Ic show Cm protosw Ar addr Print protocol switch structure .Vt struct protosw at address .Ar addr . .\" .Pp .It Ic show Cm registers Ns Op Li / Ns Cm u Display the register set. If the .Cm u modifier is specified, it displays user registers instead of kernel registers or the currently saved one. .Pp .Sy Warning : The support of the .Cm u modifier depends on the machine. If not supported, incorrect information will be displayed. .\" .Pp .It Ic show Cm rman Ar addr Show resource manager object .Vt struct rman at address .Ar addr . 
Addresses of particular pointers can be gathered with the "show allrman" command. .\" .Pp .It Ic show Cm rtc Show the real time clock value. Useful for long debugging sessions. .\" .Pp .It Ic show Cm sleepchain Show all the threads a particular thread is waiting on based on sleepable locks. .\" .Pp .It Ic show Cm sleepq .It Ic show Cm sleepqueue Both commands provide the same functionality. They show the sleepqueue .Vt struct sleepqueue structure. Sleepqueues are used within the .Fx kernel to implement sleepable synchronization primitives (a thread holding a lock might sleep or be context switched), which at the time of writing are: .Xr condvar 9 , .Xr sx 9 and the standard .Xr msleep 9 interface. .\" .Pp .It Ic show Cm sockbuf Ar addr .It Ic show Cm socket Ar addr These commands print .Vt struct sockbuf and .Vt struct socket objects placed at .Ar addr . Output consists of all values present in the structures mentioned. For exact interpretation and more details, see the .Pa sys/socket.h header file. .\" .Pp .It Ic show Cm sysregs Show system registers (e.g., .Li cr0-4 on i386). Not present on some platforms. .\" .Pp .It Ic show Cm tcpcb Ar addr Print TCP control block .Vt struct tcpcb lying at address .Ar addr . For exact interpretation of the output, see the .Pa netinet/tcp.h header file. .\" .Pp .It Ic show Cm thread Op Ar addr If no .Ar addr is specified, show detailed information about the current thread. Otherwise, information about the thread at .Ar addr is printed. .\" .Pp .It Ic show Cm threads Show all threads within the system. Output format is as follows: .Pp .Bl -tag -compact -offset indent -width "Second column" .It Ic First column Thread identifier (TID) .It Ic Second column Thread structure address .It Ic Third column Backtrace. .El .\" .Pp .It Ic show Cm tty Ar addr Display the contents of a TTY structure in a readable form. .\" .Pp .It Ic show Cm turnstile Ar addr Show turnstile .Vt struct turnstile structure at address .Ar addr . Turnstiles are structures used within the .Fx kernel to implement synchronization primitives which, while holding a specific type of lock, cannot sleep or context switch to another thread. Currently, those are: .Xr mutex 9 , .Xr rwlock 9 , .Xr rmlock 9 . .\" .Pp .It Ic show Cm uma Show UMA allocator statistics. Output consists of five columns: .Pp .Bl -tag -compact -offset indent -width "Requests" .It Cm "Zone" Name of the UMA zone. The same string that was passed to .Xr uma_zcreate 9 as the first argument. .It Cm "Size" Size of a given memory object (slab). .It Cm "Used" Number of slabs being currently used. .It Cm "Free" Number of free slabs within the UMA zone. .It Cm "Requests" Number of allocation requests to the given zone. .El .Pp The same information can be gathered in userspace with .Dq Nm vmstat Fl z . .\" .Pp .It Ic show Cm unpcb Ar addr Shows UNIX domain socket private control block .Vt struct unpcb present at the address .Ar addr . .\" .Pp .It Ic show Cm vmochk Prints whether the internal VM objects are in a map somewhere and none have zero ref counts. .\" .Pp .It Ic show Cm vmopag This is supposed to show physical addresses consumed by a VM object. Currently, it is not possible to use this command when .Xr witness 4 is compiled in the kernel. .\" .Pp .It Ic show Cm vnet Ar addr Prints virtualized network stack .Vt struct vnet structure present at the address .Ar addr . .\" .Pp .It Ic show Cm vnode Op Ar addr Prints vnode .Vt struct vnode structure lying at .Ar addr .
For the exact interpretation of the output, look at the .Pa sys/vnode.h header file. .\" .Pp .It Ic show Cm vnodebufs Ar addr Shows clean/dirty buffer lists of the vnode located at .Ar addr . .\" .Pp .It Ic show Cm watches Displays all watchpoints set with the "watch" command. .\" .Pp .It Ic show Cm witness Shows information about lock acquisition coming from the .Xr witness 4 subsystem. .\" .Pp .It Ic gdb Toggles between remote GDB and DDB mode. In remote GDB mode, another machine is required that runs .Xr gdb 1 using the remote debug feature, with a connection to the serial console port on the target machine. Currently only available on the i386 architecture. .Pp .It Ic halt Halt the system. .Pp .It Ic kill Ar sig pid Send signal .Ar sig to process .Ar pid . The signal is acted on upon returning from the debugger. This command can be used to kill a process causing resource contention in the case of a hung system. See .Xr signal 3 for a list of signals. Note that the arguments are reversed relative to .Xr kill 2 . .Pp .It Ic reboot Op Ar seconds .It Ic reset Op Ar seconds Hard reset the system. If the optional argument .Ar seconds is given, the debugger will wait for this long, at most a week, before rebooting. .Pp .It Ic help Print a short summary of the available commands and command abbreviations. .Pp .It Ic capture on .It Ic capture off .It Ic capture reset .It Ic capture status .Nm supports a basic output capture facility, which can be used to retrieve the results of debugging commands from userspace using .Xr sysctl 3 . .Ic capture on enables output capture; .Ic capture off disables capture. .Ic capture reset will clear the capture buffer and disable capture. .Ic capture status will report current buffer use, buffer size, and disposition of output capture. .Pp Userspace processes may inspect and manage .Nm capture state using .Xr sysctl 8 : .Pp .Dv debug.ddb.capture.bufsize may be used to query or set the current capture buffer size. .Pp .Dv debug.ddb.capture.maxbufsize may be used to query the compile-time limit on the capture buffer size. .Pp .Dv debug.ddb.capture.bytes may be used to query the number of bytes of output currently in the capture buffer. .Pp .Dv debug.ddb.capture.data returns the contents of the buffer as a string to an appropriately privileged process. .Pp This facility is particularly useful in concert with the scripting and .Xr textdump 4 facilities, allowing scripted debugging output to be captured and committed to disk as part of a textdump for later analysis. The contents of the capture buffer may also be inspected in a kernel core dump using .Xr kgdb 1 . .Pp .It Ic run .It Ic script .It Ic scripts .It Ic unscript Run, define, list, and delete scripts. See the .Sx SCRIPTING section for more information on the scripting facility. .Pp .It Ic textdump dump .It Ic textdump set .It Ic textdump status .It Ic textdump unset Use the .Ic textdump dump command to immediately perform a textdump. More information may be found in .Xr textdump 4 . The .Ic textdump set command may be used to force the next kernel core dump to be a textdump rather than a traditional memory dump or minidump. .Ic textdump status reports whether a textdump has been scheduled. .Ic textdump unset cancels a request to perform a textdump as the next kernel core dump. .El .Sh VARIABLES The debugger accesses registers and variables as .Li $ Ns Ar name . Register names are as in the .Dq Ic show Cm registers command.
Some variables are suffixed with numbers, and may have some modifier following a colon immediately after the variable name. For example, register variables can have a .Cm u modifier to indicate a user register (e.g., .Dq Li $eax:u ) . .Pp Built-in variables currently supported are: .Pp .Bl -tag -width ".Va tabstops" -compact .It Va radix Input and output radix. .It Va maxoff Addresses are printed as .Dq Ar symbol Ns Li + Ns Ar offset unless .Ar offset is greater than .Va maxoff . .It Va maxwidth The width of the displayed line. .It Va lines The number of lines. It is used by the built-in pager. .It Va tabstops Tab stop width. .It Va work Ns Ar xx Work variable; .Ar xx can take values from 0 to 31. .El .Sh EXPRESSIONS Most expression operators in C are supported except .Ql ~ , .Ql ^ , and unary .Ql & . Special rules in .Nm are: .Bl -tag -width ".No Identifiers" .It Identifiers The name of a symbol is translated to the value of the symbol, which is the address of the corresponding object. .Ql \&. and .Ql \&: can be used in the identifier. If supported by an object format dependent routine, .Sm off .Oo Ar filename : Oc Ar func : lineno , .Sm on .Oo Ar filename : Oc Ns Ar variable , and .Oo Ar filename : Oc Ns Ar lineno can be accepted as a symbol. .It Numbers Radix is determined by the first two letters: .Ql 0x : hex, .Ql 0o : octal, .Ql 0t : decimal; otherwise, the current radix is used. .It Li \&. .Va dot .It Li + .Va next .It Li .. address of the start of the last line examined. Unlike .Va dot or .Va next , this is only changed by the .Ic examine or .Ic write commands. .It Li ' last address explicitly specified. .It Li $ Ns Ar variable Translated to the value of the specified variable. It may be followed by a .Ql \&: and modifiers as described above. .It Ar a Ns Li # Ns Ar b A binary operator which rounds up the left hand side to the next multiple of the right hand side. For example, .Ql 0x1234#0x200 evaluates to .Ql 0x1400 . .It Li * Ns Ar expr Indirection. It may be followed by a .Ql \&: and modifiers as described above. .El .Sh SCRIPTING .Nm supports a basic scripting facility to allow automating tasks or responses to specific events. Each script consists of a list of DDB commands to be executed sequentially, and is assigned a unique name. Certain script names have special meaning, and will be automatically run on various .Nm events if scripts by those names have been defined. .Pp The .Ic script command may be used to define a script by name. Scripts consist of a series of .Nm commands separated with the .Ql \&; character. For example: .Bd -literal -offset indent script kdb.enter.panic=bt; show pcpu script lockinfo=show alllocks; show lockedvnods .Ed .Pp The .Ic scripts command lists currently defined scripts. .Pp The .Ic run command executes a script by name. For example: .Bd -literal -offset indent run lockinfo .Ed .Pp The .Ic unscript command may be used to delete a script by name. For example: .Bd -literal -offset indent unscript kdb.enter.panic .Ed .Pp These functions may also be performed from userspace using the .Xr ddb 8 command. .Pp Certain scripts are run automatically, if defined, for specific .Nm events. The following scripts are run when various events occur: .Bl -tag -width kdb.enter.powerfail .It Dv kdb.enter.acpi The kernel debugger was entered as a result of an .Xr acpi 4 event. .It Dv kdb.enter.bootflags The kernel debugger was entered at boot as a result of the debugger boot flag being set. .It Dv kdb.enter.break The kernel debugger was entered as a result of a serial or console break.
.It Dv kdb.enter.cam The kernel debugger was entered as a result of a .Xr CAM 4 event. .It Dv kdb.enter.mac The kernel debugger was entered as a result of an assertion failure in the .Xr mac_test 4 module of the TrustedBSD MAC Framework. .It Dv kdb.enter.ndis The kernel debugger was entered as a result of an .Xr ndis 4 breakpoint event. .It Dv kdb.enter.netgraph The kernel debugger was entered as a result of a .Xr netgraph 4 event. .It Dv kdb.enter.panic .Xr panic 9 was called. .It Dv kdb.enter.powerfail The kernel debugger was entered as a result of a powerfail NMI on the sparc64 platform. .It Dv kdb.enter.powerpc The kernel debugger was entered as a result of an unimplemented interrupt type on the powerpc platform. .It Dv kdb.enter.sysctl The kernel debugger was entered as a result of the .Dv debug.kdb.enter sysctl being set. .It Dv kdb.enter.trapsig The kernel debugger was entered as a result of a trapsig event on the sparc64 platform. .It Dv kdb.enter.unionfs The kernel debugger was entered as a result of an assertion failure in the union file system. .It Dv kdb.enter.unknown The kernel debugger was entered, but no reason has been set. .It Dv kdb.enter.vfslock The kernel debugger was entered as a result of a VFS lock violation. .It Dv kdb.enter.watchdog The kernel debugger was entered as a result of a watchdog firing. .It Dv kdb.enter.witness The kernel debugger was entered as a result of a .Xr witness 4 violation. .El .Pp In the event that none of these scripts is found, .Nm will attempt to execute a default script: .Bl -tag -width kdb.enter.powerfail .It Dv kdb.enter.default The kernel debugger was entered, but a script exactly matching the reason for entering was not defined. This can be used as a catch-all to handle cases not specifically of interest; for example, .Dv kdb.enter.witness might be defined to have special handling, and .Dv kdb.enter.default might be defined to simply panic and reboot. .El .Sh HINTS On machines with an ISA expansion bus, a simple NMI generation card can be constructed by connecting a push button between the A01 and B01 (CHCHK# and GND) card fingers. Momentarily shorting these two fingers together may cause the bridge chipset to generate an NMI, which causes the kernel to pass control to .Nm . Some bridge chipsets do not generate an NMI on CHCHK#, so your mileage may vary. The NMI allows one to break into the debugger on a wedged machine to diagnose problems. Bridge chipsets for other buses may be able to generate an NMI using bus-specific methods. There are many PCI and PCIe add-in cards which can generate an NMI for debugging. Modern server systems typically use IPMI to generate signals to enter the debugger. The .Dv devel/ipmitool port can be used to send the .Cd chassis power diag command, which delivers an NMI to the processor. Embedded systems often use JTAG for debugging, but rarely use it in combination with .Nm . .Pp For serial consoles, you can enter the debugger by sending a BREAK condition on the serial line if .Cd options BREAK_TO_DEBUGGER is specified in the kernel. Most terminal emulation programs can send a break sequence with a special key sequence or via a menu item. However, in some setups, sending the break can be difficult to arrange or happens spuriously, so if the kernel contains .Cd options ALT_BREAK_TO_DEBUGGER then the sequence of CR TILDE CTRL-B enters the debugger; CR TILDE CTRL-P causes a panic instead of entering the debugger; and CR TILDE CTRL-R causes an immediate reboot.
In all the above sequences, CR is a Carriage Return and is usually sent by hitting the Enter or Return key. TILDE is the ASCII tilde character (~). CTRL-x is Control x, produced by holding down the Control key while typing x and then releasing both. .Pp The break-to-debugger behavior may be enabled at run-time by setting the .Xr sysctl 8 .Dv debug.kdb.break_to_debugger to 1. The alternate break-to-debugger sequence may be enabled at run-time by setting the .Xr sysctl 8 .Dv debug.kdb.alt_break_to_debugger to 1. The debugger may be entered by setting the .Xr sysctl 8 .Dv debug.kdb.enter to 1. .Sh FILES Header files mentioned in this manual page can be found below the .Pa /usr/include directory. .Pp .Bl -dash -compact .It .Pa sys/buf.h .It .Pa sys/domain.h .It .Pa netinet/in_pcb.h .It .Pa sys/socket.h .It .Pa sys/vnode.h .El .Sh SEE ALSO .Xr gdb 1 , .Xr kgdb 1 , .Xr acpi 4 , .Xr CAM 4 , .Xr mac_test 4 , .Xr ndis 4 , .Xr netgraph 4 , .Xr textdump 4 , .Xr witness 4 , .Xr ddb 8 , .Xr sysctl 8 , .Xr panic 9 .Sh HISTORY The .Nm debugger was developed for Mach, and ported to .Bx 386 0.1 . This manual page was translated from .Xr man 7 macros by .An Garrett Wollman . .Pp .An Robert N. M. Watson added support for .Nm output capture, .Xr textdump 4 and scripting in .Fx 7.1 . Index: projects/clang390-import/share/man/man4/pci.4 =================================================================== --- projects/clang390-import/share/man/man4/pci.4 (revision 305686) +++ projects/clang390-import/share/man/man4/pci.4 (revision 305687) @@ -1,342 +1,525 @@ .\" .\" Copyright (c) 1999 Kenneth D. Merry. .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. The name of the author may not be used to endorse or promote products .\" derived from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE.
.\" .\" $FreeBSD$ .\" -.Dd August 9, 2016 +.Dd September 8, 2016 .Dt PCI 4 .Os .Sh NAME .Nm pci -.Nd generic PCI driver +.Nd generic PCI bus driver .Sh SYNOPSIS +To compile the PCI bus driver into the kernel, +place the following line in your +kernel configuration file: +.Bd -ragged -offset indent .Cd device pci +.Ed +.Pp +To compile in support for Single Root I/O Virtualization +.Pq SR-IOV : +.Bd -ragged -offset indent +.Cd options PCI_IOV +.Ed +.Pp +To compile in support for native PCI-express HotPlug: +.Bd -ragged -offset indent +.Cd options PCI_HP +.Ed .Sh DESCRIPTION The .Nm -driver provides a way for userland programs to read and write +driver provides support for .Tn PCI +devices in the kernel and limited access to +.Tn PCI +devices for userland. +.Pp +The +.Nm +driver provides a +.Pa /dev/pci +character device that can be used by userland programs to read and write +.Tn PCI configuration registers. -It also provides a way for userland programs to get a list of all +Programs can also use this device to get a list of all .Tn PCI devices, or all .Tn PCI devices that match various patterns. .Pp Since the .Nm driver provides a write interface for .Tn PCI configuration registers, system administrators should exercise caution when granting access to the .Nm device. If used improperly, this driver can allow userland applications to crash a machine or cause data loss. .Pp The .Nm driver implements the .Tn PCI bus in the kernel. It enumerates any devices on the .Tn PCI bus and gives .Tn PCI client drivers the chance to attach to them. It assigns resources to children, when the BIOS does not. It takes care of routing interrupts when necessary. It reprobes the unattached .Tn PCI children when .Tn PCI client drivers are dynamically loaded at runtime. -.Sh KERNEL CONFIGURATION The .Nm -device is included in the kernel as described in the SYNOPSIS section. -The -.Nm -driver cannot be built as a -.Xr kld 4 . +driver also includes support for PCI-PCI bridges, +various platform-specific Host-PCI bridges, +and basic support for +.Tn PCI +VGA adapters. .Sh IOCTLS The following .Xr ioctl 2 calls are supported by the .Nm driver. They are defined in the header file .In sys/pciio.h . .Bl -tag -width 012345678901234 .It PCIOCGETCONF This .Xr ioctl 2 takes a .Va pci_conf_io structure. It allows the user to retrieve information on all .Tn PCI devices in the system, or on .Tn PCI devices matching patterns supplied by the user. The call may set .Va errno to any value specified in either .Xr copyin 9 or .Xr copyout 9 . The .Va pci_conf_io structure consists of a number of fields: .Bl -tag -width match_buf_len .It pat_buf_len The length, in bytes, of the buffer filled with user-supplied patterns. .It num_patterns The number of user-supplied patterns. .It patterns Pointer to a buffer filled with user-supplied patterns. .Va patterns is a pointer to .Va num_patterns .Va pci_match_conf structures. The .Va pci_match_conf structure consists of the following elements: .Bl -tag -width pd_vendor .It pc_sel .Tn PCI domain, bus, slot and function. .It pd_name .Tn PCI device driver name. .It pd_unit .Tn PCI device driver unit number. .It pc_vendor .Tn PCI vendor ID. .It pc_device .Tn PCI device ID. .It pc_class .Tn PCI device class. .It flags The flags describe which of the fields the kernel should match against. A device must match all specified fields in order to be returned. The match flags are enumerated in the .Va pci_getconf_flags structure. 
Hopefully the flag values are obvious enough that they do not need to be described in detail. .El .It match_buf_len Length of the .Va matches buffer allocated by the user to hold the results of the .Dv PCIOCGETCONF query. .It num_matches Number of matches returned by the kernel. .It matches Buffer containing matching devices returned by the kernel. The items in this buffer are of type .Va pci_conf , which consists of the following items: .Bl -tag -width pc_subvendor .It pc_sel .Tn PCI domain, bus, slot and function. .It pc_hdr .Tn PCI header type. .It pc_subvendor .Tn PCI subvendor ID. .It pc_subdevice .Tn PCI subdevice ID. .It pc_vendor .Tn PCI vendor ID. .It pc_device .Tn PCI device ID. .It pc_class .Tn PCI device class. .It pc_subclass .Tn PCI device subclass. .It pc_progif .Tn PCI device programming interface. .It pc_revid .Tn PCI revision ID. .It pd_name Driver name. .It pd_unit Driver unit number. .El .It offset The offset is passed in by the user to tell the kernel where it should start traversing the device list. The value passed out by the kernel points to the record immediately after the last one returned. The user may pass the value returned by the kernel in subsequent calls to the .Dv PCIOCGETCONF ioctl. If the user does not intend to use the offset, it must be set to zero. .It generation .Tn PCI configuration generation. This value only needs to be set if the offset is set. The kernel will compare the current generation number of its internal device list to the generation passed in by the user to determine whether its device list has changed since the user last called the .Dv PCIOCGETCONF ioctl. If the device list has changed, a status of .Va PCI_GETCONF_LIST_CHANGED will be passed back. .It status The status tells the user the disposition of his request for a device list. The possible status values are: .Bl -ohang .It PCI_GETCONF_LAST_DEVICE This means that there are no more devices in the PCI device list matching the specified criteria after the ones returned in the .Va matches buffer. .It PCI_GETCONF_LIST_CHANGED This status tells the user that the .Tn PCI device list has changed since his last call to the .Dv PCIOCGETCONF ioctl and he must reset the .Va offset and .Va generation to zero to start over at the beginning of the list. .It PCI_GETCONF_MORE_DEVS This tells the user that his buffer was not large enough to hold all of the remaining devices in the device list that match his criteria. .It PCI_GETCONF_ERROR This indicates a general error while servicing the user's request. If the .Va pat_buf_len is not equal to .Va num_patterns times .Fn sizeof "struct pci_match_conf" , .Va errno will be set to .Er EINVAL . .El .El .It PCIOCREAD This .Xr ioctl 2 reads the .Tn PCI configuration registers specified by the passed-in .Va pci_io structure. The .Va pci_io structure consists of the following fields: .Bl -tag -width pi_width .It pi_sel A .Va pcisel structure which specifies the domain, bus, slot and function the user would like to query. If the specific bus is not found, errno will be set to ENODEV and -1 returned from the ioctl. .It pi_reg The .Tn PCI configuration register the user would like to access. .It pi_width The width, in bytes, of the data the user would like to read. This value may be either 1, 2, or 4. 3-byte reads and reads larger than 4 bytes are not supported. If an invalid width is passed, errno will be set to EINVAL. .It pi_data The data returned by the kernel.
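.Pp For example, the following sketch (an illustrative example, not part of the interface definition; the device address and register offset are arbitrary) uses the .Va pci_io fields described above with the .Dv PCIOCREAD ioctl to read a 32-bit configuration register of the device at domain 0, bus 0, slot 0, function 0: .Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/pciio.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>

int
main(void)
{
	struct pci_io io;
	int fd;

	/* Open the pci driver's character device. */
	fd = open("/dev/pci", O_RDONLY);
	if (fd < 0)
		err(1, "open(/dev/pci)");

	/* Select the device and register to read. */
	io.pi_sel.pc_domain = 0;
	io.pi_sel.pc_bus = 0;
	io.pi_sel.pc_dev = 0;
	io.pi_sel.pc_func = 0;
	io.pi_reg = 0;		/* vendor/device ID register */
	io.pi_width = 4;	/* may be 1, 2, or 4 bytes */

	if (ioctl(fd, PCIOCREAD, &io) < 0)
		err(1, "PCIOCREAD");
	printf("register 0x%x: 0x%08x\en", io.pi_reg, io.pi_data);
	return (0);
}
.Ed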
.El .It PCIOCWRITE This .Xr ioctl 2 allows users to write to the .Tn PCI configuration registers specified in the passed-in .Va pci_io structure. The .Va pci_io structure is described above. The limitations on data width described for reading registers, above, also apply to writing .Tn PCI configuration registers. +.El +.Sh LOADER TUNABLES +Tunables can be set at the +.Xr loader 8 +prompt before booting the kernel, or stored in +.Xr loader.conf 5 . +The current value of these tunables can be examined at runtime via +.Xr sysctl 8 +nodes of the same name. +Unless otherwise specified, +each of these tunables is a boolean that can be enabled by setting the +tunable to a non-zero value. +.Bl -tag -width indent +.It Va hw.pci.clear_bars Pq Defaults to 0 +Ignore any firmware-assigned memory and I/O port resources. +This forces the +.Tn PCI +bus driver to allocate resource ranges for memory and I/O port resources +from scratch. +.It Va hw.pci.clear_buses Pq Defaults to 0 +Ignore any firmware-assigned bus number registers in PCI-PCI bridges. +This forces the +.Tn PCI +bus driver and PCI-PCI bridge driver to allocate bus numbers for secondary +buses behind PCI-PCI bridges. +.It Va hw.pci.clear_pcib Pq Defaults to 0 +Ignore any firmware-assigned memory and I/O port resource windows in PCI-PCI +bridges. +This forces the PCI-PCI bridge driver to allocate memory and I/O port resources +for resource windows from scratch. +.Pp +By default the PCI-PCI bridge driver will allocate windows that +contain the firmware-assigned resources of devices behind the bridge. +In addition, the PCI-PCI bridge driver will suballocate from existing window +regions when possible to satisfy a resource request. +As a result, +both +.Va hw.pci.clear_bars +and +.Va hw.pci.clear_pcib +must be enabled to fully ignore firmware-supplied resource assignments. +.It Va hw.pci.default_vgapci_unit Pq Defaults to -1 +By default, +the first +.Tn PCI +VGA adapter encountered by the system is assumed to be the boot display device. +This tunable can be set to choose a specific VGA adapter by specifying the +unit number of the associated +.Va vgapci Ns Ar X +device. +.It Va hw.pci.do_power_nodriver Pq Defaults to 0 +Place devices into a low power state +.Pq D3 +when a suitable device driver is not found. +Can be set to one of the following values: +.Bl -tag -width indent +.It 3 +Powers down all +.Tn PCI +devices without a device driver. +.It 2 +Powers down most devices without a device driver. +PCI devices with the display, memory, and base peripheral device classes +are not powered down. +.It 1 +Similar to a setting of 2 except that storage controllers are also not +powered down. +.It 0 +All devices are left fully powered. +.El +.Pp +A +.Tn PCI +device must support power management to be powered down. +Placing a device into a low power state may not reduce power consumption. +.It Va hw.pci.do_power_resume Pq Defaults to 1 +Place +.Tn PCI +devices into the fully powered state when resuming either the system or an +individual device. +Setting this to zero is discouraged as the system will not attempt to power +up non-powered PCI devices after a suspend. +.It Va hw.pci.do_power_suspend Pq Defaults to 1 +Place +.Tn PCI +devices into a low power state when suspending either the system or individual +devices. +Normally the D3 state is used as the low power state, +but firmware may override the desired power state during a system suspend. +.It Va hw.pci.enable_ari Pq Defaults to 1 +Enable support for PCI-express Alternative RID Interpretation.
+This is often used in conjunction with SR-IOV. +.It Va hw.pci.enable_io_modes Pq Defaults to 1 +Enable memory or I/O port decoding in a PCI device's command register if it has +firmware-assigned memory or I/O port resources. +The firmware +.Pq BIOS +in some systems does not enable memory or I/O port decoding for some devices +even when it has assigned resources to the device. +This enables decoding for such resources during bus probe. +.It Va hw.pci.enable_msi Pq Defaults to 1 +Enable support for Message Signalled Interrupts +.Pq MSI . +MSI interrupts can be disabled by setting this tunable to 0. +.It Va hw.pci.enable_msix Pq Defaults to 1 +Enable support for extended Message Signalled Interrupts +.Pq MSI-X . +MSI-X interrupts can be disabled by setting this tunable to 0. +.It Va hw.pci.enable_pcie_hp Pq Defaults to 1 +Enable support for native PCI-express HotPlug. +.It Va hw.pci.honor_msi_blacklist Pq Defaults to 1 +MSI and MSI-X interrupts are disabled for certain chipsets known to have +broken MSI and MSI-X implementations when this tunable is set. +It can be set to zero to permit use of MSI and MSI-X interrupts if the +chipset match is a false positive. +.It Va hw.pci.iov_max_config Pq Defaults to 1MB +The maximum amount of memory permitted for the configuration parameters +used when creating Virtual Functions via SR-IOV. +This tunable can also be changed at runtime via +.Xr sysctl 8 . +.It Va hw.pci.realloc_bars Pq Defaults to 0 +Attempt to allocate a new resource range during the initial device scan +for any memory or I/O port resources with firmware-assigned ranges that +conflict with another active resource. +.It Va hw.pci.usb_early_takeover Pq Defaults to 1 on Tn amd64 and Tn i386 +Disable legacy device emulation of USB devices during the initial device +scan. +Set this tunable to zero to use USB devices via legacy emulation when +using a custom kernel without USB controller drivers. +.It Va hw.pci.<domain>.<bus>.<slot>.INT<pin>.irq +These tunables can be used to override the interrupt routing for legacy +PCI INTx interrupts. +Unlike other tunables in this list, +these do not have corresponding sysctl nodes. +The tunable name includes the address of the PCI device as well as the +pin of the desired INTx IRQ to override: +.Bl -tag -width indent +.It <domain> +The domain +.Pq or segment +of the PCI device in decimal. +.It <bus> +The bus address of the PCI device in decimal. +.It <slot> +The slot of the PCI device in decimal. +.It <pin>
+The interrupt pin of the PCI slot to override. +One of +.Ql A , +.Ql B , +.Ql C , +or +.Ql D . +.El +.Pp +The value of the tunable is the raw IRQ value to use for the INTx interrupt +pin identified by the tunable name. +Mapping of IRQ values to platform interrupt sources is machine dependent. .El .Sh FILES .Bl -tag -width /dev/pci -compact .It Pa /dev/pci Character device for the .Nm driver. .El .Sh SEE ALSO .Xr pciconf 8 .Sh HISTORY The .Nm driver (not the kernel's .Tn PCI support code) first appeared in .Fx 2.2 , and was written by Stefan Esser and Garrett Wollman. Support for device listing and matching was re-implemented by Kenneth Merry, and first appeared in .Fx 3.0 . .Sh AUTHORS .An Kenneth Merry Aq Mt ken@FreeBSD.org .Sh BUGS It is not possible for users to specify an accurate offset into the device list without calling the .Dv PCIOCGETCONF ioctl at least once, since they have no way of knowing the current generation number otherwise. This probably is not a serious problem, though, since users can easily narrow their search by specifying a pattern or patterns for the kernel to match against. Index: projects/clang390-import/share/mk/bsd.subdir.mk =================================================================== --- projects/clang390-import/share/mk/bsd.subdir.mk (revision 305686) +++ projects/clang390-import/share/mk/bsd.subdir.mk (revision 305687) @@ -1,193 +1,194 @@ # from: @(#)bsd.subdir.mk 5.9 (Berkeley) 2/1/91 # $FreeBSD$ # # The include file <bsd.subdir.mk> contains the default targets # for building subdirectories. # # For all of the directories listed in the variable SUBDIRS, the # specified directory will be visited and the target made. There is # also a default target which allows the command "make subdir" where # subdir is any directory listed in the variable SUBDIRS. # # # +++ variables +++ # # DISTRIBUTION Name of distribution. [base] # # SUBDIR A list of subdirectories that should be built as well. # Each of the targets will execute the same target in the # subdirectories. SUBDIR.yes is automatically appended # to this list. # # +++ targets +++ # # distribute: # This is a variant of install, which will # put the stuff into the right "distribution". # # See SUBDIR_TARGETS for list of targets that will recurse. # # Targets defined in STANDALONE_SUBDIR_TARGETS will always be run # with SUBDIR_PARALLEL and will not respect .WAIT or SUBDIR_DEPEND_ # values. # # SUBDIR_TARGETS and STANDALONE_SUBDIR_TARGETS can be appended to # via make.conf or src.conf. # .if !target(____) ____: SUBDIR_TARGETS+= \ all all-man analyze buildconfig buildfiles buildincludes \ checkdpadd clean cleandepend cleandir cleanilinks \ cleanobj depend distribute files includes installconfig \ installfiles installincludes print-dir realinstall lint \ maninstall manlint obj objlink tags \ # Described above. STANDALONE_SUBDIR_TARGETS+= \ all-man buildconfig buildfiles buildincludes check checkdpadd \ clean cleandepend cleandir cleanilinks cleanobj files includes \ installconfig installincludes installfiles print-dir \ maninstall manlint obj objlink # It is safe to install in parallel when staging. .if defined(NO_ROOT) STANDALONE_SUBDIR_TARGETS+= realinstall .endif .include <bsd.init.mk> .if make(print-dir) NEED_SUBDIR= 1 ECHODIR= : .SILENT: .if ${RELDIR:U.} != "."
print-dir: .PHONY @echo ${RELDIR} .endif .endif .if !defined(NEED_SUBDIR) .if ${.MAKE.LEVEL} == 0 && ${MK_DIRDEPS_BUILD} == "yes" && !empty(SUBDIR) && !(make(clean*) || make(destroy*)) .include <meta.subdir.mk> # ignore this _SUBDIR: .endif .endif DISTRIBUTION?= base .if !target(distribute) distribute: .MAKE .for dist in ${DISTRIBUTION} ${_+_}cd ${.CURDIR}; \ ${MAKE} install installconfig -DNO_SUBDIR DESTDIR=${DISTDIR}/${dist} SHARED=copies .endfor .endif # Convenience targets to run 'build${target}' and 'install${target}' when # calling 'make ${target}'. .for __target in files includes .if !target(${__target}) ${__target}: build${__target} install${__target} .ORDER: build${__target} install${__target} .endif .endfor # Make 'install' supports a before and after target. Actual install # hooks are placed in 'realinstall'. .if !target(install) .for __stage in before real after .if !target(${__stage}install) ${__stage}install: .endif .endfor install: beforeinstall realinstall afterinstall .ORDER: beforeinstall realinstall afterinstall .endif .ORDER: all install # SUBDIR recursing may be disabled for MK_DIRDEPS_BUILD .if !target(_SUBDIR) .if defined(SUBDIR) SUBDIR:=${SUBDIR} ${SUBDIR.yes} SUBDIR:=${SUBDIR:u} .endif # Subdir code shared among 'make <subdir>', 'make <target>' and SUBDIR_PARALLEL. _SUBDIR_SH= \ if test -d ${.CURDIR}/$${dir}.${MACHINE_ARCH}; then \ dir=$${dir}.${MACHINE_ARCH}; \ fi; \ ${ECHODIR} "===> ${DIRPRFX}$${dir} ($${target})"; \ cd ${.CURDIR}/$${dir}; \ ${MAKE} $${target} DIRPRFX=${DIRPRFX}$${dir}/ # This is kept for compatibility only. The normal handling of attaching to # SUBDIR_TARGETS will create a target for each directory. _SUBDIR: .USEBEFORE .if defined(SUBDIR) && !empty(SUBDIR) && !defined(NO_SUBDIR) @${_+_}target=${.TARGET:realinstall=install}; \ for dir in ${SUBDIR:N.WAIT}; do ( ${_SUBDIR_SH} ); done .endif # Create 'make subdir' targets to run the real 'all' target. .for __dir in ${SUBDIR:N.WAIT} ${__dir}: all_subdir_${DIRPRFX}${__dir} .PHONY .endfor .for __target in ${SUBDIR_TARGETS} # Can ordering be skipped for this and SUBDIR_PARALLEL forced? .if ${STANDALONE_SUBDIR_TARGETS:M${__target}} _is_standalone_target= 1 -SUBDIR:= ${SUBDIR:N.WAIT} +_subdir_filter= N.WAIT .else _is_standalone_target= 0 +_subdir_filter= .endif __subdir_targets= -.for __dir in ${SUBDIR} +.for __dir in ${SUBDIR:${_subdir_filter}} .if ${__dir} == .WAIT __subdir_targets+= .WAIT .else __deps= .if ${_is_standalone_target} == 0 .if defined(SUBDIR_PARALLEL) # Apply SUBDIR_DEPEND dependencies for SUBDIR_PARALLEL. .for __dep in ${SUBDIR_DEPEND_${__dir}} __deps+= ${__target}_subdir_${DIRPRFX}${__dep} .endfor .else # For non-parallel builds, directories depend on all targets before them. __deps:= ${__subdir_targets} .endif # defined(SUBDIR_PARALLEL) .endif # ${_is_standalone_target} == 0 ${__target}_subdir_${DIRPRFX}${__dir}: .PHONY .MAKE .SILENT ${__deps} @${_+_}target=${__target:realinstall=install}; \ dir=${__dir}; \ ${_SUBDIR_SH}; __subdir_targets+= ${__target}_subdir_${DIRPRFX}${__dir} .endif # ${__dir} == .WAIT .endfor # __dir in ${SUBDIR} # Attach the subdir targets to the real target. # Only recurse on directly-called targets. I.e., don't recurse on dependencies # such as 'install' becoming {before,real,after}install, just recurse # 'install'. Despite that, 'realinstall' is special due to ordering issues # with 'afterinstall'.
.if !defined(NO_SUBDIR) && (make(${__target}) || \ (${__target} == realinstall && make(install))) ${__target}: ${__subdir_targets} .PHONY .endif # make(${__target}) .endfor # __target in ${SUBDIR_TARGETS} .endif # !target(_SUBDIR) # Ensure all targets exist .for __target in ${SUBDIR_TARGETS} .if !target(${__target}) ${__target}: .endif .endfor .endif Index: projects/clang390-import/share/mk/dirdeps.mk =================================================================== --- projects/clang390-import/share/mk/dirdeps.mk (revision 305686) +++ projects/clang390-import/share/mk/dirdeps.mk (revision 305687) @@ -1,712 +1,725 @@ # $FreeBSD$ -# $Id: dirdeps.mk,v 1.62 2016/03/16 00:11:53 sjg Exp $ +# $Id: dirdeps.mk,v 1.73 2016/08/15 19:28:13 sjg Exp $ # Copyright (c) 2010-2013, Juniper Networks, Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # Much of the complexity here is for supporting cross-building. # If a tree does not support that, simply using plain Makefile.depend # should provide sufficient clue. # Otherwise the recommendation is to use Makefile.depend.${MACHINE} # as expected below. # Note: this file gets multiply included. # This is what we do with DIRDEPS # DIRDEPS: # This is a list of directories - relative to SRCTOP, it is # normally only of interest to .MAKE.LEVEL 0. # In some cases the entry may be qualified with a .<machine> # or .<target_spec> suffix (see TARGET_SPEC_VARS below), # for example to force building something for the pseudo # machines "host" or "common" regardless of current ${MACHINE}. # # All unqualified entries end up being qualified with .${TARGET_SPEC} # and partially qualified (if TARGET_SPEC_VARS has multiple # entries) are also expanded to a full .<target_spec>. # The _DIRDEP_USE target uses the suffix to set TARGET_SPEC # correctly when visiting each entry. # # The fully qualified directory entries are used to construct a # dependency graph that will drive the build later. # # Also, for each fully qualified directory target, we will search # using ${.MAKE.DEPENDFILE_PREFERENCE} to find additional # dependencies. We use Makefile.depend (default value for # .MAKE.DEPENDFILE_PREFIX) to refer to these makefiles to # distinguish them from others.
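#
# For example, a leaf directory's Makefile.depend typically just lists
# the directories it depends on and then includes dirdeps.mk
# (a hypothetical minimal sketch - real files list whichever
# directories the build actually visited):
#
#	DIRDEPS = \
#		include \
#		lib/libc \
#
#	.include <dirdeps.mk>
#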
# # Each Makefile.depend file sets DEP_RELDIR to be the # RELDIR (path relative to SRCTOP) for its directory, and # since each Makefile.depend file includes dirdeps.mk, this # processing is recursive and results in .MAKE.LEVEL 0 learning the # dependencies of the tree wrt the initial directory (_DEP_RELDIR). # # BUILD_AT_LEVEL0 # Indicates whether .MAKE.LEVEL 0 builds anything: # if "no" sub-makes are used to build everything, # if "yes" sub-makes are only used to build for other machines. # It is best to use "no", but this can require fixing some # makefiles to not do anything at .MAKE.LEVEL 0. # # TARGET_SPEC_VARS # The default value is just MACHINE, and for most environments # this is sufficient. The _DIRDEP_USE target actually sets # both MACHINE and TARGET_SPEC to the suffix of the current # target so that in the general case TARGET_SPEC can be ignored. # # If more than MACHINE is needed then sys.mk needs to decompose # TARGET_SPEC and set the relevant variables accordingly. # It is important that MACHINE be included in and actually be # the first member of TARGET_SPEC_VARS. This allows other # variables to be considered optional, and some of the treatment # below relies on MACHINE being the first entry. # Note: TARGET_SPEC cannot contain any '.'s so the target # triple used by compiler folk won't work (directly anyway). # # For example: # # # Always list MACHINE first, # # other variables might be optional. # TARGET_SPEC_VARS = MACHINE TARGET_OS # .if ${TARGET_SPEC:Uno:M*,*} != "" # _tspec := ${TARGET_SPEC:S/,/ /g} # MACHINE := ${_tspec:[1]} # TARGET_OS := ${_tspec:[2]} # # etc. # # We need to stop TARGET_SPEC from affecting any submakes # # and deal with MACHINE=${TARGET_SPEC} in the environment. # TARGET_SPEC = # # export but do not track # .export-env TARGET_SPEC # .export ${TARGET_SPEC_VARS} # .for v in ${TARGET_SPEC_VARS:O:u} # .if empty($v) # .undef $v # .endif # .endfor # .endif # # make sure we know what TARGET_SPEC is # # as we may need it to find Makefile.depend* # TARGET_SPEC = ${TARGET_SPEC_VARS:@v@${$v:U}@:ts,} # # touch this at your peril _DIRDEP_USE_LEVEL?= 0 .if ${.MAKE.LEVEL} == ${_DIRDEP_USE_LEVEL} # only the first instance is interested in all this -# First off, we want to know what ${MACHINE} to build for. -# This can be complicated if we are using a mixture of ${MACHINE} specific -# and non-specific Makefile.depend* - .if !target(_DIRDEP_USE) +# do some setup we only need once +_CURDIR ?= ${.CURDIR} +_OBJDIR ?= ${.OBJDIR} + +now_utc = ${%s:L:gmtime} +.if !defined(start_utc) +start_utc := ${now_utc} +.endif + .if ${MAKEFILE:T} == ${.PARSEFILE} && empty(DIRDEPS) && ${.TARGETS:Uall:M*/*} != "" # This little trick lets us do # # mk -f dirdeps.mk some/dir.${TARGET_SPEC} # all: ${.TARGETS:Nall}: all DIRDEPS := ${.TARGETS:M*[/.]*} # so that -DNO_DIRDEPS works DEP_RELDIR := ${DIRDEPS:[1]:R} # this will become DEP_MACHINE below TARGET_MACHINE := ${DIRDEPS:[1]:E:C/,.*//} .if ${TARGET_MACHINE:N*/*} == "" TARGET_MACHINE := ${MACHINE} .endif # disable DIRDEPS_CACHE as it does not like this trick MK_DIRDEPS_CACHE = no .endif # make sure we get the behavior we expect .MAKE.SAVE_DOLLARS = no -# do some setup we only need once -_CURDIR ?= ${.CURDIR} -_OBJDIR ?= ${.OBJDIR} - -now_utc = ${%s:L:gmtime} -.if !defined(start_utc) -start_utc := ${now_utc} -.endif - # make sure these are empty to start with _DEP_TARGET_SPEC = # If TARGET_SPEC_VARS is other than just MACHINE # it should be set by sys.mk or similar by now. # TARGET_SPEC must not contain any '.'s.
TARGET_SPEC_VARS ?= MACHINE # this is what we started with TARGET_SPEC = ${TARGET_SPEC_VARS:@v@${$v:U}@:ts,} # this is what we mostly use below DEP_TARGET_SPEC = ${TARGET_SPEC_VARS:S,^,DEP_,:@v@${$v:U}@:ts,} # make sure we have defaults .for v in ${TARGET_SPEC_VARS} DEP_$v ?= ${$v} .endfor .if ${TARGET_SPEC_VARS:[#]} > 1 # Ok, this gets more complex (putting it mildly). # In order to stay sane, we need to ensure that all the build_dirs # we compute below are fully qualified wrt DEP_TARGET_SPEC. # The makefiles may only partially specify (eg. MACHINE only), # so we need to construct a set of modifiers to fill in the gaps. # jot 10 should output 1 2 3 .. 10 JOT ?= jot _tspec_x := ${${JOT} ${TARGET_SPEC_VARS:[#]}:L:sh} # this handles unqualified entries M_dep_qual_fixes = C;(/[^/.,]+)$$;\1.$${DEP_TARGET_SPEC}; # there needs to be at least one item missing for these to make sense .for i in ${_tspec_x:[2..-1]} _tspec_m$i := ${TARGET_SPEC_VARS:[2..$i]:@w@[^,]+@:ts,} _tspec_a$i := ,${TARGET_SPEC_VARS:[$i..-1]:@v@$$$${DEP_$v}@:ts,} M_dep_qual_fixes += C;(\.${_tspec_m$i})$$;\1${_tspec_a$i}; .endfor .else # A harmless? default. M_dep_qual_fixes = U .endif .if !defined(.MAKE.DEPENDFILE_PREFERENCE) # .MAKE.DEPENDFILE_PREFERENCE makes the logic below neater? # you really want this set by sys.mk or similar .MAKE.DEPENDFILE_PREFERENCE = ${_CURDIR}/${.MAKE.DEPENDFILE:T} .if ${.MAKE.DEPENDFILE:E} == "${TARGET_SPEC}" .if ${TARGET_SPEC} != ${MACHINE} .MAKE.DEPENDFILE_PREFERENCE += ${_CURDIR}/${.MAKE.DEPENDFILE:T:R}.$${MACHINE} .endif .MAKE.DEPENDFILE_PREFERENCE += ${_CURDIR}/${.MAKE.DEPENDFILE:T:R} .endif .endif _default_dependfile := ${.MAKE.DEPENDFILE_PREFERENCE:[1]:T} _machine_dependfiles := ${.MAKE.DEPENDFILE_PREFERENCE:T:M*${MACHINE}*} # for machine specific dependfiles we require ${MACHINE} to be at the end # also for the sake of sanity we require a common prefix .if !defined(.MAKE.DEPENDFILE_PREFIX) # knowing .MAKE.DEPENDFILE_PREFIX helps .if !empty(_machine_dependfiles) .MAKE.DEPENDFILE_PREFIX := ${_machine_dependfiles:[1]:T:R} .else .MAKE.DEPENDFILE_PREFIX := ${_default_dependfile:T} .endif .endif # this is how we identify non-machine specific dependfiles N_notmachine := ${.MAKE.DEPENDFILE_PREFERENCE:E:N*${MACHINE}*:${M_ListToSkip}} .endif # !target(_DIRDEP_USE) +# First off, we want to know what ${MACHINE} to build for. +# This can be complicated if we are using a mixture of ${MACHINE} specific +# and non-specific Makefile.depend* + # if we were included recursively _DEP_TARGET_SPEC should be valid. .if empty(_DEP_TARGET_SPEC) # we may or may not have included a dependfile yet .if defined(.INCLUDEDFROMFILE) _last_dependfile := ${.INCLUDEDFROMFILE:M${.MAKE.DEPENDFILE_PREFIX}*} .else _last_dependfile := ${.MAKE.MAKEFILES:M*/${.MAKE.DEPENDFILE_PREFIX}*:[-1]} .endif .if ${_debug_reldir:U0} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: _last_dependfile='${_last_dependfile}' .endif .if empty(_last_dependfile) || ${_last_dependfile:E:${N_notmachine}} == "" # this is all we have to work with DEP_MACHINE = ${TARGET_MACHINE:U${MACHINE}} _DEP_TARGET_SPEC := ${DEP_TARGET_SPEC} .else _DEP_TARGET_SPEC = ${_last_dependfile:${M_dep_qual_fixes:ts:}:E} .endif .if !empty(_last_dependfile) # record that we've read dependfile for this _dirdeps_checked.${_CURDIR}.${TARGET_SPEC}: .endif .endif # by now _DEP_TARGET_SPEC should be set, parse it. 
.if ${TARGET_SPEC_VARS:[#]} > 1 # we need to parse this; DEP_MACHINE may or may not contain more info _tspec := ${_DEP_TARGET_SPEC:S/,/ /g} .for i in ${_tspec_x} DEP_${TARGET_SPEC_VARS:[$i]} := ${_tspec:[$i]} .endfor .for v in ${TARGET_SPEC_VARS:O:u} .if empty(DEP_$v) .undef DEP_$v .endif .endfor .else DEP_MACHINE := ${_DEP_TARGET_SPEC} .endif # reset each time through _build_all_dirs = # the first time we are included the _DIRDEP_USE target will not be defined # we can use this as a clue to do initialization and other one-time things. .if !target(_DIRDEP_USE) # make sure this target exists dirdeps: beforedirdeps .WAIT beforedirdeps: # We normally expect to be included by Makefile.depend.* # which sets the DEP_* macros below. DEP_RELDIR ?= ${RELDIR} # this can cause lots of output! # set to a set of glob expressions that might match RELDIR DEBUG_DIRDEPS ?= no # remember the initial value of DEP_RELDIR - we test for it below. _DEP_RELDIR := ${DEP_RELDIR} .endif # pickup customizations # as below you can use !target(_DIRDEP_USE) to protect things # which should only be done once. .-include <local.dirdeps.mk> .if !target(_DIRDEP_USE) # things we skip for host tools SKIP_HOSTDIR ?= NSkipHostDir = ${SKIP_HOSTDIR:N*.host*:S,$,.host*,:N.host*:S,^,${SRCTOP}/,:${M_ListToSkip}} # things we always skip # SKIP_DIRDEPS allows for adding entries on command line. SKIP_DIR += .host *.WAIT ${SKIP_DIRDEPS} SKIP_DIR.host += ${SKIP_HOSTDIR} DEP_SKIP_DIR = ${SKIP_DIR} \ ${SKIP_DIR.${DEP_TARGET_SPEC}:U} \ ${SKIP_DIR.${DEP_MACHINE}:U} \ ${SKIP_DIRDEPS.${DEP_MACHINE}:U} NSkipDir = ${DEP_SKIP_DIR:${M_ListToSkip}} .if defined(NODIRDEPS) || defined(WITHOUT_DIRDEPS) NO_DIRDEPS = .elif defined(WITHOUT_DIRDEPS_BELOW) NO_DIRDEPS_BELOW = .endif .if defined(NO_DIRDEPS) # confine ourselves to the original dir and below. DIRDEPS_FILTER += M${_DEP_RELDIR}* .elif defined(NO_DIRDEPS_BELOW) DIRDEPS_FILTER += M${_DEP_RELDIR} .endif # this is what we run below DIRDEP_MAKE?= ${.MAKE} # we suppress SUBDIR when visiting the leaves # we assume sys.mk will set MACHINE_ARCH # you can add extras to DIRDEP_USE_ENV # if there is no makefile in the target directory, we skip it. _DIRDEP_USE: .USE .MAKE @for m in ${.MAKE.MAKEFILE_PREFERENCE}; do \ test -s ${.TARGET:R}/$$m || continue; \ echo "${TRACER}Checking ${.TARGET:R} for ${.TARGET:E} ..."; \ MACHINE_ARCH= NO_SUBDIR=1 ${DIRDEP_USE_ENV} \ TARGET_SPEC=${.TARGET:E} \ MACHINE=${.TARGET:E} \ ${DIRDEP_MAKE} -C ${.TARGET:R} || exit 1; \ break; \ done .ifdef ALL_MACHINES # this is how you limit it to only the machines we have been built for # previously. .if empty(ONLY_MACHINE_LIST) .if !empty(ALL_MACHINE_LIST) # ALL_MACHINE_LIST is the list of all legal machines - ignore anything else _machine_list != cd ${_CURDIR} && 'ls' -1 ${ALL_MACHINE_LIST:O:u:@m@${.MAKE.DEPENDFILE:T:R}.$m@} 2> /dev/null; echo .else _machine_list != 'ls' -1 ${_CURDIR}/${.MAKE.DEPENDFILE_PREFIX}.* 2> /dev/null; echo .endif _only_machines := ${_machine_list:${NIgnoreFiles:UN*.bak}:E:O:u} .else _only_machines := ${ONLY_MACHINE_LIST} .endif .if empty(_only_machines) # we must be boot-strapping _only_machines := ${TARGET_MACHINE:U${ALL_MACHINE_LIST:U${DEP_MACHINE}}} .endif .else # ! ALL_MACHINES # if ONLY_MACHINE_LIST is set, we are limited to that # if TARGET_MACHINE is set - it is really the same as ONLY_MACHINE_LIST # otherwise DEP_MACHINE is it - so DEP_MACHINE will match.
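# e.g. with ONLY_MACHINE_LIST and TARGET_MACHINE both unset and
# DEP_MACHINE = amd64, the expression below reduces to just "amd64".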
_only_machines := ${ONLY_MACHINE_LIST:U${TARGET_MACHINE:U${DEP_MACHINE}}:M${DEP_MACHINE}} .endif .if !empty(NOT_MACHINE_LIST) _only_machines := ${_only_machines:${NOT_MACHINE_LIST:${M_ListToSkip}}} .endif # make sure we have a starting place? DIRDEPS ?= ${RELDIR} .endif # target # if repeatedly building the same target, # we can avoid the overhead of re-computing the tree dependencies. MK_DIRDEPS_CACHE ?= no BUILD_DIRDEPS_CACHE ?= no BUILD_DIRDEPS ?= yes .if !defined(NO_DIRDEPS) && !defined(NO_DIRDEPS_BELOW) .if ${MK_DIRDEPS_CACHE} == "yes" # this is where we will cache all our work -DIRDEPS_CACHE?= ${_OBJDIR}/dirdeps.cache${.TARGETS:Nall:O:u:ts-:S,/,_,g:S,^,.,:N.} +DIRDEPS_CACHE?= ${_OBJDIR:tA}/dirdeps.cache${.TARGETS:Nall:O:u:ts-:S,/,_,g:S,^,.,:N.} # just ensure this exists build-dirdeps: M_oneperline = @x@\\${.newline} $$x@ .if ${BUILD_DIRDEPS_CACHE} == "no" .if !target(dirdeps-cached) # we do this via sub-make BUILD_DIRDEPS = no +# ignore anything but these +.MAKE.META.IGNORE_FILTER = M*/${.MAKE.DEPENDFILE_PREFIX}* + dirdeps: dirdeps-cached dirdeps-cached: ${DIRDEPS_CACHE} .MAKE @echo "${TRACER}Using ${DIRDEPS_CACHE}" @MAKELEVEL=${.MAKE.LEVEL} ${.MAKE} -C ${_CURDIR} -f ${DIRDEPS_CACHE} \ dirdeps MK_DIRDEPS_CACHE=no BUILD_DIRDEPS=no # these should generally do BUILD_DIRDEPS_MAKEFILE ?= ${MAKEFILE} BUILD_DIRDEPS_TARGETS ?= ${.TARGETS} # we need the .meta file to ensure we update if # any of the Makefile.depend* changed. # We do not want to compare the command line though. ${DIRDEPS_CACHE}: .META .NOMETA_CMP +@{ echo '# Autogenerated - do NOT edit!'; echo; \ echo 'BUILD_DIRDEPS=no'; echo; \ echo '.include <dirdeps.mk>'; \ } > ${.TARGET}.new +@MAKELEVEL=${.MAKE.LEVEL} DIRDEPS_CACHE=${DIRDEPS_CACHE} \ DIRDEPS="${DIRDEPS}" \ MAKEFLAGS= ${.MAKE} -C ${_CURDIR} -f ${BUILD_DIRDEPS_MAKEFILE} \ ${BUILD_DIRDEPS_TARGETS} BUILD_DIRDEPS_CACHE=yes \ .MAKE.DEPENDFILE=.none \ ${.MAKEFLAGS:tW:S,-D ,-D,g:tw:M*WITH*} \ 3>&1 1>&2 | sed 's,${SRCTOP},$${SRCTOP},g' >> ${.TARGET}.new && \ mv ${.TARGET}.new ${.TARGET} .endif .elif !target(_count_dirdeps) # we want to capture the dirdeps count in the cache .END: _count_dirdeps _count_dirdeps: .NOMETA @echo '.info $${.newline}$${TRACER}Makefiles read: total=${.MAKE.MAKEFILES:[#]} depend=${.MAKE.MAKEFILES:M*depend*:[#]} dirdeps=${.ALLTARGETS:M${SRCTOP}*:O:u:[#]}' >&3 .endif .elif !make(dirdeps) && !target(_count_dirdeps) beforedirdeps: _count_dirdeps _count_dirdeps: .NOMETA @echo "${TRACER}Makefiles read: total=${.MAKE.MAKEFILES:[#]} depend=${.MAKE.MAKEFILES:M*depend*:[#]} dirdeps=${.ALLTARGETS:M${SRCTOP}*:O:u:[#]} seconds=`expr ${now_utc} - ${start_utc}`" .endif .endif .if ${BUILD_DIRDEPS} == "yes" .if ${DEBUG_DIRDEPS:@x@${DEP_RELDIR:M$x}${${DEP_RELDIR}.${DEP_MACHINE}:L:M$x}@} != "" _debug_reldir = 1 .else _debug_reldir = 0 .endif .if ${DEBUG_DIRDEPS:@x@${DEP_RELDIR:M$x}${${DEP_RELDIR}.depend:L:M$x}@} != "" _debug_search = 1 .else _debug_search = 0 .endif # the rest is done repeatedly for every Makefile.depend we read. # if we are anything but the original dir we care only about the # machine type we were included for. .if ${DEP_RELDIR} == "."
_this_dir := ${SRCTOP} .else _this_dir := ${SRCTOP}/${DEP_RELDIR} .endif # on rare occasions, there can be a need for extra help _dep_hack := ${_this_dir}/${.MAKE.DEPENDFILE_PREFIX}.inc .-include <${_dep_hack}> .if ${DEP_RELDIR} != ${_DEP_RELDIR} || ${DEP_TARGET_SPEC} != ${TARGET_SPEC} # this should be all _machines := ${DEP_MACHINE} .else # this is the machine list we actually use below _machines := ${_only_machines} .if defined(HOSTPROG) || ${DEP_MACHINE} == "host" # we need to build this guy's dependencies for host as well. _machines += host .endif _machines := ${_machines:O:u} .endif .if ${TARGET_SPEC_VARS:[#]} > 1 # we need to tweak _machines _dm := ${DEP_MACHINE} # apply the same filtering that we do when qualifying DIRDEPS. # M_dep_qual_fixes expects .${MACHINE}* so add (and remove) '.' _machines := ${_machines:@DEP_MACHINE@${DEP_TARGET_SPEC}@:S,^,.,:${M_dep_qual_fixes:ts:}:O:u:S,^.,,} DEP_MACHINE := ${_dm} .endif # reset each time through _build_dirs = .if ${DEP_RELDIR} == ${_DEP_RELDIR} # pickup other machines for this dir if necessary .if ${BUILD_AT_LEVEL0:Uyes} == "no" _build_dirs += ${_machines:@m@${_CURDIR}.$m@} .else _build_dirs += ${_machines:N${DEP_TARGET_SPEC}:@m@${_CURDIR}.$m@} .if ${DEP_TARGET_SPEC} == ${TARGET_SPEC} # pickup local dependencies now .if ${MAKE_VERSION} < 20160220 .-include <.depend> .else .dinclude <.depend> .endif .endif .endif .endif .if ${_debug_reldir} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: DIRDEPS='${DIRDEPS}' .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: _machines='${_machines}' .endif .if !empty(DIRDEPS) # these we reset each time through as they can depend on DEP_MACHINE DEP_DIRDEPS_FILTER = \ ${DIRDEPS_FILTER.${DEP_TARGET_SPEC}:U} \ ${DIRDEPS_FILTER.${DEP_MACHINE}:U} \ ${DIRDEPS_FILTER:U} .if empty(DEP_DIRDEPS_FILTER) # something harmless DEP_DIRDEPS_FILTER = U .endif # this is what we start with __depdirs := ${DIRDEPS:${NSkipDir}:${DEP_DIRDEPS_FILTER:ts:}:C,//+,/,g:O:u:@d@${SRCTOP}/$d@} # some entries may be qualified with . # the :M*/*/*.* just tries to limit the dirs we check to likely ones. # the ${d:E:M*/*} ensures we don't consider junos/usr.sbin/mgd __qual_depdirs := ${__depdirs:M*/*/*.*:@d@${exists($d):?:${"${d:E:M*/*}":?:${exists(${d:R}):?$d:}}}@} __unqual_depdirs := ${__depdirs:${__qual_depdirs:Uno:${M_ListToSkip}}} .if ${DEP_RELDIR} == ${_DEP_RELDIR} # if it was called out - we likely need it. __hostdpadd := ${DPADD:U.:M${HOST_OBJTOP}/*:S,${HOST_OBJTOP}/,,:H:${NSkipDir}:${DIRDEPS_FILTER:ts:}:S,$,.host,:N.*:@d@${SRCTOP}/$d@} __qual_depdirs += ${__hostdpadd} .endif .if ${_debug_reldir} .info depdirs=${__depdirs} .info qualified=${__qual_depdirs} .info unqualified=${__unqual_depdirs} .endif # _build_dirs is what we will feed to _DIRDEP_USE _build_dirs += \ ${__qual_depdirs:M*.host:${NSkipHostDir}:N.host} \ ${__qual_depdirs:N*.host} \ ${_machines:Mhost*:@m@${__unqual_depdirs:@d@$d.$m@}@:${NSkipHostDir}:N.host} \ ${_machines:Nhost*:@m@${__unqual_depdirs:@d@$d.$m@}@} # qualify everything now _build_dirs := ${_build_dirs:${M_dep_qual_fixes:ts:}:O:u} _build_all_dirs += ${_build_dirs} _build_all_dirs := ${_build_all_dirs:O:u} .endif # empty DIRDEPS # Normally if doing make -V something, # we do not want to waste time chasing DIRDEPS # but if we want to count the number of Makefile.depend* read, we do. 
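# e.g. a plain "${.MAKE} -f dirdeps.mk -V RELDIR" answers immediately;
# setting _V_READ_DIRDEPS (so the :M-V pattern below no longer matches)
# lets a -V query still chase DIRDEPS and report the count.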
.if ${.MAKEFLAGS:M-V${_V_READ_DIRDEPS}} == "" .if !empty(_build_all_dirs) .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '\# ${DEP_RELDIR}.${DEP_TARGET_SPEC}'; \ echo 'dirdeps: ${_build_all_dirs:${M_oneperline}}'; echo; } >&3; echo x!= { ${_build_all_dirs:@x@${target($x):?:echo '$x: _DIRDEP_USE';}@} echo; } >&3; echo .else # this makes it all happen dirdeps: ${_build_all_dirs} .endif ${_build_all_dirs}: _DIRDEP_USE .if ${_debug_reldir} .info ${DEP_RELDIR}.${DEP_TARGET_SPEC}: needs: ${_build_dirs} .endif # this builds the dependency graph .for m in ${_machines} # it would be nice to do :N${.TARGET} .if !empty(__qual_depdirs) .for q in ${__qual_depdirs:${M_dep_qual_fixes:ts:}:E:O:u:N$m} .if ${_debug_reldir} || ${DEBUG_DIRDEPS:@x@${${DEP_RELDIR}.$m:L:M$x}${${DEP_RELDIR}.$q:L:M$x}@} != "" .info ${DEP_RELDIR}.$m: graph: ${_build_dirs:M*.$q} .endif .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '${_this_dir}.$m: ${_build_dirs:M*.$q:${M_oneperline}}'; echo; } >&3; echo .else ${_this_dir}.$m: ${_build_dirs:M*.$q} .endif .endfor .endif .if ${_debug_reldir} .info ${DEP_RELDIR}.$m: graph: ${_build_dirs:M*.$m:N${_this_dir}.$m} .endif .if ${BUILD_DIRDEPS_CACHE} == "yes" x!= { echo; echo '${_this_dir}.$m: ${_build_dirs:M*.$m:N${_this_dir}.$m:${M_oneperline}}'; echo; } >&3; echo .else ${_this_dir}.$m: ${_build_dirs:M*.$m:N${_this_dir}.$m} .endif .endfor .endif # Now find more dependencies - and recurse. .for d in ${_build_all_dirs} .if !target(_dirdeps_checked.$d) # once only _dirdeps_checked.$d: .if ${_debug_search} .info checking $d .endif # Note: _build_all_dirs is fully qualified so d:R is always the directory .if exists(${d:R}) # Warning: there is an assumption here that MACHINE is always # the first entry in TARGET_SPEC_VARS. # If TARGET_SPEC and MACHINE are insufficient, you have a problem. _m := ${.MAKE.DEPENDFILE_PREFERENCE:T:S;${TARGET_SPEC}$;${d:E};:S;${MACHINE};${d:E:C/,.*//};:@m@${exists(${d:R}/$m):?${d:R}/$m:}@:[1]} .if !empty(_m) # M_dep_qual_fixes isn't geared to Makefile.depend _qm := ${_m:C;(\.depend)$;\1.${d:E};:${M_dep_qual_fixes:ts:}} .if ${_debug_search} .info Looking for ${_qm} .endif # we pass _DEP_TARGET_SPEC to tell the next step what we want _DEP_TARGET_SPEC := ${d:E} # some makefiles may still look at this _DEP_MACHINE := ${d:E:C/,.*//} # set this "just in case" # we can skip :tA since we computed the path above DEP_RELDIR := ${_m:H:S,${SRCTOP}/,,} # and reset this DIRDEPS = .if ${_debug_reldir} && ${_qm} != ${_m} .info loading ${_m} for ${d:E} .endif .include <${_m}> .endif .endif .endif .endfor .endif # -V .endif # BUILD_DIRDEPS .elif ${.MAKE.LEVEL} > 42 .error You should have stopped recursing by now. .else # we are building something DEP_RELDIR := ${RELDIR} _DEP_RELDIR := ${RELDIR} # pickup local dependencies .if ${MAKE_VERSION} < 20160220 .-include <.depend> .else .dinclude <.depend> .endif .endif # bootstrapping new dependencies made easy?
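# A usage sketch (the "mk" wrapper and directory are illustrative):
#
#	cd ${SRCTOP}/some/dir && mk -f dirdeps.mk bootstrap
#
# creates this directory's Makefile.depend* and then recursively visits
# the directories it lists; "bootstrap-this" does only the current
# directory, and "bootstrap-empty" just creates a stub to start from.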
.if !target(bootstrap) && (make(bootstrap) || \ make(bootstrap-this) || \ make(bootstrap-recurse) || \ make(bootstrap-empty)) -.if exists(${.CURDIR}/${.MAKE.DEPENDFILE:T}) +# if we are bootstrapping, create the default +_want = ${.CURDIR}/${.MAKE.DEPENDFILE_DEFAULT:T} + +.if exists(${_want}) # stop here ${.TARGETS:Mboot*}: .elif !make(bootstrap-empty) # find a Makefile.depend to use as _src _src != cd ${.CURDIR} && for m in ${.MAKE.DEPENDFILE_PREFERENCE:T:S,${MACHINE},*,}; do test -s $$m || continue; echo $$m; break; done; echo .if empty(_src) .error cannot find any of ${.MAKE.DEPENDFILE_PREFERENCE:T}${.newline}Use: bootstrap-empty .endif -_src?= ${.MAKE.DEPENDFILE:T} +_src?= ${.MAKE.DEPENDFILE} +.MAKE.DEPENDFILE_BOOTSTRAP_SED+= -e 's,${_src:E},${MACHINE},g' + # just create Makefile.depend* for this dir bootstrap-this: .NOTMAIN - @echo Bootstrapping ${RELDIR}/${.MAKE.DEPENDFILE:T} from ${_src:T} - (cd ${.CURDIR} && sed 's,${_src:E},${MACHINE},g' ${_src} > ${.MAKE.DEPENDFILE:T}) + @echo Bootstrapping ${RELDIR}/${_want:T} from ${_src:T}; \ + echo You need to build ${RELDIR} to correctly populate it. +.if ${_src:T} != ${.MAKE.DEPENDFILE_PREFIX:T} + (cd ${.CURDIR} && sed ${.MAKE.DEPENDFILE_BOOTSTRAP_SED} ${_src} > ${_want}) +.else + cp ${.CURDIR}/${_src:T} ${_want} +.endif # create Makefile.depend* for this dir and its dependencies bootstrap: bootstrap-recurse bootstrap-recurse: bootstrap-this _mf := ${.PARSEFILE} bootstrap-recurse: .NOTMAIN .MAKE @cd ${SRCTOP} && \ for d in `cd ${RELDIR} && ${.MAKE} -B -f ${"${.MAKEFLAGS:M-n}":?${_src}:${.MAKE.DEPENDFILE:T}} -V DIRDEPS`; do \ test -d $$d || d=$${d%.*}; \ test -d $$d || continue; \ echo "Checking $$d for bootstrap ..."; \ (cd $$d && ${.MAKE} -f ${_mf} bootstrap-recurse); \ done .endif # create an empty Makefile.depend* to get the ball rolling. bootstrap-empty: .NOTMAIN .NOMETA - @echo Creating empty ${RELDIR}/${.MAKE.DEPENDFILE:T}; \ + @echo Creating empty ${RELDIR}/${_want:T}; \ echo You need to build ${RELDIR} to correctly populate it. - @{ echo DIRDEPS=; echo ".include <dirdeps.mk>"; } > ${.CURDIR}/${.MAKE.DEPENDFILE:T} + @{ echo DIRDEPS=; echo ".include <dirdeps.mk>"; } > ${_want} .endif Index: projects/clang390-import/share/mk/meta.sys.mk =================================================================== --- projects/clang390-import/share/mk/meta.sys.mk (revision 305686) +++ projects/clang390-import/share/mk/meta.sys.mk (revision 305687) @@ -1,143 +1,163 @@ # $FreeBSD$ # $Id: meta.sys.mk,v 1.19 2014/08/02 23:16:02 sjg Exp $ # # @(#) Copyright (c) 2010, Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # # include this if you want to enable meta mode # for maximum benefit, requires filemon(4) driver. .if ${MAKE_VERSION:U0} > 20100901 .if !target(.ERROR) .-include "local.meta.sys.mk" # absolute path to what we are reading.
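# (:tA resolves ${.PARSEDIR} to an absolute path, as with realpath(3))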
_PARSEDIR = ${.PARSEDIR:tA} +.if !defined(SYS_MK_DIR) +SYS_MK_DIR := ${_PARSEDIR} +.endif + META_MODE += meta verbose .MAKE.MODE ?= ${META_MODE} .if ${.MAKE.LEVEL} == 0 _make_mode := ${.MAKE.MODE} ${META_MODE} .if ${_make_mode:M*read*} != "" || ${_make_mode:M*nofilemon*} != "" # tell everyone we are not updating Makefile.depend* UPDATE_DEPENDFILE = NO .export UPDATE_DEPENDFILE .endif .if ${UPDATE_DEPENDFILE:Uyes:tl} == "no" && !exists(/dev/filemon) # we should not get upset META_MODE += nofilemon .export META_MODE .endif .endif .if !defined(NO_SILENT) .if ${MAKE_VERSION} > 20110818 # only be silent when we have a .meta file META_MODE += silent=yes .else .SILENT: .endif .endif # make defaults .MAKE.DEPENDFILE to .depend # that won't work for us. .if ${.MAKE.DEPENDFILE} == ".depend" .undef .MAKE.DEPENDFILE .endif # if you don't cross build for multiple MACHINEs concurrently, then # .MAKE.DEPENDFILE = Makefile.depend # probably makes sense - you can set that in local.sys.mk .MAKE.DEPENDFILE ?= Makefile.depend.${MACHINE} # we use the pseudo machine "host" for the build host. # this should be taken care of before we get here .if ${OBJTOP:Ua} == ${HOST_OBJTOP:Ub} MACHINE = host .endif .if ${.MAKE.LEVEL} == 0 # it can be handy to know which MACHINE kicked off the build # for example, if using Makefile.depend for multiple machines, # allowing only MACHINE0 to update can keep things simple. MACHINE0 := ${MACHINE} .export MACHINE0 .if defined(PYTHON) && exists(${PYTHON}) # we prefer the python version of this - it is much faster META2DEPS ?= ${.PARSEDIR}/meta2deps.py .else META2DEPS ?= ${.PARSEDIR}/meta2deps.sh .endif META2DEPS := ${META2DEPS} .export META2DEPS .endif MAKE_PRINT_VAR_ON_ERROR += \ .ERROR_TARGET \ .ERROR_META_FILE \ .MAKE.LEVEL \ MAKEFILE \ .MAKE.MODE .if !defined(SB) && defined(SRCTOP) SB = ${SRCTOP:H} .endif ERROR_LOGDIR ?= ${SB}/error meta_error_log = ${ERROR_LOGDIR}/meta-${.MAKE.PID}.log # we are not interested in make telling us a failure happened elsewhere .ERROR: _metaError _metaError: .NOMETA .NOTMAIN -@[ "${.ERROR_META_FILE}" ] && { \ grep -q 'failure has been detected in another branch' ${.ERROR_META_FILE} && exit 0; \ mkdir -p ${meta_error_log:H}; \ cp ${.ERROR_META_FILE} ${meta_error_log}; \ echo "ERROR: log ${meta_error_log}" >&2; }; : .endif +META_COOKIE_TOUCH= +# some targets need to be .PHONY in non-meta mode +META_NOPHONY= .PHONY # Are we, after all, in meta mode? .if ${.MAKE.MODE:Uno:Mmeta*} != "" MKDEP_MK = meta.autodep.mk + +# we can afford to use cookies to prevent some targets +# from re-running needlessly +META_COOKIE_TOUCH= touch ${COOKIE.${.TARGET}:U${.OBJDIR}/${.TARGET:T}} +META_NOPHONY= + +# some targets involve old pre-built targets +# ignore mtime of shell +# and mtime of makefiles does not matter in meta mode +.MAKE.META.IGNORE_PATHS += \ + ${MAKEFILE} \ + ${SHELL} \ + ${SYS_MK_DIR} # if we think we are updating dependencies, # then filemon had better be present .if ${UPDATE_DEPENDFILE:Uyes:tl} != "no" && !exists(/dev/filemon) .error ${.newline}ERROR: The filemon module (/dev/filemon) is not loaded.
.endif .if ${.MAKE.LEVEL} == 0 # make sure dirdeps target exists and do it first all: dirdeps .WAIT dirdeps: .NOPATH: dirdeps .if defined(ALL_MACHINES) # the first .MAIN: is what counts # by default dirdeps is all we want at level0 .MAIN: dirdeps # tell dirdeps.mk what we want BUILD_AT_LEVEL0 = no .endif .if ${.TARGETS:Nall} == "" # it works best if we do everything via sub-makes BUILD_AT_LEVEL0 ?= no .endif .endif .endif .endif Index: projects/clang390-import/sys/amd64/amd64/pmap.c =================================================================== --- projects/clang390-import/sys/amd64/amd64/pmap.c (revision 305686) +++ projects/clang390-import/sys/amd64/amd64/pmap.c (revision 305687) @@ -1,7274 +1,7268 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * Copyright (c) 1994 John S. Dyson * All rights reserved. * Copyright (c) 1994 David Greenman * All rights reserved. * Copyright (c) 2003 Peter Wemm * All rights reserved. * Copyright (c) 2005-2010 Alan L. Cox * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #define AMD64_NPT_AWARE #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduce protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct.
*/ #include "opt_pmap.h" #include "opt_vm.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #endif static __inline boolean_t pmap_type_guest(pmap_t pmap) { return ((pmap->pm_type == PT_EPT) || (pmap->pm_type == PT_RVI)); } static __inline boolean_t pmap_emulate_ad_bits(pmap_t pmap) { return ((pmap->pm_flags & PMAP_EMULATE_AD_BITS) != 0); } static __inline pt_entry_t pmap_valid_bit(pmap_t pmap) { pt_entry_t mask; switch (pmap->pm_type) { case PT_X86: case PT_RVI: mask = X86_PG_V; break; case PT_EPT: if (pmap_emulate_ad_bits(pmap)) mask = EPT_PG_EMUL_V; else mask = EPT_PG_READ; break; default: panic("pmap_valid_bit: invalid pm_type %d", pmap->pm_type); } return (mask); } static __inline pt_entry_t pmap_rw_bit(pmap_t pmap) { pt_entry_t mask; switch (pmap->pm_type) { case PT_X86: case PT_RVI: mask = X86_PG_RW; break; case PT_EPT: if (pmap_emulate_ad_bits(pmap)) mask = EPT_PG_EMUL_RW; else mask = EPT_PG_WRITE; break; default: panic("pmap_rw_bit: invalid pm_type %d", pmap->pm_type); } return (mask); } static __inline pt_entry_t pmap_global_bit(pmap_t pmap) { pt_entry_t mask; switch (pmap->pm_type) { case PT_X86: mask = X86_PG_G; break; case PT_RVI: case PT_EPT: mask = 0; break; default: panic("pmap_global_bit: invalid pm_type %d", pmap->pm_type); } return (mask); } static __inline pt_entry_t pmap_accessed_bit(pmap_t pmap) { pt_entry_t mask; switch (pmap->pm_type) { case PT_X86: case PT_RVI: mask = X86_PG_A; break; case PT_EPT: if (pmap_emulate_ad_bits(pmap)) mask = EPT_PG_READ; else mask = EPT_PG_A; break; default: panic("pmap_accessed_bit: invalid pm_type %d", pmap->pm_type); } return (mask); } static __inline pt_entry_t pmap_modified_bit(pmap_t pmap) { pt_entry_t mask; switch (pmap->pm_type) { case PT_X86: case PT_RVI: mask = X86_PG_M; break; case PT_EPT: if (pmap_emulate_ad_bits(pmap)) mask = EPT_PG_WRITE; else mask = EPT_PG_M; break; default: panic("pmap_modified_bit: invalid pm_type %d", pmap->pm_type); } return (mask); } extern struct pcpu __pcpu[]; #if !defined(DIAGNOSTIC) #ifdef __GNUC_GNU_INLINE__ #define PMAP_INLINE __attribute__((__gnu_inline__)) inline #else #define PMAP_INLINE extern inline #endif #else #define PMAP_INLINE #endif #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif #define pa_index(pa) ((pa) >> PDRSHIFT) #define pa_to_pvh(pa) (&pv_table[pa_index(pa)]) #define NPV_LIST_LOCKS MAXCPU #define PHYS_TO_PV_LIST_LOCK(pa) \ (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS]) #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \ struct rwlock **_lockp = (lockp); \ struct rwlock *_new_lock; \ \ _new_lock = PHYS_TO_PV_LIST_LOCK(pa); \ if (_new_lock != *_lockp) { \ if (*_lockp != NULL) \ rw_wunlock(*_lockp); \ *_lockp = _new_lock; \ rw_wlock(*_lockp); \ } \ } while (0) #define CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m) \ CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, VM_PAGE_TO_PHYS(m)) #define RELEASE_PV_LIST_LOCK(lockp) do { \ struct rwlock **_lockp = (lockp); \ \ if (*_lockp != NULL) { \ rw_wunlock(*_lockp); \ *_lockp = NULL; \ } \ } while (0) #define VM_PAGE_TO_PV_LIST_LOCK(m) \ PHYS_TO_PV_LIST_LOCK(VM_PAGE_TO_PHYS(m)) struct pmap kernel_pmap_store; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ 
vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ int nkpt; SYSCTL_INT(_machdep, OID_AUTO, nkpt, CTLFLAG_RD, &nkpt, 0, "Number of kernel page table pages allocated on bootup"); static int ndmpdp; vm_paddr_t dmaplimit; vm_offset_t kernel_vm_end = VM_MIN_KERNEL_ADDRESS; pt_entry_t pg_nx; static SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); static int pat_works = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, pat_works, CTLFLAG_RD, &pat_works, 1, "Is page attribute table fully functional?"); static int pg_ps_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, pg_ps_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &pg_ps_enabled, 0, "Are large page mappings enabled?"); #define PAT_INDEX_SIZE 8 static int pat_index[PAT_INDEX_SIZE]; /* cache mode to PAT index conversion */ static u_int64_t KPTphys; /* phys addr of kernel level 1 */ static u_int64_t KPDphys; /* phys addr of kernel level 2 */ u_int64_t KPDPphys; /* phys addr of kernel level 3 */ u_int64_t KPML4phys; /* phys addr of kernel level 4 */ static u_int64_t DMPDphys; /* phys addr of direct mapped level 2 */ static u_int64_t DMPDPphys; /* phys addr of direct mapped level 3 */ static int ndmpdpphys; /* number of DMPDPphys pages */ /* * pmap_mapdev support pre initialization (i.e. console) */ #define PMAP_PREINIT_MAPPING_COUNT 8 static struct pmap_preinit_mapping { vm_paddr_t pa; vm_offset_t va; vm_size_t sz; int mode; } pmap_preinit_mapping[PMAP_PREINIT_MAPPING_COUNT]; static int pmap_initialized; /* * Data for the pv entry allocation mechanism. * Updates to pv_invl_gen are protected by the pv_list_locks[] * elements, but reads are not. */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static struct mtx pv_chunks_mutex; static struct rwlock pv_list_locks[NPV_LIST_LOCKS]; static u_long pv_invl_gen[NPV_LIST_LOCKS]; static struct md_page *pv_table; static struct md_page pv_dummy; /* * All those kernel PT submaps that BSD is so fond of */ pt_entry_t *CMAP1 = 0; caddr_t CADDR1 = 0; static vm_offset_t qframe = 0; static struct mtx qframe_mtx; static int pmap_flags = PMAP_PDE_SUPERPAGE; /* flags for x86 pmaps */ int pmap_pcid_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, pcid_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &pmap_pcid_enabled, 0, "Is TLB Context ID enabled ?"); int invpcid_works = 0; SYSCTL_INT(_vm_pmap, OID_AUTO, invpcid_works, CTLFLAG_RD, &invpcid_works, 0, "Is the invpcid instruction available ?"); static int pmap_pcid_save_cnt_proc(SYSCTL_HANDLER_ARGS) { int i; uint64_t res; res = 0; CPU_FOREACH(i) { res += cpuid_to_pcpu[i]->pc_pm_save_cnt; } return (sysctl_handle_64(oidp, &res, 0, req)); } SYSCTL_PROC(_vm_pmap, OID_AUTO, pcid_save_cnt, CTLTYPE_U64 | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0, pmap_pcid_save_cnt_proc, "QU", "Count of saved TLB context on switch"); static LIST_HEAD(, pmap_invl_gen) pmap_invl_gen_tracker = LIST_HEAD_INITIALIZER(&pmap_invl_gen_tracker); static struct mtx invl_gen_mtx; static u_long pmap_invl_gen = 0; /* Fake lock object to satisfy turnstiles interface. */ static struct lock_object invl_gen_ts = { .lo_name = "invlts", }; #define PMAP_ASSERT_NOT_IN_DI() \ KASSERT(curthread->td_md.md_invl_gen.gen == 0, ("DI already started")) /* * Start a new Delayed Invalidation (DI) block of code, executed by * the current thread. 
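 *
 * A sketch of the expected calling pattern (illustrative only):
 *
 *	pmap_delayed_invl_started();
 *	... destroy PTE and PV list entries for a mapping ...
 *	pmap_delayed_invl_page(m);
 *	... issue the required TLB shootdowns ...
 *	pmap_delayed_invl_finished();
 *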
Within a DI block, the current thread may * destroy both the page table and PV list entries for a mapping and * then release the corresponding PV list lock before ensuring that * the mapping is flushed from the TLBs of any processors with the * pmap active. */ static void pmap_delayed_invl_started(void) { struct pmap_invl_gen *invl_gen; u_long currgen; invl_gen = &curthread->td_md.md_invl_gen; PMAP_ASSERT_NOT_IN_DI(); mtx_lock(&invl_gen_mtx); if (LIST_EMPTY(&pmap_invl_gen_tracker)) currgen = pmap_invl_gen; else currgen = LIST_FIRST(&pmap_invl_gen_tracker)->gen; invl_gen->gen = currgen + 1; LIST_INSERT_HEAD(&pmap_invl_gen_tracker, invl_gen, link); mtx_unlock(&invl_gen_mtx); } /* * Finish the DI block, previously started by the current thread. All * required TLB flushes for the pages marked by * pmap_delayed_invl_page() must be finished before this function is * called. * * This function works by bumping the global DI generation number to * the generation number of the current thread's DI, unless there is a * pending DI that started earlier. In the latter case, bumping the * global DI generation number would incorrectly signal that the * earlier DI had finished. Instead, this function bumps the earlier * DI's generation number to match the generation number of the * current thread's DI. */ static void pmap_delayed_invl_finished(void) { struct pmap_invl_gen *invl_gen, *next; struct turnstile *ts; invl_gen = &curthread->td_md.md_invl_gen; KASSERT(invl_gen->gen != 0, ("missed invl_started")); mtx_lock(&invl_gen_mtx); next = LIST_NEXT(invl_gen, link); if (next == NULL) { turnstile_chain_lock(&invl_gen_ts); ts = turnstile_lookup(&invl_gen_ts); pmap_invl_gen = invl_gen->gen; if (ts != NULL) { turnstile_broadcast(ts, TS_SHARED_QUEUE); turnstile_unpend(ts, TS_SHARED_LOCK); } turnstile_chain_unlock(&invl_gen_ts); } else { next->gen = invl_gen->gen; } LIST_REMOVE(invl_gen, link); mtx_unlock(&invl_gen_mtx); invl_gen->gen = 0; } #ifdef PV_STATS static long invl_wait; SYSCTL_LONG(_vm_pmap, OID_AUTO, invl_wait, CTLFLAG_RD, &invl_wait, 0, "Number of times DI invalidation blocked pmap_remove_all/write"); #endif static u_long * pmap_delayed_invl_genp(vm_page_t m) { return (&pv_invl_gen[pa_index(VM_PAGE_TO_PHYS(m)) % NPV_LIST_LOCKS]); } /* * Ensure that all currently executing DI blocks, that need to flush * TLB for the given page m, actually flushed the TLB at the time the * function returned. If the page m has an empty PV list and we call * pmap_delayed_invl_wait(), upon its return we know that no CPU has a * valid mapping for the page m in either its page table or TLB. * * This function works by blocking until the global DI generation * number catches up with the generation number associated with the * given page m and its PV list. Since this function's callers * typically own an object lock and sometimes own a page lock, it * cannot sleep. Instead, it blocks on a turnstile to relinquish the * processor. */ static void pmap_delayed_invl_wait(vm_page_t m) { struct thread *td; struct turnstile *ts; u_long *m_gen; #ifdef PV_STATS bool accounted = false; #endif td = curthread; m_gen = pmap_delayed_invl_genp(m); while (*m_gen > pmap_invl_gen) { #ifdef PV_STATS if (!accounted) { atomic_add_long(&invl_wait, 1); accounted = true; } #endif ts = turnstile_trywait(&invl_gen_ts); if (*m_gen > pmap_invl_gen) turnstile_wait(ts, NULL, TS_SHARED_QUEUE); else turnstile_cancel(ts); } } /* * Mark the page m's PV list as participating in the current thread's * DI block. 
Any threads concurrently using m's PV list to remove or * restrict all mappings to m will wait for the current thread's DI * block to complete before proceeding. * * The function works by setting the DI generation number for m's PV * list to at least the DI generation number of the current thread. * This forces a caller of pmap_delayed_invl_wait() to block until * current thread calls pmap_delayed_invl_finished(). */ static void pmap_delayed_invl_page(vm_page_t m) { u_long gen, *m_gen; rw_assert(VM_PAGE_TO_PV_LIST_LOCK(m), RA_WLOCKED); gen = curthread->td_md.md_invl_gen.gen; if (gen == 0) return; m_gen = pmap_delayed_invl_genp(m); if (*m_gen < gen) *m_gen = gen; } /* * Crashdump maps. */ static caddr_t crashdumpmap; static void free_pv_chunk(struct pv_chunk *pc); static void free_pv_entry(pmap_t pmap, pv_entry_t pv); static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp); static int popcnt_pc_map_pq(uint64_t *map); static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp); static void reserve_pv_entries(pmap_t pmap, int needed, struct rwlock **lockp); static void pmap_pv_demote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp); static boolean_t pmap_pv_insert_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp); static void pmap_pv_promote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp); static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static int pmap_change_attr_locked(vm_offset_t va, vm_size_t size, int mode); static boolean_t pmap_demote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va); static boolean_t pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va, struct rwlock **lockp); static boolean_t pmap_demote_pdpe(pmap_t pmap, pdp_entry_t *pdpe, vm_offset_t va); static boolean_t pmap_enter_pde(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, struct rwlock **lockp); static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp); static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte); static int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte); static void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int mode); static vm_page_t pmap_lookup_pt_page(pmap_t pmap, vm_offset_t va); static void pmap_pde_attr(pd_entry_t *pde, int cache_bits, int mask); static void pmap_promote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va, struct rwlock **lockp); static boolean_t pmap_protect_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t sva, vm_prot_t prot); static void pmap_pte_attr(pt_entry_t *pte, int cache_bits, int mask); static int pmap_remove_pde(pmap_t pmap, pd_entry_t *pdq, vm_offset_t sva, struct spglist *free, struct rwlock **lockp); static int pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t sva, pd_entry_t ptepde, struct spglist *free, struct rwlock **lockp); static void pmap_remove_pt_page(pmap_t pmap, vm_page_t mpte); static void pmap_remove_page(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, struct spglist *free); static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp); static void pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde); static void pmap_update_pde_invalidate(pmap_t, vm_offset_t va, pd_entry_t pde); static vm_page_t _pmap_allocpte(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock 
**lockp); static vm_page_t pmap_allocpde(pmap_t pmap, vm_offset_t va, struct rwlock **lockp); static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va, struct rwlock **lockp); static void _pmap_unwire_ptp(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free); static int pmap_unuse_pt(pmap_t, vm_offset_t, pd_entry_t, struct spglist *); static vm_offset_t pmap_kmem_choose(vm_offset_t addr); /* * Move the kernel virtual free pointer to the next * 2MB. This is used to help improve performance * by using a large (2MB) page for much of the kernel * (.text, .data, .bss) */ static vm_offset_t pmap_kmem_choose(vm_offset_t addr) { vm_offset_t newaddr = addr; newaddr = roundup2(addr, NBPDR); return (newaddr); } /********************/ /* Inline functions */ /********************/ /* Return a non-clipped PD index for a given VA */ static __inline vm_pindex_t pmap_pde_pindex(vm_offset_t va) { return (va >> PDRSHIFT); } /* Return various clipped indexes for a given VA */ static __inline vm_pindex_t pmap_pte_index(vm_offset_t va) { return ((va >> PAGE_SHIFT) & ((1ul << NPTEPGSHIFT) - 1)); } static __inline vm_pindex_t pmap_pde_index(vm_offset_t va) { return ((va >> PDRSHIFT) & ((1ul << NPDEPGSHIFT) - 1)); } static __inline vm_pindex_t pmap_pdpe_index(vm_offset_t va) { return ((va >> PDPSHIFT) & ((1ul << NPDPEPGSHIFT) - 1)); } static __inline vm_pindex_t pmap_pml4e_index(vm_offset_t va) { return ((va >> PML4SHIFT) & ((1ul << NPML4EPGSHIFT) - 1)); } /* Return a pointer to the PML4 slot that corresponds to a VA */ static __inline pml4_entry_t * pmap_pml4e(pmap_t pmap, vm_offset_t va) { return (&pmap->pm_pml4[pmap_pml4e_index(va)]); } /* Return a pointer to the PDP slot that corresponds to a VA */ static __inline pdp_entry_t * pmap_pml4e_to_pdpe(pml4_entry_t *pml4e, vm_offset_t va) { pdp_entry_t *pdpe; pdpe = (pdp_entry_t *)PHYS_TO_DMAP(*pml4e & PG_FRAME); return (&pdpe[pmap_pdpe_index(va)]); } /* Return a pointer to the PDP slot that corresponds to a VA */ static __inline pdp_entry_t * pmap_pdpe(pmap_t pmap, vm_offset_t va) { pml4_entry_t *pml4e; pt_entry_t PG_V; PG_V = pmap_valid_bit(pmap); pml4e = pmap_pml4e(pmap, va); if ((*pml4e & PG_V) == 0) return (NULL); return (pmap_pml4e_to_pdpe(pml4e, va)); } /* Return a pointer to the PD slot that corresponds to a VA */ static __inline pd_entry_t * pmap_pdpe_to_pde(pdp_entry_t *pdpe, vm_offset_t va) { pd_entry_t *pde; pde = (pd_entry_t *)PHYS_TO_DMAP(*pdpe & PG_FRAME); return (&pde[pmap_pde_index(va)]); } /* Return a pointer to the PD slot that corresponds to a VA */ static __inline pd_entry_t * pmap_pde(pmap_t pmap, vm_offset_t va) { pdp_entry_t *pdpe; pt_entry_t PG_V; PG_V = pmap_valid_bit(pmap); pdpe = pmap_pdpe(pmap, va); if (pdpe == NULL || (*pdpe & PG_V) == 0) return (NULL); return (pmap_pdpe_to_pde(pdpe, va)); } /* Return a pointer to the PT slot that corresponds to a VA */ static __inline pt_entry_t * pmap_pde_to_pte(pd_entry_t *pde, vm_offset_t va) { pt_entry_t *pte; pte = (pt_entry_t *)PHYS_TO_DMAP(*pde & PG_FRAME); return (&pte[pmap_pte_index(va)]); } /* Return a pointer to the PT slot that corresponds to a VA */ static __inline pt_entry_t * pmap_pte(pmap_t pmap, vm_offset_t va) { pd_entry_t *pde; pt_entry_t PG_V; PG_V = pmap_valid_bit(pmap); pde = pmap_pde(pmap, va); if (pde == NULL || (*pde & PG_V) == 0) return (NULL); if ((*pde & PG_PS) != 0) /* compat with i386 pmap_pte() */ return ((pt_entry_t *)pde); return (pmap_pde_to_pte(pde, va)); } static __inline void pmap_resident_count_inc(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, 
MA_OWNED); pmap->pm_stats.resident_count += count; } static __inline void pmap_resident_count_dec(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT(pmap->pm_stats.resident_count >= count, ("pmap %p resident count underflow %ld %d", pmap, pmap->pm_stats.resident_count, count)); pmap->pm_stats.resident_count -= count; } PMAP_INLINE pt_entry_t * vtopte(vm_offset_t va) { u_int64_t mask = ((1ul << (NPTEPGSHIFT + NPDEPGSHIFT + NPDPEPGSHIFT + NPML4EPGSHIFT)) - 1); KASSERT(va >= VM_MAXUSER_ADDRESS, ("vtopte on a uva/gpa 0x%0lx", va)); return (PTmap + ((va >> PAGE_SHIFT) & mask)); } static __inline pd_entry_t * vtopde(vm_offset_t va) { u_int64_t mask = ((1ul << (NPDEPGSHIFT + NPDPEPGSHIFT + NPML4EPGSHIFT)) - 1); KASSERT(va >= VM_MAXUSER_ADDRESS, ("vtopde on a uva/gpa 0x%0lx", va)); return (PDmap + ((va >> PDRSHIFT) & mask)); } static u_int64_t allocpages(vm_paddr_t *firstaddr, int n) { u_int64_t ret; ret = *firstaddr; bzero((void *)ret, n * PAGE_SIZE); *firstaddr += n * PAGE_SIZE; return (ret); } CTASSERT(powerof2(NDMPML4E)); /* number of kernel PDP slots */ #define NKPDPE(ptpgs) howmany(ptpgs, NPDEPG) static void nkpt_init(vm_paddr_t addr) { int pt_pages; #ifdef NKPT pt_pages = NKPT; #else pt_pages = howmany(addr, 1 << PDRSHIFT); pt_pages += NKPDPE(pt_pages); /* * Add some slop beyond the bare minimum required for bootstrapping * the kernel. * * This is quite important when allocating KVA for kernel modules. * The modules are required to be linked in the negative 2GB of * the address space. If we run out of KVA in this region then * pmap_growkernel() will need to allocate page table pages to map * the entire 512GB of KVA space which is an unnecessary tax on * physical memory. * * Secondly, device memory mapped as part of setting up the low- * level console(s) is taken from KVA, starting at virtual_avail. * This is because cninit() is called after pmap_bootstrap() but * before vm_init() and pmap_init(). 20MB for a frame buffer is * not uncommon. */ pt_pages += 32; /* 64MB additional slop. */ #endif nkpt = pt_pages; } static void create_pagetables(vm_paddr_t *firstaddr) { int i, j, ndm1g, nkpdpe; pt_entry_t *pt_p; pd_entry_t *pd_p; pdp_entry_t *pdp_p; pml4_entry_t *p4_p; /* Allocate page table pages for the direct map */ ndmpdp = howmany(ptoa(Maxmem), NBPDP); if (ndmpdp < 4) /* Minimum 4GB of dirmap */ ndmpdp = 4; ndmpdpphys = howmany(ndmpdp, NPDPEPG); if (ndmpdpphys > NDMPML4E) { /* * Each NDMPML4E allows 512 GB, so limit to that, * and then readjust ndmpdp and ndmpdpphys. */ printf("NDMPML4E limits system to %d GB\n", NDMPML4E * 512); Maxmem = atop(NDMPML4E * NBPML4); ndmpdpphys = NDMPML4E; ndmpdp = NDMPML4E * NPDEPG; } DMPDPphys = allocpages(firstaddr, ndmpdpphys); ndm1g = 0; if ((amd_feature & AMDID_PAGE1GB) != 0) ndm1g = ptoa(Maxmem) >> PDPSHIFT; if (ndm1g < ndmpdp) DMPDphys = allocpages(firstaddr, ndmpdp - ndm1g); dmaplimit = (vm_paddr_t)ndmpdp << PDPSHIFT; /* Allocate pages */ KPML4phys = allocpages(firstaddr, 1); KPDPphys = allocpages(firstaddr, NKPML4E); /* * Allocate the initial number of kernel page table pages required to * bootstrap. We defer this until after all memory-size dependent * allocations are done (e.g. direct map), so that we don't have to * build in too much slop in our estimate. * * Note that when NKPML4E > 1, we have an empty page underneath * all but the KPML4I'th one, so we need NKPML4E-1 extra (zeroed) * pages. (pmap_enter requires a PD page to exist for each KPML4E.) 
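 *
 * As a rough worked example (numbers illustrative): if the bootstrap
 * allocations end at *firstaddr ~= 64MB, nkpt_init() computes
 * howmany(64MB, 2MB) = 32 PT pages, adds NKPDPE(32) = 1, and then the
 * 32 pages of slop, giving nkpt = 65 (about 130MB of mapped KVA).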
*/ nkpt_init(*firstaddr); nkpdpe = NKPDPE(nkpt); KPTphys = allocpages(firstaddr, nkpt); KPDphys = allocpages(firstaddr, nkpdpe); /* Fill in the underlying page table pages */ /* Nominally read-only (but really R/W) from zero to physfree */ /* XXX not fully used, underneath 2M pages */ pt_p = (pt_entry_t *)KPTphys; for (i = 0; ptoa(i) < *firstaddr; i++) pt_p[i] = ptoa(i) | X86_PG_RW | X86_PG_V | X86_PG_G; /* Now map the page tables at their location within PTmap */ pd_p = (pd_entry_t *)KPDphys; for (i = 0; i < nkpt; i++) pd_p[i] = (KPTphys + ptoa(i)) | X86_PG_RW | X86_PG_V; /* Map from zero to end of allocations under 2M pages */ /* This replaces some of the KPTphys entries above */ for (i = 0; (i << PDRSHIFT) < *firstaddr; i++) pd_p[i] = (i << PDRSHIFT) | X86_PG_RW | X86_PG_V | PG_PS | X86_PG_G; /* And connect up the PD to the PDP (leaving room for L4 pages) */ pdp_p = (pdp_entry_t *)(KPDPphys + ptoa(KPML4I - KPML4BASE)); for (i = 0; i < nkpdpe; i++) pdp_p[i + KPDPI] = (KPDphys + ptoa(i)) | X86_PG_RW | X86_PG_V | PG_U; /* * Now, set up the direct map region using 2MB and/or 1GB pages. If * the end of physical memory is not aligned to a 1GB page boundary, * then the residual physical memory is mapped with 2MB pages. Later, * if pmap_mapdev{_attr}() uses the direct map for non-write-back * memory, pmap_change_attr() will demote any 2MB or 1GB page mappings * that are partially used. */ pd_p = (pd_entry_t *)DMPDphys; for (i = NPDEPG * ndm1g, j = 0; i < NPDEPG * ndmpdp; i++, j++) { pd_p[j] = (vm_paddr_t)i << PDRSHIFT; /* Preset PG_M and PG_A because demotion expects it. */ pd_p[j] |= X86_PG_RW | X86_PG_V | PG_PS | X86_PG_G | X86_PG_M | X86_PG_A; } pdp_p = (pdp_entry_t *)DMPDPphys; for (i = 0; i < ndm1g; i++) { pdp_p[i] = (vm_paddr_t)i << PDPSHIFT; /* Preset PG_M and PG_A because demotion expects it. */ pdp_p[i] |= X86_PG_RW | X86_PG_V | PG_PS | X86_PG_G | X86_PG_M | X86_PG_A; } for (j = 0; i < ndmpdp; i++, j++) { pdp_p[i] = DMPDphys + ptoa(j); pdp_p[i] |= X86_PG_RW | X86_PG_V | PG_U; } /* And recursively map PML4 to itself in order to get PTmap */ p4_p = (pml4_entry_t *)KPML4phys; p4_p[PML4PML4I] = KPML4phys; p4_p[PML4PML4I] |= X86_PG_RW | X86_PG_V | PG_U; /* Connect the Direct Map slot(s) up to the PML4. */ for (i = 0; i < ndmpdpphys; i++) { p4_p[DMPML4I + i] = DMPDPphys + ptoa(i); p4_p[DMPML4I + i] |= X86_PG_RW | X86_PG_V | PG_U; } /* Connect the KVA slots up to the PML4 */ for (i = 0; i < NKPML4E; i++) { p4_p[KPML4BASE + i] = KPDPphys + ptoa(i); p4_p[KPML4BASE + i] |= X86_PG_RW | X86_PG_V | PG_U; } } /* * Bootstrap the system enough to run with virtual memory. * * On amd64 this is called after mapping has already been enabled * and just syncs the pmap module with what has already been done. * [We can't call it easily with mapping off since the kernel is not * mapped with PA == VA, hence we would have to relocate every address * from the linked base (virtual) address "KERNBASE" to the actual * (physical) address starting relative to 0] */ void pmap_bootstrap(vm_paddr_t *firstaddr) { vm_offset_t va; pt_entry_t *pte; int i; /* * Create an initial set of page tables to run the kernel in. */ create_pagetables(firstaddr); /* * Add a physical memory segment (vm_phys_seg) corresponding to the * preallocated kernel page table pages so that vm_page structures * representing these pages will be created. The vm_page structures * are required for promotion of the corresponding kernel virtual * addresses to superpage mappings. 
*/ vm_phys_add_seg(KPTphys, KPTphys + ptoa(nkpt)); virtual_avail = (vm_offset_t) KERNBASE + *firstaddr; virtual_avail = pmap_kmem_choose(virtual_avail); virtual_end = VM_MAX_KERNEL_ADDRESS; /* XXX do %cr0 as well */ load_cr4(rcr4() | CR4_PGE); load_cr3(KPML4phys); if (cpu_stdext_feature & CPUID_STDEXT_SMEP) load_cr4(rcr4() | CR4_SMEP); /* * Initialize the kernel pmap (which is statically allocated). */ PMAP_LOCK_INIT(kernel_pmap); kernel_pmap->pm_pml4 = (pdp_entry_t *)PHYS_TO_DMAP(KPML4phys); kernel_pmap->pm_cr3 = KPML4phys; CPU_FILL(&kernel_pmap->pm_active); /* don't allow deactivation */ TAILQ_INIT(&kernel_pmap->pm_pvchunk); kernel_pmap->pm_flags = pmap_flags; /* * Initialize the TLB invalidations generation number lock. */ mtx_init(&invl_gen_mtx, "invlgn", NULL, MTX_DEF); /* * Reserve some special page table entries/VA space for temporary * mapping of pages. */ #define SYSMAP(c, p, v, n) \ v = (c)va; va += ((n)*PAGE_SIZE); p = pte; pte += (n); va = virtual_avail; pte = vtopte(va); /* * Crashdump maps. The first page is reused as CMAP1 for the * memory test. */ SYSMAP(caddr_t, CMAP1, crashdumpmap, MAXDUMPPGS) CADDR1 = crashdumpmap; virtual_avail = va; /* Initialize the PAT MSR. */ pmap_init_pat(); /* Initialize TLB Context Id. */ TUNABLE_INT_FETCH("vm.pmap.pcid_enabled", &pmap_pcid_enabled); if ((cpu_feature2 & CPUID2_PCID) != 0 && pmap_pcid_enabled) { /* Check for INVPCID support */ invpcid_works = (cpu_stdext_feature & CPUID_STDEXT_INVPCID) != 0; for (i = 0; i < MAXCPU; i++) { kernel_pmap->pm_pcids[i].pm_pcid = PMAP_PCID_KERN; kernel_pmap->pm_pcids[i].pm_gen = 1; } __pcpu[0].pc_pcid_next = PMAP_PCID_KERN + 1; __pcpu[0].pc_pcid_gen = 1; /* * pcpu area for APs is zeroed during AP startup. * pc_pcid_next and pc_pcid_gen are initialized by AP * during pcpu setup. */ load_cr4(rcr4() | CR4_PCIDE); } else { pmap_pcid_enabled = 0; } } /* * Setup the PAT MSR. */ void pmap_init_pat(void) { int pat_table[PAT_INDEX_SIZE]; uint64_t pat_msr; u_long cr0, cr4; int i; /* Bail if this CPU doesn't implement PAT. */ if ((cpu_feature & CPUID_PAT) == 0) panic("no PAT??"); /* Set default PAT index table. */ for (i = 0; i < PAT_INDEX_SIZE; i++) pat_table[i] = -1; pat_table[PAT_WRITE_BACK] = 0; pat_table[PAT_WRITE_THROUGH] = 1; pat_table[PAT_UNCACHEABLE] = 3; pat_table[PAT_WRITE_COMBINING] = 3; pat_table[PAT_WRITE_PROTECTED] = 3; pat_table[PAT_UNCACHED] = 3; /* Initialize default PAT entries. */ pat_msr = PAT_VALUE(0, PAT_WRITE_BACK) | PAT_VALUE(1, PAT_WRITE_THROUGH) | PAT_VALUE(2, PAT_UNCACHED) | PAT_VALUE(3, PAT_UNCACHEABLE) | PAT_VALUE(4, PAT_WRITE_BACK) | PAT_VALUE(5, PAT_WRITE_THROUGH) | PAT_VALUE(6, PAT_UNCACHED) | PAT_VALUE(7, PAT_UNCACHEABLE); if (pat_works) { /* * Leave the indices 0-3 at the default of WB, WT, UC-, and UC. * Program 5 and 6 as WP and WC. * Leave 4 and 7 as WB and UC. */ pat_msr &= ~(PAT_MASK(5) | PAT_MASK(6)); pat_msr |= PAT_VALUE(5, PAT_WRITE_PROTECTED) | PAT_VALUE(6, PAT_WRITE_COMBINING); pat_table[PAT_UNCACHED] = 2; pat_table[PAT_WRITE_PROTECTED] = 5; pat_table[PAT_WRITE_COMBINING] = 6; } else { /* * Just replace PAT Index 2 with WC instead of UC-. */ pat_msr &= ~PAT_MASK(2); pat_msr |= PAT_VALUE(2, PAT_WRITE_COMBINING); pat_table[PAT_WRITE_COMBINING] = 2; } /* Disable PGE. */ cr4 = rcr4(); load_cr4(cr4 & ~CR4_PGE); /* Disable caches (CD = 1, NW = 0). */ cr0 = rcr0(); load_cr0((cr0 & ~CR0_NW) | CR0_CD); /* Flushes caches and TLBs. */ wbinvd(); invltlb(); /* Update PAT and index table. 
*/ wrmsr(MSR_PAT, pat_msr); for (i = 0; i < PAT_INDEX_SIZE; i++) pat_index[i] = pat_table[i]; /* Flush caches and TLBs again. */ wbinvd(); invltlb(); /* Restore caches and PGE. */ load_cr0(cr0); load_cr4(cr4); } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pat_mode = PAT_WRITE_BACK; } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { struct pmap_preinit_mapping *ppim; vm_page_t mpte; vm_size_t s; int error, i, pv_npg; /* * Initialize the vm page array entries for the kernel pmap's * page table pages. */ for (i = 0; i < nkpt; i++) { mpte = PHYS_TO_VM_PAGE(KPTphys + (i << PAGE_SHIFT)); KASSERT(mpte >= vm_page_array && mpte < &vm_page_array[vm_page_array_size], ("pmap_init: page table page is out of range")); mpte->pindex = pmap_pde_pindex(KERNBASE) + i; mpte->phys_addr = KPTphys + (i << PAGE_SHIFT); } /* * If the kernel is running on a virtual machine, then it must assume * that MCA is enabled by the hypervisor. Moreover, the kernel must * be prepared for the hypervisor changing the vendor and family that * are reported by CPUID. Consequently, the workaround for AMD Family * 10h Erratum 383 is enabled if the processor's feature set does not * include at least one feature that is only supported by older Intel * or newer AMD processors. */ if (vm_guest != VM_GUEST_NO && (cpu_feature & CPUID_SS) == 0 && (cpu_feature2 & (CPUID2_SSSE3 | CPUID2_SSE41 | CPUID2_AESNI | CPUID2_AVX | CPUID2_XSAVE)) == 0 && (amd_feature2 & (AMDID2_XOP | AMDID2_FMA4)) == 0) workaround_erratum383 = 1; /* * Are large page mappings enabled? */ TUNABLE_INT_FETCH("vm.pmap.pg_ps_enabled", &pg_ps_enabled); if (pg_ps_enabled) { KASSERT(MAXPAGESIZES > 1 && pagesizes[1] == 0, ("pmap_init: can't assign to pagesizes[1]")); pagesizes[1] = NBPDR; } /* * Initialize the pv chunk list mutex. */ mtx_init(&pv_chunks_mutex, "pmap pv chunk list", NULL, MTX_DEF); /* * Initialize the pool of pv list locks. */ for (i = 0; i < NPV_LIST_LOCKS; i++) rw_init(&pv_list_locks[i], "pmap pv list"); /* * Calculate the size of the pv head table for superpages. */ pv_npg = howmany(vm_phys_segs[vm_phys_nsegs - 1].end, NBPDR); /* * Allocate memory for the pv head table for superpages. 
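 * For example (sizes illustrative): if the last vm_phys segment ends at
 * 16GB, pv_npg = howmany(16GB, NBPDR = 2MB) = 8192 entries, and the
 * table occupies 8192 * sizeof(struct md_page) bytes, rounded up to
 * whole pages below.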
*/ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); TAILQ_INIT(&pv_dummy.pv_list); pmap_initialized = 1; for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == 0) continue; /* Make the direct map consistent */ if (ppim->pa < dmaplimit && ppim->pa + ppim->sz < dmaplimit) { (void)pmap_change_attr(PHYS_TO_DMAP(ppim->pa), ppim->sz, ppim->mode); } if (!bootverbose) continue; printf("PPIM %u: PA=%#lx, VA=%#lx, size=%#lx, mode=%#x\n", i, ppim->pa, ppim->va, ppim->sz, ppim->mode); } mtx_init(&qframe_mtx, "qfrmlk", NULL, MTX_SPIN); error = vmem_alloc(kernel_arena, PAGE_SIZE, M_BESTFIT | M_WAITOK, (vmem_addr_t *)&qframe); if (error != 0) panic("qframe allocation failed"); } static SYSCTL_NODE(_vm_pmap, OID_AUTO, pde, CTLFLAG_RD, 0, "2MB page mapping counters"); static u_long pmap_pde_demotions; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, demotions, CTLFLAG_RD, &pmap_pde_demotions, 0, "2MB page demotions"); static u_long pmap_pde_mappings; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, mappings, CTLFLAG_RD, &pmap_pde_mappings, 0, "2MB page mappings"); static u_long pmap_pde_p_failures; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_pde_p_failures, 0, "2MB page promotion failures"); static u_long pmap_pde_promotions; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, promotions, CTLFLAG_RD, &pmap_pde_promotions, 0, "2MB page promotions"); static SYSCTL_NODE(_vm_pmap, OID_AUTO, pdpe, CTLFLAG_RD, 0, "1GB page mapping counters"); static u_long pmap_pdpe_demotions; SYSCTL_ULONG(_vm_pmap_pdpe, OID_AUTO, demotions, CTLFLAG_RD, &pmap_pdpe_demotions, 0, "1GB page demotions"); /*************************************************** * Low level helper routines..... ***************************************************/ static pt_entry_t pmap_swap_pat(pmap_t pmap, pt_entry_t entry) { int x86_pat_bits = X86_PG_PTE_PAT | X86_PG_PDE_PAT; switch (pmap->pm_type) { case PT_X86: case PT_RVI: /* Verify that both PAT bits are not set at the same time */ KASSERT((entry & x86_pat_bits) != x86_pat_bits, ("Invalid PAT bits in entry %#lx", entry)); /* Swap the PAT bits if one of them is set */ if ((entry & x86_pat_bits) != 0) entry ^= x86_pat_bits; break; case PT_EPT: /* * Nothing to do - the memory attributes are represented * the same way for regular pages and superpages. */ break; default: panic("pmap_switch_pat_bits: bad pm_type %d", pmap->pm_type); } return (entry); } /* * Determine the appropriate bits to set in a PTE or PDE for a specified * caching mode. */ static int pmap_cache_bits(pmap_t pmap, int mode, boolean_t is_pde) { int cache_bits, pat_flag, pat_idx; if (mode < 0 || mode >= PAT_INDEX_SIZE || pat_index[mode] < 0) panic("Unknown caching mode %d\n", mode); switch (pmap->pm_type) { case PT_X86: case PT_RVI: /* The PAT bit is different for PTE's and PDE's. */ pat_flag = is_pde ? X86_PG_PDE_PAT : X86_PG_PTE_PAT; /* Map the caching mode to a PAT index. */ pat_idx = pat_index[mode]; /* Map the 3-bit index value into the PAT, PCD, and PWT bits. 
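 * E.g. when pat_works, PAT_WRITE_COMBINING maps to pat_idx = 6 (binary
 * 110), so the PAT flag and PG_NC_PCD are set while PG_NC_PWT is not.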
		 */
		cache_bits = 0;
		if (pat_idx & 0x4)
			cache_bits |= pat_flag;
		if (pat_idx & 0x2)
			cache_bits |= PG_NC_PCD;
		if (pat_idx & 0x1)
			cache_bits |= PG_NC_PWT;
		break;

	case PT_EPT:
		cache_bits = EPT_PG_IGNORE_PAT | EPT_PG_MEMORY_TYPE(mode);
		break;

	default:
		panic("unsupported pmap type %d", pmap->pm_type);
	}

	return (cache_bits);
}

static int
pmap_cache_mask(pmap_t pmap, boolean_t is_pde)
{
	int mask;

	switch (pmap->pm_type) {
	case PT_X86:
	case PT_RVI:
		mask = is_pde ? X86_PG_PDE_CACHE : X86_PG_PTE_CACHE;
		break;
	case PT_EPT:
		mask = EPT_PG_IGNORE_PAT | EPT_PG_MEMORY_TYPE(0x7);
		break;
	default:
		panic("pmap_cache_mask: invalid pm_type %d", pmap->pm_type);
	}

	return (mask);
}

static __inline boolean_t
pmap_ps_enabled(pmap_t pmap)
{

	return (pg_ps_enabled && (pmap->pm_flags & PMAP_PDE_SUPERPAGE) != 0);
}

static void
pmap_update_pde_store(pmap_t pmap, pd_entry_t *pde, pd_entry_t newpde)
{

	switch (pmap->pm_type) {
	case PT_X86:
		break;
	case PT_RVI:
	case PT_EPT:
		/*
		 * XXX
		 * This is a little bogus since the generation number is
		 * supposed to be bumped up when a region of the address
		 * space is invalidated in the page tables.
		 *
		 * In this case the old PDE entry is valid but yet we want
		 * to make sure that any mappings using the old entry are
		 * invalidated in the TLB.
		 *
		 * The reason this works as expected is because we rendezvous
		 * "all" host cpus and force any vcpu context to exit as a
		 * side-effect.
		 */
		atomic_add_acq_long(&pmap->pm_eptgen, 1);
		break;
	default:
		panic("pmap_update_pde_store: bad pm_type %d", pmap->pm_type);
	}

	pde_store(pde, newpde);
}

/*
 * After changing the page size for the specified virtual address in the page
 * table, flush the corresponding entries from the processor's TLB.  Only the
 * calling processor's TLB is affected.
 *
 * The calling thread must be pinned to a processor.
 */
static void
pmap_update_pde_invalidate(pmap_t pmap, vm_offset_t va, pd_entry_t newpde)
{
	pt_entry_t PG_G;

	if (pmap_type_guest(pmap))
		return;

	KASSERT(pmap->pm_type == PT_X86,
	    ("pmap_update_pde_invalidate: invalid type %d", pmap->pm_type));

	PG_G = pmap_global_bit(pmap);

	if ((newpde & PG_PS) == 0)
		/* Demotion: flush a specific 2MB page mapping. */
		invlpg(va);
	else if ((newpde & PG_G) == 0)
		/*
		 * Promotion: flush every 4KB page mapping from the TLB
		 * because there are too many to flush individually.
		 */
		invltlb();
	else {
		/*
		 * Promotion: flush every 4KB page mapping from the TLB,
		 * including any global (PG_G) mappings.
		 */
		invltlb_glob();
	}
}
#ifdef SMP
/*
 * For SMP, these functions have to use the IPI mechanism for coherence.
 *
 * N.B.: Before calling any of the following TLB invalidation functions,
 * the calling processor must ensure that all stores updating a non-
 * kernel page table are globally performed.  Otherwise, another
 * processor could cache an old, pre-update entry without being
 * invalidated.  This can happen one of two ways: (1) The pmap becomes
 * active on another processor after its pm_active field is checked by
 * one of the following functions but before a store updating the page
 * table is globally performed. (2) The pmap becomes active on another
 * processor before its pm_active field is checked but due to
 * speculative loads one of the following functions still reads the
 * pmap as inactive on the other processor.
 *
 * The kernel page table is exempt because its pm_active field is
 * immutable.  The kernel page table is always active on every
 * processor.
 */

/*
 * Interrupt the cpus that are executing in the guest context.
* This will force the vcpu to exit and the cached EPT mappings * will be invalidated by the host before the next vmresume. */ static __inline void pmap_invalidate_ept(pmap_t pmap) { int ipinum; sched_pin(); KASSERT(!CPU_ISSET(curcpu, &pmap->pm_active), ("pmap_invalidate_ept: absurd pm_active")); /* * The TLB mappings associated with a vcpu context are not * flushed each time a different vcpu is chosen to execute. * * This is in contrast with a process's vtop mappings that * are flushed from the TLB on each context switch. * * Therefore we need to do more than just a TLB shootdown on * the active cpus in 'pmap->pm_active'. To do this we keep * track of the number of invalidations performed on this pmap. * * Each vcpu keeps a cache of this counter and compares it * just before a vmresume. If the counter is out-of-date an * invept will be done to flush stale mappings from the TLB. */ atomic_add_acq_long(&pmap->pm_eptgen, 1); /* * Force the vcpu to exit and trap back into the hypervisor. */ ipinum = pmap->pm_flags & PMAP_NESTED_IPIMASK; ipi_selected(pmap->pm_active, ipinum); sched_unpin(); } void pmap_invalidate_page(pmap_t pmap, vm_offset_t va) { cpuset_t *mask; u_int cpuid, i; if (pmap_type_guest(pmap)) { pmap_invalidate_ept(pmap); return; } KASSERT(pmap->pm_type == PT_X86, ("pmap_invalidate_page: invalid type %d", pmap->pm_type)); sched_pin(); if (pmap == kernel_pmap) { invlpg(va); mask = &all_cpus; } else { cpuid = PCPU_GET(cpuid); if (pmap == PCPU_GET(curpmap)) invlpg(va); else if (pmap_pcid_enabled) pmap->pm_pcids[cpuid].pm_gen = 0; if (pmap_pcid_enabled) { CPU_FOREACH(i) { if (cpuid != i) pmap->pm_pcids[i].pm_gen = 0; } } mask = &pmap->pm_active; } smp_masked_invlpg(*mask, va); sched_unpin(); } /* 4k PTEs -- Chosen to exceed the total size of Broadwell L2 TLB */ #define PMAP_INVLPG_THRESHOLD (4 * 1024 * PAGE_SIZE) void pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { cpuset_t *mask; vm_offset_t addr; u_int cpuid, i; if (eva - sva >= PMAP_INVLPG_THRESHOLD) { pmap_invalidate_all(pmap); return; } if (pmap_type_guest(pmap)) { pmap_invalidate_ept(pmap); return; } KASSERT(pmap->pm_type == PT_X86, ("pmap_invalidate_range: invalid type %d", pmap->pm_type)); sched_pin(); cpuid = PCPU_GET(cpuid); if (pmap == kernel_pmap) { for (addr = sva; addr < eva; addr += PAGE_SIZE) invlpg(addr); mask = &all_cpus; } else { if (pmap == PCPU_GET(curpmap)) { for (addr = sva; addr < eva; addr += PAGE_SIZE) invlpg(addr); } else if (pmap_pcid_enabled) { pmap->pm_pcids[cpuid].pm_gen = 0; } if (pmap_pcid_enabled) { CPU_FOREACH(i) { if (cpuid != i) pmap->pm_pcids[i].pm_gen = 0; } } mask = &pmap->pm_active; } smp_masked_invlpg_range(*mask, sva, eva); sched_unpin(); } void pmap_invalidate_all(pmap_t pmap) { cpuset_t *mask; struct invpcid_descr d; u_int cpuid, i; if (pmap_type_guest(pmap)) { pmap_invalidate_ept(pmap); return; } KASSERT(pmap->pm_type == PT_X86, ("pmap_invalidate_all: invalid type %d", pmap->pm_type)); sched_pin(); if (pmap == kernel_pmap) { if (pmap_pcid_enabled && invpcid_works) { bzero(&d, sizeof(d)); invpcid(&d, INVPCID_CTXGLOB); } else { invltlb_glob(); } mask = &all_cpus; } else { cpuid = PCPU_GET(cpuid); if (pmap == PCPU_GET(curpmap)) { if (pmap_pcid_enabled) { if (invpcid_works) { d.pcid = pmap->pm_pcids[cpuid].pm_pcid; d.pad = 0; d.addr = 0; invpcid(&d, INVPCID_CTX); } else { load_cr3(pmap->pm_cr3 | pmap->pm_pcids [PCPU_GET(cpuid)].pm_pcid); } } else { invltlb(); } } else if (pmap_pcid_enabled) { pmap->pm_pcids[cpuid].pm_gen = 0; } if (pmap_pcid_enabled) { CPU_FOREACH(i) { if 
(cpuid != i)
					pmap->pm_pcids[i].pm_gen = 0;
			}
		}
		mask = &pmap->pm_active;
	}
	smp_masked_invltlb(*mask, pmap);
	sched_unpin();
}

void
pmap_invalidate_cache(void)
{

	sched_pin();
	wbinvd();
	smp_cache_flush();
	sched_unpin();
}

struct pde_action {
	cpuset_t invalidate;	/* processors that invalidate their TLB */
	pmap_t pmap;
	vm_offset_t va;
	pd_entry_t *pde;
	pd_entry_t newpde;
	u_int store;		/* processor that updates the PDE */
};

static void
pmap_update_pde_action(void *arg)
{
	struct pde_action *act = arg;

	if (act->store == PCPU_GET(cpuid))
		pmap_update_pde_store(act->pmap, act->pde, act->newpde);
}

static void
pmap_update_pde_teardown(void *arg)
{
	struct pde_action *act = arg;

	if (CPU_ISSET(PCPU_GET(cpuid), &act->invalidate))
		pmap_update_pde_invalidate(act->pmap, act->va, act->newpde);
}

/*
 * Change the page size for the specified virtual address in a way that
 * prevents any possibility of the TLB ever having two entries that map the
 * same virtual address using different page sizes.  This is the recommended
 * workaround for Erratum 383 on AMD Family 10h processors.  It prevents a
 * machine check exception for a TLB state that is improperly diagnosed as a
 * hardware error.
 */
static void
pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde)
{
	struct pde_action act;
	cpuset_t active, other_cpus;
	u_int cpuid;

	sched_pin();
	cpuid = PCPU_GET(cpuid);
	other_cpus = all_cpus;
	CPU_CLR(cpuid, &other_cpus);
	if (pmap == kernel_pmap || pmap_type_guest(pmap))
		active = all_cpus;
	else {
		active = pmap->pm_active;
	}
	if (CPU_OVERLAP(&active, &other_cpus)) {
		act.store = cpuid;
		act.invalidate = active;
		act.va = va;
		act.pmap = pmap;
		act.pde = pde;
		act.newpde = newpde;
		CPU_SET(cpuid, &active);
		smp_rendezvous_cpus(active, smp_no_rendevous_barrier,
		    pmap_update_pde_action, pmap_update_pde_teardown, &act);
	} else {
		pmap_update_pde_store(pmap, pde, newpde);
		if (CPU_ISSET(cpuid, &active))
			pmap_update_pde_invalidate(pmap, va, newpde);
	}
	sched_unpin();
}
#else /* !SMP */
/*
 * Normal, non-SMP, invalidation functions.
 */
void
pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
{

	if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
		pmap->pm_eptgen++;
		return;
	}
	KASSERT(pmap->pm_type == PT_X86,
	    ("pmap_invalidate_page: unknown type %d", pmap->pm_type));

	if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap))
		invlpg(va);
	else if (pmap_pcid_enabled)
		pmap->pm_pcids[0].pm_gen = 0;
}

void
pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
{
	vm_offset_t addr;

	if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
		pmap->pm_eptgen++;
		return;
	}
	KASSERT(pmap->pm_type == PT_X86,
	    ("pmap_invalidate_range: unknown type %d", pmap->pm_type));

	if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap)) {
		for (addr = sva; addr < eva; addr += PAGE_SIZE)
			invlpg(addr);
	} else if (pmap_pcid_enabled) {
		pmap->pm_pcids[0].pm_gen = 0;
	}
}

void
pmap_invalidate_all(pmap_t pmap)
{
	struct invpcid_descr d;

	if (pmap->pm_type == PT_RVI || pmap->pm_type == PT_EPT) {
		pmap->pm_eptgen++;
		return;
	}
	KASSERT(pmap->pm_type == PT_X86,
	    ("pmap_invalidate_all: unknown type %d", pmap->pm_type));

	if (pmap == kernel_pmap) {
		if (pmap_pcid_enabled && invpcid_works) {
			bzero(&d, sizeof(d));
			invpcid(&d, INVPCID_CTXGLOB);
		} else {
			invltlb_glob();
		}
	} else if (pmap == PCPU_GET(curpmap)) {
		if (pmap_pcid_enabled) {
			if (invpcid_works) {
				d.pcid = pmap->pm_pcids[0].pm_pcid;
				d.pad = 0;
				d.addr = 0;
				invpcid(&d, INVPCID_CTX);
			} else {
				load_cr3(pmap->pm_cr3 |
				    pmap->pm_pcids[0].pm_pcid);
			}
		} else {
			invltlb();
		}
	} else if (pmap_pcid_enabled) {
		pmap->pm_pcids[0].pm_gen = 0;
	}
}

PMAP_INLINE void
pmap_invalidate_cache(void)
{

	wbinvd();
}

static void
pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde)
{

	pmap_update_pde_store(pmap, pde, newpde);
	if (pmap == kernel_pmap || pmap == PCPU_GET(curpmap))
		pmap_update_pde_invalidate(pmap, va, newpde);
	else
		pmap->pm_pcids[0].pm_gen = 0;
}
#endif /* !SMP */

#define PMAP_CLFLUSH_THRESHOLD	(2 * 1024 * 1024)

void
pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva, boolean_t force)
{

	if (force) {
		/* Align the start address on a cache line boundary. */
		sva &= ~(vm_offset_t)(cpu_clflush_line_size - 1);
	} else {
		KASSERT((sva & PAGE_MASK) == 0,
		    ("pmap_invalidate_cache_range: sva not page-aligned"));
		KASSERT((eva & PAGE_MASK) == 0,
		    ("pmap_invalidate_cache_range: eva not page-aligned"));
	}

	if ((cpu_feature & CPUID_SS) != 0 && !force)
		; /* If "Self Snoop" is supported and allowed, do nothing. */
	else if ((cpu_stdext_feature & CPUID_STDEXT_CLFLUSHOPT) != 0 &&
	    eva - sva < PMAP_CLFLUSH_THRESHOLD) {
		/*
		 * XXX: Some CPUs fault, hang, or trash the local APIC
		 * registers if we use CLFLUSH on the local APIC
		 * range.  The local APIC is always uncached, so we
		 * don't need to flush for that range anyway.
		 */
		if (pmap_kextract(sva) == lapic_paddr)
			return;
		/*
		 * Otherwise, do per-cache line flush.  Use the mfence
		 * instruction to ensure that previous stores are
		 * included in the write-back.  The processor
		 * propagates flush to other processors in the cache
		 * coherence domain.
		 */
		mfence();
		for (; sva < eva; sva += cpu_clflush_line_size)
			clflushopt(sva);
		mfence();
	} else if ((cpu_feature & CPUID_CLFSH) != 0 &&
	    eva - sva < PMAP_CLFLUSH_THRESHOLD) {
		if (pmap_kextract(sva) == lapic_paddr)
			return;
		/*
		 * Writes are ordered by CLFLUSH on Intel CPUs.
		 */
		if (cpu_vendor_id != CPU_VENDOR_INTEL)
			mfence();
		for (; sva < eva; sva += cpu_clflush_line_size)
			clflush(sva);
		if (cpu_vendor_id != CPU_VENDOR_INTEL)
			mfence();
	} else {
		/*
		 * No targeted cache flush methods are supported by CPU,
		 * or the supplied range is bigger than 2MB.
		 * Globally invalidate cache.
		 */
		pmap_invalidate_cache();
	}
}

/*
 * Remove the specified set of pages from the data and instruction caches.
 *
 * In contrast to pmap_invalidate_cache_range(), this function does not
 * rely on the CPU's self-snoop feature, because it is intended for use
 * when moving pages into a different cache domain.
 */
void
pmap_invalidate_cache_pages(vm_page_t *pages, int count)
{
	vm_offset_t daddr, eva;
	int i;
	bool useclflushopt;

	useclflushopt = (cpu_stdext_feature & CPUID_STDEXT_CLFLUSHOPT) != 0;
	if (count >= PMAP_CLFLUSH_THRESHOLD / PAGE_SIZE ||
	    ((cpu_feature & CPUID_CLFSH) == 0 && !useclflushopt))
		pmap_invalidate_cache();
	else {
		if (useclflushopt || cpu_vendor_id != CPU_VENDOR_INTEL)
			mfence();
		for (i = 0; i < count; i++) {
			daddr = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(pages[i]));
			eva = daddr + PAGE_SIZE;
			for (; daddr < eva; daddr += cpu_clflush_line_size) {
				if (useclflushopt)
					clflushopt(daddr);
				else
					clflush(daddr);
			}
		}
		if (useclflushopt || cpu_vendor_id != CPU_VENDOR_INTEL)
			mfence();
	}
}

/*
 * Routine:	pmap_extract
 * Function:
 *	Extract the physical page address associated
 *	with the given map/virtual_address pair.
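 *
 *	(Explanatory sketch, added for clarity; it only restates the three
 *	cases the walk below distinguishes:)
 *
 *	1GB PDPE mapping:  pa = (*pdpe & PG_PS_FRAME) | (va & PDPMASK)
 *	2MB PDE mapping:   pa = (*pde & PG_PS_FRAME) | (va & PDRMASK)
 *	4KB PTE mapping:   pa = (*pte & PG_FRAME) | (va & PAGE_MASK)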
*/ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { pdp_entry_t *pdpe; pd_entry_t *pde; pt_entry_t *pte, PG_V; vm_paddr_t pa; pa = 0; PG_V = pmap_valid_bit(pmap); PMAP_LOCK(pmap); pdpe = pmap_pdpe(pmap, va); if (pdpe != NULL && (*pdpe & PG_V) != 0) { if ((*pdpe & PG_PS) != 0) pa = (*pdpe & PG_PS_FRAME) | (va & PDPMASK); else { pde = pmap_pdpe_to_pde(pdpe, va); if ((*pde & PG_V) != 0) { if ((*pde & PG_PS) != 0) { pa = (*pde & PG_PS_FRAME) | (va & PDRMASK); } else { pte = pmap_pde_to_pte(pde, va); pa = (*pte & PG_FRAME) | (va & PAGE_MASK); } } } } PMAP_UNLOCK(pmap); return (pa); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection. */ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pd_entry_t pde, *pdep; pt_entry_t pte, PG_RW, PG_V; vm_paddr_t pa; vm_page_t m; pa = 0; m = NULL; PG_RW = pmap_rw_bit(pmap); PG_V = pmap_valid_bit(pmap); PMAP_LOCK(pmap); retry: pdep = pmap_pde(pmap, va); if (pdep != NULL && (pde = *pdep)) { if (pde & PG_PS) { if ((pde & PG_RW) || (prot & VM_PROT_WRITE) == 0) { if (vm_page_pa_tryrelock(pmap, (pde & PG_PS_FRAME) | (va & PDRMASK), &pa)) goto retry; m = PHYS_TO_VM_PAGE((pde & PG_PS_FRAME) | (va & PDRMASK)); vm_page_hold(m); } } else { pte = *pmap_pde_to_pte(pdep, va); if ((pte & PG_V) && ((pte & PG_RW) || (prot & VM_PROT_WRITE) == 0)) { if (vm_page_pa_tryrelock(pmap, pte & PG_FRAME, &pa)) goto retry; m = PHYS_TO_VM_PAGE(pte & PG_FRAME); vm_page_hold(m); } } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } vm_paddr_t pmap_kextract(vm_offset_t va) { pd_entry_t pde; vm_paddr_t pa; if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) { pa = DMAP_TO_PHYS(va); } else { pde = *vtopde(va); if (pde & PG_PS) { pa = (pde & PG_PS_FRAME) | (va & PDRMASK); } else { /* * Beware of a concurrent promotion that changes the * PDE at this point! For example, vtopte() must not * be used to access the PTE because it would use the * new PDE. It is, however, safe to use the old PDE * because the page table page is preserved by the * promotion. */ pa = *pmap_pde_to_pte(&pde, va); pa = (pa & PG_FRAME) | (va & PAGE_MASK); } } return (pa); } /*************************************************** * Low level mapping routines..... ***************************************************/ /* * Add a wired page to the kva. * Note: not SMP coherent. */ PMAP_INLINE void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pt_entry_t *pte; pte = vtopte(va); pte_store(pte, pa | X86_PG_RW | X86_PG_V | X86_PG_G); } static __inline void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int mode) { pt_entry_t *pte; int cache_bits; pte = vtopte(va); cache_bits = pmap_cache_bits(kernel_pmap, mode, 0); pte_store(pte, pa | X86_PG_RW | X86_PG_V | X86_PG_G | cache_bits); } /* * Remove a page from the kernel pagetables. * Note: not SMP coherent. */ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt_entry_t *pte; pte = vtopte(va); pte_clear(pte); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. 
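 *
 * (Illustrative note, added: amd64 takes the direct-map shortcut below,
 * so a caller written as, with hypothetical names,
 *
 *	va = some_suggested_va;
 *	va_mapped = pmap_map(&va, pa_start, pa_end, VM_PROT_READ);
 *
 * receives PHYS_TO_DMAP(pa_start) while 'va' itself is left unchanged.)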
*/ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { return PHYS_TO_DMAP(start); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) { pt_entry_t *endpte, oldpte, pa, *pte; vm_page_t m; int cache_bits; oldpte = 0; pte = vtopte(sva); endpte = pte + count; while (pte < endpte) { m = *ma++; cache_bits = pmap_cache_bits(kernel_pmap, m->md.pat_mode, 0); pa = VM_PAGE_TO_PHYS(m) | cache_bits; if ((*pte & (PG_FRAME | X86_PG_PTE_CACHE)) != pa) { oldpte |= *pte; pte_store(pte, pa | X86_PG_G | X86_PG_RW | X86_PG_V); } pte++; } if (__predict_false((oldpte & X86_PG_V) != 0)) pmap_invalidate_range(kernel_pmap, sva, sva + count * PAGE_SIZE); } /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qremove(vm_offset_t sva, int count) { vm_offset_t va; va = sva; while (count-- > 0) { KASSERT(va >= VM_MIN_KERNEL_ADDRESS, ("usermode va %lx", va)); pmap_kremove(va); va += PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /*************************************************** * Page table page management routines..... ***************************************************/ static __inline void pmap_free_zero_pages(struct spglist *free) { vm_page_t m; while ((m = SLIST_FIRST(free)) != NULL) { SLIST_REMOVE_HEAD(free, plinks.s.ss); /* Preserve the page's PG_ZERO setting. */ vm_page_free_toq(m); } } /* * Schedule the specified unused page table page to be freed. Specifically, * add the page to the specified list of pages that will be released to the * physical memory manager after the TLB has been updated. */ static __inline void pmap_add_delayed_free_list(vm_page_t m, struct spglist *free, boolean_t set_PG_ZERO) { if (set_PG_ZERO) m->flags |= PG_ZERO; else m->flags &= ~PG_ZERO; SLIST_INSERT_HEAD(free, m, plinks.s.ss); } /* * Inserts the specified page table page into the specified pmap's collection * of idle page table pages. Each of a pmap's page table pages is responsible * for mapping a distinct range of virtual addresses. The pmap's collection is * ordered by this virtual address range. */ static __inline int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); return (vm_radix_insert(&pmap->pm_root, mpte)); } /* * Looks for a page table page mapping the specified virtual address in the * specified pmap's collection of idle page table pages. Returns NULL if there * is no page table page corresponding to the specified virtual address. */ static __inline vm_page_t pmap_lookup_pt_page(pmap_t pmap, vm_offset_t va) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); return (vm_radix_lookup(&pmap->pm_root, pmap_pde_pindex(va))); } /* * Removes the specified page table page from the specified pmap's collection * of idle page table pages. The specified page table page must be a member of * the pmap's collection. */ static __inline void pmap_remove_pt_page(pmap_t pmap, vm_page_t mpte) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); vm_radix_remove(&pmap->pm_root, mpte->pindex); } /* * Decrements a page table page's wire count, which is used to record the * number of valid page table entries within the page. 
If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. */ static inline boolean_t pmap_unwire_ptp(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { --m->wire_count; if (m->wire_count == 0) { _pmap_unwire_ptp(pmap, va, m, free); return (TRUE); } else return (FALSE); } static void _pmap_unwire_ptp(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * unmap the page table page */ if (m->pindex >= (NUPDE + NUPDPE)) { /* PDP page */ pml4_entry_t *pml4; pml4 = pmap_pml4e(pmap, va); *pml4 = 0; } else if (m->pindex >= NUPDE) { /* PD page */ pdp_entry_t *pdp; pdp = pmap_pdpe(pmap, va); *pdp = 0; } else { /* PTE page */ pd_entry_t *pd; pd = pmap_pde(pmap, va); *pd = 0; } pmap_resident_count_dec(pmap, 1); if (m->pindex < NUPDE) { /* We just released a PT, unhold the matching PD */ vm_page_t pdpg; pdpg = PHYS_TO_VM_PAGE(*pmap_pdpe(pmap, va) & PG_FRAME); pmap_unwire_ptp(pmap, va, pdpg, free); } if (m->pindex >= NUPDE && m->pindex < (NUPDE + NUPDPE)) { /* We just released a PD, unhold the matching PDP */ vm_page_t pdppg; pdppg = PHYS_TO_VM_PAGE(*pmap_pml4e(pmap, va) & PG_FRAME); pmap_unwire_ptp(pmap, va, pdppg, free); } /* * This is a release store so that the ordinary store unmapping * the page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ pmap_add_delayed_free_list(m, free, TRUE); } /* * After removing a page table entry, this routine is used to * conditionally free the page, and manage the hold/wire counts. */ static int pmap_unuse_pt(pmap_t pmap, vm_offset_t va, pd_entry_t ptepde, struct spglist *free) { vm_page_t mpte; if (va >= VM_MAXUSER_ADDRESS) return (0); KASSERT(ptepde != 0, ("pmap_unuse_pt: ptepde != 0")); mpte = PHYS_TO_VM_PAGE(ptepde & PG_FRAME); return (pmap_unwire_ptp(pmap, va, mpte, free)); } void pmap_pinit0(pmap_t pmap) { int i; PMAP_LOCK_INIT(pmap); pmap->pm_pml4 = (pml4_entry_t *)PHYS_TO_DMAP(KPML4phys); pmap->pm_cr3 = KPML4phys; pmap->pm_root.rt_root = 0; CPU_ZERO(&pmap->pm_active); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); pmap->pm_flags = pmap_flags; CPU_FOREACH(i) { pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE; pmap->pm_pcids[i].pm_gen = 0; } PCPU_SET(curpmap, kernel_pmap); pmap_activate(curthread); CPU_FILL(&kernel_pmap->pm_active); } /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ int pmap_pinit_type(pmap_t pmap, enum pmap_type pm_type, int flags) { vm_page_t pml4pg; vm_paddr_t pml4phys; int i; /* * allocate the page directory page */ while ((pml4pg = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) VM_WAIT; pml4phys = VM_PAGE_TO_PHYS(pml4pg); pmap->pm_pml4 = (pml4_entry_t *)PHYS_TO_DMAP(pml4phys); CPU_FOREACH(i) { pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE; pmap->pm_pcids[i].pm_gen = 0; } pmap->pm_cr3 = ~0; /* initialize to an invalid value */ if ((pml4pg->flags & PG_ZERO) == 0) pagezero(pmap->pm_pml4); /* * Do not install the host kernel mappings in the nested page * tables. These mappings are meaningless in the guest physical * address space. */ if ((pmap->pm_type = pm_type) == PT_X86) { pmap->pm_cr3 = pml4phys; /* Wire in kernel global address entries. 
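		 * (Layout sketch, added for clarity: NKPML4E slots starting
		 * at KPML4BASE map the kernel, ndmpdpphys slots starting at
		 * DMPML4I map the direct map, and the single PML4PML4I slot
		 * installed below points the PML4 at itself, providing the
		 * recursive page-table mapping.)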
		 */
		for (i = 0; i < NKPML4E; i++) {
			pmap->pm_pml4[KPML4BASE + i] = (KPDPphys + ptoa(i)) |
			    X86_PG_RW | X86_PG_V | PG_U;
		}
		for (i = 0; i < ndmpdpphys; i++) {
			pmap->pm_pml4[DMPML4I + i] = (DMPDPphys + ptoa(i)) |
			    X86_PG_RW | X86_PG_V | PG_U;
		}

		/* install self-referential address mapping entry(s) */
		pmap->pm_pml4[PML4PML4I] = VM_PAGE_TO_PHYS(pml4pg) |
		    X86_PG_V | X86_PG_RW | X86_PG_A | X86_PG_M;
	}

	pmap->pm_root.rt_root = 0;
	CPU_ZERO(&pmap->pm_active);
	TAILQ_INIT(&pmap->pm_pvchunk);
	bzero(&pmap->pm_stats, sizeof pmap->pm_stats);
	pmap->pm_flags = flags;
	pmap->pm_eptgen = 0;

	return (1);
}

int
pmap_pinit(pmap_t pmap)
{

	return (pmap_pinit_type(pmap, PT_X86, pmap_flags));
}

/*
 * This routine is called if the desired page table page does not exist.
 *
 * If page table page allocation fails, this routine may sleep before
 * returning NULL.  It sleeps only if a lock pointer was given.
 *
 * Note: If a page allocation fails at page table level two or three,
 * one or two pages may be held during the wait, only to be released
 * afterwards.  This conservative approach is easily argued to avoid
 * race conditions.
 */
static vm_page_t
_pmap_allocpte(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp)
{
	vm_page_t m, pdppg, pdpg;
	pt_entry_t PG_A, PG_M, PG_RW, PG_V;

	PMAP_LOCK_ASSERT(pmap, MA_OWNED);

	PG_A = pmap_accessed_bit(pmap);
	PG_M = pmap_modified_bit(pmap);
	PG_V = pmap_valid_bit(pmap);
	PG_RW = pmap_rw_bit(pmap);

	/*
	 * Allocate a page table page.
	 */
	if ((m = vm_page_alloc(NULL, ptepindex, VM_ALLOC_NOOBJ |
	    VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) {
		if (lockp != NULL) {
			RELEASE_PV_LIST_LOCK(lockp);
			PMAP_UNLOCK(pmap);
			PMAP_ASSERT_NOT_IN_DI();
			VM_WAIT;
			PMAP_LOCK(pmap);
		}

		/*
		 * Indicate the need to retry.  While waiting, the page table
		 * page may have been allocated.
		 */
		return (NULL);
	}
	if ((m->flags & PG_ZERO) == 0)
		pmap_zero_page(m);

	/*
	 * Map the pagetable page into the process address space, if
	 * it isn't already there.
	 */

	if (ptepindex >= (NUPDE + NUPDPE)) {
		pml4_entry_t *pml4;
		vm_pindex_t pml4index;

		/* Wire up a new PDPE page */
		pml4index = ptepindex - (NUPDE + NUPDPE);
		pml4 = &pmap->pm_pml4[pml4index];
		*pml4 = VM_PAGE_TO_PHYS(m) | PG_U | PG_RW | PG_V | PG_A | PG_M;

	} else if (ptepindex >= NUPDE) {
		vm_pindex_t pml4index;
		vm_pindex_t pdpindex;
		pml4_entry_t *pml4;
		pdp_entry_t *pdp;

		/* Wire up a new PDE page */
		pdpindex = ptepindex - NUPDE;
		pml4index = pdpindex >> NPML4EPGSHIFT;

		pml4 = &pmap->pm_pml4[pml4index];
		if ((*pml4 & PG_V) == 0) {
			/* Have to allocate a new pdp, recurse */
			if (_pmap_allocpte(pmap, NUPDE + NUPDPE + pml4index,
			    lockp) == NULL) {
				--m->wire_count;
				atomic_subtract_int(&vm_cnt.v_wire_count, 1);
				vm_page_free_zero(m);
				return (NULL);
			}
		} else {
			/* Add reference to pdp page */
			pdppg = PHYS_TO_VM_PAGE(*pml4 & PG_FRAME);
			pdppg->wire_count++;
		}
		pdp = (pdp_entry_t *)PHYS_TO_DMAP(*pml4 & PG_FRAME);

		/* Now find the pdp page */
		pdp = &pdp[pdpindex & ((1ul << NPDPEPGSHIFT) - 1)];
		*pdp = VM_PAGE_TO_PHYS(m) | PG_U | PG_RW | PG_V | PG_A | PG_M;

	} else {
		vm_pindex_t pml4index;
		vm_pindex_t pdpindex;
		pml4_entry_t *pml4;
		pdp_entry_t *pdp;
		pd_entry_t *pd;

		/* Wire up a new PTE page */
		pdpindex = ptepindex >> NPDPEPGSHIFT;
		pml4index = pdpindex >> NPML4EPGSHIFT;

		/* First, find the pdp and check that it's valid.
*/ pml4 = &pmap->pm_pml4[pml4index]; if ((*pml4 & PG_V) == 0) { /* Have to allocate a new pd, recurse */ if (_pmap_allocpte(pmap, NUPDE + pdpindex, lockp) == NULL) { --m->wire_count; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } pdp = (pdp_entry_t *)PHYS_TO_DMAP(*pml4 & PG_FRAME); pdp = &pdp[pdpindex & ((1ul << NPDPEPGSHIFT) - 1)]; } else { pdp = (pdp_entry_t *)PHYS_TO_DMAP(*pml4 & PG_FRAME); pdp = &pdp[pdpindex & ((1ul << NPDPEPGSHIFT) - 1)]; if ((*pdp & PG_V) == 0) { /* Have to allocate a new pd, recurse */ if (_pmap_allocpte(pmap, NUPDE + pdpindex, lockp) == NULL) { --m->wire_count; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } } else { /* Add reference to the pd page */ pdpg = PHYS_TO_VM_PAGE(*pdp & PG_FRAME); pdpg->wire_count++; } } pd = (pd_entry_t *)PHYS_TO_DMAP(*pdp & PG_FRAME); /* Now we know where the page directory page is */ pd = &pd[ptepindex & ((1ul << NPDEPGSHIFT) - 1)]; *pd = VM_PAGE_TO_PHYS(m) | PG_U | PG_RW | PG_V | PG_A | PG_M; } pmap_resident_count_inc(pmap, 1); return (m); } static vm_page_t pmap_allocpde(pmap_t pmap, vm_offset_t va, struct rwlock **lockp) { vm_pindex_t pdpindex, ptepindex; pdp_entry_t *pdpe, PG_V; vm_page_t pdpg; PG_V = pmap_valid_bit(pmap); retry: pdpe = pmap_pdpe(pmap, va); if (pdpe != NULL && (*pdpe & PG_V) != 0) { /* Add a reference to the pd page. */ pdpg = PHYS_TO_VM_PAGE(*pdpe & PG_FRAME); pdpg->wire_count++; } else { /* Allocate a pd page. */ ptepindex = pmap_pde_pindex(va); pdpindex = ptepindex >> NPDPEPGSHIFT; pdpg = _pmap_allocpte(pmap, NUPDE + pdpindex, lockp); if (pdpg == NULL && lockp != NULL) goto retry; } return (pdpg); } static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va, struct rwlock **lockp) { vm_pindex_t ptepindex; pd_entry_t *pd, PG_V; vm_page_t m; PG_V = pmap_valid_bit(pmap); /* * Calculate pagetable page index */ ptepindex = pmap_pde_pindex(va); retry: /* * Get the page directory entry */ pd = pmap_pde(pmap, va); /* * This supports switching from a 2MB page to a * normal 4K page. */ if (pd != NULL && (*pd & (PG_PS | PG_V)) == (PG_PS | PG_V)) { if (!pmap_demote_pde_locked(pmap, pd, va, lockp)) { /* * Invalidation of the 2MB page mapping may have caused * the deallocation of the underlying PD page. */ pd = NULL; } } /* * If the page table page is mapped, we just increment the * hold count, and activate it. */ if (pd != NULL && (*pd & PG_V) != 0) { m = PHYS_TO_VM_PAGE(*pd & PG_FRAME); m->wire_count++; } else { /* * Here if the pte page isn't mapped, or if it has been * deallocated. */ m = _pmap_allocpte(pmap, ptepindex, lockp); if (m == NULL && lockp != NULL) goto retry; } return (m); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. 
*/ void pmap_release(pmap_t pmap) { vm_page_t m; int i; KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); KASSERT(vm_radix_is_empty(&pmap->pm_root), ("pmap_release: pmap has reserved page table page(s)")); KASSERT(CPU_EMPTY(&pmap->pm_active), ("releasing active pmap %p", pmap)); m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pmap->pm_pml4)); for (i = 0; i < NKPML4E; i++) /* KVA */ pmap->pm_pml4[KPML4BASE + i] = 0; for (i = 0; i < ndmpdpphys; i++)/* Direct Map */ pmap->pm_pml4[DMPML4I + i] = 0; pmap->pm_pml4[PML4PML4I] = 0; /* Recursive Mapping */ m->wire_count--; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); } static int kvm_size(SYSCTL_HANDLER_ARGS) { unsigned long ksize = VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS; return sysctl_handle_long(oidp, &ksize, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_size, "LU", "Size of KVM"); static int kvm_free(SYSCTL_HANDLER_ARGS) { unsigned long kfree = VM_MAX_KERNEL_ADDRESS - kernel_vm_end; return sysctl_handle_long(oidp, &kfree, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_free, "LU", "Amount of KVM free"); /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { vm_paddr_t paddr; vm_page_t nkpg; pd_entry_t *pde, newpdir; pdp_entry_t *pdpe; mtx_assert(&kernel_map->system_mtx, MA_OWNED); /* * Return if "addr" is within the range of kernel page table pages * that were preallocated during pmap bootstrap. Moreover, leave * "kernel_vm_end" and the kernel page table as they were. * * The correctness of this action is based on the following * argument: vm_map_insert() allocates contiguous ranges of the * kernel virtual address space. It calls this function if a range * ends after "kernel_vm_end". If the kernel is mapped between * "kernel_vm_end" and "addr", then the range cannot begin at * "kernel_vm_end". In fact, its beginning address cannot be less * than the kernel. Thus, there is no immediate need to allocate * any new kernel page table pages between "kernel_vm_end" and * "KERNBASE". 
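 *
 * (Worked example, added with made-up numbers: NBPDR is 2MB, so a
 * request ending one page past "kernel_vm_end" is first rounded up by
 * roundup2(addr, NBPDR) to the next 2MB boundary; the loop below then
 * grows the kernel page tables one full page directory entry at a
 * time.)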
*/ if (KERNBASE < addr && addr <= KERNBASE + nkpt * NBPDR) return; addr = roundup2(addr, NBPDR); if (addr - 1 >= kernel_map->max_offset) addr = kernel_map->max_offset; while (kernel_vm_end < addr) { pdpe = pmap_pdpe(kernel_pmap, kernel_vm_end); if ((*pdpe & X86_PG_V) == 0) { /* We need a new PDP entry */ nkpg = vm_page_alloc(NULL, kernel_vm_end >> PDPSHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); *pdpe = (pdp_entry_t)(paddr | X86_PG_V | X86_PG_RW | X86_PG_A | X86_PG_M); continue; /* try again */ } pde = pmap_pdpe_to_pde(pdpe, kernel_vm_end); if ((*pde & X86_PG_V) != 0) { kernel_vm_end = (kernel_vm_end + NBPDR) & ~PDRMASK; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } continue; } nkpg = vm_page_alloc(NULL, pmap_pde_pindex(kernel_vm_end), VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); newpdir = paddr | X86_PG_V | X86_PG_RW | X86_PG_A | X86_PG_M; pde_store(pde, newpdir); kernel_vm_end = (kernel_vm_end + NBPDR) & ~PDRMASK; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } } } /*************************************************** * page management routines. ***************************************************/ CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 3); CTASSERT(_NPCPV == 168); static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) #define PC_FREE0 0xfffffffffffffffful #define PC_FREE1 0xfffffffffffffffful #define PC_FREE2 0x000000fffffffffful static const uint64_t pc_freemask[_NPCM] = { PC_FREE0, PC_FREE1, PC_FREE2 }; #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs, pv_entry_count; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. * * Returns NULL if PV entries were reclaimed from the specified pmap. 
* * We do not, however, unmap 2mpages because subsequent accesses will * allocate per-page pv entries until repromotion occurs, thereby * exacerbating the shortage of free pv entries. */ static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp) { struct pch new_tail; struct pv_chunk *pc; struct md_page *pvh; pd_entry_t *pde; pmap_t pmap; pt_entry_t *pte, tpte; pt_entry_t PG_G, PG_A, PG_M, PG_RW; pv_entry_t pv; vm_offset_t va; vm_page_t m, m_pc; struct spglist free; uint64_t inuse; int bit, field, freed; PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); KASSERT(lockp != NULL, ("reclaim_pv_chunk: lockp is NULL")); pmap = NULL; m_pc = NULL; PG_G = PG_A = PG_M = PG_RW = 0; SLIST_INIT(&free); TAILQ_INIT(&new_tail); pmap_delayed_invl_started(); mtx_lock(&pv_chunks_mutex); while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL && SLIST_EMPTY(&free)) { TAILQ_REMOVE(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); if (pmap != pc->pc_pmap) { if (pmap != NULL) { pmap_invalidate_all(pmap); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap_delayed_invl_finished(); pmap_delayed_invl_started(); pmap = pc->pc_pmap; /* Avoid deadlock and lock recursion. */ if (pmap > locked_pmap) { RELEASE_PV_LIST_LOCK(lockp); PMAP_LOCK(pmap); } else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { pmap = NULL; TAILQ_INSERT_TAIL(&new_tail, pc, pc_lru); mtx_lock(&pv_chunks_mutex); continue; } PG_G = pmap_global_bit(pmap); PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); } /* * Destroy every non-wired, 4 KB page mapping in the chunk. */ freed = 0; for (field = 0; field < _NPCM; field++) { for (inuse = ~pc->pc_map[field] & pc_freemask[field]; inuse != 0; inuse &= ~(1UL << bit)) { bit = bsfq(inuse); pv = &pc->pc_pventry[field * 64 + bit]; va = pv->pv_va; pde = pmap_pde(pmap, va); if ((*pde & PG_PS) != 0) continue; pte = pmap_pde_to_pte(pde, va); if ((*pte & PG_W) != 0) continue; tpte = pte_load_clear(pte); if ((tpte & PG_G) != 0) pmap_invalidate_page(pmap, va); m = PHYS_TO_VM_PAGE(tpte & PG_FRAME); if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if ((tpte & PG_A) != 0) vm_page_aflag_set(m, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) { vm_page_aflag_clear(m, PGA_WRITEABLE); } } pmap_delayed_invl_page(m); pc->pc_map[field] |= 1UL << bit; pmap_unuse_pt(pmap, va, *pde, &free); freed++; } } if (freed == 0) { TAILQ_INSERT_TAIL(&new_tail, pc, pc_lru); mtx_lock(&pv_chunks_mutex); continue; } /* Every freed mapping is for a 4 KB page. */ pmap_resident_count_dec(pmap, freed); PV_STAT(atomic_add_long(&pv_entry_frees, freed)); PV_STAT(atomic_add_int(&pv_entry_spare, freed)); PV_STAT(atomic_subtract_long(&pv_entry_count, freed)); TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); if (pc->pc_map[0] == PC_FREE0 && pc->pc_map[1] == PC_FREE1 && pc->pc_map[2] == PC_FREE2) { PV_STAT(atomic_subtract_int(&pv_entry_spare, _NPCPV)); PV_STAT(atomic_subtract_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_frees, 1)); /* Entire chunk is free; return it. */ m_pc = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pc)); dump_drop_page(m_pc->phys_addr); mtx_lock(&pv_chunks_mutex); break; } TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&new_tail, pc, pc_lru); mtx_lock(&pv_chunks_mutex); /* One freed pv entry in locked_pmap is sufficient. 
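		 * (Clarifying note, added: the caller only needs room for a
		 * single pv entry, so once a chunk belonging to locked_pmap
		 * has a free slot there is no reason to keep scanning other
		 * pmaps.)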
*/ if (pmap == locked_pmap) break; } TAILQ_CONCAT(&pv_chunks, &new_tail, pc_lru); mtx_unlock(&pv_chunks_mutex); if (pmap != NULL) { pmap_invalidate_all(pmap); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap_delayed_invl_finished(); if (m_pc == NULL && !SLIST_EMPTY(&free)) { m_pc = SLIST_FIRST(&free); SLIST_REMOVE_HEAD(&free, plinks.s.ss); /* Recycle a freed page table page. */ m_pc->wire_count = 1; atomic_add_int(&vm_cnt.v_wire_count, 1); } pmap_free_zero_pages(&free); return (m_pc); } /* * free the pv_entry back to the free list */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_frees, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, 1)); PV_STAT(atomic_subtract_long(&pv_entry_count, 1)); pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 64; bit = idx % 64; pc->pc_map[field] |= 1ul << bit; if (pc->pc_map[0] != PC_FREE0 || pc->pc_map[1] != PC_FREE1 || pc->pc_map[2] != PC_FREE2) { /* 98% of the time, pc is already at the head of the list. */ if (__predict_false(pc != TAILQ_FIRST(&pmap->pm_pvchunk))) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; mtx_lock(&pv_chunks_mutex); TAILQ_REMOVE(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); PV_STAT(atomic_subtract_int(&pv_entry_spare, _NPCPV)); PV_STAT(atomic_subtract_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_frees, 1)); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pc)); dump_drop_page(m->phys_addr); vm_page_unwire(m, PQ_NONE); vm_page_free(m); } /* * Returns a new PV entry, allocating a new PV chunk from the system when * needed. If this PV chunk allocation fails and a PV list lock pointer was * given, a PV chunk is reclaimed from an arbitrary pmap. Otherwise, NULL is * returned. * * The given PV list lock may be released. 
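 *
 * (Illustrative sketch of the chunk bitmap, added for clarity: each
 * chunk tracks _NPCPV = 168 entries in three 64-bit words, 64 + 64 +
 * 40 set bits, which is why PC_FREE2 above is 0x000000fffffffffful.
 * A set bit marks a free slot, so a free index found by bsfq() maps
 * back to its entry as
 *
 *	pv = &pc->pc_pventry[field * 64 + bit];
 *
 * exactly as the scan below does.)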
*/ static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp) { int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_allocs, 1)); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = bsfq(pc->pc_map[field]); break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 64 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ if (pc->pc_map[0] == 0 && pc->pc_map[1] == 0 && pc->pc_map[2] == 0) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_subtract_int(&pv_entry_spare, 1)); return (pv); } } /* No free items, allocate another chunk */ m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) { if (lockp == NULL) { PV_STAT(pc_chunk_tryfail++); return (NULL); } m = reclaim_pv_chunk(pmap, lockp); if (m == NULL) goto retry; } PV_STAT(atomic_add_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_allocs, 1)); dump_add_page(m->phys_addr); pc = (void *)PHYS_TO_DMAP(m->phys_addr); pc->pc_pmap = pmap; pc->pc_map[0] = PC_FREE0 & ~1ul; /* preallocated bit 0 */ pc->pc_map[1] = PC_FREE1; pc->pc_map[2] = PC_FREE2; mtx_lock(&pv_chunks_mutex); TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, _NPCPV - 1)); return (pv); } /* * Returns the number of one bits within the given PV chunk map. * * The erratas for Intel processors state that "POPCNT Instruction May * Take Longer to Execute Than Expected". It is believed that the * issue is the spurious dependency on the destination register. * Provide a hint to the register rename logic that the destination * value is overwritten, by clearing it, as suggested in the * optimization manual. It should be cheap for unaffected processors * as well. * * Reference numbers for erratas are * 4th Gen Core: HSD146 * 5th Gen Core: BDM85 * 6th Gen Core: SKL029 */ static int popcnt_pc_map_pq(uint64_t *map) { u_long result, tmp; __asm __volatile("xorl %k0,%k0;popcntq %2,%0;" "xorl %k1,%k1;popcntq %3,%1;addl %k1,%k0;" "xorl %k1,%k1;popcntq %4,%1;addl %k1,%k0" : "=&r" (result), "=&r" (tmp) : "m" (map[0]), "m" (map[1]), "m" (map[2])); return (result); } /* * Ensure that the number of spare PV entries in the specified pmap meets or * exceeds the given count, "needed". * * The given PV list lock may be released. */ static void reserve_pv_entries(pmap_t pmap, int needed, struct rwlock **lockp) { struct pch new_tail; struct pv_chunk *pc; int avail, free; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT(lockp != NULL, ("reserve_pv_entries: lockp is NULL")); /* * Newly allocated PV chunks must be stored in a private list until * the required number of PV chunks have been allocated. Otherwise, * reclaim_pv_chunk() could recycle one of these chunks. In * contrast, these chunks must be added to the pmap upon allocation. 
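 *
 * (Worked example, added: a 2MB demotion reserves NPTEPG - 1 = 511
 * spare pv entries, and each chunk supplies at most _NPCPV = 168, so
 * up to four new chunks may be allocated here when the pmap's
 * existing chunks are full.)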
*/ TAILQ_INIT(&new_tail); retry: avail = 0; TAILQ_FOREACH(pc, &pmap->pm_pvchunk, pc_list) { #ifndef __POPCNT__ if ((cpu_feature2 & CPUID2_POPCNT) == 0) bit_count((bitstr_t *)pc->pc_map, 0, sizeof(pc->pc_map) * NBBY, &free); else #endif free = popcnt_pc_map_pq(pc->pc_map); if (free == 0) break; avail += free; if (avail >= needed) break; } for (; avail < needed; avail += _NPCPV) { m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) { m = reclaim_pv_chunk(pmap, lockp); if (m == NULL) goto retry; } PV_STAT(atomic_add_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_allocs, 1)); dump_add_page(m->phys_addr); pc = (void *)PHYS_TO_DMAP(m->phys_addr); pc->pc_pmap = pmap; pc->pc_map[0] = PC_FREE0; pc->pc_map[1] = PC_FREE1; pc->pc_map[2] = PC_FREE2; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&new_tail, pc, pc_lru); PV_STAT(atomic_add_int(&pv_entry_spare, _NPCPV)); } if (!TAILQ_EMPTY(&new_tail)) { mtx_lock(&pv_chunks_mutex); TAILQ_CONCAT(&pv_chunks, &new_tail, pc_lru); mtx_unlock(&pv_chunks_mutex); } } /* * First find and then remove the pv entry for the specified pmap and virtual * address from the specified pv list. Returns the pv entry if found and NULL * otherwise. This operation can be performed on pv lists for either 4KB or * 2MB page mappings. */ static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; break; } } return (pv); } /* * After demotion from a 2MB page mapping to 512 4KB page mappings, * destroy the pv entry for the 2MB page mapping and reinstantiate the pv * entries for each of the 4KB page mappings. */ static void pmap_pv_demote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp) { struct md_page *pvh; struct pv_chunk *pc; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; int bit, field; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((pa & PDRMASK) == 0, ("pmap_pv_demote_pde: pa is not 2mpage aligned")); CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa); /* * Transfer the 2mpage's pv entry for this mapping to the first * page's pv list. Once this transfer begins, the pv list lock * must not be released until the last pv entry is reinstantiated. */ pvh = pa_to_pvh(pa); va = trunc_2mpage(va); pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pv_demote_pde: pv not found")); m = PHYS_TO_VM_PAGE(pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; /* Instantiate the remaining NPTEPG - 1 pv entries. 
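	 * (Added note: NPTEPG = 512 on amd64, so the loop below creates the
	 * 511 remaining 4KB pv entries, stopping at va_last = va + NBPDR -
	 * PAGE_SIZE, the final 4KB page of the former 2MB mapping.)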
*/ PV_STAT(atomic_add_long(&pv_entry_allocs, NPTEPG - 1)); va_last = va + NBPDR - PAGE_SIZE; for (;;) { pc = TAILQ_FIRST(&pmap->pm_pvchunk); KASSERT(pc->pc_map[0] != 0 || pc->pc_map[1] != 0 || pc->pc_map[2] != 0, ("pmap_pv_demote_pde: missing spare")); for (field = 0; field < _NPCM; field++) { while (pc->pc_map[field]) { bit = bsfq(pc->pc_map[field]); pc->pc_map[field] &= ~(1ul << bit); pv = &pc->pc_pventry[field * 64 + bit]; va += PAGE_SIZE; pv->pv_va = va; m++; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_pv_demote_pde: page %p is not managed", m)); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if (va == va_last) goto out; } } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } out: if (pc->pc_map[0] == 0 && pc->pc_map[1] == 0 && pc->pc_map[2] == 0) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } PV_STAT(atomic_add_long(&pv_entry_count, NPTEPG - 1)); PV_STAT(atomic_subtract_int(&pv_entry_spare, NPTEPG - 1)); } /* * After promotion from 512 4KB page mappings to a single 2MB page mapping, * replace the many pv entries for the 4KB page mappings by a single pv entry * for the 2MB page mapping. */ static void pmap_pv_promote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; KASSERT((pa & PDRMASK) == 0, ("pmap_pv_promote_pde: pa is not 2mpage aligned")); CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa); /* * Transfer the first page's pv entry for this mapping to the 2mpage's * pv list. Aside from avoiding the cost of a call to get_pv_entry(), * a transfer avoids the possibility that get_pv_entry() calls * reclaim_pv_chunk() and that reclaim_pv_chunk() removes one of the * mappings that is being promoted. */ m = PHYS_TO_VM_PAGE(pa); va = trunc_2mpage(va); pv = pmap_pvh_remove(&m->md, pmap, va); KASSERT(pv != NULL, ("pmap_pv_promote_pde: pv not found")); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; /* Free the remaining NPTEPG - 1 pv entries. */ va_last = va + NBPDR - PAGE_SIZE; do { m++; va += PAGE_SIZE; pmap_pvh_free(&m->md, pmap, va); } while (va < va_last); } /* * First find and then destroy the pv entry for the specified pmap and virtual * address. This operation can be performed on pv lists for either 4KB or 2MB * page mappings. */ static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } /* * Conditionally create the PV entry for a 4KB page mapping if the required * memory can be allocated without resorting to reclamation. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp) { pv_entry_t pv; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* Pass NULL instead of the lock pointer to disable reclamation. */ if ((pv = get_pv_entry(pmap, NULL)) != NULL) { pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; return (TRUE); } else return (FALSE); } /* * Conditionally create the PV entry for a 2MB page mapping if the required * memory can be allocated without resorting to reclamation. 
*/ static boolean_t pmap_pv_insert_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp) { struct md_page *pvh; pv_entry_t pv; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* Pass NULL instead of the lock pointer to disable reclamation. */ if ((pv = get_pv_entry(pmap, NULL)) != NULL) { pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; return (TRUE); } else return (FALSE); } /* * Fills a page table page with mappings to consecutive physical pages. */ static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte) { pt_entry_t *pte; for (pte = firstpte; pte < firstpte + NPTEPG; pte++) { *pte = newpte; newpte += PAGE_SIZE; } } /* * Tries to demote a 2MB page mapping. If demotion fails, the 2MB page * mapping is invalidated. */ static boolean_t pmap_demote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va) { struct rwlock *lock; boolean_t rv; lock = NULL; rv = pmap_demote_pde_locked(pmap, pde, va, &lock); if (lock != NULL) rw_wunlock(lock); return (rv); } static boolean_t pmap_demote_pde_locked(pmap_t pmap, pd_entry_t *pde, vm_offset_t va, struct rwlock **lockp) { pd_entry_t newpde, oldpde; pt_entry_t *firstpte, newpte; pt_entry_t PG_A, PG_G, PG_M, PG_RW, PG_V; vm_paddr_t mptepa; vm_page_t mpte; struct spglist free; int PG_PTE_CACHE; PG_G = pmap_global_bit(pmap); PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_PTE_CACHE = pmap_cache_mask(pmap, 0); PMAP_LOCK_ASSERT(pmap, MA_OWNED); oldpde = *pde; KASSERT((oldpde & (PG_PS | PG_V)) == (PG_PS | PG_V), ("pmap_demote_pde: oldpde is missing PG_PS and/or PG_V")); if ((oldpde & PG_A) != 0 && (mpte = pmap_lookup_pt_page(pmap, va)) != NULL) pmap_remove_pt_page(pmap, mpte); else { KASSERT((oldpde & PG_W) == 0, ("pmap_demote_pde: page table page for a wired mapping" " is missing")); /* * Invalidate the 2MB page mapping and return "failure" if the * mapping was never accessed or the allocation of the new * page table page fails. If the 2MB page mapping belongs to * the direct map region of the kernel's address space, then * the page allocation request specifies the highest possible * priority (VM_ALLOC_INTERRUPT). Otherwise, the priority is * normal. Page table pages are preallocated for every other * part of the kernel address space, so the direct map region * is the only part of the kernel address space that must be * handled here. */ if ((oldpde & PG_A) == 0 || (mpte = vm_page_alloc(NULL, pmap_pde_pindex(va), (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS ? VM_ALLOC_INTERRUPT : VM_ALLOC_NORMAL) | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { SLIST_INIT(&free); pmap_remove_pde(pmap, pde, trunc_2mpage(va), &free, lockp); pmap_invalidate_page(pmap, trunc_2mpage(va)); pmap_free_zero_pages(&free); CTR2(KTR_PMAP, "pmap_demote_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } if (va < VM_MAXUSER_ADDRESS) pmap_resident_count_inc(pmap, 1); } mptepa = VM_PAGE_TO_PHYS(mpte); firstpte = (pt_entry_t *)PHYS_TO_DMAP(mptepa); newpde = mptepa | PG_M | PG_A | (oldpde & PG_U) | PG_RW | PG_V; KASSERT((oldpde & PG_A) != 0, ("pmap_demote_pde: oldpde is missing PG_A")); KASSERT((oldpde & (PG_M | PG_RW)) != PG_RW, ("pmap_demote_pde: oldpde is missing PG_M")); newpte = oldpde & ~PG_PS; newpte = pmap_swap_pat(pmap, newpte); /* * If the page table page is new, initialize it. 
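	 * (Added note: a freshly allocated page table page arrives with
	 * wire_count == 1; raising it to NPTEPG gives each of the 512 new
	 * 4KB entries a wiring, and pmap_fill_ptp() replicates newpte
	 * across all slots, advancing the physical address by PAGE_SIZE
	 * per slot.)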
*/ if (mpte->wire_count == 1) { mpte->wire_count = NPTEPG; pmap_fill_ptp(firstpte, newpte); } KASSERT((*firstpte & PG_FRAME) == (newpte & PG_FRAME), ("pmap_demote_pde: firstpte and newpte map different physical" " addresses")); /* * If the mapping has changed attributes, update the page table * entries. */ if ((*firstpte & PG_PTE_PROMOTE) != (newpte & PG_PTE_PROMOTE)) pmap_fill_ptp(firstpte, newpte); /* * The spare PV entries must be reserved prior to demoting the * mapping, that is, prior to changing the PDE. Otherwise, the state * of the PDE and the PV lists will be inconsistent, which can result * in reclaim_pv_chunk() attempting to remove a PV entry from the * wrong PV list and pmap_pv_demote_pde() failing to find the expected * PV entry for the 2MB page mapping that is being demoted. */ if ((oldpde & PG_MANAGED) != 0) reserve_pv_entries(pmap, NPTEPG - 1, lockp); /* * Demote the mapping. This pmap is locked. The old PDE has * PG_A set. If the old PDE has PG_RW set, it also has PG_M * set. Thus, there is no danger of a race with another * processor changing the setting of PG_A and/or PG_M between * the read above and the store below. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, newpde); else pde_store(pde, newpde); /* * Invalidate a stale recursive mapping of the page table page. */ if (va >= VM_MAXUSER_ADDRESS) pmap_invalidate_page(pmap, (vm_offset_t)vtopte(va)); /* * Demote the PV entry. */ if ((oldpde & PG_MANAGED) != 0) pmap_pv_demote_pde(pmap, va, oldpde & PG_PS_FRAME, lockp); atomic_add_long(&pmap_pde_demotions, 1); CTR2(KTR_PMAP, "pmap_demote_pde: success for va %#lx" " in pmap %p", va, pmap); return (TRUE); } /* * pmap_remove_kernel_pde: Remove a kernel superpage mapping. */ static void pmap_remove_kernel_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va) { pd_entry_t newpde; vm_paddr_t mptepa; vm_page_t mpte; KASSERT(pmap == kernel_pmap, ("pmap %p is not kernel_pmap", pmap)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); mpte = pmap_lookup_pt_page(pmap, va); if (mpte == NULL) panic("pmap_remove_kernel_pde: Missing pt page."); pmap_remove_pt_page(pmap, mpte); mptepa = VM_PAGE_TO_PHYS(mpte); newpde = mptepa | X86_PG_M | X86_PG_A | X86_PG_RW | X86_PG_V; /* * Initialize the page table page. */ pagezero((void *)PHYS_TO_DMAP(mptepa)); /* * Demote the mapping. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, newpde); else pde_store(pde, newpde); /* * Invalidate a stale recursive mapping of the page table page. */ pmap_invalidate_page(pmap, (vm_offset_t)vtopte(va)); } /* * pmap_remove_pde: do the things to unmap a superpage in a process */ static int pmap_remove_pde(pmap_t pmap, pd_entry_t *pdq, vm_offset_t sva, struct spglist *free, struct rwlock **lockp) { struct md_page *pvh; pd_entry_t oldpde; vm_offset_t eva, va; vm_page_t m, mpte; pt_entry_t PG_G, PG_A, PG_M, PG_RW; PG_G = pmap_global_bit(pmap); PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PDRMASK) == 0, ("pmap_remove_pde: sva is not 2mpage aligned")); oldpde = pte_load_clear(pdq); if (oldpde & PG_W) pmap->pm_stats.wired_count -= NBPDR / PAGE_SIZE; /* * Machines that don't support invlpg, also don't support * PG_G. 
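	 * (Added note: consequently a PG_G superpage is flushed here with a
	 * targeted pmap_invalidate_page() call rather than being left to
	 * the caller's deferred full-TLB invalidation.)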
 */
	if (oldpde & PG_G)
		pmap_invalidate_page(kernel_pmap, sva);
	pmap_resident_count_dec(pmap, NBPDR / PAGE_SIZE);
	if (oldpde & PG_MANAGED) {
		CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, oldpde & PG_PS_FRAME);
		pvh = pa_to_pvh(oldpde & PG_PS_FRAME);
		pmap_pvh_free(pvh, pmap, sva);
		eva = sva + NBPDR;
		for (va = sva, m = PHYS_TO_VM_PAGE(oldpde & PG_PS_FRAME);
		    va < eva; va += PAGE_SIZE, m++) {
			if ((oldpde & (PG_M | PG_RW)) == (PG_M | PG_RW))
				vm_page_dirty(m);
			if (oldpde & PG_A)
				vm_page_aflag_set(m, PGA_REFERENCED);
			if (TAILQ_EMPTY(&m->md.pv_list) &&
			    TAILQ_EMPTY(&pvh->pv_list))
				vm_page_aflag_clear(m, PGA_WRITEABLE);
			pmap_delayed_invl_page(m);
		}
	}
	if (pmap == kernel_pmap) {
		pmap_remove_kernel_pde(pmap, pdq, sva);
	} else {
		mpte = pmap_lookup_pt_page(pmap, sva);
		if (mpte != NULL) {
			pmap_remove_pt_page(pmap, mpte);
			pmap_resident_count_dec(pmap, 1);
			KASSERT(mpte->wire_count == NPTEPG,
			    ("pmap_remove_pde: pte page wire count error"));
			mpte->wire_count = 0;
			pmap_add_delayed_free_list(mpte, free, FALSE);
			atomic_subtract_int(&vm_cnt.v_wire_count, 1);
		}
	}
	return (pmap_unuse_pt(pmap, sva, *pmap_pdpe(pmap, sva), free));
}

/*
 * pmap_remove_pte: do the things to unmap a page in a process
 */
static int
pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t va,
    pd_entry_t ptepde, struct spglist *free, struct rwlock **lockp)
{
	struct md_page *pvh;
	pt_entry_t oldpte, PG_A, PG_M, PG_RW;
	vm_page_t m;

	PG_A = pmap_accessed_bit(pmap);
	PG_M = pmap_modified_bit(pmap);
	PG_RW = pmap_rw_bit(pmap);
	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
	oldpte = pte_load_clear(ptq);
	if (oldpte & PG_W)
		pmap->pm_stats.wired_count -= 1;
	pmap_resident_count_dec(pmap, 1);
	if (oldpte & PG_MANAGED) {
		m = PHYS_TO_VM_PAGE(oldpte & PG_FRAME);
		if ((oldpte & (PG_M | PG_RW)) == (PG_M | PG_RW))
			vm_page_dirty(m);
		if (oldpte & PG_A)
			vm_page_aflag_set(m, PGA_REFERENCED);
		CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m);
		pmap_pvh_free(&m->md, pmap, va);
		if (TAILQ_EMPTY(&m->md.pv_list) &&
		    (m->flags & PG_FICTITIOUS) == 0) {
			pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m));
			if (TAILQ_EMPTY(&pvh->pv_list))
				vm_page_aflag_clear(m, PGA_WRITEABLE);
		}
		pmap_delayed_invl_page(m);
	}
	return (pmap_unuse_pt(pmap, va, ptepde, free));
}

/*
 * Remove a single page from a process address space
 */
static void
pmap_remove_page(pmap_t pmap, vm_offset_t va, pd_entry_t *pde,
    struct spglist *free)
{
	struct rwlock *lock;
	pt_entry_t *pte, PG_V;

	PG_V = pmap_valid_bit(pmap);
	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
	if ((*pde & PG_V) == 0)
		return;
	pte = pmap_pde_to_pte(pde, va);
	if ((*pte & PG_V) == 0)
		return;
	lock = NULL;
	pmap_remove_pte(pmap, pte, va, *pde, free, &lock);
	if (lock != NULL)
		rw_wunlock(lock);
	pmap_invalidate_page(pmap, va);
}

/*
 * Remove the given range of addresses from the specified map.
 *
 * It is assumed that the start and end are properly
 * rounded to the page size.
 */
void
pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
{
	struct rwlock *lock;
	vm_offset_t va, va_next;
	pml4_entry_t *pml4e;
	pdp_entry_t *pdpe;
	pd_entry_t ptpaddr, *pde;
	pt_entry_t *pte, PG_G, PG_V;
	struct spglist free;
	int anyvalid;

	PG_G = pmap_global_bit(pmap);
	PG_V = pmap_valid_bit(pmap);

	/*
	 * Perform an unsynchronized read.  This is, however, safe.
	 */
	if (pmap->pm_stats.resident_count == 0)
		return;

	anyvalid = 0;
	SLIST_INIT(&free);

	pmap_delayed_invl_started();
	PMAP_LOCK(pmap);

	/*
	 * Special handling for removing a single page: it is a very
	 * common operation, so it is worth short-circuiting the full
	 * page-table walk below.
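 */

/*
 * Editor's note: the removal loop below walks pml4e -> pdpe -> pde ->
 * pte.  As a standalone reminder of where those indices come from,
 * here is a minimal sketch using the standard amd64 field positions;
 * the X_-prefixed names are hypothetical stand-ins for the kernel's
 * own macros, and the fragment is guarded out so it is never compiled
 * as part of this file.
 */
#if 0
#include <stdint.h>

#define	X_PML4SHIFT	39		/* VA bits 47..39 */
#define	X_PDPSHIFT	30		/* VA bits 38..30 */
#define	X_PDRSHIFT	21		/* VA bits 29..21 */
#define	X_PAGE_SHIFT	12		/* VA bits 20..12 */
#define	X_IDXMASK	0x1ffu		/* 9 index bits per level */

static unsigned int
x_pml4_index(uint64_t va)
{
	return ((va >> X_PML4SHIFT) & X_IDXMASK);
}

static unsigned int
x_pdp_index(uint64_t va)
{
	return ((va >> X_PDPSHIFT) & X_IDXMASK);
}

static unsigned int
x_pd_index(uint64_t va)
{
	return ((va >> X_PDRSHIFT) & X_IDXMASK);
}

static unsigned int
x_pt_index(uint64_t va)
{
	return ((va >> X_PAGE_SHIFT) & X_IDXMASK);
}
#endif

/*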
*/ if (sva + PAGE_SIZE == eva) { pde = pmap_pde(pmap, sva); if (pde && (*pde & PG_PS) == 0) { pmap_remove_page(pmap, sva, pde, &free); goto out; } } lock = NULL; for (; sva < eva; sva = va_next) { if (pmap->pm_stats.resident_count == 0) break; pml4e = pmap_pml4e(pmap, sva); if ((*pml4e & PG_V) == 0) { va_next = (sva + NBPML4) & ~PML4MASK; if (va_next < sva) va_next = eva; continue; } pdpe = pmap_pml4e_to_pdpe(pml4e, sva); if ((*pdpe & PG_V) == 0) { va_next = (sva + NBPDP) & ~PDPMASK; if (va_next < sva) va_next = eva; continue; } /* * Calculate index for next page table. */ va_next = (sva + NBPDR) & ~PDRMASK; if (va_next < sva) va_next = eva; pde = pmap_pdpe_to_pde(pdpe, sva); ptpaddr = *pde; /* * Weed out invalid mappings. */ if (ptpaddr == 0) continue; /* * Check for large page. */ if ((ptpaddr & PG_PS) != 0) { /* * Are we removing the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == va_next && eva >= va_next) { /* * The TLB entry for a PG_G mapping is * invalidated by pmap_remove_pde(). */ if ((ptpaddr & PG_G) == 0) anyvalid = 1; pmap_remove_pde(pmap, pde, sva, &free, &lock); continue; } else if (!pmap_demote_pde_locked(pmap, pde, sva, &lock)) { /* The large page mapping was destroyed. */ continue; } else ptpaddr = *pde; } /* * Limit our scan to either the end of the va represented * by the current page table page, or to the end of the * range being removed. */ if (va_next > eva) va_next = eva; va = va_next; for (pte = pmap_pde_to_pte(pde, sva); sva != va_next; pte++, sva += PAGE_SIZE) { if (*pte == 0) { if (va != va_next) { pmap_invalidate_range(pmap, va, sva); va = va_next; } continue; } if ((*pte & PG_G) == 0) anyvalid = 1; else if (va == va_next) va = sva; if (pmap_remove_pte(pmap, pte, sva, ptpaddr, &free, &lock)) { sva += PAGE_SIZE; break; } } if (va != va_next) pmap_invalidate_range(pmap, va, sva); } if (lock != NULL) rw_wunlock(lock); out: if (anyvalid) pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); pmap_delayed_invl_finished(); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) */ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; struct rwlock *lock; pt_entry_t *pte, tpte, PG_A, PG_M, PG_RW; pd_entry_t *pde; vm_offset_t va; struct spglist free; int pvh_gen, md_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); SLIST_INIT(&free); lock = VM_PAGE_TO_PV_LIST_LOCK(m); pvh = (m->flags & PG_FICTITIOUS) != 0 ? 
&pv_dummy : pa_to_pvh(VM_PAGE_TO_PHYS(m)); retry: rw_wlock(lock); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { rw_wunlock(lock); PMAP_UNLOCK(pmap); goto retry; } } va = pv->pv_va; pde = pmap_pde(pmap, va); (void)pmap_demote_pde_locked(pmap, pde, va, &lock); PMAP_UNLOCK(pmap); } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { rw_wunlock(lock); PMAP_UNLOCK(pmap); goto retry; } } PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); pmap_resident_count_dec(pmap, 1); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_remove_all: found" " a 2mpage in page %p's pv list", m)); pte = pmap_pde_to_pte(pde, pv->pv_va); tpte = pte_load_clear(pte); if (tpte & PG_W) pmap->pm_stats.wired_count--; if (tpte & PG_A) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); pmap_unuse_pt(pmap, pv->pv_va, *pde, &free); pmap_invalidate_page(pmap, pv->pv_va); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(lock); pmap_delayed_invl_wait(m); pmap_free_zero_pages(&free); } /* * pmap_protect_pde: do the things to protect a 2mpage in a process */ static boolean_t pmap_protect_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t sva, vm_prot_t prot) { pd_entry_t newpde, oldpde; vm_offset_t eva, va; vm_page_t m; boolean_t anychanged; pt_entry_t PG_G, PG_M, PG_RW; PG_G = pmap_global_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PDRMASK) == 0, ("pmap_protect_pde: sva is not 2mpage aligned")); anychanged = FALSE; retry: oldpde = newpde = *pde; if (oldpde & PG_MANAGED) { eva = sva + NBPDR; for (va = sva, m = PHYS_TO_VM_PAGE(oldpde & PG_PS_FRAME); va < eva; va += PAGE_SIZE, m++) if ((oldpde & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); } if ((prot & VM_PROT_WRITE) == 0) newpde &= ~(PG_RW | PG_M); if ((prot & VM_PROT_EXECUTE) == 0) newpde |= pg_nx; if (newpde != oldpde) { if (!atomic_cmpset_long(pde, oldpde, newpde)) goto retry; if (oldpde & PG_G) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } return (anychanged); } /* * Set the physical protection on the * specified range of this map as requested. 
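 */

/*
 * Editor's note: pmap_protect_pde() above downgrades a superpage
 * mapping with a compare-and-swap retry loop so that a concurrent
 * hardware update of PG_M or PG_A between the read and the write is
 * never lost.  The guarded fragment below is a minimal userspace
 * model of that pattern using C11 atomics; the X_ bit values are the
 * x86 ones and the names are hypothetical.
 */
#if 0
#include <stdatomic.h>
#include <stdint.h>

#define	X_PG_RW	0x002ULL		/* writable */
#define	X_PG_M	0x040ULL		/* modified (dirty) */

static void
x_downgrade_readonly(_Atomic uint64_t *pte)
{
	uint64_t obits, pbits;

	do {
		obits = atomic_load(pte);
		pbits = obits & ~(X_PG_RW | X_PG_M);
		/* Nothing to do if the entry was already read-only. */
		if (pbits == obits)
			return;
		/* Retry if the entry changed underneath us. */
	} while (!atomic_compare_exchange_weak(pte, &obits, pbits));
}
#endif

/*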
*/ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t va_next; pml4_entry_t *pml4e; pdp_entry_t *pdpe; pd_entry_t ptpaddr, *pde; pt_entry_t *pte, PG_G, PG_M, PG_RW, PG_V; boolean_t anychanged; KASSERT((prot & ~VM_PROT_ALL) == 0, ("invalid prot %x", prot)); if (prot == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } if ((prot & (VM_PROT_WRITE|VM_PROT_EXECUTE)) == (VM_PROT_WRITE|VM_PROT_EXECUTE)) return; PG_G = pmap_global_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); anychanged = FALSE; PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { pml4e = pmap_pml4e(pmap, sva); if ((*pml4e & PG_V) == 0) { va_next = (sva + NBPML4) & ~PML4MASK; if (va_next < sva) va_next = eva; continue; } pdpe = pmap_pml4e_to_pdpe(pml4e, sva); if ((*pdpe & PG_V) == 0) { va_next = (sva + NBPDP) & ~PDPMASK; if (va_next < sva) va_next = eva; continue; } va_next = (sva + NBPDR) & ~PDRMASK; if (va_next < sva) va_next = eva; pde = pmap_pdpe_to_pde(pdpe, sva); ptpaddr = *pde; /* * Weed out invalid mappings. */ if (ptpaddr == 0) continue; /* * Check for large page. */ if ((ptpaddr & PG_PS) != 0) { /* * Are we protecting the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == va_next && eva >= va_next) { /* * The TLB entry for a PG_G mapping is * invalidated by pmap_protect_pde(). */ if (pmap_protect_pde(pmap, pde, sva, prot)) anychanged = TRUE; continue; } else if (!pmap_demote_pde(pmap, pde, sva)) { /* * The large page mapping was destroyed. */ continue; } } if (va_next > eva) va_next = eva; for (pte = pmap_pde_to_pte(pde, sva); sva != va_next; pte++, sva += PAGE_SIZE) { pt_entry_t obits, pbits; vm_page_t m; retry: obits = pbits = *pte; if ((pbits & PG_V) == 0) continue; if ((prot & VM_PROT_WRITE) == 0) { if ((pbits & (PG_MANAGED | PG_M | PG_RW)) == (PG_MANAGED | PG_M | PG_RW)) { m = PHYS_TO_VM_PAGE(pbits & PG_FRAME); vm_page_dirty(m); } pbits &= ~(PG_RW | PG_M); } if ((prot & VM_PROT_EXECUTE) == 0) pbits |= pg_nx; if (pbits != obits) { if (!atomic_cmpset_long(pte, obits, pbits)) goto retry; if (obits & PG_G) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } } } if (anychanged) pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); } /* * Tries to promote the 512, contiguous 4KB page mappings that are within a * single page table page (PTP) to a single 2MB page mapping. For promotion * to occur, two conditions must be met: (1) the 4KB page mappings must map * aligned, contiguous physical memory and (2) the 4KB page mappings must have * identical characteristics. */ static void pmap_promote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va, struct rwlock **lockp) { pd_entry_t newpde; pt_entry_t *firstpte, oldpte, pa, *pte; pt_entry_t PG_G, PG_A, PG_M, PG_RW, PG_V; vm_page_t mpte; int PG_PTE_CACHE; PG_A = pmap_accessed_bit(pmap); PG_G = pmap_global_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); PG_PTE_CACHE = pmap_cache_mask(pmap, 0); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Examine the first PTE in the specified PTP. Abort if this PTE is * either invalid, unused, or does not map the first 4KB physical page * within a 2MB page. 
*/ firstpte = (pt_entry_t *)PHYS_TO_DMAP(*pde & PG_FRAME); setpde: newpde = *firstpte; if ((newpde & ((PG_FRAME & PDRMASK) | PG_A | PG_V)) != (PG_A | PG_V)) { atomic_add_long(&pmap_pde_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#lx" " in pmap %p", va, pmap); return; } if ((newpde & (PG_M | PG_RW)) == PG_RW) { /* * When PG_M is already clear, PG_RW can be cleared without * a TLB invalidation. */ if (!atomic_cmpset_long(firstpte, newpde, newpde & ~PG_RW)) goto setpde; newpde &= ~PG_RW; } /* * Examine each of the other PTEs in the specified PTP. Abort if this * PTE maps an unexpected 4KB physical page or does not have identical * characteristics to the first PTE. */ pa = (newpde & (PG_PS_FRAME | PG_A | PG_V)) + NBPDR - PAGE_SIZE; for (pte = firstpte + NPTEPG - 1; pte > firstpte; pte--) { setpte: oldpte = *pte; if ((oldpte & (PG_FRAME | PG_A | PG_V)) != pa) { atomic_add_long(&pmap_pde_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#lx" " in pmap %p", va, pmap); return; } if ((oldpte & (PG_M | PG_RW)) == PG_RW) { /* * When PG_M is already clear, PG_RW can be cleared * without a TLB invalidation. */ if (!atomic_cmpset_long(pte, oldpte, oldpte & ~PG_RW)) goto setpte; oldpte &= ~PG_RW; CTR2(KTR_PMAP, "pmap_promote_pde: protect for va %#lx" " in pmap %p", (oldpte & PG_FRAME & PDRMASK) | (va & ~PDRMASK), pmap); } if ((oldpte & PG_PTE_PROMOTE) != (newpde & PG_PTE_PROMOTE)) { atomic_add_long(&pmap_pde_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#lx" " in pmap %p", va, pmap); return; } pa -= PAGE_SIZE; } /* * Save the page table page in its current state until the PDE * mapping the superpage is demoted by pmap_demote_pde() or * destroyed by pmap_remove_pde(). */ mpte = PHYS_TO_VM_PAGE(*pde & PG_FRAME); KASSERT(mpte >= vm_page_array && mpte < &vm_page_array[vm_page_array_size], ("pmap_promote_pde: page table page is out of range")); KASSERT(mpte->pindex == pmap_pde_pindex(va), ("pmap_promote_pde: page table page's pindex is wrong")); if (pmap_insert_pt_page(pmap, mpte)) { atomic_add_long(&pmap_pde_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#lx in pmap %p", va, pmap); return; } /* * Promote the pv entries. */ if ((newpde & PG_MANAGED) != 0) pmap_pv_promote_pde(pmap, va, newpde & PG_PS_FRAME, lockp); /* * Propagate the PAT index to its proper position. */ newpde = pmap_swap_pat(pmap, newpde); /* * Map the superpage. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, PG_PS | newpde); else pde_store(pde, PG_PS | newpde); atomic_add_long(&pmap_pde_promotions, 1); CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#lx" " in pmap %p", va, pmap); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. * * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. * * When destroying both a page table and PV entry, this function * performs the TLB invalidation before releasing the PV list * lock, so we do not need pmap_delayed_invl_page() calls here. 
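 */

/*
 * Editor's note: a sketch of the promotion test that
 * pmap_promote_pde() above performs, reduced to its two conditions:
 * the 512 PTEs must map 2MB-aligned, consecutive physical frames and
 * must carry identical attribute bits.  The X_ names are hypothetical
 * stand-ins; the kernel version also handles PG_A/PG_V and clears
 * PG_RW on clean entries along the way.  Guarded out, never compiled
 * here.
 */
#if 0
#include <stdbool.h>
#include <stdint.h>

#define	X_NPTEPG	512
#define	X_PAGE_SIZE	4096ULL
#define	X_PG_FRAME	0x000ffffffffff000ULL	/* physical frame bits */
#define	X_PDRMASK	((1ULL << 21) - 1)	/* offset within 2MB */

static bool
x_can_promote(const uint64_t pte[X_NPTEPG])
{
	uint64_t pa, attrs;
	int i;

	pa = pte[0] & X_PG_FRAME;
	attrs = pte[0] & ~X_PG_FRAME;
	/* The first frame must itself be 2MB-aligned. */
	if ((pa & X_PDRMASK) != 0)
		return (false);
	for (i = 1; i < X_NPTEPG; i++) {
		/* Frames must be physically consecutive... */
		if ((pte[i] & X_PG_FRAME) != pa + (uint64_t)i * X_PAGE_SIZE)
			return (false);
		/* ...with identical attribute bits. */
		if ((pte[i] & ~X_PG_FRAME) != attrs)
			return (false);
	}
	return (true);
}
#endif

/*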
*/ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind __unused) { struct rwlock *lock; pd_entry_t *pde; pt_entry_t *pte, PG_G, PG_A, PG_M, PG_RW, PG_V; pt_entry_t newpte, origpte; pv_entry_t pv; vm_paddr_t opa, pa; vm_page_t mpte, om; boolean_t nosleep; PG_A = pmap_accessed_bit(pmap); PG_G = pmap_global_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); va = trunc_page(va); KASSERT(va <= VM_MAX_KERNEL_ADDRESS, ("pmap_enter: toobig")); KASSERT(va < UPT_MIN_ADDRESS || va >= UPT_MAX_ADDRESS, ("pmap_enter: invalid to pmap_enter page table pages (va: 0x%lx)", va)); KASSERT((m->oflags & VPO_UNMANAGED) != 0 || va < kmi.clean_sva || va >= kmi.clean_eva, ("pmap_enter: managed mapping within the clean submap")); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); pa = VM_PAGE_TO_PHYS(m); newpte = (pt_entry_t)(pa | PG_A | PG_V); if ((flags & VM_PROT_WRITE) != 0) newpte |= PG_M; if ((prot & VM_PROT_WRITE) != 0) newpte |= PG_RW; KASSERT((newpte & (PG_M | PG_RW)) != PG_M, ("pmap_enter: flags includes VM_PROT_WRITE but prot doesn't")); if ((prot & VM_PROT_EXECUTE) == 0) newpte |= pg_nx; if ((flags & PMAP_ENTER_WIRED) != 0) newpte |= PG_W; if (va < VM_MAXUSER_ADDRESS) newpte |= PG_U; if (pmap == kernel_pmap) newpte |= PG_G; newpte |= pmap_cache_bits(pmap, m->md.pat_mode, 0); /* * Set modified bit gratuitously for writeable mappings if * the page is unmanaged. We do not want to take a fault * to do the dirty bit accounting for these mappings. */ if ((m->oflags & VPO_UNMANAGED) != 0) { if ((newpte & PG_RW) != 0) newpte |= PG_M; } mpte = NULL; lock = NULL; PMAP_LOCK(pmap); /* * In the case that a page table page is not * resident, we are creating it here. */ retry: pde = pmap_pde(pmap, va); if (pde != NULL && (*pde & PG_V) != 0 && ((*pde & PG_PS) == 0 || pmap_demote_pde_locked(pmap, pde, va, &lock))) { pte = pmap_pde_to_pte(pde, va); if (va < VM_MAXUSER_ADDRESS && mpte == NULL) { mpte = PHYS_TO_VM_PAGE(*pde & PG_FRAME); mpte->wire_count++; } } else if (va < VM_MAXUSER_ADDRESS) { /* * Here if the pte page isn't mapped, or if it has been * deallocated. */ nosleep = (flags & PMAP_ENTER_NOSLEEP) != 0; mpte = _pmap_allocpte(pmap, pmap_pde_pindex(va), nosleep ? NULL : &lock); if (mpte == NULL && nosleep) { if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } goto retry; } else panic("pmap_enter: invalid page directory va=%#lx", va); origpte = *pte; /* * Is the specified virtual address already mapped? */ if ((origpte & PG_V) != 0) { /* * Wiring change, just update stats. We don't worry about * wiring PT pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT page will be also. */ if ((newpte & PG_W) != 0 && (origpte & PG_W) == 0) pmap->pm_stats.wired_count++; else if ((newpte & PG_W) == 0 && (origpte & PG_W) != 0) pmap->pm_stats.wired_count--; /* * Remove the extra PT page reference. */ if (mpte != NULL) { mpte->wire_count--; KASSERT(mpte->wire_count > 0, ("pmap_enter: missing reference to page table page," " va: 0x%lx", va)); } /* * Has the physical page changed? */ opa = origpte & PG_FRAME; if (opa == pa) { /* * No, might be a protection or wiring change. 
*/ if ((origpte & PG_MANAGED) != 0) { newpte |= PG_MANAGED; if ((newpte & PG_RW) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } if (((origpte ^ newpte) & ~(PG_M | PG_A)) == 0) goto unchanged; goto validate; } } else { /* * Increment the counters. */ if ((newpte & PG_W) != 0) pmap->pm_stats.wired_count++; pmap_resident_count_inc(pmap, 1); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { newpte |= PG_MANAGED; pv = get_pv_entry(pmap, &lock); pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if ((newpte & PG_RW) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } /* * Update the PTE. */ if ((origpte & PG_V) != 0) { validate: origpte = pte_load_store(pte, newpte); opa = origpte & PG_FRAME; if (opa != pa) { if ((origpte & PG_MANAGED) != 0) { om = PHYS_TO_VM_PAGE(opa); if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(om); if ((origpte & PG_A) != 0) vm_page_aflag_set(om, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, opa); pmap_pvh_free(&om->md, pmap, va); if ((om->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&om->md.pv_list) && ((om->flags & PG_FICTITIOUS) != 0 || TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list))) vm_page_aflag_clear(om, PGA_WRITEABLE); } } else if ((newpte & PG_M) == 0 && (origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if ((origpte & PG_MANAGED) != 0) vm_page_dirty(m); /* * Although the PTE may still have PG_RW set, TLB * invalidation may nonetheless be required because * the PTE no longer has PG_M set. */ } else if ((origpte & PG_NX) != 0 || (newpte & PG_NX) == 0) { /* * This PTE change does not require TLB invalidation. */ goto unchanged; } if ((origpte & PG_A) != 0) pmap_invalidate_page(pmap, va); } else pte_store(pte, newpte); unchanged: /* * If both the page table page and the reservation are fully * populated, then attempt promotion. */ if ((mpte == NULL || mpte->wire_count == NPTEPG) && pmap_ps_enabled(pmap) && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) pmap_promote_pde(pmap, pde, va, &lock); if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } /* * Tries to create a 2MB page mapping. Returns TRUE if successful and FALSE * otherwise. Fails if (1) a page table page cannot be allocated without * blocking, (2) a mapping already exists at the specified virtual address, or * (3) a pv entry cannot be allocated without reclaiming another pv entry. */ static boolean_t pmap_enter_pde(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, struct rwlock **lockp) { pd_entry_t *pde, newpde; pt_entry_t PG_V; vm_page_t mpde; struct spglist free; PG_V = pmap_valid_bit(pmap); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if ((mpde = pmap_allocpde(pmap, va, NULL)) == NULL) { CTR2(KTR_PMAP, "pmap_enter_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } pde = (pd_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mpde)); pde = &pde[pmap_pde_index(va)]; if ((*pde & PG_V) != 0) { KASSERT(mpde->wire_count > 1, ("pmap_enter_pde: mpde's wire count is too low")); mpde->wire_count--; CTR2(KTR_PMAP, "pmap_enter_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } newpde = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(pmap, m->md.pat_mode, 1) | PG_PS | PG_V; if ((m->oflags & VPO_UNMANAGED) == 0) { newpde |= PG_MANAGED; /* * Abort this mapping if its PV entry could not be created. 
*/ if (!pmap_pv_insert_pde(pmap, va, VM_PAGE_TO_PHYS(m), lockp)) { SLIST_INIT(&free); if (pmap_unwire_ptp(pmap, va, mpde, &free)) { /* * Although "va" is not mapped, paging- * structure caches could nonetheless have * entries that refer to the freed page table * pages. Invalidate those entries. */ pmap_invalidate_page(pmap, va); pmap_free_zero_pages(&free); } CTR2(KTR_PMAP, "pmap_enter_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } } if ((prot & VM_PROT_EXECUTE) == 0) newpde |= pg_nx; if (va < VM_MAXUSER_ADDRESS) newpde |= PG_U; /* * Increment counters. */ pmap_resident_count_inc(pmap, NBPDR / PAGE_SIZE); /* * Map the superpage. */ pde_store(pde, newpde); atomic_add_long(&pmap_pde_mappings, 1); CTR2(KTR_PMAP, "pmap_enter_pde: success for va %#lx" " in pmap %p", va, pmap); return (TRUE); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { struct rwlock *lock; vm_offset_t va; vm_page_t m, mpte; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpte = NULL; m = m_start; lock = NULL; PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); if ((va & PDRMASK) == 0 && va + NBPDR <= end && m->psind == 1 && pmap_ps_enabled(pmap) && pmap_enter_pde(pmap, va, m, prot, &lock)) m = &m[NBPDR / PAGE_SIZE - 1]; else mpte = pmap_enter_quick_locked(pmap, va, m, prot, mpte, &lock); m = TAILQ_NEXT(m, listq); } if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... */ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { struct rwlock *lock; lock = NULL; PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL, &lock); if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); } static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp) { struct spglist free; pt_entry_t *pte, PG_V; vm_paddr_t pa; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("pmap_enter_quick_locked: managed mapping within the clean submap")); PG_V = pmap_valid_bit(pmap); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { vm_pindex_t ptepindex; pd_entry_t *ptepa; /* * Calculate pagetable page index */ ptepindex = pmap_pde_pindex(va); if (mpte && (mpte->pindex == ptepindex)) { mpte->wire_count++; } else { /* * Get the page directory entry */ ptepa = pmap_pde(pmap, va); /* * If the page table page is mapped, we just increment * the hold count, and activate it. 
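 */

/*
 * Editor's note: pmap_enter_object() above attempts a 2MB mapping
 * only when the virtual address is superpage-aligned, the whole
 * superpage fits below "end", and the backing memory is a fully
 * populated 2MB reservation (psind == 1).  A guarded sketch of that
 * predicate, with hypothetical X_ names:
 */
#if 0
#include <stdbool.h>
#include <stdint.h>

#define	X_NBPDR		(1ULL << 21)	/* bytes mapped by a 2MB page */
#define	X_PDRMASK	(X_NBPDR - 1)

static bool
x_superpage_ok(uint64_t va, uint64_t end, int psind)
{
	return ((va & X_PDRMASK) == 0 && va + X_NBPDR <= end &&
	    psind == 1);
}
#endif

/*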
Otherwise, we * attempt to allocate a page table page. If this * attempt fails, we don't retry. Instead, we give up. */ if (ptepa && (*ptepa & PG_V) != 0) { if (*ptepa & PG_PS) return (NULL); mpte = PHYS_TO_VM_PAGE(*ptepa & PG_FRAME); mpte->wire_count++; } else { /* * Pass NULL instead of the PV list lock * pointer, because we don't intend to sleep. */ mpte = _pmap_allocpte(pmap, ptepindex, NULL); if (mpte == NULL) return (mpte); } } pte = (pt_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mpte)); pte = &pte[pmap_pte_index(va)]; } else { mpte = NULL; pte = vtopte(va); } if (*pte) { if (mpte != NULL) { mpte->wire_count--; mpte = NULL; } return (mpte); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m, lockp)) { if (mpte != NULL) { SLIST_INIT(&free); if (pmap_unwire_ptp(pmap, va, mpte, &free)) { /* * Although "va" is not mapped, paging- * structure caches could nonetheless have * entries that refer to the freed page table * pages. Invalidate those entries. */ pmap_invalidate_page(pmap, va); pmap_free_zero_pages(&free); } mpte = NULL; } return (mpte); } /* * Increment counters */ pmap_resident_count_inc(pmap, 1); pa = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(pmap, m->md.pat_mode, 0); if ((prot & VM_PROT_EXECUTE) == 0) pa |= pg_nx; /* * Now validate mapping with RO protection */ if ((m->oflags & VPO_UNMANAGED) != 0) pte_store(pte, pa | PG_V | PG_U); else pte_store(pte, pa | PG_V | PG_U | PG_MANAGED); return (mpte); } /* * Make a temporary mapping for a physical address. This is only intended * to be used for panic dumps. */ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); invlpg(va); return ((void *)crashdumpmap); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { pd_entry_t *pde; pt_entry_t PG_A, PG_M, PG_RW, PG_V; vm_paddr_t pa, ptepa; vm_page_t p, pdpg; int pat_mode; PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); if ((addr & (NBPDR - 1)) == 0 && (size & (NBPDR - 1)) == 0) { if (!pmap_ps_enabled(pmap)) return; if (!vm_object_populate(object, pindex, pindex + atop(size))) return; p = vm_page_lookup(object, pindex); KASSERT(p->valid == VM_PAGE_BITS_ALL, ("pmap_object_init_pt: invalid page %p", p)); pat_mode = p->md.pat_mode; /* * Abort the mapping if the first page is not physically * aligned to a 2MB page boundary. */ ptepa = VM_PAGE_TO_PHYS(p); if (ptepa & (NBPDR - 1)) return; /* * Skip the first page. Abort the mapping if the rest of * the pages are not physically contiguous or have differing * memory attributes. */ p = TAILQ_NEXT(p, listq); for (pa = ptepa + PAGE_SIZE; pa < ptepa + size; pa += PAGE_SIZE) { KASSERT(p->valid == VM_PAGE_BITS_ALL, ("pmap_object_init_pt: invalid page %p", p)); if (pa != VM_PAGE_TO_PHYS(p) || pat_mode != p->md.pat_mode) return; p = TAILQ_NEXT(p, listq); } /* * Map using 2MB pages. Since "ptepa" is 2M aligned and * "size" is a multiple of 2M, adding the PAT setting to "pa" * will not affect the termination of this loop. 
*/ PMAP_LOCK(pmap); for (pa = ptepa | pmap_cache_bits(pmap, pat_mode, 1); pa < ptepa + size; pa += NBPDR) { pdpg = pmap_allocpde(pmap, addr, NULL); if (pdpg == NULL) { /* * The creation of mappings below is only an * optimization. If a page directory page * cannot be allocated without blocking, * continue on to the next mapping rather than * blocking. */ addr += NBPDR; continue; } pde = (pd_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(pdpg)); pde = &pde[pmap_pde_index(addr)]; if ((*pde & PG_V) == 0) { pde_store(pde, pa | PG_PS | PG_M | PG_A | PG_U | PG_RW | PG_V); pmap_resident_count_inc(pmap, NBPDR / PAGE_SIZE); atomic_add_long(&pmap_pde_mappings, 1); } else { /* Continue on if the PDE is already valid. */ pdpg->wire_count--; KASSERT(pdpg->wire_count > 0, ("pmap_object_init_pt: missing reference " "to page directory page, va: 0x%lx", addr)); } addr += NBPDR; } PMAP_UNLOCK(pmap); } } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware * feature, so there is no need to invalidate any TLB entries. * Since pmap_demote_pde() for the wired entry must never fail, * pmap_delayed_invl_started()/finished() calls around the * function are not needed. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va_next; pml4_entry_t *pml4e; pdp_entry_t *pdpe; pd_entry_t *pde; pt_entry_t *pte, PG_V; PG_V = pmap_valid_bit(pmap); PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { pml4e = pmap_pml4e(pmap, sva); if ((*pml4e & PG_V) == 0) { va_next = (sva + NBPML4) & ~PML4MASK; if (va_next < sva) va_next = eva; continue; } pdpe = pmap_pml4e_to_pdpe(pml4e, sva); if ((*pdpe & PG_V) == 0) { va_next = (sva + NBPDP) & ~PDPMASK; if (va_next < sva) va_next = eva; continue; } va_next = (sva + NBPDR) & ~PDRMASK; if (va_next < sva) va_next = eva; pde = pmap_pdpe_to_pde(pdpe, sva); if ((*pde & PG_V) == 0) continue; if ((*pde & PG_PS) != 0) { if ((*pde & PG_W) == 0) panic("pmap_unwire: pde %#jx is missing PG_W", (uintmax_t)*pde); /* * Are we unwiring the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == va_next && eva >= va_next) { atomic_clear_long(pde, PG_W); pmap->pm_stats.wired_count -= NBPDR / PAGE_SIZE; continue; } else if (!pmap_demote_pde(pmap, pde, sva)) panic("pmap_unwire: demotion failed"); } if (va_next > eva) va_next = eva; for (pte = pmap_pde_to_pte(pde, sva); sva != va_next; pte++, sva += PAGE_SIZE) { if ((*pte & PG_V) == 0) continue; if ((*pte & PG_W) == 0) panic("pmap_unwire: pte %#jx is missing PG_W", (uintmax_t)*pte); /* * PG_W must be cleared atomically. Although the pmap * lock synchronizes access to PG_W, another processor * could be setting PG_M and/or PG_A concurrently. */ atomic_clear_long(pte, PG_W); pmap->pm_stats.wired_count--; } } PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. 
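 */

/*
 * Editor's note: pmap_copy() below locks the two pmaps in address
 * order, the classic way to let two threads copy between the same
 * pair of maps in opposite directions without deadlocking.  A guarded
 * userspace sketch of the idiom (assumes a != b; names hypothetical):
 */
#if 0
#include <pthread.h>
#include <stdint.h>

static void
x_lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	/* Always take the lower-addressed lock first. */
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(a);
		pthread_mutex_lock(b);
	} else {
		pthread_mutex_lock(b);
		pthread_mutex_lock(a);
	}
}
#endif

/*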
*/ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { struct rwlock *lock; struct spglist free; vm_offset_t addr; vm_offset_t end_addr = src_addr + len; vm_offset_t va_next; pt_entry_t PG_A, PG_M, PG_V; if (dst_addr != src_addr) return; if (dst_pmap->pm_type != src_pmap->pm_type) return; /* * EPT page table entries that require emulation of A/D bits are * sensitive to clearing the PG_A bit (aka EPT_PG_READ). Although * we clear PG_M (aka EPT_PG_WRITE) concomitantly, the PG_U bit * (aka EPT_PG_EXECUTE) could still be set. Since some EPT * implementations flag an EPT misconfiguration for exec-only * mappings we skip this function entirely for emulated pmaps. */ if (pmap_emulate_ad_bits(dst_pmap)) return; lock = NULL; if (dst_pmap < src_pmap) { PMAP_LOCK(dst_pmap); PMAP_LOCK(src_pmap); } else { PMAP_LOCK(src_pmap); PMAP_LOCK(dst_pmap); } PG_A = pmap_accessed_bit(dst_pmap); PG_M = pmap_modified_bit(dst_pmap); PG_V = pmap_valid_bit(dst_pmap); for (addr = src_addr; addr < end_addr; addr = va_next) { pt_entry_t *src_pte, *dst_pte; vm_page_t dstmpde, dstmpte, srcmpte; pml4_entry_t *pml4e; pdp_entry_t *pdpe; pd_entry_t srcptepaddr, *pde; KASSERT(addr < UPT_MIN_ADDRESS, ("pmap_copy: invalid to pmap_copy page tables")); pml4e = pmap_pml4e(src_pmap, addr); if ((*pml4e & PG_V) == 0) { va_next = (addr + NBPML4) & ~PML4MASK; if (va_next < addr) va_next = end_addr; continue; } pdpe = pmap_pml4e_to_pdpe(pml4e, addr); if ((*pdpe & PG_V) == 0) { va_next = (addr + NBPDP) & ~PDPMASK; if (va_next < addr) va_next = end_addr; continue; } va_next = (addr + NBPDR) & ~PDRMASK; if (va_next < addr) va_next = end_addr; pde = pmap_pdpe_to_pde(pdpe, addr); srcptepaddr = *pde; if (srcptepaddr == 0) continue; if (srcptepaddr & PG_PS) { if ((addr & PDRMASK) != 0 || addr + NBPDR > end_addr) continue; dstmpde = pmap_allocpde(dst_pmap, addr, NULL); if (dstmpde == NULL) break; pde = (pd_entry_t *) PHYS_TO_DMAP(VM_PAGE_TO_PHYS(dstmpde)); pde = &pde[pmap_pde_index(addr)]; if (*pde == 0 && ((srcptepaddr & PG_MANAGED) == 0 || pmap_pv_insert_pde(dst_pmap, addr, srcptepaddr & PG_PS_FRAME, &lock))) { *pde = srcptepaddr & ~PG_W; pmap_resident_count_inc(dst_pmap, NBPDR / PAGE_SIZE); atomic_add_long(&pmap_pde_mappings, 1); } else dstmpde->wire_count--; continue; } srcptepaddr &= PG_FRAME; srcmpte = PHYS_TO_VM_PAGE(srcptepaddr); KASSERT(srcmpte->wire_count > 0, ("pmap_copy: source page table page is unused")); if (va_next > end_addr) va_next = end_addr; src_pte = (pt_entry_t *)PHYS_TO_DMAP(srcptepaddr); src_pte = &src_pte[pmap_pte_index(addr)]; dstmpte = NULL; while (addr < va_next) { pt_entry_t ptetemp; ptetemp = *src_pte; /* * we only virtual copy managed pages */ if ((ptetemp & PG_MANAGED) != 0) { if (dstmpte != NULL && dstmpte->pindex == pmap_pde_pindex(addr)) dstmpte->wire_count++; else if ((dstmpte = pmap_allocpte(dst_pmap, addr, NULL)) == NULL) goto out; dst_pte = (pt_entry_t *) PHYS_TO_DMAP(VM_PAGE_TO_PHYS(dstmpte)); dst_pte = &dst_pte[pmap_pte_index(addr)]; if (*dst_pte == 0 && pmap_try_insert_pv_entry(dst_pmap, addr, PHYS_TO_VM_PAGE(ptetemp & PG_FRAME), &lock)) { /* * Clear the wired, modified, and * accessed (referenced) bits * during the copy. */ *dst_pte = ptetemp & ~(PG_W | PG_M | PG_A); pmap_resident_count_inc(dst_pmap, 1); } else { SLIST_INIT(&free); if (pmap_unwire_ptp(dst_pmap, addr, dstmpte, &free)) { /* * Although "addr" is not * mapped, paging-structure * caches could nonetheless * have entries that refer to * the freed page table pages. 
					 * Invalidate those entries.
					 */
					pmap_invalidate_page(dst_pmap, addr);
					pmap_free_zero_pages(&free);
				}
				goto out;
			}
			if (dstmpte->wire_count >= srcmpte->wire_count)
				break;
		}
		addr += PAGE_SIZE;
		src_pte++;
	}
}
out:
	if (lock != NULL)
		rw_wunlock(lock);
	PMAP_UNLOCK(src_pmap);
	PMAP_UNLOCK(dst_pmap);
}

/*
 * Zero the specified hardware page.
 */
void
pmap_zero_page(vm_page_t m)
{
	vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));

	pagezero((void *)va);
}

/*
 * Zero an area within a single hardware page.  off and size must not
 * cover an area beyond a single hardware page.
 */
void
pmap_zero_page_area(vm_page_t m, int off, int size)
{
	vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));

	if (off == 0 && size == PAGE_SIZE)
		pagezero((void *)va);
	else
		bzero((char *)va + off, size);
}

/*
 * Copy one specified hardware page to another.
 */
void
pmap_copy_page(vm_page_t msrc, vm_page_t mdst)
{
	vm_offset_t src = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(msrc));
	vm_offset_t dst = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mdst));

	pagecopy((void *)src, (void *)dst);
}

int unmapped_buf_allowed = 1;

void
pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[],
    vm_offset_t b_offset, int xfersize)
{
	void *a_cp, *b_cp;
	vm_page_t pages[2];
	vm_offset_t vaddr[2], a_pg_offset, b_pg_offset;
	int cnt;
	boolean_t mapped;

	while (xfersize > 0) {
		a_pg_offset = a_offset & PAGE_MASK;
		pages[0] = ma[a_offset >> PAGE_SHIFT];
		b_pg_offset = b_offset & PAGE_MASK;
		pages[1] = mb[b_offset >> PAGE_SHIFT];
		cnt = min(xfersize, PAGE_SIZE - a_pg_offset);
		cnt = min(cnt, PAGE_SIZE - b_pg_offset);
		mapped = pmap_map_io_transient(pages, vaddr, 2, FALSE);
		a_cp = (char *)vaddr[0] + a_pg_offset;
		b_cp = (char *)vaddr[1] + b_pg_offset;
		bcopy(a_cp, b_cp, cnt);
		if (__predict_false(mapped))
			pmap_unmap_io_transient(pages, vaddr, 2, FALSE);
		a_offset += cnt;
		b_offset += cnt;
		xfersize -= cnt;
	}
}

/*
 * Returns true if the pmap's pv is one of the first
 * 16 pvs linked to from this page.  This count may
 * be changed upwards or downwards in the future; it
 * is only necessary that true be returned for a small
 * subset of pmaps for proper page aging.
 */
boolean_t
pmap_page_exists_quick(pmap_t pmap, vm_page_t m)
{
	struct md_page *pvh;
	struct rwlock *lock;
	pv_entry_t pv;
	int loops = 0;
	boolean_t rv;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_page_exists_quick: page %p is not managed", m));
	rv = FALSE;
	lock = VM_PAGE_TO_PV_LIST_LOCK(m);
	rw_rlock(lock);
	TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) {
		if (PV_PMAP(pv) == pmap) {
			rv = TRUE;
			break;
		}
		loops++;
		if (loops >= 16)
			break;
	}
	if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) {
		pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m));
		TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) {
			if (PV_PMAP(pv) == pmap) {
				rv = TRUE;
				break;
			}
			loops++;
			if (loops >= 16)
				break;
		}
	}
	rw_runlock(lock);
	return (rv);
}

/*
 * pmap_page_wired_mappings:
 *
 * Return the number of managed mappings to the given physical page
 * that are wired.
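 */

/*
 * Editor's note: pmap_copy_pages() above clamps each chunk so that it
 * never crosses a page boundary on either the source or destination
 * side.  A guarded, self-contained model of that chunking, with
 * hypothetical X_ names and ordinary pointers in place of the
 * transient mappings:
 */
#if 0
#include <stddef.h>
#include <string.h>

#define	X_PAGE_SIZE	4096

static void
x_copy_chunked(char *const apages[], size_t a_off,
    char *const bpages[], size_t b_off, size_t xfersize)
{
	size_t a_pg_off, b_pg_off, cnt;

	while (xfersize > 0) {
		a_pg_off = a_off % X_PAGE_SIZE;
		b_pg_off = b_off % X_PAGE_SIZE;
		/* Clamp the chunk to the end of both current pages. */
		cnt = xfersize;
		if (cnt > X_PAGE_SIZE - a_pg_off)
			cnt = X_PAGE_SIZE - a_pg_off;
		if (cnt > X_PAGE_SIZE - b_pg_off)
			cnt = X_PAGE_SIZE - b_pg_off;
		memcpy(bpages[b_off / X_PAGE_SIZE] + b_pg_off,
		    apages[a_off / X_PAGE_SIZE] + a_pg_off, cnt);
		a_off += cnt;
		b_off += cnt;
		xfersize -= cnt;
	}
}
#endif

/*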
*/ int pmap_page_wired_mappings(vm_page_t m) { struct rwlock *lock; struct md_page *pvh; pmap_t pmap; pt_entry_t *pte; pv_entry_t pv; int count, md_gen, pvh_gen; if ((m->oflags & VPO_UNMANAGED) != 0) return (0); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: count = 0; TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va); if ((*pte & PG_W) != 0) count++; PMAP_UNLOCK(pmap); } if ((m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; pvh_gen = pvh->pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen || pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pde(pmap, pv->pv_va); if ((*pte & PG_W) != 0) count++; PMAP_UNLOCK(pmap); } } rw_runlock(lock); return (count); } /* * Returns TRUE if the given page is mapped individually or as part of * a 2mpage. Otherwise, returns FALSE. */ boolean_t pmap_page_is_mapped(vm_page_t m) { struct rwlock *lock; boolean_t rv; if ((m->oflags & VPO_UNMANAGED) != 0) return (FALSE); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); rv = !TAILQ_EMPTY(&m->md.pv_list) || ((m->flags & PG_FICTITIOUS) == 0 && !TAILQ_EMPTY(&pa_to_pvh(VM_PAGE_TO_PHYS(m))->pv_list)); rw_runlock(lock); return (rv); } /* * Destroy all managed, non-wired mappings in the given user-space * pmap. This pmap cannot be active on any processor besides the * caller. * * This function cannot be applied to the kernel pmap. Moreover, it * is not intended for general use. It is only to be used during * process termination. Consequently, it can be implemented in ways * that make it faster than pmap_remove(). First, it can more quickly * destroy mappings by iterating over the pmap's collection of PV * entries, rather than searching the page table. Second, it doesn't * have to test and clear the page table entries atomically, because * no processor is currently accessing the user address space. In * particular, a page table entry's dirty bit won't change state once * this function starts. */ void pmap_remove_pages(pmap_t pmap) { pd_entry_t ptepde; pt_entry_t *pte, tpte; pt_entry_t PG_M, PG_RW, PG_V; struct spglist free; vm_page_t m, mpte, mt; pv_entry_t pv; struct md_page *pvh; struct pv_chunk *pc, *npc; struct rwlock *lock; int64_t bit; uint64_t inuse, bitmask; int allfree, field, freed, idx; boolean_t superpage; vm_paddr_t pa; /* * Assert that the given pmap is only active on the current * CPU. Unfortunately, we cannot block another CPU from * activating the pmap while this function is executing. 
*/ KASSERT(pmap == PCPU_GET(curpmap), ("non-current pmap %p", pmap)); #ifdef INVARIANTS { cpuset_t other_cpus; other_cpus = all_cpus; critical_enter(); CPU_CLR(PCPU_GET(cpuid), &other_cpus); CPU_AND(&other_cpus, &pmap->pm_active); critical_exit(); KASSERT(CPU_EMPTY(&other_cpus), ("pmap active %p", pmap)); } #endif lock = NULL; PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); SLIST_INIT(&free); PMAP_LOCK(pmap); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { allfree = 1; freed = 0; for (field = 0; field < _NPCM; field++) { inuse = ~pc->pc_map[field] & pc_freemask[field]; while (inuse != 0) { bit = bsfq(inuse); bitmask = 1UL << bit; idx = field * 64 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; pte = pmap_pdpe(pmap, pv->pv_va); ptepde = *pte; pte = pmap_pdpe_to_pde(pte, pv->pv_va); tpte = *pte; if ((tpte & (PG_PS | PG_V)) == PG_V) { superpage = FALSE; ptepde = tpte; pte = (pt_entry_t *)PHYS_TO_DMAP(tpte & PG_FRAME); pte = &pte[pmap_pte_index(pv->pv_va)]; tpte = *pte; } else { /* * Keep track whether 'tpte' is a * superpage explicitly instead of * relying on PG_PS being set. * * This is because PG_PS is numerically * identical to PG_PTE_PAT and thus a * regular page could be mistaken for * a superpage. */ superpage = TRUE; } if ((tpte & PG_V) == 0) { panic("bad pte va %lx pte %lx", pv->pv_va, tpte); } /* * We cannot remove wired pages from a process' mapping at this time */ if (tpte & PG_W) { allfree = 0; continue; } if (superpage) pa = tpte & PG_PS_FRAME; else pa = tpte & PG_FRAME; m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("vm_page_t %p phys_addr mismatch %016jx %016jx", m, (uintmax_t)m->phys_addr, (uintmax_t)tpte)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("pmap_remove_pages: bad tpte %#jx", (uintmax_t)tpte)); pte_clear(pte); /* * Update the vm_page_t clean/reference bits. 
*/ if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if (superpage) { for (mt = m; mt < &m[NBPDR / PAGE_SIZE]; mt++) vm_page_dirty(mt); } else vm_page_dirty(m); } CHANGE_PV_LIST_LOCK_TO_VM_PAGE(&lock, m); /* Mark free */ pc->pc_map[field] |= bitmask; if (superpage) { pmap_resident_count_dec(pmap, NBPDR / PAGE_SIZE); pvh = pa_to_pvh(tpte & PG_PS_FRAME); TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; if (TAILQ_EMPTY(&pvh->pv_list)) { for (mt = m; mt < &m[NBPDR / PAGE_SIZE]; mt++) if ((mt->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } mpte = pmap_lookup_pt_page(pmap, pv->pv_va); if (mpte != NULL) { pmap_remove_pt_page(pmap, mpte); pmap_resident_count_dec(pmap, 1); KASSERT(mpte->wire_count == NPTEPG, ("pmap_remove_pages: pte page wire count error")); mpte->wire_count = 0; pmap_add_delayed_free_list(mpte, &free, FALSE); atomic_subtract_int(&vm_cnt.v_wire_count, 1); } } else { pmap_resident_count_dec(pmap, 1); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if ((m->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } pmap_unuse_pt(pmap, pv->pv_va, ptepde, &free); freed++; } } PV_STAT(atomic_add_long(&pv_entry_frees, freed)); PV_STAT(atomic_add_int(&pv_entry_spare, freed)); PV_STAT(atomic_subtract_long(&pv_entry_count, freed)); if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } if (lock != NULL) rw_wunlock(lock); pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } static boolean_t pmap_page_test_mappings(vm_page_t m, boolean_t accessed, boolean_t modified) { struct rwlock *lock; pv_entry_t pv; struct md_page *pvh; pt_entry_t *pte, mask; pt_entry_t PG_A, PG_M, PG_RW, PG_V; pmap_t pmap; int md_gen, pvh_gen; boolean_t rv; rv = FALSE; lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va); mask = 0; if (modified) { PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); mask |= PG_RW | PG_M; } if (accessed) { PG_A = pmap_accessed_bit(pmap); PG_V = pmap_valid_bit(pmap); mask |= PG_V | PG_A; } rv = (*pte & mask) == mask; PMAP_UNLOCK(pmap); if (rv) goto out; } if ((m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; pvh_gen = pvh->pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen || pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pde(pmap, pv->pv_va); mask = 0; if (modified) { PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); mask |= PG_RW | PG_M; } if (accessed) { PG_A = pmap_accessed_bit(pmap); PG_V = pmap_valid_bit(pmap); mask |= PG_V | PG_A; } rv = (*pte & mask) == mask; PMAP_UNLOCK(pmap); if (rv) goto out; } } out: rw_runlock(lock); return (rv); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. 
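 */

/*
 * Editor's note: pmap_page_test_mappings() above folds the two
 * queries into one mask test: "modified" requires PG_M together with
 * PG_RW, and "accessed" requires PG_A on a valid (PG_V) entry.  A
 * guarded sketch with the x86 bit values as hypothetical X_ names:
 */
#if 0
#include <stdbool.h>
#include <stdint.h>

#define	X_PG_V	0x001ULL
#define	X_PG_RW	0x002ULL
#define	X_PG_A	0x020ULL
#define	X_PG_M	0x040ULL

static bool
x_pte_test(uint64_t pte, bool accessed, bool modified)
{
	uint64_t mask = 0;

	if (modified)
		mask |= X_PG_RW | X_PG_M;
	if (accessed)
		mask |= X_PG_V | X_PG_A;
	/* Every bit in the mask must be set for a positive answer. */
	return ((pte & mask) == mask);
}
#endif

/*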
*/ boolean_t pmap_is_modified(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can have PG_M set. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); return (pmap_page_test_mappings(m, FALSE, TRUE)); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pd_entry_t *pde; pt_entry_t *pte, PG_V; boolean_t rv; PG_V = pmap_valid_bit(pmap); rv = FALSE; PMAP_LOCK(pmap); pde = pmap_pde(pmap, addr); if (pde != NULL && (*pde & (PG_PS | PG_V)) == PG_V) { pte = pmap_pde_to_pte(pde, addr); rv = (*pte & PG_V) == 0; } PMAP_UNLOCK(pmap); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); return (pmap_page_test_mappings(m, TRUE, FALSE)); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { struct md_page *pvh; pmap_t pmap; struct rwlock *lock; pv_entry_t next_pv, pv; pd_entry_t *pde; pt_entry_t oldpte, *pte, PG_M, PG_RW; vm_offset_t va; int pvh_gen, md_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; lock = VM_PAGE_TO_PV_LIST_LOCK(m); pvh = (m->flags & PG_FICTITIOUS) != 0 ? 
&pv_dummy : pa_to_pvh(VM_PAGE_TO_PHYS(m)); retry_pv_loop: rw_wlock(lock); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); rw_wunlock(lock); goto retry_pv_loop; } } PG_RW = pmap_rw_bit(pmap); va = pv->pv_va; pde = pmap_pde(pmap, va); if ((*pde & PG_RW) != 0) (void)pmap_demote_pde_locked(pmap, pde, va, &lock); KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); PMAP_UNLOCK(pmap); } TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); rw_wunlock(lock); goto retry_pv_loop; } } PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_remove_write: found a 2mpage in page %p's pv list", m)); pte = pmap_pde_to_pte(pde, pv->pv_va); retry: oldpte = *pte; if (oldpte & PG_RW) { if (!atomic_cmpset_long(pte, oldpte, oldpte & ~(PG_RW | PG_M))) goto retry; if ((oldpte & PG_M) != 0) vm_page_dirty(m); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } rw_wunlock(lock); vm_page_aflag_clear(m, PGA_WRITEABLE); pmap_delayed_invl_wait(m); } static __inline boolean_t safe_to_clear_referenced(pmap_t pmap, pt_entry_t pte) { if (!pmap_emulate_ad_bits(pmap)) return (TRUE); KASSERT(pmap->pm_type == PT_EPT, ("invalid pm_type %d", pmap->pm_type)); /* * XWR = 010 or 110 will cause an unconditional EPT misconfiguration * so we don't let the referenced (aka EPT_PG_READ) bit to be cleared * if the EPT_PG_WRITE bit is set. */ if ((pte & EPT_PG_WRITE) != 0) return (FALSE); /* * XWR = 100 is allowed only if the PMAP_SUPPORTS_EXEC_ONLY is set. */ if ((pte & EPT_PG_EXECUTE) == 0 || ((pmap->pm_flags & PMAP_SUPPORTS_EXEC_ONLY) != 0)) return (TRUE); else return (FALSE); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. - * * As an optimization, update the page's dirty field if a modified bit is * found while counting reference bits. This opportunistic update can be * performed at low cost and can eliminate the need for some future calls * to pmap_is_modified(). However, since this function stops after * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some * dirty pages. Those dirty pages will only be detected by a future call * to pmap_is_modified(). * * A DI block is not needed within this function, because * invalidations are performed before the PV list lock is * released. 
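 */

/*
 * Editor's note: in the superpage path below, one PG_A bit is shared
 * by 512 4KB pages, so it is cleared only when a cheap hash of the
 * physical page number, the virtual superpage number, and the pmap
 * selects this particular 4KB page.  A guarded sketch of that
 * selection, with hypothetical X_ names:
 */
#if 0
#include <stdbool.h>
#include <stdint.h>

#define	X_PAGE_SHIFT	12
#define	X_PDRSHIFT	21
#define	X_NPTEPG	512

static bool
x_should_clear_ref(uint64_t pa, uint64_t va, uintptr_t pmap_id)
{
	/* Roughly one page out of every X_NPTEPG tests is selected. */
	return ((((pa >> X_PAGE_SHIFT) ^ (va >> X_PDRSHIFT) ^ pmap_id) &
	    (X_NPTEPG - 1)) == 0);
}
#endif

/*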
*/ int pmap_ts_referenced(vm_page_t m) { struct md_page *pvh; pv_entry_t pv, pvf; pmap_t pmap; struct rwlock *lock; pd_entry_t oldpde, *pde; pt_entry_t *pte, PG_A, PG_M, PG_RW; vm_offset_t va; vm_paddr_t pa; int cleared, md_gen, not_cleared, pvh_gen; struct spglist free; boolean_t demoted; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); SLIST_INIT(&free); cleared = 0; pa = VM_PAGE_TO_PHYS(m); lock = PHYS_TO_PV_LIST_LOCK(pa); pvh = (m->flags & PG_FICTITIOUS) != 0 ? &pv_dummy : pa_to_pvh(pa); rw_wlock(lock); retry: not_cleared = 0; if ((pvf = TAILQ_FIRST(&pvh->pv_list)) == NULL) goto small_mappings; pv = pvf; do { if (pvf == NULL) pvf = pv; pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto retry; } } PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); va = pv->pv_va; pde = pmap_pde(pmap, pv->pv_va); oldpde = *pde; if ((oldpde & (PG_M | PG_RW)) == (PG_M | PG_RW)) { /* * Although "oldpde" is mapping a 2MB page, because * this function is called at a 4KB page granularity, * we only update the 4KB page under test. */ vm_page_dirty(m); } - if ((*pde & PG_A) != 0) { + if ((oldpde & PG_A) != 0) { /* * Since this reference bit is shared by 512 4KB * pages, it should not be cleared every time it is * tested. Apply a simple "hash" function on the * physical page number, the virtual superpage number, * and the pmap address to select one 4KB page out of * the 512 on which testing the reference bit will * result in clearing that reference bit. This * function is designed to avoid the selection of the * same 4KB page for every 2MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a * subsequent page fault on a demoted wired mapping, * always leave its reference bit set. Moreover, * since the superpage is wired, the current state of * its reference bit won't affect page replacement. */ if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> PDRSHIFT) ^ (uintptr_t)pmap) & (NPTEPG - 1)) == 0 && - (*pde & PG_W) == 0) { + (oldpde & PG_W) == 0) { if (safe_to_clear_referenced(pmap, oldpde)) { atomic_clear_long(pde, PG_A); pmap_invalidate_page(pmap, pv->pv_va); demoted = FALSE; } else if (pmap_demote_pde_locked(pmap, pde, pv->pv_va, &lock)) { /* * Remove the mapping to a single page * so that a subsequent access may * repromote. Since the underlying * page table page is fully populated, * this removal never frees a page * table page. */ demoted = TRUE; va += VM_PAGE_TO_PHYS(m) - (oldpde & PG_PS_FRAME); pte = pmap_pde_to_pte(pde, va); pmap_remove_pte(pmap, pte, va, *pde, NULL, &lock); pmap_invalidate_page(pmap, va); } else demoted = TRUE; if (demoted) { /* * The superpage mapping was removed * entirely and therefore 'pv' is no * longer valid. */ if (pvf == pv) pvf = NULL; pv = NULL; } cleared++; KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. 
*/ if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; } if (cleared + not_cleared >= PMAP_TS_REFERENCED_MAX) goto out; } while ((pv = TAILQ_FIRST(&pvh->pv_list)) != pvf); small_mappings: if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { if (pvf == NULL) pvf = pv; pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto retry; } } PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_ts_referenced: found a 2mpage in page %p's pv list", m)); pte = pmap_pde_to_pte(pde, pv->pv_va); if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if ((*pte & PG_A) != 0) { if (safe_to_clear_referenced(pmap, *pte)) { atomic_clear_long(pte, PG_A); pmap_invalidate_page(pmap, pv->pv_va); cleared++; } else if ((*pte & PG_W) == 0) { /* * Wired pages cannot be paged out so * doing accessed bit emulation for * them is wasted effort. We do the * hard work for unwired pages only. */ pmap_remove_pte(pmap, pte, pv->pv_va, *pde, &free, &lock); pmap_invalidate_page(pmap, pv->pv_va); cleared++; if (pvf == pv) pvf = NULL; pv = NULL; KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && cleared + not_cleared < PMAP_TS_REFERENCED_MAX); out: rw_wunlock(lock); pmap_free_zero_pages(&free); return (cleared + not_cleared); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { struct rwlock *lock; pml4_entry_t *pml4e; pdp_entry_t *pdpe; pd_entry_t oldpde, *pde; pt_entry_t *pte, PG_A, PG_G, PG_M, PG_RW, PG_V; vm_offset_t va_next; vm_page_t m; boolean_t anychanged; if (advice != MADV_DONTNEED && advice != MADV_FREE) return; /* * A/D bit emulation requires an alternate code path when clearing * the modified and accessed bits below. Since this function is * advisory in nature we skip it entirely for pmaps that require * A/D bit emulation. 
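 */

/*
 * Editor's note: the scan below, like pmap_remove() and
 * pmap_protect(), advances with "va_next = (sva + NBPDR) & ~PDRMASK",
 * falling back to eva when the addition wraps.  A guarded one-liner
 * capturing that idiom, with hypothetical X_ names:
 */
#if 0
#include <stdint.h>

#define	X_NBPDR		(1ULL << 21)
#define	X_PDRMASK	(X_NBPDR - 1)

static uint64_t
x_next_2m_boundary(uint64_t sva, uint64_t eva)
{
	uint64_t va_next;

	va_next = (sva + X_NBPDR) & ~X_PDRMASK;
	/* Clamp to eva if the computation wrapped around. */
	return (va_next < sva ? eva : va_next);
}
#endif

/*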
*/ if (pmap_emulate_ad_bits(pmap)) return; PG_A = pmap_accessed_bit(pmap); PG_G = pmap_global_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); anychanged = FALSE; pmap_delayed_invl_started(); PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { pml4e = pmap_pml4e(pmap, sva); if ((*pml4e & PG_V) == 0) { va_next = (sva + NBPML4) & ~PML4MASK; if (va_next < sva) va_next = eva; continue; } pdpe = pmap_pml4e_to_pdpe(pml4e, sva); if ((*pdpe & PG_V) == 0) { va_next = (sva + NBPDP) & ~PDPMASK; if (va_next < sva) va_next = eva; continue; } va_next = (sva + NBPDR) & ~PDRMASK; if (va_next < sva) va_next = eva; pde = pmap_pdpe_to_pde(pdpe, sva); oldpde = *pde; if ((oldpde & PG_V) == 0) continue; else if ((oldpde & PG_PS) != 0) { if ((oldpde & PG_MANAGED) == 0) continue; lock = NULL; if (!pmap_demote_pde_locked(pmap, pde, sva, &lock)) { if (lock != NULL) rw_wunlock(lock); /* * The large page mapping was destroyed. */ continue; } /* * Unless the page mappings are wired, remove the * mapping to a single page so that a subsequent * access may repromote. Since the underlying page * table page is fully populated, this removal never * frees a page table page. */ if ((oldpde & PG_W) == 0) { pte = pmap_pde_to_pte(pde, sva); KASSERT((*pte & PG_V) != 0, ("pmap_advise: invalid PTE")); pmap_remove_pte(pmap, pte, sva, *pde, NULL, &lock); anychanged = TRUE; } if (lock != NULL) rw_wunlock(lock); } if (va_next > eva) va_next = eva; for (pte = pmap_pde_to_pte(pde, sva); sva != va_next; pte++, sva += PAGE_SIZE) { if ((*pte & (PG_MANAGED | PG_V)) != (PG_MANAGED | PG_V)) continue; else if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if (advice == MADV_DONTNEED) { /* * Future calls to pmap_is_modified() * can be avoided by making the page * dirty now. */ m = PHYS_TO_VM_PAGE(*pte & PG_FRAME); vm_page_dirty(m); } atomic_clear_long(pte, PG_M | PG_A); } else if ((*pte & PG_A) != 0) atomic_clear_long(pte, PG_A); else continue; if ((*pte & PG_G) != 0) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } } if (anychanged) pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); pmap_delayed_invl_finished(); } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { struct md_page *pvh; pmap_t pmap; pv_entry_t next_pv, pv; pd_entry_t oldpde, *pde; pt_entry_t oldpte, *pte, PG_M, PG_RW, PG_V; struct rwlock *lock; vm_offset_t va; int md_gen, pvh_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no PTEs can have PG_M set. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; pvh = (m->flags & PG_FICTITIOUS) != 0 ? 
&pv_dummy : pa_to_pvh(VM_PAGE_TO_PHYS(m)); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_wlock(lock); restart: TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); va = pv->pv_va; pde = pmap_pde(pmap, va); oldpde = *pde; if ((oldpde & PG_RW) != 0) { if (pmap_demote_pde_locked(pmap, pde, va, &lock)) { if ((oldpde & PG_W) == 0) { /* * Write protect the mapping to a * single page so that a subsequent * write access may repromote. */ va += VM_PAGE_TO_PHYS(m) - (oldpde & PG_PS_FRAME); pte = pmap_pde_to_pte(pde, va); oldpte = *pte; if ((oldpte & PG_V) != 0) { while (!atomic_cmpset_long(pte, oldpte, oldpte & ~(PG_M | PG_RW))) oldpte = *pte; vm_page_dirty(m); pmap_invalidate_page(pmap, va); } } } } PMAP_UNLOCK(pmap); } TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } PG_M = pmap_modified_bit(pmap); PG_RW = pmap_rw_bit(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_clear_modify: found" " a 2mpage in page %p's pv list", m)); pte = pmap_pde_to_pte(pde, pv->pv_va); if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { atomic_clear_long(pte, PG_M); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } rw_wunlock(lock); } /* * Miscellaneous support routines follow */ /* Adjust the cache mode for a 4KB page mapped via a PTE. */ static __inline void pmap_pte_attr(pt_entry_t *pte, int cache_bits, int mask) { u_int opte, npte; /* * The cache mode bits are all in the low 32-bits of the * PTE, so we can just spin on updating the low 32-bits. */ do { opte = *(u_int *)pte; npte = opte & ~mask; npte |= cache_bits; } while (npte != opte && !atomic_cmpset_int((u_int *)pte, opte, npte)); } /* Adjust the cache mode for a 2MB page mapped via a PDE. */ static __inline void pmap_pde_attr(pd_entry_t *pde, int cache_bits, int mask) { u_int opde, npde; /* * The cache mode bits are all in the low 32-bits of the * PDE, so we can just spin on updating the low 32-bits. */ do { opde = *(u_int *)pde; npde = opde & ~mask; npde |= cache_bits; } while (npde != opde && !atomic_cmpset_int((u_int *)pde, opde, npde)); } /* * Map a set of physical memory pages into the kernel virtual * address space. Return a pointer to where it is mapped. This * routine is intended to be used for mapping device memory, * NOT real memory. */ void * pmap_mapdev_attr(vm_paddr_t pa, vm_size_t size, int mode) { struct pmap_preinit_mapping *ppim; vm_offset_t va, offset; vm_size_t tmpsize; int i; offset = pa & PAGE_MASK; size = round_page(offset + size); pa = trunc_page(pa); if (!pmap_initialized) { va = 0; for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == 0) { ppim->pa = pa; ppim->sz = size; ppim->mode = mode; ppim->va = virtual_avail; virtual_avail += size; va = ppim->va; break; } } if (va == 0) panic("%s: too many preinit mappings", __func__); } else { /* * If we have a preinit mapping, re-use it. 
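 */

/*
 * The pmap_pte_attr()/pmap_pde_attr() helpers defined a little earlier
 * spin on the low 32 bits of an entry to swap the cache-attribute field.
 * Below is a hedged user-space analogue of that loop: C11 atomics stand
 * in for the kernel's atomic_cmpset_int(), and the entry is modeled as a
 * bare 32-bit word rather than a live PTE.
 */
#include <stdatomic.h>
#include <stdint.h>

static void
sk_set_cache_bits(_Atomic uint32_t *word, uint32_t cache_bits, uint32_t mask)
{
	uint32_t oval, nval;

	oval = atomic_load(word);
	do {
		nval = (oval & ~mask) | cache_bits;
		if (nval == oval)
			return;		/* attribute already in place */
	} while (!atomic_compare_exchange_weak(word, &oval, nval));
}

/*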
*/ for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->pa == pa && ppim->sz == size && ppim->mode == mode) return ((void *)(ppim->va + offset)); } /* * If the specified range of physical addresses fits within * the direct map window, use the direct map. */ if (pa < dmaplimit && pa + size < dmaplimit) { va = PHYS_TO_DMAP(pa); if (!pmap_change_attr(va, size, mode)) return ((void *)(va + offset)); } va = kva_alloc(size); if (va == 0) panic("%s: Couldn't allocate KVA", __func__); } for (tmpsize = 0; tmpsize < size; tmpsize += PAGE_SIZE) pmap_kenter_attr(va + tmpsize, pa + tmpsize, mode); pmap_invalidate_range(kernel_pmap, va, va + tmpsize); pmap_invalidate_cache_range(va, va + tmpsize, FALSE); return ((void *)(va + offset)); } void * pmap_mapdev(vm_paddr_t pa, vm_size_t size) { return (pmap_mapdev_attr(pa, size, PAT_UNCACHEABLE)); } void * pmap_mapbios(vm_paddr_t pa, vm_size_t size) { return (pmap_mapdev_attr(pa, size, PAT_WRITE_BACK)); } void pmap_unmapdev(vm_offset_t va, vm_size_t size) { struct pmap_preinit_mapping *ppim; vm_offset_t offset; int i; /* If we gave a direct map region in pmap_mapdev, do nothing */ if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) return; offset = va & PAGE_MASK; size = round_page(offset + size); va = trunc_page(va); for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == va && ppim->sz == size) { if (pmap_initialized) return; ppim->pa = 0; ppim->va = 0; ppim->sz = 0; ppim->mode = 0; if (va + size == virtual_avail) virtual_avail = va; return; } } if (pmap_initialized) kva_free(va, size); } /* * Tries to demote a 1GB page mapping. */ static boolean_t pmap_demote_pdpe(pmap_t pmap, pdp_entry_t *pdpe, vm_offset_t va) { pdp_entry_t newpdpe, oldpdpe; pd_entry_t *firstpde, newpde, *pde; pt_entry_t PG_A, PG_M, PG_RW, PG_V; vm_paddr_t mpdepa; vm_page_t mpde; PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); PMAP_LOCK_ASSERT(pmap, MA_OWNED); oldpdpe = *pdpe; KASSERT((oldpdpe & (PG_PS | PG_V)) == (PG_PS | PG_V), ("pmap_demote_pdpe: oldpdpe is missing PG_PS and/or PG_V")); if ((mpde = vm_page_alloc(NULL, va >> PDPSHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { CTR2(KTR_PMAP, "pmap_demote_pdpe: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } mpdepa = VM_PAGE_TO_PHYS(mpde); firstpde = (pd_entry_t *)PHYS_TO_DMAP(mpdepa); newpdpe = mpdepa | PG_M | PG_A | (oldpdpe & PG_U) | PG_RW | PG_V; KASSERT((oldpdpe & PG_A) != 0, ("pmap_demote_pdpe: oldpdpe is missing PG_A")); KASSERT((oldpdpe & (PG_M | PG_RW)) != PG_RW, ("pmap_demote_pdpe: oldpdpe is missing PG_M")); newpde = oldpdpe; /* * Initialize the page directory page. */ for (pde = firstpde; pde < firstpde + NPDEPG; pde++) { *pde = newpde; newpde += NBPDR; } /* * Demote the mapping. */ *pdpe = newpdpe; /* * Invalidate a stale recursive mapping of the page directory page. */ pmap_invalidate_page(pmap, (vm_offset_t)vtopde(va)); pmap_pdpe_demotions++; CTR2(KTR_PMAP, "pmap_demote_pdpe: success for va %#lx" " in pmap %p", va, pmap); return (TRUE); } /* * Sets the memory attribute for the specified page. */ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { m->md.pat_mode = ma; /* * If "m" is a normal page, update its direct mapping. This update * can be relied upon to perform any cache operations that are * required for data coherence. 
*/ if ((m->flags & PG_FICTITIOUS) == 0 && pmap_change_attr(PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)), PAGE_SIZE, m->md.pat_mode)) panic("memory attribute change on the direct map failed"); } /* * Changes the specified virtual address range's memory type to that given by * the parameter "mode". The specified virtual address range must be * completely contained within either the direct map or the kernel map. If * the virtual address range is contained within the kernel map, then the * memory type for each of the corresponding ranges of the direct map is also * changed. (The corresponding ranges of the direct map are those ranges that * map the same physical pages as the specified virtual address range.) These * changes to the direct map are necessary because Intel describes the * behavior of their processors as "undefined" if two or more mappings to the * same physical page have different memory types. * * Returns zero if the change completed successfully, and either EINVAL or * ENOMEM if the change failed. Specifically, EINVAL is returned if some part * of the virtual address range was not mapped, and ENOMEM is returned if * there was insufficient memory available to complete the change. In the * latter case, the memory type may have been changed on some part of the * virtual address range or the direct map. */ int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode) { int error; PMAP_LOCK(kernel_pmap); error = pmap_change_attr_locked(va, size, mode); PMAP_UNLOCK(kernel_pmap); return (error); } static int pmap_change_attr_locked(vm_offset_t va, vm_size_t size, int mode) { vm_offset_t base, offset, tmpva; vm_paddr_t pa_start, pa_end, pa_end1; pdp_entry_t *pdpe; pd_entry_t *pde; pt_entry_t *pte; int cache_bits_pte, cache_bits_pde, error; boolean_t changed; PMAP_LOCK_ASSERT(kernel_pmap, MA_OWNED); base = trunc_page(va); offset = va & PAGE_MASK; size = round_page(offset + size); /* * Only supported on kernel virtual addresses, including the direct * map but excluding the recursive map. */ if (base < DMAP_MIN_ADDRESS) return (EINVAL); cache_bits_pde = pmap_cache_bits(kernel_pmap, mode, 1); cache_bits_pte = pmap_cache_bits(kernel_pmap, mode, 0); changed = FALSE; /* * Pages that aren't mapped aren't supported. Also break down 2MB pages * into 4KB pages if required. */ for (tmpva = base; tmpva < base + size; ) { pdpe = pmap_pdpe(kernel_pmap, tmpva); if (pdpe == NULL || *pdpe == 0) return (EINVAL); if (*pdpe & PG_PS) { /* * If the current 1GB page already has the required * memory type, then we need not demote this page. Just * increment tmpva to the next 1GB page frame. */ if ((*pdpe & X86_PG_PDE_CACHE) == cache_bits_pde) { tmpva = trunc_1gpage(tmpva) + NBPDP; continue; } /* * If the current offset aligns with a 1GB page frame * and there is at least 1GB left within the range, then * we need not break down this page into 2MB pages. */ if ((tmpva & PDPMASK) == 0 && tmpva + PDPMASK < base + size) { tmpva += NBPDP; continue; } if (!pmap_demote_pdpe(kernel_pmap, pdpe, tmpva)) return (ENOMEM); } pde = pmap_pdpe_to_pde(pdpe, tmpva); if (*pde == 0) return (EINVAL); if (*pde & PG_PS) { /* * If the current 2MB page already has the required * memory type, then we need not demote this page. Just * increment tmpva to the next 2MB page frame. */ if ((*pde & X86_PG_PDE_CACHE) == cache_bits_pde) { tmpva = trunc_2mpage(tmpva) + NBPDR; continue; } /* * If the current offset aligns with a 2MB page frame * and there is at least 2MB left within the range, then * we need not break down this page into 4KB pages. 
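 */

/*
 * The test that follows is the standard "can we keep the large page?"
 * check: the cursor must sit on a 2MB boundary and at least one full 2MB
 * page must remain in the request.  A small stand-alone sketch of the
 * same decision, with a local mask constant instead of the kernel's
 * PDRMASK:
 */
#include <stdbool.h>
#include <stdint.h>

#define	SK_CA_PDRMASK	((uint64_t)(2 * 1024 * 1024) - 1)	/* 2MB - 1 */

static bool
sk_keep_2m_page(uint64_t tmpva, uint64_t base, uint64_t size)
{

	/*
	 * Aligned on a 2MB frame and at least 2MB left within
	 * [base, base + size): no demotion needed, step by 2MB instead.
	 */
	return ((tmpva & SK_CA_PDRMASK) == 0 &&
	    tmpva + SK_CA_PDRMASK < base + size);
}

/*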
*/ if ((tmpva & PDRMASK) == 0 && tmpva + PDRMASK < base + size) { tmpva += NBPDR; continue; } if (!pmap_demote_pde(kernel_pmap, pde, tmpva)) return (ENOMEM); } pte = pmap_pde_to_pte(pde, tmpva); if (*pte == 0) return (EINVAL); tmpva += PAGE_SIZE; } error = 0; /* * Ok, all the pages exist, so run through them updating their * cache mode if required. */ pa_start = pa_end = 0; for (tmpva = base; tmpva < base + size; ) { pdpe = pmap_pdpe(kernel_pmap, tmpva); if (*pdpe & PG_PS) { if ((*pdpe & X86_PG_PDE_CACHE) != cache_bits_pde) { pmap_pde_attr(pdpe, cache_bits_pde, X86_PG_PDE_CACHE); changed = TRUE; } if (tmpva >= VM_MIN_KERNEL_ADDRESS && (*pdpe & PG_PS_FRAME) < dmaplimit) { if (pa_start == pa_end) { /* Start physical address run. */ pa_start = *pdpe & PG_PS_FRAME; pa_end = pa_start + NBPDP; } else if (pa_end == (*pdpe & PG_PS_FRAME)) pa_end += NBPDP; else { /* Run ended, update direct map. */ error = pmap_change_attr_locked( PHYS_TO_DMAP(pa_start), pa_end - pa_start, mode); if (error != 0) break; /* Start physical address run. */ pa_start = *pdpe & PG_PS_FRAME; pa_end = pa_start + NBPDP; } } tmpva = trunc_1gpage(tmpva) + NBPDP; continue; } pde = pmap_pdpe_to_pde(pdpe, tmpva); if (*pde & PG_PS) { if ((*pde & X86_PG_PDE_CACHE) != cache_bits_pde) { pmap_pde_attr(pde, cache_bits_pde, X86_PG_PDE_CACHE); changed = TRUE; } if (tmpva >= VM_MIN_KERNEL_ADDRESS && (*pde & PG_PS_FRAME) < dmaplimit) { if (pa_start == pa_end) { /* Start physical address run. */ pa_start = *pde & PG_PS_FRAME; pa_end = pa_start + NBPDR; } else if (pa_end == (*pde & PG_PS_FRAME)) pa_end += NBPDR; else { /* Run ended, update direct map. */ error = pmap_change_attr_locked( PHYS_TO_DMAP(pa_start), pa_end - pa_start, mode); if (error != 0) break; /* Start physical address run. */ pa_start = *pde & PG_PS_FRAME; pa_end = pa_start + NBPDR; } } tmpva = trunc_2mpage(tmpva) + NBPDR; } else { pte = pmap_pde_to_pte(pde, tmpva); if ((*pte & X86_PG_PTE_CACHE) != cache_bits_pte) { pmap_pte_attr(pte, cache_bits_pte, X86_PG_PTE_CACHE); changed = TRUE; } if (tmpva >= VM_MIN_KERNEL_ADDRESS && (*pte & PG_PS_FRAME) < dmaplimit) { if (pa_start == pa_end) { /* Start physical address run. */ pa_start = *pte & PG_FRAME; pa_end = pa_start + PAGE_SIZE; } else if (pa_end == (*pte & PG_FRAME)) pa_end += PAGE_SIZE; else { /* Run ended, update direct map. */ error = pmap_change_attr_locked( PHYS_TO_DMAP(pa_start), pa_end - pa_start, mode); if (error != 0) break; /* Start physical address run. */ pa_start = *pte & PG_FRAME; pa_end = pa_start + PAGE_SIZE; } } tmpva += PAGE_SIZE; } } if (error == 0 && pa_start != pa_end && pa_start < dmaplimit) { pa_end1 = MIN(pa_end, dmaplimit); if (pa_start != pa_end1) error = pmap_change_attr_locked(PHYS_TO_DMAP(pa_start), pa_end1 - pa_start, mode); } /* * Flush CPU caches if required to make sure any data isn't cached that * shouldn't be, etc. */ if (changed) { pmap_invalidate_range(kernel_pmap, base, tmpva); pmap_invalidate_cache_range(base, tmpva, FALSE); } return (error); } /* * Demotes any mapping within the direct map region that covers more than the * specified range of physical addresses. This range's size must be a power * of two and its starting address must be a multiple of its size. Since the * demotion does not change any attributes of the mapping, a TLB invalidation * is not mandatory. The caller may, however, request a TLB invalidation. 
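 */

/*
 * A quick user-space illustration of the two preconditions the function
 * below asserts: "len" must be a power of two and "base" a multiple of
 * "len".  Both reduce to simple mask tests, as sketched here (assert(3)
 * stands in for KASSERT).
 */
#include <assert.h>
#include <stdint.h>

static void
sk_check_demote_args(uint64_t base, uint64_t len)
{

	assert(len != 0 && (len & (len - 1)) == 0);	/* power of 2 */
	assert((base & (len - 1)) == 0);	/* base is multiple of len */
}

/*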
*/ void pmap_demote_DMAP(vm_paddr_t base, vm_size_t len, boolean_t invalidate) { pdp_entry_t *pdpe; pd_entry_t *pde; vm_offset_t va; boolean_t changed; if (len == 0) return; KASSERT(powerof2(len), ("pmap_demote_DMAP: len is not a power of 2")); KASSERT((base & (len - 1)) == 0, ("pmap_demote_DMAP: base is not a multiple of len")); if (len < NBPDP && base < dmaplimit) { va = PHYS_TO_DMAP(base); changed = FALSE; PMAP_LOCK(kernel_pmap); pdpe = pmap_pdpe(kernel_pmap, va); if ((*pdpe & X86_PG_V) == 0) panic("pmap_demote_DMAP: invalid PDPE"); if ((*pdpe & PG_PS) != 0) { if (!pmap_demote_pdpe(kernel_pmap, pdpe, va)) panic("pmap_demote_DMAP: PDPE failed"); changed = TRUE; } if (len < NBPDR) { pde = pmap_pdpe_to_pde(pdpe, va); if ((*pde & X86_PG_V) == 0) panic("pmap_demote_DMAP: invalid PDE"); if ((*pde & PG_PS) != 0) { if (!pmap_demote_pde(kernel_pmap, pde, va)) panic("pmap_demote_DMAP: PDE failed"); changed = TRUE; } } if (changed && invalidate) pmap_invalidate_page(kernel_pmap, va); PMAP_UNLOCK(kernel_pmap); } } /* * perform the pmap work for mincore */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { pd_entry_t *pdep; pt_entry_t pte, PG_A, PG_M, PG_RW, PG_V; vm_paddr_t pa; int val; PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); PMAP_LOCK(pmap); retry: pdep = pmap_pde(pmap, addr); if (pdep != NULL && (*pdep & PG_V)) { if (*pdep & PG_PS) { pte = *pdep; /* Compute the physical address of the 4KB page. */ pa = ((*pdep & PG_PS_FRAME) | (addr & PDRMASK)) & PG_FRAME; val = MINCORE_SUPER; } else { pte = *pmap_pde_to_pte(pdep, addr); pa = pte & PG_FRAME; val = 0; } } else { pte = 0; pa = 0; val = 0; } if ((pte & PG_V) != 0) { val |= MINCORE_INCORE; if ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if ((pte & PG_A) != 0) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && (pte & (PG_MANAGED | PG_V)) == (PG_MANAGED | PG_V)) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. 
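 */

/*
 * Before the lock retry that follows, note how the MINCORE_SUPER case
 * earlier in this function recovered the 4KB physical address inside a
 * 2MB mapping: the superpage frame from the PDE is combined with the low
 * bits of the virtual address.  A stand-alone sketch; the mask values
 * below are illustrative stand-ins for PG_PS_FRAME, PDRMASK, and
 * PG_FRAME.
 */
#include <stdint.h>

#define	SK_MC_PDRMASK	((uint64_t)0x1fffff)		/* 2MB - 1 */
#define	SK_MC_PG_FRAME	((uint64_t)0x000ffffffffff000)	/* 4KB frame bits */
#define	SK_MC_PS_FRAME	((uint64_t)0x000fffffffe00000)	/* 2MB frame bits */

static uint64_t
sk_superpage_4k_pa(uint64_t pde, uint64_t addr)
{

	/* 2MB frame from the PDE, offset within the superpage from va. */
	return (((pde & SK_MC_PS_FRAME) | (addr & SK_MC_PDRMASK)) &
	    SK_MC_PG_FRAME);
}

/*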
*/ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } static uint64_t pmap_pcid_alloc(pmap_t pmap, u_int cpuid) { uint32_t gen, new_gen, pcid_next; CRITICAL_ASSERT(curthread); gen = PCPU_GET(pcid_gen); if (pmap->pm_pcids[cpuid].pm_pcid == PMAP_PCID_KERN || pmap->pm_pcids[cpuid].pm_gen == gen) return (CR3_PCID_SAVE); pcid_next = PCPU_GET(pcid_next); KASSERT(pcid_next <= PMAP_PCID_OVERMAX, ("cpu %d pcid_next %#x", cpuid, pcid_next)); if (pcid_next == PMAP_PCID_OVERMAX) { new_gen = gen + 1; if (new_gen == 0) new_gen = 1; PCPU_SET(pcid_gen, new_gen); pcid_next = PMAP_PCID_KERN + 1; } else { new_gen = gen; } pmap->pm_pcids[cpuid].pm_pcid = pcid_next; pmap->pm_pcids[cpuid].pm_gen = new_gen; PCPU_SET(pcid_next, pcid_next + 1); return (0); } void pmap_activate_sw(struct thread *td) { pmap_t oldpmap, pmap; uint64_t cached, cr3; u_int cpuid; oldpmap = PCPU_GET(curpmap); pmap = vmspace_pmap(td->td_proc->p_vmspace); if (oldpmap == pmap) return; cpuid = PCPU_GET(cpuid); #ifdef SMP CPU_SET_ATOMIC(cpuid, &pmap->pm_active); #else CPU_SET(cpuid, &pmap->pm_active); #endif cr3 = rcr3(); if (pmap_pcid_enabled) { cached = pmap_pcid_alloc(pmap, cpuid); KASSERT(pmap->pm_pcids[cpuid].pm_pcid >= 0 && pmap->pm_pcids[cpuid].pm_pcid < PMAP_PCID_OVERMAX, ("pmap %p cpu %d pcid %#x", pmap, cpuid, pmap->pm_pcids[cpuid].pm_pcid)); KASSERT(pmap->pm_pcids[cpuid].pm_pcid != PMAP_PCID_KERN || pmap == kernel_pmap, ("non-kernel pmap thread %p pmap %p cpu %d pcid %#x", td, pmap, cpuid, pmap->pm_pcids[cpuid].pm_pcid)); if (!cached || (cr3 & ~CR3_PCID_MASK) != pmap->pm_cr3) { load_cr3(pmap->pm_cr3 | pmap->pm_pcids[cpuid].pm_pcid | cached); if (cached) PCPU_INC(pm_save_cnt); } } else if (cr3 != pmap->pm_cr3) { load_cr3(pmap->pm_cr3); } PCPU_SET(curpmap, pmap); #ifdef SMP CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active); #else CPU_CLR(cpuid, &oldpmap->pm_active); #endif } void pmap_activate(struct thread *td) { critical_enter(); pmap_activate_sw(td); critical_exit(); } void pmap_sync_icache(pmap_t pm, vm_offset_t va, vm_size_t sz) { } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. 
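 */

/*
 * A compact sketch of the adjustment the function below performs: pick
 * the smallest address at or above "addr" whose offset within a 2MB
 * frame equals the mapping's "superpage_offset", so that object pages
 * and physical superpages line up.  The mask is a local stand-in for
 * PDRMASK.
 */
#include <stdint.h>

#define	SK_SP_PDRMASK	((uint64_t)0x1fffff)	/* 2MB - 1 */

static uint64_t
sk_align_superpage(uint64_t addr, uint64_t superpage_offset)
{

	if ((addr & SK_SP_PDRMASK) == superpage_offset)
		return (addr);		/* already suitably aligned */
	if ((addr & SK_SP_PDRMASK) < superpage_offset)
		return ((addr & ~SK_SP_PDRMASK) + superpage_offset);
	return (((addr + SK_SP_PDRMASK) & ~SK_SP_PDRMASK) +
	    superpage_offset);
}

/*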
*/ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t superpage_offset; if (size < NBPDR) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); superpage_offset = offset & PDRMASK; if (size - ((NBPDR - superpage_offset) & PDRMASK) < NBPDR || (*addr & PDRMASK) == superpage_offset) return; if ((*addr & PDRMASK) < superpage_offset) *addr = (*addr & ~PDRMASK) + superpage_offset; else *addr = ((*addr + PDRMASK) & ~PDRMASK) + superpage_offset; } #ifdef INVARIANTS static unsigned long num_dirty_emulations; SYSCTL_ULONG(_vm_pmap, OID_AUTO, num_dirty_emulations, CTLFLAG_RW, &num_dirty_emulations, 0, NULL); static unsigned long num_accessed_emulations; SYSCTL_ULONG(_vm_pmap, OID_AUTO, num_accessed_emulations, CTLFLAG_RW, &num_accessed_emulations, 0, NULL); static unsigned long num_superpage_accessed_emulations; SYSCTL_ULONG(_vm_pmap, OID_AUTO, num_superpage_accessed_emulations, CTLFLAG_RW, &num_superpage_accessed_emulations, 0, NULL); static unsigned long ad_emulation_superpage_promotions; SYSCTL_ULONG(_vm_pmap, OID_AUTO, ad_emulation_superpage_promotions, CTLFLAG_RW, &ad_emulation_superpage_promotions, 0, NULL); #endif /* INVARIANTS */ int pmap_emulate_accessed_dirty(pmap_t pmap, vm_offset_t va, int ftype) { int rv; struct rwlock *lock; vm_page_t m, mpte; pd_entry_t *pde; pt_entry_t *pte, PG_A, PG_M, PG_RW, PG_V; KASSERT(ftype == VM_PROT_READ || ftype == VM_PROT_WRITE, ("pmap_emulate_accessed_dirty: invalid fault type %d", ftype)); if (!pmap_emulate_ad_bits(pmap)) return (-1); PG_A = pmap_accessed_bit(pmap); PG_M = pmap_modified_bit(pmap); PG_V = pmap_valid_bit(pmap); PG_RW = pmap_rw_bit(pmap); rv = -1; lock = NULL; PMAP_LOCK(pmap); pde = pmap_pde(pmap, va); if (pde == NULL || (*pde & PG_V) == 0) goto done; if ((*pde & PG_PS) != 0) { if (ftype == VM_PROT_READ) { #ifdef INVARIANTS atomic_add_long(&num_superpage_accessed_emulations, 1); #endif *pde |= PG_A; rv = 0; } goto done; } pte = pmap_pde_to_pte(pde, va); if ((*pte & PG_V) == 0) goto done; if (ftype == VM_PROT_WRITE) { if ((*pte & PG_RW) == 0) goto done; /* * Set the modified and accessed bits simultaneously. * * Intel EPT PTEs that do software emulation of A/D bits map * PG_A and PG_M to EPT_PG_READ and EPT_PG_WRITE respectively. * An EPT misconfiguration is triggered if the PTE is writable * but not readable (WR=10). This is avoided by setting PG_A * and PG_M simultaneously. 
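 */

/*
 * The store that follows applies the rule just described: PG_M and PG_A
 * are set in a single operation, so an EPT PTE doing software A/D
 * emulation is never observed writable but not readable.  Sketched in
 * isolation below; the bit positions are placeholders, not the kernel's
 * definitions.
 */
#include <stdint.h>

#define	SK_PG_A	((uint64_t)1 << 5)	/* placeholder "accessed" bit */
#define	SK_PG_M	((uint64_t)1 << 6)	/* placeholder "modified" bit */

static void
sk_mark_write_fault(uint64_t *pte)
{

	/* One store: no writable-but-not-readable intermediate state. */
	*pte |= SK_PG_M | SK_PG_A;
}

/*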
*/ *pte |= PG_M | PG_A; } else { *pte |= PG_A; } /* try to promote the mapping */ if (va < VM_MAXUSER_ADDRESS) mpte = PHYS_TO_VM_PAGE(*pde & PG_FRAME); else mpte = NULL; m = PHYS_TO_VM_PAGE(*pte & PG_FRAME); if ((mpte == NULL || mpte->wire_count == NPTEPG) && pmap_ps_enabled(pmap) && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) { pmap_promote_pde(pmap, pde, va, &lock); #ifdef INVARIANTS atomic_add_long(&ad_emulation_superpage_promotions, 1); #endif } #ifdef INVARIANTS if (ftype == VM_PROT_WRITE) atomic_add_long(&num_dirty_emulations, 1); else atomic_add_long(&num_accessed_emulations, 1); #endif rv = 0; /* success */ done: if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); return (rv); } void pmap_get_mapping(pmap_t pmap, vm_offset_t va, uint64_t *ptr, int *num) { pml4_entry_t *pml4; pdp_entry_t *pdp; pd_entry_t *pde; pt_entry_t *pte, PG_V; int idx; idx = 0; PG_V = pmap_valid_bit(pmap); PMAP_LOCK(pmap); pml4 = pmap_pml4e(pmap, va); ptr[idx++] = *pml4; if ((*pml4 & PG_V) == 0) goto done; pdp = pmap_pml4e_to_pdpe(pml4, va); ptr[idx++] = *pdp; if ((*pdp & PG_V) == 0 || (*pdp & PG_PS) != 0) goto done; pde = pmap_pdpe_to_pde(pdp, va); ptr[idx++] = *pde; if ((*pde & PG_V) == 0 || (*pde & PG_PS) != 0) goto done; pte = pmap_pde_to_pte(pde, va); ptr[idx++] = *pte; done: PMAP_UNLOCK(pmap); *num = idx; } /** * Get the kernel virtual address of a set of physical pages. If there are * physical addresses not covered by the DMAP perform a transient mapping * that will be removed when calling pmap_unmap_io_transient. * * \param page The pages the caller wishes to obtain the virtual * address on the kernel memory map. * \param vaddr On return contains the kernel virtual memory address * of the pages passed in the page parameter. * \param count Number of pages passed in. * \param can_fault TRUE if the thread using the mapped pages can take * page faults, FALSE otherwise. * * \returns TRUE if the caller must call pmap_unmap_io_transient when * finished or FALSE otherwise. * */ boolean_t pmap_map_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; boolean_t needs_mapping; pt_entry_t *pte; int cache_bits, error, i; /* * Allocate any KVA space that we need, this is done in a separate * loop to prevent calling vmem_alloc while pinned. */ needs_mapping = FALSE; for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (__predict_false(paddr >= dmaplimit)) { error = vmem_alloc(kernel_arena, PAGE_SIZE, M_BESTFIT | M_WAITOK, &vaddr[i]); KASSERT(error == 0, ("vmem_alloc failed: %d", error)); needs_mapping = TRUE; } else { vaddr[i] = PHYS_TO_DMAP(paddr); } } /* Exit early if everything is covered by the DMAP */ if (!needs_mapping) return (FALSE); /* * NB: The sequence of updating a page table followed by accesses * to the corresponding pages used in the !DMAP case is subject to * the situation described in the "AMD64 Architecture Programmer's * Manual Volume 2: System Programming" rev. 3.23, "7.3.1 Special * Coherency Considerations". Therefore, issuing the INVLPG right * after modifying the PTE bits is crucial. */ if (!can_fault) sched_pin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (paddr >= dmaplimit) { if (can_fault) { /* * Slow path, since we can get page faults * while mappings are active don't pin the * thread to the CPU and instead add a global * mapping visible to all CPUs. 
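 */

/*
 * A user-space caricature of the decision this loop is making: pages
 * whose physical addresses fall under the direct-map limit are addressed
 * through the DMAP for free, while anything above it needs a transient
 * KVA mapping that must be torn down later.  "dmaplimit" here is a
 * parameter standing in for the kernel global.
 */
#include <stdint.h>

static int
sk_plan_mappings(const uint64_t paddr[], int count, uint64_t dmaplimit)
{
	int i, transients;

	transients = 0;
	for (i = 0; i < count; i++)
		if (paddr[i] >= dmaplimit)	/* not covered by DMAP */
			transients++;
	return (transients);
}

/*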
*/ pmap_qenter(vaddr[i], &page[i], 1); } else { pte = vtopte(vaddr[i]); cache_bits = pmap_cache_bits(kernel_pmap, page[i]->md.pat_mode, 0); pte_store(pte, paddr | X86_PG_RW | X86_PG_V | cache_bits); invlpg(vaddr[i]); } } } return (needs_mapping); } void pmap_unmap_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; int i; if (!can_fault) sched_unpin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (paddr >= dmaplimit) { if (can_fault) pmap_qremove(vaddr[i], 1); vmem_free(kernel_arena, vaddr[i], PAGE_SIZE); } } } vm_offset_t pmap_quick_enter_page(vm_page_t m) { vm_paddr_t paddr; paddr = VM_PAGE_TO_PHYS(m); if (paddr < dmaplimit) return (PHYS_TO_DMAP(paddr)); mtx_lock_spin(&qframe_mtx); KASSERT(*vtopte(qframe) == 0, ("qframe busy")); pte_store(vtopte(qframe), paddr | X86_PG_RW | X86_PG_V | X86_PG_A | X86_PG_M | pmap_cache_bits(kernel_pmap, m->md.pat_mode, 0)); return (qframe); } void pmap_quick_remove_page(vm_offset_t addr) { if (addr != qframe) return; pte_store(vtopte(qframe), 0); invlpg(qframe); mtx_unlock_spin(&qframe_mtx); } #include "opt_ddb.h" #ifdef DDB #include DB_SHOW_COMMAND(pte, pmap_print_pte) { pmap_t pmap; pml4_entry_t *pml4; pdp_entry_t *pdp; pd_entry_t *pde; pt_entry_t *pte, PG_V; vm_offset_t va; if (have_addr) { va = (vm_offset_t)addr; pmap = PCPU_GET(curpmap); /* XXX */ } else { db_printf("show pte addr\n"); return; } PG_V = pmap_valid_bit(pmap); pml4 = pmap_pml4e(pmap, va); db_printf("VA %#016lx pml4e %#016lx", va, *pml4); if ((*pml4 & PG_V) == 0) { db_printf("\n"); return; } pdp = pmap_pml4e_to_pdpe(pml4, va); db_printf(" pdpe %#016lx", *pdp); if ((*pdp & PG_V) == 0 || (*pdp & PG_PS) != 0) { db_printf("\n"); return; } pde = pmap_pdpe_to_pde(pdp, va); db_printf(" pde %#016lx", *pde); if ((*pde & PG_V) == 0 || (*pde & PG_PS) != 0) { db_printf("\n"); return; } pte = pmap_pde_to_pte(pde, va); db_printf(" pte %#016lx\n", *pte); } DB_SHOW_COMMAND(phys2dmap, pmap_phys2dmap) { vm_paddr_t a; if (have_addr) { a = (vm_paddr_t)addr; db_printf("0x%jx\n", (uintmax_t)PHYS_TO_DMAP(a)); } else { db_printf("show phys2dmap addr\n"); } } #endif Index: projects/clang390-import/sys/arm/arm/pmap-v6.c =================================================================== --- projects/clang390-import/sys/arm/arm/pmap-v6.c (revision 305686) +++ projects/clang390-import/sys/arm/arm/pmap-v6.c (revision 305687) @@ -1,6806 +1,6800 @@ /*- * Copyright (c) 1991 Regents of the University of California. * Copyright (c) 1994 John S. Dyson * Copyright (c) 1994 David Greenman * Copyright (c) 2005-2010 Alan L. Cox * Copyright (c) 2014-2016 Svatopluk Kraus * Copyright (c) 2014-2016 Michal Meloun * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduced protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct. 
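 */

/*
 * Purely illustrative model of the "delay invalidate ... until actually
 * necessary" strategy described above: record that a flush is owed and
 * perform one combined invalidation later, instead of paying for every
 * mapping change immediately.  Nothing below is kernel API; it only
 * sketches the idea in a single-threaded setting.
 */
#include <stdbool.h>

static bool sk_flush_pending;

static void
sk_mapping_changed(void)
{

	sk_flush_pending = true;	/* defer the expensive flush */
}

static void
sk_before_map_is_used(void (*flush_all)(void))
{

	if (sk_flush_pending) {
		flush_all();		/* one batched invalidation */
		sk_flush_pending = false;
	}
}

/*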
*/ #include "opt_vm.h" #include "opt_pmap.h" #include "opt_ddb.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #else #include #endif #ifdef DDB #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #endif #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif #ifndef DIAGNOSTIC #define PMAP_INLINE __inline #else #define PMAP_INLINE #endif #ifdef PMAP_DEBUG static void pmap_zero_page_check(vm_page_t m); void pmap_debug(int level); int pmap_pid_dump(int pid); #define PDEBUG(_lev_,_stat_) \ if (pmap_debug_level >= (_lev_)) \ ((_stat_)) #define dprintf printf int pmap_debug_level = 1; #else /* PMAP_DEBUG */ #define PDEBUG(_lev_,_stat_) /* Nothing */ #define dprintf(x, arg...) #endif /* PMAP_DEBUG */ /* * Level 2 page tables map definion ('max' is excluded). */ #define PT2V_MIN_ADDRESS ((vm_offset_t)PT2MAP) #define PT2V_MAX_ADDRESS ((vm_offset_t)PT2MAP + PT2MAP_SIZE) #define UPT2V_MIN_ADDRESS ((vm_offset_t)PT2MAP) #define UPT2V_MAX_ADDRESS \ ((vm_offset_t)(PT2MAP + (KERNBASE >> PT2MAP_SHIFT))) /* * Promotion to a 1MB (PTE1) page mapping requires that the corresponding * 4KB (PTE2) page mappings have identical settings for the following fields: */ #define PTE2_PROMOTE (PTE2_V | PTE2_A | PTE2_NM | PTE2_S | PTE2_NG | \ PTE2_NX | PTE2_RO | PTE2_U | PTE2_W | \ PTE2_ATTR_MASK) #define PTE1_PROMOTE (PTE1_V | PTE1_A | PTE1_NM | PTE1_S | PTE1_NG | \ PTE1_NX | PTE1_RO | PTE1_U | PTE1_W | \ PTE1_ATTR_MASK) #define ATTR_TO_L1(l2_attr) ((((l2_attr) & L2_TEX0) ? L1_S_TEX0 : 0) | \ (((l2_attr) & L2_C) ? L1_S_C : 0) | \ (((l2_attr) & L2_B) ? L1_S_B : 0) | \ (((l2_attr) & PTE2_A) ? PTE1_A : 0) | \ (((l2_attr) & PTE2_NM) ? PTE1_NM : 0) | \ (((l2_attr) & PTE2_S) ? PTE1_S : 0) | \ (((l2_attr) & PTE2_NG) ? PTE1_NG : 0) | \ (((l2_attr) & PTE2_NX) ? PTE1_NX : 0) | \ (((l2_attr) & PTE2_RO) ? PTE1_RO : 0) | \ (((l2_attr) & PTE2_U) ? PTE1_U : 0) | \ (((l2_attr) & PTE2_W) ? PTE1_W : 0)) #define ATTR_TO_L2(l1_attr) ((((l1_attr) & L1_S_TEX0) ? L2_TEX0 : 0) | \ (((l1_attr) & L1_S_C) ? L2_C : 0) | \ (((l1_attr) & L1_S_B) ? L2_B : 0) | \ (((l1_attr) & PTE1_A) ? PTE2_A : 0) | \ (((l1_attr) & PTE1_NM) ? PTE2_NM : 0) | \ (((l1_attr) & PTE1_S) ? PTE2_S : 0) | \ (((l1_attr) & PTE1_NG) ? PTE2_NG : 0) | \ (((l1_attr) & PTE1_NX) ? PTE2_NX : 0) | \ (((l1_attr) & PTE1_RO) ? PTE2_RO : 0) | \ (((l1_attr) & PTE1_U) ? PTE2_U : 0) | \ (((l1_attr) & PTE1_W) ? PTE2_W : 0)) /* * PTE2 descriptors creation macros. */ #define PTE2_ATTR_DEFAULT vm_memattr_to_pte2(VM_MEMATTR_DEFAULT) #define PTE2_ATTR_PT vm_memattr_to_pte2(pt_memattr) #define PTE2_KPT(pa) PTE2_KERN(pa, PTE2_AP_KRW, PTE2_ATTR_PT) #define PTE2_KPT_NG(pa) PTE2_KERN_NG(pa, PTE2_AP_KRW, PTE2_ATTR_PT) #define PTE2_KRW(pa) PTE2_KERN(pa, PTE2_AP_KRW, PTE2_ATTR_DEFAULT) #define PTE2_KRO(pa) PTE2_KERN(pa, PTE2_AP_KR, PTE2_ATTR_DEFAULT) #define PV_STATS #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif /* * The boot_pt1 is used temporary in very early boot stage as L1 page table. 
* We can init many things with no memory allocation thanks to its static * allocation and this brings two main advantages: * (1) other cores can be started very simply, * (2) various boot loaders can be supported as its arguments can be processed * in virtual address space and can be moved to safe location before * first allocation happened. * Only disadvantage is that boot_pt1 is used only in very early boot stage. * However, the table is uninitialized and so lays in bss. Therefore kernel * image size is not influenced. * * QQQ: In the future, maybe, boot_pt1 can be used for soft reset and * CPU suspend/resume game. */ extern pt1_entry_t boot_pt1[]; vm_paddr_t base_pt1; pt1_entry_t *kern_pt1; pt2_entry_t *kern_pt2tab; pt2_entry_t *PT2MAP; static uint32_t ttb_flags; static vm_memattr_t pt_memattr; ttb_entry_t pmap_kern_ttb; struct pmap kernel_pmap_store; LIST_HEAD(pmaplist, pmap); static struct pmaplist allpmaps; static struct mtx allpmaps_lock; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ static vm_offset_t kernel_vm_end_new; vm_offset_t kernel_vm_end = KERNBASE + NKPT2PG * NPT2_IN_PG * PTE1_SIZE; vm_offset_t vm_max_kernel_address; vm_paddr_t kernel_l1pa; static struct rwlock __aligned(CACHE_LINE_SIZE) pvh_global_lock; /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0; static struct md_page *pv_table; /* XXX: Is it used only the list in md_page? */ static int shpgperproc = PMAP_SHPGPERPROC; struct pv_chunk *pv_chunkbase; /* KVA block for pv_chunks */ int pv_maxchunks; /* How many chunks we have KVA for */ vm_offset_t pv_vafree; /* freelist stored in the PTE */ vm_paddr_t first_managed_pa; #define pa_to_pvh(pa) (&pv_table[pte1_index(pa - first_managed_pa)]) /* * All those kernel PT submaps that BSD is so fond of */ struct sysmaps { struct mtx lock; pt2_entry_t *CMAP1; pt2_entry_t *CMAP2; pt2_entry_t *CMAP3; caddr_t CADDR1; caddr_t CADDR2; caddr_t CADDR3; }; static struct sysmaps sysmaps_pcpu[MAXCPU]; caddr_t _tmppt = 0; struct msgbuf *msgbufp = NULL; /* XXX move it to machdep.c */ /* * Crashdump maps. */ static caddr_t crashdumpmap; static pt2_entry_t *PMAP1 = NULL, *PMAP2; static pt2_entry_t *PADDR1 = NULL, *PADDR2; #ifdef DDB static pt2_entry_t *PMAP3; static pt2_entry_t *PADDR3; static int PMAP3cpu __unused; /* for SMP only */ #endif #ifdef SMP static int PMAP1cpu; static int PMAP1changedcpu; SYSCTL_INT(_debug, OID_AUTO, PMAP1changedcpu, CTLFLAG_RD, &PMAP1changedcpu, 0, "Number of times pmap_pte2_quick changed CPU with same PMAP1"); #endif static int PMAP1changed; SYSCTL_INT(_debug, OID_AUTO, PMAP1changed, CTLFLAG_RD, &PMAP1changed, 0, "Number of times pmap_pte2_quick changed PMAP1"); static int PMAP1unchanged; SYSCTL_INT(_debug, OID_AUTO, PMAP1unchanged, CTLFLAG_RD, &PMAP1unchanged, 0, "Number of times pmap_pte2_quick didn't change PMAP1"); static struct mtx PMAP2mutex; static __inline void pt2_wirecount_init(vm_page_t m); static boolean_t pmap_demote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va); void cache_icache_sync_fresh(vm_offset_t va, vm_paddr_t pa, vm_size_t size); /* * Function to set the debug level of the pmap code. 
*/ #ifdef PMAP_DEBUG void pmap_debug(int level) { pmap_debug_level = level; dprintf("pmap_debug: level=%d\n", pmap_debug_level); } #endif /* PMAP_DEBUG */ /* * This table must corespond with memory attribute configuration in vm.h. * First entry is used for normal system mapping. * * Device memory is always marked as shared. * Normal memory is shared only in SMP . * Not outer shareable bits are not used yet. * Class 6 cannot be used on ARM11. */ #define TEXDEF_TYPE_SHIFT 0 #define TEXDEF_TYPE_MASK 0x3 #define TEXDEF_INNER_SHIFT 2 #define TEXDEF_INNER_MASK 0x3 #define TEXDEF_OUTER_SHIFT 4 #define TEXDEF_OUTER_MASK 0x3 #define TEXDEF_NOS_SHIFT 6 #define TEXDEF_NOS_MASK 0x1 #define TEX(t, i, o, s) \ ((t) << TEXDEF_TYPE_SHIFT) | \ ((i) << TEXDEF_INNER_SHIFT) | \ ((o) << TEXDEF_OUTER_SHIFT | \ ((s) << TEXDEF_NOS_SHIFT)) static uint32_t tex_class[8] = { /* type inner cache outer cache */ TEX(PRRR_MEM, NMRR_WB_WA, NMRR_WB_WA, 0), /* 0 - ATTR_WB_WA */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 1 - ATTR_NOCACHE */ TEX(PRRR_DEV, NMRR_NC, NMRR_NC, 0), /* 2 - ATTR_DEVICE */ TEX(PRRR_SO, NMRR_NC, NMRR_NC, 0), /* 3 - ATTR_SO */ TEX(PRRR_MEM, NMRR_WT, NMRR_WT, 0), /* 4 - ATTR_WT */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 5 - NOT USED YET */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 6 - NOT USED YET */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 7 - NOT USED YET */ }; #undef TEX static uint32_t pte2_attr_tab[8] = { PTE2_ATTR_WB_WA, /* 0 - VM_MEMATTR_WB_WA */ PTE2_ATTR_NOCACHE, /* 1 - VM_MEMATTR_NOCACHE */ PTE2_ATTR_DEVICE, /* 2 - VM_MEMATTR_DEVICE */ PTE2_ATTR_SO, /* 3 - VM_MEMATTR_SO */ PTE2_ATTR_WT, /* 4 - VM_MEMATTR_WRITE_THROUGH */ 0, /* 5 - NOT USED YET */ 0, /* 6 - NOT USED YET */ 0 /* 7 - NOT USED YET */ }; CTASSERT(VM_MEMATTR_WB_WA == 0); CTASSERT(VM_MEMATTR_NOCACHE == 1); CTASSERT(VM_MEMATTR_DEVICE == 2); CTASSERT(VM_MEMATTR_SO == 3); CTASSERT(VM_MEMATTR_WRITE_THROUGH == 4); static inline uint32_t vm_memattr_to_pte2(vm_memattr_t ma) { KASSERT((u_int)ma < 5, ("%s: bad vm_memattr_t %d", __func__, ma)); return (pte2_attr_tab[(u_int)ma]); } static inline uint32_t vm_page_pte2_attr(vm_page_t m) { return (vm_memattr_to_pte2(m->md.pat_mode)); } /* * Convert TEX definition entry to TTB flags. */ static uint32_t encode_ttb_flags(int idx) { uint32_t inner, outer, nos, reg; inner = (tex_class[idx] >> TEXDEF_INNER_SHIFT) & TEXDEF_INNER_MASK; outer = (tex_class[idx] >> TEXDEF_OUTER_SHIFT) & TEXDEF_OUTER_MASK; nos = (tex_class[idx] >> TEXDEF_NOS_SHIFT) & TEXDEF_NOS_MASK; reg = nos << 5; reg |= outer << 3; if (cpuinfo.coherent_walk) reg |= (inner & 0x1) << 6; reg |= (inner & 0x2) >> 1; #ifdef SMP reg |= 1 << 1; #endif return reg; } /* * Set TEX remapping registers in current CPU. */ void pmap_set_tex(void) { uint32_t prrr, nmrr; uint32_t type, inner, outer, nos; int i; #ifdef PMAP_PTE_NOCACHE /* XXX fixme */ if (cpuinfo.coherent_walk) { pt_memattr = VM_MEMATTR_WB_WA; ttb_flags = encode_ttb_flags(0); } else { pt_memattr = VM_MEMATTR_NOCACHE; ttb_flags = encode_ttb_flags(1); } #else pt_memattr = VM_MEMATTR_WB_WA; ttb_flags = encode_ttb_flags(0); #endif prrr = 0; nmrr = 0; /* Build remapping register from TEX classes. 
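 */

/*
 * The loop below packs the eight TEX class descriptors into the PRRR and
 * NMRR register images: two memory-type bits per class in PRRR, the
 * not-outer-shareable bit per class at bits 24..31, and two inner/outer
 * cache bits per class in the low/high halves of NMRR.  A stand-alone
 * sketch of the same packing, with the field extraction simplified to
 * direct array parameters:
 */
#include <stdint.h>

static void
sk_pack_tex(const uint8_t type[8], const uint8_t inner[8],
    const uint8_t outer[8], const uint8_t nos[8],
    uint32_t *prrr, uint32_t *nmrr)
{
	int i;

	*prrr = 0;
	*nmrr = 0;
	for (i = 0; i < 8; i++) {
		*prrr |= (uint32_t)(type[i] & 0x3) << (i * 2);
		*prrr |= (uint32_t)(nos[i] & 0x1) << (i + 24);
		*nmrr |= (uint32_t)(inner[i] & 0x3) << (i * 2);
		*nmrr |= (uint32_t)(outer[i] & 0x3) << (i * 2 + 16);
	}
}

/*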
*/ for (i = 0; i < 8; i++) { type = (tex_class[i] >> TEXDEF_TYPE_SHIFT) & TEXDEF_TYPE_MASK; inner = (tex_class[i] >> TEXDEF_INNER_SHIFT) & TEXDEF_INNER_MASK; outer = (tex_class[i] >> TEXDEF_OUTER_SHIFT) & TEXDEF_OUTER_MASK; nos = (tex_class[i] >> TEXDEF_NOS_SHIFT) & TEXDEF_NOS_MASK; prrr |= type << (i * 2); prrr |= nos << (i + 24); nmrr |= inner << (i * 2); nmrr |= outer << (i * 2 + 16); } /* Add shareable bits for device memory. */ prrr |= PRRR_DS0 | PRRR_DS1; /* Add shareable bits for normal memory in SMP case. */ #ifdef SMP prrr |= PRRR_NS1; #endif cp15_prrr_set(prrr); cp15_nmrr_set(nmrr); /* Caches are disabled, so full TLB flush should be enough. */ tlb_flush_all_local(); } /* * KERNBASE must be multiple of NPT2_IN_PG * PTE1_SIZE. In other words, * KERNBASE is mapped by first L2 page table in L2 page table page. It * meets same constrain due to PT2MAP being placed just under KERNBASE. */ CTASSERT((KERNBASE & (NPT2_IN_PG * PTE1_SIZE - 1)) == 0); CTASSERT((KERNBASE - VM_MAXUSER_ADDRESS) >= PT2MAP_SIZE); /* * In crazy dreams, PAGE_SIZE could be a multiple of PTE2_SIZE in general. * For now, anyhow, the following check must be fulfilled. */ CTASSERT(PAGE_SIZE == PTE2_SIZE); /* * We don't want to mess up MI code with all MMU and PMAP definitions, * so some things, which depend on other ones, are defined independently. * Now, it is time to check that we don't screw up something. */ CTASSERT(PDRSHIFT == PTE1_SHIFT); /* * Check L1 and L2 page table entries definitions consistency. */ CTASSERT(NB_IN_PT1 == (sizeof(pt1_entry_t) * NPTE1_IN_PT1)); CTASSERT(NB_IN_PT2 == (sizeof(pt2_entry_t) * NPTE2_IN_PT2)); /* * Check L2 page tables page consistency. */ CTASSERT(PAGE_SIZE == (NPT2_IN_PG * NB_IN_PT2)); CTASSERT((1 << PT2PG_SHIFT) == NPT2_IN_PG); /* * Check PT2TAB consistency. * PT2TAB_ENTRIES is defined as a division of NPTE1_IN_PT1 by NPT2_IN_PG. * This should be done without remainder. */ CTASSERT(NPTE1_IN_PT1 == (PT2TAB_ENTRIES * NPT2_IN_PG)); /* * A PT2MAP magic. * * All level 2 page tables (PT2s) are mapped continuously and accordingly * into PT2MAP address space. As PT2 size is less than PAGE_SIZE, this can * be done only if PAGE_SIZE is a multiple of PT2 size. All PT2s in one page * must be used together, but not necessary at once. The first PT2 in a page * must map things on correctly aligned address and the others must follow * in right order. */ #define NB_IN_PT2TAB (PT2TAB_ENTRIES * sizeof(pt2_entry_t)) #define NPT2_IN_PT2TAB (NB_IN_PT2TAB / NB_IN_PT2) #define NPG_IN_PT2TAB (NB_IN_PT2TAB / PAGE_SIZE) /* * Check PT2TAB consistency. * NPT2_IN_PT2TAB is defined as a division of NB_IN_PT2TAB by NB_IN_PT2. * NPG_IN_PT2TAB is defined as a division of NB_IN_PT2TAB by PAGE_SIZE. * The both should be done without remainder. */ CTASSERT(NB_IN_PT2TAB == (NPT2_IN_PT2TAB * NB_IN_PT2)); CTASSERT(NB_IN_PT2TAB == (NPG_IN_PT2TAB * PAGE_SIZE)); /* * The implementation was made general, however, with the assumption * bellow in mind. In case of another value of NPG_IN_PT2TAB, * the code should be once more rechecked. */ CTASSERT(NPG_IN_PT2TAB == 1); /* * Get offset of PT2 in a page * associated with given PT1 index. */ static __inline u_int page_pt2off(u_int pt1_idx) { return ((pt1_idx & PT2PG_MASK) * NB_IN_PT2); } /* * Get physical address of PT2 * associated with given PT2s page and PT1 index. */ static __inline vm_paddr_t page_pt2pa(vm_paddr_t pgpa, u_int pt1_idx) { return (pgpa + page_pt2off(pt1_idx)); } /* * Get first entry of PT2 * associated with given PT2s page and PT1 index. 
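 */

/*
 * Illustration of the offset arithmetic used by page_pt2off() above and
 * page_pt2() below: several small L2 tables share one 4KB page, so the
 * PT1 index selects a table within the page by masking and scaling.  The
 * sizes below assume the usual ARM short-descriptor layout (1KB coarse
 * tables, so four per 4KB page); they are stated here as assumptions,
 * not taken from the kernel headers.
 */
#include <stdint.h>

#define	SK_NB_IN_PT2	1024u			/* 256 entries * 4 bytes */
#define	SK_NPT2_IN_PG	(4096u / SK_NB_IN_PT2)	/* 4 tables per page */
#define	SK_PT2PG_MASK	(SK_NPT2_IN_PG - 1)

static uint32_t
sk_page_pt2off(uint32_t pt1_idx)
{

	/* Which of the tables within the page, times the table size. */
	return ((pt1_idx & SK_PT2PG_MASK) * SK_NB_IN_PT2);
}

/*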
*/ static __inline pt2_entry_t * page_pt2(vm_offset_t pgva, u_int pt1_idx) { return ((pt2_entry_t *)(pgva + page_pt2off(pt1_idx))); } /* * Get virtual address of PT2s page (mapped in PT2MAP) * which holds PT2 which holds entry which maps given virtual address. */ static __inline vm_offset_t pt2map_pt2pg(vm_offset_t va) { va &= ~(NPT2_IN_PG * PTE1_SIZE - 1); return ((vm_offset_t)pt2map_entry(va)); } /***************************************************************************** * * THREE pmap initialization milestones exist: * * locore.S * -> fundamental init (including MMU) in ASM * * initarm() * -> fundamental init continues in C * -> first available physical address is known * * pmap_bootstrap_prepare() -> FIRST PMAP MILESTONE (first epoch begins) * -> basic (safe) interface for physical address allocation is made * -> basic (safe) interface for virtual mapping is made * -> limited not SMP coherent work is possible * * -> more fundamental init continues in C * -> locks and some more things are available * -> all fundamental allocations and mappings are done * * pmap_bootstrap() -> SECOND PMAP MILESTONE (second epoch begins) * -> phys_avail[] and virtual_avail is set * -> control is passed to vm subsystem * -> physical and virtual address allocation are off limit * -> low level mapping functions, some SMP coherent, * are available, which cannot be used before vm subsystem * is being inited * * mi_startup() * -> vm subsystem is being inited * * pmap_init() -> THIRD PMAP MILESTONE (third epoch begins) * -> pmap is fully inited * *****************************************************************************/ /***************************************************************************** * * PMAP first stage initialization and utility functions * for pre-bootstrap epoch. * * After pmap_bootstrap_prepare() is called, the following functions * can be used: * * (1) strictly only for this stage functions for physical page allocations, * virtual space allocations, and mappings: * * vm_paddr_t pmap_preboot_get_pages(u_int num); * void pmap_preboot_map_pages(vm_paddr_t pa, vm_offset_t va, u_int num); * vm_offset_t pmap_preboot_reserve_pages(u_int num); * vm_offset_t pmap_preboot_get_vpages(u_int num); * void pmap_preboot_map_attr(vm_paddr_t pa, vm_offset_t va, vm_size_t size, * vm_prot_t prot, vm_memattr_t attr); * * (2) for all stages: * * vm_paddr_t pmap_kextract(vm_offset_t va); * * NOTE: This is not SMP coherent stage. * *****************************************************************************/ #define KERNEL_P2V(pa) \ ((vm_offset_t)((pa) - arm_physmem_kernaddr + KERNVIRTADDR)) #define KERNEL_V2P(va) \ ((vm_paddr_t)((va) - KERNVIRTADDR + arm_physmem_kernaddr)) static vm_paddr_t last_paddr; /* * Pre-bootstrap epoch page allocator. */ vm_paddr_t pmap_preboot_get_pages(u_int num) { vm_paddr_t ret; ret = last_paddr; last_paddr += num * PAGE_SIZE; return (ret); } /* * The fundamental initialization of PMAP stuff. * * Some things already happened in locore.S and some things could happen * before pmap_bootstrap_prepare() is called, so let's recall what is done: * 1. Caches are disabled. * 2. We are running on virtual addresses already with 'boot_pt1' * as L1 page table. * 3. So far, all virtual addresses can be converted to physical ones and * vice versa by the following macros: * KERNEL_P2V(pa) .... physical to virtual ones, * KERNEL_V2P(va) .... virtual to physical ones. * * What is done herein: * 1. The 'boot_pt1' is replaced by real kernel L1 page table 'kern_pt1'. * 2. 
PT2MAP magic is brought to live. * 3. Basic preboot functions for page allocations and mappings can be used. * 4. Everything is prepared for L1 cache enabling. * * Variations: * 1. To use second TTB register, so kernel and users page tables will be * separated. This way process forking - pmap_pinit() - could be faster, * it saves physical pages and KVA per a process, and it's simple change. * However, it will lead, due to hardware matter, to the following: * (a) 2G space for kernel and 2G space for users. * (b) 1G space for kernel in low addresses and 3G for users above it. * A question is: Is the case (b) really an option? Note that case (b) * does save neither physical memory and KVA. */ void pmap_bootstrap_prepare(vm_paddr_t last) { vm_paddr_t pt2pg_pa, pt2tab_pa, pa, size; vm_offset_t pt2pg_va; pt1_entry_t *pte1p; pt2_entry_t *pte2p; u_int i; uint32_t actlr_mask, actlr_set, l1_attr; /* * Now, we are going to make real kernel mapping. Note that we are * already running on some mapping made in locore.S and we expect * that it's large enough to ensure nofault access to physical memory * allocated herein before switch. * * As kernel image and everything needed before are and will be mapped * by section mappings, we align last physical address to PTE1_SIZE. */ last_paddr = pte1_roundup(last); /* * Allocate and zero page(s) for kernel L1 page table. * * Note that it's first allocation on space which was PTE1_SIZE * aligned and as such base_pt1 is aligned to NB_IN_PT1 too. */ base_pt1 = pmap_preboot_get_pages(NPG_IN_PT1); kern_pt1 = (pt1_entry_t *)KERNEL_P2V(base_pt1); bzero((void*)kern_pt1, NB_IN_PT1); pte1_sync_range(kern_pt1, NB_IN_PT1); /* Allocate and zero page(s) for kernel PT2TAB. */ pt2tab_pa = pmap_preboot_get_pages(NPG_IN_PT2TAB); kern_pt2tab = (pt2_entry_t *)KERNEL_P2V(pt2tab_pa); bzero(kern_pt2tab, NB_IN_PT2TAB); pte2_sync_range(kern_pt2tab, NB_IN_PT2TAB); /* Allocate and zero page(s) for kernel L2 page tables. */ pt2pg_pa = pmap_preboot_get_pages(NKPT2PG); pt2pg_va = KERNEL_P2V(pt2pg_pa); size = NKPT2PG * PAGE_SIZE; bzero((void*)pt2pg_va, size); pte2_sync_range((pt2_entry_t *)pt2pg_va, size); /* * Add a physical memory segment (vm_phys_seg) corresponding to the * preallocated pages for kernel L2 page tables so that vm_page * structures representing these pages will be created. The vm_page * structures are required for promotion of the corresponding kernel * virtual addresses to section mappings. */ vm_phys_add_seg(pt2tab_pa, pmap_preboot_get_pages(0)); /* * Insert allocated L2 page table pages to PT2TAB and make * link to all PT2s in L1 page table. See how kernel_vm_end * is initialized. * * We play simple and safe. So every KVA will have underlaying * L2 page table, even kernel image mapped by sections. */ pte2p = kern_pt2tab_entry(KERNBASE); for (pa = pt2pg_pa; pa < pt2pg_pa + size; pa += PTE2_SIZE) pt2tab_store(pte2p++, PTE2_KPT(pa)); pte1p = kern_pte1(KERNBASE); for (pa = pt2pg_pa; pa < pt2pg_pa + size; pa += NB_IN_PT2) pte1_store(pte1p++, PTE1_LINK(pa)); /* Make section mappings for kernel. */ l1_attr = ATTR_TO_L1(PTE2_ATTR_DEFAULT); pte1p = kern_pte1(KERNBASE); for (pa = KERNEL_V2P(KERNBASE); pa < last; pa += PTE1_SIZE) pte1_store(pte1p++, PTE1_KERN(pa, PTE1_AP_KRW, l1_attr)); /* * Get free and aligned space for PT2MAP and make L1 page table links * to L2 page tables held in PT2TAB. * * Note that pages holding PT2s are stored in PT2TAB as pt2_entry_t * descriptors and PT2TAB page(s) itself is(are) used as PT2s. Thus * each entry in PT2TAB maps all PT2s in a page. 
This implies that * virtual address of PT2MAP must be aligned to NPT2_IN_PG * PTE1_SIZE. */ PT2MAP = (pt2_entry_t *)(KERNBASE - PT2MAP_SIZE); pte1p = kern_pte1((vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPT2_IN_PT2TAB; i++, pa += NB_IN_PT2) { pte1_store(pte1p++, PTE1_LINK(pa)); } /* * Store PT2TAB in PT2TAB itself, i.e. self reference mapping. * Each pmap will hold own PT2TAB, so the mapping should be not global. */ pte2p = kern_pt2tab_entry((vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) { pt2tab_store(pte2p++, PTE2_KPT_NG(pa)); } /* * Choose correct L2 page table and make mappings for allocations * made herein which replaces temporary locore.S mappings after a while. * Note that PT2MAP cannot be used until we switch to kern_pt1. * * Note, that these allocations started aligned on 1M section and * kernel PT1 was allocated first. Making of mappings must follow * order of physical allocations as we've used KERNEL_P2V() macro * for virtual addresses resolution. */ pte2p = kern_pt2tab_entry((vm_offset_t)kern_pt1); pt2pg_va = KERNEL_P2V(pte2_pa(pte2_load(pte2p))); pte2p = page_pt2(pt2pg_va, pte1_index((vm_offset_t)kern_pt1)); /* Make mapping for kernel L1 page table. */ for (pa = base_pt1, i = 0; i < NPG_IN_PT1; i++, pa += PTE2_SIZE) pte2_store(pte2p++, PTE2_KPT(pa)); /* Make mapping for kernel PT2TAB. */ for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) pte2_store(pte2p++, PTE2_KPT(pa)); /* Finally, switch from 'boot_pt1' to 'kern_pt1'. */ pmap_kern_ttb = base_pt1 | ttb_flags; cpuinfo_get_actlr_modifier(&actlr_mask, &actlr_set); reinit_mmu(pmap_kern_ttb, actlr_mask, actlr_set); /* * Initialize the first available KVA. As kernel image is mapped by * sections, we are leaving some gap behind. */ virtual_avail = (vm_offset_t)kern_pt2tab + NPG_IN_PT2TAB * PAGE_SIZE; } /* * Setup L2 page table page for given KVA. * Used in pre-bootstrap epoch. * * Note that we have allocated NKPT2PG pages for L2 page tables in advance * and used them for mapping KVA starting from KERNBASE. However, this is not * enough. Vectors and devices need L2 page tables too. Note that they are * even above VM_MAX_KERNEL_ADDRESS. */ static __inline vm_paddr_t pmap_preboot_pt2pg_setup(vm_offset_t va) { pt2_entry_t *pte2p, pte2; vm_paddr_t pt2pg_pa; /* Get associated entry in PT2TAB. */ pte2p = kern_pt2tab_entry(va); /* Just return, if PT2s page exists already. */ pte2 = pt2tab_load(pte2p); if (pte2_is_valid(pte2)) return (pte2_pa(pte2)); KASSERT(va >= VM_MAX_KERNEL_ADDRESS, ("%s: NKPT2PG too small", __func__)); /* * Allocate page for PT2s and insert it to PT2TAB. * In other words, map it into PT2MAP space. */ pt2pg_pa = pmap_preboot_get_pages(1); pt2tab_store(pte2p, PTE2_KPT(pt2pg_pa)); /* Zero all PT2s in allocated page. */ bzero((void*)pt2map_pt2pg(va), PAGE_SIZE); pte2_sync_range((pt2_entry_t *)pt2map_pt2pg(va), PAGE_SIZE); return (pt2pg_pa); } /* * Setup L2 page table for given KVA. * Used in pre-bootstrap epoch. */ static void pmap_preboot_pt2_setup(vm_offset_t va) { pt1_entry_t *pte1p; vm_paddr_t pt2pg_pa, pt2_pa; /* Setup PT2's page. */ pt2pg_pa = pmap_preboot_pt2pg_setup(va); pt2_pa = page_pt2pa(pt2pg_pa, pte1_index(va)); /* Insert PT2 to PT1. */ pte1p = kern_pte1(va); pte1_store(pte1p, PTE1_LINK(pt2_pa)); } /* * Get L2 page entry associated with given KVA. * Used in pre-bootstrap epoch. */ static __inline pt2_entry_t* pmap_preboot_vtopte2(vm_offset_t va) { pt1_entry_t *pte1p; /* Setup PT2 if needed. 
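 */

/*
 * The body that follows is a lazy-initialization pattern: look up the L1
 * entry for "va", create and link the missing L2 table on first use, and
 * only then hand back the L2 entry pointer.  A generic user-space sketch
 * of the same shape, with an invented table type in place of the real
 * PT1/PT2 machinery:
 */
#include <stdlib.h>

struct sk_l2table {
	unsigned entries[256];
};

static struct sk_l2table *
sk_get_l2(struct sk_l2table **l1_slot)
{

	if (*l1_slot == NULL) {		/* set up the L2 table if needed */
		*l1_slot = calloc(1, sizeof(struct sk_l2table));
		if (*l1_slot == NULL)
			abort();	/* no recovery this early */
	}
	return (*l1_slot);
}

/*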
*/ pte1p = kern_pte1(va); if (!pte1_is_valid(pte1_load(pte1p))) /* XXX - sections ?! */ pmap_preboot_pt2_setup(va); return (pt2map_entry(va)); } /* * Pre-bootstrap epoch page(s) mapping(s). */ void pmap_preboot_map_pages(vm_paddr_t pa, vm_offset_t va, u_int num) { u_int i; pt2_entry_t *pte2p; /* Map all the pages. */ for (i = 0; i < num; i++) { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, PTE2_KRW(pa)); va += PAGE_SIZE; pa += PAGE_SIZE; } } /* * Pre-bootstrap epoch virtual space allocator. */ vm_offset_t pmap_preboot_reserve_pages(u_int num) { u_int i; vm_offset_t start, va; pt2_entry_t *pte2p; /* Allocate virtual space. */ start = va = virtual_avail; virtual_avail += num * PAGE_SIZE; /* Zero the mapping. */ for (i = 0; i < num; i++) { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, 0); va += PAGE_SIZE; } return (start); } /* * Pre-bootstrap epoch page(s) allocation and mapping(s). */ vm_offset_t pmap_preboot_get_vpages(u_int num) { vm_paddr_t pa; vm_offset_t va; /* Allocate physical page(s). */ pa = pmap_preboot_get_pages(num); /* Allocate virtual space. */ va = virtual_avail; virtual_avail += num * PAGE_SIZE; /* Map and zero all. */ pmap_preboot_map_pages(pa, va, num); bzero((void *)va, num * PAGE_SIZE); return (va); } /* * Pre-bootstrap epoch page mapping(s) with attributes. */ void pmap_preboot_map_attr(vm_paddr_t pa, vm_offset_t va, vm_size_t size, vm_prot_t prot, vm_memattr_t attr) { u_int num; u_int l1_attr, l1_prot, l2_prot, l2_attr; pt1_entry_t *pte1p; pt2_entry_t *pte2p; l2_prot = prot & VM_PROT_WRITE ? PTE2_AP_KRW : PTE2_AP_KR; l2_prot |= (prot & VM_PROT_EXECUTE) ? PTE2_X : PTE2_NX; l2_attr = vm_memattr_to_pte2(attr); l1_prot = ATTR_TO_L1(l2_prot); l1_attr = ATTR_TO_L1(l2_attr); /* Map all the pages. */ num = round_page(size); while (num > 0) { if ((((va | pa) & PTE1_OFFSET) == 0) && (num >= PTE1_SIZE)) { pte1p = kern_pte1(va); pte1_store(pte1p, PTE1_KERN(pa, l1_prot, l1_attr)); va += PTE1_SIZE; pa += PTE1_SIZE; num -= PTE1_SIZE; } else { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, PTE2_KERN(pa, l2_prot, l2_attr)); va += PAGE_SIZE; pa += PAGE_SIZE; num -= PAGE_SIZE; } } } /* * Extract from the kernel page table the physical address * that is mapped by the given virtual address "va". */ vm_paddr_t pmap_kextract(vm_offset_t va) { vm_paddr_t pa; pt1_entry_t pte1; pt2_entry_t pte2; pte1 = pte1_load(kern_pte1(va)); if (pte1_is_section(pte1)) { pa = pte1_pa(pte1) | (va & PTE1_OFFSET); } else if (pte1_is_link(pte1)) { /* * We should beware of a concurrent promotion that changes * pte1 at this point. However, it's not a problem, as the PT2 * page is preserved by promotion in PT2TAB. So even if * it happens, using PT2MAP is still safe. * * QQQ: However, concurrent removal is a problem, which * ends in an abort on PT2MAP space. Locking must be used * to deal with this. */ pte2 = pte2_load(pt2map_entry(va)); pa = pte2_pa(pte2) | (va & PTE2_OFFSET); } else { panic("%s: va %#x pte1 %#x", __func__, va, pte1); } return (pa); } /* * Extract from the kernel page table the physical address * that is mapped by the given virtual address "va". Also * return the L2 page table entry which maps the address. * * This is only intended to be used for panic dumps.
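* (Worked example, purely illustrative: a va of 0xc0123456 covered by a 1 MB section whose PTE1 frame is 0x80100000 extracts as 0x80100000 | (va & PTE1_OFFSET), i.e. pa 0x80123456.)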
*/ vm_paddr_t pmap_dump_kextract(vm_offset_t va, pt2_entry_t *pte2p) { vm_paddr_t pa; pt1_entry_t pte1; pt2_entry_t pte2; pte1 = pte1_load(kern_pte1(va)); if (pte1_is_section(pte1)) { pa = pte1_pa(pte1) | (va & PTE1_OFFSET); pte2 = pa | ATTR_TO_L2(pte1) | PTE2_V; } else if (pte1_is_link(pte1)) { pte2 = pte2_load(pt2map_entry(va)); pa = pte2_pa(pte2); } else { pte2 = 0; pa = 0; } if (pte2p != NULL) *pte2p = pte2; return (pa); } /***************************************************************************** * * PMAP second stage initialization and utility functions * for bootstrap epoch. * * After pmap_bootstrap() is called, the following functions for * mappings can be used: * * void pmap_kenter(vm_offset_t va, vm_paddr_t pa); * void pmap_kremove(vm_offset_t va); * vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, * int prot); * * NOTE: This is not SMP coherent stage. And physical page allocation is not * allowed during this stage. * *****************************************************************************/ /* * Initialize kernel PMAP locks and lists, kernel_pmap itself, and * reserve various virtual spaces for temporary mappings. */ void pmap_bootstrap(vm_offset_t firstaddr) { pt2_entry_t *unused __unused; struct sysmaps *sysmaps; u_int i; /* * Initialize the kernel pmap (which is statically allocated). */ PMAP_LOCK_INIT(kernel_pmap); kernel_l1pa = (vm_paddr_t)kern_pt1; /* for libkvm */ kernel_pmap->pm_pt1 = kern_pt1; kernel_pmap->pm_pt2tab = kern_pt2tab; CPU_FILL(&kernel_pmap->pm_active); /* don't allow deactivation */ TAILQ_INIT(&kernel_pmap->pm_pvchunk); /* * Initialize the global pv list lock. */ rw_init(&pvh_global_lock, "pmap pv global"); LIST_INIT(&allpmaps); /* * Request a spin mutex so that changes to allpmaps cannot be * preempted by smp_rendezvous_cpus(). */ mtx_init(&allpmaps_lock, "allpmaps", NULL, MTX_SPIN); mtx_lock_spin(&allpmaps_lock); LIST_INSERT_HEAD(&allpmaps, kernel_pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); /* * Reserve some special page table entries/VA space for temporary * mapping of pages. */ #define SYSMAP(c, p, v, n) do { \ v = (c)pmap_preboot_reserve_pages(n); \ p = pt2map_entry((vm_offset_t)v); \ } while (0) /* * Local CMAP1/CMAP2 are used for zeroing and copying pages. * Local CMAP3 is used for data cache cleaning. */ for (i = 0; i < MAXCPU; i++) { sysmaps = &sysmaps_pcpu[i]; mtx_init(&sysmaps->lock, "SYSMAPS", NULL, MTX_DEF); SYSMAP(caddr_t, sysmaps->CMAP1, sysmaps->CADDR1, 1); SYSMAP(caddr_t, sysmaps->CMAP2, sysmaps->CADDR2, 1); SYSMAP(caddr_t, sysmaps->CMAP3, sysmaps->CADDR3, 1); } /* * Crashdump maps. */ SYSMAP(caddr_t, unused, crashdumpmap, MAXDUMPPGS); /* * _tmppt is used for reading arbitrary physical pages via /dev/mem. */ SYSMAP(caddr_t, unused, _tmppt, 1); /* * PADDR1 and PADDR2 are used by pmap_pte2_quick() and pmap_pte2(), * respectively. PADDR3 is used by pmap_pte2_ddb(). */ SYSMAP(pt2_entry_t *, PMAP1, PADDR1, 1); SYSMAP(pt2_entry_t *, PMAP2, PADDR2, 1); #ifdef DDB SYSMAP(pt2_entry_t *, PMAP3, PADDR3, 1); #endif mtx_init(&PMAP2mutex, "PMAP2", NULL, MTX_DEF); /* * Note that in very short time in initarm(), we are going to * initialize phys_avail[] array and no further page allocation * can happen after that until vm subsystem will be initialized. 
*/ kernel_vm_end_new = kernel_vm_end; virtual_end = vm_max_kernel_address; } static void pmap_init_qpages(void) { struct pcpu *pc; int i; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE); if (pc->pc_qmap_addr == 0) panic("%s: unable to allocate KVA", __func__); } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, pmap_init_qpages, NULL); /* * This function can already be used in the second initialization stage. * As such, the function DOES NOT call pmap_growkernel() where PT2 * allocation can happen. So if it is used, be sure that the PT2 for the * given virtual address is already allocated! * * Add a wired page to the kva. * Note: not SMP coherent. */ static __inline void pmap_kenter_prot_attr(vm_offset_t va, vm_paddr_t pa, uint32_t prot, uint32_t attr) { pt1_entry_t *pte1p; pt2_entry_t *pte2p; pte1p = kern_pte1(va); if (!pte1_is_valid(pte1_load(pte1p))) { /* XXX - sections ?! */ /* * This is a very low level function, so the PT2 and particularly * the PT2PG associated with the given virtual address must already * be allocated. It's a pain mainly during the pmap initialization * stage. However, calling it after pmap initialization with a * virtual address not under kernel_vm_end will lead to * the same misery. */ if (!pte2_is_valid(pte2_load(kern_pt2tab_entry(va)))) panic("%s: kernel PT2 not allocated!", __func__); } pte2p = pt2map_entry(va); pte2_store(pte2p, PTE2_KERN(pa, prot, attr)); } PMAP_INLINE void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_prot_attr(va, pa, PTE2_AP_KRW, PTE2_ATTR_DEFAULT); } /* * Remove a page from the kernel pagetables. * Note: not SMP coherent. */ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt2_entry_t *pte2p; pte2p = pt2map_entry(va); pte2_clear(pte2p); } /* * Share a new kernel PT2PG with all pmaps. * The caller is responsible for maintaining TLB consistency. */ static void pmap_kenter_pt2tab(vm_offset_t va, pt2_entry_t npte2) { pmap_t pmap; pt2_entry_t *pte2p; mtx_lock_spin(&allpmaps_lock); LIST_FOREACH(pmap, &allpmaps, pm_list) { pte2p = pmap_pt2tab_entry(pmap, va); pt2tab_store(pte2p, npte2); } mtx_unlock_spin(&allpmaps_lock); } /* * Share a new kernel PTE1 with all pmaps. * The caller is responsible for maintaining TLB consistency. */ static void pmap_kenter_pte1(vm_offset_t va, pt1_entry_t npte1) { pmap_t pmap; pt1_entry_t *pte1p; mtx_lock_spin(&allpmaps_lock); LIST_FOREACH(pmap, &allpmaps, pm_list) { pte1p = pmap_pte1(pmap, va); pte1_store(pte1p, npte1); } mtx_unlock_spin(&allpmaps_lock); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. * * NOTE: Read the comments above pmap_kenter_prot_attr() as * that function is used herein! */ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { vm_offset_t va, sva; vm_paddr_t pte1_offset; pt1_entry_t npte1; uint32_t l1prot, l2prot; uint32_t l1attr, l2attr; PDEBUG(1, printf("%s: virt = %#x, start = %#x, end = %#x (size = %#x)," " prot = %d\n", __func__, *virt, start, end, end - start, prot)); l2prot = (prot & VM_PROT_WRITE) ? PTE2_AP_KRW : PTE2_AP_KR; l2prot |= (prot & VM_PROT_EXECUTE) ?
PTE2_X : PTE2_NX; l1prot = ATTR_TO_L1(l2prot); l2attr = PTE2_ATTR_DEFAULT; l1attr = ATTR_TO_L1(l2attr); va = *virt; /* * Does the physical address range's size and alignment permit at * least one section mapping to be created? */ pte1_offset = start & PTE1_OFFSET; if ((end - start) - ((PTE1_SIZE - pte1_offset) & PTE1_OFFSET) >= PTE1_SIZE) { /* * Increase the starting virtual address so that its alignment * does not preclude the use of section mappings. */ if ((va & PTE1_OFFSET) < pte1_offset) va = pte1_trunc(va) + pte1_offset; else if ((va & PTE1_OFFSET) > pte1_offset) va = pte1_roundup(va) + pte1_offset; } sva = va; while (start < end) { if ((start & PTE1_OFFSET) == 0 && end - start >= PTE1_SIZE) { KASSERT((va & PTE1_OFFSET) == 0, ("%s: misaligned va %#x", __func__, va)); npte1 = PTE1_KERN(start, l1prot, l1attr); pmap_kenter_pte1(va, npte1); va += PTE1_SIZE; start += PTE1_SIZE; } else { pmap_kenter_prot_attr(va, start, l2prot, l2attr); va += PAGE_SIZE; start += PAGE_SIZE; } } tlb_flush_range(sva, va - sva); *virt = va; return (sva); } /* * Make a temporary mapping for a physical address. * This is only intended to be used for panic dumps. */ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; /* QQQ: 'i' should be less or equal to MAXDUMPPGS. */ va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); tlb_flush_local(va); return ((void *)crashdumpmap); } /************************************* * * TLB & cache maintenance routines. * *************************************/ /* * We inline these within pmap.c for speed. */ PMAP_INLINE void pmap_tlb_flush(pmap_t pmap, vm_offset_t va) { if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) tlb_flush(va); } PMAP_INLINE void pmap_tlb_flush_range(pmap_t pmap, vm_offset_t sva, vm_size_t size) { if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) tlb_flush_range(sva, size); } /* * Abuse the pte2 nodes for unmapped kva to thread a kva freelist through. * Requirements: * - Must deal with pages in order to ensure that none of the PTE2_* bits * are ever set, PTE2_V in particular. * - Assumes we can write to pte2s without pte2_store() atomic ops. * - Assumes nothing will ever test these addresses for 0 to indicate * no mapping instead of correctly checking PTE2_V. * - Assumes a vm_offset_t will fit in a pte2 (true for arm). * Because PTE2_V is never set, there can be no mappings to invalidate. */ static vm_offset_t pmap_pte2list_alloc(vm_offset_t *head) { pt2_entry_t *pte2p; vm_offset_t va; va = *head; if (va == 0) panic("pmap_ptelist_alloc: exhausted ptelist KVA"); pte2p = pt2map_entry(va); *head = *pte2p; if (*head & PTE2_V) panic("%s: va with PTE2_V set!", __func__); *pte2p = 0; return (va); } static void pmap_pte2list_free(vm_offset_t *head, vm_offset_t va) { pt2_entry_t *pte2p; if (va & PTE2_V) panic("%s: freeing va with PTE2_V set!", __func__); pte2p = pt2map_entry(va); *pte2p = *head; /* virtual! PTE2_V is 0 though */ *head = va; } static void pmap_pte2list_init(vm_offset_t *head, void *base, int npages) { int i; vm_offset_t va; *head = 0; for (i = npages - 1; i >= 0; i--) { va = (vm_offset_t)base + i * PAGE_SIZE; pmap_pte2list_free(head, va); } } /***************************************************************************** * * PMAP third and final stage initialization. * * After pmap_init() is called, PMAP subsystem is fully initialized. 
* *****************************************************************************/ SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_max, CTLFLAG_RD, &pv_entry_max, 0, "Max number of PV entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, shpgperproc, CTLFLAG_RD, &shpgperproc, 0, "Page share factor per proc"); static u_long nkpt2pg = NKPT2PG; SYSCTL_ULONG(_vm_pmap, OID_AUTO, nkpt2pg, CTLFLAG_RD, &nkpt2pg, 0, "Pre-allocated pages for kernel PT2s"); static int sp_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, sp_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &sp_enabled, 0, "Are large page mappings enabled?"); static SYSCTL_NODE(_vm_pmap, OID_AUTO, pte1, CTLFLAG_RD, 0, "1MB page mapping counters"); static u_long pmap_pte1_demotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, demotions, CTLFLAG_RD, &pmap_pte1_demotions, 0, "1MB page demotions"); static u_long pmap_pte1_mappings; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, mappings, CTLFLAG_RD, &pmap_pte1_mappings, 0, "1MB page mappings"); static u_long pmap_pte1_p_failures; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_pte1_p_failures, 0, "1MB page promotion failures"); static u_long pmap_pte1_promotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, promotions, CTLFLAG_RD, &pmap_pte1_promotions, 0, "1MB page promotions"); static u_long pmap_pte1_kern_demotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, kern_demotions, CTLFLAG_RD, &pmap_pte1_kern_demotions, 0, "1MB page kernel demotions"); static u_long pmap_pte1_kern_promotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, kern_promotions, CTLFLAG_RD, &pmap_pte1_kern_promotions, 0, "1MB page kernel promotions"); static __inline ttb_entry_t pmap_ttb_get(pmap_t pmap) { return (vtophys(pmap->pm_pt1) | ttb_flags); } /* * Initialize a vm_page's machine-dependent fields. * * Variations: * 1. Pages for L2 page tables are never managed. So, pv_list and * pt2_wirecount can share the same physical space. However, proper * initialization on a page alloc for page tables and reinitialization * on the page free must be ensured. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); pt2_wirecount_init(m); m->md.pat_mode = VM_MEMATTR_DEFAULT; } /* * An abstraction for a faster way to zero a whole page. */ static __inline void pagezero(void *page) { bzero(page, PAGE_SIZE); } /* * Zero an L2 page table page. * Uses the same KVA as pmap_zero_page(). */ static __inline vm_paddr_t pmap_pt2pg_zero(vm_page_t m) { vm_paddr_t pa; struct sysmaps *sysmaps; pa = VM_PAGE_TO_PHYS(m); /* * XXX: For now, we map the whole page even if it's already zero, * to sync it even if the sync is only a DSB. */ sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(pa, PTE2_AP_KRW, vm_page_pte2_attr(m))); /* Even a VM_ALLOC_ZERO request is only advisory. */ if ((m->flags & PG_ZERO) == 0) pagezero(sysmaps->CADDR2); pte2_sync_range((pt2_entry_t *)sysmaps->CADDR2, PAGE_SIZE); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); return (pa); } /* * Initialize a just-allocated page as an L2 page table(s) holder * and return its physical address. */ static __inline vm_paddr_t pmap_pt2pg_init(pmap_t pmap, vm_offset_t va, vm_page_t m) { vm_paddr_t pa; pt2_entry_t *pte2p; /* Check page attributes. */ if (m->md.pat_mode != pt_memattr) pmap_page_set_memattr(m, pt_memattr); /* Zero the page and init the wire counts.
*/ pa = pmap_pt2pg_zero(m); pt2_wirecount_init(m); /* * Map page to PT2MAP address space for given pmap. * Note that PT2MAP space is shared with all pmaps. */ if (pmap == kernel_pmap) pmap_kenter_pt2tab(va, PTE2_KPT(pa)); else { pte2p = pmap_pt2tab_entry(pmap, va); pt2tab_store(pte2p, PTE2_KPT_NG(pa)); } return (pa); } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { vm_size_t s; pt2_entry_t *pte2p, pte2; u_int i, pte1_idx, pv_npg; PDEBUG(1, printf("%s: phys_start = %#x\n", __func__, PHYSADDR)); /* * Initialize the vm page array entries for kernel pmap's * L2 page table pages allocated in advance. */ pte1_idx = pte1_index(KERNBASE - PT2MAP_SIZE); pte2p = kern_pt2tab_entry(KERNBASE - PT2MAP_SIZE); for (i = 0; i < nkpt2pg + NPG_IN_PT2TAB; i++, pte2p++) { vm_paddr_t pa; vm_page_t m; pte2 = pte2_load(pte2p); KASSERT(pte2_is_valid(pte2), ("%s: no valid entry", __func__)); pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); KASSERT(m >= vm_page_array && m < &vm_page_array[vm_page_array_size], ("%s: L2 page table page is out of range", __func__)); m->pindex = pte1_idx; m->phys_addr = pa; pte1_idx += NPT2_IN_PG; } /* * Initialize the address space (zone) for the pv entries. Set a * high water mark so that the system can recover from excessive * numbers of pv entries. */ TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_max = roundup(pv_entry_max, _NPCPV); pv_entry_high_water = 9 * (pv_entry_max / 10); /* * Are large page mappings enabled? */ TUNABLE_INT_FETCH("vm.pmap.sp_enabled", &sp_enabled); if (sp_enabled) { KASSERT(MAXPAGESIZES > 1 && pagesizes[1] == 0, ("%s: can't assign to pagesizes[1]", __func__)); pagesizes[1] = PTE1_SIZE; } /* * Calculate the size of the pv head table for sections. * Handle the possibility that "vm_phys_segs[...].end" is zero. * Note that the table is only for sections which could be promoted. */ first_managed_pa = pte1_trunc(vm_phys_segs[0].start); pv_npg = (pte1_trunc(vm_phys_segs[vm_phys_nsegs - 1].end - PAGE_SIZE) - first_managed_pa) / PTE1_SIZE + 1; /* * Allocate memory for the pv head table for sections. */ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); pv_maxchunks = MAX(pv_entry_max / _NPCPV, maxproc); pv_chunkbase = (struct pv_chunk *)kva_alloc(PAGE_SIZE * pv_maxchunks); if (pv_chunkbase == NULL) panic("%s: not enough kvm for pv chunks", __func__); pmap_pte2list_init(&pv_vafree, pv_chunkbase, pv_maxchunks); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. * Note: SMP coherent. Uses a ranged shootdown IPI. 
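* * Illustrative usage sketch (a hypothetical caller, not taken from this * file; it assumes the stock kva_alloc()/kva_free() KVA allocator): * *	va = kva_alloc(3 * PAGE_SIZE); *	pmap_qenter(va, ma, 3);	-- one ranged TLB flush instead of three *	... access the three pages through va ... *	pmap_qremove(va, 3); *	kva_free(va, 3 * PAGE_SIZE);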
*/ void pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) { u_int anychanged; pt2_entry_t *epte2p, *pte2p, pte2; vm_page_t m; vm_paddr_t pa; anychanged = 0; pte2p = pt2map_entry(sva); epte2p = pte2p + count; while (pte2p < epte2p) { m = *ma++; pa = VM_PAGE_TO_PHYS(m); pte2 = pte2_load(pte2p); if ((pte2_pa(pte2) != pa) || (pte2_attr(pte2) != vm_page_pte2_attr(m))) { anychanged++; pte2_store(pte2p, PTE2_KERN(pa, PTE2_AP_KRW, vm_page_pte2_attr(m))); } pte2p++; } if (__predict_false(anychanged)) tlb_flush_range(sva, count * PAGE_SIZE); } /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qremove(vm_offset_t sva, int count) { vm_offset_t va; va = sva; while (count-- > 0) { pmap_kremove(va); va += PAGE_SIZE; } tlb_flush_range(sva, va - sva); } /* * Are we current address space or kernel? */ static __inline int pmap_is_current(pmap_t pmap) { return (pmap == kernel_pmap || (pmap == vmspace_pmap(curthread->td_proc->p_vmspace))); } /* * If the given pmap is not the current or kernel pmap, the returned * pte2 must be released by passing it to pmap_pte2_release(). */ static pt2_entry_t * pmap_pte2(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) panic("%s: attempt to map PTE1", __func__); if (pte1_is_link(pte1)) { /* Are we current address space or kernel? */ if (pmap_is_current(pmap)) return (pt2map_entry(va)); /* Note that L2 page table size is not equal to PAGE_SIZE. */ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); mtx_lock(&PMAP2mutex); if (pte2_pa(pte2_load(PMAP2)) != pt2pg_pa) { pte2_store(PMAP2, PTE2_KPT(pt2pg_pa)); tlb_flush((vm_offset_t)PADDR2); } return (PADDR2 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } return (NULL); } /* * Releases a pte2 that was obtained from pmap_pte2(). * Be prepared for the pte2p being NULL. */ static __inline void pmap_pte2_release(pt2_entry_t *pte2p) { if ((pt2_entry_t *)(trunc_page((vm_offset_t)pte2p)) == PADDR2) { mtx_unlock(&PMAP2mutex); } } /* * Super fast pmap_pte2 routine best used when scanning * the pv lists. This eliminates many coarse-grained * invltlb calls. Note that many of the pv list * scans are across different pmaps. It is very wasteful * to do an entire tlb flush for checking a single mapping. * * If the given pmap is not the current pmap, pvh_global_lock * must be held and curthread pinned to a CPU. */ static pt2_entry_t * pmap_pte2_quick(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) panic("%s: attempt to map PTE1", __func__); if (pte1_is_link(pte1)) { /* Are we current address space or kernel? */ if (pmap_is_current(pmap)) return (pt2map_entry(va)); rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("%s: curthread not pinned", __func__)); /* Note that L2 page table size is not equal to PAGE_SIZE. 
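* (The reason, assuming the standard short-descriptor format: an L2 page table is only 1 KB, a quarter of a 4 KB page, so the trunc_page() below is what locates the PT2PG holding it.)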
*/ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); if (pte2_pa(pte2_load(PMAP1)) != pt2pg_pa) { pte2_store(PMAP1, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; return (PADDR1 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } return (NULL); } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; pt1_entry_t pte1; pt2_entry_t *pte2p; PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) pa = pte1_pa(pte1) | (va & PTE1_OFFSET); else if (pte1_is_link(pte1)) { pte2p = pmap_pte2(pmap, va); pa = pte2_pa(pte2_load(pte2p)) | (va & PTE2_OFFSET); pmap_pte2_release(pte2p); } else pa = 0; PMAP_UNLOCK(pmap); return (pa); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection. */ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { vm_paddr_t pa, lockpa; pt1_entry_t pte1; pt2_entry_t pte2, *pte2p; vm_page_t m; lockpa = 0; m = NULL; PMAP_LOCK(pmap); retry: pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) { if (!(pte1 & PTE1_RO) || !(prot & VM_PROT_WRITE)) { pa = pte1_pa(pte1) | (va & PTE1_OFFSET); if (vm_page_pa_tryrelock(pmap, pa, &lockpa)) goto retry; m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } else if (pte1_is_link(pte1)) { pte2p = pmap_pte2(pmap, va); pte2 = pte2_load(pte2p); pmap_pte2_release(pte2p); if (pte2_is_valid(pte2) && (!(pte2 & PTE2_RO) || !(prot & VM_PROT_WRITE))) { pa = pte2_pa(pte2); if (vm_page_pa_tryrelock(pmap, pa, &lockpa)) goto retry; m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } PA_UNLOCK_COND(lockpa); PMAP_UNLOCK(pmap); return (m); } /* * Grow the number of kernel L2 page table entries, if needed. */ void pmap_growkernel(vm_offset_t addr) { vm_page_t m; vm_paddr_t pt2pg_pa, pt2_pa; pt1_entry_t pte1; pt2_entry_t pte2; PDEBUG(1, printf("%s: addr = %#x\n", __func__, addr)); /* * At all times, kernel_vm_end is the first KVA for which the underlying * L2 page table is either not allocated or not linked from the L1 page * table (not considering sections). Except for two possible cases: * * (1) in the very beginning, as long as pmap_growkernel() has not * been called, it could be the first unused KVA (which is not * rounded up to PTE1_SIZE), * * (2) when all KVA space is mapped and kernel_map->max_offset * address is not rounded up to PTE1_SIZE. (For example, * it could be 0xFFFFFFFF.) */ kernel_vm_end = pte1_roundup(kernel_vm_end); mtx_assert(&kernel_map->system_mtx, MA_OWNED); addr = roundup2(addr, PTE1_SIZE); if (addr - 1 >= kernel_map->max_offset) addr = kernel_map->max_offset; while (kernel_vm_end < addr) { pte1 = pte1_load(kern_pte1(kernel_vm_end)); if (pte1_is_valid(pte1)) { kernel_vm_end += PTE1_SIZE; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } continue; } /* * kernel_vm_end_new is used in pmap_pinit() when kernel * mappings are entered into a new pmap all at once, to avoid a race * between pmap_kenter_pte1() and a kernel_vm_end increase. * The same applies to pmap_kenter_pt2tab().
*/ kernel_vm_end_new = kernel_vm_end + PTE1_SIZE; pte2 = pt2tab_load(kern_pt2tab_entry(kernel_vm_end)); if (!pte2_is_valid(pte2)) { /* * Install a new PT2s page into the kernel PT2TAB. */ m = vm_page_alloc(NULL, pte1_index(kernel_vm_end) & ~PT2PG_MASK, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (m == NULL) panic("%s: no memory to grow kernel", __func__); /* * QQQ: Linking all new L2 page tables from the L1 page * table now, and so pmap_kenter_pte1()'ing them * at once together with pmap_kenter_pt2tab(), * could be a nice speedup. However, * pmap_growkernel() does not happen so often... * QQQ: The other TTBR is another option. */ pt2pg_pa = pmap_pt2pg_init(kernel_pmap, kernel_vm_end, m); } else pt2pg_pa = pte2_pa(pte2); pt2_pa = page_pt2pa(pt2pg_pa, pte1_index(kernel_vm_end)); pmap_kenter_pte1(kernel_vm_end, PTE1_LINK(pt2_pa)); kernel_vm_end = kernel_vm_end_new; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } } } static int kvm_size(SYSCTL_HANDLER_ARGS) { unsigned long ksize = vm_max_kernel_address - KERNBASE; return (sysctl_handle_long(oidp, &ksize, 0, req)); } SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_size, "IU", "Size of KVM"); static int kvm_free(SYSCTL_HANDLER_ARGS) { unsigned long kfree = vm_max_kernel_address - kernel_vm_end; return (sysctl_handle_long(oidp, &kfree, 0, req)); } SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_free, "IU", "Amount of KVM free"); /*********************************************** * * Pmap allocation/deallocation routines. * ***********************************************/ /* * Initialize the pmap for the swapper process. */ void pmap_pinit0(pmap_t pmap) { PDEBUG(1, printf("%s: pmap = %p\n", __func__, pmap)); PMAP_LOCK_INIT(pmap); /* * The kernel page table directory and the pmap machinery around it are * already initialized; we are using them right here and now. So, finish * only the PMAP structure initialization for process0 ... * * Since the L1 page table and PT2TAB are shared with the kernel pmap, * which is already included in the list "allpmaps", this pmap does * not need to be inserted into that list. */ pmap->pm_pt1 = kern_pt1; pmap->pm_pt2tab = kern_pt2tab; CPU_ZERO(&pmap->pm_active); PCPU_SET(curpmap, pmap); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); CPU_SET(0, &pmap->pm_active); } static __inline void pte1_copy_nosync(pt1_entry_t *spte1p, pt1_entry_t *dpte1p, vm_offset_t sva, vm_offset_t eva) { u_int idx, count; idx = pte1_index(sva); count = (pte1_index(eva) - idx + 1) * sizeof(pt1_entry_t); bcopy(spte1p + idx, dpte1p + idx, count); } static __inline void pt2tab_copy_nosync(pt2_entry_t *spte2p, pt2_entry_t *dpte2p, vm_offset_t sva, vm_offset_t eva) { u_int idx, count; idx = pt2tab_index(sva); count = (pt2tab_index(eva) - idx + 1) * sizeof(pt2_entry_t); bcopy(spte2p + idx, dpte2p + idx, count); } /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ int pmap_pinit(pmap_t pmap) { pt1_entry_t *pte1p; pt2_entry_t *pte2p; vm_paddr_t pa, pt2tab_pa; u_int i; PDEBUG(6, printf("%s: pmap = %p, pm_pt1 = %p\n", __func__, pmap, pmap->pm_pt1)); /* * There is no need to allocate L2 page table space yet, but we do need * a valid L1 page table and PT2TAB. * * Install shared kernel mappings into these tables. It's a little * tricky, as some parts of KVA are reserved for vectors, devices, * and whatever else. These parts are supposed to be above * vm_max_kernel_address.
Thus two regions should be installed: * * (1) <KERNBASE, kernel_vm_end), * (2) <vm_max_kernel_address, 0xFFFFFFFF>. * * QQQ: The second region should be stable enough to be installed * only once, at the time the tables are allocated. * QQQ: Maybe copying both regions at once could be faster ... * QQQ: Maybe the other TTBR is an option. * * Finally, install the pmap's own PT2TAB into these tables. */ if (pmap->pm_pt1 == NULL) { pmap->pm_pt1 = (pt1_entry_t *)kmem_alloc_contig(kernel_arena, NB_IN_PT1, M_NOWAIT | M_ZERO, 0, -1UL, NB_IN_PT1, 0, pt_memattr); if (pmap->pm_pt1 == NULL) return (0); } if (pmap->pm_pt2tab == NULL) { /* * QQQ: (1) PT2TAB must be contiguous. If PT2TAB is only one * page, which should be the only size for 32-bit systems, * then we could allocate it with vm_page_alloc() and all * the handling needed, like the other L2 page table pages. * (2) Note that a process PT2TAB is a special L2 page table * page. Its mapping in kernel_arena is permanent and can * be used no matter which process is current. Its mapping * in PT2MAP can be used only for the current process. */ pmap->pm_pt2tab = (pt2_entry_t *)kmem_alloc_attr(kernel_arena, NB_IN_PT2TAB, M_NOWAIT | M_ZERO, 0, -1UL, pt_memattr); if (pmap->pm_pt2tab == NULL) { /* * QQQ: As struct pmap is allocated from UMA with the * UMA_ZONE_NOFREE flag, it's important to leave * no allocations in the pmap if initialization failed. */ kmem_free(kernel_arena, (vm_offset_t)pmap->pm_pt1, NB_IN_PT1); pmap->pm_pt1 = NULL; return (0); } /* * QQQ: Each L2 page table page's vm_page_t has its pindex set to * the pte1 index of the virtual address mapped by that page. * This is not valid for non-kernel PT2TABs themselves. * The pindex of these pages cannot be altered because of * the way they are allocated now. However, it * should not be a problem. */ } mtx_lock_spin(&allpmaps_lock); /* * To avoid a race with pmap_kenter_pte1() and pmap_kenter_pt2tab(), * kernel_vm_end_new is used here instead of kernel_vm_end. */ pte1_copy_nosync(kern_pt1, pmap->pm_pt1, KERNBASE, kernel_vm_end_new - 1); pte1_copy_nosync(kern_pt1, pmap->pm_pt1, vm_max_kernel_address, 0xFFFFFFFF); pt2tab_copy_nosync(kern_pt2tab, pmap->pm_pt2tab, KERNBASE, kernel_vm_end_new - 1); pt2tab_copy_nosync(kern_pt2tab, pmap->pm_pt2tab, vm_max_kernel_address, 0xFFFFFFFF); LIST_INSERT_HEAD(&allpmaps, pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); /* * Store the PT2MAP PT2 pages (a.k.a. PT2TAB) in PT2TAB itself, * i.e. a self-reference mapping. The PT2TAB is private, but mapped * into the shared PT2MAP space, so the mapping must not be global. */ pt2tab_pa = vtophys(pmap->pm_pt2tab); pte2p = pmap_pt2tab_entry(pmap, (vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) { pt2tab_store(pte2p++, PTE2_KPT_NG(pa)); } /* Insert PT2MAP PT2s into pmap PT1. */ pte1p = pmap_pte1(pmap, (vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPT2_IN_PT2TAB; i++, pa += NB_IN_PT2) { pte1_store(pte1p++, PTE1_LINK(pa)); } /* * Now synchronize the new mappings made above. */ pte1_sync_range(pmap->pm_pt1, NB_IN_PT1); pte2_sync_range(pmap->pm_pt2tab, NB_IN_PT2TAB); CPU_ZERO(&pmap->pm_active); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); return (1); } #ifdef INVARIANTS static boolean_t pt2tab_user_is_empty(pt2_entry_t *tab) { u_int i, end; end = pt2tab_index(VM_MAXUSER_ADDRESS); for (i = 0; i < end; i++) if (tab[i] != 0) return (FALSE); return (TRUE); } #endif /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings.
*/ void pmap_release(pmap_t pmap) { #ifdef INVARIANTS vm_offset_t start, end; #endif KASSERT(pmap->pm_stats.resident_count == 0, ("%s: pmap resident count %ld != 0", __func__, pmap->pm_stats.resident_count)); KASSERT(pt2tab_user_is_empty(pmap->pm_pt2tab), ("%s: has allocated user PT2(s)", __func__)); KASSERT(CPU_EMPTY(&pmap->pm_active), ("%s: pmap %p is active on some CPU(s)", __func__, pmap)); mtx_lock_spin(&allpmaps_lock); LIST_REMOVE(pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); #ifdef INVARIANTS start = pte1_index(KERNBASE) * sizeof(pt1_entry_t); end = (pte1_index(0xFFFFFFFF) + 1) * sizeof(pt1_entry_t); bzero((char *)pmap->pm_pt1 + start, end - start); start = pt2tab_index(KERNBASE) * sizeof(pt2_entry_t); end = (pt2tab_index(0xFFFFFFFF) + 1) * sizeof(pt2_entry_t); bzero((char *)pmap->pm_pt2tab + start, end - start); #endif /* * We are leaving PT1 and PT2TAB allocated on the released pmap, * so hopefully the UMA vmspace_zone will always be initialized with * the UMA_ZONE_NOFREE flag. */ } /********************************************************* * * L2 table pages and their page management routines. * *********************************************************/ /* * Virtual interface for L2 page table wire counting. * * Each L2 page table in a page has its own counter, which counts the number * of valid mappings in the table. The global page counter counts the mappings * in all tables in the page, plus the page's own single mapping in PT2TAB. * * During a promotion we leave the associated L2 page table counter * untouched, so the table (strictly speaking, the page which holds it) * is never freed if promoted. * * If a page's m->wire_count == 1, then no valid mappings exist in any L2 page * table in the page and the page itself is only mapped in PT2TAB. */ static __inline void pt2_wirecount_init(vm_page_t m) { u_int i; /* * Note: A page m is allocated with the VM_ALLOC_WIRED flag and * m->wire_count should already be set correctly. * So, there is no need to set it again herein. */ for (i = 0; i < NPT2_IN_PG; i++) m->md.pt2_wirecount[i] = 0; } static __inline void pt2_wirecount_inc(vm_page_t m, uint32_t pte1_idx) { /* * Note: A just-modified pte2 (i.e. already allocated) * acquires one extra reference, which must be * explicitly cleared. This influences the KASSERTs herein. * All L2 page tables in a page always belong to the same * pmap, so we allow only one extra reference for the page.
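* (Illustrative arithmetic: a PT2PG hosting two active L2 tables with 5 and 3 valid mappings has wire_count 1 + 5 + 3 == 9, the leading 1 being the page's own mapping in PT2TAB; pt2pg_is_empty() below checks for exactly that baseline of 1.)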
*/ KASSERT(m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] < (NPTE2_IN_PT2 + 1), ("%s: PT2 is overflowing ...", __func__)); KASSERT(m->wire_count <= (NPTE2_IN_PG + 1), ("%s: PT2PG is overflowing ...", __func__)); m->wire_count++; m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]++; } static __inline void pt2_wirecount_dec(vm_page_t m, uint32_t pte1_idx) { KASSERT(m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] != 0, ("%s: PT2 is underflowing ...", __func__)); KASSERT(m->wire_count > 1, ("%s: PT2PG is underflowing ...", __func__)); m->wire_count--; m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]--; } static __inline void pt2_wirecount_set(vm_page_t m, uint32_t pte1_idx, uint16_t count) { KASSERT(count <= NPTE2_IN_PT2, ("%s: invalid count %u", __func__, count)); KASSERT(m->wire_count > m->md.pt2_wirecount[pte1_idx & PT2PG_MASK], ("%s: PT2PG corrupting (%u, %u) ...", __func__, m->wire_count, m->md.pt2_wirecount[pte1_idx & PT2PG_MASK])); m->wire_count -= m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]; m->wire_count += count; m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] = count; KASSERT(m->wire_count <= (NPTE2_IN_PG + 1), ("%s: PT2PG is overflowed (%u) ...", __func__, m->wire_count)); } static __inline uint32_t pt2_wirecount_get(vm_page_t m, uint32_t pte1_idx) { return (m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]); } static __inline boolean_t pt2_is_empty(vm_page_t m, vm_offset_t va) { return (m->md.pt2_wirecount[pte1_index(va) & PT2PG_MASK] == 0); } static __inline boolean_t pt2_is_full(vm_page_t m, vm_offset_t va) { return (m->md.pt2_wirecount[pte1_index(va) & PT2PG_MASK] == NPTE2_IN_PT2); } static __inline boolean_t pt2pg_is_empty(vm_page_t m) { return (m->wire_count == 1); } /* * This routine is called if the L2 page table * is not mapped correctly. */ static vm_page_t _pmap_allocpte2(pmap_t pmap, vm_offset_t va, u_int flags) { uint32_t pte1_idx; pt1_entry_t *pte1p; pt2_entry_t pte2; vm_page_t m; vm_paddr_t pt2pg_pa, pt2_pa; pte1_idx = pte1_index(va); pte1p = pmap->pm_pt1 + pte1_idx; KASSERT(pte1_load(pte1p) == 0, ("%s: pm_pt1[%#x] is not zero: %#x", __func__, pte1_idx, pte1_load(pte1p))); pte2 = pt2tab_load(pmap_pt2tab_entry(pmap, va)); if (!pte2_is_valid(pte2)) { /* * Install a new PT2s page into the pmap's PT2TAB. */ m = vm_page_alloc(NULL, pte1_idx & ~PT2PG_MASK, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (m == NULL) { if ((flags & PMAP_ENTER_NOSLEEP) == 0) { PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); VM_WAIT; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); } /* * Indicate the need to retry. While waiting, * the L2 page table page may have been allocated. */ return (NULL); } pmap->pm_stats.resident_count++; pt2pg_pa = pmap_pt2pg_init(pmap, va, m); } else { pt2pg_pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pt2pg_pa); } pt2_wirecount_inc(m, pte1_idx); pt2_pa = page_pt2pa(pt2pg_pa, pte1_idx); pte1_store(pte1p, PTE1_LINK(pt2_pa)); return (m); } static vm_page_t pmap_allocpte2(pmap_t pmap, vm_offset_t va, u_int flags) { u_int pte1_idx; pt1_entry_t *pte1p, pte1; vm_page_t m; pte1_idx = pte1_index(va); retry: pte1p = pmap->pm_pt1 + pte1_idx; pte1 = pte1_load(pte1p); /* * This supports switching from a 1MB page to a * normal 4K page. */ if (pte1_is_section(pte1)) { (void)pmap_demote_pte1(pmap, pte1p, va); /* * Reload pte1 after demotion. * * Note: Demotion can even fail, as either no PT2 is found for * the virtual address or a PT2PG cannot be allocated. */ pte1 = pte1_load(pte1p); } /* * If the L2 page table page is mapped, we just increment the * hold count, and activate it.
*/ if (pte1_is_link(pte1)) { m = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); pt2_wirecount_inc(m, pte1_idx); } else { /* * Here if the PT2 isn't mapped, or if it has * been deallocated. */ m = _pmap_allocpte2(pmap, va, flags); if (m == NULL && (flags & PMAP_ENTER_NOSLEEP) == 0) goto retry; } return (m); } static __inline void pmap_free_zero_pages(struct spglist *free) { vm_page_t m; while ((m = SLIST_FIRST(free)) != NULL) { SLIST_REMOVE_HEAD(free, plinks.s.ss); /* Preserve the page's PG_ZERO setting. */ vm_page_free_toq(m); } } /* * Schedule the specified unused L2 page table page to be freed. Specifically, * add the page to the specified list of pages that will be released to the * physical memory manager after the TLB has been updated. */ static __inline void pmap_add_delayed_free_list(vm_page_t m, struct spglist *free) { /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ #ifdef PMAP_DEBUG pmap_zero_page_check(m); #endif m->flags |= PG_ZERO; SLIST_INSERT_HEAD(free, m, plinks.s.ss); } /* * Unwire L2 page tables page. */ static void pmap_unwire_pt2pg(pmap_t pmap, vm_offset_t va, vm_page_t m) { pt1_entry_t *pte1p, opte1 __unused; pt2_entry_t *pte2p; uint32_t i; KASSERT(pt2pg_is_empty(m), ("%s: pmap %p PT2PG %p wired", __func__, pmap, m)); /* * Unmap all L2 page tables in the page from L1 page table. * * QQQ: Individual L2 page tables (except the last one) can be unmapped * earlier. However, we are doing that this way. */ KASSERT(m->pindex == (pte1_index(va) & ~PT2PG_MASK), ("%s: pmap %p va %#x PT2PG %p bad index", __func__, pmap, va, m)); pte1p = pmap->pm_pt1 + m->pindex; for (i = 0; i < NPT2_IN_PG; i++, pte1p++) { KASSERT(m->md.pt2_wirecount[i] == 0, ("%s: pmap %p PT2 %u (PG %p) wired", __func__, pmap, i, m)); opte1 = pte1_load(pte1p); if (pte1_is_link(opte1)) { pte1_clear(pte1p); /* * Flush intermediate TLB cache. */ pmap_tlb_flush(pmap, (m->pindex + i) << PTE1_SHIFT); } #ifdef INVARIANTS else KASSERT((opte1 == 0) || pte1_is_section(opte1), ("%s: pmap %p va %#x bad pte1 %x at %u", __func__, pmap, va, opte1, i)); #endif } /* * Unmap the page from PT2TAB. */ pte2p = pmap_pt2tab_entry(pmap, va); (void)pt2tab_load_clear(pte2p); pmap_tlb_flush(pmap, pt2map_pt2pg(va)); m->wire_count = 0; pmap->pm_stats.resident_count--; /* * This is a release store so that the ordinary store unmapping * the L2 page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); } /* * Decrements a L2 page table page's wire count, which is used to record the * number of valid page table entries within the page. If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. */ static __inline boolean_t pmap_unwire_pt2(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { pt2_wirecount_dec(m, pte1_index(va)); if (pt2pg_is_empty(m)) { /* * QQQ: Wire count is zero, so whole page should be zero and * we can set PG_ZERO flag to it. * Note that when promotion is enabled, it takes some * more efforts. See pmap_unwire_pt2_all() below. */ pmap_unwire_pt2pg(pmap, va, m); pmap_add_delayed_free_list(m, free); return (TRUE); } else return (FALSE); } /* * Drop a L2 page table page's wire count at once, which is used to record * the number of valid L2 page table entries within the page. If the wire * count drops to zero, then the L2 page table page is unmapped. 
*/ static __inline void pmap_unwire_pt2_all(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { u_int pte1_idx = pte1_index(va); KASSERT(m->pindex == (pte1_idx & ~PT2PG_MASK), ("%s: PT2 page's pindex is wrong", __func__)); KASSERT(m->wire_count > pt2_wirecount_get(m, pte1_idx), ("%s: bad pt2 wire count %u > %u", __func__, m->wire_count, pt2_wirecount_get(m, pte1_idx))); /* * It's possible that the L2 page table was never used. * This happens when a section was created without promotion. */ if (pt2_is_full(m, va)) { pt2_wirecount_set(m, pte1_idx, 0); /* * QQQ: We clear the L2 page table now, so that when the L2 page * table page is going to be freed, we can set its PG_ZERO flag ... * This function is called only on section mappings, so * hopefully it's not too big an overhead. * * XXX: If the pmap is current, the existing PT2MAP mapping could be * used for zeroing. */ pmap_zero_page_area(m, page_pt2off(pte1_idx), NB_IN_PT2); } #ifdef INVARIANTS else KASSERT(pt2_is_empty(m, va), ("%s: PT2 is not empty (%u)", __func__, pt2_wirecount_get(m, pte1_idx))); #endif if (pt2pg_is_empty(m)) { pmap_unwire_pt2pg(pmap, va, m); pmap_add_delayed_free_list(m, free); } } /* * After removing an L2 page table entry, this routine is used to * conditionally free the page, and manage the hold/wire counts. */ static boolean_t pmap_unuse_pt2(pmap_t pmap, vm_offset_t va, struct spglist *free) { pt1_entry_t pte1; vm_page_t mpte; if (va >= VM_MAXUSER_ADDRESS) return (FALSE); pte1 = pte1_load(pmap_pte1(pmap, va)); mpte = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); return (pmap_unwire_pt2(pmap, va, mpte, free)); } /************************************* * * Page management routines. * *************************************/ CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 11); CTASSERT(_NPCPV == 336); static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) #define PC_FREE0_9 0xfffffffful /* Free values for index 0 through 9 */ #define PC_FREE10 0x0000fffful /* Free values for index 10 */ static const uint32_t pc_freemask[_NPCM] = { PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE10 }; SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif /* * Is the given page managed?
*/ static __inline boolean_t is_managed(vm_paddr_t pa) { vm_offset_t pgnum; vm_page_t m; pgnum = atop(pa); if (pgnum >= first_page) { m = PHYS_TO_VM_PAGE(pa); if (m == NULL) return (FALSE); if ((m->oflags & VPO_UNMANAGED) == 0) return (TRUE); } return (FALSE); } static __inline boolean_t pte1_is_managed(pt1_entry_t pte1) { return (is_managed(pte1_pa(pte1))); } static __inline boolean_t pte2_is_managed(pt2_entry_t pte2) { return (is_managed(pte2_pa(pte2))); } /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. */ static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap) { struct pch newtail; struct pv_chunk *pc; struct md_page *pvh; pt1_entry_t *pte1p; pmap_t pmap; pt2_entry_t *pte2p, tpte2; pv_entry_t pv; vm_offset_t va; vm_page_t m, m_pc; struct spglist free; uint32_t inuse; int bit, field, freed; PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); pmap = NULL; m_pc = NULL; SLIST_INIT(&free); TAILQ_INIT(&newtail); while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL && (pv_vafree == 0 || SLIST_EMPTY(&free))) { TAILQ_REMOVE(&pv_chunks, pc, pc_lru); if (pmap != pc->pc_pmap) { if (pmap != NULL) { if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap = pc->pc_pmap; /* Avoid deadlock and lock recursion. */ if (pmap > locked_pmap) PMAP_LOCK(pmap); else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { pmap = NULL; TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } } /* * Destroy every non-wired, 4 KB page mapping in the chunk. */ freed = 0; for (field = 0; field < _NPCM; field++) { for (inuse = ~pc->pc_map[field] & pc_freemask[field]; inuse != 0; inuse &= ~(1UL << bit)) { bit = ffs(inuse) - 1; pv = &pc->pc_pventry[field * 32 + bit]; va = pv->pv_va; pte1p = pmap_pte1(pmap, va); if (pte1_is_section(pte1_load(pte1p))) continue; pte2p = pmap_pte2(pmap, va); tpte2 = pte2_load(pte2p); if ((tpte2 & PTE2_W) == 0) tpte2 = pte2_load_clear(pte2p); pmap_pte2_release(pte2p); if ((tpte2 & PTE2_W) != 0) continue; KASSERT(tpte2 != 0, ("pmap_pv_reclaim: pmap %p va %#x zero pte", pmap, va)); pmap_tlb_flush(pmap, va); m = PHYS_TO_VM_PAGE(pte2_pa(tpte2)); if (pte2_is_dirty(tpte2)) vm_page_dirty(m); if ((tpte2 & PTE2_A) != 0) vm_page_aflag_set(m, PGA_REFERENCED); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) { vm_page_aflag_clear(m, PGA_WRITEABLE); } } pc->pc_map[field] |= 1UL << bit; pmap_unuse_pt2(pmap, va, &free); freed++; } } if (freed == 0) { TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } /* Every freed mapping is for a 4 KB page. */ pmap->pm_stats.resident_count -= freed; PV_STAT(pv_entry_frees += freed); PV_STAT(pv_entry_spare += freed); pv_entry_count -= freed; TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != pc_freemask[field]) { TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); /* * One freed pv entry in locked_pmap is * sufficient. */ if (pmap == locked_pmap) goto out; break; } if (field == _NPCM) { PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* Entire chunk is free; return it. 
*/ m_pc = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); pmap_pte2list_free(&pv_vafree, (vm_offset_t)pc); break; } } out: TAILQ_CONCAT(&pv_chunks, &newtail, pc_lru); if (pmap != NULL) { if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } if (m_pc == NULL && pv_vafree != 0 && SLIST_EMPTY(&free)) { m_pc = SLIST_FIRST(&free); SLIST_REMOVE_HEAD(&free, plinks.s.ss); /* Recycle a freed page table page. */ m_pc->wire_count = 1; atomic_add_int(&vm_cnt.v_wire_count, 1); } pmap_free_zero_pages(&free); return (m_pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; TAILQ_REMOVE(&pv_chunks, pc, pc_lru); PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); vm_page_unwire(m, PQ_NONE); vm_page_free(m); pmap_pte2list_free(&pv_vafree, (vm_offset_t)pc); } /* * Free the pv_entry back to the free list. */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 32; bit = idx % 32; pc->pc_map[field] |= 1ul << bit; for (idx = 0; idx < _NPCM; idx++) if (pc->pc_map[idx] != pc_freemask[idx]) { /* * 98% of the time, pc is already at the head of the * list. If it isn't already, move it to the head. */ if (__predict_false(TAILQ_FIRST(&pmap->pm_pvchunk) != pc)) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } /* * Get a new pv_entry, allocating a block from the system * when needed. */ static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try) { static const struct timeval printinterval = { 60, 0 }; static struct timeval lastprint; int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_allocs++); pv_entry_count++; if (pv_entry_count > pv_entry_high_water) if (ratecheck(&lastprint, &printinterval)) printf("Approaching the limit on PV entries, consider " "increasing either the vm.pmap.shpgperproc or the " "vm.pmap.pv_entry_max tunable.\n"); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = ffs(pc->pc_map[field]) - 1; break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 32 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != 0) { PV_STAT(pv_entry_spare--); return (pv); /* not full, return */ } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare--); return (pv); } } /* * Access to the pte2list "pv_vafree" is synchronized by the pvh * global lock. If "pv_vafree" is currently non-empty, it will * remain non-empty until pmap_pte2list_alloc() completes. 
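* (For scale, given the CTASSERTs earlier in this file: one chunk page holds _NPCPV == 336 pv entries, i.e. 10 full 32-bit bitmap words plus a 16-bit tail, which is exactly what PC_FREE0_9 and PC_FREE10 encode.)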
*/ if (pv_vafree == 0 || (m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { if (try) { pv_entry_count--; PV_STAT(pc_chunk_tryfail++); return (NULL); } m = pmap_pv_reclaim(pmap); if (m == NULL) goto retry; } PV_STAT(pc_chunk_count++); PV_STAT(pc_chunk_allocs++); pc = (struct pv_chunk *)pmap_pte2list_alloc(&pv_vafree); pmap_qenter((vm_offset_t)pc, &m, 1); pc->pc_pmap = pmap; pc->pc_map[0] = pc_freemask[0] & ~1ul; /* preallocated bit 0 */ for (field = 1; field < _NPCM; field++) pc->pc_map[field] = pc_freemask[field]; TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare += _NPCPV - 1); return (pv); } /* * Create a pv entry for page at pa for * (pmap, va). */ static void pmap_insert_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); break; } } return (pv); } static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } static void pmap_remove_entry(pmap_t pmap, vm_page_t m, vm_offset_t va) { struct md_page *pvh; rw_assert(&pvh_global_lock, RA_WLOCKED); pmap_pvh_free(&m->md, pmap, va); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } static void pmap_pv_demote_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PTE1_OFFSET) == 0, ("pmap_pv_demote_pte1: pa is not 1mpage aligned")); /* * Transfer the 1mpage's pv entry for this mapping to the first * page's pv list. */ pvh = pa_to_pvh(pa); va = pte1_trunc(va); pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pv_demote_pte1: pv not found")); m = PHYS_TO_VM_PAGE(pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); /* Instantiate the remaining NPTE2_IN_PT2 - 1 pv entries. */ va_last = va + PTE1_SIZE - PAGE_SIZE; do { m++; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_pv_demote_pte1: page %p is not managed", m)); va += PAGE_SIZE; pmap_insert_entry(pmap, va, m); } while (va < va_last); } static void pmap_pv_promote_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PTE1_OFFSET) == 0, ("pmap_pv_promote_pte1: pa is not 1mpage aligned")); /* * Transfer the first page's pv entry for this mapping to the * 1mpage's pv list. Aside from avoiding the cost of a call * to get_pv_entry(), a transfer avoids the possibility that * get_pv_entry() calls pmap_pv_reclaim() and that pmap_pv_reclaim() * removes one of the mappings that is being promoted. 
*/ m = PHYS_TO_VM_PAGE(pa); va = pte1_trunc(va); pv = pmap_pvh_remove(&m->md, pmap, va); KASSERT(pv != NULL, ("pmap_pv_promote_pte1: pv not found")); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); /* Free the remaining NPTE2_IN_PT2 - 1 pv entries. */ va_last = va + PTE1_SIZE - PAGE_SIZE; do { m++; va += PAGE_SIZE; pmap_pvh_free(&m->md, pmap, va); } while (va < va_last); } /* * Conditionally create a pv entry. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } /* * Create the pv entries for each of the pages within a section. */ static boolean_t pmap_pv_insert_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } static inline void pmap_tlb_flush_pte1(pmap_t pmap, vm_offset_t va, pt1_entry_t npte1) { /* Kill all the small mappings or the big one only. */ if (pte1_is_section(npte1)) pmap_tlb_flush_range(pmap, pte1_trunc(va), PTE1_SIZE); else pmap_tlb_flush(pmap, pte1_trunc(va)); } /* * Update kernel pte1 on all pmaps. * * The following function is called only on one cpu with disabled interrupts. * In SMP case, smp_rendezvous_cpus() is used to stop other cpus. This way * nobody can invoke explicit hardware table walk during the update of pte1. * Unsolicited hardware table walk can still happen, invoked by speculative * data or instruction prefetch or even by speculative hardware table walk. * * The break-before-make approach should be implemented here. However, it's * not so easy to do that for kernel mappings as it would be unhappy to unmap * itself unexpectedly but voluntarily. */ static void pmap_update_pte1_kernel(vm_offset_t va, pt1_entry_t npte1) { pmap_t pmap; pt1_entry_t *pte1p; /* * Get current pmap. Interrupts should be disabled here * so PCPU_GET() is done atomically. */ pmap = PCPU_GET(curpmap); if (pmap == NULL) pmap = kernel_pmap; /* * (1) Change pte1 on current pmap. * (2) Flush all obsolete TLB entries on current CPU. * (3) Change pte1 on all pmaps. * (4) Flush all obsolete TLB entries on all CPUs in SMP case. */ pte1p = pmap_pte1(pmap, va); pte1_store(pte1p, npte1); /* Kill all the small mappings or the big one only. */ if (pte1_is_section(npte1)) { pmap_pte1_kern_promotions++; tlb_flush_range_local(pte1_trunc(va), PTE1_SIZE); } else { pmap_pte1_kern_demotions++; tlb_flush_local(pte1_trunc(va)); } /* * In SMP case, this function is called when all cpus are at smp * rendezvous, so there is no need to use 'allpmaps_lock' lock here. * In UP case, the function is called with this lock locked. */ LIST_FOREACH(pmap, &allpmaps, pm_list) { pte1p = pmap_pte1(pmap, va); pte1_store(pte1p, npte1); } #ifdef SMP /* Kill all the small mappings or the big one only. 
*/ if (pte1_is_section(npte1)) tlb_flush_range(pte1_trunc(va), PTE1_SIZE); else tlb_flush(pte1_trunc(va)); #endif } #ifdef SMP struct pte1_action { vm_offset_t va; pt1_entry_t npte1; u_int update; /* CPU that updates the PTE1 */ }; static void pmap_update_pte1_action(void *arg) { struct pte1_action *act = arg; if (act->update == PCPU_GET(cpuid)) pmap_update_pte1_kernel(act->va, act->npte1); } /* * Change pte1 on current pmap. * Note that kernel pte1 must be changed on all pmaps. * * According to the architecture reference manual published by ARM, * the behaviour is UNPREDICTABLE when two or more TLB entries map the same VA. * According to this manual, UNPREDICTABLE behaviours must never happen in * a viable system. In contrast, on x86 processors, it is not specified which * TLB entry mapping the virtual address will be used, but the MMU doesn't * generate a bogus translation the way it does on Cortex-A8 rev 2 (Beaglebone * Black). * * It's a problem when either promotion or demotion is being done. The pte1 * update and appropriate TLB flush must be done atomically in general. */ static void pmap_change_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va, pt1_entry_t npte1) { if (pmap == kernel_pmap) { struct pte1_action act; sched_pin(); act.va = va; act.npte1 = npte1; act.update = PCPU_GET(cpuid); smp_rendezvous_cpus(all_cpus, smp_no_rendevous_barrier, pmap_update_pte1_action, NULL, &act); sched_unpin(); } else { register_t cspr; /* * Use break-before-make approach for changing userland * mappings. It can cause L1 translation aborts on other * cores in SMP case. So, special treatment is implemented * in pmap_fault(). To reduce the likelihood that another core * will be affected by the broken mapping, disable interrupts * until the mapping change is completed. */ cspr = disable_interrupts(PSR_I | PSR_F); pte1_clear(pte1p); pmap_tlb_flush_pte1(pmap, va, npte1); pte1_store(pte1p, npte1); restore_interrupts(cspr); } } #else static void pmap_change_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va, pt1_entry_t npte1) { if (pmap == kernel_pmap) { mtx_lock_spin(&allpmaps_lock); pmap_update_pte1_kernel(va, npte1); mtx_unlock_spin(&allpmaps_lock); } else { register_t cspr; /* * Use break-before-make approach for changing userland * mappings. It's absolutely safe in UP case when interrupts * are disabled. */ cspr = disable_interrupts(PSR_I | PSR_F); pte1_clear(pte1p); pmap_tlb_flush_pte1(pmap, va, npte1); pte1_store(pte1p, npte1); restore_interrupts(cspr); } } #endif /* * Tries to promote the NPTE2_IN_PT2, contiguous 4KB page mappings that are * within a single page table page (PT2) to a single 1MB page mapping. * For promotion to occur, two conditions must be met: (1) the 4KB page * mappings must map aligned, contiguous physical memory and (2) the 4KB page * mappings must have identical characteristics. * * Managed (PG_MANAGED) mappings within the kernel address space are not * promoted. The reason is that kernel PTE1s are replicated in each pmap but * pmap_remove_write(), pmap_clear_modify(), and pmap_clear_reference() only * read the PTE1 from the kernel pmap. */ static void pmap_promote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { pt1_entry_t npte1; pt2_entry_t *fpte2p, fpte2, fpte2_fav; pt2_entry_t *pte2p, pte2; vm_offset_t pteva __unused; vm_page_t m __unused; PDEBUG(6, printf("%s(%p): try for va %#x pte1 %#x at %p\n", __func__, pmap, va, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Examine the first PTE2 in the specified PT2. 
Abort if this PTE2 is * either invalid, unused, or does not map the first 4KB physical page * within a 1MB page. */ fpte2p = pmap_pte2_quick(pmap, pte1_trunc(va)); fpte2 = pte2_load(fpte2p); if ((fpte2 & ((PTE2_FRAME & PTE1_OFFSET) | PTE2_A | PTE2_V)) != (PTE2_A | PTE2_V)) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(1) for va %#x in pmap %p", __func__, va, pmap); return; } if (pte2_is_managed(fpte2) && pmap == kernel_pmap) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(2) for va %#x in pmap %p", __func__, va, pmap); return; } if ((fpte2 & (PTE2_NM | PTE2_RO)) == PTE2_NM) { /* * When page is not modified, PTE2_RO can be set without * a TLB invalidation. */ fpte2 |= PTE2_RO; pte2_store(fpte2p, fpte2); } /* * Examine each of the other PTE2s in the specified PT2. Abort if this * PTE2 maps an unexpected 4KB physical page or does not have identical * characteristics to the first PTE2. */ fpte2_fav = (fpte2 & (PTE2_FRAME | PTE2_A | PTE2_V)); fpte2_fav += PTE1_SIZE - PTE2_SIZE; /* examine from the end */ for (pte2p = fpte2p + NPTE2_IN_PT2 - 1; pte2p > fpte2p; pte2p--) { pte2 = pte2_load(pte2p); if ((pte2 & (PTE2_FRAME | PTE2_A | PTE2_V)) != fpte2_fav) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(3) for va %#x in pmap %p", __func__, va, pmap); return; } if ((pte2 & (PTE2_NM | PTE2_RO)) == PTE2_NM) { /* * When page is not modified, PTE2_RO can be set * without a TLB invalidation. See note above. */ pte2 |= PTE2_RO; pte2_store(pte2p, pte2); pteva = pte1_trunc(va) | (pte2 & PTE1_OFFSET & PTE2_FRAME); CTR3(KTR_PMAP, "%s: protect for va %#x in pmap %p", __func__, pteva, pmap); } if ((pte2 & PTE2_PROMOTE) != (fpte2 & PTE2_PROMOTE)) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(4) for va %#x in pmap %p", __func__, va, pmap); return; } fpte2_fav -= PTE2_SIZE; } /* * The page table page in its current state will stay in PT2TAB * until the PTE1 mapping the section is demoted by pmap_demote_pte1() * or destroyed by pmap_remove_pte1(). * * Note that L2 page table size is not equal to PAGE_SIZE. */ m = PHYS_TO_VM_PAGE(trunc_page(pte1_link_pa(pte1_load(pte1p)))); KASSERT(m >= vm_page_array && m < &vm_page_array[vm_page_array_size], ("%s: PT2 page is out of range", __func__)); KASSERT(m->pindex == (pte1_index(va) & ~PT2PG_MASK), ("%s: PT2 page's pindex is wrong", __func__)); /* * Get pte1 from pte2 format. */ npte1 = (fpte2 & PTE1_FRAME) | ATTR_TO_L1(fpte2) | PTE1_V; /* * Promote the pv entries. */ if (pte2_is_managed(fpte2)) pmap_pv_promote_pte1(pmap, va, pte1_pa(npte1)); /* * Promote the mappings. */ pmap_change_pte1(pmap, pte1p, va, npte1); pmap_pte1_promotions++; CTR3(KTR_PMAP, "%s: success for va %#x in pmap %p", __func__, va, pmap); PDEBUG(6, printf("%s(%p): success for va %#x pte1 %#x(%#x) at %p\n", __func__, pmap, va, npte1, pte1_load(pte1p), pte1p)); } /* * Zero L2 page table page. */ static __inline void pmap_clear_pt2(pt2_entry_t *fpte2p) { pt2_entry_t *pte2p; for (pte2p = fpte2p; pte2p < fpte2p + NPTE2_IN_PT2; pte2p++) pte2_clear(pte2p); } /* * Removes a 1MB page mapping from the kernel pmap. */ static void pmap_remove_kernel_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { vm_page_t m; uint32_t pte1_idx; pt2_entry_t *fpte2p; vm_paddr_t pt2_pa; PMAP_LOCK_ASSERT(pmap, MA_OWNED); m = pmap_pt2_page(pmap, va); if (m == NULL) /* * QQQ: Is this function called only on promoted pte1? * We certainly do section mappings directly * (without promotion) in kernel !!! */ panic("%s: missing pt2 page", __func__); pte1_idx = pte1_index(va); /* * Initialize the L2 page table. 
*/ fpte2p = page_pt2(pt2map_pt2pg(va), pte1_idx); pmap_clear_pt2(fpte2p); /* * Remove the mapping. */ pt2_pa = page_pt2pa(VM_PAGE_TO_PHYS(m), pte1_idx); pmap_kenter_pte1(va, PTE1_LINK(pt2_pa)); /* * QQQ: We do not need to invalidate PT2MAP mapping * as we did not change it. I.e. the L2 page table page * was and still is mapped the same way. */ } /* * Do the things to unmap a section in a process. */ static void pmap_remove_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t sva, struct spglist *free) { pt1_entry_t opte1; struct md_page *pvh; vm_offset_t eva, va; vm_page_t m; PDEBUG(6, printf("%s(%p): va %#x pte1 %#x at %p\n", __func__, pmap, sva, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PTE1_OFFSET) == 0, ("%s: sva is not 1mpage aligned", __func__)); /* * Clear and invalidate the mapping. It should occupy one and only one * TLB entry. So, pmap_tlb_flush() called with an aligned address should * be sufficient. */ opte1 = pte1_load_clear(pte1p); pmap_tlb_flush(pmap, sva); if (pte1_is_wired(opte1)) pmap->pm_stats.wired_count -= PTE1_SIZE / PAGE_SIZE; pmap->pm_stats.resident_count -= PTE1_SIZE / PAGE_SIZE; if (pte1_is_managed(opte1)) { pvh = pa_to_pvh(pte1_pa(opte1)); pmap_pvh_free(pvh, pmap, sva); eva = sva + PTE1_SIZE; for (va = sva, m = PHYS_TO_VM_PAGE(pte1_pa(opte1)); va < eva; va += PAGE_SIZE, m++) { if (pte1_is_dirty(opte1)) vm_page_dirty(m); if (opte1 & PTE1_A) vm_page_aflag_set(m, PGA_REFERENCED); if (TAILQ_EMPTY(&m->md.pv_list) && TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } if (pmap == kernel_pmap) { /* * L2 page table(s) can't be removed from the kernel map as * the kernel counts on it (stuff around pmap_growkernel()). */ pmap_remove_kernel_pte1(pmap, pte1p, sva); } else { /* * Get associated L2 page table page. * It's possible that the page was never allocated. */ m = pmap_pt2_page(pmap, sva); if (m != NULL) pmap_unwire_pt2_all(pmap, sva, m, free); } } /* * Fills L2 page table page with mappings to consecutive physical pages. */ static __inline void pmap_fill_pt2(pt2_entry_t *fpte2p, pt2_entry_t npte2) { pt2_entry_t *pte2p; for (pte2p = fpte2p; pte2p < fpte2p + NPTE2_IN_PT2; pte2p++) { pte2_store(pte2p, npte2); npte2 += PTE2_SIZE; } } /* * Tries to demote a 1MB page mapping. If demotion fails, the * 1MB page mapping is invalidated. */ static boolean_t pmap_demote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { pt1_entry_t opte1, npte1; pt2_entry_t *fpte2p, npte2; vm_paddr_t pt2pg_pa, pt2_pa; vm_page_t m; struct spglist free; uint32_t pte1_idx, isnew = 0; PDEBUG(6, printf("%s(%p): try for va %#x pte1 %#x at %p\n", __func__, pmap, va, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); opte1 = pte1_load(pte1p); KASSERT(pte1_is_section(opte1), ("%s: opte1 not a section", __func__)); if ((opte1 & PTE1_A) == 0 || (m = pmap_pt2_page(pmap, va)) == NULL) { KASSERT(!pte1_is_wired(opte1), ("%s: PT2 page for a wired mapping is missing", __func__)); /* * Invalidate the 1MB page mapping and return * "failure" if the mapping was never accessed or the * allocation of the new page table page fails. 
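 *
 * Destroying a never-accessed mapping this way is presumably safe:
 * a later access simply faults and re-creates the mapping on demand,
 * whereas an accessed mapping carries referenced (and possibly dirty)
 * state that the demotion below must preserve.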
*/ if ((opte1 & PTE1_A) == 0 || (m = vm_page_alloc(NULL, pte1_index(va) & ~PT2PG_MASK, VM_ALLOC_NOOBJ | VM_ALLOC_NORMAL | VM_ALLOC_WIRED)) == NULL) { SLIST_INIT(&free); pmap_remove_pte1(pmap, pte1p, pte1_trunc(va), &free); pmap_free_zero_pages(&free); CTR3(KTR_PMAP, "%s: failure for va %#x in pmap %p", __func__, va, pmap); return (FALSE); } if (va < VM_MAXUSER_ADDRESS) pmap->pm_stats.resident_count++; isnew = 1; /* * We init all L2 page tables in the page even if * we are going to change everything for just one L2 * page table in a moment. */ pt2pg_pa = pmap_pt2pg_init(pmap, va, m); } else { if (va < VM_MAXUSER_ADDRESS) { if (pt2_is_empty(m, va)) isnew = 1; /* Demoting section w/o promotion. */ #ifdef INVARIANTS else KASSERT(pt2_is_full(m, va), ("%s: bad PT2 wire" " count %u", __func__, pt2_wirecount_get(m, pte1_index(va)))); #endif } } pt2pg_pa = VM_PAGE_TO_PHYS(m); pte1_idx = pte1_index(va); /* * If the pmap is current, then the PT2MAP can provide access to * the page table page (promoted L2 page tables are not unmapped). * Otherwise, temporarily map the L2 page table page (m) into * the kernel's address space at either PADDR1 or PADDR2. * * Note that L2 page table size is not equal to PAGE_SIZE. */ if (pmap_is_current(pmap)) fpte2p = page_pt2(pt2map_pt2pg(va), pte1_idx); else if (curthread->td_pinned > 0 && rw_wowned(&pvh_global_lock)) { if (pte2_pa(pte2_load(PMAP1)) != pt2pg_pa) { pte2_store(PMAP1, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; fpte2p = page_pt2((vm_offset_t)PADDR1, pte1_idx); } else { mtx_lock(&PMAP2mutex); if (pte2_pa(pte2_load(PMAP2)) != pt2pg_pa) { pte2_store(PMAP2, PTE2_KPT(pt2pg_pa)); tlb_flush((vm_offset_t)PADDR2); } fpte2p = page_pt2((vm_offset_t)PADDR2, pte1_idx); } pt2_pa = page_pt2pa(pt2pg_pa, pte1_idx); npte1 = PTE1_LINK(pt2_pa); KASSERT((opte1 & PTE1_A) != 0, ("%s: opte1 is missing PTE1_A", __func__)); KASSERT((opte1 & (PTE1_NM | PTE1_RO)) != PTE1_NM, ("%s: opte1 has PTE1_NM", __func__)); /* * Get pte2 from pte1 format. */ npte2 = pte1_pa(opte1) | ATTR_TO_L2(opte1) | PTE2_V; /* * If the L2 page table page is new, initialize it. If the mapping * has changed attributes, update the page table entries. */ if (isnew != 0) { pt2_wirecount_set(m, pte1_idx, NPTE2_IN_PT2); pmap_fill_pt2(fpte2p, npte2); } else if ((pte2_load(fpte2p) & PTE2_PROMOTE) != (npte2 & PTE2_PROMOTE)) pmap_fill_pt2(fpte2p, npte2); KASSERT(pte2_pa(pte2_load(fpte2p)) == pte2_pa(npte2), ("%s: fpte2p and npte2 map different physical addresses", __func__)); if (fpte2p == PADDR2) mtx_unlock(&PMAP2mutex); /* * Demote the mapping. This pmap is locked. The old PTE1 has * PTE1_A set. If the old PTE1 does not have PTE1_RO set, it also * does not have PTE1_NM set. Thus, there is no danger of a race with * another processor changing the setting of PTE1_A and/or PTE1_NM * between the read above and the store below. */ pmap_change_pte1(pmap, pte1p, va, npte1); /* * Demote the pv entry. This depends on the earlier demotion * of the mapping. Specifically, the (re)creation of a per- * page pv entry might trigger the execution of pmap_pv_reclaim(), * which might reclaim a newly (re)created per-page pv entry * and destroy the associated mapping. In order to destroy * the mapping, the PTE1 must have already changed from mapping * the 1mpage to referencing the page table page. 
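 *
 * Ordering sketch (illustrative):
 *
 *	pmap_change_pte1(pmap, pte1p, va, npte1);	(1) pte1 -> PT2 link
 *	pmap_pv_demote_pte1(pmap, va, pte1_pa(opte1));	(2) may allocate pv
 *							    entries and so call
 *							    pmap_pv_reclaim()
 *
 * Doing (2) before (1) could let pmap_pv_reclaim() tear down a 4KB
 * mapping while the old PTE1 still maps the whole 1MB section.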
*/ if (pte1_is_managed(opte1)) pmap_pv_demote_pte1(pmap, va, pte1_pa(opte1)); pmap_pte1_demotions++; CTR3(KTR_PMAP, "%s: success for va %#x in pmap %p", __func__, va, pmap); PDEBUG(6, printf("%s(%p): success for va %#x pte1 %#x(%#x) at %p\n", __func__, pmap, va, npte1, pte1_load(pte1p), pte1p)); return (TRUE); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. * * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { pt1_entry_t *pte1p; pt2_entry_t *pte2p; pt2_entry_t npte2, opte2; pv_entry_t pv; vm_paddr_t opa, pa; vm_page_t mpte2, om; boolean_t wired; va = trunc_page(va); mpte2 = NULL; wired = (flags & PMAP_ENTER_WIRED) != 0; KASSERT(va <= vm_max_kernel_address, ("%s: toobig", __func__)); KASSERT(va < UPT2V_MIN_ADDRESS || va >= UPT2V_MAX_ADDRESS, ("%s: invalid to pmap_enter page table pages (va: 0x%x)", __func__, va)); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { mpte2 = pmap_allocpte2(pmap, va, flags); if (mpte2 == NULL) { KASSERT((flags & PMAP_ENTER_NOSLEEP) != 0, ("pmap_allocpte2 failed with sleep allowed")); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } } pte1p = pmap_pte1(pmap, va); if (pte1_is_section(pte1_load(pte1p))) panic("%s: attempted on 1MB page", __func__); pte2p = pmap_pte2_quick(pmap, va); if (pte2p == NULL) panic("%s: invalid L1 page table entry va=%#x", __func__, va); om = NULL; pa = VM_PAGE_TO_PHYS(m); opte2 = pte2_load(pte2p); opa = pte2_pa(opte2); /* * Mapping has not changed, must be protection or wiring change. */ if (pte2_is_valid(opte2) && (opa == pa)) { /* * Wiring change, just update stats. We don't worry about * wiring PT2 pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT2 page will be also. */ if (wired && !pte2_is_wired(opte2)) pmap->pm_stats.wired_count++; else if (!wired && pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; /* * Remove extra pte2 reference */ if (mpte2) pt2_wirecount_dec(mpte2, pte1_index(va)); if (pte2_is_managed(opte2)) om = m; goto validate; } /* * QQQ: We think that changing physical address on writeable mapping * is not safe. Well, maybe on kernel address space with correct * locking, it can make a sense. However, we have no idea why * anyone should do that on user address space. Are we wrong? */ KASSERT((opa == 0) || (opa == pa) || !pte2_is_valid(opte2) || ((opte2 & PTE2_RO) != 0), ("%s: pmap %p va %#x(%#x) opa %#x pa %#x - gotcha %#x %#x!", __func__, pmap, va, opte2, opa, pa, flags, prot)); pv = NULL; /* * Mapping has changed, invalidate old range and fall through to * handle validating new mapping. 
*/ if (opa) { if (pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; if (pte2_is_managed(opte2)) { om = PHYS_TO_VM_PAGE(opa); pv = pmap_pvh_remove(&om->md, pmap, va); } /* * Remove extra pte2 reference */ if (mpte2 != NULL) pt2_wirecount_dec(mpte2, va >> PTE1_SHIFT); } else pmap->pm_stats.resident_count++; /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva, ("%s: managed mapping within the clean submap", __func__)); if (pv == NULL) pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } else if (pv != NULL) free_pv_entry(pmap, pv); /* * Increment counters */ if (wired) pmap->pm_stats.wired_count++; validate: /* * Now validate mapping with desired protection/wiring. */ npte2 = PTE2(pa, PTE2_NM, vm_page_pte2_attr(m)); if (prot & VM_PROT_WRITE) { if (pte2_is_managed(npte2)) vm_page_aflag_set(m, PGA_WRITEABLE); } else npte2 |= PTE2_RO; if ((prot & VM_PROT_EXECUTE) == 0) npte2 |= PTE2_NX; if (wired) npte2 |= PTE2_W; if (va < VM_MAXUSER_ADDRESS) npte2 |= PTE2_U; if (pmap != kernel_pmap) npte2 |= PTE2_NG; /* * If the mapping or permission bits are different, we need * to update the pte2. * * QQQ: Think again and again what to do * if the mapping is going to be changed! */ if ((opte2 & ~(PTE2_NM | PTE2_A)) != (npte2 & ~(PTE2_NM | PTE2_A))) { /* * Sync icache if exec permission and attribute VM_MEMATTR_WB_WA * is set. Do it now, before the mapping is stored and made * valid for hardware table walk. If done later, there is a race * for other threads of the current process in the lazy loading case. * Don't do it for kernel memory which is mapped with exec * permission even if the memory isn't going to hold executable * code. The only time when icache sync is needed is after a * kernel module is loaded and the relocation info is processed. * And it's done in elf_cpu_load_file(). * * QQQ: (1) Is there any better way where * or how to sync the icache? * (2) Now, we do it on a page basis. */ if ((prot & VM_PROT_EXECUTE) && pmap != kernel_pmap && m->md.pat_mode == VM_MEMATTR_WB_WA && (opa != pa || (opte2 & PTE2_NX))) cache_icache_sync_fresh(va, pa, PAGE_SIZE); npte2 |= PTE2_A; if (flags & VM_PROT_WRITE) npte2 &= ~PTE2_NM; if (opte2 & PTE2_V) { /* Change mapping with break-before-make approach. */ opte2 = pte2_load_clear(pte2p); pmap_tlb_flush(pmap, va); pte2_store(pte2p, npte2); if (opte2 & PTE2_A) { if (pte2_is_managed(opte2)) vm_page_aflag_set(om, PGA_REFERENCED); } if (pte2_is_dirty(opte2)) { if (pte2_is_managed(opte2)) vm_page_dirty(om); } if (pte2_is_managed(opte2) && TAILQ_EMPTY(&om->md.pv_list) && ((om->flags & PG_FICTITIOUS) != 0 || TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list))) vm_page_aflag_clear(om, PGA_WRITEABLE); } else pte2_store(pte2p, npte2); } #if 0 else { /* * QQQ: Once both the access and the modified bits are * emulated by software, this should not happen. Some * analysis is needed if it really does. A missing * TLB flush somewhere could be the reason. */ panic("%s: pmap %p va %#x opte2 %x npte2 %x !!", __func__, pmap, va, opte2, npte2); } #endif /* * If both the L2 page table page and the reservation are fully * populated, then attempt promotion. */ if ((mpte2 == NULL || pt2_is_full(mpte2, va)) && sp_enabled && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) pmap_promote_pte1(pmap, pte1p, va); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } /* * Do the things to unmap a page in a process. 
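 *
 * Typical usage, mirroring pmap_remove_page() below (the pv global
 * lock and the pmap lock must be held and the thread pinned, as
 * asserted in the function):
 *
 *	SLIST_INIT(&free);
 *	pte2p = pmap_pte2_quick(pmap, va);
 *	if (pte2p != NULL && pte2_is_valid(pte2_load(pte2p)))
 *		pmap_remove_pte2(pmap, pte2p, va, &free);
 *	pmap_free_zero_pages(&free);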
*/ static int pmap_remove_pte2(pmap_t pmap, pt2_entry_t *pte2p, vm_offset_t va, struct spglist *free) { pt2_entry_t opte2; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* Clear and invalidate the mapping. */ opte2 = pte2_load_clear(pte2p); pmap_tlb_flush(pmap, va); KASSERT(pte2_is_valid(opte2), ("%s: pmap %p va %#x not link pte2 %#x", __func__, pmap, va, opte2)); if (opte2 & PTE2_W) pmap->pm_stats.wired_count -= 1; pmap->pm_stats.resident_count -= 1; if (pte2_is_managed(opte2)) { m = PHYS_TO_VM_PAGE(pte2_pa(opte2)); if (pte2_is_dirty(opte2)) vm_page_dirty(m); if (opte2 & PTE2_A) vm_page_aflag_set(m, PGA_REFERENCED); pmap_remove_entry(pmap, m, va); } return (pmap_unuse_pt2(pmap, va, free)); } /* * Remove a single page from a process address space. */ static void pmap_remove_page(pmap_t pmap, vm_offset_t va, struct spglist *free) { pt2_entry_t *pte2p; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("%s: curthread not pinned", __func__)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if ((pte2p = pmap_pte2_quick(pmap, va)) == NULL || !pte2_is_valid(pte2_load(pte2p))) return; pmap_remove_pte2(pmap, pte2p, va, free); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; struct spglist free; /* * Perform an unsynchronized read. This is, however, safe. */ if (pmap->pm_stats.resident_count == 0) return; SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); PMAP_LOCK(pmap); /* * Special handling of removing one page. A very common * operation and easy to short circuit some code. */ if (sva + PAGE_SIZE == eva) { pte1 = pte1_load(pmap_pte1(pmap, sva)); if (pte1_is_link(pte1)) { pmap_remove_page(pmap, sva, &free); goto out; } } for (; sva < eva; sva = nextva) { /* * Calculate address for next L2 page table. */ nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; if (pmap->pm_stats.resident_count == 0) break; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. Note: we assume that the L1 page * table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { /* * Are we removing the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pmap_remove_pte1(pmap, pte1p, sva, &free); continue; } else if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* The large page mapping was destroyed. */ continue; } #ifdef INVARIANTS else { /* Update pte1 after demotion. */ pte1 = pte1_load(pte1p); } #endif } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being removed. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) continue; if (pmap_remove_pte2(pmap, pte2p, sva, &free)) break; } } out: sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. 
* * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) */ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; pt2_entry_t *pte2p, opte2; pt1_entry_t *pte1p; vm_offset_t va; struct spglist free; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); (void)pmap_demote_pte1(pmap, pte1p, va); PMAP_UNLOCK(pmap); } small_mappings: while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pmap->pm_stats.resident_count--; pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found " "a 1mpage in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); opte2 = pte2_load_clear(pte2p); pmap_tlb_flush(pmap, pv->pv_va); KASSERT(pte2_is_valid(opte2), ("%s: pmap %p va %x zero pte2", __func__, pmap, pv->pv_va)); if (pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; if (opte2 & PTE2_A) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if (pte2_is_dirty(opte2)) vm_page_dirty(m); pmap_unuse_pt2(pmap, pv->pv_va, &free); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); pmap_free_zero_pages(&free); } /* * Just subroutine for pmap_remove_pages() to reasonably satisfy * good coding style, a.k.a. 80 character line width limit hell. */ static __inline void pmap_remove_pte1_quick(pmap_t pmap, pt1_entry_t pte1, pv_entry_t pv, struct spglist *free) { vm_paddr_t pa; vm_page_t m, mt, mpt2pg; struct md_page *pvh; pa = pte1_pa(pte1); m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("%s: vm_page_t %p addr mismatch %#x %#x", __func__, m, m->phys_addr, pa)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("%s: bad pte1 %#x", __func__, pte1)); if (pte1_is_dirty(pte1)) { for (mt = m; mt < &m[PTE1_SIZE / PAGE_SIZE]; mt++) vm_page_dirty(mt); } pmap->pm_stats.resident_count -= PTE1_SIZE / PAGE_SIZE; pvh = pa_to_pvh(pa); TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); if (TAILQ_EMPTY(&pvh->pv_list)) { for (mt = m; mt < &m[PTE1_SIZE / PAGE_SIZE]; mt++) if (TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } mpt2pg = pmap_pt2_page(pmap, pv->pv_va); if (mpt2pg != NULL) pmap_unwire_pt2_all(pmap, pv->pv_va, mpt2pg, free); } /* * Just subroutine for pmap_remove_pages() to reasonably satisfy * good coding style, a.k.a. 80 character line width limit hell. 
*/ static __inline void pmap_remove_pte2_quick(pmap_t pmap, pt2_entry_t pte2, pv_entry_t pv, struct spglist *free) { vm_paddr_t pa; vm_page_t m; struct md_page *pvh; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("%s: vm_page_t %p addr mismatch %#x %#x", __func__, m, m->phys_addr, pa)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("%s: bad pte2 %#x", __func__, pte2)); if (pte2_is_dirty(pte2)) vm_page_dirty(m); pmap->pm_stats.resident_count--; TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(pa); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } pmap_unuse_pt2(pmap, pv->pv_va, free); } /* * Remove all pages from the specified address space; this aids process * exit speeds. Also, this code is special-cased for the current process * only, but can have the more generic (and slightly slower) mode enabled. * This is much faster than pmap_remove in the case of running down * an entire address space. */ void pmap_remove_pages(pmap_t pmap) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; pv_entry_t pv; struct pv_chunk *pc, *npc; struct spglist free; int field, idx; int32_t bit; uint32_t inuse, bitmask; boolean_t allfree; /* * Assert that the given pmap is only active on the current * CPU. Unfortunately, we cannot block another CPU from * activating the pmap while this function is executing. */ KASSERT(pmap == vmspace_pmap(curthread->td_proc->p_vmspace), ("%s: non-current pmap %p", __func__, pmap)); #if defined(SMP) && defined(INVARIANTS) { cpuset_t other_cpus; sched_pin(); other_cpus = pmap->pm_active; CPU_CLR(PCPU_GET(cpuid), &other_cpus); sched_unpin(); KASSERT(CPU_EMPTY(&other_cpus), ("%s: pmap %p active on other cpus", __func__, pmap)); } #endif SLIST_INIT(&free); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { KASSERT(pc->pc_pmap == pmap, ("%s: wrong pmap %p %p", __func__, pmap, pc->pc_pmap)); allfree = TRUE; for (field = 0; field < _NPCM; field++) { inuse = (~(pc->pc_map[field])) & pc_freemask[field]; while (inuse != 0) { bit = ffs(inuse) - 1; bitmask = 1UL << bit; idx = field * 32 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; /* * Note that we cannot remove wired pages * from a process' mapping at this time. */ pte1p = pmap_pte1(pmap, pv->pv_va); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { if (pte1_is_wired(pte1)) { allfree = FALSE; continue; } pte1_clear(pte1p); pmap_remove_pte1_quick(pmap, pte1, pv, &free); } else if (pte1_is_link(pte1)) { pte2p = pt2map_entry(pv->pv_va); pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) { printf("%s: pmap %p va %#x " "pte2 %#x\n", __func__, pmap, pv->pv_va, pte2); panic("bad pte2"); } if (pte2_is_wired(pte2)) { allfree = FALSE; continue; } pte2_clear(pte2p); pmap_remove_pte2_quick(pmap, pte2, pv, &free); } else { printf("%s: pmap %p va %#x pte1 %#x\n", __func__, pmap, pv->pv_va, pte1); panic("bad pte1"); } /* Mark free */ PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc->pc_map[field] |= bitmask; } } if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } tlb_flush_all_ng_local(); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * This code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No L2 page table pages. * but is *MUCH* faster than pmap_enter... 
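 *
 * Concretely (see the code below): the new mapping is entered with
 * PTE2_RO and PTE2_NM set and without PTE2_W, matching the read-only,
 * not-wired assumptions above; a later write access faults and the
 * modified-bit emulation takes over from there.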
*/ static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpt2pg) { pt2_entry_t *pte2p, pte2; vm_paddr_t pa; struct spglist free; uint32_t l2prot; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("%s: managed mapping within the clean submap", __func__)); rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * In the case that an L2 page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { u_int pte1_idx; pt1_entry_t pte1, *pte1p; vm_paddr_t pt2_pa; /* * Get L1 page table things. */ pte1_idx = pte1_index(va); pte1p = pmap_pte1(pmap, va); pte1 = pte1_load(pte1p); if (mpt2pg && (mpt2pg->pindex == (pte1_idx & ~PT2PG_MASK))) { /* * Each of NPT2_IN_PG L2 page tables on the page can * come here. Make sure that the associated L1 page table * link is established. * * QQQ: It turns out that we don't establish all links to * L2 page tables for a newly allocated L2 page * tables page. */ KASSERT(!pte1_is_section(pte1), ("%s: pte1 %#x is section", __func__, pte1)); if (!pte1_is_link(pte1)) { pt2_pa = page_pt2pa(VM_PAGE_TO_PHYS(mpt2pg), pte1_idx); pte1_store(pte1p, PTE1_LINK(pt2_pa)); } pt2_wirecount_inc(mpt2pg, pte1_idx); } else { /* * If the L2 page table page is mapped, we just * increment the hold count, and activate it. */ if (pte1_is_section(pte1)) { return (NULL); } else if (pte1_is_link(pte1)) { mpt2pg = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); pt2_wirecount_inc(mpt2pg, pte1_idx); } else { mpt2pg = _pmap_allocpte2(pmap, va, PMAP_ENTER_NOSLEEP); if (mpt2pg == NULL) return (NULL); } } } else { mpt2pg = NULL; } /* * This call to pt2map_entry() makes the assumption that we are * entering the page into the current pmap. In order to support * quick entry into any pmap, one would likely use pmap_pte2_quick(). * But that isn't as quick as pt2map_entry(). */ pte2p = pt2map_entry(va); pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2)) { if (mpt2pg != NULL) { /* * Remove extra pte2 reference */ pt2_wirecount_dec(mpt2pg, pte1_index(va)); mpt2pg = NULL; } return (NULL); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m)) { if (mpt2pg != NULL) { SLIST_INIT(&free); if (pmap_unwire_pt2(pmap, va, mpt2pg, &free)) { pmap_tlb_flush(pmap, va); pmap_free_zero_pages(&free); } mpt2pg = NULL; } return (NULL); } /* * Increment counters */ pmap->pm_stats.resident_count++; /* * Now validate mapping with RO protection */ pa = VM_PAGE_TO_PHYS(m); l2prot = PTE2_RO | PTE2_NM; if (va < VM_MAXUSER_ADDRESS) l2prot |= PTE2_U | PTE2_NG; if ((prot & VM_PROT_EXECUTE) == 0) l2prot |= PTE2_NX; else if (m->md.pat_mode == VM_MEMATTR_WB_WA && pmap != kernel_pmap) { /* * Sync icache if exec permission and attribute VM_MEMATTR_WB_WA * is set. QQQ: For more info, see comments in pmap_enter(). */ cache_icache_sync_fresh(va, pa, PAGE_SIZE); } pte2_store(pte2p, PTE2(pa, l2prot, vm_page_pte2_attr(m))); return (mpt2pg); } void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Tries to create a 1MB page mapping. Returns TRUE if successful and * FALSE otherwise. 
Fails if (1) a page table page cannot be allocated without * blocking, (2) a mapping already exists at the specified virtual address, or * (3) a pv entry cannot be allocated without reclaiming another pv entry. */ static boolean_t pmap_enter_pte1(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { pt1_entry_t *pte1p; vm_paddr_t pa; uint32_t l1prot; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pte1p = pmap_pte1(pmap, va); if (pte1_is_valid(pte1_load(pte1p))) { CTR3(KTR_PMAP, "%s: failure for va %#lx in pmap %p", __func__, va, pmap); return (FALSE); } if ((m->oflags & VPO_UNMANAGED) == 0) { /* * Abort this mapping if its PV entry could not be created. */ if (!pmap_pv_insert_pte1(pmap, va, VM_PAGE_TO_PHYS(m))) { CTR3(KTR_PMAP, "%s: failure for va %#lx in pmap %p", __func__, va, pmap); return (FALSE); } } /* * Increment counters. */ pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; /* * Map the section. * * QQQ: Why is VM_PROT_WRITE not evaluated, and why is the mapping * made read-only? */ pa = VM_PAGE_TO_PHYS(m); l1prot = PTE1_RO | PTE1_NM; if (va < VM_MAXUSER_ADDRESS) l1prot |= PTE1_U | PTE1_NG; if ((prot & VM_PROT_EXECUTE) == 0) l1prot |= PTE1_NX; else if (m->md.pat_mode == VM_MEMATTR_WB_WA && pmap != kernel_pmap) { /* * Sync icache if exec permission and attribute VM_MEMATTR_WB_WA * is set. QQQ: For more info, see comments in pmap_enter(). */ cache_icache_sync_fresh(va, pa, PTE1_SIZE); } pte1_store(pte1p, PTE1(pa, l1prot, ATTR_TO_L1(vm_page_pte2_attr(m)))); pmap_pte1_mappings++; CTR3(KTR_PMAP, "%s: success for va %#lx in pmap %p", __func__, va, pmap); return (TRUE); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_offset_t va; vm_page_t m, mpt2pg; vm_pindex_t diff, psize; PDEBUG(6, printf("%s: pmap %p start %#x end %#x m %p prot %#x\n", __func__, pmap, start, end, m_start, prot)); VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpt2pg = NULL; m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); if ((va & PTE1_OFFSET) == 0 && va + PTE1_SIZE <= end && m->psind == 1 && sp_enabled && pmap_enter_pte1(pmap, va, m, prot)) m = &m[PTE1_SIZE / PAGE_SIZE - 1]; else mpt2pg = pmap_enter_quick_locked(pmap, va, m, prot, mpt2pg); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. 
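 *
 * E.g. (illustrative): a 4MB, 1MB-aligned device object whose pages
 * are physically contiguous and share one memory attribute is entered
 * as four PTE1 section mappings; if any precondition fails, the
 * function silently returns and the region is mapped by ordinary page
 * faults instead.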
*/ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { pt1_entry_t *pte1p; vm_paddr_t pa, pte2_pa; vm_page_t p; vm_memattr_t pat_mode; u_int l1attr, l1prot; VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("%s: non-device object", __func__)); if ((addr & PTE1_OFFSET) == 0 && (size & PTE1_OFFSET) == 0) { if (!vm_object_populate(object, pindex, pindex + atop(size))) return; p = vm_page_lookup(object, pindex); KASSERT(p->valid == VM_PAGE_BITS_ALL, ("%s: invalid page %p", __func__, p)); pat_mode = p->md.pat_mode; /* * Abort the mapping if the first page is not physically * aligned to a 1MB page boundary. */ pte2_pa = VM_PAGE_TO_PHYS(p); if (pte2_pa & PTE1_OFFSET) return; /* * Skip the first page. Abort the mapping if the rest of * the pages are not physically contiguous or have differing * memory attributes. */ p = TAILQ_NEXT(p, listq); for (pa = pte2_pa + PAGE_SIZE; pa < pte2_pa + size; pa += PAGE_SIZE) { KASSERT(p->valid == VM_PAGE_BITS_ALL, ("%s: invalid page %p", __func__, p)); if (pa != VM_PAGE_TO_PHYS(p) || pat_mode != p->md.pat_mode) return; p = TAILQ_NEXT(p, listq); } /* * Map using 1MB pages. * * QQQ: Well, we are mapping a section, so the same conditions must * hold as during promotion. It looks like only RW mappings are * done here, so read-only mappings must be done elsewhere. */ l1prot = PTE1_U | PTE1_NG | PTE1_RW | PTE1_M | PTE1_A; l1attr = ATTR_TO_L1(vm_memattr_to_pte2(pat_mode)); PMAP_LOCK(pmap); for (pa = pte2_pa; pa < pte2_pa + size; pa += PTE1_SIZE) { pte1p = pmap_pte1(pmap, addr); if (!pte1_is_valid(pte1_load(pte1p))) { pte1_store(pte1p, PTE1(pa, l1prot, l1attr)); pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; pmap_pte1_mappings++; } /* Else continue on if the PTE1 is already valid. */ addr += PTE1_SIZE; } PMAP_UNLOCK(pmap); } } /* * Do the things to protect a 1mpage in a process. */ static void pmap_protect_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t sva, vm_prot_t prot) { pt1_entry_t npte1, opte1; vm_offset_t eva, va; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PTE1_OFFSET) == 0, ("%s: sva is not 1mpage aligned", __func__)); opte1 = npte1 = pte1_load(pte1p); if (pte1_is_managed(opte1)) { eva = sva + PTE1_SIZE; for (va = sva, m = PHYS_TO_VM_PAGE(pte1_pa(opte1)); va < eva; va += PAGE_SIZE, m++) if (pte1_is_dirty(opte1)) vm_page_dirty(m); } if ((prot & VM_PROT_WRITE) == 0) npte1 |= PTE1_RO | PTE1_NM; if ((prot & VM_PROT_EXECUTE) == 0) npte1 |= PTE1_NX; /* * QQQ: Herein, execute permission is never set. * It can only be cleared. So, no icache * syncing is needed. */ if (npte1 != opte1) { pte1_store(pte1p, npte1); pmap_tlb_flush(pmap, sva); } } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { boolean_t pv_lists_locked; vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, opte2, npte2; KASSERT((prot & ~VM_PROT_ALL) == 0, ("invalid prot %x", prot)); if (prot == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } if ((prot & (VM_PROT_WRITE | VM_PROT_EXECUTE)) == (VM_PROT_WRITE | VM_PROT_EXECUTE)) return; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = nextva) { /* * Calculate address for next L2 page table. 
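 *
 * Illustrative (1MB sections assumed): for sva 0x400ff000,
 * pte1_trunc(sva + PTE1_SIZE) yields 0x40100000; the nextva < sva
 * test catches the wrap-around when sva lies in the very last section
 * of the address space.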
*/ nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. Note: we assume that the L1 * page table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { /* * Are we protecting the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pmap_protect_pte1(pmap, pte1p, sva, prot); continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* * The large page mapping * was destroyed. */ continue; } #ifdef INVARIANTS else { /* Update pte1 after demotion */ pte1 = pte1_load(pte1p); } #endif } } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being protected. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { vm_page_t m; opte2 = npte2 = pte2_load(pte2p); if (!pte2_is_valid(opte2)) continue; if ((prot & VM_PROT_WRITE) == 0) { if (pte2_is_managed(opte2) && pte2_is_dirty(opte2)) { m = PHYS_TO_VM_PAGE(pte2_pa(opte2)); vm_page_dirty(m); } npte2 |= PTE2_RO | PTE2_NM; } if ((prot & VM_PROT_EXECUTE) == 0) npte2 |= PTE2_NX; /* * QQQ: Herein, execute permission is never set. * It can only be cleared. So, no icache * syncing is needed. */ if (npte2 != opte2) { pte2_store(pte2p, npte2); pmap_tlb_flush(pmap, sva); } } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * pmap_pvh_wired_mappings: * * Return the updated number "count" of managed mappings that are wired. */ static int pmap_pvh_wired_mappings(struct md_page *pvh, int count) { pmap_t pmap; pt1_entry_t pte1; pt2_entry_t pte2; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { if (pte1_is_wired(pte1)) count++; } else { KASSERT(pte1_is_link(pte1), ("%s: pte1 %#x is not link", __func__, pte1)); pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); if (pte2_is_wired(pte2)) count++; } PMAP_UNLOCK(pmap); } sched_unpin(); return (count); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); count = pmap_pvh_wired_mappings(&m->md, count); if ((m->flags & PG_FICTITIOUS) == 0) { count = pmap_pvh_wired_mappings(pa_to_pvh(VM_PAGE_TO_PHYS(m)), count); } rw_wunlock(&pvh_global_lock); return (count); } /* * Returns TRUE if any of the given mappings were used to modify * physical memory. Otherwise, returns FALSE. Both page and 1mpage * mappings are supported. 
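 *
 * "Modify" here means the software-emulated modified bit: judging by
 * their use throughout this file, pte1_is_dirty() and pte2_is_dirty()
 * report a mapping whose NM and RO bits are both clear, i.e. one that
 * a write has already gone through.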
*/ static boolean_t pmap_is_modified_pvh(struct md_page *pvh) { pv_entry_t pv; pt1_entry_t pte1; pt2_entry_t pte2; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { rv = pte1_is_dirty(pte1); } else { KASSERT(pte1_is_link(pte1), ("%s: pte1 %#x is not link", __func__, pte1)); pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); rv = pte2_is_dirty(pte2); } PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. */ boolean_t pmap_is_modified(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTE2s can have PG_M set. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = pmap_is_modified_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_modified_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pt1_entry_t pte1; pt2_entry_t pte2; boolean_t rv; rv = FALSE; PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, addr)); if (pte1_is_link(pte1)) { pte2 = pte2_load(pt2map_entry(addr)); rv = !pte2_is_valid(pte2) ; } PMAP_UNLOCK(pmap); return (rv); } /* * Returns TRUE if any of the given mappings were referenced and FALSE * otherwise. Both page and 1mpage mappings are supported. */ static boolean_t pmap_is_referenced_pvh(struct md_page *pvh) { pv_entry_t pv; pt1_entry_t pte1; pt2_entry_t pte2; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { rv = (pte1 & (PTE1_A | PTE1_V)) == (PTE1_A | PTE1_V); } else { pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); rv = (pte2 & (PTE2_A | PTE2_V)) == (PTE2_A | PTE2_V); } PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); rw_wlock(&pvh_global_lock); rv = pmap_is_referenced_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_referenced_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. - * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. 
* * As an optimization, update the page's dirty field if a modified bit is * found while counting reference bits. This opportunistic update can be * performed at low cost and can eliminate the need for some future calls * to pmap_is_modified(). However, since this function stops after * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some * dirty pages. Those dirty pages will only be detected by a future call * to pmap_is_modified(). */ int pmap_ts_referenced(vm_page_t m) { struct md_page *pvh; pv_entry_t pv, pvf; pmap_t pmap; pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p, opte2; vm_paddr_t pa; int rtval = 0; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); pa = VM_PAGE_TO_PHYS(m); pvh = pa_to_pvh(pa); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0 || (pvf = TAILQ_FIRST(&pvh->pv_list)) == NULL) goto small_mappings; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); opte1 = pte1_load(pte1p); if (pte1_is_dirty(opte1)) { /* * Although "opte1" is mapping a 1MB page, because * this function is called at a 4KB page granularity, * we only update the 4KB page under test. */ vm_page_dirty(m); } if ((opte1 & PTE1_A) != 0) { /* * Since this reference bit is shared by 256 4KB pages, * it should not be cleared every time it is tested. * Apply a simple "hash" function on the physical page * number, the virtual section number, and the pmap * address to select one 4KB page out of the 256 * on which testing the reference bit will result * in clearing that bit. This function is designed * to avoid the selection of the same 4KB page * for every 1MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a * subsequent page fault on a demoted wired mapping, * always leave its reference bit set. Moreover, * since the section is wired, the current state of * its reference bit won't affect page replacement. */ if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> PTE1_SHIFT) ^ (uintptr_t)pmap) & (NPTE2_IN_PG - 1)) == 0 && !pte1_is_wired(opte1)) { pte1_clear_bit(pte1p, PTE1_A); pmap_tlb_flush(pmap, pv->pv_va); } rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); } if (rtval >= PMAP_TS_REFERENCED_MAX) goto out; } while ((pv = TAILQ_FIRST(&pvh->pv_list)) != pvf); small_mappings: if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(pte1_is_link(pte1_load(pte1p)), ("%s: not found a link in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); opte2 = pte2_load(pte2p); if (pte2_is_dirty(opte2)) vm_page_dirty(m); if ((opte2 & PTE2_A) != 0) { pte2_clear_bit(pte2p, PTE2_A); pmap_tlb_flush(pmap, pv->pv_va); rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && rtval < PMAP_TS_REFERENCED_MAX); out: sched_unpin(); rw_wunlock(&pvh_global_lock); return (rtval); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. 
In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, * so there is no need to invalidate any TLB entries. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; boolean_t pv_lists_locked; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = nextva) { nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. Note: we assume that the L1 * page table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { if (!pte1_is_wired(pte1)) panic("%s: pte1 %#x not wired", __func__, pte1); /* * Are we unwiring the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pte1_clear_bit(pte1p, PTE1_W); pmap->pm_stats.wired_count -= PTE1_SIZE / PAGE_SIZE; continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); /* Repeat sva. */ goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) panic("%s: demotion failed", __func__); #ifdef INVARIANTS else { /* Update pte1 after demotion */ pte1 = pte1_load(pte1p); } #endif } } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being unwired. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) continue; if (!pte2_is_wired(pte2)) panic("%s: pte2 %#x is missing PTE2_W", __func__, pte2); /* * PTE2_W must be cleared atomically. Although the pmap * lock synchronizes access to PTE2_W, another processor * could be changing PTE2_NM and/or PTE2_A concurrently. */ pte2_clear_bit(pte2p, PTE2_W); pmap->pm_stats.wired_count--; } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pt1_entry_t *pte1p; pt2_entry_t *pte2p, opte2; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. 
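 *
 * Strategy sketch: the first loop below demotes every writable 1MB
 * mapping of the page, so that the second loop only ever has to set
 * PTE2_RO | PTE2_NM on individual 4KB PTE2s.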
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); if (!(pte1_load(pte1p) & PTE1_RO)) (void)pmap_demote_pte1(pmap, pte1p, va); PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found" " a section in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); opte2 = pte2_load(pte2p); if (!(opte2 & PTE2_RO)) { pte2_store(pte2p, opte2 | PTE2_RO | PTE2_NM); if (pte2_is_dirty(opte2)) vm_page_dirty(m); pmap_tlb_flush(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p, pte2; vm_offset_t pdnxt; vm_page_t m; boolean_t pv_lists_locked; if (advice != MADV_DONTNEED && advice != MADV_FREE) return; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = pdnxt) { pdnxt = pte1_trunc(sva + PTE1_SIZE); if (pdnxt < sva) pdnxt = eva; pte1p = pmap_pte1(pmap, sva); opte1 = pte1_load(pte1p); if (!pte1_is_valid(opte1)) /* XXX */ continue; else if (pte1_is_section(opte1)) { if (!pte1_is_managed(opte1)) continue; if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* * The large page mapping was destroyed. */ continue; } /* * Unless the page mappings are wired, remove the * mapping to a single page so that a subsequent * access may repromote. Since the underlying L2 page * table is fully populated, this removal never * frees a L2 page table page. */ if (!pte1_is_wired(opte1)) { pte2p = pmap_pte2_quick(pmap, sva); KASSERT(pte2_is_valid(pte2_load(pte2p)), ("%s: invalid PTE2", __func__)); pmap_remove_pte2(pmap, pte2p, sva, NULL); } } if (pdnxt > eva) pdnxt = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != pdnxt; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2) || !pte2_is_managed(pte2)) continue; else if (pte2_is_dirty(pte2)) { if (advice == MADV_DONTNEED) { /* * Future calls to pmap_is_modified() * can be avoided by making the page * dirty now. */ m = PHYS_TO_VM_PAGE(pte2_pa(pte2)); vm_page_dirty(m); } pte2_set_bit(pte2p, PTE2_NM); pte2_clear_bit(pte2p, PTE2_A); } else if ((pte2 & PTE2_A) != 0) pte2_clear_bit(pte2p, PTE2_A); else continue; pmap_tlb_flush(pmap, sva); } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Clear the modify bits on the specified physical page. 
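 *
 * With software-emulated modified bits, clearing the modify status
 * means setting PTE2_NM again (see the loops below), so that the next
 * write access faults and marks the page dirty once more.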
*/ void pmap_clear_modify(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p, opte2; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("%s: page %p is exclusive busy", __func__, m)); /* * If the page is not PGA_WRITEABLE, then no PTE2s can have PTE2_NM * cleared. If the object containing the page is locked and the page * is not exclusive busied, then PGA_WRITEABLE cannot be concurrently * set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); opte1 = pte1_load(pte1p); if (!(opte1 & PTE1_RO)) { if (pmap_demote_pte1(pmap, pte1p, va) && !pte1_is_wired(opte1)) { /* * Write protect the mapping to a * single page so that a subsequent * write access may repromote. */ va += VM_PAGE_TO_PHYS(m) - pte1_pa(opte1); pte2p = pmap_pte2_quick(pmap, va); opte2 = pte2_load(pte2p); if ((opte2 & PTE2_V)) { pte2_set_bit(pte2p, PTE2_NM | PTE2_RO); vm_page_dirty(m); pmap_tlb_flush(pmap, va); } } } PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found" " a section in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); if (pte2_is_dirty(pte2_load(pte2p))) { pte2_set_bit(pte2p, PTE2_NM); pmap_tlb_flush(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } sched_unpin(); rw_wunlock(&pvh_global_lock); } /* * Sets the memory attribute for the specified page. */ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { struct sysmaps *sysmaps; vm_memattr_t oma; vm_paddr_t pa; oma = m->md.pat_mode; m->md.pat_mode = ma; CTR5(KTR_PMAP, "%s: page %p - 0x%08X oma: %d, ma: %d", __func__, m, VM_PAGE_TO_PHYS(m), oma, ma); if ((m->flags & PG_FICTITIOUS) != 0) return; #if 0 /* * If "m" is a normal page, flush it from the cache. * * First, try to find an existing mapping of the page by sf * buffer. sf_buf_invalidate_cache() modifies mapping and * flushes the cache. */ if (sf_buf_invalidate_cache(m, oma)) return; #endif /* * If the page is not mapped by an sf buffer, map the page * transiently and do the invalidation. */ if (ma != oma) { pa = VM_PAGE_TO_PHYS(m); sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP2) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(pa, PTE2_AP_KRW, vm_memattr_to_pte2(ma))); dcache_wbinv_poc((vm_offset_t)sysmaps->CADDR2, pa, PAGE_SIZE); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } } /* * Miscellaneous support routines follow */ /* * Returns TRUE if the given page is mapped individually or as part of * a 1mpage. Otherwise, returns FALSE. */ boolean_t pmap_page_is_mapped(vm_page_t m) { boolean_t rv; if ((m->oflags & VPO_UNMANAGED) != 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = !TAILQ_EMPTY(&m->md.pv_list) || ((m->flags & PG_FICTITIOUS) == 0 && !TAILQ_EMPTY(&pa_to_pvh(VM_PAGE_TO_PHYS(m))->pv_list)); rw_wunlock(&pvh_global_lock); return (rv); } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page.
This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct md_page *pvh; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } } rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. */ void pmap_zero_page(vm_page_t m) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, vm_page_pte2_attr(m))); pagezero(sysmaps->CADDR2); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, vm_page_pte2_attr(m))); if (off == 0 && size == PAGE_SIZE) pagezero(sysmaps->CADDR2); else bzero(sysmaps->CADDR2 + off, size); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. 
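* (The implementation below uses two per-CPU mapping windows: CMAP1 maps * the source page read-only and CMAP2 maps the destination page * read/write.)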
*/ void pmap_copy_page(vm_page_t src, vm_page_t dst) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP1) != 0) panic("%s: CMAP1 busy", __func__); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP1, PTE2_KERN_NG(VM_PAGE_TO_PHYS(src), PTE2_AP_KR | PTE2_NM, vm_page_pte2_attr(src))); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(dst), PTE2_AP_KRW, vm_page_pte2_attr(dst))); bcopy(sysmaps->CADDR1, sysmaps->CADDR2, PAGE_SIZE); pte2_clear(sysmaps->CMAP1); tlb_flush((vm_offset_t)sysmaps->CADDR1); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { struct sysmaps *sysmaps; vm_page_t a_pg, b_pg; char *a_cp, *b_cp; vm_offset_t a_pg_offset, b_pg_offset; int cnt; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP1 != 0) panic("pmap_copy_pages: CMAP1 busy"); if (*sysmaps->CMAP2 != 0) panic("pmap_copy_pages: CMAP2 busy"); while (xfersize > 0) { a_pg = ma[a_offset >> PAGE_SHIFT]; a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); b_pg = mb[b_offset >> PAGE_SHIFT]; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); pte2_store(sysmaps->CMAP1, PTE2_KERN_NG(VM_PAGE_TO_PHYS(a_pg), PTE2_AP_KR | PTE2_NM, vm_page_pte2_attr(a_pg))); tlb_flush_local((vm_offset_t)sysmaps->CADDR1); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(b_pg), PTE2_AP_KRW, vm_page_pte2_attr(b_pg))); tlb_flush_local((vm_offset_t)sysmaps->CADDR2); a_cp = sysmaps->CADDR1 + a_pg_offset; b_cp = sysmaps->CADDR2 + b_pg_offset; bcopy(a_cp, b_cp, cnt); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } pte2_clear(sysmaps->CMAP1); tlb_flush((vm_offset_t)sysmaps->CADDR1); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } vm_offset_t pmap_quick_enter_page(vm_page_t m) { pt2_entry_t *pte2p; vm_offset_t qmap_addr; critical_enter(); qmap_addr = PCPU_GET(qmap_addr); pte2p = pt2map_entry(qmap_addr); KASSERT(pte2_load(pte2p) == 0, ("%s: PTE2 busy", __func__)); pte2_store(pte2p, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, vm_page_pte2_attr(m))); return (qmap_addr); } void pmap_quick_remove_page(vm_offset_t addr) { pt2_entry_t *pte2p; vm_offset_t qmap_addr; qmap_addr = PCPU_GET(qmap_addr); pte2p = pt2map_entry(qmap_addr); KASSERT(addr == qmap_addr, ("%s: invalid address", __func__)); KASSERT(pte2_load(pte2p) != 0, ("%s: PTE2 not in use", __func__)); pte2_clear(pte2p); tlb_flush(qmap_addr); critical_exit(); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. 
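* (Accordingly, the implementation below copies only managed mappings and * simply gives up when a destination page table page cannot be allocated.)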
*/ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { struct spglist free; vm_offset_t addr; vm_offset_t end_addr = src_addr + len; vm_offset_t nextva; if (dst_addr != src_addr) return; if (!pmap_is_current(src_pmap)) return; rw_wlock(&pvh_global_lock); if (dst_pmap < src_pmap) { PMAP_LOCK(dst_pmap); PMAP_LOCK(src_pmap); } else { PMAP_LOCK(src_pmap); PMAP_LOCK(dst_pmap); } sched_pin(); for (addr = src_addr; addr < end_addr; addr = nextva) { pt2_entry_t *src_pte2p, *dst_pte2p; vm_page_t dst_mpt2pg, src_mpt2pg; pt1_entry_t src_pte1; u_int pte1_idx; KASSERT(addr < VM_MAXUSER_ADDRESS, ("%s: invalid to pmap_copy page tables", __func__)); nextva = pte1_trunc(addr + PTE1_SIZE); if (nextva < addr) nextva = end_addr; pte1_idx = pte1_index(addr); src_pte1 = src_pmap->pm_pt1[pte1_idx]; if (pte1_is_section(src_pte1)) { if ((addr & PTE1_OFFSET) != 0 || (addr + PTE1_SIZE) > end_addr) continue; if (dst_pmap->pm_pt1[pte1_idx] == 0 && (!pte1_is_managed(src_pte1) || pmap_pv_insert_pte1(dst_pmap, addr, pte1_pa(src_pte1)))) { dst_pmap->pm_pt1[pte1_idx] = src_pte1 & ~PTE1_W; dst_pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; pmap_pte1_mappings++; } continue; } else if (!pte1_is_link(src_pte1)) continue; src_mpt2pg = PHYS_TO_VM_PAGE(pte1_link_pa(src_pte1)); /* * We leave PT2s to be linked from PT1 even if they are not * referenced until all PT2s in a page are without reference. * * QQQ: It could be changed ... */ #if 0 /* single_pt2_link_is_cleared */ KASSERT(pt2_wirecount_get(src_mpt2pg, pte1_idx) > 0, ("%s: source page table page is unused", __func__)); #else if (pt2_wirecount_get(src_mpt2pg, pte1_idx) == 0) continue; #endif if (nextva > end_addr) nextva = end_addr; src_pte2p = pt2map_entry(addr); while (addr < nextva) { pt2_entry_t temp_pte2; temp_pte2 = pte2_load(src_pte2p); /* * we only virtual copy managed pages */ if (pte2_is_managed(temp_pte2)) { dst_mpt2pg = pmap_allocpte2(dst_pmap, addr, PMAP_ENTER_NOSLEEP); if (dst_mpt2pg == NULL) goto out; dst_pte2p = pmap_pte2_quick(dst_pmap, addr); if (!pte2_is_valid(pte2_load(dst_pte2p)) && pmap_try_insert_pv_entry(dst_pmap, addr, PHYS_TO_VM_PAGE(pte2_pa(temp_pte2)))) { /* * Clear the wired, modified, and * accessed (referenced) bits * during the copy. */ temp_pte2 &= ~(PTE2_W | PTE2_A); temp_pte2 |= PTE2_NM; pte2_store(dst_pte2p, temp_pte2); dst_pmap->pm_stats.resident_count++; } else { SLIST_INIT(&free); if (pmap_unwire_pt2(dst_pmap, addr, dst_mpt2pg, &free)) { pmap_tlb_flush(dst_pmap, addr); pmap_free_zero_pages(&free); } goto out; } if (pt2_wirecount_get(dst_mpt2pg, pte1_idx) >= pt2_wirecount_get(src_mpt2pg, pte1_idx)) break; } addr += PAGE_SIZE; src_pte2p++; } } out: sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(src_pmap); PMAP_UNLOCK(dst_pmap); } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more section mappings. 
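* For example (with illustrative numbers): sections are 1 MB here, so if * the object offset modulo 1 MB is 0x40000, choosing a start address whose * low 20 bits are also 0x40000 lets every fully covered 1 MB region of the * mapping be served by a single L1 section entry.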
*/ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t pte1_offset; if (size < PTE1_SIZE) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); pte1_offset = offset & PTE1_OFFSET; if (size - ((PTE1_SIZE - pte1_offset) & PTE1_OFFSET) < PTE1_SIZE || (*addr & PTE1_OFFSET) == pte1_offset) return; if ((*addr & PTE1_OFFSET) < pte1_offset) *addr = pte1_trunc(*addr) + pte1_offset; else *addr = pte1_roundup(*addr) + pte1_offset; } void pmap_activate(struct thread *td) { pmap_t pmap, oldpmap; u_int cpuid, ttb; PDEBUG(9, printf("%s: td = %08x\n", __func__, (uint32_t)td)); critical_enter(); pmap = vmspace_pmap(td->td_proc->p_vmspace); oldpmap = PCPU_GET(curpmap); cpuid = PCPU_GET(cpuid); #if defined(SMP) CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active); CPU_SET_ATOMIC(cpuid, &pmap->pm_active); #else CPU_CLR(cpuid, &oldpmap->pm_active); CPU_SET(cpuid, &pmap->pm_active); #endif ttb = pmap_ttb_get(pmap); /* * pmap_activate is for the current thread on the current cpu */ td->td_pcb->pcb_pagedir = ttb; cp15_ttbr_set(ttb); PCPU_SET(curpmap, pmap); critical_exit(); } /* * Perform the pmap work for mincore. */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; vm_paddr_t pa; boolean_t managed; int val; PMAP_LOCK(pmap); retry: pte1p = pmap_pte1(pmap, addr); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { pa = trunc_page(pte1_pa(pte1) | (addr & PTE1_OFFSET)); managed = pte1_is_managed(pte1); val = MINCORE_SUPER | MINCORE_INCORE; if (pte1_is_dirty(pte1)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if (pte1 & PTE1_A) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } else if (pte1_is_link(pte1)) { pte2p = pmap_pte2(pmap, addr); pte2 = pte2_load(pte2p); pmap_pte2_release(pte2p); pa = pte2_pa(pte2); managed = pte2_is_managed(pte2); val = MINCORE_INCORE; if (pte2_is_dirty(pte2)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if (pte2 & PTE2_A) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } else { managed = FALSE; val = 0; } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && managed) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. */ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_kenter_device(vm_offset_t va, vm_size_t size, vm_paddr_t pa) { vm_offset_t sva; uint32_t l2attr; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; l2attr = vm_memattr_to_pte2(VM_MEMATTR_DEVICE); while (size != 0) { pmap_kenter_prot_attr(va, pa, PTE2_AP_KRW, l2attr); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } tlb_flush_range(sva, va - sva); } void pmap_kremove_device(vm_offset_t va, vm_size_t size) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kremove(va); va += PAGE_SIZE; size -= PAGE_SIZE; } tlb_flush_range(sva, va - sva); } void pmap_set_pcb_pagedir(pmap_t pmap, struct pcb *pcb) { pcb->pcb_pagedir = pmap_ttb_get(pmap); } /* * Clean L1 data cache range by physical address. * The range must be within a single page. 
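* (The page is mapped through the per-CPU CMAP3 window below, which is a * single page long; hence the restriction.)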
*/ static void pmap_dcache_wb_pou(vm_paddr_t pa, vm_size_t size, uint32_t attr) { struct sysmaps *sysmaps; KASSERT(((pa & PAGE_MASK) + size) <= PAGE_SIZE, ("%s: not on single page", __func__)); sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP3) panic("%s: CMAP3 busy", __func__); pte2_store(sysmaps->CMAP3, PTE2_KERN_NG(pa, PTE2_AP_KRW, attr)); dcache_wb_pou((vm_offset_t)sysmaps->CADDR3 + (pa & PAGE_MASK), size); pte2_clear(sysmaps->CMAP3); tlb_flush((vm_offset_t)sysmaps->CADDR3); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * Sync an instruction cache range which is not mapped yet. */ void cache_icache_sync_fresh(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { uint32_t len, offset; vm_page_t m; /* Write back d-cache on given address range. */ offset = pa & PAGE_MASK; for ( ; size != 0; size -= len, pa += len, offset = 0) { len = min(PAGE_SIZE - offset, size); m = PHYS_TO_VM_PAGE(pa); KASSERT(m != NULL, ("%s: vm_page_t is null for %#x", __func__, pa)); pmap_dcache_wb_pou(pa, len, vm_page_pte2_attr(m)); } /* * The i-cache is VIPT. The only way to flush all virtual mappings * of a given physical address is to invalidate the entire i-cache. */ icache_inv_all(); } void pmap_sync_icache(pmap_t pmap, vm_offset_t va, vm_size_t size) { /* Write back d-cache on given address range. */ if (va >= VM_MIN_KERNEL_ADDRESS) { dcache_wb_pou(va, size); } else { uint32_t len, offset; vm_paddr_t pa; vm_page_t m; offset = va & PAGE_MASK; for ( ; size != 0; size -= len, va += len, offset = 0) { pa = pmap_extract(pmap, va); /* offset is preserved */ len = min(PAGE_SIZE - offset, size); m = PHYS_TO_VM_PAGE(pa); KASSERT(m != NULL, ("%s: vm_page_t is null for %#x", __func__, pa)); pmap_dcache_wb_pou(pa, len, vm_page_pte2_attr(m)); } } /* * The i-cache is VIPT. The only way to flush all virtual mappings * of a given physical address is to invalidate the entire i-cache. */ icache_inv_all(); } /* * The implementation of pmap_fault() uses the IN_RANGE2() macro, which * depends on the fact that the given range size is a power of 2. */ CTASSERT(powerof2(NB_IN_PT1)); CTASSERT(powerof2(PT2MAP_SIZE)); #define IN_RANGE2(addr, start, size) \ ((vm_offset_t)(start) == ((vm_offset_t)(addr) & ~((size) - 1))) /* * Handle access and R/W emulation faults. */ int pmap_fault(pmap_t pmap, vm_offset_t far, uint32_t fsr, int idx, bool usermode) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; if (pmap == NULL) pmap = kernel_pmap; /* * In the kernel, we should never get an abort with a FAR which is in * the range of the pmap->pm_pt1 or PT2MAP address spaces. If it * happens, stop here and print out a useful abort message and even * get to the debugger; otherwise it likely ends in a never-ending * loop of aborts. */ if (__predict_false(IN_RANGE2(far, pmap->pm_pt1, NB_IN_PT1))) { /* * All L1 tables should always be mapped and present. * However, we check only the current one here. For user mode, * only a permission abort from a malicious user is non-fatal, * as is an alignment abort, which may have higher priority. */ if (!usermode || (idx != FAULT_ALIGN && idx != FAULT_PERM_L2)) { CTR4(KTR_PMAP, "%s: pmap %#x pm_pt1 %#x far %#x", __func__, pmap, pmap->pm_pt1, far); panic("%s: pm_pt1 abort", __func__); } return (KERN_INVALID_ADDRESS); } if (__predict_false(IN_RANGE2(far, PT2MAP, PT2MAP_SIZE))) { /* * PT2MAP should be always mapped and present in the current * L1 table. However, only existing L2 tables are mapped * in PT2MAP. For user mode, only an L2 translation abort and * a permission abort from a malicious user are non-fatal,
* as is an alignment abort, which may have higher priority. */ if (!usermode || (idx != FAULT_ALIGN && idx != FAULT_TRAN_L2 && idx != FAULT_PERM_L2)) { CTR4(KTR_PMAP, "%s: pmap %#x PT2MAP %#x far %#x", __func__, pmap, PT2MAP, far); panic("%s: PT2MAP abort", __func__); } return (KERN_INVALID_ADDRESS); } /* * A pmap lock is used below for handling of access and R/W emulation * aborts. They were handled by atomic operations before, so some * analysis of the new situation is needed to answer the following * question: Is it safe to use the lock even for these aborts? * * In general, two cases may happen: * * (1) Aborts while the pmap lock is locked already - this should not * happen as the pmap lock is not recursive. However, under the pmap * lock only internal kernel data should be accessed and such data * should be mapped with the A bit set and the NM bit cleared. If a * double abort happens, then the mapping of the data which caused it * must be fixed. Further, all new mappings are always made with the * A bit set and the bit can be cleared only on managed mappings. * * (2) Aborts while one or more other locks are held - this can * already happen. However, there is no difference here if it's either * an access or R/W emulation abort, or if it's some other abort. */ PMAP_LOCK(pmap); #ifdef SMP /* * Special treatment is due to the break-before-make approach taken * when pte1 is updated for a userland mapping during section * promotion or demotion. If not caught here, pmap_enter() can find a * section mapping on the faulting address. That is not allowed. */ if (idx == FAULT_TRAN_L1 && usermode && cp15_ats1cur_check(far) == 0) { PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } #endif /* * Access bits for page and section. Note that the entry * is not in the TLB yet, so a TLB flush is not necessary. * * QQQ: This is hardware emulation, we do not call userret() * for aborts from user mode. */ if (idx == FAULT_ACCESS_L2) { pte2p = pt2map_entry(far); pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2)) { pte2_store(pte2p, pte2 | PTE2_A); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } } if (idx == FAULT_ACCESS_L1) { pte1p = pmap_pte1(pmap, far); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { pte1_store(pte1p, pte1 | PTE1_A); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } } /* * Handle modify bits for page and section. Note that the modify * bit is emulated by software. So PTEx_RO is the software read-only * bit and the PTEx_NM flag is the real hardware read-only bit. * * QQQ: This is hardware emulation, we do not call userret() * for aborts from user mode. */ if ((fsr & FSR_WNR) && (idx == FAULT_PERM_L2)) { pte2p = pt2map_entry(far); pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2) && !(pte2 & PTE2_RO) && (pte2 & PTE2_NM)) { pte2_store(pte2p, pte2 & ~PTE2_NM); tlb_flush(trunc_page(far)); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } } if ((fsr & FSR_WNR) && (idx == FAULT_PERM_L1)) { pte1p = pmap_pte1(pmap, far); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1) && !(pte1 & PTE1_RO) && (pte1 & PTE1_NM)) { pte1_store(pte1p, pte1 & ~PTE1_NM); tlb_flush(pte1_trunc(far)); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } } /* * QQQ: The previous code, mainly the fast handling of access and * modify bit aborts, could be moved to ASM. Now we are starting * to deal with the aborts that cannot be handled fast. */ #ifdef INVARIANTS /* * Read an entry in PT2TAB associated with both pmap and far. * It's safe because PT2TAB is always mapped.
*/ pte2 = pt2tab_load(pmap_pt2tab_entry(pmap, far)); if (pte2_is_valid(pte2)) { /* * Now, when we know that L2 page table is allocated, * we can use PT2MAP to get L2 page table entry. */ pte2 = pte2_load(pt2map_entry(far)); if (pte2_is_valid(pte2)) { /* * If L2 page table entry is valid, make sure that * L1 page table entry is valid too. Note that we * leave L2 page entries untouched when promoted. */ pte1 = pte1_load(pmap_pte1(pmap, far)); if (!pte1_is_valid(pte1)) { panic("%s: missing L1 page entry (%p, %#x)", __func__, pmap, far); } } } #endif PMAP_UNLOCK(pmap); return (KERN_FAILURE); } #if defined(PMAP_DEBUG) /* * Reusing of KVA used in pmap_zero_page function !!! */ static void pmap_zero_page_check(vm_page_t m) { uint32_t *p, *end; struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, vm_page_pte2_attr(m))); end = (uint32_t*)(sysmaps->CADDR2 + PAGE_SIZE); for (p = (uint32_t*)sysmaps->CADDR2; p < end; p++) if (*p != 0) panic("%s: page %p not zero, va: %p", __func__, m, sysmaps->CADDR2); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } int pmap_pid_dump(int pid) { pmap_t pmap; struct proc *p; int npte2 = 0; int i, j, index; sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { if (p->p_pid != pid || p->p_vmspace == NULL) continue; index = 0; pmap = vmspace_pmap(p->p_vmspace); for (i = 0; i < NPTE1_IN_PT1; i++) { pt1_entry_t pte1; pt2_entry_t *pte2p, pte2; vm_offset_t base, va; vm_paddr_t pa; vm_page_t m; base = i << PTE1_SHIFT; pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1_is_section(pte1)) { /* * QQQ: Do something here! */ } else if (pte1_is_link(pte1)) { for (j = 0; j < NPTE2_IN_PT2; j++) { va = base + (j << PAGE_SHIFT); if (va >= VM_MIN_KERNEL_ADDRESS) { if (index) { index = 0; printf("\n"); } sx_sunlock(&allproc_lock); return (npte2); } pte2p = pmap_pte2(pmap, va); pte2 = pte2_load(pte2p); pmap_pte2_release(pte2p); if (!pte2_is_valid(pte2)) continue; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); printf("va: 0x%x, pa: 0x%x, h: %d, w:" " %d, f: 0x%x", va, pa, m->hold_count, m->wire_count, m->flags); npte2++; index++; if (index >= 2) { index = 0; printf("\n"); } else { printf(" "); } } } } } sx_sunlock(&allproc_lock); return (npte2); } #endif #ifdef DDB static pt2_entry_t * pmap_pte2_ddb(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (!pte1_is_link(pte1)) return (NULL); if (pmap_is_current(pmap)) return (pt2map_entry(va)); /* Note that L2 page table size is not equal to PAGE_SIZE. 
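* (An L2 page table here holds 256 32-bit entries, i.e. 1 KB, so four L2 * tables share one 4 KB page; that is why the link physical address is * truncated to a page boundary below.)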
*/ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); if (pte2_pa(pte2_load(PMAP3)) != pt2pg_pa) { pte2_store(PMAP3, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP3cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR3); } #ifdef SMP else if (PMAP3cpu != PCPU_GET(cpuid)) { PMAP3cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR3); } #endif return (PADDR3 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } static void dump_pmap(pmap_t pmap) { printf("pmap %p\n", pmap); printf(" pm_pt1: %p\n", pmap->pm_pt1); printf(" pm_pt2tab: %p\n", pmap->pm_pt2tab); printf(" pm_active: 0x%08lX\n", pmap->pm_active.__bits[0]); } DB_SHOW_COMMAND(pmaps, pmap_list_pmaps) { pmap_t pmap; LIST_FOREACH(pmap, &allpmaps, pm_list) { dump_pmap(pmap); } } static int pte2_class(pt2_entry_t pte2) { int cls; cls = (pte2 >> 2) & 0x03; cls |= (pte2 >> 4) & 0x04; return (cls); } static void dump_section(pmap_t pmap, uint32_t pte1_idx) { } static void dump_link(pmap_t pmap, uint32_t pte1_idx, boolean_t invalid_ok) { uint32_t i; vm_offset_t va; pt2_entry_t *pte2p, pte2; vm_page_t m; va = pte1_idx << PTE1_SHIFT; pte2p = pmap_pte2_ddb(pmap, va); for (i = 0; i < NPTE2_IN_PT2; i++, pte2p++, va += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (pte2 == 0) continue; if (!pte2_is_valid(pte2)) { printf(" 0x%08X: 0x%08X", va, pte2); if (!invalid_ok) printf(" - not valid !!!"); printf("\n"); continue; } m = PHYS_TO_VM_PAGE(pte2_pa(pte2)); printf(" 0x%08X: 0x%08X, TEX%d, s:%d, g:%d, m:%p", va , pte2, pte2_class(pte2), !!(pte2 & PTE2_S), !(pte2 & PTE2_NG), m); if (m != NULL) { printf(" v:%d h:%d w:%d f:0x%04X\n", m->valid, m->hold_count, m->wire_count, m->flags); } else { printf("\n"); } } } static __inline boolean_t is_pv_chunk_space(vm_offset_t va) { if ((((vm_offset_t)pv_chunkbase) <= va) && (va < ((vm_offset_t)pv_chunkbase + PAGE_SIZE * pv_maxchunks))) return (TRUE); return (FALSE); } DB_SHOW_COMMAND(pmap, pmap_pmap_print) { /* XXX convert args. */ pmap_t pmap = (pmap_t)addr; pt1_entry_t pte1; pt2_entry_t pte2; vm_offset_t va, eva; vm_page_t m; uint32_t i; boolean_t invalid_ok, dump_link_ok, dump_pv_chunk; if (have_addr) { pmap_t pm; LIST_FOREACH(pm, &allpmaps, pm_list) if (pm == pmap) break; if (pm == NULL) { printf("given pmap %p is not in allpmaps list\n", pmap); return; } } else pmap = PCPU_GET(curpmap); eva = (modif[0] == 'u') ? VM_MAXUSER_ADDRESS : 0xFFFFFFFF; dump_pv_chunk = FALSE; /* XXX evaluate from modif[] */ printf("pmap: 0x%08X\n", (uint32_t)pmap); printf("PT2MAP: 0x%08X\n", (uint32_t)PT2MAP); printf("pt2tab: 0x%08X\n", (uint32_t)pmap->pm_pt2tab); for(i = 0; i < NPTE1_IN_PT1; i++) { pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1 == 0) continue; va = i << PTE1_SHIFT; if (va >= eva) break; if (pte1_is_section(pte1)) { printf("0x%08X: Section 0x%08X, s:%d g:%d\n", va, pte1, !!(pte1 & PTE1_S), !(pte1 & PTE1_NG)); dump_section(pmap, i); } else if (pte1_is_link(pte1)) { dump_link_ok = TRUE; invalid_ok = FALSE; pte2 = pte2_load(pmap_pt2tab_entry(pmap, va)); m = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); printf("0x%08X: Link 0x%08X, pt2tab: 0x%08X m: %p", va, pte1, pte2, m); if (is_pv_chunk_space(va)) { printf(" - pv_chunk space"); if (dump_pv_chunk) invalid_ok = TRUE; else dump_link_ok = FALSE; } else if (m != NULL) printf(" w:%d w2:%u", m->wire_count, pt2_wirecount_get(m, pte1_index(va))); if (pte2 == 0) printf(" !!! pt2tab entry is ZERO"); else if (pte2_pa(pte1) != pte2_pa(pte2)) printf(" !!! 
pt2tab entry is DIFFERENT - m: %p", PHYS_TO_VM_PAGE(pte2_pa(pte2))); printf("\n"); if (dump_link_ok) dump_link(pmap, i, invalid_ok); } else printf("0x%08X: Invalid entry 0x%08X\n", va, pte1); } } static void dump_pt2tab(pmap_t pmap) { uint32_t i; pt2_entry_t pte2; vm_offset_t va; vm_paddr_t pa; vm_page_t m; printf("PT2TAB:\n"); for (i = 0; i < PT2TAB_ENTRIES; i++) { pte2 = pte2_load(&pmap->pm_pt2tab[i]); if (!pte2_is_valid(pte2)) continue; va = i << PT2TAB_SHIFT; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); printf(" 0x%08X: 0x%08X, TEX%d, s:%d, m:%p", va, pte2, pte2_class(pte2), !!(pte2 & PTE2_S), m); if (m != NULL) printf(" , h: %d, w: %d, f: 0x%04X pidx: %lld", m->hold_count, m->wire_count, m->flags, m->pindex); printf("\n"); } } DB_SHOW_COMMAND(pmap_pt2tab, pmap_pt2tab_print) { /* XXX convert args. */ pmap_t pmap = (pmap_t)addr; pt1_entry_t pte1; pt2_entry_t pte2; vm_offset_t va; uint32_t i, start; if (have_addr) { printf("supported only on current pmap\n"); return; } pmap = PCPU_GET(curpmap); printf("curpmap: 0x%08X\n", (uint32_t)pmap); printf("PT2MAP: 0x%08X\n", (uint32_t)PT2MAP); printf("pt2tab: 0x%08X\n", (uint32_t)pmap->pm_pt2tab); start = pte1_index((vm_offset_t)PT2MAP); for (i = start; i < (start + NPT2_IN_PT2TAB); i++) { pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1 == 0) continue; va = i << PTE1_SHIFT; if (pte1_is_section(pte1)) { printf("0x%08X: Section 0x%08X, s:%d\n", va, pte1, !!(pte1 & PTE1_S)); dump_section(pmap, i); } else if (pte1_is_link(pte1)) { pte2 = pte2_load(pmap_pt2tab_entry(pmap, va)); printf("0x%08X: Link 0x%08X, pt2tab: 0x%08X\n", va, pte1, pte2); if (pte2 == 0) printf(" !!! pt2tab entry is ZERO\n"); } else printf("0x%08X: Invalid entry 0x%08X\n", va, pte1); } dump_pt2tab(pmap); } #endif Index: projects/clang390-import/sys/arm64/arm64/pmap.c =================================================================== --- projects/clang390-import/sys/arm64/arm64/pmap.c (revision 305686) +++ projects/clang390-import/sys/arm64/arm64/pmap.c (revision 305687) @@ -1,4732 +1,4744 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * Copyright (c) 1994 John S. Dyson * All rights reserved. * Copyright (c) 1994 David Greenman * All rights reserved. * Copyright (c) 2003 Peter Wemm * All rights reserved. * Copyright (c) 2005-2010 Alan L. Cox * All rights reserved. * Copyright (c) 2014 Andrew Turner * All rights reserved. * Copyright (c) 2014-2016 The FreeBSD Foundation * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * This software was developed by Andrew Turner under sponsorship from * the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. 
Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduced protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define NL0PG (PAGE_SIZE/(sizeof (pd_entry_t))) #define NL1PG (PAGE_SIZE/(sizeof (pd_entry_t))) #define NL2PG (PAGE_SIZE/(sizeof (pd_entry_t))) #define NL3PG (PAGE_SIZE/(sizeof (pt_entry_t))) #define NUL0E L0_ENTRIES #define NUL1E (NUL0E * NL1PG) #define NUL2E (NUL1E * NL2PG) #if !defined(DIAGNOSTIC) #ifdef __GNUC_GNU_INLINE__ #define PMAP_INLINE __attribute__((__gnu_inline__)) inline #else #define PMAP_INLINE extern inline #endif #else #define PMAP_INLINE #endif /* * These are configured by the mair_el1 register. This is set up in locore.S */ #define DEVICE_MEMORY 0 #define UNCACHED_MEMORY 1 #define CACHED_MEMORY 2 #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif #define pmap_l2_pindex(v) ((v) >> L2_SHIFT) #define pa_to_pvh(pa) (&pv_table[pmap_l2_pindex(pa)]) #define NPV_LIST_LOCKS MAXCPU #define PHYS_TO_PV_LIST_LOCK(pa) \ (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS]) #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \ struct rwlock **_lockp = (lockp); \ struct rwlock *_new_lock; \ \ _new_lock = PHYS_TO_PV_LIST_LOCK(pa); \ if (_new_lock != *_lockp) { \ if (*_lockp != NULL) \ rw_wunlock(*_lockp); \ *_lockp = _new_lock; \ rw_wlock(*_lockp); \ } \ } while (0) #define CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m) \ CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, VM_PAGE_TO_PHYS(m)) #define RELEASE_PV_LIST_LOCK(lockp) do { \ struct rwlock **_lockp = (lockp); \ \ if (*_lockp != NULL) { \ rw_wunlock(*_lockp); \ *_lockp = NULL; \ } \ } while (0) #define VM_PAGE_TO_PV_LIST_LOCK(m) \ PHYS_TO_PV_LIST_LOCK(VM_PAGE_TO_PHYS(m)) struct pmap kernel_pmap_store; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ vm_offset_t kernel_vm_end = 0; struct msgbuf *msgbufp = NULL; /* * Data for the pv entry allocation mechanism. * Updates to pv_invl_gen are protected by the pv_list_locks[] * elements, but reads are not. 
*/ static struct md_page *pv_table; static struct md_page pv_dummy; vm_paddr_t dmap_phys_base; /* The start of the dmap region */ vm_paddr_t dmap_phys_max; /* The limit of the dmap region */ vm_offset_t dmap_max_addr; /* The virtual address limit of the dmap */ /* This code assumes all L1 DMAP entries will be used */ CTASSERT((DMAP_MIN_ADDRESS & ~L0_OFFSET) == DMAP_MIN_ADDRESS); CTASSERT((DMAP_MAX_ADDRESS & ~L0_OFFSET) == DMAP_MAX_ADDRESS); #define DMAP_TABLES ((DMAP_MAX_ADDRESS - DMAP_MIN_ADDRESS) >> L0_SHIFT) extern pt_entry_t pagetable_dmap[]; static SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); static int superpages_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, superpages_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &superpages_enabled, 0, "Are large page mappings enabled?"); /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static struct mtx pv_chunks_mutex; static struct rwlock pv_list_locks[NPV_LIST_LOCKS]; static void free_pv_chunk(struct pv_chunk *pc); static void free_pv_entry(pmap_t pmap, pv_entry_t pv); static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp); static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp); static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode); static int pmap_change_attr_locked(vm_offset_t va, vm_size_t size, int mode); static pt_entry_t *pmap_demote_l1(pmap_t pmap, pt_entry_t *l1, vm_offset_t va); static pt_entry_t *pmap_demote_l2_locked(pmap_t pmap, pt_entry_t *l2, vm_offset_t va, struct rwlock **lockp); static pt_entry_t *pmap_demote_l2(pmap_t pmap, pt_entry_t *l2, vm_offset_t va); static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp); static int pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_t sva, pd_entry_t ptepde, struct spglist *free, struct rwlock **lockp); static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp); static vm_page_t _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp); static void _pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free); static int pmap_unuse_l3(pmap_t, vm_offset_t, pd_entry_t, struct spglist *); /* * These load the old table data and store the new value. * They need to be atomic as the System MMU may write to the table at * the same time as the CPU. 
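* For example, a plain read-modify-write of an entry could lose an update * written by the System MMU between the load and the store, whereas * atomic_swap_64() installs the new value and returns the old one in a * single step.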
*/ #define pmap_load_store(table, entry) atomic_swap_64(table, entry) #define pmap_set(table, mask) atomic_set_64(table, mask) #define pmap_load_clear(table) atomic_swap_64(table, 0) #define pmap_load(table) (*table) /********************/ /* Inline functions */ /********************/ static __inline void pagecopy(void *s, void *d) { memcpy(d, s, PAGE_SIZE); } #define pmap_l0_index(va) (((va) >> L0_SHIFT) & L0_ADDR_MASK) #define pmap_l1_index(va) (((va) >> L1_SHIFT) & Ln_ADDR_MASK) #define pmap_l2_index(va) (((va) >> L2_SHIFT) & Ln_ADDR_MASK) #define pmap_l3_index(va) (((va) >> L3_SHIFT) & Ln_ADDR_MASK) static __inline pd_entry_t * pmap_l0(pmap_t pmap, vm_offset_t va) { return (&pmap->pm_l0[pmap_l0_index(va)]); } static __inline pd_entry_t * pmap_l0_to_l1(pd_entry_t *l0, vm_offset_t va) { pd_entry_t *l1; l1 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l0) & ~ATTR_MASK); return (&l1[pmap_l1_index(va)]); } static __inline pd_entry_t * pmap_l1(pmap_t pmap, vm_offset_t va) { pd_entry_t *l0; l0 = pmap_l0(pmap, va); if ((pmap_load(l0) & ATTR_DESCR_MASK) != L0_TABLE) return (NULL); return (pmap_l0_to_l1(l0, va)); } static __inline pd_entry_t * pmap_l1_to_l2(pd_entry_t *l1, vm_offset_t va) { pd_entry_t *l2; l2 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l1) & ~ATTR_MASK); return (&l2[pmap_l2_index(va)]); } static __inline pd_entry_t * pmap_l2(pmap_t pmap, vm_offset_t va) { pd_entry_t *l1; l1 = pmap_l1(pmap, va); if ((pmap_load(l1) & ATTR_DESCR_MASK) != L1_TABLE) return (NULL); return (pmap_l1_to_l2(l1, va)); } static __inline pt_entry_t * pmap_l2_to_l3(pd_entry_t *l2, vm_offset_t va) { pt_entry_t *l3; l3 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l2) & ~ATTR_MASK); return (&l3[pmap_l3_index(va)]); } /* * Returns the lowest valid pde for a given virtual address. * The next level may or may not point to a valid page or block. */ static __inline pd_entry_t * pmap_pde(pmap_t pmap, vm_offset_t va, int *level) { pd_entry_t *l0, *l1, *l2, desc; l0 = pmap_l0(pmap, va); desc = pmap_load(l0) & ATTR_DESCR_MASK; if (desc != L0_TABLE) { *level = -1; return (NULL); } l1 = pmap_l0_to_l1(l0, va); desc = pmap_load(l1) & ATTR_DESCR_MASK; if (desc != L1_TABLE) { *level = 0; return (l0); } l2 = pmap_l1_to_l2(l1, va); desc = pmap_load(l2) & ATTR_DESCR_MASK; if (desc != L2_TABLE) { *level = 1; return (l1); } *level = 2; return (l2); } /* * Returns the lowest valid pte block or table entry for a given virtual * address. If there are no valid entries return NULL and set the level to * the first invalid level. 
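* For example, a 2MB block mapping yields the L2 entry with *level set to * 2, while an address whose L2 entry is invalid yields NULL with *level * set to 2, telling the caller at which level the walk stopped.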
*/ static __inline pt_entry_t * pmap_pte(pmap_t pmap, vm_offset_t va, int *level) { pd_entry_t *l1, *l2, desc; pt_entry_t *l3; l1 = pmap_l1(pmap, va); if (l1 == NULL) { *level = 0; return (NULL); } desc = pmap_load(l1) & ATTR_DESCR_MASK; if (desc == L1_BLOCK) { *level = 1; return (l1); } if (desc != L1_TABLE) { *level = 1; return (NULL); } l2 = pmap_l1_to_l2(l1, va); desc = pmap_load(l2) & ATTR_DESCR_MASK; if (desc == L2_BLOCK) { *level = 2; return (l2); } if (desc != L2_TABLE) { *level = 2; return (NULL); } *level = 3; l3 = pmap_l2_to_l3(l2, va); if ((pmap_load(l3) & ATTR_DESCR_MASK) != L3_PAGE) return (NULL); return (l3); } static inline bool pmap_superpages_enabled(void) { return (superpages_enabled != 0); } bool pmap_get_tables(pmap_t pmap, vm_offset_t va, pd_entry_t **l0, pd_entry_t **l1, pd_entry_t **l2, pt_entry_t **l3) { pd_entry_t *l0p, *l1p, *l2p; if (pmap->pm_l0 == NULL) return (false); l0p = pmap_l0(pmap, va); *l0 = l0p; if ((pmap_load(l0p) & ATTR_DESCR_MASK) != L0_TABLE) return (false); l1p = pmap_l0_to_l1(l0p, va); *l1 = l1p; if ((pmap_load(l1p) & ATTR_DESCR_MASK) == L1_BLOCK) { *l2 = NULL; *l3 = NULL; return (true); } if ((pmap_load(l1p) & ATTR_DESCR_MASK) != L1_TABLE) return (false); l2p = pmap_l1_to_l2(l1p, va); *l2 = l2p; if ((pmap_load(l2p) & ATTR_DESCR_MASK) == L2_BLOCK) { *l3 = NULL; return (true); } *l3 = pmap_l2_to_l3(l2p, va); return (true); } static __inline int pmap_is_current(pmap_t pmap) { return ((pmap == pmap_kernel()) || (pmap == curthread->td_proc->p_vmspace->vm_map.pmap)); } static __inline int pmap_l3_valid(pt_entry_t l3) { return ((l3 & ATTR_DESCR_MASK) == L3_PAGE); } /* Is a level 1 or 2 entry a valid, cacheable block? */ CTASSERT(L1_BLOCK == L2_BLOCK); static __inline int pmap_pte_valid_cacheable(pt_entry_t pte) { return (((pte & ATTR_DESCR_MASK) == L1_BLOCK) && ((pte & ATTR_IDX_MASK) == ATTR_IDX(CACHED_MEMORY))); } static __inline int pmap_l3_valid_cacheable(pt_entry_t l3) { return (((l3 & ATTR_DESCR_MASK) == L3_PAGE) && ((l3 & ATTR_IDX_MASK) == ATTR_IDX(CACHED_MEMORY))); } #define PTE_SYNC(pte) cpu_dcache_wb_range((vm_offset_t)pte, sizeof(*pte)) /* * Checks if the page is dirty. We currently lack proper tracking of this * on arm64, so for now assume that if a page is mapped read/write and has * been accessed, it is dirty.
*/ static inline int pmap_page_dirty(pt_entry_t pte) { return ((pte & (ATTR_AF | ATTR_AP_RW_BIT)) == (ATTR_AF | ATTR_AP(ATTR_AP_RW))); } static __inline void pmap_resident_count_inc(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); pmap->pm_stats.resident_count += count; } static __inline void pmap_resident_count_dec(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT(pmap->pm_stats.resident_count >= count, ("pmap %p resident count underflow %ld %d", pmap, pmap->pm_stats.resident_count, count)); pmap->pm_stats.resident_count -= count; } static pt_entry_t * pmap_early_page_idx(vm_offset_t l1pt, vm_offset_t va, u_int *l1_slot, u_int *l2_slot) { pt_entry_t *l2; pd_entry_t *l1; l1 = (pd_entry_t *)l1pt; *l1_slot = (va >> L1_SHIFT) & Ln_ADDR_MASK; /* Check locore has used a table L1 map */ KASSERT((l1[*l1_slot] & ATTR_DESCR_MASK) == L1_TABLE, ("Invalid bootstrap L1 table")); /* Find the address of the L2 table */ l2 = (pt_entry_t *)init_pt_va; *l2_slot = pmap_l2_index(va); return (l2); } static vm_paddr_t pmap_early_vtophys(vm_offset_t l1pt, vm_offset_t va) { u_int l1_slot, l2_slot; pt_entry_t *l2; l2 = pmap_early_page_idx(l1pt, va, &l1_slot, &l2_slot); return ((l2[l2_slot] & ~ATTR_MASK) + (va & L2_OFFSET)); } static void pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t min_pa, vm_paddr_t max_pa) { vm_offset_t va; vm_paddr_t pa; u_int l1_slot; pa = dmap_phys_base = min_pa & ~L1_OFFSET; va = DMAP_MIN_ADDRESS; for (; va < DMAP_MAX_ADDRESS && pa < max_pa; pa += L1_SIZE, va += L1_SIZE, l1_slot++) { l1_slot = ((va - DMAP_MIN_ADDRESS) >> L1_SHIFT); pmap_load_store(&pagetable_dmap[l1_slot], (pa & ~L1_OFFSET) | ATTR_DEFAULT | ATTR_IDX(CACHED_MEMORY) | L1_BLOCK); } /* Set the upper limit of the DMAP region */ dmap_phys_max = pa; dmap_max_addr = va; cpu_dcache_wb_range((vm_offset_t)pagetable_dmap, PAGE_SIZE * DMAP_TABLES); cpu_tlb_flushID(); } static vm_offset_t pmap_bootstrap_l2(vm_offset_t l1pt, vm_offset_t va, vm_offset_t l2_start) { vm_offset_t l2pt; vm_paddr_t pa; pd_entry_t *l1; u_int l1_slot; KASSERT((va & L1_OFFSET) == 0, ("Invalid virtual address")); l1 = (pd_entry_t *)l1pt; l1_slot = pmap_l1_index(va); l2pt = l2_start; for (; va < VM_MAX_KERNEL_ADDRESS; l1_slot++, va += L1_SIZE) { KASSERT(l1_slot < Ln_ENTRIES, ("Invalid L1 index")); pa = pmap_early_vtophys(l1pt, l2pt); pmap_load_store(&l1[l1_slot], (pa & ~Ln_TABLE_MASK) | L1_TABLE); l2pt += PAGE_SIZE; } /* Clean the L2 page table */ memset((void *)l2_start, 0, l2pt - l2_start); cpu_dcache_wb_range(l2_start, l2pt - l2_start); /* Flush the l1 table to ram */ cpu_dcache_wb_range((vm_offset_t)l1, PAGE_SIZE); return l2pt; } static vm_offset_t pmap_bootstrap_l3(vm_offset_t l1pt, vm_offset_t va, vm_offset_t l3_start) { vm_offset_t l2pt, l3pt; vm_paddr_t pa; pd_entry_t *l2; u_int l2_slot; KASSERT((va & L2_OFFSET) == 0, ("Invalid virtual address")); l2 = pmap_l2(kernel_pmap, va); l2 = (pd_entry_t *)rounddown2((uintptr_t)l2, PAGE_SIZE); l2pt = (vm_offset_t)l2; l2_slot = pmap_l2_index(va); l3pt = l3_start; for (; va < VM_MAX_KERNEL_ADDRESS; l2_slot++, va += L2_SIZE) { KASSERT(l2_slot < Ln_ENTRIES, ("Invalid L2 index")); pa = pmap_early_vtophys(l1pt, l3pt); pmap_load_store(&l2[l2_slot], (pa & ~Ln_TABLE_MASK) | L2_TABLE); l3pt += PAGE_SIZE; } /* Clean the L2 page table */ memset((void *)l3_start, 0, l3pt - l3_start); cpu_dcache_wb_range(l3_start, l3pt - l3_start); cpu_dcache_wb_range((vm_offset_t)l2, PAGE_SIZE); return l3pt; } /* * Bootstrap the system enough to run with virtual memory. 
*/ void pmap_bootstrap(vm_offset_t l0pt, vm_offset_t l1pt, vm_paddr_t kernstart, vm_size_t kernlen) { u_int l1_slot, l2_slot, avail_slot, map_slot, used_map_slot; uint64_t kern_delta; pt_entry_t *l2; vm_offset_t va, freemempos; vm_offset_t dpcpu, msgbufpv; vm_paddr_t pa, max_pa, min_pa; int i; kern_delta = KERNBASE - kernstart; physmem = 0; printf("pmap_bootstrap %lx %lx %lx\n", l1pt, kernstart, kernlen); printf("%lx\n", l1pt); printf("%lx\n", (KERNBASE >> L1_SHIFT) & Ln_ADDR_MASK); /* Set this early so we can use the pagetable walking functions */ kernel_pmap_store.pm_l0 = (pd_entry_t *)l0pt; PMAP_LOCK_INIT(kernel_pmap); /* Assume the address we were loaded to is a valid physical address */ min_pa = max_pa = KERNBASE - kern_delta; /* * Find the minimum physical address. physmap is sorted, * but may contain empty ranges. */ for (i = 0; i < (physmap_idx * 2); i += 2) { if (physmap[i] == physmap[i + 1]) continue; if (physmap[i] <= min_pa) min_pa = physmap[i]; if (physmap[i + 1] > max_pa) max_pa = physmap[i + 1]; } /* Create a direct map region early so we can use it for pa -> va */ pmap_bootstrap_dmap(l1pt, min_pa, max_pa); va = KERNBASE; pa = KERNBASE - kern_delta; /* * Start to initialise phys_avail by copying from physmap * up to the physical address KERNBASE points at. */ map_slot = avail_slot = 0; for (; map_slot < (physmap_idx * 2) && avail_slot < (PHYS_AVAIL_SIZE - 2); map_slot += 2) { if (physmap[map_slot] == physmap[map_slot + 1]) continue; if (physmap[map_slot] <= pa && physmap[map_slot + 1] > pa) break; phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } /* Add the memory before the kernel */ if (physmap[avail_slot] < pa && avail_slot < (PHYS_AVAIL_SIZE - 2)) { phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = pa; physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } used_map_slot = map_slot; /* * Read the page table to find out what is already mapped. * This assumes we have mapped a block of memory from KERNBASE * using a single L1 entry. */ l2 = pmap_early_page_idx(l1pt, KERNBASE, &l1_slot, &l2_slot); /* Sanity check the index, KERNBASE should be the first VA */ KASSERT(l2_slot == 0, ("The L2 index is non-zero")); /* Find how many pages we have mapped */ for (; l2_slot < Ln_ENTRIES; l2_slot++) { if ((l2[l2_slot] & ATTR_DESCR_MASK) == 0) break; /* Check locore used L2 blocks */ KASSERT((l2[l2_slot] & ATTR_DESCR_MASK) == L2_BLOCK, ("Invalid bootstrap L2 table")); KASSERT((l2[l2_slot] & ~ATTR_MASK) == pa, ("Incorrect PA in L2 table")); va += L2_SIZE; pa += L2_SIZE; } va = roundup2(va, L1_SIZE); freemempos = KERNBASE + kernlen; freemempos = roundup2(freemempos, PAGE_SIZE); /* Create the l2 tables up to VM_MAX_KERNEL_ADDRESS */ freemempos = pmap_bootstrap_l2(l1pt, va, freemempos); /* And the l3 tables for the early devmap */ freemempos = pmap_bootstrap_l3(l1pt, VM_MAX_KERNEL_ADDRESS - L2_SIZE, freemempos); cpu_tlb_flushID(); #define alloc_pages(var, np) \ (var) = freemempos; \ freemempos += (np * PAGE_SIZE); \ memset((char *)(var), 0, ((np) * PAGE_SIZE)); /* Allocate dynamic per-cpu area. */ alloc_pages(dpcpu, DPCPU_SIZE / PAGE_SIZE); dpcpu_init((void *)dpcpu, 0); /* Allocate memory for the msgbuf, e.g. 
for /sbin/dmesg */ alloc_pages(msgbufpv, round_page(msgbufsize) / PAGE_SIZE); msgbufp = (void *)msgbufpv; virtual_avail = roundup2(freemempos, L1_SIZE); virtual_end = VM_MAX_KERNEL_ADDRESS - L2_SIZE; kernel_vm_end = virtual_avail; pa = pmap_early_vtophys(l1pt, freemempos); /* Finish initialising physmap */ map_slot = used_map_slot; for (; avail_slot < (PHYS_AVAIL_SIZE - 2) && map_slot < (physmap_idx * 2); map_slot += 2) { if (physmap[map_slot] == physmap[map_slot + 1]) continue; /* Have we used the current range? */ if (physmap[map_slot + 1] <= pa) continue; /* Do we need to split the entry? */ if (physmap[map_slot] < pa) { phys_avail[avail_slot] = pa; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; } else { phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; } physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } phys_avail[avail_slot] = 0; phys_avail[avail_slot + 1] = 0; /* * Maxmem isn't the "maximum memory", it's one larger than the * highest page of the physical address space. It should be * called something like "Maxphyspage". */ Maxmem = atop(phys_avail[avail_slot - 1]); cpu_tlb_flushID(); } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pv_memattr = VM_MEMATTR_WRITE_BACK; } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { vm_size_t s; int i, pv_npg; /* * Are large page mappings enabled? */ TUNABLE_INT_FETCH("vm.pmap.superpages_enabled", &superpages_enabled); /* * Initialize the pv chunk list mutex. */ mtx_init(&pv_chunks_mutex, "pmap pv chunk list", NULL, MTX_DEF); /* * Initialize the pool of pv list locks. */ for (i = 0; i < NPV_LIST_LOCKS; i++) rw_init(&pv_list_locks[i], "pmap pv list"); /* * Calculate the size of the pv head table for superpages. */ pv_npg = howmany(vm_phys_segs[vm_phys_nsegs - 1].end, L2_SIZE); /* * Allocate memory for the pv head table for superpages. */ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); TAILQ_INIT(&pv_dummy.pv_list); } static SYSCTL_NODE(_vm_pmap, OID_AUTO, l2, CTLFLAG_RD, 0, "2MB page mapping counters"); static u_long pmap_l2_demotions; SYSCTL_ULONG(_vm_pmap_l2, OID_AUTO, demotions, CTLFLAG_RD, &pmap_l2_demotions, 0, "2MB page demotions"); static u_long pmap_l2_p_failures; SYSCTL_ULONG(_vm_pmap_l2, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_l2_p_failures, 0, "2MB page promotion failures"); static u_long pmap_l2_promotions; SYSCTL_ULONG(_vm_pmap_l2, OID_AUTO, promotions, CTLFLAG_RD, &pmap_l2_promotions, 0, "2MB page promotions"); /* * Invalidate a single TLB entry. 
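* The sequence below issues a DSB so that prior table updates are visible, * a broadcast "tlbi vaae1is" keyed by the page number of the VA (all * ASIDs, inner shareable), and a final DSB plus ISB to wait for * completion; the pmap argument is unused because invalidation is by VA * across all ASIDs.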
*/ PMAP_INLINE void pmap_invalidate_page(pmap_t pmap, vm_offset_t va) { sched_pin(); __asm __volatile( "dsb ishst \n" "tlbi vaae1is, %0 \n" "dsb ish \n" "isb \n" : : "r"(va >> PAGE_SHIFT)); sched_unpin(); } PMAP_INLINE void pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t addr; sched_pin(); dsb(ishst); for (addr = sva; addr < eva; addr += PAGE_SIZE) { __asm __volatile( "tlbi vaae1is, %0" : : "r"(addr >> PAGE_SHIFT)); } __asm __volatile( "dsb ish \n" "isb \n"); sched_unpin(); } PMAP_INLINE void pmap_invalidate_all(pmap_t pmap) { sched_pin(); __asm __volatile( "dsb ishst \n" "tlbi vmalle1is \n" "dsb ish \n" "isb \n"); sched_unpin(); } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { pt_entry_t *pte, tpte; vm_paddr_t pa; int lvl; pa = 0; PMAP_LOCK(pmap); /* * Find the block or page map for this virtual address. pmap_pte * will return either a valid block/page entry, or NULL. */ pte = pmap_pte(pmap, va, &lvl); if (pte != NULL) { tpte = pmap_load(pte); pa = tpte & ~ATTR_MASK; switch(lvl) { case 1: KASSERT((tpte & ATTR_DESCR_MASK) == L1_BLOCK, ("pmap_extract: Invalid L1 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L1_OFFSET); break; case 2: KASSERT((tpte & ATTR_DESCR_MASK) == L2_BLOCK, ("pmap_extract: Invalid L2 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L2_OFFSET); break; case 3: KASSERT((tpte & ATTR_DESCR_MASK) == L3_PAGE, ("pmap_extract: Invalid L3 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L3_OFFSET); break; } } PMAP_UNLOCK(pmap); return (pa); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection. 
*/ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pt_entry_t *pte, tpte; vm_offset_t off; vm_paddr_t pa; vm_page_t m; int lvl; pa = 0; m = NULL; PMAP_LOCK(pmap); retry: pte = pmap_pte(pmap, va, &lvl); if (pte != NULL) { tpte = pmap_load(pte); KASSERT(lvl > 0 && lvl <= 3, ("pmap_extract_and_hold: Invalid level %d", lvl)); CTASSERT(L1_BLOCK == L2_BLOCK); KASSERT((lvl == 3 && (tpte & ATTR_DESCR_MASK) == L3_PAGE) || (lvl < 3 && (tpte & ATTR_DESCR_MASK) == L1_BLOCK), ("pmap_extract_and_hold: Invalid pte at L%d: %lx", lvl, tpte & ATTR_DESCR_MASK)); if (((tpte & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) || ((prot & VM_PROT_WRITE) == 0)) { switch(lvl) { case 1: off = va & L1_OFFSET; break; case 2: off = va & L2_OFFSET; break; case 3: default: off = 0; } if (vm_page_pa_tryrelock(pmap, (tpte & ~ATTR_MASK) | off, &pa)) goto retry; m = PHYS_TO_VM_PAGE((tpte & ~ATTR_MASK) | off); vm_page_hold(m); } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } vm_paddr_t pmap_kextract(vm_offset_t va) { pt_entry_t *pte, tpte; vm_paddr_t pa; int lvl; if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) { pa = DMAP_TO_PHYS(va); } else { pa = 0; pte = pmap_pte(kernel_pmap, va, &lvl); if (pte != NULL) { tpte = pmap_load(pte); pa = tpte & ~ATTR_MASK; switch(lvl) { case 1: KASSERT((tpte & ATTR_DESCR_MASK) == L1_BLOCK, ("pmap_kextract: Invalid L1 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L1_OFFSET); break; case 2: KASSERT((tpte & ATTR_DESCR_MASK) == L2_BLOCK, ("pmap_kextract: Invalid L2 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L2_OFFSET); break; case 3: KASSERT((tpte & ATTR_DESCR_MASK) == L3_PAGE, ("pmap_kextract: Invalid L3 pte found: %lx", tpte & ATTR_DESCR_MASK)); pa |= (va & L3_OFFSET); break; } } } return (pa); } /*************************************************** * Low level mapping routines..... ***************************************************/ static void pmap_kenter(vm_offset_t sva, vm_size_t size, vm_paddr_t pa, int mode) { pd_entry_t *pde; pt_entry_t *pte; vm_offset_t va; int lvl; KASSERT((pa & L3_OFFSET) == 0, ("pmap_kenter: Invalid physical address")); KASSERT((sva & L3_OFFSET) == 0, ("pmap_kenter: Invalid virtual address")); KASSERT((size & PAGE_MASK) == 0, ("pmap_kenter: Mapping is not page-sized")); va = sva; while (size != 0) { pde = pmap_pde(kernel_pmap, va, &lvl); KASSERT(pde != NULL, ("pmap_kenter: Invalid page entry, va: 0x%lx", va)); KASSERT(lvl == 2, ("pmap_kenter: Invalid level %d", lvl)); pte = pmap_l2_to_l3(pde, va); pmap_load_store(pte, (pa & ~L3_OFFSET) | ATTR_DEFAULT | ATTR_IDX(mode) | L3_PAGE); PTE_SYNC(pte); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } void pmap_kenter_device(vm_offset_t sva, vm_size_t size, vm_paddr_t pa) { pmap_kenter(sva, size, pa, DEVICE_MEMORY); } /* * Remove a page from the kernel pagetables. 
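* (If the old mapping was valid and cacheable, the implementation below * writes the data cache back for the VA before clearing the PTE, because * the mapping is about to disappear.)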
*/ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt_entry_t *pte; int lvl; pte = pmap_pte(kernel_pmap, va, &lvl); KASSERT(pte != NULL, ("pmap_kremove: Invalid address")); KASSERT(lvl == 3, ("pmap_kremove: Invalid pte level %d", lvl)); if (pmap_l3_valid_cacheable(pmap_load(pte))) cpu_dcache_wb_range(va, L3_SIZE); pmap_load_clear(pte); PTE_SYNC(pte); pmap_invalidate_page(kernel_pmap, va); } void pmap_kremove_device(vm_offset_t sva, vm_size_t size) { pt_entry_t *pte; vm_offset_t va; int lvl; KASSERT((sva & L3_OFFSET) == 0, ("pmap_kremove_device: Invalid virtual address")); KASSERT((size & PAGE_MASK) == 0, ("pmap_kremove_device: Mapping is not page-sized")); va = sva; while (size != 0) { pte = pmap_pte(kernel_pmap, va, &lvl); KASSERT(pte != NULL, ("Invalid page table, va: 0x%lx", va)); KASSERT(lvl == 3, ("Invalid device pagetable level: %d != 3", lvl)); pmap_load_clear(pte); PTE_SYNC(pte); va += PAGE_SIZE; size -= PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. */ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { return PHYS_TO_DMAP(start); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) { pd_entry_t *pde; pt_entry_t *pte, pa; vm_offset_t va; vm_page_t m; int i, lvl; va = sva; for (i = 0; i < count; i++) { pde = pmap_pde(kernel_pmap, va, &lvl); KASSERT(pde != NULL, ("pmap_qenter: Invalid page entry, va: 0x%lx", va)); KASSERT(lvl == 2, ("pmap_qenter: Invalid level %d", lvl)); m = ma[i]; pa = VM_PAGE_TO_PHYS(m) | ATTR_DEFAULT | ATTR_AP(ATTR_AP_RW) | ATTR_IDX(m->md.pv_memattr) | L3_PAGE; pte = pmap_l2_to_l3(pde, va); pmap_load_store(pte, pa); PTE_SYNC(pte); va += L3_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. */ void pmap_qremove(vm_offset_t sva, int count) { pt_entry_t *pte; vm_offset_t va; int lvl; KASSERT(sva >= VM_MIN_KERNEL_ADDRESS, ("usermode va %lx", sva)); va = sva; while (count-- > 0) { pte = pmap_pte(kernel_pmap, va, &lvl); KASSERT(lvl == 3, ("Invalid device pagetable level: %d != 3", lvl)); if (pte != NULL) { if (pmap_l3_valid_cacheable(pmap_load(pte))) cpu_dcache_wb_range(va, L3_SIZE); pmap_load_clear(pte); PTE_SYNC(pte); } va += PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /*************************************************** * Page table page management routines..... ***************************************************/ static __inline void pmap_free_zero_pages(struct spglist *free) { vm_page_t m; while ((m = SLIST_FIRST(free)) != NULL) { SLIST_REMOVE_HEAD(free, plinks.s.ss); /* Preserve the page's PG_ZERO setting. */ vm_page_free_toq(m); } } /* * Schedule the specified unused page table page to be freed. 
Specifically, * add the page to the specified list of pages that will be released to the * physical memory manager after the TLB has been updated. */ static __inline void pmap_add_delayed_free_list(vm_page_t m, struct spglist *free, boolean_t set_PG_ZERO) { if (set_PG_ZERO) m->flags |= PG_ZERO; else m->flags &= ~PG_ZERO; SLIST_INSERT_HEAD(free, m, plinks.s.ss); } /* * Decrements a page table page's wire count, which is used to record the * number of valid page table entries within the page. If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. */ static inline boolean_t pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { --m->wire_count; if (m->wire_count == 0) { _pmap_unwire_l3(pmap, va, m, free); return (TRUE); } else return (FALSE); } static void _pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * unmap the page table page */ if (m->pindex >= (NUL2E + NUL1E)) { /* l1 page */ pd_entry_t *l0; l0 = pmap_l0(pmap, va); pmap_load_clear(l0); PTE_SYNC(l0); } else if (m->pindex >= NUL2E) { /* l2 page */ pd_entry_t *l1; l1 = pmap_l1(pmap, va); pmap_load_clear(l1); PTE_SYNC(l1); } else { /* l3 page */ pd_entry_t *l2; l2 = pmap_l2(pmap, va); pmap_load_clear(l2); PTE_SYNC(l2); } pmap_resident_count_dec(pmap, 1); if (m->pindex < NUL2E) { /* We just released an l3, unhold the matching l2 */ pd_entry_t *l1, tl1; vm_page_t l2pg; l1 = pmap_l1(pmap, va); tl1 = pmap_load(l1); l2pg = PHYS_TO_VM_PAGE(tl1 & ~ATTR_MASK); pmap_unwire_l3(pmap, va, l2pg, free); } else if (m->pindex < (NUL2E + NUL1E)) { /* We just released an l2, unhold the matching l1 */ pd_entry_t *l0, tl0; vm_page_t l1pg; l0 = pmap_l0(pmap, va); tl0 = pmap_load(l0); l1pg = PHYS_TO_VM_PAGE(tl0 & ~ATTR_MASK); pmap_unwire_l3(pmap, va, l1pg, free); } pmap_invalidate_page(pmap, va); /* * This is a release store so that the ordinary store unmapping * the page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ pmap_add_delayed_free_list(m, free, TRUE); } /* * After removing an l3 entry, this routine is used to * conditionally free the page, and manage the hold/wire counts. */ static int pmap_unuse_l3(pmap_t pmap, vm_offset_t va, pd_entry_t ptepde, struct spglist *free) { vm_page_t mpte; if (va >= VM_MAXUSER_ADDRESS) return (0); KASSERT(ptepde != 0, ("pmap_unuse_pt: ptepde != 0")); mpte = PHYS_TO_VM_PAGE(ptepde & ~ATTR_MASK); return (pmap_unwire_l3(pmap, va, mpte, free)); } void pmap_pinit0(pmap_t pmap) { PMAP_LOCK_INIT(pmap); bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); pmap->pm_l0 = kernel_pmap->pm_l0; pmap->pm_root.rt_root = 0; } int pmap_pinit(pmap_t pmap) { vm_paddr_t l0phys; vm_page_t l0pt; /* * allocate the l0 page */ while ((l0pt = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) VM_WAIT; l0phys = VM_PAGE_TO_PHYS(l0pt); pmap->pm_l0 = (pd_entry_t *)PHYS_TO_DMAP(l0phys); if ((l0pt->flags & PG_ZERO) == 0) pagezero(pmap->pm_l0); pmap->pm_root.rt_root = 0; bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); return (1); } /* * This routine is called if the desired page table page does not exist. * * If page table page allocation fails, this routine may sleep before * returning NULL. It sleeps only if a lock pointer was given. 
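 *
 * The requested ptepindex encodes the level of the page table page:
 * pindexes below NUL2E name L3 pages, pindexes from NUL2E up to
 * NUL2E + NUL1E name L2 pages, and larger pindexes name L1 pages,
 * mirroring the decoding done by _pmap_unwire_l3() above.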
* * Note: If a page allocation fails at page table level two or three, * one or two pages may be held during the wait, only to be released * afterwards. This conservative approach is easily argued to avoid * race conditions. */ static vm_page_t _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp) { vm_page_t m, l1pg, l2pg; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Allocate a page table page. */ if ((m = vm_page_alloc(NULL, ptepindex, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) { if (lockp != NULL) { RELEASE_PV_LIST_LOCK(lockp); PMAP_UNLOCK(pmap); VM_WAIT; PMAP_LOCK(pmap); } /* * Indicate the need to retry. While waiting, the page table * page may have been allocated. */ return (NULL); } if ((m->flags & PG_ZERO) == 0) pmap_zero_page(m); /* * Map the pagetable page into the process address space, if * it isn't already there. */ if (ptepindex >= (NUL2E + NUL1E)) { pd_entry_t *l0; vm_pindex_t l0index; l0index = ptepindex - (NUL2E + NUL1E); l0 = &pmap->pm_l0[l0index]; pmap_load_store(l0, VM_PAGE_TO_PHYS(m) | L0_TABLE); PTE_SYNC(l0); } else if (ptepindex >= NUL2E) { vm_pindex_t l0index, l1index; pd_entry_t *l0, *l1; pd_entry_t tl0; l1index = ptepindex - NUL2E; l0index = l1index >> L0_ENTRIES_SHIFT; l0 = &pmap->pm_l0[l0index]; tl0 = pmap_load(l0); if (tl0 == 0) { /* recurse for allocating page dir */ if (_pmap_alloc_l3(pmap, NUL2E + NUL1E + l0index, lockp) == NULL) { --m->wire_count; /* XXX: release mem barrier? */ atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } } else { l1pg = PHYS_TO_VM_PAGE(tl0 & ~ATTR_MASK); l1pg->wire_count++; } l1 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l0) & ~ATTR_MASK); l1 = &l1[ptepindex & Ln_ADDR_MASK]; pmap_load_store(l1, VM_PAGE_TO_PHYS(m) | L1_TABLE); PTE_SYNC(l1); } else { vm_pindex_t l0index, l1index; pd_entry_t *l0, *l1, *l2; pd_entry_t tl0, tl1; l1index = ptepindex >> Ln_ENTRIES_SHIFT; l0index = l1index >> L0_ENTRIES_SHIFT; l0 = &pmap->pm_l0[l0index]; tl0 = pmap_load(l0); if (tl0 == 0) { /* recurse for allocating page dir */ if (_pmap_alloc_l3(pmap, NUL2E + l1index, lockp) == NULL) { --m->wire_count; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } tl0 = pmap_load(l0); l1 = (pd_entry_t *)PHYS_TO_DMAP(tl0 & ~ATTR_MASK); l1 = &l1[l1index & Ln_ADDR_MASK]; } else { l1 = (pd_entry_t *)PHYS_TO_DMAP(tl0 & ~ATTR_MASK); l1 = &l1[l1index & Ln_ADDR_MASK]; tl1 = pmap_load(l1); if (tl1 == 0) { /* recurse for allocating page dir */ if (_pmap_alloc_l3(pmap, NUL2E + l1index, lockp) == NULL) { --m->wire_count; /* XXX: release mem barrier? */ atomic_subtract_int( &vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } } else { l2pg = PHYS_TO_VM_PAGE(tl1 & ~ATTR_MASK); l2pg->wire_count++; } } l2 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l1) & ~ATTR_MASK); l2 = &l2[ptepindex & Ln_ADDR_MASK]; pmap_load_store(l2, VM_PAGE_TO_PHYS(m) | L2_TABLE); PTE_SYNC(l2); } pmap_resident_count_inc(pmap, 1); return (m); } static vm_page_t pmap_alloc_l3(pmap_t pmap, vm_offset_t va, struct rwlock **lockp) { vm_pindex_t ptepindex; pd_entry_t *pde, tpde; #ifdef INVARIANTS pt_entry_t *pte; #endif vm_page_t m; int lvl; /* * Calculate pagetable page index */ ptepindex = pmap_l2_pindex(va); retry: /* * Get the page directory entry */ pde = pmap_pde(pmap, va, &lvl); /* * If the page table page is mapped, we just increment the hold count, * and activate it. If we get a level 2 pde it will point to a level 3 * table. 
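 * A lvl of -1 means no valid entry was found at any level.  Levels 0
 * and 1 mean the walk stopped at that level; the entry beneath must be
 * empty, since block (superpage) entries at those depths are not yet
 * supported, as the TODO assertions below document.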
*/ switch (lvl) { case -1: break; case 0: #ifdef INVARIANTS pte = pmap_l0_to_l1(pde, va); KASSERT(pmap_load(pte) == 0, ("pmap_alloc_l3: TODO: l0 superpages")); #endif break; case 1: #ifdef INVARIANTS pte = pmap_l1_to_l2(pde, va); KASSERT(pmap_load(pte) == 0, ("pmap_alloc_l3: TODO: l1 superpages")); #endif break; case 2: tpde = pmap_load(pde); if (tpde != 0) { m = PHYS_TO_VM_PAGE(tpde & ~ATTR_MASK); m->wire_count++; return (m); } break; default: panic("pmap_alloc_l3: Invalid level %d", lvl); } /* * Here if the pte page isn't mapped, or if it has been deallocated. */ m = _pmap_alloc_l3(pmap, ptepindex, lockp); if (m == NULL && lockp != NULL) goto retry; return (m); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. */ void pmap_release(pmap_t pmap) { vm_page_t m; KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); KASSERT(vm_radix_is_empty(&pmap->pm_root), ("pmap_release: pmap has reserved page table page(s)")); m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pmap->pm_l0)); m->wire_count--; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); } static int kvm_size(SYSCTL_HANDLER_ARGS) { unsigned long ksize = VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS; return sysctl_handle_long(oidp, &ksize, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_size, "LU", "Size of KVM"); static int kvm_free(SYSCTL_HANDLER_ARGS) { unsigned long kfree = VM_MAX_KERNEL_ADDRESS - kernel_vm_end; return sysctl_handle_long(oidp, &kfree, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_free, "LU", "Amount of KVM free"); /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { vm_paddr_t paddr; vm_page_t nkpg; pd_entry_t *l0, *l1, *l2; mtx_assert(&kernel_map->system_mtx, MA_OWNED); addr = roundup2(addr, L2_SIZE); if (addr - 1 >= kernel_map->max_offset) addr = kernel_map->max_offset; while (kernel_vm_end < addr) { l0 = pmap_l0(kernel_pmap, kernel_vm_end); KASSERT(pmap_load(l0) != 0, ("pmap_growkernel: No level 0 kernel entry")); l1 = pmap_l0_to_l1(l0, kernel_vm_end); if (pmap_load(l1) == 0) { /* We need a new PDP entry */ nkpg = vm_page_alloc(NULL, kernel_vm_end >> L1_SHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); pmap_load_store(l1, paddr | L1_TABLE); PTE_SYNC(l1); continue; /* try again */ } l2 = pmap_l1_to_l2(l1, kernel_vm_end); if ((pmap_load(l2) & ATTR_AF) != 0) { kernel_vm_end = (kernel_vm_end + L2_SIZE) & ~L2_OFFSET; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } continue; } nkpg = vm_page_alloc(NULL, kernel_vm_end >> L2_SHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); pmap_load_store(l2, paddr | L2_TABLE); PTE_SYNC(l2); pmap_invalidate_page(kernel_pmap, kernel_vm_end); kernel_vm_end = (kernel_vm_end + L2_SIZE) & 
~L2_OFFSET; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } } } /*************************************************** * page management routines. ***************************************************/ CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 3); CTASSERT(_NPCPV == 168); static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) #define PC_FREE0 0xfffffffffffffffful #define PC_FREE1 0xfffffffffffffffful #define PC_FREE2 0x000000fffffffffful static const uint64_t pc_freemask[_NPCM] = { PC_FREE0, PC_FREE1, PC_FREE2 }; #if 0 #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs, pv_entry_count; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif #endif /* 0 */ /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. * * Returns NULL if PV entries were reclaimed from the specified pmap. * * We do not, however, unmap 2mpages because subsequent accesses will * allocate per-page pv entries until repromotion occurs, thereby * exacerbating the shortage of free pv entries. */ static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp) { panic("ARM64TODO: reclaim_pv_chunk"); } /* * free the pv_entry back to the free list */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_frees, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, 1)); PV_STAT(atomic_subtract_long(&pv_entry_count, 1)); pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 64; bit = idx % 64; pc->pc_map[field] |= 1ul << bit; if (pc->pc_map[0] != PC_FREE0 || pc->pc_map[1] != PC_FREE1 || pc->pc_map[2] != PC_FREE2) { /* 98% of the time, pc is already at the head of the list. 
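 * Moving it back to the head means get_pv_entry() can always take
 * TAILQ_FIRST() and expect to find a chunk with a free slot, without
 * walking the list.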
*/ if (__predict_false(pc != TAILQ_FIRST(&pmap->pm_pvchunk))) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; mtx_lock(&pv_chunks_mutex); TAILQ_REMOVE(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); PV_STAT(atomic_subtract_int(&pv_entry_spare, _NPCPV)); PV_STAT(atomic_subtract_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_frees, 1)); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pc)); dump_drop_page(m->phys_addr); vm_page_unwire(m, PQ_NONE); vm_page_free(m); } /* * Returns a new PV entry, allocating a new PV chunk from the system when * needed. If this PV chunk allocation fails and a PV list lock pointer was * given, a PV chunk is reclaimed from an arbitrary pmap. Otherwise, NULL is * returned. * * The given PV list lock may be released. */ static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp) { int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_allocs, 1)); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = ffsl(pc->pc_map[field]) - 1; break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 64 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ if (pc->pc_map[0] == 0 && pc->pc_map[1] == 0 && pc->pc_map[2] == 0) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_subtract_int(&pv_entry_spare, 1)); return (pv); } } /* No free items, allocate another chunk */ m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) { if (lockp == NULL) { PV_STAT(pc_chunk_tryfail++); return (NULL); } m = reclaim_pv_chunk(pmap, lockp); if (m == NULL) goto retry; } PV_STAT(atomic_add_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_allocs, 1)); dump_add_page(m->phys_addr); pc = (void *)PHYS_TO_DMAP(m->phys_addr); pc->pc_pmap = pmap; pc->pc_map[0] = PC_FREE0 & ~1ul; /* preallocated bit 0 */ pc->pc_map[1] = PC_FREE1; pc->pc_map[2] = PC_FREE2; mtx_lock(&pv_chunks_mutex); TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, _NPCPV - 1)); return (pv); } /* * Ensure that the number of spare PV entries in the specified pmap meets or * exceeds the given count, "needed". * * The given PV list lock may be released. */ static void reserve_pv_entries(pmap_t pmap, int needed, struct rwlock **lockp) { struct pch new_tail; struct pv_chunk *pc; int avail, free; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT(lockp != NULL, ("reserve_pv_entries: lockp is NULL")); /* * Newly allocated PV chunks must be stored in a private list until * the required number of PV chunks have been allocated. Otherwise, * reclaim_pv_chunk() could recycle one of these chunks. In * contrast, these chunks must be added to the pmap upon allocation. 
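 * If a chunk page cannot be allocated, reclaim_pv_chunk() is invoked;
 * when it reclaims entries from this pmap itself (signalled by a NULL
 * return) the tally must be recomputed, hence the goto retry below.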
*/ TAILQ_INIT(&new_tail); retry: avail = 0; TAILQ_FOREACH(pc, &pmap->pm_pvchunk, pc_list) { bit_count((bitstr_t *)pc->pc_map, 0, sizeof(pc->pc_map) * NBBY, &free); if (free == 0) break; avail += free; if (avail >= needed) break; } for (; avail < needed; avail += _NPCPV) { m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) { m = reclaim_pv_chunk(pmap, lockp); if (m == NULL) goto retry; } PV_STAT(atomic_add_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_allocs, 1)); dump_add_page(m->phys_addr); pc = (void *)PHYS_TO_DMAP(m->phys_addr); pc->pc_pmap = pmap; pc->pc_map[0] = PC_FREE0; pc->pc_map[1] = PC_FREE1; pc->pc_map[2] = PC_FREE2; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&new_tail, pc, pc_lru); PV_STAT(atomic_add_int(&pv_entry_spare, _NPCPV)); } if (!TAILQ_EMPTY(&new_tail)) { mtx_lock(&pv_chunks_mutex); TAILQ_CONCAT(&pv_chunks, &new_tail, pc_lru); mtx_unlock(&pv_chunks_mutex); } } /* * First find and then remove the pv entry for the specified pmap and virtual * address from the specified pv list. Returns the pv entry if found and NULL * otherwise. This operation can be performed on pv lists for either 4KB or * 2MB page mappings. */ static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; break; } } return (pv); } /* * After demotion from a 2MB page mapping to 512 4KB page mappings, * destroy the pv entry for the 2MB page mapping and reinstantiate the pv * entries for each of the 4KB page mappings. */ static void pmap_pv_demote_l2(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp) { struct md_page *pvh; struct pv_chunk *pc; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; int bit, field; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((pa & L2_OFFSET) == 0, ("pmap_pv_demote_l2: pa is not 2mpage aligned")); CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa); /* * Transfer the 2mpage's pv entry for this mapping to the first * page's pv list. Once this transfer begins, the pv list lock * must not be released until the last pv entry is reinstantiated. */ pvh = pa_to_pvh(pa); va = va & ~L2_OFFSET; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pv_demote_l2: pv not found")); m = PHYS_TO_VM_PAGE(pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; /* Instantiate the remaining Ln_ENTRIES - 1 pv entries. 
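 * These entries are carved straight out of the pmap's pv chunks; the
 * caller is expected to have pre-reserved them (see
 * reserve_pv_entries()), which is what the "missing spare" assertion
 * below checks.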
*/ PV_STAT(atomic_add_long(&pv_entry_allocs, Ln_ENTRIES - 1)); va_last = va + L2_SIZE - PAGE_SIZE; for (;;) { pc = TAILQ_FIRST(&pmap->pm_pvchunk); KASSERT(pc->pc_map[0] != 0 || pc->pc_map[1] != 0 || pc->pc_map[2] != 0, ("pmap_pv_demote_l2: missing spare")); for (field = 0; field < _NPCM; field++) { while (pc->pc_map[field]) { bit = ffsl(pc->pc_map[field]) - 1; pc->pc_map[field] &= ~(1ul << bit); pv = &pc->pc_pventry[field * 64 + bit]; va += PAGE_SIZE; pv->pv_va = va; m++; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_pv_demote_l2: page %p is not managed", m)); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if (va == va_last) goto out; } } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } out: if (pc->pc_map[0] == 0 && pc->pc_map[1] == 0 && pc->pc_map[2] == 0) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } PV_STAT(atomic_add_long(&pv_entry_count, Ln_ENTRIES - 1)); PV_STAT(atomic_subtract_int(&pv_entry_spare, Ln_ENTRIES - 1)); } /* * First find and then destroy the pv entry for the specified pmap and virtual * address. This operation can be performed on pv lists for either 4KB or 2MB * page mappings. */ static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } /* * Conditionally create the PV entry for a 4KB page mapping if the required * memory can be allocated without resorting to reclamation. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp) { pv_entry_t pv; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* Pass NULL instead of the lock pointer to disable reclamation. */ if ((pv = get_pv_entry(pmap, NULL)) != NULL) { pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; return (TRUE); } else return (FALSE); } /* * pmap_remove_l3: do the things to unmap a page in a process */ static int pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_t va, pd_entry_t l2e, struct spglist *free, struct rwlock **lockp) { struct md_page *pvh; pt_entry_t old_l3; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(va, L3_SIZE); old_l3 = pmap_load_clear(l3); PTE_SYNC(l3); pmap_invalidate_page(pmap, va); if (old_l3 & ATTR_SW_WIRED) pmap->pm_stats.wired_count -= 1; pmap_resident_count_dec(pmap, 1); if (old_l3 & ATTR_SW_MANAGED) { m = PHYS_TO_VM_PAGE(old_l3 & ~ATTR_MASK); if (pmap_page_dirty(old_l3)) vm_page_dirty(m); if (old_l3 & ATTR_AF) vm_page_aflag_set(m, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); pmap_pvh_free(&m->md, pmap, va); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } return (pmap_unuse_l3(pmap, va, l2e, free)); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { struct rwlock *lock; vm_offset_t va, va_next; pd_entry_t *l0, *l1, *l2; pt_entry_t l3_paddr, *l3; struct spglist free; int anyvalid; /* * Perform an unsynchronized read. This is, however, safe. 
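 * A stale value can only be observed while the pmap is being modified
 * concurrently, and callers must already serialize against changes to
 * the range being removed.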
*/ if (pmap->pm_stats.resident_count == 0) return; anyvalid = 0; SLIST_INIT(&free); PMAP_LOCK(pmap); lock = NULL; for (; sva < eva; sva = va_next) { if (pmap->pm_stats.resident_count == 0) break; l0 = pmap_l0(pmap, sva); if (pmap_load(l0) == 0) { va_next = (sva + L0_SIZE) & ~L0_OFFSET; if (va_next < sva) va_next = eva; continue; } l1 = pmap_l0_to_l1(l0, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } /* * Calculate index for next page table. */ va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (l2 == NULL) continue; l3_paddr = pmap_load(l2); if ((l3_paddr & ATTR_DESCR_MASK) == L2_BLOCK) { /* TODO: Add pmap_remove_l2 */ if (pmap_demote_l2_locked(pmap, l2, sva & ~L2_OFFSET, &lock) == NULL) continue; l3_paddr = pmap_load(l2); } /* * Weed out invalid mappings. */ if ((l3_paddr & ATTR_DESCR_MASK) != L2_TABLE) continue; /* * Limit our scan to either the end of the va represented * by the current page table page, or to the end of the * range being removed. */ if (va_next > eva) va_next = eva; va = va_next; for (l3 = pmap_l2_to_l3(l2, sva); sva != va_next; l3++, sva += L3_SIZE) { if (l3 == NULL) panic("l3 == NULL"); if (pmap_load(l3) == 0) { if (va != va_next) { pmap_invalidate_range(pmap, va, sva); va = va_next; } continue; } if (va == va_next) va = sva; if (pmap_remove_l3(pmap, l3, sva, l3_paddr, &free, &lock)) { sva += L3_SIZE; break; } } if (va != va_next) pmap_invalidate_range(pmap, va, sva); } if (lock != NULL) rw_wunlock(lock); if (anyvalid) pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) */ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; struct rwlock *lock; pd_entry_t *pde, tpde; pt_entry_t *pte, tpte; vm_offset_t va; struct spglist free; int lvl, pvh_gen, md_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); SLIST_INIT(&free); lock = VM_PAGE_TO_PV_LIST_LOCK(m); pvh = (m->flags & PG_FICTITIOUS) != 0 ? 
&pv_dummy : pa_to_pvh(VM_PAGE_TO_PHYS(m)); retry: rw_wlock(lock); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { rw_wunlock(lock); PMAP_UNLOCK(pmap); goto retry; } } va = pv->pv_va; pte = pmap_pte(pmap, va, &lvl); KASSERT(pte != NULL, ("pmap_remove_all: no page table entry found")); KASSERT(lvl == 2, ("pmap_remove_all: invalid pte level %d", lvl)); pmap_demote_l2_locked(pmap, pte, va, &lock); PMAP_UNLOCK(pmap); } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { rw_wunlock(lock); PMAP_UNLOCK(pmap); goto retry; } } pmap_resident_count_dec(pmap, 1); pde = pmap_pde(pmap, pv->pv_va, &lvl); KASSERT(pde != NULL, ("pmap_remove_all: no page directory entry found")); KASSERT(lvl == 2, ("pmap_remove_all: invalid pde level %d", lvl)); tpde = pmap_load(pde); pte = pmap_l2_to_l3(pde, pv->pv_va); tpte = pmap_load(pte); if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(tpte)) cpu_dcache_wb_range(pv->pv_va, L3_SIZE); pmap_load_clear(pte); PTE_SYNC(pte); pmap_invalidate_page(pmap, pv->pv_va); if (tpte & ATTR_SW_WIRED) pmap->pm_stats.wired_count--; if ((tpte & ATTR_AF) != 0) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if (pmap_page_dirty(tpte)) vm_page_dirty(m); pmap_unuse_l3(pmap, pv->pv_va, tpde, &free); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(lock); pmap_free_zero_pages(&free); } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t va, va_next; pd_entry_t *l0, *l1, *l2; pt_entry_t *l3p, l3; if ((prot & VM_PROT_READ) == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } if ((prot & VM_PROT_WRITE) == VM_PROT_WRITE) return; PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { l0 = pmap_l0(pmap, sva); if (pmap_load(l0) == 0) { va_next = (sva + L0_SIZE) & ~L0_OFFSET; if (va_next < sva) va_next = eva; continue; } l1 = pmap_l0_to_l1(l0, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (pmap_load(l2) == 0) continue; if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK) { l3p = pmap_demote_l2(pmap, l2, sva); if (l3p == NULL) continue; } KASSERT((pmap_load(l2) & ATTR_DESCR_MASK) == L2_TABLE, ("pmap_protect: Invalid L2 entry after demotion")); if (va_next > eva) va_next = eva; va = va_next; for (l3p = pmap_l2_to_l3(l2, sva); sva != va_next; l3p++, sva += L3_SIZE) { l3 = pmap_load(l3p); if (pmap_l3_valid(l3)) { pmap_set(l3p, ATTR_AP(ATTR_AP_RO)); PTE_SYNC(l3p); /* XXX: Use pmap_invalidate_range */ pmap_invalidate_page(pmap, va); } } } PMAP_UNLOCK(pmap); /* TODO: Only invalidate entries we are touching */ pmap_invalidate_all(pmap); } /* * Inserts the specified page table page into the specified pmap's collection * of idle page table pages. Each of a pmap's page table pages is responsible * for mapping a distinct range of virtual addresses. 
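 * (On arm64 the collection holds L3 page table pages that a 2MB
 * promotion has superseded, keyed by pmap_l2_pindex().)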
The pmap's collection is * ordered by this virtual address range. */ static __inline int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); return (vm_radix_insert(&pmap->pm_root, mpte)); } /* * Looks for a page table page mapping the specified virtual address in the * specified pmap's collection of idle page table pages. Returns NULL if there * is no page table page corresponding to the specified virtual address. */ static __inline vm_page_t pmap_lookup_pt_page(pmap_t pmap, vm_offset_t va) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); return (vm_radix_lookup(&pmap->pm_root, pmap_l2_pindex(va))); } /* * Removes the specified page table page from the specified pmap's collection * of idle page table pages. The specified page table page must be a member of * the pmap's collection. */ static __inline void pmap_remove_pt_page(pmap_t pmap, vm_page_t mpte) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); vm_radix_remove(&pmap->pm_root, mpte->pindex); } /* * Performs a break-before-make update of a pmap entry. This is needed when * either promoting or demoting pages to ensure the TLB doesn't get into an * inconsistent state. */ static void pmap_update_entry(pmap_t pmap, pd_entry_t *pte, pd_entry_t newpte, vm_offset_t va, vm_size_t size) { register_t intr; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Ensure we don't get switched out with the page table in an * inconsistent state. We also need to ensure no interrupts fire * as they may make use of an address we are about to invalidate. */ intr = intr_disable(); critical_enter(); /* Clear the old mapping */ pmap_load_clear(pte); PTE_SYNC(pte); pmap_invalidate_range(pmap, va, va + size); /* Create the new mapping */ pmap_load_store(pte, newpte); PTE_SYNC(pte); critical_exit(); intr_restore(intr); } /* * After promotion from 512 4KB page mappings to a single 2MB page mapping, * replace the many pv entries for the 4KB page mappings by a single pv entry * for the 2MB page mapping. */ static void pmap_pv_promote_l2(pmap_t pmap, vm_offset_t va, vm_paddr_t pa, struct rwlock **lockp) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; KASSERT((pa & L2_OFFSET) == 0, ("pmap_pv_promote_l2: pa is not 2mpage aligned")); CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa); /* * Transfer the first page's pv entry for this mapping to the 2mpage's * pv list. Aside from avoiding the cost of a call to get_pv_entry(), * a transfer avoids the possibility that get_pv_entry() calls * reclaim_pv_chunk() and that reclaim_pv_chunk() removes one of the * mappings that is being promoted. */ m = PHYS_TO_VM_PAGE(pa); va = va & ~L2_OFFSET; pv = pmap_pvh_remove(&m->md, pmap, va); KASSERT(pv != NULL, ("pmap_pv_promote_l2: pv not found")); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; /* Free the remaining NPTEPG - 1 pv entries. */ va_last = va + L2_SIZE - PAGE_SIZE; do { m++; va += PAGE_SIZE; pmap_pvh_free(&m->md, pmap, va); } while (va < va_last); } /* * Tries to promote the 512, contiguous 4KB page mappings that are within a * single level 2 table entry to a single 2MB page mapping. For promotion * to occur, two conditions must be met: (1) the 4KB page mappings must map * aligned, contiguous physical memory and (2) the 4KB page mappings must have * identical characteristics. 
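 *
 * The scan below checks contiguity and identical attributes with a
 * single comparison per entry: "pa" carries both the expected physical
 * address and the attribute bits of the first PTE, so an entry that
 * differs in either respect fails the oldl3 != pa test.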
 */
static void
pmap_promote_l2(pmap_t pmap, pd_entry_t *l2, vm_offset_t va,
    struct rwlock **lockp)
{
	pt_entry_t *firstl3, *l3, newl2, oldl3, pa;
	vm_page_t mpte;
	vm_offset_t sva;

	PMAP_LOCK_ASSERT(pmap, MA_OWNED);

	sva = va & ~L2_OFFSET;
	firstl3 = pmap_l2_to_l3(l2, sva);
	newl2 = pmap_load(firstl3);

	/* Check that the alignment is valid */
	if (((newl2 & ~ATTR_MASK) & L2_OFFSET) != 0) {
		atomic_add_long(&pmap_l2_p_failures, 1);
		CTR2(KTR_PMAP, "pmap_promote_l2: failure for va %#lx"
		    " in pmap %p", va, pmap);
		return;
	}

	pa = newl2 + L2_SIZE - PAGE_SIZE;
	for (l3 = firstl3 + NL3PG - 1; l3 > firstl3; l3--) {
		oldl3 = pmap_load(l3);
		if (oldl3 != pa) {
			atomic_add_long(&pmap_l2_p_failures, 1);
			CTR2(KTR_PMAP, "pmap_promote_l2: failure for va %#lx"
			    " in pmap %p", va, pmap);
			return;
		}
		pa -= PAGE_SIZE;
	}

	/*
	 * Save the page table page in its current state until the L2
	 * mapping the superpage is demoted by pmap_demote_l2() or
	 * destroyed by pmap_remove_l3().
	 */
	mpte = PHYS_TO_VM_PAGE(pmap_load(l2) & ~ATTR_MASK);
	KASSERT(mpte >= vm_page_array &&
	    mpte < &vm_page_array[vm_page_array_size],
	    ("pmap_promote_l2: page table page is out of range"));
	KASSERT(mpte->pindex == pmap_l2_pindex(va),
	    ("pmap_promote_l2: page table page's pindex is wrong"));
	if (pmap_insert_pt_page(pmap, mpte)) {
		atomic_add_long(&pmap_l2_p_failures, 1);
		CTR2(KTR_PMAP,
		    "pmap_promote_l2: failure for va %#lx in pmap %p", va,
		    pmap);
		return;
	}

	if ((newl2 & ATTR_SW_MANAGED) != 0)
		pmap_pv_promote_l2(pmap, va, newl2 & ~ATTR_MASK, lockp);

	newl2 &= ~ATTR_DESCR_MASK;
	newl2 |= L2_BLOCK;

	pmap_update_entry(pmap, l2, newl2, sva, L2_SIZE);

	atomic_add_long(&pmap_l2_promotions, 1);
	CTR2(KTR_PMAP, "pmap_promote_l2: success for va %#lx in pmap %p", va,
	    pmap);
}

/*
 * Insert the given physical page (p) at
 * the specified virtual address (v) in the
 * target physical map with the protection requested.
 *
 * If specified, the page will be wired down, meaning
 * that the related pte can not be reclaimed.
 *
 * NB:  This is the only routine which MAY NOT lazy-evaluate
 * or lose information.  That is, this routine must actually
 * insert this page into the given map NOW.
 */
int
pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
    u_int flags, int8_t psind __unused)
{
	struct rwlock *lock;
	pd_entry_t *pde;
	pt_entry_t new_l3, orig_l3;
	pt_entry_t *l2, *l3;
	pv_entry_t pv;
	vm_paddr_t opa, pa, l1_pa, l2_pa, l3_pa;
	vm_page_t mpte, om, l1_m, l2_m, l3_m;
	boolean_t nosleep;
	int lvl;

	va = trunc_page(va);
	if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m))
		VM_OBJECT_ASSERT_LOCKED(m->object);
	pa = VM_PAGE_TO_PHYS(m);
	new_l3 = (pt_entry_t)(pa | ATTR_DEFAULT | ATTR_IDX(m->md.pv_memattr) |
	    L3_PAGE);
	if ((prot & VM_PROT_WRITE) == 0)
		new_l3 |= ATTR_AP(ATTR_AP_RO);
	if ((flags & PMAP_ENTER_WIRED) != 0)
		new_l3 |= ATTR_SW_WIRED;
	if ((va >> 63) == 0)
		new_l3 |= ATTR_AP(ATTR_AP_USER);

	CTR2(KTR_PMAP, "pmap_enter: %.16lx -> %.16lx", va, pa);

	mpte = NULL;

	lock = NULL;
	PMAP_LOCK(pmap);

	pde = pmap_pde(pmap, va, &lvl);
	if (pde != NULL && lvl == 1) {
		l2 = pmap_l1_to_l2(pde, va);
		if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK &&
		    (l3 = pmap_demote_l2_locked(pmap, l2, va & ~L2_OFFSET,
		    &lock)) != NULL) {
			l3 = &l3[pmap_l3_index(va)];
			if (va < VM_MAXUSER_ADDRESS) {
				mpte = PHYS_TO_VM_PAGE(
				    pmap_load(l2) & ~ATTR_MASK);
				mpte->wire_count++;
			}
			goto havel3;
		}
	}

	if (va < VM_MAXUSER_ADDRESS) {
		nosleep = (flags & PMAP_ENTER_NOSLEEP) != 0;
		mpte = pmap_alloc_l3(pmap, va, nosleep ?
NULL : &lock); if (mpte == NULL && nosleep) { CTR0(KTR_PMAP, "pmap_enter: mpte == NULL"); if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } pde = pmap_pde(pmap, va, &lvl); KASSERT(pde != NULL, ("pmap_enter: Invalid page entry, va: 0x%lx", va)); KASSERT(lvl == 2, ("pmap_enter: Invalid level %d", lvl)); l3 = pmap_l2_to_l3(pde, va); } else { /* * If we get a level 2 pde it must point to a level 3 entry * otherwise we will need to create the intermediate tables */ if (lvl < 2) { switch(lvl) { default: case -1: /* Get the l0 pde to update */ pde = pmap_l0(pmap, va); KASSERT(pde != NULL, ("...")); l1_m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (l1_m == NULL) panic("pmap_enter: l1 pte_m == NULL"); if ((l1_m->flags & PG_ZERO) == 0) pmap_zero_page(l1_m); l1_pa = VM_PAGE_TO_PHYS(l1_m); pmap_load_store(pde, l1_pa | L0_TABLE); PTE_SYNC(pde); /* FALLTHROUGH */ case 0: /* Get the l1 pde to update */ pde = pmap_l1_to_l2(pde, va); KASSERT(pde != NULL, ("...")); l2_m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (l2_m == NULL) panic("pmap_enter: l2 pte_m == NULL"); if ((l2_m->flags & PG_ZERO) == 0) pmap_zero_page(l2_m); l2_pa = VM_PAGE_TO_PHYS(l2_m); pmap_load_store(pde, l2_pa | L1_TABLE); PTE_SYNC(pde); /* FALLTHROUGH */ case 1: /* Get the l2 pde to update */ pde = pmap_l1_to_l2(pde, va); l3_m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (l3_m == NULL) panic("pmap_enter: l3 pte_m == NULL"); if ((l3_m->flags & PG_ZERO) == 0) pmap_zero_page(l3_m); l3_pa = VM_PAGE_TO_PHYS(l3_m); pmap_load_store(pde, l3_pa | L2_TABLE); PTE_SYNC(pde); break; } } l3 = pmap_l2_to_l3(pde, va); pmap_invalidate_page(pmap, va); } havel3: om = NULL; orig_l3 = pmap_load(l3); opa = orig_l3 & ~ATTR_MASK; /* * Is the specified virtual address already mapped? */ if (pmap_l3_valid(orig_l3)) { /* * Wiring change, just update stats. We don't worry about * wiring PT pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT page will be also. */ if ((flags & PMAP_ENTER_WIRED) != 0 && (orig_l3 & ATTR_SW_WIRED) == 0) pmap->pm_stats.wired_count++; else if ((flags & PMAP_ENTER_WIRED) == 0 && (orig_l3 & ATTR_SW_WIRED) != 0) pmap->pm_stats.wired_count--; /* * Remove the extra PT page reference. */ if (mpte != NULL) { mpte->wire_count--; KASSERT(mpte->wire_count > 0, ("pmap_enter: missing reference to page table page," " va: 0x%lx", va)); } /* * Has the physical page changed? */ if (opa == pa) { /* * No, might be a protection or wiring change. */ if ((orig_l3 & ATTR_SW_MANAGED) != 0) { new_l3 |= ATTR_SW_MANAGED; if ((new_l3 & ATTR_AP(ATTR_AP_RW)) == ATTR_AP(ATTR_AP_RW)) { vm_page_aflag_set(m, PGA_WRITEABLE); } } goto validate; } /* Flush the cache, there might be uncommitted data in it */ if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(orig_l3)) cpu_dcache_wb_range(va, L3_SIZE); } else { /* * Increment the counters. */ if ((new_l3 & ATTR_SW_WIRED) != 0) pmap->pm_stats.wired_count++; pmap_resident_count_inc(pmap, 1); } /* * Enter on the PV list if part of our managed memory. 
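 * The pv entry records this (pmap, va) pair on the page itself, which
 * is what allows pmap_remove_all() and pmap_page_test_mappings() to
 * find every mapping of a given physical page.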
*/ if ((m->oflags & VPO_UNMANAGED) == 0) { new_l3 |= ATTR_SW_MANAGED; pv = get_pv_entry(pmap, &lock); pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if ((new_l3 & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) vm_page_aflag_set(m, PGA_WRITEABLE); } /* * Update the L3 entry. */ if (orig_l3 != 0) { validate: orig_l3 = pmap_load(l3); opa = orig_l3 & ~ATTR_MASK; if (opa != pa) { pmap_update_entry(pmap, l3, new_l3, va, PAGE_SIZE); if ((orig_l3 & ATTR_SW_MANAGED) != 0) { om = PHYS_TO_VM_PAGE(opa); if (pmap_page_dirty(orig_l3)) vm_page_dirty(om); if ((orig_l3 & ATTR_AF) != 0) vm_page_aflag_set(om, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, opa); pmap_pvh_free(&om->md, pmap, va); if ((om->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&om->md.pv_list) && ((om->flags & PG_FICTITIOUS) != 0 || TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list))) vm_page_aflag_clear(om, PGA_WRITEABLE); } } else { pmap_load_store(l3, new_l3); PTE_SYNC(l3); pmap_invalidate_page(pmap, va); if (pmap_page_dirty(orig_l3) && (orig_l3 & ATTR_SW_MANAGED) != 0) vm_page_dirty(m); } } else { pmap_load_store(l3, new_l3); } PTE_SYNC(l3); pmap_invalidate_page(pmap, va); if (pmap != pmap_kernel()) { if (pmap == &curproc->p_vmspace->vm_pmap && (prot & VM_PROT_EXECUTE) != 0) cpu_icache_sync_range(va, PAGE_SIZE); if ((mpte == NULL || mpte->wire_count == NL3PG) && pmap_superpages_enabled() && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) { pmap_promote_l2(pmap, pde, va, &lock); } } if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { struct rwlock *lock; vm_offset_t va; vm_page_t m, mpte; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpte = NULL; m = m_start; lock = NULL; PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); mpte = pmap_enter_quick_locked(pmap, va, m, prot, mpte, &lock); m = TAILQ_NEXT(m, listq); } if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... 
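 *
 * (It is intended for prefaulting, e.g. via pmap_enter_object():
 * mappings are installed read-only and unwired, and a later real
 * fault can upgrade them through the full pmap_enter() path.)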
*/ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { struct rwlock *lock; lock = NULL; PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL, &lock); if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); } static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp) { struct spglist free; pd_entry_t *pde; pt_entry_t *l2, *l3; vm_paddr_t pa; int lvl; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("pmap_enter_quick_locked: managed mapping within the clean submap")); PMAP_LOCK_ASSERT(pmap, MA_OWNED); CTR2(KTR_PMAP, "pmap_enter_quick_locked: %p %lx", pmap, va); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { vm_pindex_t l2pindex; /* * Calculate pagetable page index */ l2pindex = pmap_l2_pindex(va); if (mpte && (mpte->pindex == l2pindex)) { mpte->wire_count++; } else { /* * Get the l2 entry */ pde = pmap_pde(pmap, va, &lvl); /* * If the page table page is mapped, we just increment * the hold count, and activate it. Otherwise, we * attempt to allocate a page table page. If this * attempt fails, we don't retry. Instead, we give up. */ if (lvl == 1) { l2 = pmap_l1_to_l2(pde, va); if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK) return (NULL); } if (lvl == 2 && pmap_load(pde) != 0) { mpte = PHYS_TO_VM_PAGE(pmap_load(pde) & ~ATTR_MASK); mpte->wire_count++; } else { /* * Pass NULL instead of the PV list lock * pointer, because we don't intend to sleep. */ mpte = _pmap_alloc_l3(pmap, l2pindex, NULL); if (mpte == NULL) return (mpte); } } l3 = (pt_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mpte)); l3 = &l3[pmap_l3_index(va)]; } else { mpte = NULL; pde = pmap_pde(kernel_pmap, va, &lvl); KASSERT(pde != NULL, ("pmap_enter_quick_locked: Invalid page entry, va: 0x%lx", va)); KASSERT(lvl == 2, ("pmap_enter_quick_locked: Invalid level %d", lvl)); l3 = pmap_l2_to_l3(pde, va); } if (pmap_load(l3) != 0) { if (mpte != NULL) { mpte->wire_count--; mpte = NULL; } return (mpte); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m, lockp)) { if (mpte != NULL) { SLIST_INIT(&free); if (pmap_unwire_l3(pmap, va, mpte, &free)) { pmap_invalidate_page(pmap, va); pmap_free_zero_pages(&free); } mpte = NULL; } return (mpte); } /* * Increment counters */ pmap_resident_count_inc(pmap, 1); pa = VM_PAGE_TO_PHYS(m) | ATTR_DEFAULT | ATTR_IDX(m->md.pv_memattr) | ATTR_AP(ATTR_AP_RO) | L3_PAGE; /* * Now validate mapping with RO protection */ if ((m->oflags & VPO_UNMANAGED) == 0) pa |= ATTR_SW_MANAGED; pmap_load_store(l3, pa); PTE_SYNC(l3); pmap_invalidate_page(pmap, va); return (mpte); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. 
* * The wired attribute of the page table entry is not a hardware feature, * so there is no need to invalidate any TLB entries. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va_next; pd_entry_t *l0, *l1, *l2; pt_entry_t *l3; PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { l0 = pmap_l0(pmap, sva); if (pmap_load(l0) == 0) { va_next = (sva + L0_SIZE) & ~L0_OFFSET; if (va_next < sva) va_next = eva; continue; } l1 = pmap_l0_to_l1(l0, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (pmap_load(l2) == 0) continue; if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK) { l3 = pmap_demote_l2(pmap, l2, sva); if (l3 == NULL) continue; } KASSERT((pmap_load(l2) & ATTR_DESCR_MASK) == L2_TABLE, ("pmap_unwire: Invalid l2 entry after demotion")); if (va_next > eva) va_next = eva; for (l3 = pmap_l2_to_l3(l2, sva); sva != va_next; l3++, sva += L3_SIZE) { if (pmap_load(l3) == 0) continue; if ((pmap_load(l3) & ATTR_SW_WIRED) == 0) panic("pmap_unwire: l3 %#jx is missing " "ATTR_SW_WIRED", (uintmax_t)pmap_load(l3)); /* * PG_W must be cleared atomically. Although the pmap * lock synchronizes access to PG_W, another processor * could be setting PG_M and/or PG_A concurrently. */ atomic_clear_long(l3, ATTR_SW_WIRED); pmap->pm_stats.wired_count--; } } PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. */ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { } /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. */ void pmap_zero_page(vm_page_t m) { vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)); pagezero((void *)va); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)); if (off == 0 && size == PAGE_SIZE) pagezero((void *)va); else bzero((char *)va + off, size); } /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. 
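 * On arm64 no transient mapping is actually needed; both pages are
 * reached through the direct map, as the PHYS_TO_DMAP() calls below
 * show.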
*/ void pmap_copy_page(vm_page_t msrc, vm_page_t mdst) { vm_offset_t src = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(msrc)); vm_offset_t dst = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mdst)); pagecopy((void *)src, (void *)dst); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { void *a_cp, *b_cp; vm_page_t m_a, m_b; vm_paddr_t p_a, p_b; vm_offset_t a_pg_offset, b_pg_offset; int cnt; while (xfersize > 0) { a_pg_offset = a_offset & PAGE_MASK; m_a = ma[a_offset >> PAGE_SHIFT]; p_a = m_a->phys_addr; b_pg_offset = b_offset & PAGE_MASK; m_b = mb[b_offset >> PAGE_SHIFT]; p_b = m_b->phys_addr; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); cnt = min(cnt, PAGE_SIZE - b_pg_offset); if (__predict_false(!PHYS_IN_DMAP(p_a))) { panic("!DMAP a %lx", p_a); } else { a_cp = (char *)PHYS_TO_DMAP(p_a) + a_pg_offset; } if (__predict_false(!PHYS_IN_DMAP(p_b))) { panic("!DMAP b %lx", p_b); } else { b_cp = (char *)PHYS_TO_DMAP(p_b) + b_pg_offset; } bcopy(a_cp, b_cp, cnt); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } } vm_offset_t pmap_quick_enter_page(vm_page_t m) { return (PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m))); } void pmap_quick_remove_page(vm_offset_t addr) { } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page. This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct md_page *pvh; struct rwlock *lock; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); rv = FALSE; lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } } rw_runlock(lock); return (rv); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { struct rwlock *lock; struct md_page *pvh; pmap_t pmap; pt_entry_t *pte; pv_entry_t pv; int count, lvl, md_gen, pvh_gen; if ((m->oflags & VPO_UNMANAGED) != 0) return (0); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: count = 0; TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va, &lvl); if (pte != NULL && (pmap_load(pte) & ATTR_SW_WIRED) != 0) count++; PMAP_UNLOCK(pmap); } if ((m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; pvh_gen = pvh->pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen || pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va, &lvl); if (pte != NULL && (pmap_load(pte) & ATTR_SW_WIRED) != 0) count++; PMAP_UNLOCK(pmap); } } rw_runlock(lock); return (count); } /* * Destroy all managed, non-wired mappings in the given user-space * pmap. 
This pmap cannot be active on any processor besides the * caller. * * This function cannot be applied to the kernel pmap. Moreover, it * is not intended for general use. It is only to be used during * process termination. Consequently, it can be implemented in ways * that make it faster than pmap_remove(). First, it can more quickly * destroy mappings by iterating over the pmap's collection of PV * entries, rather than searching the page table. Second, it doesn't * have to test and clear the page table entries atomically, because * no processor is currently accessing the user address space. In * particular, a page table entry's dirty bit won't change state once * this function starts. */ void pmap_remove_pages(pmap_t pmap) { pd_entry_t *pde; pt_entry_t *pte, tpte; struct spglist free; vm_page_t m, ml3, mt; pv_entry_t pv; struct md_page *pvh; struct pv_chunk *pc, *npc; struct rwlock *lock; int64_t bit; uint64_t inuse, bitmask; int allfree, field, freed, idx, lvl; vm_paddr_t pa; lock = NULL; SLIST_INIT(&free); PMAP_LOCK(pmap); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { allfree = 1; freed = 0; for (field = 0; field < _NPCM; field++) { inuse = ~pc->pc_map[field] & pc_freemask[field]; while (inuse != 0) { bit = ffsl(inuse) - 1; bitmask = 1UL << bit; idx = field * 64 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; pde = pmap_pde(pmap, pv->pv_va, &lvl); KASSERT(pde != NULL, ("Attempting to remove an unmapped page")); switch(lvl) { case 1: pte = pmap_l1_to_l2(pde, pv->pv_va); tpte = pmap_load(pte); KASSERT((tpte & ATTR_DESCR_MASK) == L2_BLOCK, ("Attempting to remove an invalid " "block: %lx", tpte)); tpte = pmap_load(pte); break; case 2: pte = pmap_l2_to_l3(pde, pv->pv_va); tpte = pmap_load(pte); KASSERT((tpte & ATTR_DESCR_MASK) == L3_PAGE, ("Attempting to remove an invalid " "page: %lx", tpte)); break; default: panic( "Invalid page directory level: %d", lvl); } /* * We cannot remove wired pages from a process' mapping at this time */ if (tpte & ATTR_SW_WIRED) { allfree = 0; continue; } pa = tpte & ~ATTR_MASK; m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("vm_page_t %p phys_addr mismatch %016jx %016jx", m, (uintmax_t)m->phys_addr, (uintmax_t)tpte)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("pmap_remove_pages: bad pte %#jx", (uintmax_t)tpte)); if (pmap_is_current(pmap)) { if (lvl == 2 && pmap_l3_valid_cacheable(tpte)) { cpu_dcache_wb_range(pv->pv_va, L3_SIZE); } else if (lvl == 1 && pmap_pte_valid_cacheable(tpte)) { cpu_dcache_wb_range(pv->pv_va, L2_SIZE); } } pmap_load_clear(pte); PTE_SYNC(pte); pmap_invalidate_page(pmap, pv->pv_va); /* * Update the vm_page_t clean/reference bits. 
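 * A writable (AP_RW) mapping is assumed dirty because there is no
 * hardware-managed dirty bit; for a 2MB block every constituent 4KB
 * page must be dirtied.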
*/ if ((tpte & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) { switch (lvl) { case 1: for (mt = m; mt < &m[L2_SIZE / PAGE_SIZE]; mt++) vm_page_dirty(mt); break; case 2: vm_page_dirty(m); break; } } CHANGE_PV_LIST_LOCK_TO_VM_PAGE(&lock, m); /* Mark free */ pc->pc_map[field] |= bitmask; switch (lvl) { case 1: pmap_resident_count_dec(pmap, L2_SIZE / PAGE_SIZE); pvh = pa_to_pvh(tpte & ~ATTR_MASK); TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; if (TAILQ_EMPTY(&pvh->pv_list)) { for (mt = m; mt < &m[L2_SIZE / PAGE_SIZE]; mt++) if ((mt->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } ml3 = pmap_lookup_pt_page(pmap, pv->pv_va); if (ml3 != NULL) { pmap_remove_pt_page(pmap, ml3); pmap_resident_count_dec(pmap, 1); KASSERT(ml3->wire_count == NL3PG, ("pmap_remove_pages: l3 page wire count error")); ml3->wire_count = 0; pmap_add_delayed_free_list(ml3, &free, FALSE); atomic_subtract_int( &vm_cnt.v_wire_count, 1); } break; case 2: pmap_resident_count_dec(pmap, 1); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if ((m->aflags & PGA_WRITEABLE) != 0 && TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh( VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } break; } pmap_unuse_l3(pmap, pv->pv_va, pmap_load(pde), &free); freed++; } } PV_STAT(atomic_add_long(&pv_entry_frees, freed)); PV_STAT(atomic_add_int(&pv_entry_spare, freed)); PV_STAT(atomic_subtract_long(&pv_entry_count, freed)); if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } pmap_invalidate_all(pmap); if (lock != NULL) rw_wunlock(lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * This is used to check if a page has been accessed or modified. Since * we have no hardware bit that records modification, we must assume a * page has been modified if it is mapped read/write.
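 *
 * Editor's note: callers request the "accessed" and/or "modified" test
 * and the two are folded into one mask/value pair so a single load of
 * the PTE suffices.  For an accessed L3 page the code below effectively
 * computes (sketch):
 *
 *	mask  = ATTR_AF | ATTR_DESCR_MASK;
 *	value = ATTR_AF | L3_PAGE;
 *	rv    = (pmap_load(pte) & mask) == value;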
*/ static boolean_t pmap_page_test_mappings(vm_page_t m, boolean_t accessed, boolean_t modified) { struct rwlock *lock; pv_entry_t pv; struct md_page *pvh; pt_entry_t *pte, mask, value; pmap_t pmap; int lvl, md_gen, pvh_gen; boolean_t rv; rv = FALSE; lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va, &lvl); KASSERT(lvl == 3, ("pmap_page_test_mappings: Invalid level %d", lvl)); mask = 0; value = 0; if (modified) { mask |= ATTR_AP_RW_BIT; value |= ATTR_AP(ATTR_AP_RW); } if (accessed) { mask |= ATTR_AF | ATTR_DESCR_MASK; value |= ATTR_AF | L3_PAGE; } rv = (pmap_load(pte) & mask) == value; PMAP_UNLOCK(pmap); if (rv) goto out; } if ((m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; pvh_gen = pvh->pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen || pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } pte = pmap_pte(pmap, pv->pv_va, &lvl); KASSERT(lvl == 2, ("pmap_page_test_mappings: Invalid level %d", lvl)); mask = 0; value = 0; if (modified) { mask |= ATTR_AP_RW_BIT; value |= ATTR_AP(ATTR_AP_RW); } if (accessed) { mask |= ATTR_AF | ATTR_DESCR_MASK; value |= ATTR_AF | L2_BLOCK; } rv = (pmap_load(pte) & mask) == value; PMAP_UNLOCK(pmap); if (rv) goto out; } } out: rw_runlock(lock); return (rv); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. */ boolean_t pmap_is_modified(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can be dirty. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); return (pmap_page_test_mappings(m, FALSE, TRUE)); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pd_entry_t *pde; pt_entry_t *pte; boolean_t rv; int lvl; /* * Return TRUE if and only if the L3 entry for the specified virtual * address is allocated but invalid, i.e. the address is not already * mapped. */ rv = FALSE; PMAP_LOCK(pmap); pde = pmap_pde(pmap, addr, &lvl); if (pde != NULL && lvl == 2) { pte = pmap_l2_to_l3(pde, addr); rv = pmap_load(pte) == 0; } PMAP_UNLOCK(pmap); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); return (pmap_page_test_mappings(m, TRUE, FALSE)); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { struct md_page *pvh; pmap_t pmap; struct rwlock *lock; pv_entry_t next_pv, pv; pt_entry_t oldpte, *pte; vm_offset_t va; int lvl, md_gen, pvh_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating.
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; lock = VM_PAGE_TO_PV_LIST_LOCK(m); pvh = (m->flags & PG_FICTITIOUS) != 0 ? &pv_dummy : pa_to_pvh(VM_PAGE_TO_PHYS(m)); retry_pv_loop: rw_wlock(lock); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); rw_wunlock(lock); goto retry_pv_loop; } } va = pv->pv_va; pte = pmap_pte(pmap, pv->pv_va, &lvl); if ((pmap_load(pte) & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) pmap_demote_l2_locked(pmap, pte, va & ~L2_OFFSET, &lock); KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); PMAP_UNLOCK(pmap); } TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); rw_wunlock(lock); goto retry_pv_loop; } } pte = pmap_pte(pmap, pv->pv_va, &lvl); retry: oldpte = pmap_load(pte); if ((oldpte & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) { if (!atomic_cmpset_long(pte, oldpte, oldpte | ATTR_AP(ATTR_AP_RO))) goto retry; if ((oldpte & ATTR_AF) != 0) vm_page_dirty(m); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } rw_wunlock(lock); vm_page_aflag_clear(m, PGA_WRITEABLE); } static __inline boolean_t safe_to_clear_referenced(pmap_t pmap, pt_entry_t pte) { return (FALSE); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. + * As an optimization, update the page's dirty field if a modified bit is + * found while counting reference bits. This opportunistic update can be + * performed at low cost and can eliminate the need for some future calls + * to pmap_is_modified(). However, since this function stops after + * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some + * dirty pages. Those dirty pages will only be detected by a future call + * to pmap_is_modified(). */ int pmap_ts_referenced(vm_page_t m) { struct md_page *pvh; pv_entry_t pv, pvf; pmap_t pmap; struct rwlock *lock; pd_entry_t *pde, tpde; pt_entry_t *pte, tpte; pt_entry_t *l3; vm_offset_t va; vm_paddr_t pa; int cleared, md_gen, not_cleared, lvl, pvh_gen; struct spglist free; bool demoted; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); SLIST_INIT(&free); cleared = 0; pa = VM_PAGE_TO_PHYS(m); lock = PHYS_TO_PV_LIST_LOCK(pa); pvh = (m->flags & PG_FICTITIOUS) != 0 ? 
&pv_dummy : pa_to_pvh(pa); rw_wlock(lock); retry: not_cleared = 0; if ((pvf = TAILQ_FIRST(&pvh->pv_list)) == NULL) goto small_mappings; pv = pvf; do { if (pvf == NULL) pvf = pv; pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen) { PMAP_UNLOCK(pmap); goto retry; } } va = pv->pv_va; pde = pmap_pde(pmap, pv->pv_va, &lvl); KASSERT(pde != NULL, ("pmap_ts_referenced: no l1 table found")); KASSERT(lvl == 1, ("pmap_ts_referenced: invalid pde level %d", lvl)); tpde = pmap_load(pde); KASSERT((tpde & ATTR_DESCR_MASK) == L1_TABLE, ("pmap_ts_referenced: found an invalid l1 table")); pte = pmap_l1_to_l2(pde, pv->pv_va); tpte = pmap_load(pte); + if (pmap_page_dirty(tpte)) { + /* + * Although "tpte" is mapping a 2MB page, because + * this function is called at a 4KB page granularity, + * we only update the 4KB page under test. + */ + vm_page_dirty(m); + } if ((tpte & ATTR_AF) != 0) { /* * Since this reference bit is shared by 512 4KB * pages, it should not be cleared every time it is * tested. Apply a simple "hash" function on the * physical page number, the virtual superpage number, * and the pmap address to select one 4KB page out of * the 512 on which testing the reference bit will * result in clearing that reference bit. This * function is designed to avoid the selection of the * same 4KB page for every 2MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a * subsequent page fault on a demoted wired mapping, * always leave its reference bit set. Moreover, * since the superpage is wired, the current state of * its reference bit won't affect page replacement. */ if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> L2_SHIFT) ^ (uintptr_t)pmap) & (Ln_ENTRIES - 1)) == 0 && (tpte & ATTR_SW_WIRED) == 0) { if (safe_to_clear_referenced(pmap, tpte)) { /* * TODO: We don't handle the access * flag at all. We need to be able * to set it in the exception handler. */ panic("ARM64TODO: " "safe_to_clear_referenced\n"); } else if (pmap_demote_l2_locked(pmap, pte, pv->pv_va, &lock) != NULL) { demoted = true; va += VM_PAGE_TO_PHYS(m) - (tpte & ~ATTR_MASK); l3 = pmap_l2_to_l3(pte, va); pmap_remove_l3(pmap, l3, va, pmap_load(pte), NULL, &lock); } else demoted = true; if (demoted) { /* * The superpage mapping was removed * entirely and therefore 'pv' is no * longer valid. */ if (pvf == pv) pvf = NULL; pv = NULL; } cleared++; KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. 
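 * Moving the entry just examined to the tail means the next call
 * resumes with a different mapping, so reference bits age evenly
 * across all of the page's pmaps; e.g. a list [A, B, C] becomes
 * [B, C, A] once A has been tested.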
*/ if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; } if (cleared + not_cleared >= PMAP_TS_REFERENCED_MAX) goto out; } while ((pv = TAILQ_FIRST(&pvh->pv_list)) != pvf); small_mappings: if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { if (pvf == NULL) pvf = pv; pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { pvh_gen = pvh->pv_gen; md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto retry; } } pde = pmap_pde(pmap, pv->pv_va, &lvl); KASSERT(pde != NULL, ("pmap_ts_referenced: no l2 table found")); KASSERT(lvl == 2, ("pmap_ts_referenced: invalid pde level %d", lvl)); tpde = pmap_load(pde); KASSERT((tpde & ATTR_DESCR_MASK) == L2_TABLE, ("pmap_ts_referenced: found an invalid l2 table")); pte = pmap_l2_to_l3(pde, pv->pv_va); tpte = pmap_load(pte); + if (pmap_page_dirty(tpte)) + vm_page_dirty(m); if ((tpte & ATTR_AF) != 0) { if (safe_to_clear_referenced(pmap, tpte)) { /* * TODO: We don't handle the access flag * at all. We need to be able to set it in * the exception handler. */ panic("ARM64TODO: safe_to_clear_referenced\n"); } else if ((tpte & ATTR_SW_WIRED) == 0) { /* * Wired pages cannot be paged out so * doing accessed bit emulation for * them is wasted effort. We do the * hard work for unwired pages only. */ pmap_remove_l3(pmap, pte, pv->pv_va, tpde, &free, &lock); pmap_invalidate_page(pmap, pv->pv_va); cleared++; if (pvf == pv) pvf = NULL; pv = NULL; KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && cleared + not_cleared < PMAP_TS_REFERENCED_MAX); out: rw_wunlock(lock); pmap_free_zero_pages(&free); return (cleared + not_cleared); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no PTEs can have PG_M set. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; /* ARM64TODO: We lack support for tracking if a page is modified */ } void * pmap_mapbios(vm_paddr_t pa, vm_size_t size) { return ((void *)PHYS_TO_DMAP(pa)); } void pmap_unmapbios(vm_paddr_t pa, vm_size_t size) { } /* * Sets the memory attribute for the specified page. */ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { m->md.pv_memattr = ma; /* * If "m" is a normal page, update its direct mapping. 
This update * can be relied upon to perform any cache operations that are * required for data coherence. */ if ((m->flags & PG_FICTITIOUS) == 0 && pmap_change_attr(PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)), PAGE_SIZE, m->md.pv_memattr) != 0) panic("memory attribute change on the direct map failed"); } /* * Changes the specified virtual address range's memory type to that given by * the parameter "mode". The specified virtual address range must be * completely contained within either the direct map or the kernel map. If * the virtual address range is contained within the kernel map, then the * memory type for each of the corresponding ranges of the direct map is also * changed. (The corresponding ranges of the direct map are those ranges that * map the same physical pages as the specified virtual address range.) These * changes to the direct map are necessary because the behavior of the * processor is undefined if two or more mappings to the same physical page * have different memory types. * * Returns zero if the change completed successfully, and either EINVAL or * ENOMEM if the change failed. Specifically, EINVAL is returned if some part * of the virtual address range was not mapped, and ENOMEM is returned if * there was insufficient memory available to complete the change. In the * latter case, the memory type may have been changed on some part of the * virtual address range or the direct map. */ static int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode) { int error; PMAP_LOCK(kernel_pmap); error = pmap_change_attr_locked(va, size, mode); PMAP_UNLOCK(kernel_pmap); return (error); } static int pmap_change_attr_locked(vm_offset_t va, vm_size_t size, int mode) { vm_offset_t base, offset, tmpva; pt_entry_t l3, *pte, *newpte; int lvl; PMAP_LOCK_ASSERT(kernel_pmap, MA_OWNED); base = trunc_page(va); offset = va & PAGE_MASK; size = round_page(offset + size); if (!VIRT_IN_DMAP(base)) return (EINVAL); for (tmpva = base; tmpva < base + size; ) { pte = pmap_pte(kernel_pmap, tmpva, &lvl); if (pte == NULL) return (EINVAL); if ((pmap_load(pte) & ATTR_IDX_MASK) == ATTR_IDX(mode)) { /* * We already have the correct attribute, * ignore this entry. */ switch (lvl) { default: panic("Invalid DMAP table level: %d\n", lvl); case 1: tmpva = (tmpva & ~L1_OFFSET) + L1_SIZE; break; case 2: tmpva = (tmpva & ~L2_OFFSET) + L2_SIZE; break; case 3: tmpva += PAGE_SIZE; break; } } else { /* * Split the entry to a level 3 table, then * set the new attribute. */ switch (lvl) { default: panic("Invalid DMAP table level: %d\n", lvl); case 1: newpte = pmap_demote_l1(kernel_pmap, pte, tmpva & ~L1_OFFSET); if (newpte == NULL) return (EINVAL); pte = pmap_l1_to_l2(pte, tmpva); /* FALLTHROUGH */ case 2: newpte = pmap_demote_l2(kernel_pmap, pte, tmpva & ~L2_OFFSET); if (newpte == NULL) return (EINVAL); pte = pmap_l2_to_l3(pte, tmpva); /* FALLTHROUGH */ case 3: /* Update the entry */ l3 = pmap_load(pte); l3 &= ~ATTR_IDX_MASK; l3 |= ATTR_IDX(mode); pmap_update_entry(kernel_pmap, pte, l3, tmpva, PAGE_SIZE); /* * If moving to a non-cacheable entry, flush * the cache. */ if (mode == VM_MEMATTR_UNCACHEABLE) cpu_dcache_wbinv_range(tmpva, L3_SIZE); break; } tmpva += PAGE_SIZE; } } return (0); } /* * Create an L2 table to map all addresses within an L1 mapping.
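 *
 * Editor's note: with 4KB granules an L1 block maps 1GB and demotes
 * into Ln_ENTRIES (512) L2 blocks of L2_SIZE (2MB) each.  The fill
 * loop below therefore amounts to (sketch, same macros as the code):
 *
 *	for (i = 0; i < Ln_ENTRIES; i++)
 *		l2[i] = (oldl1 & ATTR_MASK) |
 *		    ((oldl1 & ~ATTR_MASK) + i * L2_SIZE);
 *
 * so every new entry inherits the old attributes and maps a 2MB slice
 * of the original 1GB range.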
*/ static pt_entry_t * pmap_demote_l1(pmap_t pmap, pt_entry_t *l1, vm_offset_t va) { pt_entry_t *l2, newl2, oldl1; vm_offset_t tmpl1; vm_paddr_t l2phys, phys; vm_page_t ml2; int i; PMAP_LOCK_ASSERT(pmap, MA_OWNED); oldl1 = pmap_load(l1); KASSERT((oldl1 & ATTR_DESCR_MASK) == L1_BLOCK, ("pmap_demote_l1: Demoting a non-block entry")); KASSERT((va & L1_OFFSET) == 0, ("pmap_demote_l1: Invalid virtual address %#lx", va)); KASSERT((oldl1 & ATTR_SW_MANAGED) == 0, ("pmap_demote_l1: Level 1 table shouldn't be managed")); tmpl1 = 0; if (va <= (vm_offset_t)l1 && va + L1_SIZE > (vm_offset_t)l1) { tmpl1 = kva_alloc(PAGE_SIZE); if (tmpl1 == 0) return (NULL); } if ((ml2 = vm_page_alloc(NULL, 0, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { CTR2(KTR_PMAP, "pmap_demote_l1: failure for va %#lx" " in pmap %p", va, pmap); return (NULL); } l2phys = VM_PAGE_TO_PHYS(ml2); l2 = (pt_entry_t *)PHYS_TO_DMAP(l2phys); /* The address the range points at */ phys = oldl1 & ~ATTR_MASK; /* The attributes from the old l1 table to be copied */ newl2 = oldl1 & ATTR_MASK; /* Create the new entries */ for (i = 0; i < Ln_ENTRIES; i++) { l2[i] = newl2 | phys; phys += L2_SIZE; } cpu_dcache_wb_range((vm_offset_t)l2, PAGE_SIZE); KASSERT(l2[0] == ((oldl1 & ~ATTR_DESCR_MASK) | L2_BLOCK), ("Invalid l2 page (%lx != %lx)", l2[0], (oldl1 & ~ATTR_DESCR_MASK) | L2_BLOCK)); if (tmpl1 != 0) { pmap_kenter(tmpl1, PAGE_SIZE, DMAP_TO_PHYS((vm_offset_t)l1) & ~L3_OFFSET, CACHED_MEMORY); l1 = (pt_entry_t *)(tmpl1 + ((vm_offset_t)l1 & PAGE_MASK)); } pmap_update_entry(pmap, l1, l2phys | L1_TABLE, va, PAGE_SIZE); if (tmpl1 != 0) { pmap_kremove(tmpl1); kva_free(tmpl1, PAGE_SIZE); } return (l2); } /* * Create an L3 table to map all addresses within an L2 mapping. */ static pt_entry_t * pmap_demote_l2_locked(pmap_t pmap, pt_entry_t *l2, vm_offset_t va, struct rwlock **lockp) { pt_entry_t *l3, newl3, oldl2; vm_offset_t tmpl2; vm_paddr_t l3phys, phys; vm_page_t ml3; int i; PMAP_LOCK_ASSERT(pmap, MA_OWNED); l3 = NULL; oldl2 = pmap_load(l2); KASSERT((oldl2 & ATTR_DESCR_MASK) == L2_BLOCK, ("pmap_demote_l2: Demoting a non-block entry")); KASSERT((va & L2_OFFSET) == 0, ("pmap_demote_l2: Invalid virtual address %#lx", va)); tmpl2 = 0; if (va <= (vm_offset_t)l2 && va + L2_SIZE > (vm_offset_t)l2) { tmpl2 = kva_alloc(PAGE_SIZE); if (tmpl2 == 0) return (NULL); } if ((ml3 = pmap_lookup_pt_page(pmap, va)) != NULL) { pmap_remove_pt_page(pmap, ml3); } else { ml3 = vm_page_alloc(NULL, pmap_l2_pindex(va), (VIRT_IN_DMAP(va) ? VM_ALLOC_INTERRUPT : VM_ALLOC_NORMAL) | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (ml3 == NULL) { CTR2(KTR_PMAP, "pmap_demote_l2: failure for va %#lx" " in pmap %p", va, pmap); goto fail; } if (va < VM_MAXUSER_ADDRESS) pmap_resident_count_inc(pmap, 1); } l3phys = VM_PAGE_TO_PHYS(ml3); l3 = (pt_entry_t *)PHYS_TO_DMAP(l3phys); /* The address the range points at */ phys = oldl2 & ~ATTR_MASK; /* The attributes from the old l2 table to be copied */ newl3 = (oldl2 & (ATTR_MASK & ~ATTR_DESCR_MASK)) | L3_PAGE; /* * If the page table page is new, initialize it. */ if (ml3->wire_count == 1) { for (i = 0; i < Ln_ENTRIES; i++) { l3[i] = newl3 | phys; phys += L3_SIZE; } cpu_dcache_wb_range((vm_offset_t)l3, PAGE_SIZE); } KASSERT(l3[0] == ((oldl2 & ~ATTR_DESCR_MASK) | L3_PAGE), ("Invalid l3 page (%lx != %lx)", l3[0], (oldl2 & ~ATTR_DESCR_MASK) | L3_PAGE)); /* * Map the temporary page so we don't lose access to the l2 table.
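 *
 * Editor's note: the earlier guard ("va <= (vm_offset_t)l2 && va +
 * L2_SIZE > (vm_offset_t)l2") catches the self-referential case where
 * the 2MB block being demoted maps the page holding the l2 entry
 * itself; the break-before-make update would make that mapping
 * momentarily invalid, so the entry is rewritten through a temporary
 * KVA mapping instead.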
*/ if (tmpl2 != 0) { pmap_kenter(tmpl2, PAGE_SIZE, DMAP_TO_PHYS((vm_offset_t)l2) & ~L3_OFFSET, CACHED_MEMORY); l2 = (pt_entry_t *)(tmpl2 + ((vm_offset_t)l2 & PAGE_MASK)); } /* * The spare PV entries must be reserved prior to demoting the * mapping, that is, prior to changing the PDE. Otherwise, the state * of the L2 and the PV lists will be inconsistent, which can result * in reclaim_pv_chunk() attempting to remove a PV entry from the * wrong PV list and pmap_pv_demote_l2() failing to find the expected * PV entry for the 2MB page mapping that is being demoted. */ if ((oldl2 & ATTR_SW_MANAGED) != 0) reserve_pv_entries(pmap, Ln_ENTRIES - 1, lockp); pmap_update_entry(pmap, l2, l3phys | L2_TABLE, va, PAGE_SIZE); /* * Demote the PV entry. */ if ((oldl2 & ATTR_SW_MANAGED) != 0) pmap_pv_demote_l2(pmap, va, oldl2 & ~ATTR_MASK, lockp); atomic_add_long(&pmap_l2_demotions, 1); CTR3(KTR_PMAP, "pmap_demote_l2: success for va %#lx" " in pmap %p %lx", va, pmap, l3[0]); fail: if (tmpl2 != 0) { pmap_kremove(tmpl2); kva_free(tmpl2, PAGE_SIZE); } return (l3); } static pt_entry_t * pmap_demote_l2(pmap_t pmap, pt_entry_t *l2, vm_offset_t va) { struct rwlock *lock; pt_entry_t *l3; lock = NULL; l3 = pmap_demote_l2_locked(pmap, l2, va, &lock); if (lock != NULL) rw_wunlock(lock); return (l3); } /* * Perform the pmap work for mincore(2). */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { pd_entry_t *l1p, l1; pd_entry_t *l2p, l2; pt_entry_t *l3p, l3; vm_paddr_t pa; bool managed; int val; PMAP_LOCK(pmap); retry: pa = 0; val = 0; managed = false; l1p = pmap_l1(pmap, addr); if (l1p == NULL) /* No l1 */ goto done; l1 = pmap_load(l1p); if ((l1 & ATTR_DESCR_MASK) == L1_INVAL) goto done; if ((l1 & ATTR_DESCR_MASK) == L1_BLOCK) { pa = (l1 & ~ATTR_MASK) | (addr & L1_OFFSET); managed = (l1 & ATTR_SW_MANAGED) == ATTR_SW_MANAGED; val = MINCORE_SUPER | MINCORE_INCORE; if (pmap_page_dirty(l1)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if ((l1 & ATTR_AF) == ATTR_AF) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; goto done; } l2p = pmap_l1_to_l2(l1p, addr); if (l2p == NULL) /* No l2 */ goto done; l2 = pmap_load(l2p); if ((l2 & ATTR_DESCR_MASK) == L2_INVAL) goto done; if ((l2 & ATTR_DESCR_MASK) == L2_BLOCK) { pa = (l2 & ~ATTR_MASK) | (addr & L2_OFFSET); managed = (l2 & ATTR_SW_MANAGED) == ATTR_SW_MANAGED; val = MINCORE_SUPER | MINCORE_INCORE; if (pmap_page_dirty(l2)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if ((l2 & ATTR_AF) == ATTR_AF) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; goto done; } l3p = pmap_l2_to_l3(l2p, addr); if (l3p == NULL) /* No l3 */ goto done; l3 = pmap_load(l3p); if ((l3 & ATTR_DESCR_MASK) == L3_INVAL) goto done; if ((l3 & ATTR_DESCR_MASK) == L3_PAGE) { pa = (l3 & ~ATTR_MASK) | (addr & L3_OFFSET); managed = (l3 & ATTR_SW_MANAGED) == ATTR_SW_MANAGED; val = MINCORE_INCORE; if (pmap_page_dirty(l3)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if ((l3 & ATTR_AF) == ATTR_AF) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } done: if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && managed) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change.
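 * vm_page_pa_tryrelock() may have to drop the pmap lock to take the
 * correct page lock; when it does, it returns non-zero and the walk
 * restarts at "retry" so the entries are re-read under consistent
 * locking.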
*/ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_activate(struct thread *td) { pmap_t pmap; critical_enter(); pmap = vmspace_pmap(td->td_proc->p_vmspace); td->td_pcb->pcb_l0addr = vtophys(pmap->pm_l0); __asm __volatile("msr ttbr0_el1, %0" : : "r"(td->td_pcb->pcb_l0addr)); pmap_invalidate_all(pmap); critical_exit(); } void pmap_sync_icache(pmap_t pmap, vm_offset_t va, vm_size_t sz) { if (va >= VM_MIN_KERNEL_ADDRESS) { cpu_icache_sync_range(va, sz); } else { u_int len, offset; vm_paddr_t pa; /* Find the length of data in this page to flush */ offset = va & PAGE_MASK; len = imin(PAGE_SIZE - offset, sz); while (sz != 0) { /* Extract the physical address & find it in the DMAP */ pa = pmap_extract(pmap, va); if (pa != 0) cpu_icache_sync_range(PHYS_TO_DMAP(pa), len); /* Move to the next page */ sz -= len; va += len; /* Set the length for the next iteration */ len = imin(PAGE_SIZE, sz); } } } int pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far) { #ifdef SMP uint64_t par; #endif switch (ESR_ELx_EXCEPTION(esr)) { case EXCP_DATA_ABORT_L: case EXCP_DATA_ABORT: break; default: return (KERN_FAILURE); } #ifdef SMP PMAP_LOCK(pmap); switch (esr & ISS_DATA_DFSC_MASK) { case ISS_DATA_DFSC_TF_L0: case ISS_DATA_DFSC_TF_L1: case ISS_DATA_DFSC_TF_L2: case ISS_DATA_DFSC_TF_L3: /* Ask the MMU to check the address */ if (pmap == kernel_pmap) par = arm64_address_translate_s1e1r(far); else par = arm64_address_translate_s1e0r(far); /* * If the translation was successful the address was invalid * due to a break-before-make sequence. We can unlock and * return success to the trap handler. */ if (PAR_SUCCESS(par)) { PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } break; default: break; } PMAP_UNLOCK(pmap); #endif return (KERN_FAILURE); } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. */ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t superpage_offset; if (size < L2_SIZE) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); superpage_offset = offset & L2_OFFSET; if (size - ((L2_SIZE - superpage_offset) & L2_OFFSET) < L2_SIZE || (*addr & L2_OFFSET) == superpage_offset) return; if ((*addr & L2_OFFSET) < superpage_offset) *addr = (*addr & ~L2_OFFSET) + superpage_offset; else *addr = ((*addr + L2_OFFSET) & ~L2_OFFSET) + superpage_offset; } /** * Get the kernel virtual address of a set of physical pages. If there are * physical addresses not covered by the DMAP, perform a transient mapping * that will be removed when calling pmap_unmap_io_transient. * * \param page The pages the caller wishes to obtain the virtual * address of in the kernel memory map. * \param vaddr On return contains the kernel virtual memory address * of the pages passed in the page parameter. * \param count Number of pages passed in. * \param can_fault TRUE if the thread using the mapped pages can take * page faults, FALSE otherwise. * * \returns TRUE if the caller must call pmap_unmap_io_transient when * finished or FALSE otherwise. * */ boolean_t pmap_map_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; boolean_t needs_mapping; int error, i; /* * Allocate any KVA space that we need; this is done in a separate * loop to prevent calling vmem_alloc while pinned.
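 * vmem_alloc() with M_WAITOK may sleep, which must not happen while
 * the thread is pinned by sched_pin() below, so all allocations are
 * completed in this first pass.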
*/ needs_mapping = FALSE; for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (__predict_false(!PHYS_IN_DMAP(paddr))) { error = vmem_alloc(kernel_arena, PAGE_SIZE, M_BESTFIT | M_WAITOK, &vaddr[i]); KASSERT(error == 0, ("vmem_alloc failed: %d", error)); needs_mapping = TRUE; } else { vaddr[i] = PHYS_TO_DMAP(paddr); } } /* Exit early if everything is covered by the DMAP */ if (!needs_mapping) return (FALSE); if (!can_fault) sched_pin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (!PHYS_IN_DMAP(paddr)) { panic( "pmap_map_io_transient: TODO: Map out of DMAP data"); } } return (needs_mapping); } void pmap_unmap_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; int i; if (!can_fault) sched_unpin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (!PHYS_IN_DMAP(paddr)) { panic("ARM64TODO: pmap_unmap_io_transient: Unmap data"); } } } Index: projects/clang390-import/sys/cddl/compat/opensolaris/sys/random.h =================================================================== --- projects/clang390-import/sys/cddl/compat/opensolaris/sys/random.h (revision 305686) +++ projects/clang390-import/sys/cddl/compat/opensolaris/sys/random.h (revision 305687) @@ -1,37 +1,37 @@ /*- * Copyright (c) 2007 Pawel Jakub Dawidek * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _OPENSOLARIS_SYS_RANDOM_H_ #define _OPENSOLARIS_SYS_RANDOM_H_ #include_next #define random_get_bytes(p, s) read_random((p), (int)(s)) -#define random_get_pseudo_bytes(p, s) read_random((p), (int)(s)) +#define random_get_pseudo_bytes(p, s) arc4rand((p), (int)(s), 0) #endif /* !_OPENSOLARIS_SYS_RANDOM_H_ */ Index: projects/clang390-import/sys/conf/options.mips =================================================================== --- projects/clang390-import/sys/conf/options.mips (revision 305686) +++ projects/clang390-import/sys/conf/options.mips (revision 305687) @@ -1,148 +1,149 @@ # Copyright (c) 2001, 2008, Juniper Networks, Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. 
Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # 3. Neither the name of the Juniper Networks, Inc. nor the names of its # contributors may be used to endorse or promote products derived from # this software without specific prior written permission. # # THIS SOFTWARE IS PROVIDED BY JUNIPER NETWORKS AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL JUNIPER NETWORKS OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # JNPR: options.mips,v 1.2 2006/09/15 12:52:34 # $FreeBSD$ CPU_MIPS4KC opt_global.h CPU_MIPS24K opt_global.h CPU_MIPS34K opt_global.h CPU_MIPS74K opt_global.h CPU_MIPS1004K opt_global.h CPU_MIPS1074K opt_global.h CPU_INTERAPTIV opt_global.h CPU_PROAPTIV opt_global.h CPU_MIPS32 opt_global.h CPU_MIPS64 opt_global.h CPU_SENTRY5 opt_global.h CPU_HAVEFPU opt_global.h CPU_SB1 opt_global.h CPU_CNMIPS opt_global.h CPU_RMI opt_global.h CPU_NLM opt_global.h CPU_BERI opt_global.h +CPU_MALTA opt_global.h # which MACHINE_ARCH architecture MIPS MIPSEL MIPS64 MIPS64EL MIPSN32 COMPAT_FREEBSD32 opt_compat.h YAMON opt_global.h CFE opt_global.h CFE_CONSOLE opt_global.h CFE_ENV opt_global.h CFE_ENV_SIZE opt_global.h GFB_DEBUG opt_gfb.h GFB_NO_FONT_LOADING opt_gfb.h GFB_NO_MODE_CHANGE opt_gfb.h NOFPU opt_global.h TICK_USE_YAMON_FREQ opt_global.h TICK_USE_MALTA_RTC opt_global.h # # The highest memory address that can be used by the kernel in units of KB. # MAXMEM opt_global.h # # Manual override of cache config # MIPS_DISABLE_L1_CACHE opt_global.h # # Options that control the Cavium Simple Executive. # OCTEON_MODEL opt_cvmx.h OCTEON_VENDOR_LANNER opt_cvmx.h OCTEON_VENDOR_UBIQUITI opt_cvmx.h OCTEON_VENDOR_RADISYS opt_cvmx.h OCTEON_VENDOR_GEFES opt_cvmx.h OCTEON_BOARD_CAPK_0100ND opt_cvmx.h # # Options specific to the BERI platform. # BERI_LARGE_TLB opt_global.h # # Options that control the NetFPGA-10G Embedded CPU Ethernet Core. # NF10BMAC_64BIT opt_netfpga.h # # Options that control the Atheros SoC peripherals # ARGE_DEBUG opt_arge.h ARGE_MDIO opt_arge.h # # At least one of the AR71XX ubiquiti boards has a Redboot configuration # that "lies" about the amount of RAM it has. Until a cleaner method is # defined, this option will suffice in overriding what Redboot says. # AR71XX_REALMEM opt_ar71xx.h AR71XX_ENV_UBOOT opt_ar71xx.h AR71XX_ENV_REDBOOT opt_ar71xx.h AR71XX_ENV_ROUTERBOOT opt_ar71xx.h AR71XX_ATH_EEPROM opt_ar71xx.h # # Options that control the Ralink RT305xF Ethernet MAC. # IF_RT_DEBUG opt_if_rt.h IF_RT_PHY_SUPPORT opt_if_rt.h IF_RT_RING_DATA_COUNT opt_if_rt.h # # Options that control the Ralink/Mediatek SoC type.
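#
# Editor's note: a kernel configuration file selects one of these with
# an "options" line; a hypothetical config fragment might read:
#
#	options 	RT5350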
# MT7620 opt_rt305x.h RT5350 opt_rt305x.h RT305XF opt_rt305x.h RT3052F opt_rt305x.h RT3050F opt_rt305x.h RT305X opt_rt305x.h RT305X_UBOOT opt_rt305x.h RT305X_USE_UART opt_rt305x.h # # Options that affect the pmap. # PV_STATS opt_pmap.h # # Options to use INTRNG code # INTRNG opt_global.h MIPS_NIRQ opt_global.h Index: projects/clang390-import/sys/ddb/db_command.c =================================================================== --- projects/clang390-import/sys/ddb/db_command.c (revision 305686) +++ projects/clang390-import/sys/ddb/db_command.c (revision 305687) @@ -1,888 +1,888 @@ /*- * Mach Operating System * Copyright (c) 1991,1990 Carnegie Mellon University * All Rights Reserved. * * Permission to use, copy, modify and distribute this software and its * documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie the * rights to redistribute these changes. */ /* * Author: David B. Golub, Carnegie Mellon University * Date: 7/90 */ /* * Command dispatcher. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * Exported global variables */ -bool db_cmd_loop_done; +int db_cmd_loop_done; db_addr_t db_dot; db_addr_t db_last_addr; db_addr_t db_prev; db_addr_t db_next; static db_cmdfcn_t db_dump; static db_cmdfcn_t db_fncall; static db_cmdfcn_t db_gdb; static db_cmdfcn_t db_halt; static db_cmdfcn_t db_kill; static db_cmdfcn_t db_reset; static db_cmdfcn_t db_stack_trace; static db_cmdfcn_t db_stack_trace_active; static db_cmdfcn_t db_stack_trace_all; static db_cmdfcn_t db_watchdog; /* * 'show' commands */ static struct command db_show_active_cmds[] = { { "trace", db_stack_trace_active, 0, NULL }, }; struct command_table db_show_active_table = LIST_HEAD_INITIALIZER(db_show_active_table); static struct command db_show_all_cmds[] = { { "trace", db_stack_trace_all, 0, NULL }, }; struct command_table db_show_all_table = LIST_HEAD_INITIALIZER(db_show_all_table); static struct command db_show_cmds[] = { { "active", 0, 0, &db_show_active_table }, { "all", 0, 0, &db_show_all_table }, { "registers", db_show_regs, 0, NULL }, { "breaks", db_listbreak_cmd, 0, NULL }, { "threads", db_show_threads, 0, NULL }, }; struct command_table db_show_table = LIST_HEAD_INITIALIZER(db_show_table); static struct command db_cmds[] = { { "print", db_print_cmd, 0, NULL }, { "p", db_print_cmd, 0, NULL }, { "examine", db_examine_cmd, CS_SET_DOT, NULL }, { "x", db_examine_cmd, CS_SET_DOT, NULL }, { "search", db_search_cmd, CS_OWN|CS_SET_DOT, NULL }, { "set", db_set_cmd, CS_OWN, NULL }, { "write", db_write_cmd, CS_MORE|CS_SET_DOT, NULL }, { "w", db_write_cmd, CS_MORE|CS_SET_DOT, NULL }, { "delete", db_delete_cmd, 0, NULL }, { "d", db_delete_cmd, 0, NULL }, { "dump", db_dump, 0, NULL }, {
"break", db_breakpoint_cmd, 0, NULL }, { "b", db_breakpoint_cmd, 0, NULL }, { "dwatch", db_deletewatch_cmd, 0, NULL }, { "watch", db_watchpoint_cmd, CS_MORE,NULL }, { "dhwatch", db_deletehwatch_cmd, 0, NULL }, { "hwatch", db_hwatchpoint_cmd, 0, NULL }, { "step", db_single_step_cmd, 0, NULL }, { "s", db_single_step_cmd, 0, NULL }, { "continue", db_continue_cmd, 0, NULL }, { "c", db_continue_cmd, 0, NULL }, { "until", db_trace_until_call_cmd,0, NULL }, { "next", db_trace_until_matching_cmd,0, NULL }, { "match", db_trace_until_matching_cmd,0, NULL }, { "trace", db_stack_trace, CS_OWN, NULL }, { "t", db_stack_trace, CS_OWN, NULL }, /* XXX alias for active trace */ { "acttrace", db_stack_trace_active, 0, NULL }, /* XXX alias for all trace */ { "alltrace", db_stack_trace_all, 0, NULL }, { "where", db_stack_trace, CS_OWN, NULL }, { "bt", db_stack_trace, CS_OWN, NULL }, { "call", db_fncall, CS_OWN, NULL }, { "show", 0, 0, &db_show_table }, { "ps", db_ps, 0, NULL }, { "gdb", db_gdb, 0, NULL }, { "halt", db_halt, 0, NULL }, { "reboot", db_reset, 0, NULL }, { "reset", db_reset, 0, NULL }, { "kill", db_kill, CS_OWN, NULL }, { "watchdog", db_watchdog, CS_OWN, NULL }, { "thread", db_set_thread, CS_OWN, NULL }, { "run", db_run_cmd, CS_OWN, NULL }, { "script", db_script_cmd, CS_OWN, NULL }, { "scripts", db_scripts_cmd, 0, NULL }, { "unscript", db_unscript_cmd, CS_OWN, NULL }, { "capture", db_capture_cmd, CS_OWN, NULL }, { "textdump", db_textdump_cmd, CS_OWN, NULL }, { "findstack", db_findstack_cmd, 0, NULL }, }; struct command_table db_cmd_table = LIST_HEAD_INITIALIZER(db_cmd_table); static struct command *db_last_command = NULL; /* * if 'ed' style: 'dot' is set at start of last item printed, * and '+' points to next line. * Otherwise: 'dot' points to next item, '..' points to last. */ static bool db_ed_style = true; /* * Utility routine - discard tokens through end-of-line. */ void db_skip_to_eol(void) { int t; do { t = db_read_token(); } while (t != tEOL); } /* * Results of command search. */ #define CMD_UNIQUE 0 #define CMD_FOUND 1 #define CMD_NONE 2 #define CMD_AMBIGUOUS 3 #define CMD_HELP 4 static void db_cmd_match(char *name, struct command *cmd, struct command **cmdp, int *resultp); static void db_cmd_list(struct command_table *table); static int db_cmd_search(char *name, struct command_table *table, struct command **cmdp); static void db_command(struct command **last_cmdp, struct command_table *cmd_table, int dopager); /* * Initialize the command lists from the static tables. */ void db_command_init(void) { #define N(a) (sizeof(a) / sizeof(a[0])) int i; for (i = 0; i < N(db_cmds); i++) db_command_register(&db_cmd_table, &db_cmds[i]); for (i = 0; i < N(db_show_cmds); i++) db_command_register(&db_show_table, &db_show_cmds[i]); for (i = 0; i < N(db_show_active_cmds); i++) db_command_register(&db_show_active_table, &db_show_active_cmds[i]); for (i = 0; i < N(db_show_all_cmds); i++) db_command_register(&db_show_all_table, &db_show_all_cmds[i]); #undef N } /* * Register a command. */ void db_command_register(struct command_table *list, struct command *cmd) { struct command *c, *last; last = NULL; LIST_FOREACH(c, list, next) { int n = strcmp(cmd->name, c->name); /* Check that the command is not already present. 
*/ if (n == 0) { printf("%s: Warning, the command \"%s\" already exists;" " ignoring request\n", __func__, cmd->name); return; } if (n < 0) { /* NB: keep list sorted lexicographically */ LIST_INSERT_BEFORE(c, cmd, next); return; } last = c; } if (last == NULL) LIST_INSERT_HEAD(list, cmd, next); else LIST_INSERT_AFTER(last, cmd, next); } /* * Remove a command previously registered with db_command_register. */ void db_command_unregister(struct command_table *list, struct command *cmd) { struct command *c; LIST_FOREACH(c, list, next) { if (cmd == c) { LIST_REMOVE(cmd, next); return; } } /* NB: intentionally quiet */ } /* * Helper function to match a single command. */ static void db_cmd_match(char *name, struct command *cmd, struct command **cmdp, int *resultp) { char *lp, *rp; int c; lp = name; rp = cmd->name; while ((c = *lp) == *rp) { if (c == 0) { /* complete match */ *cmdp = cmd; *resultp = CMD_UNIQUE; return; } lp++; rp++; } if (c == 0) { /* end of name, not end of command - partial match */ if (*resultp == CMD_FOUND) { *resultp = CMD_AMBIGUOUS; /* but keep looking for a full match - this lets us match single letters */ } else { *cmdp = cmd; *resultp = CMD_FOUND; } } } /* * Search for command prefix. */ static int db_cmd_search(char *name, struct command_table *table, struct command **cmdp) { struct command *cmd; int result = CMD_NONE; LIST_FOREACH(cmd, table, next) { db_cmd_match(name,cmd,cmdp,&result); if (result == CMD_UNIQUE) break; } if (result == CMD_NONE) { /* check for 'help' */ if (name[0] == 'h' && name[1] == 'e' && name[2] == 'l' && name[3] == 'p') result = CMD_HELP; } return (result); } static void db_cmd_list(struct command_table *table) { struct command *cmd; LIST_FOREACH(cmd, table, next) { db_printf("%-16s", cmd->name); db_end_line(16); } } static void db_command(struct command **last_cmdp, struct command_table *cmd_table, int dopager) { struct command *cmd = NULL; int t; char modif[TOK_STRING_SIZE]; db_expr_t addr, count; bool have_addr = false; int result; t = db_read_token(); if (t == tEOL) { /* empty line repeats last command, at 'next' */ cmd = *last_cmdp; addr = (db_expr_t)db_next; have_addr = false; count = 1; modif[0] = '\0'; } else if (t == tEXCL) { db_fncall((db_expr_t)0, (bool)false, (db_expr_t)0, (char *)0); return; } else if (t != tIDENT) { db_printf("?\n"); db_flush_lex(); return; } else { /* * Search for command */ while (cmd_table) { result = db_cmd_search(db_tok_string, cmd_table, &cmd); switch (result) { case CMD_NONE: db_printf("No such command\n"); db_flush_lex(); return; case CMD_AMBIGUOUS: db_printf("Ambiguous\n"); db_flush_lex(); return; case CMD_HELP: db_cmd_list(cmd_table); db_flush_lex(); return; default: break; } if ((cmd_table = cmd->more) != NULL) { t = db_read_token(); if (t != tIDENT) { db_cmd_list(cmd_table); db_flush_lex(); return; } } } if ((cmd->flag & CS_OWN) == 0) { /* * Standard syntax: * command [/modifier] [addr] [,count] */ t = db_read_token(); if (t == tSLASH) { t = db_read_token(); if (t != tIDENT) { db_printf("Bad modifier\n"); db_flush_lex(); return; } db_strcpy(modif, db_tok_string); } else { db_unread_token(t); modif[0] = '\0'; } if (db_expression(&addr)) { db_dot = (db_addr_t) addr; db_last_addr = db_dot; have_addr = true; } else { addr = (db_expr_t) db_dot; have_addr = false; } t = db_read_token(); if (t == tCOMMA) { if (!db_expression(&count)) { db_printf("Count missing\n"); db_flush_lex(); return; } } else { db_unread_token(t); count = -1; } if ((cmd->flag & CS_MORE) == 0) { db_skip_to_eol(); } } } *last_cmdp = cmd; if 
(cmd != NULL) { /* * Execute the command. */ if (dopager) db_enable_pager(); else db_disable_pager(); (*cmd->fcn)(addr, have_addr, count, modif); if (dopager) db_disable_pager(); if (cmd->flag & CS_SET_DOT) { /* * If command changes dot, set dot to * previous address displayed (if 'ed' style). */ if (db_ed_style) { db_dot = db_prev; } else { db_dot = db_next; } } else { /* * If command does not change dot, * set 'next' location to be the same. */ db_next = db_dot; } } } /* * At least one non-optional command must be implemented using * DB_COMMAND() so that db_cmd_set gets created. Here is one. */ DB_COMMAND(panic, db_panic) { db_disable_pager(); panic("from debugger"); } void db_command_loop(void) { /* * Initialize 'prev' and 'next' to dot. */ db_prev = db_dot; db_next = db_dot; db_cmd_loop_done = 0; while (!db_cmd_loop_done) { if (db_print_position() != 0) db_printf("\n"); db_printf("db> "); (void) db_read_line(); db_command(&db_last_command, &db_cmd_table, /* dopager */ 1); } } /* * Execute a command on behalf of a script. The caller is responsible for * making sure that the command string is < DB_MAXLINE or it will be * truncated. * * XXXRW: Runs by injecting faked input into DDB input stream; it would be * nicer to use an alternative approach that didn't mess with the previous * command buffer. */ void db_command_script(const char *command) { db_prev = db_next = db_dot; db_inject_line(command); db_command(&db_last_command, &db_cmd_table, /* dopager */ 0); } void db_error(const char *s) { if (s) db_printf("%s", s); db_flush_lex(); kdb_reenter(); } static void db_dump(db_expr_t dummy, bool dummy2, db_expr_t dummy3, char *dummy4) { int error; if (textdump_pending) { db_printf("textdump_pending set.\n" "run \"textdump unset\" first or \"textdump dump\" for a textdump.\n"); return; } error = doadump(false); if (error) { db_printf("Cannot dump: "); switch (error) { case EBUSY: db_printf("debugger got invoked while dumping.\n"); break; case ENXIO: db_printf("no dump device specified.\n"); break; default: db_printf("unknown error (error=%d).\n", error); break; } } } /* * Call random function: * !expr(arg,arg,arg) */ /* The generic implementation supports a maximum of 10 arguments. 
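 *
 * Editor's note: at the ddb prompt this is reached with "call" or the
 * "!" shorthand; a hypothetical session (myfunc is a placeholder for
 * any kernel function) might look like:
 *
 *	db> call myfunc(1, 2)
 *	= 0
 *
 * where the "=" line is the return value printed by db_fncall() below.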
*/ typedef db_expr_t __db_f(db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t, db_expr_t); static __inline int db_fncall_generic(db_expr_t addr, db_expr_t *rv, int nargs, db_expr_t args[]) { __db_f *f = (__db_f *)addr; if (nargs > 10) { db_printf("Too many arguments (max 10)\n"); return (0); } *rv = (*f)(args[0], args[1], args[2], args[3], args[4], args[5], args[6], args[7], args[8], args[9]); return (1); } static void db_fncall(db_expr_t dummy1, bool dummy2, db_expr_t dummy3, char *dummy4) { db_expr_t fn_addr; db_expr_t args[DB_MAXARGS]; int nargs = 0; db_expr_t retval; int t; if (!db_expression(&fn_addr)) { db_printf("Bad function\n"); db_flush_lex(); return; } t = db_read_token(); if (t == tLPAREN) { if (db_expression(&args[0])) { nargs++; while ((t = db_read_token()) == tCOMMA) { if (nargs == DB_MAXARGS) { db_printf("Too many arguments (max %d)\n", DB_MAXARGS); db_flush_lex(); return; } if (!db_expression(&args[nargs])) { db_printf("Argument missing\n"); db_flush_lex(); return; } nargs++; } db_unread_token(t); } if (db_read_token() != tRPAREN) { db_printf("?\n"); db_flush_lex(); return; } } db_skip_to_eol(); db_disable_pager(); if (DB_CALL(fn_addr, &retval, nargs, args)) db_printf("= %#lr\n", (long)retval); } static void db_halt(db_expr_t dummy, bool dummy2, db_expr_t dummy3, char *dummy4) { cpu_halt(); } static void db_kill(db_expr_t dummy1, bool dummy2, db_expr_t dummy3, char *dummy4) { db_expr_t old_radix, pid, sig; struct proc *p; #define DB_ERROR(f) do { db_printf f; db_flush_lex(); goto out; } while (0) /* * PIDs and signal numbers are typically represented in base * 10, so make that the default here. It can, of course, be * overridden by specifying a prefix. */ old_radix = db_radix; db_radix = 10; /* Retrieve arguments. */ if (!db_expression(&sig)) DB_ERROR(("Missing signal number\n")); if (!db_expression(&pid)) DB_ERROR(("Missing process ID\n")); db_skip_to_eol(); if (!_SIG_VALID(sig)) DB_ERROR(("Signal number out of range\n")); /* * Find the process in question. allproc_lock is not needed * since we're in DDB. */ /* sx_slock(&allproc_lock); */ FOREACH_PROC_IN_SYSTEM(p) if (p->p_pid == pid) break; /* sx_sunlock(&allproc_lock); */ if (p == NULL) DB_ERROR(("Can't find process with pid %ld\n", (long) pid)); /* If it's already locked, bail; otherwise, do the deed. */ if (PROC_TRYLOCK(p) == 0) DB_ERROR(("Can't lock process with pid %ld\n", (long) pid)); else { pksignal(p, sig, NULL); PROC_UNLOCK(p); } out: db_radix = old_radix; #undef DB_ERROR } /* * Reboot. In case there is an additional argument, take it as delay in * seconds. Default to 15s if we cannot parse it and make sure we will * never wait longer than 1 week. Some code is similar to * kern_shutdown.c:shutdown_panic(). */ #ifndef DB_RESET_MAXDELAY #define DB_RESET_MAXDELAY (3600 * 24 * 7) #endif static void db_reset(db_expr_t addr, bool have_addr, db_expr_t count __unused, char *modif __unused) { int delay, loop; if (have_addr) { delay = (int)db_hex2dec(addr); /* If we fail to parse, use 15s. */ if (delay == -1) delay = 15; /* Cap at one week. */ if ((uintmax_t)delay > (uintmax_t)DB_RESET_MAXDELAY) delay = DB_RESET_MAXDELAY; db_printf("Automatic reboot in %d seconds - " "press a key on the console to abort\n", delay); for (loop = delay * 10; loop > 0; --loop) { DELAY(1000 * 100); /* 1/10th second */ /* Did user type a key?
*/ if (cncheckc() != -1) return; } } cpu_reset(); } static void db_watchdog(db_expr_t dummy1, bool dummy2, db_expr_t dummy3, char *dummy4) { db_expr_t old_radix, tout; int err, i; old_radix = db_radix; db_radix = 10; err = db_expression(&tout); db_skip_to_eol(); db_radix = old_radix; /* If no argument is provided the watchdog will just be disabled. */ if (err == 0) { db_printf("No argument provided, disabling watchdog\n"); tout = 0; } else if ((tout & WD_INTERVAL) == WD_TO_NEVER) { db_error("Out of range watchdog interval\n"); return; } EVENTHANDLER_INVOKE(watchdog_list, tout, &i); } static void db_gdb(db_expr_t dummy1, bool dummy2, db_expr_t dummy3, char *dummy4) { if (kdb_dbbe_select("gdb") != 0) { db_printf("The remote GDB backend could not be selected.\n"); return; } /* * Mark that we are done in the debugger. kdb_trap() * should re-enter with the new backend. */ db_cmd_loop_done = 1; db_printf("(ctrl-c will return control to ddb)\n"); } static void db_stack_trace(db_expr_t tid, bool hastid, db_expr_t count, char *modif) { struct thread *td; db_expr_t radix; pid_t pid; int t; /* * We parse our own arguments. We don't like the default radix. */ radix = db_radix; db_radix = 10; hastid = db_expression(&tid); t = db_read_token(); if (t == tCOMMA) { if (!db_expression(&count)) { db_printf("Count missing\n"); db_flush_lex(); return; } } else { db_unread_token(t); count = -1; } db_skip_to_eol(); db_radix = radix; if (hastid) { td = kdb_thr_lookup((lwpid_t)tid); if (td == NULL) td = kdb_thr_from_pid((pid_t)tid); if (td == NULL) { db_printf("Thread %d not found\n", (int)tid); return; } } else td = kdb_thread; if (td->td_proc != NULL) pid = td->td_proc->p_pid; else pid = -1; db_printf("Tracing pid %d tid %ld td %p\n", pid, (long)td->td_tid, td); db_trace_thread(td, count); } static void _db_stack_trace_all(bool active_only) { struct proc *p; struct thread *td; jmp_buf jb; void *prev_jb; FOREACH_PROC_IN_SYSTEM(p) { prev_jb = kdb_jmpbuf(jb); if (setjmp(jb) == 0) { FOREACH_THREAD_IN_PROC(p, td) { if (td->td_state == TDS_RUNNING) db_printf("\nTracing command %s pid %d" " tid %ld td %p (CPU %d)\n", p->p_comm, p->p_pid, (long)td->td_tid, td, td->td_oncpu); else if (active_only) continue; else db_printf("\nTracing command %s pid %d" " tid %ld td %p\n", p->p_comm, p->p_pid, (long)td->td_tid, td); db_trace_thread(td, -1); if (db_pager_quit) { kdb_jmpbuf(prev_jb); return; } } } kdb_jmpbuf(prev_jb); } } static void db_stack_trace_active(db_expr_t dummy, bool dummy2, db_expr_t dummy3, char *dummy4) { _db_stack_trace_all(true); } static void db_stack_trace_all(db_expr_t dummy, bool dummy2, db_expr_t dummy3, char *dummy4) { _db_stack_trace_all(false); } /* * Take the parsed expression value from the command line that was parsed * as a hexadecimal value and convert it as if the expression was parsed * as a decimal value. Returns -1 if the expression was not a valid * decimal value. */ db_expr_t db_hex2dec(db_expr_t expr) { uintptr_t x, y; db_expr_t val; y = 1; val = 0; x = expr; while (x != 0) { if (x % 16 > 9) return (-1); val += (x % 16) * (y); x >>= 4; y *= 10; } return (val); } Index: projects/clang390-import/sys/ddb/db_main.c =================================================================== --- projects/clang390-import/sys/ddb/db_main.c (revision 305686) +++ projects/clang390-import/sys/ddb/db_main.c (revision 305687) @@ -1,282 +1,279 @@ /*- * Mach Operating System * Copyright (c) 1991,1990 Carnegie Mellon University * All Rights Reserved. 
* * Permission to use, copy, modify and distribute this software and its * documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie the * rights to redistribute these changes. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include SYSCTL_NODE(_debug, OID_AUTO, ddb, CTLFLAG_RW, 0, "DDB settings"); static dbbe_init_f db_init; static dbbe_trap_f db_trap; static dbbe_trace_f db_trace_self_wrapper; static dbbe_trace_thread_f db_trace_thread_wrapper; KDB_BACKEND(ddb, db_init, db_trace_self_wrapper, db_trace_thread_wrapper, db_trap); /* * Symbols can be loaded by specifying the exact addresses of * the symtab and strtab in memory. This is used when loaded from * boot loaders different from the native one (like Xen). */ vm_offset_t ksymtab, kstrtab, ksymtab_size; bool X_db_line_at_pc(db_symtab_t *symtab, c_db_sym_t sym, char **file, int *line, db_expr_t off) { return (false); } c_db_sym_t X_db_lookup(db_symtab_t *symtab, const char *symbol) { c_linker_sym_t lsym; Elf_Sym *sym; if (symtab->private == NULL) { return ((c_db_sym_t)((!linker_ddb_lookup(symbol, &lsym)) ? lsym : NULL)); } else { sym = (Elf_Sym *)symtab->start; while ((char *)sym < symtab->end) { if (sym->st_name != 0 && !strcmp(symtab->private + sym->st_name, symbol)) return ((c_db_sym_t)sym); sym++; } } return (NULL); } c_db_sym_t X_db_search_symbol(db_symtab_t *symtab, db_addr_t off, db_strategy_t strat, db_expr_t *diffp) { c_linker_sym_t lsym; Elf_Sym *sym, *match; unsigned long diff; if (symtab->private == NULL) { if (!linker_ddb_search_symbol((caddr_t)off, &lsym, &diff)) { *diffp = (db_expr_t)diff; return ((c_db_sym_t)lsym); } return (NULL); } diff = ~0UL; match = NULL; for (sym = (Elf_Sym*)symtab->start; (char*)sym < symtab->end; sym++) { if (sym->st_name == 0 || sym->st_shndx == SHN_UNDEF) continue; if (off < sym->st_value) continue; if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT && ELF_ST_TYPE(sym->st_info) != STT_FUNC && ELF_ST_TYPE(sym->st_info) != STT_NOTYPE) continue; if ((off - sym->st_value) > diff) continue; if ((off - sym->st_value) < diff) { diff = off - sym->st_value; match = sym; } else { if (match == NULL) match = sym; else if (ELF_ST_BIND(match->st_info) == STB_LOCAL && ELF_ST_BIND(sym->st_info) != STB_LOCAL) match = sym; } if (diff == 0) { if (strat == DB_STGY_PROC && ELF_ST_TYPE(sym->st_info) == STT_FUNC && ELF_ST_BIND(sym->st_info) != STB_LOCAL) break; if (strat == DB_STGY_ANY && ELF_ST_BIND(sym->st_info) != STB_LOCAL) break; } } *diffp = (match == NULL) ?
off : diff; return ((c_db_sym_t)match); } bool X_db_sym_numargs(db_symtab_t *symtab, c_db_sym_t sym, int *nargp, char **argp) { return (false); } void X_db_symbol_values(db_symtab_t *symtab, c_db_sym_t sym, const char **namep, db_expr_t *valp) { linker_symval_t lval; if (symtab->private == NULL) { linker_ddb_symbol_values((c_linker_sym_t)sym, &lval); if (namep != NULL) *namep = (const char*)lval.name; if (valp != NULL) *valp = (db_expr_t)lval.value; } else { if (namep != NULL) *namep = (const char *)symtab->private + ((const Elf_Sym *)sym)->st_name; if (valp != NULL) *valp = (db_expr_t)((const Elf_Sym *)sym)->st_value; } } int db_fetch_ksymtab(vm_offset_t ksym_start, vm_offset_t ksym_end) { Elf_Size strsz; if (ksym_end > ksym_start && ksym_start != 0) { ksymtab = ksym_start; ksymtab_size = *(Elf_Size*)ksymtab; ksymtab += sizeof(Elf_Size); kstrtab = ksymtab + ksymtab_size; strsz = *(Elf_Size*)kstrtab; kstrtab += sizeof(Elf_Size); if (kstrtab + strsz > ksym_end) { /* Sizes doesn't match, unset everything. */ ksymtab = ksymtab_size = kstrtab = 0; } } if (ksymtab == 0 || ksymtab_size == 0 || kstrtab == 0) return (-1); return (0); } static int db_init(void) { db_command_init(); if (ksymtab != 0 && kstrtab != 0 && ksymtab_size != 0) { db_add_symbol_table((char *)ksymtab, (char *)(ksymtab + ksymtab_size), "elf", (char *)kstrtab); } db_add_symbol_table(NULL, NULL, "kld", NULL); return (1); /* We're the default debugger. */ } static int db_trap(int type, int code) { jmp_buf jb; void *prev_jb; bool bkpt, watchpt; const char *why; /* * Don't handle the trap if the console is unavailable (i.e. it * is in graphics mode). */ if (cnunavailable()) return (0); - bkpt = IS_BREAKPOINT_TRAP(type, code); - watchpt = IS_WATCHPOINT_TRAP(type, code); - - if (db_stop_at_pc(&bkpt)) { + if (db_stop_at_pc(type, code, &bkpt, &watchpt)) { if (db_inst_count) { db_printf("After %d instructions (%d loads, %d stores),\n", db_inst_count, db_load_count, db_store_count); } prev_jb = kdb_jmpbuf(jb); if (setjmp(jb) == 0) { db_dot = PC_REGS(); db_print_thread(); if (bkpt) db_printf("Breakpoint at\t"); else if (watchpt) db_printf("Watchpoint at\t"); else db_printf("Stopped at\t"); db_print_loc_and_inst(db_dot); } why = kdb_why; db_script_kdbenter(why != KDB_WHY_UNSET ? why : "unknown"); db_command_loop(); (void)kdb_jmpbuf(prev_jb); } db_restart_at_pc(watchpt); return (1); } static void db_trace_self_wrapper(void) { jmp_buf jb; void *prev_jb; prev_jb = kdb_jmpbuf(jb); if (setjmp(jb) == 0) db_trace_self(); (void)kdb_jmpbuf(prev_jb); } static void db_trace_thread_wrapper(struct thread *td) { jmp_buf jb; void *prev_jb; prev_jb = kdb_jmpbuf(jb); if (setjmp(jb) == 0) db_trace_thread(td, -1); (void)kdb_jmpbuf(prev_jb); } Index: projects/clang390-import/sys/ddb/db_run.c =================================================================== --- projects/clang390-import/sys/ddb/db_run.c (revision 305686) +++ projects/clang390-import/sys/ddb/db_run.c (revision 305687) @@ -1,384 +1,386 @@ /*- * Mach Operating System * Copyright (c) 1991,1990 Carnegie Mellon University * All Rights Reserved. * * Permission to use, copy, modify and distribute this software and its * documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS * CONDITION. 
CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie the * rights to redistribute these changes. */ /* * Author: David B. Golub, Carnegie Mellon University * Date: 7/90 */ /* * Commands to run process. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include static int db_run_mode; #define STEP_NONE 0 #define STEP_ONCE 1 #define STEP_RETURN 2 #define STEP_CALLT 3 #define STEP_CONTINUE 4 #define STEP_INVISIBLE 5 #define STEP_COUNT 6 static bool db_sstep_print; static int db_loop_count; static int db_call_depth; int db_inst_count; int db_load_count; int db_store_count; #ifdef SOFTWARE_SSTEP db_breakpoint_t db_not_taken_bkpt = 0; db_breakpoint_t db_taken_bkpt = 0; #endif #ifndef db_set_single_step void db_set_single_step(void); #endif #ifndef db_clear_single_step void db_clear_single_step(void); #endif #ifndef db_pc_is_singlestep static bool db_pc_is_singlestep(db_addr_t pc) { #ifdef SOFTWARE_SSTEP if ((db_not_taken_bkpt != 0 && pc == db_not_taken_bkpt->address) || (db_taken_bkpt != 0 && pc == db_taken_bkpt->address)) return (true); #endif return (false); } #endif bool -db_stop_at_pc(bool *is_breakpoint) +db_stop_at_pc(int type, int code, bool *is_breakpoint, bool *is_watchpoint) { db_addr_t pc; db_breakpoint_t bkpt; + *is_breakpoint = IS_BREAKPOINT_TRAP(type, code); + *is_watchpoint = IS_WATCHPOINT_TRAP(type, code); pc = PC_REGS(); - if (db_pc_is_singlestep(pc)) *is_breakpoint = false; db_clear_single_step(); db_clear_breakpoints(); db_clear_watchpoints(); #ifdef FIXUP_PC_AFTER_BREAK if (*is_breakpoint) { /* * Breakpoint trap. Fix up the PC if the * machine requires it. */ FIXUP_PC_AFTER_BREAK pc = PC_REGS(); } #endif /* * Now check for a breakpoint at this address. 
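 *
 * The per-breakpoint count implements "stop every Nth hit"
 * semantics: it is decremented on each hit and the debugger stops
 * only when it reaches zero, at which point it is re-armed from
 * init_count.  An illustrative ddb session (hypothetical symbol
 * name) might be:
 *
 *	db> break vfs_busy,5
 *
 * which skips four hits of the breakpoint and stops on the fifth.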
*/ bkpt = db_find_breakpoint_here(pc); if (bkpt) { if (--bkpt->count == 0) { bkpt->count = bkpt->init_count; *is_breakpoint = true; return (true); /* stop here */ } + return (false); /* continue the countdown */ } else if (*is_breakpoint) { #ifdef BKPT_SKIP BKPT_SKIP; #endif } *is_breakpoint = false; if (db_run_mode == STEP_INVISIBLE) { db_run_mode = STEP_CONTINUE; return (false); /* continue */ } if (db_run_mode == STEP_COUNT) { return (false); /* continue */ } if (db_run_mode == STEP_ONCE) { if (--db_loop_count > 0) { if (db_sstep_print) { db_printf("\t\t"); db_print_loc_and_inst(pc); } return (false); /* continue */ } } if (db_run_mode == STEP_RETURN) { /* continue until matching return */ db_expr_t ins; ins = db_get_value(pc, sizeof(int), false); if (!inst_trap_return(ins) && (!inst_return(ins) || --db_call_depth != 0)) { if (db_sstep_print) { if (inst_call(ins) || inst_return(ins)) { int i; db_printf("[after %6d] ", db_inst_count); for (i = db_call_depth; --i > 0; ) db_printf(" "); db_print_loc_and_inst(pc); } } if (inst_call(ins)) db_call_depth++; return (false); /* continue */ } } if (db_run_mode == STEP_CALLT) { /* continue until call or return */ db_expr_t ins; ins = db_get_value(pc, sizeof(int), false); if (!inst_call(ins) && !inst_return(ins) && !inst_trap_return(ins)) { return (false); /* continue */ } } db_run_mode = STEP_NONE; return (true); } void db_restart_at_pc(bool watchpt) { db_addr_t pc = PC_REGS(); if ((db_run_mode == STEP_COUNT) || (db_run_mode == STEP_RETURN) || (db_run_mode == STEP_CALLT)) { /* * We are about to execute this instruction, * so count it now. */ #ifdef SOFTWARE_SSTEP db_expr_t ins = #endif db_get_value(pc, sizeof(int), false); db_inst_count++; db_load_count += inst_load(ins); db_store_count += inst_store(ins); #ifdef SOFTWARE_SSTEP /* XXX works on mips, but... */ if (inst_branch(ins) || inst_call(ins)) { ins = db_get_value(next_instr_address(pc,1), sizeof(int), false); db_inst_count++; db_load_count += inst_load(ins); db_store_count += inst_store(ins); } #endif /* SOFTWARE_SSTEP */ } if (db_run_mode == STEP_CONTINUE) { if (watchpt || db_find_breakpoint_here(pc)) { /* * Step over breakpoint/watchpoint. */ db_run_mode = STEP_INVISIBLE; db_set_single_step(); } else { db_set_breakpoints(); db_set_watchpoints(); } } else { db_set_single_step(); } } #ifdef SOFTWARE_SSTEP /* * Software implementation of single-stepping. * If your machine does not have a trace mode * similar to the vax or sun ones you can use * this implementation, done for the mips. * Just define the above conditional and provide * the functions/macros defined below. * * extern bool * inst_branch(), returns true if the instruction might branch * extern unsigned * branch_taken(), return the address the instruction might * branch to * db_getreg_val(); return the value of a user register, * as indicated in the hardware instruction * encoding, e.g. 8 for r8 * * next_instr_address(pc,bd) returns the address of the first * instruction following the one at "pc", * which is either in the taken path of * the branch (bd==1) or not. This is * for machines (mips) with branch delays. * * A single-step may involve at most 2 breakpoints - * one for branch-not-taken and one for branch taken. * If one of these addresses does not already have a breakpoint, * we allocate a breakpoint and save it here. * These breakpoints are deleted on return. */ void db_set_single_step(void) { db_addr_t pc = PC_REGS(), brpc; unsigned inst; /* * User was stopped at pc, e.g. the instruction * at pc was not executed. 
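 *
 * Software single-step therefore plants up to two temporary
 * breakpoints: one at the fall-through address and, for a branch,
 * call or return, one at the target address.  Whichever fires
 * first ends the step; db_clear_single_step() removes both.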
*/ inst = db_get_value(pc, sizeof(int), false); if (inst_branch(inst) || inst_call(inst) || inst_return(inst)) { brpc = branch_taken(inst, pc); if (brpc != pc) { /* self-branches are hopeless */ db_taken_bkpt = db_set_temp_breakpoint(brpc); } pc = next_instr_address(pc, 1); } pc = next_instr_address(pc, 0); db_not_taken_bkpt = db_set_temp_breakpoint(pc); } void db_clear_single_step(void) { if (db_not_taken_bkpt != 0) { db_delete_temp_breakpoint(db_not_taken_bkpt); db_not_taken_bkpt = 0; } if (db_taken_bkpt != 0) { db_delete_temp_breakpoint(db_taken_bkpt); db_taken_bkpt = 0; } } #endif /* SOFTWARE_SSTEP */ extern int db_cmd_loop_done; /* single-step */ /*ARGSUSED*/ void db_single_step_cmd(db_expr_t addr, bool have_addr, db_expr_t count, char *modif) { bool print = false; if (count == -1) count = 1; if (modif[0] == 'p') print = true; db_run_mode = STEP_ONCE; db_loop_count = count; db_sstep_print = print; db_inst_count = 0; db_load_count = 0; db_store_count = 0; db_cmd_loop_done = 1; } /* trace and print until call/return */ /*ARGSUSED*/ void db_trace_until_call_cmd(db_expr_t addr, bool have_addr, db_expr_t count, char *modif) { bool print = false; if (modif[0] == 'p') print = true; db_run_mode = STEP_CALLT; db_sstep_print = print; db_inst_count = 0; db_load_count = 0; db_store_count = 0; db_cmd_loop_done = 1; } /*ARGSUSED*/ void db_trace_until_matching_cmd(db_expr_t addr, bool have_addr, db_expr_t count, char *modif) { bool print = false; if (modif[0] == 'p') print = true; db_run_mode = STEP_RETURN; db_call_depth = 1; db_sstep_print = print; db_inst_count = 0; db_load_count = 0; db_store_count = 0; db_cmd_loop_done = 1; } /* continue */ /*ARGSUSED*/ void db_continue_cmd(db_expr_t addr, bool have_addr, db_expr_t count, char *modif) { if (modif[0] == 'c') db_run_mode = STEP_COUNT; else db_run_mode = STEP_CONTINUE; db_inst_count = 0; db_load_count = 0; db_store_count = 0; db_cmd_loop_done = 1; } Index: projects/clang390-import/sys/ddb/ddb.h =================================================================== --- projects/clang390-import/sys/ddb/ddb.h (revision 305686) +++ projects/clang390-import/sys/ddb/ddb.h (revision 305687) @@ -1,295 +1,296 @@ /*- * Copyright (c) 1993, Garrett A. Wollman. * Copyright (c) 1993, University of Vermont and State Agricultural College. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Necessary declarations for the `ddb' kernel debugger. */ #ifndef _DDB_DDB_H_ #define _DDB_DDB_H_ #ifdef SYSCTL_DECL SYSCTL_DECL(_debug_ddb); #endif #include /* type definitions */ #include /* LIST_* */ #include /* SYSINIT */ #ifndef DB_MAXARGS #define DB_MAXARGS 10 #endif #ifndef DB_MAXLINE #define DB_MAXLINE 120 #endif #ifndef DB_MAXSCRIPTS #define DB_MAXSCRIPTS 8 #endif #ifndef DB_MAXSCRIPTNAME #define DB_MAXSCRIPTNAME 32 #endif #ifndef DB_MAXSCRIPTLEN #define DB_MAXSCRIPTLEN 128 #endif #ifndef DB_MAXSCRIPTRECURSION #define DB_MAXSCRIPTRECURSION 3 #endif #ifndef DB_CALL #define DB_CALL db_fncall_generic #else int DB_CALL(db_expr_t, db_expr_t *, int, db_expr_t[]); #endif /* * Extern variables to set the address and size of the symtab and strtab. * Most users should use db_fetch_symtab in order to set them from the * boot loader provided values. */ extern vm_offset_t ksymtab, kstrtab, ksymtab_size; /* * There are three "command tables": * - One for simple commands; a list of these is displayed * by typing 'help' at the debugger prompt. * - One for sub-commands of 'show'; to see this type 'show' * without any arguments. * - The last one for sub-commands of 'show all'; type 'show all' * without any argument to get a list. */ struct command; LIST_HEAD(command_table, command); extern struct command_table db_cmd_table; extern struct command_table db_show_table; extern struct command_table db_show_all_table; /* * Type signature for a function implementing a ddb command. */ typedef void db_cmdfcn_t(db_expr_t addr, bool have_addr, db_expr_t count, char *modif); /* * Command table entry. */ struct command { char * name; /* command name */ db_cmdfcn_t *fcn; /* function to call */ int flag; /* extra info: */ #define CS_OWN 0x1 /* non-standard syntax */ #define CS_MORE 0x2 /* standard syntax, but may have other words * at end */ #define CS_SET_DOT 0x100 /* set dot after command */ struct command_table *more; /* another level of command */ LIST_ENTRY(command) next; /* next entry in the command table */ }; /* * Arrange for the specified ddb command to be defined and * bound to the specified function. Commands can be defined * in modules in which case they will be available only when * the module is loaded. 
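 *
 * For example, a module might add a command with (hypothetical
 * name and body):
 *
 *	DB_COMMAND(hello, db_hello_cmd)
 *	{
 *		db_printf("hello from ddb\n");
 *	}
 *
 * which registers "hello" in db_cmd_table at module load time and
 * unregisters it again on unload.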
*/ #define _DB_SET(_suffix, _name, _func, list, _flag, _more) \ static struct command __CONCAT(_name,_suffix) = { \ .name = __STRING(_name), \ .fcn = _func, \ .flag = _flag, \ .more = _more \ }; \ static void __CONCAT(__CONCAT(_name,_suffix),_add)(void *arg __unused) \ { db_command_register(&list, &__CONCAT(_name,_suffix)); } \ SYSINIT(__CONCAT(_name,_suffix), SI_SUB_KLD, SI_ORDER_ANY, \ __CONCAT(__CONCAT(_name,_suffix),_add), NULL); \ static void __CONCAT(__CONCAT(_name,_suffix),_del)(void *arg __unused) \ { db_command_unregister(&list, &__CONCAT(_name,_suffix)); } \ SYSUNINIT(__CONCAT(_name,_suffix), SI_SUB_KLD, SI_ORDER_ANY, \ __CONCAT(__CONCAT(_name,_suffix),_del), NULL); /* * Like _DB_SET but also create the function declaration which * must be followed immediately by the body; e.g. * _DB_FUNC(_cmd, panic, db_panic, db_cmd_table, 0, NULL) * { * ...panic implementation... * } * * This macro is mostly used to define commands placed in one of * the ddb command tables; see DB_COMMAND, etc. below. */ #define _DB_FUNC(_suffix, _name, _func, list, _flag, _more) \ static db_cmdfcn_t _func; \ _DB_SET(_suffix, _name, _func, list, _flag, _more); \ static void \ _func(db_expr_t addr, bool have_addr, db_expr_t count, char *modif) /* common idom provided for backwards compatibility */ #define DB_FUNC(_name, _func, list, _flag, _more) \ _DB_FUNC(_cmd, _name, _func, list, _flag, _more) #define DB_COMMAND(cmd_name, func_name) \ _DB_FUNC(_cmd, cmd_name, func_name, db_cmd_table, 0, NULL) #define DB_ALIAS(alias_name, func_name) \ _DB_SET(_cmd, alias_name, func_name, db_cmd_table, 0, NULL) #define DB_SHOW_COMMAND(cmd_name, func_name) \ _DB_FUNC(_show, cmd_name, func_name, db_show_table, 0, NULL) #define DB_SHOW_ALIAS(alias_name, func_name) \ _DB_SET(_show, alias_name, func_name, db_show_table, 0, NULL) #define DB_SHOW_ALL_COMMAND(cmd_name, func_name) \ _DB_FUNC(_show_all, cmd_name, func_name, db_show_all_table, 0, NULL) #define DB_SHOW_ALL_ALIAS(alias_name, func_name) \ _DB_SET(_show_all, alias_name, func_name, db_show_all_table, 0, NULL) extern db_expr_t db_maxoff; extern int db_indent; extern int db_inst_count; extern int db_load_count; extern int db_store_count; extern volatile int db_pager_quit; extern db_expr_t db_radix; extern db_expr_t db_max_width; extern db_expr_t db_tab_stop_width; extern db_expr_t db_lines_per_page; struct thread; struct vm_map; void db_check_interrupt(void); void db_clear_watchpoints(void); db_addr_t db_disasm(db_addr_t loc, bool altfmt); /* instruction disassembler */ void db_error(const char *s); int db_expression(db_expr_t *valuep); int db_get_variable(db_expr_t *valuep); void db_iprintf(const char *,...) __printflike(1, 2); struct proc *db_lookup_proc(db_expr_t addr); struct thread *db_lookup_thread(db_expr_t addr, bool check_pid); struct vm_map *db_map_addr(vm_offset_t); bool db_map_current(struct vm_map *); bool db_map_equal(struct vm_map *, struct vm_map *); int db_md_set_watchpoint(db_expr_t addr, db_expr_t size); int db_md_clr_watchpoint(db_expr_t addr, db_expr_t size); void db_md_list_watchpoints(void); void db_print_loc_and_inst(db_addr_t loc); void db_print_thread(void); int db_printf(const char *fmt, ...) 
__printflike(1, 2); int db_read_bytes(vm_offset_t addr, size_t size, char *data); /* machine-dependent */ int db_readline(char *lstart, int lsize); void db_restart_at_pc(bool watchpt); int db_set_variable(db_expr_t value); void db_set_watchpoints(void); void db_skip_to_eol(void); -bool db_stop_at_pc(bool *is_breakpoint); +bool db_stop_at_pc(int type, int code, bool *is_breakpoint, + bool *is_watchpoint); #define db_strcpy strcpy void db_trace_self(void); int db_trace_thread(struct thread *, int); bool db_value_of_name(const char *name, db_expr_t *valuep); bool db_value_of_name_pcpu(const char *name, db_expr_t *valuep); bool db_value_of_name_vnet(const char *name, db_expr_t *valuep); int db_write_bytes(vm_offset_t addr, size_t size, char *data); void db_command_register(struct command_table *, struct command *); void db_command_unregister(struct command_table *, struct command *); int db_fetch_ksymtab(vm_offset_t ksym_start, vm_offset_t ksym_end); db_cmdfcn_t db_breakpoint_cmd; db_cmdfcn_t db_capture_cmd; db_cmdfcn_t db_continue_cmd; db_cmdfcn_t db_delete_cmd; db_cmdfcn_t db_deletehwatch_cmd; db_cmdfcn_t db_deletewatch_cmd; db_cmdfcn_t db_examine_cmd; db_cmdfcn_t db_findstack_cmd; db_cmdfcn_t db_hwatchpoint_cmd; db_cmdfcn_t db_listbreak_cmd; db_cmdfcn_t db_scripts_cmd; db_cmdfcn_t db_print_cmd; db_cmdfcn_t db_ps; db_cmdfcn_t db_run_cmd; db_cmdfcn_t db_script_cmd; db_cmdfcn_t db_search_cmd; db_cmdfcn_t db_set_cmd; db_cmdfcn_t db_set_thread; db_cmdfcn_t db_show_regs; db_cmdfcn_t db_show_threads; db_cmdfcn_t db_single_step_cmd; db_cmdfcn_t db_textdump_cmd; db_cmdfcn_t db_trace_until_call_cmd; db_cmdfcn_t db_trace_until_matching_cmd; db_cmdfcn_t db_unscript_cmd; db_cmdfcn_t db_watchpoint_cmd; db_cmdfcn_t db_write_cmd; /* * Interface between DDB and the DDB output capture facility. */ struct dumperinfo; void db_capture_dump(struct dumperinfo *di); void db_capture_enterpager(void); void db_capture_exitpager(void); void db_capture_write(char *buffer, u_int buflen); void db_capture_writech(char ch); /* * Interface between DDB and the script facility. */ void db_script_kdbenter(const char *eventname); /* KDB enter event. */ /* * Interface between DDB and the textdump facility. * * Text dump blocks are of a fixed size; textdump_block_buffer is a * statically allocated buffer that code interacting with textdumps can use * to prepare and hold a pending block in when calling writenextblock(). */ #define TEXTDUMP_BLOCKSIZE 512 extern char textdump_block_buffer[TEXTDUMP_BLOCKSIZE]; void textdump_mkustar(char *block_buffer, const char *filename, u_int size); void textdump_restoreoff(off_t offset); void textdump_saveoff(off_t *offsetp); int textdump_writenextblock(struct dumperinfo *di, char *buffer); /* * Interface between the kernel and textdumps. */ extern int textdump_pending; /* Call textdump_dumpsys() instead. */ void textdump_dumpsys(struct dumperinfo *di); #endif /* !_DDB_DDB_H_ */ Index: projects/clang390-import/sys/dev/ath/ath_hal/ah.c =================================================================== --- projects/clang390-import/sys/dev/ath/ath_hal/ah.c (revision 305686) +++ projects/clang390-import/sys/dev/ath/ath_hal/ah.c (revision 305687) @@ -1,1480 +1,1497 @@ /* * Copyright (c) 2002-2009 Sam Leffler, Errno Consulting * Copyright (c) 2002-2008 Atheros Communications, Inc. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. 
* * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * $FreeBSD$ */ #include "opt_ah.h" #include "ah.h" #include "ah_internal.h" #include "ah_devid.h" #include "ah_eeprom.h" /* for 5ghz fast clock flag */ #include "ar5416/ar5416reg.h" /* NB: includes ar5212reg.h */ #include "ar9003/ar9300_devid.h" /* linker set of registered chips */ OS_SET_DECLARE(ah_chips, struct ath_hal_chip); /* * Check the set of registered chips to see if any recognize * the device as one they can support. */ const char* ath_hal_probe(uint16_t vendorid, uint16_t devid) { struct ath_hal_chip * const *pchip; OS_SET_FOREACH(pchip, ah_chips) { const char *name = (*pchip)->probe(vendorid, devid); if (name != AH_NULL) return name; } return AH_NULL; } /* * Attach detects device chip revisions, initializes the hwLayer * function list, reads EEPROM information, * selects reset vectors, and performs a short self test. * Any failures will return an error that should cause a hardware * disable. */ struct ath_hal* ath_hal_attach(uint16_t devid, HAL_SOFTC sc, HAL_BUS_TAG st, HAL_BUS_HANDLE sh, uint16_t *eepromdata, HAL_OPS_CONFIG *ah_config, HAL_STATUS *error) { struct ath_hal_chip * const *pchip; OS_SET_FOREACH(pchip, ah_chips) { struct ath_hal_chip *chip = *pchip; struct ath_hal *ah; /* XXX don't have vendorid, assume atheros one works */ if (chip->probe(ATHEROS_VENDOR_ID, devid) == AH_NULL) continue; ah = chip->attach(devid, sc, st, sh, eepromdata, ah_config, error); if (ah != AH_NULL) { /* copy back private state to public area */ ah->ah_devid = AH_PRIVATE(ah)->ah_devid; ah->ah_subvendorid = AH_PRIVATE(ah)->ah_subvendorid; ah->ah_macVersion = AH_PRIVATE(ah)->ah_macVersion; ah->ah_macRev = AH_PRIVATE(ah)->ah_macRev; ah->ah_phyRev = AH_PRIVATE(ah)->ah_phyRev; ah->ah_analog5GhzRev = AH_PRIVATE(ah)->ah_analog5GhzRev; ah->ah_analog2GhzRev = AH_PRIVATE(ah)->ah_analog2GhzRev; return ah; } } return AH_NULL; } const char * ath_hal_mac_name(struct ath_hal *ah) { switch (ah->ah_macVersion) { case AR_SREV_VERSION_CRETE: case AR_SREV_VERSION_MAUI_1: return "AR5210"; case AR_SREV_VERSION_MAUI_2: case AR_SREV_VERSION_OAHU: return "AR5211"; case AR_SREV_VERSION_VENICE: return "AR5212"; case AR_SREV_VERSION_GRIFFIN: return "AR2413"; case AR_SREV_VERSION_CONDOR: return "AR5424"; case AR_SREV_VERSION_EAGLE: return "AR5413"; case AR_SREV_VERSION_COBRA: return "AR2415"; case AR_SREV_2425: /* Swan */ return "AR2425"; case AR_SREV_2417: /* Nala */ return "AR2417"; case AR_XSREV_VERSION_OWL_PCI: return "AR5416"; case AR_XSREV_VERSION_OWL_PCIE: return "AR5418"; case AR_XSREV_VERSION_HOWL: return "AR9130"; case AR_XSREV_VERSION_SOWL: return "AR9160"; case AR_XSREV_VERSION_MERLIN: if (AH_PRIVATE(ah)->ah_ispcie) return "AR9280"; return "AR9220"; case AR_XSREV_VERSION_KITE: return "AR9285"; case AR_XSREV_VERSION_KIWI: if (AH_PRIVATE(ah)->ah_ispcie) return "AR9287"; return "AR9227"; case AR_SREV_VERSION_AR9380: if (ah->ah_macRev >= AR_SREV_REVISION_AR9580_10) return "AR9580"; return "AR9380"; case AR_SREV_VERSION_AR9460: return "AR9460"; case AR_SREV_VERSION_AR9330: return "AR9330"; case AR_SREV_VERSION_AR9340: 
return "AR9340"; case AR_SREV_VERSION_QCA9550: return "QCA9550"; case AR_SREV_VERSION_AR9485: return "AR9485"; case AR_SREV_VERSION_QCA9565: return "QCA9565"; case AR_SREV_VERSION_QCA9530: return "QCA9530"; } return "????"; } /* * Return the mask of available modes based on the hardware capabilities. */ u_int ath_hal_getwirelessmodes(struct ath_hal*ah) { return ath_hal_getWirelessModes(ah); } /* linker set of registered RF backends */ OS_SET_DECLARE(ah_rfs, struct ath_hal_rf); /* * Check the set of registered RF backends to see if * any recognize the device as one they can support. */ struct ath_hal_rf * ath_hal_rfprobe(struct ath_hal *ah, HAL_STATUS *ecode) { struct ath_hal_rf * const *prf; OS_SET_FOREACH(prf, ah_rfs) { struct ath_hal_rf *rf = *prf; if (rf->probe(ah)) return rf; } *ecode = HAL_ENOTSUPP; return AH_NULL; } const char * ath_hal_rf_name(struct ath_hal *ah) { switch (ah->ah_analog5GhzRev & AR_RADIO_SREV_MAJOR) { case 0: /* 5210 */ return "5110"; /* NB: made up */ case AR_RAD5111_SREV_MAJOR: case AR_RAD5111_SREV_PROD: return "5111"; case AR_RAD2111_SREV_MAJOR: return "2111"; case AR_RAD5112_SREV_MAJOR: case AR_RAD5112_SREV_2_0: case AR_RAD5112_SREV_2_1: return "5112"; case AR_RAD2112_SREV_MAJOR: case AR_RAD2112_SREV_2_0: case AR_RAD2112_SREV_2_1: return "2112"; case AR_RAD2413_SREV_MAJOR: return "2413"; case AR_RAD5413_SREV_MAJOR: return "5413"; case AR_RAD2316_SREV_MAJOR: return "2316"; case AR_RAD2317_SREV_MAJOR: return "2317"; case AR_RAD5424_SREV_MAJOR: return "5424"; case AR_RAD5133_SREV_MAJOR: return "5133"; case AR_RAD2133_SREV_MAJOR: return "2133"; case AR_RAD5122_SREV_MAJOR: return "5122"; case AR_RAD2122_SREV_MAJOR: return "2122"; } return "????"; } /* * Poll the register looking for a specific value. */ HAL_BOOL ath_hal_wait(struct ath_hal *ah, u_int reg, uint32_t mask, uint32_t val) { #define AH_TIMEOUT 1000 return ath_hal_waitfor(ah, reg, mask, val, AH_TIMEOUT); #undef AH_TIMEOUT } HAL_BOOL ath_hal_waitfor(struct ath_hal *ah, u_int reg, uint32_t mask, uint32_t val, uint32_t timeout) { int i; for (i = 0; i < timeout; i++) { if ((OS_REG_READ(ah, reg) & mask) == val) return AH_TRUE; OS_DELAY(10); } HALDEBUG(ah, HAL_DEBUG_REGIO | HAL_DEBUG_PHYIO, "%s: timeout on reg 0x%x: 0x%08x & 0x%08x != 0x%08x\n", __func__, reg, OS_REG_READ(ah, reg), mask, val); return AH_FALSE; } /* * Reverse the bits starting at the low bit for a value of * bit_count in size */ uint32_t ath_hal_reverseBits(uint32_t val, uint32_t n) { uint32_t retval; int i; for (i = 0, retval = 0; i < n; i++) { retval = (retval << 1) | (val & 1); val >>= 1; } return retval; } /* 802.11n related timing definitions */ #define OFDM_PLCP_BITS 22 #define HT_L_STF 8 #define HT_L_LTF 8 #define HT_L_SIG 4 #define HT_SIG 8 #define HT_STF 4 #define HT_LTF(n) ((n) * 4) -#define HT_RC_2_MCS(_rc) ((_rc) & 0xf) +#define HT_RC_2_MCS(_rc) ((_rc) & 0x1f) #define HT_RC_2_STREAMS(_rc) ((((_rc) & 0x78) >> 3) + 1) #define IS_HT_RATE(_rc) ( (_rc) & IEEE80211_RATE_MCS) /* * Calculate the duration of a packet whether it is 11n or legacy. */ uint32_t ath_hal_pkt_txtime(struct ath_hal *ah, const HAL_RATE_TABLE *rates, uint32_t frameLen, uint16_t rateix, HAL_BOOL isht40, HAL_BOOL shortPreamble, HAL_BOOL includeSifs) { uint8_t rc; int numStreams; rc = rates->info[rateix].rateCode; /* Legacy rate? Return the old way */ if (! 
IS_HT_RATE(rc)) return ath_hal_computetxtime(ah, rates, frameLen, rateix, shortPreamble, includeSifs); /* 11n frame - extract out the number of spatial streams */ numStreams = HT_RC_2_STREAMS(rc); KASSERT(numStreams > 0 && numStreams <= 4, ("number of spatial streams needs to be 1..3: MCS rate 0x%x!", rateix)); /* XXX TODO: Add SIFS */ return ath_computedur_ht(frameLen, rc, numStreams, isht40, shortPreamble); } static const uint16_t ht20_bps[32] = { 26, 52, 78, 104, 156, 208, 234, 260, 52, 104, 156, 208, 312, 416, 468, 520, 78, 156, 234, 312, 468, 624, 702, 780, 104, 208, 312, 416, 624, 832, 936, 1040 }; static const uint16_t ht40_bps[32] = { 54, 108, 162, 216, 324, 432, 486, 540, 108, 216, 324, 432, 648, 864, 972, 1080, 162, 324, 486, 648, 972, 1296, 1458, 1620, 216, 432, 648, 864, 1296, 1728, 1944, 2160 }; /* * Calculate the transmit duration of an 11n frame. */ uint32_t ath_computedur_ht(uint32_t frameLen, uint16_t rate, int streams, HAL_BOOL isht40, HAL_BOOL isShortGI) { uint32_t bitsPerSymbol, numBits, numSymbols, txTime; KASSERT(rate & IEEE80211_RATE_MCS, ("not mcs %d", rate)); KASSERT((rate &~ IEEE80211_RATE_MCS) < 31, ("bad mcs 0x%x", rate)); if (isht40) - bitsPerSymbol = ht40_bps[rate & 0x1f]; + bitsPerSymbol = ht40_bps[HT_RC_2_MCS(rate)]; else - bitsPerSymbol = ht20_bps[rate & 0x1f]; + bitsPerSymbol = ht20_bps[HT_RC_2_MCS(rate)]; numBits = OFDM_PLCP_BITS + (frameLen << 3); numSymbols = howmany(numBits, bitsPerSymbol); if (isShortGI) txTime = ((numSymbols * 18) + 4) / 5; /* 3.6us */ else txTime = numSymbols * 4; /* 4us */ return txTime + HT_L_STF + HT_L_LTF + HT_L_SIG + HT_SIG + HT_STF + HT_LTF(streams); } /* * Compute the time to transmit a frame of length frameLen bytes * using the specified rate, phy, and short preamble setting. */ uint16_t ath_hal_computetxtime(struct ath_hal *ah, const HAL_RATE_TABLE *rates, uint32_t frameLen, uint16_t rateix, HAL_BOOL shortPreamble, HAL_BOOL includeSifs) { uint32_t bitsPerSymbol, numBits, numSymbols, phyTime, txTime; uint32_t kbps; /* Warn if this function is called for 11n rates; it should not be! */ if (IS_HT_RATE(rates->info[rateix].rateCode)) ath_hal_printf(ah, "%s: MCS rate? (index %d; hwrate 0x%x)\n", __func__, rateix, rates->info[rateix].rateCode); kbps = rates->info[rateix].rateKbps; /* * index can be invalid during dynamic Turbo transitions. 
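 * In that case kbps reads back as 0 and the code below returns a
 * zero duration instead of dividing by zero.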
* XXX */ if (kbps == 0) return 0; switch (rates->info[rateix].phy) { case IEEE80211_T_CCK: phyTime = CCK_PREAMBLE_BITS + CCK_PLCP_BITS; if (shortPreamble && rates->info[rateix].shortPreamble) phyTime >>= 1; numBits = frameLen << 3; txTime = phyTime + ((numBits * 1000)/kbps); if (includeSifs) txTime += CCK_SIFS_TIME; break; case IEEE80211_T_OFDM: bitsPerSymbol = (kbps * OFDM_SYMBOL_TIME) / 1000; HALASSERT(bitsPerSymbol != 0); numBits = OFDM_PLCP_BITS + (frameLen << 3); numSymbols = howmany(numBits, bitsPerSymbol); txTime = OFDM_PREAMBLE_TIME + (numSymbols * OFDM_SYMBOL_TIME); if (includeSifs) txTime += OFDM_SIFS_TIME; break; case IEEE80211_T_OFDM_HALF: bitsPerSymbol = (kbps * OFDM_HALF_SYMBOL_TIME) / 1000; HALASSERT(bitsPerSymbol != 0); numBits = OFDM_HALF_PLCP_BITS + (frameLen << 3); numSymbols = howmany(numBits, bitsPerSymbol); txTime = OFDM_HALF_PREAMBLE_TIME + (numSymbols * OFDM_HALF_SYMBOL_TIME); if (includeSifs) txTime += OFDM_HALF_SIFS_TIME; break; case IEEE80211_T_OFDM_QUARTER: bitsPerSymbol = (kbps * OFDM_QUARTER_SYMBOL_TIME) / 1000; HALASSERT(bitsPerSymbol != 0); numBits = OFDM_QUARTER_PLCP_BITS + (frameLen << 3); numSymbols = howmany(numBits, bitsPerSymbol); txTime = OFDM_QUARTER_PREAMBLE_TIME + (numSymbols * OFDM_QUARTER_SYMBOL_TIME); if (includeSifs) txTime += OFDM_QUARTER_SIFS_TIME; break; case IEEE80211_T_TURBO: bitsPerSymbol = (kbps * TURBO_SYMBOL_TIME) / 1000; HALASSERT(bitsPerSymbol != 0); numBits = TURBO_PLCP_BITS + (frameLen << 3); numSymbols = howmany(numBits, bitsPerSymbol); txTime = TURBO_PREAMBLE_TIME + (numSymbols * TURBO_SYMBOL_TIME); if (includeSifs) txTime += TURBO_SIFS_TIME; break; default: HALDEBUG(ah, HAL_DEBUG_PHYIO, "%s: unknown phy %u (rate ix %u)\n", __func__, rates->info[rateix].phy, rateix); txTime = 0; break; } return txTime; } int ath_hal_get_curmode(struct ath_hal *ah, const struct ieee80211_channel *chan) { /* * Pick a default mode at bootup. A channel change is inevitable. */ if (!chan) return HAL_MODE_11NG_HT20; if (IEEE80211_IS_CHAN_TURBO(chan)) return HAL_MODE_TURBO; /* check for NA_HT before plain A, since IS_CHAN_A includes NA_HT */ if (IEEE80211_IS_CHAN_5GHZ(chan) && IEEE80211_IS_CHAN_HT20(chan)) return HAL_MODE_11NA_HT20; if (IEEE80211_IS_CHAN_5GHZ(chan) && IEEE80211_IS_CHAN_HT40U(chan)) return HAL_MODE_11NA_HT40PLUS; if (IEEE80211_IS_CHAN_5GHZ(chan) && IEEE80211_IS_CHAN_HT40D(chan)) return HAL_MODE_11NA_HT40MINUS; if (IEEE80211_IS_CHAN_A(chan)) return HAL_MODE_11A; /* check for NG_HT before plain G, since IS_CHAN_G includes NG_HT */ if (IEEE80211_IS_CHAN_2GHZ(chan) && IEEE80211_IS_CHAN_HT20(chan)) return HAL_MODE_11NG_HT20; if (IEEE80211_IS_CHAN_2GHZ(chan) && IEEE80211_IS_CHAN_HT40U(chan)) return HAL_MODE_11NG_HT40PLUS; if (IEEE80211_IS_CHAN_2GHZ(chan) && IEEE80211_IS_CHAN_HT40D(chan)) return HAL_MODE_11NG_HT40MINUS; /* * XXX For FreeBSD, will this work correctly given the DYN * chan mode (OFDM+CCK dynamic) ? We have pure-G versions DYN-BG.. */ if (IEEE80211_IS_CHAN_G(chan)) return HAL_MODE_11G; if (IEEE80211_IS_CHAN_B(chan)) return HAL_MODE_11B; HALASSERT(0); return HAL_MODE_11NG_HT20; } typedef enum { WIRELESS_MODE_11a = 0, WIRELESS_MODE_TURBO = 1, WIRELESS_MODE_11b = 2, WIRELESS_MODE_11g = 3, WIRELESS_MODE_108g = 4, WIRELESS_MODE_MAX } WIRELESS_MODE; +/* + * XXX TODO: for some (?) chips, an 11b mode still runs at 11bg. + * Maybe AR5211 has separate 11b and 11g only modes, so 11b is 22MHz + * and 11g is 44MHz, but AR5416 and later run 11b in 11bg mode, right? 
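+ *
+ * For reference, the CLOCK_RATE[] table below encodes the current
+ * assumption: 11b is clocked at 22MHz and 11g at 44MHz, so mapping
+ * an 11b channel to the 11g mode would double the conversion rate.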
+ */ static WIRELESS_MODE ath_hal_chan2wmode(struct ath_hal *ah, const struct ieee80211_channel *chan) { if (IEEE80211_IS_CHAN_B(chan)) return WIRELESS_MODE_11b; if (IEEE80211_IS_CHAN_G(chan)) return WIRELESS_MODE_11g; if (IEEE80211_IS_CHAN_108G(chan)) return WIRELESS_MODE_108g; if (IEEE80211_IS_CHAN_TURBO(chan)) return WIRELESS_MODE_TURBO; return WIRELESS_MODE_11a; } /* * Convert between microseconds and core system clocks. */ /* 11a Turbo 11b 11g 108g */ static const uint8_t CLOCK_RATE[] = { 40, 80, 22, 44, 88 }; #define CLOCK_FAST_RATE_5GHZ_OFDM 44 u_int ath_hal_mac_clks(struct ath_hal *ah, u_int usecs) { const struct ieee80211_channel *c = AH_PRIVATE(ah)->ah_curchan; u_int clks; /* NB: ah_curchan may be null when called attach time */ /* XXX merlin and later specific workaround - 5ghz fast clock is 44 */ if (c != AH_NULL && IS_5GHZ_FAST_CLOCK_EN(ah, c)) { clks = usecs * CLOCK_FAST_RATE_5GHZ_OFDM; if (IEEE80211_IS_CHAN_HT40(c)) clks <<= 1; } else if (c != AH_NULL) { clks = usecs * CLOCK_RATE[ath_hal_chan2wmode(ah, c)]; if (IEEE80211_IS_CHAN_HT40(c)) clks <<= 1; } else clks = usecs * CLOCK_RATE[WIRELESS_MODE_11b]; /* Compensate for half/quarter rate */ if (c != AH_NULL && IEEE80211_IS_CHAN_HALF(c)) clks = clks / 2; else if (c != AH_NULL && IEEE80211_IS_CHAN_QUARTER(c)) clks = clks / 4; return clks; } u_int ath_hal_mac_usec(struct ath_hal *ah, u_int clks) { + uint64_t psec; + + psec = ath_hal_mac_psec(ah, clks); + return (psec / 1000000); +} + +/* + * XXX TODO: half, quarter rates. + */ +uint64_t +ath_hal_mac_psec(struct ath_hal *ah, u_int clks) +{ const struct ieee80211_channel *c = AH_PRIVATE(ah)->ah_curchan; - u_int usec; + uint64_t psec; /* NB: ah_curchan may be null when called attach time */ /* XXX merlin and later specific workaround - 5ghz fast clock is 44 */ if (c != AH_NULL && IS_5GHZ_FAST_CLOCK_EN(ah, c)) { - usec = clks / CLOCK_FAST_RATE_5GHZ_OFDM; + psec = (clks * 1000000ULL) / CLOCK_FAST_RATE_5GHZ_OFDM; if (IEEE80211_IS_CHAN_HT40(c)) - usec >>= 1; + psec >>= 1; } else if (c != AH_NULL) { - usec = clks / CLOCK_RATE[ath_hal_chan2wmode(ah, c)]; + psec = (clks * 1000000ULL) / CLOCK_RATE[ath_hal_chan2wmode(ah, c)]; if (IEEE80211_IS_CHAN_HT40(c)) - usec >>= 1; + psec >>= 1; } else - usec = clks / CLOCK_RATE[WIRELESS_MODE_11b]; - return usec; + psec = (clks * 1000000ULL) / CLOCK_RATE[WIRELESS_MODE_11b]; + return psec; } /* * Setup a h/w rate table's reverse lookup table and * fill in ack durations. This routine is called for * each rate table returned through the ah_getRateTable * method. The reverse lookup tables are assumed to be * initialized to zero (or at least the first entry). * We use this as a key that indicates whether or not * we've previously setup the reverse lookup table. 
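 *
 * The reverse lookup table maps a hardware rate code (with or
 * without the short preamble bit set) back to its index in the
 * rate table, so a rate code taken from a descriptor can be
 * resolved without a linear search.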
* * XXX not reentrant, but shouldn't matter */ void ath_hal_setupratetable(struct ath_hal *ah, HAL_RATE_TABLE *rt) { #define N(a) (sizeof(a)/sizeof(a[0])) int i; if (rt->rateCodeToIndex[0] != 0) /* already setup */ return; for (i = 0; i < N(rt->rateCodeToIndex); i++) rt->rateCodeToIndex[i] = (uint8_t) -1; for (i = 0; i < rt->rateCount; i++) { uint8_t code = rt->info[i].rateCode; uint8_t cix = rt->info[i].controlRate; HALASSERT(code < N(rt->rateCodeToIndex)); rt->rateCodeToIndex[code] = i; HALASSERT((code | rt->info[i].shortPreamble) < N(rt->rateCodeToIndex)); rt->rateCodeToIndex[code | rt->info[i].shortPreamble] = i; /* * XXX for 11g the control rate to use for 5.5 and 11 Mb/s * depends on whether they are marked as basic rates; * the static tables are setup with an 11b-compatible * 2Mb/s rate which will work but is suboptimal */ rt->info[i].lpAckDuration = ath_hal_computetxtime(ah, rt, WLAN_CTRL_FRAME_SIZE, cix, AH_FALSE, AH_TRUE); rt->info[i].spAckDuration = ath_hal_computetxtime(ah, rt, WLAN_CTRL_FRAME_SIZE, cix, AH_TRUE, AH_TRUE); } #undef N } HAL_STATUS ath_hal_getcapability(struct ath_hal *ah, HAL_CAPABILITY_TYPE type, uint32_t capability, uint32_t *result) { const HAL_CAPABILITIES *pCap = &AH_PRIVATE(ah)->ah_caps; switch (type) { case HAL_CAP_REG_DMN: /* regulatory domain */ *result = AH_PRIVATE(ah)->ah_currentRD; return HAL_OK; case HAL_CAP_DFS_DMN: /* DFS Domain */ *result = AH_PRIVATE(ah)->ah_dfsDomain; return HAL_OK; case HAL_CAP_CIPHER: /* cipher handled in hardware */ case HAL_CAP_TKIP_MIC: /* handle TKIP MIC in hardware */ return HAL_ENOTSUPP; case HAL_CAP_TKIP_SPLIT: /* hardware TKIP uses split keys */ return HAL_ENOTSUPP; case HAL_CAP_PHYCOUNTERS: /* hardware PHY error counters */ return pCap->halHwPhyCounterSupport ? HAL_OK : HAL_ENXIO; case HAL_CAP_WME_TKIPMIC: /* hardware can do TKIP MIC when WMM is turned on */ return HAL_ENOTSUPP; case HAL_CAP_DIVERSITY: /* hardware supports fast diversity */ return HAL_ENOTSUPP; case HAL_CAP_KEYCACHE_SIZE: /* hardware key cache size */ *result = pCap->halKeyCacheSize; return HAL_OK; case HAL_CAP_NUM_TXQUEUES: /* number of hardware tx queues */ *result = pCap->halTotalQueues; return HAL_OK; case HAL_CAP_VEOL: /* hardware supports virtual EOL */ return pCap->halVEOLSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_PSPOLL: /* hardware PS-Poll support works */ return pCap->halPSPollBroken ? HAL_ENOTSUPP : HAL_OK; case HAL_CAP_COMPRESSION: return pCap->halCompressSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_BURST: return pCap->halBurstSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_FASTFRAME: return pCap->halFastFramesSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_DIAG: /* hardware diagnostic support */ *result = AH_PRIVATE(ah)->ah_diagreg; return HAL_OK; case HAL_CAP_TXPOW: /* global tx power limit */ switch (capability) { case 0: /* facility is supported */ return HAL_OK; case 1: /* current limit */ *result = AH_PRIVATE(ah)->ah_powerLimit; return HAL_OK; case 2: /* current max tx power */ *result = AH_PRIVATE(ah)->ah_maxPowerLevel; return HAL_OK; case 3: /* scale factor */ *result = AH_PRIVATE(ah)->ah_tpScale; return HAL_OK; } return HAL_ENOTSUPP; case HAL_CAP_BSSIDMASK: /* hardware supports bssid mask */ return pCap->halBssIdMaskSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_MCAST_KEYSRCH: /* multicast frame keycache search */ return pCap->halMcastKeySrchSupport ? 
HAL_OK : HAL_ENOTSUPP; case HAL_CAP_TSF_ADJUST: /* hardware has beacon tsf adjust */ return HAL_ENOTSUPP; case HAL_CAP_RFSILENT: /* rfsilent support */ switch (capability) { case 0: /* facility is supported */ return pCap->halRfSilentSupport ? HAL_OK : HAL_ENOTSUPP; case 1: /* current setting */ return AH_PRIVATE(ah)->ah_rfkillEnabled ? HAL_OK : HAL_ENOTSUPP; case 2: /* rfsilent config */ *result = AH_PRIVATE(ah)->ah_rfsilent; return HAL_OK; } return HAL_ENOTSUPP; case HAL_CAP_11D: return HAL_OK; case HAL_CAP_HT: return pCap->halHTSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_GTXTO: return pCap->halGTTSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_FAST_CC: return pCap->halFastCCSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_TX_CHAINMASK: /* mask of TX chains supported */ *result = pCap->halTxChainMask; return HAL_OK; case HAL_CAP_RX_CHAINMASK: /* mask of RX chains supported */ *result = pCap->halRxChainMask; return HAL_OK; case HAL_CAP_NUM_GPIO_PINS: *result = pCap->halNumGpioPins; return HAL_OK; case HAL_CAP_CST: return pCap->halCSTSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_RTS_AGGR_LIMIT: *result = pCap->halRtsAggrLimit; return HAL_OK; case HAL_CAP_4ADDR_AGGR: return pCap->hal4AddrAggrSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_EXT_CHAN_DFS: return pCap->halExtChanDfsSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_RX_STBC: return pCap->halRxStbcSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_TX_STBC: return pCap->halTxStbcSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_COMBINED_RADAR_RSSI: return pCap->halUseCombinedRadarRssi ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_AUTO_SLEEP: return pCap->halAutoSleepSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_MBSSID_AGGR_SUPPORT: return pCap->halMbssidAggrSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_SPLIT_4KB_TRANS: /* hardware handles descriptors straddling 4k page boundary */ return pCap->hal4kbSplitTransSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_REG_FLAG: *result = AH_PRIVATE(ah)->ah_currentRDext; return HAL_OK; case HAL_CAP_ENHANCED_DMA_SUPPORT: return pCap->halEnhancedDmaSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_NUM_TXMAPS: *result = pCap->halNumTxMaps; return HAL_OK; case HAL_CAP_TXDESCLEN: *result = pCap->halTxDescLen; return HAL_OK; case HAL_CAP_TXSTATUSLEN: *result = pCap->halTxStatusLen; return HAL_OK; case HAL_CAP_RXSTATUSLEN: *result = pCap->halRxStatusLen; return HAL_OK; case HAL_CAP_RXFIFODEPTH: switch (capability) { case HAL_RX_QUEUE_HP: *result = pCap->halRxHpFifoDepth; return HAL_OK; case HAL_RX_QUEUE_LP: *result = pCap->halRxLpFifoDepth; return HAL_OK; default: return HAL_ENOTSUPP; } case HAL_CAP_RXBUFSIZE: case HAL_CAP_NUM_MR_RETRIES: *result = pCap->halNumMRRetries; return HAL_OK; case HAL_CAP_BT_COEX: return pCap->halBtCoexSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_SPECTRAL_SCAN: return pCap->halSpectralScanSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_HT20_SGI: return pCap->halHTSGI20Support ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_RXTSTAMP_PREC: /* rx desc tstamp precision (bits) */ *result = pCap->halRxTstampPrecision; return HAL_OK; case HAL_CAP_ANT_DIV_COMB: /* AR9285/AR9485 LNA diversity */ return pCap->halAntDivCombSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_ENHANCED_DFS_SUPPORT: return pCap->halEnhancedDfsSupport ? HAL_OK : HAL_ENOTSUPP; /* FreeBSD-specific entries for now */ case HAL_CAP_RXORN_FATAL: /* HAL_INT_RXORN treated as fatal */ return AH_PRIVATE(ah)->ah_rxornIsFatal ? 
HAL_OK : HAL_ENOTSUPP; case HAL_CAP_INTRMASK: /* mask of supported interrupts */ *result = pCap->halIntrMask; return HAL_OK; case HAL_CAP_BSSIDMATCH: /* hardware has disable bssid match */ return pCap->halBssidMatchSupport ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_STREAMS: /* number of 11n spatial streams */ switch (capability) { case 0: /* TX */ *result = pCap->halTxStreams; return HAL_OK; case 1: /* RX */ *result = pCap->halRxStreams; return HAL_OK; default: return HAL_ENOTSUPP; } case HAL_CAP_RXDESC_SELFLINK: /* hardware supports self-linked final RX descriptors correctly */ return pCap->halHasRxSelfLinkedTail ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_BB_READ_WAR: /* Baseband read WAR */ return pCap->halHasBBReadWar? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_SERIALISE_WAR: /* PCI register serialisation */ return pCap->halSerialiseRegWar ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_MFP: /* Management frame protection setting */ *result = pCap->halMfpSupport; return HAL_OK; case HAL_CAP_RX_LNA_MIXING: /* Hardware uses an RX LNA mixer to map 2 antennas to a 1 stream receiver */ return pCap->halRxUsingLnaMixing ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_DO_MYBEACON: /* Hardware supports filtering my-beacons */ return pCap->halRxDoMyBeacon ? HAL_OK : HAL_ENOTSUPP; case HAL_CAP_TXTSTAMP_PREC: /* tx desc tstamp precision (bits) */ *result = pCap->halTxTstampPrecision; return HAL_OK; default: return HAL_EINVAL; } } HAL_BOOL ath_hal_setcapability(struct ath_hal *ah, HAL_CAPABILITY_TYPE type, uint32_t capability, uint32_t setting, HAL_STATUS *status) { switch (type) { case HAL_CAP_TXPOW: switch (capability) { case 3: if (setting <= HAL_TP_SCALE_MIN) { AH_PRIVATE(ah)->ah_tpScale = setting; return AH_TRUE; } break; } break; case HAL_CAP_RFSILENT: /* rfsilent support */ /* * NB: allow even if halRfSilentSupport is false * in case the EEPROM is misprogrammed. */ switch (capability) { case 1: /* current setting */ AH_PRIVATE(ah)->ah_rfkillEnabled = (setting != 0); return AH_TRUE; case 2: /* rfsilent config */ /* XXX better done per-chip for validation? */ AH_PRIVATE(ah)->ah_rfsilent = setting; return AH_TRUE; } break; case HAL_CAP_REG_DMN: /* regulatory domain */ AH_PRIVATE(ah)->ah_currentRD = setting; return AH_TRUE; case HAL_CAP_RXORN_FATAL: /* HAL_INT_RXORN treated as fatal */ AH_PRIVATE(ah)->ah_rxornIsFatal = setting; return AH_TRUE; default: break; } if (status) *status = HAL_EINVAL; return AH_FALSE; } /* * Common support for getDiagState method. 
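 *
 * The register-dump helper below writes, for each requested range,
 * a header of the start and end addresses followed by one word per
 * register in the range, and returns the number of bytes emitted.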
*/ static u_int ath_hal_getregdump(struct ath_hal *ah, const HAL_REGRANGE *regs, void *dstbuf, int space) { uint32_t *dp = dstbuf; int i; for (i = 0; space >= 2*sizeof(uint32_t); i++) { uint32_t r = regs[i].start; uint32_t e = regs[i].end; *dp++ = r; *dp++ = e; space -= 2*sizeof(uint32_t); do { *dp++ = OS_REG_READ(ah, r); r += sizeof(uint32_t); space -= sizeof(uint32_t); } while (r <= e && space >= sizeof(uint32_t)); } return (char *) dp - (char *) dstbuf; } static void ath_hal_setregs(struct ath_hal *ah, const HAL_REGWRITE *regs, int space) { while (space >= sizeof(HAL_REGWRITE)) { OS_REG_WRITE(ah, regs->addr, regs->value); regs++, space -= sizeof(HAL_REGWRITE); } } HAL_BOOL ath_hal_getdiagstate(struct ath_hal *ah, int request, const void *args, uint32_t argsize, void **result, uint32_t *resultsize) { switch (request) { case HAL_DIAG_REVS: *result = &AH_PRIVATE(ah)->ah_devid; *resultsize = sizeof(HAL_REVS); return AH_TRUE; case HAL_DIAG_REGS: *resultsize = ath_hal_getregdump(ah, args, *result,*resultsize); return AH_TRUE; case HAL_DIAG_SETREGS: ath_hal_setregs(ah, args, argsize); *resultsize = 0; return AH_TRUE; case HAL_DIAG_FATALERR: *result = &AH_PRIVATE(ah)->ah_fatalState[0]; *resultsize = sizeof(AH_PRIVATE(ah)->ah_fatalState); return AH_TRUE; case HAL_DIAG_EEREAD: if (argsize != sizeof(uint16_t)) return AH_FALSE; if (!ath_hal_eepromRead(ah, *(const uint16_t *)args, *result)) return AH_FALSE; *resultsize = sizeof(uint16_t); return AH_TRUE; #ifdef AH_PRIVATE_DIAG case HAL_DIAG_SETKEY: { const HAL_DIAG_KEYVAL *dk; if (argsize != sizeof(HAL_DIAG_KEYVAL)) return AH_FALSE; dk = (const HAL_DIAG_KEYVAL *)args; return ah->ah_setKeyCacheEntry(ah, dk->dk_keyix, &dk->dk_keyval, dk->dk_mac, dk->dk_xor); } case HAL_DIAG_RESETKEY: if (argsize != sizeof(uint16_t)) return AH_FALSE; return ah->ah_resetKeyCacheEntry(ah, *(const uint16_t *)args); #ifdef AH_SUPPORT_WRITE_EEPROM case HAL_DIAG_EEWRITE: { const HAL_DIAG_EEVAL *ee; if (argsize != sizeof(HAL_DIAG_EEVAL)) return AH_FALSE; ee = (const HAL_DIAG_EEVAL *)args; return ath_hal_eepromWrite(ah, ee->ee_off, ee->ee_data); } #endif /* AH_SUPPORT_WRITE_EEPROM */ #endif /* AH_PRIVATE_DIAG */ case HAL_DIAG_11NCOMPAT: if (argsize == 0) { *resultsize = sizeof(uint32_t); *((uint32_t *)(*result)) = AH_PRIVATE(ah)->ah_11nCompat; } else if (argsize == sizeof(uint32_t)) { AH_PRIVATE(ah)->ah_11nCompat = *(const uint32_t *)args; } else return AH_FALSE; return AH_TRUE; case HAL_DIAG_CHANSURVEY: *result = &AH_PRIVATE(ah)->ah_chansurvey; *resultsize = sizeof(HAL_CHANNEL_SURVEY); return AH_TRUE; } return AH_FALSE; } /* * Set the properties of the tx queue with the parameters * from qInfo. 
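 *
 * Note that the contention window parameters are normalised to the
 * (2^n - 1) form the hardware expects: a requested CWmin of, say,
 * 1000 is rounded up to the next such value, 1023.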
*/ HAL_BOOL ath_hal_setTxQProps(struct ath_hal *ah, HAL_TX_QUEUE_INFO *qi, const HAL_TXQ_INFO *qInfo) { uint32_t cw; if (qi->tqi_type == HAL_TX_QUEUE_INACTIVE) { HALDEBUG(ah, HAL_DEBUG_TXQUEUE, "%s: inactive queue\n", __func__); return AH_FALSE; } /* XXX validate parameters */ qi->tqi_ver = qInfo->tqi_ver; qi->tqi_subtype = qInfo->tqi_subtype; qi->tqi_qflags = qInfo->tqi_qflags; qi->tqi_priority = qInfo->tqi_priority; if (qInfo->tqi_aifs != HAL_TXQ_USEDEFAULT) qi->tqi_aifs = AH_MIN(qInfo->tqi_aifs, 255); else qi->tqi_aifs = INIT_AIFS; if (qInfo->tqi_cwmin != HAL_TXQ_USEDEFAULT) { cw = AH_MIN(qInfo->tqi_cwmin, 1024); /* make sure that the CWmin is of the form (2^n - 1) */ qi->tqi_cwmin = 1; while (qi->tqi_cwmin < cw) qi->tqi_cwmin = (qi->tqi_cwmin << 1) | 1; } else qi->tqi_cwmin = qInfo->tqi_cwmin; if (qInfo->tqi_cwmax != HAL_TXQ_USEDEFAULT) { cw = AH_MIN(qInfo->tqi_cwmax, 1024); /* make sure that the CWmax is of the form (2^n - 1) */ qi->tqi_cwmax = 1; while (qi->tqi_cwmax < cw) qi->tqi_cwmax = (qi->tqi_cwmax << 1) | 1; } else qi->tqi_cwmax = INIT_CWMAX; /* Set retry limit values */ if (qInfo->tqi_shretry != 0) qi->tqi_shretry = AH_MIN(qInfo->tqi_shretry, 15); else qi->tqi_shretry = INIT_SH_RETRY; if (qInfo->tqi_lgretry != 0) qi->tqi_lgretry = AH_MIN(qInfo->tqi_lgretry, 15); else qi->tqi_lgretry = INIT_LG_RETRY; qi->tqi_cbrPeriod = qInfo->tqi_cbrPeriod; qi->tqi_cbrOverflowLimit = qInfo->tqi_cbrOverflowLimit; qi->tqi_burstTime = qInfo->tqi_burstTime; qi->tqi_readyTime = qInfo->tqi_readyTime; switch (qInfo->tqi_subtype) { case HAL_WME_UPSD: if (qi->tqi_type == HAL_TX_QUEUE_DATA) qi->tqi_intFlags = HAL_TXQ_USE_LOCKOUT_BKOFF_DIS; break; default: break; /* NB: silence compiler */ } return AH_TRUE; } HAL_BOOL ath_hal_getTxQProps(struct ath_hal *ah, HAL_TXQ_INFO *qInfo, const HAL_TX_QUEUE_INFO *qi) { if (qi->tqi_type == HAL_TX_QUEUE_INACTIVE) { HALDEBUG(ah, HAL_DEBUG_TXQUEUE, "%s: inactive queue\n", __func__); return AH_FALSE; } qInfo->tqi_qflags = qi->tqi_qflags; qInfo->tqi_ver = qi->tqi_ver; qInfo->tqi_subtype = qi->tqi_subtype; qInfo->tqi_qflags = qi->tqi_qflags; qInfo->tqi_priority = qi->tqi_priority; qInfo->tqi_aifs = qi->tqi_aifs; qInfo->tqi_cwmin = qi->tqi_cwmin; qInfo->tqi_cwmax = qi->tqi_cwmax; qInfo->tqi_shretry = qi->tqi_shretry; qInfo->tqi_lgretry = qi->tqi_lgretry; qInfo->tqi_cbrPeriod = qi->tqi_cbrPeriod; qInfo->tqi_cbrOverflowLimit = qi->tqi_cbrOverflowLimit; qInfo->tqi_burstTime = qi->tqi_burstTime; qInfo->tqi_readyTime = qi->tqi_readyTime; return AH_TRUE; } /* 11a Turbo 11b 11g 108g */ static const int16_t NOISE_FLOOR[] = { -96, -93, -98, -96, -93 }; /* * Read the current channel noise floor and return. * If nf cal hasn't finished, channel noise floor should be 0 * and we return a nominal value based on band and frequency. * * NB: This is a private routine used by per-chip code to * implement the ah_getChanNoise method. */ int16_t ath_hal_getChanNoise(struct ath_hal *ah, const struct ieee80211_channel *chan) { HAL_CHANNEL_INTERNAL *ichan; ichan = ath_hal_checkchannel(ah, chan); if (ichan == AH_NULL) { HALDEBUG(ah, HAL_DEBUG_NFCAL, "%s: invalid channel %u/0x%x; no mapping\n", __func__, chan->ic_freq, chan->ic_flags); return 0; } if (ichan->rawNoiseFloor == 0) { WIRELESS_MODE mode = ath_hal_chan2wmode(ah, chan); HALASSERT(mode < WIRELESS_MODE_MAX); return NOISE_FLOOR[mode] + ath_hal_getNfAdjust(ah, ichan); } else return ichan->rawNoiseFloor + ichan->noiseFloorAdjust; } /* * Fetch the current setup of ctl/ext noise floor values. 
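 * These are per-chain values: entry i of each array describes
 * chain i on the control and extension channel respectively.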
* * If the CHANNEL_MIMO_NF_VALID flag isn't set, the array is simply * populated with values from NOISE_FLOOR[] + ath_hal_getNfAdjust(). * * The caller must supply ctl/ext NF arrays which are at least * AH_MAX_CHAINS entries long. */ int ath_hal_get_mimo_chan_noise(struct ath_hal *ah, const struct ieee80211_channel *chan, int16_t *nf_ctl, int16_t *nf_ext) { #ifdef AH_SUPPORT_AR5416 HAL_CHANNEL_INTERNAL *ichan; int i; ichan = ath_hal_checkchannel(ah, chan); if (ichan == AH_NULL) { HALDEBUG(ah, HAL_DEBUG_NFCAL, "%s: invalid channel %u/0x%x; no mapping\n", __func__, chan->ic_freq, chan->ic_flags); for (i = 0; i < AH_MAX_CHAINS; i++) { nf_ctl[i] = nf_ext[i] = 0; } return 0; } /* Return 0 if there's no valid MIMO values (yet) */ if (! (ichan->privFlags & CHANNEL_MIMO_NF_VALID)) { for (i = 0; i < AH_MAX_CHAINS; i++) { nf_ctl[i] = nf_ext[i] = 0; } return 0; } if (ichan->rawNoiseFloor == 0) { WIRELESS_MODE mode = ath_hal_chan2wmode(ah, chan); HALASSERT(mode < WIRELESS_MODE_MAX); /* * See the comment below - this could cause issues for * stations which have a very low RSSI, below the * 'normalised' NF values in NOISE_FLOOR[]. */ for (i = 0; i < AH_MAX_CHAINS; i++) { nf_ctl[i] = nf_ext[i] = NOISE_FLOOR[mode] + ath_hal_getNfAdjust(ah, ichan); } return 1; } else { /* * The value returned here from a MIMO radio is presumed to be * "good enough" as a NF calculation. As RSSI values are calculated * against this, an adjusted NF may be higher than the RSSI value * returned from a vary weak station, resulting in an obscenely * high signal strength calculation being returned. * * This should be re-evaluated at a later date, along with any * signal strength calculations which are made. Quite likely the * RSSI values will need to be adjusted to ensure the calculations * don't "wrap" when RSSI is less than the "adjusted" NF value. * ("Adjust" here is via ichan->noiseFloorAdjust.) */ for (i = 0; i < AH_MAX_CHAINS; i++) { nf_ctl[i] = ichan->noiseFloorCtl[i] + ath_hal_getNfAdjust(ah, ichan); nf_ext[i] = ichan->noiseFloorExt[i] + ath_hal_getNfAdjust(ah, ichan); } return 1; } #else return 0; #endif /* AH_SUPPORT_AR5416 */ } /* * Process all valid raw noise floors into the dBm noise floor values. * Though our device has no reference for a dBm noise floor, we perform * a relative minimization of NF's based on the lowest NF found across a * channel scan. */ void ath_hal_process_noisefloor(struct ath_hal *ah) { HAL_CHANNEL_INTERNAL *c; int16_t correct2, correct5; int16_t lowest2, lowest5; int i; /* * Find the lowest 2GHz and 5GHz noise floor values after adjusting * for statistically recorded NF/channel deviation. */ correct2 = lowest2 = 0; correct5 = lowest5 = 0; for (i = 0; i < AH_PRIVATE(ah)->ah_nchan; i++) { WIRELESS_MODE mode; int16_t nf; c = &AH_PRIVATE(ah)->ah_channels[i]; if (c->rawNoiseFloor >= 0) continue; /* XXX can't identify proper mode */ mode = IS_CHAN_5GHZ(c) ? 
WIRELESS_MODE_11a : WIRELESS_MODE_11g; nf = c->rawNoiseFloor + NOISE_FLOOR[mode] + ath_hal_getNfAdjust(ah, c); if (IS_CHAN_5GHZ(c)) { if (nf < lowest5) { lowest5 = nf; correct5 = NOISE_FLOOR[mode] - (c->rawNoiseFloor + ath_hal_getNfAdjust(ah, c)); } } else { if (nf < lowest2) { lowest2 = nf; correct2 = NOISE_FLOOR[mode] - (c->rawNoiseFloor + ath_hal_getNfAdjust(ah, c)); } } } /* Correct the channels to reach the expected NF value */ for (i = 0; i < AH_PRIVATE(ah)->ah_nchan; i++) { c = &AH_PRIVATE(ah)->ah_channels[i]; if (c->rawNoiseFloor >= 0) continue; /* Apply correction factor */ c->noiseFloorAdjust = ath_hal_getNfAdjust(ah, c) + (IS_CHAN_5GHZ(c) ? correct5 : correct2); HALDEBUG(ah, HAL_DEBUG_NFCAL, "%u raw nf %d adjust %d\n", c->channel, c->rawNoiseFloor, c->noiseFloorAdjust); } } /* * INI support routines. */ int ath_hal_ini_write(struct ath_hal *ah, const HAL_INI_ARRAY *ia, int col, int regWr) { int r; HALASSERT(col < ia->cols); for (r = 0; r < ia->rows; r++) { OS_REG_WRITE(ah, HAL_INI_VAL(ia, r, 0), HAL_INI_VAL(ia, r, col)); /* Analog shift register delay seems needed for Merlin - PR kern/154220 */ if (HAL_INI_VAL(ia, r, 0) >= 0x7800 && HAL_INI_VAL(ia, r, 0) < 0x7900) OS_DELAY(100); DMA_YIELD(regWr); } return regWr; } void ath_hal_ini_bank_setup(uint32_t data[], const HAL_INI_ARRAY *ia, int col) { int r; HALASSERT(col < ia->cols); for (r = 0; r < ia->rows; r++) data[r] = HAL_INI_VAL(ia, r, col); } int ath_hal_ini_bank_write(struct ath_hal *ah, const HAL_INI_ARRAY *ia, const uint32_t data[], int regWr) { int r; for (r = 0; r < ia->rows; r++) { OS_REG_WRITE(ah, HAL_INI_VAL(ia, r, 0), data[r]); DMA_YIELD(regWr); } return regWr; } /* * These are EEPROM board related routines which should likely live in * a helper library of some sort. */ /************************************************************** * ath_ee_getLowerUppderIndex * * Return indices surrounding the value in sorted integer lists. * Requirement: the input list must be monotonically increasing * and populated up to the list size * Returns: match is set if an index in the array matches exactly * or a the target is before or after the range of the array. */ HAL_BOOL ath_ee_getLowerUpperIndex(uint8_t target, uint8_t *pList, uint16_t listSize, uint16_t *indexL, uint16_t *indexR) { uint16_t i; /* * Check first and last elements for beyond ordered array cases. 
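 *
 * For example, with a pList of { 10, 20, 30 }: a target of 5 sets
 * both indices to 0 (match set, as the target is below the range),
 * a target of 20 yields indexL == indexR == 1 with match set, and
 * a target of 25 yields the surrounding pair (1, 2) with match
 * clear.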
*/ if (target <= pList[0]) { *indexL = *indexR = 0; return AH_TRUE; } if (target >= pList[listSize-1]) { *indexL = *indexR = (uint16_t)(listSize - 1); return AH_TRUE; } /* look for value being near or between 2 values in list */ for (i = 0; i < listSize - 1; i++) { /* * If value is close to the current value of the list * then target is not between values, it is one of the values */ if (pList[i] == target) { *indexL = *indexR = i; return AH_TRUE; } /* * Look for value being between current value and next value * if so return these 2 values */ if (target < pList[i + 1]) { *indexL = i; *indexR = (uint16_t)(i + 1); return AH_FALSE; } } HALASSERT(0); *indexL = *indexR = 0; return AH_FALSE; } /************************************************************** * ath_ee_FillVpdTable * * Fill the Vpdlist for indices Pmax-Pmin * Note: pwrMin, pwrMax and Vpdlist are all in dBm * 4 */ HAL_BOOL ath_ee_FillVpdTable(uint8_t pwrMin, uint8_t pwrMax, uint8_t *pPwrList, uint8_t *pVpdList, uint16_t numIntercepts, uint8_t *pRetVpdList) { uint16_t i, k; uint8_t currPwr = pwrMin; uint16_t idxL, idxR; HALASSERT(pwrMax > pwrMin); for (i = 0; i <= (pwrMax - pwrMin) / 2; i++) { ath_ee_getLowerUpperIndex(currPwr, pPwrList, numIntercepts, &(idxL), &(idxR)); if (idxR < 1) idxR = 1; /* extrapolate below */ if (idxL == numIntercepts - 1) idxL = (uint16_t)(numIntercepts - 2); /* extrapolate above */ if (pPwrList[idxL] == pPwrList[idxR]) k = pVpdList[idxL]; else k = (uint16_t)( ((currPwr - pPwrList[idxL]) * pVpdList[idxR] + (pPwrList[idxR] - currPwr) * pVpdList[idxL]) / (pPwrList[idxR] - pPwrList[idxL]) ); HALASSERT(k < 256); pRetVpdList[i] = (uint8_t)k; currPwr += 2; /* half dB steps */ } return AH_TRUE; } /************************************************************************** * ath_ee_interpolate * * Returns signed interpolated or the scaled up interpolated value */ int16_t ath_ee_interpolate(uint16_t target, uint16_t srcLeft, uint16_t srcRight, int16_t targetLeft, int16_t targetRight) { int16_t rv; if (srcRight == srcLeft) { rv = targetLeft; } else { rv = (int16_t)( ((target - srcLeft) * targetRight + (srcRight - target) * targetLeft) / (srcRight - srcLeft) ); } return rv; } /* * Adjust the TSF. */ void ath_hal_adjusttsf(struct ath_hal *ah, int32_t tsfdelta) { /* XXX handle wrap/overflow */ OS_REG_WRITE(ah, AR_TSF_L32, OS_REG_READ(ah, AR_TSF_L32) + tsfdelta); } /* * Enable or disable CCA. */ void ath_hal_setcca(struct ath_hal *ah, int ena) { /* * NB: fill me in; this is not provided by default because disabling * CCA in most locales violates regulatory. */ } /* * Get CCA setting. */ int ath_hal_getcca(struct ath_hal *ah) { u_int32_t diag; if (ath_hal_getcapability(ah, HAL_CAP_DIAG, 0, &diag) != HAL_OK) return 1; return ((diag & 0x500000) == 0); } /* * This routine is only needed when supporting EEPROM-in-RAM setups * (eg embedded SoCs and on-board PCI/PCIe devices.) */ /* NB: This is in 16 bit words; not bytes */ /* XXX This doesn't belong here! */ #define ATH_DATA_EEPROM_SIZE 2048 HAL_BOOL ath_hal_EepromDataRead(struct ath_hal *ah, u_int off, uint16_t *data) { if (ah->ah_eepromdata == AH_NULL) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: no eeprom data!\n", __func__); return AH_FALSE; } if (off > ATH_DATA_EEPROM_SIZE) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: offset %x > %x\n", __func__, off, ATH_DATA_EEPROM_SIZE); return AH_FALSE; } (*data) = ah->ah_eepromdata[off]; return AH_TRUE; } /* * Do a 2GHz specific MHz->IEEE based on the hardware * frequency. * * This is the unmapped frequency which is programmed into the hardware. 
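 *
 * For example: 2412 MHz maps to channel 1, 2472 MHz to channel 13 and
 * 2484 MHz (the Japan-only DSSS channel) to channel 14; frequencies
 * above 2484 MHz map to channel 15 and upwards in 20 MHz steps, so
 * 2512 MHz maps to channel 15.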
*/ int ath_hal_mhz2ieee_2ghz(struct ath_hal *ah, int freq) { if (freq == 2484) return 14; if (freq < 2484) return ((int) freq - 2407) / 5; else return 15 + ((freq - 2512) / 20); } /* * Clear the current survey data. * * This should be done during a channel change. */ void ath_hal_survey_clear(struct ath_hal *ah) { OS_MEMZERO(&AH_PRIVATE(ah)->ah_chansurvey, sizeof(AH_PRIVATE(ah)->ah_chansurvey)); } /* * Add a sample to the channel survey. */ void ath_hal_survey_add_sample(struct ath_hal *ah, HAL_SURVEY_SAMPLE *hs) { HAL_CHANNEL_SURVEY *cs; cs = &AH_PRIVATE(ah)->ah_chansurvey; OS_MEMCPY(&cs->samples[cs->cur_sample], hs, sizeof(*hs)); cs->samples[cs->cur_sample].seq_num = cs->cur_seq; cs->cur_sample = (cs->cur_sample + 1) % CHANNEL_SURVEY_SAMPLE_COUNT; cs->cur_seq++; } Index: projects/clang390-import/sys/dev/ath/ath_hal/ah.h =================================================================== --- projects/clang390-import/sys/dev/ath/ath_hal/ah.h (revision 305686) +++ projects/clang390-import/sys/dev/ath/ath_hal/ah.h (revision 305687) @@ -1,1677 +1,1684 @@ /* * Copyright (c) 2002-2009 Sam Leffler, Errno Consulting * Copyright (c) 2002-2008 Atheros Communications, Inc. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * $FreeBSD$ */ #ifndef _ATH_AH_H_ #define _ATH_AH_H_ /* * Atheros Hardware Access Layer * * Clients of the HAL call ath_hal_attach to obtain a reference to an ath_hal * structure for use with the device. Hardware-related operations that * follow must call back into the HAL through interface, supplying the * reference as the first parameter. */ #include "ah_osdep.h" /* * The maximum number of TX/RX chains supported. * This is intended to be used by various statistics gathering operations * (NF, RSSI, EVM). */ #define AH_MAX_CHAINS 3 #define AH_MIMO_MAX_EVM_PILOTS 6 /* * __ahdecl is analogous to _cdecl; it defines the calling * convention used within the HAL. For most systems this * can just default to be empty and the compiler will (should) * use _cdecl. For systems where _cdecl is not compatible this * must be defined. See linux/ah_osdep.h for an example. */ #ifndef __ahdecl #define __ahdecl #endif /* * Status codes that may be returned by the HAL. Note that * interfaces that return a status code set it only when an * error occurs--i.e. you cannot check it for success. 
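 *
 * A minimal sketch of the resulting call pattern (the devid/sc/st/sh
 * and ah_config arguments below are placeholders, not real values):
 *
 *   HAL_STATUS status = HAL_OK;    (must be pre-initialized by caller)
 *   struct ath_hal *ah = ath_hal_attach(devid, sc, st, sh,
 *       AH_NULL, &ah_config, &status);
 *   if (ah == AH_NULL)
 *       printf("attach failed: %d\n", status);
 *   else
 *       ... 'status' must NOT be consulted on success ...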
*/ typedef enum { HAL_OK = 0, /* No error */ HAL_ENXIO = 1, /* No hardware present */ HAL_ENOMEM = 2, /* Memory allocation failed */ HAL_EIO = 3, /* Hardware didn't respond as expected */ HAL_EEMAGIC = 4, /* EEPROM magic number invalid */ HAL_EEVERSION = 5, /* EEPROM version invalid */ HAL_EELOCKED = 6, /* EEPROM unreadable */ HAL_EEBADSUM = 7, /* EEPROM checksum invalid */ HAL_EEREAD = 8, /* EEPROM read problem */ HAL_EEBADMAC = 9, /* EEPROM mac address invalid */ HAL_EESIZE = 10, /* EEPROM size not supported */ HAL_EEWRITE = 11, /* Attempt to change write-locked EEPROM */ HAL_EINVAL = 12, /* Invalid parameter to function */ HAL_ENOTSUPP = 13, /* Hardware revision not supported */ HAL_ESELFTEST = 14, /* Hardware self-test failed */ HAL_EINPROGRESS = 15, /* Operation incomplete */ HAL_EEBADREG = 16, /* EEPROM invalid regulatory contents */ HAL_EEBADCC = 17, /* EEPROM invalid country code */ HAL_INV_PMODE = 18, /* Couldn't bring out of sleep state */ } HAL_STATUS; typedef enum { AH_FALSE = 0, /* NB: lots of code assumes false is zero */ AH_TRUE = 1, } HAL_BOOL; typedef enum { HAL_CAP_REG_DMN = 0, /* current regulatory domain */ HAL_CAP_CIPHER = 1, /* hardware supports cipher */ HAL_CAP_TKIP_MIC = 2, /* handle TKIP MIC in hardware */ HAL_CAP_TKIP_SPLIT = 3, /* hardware TKIP uses split keys */ HAL_CAP_PHYCOUNTERS = 4, /* hardware PHY error counters */ HAL_CAP_DIVERSITY = 5, /* hardware supports fast diversity */ HAL_CAP_KEYCACHE_SIZE = 6, /* number of entries in key cache */ HAL_CAP_NUM_TXQUEUES = 7, /* number of hardware xmit queues */ HAL_CAP_VEOL = 9, /* hardware supports virtual EOL */ HAL_CAP_PSPOLL = 10, /* hardware has working PS-Poll support */ HAL_CAP_DIAG = 11, /* hardware diagnostic support */ HAL_CAP_COMPRESSION = 12, /* hardware supports compression */ HAL_CAP_BURST = 13, /* hardware supports packet bursting */ HAL_CAP_FASTFRAME = 14, /* hardware supports fast frames */ HAL_CAP_TXPOW = 15, /* global tx power limit */ HAL_CAP_TPC = 16, /* per-packet tx power control */ HAL_CAP_PHYDIAG = 17, /* hardware phy error diagnostic */ HAL_CAP_BSSIDMASK = 18, /* hardware supports bssid mask */ HAL_CAP_MCAST_KEYSRCH = 19, /* hardware has multicast key search */ HAL_CAP_TSF_ADJUST = 20, /* hardware has beacon tsf adjust */ /* 21 was HAL_CAP_XR */ HAL_CAP_WME_TKIPMIC = 22, /* hardware can support TKIP MIC when WMM is turned on */ /* 23 was HAL_CAP_CHAN_HALFRATE */ /* 24 was HAL_CAP_CHAN_QUARTERRATE */ HAL_CAP_RFSILENT = 25, /* hardware has rfsilent support */ HAL_CAP_TPC_ACK = 26, /* ack txpower with per-packet tpc */ HAL_CAP_TPC_CTS = 27, /* cts txpower with per-packet tpc */ HAL_CAP_11D = 28, /* 11d beacon support for changing cc */ HAL_CAP_PCIE_PS = 29, HAL_CAP_HT = 30, /* hardware can support HT */ HAL_CAP_GTXTO = 31, /* hardware supports global tx timeout */ HAL_CAP_FAST_CC = 32, /* hardware supports fast channel change */ HAL_CAP_TX_CHAINMASK = 33, /* mask of TX chains supported */ HAL_CAP_RX_CHAINMASK = 34, /* mask of RX chains supported */ HAL_CAP_NUM_GPIO_PINS = 36, /* number of GPIO pins */ HAL_CAP_CST = 38, /* hardware supports carrier sense timeout */ HAL_CAP_RIFS_RX = 39, HAL_CAP_RIFS_TX = 40, HAL_CAP_FORCE_PPM = 41, HAL_CAP_RTS_AGGR_LIMIT = 42, /* aggregation limit with RTS */ HAL_CAP_4ADDR_AGGR = 43, /* hardware is capable of 4addr aggregation */ HAL_CAP_DFS_DMN = 44, /* current DFS domain */ HAL_CAP_EXT_CHAN_DFS = 45, /* DFS support for extension channel */ HAL_CAP_COMBINED_RADAR_RSSI = 46, /* Is combined RSSI for radar accurate */ HAL_CAP_AUTO_SLEEP = 48, /* hardware can go to
network sleep automatically after waking up to receive TIM */ HAL_CAP_MBSSID_AGGR_SUPPORT = 49, /* Support for mBSSID Aggregation */ HAL_CAP_SPLIT_4KB_TRANS = 50, /* hardware supports descriptors straddling a 4k page boundary */ HAL_CAP_REG_FLAG = 51, /* Regulatory domain flags */ HAL_CAP_BB_RIFS_HANG = 52, HAL_CAP_RIFS_RX_ENABLED = 53, HAL_CAP_BB_DFS_HANG = 54, HAL_CAP_RX_STBC = 58, HAL_CAP_TX_STBC = 59, HAL_CAP_BT_COEX = 60, /* hardware is capable of bluetooth coexistence */ HAL_CAP_DYNAMIC_SMPS = 61, /* Dynamic MIMO Power Save hardware support */ HAL_CAP_DS = 67, /* 2 stream */ HAL_CAP_BB_RX_CLEAR_STUCK_HANG = 68, HAL_CAP_MAC_HANG = 69, /* can MAC hang */ HAL_CAP_MFP = 70, /* Management Frame Protection in hardware */ HAL_CAP_TS = 72, /* 3 stream */ HAL_CAP_ENHANCED_DMA_SUPPORT = 75, /* DMA FIFO support */ HAL_CAP_NUM_TXMAPS = 76, /* Number of buffers in a transmit descriptor */ HAL_CAP_TXDESCLEN = 77, /* Length of transmit descriptor */ HAL_CAP_TXSTATUSLEN = 78, /* Length of transmit status descriptor */ HAL_CAP_RXSTATUSLEN = 79, /* Length of receive status descriptor */ HAL_CAP_RXFIFODEPTH = 80, /* Receive hardware FIFO depth */ HAL_CAP_RXBUFSIZE = 81, /* Receive Buffer Length */ HAL_CAP_NUM_MR_RETRIES = 82, /* limit on multirate retries */ HAL_CAP_OL_PWRCTRL = 84, /* Open loop TX power control */ HAL_CAP_SPECTRAL_SCAN = 90, /* Hardware supports spectral scan */ HAL_CAP_BB_PANIC_WATCHDOG = 92, HAL_CAP_HT20_SGI = 96, /* hardware supports HT20 short GI */ HAL_CAP_LDPC = 99, HAL_CAP_RXTSTAMP_PREC = 100, /* rx desc tstamp precision (bits) */ HAL_CAP_ANT_DIV_COMB = 105, /* Enable antenna diversity/combining */ HAL_CAP_PHYRESTART_CLR_WAR = 106, /* in some cases, clear phy restart to fix bb hang */ HAL_CAP_ENTERPRISE_MODE = 107, /* Enterprise mode features */ HAL_CAP_LDPCWAR = 108, HAL_CAP_CHANNEL_SWITCH_TIME_USEC = 109, /* Channel change time, usec */ HAL_CAP_ENABLE_APM = 110, /* APM enabled */ HAL_CAP_PCIE_LCR_EXTSYNC_EN = 111, HAL_CAP_PCIE_LCR_OFFSET = 112, HAL_CAP_ENHANCED_DFS_SUPPORT = 117, /* hardware supports enhanced DFS */ HAL_CAP_MCI = 118, HAL_CAP_SMARTANTENNA = 119, HAL_CAP_TRAFFIC_FAST_RECOVER = 120, HAL_CAP_TX_DIVERSITY = 121, HAL_CAP_CRDC = 122, /* The following are private to the FreeBSD HAL (224 onward) */ HAL_CAP_INTMIT = 229, /* interference mitigation */ HAL_CAP_RXORN_FATAL = 230, /* HAL_INT_RXORN treated as fatal */ HAL_CAP_BB_HANG = 235, /* can baseband hang */ HAL_CAP_INTRMASK = 237, /* bitmask of supported interrupts */ HAL_CAP_BSSIDMATCH = 238, /* hardware can disable bssid match */ HAL_CAP_STREAMS = 239, /* how many 802.11n spatial streams are available */ HAL_CAP_RXDESC_SELFLINK = 242, /* support a self-linked tail RX descriptor */ HAL_CAP_BB_READ_WAR = 244, /* baseband read WAR */ HAL_CAP_SERIALISE_WAR = 245, /* serialise register access on PCI */ HAL_CAP_ENFORCE_TXOP = 246, /* Enforce TXOP if supported */ HAL_CAP_RX_LNA_MIXING = 247, /* RX hardware uses LNA mixing */ HAL_CAP_DO_MYBEACON = 248, /* Supports HAL_RX_FILTER_MYBEACON */ HAL_CAP_TOA_LOCATIONING = 249, /* time of flight / arrival locationing */ HAL_CAP_TXTSTAMP_PREC = 250, /* tx desc tstamp precision (bits) */ } HAL_CAPABILITY_TYPE; /* * "States" for setting the LED. These correspond to * the possible 802.11 operational states and there may * be a many-to-one mapping between these states and the * actual hardware state for the LEDs (i.e. the hardware * may have fewer states).
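 *
 * A driver would typically collapse its 802.11 state machine onto
 * these, e.g. (an illustrative mapping only): scanning -> HAL_LED_SCAN,
 * authenticating -> HAL_LED_AUTH, associating -> HAL_LED_ASSOC and
 * operating/running -> HAL_LED_RUN, applied via the ah_setLedState
 * method.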
*/ typedef enum { HAL_LED_INIT = 0, HAL_LED_SCAN = 1, HAL_LED_AUTH = 2, HAL_LED_ASSOC = 3, HAL_LED_RUN = 4 } HAL_LED_STATE; /* * Transmit queue types/numbers. These are used to tag * each transmit queue in the hardware and to identify a set * of transmit queues for operations such as start/stop dma. */ typedef enum { HAL_TX_QUEUE_INACTIVE = 0, /* queue is inactive/unused */ HAL_TX_QUEUE_DATA = 1, /* data xmit q's */ HAL_TX_QUEUE_BEACON = 2, /* beacon xmit q */ HAL_TX_QUEUE_CAB = 3, /* "crap after beacon" xmit q */ HAL_TX_QUEUE_UAPSD = 4, /* u-apsd power save xmit q */ HAL_TX_QUEUE_PSPOLL = 5, /* power save poll xmit q */ HAL_TX_QUEUE_CFEND = 6, HAL_TX_QUEUE_PAPRD = 7, } HAL_TX_QUEUE; #define HAL_NUM_TX_QUEUES 10 /* max possible # of queues */ /* * Receive queue types. These are used to tag * each receive queue in the hardware and to identify a set * of receive queues for operations such as start/stop dma. */ typedef enum { HAL_RX_QUEUE_HP = 0, /* high priority recv queue */ HAL_RX_QUEUE_LP = 1, /* low priority recv queue */ } HAL_RX_QUEUE; #define HAL_NUM_RX_QUEUES 2 /* max possible # of queues */ #define HAL_TXFIFO_DEPTH 8 /* transmit fifo depth */ /* * Transmit queue subtype. These map directly to * WME Access Categories (except for UPSD). Refer * to Table 5 of the WME spec. */ typedef enum { HAL_WME_AC_BK = 0, /* background access category */ HAL_WME_AC_BE = 1, /* best effort access category */ HAL_WME_AC_VI = 2, /* video access category */ HAL_WME_AC_VO = 3, /* voice access category */ HAL_WME_UPSD = 4, /* uplink power save */ } HAL_TX_QUEUE_SUBTYPE; /* * Transmit queue flags that control various * operational parameters. */ typedef enum { /* * Per queue interrupt enables. When set the associated * interrupt may be delivered for packets sent through * the queue. Without these enabled no interrupts will * be delivered for transmits through the queue. */ HAL_TXQ_TXOKINT_ENABLE = 0x0001, /* enable TXOK interrupt */ HAL_TXQ_TXERRINT_ENABLE = 0x0001, /* enable TXERR interrupt */ HAL_TXQ_TXDESCINT_ENABLE = 0x0002, /* enable TXDESC interrupt */ HAL_TXQ_TXEOLINT_ENABLE = 0x0004, /* enable TXEOL interrupt */ HAL_TXQ_TXURNINT_ENABLE = 0x0008, /* enable TXURN interrupt */ /* * Enable hardware compression for packets sent through * the queue. The compression buffer must be set up and * packets must have a key entry marked in the tx descriptor. */ HAL_TXQ_COMPRESSION_ENABLE = 0x0010, /* enable h/w compression */ /* * Disable queue when veol is hit or ready time expires. * By default the queue is disabled only on reaching the * physical end of queue (i.e. a null link ptr in the * descriptor chain). */ HAL_TXQ_RDYTIME_EXP_POLICY_ENABLE = 0x0020, /* * Schedule frames on delivery of a DBA (DMA Beacon Alert) * event. Frames will be transmitted only when this timer * fires, e.g. to transmit a beacon in ap or adhoc modes. */ HAL_TXQ_DBA_GATED = 0x0040, /* schedule based on DBA */ /* * Each transmit queue has a counter that is incremented * each time the queue is enabled and decremented when * the list of frames to transmit is traversed (or when * the ready time for the queue expires). This counter * must be non-zero for frames to be scheduled for * transmission. The following controls disable bumping * this counter under certain conditions. Typically this * is used to gate frames based on the contents of another * queue (e.g. CAB traffic may only follow a beacon frame). * These are meaningful only when frames are scheduled * with a non-ASAP policy (e.g. DBA-gated).
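 *
 * As an illustrative sketch (not the driver's literal code), a CAB
 * queue gated on DBA events and on the beacon queue becoming empty
 * might be configured as:
 *
 *   HAL_TXQ_INFO qi;
 *   OS_MEMZERO(&qi, sizeof(qi));
 *   qi.tqi_aifs = HAL_TXQ_USEDEFAULT;
 *   qi.tqi_cwmin = HAL_TXQ_USEDEFAULT;
 *   qi.tqi_cwmax = HAL_TXQ_USEDEFAULT;
 *   qi.tqi_qflags = HAL_TXQ_DBA_GATED | HAL_TXQ_CBR_DIS_BEMPTY;
 *   int qnum = ah->ah_setupTxQueue(ah, HAL_TX_QUEUE_CAB, &qi);
 *
 * (HAL_TXQ_INFO and HAL_TXQ_USEDEFAULT are defined below.)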
*/ HAL_TXQ_CBR_DIS_QEMPTY = 0x0080, /* disable on this q empty */ HAL_TXQ_CBR_DIS_BEMPTY = 0x0100, /* disable on beacon q empty */ /* * Fragment burst backoff policy. Normally no backoff * is done after a successful transmission; the next fragment * is sent at SIFS. If this flag is set, backoff is done * after each fragment, regardless of whether it was ack'd or * not; after the backoff count reaches zero a normal channel * access procedure is done before the next transmit (i.e. * wait AIFS instead of SIFS). */ HAL_TXQ_FRAG_BURST_BACKOFF_ENABLE = 0x00800000, /* * Disable post-tx backoff following each frame. */ HAL_TXQ_BACKOFF_DISABLE = 0x00010000, /* disable post backoff */ /* * DCU arbiter lockout control. This controls how * lower priority tx queues are handled with respect * to a specific queue when multiple queues have frames * to send. No lockout means lower priority queues arbitrate * concurrently with this queue. Intra-frame lockout * means lower priority queues are locked out until the * current frame transmits (e.g. including backoffs and bursting). * Global lockout means nothing lower can arbitrate so * long as there is traffic activity on this queue (frames, * backoff, etc). */ HAL_TXQ_ARB_LOCKOUT_INTRA = 0x00020000, /* intra-frame lockout */ HAL_TXQ_ARB_LOCKOUT_GLOBAL = 0x00040000, /* full lockout */ HAL_TXQ_IGNORE_VIRTCOL = 0x00080000, /* ignore virt collisions */ HAL_TXQ_SEQNUM_INC_DIS = 0x00100000, /* disable seqnum increment */ } HAL_TX_QUEUE_FLAGS; typedef struct { uint32_t tqi_ver; /* hal TXQ version */ HAL_TX_QUEUE_SUBTYPE tqi_subtype; /* subtype if applicable */ HAL_TX_QUEUE_FLAGS tqi_qflags; /* flags (see above) */ uint32_t tqi_priority; /* (not used) */ uint32_t tqi_aifs; /* aifs */ uint32_t tqi_cwmin; /* cwMin */ uint32_t tqi_cwmax; /* cwMax */ uint16_t tqi_shretry; /* rts retry limit */ uint16_t tqi_lgretry; /* long retry limit (not used)*/ uint32_t tqi_cbrPeriod; /* CBR period (us) */ uint32_t tqi_cbrOverflowLimit; /* threshold for CBROVF int */ uint32_t tqi_burstTime; /* max burst duration (us) */ uint32_t tqi_readyTime; /* frame schedule time (us) */ uint32_t tqi_compBuf; /* comp buffer phys addr */ } HAL_TXQ_INFO; #define HAL_TQI_NONVAL 0xffff /* token to use for aifs, cwmin, cwmax */ #define HAL_TXQ_USEDEFAULT ((uint32_t) -1) /* compression definitions */ #define HAL_COMP_BUF_MAX_SIZE 9216 /* 9K */ #define HAL_COMP_BUF_ALIGN_SIZE 512 /* * Transmit packet types. This belongs in ah_desc.h, but * is here so we can give a proper type to various parameters * (and not require everyone to include the file). * * NB: These values are intentionally assigned for * direct use when setting up h/w descriptors. */ typedef enum { HAL_PKT_TYPE_NORMAL = 0, HAL_PKT_TYPE_ATIM = 1, HAL_PKT_TYPE_PSPOLL = 2, HAL_PKT_TYPE_BEACON = 3, HAL_PKT_TYPE_PROBE_RESP = 4, HAL_PKT_TYPE_CHIRP = 5, HAL_PKT_TYPE_GRP_POLL = 6, HAL_PKT_TYPE_AMPDU = 7, } HAL_PKT_TYPE; /* Rx Filter Frame Types */ typedef enum { /* * These bits correspond to AR_RX_FILTER for all chips. * Not all bits are supported by all chips.
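 *
 * As a sketch, a simple station-mode configuration (illustrative
 * only; real drivers add further bits conditionally) might be:
 *
 *   uint32_t rfilt = HAL_RX_FILTER_UCAST | HAL_RX_FILTER_MCAST |
 *       HAL_RX_FILTER_BCAST | HAL_RX_FILTER_BEACON;
 *   ah->ah_setRxFilter(ah, rfilt);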
*/ HAL_RX_FILTER_UCAST = 0x00000001, /* Allow unicast frames */ HAL_RX_FILTER_MCAST = 0x00000002, /* Allow multicast frames */ HAL_RX_FILTER_BCAST = 0x00000004, /* Allow broadcast frames */ HAL_RX_FILTER_CONTROL = 0x00000008, /* Allow control frames */ HAL_RX_FILTER_BEACON = 0x00000010, /* Allow beacon frames */ HAL_RX_FILTER_PROM = 0x00000020, /* Promiscuous mode */ HAL_RX_FILTER_PROBEREQ = 0x00000080, /* Allow probe request frames */ HAL_RX_FILTER_PHYERR = 0x00000100, /* Allow phy errors */ HAL_RX_FILTER_MYBEACON = 0x00000200, /* Filter beacons other than mine */ HAL_RX_FILTER_COMPBAR = 0x00000400, /* Allow compressed BAR */ HAL_RX_FILTER_COMP_BA = 0x00000800, /* Allow compressed blockack */ HAL_RX_FILTER_PHYRADAR = 0x00002000, /* Allow phy radar errors */ HAL_RX_FILTER_PSPOLL = 0x00004000, /* Allow PS-POLL frames */ HAL_RX_FILTER_MCAST_BCAST_ALL = 0x00008000, /* Allow all mcast/bcast frames */ /* * Magic RX filter flags that aren't targeting hardware bits * but instead the HAL sets individual bits - eg PHYERR will result * in OFDM/CCK timing error frames being received. */ HAL_RX_FILTER_BSSID = 0x40000000, /* Disable BSSID match */ } HAL_RX_FILTER; typedef enum { HAL_PM_AWAKE = 0, HAL_PM_FULL_SLEEP = 1, HAL_PM_NETWORK_SLEEP = 2, HAL_PM_UNDEFINED = 3 } HAL_POWER_MODE; /* * Enterprise mode flags */ #define AH_ENT_DUAL_BAND_DISABLE 0x00000001 #define AH_ENT_CHAIN2_DISABLE 0x00000002 #define AH_ENT_5MHZ_DISABLE 0x00000004 #define AH_ENT_10MHZ_DISABLE 0x00000008 #define AH_ENT_49GHZ_DISABLE 0x00000010 #define AH_ENT_LOOPBACK_DISABLE 0x00000020 #define AH_ENT_TPC_PERF_DISABLE 0x00000040 #define AH_ENT_MIN_PKT_SIZE_DISABLE 0x00000080 #define AH_ENT_SPECTRAL_PRECISION 0x00000300 #define AH_ENT_SPECTRAL_PRECISION_S 8 #define AH_ENT_RTSCTS_DELIM_WAR 0x00010000 #define AH_FIRST_DESC_NDELIMS 60 /* * NOTE WELL: * These are mapped to take advantage of the common locations for many of * the bits on all of the currently supported MAC chips. This is to make * the ISR as efficient as possible, while still abstracting HW differences. * When new hardware breaks this commonality this enumerated type, as well * as the HAL functions using it, must be modified. All values are directly * mapped unless commented otherwise. 
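 *
 * A hedged sketch of the resulting ISR dispatch pattern:
 *
 *   HAL_INT status;
 *   if (ah->ah_getPendingInterrupts(ah, &status)) {
 *       if (status & HAL_INT_FATAL)
 *           ... full chip reset ...
 *       if (status & HAL_INT_RX)
 *           ... schedule receive processing ...
 *       if (status & HAL_INT_TX)
 *           ... schedule transmit completion ...
 *   }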
*/ typedef enum { HAL_INT_RX = 0x00000001, /* Non-common mapping */ HAL_INT_RXDESC = 0x00000002, /* Legacy mapping */ HAL_INT_RXERR = 0x00000004, HAL_INT_RXHP = 0x00000001, /* EDMA */ HAL_INT_RXLP = 0x00000002, /* EDMA */ HAL_INT_RXNOFRM = 0x00000008, HAL_INT_RXEOL = 0x00000010, HAL_INT_RXORN = 0x00000020, HAL_INT_TX = 0x00000040, /* Non-common mapping */ HAL_INT_TXDESC = 0x00000080, HAL_INT_TIM_TIMER= 0x00000100, HAL_INT_MCI = 0x00000200, HAL_INT_BBPANIC = 0x00000400, HAL_INT_TXURN = 0x00000800, HAL_INT_MIB = 0x00001000, HAL_INT_RXPHY = 0x00004000, HAL_INT_RXKCM = 0x00008000, HAL_INT_SWBA = 0x00010000, HAL_INT_BRSSI = 0x00020000, HAL_INT_BMISS = 0x00040000, HAL_INT_BNR = 0x00100000, HAL_INT_TIM = 0x00200000, /* Non-common mapping */ HAL_INT_DTIM = 0x00400000, /* Non-common mapping */ HAL_INT_DTIMSYNC= 0x00800000, /* Non-common mapping */ HAL_INT_GPIO = 0x01000000, HAL_INT_CABEND = 0x02000000, /* Non-common mapping */ HAL_INT_TSFOOR = 0x04000000, /* Non-common mapping */ HAL_INT_TBTT = 0x08000000, /* Non-common mapping */ /* Atheros ref driver has a generic timer interrupt now..*/ HAL_INT_GENTIMER = 0x08000000, /* Non-common mapping */ HAL_INT_CST = 0x10000000, /* Non-common mapping */ HAL_INT_GTT = 0x20000000, /* Non-common mapping */ HAL_INT_FATAL = 0x40000000, /* Non-common mapping */ #define HAL_INT_GLOBAL 0x80000000 /* Set/clear IER */ HAL_INT_BMISC = HAL_INT_TIM | HAL_INT_DTIM | HAL_INT_DTIMSYNC | HAL_INT_CABEND | HAL_INT_TBTT, /* Interrupt bits that map directly to ISR/IMR bits */ HAL_INT_COMMON = HAL_INT_RXNOFRM | HAL_INT_RXDESC | HAL_INT_RXEOL | HAL_INT_RXORN | HAL_INT_TXDESC | HAL_INT_TXURN | HAL_INT_MIB | HAL_INT_RXPHY | HAL_INT_RXKCM | HAL_INT_SWBA | HAL_INT_BMISS | HAL_INT_BRSSI | HAL_INT_BNR | HAL_INT_GPIO, } HAL_INT; /* * MSI vector assignments */ typedef enum { HAL_MSIVEC_MISC = 0, HAL_MSIVEC_TX = 1, HAL_MSIVEC_RXLP = 2, HAL_MSIVEC_RXHP = 3, } HAL_MSIVEC; typedef enum { HAL_INT_LINE = 0, HAL_INT_MSI = 1, } HAL_INT_TYPE; /* For interrupt mitigation registers */ typedef enum { HAL_INT_RX_FIRSTPKT=0, HAL_INT_RX_LASTPKT, HAL_INT_TX_FIRSTPKT, HAL_INT_TX_LASTPKT, HAL_INT_THRESHOLD } HAL_INT_MITIGATION; /* XXX this is duplicate information! 
*/ typedef struct { u_int32_t cyclecnt_diff; /* delta cycle count */ u_int32_t rxclr_cnt; /* rx clear count */ u_int32_t extrxclr_cnt; /* ext chan rx clear count */ u_int32_t txframecnt_diff; /* delta tx frame count */ u_int32_t rxframecnt_diff; /* delta rx frame count */ u_int32_t listen_time; /* listen time in msec - time for which ch is free */ u_int32_t ofdmphyerr_cnt; /* OFDM err count since last reset */ u_int32_t cckphyerr_cnt; /* CCK err count since last reset */ u_int32_t ofdmphyerrcnt_diff; /* delta OFDM Phy Error Count */ HAL_BOOL valid; /* if the stats are valid*/ } HAL_ANISTATS; typedef struct { u_int8_t txctl_offset; u_int8_t txctl_numwords; u_int8_t txstatus_offset; u_int8_t txstatus_numwords; u_int8_t rxctl_offset; u_int8_t rxctl_numwords; u_int8_t rxstatus_offset; u_int8_t rxstatus_numwords; u_int8_t macRevision; } HAL_DESC_INFO; typedef enum { HAL_GPIO_OUTPUT_MUX_AS_OUTPUT = 0, HAL_GPIO_OUTPUT_MUX_PCIE_ATTENTION_LED = 1, HAL_GPIO_OUTPUT_MUX_PCIE_POWER_LED = 2, HAL_GPIO_OUTPUT_MUX_MAC_NETWORK_LED = 3, HAL_GPIO_OUTPUT_MUX_MAC_POWER_LED = 4, HAL_GPIO_OUTPUT_MUX_AS_WLAN_ACTIVE = 5, HAL_GPIO_OUTPUT_MUX_AS_TX_FRAME = 6, HAL_GPIO_OUTPUT_MUX_AS_MCI_WLAN_DATA, HAL_GPIO_OUTPUT_MUX_AS_MCI_WLAN_CLK, HAL_GPIO_OUTPUT_MUX_AS_MCI_BT_DATA, HAL_GPIO_OUTPUT_MUX_AS_MCI_BT_CLK, HAL_GPIO_OUTPUT_MUX_AS_WL_IN_TX, HAL_GPIO_OUTPUT_MUX_AS_WL_IN_RX, HAL_GPIO_OUTPUT_MUX_AS_BT_IN_TX, HAL_GPIO_OUTPUT_MUX_AS_BT_IN_RX, HAL_GPIO_OUTPUT_MUX_AS_RUCKUS_STROBE, HAL_GPIO_OUTPUT_MUX_AS_RUCKUS_DATA, HAL_GPIO_OUTPUT_MUX_AS_SMARTANT_CTRL0, HAL_GPIO_OUTPUT_MUX_AS_SMARTANT_CTRL1, HAL_GPIO_OUTPUT_MUX_AS_SMARTANT_CTRL2, HAL_GPIO_OUTPUT_MUX_NUM_ENTRIES } HAL_GPIO_MUX_TYPE; typedef enum { HAL_GPIO_INTR_LOW = 0, HAL_GPIO_INTR_HIGH = 1, HAL_GPIO_INTR_DISABLE = 2 } HAL_GPIO_INTR_TYPE; typedef struct halCounters { u_int32_t tx_frame_count; u_int32_t rx_frame_count; u_int32_t rx_clear_count; u_int32_t cycle_count; u_int8_t is_rx_active; // true (1) or false (0) u_int8_t is_tx_active; // true (1) or false (0) } HAL_COUNTERS; typedef enum { HAL_RFGAIN_INACTIVE = 0, HAL_RFGAIN_READ_REQUESTED = 1, HAL_RFGAIN_NEED_CHANGE = 2 } HAL_RFGAIN; typedef uint16_t HAL_CTRY_CODE; /* country code */ typedef uint16_t HAL_REG_DOMAIN; /* regulatory domain code */ #define HAL_ANTENNA_MIN_MODE 0 #define HAL_ANTENNA_FIXED_A 1 #define HAL_ANTENNA_FIXED_B 2 #define HAL_ANTENNA_MAX_MODE 3 typedef struct { uint32_t ackrcv_bad; uint32_t rts_bad; uint32_t rts_good; uint32_t fcs_bad; uint32_t beacons; } HAL_MIB_STATS; /* * These bits represent what's in ah_currentRDext. 
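 *
 * Each enumerator is a bit position, so a (hypothetical) check for
 * HT40 DFS support under an FCC SKU would look like:
 *
 *   if (AH_PRIVATE(ah)->ah_currentRDext & (1 << REG_EXT_FCC_DFS_HT40))
 *       ... HT40 operation permitted on DFS channels ...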
*/ typedef enum { REG_EXT_FCC_MIDBAND = 0, REG_EXT_JAPAN_MIDBAND = 1, REG_EXT_FCC_DFS_HT40 = 2, REG_EXT_JAPAN_NONDFS_HT40 = 3, REG_EXT_JAPAN_DFS_HT40 = 4 } REG_EXT_BITMAP; enum { HAL_MODE_11A = 0x001, /* 11a channels */ HAL_MODE_TURBO = 0x002, /* 11a turbo-only channels */ HAL_MODE_11B = 0x004, /* 11b channels */ HAL_MODE_PUREG = 0x008, /* 11g channels (OFDM only) */ #ifdef notdef HAL_MODE_11G = 0x010, /* 11g channels (OFDM/CCK) */ #else HAL_MODE_11G = 0x008, /* XXX historical */ #endif HAL_MODE_108G = 0x020, /* 11g+Turbo channels */ HAL_MODE_108A = 0x040, /* 11a+Turbo channels */ HAL_MODE_11A_HALF_RATE = 0x200, /* 11a half width channels */ HAL_MODE_11A_QUARTER_RATE = 0x400, /* 11a quarter width channels */ HAL_MODE_11G_HALF_RATE = 0x800, /* 11g half width channels */ HAL_MODE_11G_QUARTER_RATE = 0x1000, /* 11g quarter width channels */ HAL_MODE_11NG_HT20 = 0x008000, HAL_MODE_11NA_HT20 = 0x010000, HAL_MODE_11NG_HT40PLUS = 0x020000, HAL_MODE_11NG_HT40MINUS = 0x040000, HAL_MODE_11NA_HT40PLUS = 0x080000, HAL_MODE_11NA_HT40MINUS = 0x100000, HAL_MODE_ALL = 0xffffff }; typedef struct { int rateCount; /* NB: for proper padding */ uint8_t rateCodeToIndex[256]; /* back mapping */ struct { uint8_t valid; /* valid for rate control use */ uint8_t phy; /* CCK/OFDM/XR */ uint32_t rateKbps; /* transfer rate in kbs */ uint8_t rateCode; /* rate for h/w descriptors */ uint8_t shortPreamble; /* mask for enabling short * preamble in CCK rate code */ uint8_t dot11Rate; /* value for supported rates * info element of MLME */ uint8_t controlRate; /* index of next lower basic * rate; used for dur. calcs */ uint16_t lpAckDuration; /* long preamble ACK duration */ uint16_t spAckDuration; /* short preamble ACK duration*/ } info[64]; } HAL_RATE_TABLE; typedef struct { u_int rs_count; /* number of valid entries */ uint8_t rs_rates[64]; /* rates */ } HAL_RATE_SET; /* * 802.11n specific structures and enums */ typedef enum { HAL_CHAINTYPE_TX = 1, /* Tx chain type */ HAL_CHAINTYPE_RX = 2, /* RX chain type */ } HAL_CHAIN_TYPE; typedef struct { u_int Tries; u_int Rate; /* hardware rate code */ u_int RateIndex; /* rate series table index */ u_int PktDuration; u_int ChSel; u_int RateFlags; #define HAL_RATESERIES_RTS_CTS 0x0001 /* use rts/cts w/this series */ #define HAL_RATESERIES_2040 0x0002 /* use ext channel for series */ #define HAL_RATESERIES_HALFGI 0x0004 /* use half-gi for series */ #define HAL_RATESERIES_STBC 0x0008 /* use STBC for series */ u_int tx_power_cap; /* in 1/2 dBm units XXX TODO */ } HAL_11N_RATE_SERIES; typedef enum { HAL_HT_MACMODE_20 = 0, /* 20 MHz operation */ HAL_HT_MACMODE_2040 = 1, /* 20/40 MHz operation */ } HAL_HT_MACMODE; typedef enum { HAL_HT_PHYMODE_20 = 0, /* 20 MHz operation */ HAL_HT_PHYMODE_2040 = 1, /* 20/40 MHz operation */ } HAL_HT_PHYMODE; typedef enum { HAL_HT_EXTPROTSPACING_20 = 0, /* 20 MHz spacing */ HAL_HT_EXTPROTSPACING_25 = 1, /* 25 MHz spacing */ } HAL_HT_EXTPROTSPACING; typedef enum { HAL_RX_CLEAR_CTL_LOW = 0x1, /* force control channel to appear busy */ HAL_RX_CLEAR_EXT_LOW = 0x2, /* force extension channel to appear busy */ } HAL_HT_RXCLEAR; typedef enum { HAL_FREQ_BAND_5GHZ = 0, HAL_FREQ_BAND_2GHZ = 1, } HAL_FREQ_BAND; /* * Antenna switch control. By default antenna selection * enables multiple (2) antenna use. To force use of the * A or B antenna only specify a fixed setting. Fixing * the antenna will also disable any diversity support. 
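 *
 * e.g. forcing antenna A (and implicitly disabling diversity) is
 * simply:
 *
 *   ah->ah_setAntennaSwitch(ah, HAL_ANT_FIXED_A);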
*/ typedef enum { HAL_ANT_VARIABLE = 0, /* variable by programming */ HAL_ANT_FIXED_A = 1, /* fixed antenna A */ HAL_ANT_FIXED_B = 2, /* fixed antenna B */ } HAL_ANT_SETTING; typedef enum { HAL_M_STA = 1, /* infrastructure station */ HAL_M_IBSS = 0, /* IBSS (adhoc) station */ HAL_M_HOSTAP = 6, /* Software Access Point */ HAL_M_MONITOR = 8 /* Monitor mode */ } HAL_OPMODE; typedef enum { HAL_RESET_NORMAL = 0, /* Do normal reset */ HAL_RESET_BBPANIC = 1, /* Reset because of BB panic */ HAL_RESET_FORCE_COLD = 2, /* Force full reset */ } HAL_RESET_TYPE; typedef struct { uint8_t kv_type; /* one of HAL_CIPHER */ uint8_t kv_apsd; /* Mask for APSD enabled ACs */ uint16_t kv_len; /* length in bits */ uint8_t kv_val[16]; /* enough for 128-bit keys */ uint8_t kv_mic[8]; /* TKIP MIC key */ uint8_t kv_txmic[8]; /* TKIP TX MIC key (optional) */ } HAL_KEYVAL; /* * This is the TX descriptor field which marks the key padding requirement. * The naming is unfortunately unclear. */ #define AH_KEYTYPE_MASK 0x0F typedef enum { HAL_KEY_TYPE_CLEAR, HAL_KEY_TYPE_WEP, HAL_KEY_TYPE_AES, HAL_KEY_TYPE_TKIP, } HAL_KEY_TYPE; typedef enum { HAL_CIPHER_WEP = 0, HAL_CIPHER_AES_OCB = 1, HAL_CIPHER_AES_CCM = 2, HAL_CIPHER_CKIP = 3, HAL_CIPHER_TKIP = 4, HAL_CIPHER_CLR = 5, /* no encryption */ HAL_CIPHER_MIC = 127 /* TKIP-MIC, not a cipher */ } HAL_CIPHER; enum { HAL_SLOT_TIME_6 = 6, /* NB: for turbo mode */ HAL_SLOT_TIME_9 = 9, HAL_SLOT_TIME_20 = 20, }; /* * Per-station beacon timer state. Note that the specified * beacon interval (given in TU's) can also include flags * to force a TSF reset and to enable the beacon xmit logic. * If bs_cfpmaxduration is non-zero the hardware is setup to * coexist with a PCF-capable AP. */ typedef struct { uint32_t bs_nexttbtt; /* next beacon in TU */ uint32_t bs_nextdtim; /* next DTIM in TU */ uint32_t bs_intval; /* beacon interval+flags */ /* * HAL_BEACON_PERIOD, HAL_BEACON_ENA and HAL_BEACON_RESET_TSF * are all 1:1 correspondences with the pre-11n chip AR_BEACON * register. */ #define HAL_BEACON_PERIOD 0x0000ffff /* beacon interval period */ #define HAL_BEACON_PERIOD_TU8 0x0007ffff /* beacon interval, tu/8 */ #define HAL_BEACON_ENA 0x00800000 /* beacon xmit enable */ #define HAL_BEACON_RESET_TSF 0x01000000 /* clear TSF */ #define HAL_TSFOOR_THRESHOLD 0x00004240 /* TSF OOR thresh (16k uS) */ uint32_t bs_dtimperiod; uint16_t bs_cfpperiod; /* CFP period in TU */ uint16_t bs_cfpmaxduration; /* max CFP duration in TU */ uint32_t bs_cfpnext; /* next CFP in TU */ uint16_t bs_timoffset; /* byte offset to TIM bitmap */ uint16_t bs_bmissthreshold; /* beacon miss threshold */ uint32_t bs_sleepduration; /* max sleep duration */ uint32_t bs_tsfoor_threshold; /* TSF out of range threshold */ } HAL_BEACON_STATE; /* * Like HAL_BEACON_STATE but for non-station mode setup. * NB: see above flag definitions for bt_intval. */ typedef struct { uint32_t bt_intval; /* beacon interval+flags */ uint32_t bt_nexttbtt; /* next beacon in TU */ uint32_t bt_nextatim; /* next ATIM in TU */ uint32_t bt_nextdba; /* next DBA in 1/8th TU */ uint32_t bt_nextswba; /* next SWBA in 1/8th TU */ uint32_t bt_flags; /* timer enables */ #define HAL_BEACON_TBTT_EN 0x00000001 #define HAL_BEACON_DBA_EN 0x00000002 #define HAL_BEACON_SWBA_EN 0x00000004 } HAL_BEACON_TIMERS; /* * Per-node statistics maintained by the driver for use in * optimizing signal quality and other operational aspects.
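 *
 * The averages are expected to be kept scaled by
 * HAL_RSSI_EP_MULTIPLIER (defined below) so the exponential filter
 * avoids multiplies and divides; a consumer would then recover an
 * approximate RSSI with something like (a sketch, rounding omitted):
 *
 *   rssi = ns->ns_avgbrssi / HAL_RSSI_EP_MULTIPLIER;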
*/ typedef struct { uint32_t ns_avgbrssi; /* average beacon rssi */ uint32_t ns_avgrssi; /* average data rssi */ uint32_t ns_avgtxrssi; /* average tx rssi */ } HAL_NODE_STATS; #define HAL_RSSI_EP_MULTIPLIER (1<<7) /* pow2 to optimize out * and / */ /* * This is the ANI state and MIB stats. * * It's used by the HAL modules to keep state /and/ by the debug ioctl * to fetch ANI information. */ typedef struct { uint32_t ast_ani_niup; /* ANI increased noise immunity */ uint32_t ast_ani_nidown; /* ANI decreased noise immunity */ uint32_t ast_ani_spurup; /* ANI increased spur immunity */ uint32_t ast_ani_spurdown;/* ANI decreased spur immunity */ uint32_t ast_ani_ofdmon; /* ANI OFDM weak signal detect on */ uint32_t ast_ani_ofdmoff;/* ANI OFDM weak signal detect off */ uint32_t ast_ani_cckhigh;/* ANI CCK weak signal threshold high */ uint32_t ast_ani_ccklow; /* ANI CCK weak signal threshold low */ uint32_t ast_ani_stepup; /* ANI increased first step level */ uint32_t ast_ani_stepdown;/* ANI decreased first step level */ uint32_t ast_ani_ofdmerrs;/* ANI cumulative ofdm phy err count */ uint32_t ast_ani_cckerrs;/* ANI cumulative cck phy err count */ uint32_t ast_ani_reset; /* ANI parameters zero'd for non-STA */ uint32_t ast_ani_lzero; /* ANI listen time forced to zero */ uint32_t ast_ani_lneg; /* ANI listen time calculated < 0 */ HAL_MIB_STATS ast_mibstats; /* MIB counter stats */ HAL_NODE_STATS ast_nodestats; /* Latest rssi stats from driver */ } HAL_ANI_STATS; typedef struct { uint8_t noiseImmunityLevel; uint8_t spurImmunityLevel; uint8_t firstepLevel; uint8_t ofdmWeakSigDetectOff; uint8_t cckWeakSigThreshold; uint32_t listenTime; /* NB: intentionally ordered so data exported to user space is first */ uint32_t txFrameCount; /* Last txFrameCount */ uint32_t rxFrameCount; /* Last rx Frame count */ uint32_t cycleCount; /* Last cycleCount (to detect wrap-around) */ uint32_t ofdmPhyErrCount;/* OFDM err count since last reset */ uint32_t cckPhyErrCount; /* CCK err count since last reset */ } HAL_ANI_STATE; struct ath_desc; struct ath_tx_status; struct ath_rx_status; struct ieee80211_channel; /* * This is a channel survey sample entry. * * The AR5212 ANI routines fill these samples. The ANI code then uses them * when calculating listen time; they are also exported via a diagnostic * API. */ typedef struct { uint32_t seq_num; uint32_t tx_busy; uint32_t rx_busy; uint32_t chan_busy; uint32_t ext_chan_busy; uint32_t cycle_count; /* XXX TODO */ uint32_t ofdm_phyerr_count; uint32_t cck_phyerr_count; } HAL_SURVEY_SAMPLE; /* * This provides 3.2 seconds of sample space given an * ANI time of 1/10th of a second. This may not be enough! */ #define CHANNEL_SURVEY_SAMPLE_COUNT 32 typedef struct { HAL_SURVEY_SAMPLE samples[CHANNEL_SURVEY_SAMPLE_COUNT]; uint32_t cur_sample; /* current sample in sequence */ uint32_t cur_seq; /* current sequence number */ } HAL_CHANNEL_SURVEY; /* * ANI commands. * * These are used both internally and externally via the diagnostic * API. * * Note that these are NOT the ANI commands used via the INTMIT * capability - that has a different mapping for some reason.
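 *
 * For example (a sketch, not the driver's literal code), a driver
 * toggling interference mitigation goes through the capability
 * interface rather than issuing HAL_ANI_CMD values directly:
 *
 *   ah->ah_setCapability(ah, HAL_CAP_INTMIT,
 *       HAL_CAP_INTMIT_ENABLE, 1, AH_NULL);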
*/ typedef enum { HAL_ANI_PRESENT = 0, /* is ANI support present */ HAL_ANI_NOISE_IMMUNITY_LEVEL = 1, /* set level */ HAL_ANI_OFDM_WEAK_SIGNAL_DETECTION = 2, /* enable/disable */ HAL_ANI_CCK_WEAK_SIGNAL_THR = 3, /* enable/disable */ HAL_ANI_FIRSTEP_LEVEL = 4, /* set level */ HAL_ANI_SPUR_IMMUNITY_LEVEL = 5, /* set level */ HAL_ANI_MODE = 6, /* 0 => manual, 1 => auto (XXX do not change) */ HAL_ANI_PHYERR_RESET = 7, /* reset phy error stats */ HAL_ANI_MRC_CCK = 8, } HAL_ANI_CMD; #define HAL_ANI_ALL 0xffffffff /* * This is the layout of the ANI INTMIT capability. * * Notice that the command values differ from HAL_ANI_CMD. */ typedef enum { HAL_CAP_INTMIT_PRESENT = 0, HAL_CAP_INTMIT_ENABLE = 1, HAL_CAP_INTMIT_NOISE_IMMUNITY_LEVEL = 2, HAL_CAP_INTMIT_OFDM_WEAK_SIGNAL_LEVEL = 3, HAL_CAP_INTMIT_CCK_WEAK_SIGNAL_THR = 4, HAL_CAP_INTMIT_FIRSTEP_LEVEL = 5, HAL_CAP_INTMIT_SPUR_IMMUNITY_LEVEL = 6 } HAL_CAP_INTMIT_CMD; typedef struct { int32_t pe_firpwr; /* FIR pwr out threshold */ int32_t pe_rrssi; /* Radar rssi thresh */ int32_t pe_height; /* Pulse height thresh */ int32_t pe_prssi; /* Pulse rssi thresh */ int32_t pe_inband; /* Inband thresh */ /* The following params are only for AR5413 and later */ u_int32_t pe_relpwr; /* Relative power threshold in 0.5dB steps */ u_int32_t pe_relstep; /* Pulse Relative step threshold in 0.5dB steps */ u_int32_t pe_maxlen; /* Max length of radar signal in 0.8us units */ int32_t pe_usefir128; /* Use the average in-band power measured over 128 cycles */ int32_t pe_blockradar; /* * Enable to block radar check if pkt detect is done via OFDM * weak signal detect or pkt is detected immediately after tx * to rx transition */ int32_t pe_enmaxrssi; /* * Enable to use the max rssi instead of the last rssi during * fine gain changes for radar detection */ int32_t pe_extchannel; /* Enable DFS on ext channel */ int32_t pe_enabled; /* Whether radar detection is enabled */ int32_t pe_enrelpwr; int32_t pe_en_relstep_check; } HAL_PHYERR_PARAM; #define HAL_PHYERR_PARAM_NOVAL 65535 typedef struct { u_int16_t ss_fft_period; /* Skip interval for FFT reports */ u_int16_t ss_period; /* Spectral scan period */ u_int16_t ss_count; /* # of reports to return from ss_active */ u_int16_t ss_short_report;/* Set to report only 1 set of FFT results */ u_int8_t radar_bin_thresh_sel; /* strong signal radar FFT threshold configuration */ u_int16_t ss_spectral_pri; /* are we doing a noise power cal ? */ int8_t ss_nf_cal[AH_MAX_CHAINS*2]; /* nf calibrated values for ctl+ext from eeprom */ int8_t ss_nf_pwr[AH_MAX_CHAINS*2]; /* nf pwr values for ctl+ext from eeprom */ int32_t ss_nf_temp_data; /* temperature data taken during nf scan */ int ss_enabled; int ss_active; } HAL_SPECTRAL_PARAM; #define HAL_SPECTRAL_PARAM_NOVAL 0xFFFF #define HAL_SPECTRAL_PARAM_ENABLE 0x8000 /* Enable/Disable if applicable */ /* * DFS operating mode flags. */ typedef enum { HAL_DFS_UNINIT_DOMAIN = 0, /* Uninitialized dfs domain */ HAL_DFS_FCC_DOMAIN = 1, /* FCC3 dfs domain */ HAL_DFS_ETSI_DOMAIN = 2, /* ETSI dfs domain */ HAL_DFS_MKK4_DOMAIN = 3, /* Japan dfs domain */ } HAL_DFS_DOMAIN; /* * MFP decryption options for initializing the MAC. */ typedef enum { HAL_MFP_QOSDATA = 0, /* Decrypt MFP frames like QoS data frames. All chips before Merlin. */ HAL_MFP_PASSTHRU, /* Don't decrypt MFP frames at all. Passthrough */ HAL_MFP_HW_CRYPTO /* hardware decryption enabled. Merlin can do it.
*/ } HAL_MFP_OPT_T; /* LNA config supported */ typedef enum { HAL_ANT_DIV_COMB_LNA1_MINUS_LNA2 = 0, HAL_ANT_DIV_COMB_LNA2 = 1, HAL_ANT_DIV_COMB_LNA1 = 2, HAL_ANT_DIV_COMB_LNA1_PLUS_LNA2 = 3, } HAL_ANT_DIV_COMB_LNA_CONF; typedef struct { u_int8_t main_lna_conf; u_int8_t alt_lna_conf; u_int8_t fast_div_bias; u_int8_t main_gaintb; u_int8_t alt_gaintb; u_int8_t antdiv_configgroup; int8_t lna1_lna2_delta; } HAL_ANT_COMB_CONFIG; #define DEFAULT_ANTDIV_CONFIG_GROUP 0x00 #define HAL_ANTDIV_CONFIG_GROUP_1 0x01 #define HAL_ANTDIV_CONFIG_GROUP_2 0x02 #define HAL_ANTDIV_CONFIG_GROUP_3 0x03 /* * Flag for setting QUIET period */ typedef enum { HAL_QUIET_DISABLE = 0x0, HAL_QUIET_ENABLE = 0x1, HAL_QUIET_ADD_CURRENT_TSF = 0x2, /* add current TSF to next_start offset */ HAL_QUIET_ADD_SWBA_RESP_TIME = 0x4, /* add beacon response time to next_start offset */ } HAL_QUIET_FLAG; #define HAL_DFS_EVENT_PRICH 0x0000001 #define HAL_DFS_EVENT_EXTCH 0x0000002 #define HAL_DFS_EVENT_EXTEARLY 0x0000004 #define HAL_DFS_EVENT_ISDC 0x0000008 struct hal_dfs_event { uint64_t re_full_ts; /* 64-bit full timestamp from interrupt time */ uint32_t re_ts; /* Original 15 bit recv timestamp */ uint8_t re_rssi; /* rssi of radar event */ uint8_t re_dur; /* duration of radar pulse */ uint32_t re_flags; /* Flags (see above) */ }; typedef struct hal_dfs_event HAL_DFS_EVENT; /* * Generic Timer domain */ typedef enum { HAL_GEN_TIMER_TSF = 0, HAL_GEN_TIMER_TSF2, HAL_GEN_TIMER_TSF_ANY } HAL_GEN_TIMER_DOMAIN; /* * BT Co-existence definitions */ #include "ath_hal/ah_btcoex.h" struct hal_bb_panic_info { u_int32_t status; u_int32_t tsf; u_int32_t phy_panic_wd_ctl1; u_int32_t phy_panic_wd_ctl2; u_int32_t phy_gen_ctrl; u_int32_t rxc_pcnt; u_int32_t rxf_pcnt; u_int32_t txf_pcnt; u_int32_t cycles; u_int32_t wd; u_int32_t det; u_int32_t rdar; u_int32_t r_odfm; u_int32_t r_cck; u_int32_t t_odfm; u_int32_t t_cck; u_int32_t agc; u_int32_t src; }; /* Serialize Register Access Mode */ typedef enum { SER_REG_MODE_OFF = 0, SER_REG_MODE_ON = 1, SER_REG_MODE_AUTO = 2, } SER_REG_MODE; typedef struct { int ah_debug; /* only used if AH_DEBUG is defined */ int ah_ar5416_biasadj; /* enable AR2133 radio specific bias fiddling */ /* NB: these are deprecated; they exist for now for compatibility */ int ah_dma_beacon_response_time;/* in TU's */ int ah_sw_beacon_response_time; /* in TU's */ int ah_additional_swba_backoff; /* in TU's */ int ah_force_full_reset; /* force full chip reset rather than warm reset */ int ah_serialise_reg_war; /* force serialisation of register IO */ /* XXX these don't belong here, they're just for the ar9300 HAL port effort */ int ath_hal_desc_tpc; /* Per-packet TPC */ int ath_hal_sta_update_tx_pwr_enable; /* GreenTX */ int ath_hal_sta_update_tx_pwr_enable_S1; /* GreenTX */ int ath_hal_sta_update_tx_pwr_enable_S2; /* GreenTX */ int ath_hal_sta_update_tx_pwr_enable_S3; /* GreenTX */ /* I'm not sure what the default values for these should be */ int ath_hal_pll_pwr_save; int ath_hal_pcie_power_save_enable; int ath_hal_intr_mitigation_rx; int ath_hal_intr_mitigation_tx; int ath_hal_pcie_clock_req; #define AR_PCIE_PLL_PWRSAVE_CONTROL (1<<0) #define AR_PCIE_PLL_PWRSAVE_ON_D3 (1<<1) #define AR_PCIE_PLL_PWRSAVE_ON_D0 (1<<2) int ath_hal_pcie_waen; int ath_hal_pcie_ser_des_write; /* these are important for correct AR9300 behaviour */ int ath_hal_ht_enable; /* needs to be enabled for AR9300 HT */ int ath_hal_diversity_control; int ath_hal_antenna_switch_swap; int ath_hal_ext_lna_ctl_gpio; int ath_hal_spur_mode; int ath_hal_6mb_ack; /* should set this to
1 for 11a/11na? */ int ath_hal_enable_msi; /* enable MSI interrupts (needed?) */ int ath_hal_beacon_filter_interval; /* ok to be 0 for now? */ /* For now, set this to 0 - net80211 needs to know about hardware MFP support */ int ath_hal_mfp_support; int ath_hal_enable_ani; /* should set this.. */ int ath_hal_cwm_ignore_ext_cca; int ath_hal_show_bb_panic; int ath_hal_ant_ctrl_comm2g_switch_enable; int ath_hal_ext_atten_margin_cfg; int ath_hal_min_gainidx; int ath_hal_war70c; uint32_t ath_hal_mci_config; } HAL_OPS_CONFIG; /* * Hardware Access Layer (HAL) API. * * Clients of the HAL call ath_hal_attach to obtain a reference to an * ath_hal structure for use with the device. Hardware-related operations * that follow must call back into the HAL through interface, supplying * the reference as the first parameter. Note that before using the * reference returned by ath_hal_attach the caller should verify the * ABI version number. */ struct ath_hal { uint32_t ah_magic; /* consistency check magic number */ uint16_t ah_devid; /* PCI device ID */ uint16_t ah_subvendorid; /* PCI subvendor ID */ HAL_SOFTC ah_sc; /* back pointer to driver/os state */ HAL_BUS_TAG ah_st; /* params for register r+w */ HAL_BUS_HANDLE ah_sh; HAL_CTRY_CODE ah_countryCode; uint32_t ah_macVersion; /* MAC version id */ uint16_t ah_macRev; /* MAC revision */ uint16_t ah_phyRev; /* PHY revision */ /* NB: when only one radio is present the rev is in 5Ghz */ uint16_t ah_analog5GhzRev;/* 5GHz radio revision */ uint16_t ah_analog2GhzRev;/* 2GHz radio revision */ uint16_t *ah_eepromdata; /* eeprom buffer, if needed */ uint32_t ah_intrstate[8]; /* last int state */ uint32_t ah_syncstate; /* last sync intr state */ /* Current powerstate from HAL calls */ HAL_POWER_MODE ah_powerMode; HAL_OPS_CONFIG ah_config; const HAL_RATE_TABLE *__ahdecl(*ah_getRateTable)(struct ath_hal *, u_int mode); void __ahdecl(*ah_detach)(struct ath_hal*); /* Reset functions */ HAL_BOOL __ahdecl(*ah_reset)(struct ath_hal *, HAL_OPMODE, struct ieee80211_channel *, HAL_BOOL bChannelChange, HAL_RESET_TYPE resetType, HAL_STATUS *status); HAL_BOOL __ahdecl(*ah_phyDisable)(struct ath_hal *); HAL_BOOL __ahdecl(*ah_disable)(struct ath_hal *); void __ahdecl(*ah_configPCIE)(struct ath_hal *, HAL_BOOL restore, HAL_BOOL power_off); void __ahdecl(*ah_disablePCIE)(struct ath_hal *); void __ahdecl(*ah_setPCUConfig)(struct ath_hal *); HAL_BOOL __ahdecl(*ah_perCalibration)(struct ath_hal*, struct ieee80211_channel *, HAL_BOOL *); HAL_BOOL __ahdecl(*ah_perCalibrationN)(struct ath_hal *, struct ieee80211_channel *, u_int chainMask, HAL_BOOL longCal, HAL_BOOL *isCalDone); HAL_BOOL __ahdecl(*ah_resetCalValid)(struct ath_hal *, const struct ieee80211_channel *); HAL_BOOL __ahdecl(*ah_setTxPower)(struct ath_hal *, const struct ieee80211_channel *, uint16_t *); HAL_BOOL __ahdecl(*ah_setTxPowerLimit)(struct ath_hal *, uint32_t); HAL_BOOL __ahdecl(*ah_setBoardValues)(struct ath_hal *, const struct ieee80211_channel *); /* Transmit functions */ HAL_BOOL __ahdecl(*ah_updateTxTrigLevel)(struct ath_hal*, HAL_BOOL incTrigLevel); int __ahdecl(*ah_setupTxQueue)(struct ath_hal *, HAL_TX_QUEUE, const HAL_TXQ_INFO *qInfo); HAL_BOOL __ahdecl(*ah_setTxQueueProps)(struct ath_hal *, int q, const HAL_TXQ_INFO *qInfo); HAL_BOOL __ahdecl(*ah_getTxQueueProps)(struct ath_hal *, int q, HAL_TXQ_INFO *qInfo); HAL_BOOL __ahdecl(*ah_releaseTxQueue)(struct ath_hal *ah, u_int q); HAL_BOOL __ahdecl(*ah_resetTxQueue)(struct ath_hal *ah, u_int q); uint32_t __ahdecl(*ah_getTxDP)(struct ath_hal*, u_int); HAL_BOOL 
__ahdecl(*ah_setTxDP)(struct ath_hal*, u_int, uint32_t txdp); uint32_t __ahdecl(*ah_numTxPending)(struct ath_hal *, u_int q); HAL_BOOL __ahdecl(*ah_startTxDma)(struct ath_hal*, u_int); HAL_BOOL __ahdecl(*ah_stopTxDma)(struct ath_hal*, u_int); HAL_BOOL __ahdecl(*ah_setupTxDesc)(struct ath_hal *, struct ath_desc *, u_int pktLen, u_int hdrLen, HAL_PKT_TYPE type, u_int txPower, u_int txRate0, u_int txTries0, u_int keyIx, u_int antMode, u_int flags, u_int rtsctsRate, u_int rtsctsDuration, u_int compicvLen, u_int compivLen, u_int comp); HAL_BOOL __ahdecl(*ah_setupXTxDesc)(struct ath_hal *, struct ath_desc*, u_int txRate1, u_int txTries1, u_int txRate2, u_int txTries2, u_int txRate3, u_int txTries3); HAL_BOOL __ahdecl(*ah_fillTxDesc)(struct ath_hal *, struct ath_desc *, HAL_DMA_ADDR *bufAddrList, uint32_t *segLenList, u_int descId, u_int qcuId, HAL_BOOL firstSeg, HAL_BOOL lastSeg, const struct ath_desc *); HAL_STATUS __ahdecl(*ah_procTxDesc)(struct ath_hal *, struct ath_desc *, struct ath_tx_status *); void __ahdecl(*ah_getTxIntrQueue)(struct ath_hal *, uint32_t *); void __ahdecl(*ah_reqTxIntrDesc)(struct ath_hal *, struct ath_desc*); HAL_BOOL __ahdecl(*ah_getTxCompletionRates)(struct ath_hal *, const struct ath_desc *ds, int *rates, int *tries); void __ahdecl(*ah_setTxDescLink)(struct ath_hal *ah, void *ds, uint32_t link); void __ahdecl(*ah_getTxDescLink)(struct ath_hal *ah, void *ds, uint32_t *link); void __ahdecl(*ah_getTxDescLinkPtr)(struct ath_hal *ah, void *ds, uint32_t **linkptr); void __ahdecl(*ah_setupTxStatusRing)(struct ath_hal *, void *ts_start, uint32_t ts_paddr_start, uint16_t size); void __ahdecl(*ah_getTxRawTxDesc)(struct ath_hal *, u_int32_t *); /* Receive Functions */ uint32_t __ahdecl(*ah_getRxDP)(struct ath_hal*, HAL_RX_QUEUE); void __ahdecl(*ah_setRxDP)(struct ath_hal*, uint32_t rxdp, HAL_RX_QUEUE); void __ahdecl(*ah_enableReceive)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_stopDmaReceive)(struct ath_hal*); void __ahdecl(*ah_startPcuReceive)(struct ath_hal*); void __ahdecl(*ah_stopPcuReceive)(struct ath_hal*); void __ahdecl(*ah_setMulticastFilter)(struct ath_hal*, uint32_t filter0, uint32_t filter1); HAL_BOOL __ahdecl(*ah_setMulticastFilterIndex)(struct ath_hal*, uint32_t index); HAL_BOOL __ahdecl(*ah_clrMulticastFilterIndex)(struct ath_hal*, uint32_t index); uint32_t __ahdecl(*ah_getRxFilter)(struct ath_hal*); void __ahdecl(*ah_setRxFilter)(struct ath_hal*, uint32_t); HAL_BOOL __ahdecl(*ah_setupRxDesc)(struct ath_hal *, struct ath_desc *, uint32_t size, u_int flags); HAL_STATUS __ahdecl(*ah_procRxDesc)(struct ath_hal *, struct ath_desc *, uint32_t phyAddr, struct ath_desc *next, uint64_t tsf, struct ath_rx_status *); void __ahdecl(*ah_rxMonitor)(struct ath_hal *, const HAL_NODE_STATS *, const struct ieee80211_channel *); void __ahdecl(*ah_aniPoll)(struct ath_hal *, const struct ieee80211_channel *); void __ahdecl(*ah_procMibEvent)(struct ath_hal *, const HAL_NODE_STATS *); /* Misc Functions */ HAL_STATUS __ahdecl(*ah_getCapability)(struct ath_hal *, HAL_CAPABILITY_TYPE, uint32_t capability, uint32_t *result); HAL_BOOL __ahdecl(*ah_setCapability)(struct ath_hal *, HAL_CAPABILITY_TYPE, uint32_t capability, uint32_t setting, HAL_STATUS *); HAL_BOOL __ahdecl(*ah_getDiagState)(struct ath_hal *, int request, const void *args, uint32_t argsize, void **result, uint32_t *resultsize); void __ahdecl(*ah_getMacAddress)(struct ath_hal *, uint8_t *); HAL_BOOL __ahdecl(*ah_setMacAddress)(struct ath_hal *, const uint8_t*); void __ahdecl(*ah_getBssIdMask)(struct ath_hal *, uint8_t *); HAL_BOOL 
__ahdecl(*ah_setBssIdMask)(struct ath_hal *, const uint8_t*); HAL_BOOL __ahdecl(*ah_setRegulatoryDomain)(struct ath_hal*, uint16_t, HAL_STATUS *); void __ahdecl(*ah_setLedState)(struct ath_hal*, HAL_LED_STATE); void __ahdecl(*ah_writeAssocid)(struct ath_hal*, const uint8_t *bssid, uint16_t assocId); HAL_BOOL __ahdecl(*ah_gpioCfgOutput)(struct ath_hal *, uint32_t gpio, HAL_GPIO_MUX_TYPE); HAL_BOOL __ahdecl(*ah_gpioCfgInput)(struct ath_hal *, uint32_t gpio); uint32_t __ahdecl(*ah_gpioGet)(struct ath_hal *, uint32_t gpio); HAL_BOOL __ahdecl(*ah_gpioSet)(struct ath_hal *, uint32_t gpio, uint32_t val); void __ahdecl(*ah_gpioSetIntr)(struct ath_hal*, u_int, uint32_t); uint32_t __ahdecl(*ah_getTsf32)(struct ath_hal*); uint64_t __ahdecl(*ah_getTsf64)(struct ath_hal*); void __ahdecl(*ah_setTsf64)(struct ath_hal *, uint64_t); void __ahdecl(*ah_resetTsf)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_detectCardPresent)(struct ath_hal*); void __ahdecl(*ah_updateMibCounters)(struct ath_hal*, HAL_MIB_STATS*); HAL_RFGAIN __ahdecl(*ah_getRfGain)(struct ath_hal*); u_int __ahdecl(*ah_getDefAntenna)(struct ath_hal*); void __ahdecl(*ah_setDefAntenna)(struct ath_hal*, u_int); HAL_ANT_SETTING __ahdecl(*ah_getAntennaSwitch)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setAntennaSwitch)(struct ath_hal*, HAL_ANT_SETTING); HAL_BOOL __ahdecl(*ah_setSifsTime)(struct ath_hal*, u_int); u_int __ahdecl(*ah_getSifsTime)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setSlotTime)(struct ath_hal*, u_int); u_int __ahdecl(*ah_getSlotTime)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setAckTimeout)(struct ath_hal*, u_int); u_int __ahdecl(*ah_getAckTimeout)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setAckCTSRate)(struct ath_hal*, u_int); u_int __ahdecl(*ah_getAckCTSRate)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setCTSTimeout)(struct ath_hal*, u_int); u_int __ahdecl(*ah_getCTSTimeout)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_setDecompMask)(struct ath_hal*, uint16_t, int); void __ahdecl(*ah_setCoverageClass)(struct ath_hal*, uint8_t, int); HAL_STATUS __ahdecl(*ah_setQuiet)(struct ath_hal *ah, uint32_t period, uint32_t duration, uint32_t nextStart, HAL_QUIET_FLAG flag); void __ahdecl(*ah_setChainMasks)(struct ath_hal *, uint32_t, uint32_t); /* DFS functions */ void __ahdecl(*ah_enableDfs)(struct ath_hal *ah, HAL_PHYERR_PARAM *pe); void __ahdecl(*ah_getDfsThresh)(struct ath_hal *ah, HAL_PHYERR_PARAM *pe); HAL_BOOL __ahdecl(*ah_getDfsDefaultThresh)(struct ath_hal *ah, HAL_PHYERR_PARAM *pe); HAL_BOOL __ahdecl(*ah_procRadarEvent)(struct ath_hal *ah, struct ath_rx_status *rxs, uint64_t fulltsf, const char *buf, HAL_DFS_EVENT *event); HAL_BOOL __ahdecl(*ah_isFastClockEnabled)(struct ath_hal *ah); /* Spectral Scan functions */ void __ahdecl(*ah_spectralConfigure)(struct ath_hal *ah, HAL_SPECTRAL_PARAM *sp); void __ahdecl(*ah_spectralGetConfig)(struct ath_hal *ah, HAL_SPECTRAL_PARAM *sp); void __ahdecl(*ah_spectralStart)(struct ath_hal *); void __ahdecl(*ah_spectralStop)(struct ath_hal *); HAL_BOOL __ahdecl(*ah_spectralIsEnabled)(struct ath_hal *); HAL_BOOL __ahdecl(*ah_spectralIsActive)(struct ath_hal *); /* XXX getNfPri() and getNfExt() */ /* Key Cache Functions */ uint32_t __ahdecl(*ah_getKeyCacheSize)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_resetKeyCacheEntry)(struct ath_hal*, uint16_t); HAL_BOOL __ahdecl(*ah_isKeyCacheEntryValid)(struct ath_hal *, uint16_t); HAL_BOOL __ahdecl(*ah_setKeyCacheEntry)(struct ath_hal*, uint16_t, const HAL_KEYVAL *, const uint8_t *, int); HAL_BOOL __ahdecl(*ah_setKeyCacheEntryMac)(struct ath_hal*, uint16_t, const uint8_t *); /* Power 
Management Functions */ HAL_BOOL __ahdecl(*ah_setPowerMode)(struct ath_hal*, HAL_POWER_MODE mode, int setChip); HAL_POWER_MODE __ahdecl(*ah_getPowerMode)(struct ath_hal*); int16_t __ahdecl(*ah_getChanNoise)(struct ath_hal *, const struct ieee80211_channel *); /* Beacon Management Functions */ void __ahdecl(*ah_setBeaconTimers)(struct ath_hal*, const HAL_BEACON_TIMERS *); /* NB: deprecated, use ah_setBeaconTimers instead */ void __ahdecl(*ah_beaconInit)(struct ath_hal *, uint32_t nexttbtt, uint32_t intval); void __ahdecl(*ah_setStationBeaconTimers)(struct ath_hal*, const HAL_BEACON_STATE *); void __ahdecl(*ah_resetStationBeaconTimers)(struct ath_hal*); uint64_t __ahdecl(*ah_getNextTBTT)(struct ath_hal *); /* 802.11n Functions */ HAL_BOOL __ahdecl(*ah_chainTxDesc)(struct ath_hal *, struct ath_desc *, HAL_DMA_ADDR *bufAddrList, uint32_t *segLenList, u_int, u_int, HAL_PKT_TYPE, u_int, HAL_CIPHER, uint8_t, HAL_BOOL, HAL_BOOL, HAL_BOOL); HAL_BOOL __ahdecl(*ah_setupFirstTxDesc)(struct ath_hal *, struct ath_desc *, u_int, u_int, u_int, u_int, u_int, u_int, u_int, u_int); HAL_BOOL __ahdecl(*ah_setupLastTxDesc)(struct ath_hal *, struct ath_desc *, const struct ath_desc *); void __ahdecl(*ah_set11nRateScenario)(struct ath_hal *, struct ath_desc *, u_int, u_int, HAL_11N_RATE_SERIES [], u_int, u_int); /* * The next 4 (set11ntxdesc -> set11naggrlast) are specific * to the EDMA HAL. Descriptors are chained together by * using filltxdesc (not ChainTxDesc) and then setting the * aggregate flags appropriately using first/middle/last. */ void __ahdecl(*ah_set11nTxDesc)(struct ath_hal *, void *, u_int, HAL_PKT_TYPE, u_int, u_int, u_int); void __ahdecl(*ah_set11nAggrFirst)(struct ath_hal *, struct ath_desc *, u_int, u_int); void __ahdecl(*ah_set11nAggrMiddle)(struct ath_hal *, struct ath_desc *, u_int); void __ahdecl(*ah_set11nAggrLast)(struct ath_hal *, struct ath_desc *); void __ahdecl(*ah_clr11nAggr)(struct ath_hal *, struct ath_desc *); void __ahdecl(*ah_set11nBurstDuration)(struct ath_hal *, struct ath_desc *, u_int); void __ahdecl(*ah_set11nVirtMoreFrag)(struct ath_hal *, struct ath_desc *, u_int); HAL_BOOL __ahdecl(*ah_getMibCycleCounts) (struct ath_hal *, HAL_SURVEY_SAMPLE *); uint32_t __ahdecl(*ah_get11nExtBusy)(struct ath_hal *); void __ahdecl(*ah_set11nMac2040)(struct ath_hal *, HAL_HT_MACMODE); HAL_HT_RXCLEAR __ahdecl(*ah_get11nRxClear)(struct ath_hal *ah); void __ahdecl(*ah_set11nRxClear)(struct ath_hal *, HAL_HT_RXCLEAR); /* Interrupt functions */ HAL_BOOL __ahdecl(*ah_isInterruptPending)(struct ath_hal*); HAL_BOOL __ahdecl(*ah_getPendingInterrupts)(struct ath_hal*, HAL_INT*); HAL_INT __ahdecl(*ah_getInterrupts)(struct ath_hal*); HAL_INT __ahdecl(*ah_setInterrupts)(struct ath_hal*, HAL_INT); /* Bluetooth Coexistence functions */ void __ahdecl(*ah_btCoexSetInfo)(struct ath_hal *, HAL_BT_COEX_INFO *); void __ahdecl(*ah_btCoexSetConfig)(struct ath_hal *, HAL_BT_COEX_CONFIG *); void __ahdecl(*ah_btCoexSetQcuThresh)(struct ath_hal *, int); void __ahdecl(*ah_btCoexSetWeights)(struct ath_hal *, uint32_t); void __ahdecl(*ah_btCoexSetBmissThresh)(struct ath_hal *, uint32_t); void __ahdecl(*ah_btCoexSetParameter)(struct ath_hal *, uint32_t, uint32_t); void __ahdecl(*ah_btCoexDisable)(struct ath_hal *); int __ahdecl(*ah_btCoexEnable)(struct ath_hal *); /* Bluetooth MCI methods */ void __ahdecl(*ah_btMciSetup)(struct ath_hal *, uint32_t, void *, uint16_t, uint32_t); HAL_BOOL __ahdecl(*ah_btMciSendMessage)(struct ath_hal *, uint8_t, uint32_t, uint32_t *, uint8_t, HAL_BOOL, HAL_BOOL); uint32_t 
__ahdecl(*ah_btMciGetInterrupt)(struct ath_hal *, uint32_t *, uint32_t *); uint32_t __ahdecl(*ah_btMciState)(struct ath_hal *, uint32_t, uint32_t *); void __ahdecl(*ah_btMciDetach)(struct ath_hal *); /* LNA diversity configuration */ void __ahdecl(*ah_divLnaConfGet)(struct ath_hal *, HAL_ANT_COMB_CONFIG *); void __ahdecl(*ah_divLnaConfSet)(struct ath_hal *, HAL_ANT_COMB_CONFIG *); }; /* * Check the PCI vendor ID and device ID against Atheros' values * and return a printable description for any Atheros hardware. * AH_NULL is returned if the ID's do not describe Atheros hardware. */ extern const char *__ahdecl ath_hal_probe(uint16_t vendorid, uint16_t devid); /* * Attach the HAL for use with the specified device. The device is * defined by the PCI device ID. The caller provides an opaque pointer * to an upper-layer data structure (HAL_SOFTC) that is stored in the * HAL state block for later use. Hardware register accesses are done * using the specified bus tag and handle. On successful return a * reference to a state block is returned that must be supplied in all * subsequent HAL calls. Storage associated with this reference is * dynamically allocated and must be freed by calling the ah_detach * method when the client is done. If the attach operation fails a * null (AH_NULL) reference will be returned and a status code will * be returned if the status parameter is non-zero. */ extern struct ath_hal * __ahdecl ath_hal_attach(uint16_t devid, HAL_SOFTC, HAL_BUS_TAG, HAL_BUS_HANDLE, uint16_t *eepromdata, HAL_OPS_CONFIG *ah_config, HAL_STATUS* status); extern const char *ath_hal_mac_name(struct ath_hal *); extern const char *ath_hal_rf_name(struct ath_hal *); /* * Regulatory interfaces. Drivers should use ath_hal_init_channels to * request a set of channels for a particular country code and/or * regulatory domain. If CTRY_DEFAULT and SKU_NONE are specified then * this list is constructed according to the contents of the EEPROM. * ath_hal_getchannels acts similarly but does not alter the operating * state; this can be used to collect information for a particular * regulatory configuration. Finally ath_hal_set_channels installs a * channel list constructed outside the driver. The HAL will adopt the * channel list and setup internal state according to the specified * regulatory configuration (e.g. conformance test limits). * * For all interfaces the channel list is returned in the supplied array. * maxchans defines the maximum size of this array. nchans contains the * actual number of channels returned. If a problem occurred then a * status code != HAL_OK is returned. */ struct ieee80211_channel; /* * Return a list of channels according to the specified regulatory. */ extern HAL_STATUS __ahdecl ath_hal_getchannels(struct ath_hal *, struct ieee80211_channel *chans, u_int maxchans, int *nchans, u_int modeSelect, HAL_CTRY_CODE cc, HAL_REG_DOMAIN regDmn, HAL_BOOL enableExtendedChannels); /* * Return a list of channels and install it as the current operating * regulatory list. */ extern HAL_STATUS __ahdecl ath_hal_init_channels(struct ath_hal *, struct ieee80211_channel *chans, u_int maxchans, int *nchans, u_int modeSelect, HAL_CTRY_CODE cc, HAL_REG_DOMAIN rd, HAL_BOOL enableExtendedChannels); /* * Install the list of channels as the current operating regulatory * and setup related state according to the country code and sku. 
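 *
 * A minimal usage sketch (hypothetical driver code; the array size and
 * the HAL_MODE_ALL mode mask are illustrative assumptions, and error
 * handling is elided):
 *
 *	struct ieee80211_channel chans[128];
 *	int nchans;
 *
 *	if (ath_hal_init_channels(ah, chans, 128, &nchans, HAL_MODE_ALL,
 *	    CTRY_DEFAULT, SKU_NONE, AH_FALSE) != HAL_OK)
 *		... fail the attach; no usable channel list ...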
*/ extern HAL_STATUS __ahdecl ath_hal_set_channels(struct ath_hal *, struct ieee80211_channel *chans, int nchans, HAL_CTRY_CODE cc, HAL_REG_DOMAIN regDmn); /* * Fetch the ctl/ext noise floor values reported by a MIMO * radio. Returns 1 for valid results, 0 for invalid channel. */ extern int __ahdecl ath_hal_get_mimo_chan_noise(struct ath_hal *ah, const struct ieee80211_channel *chan, int16_t *nf_ctl, int16_t *nf_ext); /* * Calibrate noise floor data following a channel scan or similar. * This must be called prior retrieving noise floor data. */ extern void __ahdecl ath_hal_process_noisefloor(struct ath_hal *ah); /* * Return bit mask of wireless modes supported by the hardware. */ extern u_int __ahdecl ath_hal_getwirelessmodes(struct ath_hal*); /* * Get the HAL wireless mode for the given channel. */ extern int ath_hal_get_curmode(struct ath_hal *ah, const struct ieee80211_channel *chan); /* * Calculate the packet TX time for a legacy or 11n frame */ extern uint32_t __ahdecl ath_hal_pkt_txtime(struct ath_hal *ah, const HAL_RATE_TABLE *rates, uint32_t frameLen, uint16_t rateix, HAL_BOOL isht40, HAL_BOOL shortPreamble, HAL_BOOL includeSifs); /* * Calculate the duration of an 11n frame. */ extern uint32_t __ahdecl ath_computedur_ht(uint32_t frameLen, uint16_t rate, int streams, HAL_BOOL isht40, HAL_BOOL isShortGI); /* * Calculate the transmit duration of a legacy frame. */ extern uint16_t __ahdecl ath_hal_computetxtime(struct ath_hal *, const HAL_RATE_TABLE *rates, uint32_t frameLen, uint16_t rateix, HAL_BOOL shortPreamble, HAL_BOOL includeSifs); /* * Adjust the TSF. */ extern void __ahdecl ath_hal_adjusttsf(struct ath_hal *ah, int32_t tsfdelta); /* * Enable or disable CCA. */ void __ahdecl ath_hal_setcca(struct ath_hal *ah, int ena); /* * Get CCA setting. */ int __ahdecl ath_hal_getcca(struct ath_hal *ah); /* * Read EEPROM data from ah_eepromdata */ HAL_BOOL __ahdecl ath_hal_EepromDataRead(struct ath_hal *ah, u_int off, uint16_t *data); /* * For now, simply pass through MFP frames. */ static inline u_int32_t ath_hal_get_mfp_qos(struct ath_hal *ah) { //return AH_PRIVATE(ah)->ah_mfp_qos; return HAL_MFP_QOSDATA; } +/* + * Convert between microseconds and core system clocks. + */ +extern u_int ath_hal_mac_clks(struct ath_hal *ah, u_int usecs); +extern u_int ath_hal_mac_usec(struct ath_hal *ah, u_int clks); +extern uint64_t ath_hal_mac_psec(struct ath_hal *ah, u_int clks); + #endif /* _ATH_AH_H_ */ Index: projects/clang390-import/sys/dev/ath/ath_hal/ah_internal.h =================================================================== --- projects/clang390-import/sys/dev/ath/ath_hal/ah_internal.h (revision 305686) +++ projects/clang390-import/sys/dev/ath/ath_hal/ah_internal.h (revision 305687) @@ -1,1047 +1,1041 @@ /* * Copyright (c) 2002-2009 Sam Leffler, Errno Consulting * Copyright (c) 2002-2008 Atheros Communications, Inc. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * $FreeBSD$ */ #ifndef _ATH_AH_INTERAL_H_ #define _ATH_AH_INTERAL_H_ /* * Atheros Device Hardware Access Layer (HAL). * * Internal definitions. */ #define AH_NULL 0 #define AH_MIN(a,b) ((a)<(b)?(a):(b)) #define AH_MAX(a,b) ((a)>(b)?(a):(b)) #include #include "opt_ah.h" /* needed for AH_SUPPORT_AR5416 */ #ifndef AH_SUPPORT_AR5416 #define AH_SUPPORT_AR5416 1 #endif #ifndef NBBY #define NBBY 8 /* number of bits/byte */ #endif #ifndef roundup #define roundup(x, y) ((((x)+((y)-1))/(y))*(y)) /* to any y */ #endif #ifndef howmany #define howmany(x, y) (((x)+((y)-1))/(y)) #endif #ifndef offsetof #define offsetof(type, field) ((size_t)(&((type *)0)->field)) #endif typedef struct { uint32_t start; /* first register */ uint32_t end; /* ending register or zero */ } HAL_REGRANGE; typedef struct { uint32_t addr; /* register address/offset */ uint32_t value; /* value to write */ } HAL_REGWRITE; /* * Transmit power scale factor. * * NB: This is not public because we want to discourage the use of * scaling; folks should use the tx power limit interface. */ typedef enum { HAL_TP_SCALE_MAX = 0, /* no scaling (default) */ HAL_TP_SCALE_50 = 1, /* 50% of max (-3 dBm) */ HAL_TP_SCALE_25 = 2, /* 25% of max (-6 dBm) */ HAL_TP_SCALE_12 = 3, /* 12% of max (-9 dBm) */ HAL_TP_SCALE_MIN = 4, /* min, but still on */ } HAL_TP_SCALE; typedef enum { HAL_CAP_RADAR = 0, /* Radar capability */ HAL_CAP_AR = 1, /* AR capability */ } HAL_PHYDIAG_CAPS; /* * Enable/disable strong signal fast diversity */ #define HAL_CAP_STRONG_DIV 2 /* * Each chip or class of chips registers to offer support. */ struct ath_hal_chip { const char *name; const char *(*probe)(uint16_t vendorid, uint16_t devid); struct ath_hal *(*attach)(uint16_t devid, HAL_SOFTC, HAL_BUS_TAG, HAL_BUS_HANDLE, uint16_t *eepromdata, HAL_OPS_CONFIG *ah, HAL_STATUS *error); }; #ifndef AH_CHIP #define AH_CHIP(_name, _probe, _attach) \ static struct ath_hal_chip _name##_chip = { \ .name = #_name, \ .probe = _probe, \ .attach = _attach \ }; \ OS_DATA_SET(ah_chips, _name##_chip) #endif /* * Each RF backend registers to offer support; this is mostly * used by multi-chip 5212 solutions. Single-chip solutions * have a fixed idea about which RF to use. */ struct ath_hal_rf { const char *name; HAL_BOOL (*probe)(struct ath_hal *ah); HAL_BOOL (*attach)(struct ath_hal *ah, HAL_STATUS *ecode); }; #ifndef AH_RF #define AH_RF(_name, _probe, _attach) \ static struct ath_hal_rf _name##_rf = { \ .name = __STRING(_name), \ .probe = _probe, \ .attach = _attach \ }; \ OS_DATA_SET(ah_rfs, _name##_rf) #endif struct ath_hal_rf *ath_hal_rfprobe(struct ath_hal *ah, HAL_STATUS *ecode); /* * Maximum number of internal channels. Entries are per unique * frequency so this might need to be increased to handle all * usage cases; typically no more than 32 are really needed but * dynamically allocating the data structures is a bit painful * right now.
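 *
 * (Returning to the registration machinery above: a hypothetical chip
 * back-end announces itself with, e.g.,
 *
 *	AH_CHIP(ar9999, ar9999Probe, ar9999Attach);
 *
 * which drops its ath_hal_chip record into the ah_chips data set that
 * the probe/attach path walks; AH_RF registers an RF backend the same
 * way via ah_rfs.  The ar9999 names are illustrative only.)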
*/ #ifndef AH_MAXCHAN #define AH_MAXCHAN 96 #endif #define HAL_NF_CAL_HIST_LEN_FULL 5 #define HAL_NF_CAL_HIST_LEN_SMALL 1 #define HAL_NUM_NF_READINGS 6 /* 3 chains * (ctl + ext) */ #define HAL_NF_LOAD_DELAY 1000 /* * PER_CHAN doesn't work for now, as it looks like the device layer * has to pre-populate the per-channel list with nominal values. */ //#define ATH_NF_PER_CHAN 1 typedef struct { u_int8_t curr_index; int8_t invalidNFcount; /* TO DO: REMOVE THIS! */ int16_t priv_nf[HAL_NUM_NF_READINGS]; } HAL_NFCAL_BASE; typedef struct { HAL_NFCAL_BASE base; int16_t nf_cal_buffer[HAL_NF_CAL_HIST_LEN_FULL][HAL_NUM_NF_READINGS]; } HAL_NFCAL_HIST_FULL; typedef struct { HAL_NFCAL_BASE base; int16_t nf_cal_buffer[HAL_NF_CAL_HIST_LEN_SMALL][HAL_NUM_NF_READINGS]; } HAL_NFCAL_HIST_SMALL; #ifdef ATH_NF_PER_CHAN typedef HAL_NFCAL_HIST_FULL HAL_CHAN_NFCAL_HIST; #define AH_HOME_CHAN_NFCAL_HIST(ah, ichan) (ichan ? &ichan->nf_cal_hist: NULL) #else typedef HAL_NFCAL_HIST_SMALL HAL_CHAN_NFCAL_HIST; #define AH_HOME_CHAN_NFCAL_HIST(ah, ichan) (&AH_PRIVATE(ah)->nf_cal_hist) #endif /* ATH_NF_PER_CHAN */ /* * Internal per-channel state. These are found * using ic_devdata in the ieee80211_channel. */ typedef struct { uint16_t channel; /* h/w frequency, NB: may be mapped */ uint8_t privFlags; #define CHANNEL_IQVALID 0x01 /* IQ calibration valid */ #define CHANNEL_ANI_INIT 0x02 /* ANI state initialized */ #define CHANNEL_ANI_SETUP 0x04 /* ANI state setup */ #define CHANNEL_MIMO_NF_VALID 0x04 /* Mimo NF values are valid */ uint8_t calValid; /* bitmask of cal types */ int8_t iCoff; int8_t qCoff; int16_t rawNoiseFloor; int16_t noiseFloorAdjust; #ifdef AH_SUPPORT_AR5416 int16_t noiseFloorCtl[AH_MAX_CHAINS]; int16_t noiseFloorExt[AH_MAX_CHAINS]; #endif /* AH_SUPPORT_AR5416 */ uint16_t mainSpur; /* cached spur value for this channel */ /*XXX TODO: make these part of privFlags */ uint8_t paprd_done:1, /* 1: PAPRD DONE, 0: PAPRD Cal not done */ paprd_table_write_done:1; /* 1: DONE, 0: Cal data write not done */ int one_time_cals_done; HAL_CHAN_NFCAL_HIST nf_cal_hist; } HAL_CHANNEL_INTERNAL; /* channel requires noise floor check */ #define CHANNEL_NFCREQUIRED IEEE80211_CHAN_PRIV0 /* all full-width channels */ #define IEEE80211_CHAN_ALLFULL \ (IEEE80211_CHAN_ALL - (IEEE80211_CHAN_HALF | IEEE80211_CHAN_QUARTER)) #define IEEE80211_CHAN_ALLTURBOFULL \ (IEEE80211_CHAN_ALLTURBO - \ (IEEE80211_CHAN_HALF | IEEE80211_CHAN_QUARTER)) typedef struct { uint32_t halChanSpreadSupport : 1, halSleepAfterBeaconBroken : 1, halCompressSupport : 1, halBurstSupport : 1, halFastFramesSupport : 1, halChapTuningSupport : 1, halTurboGSupport : 1, halTurboPrimeSupport : 1, halMicAesCcmSupport : 1, halMicCkipSupport : 1, halMicTkipSupport : 1, halTkipMicTxRxKeySupport : 1, halCipherAesCcmSupport : 1, halCipherCkipSupport : 1, halCipherTkipSupport : 1, halPSPollBroken : 1, halVEOLSupport : 1, halBssIdMaskSupport : 1, halMcastKeySrchSupport : 1, halTsfAddSupport : 1, halChanHalfRate : 1, halChanQuarterRate : 1, halHTSupport : 1, halHTSGI20Support : 1, halRfSilentSupport : 1, halHwPhyCounterSupport : 1, halWowSupport : 1, halWowMatchPatternExact : 1, halAutoSleepSupport : 1, halFastCCSupport : 1, halBtCoexSupport : 1; uint32_t halRxStbcSupport : 1, halTxStbcSupport : 1, halGTTSupport : 1, halCSTSupport : 1, halRifsRxSupport : 1, halRifsTxSupport : 1, hal4AddrAggrSupport : 1, halExtChanDfsSupport : 1, halUseCombinedRadarRssi : 1, halForcePpmSupport : 1, halEnhancedPmSupport : 1, halEnhancedDfsSupport : 1, halMbssidAggrSupport : 1, halBssidMatchSupport : 1, 
hal4kbSplitTransSupport : 1, halHasRxSelfLinkedTail : 1, halSupportsFastClock5GHz : 1, halHasBBReadWar : 1, halSerialiseRegWar : 1, halMciSupport : 1, halRxTxAbortSupport : 1, halPaprdEnabled : 1, halHasUapsdSupport : 1, halWpsPushButtonSupport : 1, halBtCoexApsmWar : 1, halGenTimerSupport : 1, halLDPCSupport : 1, halHwBeaconProcSupport : 1, halEnhancedDmaSupport : 1; uint32_t halIsrRacSupport : 1, halApmEnable : 1, halIntrMitigation : 1, hal49GhzSupport : 1, halAntDivCombSupport : 1, halAntDivCombSupportOrg : 1, halRadioRetentionSupport : 1, halSpectralScanSupport : 1, halRxUsingLnaMixing : 1, halRxDoMyBeacon : 1, halHwUapsdTrig : 1; uint32_t halWirelessModes; uint16_t halTotalQueues; uint16_t halKeyCacheSize; uint16_t halLow5GhzChan, halHigh5GhzChan; uint16_t halLow2GhzChan, halHigh2GhzChan; int halTxTstampPrecision; int halRxTstampPrecision; int halRtsAggrLimit; uint8_t halTxChainMask; uint8_t halRxChainMask; uint8_t halNumGpioPins; uint8_t halNumAntCfg2GHz; uint8_t halNumAntCfg5GHz; uint32_t halIntrMask; uint8_t halTxStreams; uint8_t halRxStreams; HAL_MFP_OPT_T halMfpSupport; /* AR9300 HAL porting capabilities */ int hal_paprd_enabled; int hal_pcie_lcr_offset; int hal_pcie_lcr_extsync_en; int halNumTxMaps; int halTxDescLen; int halTxStatusLen; int halRxStatusLen; int halRxHpFifoDepth; int halRxLpFifoDepth; uint32_t halRegCap; /* XXX needed? */ int halNumMRRetries; int hal_ani_poll_interval; int hal_channel_switch_time_usec; } HAL_CAPABILITIES; struct regDomain; /* * Definitions for ah_flags in ath_hal_private */ #define AH_USE_EEPROM 0x1 #define AH_IS_HB63 0x2 /* * The ``private area'' follows immediately after the ``public area'' * in the data structure returned by ath_hal_attach. Private data are * used by device-independent code such as the regulatory domain support. * In general, code within the HAL should never depend on data in the * public area. Instead any public data needed internally should be * shadowed here. * * When declaring a device-specific ath_hal data structure this structure * is assumed to at the front; e.g. * * struct ath_hal_5212 { * struct ath_hal_private ah_priv; * ... * }; * * It might be better to manage the method pointers in this structure * using an indirect pointer to a read-only data structure but this would * disallow class-style method overriding. */ struct ath_hal_private { struct ath_hal h; /* public area */ /* NB: all methods go first to simplify initialization */ HAL_BOOL (*ah_getChannelEdges)(struct ath_hal*, uint16_t channelFlags, uint16_t *lowChannel, uint16_t *highChannel); u_int (*ah_getWirelessModes)(struct ath_hal*); HAL_BOOL (*ah_eepromRead)(struct ath_hal *, u_int off, uint16_t *data); HAL_BOOL (*ah_eepromWrite)(struct ath_hal *, u_int off, uint16_t data); HAL_BOOL (*ah_getChipPowerLimits)(struct ath_hal *, struct ieee80211_channel *); int16_t (*ah_getNfAdjust)(struct ath_hal *, const HAL_CHANNEL_INTERNAL*); void (*ah_getNoiseFloor)(struct ath_hal *, int16_t nfarray[]); void *ah_eeprom; /* opaque EEPROM state */ uint16_t ah_eeversion; /* EEPROM version */ void (*ah_eepromDetach)(struct ath_hal *); HAL_STATUS (*ah_eepromGet)(struct ath_hal *, int, void *); HAL_STATUS (*ah_eepromSet)(struct ath_hal *, int, int); uint16_t (*ah_getSpurChan)(struct ath_hal *, int, HAL_BOOL); HAL_BOOL (*ah_eepromDiag)(struct ath_hal *, int request, const void *args, uint32_t argsize, void **result, uint32_t *resultsize); /* * Device revision information. 
*/ uint16_t ah_devid; /* PCI device ID */ uint16_t ah_subvendorid; /* PCI subvendor ID */ uint32_t ah_macVersion; /* MAC version id */ uint16_t ah_macRev; /* MAC revision */ uint16_t ah_phyRev; /* PHY revision */ uint16_t ah_analog5GhzRev; /* 2GHz radio revision */ uint16_t ah_analog2GhzRev; /* 5GHz radio revision */ uint32_t ah_flags; /* misc flags */ uint8_t ah_ispcie; /* PCIE, special treatment */ uint8_t ah_devType; /* card type - CB, PCI, PCIe */ HAL_OPMODE ah_opmode; /* operating mode from reset */ const struct ieee80211_channel *ah_curchan;/* operating channel */ HAL_CAPABILITIES ah_caps; /* device capabilities */ uint32_t ah_diagreg; /* user-specified AR_DIAG_SW */ int16_t ah_powerLimit; /* tx power cap */ uint16_t ah_maxPowerLevel; /* calculated max tx power */ u_int ah_tpScale; /* tx power scale factor */ u_int16_t ah_extraTxPow; /* low rates extra-txpower */ uint32_t ah_11nCompat; /* 11n compat controls */ /* * State for regulatory domain handling. */ HAL_REG_DOMAIN ah_currentRD; /* EEPROM regulatory domain */ HAL_REG_DOMAIN ah_currentRDext; /* EEPROM extended regdomain flags */ HAL_DFS_DOMAIN ah_dfsDomain; /* current DFS domain */ HAL_CHANNEL_INTERNAL ah_channels[AH_MAXCHAN]; /* private chan state */ u_int ah_nchan; /* valid items in ah_channels */ const struct regDomain *ah_rd2GHz; /* reg state for 2G band */ const struct regDomain *ah_rd5GHz; /* reg state for 5G band */ uint8_t ah_coverageClass; /* coverage class */ /* * RF Silent handling; setup according to the EEPROM. */ uint16_t ah_rfsilent; /* GPIO pin + polarity */ HAL_BOOL ah_rfkillEnabled; /* enable/disable RfKill */ /* * Diagnostic support for discriminating HIUERR reports. */ uint32_t ah_fatalState[6]; /* AR_ISR+shadow regs */ int ah_rxornIsFatal; /* how to treat HAL_INT_RXORN */ /* Only used if ATH_NF_PER_CHAN is defined */ HAL_NFCAL_HIST_FULL nf_cal_hist; /* * Channel survey history - current channel only. 
*/ HAL_CHANNEL_SURVEY ah_chansurvey; /* channel survey */ }; #define AH_PRIVATE(_ah) ((struct ath_hal_private *)(_ah)) #define ath_hal_getChannelEdges(_ah, _cf, _lc, _hc) \ AH_PRIVATE(_ah)->ah_getChannelEdges(_ah, _cf, _lc, _hc) #define ath_hal_getWirelessModes(_ah) \ AH_PRIVATE(_ah)->ah_getWirelessModes(_ah) #define ath_hal_eepromRead(_ah, _off, _data) \ AH_PRIVATE(_ah)->ah_eepromRead(_ah, _off, _data) #define ath_hal_eepromWrite(_ah, _off, _data) \ AH_PRIVATE(_ah)->ah_eepromWrite(_ah, _off, _data) #define ath_hal_gpioCfgOutput(_ah, _gpio, _type) \ (_ah)->ah_gpioCfgOutput(_ah, _gpio, _type) #define ath_hal_gpioCfgInput(_ah, _gpio) \ (_ah)->ah_gpioCfgInput(_ah, _gpio) #define ath_hal_gpioGet(_ah, _gpio) \ (_ah)->ah_gpioGet(_ah, _gpio) #define ath_hal_gpioSet(_ah, _gpio, _val) \ (_ah)->ah_gpioSet(_ah, _gpio, _val) #define ath_hal_gpioSetIntr(_ah, _gpio, _ilevel) \ (_ah)->ah_gpioSetIntr(_ah, _gpio, _ilevel) #define ath_hal_getpowerlimits(_ah, _chan) \ AH_PRIVATE(_ah)->ah_getChipPowerLimits(_ah, _chan) #define ath_hal_getNfAdjust(_ah, _c) \ AH_PRIVATE(_ah)->ah_getNfAdjust(_ah, _c) #define ath_hal_getNoiseFloor(_ah, _nfArray) \ AH_PRIVATE(_ah)->ah_getNoiseFloor(_ah, _nfArray) #define ath_hal_configPCIE(_ah, _reset, _poweroff) \ (_ah)->ah_configPCIE(_ah, _reset, _poweroff) #define ath_hal_disablePCIE(_ah) \ (_ah)->ah_disablePCIE(_ah) #define ath_hal_setInterrupts(_ah, _mask) \ (_ah)->ah_setInterrupts(_ah, _mask) #define ath_hal_isrfkillenabled(_ah) \ (ath_hal_getcapability(_ah, HAL_CAP_RFSILENT, 1, AH_NULL) == HAL_OK) #define ath_hal_enable_rfkill(_ah, _v) \ ath_hal_setcapability(_ah, HAL_CAP_RFSILENT, 1, _v, AH_NULL) #define ath_hal_hasrfkill_int(_ah) \ (ath_hal_getcapability(_ah, HAL_CAP_RFSILENT, 3, AH_NULL) == HAL_OK) #define ath_hal_eepromDetach(_ah) do { \ if (AH_PRIVATE(_ah)->ah_eepromDetach != AH_NULL) \ AH_PRIVATE(_ah)->ah_eepromDetach(_ah); \ } while (0) #define ath_hal_eepromGet(_ah, _param, _val) \ AH_PRIVATE(_ah)->ah_eepromGet(_ah, _param, _val) #define ath_hal_eepromSet(_ah, _param, _val) \ AH_PRIVATE(_ah)->ah_eepromSet(_ah, _param, _val) #define ath_hal_eepromGetFlag(_ah, _param) \ (AH_PRIVATE(_ah)->ah_eepromGet(_ah, _param, AH_NULL) == HAL_OK) #define ath_hal_getSpurChan(_ah, _ix, _is2G) \ AH_PRIVATE(_ah)->ah_getSpurChan(_ah, _ix, _is2G) #define ath_hal_eepromDiag(_ah, _request, _a, _asize, _r, _rsize) \ AH_PRIVATE(_ah)->ah_eepromDiag(_ah, _request, _a, _asize, _r, _rsize) #ifndef _NET_IF_IEEE80211_H_ /* * Stuff that would naturally come from _ieee80211.h */ #define IEEE80211_ADDR_LEN 6 #define IEEE80211_WEP_IVLEN 3 /* 24bit */ #define IEEE80211_WEP_KIDLEN 1 /* 1 octet */ #define IEEE80211_WEP_CRCLEN 4 /* CRC-32 */ #define IEEE80211_CRC_LEN 4 #define IEEE80211_MAX_LEN (2300 + IEEE80211_CRC_LEN + \ (IEEE80211_WEP_IVLEN + IEEE80211_WEP_KIDLEN + IEEE80211_WEP_CRCLEN)) #endif /* _NET_IF_IEEE80211_H_ */ #define HAL_TXQ_USE_LOCKOUT_BKOFF_DIS 0x00000001 #define INIT_AIFS 2 #define INIT_CWMIN 15 #define INIT_CWMIN_11B 31 #define INIT_CWMAX 1023 #define INIT_SH_RETRY 10 #define INIT_LG_RETRY 10 #define INIT_SSH_RETRY 32 #define INIT_SLG_RETRY 32 typedef struct { uint32_t tqi_ver; /* HAL TXQ verson */ HAL_TX_QUEUE tqi_type; /* hw queue type*/ HAL_TX_QUEUE_SUBTYPE tqi_subtype; /* queue subtype, if applicable */ HAL_TX_QUEUE_FLAGS tqi_qflags; /* queue flags */ uint32_t tqi_priority; uint32_t tqi_aifs; /* aifs */ uint32_t tqi_cwmin; /* cwMin */ uint32_t tqi_cwmax; /* cwMax */ uint16_t tqi_shretry; /* frame short retry limit */ uint16_t tqi_lgretry; /* frame long retry limit */ uint32_t 
tqi_cbrPeriod; uint32_t tqi_cbrOverflowLimit; uint32_t tqi_burstTime; uint32_t tqi_readyTime; uint32_t tqi_physCompBuf; uint32_t tqi_intFlags; /* flags for internal use */ } HAL_TX_QUEUE_INFO; extern HAL_BOOL ath_hal_setTxQProps(struct ath_hal *ah, HAL_TX_QUEUE_INFO *qi, const HAL_TXQ_INFO *qInfo); extern HAL_BOOL ath_hal_getTxQProps(struct ath_hal *ah, HAL_TXQ_INFO *qInfo, const HAL_TX_QUEUE_INFO *qi); #define HAL_SPUR_VAL_MASK 0x3FFF #define HAL_SPUR_CHAN_WIDTH 87 #define HAL_BIN_WIDTH_BASE_100HZ 3125 #define HAL_BIN_WIDTH_TURBO_100HZ 6250 #define HAL_MAX_BINS_ALLOWED 28 #define IS_CHAN_5GHZ(_c) ((_c)->channel > 4900) #define IS_CHAN_2GHZ(_c) (!IS_CHAN_5GHZ(_c)) #define IS_CHAN_IN_PUBLIC_SAFETY_BAND(_c) ((_c) > 4940 && (_c) < 4990) /* * Deduce if the host cpu has big- or little-endian byte order. */ static __inline__ int isBigEndian(void) { union { int32_t i; char c[4]; } u; u.i = 1; return (u.c[0] == 0); } /* unaligned little endian access */ #define LE_READ_2(p) \ ((uint16_t) \ ((((const uint8_t *)(p))[0] ) | (((const uint8_t *)(p))[1]<< 8))) #define LE_READ_4(p) \ ((uint32_t) \ ((((const uint8_t *)(p))[0] ) | (((const uint8_t *)(p))[1]<< 8) |\ (((const uint8_t *)(p))[2]<<16) | (((const uint8_t *)(p))[3]<<24))) /* * Register manipulation macros that expect bit field defines * to follow the convention that an _S suffix is appended for * a shift count, while the field mask has no suffix. */ #define SM(_v, _f) (((_v) << _f##_S) & (_f)) #define MS(_v, _f) (((_v) & (_f)) >> _f##_S) #define OS_REG_RMW(_a, _r, _set, _clr) \ OS_REG_WRITE(_a, _r, (OS_REG_READ(_a, _r) & ~(_clr)) | (_set)) #define OS_REG_RMW_FIELD(_a, _r, _f, _v) \ OS_REG_WRITE(_a, _r, \ (OS_REG_READ(_a, _r) &~ (_f)) | (((_v) << _f##_S) & (_f))) #define OS_REG_SET_BIT(_a, _r, _f) \ OS_REG_WRITE(_a, _r, OS_REG_READ(_a, _r) | (_f)) #define OS_REG_CLR_BIT(_a, _r, _f) \ OS_REG_WRITE(_a, _r, OS_REG_READ(_a, _r) &~ (_f)) #define OS_REG_IS_BIT_SET(_a, _r, _f) \ ((OS_REG_READ(_a, _r) & (_f)) != 0) #define OS_REG_RMW_FIELD_ALT(_a, _r, _f, _v) \ OS_REG_WRITE(_a, _r, \ (OS_REG_READ(_a, _r) &~(_f<<_f##_S)) | \ (((_v) << _f##_S) & (_f<<_f##_S))) #define OS_REG_READ_FIELD(_a, _r, _f) \ (((OS_REG_READ(_a, _r) & _f) >> _f##_S)) #define OS_REG_READ_FIELD_ALT(_a, _r, _f) \ ((OS_REG_READ(_a, _r) >> (_f##_S))&(_f)) /* Analog register writes may require a delay between each one (eg Merlin?) */ #define OS_A_REG_RMW_FIELD(_a, _r, _f, _v) \ do { OS_REG_WRITE(_a, _r, (OS_REG_READ(_a, _r) &~ (_f)) | \ (((_v) << _f##_S) & (_f))) ; OS_DELAY(100); } while (0) #define OS_A_REG_WRITE(_a, _r, _v) \ do { OS_REG_WRITE(_a, _r, _v); OS_DELAY(100); } while (0) /* wait for the register contents to have the specified value */ extern HAL_BOOL ath_hal_wait(struct ath_hal *, u_int reg, uint32_t mask, uint32_t val); extern HAL_BOOL ath_hal_waitfor(struct ath_hal *, u_int reg, uint32_t mask, uint32_t val, uint32_t timeout); /* return the first n bits in val reversed */ extern uint32_t ath_hal_reverseBits(uint32_t val, uint32_t n); /* printf interfaces */ extern void ath_hal_printf(struct ath_hal *, const char*, ...)
__printflike(2,3); extern void ath_hal_vprintf(struct ath_hal *, const char*, __va_list) __printflike(2, 0); extern const char* ath_hal_ether_sprintf(const uint8_t *mac); /* allocate and free memory */ extern void *ath_hal_malloc(size_t); extern void ath_hal_free(void *); /* common debugging interfaces */ #ifdef AH_DEBUG #include "ah_debug.h" extern int ath_hal_debug; /* Global debug flags */ /* * The typecast is purely because some callers will pass in * AH_NULL directly rather than using a NULL ath_hal pointer. */ #define HALDEBUG(_ah, __m, ...) \ do { \ if ((__m) == HAL_DEBUG_UNMASKABLE || \ ath_hal_debug & (__m) || \ ((_ah) != NULL && \ ((struct ath_hal *) (_ah))->ah_config.ah_debug & (__m))) { \ DO_HALDEBUG((_ah), (__m), __VA_ARGS__); \ } \ } while(0); extern void DO_HALDEBUG(struct ath_hal *ah, u_int mask, const char* fmt, ...) __printflike(3,4); #else #define HALDEBUG(_ah, __m, ...) #endif /* AH_DEBUG */ /* * Register logging definitions shared with ardecode. */ #include "ah_decode.h" /* * Common assertion interface. Note: it is a bad idea to generate * an assertion failure for any recoverable event. Instead catch * the violation and, if possible, fix it up or recover from it; either * with an error return value or a diagnostic messages. System software * does not panic unless the situation is hopeless. */ #ifdef AH_ASSERT extern void ath_hal_assert_failed(const char* filename, int lineno, const char* msg); #define HALASSERT(_x) do { \ if (!(_x)) { \ ath_hal_assert_failed(__FILE__, __LINE__, #_x); \ } \ } while (0) #else #define HALASSERT(_x) #endif /* AH_ASSERT */ /* * Regulatory domain support. */ /* * Return the max allowed antenna gain and apply any regulatory * domain specific changes. */ u_int ath_hal_getantennareduction(struct ath_hal *ah, const struct ieee80211_channel *chan, u_int twiceGain); /* * Return the test group for the specific channel based on * the current regulatory setup. */ u_int ath_hal_getctl(struct ath_hal *, const struct ieee80211_channel *); /* * Map a public channel definition to the corresponding * internal data structure. This implicitly specifies * whether or not the specified channel is ok to use * based on the current regulatory domain constraints. */ #ifndef AH_DEBUG static OS_INLINE HAL_CHANNEL_INTERNAL * ath_hal_checkchannel(struct ath_hal *ah, const struct ieee80211_channel *c) { HAL_CHANNEL_INTERNAL *cc; HALASSERT(c->ic_devdata < AH_PRIVATE(ah)->ah_nchan); cc = &AH_PRIVATE(ah)->ah_channels[c->ic_devdata]; HALASSERT(c->ic_freq == cc->channel || IEEE80211_IS_CHAN_GSM(c)); return cc; } #else /* NB: non-inline version that checks state */ HAL_CHANNEL_INTERNAL *ath_hal_checkchannel(struct ath_hal *, const struct ieee80211_channel *); #endif /* AH_DEBUG */ /* * Return the h/w frequency for a channel. This may be * different from ic_freq if this is a GSM device that * takes 2.4GHz frequencies and down-converts them. */ static OS_INLINE uint16_t ath_hal_gethwchannel(struct ath_hal *ah, const struct ieee80211_channel *c) { return ath_hal_checkchannel(ah, c)->channel; } /* - * Convert between microseconds and core system clocks. - */ -extern u_int ath_hal_mac_clks(struct ath_hal *ah, u_int usecs); -extern u_int ath_hal_mac_usec(struct ath_hal *ah, u_int clks); - -/* * Generic get/set capability support. Each chip overrides * this routine to support chip-specific capabilities. 
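 *
 * For example, the rfkill convenience wrappers earlier in this file
 * boil down to a query of this form (HAL_CAP_RFSILENT parameter 1 is
 * the enable flag there):
 *
 *	if (ath_hal_getcapability(ah, HAL_CAP_RFSILENT, 1, AH_NULL) == HAL_OK)
 *		... rfkill is enabled ...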
*/ extern HAL_STATUS ath_hal_getcapability(struct ath_hal *ah, HAL_CAPABILITY_TYPE type, uint32_t capability, uint32_t *result); extern HAL_BOOL ath_hal_setcapability(struct ath_hal *ah, HAL_CAPABILITY_TYPE type, uint32_t capability, uint32_t setting, HAL_STATUS *status); /* The diagnostic codes used to be internally defined here -adrian */ #include "ah_diagcodes.h" /* * The AR5416 and later HALs have MAC and baseband hang checking. */ typedef struct { uint32_t hang_reg_offset; uint32_t hang_val; uint32_t hang_mask; uint32_t hang_offset; } hal_hw_hang_check_t; typedef struct { uint32_t dma_dbg_3; uint32_t dma_dbg_4; uint32_t dma_dbg_5; uint32_t dma_dbg_6; } mac_dbg_regs_t; typedef enum { dcu_chain_state = 0x1, dcu_complete_state = 0x2, qcu_state = 0x4, qcu_fsp_ok = 0x8, qcu_fsp_state = 0x10, qcu_stitch_state = 0x20, qcu_fetch_state = 0x40, qcu_complete_state = 0x80 } hal_mac_hangs_t; typedef struct { int states; uint8_t dcu_chain_state; uint8_t dcu_complete_state; uint8_t qcu_state; uint8_t qcu_fsp_ok; uint8_t qcu_fsp_state; uint8_t qcu_stitch_state; uint8_t qcu_fetch_state; uint8_t qcu_complete_state; } hal_mac_hang_check_t; enum { HAL_BB_HANG_DFS = 0x0001, HAL_BB_HANG_RIFS = 0x0002, HAL_BB_HANG_RX_CLEAR = 0x0004, HAL_BB_HANG_UNKNOWN = 0x0080, HAL_MAC_HANG_SIG1 = 0x0100, HAL_MAC_HANG_SIG2 = 0x0200, HAL_MAC_HANG_UNKNOWN = 0x8000, HAL_BB_HANGS = HAL_BB_HANG_DFS | HAL_BB_HANG_RIFS | HAL_BB_HANG_RX_CLEAR | HAL_BB_HANG_UNKNOWN, HAL_MAC_HANGS = HAL_MAC_HANG_SIG1 | HAL_MAC_HANG_SIG2 | HAL_MAC_HANG_UNKNOWN, }; /* Merge these with above */ typedef enum hal_hw_hangs { HAL_DFS_BB_HANG_WAR = 0x1, HAL_RIFS_BB_HANG_WAR = 0x2, HAL_RX_STUCK_LOW_BB_HANG_WAR = 0x4, HAL_MAC_HANG_WAR = 0x8, HAL_PHYRESTART_CLR_WAR = 0x10, HAL_MAC_HANG_DETECTED = 0x40000000, HAL_BB_HANG_DETECTED = 0x80000000 } hal_hw_hangs_t; /* * Device revision information. */ typedef struct { uint16_t ah_devid; /* PCI device ID */ uint16_t ah_subvendorid; /* PCI subvendor ID */ uint32_t ah_macVersion; /* MAC version id */ uint16_t ah_macRev; /* MAC revision */ uint16_t ah_phyRev; /* PHY revision */ uint16_t ah_analog5GhzRev; /* 2GHz radio revision */ uint16_t ah_analog2GhzRev; /* 5GHz radio revision */ } HAL_REVS; /* * Argument payload for HAL_DIAG_SETKEY. */ typedef struct { HAL_KEYVAL dk_keyval; uint16_t dk_keyix; /* key index */ uint8_t dk_mac[IEEE80211_ADDR_LEN]; int dk_xor; /* XOR key data */ } HAL_DIAG_KEYVAL; /* * Argument payload for HAL_DIAG_EEWRITE. */ typedef struct { uint16_t ee_off; /* eeprom offset */ uint16_t ee_data; /* write data */ } HAL_DIAG_EEVAL; typedef struct { u_int offset; /* reg offset */ uint32_t val; /* reg value */ } HAL_DIAG_REGVAL; /* * 11n compatibility tweaks. */ #define HAL_DIAG_11N_SERVICES 0x00000003 #define HAL_DIAG_11N_SERVICES_S 0 #define HAL_DIAG_11N_TXSTOMP 0x0000000c #define HAL_DIAG_11N_TXSTOMP_S 2 typedef struct { int maxNoiseImmunityLevel; /* [0..4] */ int totalSizeDesired[5]; int coarseHigh[5]; int coarseLow[5]; int firpwr[5]; int maxSpurImmunityLevel; /* [0..7] */ int cycPwrThr1[8]; int maxFirstepLevel; /* [0..2] */ int firstep[3]; uint32_t ofdmTrigHigh; uint32_t ofdmTrigLow; int32_t cckTrigHigh; int32_t cckTrigLow; int32_t rssiThrLow; int32_t rssiThrHigh; int period; /* update listen period */ } HAL_ANI_PARAMS; extern HAL_BOOL ath_hal_getdiagstate(struct ath_hal *ah, int request, const void *args, uint32_t argsize, void **result, uint32_t *resultsize); /* * Setup a h/w rate table for use. 
*/ extern void ath_hal_setupratetable(struct ath_hal *ah, HAL_RATE_TABLE *rt); /* * Common routine for implementing getChanNoise api. */ int16_t ath_hal_getChanNoise(struct ath_hal *, const struct ieee80211_channel *); /* * Initialization support. */ typedef struct { const uint32_t *data; int rows, cols; } HAL_INI_ARRAY; #define HAL_INI_INIT(_ia, _data, _cols) do { \ (_ia)->data = (const uint32_t *)(_data); \ (_ia)->rows = sizeof(_data) / sizeof((_data)[0]); \ (_ia)->cols = (_cols); \ } while (0) #define HAL_INI_VAL(_ia, _r, _c) \ ((_ia)->data[((_r)*(_ia)->cols) + (_c)]) /* * OS_DELAY() does a PIO READ on the PCI bus which allows * other cards' DMA reads to complete in the middle of our reset. */ #define DMA_YIELD(x) do { \ if ((++(x) % 64) == 0) \ OS_DELAY(1); \ } while (0) #define HAL_INI_WRITE_ARRAY(ah, regArray, col, regWr) do { \ int r; \ for (r = 0; r < N(regArray); r++) { \ OS_REG_WRITE(ah, (regArray)[r][0], (regArray)[r][col]); \ DMA_YIELD(regWr); \ } \ } while (0) #define HAL_INI_WRITE_BANK(ah, regArray, bankData, regWr) do { \ int r; \ for (r = 0; r < N(regArray); r++) { \ OS_REG_WRITE(ah, (regArray)[r][0], (bankData)[r]); \ DMA_YIELD(regWr); \ } \ } while (0) extern int ath_hal_ini_write(struct ath_hal *ah, const HAL_INI_ARRAY *ia, int col, int regWr); extern void ath_hal_ini_bank_setup(uint32_t data[], const HAL_INI_ARRAY *ia, int col); extern int ath_hal_ini_bank_write(struct ath_hal *ah, const HAL_INI_ARRAY *ia, const uint32_t data[], int regWr); #define CCK_SIFS_TIME 10 #define CCK_PREAMBLE_BITS 144 #define CCK_PLCP_BITS 48 #define OFDM_SIFS_TIME 16 #define OFDM_PREAMBLE_TIME 20 #define OFDM_PLCP_BITS 22 #define OFDM_SYMBOL_TIME 4 #define OFDM_HALF_SIFS_TIME 32 #define OFDM_HALF_PREAMBLE_TIME 40 #define OFDM_HALF_PLCP_BITS 22 #define OFDM_HALF_SYMBOL_TIME 8 #define OFDM_QUARTER_SIFS_TIME 64 #define OFDM_QUARTER_PREAMBLE_TIME 80 #define OFDM_QUARTER_PLCP_BITS 22 #define OFDM_QUARTER_SYMBOL_TIME 16 #define TURBO_SIFS_TIME 8 #define TURBO_PREAMBLE_TIME 14 #define TURBO_PLCP_BITS 22 #define TURBO_SYMBOL_TIME 4 #define WLAN_CTRL_FRAME_SIZE (2+2+6+4) /* ACK+FCS */ /* Generic EEPROM board value functions */ extern HAL_BOOL ath_ee_getLowerUpperIndex(uint8_t target, uint8_t *pList, uint16_t listSize, uint16_t *indexL, uint16_t *indexR); extern HAL_BOOL ath_ee_FillVpdTable(uint8_t pwrMin, uint8_t pwrMax, uint8_t *pPwrList, uint8_t *pVpdList, uint16_t numIntercepts, uint8_t *pRetVpdList); extern int16_t ath_ee_interpolate(uint16_t target, uint16_t srcLeft, uint16_t srcRight, int16_t targetLeft, int16_t targetRight); /* Whether 5ghz fast clock is needed */ /* * The chipset (Merlin, AR9300/later) should set the capability flag below; * this flag simply says that the hardware can do it, not that the EEPROM * says it can. * * Merlin 2.0/2.1 chips with an EEPROM version > 16 do 5ghz fast clock * if the relevant eeprom flag is set. * Merlin 2.0/2.1 chips with an EEPROM version <= 16 do 5ghz fast clock * by default. */ #define IS_5GHZ_FAST_CLOCK_EN(_ah, _c) \ (IEEE80211_IS_CHAN_5GHZ(_c) && \ AH_PRIVATE((_ah))->ah_caps.halSupportsFastClock5GHz && \ ath_hal_eepromGetFlag((_ah), AR_EEP_FSTCLK_5G)) /* * Fetch the maximum regulatory domain power for the given channel * in 1/2dBm steps. */ static inline int ath_hal_get_twice_max_regpower(struct ath_hal_private *ahp, const HAL_CHANNEL_INTERNAL *ichan, const struct ieee80211_channel *chan) { struct ath_hal *ah = &ahp->h; if (! 
chan) { ath_hal_printf(ah, "%s: called with chan=NULL!\n", __func__); return (0); } return (chan->ic_maxpower); } /* * Get the maximum antenna gain allowed, in 1/2dBm steps. */ static inline int ath_hal_getantennaallowed(struct ath_hal *ah, const struct ieee80211_channel *chan) { if (! chan) return (0); return (chan->ic_maxantgain); } /* * Map the given 2GHz channel to an IEEE number. */ extern int ath_hal_mhz2ieee_2ghz(struct ath_hal *, int freq); /* * Clear the channel survey data. */ extern void ath_hal_survey_clear(struct ath_hal *ah); /* * Add a sample to the channel survey data. */ extern void ath_hal_survey_add_sample(struct ath_hal *ah, HAL_SURVEY_SAMPLE *hs); #endif /* _ATH_AH_INTERAL_H_ */ Index: projects/clang390-import/sys/dev/ath/ath_hal/ar5416/ar5416_reset.c =================================================================== --- projects/clang390-import/sys/dev/ath/ath_hal/ar5416/ar5416_reset.c (revision 305686) +++ projects/clang390-import/sys/dev/ath/ath_hal/ar5416/ar5416_reset.c (revision 305687) @@ -1,2891 +1,2894 @@ /* * Copyright (c) 2002-2009 Sam Leffler, Errno Consulting * Copyright (c) 2002-2008 Atheros Communications, Inc. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * $FreeBSD$ */ #include "opt_ah.h" #include "ah.h" #include "ah_internal.h" #include "ah_devid.h" #include "ah_eeprom_v14.h" #include "ar5416/ar5416.h" #include "ar5416/ar5416reg.h" #include "ar5416/ar5416phy.h" /* Eeprom versioning macros. 
Returns true if the version is equal or newer than the ver specified */ #define EEP_MINOR(_ah) \ (AH_PRIVATE(_ah)->ah_eeversion & AR5416_EEP_VER_MINOR_MASK) #define IS_EEP_MINOR_V2(_ah) (EEP_MINOR(_ah) >= AR5416_EEP_MINOR_VER_2) #define IS_EEP_MINOR_V3(_ah) (EEP_MINOR(_ah) >= AR5416_EEP_MINOR_VER_3) /* Additional Time delay to wait after activiting the Base band */ #define BASE_ACTIVATE_DELAY 100 /* 100 usec */ #define PLL_SETTLE_DELAY 300 /* 300 usec */ #define RTC_PLL_SETTLE_DELAY 1000 /* 1 ms */ static void ar5416InitDMA(struct ath_hal *ah); static void ar5416InitBB(struct ath_hal *ah, const struct ieee80211_channel *); static void ar5416InitIMR(struct ath_hal *ah, HAL_OPMODE opmode); static void ar5416InitQoS(struct ath_hal *ah); static void ar5416InitUserSettings(struct ath_hal *ah); static void ar5416OverrideIni(struct ath_hal *ah, const struct ieee80211_channel *); #if 0 static HAL_BOOL ar5416ChannelChange(struct ath_hal *, const struct ieee80211_channel *); #endif static void ar5416SetDeltaSlope(struct ath_hal *, const struct ieee80211_channel *); static HAL_BOOL ar5416SetResetPowerOn(struct ath_hal *ah); static HAL_BOOL ar5416SetReset(struct ath_hal *ah, int type); static HAL_BOOL ar5416SetPowerPerRateTable(struct ath_hal *ah, struct ar5416eeprom *pEepData, const struct ieee80211_channel *chan, int16_t *ratesArray, uint16_t cfgCtl, uint16_t AntennaReduction, uint16_t twiceMaxRegulatoryPower, uint16_t powerLimit); static void ar5416Set11nRegs(struct ath_hal *ah, const struct ieee80211_channel *chan); static void ar5416MarkPhyInactive(struct ath_hal *ah); static void ar5416SetIFSTiming(struct ath_hal *ah, const struct ieee80211_channel *chan); /* * Places the device in and out of reset and then places sane * values in the registers based on EEPROM config, initialization * vectors (as determined by the mode), and station configuration * * bChannelChange is used to preserve DMA/PCU registers across * a HW Reset during channel change. */ HAL_BOOL ar5416Reset(struct ath_hal *ah, HAL_OPMODE opmode, struct ieee80211_channel *chan, HAL_BOOL bChannelChange, HAL_RESET_TYPE resetType, HAL_STATUS *status) { #define N(a) (sizeof (a) / sizeof (a[0])) #define FAIL(_code) do { ecode = _code; goto bad; } while (0) struct ath_hal_5212 *ahp = AH5212(ah); HAL_CHANNEL_INTERNAL *ichan; uint32_t saveDefAntenna, saveLedState; uint32_t macStaId1; uint16_t rfXpdGain[2]; HAL_STATUS ecode; uint32_t powerVal, rssiThrReg; uint32_t ackTpcPow, ctsTpcPow, chirpTpcPow; int i; uint64_t tsf = 0; OS_MARK(ah, AH_MARK_RESET, bChannelChange); /* Bring out of sleep mode */ if (!ar5416SetPowerMode(ah, HAL_PM_AWAKE, AH_TRUE)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: chip did not wakeup\n", __func__); FAIL(HAL_EIO); } /* * Map public channel to private. 
*/ ichan = ath_hal_checkchannel(ah, chan); if (ichan == AH_NULL) FAIL(HAL_EINVAL); switch (opmode) { case HAL_M_STA: case HAL_M_IBSS: case HAL_M_HOSTAP: case HAL_M_MONITOR: break; default: HALDEBUG(ah, HAL_DEBUG_ANY, "%s: invalid operating mode %u\n", __func__, opmode); FAIL(HAL_EINVAL); break; } HALASSERT(AH_PRIVATE(ah)->ah_eeversion >= AR_EEPROM_VER14_1); /* Blank the channel survey statistics */ ath_hal_survey_clear(ah); /* XXX Turn on fast channel change for 5416 */ /* * Preserve the bmiss rssi threshold and count threshold * across resets */ rssiThrReg = OS_REG_READ(ah, AR_RSSI_THR); /* If reg is zero, first time thru set to default val */ if (rssiThrReg == 0) rssiThrReg = INIT_RSSI_THR; /* * Preserve the antenna on a channel change */ saveDefAntenna = OS_REG_READ(ah, AR_DEF_ANTENNA); /* * Don't do this for the AR9285 - it breaks RX for single * antenna designs when diversity is disabled. * * I'm not sure what this was working around; it may be * something to do with the AR5416. Certainly this register * isn't supposed to be used by the MIMO chips for anything * except for defining the default antenna when an external * phase array / smart antenna is connected. * * See PR: kern/179269 . */ if ((! AR_SREV_KITE(ah)) && saveDefAntenna == 0) /* XXX magic constants */ saveDefAntenna = 1; /* Save hardware flag before chip reset clears the register */ macStaId1 = OS_REG_READ(ah, AR_STA_ID1) & (AR_STA_ID1_BASE_RATE_11B | AR_STA_ID1_USE_DEFANT); /* Save led state from pci config register */ saveLedState = OS_REG_READ(ah, AR_MAC_LED) & (AR_MAC_LED_ASSOC | AR_MAC_LED_MODE | AR_MAC_LED_BLINK_THRESH_SEL | AR_MAC_LED_BLINK_SLOW); /* For chips on which the RTC reset is done, save TSF before it gets cleared */ if (AR_SREV_HOWL(ah) || (AR_SREV_MERLIN(ah) && ath_hal_eepromGetFlag(ah, AR_EEP_OL_PWRCTRL)) || (ah->ah_config.ah_force_full_reset)) tsf = ar5416GetTsf64(ah); /* Mark PHY as inactive; marked active in ar5416InitBB() */ ar5416MarkPhyInactive(ah); if (!ar5416ChipReset(ah, chan)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: chip reset failed\n", __func__); FAIL(HAL_EIO); } /* Restore TSF */ if (tsf) ar5416SetTsf64(ah, tsf); OS_MARK(ah, AH_MARK_RESET_LINE, __LINE__); if (AR_SREV_MERLIN_10_OR_LATER(ah)) OS_REG_SET_BIT(ah, AR_GPIO_INPUT_EN_VAL, AR_GPIO_JTAG_DISABLE); AH5416(ah)->ah_writeIni(ah, chan); if(AR_SREV_KIWI_13_OR_LATER(ah) ) { /* Enable ASYNC FIFO */ OS_REG_SET_BIT(ah, AR_MAC_PCU_ASYNC_FIFO_REG3, AR_MAC_PCU_ASYNC_FIFO_REG3_DATAPATH_SEL); OS_REG_SET_BIT(ah, AR_PHY_MODE, AR_PHY_MODE_ASYNCFIFO); OS_REG_CLR_BIT(ah, AR_MAC_PCU_ASYNC_FIFO_REG3, AR_MAC_PCU_ASYNC_FIFO_REG3_SOFT_RESET); OS_REG_SET_BIT(ah, AR_MAC_PCU_ASYNC_FIFO_REG3, AR_MAC_PCU_ASYNC_FIFO_REG3_SOFT_RESET); } /* Override ini values (that can be overriden in this fashion) */ ar5416OverrideIni(ah, chan); /* Setup 11n MAC/Phy mode registers */ ar5416Set11nRegs(ah, chan); OS_MARK(ah, AH_MARK_RESET_LINE, __LINE__); /* * Some AR91xx SoC devices frequently fail to accept TSF writes * right after the chip reset. 
When that happens, write a new * value after the initvals have been applied, with an offset * based on measured time difference */ if (AR_SREV_HOWL(ah) && (ar5416GetTsf64(ah) < tsf)) { tsf += 1500; ar5416SetTsf64(ah, tsf); } HALDEBUG(ah, HAL_DEBUG_RESET, ">>>2 %s: AR_PHY_DAG_CTRLCCK=0x%x\n", __func__, OS_REG_READ(ah,AR_PHY_DAG_CTRLCCK)); HALDEBUG(ah, HAL_DEBUG_RESET, ">>>2 %s: AR_PHY_ADC_CTL=0x%x\n", __func__, OS_REG_READ(ah,AR_PHY_ADC_CTL)); /* * This routine swaps the analog chains - it should be done * before any radio register twiddling is done. */ ar5416InitChainMasks(ah); /* Setup the open-loop power calibration if required */ if (ath_hal_eepromGetFlag(ah, AR_EEP_OL_PWRCTRL)) { AH5416(ah)->ah_olcInit(ah); AH5416(ah)->ah_olcTempCompensation(ah); } /* Setup the transmit power values. */ if (!ah->ah_setTxPower(ah, chan, rfXpdGain)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: error init'ing transmit power\n", __func__); FAIL(HAL_EIO); } /* Write the analog registers */ if (!ahp->ah_rfHal->setRfRegs(ah, chan, IEEE80211_IS_CHAN_2GHZ(chan) ? 2: 1, rfXpdGain)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: ar5212SetRfRegs failed\n", __func__); FAIL(HAL_EIO); } /* Write delta slope for OFDM enabled modes (A, G, Turbo) */ if (IEEE80211_IS_CHAN_OFDM(chan)|| IEEE80211_IS_CHAN_HT(chan)) ar5416SetDeltaSlope(ah, chan); AH5416(ah)->ah_spurMitigate(ah, chan); /* Setup board specific options for EEPROM version 3 */ if (!ah->ah_setBoardValues(ah, chan)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: error setting board options\n", __func__); FAIL(HAL_EIO); } OS_MARK(ah, AH_MARK_RESET_LINE, __LINE__); OS_REG_WRITE(ah, AR_STA_ID0, LE_READ_4(ahp->ah_macaddr)); OS_REG_WRITE(ah, AR_STA_ID1, LE_READ_2(ahp->ah_macaddr + 4) | macStaId1 | AR_STA_ID1_RTS_USE_DEF | ahp->ah_staId1Defaults ); ar5212SetOperatingMode(ah, opmode); /* Set Venice BSSID mask according to current state */ OS_REG_WRITE(ah, AR_BSSMSKL, LE_READ_4(ahp->ah_bssidmask)); OS_REG_WRITE(ah, AR_BSSMSKU, LE_READ_2(ahp->ah_bssidmask + 4)); /* Restore previous led state */ if (AR_SREV_HOWL(ah)) OS_REG_WRITE(ah, AR_MAC_LED, AR_MAC_LED_ASSOC_ACTIVE | AR_CFG_SCLK_32KHZ); else OS_REG_WRITE(ah, AR_MAC_LED, OS_REG_READ(ah, AR_MAC_LED) | saveLedState); /* Start TSF2 for generic timer 8-15 */ #ifdef NOTYET if (AR_SREV_KIWI(ah)) ar5416StartTsf2(ah); #endif /* * Enable Bluetooth Coexistence if it's enabled. */ if (AH5416(ah)->ah_btCoexConfigType != HAL_BT_COEX_CFG_NONE) ar5416InitBTCoex(ah); /* Restore previous antenna */ OS_REG_WRITE(ah, AR_DEF_ANTENNA, saveDefAntenna); /* then our BSSID and associate id */ OS_REG_WRITE(ah, AR_BSS_ID0, LE_READ_4(ahp->ah_bssid)); OS_REG_WRITE(ah, AR_BSS_ID1, LE_READ_2(ahp->ah_bssid + 4) | (ahp->ah_assocId & 0x3fff) << AR_BSS_ID1_AID_S); /* Restore bmiss rssi & count thresholds */ OS_REG_WRITE(ah, AR_RSSI_THR, ahp->ah_rssiThr); OS_REG_WRITE(ah, AR_ISR, ~0); /* cleared on write */ /* Restore bmiss rssi & count thresholds */ OS_REG_WRITE(ah, AR_RSSI_THR, rssiThrReg); if (!ar5212SetChannel(ah, chan)) FAIL(HAL_EIO); OS_MARK(ah, AH_MARK_RESET_LINE, __LINE__); /* Set 1:1 QCU to DCU mapping for all queues */ for (i = 0; i < AR_NUM_DCU; i++) OS_REG_WRITE(ah, AR_DQCUMASK(i), 1 << i); ahp->ah_intrTxqs = 0; for (i = 0; i < AH_PRIVATE(ah)->ah_caps.halTotalQueues; i++) ah->ah_resetTxQueue(ah, i); ar5416InitIMR(ah, opmode); ar5416SetCoverageClass(ah, AH_PRIVATE(ah)->ah_coverageClass, 1); ar5416InitQoS(ah); /* This may override the AR_DIAG_SW register */ ar5416InitUserSettings(ah); /* XXX this won't work for AR9287! 
*/ if (IEEE80211_IS_CHAN_HALF(chan) || IEEE80211_IS_CHAN_QUARTER(chan)) { ar5416SetIFSTiming(ah, chan); #if 0 /* * AR5413? * Force window_length for 1/2 and 1/4 rate channels, * the ini file sets this to zero otherwise. */ OS_REG_RMW_FIELD(ah, AR_PHY_FRAME_CTL, AR_PHY_FRAME_CTL_WINLEN, 3); } #endif } if (AR_SREV_KIWI_13_OR_LATER(ah)) { /* * Enable ASYNC FIFO * * If Async FIFO is enabled, the following counters change * as MAC now runs at 117 Mhz instead of 88/44MHz when * async FIFO is disabled. * * Overwrite the delay/timeouts initialized in ProcessIni() * above. */ OS_REG_WRITE(ah, AR_D_GBL_IFS_SIFS, AR_D_GBL_IFS_SIFS_ASYNC_FIFO_DUR); OS_REG_WRITE(ah, AR_D_GBL_IFS_SLOT, AR_D_GBL_IFS_SLOT_ASYNC_FIFO_DUR); OS_REG_WRITE(ah, AR_D_GBL_IFS_EIFS, AR_D_GBL_IFS_EIFS_ASYNC_FIFO_DUR); OS_REG_WRITE(ah, AR_TIME_OUT, AR_TIME_OUT_ACK_CTS_ASYNC_FIFO_DUR); OS_REG_WRITE(ah, AR_USEC, AR_USEC_ASYNC_FIFO_DUR); OS_REG_SET_BIT(ah, AR_MAC_PCU_LOGIC_ANALYZER, AR_MAC_PCU_LOGIC_ANALYZER_DISBUG20768); OS_REG_RMW_FIELD(ah, AR_AHB_MODE, AR_AHB_CUSTOM_BURST_EN, AR_AHB_CUSTOM_BURST_ASYNC_FIFO_VAL); } if (AR_SREV_KIWI_13_OR_LATER(ah)) { /* Enable AGGWEP to accelerate encryption engine */ OS_REG_SET_BIT(ah, AR_PCU_MISC_MODE2, AR_PCU_MISC_MODE2_ENABLE_AGGWEP); } /* * disable seq number generation in hw */ OS_REG_WRITE(ah, AR_STA_ID1, OS_REG_READ(ah, AR_STA_ID1) | AR_STA_ID1_PRESERVE_SEQNUM); ar5416InitDMA(ah); /* * program OBS bus to see MAC interrupts */ OS_REG_WRITE(ah, AR_OBS, 8); /* * Disable the "general" TX/RX mitigation timers. */ OS_REG_WRITE(ah, AR_MIRT, 0); #ifdef AH_AR5416_INTERRUPT_MITIGATION /* * This initialises the RX interrupt mitigation timers. * * The mitigation timers begin at idle and are triggered * upon the RXOK of a single frame (or sub-frame, for A-MPDU.) * Then, the RX mitigation interrupt will fire: * * + 250uS after the last RX'ed frame, or * + 700uS after the first RX'ed frame * * Thus, the LAST field dictates the extra latency * induced by the RX mitigation method and the FIRST * field dictates how long to delay before firing an * RX mitigation interrupt. * * Please note this only seems to be for RXOK frames; * not CRC or PHY error frames. * */ OS_REG_RMW_FIELD(ah, AR_RIMT, AR_RIMT_LAST, 250); OS_REG_RMW_FIELD(ah, AR_RIMT, AR_RIMT_FIRST, 700); #endif ar5416InitBB(ah, chan); /* Setup compression registers */ ar5212SetCompRegs(ah); /* XXX not needed? */ /* * 5416 baseband will check the per rate power table * and select the lower of the two */ ackTpcPow = 63; ctsTpcPow = 63; chirpTpcPow = 63; powerVal = SM(ackTpcPow, AR_TPC_ACK) | SM(ctsTpcPow, AR_TPC_CTS) | SM(chirpTpcPow, AR_TPC_CHIRP); OS_REG_WRITE(ah, AR_TPC, powerVal); if (!ar5416InitCal(ah, chan)) FAIL(HAL_ESELFTEST); ar5416RestoreChainMask(ah); AH_PRIVATE(ah)->ah_opmode = opmode; /* record operating mode */ if (bChannelChange && !IEEE80211_IS_CHAN_DFS(chan)) chan->ic_state &= ~IEEE80211_CHANSTATE_CWINT; if (AR_SREV_HOWL(ah)) { /* * Enable the MBSSID block-ack fix for HOWL. * This feature is only supported on Howl 1.4, but it is safe to * set bit 22 of STA_ID1 on other Howl revisions (1.1, 1.2, 1.3), * since bit 22 is unused in those Howl revisions. 
*/ unsigned int reg; reg = (OS_REG_READ(ah, AR_STA_ID1) | (1<<22)); OS_REG_WRITE(ah,AR_STA_ID1, reg); ath_hal_printf(ah, "MBSSID Set bit 22 of AR_STA_ID 0x%x\n", reg); } HALDEBUG(ah, HAL_DEBUG_RESET, "%s: done\n", __func__); OS_MARK(ah, AH_MARK_RESET_DONE, 0); return AH_TRUE; bad: OS_MARK(ah, AH_MARK_RESET_DONE, ecode); if (status != AH_NULL) *status = ecode; return AH_FALSE; #undef FAIL #undef N } #if 0 /* * This channel change evaluates whether the selected hardware can * perform a synthesizer-only channel change (no reset). If the * TX is not stopped, or the RFBus cannot be granted in the given * time, the function returns false as a reset is necessary */ HAL_BOOL ar5416ChannelChange(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t ulCount; uint32_t data, synthDelay, qnum; uint16_t rfXpdGain[4]; struct ath_hal_5212 *ahp = AH5212(ah); HAL_CHANNEL_INTERNAL *ichan; /* * Map public channel to private. */ ichan = ath_hal_checkchannel(ah, chan); /* TX must be stopped or RF Bus grant will not work */ for (qnum = 0; qnum < AH_PRIVATE(ah)->ah_caps.halTotalQueues; qnum++) { if (ar5212NumTxPending(ah, qnum)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: frames pending on queue %d\n", __func__, qnum); return AH_FALSE; } } /* * Kill last Baseband Rx Frame - Request analog bus grant */ OS_REG_WRITE(ah, AR_PHY_RFBUS_REQ, AR_PHY_RFBUS_REQ_REQUEST); if (!ath_hal_wait(ah, AR_PHY_RFBUS_GNT, AR_PHY_RFBUS_GRANT_EN, AR_PHY_RFBUS_GRANT_EN)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: could not kill baseband rx\n", __func__); return AH_FALSE; } ar5416Set11nRegs(ah, chan); /* NB: setup 5416-specific regs */ /* Change the synth */ if (!ar5212SetChannel(ah, chan)) return AH_FALSE; /* Setup the transmit power values. */ if (!ah->ah_setTxPower(ah, chan, rfXpdGain)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: error init'ing transmit power\n", __func__); return AH_FALSE; } /* * Wait for the frequency synth to settle (synth goes on * via PHY_ACTIVE_EN). Read the phy active delay register. * Value is in 100ns increments. */ data = OS_REG_READ(ah, AR_PHY_RX_DELAY) & AR_PHY_RX_DELAY_DELAY; if (IS_CHAN_CCK(ichan)) { synthDelay = (4 * data) / 22; } else { synthDelay = data / 10; } OS_DELAY(synthDelay + BASE_ACTIVATE_DELAY); /* Release the RFBus Grant */ OS_REG_WRITE(ah, AR_PHY_RFBUS_REQ, 0); /* Write delta slope for OFDM enabled modes (A, G, Turbo) */ if (IEEE80211_IS_CHAN_OFDM(ichan)|| IEEE80211_IS_CHAN_HT(chan)) { HALASSERT(AH_PRIVATE(ah)->ah_eeversion >= AR_EEPROM_VER5_3); ar5212SetSpurMitigation(ah, chan); ar5416SetDeltaSlope(ah, chan); } /* XXX spur mitigation for Merlin */ if (!IEEE80211_IS_CHAN_DFS(chan)) chan->ic_state &= ~IEEE80211_CHANSTATE_CWINT; ichan->channel_time = 0; ichan->tsf_last = ar5416GetTsf64(ah); ar5212TxEnable(ah, AH_TRUE); return AH_TRUE; } #endif static void ar5416InitDMA(struct ath_hal *ah) { struct ath_hal_5212 *ahp = AH5212(ah); /* * set AHB_MODE not to do cacheline prefetches */ OS_REG_SET_BIT(ah, AR_AHB_MODE, AR_AHB_PREFETCH_RD_EN); /* * let mac dma reads be in 128 byte chunks */ OS_REG_WRITE(ah, AR_TXCFG, (OS_REG_READ(ah, AR_TXCFG) & ~AR_TXCFG_DMASZ_MASK) | AR_TXCFG_DMASZ_128B); /* * let mac dma writes be in 128 byte chunks */ /* * XXX If you change this, you must change the headroom * assigned in ah_maxTxTrigLev - see ar5416InitState().
*/ OS_REG_WRITE(ah, AR_RXCFG, (OS_REG_READ(ah, AR_RXCFG) & ~AR_RXCFG_DMASZ_MASK) | AR_RXCFG_DMASZ_128B); /* restore TX trigger level */ OS_REG_WRITE(ah, AR_TXCFG, (OS_REG_READ(ah, AR_TXCFG) &~ AR_FTRIG) | SM(ahp->ah_txTrigLev, AR_FTRIG)); /* * Setup receive FIFO threshold to hold off TX activities */ OS_REG_WRITE(ah, AR_RXFIFO_CFG, 0x200); /* * reduce the number of usable entries in PCU TXBUF to avoid * wrap around. */ if (AR_SREV_KITE(ah)) /* * For AR9285 the number of Fifos are reduced to half. * So set the usable tx buf size also to half to * avoid data/delimiter underruns */ OS_REG_WRITE(ah, AR_PCU_TXBUF_CTRL, AR_9285_PCU_TXBUF_CTRL_USABLE_SIZE); else OS_REG_WRITE(ah, AR_PCU_TXBUF_CTRL, AR_PCU_TXBUF_CTRL_USABLE_SIZE); } static void ar5416InitBB(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t synthDelay; /* * Wait for the frequency synth to settle (synth goes on * via AR_PHY_ACTIVE_EN). Read the phy active delay register. * Value is in 100ns increments. */ synthDelay = OS_REG_READ(ah, AR_PHY_RX_DELAY) & AR_PHY_RX_DELAY_DELAY; if (IEEE80211_IS_CHAN_CCK(chan)) { synthDelay = (4 * synthDelay) / 22; } else { synthDelay /= 10; } /* Turn on PLL on 5416 */ HALDEBUG(ah, HAL_DEBUG_RESET, "%s %s channel\n", __func__, IEEE80211_IS_CHAN_5GHZ(chan) ? "5GHz" : "2GHz"); /* Activate the PHY (includes baseband activate and synthesizer on) */ OS_REG_WRITE(ah, AR_PHY_ACTIVE, AR_PHY_ACTIVE_EN); /* * If the AP starts the calibration before the base band timeout * completes we could get rx_clear false triggering. Add an * extra BASE_ACTIVATE_DELAY usecs to ensure this condition * does not happen. */ if (IEEE80211_IS_CHAN_HALF(chan)) { OS_DELAY((synthDelay << 1) + BASE_ACTIVATE_DELAY); } else if (IEEE80211_IS_CHAN_QUARTER(chan)) { OS_DELAY((synthDelay << 2) + BASE_ACTIVATE_DELAY); } else { OS_DELAY(synthDelay + BASE_ACTIVATE_DELAY); } } static void ar5416InitIMR(struct ath_hal *ah, HAL_OPMODE opmode) { struct ath_hal_5212 *ahp = AH5212(ah); /* * Setup interrupt handling. Note that ar5212ResetTxQueue * manipulates the secondary IMR's as queues are enabled * and disabled. This is done with RMW ops to insure the * settings we make here are preserved. */ ahp->ah_maskReg = AR_IMR_TXERR | AR_IMR_TXURN | AR_IMR_RXERR | AR_IMR_RXORN | AR_IMR_BCNMISC; #ifdef AH_AR5416_INTERRUPT_MITIGATION ahp->ah_maskReg |= AR_IMR_RXINTM | AR_IMR_RXMINTR; #else ahp->ah_maskReg |= AR_IMR_RXOK; #endif ahp->ah_maskReg |= AR_IMR_TXOK; if (opmode == HAL_M_HOSTAP) ahp->ah_maskReg |= AR_IMR_MIB; OS_REG_WRITE(ah, AR_IMR, ahp->ah_maskReg); #ifdef ADRIAN_NOTYET /* This is straight from ath9k */ if (! 
AR_SREV_HOWL(ah)) { OS_REG_WRITE(ah, AR_INTR_SYNC_CAUSE, 0xFFFFFFFF); OS_REG_WRITE(ah, AR_INTR_SYNC_ENABLE, AR_INTR_SYNC_DEFAULT); OS_REG_WRITE(ah, AR_INTR_SYNC_MASK, 0); } #endif /* Enable bus errors that are OR'd to set the HIUERR bit */ #if 0 OS_REG_WRITE(ah, AR_IMR_S2, OS_REG_READ(ah, AR_IMR_S2) | AR_IMR_S2_GTT | AR_IMR_S2_CST); #endif } static void ar5416InitQoS(struct ath_hal *ah) { /* QoS support */ OS_REG_WRITE(ah, AR_QOS_CONTROL, 0x100aa); /* XXX magic */ OS_REG_WRITE(ah, AR_QOS_SELECT, 0x3210); /* XXX magic */ /* Turn on NOACK Support for QoS packets */ OS_REG_WRITE(ah, AR_NOACK, SM(2, AR_NOACK_2BIT_VALUE) | SM(5, AR_NOACK_BIT_OFFSET) | SM(0, AR_NOACK_BYTE_OFFSET)); /* * initialize TXOP for all TIDs */ OS_REG_WRITE(ah, AR_TXOP_X, AR_TXOP_X_VAL); OS_REG_WRITE(ah, AR_TXOP_0_3, 0xFFFFFFFF); OS_REG_WRITE(ah, AR_TXOP_4_7, 0xFFFFFFFF); OS_REG_WRITE(ah, AR_TXOP_8_11, 0xFFFFFFFF); OS_REG_WRITE(ah, AR_TXOP_12_15, 0xFFFFFFFF); } static void ar5416InitUserSettings(struct ath_hal *ah) { struct ath_hal_5212 *ahp = AH5212(ah); /* Restore user-specified settings */ if (ahp->ah_miscMode != 0) OS_REG_WRITE(ah, AR_MISC_MODE, OS_REG_READ(ah, AR_MISC_MODE) | ahp->ah_miscMode); if (ahp->ah_sifstime != (u_int) -1) ar5212SetSifsTime(ah, ahp->ah_sifstime); if (ahp->ah_slottime != (u_int) -1) ar5212SetSlotTime(ah, ahp->ah_slottime); if (ahp->ah_acktimeout != (u_int) -1) ar5212SetAckTimeout(ah, ahp->ah_acktimeout); if (ahp->ah_ctstimeout != (u_int) -1) ar5212SetCTSTimeout(ah, ahp->ah_ctstimeout); if (AH_PRIVATE(ah)->ah_diagreg != 0) OS_REG_WRITE(ah, AR_DIAG_SW, AH_PRIVATE(ah)->ah_diagreg); if (AH5416(ah)->ah_globaltxtimeout != (u_int) -1) ar5416SetGlobalTxTimeout(ah, AH5416(ah)->ah_globaltxtimeout); } static void ar5416SetRfMode(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t rfMode; if (chan == AH_NULL) return; /* treat channel B as channel G , no B mode suport in owl */ rfMode = IEEE80211_IS_CHAN_CCK(chan) ? AR_PHY_MODE_DYNAMIC : AR_PHY_MODE_OFDM; if (AR_SREV_MERLIN_20(ah) && IS_5GHZ_FAST_CLOCK_EN(ah, chan)) { /* phy mode bits for 5GHz channels require Fast Clock */ rfMode |= AR_PHY_MODE_DYNAMIC | AR_PHY_MODE_DYN_CCK_DISABLE; } else if (!AR_SREV_MERLIN_10_OR_LATER(ah)) { rfMode |= IEEE80211_IS_CHAN_5GHZ(chan) ? AR_PHY_MODE_RF5GHZ : AR_PHY_MODE_RF2GHZ; } OS_REG_WRITE(ah, AR_PHY_MODE, rfMode); } /* * Places the hardware into reset and then pulls it out of reset */ HAL_BOOL ar5416ChipReset(struct ath_hal *ah, const struct ieee80211_channel *chan) { OS_MARK(ah, AH_MARK_CHIPRESET, chan ? chan->ic_freq : 0); /* * Warm reset is optimistic for open-loop TX power control. */ if (AR_SREV_MERLIN(ah) && ath_hal_eepromGetFlag(ah, AR_EEP_OL_PWRCTRL)) { if (!ar5416SetResetReg(ah, HAL_RESET_POWER_ON)) return AH_FALSE; } else if (ah->ah_config.ah_force_full_reset) { if (!ar5416SetResetReg(ah, HAL_RESET_POWER_ON)) return AH_FALSE; } else { if (!ar5416SetResetReg(ah, HAL_RESET_WARM)) return AH_FALSE; } /* Bring out of sleep mode (AGAIN) */ if (!ar5416SetPowerMode(ah, HAL_PM_AWAKE, AH_TRUE)) return AH_FALSE; #ifdef notyet ahp->ah_chipFullSleep = AH_FALSE; #endif AH5416(ah)->ah_initPLL(ah, chan); /* * Perform warm reset before the mode/PLL/turbo registers * are changed in order to deactivate the radio. Mode changes * with an active radio can result in corrupted shifts to the * radio device. */ ar5416SetRfMode(ah, chan); return AH_TRUE; } /* * Delta slope coefficient computation. * Required for OFDM operation. 
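 *
 * A worked example of the computation below (illustrative arithmetic,
 * not taken from the source): on a 20MHz channel centered at 5180MHz,
 * coef_scaled = 0x64000000 / 5180 = 323884.  The highest set bit of
 * that value is bit 18, so coef_exp = 14 - (18 - 24) = 20; then
 * coef_man = 323884 + (1 << 3) = 323892, and the hardware is given a
 * mantissa of 323892 >> 4 = 20243 and an exponent of 20 - 16 = 4.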
*/ static void ar5416GetDeltaSlopeValues(struct ath_hal *ah, uint32_t coef_scaled, uint32_t *coef_mantissa, uint32_t *coef_exponent) { #define COEF_SCALE_S 24 uint32_t coef_exp, coef_man; /* * ALGO -> coef_exp = 14-floor(log2(coef)); * floor(log2(x)) is the highest set bit position */ for (coef_exp = 31; coef_exp > 0; coef_exp--) if ((coef_scaled >> coef_exp) & 0x1) break; /* A coef_exp of 0 is a legal bit position but an unexpected coef_exp */ HALASSERT(coef_exp); coef_exp = 14 - (coef_exp - COEF_SCALE_S); /* * ALGO -> coef_man = floor(coef* 2^coef_exp+0.5); * The coefficient is already shifted up for scaling */ coef_man = coef_scaled + (1 << (COEF_SCALE_S - coef_exp - 1)); *coef_mantissa = coef_man >> (COEF_SCALE_S - coef_exp); *coef_exponent = coef_exp - 16; #undef COEF_SCALE_S } void ar5416SetDeltaSlope(struct ath_hal *ah, const struct ieee80211_channel *chan) { #define INIT_CLOCKMHZSCALED 0x64000000 uint32_t coef_scaled, ds_coef_exp, ds_coef_man; uint32_t clockMhzScaled; CHAN_CENTERS centers; /* half and quarter rate can divide the scaled clock by 2 or 4 respectively */ /* scale for selected channel bandwidth */ clockMhzScaled = INIT_CLOCKMHZSCALED; if (IEEE80211_IS_CHAN_TURBO(chan)) clockMhzScaled <<= 1; else if (IEEE80211_IS_CHAN_HALF(chan)) clockMhzScaled >>= 1; else if (IEEE80211_IS_CHAN_QUARTER(chan)) clockMhzScaled >>= 2; /* * ALGO -> coef = 1e8/fcarrier*fclock/40; * scaled coef to provide precision for this floating calculation */ ar5416GetChannelCenters(ah, chan, &centers); coef_scaled = clockMhzScaled / centers.synth_center; ar5416GetDeltaSlopeValues(ah, coef_scaled, &ds_coef_man, &ds_coef_exp); OS_REG_RMW_FIELD(ah, AR_PHY_TIMING3, AR_PHY_TIMING3_DSC_MAN, ds_coef_man); OS_REG_RMW_FIELD(ah, AR_PHY_TIMING3, AR_PHY_TIMING3_DSC_EXP, ds_coef_exp); /* * For short GI, * the scaled coefficient is 9/10 that of the normal coefficient */ coef_scaled = (9 * coef_scaled)/10; ar5416GetDeltaSlopeValues(ah, coef_scaled, &ds_coef_man, &ds_coef_exp); /* for short gi */ OS_REG_RMW_FIELD(ah, AR_PHY_HALFGI, AR_PHY_HALFGI_DSC_MAN, ds_coef_man); OS_REG_RMW_FIELD(ah, AR_PHY_HALFGI, AR_PHY_HALFGI_DSC_EXP, ds_coef_exp); #undef INIT_CLOCKMHZSCALED } /* * Set a limit on the overall output power. Used for dynamic * transmit power control and the like. * * NB: limit is in units of 0.5 dBm. */ HAL_BOOL ar5416SetTxPowerLimit(struct ath_hal *ah, uint32_t limit) { uint16_t dummyXpdGains[2]; AH_PRIVATE(ah)->ah_powerLimit = AH_MIN(limit, MAX_RATE_POWER); return ah->ah_setTxPower(ah, AH_PRIVATE(ah)->ah_curchan, dummyXpdGains); } HAL_BOOL ar5416GetChipPowerLimits(struct ath_hal *ah, struct ieee80211_channel *chan) { struct ath_hal_5212 *ahp = AH5212(ah); int16_t minPower, maxPower; /* * Get Pier table max and min powers. */ if (ahp->ah_rfHal->getChannelMaxMinPower(ah, chan, &maxPower, &minPower)) { /* NB: rf code returns 1/4 dBm units, convert */ chan->ic_maxpower = maxPower / 2; chan->ic_minpower = minPower / 2; } else { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: no min/max power for %u/0x%x\n", __func__, chan->ic_freq, chan->ic_flags); chan->ic_maxpower = AR5416_MAX_RATE_POWER; chan->ic_minpower = 0; } HALDEBUG(ah, HAL_DEBUG_RESET, "Chan %d: MaxPow = %d MinPow = %d\n", chan->ic_freq, chan->ic_maxpower, chan->ic_minpower); return AH_TRUE; } /************************************************************** * ar5416WriteTxPowerRateRegisters * * Write the TX power rate registers from the raw values given * in ratesArray[]. * * The CCK and HT40 rate registers are only written if needed. * HT20 and 11g/11a OFDM rate registers are always written.
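 *
 * Note (derived from POW_SM below): each 32-bit rate register packs
 * four 6-bit power values in 0.5 dBm steps, so e.g. a field value of
 * 0x20 encodes 16 dBm.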
* * The values written are raw values which should be written * to the registers - so it's up to the caller to pre-adjust * them (e.g. CCK power offset value, or Merlin TX power offset, * etc.) */ void ar5416WriteTxPowerRateRegisters(struct ath_hal *ah, const struct ieee80211_channel *chan, const int16_t ratesArray[]) { #define POW_SM(_r, _s) (((_r) & 0x3f) << (_s)) /* Write the OFDM power per rate set */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE1, POW_SM(ratesArray[rate18mb], 24) | POW_SM(ratesArray[rate12mb], 16) | POW_SM(ratesArray[rate9mb], 8) | POW_SM(ratesArray[rate6mb], 0) ); OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE2, POW_SM(ratesArray[rate54mb], 24) | POW_SM(ratesArray[rate48mb], 16) | POW_SM(ratesArray[rate36mb], 8) | POW_SM(ratesArray[rate24mb], 0) ); if (IEEE80211_IS_CHAN_2GHZ(chan)) { /* Write the CCK power per rate set */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE3, POW_SM(ratesArray[rate2s], 24) | POW_SM(ratesArray[rate2l], 16) | POW_SM(ratesArray[rateXr], 8) /* XR target power */ | POW_SM(ratesArray[rate1l], 0) ); OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE4, POW_SM(ratesArray[rate11s], 24) | POW_SM(ratesArray[rate11l], 16) | POW_SM(ratesArray[rate5_5s], 8) | POW_SM(ratesArray[rate5_5l], 0) ); HALDEBUG(ah, HAL_DEBUG_RESET, "%s AR_PHY_POWER_TX_RATE3=0x%x AR_PHY_POWER_TX_RATE4=0x%x\n", __func__, OS_REG_READ(ah,AR_PHY_POWER_TX_RATE3), OS_REG_READ(ah,AR_PHY_POWER_TX_RATE4)); } /* Write the HT20 power per rate set */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE5, POW_SM(ratesArray[rateHt20_3], 24) | POW_SM(ratesArray[rateHt20_2], 16) | POW_SM(ratesArray[rateHt20_1], 8) | POW_SM(ratesArray[rateHt20_0], 0) ); OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE6, POW_SM(ratesArray[rateHt20_7], 24) | POW_SM(ratesArray[rateHt20_6], 16) | POW_SM(ratesArray[rateHt20_5], 8) | POW_SM(ratesArray[rateHt20_4], 0) ); if (IEEE80211_IS_CHAN_HT40(chan)) { /* Write the HT40 power per rate set */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE7, POW_SM(ratesArray[rateHt40_3], 24) | POW_SM(ratesArray[rateHt40_2], 16) | POW_SM(ratesArray[rateHt40_1], 8) | POW_SM(ratesArray[rateHt40_0], 0) ); OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE8, POW_SM(ratesArray[rateHt40_7], 24) | POW_SM(ratesArray[rateHt40_6], 16) | POW_SM(ratesArray[rateHt40_5], 8) | POW_SM(ratesArray[rateHt40_4], 0) ); /* Write the Dup/Ext 40 power per rate set */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE9, POW_SM(ratesArray[rateExtOfdm], 24) | POW_SM(ratesArray[rateExtCck], 16) | POW_SM(ratesArray[rateDupOfdm], 8) | POW_SM(ratesArray[rateDupCck], 0) ); } /* * Set max power to 30 dBm and, optionally, * enable TPC in tx descriptors. */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_RATE_MAX, MAX_RATE_POWER | (AH5212(ah)->ah_tpcEnabled ? AR_PHY_POWER_TX_RATE_MAX_TPC_ENABLE : 0)); #undef POW_SM } /************************************************************** * ar5416SetTransmitPower * * Set the transmit power in the baseband for the given * operating channel and mode. */ HAL_BOOL ar5416SetTransmitPower(struct ath_hal *ah, const struct ieee80211_channel *chan, uint16_t *rfXpdGain) { #define N(a) (sizeof (a) / sizeof (a[0])) #define POW_SM(_r, _s) (((_r) & 0x3f) << (_s)) MODAL_EEP_HEADER *pModal; struct ath_hal_5212 *ahp = AH5212(ah); int16_t txPowerIndexOffset = 0; int i; uint16_t cfgCtl; uint16_t powerLimit; uint16_t twiceAntennaReduction; uint16_t twiceMaxRegulatoryPower; int16_t maxPower; HAL_EEPROM_v14 *ee = AH_PRIVATE(ah)->ah_eeprom; struct ar5416eeprom *pEepData = &ee->ee_base; HALASSERT(AH_PRIVATE(ah)->ah_eeversion >= AR_EEPROM_VER14_1); /* * Default to 2; this is overridden based on the EEPROM version / value.
*/ AH5416(ah)->ah_ht40PowerIncForPdadc = 2; /* Set up info for the actual EEPROM */ OS_MEMZERO(AH5416(ah)->ah_ratesArray, sizeof(AH5416(ah)->ah_ratesArray)); cfgCtl = ath_hal_getctl(ah, chan); powerLimit = chan->ic_maxregpower * 2; twiceAntennaReduction = chan->ic_maxantgain; twiceMaxRegulatoryPower = AH_MIN(MAX_RATE_POWER, AH_PRIVATE(ah)->ah_powerLimit); pModal = &pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)]; HALDEBUG(ah, HAL_DEBUG_RESET, "%s Channel=%u CfgCtl=%u\n", __func__,chan->ic_freq, cfgCtl ); if (IS_EEP_MINOR_V2(ah)) { AH5416(ah)->ah_ht40PowerIncForPdadc = pModal->ht40PowerIncForPdadc; } if (!ar5416SetPowerPerRateTable(ah, pEepData, chan, &AH5416(ah)->ah_ratesArray[0], cfgCtl, twiceAntennaReduction, twiceMaxRegulatoryPower, powerLimit)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: unable to set tx power per rate table\n", __func__); return AH_FALSE; } if (!AH5416(ah)->ah_setPowerCalTable(ah, pEepData, chan, &txPowerIndexOffset)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: unable to set power table\n", __func__); return AH_FALSE; } maxPower = AH_MAX(AH5416(ah)->ah_ratesArray[rate6mb], AH5416(ah)->ah_ratesArray[rateHt20_0]); if (IEEE80211_IS_CHAN_2GHZ(chan)) { maxPower = AH_MAX(maxPower, AH5416(ah)->ah_ratesArray[rate1l]); } if (IEEE80211_IS_CHAN_HT40(chan)) { maxPower = AH_MAX(maxPower, AH5416(ah)->ah_ratesArray[rateHt40_0]); } ahp->ah_tx6PowerInHalfDbm = maxPower; AH_PRIVATE(ah)->ah_maxPowerLevel = maxPower; ahp->ah_txPowerIndexOffset = txPowerIndexOffset; /* * txPowerIndexOffset is set by the SetPowerTable() call - * adjust the rate table (0 offset if rates EEPROM not loaded) */ for (i = 0; i < N(AH5416(ah)->ah_ratesArray); i++) { AH5416(ah)->ah_ratesArray[i] = (int16_t)(txPowerIndexOffset + AH5416(ah)->ah_ratesArray[i]); if (AH5416(ah)->ah_ratesArray[i] > AR5416_MAX_RATE_POWER) AH5416(ah)->ah_ratesArray[i] = AR5416_MAX_RATE_POWER; } #ifdef AH_EEPROM_DUMP /* * Dump the rate array whilst it represents the intended dBm*2 * values versus what's being adjusted before being programmed * in. Keep this in mind if you code up this function and enable * this debugging; the values won't necessarily be what's being * programmed into the hardware. */ ar5416PrintPowerPerRate(ah, AH5416(ah)->ah_ratesArray); #endif /* * Merlin and later have a power offset, so subtract * pwr_table_offset * 2 from each value. The default * power offset is -5 dBm - i.e., a register value of 0 * equates to a TX power of -5 dBm. */ if (AR_SREV_MERLIN_20_OR_LATER(ah)) { int8_t pwr_table_offset; (void) ath_hal_eepromGet(ah, AR_EEP_PWR_TABLE_OFFSET, &pwr_table_offset); /* Underflow power gets clamped at raw value 0 */ /* Overflow power gets clamped at AR5416_MAX_RATE_POWER */ for (i = 0; i < N(AH5416(ah)->ah_ratesArray); i++) { /* * + pwr_table_offset is in dBm * + ratesArray is in 1/2 dBm */ AH5416(ah)->ah_ratesArray[i] -= (pwr_table_offset * 2); if (AH5416(ah)->ah_ratesArray[i] < 0) AH5416(ah)->ah_ratesArray[i] = 0; else if (AH5416(ah)->ah_ratesArray[i] > AR5416_MAX_RATE_POWER) AH5416(ah)->ah_ratesArray[i] = AR5416_MAX_RATE_POWER; } } /* * Adjust rates for OLC where needed * * The following CCK rates need adjusting when doing 2.4GHz * CCK transmission. * * + rate2s, rate2l, rate1l, rate11s, rate11l, rate5_5s, rate5_5l * + rateExtCck, rateDupCck * * They're adjusted here regardless. The hardware then gets * programmed as needed. 5GHz operation doesn't program in CCK * rates for legacy mode but they seem to be initialised for * HT40 regardless of channel type.
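 *
 * Note (derived from the code below): the adjustment is in 0.5 dBm
 * steps, so cck_ofdm_delta = 2 lowers each listed CCK rate by 1 dB,
 * clamped at 0.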
*/ if (AR_SREV_MERLIN_20_OR_LATER(ah) && ath_hal_eepromGetFlag(ah, AR_EEP_OL_PWRCTRL)) { int adj[] = { rate2s, rate2l, rate1l, rate11s, rate11l, rate5_5s, rate5_5l, rateExtCck, rateDupCck }; int cck_ofdm_delta = 2; int i; for (i = 0; i < N(adj); i++) { AH5416(ah)->ah_ratesArray[adj[i]] -= cck_ofdm_delta; if (AH5416(ah)->ah_ratesArray[adj[i]] < 0) AH5416(ah)->ah_ratesArray[adj[i]] = 0; } } /* * Adjust the HT40 power to meet the correct target TX power * for 40MHz mode, based on TX power curves that are established * for 20MHz mode. * * XXX handle overflow/too high power level? */ if (IEEE80211_IS_CHAN_HT40(chan)) { AH5416(ah)->ah_ratesArray[rateHt40_0] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_1] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_2] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_3] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_4] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_5] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_6] += AH5416(ah)->ah_ht40PowerIncForPdadc; AH5416(ah)->ah_ratesArray[rateHt40_7] += AH5416(ah)->ah_ht40PowerIncForPdadc; } /* Write the TX power rate registers */ ar5416WriteTxPowerRateRegisters(ah, chan, AH5416(ah)->ah_ratesArray); /* Write the Power subtraction for dynamic chain changing, for per-packet powertx */ OS_REG_WRITE(ah, AR_PHY_POWER_TX_SUB, POW_SM(pModal->pwrDecreaseFor3Chain, 6) | POW_SM(pModal->pwrDecreaseFor2Chain, 0) ); return AH_TRUE; #undef POW_SM #undef N } /* * Exported call to check for a recent gain reading and return * the current state of the thermal calibration gain engine. */ HAL_RFGAIN ar5416GetRfgain(struct ath_hal *ah) { return (HAL_RFGAIN_INACTIVE); } /* * Places all of the hardware into reset */ HAL_BOOL ar5416Disable(struct ath_hal *ah) { if (!ar5416SetPowerMode(ah, HAL_PM_AWAKE, AH_TRUE)) return AH_FALSE; if (! ar5416SetResetReg(ah, HAL_RESET_COLD)) return AH_FALSE; AH5416(ah)->ah_initPLL(ah, AH_NULL); return (AH_TRUE); } /* * Places the PHY and Radio chips into reset. A full reset * must be called to leave this state. The PCI/MAC/PCU are * not placed into reset as we must receive an interrupt to * re-enable the hardware. */ HAL_BOOL ar5416PhyDisable(struct ath_hal *ah) { if (! ar5416SetResetReg(ah, HAL_RESET_WARM)) return AH_FALSE; AH5416(ah)->ah_initPLL(ah, AH_NULL); return (AH_TRUE); } /* * Write the given reset bit mask into the reset register */ HAL_BOOL ar5416SetResetReg(struct ath_hal *ah, uint32_t type) { /* * Set force wake */ OS_REG_WRITE(ah, AR_RTC_FORCE_WAKE, AR_RTC_FORCE_WAKE_EN | AR_RTC_FORCE_WAKE_ON_INT); switch (type) { case HAL_RESET_POWER_ON: return ar5416SetResetPowerOn(ah); case HAL_RESET_WARM: case HAL_RESET_COLD: return ar5416SetReset(ah, type); default: HALASSERT(AH_FALSE); return AH_FALSE; } } static HAL_BOOL ar5416SetResetPowerOn(struct ath_hal *ah) { /* Power On Reset (Hard Reset) */ /* * Set force wake * * If the MAC was running, previously calling * reset will wake up the MAC but it may go back to sleep * before we can start polling. * Setting force wake stops that. * This must be called before initiating a hard reset. */ OS_REG_WRITE(ah, AR_RTC_FORCE_WAKE, AR_RTC_FORCE_WAKE_EN | AR_RTC_FORCE_WAKE_ON_INT); /* * Power-on reset can be used in open-loop power control or failure recovery. * If we do RTC reset while DMA is still running, the hardware may corrupt memory. * Therefore, we need to reset AHB first to stop DMA. */ if (!
AR_SREV_HOWL(ah)) OS_REG_WRITE(ah, AR_RC, AR_RC_AHB); /* * RTC reset and clear */ OS_REG_WRITE(ah, AR_RTC_RESET, 0); OS_DELAY(20); if (! AR_SREV_HOWL(ah)) OS_REG_WRITE(ah, AR_RC, 0); OS_REG_WRITE(ah, AR_RTC_RESET, 1); /* * Poll till RTC is ON */ if (!ath_hal_wait(ah, AR_RTC_STATUS, AR_RTC_PM_STATUS_M, AR_RTC_STATUS_ON)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: RTC not waking up\n", __func__); return AH_FALSE; } return ar5416SetReset(ah, HAL_RESET_COLD); } static HAL_BOOL ar5416SetReset(struct ath_hal *ah, int type) { uint32_t tmpReg, mask; uint32_t rst_flags; #ifdef AH_SUPPORT_AR9130 /* Because of the AR9130 specific registers */ if (AR_SREV_HOWL(ah)) { HALDEBUG(ah, HAL_DEBUG_ANY, "[ath] HOWL: Fiddling with derived clk!\n"); uint32_t val = OS_REG_READ(ah, AR_RTC_DERIVED_CLK); val &= ~AR_RTC_DERIVED_CLK_PERIOD; val |= SM(1, AR_RTC_DERIVED_CLK_PERIOD); OS_REG_WRITE(ah, AR_RTC_DERIVED_CLK, val); (void) OS_REG_READ(ah, AR_RTC_DERIVED_CLK); } #endif /* AH_SUPPORT_AR9130 */ /* * Force wake */ OS_REG_WRITE(ah, AR_RTC_FORCE_WAKE, AR_RTC_FORCE_WAKE_EN | AR_RTC_FORCE_WAKE_ON_INT); #ifdef AH_SUPPORT_AR9130 if (AR_SREV_HOWL(ah)) { rst_flags = AR_RTC_RC_MAC_WARM | AR_RTC_RC_MAC_COLD | AR_RTC_RC_COLD_RESET | AR_RTC_RC_WARM_RESET; } else { #endif /* AH_SUPPORT_AR9130 */ /* * Reset AHB * * (In case the last interrupt source was a bus timeout.) * XXX TODO: this is not the way to do it! It should be recorded * XXX by the interrupt handler and passed _into_ the * XXX reset path routine so this occurs. */ tmpReg = OS_REG_READ(ah, AR_INTR_SYNC_CAUSE); if (tmpReg & (AR_INTR_SYNC_LOCAL_TIMEOUT|AR_INTR_SYNC_RADM_CPL_TIMEOUT)) { OS_REG_WRITE(ah, AR_INTR_SYNC_ENABLE, 0); OS_REG_WRITE(ah, AR_RC, AR_RC_AHB|AR_RC_HOSTIF); } else { OS_REG_WRITE(ah, AR_RC, AR_RC_AHB); } rst_flags = AR_RTC_RC_MAC_WARM; if (type == HAL_RESET_COLD) rst_flags |= AR_RTC_RC_MAC_COLD; #ifdef AH_SUPPORT_AR9130 } #endif /* AH_SUPPORT_AR9130 */ OS_REG_WRITE(ah, AR_RTC_RC, rst_flags); if (AR_SREV_HOWL(ah)) OS_DELAY(10000); else OS_DELAY(100); /* * Clear resets and force wakeup */ OS_REG_WRITE(ah, AR_RTC_RC, 0); if (!ath_hal_wait(ah, AR_RTC_RC, AR_RTC_RC_M, 0)) { HALDEBUG(ah, HAL_DEBUG_ANY, "%s: RTC stuck in MAC reset\n", __func__); return AH_FALSE; } /* Clear AHB reset */ if (! AR_SREV_HOWL(ah)) OS_REG_WRITE(ah, AR_RC, 0); if (AR_SREV_HOWL(ah)) OS_DELAY(50); if (AR_SREV_HOWL(ah)) { uint32_t mask; mask = OS_REG_READ(ah, AR_CFG); if (mask & (AR_CFG_SWRB | AR_CFG_SWTB | AR_CFG_SWRG)) { HALDEBUG(ah, HAL_DEBUG_RESET, "CFG Byte Swap Set 0x%x\n", mask); } else { mask = INIT_CONFIG_STATUS | AR_CFG_SWRB | AR_CFG_SWTB; OS_REG_WRITE(ah, AR_CFG, mask); HALDEBUG(ah, HAL_DEBUG_RESET, "Setting CFG 0x%x\n", OS_REG_READ(ah, AR_CFG)); } } else { if (type == HAL_RESET_COLD) { if (isBigEndian()) { /* * Set CFG, little-endian for descriptor accesses. */ mask = INIT_CONFIG_STATUS | AR_CFG_SWRD; #ifndef AH_NEED_DESC_SWAP mask |= AR_CFG_SWTD; #endif HALDEBUG(ah, HAL_DEBUG_RESET, "%s Applying descriptor swap\n", __func__); OS_REG_WRITE(ah, AR_CFG, mask); } else OS_REG_WRITE(ah, AR_CFG, INIT_CONFIG_STATUS); } } return AH_TRUE; } void ar5416InitChainMasks(struct ath_hal *ah) { int rx_chainmask = AH5416(ah)->ah_rx_chainmask; /* Flip this for this chainmask regardless of chip */ if (rx_chainmask == 0x5) OS_REG_SET_BIT(ah, AR_PHY_ANALOG_SWAP, AR_PHY_SWAP_ALT_CHAIN); /* * Workaround for OWL 1.0 calibration failure; enable multi-chain; * then set true mask after calibration. 
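 *
 * Illustrative reading of the chainmasks (derived from the code, not
 * from the original comments): 0x5 selects chains 0 and 2, 0x3 selects
 * chains 0 and 1; Owl 1.0 calibrates with all three chains (0x7) and
 * the real mask is restored afterwards by ar5416RestoreChainMask().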
*/ if (IS_5416V1(ah) && (rx_chainmask == 0x5 || rx_chainmask == 0x3)) { OS_REG_WRITE(ah, AR_PHY_RX_CHAINMASK, 0x7); OS_REG_WRITE(ah, AR_PHY_CAL_CHAINMASK, 0x7); } else { OS_REG_WRITE(ah, AR_PHY_RX_CHAINMASK, AH5416(ah)->ah_rx_chainmask); OS_REG_WRITE(ah, AR_PHY_CAL_CHAINMASK, AH5416(ah)->ah_rx_chainmask); } OS_REG_WRITE(ah, AR_SELFGEN_MASK, AH5416(ah)->ah_tx_chainmask); if (AH5416(ah)->ah_tx_chainmask == 0x5) OS_REG_SET_BIT(ah, AR_PHY_ANALOG_SWAP, AR_PHY_SWAP_ALT_CHAIN); if (AR_SREV_HOWL(ah)) { OS_REG_WRITE(ah, AR_PHY_ANALOG_SWAP, OS_REG_READ(ah, AR_PHY_ANALOG_SWAP) | 0x00000001); } } /* * Work-around for Owl 1.0 calibration failure. * * ar5416InitChainMasks sets the RX chainmask to 0x7 if it's Owl 1.0 * due to init calibration failures. ar5416RestoreChainMask restores * these registers to the correct setting. */ void ar5416RestoreChainMask(struct ath_hal *ah) { int rx_chainmask = AH5416(ah)->ah_rx_chainmask; if (IS_5416V1(ah) && (rx_chainmask == 0x5 || rx_chainmask == 0x3)) { OS_REG_WRITE(ah, AR_PHY_RX_CHAINMASK, rx_chainmask); OS_REG_WRITE(ah, AR_PHY_CAL_CHAINMASK, rx_chainmask); } } void ar5416InitPLL(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t pll = AR_RTC_PLL_REFDIV_5 | AR_RTC_PLL_DIV2; if (chan != AH_NULL) { if (IEEE80211_IS_CHAN_HALF(chan)) pll |= SM(0x1, AR_RTC_PLL_CLKSEL); else if (IEEE80211_IS_CHAN_QUARTER(chan)) pll |= SM(0x2, AR_RTC_PLL_CLKSEL); if (IEEE80211_IS_CHAN_5GHZ(chan)) pll |= SM(0xa, AR_RTC_PLL_DIV); else pll |= SM(0xb, AR_RTC_PLL_DIV); } else pll |= SM(0xb, AR_RTC_PLL_DIV); OS_REG_WRITE(ah, AR_RTC_PLL_CONTROL, pll); /* TODO: * For multi-band owl, switch between bands by reiniting the PLL. */ OS_DELAY(RTC_PLL_SETTLE_DELAY); OS_REG_WRITE(ah, AR_RTC_SLEEP_CLK, AR_RTC_SLEEP_DERIVED_CLK); } static void ar5416SetDefGainValues(struct ath_hal *ah, const MODAL_EEP_HEADER *pModal, const struct ar5416eeprom *eep, uint8_t txRxAttenLocal, int regChainOffset, int i) { if (IS_EEP_MINOR_V3(ah)) { txRxAttenLocal = pModal->txRxAttenCh[i]; if (AR_SREV_MERLIN_10_OR_LATER(ah)) { OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_XATTEN1_MARGIN, pModal->bswMargin[i]); OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_XATTEN1_DB, pModal->bswAtten[i]); OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_XATTEN2_MARGIN, pModal->xatten2Margin[i]); OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_XATTEN2_DB, pModal->xatten2Db[i]); } else { OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_BSW_MARGIN, pModal->bswMargin[i]); OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_BSW_ATTEN, pModal->bswAtten[i]); } } if (AR_SREV_MERLIN_10_OR_LATER(ah)) { OS_REG_RMW_FIELD(ah, AR_PHY_RXGAIN + regChainOffset, AR9280_PHY_RXGAIN_TXRX_ATTEN, txRxAttenLocal); OS_REG_RMW_FIELD(ah, AR_PHY_RXGAIN + regChainOffset, AR9280_PHY_RXGAIN_TXRX_MARGIN, pModal->rxTxMarginCh[i]); } else { OS_REG_RMW_FIELD(ah, AR_PHY_RXGAIN + regChainOffset, AR_PHY_RXGAIN_TXRX_ATTEN, txRxAttenLocal); OS_REG_RMW_FIELD(ah, AR_PHY_GAIN_2GHZ + regChainOffset, AR_PHY_GAIN_2GHZ_RXTX_MARGIN, pModal->rxTxMarginCh[i]); } } /* * Get the register chain offset for the given chain. * * Take into account the register chain swapping with AR5416 v2.0. * * XXX make sure that the reg chain swapping is only done for * XXX AR5416 v2.0 or greater, and not later chips? 
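 *
 * Worked example (from the code below): with a chainmask of 0x5 on
 * AR5416 v2.0 or later, chain 0 maps to register offset 0x0000,
 * chain 1 to 0x2000 and chain 2 to 0x1000; in all other cases chain i
 * maps to i * 0x1000.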
*/ int ar5416GetRegChainOffset(struct ath_hal *ah, int i) { int regChainOffset; if (AR_SREV_5416_V20_OR_LATER(ah) && (AH5416(ah)->ah_rx_chainmask == 0x5 || AH5416(ah)->ah_tx_chainmask == 0x5) && (i != 0)) { /* Regs are swapped from chain 2 to 1 for 5416 2_0 with * only chains 0 and 2 populated */ regChainOffset = (i == 1) ? 0x2000 : 0x1000; } else { regChainOffset = i * 0x1000; } return regChainOffset; } /* * Read EEPROM header info and program the device for correct operation * given the channel value. */ HAL_BOOL ar5416SetBoardValues(struct ath_hal *ah, const struct ieee80211_channel *chan) { const HAL_EEPROM_v14 *ee = AH_PRIVATE(ah)->ah_eeprom; const struct ar5416eeprom *eep = &ee->ee_base; const MODAL_EEP_HEADER *pModal; int i, regChainOffset; uint8_t txRxAttenLocal; /* workaround for eeprom versions <= 14.2 */ HALASSERT(AH_PRIVATE(ah)->ah_eeversion >= AR_EEPROM_VER14_1); pModal = &eep->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)]; /* NB: workaround for eeprom versions <= 14.2 */ txRxAttenLocal = IEEE80211_IS_CHAN_2GHZ(chan) ? 23 : 44; OS_REG_WRITE(ah, AR_PHY_SWITCH_COM, pModal->antCtrlCommon); for (i = 0; i < AR5416_MAX_CHAINS; i++) { if (AR_SREV_MERLIN(ah)) { if (i >= 2) break; } regChainOffset = ar5416GetRegChainOffset(ah, i); OS_REG_WRITE(ah, AR_PHY_SWITCH_CHAIN_0 + regChainOffset, pModal->antCtrlChain[i]); OS_REG_WRITE(ah, AR_PHY_TIMING_CTRL4 + regChainOffset, (OS_REG_READ(ah, AR_PHY_TIMING_CTRL4 + regChainOffset) & ~(AR_PHY_TIMING_CTRL4_IQCORR_Q_Q_COFF | AR_PHY_TIMING_CTRL4_IQCORR_Q_I_COFF)) | SM(pModal->iqCalICh[i], AR_PHY_TIMING_CTRL4_IQCORR_Q_I_COFF) | SM(pModal->iqCalQCh[i], AR_PHY_TIMING_CTRL4_IQCORR_Q_Q_COFF)); /* * Large signal upgrade: * if the EEPROM is 14.3 or later, use * txRxAttenLocal = pModal->txRxAttenCh[i]; * otherwise txRxAttenLocal keeps the fixed value above. */ if ((i == 0) || AR_SREV_5416_V20_OR_LATER(ah)) ar5416SetDefGainValues(ah, pModal, eep, txRxAttenLocal, regChainOffset, i); } if (AR_SREV_MERLIN_10_OR_LATER(ah)) { if (IEEE80211_IS_CHAN_2GHZ(chan)) { OS_A_REG_RMW_FIELD(ah, AR_AN_RF2G1_CH0, AR_AN_RF2G1_CH0_OB, pModal->ob); OS_A_REG_RMW_FIELD(ah, AR_AN_RF2G1_CH0, AR_AN_RF2G1_CH0_DB, pModal->db); OS_A_REG_RMW_FIELD(ah, AR_AN_RF2G1_CH1, AR_AN_RF2G1_CH1_OB, pModal->ob_ch1); OS_A_REG_RMW_FIELD(ah, AR_AN_RF2G1_CH1, AR_AN_RF2G1_CH1_DB, pModal->db_ch1); } else { OS_A_REG_RMW_FIELD(ah, AR_AN_RF5G1_CH0, AR_AN_RF5G1_CH0_OB5, pModal->ob); OS_A_REG_RMW_FIELD(ah, AR_AN_RF5G1_CH0, AR_AN_RF5G1_CH0_DB5, pModal->db); OS_A_REG_RMW_FIELD(ah, AR_AN_RF5G1_CH1, AR_AN_RF5G1_CH1_OB5, pModal->ob_ch1); OS_A_REG_RMW_FIELD(ah, AR_AN_RF5G1_CH1, AR_AN_RF5G1_CH1_DB5, pModal->db_ch1); } OS_A_REG_RMW_FIELD(ah, AR_AN_TOP2, AR_AN_TOP2_XPABIAS_LVL, pModal->xpaBiasLvl); OS_A_REG_RMW_FIELD(ah, AR_AN_TOP2, AR_AN_TOP2_LOCALBIAS, !!(pModal->flagBits & AR5416_EEP_FLAG_LOCALBIAS)); OS_A_REG_RMW_FIELD(ah, AR_PHY_XPA_CFG, AR_PHY_FORCE_XPA_CFG, !!(pModal->flagBits & AR5416_EEP_FLAG_FORCEXPAON)); } OS_REG_RMW_FIELD(ah, AR_PHY_SETTLING, AR_PHY_SETTLING_SWITCH, pModal->switchSettling); OS_REG_RMW_FIELD(ah, AR_PHY_DESIRED_SZ, AR_PHY_DESIRED_SZ_ADC, pModal->adcDesiredSize); if (!
AR_SREV_MERLIN_10_OR_LATER(ah)) OS_REG_RMW_FIELD(ah, AR_PHY_DESIRED_SZ, AR_PHY_DESIRED_SZ_PGA, pModal->pgaDesiredSize); OS_REG_WRITE(ah, AR_PHY_RF_CTL4, SM(pModal->txEndToXpaOff, AR_PHY_RF_CTL4_TX_END_XPAA_OFF) | SM(pModal->txEndToXpaOff, AR_PHY_RF_CTL4_TX_END_XPAB_OFF) | SM(pModal->txFrameToXpaOn, AR_PHY_RF_CTL4_FRAME_XPAA_ON) | SM(pModal->txFrameToXpaOn, AR_PHY_RF_CTL4_FRAME_XPAB_ON)); OS_REG_RMW_FIELD(ah, AR_PHY_RF_CTL3, AR_PHY_TX_END_TO_A2_RX_ON, pModal->txEndToRxOn); if (AR_SREV_MERLIN_10_OR_LATER(ah)) { OS_REG_RMW_FIELD(ah, AR_PHY_CCA, AR9280_PHY_CCA_THRESH62, pModal->thresh62); OS_REG_RMW_FIELD(ah, AR_PHY_EXT_CCA0, AR_PHY_EXT_CCA0_THRESH62, pModal->thresh62); } else { OS_REG_RMW_FIELD(ah, AR_PHY_CCA, AR_PHY_CCA_THRESH62, pModal->thresh62); OS_REG_RMW_FIELD(ah, AR_PHY_EXT_CCA, AR_PHY_EXT_CCA_THRESH62, pModal->thresh62); } /* Minor Version Specific application */ if (IS_EEP_MINOR_V2(ah)) { OS_REG_RMW_FIELD(ah, AR_PHY_RF_CTL2, AR_PHY_TX_FRAME_TO_DATA_START, pModal->txFrameToDataStart); OS_REG_RMW_FIELD(ah, AR_PHY_RF_CTL2, AR_PHY_TX_FRAME_TO_PA_ON, pModal->txFrameToPaOn); } if (IS_EEP_MINOR_V3(ah) && IEEE80211_IS_CHAN_HT40(chan)) /* Overwrite switch settling with HT40 value */ OS_REG_RMW_FIELD(ah, AR_PHY_SETTLING, AR_PHY_SETTLING_SWITCH, pModal->swSettleHt40); if (AR_SREV_MERLIN_20_OR_LATER(ah) && EEP_MINOR(ah) >= AR5416_EEP_MINOR_VER_19) OS_REG_RMW_FIELD(ah, AR_PHY_CCK_TX_CTRL, AR_PHY_CCK_TX_CTRL_TX_DAC_SCALE_CCK, pModal->miscBits); if (AR_SREV_MERLIN_20(ah) && EEP_MINOR(ah) >= AR5416_EEP_MINOR_VER_20) { if (IEEE80211_IS_CHAN_2GHZ(chan)) OS_A_REG_RMW_FIELD(ah, AR_AN_TOP1, AR_AN_TOP1_DACIPMODE, eep->baseEepHeader.dacLpMode); else if (eep->baseEepHeader.dacHiPwrMode_5G) OS_A_REG_RMW_FIELD(ah, AR_AN_TOP1, AR_AN_TOP1_DACIPMODE, 0); else OS_A_REG_RMW_FIELD(ah, AR_AN_TOP1, AR_AN_TOP1_DACIPMODE, eep->baseEepHeader.dacLpMode); OS_DELAY(100); OS_REG_RMW_FIELD(ah, AR_PHY_FRAME_CTL, AR_PHY_FRAME_CTL_TX_CLIP, pModal->miscBits >> 2); OS_REG_RMW_FIELD(ah, AR_PHY_TX_PWRCTRL9, AR_PHY_TX_DESIRED_SCALE_CCK, eep->baseEepHeader.desiredScaleCCK); } return (AH_TRUE); } /* * Helper functions common for AP/CB/XB */ /* * Set the target power array "ratesArray" from the * given set of target powers. * * This is used by the various chipset/EEPROM TX power * setup routines. 
*/ void ar5416SetRatesArrayFromTargetPower(struct ath_hal *ah, const struct ieee80211_channel *chan, int16_t *ratesArray, const CAL_TARGET_POWER_LEG *targetPowerCck, const CAL_TARGET_POWER_LEG *targetPowerCckExt, const CAL_TARGET_POWER_LEG *targetPowerOfdm, const CAL_TARGET_POWER_LEG *targetPowerOfdmExt, const CAL_TARGET_POWER_HT *targetPowerHt20, const CAL_TARGET_POWER_HT *targetPowerHt40) { #define N(a) (sizeof(a)/sizeof(a[0])) int i; /* Blank the rates array, to be consistent */ for (i = 0; i < Ar5416RateSize; i++) ratesArray[i] = 0; /* Set rates Array from collected data */ ratesArray[rate6mb] = ratesArray[rate9mb] = ratesArray[rate12mb] = ratesArray[rate18mb] = ratesArray[rate24mb] = targetPowerOfdm->tPow2x[0]; ratesArray[rate36mb] = targetPowerOfdm->tPow2x[1]; ratesArray[rate48mb] = targetPowerOfdm->tPow2x[2]; ratesArray[rate54mb] = targetPowerOfdm->tPow2x[3]; ratesArray[rateXr] = targetPowerOfdm->tPow2x[0]; for (i = 0; i < N(targetPowerHt20->tPow2x); i++) { ratesArray[rateHt20_0 + i] = targetPowerHt20->tPow2x[i]; } if (IEEE80211_IS_CHAN_2GHZ(chan)) { ratesArray[rate1l] = targetPowerCck->tPow2x[0]; ratesArray[rate2s] = ratesArray[rate2l] = targetPowerCck->tPow2x[1]; ratesArray[rate5_5s] = ratesArray[rate5_5l] = targetPowerCck->tPow2x[2]; ratesArray[rate11s] = ratesArray[rate11l] = targetPowerCck->tPow2x[3]; } if (IEEE80211_IS_CHAN_HT40(chan)) { for (i = 0; i < N(targetPowerHt40->tPow2x); i++) { ratesArray[rateHt40_0 + i] = targetPowerHt40->tPow2x[i]; } ratesArray[rateDupOfdm] = targetPowerHt40->tPow2x[0]; ratesArray[rateDupCck] = targetPowerHt40->tPow2x[0]; ratesArray[rateExtOfdm] = targetPowerOfdmExt->tPow2x[0]; if (IEEE80211_IS_CHAN_2GHZ(chan)) { ratesArray[rateExtCck] = targetPowerCckExt->tPow2x[0]; } } #undef N } /* * ar5416SetPowerPerRateTable * * Sets the transmit power in the baseband for the given * operating channel and mode. 
*/ static HAL_BOOL ar5416SetPowerPerRateTable(struct ath_hal *ah, struct ar5416eeprom *pEepData, const struct ieee80211_channel *chan, int16_t *ratesArray, uint16_t cfgCtl, uint16_t AntennaReduction, uint16_t twiceMaxRegulatoryPower, uint16_t powerLimit) { #define N(a) (sizeof(a)/sizeof(a[0])) /* Local defines to distinguish between extension and control CTL's */ #define EXT_ADDITIVE (0x8000) #define CTL_11A_EXT (CTL_11A | EXT_ADDITIVE) #define CTL_11G_EXT (CTL_11G | EXT_ADDITIVE) #define CTL_11B_EXT (CTL_11B | EXT_ADDITIVE) uint16_t twiceMaxEdgePower = AR5416_MAX_RATE_POWER; int i; int16_t twiceLargestAntenna; CAL_CTL_DATA *rep; CAL_TARGET_POWER_LEG targetPowerOfdm, targetPowerCck = {0, {0, 0, 0, 0}}; CAL_TARGET_POWER_LEG targetPowerOfdmExt = {0, {0, 0, 0, 0}}, targetPowerCckExt = {0, {0, 0, 0, 0}}; CAL_TARGET_POWER_HT targetPowerHt20, targetPowerHt40 = {0, {0, 0, 0, 0}}; int16_t scaledPower, minCtlPower; #define SUB_NUM_CTL_MODES_AT_5G_40 2 /* excluding HT40, EXT-OFDM */ #define SUB_NUM_CTL_MODES_AT_2G_40 3 /* excluding HT40, EXT-OFDM, EXT-CCK */ static const uint16_t ctlModesFor11a[] = { CTL_11A, CTL_5GHT20, CTL_11A_EXT, CTL_5GHT40 }; static const uint16_t ctlModesFor11g[] = { CTL_11B, CTL_11G, CTL_2GHT20, CTL_11B_EXT, CTL_11G_EXT, CTL_2GHT40 }; const uint16_t *pCtlMode; uint16_t numCtlModes, ctlMode, freq; CHAN_CENTERS centers; ar5416GetChannelCenters(ah, chan, &centers); /* Compute TxPower reduction due to Antenna Gain */ twiceLargestAntenna = AH_MAX(AH_MAX( pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[0], pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[1]), pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[2]); #if 0 /* Turn it back on if we need to calculate per chain antenna gain reduction */ /* Use only if the expected gain > 6 dBi */ /* Chain 0 is always used */ twiceLargestAntenna = pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[0]; /* Look at antenna gains of Chains 1 and 2 if the TX mask is set */ if (ahp->ah_tx_chainmask & 0x2) twiceLargestAntenna = AH_MAX(twiceLargestAntenna, pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[1]); if (ahp->ah_tx_chainmask & 0x4) twiceLargestAntenna = AH_MAX(twiceLargestAntenna, pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].antennaGainCh[2]); #endif twiceLargestAntenna = (int16_t)AH_MIN((AntennaReduction) - twiceLargestAntenna, 0); /* XXX setup for 5212 use (really used?) */ ath_hal_eepromSet(ah, IEEE80211_IS_CHAN_2GHZ(chan) ? AR_EEP_ANTGAINMAX_2 : AR_EEP_ANTGAINMAX_5, twiceLargestAntenna); /* * scaledPower is the minimum of the user input power level and * the regulatory allowed power level */ scaledPower = AH_MIN(powerLimit, twiceMaxRegulatoryPower + twiceLargestAntenna); /* Reduce scaledPower by the number of active chains to get the per-chain TX power level */ /* TODO: better value than these?
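 *
 * Splitting TX power evenly across N chains would in theory call for
 * a back-off of 10*log10(N) dB (3 dB for two chains, ~4.8 dB for
 * three); the EEPROM pwrDecreaseFor{2,3}Chain values used below are
 * the calibrated equivalents, presumably in the same 0.5 dBm units
 * as scaledPower.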
*/ switch (owl_get_ntxchains(AH5416(ah)->ah_tx_chainmask)) { case 1: break; case 2: scaledPower -= pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].pwrDecreaseFor2Chain; break; case 3: scaledPower -= pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].pwrDecreaseFor3Chain; break; default: return AH_FALSE; /* Unsupported number of chains */ } scaledPower = AH_MAX(0, scaledPower); /* Get target powers from EEPROM - our baseline for TX Power */ if (IEEE80211_IS_CHAN_2GHZ(chan)) { /* Setup for CTL modes */ numCtlModes = N(ctlModesFor11g) - SUB_NUM_CTL_MODES_AT_2G_40; /* CTL_11B, CTL_11G, CTL_2GHT20 */ pCtlMode = ctlModesFor11g; ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPowerCck, AR5416_NUM_2G_CCK_TARGET_POWERS, &targetPowerCck, 4, AH_FALSE); ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPower2G, AR5416_NUM_2G_20_TARGET_POWERS, &targetPowerOfdm, 4, AH_FALSE); ar5416GetTargetPowers(ah, chan, pEepData->calTargetPower2GHT20, AR5416_NUM_2G_20_TARGET_POWERS, &targetPowerHt20, 8, AH_FALSE); if (IEEE80211_IS_CHAN_HT40(chan)) { numCtlModes = N(ctlModesFor11g); /* All 2G CTL's */ ar5416GetTargetPowers(ah, chan, pEepData->calTargetPower2GHT40, AR5416_NUM_2G_40_TARGET_POWERS, &targetPowerHt40, 8, AH_TRUE); /* Get target powers for extension channels */ ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPowerCck, AR5416_NUM_2G_CCK_TARGET_POWERS, &targetPowerCckExt, 4, AH_TRUE); ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPower2G, AR5416_NUM_2G_20_TARGET_POWERS, &targetPowerOfdmExt, 4, AH_TRUE); } } else { /* Setup for CTL modes */ numCtlModes = N(ctlModesFor11a) - SUB_NUM_CTL_MODES_AT_5G_40; /* CTL_11A, CTL_5GHT20 */ pCtlMode = ctlModesFor11a; ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPower5G, AR5416_NUM_5G_20_TARGET_POWERS, &targetPowerOfdm, 4, AH_FALSE); ar5416GetTargetPowers(ah, chan, pEepData->calTargetPower5GHT20, AR5416_NUM_5G_20_TARGET_POWERS, &targetPowerHt20, 8, AH_FALSE); if (IEEE80211_IS_CHAN_HT40(chan)) { numCtlModes = N(ctlModesFor11a); /* All 5G CTL's */ ar5416GetTargetPowers(ah, chan, pEepData->calTargetPower5GHT40, AR5416_NUM_5G_40_TARGET_POWERS, &targetPowerHt40, 8, AH_TRUE); ar5416GetTargetPowersLeg(ah, chan, pEepData->calTargetPower5G, AR5416_NUM_5G_20_TARGET_POWERS, &targetPowerOfdmExt, 4, AH_TRUE); } } /* * For MIMO, need to apply regulatory caps individually across dynamically * running modes: CCK, OFDM, HT20, HT40 * * The outer loop walks through each possible applicable runtime mode. * The inner loop walks through each ctlIndex entry in EEPROM. * The ctl value is encoded as [7:4] == test group, [3:0] == test mode. 
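 *
 * For example (illustrative encoding only, not taken from the EEPROM
 * spec): a ctlIndex of 0x21 would request test mode 1 within test
 * group 2. When the regulatory domain reports SD_NO_CTL, the loop
 * below takes the minimum over all matching CTL edge powers rather
 * than a single specific entry.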
* */ for (ctlMode = 0; ctlMode < numCtlModes; ctlMode++) { HAL_BOOL isHt40CtlMode = (pCtlMode[ctlMode] == CTL_5GHT40) || (pCtlMode[ctlMode] == CTL_2GHT40); if (isHt40CtlMode) { freq = centers.ctl_center; } else if (pCtlMode[ctlMode] & EXT_ADDITIVE) { freq = centers.ext_center; } else { freq = centers.ctl_center; } /* walk through each CTL index stored in EEPROM */ for (i = 0; (i < AR5416_NUM_CTLS) && pEepData->ctlIndex[i]; i++) { uint16_t twiceMinEdgePower; /* compare test group from regulatory channel list with test mode from pCtlMode list */ if ((((cfgCtl & ~CTL_MODE_M) | (pCtlMode[ctlMode] & CTL_MODE_M)) == pEepData->ctlIndex[i]) || (((cfgCtl & ~CTL_MODE_M) | (pCtlMode[ctlMode] & CTL_MODE_M)) == ((pEepData->ctlIndex[i] & CTL_MODE_M) | SD_NO_CTL))) { rep = &(pEepData->ctlData[i]); twiceMinEdgePower = ar5416GetMaxEdgePower(freq, rep->ctlEdges[owl_get_ntxchains(AH5416(ah)->ah_tx_chainmask) - 1], IEEE80211_IS_CHAN_2GHZ(chan)); if ((cfgCtl & ~CTL_MODE_M) == SD_NO_CTL) { /* Find the minimum of all CTL edge powers that apply to this channel */ twiceMaxEdgePower = AH_MIN(twiceMaxEdgePower, twiceMinEdgePower); } else { /* specific */ twiceMaxEdgePower = twiceMinEdgePower; break; } } } minCtlPower = (uint8_t)AH_MIN(twiceMaxEdgePower, scaledPower); /* Apply ctl mode to correct target power set */ switch(pCtlMode[ctlMode]) { case CTL_11B: for (i = 0; i < N(targetPowerCck.tPow2x); i++) { targetPowerCck.tPow2x[i] = (uint8_t)AH_MIN(targetPowerCck.tPow2x[i], minCtlPower); } break; case CTL_11A: case CTL_11G: for (i = 0; i < N(targetPowerOfdm.tPow2x); i++) { targetPowerOfdm.tPow2x[i] = (uint8_t)AH_MIN(targetPowerOfdm.tPow2x[i], minCtlPower); } break; case CTL_5GHT20: case CTL_2GHT20: for (i = 0; i < N(targetPowerHt20.tPow2x); i++) { targetPowerHt20.tPow2x[i] = (uint8_t)AH_MIN(targetPowerHt20.tPow2x[i], minCtlPower); } break; case CTL_11B_EXT: targetPowerCckExt.tPow2x[0] = (uint8_t)AH_MIN(targetPowerCckExt.tPow2x[0], minCtlPower); break; case CTL_11A_EXT: case CTL_11G_EXT: targetPowerOfdmExt.tPow2x[0] = (uint8_t)AH_MIN(targetPowerOfdmExt.tPow2x[0], minCtlPower); break; case CTL_5GHT40: case CTL_2GHT40: for (i = 0; i < N(targetPowerHt40.tPow2x); i++) { targetPowerHt40.tPow2x[i] = (uint8_t)AH_MIN(targetPowerHt40.tPow2x[i], minCtlPower); } break; default: return AH_FALSE; break; } } /* end ctl mode checking */ /* Set rates Array from collected data */ ar5416SetRatesArrayFromTargetPower(ah, chan, ratesArray, &targetPowerCck, &targetPowerCckExt, &targetPowerOfdm, &targetPowerOfdmExt, &targetPowerHt20, &targetPowerHt40); return AH_TRUE; #undef EXT_ADDITIVE #undef CTL_11A_EXT #undef CTL_11G_EXT #undef CTL_11B_EXT #undef SUB_NUM_CTL_MODES_AT_5G_40 #undef SUB_NUM_CTL_MODES_AT_2G_40 #undef N } /************************************************************************** * fbin2freq * * Get channel value from binary representation held in eeprom * RETURNS: the frequency in MHz */ static uint16_t fbin2freq(uint8_t fbin, HAL_BOOL is2GHz) { /* * Reserved value 0xFF provides an empty definition both as * an fbin and as a frequency - do not convert */ if (fbin == AR5416_BCHAN_UNUSED) { return fbin; } return (uint16_t)((is2GHz) ? 
(2300 + fbin) : (4800 + 5 * fbin)); } /* * ar5416GetMaxEdgePower * * Find the maximum conformance test limit for the given channel and CTL info */ uint16_t ar5416GetMaxEdgePower(uint16_t freq, CAL_CTL_EDGES *pRdEdgesPower, HAL_BOOL is2GHz) { uint16_t twiceMaxEdgePower = AR5416_MAX_RATE_POWER; int i; /* Get the edge power */ for (i = 0; (i < AR5416_NUM_BAND_EDGES) && (pRdEdgesPower[i].bChannel != AR5416_BCHAN_UNUSED) ; i++) { /* * If there's an exact channel match or an inband flag set * on the lower channel, use the given rdEdgePower */ if (freq == fbin2freq(pRdEdgesPower[i].bChannel, is2GHz)) { twiceMaxEdgePower = MS(pRdEdgesPower[i].tPowerFlag, CAL_CTL_EDGES_POWER); break; } else if ((i > 0) && (freq < fbin2freq(pRdEdgesPower[i].bChannel, is2GHz))) { if (fbin2freq(pRdEdgesPower[i - 1].bChannel, is2GHz) < freq && (pRdEdgesPower[i - 1].tPowerFlag & CAL_CTL_EDGES_FLAG) != 0) { twiceMaxEdgePower = MS(pRdEdgesPower[i - 1].tPowerFlag, CAL_CTL_EDGES_POWER); } /* Leave loop - no more affecting edges possible in this monotonically increasing list */ break; } } HALASSERT(twiceMaxEdgePower > 0); return twiceMaxEdgePower; } /************************************************************** * ar5416GetTargetPowers * * Return the rates of target power for the given target power table, * channel, and number of channels */ void ar5416GetTargetPowers(struct ath_hal *ah, const struct ieee80211_channel *chan, CAL_TARGET_POWER_HT *powInfo, uint16_t numChannels, CAL_TARGET_POWER_HT *pNewPower, uint16_t numRates, HAL_BOOL isHt40Target) { uint16_t clo, chi; int i; int matchIndex = -1, lowIndex = -1; uint16_t freq; CHAN_CENTERS centers; ar5416GetChannelCenters(ah, chan, &centers); freq = isHt40Target ? centers.synth_center : centers.ctl_center; /* Copy the target powers into the temp channel list */ if (freq <= fbin2freq(powInfo[0].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) { matchIndex = 0; } else { for (i = 0; (i < numChannels) && (powInfo[i].bChannel != AR5416_BCHAN_UNUSED); i++) { if (freq == fbin2freq(powInfo[i].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) { matchIndex = i; break; } else if ((freq < fbin2freq(powInfo[i].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) && (freq > fbin2freq(powInfo[i - 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)))) { lowIndex = i - 1; break; } } if ((matchIndex == -1) && (lowIndex == -1)) { HALASSERT(freq > fbin2freq(powInfo[i - 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))); matchIndex = i - 1; } } if (matchIndex != -1) { OS_MEMCPY(pNewPower, &powInfo[matchIndex], sizeof(*pNewPower)); } else { HALASSERT(lowIndex != -1); /* * Get the lower and upper channels, target powers, * and interpolate between them. */ clo = fbin2freq(powInfo[lowIndex].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)); chi = fbin2freq(powInfo[lowIndex + 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)); for (i = 0; i < numRates; i++) { pNewPower->tPow2x[i] = (uint8_t)ath_ee_interpolate(freq, clo, chi, powInfo[lowIndex].tPow2x[i], powInfo[lowIndex + 1].tPow2x[i]); } } } /************************************************************** * ar5416GetTargetPowersLeg * * Return the four rates of target power for the given target power table, * channel, and number of channels */ void ar5416GetTargetPowersLeg(struct ath_hal *ah, const struct ieee80211_channel *chan, CAL_TARGET_POWER_LEG *powInfo, uint16_t numChannels, CAL_TARGET_POWER_LEG *pNewPower, uint16_t numRates, HAL_BOOL isExtTarget) { uint16_t clo, chi; int i; int matchIndex = -1, lowIndex = -1; uint16_t freq; CHAN_CENTERS centers; ar5416GetChannelCenters(ah, chan, &centers); freq = (isExtTarget) ?
centers.ext_center : centers.ctl_center; /* Copy the target powers into the temp channel list */ if (freq <= fbin2freq(powInfo[0].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) { matchIndex = 0; } else { for (i = 0; (i < numChannels) && (powInfo[i].bChannel != AR5416_BCHAN_UNUSED); i++) { if (freq == fbin2freq(powInfo[i].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) { matchIndex = i; break; } else if ((freq < fbin2freq(powInfo[i].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))) && (freq > fbin2freq(powInfo[i - 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)))) { lowIndex = i - 1; break; } } if ((matchIndex == -1) && (lowIndex == -1)) { HALASSERT(freq > fbin2freq(powInfo[i - 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan))); matchIndex = i - 1; } } if (matchIndex != -1) { OS_MEMCPY(pNewPower, &powInfo[matchIndex], sizeof(*pNewPower)); } else { HALASSERT(lowIndex != -1); /* * Get the lower and upper channels, target powers, * and interpolate between them. */ clo = fbin2freq(powInfo[lowIndex].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)); chi = fbin2freq(powInfo[lowIndex + 1].bChannel, IEEE80211_IS_CHAN_2GHZ(chan)); for (i = 0; i < numRates; i++) { pNewPower->tPow2x[i] = (uint8_t)ath_ee_interpolate(freq, clo, chi, powInfo[lowIndex].tPow2x[i], powInfo[lowIndex + 1].tPow2x[i]); } } } /* * Set the gain boundaries for the given radio chain. * * The gain boundaries tell the hardware at what point in the * PDADC array to "switch over" from one PD gain setting * to another. There's also a gain overlap between two * PDADC array gain curves where there are valid PD values * for two gain settings. * * The hardware uses the gain overlap and gain boundaries * to determine which gain curve to use for the given * target TX power. */ void ar5416SetGainBoundariesClosedLoop(struct ath_hal *ah, int i, uint16_t pdGainOverlap_t2, uint16_t gainBoundaries[]) { int regChainOffset; regChainOffset = ar5416GetRegChainOffset(ah, i); HALDEBUG(ah, HAL_DEBUG_EEPROM, "%s: chain %d: gainOverlap_t2: %d," " gainBoundaries: %d, %d, %d, %d\n", __func__, i, pdGainOverlap_t2, gainBoundaries[0], gainBoundaries[1], gainBoundaries[2], gainBoundaries[3]); OS_REG_WRITE(ah, AR_PHY_TPCRG5 + regChainOffset, SM(pdGainOverlap_t2, AR_PHY_TPCRG5_PD_GAIN_OVERLAP) | SM(gainBoundaries[0], AR_PHY_TPCRG5_PD_GAIN_BOUNDARY_1) | SM(gainBoundaries[1], AR_PHY_TPCRG5_PD_GAIN_BOUNDARY_2) | SM(gainBoundaries[2], AR_PHY_TPCRG5_PD_GAIN_BOUNDARY_3) | SM(gainBoundaries[3], AR_PHY_TPCRG5_PD_GAIN_BOUNDARY_4)); } /* * Get the gain values and the number of gain levels given * in xpdMask. * * The EEPROM xpdMask determines which power detector gain * levels were used during calibration. Each of these mask * bits maps to a fixed gain level in hardware. */ uint16_t ar5416GetXpdGainValues(struct ath_hal *ah, uint16_t xpdMask, uint16_t xpdGainValues[]) { int i; uint16_t numXpdGain = 0; for (i = 1; i <= AR5416_PD_GAINS_IN_MASK; i++) { if ((xpdMask >> (AR5416_PD_GAINS_IN_MASK - i)) & 1) { if (numXpdGain >= AR5416_NUM_PD_GAINS) { HALASSERT(0); break; } xpdGainValues[numXpdGain] = (uint16_t)(AR5416_PD_GAINS_IN_MASK - i); numXpdGain++; } } return numXpdGain; } /* * Write the detector gain and biases. * * There are four power detector gain levels. The xpdMask in the EEPROM * determines which power detector gain levels have TX power calibration * data associated with them. This function writes the number of * PD gain levels and their values into the hardware. * * This is valid for all TX chains - the calibration data itself however * will likely differ per-chain.
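 *
 * Example (derived from ar5416GetXpdGainValues above): an EEPROM
 * xpdMask of 0x9 (binary 1001) selects gain levels {3, 0}, so
 * numXpdGain is 2 and only the first two xpdGainValues entries are
 * meaningful.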
*/ void ar5416WriteDetectorGainBiases(struct ath_hal *ah, uint16_t numXpdGain, uint16_t xpdGainValues[]) { HALDEBUG(ah, HAL_DEBUG_EEPROM, "%s: numXpdGain: %d," " xpdGainValues: %d, %d, %d\n", __func__, numXpdGain, xpdGainValues[0], xpdGainValues[1], xpdGainValues[2]); OS_REG_WRITE(ah, AR_PHY_TPCRG1, (OS_REG_READ(ah, AR_PHY_TPCRG1) & ~(AR_PHY_TPCRG1_NUM_PD_GAIN | AR_PHY_TPCRG1_PD_GAIN_1 | AR_PHY_TPCRG1_PD_GAIN_2 | AR_PHY_TPCRG1_PD_GAIN_3)) | SM(numXpdGain - 1, AR_PHY_TPCRG1_NUM_PD_GAIN) | SM(xpdGainValues[0], AR_PHY_TPCRG1_PD_GAIN_1 ) | SM(xpdGainValues[1], AR_PHY_TPCRG1_PD_GAIN_2) | SM(xpdGainValues[2], AR_PHY_TPCRG1_PD_GAIN_3)); } /* * Write the PDADC array to the given radio chain i. * * The 32 PDADC registers are written without any care about * their contents - so if various chips treat values as "special", * this routine will not care. */ void ar5416WritePdadcValues(struct ath_hal *ah, int i, uint8_t pdadcValues[]) { int regOffset, regChainOffset; int j; int reg32; regChainOffset = ar5416GetRegChainOffset(ah, i); regOffset = AR_PHY_BASE + (672 << 2) + regChainOffset; for (j = 0; j < 32; j++) { reg32 = ((pdadcValues[4*j + 0] & 0xFF) << 0) | ((pdadcValues[4*j + 1] & 0xFF) << 8) | ((pdadcValues[4*j + 2] & 0xFF) << 16) | ((pdadcValues[4*j + 3] & 0xFF) << 24) ; OS_REG_WRITE(ah, regOffset, reg32); HALDEBUG(ah, HAL_DEBUG_EEPROM, "PDADC: Chain %d |" " PDADC %3d Value %3d | PDADC %3d Value %3d | PDADC %3d" " Value %3d | PDADC %3d Value %3d |\n", i, 4*j, pdadcValues[4*j], 4*j+1, pdadcValues[4*j + 1], 4*j+2, pdadcValues[4*j + 2], 4*j+3, pdadcValues[4*j + 3]); regOffset += 4; } } /************************************************************** * ar5416SetPowerCalTable * * Pull the PDADC piers from cal data and interpolate them across the given * points as well as from the nearest pier(s) to get a power detector * linear voltage to power level table. 
*/ HAL_BOOL ar5416SetPowerCalTable(struct ath_hal *ah, struct ar5416eeprom *pEepData, const struct ieee80211_channel *chan, int16_t *pTxPowerIndexOffset) { CAL_DATA_PER_FREQ *pRawDataset; uint8_t *pCalBChans = AH_NULL; uint16_t pdGainOverlap_t2; static uint8_t pdadcValues[AR5416_NUM_PDADC_VALUES]; uint16_t gainBoundaries[AR5416_PD_GAINS_IN_MASK]; uint16_t numPiers, i; int16_t tMinCalPower; uint16_t numXpdGain, xpdMask; uint16_t xpdGainValues[AR5416_NUM_PD_GAINS]; uint32_t regChainOffset; OS_MEMZERO(xpdGainValues, sizeof(xpdGainValues)); xpdMask = pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].xpdGain; if (IS_EEP_MINOR_V2(ah)) { pdGainOverlap_t2 = pEepData->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)].pdGainOverlap; } else { pdGainOverlap_t2 = (uint16_t)(MS(OS_REG_READ(ah, AR_PHY_TPCRG5), AR_PHY_TPCRG5_PD_GAIN_OVERLAP)); } if (IEEE80211_IS_CHAN_2GHZ(chan)) { pCalBChans = pEepData->calFreqPier2G; numPiers = AR5416_NUM_2G_CAL_PIERS; } else { pCalBChans = pEepData->calFreqPier5G; numPiers = AR5416_NUM_5G_CAL_PIERS; } /* Calculate the value of xpdgains from the xpdGain Mask */ numXpdGain = ar5416GetXpdGainValues(ah, xpdMask, xpdGainValues); /* Write the detector gain biases and their number */ ar5416WriteDetectorGainBiases(ah, numXpdGain, xpdGainValues); for (i = 0; i < AR5416_MAX_CHAINS; i++) { regChainOffset = ar5416GetRegChainOffset(ah, i); if (pEepData->baseEepHeader.txMask & (1 << i)) { if (IEEE80211_IS_CHAN_2GHZ(chan)) { pRawDataset = pEepData->calPierData2G[i]; } else { pRawDataset = pEepData->calPierData5G[i]; } /* Fetch the gain boundaries and the PDADC values */ ar5416GetGainBoundariesAndPdadcs(ah, chan, pRawDataset, pCalBChans, numPiers, pdGainOverlap_t2, &tMinCalPower, gainBoundaries, pdadcValues, numXpdGain); if ((i == 0) || AR_SREV_5416_V20_OR_LATER(ah)) { ar5416SetGainBoundariesClosedLoop(ah, i, pdGainOverlap_t2, gainBoundaries); } /* Write the power values into the baseband power table */ ar5416WritePdadcValues(ah, i, pdadcValues); } } *pTxPowerIndexOffset = 0; return AH_TRUE; } /************************************************************** * ar5416GetGainBoundariesAndPdadcs * * Uses the data points read from EEPROM to reconstruct the pdadc power table * Called by ar5416SetPowerCalTable only. 
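 *
 * The calibration powers appear to be stored in 0.25 dBm steps (hence
 * the minPwrT4/maxPwrT4 naming); the boundary between gain curves i
 * and i+1 computed below, (maxPwrT4[i] + minPwrT4[i+1]) / 4, is then
 * just their midpoint expressed in 0.5 dBm units.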
*/ void ar5416GetGainBoundariesAndPdadcs(struct ath_hal *ah, const struct ieee80211_channel *chan, CAL_DATA_PER_FREQ *pRawDataSet, uint8_t * bChans, uint16_t availPiers, uint16_t tPdGainOverlap, int16_t *pMinCalPower, uint16_t * pPdGainBoundaries, uint8_t * pPDADCValues, uint16_t numXpdGains) { int i, j, k; int16_t ss; /* potentially -ve index for taking care of pdGainOverlap */ uint16_t idxL, idxR, numPiers; /* Pier indexes */ /* filled out Vpd table for all pdGains (chanL) */ static uint8_t vpdTableL[AR5416_NUM_PD_GAINS][AR5416_MAX_PWR_RANGE_IN_HALF_DB]; /* filled out Vpd table for all pdGains (chanR) */ static uint8_t vpdTableR[AR5416_NUM_PD_GAINS][AR5416_MAX_PWR_RANGE_IN_HALF_DB]; /* filled out Vpd table for all pdGains (interpolated) */ static uint8_t vpdTableI[AR5416_NUM_PD_GAINS][AR5416_MAX_PWR_RANGE_IN_HALF_DB]; uint8_t *pVpdL, *pVpdR, *pPwrL, *pPwrR; uint8_t minPwrT4[AR5416_NUM_PD_GAINS]; uint8_t maxPwrT4[AR5416_NUM_PD_GAINS]; int16_t vpdStep; int16_t tmpVal; uint16_t sizeCurrVpdTable, maxIndex, tgtIndex; HAL_BOOL match; int16_t minDelta = 0; CHAN_CENTERS centers; ar5416GetChannelCenters(ah, chan, &centers); /* Trim numPiers for the number of populated channel Piers */ for (numPiers = 0; numPiers < availPiers; numPiers++) { if (bChans[numPiers] == AR5416_BCHAN_UNUSED) { break; } } /* Find pier indexes around the current channel */ match = ath_ee_getLowerUpperIndex((uint8_t)FREQ2FBIN(centers.synth_center, IEEE80211_IS_CHAN_2GHZ(chan)), bChans, numPiers, &idxL, &idxR); if (match) { /* Directly fill both vpd tables from the matching index */ for (i = 0; i < numXpdGains; i++) { minPwrT4[i] = pRawDataSet[idxL].pwrPdg[i][0]; maxPwrT4[i] = pRawDataSet[idxL].pwrPdg[i][4]; ath_ee_FillVpdTable(minPwrT4[i], maxPwrT4[i], pRawDataSet[idxL].pwrPdg[i], pRawDataSet[idxL].vpdPdg[i], AR5416_PD_GAIN_ICEPTS, vpdTableI[i]); } } else { for (i = 0; i < numXpdGains; i++) { pVpdL = pRawDataSet[idxL].vpdPdg[i]; pPwrL = pRawDataSet[idxL].pwrPdg[i]; pVpdR = pRawDataSet[idxR].vpdPdg[i]; pPwrR = pRawDataSet[idxR].pwrPdg[i]; /* Start Vpd interpolation from the max of the minimum powers */ minPwrT4[i] = AH_MAX(pPwrL[0], pPwrR[0]); /* End Vpd interpolation from the min of the max powers */ maxPwrT4[i] = AH_MIN(pPwrL[AR5416_PD_GAIN_ICEPTS - 1], pPwrR[AR5416_PD_GAIN_ICEPTS - 1]); HALASSERT(maxPwrT4[i] > minPwrT4[i]); /* Fill pier Vpds */ ath_ee_FillVpdTable(minPwrT4[i], maxPwrT4[i], pPwrL, pVpdL, AR5416_PD_GAIN_ICEPTS, vpdTableL[i]); ath_ee_FillVpdTable(minPwrT4[i], maxPwrT4[i], pPwrR, pVpdR, AR5416_PD_GAIN_ICEPTS, vpdTableR[i]); /* Interpolate the final vpd */ for (j = 0; j <= (maxPwrT4[i] - minPwrT4[i]) / 2; j++) { vpdTableI[i][j] = (uint8_t)(ath_ee_interpolate((uint16_t)FREQ2FBIN(centers.synth_center, IEEE80211_IS_CHAN_2GHZ(chan)), bChans[idxL], bChans[idxR], vpdTableL[i][j], vpdTableR[i][j])); } } } *pMinCalPower = (int16_t)(minPwrT4[0] / 2); k = 0; /* index for the final table */ for (i = 0; i < numXpdGains; i++) { if (i == (numXpdGains - 1)) { pPdGainBoundaries[i] = (uint16_t)(maxPwrT4[i] / 2); } else { pPdGainBoundaries[i] = (uint16_t)((maxPwrT4[i] + minPwrT4[i+1]) / 4); } pPdGainBoundaries[i] = (uint16_t)AH_MIN(AR5416_MAX_RATE_POWER, pPdGainBoundaries[i]); /* NB: only applies to owl 1.0 */ if ((i == 0) && !AR_SREV_5416_V20_OR_LATER(ah) ) { /* * fix the gain delta, but get a delta that can be applied to min to * keep the upper power values accurate; max presumably does not need * to be adjusted since it should not be in that area of the table
*/ minDelta = pPdGainBoundaries[0] - 23; pPdGainBoundaries[0] = 23; } else { minDelta = 0; } /* Find starting index for this pdGain */ if (i == 0) { if (AR_SREV_MERLIN_10_OR_LATER(ah)) ss = (int16_t)(0 - (minPwrT4[i] / 2)); else ss = 0; /* for the first pdGain, start from index 0 */ } else { /* need overlap entries extrapolated below. */ ss = (int16_t)((pPdGainBoundaries[i-1] - (minPwrT4[i] / 2)) - tPdGainOverlap + 1 + minDelta); } vpdStep = (int16_t)(vpdTableI[i][1] - vpdTableI[i][0]); vpdStep = (int16_t)((vpdStep < 1) ? 1 : vpdStep); /* * A negative ss indicates the need to extrapolate data below for this pdGain */ while ((ss < 0) && (k < (AR5416_NUM_PDADC_VALUES - 1))) { tmpVal = (int16_t)(vpdTableI[i][0] + ss * vpdStep); pPDADCValues[k++] = (uint8_t)((tmpVal < 0) ? 0 : tmpVal); ss++; } sizeCurrVpdTable = (uint8_t)((maxPwrT4[i] - minPwrT4[i]) / 2 + 1); tgtIndex = (uint8_t)(pPdGainBoundaries[i] + tPdGainOverlap - (minPwrT4[i] / 2)); maxIndex = (tgtIndex < sizeCurrVpdTable) ? tgtIndex : sizeCurrVpdTable; while ((ss < maxIndex) && (k < (AR5416_NUM_PDADC_VALUES - 1))) { pPDADCValues[k++] = vpdTableI[i][ss++]; } vpdStep = (int16_t)(vpdTableI[i][sizeCurrVpdTable - 1] - vpdTableI[i][sizeCurrVpdTable - 2]); vpdStep = (int16_t)((vpdStep < 1) ? 1 : vpdStep); /* * for the last gain, pdGainBoundary == Pmax_t2, so we will * have to extrapolate */ if (tgtIndex >= maxIndex) { /* need to extrapolate above */ while ((ss <= tgtIndex) && (k < (AR5416_NUM_PDADC_VALUES - 1))) { tmpVal = (int16_t)((vpdTableI[i][sizeCurrVpdTable - 1] + (ss - maxIndex + 1) * vpdStep)); pPDADCValues[k++] = (uint8_t)((tmpVal > 255) ? 255 : tmpVal); ss++; } } /* extrapolated above */ } /* for all pdGainUsed */ /* Fill out pdGainBoundaries - only up to 2 allowed here, but hardware allows up to 4 */ while (i < AR5416_PD_GAINS_IN_MASK) { pPdGainBoundaries[i] = pPdGainBoundaries[i-1]; i++; } while (k < AR5416_NUM_PDADC_VALUES) { pPDADCValues[k] = pPDADCValues[k-1]; k++; } return; } /* * The Linux ath9k driver and (from what I've been told) the reference * Atheros driver enable the 11n PHY by default whether or not it's * configured. */ static void ar5416Set11nRegs(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t phymode; uint32_t enableDacFifo = 0; HAL_HT_MACMODE macmode; /* MAC - 20/40 mode */ if (AR_SREV_KITE_10_OR_LATER(ah)) enableDacFifo = (OS_REG_READ(ah, AR_PHY_TURBO) & AR_PHY_FC_ENABLE_DAC_FIFO); /* Enable 11n HT, 20 MHz */ phymode = AR_PHY_FC_HT_EN | AR_PHY_FC_SHORT_GI_40 | AR_PHY_FC_SINGLE_HT_LTF1 | AR_PHY_FC_WALSH | enableDacFifo; /* Configure baseband for dynamic 20/40 operation */ if (IEEE80211_IS_CHAN_HT40(chan)) { phymode |= AR_PHY_FC_DYN2040_EN; /* Configure control (primary) channel at +-10MHz */ if (IEEE80211_IS_CHAN_HT40U(chan)) phymode |= AR_PHY_FC_DYN2040_PRI_CH; #if 0 /* Configure 20/25 spacing */ if (ht->ht_extprotspacing == HAL_HT_EXTPROTSPACING_25) phymode |= AR_PHY_FC_DYN2040_EXT_CH; #endif macmode = HAL_HT_MACMODE_2040; } else macmode = HAL_HT_MACMODE_20; OS_REG_WRITE(ah, AR_PHY_TURBO, phymode); /* Configure MAC for 20/40 operation */ ar5416Set11nMac2040(ah, macmode); /* global transmit timeout (25 TUs default) */ /* XXX - put this elsewhere???
*/ OS_REG_WRITE(ah, AR_GTXTO, 25 << AR_GTXTO_TIMEOUT_LIMIT_S); /* carrier sense timeout */ OS_REG_SET_BIT(ah, AR_GTTM, AR_GTTM_CST_USEC); OS_REG_WRITE(ah, AR_CST, 0xF << AR_CST_TIMEOUT_LIMIT_S); } void ar5416GetChannelCenters(struct ath_hal *ah, const struct ieee80211_channel *chan, CHAN_CENTERS *centers) { uint16_t freq = ath_hal_gethwchannel(ah, chan); centers->ctl_center = freq; centers->synth_center = freq; /* * In 20/40 phy mode, the center frequency is * "between" the control and extension channels. */ if (IEEE80211_IS_CHAN_HT40U(chan)) { centers->synth_center += HT40_CHANNEL_CENTER_SHIFT; centers->ext_center = centers->synth_center + HT40_CHANNEL_CENTER_SHIFT; } else if (IEEE80211_IS_CHAN_HT40D(chan)) { centers->synth_center -= HT40_CHANNEL_CENTER_SHIFT; centers->ext_center = centers->synth_center - HT40_CHANNEL_CENTER_SHIFT; } else { centers->ext_center = freq; } } /* * Override the INI values being programmed. */ static void ar5416OverrideIni(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t val; /* * Set RX_ABORT and RX_DIS, and clear them only after * RXE is set for the MAC. This prevents frames with corrupted * descriptor status. */ OS_REG_SET_BIT(ah, AR_DIAG_SW, (AR_DIAG_RX_DIS | AR_DIAG_RX_ABORT)); if (AR_SREV_MERLIN_10_OR_LATER(ah)) { val = OS_REG_READ(ah, AR_PCU_MISC_MODE2); val &= (~AR_PCU_MISC_MODE2_ADHOC_MCAST_KEYID_ENABLE); if (!AR_SREV_9271(ah)) val &= ~AR_PCU_MISC_MODE2_HWWAR1; if (AR_SREV_KIWI_10_OR_LATER(ah)) val = val & (~AR_PCU_MISC_MODE2_HWWAR2); OS_REG_WRITE(ah, AR_PCU_MISC_MODE2, val); } /* * Disable RIFS search on some chips to avoid baseband * hang issues. */ if (AR_SREV_HOWL(ah) || AR_SREV_SOWL(ah)) (void) ar5416SetRifsDelay(ah, chan, AH_FALSE); if (!AR_SREV_5416_V20_OR_LATER(ah) || AR_SREV_MERLIN(ah)) return; /* * Disable BB clock gating. * Necessary to avoid issues on AR5416 2.0 */ OS_REG_WRITE(ah, 0x9800 + (651 << 2), 0x11); } struct ini { uint32_t *data; /* NB: !const */ int rows, cols; }; /* * Override XPA bias level based on operating frequency. * This is a v14 EEPROM specific thing for the AR9160. */ void ar5416EepromSetAddac(struct ath_hal *ah, const struct ieee80211_channel *chan) { #define XPA_LVL_FREQ(cnt) (pModal->xpaBiasLvlFreq[cnt]) MODAL_EEP_HEADER *pModal; HAL_EEPROM_v14 *ee = AH_PRIVATE(ah)->ah_eeprom; struct ar5416eeprom *eep = &ee->ee_base; uint8_t biaslevel; if (! AR_SREV_SOWL(ah)) return; if (EEP_MINOR(ah) < AR5416_EEP_MINOR_VER_7) return; pModal = &(eep->modalHeader[IEEE80211_IS_CHAN_2GHZ(chan)]); if (pModal->xpaBiasLvl != 0xff) biaslevel = pModal->xpaBiasLvl; else { uint16_t resetFreqBin, freqBin, freqCount = 0; CHAN_CENTERS centers; ar5416GetChannelCenters(ah, chan, &centers); resetFreqBin = FREQ2FBIN(centers.synth_center, IEEE80211_IS_CHAN_2GHZ(chan)); freqBin = XPA_LVL_FREQ(0) & 0xff; biaslevel = (uint8_t) (XPA_LVL_FREQ(0) >> 14); freqCount++; while (freqCount < 3) { if (XPA_LVL_FREQ(freqCount) == 0x0) break; freqBin = XPA_LVL_FREQ(freqCount) & 0xff; if (resetFreqBin >= freqBin) biaslevel = (uint8_t)(XPA_LVL_FREQ(freqCount) >> 14); else break; freqCount++; } } HALDEBUG(ah, HAL_DEBUG_EEPROM, "%s: overriding XPA bias level = %d\n", __func__, biaslevel); /* * This is a dirty workaround for the const initval data, * which will upset multiple AR9160s on the same board. * * The HAL should likely just have a private copy of the addac * data per instance.
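 *
 * In the writes below the two-bit bias level is spliced into the
 * shared ADDAC init table: bits 3-4 of row 7 (mask 0x18) for 2GHz
 * and bits 6-7 of row 6 (mask 0xc0) for 5GHz.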
*/ if (IEEE80211_IS_CHAN_2GHZ(chan)) HAL_INI_VAL((struct ini *) &AH5416(ah)->ah_ini_addac, 7, 1) = (HAL_INI_VAL(&AH5416(ah)->ah_ini_addac, 7, 1) & (~0x18)) | biaslevel << 3; else HAL_INI_VAL((struct ini *) &AH5416(ah)->ah_ini_addac, 6, 1) = (HAL_INI_VAL(&AH5416(ah)->ah_ini_addac, 6, 1) & (~0xc0)) | biaslevel << 6; #undef XPA_LVL_FREQ } static void ar5416MarkPhyInactive(struct ath_hal *ah) { OS_REG_WRITE(ah, AR_PHY_ACTIVE, AR_PHY_ACTIVE_DIS); } #define AR5416_IFS_SLOT_FULL_RATE_40 0x168 /* 9 us half, 40 MHz core clock (9*40) */ #define AR5416_IFS_SLOT_HALF_RATE_40 0x104 /* 13 us half, 20 MHz core clock (13*20) */ #define AR5416_IFS_SLOT_QUARTER_RATE_40 0xD2 /* 21 us quarter, 10 MHz core clock (21*10) */ #define AR5416_IFS_EIFS_FULL_RATE_40 0xE60 /* (74 + (2 * 9)) * 40MHz core clock */ #define AR5416_IFS_EIFS_HALF_RATE_40 0xDAC /* (149 + (2 * 13)) * 20MHz core clock */ #define AR5416_IFS_EIFS_QUARTER_RATE_40 0xD48 /* (298 + (2 * 21)) * 10MHz core clock */ #define AR5416_IFS_SLOT_FULL_RATE_44 0x18c /* 9 us half, 44 MHz core clock (9*44) */ #define AR5416_IFS_SLOT_HALF_RATE_44 0x11e /* 13 us half, 22 MHz core clock (13*22) */ #define AR5416_IFS_SLOT_QUARTER_RATE_44 0xe7 /* 21 us quarter, 11 MHz core clock (21*11) */ #define AR5416_IFS_EIFS_FULL_RATE_44 0xfd0 /* (74 + (2 * 9)) * 44MHz core clock */ #define AR5416_IFS_EIFS_HALF_RATE_44 0xf0a /* (149 + (2 * 13)) * 22MHz core clock */ #define AR5416_IFS_EIFS_QUARTER_RATE_44 0xe9c /* (298 + (2 * 21)) * 11MHz core clock */ #define AR5416_INIT_USEC_40 40 #define AR5416_HALF_RATE_USEC_40 19 /* ((40 / 2) - 1 ) */ #define AR5416_QUARTER_RATE_USEC_40 9 /* ((40 / 4) - 1 ) */ #define AR5416_INIT_USEC_44 44 #define AR5416_HALF_RATE_USEC_44 21 /* ((44 / 2) - 1 ) */ #define AR5416_QUARTER_RATE_USEC_44 10 /* ((44 / 4) - 1 ) */ /* XXX What should these be for 40/44MHz clocks (and half/quarter) ? */ #define AR5416_RX_NON_FULL_RATE_LATENCY 63 #define AR5416_TX_HALF_RATE_LATENCY 108 #define AR5416_TX_QUARTER_RATE_LATENCY 216 /* * Adjust various register settings based on half/quarter rate clock setting. * This includes: * * + USEC, TX/RX latency, * + IFS params: slot, eifs, misc etc. * * TODO: * * + Verify which other registers need to be tweaked; * + Verify the behaviour of this for 5GHz fast and non-fast clock mode; * + This just plain won't work for long distance links - the coverage class * code isn't aware of the slot/ifs/ACK/RTS timeout values that need to * change; * + Verify whether the 32KHz USEC value needs to be kept for the 802.11n * series chips? * + Calculate/derive values for 2GHz, 5GHz, 5GHz fast clock */ static void ar5416SetIFSTiming(struct ath_hal *ah, const struct ieee80211_channel *chan) { uint32_t txLat, rxLat, usec, slot, refClock, eifs, init_usec; int clk_44 = 0; HALASSERT(IEEE80211_IS_CHAN_HALF(chan) || IEEE80211_IS_CHAN_QUARTER(chan)); /* 2GHz and 5GHz fast clock - 44MHz; else 40MHz */ if (IEEE80211_IS_CHAN_2GHZ(chan)) clk_44 = 1; else if (IEEE80211_IS_CHAN_5GHZ(chan) && IS_5GHZ_FAST_CLOCK_EN(ah, chan)) clk_44 = 1; /* XXX does this need save/restoring for the 11n chips? */ + /* + * XXX TODO: should mask out the txlat/rxlat/usec values? + */ refClock = OS_REG_READ(ah, AR_USEC) & AR_USEC_USEC32; /* * XXX This really should calculate things, not use * hard coded values! Ew. 
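 *
 * For the record, the hard coded tables above follow
 *
 *	slot_ticks = slot_us * core_clock_mhz
 *	eifs_ticks = (eifs_us + 2 * slot_us) * core_clock_mhz
 *
 * e.g. half rate on the 44MHz reference clock runs the core at
 * 22MHz, so 13us * 22 = 0x11e and (149 + 2 * 13) * 22 = 0xf0a,
 * which match AR5416_IFS_SLOT_HALF_RATE_44 and
 * AR5416_IFS_EIFS_HALF_RATE_44.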
*/ if (IEEE80211_IS_CHAN_HALF(chan)) { if (clk_44) { slot = AR5416_IFS_SLOT_HALF_RATE_44; rxLat = AR5416_RX_NON_FULL_RATE_LATENCY << AR5416_USEC_RX_LAT_S; txLat = AR5416_TX_HALF_RATE_LATENCY << AR5416_USEC_TX_LAT_S; usec = AR5416_HALF_RATE_USEC_44; eifs = AR5416_IFS_EIFS_HALF_RATE_44; init_usec = AR5416_INIT_USEC_44 >> 1; } else { slot = AR5416_IFS_SLOT_HALF_RATE_40; rxLat = AR5416_RX_NON_FULL_RATE_LATENCY << AR5416_USEC_RX_LAT_S; txLat = AR5416_TX_HALF_RATE_LATENCY << AR5416_USEC_TX_LAT_S; usec = AR5416_HALF_RATE_USEC_40; eifs = AR5416_IFS_EIFS_HALF_RATE_40; init_usec = AR5416_INIT_USEC_40 >> 1; } } else { /* quarter rate */ if (clk_44) { slot = AR5416_IFS_SLOT_QUARTER_RATE_44; rxLat = AR5416_RX_NON_FULL_RATE_LATENCY << AR5416_USEC_RX_LAT_S; txLat = AR5416_TX_QUARTER_RATE_LATENCY << AR5416_USEC_TX_LAT_S; usec = AR5416_QUARTER_RATE_USEC_44; eifs = AR5416_IFS_EIFS_QUARTER_RATE_44; init_usec = AR5416_INIT_USEC_44 >> 2; } else { slot = AR5416_IFS_SLOT_QUARTER_RATE_40; rxLat = AR5416_RX_NON_FULL_RATE_LATENCY << AR5416_USEC_RX_LAT_S; txLat = AR5416_TX_QUARTER_RATE_LATENCY << AR5416_USEC_TX_LAT_S; usec = AR5416_QUARTER_RATE_USEC_40; eifs = AR5416_IFS_EIFS_QUARTER_RATE_40; init_usec = AR5416_INIT_USEC_40 >> 2; } } /* XXX verify these! */ OS_REG_WRITE(ah, AR_USEC, (usec | refClock | txLat | rxLat)); OS_REG_WRITE(ah, AR_D_GBL_IFS_SLOT, slot); OS_REG_WRITE(ah, AR_D_GBL_IFS_EIFS, eifs); OS_REG_RMW_FIELD(ah, AR_D_GBL_IFS_MISC, AR_D_GBL_IFS_MISC_USEC_DURATION, init_usec); } Index: projects/clang390-import/sys/dev/cxgbe/t4_main.c =================================================================== --- projects/clang390-import/sys/dev/cxgbe/t4_main.c (revision 305686) +++ projects/clang390-import/sys/dev/cxgbe/t4_main.c (revision 305687) @@ -1,9575 +1,9577 @@ /*- * Copyright (c) 2011 Chelsio Communications, Inc. * All rights reserved. * Written by: Navdeep Parhar * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_inet.h" #include "opt_inet6.h" #include "opt_rss.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef RSS #include #endif #if defined(__i386__) || defined(__amd64__) #include #include #endif #ifdef DDB #include #include #endif #include "common/common.h" #include "common/t4_msg.h" #include "common/t4_regs.h" #include "common/t4_regs_values.h" #include "t4_ioctl.h" #include "t4_l2t.h" #include "t4_mp_ring.h" #include "t4_if.h" /* T4 bus driver interface */ static int t4_probe(device_t); static int t4_attach(device_t); static int t4_detach(device_t); static int t4_ready(device_t); static int t4_read_port_device(device_t, int, device_t *); static device_method_t t4_methods[] = { DEVMETHOD(device_probe, t4_probe), DEVMETHOD(device_attach, t4_attach), DEVMETHOD(device_detach, t4_detach), DEVMETHOD(t4_is_main_ready, t4_ready), DEVMETHOD(t4_read_port_device, t4_read_port_device), DEVMETHOD_END }; static driver_t t4_driver = { "t4nex", t4_methods, sizeof(struct adapter) }; /* T4 port (cxgbe) interface */ static int cxgbe_probe(device_t); static int cxgbe_attach(device_t); static int cxgbe_detach(device_t); device_method_t cxgbe_methods[] = { DEVMETHOD(device_probe, cxgbe_probe), DEVMETHOD(device_attach, cxgbe_attach), DEVMETHOD(device_detach, cxgbe_detach), { 0, 0 } }; static driver_t cxgbe_driver = { "cxgbe", cxgbe_methods, sizeof(struct port_info) }; /* T4 VI (vcxgbe) interface */ static int vcxgbe_probe(device_t); static int vcxgbe_attach(device_t); static int vcxgbe_detach(device_t); static device_method_t vcxgbe_methods[] = { DEVMETHOD(device_probe, vcxgbe_probe), DEVMETHOD(device_attach, vcxgbe_attach), DEVMETHOD(device_detach, vcxgbe_detach), { 0, 0 } }; static driver_t vcxgbe_driver = { "vcxgbe", vcxgbe_methods, sizeof(struct vi_info) }; static d_ioctl_t t4_ioctl; static struct cdevsw t4_cdevsw = { .d_version = D_VERSION, .d_ioctl = t4_ioctl, .d_name = "t4nex", }; /* T5 bus driver interface */ static int t5_probe(device_t); static device_method_t t5_methods[] = { DEVMETHOD(device_probe, t5_probe), DEVMETHOD(device_attach, t4_attach), DEVMETHOD(device_detach, t4_detach), DEVMETHOD(t4_is_main_ready, t4_ready), DEVMETHOD(t4_read_port_device, t4_read_port_device), DEVMETHOD_END }; static driver_t t5_driver = { "t5nex", t5_methods, sizeof(struct adapter) }; /* T5 port (cxl) interface */ static driver_t cxl_driver = { "cxl", cxgbe_methods, sizeof(struct port_info) }; /* T5 VI (vcxl) interface */ static driver_t vcxl_driver = { "vcxl", vcxgbe_methods, sizeof(struct vi_info) }; /* ifnet + media interface */ static void cxgbe_init(void *); static int cxgbe_ioctl(struct ifnet *, unsigned long, caddr_t); static int cxgbe_transmit(struct ifnet *, struct mbuf *); static void cxgbe_qflush(struct ifnet *); static int cxgbe_media_change(struct ifnet *); static void cxgbe_media_status(struct ifnet *, struct ifmediareq *); MALLOC_DEFINE(M_CXGBE, "cxgbe", "Chelsio T4/T5 Ethernet driver and services"); /* * Correct lock order when you need to acquire multiple locks is t4_list_lock, * then ADAPTER_LOCK, then t4_uld_list_lock. */ static struct sx t4_list_lock; SLIST_HEAD(, adapter) t4_list; #ifdef TCP_OFFLOAD static struct sx t4_uld_list_lock; SLIST_HEAD(, uld_info) t4_uld_list; #endif /* * Tunables. See tweak_tunables() too. 
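 *
 * All of these are loader tunables, so they can be overridden from
 * /boot/loader.conf before the module is loaded, e.g.
 *
 *	hw.cxgbe.ntxq10g="8"
 *	hw.cxgbe.qsize_txq="2048"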
* * Each tunable is set to a default value here if it's known at compile-time. * Otherwise it is set to -1 as an indication to tweak_tunables() that it should * provide a reasonable default when the driver is loaded. * * Tunables applicable to both T4 and T5 are under hw.cxgbe. Those specific to * T5 are under hw.cxl. */ /* * Number of queues for tx and rx, 10G and 1G, NIC and offload. */ #define NTXQ_10G 16 int t4_ntxq10g = -1; TUNABLE_INT("hw.cxgbe.ntxq10g", &t4_ntxq10g); #define NRXQ_10G 8 int t4_nrxq10g = -1; TUNABLE_INT("hw.cxgbe.nrxq10g", &t4_nrxq10g); #define NTXQ_1G 4 int t4_ntxq1g = -1; TUNABLE_INT("hw.cxgbe.ntxq1g", &t4_ntxq1g); #define NRXQ_1G 2 int t4_nrxq1g = -1; TUNABLE_INT("hw.cxgbe.nrxq1g", &t4_nrxq1g); #define NTXQ_VI 1 static int t4_ntxq_vi = -1; TUNABLE_INT("hw.cxgbe.ntxq_vi", &t4_ntxq_vi); #define NRXQ_VI 1 static int t4_nrxq_vi = -1; TUNABLE_INT("hw.cxgbe.nrxq_vi", &t4_nrxq_vi); static int t4_rsrv_noflowq = 0; TUNABLE_INT("hw.cxgbe.rsrv_noflowq", &t4_rsrv_noflowq); #ifdef TCP_OFFLOAD #define NOFLDTXQ_10G 8 static int t4_nofldtxq10g = -1; TUNABLE_INT("hw.cxgbe.nofldtxq10g", &t4_nofldtxq10g); #define NOFLDRXQ_10G 2 static int t4_nofldrxq10g = -1; TUNABLE_INT("hw.cxgbe.nofldrxq10g", &t4_nofldrxq10g); #define NOFLDTXQ_1G 2 static int t4_nofldtxq1g = -1; TUNABLE_INT("hw.cxgbe.nofldtxq1g", &t4_nofldtxq1g); #define NOFLDRXQ_1G 1 static int t4_nofldrxq1g = -1; TUNABLE_INT("hw.cxgbe.nofldrxq1g", &t4_nofldrxq1g); #define NOFLDTXQ_VI 1 static int t4_nofldtxq_vi = -1; TUNABLE_INT("hw.cxgbe.nofldtxq_vi", &t4_nofldtxq_vi); #define NOFLDRXQ_VI 1 static int t4_nofldrxq_vi = -1; TUNABLE_INT("hw.cxgbe.nofldrxq_vi", &t4_nofldrxq_vi); #endif #ifdef DEV_NETMAP #define NNMTXQ_VI 2 static int t4_nnmtxq_vi = -1; TUNABLE_INT("hw.cxgbe.nnmtxq_vi", &t4_nnmtxq_vi); #define NNMRXQ_VI 2 static int t4_nnmrxq_vi = -1; TUNABLE_INT("hw.cxgbe.nnmrxq_vi", &t4_nnmrxq_vi); #endif /* * Holdoff parameters for 10G and 1G ports. */ #define TMR_IDX_10G 1 int t4_tmr_idx_10g = TMR_IDX_10G; TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_10G", &t4_tmr_idx_10g); #define PKTC_IDX_10G (-1) int t4_pktc_idx_10g = PKTC_IDX_10G; TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_10G", &t4_pktc_idx_10g); #define TMR_IDX_1G 1 int t4_tmr_idx_1g = TMR_IDX_1G; TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_1G", &t4_tmr_idx_1g); #define PKTC_IDX_1G (-1) int t4_pktc_idx_1g = PKTC_IDX_1G; TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_1G", &t4_pktc_idx_1g); /* * Size (# of entries) of each tx and rx queue. */ unsigned int t4_qsize_txq = TX_EQ_QSIZE; TUNABLE_INT("hw.cxgbe.qsize_txq", &t4_qsize_txq); unsigned int t4_qsize_rxq = RX_IQ_QSIZE; TUNABLE_INT("hw.cxgbe.qsize_rxq", &t4_qsize_rxq); /* * Interrupt types allowed (bits 0, 1, 2 = INTx, MSI, MSI-X respectively). */ int t4_intr_types = INTR_MSIX | INTR_MSI | INTR_INTX; TUNABLE_INT("hw.cxgbe.interrupt_types", &t4_intr_types); /* * Configuration file. */ #define DEFAULT_CF "default" #define FLASH_CF "flash" #define UWIRE_CF "uwire" #define FPGA_CF "fpga" static char t4_cfg_file[32] = DEFAULT_CF; TUNABLE_STR("hw.cxgbe.config_file", t4_cfg_file, sizeof(t4_cfg_file)); /* * PAUSE settings (bit 0, 1 = rx_pause, tx_pause respectively). * rx_pause = 1 to heed incoming PAUSE frames, 0 to ignore them. * tx_pause = 1 to emit PAUSE frames when the rx FIFO reaches its high water * mark or when signalled to do so, 0 to never emit PAUSE. 
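 *
 * As a quick reference, then: 0 = neither, 1 = rx_pause only,
 * 2 = tx_pause only, 3 = both (the compiled-in default below).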
*/ static int t4_pause_settings = PAUSE_TX | PAUSE_RX; TUNABLE_INT("hw.cxgbe.pause_settings", &t4_pause_settings); /* * Firmware auto-install by driver during attach (0, 1, 2 = prohibited, allowed, * encouraged respectively). */ static unsigned int t4_fw_install = 1; TUNABLE_INT("hw.cxgbe.fw_install", &t4_fw_install); /* * ASIC features that will be used. Disable the ones you don't want so that the * chip resources aren't wasted on features that will not be used. */ static int t4_nbmcaps_allowed = 0; TUNABLE_INT("hw.cxgbe.nbmcaps_allowed", &t4_nbmcaps_allowed); static int t4_linkcaps_allowed = 0; /* No DCBX, PPP, etc. by default */ TUNABLE_INT("hw.cxgbe.linkcaps_allowed", &t4_linkcaps_allowed); static int t4_switchcaps_allowed = FW_CAPS_CONFIG_SWITCH_INGRESS | FW_CAPS_CONFIG_SWITCH_EGRESS; TUNABLE_INT("hw.cxgbe.switchcaps_allowed", &t4_switchcaps_allowed); static int t4_niccaps_allowed = FW_CAPS_CONFIG_NIC; TUNABLE_INT("hw.cxgbe.niccaps_allowed", &t4_niccaps_allowed); static int t4_toecaps_allowed = -1; TUNABLE_INT("hw.cxgbe.toecaps_allowed", &t4_toecaps_allowed); static int t4_rdmacaps_allowed = -1; TUNABLE_INT("hw.cxgbe.rdmacaps_allowed", &t4_rdmacaps_allowed); static int t4_tlscaps_allowed = 0; TUNABLE_INT("hw.cxgbe.tlscaps_allowed", &t4_tlscaps_allowed); static int t4_iscsicaps_allowed = -1; TUNABLE_INT("hw.cxgbe.iscsicaps_allowed", &t4_iscsicaps_allowed); static int t4_fcoecaps_allowed = 0; TUNABLE_INT("hw.cxgbe.fcoecaps_allowed", &t4_fcoecaps_allowed); static int t5_write_combine = 0; TUNABLE_INT("hw.cxl.write_combine", &t5_write_combine); static int t4_num_vis = 1; TUNABLE_INT("hw.cxgbe.num_vis", &t4_num_vis); /* Functions used by extra VIs to obtain unique MAC addresses for each VI. */ static int vi_mac_funcs[] = { FW_VI_FUNC_OFLD, FW_VI_FUNC_IWARP, FW_VI_FUNC_OPENISCSI, FW_VI_FUNC_OPENFCOE, FW_VI_FUNC_FOISCSI, FW_VI_FUNC_FOFCOE, }; struct intrs_and_queues { uint16_t intr_type; /* INTx, MSI, or MSI-X */ uint16_t nirq; /* Total # of vectors */ uint16_t intr_flags_10g;/* Interrupt flags for each 10G port */ uint16_t intr_flags_1g; /* Interrupt flags for each 1G port */ uint16_t ntxq10g; /* # of NIC txq's for each 10G port */ uint16_t nrxq10g; /* # of NIC rxq's for each 10G port */ uint16_t ntxq1g; /* # of NIC txq's for each 1G port */ uint16_t nrxq1g; /* # of NIC rxq's for each 1G port */ uint16_t rsrv_noflowq; /* Flag whether to reserve queue 0 */ uint16_t nofldtxq10g; /* # of TOE txq's for each 10G port */ uint16_t nofldrxq10g; /* # of TOE rxq's for each 10G port */ uint16_t nofldtxq1g; /* # of TOE txq's for each 1G port */ uint16_t nofldrxq1g; /* # of TOE rxq's for each 1G port */ /* The vcxgbe/vcxl interfaces use these and not the ones above. 
*/ uint16_t ntxq_vi; /* # of NIC txq's */ uint16_t nrxq_vi; /* # of NIC rxq's */ uint16_t nofldtxq_vi; /* # of TOE txq's */ uint16_t nofldrxq_vi; /* # of TOE rxq's */ uint16_t nnmtxq_vi; /* # of netmap txq's */ uint16_t nnmrxq_vi; /* # of netmap rxq's */ }; struct filter_entry { uint32_t valid:1; /* filter allocated and valid */ uint32_t locked:1; /* filter is administratively locked */ uint32_t pending:1; /* filter action is pending firmware reply */ uint32_t smtidx:8; /* Source MAC Table index for smac */ struct l2t_entry *l2t; /* Layer Two Table entry for dmac */ struct t4_filter_specification fs; }; static void setup_memwin(struct adapter *); static void position_memwin(struct adapter *, int, uint32_t); static int rw_via_memwin(struct adapter *, int, uint32_t, uint32_t *, int, int); static inline int read_via_memwin(struct adapter *, int, uint32_t, uint32_t *, int); static inline int write_via_memwin(struct adapter *, int, uint32_t, const uint32_t *, int); static int validate_mem_range(struct adapter *, uint32_t, int); static int fwmtype_to_hwmtype(int); static int validate_mt_off_len(struct adapter *, int, uint32_t, int, uint32_t *); static int fixup_devlog_params(struct adapter *); static int cfg_itype_and_nqueues(struct adapter *, int, int, int, struct intrs_and_queues *); static int prep_firmware(struct adapter *); static int partition_resources(struct adapter *, const struct firmware *, const char *); static int get_params__pre_init(struct adapter *); static int get_params__post_init(struct adapter *); static int set_params__post_init(struct adapter *); static void t4_set_desc(struct adapter *); static void build_medialist(struct port_info *, struct ifmedia *); static int cxgbe_init_synchronized(struct vi_info *); static int cxgbe_uninit_synchronized(struct vi_info *); static void quiesce_txq(struct adapter *, struct sge_txq *); static void quiesce_wrq(struct adapter *, struct sge_wrq *); static void quiesce_iq(struct adapter *, struct sge_iq *); static void quiesce_fl(struct adapter *, struct sge_fl *); static int t4_alloc_irq(struct adapter *, struct irq *, int rid, driver_intr_t *, void *, char *); static int t4_free_irq(struct adapter *, struct irq *); static void get_regs(struct adapter *, struct t4_regdump *, uint8_t *); static void vi_refresh_stats(struct adapter *, struct vi_info *); static void cxgbe_refresh_stats(struct adapter *, struct port_info *); static void cxgbe_tick(void *); static void cxgbe_vlan_config(void *, struct ifnet *, uint16_t); static void cxgbe_sysctls(struct port_info *); static int sysctl_int_array(SYSCTL_HANDLER_ARGS); static int sysctl_bitfield(SYSCTL_HANDLER_ARGS); static int sysctl_btphy(SYSCTL_HANDLER_ARGS); static int sysctl_noflowq(SYSCTL_HANDLER_ARGS); static int sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS); static int sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS); static int sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS); static int sysctl_qsize_txq(SYSCTL_HANDLER_ARGS); static int sysctl_pause_settings(SYSCTL_HANDLER_ARGS); static int sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS); static int sysctl_temperature(SYSCTL_HANDLER_ARGS); #ifdef SBUF_DRAIN static int sysctl_cctrl(SYSCTL_HANDLER_ARGS); static int sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS); static int sysctl_cim_la(SYSCTL_HANDLER_ARGS); static int sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS); static int sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS); static int sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS); static int sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS); static int sysctl_cpl_stats(SYSCTL_HANDLER_ARGS); static int 
sysctl_ddp_stats(SYSCTL_HANDLER_ARGS); static int sysctl_devlog(SYSCTL_HANDLER_ARGS); static int sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS); static int sysctl_hw_sched(SYSCTL_HANDLER_ARGS); static int sysctl_lb_stats(SYSCTL_HANDLER_ARGS); static int sysctl_linkdnrc(SYSCTL_HANDLER_ARGS); static int sysctl_meminfo(SYSCTL_HANDLER_ARGS); static int sysctl_mps_tcam(SYSCTL_HANDLER_ARGS); static int sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS); static int sysctl_path_mtus(SYSCTL_HANDLER_ARGS); static int sysctl_pm_stats(SYSCTL_HANDLER_ARGS); static int sysctl_rdma_stats(SYSCTL_HANDLER_ARGS); static int sysctl_tcp_stats(SYSCTL_HANDLER_ARGS); static int sysctl_tids(SYSCTL_HANDLER_ARGS); static int sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS); static int sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS); static int sysctl_tp_la(SYSCTL_HANDLER_ARGS); static int sysctl_tx_rate(SYSCTL_HANDLER_ARGS); static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS); static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS); static int sysctl_tc_params(SYSCTL_HANDLER_ARGS); #endif #ifdef TCP_OFFLOAD static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS); static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS); static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS); #endif static uint32_t fconf_iconf_to_mode(uint32_t, uint32_t); static uint32_t mode_to_fconf(uint32_t); static uint32_t mode_to_iconf(uint32_t); static int check_fspec_against_fconf_iconf(struct adapter *, struct t4_filter_specification *); static int get_filter_mode(struct adapter *, uint32_t *); static int set_filter_mode(struct adapter *, uint32_t); static inline uint64_t get_filter_hits(struct adapter *, uint32_t); static int get_filter(struct adapter *, struct t4_filter *); static int set_filter(struct adapter *, struct t4_filter *); static int del_filter(struct adapter *, struct t4_filter *); static void clear_filter(struct filter_entry *); static int set_filter_wr(struct adapter *, int); static int del_filter_wr(struct adapter *, int); static int set_tcb_rpl(struct sge_iq *, const struct rss_header *, struct mbuf *); static int get_sge_context(struct adapter *, struct t4_sge_context *); static int load_fw(struct adapter *, struct t4_data *); static int read_card_mem(struct adapter *, int, struct t4_mem_range *); static int read_i2c(struct adapter *, struct t4_i2c_data *); #ifdef TCP_OFFLOAD static int toe_capability(struct vi_info *, int); #endif static int mod_event(module_t, int, void *); static int notify_siblings(device_t, int); struct { uint16_t device; char *desc; } t4_pciids[] = { {0xa000, "Chelsio Terminator 4 FPGA"}, {0x4400, "Chelsio T440-dbg"}, {0x4401, "Chelsio T420-CR"}, {0x4402, "Chelsio T422-CR"}, {0x4403, "Chelsio T440-CR"}, {0x4404, "Chelsio T420-BCH"}, {0x4405, "Chelsio T440-BCH"}, {0x4406, "Chelsio T440-CH"}, {0x4407, "Chelsio T420-SO"}, {0x4408, "Chelsio T420-CX"}, {0x4409, "Chelsio T420-BT"}, {0x440a, "Chelsio T404-BT"}, {0x440e, "Chelsio T440-LP-CR"}, }, t5_pciids[] = { {0xb000, "Chelsio Terminator 5 FPGA"}, {0x5400, "Chelsio T580-dbg"}, {0x5401, "Chelsio T520-CR"}, /* 2 x 10G */ {0x5402, "Chelsio T522-CR"}, /* 2 x 10G, 2 X 1G */ {0x5403, "Chelsio T540-CR"}, /* 4 x 10G */ {0x5407, "Chelsio T520-SO"}, /* 2 x 10G, nomem */ {0x5409, "Chelsio T520-BT"}, /* 2 x 10GBaseT */ {0x540a, "Chelsio T504-BT"}, /* 4 x 1G */ {0x540d, "Chelsio T580-CR"}, /* 2 x 40G */ {0x540e, "Chelsio T540-LP-CR"}, /* 4 x 10G */ {0x5410, "Chelsio T580-LP-CR"}, /* 2 x 40G */ {0x5411, "Chelsio T520-LL-CR"}, /* 2 x 10G */ {0x5412, "Chelsio T560-CR"}, /* 1 x 40G, 2 x 10G */ {0x5414, "Chelsio T580-LP-SO-CR"}, /* 2 x 
40G, nomem */ {0x5415, "Chelsio T502-BT"}, /* 2 x 1G */ #ifdef notyet {0x5404, "Chelsio T520-BCH"}, {0x5405, "Chelsio T540-BCH"}, {0x5406, "Chelsio T540-CH"}, {0x5408, "Chelsio T520-CX"}, {0x540b, "Chelsio B520-SR"}, {0x540c, "Chelsio B504-BT"}, {0x540f, "Chelsio Amsterdam"}, {0x5413, "Chelsio T580-CHR"}, #endif }; #ifdef TCP_OFFLOAD /* * service_iq() has an iq and needs the fl. Offset of fl from the iq should be * exactly the same for both rxq and ofld_rxq. */ CTASSERT(offsetof(struct sge_ofld_rxq, iq) == offsetof(struct sge_rxq, iq)); CTASSERT(offsetof(struct sge_ofld_rxq, fl) == offsetof(struct sge_rxq, fl)); #endif CTASSERT(sizeof(struct cluster_metadata) <= CL_METADATA_SIZE); static int t4_probe(device_t dev) { int i; uint16_t v = pci_get_vendor(dev); uint16_t d = pci_get_device(dev); uint8_t f = pci_get_function(dev); if (v != PCI_VENDOR_ID_CHELSIO) return (ENXIO); /* Attach only to PF0 of the FPGA */ if (d == 0xa000 && f != 0) return (ENXIO); for (i = 0; i < nitems(t4_pciids); i++) { if (d == t4_pciids[i].device) { device_set_desc(dev, t4_pciids[i].desc); return (BUS_PROBE_DEFAULT); } } return (ENXIO); } static int t5_probe(device_t dev) { int i; uint16_t v = pci_get_vendor(dev); uint16_t d = pci_get_device(dev); uint8_t f = pci_get_function(dev); if (v != PCI_VENDOR_ID_CHELSIO) return (ENXIO); /* Attach only to PF0 of the FPGA */ if (d == 0xb000 && f != 0) return (ENXIO); for (i = 0; i < nitems(t5_pciids); i++) { if (d == t5_pciids[i].device) { device_set_desc(dev, t5_pciids[i].desc); return (BUS_PROBE_DEFAULT); } } return (ENXIO); } static void t5_attribute_workaround(device_t dev) { device_t root_port; uint32_t v; /* * The T5 chips do not properly echo the No Snoop and Relaxed * Ordering attributes when replying to a TLP from a Root * Port. As a workaround, find the parent Root Port and * disable No Snoop and Relaxed Ordering. Note that this * affects all devices under this root port. 
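 *
 * (pcie_adjust_config() returns the previous value of the register
 * it rewrites, so the check below only logs the change when one of
 * the two bits was actually set beforehand.)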
*/ root_port = pci_find_pcie_root_port(dev); if (root_port == NULL) { device_printf(dev, "Unable to find parent root port\n"); return; } v = pcie_adjust_config(root_port, PCIER_DEVICE_CTL, PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE, 0, 2); if ((v & (PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE)) != 0) device_printf(dev, "Disabled No Snoop/Relaxed Ordering on %s\n", device_get_nameunit(root_port)); } static int t4_attach(device_t dev) { struct adapter *sc; int rc = 0, i, j, n10g, n1g, rqidx, tqidx; struct make_dev_args mda; struct intrs_and_queues iaq; struct sge *s; uint8_t *buf; #ifdef TCP_OFFLOAD int ofld_rqidx, ofld_tqidx; #endif #ifdef DEV_NETMAP int nm_rqidx, nm_tqidx; #endif int num_vis; sc = device_get_softc(dev); sc->dev = dev; TUNABLE_INT_FETCH("hw.cxgbe.debug_flags", &sc->debug_flags); if ((pci_get_device(dev) & 0xff00) == 0x5400) t5_attribute_workaround(dev); pci_enable_busmaster(dev); if (pci_find_cap(dev, PCIY_EXPRESS, &i) == 0) { uint32_t v; pci_set_max_read_req(dev, 4096); v = pci_read_config(dev, i + PCIER_DEVICE_CTL, 2); v |= PCIEM_CTL_RELAXED_ORD_ENABLE; pci_write_config(dev, i + PCIER_DEVICE_CTL, v, 2); sc->params.pci.mps = 128 << ((v & PCIEM_CTL_MAX_PAYLOAD) >> 5); } sc->sge_gts_reg = MYPF_REG(A_SGE_PF_GTS); sc->sge_kdoorbell_reg = MYPF_REG(A_SGE_PF_KDOORBELL); sc->traceq = -1; mtx_init(&sc->ifp_lock, sc->ifp_lockname, 0, MTX_DEF); snprintf(sc->ifp_lockname, sizeof(sc->ifp_lockname), "%s tracer", device_get_nameunit(dev)); snprintf(sc->lockname, sizeof(sc->lockname), "%s", device_get_nameunit(dev)); mtx_init(&sc->sc_lock, sc->lockname, 0, MTX_DEF); t4_add_adapter(sc); mtx_init(&sc->sfl_lock, "starving freelists", 0, MTX_DEF); TAILQ_INIT(&sc->sfl); callout_init_mtx(&sc->sfl_callout, &sc->sfl_lock, 0); mtx_init(&sc->reg_lock, "indirect register access", 0, MTX_DEF); rc = t4_map_bars_0_and_4(sc); if (rc != 0) goto done; /* error message displayed already */ /* * This is the real PF# to which we're attaching. Works from within PCI * passthrough environments too, where pci_get_function() could return a * different PF# depending on the passthrough configuration. We need to * use the real PF# in all our communication with the firmware. */ sc->pf = G_SOURCEPF(t4_read_reg(sc, A_PL_WHOAMI)); sc->mbox = sc->pf; memset(sc->chan_map, 0xff, sizeof(sc->chan_map)); /* Prepare the adapter for operation. */ buf = malloc(PAGE_SIZE, M_CXGBE, M_ZERO | M_WAITOK); rc = -t4_prep_adapter(sc, buf); free(buf, M_CXGBE); if (rc != 0) { device_printf(dev, "failed to prepare adapter: %d.\n", rc); goto done; } /* * Do this really early, with the memory windows set up even before the * character device. The userland tool's register i/o and mem read * will work even in "recovery mode". */ setup_memwin(sc); if (t4_init_devlog_params(sc, 0) == 0) fixup_devlog_params(sc); make_dev_args_init(&mda); mda.mda_devsw = &t4_cdevsw; mda.mda_uid = UID_ROOT; mda.mda_gid = GID_WHEEL; mda.mda_mode = 0600; mda.mda_si_drv1 = sc; rc = make_dev_s(&mda, &sc->cdev, "%s", device_get_nameunit(dev)); if (rc != 0) device_printf(dev, "failed to create nexus char device: %d.\n", rc); /* Go no further if recovery mode has been requested. 
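 * (Recovery mode is requested via the hw.cxgbe.sos tunable checked
 * below, e.g. hw.cxgbe.sos="1" in /boot/loader.conf.)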
*/ if (TUNABLE_INT_FETCH("hw.cxgbe.sos", &i) && i != 0) { device_printf(dev, "recovery mode.\n"); goto done; } #if defined(__i386__) if ((cpu_feature & CPUID_CX8) == 0) { device_printf(dev, "64 bit atomics not available.\n"); rc = ENOTSUP; goto done; } #endif /* Prepare the firmware for operation */ rc = prep_firmware(sc); if (rc != 0) goto done; /* error message displayed already */ rc = get_params__post_init(sc); if (rc != 0) goto done; /* error message displayed already */ rc = set_params__post_init(sc); if (rc != 0) goto done; /* error message displayed already */ rc = t4_map_bar_2(sc); if (rc != 0) goto done; /* error message displayed already */ rc = t4_create_dma_tag(sc); if (rc != 0) goto done; /* error message displayed already */ /* * Number of VIs to create per-port. The first VI is the "main" regular * VI for the port. The rest are additional virtual interfaces on the * same physical port. Note that the main VI does not have native * netmap support but the extra VIs do. * * Limit the number of VIs per port to the number of available * MAC addresses per port. */ if (t4_num_vis >= 1) num_vis = t4_num_vis; else num_vis = 1; if (num_vis > nitems(vi_mac_funcs)) { num_vis = nitems(vi_mac_funcs); device_printf(dev, "Number of VIs limited to %d\n", num_vis); } /* * First pass over all the ports - allocate VIs and initialize some * basic parameters like mac address, port type, etc. We also figure * out whether a port is 10G or 1G and use that information when * calculating how many interrupts to attempt to allocate. */ n10g = n1g = 0; for_each_port(sc, i) { struct port_info *pi; pi = malloc(sizeof(*pi), M_CXGBE, M_ZERO | M_WAITOK); sc->port[i] = pi; /* These must be set before t4_port_init */ pi->adapter = sc; pi->port_id = i; /* * XXX: vi[0] is special so we can't delay this allocation until * pi->nvi's final value is known. */ pi->vi = malloc(sizeof(struct vi_info) * num_vis, M_CXGBE, M_ZERO | M_WAITOK); /* * Allocate the "main" VI and initialize parameters * like mac addr. */ rc = -t4_port_init(sc, sc->mbox, sc->pf, 0, i); if (rc != 0) { device_printf(dev, "unable to initialize port %d: %d\n", i, rc); free(pi->vi, M_CXGBE); free(pi, M_CXGBE); sc->port[i] = NULL; goto done; } pi->link_cfg.requested_fc &= ~(PAUSE_TX | PAUSE_RX); pi->link_cfg.requested_fc |= t4_pause_settings; pi->link_cfg.fc &= ~(PAUSE_TX | PAUSE_RX); pi->link_cfg.fc |= t4_pause_settings; rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, &pi->link_cfg); if (rc != 0) { device_printf(dev, "port %d l1cfg failed: %d\n", i, rc); free(pi->vi, M_CXGBE); free(pi, M_CXGBE); sc->port[i] = NULL; goto done; } snprintf(pi->lockname, sizeof(pi->lockname), "%sp%d", device_get_nameunit(dev), i); mtx_init(&pi->pi_lock, pi->lockname, 0, MTX_DEF); sc->chan_map[pi->tx_chan] = i; pi->tc = malloc(sizeof(struct tx_sched_class) * sc->chip_params->nsched_cls, M_CXGBE, M_ZERO | M_WAITOK); if (is_10G_port(pi) || is_40G_port(pi)) { n10g++; } else { n1g++; } pi->linkdnrc = -1; pi->dev = device_add_child(dev, is_t4(sc) ? "cxgbe" : "cxl", -1); if (pi->dev == NULL) { device_printf(dev, "failed to add device for port %d.\n", i); rc = ENXIO; goto done; } pi->vi[0].dev = pi->dev; device_set_softc(pi->dev, pi); } /* * Interrupt type, # of interrupts, # of rx/tx queues, etc. 
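 *
 * The totals computed below are simple sums; e.g. a dual-port 10G
 * adapter with the compile-time defaults (16 txq, 8 rxq per 10G
 * port), one VI per port, and no trimming by the interrupt
 * allocation ends up with
 *
 *	s->ntxq = 2 * 16 = 32, s->nrxq = 2 * 8 = 16,
 *
 * plus a control queue per port, one management queue, and the
 * firmware event queue on top of that.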
*/ rc = cfg_itype_and_nqueues(sc, n10g, n1g, num_vis, &iaq); if (rc != 0) goto done; /* error message displayed already */ if (iaq.nrxq_vi + iaq.nofldrxq_vi + iaq.nnmrxq_vi == 0) num_vis = 1; sc->intr_type = iaq.intr_type; sc->intr_count = iaq.nirq; s = &sc->sge; s->nrxq = n10g * iaq.nrxq10g + n1g * iaq.nrxq1g; s->ntxq = n10g * iaq.ntxq10g + n1g * iaq.ntxq1g; if (num_vis > 1) { s->nrxq += (n10g + n1g) * (num_vis - 1) * iaq.nrxq_vi; s->ntxq += (n10g + n1g) * (num_vis - 1) * iaq.ntxq_vi; } s->neq = s->ntxq + s->nrxq; /* the free list in an rxq is an eq */ s->neq += sc->params.nports + 1;/* ctrl queues: 1 per port + 1 mgmt */ s->niq = s->nrxq + 1; /* 1 extra for firmware event queue */ #ifdef TCP_OFFLOAD if (is_offload(sc)) { s->nofldrxq = n10g * iaq.nofldrxq10g + n1g * iaq.nofldrxq1g; s->nofldtxq = n10g * iaq.nofldtxq10g + n1g * iaq.nofldtxq1g; if (num_vis > 1) { s->nofldrxq += (n10g + n1g) * (num_vis - 1) * iaq.nofldrxq_vi; s->nofldtxq += (n10g + n1g) * (num_vis - 1) * iaq.nofldtxq_vi; } s->neq += s->nofldtxq + s->nofldrxq; s->niq += s->nofldrxq; s->ofld_rxq = malloc(s->nofldrxq * sizeof(struct sge_ofld_rxq), M_CXGBE, M_ZERO | M_WAITOK); s->ofld_txq = malloc(s->nofldtxq * sizeof(struct sge_wrq), M_CXGBE, M_ZERO | M_WAITOK); } #endif #ifdef DEV_NETMAP if (num_vis > 1) { s->nnmrxq = (n10g + n1g) * (num_vis - 1) * iaq.nnmrxq_vi; s->nnmtxq = (n10g + n1g) * (num_vis - 1) * iaq.nnmtxq_vi; } s->neq += s->nnmtxq + s->nnmrxq; s->niq += s->nnmrxq; s->nm_rxq = malloc(s->nnmrxq * sizeof(struct sge_nm_rxq), M_CXGBE, M_ZERO | M_WAITOK); s->nm_txq = malloc(s->nnmtxq * sizeof(struct sge_nm_txq), M_CXGBE, M_ZERO | M_WAITOK); #endif s->ctrlq = malloc(sc->params.nports * sizeof(struct sge_wrq), M_CXGBE, M_ZERO | M_WAITOK); s->rxq = malloc(s->nrxq * sizeof(struct sge_rxq), M_CXGBE, M_ZERO | M_WAITOK); s->txq = malloc(s->ntxq * sizeof(struct sge_txq), M_CXGBE, M_ZERO | M_WAITOK); s->iqmap = malloc(s->niq * sizeof(struct sge_iq *), M_CXGBE, M_ZERO | M_WAITOK); s->eqmap = malloc(s->neq * sizeof(struct sge_eq *), M_CXGBE, M_ZERO | M_WAITOK); sc->irq = malloc(sc->intr_count * sizeof(struct irq), M_CXGBE, M_ZERO | M_WAITOK); t4_init_l2t(sc, M_WAITOK); /* * Second pass over the ports. This time we know the number of rx and * tx queues that each port should get. */ rqidx = tqidx = 0; #ifdef TCP_OFFLOAD ofld_rqidx = ofld_tqidx = 0; #endif #ifdef DEV_NETMAP nm_rqidx = nm_tqidx = 0; #endif for_each_port(sc, i) { struct port_info *pi = sc->port[i]; struct vi_info *vi; if (pi == NULL) continue; pi->nvi = num_vis; for_each_vi(pi, j, vi) { vi->pi = pi; vi->qsize_rxq = t4_qsize_rxq; vi->qsize_txq = t4_qsize_txq; vi->first_rxq = rqidx; vi->first_txq = tqidx; if (is_10G_port(pi) || is_40G_port(pi)) { vi->tmr_idx = t4_tmr_idx_10g; vi->pktc_idx = t4_pktc_idx_10g; vi->flags |= iaq.intr_flags_10g & INTR_RXQ; vi->nrxq = j == 0 ? iaq.nrxq10g : iaq.nrxq_vi; vi->ntxq = j == 0 ? iaq.ntxq10g : iaq.ntxq_vi; } else { vi->tmr_idx = t4_tmr_idx_1g; vi->pktc_idx = t4_pktc_idx_1g; vi->flags |= iaq.intr_flags_1g & INTR_RXQ; vi->nrxq = j == 0 ? iaq.nrxq1g : iaq.nrxq_vi; vi->ntxq = j == 0 ? iaq.ntxq1g : iaq.ntxq_vi; } rqidx += vi->nrxq; tqidx += vi->ntxq; if (j == 0 && vi->ntxq > 1) vi->rsrv_noflowq = iaq.rsrv_noflowq ? 1 : 0; else vi->rsrv_noflowq = 0; #ifdef TCP_OFFLOAD vi->first_ofld_rxq = ofld_rqidx; vi->first_ofld_txq = ofld_tqidx; if (is_10G_port(pi) || is_40G_port(pi)) { vi->flags |= iaq.intr_flags_10g & INTR_OFLD_RXQ; vi->nofldrxq = j == 0 ? iaq.nofldrxq10g : iaq.nofldrxq_vi; vi->nofldtxq = j == 0 ? 
iaq.nofldtxq10g : iaq.nofldtxq_vi; } else { vi->flags |= iaq.intr_flags_1g & INTR_OFLD_RXQ; vi->nofldrxq = j == 0 ? iaq.nofldrxq1g : iaq.nofldrxq_vi; vi->nofldtxq = j == 0 ? iaq.nofldtxq1g : iaq.nofldtxq_vi; } ofld_rqidx += vi->nofldrxq; ofld_tqidx += vi->nofldtxq; #endif #ifdef DEV_NETMAP if (j > 0) { vi->first_nm_rxq = nm_rqidx; vi->first_nm_txq = nm_tqidx; vi->nnmrxq = iaq.nnmrxq_vi; vi->nnmtxq = iaq.nnmtxq_vi; nm_rqidx += vi->nnmrxq; nm_tqidx += vi->nnmtxq; } #endif } } rc = t4_setup_intr_handlers(sc); if (rc != 0) { device_printf(dev, "failed to setup interrupt handlers: %d\n", rc); goto done; } rc = bus_generic_attach(dev); if (rc != 0) { device_printf(dev, "failed to attach all child ports: %d\n", rc); goto done; } device_printf(dev, "PCIe gen%d x%d, %d ports, %d %s interrupt%s, %d eq, %d iq\n", sc->params.pci.speed, sc->params.pci.width, sc->params.nports, sc->intr_count, sc->intr_type == INTR_MSIX ? "MSI-X" : (sc->intr_type == INTR_MSI ? "MSI" : "INTx"), sc->intr_count > 1 ? "s" : "", sc->sge.neq, sc->sge.niq); t4_set_desc(sc); notify_siblings(dev, 0); done: if (rc != 0 && sc->cdev) { /* cdev was created and so cxgbetool works; recover that way. */ device_printf(dev, "error during attach, adapter is now in recovery mode.\n"); rc = 0; } if (rc != 0) t4_detach_common(dev); else t4_sysctls(sc); return (rc); } static int t4_ready(device_t dev) { struct adapter *sc; sc = device_get_softc(dev); if (sc->flags & FW_OK) return (0); return (ENXIO); } static int t4_read_port_device(device_t dev, int port, device_t *child) { struct adapter *sc; struct port_info *pi; sc = device_get_softc(dev); if (port < 0 || port >= MAX_NPORTS) return (EINVAL); pi = sc->port[port]; if (pi == NULL || pi->dev == NULL) return (ENXIO); *child = pi->dev; return (0); } static int notify_siblings(device_t dev, int detaching) { device_t sibling; int error, i; error = 0; for (i = 0; i < PCI_FUNCMAX; i++) { if (i == pci_get_function(dev)) continue; sibling = pci_find_dbsf(pci_get_domain(dev), pci_get_bus(dev), pci_get_slot(dev), i); if (sibling == NULL || !device_is_attached(sibling)) continue; if (detaching) error = T4_DETACH_CHILD(sibling); else (void)T4_ATTACH_CHILD(sibling); if (error) break; } return (error); } /* * Idempotent */ static int t4_detach(device_t dev) { struct adapter *sc; int rc; sc = device_get_softc(dev); rc = notify_siblings(dev, 1); if (rc) { device_printf(dev, "failed to detach sibling devices: %d\n", rc); return (rc); } return (t4_detach_common(dev)); } int t4_detach_common(device_t dev) { struct adapter *sc; struct port_info *pi; int i, rc; sc = device_get_softc(dev); if (sc->flags & FULL_INIT_DONE) { if (!(sc->flags & IS_VF)) t4_intr_disable(sc); } if (sc->cdev) { destroy_dev(sc->cdev); sc->cdev = NULL; } if (device_is_attached(dev)) { rc = bus_generic_detach(dev); if (rc) { device_printf(dev, "failed to detach child devices: %d\n", rc); return (rc); } } for (i = 0; i < sc->intr_count; i++) t4_free_irq(sc, &sc->irq[i]); for (i = 0; i < MAX_NPORTS; i++) { pi = sc->port[i]; if (pi) { t4_free_vi(sc, sc->mbox, sc->pf, 0, pi->vi[0].viid); if (pi->dev) device_delete_child(dev, pi->dev); mtx_destroy(&pi->pi_lock); free(pi->vi, M_CXGBE); free(pi->tc, M_CXGBE); free(pi, M_CXGBE); } } if (sc->flags & FULL_INIT_DONE) adapter_full_uninit(sc); if ((sc->flags & (IS_VF | FW_OK)) == FW_OK) t4_fw_bye(sc, sc->mbox); if (sc->intr_type == INTR_MSI || sc->intr_type == INTR_MSIX) pci_release_msi(dev); if (sc->regs_res) bus_release_resource(dev, SYS_RES_MEMORY, sc->regs_rid, sc->regs_res); if (sc->udbs_res) 
bus_release_resource(dev, SYS_RES_MEMORY, sc->udbs_rid, sc->udbs_res); if (sc->msix_res) bus_release_resource(dev, SYS_RES_MEMORY, sc->msix_rid, sc->msix_res); if (sc->l2t) t4_free_l2t(sc->l2t); #ifdef TCP_OFFLOAD free(sc->sge.ofld_rxq, M_CXGBE); free(sc->sge.ofld_txq, M_CXGBE); #endif #ifdef DEV_NETMAP free(sc->sge.nm_rxq, M_CXGBE); free(sc->sge.nm_txq, M_CXGBE); #endif free(sc->irq, M_CXGBE); free(sc->sge.rxq, M_CXGBE); free(sc->sge.txq, M_CXGBE); free(sc->sge.ctrlq, M_CXGBE); free(sc->sge.iqmap, M_CXGBE); free(sc->sge.eqmap, M_CXGBE); free(sc->tids.ftid_tab, M_CXGBE); t4_destroy_dma_tag(sc); if (mtx_initialized(&sc->sc_lock)) { sx_xlock(&t4_list_lock); SLIST_REMOVE(&t4_list, sc, adapter, link); sx_xunlock(&t4_list_lock); mtx_destroy(&sc->sc_lock); } callout_drain(&sc->sfl_callout); if (mtx_initialized(&sc->tids.ftid_lock)) mtx_destroy(&sc->tids.ftid_lock); if (mtx_initialized(&sc->sfl_lock)) mtx_destroy(&sc->sfl_lock); if (mtx_initialized(&sc->ifp_lock)) mtx_destroy(&sc->ifp_lock); if (mtx_initialized(&sc->reg_lock)) mtx_destroy(&sc->reg_lock); for (i = 0; i < NUM_MEMWIN; i++) { struct memwin *mw = &sc->memwin[i]; if (rw_initialized(&mw->mw_lock)) rw_destroy(&mw->mw_lock); } bzero(sc, sizeof(*sc)); return (0); } static int cxgbe_probe(device_t dev) { char buf[128]; struct port_info *pi = device_get_softc(dev); snprintf(buf, sizeof(buf), "port %d", pi->port_id); device_set_desc_copy(dev, buf); return (BUS_PROBE_DEFAULT); } #define T4_CAP (IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | \ IFCAP_VLAN_HWCSUM | IFCAP_TSO | IFCAP_JUMBO_MTU | IFCAP_LRO | \ IFCAP_VLAN_HWTSO | IFCAP_LINKSTATE | IFCAP_HWCSUM_IPV6 | IFCAP_HWSTATS) #define T4_CAP_ENABLE (T4_CAP) static int cxgbe_vi_attach(device_t dev, struct vi_info *vi) { struct ifnet *ifp; struct sbuf *sb; vi->xact_addr_filt = -1; callout_init(&vi->tick, 1); /* Allocate an ifnet and set it up */ ifp = if_alloc(IFT_ETHER); if (ifp == NULL) { device_printf(dev, "Cannot allocate ifnet\n"); return (ENOMEM); } vi->ifp = ifp; ifp->if_softc = vi; if_initname(ifp, device_get_name(dev), device_get_unit(dev)); ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST; ifp->if_init = cxgbe_init; ifp->if_ioctl = cxgbe_ioctl; ifp->if_transmit = cxgbe_transmit; ifp->if_qflush = cxgbe_qflush; ifp->if_get_counter = cxgbe_get_counter; ifp->if_capabilities = T4_CAP; #ifdef TCP_OFFLOAD if (vi->nofldrxq != 0) ifp->if_capabilities |= IFCAP_TOE; #endif ifp->if_capenable = T4_CAP_ENABLE; ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_IP | CSUM_TSO | CSUM_UDP_IPV6 | CSUM_TCP_IPV6; ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN); ifp->if_hw_tsomaxsegcount = TX_SGL_SEGS; ifp->if_hw_tsomaxsegsize = 65536; /* Initialize ifmedia for this VI */ ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change, cxgbe_media_status); build_medialist(vi->pi, &vi->media); vi->vlan_c = EVENTHANDLER_REGISTER(vlan_config, cxgbe_vlan_config, ifp, EVENTHANDLER_PRI_ANY); ether_ifattach(ifp, vi->hw_addr); #ifdef DEV_NETMAP if (vi->nnmrxq != 0) cxgbe_nm_attach(vi); #endif sb = sbuf_new_auto(); sbuf_printf(sb, "%d txq, %d rxq (NIC)", vi->ntxq, vi->nrxq); #ifdef TCP_OFFLOAD if (ifp->if_capabilities & IFCAP_TOE) sbuf_printf(sb, "; %d txq, %d rxq (TOE)", vi->nofldtxq, vi->nofldrxq); #endif #ifdef DEV_NETMAP if (ifp->if_capabilities & IFCAP_NETMAP) sbuf_printf(sb, "; %d txq, %d rxq (netmap)", vi->nnmtxq, vi->nnmrxq); #endif sbuf_finish(sb); device_printf(dev, "%s\n", sbuf_data(sb)); sbuf_delete(sb); vi_sysctls(vi); return (0); } static int cxgbe_attach(device_t dev) { struct 
port_info *pi = device_get_softc(dev); struct vi_info *vi; int i, rc; callout_init_mtx(&pi->tick, &pi->pi_lock, 0); rc = cxgbe_vi_attach(dev, &pi->vi[0]); if (rc) return (rc); for_each_vi(pi, i, vi) { if (i == 0) continue; vi->dev = device_add_child(dev, is_t4(pi->adapter) ? "vcxgbe" : "vcxl", -1); if (vi->dev == NULL) { device_printf(dev, "failed to add VI %d\n", i); continue; } device_set_softc(vi->dev, vi); } cxgbe_sysctls(pi); bus_generic_attach(dev); return (0); } static void cxgbe_vi_detach(struct vi_info *vi) { struct ifnet *ifp = vi->ifp; ether_ifdetach(ifp); if (vi->vlan_c) EVENTHANDLER_DEREGISTER(vlan_config, vi->vlan_c); /* Let detach proceed even if these fail. */ #ifdef DEV_NETMAP if (ifp->if_capabilities & IFCAP_NETMAP) cxgbe_nm_detach(vi); #endif cxgbe_uninit_synchronized(vi); callout_drain(&vi->tick); vi_full_uninit(vi); ifmedia_removeall(&vi->media); if_free(vi->ifp); vi->ifp = NULL; } static int cxgbe_detach(device_t dev) { struct port_info *pi = device_get_softc(dev); struct adapter *sc = pi->adapter; int rc; /* Detach the extra VIs first. */ rc = bus_generic_detach(dev); if (rc) return (rc); device_delete_children(dev); doom_vi(sc, &pi->vi[0]); if (pi->flags & HAS_TRACEQ) { sc->traceq = -1; /* cloner should not create ifnet */ t4_tracer_port_detach(sc); } cxgbe_vi_detach(&pi->vi[0]); callout_drain(&pi->tick); end_synchronized_op(sc, 0); return (0); } static void cxgbe_init(void *arg) { struct vi_info *vi = arg; struct adapter *sc = vi->pi->adapter; if (begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4init") != 0) return; cxgbe_init_synchronized(vi); end_synchronized_op(sc, 0); } static int cxgbe_ioctl(struct ifnet *ifp, unsigned long cmd, caddr_t data) { int rc = 0, mtu, flags, can_sleep; struct vi_info *vi = ifp->if_softc; struct adapter *sc = vi->pi->adapter; struct ifreq *ifr = (struct ifreq *)data; uint32_t mask; switch (cmd) { case SIOCSIFMTU: mtu = ifr->ifr_mtu; if ((mtu < ETHERMIN) || (mtu > ETHERMTU_JUMBO)) return (EINVAL); rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4mtu"); if (rc) return (rc); ifp->if_mtu = mtu; if (vi->flags & VI_INIT_DONE) { t4_update_fl_bufsize(ifp); if (ifp->if_drv_flags & IFF_DRV_RUNNING) rc = update_mac_settings(ifp, XGMAC_MTU); } end_synchronized_op(sc, 0); break; case SIOCSIFFLAGS: can_sleep = 0; redo_sifflags: rc = begin_synchronized_op(sc, vi, can_sleep ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4flg"); if (rc) return (rc); if (ifp->if_flags & IFF_UP) { if (ifp->if_drv_flags & IFF_DRV_RUNNING) { flags = vi->if_flags; if ((ifp->if_flags ^ flags) & (IFF_PROMISC | IFF_ALLMULTI)) { if (can_sleep == 1) { end_synchronized_op(sc, 0); can_sleep = 0; goto redo_sifflags; } rc = update_mac_settings(ifp, XGMAC_PROMISC | XGMAC_ALLMULTI); } } else { if (can_sleep == 0) { end_synchronized_op(sc, LOCK_HELD); can_sleep = 1; goto redo_sifflags; } rc = cxgbe_init_synchronized(vi); } vi->if_flags = ifp->if_flags; } else if (ifp->if_drv_flags & IFF_DRV_RUNNING) { if (can_sleep == 0) { end_synchronized_op(sc, LOCK_HELD); can_sleep = 1; goto redo_sifflags; } rc = cxgbe_uninit_synchronized(vi); } end_synchronized_op(sc, can_sleep ? 
0 : LOCK_HELD); break; case SIOCADDMULTI: case SIOCDELMULTI: /* these two are called with a mutex held :-( */ rc = begin_synchronized_op(sc, vi, HOLD_LOCK, "t4multi"); if (rc) return (rc); if (ifp->if_drv_flags & IFF_DRV_RUNNING) rc = update_mac_settings(ifp, XGMAC_MCADDRS); end_synchronized_op(sc, LOCK_HELD); break; case SIOCSIFCAP: rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4cap"); if (rc) return (rc); mask = ifr->ifr_reqcap ^ ifp->if_capenable; if (mask & IFCAP_TXCSUM) { ifp->if_capenable ^= IFCAP_TXCSUM; ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP); if (IFCAP_TSO4 & ifp->if_capenable && !(IFCAP_TXCSUM & ifp->if_capenable)) { ifp->if_capenable &= ~IFCAP_TSO4; if_printf(ifp, "tso4 disabled due to -txcsum.\n"); } } if (mask & IFCAP_TXCSUM_IPV6) { ifp->if_capenable ^= IFCAP_TXCSUM_IPV6; ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6); if (IFCAP_TSO6 & ifp->if_capenable && !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) { ifp->if_capenable &= ~IFCAP_TSO6; if_printf(ifp, "tso6 disabled due to -txcsum6.\n"); } } if (mask & IFCAP_RXCSUM) ifp->if_capenable ^= IFCAP_RXCSUM; if (mask & IFCAP_RXCSUM_IPV6) ifp->if_capenable ^= IFCAP_RXCSUM_IPV6; /* * Note that we leave CSUM_TSO alone (it is always set). The * kernel takes both IFCAP_TSOx and CSUM_TSO into account before * sending a TSO request our way, so it's sufficient to toggle * IFCAP_TSOx only. */ if (mask & IFCAP_TSO4) { if (!(IFCAP_TSO4 & ifp->if_capenable) && !(IFCAP_TXCSUM & ifp->if_capenable)) { if_printf(ifp, "enable txcsum first.\n"); rc = EAGAIN; goto fail; } ifp->if_capenable ^= IFCAP_TSO4; } if (mask & IFCAP_TSO6) { if (!(IFCAP_TSO6 & ifp->if_capenable) && !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) { if_printf(ifp, "enable txcsum6 first.\n"); rc = EAGAIN; goto fail; } ifp->if_capenable ^= IFCAP_TSO6; } if (mask & IFCAP_LRO) { #if defined(INET) || defined(INET6) int i; struct sge_rxq *rxq; ifp->if_capenable ^= IFCAP_LRO; for_each_rxq(vi, i, rxq) { if (ifp->if_capenable & IFCAP_LRO) rxq->iq.flags |= IQ_LRO_ENABLED; else rxq->iq.flags &= ~IQ_LRO_ENABLED; } #endif } #ifdef TCP_OFFLOAD if (mask & IFCAP_TOE) { int enable = (ifp->if_capenable ^ mask) & IFCAP_TOE; rc = toe_capability(vi, enable); if (rc != 0) goto fail; ifp->if_capenable ^= mask; } #endif if (mask & IFCAP_VLAN_HWTAGGING) { ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING; if (ifp->if_drv_flags & IFF_DRV_RUNNING) rc = update_mac_settings(ifp, XGMAC_VLANEX); } if (mask & IFCAP_VLAN_MTU) { ifp->if_capenable ^= IFCAP_VLAN_MTU; /* Need to find out how to disable auto-mtu-inflation */ } if (mask & IFCAP_VLAN_HWTSO) ifp->if_capenable ^= IFCAP_VLAN_HWTSO; if (mask & IFCAP_VLAN_HWCSUM) ifp->if_capenable ^= IFCAP_VLAN_HWCSUM; #ifdef VLAN_CAPABILITIES VLAN_CAPABILITIES(ifp); #endif fail: end_synchronized_op(sc, 0); break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: ifmedia_ioctl(ifp, ifr, &vi->media, cmd); break; case SIOCGI2C: { struct ifi2creq i2c; rc = copyin(ifr->ifr_data, &i2c, sizeof(i2c)); if (rc != 0) break; if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) { rc = EPERM; break; } if (i2c.len > sizeof(i2c.data)) { rc = EINVAL; break; } rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4i2c"); if (rc) return (rc); rc = -t4_i2c_rd(sc, sc->mbox, vi->pi->port_id, i2c.dev_addr, i2c.offset, i2c.len, &i2c.data[0]); end_synchronized_op(sc, 0); if (rc == 0) rc = copyout(&i2c, ifr->ifr_data, sizeof(i2c)); break; } default: rc = ether_ioctl(ifp, cmd, data); } return (rc); } static int cxgbe_transmit(struct ifnet *ifp, struct mbuf *m) { struct vi_info *vi = ifp->if_softc; struct 
port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct sge_txq *txq; void *items[1]; int rc; M_ASSERTPKTHDR(m); MPASS(m->m_nextpkt == NULL); /* not quite ready for this yet */ if (__predict_false(pi->link_cfg.link_ok == 0)) { m_freem(m); return (ENETDOWN); } rc = parse_pkt(sc, &m); if (__predict_false(rc != 0)) { MPASS(m == NULL); /* was freed already */ atomic_add_int(&pi->tx_parse_error, 1); /* rare, atomic is ok */ return (rc); } /* Select a txq. */ txq = &sc->sge.txq[vi->first_txq]; if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) txq += ((m->m_pkthdr.flowid % (vi->ntxq - vi->rsrv_noflowq)) + vi->rsrv_noflowq); items[0] = m; rc = mp_ring_enqueue(txq->r, items, 1, 4096); if (__predict_false(rc != 0)) m_freem(m); return (rc); } static void cxgbe_qflush(struct ifnet *ifp) { struct vi_info *vi = ifp->if_softc; struct sge_txq *txq; int i; /* queues do not exist if !VI_INIT_DONE. */ if (vi->flags & VI_INIT_DONE) { for_each_txq(vi, i, txq) { TXQ_LOCK(txq); txq->eq.flags &= ~EQ_ENABLED; TXQ_UNLOCK(txq); while (!mp_ring_is_idle(txq->r)) { mp_ring_check_drainage(txq->r, 0); pause("qflush", 1); } } } if_qflush(ifp); } static uint64_t vi_get_counter(struct ifnet *ifp, ift_counter c) { struct vi_info *vi = ifp->if_softc; struct fw_vi_stats_vf *s = &vi->stats; vi_refresh_stats(vi->pi->adapter, vi); switch (c) { case IFCOUNTER_IPACKETS: return (s->rx_bcast_frames + s->rx_mcast_frames + s->rx_ucast_frames); case IFCOUNTER_IERRORS: return (s->rx_err_frames); case IFCOUNTER_OPACKETS: return (s->tx_bcast_frames + s->tx_mcast_frames + s->tx_ucast_frames + s->tx_offload_frames); case IFCOUNTER_OERRORS: return (s->tx_drop_frames); case IFCOUNTER_IBYTES: return (s->rx_bcast_bytes + s->rx_mcast_bytes + s->rx_ucast_bytes); case IFCOUNTER_OBYTES: return (s->tx_bcast_bytes + s->tx_mcast_bytes + s->tx_ucast_bytes + s->tx_offload_bytes); case IFCOUNTER_IMCASTS: return (s->rx_mcast_frames); case IFCOUNTER_OMCASTS: return (s->tx_mcast_frames); case IFCOUNTER_OQDROPS: { uint64_t drops; drops = 0; if (vi->flags & VI_INIT_DONE) { int i; struct sge_txq *txq; for_each_txq(vi, i, txq) drops += counter_u64_fetch(txq->r->drops); } return (drops); } default: return (if_get_counter_default(ifp, c)); } } uint64_t cxgbe_get_counter(struct ifnet *ifp, ift_counter c) { struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct port_stats *s = &pi->stats; if (pi->nvi > 1 || sc->flags & IS_VF) return (vi_get_counter(ifp, c)); cxgbe_refresh_stats(sc, pi); switch (c) { case IFCOUNTER_IPACKETS: return (s->rx_frames); case IFCOUNTER_IERRORS: return (s->rx_jabber + s->rx_runt + s->rx_too_long + s->rx_fcs_err + s->rx_len_err); case IFCOUNTER_OPACKETS: return (s->tx_frames); case IFCOUNTER_OERRORS: return (s->tx_error_frames); case IFCOUNTER_IBYTES: return (s->rx_octets); case IFCOUNTER_OBYTES: return (s->tx_octets); case IFCOUNTER_IMCASTS: return (s->rx_mcast_frames); case IFCOUNTER_OMCASTS: return (s->tx_mcast_frames); case IFCOUNTER_IQDROPS: return (s->rx_ovflow0 + s->rx_ovflow1 + s->rx_ovflow2 + s->rx_ovflow3 + s->rx_trunc0 + s->rx_trunc1 + s->rx_trunc2 + s->rx_trunc3 + pi->tnl_cong_drops); case IFCOUNTER_OQDROPS: { uint64_t drops; drops = s->tx_drop; if (vi->flags & VI_INIT_DONE) { int i; struct sge_txq *txq; for_each_txq(vi, i, txq) drops += counter_u64_fetch(txq->r->drops); } return (drops); } default: return (if_get_counter_default(ifp, c)); } } static int cxgbe_media_change(struct ifnet *ifp) { struct vi_info *vi = ifp->if_softc; device_printf(vi->dev, "%s unimplemented.\n", 
__func__); return (EOPNOTSUPP); } static void cxgbe_media_status(struct ifnet *ifp, struct ifmediareq *ifmr) { struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct ifmedia_entry *cur; int speed = pi->link_cfg.speed; cur = vi->media.ifm_cur; ifmr->ifm_status = IFM_AVALID; if (!pi->link_cfg.link_ok) return; ifmr->ifm_status |= IFM_ACTIVE; /* active and current will differ iff current media is autoselect. */ if (IFM_SUBTYPE(cur->ifm_media) != IFM_AUTO) return; ifmr->ifm_active = IFM_ETHER | IFM_FDX; if (speed == 10000) ifmr->ifm_active |= IFM_10G_T; else if (speed == 1000) ifmr->ifm_active |= IFM_1000_T; else if (speed == 100) ifmr->ifm_active |= IFM_100_TX; else if (speed == 10) ifmr->ifm_active |= IFM_10_T; else KASSERT(0, ("%s: link up but speed unknown (%u)", __func__, speed)); } static int vcxgbe_probe(device_t dev) { char buf[128]; struct vi_info *vi = device_get_softc(dev); snprintf(buf, sizeof(buf), "port %d vi %td", vi->pi->port_id, vi - vi->pi->vi); device_set_desc_copy(dev, buf); return (BUS_PROBE_DEFAULT); } static int vcxgbe_attach(device_t dev) { struct vi_info *vi; struct port_info *pi; struct adapter *sc; int func, index, rc; u32 param, val; vi = device_get_softc(dev); pi = vi->pi; sc = pi->adapter; index = vi - pi->vi; KASSERT(index < nitems(vi_mac_funcs), ("%s: VI %s doesn't have a MAC func", __func__, device_get_nameunit(dev))); func = vi_mac_funcs[index]; rc = t4_alloc_vi_func(sc, sc->mbox, pi->tx_chan, sc->pf, 0, 1, vi->hw_addr, &vi->rss_size, func, 0); if (rc < 0) { device_printf(dev, "Failed to allocate virtual interface " "for port %d: %d\n", pi->port_id, -rc); return (-rc); } vi->viid = rc; param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) | V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_RSSINFO) | V_FW_PARAMS_PARAM_YZ(vi->viid); rc = t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val); if (rc) vi->rss_base = 0xffff; else { /* MPASS((val >> 16) == rss_size); */ vi->rss_base = val & 0xffff; } rc = cxgbe_vi_attach(dev, vi); if (rc) { t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid); return (rc); } return (0); } static int vcxgbe_detach(device_t dev) { struct vi_info *vi; struct adapter *sc; vi = device_get_softc(dev); sc = vi->pi->adapter; doom_vi(sc, vi); cxgbe_vi_detach(vi); t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid); end_synchronized_op(sc, 0); return (0); } void t4_fatal_err(struct adapter *sc) { t4_set_reg_field(sc, A_SGE_CONTROL, F_GLOBALENABLE, 0); t4_intr_disable(sc); log(LOG_EMERG, "%s: encountered fatal error, adapter stopped.\n", device_get_nameunit(sc->dev)); } void t4_add_adapter(struct adapter *sc) { sx_xlock(&t4_list_lock); SLIST_INSERT_HEAD(&t4_list, sc, link); sx_xunlock(&t4_list_lock); } int t4_map_bars_0_and_4(struct adapter *sc) { sc->regs_rid = PCIR_BAR(0); sc->regs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY, &sc->regs_rid, RF_ACTIVE); if (sc->regs_res == NULL) { device_printf(sc->dev, "cannot map registers.\n"); return (ENXIO); } sc->bt = rman_get_bustag(sc->regs_res); sc->bh = rman_get_bushandle(sc->regs_res); sc->mmio_len = rman_get_size(sc->regs_res); setbit(&sc->doorbells, DOORBELL_KDB); sc->msix_rid = PCIR_BAR(4); sc->msix_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY, &sc->msix_rid, RF_ACTIVE); if (sc->msix_res == NULL) { device_printf(sc->dev, "cannot map MSI-X BAR.\n"); return (ENXIO); } return (0); } int t4_map_bar_2(struct adapter *sc) { /* * T4: only iWARP driver uses the userspace doorbells. There is no need * to map it if RDMA is disabled.
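 *
 * T5 and later map it unconditionally; there the kernel can use
 * these doorbells too (see the write combining setup further down
 * in this function).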
*/ if (is_t4(sc) && sc->rdmacaps == 0) return (0); sc->udbs_rid = PCIR_BAR(2); sc->udbs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY, &sc->udbs_rid, RF_ACTIVE); if (sc->udbs_res == NULL) { device_printf(sc->dev, "cannot map doorbell BAR.\n"); return (ENXIO); } sc->udbs_base = rman_get_virtual(sc->udbs_res); if (is_t5(sc)) { setbit(&sc->doorbells, DOORBELL_UDB); #if defined(__i386__) || defined(__amd64__) if (t5_write_combine) { int rc; /* * Enable write combining on BAR2. This is the * userspace doorbell BAR and is split into 128B * (UDBS_SEG_SIZE) doorbell regions, each associated * with an egress queue. The first 64B has the doorbell * and the second 64B can be used to submit a tx work * request with an implicit doorbell. */ rc = pmap_change_attr((vm_offset_t)sc->udbs_base, rman_get_size(sc->udbs_res), PAT_WRITE_COMBINING); if (rc == 0) { clrbit(&sc->doorbells, DOORBELL_UDB); setbit(&sc->doorbells, DOORBELL_WCWR); setbit(&sc->doorbells, DOORBELL_UDBWC); } else { device_printf(sc->dev, "couldn't enable write combining: %d\n", rc); } t4_write_reg(sc, A_SGE_STAT_CFG, V_STATSOURCE_T5(7) | V_STATMODE(0)); } #endif } return (0); } struct memwin_init { uint32_t base; uint32_t aperture; }; static const struct memwin_init t4_memwin[NUM_MEMWIN] = { { MEMWIN0_BASE, MEMWIN0_APERTURE }, { MEMWIN1_BASE, MEMWIN1_APERTURE }, { MEMWIN2_BASE_T4, MEMWIN2_APERTURE_T4 } }; static const struct memwin_init t5_memwin[NUM_MEMWIN] = { { MEMWIN0_BASE, MEMWIN0_APERTURE }, { MEMWIN1_BASE, MEMWIN1_APERTURE }, { MEMWIN2_BASE_T5, MEMWIN2_APERTURE_T5 }, }; static void setup_memwin(struct adapter *sc) { const struct memwin_init *mw_init; struct memwin *mw; int i; uint32_t bar0; if (is_t4(sc)) { /* * Read low 32b of bar0 indirectly via the hardware backdoor * mechanism. Works from within PCI passthrough environments * too, where rman_get_start() can return a different value. We * need to program the T4 memory window decoders with the actual * addresses that will be coming across the PCIe link. */ bar0 = t4_hw_pci_read_cfg4(sc, PCIR_BAR(0)); bar0 &= (uint32_t) PCIM_BAR_MEM_BASE; mw_init = &t4_memwin[0]; } else { /* T5+ use the relative offset inside the PCIe BAR */ bar0 = 0; mw_init = &t5_memwin[0]; } for (i = 0, mw = &sc->memwin[0]; i < NUM_MEMWIN; i++, mw_init++, mw++) { rw_init(&mw->mw_lock, "memory window access"); mw->mw_base = mw_init->base; mw->mw_aperture = mw_init->aperture; mw->mw_curpos = 0; t4_write_reg(sc, PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, i), (mw->mw_base + bar0) | V_BIR(0) | V_WINDOW(ilog2(mw->mw_aperture) - 10)); rw_wlock(&mw->mw_lock); position_memwin(sc, i, 0); rw_wunlock(&mw->mw_lock); } /* flush */ t4_read_reg(sc, PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, 2)); } /* * Positions the memory window at the given address in the card's address space. * There are some alignment requirements and the actual position may be at an * address prior to the requested address. mw->mw_curpos always has the actual * position of the window. 
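 *
 * For example, a request for address 0x12345 positions a T4 window
 * (16B alignment) at mw_curpos = 0x12340 and a T5 window (128B
 * alignment) at mw_curpos = 0x12300; accesses then go to
 * mw_base + (addr - mw_curpos) within the aperture.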
*/ static void position_memwin(struct adapter *sc, int idx, uint32_t addr) { struct memwin *mw; uint32_t pf; uint32_t reg; MPASS(idx >= 0 && idx < NUM_MEMWIN); mw = &sc->memwin[idx]; rw_assert(&mw->mw_lock, RA_WLOCKED); if (is_t4(sc)) { pf = 0; mw->mw_curpos = addr & ~0xf; /* start must be 16B aligned */ } else { pf = V_PFNUM(sc->pf); mw->mw_curpos = addr & ~0x7f; /* start must be 128B aligned */ } reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, idx); t4_write_reg(sc, reg, mw->mw_curpos | pf); t4_read_reg(sc, reg); /* flush */ } static int rw_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val, int len, int rw) { struct memwin *mw; uint32_t mw_end, v; MPASS(idx >= 0 && idx < NUM_MEMWIN); /* Memory can only be accessed in naturally aligned 4 byte units */ if (addr & 3 || len & 3 || len <= 0) return (EINVAL); mw = &sc->memwin[idx]; while (len > 0) { rw_rlock(&mw->mw_lock); mw_end = mw->mw_curpos + mw->mw_aperture; if (addr >= mw_end || addr < mw->mw_curpos) { /* Will need to reposition the window */ if (!rw_try_upgrade(&mw->mw_lock)) { rw_runlock(&mw->mw_lock); rw_wlock(&mw->mw_lock); } rw_assert(&mw->mw_lock, RA_WLOCKED); position_memwin(sc, idx, addr); rw_downgrade(&mw->mw_lock); mw_end = mw->mw_curpos + mw->mw_aperture; } rw_assert(&mw->mw_lock, RA_RLOCKED); while (addr < mw_end && len > 0) { if (rw == 0) { v = t4_read_reg(sc, mw->mw_base + addr - mw->mw_curpos); *val++ = le32toh(v); } else { v = *val++; t4_write_reg(sc, mw->mw_base + addr - mw->mw_curpos, htole32(v)); } addr += 4; len -= 4; } rw_runlock(&mw->mw_lock); } return (0); } static inline int read_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val, int len) { return (rw_via_memwin(sc, idx, addr, val, len, 0)); } static inline int write_via_memwin(struct adapter *sc, int idx, uint32_t addr, const uint32_t *val, int len) { return (rw_via_memwin(sc, idx, addr, (void *)(uintptr_t)val, len, 1)); } static int t4_range_cmp(const void *a, const void *b) { return ((const struct t4_range *)a)->start - ((const struct t4_range *)b)->start; } /* * Verify that the memory range specified by the addr/len pair is valid within * the card's address space. 
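 * The check below collects the enabled memories (EDC0, EDC1, MC0, and MC1 on T5) from A_MA_TARGET_MEM_ENABLE, then sorts and merges adjacent ranges so a request spanning two contiguous memories is still accepted.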
*/ static int validate_mem_range(struct adapter *sc, uint32_t addr, int len) { struct t4_range mem_ranges[4], *r, *next; uint32_t em, addr_len; int i, n, remaining; /* Memory can only be accessed in naturally aligned 4 byte units */ if (addr & 3 || len & 3 || len <= 0) return (EINVAL); /* Enabled memories */ em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE); r = &mem_ranges[0]; n = 0; bzero(r, sizeof(mem_ranges)); if (em & F_EDRAM0_ENABLE) { addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR); r->size = G_EDRAM0_SIZE(addr_len) << 20; if (r->size > 0) { r->start = G_EDRAM0_BASE(addr_len) << 20; if (addr >= r->start && addr + len <= r->start + r->size) return (0); r++; n++; } } if (em & F_EDRAM1_ENABLE) { addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR); r->size = G_EDRAM1_SIZE(addr_len) << 20; if (r->size > 0) { r->start = G_EDRAM1_BASE(addr_len) << 20; if (addr >= r->start && addr + len <= r->start + r->size) return (0); r++; n++; } } if (em & F_EXT_MEM_ENABLE) { addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR); r->size = G_EXT_MEM_SIZE(addr_len) << 20; if (r->size > 0) { r->start = G_EXT_MEM_BASE(addr_len) << 20; if (addr >= r->start && addr + len <= r->start + r->size) return (0); r++; n++; } } if (is_t5(sc) && em & F_EXT_MEM1_ENABLE) { addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR); r->size = G_EXT_MEM1_SIZE(addr_len) << 20; if (r->size > 0) { r->start = G_EXT_MEM1_BASE(addr_len) << 20; if (addr >= r->start && addr + len <= r->start + r->size) return (0); r++; n++; } } MPASS(n <= nitems(mem_ranges)); if (n > 1) { /* Sort and merge the ranges. */ qsort(mem_ranges, n, sizeof(struct t4_range), t4_range_cmp); /* Start from index 0 and examine the next n - 1 entries. */ r = &mem_ranges[0]; for (remaining = n - 1; remaining > 0; remaining--, r++) { MPASS(r->size > 0); /* r is a valid entry. */ next = r + 1; MPASS(next->size > 0); /* and so is the next one. */ while (r->start + r->size >= next->start) { /* Merge the next one into the current entry. */ r->size = max(r->start + r->size, next->start + next->size) - r->start; n--; /* One fewer entry in total. */ if (--remaining == 0) goto done; /* short circuit */ next++; } if (next != r + 1) { /* * Some entries were merged into r and next * points to the first valid entry that couldn't * be merged. */ MPASS(next->size > 0); /* must be valid */ memcpy(r + 1, next, remaining * sizeof(*r)); #ifdef INVARIANTS /* * This is so that the foo->size assertion in the * next iteration of the loop does the right * thing for entries that were pulled up and are * no longer valid. */ MPASS(n < nitems(mem_ranges)); bzero(&mem_ranges[n], (nitems(mem_ranges) - n) * sizeof(struct t4_range)); #endif } } done: /* Done merging the ranges. */ MPASS(n > 0); r = &mem_ranges[0]; for (i = 0; i < n; i++, r++) { if (addr >= r->start && addr + len <= r->start + r->size) return (0); } } return (EFAULT); } static int fwmtype_to_hwmtype(int mtype) { switch (mtype) { case FW_MEMTYPE_EDC0: return (MEM_EDC0); case FW_MEMTYPE_EDC1: return (MEM_EDC1); case FW_MEMTYPE_EXTMEM: return (MEM_MC0); case FW_MEMTYPE_EXTMEM1: return (MEM_MC1); default: panic("%s: cannot translate fw mtype %d.", __func__, mtype); } } /* * Verify that the memory range specified by the memtype/offset/len pair is * valid and lies entirely within the memtype specified. The global address of * the start of the range is returned in addr.
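 * For example, FW_MEMTYPE_EDC1 with offset 0x1000 resolves to (G_EDRAM1_BASE(addr_len) << 20) + 0x1000, which is then checked with validate_mem_range().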
*/ static int validate_mt_off_len(struct adapter *sc, int mtype, uint32_t off, int len, uint32_t *addr) { uint32_t em, addr_len, maddr; /* Memory can only be accessed in naturally aligned 4 byte units */ if (off & 3 || len & 3 || len == 0) return (EINVAL); em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE); switch (fwmtype_to_hwmtype(mtype)) { case MEM_EDC0: if (!(em & F_EDRAM0_ENABLE)) return (EINVAL); addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR); maddr = G_EDRAM0_BASE(addr_len) << 20; break; case MEM_EDC1: if (!(em & F_EDRAM1_ENABLE)) return (EINVAL); addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR); maddr = G_EDRAM1_BASE(addr_len) << 20; break; case MEM_MC: if (!(em & F_EXT_MEM_ENABLE)) return (EINVAL); addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR); maddr = G_EXT_MEM_BASE(addr_len) << 20; break; case MEM_MC1: if (!is_t5(sc) || !(em & F_EXT_MEM1_ENABLE)) return (EINVAL); addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR); maddr = G_EXT_MEM1_BASE(addr_len) << 20; break; default: return (EINVAL); } *addr = maddr + off; /* global address */ return (validate_mem_range(sc, *addr, len)); } static int fixup_devlog_params(struct adapter *sc) { struct devlog_params *dparams = &sc->params.devlog; int rc; rc = validate_mt_off_len(sc, dparams->memtype, dparams->start, dparams->size, &dparams->addr); return (rc); } static int cfg_itype_and_nqueues(struct adapter *sc, int n10g, int n1g, int num_vis, struct intrs_and_queues *iaq) { int rc, itype, navail, nrxq10g, nrxq1g, n; int nofldrxq10g = 0, nofldrxq1g = 0; bzero(iaq, sizeof(*iaq)); iaq->ntxq10g = t4_ntxq10g; iaq->ntxq1g = t4_ntxq1g; iaq->ntxq_vi = t4_ntxq_vi; iaq->nrxq10g = nrxq10g = t4_nrxq10g; iaq->nrxq1g = nrxq1g = t4_nrxq1g; iaq->nrxq_vi = t4_nrxq_vi; iaq->rsrv_noflowq = t4_rsrv_noflowq; #ifdef TCP_OFFLOAD if (is_offload(sc)) { iaq->nofldtxq10g = t4_nofldtxq10g; iaq->nofldtxq1g = t4_nofldtxq1g; iaq->nofldtxq_vi = t4_nofldtxq_vi; iaq->nofldrxq10g = nofldrxq10g = t4_nofldrxq10g; iaq->nofldrxq1g = nofldrxq1g = t4_nofldrxq1g; iaq->nofldrxq_vi = t4_nofldrxq_vi; } #endif #ifdef DEV_NETMAP iaq->nnmtxq_vi = t4_nnmtxq_vi; iaq->nnmrxq_vi = t4_nnmrxq_vi; #endif for (itype = INTR_MSIX; itype; itype >>= 1) { if ((itype & t4_intr_types) == 0) continue; /* not allowed */ if (itype == INTR_MSIX) navail = pci_msix_count(sc->dev); else if (itype == INTR_MSI) navail = pci_msi_count(sc->dev); else navail = 1; restart: if (navail == 0) continue; iaq->intr_type = itype; iaq->intr_flags_10g = 0; iaq->intr_flags_1g = 0; /* * Best option: an interrupt vector for errors, one for the * firmware event queue, and one for every rxq (NIC and TOE) of * every VI. The VIs that support netmap use the same * interrupts for the NIC rx queues and the netmap rx queues * because only one set of queues is active at a time. */ iaq->nirq = T4_EXTRA_INTR; iaq->nirq += n10g * (nrxq10g + nofldrxq10g); iaq->nirq += n1g * (nrxq1g + nofldrxq1g); iaq->nirq += (n10g + n1g) * (num_vis - 1) * max(iaq->nrxq_vi, iaq->nnmrxq_vi); /* See comment above. 
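 * (The TOE rx queues of the extra VIs get their own vectors too; the NIC and netmap rxqs were already counted together above.)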
*/ iaq->nirq += (n10g + n1g) * (num_vis - 1) * iaq->nofldrxq_vi; if (iaq->nirq <= navail && (itype != INTR_MSI || powerof2(iaq->nirq))) { iaq->intr_flags_10g = INTR_ALL; iaq->intr_flags_1g = INTR_ALL; goto allocate; } /* Disable the VIs (and netmap) if there aren't enough intrs */ if (num_vis > 1) { device_printf(sc->dev, "virtual interfaces disabled " "because num_vis=%u with current settings " "(nrxq10g=%u, nrxq1g=%u, nofldrxq10g=%u, " "nofldrxq1g=%u, nrxq_vi=%u nofldrxq_vi=%u, " "nnmrxq_vi=%u) would need %u interrupts but " "only %u are available.\n", num_vis, nrxq10g, nrxq1g, nofldrxq10g, nofldrxq1g, iaq->nrxq_vi, iaq->nofldrxq_vi, iaq->nnmrxq_vi, iaq->nirq, navail); num_vis = 1; iaq->ntxq_vi = iaq->nrxq_vi = 0; iaq->nofldtxq_vi = iaq->nofldrxq_vi = 0; iaq->nnmtxq_vi = iaq->nnmrxq_vi = 0; goto restart; } /* * Second best option: a vector for errors, one for the firmware * event queue, and vectors for either all the NIC rx queues or * all the TOE rx queues. The queues that don't get vectors * will forward their interrupts to those that do. */ iaq->nirq = T4_EXTRA_INTR; if (nrxq10g >= nofldrxq10g) { iaq->intr_flags_10g = INTR_RXQ; iaq->nirq += n10g * nrxq10g; } else { iaq->intr_flags_10g = INTR_OFLD_RXQ; iaq->nirq += n10g * nofldrxq10g; } if (nrxq1g >= nofldrxq1g) { iaq->intr_flags_1g = INTR_RXQ; iaq->nirq += n1g * nrxq1g; } else { iaq->intr_flags_1g = INTR_OFLD_RXQ; iaq->nirq += n1g * nofldrxq1g; } if (iaq->nirq <= navail && (itype != INTR_MSI || powerof2(iaq->nirq))) goto allocate; /* * Next best option: an interrupt vector for errors, one for the * firmware event queue, and at least one per main-VI. At this * point we know we'll have to downsize nrxq and/or nofldrxq to * fit what's available to us. */ iaq->nirq = T4_EXTRA_INTR; iaq->nirq += n10g + n1g; if (iaq->nirq <= navail) { int leftover = navail - iaq->nirq; if (n10g > 0) { int target = max(nrxq10g, nofldrxq10g); iaq->intr_flags_10g = nrxq10g >= nofldrxq10g ? INTR_RXQ : INTR_OFLD_RXQ; n = 1; while (n < target && leftover >= n10g) { leftover -= n10g; iaq->nirq += n10g; n++; } iaq->nrxq10g = min(n, nrxq10g); #ifdef TCP_OFFLOAD iaq->nofldrxq10g = min(n, nofldrxq10g); #endif } if (n1g > 0) { int target = max(nrxq1g, nofldrxq1g); iaq->intr_flags_1g = nrxq1g >= nofldrxq1g ? INTR_RXQ : INTR_OFLD_RXQ; n = 1; while (n < target && leftover >= n1g) { leftover -= n1g; iaq->nirq += n1g; n++; } iaq->nrxq1g = min(n, nrxq1g); #ifdef TCP_OFFLOAD iaq->nofldrxq1g = min(n, nofldrxq1g); #endif } if (itype != INTR_MSI || powerof2(iaq->nirq)) goto allocate; } /* * Least desirable option: one interrupt vector for everything. */ iaq->nirq = iaq->nrxq10g = iaq->nrxq1g = 1; iaq->intr_flags_10g = iaq->intr_flags_1g = 0; #ifdef TCP_OFFLOAD if (is_offload(sc)) iaq->nofldrxq10g = iaq->nofldrxq1g = 1; #endif allocate: navail = iaq->nirq; rc = 0; if (itype == INTR_MSIX) rc = pci_alloc_msix(sc->dev, &navail); else if (itype == INTR_MSI) rc = pci_alloc_msi(sc->dev, &navail); if (rc == 0) { if (navail == iaq->nirq) return (0); /* * Didn't get the number requested. Use whatever number * the kernel is willing to allocate (it's in navail). */ device_printf(sc->dev, "fewer vectors than requested, " "type=%d, req=%d, rcvd=%d; will downshift req.\n", itype, iaq->nirq, navail); pci_release_msi(sc->dev); goto restart; } device_printf(sc->dev, "failed to allocate vectors:%d, type=%d, req=%d, rcvd=%d\n", itype, rc, iaq->nirq, navail); } device_printf(sc->dev, "failed to find a usable interrupt type. 
" "allowed=%d, msi-x=%d, msi=%d, intx=1", t4_intr_types, pci_msix_count(sc->dev), pci_msi_count(sc->dev)); return (ENXIO); } #define FW_VERSION(chip) ( \ V_FW_HDR_FW_VER_MAJOR(chip##FW_VERSION_MAJOR) | \ V_FW_HDR_FW_VER_MINOR(chip##FW_VERSION_MINOR) | \ V_FW_HDR_FW_VER_MICRO(chip##FW_VERSION_MICRO) | \ V_FW_HDR_FW_VER_BUILD(chip##FW_VERSION_BUILD)) #define FW_INTFVER(chip, intf) (chip##FW_HDR_INTFVER_##intf) struct fw_info { uint8_t chip; char *kld_name; char *fw_mod_name; struct fw_hdr fw_hdr; /* XXX: waste of space, need a sparse struct */ } fw_info[] = { { .chip = CHELSIO_T4, .kld_name = "t4fw_cfg", .fw_mod_name = "t4fw", .fw_hdr = { .chip = FW_HDR_CHIP_T4, .fw_ver = htobe32_const(FW_VERSION(T4)), .intfver_nic = FW_INTFVER(T4, NIC), .intfver_vnic = FW_INTFVER(T4, VNIC), .intfver_ofld = FW_INTFVER(T4, OFLD), .intfver_ri = FW_INTFVER(T4, RI), .intfver_iscsipdu = FW_INTFVER(T4, ISCSIPDU), .intfver_iscsi = FW_INTFVER(T4, ISCSI), .intfver_fcoepdu = FW_INTFVER(T4, FCOEPDU), .intfver_fcoe = FW_INTFVER(T4, FCOE), }, }, { .chip = CHELSIO_T5, .kld_name = "t5fw_cfg", .fw_mod_name = "t5fw", .fw_hdr = { .chip = FW_HDR_CHIP_T5, .fw_ver = htobe32_const(FW_VERSION(T5)), .intfver_nic = FW_INTFVER(T5, NIC), .intfver_vnic = FW_INTFVER(T5, VNIC), .intfver_ofld = FW_INTFVER(T5, OFLD), .intfver_ri = FW_INTFVER(T5, RI), .intfver_iscsipdu = FW_INTFVER(T5, ISCSIPDU), .intfver_iscsi = FW_INTFVER(T5, ISCSI), .intfver_fcoepdu = FW_INTFVER(T5, FCOEPDU), .intfver_fcoe = FW_INTFVER(T5, FCOE), }, } }; static struct fw_info * find_fw_info(int chip) { int i; for (i = 0; i < nitems(fw_info); i++) { if (fw_info[i].chip == chip) return (&fw_info[i]); } return (NULL); } /* * Is the given firmware API compatible with the one the driver was compiled * with? */ static int fw_compatible(const struct fw_hdr *hdr1, const struct fw_hdr *hdr2) { /* short circuit if it's the exact same firmware version */ if (hdr1->chip == hdr2->chip && hdr1->fw_ver == hdr2->fw_ver) return (1); /* * XXX: Is this too conservative? Perhaps I should limit this to the * features that are supported in the driver. */ #define SAME_INTF(x) (hdr1->intfver_##x == hdr2->intfver_##x) if (hdr1->chip == hdr2->chip && SAME_INTF(nic) && SAME_INTF(vnic) && SAME_INTF(ofld) && SAME_INTF(ri) && SAME_INTF(iscsipdu) && SAME_INTF(iscsi) && SAME_INTF(fcoepdu) && SAME_INTF(fcoe)) return (1); #undef SAME_INTF return (0); } /* * The firmware in the KLD is usable, but should it be installed? This routine * explains itself in detail if it indicates the KLD firmware should be * installed. 
*/ static int should_install_kld_fw(struct adapter *sc, int card_fw_usable, int k, int c) { const char *reason; if (!card_fw_usable) { reason = "incompatible or unusable"; goto install; } if (k > c) { reason = "older than the version bundled with this driver"; goto install; } if (t4_fw_install == 2 && k != c) { reason = "different than the version bundled with this driver"; goto install; } return (0); install: if (t4_fw_install == 0) { device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, " "but the driver is prohibited from installing a different " "firmware on the card.\n", G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c), G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason); return (0); } device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, " "installing firmware %u.%u.%u.%u on card.\n", G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c), G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason, G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k), G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k)); return (1); } /* * Establish contact with the firmware and determine if we are the master driver * or not, and whether we are responsible for chip initialization. */ static int prep_firmware(struct adapter *sc) { const struct firmware *fw = NULL, *default_cfg; int rc, pf, card_fw_usable, kld_fw_usable, need_fw_reset = 1; enum dev_state state; struct fw_info *fw_info; struct fw_hdr *card_fw; /* fw on the card */ const struct fw_hdr *kld_fw; /* fw in the KLD */ const struct fw_hdr *drv_fw; /* fw header the driver was compiled against */ /* Contact firmware. */ rc = t4_fw_hello(sc, sc->mbox, sc->mbox, MASTER_MAY, &state); if (rc < 0 || state == DEV_STATE_ERR) { rc = -rc; device_printf(sc->dev, "failed to connect to the firmware: %d, %d.\n", rc, state); return (rc); } pf = rc; if (pf == sc->mbox) sc->flags |= MASTER_PF; else if (state == DEV_STATE_UNINIT) { /* * We didn't get to be the master so we definitely won't be * configuring the chip. It's a bug if someone else hasn't * configured it already. */ device_printf(sc->dev, "couldn't be master(%d), " "device not already initialized either(%d).\n", rc, state); return (EDOOFUS); } /* This is the firmware whose headers the driver was compiled against */ fw_info = find_fw_info(chip_id(sc)); if (fw_info == NULL) { device_printf(sc->dev, "unable to look up firmware information for chip %d.\n", chip_id(sc)); return (EINVAL); } drv_fw = &fw_info->fw_hdr; /* * The firmware KLD contains many modules. The KLD name is also the * name of the module that contains the default config file. */ default_cfg = firmware_get(fw_info->kld_name); /* Read the header of the firmware on the card */ card_fw = malloc(sizeof(*card_fw), M_CXGBE, M_ZERO | M_WAITOK); rc = -t4_read_flash(sc, FLASH_FW_START, sizeof (*card_fw) / sizeof (uint32_t), (uint32_t *)card_fw, 1); if (rc == 0) card_fw_usable = fw_compatible(drv_fw, (const void*)card_fw); else { device_printf(sc->dev, "Unable to read card's firmware header: %d\n", rc); card_fw_usable = 0; } /* This is the firmware in the KLD */ fw = firmware_get(fw_info->fw_mod_name); if (fw != NULL) { kld_fw = (const void *)fw->data; kld_fw_usable = fw_compatible(drv_fw, kld_fw); } else { kld_fw = NULL; kld_fw_usable = 0; } if (card_fw_usable && card_fw->fw_ver == drv_fw->fw_ver && (!kld_fw_usable || kld_fw->fw_ver == drv_fw->fw_ver)) { /* * Common case: the firmware on the card is an exact match and * the KLD is an exact match too, or the KLD is * absent/incompatible. 
Note that t4_fw_install = 2 is ignored * here -- use cxgbetool loadfw if you want to reinstall the * same firmware as the one on the card. */ } else if (kld_fw_usable && state == DEV_STATE_UNINIT && should_install_kld_fw(sc, card_fw_usable, be32toh(kld_fw->fw_ver), be32toh(card_fw->fw_ver))) { rc = -t4_fw_upgrade(sc, sc->mbox, fw->data, fw->datasize, 0); if (rc != 0) { device_printf(sc->dev, "failed to install firmware: %d\n", rc); goto done; } /* Installed successfully, update the cached header too. */ memcpy(card_fw, kld_fw, sizeof(*card_fw)); card_fw_usable = 1; need_fw_reset = 0; /* already reset as part of load_fw */ } if (!card_fw_usable) { uint32_t d, c, k; d = ntohl(drv_fw->fw_ver); c = ntohl(card_fw->fw_ver); k = kld_fw ? ntohl(kld_fw->fw_ver) : 0; device_printf(sc->dev, "Cannot find a usable firmware: " "fw_install %d, chip state %d, " "driver compiled with %d.%d.%d.%d, " "card has %d.%d.%d.%d, KLD has %d.%d.%d.%d\n", t4_fw_install, state, G_FW_HDR_FW_VER_MAJOR(d), G_FW_HDR_FW_VER_MINOR(d), G_FW_HDR_FW_VER_MICRO(d), G_FW_HDR_FW_VER_BUILD(d), G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c), G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k), G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k)); rc = EINVAL; goto done; } /* Reset device */ if (need_fw_reset && (rc = -t4_fw_reset(sc, sc->mbox, F_PIORSTMODE | F_PIORST)) != 0) { device_printf(sc->dev, "firmware reset failed: %d.\n", rc); if (rc != ETIMEDOUT && rc != EIO) t4_fw_bye(sc, sc->mbox); goto done; } sc->flags |= FW_OK; rc = get_params__pre_init(sc); if (rc != 0) goto done; /* error message displayed already */ /* Partition adapter resources as specified in the config file. */ if (state == DEV_STATE_UNINIT) { KASSERT(sc->flags & MASTER_PF, ("%s: trying to change chip settings when not master.", __func__)); rc = partition_resources(sc, default_cfg, fw_info->kld_name); if (rc != 0) goto done; /* error message displayed already */ t4_tweak_chip_settings(sc); /* get basic stuff going */ rc = -t4_fw_initialize(sc, sc->mbox); if (rc != 0) { device_printf(sc->dev, "fw init failed: %d.\n", rc); goto done; } } else { snprintf(sc->cfg_file, sizeof(sc->cfg_file), "pf%d", pf); sc->cfcsum = 0; } done: free(card_fw, M_CXGBE); if (fw != NULL) firmware_put(fw, FIRMWARE_UNLOAD); if (default_cfg != NULL) firmware_put(default_cfg, FIRMWARE_UNLOAD); return (rc); } #define FW_PARAM_DEV(param) \ (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) | \ V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_##param)) #define FW_PARAM_PFVF(param) \ (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_PFVF) | \ V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_PFVF_##param)) /* * Partition chip resources for use between various PFs, VFs, etc. */ static int partition_resources(struct adapter *sc, const struct firmware *default_cfg, const char *name_prefix) { const struct firmware *cfg = NULL; int rc = 0; struct fw_caps_config_cmd caps; uint32_t mtype, moff, finicsum, cfcsum; /* * Figure out what configuration file to use. Pick the default config * file for the card if the user hasn't specified one explicitly. */ snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", t4_cfg_file); if (strncmp(t4_cfg_file, DEFAULT_CF, sizeof(t4_cfg_file)) == 0) { /* Card specific overrides go here. */ if (pci_get_device(sc->dev) == 0x440a) snprintf(sc->cfg_file, sizeof(sc->cfg_file), UWIRE_CF); if (is_fpga(sc)) snprintf(sc->cfg_file, sizeof(sc->cfg_file), FPGA_CF); } /* * We need to load another module if the profile is anything except * "default" or "flash". 
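 * e.g. with kld_name "t4fw_cfg" and a profile "uwire" the module loaded below is "t4fw_cfg_uwire".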
*/ if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) != 0 && strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) { char s[32]; snprintf(s, sizeof(s), "%s_%s", name_prefix, sc->cfg_file); cfg = firmware_get(s); if (cfg == NULL) { if (default_cfg != NULL) { device_printf(sc->dev, "unable to load module \"%s\" for " "configuration profile \"%s\", will use " "the default config file instead.\n", s, sc->cfg_file); snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", DEFAULT_CF); } else { device_printf(sc->dev, "unable to load module \"%s\" for " "configuration profile \"%s\", will use " "the config file on the card's flash " "instead.\n", s, sc->cfg_file); snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", FLASH_CF); } } } if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) == 0 && default_cfg == NULL) { device_printf(sc->dev, "default config file not available, will use the config " "file on the card's flash instead.\n"); snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", FLASH_CF); } if (strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) { u_int cflen; const uint32_t *cfdata; uint32_t param, val, addr; KASSERT(cfg != NULL || default_cfg != NULL, ("%s: no config to upload", __func__)); /* * Ask the firmware where it wants us to upload the config file. */ param = FW_PARAM_DEV(CF); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val); if (rc != 0) { /* No support for config file? Shouldn't happen. */ device_printf(sc->dev, "failed to query config file location: %d.\n", rc); goto done; } mtype = G_FW_PARAMS_PARAM_Y(val); moff = G_FW_PARAMS_PARAM_Z(val) << 16; /* * XXX: sheer laziness. We deliberately added 4 bytes of * useless stuffing/comments at the end of the config file so * it's ok to simply throw away the last remaining bytes when * the config file is not an exact multiple of 4. This also * helps with the validate_mt_off_len check. */ if (cfg != NULL) { cflen = cfg->datasize & ~3; cfdata = cfg->data; } else { cflen = default_cfg->datasize & ~3; cfdata = default_cfg->data; } if (cflen > FLASH_CFG_MAX_SIZE) { device_printf(sc->dev, "config file too long (%d, max allowed is %d). " "Will try to use the config on the card, if any.\n", cflen, FLASH_CFG_MAX_SIZE); goto use_config_on_flash; } rc = validate_mt_off_len(sc, mtype, moff, cflen, &addr); if (rc != 0) { device_printf(sc->dev, "%s: addr (%d/0x%x) or len %d is not valid: %d. " "Will try to use the config on the card, if any.\n", __func__, mtype, moff, cflen, rc); goto use_config_on_flash; } write_via_memwin(sc, 2, addr, cfdata, cflen); } else { use_config_on_flash: mtype = FW_MEMTYPE_FLASH; moff = t4_flash_cfg_addr(sc); } bzero(&caps, sizeof(caps)); caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_READ); caps.cfvalid_to_len16 = htobe32(F_FW_CAPS_CONFIG_CMD_CFVALID | V_FW_CAPS_CONFIG_CMD_MEMTYPE_CF(mtype) | V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(moff >> 16) | FW_LEN16(caps)); rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps); if (rc != 0) { device_printf(sc->dev, "failed to pre-process config file: %d " "(mtype %d, moff 0x%x).\n", rc, mtype, moff); goto done; } finicsum = be32toh(caps.finicsum); cfcsum = be32toh(caps.cfcsum); if (finicsum != cfcsum) { device_printf(sc->dev, "WARNING: config file checksum mismatch: %08x %08x\n", finicsum, cfcsum); } sc->cfcsum = cfcsum; #define LIMIT_CAPS(x) do { \ caps.x &= htobe16(t4_##x##_allowed); \ } while (0) /* * Let the firmware know what features will (not) be used so it can tune * things accordingly.
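 * Each LIMIT_CAPS(x) below masks caps.x with the corresponding t4_<x>_allowed tunable before the capabilities are written back to the firmware.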
*/ LIMIT_CAPS(nbmcaps); LIMIT_CAPS(linkcaps); LIMIT_CAPS(switchcaps); LIMIT_CAPS(niccaps); LIMIT_CAPS(toecaps); LIMIT_CAPS(rdmacaps); LIMIT_CAPS(tlscaps); LIMIT_CAPS(iscsicaps); LIMIT_CAPS(fcoecaps); #undef LIMIT_CAPS caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE); caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps)); rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), NULL); if (rc != 0) { device_printf(sc->dev, "failed to process config file: %d.\n", rc); } done: if (cfg != NULL) firmware_put(cfg, FIRMWARE_UNLOAD); return (rc); } /* * Retrieve parameters that are needed (or nice to have) very early. */ static int get_params__pre_init(struct adapter *sc) { int rc; uint32_t param[2], val[2]; t4_get_version_info(sc); snprintf(sc->fw_version, sizeof(sc->fw_version), "%u.%u.%u.%u", G_FW_HDR_FW_VER_MAJOR(sc->params.fw_vers), G_FW_HDR_FW_VER_MINOR(sc->params.fw_vers), G_FW_HDR_FW_VER_MICRO(sc->params.fw_vers), G_FW_HDR_FW_VER_BUILD(sc->params.fw_vers)); snprintf(sc->bs_version, sizeof(sc->bs_version), "%u.%u.%u.%u", G_FW_HDR_FW_VER_MAJOR(sc->params.bs_vers), G_FW_HDR_FW_VER_MINOR(sc->params.bs_vers), G_FW_HDR_FW_VER_MICRO(sc->params.bs_vers), G_FW_HDR_FW_VER_BUILD(sc->params.bs_vers)); snprintf(sc->tp_version, sizeof(sc->tp_version), "%u.%u.%u.%u", G_FW_HDR_FW_VER_MAJOR(sc->params.tp_vers), G_FW_HDR_FW_VER_MINOR(sc->params.tp_vers), G_FW_HDR_FW_VER_MICRO(sc->params.tp_vers), G_FW_HDR_FW_VER_BUILD(sc->params.tp_vers)); snprintf(sc->er_version, sizeof(sc->er_version), "%u.%u.%u.%u", G_FW_HDR_FW_VER_MAJOR(sc->params.er_vers), G_FW_HDR_FW_VER_MINOR(sc->params.er_vers), G_FW_HDR_FW_VER_MICRO(sc->params.er_vers), G_FW_HDR_FW_VER_BUILD(sc->params.er_vers)); param[0] = FW_PARAM_DEV(PORTVEC); param[1] = FW_PARAM_DEV(CCLK); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query parameters (pre_init): %d.\n", rc); return (rc); } sc->params.portvec = val[0]; sc->params.nports = bitcount32(val[0]); sc->params.vpd.cclk = val[1]; /* Read device log parameters. */ rc = -t4_init_devlog_params(sc, 1); if (rc == 0) fixup_devlog_params(sc); else { device_printf(sc->dev, "failed to get devlog parameters: %d.\n", rc); rc = 0; /* devlog isn't critical for device operation */ } return (rc); } /* * Retrieve various parameters that are of interest to the driver. The device * has been initialized by the firmware at this point. 
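 * The queries below pull the queue and TID ranges (iq/eq start, filter and L2T regions), the negotiated capabilities, and then the ETHOFLD, TOE, RDMA, and iSCSI region layouts as each capability dictates.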
*/ static int get_params__post_init(struct adapter *sc) { int rc; uint32_t param[7], val[7]; struct fw_caps_config_cmd caps; param[0] = FW_PARAM_PFVF(IQFLINT_START); param[1] = FW_PARAM_PFVF(EQ_START); param[2] = FW_PARAM_PFVF(FILTER_START); param[3] = FW_PARAM_PFVF(FILTER_END); param[4] = FW_PARAM_PFVF(L2T_START); param[5] = FW_PARAM_PFVF(L2T_END); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query parameters (post_init): %d.\n", rc); return (rc); } sc->sge.iq_start = val[0]; sc->sge.eq_start = val[1]; sc->tids.ftid_base = val[2]; sc->tids.nftids = val[3] - val[2] + 1; sc->params.ftid_min = val[2]; sc->params.ftid_max = val[3]; sc->vres.l2t.start = val[4]; sc->vres.l2t.size = val[5] - val[4] + 1; KASSERT(sc->vres.l2t.size <= L2T_SIZE, ("%s: L2 table size (%u) larger than expected (%u)", __func__, sc->vres.l2t.size, L2T_SIZE)); /* get capabilities */ bzero(&caps, sizeof(caps)); caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_READ); caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps)); rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps); if (rc != 0) { device_printf(sc->dev, "failed to get card capabilities: %d.\n", rc); return (rc); } #define READ_CAPS(x) do { \ sc->x = htobe16(caps.x); \ } while (0) READ_CAPS(nbmcaps); READ_CAPS(linkcaps); READ_CAPS(switchcaps); READ_CAPS(niccaps); READ_CAPS(toecaps); READ_CAPS(rdmacaps); READ_CAPS(tlscaps); READ_CAPS(iscsicaps); READ_CAPS(fcoecaps); if (sc->niccaps & FW_CAPS_CONFIG_NIC_ETHOFLD) { param[0] = FW_PARAM_PFVF(ETHOFLD_START); param[1] = FW_PARAM_PFVF(ETHOFLD_END); param[2] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 3, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query NIC parameters: %d.\n", rc); return (rc); } sc->tids.etid_base = val[0]; sc->params.etid_min = val[0]; sc->tids.netids = val[1] - val[0] + 1; sc->params.netids = sc->tids.netids; sc->params.eo_wr_cred = val[2]; sc->params.ethoffload = 1; } if (sc->toecaps) { /* query offload-related parameters */ param[0] = FW_PARAM_DEV(NTID); param[1] = FW_PARAM_PFVF(SERVER_START); param[2] = FW_PARAM_PFVF(SERVER_END); param[3] = FW_PARAM_PFVF(TDDP_START); param[4] = FW_PARAM_PFVF(TDDP_END); param[5] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query TOE parameters: %d.\n", rc); return (rc); } sc->tids.ntids = val[0]; sc->tids.natids = min(sc->tids.ntids / 2, MAX_ATIDS); sc->tids.stid_base = val[1]; sc->tids.nstids = val[2] - val[1] + 1; sc->vres.ddp.start = val[3]; sc->vres.ddp.size = val[4] - val[3] + 1; sc->params.ofldq_wr_cred = val[5]; sc->params.offload = 1; } if (sc->rdmacaps) { param[0] = FW_PARAM_PFVF(STAG_START); param[1] = FW_PARAM_PFVF(STAG_END); param[2] = FW_PARAM_PFVF(RQ_START); param[3] = FW_PARAM_PFVF(RQ_END); param[4] = FW_PARAM_PFVF(PBL_START); param[5] = FW_PARAM_PFVF(PBL_END); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query RDMA parameters(1): %d.\n", rc); return (rc); } sc->vres.stag.start = val[0]; sc->vres.stag.size = val[1] - val[0] + 1; sc->vres.rq.start = val[2]; sc->vres.rq.size = val[3] - val[2] + 1; sc->vres.pbl.start = val[4]; sc->vres.pbl.size = val[5] - val[4] + 1; param[0] = FW_PARAM_PFVF(SQRQ_START); param[1] = FW_PARAM_PFVF(SQRQ_END); param[2] = FW_PARAM_PFVF(CQ_START); param[3] = FW_PARAM_PFVF(CQ_END); param[4] =
FW_PARAM_PFVF(OCQ_START); param[5] = FW_PARAM_PFVF(OCQ_END); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query RDMA parameters(2): %d.\n", rc); return (rc); } sc->vres.qp.start = val[0]; sc->vres.qp.size = val[1] - val[0] + 1; sc->vres.cq.start = val[2]; sc->vres.cq.size = val[3] - val[2] + 1; sc->vres.ocq.start = val[4]; sc->vres.ocq.size = val[5] - val[4] + 1; } if (sc->iscsicaps) { param[0] = FW_PARAM_PFVF(ISCSI_START); param[1] = FW_PARAM_PFVF(ISCSI_END); rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val); if (rc != 0) { device_printf(sc->dev, "failed to query iSCSI parameters: %d.\n", rc); return (rc); } sc->vres.iscsi.start = val[0]; sc->vres.iscsi.size = val[1] - val[0] + 1; } t4_init_sge_params(sc); /* * We've got the params we wanted to query via the firmware. Now grab * some others directly from the chip. */ rc = t4_read_chip_settings(sc); return (rc); } static int set_params__post_init(struct adapter *sc) { uint32_t param, val; /* ask for encapsulated CPLs */ param = FW_PARAM_PFVF(CPLFW4MSG_ENCAP); val = 1; (void)t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val); return (0); } #undef FW_PARAM_PFVF #undef FW_PARAM_DEV static void t4_set_desc(struct adapter *sc) { char buf[128]; struct adapter_params *p = &sc->params; snprintf(buf, sizeof(buf), "Chelsio %s", p->vpd.id); device_set_desc_copy(sc->dev, buf); } static void build_medialist(struct port_info *pi, struct ifmedia *media) { int m; PORT_LOCK(pi); ifmedia_removeall(media); m = IFM_ETHER | IFM_FDX; switch(pi->port_type) { case FW_PORT_TYPE_BT_XFI: case FW_PORT_TYPE_BT_XAUI: ifmedia_add(media, m | IFM_10G_T, 0, NULL); /* fall through */ case FW_PORT_TYPE_BT_SGMII: ifmedia_add(media, m | IFM_1000_T, 0, NULL); ifmedia_add(media, m | IFM_100_TX, 0, NULL); ifmedia_add(media, IFM_ETHER | IFM_AUTO, 0, NULL); ifmedia_set(media, IFM_ETHER | IFM_AUTO); break; case FW_PORT_TYPE_CX4: ifmedia_add(media, m | IFM_10G_CX4, 0, NULL); ifmedia_set(media, m | IFM_10G_CX4); break; case FW_PORT_TYPE_QSFP_10G: case FW_PORT_TYPE_SFP: case FW_PORT_TYPE_FIBER_XFI: case FW_PORT_TYPE_FIBER_XAUI: switch (pi->mod_type) { case FW_PORT_MOD_TYPE_LR: ifmedia_add(media, m | IFM_10G_LR, 0, NULL); ifmedia_set(media, m | IFM_10G_LR); break; case FW_PORT_MOD_TYPE_SR: ifmedia_add(media, m | IFM_10G_SR, 0, NULL); ifmedia_set(media, m | IFM_10G_SR); break; case FW_PORT_MOD_TYPE_LRM: ifmedia_add(media, m | IFM_10G_LRM, 0, NULL); ifmedia_set(media, m | IFM_10G_LRM); break; case FW_PORT_MOD_TYPE_TWINAX_PASSIVE: case FW_PORT_MOD_TYPE_TWINAX_ACTIVE: ifmedia_add(media, m | IFM_10G_TWINAX, 0, NULL); ifmedia_set(media, m | IFM_10G_TWINAX); break; case FW_PORT_MOD_TYPE_NONE: m &= ~IFM_FDX; ifmedia_add(media, m | IFM_NONE, 0, NULL); ifmedia_set(media, m | IFM_NONE); break; case FW_PORT_MOD_TYPE_NA: case FW_PORT_MOD_TYPE_ER: default: device_printf(pi->dev, "unknown port_type (%d), mod_type (%d)\n", pi->port_type, pi->mod_type); ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL); ifmedia_set(media, m | IFM_UNKNOWN); break; } break; case FW_PORT_TYPE_QSFP: switch (pi->mod_type) { case FW_PORT_MOD_TYPE_LR: ifmedia_add(media, m | IFM_40G_LR4, 0, NULL); ifmedia_set(media, m | IFM_40G_LR4); break; case FW_PORT_MOD_TYPE_SR: ifmedia_add(media, m | IFM_40G_SR4, 0, NULL); ifmedia_set(media, m | IFM_40G_SR4); break; case FW_PORT_MOD_TYPE_TWINAX_PASSIVE: case FW_PORT_MOD_TYPE_TWINAX_ACTIVE: ifmedia_add(media, m | IFM_40G_CR4, 0, NULL); ifmedia_set(media, m | IFM_40G_CR4); break; case FW_PORT_MOD_TYPE_NONE: m &=
~IFM_FDX; ifmedia_add(media, m | IFM_NONE, 0, NULL); ifmedia_set(media, m | IFM_NONE); break; default: device_printf(pi->dev, "unknown port_type (%d), mod_type (%d)\n", pi->port_type, pi->mod_type); ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL); ifmedia_set(media, m | IFM_UNKNOWN); break; } break; default: device_printf(pi->dev, "unknown port_type (%d), mod_type (%d)\n", pi->port_type, pi->mod_type); ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL); ifmedia_set(media, m | IFM_UNKNOWN); break; } PORT_UNLOCK(pi); } #define FW_MAC_EXACT_CHUNK 7 /* * Program the port's XGMAC based on parameters in ifnet. The caller also * indicates which parameters should be programmed (the rest are left alone). */ int update_mac_settings(struct ifnet *ifp, int flags) { int rc = 0; struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; int mtu = -1, promisc = -1, allmulti = -1, vlanex = -1; ASSERT_SYNCHRONIZED_OP(sc); KASSERT(flags, ("%s: not told what to update.", __func__)); if (flags & XGMAC_MTU) mtu = ifp->if_mtu; if (flags & XGMAC_PROMISC) promisc = ifp->if_flags & IFF_PROMISC ? 1 : 0; if (flags & XGMAC_ALLMULTI) allmulti = ifp->if_flags & IFF_ALLMULTI ? 1 : 0; if (flags & XGMAC_VLANEX) vlanex = ifp->if_capenable & IFCAP_VLAN_HWTAGGING ? 1 : 0; if (flags & (XGMAC_MTU|XGMAC_PROMISC|XGMAC_ALLMULTI|XGMAC_VLANEX)) { rc = -t4_set_rxmode(sc, sc->mbox, vi->viid, mtu, promisc, allmulti, 1, vlanex, false); if (rc) { if_printf(ifp, "set_rxmode (%x) failed: %d\n", flags, rc); return (rc); } } if (flags & XGMAC_UCADDR) { uint8_t ucaddr[ETHER_ADDR_LEN]; bcopy(IF_LLADDR(ifp), ucaddr, sizeof(ucaddr)); rc = t4_change_mac(sc, sc->mbox, vi->viid, vi->xact_addr_filt, ucaddr, true, true); if (rc < 0) { rc = -rc; if_printf(ifp, "change_mac failed: %d\n", rc); return (rc); } else { vi->xact_addr_filt = rc; rc = 0; } } if (flags & XGMAC_MCADDRS) { const uint8_t *mcaddr[FW_MAC_EXACT_CHUNK]; int del = 1; uint64_t hash = 0; struct ifmultiaddr *ifma; int i = 0, j; if_maddr_rlock(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_LINK) continue; mcaddr[i] = LLADDR((struct sockaddr_dl *)ifma->ifma_addr); MPASS(ETHER_IS_MULTICAST(mcaddr[i])); i++; if (i == FW_MAC_EXACT_CHUNK) { rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid, del, i, mcaddr, NULL, &hash, 0); if (rc < 0) { rc = -rc; for (j = 0; j < i; j++) { if_printf(ifp, "failed to add mc address" " %02x:%02x:%02x:" "%02x:%02x:%02x rc=%d\n", mcaddr[j][0], mcaddr[j][1], mcaddr[j][2], mcaddr[j][3], mcaddr[j][4], mcaddr[j][5], rc); } goto mcfail; } del = 0; i = 0; } } if (i > 0) { rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid, del, i, mcaddr, NULL, &hash, 0); if (rc < 0) { rc = -rc; for (j = 0; j < i; j++) { if_printf(ifp, "failed to add mc address" " %02x:%02x:%02x:" "%02x:%02x:%02x rc=%d\n", mcaddr[j][0], mcaddr[j][1], mcaddr[j][2], mcaddr[j][3], mcaddr[j][4], mcaddr[j][5], rc); } goto mcfail; } } rc = -t4_set_addr_hash(sc, sc->mbox, vi->viid, 0, hash, 0); if (rc != 0) if_printf(ifp, "failed to set mc address hash: %d", rc); mcfail: if_maddr_runlock(ifp); } return (rc); } /* * {begin|end}_synchronized_op must be called from the same thread. */ int begin_synchronized_op(struct adapter *sc, struct vi_info *vi, int flags, char *wmesg) { int rc, pri; #ifdef WITNESS /* the caller thinks it's ok to sleep, but is it really? 
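 * WITNESS_WARN() will complain at run time if this is reached with a non-sleepable lock held.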
*/ if (flags & SLEEP_OK) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "begin_synchronized_op"); #endif if (INTR_OK) pri = PCATCH; else pri = 0; ADAPTER_LOCK(sc); for (;;) { if (vi && IS_DOOMED(vi)) { rc = ENXIO; goto done; } if (!IS_BUSY(sc)) { rc = 0; break; } if (!(flags & SLEEP_OK)) { rc = EBUSY; goto done; } if (mtx_sleep(&sc->flags, &sc->sc_lock, pri, wmesg, 0)) { rc = EINTR; goto done; } } KASSERT(!IS_BUSY(sc), ("%s: controller busy.", __func__)); SET_BUSY(sc); #ifdef INVARIANTS sc->last_op = wmesg; sc->last_op_thr = curthread; sc->last_op_flags = flags; #endif done: if (!(flags & HOLD_LOCK) || rc) ADAPTER_UNLOCK(sc); return (rc); } /* * Tell if_ioctl and if_init that the VI is going away. This is * special variant of begin_synchronized_op and must be paired with a * call to end_synchronized_op. */ void doom_vi(struct adapter *sc, struct vi_info *vi) { ADAPTER_LOCK(sc); SET_DOOMED(vi); wakeup(&sc->flags); while (IS_BUSY(sc)) mtx_sleep(&sc->flags, &sc->sc_lock, 0, "t4detach", 0); SET_BUSY(sc); #ifdef INVARIANTS sc->last_op = "t4detach"; sc->last_op_thr = curthread; sc->last_op_flags = 0; #endif ADAPTER_UNLOCK(sc); } /* * {begin|end}_synchronized_op must be called from the same thread. */ void end_synchronized_op(struct adapter *sc, int flags) { if (flags & LOCK_HELD) ADAPTER_LOCK_ASSERT_OWNED(sc); else ADAPTER_LOCK(sc); KASSERT(IS_BUSY(sc), ("%s: controller not busy.", __func__)); CLR_BUSY(sc); wakeup(&sc->flags); ADAPTER_UNLOCK(sc); } static int cxgbe_init_synchronized(struct vi_info *vi) { struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct ifnet *ifp = vi->ifp; int rc = 0, i; struct sge_txq *txq; ASSERT_SYNCHRONIZED_OP(sc); if (ifp->if_drv_flags & IFF_DRV_RUNNING) return (0); /* already running */ if (!(sc->flags & FULL_INIT_DONE) && ((rc = adapter_full_init(sc)) != 0)) return (rc); /* error message displayed already */ if (!(vi->flags & VI_INIT_DONE) && ((rc = vi_full_init(vi)) != 0)) return (rc); /* error message displayed already */ rc = update_mac_settings(ifp, XGMAC_ALL); if (rc) goto done; /* error message displayed already */ rc = -t4_enable_vi(sc, sc->mbox, vi->viid, true, true); if (rc != 0) { if_printf(ifp, "enable_vi failed: %d\n", rc); goto done; } /* * Can't fail from this point onwards. Review cxgbe_uninit_synchronized * if this changes. */ for_each_txq(vi, i, txq) { TXQ_LOCK(txq); txq->eq.flags |= EQ_ENABLED; TXQ_UNLOCK(txq); } /* * The first iq of the first port to come up is used for tracing. */ if (sc->traceq < 0 && IS_MAIN_VI(vi)) { sc->traceq = sc->sge.rxq[vi->first_rxq].iq.abs_id; t4_write_reg(sc, is_t4(sc) ? A_MPS_TRC_RSS_CONTROL : A_MPS_T5_TRC_RSS_CONTROL, V_RSSCONTROL(pi->tx_chan) | V_QUEUENUMBER(sc->traceq)); pi->flags |= HAS_TRACEQ; } /* all ok */ PORT_LOCK(pi); ifp->if_drv_flags |= IFF_DRV_RUNNING; pi->up_vis++; if (pi->nvi > 1 || sc->flags & IS_VF) callout_reset(&vi->tick, hz, vi_tick, vi); else callout_reset(&pi->tick, hz, cxgbe_tick, pi); PORT_UNLOCK(pi); done: if (rc != 0) cxgbe_uninit_synchronized(vi); return (rc); } /* * Idempotent. */ static int cxgbe_uninit_synchronized(struct vi_info *vi) { struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct ifnet *ifp = vi->ifp; int rc, i; struct sge_txq *txq; ASSERT_SYNCHRONIZED_OP(sc); if (!(vi->flags & VI_INIT_DONE)) { KASSERT(!(ifp->if_drv_flags & IFF_DRV_RUNNING), ("uninited VI is running")); return (0); } /* * Disable the VI so that all its data in either direction is discarded * by the MPS. 
Leave everything else (the queues, interrupts, and 1Hz * tick) intact as the TP can deliver negative advice or data that it's * holding in its RAM (for an offloaded connection) even after the VI is * disabled. */ rc = -t4_enable_vi(sc, sc->mbox, vi->viid, false, false); if (rc) { if_printf(ifp, "disable_vi failed: %d\n", rc); return (rc); } for_each_txq(vi, i, txq) { TXQ_LOCK(txq); txq->eq.flags &= ~EQ_ENABLED; TXQ_UNLOCK(txq); } PORT_LOCK(pi); if (pi->nvi > 1 || sc->flags & IS_VF) callout_stop(&vi->tick); else callout_stop(&pi->tick); if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { PORT_UNLOCK(pi); return (0); } ifp->if_drv_flags &= ~IFF_DRV_RUNNING; pi->up_vis--; if (pi->up_vis > 0) { PORT_UNLOCK(pi); return (0); } PORT_UNLOCK(pi); pi->link_cfg.link_ok = 0; pi->link_cfg.speed = 0; pi->linkdnrc = -1; t4_os_link_changed(sc, pi->port_id, 0, -1); return (0); } /* * It is ok for this function to fail midway and return right away. t4_detach * will walk the entire sc->irq list and clean up whatever is valid. */ int t4_setup_intr_handlers(struct adapter *sc) { int rc, rid, p, q, v; char s[8]; struct irq *irq; struct port_info *pi; struct vi_info *vi; struct sge *sge = &sc->sge; struct sge_rxq *rxq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif #ifdef DEV_NETMAP struct sge_nm_rxq *nm_rxq; #endif #ifdef RSS int nbuckets = rss_getnumbuckets(); #endif /* * Setup interrupts. */ irq = &sc->irq[0]; rid = sc->intr_type == INTR_INTX ? 0 : 1; if (sc->intr_count == 1) return (t4_alloc_irq(sc, irq, rid, t4_intr_all, sc, "all")); /* Multiple interrupts. */ if (sc->flags & IS_VF) KASSERT(sc->intr_count >= T4VF_EXTRA_INTR + sc->params.nports, ("%s: too few intr.", __func__)); else KASSERT(sc->intr_count >= T4_EXTRA_INTR + sc->params.nports, ("%s: too few intr.", __func__)); /* The first one is always error intr on PFs */ if (!(sc->flags & IS_VF)) { rc = t4_alloc_irq(sc, irq, rid, t4_intr_err, sc, "err"); if (rc != 0) return (rc); irq++; rid++; } /* The second one is always the firmware event queue (first on VFs) */ rc = t4_alloc_irq(sc, irq, rid, t4_intr_evt, &sge->fwq, "evt"); if (rc != 0) return (rc); irq++; rid++; for_each_port(sc, p) { pi = sc->port[p]; for_each_vi(pi, v, vi) { vi->first_intr = rid - 1; if (vi->nnmrxq > 0) { int n = max(vi->nrxq, vi->nnmrxq); MPASS(vi->flags & INTR_RXQ); rxq = &sge->rxq[vi->first_rxq]; #ifdef DEV_NETMAP nm_rxq = &sge->nm_rxq[vi->first_nm_rxq]; #endif for (q = 0; q < n; q++) { snprintf(s, sizeof(s), "%x%c%x", p, 'a' + v, q); if (q < vi->nrxq) irq->rxq = rxq++; #ifdef DEV_NETMAP if (q < vi->nnmrxq) irq->nm_rxq = nm_rxq++; #endif rc = t4_alloc_irq(sc, irq, rid, t4_vi_intr, irq, s); if (rc != 0) return (rc); irq++; rid++; vi->nintr++; } } else if (vi->flags & INTR_RXQ) { for_each_rxq(vi, q, rxq) { snprintf(s, sizeof(s), "%x%c%x", p, 'a' + v, q); rc = t4_alloc_irq(sc, irq, rid, t4_intr, rxq, s); if (rc != 0) return (rc); #ifdef RSS bus_bind_intr(sc->dev, irq->res, rss_getcpu(q % nbuckets)); #endif irq++; rid++; vi->nintr++; } } #ifdef TCP_OFFLOAD if (vi->flags & INTR_OFLD_RXQ) { for_each_ofld_rxq(vi, q, ofld_rxq) { snprintf(s, sizeof(s), "%x%c%x", p, 'A' + v, q); rc = t4_alloc_irq(sc, irq, rid, t4_intr, ofld_rxq, s); if (rc != 0) return (rc); irq++; rid++; vi->nintr++; } } #endif } } MPASS(irq == &sc->irq[sc->intr_count]); return (0); } int adapter_full_init(struct adapter *sc) { int rc, i; ASSERT_SYNCHRONIZED_OP(sc); ADAPTER_LOCK_ASSERT_NOTOWNED(sc); KASSERT((sc->flags & FULL_INIT_DONE) == 0, ("%s: FULL_INIT_DONE already", __func__)); /* * queues that belong to the 
adapter (not any particular port). */ rc = t4_setup_adapter_queues(sc); if (rc != 0) goto done; for (i = 0; i < nitems(sc->tq); i++) { sc->tq[i] = taskqueue_create("t4 taskq", M_NOWAIT, taskqueue_thread_enqueue, &sc->tq[i]); if (sc->tq[i] == NULL) { device_printf(sc->dev, "failed to allocate task queue %d\n", i); rc = ENOMEM; goto done; } taskqueue_start_threads(&sc->tq[i], 1, PI_NET, "%s tq%d", device_get_nameunit(sc->dev), i); } if (!(sc->flags & IS_VF)) t4_intr_enable(sc); sc->flags |= FULL_INIT_DONE; done: if (rc != 0) adapter_full_uninit(sc); return (rc); } int adapter_full_uninit(struct adapter *sc) { int i; ADAPTER_LOCK_ASSERT_NOTOWNED(sc); t4_teardown_adapter_queues(sc); for (i = 0; i < nitems(sc->tq) && sc->tq[i]; i++) { taskqueue_free(sc->tq[i]); sc->tq[i] = NULL; } sc->flags &= ~FULL_INIT_DONE; return (0); } #ifdef RSS #define SUPPORTED_RSS_HASHTYPES (RSS_HASHTYPE_RSS_IPV4 | \ RSS_HASHTYPE_RSS_TCP_IPV4 | RSS_HASHTYPE_RSS_IPV6 | \ RSS_HASHTYPE_RSS_TCP_IPV6 | RSS_HASHTYPE_RSS_UDP_IPV4 | \ RSS_HASHTYPE_RSS_UDP_IPV6) /* Translates kernel hash types to hardware. */ static int hashconfig_to_hashen(int hashconfig) { int hashen = 0; if (hashconfig & RSS_HASHTYPE_RSS_IPV4) hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN; if (hashconfig & RSS_HASHTYPE_RSS_IPV6) hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN; if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV4) { hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN | F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN; } if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV6) { hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN | F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN; } if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV4) hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN; if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV6) hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN; return (hashen); } /* Translates hardware hash types to kernel. */ static int hashen_to_hashconfig(int hashen) { int hashconfig = 0; if (hashen & F_FW_RSS_VI_CONFIG_CMD_UDPEN) { /* * If UDP hashing was enabled it must have been enabled for * either IPv4 or IPv6 (inclusive or). Enabling UDP without * enabling any 4-tuple hash is nonsense configuration. */ MPASS(hashen & (F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN | F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)); if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN) hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV4; if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN) hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV6; } if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN) hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV4; if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN) hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV6; if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN) hashconfig |= RSS_HASHTYPE_RSS_IPV4; if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN) hashconfig |= RSS_HASHTYPE_RSS_IPV6; return (hashconfig); } #endif int vi_full_init(struct vi_info *vi) { struct adapter *sc = vi->pi->adapter; struct ifnet *ifp = vi->ifp; uint16_t *rss; struct sge_rxq *rxq; int rc, i, j, hashen; #ifdef RSS int nbuckets = rss_getnumbuckets(); int hashconfig = rss_gethashconfig(); int extra; uint32_t raw_rss_key[RSS_KEYSIZE / sizeof(uint32_t)]; uint32_t rss_key[RSS_KEYSIZE / sizeof(uint32_t)]; #endif ASSERT_SYNCHRONIZED_OP(sc); KASSERT((vi->flags & VI_INIT_DONE) == 0, ("%s: VI_INIT_DONE already", __func__)); sysctl_ctx_init(&vi->ctx); vi->flags |= VI_SYSCTL_CTX; /* * Allocate tx/rx/fl queues for this VI. */ rc = t4_setup_vi_queues(vi); if (rc != 0) goto done; /* error message displayed already */ /* * Setup RSS for this VI. Save a copy of the RSS table for later use. 
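 * Each of the vi->rss_size slots in the indirection table is filled with the absolute id of one of this VI's rx queues: the kernel's bucket-to-queue mapping is used with options RSS, plain round-robin otherwise.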
*/ if (vi->nrxq > vi->rss_size) { if_printf(ifp, "nrxq (%d) > hw RSS table size (%d); " "some queues will never receive traffic.\n", vi->nrxq, vi->rss_size); } else if (vi->rss_size % vi->nrxq) { if_printf(ifp, "nrxq (%d), hw RSS table size (%d); " "expect uneven traffic distribution.\n", vi->nrxq, vi->rss_size); } #ifdef RSS MPASS(RSS_KEYSIZE == 40); if (vi->nrxq != nbuckets) { if_printf(ifp, "nrxq (%d) != kernel RSS buckets (%d); " "performance will be impacted.\n", vi->nrxq, nbuckets); } rss_getkey((void *)&raw_rss_key[0]); for (i = 0; i < nitems(rss_key); i++) { rss_key[i] = htobe32(raw_rss_key[nitems(rss_key) - 1 - i]); } t4_write_rss_key(sc, &rss_key[0], -1); #endif rss = malloc(vi->rss_size * sizeof (*rss), M_CXGBE, M_ZERO | M_WAITOK); for (i = 0; i < vi->rss_size;) { #ifdef RSS j = rss_get_indirection_to_bucket(i); j %= vi->nrxq; rxq = &sc->sge.rxq[vi->first_rxq + j]; rss[i++] = rxq->iq.abs_id; #else for_each_rxq(vi, j, rxq) { rss[i++] = rxq->iq.abs_id; if (i == vi->rss_size) break; } #endif } rc = -t4_config_rss_range(sc, sc->mbox, vi->viid, 0, vi->rss_size, rss, vi->rss_size); if (rc != 0) { if_printf(ifp, "rss_config failed: %d\n", rc); goto done; } #ifdef RSS hashen = hashconfig_to_hashen(hashconfig); /* * We may have had to enable some hashes even though the global config * wants them disabled. This is a potential problem that must be * reported to the user. */ extra = hashen_to_hashconfig(hashen) ^ hashconfig; /* * If we consider only the supported hash types, then the enabled hashes * are a superset of the requested hashes. In other words, there cannot * be any supported hash that was requested but not enabled, but there * can be hashes that were not requested but had to be enabled. */ extra &= SUPPORTED_RSS_HASHTYPES; MPASS((extra & hashconfig) == 0); if (extra) { if_printf(ifp, "global RSS config (0x%x) cannot be accommodated.\n", hashconfig); } if (extra & RSS_HASHTYPE_RSS_IPV4) if_printf(ifp, "IPv4 2-tuple hashing forced on.\n"); if (extra & RSS_HASHTYPE_RSS_TCP_IPV4) if_printf(ifp, "TCP/IPv4 4-tuple hashing forced on.\n"); if (extra & RSS_HASHTYPE_RSS_IPV6) if_printf(ifp, "IPv6 2-tuple hashing forced on.\n"); if (extra & RSS_HASHTYPE_RSS_TCP_IPV6) if_printf(ifp, "TCP/IPv6 4-tuple hashing forced on.\n"); if (extra & RSS_HASHTYPE_RSS_UDP_IPV4) if_printf(ifp, "UDP/IPv4 4-tuple hashing forced on.\n"); if (extra & RSS_HASHTYPE_RSS_UDP_IPV6) if_printf(ifp, "UDP/IPv6 4-tuple hashing forced on.\n"); #else hashen = F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN | F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN | F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN | F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN | F_FW_RSS_VI_CONFIG_CMD_UDPEN; #endif rc = -t4_config_vi_rss(sc, sc->mbox, vi->viid, hashen, rss[0]); if (rc != 0) { if_printf(ifp, "rss hash/defaultq config failed: %d\n", rc); goto done; } vi->rss = rss; vi->flags |= VI_INIT_DONE; done: if (rc != 0) vi_full_uninit(vi); return (rc); } /* * Idempotent. */ int vi_full_uninit(struct vi_info *vi) { struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; int i; struct sge_rxq *rxq; struct sge_txq *txq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; struct sge_wrq *ofld_txq; #endif if (vi->flags & VI_INIT_DONE) { /* Need to quiesce queues. */ /* XXX: Only for the first VI?
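 * (The control queue is per-port and is quiesced only when the main VI of the port goes away.)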
*/ if (IS_MAIN_VI(vi) && !(sc->flags & IS_VF)) quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]); for_each_txq(vi, i, txq) { quiesce_txq(sc, txq); } #ifdef TCP_OFFLOAD for_each_ofld_txq(vi, i, ofld_txq) { quiesce_wrq(sc, ofld_txq); } #endif for_each_rxq(vi, i, rxq) { quiesce_iq(sc, &rxq->iq); quiesce_fl(sc, &rxq->fl); } #ifdef TCP_OFFLOAD for_each_ofld_rxq(vi, i, ofld_rxq) { quiesce_iq(sc, &ofld_rxq->iq); quiesce_fl(sc, &ofld_rxq->fl); } #endif free(vi->rss, M_CXGBE); free(vi->nm_rss, M_CXGBE); } t4_teardown_vi_queues(vi); vi->flags &= ~VI_INIT_DONE; return (0); } static void quiesce_txq(struct adapter *sc, struct sge_txq *txq) { struct sge_eq *eq = &txq->eq; struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; (void) sc; /* unused */ #ifdef INVARIANTS TXQ_LOCK(txq); MPASS((eq->flags & EQ_ENABLED) == 0); TXQ_UNLOCK(txq); #endif /* Wait for the mp_ring to empty. */ while (!mp_ring_is_idle(txq->r)) { mp_ring_check_drainage(txq->r, 0); pause("rquiesce", 1); } /* Then wait for the hardware to finish. */ while (spg->cidx != htobe16(eq->pidx)) pause("equiesce", 1); /* Finally, wait for the driver to reclaim all descriptors. */ while (eq->cidx != eq->pidx) pause("dquiesce", 1); } static void quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq) { /* XXXTX */ } static void quiesce_iq(struct adapter *sc, struct sge_iq *iq) { (void) sc; /* unused */ /* Synchronize with the interrupt handler */ while (!atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_DISABLED)) pause("iqfree", 1); } static void quiesce_fl(struct adapter *sc, struct sge_fl *fl) { mtx_lock(&sc->sfl_lock); FL_LOCK(fl); fl->flags |= FL_DOOMED; FL_UNLOCK(fl); callout_stop(&sc->sfl_callout); mtx_unlock(&sc->sfl_lock); KASSERT((fl->flags & FL_STARVING) == 0, ("%s: still starving", __func__)); } static int t4_alloc_irq(struct adapter *sc, struct irq *irq, int rid, driver_intr_t *handler, void *arg, char *name) { int rc; irq->rid = rid; irq->res = bus_alloc_resource_any(sc->dev, SYS_RES_IRQ, &irq->rid, RF_SHAREABLE | RF_ACTIVE); if (irq->res == NULL) { device_printf(sc->dev, "failed to allocate IRQ for rid %d, name %s.\n", rid, name); return (ENOMEM); } rc = bus_setup_intr(sc->dev, irq->res, INTR_MPSAFE | INTR_TYPE_NET, NULL, handler, arg, &irq->tag); if (rc != 0) { device_printf(sc->dev, "failed to setup interrupt for rid %d, name %s: %d\n", rid, name, rc); } else if (name) bus_describe_intr(sc->dev, irq->res, irq->tag, "%s", name); return (rc); } static int t4_free_irq(struct adapter *sc, struct irq *irq) { if (irq->tag) bus_teardown_intr(sc->dev, irq->res, irq->tag); if (irq->res) bus_release_resource(sc->dev, SYS_RES_IRQ, irq->rid, irq->res); bzero(irq, sizeof(*irq)); return (0); } static void get_regs(struct adapter *sc, struct t4_regdump *regs, uint8_t *buf) { regs->version = chip_id(sc) | chip_rev(sc) << 10; t4_get_regs(sc, buf, regs->len); } #define A_PL_INDIR_CMD 0x1f8 #define S_PL_AUTOINC 31 #define M_PL_AUTOINC 0x1U #define V_PL_AUTOINC(x) ((x) << S_PL_AUTOINC) #define G_PL_AUTOINC(x) (((x) >> S_PL_AUTOINC) & M_PL_AUTOINC) #define S_PL_VFID 20 #define M_PL_VFID 0xffU #define V_PL_VFID(x) ((x) << S_PL_VFID) #define G_PL_VFID(x) (((x) >> S_PL_VFID) & M_PL_VFID) #define S_PL_ADDR 0 #define M_PL_ADDR 0xfffffU #define V_PL_ADDR(x) ((x) << S_PL_ADDR) #define G_PL_ADDR(x) (((x) >> S_PL_ADDR) & M_PL_ADDR) #define A_PL_INDIR_DATA 0x1fc static uint64_t read_vf_stat(struct adapter *sc, unsigned int viid, int reg) { u32 stats[2]; mtx_assert(&sc->reg_lock, MA_OWNED); if (sc->flags & IS_VF) { stats[0] = t4_read_reg(sc, VF_MPS_REG(reg)); stats[1] = 
t4_read_reg(sc, VF_MPS_REG(reg + 4)); } else { t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) | V_PL_VFID(G_FW_VIID_VIN(viid)) | V_PL_ADDR(VF_MPS_REG(reg))); stats[0] = t4_read_reg(sc, A_PL_INDIR_DATA); stats[1] = t4_read_reg(sc, A_PL_INDIR_DATA); } return (((uint64_t)stats[1]) << 32 | stats[0]); } static void t4_get_vi_stats(struct adapter *sc, unsigned int viid, struct fw_vi_stats_vf *stats) { #define GET_STAT(name) \ read_vf_stat(sc, viid, A_MPS_VF_STAT_##name##_L) stats->tx_bcast_bytes = GET_STAT(TX_VF_BCAST_BYTES); stats->tx_bcast_frames = GET_STAT(TX_VF_BCAST_FRAMES); stats->tx_mcast_bytes = GET_STAT(TX_VF_MCAST_BYTES); stats->tx_mcast_frames = GET_STAT(TX_VF_MCAST_FRAMES); stats->tx_ucast_bytes = GET_STAT(TX_VF_UCAST_BYTES); stats->tx_ucast_frames = GET_STAT(TX_VF_UCAST_FRAMES); stats->tx_drop_frames = GET_STAT(TX_VF_DROP_FRAMES); stats->tx_offload_bytes = GET_STAT(TX_VF_OFFLOAD_BYTES); stats->tx_offload_frames = GET_STAT(TX_VF_OFFLOAD_FRAMES); stats->rx_bcast_bytes = GET_STAT(RX_VF_BCAST_BYTES); stats->rx_bcast_frames = GET_STAT(RX_VF_BCAST_FRAMES); stats->rx_mcast_bytes = GET_STAT(RX_VF_MCAST_BYTES); stats->rx_mcast_frames = GET_STAT(RX_VF_MCAST_FRAMES); stats->rx_ucast_bytes = GET_STAT(RX_VF_UCAST_BYTES); stats->rx_ucast_frames = GET_STAT(RX_VF_UCAST_FRAMES); stats->rx_err_frames = GET_STAT(RX_VF_ERR_FRAMES); #undef GET_STAT } static void t4_clr_vi_stats(struct adapter *sc, unsigned int viid) { int reg; t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) | V_PL_VFID(G_FW_VIID_VIN(viid)) | V_PL_ADDR(VF_MPS_REG(A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L))); for (reg = A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L; reg <= A_MPS_VF_STAT_RX_VF_ERR_FRAMES_H; reg += 4) t4_write_reg(sc, A_PL_INDIR_DATA, 0); } static void vi_refresh_stats(struct adapter *sc, struct vi_info *vi) { struct timeval tv; const struct timeval interval = {0, 250000}; /* 250ms */ if (!(vi->flags & VI_INIT_DONE)) return; getmicrotime(&tv); timevalsub(&tv, &interval); if (timevalcmp(&tv, &vi->last_refreshed, <)) return; mtx_lock(&sc->reg_lock); t4_get_vi_stats(sc, vi->viid, &vi->stats); getmicrotime(&vi->last_refreshed); mtx_unlock(&sc->reg_lock); } static void cxgbe_refresh_stats(struct adapter *sc, struct port_info *pi) { int i; u_int v, tnl_cong_drops; struct timeval tv; const struct timeval interval = {0, 250000}; /* 250ms */ getmicrotime(&tv); timevalsub(&tv, &interval); if (timevalcmp(&tv, &pi->last_refreshed, <)) return; tnl_cong_drops = 0; t4_get_port_stats(sc, pi->tx_chan, &pi->stats); for (i = 0; i < sc->chip_params->nchan; i++) { if (pi->rx_chan_map & (1 << i)) { mtx_lock(&sc->reg_lock); t4_read_indirect(sc, A_TP_MIB_INDEX, A_TP_MIB_DATA, &v, 1, A_TP_MIB_TNL_CNG_DROP_0 + i); mtx_unlock(&sc->reg_lock); tnl_cong_drops += v; } } pi->tnl_cong_drops = tnl_cong_drops; getmicrotime(&pi->last_refreshed); } static void cxgbe_tick(void *arg) { struct port_info *pi = arg; struct adapter *sc = pi->adapter; PORT_LOCK_ASSERT_OWNED(pi); cxgbe_refresh_stats(sc, pi); callout_schedule(&pi->tick, hz); } void vi_tick(void *arg) { struct vi_info *vi = arg; struct adapter *sc = vi->pi->adapter; vi_refresh_stats(sc, vi); callout_schedule(&vi->tick, hz); } static void cxgbe_vlan_config(void *arg, struct ifnet *ifp, uint16_t vid) { struct ifnet *vlan; if (arg != ifp || ifp->if_type != IFT_ETHER) return; vlan = VLAN_DEVAT(ifp, vid); VLAN_SETCOOKIE(vlan, ifp); } /* * Should match fw_caps_config_ enums in t4fw_interface.h */ static char *caps_decoder[] = { "\20\001IPMI\002NCSI", /* 0: NBM */ "\20\001PPP\002QFC\003DCBX", /* 1: link */ 
"\20\001INGRESS\002EGRESS", /* 2: switch */ "\20\001NIC\002VM\003IDS\004UM\005UM_ISGL" /* 3: NIC */ "\006HASHFILTER\007ETHOFLD", "\20\001TOE", /* 4: TOE */ "\20\001RDDP\002RDMAC", /* 5: RDMA */ "\20\001INITIATOR_PDU\002TARGET_PDU" /* 6: iSCSI */ "\003INITIATOR_CNXOFLD\004TARGET_CNXOFLD" "\005INITIATOR_SSNOFLD\006TARGET_SSNOFLD" "\007T10DIF" "\010INITIATOR_CMDOFLD\011TARGET_CMDOFLD", "\20\00KEYS", /* 7: TLS */ "\20\001INITIATOR\002TARGET\003CTRL_OFLD" /* 8: FCoE */ "\004PO_INITIATOR\005PO_TARGET", }; void t4_sysctls(struct adapter *sc) { struct sysctl_ctx_list *ctx; struct sysctl_oid *oid; struct sysctl_oid_list *children, *c0; static char *doorbells = {"\20\1UDB\2WCWR\3UDBWC\4KDB"}; ctx = device_get_sysctl_ctx(sc->dev); /* * dev.t4nex.X. */ oid = device_get_sysctl_tree(sc->dev); c0 = children = SYSCTL_CHILDREN(oid); sc->sc_do_rxcopy = 1; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "do_rx_copy", CTLFLAG_RW, &sc->sc_do_rxcopy, 1, "Do RX copy of small frames"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nports", CTLFLAG_RD, NULL, sc->params.nports, "# of ports"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "doorbells", CTLTYPE_STRING | CTLFLAG_RD, doorbells, sc->doorbells, sysctl_bitfield, "A", "available doorbells"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "core_clock", CTLFLAG_RD, NULL, sc->params.vpd.cclk, "core clock frequency (in KHz)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_timers", CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.timer_val, sizeof(sc->params.sge.timer_val), sysctl_int_array, "A", "interrupt holdoff timer values (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pkt_counts", CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.counter_val, sizeof(sc->params.sge.counter_val), sysctl_int_array, "A", "interrupt holdoff packet counter values"); t4_sge_sysctls(sc, ctx, children); sc->lro_timeout = 100; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "lro_timeout", CTLFLAG_RW, &sc->lro_timeout, 0, "lro inactive-flush timeout (in us)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "debug_flags", CTLFLAG_RW, &sc->debug_flags, 0, "flags to enable runtime debugging"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "tp_version", CTLFLAG_RD, sc->tp_version, 0, "TP microcode version"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "firmware_version", CTLFLAG_RD, sc->fw_version, 0, "firmware version"); if (sc->flags & IS_VF) return; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "hw_revision", CTLFLAG_RD, NULL, chip_rev(sc), "chip hardware revision"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "sn", CTLFLAG_RD, sc->params.vpd.sn, 0, "serial number"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "pn", CTLFLAG_RD, sc->params.vpd.pn, 0, "part number"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "ec", CTLFLAG_RD, sc->params.vpd.ec, 0, "engineering change"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "na", CTLFLAG_RD, sc->params.vpd.na, 0, "network address"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "er_version", CTLFLAG_RD, sc->er_version, 0, "expansion ROM version"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "bs_version", CTLFLAG_RD, sc->bs_version, 0, "bootstrap firmware version"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "scfg_version", CTLFLAG_RD, NULL, sc->params.scfg_vers, "serial config version"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "vpd_version", CTLFLAG_RD, NULL, sc->params.vpd_vers, "VPD version"); SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "cf", CTLFLAG_RD, sc->cfg_file, 0, "configuration file"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cfcsum", CTLFLAG_RD, NULL, sc->cfcsum, "config file checksum"); 
#define SYSCTL_CAP(name, n, text) \ SYSCTL_ADD_PROC(ctx, children, OID_AUTO, #name, \ CTLTYPE_STRING | CTLFLAG_RD, caps_decoder[n], sc->name, \ sysctl_bitfield, "A", "available " text " capabilities") SYSCTL_CAP(nbmcaps, 0, "NBM"); SYSCTL_CAP(linkcaps, 1, "link"); SYSCTL_CAP(switchcaps, 2, "switch"); SYSCTL_CAP(niccaps, 3, "NIC"); SYSCTL_CAP(toecaps, 4, "TCP offload"); SYSCTL_CAP(rdmacaps, 5, "RDMA"); SYSCTL_CAP(iscsicaps, 6, "iSCSI"); SYSCTL_CAP(tlscaps, 7, "TLS"); SYSCTL_CAP(fcoecaps, 8, "FCoE"); #undef SYSCTL_CAP SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nfilters", CTLFLAG_RD, NULL, sc->tids.nftids, "number of filters"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature", CTLTYPE_INT | CTLFLAG_RD, sc, 0, sysctl_temperature, "I", "chip temperature (in Celsius)"); #ifdef SBUF_DRAIN /* * dev.t4nex.X.misc. Marked CTLFLAG_SKIP to avoid information overload. */ oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "misc", CTLFLAG_RD | CTLFLAG_SKIP, NULL, "logs and miscellaneous information"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cctrl", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cctrl, "A", "congestion control"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp0", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cim_ibq_obq, "A", "CIM IBQ 0 (TP0)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp1", CTLTYPE_STRING | CTLFLAG_RD, sc, 1, sysctl_cim_ibq_obq, "A", "CIM IBQ 1 (TP1)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ulp", CTLTYPE_STRING | CTLFLAG_RD, sc, 2, sysctl_cim_ibq_obq, "A", "CIM IBQ 2 (ULP)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge0", CTLTYPE_STRING | CTLFLAG_RD, sc, 3, sysctl_cim_ibq_obq, "A", "CIM IBQ 3 (SGE0)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge1", CTLTYPE_STRING | CTLFLAG_RD, sc, 4, sysctl_cim_ibq_obq, "A", "CIM IBQ 4 (SGE1)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ncsi", CTLTYPE_STRING | CTLFLAG_RD, sc, 5, sysctl_cim_ibq_obq, "A", "CIM IBQ 5 (NCSI)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_la", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, chip_id(sc) <= CHELSIO_T5 ? 
sysctl_cim_la : sysctl_cim_la_t6, "A", "CIM logic analyzer"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ma_la", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cim_ma_la, "A", "CIM MA logic analyzer"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp0", CTLTYPE_STRING | CTLFLAG_RD, sc, 0 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 0 (ULP0)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp1", CTLTYPE_STRING | CTLFLAG_RD, sc, 1 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 1 (ULP1)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp2", CTLTYPE_STRING | CTLFLAG_RD, sc, 2 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 2 (ULP2)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp3", CTLTYPE_STRING | CTLFLAG_RD, sc, 3 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 3 (ULP3)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge", CTLTYPE_STRING | CTLFLAG_RD, sc, 4 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 4 (SGE)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ncsi", CTLTYPE_STRING | CTLFLAG_RD, sc, 5 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 5 (NCSI)"); if (chip_id(sc) > CHELSIO_T4) { SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge0_rx", CTLTYPE_STRING | CTLFLAG_RD, sc, 6 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 6 (SGE0-RX)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge1_rx", CTLTYPE_STRING | CTLFLAG_RD, sc, 7 + CIM_NUM_IBQ, sysctl_cim_ibq_obq, "A", "CIM OBQ 7 (SGE1-RX)"); } SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_pif_la", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cim_pif_la, "A", "CIM PIF logic analyzer"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_qcfg", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cim_qcfg, "A", "CIM queue configuration"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cpl_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_cpl_stats, "A", "CPL statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ddp_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_ddp_stats, "A", "non-TCP DDP statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "devlog", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_devlog, "A", "firmware's device log"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fcoe_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_fcoe_stats, "A", "FCoE statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "hw_sched", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_hw_sched, "A", "hardware scheduler "); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "l2t", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_l2t, "A", "hardware L2 table"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "lb_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_lb_stats, "A", "loopback statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "meminfo", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_meminfo, "A", "memory regions"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "mps_tcam", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, chip_id(sc) <= CHELSIO_T5 ? 
sysctl_mps_tcam : sysctl_mps_tcam_t6, "A", "MPS TCAM entries"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "path_mtus", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_path_mtus, "A", "path MTUs"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pm_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_pm_stats, "A", "PM statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rdma_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_rdma_stats, "A", "RDMA statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tcp_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tcp_stats, "A", "TCP statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tids", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tids, "A", "TID information"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_err_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_err_stats, "A", "TP error statistics"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la_mask", CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_tp_la_mask, "I", "TP logic analyzer event capture mask"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_la, "A", "TP logic analyzer"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tx_rate", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tx_rate, "A", "Tx rate"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ulprx_la", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_ulprx_la, "A", "ULPRX logic analyzer"); if (is_t5(sc)) { SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "wcwr_stats", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_wcwr_stats, "A", "write combined work requests"); } #endif #ifdef TCP_OFFLOAD if (is_offload(sc)) { /* * dev.t4nex.X.toe. */ oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "toe", CTLFLAG_RD, NULL, "TOE parameters"); children = SYSCTL_CHILDREN(oid); sc->tt.sndbuf = 256 * 1024; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "sndbuf", CTLFLAG_RW, &sc->tt.sndbuf, 0, "max hardware send buffer size"); sc->tt.ddp = 0; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ddp", CTLFLAG_RW, &sc->tt.ddp, 0, "DDP allowed"); sc->tt.rx_coalesce = 1; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "rx_coalesce", CTLFLAG_RW, &sc->tt.rx_coalesce, 0, "receive coalescing"); sc->tt.tx_align = 1; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_align", CTLFLAG_RW, &sc->tt.tx_align, 0, "chop and align payload"); sc->tt.tx_zcopy = 0; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_zcopy", CTLFLAG_RW, &sc->tt.tx_zcopy, 0, "Enable zero-copy aio_write(2)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timer_tick", CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_tick, "A", "TP timer tick (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timestamp_tick", CTLTYPE_STRING | CTLFLAG_RD, sc, 1, sysctl_tp_tick, "A", "TCP timestamp tick (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_tick", CTLTYPE_STRING | CTLFLAG_RD, sc, 2, sysctl_tp_tick, "A", "DACK tick (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_timer", CTLTYPE_UINT | CTLFLAG_RD, sc, 0, sysctl_tp_dack_timer, "IU", "DACK timer (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_min", CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MIN, sysctl_tp_timer, "LU", "Retransmit min (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_max", CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MAX, sysctl_tp_timer, "LU", "Retransmit max (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_min", CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MIN, sysctl_tp_timer, "LU", "Persist timer min (us)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_max", CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MAX, sysctl_tp_timer, "LU", "Persist timer max (us)"); 
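/*
 * Like the retransmit and persist timers above, the four values below
 * are raw TP timer registers that sysctl_tp_timer converts to
 * microseconds before reporting.
 */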
        SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_idle",
            CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_IDLE,
            sysctl_tp_timer, "LU", "Keepalive idle timer (us)");
        SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_intvl",
            CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_INTVL,
            sysctl_tp_timer, "LU", "Keepalive interval (us)");
        SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "initial_srtt",
            CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_INIT_SRTT,
            sysctl_tp_timer, "LU", "Initial SRTT (us)");
        SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "finwait2_timer",
            CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_FINWAIT2_TIMER,
            sysctl_tp_timer, "LU", "FINWAIT2 timer (us)");
    }
#endif
}

void
vi_sysctls(struct vi_info *vi)
{
    struct sysctl_ctx_list *ctx;
    struct sysctl_oid *oid;
    struct sysctl_oid_list *children;

    ctx = device_get_sysctl_ctx(vi->dev);

    /*
     * dev.v?(cxgbe|cxl).X.
     */
    oid = device_get_sysctl_tree(vi->dev);
    children = SYSCTL_CHILDREN(oid);

    SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "viid", CTLFLAG_RD, NULL,
        vi->viid, "VI identifier");
    SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nrxq", CTLFLAG_RD,
        &vi->nrxq, 0, "# of rx queues");
    SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ntxq", CTLFLAG_RD,
        &vi->ntxq, 0, "# of tx queues");
    SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_rxq", CTLFLAG_RD,
        &vi->first_rxq, 0, "index of first rx queue");
    SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_txq", CTLFLAG_RD,
        &vi->first_txq, 0, "index of first tx queue");
    SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rss_size", CTLFLAG_RD, NULL,
        vi->rss_size, "size of RSS indirection table");

    if (IS_MAIN_VI(vi)) {
        SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rsrv_noflowq",
            CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_noflowq, "IU",
            "Reserve queue 0 for non-flowid packets");
    }

#ifdef TCP_OFFLOAD
    if (vi->nofldrxq != 0) {
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldrxq", CTLFLAG_RD,
            &vi->nofldrxq, 0,
            "# of rx queues for offloaded TCP connections");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldtxq", CTLFLAG_RD,
            &vi->nofldtxq, 0,
            "# of tx queues for offloaded TCP connections");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_rxq",
            CTLFLAG_RD, &vi->first_ofld_rxq, 0,
            "index of first TOE rx queue");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_txq",
            CTLFLAG_RD, &vi->first_ofld_txq, 0,
            "index of first TOE tx queue");
    }
#endif
#ifdef DEV_NETMAP
    if (vi->nnmrxq != 0) {
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nnmrxq", CTLFLAG_RD,
            &vi->nnmrxq, 0, "# of netmap rx queues");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nnmtxq", CTLFLAG_RD,
            &vi->nnmtxq, 0, "# of netmap tx queues");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_nm_rxq",
            CTLFLAG_RD, &vi->first_nm_rxq, 0,
            "index of first netmap rx queue");
        SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_nm_txq",
            CTLFLAG_RD, &vi->first_nm_txq, 0,
            "index of first netmap tx queue");
    }
#endif

    SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_tmr_idx",
        CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_tmr_idx, "I",
        "holdoff timer index");
    SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pktc_idx",
        CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_pktc_idx, "I",
        "holdoff packet counter index");

    SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_rxq",
        CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_rxq, "I",
        "rx queue size");
    SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_txq",
        CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_txq, "I",
        "tx queue size");
}

static void
cxgbe_sysctls(struct port_info *pi)
{
    struct sysctl_ctx_list *ctx;
    struct sysctl_oid *oid;
    struct sysctl_oid_list *children, *children2;
    struct adapter *sc = pi->adapter;
    int i;
    char name[16];

    ctx =
device_get_sysctl_ctx(pi->dev); /* * dev.cxgbe.X. */ oid = device_get_sysctl_tree(pi->dev); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "linkdnrc", CTLTYPE_STRING | CTLFLAG_RD, pi, 0, sysctl_linkdnrc, "A", "reason why link is down"); if (pi->port_type == FW_PORT_TYPE_BT_XAUI) { SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature", CTLTYPE_INT | CTLFLAG_RD, pi, 0, sysctl_btphy, "I", "PHY temperature (in Celsius)"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fw_version", CTLTYPE_INT | CTLFLAG_RD, pi, 1, sysctl_btphy, "I", "PHY firmware version"); } SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pause_settings", CTLTYPE_STRING | CTLFLAG_RW, pi, PAUSE_TX, sysctl_pause_settings, "A", "PAUSE settings (bit 0 = rx_pause, bit 1 = tx_pause)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "max_speed", CTLFLAG_RD, NULL, port_top_speed(pi), "max speed (in Gbps)"); if (sc->flags & IS_VF) return; /* * dev.(cxgbe|cxl).X.tc. */ oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "tc", CTLFLAG_RD, NULL, "Tx scheduler traffic classes"); for (i = 0; i < sc->chip_params->nsched_cls; i++) { struct tx_sched_class *tc = &pi->tc[i]; snprintf(name, sizeof(name), "%d", i); children2 = SYSCTL_CHILDREN(SYSCTL_ADD_NODE(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL, "traffic class")); SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "flags", CTLFLAG_RD, &tc->flags, 0, "flags"); SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "refcount", CTLFLAG_RD, &tc->refcount, 0, "references to this class"); #ifdef SBUF_DRAIN SYSCTL_ADD_PROC(ctx, children2, OID_AUTO, "params", CTLTYPE_STRING | CTLFLAG_RD, sc, (pi->port_id << 16) | i, sysctl_tc_params, "A", "traffic class parameters"); #endif } /* * dev.cxgbe.X.stats. */ oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD, NULL, "port statistics"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_parse_error", CTLFLAG_RD, &pi->tx_parse_error, 0, "# of tx packets with invalid length or # of segments"); #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \ SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \ CTLTYPE_U64 | CTLFLAG_RD, sc, reg, \ sysctl_handle_t4_reg64, "QU", desc) SYSCTL_ADD_T4_REG64(pi, "tx_octets", "# of octets in good frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BYTES_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames", "total # of good frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_FRAMES_L)); SYSCTL_ADD_T4_REG64(pi, "tx_bcast_frames", "# of broadcast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BCAST_L)); SYSCTL_ADD_T4_REG64(pi, "tx_mcast_frames", "# of multicast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_MCAST_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ucast_frames", "# of unicast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_UCAST_L)); SYSCTL_ADD_T4_REG64(pi, "tx_error_frames", "# of error frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_64", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_64B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_65_127", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_65B_127B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_128_255", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_128B_255B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_256_511", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_256B_511B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_512_1023", "# of tx frames in this range", PORT_REG(pi->tx_chan, 
A_MPS_PORT_STAT_TX_PORT_512B_1023B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_1024_1518", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1024B_1518B_L)); SYSCTL_ADD_T4_REG64(pi, "tx_frames_1519_max", "# of tx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1519B_MAX_L)); SYSCTL_ADD_T4_REG64(pi, "tx_drop", "# of dropped tx frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_DROP_L)); SYSCTL_ADD_T4_REG64(pi, "tx_pause", "# of pause frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PAUSE_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp0", "# of PPP prio 0 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP0_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp1", "# of PPP prio 1 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP1_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp2", "# of PPP prio 2 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP2_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp3", "# of PPP prio 3 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP3_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp4", "# of PPP prio 4 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP4_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp5", "# of PPP prio 5 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP5_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp6", "# of PPP prio 6 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP6_L)); SYSCTL_ADD_T4_REG64(pi, "tx_ppp7", "# of PPP prio 7 frames transmitted", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP7_L)); SYSCTL_ADD_T4_REG64(pi, "rx_octets", "# of octets in good frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BYTES_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames", "total # of good frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_FRAMES_L)); SYSCTL_ADD_T4_REG64(pi, "rx_bcast_frames", "# of broadcast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BCAST_L)); SYSCTL_ADD_T4_REG64(pi, "rx_mcast_frames", "# of multicast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MCAST_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ucast_frames", "# of unicast frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_UCAST_L)); SYSCTL_ADD_T4_REG64(pi, "rx_too_long", "# of frames exceeding MTU", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "rx_jabber", "# of jabber frames", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_CRC_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "rx_fcs_err", "# of frames received with bad FCS", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_CRC_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "rx_len_err", "# of frames received with length error", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LEN_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "rx_symbol_err", "symbol errors", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_SYM_ERROR_L)); SYSCTL_ADD_T4_REG64(pi, "rx_runt", "# of short frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LESS_64B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_64", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_64B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_65_127", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_65B_127B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_128_255", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_128B_255B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_256_511", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_256B_511B_L)); SYSCTL_ADD_T4_REG64(pi, 
"rx_frames_512_1023", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_512B_1023B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_1024_1518", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1024B_1518B_L)); SYSCTL_ADD_T4_REG64(pi, "rx_frames_1519_max", "# of rx frames in this range", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1519B_MAX_L)); SYSCTL_ADD_T4_REG64(pi, "rx_pause", "# of pause frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PAUSE_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp0", "# of PPP prio 0 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP0_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp1", "# of PPP prio 1 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP1_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp2", "# of PPP prio 2 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP2_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp3", "# of PPP prio 3 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP3_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp4", "# of PPP prio 4 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP4_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp5", "# of PPP prio 5 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP5_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp6", "# of PPP prio 6 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP6_L)); SYSCTL_ADD_T4_REG64(pi, "rx_ppp7", "# of PPP prio 7 frames received", PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP7_L)); #undef SYSCTL_ADD_T4_REG64 #define SYSCTL_ADD_T4_PORTSTAT(name, desc) \ SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, #name, CTLFLAG_RD, \ &pi->stats.name, desc) /* We get these from port_stats and they may be stale by up to 1s */ SYSCTL_ADD_T4_PORTSTAT(rx_ovflow0, "# drops due to buffer-group 0 overflows"); SYSCTL_ADD_T4_PORTSTAT(rx_ovflow1, "# drops due to buffer-group 1 overflows"); SYSCTL_ADD_T4_PORTSTAT(rx_ovflow2, "# drops due to buffer-group 2 overflows"); SYSCTL_ADD_T4_PORTSTAT(rx_ovflow3, "# drops due to buffer-group 3 overflows"); SYSCTL_ADD_T4_PORTSTAT(rx_trunc0, "# of buffer-group 0 truncated packets"); SYSCTL_ADD_T4_PORTSTAT(rx_trunc1, "# of buffer-group 1 truncated packets"); SYSCTL_ADD_T4_PORTSTAT(rx_trunc2, "# of buffer-group 2 truncated packets"); SYSCTL_ADD_T4_PORTSTAT(rx_trunc3, "# of buffer-group 3 truncated packets"); #undef SYSCTL_ADD_T4_PORTSTAT } static int sysctl_int_array(SYSCTL_HANDLER_ARGS) { int rc, *i, space = 0; struct sbuf sb; sbuf_new_for_sysctl(&sb, NULL, 64, req); for (i = arg1; arg2; arg2 -= sizeof(int), i++) { if (space) sbuf_printf(&sb, " "); sbuf_printf(&sb, "%d", *i); space = 1; } rc = sbuf_finish(&sb); sbuf_delete(&sb); return (rc); } static int sysctl_bitfield(SYSCTL_HANDLER_ARGS) { int rc; struct sbuf *sb; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return(rc); sb = sbuf_new_for_sysctl(NULL, NULL, 128, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "%b", (int)arg2, (char *)arg1); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_btphy(SYSCTL_HANDLER_ARGS) { struct port_info *pi = arg1; int op = arg2; struct adapter *sc = pi->adapter; u_int v; int rc; rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK, "t4btt"); if (rc) return (rc); /* XXX: magic numbers */ rc = -t4_mdio_rd(sc, sc->mbox, pi->mdio_addr, 0x1e, op ? 
0x20 : 0xc820, &v); end_synchronized_op(sc, 0); if (rc) return (rc); if (op == 0) v /= 256; rc = sysctl_handle_int(oidp, &v, 0, req); return (rc); } static int sysctl_noflowq(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; int rc, val; val = vi->rsrv_noflowq; rc = sysctl_handle_int(oidp, &val, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if ((val >= 1) && (vi->ntxq > 1)) vi->rsrv_noflowq = 1; else vi->rsrv_noflowq = 0; return (rc); } static int sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; struct adapter *sc = vi->pi->adapter; int idx, rc, i; struct sge_rxq *rxq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif uint8_t v; idx = vi->tmr_idx; rc = sysctl_handle_int(oidp, &idx, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if (idx < 0 || idx >= SGE_NTIMERS) return (EINVAL); rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4tmr"); if (rc) return (rc); v = V_QINTR_TIMER_IDX(idx) | V_QINTR_CNT_EN(vi->pktc_idx != -1); for_each_rxq(vi, i, rxq) { #ifdef atomic_store_rel_8 atomic_store_rel_8(&rxq->iq.intr_params, v); #else rxq->iq.intr_params = v; #endif } #ifdef TCP_OFFLOAD for_each_ofld_rxq(vi, i, ofld_rxq) { #ifdef atomic_store_rel_8 atomic_store_rel_8(&ofld_rxq->iq.intr_params, v); #else ofld_rxq->iq.intr_params = v; #endif } #endif vi->tmr_idx = idx; end_synchronized_op(sc, LOCK_HELD); return (0); } static int sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; struct adapter *sc = vi->pi->adapter; int idx, rc; idx = vi->pktc_idx; rc = sysctl_handle_int(oidp, &idx, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if (idx < -1 || idx >= SGE_NCOUNTERS) return (EINVAL); rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4pktc"); if (rc) return (rc); if (vi->flags & VI_INIT_DONE) rc = EBUSY; /* cannot be changed once the queues are created */ else vi->pktc_idx = idx; end_synchronized_op(sc, LOCK_HELD); return (rc); } static int sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; struct adapter *sc = vi->pi->adapter; int qsize, rc; qsize = vi->qsize_rxq; rc = sysctl_handle_int(oidp, &qsize, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if (qsize < 128 || (qsize & 7)) return (EINVAL); rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4rxqs"); if (rc) return (rc); if (vi->flags & VI_INIT_DONE) rc = EBUSY; /* cannot be changed once the queues are created */ else vi->qsize_rxq = qsize; end_synchronized_op(sc, LOCK_HELD); return (rc); } static int sysctl_qsize_txq(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; struct adapter *sc = vi->pi->adapter; int qsize, rc; qsize = vi->qsize_txq; rc = sysctl_handle_int(oidp, &qsize, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if (qsize < 128 || qsize > 65536) return (EINVAL); rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4txqs"); if (rc) return (rc); if (vi->flags & VI_INIT_DONE) rc = EBUSY; /* cannot be changed once the queues are created */ else vi->qsize_txq = qsize; end_synchronized_op(sc, LOCK_HELD); return (rc); } static int sysctl_pause_settings(SYSCTL_HANDLER_ARGS) { struct port_info *pi = arg1; struct adapter *sc = pi->adapter; struct link_config *lc = &pi->link_cfg; int rc; if (req->newptr == NULL) { struct sbuf *sb; static char *bits = "\20\1PAUSE_RX\2PAUSE_TX"; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return(rc); sb = sbuf_new_for_sysctl(NULL, NULL, 128, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "%b", lc->fc & (PAUSE_TX 
        | PAUSE_RX), bits);
        rc = sbuf_finish(sb);
        sbuf_delete(sb);
    } else {
        char s[2];
        int n;

        s[0] = '0' + (lc->requested_fc & (PAUSE_TX | PAUSE_RX));
        s[1] = 0;

        rc = sysctl_handle_string(oidp, s, sizeof(s), req);
        if (rc != 0)
            return (rc);

        if (s[1] != 0)
            return (EINVAL);
        if (s[0] < '0' || s[0] > '9')
            return (EINVAL);	/* not a number */
        n = s[0] - '0';
        if (n & ~(PAUSE_TX | PAUSE_RX))
            return (EINVAL);	/* some other bit is set too */

        rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK,
            "t4PAUSE");
        if (rc)
            return (rc);
        if ((lc->requested_fc & (PAUSE_TX | PAUSE_RX)) != n) {
            int link_ok = lc->link_ok;

            lc->requested_fc &= ~(PAUSE_TX | PAUSE_RX);
            lc->requested_fc |= n;
            rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, lc);
            lc->link_ok = link_ok;	/* restore */
        }
        end_synchronized_op(sc, 0);
    }

    return (rc);
}

static int
sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS)
{
    struct adapter *sc = arg1;
    int reg = arg2;
    uint64_t val;

    val = t4_read_reg64(sc, reg);

    return (sysctl_handle_64(oidp, &val, 0, req));
}

static int
sysctl_temperature(SYSCTL_HANDLER_ARGS)
{
    struct adapter *sc = arg1;
    int rc, t;
    uint32_t param, val;

    rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4temp");
    if (rc)
        return (rc);
    param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
        V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_DIAG) |
        V_FW_PARAMS_PARAM_Y(FW_PARAM_DEV_DIAG_TMP);
    rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
    end_synchronized_op(sc, 0);
    if (rc)
        return (rc);

    /* unknown is returned as 0 but we display -1 in that case */
    t = val == 0 ? -1 : val;
    rc = sysctl_handle_int(oidp, &t, 0, req);
    return (rc);
}

#ifdef SBUF_DRAIN
static int
sysctl_cctrl(SYSCTL_HANDLER_ARGS)
{
    struct adapter *sc = arg1;
    struct sbuf *sb;
    int rc, i;
    uint16_t incr[NMTUS][NCCTRL_WIN];
    static const char *dec_fac[] = {
        "0.5", "0.5625", "0.625", "0.6875", "0.75", "0.8125", "0.875",
        "0.9375"
    };

    rc = sysctl_wire_old_buffer(req, 0);
    if (rc != 0)
        return (rc);

    sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
    if (sb == NULL)
        return (ENOMEM);

    t4_read_cong_tbl(sc, incr);

    for (i = 0; i < NCCTRL_WIN; ++i) {
        sbuf_printf(sb, "%2d: %4u %4u %4u %4u %4u %4u %4u %4u\n", i,
            incr[0][i], incr[1][i], incr[2][i], incr[3][i], incr[4][i],
            incr[5][i], incr[6][i], incr[7][i]);
        sbuf_printf(sb, "%8u %4u %4u %4u %4u %4u %4u %4u %5u %s\n",
            incr[8][i], incr[9][i], incr[10][i], incr[11][i],
            incr[12][i], incr[13][i], incr[14][i], incr[15][i],
            sc->params.a_wnd[i], dec_fac[sc->params.b_wnd[i]]);
    }

    rc = sbuf_finish(sb);
    sbuf_delete(sb);

    return (rc);
}

static const char *qname[CIM_NUM_IBQ + CIM_NUM_OBQ_T5] = {
    "TP0", "TP1", "ULP", "SGE0", "SGE1", "NC-SI",	/* ibq's */
    "ULP0", "ULP1", "ULP2", "ULP3", "SGE", "NC-SI",	/* obq's */
    "SGE0-RX", "SGE1-RX"	/* additional obq's (T5 onwards) */
};

static int
sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS)
{
    struct adapter *sc = arg1;
    struct sbuf *sb;
    int rc, i, n, qid = arg2;
    uint32_t *buf, *p;
    char *qtype;
    u_int cim_num_obq = sc->chip_params->cim_num_obq;

    KASSERT(qid >= 0 && qid < CIM_NUM_IBQ + cim_num_obq,
        ("%s: bad qid %d\n", __func__, qid));

    if (qid < CIM_NUM_IBQ) {
        /* inbound queue */
        qtype = "IBQ";
        n = 4 * CIM_IBQ_SIZE;
        buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
        rc = t4_read_cim_ibq(sc, qid, buf, n);
    } else {
        /* outbound queue */
        qtype = "OBQ";
        qid -= CIM_NUM_IBQ;
        n = 4 * cim_num_obq * CIM_OBQ_SIZE;
        buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
        rc = t4_read_cim_obq(sc, qid, buf, n);
    }

    if (rc < 0) {
        rc = -rc;
        goto done;
    }
    n = rc * sizeof(uint32_t);	/* rc has # of words actually read */

    rc = sysctl_wire_old_buffer(req, 0);
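    /*
     * The userland buffer is wired before the (potentially large) queue
     * dump is generated so that draining the sbuf cannot fault.
     */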
if (rc != 0) goto done; sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req); if (sb == NULL) { rc = ENOMEM; goto done; } sbuf_printf(sb, "%s%d %s", qtype , qid, qname[arg2]); for (i = 0, p = buf; i < n; i += 16, p += 4) sbuf_printf(sb, "\n%#06x: %08x %08x %08x %08x", i, p[0], p[1], p[2], p[3]); rc = sbuf_finish(sb); sbuf_delete(sb); done: free(buf, M_CXGBE); return (rc); } static int sysctl_cim_la(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; u_int cfg; struct sbuf *sb; uint32_t *buf, *p; int rc; MPASS(chip_id(sc) <= CHELSIO_T5); rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg); if (rc != 0) return (rc); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK); rc = -t4_cim_read_la(sc, buf, NULL); if (rc != 0) goto done; sbuf_printf(sb, "Status Data PC%s", cfg & F_UPDBGLACAPTPCONLY ? "" : " LS0Stat LS0Addr LS0Data"); for (p = buf; p <= &buf[sc->params.cim_la_size - 8]; p += 8) { if (cfg & F_UPDBGLACAPTPCONLY) { sbuf_printf(sb, "\n %02x %08x %08x", p[5] & 0xff, p[6], p[7]); sbuf_printf(sb, "\n %02x %02x%06x %02x%06x", (p[3] >> 8) & 0xff, p[3] & 0xff, p[4] >> 8, p[4] & 0xff, p[5] >> 8); sbuf_printf(sb, "\n %02x %x%07x %x%07x", (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4, p[1] & 0xf, p[2] >> 4); } else { sbuf_printf(sb, "\n %02x %x%07x %x%07x %08x %08x " "%08x%08x%08x%08x", (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4, p[1] & 0xf, p[2] >> 4, p[2] & 0xf, p[3], p[4], p[5], p[6], p[7]); } } rc = sbuf_finish(sb); sbuf_delete(sb); done: free(buf, M_CXGBE); return (rc); } static int sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; u_int cfg; struct sbuf *sb; uint32_t *buf, *p; int rc; MPASS(chip_id(sc) > CHELSIO_T5); rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg); if (rc != 0) return (rc); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK); rc = -t4_cim_read_la(sc, buf, NULL); if (rc != 0) goto done; sbuf_printf(sb, "Status Inst Data PC%s", cfg & F_UPDBGLACAPTPCONLY ? 
"" : " LS0Stat LS0Addr LS0Data LS1Stat LS1Addr LS1Data"); for (p = buf; p <= &buf[sc->params.cim_la_size - 10]; p += 10) { if (cfg & F_UPDBGLACAPTPCONLY) { sbuf_printf(sb, "\n %02x %08x %08x %08x", p[3] & 0xff, p[2], p[1], p[0]); sbuf_printf(sb, "\n %02x %02x%06x %02x%06x %02x%06x", (p[6] >> 8) & 0xff, p[6] & 0xff, p[5] >> 8, p[5] & 0xff, p[4] >> 8, p[4] & 0xff, p[3] >> 8); sbuf_printf(sb, "\n %02x %04x%04x %04x%04x %04x%04x", (p[9] >> 16) & 0xff, p[9] & 0xffff, p[8] >> 16, p[8] & 0xffff, p[7] >> 16, p[7] & 0xffff, p[6] >> 16); } else { sbuf_printf(sb, "\n %02x %04x%04x %04x%04x %04x%04x " "%08x %08x %08x %08x %08x %08x", (p[9] >> 16) & 0xff, p[9] & 0xffff, p[8] >> 16, p[8] & 0xffff, p[7] >> 16, p[7] & 0xffff, p[6] >> 16, p[2], p[1], p[0], p[5], p[4], p[3]); } } rc = sbuf_finish(sb); sbuf_delete(sb); done: free(buf, M_CXGBE); return (rc); } static int sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; u_int i; struct sbuf *sb; uint32_t *buf, *p; int rc; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(2 * CIM_MALA_SIZE * 5 * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK); t4_cim_read_ma_la(sc, buf, buf + 5 * CIM_MALA_SIZE); p = buf; for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) { sbuf_printf(sb, "\n%02x%08x%08x%08x%08x", p[4], p[3], p[2], p[1], p[0]); } sbuf_printf(sb, "\n\nCnt ID Tag UE Data RDY VLD"); for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) { sbuf_printf(sb, "\n%3u %2u %x %u %08x%08x %u %u", (p[2] >> 10) & 0xff, (p[2] >> 7) & 7, (p[2] >> 3) & 0xf, (p[2] >> 2) & 1, (p[1] >> 2) | ((p[2] & 3) << 30), (p[0] >> 2) | ((p[1] & 3) << 30), (p[0] >> 1) & 1, p[0] & 1); } rc = sbuf_finish(sb); sbuf_delete(sb); free(buf, M_CXGBE); return (rc); } static int sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; u_int i; struct sbuf *sb; uint32_t *buf, *p; int rc; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(2 * CIM_PIFLA_SIZE * 6 * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK); t4_cim_read_pif_la(sc, buf, buf + 6 * CIM_PIFLA_SIZE, NULL, NULL); p = buf; sbuf_printf(sb, "Cntl ID DataBE Addr Data"); for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) { sbuf_printf(sb, "\n %02x %02x %04x %08x %08x%08x%08x%08x", (p[5] >> 22) & 0xff, (p[5] >> 16) & 0x3f, p[5] & 0xffff, p[4], p[3], p[2], p[1], p[0]); } sbuf_printf(sb, "\n\nCntl ID Data"); for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) { sbuf_printf(sb, "\n %02x %02x %08x%08x%08x%08x", (p[4] >> 6) & 0xff, p[4] & 0x3f, p[3], p[2], p[1], p[0]); } rc = sbuf_finish(sb); sbuf_delete(sb); free(buf, M_CXGBE); return (rc); } static int sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i; uint16_t base[CIM_NUM_IBQ + CIM_NUM_OBQ_T5]; uint16_t size[CIM_NUM_IBQ + CIM_NUM_OBQ_T5]; uint16_t thres[CIM_NUM_IBQ]; uint32_t obq_wr[2 * CIM_NUM_OBQ_T5], *wr = obq_wr; uint32_t stat[4 * (CIM_NUM_IBQ + CIM_NUM_OBQ_T5)], *p = stat; u_int cim_num_obq, ibq_rdaddr, obq_rdaddr, nq; cim_num_obq = sc->chip_params->cim_num_obq; if (is_t4(sc)) { ibq_rdaddr = A_UP_IBQ_0_RDADDR; obq_rdaddr = A_UP_OBQ_0_REALADDR; } else { ibq_rdaddr = A_UP_IBQ_0_SHADOW_RDADDR; obq_rdaddr = A_UP_OBQ_0_SHADOW_REALADDR; } nq = CIM_NUM_IBQ + cim_num_obq; rc = -t4_cim_read(sc, ibq_rdaddr, 4 * nq, stat); if (rc == 0) rc = -t4_cim_read(sc, obq_rdaddr, 2 * cim_num_obq, obq_wr); if (rc != 0) return (rc); t4_read_cimq_cfg(sc, base, size, 
thres); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "Queue Base Size Thres RdPtr WrPtr SOP EOP Avail"); for (i = 0; i < CIM_NUM_IBQ; i++, p += 4) sbuf_printf(sb, "\n%7s %5x %5u %5u %6x %4x %4u %4u %5u", qname[i], base[i], size[i], thres[i], G_IBQRDADDR(p[0]), G_IBQWRADDR(p[1]), G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]), G_QUEREMFLITS(p[2]) * 16); for ( ; i < nq; i++, p += 4, wr += 2) sbuf_printf(sb, "\n%7s %5x %5u %12x %4x %4u %4u %5u", qname[i], base[i], size[i], G_QUERDADDR(p[0]) & 0x3fff, wr[0] - base[i], G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]), G_QUEREMFLITS(p[2]) * 16); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_cpl_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_cpl_stats stats; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); mtx_lock(&sc->reg_lock); t4_tp_get_cpl_stats(sc, &stats); mtx_unlock(&sc->reg_lock); if (sc->chip_params->nchan > 2) { sbuf_printf(sb, " channel 0 channel 1" " channel 2 channel 3"); sbuf_printf(sb, "\nCPL requests: %10u %10u %10u %10u", stats.req[0], stats.req[1], stats.req[2], stats.req[3]); sbuf_printf(sb, "\nCPL responses: %10u %10u %10u %10u", stats.rsp[0], stats.rsp[1], stats.rsp[2], stats.rsp[3]); } else { sbuf_printf(sb, " channel 0 channel 1"); sbuf_printf(sb, "\nCPL requests: %10u %10u", stats.req[0], stats.req[1]); sbuf_printf(sb, "\nCPL responses: %10u %10u", stats.rsp[0], stats.rsp[1]); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_ddp_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_usm_stats stats; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return(rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); t4_get_usm_stats(sc, &stats); sbuf_printf(sb, "Frames: %u\n", stats.frames); sbuf_printf(sb, "Octets: %ju\n", stats.octets); sbuf_printf(sb, "Drops: %u", stats.drops); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static const char * const devlog_level_strings[] = { [FW_DEVLOG_LEVEL_EMERG] = "EMERG", [FW_DEVLOG_LEVEL_CRIT] = "CRIT", [FW_DEVLOG_LEVEL_ERR] = "ERR", [FW_DEVLOG_LEVEL_NOTICE] = "NOTICE", [FW_DEVLOG_LEVEL_INFO] = "INFO", [FW_DEVLOG_LEVEL_DEBUG] = "DEBUG" }; static const char * const devlog_facility_strings[] = { [FW_DEVLOG_FACILITY_CORE] = "CORE", [FW_DEVLOG_FACILITY_CF] = "CF", [FW_DEVLOG_FACILITY_SCHED] = "SCHED", [FW_DEVLOG_FACILITY_TIMER] = "TIMER", [FW_DEVLOG_FACILITY_RES] = "RES", [FW_DEVLOG_FACILITY_HW] = "HW", [FW_DEVLOG_FACILITY_FLR] = "FLR", [FW_DEVLOG_FACILITY_DMAQ] = "DMAQ", [FW_DEVLOG_FACILITY_PHY] = "PHY", [FW_DEVLOG_FACILITY_MAC] = "MAC", [FW_DEVLOG_FACILITY_PORT] = "PORT", [FW_DEVLOG_FACILITY_VI] = "VI", [FW_DEVLOG_FACILITY_FILTER] = "FILTER", [FW_DEVLOG_FACILITY_ACL] = "ACL", [FW_DEVLOG_FACILITY_TM] = "TM", [FW_DEVLOG_FACILITY_QFC] = "QFC", [FW_DEVLOG_FACILITY_DCB] = "DCB", [FW_DEVLOG_FACILITY_ETH] = "ETH", [FW_DEVLOG_FACILITY_OFLD] = "OFLD", [FW_DEVLOG_FACILITY_RI] = "RI", [FW_DEVLOG_FACILITY_ISCSI] = "ISCSI", [FW_DEVLOG_FACILITY_FCOE] = "FCOE", [FW_DEVLOG_FACILITY_FOISCSI] = "FOISCSI", [FW_DEVLOG_FACILITY_FOFCOE] = "FOFCOE", [FW_DEVLOG_FACILITY_CHNET] = "CHNET", }; static int sysctl_devlog(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct devlog_params *dparams = &sc->params.devlog; struct fw_devlog_e *buf, *e; int i, j, 
rc, nentries, first = 0; struct sbuf *sb; uint64_t ftstamp = UINT64_MAX; if (dparams->addr == 0) return (ENXIO); buf = malloc(dparams->size, M_CXGBE, M_NOWAIT); if (buf == NULL) return (ENOMEM); rc = read_via_memwin(sc, 1, dparams->addr, (void *)buf, dparams->size); if (rc != 0) goto done; nentries = dparams->size / sizeof(struct fw_devlog_e); for (i = 0; i < nentries; i++) { e = &buf[i]; if (e->timestamp == 0) break; /* end */ e->timestamp = be64toh(e->timestamp); e->seqno = be32toh(e->seqno); for (j = 0; j < 8; j++) e->params[j] = be32toh(e->params[j]); if (e->timestamp < ftstamp) { ftstamp = e->timestamp; first = i; } } if (buf[first].timestamp == 0) goto done; /* nothing in the log */ rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) goto done; sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) { rc = ENOMEM; goto done; } sbuf_printf(sb, "%10s %15s %8s %8s %s\n", "Seq#", "Tstamp", "Level", "Facility", "Message"); i = first; do { e = &buf[i]; if (e->timestamp == 0) break; /* end */ sbuf_printf(sb, "%10d %15ju %8s %8s ", e->seqno, e->timestamp, (e->level < nitems(devlog_level_strings) ? devlog_level_strings[e->level] : "UNKNOWN"), (e->facility < nitems(devlog_facility_strings) ? devlog_facility_strings[e->facility] : "UNKNOWN")); sbuf_printf(sb, e->fmt, e->params[0], e->params[1], e->params[2], e->params[3], e->params[4], e->params[5], e->params[6], e->params[7]); if (++i == nentries) i = 0; } while (i != first); rc = sbuf_finish(sb); sbuf_delete(sb); done: free(buf, M_CXGBE); return (rc); } static int sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_fcoe_stats stats[MAX_NCHAN]; int i, nchan = sc->chip_params->nchan; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); for (i = 0; i < nchan; i++) t4_get_fcoe_stats(sc, i, &stats[i]); if (nchan > 2) { sbuf_printf(sb, " channel 0 channel 1" " channel 2 channel 3"); sbuf_printf(sb, "\noctetsDDP: %16ju %16ju %16ju %16ju", stats[0].octets_ddp, stats[1].octets_ddp, stats[2].octets_ddp, stats[3].octets_ddp); sbuf_printf(sb, "\nframesDDP: %16u %16u %16u %16u", stats[0].frames_ddp, stats[1].frames_ddp, stats[2].frames_ddp, stats[3].frames_ddp); sbuf_printf(sb, "\nframesDrop: %16u %16u %16u %16u", stats[0].frames_drop, stats[1].frames_drop, stats[2].frames_drop, stats[3].frames_drop); } else { sbuf_printf(sb, " channel 0 channel 1"); sbuf_printf(sb, "\noctetsDDP: %16ju %16ju", stats[0].octets_ddp, stats[1].octets_ddp); sbuf_printf(sb, "\nframesDDP: %16u %16u", stats[0].frames_ddp, stats[1].frames_ddp); sbuf_printf(sb, "\nframesDrop: %16u %16u", stats[0].frames_drop, stats[1].frames_drop); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_hw_sched(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i; unsigned int map, kbps, ipg, mode; unsigned int pace_tab[NTX_SCHED]; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); map = t4_read_reg(sc, A_TP_TX_MOD_QUEUE_REQ_MAP); mode = G_TIMERMODE(t4_read_reg(sc, A_TP_MOD_CONFIG)); t4_read_pace_tbl(sc, pace_tab); sbuf_printf(sb, "Scheduler Mode Channel Rate (Kbps) " "Class IPG (0.1 ns) Flow IPG (us)"); for (i = 0; i < NTX_SCHED; ++i, map >>= 2) { t4_get_tx_sched(sc, i, &kbps, &ipg); sbuf_printf(sb, "\n %u %-5s %u ", i, (mode & (1 << i)) ? 
"flow" : "class", map & 3); if (kbps) sbuf_printf(sb, "%9u ", kbps); else sbuf_printf(sb, " disabled "); if (ipg) sbuf_printf(sb, "%13u ", ipg); else sbuf_printf(sb, " disabled "); if (pace_tab[i]) sbuf_printf(sb, "%10u", pace_tab[i]); else sbuf_printf(sb, " disabled"); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_lb_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i, j; uint64_t *p0, *p1; struct lb_port_stats s[2]; static const char *stat_name[] = { "OctetsOK:", "FramesOK:", "BcastFrames:", "McastFrames:", "UcastFrames:", "ErrorFrames:", "Frames64:", "Frames65To127:", "Frames128To255:", "Frames256To511:", "Frames512To1023:", "Frames1024To1518:", "Frames1519ToMax:", "FramesDropped:", "BG0FramesDropped:", "BG1FramesDropped:", "BG2FramesDropped:", "BG3FramesDropped:", "BG0FramesTrunc:", "BG1FramesTrunc:", "BG2FramesTrunc:", "BG3FramesTrunc:" }; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); memset(s, 0, sizeof(s)); for (i = 0; i < sc->chip_params->nchan; i += 2) { t4_get_lb_stats(sc, i, &s[0]); t4_get_lb_stats(sc, i + 1, &s[1]); p0 = &s[0].octets; p1 = &s[1].octets; sbuf_printf(sb, "%s Loopback %u" " Loopback %u", i == 0 ? "" : "\n", i, i + 1); for (j = 0; j < nitems(stat_name); j++) sbuf_printf(sb, "\n%-17s %20ju %20ju", stat_name[j], *p0++, *p1++); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_linkdnrc(SYSCTL_HANDLER_ARGS) { int rc = 0; struct port_info *pi = arg1; struct sbuf *sb; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return(rc); sb = sbuf_new_for_sysctl(NULL, NULL, 64, req); if (sb == NULL) return (ENOMEM); if (pi->linkdnrc < 0) sbuf_printf(sb, "n/a"); else sbuf_printf(sb, "%s", t4_link_down_rc_str(pi->linkdnrc)); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } struct mem_desc { unsigned int base; unsigned int limit; unsigned int idx; }; static int mem_desc_cmp(const void *a, const void *b) { return ((const struct mem_desc *)a)->base - ((const struct mem_desc *)b)->base; } static void mem_region_show(struct sbuf *sb, const char *name, unsigned int from, unsigned int to) { unsigned int size; if (from == to) return; size = to - from + 1; if (size == 0) return; /* XXX: need humanize_number(3) in libkern for a more readable 'size' */ sbuf_printf(sb, "%-15s %#x-%#x [%u]\n", name, from, to, size); } static int sysctl_meminfo(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i, n; uint32_t lo, hi, used, alloc; static const char *memory[] = {"EDC0:", "EDC1:", "MC:", "MC0:", "MC1:"}; static const char *region[] = { "DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:", "Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:", "Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:", "TDDP region:", "TPT region:", "STAG region:", "RQ region:", "RQUDP region:", "PBL region:", "TXPBL region:", "DBVFIFO region:", "ULPRX state:", "ULPTX state:", "On-chip queues:" }; struct mem_desc avail[4]; struct mem_desc mem[nitems(region) + 3]; /* up to 3 holes */ struct mem_desc *md = mem; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); for (i = 0; i < nitems(mem); i++) { mem[i].limit = 0; mem[i].idx = i; } /* Find and sort the populated memory ranges */ i = 0; lo = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE); if (lo & F_EDRAM0_ENABLE) { hi = t4_read_reg(sc, A_MA_EDRAM0_BAR); avail[i].base = 
G_EDRAM0_BASE(hi) << 20; avail[i].limit = avail[i].base + (G_EDRAM0_SIZE(hi) << 20); avail[i].idx = 0; i++; } if (lo & F_EDRAM1_ENABLE) { hi = t4_read_reg(sc, A_MA_EDRAM1_BAR); avail[i].base = G_EDRAM1_BASE(hi) << 20; avail[i].limit = avail[i].base + (G_EDRAM1_SIZE(hi) << 20); avail[i].idx = 1; i++; } if (lo & F_EXT_MEM_ENABLE) { hi = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR); avail[i].base = G_EXT_MEM_BASE(hi) << 20; avail[i].limit = avail[i].base + (G_EXT_MEM_SIZE(hi) << 20); avail[i].idx = is_t5(sc) ? 3 : 2; /* Call it MC0 for T5 */ i++; } if (is_t5(sc) && lo & F_EXT_MEM1_ENABLE) { hi = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR); avail[i].base = G_EXT_MEM1_BASE(hi) << 20; avail[i].limit = avail[i].base + (G_EXT_MEM1_SIZE(hi) << 20); avail[i].idx = 4; i++; } if (!i) /* no memory available */ return 0; qsort(avail, i, sizeof(struct mem_desc), mem_desc_cmp); (md++)->base = t4_read_reg(sc, A_SGE_DBQ_CTXT_BADDR); (md++)->base = t4_read_reg(sc, A_SGE_IMSG_CTXT_BADDR); (md++)->base = t4_read_reg(sc, A_SGE_FLM_CACHE_BADDR); (md++)->base = t4_read_reg(sc, A_TP_CMM_TCB_BASE); (md++)->base = t4_read_reg(sc, A_TP_CMM_MM_BASE); (md++)->base = t4_read_reg(sc, A_TP_CMM_TIMER_BASE); (md++)->base = t4_read_reg(sc, A_TP_CMM_MM_RX_FLST_BASE); (md++)->base = t4_read_reg(sc, A_TP_CMM_MM_TX_FLST_BASE); (md++)->base = t4_read_reg(sc, A_TP_CMM_MM_PS_FLST_BASE); /* the next few have explicit upper bounds */ md->base = t4_read_reg(sc, A_TP_PMM_TX_BASE); md->limit = md->base - 1 + t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE) * G_PMTXMAXPAGE(t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE)); md++; md->base = t4_read_reg(sc, A_TP_PMM_RX_BASE); md->limit = md->base - 1 + t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) * G_PMRXMAXPAGE(t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE)); md++; if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) { if (chip_id(sc) <= CHELSIO_T5) md->base = t4_read_reg(sc, A_LE_DB_HASH_TID_BASE); else md->base = t4_read_reg(sc, A_LE_DB_HASH_TBL_BASE_ADDR); md->limit = 0; } else { md->base = 0; md->idx = nitems(region); /* hide it */ } md++; #define ulp_region(reg) \ md->base = t4_read_reg(sc, A_ULP_ ## reg ## _LLIMIT);\ (md++)->limit = t4_read_reg(sc, A_ULP_ ## reg ## _ULIMIT) ulp_region(RX_ISCSI); ulp_region(RX_TDDP); ulp_region(TX_TPT); ulp_region(RX_STAG); ulp_region(RX_RQ); ulp_region(RX_RQUDP); ulp_region(RX_PBL); ulp_region(TX_PBL); #undef ulp_region md->base = 0; md->idx = nitems(region); if (!is_t4(sc)) { uint32_t size = 0; uint32_t sge_ctrl = t4_read_reg(sc, A_SGE_CONTROL2); uint32_t fifo_size = t4_read_reg(sc, A_SGE_DBVFIFO_SIZE); if (is_t5(sc)) { if (sge_ctrl & F_VFIFO_ENABLE) size = G_DBVFIFO_SIZE(fifo_size); } else size = G_T6_DBVFIFO_SIZE(fifo_size); if (size) { md->base = G_BASEADDR(t4_read_reg(sc, A_SGE_DBVFIFO_BADDR)); md->limit = md->base + (size << 2) - 1; } } md++; md->base = t4_read_reg(sc, A_ULP_RX_CTX_BASE); md->limit = 0; md++; md->base = t4_read_reg(sc, A_ULP_TX_ERR_TABLE_BASE); md->limit = 0; md++; md->base = sc->vres.ocq.start; if (sc->vres.ocq.size) md->limit = md->base + sc->vres.ocq.size - 1; else md->idx = nitems(region); /* hide it */ md++; /* add any address-space holes, there can be up to 3 */ for (n = 0; n < i - 1; n++) if (avail[n].limit < avail[n + 1].base) (md++)->base = avail[n].limit; if (avail[n].limit) (md++)->base = avail[n].limit; n = md - mem; qsort(mem, n, sizeof(struct mem_desc), mem_desc_cmp); for (lo = 0; lo < i; lo++) mem_region_show(sb, memory[avail[lo].idx], avail[lo].base, avail[lo].limit - 1); sbuf_printf(sb, "\n"); for (i = 0; i < n; i++) { if (mem[i].idx >= nitems(region)) continue; 
/* skip holes */ if (!mem[i].limit) mem[i].limit = i < n - 1 ? mem[i + 1].base - 1 : ~0; mem_region_show(sb, region[mem[i].idx], mem[i].base, mem[i].limit); } sbuf_printf(sb, "\n"); lo = t4_read_reg(sc, A_CIM_SDRAM_BASE_ADDR); hi = t4_read_reg(sc, A_CIM_SDRAM_ADDR_SIZE) + lo - 1; mem_region_show(sb, "uP RAM:", lo, hi); lo = t4_read_reg(sc, A_CIM_EXTMEM2_BASE_ADDR); hi = t4_read_reg(sc, A_CIM_EXTMEM2_ADDR_SIZE) + lo - 1; mem_region_show(sb, "uP Extmem2:", lo, hi); lo = t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE); sbuf_printf(sb, "\n%u Rx pages of size %uKiB for %u channels\n", G_PMRXMAXPAGE(lo), t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) >> 10, (lo & F_PMRXNUMCHN) ? 2 : 1); lo = t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE); hi = t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE); sbuf_printf(sb, "%u Tx pages of size %u%ciB for %u channels\n", G_PMTXMAXPAGE(lo), hi >= (1 << 20) ? (hi >> 20) : (hi >> 10), hi >= (1 << 20) ? 'M' : 'K', 1 << G_PMTXNUMCHN(lo)); sbuf_printf(sb, "%u p-structs\n", t4_read_reg(sc, A_TP_CMM_MM_MAX_PSTRUCT)); for (i = 0; i < 4; i++) { if (chip_id(sc) > CHELSIO_T5) lo = t4_read_reg(sc, A_MPS_RX_MAC_BG_PG_CNT0 + i * 4); else lo = t4_read_reg(sc, A_MPS_RX_PG_RSV0 + i * 4); if (is_t5(sc)) { used = G_T5_USED(lo); alloc = G_T5_ALLOC(lo); } else { used = G_USED(lo); alloc = G_ALLOC(lo); } /* For T6 these are MAC buffer groups */ sbuf_printf(sb, "\nPort %d using %u pages out of %u allocated", i, used, alloc); } for (i = 0; i < sc->chip_params->nchan; i++) { if (chip_id(sc) > CHELSIO_T5) lo = t4_read_reg(sc, A_MPS_RX_LPBK_BG_PG_CNT0 + i * 4); else lo = t4_read_reg(sc, A_MPS_RX_PG_RSV4 + i * 4); if (is_t5(sc)) { used = G_T5_USED(lo); alloc = G_T5_ALLOC(lo); } else { used = G_USED(lo); alloc = G_ALLOC(lo); } /* For T6 these are MAC buffer groups */ sbuf_printf(sb, "\nLoopback %d using %u pages out of %u allocated", i, used, alloc); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static inline void tcamxy2valmask(uint64_t x, uint64_t y, uint8_t *addr, uint64_t *mask) { *mask = x | y; y = htobe64(y); memcpy(addr, (char *)&y + 2, ETHER_ADDR_LEN); } static int sysctl_mps_tcam(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i; MPASS(chip_id(sc) <= CHELSIO_T5); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "Idx Ethernet address Mask Vld Ports PF" " VF Replication P0 P1 P2 P3 ML"); for (i = 0; i < sc->chip_params->mps_tcam_size; i++) { uint64_t tcamx, tcamy, mask; uint32_t cls_lo, cls_hi; uint8_t addr[ETHER_ADDR_LEN]; tcamy = t4_read_reg64(sc, MPS_CLS_TCAM_Y_L(i)); tcamx = t4_read_reg64(sc, MPS_CLS_TCAM_X_L(i)); if (tcamx & tcamy) continue; tcamxy2valmask(tcamx, tcamy, addr, &mask); cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i)); cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i)); sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x %012jx" " %c %#x%4u%4d", i, addr[0], addr[1], addr[2], addr[3], addr[4], addr[5], (uintmax_t)mask, (cls_lo & F_SRAM_VLD) ? 'Y' : 'N', G_PORTMAP(cls_hi), G_PF(cls_lo), (cls_lo & F_VF_VALID) ? 
G_VF(cls_lo) : -1); if (cls_lo & F_REPLICATE) { struct fw_ldst_cmd ldst_cmd; memset(&ldst_cmd, 0, sizeof(ldst_cmd)); ldst_cmd.op_to_addrspace = htobe32(V_FW_CMD_OP(FW_LDST_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_READ | V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS)); ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd)); ldst_cmd.u.mps.rplc.fid_idx = htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) | V_FW_LDST_CMD_IDX(i)); rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4mps"); if (rc) break; rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd, sizeof(ldst_cmd), &ldst_cmd); end_synchronized_op(sc, 0); if (rc != 0) { sbuf_printf(sb, "%36d", rc); rc = 0; } else { sbuf_printf(sb, " %08x %08x %08x %08x", be32toh(ldst_cmd.u.mps.rplc.rplc127_96), be32toh(ldst_cmd.u.mps.rplc.rplc95_64), be32toh(ldst_cmd.u.mps.rplc.rplc63_32), be32toh(ldst_cmd.u.mps.rplc.rplc31_0)); } } else sbuf_printf(sb, "%36s", ""); sbuf_printf(sb, "%4u%3u%3u%3u %#3x", G_SRAM_PRIO0(cls_lo), G_SRAM_PRIO1(cls_lo), G_SRAM_PRIO2(cls_lo), G_SRAM_PRIO3(cls_lo), (cls_lo >> S_MULTILISTEN0) & 0xf); } if (rc) (void) sbuf_finish(sb); else rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i; MPASS(chip_id(sc) > CHELSIO_T5); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "Idx Ethernet address Mask VNI Mask" " IVLAN Vld DIP_Hit Lookup Port Vld Ports PF VF" " Replication" " P0 P1 P2 P3 ML\n"); for (i = 0; i < sc->chip_params->mps_tcam_size; i++) { uint8_t dip_hit, vlan_vld, lookup_type, port_num; uint16_t ivlan; uint64_t tcamx, tcamy, val, mask; uint32_t cls_lo, cls_hi, ctl, data2, vnix, vniy; uint8_t addr[ETHER_ADDR_LEN]; ctl = V_CTLREQID(1) | V_CTLCMDTYPE(0) | V_CTLXYBITSEL(0); if (i < 256) ctl |= V_CTLTCAMINDEX(i) | V_CTLTCAMSEL(0); else ctl |= V_CTLTCAMINDEX(i - 256) | V_CTLTCAMSEL(1); t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl); val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1); tcamy = G_DMACH(val) << 32; tcamy |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1); data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1); lookup_type = G_DATALKPTYPE(data2); port_num = G_DATAPORTNUM(data2); if (lookup_type && lookup_type != M_DATALKPTYPE) { /* Inner header VNI */ vniy = ((data2 & F_DATAVIDH2) << 23) | (G_DATAVIDH1(data2) << 16) | G_VIDL(val); dip_hit = data2 & F_DATADIPHIT; vlan_vld = 0; } else { vniy = 0; dip_hit = 0; vlan_vld = data2 & F_DATAVIDH2; ivlan = G_VIDL(val); } ctl |= V_CTLXYBITSEL(1); t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl); val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1); tcamx = G_DMACH(val) << 32; tcamx |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1); data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1); if (lookup_type && lookup_type != M_DATALKPTYPE) { /* Inner header VNI mask */ vnix = ((data2 & F_DATAVIDH2) << 23) | (G_DATAVIDH1(data2) << 16) | G_VIDL(val); } else vnix = 0; if (tcamx & tcamy) continue; tcamxy2valmask(tcamx, tcamy, addr, &mask); cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i)); cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i)); if (lookup_type && lookup_type != M_DATALKPTYPE) { sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x " "%012jx %06x %06x - - %3c" " 'I' %4x %3c %#x%4u%4d", i, addr[0], addr[1], addr[2], addr[3], addr[4], addr[5], (uintmax_t)mask, vniy, vnix, dip_hit ? 'Y' : 'N', port_num, cls_lo & F_T6_SRAM_VLD ? 
'Y' : 'N', G_PORTMAP(cls_hi), G_T6_PF(cls_lo), cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1); } else { sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x " "%012jx - - ", i, addr[0], addr[1], addr[2], addr[3], addr[4], addr[5], (uintmax_t)mask); if (vlan_vld) sbuf_printf(sb, "%4u Y ", ivlan); else sbuf_printf(sb, " - N "); sbuf_printf(sb, "- %3c %4x %3c %#x%4u%4d", lookup_type ? 'I' : 'O', port_num, cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N', G_PORTMAP(cls_hi), G_T6_PF(cls_lo), cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1); } if (cls_lo & F_T6_REPLICATE) { struct fw_ldst_cmd ldst_cmd; memset(&ldst_cmd, 0, sizeof(ldst_cmd)); ldst_cmd.op_to_addrspace = htobe32(V_FW_CMD_OP(FW_LDST_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_READ | V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS)); ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd)); ldst_cmd.u.mps.rplc.fid_idx = htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) | V_FW_LDST_CMD_IDX(i)); rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t6mps"); if (rc) break; rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd, sizeof(ldst_cmd), &ldst_cmd); end_synchronized_op(sc, 0); if (rc != 0) { sbuf_printf(sb, "%72d", rc); rc = 0; } else { sbuf_printf(sb, " %08x %08x %08x %08x" " %08x %08x %08x %08x", be32toh(ldst_cmd.u.mps.rplc.rplc255_224), be32toh(ldst_cmd.u.mps.rplc.rplc223_192), be32toh(ldst_cmd.u.mps.rplc.rplc191_160), be32toh(ldst_cmd.u.mps.rplc.rplc159_128), be32toh(ldst_cmd.u.mps.rplc.rplc127_96), be32toh(ldst_cmd.u.mps.rplc.rplc95_64), be32toh(ldst_cmd.u.mps.rplc.rplc63_32), be32toh(ldst_cmd.u.mps.rplc.rplc31_0)); } } else sbuf_printf(sb, "%72s", ""); sbuf_printf(sb, "%4u%3u%3u%3u %#x", G_T6_SRAM_PRIO0(cls_lo), G_T6_SRAM_PRIO1(cls_lo), G_T6_SRAM_PRIO2(cls_lo), G_T6_SRAM_PRIO3(cls_lo), (cls_lo >> S_T6_MULTILISTEN0) & 0xf); } if (rc) (void) sbuf_finish(sb); else rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_path_mtus(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; uint16_t mtus[NMTUS]; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); t4_read_mtu_tbl(sc, mtus, NULL); sbuf_printf(sb, "%u %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u", mtus[0], mtus[1], mtus[2], mtus[3], mtus[4], mtus[5], mtus[6], mtus[7], mtus[8], mtus[9], mtus[10], mtus[11], mtus[12], mtus[13], mtus[14], mtus[15]); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_pm_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, i; uint32_t tx_cnt[MAX_PM_NSTATS], rx_cnt[MAX_PM_NSTATS]; uint64_t tx_cyc[MAX_PM_NSTATS], rx_cyc[MAX_PM_NSTATS]; static const char *tx_stats[MAX_PM_NSTATS] = { "Read:", "Write bypass:", "Write mem:", "Bypass + mem:", "Tx FIFO wait", NULL, "Tx latency" }; static const char *rx_stats[MAX_PM_NSTATS] = { "Read:", "Write bypass:", "Write mem:", "Flush:", " Rx FIFO wait", NULL, "Rx latency" }; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); t4_pmtx_get_stats(sc, tx_cnt, tx_cyc); t4_pmrx_get_stats(sc, rx_cnt, rx_cyc); sbuf_printf(sb, " Tx pcmds Tx bytes"); for (i = 0; i < 4; i++) { sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i], tx_cyc[i]); } sbuf_printf(sb, "\n Rx pcmds Rx bytes"); for (i = 0; i < 4; i++) { sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i], rx_cyc[i]); } if (chip_id(sc) > CHELSIO_T5) { sbuf_printf(sb, "\n Total wait Total occupancy"); sbuf_printf(sb, "\n%-13s %10u 
%20ju", tx_stats[i], tx_cnt[i], tx_cyc[i]); sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i], rx_cyc[i]); i += 2; MPASS(i < nitems(tx_stats)); sbuf_printf(sb, "\n Reads Total wait"); sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i], tx_cyc[i]); sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i], rx_cyc[i]); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_rdma_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_rdma_stats stats; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); mtx_lock(&sc->reg_lock); t4_tp_get_rdma_stats(sc, &stats); mtx_unlock(&sc->reg_lock); sbuf_printf(sb, "NoRQEModDefferals: %u\n", stats.rqe_dfr_mod); sbuf_printf(sb, "NoRQEPktDefferals: %u", stats.rqe_dfr_pkt); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_tcp_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_tcp_stats v4, v6; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); mtx_lock(&sc->reg_lock); t4_tp_get_tcp_stats(sc, &v4, &v6); mtx_unlock(&sc->reg_lock); sbuf_printf(sb, " IP IPv6\n"); sbuf_printf(sb, "OutRsts: %20u %20u\n", v4.tcp_out_rsts, v6.tcp_out_rsts); sbuf_printf(sb, "InSegs: %20ju %20ju\n", v4.tcp_in_segs, v6.tcp_in_segs); sbuf_printf(sb, "OutSegs: %20ju %20ju\n", v4.tcp_out_segs, v6.tcp_out_segs); sbuf_printf(sb, "RetransSegs: %20ju %20ju", v4.tcp_retrans_segs, v6.tcp_retrans_segs); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_tids(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tid_info *t = &sc->tids; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); if (t->natids) { sbuf_printf(sb, "ATID range: 0-%u, in use: %u\n", t->natids - 1, t->atids_in_use); } if (t->ntids) { if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) { uint32_t b = t4_read_reg(sc, A_LE_DB_SERVER_INDEX) / 4; if (b) { sbuf_printf(sb, "TID range: 0-%u, %u-%u", b - 1, t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4, t->ntids - 1); } else { sbuf_printf(sb, "TID range: %u-%u", t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4, t->ntids - 1); } } else sbuf_printf(sb, "TID range: 0-%u", t->ntids - 1); sbuf_printf(sb, ", in use: %u\n", atomic_load_acq_int(&t->tids_in_use)); } if (t->nstids) { sbuf_printf(sb, "STID range: %u-%u, in use: %u\n", t->stid_base, t->stid_base + t->nstids - 1, t->stids_in_use); } if (t->nftids) { sbuf_printf(sb, "FTID range: %u-%u\n", t->ftid_base, t->ftid_base + t->nftids - 1); } if (t->netids) { sbuf_printf(sb, "ETID range: %u-%u\n", t->etid_base, t->etid_base + t->netids - 1); } sbuf_printf(sb, "HW TID usage: %u IP users, %u IPv6 users", t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV4), t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV6)); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; struct tp_err_stats stats; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); mtx_lock(&sc->reg_lock); t4_tp_get_err_stats(sc, &stats); mtx_unlock(&sc->reg_lock); if (sc->chip_params->nchan > 2) { sbuf_printf(sb, " channel 0 channel 1" " channel 2 channel 3\n"); 
sbuf_printf(sb, "macInErrs: %10u %10u %10u %10u\n", stats.mac_in_errs[0], stats.mac_in_errs[1], stats.mac_in_errs[2], stats.mac_in_errs[3]); sbuf_printf(sb, "hdrInErrs: %10u %10u %10u %10u\n", stats.hdr_in_errs[0], stats.hdr_in_errs[1], stats.hdr_in_errs[2], stats.hdr_in_errs[3]); sbuf_printf(sb, "tcpInErrs: %10u %10u %10u %10u\n", stats.tcp_in_errs[0], stats.tcp_in_errs[1], stats.tcp_in_errs[2], stats.tcp_in_errs[3]); sbuf_printf(sb, "tcp6InErrs: %10u %10u %10u %10u\n", stats.tcp6_in_errs[0], stats.tcp6_in_errs[1], stats.tcp6_in_errs[2], stats.tcp6_in_errs[3]); sbuf_printf(sb, "tnlCongDrops: %10u %10u %10u %10u\n", stats.tnl_cong_drops[0], stats.tnl_cong_drops[1], stats.tnl_cong_drops[2], stats.tnl_cong_drops[3]); sbuf_printf(sb, "tnlTxDrops: %10u %10u %10u %10u\n", stats.tnl_tx_drops[0], stats.tnl_tx_drops[1], stats.tnl_tx_drops[2], stats.tnl_tx_drops[3]); sbuf_printf(sb, "ofldVlanDrops: %10u %10u %10u %10u\n", stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1], stats.ofld_vlan_drops[2], stats.ofld_vlan_drops[3]); sbuf_printf(sb, "ofldChanDrops: %10u %10u %10u %10u\n\n", stats.ofld_chan_drops[0], stats.ofld_chan_drops[1], stats.ofld_chan_drops[2], stats.ofld_chan_drops[3]); } else { sbuf_printf(sb, " channel 0 channel 1\n"); sbuf_printf(sb, "macInErrs: %10u %10u\n", stats.mac_in_errs[0], stats.mac_in_errs[1]); sbuf_printf(sb, "hdrInErrs: %10u %10u\n", stats.hdr_in_errs[0], stats.hdr_in_errs[1]); sbuf_printf(sb, "tcpInErrs: %10u %10u\n", stats.tcp_in_errs[0], stats.tcp_in_errs[1]); sbuf_printf(sb, "tcp6InErrs: %10u %10u\n", stats.tcp6_in_errs[0], stats.tcp6_in_errs[1]); sbuf_printf(sb, "tnlCongDrops: %10u %10u\n", stats.tnl_cong_drops[0], stats.tnl_cong_drops[1]); sbuf_printf(sb, "tnlTxDrops: %10u %10u\n", stats.tnl_tx_drops[0], stats.tnl_tx_drops[1]); sbuf_printf(sb, "ofldVlanDrops: %10u %10u\n", stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1]); sbuf_printf(sb, "ofldChanDrops: %10u %10u\n\n", stats.ofld_chan_drops[0], stats.ofld_chan_drops[1]); } sbuf_printf(sb, "ofldNoNeigh: %u\nofldCongDefer: %u", stats.ofld_no_neigh, stats.ofld_cong_defer); rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct tp_params *tpp = &sc->params.tp; u_int mask; int rc; mask = tpp->la_mask >> 16; rc = sysctl_handle_int(oidp, &mask, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); if (mask > 0xffff) return (EINVAL); tpp->la_mask = mask << 16; t4_set_reg_field(sc, A_TP_DBG_LA_CONFIG, 0xffff0000U, tpp->la_mask); return (0); } struct field_desc { const char *name; u_int start; u_int width; }; static void field_desc_show(struct sbuf *sb, uint64_t v, const struct field_desc *f) { char buf[32]; int line_size = 0; while (f->name) { uint64_t mask = (1ULL << f->width) - 1; int len = snprintf(buf, sizeof(buf), "%s: %ju", f->name, ((uintmax_t)v >> f->start) & mask); if (line_size + len >= 79) { line_size = 8; sbuf_printf(sb, "\n "); } sbuf_printf(sb, "%s ", buf); line_size += len + 1; f++; } sbuf_printf(sb, "\n"); } static const struct field_desc tp_la0[] = { { "RcfOpCodeOut", 60, 4 }, { "State", 56, 4 }, { "WcfState", 52, 4 }, { "RcfOpcSrcOut", 50, 2 }, { "CRxError", 49, 1 }, { "ERxError", 48, 1 }, { "SanityFailed", 47, 1 }, { "SpuriousMsg", 46, 1 }, { "FlushInputMsg", 45, 1 }, { "FlushInputCpl", 44, 1 }, { "RssUpBit", 43, 1 }, { "RssFilterHit", 42, 1 }, { "Tid", 32, 10 }, { "InitTcb", 31, 1 }, { "LineNumber", 24, 7 }, { "Emsg", 23, 1 }, { "EdataOut", 22, 1 }, { "Cmsg", 21, 1 }, { "CdataOut", 20, 1 }, { "EreadPdu", 19, 1 }, 
{ "CreadPdu", 18, 1 }, { "TunnelPkt", 17, 1 }, { "RcfPeerFin", 16, 1 }, { "RcfReasonOut", 12, 4 }, { "TxCchannel", 10, 2 }, { "RcfTxChannel", 8, 2 }, { "RxEchannel", 6, 2 }, { "RcfRxChannel", 5, 1 }, { "RcfDataOutSrdy", 4, 1 }, { "RxDvld", 3, 1 }, { "RxOoDvld", 2, 1 }, { "RxCongestion", 1, 1 }, { "TxCongestion", 0, 1 }, { NULL } }; static const struct field_desc tp_la1[] = { { "CplCmdIn", 56, 8 }, { "CplCmdOut", 48, 8 }, { "ESynOut", 47, 1 }, { "EAckOut", 46, 1 }, { "EFinOut", 45, 1 }, { "ERstOut", 44, 1 }, { "SynIn", 43, 1 }, { "AckIn", 42, 1 }, { "FinIn", 41, 1 }, { "RstIn", 40, 1 }, { "DataIn", 39, 1 }, { "DataInVld", 38, 1 }, { "PadIn", 37, 1 }, { "RxBufEmpty", 36, 1 }, { "RxDdp", 35, 1 }, { "RxFbCongestion", 34, 1 }, { "TxFbCongestion", 33, 1 }, { "TxPktSumSrdy", 32, 1 }, { "RcfUlpType", 28, 4 }, { "Eread", 27, 1 }, { "Ebypass", 26, 1 }, { "Esave", 25, 1 }, { "Static0", 24, 1 }, { "Cread", 23, 1 }, { "Cbypass", 22, 1 }, { "Csave", 21, 1 }, { "CPktOut", 20, 1 }, { "RxPagePoolFull", 18, 2 }, { "RxLpbkPkt", 17, 1 }, { "TxLpbkPkt", 16, 1 }, { "RxVfValid", 15, 1 }, { "SynLearned", 14, 1 }, { "SetDelEntry", 13, 1 }, { "SetInvEntry", 12, 1 }, { "CpcmdDvld", 11, 1 }, { "CpcmdSave", 10, 1 }, { "RxPstructsFull", 8, 2 }, { "EpcmdDvld", 7, 1 }, { "EpcmdFlush", 6, 1 }, { "EpcmdTrimPrefix", 5, 1 }, { "EpcmdTrimPostfix", 4, 1 }, { "ERssIp4Pkt", 3, 1 }, { "ERssIp6Pkt", 2, 1 }, { "ERssTcpUdpPkt", 1, 1 }, { "ERssFceFipPkt", 0, 1 }, { NULL } }; static const struct field_desc tp_la2[] = { { "CplCmdIn", 56, 8 }, { "MpsVfVld", 55, 1 }, { "MpsPf", 52, 3 }, { "MpsVf", 44, 8 }, { "SynIn", 43, 1 }, { "AckIn", 42, 1 }, { "FinIn", 41, 1 }, { "RstIn", 40, 1 }, { "DataIn", 39, 1 }, { "DataInVld", 38, 1 }, { "PadIn", 37, 1 }, { "RxBufEmpty", 36, 1 }, { "RxDdp", 35, 1 }, { "RxFbCongestion", 34, 1 }, { "TxFbCongestion", 33, 1 }, { "TxPktSumSrdy", 32, 1 }, { "RcfUlpType", 28, 4 }, { "Eread", 27, 1 }, { "Ebypass", 26, 1 }, { "Esave", 25, 1 }, { "Static0", 24, 1 }, { "Cread", 23, 1 }, { "Cbypass", 22, 1 }, { "Csave", 21, 1 }, { "CPktOut", 20, 1 }, { "RxPagePoolFull", 18, 2 }, { "RxLpbkPkt", 17, 1 }, { "TxLpbkPkt", 16, 1 }, { "RxVfValid", 15, 1 }, { "SynLearned", 14, 1 }, { "SetDelEntry", 13, 1 }, { "SetInvEntry", 12, 1 }, { "CpcmdDvld", 11, 1 }, { "CpcmdSave", 10, 1 }, { "RxPstructsFull", 8, 2 }, { "EpcmdDvld", 7, 1 }, { "EpcmdFlush", 6, 1 }, { "EpcmdTrimPrefix", 5, 1 }, { "EpcmdTrimPostfix", 4, 1 }, { "ERssIp4Pkt", 3, 1 }, { "ERssIp6Pkt", 2, 1 }, { "ERssTcpUdpPkt", 1, 1 }, { "ERssFceFipPkt", 0, 1 }, { NULL } }; static void tp_la_show(struct sbuf *sb, uint64_t *p, int idx) { field_desc_show(sb, *p, tp_la0); } static void tp_la_show2(struct sbuf *sb, uint64_t *p, int idx) { if (idx) sbuf_printf(sb, "\n"); field_desc_show(sb, p[0], tp_la0); if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL) field_desc_show(sb, p[1], tp_la0); } static void tp_la_show3(struct sbuf *sb, uint64_t *p, int idx) { if (idx) sbuf_printf(sb, "\n"); field_desc_show(sb, p[0], tp_la0); if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL) field_desc_show(sb, p[1], (p[0] & (1 << 17)) ? 
tp_la2 : tp_la1); } static int sysctl_tp_la(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; uint64_t *buf, *p; int rc; u_int i, inc; void (*show_func)(struct sbuf *, uint64_t *, int); rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(TPLA_SIZE * sizeof(uint64_t), M_CXGBE, M_ZERO | M_WAITOK); t4_tp_read_la(sc, buf, NULL); p = buf; switch (G_DBGLAMODE(t4_read_reg(sc, A_TP_DBG_LA_CONFIG))) { case 2: inc = 2; show_func = tp_la_show2; break; case 3: inc = 2; show_func = tp_la_show3; break; default: inc = 1; show_func = tp_la_show; } for (i = 0; i < TPLA_SIZE / inc; i++, p += inc) (*show_func)(sb, p, i); rc = sbuf_finish(sb); sbuf_delete(sb); free(buf, M_CXGBE); return (rc); } static int sysctl_tx_rate(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc; u64 nrate[MAX_NCHAN], orate[MAX_NCHAN]; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 256, req); if (sb == NULL) return (ENOMEM); t4_get_chan_txrate(sc, nrate, orate); if (sc->chip_params->nchan > 2) { sbuf_printf(sb, " channel 0 channel 1" " channel 2 channel 3\n"); sbuf_printf(sb, "NIC B/s: %10ju %10ju %10ju %10ju\n", nrate[0], nrate[1], nrate[2], nrate[3]); sbuf_printf(sb, "Offload B/s: %10ju %10ju %10ju %10ju", orate[0], orate[1], orate[2], orate[3]); } else { sbuf_printf(sb, " channel 0 channel 1\n"); sbuf_printf(sb, "NIC B/s: %10ju %10ju\n", nrate[0], nrate[1]); sbuf_printf(sb, "Offload B/s: %10ju %10ju", orate[0], orate[1]); } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; uint32_t *buf, *p; int rc, i; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); buf = malloc(ULPRX_LA_SIZE * 8 * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK); t4_ulprx_read_la(sc, buf); p = buf; sbuf_printf(sb, " Pcmd Type Message" " Data"); for (i = 0; i < ULPRX_LA_SIZE; i++, p += 8) { sbuf_printf(sb, "\n%08x%08x %4x %08x %08x%08x%08x%08x", p[1], p[0], p[2], p[3], p[7], p[6], p[5], p[4]); } rc = sbuf_finish(sb); sbuf_delete(sb); free(buf, M_CXGBE); return (rc); } static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct sbuf *sb; int rc, v; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); v = t4_read_reg(sc, A_SGE_STAT_CFG); if (G_STATSOURCE_T5(v) == 7) { if (G_STATMODE(v) == 0) { sbuf_printf(sb, "total %d, incomplete %d", t4_read_reg(sc, A_SGE_STAT_TOTAL), t4_read_reg(sc, A_SGE_STAT_MATCH)); } else if (G_STATMODE(v) == 1) { sbuf_printf(sb, "total %d, data overflow %d", t4_read_reg(sc, A_SGE_STAT_TOTAL), t4_read_reg(sc, A_SGE_STAT_MATCH)); } } rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } static int sysctl_tc_params(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; struct tx_sched_class *tc; struct t4_sched_class_params p; struct sbuf *sb; int i, rc, port_id, flags, mbps, gbps; rc = sysctl_wire_old_buffer(req, 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req); if (sb == NULL) return (ENOMEM); port_id = arg2 >> 16; MPASS(port_id < sc->params.nports); MPASS(sc->port[port_id] != NULL); i = arg2 & 0xffff; MPASS(i < sc->chip_params->nsched_cls); tc = &sc->port[port_id]->tc[i]; rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | 
SLEEP_OK | INTR_OK, "t4tc_p"); if (rc) goto done; flags = tc->flags; p = tc->params; end_synchronized_op(sc, LOCK_HELD); if ((flags & TX_SC_OK) == 0) { sbuf_printf(sb, "none"); goto done; } if (p.level == SCHED_CLASS_LEVEL_CL_WRR) { sbuf_printf(sb, "cl-wrr weight %u", p.weight); goto done; } else if (p.level == SCHED_CLASS_LEVEL_CL_RL) sbuf_printf(sb, "cl-rl"); else if (p.level == SCHED_CLASS_LEVEL_CH_RL) sbuf_printf(sb, "ch-rl"); else { rc = ENXIO; goto done; } if (p.ratemode == SCHED_CLASS_RATEMODE_REL) { /* XXX: top speed or actual link speed? */ gbps = port_top_speed(sc->port[port_id]); sbuf_printf(sb, " %u%% of %uGbps", p.maxrate, gbps); } else if (p.ratemode == SCHED_CLASS_RATEMODE_ABS) { switch (p.rateunit) { case SCHED_CLASS_RATEUNIT_BITS: mbps = p.maxrate / 1000; gbps = p.maxrate / 1000000; if (p.maxrate == gbps * 1000000) sbuf_printf(sb, " %uGbps", gbps); else if (p.maxrate == mbps * 1000) sbuf_printf(sb, " %uMbps", mbps); else sbuf_printf(sb, " %uKbps", p.maxrate); break; case SCHED_CLASS_RATEUNIT_PKTS: sbuf_printf(sb, " %upps", p.maxrate); break; default: rc = ENXIO; goto done; } } switch (p.mode) { case SCHED_CLASS_MODE_CLASS: sbuf_printf(sb, " aggregate"); break; case SCHED_CLASS_MODE_FLOW: sbuf_printf(sb, " per-flow"); break; default: rc = ENXIO; goto done; } done: if (rc == 0) rc = sbuf_finish(sb); sbuf_delete(sb); return (rc); } #endif #ifdef TCP_OFFLOAD static void unit_conv(char *buf, size_t len, u_int val, u_int factor) { u_int rem = val % factor; if (rem == 0) snprintf(buf, len, "%u", val / factor); else { while (rem % 10 == 0) rem /= 10; snprintf(buf, len, "%u.%u", val / factor, rem); } } static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; char buf[16]; u_int res, re; u_int cclk_ps = 1000000000 / sc->params.vpd.cclk; res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION); switch (arg2) { case 0: /* timer_tick */ re = G_TIMERRESOLUTION(res); break; case 1: /* TCP timestamp tick */ re = G_TIMESTAMPRESOLUTION(res); break; case 2: /* DACK tick */ re = G_DELAYEDACKRESOLUTION(res); break; default: return (EDOOFUS); } unit_conv(buf, sizeof(buf), (cclk_ps << re), 1000000); return (sysctl_handle_string(oidp, buf, sizeof(buf), req)); } static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; u_int res, dack_re, v; u_int cclk_ps = 1000000000 / sc->params.vpd.cclk; res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION); dack_re = G_DELAYEDACKRESOLUTION(res); v = ((cclk_ps << dack_re) / 1000000) * t4_read_reg(sc, A_TP_DACK_TIMER); return (sysctl_handle_int(oidp, &v, 0, req)); } static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS) { struct adapter *sc = arg1; int reg = arg2; u_int tre; u_long tp_tick_us, v; u_int cclk_ps = 1000000000 / sc->params.vpd.cclk; MPASS(reg == A_TP_RXT_MIN || reg == A_TP_RXT_MAX || reg == A_TP_PERS_MIN || reg == A_TP_PERS_MAX || reg == A_TP_KEEP_IDLE || reg == A_TP_KEEP_INTVL || reg == A_TP_INIT_SRTT || reg == A_TP_FINWAIT2_TIMER); tre = G_TIMERRESOLUTION(t4_read_reg(sc, A_TP_TIMER_RESOLUTION)); tp_tick_us = (cclk_ps << tre) / 1000000; if (reg == A_TP_INIT_SRTT) v = tp_tick_us * G_INITSRTT(t4_read_reg(sc, reg)); else v = tp_tick_us * t4_read_reg(sc, reg); return (sysctl_handle_long(oidp, &v, 0, req)); } #endif static uint32_t fconf_iconf_to_mode(uint32_t fconf, uint32_t iconf) { uint32_t mode; mode = T4_FILTER_IPv4 | T4_FILTER_IPv6 | T4_FILTER_IP_SADDR | T4_FILTER_IP_DADDR | T4_FILTER_IP_SPORT | T4_FILTER_IP_DPORT; if (fconf & F_FRAGMENTATION) mode |= T4_FILTER_IP_FRAGMENT; if (fconf & F_MPSHITTYPE) mode |= T4_FILTER_MPS_HIT_TYPE; if
(fconf & F_MACMATCH) mode |= T4_FILTER_MAC_IDX; if (fconf & F_ETHERTYPE) mode |= T4_FILTER_ETH_TYPE; if (fconf & F_PROTOCOL) mode |= T4_FILTER_IP_PROTO; if (fconf & F_TOS) mode |= T4_FILTER_IP_TOS; if (fconf & F_VLAN) mode |= T4_FILTER_VLAN; if (fconf & F_VNIC_ID) { mode |= T4_FILTER_VNIC; if (iconf & F_VNIC) mode |= T4_FILTER_IC_VNIC; } if (fconf & F_PORT) mode |= T4_FILTER_PORT; if (fconf & F_FCOE) mode |= T4_FILTER_FCoE; return (mode); } static uint32_t mode_to_fconf(uint32_t mode) { uint32_t fconf = 0; if (mode & T4_FILTER_IP_FRAGMENT) fconf |= F_FRAGMENTATION; if (mode & T4_FILTER_MPS_HIT_TYPE) fconf |= F_MPSHITTYPE; if (mode & T4_FILTER_MAC_IDX) fconf |= F_MACMATCH; if (mode & T4_FILTER_ETH_TYPE) fconf |= F_ETHERTYPE; if (mode & T4_FILTER_IP_PROTO) fconf |= F_PROTOCOL; if (mode & T4_FILTER_IP_TOS) fconf |= F_TOS; if (mode & T4_FILTER_VLAN) fconf |= F_VLAN; if (mode & T4_FILTER_VNIC) fconf |= F_VNIC_ID; if (mode & T4_FILTER_PORT) fconf |= F_PORT; if (mode & T4_FILTER_FCoE) fconf |= F_FCOE; return (fconf); } static uint32_t mode_to_iconf(uint32_t mode) { if (mode & T4_FILTER_IC_VNIC) return (F_VNIC); return (0); } static int check_fspec_against_fconf_iconf(struct adapter *sc, struct t4_filter_specification *fs) { struct tp_params *tpp = &sc->params.tp; uint32_t fconf = 0; if (fs->val.frag || fs->mask.frag) fconf |= F_FRAGMENTATION; if (fs->val.matchtype || fs->mask.matchtype) fconf |= F_MPSHITTYPE; if (fs->val.macidx || fs->mask.macidx) fconf |= F_MACMATCH; if (fs->val.ethtype || fs->mask.ethtype) fconf |= F_ETHERTYPE; if (fs->val.proto || fs->mask.proto) fconf |= F_PROTOCOL; if (fs->val.tos || fs->mask.tos) fconf |= F_TOS; if (fs->val.vlan_vld || fs->mask.vlan_vld) fconf |= F_VLAN; if (fs->val.ovlan_vld || fs->mask.ovlan_vld) { fconf |= F_VNIC_ID; if (tpp->ingress_config & F_VNIC) return (EINVAL); } if (fs->val.pfvf_vld || fs->mask.pfvf_vld) { fconf |= F_VNIC_ID; if ((tpp->ingress_config & F_VNIC) == 0) return (EINVAL); } if (fs->val.iport || fs->mask.iport) fconf |= F_PORT; if (fs->val.fcoe || fs->mask.fcoe) fconf |= F_FCOE; if ((tpp->vlan_pri_map | fconf) != tpp->vlan_pri_map) return (E2BIG); return (0); } static int get_filter_mode(struct adapter *sc, uint32_t *mode) { struct tp_params *tpp = &sc->params.tp; /* * We trust the cached values of the relevant TP registers. This means * things work reliably only if writes to those registers are always via * t4_set_filter_mode. */ *mode = fconf_iconf_to_mode(tpp->vlan_pri_map, tpp->ingress_config); return (0); } static int set_filter_mode(struct adapter *sc, uint32_t mode) { struct tp_params *tpp = &sc->params.tp; uint32_t fconf, iconf; int rc; iconf = mode_to_iconf(mode); if ((iconf ^ tpp->ingress_config) & F_VNIC) { /* * For now we just complain if A_TP_INGRESS_CONFIG is not * already set to the correct value for the requested filter * mode. It's not clear if it's safe to write to this register * on the fly. (And we trust the cached value of the register). 
*/ return (EBUSY); } fconf = mode_to_fconf(mode); rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4setfm"); if (rc) return (rc); if (sc->tids.ftids_in_use > 0) { rc = EBUSY; goto done; } #ifdef TCP_OFFLOAD if (uld_active(sc, ULD_TOM)) { rc = EBUSY; goto done; } #endif rc = -t4_set_filter_mode(sc, fconf); done: end_synchronized_op(sc, LOCK_HELD); return (rc); } static inline uint64_t get_filter_hits(struct adapter *sc, uint32_t fid) { uint32_t tcb_addr; tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE) + (fid + sc->tids.ftid_base) * TCB_SIZE; if (is_t4(sc)) { uint64_t hits; read_via_memwin(sc, 0, tcb_addr + 16, (uint32_t *)&hits, 8); return (be64toh(hits)); } else { uint32_t hits; read_via_memwin(sc, 0, tcb_addr + 24, &hits, 4); return (be32toh(hits)); } } static int get_filter(struct adapter *sc, struct t4_filter *t) { int i, rc, nfilters = sc->tids.nftids; struct filter_entry *f; rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK, "t4getf"); if (rc) return (rc); if (sc->tids.ftids_in_use == 0 || sc->tids.ftid_tab == NULL || t->idx >= nfilters) { t->idx = 0xffffffff; goto done; } f = &sc->tids.ftid_tab[t->idx]; for (i = t->idx; i < nfilters; i++, f++) { if (f->valid) { t->idx = i; t->l2tidx = f->l2t ? f->l2t->idx : 0; t->smtidx = f->smtidx; if (f->fs.hitcnts) t->hits = get_filter_hits(sc, t->idx); else t->hits = UINT64_MAX; t->fs = f->fs; goto done; } } t->idx = 0xffffffff; done: end_synchronized_op(sc, LOCK_HELD); return (0); } static int set_filter(struct adapter *sc, struct t4_filter *t) { unsigned int nfilters, nports; struct filter_entry *f; int i, rc; rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setf"); if (rc) return (rc); nfilters = sc->tids.nftids; nports = sc->params.nports; if (nfilters == 0) { rc = ENOTSUP; goto done; } if (t->idx >= nfilters) { rc = EINVAL; goto done; } /* Validate against the global filter mode and ingress config */ rc = check_fspec_against_fconf_iconf(sc, &t->fs); if (rc != 0) goto done; if (t->fs.action == FILTER_SWITCH && t->fs.eport >= nports) { rc = EINVAL; goto done; } if (t->fs.val.iport >= nports) { rc = EINVAL; goto done; } /* Can't specify an iq if not steering to it */ if (!t->fs.dirsteer && t->fs.iq) { rc = EINVAL; goto done; } /* IPv6 filter idx must be 4 aligned */ if (t->fs.type == 1 && ((t->idx & 0x3) || t->idx + 4 >= nfilters)) { rc = EINVAL; goto done; } if (!(sc->flags & FULL_INIT_DONE) && ((rc = adapter_full_init(sc)) != 0)) goto done; if (sc->tids.ftid_tab == NULL) { KASSERT(sc->tids.ftids_in_use == 0, ("%s: no memory allocated but filters_in_use > 0", __func__)); sc->tids.ftid_tab = malloc(sizeof (struct filter_entry) * nfilters, M_CXGBE, M_NOWAIT | M_ZERO); if (sc->tids.ftid_tab == NULL) { rc = ENOMEM; goto done; } mtx_init(&sc->tids.ftid_lock, "T4 filters", 0, MTX_DEF); } for (i = 0; i < 4; i++) { f = &sc->tids.ftid_tab[t->idx + i]; if (f->pending || f->valid) { rc = EBUSY; goto done; } if (f->locked) { rc = EPERM; goto done; } if (t->fs.type == 0) break; } f = &sc->tids.ftid_tab[t->idx]; f->fs = t->fs; rc = set_filter_wr(sc, t->idx); done: end_synchronized_op(sc, 0); if (rc == 0) { mtx_lock(&sc->tids.ftid_lock); for (;;) { if (f->pending == 0) { rc = f->valid ? 
0 : EIO; break; } if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock, PCATCH, "t4setfw", 0)) { rc = EINPROGRESS; break; } } mtx_unlock(&sc->tids.ftid_lock); } return (rc); } static int del_filter(struct adapter *sc, struct t4_filter *t) { unsigned int nfilters; struct filter_entry *f; int rc; rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4delf"); if (rc) return (rc); nfilters = sc->tids.nftids; if (nfilters == 0) { rc = ENOTSUP; goto done; } if (sc->tids.ftid_tab == NULL || sc->tids.ftids_in_use == 0 || t->idx >= nfilters) { rc = EINVAL; goto done; } if (!(sc->flags & FULL_INIT_DONE)) { rc = EAGAIN; goto done; } f = &sc->tids.ftid_tab[t->idx]; if (f->pending) { rc = EBUSY; goto done; } if (f->locked) { rc = EPERM; goto done; } if (f->valid) { t->fs = f->fs; /* extra info for the caller */ rc = del_filter_wr(sc, t->idx); } done: end_synchronized_op(sc, 0); if (rc == 0) { mtx_lock(&sc->tids.ftid_lock); for (;;) { if (f->pending == 0) { rc = f->valid ? EIO : 0; break; } if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock, PCATCH, "t4delfw", 0)) { rc = EINPROGRESS; break; } } mtx_unlock(&sc->tids.ftid_lock); } return (rc); } static void clear_filter(struct filter_entry *f) { if (f->l2t) t4_l2t_release(f->l2t); bzero(f, sizeof (*f)); } static int set_filter_wr(struct adapter *sc, int fidx) { struct filter_entry *f = &sc->tids.ftid_tab[fidx]; struct fw_filter_wr *fwr; unsigned int ftid, vnic_vld, vnic_vld_mask; struct wrq_cookie cookie; ASSERT_SYNCHRONIZED_OP(sc); if (f->fs.newdmac || f->fs.newvlan) { /* This filter needs an L2T entry; allocate one. */ f->l2t = t4_l2t_alloc_switching(sc->l2t); if (f->l2t == NULL) return (EAGAIN); if (t4_l2t_set_switching(sc, f->l2t, f->fs.vlan, f->fs.eport, f->fs.dmac)) { t4_l2t_release(f->l2t); f->l2t = NULL; return (ENOMEM); } } /* Already validated against fconf, iconf */ MPASS((f->fs.val.pfvf_vld & f->fs.val.ovlan_vld) == 0); MPASS((f->fs.mask.pfvf_vld & f->fs.mask.ovlan_vld) == 0); if (f->fs.val.pfvf_vld || f->fs.val.ovlan_vld) vnic_vld = 1; else vnic_vld = 0; if (f->fs.mask.pfvf_vld || f->fs.mask.ovlan_vld) vnic_vld_mask = 1; else vnic_vld_mask = 0; ftid = sc->tids.ftid_base + fidx; fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie); if (fwr == NULL) return (ENOMEM); bzero(fwr, sizeof(*fwr)); fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR)); fwr->len16_pkd = htobe32(FW_LEN16(*fwr)); fwr->tid_to_iq = htobe32(V_FW_FILTER_WR_TID(ftid) | V_FW_FILTER_WR_RQTYPE(f->fs.type) | V_FW_FILTER_WR_NOREPLY(0) | V_FW_FILTER_WR_IQ(f->fs.iq)); fwr->del_filter_to_l2tix = htobe32(V_FW_FILTER_WR_RPTTID(f->fs.rpttid) | V_FW_FILTER_WR_DROP(f->fs.action == FILTER_DROP) | V_FW_FILTER_WR_DIRSTEER(f->fs.dirsteer) | V_FW_FILTER_WR_MASKHASH(f->fs.maskhash) | V_FW_FILTER_WR_DIRSTEERHASH(f->fs.dirsteerhash) | V_FW_FILTER_WR_LPBK(f->fs.action == FILTER_SWITCH) | V_FW_FILTER_WR_DMAC(f->fs.newdmac) | V_FW_FILTER_WR_SMAC(f->fs.newsmac) | V_FW_FILTER_WR_INSVLAN(f->fs.newvlan == VLAN_INSERT || f->fs.newvlan == VLAN_REWRITE) | V_FW_FILTER_WR_RMVLAN(f->fs.newvlan == VLAN_REMOVE || f->fs.newvlan == VLAN_REWRITE) | V_FW_FILTER_WR_HITCNTS(f->fs.hitcnts) | V_FW_FILTER_WR_TXCHAN(f->fs.eport) | V_FW_FILTER_WR_PRIO(f->fs.prio) | V_FW_FILTER_WR_L2TIX(f->l2t ? 
f->l2t->idx : 0)); fwr->ethtype = htobe16(f->fs.val.ethtype); fwr->ethtypem = htobe16(f->fs.mask.ethtype); fwr->frag_to_ovlan_vldm = (V_FW_FILTER_WR_FRAG(f->fs.val.frag) | V_FW_FILTER_WR_FRAGM(f->fs.mask.frag) | V_FW_FILTER_WR_IVLAN_VLD(f->fs.val.vlan_vld) | V_FW_FILTER_WR_OVLAN_VLD(vnic_vld) | V_FW_FILTER_WR_IVLAN_VLDM(f->fs.mask.vlan_vld) | V_FW_FILTER_WR_OVLAN_VLDM(vnic_vld_mask)); fwr->smac_sel = 0; fwr->rx_chan_rx_rpl_iq = htobe16(V_FW_FILTER_WR_RX_CHAN(0) | V_FW_FILTER_WR_RX_RPL_IQ(sc->sge.fwq.abs_id)); fwr->maci_to_matchtypem = htobe32(V_FW_FILTER_WR_MACI(f->fs.val.macidx) | V_FW_FILTER_WR_MACIM(f->fs.mask.macidx) | V_FW_FILTER_WR_FCOE(f->fs.val.fcoe) | V_FW_FILTER_WR_FCOEM(f->fs.mask.fcoe) | V_FW_FILTER_WR_PORT(f->fs.val.iport) | V_FW_FILTER_WR_PORTM(f->fs.mask.iport) | V_FW_FILTER_WR_MATCHTYPE(f->fs.val.matchtype) | V_FW_FILTER_WR_MATCHTYPEM(f->fs.mask.matchtype)); fwr->ptcl = f->fs.val.proto; fwr->ptclm = f->fs.mask.proto; fwr->ttyp = f->fs.val.tos; fwr->ttypm = f->fs.mask.tos; fwr->ivlan = htobe16(f->fs.val.vlan); fwr->ivlanm = htobe16(f->fs.mask.vlan); fwr->ovlan = htobe16(f->fs.val.vnic); fwr->ovlanm = htobe16(f->fs.mask.vnic); bcopy(f->fs.val.dip, fwr->lip, sizeof (fwr->lip)); bcopy(f->fs.mask.dip, fwr->lipm, sizeof (fwr->lipm)); bcopy(f->fs.val.sip, fwr->fip, sizeof (fwr->fip)); bcopy(f->fs.mask.sip, fwr->fipm, sizeof (fwr->fipm)); fwr->lp = htobe16(f->fs.val.dport); fwr->lpm = htobe16(f->fs.mask.dport); fwr->fp = htobe16(f->fs.val.sport); fwr->fpm = htobe16(f->fs.mask.sport); if (f->fs.newsmac) bcopy(f->fs.smac, fwr->sma, sizeof (fwr->sma)); f->pending = 1; sc->tids.ftids_in_use++; commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); return (0); } static int del_filter_wr(struct adapter *sc, int fidx) { struct filter_entry *f = &sc->tids.ftid_tab[fidx]; struct fw_filter_wr *fwr; unsigned int ftid; struct wrq_cookie cookie; ftid = sc->tids.ftid_base + fidx; fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie); if (fwr == NULL) return (ENOMEM); bzero(fwr, sizeof (*fwr)); t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id); f->pending = 1; commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); return (0); } int t4_filter_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; const struct cpl_set_tcb_rpl *rpl = (const void *)(rss + 1); unsigned int idx = GET_TID(rpl); unsigned int rc; struct filter_entry *f; KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__, rss->opcode)); MPASS(iq == &sc->sge.fwq); MPASS(is_ftid(sc, idx)); idx -= sc->tids.ftid_base; f = &sc->tids.ftid_tab[idx]; rc = G_COOKIE(rpl->cookie); mtx_lock(&sc->tids.ftid_lock); if (rc == FW_FILTER_WR_FLT_ADDED) { KASSERT(f->pending, ("%s: filter[%u] isn't pending.", __func__, idx)); f->smtidx = (be64toh(rpl->oldval) >> 24) & 0xff; f->pending = 0; /* asynchronous setup completed */ f->valid = 1; } else { if (rc != FW_FILTER_WR_FLT_DELETED) { /* Add or delete failed, display an error */ log(LOG_ERR, "filter %u setup failed with error %u\n", idx, rc); } clear_filter(f); sc->tids.ftids_in_use--; } wakeup(&sc->tids.ftid_tab); mtx_unlock(&sc->tids.ftid_lock); return (0); } static int set_tcb_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { MPASS(iq->set_tcb_rpl != NULL); return (iq->set_tcb_rpl(iq, rss, m)); } static int l2t_write_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { MPASS(iq->l2t_write_rpl != NULL); return (iq->l2t_write_rpl(iq, rss, m)); } static int get_sge_context(struct adapter *sc, struct t4_sge_context 
*cntxt) { int rc; if (cntxt->cid > M_CTXTQID) return (EINVAL); if (cntxt->mem_id != CTXT_EGRESS && cntxt->mem_id != CTXT_INGRESS && cntxt->mem_id != CTXT_FLM && cntxt->mem_id != CTXT_CNM) return (EINVAL); rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ctxt"); if (rc) return (rc); if (sc->flags & FW_OK) { rc = -t4_sge_ctxt_rd(sc, sc->mbox, cntxt->cid, cntxt->mem_id, &cntxt->data[0]); if (rc == 0) goto done; } /* * Read via firmware failed or wasn't even attempted. Read directly via * the backdoor. */ rc = -t4_sge_ctxt_rd_bd(sc, cntxt->cid, cntxt->mem_id, &cntxt->data[0]); done: end_synchronized_op(sc, 0); return (rc); } static int load_fw(struct adapter *sc, struct t4_data *fw) { int rc; uint8_t *fw_data; rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ldfw"); if (rc) return (rc); if (sc->flags & FULL_INIT_DONE) { rc = EBUSY; goto done; } fw_data = malloc(fw->len, M_CXGBE, M_WAITOK); if (fw_data == NULL) { rc = ENOMEM; goto done; } rc = copyin(fw->data, fw_data, fw->len); if (rc == 0) rc = -t4_load_fw(sc, fw_data, fw->len); free(fw_data, M_CXGBE); done: end_synchronized_op(sc, 0); return (rc); } #define MAX_READ_BUF_SIZE (128 * 1024) static int read_card_mem(struct adapter *sc, int win, struct t4_mem_range *mr) { uint32_t addr, remaining, n; uint32_t *buf; int rc; uint8_t *dst; rc = validate_mem_range(sc, mr->addr, mr->len); if (rc != 0) return (rc); buf = malloc(min(mr->len, MAX_READ_BUF_SIZE), M_CXGBE, M_WAITOK); addr = mr->addr; remaining = mr->len; dst = (void *)mr->data; while (remaining) { n = min(remaining, MAX_READ_BUF_SIZE); read_via_memwin(sc, 2, addr, buf, n); rc = copyout(buf, dst, n); if (rc != 0) break; dst += n; remaining -= n; addr += n; } free(buf, M_CXGBE); return (rc); } #undef MAX_READ_BUF_SIZE static int read_i2c(struct adapter *sc, struct t4_i2c_data *i2cd) { int rc; if (i2cd->len == 0 || i2cd->port_id >= sc->params.nports) return (EINVAL); if (i2cd->len > sizeof(i2cd->data)) return (EFBIG); rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4i2crd"); if (rc) return (rc); rc = -t4_i2c_rd(sc, sc->mbox, i2cd->port_id, i2cd->dev_addr, i2cd->offset, i2cd->len, &i2cd->data[0]); end_synchronized_op(sc, 0); return (rc); } static int in_range(int val, int lo, int hi) { return (val < 0 || (val <= hi && val >= lo)); } static int set_sched_class_config(struct adapter *sc, int minmax) { int rc; if (minmax < 0) return (EINVAL); rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4sscc"); if (rc) return (rc); rc = -t4_sched_config(sc, FW_SCHED_TYPE_PKTSCHED, minmax, 1); end_synchronized_op(sc, 0); return (rc); } static int set_sched_class_params(struct adapter *sc, struct t4_sched_class_params *p, int sleep_ok) { int rc, top_speed, fw_level, fw_mode, fw_rateunit, fw_ratemode; struct port_info *pi; struct tx_sched_class *tc; if (p->level == SCHED_CLASS_LEVEL_CL_RL) fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL; else if (p->level == SCHED_CLASS_LEVEL_CL_WRR) fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR; else if (p->level == SCHED_CLASS_LEVEL_CH_RL) fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL; else return (EINVAL); if (p->mode == SCHED_CLASS_MODE_CLASS) fw_mode = FW_SCHED_PARAMS_MODE_CLASS; else if (p->mode == SCHED_CLASS_MODE_FLOW) fw_mode = FW_SCHED_PARAMS_MODE_FLOW; else return (EINVAL); if (p->rateunit == SCHED_CLASS_RATEUNIT_BITS) fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE; else if (p->rateunit == SCHED_CLASS_RATEUNIT_PKTS) fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE; else return (EINVAL); if (p->ratemode == SCHED_CLASS_RATEMODE_REL) fw_ratemode = 
FW_SCHED_PARAMS_RATE_REL; else if (p->ratemode == SCHED_CLASS_RATEMODE_ABS) fw_ratemode = FW_SCHED_PARAMS_RATE_ABS; else return (EINVAL); /* Vet our parameters ... */ if (!in_range(p->channel, 0, sc->chip_params->nchan - 1)) return (ERANGE); pi = sc->port[sc->chan_map[p->channel]]; if (pi == NULL) return (ENXIO); MPASS(pi->tx_chan == p->channel); top_speed = port_top_speed(pi) * 1000000; /* Gbps -> Kbps */ if (!in_range(p->cl, 0, sc->chip_params->nsched_cls) || !in_range(p->minrate, 0, top_speed) || !in_range(p->maxrate, 0, top_speed) || !in_range(p->weight, 0, 100)) return (ERANGE); /* * Translate any unset parameters into the firmware's * nomenclature and/or fail the call if the parameters * are required ... */ if (p->rateunit < 0 || p->ratemode < 0 || p->channel < 0 || p->cl < 0) return (EINVAL); if (p->minrate < 0) p->minrate = 0; if (p->maxrate < 0) { if (p->level == SCHED_CLASS_LEVEL_CL_RL || p->level == SCHED_CLASS_LEVEL_CH_RL) return (EINVAL); else p->maxrate = 0; } if (p->weight < 0) { if (p->level == SCHED_CLASS_LEVEL_CL_WRR) return (EINVAL); else p->weight = 0; } if (p->pktsize < 0) { if (p->level == SCHED_CLASS_LEVEL_CL_RL || p->level == SCHED_CLASS_LEVEL_CH_RL) return (EINVAL); else p->pktsize = 0; } rc = begin_synchronized_op(sc, NULL, sleep_ok ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4sscp"); if (rc) return (rc); tc = &pi->tc[p->cl]; tc->params = *p; rc = -t4_sched_params(sc, FW_SCHED_TYPE_PKTSCHED, fw_level, fw_mode, fw_rateunit, fw_ratemode, p->channel, p->cl, p->minrate, p->maxrate, p->weight, p->pktsize, sleep_ok); if (rc == 0) tc->flags |= TX_SC_OK; else { /* * Unknown state at this point, see tc->params for what was * attempted. */ tc->flags &= ~TX_SC_OK; } end_synchronized_op(sc, sleep_ok ? 0 : LOCK_HELD); return (rc); } int t4_set_sched_class(struct adapter *sc, struct t4_sched_params *p) { if (p->type != SCHED_CLASS_TYPE_PACKET) return (EINVAL); if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG) return (set_sched_class_config(sc, p->u.config.minmax)); if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS) return (set_sched_class_params(sc, &p->u.params, 1)); return (EINVAL); } int t4_set_sched_queue(struct adapter *sc, struct t4_sched_queue *p) { struct port_info *pi = NULL; struct vi_info *vi; struct sge_txq *txq; uint32_t fw_mnem, fw_queue, fw_class; int i, rc; rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsq"); if (rc) return (rc); if (p->port >= sc->params.nports) { rc = EINVAL; goto done; } /* XXX: Only supported for the main VI. */ pi = sc->port[p->port]; vi = &pi->vi[0]; if (!(vi->flags & VI_INIT_DONE)) { /* tx queues not set up yet */ rc = EAGAIN; goto done; } if (!in_range(p->queue, 0, vi->ntxq - 1) || !in_range(p->cl, 0, sc->chip_params->nsched_cls - 1)) { rc = EINVAL; goto done; } /* * Create a template for the FW_PARAMS_CMD mnemonic and value (TX * Scheduling Class in this case). */ fw_mnem = (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) | V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH)); fw_class = p->cl < 0 ? 0xffffffff : p->cl; /* * If op.queue is non-negative, then we're only changing the scheduling * on a single specified TX queue. */ if (p->queue >= 0) { txq = &sc->sge.txq[vi->first_txq + p->queue]; fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id)); rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue, &fw_class); goto done; } /* * Change the scheduling on all the TX queues for the * interface. 
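 * Each pass of the loop below reuses the FW_PARAMS mnemonic template
 * built earlier and only swaps in that queue's egress context id:
 *
 *	fw_queue = fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id);
 *
 * exactly as in the single-queue case handled above.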
*/ for_each_txq(vi, i, txq) { fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id)); rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue, &fw_class); if (rc) goto done; } rc = 0; done: end_synchronized_op(sc, 0); return (rc); } int t4_os_find_pci_capability(struct adapter *sc, int cap) { int i; return (pci_find_cap(sc->dev, cap, &i) == 0 ? i : 0); } int t4_os_pci_save_state(struct adapter *sc) { device_t dev; struct pci_devinfo *dinfo; dev = sc->dev; dinfo = device_get_ivars(dev); pci_cfg_save(dev, dinfo, 0); return (0); } int t4_os_pci_restore_state(struct adapter *sc) { device_t dev; struct pci_devinfo *dinfo; dev = sc->dev; dinfo = device_get_ivars(dev); pci_cfg_restore(dev, dinfo); return (0); } void t4_os_portmod_changed(const struct adapter *sc, int idx) { struct port_info *pi = sc->port[idx]; struct vi_info *vi; struct ifnet *ifp; int v; static const char *mod_str[] = { NULL, "LR", "SR", "ER", "TWINAX", "active TWINAX", "LRM" }; for_each_vi(pi, v, vi) { build_medialist(pi, &vi->media); } ifp = pi->vi[0].ifp; if (pi->mod_type == FW_PORT_MOD_TYPE_NONE) if_printf(ifp, "transceiver unplugged.\n"); else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN) if_printf(ifp, "unknown transceiver inserted.\n"); else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED) if_printf(ifp, "unsupported transceiver inserted.\n"); else if (pi->mod_type > 0 && pi->mod_type < nitems(mod_str)) { if_printf(ifp, "%s transceiver inserted.\n", mod_str[pi->mod_type]); } else { if_printf(ifp, "transceiver (type %d) inserted.\n", pi->mod_type); } } void t4_os_link_changed(struct adapter *sc, int idx, int link_stat, int reason) { struct port_info *pi = sc->port[idx]; struct vi_info *vi; struct ifnet *ifp; int v; if (link_stat) pi->linkdnrc = -1; else { if (reason >= 0) pi->linkdnrc = reason; } for_each_vi(pi, v, vi) { ifp = vi->ifp; if (ifp == NULL) continue; if (link_stat) { ifp->if_baudrate = IF_Mbps(pi->link_cfg.speed); if_link_state_change(ifp, LINK_STATE_UP); } else { if_link_state_change(ifp, LINK_STATE_DOWN); } } } void t4_iterate(void (*func)(struct adapter *, void *), void *arg) { struct adapter *sc; sx_slock(&t4_list_lock); SLIST_FOREACH(sc, &t4_list, link) { /* * func should not make any assumptions about what state sc is * in - the only guarantee is that sc->sc_lock is a valid lock. 
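 *
 * A minimal usage sketch (count_one and its counter are hypothetical,
 * shown only to illustrate the callback contract):
 *
 *	static void
 *	count_one(struct adapter *sc, void *arg)
 *	{
 *		(*(int *)arg)++;
 *	}
 *
 *	int n = 0;
 *	t4_iterate(count_one, &n);	(n is now the number of adapters)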
*/ func(sc, arg); } sx_sunlock(&t4_list_lock); } static int t4_ioctl(struct cdev *dev, unsigned long cmd, caddr_t data, int fflag, struct thread *td) { int rc; struct adapter *sc = dev->si_drv1; rc = priv_check(td, PRIV_DRIVER); if (rc != 0) return (rc); switch (cmd) { case CHELSIO_T4_GETREG: { struct t4_reg *edata = (struct t4_reg *)data; if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len) return (EFAULT); if (edata->size == 4) edata->val = t4_read_reg(sc, edata->addr); else if (edata->size == 8) edata->val = t4_read_reg64(sc, edata->addr); else return (EINVAL); break; } case CHELSIO_T4_SETREG: { struct t4_reg *edata = (struct t4_reg *)data; if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len) return (EFAULT); if (edata->size == 4) { if (edata->val & 0xffffffff00000000) return (EINVAL); t4_write_reg(sc, edata->addr, (uint32_t) edata->val); } else if (edata->size == 8) t4_write_reg64(sc, edata->addr, edata->val); else return (EINVAL); break; } case CHELSIO_T4_REGDUMP: { struct t4_regdump *regs = (struct t4_regdump *)data; int reglen = t4_get_regs_len(sc); uint8_t *buf; if (regs->len < reglen) { regs->len = reglen; /* hint to the caller */ return (ENOBUFS); } regs->len = reglen; buf = malloc(reglen, M_CXGBE, M_WAITOK | M_ZERO); get_regs(sc, regs, buf); rc = copyout(buf, regs->data, reglen); free(buf, M_CXGBE); break; } case CHELSIO_T4_GET_FILTER_MODE: rc = get_filter_mode(sc, (uint32_t *)data); break; case CHELSIO_T4_SET_FILTER_MODE: rc = set_filter_mode(sc, *(uint32_t *)data); break; case CHELSIO_T4_GET_FILTER: rc = get_filter(sc, (struct t4_filter *)data); break; case CHELSIO_T4_SET_FILTER: rc = set_filter(sc, (struct t4_filter *)data); break; case CHELSIO_T4_DEL_FILTER: rc = del_filter(sc, (struct t4_filter *)data); break; case CHELSIO_T4_GET_SGE_CONTEXT: rc = get_sge_context(sc, (struct t4_sge_context *)data); break; case CHELSIO_T4_LOAD_FW: rc = load_fw(sc, (struct t4_data *)data); break; case CHELSIO_T4_GET_MEM: rc = read_card_mem(sc, 2, (struct t4_mem_range *)data); break; case CHELSIO_T4_GET_I2C: rc = read_i2c(sc, (struct t4_i2c_data *)data); break; case CHELSIO_T4_CLEAR_STATS: { int i, v; u_int port_id = *(uint32_t *)data; struct port_info *pi; struct vi_info *vi; if (port_id >= sc->params.nports) return (EINVAL); pi = sc->port[port_id]; + if (pi == NULL) + return (EIO); /* MAC stats */ t4_clr_port_stats(sc, pi->tx_chan); pi->tx_parse_error = 0; mtx_lock(&sc->reg_lock); for_each_vi(pi, v, vi) { if (vi->flags & VI_INIT_DONE) t4_clr_vi_stats(sc, vi->viid); } mtx_unlock(&sc->reg_lock); /* * Since this command accepts a port, clear stats for * all VIs on this port. 
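 * Note that only VIs that have completed vi_full_init() (VI_INIT_DONE
 * set) own SGE queues, which is why each one is checked before its
 * queue counters are touched.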
*/ for_each_vi(pi, v, vi) { if (vi->flags & VI_INIT_DONE) { struct sge_rxq *rxq; struct sge_txq *txq; struct sge_wrq *wrq; for_each_rxq(vi, i, rxq) { #if defined(INET) || defined(INET6) rxq->lro.lro_queued = 0; rxq->lro.lro_flushed = 0; #endif rxq->rxcsum = 0; rxq->vlan_extraction = 0; } for_each_txq(vi, i, txq) { txq->txcsum = 0; txq->tso_wrs = 0; txq->vlan_insertion = 0; txq->imm_wrs = 0; txq->sgl_wrs = 0; txq->txpkt_wrs = 0; txq->txpkts0_wrs = 0; txq->txpkts1_wrs = 0; txq->txpkts0_pkts = 0; txq->txpkts1_pkts = 0; mp_ring_reset_stats(txq->r); } #ifdef TCP_OFFLOAD /* nothing to clear for each ofld_rxq */ for_each_ofld_txq(vi, i, wrq) { wrq->tx_wrs_direct = 0; wrq->tx_wrs_copied = 0; } #endif if (IS_MAIN_VI(vi)) { wrq = &sc->sge.ctrlq[pi->port_id]; wrq->tx_wrs_direct = 0; wrq->tx_wrs_copied = 0; } } } break; } case CHELSIO_T4_SCHED_CLASS: rc = t4_set_sched_class(sc, (struct t4_sched_params *)data); break; case CHELSIO_T4_SCHED_QUEUE: rc = t4_set_sched_queue(sc, (struct t4_sched_queue *)data); break; case CHELSIO_T4_GET_TRACER: rc = t4_get_tracer(sc, (struct t4_tracer *)data); break; case CHELSIO_T4_SET_TRACER: rc = t4_set_tracer(sc, (struct t4_tracer *)data); break; default: rc = ENOTTY; } return (rc); } void t4_db_full(struct adapter *sc) { CXGBE_UNIMPLEMENTED(__func__); } void t4_db_dropped(struct adapter *sc) { CXGBE_UNIMPLEMENTED(__func__); } #ifdef TCP_OFFLOAD static int toe_capability(struct vi_info *vi, int enable) { int rc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; ASSERT_SYNCHRONIZED_OP(sc); if (!is_offload(sc)) return (ENODEV); if (enable) { if ((vi->ifp->if_capenable & IFCAP_TOE) != 0) { /* TOE is already enabled. */ return (0); } /* * We need the port's queues around so that we're able to send * and receive CPLs to/from the TOE even if the ifnet for this * port has never been UP'd administratively. */ if (!(vi->flags & VI_INIT_DONE)) { rc = vi_full_init(vi); if (rc) return (rc); } if (!(pi->vi[0].flags & VI_INIT_DONE)) { rc = vi_full_init(&pi->vi[0]); if (rc) return (rc); } if (isset(&sc->offload_map, pi->port_id)) { /* TOE is enabled on another VI of this port. */ pi->uld_vis++; return (0); } if (!uld_active(sc, ULD_TOM)) { rc = t4_activate_uld(sc, ULD_TOM); if (rc == EAGAIN) { log(LOG_WARNING, "You must kldload t4_tom.ko before trying " "to enable TOE on a cxgbe interface.\n"); } if (rc != 0) return (rc); KASSERT(sc->tom_softc != NULL, ("%s: TOM activated but softc NULL", __func__)); KASSERT(uld_active(sc, ULD_TOM), ("%s: TOM activated but flag not set", __func__)); } /* Activate iWARP and iSCSI too, if the modules are loaded. */ if (!uld_active(sc, ULD_IWARP)) (void) t4_activate_uld(sc, ULD_IWARP); if (!uld_active(sc, ULD_ISCSI)) (void) t4_activate_uld(sc, ULD_ISCSI); pi->uld_vis++; setbit(&sc->offload_map, pi->port_id); } else { pi->uld_vis--; if (!isset(&sc->offload_map, pi->port_id) || pi->uld_vis > 0) return (0); KASSERT(uld_active(sc, ULD_TOM), ("%s: TOM never initialized?", __func__)); clrbit(&sc->offload_map, pi->port_id); } return (0); } /* * Add an upper layer driver to the global list. 
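 *
 * A hypothetical registration sketch (example_uld and its callbacks
 * are made-up names; the fields are the ones this file uses below):
 *
 *	static struct uld_info example_uld = {
 *		.uld_id = ULD_TOM,
 *		.activate = example_activate,
 *		.deactivate = example_deactivate,
 *	};
 *
 *	error = t4_register_uld(&example_uld);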
*/ int t4_register_uld(struct uld_info *ui) { int rc = 0; struct uld_info *u; sx_xlock(&t4_uld_list_lock); SLIST_FOREACH(u, &t4_uld_list, link) { if (u->uld_id == ui->uld_id) { rc = EEXIST; goto done; } } SLIST_INSERT_HEAD(&t4_uld_list, ui, link); ui->refcount = 0; done: sx_xunlock(&t4_uld_list_lock); return (rc); } int t4_unregister_uld(struct uld_info *ui) { int rc = EINVAL; struct uld_info *u; sx_xlock(&t4_uld_list_lock); SLIST_FOREACH(u, &t4_uld_list, link) { if (u == ui) { if (ui->refcount > 0) { rc = EBUSY; goto done; } SLIST_REMOVE(&t4_uld_list, ui, uld_info, link); rc = 0; goto done; } } done: sx_xunlock(&t4_uld_list_lock); return (rc); } int t4_activate_uld(struct adapter *sc, int id) { int rc; struct uld_info *ui; ASSERT_SYNCHRONIZED_OP(sc); if (id < 0 || id > ULD_MAX) return (EINVAL); rc = EAGAIN; /* kldload the module with this ULD and try again. */ sx_slock(&t4_uld_list_lock); SLIST_FOREACH(ui, &t4_uld_list, link) { if (ui->uld_id == id) { if (!(sc->flags & FULL_INIT_DONE)) { rc = adapter_full_init(sc); if (rc != 0) break; } rc = ui->activate(sc); if (rc == 0) { setbit(&sc->active_ulds, id); ui->refcount++; } break; } } sx_sunlock(&t4_uld_list_lock); return (rc); } int t4_deactivate_uld(struct adapter *sc, int id) { int rc; struct uld_info *ui; ASSERT_SYNCHRONIZED_OP(sc); if (id < 0 || id > ULD_MAX) return (EINVAL); rc = ENXIO; sx_slock(&t4_uld_list_lock); SLIST_FOREACH(ui, &t4_uld_list, link) { if (ui->uld_id == id) { rc = ui->deactivate(sc); if (rc == 0) { clrbit(&sc->active_ulds, id); ui->refcount--; } break; } } sx_sunlock(&t4_uld_list_lock); return (rc); } int uld_active(struct adapter *sc, int uld_id) { MPASS(uld_id >= 0 && uld_id <= ULD_MAX); return (isset(&sc->active_ulds, uld_id)); } #endif /* * Come up with reasonable defaults for some of the tunables, provided they're * not set by the user (in which case we'll use the values as is). */ static void tweak_tunables(void) { int nc = mp_ncpus; /* our snapshot of the number of CPUs */ if (t4_ntxq10g < 1) { #ifdef RSS t4_ntxq10g = rss_getnumbuckets(); #else t4_ntxq10g = min(nc, NTXQ_10G); #endif } if (t4_ntxq1g < 1) { #ifdef RSS /* XXX: way too many for 1GbE? */ t4_ntxq1g = rss_getnumbuckets(); #else t4_ntxq1g = min(nc, NTXQ_1G); #endif } if (t4_ntxq_vi < 1) t4_ntxq_vi = min(nc, NTXQ_VI); if (t4_nrxq10g < 1) { #ifdef RSS t4_nrxq10g = rss_getnumbuckets(); #else t4_nrxq10g = min(nc, NRXQ_10G); #endif } if (t4_nrxq1g < 1) { #ifdef RSS /* XXX: way too many for 1GbE?
*/ t4_nrxq1g = rss_getnumbuckets(); #else t4_nrxq1g = min(nc, NRXQ_1G); #endif } if (t4_nrxq_vi < 1) t4_nrxq_vi = min(nc, NRXQ_VI); #ifdef TCP_OFFLOAD if (t4_nofldtxq10g < 1) t4_nofldtxq10g = min(nc, NOFLDTXQ_10G); if (t4_nofldtxq1g < 1) t4_nofldtxq1g = min(nc, NOFLDTXQ_1G); if (t4_nofldtxq_vi < 1) t4_nofldtxq_vi = min(nc, NOFLDTXQ_VI); if (t4_nofldrxq10g < 1) t4_nofldrxq10g = min(nc, NOFLDRXQ_10G); if (t4_nofldrxq1g < 1) t4_nofldrxq1g = min(nc, NOFLDRXQ_1G); if (t4_nofldrxq_vi < 1) t4_nofldrxq_vi = min(nc, NOFLDRXQ_VI); if (t4_toecaps_allowed == -1) t4_toecaps_allowed = FW_CAPS_CONFIG_TOE; if (t4_rdmacaps_allowed == -1) { t4_rdmacaps_allowed = FW_CAPS_CONFIG_RDMA_RDDP | FW_CAPS_CONFIG_RDMA_RDMAC; } if (t4_iscsicaps_allowed == -1) { t4_iscsicaps_allowed = FW_CAPS_CONFIG_ISCSI_INITIATOR_PDU | FW_CAPS_CONFIG_ISCSI_TARGET_PDU | FW_CAPS_CONFIG_ISCSI_T10DIF; } #else if (t4_toecaps_allowed == -1) t4_toecaps_allowed = 0; if (t4_rdmacaps_allowed == -1) t4_rdmacaps_allowed = 0; if (t4_iscsicaps_allowed == -1) t4_iscsicaps_allowed = 0; #endif #ifdef DEV_NETMAP if (t4_nnmtxq_vi < 1) t4_nnmtxq_vi = min(nc, NNMTXQ_VI); if (t4_nnmrxq_vi < 1) t4_nnmrxq_vi = min(nc, NNMRXQ_VI); #endif if (t4_tmr_idx_10g < 0 || t4_tmr_idx_10g >= SGE_NTIMERS) t4_tmr_idx_10g = TMR_IDX_10G; if (t4_pktc_idx_10g < -1 || t4_pktc_idx_10g >= SGE_NCOUNTERS) t4_pktc_idx_10g = PKTC_IDX_10G; if (t4_tmr_idx_1g < 0 || t4_tmr_idx_1g >= SGE_NTIMERS) t4_tmr_idx_1g = TMR_IDX_1G; if (t4_pktc_idx_1g < -1 || t4_pktc_idx_1g >= SGE_NCOUNTERS) t4_pktc_idx_1g = PKTC_IDX_1G; if (t4_qsize_txq < 128) t4_qsize_txq = 128; if (t4_qsize_rxq < 128) t4_qsize_rxq = 128; while (t4_qsize_rxq & 7) t4_qsize_rxq++; t4_intr_types &= INTR_MSIX | INTR_MSI | INTR_INTX; } #ifdef DDB static void t4_dump_tcb(struct adapter *sc, int tid) { uint32_t base, i, j, off, pf, reg, save, tcb_addr, win_pos; reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, 2); save = t4_read_reg(sc, reg); base = sc->memwin[2].mw_base; /* Dump TCB for the tid */ tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE); tcb_addr += tid * TCB_SIZE; if (is_t4(sc)) { pf = 0; win_pos = tcb_addr & ~0xf; /* start must be 16B aligned */ } else { pf = V_PFNUM(sc->pf); win_pos = tcb_addr & ~0x7f; /* start must be 128B aligned */ } t4_write_reg(sc, reg, win_pos | pf); t4_read_reg(sc, reg); off = tcb_addr - win_pos; for (i = 0; i < 4; i++) { uint32_t buf[8]; for (j = 0; j < 8; j++, off += 4) buf[j] = htonl(t4_read_reg(sc, base + off)); db_printf("%08x %08x %08x %08x %08x %08x %08x %08x\n", buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7]); } t4_write_reg(sc, reg, save); t4_read_reg(sc, reg); } static void t4_dump_devlog(struct adapter *sc) { struct devlog_params *dparams = &sc->params.devlog; struct fw_devlog_e e; int i, first, j, m, nentries, rc; uint64_t ftstamp = UINT64_MAX; if (dparams->start == 0) { db_printf("devlog params not valid\n"); return; } nentries = dparams->size / sizeof(struct fw_devlog_e); m = fwmtype_to_hwmtype(dparams->memtype); /* Find the first entry. 
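 * The firmware devlog is a fixed-size ring with no explicit head
 * pointer, so this pass scans for the oldest entry (the one with the
 * smallest timestamp); the printing loop below then starts there and
 * wraps modulo nentries until it returns to that entry or hits an
 * unused, all-zero slot.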
*/ first = -1; for (i = 0; i < nentries && !db_pager_quit; i++) { rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e), sizeof(e), (void *)&e); if (rc != 0) break; if (e.timestamp == 0) break; e.timestamp = be64toh(e.timestamp); if (e.timestamp < ftstamp) { ftstamp = e.timestamp; first = i; } } if (first == -1) return; i = first; do { rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e), sizeof(e), (void *)&e); if (rc != 0) return; if (e.timestamp == 0) return; e.timestamp = be64toh(e.timestamp); e.seqno = be32toh(e.seqno); for (j = 0; j < 8; j++) e.params[j] = be32toh(e.params[j]); db_printf("%10d %15ju %8s %8s ", e.seqno, e.timestamp, (e.level < nitems(devlog_level_strings) ? devlog_level_strings[e.level] : "UNKNOWN"), (e.facility < nitems(devlog_facility_strings) ? devlog_facility_strings[e.facility] : "UNKNOWN")); db_printf(e.fmt, e.params[0], e.params[1], e.params[2], e.params[3], e.params[4], e.params[5], e.params[6], e.params[7]); if (++i == nentries) i = 0; } while (i != first && !db_pager_quit); } static struct command_table db_t4_table = LIST_HEAD_INITIALIZER(db_t4_table); _DB_SET(_show, t4, NULL, db_show_table, 0, &db_t4_table); DB_FUNC(devlog, db_show_devlog, db_t4_table, CS_OWN, NULL) { device_t dev; int t; bool valid; valid = false; t = db_read_token(); if (t == tIDENT) { dev = device_lookup_by_name(db_tok_string); valid = true; } db_skip_to_eol(); if (!valid) { db_printf("usage: show t4 devlog <nexus>\n"); return; } if (dev == NULL) { db_printf("device not found\n"); return; } t4_dump_devlog(device_get_softc(dev)); } DB_FUNC(tcb, db_show_t4tcb, db_t4_table, CS_OWN, NULL) { device_t dev; int radix, tid, t; bool valid; valid = false; radix = db_radix; db_radix = 10; t = db_read_token(); if (t == tIDENT) { dev = device_lookup_by_name(db_tok_string); t = db_read_token(); if (t == tNUMBER) { tid = db_tok_number; valid = true; } } db_radix = radix; db_skip_to_eol(); if (!valid) { db_printf("usage: show t4 tcb <nexus> <tid>\n"); return; } if (dev == NULL) { db_printf("device not found\n"); return; } if (tid < 0) { db_printf("invalid tid\n"); return; } t4_dump_tcb(device_get_softc(dev), tid); } #endif static struct sx mlu; /* mod load unload */ SX_SYSINIT(cxgbe_mlu, &mlu, "cxgbe mod load/unload"); static int mod_event(module_t mod, int cmd, void *arg) { int rc = 0; static int loaded = 0; switch (cmd) { case MOD_LOAD: sx_xlock(&mlu); if (loaded++ == 0) { t4_sge_modload(); t4_register_cpl_handler(CPL_SET_TCB_RPL, set_tcb_rpl); t4_register_cpl_handler(CPL_L2T_WRITE_RPL, l2t_write_rpl); t4_register_cpl_handler(CPL_TRACE_PKT, t4_trace_pkt); t4_register_cpl_handler(CPL_T5_TRACE_PKT, t5_trace_pkt); sx_init(&t4_list_lock, "T4/T5 adapters"); SLIST_INIT(&t4_list); #ifdef TCP_OFFLOAD sx_init(&t4_uld_list_lock, "T4/T5 ULDs"); SLIST_INIT(&t4_uld_list); #endif t4_tracer_modload(); tweak_tunables(); } sx_xunlock(&mlu); break; case MOD_UNLOAD: sx_xlock(&mlu); if (--loaded == 0) { int tries; sx_slock(&t4_list_lock); if (!SLIST_EMPTY(&t4_list)) { rc = EBUSY; sx_sunlock(&t4_list_lock); goto done_unload; } #ifdef TCP_OFFLOAD sx_slock(&t4_uld_list_lock); if (!SLIST_EMPTY(&t4_uld_list)) { rc = EBUSY; sx_sunlock(&t4_uld_list_lock); sx_sunlock(&t4_list_lock); goto done_unload; } #endif tries = 0; while (tries++ < 5 && t4_sge_extfree_refs() != 0) { uprintf("%ju clusters with custom free routine " "still in use.\n", t4_sge_extfree_refs()); pause("t4unload", 2 * hz); } #ifdef TCP_OFFLOAD sx_sunlock(&t4_uld_list_lock); #endif sx_sunlock(&t4_list_lock); if (t4_sge_extfree_refs() == 0) { t4_tracer_modunload(); #ifdef
TCP_OFFLOAD sx_destroy(&t4_uld_list_lock); #endif sx_destroy(&t4_list_lock); t4_sge_modunload(); loaded = 0; } else { rc = EBUSY; loaded++; /* undo earlier decrement */ } } done_unload: sx_xunlock(&mlu); break; } return (rc); } static devclass_t t4_devclass, t5_devclass; static devclass_t cxgbe_devclass, cxl_devclass; static devclass_t vcxgbe_devclass, vcxl_devclass; DRIVER_MODULE(t4nex, pci, t4_driver, t4_devclass, mod_event, 0); MODULE_VERSION(t4nex, 1); MODULE_DEPEND(t4nex, firmware, 1, 1, 1); #ifdef DEV_NETMAP MODULE_DEPEND(t4nex, netmap, 1, 1, 1); #endif /* DEV_NETMAP */ DRIVER_MODULE(t5nex, pci, t5_driver, t5_devclass, mod_event, 0); MODULE_VERSION(t5nex, 1); MODULE_DEPEND(t5nex, firmware, 1, 1, 1); #ifdef DEV_NETMAP MODULE_DEPEND(t5nex, netmap, 1, 1, 1); #endif /* DEV_NETMAP */ DRIVER_MODULE(cxgbe, t4nex, cxgbe_driver, cxgbe_devclass, 0, 0); MODULE_VERSION(cxgbe, 1); DRIVER_MODULE(cxl, t5nex, cxl_driver, cxl_devclass, 0, 0); MODULE_VERSION(cxl, 1); DRIVER_MODULE(vcxgbe, cxgbe, vcxgbe_driver, vcxgbe_devclass, 0, 0); MODULE_VERSION(vcxgbe, 1); DRIVER_MODULE(vcxl, cxl, vcxl_driver, vcxl_devclass, 0, 0); MODULE_VERSION(vcxl, 1); Index: projects/clang390-import/sys/dev/cxgbe/t4_sge.c =================================================================== --- projects/clang390-import/sys/dev/cxgbe/t4_sge.c (revision 305686) +++ projects/clang390-import/sys/dev/cxgbe/t4_sge.c (revision 305687) @@ -1,5211 +1,5210 @@ /*- * Copyright (c) 2011 Chelsio Communications, Inc. * All rights reserved. * Written by: Navdeep Parhar * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DEV_NETMAP #include #include #include #include #include #endif #include "common/common.h" #include "common/t4_regs.h" #include "common/t4_regs_values.h" #include "common/t4_msg.h" #include "t4_l2t.h" #include "t4_mp_ring.h" #ifdef T4_PKT_TIMESTAMP #define RX_COPY_THRESHOLD (MINCLSIZE - 8) #else #define RX_COPY_THRESHOLD MINCLSIZE #endif /* * Ethernet frames are DMA'd at this byte offset into the freelist buffer. 
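 * A shift of 2 is typical: it offsets the 14-byte Ethernet header so
 * that the IP header that follows it lands on a 4-byte boundary (the
 * classic ETHER_ALIGN trick).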
* 0-7 are valid values. */ static int fl_pktshift = 2; TUNABLE_INT("hw.cxgbe.fl_pktshift", &fl_pktshift); /* * Pad ethernet payload up to this boundary. * -1: driver should figure out a good value. * 0: disable padding. * Any power of 2 from 32 to 4096 (both inclusive) is also a valid value. */ int fl_pad = -1; TUNABLE_INT("hw.cxgbe.fl_pad", &fl_pad); /* * Status page length. * -1: driver should figure out a good value. * 64 or 128 are the only other valid values. */ static int spg_len = -1; TUNABLE_INT("hw.cxgbe.spg_len", &spg_len); /* * Congestion drops. * -1: no congestion feedback (not recommended). * 0: backpressure the channel instead of dropping packets right away. * 1: no backpressure, drop packets for the congested queue immediately. */ static int cong_drop = 0; TUNABLE_INT("hw.cxgbe.cong_drop", &cong_drop); /* * Deliver multiple frames in the same free list buffer if they fit. * -1: let the driver decide whether to enable buffer packing or not. * 0: disable buffer packing. * 1: enable buffer packing. */ static int buffer_packing = -1; TUNABLE_INT("hw.cxgbe.buffer_packing", &buffer_packing); /* * Start next frame in a packed buffer at this boundary. * -1: driver should figure out a good value. * T4: driver will ignore this and use the same value as fl_pad above. * T5: 16, or a power of 2 from 64 to 4096 (both inclusive) is a valid value. */ static int fl_pack = -1; TUNABLE_INT("hw.cxgbe.fl_pack", &fl_pack); /* * Allow the driver to create mbuf(s) in a cluster allocated for rx. * 0: never; always allocate mbufs from the zone_mbuf UMA zone. * 1: ok to create mbuf(s) within a cluster if there is room. */ static int allow_mbufs_in_cluster = 1; TUNABLE_INT("hw.cxgbe.allow_mbufs_in_cluster", &allow_mbufs_in_cluster); /* * Largest rx cluster size that the driver is allowed to allocate. */ static int largest_rx_cluster = MJUM16BYTES; TUNABLE_INT("hw.cxgbe.largest_rx_cluster", &largest_rx_cluster); /* * Size of cluster allocation that's most likely to succeed. The driver will * fall back to this size if it fails to allocate clusters larger than this. */ static int safest_rx_cluster = PAGE_SIZE; TUNABLE_INT("hw.cxgbe.safest_rx_cluster", &safest_rx_cluster); struct txpkts { u_int wr_type; /* type 0 or type 1 */ u_int npkt; /* # of packets in this work request */ u_int plen; /* total payload (sum of all packets) */ u_int len16; /* # of 16B pieces used by this work request */ }; /* A packet's SGL. 
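 * (scatter/gather list: the sglist header plus up to TX_SGL_SEGS DMA
 * segments describing where one outbound frame's bytes live.)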
This + m_pkthdr has all info needed for tx */ struct sgl { struct sglist sg; struct sglist_seg seg[TX_SGL_SEGS]; }; static int service_iq(struct sge_iq *, int); static struct mbuf *get_fl_payload(struct adapter *, struct sge_fl *, uint32_t); static int t4_eth_rx(struct sge_iq *, const struct rss_header *, struct mbuf *); static inline void init_iq(struct sge_iq *, struct adapter *, int, int, int); static inline void init_fl(struct adapter *, struct sge_fl *, int, int, char *); static inline void init_eq(struct adapter *, struct sge_eq *, int, int, uint8_t, uint16_t, char *); static int alloc_ring(struct adapter *, size_t, bus_dma_tag_t *, bus_dmamap_t *, bus_addr_t *, void **); static int free_ring(struct adapter *, bus_dma_tag_t, bus_dmamap_t, bus_addr_t, void *); static int alloc_iq_fl(struct vi_info *, struct sge_iq *, struct sge_fl *, int, int); static int free_iq_fl(struct vi_info *, struct sge_iq *, struct sge_fl *); static void add_fl_sysctls(struct sysctl_ctx_list *, struct sysctl_oid *, struct sge_fl *); static int alloc_fwq(struct adapter *); static int free_fwq(struct adapter *); static int alloc_mgmtq(struct adapter *); static int free_mgmtq(struct adapter *); static int alloc_rxq(struct vi_info *, struct sge_rxq *, int, int, struct sysctl_oid *); static int free_rxq(struct vi_info *, struct sge_rxq *); #ifdef TCP_OFFLOAD static int alloc_ofld_rxq(struct vi_info *, struct sge_ofld_rxq *, int, int, struct sysctl_oid *); static int free_ofld_rxq(struct vi_info *, struct sge_ofld_rxq *); #endif #ifdef DEV_NETMAP static int alloc_nm_rxq(struct vi_info *, struct sge_nm_rxq *, int, int, struct sysctl_oid *); static int free_nm_rxq(struct vi_info *, struct sge_nm_rxq *); static int alloc_nm_txq(struct vi_info *, struct sge_nm_txq *, int, int, struct sysctl_oid *); static int free_nm_txq(struct vi_info *, struct sge_nm_txq *); #endif static int ctrl_eq_alloc(struct adapter *, struct sge_eq *); static int eth_eq_alloc(struct adapter *, struct vi_info *, struct sge_eq *); #ifdef TCP_OFFLOAD static int ofld_eq_alloc(struct adapter *, struct vi_info *, struct sge_eq *); #endif static int alloc_eq(struct adapter *, struct vi_info *, struct sge_eq *); static int free_eq(struct adapter *, struct sge_eq *); static int alloc_wrq(struct adapter *, struct vi_info *, struct sge_wrq *, struct sysctl_oid *); static int free_wrq(struct adapter *, struct sge_wrq *); static int alloc_txq(struct vi_info *, struct sge_txq *, int, struct sysctl_oid *); static int free_txq(struct vi_info *, struct sge_txq *); static void oneseg_dma_callback(void *, bus_dma_segment_t *, int, int); static inline void ring_fl_db(struct adapter *, struct sge_fl *); static int refill_fl(struct adapter *, struct sge_fl *, int); static void refill_sfl(void *); static int alloc_fl_sdesc(struct sge_fl *); static void free_fl_sdesc(struct adapter *, struct sge_fl *); static void find_best_refill_source(struct adapter *, struct sge_fl *, int); static void find_safe_refill_source(struct adapter *, struct sge_fl *); static void add_fl_to_sfl(struct adapter *, struct sge_fl *); static inline void get_pkt_gl(struct mbuf *, struct sglist *); static inline u_int txpkt_len16(u_int, u_int); static inline u_int txpkt_vm_len16(u_int, u_int); static inline u_int txpkts0_len16(u_int); static inline u_int txpkts1_len16(void); static u_int write_txpkt_wr(struct sge_txq *, struct fw_eth_tx_pkt_wr *, struct mbuf *, u_int); static u_int write_txpkt_vm_wr(struct sge_txq *, struct fw_eth_tx_pkt_vm_wr *, struct mbuf *, u_int); static int 
try_txpkts(struct mbuf *, struct mbuf *, struct txpkts *, u_int); static int add_to_txpkts(struct mbuf *, struct txpkts *, u_int); static u_int write_txpkts_wr(struct sge_txq *, struct fw_eth_tx_pkts_wr *, struct mbuf *, const struct txpkts *, u_int); static void write_gl_to_txd(struct sge_txq *, struct mbuf *, caddr_t *, int); static inline void copy_to_txd(struct sge_eq *, caddr_t, caddr_t *, int); static inline void ring_eq_db(struct adapter *, struct sge_eq *, u_int); static inline uint16_t read_hw_cidx(struct sge_eq *); static inline u_int reclaimable_tx_desc(struct sge_eq *); static inline u_int total_available_tx_desc(struct sge_eq *); static u_int reclaim_tx_descs(struct sge_txq *, u_int); static void tx_reclaim(void *, int); static __be64 get_flit(struct sglist_seg *, int, int); static int handle_sge_egr_update(struct sge_iq *, const struct rss_header *, struct mbuf *); static int handle_fw_msg(struct sge_iq *, const struct rss_header *, struct mbuf *); static int t4_handle_wrerr_rpl(struct adapter *, const __be64 *); static void wrq_tx_drain(void *, int); static void drain_wrq_wr_list(struct adapter *, struct sge_wrq *); static int sysctl_uint16(SYSCTL_HANDLER_ARGS); static int sysctl_bufsizes(SYSCTL_HANDLER_ARGS); static int sysctl_tc(SYSCTL_HANDLER_ARGS); static counter_u64_t extfree_refs; static counter_u64_t extfree_rels; an_handler_t t4_an_handler; fw_msg_handler_t t4_fw_msg_handler[NUM_FW6_TYPES]; cpl_handler_t t4_cpl_handler[NUM_CPL_CMDS]; static int an_not_handled(struct sge_iq *iq, const struct rsp_ctrl *ctrl) { #ifdef INVARIANTS panic("%s: async notification on iq %p (ctrl %p)", __func__, iq, ctrl); #else log(LOG_ERR, "%s: async notification on iq %p (ctrl %p)\n", __func__, iq, ctrl); #endif return (EDOOFUS); } int t4_register_an_handler(an_handler_t h) { uintptr_t *loc, new; new = h ? (uintptr_t)h : (uintptr_t)an_not_handled; loc = (uintptr_t *) &t4_an_handler; atomic_store_rel_ptr(loc, new); return (0); } static int fw_msg_not_handled(struct adapter *sc, const __be64 *rpl) { const struct cpl_fw6_msg *cpl = __containerof(rpl, struct cpl_fw6_msg, data[0]); #ifdef INVARIANTS panic("%s: fw_msg type %d", __func__, cpl->type); #else log(LOG_ERR, "%s: fw_msg type %d\n", __func__, cpl->type); #endif return (EDOOFUS); } int t4_register_fw_msg_handler(int type, fw_msg_handler_t h) { uintptr_t *loc, new; if (type >= nitems(t4_fw_msg_handler)) return (EINVAL); /* * These are dispatched by the handler for FW{4|6}_CPL_MSG using the CPL * handler dispatch table. Reject any attempt to install a handler for * this subtype. */ if (type == FW_TYPE_RSSCPL || type == FW6_TYPE_RSSCPL) return (EINVAL); new = h ? (uintptr_t)h : (uintptr_t)fw_msg_not_handled; loc = (uintptr_t *) &t4_fw_msg_handler[type]; atomic_store_rel_ptr(loc, new); return (0); } static int cpl_not_handled(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { #ifdef INVARIANTS panic("%s: opcode 0x%02x on iq %p with payload %p", __func__, rss->opcode, iq, m); #else log(LOG_ERR, "%s: opcode 0x%02x on iq %p with payload %p\n", __func__, rss->opcode, iq, m); m_freem(m); #endif return (EDOOFUS); } int t4_register_cpl_handler(int opcode, cpl_handler_t h) { uintptr_t *loc, new; if (opcode >= nitems(t4_cpl_handler)) return (EINVAL); new = h ? (uintptr_t)h : (uintptr_t)cpl_not_handled; loc = (uintptr_t *) &t4_cpl_handler[opcode]; atomic_store_rel_ptr(loc, new); return (0); } /* * Called on MOD_LOAD. Validates and calculates the SGE tunables. 
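 * It also seeds the CPL and firmware-message dispatch tables with the
 * default "not handled" handlers.  As a sketch of the registration
 * interface (my_set_tcb_rpl is a hypothetical handler):
 *
 *	t4_register_cpl_handler(CPL_SET_TCB_RPL, my_set_tcb_rpl);
 *
 * claims an opcode, and passing NULL as the handler restores the
 * default.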
*/ void t4_sge_modload(void) { int i; if (fl_pktshift < 0 || fl_pktshift > 7) { printf("Invalid hw.cxgbe.fl_pktshift value (%d)," " using 2 instead.\n", fl_pktshift); fl_pktshift = 2; } if (spg_len != 64 && spg_len != 128) { int len; #if defined(__i386__) || defined(__amd64__) len = cpu_clflush_line_size > 64 ? 128 : 64; #else len = 64; #endif if (spg_len != -1) { printf("Invalid hw.cxgbe.spg_len value (%d)," " using %d instead.\n", spg_len, len); } spg_len = len; } if (cong_drop < -1 || cong_drop > 1) { printf("Invalid hw.cxgbe.cong_drop value (%d)," " using 0 instead.\n", cong_drop); cong_drop = 0; } extfree_refs = counter_u64_alloc(M_WAITOK); extfree_rels = counter_u64_alloc(M_WAITOK); counter_u64_zero(extfree_refs); counter_u64_zero(extfree_rels); t4_an_handler = an_not_handled; for (i = 0; i < nitems(t4_fw_msg_handler); i++) t4_fw_msg_handler[i] = fw_msg_not_handled; for (i = 0; i < nitems(t4_cpl_handler); i++) t4_cpl_handler[i] = cpl_not_handled; t4_register_cpl_handler(CPL_FW4_MSG, handle_fw_msg); t4_register_cpl_handler(CPL_FW6_MSG, handle_fw_msg); t4_register_cpl_handler(CPL_SGE_EGR_UPDATE, handle_sge_egr_update); t4_register_cpl_handler(CPL_RX_PKT, t4_eth_rx); t4_register_fw_msg_handler(FW6_TYPE_CMD_RPL, t4_handle_fw_rpl); t4_register_fw_msg_handler(FW6_TYPE_WRERR_RPL, t4_handle_wrerr_rpl); } void t4_sge_modunload(void) { counter_u64_free(extfree_refs); counter_u64_free(extfree_rels); } uint64_t t4_sge_extfree_refs(void) { uint64_t refs, rels; rels = counter_u64_fetch(extfree_rels); refs = counter_u64_fetch(extfree_refs); return (refs - rels); } static inline void setup_pad_and_pack_boundaries(struct adapter *sc) { uint32_t v, m; int pad, pack; pad = fl_pad; if (fl_pad < 32 || fl_pad > 4096 || !powerof2(fl_pad)) { /* * If there is any chance that we might use buffer packing and * the chip is a T4, then pick 64 as the pad/pack boundary. Set * it to 32 in all other cases. */ pad = is_t4(sc) && buffer_packing ? 64 : 32; /* * For fl_pad = 0 we'll still write a reasonable value to the * register but all the freelists will opt out of padding. * We'll complain here only if the user tried to set it to a * value greater than 0 that was invalid. */ if (fl_pad > 0) { device_printf(sc->dev, "Invalid hw.cxgbe.fl_pad value" " (%d), using %d instead.\n", fl_pad, pad); } } m = V_INGPADBOUNDARY(M_INGPADBOUNDARY); v = V_INGPADBOUNDARY(ilog2(pad) - 5); t4_set_reg_field(sc, A_SGE_CONTROL, m, v); if (is_t4(sc)) { if (fl_pack != -1 && fl_pack != pad) { /* Complain but carry on. */ device_printf(sc->dev, "hw.cxgbe.fl_pack (%d) ignored," " using %d instead.\n", fl_pack, pad); } return; } pack = fl_pack; if (fl_pack < 16 || fl_pack == 32 || fl_pack > 4096 || !powerof2(fl_pack)) { pack = max(sc->params.pci.mps, CACHE_LINE_SIZE); MPASS(powerof2(pack)); if (pack < 16) pack = 16; if (pack == 32) pack = 64; if (pack > 4096) pack = 4096; if (fl_pack != -1) { device_printf(sc->dev, "Invalid hw.cxgbe.fl_pack value" " (%d), using %d instead.\n", fl_pack, pack); } } m = V_INGPACKBOUNDARY(M_INGPACKBOUNDARY); if (pack == 16) v = V_INGPACKBOUNDARY(0); else v = V_INGPACKBOUNDARY(ilog2(pack) - 5); MPASS(!is_t4(sc)); /* T4 doesn't have SGE_CONTROL2 */ t4_set_reg_field(sc, A_SGE_CONTROL2, m, v); } /* * adap->params.vpd.cclk must be set up before this is called. 
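 * A worked example, assuming cclk is kept in kHz as the
 * us_to_core_ticks() conversions below imply: with cclk = 200000, a
 * 100us holdoff timer is programmed as 100 * 200000 / 1000 = 20000
 * core-clock ticks, and timer_max is the largest microsecond value
 * whose tick count still fits in the TIMERVALUE0 field.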
*/ void t4_tweak_chip_settings(struct adapter *sc) { int i; uint32_t v, m; int intr_timer[SGE_NTIMERS] = {1, 5, 10, 50, 100, 200}; int timer_max = M_TIMERVALUE0 * 1000 / sc->params.vpd.cclk; int intr_pktcount[SGE_NCOUNTERS] = {1, 8, 16, 32}; /* 63 max */ uint16_t indsz = min(RX_COPY_THRESHOLD - 1, M_INDICATESIZE); static int sge_flbuf_sizes[] = { MCLBYTES, #if MJUMPAGESIZE != MCLBYTES MJUMPAGESIZE, MJUMPAGESIZE - CL_METADATA_SIZE, MJUMPAGESIZE - 2 * MSIZE - CL_METADATA_SIZE, #endif MJUM9BYTES, MJUM16BYTES, MCLBYTES - MSIZE - CL_METADATA_SIZE, MJUM9BYTES - CL_METADATA_SIZE, MJUM16BYTES - CL_METADATA_SIZE, }; KASSERT(sc->flags & MASTER_PF, ("%s: trying to change chip settings when not master.", __func__)); m = V_PKTSHIFT(M_PKTSHIFT) | F_RXPKTCPLMODE | F_EGRSTATUSPAGESIZE; v = V_PKTSHIFT(fl_pktshift) | F_RXPKTCPLMODE | V_EGRSTATUSPAGESIZE(spg_len == 128); t4_set_reg_field(sc, A_SGE_CONTROL, m, v); setup_pad_and_pack_boundaries(sc); v = V_HOSTPAGESIZEPF0(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF1(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF2(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF3(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF4(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF5(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF6(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF7(PAGE_SHIFT - 10); t4_write_reg(sc, A_SGE_HOST_PAGE_SIZE, v); KASSERT(nitems(sge_flbuf_sizes) <= SGE_FLBUF_SIZES, ("%s: hw buffer size table too big", __func__)); for (i = 0; i < min(nitems(sge_flbuf_sizes), SGE_FLBUF_SIZES); i++) { t4_write_reg(sc, A_SGE_FL_BUFFER_SIZE0 + (4 * i), sge_flbuf_sizes[i]); } v = V_THRESHOLD_0(intr_pktcount[0]) | V_THRESHOLD_1(intr_pktcount[1]) | V_THRESHOLD_2(intr_pktcount[2]) | V_THRESHOLD_3(intr_pktcount[3]); t4_write_reg(sc, A_SGE_INGRESS_RX_THRESHOLD, v); KASSERT(intr_timer[0] <= timer_max, ("%s: not a single usable timer (%d, %d)", __func__, intr_timer[0], timer_max)); for (i = 1; i < nitems(intr_timer); i++) { KASSERT(intr_timer[i] >= intr_timer[i - 1], ("%s: timers not listed in increasing order (%d)", __func__, i)); while (intr_timer[i] > timer_max) { if (i == nitems(intr_timer) - 1) { intr_timer[i] = timer_max; break; } intr_timer[i] += intr_timer[i - 1]; intr_timer[i] /= 2; } } v = V_TIMERVALUE0(us_to_core_ticks(sc, intr_timer[0])) | V_TIMERVALUE1(us_to_core_ticks(sc, intr_timer[1])); t4_write_reg(sc, A_SGE_TIMER_VALUE_0_AND_1, v); v = V_TIMERVALUE2(us_to_core_ticks(sc, intr_timer[2])) | V_TIMERVALUE3(us_to_core_ticks(sc, intr_timer[3])); t4_write_reg(sc, A_SGE_TIMER_VALUE_2_AND_3, v); v = V_TIMERVALUE4(us_to_core_ticks(sc, intr_timer[4])) | V_TIMERVALUE5(us_to_core_ticks(sc, intr_timer[5])); t4_write_reg(sc, A_SGE_TIMER_VALUE_4_AND_5, v); /* 4K, 16K, 64K, 256K DDP "page sizes" for TDDP */ v = V_HPZ0(0) | V_HPZ1(2) | V_HPZ2(4) | V_HPZ3(6); t4_write_reg(sc, A_ULP_RX_TDDP_PSZ, v); /* * 4K, 8K, 16K, 64K DDP "page sizes" for iSCSI DDP. These have been * chosen with MAXPHYS = 128K in mind. The largest DDP buffer that we * may have to deal with is MAXPHYS + 1 page. */ v = V_HPZ0(0) | V_HPZ1(1) | V_HPZ2(2) | V_HPZ3(4); t4_write_reg(sc, A_ULP_RX_ISCSI_PSZ, v); /* We use multiple DDP page sizes both in plain-TOE and ISCSI modes. */ m = v = F_TDDPTAGTCB | F_ISCSITAGTCB; t4_set_reg_field(sc, A_ULP_RX_CTL, m, v); m = V_INDICATESIZE(M_INDICATESIZE) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; v = V_INDICATESIZE(indsz) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; t4_set_reg_field(sc, A_TP_PARA_REG5, m, v); } /* * SGE wants the buffer to be at least 64B and then a multiple of 16. 
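 * e.g. 2048 qualifies, 48 fails the 64B minimum, and 72 is not a
 * multiple of 16.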
If * padding is in use, the buffer's start and end need to be aligned to the pad * boundary as well. We'll just make sure that the size is a multiple of the * boundary here, it is up to the buffer allocation code to make sure the start * of the buffer is aligned as well. */ static inline int hwsz_ok(struct adapter *sc, int hwsz) { int mask = fl_pad ? sc->params.sge.pad_boundary - 1 : 16 - 1; return (hwsz >= 64 && (hwsz & mask) == 0); } /* * XXX: driver really should be able to deal with unexpected settings. */ int t4_read_chip_settings(struct adapter *sc) { struct sge *s = &sc->sge; struct sge_params *sp = &sc->params.sge; int i, j, n, rc = 0; uint32_t m, v, r; uint16_t indsz = min(RX_COPY_THRESHOLD - 1, M_INDICATESIZE); static int sw_buf_sizes[] = { /* Sorted by size */ MCLBYTES, #if MJUMPAGESIZE != MCLBYTES MJUMPAGESIZE, #endif MJUM9BYTES, MJUM16BYTES }; struct sw_zone_info *swz, *safe_swz; struct hw_buf_info *hwb; m = F_RXPKTCPLMODE; v = F_RXPKTCPLMODE; r = sc->params.sge.sge_control; if ((r & m) != v) { device_printf(sc->dev, "invalid SGE_CONTROL(0x%x)\n", r); rc = EINVAL; } /* * If this changes then every single use of PAGE_SHIFT in the driver * needs to be carefully reviewed for PAGE_SHIFT vs sp->page_shift. */ if (sp->page_shift != PAGE_SHIFT) { device_printf(sc->dev, "invalid SGE_HOST_PAGE_SIZE(0x%x)\n", r); rc = EINVAL; } /* Filter out unusable hw buffer sizes entirely (mark with -2). */ hwb = &s->hw_buf_info[0]; for (i = 0; i < nitems(s->hw_buf_info); i++, hwb++) { r = sc->params.sge.sge_fl_buffer_size[i]; hwb->size = r; hwb->zidx = hwsz_ok(sc, r) ? -1 : -2; hwb->next = -1; } /* * Create a sorted list in decreasing order of hw buffer sizes (and so * increasing order of spare area) for each software zone. * * If padding is enabled then the start and end of the buffer must align * to the pad boundary; if packing is enabled then they must align with * the pack boundary as well. Allocations from the cluster zones are * aligned to min(size, 4K), so the buffer starts at that alignment and * ends at hwb->size alignment. If mbuf inlining is allowed the * starting alignment will be reduced to MSIZE and the driver will * exercise appropriate caution when deciding on the best buffer layout * to use. 
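 * For example, the MCLBYTES - MSIZE - CL_METADATA_SIZE hw buffer inside
 * a plain 2K cluster leaves MSIZE bytes at the front for one inline
 * mbuf and CL_METADATA_SIZE bytes at the tail for the refcount
 * metadata.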
*/ n = 0; /* no usable buffer size to begin with */ swz = &s->sw_zone_info[0]; safe_swz = NULL; for (i = 0; i < SW_ZONE_SIZES; i++, swz++) { int8_t head = -1, tail = -1; swz->size = sw_buf_sizes[i]; swz->zone = m_getzone(swz->size); swz->type = m_gettype(swz->size); if (swz->size < PAGE_SIZE) { MPASS(powerof2(swz->size)); if (fl_pad && (swz->size % sp->pad_boundary != 0)) continue; } if (swz->size == safest_rx_cluster) safe_swz = swz; hwb = &s->hw_buf_info[0]; for (j = 0; j < SGE_FLBUF_SIZES; j++, hwb++) { if (hwb->zidx != -1 || hwb->size > swz->size) continue; #ifdef INVARIANTS if (fl_pad) MPASS(hwb->size % sp->pad_boundary == 0); #endif hwb->zidx = i; if (head == -1) head = tail = j; else if (hwb->size < s->hw_buf_info[tail].size) { s->hw_buf_info[tail].next = j; tail = j; } else { int8_t *cur; struct hw_buf_info *t; for (cur = &head; *cur != -1; cur = &t->next) { t = &s->hw_buf_info[*cur]; if (hwb->size == t->size) { hwb->zidx = -2; break; } if (hwb->size > t->size) { hwb->next = *cur; *cur = j; break; } } } } swz->head_hwidx = head; swz->tail_hwidx = tail; if (tail != -1) { n++; if (swz->size - s->hw_buf_info[tail].size >= CL_METADATA_SIZE) sc->flags |= BUF_PACKING_OK; } } if (n == 0) { device_printf(sc->dev, "no usable SGE FL buffer size.\n"); rc = EINVAL; } s->safe_hwidx1 = -1; s->safe_hwidx2 = -1; if (safe_swz != NULL) { s->safe_hwidx1 = safe_swz->head_hwidx; for (i = safe_swz->head_hwidx; i != -1; i = hwb->next) { int spare; hwb = &s->hw_buf_info[i]; #ifdef INVARIANTS if (fl_pad) MPASS(hwb->size % sp->pad_boundary == 0); #endif spare = safe_swz->size - hwb->size; if (spare >= CL_METADATA_SIZE) { s->safe_hwidx2 = i; break; } } } if (sc->flags & IS_VF) return (0); v = V_HPZ0(0) | V_HPZ1(2) | V_HPZ2(4) | V_HPZ3(6); r = t4_read_reg(sc, A_ULP_RX_TDDP_PSZ); if (r != v) { device_printf(sc->dev, "invalid ULP_RX_TDDP_PSZ(0x%x)\n", r); rc = EINVAL; } m = v = F_TDDPTAGTCB; r = t4_read_reg(sc, A_ULP_RX_CTL); if ((r & m) != v) { device_printf(sc->dev, "invalid ULP_RX_CTL(0x%x)\n", r); rc = EINVAL; } m = V_INDICATESIZE(M_INDICATESIZE) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; v = V_INDICATESIZE(indsz) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; r = t4_read_reg(sc, A_TP_PARA_REG5); if ((r & m) != v) { device_printf(sc->dev, "invalid TP_PARA_REG5(0x%x)\n", r); rc = EINVAL; } t4_init_tp_params(sc); t4_read_mtu_tbl(sc, sc->params.mtus, NULL); t4_load_mtus(sc, sc->params.mtus, sc->params.a_wnd, sc->params.b_wnd); return (rc); } int t4_create_dma_tag(struct adapter *sc) { int rc; rc = bus_dma_tag_create(bus_get_dma_tag(sc->dev), 1, 0, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, BUS_SPACE_MAXSIZE, BUS_SPACE_UNRESTRICTED, BUS_SPACE_MAXSIZE, BUS_DMA_ALLOCNOW, NULL, NULL, &sc->dmat); if (rc != 0) { device_printf(sc->dev, "failed to create main DMA tag: %d\n", rc); } return (rc); } void t4_sge_sysctls(struct adapter *sc, struct sysctl_ctx_list *ctx, struct sysctl_oid_list *children) { struct sge_params *sp = &sc->params.sge; SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "buffer_sizes", CTLTYPE_STRING | CTLFLAG_RD, &sc->sge, 0, sysctl_bufsizes, "A", "freelist buffer sizes"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pktshift", CTLFLAG_RD, NULL, sp->fl_pktshift, "payload DMA offset in rx buffer (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pad", CTLFLAG_RD, NULL, sp->pad_boundary, "payload pad boundary (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "spg_len", CTLFLAG_RD, NULL, sp->spg_len, "status page size (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "cong_drop", CTLFLAG_RD, NULL, cong_drop, 
"congestion drop setting"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pack", CTLFLAG_RD, NULL, sp->pack_boundary, "payload pack boundary (bytes)"); } int t4_destroy_dma_tag(struct adapter *sc) { if (sc->dmat) bus_dma_tag_destroy(sc->dmat); return (0); } /* * Allocate and initialize the firmware event queue and the management queue. * * Returns errno on failure. Resources allocated up to that point may still be * allocated. Caller is responsible for cleanup in case this function fails. */ int t4_setup_adapter_queues(struct adapter *sc) { int rc; ADAPTER_LOCK_ASSERT_NOTOWNED(sc); sysctl_ctx_init(&sc->ctx); sc->flags |= ADAP_SYSCTL_CTX; /* * Firmware event queue */ rc = alloc_fwq(sc); if (rc != 0) return (rc); /* * Management queue. This is just a control queue that uses the fwq as * its associated iq. */ if (!(sc->flags & IS_VF)) rc = alloc_mgmtq(sc); return (rc); } /* * Idempotent */ int t4_teardown_adapter_queues(struct adapter *sc) { ADAPTER_LOCK_ASSERT_NOTOWNED(sc); /* Do this before freeing the queue */ if (sc->flags & ADAP_SYSCTL_CTX) { sysctl_ctx_free(&sc->ctx); sc->flags &= ~ADAP_SYSCTL_CTX; } free_mgmtq(sc); free_fwq(sc); return (0); } static inline int first_vector(struct vi_info *vi) { struct adapter *sc = vi->pi->adapter; if (sc->intr_count == 1) return (0); return (vi->first_intr); } /* * Given an arbitrary "index," come up with an iq that can be used by other * queues (of this VI) for interrupt forwarding, SGE egress updates, etc. * The iq returned is guaranteed to be something that takes direct interrupts. */ static struct sge_iq * vi_intr_iq(struct vi_info *vi, int idx) { struct adapter *sc = vi->pi->adapter; struct sge *s = &sc->sge; struct sge_iq *iq = NULL; int nintr, i; if (sc->intr_count == 1) return (&sc->sge.fwq); nintr = vi->nintr; KASSERT(nintr != 0, ("%s: vi %p has no exclusive interrupts, total interrupts = %d", __func__, vi, sc->intr_count)); i = idx % nintr; if (vi->flags & INTR_RXQ) { if (i < vi->nrxq) { iq = &s->rxq[vi->first_rxq + i].iq; goto done; } i -= vi->nrxq; } #ifdef TCP_OFFLOAD if (vi->flags & INTR_OFLD_RXQ) { if (i < vi->nofldrxq) { iq = &s->ofld_rxq[vi->first_ofld_rxq + i].iq; goto done; } i -= vi->nofldrxq; } #endif panic("%s: vi %p, intr_flags 0x%lx, idx %d, total intr %d\n", __func__, vi, vi->flags & INTR_ALL, idx, nintr); done: MPASS(iq != NULL); KASSERT(iq->flags & IQ_INTR, ("%s: iq %p (vi %p, intr_flags 0x%lx, idx %d)", __func__, iq, vi, vi->flags & INTR_ALL, idx)); return (iq); } /* Maximum payload that can be delivered with a single iq descriptor */ static inline int mtu_to_max_payload(struct adapter *sc, int mtu, const int toe) { int payload; #ifdef TCP_OFFLOAD if (toe) { payload = sc->tt.rx_coalesce ? 
G_RXCOALESCESIZE(t4_read_reg(sc, A_TP_PARA_REG2)) : mtu; } else { #endif /* large enough even when hw VLAN extraction is disabled */ payload = sc->params.sge.fl_pktshift + ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN + mtu; #ifdef TCP_OFFLOAD } #endif return (payload); } int t4_setup_vi_queues(struct vi_info *vi) { int rc = 0, i, j, intr_idx, iqid; struct sge_rxq *rxq; struct sge_txq *txq; struct sge_wrq *ctrlq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; struct sge_wrq *ofld_txq; #endif #ifdef DEV_NETMAP int saved_idx; struct sge_nm_rxq *nm_rxq; struct sge_nm_txq *nm_txq; #endif char name[16]; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct ifnet *ifp = vi->ifp; struct sysctl_oid *oid = device_get_sysctl_tree(vi->dev); struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); int maxp, mtu = ifp->if_mtu; /* Interrupt vector to start from (when using multiple vectors) */ intr_idx = first_vector(vi); #ifdef DEV_NETMAP saved_idx = intr_idx; if (ifp->if_capabilities & IFCAP_NETMAP) { /* netmap is supported with direct interrupts only. */ MPASS(vi->flags & INTR_RXQ); /* * We don't have buffers to back the netmap rx queues * right now so we create the queues in a way that * doesn't set off any congestion signal in the chip. */ oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "nm_rxq", CTLFLAG_RD, NULL, "rx queues"); for_each_nm_rxq(vi, i, nm_rxq) { rc = alloc_nm_rxq(vi, nm_rxq, intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "nm_txq", CTLFLAG_RD, NULL, "tx queues"); for_each_nm_txq(vi, i, nm_txq) { iqid = vi->first_nm_rxq + (i % vi->nnmrxq); rc = alloc_nm_txq(vi, nm_txq, iqid, i, oid); if (rc != 0) goto done; } } /* Normal rx queues and netmap rx queues share the same interrupts. */ intr_idx = saved_idx; #endif /* * First pass over all NIC and TOE rx queues: * a) initialize iq and fl * b) allocate queue iff it will take direct interrupts. */ maxp = mtu_to_max_payload(sc, mtu, 0); if (vi->flags & INTR_RXQ) { oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "rxq", CTLFLAG_RD, NULL, "rx queues"); } for_each_rxq(vi, i, rxq) { init_iq(&rxq->iq, sc, vi->tmr_idx, vi->pktc_idx, vi->qsize_rxq); snprintf(name, sizeof(name), "%s rxq%d-fl", device_get_nameunit(vi->dev), i); init_fl(sc, &rxq->fl, vi->qsize_rxq / 8, maxp, name); if (vi->flags & INTR_RXQ) { rxq->iq.flags |= IQ_INTR; rc = alloc_rxq(vi, rxq, intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } } #ifdef DEV_NETMAP if (ifp->if_capabilities & IFCAP_NETMAP) intr_idx = saved_idx + max(vi->nrxq, vi->nnmrxq); #endif #ifdef TCP_OFFLOAD maxp = mtu_to_max_payload(sc, mtu, 1); if (vi->flags & INTR_OFLD_RXQ) { oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ofld_rxq", CTLFLAG_RD, NULL, "rx queues for offloaded TCP connections"); } for_each_ofld_rxq(vi, i, ofld_rxq) { init_iq(&ofld_rxq->iq, sc, vi->tmr_idx, vi->pktc_idx, vi->qsize_rxq); snprintf(name, sizeof(name), "%s ofld_rxq%d-fl", device_get_nameunit(vi->dev), i); init_fl(sc, &ofld_rxq->fl, vi->qsize_rxq / 8, maxp, name); if (vi->flags & INTR_OFLD_RXQ) { ofld_rxq->iq.flags |= IQ_INTR; rc = alloc_ofld_rxq(vi, ofld_rxq, intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } } #endif /* * Second pass over all NIC and TOE rx queues. The queues forwarding * their interrupts are allocated now. 
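 * Each one is pointed at the abs_id of a direct-interrupt iq chosen
 * round-robin by vi_intr_iq(); e.g. with 2 exclusive vectors,
 * forwarding queues 0, 1, 2, 3 are pointed at direct iqs 0, 1, 0, 1
 * respectively.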
*/ j = 0; if (!(vi->flags & INTR_RXQ)) { oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "rxq", CTLFLAG_RD, NULL, "rx queues"); for_each_rxq(vi, i, rxq) { MPASS(!(rxq->iq.flags & IQ_INTR)); intr_idx = vi_intr_iq(vi, j)->abs_id; rc = alloc_rxq(vi, rxq, intr_idx, i, oid); if (rc != 0) goto done; j++; } } #ifdef TCP_OFFLOAD if (vi->nofldrxq != 0 && !(vi->flags & INTR_OFLD_RXQ)) { oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ofld_rxq", CTLFLAG_RD, NULL, "rx queues for offloaded TCP connections"); for_each_ofld_rxq(vi, i, ofld_rxq) { MPASS(!(ofld_rxq->iq.flags & IQ_INTR)); intr_idx = vi_intr_iq(vi, j)->abs_id; rc = alloc_ofld_rxq(vi, ofld_rxq, intr_idx, i, oid); if (rc != 0) goto done; j++; } } #endif /* * Now the tx queues. Only one pass needed. */ oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "txq", CTLFLAG_RD, NULL, "tx queues"); j = 0; for_each_txq(vi, i, txq) { iqid = vi_intr_iq(vi, j)->cntxt_id; snprintf(name, sizeof(name), "%s txq%d", device_get_nameunit(vi->dev), i); init_eq(sc, &txq->eq, EQ_ETH, vi->qsize_txq, pi->tx_chan, iqid, name); rc = alloc_txq(vi, txq, i, oid); if (rc != 0) goto done; j++; } #ifdef TCP_OFFLOAD oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ofld_txq", CTLFLAG_RD, NULL, "tx queues for offloaded TCP connections"); for_each_ofld_txq(vi, i, ofld_txq) { struct sysctl_oid *oid2; iqid = vi_intr_iq(vi, j)->cntxt_id; snprintf(name, sizeof(name), "%s ofld_txq%d", device_get_nameunit(vi->dev), i); init_eq(sc, &ofld_txq->eq, EQ_OFLD, vi->qsize_txq, pi->tx_chan, iqid, name); snprintf(name, sizeof(name), "%d", i); oid2 = SYSCTL_ADD_NODE(&vi->ctx, SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL, "offload tx queue"); rc = alloc_wrq(sc, vi, ofld_txq, oid2); if (rc != 0) goto done; j++; } #endif /* * Finally, the control queue. */ if (!IS_MAIN_VI(vi) || sc->flags & IS_VF) goto done; oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ctrlq", CTLFLAG_RD, NULL, "ctrl queue"); ctrlq = &sc->sge.ctrlq[pi->port_id]; iqid = vi_intr_iq(vi, 0)->cntxt_id; snprintf(name, sizeof(name), "%s ctrlq", device_get_nameunit(vi->dev)); init_eq(sc, &ctrlq->eq, EQ_CTRL, CTRL_EQ_QSIZE, pi->tx_chan, iqid, name); rc = alloc_wrq(sc, vi, ctrlq, oid); done: if (rc) t4_teardown_vi_queues(vi); return (rc); } /* * Idempotent */ int t4_teardown_vi_queues(struct vi_info *vi) { int i; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct sge_rxq *rxq; struct sge_txq *txq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; struct sge_wrq *ofld_txq; #endif #ifdef DEV_NETMAP struct sge_nm_rxq *nm_rxq; struct sge_nm_txq *nm_txq; #endif /* Do this before freeing the queues */ if (vi->flags & VI_SYSCTL_CTX) { sysctl_ctx_free(&vi->ctx); vi->flags &= ~VI_SYSCTL_CTX; } #ifdef DEV_NETMAP if (vi->ifp->if_capabilities & IFCAP_NETMAP) { for_each_nm_txq(vi, i, nm_txq) { free_nm_txq(vi, nm_txq); } for_each_nm_rxq(vi, i, nm_rxq) { free_nm_rxq(vi, nm_rxq); } } #endif /* * Take down all the tx queues first, as they reference the rx queues * (for egress updates, etc.). */ if (IS_MAIN_VI(vi) && !(sc->flags & IS_VF)) free_wrq(sc, &sc->sge.ctrlq[pi->port_id]); for_each_txq(vi, i, txq) { free_txq(vi, txq); } #ifdef TCP_OFFLOAD for_each_ofld_txq(vi, i, ofld_txq) { free_wrq(sc, ofld_txq); } #endif /* * Then take down the rx queues that forward their interrupts, as they * reference other rx queues. 
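 * (A forwarding iq was created with the abs_id of its target, so the
 * direct-interrupt queues must outlive it.)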
*/ for_each_rxq(vi, i, rxq) { if ((rxq->iq.flags & IQ_INTR) == 0) free_rxq(vi, rxq); } #ifdef TCP_OFFLOAD for_each_ofld_rxq(vi, i, ofld_rxq) { if ((ofld_rxq->iq.flags & IQ_INTR) == 0) free_ofld_rxq(vi, ofld_rxq); } #endif /* * Then take down the rx queues that take direct interrupts. */ for_each_rxq(vi, i, rxq) { if (rxq->iq.flags & IQ_INTR) free_rxq(vi, rxq); } #ifdef TCP_OFFLOAD for_each_ofld_rxq(vi, i, ofld_rxq) { if (ofld_rxq->iq.flags & IQ_INTR) free_ofld_rxq(vi, ofld_rxq); } #endif return (0); } /* * Deals with errors and the firmware event queue. All data rx queues forward * their interrupt to the firmware event queue. */ void t4_intr_all(void *arg) { struct adapter *sc = arg; struct sge_iq *fwq = &sc->sge.fwq; t4_intr_err(arg); if (atomic_cmpset_int(&fwq->state, IQS_IDLE, IQS_BUSY)) { service_iq(fwq, 0); atomic_cmpset_int(&fwq->state, IQS_BUSY, IQS_IDLE); } } /* Deals with error interrupts */ void t4_intr_err(void *arg) { struct adapter *sc = arg; t4_write_reg(sc, MYPF_REG(A_PCIE_PF_CLI), 0); t4_slow_intr_handler(sc); } void t4_intr_evt(void *arg) { struct sge_iq *iq = arg; if (atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_BUSY)) { service_iq(iq, 0); atomic_cmpset_int(&iq->state, IQS_BUSY, IQS_IDLE); } } void t4_intr(void *arg) { struct sge_iq *iq = arg; if (atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_BUSY)) { service_iq(iq, 0); atomic_cmpset_int(&iq->state, IQS_BUSY, IQS_IDLE); } } void t4_vi_intr(void *arg) { struct irq *irq = arg; #ifdef DEV_NETMAP if (atomic_cmpset_int(&irq->nm_state, NM_ON, NM_BUSY)) { t4_nm_intr(irq->nm_rxq); atomic_cmpset_int(&irq->nm_state, NM_BUSY, NM_ON); } #endif if (irq->rxq != NULL) t4_intr(irq->rxq); } /* * Deals with anything and everything on the given ingress queue. */ static int service_iq(struct sge_iq *iq, int budget) { struct sge_iq *q; struct sge_rxq *rxq = iq_to_rxq(iq); /* Use iff iq is part of rxq */ struct sge_fl *fl; /* Use iff IQ_HAS_FL */ struct adapter *sc = iq->adapter; struct iq_desc *d = &iq->desc[iq->cidx]; int ndescs = 0, limit; int rsp_type, refill; uint32_t lq; uint16_t fl_hw_cidx; struct mbuf *m0; STAILQ_HEAD(, sge_iq) iql = STAILQ_HEAD_INITIALIZER(iql); #if defined(INET) || defined(INET6) const struct timeval lro_timeout = {0, sc->lro_timeout}; #endif KASSERT(iq->state == IQS_BUSY, ("%s: iq %p not BUSY", __func__, iq)); limit = budget ? budget : iq->qsize / 16; if (iq->flags & IQ_HAS_FL) { fl = &rxq->fl; fl_hw_cidx = fl->hw_cidx; /* stable snapshot */ } else { fl = NULL; fl_hw_cidx = 0; /* to silence gcc warning */ } /* * We always come back and check the descriptor ring for new indirect * interrupts and other responses after running a single handler. */ for (;;) { while ((d->rsp.u.type_gen & F_RSPD_GEN) == iq->gen) { rmb(); refill = 0; m0 = NULL; rsp_type = G_RSPD_TYPE(d->rsp.u.type_gen); lq = be32toh(d->rsp.pldbuflen_qid); switch (rsp_type) { case X_RSPD_TYPE_FLBUF: KASSERT(iq->flags & IQ_HAS_FL, ("%s: data for an iq (%p) with no freelist", __func__, iq)); m0 = get_fl_payload(sc, fl, lq); if (__predict_false(m0 == NULL)) goto process_iql; refill = IDXDIFF(fl->hw_cidx, fl_hw_cidx, fl->sidx) > 2; #ifdef T4_PKT_TIMESTAMP /* * 60 bit timestamp for the payload is * *(uint64_t *)m0->m_pktdat. Note that it is * in the leading free-space in the mbuf. The * kernel can clobber it during a pullup, * m_copymdata, etc. You need to make sure that * the mbuf reaches you unmolested if you care * about the timestamp. 
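 * A sketch of reading it promptly on the consumer side, before any
 * pullup can happen:
 *
 *	uint64_t ts = *(const uint64_t *)m0->m_pktdat;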
*/ *(uint64_t *)m0->m_pktdat = be64toh(ctrl->u.last_flit) & 0xfffffffffffffff; #endif /* fall through */ case X_RSPD_TYPE_CPL: KASSERT(d->rss.opcode < NUM_CPL_CMDS, ("%s: bad opcode %02x.", __func__, d->rss.opcode)); t4_cpl_handler[d->rss.opcode](iq, &d->rss, m0); break; case X_RSPD_TYPE_INTR: /* * Interrupts should be forwarded only to queues * that are not forwarding their interrupts. * This means service_iq can recurse but only 1 * level deep. */ KASSERT(budget == 0, ("%s: budget %u, rsp_type %u", __func__, budget, rsp_type)); /* * There are 1K interrupt-capable queues (qids 0 * through 1023). A response type indicating a * forwarded interrupt with a qid >= 1K is an * iWARP async notification. */ if (lq >= 1024) { t4_an_handler(iq, &d->rsp); break; } q = sc->sge.iqmap[lq - sc->sge.iq_start - sc->sge.iq_base]; if (atomic_cmpset_int(&q->state, IQS_IDLE, IQS_BUSY)) { if (service_iq(q, q->qsize / 16) == 0) { atomic_cmpset_int(&q->state, IQS_BUSY, IQS_IDLE); } else { STAILQ_INSERT_TAIL(&iql, q, link); } } break; default: KASSERT(0, ("%s: illegal response type %d on iq %p", __func__, rsp_type, iq)); log(LOG_ERR, "%s: illegal response type %d on iq %p", device_get_nameunit(sc->dev), rsp_type, iq); break; } d++; if (__predict_false(++iq->cidx == iq->sidx)) { iq->cidx = 0; iq->gen ^= F_RSPD_GEN; d = &iq->desc[0]; } if (__predict_false(++ndescs == limit)) { t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID(iq->cntxt_id) | V_SEINTARM(V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX))); ndescs = 0; #if defined(INET) || defined(INET6) if (iq->flags & IQ_LRO_ENABLED && sc->lro_timeout != 0) { tcp_lro_flush_inactive(&rxq->lro, &lro_timeout); } #endif if (budget) { if (iq->flags & IQ_HAS_FL) { FL_LOCK(fl); refill_fl(sc, fl, 32); FL_UNLOCK(fl); } return (EINPROGRESS); } } if (refill) { FL_LOCK(fl); refill_fl(sc, fl, 32); FL_UNLOCK(fl); fl_hw_cidx = fl->hw_cidx; } } process_iql: if (STAILQ_EMPTY(&iql)) break; /* * Process the head only, and send it to the back of the list if * it's still not done. */ q = STAILQ_FIRST(&iql); STAILQ_REMOVE_HEAD(&iql, link); if (service_iq(q, q->qsize / 8) == 0) atomic_cmpset_int(&q->state, IQS_BUSY, IQS_IDLE); else STAILQ_INSERT_TAIL(&iql, q, link); } #if defined(INET) || defined(INET6) if (iq->flags & IQ_LRO_ENABLED) { struct lro_ctrl *lro = &rxq->lro; tcp_lro_flush_all(lro); } #endif t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID((u32)iq->cntxt_id) | V_SEINTARM(iq->intr_params)); if (iq->flags & IQ_HAS_FL) { int starved; FL_LOCK(fl); starved = refill_fl(sc, fl, 64); FL_UNLOCK(fl); if (__predict_false(starved != 0)) add_fl_to_sfl(sc, fl); } return (0); } static inline int cl_has_metadata(struct sge_fl *fl, struct cluster_layout *cll) { int rc = fl->flags & FL_BUF_PACKING || cll->region1 > 0; if (rc) MPASS(cll->region3 >= CL_METADATA_SIZE); return (rc); } static inline struct cluster_metadata * cl_metadata(struct adapter *sc, struct sge_fl *fl, struct cluster_layout *cll, caddr_t cl) { if (cl_has_metadata(fl, cll)) { struct sw_zone_info *swz = &sc->sge.sw_zone_info[cll->zidx]; return ((struct cluster_metadata *)(cl + swz->size) - 1); } return (NULL); } static void rxb_free(struct mbuf *m, void *arg1, void *arg2) { uma_zone_t zone = arg1; caddr_t cl = arg2; uma_zfree(zone, cl); counter_u64_add(extfree_rels, 1); } /* * The mbuf returned by this function could be allocated from zone_mbuf or * constructed in spare room in the cluster. 
* * The mbuf carries the payload in one of these ways * a) frame inside the mbuf (mbuf from zone_mbuf) * b) m_cljset (for clusters without metadata) zone_mbuf * c) m_extaddref (cluster with metadata) inline mbuf * d) m_extaddref (cluster with metadata) zone_mbuf */ static struct mbuf * get_scatter_segment(struct adapter *sc, struct sge_fl *fl, int fr_offset, int remaining) { struct mbuf *m; struct fl_sdesc *sd = &fl->sdesc[fl->cidx]; struct cluster_layout *cll = &sd->cll; struct sw_zone_info *swz = &sc->sge.sw_zone_info[cll->zidx]; struct hw_buf_info *hwb = &sc->sge.hw_buf_info[cll->hwidx]; struct cluster_metadata *clm = cl_metadata(sc, fl, cll, sd->cl); int len, blen; caddr_t payload; blen = hwb->size - fl->rx_offset; /* max possible in this buf */ len = min(remaining, blen); payload = sd->cl + cll->region1 + fl->rx_offset; if (fl->flags & FL_BUF_PACKING) { const u_int l = fr_offset + len; const u_int pad = roundup2(l, fl->buf_boundary) - l; if (fl->rx_offset + len + pad < hwb->size) blen = len + pad; MPASS(fl->rx_offset + blen <= hwb->size); } else { MPASS(fl->rx_offset == 0); /* not packing */ } if (sc->sc_do_rxcopy && len < RX_COPY_THRESHOLD) { /* * Copy payload into a freshly allocated mbuf. */ m = fr_offset == 0 ? m_gethdr(M_NOWAIT, MT_DATA) : m_get(M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); fl->mbuf_allocated++; #ifdef T4_PKT_TIMESTAMP /* Leave room for a timestamp */ m->m_data += 8; #endif /* copy data to mbuf */ bcopy(payload, mtod(m, caddr_t), len); } else if (sd->nmbuf * MSIZE < cll->region1) { /* * There's spare room in the cluster for an mbuf. Create one * and associate it with the payload that's in the cluster. */ MPASS(clm != NULL); m = (struct mbuf *)(sd->cl + sd->nmbuf * MSIZE); /* No bzero required */ if (m_init(m, M_NOWAIT, MT_DATA, fr_offset == 0 ? M_PKTHDR | M_NOFREE : M_NOFREE)) return (NULL); fl->mbuf_inlined++; m_extaddref(m, payload, blen, &clm->refcount, rxb_free, swz->zone, sd->cl); if (sd->nmbuf++ == 0) counter_u64_add(extfree_refs, 1); } else { /* * Grab an mbuf from zone_mbuf and associate it with the * payload in the cluster. */ m = fr_offset == 0 ? 
m_gethdr(M_NOWAIT, MT_DATA) : m_get(M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); fl->mbuf_allocated++; if (clm != NULL) { m_extaddref(m, payload, blen, &clm->refcount, rxb_free, swz->zone, sd->cl); if (sd->nmbuf++ == 0) counter_u64_add(extfree_refs, 1); } else { m_cljset(m, sd->cl, swz->type); sd->cl = NULL; /* consumed, not a recycle candidate */ } } if (fr_offset == 0) m->m_pkthdr.len = remaining; m->m_len = len; if (fl->flags & FL_BUF_PACKING) { fl->rx_offset += blen; MPASS(fl->rx_offset <= hwb->size); if (fl->rx_offset < hwb->size) return (m); /* without advancing the cidx */ } if (__predict_false(++fl->cidx % 8 == 0)) { uint16_t cidx = fl->cidx / 8; if (__predict_false(cidx == fl->sidx)) fl->cidx = cidx = 0; fl->hw_cidx = cidx; } fl->rx_offset = 0; return (m); } static struct mbuf * get_fl_payload(struct adapter *sc, struct sge_fl *fl, uint32_t len_newbuf) { struct mbuf *m0, *m, **pnext; u_int remaining; const u_int total = G_RSPD_LEN(len_newbuf); if (__predict_false(fl->flags & FL_BUF_RESUME)) { M_ASSERTPKTHDR(fl->m0); MPASS(fl->m0->m_pkthdr.len == total); MPASS(fl->remaining < total); m0 = fl->m0; pnext = fl->pnext; remaining = fl->remaining; fl->flags &= ~FL_BUF_RESUME; goto get_segment; } if (fl->rx_offset > 0 && len_newbuf & F_RSPD_NEWBUF) { fl->rx_offset = 0; if (__predict_false(++fl->cidx % 8 == 0)) { uint16_t cidx = fl->cidx / 8; if (__predict_false(cidx == fl->sidx)) fl->cidx = cidx = 0; fl->hw_cidx = cidx; } } /* * Payload starts at rx_offset in the current hw buffer. Its length is * 'len' and it may span multiple hw buffers. */ m0 = get_scatter_segment(sc, fl, 0, total); if (m0 == NULL) return (NULL); remaining = total - m0->m_len; pnext = &m0->m_next; while (remaining > 0) { get_segment: MPASS(fl->rx_offset == 0); m = get_scatter_segment(sc, fl, total - remaining, remaining); if (__predict_false(m == NULL)) { fl->m0 = m0; fl->pnext = pnext; fl->remaining = remaining; fl->flags |= FL_BUF_RESUME; return (NULL); } *pnext = m; pnext = &m->m_next; remaining -= m->m_len; } *pnext = NULL; M_ASSERTPKTHDR(m0); return (m0); } static int t4_eth_rx(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m0) { struct sge_rxq *rxq = iq_to_rxq(iq); struct ifnet *ifp = rxq->ifp; struct adapter *sc = iq->adapter; const struct cpl_rx_pkt *cpl = (const void *)(rss + 1); #if defined(INET) || defined(INET6) struct lro_ctrl *lro = &rxq->lro; #endif static const int sw_hashtype[4][2] = { {M_HASHTYPE_NONE, M_HASHTYPE_NONE}, {M_HASHTYPE_RSS_IPV4, M_HASHTYPE_RSS_IPV6}, {M_HASHTYPE_RSS_TCP_IPV4, M_HASHTYPE_RSS_TCP_IPV6}, {M_HASHTYPE_RSS_UDP_IPV4, M_HASHTYPE_RSS_UDP_IPV6}, }; KASSERT(m0 != NULL, ("%s: no payload with opcode %02x", __func__, rss->opcode)); m0->m_pkthdr.len -= sc->params.sge.fl_pktshift; m0->m_len -= sc->params.sge.fl_pktshift; m0->m_data += sc->params.sge.fl_pktshift; m0->m_pkthdr.rcvif = ifp; M_HASHTYPE_SET(m0, sw_hashtype[rss->hash_type][rss->ipv6]); m0->m_pkthdr.flowid = be32toh(rss->hash_val); if (cpl->csum_calc && !cpl->err_vec) { if (ifp->if_capenable & IFCAP_RXCSUM && cpl->l2info & htobe32(F_RXF_IP)) { m0->m_pkthdr.csum_flags = (CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR); rxq->rxcsum++; } else if (ifp->if_capenable & IFCAP_RXCSUM_IPV6 && cpl->l2info & htobe32(F_RXF_IP6)) { m0->m_pkthdr.csum_flags = (CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR); rxq->rxcsum++; } if (__predict_false(cpl->ip_frag)) m0->m_pkthdr.csum_data = be16toh(cpl->csum); else m0->m_pkthdr.csum_data = 0xffff; } if (cpl->vlan_ex) { m0->m_pkthdr.ether_vtag = be16toh(cpl->vlan); 
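		/*
		 * The 802.1Q tag arrives out of band in the CPL: ether_vtag
		 * carries the TCI and M_VLANTAG tells the stack that the tag
		 * has already been stripped from the frame.
		 */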
m0->m_flags |= M_VLANTAG; rxq->vlan_extraction++; } #if defined(INET) || defined(INET6) - if (cpl->l2info & htobe32(F_RXF_LRO) && - iq->flags & IQ_LRO_ENABLED && + if (iq->flags & IQ_LRO_ENABLED && tcp_lro_rx(lro, m0, 0) == 0) { /* queued for LRO */ } else #endif ifp->if_input(ifp, m0); return (0); } /* * Must drain the wrq or make sure that someone else will. */ static void wrq_tx_drain(void *arg, int n) { struct sge_wrq *wrq = arg; struct sge_eq *eq = &wrq->eq; EQ_LOCK(eq); if (TAILQ_EMPTY(&wrq->incomplete_wrs) && !STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(wrq->adapter, wrq); EQ_UNLOCK(eq); } static void drain_wrq_wr_list(struct adapter *sc, struct sge_wrq *wrq) { struct sge_eq *eq = &wrq->eq; u_int available, dbdiff; /* # of hardware descriptors */ u_int n; struct wrqe *wr; struct fw_eth_tx_pkt_wr *dst; /* any fw WR struct will do */ EQ_LOCK_ASSERT_OWNED(eq); MPASS(TAILQ_EMPTY(&wrq->incomplete_wrs)); wr = STAILQ_FIRST(&wrq->wr_list); MPASS(wr != NULL); /* Must be called with something useful to do */ MPASS(eq->pidx == eq->dbidx); dbdiff = 0; do { eq->cidx = read_hw_cidx(eq); if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; MPASS(wr->wrq == wrq); n = howmany(wr->wr_len, EQ_ESIZE); if (available < n) break; dst = (void *)&eq->desc[eq->pidx]; if (__predict_true(eq->sidx - eq->pidx > n)) { /* Won't wrap, won't end exactly at the status page. */ bcopy(&wr->wr[0], dst, wr->wr_len); eq->pidx += n; } else { int first_portion = (eq->sidx - eq->pidx) * EQ_ESIZE; bcopy(&wr->wr[0], dst, first_portion); if (wr->wr_len > first_portion) { bcopy(&wr->wr[first_portion], &eq->desc[0], wr->wr_len - first_portion); } eq->pidx = n - (eq->sidx - eq->pidx); } if (available < eq->sidx / 4 && atomic_cmpset_int(&eq->equiq, 0, 1)) { dst->equiq_to_len16 |= htobe32(F_FW_WR_EQUIQ | F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } else if (IDXDIFF(eq->pidx, eq->equeqidx, eq->sidx) >= 32) { dst->equiq_to_len16 |= htobe32(F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } dbdiff += n; if (dbdiff >= 16) { ring_eq_db(sc, eq, dbdiff); dbdiff = 0; } STAILQ_REMOVE_HEAD(&wrq->wr_list, link); free_wrqe(wr); MPASS(wrq->nwr_pending > 0); wrq->nwr_pending--; MPASS(wrq->ndesc_needed >= n); wrq->ndesc_needed -= n; } while ((wr = STAILQ_FIRST(&wrq->wr_list)) != NULL); if (dbdiff) ring_eq_db(sc, eq, dbdiff); } /* * Doesn't fail. Holds on to work requests it can't send right away. */ void t4_wrq_tx_locked(struct adapter *sc, struct sge_wrq *wrq, struct wrqe *wr) { #ifdef INVARIANTS struct sge_eq *eq = &wrq->eq; #endif EQ_LOCK_ASSERT_OWNED(eq); MPASS(wr != NULL); MPASS(wr->wr_len > 0 && wr->wr_len <= SGE_MAX_WR_LEN); MPASS((wr->wr_len & 0x7) == 0); STAILQ_INSERT_TAIL(&wrq->wr_list, wr, link); wrq->nwr_pending++; wrq->ndesc_needed += howmany(wr->wr_len, EQ_ESIZE); if (!TAILQ_EMPTY(&wrq->incomplete_wrs)) return; /* commit_wrq_wr will drain wr_list as well. */ drain_wrq_wr_list(sc, wrq); /* Doorbell must have caught up to the pidx. 
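 * A worked example of the ring arithmetic used above: with
 * eq->sidx = 512 and eq->pidx = 510, a 3-descriptor WR wraps and the
 * new pidx is 3 - (512 - 510) = 1.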
*/ MPASS(eq->pidx == eq->dbidx); } void t4_update_fl_bufsize(struct ifnet *ifp) { struct vi_info *vi = ifp->if_softc; struct adapter *sc = vi->pi->adapter; struct sge_rxq *rxq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif struct sge_fl *fl; int i, maxp, mtu = ifp->if_mtu; maxp = mtu_to_max_payload(sc, mtu, 0); for_each_rxq(vi, i, rxq) { fl = &rxq->fl; FL_LOCK(fl); find_best_refill_source(sc, fl, maxp); FL_UNLOCK(fl); } #ifdef TCP_OFFLOAD maxp = mtu_to_max_payload(sc, mtu, 1); for_each_ofld_rxq(vi, i, ofld_rxq) { fl = &ofld_rxq->fl; FL_LOCK(fl); find_best_refill_source(sc, fl, maxp); FL_UNLOCK(fl); } #endif } static inline int mbuf_nsegs(struct mbuf *m) { M_ASSERTPKTHDR(m); KASSERT(m->m_pkthdr.l5hlen > 0, ("%s: mbuf %p missing information on # of segments.", __func__, m)); return (m->m_pkthdr.l5hlen); } static inline void set_mbuf_nsegs(struct mbuf *m, uint8_t nsegs) { M_ASSERTPKTHDR(m); m->m_pkthdr.l5hlen = nsegs; } static inline int mbuf_len16(struct mbuf *m) { int n; M_ASSERTPKTHDR(m); n = m->m_pkthdr.PH_loc.eight[0]; MPASS(n > 0 && n <= SGE_MAX_WR_LEN / 16); return (n); } static inline void set_mbuf_len16(struct mbuf *m, uint8_t len16) { M_ASSERTPKTHDR(m); m->m_pkthdr.PH_loc.eight[0] = len16; } static inline int needs_tso(struct mbuf *m) { M_ASSERTPKTHDR(m); if (m->m_pkthdr.csum_flags & CSUM_TSO) { KASSERT(m->m_pkthdr.tso_segsz > 0, ("%s: TSO requested in mbuf %p but MSS not provided", __func__, m)); return (1); } return (0); } static inline int needs_l3_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); if (m->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TSO)) return (1); return (0); } static inline int needs_l4_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); if (m->m_pkthdr.csum_flags & (CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) return (1); return (0); } static inline int needs_vlan_insertion(struct mbuf *m) { M_ASSERTPKTHDR(m); if (m->m_flags & M_VLANTAG) { KASSERT(m->m_pkthdr.ether_vtag != 0, ("%s: HWVLAN requested in mbuf %p but tag not provided", __func__, m)); return (1); } return (0); } static void * m_advance(struct mbuf **pm, int *poffset, int len) { struct mbuf *m = *pm; int offset = *poffset; uintptr_t p = 0; MPASS(len > 0); for (;;) { if (offset + len < m->m_len) { offset += len; p = mtod(m, uintptr_t) + offset; break; } len -= m->m_len - offset; m = m->m_next; offset = 0; MPASS(m != NULL); } *poffset = offset; *pm = m; return ((void *)p); } static inline int same_paddr(char *a, char *b) { if (a == b) return (1); else if (a != NULL && b != NULL) { vm_offset_t x = (vm_offset_t)a; vm_offset_t y = (vm_offset_t)b; if ((x & PAGE_MASK) == (y & PAGE_MASK) && pmap_kextract(x) == pmap_kextract(y)) return (1); } return (0); } /* * Can deal with empty mbufs in the chain that have m_len = 0, but the chain * must have at least one mbuf that's not empty. */ static inline int count_mbuf_nsegs(struct mbuf *m) { char *prev_end, *start; int len, nsegs; MPASS(m != NULL); nsegs = 0; prev_end = NULL; for (; m; m = m->m_next) { len = m->m_len; if (__predict_false(len == 0)) continue; start = mtod(m, char *); nsegs += sglist_count(start, len); if (same_paddr(prev_end, start)) nsegs--; prev_end = start + len; } MPASS(nsegs > 0); return (nsegs); } /* * Analyze the mbuf to determine its tx needs. The mbuf passed in may change: * a) caller can assume it's been freed if this function returns with an error. * b) it may get defragged up if the gather list is too long for the hardware. 
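 * e.g. a chain with more segments than the hardware limit (TX_SGL_SEGS,
 * or TX_SGL_SEGS_TSO for TSO) is m_defrag()ed into a shorter chain of
 * clusters and the caller's mbuf pointer is updated to the new head.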
*/ int parse_pkt(struct adapter *sc, struct mbuf **mp) { struct mbuf *m0 = *mp, *m; int rc, nsegs, defragged = 0, offset; struct ether_header *eh; void *l3hdr; #if defined(INET) || defined(INET6) struct tcphdr *tcp; #endif uint16_t eh_type; M_ASSERTPKTHDR(m0); if (__predict_false(m0->m_pkthdr.len < ETHER_HDR_LEN)) { rc = EINVAL; fail: m_freem(m0); *mp = NULL; return (rc); } restart: /* * First count the number of gather list segments in the payload. * Defrag the mbuf if nsegs exceeds the hardware limit. */ M_ASSERTPKTHDR(m0); MPASS(m0->m_pkthdr.len > 0); nsegs = count_mbuf_nsegs(m0); if (nsegs > (needs_tso(m0) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS)) { if (defragged++ > 0 || (m = m_defrag(m0, M_NOWAIT)) == NULL) { rc = EFBIG; goto fail; } *mp = m0 = m; /* update caller's copy after defrag */ goto restart; } if (__predict_false(nsegs > 2 && m0->m_pkthdr.len <= MHLEN)) { m0 = m_pullup(m0, m0->m_pkthdr.len); if (m0 == NULL) { /* Should have left well enough alone. */ rc = EFBIG; goto fail; } *mp = m0; /* update caller's copy after pullup */ goto restart; } set_mbuf_nsegs(m0, nsegs); if (sc->flags & IS_VF) set_mbuf_len16(m0, txpkt_vm_len16(nsegs, needs_tso(m0))); else set_mbuf_len16(m0, txpkt_len16(nsegs, needs_tso(m0))); if (!needs_tso(m0) && !(sc->flags & IS_VF && (needs_l3_csum(m0) || needs_l4_csum(m0)))) return (0); m = m0; eh = mtod(m, struct ether_header *); eh_type = ntohs(eh->ether_type); if (eh_type == ETHERTYPE_VLAN) { struct ether_vlan_header *evh = (void *)eh; eh_type = ntohs(evh->evl_proto); m0->m_pkthdr.l2hlen = sizeof(*evh); } else m0->m_pkthdr.l2hlen = sizeof(*eh); offset = 0; l3hdr = m_advance(&m, &offset, m0->m_pkthdr.l2hlen); switch (eh_type) { #ifdef INET6 case ETHERTYPE_IPV6: { struct ip6_hdr *ip6 = l3hdr; MPASS(!needs_tso(m0) || ip6->ip6_nxt == IPPROTO_TCP); m0->m_pkthdr.l3hlen = sizeof(*ip6); break; } #endif #ifdef INET case ETHERTYPE_IP: { struct ip *ip = l3hdr; m0->m_pkthdr.l3hlen = ip->ip_hl * 4; break; } #endif default: panic("%s: ethertype 0x%04x unknown. 
if_cxgbe must be compiled" " with the same INET/INET6 options as the kernel.", __func__, eh_type); } #if defined(INET) || defined(INET6) if (needs_tso(m0)) { tcp = m_advance(&m, &offset, m0->m_pkthdr.l3hlen); m0->m_pkthdr.l4hlen = tcp->th_off * 4; } #endif MPASS(m0 == *mp); return (0); } void * start_wrq_wr(struct sge_wrq *wrq, int len16, struct wrq_cookie *cookie) { struct sge_eq *eq = &wrq->eq; struct adapter *sc = wrq->adapter; int ndesc, available; struct wrqe *wr; void *w; MPASS(len16 > 0); ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc > 0 && ndesc <= SGE_MAX_WR_NDESC); EQ_LOCK(eq); if (!STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(sc, wrq); if (!STAILQ_EMPTY(&wrq->wr_list)) { slowpath: EQ_UNLOCK(eq); wr = alloc_wrqe(len16 * 16, wrq); if (__predict_false(wr == NULL)) return (NULL); cookie->pidx = -1; cookie->ndesc = ndesc; return (&wr->wr); } eq->cidx = read_hw_cidx(eq); if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; if (available < ndesc) goto slowpath; cookie->pidx = eq->pidx; cookie->ndesc = ndesc; TAILQ_INSERT_TAIL(&wrq->incomplete_wrs, cookie, link); w = &eq->desc[eq->pidx]; IDXINCR(eq->pidx, ndesc, eq->sidx); if (__predict_false(eq->pidx < ndesc - 1)) { w = &wrq->ss[0]; wrq->ss_pidx = cookie->pidx; wrq->ss_len = len16 * 16; } EQ_UNLOCK(eq); return (w); } void commit_wrq_wr(struct sge_wrq *wrq, void *w, struct wrq_cookie *cookie) { struct sge_eq *eq = &wrq->eq; struct adapter *sc = wrq->adapter; int ndesc, pidx; struct wrq_cookie *prev, *next; if (cookie->pidx == -1) { struct wrqe *wr = __containerof(w, struct wrqe, wr); t4_wrq_tx(sc, wr); return; } ndesc = cookie->ndesc; /* Can be more than SGE_MAX_WR_NDESC here. */ pidx = cookie->pidx; MPASS(pidx >= 0 && pidx < eq->sidx); if (__predict_false(w == &wrq->ss[0])) { int n = (eq->sidx - wrq->ss_pidx) * EQ_ESIZE; MPASS(wrq->ss_len > n); /* WR had better wrap around. */ bcopy(&wrq->ss[0], &eq->desc[wrq->ss_pidx], n); bcopy(&wrq->ss[n], &eq->desc[0], wrq->ss_len - n); wrq->tx_wrs_ss++; } else wrq->tx_wrs_direct++; EQ_LOCK(eq); prev = TAILQ_PREV(cookie, wrq_incomplete_wrs, link); next = TAILQ_NEXT(cookie, link); if (prev == NULL) { MPASS(pidx == eq->dbidx); if (next == NULL || ndesc >= 16) ring_eq_db(wrq->adapter, eq, ndesc); else { MPASS(IDXDIFF(next->pidx, pidx, eq->sidx) == ndesc); next->pidx = pidx; next->ndesc += ndesc; } } else { MPASS(IDXDIFF(pidx, prev->pidx, eq->sidx) == prev->ndesc); prev->ndesc += ndesc; } TAILQ_REMOVE(&wrq->incomplete_wrs, cookie, link); if (TAILQ_EMPTY(&wrq->incomplete_wrs) && !STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(sc, wrq); #ifdef INVARIANTS if (TAILQ_EMPTY(&wrq->incomplete_wrs)) { /* Doorbell must have caught up to the pidx. */ MPASS(wrq->eq.pidx == wrq->eq.dbidx); } #endif EQ_UNLOCK(eq); } static u_int can_resume_eth_tx(struct mp_ring *r) { struct sge_eq *eq = r->cookie; return (total_available_tx_desc(eq) > eq->sidx / 8); } static inline int cannot_use_txpkts(struct mbuf *m) { /* maybe put a GL limit too, to avoid silliness? */ return (needs_tso(m)); } /* * r->items[cidx] to r->items[pidx], with a wraparound at r->size, are ready to * be consumed. Return the actual number consumed. 0 indicates a stall. 
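 * e.g. with r->size = 8, cidx = 6, and pidx = 2, four packets (items
 * 6, 7, 0, 1) are ready for transmission.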
*/ static u_int eth_tx(struct mp_ring *r, u_int cidx, u_int pidx) { struct sge_txq *txq = r->cookie; struct sge_eq *eq = &txq->eq; struct ifnet *ifp = txq->ifp; struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; u_int total, remaining; /* # of packets */ u_int available, dbdiff; /* # of hardware descriptors */ u_int n, next_cidx; struct mbuf *m0, *tail; struct txpkts txp; struct fw_eth_tx_pkts_wr *wr; /* any fw WR struct will do */ remaining = IDXDIFF(pidx, cidx, r->size); MPASS(remaining > 0); /* Must not be called without work to do. */ total = 0; TXQ_LOCK(txq); if (__predict_false((eq->flags & EQ_ENABLED) == 0)) { while (cidx != pidx) { m0 = r->items[cidx]; m_freem(m0); if (++cidx == r->size) cidx = 0; } reclaim_tx_descs(txq, 2048); total = remaining; goto done; } /* How many hardware descriptors do we have readily available. */ if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; dbdiff = IDXDIFF(eq->pidx, eq->dbidx, eq->sidx); while (remaining > 0) { m0 = r->items[cidx]; M_ASSERTPKTHDR(m0); MPASS(m0->m_nextpkt == NULL); if (available < SGE_MAX_WR_NDESC) { available += reclaim_tx_descs(txq, 64); if (available < howmany(mbuf_len16(m0), EQ_ESIZE / 16)) break; /* out of descriptors */ } next_cidx = cidx + 1; if (__predict_false(next_cidx == r->size)) next_cidx = 0; wr = (void *)&eq->desc[eq->pidx]; if (sc->flags & IS_VF) { total++; remaining--; ETHER_BPF_MTAP(ifp, m0); n = write_txpkt_vm_wr(txq, (void *)wr, m0, available); } else if (remaining > 1 && try_txpkts(m0, r->items[next_cidx], &txp, available) == 0) { /* pkts at cidx, next_cidx should both be in txp. */ MPASS(txp.npkt == 2); tail = r->items[next_cidx]; MPASS(tail->m_nextpkt == NULL); ETHER_BPF_MTAP(ifp, m0); ETHER_BPF_MTAP(ifp, tail); m0->m_nextpkt = tail; if (__predict_false(++next_cidx == r->size)) next_cidx = 0; while (next_cidx != pidx) { if (add_to_txpkts(r->items[next_cidx], &txp, available) != 0) break; tail->m_nextpkt = r->items[next_cidx]; tail = tail->m_nextpkt; ETHER_BPF_MTAP(ifp, tail); if (__predict_false(++next_cidx == r->size)) next_cidx = 0; } n = write_txpkts_wr(txq, wr, m0, &txp, available); total += txp.npkt; remaining -= txp.npkt; } else { total++; remaining--; ETHER_BPF_MTAP(ifp, m0); n = write_txpkt_wr(txq, (void *)wr, m0, available); } MPASS(n >= 1 && n <= available && n <= SGE_MAX_WR_NDESC); available -= n; dbdiff += n; IDXINCR(eq->pidx, n, eq->sidx); if (total_available_tx_desc(eq) < eq->sidx / 4 && atomic_cmpset_int(&eq->equiq, 0, 1)) { wr->equiq_to_len16 |= htobe32(F_FW_WR_EQUIQ | F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } else if (IDXDIFF(eq->pidx, eq->equeqidx, eq->sidx) >= 32) { wr->equiq_to_len16 |= htobe32(F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } if (dbdiff >= 16 && remaining >= 4) { ring_eq_db(sc, eq, dbdiff); available += reclaim_tx_descs(txq, 4 * dbdiff); dbdiff = 0; } cidx = next_cidx; } if (dbdiff != 0) { ring_eq_db(sc, eq, dbdiff); reclaim_tx_descs(txq, 32); } done: TXQ_UNLOCK(txq); return (total); } static inline void init_iq(struct sge_iq *iq, struct adapter *sc, int tmr_idx, int pktc_idx, int qsize) { KASSERT(tmr_idx >= 0 && tmr_idx < SGE_NTIMERS, ("%s: bad tmr_idx %d", __func__, tmr_idx)); KASSERT(pktc_idx < SGE_NCOUNTERS, /* -ve is ok, means don't use */ ("%s: bad pktc_idx %d", __func__, pktc_idx)); iq->flags = 0; iq->adapter = sc; iq->intr_params = V_QINTR_TIMER_IDX(tmr_idx); iq->intr_pktc_idx = SGE_NCOUNTERS - 1; if (pktc_idx >= 0) { iq->intr_params |= F_QINTR_CNT_EN; 
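/* pktc_idx picks one of the SGE_NCOUNTERS holdoff packet-count thresholds; with F_QINTR_CNT_EN set the queue can interrupt on that count as well as on the holdoff timer selected above (a negative pktc_idx, per the KASSERT, simply leaves the counter unused). */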
iq->intr_pktc_idx = pktc_idx; } iq->qsize = roundup2(qsize, 16); /* See FW_IQ_CMD/iqsize */ iq->sidx = iq->qsize - sc->params.sge.spg_len / IQ_ESIZE; } static inline void init_fl(struct adapter *sc, struct sge_fl *fl, int qsize, int maxp, char *name) { fl->qsize = qsize; fl->sidx = qsize - sc->params.sge.spg_len / EQ_ESIZE; strlcpy(fl->lockname, name, sizeof(fl->lockname)); if (sc->flags & BUF_PACKING_OK && ((!is_t4(sc) && buffer_packing) || /* T5+: enabled unless 0 */ (is_t4(sc) && buffer_packing == 1)))/* T4: disabled unless 1 */ fl->flags |= FL_BUF_PACKING; find_best_refill_source(sc, fl, maxp); find_safe_refill_source(sc, fl); } static inline void init_eq(struct adapter *sc, struct sge_eq *eq, int eqtype, int qsize, uint8_t tx_chan, uint16_t iqid, char *name) { KASSERT(eqtype <= EQ_TYPEMASK, ("%s: bad qtype %d", __func__, eqtype)); eq->flags = eqtype & EQ_TYPEMASK; eq->tx_chan = tx_chan; eq->iqid = iqid; eq->sidx = qsize - sc->params.sge.spg_len / EQ_ESIZE; strlcpy(eq->lockname, name, sizeof(eq->lockname)); } static int alloc_ring(struct adapter *sc, size_t len, bus_dma_tag_t *tag, bus_dmamap_t *map, bus_addr_t *pa, void **va) { int rc; rc = bus_dma_tag_create(sc->dmat, 512, 0, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, len, 0, NULL, NULL, tag); if (rc != 0) { device_printf(sc->dev, "cannot allocate DMA tag: %d\n", rc); goto done; } rc = bus_dmamem_alloc(*tag, va, BUS_DMA_WAITOK | BUS_DMA_COHERENT | BUS_DMA_ZERO, map); if (rc != 0) { device_printf(sc->dev, "cannot allocate DMA memory: %d\n", rc); goto done; } rc = bus_dmamap_load(*tag, *map, *va, len, oneseg_dma_callback, pa, 0); if (rc != 0) { device_printf(sc->dev, "cannot load DMA map: %d\n", rc); goto done; } done: if (rc) free_ring(sc, *tag, *map, *pa, *va); return (rc); } static int free_ring(struct adapter *sc, bus_dma_tag_t tag, bus_dmamap_t map, bus_addr_t pa, void *va) { if (pa) bus_dmamap_unload(tag, map); if (va) bus_dmamem_free(tag, va, map); if (tag) bus_dma_tag_destroy(tag); return (0); } /* * Allocates the ring for an ingress queue and an optional freelist. If the * freelist is specified it will be allocated and then associated with the * ingress queue. * * Returns errno on failure. Resources allocated up to that point may still be * allocated. Caller is responsible for cleanup in case this function fails. * * If the ingress queue will take interrupts directly (iq->flags & IQ_INTR) then * the intr_idx specifies the vector, starting from 0. Otherwise it specifies * the abs_id of the ingress queue to which its interrupts should be forwarded. 
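* (For example, a queue created with IQ_INTR and intr_idx = 2 takes vector 2 directly; without IQ_INTR, intr_idx would instead carry the abs_id of the queue, such as the firmware event queue, that services its interrupts.)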
*/ static int alloc_iq_fl(struct vi_info *vi, struct sge_iq *iq, struct sge_fl *fl, int intr_idx, int cong) { int rc, i, cntxt_id; size_t len; struct fw_iq_cmd c; struct port_info *pi = vi->pi; struct adapter *sc = iq->adapter; struct sge_params *sp = &sc->params.sge; __be32 v = 0; len = iq->qsize * IQ_ESIZE; rc = alloc_ring(sc, len, &iq->desc_tag, &iq->desc_map, &iq->ba, (void **)&iq->desc); if (rc != 0) return (rc); bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_IQ_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_IQ_CMD_PFN(sc->pf) | V_FW_IQ_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_IQ_CMD_ALLOC | F_FW_IQ_CMD_IQSTART | FW_LEN16(c)); /* Special handling for firmware event queue */ if (iq == &sc->sge.fwq) v |= F_FW_IQ_CMD_IQASYNCH; if (iq->flags & IQ_INTR) { KASSERT(intr_idx < sc->intr_count, ("%s: invalid direct intr_idx %d", __func__, intr_idx)); } else v |= F_FW_IQ_CMD_IQANDST; v |= V_FW_IQ_CMD_IQANDSTINDEX(intr_idx); c.type_to_iqandstindex = htobe32(v | V_FW_IQ_CMD_TYPE(FW_IQ_TYPE_FL_INT_CAP) | V_FW_IQ_CMD_VIID(vi->viid) | V_FW_IQ_CMD_IQANUD(X_UPDATEDELIVERY_INTERRUPT)); c.iqdroprss_to_iqesize = htobe16(V_FW_IQ_CMD_IQPCIECH(pi->tx_chan) | F_FW_IQ_CMD_IQGTSMODE | V_FW_IQ_CMD_IQINTCNTTHRESH(iq->intr_pktc_idx) | V_FW_IQ_CMD_IQESIZE(ilog2(IQ_ESIZE) - 4)); c.iqsize = htobe16(iq->qsize); c.iqaddr = htobe64(iq->ba); if (cong >= 0) c.iqns_to_fl0congen = htobe32(F_FW_IQ_CMD_IQFLINTCONGEN); if (fl) { mtx_init(&fl->fl_lock, fl->lockname, NULL, MTX_DEF); len = fl->qsize * EQ_ESIZE; rc = alloc_ring(sc, len, &fl->desc_tag, &fl->desc_map, &fl->ba, (void **)&fl->desc); if (rc) return (rc); /* Allocate space for one software descriptor per buffer. */ rc = alloc_fl_sdesc(fl); if (rc != 0) { device_printf(sc->dev, "failed to setup fl software descriptors: %d\n", rc); return (rc); } if (fl->flags & FL_BUF_PACKING) { fl->lowat = roundup2(sp->fl_starve_threshold2, 8); fl->buf_boundary = sp->pack_boundary; } else { fl->lowat = roundup2(sp->fl_starve_threshold, 8); fl->buf_boundary = 16; } if (fl_pad && fl->buf_boundary < sp->pad_boundary) fl->buf_boundary = sp->pad_boundary; c.iqns_to_fl0congen |= htobe32(V_FW_IQ_CMD_FL0HOSTFCMODE(X_HOSTFCMODE_NONE) | F_FW_IQ_CMD_FL0FETCHRO | F_FW_IQ_CMD_FL0DATARO | (fl_pad ? F_FW_IQ_CMD_FL0PADEN : 0) | (fl->flags & FL_BUF_PACKING ? 
F_FW_IQ_CMD_FL0PACKEN : 0)); if (cong >= 0) { c.iqns_to_fl0congen |= htobe32(V_FW_IQ_CMD_FL0CNGCHMAP(cong) | F_FW_IQ_CMD_FL0CONGCIF | F_FW_IQ_CMD_FL0CONGEN); } c.fl0dcaen_to_fl0cidxfthresh = htobe16(V_FW_IQ_CMD_FL0FBMIN(X_FETCHBURSTMIN_128B) | V_FW_IQ_CMD_FL0FBMAX(X_FETCHBURSTMAX_512B)); c.fl0size = htobe16(fl->qsize); c.fl0addr = htobe64(fl->ba); } rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(sc->dev, "failed to create ingress queue: %d\n", rc); return (rc); } iq->cidx = 0; iq->gen = F_RSPD_GEN; iq->intr_next = iq->intr_params; iq->cntxt_id = be16toh(c.iqid); iq->abs_id = be16toh(c.physiqid); iq->flags |= IQ_ALLOCATED; cntxt_id = iq->cntxt_id - sc->sge.iq_start; if (cntxt_id >= sc->sge.niq) { panic("%s: iq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.niq - 1); } sc->sge.iqmap[cntxt_id] = iq; if (fl) { u_int qid; iq->flags |= IQ_HAS_FL; fl->cntxt_id = be16toh(c.fl0id); fl->pidx = fl->cidx = 0; cntxt_id = fl->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) { panic("%s: fl->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); } sc->sge.eqmap[cntxt_id] = (void *)fl; qid = fl->cntxt_id; if (isset(&sc->doorbells, DOORBELL_UDB)) { uint32_t s_qpp = sc->params.sge.eq_s_qpp; uint32_t mask = (1 << s_qpp) - 1; volatile uint8_t *udb; udb = sc->udbs_base + UDBS_DB_OFFSET; udb += (qid >> s_qpp) << PAGE_SHIFT; qid &= mask; if (qid < PAGE_SIZE / UDBS_SEG_SIZE) { udb += qid << UDBS_SEG_SHIFT; qid = 0; } fl->udb = (volatile void *)udb; } fl->dbval = V_QID(qid) | sc->chip_params->sge_fl_db; FL_LOCK(fl); /* Enough to make sure the SGE doesn't think it's starved */ refill_fl(sc, fl, fl->lowat); FL_UNLOCK(fl); } if (is_t5(sc) && !(sc->flags & IS_VF) && cong >= 0) { uint32_t param, val; param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) | V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_CONM_CTXT) | V_FW_PARAMS_PARAM_YZ(iq->cntxt_id); if (cong == 0) val = 1 << 19; else { val = 2 << 19; for (i = 0; i < 4; i++) { if (cong & (1 << i)) val |= 1 << (i << 2); } } rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val); if (rc != 0) { /* report error but carry on */ device_printf(sc->dev, "failed to set congestion manager context for " "ingress queue %d: %d\n", iq->cntxt_id, rc); } } /* Enable IQ interrupts */ atomic_store_rel_int(&iq->state, IQS_IDLE); t4_write_reg(sc, sc->sge_gts_reg, V_SEINTARM(iq->intr_params) | V_INGRESSQID(iq->cntxt_id)); return (0); } static int free_iq_fl(struct vi_info *vi, struct sge_iq *iq, struct sge_fl *fl) { int rc; struct adapter *sc = iq->adapter; device_t dev; if (sc == NULL) return (0); /* nothing to do */ dev = vi ? vi->dev : sc->dev; if (iq->flags & IQ_ALLOCATED) { rc = -t4_iq_free(sc, sc->mbox, sc->pf, 0, FW_IQ_TYPE_FL_INT_CAP, iq->cntxt_id, fl ?
fl->cntxt_id : 0xffff, 0xffff); if (rc != 0) { device_printf(dev, "failed to free queue %p: %d\n", iq, rc); return (rc); } iq->flags &= ~IQ_ALLOCATED; } free_ring(sc, iq->desc_tag, iq->desc_map, iq->ba, iq->desc); bzero(iq, sizeof(*iq)); if (fl) { free_ring(sc, fl->desc_tag, fl->desc_map, fl->ba, fl->desc); if (fl->sdesc) free_fl_sdesc(sc, fl); if (mtx_initialized(&fl->fl_lock)) mtx_destroy(&fl->fl_lock); bzero(fl, sizeof(*fl)); } return (0); } static void add_fl_sysctls(struct sysctl_ctx_list *ctx, struct sysctl_oid *oid, struct sge_fl *fl) { struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "fl", CTLFLAG_RD, NULL, "freelist"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &fl->cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the freelist"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "padding", CTLFLAG_RD, NULL, fl_pad ? 1 : 0, "padding enabled"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "packing", CTLFLAG_RD, NULL, fl->flags & FL_BUF_PACKING ? 1 : 0, "packing enabled"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cidx", CTLFLAG_RD, &fl->cidx, 0, "consumer index"); if (fl->flags & FL_BUF_PACKING) { SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_offset", CTLFLAG_RD, &fl->rx_offset, 0, "packing rx offset"); } SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "pidx", CTLFLAG_RD, &fl->pidx, 0, "producer index"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "mbuf_allocated", CTLFLAG_RD, &fl->mbuf_allocated, "# of mbuf allocated"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "mbuf_inlined", CTLFLAG_RD, &fl->mbuf_inlined, "# of mbuf inlined in clusters"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_allocated", CTLFLAG_RD, &fl->cl_allocated, "# of clusters allocated"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_recycled", CTLFLAG_RD, &fl->cl_recycled, "# of clusters recycled"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_fast_recycled", CTLFLAG_RD, &fl->cl_fast_recycled, "# of clusters recycled (fast)"); } static int alloc_fwq(struct adapter *sc) { int rc, intr_idx; struct sge_iq *fwq = &sc->sge.fwq; struct sysctl_oid *oid = device_get_sysctl_tree(sc->dev); struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); init_iq(fwq, sc, 0, 0, FW_IQ_QSIZE); fwq->flags |= IQ_INTR; /* always */ if (sc->flags & IS_VF) intr_idx = 0; else { intr_idx = sc->intr_count > 1 ? 
1 : 0; fwq->set_tcb_rpl = t4_filter_rpl; fwq->l2t_write_rpl = do_l2t_write_rpl; } rc = alloc_iq_fl(&sc->port[0]->vi[0], fwq, NULL, intr_idx, -1); if (rc != 0) { device_printf(sc->dev, "failed to create firmware event queue: %d\n", rc); return (rc); } oid = SYSCTL_ADD_NODE(&sc->ctx, children, OID_AUTO, "fwq", CTLFLAG_RD, NULL, "firmware event queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(&sc->ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &fwq->abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(&sc->ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &fwq->cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(&sc->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &fwq->cidx, 0, sysctl_uint16, "I", "consumer index"); return (0); } static int free_fwq(struct adapter *sc) { return free_iq_fl(NULL, &sc->sge.fwq, NULL); } static int alloc_mgmtq(struct adapter *sc) { int rc; struct sge_wrq *mgmtq = &sc->sge.mgmtq; char name[16]; struct sysctl_oid *oid = device_get_sysctl_tree(sc->dev); struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); oid = SYSCTL_ADD_NODE(&sc->ctx, children, OID_AUTO, "mgmtq", CTLFLAG_RD, NULL, "management queue"); snprintf(name, sizeof(name), "%s mgmtq", device_get_nameunit(sc->dev)); init_eq(sc, &mgmtq->eq, EQ_CTRL, CTRL_EQ_QSIZE, sc->port[0]->tx_chan, sc->sge.fwq.cntxt_id, name); rc = alloc_wrq(sc, NULL, mgmtq, oid); if (rc != 0) { device_printf(sc->dev, "failed to create management queue: %d\n", rc); return (rc); } return (0); } static int free_mgmtq(struct adapter *sc) { return free_wrq(sc, &sc->sge.mgmtq); } int tnl_cong(struct port_info *pi, int drop) { if (drop == -1) return (-1); else if (drop == 1) return (0); else return (pi->rx_chan_map); } static int alloc_rxq(struct vi_info *vi, struct sge_rxq *rxq, int intr_idx, int idx, struct sysctl_oid *oid) { int rc; struct adapter *sc = vi->pi->adapter; struct sysctl_oid_list *children; char name[16]; rc = alloc_iq_fl(vi, &rxq->iq, &rxq->fl, intr_idx, tnl_cong(vi->pi, cong_drop)); if (rc != 0) return (rc); if (idx == 0) sc->sge.iq_base = rxq->iq.abs_id - rxq->iq.cntxt_id; else KASSERT(rxq->iq.cntxt_id + sc->sge.iq_base == rxq->iq.abs_id, ("iq_base mismatch")); KASSERT(sc->sge.iq_base == 0 || sc->flags & IS_VF, ("PF with non-zero iq_base")); /* * The freelist is just barely above the starvation threshold right now, * fill it up a bit more. 
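* (refill_fl below tops it up with as many as 128 buffers, i.e. 16 hardware descriptors at 8 buffers apiece.)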
*/ FL_LOCK(&rxq->fl); refill_fl(sc, &rxq->fl, 128); FL_UNLOCK(&rxq->fl); #if defined(INET) || defined(INET6) rc = tcp_lro_init(&rxq->lro); if (rc != 0) return (rc); rxq->lro.ifp = vi->ifp; /* also indicates LRO init'ed */ if (vi->ifp->if_capenable & IFCAP_LRO) rxq->iq.flags |= IQ_LRO_ENABLED; #endif rxq->ifp = vi->ifp; children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &rxq->iq.abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &rxq->iq.cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &rxq->iq.cidx, 0, sysctl_uint16, "I", "consumer index"); #if defined(INET) || defined(INET6) SYSCTL_ADD_U64(&vi->ctx, children, OID_AUTO, "lro_queued", CTLFLAG_RD, &rxq->lro.lro_queued, 0, NULL); SYSCTL_ADD_U64(&vi->ctx, children, OID_AUTO, "lro_flushed", CTLFLAG_RD, &rxq->lro.lro_flushed, 0, NULL); #endif SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "rxcsum", CTLFLAG_RD, &rxq->rxcsum, "# of times hardware assisted with checksum"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "vlan_extraction", CTLFLAG_RD, &rxq->vlan_extraction, "# of times hardware extracted 802.1Q tag"); add_fl_sysctls(&vi->ctx, oid, &rxq->fl); return (rc); } static int free_rxq(struct vi_info *vi, struct sge_rxq *rxq) { int rc; #if defined(INET) || defined(INET6) if (rxq->lro.ifp) { tcp_lro_free(&rxq->lro); rxq->lro.ifp = NULL; } #endif rc = free_iq_fl(vi, &rxq->iq, &rxq->fl); if (rc == 0) bzero(rxq, sizeof(*rxq)); return (rc); } #ifdef TCP_OFFLOAD static int alloc_ofld_rxq(struct vi_info *vi, struct sge_ofld_rxq *ofld_rxq, int intr_idx, int idx, struct sysctl_oid *oid) { int rc; struct sysctl_oid_list *children; char name[16]; rc = alloc_iq_fl(vi, &ofld_rxq->iq, &ofld_rxq->fl, intr_idx, vi->pi->rx_chan_map); if (rc != 0) return (rc); children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &ofld_rxq->iq.abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &ofld_rxq->iq.cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &ofld_rxq->iq.cidx, 0, sysctl_uint16, "I", "consumer index"); add_fl_sysctls(&vi->ctx, oid, &ofld_rxq->fl); return (rc); } static int free_ofld_rxq(struct vi_info *vi, struct sge_ofld_rxq *ofld_rxq) { int rc; rc = free_iq_fl(vi, &ofld_rxq->iq, &ofld_rxq->fl); if (rc == 0) bzero(ofld_rxq, sizeof(*ofld_rxq)); return (rc); } #endif #ifdef DEV_NETMAP static int alloc_nm_rxq(struct vi_info *vi, struct sge_nm_rxq *nm_rxq, int intr_idx, int idx, struct sysctl_oid *oid) { int rc; struct sysctl_oid_list *children; struct sysctl_ctx_list *ctx; char name[16]; size_t len; struct adapter *sc = vi->pi->adapter; struct netmap_adapter *na = NA(vi->ifp); MPASS(na != NULL); len = vi->qsize_rxq * IQ_ESIZE; rc = alloc_ring(sc, len, &nm_rxq->iq_desc_tag, &nm_rxq->iq_desc_map, &nm_rxq->iq_ba, (void **)&nm_rxq->iq_desc); if (rc != 0) 
return (rc); len = na->num_rx_desc * EQ_ESIZE + sc->params.sge.spg_len; rc = alloc_ring(sc, len, &nm_rxq->fl_desc_tag, &nm_rxq->fl_desc_map, &nm_rxq->fl_ba, (void **)&nm_rxq->fl_desc); if (rc != 0) return (rc); nm_rxq->vi = vi; nm_rxq->nid = idx; nm_rxq->iq_cidx = 0; nm_rxq->iq_sidx = vi->qsize_rxq - sc->params.sge.spg_len / IQ_ESIZE; nm_rxq->iq_gen = F_RSPD_GEN; nm_rxq->fl_pidx = nm_rxq->fl_cidx = 0; nm_rxq->fl_sidx = na->num_rx_desc; nm_rxq->intr_idx = intr_idx; ctx = &vi->ctx; children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_cidx, 0, sysctl_uint16, "I", "consumer index"); children = SYSCTL_CHILDREN(oid); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "fl", CTLFLAG_RD, NULL, "freelist"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->fl_cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the freelist"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cidx", CTLFLAG_RD, &nm_rxq->fl_cidx, 0, "consumer index"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "pidx", CTLFLAG_RD, &nm_rxq->fl_pidx, 0, "producer index"); return (rc); } static int free_nm_rxq(struct vi_info *vi, struct sge_nm_rxq *nm_rxq) { struct adapter *sc = vi->pi->adapter; free_ring(sc, nm_rxq->iq_desc_tag, nm_rxq->iq_desc_map, nm_rxq->iq_ba, nm_rxq->iq_desc); free_ring(sc, nm_rxq->fl_desc_tag, nm_rxq->fl_desc_map, nm_rxq->fl_ba, nm_rxq->fl_desc); return (0); } static int alloc_nm_txq(struct vi_info *vi, struct sge_nm_txq *nm_txq, int iqidx, int idx, struct sysctl_oid *oid) { int rc; size_t len; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct netmap_adapter *na = NA(vi->ifp); char name[16]; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); len = na->num_tx_desc * EQ_ESIZE + sc->params.sge.spg_len; rc = alloc_ring(sc, len, &nm_txq->desc_tag, &nm_txq->desc_map, &nm_txq->ba, (void **)&nm_txq->desc); if (rc) return (rc); nm_txq->pidx = nm_txq->cidx = 0; nm_txq->sidx = na->num_tx_desc; nm_txq->nid = idx; nm_txq->iqidx = iqidx; nm_txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT) | V_TXPKT_INTF(pi->tx_chan) | V_TXPKT_VF_VLD(1) | V_TXPKT_VF(vi->viid)); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "netmap tx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &nm_txq->cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &nm_txq->cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &nm_txq->pidx, 0, sysctl_uint16, "I", "producer index"); return (rc); } static int free_nm_txq(struct vi_info *vi, struct sge_nm_txq *nm_txq) { struct adapter *sc = vi->pi->adapter; free_ring(sc, nm_txq->desc_tag, nm_txq->desc_map, nm_txq->ba, nm_txq->desc); return (0); } #endif static int ctrl_eq_alloc(struct adapter *sc, struct sge_eq *eq) { int rc, cntxt_id; 
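/* The queue size reported to the firmware includes the status page that sits after the eq->sidx usable descriptors; spg_len / EQ_ESIZE is the number of descriptor-sized entries it occupies (one entry when both are 64 bytes, a common configuration). */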
struct fw_eq_ctrl_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_EQ_CTRL_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_CTRL_CMD_PFN(sc->pf) | V_FW_EQ_CTRL_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_EQ_CTRL_CMD_ALLOC | F_FW_EQ_CTRL_CMD_EQSTART | FW_LEN16(c)); c.cmpliqid_eqid = htonl(V_FW_EQ_CTRL_CMD_CMPLIQID(eq->iqid)); c.physeqid_pkd = htobe32(0); c.fetchszm_to_iqid = htobe32(V_FW_EQ_CTRL_CMD_HOSTFCMODE(X_HOSTFCMODE_NONE) | V_FW_EQ_CTRL_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_CTRL_CMD_FETCHRO | V_FW_EQ_CTRL_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_CTRL_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_CTRL_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_CTRL_CMD_EQSIZE(qsize)); c.eqaddr = htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(sc->dev, "failed to create control queue %d: %d\n", eq->tx_chan, rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_CTRL_CMD_EQID(be32toh(c.cmpliqid_eqid)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } static int eth_eq_alloc(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, cntxt_id; struct fw_eq_eth_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_EQ_ETH_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_ETH_CMD_PFN(sc->pf) | V_FW_EQ_ETH_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_EQ_ETH_CMD_ALLOC | F_FW_EQ_ETH_CMD_EQSTART | FW_LEN16(c)); c.autoequiqe_to_viid = htobe32(F_FW_EQ_ETH_CMD_AUTOEQUIQE | F_FW_EQ_ETH_CMD_AUTOEQUEQE | V_FW_EQ_ETH_CMD_VIID(vi->viid)); c.fetchszm_to_iqid = htobe32(V_FW_EQ_ETH_CMD_HOSTFCMODE(X_HOSTFCMODE_NONE) | V_FW_EQ_ETH_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_ETH_CMD_FETCHRO | V_FW_EQ_ETH_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_ETH_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_ETH_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_ETH_CMD_EQSIZE(qsize)); c.eqaddr = htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(vi->dev, "failed to create Ethernet egress queue: %d\n", rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_ETH_CMD_EQID(be32toh(c.eqid_pkd)); eq->abs_id = G_FW_EQ_ETH_CMD_PHYSEQID(be32toh(c.physeqid_pkd)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } #ifdef TCP_OFFLOAD static int ofld_eq_alloc(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, cntxt_id; struct fw_eq_ofld_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htonl(V_FW_CMD_OP(FW_EQ_OFLD_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_OFLD_CMD_PFN(sc->pf) | V_FW_EQ_OFLD_CMD_VFN(0)); c.alloc_to_len16 = htonl(F_FW_EQ_OFLD_CMD_ALLOC | F_FW_EQ_OFLD_CMD_EQSTART | FW_LEN16(c)); c.fetchszm_to_iqid = htonl(V_FW_EQ_OFLD_CMD_HOSTFCMODE(X_HOSTFCMODE_NONE) | V_FW_EQ_OFLD_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_OFLD_CMD_FETCHRO | V_FW_EQ_OFLD_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_OFLD_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_OFLD_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_OFLD_CMD_EQSIZE(qsize)); c.eqaddr = 
htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(vi->dev, "failed to create egress queue for TCP offload: %d\n", rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_OFLD_CMD_EQID(be32toh(c.eqid_pkd)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } #endif static int alloc_eq(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, qsize; size_t len; mtx_init(&eq->eq_lock, eq->lockname, NULL, MTX_DEF); qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; len = qsize * EQ_ESIZE; rc = alloc_ring(sc, len, &eq->desc_tag, &eq->desc_map, &eq->ba, (void **)&eq->desc); if (rc) return (rc); eq->pidx = eq->cidx = 0; eq->equeqidx = eq->dbidx = 0; eq->doorbells = sc->doorbells; switch (eq->flags & EQ_TYPEMASK) { case EQ_CTRL: rc = ctrl_eq_alloc(sc, eq); break; case EQ_ETH: rc = eth_eq_alloc(sc, vi, eq); break; #ifdef TCP_OFFLOAD case EQ_OFLD: rc = ofld_eq_alloc(sc, vi, eq); break; #endif default: panic("%s: invalid eq type %d.", __func__, eq->flags & EQ_TYPEMASK); } if (rc != 0) { device_printf(sc->dev, "failed to allocate egress queue(%d): %d\n", eq->flags & EQ_TYPEMASK, rc); } if (isset(&eq->doorbells, DOORBELL_UDB) || isset(&eq->doorbells, DOORBELL_UDBWC) || isset(&eq->doorbells, DOORBELL_WCWR)) { uint32_t s_qpp = sc->params.sge.eq_s_qpp; uint32_t mask = (1 << s_qpp) - 1; volatile uint8_t *udb; udb = sc->udbs_base + UDBS_DB_OFFSET; udb += (eq->cntxt_id >> s_qpp) << PAGE_SHIFT; /* pg offset */ eq->udb_qid = eq->cntxt_id & mask; /* id in page */ if (eq->udb_qid >= PAGE_SIZE / UDBS_SEG_SIZE) clrbit(&eq->doorbells, DOORBELL_WCWR); else { udb += eq->udb_qid << UDBS_SEG_SHIFT; /* seg offset */ eq->udb_qid = 0; } eq->udb = (volatile void *)udb; } return (rc); } static int free_eq(struct adapter *sc, struct sge_eq *eq) { int rc; if (eq->flags & EQ_ALLOCATED) { switch (eq->flags & EQ_TYPEMASK) { case EQ_CTRL: rc = -t4_ctrl_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; case EQ_ETH: rc = -t4_eth_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; #ifdef TCP_OFFLOAD case EQ_OFLD: rc = -t4_ofld_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; #endif default: panic("%s: invalid eq type %d.", __func__, eq->flags & EQ_TYPEMASK); } if (rc != 0) { device_printf(sc->dev, "failed to free egress queue (%d): %d\n", eq->flags & EQ_TYPEMASK, rc); return (rc); } eq->flags &= ~EQ_ALLOCATED; } free_ring(sc, eq->desc_tag, eq->desc_map, eq->ba, eq->desc); if (mtx_initialized(&eq->eq_lock)) mtx_destroy(&eq->eq_lock); bzero(eq, sizeof(*eq)); return (0); } static int alloc_wrq(struct adapter *sc, struct vi_info *vi, struct sge_wrq *wrq, struct sysctl_oid *oid) { int rc; struct sysctl_ctx_list *ctx = vi ? 
&vi->ctx : &sc->ctx; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); rc = alloc_eq(sc, vi, &wrq->eq); if (rc) return (rc); wrq->adapter = sc; TASK_INIT(&wrq->wrq_tx_task, 0, wrq_tx_drain, wrq); TAILQ_INIT(&wrq->incomplete_wrs); STAILQ_INIT(&wrq->wr_list); wrq->nwr_pending = 0; wrq->ndesc_needed = 0; SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &wrq->eq.cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &wrq->eq.cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &wrq->eq.pidx, 0, sysctl_uint16, "I", "producer index"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tx_wrs_direct", CTLFLAG_RD, &wrq->tx_wrs_direct, "# of work requests (direct)"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tx_wrs_copied", CTLFLAG_RD, &wrq->tx_wrs_copied, "# of work requests (copied)"); return (rc); } static int free_wrq(struct adapter *sc, struct sge_wrq *wrq) { int rc; rc = free_eq(sc, &wrq->eq); if (rc) return (rc); bzero(wrq, sizeof(*wrq)); return (0); } static int alloc_txq(struct vi_info *vi, struct sge_txq *txq, int idx, struct sysctl_oid *oid) { int rc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct sge_eq *eq = &txq->eq; char name[16]; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); rc = mp_ring_alloc(&txq->r, eq->sidx, txq, eth_tx, can_resume_eth_tx, M_CXGBE, M_WAITOK); if (rc != 0) { device_printf(sc->dev, "failed to allocate mp_ring: %d\n", rc); return (rc); } rc = alloc_eq(sc, vi, eq); if (rc != 0) { mp_ring_free(txq->r); txq->r = NULL; return (rc); } /* Can't fail after this point. */ if (idx == 0) sc->sge.eq_base = eq->abs_id - eq->cntxt_id; else KASSERT(eq->cntxt_id + sc->sge.eq_base == eq->abs_id, ("eq_base mismatch")); KASSERT(sc->sge.eq_base == 0 || sc->flags & IS_VF, ("PF with non-zero eq_base")); TASK_INIT(&txq->tx_reclaim_task, 0, tx_reclaim, eq); txq->ifp = vi->ifp; txq->gl = sglist_alloc(TX_SGL_SEGS, M_WAITOK); if (sc->flags & IS_VF) txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT_XT) | V_TXPKT_INTF(pi->tx_chan)); else txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT) | V_TXPKT_INTF(pi->tx_chan) | V_TXPKT_VF_VLD(1) | V_TXPKT_VF(vi->viid)); txq->tc_idx = -1; txq->sdesc = malloc(eq->sidx * sizeof(struct tx_sdesc), M_CXGBE, M_ZERO | M_WAITOK); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "tx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "abs_id", CTLFLAG_RD, &eq->abs_id, 0, "absolute id of the queue"); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &eq->cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &eq->cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &eq->pidx, 0, sysctl_uint16, "I", "producer index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "tc", CTLTYPE_INT | CTLFLAG_RW, vi, idx, sysctl_tc, "I", "traffic class (-1 means none)"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txcsum", CTLFLAG_RD, &txq->txcsum, "# of times hardware assisted with checksum"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "vlan_insertion", CTLFLAG_RD, &txq->vlan_insertion, "# of times hardware inserted 802.1Q tag"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "tso_wrs", CTLFLAG_RD, &txq->tso_wrs, "# of TSO 
work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "imm_wrs", CTLFLAG_RD, &txq->imm_wrs, "# of work requests with immediate data"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "sgl_wrs", CTLFLAG_RD, &txq->sgl_wrs, "# of work requests with direct SGL"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkt_wrs", CTLFLAG_RD, &txq->txpkt_wrs, "# of txpkt work requests (one pkt/WR)"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts0_wrs", CTLFLAG_RD, &txq->txpkts0_wrs, "# of txpkts (type 0) work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts1_wrs", CTLFLAG_RD, &txq->txpkts1_wrs, "# of txpkts (type 1) work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts0_pkts", CTLFLAG_RD, &txq->txpkts0_pkts, "# of frames tx'd using type0 txpkts work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts1_pkts", CTLFLAG_RD, &txq->txpkts1_pkts, "# of frames tx'd using type1 txpkts work requests"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_enqueues", CTLFLAG_RD, &txq->r->enqueues, "# of enqueues to the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_drops", CTLFLAG_RD, &txq->r->drops, "# of drops in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_starts", CTLFLAG_RD, &txq->r->starts, "# of normal consumer starts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_stalls", CTLFLAG_RD, &txq->r->stalls, "# of consumer stalls in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_restarts", CTLFLAG_RD, &txq->r->restarts, "# of consumer restarts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_abdications", CTLFLAG_RD, &txq->r->abdications, "# of consumer abdications in the mp_ring for this queue"); return (0); } static int free_txq(struct vi_info *vi, struct sge_txq *txq) { int rc; struct adapter *sc = vi->pi->adapter; struct sge_eq *eq = &txq->eq; rc = free_eq(sc, eq); if (rc) return (rc); sglist_free(txq->gl); free(txq->sdesc, M_CXGBE); mp_ring_free(txq->r); bzero(txq, sizeof(*txq)); return (0); } static void oneseg_dma_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error) { bus_addr_t *ba = arg; KASSERT(nseg == 1, ("%s meant for single segment mappings only.", __func__)); *ba = error ? 0 : segs->ds_addr; } static inline void ring_fl_db(struct adapter *sc, struct sge_fl *fl) { uint32_t n, v; n = IDXDIFF(fl->pidx / 8, fl->dbidx, fl->sidx); MPASS(n > 0); wmb(); v = fl->dbval | V_PIDX(n); if (fl->udb) *fl->udb = htole32(v); else t4_write_reg(sc, sc->sge_kdoorbell_reg, v); IDXINCR(fl->dbidx, n, fl->sidx); } /* * Fills up the freelist by allocating up to 'n' buffers. Buffers that are * recycled do not count towards this allocation budget. * * Returns non-zero to indicate that this freelist should be added to the list * of starving freelists. */ static int refill_fl(struct adapter *sc, struct sge_fl *fl, int n) { __be64 *d; struct fl_sdesc *sd; uintptr_t pa; caddr_t cl; struct cluster_layout *cll; struct sw_zone_info *swz; struct cluster_metadata *clm; uint16_t max_pidx; uint16_t hw_cidx = fl->hw_cidx; /* stable snapshot */ FL_LOCK_ASSERT_OWNED(fl); /* * We always stop at the beginning of the hardware descriptor that's just * before the one with the hw cidx. This is to avoid hw pidx = hw cidx, * which would mean an empty freelist to the chip. */ max_pidx = __predict_false(hw_cidx == 0) ? 
fl->sidx - 1 : hw_cidx - 1; if (fl->pidx == max_pidx * 8) return (0); d = &fl->desc[fl->pidx]; sd = &fl->sdesc[fl->pidx]; cll = &fl->cll_def; /* default layout */ swz = &sc->sge.sw_zone_info[cll->zidx]; while (n > 0) { if (sd->cl != NULL) { if (sd->nmbuf == 0) { /* * Fast recycle without involving any atomics on * the cluster's metadata (if the cluster has * metadata). This happens when all frames * received in the cluster were small enough to * fit within a single mbuf each. */ fl->cl_fast_recycled++; #ifdef INVARIANTS clm = cl_metadata(sc, fl, &sd->cll, sd->cl); if (clm != NULL) MPASS(clm->refcount == 1); #endif goto recycled_fast; } /* * Cluster is guaranteed to have metadata. Clusters * without metadata always take the fast recycle path * when they're recycled. */ clm = cl_metadata(sc, fl, &sd->cll, sd->cl); MPASS(clm != NULL); if (atomic_fetchadd_int(&clm->refcount, -1) == 1) { fl->cl_recycled++; counter_u64_add(extfree_rels, 1); goto recycled; } sd->cl = NULL; /* gave up my reference */ } MPASS(sd->cl == NULL); alloc: cl = uma_zalloc(swz->zone, M_NOWAIT); if (__predict_false(cl == NULL)) { if (cll == &fl->cll_alt || fl->cll_alt.zidx == -1 || fl->cll_def.zidx == fl->cll_alt.zidx) break; /* fall back to the safe zone */ cll = &fl->cll_alt; swz = &sc->sge.sw_zone_info[cll->zidx]; goto alloc; } fl->cl_allocated++; n--; pa = pmap_kextract((vm_offset_t)cl); pa += cll->region1; sd->cl = cl; sd->cll = *cll; *d = htobe64(pa | cll->hwidx); clm = cl_metadata(sc, fl, cll, cl); if (clm != NULL) { recycled: #ifdef INVARIANTS clm->sd = sd; #endif clm->refcount = 1; } sd->nmbuf = 0; recycled_fast: d++; sd++; if (__predict_false(++fl->pidx % 8 == 0)) { uint16_t pidx = fl->pidx / 8; if (__predict_false(pidx == fl->sidx)) { fl->pidx = 0; pidx = 0; sd = fl->sdesc; d = fl->desc; } if (pidx == max_pidx) break; if (IDXDIFF(pidx, fl->dbidx, fl->sidx) >= 4) ring_fl_db(sc, fl); } } if (fl->pidx / 8 != fl->dbidx) ring_fl_db(sc, fl); return (FL_RUNNING_LOW(fl) && !(fl->flags & FL_STARVING)); } /* * Attempt to refill all starving freelists. 
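* Each pass hands every starving freelist up to 64 more buffers, and the callout reschedules itself hz / 5 ticks later for as long as any freelist remains on the starving list.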
*/ static void refill_sfl(void *arg) { struct adapter *sc = arg; struct sge_fl *fl, *fl_temp; mtx_assert(&sc->sfl_lock, MA_OWNED); TAILQ_FOREACH_SAFE(fl, &sc->sfl, link, fl_temp) { FL_LOCK(fl); refill_fl(sc, fl, 64); if (FL_NOT_RUNNING_LOW(fl) || fl->flags & FL_DOOMED) { TAILQ_REMOVE(&sc->sfl, fl, link); fl->flags &= ~FL_STARVING; } FL_UNLOCK(fl); } if (!TAILQ_EMPTY(&sc->sfl)) callout_schedule(&sc->sfl_callout, hz / 5); } static int alloc_fl_sdesc(struct sge_fl *fl) { fl->sdesc = malloc(fl->sidx * 8 * sizeof(struct fl_sdesc), M_CXGBE, M_ZERO | M_WAITOK); return (0); } static void free_fl_sdesc(struct adapter *sc, struct sge_fl *fl) { struct fl_sdesc *sd; struct cluster_metadata *clm; struct cluster_layout *cll; int i; sd = fl->sdesc; for (i = 0; i < fl->sidx * 8; i++, sd++) { if (sd->cl == NULL) continue; cll = &sd->cll; clm = cl_metadata(sc, fl, cll, sd->cl); if (sd->nmbuf == 0) uma_zfree(sc->sge.sw_zone_info[cll->zidx].zone, sd->cl); else if (clm && atomic_fetchadd_int(&clm->refcount, -1) == 1) { uma_zfree(sc->sge.sw_zone_info[cll->zidx].zone, sd->cl); counter_u64_add(extfree_rels, 1); } sd->cl = NULL; } free(fl->sdesc, M_CXGBE); fl->sdesc = NULL; } static inline void get_pkt_gl(struct mbuf *m, struct sglist *gl) { int rc; M_ASSERTPKTHDR(m); sglist_reset(gl); rc = sglist_append_mbuf(gl, m); if (__predict_false(rc != 0)) { panic("%s: mbuf %p (%d segs) was vetted earlier but now fails " "with %d.", __func__, m, mbuf_nsegs(m), rc); } KASSERT(gl->sg_nseg == mbuf_nsegs(m), ("%s: nsegs changed for mbuf %p from %d to %d", __func__, m, mbuf_nsegs(m), gl->sg_nseg)); KASSERT(gl->sg_nseg > 0 && gl->sg_nseg <= (needs_tso(m) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS), ("%s: %d segments, should have been 1 <= nsegs <= %d", __func__, gl->sg_nseg, needs_tso(m) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS)); } /* * len16 for a txpkt WR with a GL. Includes the firmware work request header. */ static inline u_int txpkt_len16(u_int nsegs, u_int tso) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct fw_eth_tx_pkt_wr) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); if (tso) n += sizeof(struct cpl_tx_pkt_lso_core); return (howmany(n, 16)); } /* * len16 for a txpkt_vm WR with a GL. Includes the firmware work * request header. */ static inline u_int txpkt_vm_len16(u_int nsegs, u_int tso) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct fw_eth_tx_pkt_vm_wr) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); if (tso) n += sizeof(struct cpl_tx_pkt_lso_core); return (howmany(n, 16)); } /* * len16 for a txpkts type 0 WR with a GL. Does not include the firmware work * request header. */ static inline u_int txpkts0_len16(u_int nsegs) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct ulp_txpkt) + sizeof(struct ulptx_idata) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); return (howmany(n, 16)); } /* * len16 for a txpkts type 1 WR with a GL. Does not include the firmware work * request header. 
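* (Assuming the usual 16-byte cpl_tx_pkt_core and 16-byte ulptx_sgl, this works out to howmany(32, 16) = 2, i.e. len16 = 2 for every type 1 packet.)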
*/ static inline u_int txpkts1_len16(void) { u_int n; n = sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl); return (howmany(n, 16)); } static inline u_int imm_payload(u_int ndesc) { u_int n; n = ndesc * EQ_ESIZE - sizeof(struct fw_eth_tx_pkt_wr) - sizeof(struct cpl_tx_pkt_core); return (n); } /* * Write a VM txpkt WR for this packet to the hardware descriptors, update the * software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkt_vm_wr(struct sge_txq *txq, struct fw_eth_tx_pkt_vm_wr *wr, struct mbuf *m0, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; /* used in many unrelated places */ uint64_t ctrl1; int csum_type, len16, ndesc, pktlen, nsegs; caddr_t dst; TXQ_LOCK_ASSERT_OWNED(txq); M_ASSERTPKTHDR(m0); MPASS(available > 0 && available < eq->sidx); len16 = mbuf_len16(m0); nsegs = mbuf_nsegs(m0); pktlen = m0->m_pkthdr.len; ctrl = sizeof(struct cpl_tx_pkt_core); if (needs_tso(m0)) ctrl += sizeof(struct cpl_tx_pkt_lso_core); ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc <= available); /* Firmware work request header */ MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_immdlen = htobe32(V_FW_WR_OP(FW_ETH_TX_PKT_VM_WR) | V_FW_ETH_TX_PKT_WR_IMMDLEN(ctrl)); ctrl = V_FW_WR_LEN16(len16); wr->equiq_to_len16 = htobe32(ctrl); wr->r3[0] = 0; wr->r3[1] = 0; /* * Copy over ethmacdst, ethmacsrc, ethtype, and vlantci. * vlantci is ignored unless the ethtype is 0x8100, so it's * simpler to always copy it rather than making it * conditional. Also, it seems that we do not have to set * vlantci or fake the ethtype when doing VLAN tag insertion. */ m_copydata(m0, 0, sizeof(struct ether_header) + 2, wr->ethmacdst); csum_type = -1; if (needs_tso(m0)) { struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1); KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0 && m0->m_pkthdr.l4hlen > 0, ("%s: mbuf %p needs TSO but missing header lengths", __func__, m0)); ctrl = V_LSO_OPCODE(CPL_TX_PKT_LSO) | F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | V_LSO_IPHDR_LEN(m0->m_pkthdr.l3hlen >> 2) | V_LSO_TCPHDR_LEN(m0->m_pkthdr.l4hlen >> 2); if (m0->m_pkthdr.l2hlen == sizeof(struct ether_vlan_header)) ctrl |= V_LSO_ETHHDR_LEN(1); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) ctrl |= F_LSO_IPV6; lso->lso_ctrl = htobe32(ctrl); lso->ipid_ofst = htobe16(0); lso->mss = htobe16(m0->m_pkthdr.tso_segsz); lso->seqno_offset = htobe32(0); lso->len = htobe32(pktlen); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) csum_type = TX_CSUM_TCPIP6; else csum_type = TX_CSUM_TCPIP; cpl = (void *)(lso + 1); txq->tso_wrs++; } else { if (m0->m_pkthdr.csum_flags & CSUM_IP_TCP) csum_type = TX_CSUM_TCPIP; else if (m0->m_pkthdr.csum_flags & CSUM_IP_UDP) csum_type = TX_CSUM_UDPIP; else if (m0->m_pkthdr.csum_flags & CSUM_IP6_TCP) csum_type = TX_CSUM_TCPIP6; else if (m0->m_pkthdr.csum_flags & CSUM_IP6_UDP) csum_type = TX_CSUM_UDPIP6; #if defined(INET) else if (m0->m_pkthdr.csum_flags & CSUM_IP) { /* * XXX: The firmware appears to stomp on the * fragment/flags field of the IP header when * using TX_CSUM_IP. Fall back to doing * software checksums. 
*/ u_short *sump; struct mbuf *m; int offset; m = m0; offset = 0; sump = m_advance(&m, &offset, m0->m_pkthdr.l2hlen + offsetof(struct ip, ip_sum)); *sump = in_cksum_skip(m0, m0->m_pkthdr.l2hlen + m0->m_pkthdr.l3hlen, m0->m_pkthdr.l2hlen); m0->m_pkthdr.csum_flags &= ~CSUM_IP; } #endif cpl = (void *)(wr + 1); } /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m0) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (csum_type >= 0) { KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0, ("%s: mbuf %p needs checksum offload but missing header lengths", __func__, m0)); /* XXX: T6 */ ctrl1 |= V_TXPKT_ETHHDR_LEN(m0->m_pkthdr.l2hlen - ETHER_HDR_LEN); ctrl1 |= V_TXPKT_IPHDR_LEN(m0->m_pkthdr.l3hlen); ctrl1 |= V_TXPKT_CSUM_TYPE(csum_type); } else ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m0->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m0)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m0->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(pktlen); cpl->ctrl1 = htobe64(ctrl1); /* SGL */ dst = (void *)(cpl + 1); /* * A packet using TSO will use up an entire descriptor for the * firmware work request header, LSO CPL, and TX_PKT_XT CPL. * If this descriptor is the last descriptor in the ring, wrap * around to the front of the ring explicitly for the start of * the sgl. */ if (dst == (void *)&eq->desc[eq->sidx]) { dst = (void *)&eq->desc[0]; write_gl_to_txd(txq, m0, &dst, 0); } else write_gl_to_txd(txq, m0, &dst, eq->sidx - ndesc < eq->pidx); txq->sgl_wrs++; txq->txpkt_wrs++; txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } /* * Write a txpkt WR for this packet to the hardware descriptors, update the * software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkt_wr(struct sge_txq *txq, struct fw_eth_tx_pkt_wr *wr, struct mbuf *m0, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; /* used in many unrelated places */ uint64_t ctrl1; int len16, ndesc, pktlen, nsegs; caddr_t dst; TXQ_LOCK_ASSERT_OWNED(txq); M_ASSERTPKTHDR(m0); MPASS(available > 0 && available < eq->sidx); len16 = mbuf_len16(m0); nsegs = mbuf_nsegs(m0); pktlen = m0->m_pkthdr.len; ctrl = sizeof(struct cpl_tx_pkt_core); if (needs_tso(m0)) ctrl += sizeof(struct cpl_tx_pkt_lso_core); else if (pktlen <= imm_payload(2) && available >= 2) { /* Immediate data. Recalculate len16 and set nsegs to 0. 
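* (imm_payload(2) is whatever fits in two descriptors after the WR header and CPL; e.g. with 64-byte descriptors and a 16-byte fw_eth_tx_pkt_wr and cpl_tx_pkt_core each, that is 2 * 64 - 16 - 16 = 96 bytes of frame data.)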
*/ ctrl += pktlen; len16 = howmany(sizeof(struct fw_eth_tx_pkt_wr) + sizeof(struct cpl_tx_pkt_core) + pktlen, 16); nsegs = 0; } ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc <= available); /* Firmware work request header */ MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_immdlen = htobe32(V_FW_WR_OP(FW_ETH_TX_PKT_WR) | V_FW_ETH_TX_PKT_WR_IMMDLEN(ctrl)); ctrl = V_FW_WR_LEN16(len16); wr->equiq_to_len16 = htobe32(ctrl); wr->r3 = 0; if (needs_tso(m0)) { struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1); KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0 && m0->m_pkthdr.l4hlen > 0, ("%s: mbuf %p needs TSO but missing header lengths", __func__, m0)); ctrl = V_LSO_OPCODE(CPL_TX_PKT_LSO) | F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | V_LSO_IPHDR_LEN(m0->m_pkthdr.l3hlen >> 2) | V_LSO_TCPHDR_LEN(m0->m_pkthdr.l4hlen >> 2); if (m0->m_pkthdr.l2hlen == sizeof(struct ether_vlan_header)) ctrl |= V_LSO_ETHHDR_LEN(1); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) ctrl |= F_LSO_IPV6; lso->lso_ctrl = htobe32(ctrl); lso->ipid_ofst = htobe16(0); lso->mss = htobe16(m0->m_pkthdr.tso_segsz); lso->seqno_offset = htobe32(0); lso->len = htobe32(pktlen); cpl = (void *)(lso + 1); txq->tso_wrs++; } else cpl = (void *)(wr + 1); /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m0) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (needs_l4_csum(m0) == 0) ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m0->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m0)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m0->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(pktlen); cpl->ctrl1 = htobe64(ctrl1); /* SGL */ dst = (void *)(cpl + 1); if (nsegs > 0) { write_gl_to_txd(txq, m0, &dst, eq->sidx - ndesc < eq->pidx); txq->sgl_wrs++; } else { struct mbuf *m; for (m = m0; m != NULL; m = m->m_next) { copy_to_txd(eq, mtod(m, caddr_t), &dst, m->m_len); #ifdef INVARIANTS pktlen -= m->m_len; #endif } #ifdef INVARIANTS KASSERT(pktlen == 0, ("%s: %d bytes left.", __func__, pktlen)); #endif txq->imm_wrs++; } txq->txpkt_wrs++; txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } static int try_txpkts(struct mbuf *m, struct mbuf *n, struct txpkts *txp, u_int available) { u_int needed, nsegs1, nsegs2, l1, l2; if (cannot_use_txpkts(m) || cannot_use_txpkts(n)) return (1); nsegs1 = mbuf_nsegs(m); nsegs2 = mbuf_nsegs(n); if (nsegs1 + nsegs2 == 2) { txp->wr_type = 1; l1 = l2 = txpkts1_len16(); } else { txp->wr_type = 0; l1 = txpkts0_len16(nsegs1); l2 = txpkts0_len16(nsegs2); } txp->len16 = howmany(sizeof(struct fw_eth_tx_pkts_wr), 16) + l1 + l2; needed = howmany(txp->len16, EQ_ESIZE / 16); if (needed > SGE_MAX_WR_NDESC || needed > available) return (1); txp->plen = m->m_pkthdr.len + n->m_pkthdr.len; if (txp->plen > 65535) return (1); txp->npkt = 2; set_mbuf_len16(m, l1); set_mbuf_len16(n, l2); return (0); } static int add_to_txpkts(struct mbuf *m, struct txpkts *txp, u_int available) { u_int plen, len16, needed, nsegs; MPASS(txp->wr_type == 0 || txp->wr_type == 1); nsegs = mbuf_nsegs(m); if (needs_tso(m) || (txp->wr_type == 1 && nsegs != 1)) return (1); plen = txp->plen + m->m_pkthdr.len; if (plen > 65535) return (1); if (txp->wr_type == 0) len16 = txpkts0_len16(nsegs); else len16 = txpkts1_len16(); needed = howmany(txp->len16 + len16, EQ_ESIZE / 16); if (needed > SGE_MAX_WR_NDESC || needed > available) 
return (1); txp->npkt++; txp->plen = plen; txp->len16 += len16; set_mbuf_len16(m, len16); return (0); } /* * Write a txpkts WR for the packets in txp to the hardware descriptors, update * the software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkts_wr(struct sge_txq *txq, struct fw_eth_tx_pkts_wr *wr, struct mbuf *m0, const struct txpkts *txp, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; uint64_t ctrl1; int ndesc, checkwrap; struct mbuf *m; void *flitp; TXQ_LOCK_ASSERT_OWNED(txq); MPASS(txp->npkt > 0); MPASS(txp->plen < 65536); MPASS(m0 != NULL); MPASS(m0->m_nextpkt != NULL); MPASS(txp->len16 <= howmany(SGE_MAX_WR_LEN, 16)); MPASS(available > 0 && available < eq->sidx); ndesc = howmany(txp->len16, EQ_ESIZE / 16); MPASS(ndesc <= available); MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_pkd = htobe32(V_FW_WR_OP(FW_ETH_TX_PKTS_WR)); ctrl = V_FW_WR_LEN16(txp->len16); wr->equiq_to_len16 = htobe32(ctrl); wr->plen = htobe16(txp->plen); wr->npkt = txp->npkt; wr->r3 = 0; wr->type = txp->wr_type; flitp = wr + 1; /* * At this point we are 16B into a hardware descriptor. If checkwrap is * set then we know the WR is going to wrap around somewhere. We'll * check for that at appropriate points. */ checkwrap = eq->sidx - ndesc < eq->pidx; for (m = m0; m != NULL; m = m->m_nextpkt) { if (txp->wr_type == 0) { struct ulp_txpkt *ulpmc; struct ulptx_idata *ulpsc; /* ULP master command */ ulpmc = flitp; ulpmc->cmd_dest = htobe32(V_ULPTX_CMD(ULP_TX_PKT) | V_ULP_TXPKT_DEST(0) | V_ULP_TXPKT_FID(eq->iqid)); ulpmc->len = htobe32(mbuf_len16(m)); /* ULP subcommand */ ulpsc = (void *)(ulpmc + 1); ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM) | F_ULP_TX_SC_MORE); ulpsc->len = htobe32(sizeof(struct cpl_tx_pkt_core)); cpl = (void *)(ulpsc + 1); if (checkwrap && (uintptr_t)cpl == (uintptr_t)&eq->desc[eq->sidx]) cpl = (void *)&eq->desc[0]; txq->txpkts0_pkts += txp->npkt; txq->txpkts0_wrs++; } else { cpl = flitp; txq->txpkts1_pkts += txp->npkt; txq->txpkts1_wrs++; } /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (needs_l4_csum(m) == 0) ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(m->m_pkthdr.len); cpl->ctrl1 = htobe64(ctrl1); flitp = cpl + 1; if (checkwrap && (uintptr_t)flitp == (uintptr_t)&eq->desc[eq->sidx]) flitp = (void *)&eq->desc[0]; write_gl_to_txd(txq, m, (caddr_t *)(&flitp), checkwrap); } txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } /* * If the SGL ends on an address that is not 16 byte aligned, this function will * add a 0 filled flit at the end. 
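* (The flit count computed below is 2 header flits plus ceil(3 * (nsegs - 1) / 2) for the segments past the first, since two segment lengths pack into one flit and each address takes a full flit; e.g. nsegs = 5 gives (3 * 4) / 2 + 0 + 2 = 8 flits.)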
*/ static void write_gl_to_txd(struct sge_txq *txq, struct mbuf *m, caddr_t *to, int checkwrap) { struct sge_eq *eq = &txq->eq; struct sglist *gl = txq->gl; struct sglist_seg *seg; __be64 *flitp, *wrap; struct ulptx_sgl *usgl; int i, nflits, nsegs; KASSERT(((uintptr_t)(*to) & 0xf) == 0, ("%s: SGL must start at a 16 byte boundary: %p", __func__, *to)); MPASS((uintptr_t)(*to) >= (uintptr_t)&eq->desc[0]); MPASS((uintptr_t)(*to) < (uintptr_t)&eq->desc[eq->sidx]); get_pkt_gl(m, gl); nsegs = gl->sg_nseg; MPASS(nsegs > 0); nflits = (3 * (nsegs - 1)) / 2 + ((nsegs - 1) & 1) + 2; flitp = (__be64 *)(*to); wrap = (__be64 *)(&eq->desc[eq->sidx]); seg = &gl->sg_segs[0]; usgl = (void *)flitp; /* * We start at a 16 byte boundary somewhere inside the tx descriptor * ring, so we're at least 16 bytes away from the status page. There is * no chance of a wrap around in the middle of usgl (which is 16 bytes). */ usgl->cmd_nsge = htobe32(V_ULPTX_CMD(ULP_TX_SC_DSGL) | V_ULPTX_NSGE(nsegs)); usgl->len0 = htobe32(seg->ss_len); usgl->addr0 = htobe64(seg->ss_paddr); seg++; if (checkwrap == 0 || (uintptr_t)(flitp + nflits) <= (uintptr_t)wrap) { /* Won't wrap around at all */ for (i = 0; i < nsegs - 1; i++, seg++) { usgl->sge[i / 2].len[i & 1] = htobe32(seg->ss_len); usgl->sge[i / 2].addr[i & 1] = htobe64(seg->ss_paddr); } if (i & 1) usgl->sge[i / 2].len[1] = htobe32(0); flitp += nflits; } else { /* Will wrap somewhere in the rest of the SGL */ /* 2 flits already written, write the rest flit by flit */ flitp = (void *)(usgl + 1); for (i = 0; i < nflits - 2; i++) { if (flitp == wrap) flitp = (void *)eq->desc; *flitp++ = get_flit(seg, nsegs - 1, i); } } if (nflits & 1) { MPASS(((uintptr_t)flitp) & 0xf); *flitp++ = 0; } MPASS((((uintptr_t)flitp) & 0xf) == 0); if (__predict_false(flitp == wrap)) *to = (void *)eq->desc; else *to = (void *)flitp; } static inline void copy_to_txd(struct sge_eq *eq, caddr_t from, caddr_t *to, int len) { MPASS((uintptr_t)(*to) >= (uintptr_t)&eq->desc[0]); MPASS((uintptr_t)(*to) < (uintptr_t)&eq->desc[eq->sidx]); if (__predict_true((uintptr_t)(*to) + len <= (uintptr_t)&eq->desc[eq->sidx])) { bcopy(from, *to, len); (*to) += len; } else { int portion = (uintptr_t)&eq->desc[eq->sidx] - (uintptr_t)(*to); bcopy(from, *to, portion); from += portion; portion = len - portion; /* remaining */ bcopy(from, (void *)eq->desc, portion); (*to) = (caddr_t)eq->desc + portion; } } static inline void ring_eq_db(struct adapter *sc, struct sge_eq *eq, u_int n) { u_int db; MPASS(n > 0); db = eq->doorbells; if (n > 1) clrbit(&db, DOORBELL_WCWR); wmb(); switch (ffs(db) - 1) { case DOORBELL_UDB: *eq->udb = htole32(V_QID(eq->udb_qid) | V_PIDX(n)); break; case DOORBELL_WCWR: { volatile uint64_t *dst, *src; int i; /* * Queues whose 128B doorbell segment fits in the page do not * use relative qid (udb_qid is always 0). Only queues with * doorbell segments can do WCWR. 
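* (A WCWR doorbell write-combines the entire descriptor into the doorbell segment rather than just bumping the pidx, which is why the KASSERT below insists on a single-descriptor write with n == 1 and udb_qid == 0.)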
*/ KASSERT(eq->udb_qid == 0 && n == 1, ("%s: inappropriate doorbell (0x%x, %d, %d) for eq %p", __func__, eq->doorbells, n, eq->dbidx, eq)); dst = (volatile void *)((uintptr_t)eq->udb + UDBS_WR_OFFSET - UDBS_DB_OFFSET); i = eq->dbidx; src = (void *)&eq->desc[i]; while (src != (void *)&eq->desc[i + 1]) *dst++ = *src++; wmb(); break; } case DOORBELL_UDBWC: *eq->udb = htole32(V_QID(eq->udb_qid) | V_PIDX(n)); wmb(); break; case DOORBELL_KDB: t4_write_reg(sc, sc->sge_kdoorbell_reg, V_QID(eq->cntxt_id) | V_PIDX(n)); break; } IDXINCR(eq->dbidx, n, eq->sidx); } static inline u_int reclaimable_tx_desc(struct sge_eq *eq) { uint16_t hw_cidx; hw_cidx = read_hw_cidx(eq); return (IDXDIFF(hw_cidx, eq->cidx, eq->sidx)); } static inline u_int total_available_tx_desc(struct sge_eq *eq) { uint16_t hw_cidx, pidx; hw_cidx = read_hw_cidx(eq); pidx = eq->pidx; if (pidx == hw_cidx) return (eq->sidx - 1); else return (IDXDIFF(hw_cidx, pidx, eq->sidx) - 1); } static inline uint16_t read_hw_cidx(struct sge_eq *eq) { struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; uint16_t cidx = spg->cidx; /* stable snapshot */ return (be16toh(cidx)); } /* * Reclaim 'n' descriptors approximately. */ static u_int reclaim_tx_descs(struct sge_txq *txq, u_int n) { struct tx_sdesc *txsd; struct sge_eq *eq = &txq->eq; u_int can_reclaim, reclaimed; TXQ_LOCK_ASSERT_OWNED(txq); MPASS(n > 0); reclaimed = 0; can_reclaim = reclaimable_tx_desc(eq); while (can_reclaim && reclaimed < n) { int ndesc; struct mbuf *m, *nextpkt; txsd = &txq->sdesc[eq->cidx]; ndesc = txsd->desc_used; /* Firmware doesn't return "partial" credits. */ KASSERT(can_reclaim >= ndesc, ("%s: unexpected number of credits: %d, %d", __func__, can_reclaim, ndesc)); for (m = txsd->m; m != NULL; m = nextpkt) { nextpkt = m->m_nextpkt; m->m_nextpkt = NULL; m_freem(m); } reclaimed += ndesc; can_reclaim -= ndesc; IDXINCR(eq->cidx, ndesc, eq->sidx); } return (reclaimed); } static void tx_reclaim(void *arg, int n) { struct sge_txq *txq = arg; struct sge_eq *eq = &txq->eq; do { if (TXQ_TRYLOCK(txq) == 0) break; n = reclaim_tx_descs(txq, 32); if (eq->cidx == eq->pidx) eq->equeqidx = eq->pidx; TXQ_UNLOCK(txq); } while (n > 0); } static __be64 get_flit(struct sglist_seg *segs, int nsegs, int idx) { int i = (idx / 3) * 2; switch (idx % 3) { case 0: { __be64 rc; rc = htobe32(segs[i].ss_len); if (i + 1 < nsegs) rc |= (uint64_t)htobe32(segs[i + 1].ss_len) << 32; return (rc); } case 1: return (htobe64(segs[i].ss_paddr)); case 2: return (htobe64(segs[i + 1].ss_paddr)); } return (0); } static void find_best_refill_source(struct adapter *sc, struct sge_fl *fl, int maxp) { int8_t zidx, hwidx, idx; uint16_t region1, region3; int spare, spare_needed, n; struct sw_zone_info *swz; struct hw_buf_info *hwb, *hwb_list = &sc->sge.hw_buf_info[0]; /* * Buffer Packing: Look for PAGE_SIZE or larger zone which has a bufsize * large enough for the max payload and cluster metadata. Otherwise * settle for the largest bufsize that leaves enough room in the cluster * for metadata. * * Without buffer packing: Look for the smallest zone which has a * bufsize large enough for the max payload. Settle for the largest * bufsize available if there's nothing big enough for max payload. */ spare_needed = fl->flags & FL_BUF_PACKING ? CL_METADATA_SIZE : 0; swz = &sc->sge.sw_zone_info[0]; hwidx = -1; for (zidx = 0; zidx < SW_ZONE_SIZES; zidx++, swz++) { if (swz->size > largest_rx_cluster) { if (__predict_true(hwidx != -1)) break; /* * This is a misconfiguration. 
largest_rx_cluster is * preventing us from finding a refill source. See * dev.t5nex..buffer_sizes to figure out why. */ device_printf(sc->dev, "largest_rx_cluster=%u leaves no" " refill source for fl %p (dma %u). Ignored.\n", largest_rx_cluster, fl, maxp); } for (idx = swz->head_hwidx; idx != -1; idx = hwb->next) { hwb = &hwb_list[idx]; spare = swz->size - hwb->size; if (spare < spare_needed) continue; hwidx = idx; /* best option so far */ if (hwb->size >= maxp) { if ((fl->flags & FL_BUF_PACKING) == 0) goto done; /* stop looking (not packing) */ if (swz->size >= safest_rx_cluster) goto done; /* stop looking (packing) */ } break; /* keep looking, next zone */ } } done: /* A usable hwidx has been located. */ MPASS(hwidx != -1); hwb = &hwb_list[hwidx]; zidx = hwb->zidx; swz = &sc->sge.sw_zone_info[zidx]; region1 = 0; region3 = swz->size - hwb->size; /* * Stay within this zone and see if there is a better match when mbuf * inlining is allowed. Remember that the hwidx's are sorted in * decreasing order of size (so in increasing order of spare area). */ for (idx = hwidx; idx != -1; idx = hwb->next) { hwb = &hwb_list[idx]; spare = swz->size - hwb->size; if (allow_mbufs_in_cluster == 0 || hwb->size < maxp) break; /* * Do not inline mbufs if doing so would violate the pad/pack * boundary alignment requirement. */ if (fl_pad && (MSIZE % sc->params.sge.pad_boundary) != 0) continue; if (fl->flags & FL_BUF_PACKING && (MSIZE % sc->params.sge.pack_boundary) != 0) continue; if (spare < CL_METADATA_SIZE + MSIZE) continue; n = (spare - CL_METADATA_SIZE) / MSIZE; if (n > howmany(hwb->size, maxp)) break; hwidx = idx; if (fl->flags & FL_BUF_PACKING) { region1 = n * MSIZE; region3 = spare - region1; } else { region1 = MSIZE; region3 = spare - region1; break; } } KASSERT(zidx >= 0 && zidx < SW_ZONE_SIZES, ("%s: bad zone %d for fl %p, maxp %d", __func__, zidx, fl, maxp)); KASSERT(hwidx >= 0 && hwidx <= SGE_FLBUF_SIZES, ("%s: bad hwidx %d for fl %p, maxp %d", __func__, hwidx, fl, maxp)); KASSERT(region1 + sc->sge.hw_buf_info[hwidx].size + region3 == sc->sge.sw_zone_info[zidx].size, ("%s: bad buffer layout for fl %p, maxp %d. " "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); if (fl->flags & FL_BUF_PACKING || region1 > 0) { KASSERT(region3 >= CL_METADATA_SIZE, ("%s: no room for metadata. fl %p, maxp %d; " "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); KASSERT(region1 % MSIZE == 0, ("%s: bad mbuf region for fl %p, maxp %d. 
" "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); } fl->cll_def.zidx = zidx; fl->cll_def.hwidx = hwidx; fl->cll_def.region1 = region1; fl->cll_def.region3 = region3; } static void find_safe_refill_source(struct adapter *sc, struct sge_fl *fl) { struct sge *s = &sc->sge; struct hw_buf_info *hwb; struct sw_zone_info *swz; int spare; int8_t hwidx; if (fl->flags & FL_BUF_PACKING) hwidx = s->safe_hwidx2; /* with room for metadata */ else if (allow_mbufs_in_cluster && s->safe_hwidx2 != -1) { hwidx = s->safe_hwidx2; hwb = &s->hw_buf_info[hwidx]; swz = &s->sw_zone_info[hwb->zidx]; spare = swz->size - hwb->size; /* no good if there isn't room for an mbuf as well */ if (spare < CL_METADATA_SIZE + MSIZE) hwidx = s->safe_hwidx1; } else hwidx = s->safe_hwidx1; if (hwidx == -1) { /* No fallback source */ fl->cll_alt.hwidx = -1; fl->cll_alt.zidx = -1; return; } hwb = &s->hw_buf_info[hwidx]; swz = &s->sw_zone_info[hwb->zidx]; spare = swz->size - hwb->size; fl->cll_alt.hwidx = hwidx; fl->cll_alt.zidx = hwb->zidx; if (allow_mbufs_in_cluster && (fl_pad == 0 || (MSIZE % sc->params.sge.pad_boundary) == 0)) fl->cll_alt.region1 = ((spare - CL_METADATA_SIZE) / MSIZE) * MSIZE; else fl->cll_alt.region1 = 0; fl->cll_alt.region3 = spare - fl->cll_alt.region1; } static void add_fl_to_sfl(struct adapter *sc, struct sge_fl *fl) { mtx_lock(&sc->sfl_lock); FL_LOCK(fl); if ((fl->flags & FL_DOOMED) == 0) { fl->flags |= FL_STARVING; TAILQ_INSERT_TAIL(&sc->sfl, fl, link); callout_reset(&sc->sfl_callout, hz / 5, refill_sfl, sc); } FL_UNLOCK(fl); mtx_unlock(&sc->sfl_lock); } static void handle_wrq_egr_update(struct adapter *sc, struct sge_eq *eq) { struct sge_wrq *wrq = (void *)eq; atomic_readandclear_int(&eq->equiq); taskqueue_enqueue(sc->tq[eq->tx_chan], &wrq->wrq_tx_task); } static void handle_eth_egr_update(struct adapter *sc, struct sge_eq *eq) { struct sge_txq *txq = (void *)eq; MPASS((eq->flags & EQ_TYPEMASK) == EQ_ETH); atomic_readandclear_int(&eq->equiq); mp_ring_check_drainage(txq->r, 0); taskqueue_enqueue(sc->tq[eq->tx_chan], &txq->tx_reclaim_task); } static int handle_sge_egr_update(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { const struct cpl_sge_egr_update *cpl = (const void *)(rss + 1); unsigned int qid = G_EGR_QID(ntohl(cpl->opcode_qid)); struct adapter *sc = iq->adapter; struct sge *s = &sc->sge; struct sge_eq *eq; static void (*h[])(struct adapter *, struct sge_eq *) = {NULL, &handle_wrq_egr_update, &handle_eth_egr_update, &handle_wrq_egr_update}; KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__, rss->opcode)); eq = s->eqmap[qid - s->eq_start - s->eq_base]; (*h[eq->flags & EQ_TYPEMASK])(sc, eq); return (0); } /* handle_fw_msg works for both fw4_msg and fw6_msg because this is valid */ CTASSERT(offsetof(struct cpl_fw4_msg, data) == \ offsetof(struct cpl_fw6_msg, data)); static int handle_fw_msg(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; const struct cpl_fw6_msg *cpl = (const void *)(rss + 1); KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__, rss->opcode)); if (cpl->type == FW_TYPE_RSSCPL || cpl->type == FW6_TYPE_RSSCPL) { const struct rss_header *rss2; rss2 = (const struct rss_header *)&cpl->data[0]; return (t4_cpl_handler[rss2->opcode](iq, rss2, m)); } return (t4_fw_msg_handler[cpl->type](sc, &cpl->data[0])); } /** * t4_handle_wrerr_rpl - process a FW work request error message * @adap: the adapter * @rpl: start of 
the FW message */ static int t4_handle_wrerr_rpl(struct adapter *adap, const __be64 *rpl) { u8 opcode = *(const u8 *)rpl; const struct fw_error_cmd *e = (const void *)rpl; unsigned int i; if (opcode != FW_ERROR_CMD) { log(LOG_ERR, "%s: Received WRERR_RPL message with opcode %#x\n", device_get_nameunit(adap->dev), opcode); return (EINVAL); } log(LOG_ERR, "%s: FW_ERROR (%s) ", device_get_nameunit(adap->dev), G_FW_ERROR_CMD_FATAL(be32toh(e->op_to_type)) ? "fatal" : "non-fatal"); switch (G_FW_ERROR_CMD_TYPE(be32toh(e->op_to_type))) { case FW_ERROR_TYPE_EXCEPTION: log(LOG_ERR, "exception info:\n"); for (i = 0; i < nitems(e->u.exception.info); i++) log(LOG_ERR, "%s%08x", i == 0 ? "\t" : " ", be32toh(e->u.exception.info[i])); log(LOG_ERR, "\n"); break; case FW_ERROR_TYPE_HWMODULE: log(LOG_ERR, "HW module regaddr %08x regval %08x\n", be32toh(e->u.hwmodule.regaddr), be32toh(e->u.hwmodule.regval)); break; case FW_ERROR_TYPE_WR: log(LOG_ERR, "WR cidx %d PF %d VF %d eqid %d hdr:\n", be16toh(e->u.wr.cidx), G_FW_ERROR_CMD_PFN(be16toh(e->u.wr.pfn_vfn)), G_FW_ERROR_CMD_VFN(be16toh(e->u.wr.pfn_vfn)), be32toh(e->u.wr.eqid)); for (i = 0; i < nitems(e->u.wr.wrhdr); i++) log(LOG_ERR, "%s%02x", i == 0 ? "\t" : " ", e->u.wr.wrhdr[i]); log(LOG_ERR, "\n"); break; case FW_ERROR_TYPE_ACL: log(LOG_ERR, "ACL cidx %d PF %d VF %d eqid %d %s", be16toh(e->u.acl.cidx), G_FW_ERROR_CMD_PFN(be16toh(e->u.acl.pfn_vfn)), G_FW_ERROR_CMD_VFN(be16toh(e->u.acl.pfn_vfn)), be32toh(e->u.acl.eqid), G_FW_ERROR_CMD_MV(be16toh(e->u.acl.mv_pkd)) ? "vlanid" : "MAC"); for (i = 0; i < nitems(e->u.acl.val); i++) log(LOG_ERR, " %02x", e->u.acl.val[i]); log(LOG_ERR, "\n"); break; default: log(LOG_ERR, "type %#x\n", G_FW_ERROR_CMD_TYPE(be32toh(e->op_to_type))); return (EINVAL); } return (0); } static int sysctl_uint16(SYSCTL_HANDLER_ARGS) { uint16_t *id = arg1; int i = *id; return sysctl_handle_int(oidp, &i, 0, req); } static int sysctl_bufsizes(SYSCTL_HANDLER_ARGS) { struct sge *s = arg1; struct hw_buf_info *hwb = &s->hw_buf_info[0]; struct sw_zone_info *swz = &s->sw_zone_info[0]; int i, rc; struct sbuf sb; char c; sbuf_new(&sb, NULL, 32, SBUF_AUTOEXTEND); for (i = 0; i < SGE_FLBUF_SIZES; i++, hwb++) { if (hwb->zidx >= 0 && swz[hwb->zidx].size <= largest_rx_cluster) c = '*'; else c = '\0'; sbuf_printf(&sb, "%u%c ", hwb->size, c); } sbuf_trim(&sb); sbuf_finish(&sb); rc = sysctl_handle_string(oidp, sbuf_data(&sb), sbuf_len(&sb), req); sbuf_delete(&sb); return (rc); } static int sysctl_tc(SYSCTL_HANDLER_ARGS) { struct vi_info *vi = arg1; struct port_info *pi; struct adapter *sc; struct sge_txq *txq; struct tx_sched_class *tc; int qidx = arg2, rc, tc_idx; uint32_t fw_queue, fw_class; MPASS(qidx >= 0 && qidx < vi->ntxq); pi = vi->pi; sc = pi->adapter; txq = &sc->sge.txq[vi->first_txq + qidx]; tc_idx = txq->tc_idx; rc = sysctl_handle_int(oidp, &tc_idx, 0, req); if (rc != 0 || req->newptr == NULL) return (rc); /* Note that -1 is legitimate input (it means unbind). */ if (tc_idx < -1 || tc_idx >= sc->chip_params->nsched_cls) return (EINVAL); rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4stc"); if (rc) return (rc); if (tc_idx == txq->tc_idx) { rc = 0; /* No change, nothing to do. */ goto done; } fw_queue = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) | V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH) | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id); if (tc_idx == -1) fw_class = 0xffffffff; /* Unbind. */ else { /* * Bind to a different class. Ethernet txq's are only allowed * to bind to cl-rl mode-class for now. XXX: too restrictive. 
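 *
 * From userland this handler is driven with sysctl(8).  Assuming the
 * OID was attached under the per-queue txq node (the exact name
 * depends on how the parent tree was created), something like
 * "sysctl dev.<port>.<unit>.txq.<q>.tc=3" binds queue q to traffic
 * class 3, and "... .tc=-1" unbinds it again.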
*/ tc = &pi->tc[tc_idx]; if (tc->flags & TX_SC_OK && tc->params.level == SCHED_CLASS_LEVEL_CL_RL && tc->params.mode == SCHED_CLASS_MODE_CLASS) { /* Ok to proceed. */ fw_class = tc_idx; } else { rc = tc->flags & TX_SC_OK ? EBUSY : ENXIO; goto done; } } rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue, &fw_class); if (rc == 0) { if (txq->tc_idx != -1) { tc = &pi->tc[txq->tc_idx]; MPASS(tc->refcount > 0); tc->refcount--; } if (tc_idx != -1) { tc = &pi->tc[tc_idx]; tc->refcount++; } txq->tc_idx = tc_idx; } done: end_synchronized_op(sc, 0); return (rc); } Index: projects/clang390-import/sys/dev/gpio/gpiobusvar.h =================================================================== --- projects/clang390-import/sys/dev/gpio/gpiobusvar.h (revision 305686) +++ projects/clang390-import/sys/dev/gpio/gpiobusvar.h (revision 305687) @@ -1,153 +1,157 @@ /*- * Copyright (c) 2009 Oleksandr Tymoshenko * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ * */ #ifndef __GPIOBUS_H__ #define __GPIOBUS_H__ #include "opt_platform.h" #include #include #include #ifdef FDT #include #include #endif +#ifdef INTRNG +#include +#endif + #include "gpio_if.h" #ifdef FDT #define GPIOBUS_IVAR(d) (struct gpiobus_ivar *) \ &((struct ofw_gpiobus_devinfo *)device_get_ivars(d))->opd_dinfo #else #define GPIOBUS_IVAR(d) (struct gpiobus_ivar *) device_get_ivars(d) #endif #define GPIOBUS_SOFTC(d) (struct gpiobus_softc *) device_get_softc(d) #define GPIOBUS_LOCK(_sc) mtx_lock(&(_sc)->sc_mtx) #define GPIOBUS_UNLOCK(_sc) mtx_unlock(&(_sc)->sc_mtx) #define GPIOBUS_LOCK_INIT(_sc) mtx_init(&_sc->sc_mtx, \ device_get_nameunit(_sc->sc_dev), "gpiobus", MTX_DEF) #define GPIOBUS_LOCK_DESTROY(_sc) mtx_destroy(&_sc->sc_mtx) #define GPIOBUS_ASSERT_LOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_OWNED) #define GPIOBUS_ASSERT_UNLOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_NOTOWNED) #define GPIOBUS_WAIT 1 #define GPIOBUS_DONTWAIT 2 /* Use default interrupt mode - for gpio_alloc_intr_resource */ #define GPIO_INTR_CONFORM GPIO_INTR_NONE struct gpiobus_pin_data { int mapped; /* pin is mapped/reserved. */ char *name; /* pin name. 
*/ }; #ifdef INTRNG struct intr_map_data_gpio { struct intr_map_data hdr; u_int gpio_pin_num; u_int gpio_pin_flags; u_int gpio_intr_mode; }; #endif struct gpiobus_softc { struct mtx sc_mtx; /* bus mutex */ struct rman sc_intr_rman; /* isr resources */ device_t sc_busdev; /* bus device */ device_t sc_owner; /* bus owner */ device_t sc_dev; /* driver device */ int sc_npins; /* total pins on bus */ struct gpiobus_pin_data *sc_pins; /* pin data */ }; struct gpiobus_pin { device_t dev; /* gpio device */ uint32_t flags; /* pin flags */ uint32_t pin; /* pin number */ }; typedef struct gpiobus_pin *gpio_pin_t; struct gpiobus_ivar { struct resource_list rl; /* isr resource list */ uint32_t npins; /* pins total */ uint32_t *flags; /* pins flags */ uint32_t *pins; /* pins map */ }; #ifdef FDT struct ofw_gpiobus_devinfo { struct gpiobus_ivar opd_dinfo; struct ofw_bus_devinfo opd_obdinfo; }; static __inline int gpio_map_gpios(device_t bus, phandle_t dev, phandle_t gparent, int gcells, pcell_t *gpios, uint32_t *pin, uint32_t *flags) { return (GPIO_MAP_GPIOS(bus, dev, gparent, gcells, gpios, pin, flags)); } device_t ofw_gpiobus_add_fdt_child(device_t, const char *, phandle_t); int ofw_gpiobus_parse_gpios(device_t, char *, struct gpiobus_pin **); void ofw_gpiobus_register_provider(device_t); void ofw_gpiobus_unregister_provider(device_t); /* Consumers interface. */ int gpio_pin_get_by_ofw_name(device_t consumer, phandle_t node, char *name, gpio_pin_t *gpio); int gpio_pin_get_by_ofw_idx(device_t consumer, phandle_t node, int idx, gpio_pin_t *gpio); int gpio_pin_get_by_ofw_property(device_t consumer, phandle_t node, char *name, gpio_pin_t *gpio); void gpio_pin_release(gpio_pin_t gpio); int gpio_pin_getcaps(gpio_pin_t pin, uint32_t *caps); int gpio_pin_is_active(gpio_pin_t pin, bool *active); int gpio_pin_set_active(gpio_pin_t pin, bool active); int gpio_pin_setflags(gpio_pin_t pin, uint32_t flags); #endif struct resource *gpio_alloc_intr_resource(device_t consumer_dev, int *rid, u_int alloc_flags, gpio_pin_t pin, uint32_t intr_mode); int gpio_check_flags(uint32_t, uint32_t); device_t gpiobus_attach_bus(device_t); int gpiobus_detach_bus(device_t); int gpiobus_init_softc(device_t); int gpiobus_alloc_ivars(struct gpiobus_ivar *); void gpiobus_free_ivars(struct gpiobus_ivar *); int gpiobus_acquire_pin(device_t, uint32_t); int gpiobus_release_pin(device_t, uint32_t); extern driver_t gpiobus_driver; #endif /* __GPIOBUS_H__ */ Index: projects/clang390-import/sys/fs/nullfs/null_vnops.c =================================================================== --- projects/clang390-import/sys/fs/nullfs/null_vnops.c (revision 305686) +++ projects/clang390-import/sys/fs/nullfs/null_vnops.c (revision 305687) @@ -1,935 +1,934 @@ /*- * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * John Heidemann of the UCLA Ficus project. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. 
Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * @(#)null_vnops.c	8.6 (Berkeley) 5/27/95
 *
 * Ancestors:
 *	@(#)lofs_vnops.c	1.2 (Berkeley) 6/18/92
 *	...and...
 *	@(#)null_vnodeops.c 1.20 92/07/07 UCLA Ficus project
 *
 * $FreeBSD$
 */

/*
 * Null Layer
 *
 * (See mount_nullfs(8) for more information.)
 *
 * The null layer duplicates a portion of the filesystem
 * name space under a new name.  In this respect, it is
 * similar to the loopback filesystem.  It differs from
 * the loopback fs in two respects: it is implemented using
 * a stackable-layers technique, and its "null-node"s stack above
 * all lower-layer vnodes, not just over directory vnodes.
 *
 * The null layer has two purposes.  First, it serves as a demonstration
 * of layering by providing a layer which does nothing.  (It actually
 * does everything the loopback filesystem does, which is slightly
 * more than nothing.)  Second, the null layer can serve as a prototype
 * layer.  Since it provides all necessary layer framework,
 * new filesystem layers can be created very easily by starting
 * with a null layer.
 *
 * The remainder of this comment examines the null layer as a basis
 * for constructing new layers.
 *
 *
 * INSTANTIATING NEW NULL LAYERS
 *
 * New null layers are created with mount_nullfs(8).
 * Mount_nullfs(8) takes two arguments, the pathname
 * of the lower vfs (target-pn) and the pathname where the null
 * layer will appear in the namespace (alias-pn).  After
 * the null layer is put into place, the contents
 * of the target-pn subtree will be aliased under alias-pn.
 *
 *
 * OPERATION OF A NULL LAYER
 *
 * The null layer is the minimum filesystem layer,
 * simply bypassing all possible operations to the lower layer
 * for processing there.  The majority of its activity centers
 * on the bypass routine, through which nearly all vnode operations
 * pass.
 *
 * The bypass routine accepts arbitrary vnode operations for
 * handling by the lower layer.  It begins by examining vnode
 * operation arguments and replacing any null-nodes by their
 * lower-layer equivalents.  It then invokes the operation
 * on the lower layer.  Finally, it replaces the null-nodes
 * in the arguments and, if a vnode is returned by the operation,
 * stacks a null-node on top of the returned vnode.
 *
 * Although bypass handles most operations, vop_getattr, vop_lock,
 * vop_unlock, vop_inactive, vop_reclaim, and vop_print are not
 * bypassed.  Vop_getattr must change the fsid being returned.
 * Vop_lock and vop_unlock must handle any locking for the
 * current vnode as well as pass the lock request down.
 * Vop_inactive and vop_reclaim are not bypassed so that
 * they can handle freeing null-layer specific data.
 * Vop_print is not bypassed to avoid excessive debugging information.
 * Also, certain vnode operations change the locking state within
 * the operation (create, mknod, remove, link, rename, mkdir, rmdir,
 * and symlink).  Ideally these operations should not change the
 * lock state, but should be changed to let the caller of the
 * function unlock them.  Otherwise all intermediate vnode layers
 * (such as union, umapfs, etc) must catch these functions to do
 * the necessary locking at their layer.
 *
 *
 * INSTANTIATING VNODE STACKS
 *
 * Mounting associates the null layer with a lower layer,
 * in effect stacking two VFSes.  Vnode stacks are instead
 * created on demand as files are accessed.
 *
 * The initial mount creates a single vnode stack for the
 * root of the new null layer.  All other vnode stacks
 * are created as a result of vnode operations on
 * this or other null vnode stacks.
 *
 * New vnode stacks come into existence as a result of
 * an operation which returns a vnode.
 * The bypass routine stacks a null-node above the new
 * vnode before returning it to the caller.
 *
 * For example, imagine mounting a null layer with
 * "mount_nullfs /usr/include /dev/layer/null".
 * Changing directory to /dev/layer/null will assign
 * the root null-node (which was created when the null layer was mounted).
 * Now consider opening "sys".  A vop_lookup would be
 * done on the root null-node.  This operation would bypass through
 * to the lower layer which would return a vnode representing
 * the UFS "sys".  Null_bypass then builds a null-node
 * aliasing the UFS "sys" and returns this to the caller.
 * Later operations on the null-node "sys" will repeat this
 * process when constructing other vnode stacks.
 *
 *
 * CREATING OTHER FILE SYSTEM LAYERS
 *
 * One of the easiest ways to construct new filesystem layers is to make
 * a copy of the null layer, rename all files and variables, and
 * then begin modifying the copy.  Sed can be used to easily rename
 * all variables.
 *
 * The umap layer is an example of a layer descended from the
 * null layer.
 *
 *
 * INVOKING OPERATIONS ON LOWER LAYERS
 *
 * There are two techniques to invoke operations on a lower layer
 * when the operation cannot be completely bypassed.  Each method
 * is appropriate in different situations.  In both cases,
 * it is the responsibility of the aliasing layer to make
 * the operation arguments "correct" for the lower layer
 * by mapping any vnode arguments to the lower layer.
 *
 * The first approach is to call the aliasing layer's bypass routine.
 * This method is most suitable when you wish to invoke the operation
 * currently being handled on the lower layer.  It has the advantage
 * that the bypass routine already must do argument mapping.
 * An example of this is null_getattr in the null layer.
 *
 * A second approach is to directly invoke vnode operations on
 * the lower layer with the VOP_OPERATIONNAME interface.
 * The advantage of this method is that it is easy to invoke
 * arbitrary operations on the lower layer.  The disadvantage
 * is that vnode arguments must be manually mapped.
 */

#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include

static int null_bug_bypass = 0;   /* for debugging: enables bypass printf'ing */
SYSCTL_INT(_debug, OID_AUTO, nullfs_bug_bypass, CTLFLAG_RW,
	&null_bug_bypass, 0, "");

/*
 * This is the 10-Apr-92 bypass routine.
 * This version has been optimized for speed, throwing away some
 * safety checks.  It should still always work, but it's not as
 * robust to programmer errors.
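 *
 * (A standalone illustration of the pointer-substitution trick used
 * here follows under #if 0; it is a sketch, not kernel code, and
 * struct demo_args and the addresses in it are invented for the
 * example.)
 */
#if 0
#include <stddef.h>
#include <stdio.h>

struct demo_args {
	int	a_flags;
	void	*a_vp;		/* the "vnode" slot to remap */
};

int
main(void)
{
	struct demo_args ap = { 0, (void *)0x1000 };
	size_t off = offsetof(struct demo_args, a_vp);
	void **vpp, *saved;

	/* VOPARG_OFFSETTO()-style: turn a byte offset into a pointer. */
	vpp = (void **)((char *)&ap + off);

	saved = *vpp;		/* remember the upper-layer vnode */
	*vpp = (void *)0x2000;	/* substitute the lower-layer vnode */
	/* ... invoke the lower-layer operation on 'ap' here ... */
	*vpp = saved;		/* restore: preserve call-by-value */

	printf("restored %p\n", *vpp);
	return (0);
}
#endif
/*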
* * In general, we map all vnodes going down and unmap them on the way back. * As an exception to this, vnodes can be marked "unmapped" by setting * the Nth bit in operation's vdesc_flags. * * Also, some BSD vnode operations have the side effect of vrele'ing * their arguments. With stacking, the reference counts are held * by the upper node, not the lower one, so we must handle these * side-effects here. This is not of concern in Sun-derived systems * since there are no such side-effects. * * This makes the following assumptions: * - only one returned vpp * - no INOUT vpp's (Sun's vop_open has one of these) * - the vnode operation vector of the first vnode should be used * to determine what implementation of the op should be invoked * - all mapped vnodes are of our vnode-type (NEEDSWORK: * problems on rmdir'ing mount points and renaming?) */ int null_bypass(struct vop_generic_args *ap) { struct vnode **this_vp_p; int error; struct vnode *old_vps[VDESC_MAX_VPS]; struct vnode **vps_p[VDESC_MAX_VPS]; struct vnode ***vppp; struct vnodeop_desc *descp = ap->a_desc; int reles, i; if (null_bug_bypass) printf ("null_bypass: %s\n", descp->vdesc_name); #ifdef DIAGNOSTIC /* * We require at least one vp. */ if (descp->vdesc_vp_offsets == NULL || descp->vdesc_vp_offsets[0] == VDESC_NO_OFFSET) panic ("null_bypass: no vp's in map"); #endif /* * Map the vnodes going in. * Later, we'll invoke the operation based on * the first mapped vnode's operation vector. */ reles = descp->vdesc_flags; for (i = 0; i < VDESC_MAX_VPS; reles >>= 1, i++) { if (descp->vdesc_vp_offsets[i] == VDESC_NO_OFFSET) break; /* bail out at end of list */ vps_p[i] = this_vp_p = VOPARG_OFFSETTO(struct vnode**,descp->vdesc_vp_offsets[i],ap); /* * We're not guaranteed that any but the first vnode * are of our type. Check for and don't map any * that aren't. (We must always map first vp or vclean fails.) */ if (i && (*this_vp_p == NULLVP || (*this_vp_p)->v_op != &null_vnodeops)) { old_vps[i] = NULLVP; } else { old_vps[i] = *this_vp_p; *(vps_p[i]) = NULLVPTOLOWERVP(*this_vp_p); /* * XXX - Several operations have the side effect * of vrele'ing their vp's. We must account for * that. (This should go away in the future.) */ if (reles & VDESC_VP0_WILLRELE) VREF(*this_vp_p); } } /* * Call the operation on the lower layer * with the modified argument structure. */ if (vps_p[0] && *vps_p[0]) error = VCALL(ap); else { printf("null_bypass: no map for %s\n", descp->vdesc_name); error = EINVAL; } /* * Maintain the illusion of call-by-value * by restoring vnodes in the argument structure * to their original value. */ reles = descp->vdesc_flags; for (i = 0; i < VDESC_MAX_VPS; reles >>= 1, i++) { if (descp->vdesc_vp_offsets[i] == VDESC_NO_OFFSET) break; /* bail out at end of list */ if (old_vps[i]) { *(vps_p[i]) = old_vps[i]; #if 0 if (reles & VDESC_VP0_WILLUNLOCK) VOP_UNLOCK(*(vps_p[i]), 0); #endif if (reles & VDESC_VP0_WILLRELE) vrele(*(vps_p[i])); } } /* * Map the possible out-going vpp * (Assumes that the lower layer always returns * a VREF'ed vpp unless it gets an error.) */ if (descp->vdesc_vpp_offset != VDESC_NO_OFFSET && !(descp->vdesc_flags & VDESC_NOMAP_VPP) && !error) { /* * XXX - even though some ops have vpp returned vp's, * several ops actually vrele this before returning. * We must avoid these ops. * (This should go away when these ops are regularized.) 
 */
		if (descp->vdesc_flags & VDESC_VPP_WILLRELE)
			goto out;
		vppp = VOPARG_OFFSETTO(struct vnode***,
				 descp->vdesc_vpp_offset,ap);
		if (*vppp)
			error = null_nodeget(old_vps[0]->v_mount, **vppp,
			    *vppp);
	}

out:
	return (error);
}

static int
null_add_writecount(struct vop_add_writecount_args *ap)
{
	struct vnode *lvp, *vp;
	int error;

	vp = ap->a_vp;
	lvp = NULLVPTOLOWERVP(vp);
	KASSERT(vp->v_writecount + ap->a_inc >= 0, ("wrong writecount inc"));
	if (vp->v_writecount > 0 && vp->v_writecount + ap->a_inc == 0)
		error = VOP_ADD_WRITECOUNT(lvp, -1);
	else if (vp->v_writecount == 0 && vp->v_writecount + ap->a_inc > 0)
		error = VOP_ADD_WRITECOUNT(lvp, 1);
	else
		error = 0;
	if (error == 0)
		vp->v_writecount += ap->a_inc;
	return (error);
}

/*
 * We have to carry on the locking protocol on the null layer vnodes
 * as we progress through the tree.  We also have to enforce read-only
 * if this layer is mounted read-only.
 */
static int
null_lookup(struct vop_lookup_args *ap)
{
	struct componentname *cnp = ap->a_cnp;
	struct vnode *dvp = ap->a_dvp;
	int flags = cnp->cn_flags;
	struct vnode *vp, *ldvp, *lvp;
	struct mount *mp;
	int error;

	mp = dvp->v_mount;
	if ((flags & ISLASTCN) != 0 && (mp->mnt_flag & MNT_RDONLY) != 0 &&
	    (cnp->cn_nameiop == DELETE || cnp->cn_nameiop == RENAME))
		return (EROFS);
	/*
	 * Although it is possible to call null_bypass(), we'll do
	 * a direct call to reduce overhead.
	 */
	ldvp = NULLVPTOLOWERVP(dvp);
	vp = lvp = NULL;
	KASSERT((ldvp->v_vflag & VV_ROOT) == 0 ||
	    ((dvp->v_vflag & VV_ROOT) != 0 && (flags & ISDOTDOT) == 0),
	    ("ldvp %p fl %#x dvp %p fl %#x flags %#x", ldvp, ldvp->v_vflag,
	    dvp, dvp->v_vflag, flags));

	/*
	 * Hold ldvp.  The reference on it, owned by dvp, is lost in
	 * case of dvp reclamation, and we need ldvp to move our lock
	 * from ldvp to dvp.
	 */
	vhold(ldvp);
	error = VOP_LOOKUP(ldvp, &lvp, cnp);
	/*
	 * VOP_LOOKUP() on the lower vnode may unlock ldvp, which allows
	 * dvp to be reclaimed due to the shared v_vnlock.  Check for the
	 * doomed state and return error.
	 */
	if ((error == 0 || error == EJUSTRETURN) &&
	    (dvp->v_iflag & VI_DOOMED) != 0) {
		error = ENOENT;
		if (lvp != NULL)
			vput(lvp);

		/*
		 * If vgone() did reclaim dvp before curthread
		 * relocked ldvp, the locks of dvp and ldvp are no
		 * longer shared.  In this case, the relock of ldvp in
		 * the lower fs VOP_LOOKUP() does not restore the locking
		 * state of dvp.  Compensate for this by unlocking
		 * ldvp and locking dvp, which is also correct if the
		 * locks are still shared.
		 */
		VOP_UNLOCK(ldvp, 0);
		vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);
	}
	vdrop(ldvp);

	if (error == EJUSTRETURN && (flags & ISLASTCN) != 0 &&
	    (mp->mnt_flag & MNT_RDONLY) != 0 &&
	    (cnp->cn_nameiop == CREATE || cnp->cn_nameiop == RENAME))
		error = EROFS;

	if ((error == 0 || error == EJUSTRETURN) && lvp != NULL) {
		if (ldvp == lvp) {
			*ap->a_vpp = dvp;
			VREF(dvp);
			vrele(lvp);
		} else {
			error = null_nodeget(mp, lvp, &vp);
			if (error == 0)
				*ap->a_vpp = vp;
		}
	}
	return (error);
}

static int
null_open(struct vop_open_args *ap)
{
	int retval;
	struct vnode *vp, *ldvp;

	vp = ap->a_vp;
	ldvp = NULLVPTOLOWERVP(vp);
	retval = null_bypass(&ap->a_gen);
	if (retval == 0)
		vp->v_object = ldvp->v_object;
	return (retval);
}

/*
 * Setattr call.  Disallow write attempts if the layer is mounted read-only.
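 * A field left at the VNOVAL sentinel means "no change requested";
 * only fields that differ from VNOVAL can trigger EROFS here.
 */
#if 0
/*
 * Standalone sketch of the VNOVAL test below, never built.  'struct
 * vattr' here is a reduced stand-in, not the kernel's definition.
 */
#include <stdio.h>

#define	VNOVAL	(-1)

struct vattr {
	long	va_flags, va_uid, va_gid, va_mode;
	long	va_atime_sec, va_mtime_sec;
};

static int
requests_change(const struct vattr *vap)
{
	return (vap->va_flags != VNOVAL || vap->va_uid != VNOVAL ||
	    vap->va_gid != VNOVAL || vap->va_atime_sec != VNOVAL ||
	    vap->va_mtime_sec != VNOVAL || vap->va_mode != VNOVAL);
}

int
main(void)
{
	struct vattr va = { VNOVAL, VNOVAL, VNOVAL, VNOVAL,
	    VNOVAL, VNOVAL };

	printf("%d\n", requests_change(&va));	/* 0: nothing to do */
	va.va_mode = 0644;
	printf("%d\n", requests_change(&va));	/* 1: EROFS if read-only */
	return (0);
}
#endif
/*
 * The setattr handler: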
*/ static int null_setattr(struct vop_setattr_args *ap) { struct vnode *vp = ap->a_vp; struct vattr *vap = ap->a_vap; if ((vap->va_flags != VNOVAL || vap->va_uid != (uid_t)VNOVAL || vap->va_gid != (gid_t)VNOVAL || vap->va_atime.tv_sec != VNOVAL || vap->va_mtime.tv_sec != VNOVAL || vap->va_mode != (mode_t)VNOVAL) && (vp->v_mount->mnt_flag & MNT_RDONLY)) return (EROFS); if (vap->va_size != VNOVAL) { switch (vp->v_type) { case VDIR: return (EISDIR); case VCHR: case VBLK: case VSOCK: case VFIFO: if (vap->va_flags != VNOVAL) return (EOPNOTSUPP); return (0); case VREG: case VLNK: default: /* * Disallow write attempts if the filesystem is * mounted read-only. */ if (vp->v_mount->mnt_flag & MNT_RDONLY) return (EROFS); } } return (null_bypass((struct vop_generic_args *)ap)); } /* * We handle getattr only to change the fsid. */ static int null_getattr(struct vop_getattr_args *ap) { int error; if ((error = null_bypass((struct vop_generic_args *)ap)) != 0) return (error); ap->a_vap->va_fsid = ap->a_vp->v_mount->mnt_stat.f_fsid.val[0]; return (0); } /* * Handle to disallow write access if mounted read-only. */ static int null_access(struct vop_access_args *ap) { struct vnode *vp = ap->a_vp; accmode_t accmode = ap->a_accmode; /* * Disallow write attempts on read-only layers; * unless the file is a socket, fifo, or a block or * character device resident on the filesystem. */ if (accmode & VWRITE) { switch (vp->v_type) { case VDIR: case VLNK: case VREG: if (vp->v_mount->mnt_flag & MNT_RDONLY) return (EROFS); break; default: break; } } return (null_bypass((struct vop_generic_args *)ap)); } static int null_accessx(struct vop_accessx_args *ap) { struct vnode *vp = ap->a_vp; accmode_t accmode = ap->a_accmode; /* * Disallow write attempts on read-only layers; * unless the file is a socket, fifo, or a block or * character device resident on the filesystem. */ if (accmode & VWRITE) { switch (vp->v_type) { case VDIR: case VLNK: case VREG: if (vp->v_mount->mnt_flag & MNT_RDONLY) return (EROFS); break; default: break; } } return (null_bypass((struct vop_generic_args *)ap)); } /* * Increasing refcount of lower vnode is needed at least for the case * when lower FS is NFS to do sillyrename if the file is in use. * Unfortunately v_usecount is incremented in many places in * the kernel and, as such, there may be races that result in * the NFS client doing an extraneous silly rename, but that seems * preferable to not doing a silly rename when it is needed. */ static int null_remove(struct vop_remove_args *ap) { int retval, vreleit; struct vnode *lvp, *vp; vp = ap->a_vp; if (vrefcnt(vp) > 1) { lvp = NULLVPTOLOWERVP(vp); VREF(lvp); vreleit = 1; } else vreleit = 0; VTONULL(vp)->null_flags |= NULLV_DROP; retval = null_bypass(&ap->a_gen); if (vreleit != 0) vrele(lvp); return (retval); } /* * We handle this to eliminate null FS to lower FS * file moving. Don't know why we don't allow this, * possibly we should. */ static int null_rename(struct vop_rename_args *ap) { struct vnode *tdvp = ap->a_tdvp; struct vnode *fvp = ap->a_fvp; struct vnode *fdvp = ap->a_fdvp; struct vnode *tvp = ap->a_tvp; struct null_node *tnn; /* Check for cross-device rename. 
 */
	if ((fvp->v_mount != tdvp->v_mount) ||
	    (tvp && (fvp->v_mount != tvp->v_mount))) {
		if (tdvp == tvp)
			vrele(tdvp);
		else
			vput(tdvp);
		if (tvp)
			vput(tvp);
		vrele(fdvp);
		vrele(fvp);
		return (EXDEV);
	}

	if (tvp != NULL) {
		tnn = VTONULL(tvp);
		tnn->null_flags |= NULLV_DROP;
	}
	return (null_bypass((struct vop_generic_args *)ap));
}

static int
null_rmdir(struct vop_rmdir_args *ap)
{

	VTONULL(ap->a_vp)->null_flags |= NULLV_DROP;
	return (null_bypass(&ap->a_gen));
}

/*
 * We need to process our own vnode lock and then clear the
 * interlock flag as it applies only to our vnode, not the
 * vnodes below us on the stack.
 */
static int
null_lock(struct vop_lock1_args *ap)
{
	struct vnode *vp = ap->a_vp;
	int flags = ap->a_flags;
	struct null_node *nn;
	struct vnode *lvp;
	int error;

	if ((flags & LK_INTERLOCK) == 0) {
		VI_LOCK(vp);
		ap->a_flags = flags |= LK_INTERLOCK;
	}
	nn = VTONULL(vp);
	/*
	 * If we're still active we must ask the lower layer to
	 * lock as ffs has special lock considerations in its
	 * vop lock.
	 */
	if (nn != NULL && (lvp = NULLVPTOLOWERVP(vp)) != NULL) {
		VI_LOCK_FLAGS(lvp, MTX_DUPOK);
		VI_UNLOCK(vp);
		/*
		 * We have to hold the vnode here to solve a potential
		 * reclaim race.  If we're forcibly vgone'd while we
		 * still have refs, a thread could be sleeping inside
		 * the lowervp's vop_lock routine.  When we vgone we will
		 * drop our last ref to the lowervp, which would allow it
		 * to be reclaimed.  The lowervp could then be recycled,
		 * in which case it is not legal to be sleeping in its VOP.
		 * We prevent it from being recycled by holding the vnode
		 * here.
		 */
		vholdl(lvp);
		error = VOP_LOCK(lvp, flags);

		/*
		 * We might have slept to get the lock and someone might
		 * have cleaned our vnode already, switching the vnode
		 * lock from the one in lowervp to the v_lock in our own
		 * vnode structure.  Handle this case by reacquiring the
		 * correct lock in the requested mode.
		 */
		if (VTONULL(vp) == NULL && error == 0) {
			ap->a_flags &= ~(LK_TYPE_MASK | LK_INTERLOCK);
			switch (flags & LK_TYPE_MASK) {
			case LK_SHARED:
				ap->a_flags |= LK_SHARED;
				break;
			case LK_UPGRADE:
			case LK_EXCLUSIVE:
				ap->a_flags |= LK_EXCLUSIVE;
				break;
			default:
				panic("Unsupported lock request %d\n",
				    ap->a_flags);
			}
			VOP_UNLOCK(lvp, 0);
			error = vop_stdlock(ap);
		}
		vdrop(lvp);
	} else
		error = vop_stdlock(ap);

	return (error);
}

/*
 * We need to process our own vnode unlock and then clear the
 * interlock flag as it applies only to our vnode, not the
 * vnodes below us on the stack.
 */
static int
null_unlock(struct vop_unlock_args *ap)
{
	struct vnode *vp = ap->a_vp;
	int flags = ap->a_flags;
	int mtxlkflag = 0;
	struct null_node *nn;
	struct vnode *lvp;
	int error;

	if ((flags & LK_INTERLOCK) != 0)
		mtxlkflag = 1;
	else if (mtx_owned(VI_MTX(vp)) == 0) {
		VI_LOCK(vp);
		mtxlkflag = 2;
	}
	nn = VTONULL(vp);
	if (nn != NULL && (lvp = NULLVPTOLOWERVP(vp)) != NULL) {
		VI_LOCK_FLAGS(lvp, MTX_DUPOK);
		flags |= LK_INTERLOCK;
		vholdl(lvp);
		VI_UNLOCK(vp);
		error = VOP_UNLOCK(lvp, flags);
		vdrop(lvp);
		if (mtxlkflag == 0)
			VI_LOCK(vp);
	} else {
		if (mtxlkflag == 2)
			VI_UNLOCK(vp);
		error = vop_stdunlock(ap);
	}

	return (error);
}

/*
 * Do not allow the VOP_INACTIVE to be passed to the lower layer,
 * since the reference count on the lower vnode is not related to
 * ours.
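 * Instead, null_inactive() below decides locally whether to recycle
 * the nullfs vnode: it does so when nullfs caching is disabled
 * (NULLM_CACHE clear), when the node was marked NULLV_DROP (e.g. by
 * the unlink/rmdir paths above), or when the lower vnode is marked
 * VV_NOSYNC.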
*/ static int null_inactive(struct vop_inactive_args *ap __unused) { struct vnode *vp, *lvp; struct null_node *xp; struct mount *mp; struct null_mount *xmp; vp = ap->a_vp; xp = VTONULL(vp); lvp = NULLVPTOLOWERVP(vp); mp = vp->v_mount; xmp = MOUNTTONULLMOUNT(mp); if ((xmp->nullm_flags & NULLM_CACHE) == 0 || (xp->null_flags & NULLV_DROP) != 0 || (lvp->v_vflag & VV_NOSYNC) != 0) { /* * If this is the last reference and caching of the * nullfs vnodes is not enabled, or the lower vnode is * deleted, then free up the vnode so as not to tie up * the lower vnodes. */ vp->v_object = NULL; vrecycle(vp); } return (0); } /* * Now, the nullfs vnode and, due to the sharing lock, the lower * vnode, are exclusively locked, and we shall destroy the null vnode. */ static int null_reclaim(struct vop_reclaim_args *ap) { struct vnode *vp; struct null_node *xp; struct vnode *lowervp; vp = ap->a_vp; xp = VTONULL(vp); lowervp = xp->null_lowervp; KASSERT(lowervp != NULL && vp->v_vnlock != &vp->v_lock, ("Reclaiming incomplete null vnode %p", vp)); null_hashrem(xp); /* * Use the interlock to protect the clearing of v_data to * prevent faults in null_lock(). */ lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL); VI_LOCK(vp); vp->v_data = NULL; vp->v_object = NULL; vp->v_vnlock = &vp->v_lock; VI_UNLOCK(vp); /* * If we were opened for write, we leased one write reference * to the lower vnode. If this is a reclamation due to the * forced unmount, undo the reference now. */ if (vp->v_writecount > 0) VOP_ADD_WRITECOUNT(lowervp, -1); if ((xp->null_flags & NULLV_NOUNLOCK) != 0) vunref(lowervp); else vput(lowervp); free(xp, M_NULLFSNODE); return (0); } static int null_print(struct vop_print_args *ap) { struct vnode *vp = ap->a_vp; printf("\tvp=%p, lowervp=%p\n", vp, VTONULL(vp)->null_lowervp); return (0); } /* ARGSUSED */ static int null_getwritemount(struct vop_getwritemount_args *ap) { struct null_node *xp; struct vnode *lowervp; struct vnode *vp; vp = ap->a_vp; VI_LOCK(vp); xp = VTONULL(vp); if (xp && (lowervp = xp->null_lowervp)) { VI_LOCK_FLAGS(lowervp, MTX_DUPOK); VI_UNLOCK(vp); vholdl(lowervp); VI_UNLOCK(lowervp); VOP_GETWRITEMOUNT(lowervp, ap->a_mpp); vdrop(lowervp); } else { VI_UNLOCK(vp); *(ap->a_mpp) = NULL; } return (0); } static int null_vptofh(struct vop_vptofh_args *ap) { struct vnode *lvp; lvp = NULLVPTOLOWERVP(ap->a_vp); return VOP_VPTOFH(lvp, ap->a_fhp); } static int null_vptocnp(struct vop_vptocnp_args *ap) { struct vnode *vp = ap->a_vp; struct vnode **dvp = ap->a_vpp; struct vnode *lvp, *ldvp; struct ucred *cred = ap->a_cred; int error, locked; locked = VOP_ISLOCKED(vp); lvp = NULLVPTOLOWERVP(vp); vhold(lvp); VOP_UNLOCK(vp, 0); /* vp is held by vn_vptocnp_locked that called us */ ldvp = lvp; vref(lvp); error = vn_vptocnp(&ldvp, cred, ap->a_buf, ap->a_buflen); vdrop(lvp); if (error != 0) { vn_lock(vp, locked | LK_RETRY); return (ENOENT); } /* * Exclusive lock is required by insmntque1 call in * null_nodeget() */ error = vn_lock(ldvp, LK_EXCLUSIVE); if (error != 0) { vrele(ldvp); vn_lock(vp, locked | LK_RETRY); return (ENOENT); } - vref(ldvp); error = null_nodeget(vp->v_mount, ldvp, dvp); if (error == 0) { #ifdef DIAGNOSTIC NULLVPTOLOWERVP(*dvp); #endif VOP_UNLOCK(*dvp, 0); /* keep reference on *dvp */ } vn_lock(vp, locked | LK_RETRY); return (error); } /* * Global vfs data structures */ struct vop_vector null_vnodeops = { .vop_bypass = null_bypass, .vop_access = null_access, .vop_accessx = null_accessx, .vop_advlockpurge = vop_stdadvlockpurge, .vop_bmap = VOP_EOPNOTSUPP, .vop_getattr = null_getattr, 
.vop_getwritemount = null_getwritemount, .vop_inactive = null_inactive, .vop_islocked = vop_stdislocked, .vop_lock1 = null_lock, .vop_lookup = null_lookup, .vop_open = null_open, .vop_print = null_print, .vop_reclaim = null_reclaim, .vop_remove = null_remove, .vop_rename = null_rename, .vop_rmdir = null_rmdir, .vop_setattr = null_setattr, .vop_strategy = VOP_EOPNOTSUPP, .vop_unlock = null_unlock, .vop_vptocnp = null_vptocnp, .vop_vptofh = null_vptofh, .vop_add_writecount = null_add_writecount, }; Index: projects/clang390-import/sys/i386/i386/pmap.c =================================================================== --- projects/clang390-import/sys/i386/i386/pmap.c (revision 305686) +++ projects/clang390-import/sys/i386/i386/pmap.c (revision 305687) @@ -1,5622 +1,5616 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * Copyright (c) 1994 John S. Dyson * All rights reserved. * Copyright (c) 1994 David Greenman * All rights reserved. * Copyright (c) 2005-2010 Alan L. Cox * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduced protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct. */ #include "opt_apic.h" #include "opt_cpu.h" #include "opt_pmap.h" #include "opt_smp.h" #include "opt_xbox.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DEV_APIC #include #include #include #endif #include #include #include #include #include #ifdef SMP #include #endif #ifdef XBOX #include #endif #if !defined(CPU_DISABLE_SSE) && defined(I686_CPU) #define CPU_ENABLE_SSE #endif #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif #if !defined(DIAGNOSTIC) #ifdef __GNUC_GNU_INLINE__ #define PMAP_INLINE __attribute__((__gnu_inline__)) inline #else #define PMAP_INLINE extern inline #endif #else #define PMAP_INLINE #endif #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif #define pa_index(pa) ((pa) >> PDRSHIFT) #define pa_to_pvh(pa) (&pv_table[pa_index(pa)]) /* * Get PDEs and PTEs for user/kernel address space */ #define pmap_pde(m, v) (&((m)->pm_pdir[(vm_offset_t)(v) >> PDRSHIFT])) #define pdir_pde(m, v) (m[(vm_offset_t)(v) >> PDRSHIFT]) #define pmap_pde_v(pte) ((*(int *)pte & PG_V) != 0) #define pmap_pte_w(pte) ((*(int *)pte & PG_W) != 0) #define pmap_pte_m(pte) ((*(int *)pte & PG_M) != 0) #define pmap_pte_u(pte) ((*(int *)pte & PG_A) != 0) #define pmap_pte_v(pte) ((*(int *)pte & PG_V) != 0) #define pmap_pte_set_w(pte, v) ((v) ? 
atomic_set_int((u_int *)(pte), PG_W) : \ atomic_clear_int((u_int *)(pte), PG_W)) #define pmap_pte_set_prot(pte, v) ((*(int *)pte &= ~PG_PROT), (*(int *)pte |= (v))) struct pmap kernel_pmap_store; LIST_HEAD(pmaplist, pmap); static struct pmaplist allpmaps; static struct mtx allpmaps_lock; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ int pgeflag = 0; /* PG_G or-in */ int pseflag = 0; /* PG_PS or-in */ static int nkpt = NKPT; vm_offset_t kernel_vm_end = KERNBASE + NKPT * NBPDR; extern u_int32_t KERNend; extern u_int32_t KPTphys; #if defined(PAE) || defined(PAE_TABLES) pt_entry_t pg_nx; static uma_zone_t pdptzone; #endif static SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); static int pat_works = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, pat_works, CTLFLAG_RD, &pat_works, 1, "Is page attribute table fully functional?"); static int pg_ps_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, pg_ps_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &pg_ps_enabled, 0, "Are large page mappings enabled?"); #define PAT_INDEX_SIZE 8 static int pat_index[PAT_INDEX_SIZE]; /* cache mode to PAT index conversion */ /* * pmap_mapdev support pre initialization (i.e. console) */ #define PMAP_PREINIT_MAPPING_COUNT 8 static struct pmap_preinit_mapping { vm_paddr_t pa; vm_offset_t va; vm_size_t sz; int mode; } pmap_preinit_mapping[PMAP_PREINIT_MAPPING_COUNT]; static int pmap_initialized; static struct rwlock_padalign pvh_global_lock; /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0; static struct md_page *pv_table; static int shpgperproc = PMAP_SHPGPERPROC; struct pv_chunk *pv_chunkbase; /* KVA block for pv_chunks */ int pv_maxchunks; /* How many chunks we have KVA for */ vm_offset_t pv_vafree; /* freelist stored in the PTE */ /* * All those kernel PT submaps that BSD is so fond of */ struct sysmaps { struct mtx lock; pt_entry_t *CMAP1; pt_entry_t *CMAP2; caddr_t CADDR1; caddr_t CADDR2; }; static struct sysmaps sysmaps_pcpu[MAXCPU]; pt_entry_t *CMAP3; static pd_entry_t *KPTD; caddr_t ptvmmap = 0; caddr_t CADDR3; struct msgbuf *msgbufp = NULL; /* * Crashdump maps. 
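 * These are MAXDUMPPGS pages of KVA reserved at boot so that a crash
 * dump can map arbitrary physical pages without allocating anything.
 */
#if 0
/*
 * Illustrative only, never built: the SYSMAP()-style bootstrap
 * allocation used in pmap_bootstrap() is just a cursor marching
 * through virtual address space.  The start address and the page
 * counts here are invented for the example.
 */
#include <stdio.h>

#define	PAGE_SIZE	4096UL

static unsigned long va_cursor = 0xc1000000UL;	/* made-up start VA */

static unsigned long
carve(int npages)
{
	unsigned long va = va_cursor;

	va_cursor += npages * PAGE_SIZE;
	return (va);
}

int
main(void)
{
	printf("CADDR3 at %#lx\n", carve(1));
	printf("crashdumpmap at %#lx\n", carve(32));	/* pretend
							   MAXDUMPPGS == 32 */
	return (0);
}
#endif
/*
 * Crashdump map KVA: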
*/ static caddr_t crashdumpmap; static pt_entry_t *PMAP1 = NULL, *PMAP2; static pt_entry_t *PADDR1 = NULL, *PADDR2; #ifdef SMP static int PMAP1cpu; static int PMAP1changedcpu; SYSCTL_INT(_debug, OID_AUTO, PMAP1changedcpu, CTLFLAG_RD, &PMAP1changedcpu, 0, "Number of times pmap_pte_quick changed CPU with same PMAP1"); #endif static int PMAP1changed; SYSCTL_INT(_debug, OID_AUTO, PMAP1changed, CTLFLAG_RD, &PMAP1changed, 0, "Number of times pmap_pte_quick changed PMAP1"); static int PMAP1unchanged; SYSCTL_INT(_debug, OID_AUTO, PMAP1unchanged, CTLFLAG_RD, &PMAP1unchanged, 0, "Number of times pmap_pte_quick didn't change PMAP1"); static struct mtx PMAP2mutex; static void free_pv_chunk(struct pv_chunk *pc); static void free_pv_entry(pmap_t pmap, pv_entry_t pv); static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try); static void pmap_pv_demote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa); static boolean_t pmap_pv_insert_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa); static void pmap_pv_promote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa); static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static int pmap_pvh_wired_mappings(struct md_page *pvh, int count); static boolean_t pmap_demote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va); static boolean_t pmap_enter_pde(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot); static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte); static void pmap_flush_page(vm_page_t m); static int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte); static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte); static boolean_t pmap_is_modified_pvh(struct md_page *pvh); static boolean_t pmap_is_referenced_pvh(struct md_page *pvh); static void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int mode); static void pmap_kenter_pde(vm_offset_t va, pd_entry_t newpde); static vm_page_t pmap_lookup_pt_page(pmap_t pmap, vm_offset_t va); static void pmap_pde_attr(pd_entry_t *pde, int cache_bits); static void pmap_promote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va); static boolean_t pmap_protect_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t sva, vm_prot_t prot); static void pmap_pte_attr(pt_entry_t *pte, int cache_bits); static void pmap_remove_pde(pmap_t pmap, pd_entry_t *pdq, vm_offset_t sva, struct spglist *free); static int pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t sva, struct spglist *free); static void pmap_remove_pt_page(pmap_t pmap, vm_page_t mpte); static void pmap_remove_page(struct pmap *pmap, vm_offset_t va, struct spglist *free); static void pmap_remove_entry(struct pmap *pmap, vm_page_t m, vm_offset_t va); static void pmap_insert_entry(pmap_t pmap, vm_offset_t va, vm_page_t m); static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m); static void pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde); static void pmap_update_pde_invalidate(vm_offset_t va, pd_entry_t newpde); static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va, u_int flags); static vm_page_t _pmap_allocpte(pmap_t pmap, u_int ptepindex, u_int flags); static void _pmap_unwire_ptp(pmap_t pmap, vm_page_t m, struct spglist *free); static pt_entry_t *pmap_pte_quick(pmap_t pmap, vm_offset_t va); static void pmap_pte_release(pt_entry_t *pte); static int pmap_unuse_pt(pmap_t, vm_offset_t, struct spglist *); #if defined(PAE) || 
defined(PAE_TABLES) static void *pmap_pdpt_allocf(uma_zone_t zone, vm_size_t bytes, uint8_t *flags, int wait); #endif static void pmap_set_pg(void); static __inline void pagezero(void *page); CTASSERT(1 << PDESHIFT == sizeof(pd_entry_t)); CTASSERT(1 << PTESHIFT == sizeof(pt_entry_t)); /* * If you get an error here, then you set KVA_PAGES wrong! See the * description of KVA_PAGES in sys/i386/include/pmap.h. It must be * multiple of 4 for a normal kernel, or a multiple of 8 for a PAE. */ CTASSERT(KERNBASE % (1 << 24) == 0); /* * Bootstrap the system enough to run with virtual memory. * * On the i386 this is called after mapping has already been enabled * and just syncs the pmap module with what has already been done. * [We can't call it easily with mapping off since the kernel is not * mapped with PA == VA, hence we would have to relocate every address * from the linked base (virtual) address "KERNBASE" to the actual * (physical) address starting relative to 0] */ void pmap_bootstrap(vm_paddr_t firstaddr) { vm_offset_t va; pt_entry_t *pte, *unused; struct sysmaps *sysmaps; int i; /* * Add a physical memory segment (vm_phys_seg) corresponding to the * preallocated kernel page table pages so that vm_page structures * representing these pages will be created. The vm_page structures * are required for promotion of the corresponding kernel virtual * addresses to superpage mappings. */ vm_phys_add_seg(KPTphys, KPTphys + ptoa(nkpt)); /* * Initialize the first available kernel virtual address. However, * using "firstaddr" may waste a few pages of the kernel virtual * address space, because locore may not have mapped every physical * page that it allocated. Preferably, locore would provide a first * unused virtual address in addition to "firstaddr". */ virtual_avail = (vm_offset_t) KERNBASE + firstaddr; virtual_end = VM_MAX_KERNEL_ADDRESS; /* * Initialize the kernel pmap (which is statically allocated). */ PMAP_LOCK_INIT(kernel_pmap); kernel_pmap->pm_pdir = (pd_entry_t *) (KERNBASE + (u_int)IdlePTD); #if defined(PAE) || defined(PAE_TABLES) kernel_pmap->pm_pdpt = (pdpt_entry_t *) (KERNBASE + (u_int)IdlePDPT); #endif CPU_FILL(&kernel_pmap->pm_active); /* don't allow deactivation */ TAILQ_INIT(&kernel_pmap->pm_pvchunk); /* * Initialize the global pv list lock. */ rw_init(&pvh_global_lock, "pmap pv global"); LIST_INIT(&allpmaps); /* * Request a spin mutex so that changes to allpmaps cannot be * preempted by smp_rendezvous_cpus(). Otherwise, * pmap_update_pde_kernel() could access allpmaps while it is * being changed. */ mtx_init(&allpmaps_lock, "allpmaps", NULL, MTX_SPIN); mtx_lock_spin(&allpmaps_lock); LIST_INSERT_HEAD(&allpmaps, kernel_pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); /* * Reserve some special page table entries/VA space for temporary * mapping of pages. */ #define SYSMAP(c, p, v, n) \ v = (c)va; va += ((n)*PAGE_SIZE); p = pte; pte += (n); va = virtual_avail; pte = vtopte(va); /* * CMAP1/CMAP2 are used for zeroing and copying pages. * CMAP3 is used for the boot-time memory test. */ for (i = 0; i < MAXCPU; i++) { sysmaps = &sysmaps_pcpu[i]; mtx_init(&sysmaps->lock, "SYSMAPS", NULL, MTX_DEF); SYSMAP(caddr_t, sysmaps->CMAP1, sysmaps->CADDR1, 1) SYSMAP(caddr_t, sysmaps->CMAP2, sysmaps->CADDR2, 1) } SYSMAP(caddr_t, CMAP3, CADDR3, 1); /* * Crashdump maps. */ SYSMAP(caddr_t, unused, crashdumpmap, MAXDUMPPGS) /* * ptvmmap is used for reading arbitrary physical pages via /dev/mem. */ SYSMAP(caddr_t, unused, ptvmmap, 1) /* * msgbufp is used to map the system message buffer. 
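 * For example, a 96 KB message buffer reserves
 * atop(round_page(98304)) == 24 pages of KVA here (with 4 KB pages).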
 */
	SYSMAP(struct msgbuf *, unused, msgbufp, atop(round_page(msgbufsize)))

	/*
	 * KPTmap is used by pmap_kextract().
	 *
	 * KPTmap is first initialized by locore.  However, that initial
	 * KPTmap can only support NKPT page table pages.  Here, a larger
	 * KPTmap is created that can support KVA_PAGES page table pages.
	 */
	SYSMAP(pt_entry_t *, KPTD, KPTmap, KVA_PAGES)

	for (i = 0; i < NKPT; i++)
		KPTD[i] = (KPTphys + (i << PAGE_SHIFT)) | pgeflag | PG_RW | PG_V;

	/*
	 * Adjust the start of the KPTD and KPTmap so that the implementation
	 * of pmap_kextract() and pmap_growkernel() can be made simpler.
	 */
	KPTD -= KPTDI;
	KPTmap -= i386_btop(KPTDI << PDRSHIFT);

	/*
	 * PADDR1 and PADDR2 are used by pmap_pte_quick() and pmap_pte(),
	 * respectively.
	 */
	SYSMAP(pt_entry_t *, PMAP1, PADDR1, 1)
	SYSMAP(pt_entry_t *, PMAP2, PADDR2, 1)

	mtx_init(&PMAP2mutex, "PMAP2", NULL, MTX_DEF);

	virtual_avail = va;

	/*
	 * Leave in place an identity mapping (virt == phys) for the low 1 MB
	 * physical memory region that is used by the ACPI wakeup code.  This
	 * mapping must not have PG_G set.
	 */
#ifdef XBOX
	/* FIXME: This is gross, but needed for the XBOX.  Since we are at
	 * such an early stage, we cannot yet neatly map video memory ... :-(
	 * Better fixes are very welcome! */
	if (!arch_i386_is_xbox)
#endif
	for (i = 1; i < NKPT; i++)
		PTD[i] = 0;

	/* Initialize the PAT MSR if present. */
	pmap_init_pat();

	/* Turn on PG_G on kernel page(s) */
	pmap_set_pg();
}

static void
pmap_init_qpages(void)
{
	struct pcpu *pc;
	int i;

	CPU_FOREACH(i) {
		pc = pcpu_find(i);
		pc->pc_qmap_addr = kva_alloc(PAGE_SIZE);
		if (pc->pc_qmap_addr == 0)
			panic("pmap_init_qpages: unable to allocate KVA");
	}
}
SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, pmap_init_qpages, NULL);

/*
 * Set up the PAT MSR.
 */
void
pmap_init_pat(void)
{
	int pat_table[PAT_INDEX_SIZE];
	uint64_t pat_msr;
	u_long cr0, cr4;
	int i;

	/* Set default PAT index table. */
	for (i = 0; i < PAT_INDEX_SIZE; i++)
		pat_table[i] = -1;
	pat_table[PAT_WRITE_BACK] = 0;
	pat_table[PAT_WRITE_THROUGH] = 1;
	pat_table[PAT_UNCACHEABLE] = 3;
	pat_table[PAT_WRITE_COMBINING] = 3;
	pat_table[PAT_WRITE_PROTECTED] = 3;
	pat_table[PAT_UNCACHED] = 3;

	/* Bail if this CPU doesn't implement PAT. */
	if ((cpu_feature & CPUID_PAT) == 0) {
		for (i = 0; i < PAT_INDEX_SIZE; i++)
			pat_index[i] = pat_table[i];
		pat_works = 0;
		return;
	}

	/*
	 * Due to some Intel errata, we can only safely use the lower 4
	 * PAT entries.
	 *
	 *   Intel Pentium III Processor Specification Update
	 * Errata E.27 (Upper Four PAT Entries Not Usable With Mode B
	 * or Mode C Paging)
	 *
	 *   Intel Pentium IV Processor Specification Update
	 * Errata N46 (PAT Index MSB May Be Calculated Incorrectly)
	 */
	if (cpu_vendor_id == CPU_VENDOR_INTEL &&
	    !(CPUID_TO_FAMILY(cpu_id) == 6 && CPUID_TO_MODEL(cpu_id) >= 0xe))
		pat_works = 0;

	/* Initialize default PAT entries. */
	pat_msr = PAT_VALUE(0, PAT_WRITE_BACK) |
	    PAT_VALUE(1, PAT_WRITE_THROUGH) |
	    PAT_VALUE(2, PAT_UNCACHED) |
	    PAT_VALUE(3, PAT_UNCACHEABLE) |
	    PAT_VALUE(4, PAT_WRITE_BACK) |
	    PAT_VALUE(5, PAT_WRITE_THROUGH) |
	    PAT_VALUE(6, PAT_UNCACHED) |
	    PAT_VALUE(7, PAT_UNCACHEABLE);

	if (pat_works) {
		/*
		 * Leave the indices 0-3 at the default of WB, WT, UC-, and UC.
		 * Program 5 and 6 as WP and WC.
		 * Leave 4 and 7 as WB and UC.
		 */
		pat_msr &= ~(PAT_MASK(5) | PAT_MASK(6));
		pat_msr |= PAT_VALUE(5, PAT_WRITE_PROTECTED) |
		    PAT_VALUE(6, PAT_WRITE_COMBINING);
		pat_table[PAT_UNCACHED] = 2;
		pat_table[PAT_WRITE_PROTECTED] = 5;
		pat_table[PAT_WRITE_COMBINING] = 6;
	} else {
		/*
		 * Just replace PAT Index 2 with WC instead of UC-.
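		 * In a PTE, index 2 is the entry selected by PCD=1, PWT=0
		 * (cf. pmap_cache_bits()), so repurposing that entry makes
		 * WC reachable even when only the lower four PAT entries
		 * are usable.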
*/ pat_msr &= ~PAT_MASK(2); pat_msr |= PAT_VALUE(2, PAT_WRITE_COMBINING); pat_table[PAT_WRITE_COMBINING] = 2; } /* Disable PGE. */ cr4 = rcr4(); load_cr4(cr4 & ~CR4_PGE); /* Disable caches (CD = 1, NW = 0). */ cr0 = rcr0(); load_cr0((cr0 & ~CR0_NW) | CR0_CD); /* Flushes caches and TLBs. */ wbinvd(); invltlb(); /* Update PAT and index table. */ wrmsr(MSR_PAT, pat_msr); for (i = 0; i < PAT_INDEX_SIZE; i++) pat_index[i] = pat_table[i]; /* Flush caches and TLBs again. */ wbinvd(); invltlb(); /* Restore caches and PGE. */ load_cr0(cr0); load_cr4(cr4); } /* * Set PG_G on kernel pages. Only the BSP calls this when SMP is turned on. */ static void pmap_set_pg(void) { pt_entry_t *pte; vm_offset_t va, endva; if (pgeflag == 0) return; endva = KERNBASE + KERNend; if (pseflag) { va = KERNBASE + KERNLOAD; while (va < endva) { pdir_pde(PTD, va) |= pgeflag; invltlb(); /* Flush non-PG_G entries. */ va += NBPDR; } } else { va = (vm_offset_t)btext; while (va < endva) { pte = vtopte(va); if (*pte) *pte |= pgeflag; invltlb(); /* Flush non-PG_G entries. */ va += PAGE_SIZE; } } } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pat_mode = PAT_WRITE_BACK; } #if defined(PAE) || defined(PAE_TABLES) static void * pmap_pdpt_allocf(uma_zone_t zone, vm_size_t bytes, uint8_t *flags, int wait) { /* Inform UMA that this allocator uses kernel_map/object. */ *flags = UMA_SLAB_KERNEL; return ((void *)kmem_alloc_contig(kernel_arena, bytes, wait, 0x0ULL, 0xffffffffULL, 1, 0, VM_MEMATTR_DEFAULT)); } #endif /* * Abuse the pte nodes for unmapped kva to thread a kva freelist through. * Requirements: * - Must deal with pages in order to ensure that none of the PG_* bits * are ever set, PG_V in particular. * - Assumes we can write to ptes without pte_store() atomic ops, even * on PAE systems. This should be ok. * - Assumes nothing will ever test these addresses for 0 to indicate * no mapping instead of correctly checking PG_V. * - Assumes a vm_offset_t will fit in a pte (true for i386). * Because PG_V is never set, there can be no mappings to invalidate. */ static vm_offset_t pmap_ptelist_alloc(vm_offset_t *head) { pt_entry_t *pte; vm_offset_t va; va = *head; if (va == 0) panic("pmap_ptelist_alloc: exhausted ptelist KVA"); pte = vtopte(va); *head = *pte; if (*head & PG_V) panic("pmap_ptelist_alloc: va with PG_V set!"); *pte = 0; return (va); } static void pmap_ptelist_free(vm_offset_t *head, vm_offset_t va) { pt_entry_t *pte; if (va & PG_V) panic("pmap_ptelist_free: freeing va with PG_V set!"); pte = vtopte(va); *pte = *head; /* virtual! PG_V is 0 though */ *head = va; } static void pmap_ptelist_init(vm_offset_t *head, void *base, int npages) { int i; vm_offset_t va; *head = 0; for (i = npages - 1; i >= 0; i--) { va = (vm_offset_t)base + i * PAGE_SIZE; pmap_ptelist_free(head, va); } } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { struct pmap_preinit_mapping *ppim; vm_page_t mpte; vm_size_t s; int i, pv_npg; /* * Initialize the vm page array entries for the kernel pmap's * page table pages. */ for (i = 0; i < NKPT; i++) { mpte = PHYS_TO_VM_PAGE(KPTphys + (i << PAGE_SHIFT)); KASSERT(mpte >= vm_page_array && mpte < &vm_page_array[vm_page_array_size], ("pmap_init: page table page is out of range")); mpte->pindex = i + KPTDI; mpte->phys_addr = KPTphys + (i << PAGE_SHIFT); } /* * Initialize the address space (zone) for the pv entries. 
Set a * high water mark so that the system can recover from excessive * numbers of pv entries. */ TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_max = roundup(pv_entry_max, _NPCPV); pv_entry_high_water = 9 * (pv_entry_max / 10); /* * If the kernel is running on a virtual machine, then it must assume * that MCA is enabled by the hypervisor. Moreover, the kernel must * be prepared for the hypervisor changing the vendor and family that * are reported by CPUID. Consequently, the workaround for AMD Family * 10h Erratum 383 is enabled if the processor's feature set does not * include at least one feature that is only supported by older Intel * or newer AMD processors. */ if (vm_guest != VM_GUEST_NO && (cpu_feature & CPUID_SS) == 0 && (cpu_feature2 & (CPUID2_SSSE3 | CPUID2_SSE41 | CPUID2_AESNI | CPUID2_AVX | CPUID2_XSAVE)) == 0 && (amd_feature2 & (AMDID2_XOP | AMDID2_FMA4)) == 0) workaround_erratum383 = 1; /* * Are large page mappings supported and enabled? */ TUNABLE_INT_FETCH("vm.pmap.pg_ps_enabled", &pg_ps_enabled); if (pseflag == 0) pg_ps_enabled = 0; else if (pg_ps_enabled) { KASSERT(MAXPAGESIZES > 1 && pagesizes[1] == 0, ("pmap_init: can't assign to pagesizes[1]")); pagesizes[1] = NBPDR; } /* * Calculate the size of the pv head table for superpages. * Handle the possibility that "vm_phys_segs[...].end" is zero. */ pv_npg = trunc_4mpage(vm_phys_segs[vm_phys_nsegs - 1].end - PAGE_SIZE) / NBPDR + 1; /* * Allocate memory for the pv head table for superpages. */ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); pv_maxchunks = MAX(pv_entry_max / _NPCPV, maxproc); pv_chunkbase = (struct pv_chunk *)kva_alloc(PAGE_SIZE * pv_maxchunks); if (pv_chunkbase == NULL) panic("pmap_init: not enough kvm for pv chunks"); pmap_ptelist_init(&pv_vafree, pv_chunkbase, pv_maxchunks); #if defined(PAE) || defined(PAE_TABLES) pdptzone = uma_zcreate("PDPT", NPGPTD * sizeof(pdpt_entry_t), NULL, NULL, NULL, NULL, (NPGPTD * sizeof(pdpt_entry_t)) - 1, UMA_ZONE_VM | UMA_ZONE_NOFREE); uma_zone_set_allocf(pdptzone, pmap_pdpt_allocf); #endif pmap_initialized = 1; if (!bootverbose) return; for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == 0) continue; printf("PPIM %u: PA=%#jx, VA=%#x, size=%#x, mode=%#x\n", i, (uintmax_t)ppim->pa, ppim->va, ppim->sz, ppim->mode); } } SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_max, CTLFLAG_RD, &pv_entry_max, 0, "Max number of PV entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, shpgperproc, CTLFLAG_RD, &shpgperproc, 0, "Page share factor per proc"); static SYSCTL_NODE(_vm_pmap, OID_AUTO, pde, CTLFLAG_RD, 0, "2/4MB page mapping counters"); static u_long pmap_pde_demotions; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, demotions, CTLFLAG_RD, &pmap_pde_demotions, 0, "2/4MB page demotions"); static u_long pmap_pde_mappings; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, mappings, CTLFLAG_RD, &pmap_pde_mappings, 0, "2/4MB page mappings"); static u_long pmap_pde_p_failures; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_pde_p_failures, 0, "2/4MB page promotion failures"); static u_long pmap_pde_promotions; SYSCTL_ULONG(_vm_pmap_pde, OID_AUTO, promotions, CTLFLAG_RD, &pmap_pde_promotions, 0, "2/4MB page promotions"); /*************************************************** * Low 
level helper routines.....
 ***************************************************/

/*
 * Determine the appropriate bits to set in a PTE or PDE for a specified
 * caching mode.
 */
int
pmap_cache_bits(int mode, boolean_t is_pde)
{
	int cache_bits, pat_flag, pat_idx;

	if (mode < 0 || mode >= PAT_INDEX_SIZE || pat_index[mode] < 0)
		panic("Unknown caching mode %d\n", mode);

	/* The PAT bit is different for PTEs and PDEs. */
	pat_flag = is_pde ? PG_PDE_PAT : PG_PTE_PAT;

	/* Map the caching mode to a PAT index. */
	pat_idx = pat_index[mode];

	/* Map the 3-bit index value into the PAT, PCD, and PWT bits. */
	cache_bits = 0;
	if (pat_idx & 0x4)
		cache_bits |= pat_flag;
	if (pat_idx & 0x2)
		cache_bits |= PG_NC_PCD;
	if (pat_idx & 0x1)
		cache_bits |= PG_NC_PWT;
	return (cache_bits);
}

/*
 * The caller is responsible for maintaining TLB consistency.
 */
static void
pmap_kenter_pde(vm_offset_t va, pd_entry_t newpde)
{
	pd_entry_t *pde;
	pmap_t pmap;
	boolean_t PTD_updated;

	PTD_updated = FALSE;
	mtx_lock_spin(&allpmaps_lock);
	LIST_FOREACH(pmap, &allpmaps, pm_list) {
		if ((pmap->pm_pdir[PTDPTDI] & PG_FRAME) == (PTDpde[0] &
		    PG_FRAME))
			PTD_updated = TRUE;
		pde = pmap_pde(pmap, va);
		pde_store(pde, newpde);
	}
	mtx_unlock_spin(&allpmaps_lock);
	KASSERT(PTD_updated,
	    ("pmap_kenter_pde: current page table is not in allpmaps"));
}

/*
 * After changing the page size for the specified virtual address in the page
 * table, flush the corresponding entries from the processor's TLB.  Only the
 * calling processor's TLB is affected.
 *
 * The calling thread must be pinned to a processor.
 */
static void
pmap_update_pde_invalidate(vm_offset_t va, pd_entry_t newpde)
{
	u_long cr4;

	if ((newpde & PG_PS) == 0)
		/* Demotion: flush a specific 2- or 4MB page mapping. */
		invlpg(va);
	else if ((newpde & PG_G) == 0)
		/*
		 * Promotion: flush every 4KB page mapping from the TLB
		 * because there are too many to flush individually.
		 */
		invltlb();
	else {
		/*
		 * Promotion: flush every 4KB page mapping from the TLB,
		 * including any global (PG_G) mappings.
		 */
		cr4 = rcr4();
		load_cr4(cr4 & ~CR4_PGE);
		/*
		 * Although preemption at this point could be detrimental to
		 * performance, it would not lead to an error.  PG_G is simply
		 * ignored if CR4.PGE is clear.  Moreover, in case this block
		 * is re-entered, the load_cr4() either above or below will
		 * modify CR4.PGE flushing the TLB.
		 */
		load_cr4(cr4 | CR4_PGE);
	}
}

void
invltlb_glob(void)
{
	uint64_t cr4;

	if (pgeflag == 0) {
		invltlb();
	} else {
		cr4 = rcr4();
		load_cr4(cr4 & ~CR4_PGE);
		load_cr4(cr4 | CR4_PGE);
	}
}

#ifdef SMP
/*
 * For SMP, these functions have to use the IPI mechanism for coherence.
 *
 * N.B.: Before calling any of the following TLB invalidation functions,
 * the calling processor must ensure that all stores updating a non-
 * kernel page table are globally performed.  Otherwise, another
 * processor could cache an old, pre-update entry without being
 * invalidated.  This can happen one of two ways: (1) The pmap becomes
 * active on another processor after its pm_active field is checked by
 * one of the following functions but before a store updating the page
 * table is globally performed. (2) The pmap becomes active on another
 * processor before its pm_active field is checked but due to
 * speculative loads one of the following functions still reads the
 * pmap as inactive on the other processor.
 *
 * The kernel page table is exempt because its pm_active field is
 * immutable.  The kernel page table is always active on every
 * processor.
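 *
 * An illustrative sketch of the required ordering (not a verbatim
 * caller from this file):
 *
 *	pte_store(pte, newpte);		update the page table first,
 *	pmap_invalidate_page(pmap, va);	then request the shootdown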
*/ void pmap_invalidate_page(pmap_t pmap, vm_offset_t va) { cpuset_t *mask, other_cpus; u_int cpuid; sched_pin(); if (pmap == kernel_pmap || !CPU_CMP(&pmap->pm_active, &all_cpus)) { invlpg(va); mask = &all_cpus; } else { cpuid = PCPU_GET(cpuid); other_cpus = all_cpus; CPU_CLR(cpuid, &other_cpus); if (CPU_ISSET(cpuid, &pmap->pm_active)) invlpg(va); CPU_AND(&other_cpus, &pmap->pm_active); mask = &other_cpus; } smp_masked_invlpg(*mask, va); sched_unpin(); } /* 4k PTEs -- Chosen to exceed the total size of Broadwell L2 TLB */ #define PMAP_INVLPG_THRESHOLD (4 * 1024 * PAGE_SIZE) void pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { cpuset_t *mask, other_cpus; vm_offset_t addr; u_int cpuid; if (eva - sva >= PMAP_INVLPG_THRESHOLD) { pmap_invalidate_all(pmap); return; } sched_pin(); if (pmap == kernel_pmap || !CPU_CMP(&pmap->pm_active, &all_cpus)) { for (addr = sva; addr < eva; addr += PAGE_SIZE) invlpg(addr); mask = &all_cpus; } else { cpuid = PCPU_GET(cpuid); other_cpus = all_cpus; CPU_CLR(cpuid, &other_cpus); if (CPU_ISSET(cpuid, &pmap->pm_active)) for (addr = sva; addr < eva; addr += PAGE_SIZE) invlpg(addr); CPU_AND(&other_cpus, &pmap->pm_active); mask = &other_cpus; } smp_masked_invlpg_range(*mask, sva, eva); sched_unpin(); } void pmap_invalidate_all(pmap_t pmap) { cpuset_t *mask, other_cpus; u_int cpuid; sched_pin(); if (pmap == kernel_pmap) { invltlb_glob(); mask = &all_cpus; } else if (!CPU_CMP(&pmap->pm_active, &all_cpus)) { invltlb(); mask = &all_cpus; } else { cpuid = PCPU_GET(cpuid); other_cpus = all_cpus; CPU_CLR(cpuid, &other_cpus); if (CPU_ISSET(cpuid, &pmap->pm_active)) invltlb(); CPU_AND(&other_cpus, &pmap->pm_active); mask = &other_cpus; } smp_masked_invltlb(*mask, pmap); sched_unpin(); } void pmap_invalidate_cache(void) { sched_pin(); wbinvd(); smp_cache_flush(); sched_unpin(); } struct pde_action { cpuset_t invalidate; /* processors that invalidate their TLB */ vm_offset_t va; pd_entry_t *pde; pd_entry_t newpde; u_int store; /* processor that updates the PDE */ }; static void pmap_update_pde_kernel(void *arg) { struct pde_action *act = arg; pd_entry_t *pde; pmap_t pmap; if (act->store == PCPU_GET(cpuid)) { /* * Elsewhere, this operation requires allpmaps_lock for * synchronization. Here, it does not because it is being * performed in the context of an all_cpus rendezvous. */ LIST_FOREACH(pmap, &allpmaps, pm_list) { pde = pmap_pde(pmap, act->va); pde_store(pde, act->newpde); } } } static void pmap_update_pde_user(void *arg) { struct pde_action *act = arg; if (act->store == PCPU_GET(cpuid)) pde_store(act->pde, act->newpde); } static void pmap_update_pde_teardown(void *arg) { struct pde_action *act = arg; if (CPU_ISSET(PCPU_GET(cpuid), &act->invalidate)) pmap_update_pde_invalidate(act->va, act->newpde); } /* * Change the page size for the specified virtual address in a way that * prevents any possibility of the TLB ever having two entries that map the * same virtual address using different page sizes. This is the recommended * workaround for Erratum 383 on AMD Family 10h processors. It prevents a * machine check exception for a TLB state that is improperly diagnosed as a * hardware error. 
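 *
 * To that end, the SMP case below stages the update through an all-CPU
 * rendezvous: the affected processors rendezvous, exactly one of them
 * (act->store) writes the new PDE, and then each processor in
 * act->invalidate flushes its own TLB in the teardown step.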
*/ static void pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde) { struct pde_action act; cpuset_t active, other_cpus; u_int cpuid; sched_pin(); cpuid = PCPU_GET(cpuid); other_cpus = all_cpus; CPU_CLR(cpuid, &other_cpus); if (pmap == kernel_pmap) active = all_cpus; else active = pmap->pm_active; if (CPU_OVERLAP(&active, &other_cpus)) { act.store = cpuid; act.invalidate = active; act.va = va; act.pde = pde; act.newpde = newpde; CPU_SET(cpuid, &active); smp_rendezvous_cpus(active, smp_no_rendevous_barrier, pmap == kernel_pmap ? pmap_update_pde_kernel : pmap_update_pde_user, pmap_update_pde_teardown, &act); } else { if (pmap == kernel_pmap) pmap_kenter_pde(va, newpde); else pde_store(pde, newpde); if (CPU_ISSET(cpuid, &active)) pmap_update_pde_invalidate(va, newpde); } sched_unpin(); } #else /* !SMP */ /* * Normal, non-SMP, 486+ invalidation functions. * We inline these within pmap.c for speed. */ PMAP_INLINE void pmap_invalidate_page(pmap_t pmap, vm_offset_t va) { if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) invlpg(va); } PMAP_INLINE void pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t addr; if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) for (addr = sva; addr < eva; addr += PAGE_SIZE) invlpg(addr); } PMAP_INLINE void pmap_invalidate_all(pmap_t pmap) { if (pmap == kernel_pmap) invltlb_glob(); else if (!CPU_EMPTY(&pmap->pm_active)) invltlb(); } PMAP_INLINE void pmap_invalidate_cache(void) { wbinvd(); } static void pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde, pd_entry_t newpde) { if (pmap == kernel_pmap) pmap_kenter_pde(va, newpde); else pde_store(pde, newpde); if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) pmap_update_pde_invalidate(va, newpde); } #endif /* !SMP */ #define PMAP_CLFLUSH_THRESHOLD (2 * 1024 * 1024) void pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva, boolean_t force) { if (force) { sva &= ~(vm_offset_t)cpu_clflush_line_size; } else { KASSERT((sva & PAGE_MASK) == 0, ("pmap_invalidate_cache_range: sva not page-aligned")); KASSERT((eva & PAGE_MASK) == 0, ("pmap_invalidate_cache_range: eva not page-aligned")); } if ((cpu_feature & CPUID_SS) != 0 && !force) ; /* If "Self Snoop" is supported and allowed, do nothing. */ else if ((cpu_stdext_feature & CPUID_STDEXT_CLFLUSHOPT) != 0 && eva - sva < PMAP_CLFLUSH_THRESHOLD) { #ifdef DEV_APIC /* * XXX: Some CPUs fault, hang, or trash the local APIC * registers if we use CLFLUSH on the local APIC * range. The local APIC is always uncached, so we * don't need to flush for that range anyway. */ if (pmap_kextract(sva) == lapic_paddr) return; #endif /* * Otherwise, do per-cache line flush. Use the mfence * instruction to insure that previous stores are * included in the write-back. The processor * propagates flush to other processors in the cache * coherence domain. */ mfence(); for (; sva < eva; sva += cpu_clflush_line_size) clflushopt(sva); mfence(); } else if ((cpu_feature & CPUID_CLFSH) != 0 && eva - sva < PMAP_CLFLUSH_THRESHOLD) { #ifdef DEV_APIC if (pmap_kextract(sva) == lapic_paddr) return; #endif /* * Writes are ordered by CLFLUSH on Intel CPUs. */ if (cpu_vendor_id != CPU_VENDOR_INTEL) mfence(); for (; sva < eva; sva += cpu_clflush_line_size) clflush(sva); if (cpu_vendor_id != CPU_VENDOR_INTEL) mfence(); } else { /* * No targeted cache flush methods are supported by CPU, * or the supplied range is bigger than 2MB. * Globally invalidate cache. 
*/ pmap_invalidate_cache(); } } void pmap_invalidate_cache_pages(vm_page_t *pages, int count) { int i; if (count >= PMAP_CLFLUSH_THRESHOLD / PAGE_SIZE || (cpu_feature & CPUID_CLFSH) == 0) { pmap_invalidate_cache(); } else { for (i = 0; i < count; i++) pmap_flush_page(pages[i]); } } /* * Are we current address space or kernel? */ static __inline int pmap_is_current(pmap_t pmap) { return (pmap == kernel_pmap || pmap == vmspace_pmap(curthread->td_proc->p_vmspace)); } /* * If the given pmap is not the current or kernel pmap, the returned pte must * be released by passing it to pmap_pte_release(). */ pt_entry_t * pmap_pte(pmap_t pmap, vm_offset_t va) { pd_entry_t newpf; pd_entry_t *pde; pde = pmap_pde(pmap, va); if (*pde & PG_PS) return (pde); if (*pde != 0) { /* are we current address space or kernel? */ if (pmap_is_current(pmap)) return (vtopte(va)); mtx_lock(&PMAP2mutex); newpf = *pde & PG_FRAME; if ((*PMAP2 & PG_FRAME) != newpf) { *PMAP2 = newpf | PG_RW | PG_V | PG_A | PG_M; pmap_invalidate_page(kernel_pmap, (vm_offset_t)PADDR2); } return (PADDR2 + (i386_btop(va) & (NPTEPG - 1))); } return (NULL); } /* * Releases a pte that was obtained from pmap_pte(). Be prepared for the pte * being NULL. */ static __inline void pmap_pte_release(pt_entry_t *pte) { if ((pt_entry_t *)((vm_offset_t)pte & ~PAGE_MASK) == PADDR2) mtx_unlock(&PMAP2mutex); } /* * NB: The sequence of updating a page table followed by accesses to the * corresponding pages is subject to the situation described in the "AMD64 * Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23, * "7.3.1 Special Coherency Considerations". Therefore, issuing the INVLPG * right after modifying the PTE bits is crucial. */ static __inline void invlcaddr(void *caddr) { invlpg((u_int)caddr); } /* * Super fast pmap_pte routine best used when scanning * the pv lists. This eliminates many coarse-grained * invltlb calls. Note that many of the pv list * scans are across different pmaps. It is very wasteful * to do an entire invltlb for checking a single mapping. * * If the given pmap is not the current pmap, pvh_global_lock * must be held and curthread pinned to a CPU. */ static pt_entry_t * pmap_pte_quick(pmap_t pmap, vm_offset_t va) { pd_entry_t newpf; pd_entry_t *pde; pde = pmap_pde(pmap, va); if (*pde & PG_PS) return (pde); if (*pde != 0) { /* are we current address space or kernel? */ if (pmap_is_current(pmap)) return (vtopte(va)); rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("curthread not pinned")); newpf = *pde & PG_FRAME; if ((*PMAP1 & PG_FRAME) != newpf) { *PMAP1 = newpf | PG_RW | PG_V | PG_A | PG_M; #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif invlcaddr(PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); invlcaddr(PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; return (PADDR1 + (i386_btop(va) & (NPTEPG - 1))); } return (0); } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. 
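 *
 * For a PG_PS (4MB) mapping the low PDRMASK bits of the virtual
 * address supply the offset into the superpage; for an ordinary 4KB
 * mapping only the low PAGE_MASK bits do.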
*/ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { vm_paddr_t rtval; pt_entry_t *pte; pd_entry_t pde; rtval = 0; PMAP_LOCK(pmap); pde = pmap->pm_pdir[va >> PDRSHIFT]; if (pde != 0) { if ((pde & PG_PS) != 0) rtval = (pde & PG_PS_FRAME) | (va & PDRMASK); else { pte = pmap_pte(pmap, va); rtval = (*pte & PG_FRAME) | (va & PAGE_MASK); pmap_pte_release(pte); } } PMAP_UNLOCK(pmap); return (rtval); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection. */ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pd_entry_t pde; pt_entry_t pte, *ptep; vm_page_t m; vm_paddr_t pa; pa = 0; m = NULL; PMAP_LOCK(pmap); retry: pde = *pmap_pde(pmap, va); if (pde != 0) { if (pde & PG_PS) { if ((pde & PG_RW) || (prot & VM_PROT_WRITE) == 0) { if (vm_page_pa_tryrelock(pmap, (pde & PG_PS_FRAME) | (va & PDRMASK), &pa)) goto retry; m = PHYS_TO_VM_PAGE((pde & PG_PS_FRAME) | (va & PDRMASK)); vm_page_hold(m); } } else { ptep = pmap_pte(pmap, va); pte = *ptep; pmap_pte_release(ptep); if (pte != 0 && ((pte & PG_RW) || (prot & VM_PROT_WRITE) == 0)) { if (vm_page_pa_tryrelock(pmap, pte & PG_FRAME, &pa)) goto retry; m = PHYS_TO_VM_PAGE(pte & PG_FRAME); vm_page_hold(m); } } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } /*************************************************** * Low level mapping routines..... ***************************************************/ /* * Add a wired page to the kva. * Note: not SMP coherent. * * This function may be used before pmap_bootstrap() is called. */ PMAP_INLINE void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pt_entry_t *pte; pte = vtopte(va); pte_store(pte, pa | PG_RW | PG_V | pgeflag); } static __inline void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int mode) { pt_entry_t *pte; pte = vtopte(va); pte_store(pte, pa | PG_RW | PG_V | pgeflag | pmap_cache_bits(mode, 0)); } /* * Remove a page from the kernel pagetables. * Note: not SMP coherent. * * This function may be used before pmap_bootstrap() is called. */ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt_entry_t *pte; pte = vtopte(va); pte_clear(pte); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. */ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { vm_offset_t va, sva; vm_paddr_t superpage_offset; pd_entry_t newpde; va = *virt; /* * Does the physical address range's size and alignment permit at * least one superpage mapping to be created? */ superpage_offset = start & PDRMASK; if ((end - start) - ((NBPDR - superpage_offset) & PDRMASK) >= NBPDR) { /* * Increase the starting virtual address so that its alignment * does not preclude the use of superpage mappings. 
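		 * That is, advance "va" to the next address whose offset
		 * within a superpage, va & PDRMASK, equals superpage_offset,
		 * so the physical and virtual addresses can share superpage
		 * alignment.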
		 */
		if ((va & PDRMASK) < superpage_offset)
			va = (va & ~PDRMASK) + superpage_offset;
		else if ((va & PDRMASK) > superpage_offset)
			va = ((va + PDRMASK) & ~PDRMASK) + superpage_offset;
	}
	sva = va;
	while (start < end) {
		if ((start & PDRMASK) == 0 && end - start >= NBPDR &&
		    pseflag) {
			KASSERT((va & PDRMASK) == 0,
			    ("pmap_map: misaligned va %#x", va));
			newpde = start | PG_PS | pgeflag | PG_RW | PG_V;
			pmap_kenter_pde(va, newpde);
			va += NBPDR;
			start += NBPDR;
		} else {
			pmap_kenter(va, start);
			va += PAGE_SIZE;
			start += PAGE_SIZE;
		}
	}
	pmap_invalidate_range(kernel_pmap, sva, va);
	*virt = va;
	return (sva);
}

/*
 * Add a list of wired pages to the kva.  This routine is only used for
 * temporary kernel mappings that do not need to have page modification
 * or references recorded.  Note that old mappings are simply written
 * over.  The page *must* be wired.
 * Note: SMP coherent.  Uses a ranged shootdown IPI.
 */
void
pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count)
{
	pt_entry_t *endpte, oldpte, pa, *pte;
	vm_page_t m;

	oldpte = 0;
	pte = vtopte(sva);
	endpte = pte + count;
	while (pte < endpte) {
		m = *ma++;
		pa = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(m->md.pat_mode, 0);
		if ((*pte & (PG_FRAME | PG_PTE_CACHE)) != pa) {
			oldpte |= *pte;
			pte_store(pte, pa | pgeflag | PG_RW | PG_V);
		}
		pte++;
	}
	if (__predict_false((oldpte & PG_V) != 0))
		pmap_invalidate_range(kernel_pmap, sva, sva + count *
		    PAGE_SIZE);
}

/*
 * This routine tears out page mappings from the
 * kernel -- it is meant only for temporary mappings.
 * Note: SMP coherent.  Uses a ranged shootdown IPI.
 */
void
pmap_qremove(vm_offset_t sva, int count)
{
	vm_offset_t va;

	va = sva;
	while (count-- > 0) {
		pmap_kremove(va);
		va += PAGE_SIZE;
	}
	pmap_invalidate_range(kernel_pmap, sva, va);
}

/***************************************************
 * Page table page management routines.....
 ***************************************************/
static __inline void
pmap_free_zero_pages(struct spglist *free)
{
	vm_page_t m;

	while ((m = SLIST_FIRST(free)) != NULL) {
		SLIST_REMOVE_HEAD(free, plinks.s.ss);
		/* Preserve the page's PG_ZERO setting. */
		vm_page_free_toq(m);
	}
}

/*
 * Schedule the specified unused page table page to be freed.  Specifically,
 * add the page to the specified list of pages that will be released to the
 * physical memory manager after the TLB has been updated.
 */
static __inline void
pmap_add_delayed_free_list(vm_page_t m, struct spglist *free,
    boolean_t set_PG_ZERO)
{

	if (set_PG_ZERO)
		m->flags |= PG_ZERO;
	else
		m->flags &= ~PG_ZERO;
	SLIST_INSERT_HEAD(free, m, plinks.s.ss);
}

/*
 * Inserts the specified page table page into the specified pmap's collection
 * of idle page table pages.  Each of a pmap's page table pages is responsible
 * for mapping a distinct range of virtual addresses.  The pmap's collection is
 * ordered by this virtual address range.
 */
static __inline int
pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte)
{

	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
	return (vm_radix_insert(&pmap->pm_root, mpte));
}

/*
 * Looks for a page table page mapping the specified virtual address in the
 * specified pmap's collection of idle page table pages.  Returns NULL if there
 * is no page table page corresponding to the specified virtual address.
 */
static __inline vm_page_t
pmap_lookup_pt_page(pmap_t pmap, vm_offset_t va)
{

	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
	return (vm_radix_lookup(&pmap->pm_root, va >> PDRSHIFT));
}

/*
 * Removes the specified page table page from the specified pmap's collection
 * of idle page table pages.
The specified page table page must be a member of * the pmap's collection. */ static __inline void pmap_remove_pt_page(pmap_t pmap, vm_page_t mpte) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); vm_radix_remove(&pmap->pm_root, mpte->pindex); } /* * Decrements a page table page's wire count, which is used to record the * number of valid page table entries within the page. If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. */ static inline boolean_t pmap_unwire_ptp(pmap_t pmap, vm_page_t m, struct spglist *free) { --m->wire_count; if (m->wire_count == 0) { _pmap_unwire_ptp(pmap, m, free); return (TRUE); } else return (FALSE); } static void _pmap_unwire_ptp(pmap_t pmap, vm_page_t m, struct spglist *free) { vm_offset_t pteva; /* * unmap the page table page */ pmap->pm_pdir[m->pindex] = 0; --pmap->pm_stats.resident_count; /* * This is a release store so that the ordinary store unmapping * the page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); /* * Do an invltlb to make the invalidated mapping * take effect immediately. */ pteva = VM_MAXUSER_ADDRESS + i386_ptob(m->pindex); pmap_invalidate_page(pmap, pteva); /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ pmap_add_delayed_free_list(m, free, TRUE); } /* * After removing a page table entry, this routine is used to * conditionally free the page, and manage the hold/wire counts. */ static int pmap_unuse_pt(pmap_t pmap, vm_offset_t va, struct spglist *free) { pd_entry_t ptepde; vm_page_t mpte; if (va >= VM_MAXUSER_ADDRESS) return (0); ptepde = *pmap_pde(pmap, va); mpte = PHYS_TO_VM_PAGE(ptepde & PG_FRAME); return (pmap_unwire_ptp(pmap, mpte, free)); } /* * Initialize the pmap for the swapper process. */ void pmap_pinit0(pmap_t pmap) { PMAP_LOCK_INIT(pmap); /* * Since the page table directory is shared with the kernel pmap, * which is already included in the list "allpmaps", this pmap does * not need to be inserted into that list. */ pmap->pm_pdir = (pd_entry_t *)(KERNBASE + (vm_offset_t)IdlePTD); #if defined(PAE) || defined(PAE_TABLES) pmap->pm_pdpt = (pdpt_entry_t *)(KERNBASE + (vm_offset_t)IdlePDPT); #endif pmap->pm_root.rt_root = 0; CPU_ZERO(&pmap->pm_active); PCPU_SET(curpmap, pmap); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); } /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ int pmap_pinit(pmap_t pmap) { vm_page_t m, ptdpg[NPGPTD]; vm_paddr_t pa; int i; /* * No need to allocate page table space yet but we do need a valid * page directory table. 
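	 * (NBPTD bytes of KVA are allocated for it below: NPGPTD pages,
	 * i.e. one page directory page on non-PAE kernels and four when
	 * PAE page tables are in use.)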
*/ if (pmap->pm_pdir == NULL) { pmap->pm_pdir = (pd_entry_t *)kva_alloc(NBPTD); if (pmap->pm_pdir == NULL) return (0); #if defined(PAE) || defined(PAE_TABLES) pmap->pm_pdpt = uma_zalloc(pdptzone, M_WAITOK | M_ZERO); KASSERT(((vm_offset_t)pmap->pm_pdpt & ((NPGPTD * sizeof(pdpt_entry_t)) - 1)) == 0, ("pmap_pinit: pdpt misaligned")); KASSERT(pmap_kextract((vm_offset_t)pmap->pm_pdpt) < (4ULL<<30), ("pmap_pinit: pdpt above 4g")); #endif pmap->pm_root.rt_root = 0; } KASSERT(vm_radix_is_empty(&pmap->pm_root), ("pmap_pinit: pmap has reserved page table page(s)")); /* * allocate the page directory page(s) */ for (i = 0; i < NPGPTD;) { m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (m == NULL) VM_WAIT; else { ptdpg[i++] = m; } } pmap_qenter((vm_offset_t)pmap->pm_pdir, ptdpg, NPGPTD); for (i = 0; i < NPGPTD; i++) if ((ptdpg[i]->flags & PG_ZERO) == 0) pagezero(pmap->pm_pdir + (i * NPDEPG)); mtx_lock_spin(&allpmaps_lock); LIST_INSERT_HEAD(&allpmaps, pmap, pm_list); /* Copy the kernel page table directory entries. */ bcopy(PTD + KPTDI, pmap->pm_pdir + KPTDI, nkpt * sizeof(pd_entry_t)); mtx_unlock_spin(&allpmaps_lock); /* install self-referential address mapping entry(s) */ for (i = 0; i < NPGPTD; i++) { pa = VM_PAGE_TO_PHYS(ptdpg[i]); pmap->pm_pdir[PTDPTDI + i] = pa | PG_V | PG_RW | PG_A | PG_M; #if defined(PAE) || defined(PAE_TABLES) pmap->pm_pdpt[i] = pa | PG_V; #endif } CPU_ZERO(&pmap->pm_active); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); return (1); } /* * this routine is called if the page table page is not * mapped correctly. */ static vm_page_t _pmap_allocpte(pmap_t pmap, u_int ptepindex, u_int flags) { vm_paddr_t ptepa; vm_page_t m; /* * Allocate a page table page. */ if ((m = vm_page_alloc(NULL, ptepindex, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) { if ((flags & PMAP_ENTER_NOSLEEP) == 0) { PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); VM_WAIT; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); } /* * Indicate the need to retry. While waiting, the page table * page may have been allocated. */ return (NULL); } if ((m->flags & PG_ZERO) == 0) pmap_zero_page(m); /* * Map the pagetable page into the process address space, if * it isn't already there. */ pmap->pm_stats.resident_count++; ptepa = VM_PAGE_TO_PHYS(m); pmap->pm_pdir[ptepindex] = (pd_entry_t) (ptepa | PG_U | PG_RW | PG_V | PG_A | PG_M); return (m); } static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va, u_int flags) { u_int ptepindex; pd_entry_t ptepa; vm_page_t m; /* * Calculate pagetable page index */ ptepindex = va >> PDRSHIFT; retry: /* * Get the page directory entry */ ptepa = pmap->pm_pdir[ptepindex]; /* * This supports switching from a 4MB page to a * normal 4K page. */ if (ptepa & PG_PS) { (void)pmap_demote_pde(pmap, &pmap->pm_pdir[ptepindex], va); ptepa = pmap->pm_pdir[ptepindex]; } /* * If the page table page is mapped, we just increment the * hold count, and activate it. */ if (ptepa) { m = PHYS_TO_VM_PAGE(ptepa & PG_FRAME); m->wire_count++; } else { /* * Here if the pte page isn't mapped, or if it has * been deallocated. */ m = _pmap_allocpte(pmap, ptepindex, flags); if (m == NULL && (flags & PMAP_ENTER_NOSLEEP) == 0) goto retry; } return (m); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. 
* Should only be called if the map contains no valid mappings. */ void pmap_release(pmap_t pmap) { vm_page_t m, ptdpg[NPGPTD]; int i; KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); KASSERT(vm_radix_is_empty(&pmap->pm_root), ("pmap_release: pmap has reserved page table page(s)")); KASSERT(CPU_EMPTY(&pmap->pm_active), ("releasing active pmap %p", pmap)); mtx_lock_spin(&allpmaps_lock); LIST_REMOVE(pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); for (i = 0; i < NPGPTD; i++) ptdpg[i] = PHYS_TO_VM_PAGE(pmap->pm_pdir[PTDPTDI + i] & PG_FRAME); bzero(pmap->pm_pdir + PTDPTDI, (nkpt + NPGPTD) * sizeof(*pmap->pm_pdir)); pmap_qremove((vm_offset_t)pmap->pm_pdir, NPGPTD); for (i = 0; i < NPGPTD; i++) { m = ptdpg[i]; #if defined(PAE) || defined(PAE_TABLES) KASSERT(VM_PAGE_TO_PHYS(m) == (pmap->pm_pdpt[i] & PG_FRAME), ("pmap_release: got wrong ptd page")); #endif m->wire_count--; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); } } static int kvm_size(SYSCTL_HANDLER_ARGS) { unsigned long ksize = VM_MAX_KERNEL_ADDRESS - KERNBASE; return (sysctl_handle_long(oidp, &ksize, 0, req)); } SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_size, "IU", "Size of KVM"); static int kvm_free(SYSCTL_HANDLER_ARGS) { unsigned long kfree = VM_MAX_KERNEL_ADDRESS - kernel_vm_end; return (sysctl_handle_long(oidp, &kfree, 0, req)); } SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_free, "IU", "Amount of KVM free"); /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { vm_paddr_t ptppaddr; vm_page_t nkpg; pd_entry_t newpdir; mtx_assert(&kernel_map->system_mtx, MA_OWNED); addr = roundup2(addr, NBPDR); if (addr - 1 >= kernel_map->max_offset) addr = kernel_map->max_offset; while (kernel_vm_end < addr) { if (pdir_pde(PTD, kernel_vm_end)) { kernel_vm_end = (kernel_vm_end + NBPDR) & ~PDRMASK; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } continue; } nkpg = vm_page_alloc(NULL, kernel_vm_end >> PDRSHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); nkpt++; if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); ptppaddr = VM_PAGE_TO_PHYS(nkpg); newpdir = (pd_entry_t) (ptppaddr | PG_V | PG_RW | PG_A | PG_M); pdir_pde(KPTD, kernel_vm_end) = pgeflag | newpdir; pmap_kenter_pde(kernel_vm_end, newpdir); kernel_vm_end = (kernel_vm_end + NBPDR) & ~PDRMASK; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } } } /*************************************************** * page management routines. 
***************************************************/ CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 11); CTASSERT(_NPCPV == 336); static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) #define PC_FREE0_9 0xfffffffful /* Free values for index 0 through 9 */ #define PC_FREE10 0x0000fffful /* Free values for index 10 */ static const uint32_t pc_freemask[_NPCM] = { PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, PC_FREE10 }; SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. */ static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap) { struct pch newtail; struct pv_chunk *pc; struct md_page *pvh; pd_entry_t *pde; pmap_t pmap; pt_entry_t *pte, tpte; pv_entry_t pv; vm_offset_t va; vm_page_t m, m_pc; struct spglist free; uint32_t inuse; int bit, field, freed; PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); pmap = NULL; m_pc = NULL; SLIST_INIT(&free); TAILQ_INIT(&newtail); while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL && (pv_vafree == 0 || SLIST_EMPTY(&free))) { TAILQ_REMOVE(&pv_chunks, pc, pc_lru); if (pmap != pc->pc_pmap) { if (pmap != NULL) { pmap_invalidate_all(pmap); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap = pc->pc_pmap; /* Avoid deadlock and lock recursion. */ if (pmap > locked_pmap) PMAP_LOCK(pmap); else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { pmap = NULL; TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } } /* * Destroy every non-wired, 4 KB page mapping in the chunk. 
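		 * A clear bit in pc_map[] marks an allocated pv entry, so
		 * "inuse = ~pc->pc_map[field] & pc_freemask[field]" visits
		 * exactly the allocated slots, lowest-numbered bit (bsfl)
		 * first.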
*/ freed = 0; for (field = 0; field < _NPCM; field++) { for (inuse = ~pc->pc_map[field] & pc_freemask[field]; inuse != 0; inuse &= ~(1UL << bit)) { bit = bsfl(inuse); pv = &pc->pc_pventry[field * 32 + bit]; va = pv->pv_va; pde = pmap_pde(pmap, va); if ((*pde & PG_PS) != 0) continue; pte = pmap_pte(pmap, va); tpte = *pte; if ((tpte & PG_W) == 0) tpte = pte_load_clear(pte); pmap_pte_release(pte); if ((tpte & PG_W) != 0) continue; KASSERT(tpte != 0, ("pmap_pv_reclaim: pmap %p va %x zero pte", pmap, va)); if ((tpte & PG_G) != 0) pmap_invalidate_page(pmap, va); m = PHYS_TO_VM_PAGE(tpte & PG_FRAME); if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if ((tpte & PG_A) != 0) vm_page_aflag_set(m, PGA_REFERENCED); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) { vm_page_aflag_clear(m, PGA_WRITEABLE); } } pc->pc_map[field] |= 1UL << bit; pmap_unuse_pt(pmap, va, &free); freed++; } } if (freed == 0) { TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } /* Every freed mapping is for a 4 KB page. */ pmap->pm_stats.resident_count -= freed; PV_STAT(pv_entry_frees += freed); PV_STAT(pv_entry_spare += freed); pv_entry_count -= freed; TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != pc_freemask[field]) { TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); /* * One freed pv entry in locked_pmap is * sufficient. */ if (pmap == locked_pmap) goto out; break; } if (field == _NPCM) { PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* Entire chunk is free; return it. */ m_pc = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); pmap_ptelist_free(&pv_vafree, (vm_offset_t)pc); break; } } out: TAILQ_CONCAT(&pv_chunks, &newtail, pc_lru); if (pmap != NULL) { pmap_invalidate_all(pmap); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } if (m_pc == NULL && pv_vafree != 0 && SLIST_EMPTY(&free)) { m_pc = SLIST_FIRST(&free); SLIST_REMOVE_HEAD(&free, plinks.s.ss); /* Recycle a freed page table page. */ m_pc->wire_count = 1; atomic_add_int(&vm_cnt.v_wire_count, 1); } pmap_free_zero_pages(&free); return (m_pc); } /* * free the pv_entry back to the free list */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 32; bit = idx % 32; pc->pc_map[field] |= 1ul << bit; for (idx = 0; idx < _NPCM; idx++) if (pc->pc_map[idx] != pc_freemask[idx]) { /* * 98% of the time, pc is already at the head of the * list. If it isn't already, move it to the head. 
*/ if (__predict_false(TAILQ_FIRST(&pmap->pm_pvchunk) != pc)) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; TAILQ_REMOVE(&pv_chunks, pc, pc_lru); PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); vm_page_unwire(m, PQ_NONE); vm_page_free(m); pmap_ptelist_free(&pv_vafree, (vm_offset_t)pc); } /* * get a new pv_entry, allocating a block from the system * when needed. */ static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try) { static const struct timeval printinterval = { 60, 0 }; static struct timeval lastprint; int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_allocs++); pv_entry_count++; if (pv_entry_count > pv_entry_high_water) if (ratecheck(&lastprint, &printinterval)) printf("Approaching the limit on PV entries, consider " "increasing either the vm.pmap.shpgperproc or the " "vm.pmap.pv_entry_max tunable.\n"); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = bsfl(pc->pc_map[field]); break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 32 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != 0) { PV_STAT(pv_entry_spare--); return (pv); /* not full, return */ } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare--); return (pv); } } /* * Access to the ptelist "pv_vafree" is synchronized by the pvh * global lock. If "pv_vafree" is currently non-empty, it will * remain non-empty until pmap_ptelist_alloc() completes. */ if (pv_vafree == 0 || (m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { if (try) { pv_entry_count--; PV_STAT(pc_chunk_tryfail++); return (NULL); } m = pmap_pv_reclaim(pmap); if (m == NULL) goto retry; } PV_STAT(pc_chunk_count++); PV_STAT(pc_chunk_allocs++); pc = (struct pv_chunk *)pmap_ptelist_alloc(&pv_vafree); pmap_qenter((vm_offset_t)pc, &m, 1); pc->pc_pmap = pmap; pc->pc_map[0] = pc_freemask[0] & ~1ul; /* preallocated bit 0 */ for (field = 1; field < _NPCM; field++) pc->pc_map[field] = pc_freemask[field]; TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare += _NPCPV - 1); return (pv); } static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); break; } } return (pv); } static void pmap_pv_demote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PDRMASK) == 0, ("pmap_pv_demote_pde: pa is not 4mpage aligned")); /* * Transfer the 4mpage's pv entry for this mapping to the first * page's pv list. 
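	 * That entry is recycled for the superpage's first 4KB page; the
	 * remaining NPTEPG - 1 entries are created one at a time by
	 * pmap_insert_entry() below.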
*/ pvh = pa_to_pvh(pa); va = trunc_4mpage(va); pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pv_demote_pde: pv not found")); m = PHYS_TO_VM_PAGE(pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); /* Instantiate the remaining NPTEPG - 1 pv entries. */ va_last = va + NBPDR - PAGE_SIZE; do { m++; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_pv_demote_pde: page %p is not managed", m)); va += PAGE_SIZE; pmap_insert_entry(pmap, va, m); } while (va < va_last); } static void pmap_pv_promote_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PDRMASK) == 0, ("pmap_pv_promote_pde: pa is not 4mpage aligned")); /* * Transfer the first page's pv entry for this mapping to the * 4mpage's pv list. Aside from avoiding the cost of a call * to get_pv_entry(), a transfer avoids the possibility that * get_pv_entry() calls pmap_collect() and that pmap_collect() * removes one of the mappings that is being promoted. */ m = PHYS_TO_VM_PAGE(pa); va = trunc_4mpage(va); pv = pmap_pvh_remove(&m->md, pmap, va); KASSERT(pv != NULL, ("pmap_pv_promote_pde: pv not found")); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); /* Free the remaining NPTEPG - 1 pv entries. */ va_last = va + NBPDR - PAGE_SIZE; do { m++; va += PAGE_SIZE; pmap_pvh_free(&m->md, pmap, va); } while (va < va_last); } static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } static void pmap_remove_entry(pmap_t pmap, vm_page_t m, vm_offset_t va) { struct md_page *pvh; rw_assert(&pvh_global_lock, RA_WLOCKED); pmap_pvh_free(&m->md, pmap, va); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } /* * Create a pv entry for page at pa for * (pmap, va). */ static void pmap_insert_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } /* * Conditionally create a pv entry. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } /* * Create the pv entries for each of the pages within a superpage. */ static boolean_t pmap_pv_insert_pde(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } /* * Fills a page table page with mappings to consecutive physical pages. */ static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte) { pt_entry_t *pte; for (pte = firstpte; pte < firstpte + NPTEPG; pte++) { *pte = newpte; newpte += PAGE_SIZE; } } /* * Tries to demote a 2- or 4MB page mapping. If demotion fails, the * 2- or 4MB page mapping is invalidated. 
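 * On success, the single PDE is replaced by a reference to a page
 * table page whose NPTEPG small PTEs reproduce the superpage's
 * protections and attributes (see pmap_fill_ptp()).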
*/ static boolean_t pmap_demote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va) { pd_entry_t newpde, oldpde; pt_entry_t *firstpte, newpte; vm_paddr_t mptepa; vm_page_t mpte; struct spglist free; PMAP_LOCK_ASSERT(pmap, MA_OWNED); oldpde = *pde; KASSERT((oldpde & (PG_PS | PG_V)) == (PG_PS | PG_V), ("pmap_demote_pde: oldpde is missing PG_PS and/or PG_V")); if ((oldpde & PG_A) != 0 && (mpte = pmap_lookup_pt_page(pmap, va)) != NULL) pmap_remove_pt_page(pmap, mpte); else { KASSERT((oldpde & PG_W) == 0, ("pmap_demote_pde: page table page for a wired mapping" " is missing")); /* * Invalidate the 2- or 4MB page mapping and return * "failure" if the mapping was never accessed or the * allocation of the new page table page fails. */ if ((oldpde & PG_A) == 0 || (mpte = vm_page_alloc(NULL, va >> PDRSHIFT, VM_ALLOC_NOOBJ | VM_ALLOC_NORMAL | VM_ALLOC_WIRED)) == NULL) { SLIST_INIT(&free); pmap_remove_pde(pmap, pde, trunc_4mpage(va), &free); pmap_invalidate_page(pmap, trunc_4mpage(va)); pmap_free_zero_pages(&free); CTR2(KTR_PMAP, "pmap_demote_pde: failure for va %#x" " in pmap %p", va, pmap); return (FALSE); } if (va < VM_MAXUSER_ADDRESS) pmap->pm_stats.resident_count++; } mptepa = VM_PAGE_TO_PHYS(mpte); /* * If the page mapping is in the kernel's address space, then the * KPTmap can provide access to the page table page. Otherwise, * temporarily map the page table page (mpte) into the kernel's * address space at either PADDR1 or PADDR2. */ if (va >= KERNBASE) firstpte = &KPTmap[i386_btop(trunc_4mpage(va))]; else if (curthread->td_pinned > 0 && rw_wowned(&pvh_global_lock)) { if ((*PMAP1 & PG_FRAME) != mptepa) { *PMAP1 = mptepa | PG_RW | PG_V | PG_A | PG_M; #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif invlcaddr(PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); invlcaddr(PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; firstpte = PADDR1; } else { mtx_lock(&PMAP2mutex); if ((*PMAP2 & PG_FRAME) != mptepa) { *PMAP2 = mptepa | PG_RW | PG_V | PG_A | PG_M; pmap_invalidate_page(kernel_pmap, (vm_offset_t)PADDR2); } firstpte = PADDR2; } newpde = mptepa | PG_M | PG_A | (oldpde & PG_U) | PG_RW | PG_V; KASSERT((oldpde & PG_A) != 0, ("pmap_demote_pde: oldpde is missing PG_A")); KASSERT((oldpde & (PG_M | PG_RW)) != PG_RW, ("pmap_demote_pde: oldpde is missing PG_M")); newpte = oldpde & ~PG_PS; if ((newpte & PG_PDE_PAT) != 0) newpte ^= PG_PDE_PAT | PG_PTE_PAT; /* * If the page table page is new, initialize it. */ if (mpte->wire_count == 1) { mpte->wire_count = NPTEPG; pmap_fill_ptp(firstpte, newpte); } KASSERT((*firstpte & PG_FRAME) == (newpte & PG_FRAME), ("pmap_demote_pde: firstpte and newpte map different physical" " addresses")); /* * If the mapping has changed attributes, update the page table * entries. */ if ((*firstpte & PG_PTE_PROMOTE) != (newpte & PG_PTE_PROMOTE)) pmap_fill_ptp(firstpte, newpte); /* * Demote the mapping. This pmap is locked. The old PDE has * PG_A set. If the old PDE has PG_RW set, it also has PG_M * set. Thus, there is no danger of a race with another * processor changing the setting of PG_A and/or PG_M between * the read above and the store below. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, newpde); else if (pmap == kernel_pmap) pmap_kenter_pde(va, newpde); else pde_store(pde, newpde); if (firstpte == PADDR2) mtx_unlock(&PMAP2mutex); /* * Invalidate the recursive mapping of the page table page. */ pmap_invalidate_page(pmap, (vm_offset_t)vtopte(va)); /* * Demote the pv entry. 
This depends on the earlier demotion * of the mapping. Specifically, the (re)creation of a per- * page pv entry might trigger the execution of pmap_collect(), * which might reclaim a newly (re)created per-page pv entry * and destroy the associated mapping. In order to destroy * the mapping, the PDE must have already changed from mapping * the 2mpage to referencing the page table page. */ if ((oldpde & PG_MANAGED) != 0) pmap_pv_demote_pde(pmap, va, oldpde & PG_PS_FRAME); pmap_pde_demotions++; CTR2(KTR_PMAP, "pmap_demote_pde: success for va %#x" " in pmap %p", va, pmap); return (TRUE); } /* * Removes a 2- or 4MB page mapping from the kernel pmap. */ static void pmap_remove_kernel_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va) { pd_entry_t newpde; vm_paddr_t mptepa; vm_page_t mpte; PMAP_LOCK_ASSERT(pmap, MA_OWNED); mpte = pmap_lookup_pt_page(pmap, va); if (mpte == NULL) panic("pmap_remove_kernel_pde: Missing pt page."); pmap_remove_pt_page(pmap, mpte); mptepa = VM_PAGE_TO_PHYS(mpte); newpde = mptepa | PG_M | PG_A | PG_RW | PG_V; /* * Initialize the page table page. */ pagezero((void *)&KPTmap[i386_btop(trunc_4mpage(va))]); /* * Remove the mapping. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, newpde); else pmap_kenter_pde(va, newpde); /* * Invalidate the recursive mapping of the page table page. */ pmap_invalidate_page(pmap, (vm_offset_t)vtopte(va)); } /* * pmap_remove_pde: do the things to unmap a superpage in a process */ static void pmap_remove_pde(pmap_t pmap, pd_entry_t *pdq, vm_offset_t sva, struct spglist *free) { struct md_page *pvh; pd_entry_t oldpde; vm_offset_t eva, va; vm_page_t m, mpte; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PDRMASK) == 0, ("pmap_remove_pde: sva is not 4mpage aligned")); oldpde = pte_load_clear(pdq); if (oldpde & PG_W) pmap->pm_stats.wired_count -= NBPDR / PAGE_SIZE; /* * Machines that don't support invlpg, also don't support * PG_G. */ if (oldpde & PG_G) pmap_invalidate_page(kernel_pmap, sva); pmap->pm_stats.resident_count -= NBPDR / PAGE_SIZE; if (oldpde & PG_MANAGED) { pvh = pa_to_pvh(oldpde & PG_PS_FRAME); pmap_pvh_free(pvh, pmap, sva); eva = sva + NBPDR; for (va = sva, m = PHYS_TO_VM_PAGE(oldpde & PG_PS_FRAME); va < eva; va += PAGE_SIZE, m++) { if ((oldpde & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if (oldpde & PG_A) vm_page_aflag_set(m, PGA_REFERENCED); if (TAILQ_EMPTY(&m->md.pv_list) && TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } if (pmap == kernel_pmap) { pmap_remove_kernel_pde(pmap, pdq, sva); } else { mpte = pmap_lookup_pt_page(pmap, sva); if (mpte != NULL) { pmap_remove_pt_page(pmap, mpte); pmap->pm_stats.resident_count--; KASSERT(mpte->wire_count == NPTEPG, ("pmap_remove_pde: pte page wire count error")); mpte->wire_count = 0; pmap_add_delayed_free_list(mpte, free, FALSE); atomic_subtract_int(&vm_cnt.v_wire_count, 1); } } } /* * pmap_remove_pte: do the things to unmap a page in a process */ static int pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t va, struct spglist *free) { pt_entry_t oldpte; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); oldpte = pte_load_clear(ptq); KASSERT(oldpte != 0, ("pmap_remove_pte: pmap %p va %x zero pte", pmap, va)); if (oldpte & PG_W) pmap->pm_stats.wired_count -= 1; /* * Machines that don't support invlpg, also don't support * PG_G. 
*/ if (oldpte & PG_G) pmap_invalidate_page(kernel_pmap, va); pmap->pm_stats.resident_count -= 1; if (oldpte & PG_MANAGED) { m = PHYS_TO_VM_PAGE(oldpte & PG_FRAME); if ((oldpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if (oldpte & PG_A) vm_page_aflag_set(m, PGA_REFERENCED); pmap_remove_entry(pmap, m, va); } return (pmap_unuse_pt(pmap, va, free)); } /* * Remove a single page from a process address space */ static void pmap_remove_page(pmap_t pmap, vm_offset_t va, struct spglist *free) { pt_entry_t *pte; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("curthread not pinned")); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if ((pte = pmap_pte_quick(pmap, va)) == NULL || *pte == 0) return; pmap_remove_pte(pmap, pte, va, free); pmap_invalidate_page(pmap, va); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t pdnxt; pd_entry_t ptpaddr; pt_entry_t *pte; struct spglist free; int anyvalid; /* * Perform an unsynchronized read. This is, however, safe. */ if (pmap->pm_stats.resident_count == 0) return; anyvalid = 0; SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); PMAP_LOCK(pmap); /* * special handling of removing one page. a very * common operation and easy to short circuit some * code. */ if ((sva + PAGE_SIZE == eva) && ((pmap->pm_pdir[(sva >> PDRSHIFT)] & PG_PS) == 0)) { pmap_remove_page(pmap, sva, &free); goto out; } for (; sva < eva; sva = pdnxt) { u_int pdirindex; /* * Calculate index for next page table. */ pdnxt = (sva + NBPDR) & ~PDRMASK; if (pdnxt < sva) pdnxt = eva; if (pmap->pm_stats.resident_count == 0) break; pdirindex = sva >> PDRSHIFT; ptpaddr = pmap->pm_pdir[pdirindex]; /* * Weed out invalid mappings. Note: we assume that the page * directory table is always allocated, and in kernel virtual. */ if (ptpaddr == 0) continue; /* * Check for large page. */ if ((ptpaddr & PG_PS) != 0) { /* * Are we removing the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == pdnxt && eva >= pdnxt) { /* * The TLB entry for a PG_G mapping is * invalidated by pmap_remove_pde(). */ if ((ptpaddr & PG_G) == 0) anyvalid = 1; pmap_remove_pde(pmap, &pmap->pm_pdir[pdirindex], sva, &free); continue; } else if (!pmap_demote_pde(pmap, &pmap->pm_pdir[pdirindex], sva)) { /* The large page mapping was destroyed. */ continue; } } /* * Limit our scan to either the end of the va represented * by the current page table page, or to the end of the * range being removed. */ if (pdnxt > eva) pdnxt = eva; for (pte = pmap_pte_quick(pmap, sva); sva != pdnxt; pte++, sva += PAGE_SIZE) { if (*pte == 0) continue; /* * The TLB entry for a PG_G mapping is invalidated * by pmap_remove_pte(). */ if ((*pte & PG_G) == 0) anyvalid = 1; if (pmap_remove_pte(pmap, pte, sva, &free)) break; } } out: sched_unpin(); if (anyvalid) pmap_invalidate_all(pmap); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) 
*/ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; pt_entry_t *pte, tpte; pd_entry_t *pde; vm_offset_t va; struct spglist free; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, va); (void)pmap_demote_pde(pmap, pde, va); PMAP_UNLOCK(pmap); } small_mappings: while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pmap->pm_stats.resident_count--; pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_remove_all: found" " a 4mpage in page %p's pv list", m)); pte = pmap_pte_quick(pmap, pv->pv_va); tpte = pte_load_clear(pte); KASSERT(tpte != 0, ("pmap_remove_all: pmap %p va %x zero pte", pmap, pv->pv_va)); if (tpte & PG_W) pmap->pm_stats.wired_count--; if (tpte & PG_A) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); pmap_unuse_pt(pmap, pv->pv_va, &free); pmap_invalidate_page(pmap, pv->pv_va); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); pmap_free_zero_pages(&free); } /* * pmap_protect_pde: do the things to protect a 4mpage in a process */ static boolean_t pmap_protect_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t sva, vm_prot_t prot) { pd_entry_t newpde, oldpde; vm_offset_t eva, va; vm_page_t m; boolean_t anychanged; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PDRMASK) == 0, ("pmap_protect_pde: sva is not 4mpage aligned")); anychanged = FALSE; retry: oldpde = newpde = *pde; if (oldpde & PG_MANAGED) { eva = sva + NBPDR; for (va = sva, m = PHYS_TO_VM_PAGE(oldpde & PG_PS_FRAME); va < eva; va += PAGE_SIZE, m++) if ((oldpde & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); } if ((prot & VM_PROT_WRITE) == 0) newpde &= ~(PG_RW | PG_M); #if defined(PAE) || defined(PAE_TABLES) if ((prot & VM_PROT_EXECUTE) == 0) newpde |= pg_nx; #endif if (newpde != oldpde) { if (!pde_cmpset(pde, oldpde, newpde)) goto retry; if (oldpde & PG_G) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } return (anychanged); } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t pdnxt; pd_entry_t ptpaddr; pt_entry_t *pte; boolean_t anychanged, pv_lists_locked; KASSERT((prot & ~VM_PROT_ALL) == 0, ("invalid prot %x", prot)); if (prot == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } #if defined(PAE) || defined(PAE_TABLES) if ((prot & (VM_PROT_WRITE|VM_PROT_EXECUTE)) == (VM_PROT_WRITE|VM_PROT_EXECUTE)) return; #else if (prot & VM_PROT_WRITE) return; #endif if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } anychanged = FALSE; PMAP_LOCK(pmap); for (; sva < eva; sva = pdnxt) { pt_entry_t obits, pbits; u_int pdirindex; pdnxt = (sva + NBPDR) & ~PDRMASK; if (pdnxt < sva) pdnxt = eva; pdirindex = sva >> PDRSHIFT; ptpaddr = pmap->pm_pdir[pdirindex]; /* * Weed out invalid mappings. Note: we assume that the page * directory table is always allocated, and in kernel virtual. 
*/ if (ptpaddr == 0) continue; /* * Check for large page. */ if ((ptpaddr & PG_PS) != 0) { /* * Are we protecting the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == pdnxt && eva >= pdnxt) { /* * The TLB entry for a PG_G mapping is * invalidated by pmap_protect_pde(). */ if (pmap_protect_pde(pmap, &pmap->pm_pdir[pdirindex], sva, prot)) anychanged = TRUE; continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { if (anychanged) pmap_invalidate_all( pmap); PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pde(pmap, &pmap->pm_pdir[pdirindex], sva)) { /* * The large page mapping was * destroyed. */ continue; } } } if (pdnxt > eva) pdnxt = eva; for (pte = pmap_pte_quick(pmap, sva); sva != pdnxt; pte++, sva += PAGE_SIZE) { vm_page_t m; retry: /* * Regardless of whether a pte is 32 or 64 bits in * size, PG_RW, PG_A, and PG_M are among the least * significant 32 bits. */ obits = pbits = *pte; if ((pbits & PG_V) == 0) continue; if ((prot & VM_PROT_WRITE) == 0) { if ((pbits & (PG_MANAGED | PG_M | PG_RW)) == (PG_MANAGED | PG_M | PG_RW)) { m = PHYS_TO_VM_PAGE(pbits & PG_FRAME); vm_page_dirty(m); } pbits &= ~(PG_RW | PG_M); } #if defined(PAE) || defined(PAE_TABLES) if ((prot & VM_PROT_EXECUTE) == 0) pbits |= pg_nx; #endif if (pbits != obits) { #if defined(PAE) || defined(PAE_TABLES) if (!atomic_cmpset_64(pte, obits, pbits)) goto retry; #else if (!atomic_cmpset_int((u_int *)pte, obits, pbits)) goto retry; #endif if (obits & PG_G) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } } } if (anychanged) pmap_invalidate_all(pmap); if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Tries to promote the 512 or 1024, contiguous 4KB page mappings that are * within a single page table page (PTP) to a single 2- or 4MB page mapping. * For promotion to occur, two conditions must be met: (1) the 4KB page * mappings must map aligned, contiguous physical memory and (2) the 4KB page * mappings must have identical characteristics. * * Managed (PG_MANAGED) mappings within the kernel address space are not * promoted. The reason is that kernel PDEs are replicated in each pmap but * pmap_clear_ptes() and pmap_ts_referenced() only read the PDE from the kernel * pmap. */ static void pmap_promote_pde(pmap_t pmap, pd_entry_t *pde, vm_offset_t va) { pd_entry_t newpde; pt_entry_t *firstpte, oldpte, pa, *pte; vm_offset_t oldpteva; vm_page_t mpte; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Examine the first PTE in the specified PTP. Abort if this PTE is * either invalid, unused, or does not map the first 4KB physical page * within a 2- or 4MB page. */ firstpte = pmap_pte_quick(pmap, trunc_4mpage(va)); setpde: newpde = *firstpte; if ((newpde & ((PG_FRAME & PDRMASK) | PG_A | PG_V)) != (PG_A | PG_V)) { pmap_pde_p_failures++; CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#x" " in pmap %p", va, pmap); return; } if ((*firstpte & PG_MANAGED) != 0 && pmap == kernel_pmap) { pmap_pde_p_failures++; CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#x" " in pmap %p", va, pmap); return; } if ((newpde & (PG_M | PG_RW)) == PG_RW) { /* * When PG_M is already clear, PG_RW can be cleared without * a TLB invalidation. */ if (!atomic_cmpset_int((u_int *)firstpte, newpde, newpde & ~PG_RW)) goto setpde; newpde &= ~PG_RW; } /* * Examine each of the other PTEs in the specified PTP. Abort if this * PTE maps an unexpected 4KB physical page or does not have identical * characteristics to the first PTE. 
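 * A worked example (illustrative numbers): with 4KB pages and 4MB superpages, NPTEPG is 1024, so the scan below starts at the last PTE, which must map the first page's physical address plus 4MB - 4KB, and walks downward expecting each PTE to map exactly one page less; any mismatch in frame or attributes aborts the promotion.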
*/ pa = (newpde & (PG_PS_FRAME | PG_A | PG_V)) + NBPDR - PAGE_SIZE; for (pte = firstpte + NPTEPG - 1; pte > firstpte; pte--) { setpte: oldpte = *pte; if ((oldpte & (PG_FRAME | PG_A | PG_V)) != pa) { pmap_pde_p_failures++; CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#x" " in pmap %p", va, pmap); return; } if ((oldpte & (PG_M | PG_RW)) == PG_RW) { /* * When PG_M is already clear, PG_RW can be cleared * without a TLB invalidation. */ if (!atomic_cmpset_int((u_int *)pte, oldpte, oldpte & ~PG_RW)) goto setpte; oldpte &= ~PG_RW; oldpteva = (oldpte & PG_FRAME & PDRMASK) | (va & ~PDRMASK); CTR2(KTR_PMAP, "pmap_promote_pde: protect for va %#x" " in pmap %p", oldpteva, pmap); } if ((oldpte & PG_PTE_PROMOTE) != (newpde & PG_PTE_PROMOTE)) { pmap_pde_p_failures++; CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#x" " in pmap %p", va, pmap); return; } pa -= PAGE_SIZE; } /* * Save the page table page in its current state until the PDE * mapping the superpage is demoted by pmap_demote_pde() or * destroyed by pmap_remove_pde(). */ mpte = PHYS_TO_VM_PAGE(*pde & PG_FRAME); KASSERT(mpte >= vm_page_array && mpte < &vm_page_array[vm_page_array_size], ("pmap_promote_pde: page table page is out of range")); KASSERT(mpte->pindex == va >> PDRSHIFT, ("pmap_promote_pde: page table page's pindex is wrong")); if (pmap_insert_pt_page(pmap, mpte)) { pmap_pde_p_failures++; CTR2(KTR_PMAP, "pmap_promote_pde: failure for va %#x in pmap %p", va, pmap); return; } /* * Promote the pv entries. */ if ((newpde & PG_MANAGED) != 0) pmap_pv_promote_pde(pmap, va, newpde & PG_PS_FRAME); /* * Propagate the PAT index to its proper position. */ if ((newpde & PG_PTE_PAT) != 0) newpde ^= PG_PDE_PAT | PG_PTE_PAT; /* * Map the superpage. */ if (workaround_erratum383) pmap_update_pde(pmap, va, pde, PG_PS | newpde); else if (pmap == kernel_pmap) pmap_kenter_pde(va, PG_PS | newpde); else pde_store(pde, PG_PS | newpde); pmap_pde_promotions++; CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#x" " in pmap %p", va, pmap); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. * * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { pd_entry_t *pde; pt_entry_t *pte; pt_entry_t newpte, origpte; pv_entry_t pv; vm_paddr_t opa, pa; vm_page_t mpte, om; boolean_t invlva, wired; va = trunc_page(va); mpte = NULL; wired = (flags & PMAP_ENTER_WIRED) != 0; KASSERT(va <= VM_MAX_KERNEL_ADDRESS, ("pmap_enter: toobig")); KASSERT(va < UPT_MIN_ADDRESS || va >= UPT_MAX_ADDRESS, ("pmap_enter: invalid to pmap_enter page table pages (va: 0x%x)", va)); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); /* * In the case that a page table page is not * resident, we are creating it here. 
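 * (This only applies to user addresses: on i386 the kernel's page table pages are preallocated and shared by every pmap, so no page table page is allocated when va >= VM_MAXUSER_ADDRESS; hence the bound check below.)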
*/ if (va < VM_MAXUSER_ADDRESS) { mpte = pmap_allocpte(pmap, va, flags); if (mpte == NULL) { KASSERT((flags & PMAP_ENTER_NOSLEEP) != 0, ("pmap_allocpte failed with sleep allowed")); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } } pde = pmap_pde(pmap, va); if ((*pde & PG_PS) != 0) panic("pmap_enter: attempted pmap_enter on 4MB page"); pte = pmap_pte_quick(pmap, va); /* * Page Directory table entry not valid, we need a new PT page */ if (pte == NULL) { panic("pmap_enter: invalid page directory pdir=%#jx, va=%#x", (uintmax_t)pmap->pm_pdir[PTDPTDI], va); } pa = VM_PAGE_TO_PHYS(m); om = NULL; origpte = *pte; opa = origpte & PG_FRAME; /* * Mapping has not changed, must be protection or wiring change. */ if (origpte && (opa == pa)) { /* * Wiring change, just update stats. We don't worry about * wiring PT pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT page will be also. */ if (wired && ((origpte & PG_W) == 0)) pmap->pm_stats.wired_count++; else if (!wired && (origpte & PG_W)) pmap->pm_stats.wired_count--; /* * Remove extra pte reference */ if (mpte) mpte->wire_count--; if (origpte & PG_MANAGED) { om = m; pa |= PG_MANAGED; } goto validate; } pv = NULL; /* * Mapping has changed, invalidate old range and fall through to * handle validating new mapping. */ if (opa) { if (origpte & PG_W) pmap->pm_stats.wired_count--; if (origpte & PG_MANAGED) { om = PHYS_TO_VM_PAGE(opa); pv = pmap_pvh_remove(&om->md, pmap, va); } if (mpte != NULL) { mpte->wire_count--; KASSERT(mpte->wire_count > 0, ("pmap_enter: missing reference to page table page," " va: 0x%x", va)); } } else pmap->pm_stats.resident_count++; /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva, ("pmap_enter: managed mapping within the clean submap")); if (pv == NULL) pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); pa |= PG_MANAGED; } else if (pv != NULL) free_pv_entry(pmap, pv); /* * Increment counters */ if (wired) pmap->pm_stats.wired_count++; validate: /* * Now validate mapping with desired protection/wiring. */ newpte = (pt_entry_t)(pa | pmap_cache_bits(m->md.pat_mode, 0) | PG_V); if ((prot & VM_PROT_WRITE) != 0) { newpte |= PG_RW; if ((newpte & PG_MANAGED) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } #if defined(PAE) || defined(PAE_TABLES) if ((prot & VM_PROT_EXECUTE) == 0) newpte |= pg_nx; #endif if (wired) newpte |= PG_W; if (va < VM_MAXUSER_ADDRESS) newpte |= PG_U; if (pmap == kernel_pmap) newpte |= pgeflag; /* * if the mapping or permission bits are different, we need * to update the pte. 
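 * (PG_M and PG_A are masked out of the comparison below because the MMU may set them asynchronously; a difference in those bits alone does not require rewriting the PTE.)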
*/ if ((origpte & ~(PG_M|PG_A)) != newpte) { newpte |= PG_A; if ((flags & VM_PROT_WRITE) != 0) newpte |= PG_M; if (origpte & PG_V) { invlva = FALSE; origpte = pte_load_store(pte, newpte); if (origpte & PG_A) { if (origpte & PG_MANAGED) vm_page_aflag_set(om, PGA_REFERENCED); if (opa != VM_PAGE_TO_PHYS(m)) invlva = TRUE; #if defined(PAE) || defined(PAE_TABLES) if ((origpte & PG_NX) == 0 && (newpte & PG_NX) != 0) invlva = TRUE; #endif } if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if ((origpte & PG_MANAGED) != 0) vm_page_dirty(om); if ((prot & VM_PROT_WRITE) == 0) invlva = TRUE; } if ((origpte & PG_MANAGED) != 0 && TAILQ_EMPTY(&om->md.pv_list) && ((om->flags & PG_FICTITIOUS) != 0 || TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list))) vm_page_aflag_clear(om, PGA_WRITEABLE); if (invlva) pmap_invalidate_page(pmap, va); } else pte_store(pte, newpte); } /* * If both the page table page and the reservation are fully * populated, then attempt promotion. */ if ((mpte == NULL || mpte->wire_count == NPTEPG) && pg_ps_enabled && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) pmap_promote_pde(pmap, pde, va); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } /* * Tries to create a 2- or 4MB page mapping. Returns TRUE if successful and * FALSE otherwise. Fails if (1) a page table page cannot be allocated without * blocking, (2) a mapping already exists at the specified virtual address, or * (3) a pv entry cannot be allocated without reclaiming another pv entry. */ static boolean_t pmap_enter_pde(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { pd_entry_t *pde, newpde; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pde = pmap_pde(pmap, va); if (*pde != 0) { CTR2(KTR_PMAP, "pmap_enter_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } newpde = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(m->md.pat_mode, 1) | PG_PS | PG_V; if ((m->oflags & VPO_UNMANAGED) == 0) { newpde |= PG_MANAGED; /* * Abort this mapping if its PV entry could not be created. */ if (!pmap_pv_insert_pde(pmap, va, VM_PAGE_TO_PHYS(m))) { CTR2(KTR_PMAP, "pmap_enter_pde: failure for va %#lx" " in pmap %p", va, pmap); return (FALSE); } } #if defined(PAE) || defined(PAE_TABLES) if ((prot & VM_PROT_EXECUTE) == 0) newpde |= pg_nx; #endif if (va < VM_MAXUSER_ADDRESS) newpde |= PG_U; /* * Increment counters. */ pmap->pm_stats.resident_count += NBPDR / PAGE_SIZE; /* * Map the superpage. */ pde_store(pde, newpde); pmap_pde_mappings++; CTR2(KTR_PMAP, "pmap_enter_pde: success for va %#lx" " in pmap %p", va, pmap); return (TRUE); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. 
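 * For example (hypothetical values): if m_start has pindex 10 and is mapped at start, a resident page with pindex 12 is mapped at start + 2 * PAGE_SIZE, while a non-resident pindex 11 simply leaves that virtual page unmapped.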
*/ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_offset_t va; vm_page_t m, mpte; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpte = NULL; m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); if ((va & PDRMASK) == 0 && va + NBPDR <= end && m->psind == 1 && pg_ps_enabled && pmap_enter_pde(pmap, va, m, prot)) m = &m[NBPDR / PAGE_SIZE - 1]; else mpte = pmap_enter_quick_locked(pmap, va, m, prot, mpte); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... */ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte) { pt_entry_t *pte; vm_paddr_t pa; struct spglist free; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("pmap_enter_quick_locked: managed mapping within the clean submap")); rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { u_int ptepindex; pd_entry_t ptepa; /* * Calculate pagetable page index */ ptepindex = va >> PDRSHIFT; if (mpte && (mpte->pindex == ptepindex)) { mpte->wire_count++; } else { /* * Get the page directory entry */ ptepa = pmap->pm_pdir[ptepindex]; /* * If the page table page is mapped, we just increment * the hold count, and activate it. */ if (ptepa) { if (ptepa & PG_PS) return (NULL); mpte = PHYS_TO_VM_PAGE(ptepa & PG_FRAME); mpte->wire_count++; } else { mpte = _pmap_allocpte(pmap, ptepindex, PMAP_ENTER_NOSLEEP); if (mpte == NULL) return (mpte); } } } else { mpte = NULL; } /* * This call to vtopte makes the assumption that we are * entering the page into the current pmap. In order to support * quick entry into any pmap, one would likely use pmap_pte_quick. * But that isn't as quick as vtopte. */ pte = vtopte(va); if (*pte) { if (mpte != NULL) { mpte->wire_count--; mpte = NULL; } return (mpte); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m)) { if (mpte != NULL) { SLIST_INIT(&free); if (pmap_unwire_ptp(pmap, mpte, &free)) { pmap_invalidate_page(pmap, va); pmap_free_zero_pages(&free); } mpte = NULL; } return (mpte); } /* * Increment counters */ pmap->pm_stats.resident_count++; pa = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(m->md.pat_mode, 0); #if defined(PAE) || defined(PAE_TABLES) if ((prot & VM_PROT_EXECUTE) == 0) pa |= pg_nx; #endif /* * Now validate mapping with RO protection */ if ((m->oflags & VPO_UNMANAGED) != 0) pte_store(pte, pa | PG_V | PG_U); else pte_store(pte, pa | PG_V | PG_U | PG_MANAGED); return (mpte); } /* * Make a temporary mapping for a physical address. This is only intended * to be used for panic dumps. 
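 * (A sketch of the intended use, assuming the usual dump path: the dump code windows successive physical pages through crashdumpmap by calling pmap_kenter_temporary(pa, i) for i = 0, 1, ...; only the local TLB entry is invalidated, which is acceptable in the single-threaded panic context.)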
*/ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); invlpg(va); return ((void *)crashdumpmap); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { pd_entry_t *pde; vm_paddr_t pa, ptepa; vm_page_t p; int pat_mode; VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); if (pseflag && (addr & (NBPDR - 1)) == 0 && (size & (NBPDR - 1)) == 0) { if (!vm_object_populate(object, pindex, pindex + atop(size))) return; p = vm_page_lookup(object, pindex); KASSERT(p->valid == VM_PAGE_BITS_ALL, ("pmap_object_init_pt: invalid page %p", p)); pat_mode = p->md.pat_mode; /* * Abort the mapping if the first page is not physically * aligned to a 2/4MB page boundary. */ ptepa = VM_PAGE_TO_PHYS(p); if (ptepa & (NBPDR - 1)) return; /* * Skip the first page. Abort the mapping if the rest of * the pages are not physically contiguous or have differing * memory attributes. */ p = TAILQ_NEXT(p, listq); for (pa = ptepa + PAGE_SIZE; pa < ptepa + size; pa += PAGE_SIZE) { KASSERT(p->valid == VM_PAGE_BITS_ALL, ("pmap_object_init_pt: invalid page %p", p)); if (pa != VM_PAGE_TO_PHYS(p) || pat_mode != p->md.pat_mode) return; p = TAILQ_NEXT(p, listq); } /* * Map using 2/4MB pages. Since "ptepa" is 2/4M aligned and * "size" is a multiple of 2/4M, adding the PAT setting to * "pa" will not affect the termination of this loop. */ PMAP_LOCK(pmap); for (pa = ptepa | pmap_cache_bits(pat_mode, 1); pa < ptepa + size; pa += NBPDR) { pde = pmap_pde(pmap, addr); if (*pde == 0) { pde_store(pde, pa | PG_PS | PG_M | PG_A | PG_U | PG_RW | PG_V); pmap->pm_stats.resident_count += NBPDR / PAGE_SIZE; pmap_pde_mappings++; } /* Else continue on if the PDE is already valid. */ addr += NBPDR; } PMAP_UNLOCK(pmap); } } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, * so there is no need to invalidate any TLB entries. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t pdnxt; pd_entry_t *pde; pt_entry_t *pte; boolean_t pv_lists_locked; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = pdnxt) { pdnxt = (sva + NBPDR) & ~PDRMASK; if (pdnxt < sva) pdnxt = eva; pde = pmap_pde(pmap, sva); if ((*pde & PG_V) == 0) continue; if ((*pde & PG_PS) != 0) { if ((*pde & PG_W) == 0) panic("pmap_unwire: pde %#jx is missing PG_W", (uintmax_t)*pde); /* * Are we unwiring the entire large page? If not, * demote the mapping and fall through. */ if (sva + NBPDR == pdnxt && eva >= pdnxt) { /* * Regardless of whether a pde (or pte) is 32 * or 64 bits in size, PG_W is among the least * significant 32 bits. */ atomic_clear_int((u_int *)pde, PG_W); pmap->pm_stats.wired_count -= NBPDR / PAGE_SIZE; continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); /* Repeat sva. 
*/ goto resume; } sched_pin(); } if (!pmap_demote_pde(pmap, pde, sva)) panic("pmap_unwire: demotion failed"); } } if (pdnxt > eva) pdnxt = eva; for (pte = pmap_pte_quick(pmap, sva); sva != pdnxt; pte++, sva += PAGE_SIZE) { if ((*pte & PG_V) == 0) continue; if ((*pte & PG_W) == 0) panic("pmap_unwire: pte %#jx is missing PG_W", (uintmax_t)*pte); /* * PG_W must be cleared atomically. Although the pmap * lock synchronizes access to PG_W, another processor * could be setting PG_M and/or PG_A concurrently. * * PG_W is among the least significant 32 bits. */ atomic_clear_int((u_int *)pte, PG_W); pmap->pm_stats.wired_count--; } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. */ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { struct spglist free; vm_offset_t addr; vm_offset_t end_addr = src_addr + len; vm_offset_t pdnxt; if (dst_addr != src_addr) return; if (!pmap_is_current(src_pmap)) return; rw_wlock(&pvh_global_lock); if (dst_pmap < src_pmap) { PMAP_LOCK(dst_pmap); PMAP_LOCK(src_pmap); } else { PMAP_LOCK(src_pmap); PMAP_LOCK(dst_pmap); } sched_pin(); for (addr = src_addr; addr < end_addr; addr = pdnxt) { pt_entry_t *src_pte, *dst_pte; vm_page_t dstmpte, srcmpte; pd_entry_t srcptepaddr; u_int ptepindex; KASSERT(addr < UPT_MIN_ADDRESS, ("pmap_copy: invalid to pmap_copy page tables")); pdnxt = (addr + NBPDR) & ~PDRMASK; if (pdnxt < addr) pdnxt = end_addr; ptepindex = addr >> PDRSHIFT; srcptepaddr = src_pmap->pm_pdir[ptepindex]; if (srcptepaddr == 0) continue; if (srcptepaddr & PG_PS) { if ((addr & PDRMASK) != 0 || addr + NBPDR > end_addr) continue; if (dst_pmap->pm_pdir[ptepindex] == 0 && ((srcptepaddr & PG_MANAGED) == 0 || pmap_pv_insert_pde(dst_pmap, addr, srcptepaddr & PG_PS_FRAME))) { dst_pmap->pm_pdir[ptepindex] = srcptepaddr & ~PG_W; dst_pmap->pm_stats.resident_count += NBPDR / PAGE_SIZE; pmap_pde_mappings++; } continue; } srcmpte = PHYS_TO_VM_PAGE(srcptepaddr & PG_FRAME); KASSERT(srcmpte->wire_count > 0, ("pmap_copy: source page table page is unused")); if (pdnxt > end_addr) pdnxt = end_addr; src_pte = vtopte(addr); while (addr < pdnxt) { pt_entry_t ptetemp; ptetemp = *src_pte; /* * we only virtual copy managed pages */ if ((ptetemp & PG_MANAGED) != 0) { dstmpte = pmap_allocpte(dst_pmap, addr, PMAP_ENTER_NOSLEEP); if (dstmpte == NULL) goto out; dst_pte = pmap_pte_quick(dst_pmap, addr); if (*dst_pte == 0 && pmap_try_insert_pv_entry(dst_pmap, addr, PHYS_TO_VM_PAGE(ptetemp & PG_FRAME))) { /* * Clear the wired, modified, and * accessed (referenced) bits * during the copy. */ *dst_pte = ptetemp & ~(PG_W | PG_M | PG_A); dst_pmap->pm_stats.resident_count++; } else { SLIST_INIT(&free); if (pmap_unwire_ptp(dst_pmap, dstmpte, &free)) { pmap_invalidate_page(dst_pmap, addr); pmap_free_zero_pages(&free); } goto out; } if (dstmpte->wire_count >= srcmpte->wire_count) break; } addr += PAGE_SIZE; src_pte++; } } out: sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(src_pmap); PMAP_UNLOCK(dst_pmap); } /* * Zero 1 page of virtual memory mapped from a hardware page by the caller. 
*/ static __inline void pagezero(void *page) { #if defined(I686_CPU) if (cpu_class == CPUCLASS_686) { #if defined(CPU_ENABLE_SSE) if (cpu_feature & CPUID_SSE2) sse2_pagezero(page); else #endif i686_pagezero(page); } else #endif bzero(page, PAGE_SIZE); } /* * Zero the specified hardware page. */ void pmap_zero_page(vm_page_t m) { struct sysmaps *sysmaps; sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP2) panic("pmap_zero_page: CMAP2 busy"); sched_pin(); *sysmaps->CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M | pmap_cache_bits(m->md.pat_mode, 0); invlcaddr(sysmaps->CADDR2); pagezero(sysmaps->CADDR2); *sysmaps->CMAP2 = 0; sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * Zero an area within a single hardware page. off and size must not * cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { struct sysmaps *sysmaps; sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP2) panic("pmap_zero_page_area: CMAP2 busy"); sched_pin(); *sysmaps->CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M | pmap_cache_bits(m->md.pat_mode, 0); invlcaddr(sysmaps->CADDR2); if (off == 0 && size == PAGE_SIZE) pagezero(sysmaps->CADDR2); else bzero((char *)sysmaps->CADDR2 + off, size); *sysmaps->CMAP2 = 0; sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * Copy 1 specified hardware page to another. */ void pmap_copy_page(vm_page_t src, vm_page_t dst) { struct sysmaps *sysmaps; sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP1) panic("pmap_copy_page: CMAP1 busy"); if (*sysmaps->CMAP2) panic("pmap_copy_page: CMAP2 busy"); sched_pin(); *sysmaps->CMAP1 = PG_V | VM_PAGE_TO_PHYS(src) | PG_A | pmap_cache_bits(src->md.pat_mode, 0); invlcaddr(sysmaps->CADDR1); *sysmaps->CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(dst) | PG_A | PG_M | pmap_cache_bits(dst->md.pat_mode, 0); invlcaddr(sysmaps->CADDR2); bcopy(sysmaps->CADDR1, sysmaps->CADDR2, PAGE_SIZE); *sysmaps->CMAP1 = 0; *sysmaps->CMAP2 = 0; sched_unpin(); mtx_unlock(&sysmaps->lock); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { struct sysmaps *sysmaps; vm_page_t a_pg, b_pg; char *a_cp, *b_cp; vm_offset_t a_pg_offset, b_pg_offset; int cnt; sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP1 != 0) panic("pmap_copy_pages: CMAP1 busy"); if (*sysmaps->CMAP2 != 0) panic("pmap_copy_pages: CMAP2 busy"); sched_pin(); while (xfersize > 0) { a_pg = ma[a_offset >> PAGE_SHIFT]; a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); b_pg = mb[b_offset >> PAGE_SHIFT]; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); *sysmaps->CMAP1 = PG_V | VM_PAGE_TO_PHYS(a_pg) | PG_A | pmap_cache_bits(a_pg->md.pat_mode, 0); invlcaddr(sysmaps->CADDR1); *sysmaps->CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(b_pg) | PG_A | PG_M | pmap_cache_bits(b_pg->md.pat_mode, 0); invlcaddr(sysmaps->CADDR2); a_cp = sysmaps->CADDR1 + a_pg_offset; b_cp = sysmaps->CADDR2 + b_pg_offset; bcopy(a_cp, b_cp, cnt); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } *sysmaps->CMAP1 = 0; *sysmaps->CMAP2 = 0; sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page.
This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct md_page *pvh; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } } rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); count = pmap_pvh_wired_mappings(&m->md, count); if ((m->flags & PG_FICTITIOUS) == 0) { count = pmap_pvh_wired_mappings(pa_to_pvh(VM_PAGE_TO_PHYS(m)), count); } rw_wunlock(&pvh_global_lock); return (count); } /* * pmap_pvh_wired_mappings: * * Return the updated number "count" of managed mappings that are wired. */ static int pmap_pvh_wired_mappings(struct md_page *pvh, int count) { pmap_t pmap; pt_entry_t *pte; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte = pmap_pte_quick(pmap, pv->pv_va); if ((*pte & PG_W) != 0) count++; PMAP_UNLOCK(pmap); } sched_unpin(); return (count); } /* * Returns TRUE if the given page is mapped individually or as part of * a 4mpage. Otherwise, returns FALSE. */ boolean_t pmap_page_is_mapped(vm_page_t m) { boolean_t rv; if ((m->oflags & VPO_UNMANAGED) != 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = !TAILQ_EMPTY(&m->md.pv_list) || ((m->flags & PG_FICTITIOUS) == 0 && !TAILQ_EMPTY(&pa_to_pvh(VM_PAGE_TO_PHYS(m))->pv_list)); rw_wunlock(&pvh_global_lock); return (rv); } /* * Remove all pages from specified address space * this aids process exit speeds. Also, this code * is special cased for current process only, but * can have the more generic (and slightly slower) * mode enabled. This is much faster than pmap_remove * in the case of running down an entire address space. 
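 * (It is typically reached from the process-exit path once the address space is known to be dying; note the guard below that refuses to operate on any pmap other than the current one.)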
*/ void pmap_remove_pages(pmap_t pmap) { pt_entry_t *pte, tpte; vm_page_t m, mpte, mt; pv_entry_t pv; struct md_page *pvh; struct pv_chunk *pc, *npc; struct spglist free; int field, idx; int32_t bit; uint32_t inuse, bitmask; int allfree; if (pmap != PCPU_GET(curpmap)) { printf("warning: pmap_remove_pages called with non-current pmap\n"); return; } SLIST_INIT(&free); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { KASSERT(pc->pc_pmap == pmap, ("Wrong pmap %p %p", pmap, pc->pc_pmap)); allfree = 1; for (field = 0; field < _NPCM; field++) { inuse = ~pc->pc_map[field] & pc_freemask[field]; while (inuse != 0) { bit = bsfl(inuse); bitmask = 1UL << bit; idx = field * 32 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; pte = pmap_pde(pmap, pv->pv_va); tpte = *pte; if ((tpte & PG_PS) == 0) { pte = vtopte(pv->pv_va); tpte = *pte & ~PG_PTE_PAT; } if (tpte == 0) { printf( "TPTE at %p IS ZERO @ VA %08x\n", pte, pv->pv_va); panic("bad pte"); } /* * We cannot remove wired pages from a process' mapping at this time */ if (tpte & PG_W) { allfree = 0; continue; } m = PHYS_TO_VM_PAGE(tpte & PG_FRAME); KASSERT(m->phys_addr == (tpte & PG_FRAME), ("vm_page_t %p phys_addr mismatch %016jx %016jx", m, (uintmax_t)m->phys_addr, (uintmax_t)tpte)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("pmap_remove_pages: bad tpte %#jx", (uintmax_t)tpte)); pte_clear(pte); /* * Update the vm_page_t clean/reference bits. */ if ((tpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if ((tpte & PG_PS) != 0) { for (mt = m; mt < &m[NBPDR / PAGE_SIZE]; mt++) vm_page_dirty(mt); } else vm_page_dirty(m); } /* Mark free */ PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc->pc_map[field] |= bitmask; if ((tpte & PG_PS) != 0) { pmap->pm_stats.resident_count -= NBPDR / PAGE_SIZE; pvh = pa_to_pvh(tpte & PG_PS_FRAME); TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); if (TAILQ_EMPTY(&pvh->pv_list)) { for (mt = m; mt < &m[NBPDR / PAGE_SIZE]; mt++) if (TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } mpte = pmap_lookup_pt_page(pmap, pv->pv_va); if (mpte != NULL) { pmap_remove_pt_page(pmap, mpte); pmap->pm_stats.resident_count--; KASSERT(mpte->wire_count == NPTEPG, ("pmap_remove_pages: pte page wire count error")); mpte->wire_count = 0; pmap_add_delayed_free_list(mpte, &free, FALSE); atomic_subtract_int(&vm_cnt.v_wire_count, 1); } } else { pmap->pm_stats.resident_count--; TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } pmap_unuse_pt(pmap, pv->pv_va, &free); } } } if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } sched_unpin(); pmap_invalidate_all(pmap); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. */ boolean_t pmap_is_modified(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can have PG_M set. 
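 * (This makes the PGA_WRITEABLE test below a cheap early exit: a page that was never mapped writeable cannot have any PG_M bits to report.)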
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = pmap_is_modified_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_modified_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * Returns TRUE if any of the given mappings were used to modify * physical memory. Otherwise, returns FALSE. Both page and 2mpage * mappings are supported. */ static boolean_t pmap_is_modified_pvh(struct md_page *pvh) { pv_entry_t pv; pt_entry_t *pte; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte = pmap_pte_quick(pmap, pv->pv_va); rv = (*pte & (PG_M | PG_RW)) == (PG_M | PG_RW); PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pd_entry_t *pde; pt_entry_t *pte; boolean_t rv; rv = FALSE; PMAP_LOCK(pmap); pde = pmap_pde(pmap, addr); if (*pde != 0 && (*pde & PG_PS) == 0) { pte = vtopte(addr); rv = *pte == 0; } PMAP_UNLOCK(pmap); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); rw_wlock(&pvh_global_lock); rv = pmap_is_referenced_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_referenced_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * Returns TRUE if any of the given mappings were referenced and FALSE * otherwise. Both page and 4mpage mappings are supported. */ static boolean_t pmap_is_referenced_pvh(struct md_page *pvh) { pv_entry_t pv; pt_entry_t *pte; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte = pmap_pte_quick(pmap, pv->pv_va); rv = (*pte & (PG_A | PG_V)) == (PG_A | PG_V); PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pd_entry_t *pde; pt_entry_t oldpte, *pte; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating.
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, va); if ((*pde & PG_RW) != 0) (void)pmap_demote_pde(pmap, pde, va); PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_clear_write: found" " a 4mpage in page %p's pv list", m)); pte = pmap_pte_quick(pmap, pv->pv_va); retry: oldpte = *pte; if ((oldpte & PG_RW) != 0) { /* * Regardless of whether a pte is 32 or 64 bits * in size, PG_RW and PG_M are among the least * significant 32 bits. */ if (!atomic_cmpset_int((u_int *)pte, oldpte, oldpte & ~(PG_RW | PG_M))) goto retry; if ((oldpte & PG_M) != 0) vm_page_dirty(m); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. - * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. * * As an optimization, update the page's dirty field if a modified bit is * found while counting reference bits. This opportunistic update can be * performed at low cost and can eliminate the need for some future calls * to pmap_is_modified(). However, since this function stops after * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some * dirty pages. Those dirty pages will only be detected by a future call * to pmap_is_modified(). */ int pmap_ts_referenced(vm_page_t m) { struct md_page *pvh; pv_entry_t pv, pvf; pmap_t pmap; pd_entry_t *pde; pt_entry_t *pte; vm_paddr_t pa; int rtval = 0; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); pa = VM_PAGE_TO_PHYS(m); pvh = pa_to_pvh(pa); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0 || (pvf = TAILQ_FIRST(&pvh->pv_list)) == NULL) goto small_mappings; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, pv->pv_va); if ((*pde & (PG_M | PG_RW)) == (PG_M | PG_RW)) { /* * Although "*pde" is mapping a 2/4MB page, because * this function is called at a 4KB page granularity, * we only update the 4KB page under test. */ vm_page_dirty(m); } if ((*pde & PG_A) != 0) { /* * Since this reference bit is shared by either 1024 * or 512 4KB pages, it should not be cleared every * time it is tested. Apply a simple "hash" function * on the physical page number, the virtual superpage * number, and the pmap address to select one 4KB page * out of the 1024 or 512 on which testing the * reference bit will result in clearing that bit. * This function is designed to avoid the selection of * the same 4KB page for every 2- or 4MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a * subsequent page fault on a demoted wired mapping, * always leave its reference bit set. 
Moreover, * since the superpage is wired, the current state of * its reference bit won't affect page replacement. */ if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> PDRSHIFT) ^ (uintptr_t)pmap) & (NPTEPG - 1)) == 0 && (*pde & PG_W) == 0) { atomic_clear_int((u_int *)pde, PG_A); pmap_invalidate_page(pmap, pv->pv_va); } rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); } if (rtval >= PMAP_TS_REFERENCED_MAX) goto out; } while ((pv = TAILQ_FIRST(&pvh->pv_list)) != pvf); small_mappings: if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_ts_referenced: found a 4mpage in page %p's pv list", m)); pte = pmap_pte_quick(pmap, pv->pv_va); if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) vm_page_dirty(m); if ((*pte & PG_A) != 0) { atomic_clear_int((u_int *)pte, PG_A); pmap_invalidate_page(pmap, pv->pv_va); rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && rtval < PMAP_TS_REFERENCED_MAX); out: sched_unpin(); rw_wunlock(&pvh_global_lock); return (rtval); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { pd_entry_t oldpde, *pde; pt_entry_t *pte; vm_offset_t pdnxt; vm_page_t m; boolean_t anychanged, pv_lists_locked; if (advice != MADV_DONTNEED && advice != MADV_FREE) return; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } anychanged = FALSE; PMAP_LOCK(pmap); for (; sva < eva; sva = pdnxt) { pdnxt = (sva + NBPDR) & ~PDRMASK; if (pdnxt < sva) pdnxt = eva; pde = pmap_pde(pmap, sva); oldpde = *pde; if ((oldpde & PG_V) == 0) continue; else if ((oldpde & PG_PS) != 0) { if ((oldpde & PG_MANAGED) == 0) continue; if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { if (anychanged) pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pde(pmap, pde, sva)) { /* * The large page mapping was destroyed. */ continue; } /* * Unless the page mappings are wired, remove the * mapping to a single page so that a subsequent * access may repromote. Since the underlying page * table page is fully populated, this removal never * frees a page table page. */ if ((oldpde & PG_W) == 0) { pte = pmap_pte_quick(pmap, sva); KASSERT((*pte & PG_V) != 0, ("pmap_advise: invalid PTE")); pmap_remove_pte(pmap, pte, sva, NULL); anychanged = TRUE; } } if (pdnxt > eva) pdnxt = eva; for (pte = pmap_pte_quick(pmap, sva); sva != pdnxt; pte++, sva += PAGE_SIZE) { if ((*pte & (PG_MANAGED | PG_V)) != (PG_MANAGED | PG_V)) continue; else if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { if (advice == MADV_DONTNEED) { /* * Future calls to pmap_is_modified() * can be avoided by making the page * dirty now. 
*/ m = PHYS_TO_VM_PAGE(*pte & PG_FRAME); vm_page_dirty(m); } atomic_clear_int((u_int *)pte, PG_M | PG_A); } else if ((*pte & PG_A) != 0) atomic_clear_int((u_int *)pte, PG_A); else continue; if ((*pte & PG_G) != 0) pmap_invalidate_page(pmap, sva); else anychanged = TRUE; } } if (anychanged) pmap_invalidate_all(pmap); if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pd_entry_t oldpde, *pde; pt_entry_t oldpte, *pte; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no PTEs can have PG_M set. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, va); oldpde = *pde; if ((oldpde & PG_RW) != 0) { if (pmap_demote_pde(pmap, pde, va)) { if ((oldpde & PG_W) == 0) { /* * Write protect the mapping to a * single page so that a subsequent * write access may repromote. */ va += VM_PAGE_TO_PHYS(m) - (oldpde & PG_PS_FRAME); pte = pmap_pte_quick(pmap, va); oldpte = *pte; if ((oldpte & PG_V) != 0) { /* * Regardless of whether a pte is 32 or 64 bits * in size, PG_RW and PG_M are among the least * significant 32 bits. */ while (!atomic_cmpset_int((u_int *)pte, oldpte, oldpte & ~(PG_M | PG_RW))) oldpte = *pte; vm_page_dirty(m); pmap_invalidate_page(pmap, va); } } } } PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pde = pmap_pde(pmap, pv->pv_va); KASSERT((*pde & PG_PS) == 0, ("pmap_clear_modify: found" " a 4mpage in page %p's pv list", m)); pte = pmap_pte_quick(pmap, pv->pv_va); if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) { /* * Regardless of whether a pte is 32 or 64 bits * in size, PG_M is among the least significant * 32 bits. */ atomic_clear_int((u_int *)pte, PG_M); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } sched_unpin(); rw_wunlock(&pvh_global_lock); } /* * Miscellaneous support routines follow */ /* Adjust the cache mode for a 4KB page mapped via a PTE. */ static __inline void pmap_pte_attr(pt_entry_t *pte, int cache_bits) { u_int opte, npte; /* * The cache mode bits are all in the low 32-bits of the * PTE, so we can just spin on updating the low 32-bits. */ do { opte = *(u_int *)pte; npte = opte & ~PG_PTE_CACHE; npte |= cache_bits; } while (npte != opte && !atomic_cmpset_int((u_int *)pte, opte, npte)); } /* Adjust the cache mode for a 2/4MB page mapped via a PDE. */ static __inline void pmap_pde_attr(pd_entry_t *pde, int cache_bits) { u_int opde, npde; /* * The cache mode bits are all in the low 32-bits of the * PDE, so we can just spin on updating the low 32-bits. */ do { opde = *(u_int *)pde; npde = opde & ~PG_PDE_CACHE; npde |= cache_bits; } while (npde != opde && !atomic_cmpset_int((u_int *)pde, opde, npde)); } /* * Map a set of physical memory pages into the kernel virtual * address space. 
Return a pointer to where it is mapped. This * routine is intended to be used for mapping device memory, * NOT real memory. */ void * pmap_mapdev_attr(vm_paddr_t pa, vm_size_t size, int mode) { struct pmap_preinit_mapping *ppim; vm_offset_t va, offset; vm_size_t tmpsize; int i; offset = pa & PAGE_MASK; size = round_page(offset + size); pa = pa & PG_FRAME; if (pa < KERNLOAD && pa + size <= KERNLOAD) va = KERNBASE + pa; else if (!pmap_initialized) { va = 0; for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == 0) { ppim->pa = pa; ppim->sz = size; ppim->mode = mode; ppim->va = virtual_avail; virtual_avail += size; va = ppim->va; break; } } if (va == 0) panic("%s: too many preinit mappings", __func__); } else { /* * If we have a preinit mapping, re-use it. */ for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->pa == pa && ppim->sz == size && ppim->mode == mode) return ((void *)(ppim->va + offset)); } va = kva_alloc(size); if (va == 0) panic("%s: Couldn't allocate KVA", __func__); } for (tmpsize = 0; tmpsize < size; tmpsize += PAGE_SIZE) pmap_kenter_attr(va + tmpsize, pa + tmpsize, mode); pmap_invalidate_range(kernel_pmap, va, va + tmpsize); pmap_invalidate_cache_range(va, va + size, FALSE); return ((void *)(va + offset)); } void * pmap_mapdev(vm_paddr_t pa, vm_size_t size) { return (pmap_mapdev_attr(pa, size, PAT_UNCACHEABLE)); } void * pmap_mapbios(vm_paddr_t pa, vm_size_t size) { return (pmap_mapdev_attr(pa, size, PAT_WRITE_BACK)); } void pmap_unmapdev(vm_offset_t va, vm_size_t size) { struct pmap_preinit_mapping *ppim; vm_offset_t offset; int i; if (va >= KERNBASE && va + size <= KERNBASE + KERNLOAD) return; offset = va & PAGE_MASK; size = round_page(offset + size); va = trunc_page(va); for (i = 0; i < PMAP_PREINIT_MAPPING_COUNT; i++) { ppim = pmap_preinit_mapping + i; if (ppim->va == va && ppim->sz == size) { if (pmap_initialized) return; ppim->pa = 0; ppim->va = 0; ppim->sz = 0; ppim->mode = 0; if (va + size == virtual_avail) virtual_avail = va; return; } } if (pmap_initialized) kva_free(va, size); } /* * Sets the memory attribute for the specified page. */ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { m->md.pat_mode = ma; if ((m->flags & PG_FICTITIOUS) != 0) return; /* * If "m" is a normal page, flush it from the cache. * See pmap_invalidate_cache_range(). * * First, try to find an existing mapping of the page by sf * buffer. sf_buf_invalidate_cache() modifies mapping and * flushes the cache. */ if (sf_buf_invalidate_cache(m)) return; /* * If page is not mapped by sf buffer, but CPU does not * support self snoop, map the page transient and do * invalidation. In the worst case, whole cache is flushed by * pmap_invalidate_cache_range(). 
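 * (CPUID_SS is the CPU's self-snoop feature bit: a processor that reports it keeps its caches coherent across aliased mappings with different memory types, so the explicit flush below may be skipped.)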
*/ if ((cpu_feature & CPUID_SS) == 0) pmap_flush_page(m); } static void pmap_flush_page(vm_page_t m) { struct sysmaps *sysmaps; vm_offset_t sva, eva; bool useclflushopt; useclflushopt = (cpu_stdext_feature & CPUID_STDEXT_CLFLUSHOPT) != 0; if (useclflushopt || (cpu_feature & CPUID_CLFSH) != 0) { sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP2) panic("pmap_flush_page: CMAP2 busy"); sched_pin(); *sysmaps->CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M | pmap_cache_bits(m->md.pat_mode, 0); invlcaddr(sysmaps->CADDR2); sva = (vm_offset_t)sysmaps->CADDR2; eva = sva + PAGE_SIZE; /* * Use mfence despite the ordering implied by * mtx_{un,}lock() because clflush on non-Intel CPUs * and clflushopt are not guaranteed to be ordered by * any other instruction. */ if (useclflushopt || cpu_vendor_id != CPU_VENDOR_INTEL) mfence(); for (; sva < eva; sva += cpu_clflush_line_size) { if (useclflushopt) clflushopt(sva); else clflush(sva); } if (useclflushopt || cpu_vendor_id != CPU_VENDOR_INTEL) mfence(); *sysmaps->CMAP2 = 0; sched_unpin(); mtx_unlock(&sysmaps->lock); } else pmap_invalidate_cache(); } /* * Changes the specified virtual address range's memory type to that given by * the parameter "mode". The specified virtual address range must be * completely contained within the kernel map. * * Returns zero if the change completed successfully, and either EINVAL or * ENOMEM if the change failed. Specifically, EINVAL is returned if some part * of the virtual address range was not mapped, and ENOMEM is returned if * there was insufficient memory available to complete the change. */ int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode) { vm_offset_t base, offset, tmpva; pd_entry_t *pde; pt_entry_t *pte; int cache_bits_pte, cache_bits_pde; boolean_t changed; base = trunc_page(va); offset = va & PAGE_MASK; size = round_page(offset + size); /* * Only supported on kernel virtual addresses above the recursive map. */ if (base < VM_MIN_KERNEL_ADDRESS) return (EINVAL); cache_bits_pde = pmap_cache_bits(mode, 1); cache_bits_pte = pmap_cache_bits(mode, 0); changed = FALSE; /* * Pages that aren't mapped aren't supported. Also break down * 2/4MB pages into 4KB pages if required. */ PMAP_LOCK(kernel_pmap); for (tmpva = base; tmpva < base + size; ) { pde = pmap_pde(kernel_pmap, tmpva); if (*pde == 0) { PMAP_UNLOCK(kernel_pmap); return (EINVAL); } if (*pde & PG_PS) { /* * If the current 2/4MB page already has * the required memory type, then we need not * demote this page. Just increment tmpva to * the next 2/4MB page frame. */ if ((*pde & PG_PDE_CACHE) == cache_bits_pde) { tmpva = trunc_4mpage(tmpva) + NBPDR; continue; } /* * If the current offset aligns with a 2/4MB * page frame and there is at least 2/4MB left * within the range, then we need not break * down this page into 4KB pages. */ if ((tmpva & PDRMASK) == 0 && tmpva + PDRMASK < base + size) { tmpva += NBPDR; continue; } if (!pmap_demote_pde(kernel_pmap, pde, tmpva)) { PMAP_UNLOCK(kernel_pmap); return (ENOMEM); } } pte = vtopte(tmpva); if (*pte == 0) { PMAP_UNLOCK(kernel_pmap); return (EINVAL); } tmpva += PAGE_SIZE; } PMAP_UNLOCK(kernel_pmap); /* * Ok, all the pages exist, so run through them updating their * cache mode if required.
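 * (This is the second of two passes: the loop above verified that the entire range is mapped and demoted any 2/4MB pages that had to be split, so this pass can apply the new attributes without failing partway through.)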
*/ for (tmpva = base; tmpva < base + size; ) { pde = pmap_pde(kernel_pmap, tmpva); if (*pde & PG_PS) { if ((*pde & PG_PDE_CACHE) != cache_bits_pde) { pmap_pde_attr(pde, cache_bits_pde); changed = TRUE; } tmpva = trunc_4mpage(tmpva) + NBPDR; } else { pte = vtopte(tmpva); if ((*pte & PG_PTE_CACHE) != cache_bits_pte) { pmap_pte_attr(pte, cache_bits_pte); changed = TRUE; } tmpva += PAGE_SIZE; } } /* * Flush CPU caches to make sure any data isn't cached that * shouldn't be, etc. */ if (changed) { pmap_invalidate_range(kernel_pmap, base, tmpva); pmap_invalidate_cache_range(base, tmpva, FALSE); } return (0); } /* * perform the pmap work for mincore */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { pd_entry_t *pdep; pt_entry_t *ptep, pte; vm_paddr_t pa; int val; PMAP_LOCK(pmap); retry: pdep = pmap_pde(pmap, addr); if (*pdep != 0) { if (*pdep & PG_PS) { pte = *pdep; /* Compute the physical address of the 4KB page. */ pa = ((*pdep & PG_PS_FRAME) | (addr & PDRMASK)) & PG_FRAME; val = MINCORE_SUPER; } else { ptep = pmap_pte(pmap, addr); pte = *ptep; pmap_pte_release(ptep); pa = pte & PG_FRAME; val = 0; } } else { pte = 0; pa = 0; val = 0; } if ((pte & PG_V) != 0) { val |= MINCORE_INCORE; if ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if ((pte & PG_A) != 0) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && (pte & (PG_MANAGED | PG_V)) == (PG_MANAGED | PG_V)) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. */ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_activate(struct thread *td) { pmap_t pmap, oldpmap; u_int cpuid; u_int32_t cr3; critical_enter(); pmap = vmspace_pmap(td->td_proc->p_vmspace); oldpmap = PCPU_GET(curpmap); cpuid = PCPU_GET(cpuid); #if defined(SMP) CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active); CPU_SET_ATOMIC(cpuid, &pmap->pm_active); #else CPU_CLR(cpuid, &oldpmap->pm_active); CPU_SET(cpuid, &pmap->pm_active); #endif #if defined(PAE) || defined(PAE_TABLES) cr3 = vtophys(pmap->pm_pdpt); #else cr3 = vtophys(pmap->pm_pdir); #endif /* * pmap_activate is for the current thread on the current cpu */ td->td_pcb->pcb_cr3 = cr3; load_cr3(cr3); PCPU_SET(curpmap, pmap); critical_exit(); } void pmap_sync_icache(pmap_t pm, vm_offset_t va, vm_size_t sz) { } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. 
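 *
 * [Editor's note: a worked example of the adjustment performed below,
 * assuming NBPDR = 4MB and PDRMASK = 0x3fffff (non-PAE i386). With
 * offset = 0x501000, *addr = 0x30000000 and size = 16MB:
 * superpage_offset = offset & PDRMASK = 0x101000, and since
 * (*addr & PDRMASK) = 0 is less than superpage_offset, the routine
 * sets *addr = 0x30101000. The mapping's virtual address and file
 * offset are then congruent modulo 4MB, which is what allows 2/4MB
 * page promotion.]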
*/ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t superpage_offset; if (size < NBPDR) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); superpage_offset = offset & PDRMASK; if (size - ((NBPDR - superpage_offset) & PDRMASK) < NBPDR || (*addr & PDRMASK) == superpage_offset) return; if ((*addr & PDRMASK) < superpage_offset) *addr = (*addr & ~PDRMASK) + superpage_offset; else *addr = ((*addr + PDRMASK) & ~PDRMASK) + superpage_offset; } vm_offset_t pmap_quick_enter_page(vm_page_t m) { vm_offset_t qaddr; pt_entry_t *pte; critical_enter(); qaddr = PCPU_GET(qmap_addr); pte = vtopte(qaddr); KASSERT(*pte == 0, ("pmap_quick_enter_page: PTE busy")); *pte = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M | pmap_cache_bits(pmap_page_get_memattr(m), 0); invlpg(qaddr); return (qaddr); } void pmap_quick_remove_page(vm_offset_t addr) { vm_offset_t qaddr; pt_entry_t *pte; qaddr = PCPU_GET(qmap_addr); pte = vtopte(qaddr); KASSERT(*pte != 0, ("pmap_quick_remove_page: PTE not in use")); KASSERT(addr == qaddr, ("pmap_quick_remove_page: invalid address")); *pte = 0; critical_exit(); } #if defined(PMAP_DEBUG) pmap_pid_dump(int pid) { pmap_t pmap; struct proc *p; int npte = 0; int index; sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { if (p->p_pid != pid) continue; if (p->p_vmspace) { int i,j; index = 0; pmap = vmspace_pmap(p->p_vmspace); for (i = 0; i < NPDEPTD; i++) { pd_entry_t *pde; pt_entry_t *pte; vm_offset_t base = i << PDRSHIFT; pde = &pmap->pm_pdir[i]; if (pde && pmap_pde_v(pde)) { for (j = 0; j < NPTEPG; j++) { vm_offset_t va = base + (j << PAGE_SHIFT); if (va >= (vm_offset_t) VM_MIN_KERNEL_ADDRESS) { if (index) { index = 0; printf("\n"); } sx_sunlock(&allproc_lock); return (npte); } pte = pmap_pte(pmap, va); if (pte && pmap_pte_v(pte)) { pt_entry_t pa; vm_page_t m; pa = *pte; m = PHYS_TO_VM_PAGE(pa & PG_FRAME); printf("va: 0x%x, pt: 0x%x, h: %d, w: %d, f: 0x%x", va, pa, m->hold_count, m->wire_count, m->flags); npte++; index++; if (index >= 2) { index = 0; printf("\n"); } else { printf(" "); } } } } } } } sx_sunlock(&allproc_lock); return (npte); } #endif Index: projects/clang390-import/sys/kern/kern_exit.c =================================================================== --- projects/clang390-import/sys/kern/kern_exit.c (revision 305686) +++ projects/clang390-import/sys/kern/kern_exit.c (revision 305687) @@ -1,1326 +1,1326 @@ /*- * Copyright (c) 1982, 1986, 1989, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. 
Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_exit.c 8.7 (Berkeley) 2/12/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_compat.h" #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for acct_process() function prototype */ #include #include #include #include #include #ifdef KTRACE #include #endif #include #include #include #include #include #include #include #include #include #include #ifdef KDTRACE_HOOKS #include dtrace_execexit_func_t dtrace_fasttrap_exit; #endif SDT_PROVIDER_DECLARE(proc); SDT_PROBE_DEFINE1(proc, , , exit, "int"); /* Hook for NFS teardown procedure. */ void (*nlminfo_release_p)(struct proc *p); struct proc * proc_realparent(struct proc *child) { struct proc *p, *parent; sx_assert(&proctree_lock, SX_LOCKED); if ((child->p_treeflag & P_TREE_ORPHANED) == 0) { if (child->p_oppid == 0 || child->p_pptr->p_pid == child->p_oppid) parent = child->p_pptr; else parent = initproc; return (parent); } for (p = child; (p->p_treeflag & P_TREE_FIRST_ORPHAN) == 0;) { /* Cannot use LIST_PREV(), since the list head is not known. */ p = __containerof(p->p_orphan.le_prev, struct proc, p_orphan.le_next); KASSERT((p->p_treeflag & P_TREE_ORPHANED) != 0, ("missing P_ORPHAN %p", p)); } parent = __containerof(p->p_orphan.le_prev, struct proc, p_orphans.lh_first); return (parent); } void reaper_abandon_children(struct proc *p, bool exiting) { struct proc *p1, *p2, *ptmp; sx_assert(&proctree_lock, SX_LOCKED); KASSERT(p != initproc, ("reaper_abandon_children for initproc")); if ((p->p_treeflag & P_TREE_REAPER) == 0) return; p1 = p->p_reaper; LIST_FOREACH_SAFE(p2, &p->p_reaplist, p_reapsibling, ptmp) { LIST_REMOVE(p2, p_reapsibling); p2->p_reaper = p1; p2->p_reapsubtree = p->p_reapsubtree; LIST_INSERT_HEAD(&p1->p_reaplist, p2, p_reapsibling); if (exiting && p2->p_pptr == p) { PROC_LOCK(p2); proc_reparent(p2, p1); PROC_UNLOCK(p2); } } KASSERT(LIST_EMPTY(&p->p_reaplist), ("p_reaplist not empty")); p->p_treeflag &= ~P_TREE_REAPER; } static void clear_orphan(struct proc *p) { struct proc *p1; sx_assert(&proctree_lock, SA_XLOCKED); if ((p->p_treeflag & P_TREE_ORPHANED) == 0) return; if ((p->p_treeflag & P_TREE_FIRST_ORPHAN) != 0) { p1 = LIST_NEXT(p, p_orphan); if (p1 != NULL) p1->p_treeflag |= P_TREE_FIRST_ORPHAN; p->p_treeflag &= ~P_TREE_FIRST_ORPHAN; } LIST_REMOVE(p, p_orphan); p->p_treeflag &= ~P_TREE_ORPHANED; } /* * exit -- death of process. 
*/ void sys_sys_exit(struct thread *td, struct sys_exit_args *uap) { exit1(td, uap->rval, 0); /* NOTREACHED */ } /* * Exit: deallocate address space and other resources, change proc state to * zombie, and unlink proc from allproc and parent's lists. Save exit status * and rusage for wait(). Check for child processes and orphan them. */ void exit1(struct thread *td, int rval, int signo) { struct proc *p, *nq, *q, *t; struct thread *tdt; mtx_assert(&Giant, MA_NOTOWNED); KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo)); p = td->td_proc; /* * XXX in case we're rebooting we just let init die in order to * work around an unsolved stack overflow seen very late during * shutdown on sparc64 when the gmirror worker process exits. */ if (p == initproc && rebooting == 0) { printf("init died (signal %d, exit %d)\n", signo, rval); panic("Going nowhere without my init!"); } /* * Deref SU mp, since the thread does not return to userspace. */ if (softdep_ast_cleanup != NULL) softdep_ast_cleanup(); /* * MUST abort all other threads before proceeding past here. */ PROC_LOCK(p); /* * First check if some other thread or external request got * here before us. If so, act appropriately: exit or suspend. * We must ensure that stop requests are handled before we set * P_WEXIT. */ thread_suspend_check(0); while (p->p_flag & P_HADTHREADS) { /* * Kill off the other threads. This requires * some co-operation from other parts of the kernel * so it may not be instantaneous. With this state set * any thread entering the kernel from userspace will * thread_exit() in trap(). Any thread attempting to * sleep will return immediately with EINTR or EWOULDBLOCK * which will hopefully force them to back out to userland * freeing resources as they go. Any thread attempting * to return to userland will thread_exit() from userret(). * thread_exit() will unsuspend us when the last of the * other threads exits. * If another single-threading request is already in * progress after resumption, calling thread_single will * fail; in that case, we just re-check all suspension * requests, and the thread should either be suspended * there or exit. */ if (!thread_single(p, SINGLE_EXIT)) /* * All other activity in this process is now * stopped. Threading support has been turned * off. */ break; /* * Recheck for new stop or suspend requests which * might appear while process lock was dropped in * thread_single(). */ thread_suspend_check(0); } KASSERT(p->p_numthreads == 1, ("exit1: proc %p exiting with %d threads", p, p->p_numthreads)); racct_sub(p, RACCT_NTHR, 1); /* Let the event handler change the exit status. */ p->p_xexit = rval; p->p_xsig = signo; /* * Wakeup anyone in procfs' PIOCWAIT. They should have a hold * on our vmspace, so we should block below until they have * released their reference to us. Note that if they have * requested S_EXIT stops we will block here until they ack * via PIOCCONT. */ _STOPEVENT(p, S_EXIT, 0); /* * Ignore any pending request to stop due to a stop signal. * Once P_WEXIT is set, future requests will be ignored as * well. */ p->p_flag &= ~P_STOPPED_SIG; KASSERT(!P_SHOULDSTOP(p), ("exiting process is stopped")); /* * Note that we are exiting and do another wakeup of anyone in * PIOCWAIT in case they aren't listening for S_EXIT stops or * decided to wait again after we told them we are exiting. */ p->p_flag |= P_WEXIT; wakeup(&p->p_stype); /* * Wait for any processes that have a hold on our vmspace to * release their reference.
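 *
 * [Editor's note: the hold count drained below is the one manipulated
 * by the PHOLD()/PRELE() macros; a holder does, schematically:
 *
 *	PHOLD(p);	increments p->p_lock
 *	...use p's vmspace...
 *	PRELE(p);	decrements p->p_lock, wakes the sleeper at zero
 *
 * which is why the msleep() below sleeps on &p->p_lock.]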
*/ while (p->p_lock > 0) msleep(&p->p_lock, &p->p_mtx, PWAIT, "exithold", 0); PROC_UNLOCK(p); /* Drain the limit callout while we don't have the proc locked */ callout_drain(&p->p_limco); #ifdef AUDIT /* * The Sun BSM exit token contains two components: an exit status as * passed to exit(), and a return value to indicate what sort of exit * it was. The exit status is WEXITSTATUS(rv), but it's not clear * what the return value is. */ AUDIT_ARG_EXIT(rval, 0); AUDIT_SYSCALL_EXIT(0, td); #endif /* Are we a task leader with peers? */ if (p->p_peers != NULL && p == p->p_leader) { mtx_lock(&ppeers_lock); q = p->p_peers; while (q != NULL) { PROC_LOCK(q); kern_psignal(q, SIGKILL); PROC_UNLOCK(q); q = q->p_peers; } while (p->p_peers != NULL) msleep(p, &ppeers_lock, PWAIT, "exit1", 0); mtx_unlock(&ppeers_lock); } /* * Check if any loadable modules need anything done at process exit. * E.g. SYSV IPC stuff. * Event handler could change exit status. * XXX what if one of these generates an error? */ EVENTHANDLER_INVOKE(process_exit, p); /* * If parent is waiting for us to exit or exec, * P_PPWAIT is set; we will wakeup the parent below. */ PROC_LOCK(p); stopprofclock(p); p->p_flag &= ~(P_TRACED | P_PPWAIT | P_PPTRACE); p->p_ptevents = 0; /* * Stop the real interval timer. If the handler is currently * executing, prevent it from rearming itself and let it finish. */ if (timevalisset(&p->p_realtimer.it_value) && _callout_stop_safe(&p->p_itcallout, CS_EXECUTING, NULL) == 0) { timevalclear(&p->p_realtimer.it_interval); msleep(&p->p_itcallout, &p->p_mtx, PWAIT, "ritwait", 0); KASSERT(!timevalisset(&p->p_realtimer.it_value), ("realtime timer is still armed")); } PROC_UNLOCK(p); umtx_thread_exit(td); /* * Reset any sigio structures pointing to us as a result of * F_SETOWN with our pid. */ funsetownlst(&p->p_sigiolst); /* * If this process has an nlminfo data area (for lockd), release it */ if (nlminfo_release_p != NULL && p->p_nlminfo != NULL) (*nlminfo_release_p)(p); /* * Close open files and release open-file table. * This may block! */ fdescfree(td); /* * If this thread tickled GEOM, we need to wait for the giggling to * stop before we return to userland */ if (td->td_pflags & TDP_GEOM) g_waitidle(); /* * Remove ourself from our leader's peer list and wake our leader. */ if (p->p_leader->p_peers != NULL) { mtx_lock(&ppeers_lock); if (p->p_leader->p_peers != NULL) { q = p->p_leader; while (q->p_peers != p) q = q->p_peers; q->p_peers = p->p_peers; wakeup(p->p_leader); } mtx_unlock(&ppeers_lock); } vmspace_exit(td); killjobc(); (void)acct_process(td); #ifdef KTRACE ktrprocexit(td); #endif /* * Release reference to text vnode */ if (p->p_textvp != NULL) { vrele(p->p_textvp); p->p_textvp = NULL; } /* * Release our limits structure. */ lim_free(p->p_limit); p->p_limit = NULL; tidhash_remove(td); /* * Remove proc from allproc queue and pidhash chain. * Place onto zombproc. Unlink from parent's child list. */ sx_xlock(&allproc_lock); LIST_REMOVE(p, p_list); LIST_INSERT_HEAD(&zombproc, p, p_list); LIST_REMOVE(p, p_hash); sx_xunlock(&allproc_lock); /* * Call machine-dependent code to release any * machine-dependent resources other than the address space. * The address space is released by "vmspace_exitfree(p)" in * vm_waitproc(). 
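 *
 * [Editor's note: the split exists because cpu_exit() still runs on
 * the dying thread, so resources the thread itself is using (kernel
 * stack, pcb) cannot be released here; they are reclaimed later, from
 * the waiting process' context, via thread_wait() and vm_waitproc()
 * in proc_reap().]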
*/ cpu_exit(td); WITNESS_WARN(WARN_PANIC, NULL, "process (pid %d) exiting", p->p_pid); /* * Reparent all child processes: * - traced ones to the original parent (or init if we are that parent) * - the rest to init */ sx_xlock(&proctree_lock); q = LIST_FIRST(&p->p_children); if (q != NULL) /* only need this if any child is S_ZOMB */ wakeup(q->p_reaper); for (; q != NULL; q = nq) { nq = LIST_NEXT(q, p_sibling); PROC_LOCK(q); q->p_sigparent = SIGCHLD; if (!(q->p_flag & P_TRACED)) { proc_reparent(q, q->p_reaper); } else { /* * Traced processes are killed since their existence * means someone is screwing up. */ t = proc_realparent(q); if (t == p) { proc_reparent(q, q->p_reaper); } else { PROC_LOCK(t); proc_reparent(q, t); PROC_UNLOCK(t); } /* * Since q was found on our children list, the * proc_reparent() call moved q to the orphan * list due to the present P_TRACED flag. Clear * the orphan link for q now while q is locked. */ clear_orphan(q); q->p_flag &= ~(P_TRACED | P_STOPPED_TRACE); q->p_flag2 &= ~P2_PTRACE_FSTP; q->p_ptevents = 0; FOREACH_THREAD_IN_PROC(q, tdt) { tdt->td_dbgflags &= ~(TDB_SUSPEND | TDB_XSIG | TDB_FSTP); } kern_psignal(q, SIGKILL); } PROC_UNLOCK(q); } /* * Also get rid of our orphans. */ while ((q = LIST_FIRST(&p->p_orphans)) != NULL) { PROC_LOCK(q); CTR2(KTR_PTRACE, "exit: pid %d, clearing orphan %d", p->p_pid, q->p_pid); clear_orphan(q); PROC_UNLOCK(q); } /* Save exit status. */ PROC_LOCK(p); p->p_xthread = td; /* Tell the prison that we are gone. */ prison_proc_free(p->p_ucred->cr_prison); #ifdef KDTRACE_HOOKS /* * Tell the DTrace fasttrap provider about the exit if it * has declared an interest. */ if (dtrace_fasttrap_exit) dtrace_fasttrap_exit(p); #endif /* * Notify interested parties of our demise. */ KNOTE_LOCKED(p->p_klist, NOTE_EXIT); #ifdef KDTRACE_HOOKS int reason = CLD_EXITED; if (WCOREDUMP(signo)) reason = CLD_DUMPED; else if (WIFSIGNALED(signo)) reason = CLD_KILLED; SDT_PROBE1(proc, , , exit, reason); #endif /* * If this is a process with a descriptor, we may not need to deliver * a signal to the parent. proctree_lock is held over * procdesc_exit() to serialize concurrent calls to close() and * exit(). */ if (p->p_procdesc == NULL || procdesc_exit(p)) { /* * Notify the parent that we're gone. If the parent has the * PS_NOCLDWAIT flag set, or if the handler is set to SIG_IGN, * notify process 1 instead (and hope it will handle this * situation). */ PROC_LOCK(p->p_pptr); mtx_lock(&p->p_pptr->p_sigacts->ps_mtx); if (p->p_pptr->p_sigacts->ps_flag & (PS_NOCLDWAIT | PS_CLDSIGIGN)) { struct proc *pp; mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx); pp = p->p_pptr; PROC_UNLOCK(pp); proc_reparent(p, p->p_reaper); p->p_sigparent = SIGCHLD; PROC_LOCK(p->p_pptr); /* * Notify the parent, so that if it was blocked in wait(2) * or waitpid(2) on our pid, it will continue. */ wakeup(pp); } else mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx); if (p->p_pptr == p->p_reaper || p->p_pptr == initproc) childproc_exited(p); else if (p->p_sigparent != 0) { if (p->p_sigparent == SIGCHLD) childproc_exited(p); else /* LINUX thread */ kern_psignal(p->p_pptr, p->p_sigparent); } } else PROC_LOCK(p->p_pptr); sx_xunlock(&proctree_lock); /* * The state PRS_ZOMBIE prevents other processes from sending * signals to the process; to avoid a memory leak, we free the * memory for the signal queue at the time when the state is set. */ sigqueue_flush(&p->p_sigqueue); sigqueue_flush(&td->td_sigqueue); /* * We have to wait until after acquiring all locks before * changing p_state.
We need to avoid all possible context * switches (including ones from blocking on a mutex) while * marked as a zombie. We also have to set the zombie state * before we release the parent process' proc lock to avoid * a lost wakeup. So, we first call wakeup, then we grab the * sched lock, update the state, and release the parent process' * proc lock. */ wakeup(p->p_pptr); cv_broadcast(&p->p_pwait); sched_exit(p->p_pptr, td); PROC_SLOCK(p); p->p_state = PRS_ZOMBIE; PROC_UNLOCK(p->p_pptr); /* * Save our children's rusage information in our exit rusage. */ PROC_STATLOCK(p); ruadd(&p->p_ru, &p->p_rux, &p->p_stats->p_cru, &p->p_crux); PROC_STATUNLOCK(p); /* * Make sure the scheduler takes this thread out of its tables etc. * This will also release this thread's reference to the ucred. * Other thread parts to release include pcb bits and such. */ thread_exit(); } #ifndef _SYS_SYSPROTO_H_ struct abort2_args { char *why; int nargs; void **args; }; #endif int sys_abort2(struct thread *td, struct abort2_args *uap) { struct proc *p = td->td_proc; struct sbuf *sb; void *uargs[16]; int error, i, sig; /* * Do it right now so we can log either a proper call of abort2() or * note that an invalid argument was passed. 512 is big enough to * handle 16 arguments' descriptions with additional comments. */ sb = sbuf_new(NULL, NULL, 512, SBUF_FIXEDLEN); sbuf_clear(sb); sbuf_printf(sb, "%s(pid %d uid %d) aborted: ", p->p_comm, p->p_pid, td->td_ucred->cr_uid); /* * Since we can't return from abort2(), send SIGKILL in cases where * abort2() was called improperly. */ sig = SIGKILL; /* Prevent DoSes from user-space. */ if (uap->nargs < 0 || uap->nargs > 16) goto out; if (uap->nargs > 0) { if (uap->args == NULL) goto out; error = copyin(uap->args, uargs, uap->nargs * sizeof(void *)); if (error != 0) goto out; } /* * Limit the size of the 'reason' string to 128. It will fit even when * the maximal number of arguments was chosen to be logged. */ if (uap->why != NULL) { error = sbuf_copyin(sb, uap->why, 128); if (error < 0) goto out; } else { sbuf_printf(sb, "(null)"); } if (uap->nargs > 0) { sbuf_printf(sb, "("); for (i = 0; i < uap->nargs; i++) sbuf_printf(sb, "%s%p", i == 0 ? "" : ", ", uargs[i]); sbuf_printf(sb, ")"); } /* * Final stage: the arguments were proper, the string was * successfully copied from userspace, and copying pointers * from user-space succeeded. */ sig = SIGABRT; out: if (sig == SIGKILL) { sbuf_trim(sb); sbuf_printf(sb, " (Reason text inaccessible)"); } sbuf_cat(sb, "\n"); sbuf_finish(sb); log(LOG_INFO, "%s", sbuf_data(sb)); sbuf_delete(sb); exit1(td, 0, sig); return (0); } #ifdef COMPAT_43 /* * The dirty work is handled by kern_wait(). */ int owait(struct thread *td, struct owait_args *uap __unused) { int error, status; error = kern_wait(td, WAIT_ANY, &status, 0, NULL); if (error == 0) td->td_retval[1] = status; return (error); } #endif /* COMPAT_43 */ /* * The dirty work is handled by kern_wait().
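 *
 * [Editor's note: the change below makes sys_wait4() skip the status
 * and rusage copyout when kern_wait() succeeds with
 * td->td_retval[0] == 0, i.e. the WNOHANG "no child ready" case, in
 * which the buffers hold nothing meaningful. A userland caller sees
 * the usual POSIX contract:
 *
 *	pid = wait4(-1, &status, WNOHANG, &ru);
 *	if (pid == 0)
 *		...no state change; status and ru are untouched...
 *	else if (pid > 0 && WIFEXITED(status))
 *		...child exited with code WEXITSTATUS(status)...
 * ]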
*/ int sys_wait4(struct thread *td, struct wait4_args *uap) { struct rusage ru, *rup; int error, status; if (uap->rusage != NULL) rup = &ru; else rup = NULL; error = kern_wait(td, uap->pid, &status, uap->options, rup); - if (uap->status != NULL && error == 0) + if (uap->status != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&status, uap->status, sizeof(status)); - if (uap->rusage != NULL && error == 0) + if (uap->rusage != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&ru, uap->rusage, sizeof(struct rusage)); return (error); } int sys_wait6(struct thread *td, struct wait6_args *uap) { struct __wrusage wru, *wrup; siginfo_t si, *sip; idtype_t idtype; id_t id; int error, status; idtype = uap->idtype; id = uap->id; if (uap->wrusage != NULL) wrup = &wru; else wrup = NULL; if (uap->info != NULL) { sip = &si; bzero(sip, sizeof(*sip)); } else sip = NULL; /* * We expect all callers of wait6() to know about WEXITED and * WTRAPPED. */ error = kern_wait6(td, idtype, id, &status, uap->options, wrup, sip); - if (uap->status != NULL && error == 0) + if (uap->status != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&status, uap->status, sizeof(status)); - if (uap->wrusage != NULL && error == 0) + if (uap->wrusage != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&wru, uap->wrusage, sizeof(wru)); if (uap->info != NULL && error == 0) error = copyout(&si, uap->info, sizeof(si)); return (error); } /* * Reap the remains of a zombie process and optionally return status and * rusage. Asserts and will release both the proctree_lock and the process * lock as part of its work. */ void proc_reap(struct thread *td, struct proc *p, int *status, int options) { struct proc *q, *t; sx_assert(&proctree_lock, SA_XLOCKED); PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); KASSERT(p->p_state == PRS_ZOMBIE, ("proc_reap: !PRS_ZOMBIE")); q = td->td_proc; PROC_SUNLOCK(p); if (status) *status = KW_EXITCODE(p->p_xexit, p->p_xsig); if (options & WNOWAIT) { /* * Only poll, returning the status. Caller does not wish to * release the proc struct just yet. */ PROC_UNLOCK(p); sx_xunlock(&proctree_lock); return; } PROC_LOCK(q); sigqueue_take(p->p_ksi); PROC_UNLOCK(q); /* * If we got the child via a ptrace 'attach', we need to give it back * to the old parent. */ if (p->p_oppid != 0 && p->p_oppid != p->p_pptr->p_pid) { PROC_UNLOCK(p); t = proc_realparent(p); PROC_LOCK(t); PROC_LOCK(p); CTR2(KTR_PTRACE, "wait: traced child %d moved back to parent %d", p->p_pid, t->p_pid); proc_reparent(p, t); p->p_oppid = 0; PROC_UNLOCK(p); pksignal(t, SIGCHLD, p->p_ksi); wakeup(t); cv_broadcast(&p->p_pwait); PROC_UNLOCK(t); sx_xunlock(&proctree_lock); return; } p->p_oppid = 0; PROC_UNLOCK(p); /* * Remove other references to this process to ensure we have an * exclusive reference. */ sx_xlock(&allproc_lock); LIST_REMOVE(p, p_list); /* off zombproc */ sx_xunlock(&allproc_lock); LIST_REMOVE(p, p_sibling); reaper_abandon_children(p, true); LIST_REMOVE(p, p_reapsibling); PROC_LOCK(p); clear_orphan(p); PROC_UNLOCK(p); leavepgrp(p); if (p->p_procdesc != NULL) procdesc_reap(p); sx_xunlock(&proctree_lock); PROC_LOCK(p); knlist_detach(p->p_klist); p->p_klist = NULL; PROC_UNLOCK(p); /* * Removal from allproc list and process group list paired with * PROC_LOCK which was executed during that time should guarantee * nothing can reach this process anymore. As such further locking * is unnecessary. */ p->p_xexit = p->p_xsig = 0; /* XXX: why? 
*/ PROC_LOCK(q); ruadd(&q->p_stats->p_cru, &q->p_crux, &p->p_ru, &p->p_rux); PROC_UNLOCK(q); /* * Decrement the count of procs running with this uid. */ (void)chgproccnt(p->p_ucred->cr_ruidinfo, -1, 0); /* * Destroy resource accounting information associated with the process. */ #ifdef RACCT if (racct_enable) { PROC_LOCK(p); racct_sub(p, RACCT_NPROC, 1); PROC_UNLOCK(p); } #endif racct_proc_exit(p); /* * Free credentials, arguments, and sigacts. */ crfree(p->p_ucred); proc_set_cred(p, NULL); pargs_drop(p->p_args); p->p_args = NULL; sigacts_free(p->p_sigacts); p->p_sigacts = NULL; /* * Do any thread-system specific cleanups. */ thread_wait(p); /* * Give the vm and machine-dependent layer a chance to free anything that * cpu_exit couldn't release while still running in process context. */ vm_waitproc(p); #ifdef MAC mac_proc_destroy(p); #endif /* * Free any domain policy that's still hiding around. */ vm_domain_policy_cleanup(&p->p_vm_dom_policy); KASSERT(FIRST_THREAD_IN_PROC(p), ("proc_reap: no residual thread!")); uma_zfree(proc_zone, p); atomic_add_int(&nprocs, -1); } static int proc_to_reap(struct thread *td, struct proc *p, idtype_t idtype, id_t id, int *status, int options, struct __wrusage *wrusage, siginfo_t *siginfo, int check_only) { struct rusage *rup; sx_assert(&proctree_lock, SA_XLOCKED); PROC_LOCK(p); switch (idtype) { case P_ALL: if (p->p_procdesc != NULL) { PROC_UNLOCK(p); return (0); } break; case P_PID: if (p->p_pid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_PGID: if (p->p_pgid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_SID: if (p->p_session->s_sid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_UID: if (p->p_ucred->cr_uid != (uid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_GID: if (p->p_ucred->cr_gid != (gid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_JAILID: if (p->p_ucred->cr_prison->pr_id != (int)id) { PROC_UNLOCK(p); return (0); } break; /* * It seems that the thread structures get zeroed out * at process exit. This makes it impossible to * support P_SETID, P_CID or P_CPUID. */ default: PROC_UNLOCK(p); return (0); } if (p_canwait(td, p)) { PROC_UNLOCK(p); return (0); } if (((options & WEXITED) == 0) && (p->p_state == PRS_ZOMBIE)) { PROC_UNLOCK(p); return (0); } /* * This special case handles a kthread spawned by linux_clone * (see linux_misc.c). The linux_wait4 and linux_waitpid * functions need to be able to distinguish between waiting * on a process and waiting on a thread. It is a thread if * p_sigparent is not SIGCHLD, and the WLINUXCLONE option * signifies we want to wait for threads and not processes. */ if ((p->p_sigparent != SIGCHLD) ^ ((options & WLINUXCLONE) != 0)) { PROC_UNLOCK(p); return (0); } if (siginfo != NULL) { bzero(siginfo, sizeof(*siginfo)); siginfo->si_errno = 0; /* * SUSv4 requires that the si_signo value always be * SIGCHLD. Obey it even though the rfork(2) interface * allows requesting a different signal for child exit * notification. */ siginfo->si_signo = SIGCHLD; /* * This is still a rough estimate. We will fix the * cases TRAPPED, STOPPED, and CONTINUED later.
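 *
 * [Editor's note: concretely, for a child terminated by SIGSEGV that
 * dumped core, the code below yields si_code = CLD_DUMPED and
 * si_status = SIGSEGV; for a child that simply called exit(3), it
 * yields si_code = CLD_EXITED and si_status = p_xexit, the exit
 * status.]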
*/ if (WCOREDUMP(p->p_xsig)) { siginfo->si_code = CLD_DUMPED; siginfo->si_status = WTERMSIG(p->p_xsig); } else if (WIFSIGNALED(p->p_xsig)) { siginfo->si_code = CLD_KILLED; siginfo->si_status = WTERMSIG(p->p_xsig); } else { siginfo->si_code = CLD_EXITED; siginfo->si_status = p->p_xexit; } siginfo->si_pid = p->p_pid; siginfo->si_uid = p->p_ucred->cr_uid; /* * The si_addr field would be useful additional * detail, but apparently the PC value may be lost * when we reach this point. bzero() above sets * siginfo->si_addr to NULL. */ } /* * There should be no reason to limit resources usage info to * exited processes only. A snapshot about any resources used * by a stopped process may be exactly what is needed. */ if (wrusage != NULL) { rup = &wrusage->wru_self; *rup = p->p_ru; PROC_STATLOCK(p); calcru(p, &rup->ru_utime, &rup->ru_stime); PROC_STATUNLOCK(p); rup = &wrusage->wru_children; *rup = p->p_stats->p_cru; calccru(p, &rup->ru_utime, &rup->ru_stime); } if (p->p_state == PRS_ZOMBIE && !check_only) { PROC_SLOCK(p); proc_reap(td, p, status, options); return (-1); } PROC_UNLOCK(p); return (1); } int kern_wait(struct thread *td, pid_t pid, int *status, int options, struct rusage *rusage) { struct __wrusage wru, *wrup; idtype_t idtype; id_t id; int ret; /* * Translate the special pid values into the (idtype, pid) * pair for kern_wait6. The WAIT_MYPGRP case is handled by * kern_wait6() on its own. */ if (pid == WAIT_ANY) { idtype = P_ALL; id = 0; } else if (pid < 0) { idtype = P_PGID; id = (id_t)-pid; } else { idtype = P_PID; id = (id_t)pid; } if (rusage != NULL) wrup = &wru; else wrup = NULL; /* * For backward compatibility we implicitly add flags WEXITED * and WTRAPPED here. */ options |= WEXITED | WTRAPPED; ret = kern_wait6(td, idtype, id, status, options, wrup, NULL); if (rusage != NULL) *rusage = wru.wru_self; return (ret); } int kern_wait6(struct thread *td, idtype_t idtype, id_t id, int *status, int options, struct __wrusage *wrusage, siginfo_t *siginfo) { struct proc *p, *q; pid_t pid; int error, nfound, ret; AUDIT_ARG_VALUE((int)idtype); /* XXX - This is likely wrong! */ AUDIT_ARG_PID((pid_t)id); /* XXX - This may be wrong! */ AUDIT_ARG_VALUE(options); q = td->td_proc; if ((pid_t)id == WAIT_MYPGRP && (idtype == P_PID || idtype == P_PGID)) { PROC_LOCK(q); id = (id_t)q->p_pgid; PROC_UNLOCK(q); idtype = P_PGID; } /* If we don't know the option, just return. */ if ((options & ~(WUNTRACED | WNOHANG | WCONTINUED | WNOWAIT | WEXITED | WTRAPPED | WLINUXCLONE)) != 0) return (EINVAL); if ((options & (WEXITED | WUNTRACED | WCONTINUED | WTRAPPED)) == 0) { /* * We will be unable to find any matching processes, * because there are no known events to look for. * Prefer to return error instead of blocking * indefinitely. 
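 *
 * [Editor's note: for example, a hypothetical call
 *
 *	wait6(P_PID, pid, &status, WNOHANG, NULL, NULL);
 *
 * fails with EINVAL here because WNOHANG alone names no event; the
 * caller must include at least one of WEXITED, WUNTRACED, WCONTINUED
 * or WTRAPPED. The wait4() path never trips this check because
 * kern_wait() unconditionally ORs in WEXITED | WTRAPPED.]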
*/ return (EINVAL); } loop: if (q->p_flag & P_STATCHILD) { PROC_LOCK(q); q->p_flag &= ~P_STATCHILD; PROC_UNLOCK(q); } nfound = 0; sx_xlock(&proctree_lock); LIST_FOREACH(p, &q->p_children, p_sibling) { pid = p->p_pid; ret = proc_to_reap(td, p, idtype, id, status, options, wrusage, siginfo, 0); if (ret == 0) continue; else if (ret == 1) nfound++; else { td->td_retval[0] = pid; return (0); } PROC_LOCK(p); PROC_SLOCK(p); if ((options & WTRAPPED) != 0 && (p->p_flag & P_TRACED) != 0 && (p->p_flag & (P_STOPPED_TRACE | P_STOPPED_SIG)) != 0 && (p->p_suspcount == p->p_numthreads) && ((p->p_flag & P_WAITED) == 0)) { PROC_SUNLOCK(p); if ((options & WNOWAIT) == 0) p->p_flag |= P_WAITED; sx_xunlock(&proctree_lock); if (status != NULL) *status = W_STOPCODE(p->p_xsig); if (siginfo != NULL) { siginfo->si_status = p->p_xsig; siginfo->si_code = CLD_TRAPPED; } if ((options & WNOWAIT) == 0) { PROC_LOCK(q); sigqueue_take(p->p_ksi); PROC_UNLOCK(q); } CTR4(KTR_PTRACE, "wait: returning trapped pid %d status %#x (xstat %d) xthread %d", p->p_pid, W_STOPCODE(p->p_xsig), p->p_xsig, p->p_xthread != NULL ? p->p_xthread->td_tid : -1); PROC_UNLOCK(p); td->td_retval[0] = pid; return (0); } if ((options & WUNTRACED) != 0 && (p->p_flag & P_STOPPED_SIG) != 0 && (p->p_suspcount == p->p_numthreads) && ((p->p_flag & P_WAITED) == 0)) { PROC_SUNLOCK(p); if ((options & WNOWAIT) == 0) p->p_flag |= P_WAITED; sx_xunlock(&proctree_lock); if (status != NULL) *status = W_STOPCODE(p->p_xsig); if (siginfo != NULL) { siginfo->si_status = p->p_xsig; siginfo->si_code = CLD_STOPPED; } if ((options & WNOWAIT) == 0) { PROC_LOCK(q); sigqueue_take(p->p_ksi); PROC_UNLOCK(q); } PROC_UNLOCK(p); td->td_retval[0] = pid; return (0); } PROC_SUNLOCK(p); if ((options & WCONTINUED) != 0 && (p->p_flag & P_CONTINUED) != 0) { sx_xunlock(&proctree_lock); if ((options & WNOWAIT) == 0) { p->p_flag &= ~P_CONTINUED; PROC_LOCK(q); sigqueue_take(p->p_ksi); PROC_UNLOCK(q); } PROC_UNLOCK(p); if (status != NULL) *status = SIGCONT; if (siginfo != NULL) { siginfo->si_status = SIGCONT; siginfo->si_code = CLD_CONTINUED; } td->td_retval[0] = pid; return (0); } PROC_UNLOCK(p); } /* * Look in the orphans list too, to allow the parent to * collect its child exit status even if the child is being * debugged. * * The debugger detaches from the parent upon successful * switch-over from parent to child. At this point, due to * re-parenting, the parent loses the child to the debugger and a * wait4(2) call would report that it has no children to wait * for. By maintaining a list of orphans we allow the parent * to successfully wait until the child becomes a zombie. */ if (nfound == 0) { LIST_FOREACH(p, &q->p_orphans, p_orphan) { ret = proc_to_reap(td, p, idtype, id, NULL, options, NULL, NULL, 1); if (ret != 0) { KASSERT(ret != -1, ("reaped an orphan (pid %d)", (int)td->td_retval[0])); nfound++; break; } } } if (nfound == 0) { sx_xunlock(&proctree_lock); return (ECHILD); } if (options & WNOHANG) { sx_xunlock(&proctree_lock); td->td_retval[0] = 0; return (0); } PROC_LOCK(q); sx_xunlock(&proctree_lock); if (q->p_flag & P_STATCHILD) { q->p_flag &= ~P_STATCHILD; error = 0; } else error = msleep(q, &q->p_mtx, PWAIT | PCATCH, "wait", 0); PROC_UNLOCK(q); if (error) return (error); goto loop; } /* * Make process 'parent' the new parent of process 'child'. * Must be called with an exclusive hold of the proctree lock.
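 *
 * [Editor's note: a sketch of the locking a caller is expected to
 * hold, matching the assertions at the top of the function:
 *
 *	sx_xlock(&proctree_lock);
 *	PROC_LOCK(child);
 *	proc_reparent(child, parent);
 *	PROC_UNLOCK(child);
 *	sx_xunlock(&proctree_lock);
 * ]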
*/ void proc_reparent(struct proc *child, struct proc *parent) { sx_assert(&proctree_lock, SX_XLOCKED); PROC_LOCK_ASSERT(child, MA_OWNED); if (child->p_pptr == parent) return; PROC_LOCK(child->p_pptr); sigqueue_take(child->p_ksi); PROC_UNLOCK(child->p_pptr); LIST_REMOVE(child, p_sibling); LIST_INSERT_HEAD(&parent->p_children, child, p_sibling); clear_orphan(child); if (child->p_flag & P_TRACED) { if (LIST_EMPTY(&child->p_pptr->p_orphans)) { child->p_treeflag |= P_TREE_FIRST_ORPHAN; LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan); } else { LIST_INSERT_AFTER(LIST_FIRST(&child->p_pptr->p_orphans), child, p_orphan); } child->p_treeflag |= P_TREE_ORPHANED; } child->p_pptr = parent; } Index: projects/clang390-import/sys/kern/kern_mutex.c =================================================================== --- projects/clang390-import/sys/kern/kern_mutex.c (revision 305686) +++ projects/clang390-import/sys/kern/kern_mutex.c (revision 305687) @@ -1,1082 +1,1119 @@ /*- * Copyright (c) 1998 Berkeley Software Design, Inc. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Berkeley Software Design Inc's name may not be used to endorse or * promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY BERKELEY SOFTWARE DESIGN INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL BERKELEY SOFTWARE DESIGN INC BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from BSDI $Id: mutex_witness.c,v 1.1.2.20 2000/04/27 03:10:27 cp Exp $ * and BSDI $Id: synch_machdep.c,v 2.3.2.39 2000/04/27 03:10:25 cp Exp $ */ /* * Machine independent bits of mutex implementation. */ #include __FBSDID("$FreeBSD$"); #include "opt_adaptive_mutexes.h" #include "opt_ddb.h" #include "opt_hwpmc_hooks.h" #include "opt_sched.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if defined(SMP) && !defined(NO_ADAPTIVE_MUTEXES) #define ADAPTIVE_MUTEXES #endif #ifdef HWPMC_HOOKS #include PMC_SOFT_DEFINE( , , lock, failed); #endif /* * Return the mutex address when the lock cookie address is provided. * This functionality assumes that struct mtx* have a member named mtx_lock. */ #define mtxlock2mtx(c) (__containerof(c, struct mtx, mtx_lock)) /* * Internal utility macros. 
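 *
 * [Editor's note: mtxlock2mtx(), defined above, uses __containerof()
 * to recover the enclosing struct mtx from a pointer to its mtx_lock
 * member, so that, schematically:
 *
 *	volatile uintptr_t *c = &m->mtx_lock;
 *	MPASS(mtxlock2mtx(c) == m);
 *
 * This is what lets the _mtx_*() functions below take a lock-word
 * cookie instead of a struct mtx pointer.]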
*/ #define mtx_unowned(m) ((m)->mtx_lock == MTX_UNOWNED) #define mtx_destroyed(m) ((m)->mtx_lock == MTX_DESTROYED) #define mtx_owner(m) ((struct thread *)((m)->mtx_lock & ~MTX_FLAGMASK)) static void assert_mtx(const struct lock_object *lock, int what); #ifdef DDB static void db_show_mtx(const struct lock_object *lock); #endif static void lock_mtx(struct lock_object *lock, uintptr_t how); static void lock_spin(struct lock_object *lock, uintptr_t how); #ifdef KDTRACE_HOOKS static int owner_mtx(const struct lock_object *lock, struct thread **owner); #endif static uintptr_t unlock_mtx(struct lock_object *lock); static uintptr_t unlock_spin(struct lock_object *lock); /* * Lock classes for sleep and spin mutexes. */ struct lock_class lock_class_mtx_sleep = { .lc_name = "sleep mutex", .lc_flags = LC_SLEEPLOCK | LC_RECURSABLE, .lc_assert = assert_mtx, #ifdef DDB .lc_ddb_show = db_show_mtx, #endif .lc_lock = lock_mtx, .lc_unlock = unlock_mtx, #ifdef KDTRACE_HOOKS .lc_owner = owner_mtx, #endif }; struct lock_class lock_class_mtx_spin = { .lc_name = "spin mutex", .lc_flags = LC_SPINLOCK | LC_RECURSABLE, .lc_assert = assert_mtx, #ifdef DDB .lc_ddb_show = db_show_mtx, #endif .lc_lock = lock_spin, .lc_unlock = unlock_spin, #ifdef KDTRACE_HOOKS .lc_owner = owner_mtx, #endif }; #ifdef ADAPTIVE_MUTEXES static SYSCTL_NODE(_debug, OID_AUTO, mtx, CTLFLAG_RD, NULL, "mtx debugging"); static struct lock_delay_config mtx_delay = { .initial = 1000, .step = 500, .min = 100, .max = 5000, }; SYSCTL_INT(_debug_mtx, OID_AUTO, delay_initial, CTLFLAG_RW, &mtx_delay.initial, 0, ""); SYSCTL_INT(_debug_mtx, OID_AUTO, delay_step, CTLFLAG_RW, &mtx_delay.step, 0, ""); SYSCTL_INT(_debug_mtx, OID_AUTO, delay_min, CTLFLAG_RW, &mtx_delay.min, 0, ""); SYSCTL_INT(_debug_mtx, OID_AUTO, delay_max, CTLFLAG_RW, &mtx_delay.max, 0, ""); static void mtx_delay_sysinit(void *dummy) { mtx_delay.initial = mp_ncpus * 25; mtx_delay.step = (mp_ncpus * 25) / 2; mtx_delay.min = mp_ncpus * 5; mtx_delay.max = mp_ncpus * 25 * 10; } LOCK_DELAY_SYSINIT(mtx_delay_sysinit); #endif +static SYSCTL_NODE(_debug, OID_AUTO, mtx_spin, CTLFLAG_RD, NULL, + "mtx spin debugging"); + +static struct lock_delay_config mtx_spin_delay = { + .initial = 1000, + .step = 500, + .min = 100, + .max = 5000, +}; + +SYSCTL_INT(_debug_mtx_spin, OID_AUTO, delay_initial, CTLFLAG_RW, + &mtx_spin_delay.initial, 0, ""); +SYSCTL_INT(_debug_mtx_spin, OID_AUTO, delay_step, CTLFLAG_RW, &mtx_spin_delay.step, + 0, ""); +SYSCTL_INT(_debug_mtx_spin, OID_AUTO, delay_min, CTLFLAG_RW, &mtx_spin_delay.min, + 0, ""); +SYSCTL_INT(_debug_mtx_spin, OID_AUTO, delay_max, CTLFLAG_RW, &mtx_spin_delay.max, + 0, ""); + +static void +mtx_spin_delay_sysinit(void *dummy) +{ + + mtx_spin_delay.initial = mp_ncpus * 25; + mtx_spin_delay.step = (mp_ncpus * 25) / 2; + mtx_spin_delay.min = mp_ncpus * 5; + mtx_spin_delay.max = mp_ncpus * 25 * 10; +} +LOCK_DELAY_SYSINIT(mtx_spin_delay_sysinit); + /* * System-wide mutexes */ struct mtx blocked_lock; struct mtx Giant; void assert_mtx(const struct lock_object *lock, int what) { mtx_assert((const struct mtx *)lock, what); } void lock_mtx(struct lock_object *lock, uintptr_t how) { mtx_lock((struct mtx *)lock); } void lock_spin(struct lock_object *lock, uintptr_t how) { panic("spin locks can only use msleep_spin"); } uintptr_t unlock_mtx(struct lock_object *lock) { struct mtx *m; m = (struct mtx *)lock; mtx_assert(m, MA_OWNED | MA_NOTRECURSED); mtx_unlock(m); return (0); } uintptr_t unlock_spin(struct lock_object *lock) { panic("spin locks can only use msleep_spin"); } 
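/*
 * [Editor's note: the back-off knobs declared above are plain
 * SYSCTL_INT values, so, assuming the standard sysctl(8) interface,
 * they can be inspected and tuned at runtime, e.g.:
 *
 *	sysctl debug.mtx.delay_initial=2000
 *	sysctl debug.mtx_spin.delay_max=10000
 *
 * Note that mtx_delay_sysinit() and mtx_spin_delay_sysinit() rescale
 * the compiled-in defaults by mp_ncpus at boot.]
 */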
#ifdef KDTRACE_HOOKS int owner_mtx(const struct lock_object *lock, struct thread **owner) { const struct mtx *m = (const struct mtx *)lock; *owner = mtx_owner(m); return (mtx_unowned(m) == 0); } #endif /* * Function versions of the inlined __mtx_* macros. These are used by * modules and can also be called from assembly language if needed. */ void __mtx_lock_flags(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; if (SCHEDULER_STOPPED()) return; m = mtxlock2mtx(c); KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(curthread), ("mtx_lock() by idle thread %p on sleep mutex %s @ %s:%d", curthread, m->lock_object.lo_name, file, line)); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_lock() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep, ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); WITNESS_CHECKORDER(&m->lock_object, (opts & ~MTX_RECURSE) | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL); __mtx_lock(m, curthread, opts, file, line); LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file, line); WITNESS_LOCK(&m->lock_object, (opts & ~MTX_RECURSE) | LOP_EXCLUSIVE, file, line); TD_LOCKS_INC(curthread); } void __mtx_unlock_flags(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; if (SCHEDULER_STOPPED()) return; m = mtxlock2mtx(c); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_unlock() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep, ("mtx_unlock() of spin mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); WITNESS_UNLOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line); LOCK_LOG_LOCK("UNLOCK", &m->lock_object, opts, m->mtx_recurse, file, line); mtx_assert(m, MA_OWNED); __mtx_unlock(m, curthread, opts, file, line); TD_LOCKS_DEC(curthread); } void __mtx_lock_spin_flags(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; if (SCHEDULER_STOPPED()) return; m = mtxlock2mtx(c); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_lock_spin() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin, ("mtx_lock_spin() of sleep mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); if (mtx_owned(m)) KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 || (opts & MTX_RECURSE) != 0, ("mtx_lock_spin: recursed on non-recursive mutex %s @ %s:%d\n", m->lock_object.lo_name, file, line)); opts &= ~MTX_RECURSE; WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL); __mtx_lock_spin(m, curthread, opts, file, line); LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file, line); WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line); } int __mtx_trylock_spin_flags(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; if (SCHEDULER_STOPPED()) return (1); m = mtxlock2mtx(c); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_trylock_spin() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin, ("mtx_trylock_spin() of sleep mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); KASSERT((opts & MTX_RECURSE) == 0, ("mtx_trylock_spin: unsupp. 
opt MTX_RECURSE on mutex %s @ %s:%d\n", m->lock_object.lo_name, file, line)); if (__mtx_trylock_spin(m, curthread, opts, file, line)) { LOCK_LOG_TRY("LOCK", &m->lock_object, opts, 1, file, line); WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line); return (1); } LOCK_LOG_TRY("LOCK", &m->lock_object, opts, 0, file, line); return (0); } void __mtx_unlock_spin_flags(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; if (SCHEDULER_STOPPED()) return; m = mtxlock2mtx(c); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_unlock_spin() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin, ("mtx_unlock_spin() of sleep mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); WITNESS_UNLOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line); LOCK_LOG_LOCK("UNLOCK", &m->lock_object, opts, m->mtx_recurse, file, line); mtx_assert(m, MA_OWNED); __mtx_unlock_spin(m); } /* * The important part of mtx_trylock{,_flags}() * Tries to acquire lock `m.' If this function is called on a mutex that * is already owned, it will recursively acquire the lock. */ int _mtx_trylock_flags_(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; #ifdef LOCK_PROFILING uint64_t waittime = 0; int contested = 0; #endif int rval; if (SCHEDULER_STOPPED()) return (1); m = mtxlock2mtx(c); KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(curthread), ("mtx_trylock() by idle thread %p on sleep mutex %s @ %s:%d", curthread, m->lock_object.lo_name, file, line)); KASSERT(m->mtx_lock != MTX_DESTROYED, ("mtx_trylock() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep, ("mtx_trylock() of spin mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); if (mtx_owned(m) && ((m->lock_object.lo_flags & LO_RECURSABLE) != 0 || (opts & MTX_RECURSE) != 0)) { m->mtx_recurse++; atomic_set_ptr(&m->mtx_lock, MTX_RECURSED); rval = 1; } else rval = _mtx_obtain_lock(m, (uintptr_t)curthread); opts &= ~MTX_RECURSE; LOCK_LOG_TRY("LOCK", &m->lock_object, opts, rval, file, line); if (rval) { WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE | LOP_TRYLOCK, file, line); TD_LOCKS_INC(curthread); if (m->mtx_recurse == 0) LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested, waittime, file, line); } return (rval); } /* * __mtx_lock_sleep: the tougher part of acquiring an MTX_DEF lock. * * We call this if the lock is either contested (i.e. we need to go to * sleep waiting for it), or if we need to recurse on it. 
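 *
 * [Editor's note: a rough sketch of how this slow path is reached;
 * the inline fast path lives in sys/mutex.h (not part of this diff)
 * and is approximately:
 *
 *	if (!_mtx_obtain_lock(m, tid))	one cmpset from MTX_UNOWNED
 *		__mtx_lock_sleep(&m->mtx_lock, tid, opts, file, line);
 *
 * so this function only runs once that single compare-and-set has
 * already failed.]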
*/ void __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t tid, int opts, const char *file, int line) { struct mtx *m; struct turnstile *ts; uintptr_t v; #ifdef ADAPTIVE_MUTEXES volatile struct thread *owner; #endif #ifdef KTR int cont_logged = 0; #endif #ifdef LOCK_PROFILING int contested = 0; uint64_t waittime = 0; #endif #if defined(ADAPTIVE_MUTEXES) || defined(KDTRACE_HOOKS) struct lock_delay_arg lda; #endif #ifdef KDTRACE_HOOKS u_int sleep_cnt = 0; int64_t sleep_time = 0; int64_t all_time = 0; #endif if (SCHEDULER_STOPPED()) return; #if defined(ADAPTIVE_MUTEXES) lock_delay_arg_init(&lda, &mtx_delay); #elif defined(KDTRACE_HOOKS) lock_delay_arg_init(&lda, NULL); #endif m = mtxlock2mtx(c); if (mtx_owned(m)) { KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 || (opts & MTX_RECURSE) != 0, ("_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n", m->lock_object.lo_name, file, line)); opts &= ~MTX_RECURSE; m->mtx_recurse++; atomic_set_ptr(&m->mtx_lock, MTX_RECURSED); if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR1(KTR_LOCK, "_mtx_lock_sleep: %p recursing", m); return; } opts &= ~MTX_RECURSE; #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif lock_profile_obtain_lock_failed(&m->lock_object, &contested, &waittime); if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR4(KTR_LOCK, "_mtx_lock_sleep: %s contested (lock=%p) at %s:%d", m->lock_object.lo_name, (void *)m->mtx_lock, file, line); #ifdef KDTRACE_HOOKS all_time -= lockstat_nsecs(&m->lock_object); #endif for (;;) { if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) break; #ifdef KDTRACE_HOOKS lda.spin_cnt++; #endif #ifdef ADAPTIVE_MUTEXES /* * If the owner is running on another CPU, spin until the * owner stops running or the state of the lock changes. */ v = m->mtx_lock; if (v != MTX_UNOWNED) { owner = (struct thread *)(v & ~MTX_FLAGMASK); if (TD_IS_RUNNING(owner)) { if (LOCK_LOG_TEST(&m->lock_object, 0)) CTR3(KTR_LOCK, "%s: spinning on %p held by %p", __func__, m, owner); KTR_STATE1(KTR_SCHED, "thread", sched_tdname((struct thread *)tid), "spinning", "lockname:\"%s\"", m->lock_object.lo_name); while (mtx_owner(m) == owner && TD_IS_RUNNING(owner)) lock_delay(&lda); KTR_STATE0(KTR_SCHED, "thread", sched_tdname((struct thread *)tid), "running"); continue; } } #endif ts = turnstile_trywait(&m->lock_object); v = m->mtx_lock; /* * Check if the lock has been released while spinning for * the turnstile chain lock. */ if (v == MTX_UNOWNED) { turnstile_cancel(ts); continue; } #ifdef ADAPTIVE_MUTEXES /* * The current lock owner might have started executing * on another CPU (or the lock could have changed * owners) while we were waiting on the turnstile * chain lock. If so, drop the turnstile lock and try * again. */ owner = (struct thread *)(v & ~MTX_FLAGMASK); if (TD_IS_RUNNING(owner)) { turnstile_cancel(ts); continue; } #endif /* * If the mutex isn't already contested and a failure occurs * setting the contested bit, the mutex was either released * or the state of the MTX_RECURSED bit changed. */ if ((v & MTX_CONTESTED) == 0 && !atomic_cmpset_ptr(&m->mtx_lock, v, v | MTX_CONTESTED)) { turnstile_cancel(ts); continue; } /* * We definitely must sleep for this lock. */ mtx_assert(m, MA_NOTOWNED); #ifdef KTR if (!cont_logged) { CTR6(KTR_CONTENTION, "contention: %p at %s:%d wants %s, taken by %s:%d", (void *)tid, file, line, m->lock_object.lo_name, WITNESS_FILE(&m->lock_object), WITNESS_LINE(&m->lock_object)); cont_logged = 1; } #endif /* * Block on the turnstile. 
*/ #ifdef KDTRACE_HOOKS sleep_time -= lockstat_nsecs(&m->lock_object); #endif turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE); #ifdef KDTRACE_HOOKS sleep_time += lockstat_nsecs(&m->lock_object); sleep_cnt++; #endif } #ifdef KDTRACE_HOOKS all_time += lockstat_nsecs(&m->lock_object); #endif #ifdef KTR if (cont_logged) { CTR4(KTR_CONTENTION, "contention end: %s acquired by %p at %s:%d", m->lock_object.lo_name, (void *)tid, file, line); } #endif LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested, waittime, file, line); #ifdef KDTRACE_HOOKS if (sleep_time) LOCKSTAT_RECORD1(adaptive__block, m, sleep_time); /* * Only record the loops spinning and not sleeping. */ if (lda.spin_cnt > sleep_cnt) LOCKSTAT_RECORD1(adaptive__spin, m, all_time - sleep_time); #endif } static void _mtx_lock_spin_failed(struct mtx *m) { struct thread *td; td = mtx_owner(m); /* If the mutex is unlocked, try again. */ if (td == NULL) return; printf( "spin lock %p (%s) held by %p (tid %d) too long\n", m, m->lock_object.lo_name, td, td->td_tid); #ifdef WITNESS witness_display_spinlock(&m->lock_object, td, printf); #endif panic("spin lock held too long"); } #ifdef SMP /* * _mtx_lock_spin_cookie: the tougher part of acquiring an MTX_SPIN lock. * * This is only called if we need to actually spin for the lock. Recursion * is handled inline. */ void _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t tid, int opts, const char *file, int line) { struct mtx *m; - int i = 0; + struct lock_delay_arg lda; #ifdef LOCK_PROFILING int contested = 0; uint64_t waittime = 0; #endif #ifdef KDTRACE_HOOKS int64_t spin_time = 0; #endif if (SCHEDULER_STOPPED()) return; + lock_delay_arg_init(&lda, &mtx_spin_delay); m = mtxlock2mtx(c); if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR1(KTR_LOCK, "_mtx_lock_spin: %p spinning", m); KTR_STATE1(KTR_SCHED, "thread", sched_tdname((struct thread *)tid), "spinning", "lockname:\"%s\"", m->lock_object.lo_name); #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif lock_profile_obtain_lock_failed(&m->lock_object, &contested, &waittime); #ifdef KDTRACE_HOOKS spin_time -= lockstat_nsecs(&m->lock_object); #endif for (;;) { if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) break; /* Give interrupts a chance while we spin. 
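 *
 * [Editor's note: the diff below replaces the bare i++/cpu_spinwait()
 * counter with lock_delay(&lda), which, under the mtx_spin_delay
 * configuration added above, spins in bounded, growing bursts while
 * accumulating the iteration count in lda.spin_cnt; the 10000000 and
 * 60000000 thresholds, the DELAY(1) fallback, and the eventual
 * _mtx_lock_spin_failed() panic are preserved. The exact burst-growth
 * policy is lock_delay()'s, in kern/subr_lock.c.]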
*/ spinlock_exit(); while (m->mtx_lock != MTX_UNOWNED) { - if (i++ < 10000000) { - cpu_spinwait(); + if (lda.spin_cnt < 10000000) { + lock_delay(&lda); continue; } - if (i < 60000000 || kdb_active || panicstr != NULL) + lda.spin_cnt++; + if (lda.spin_cnt < 60000000 || kdb_active || + panicstr != NULL) DELAY(1); else _mtx_lock_spin_failed(m); cpu_spinwait(); } spinlock_enter(); } #ifdef KDTRACE_HOOKS spin_time += lockstat_nsecs(&m->lock_object); #endif if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR1(KTR_LOCK, "_mtx_lock_spin: %p spin done", m); KTR_STATE0(KTR_SCHED, "thread", sched_tdname((struct thread *)tid), "running"); #ifdef KDTRACE_HOOKS LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m, contested, waittime, file, line); if (spin_time != 0) LOCKSTAT_RECORD1(spin__spin, m, spin_time); #endif } #endif /* SMP */ void thread_lock_flags_(struct thread *td, int opts, const char *file, int line) { struct mtx *m; uintptr_t tid; - int i; + struct lock_delay_arg lda; #ifdef LOCK_PROFILING int contested = 0; uint64_t waittime = 0; #endif #ifdef KDTRACE_HOOKS int64_t spin_time = 0; #endif - i = 0; tid = (uintptr_t)curthread; if (SCHEDULER_STOPPED()) { /* * Ensure that spinlock sections are balanced even when the * scheduler is stopped, since we may otherwise inadvertently * re-enable interrupts while dumping core. */ spinlock_enter(); return; } + lock_delay_arg_init(&lda, &mtx_spin_delay); + #ifdef KDTRACE_HOOKS spin_time -= lockstat_nsecs(&td->td_lock->lock_object); #endif for (;;) { retry: spinlock_enter(); m = td->td_lock; KASSERT(m->mtx_lock != MTX_DESTROYED, ("thread_lock() of destroyed mutex @ %s:%d", file, line)); KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin, ("thread_lock() of sleep mutex %s @ %s:%d", m->lock_object.lo_name, file, line)); if (mtx_owned(m)) KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0, ("thread_lock: recursed on non-recursive mutex %s @ %s:%d\n", m->lock_object.lo_name, file, line)); WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL); for (;;) { if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) break; if (m->mtx_lock == tid) { m->mtx_recurse++; break; } #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif lock_profile_obtain_lock_failed(&m->lock_object, &contested, &waittime); /* Give interrupts a chance while we spin. 
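 *
 * [Editor's note: unlike a plain spin mutex, td->td_lock can change
 * identity while we wait (the scheduler re-points it at a run queue
 * or turnstile lock), which is why the loop below re-checks
 * m != td->td_lock and does "goto retry" to reload the current lock
 * instead of spinning forever on a stale one.]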
*/ spinlock_exit(); while (m->mtx_lock != MTX_UNOWNED) { - if (i++ < 10000000) + if (lda.spin_cnt < 10000000) { + lock_delay(&lda); + } else { + lda.spin_cnt++; + if (lda.spin_cnt < 60000000 || + kdb_active || panicstr != NULL) + DELAY(1); + else + _mtx_lock_spin_failed(m); cpu_spinwait(); - else if (i < 60000000 || - kdb_active || panicstr != NULL) - DELAY(1); - else - _mtx_lock_spin_failed(m); - cpu_spinwait(); + } if (m != td->td_lock) goto retry; } spinlock_enter(); } if (m == td->td_lock) break; __mtx_unlock_spin(m); /* does spinlock_exit() */ } #ifdef KDTRACE_HOOKS spin_time += lockstat_nsecs(&m->lock_object); #endif if (m->mtx_recurse == 0) LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m, contested, waittime, file, line); LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file, line); WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line); #ifdef KDTRACE_HOOKS if (spin_time != 0) LOCKSTAT_RECORD1(thread__spin, m, spin_time); #endif } struct mtx * thread_lock_block(struct thread *td) { struct mtx *lock; THREAD_LOCK_ASSERT(td, MA_OWNED); lock = td->td_lock; td->td_lock = &blocked_lock; mtx_unlock_spin(lock); return (lock); } void thread_lock_unblock(struct thread *td, struct mtx *new) { mtx_assert(new, MA_OWNED); MPASS(td->td_lock == &blocked_lock); atomic_store_rel_ptr((volatile void *)&td->td_lock, (uintptr_t)new); } void thread_lock_set(struct thread *td, struct mtx *new) { struct mtx *lock; mtx_assert(new, MA_OWNED); THREAD_LOCK_ASSERT(td, MA_OWNED); lock = td->td_lock; td->td_lock = new; mtx_unlock_spin(lock); } /* * __mtx_unlock_sleep: the tougher part of releasing an MTX_DEF lock. * * We are only called here if the lock is recursed or contested (i.e. we * need to wake up a blocked thread). */ void __mtx_unlock_sleep(volatile uintptr_t *c, int opts, const char *file, int line) { struct mtx *m; struct turnstile *ts; if (SCHEDULER_STOPPED()) return; m = mtxlock2mtx(c); if (mtx_recursed(m)) { if (--(m->mtx_recurse) == 0) atomic_clear_ptr(&m->mtx_lock, MTX_RECURSED); if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR1(KTR_LOCK, "_mtx_unlock_sleep: %p unrecurse", m); return; } /* * We have to lock the chain before the turnstile so this turnstile * can be removed from the hash list if it is empty. */ turnstile_chain_lock(&m->lock_object); ts = turnstile_lookup(&m->lock_object); if (LOCK_LOG_TEST(&m->lock_object, opts)) CTR1(KTR_LOCK, "_mtx_unlock_sleep: %p contested", m); MPASS(ts != NULL); turnstile_broadcast(ts, TS_EXCLUSIVE_QUEUE); _mtx_release_lock_quick(m); /* * This turnstile is no longer associated with the mutex. We can * unlock the chain lock so a new turnstile may take its place. */ turnstile_unpend(ts, TS_EXCLUSIVE_LOCK); turnstile_chain_unlock(&m->lock_object); } /* * All the unlocking of MTX_SPIN locks is done inline. * See the __mtx_unlock_spin() macro for the details.
/* * All the unlocking of MTX_SPIN locks is done inline. * See the __mtx_unlock_spin() macro for the details. */ /* * The backing function for the INVARIANTS-enabled mtx_assert() */ #ifdef INVARIANT_SUPPORT void __mtx_assert(const volatile uintptr_t *c, int what, const char *file, int line) { const struct mtx *m; if (panicstr != NULL || dumping) return; m = mtxlock2mtx(c); switch (what) { case MA_OWNED: case MA_OWNED | MA_RECURSED: case MA_OWNED | MA_NOTRECURSED: if (!mtx_owned(m)) panic("mutex %s not owned at %s:%d", m->lock_object.lo_name, file, line); if (mtx_recursed(m)) { if ((what & MA_NOTRECURSED) != 0) panic("mutex %s recursed at %s:%d", m->lock_object.lo_name, file, line); } else if ((what & MA_RECURSED) != 0) { panic("mutex %s unrecursed at %s:%d", m->lock_object.lo_name, file, line); } break; case MA_NOTOWNED: if (mtx_owned(m)) panic("mutex %s owned at %s:%d", m->lock_object.lo_name, file, line); break; default: panic("unknown mtx_assert at %s:%d", file, line); } } #endif /* * General init routine used by the MTX_SYSINIT() macro. */ void mtx_sysinit(void *arg) { struct mtx_args *margs = arg; mtx_init((struct mtx *)margs->ma_mtx, margs->ma_desc, NULL, margs->ma_opts); } /* * Mutex initialization routine; initialize lock `m' with the type and * options contained in `opts' and name `name.' The optional * lock type `type' is used as a general lock category name for use with * witness. */ void _mtx_init(volatile uintptr_t *c, const char *name, const char *type, int opts) { struct mtx *m; struct lock_class *class; int flags; m = mtxlock2mtx(c); MPASS((opts & ~(MTX_SPIN | MTX_QUIET | MTX_RECURSE | MTX_NOWITNESS | MTX_DUPOK | MTX_NOPROFILE | MTX_NEW)) == 0); ASSERT_ATOMIC_LOAD_PTR(m->mtx_lock, ("%s: mtx_lock not aligned for %s: %p", __func__, name, &m->mtx_lock)); /* Determine lock class and lock flags. */ if (opts & MTX_SPIN) class = &lock_class_mtx_spin; else class = &lock_class_mtx_sleep; flags = 0; if (opts & MTX_QUIET) flags |= LO_QUIET; if (opts & MTX_RECURSE) flags |= LO_RECURSABLE; if ((opts & MTX_NOWITNESS) == 0) flags |= LO_WITNESS; if (opts & MTX_DUPOK) flags |= LO_DUPOK; if (opts & MTX_NOPROFILE) flags |= LO_NOPROFILE; if (opts & MTX_NEW) flags |= LO_NEW; /* Initialize mutex. */ lock_init(&m->lock_object, class, name, type, flags); m->mtx_lock = MTX_UNOWNED; m->mtx_recurse = 0; } /* * Remove lock `m' from all_mtx queue. We don't allow MTX_QUIET to be * passed in as a flag here because if the corresponding mtx_init() was * called with MTX_QUIET set, then it will already be set in the mutex's * flags. */ void _mtx_destroy(volatile uintptr_t *c) { struct mtx *m; m = mtxlock2mtx(c); if (!mtx_owned(m)) MPASS(mtx_unowned(m)); else { MPASS((m->mtx_lock & (MTX_RECURSED|MTX_CONTESTED)) == 0); /* Perform the non-mtx related part of mtx_unlock_spin(). */ if (LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin) spinlock_exit(); else TD_LOCKS_DEC(curthread); lock_profile_release_lock(&m->lock_object); /* Tell witness this isn't locked to make it happy. */ WITNESS_UNLOCK(&m->lock_object, LOP_EXCLUSIVE, __FILE__, __LINE__); } m->mtx_lock = MTX_DESTROYED; lock_destroy(&m->lock_object); } /* * Initialize the mutex code and system mutexes. This is called from the MD * startup code prior to mi_startup(). The per-CPU data space needs to be * set up before this is called. */ void mutex_init(void) { /* Set up turnstiles so that sleep mutexes work. */ init_turnstiles(); /* * Initialize mutexes. */ mtx_init(&Giant, "Giant", NULL, MTX_DEF | MTX_RECURSE); mtx_init(&blocked_lock, "blocked lock", NULL, MTX_SPIN); blocked_lock.mtx_lock = 0xdeadc0de; /* Always blocked.
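
For reference, the mtx_init()/mtx_assert() interfaces implemented above are consumed as in the following minimal kernel-context sketch (not part of this change; example_mtx, example_bump, and counter are made-up names):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx example_mtx;
/* Runs mtx_init(&example_mtx, "example mutex", NULL, MTX_DEF) at boot. */
MTX_SYSINIT(example_mtx, &example_mtx, "example mutex", MTX_DEF);

static int counter;	/* protected by example_mtx */

static void
example_bump(void)
{

	mtx_lock(&example_mtx);
	/* Expands to __mtx_assert(); a no-op unless built with INVARIANTS. */
	mtx_assert(&example_mtx, MA_OWNED | MA_NOTRECURSED);
	counter++;
	mtx_unlock(&example_mtx);
}
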
*/ mtx_init(&proc0.p_mtx, "process lock", NULL, MTX_DEF | MTX_DUPOK); mtx_init(&proc0.p_slock, "process slock", NULL, MTX_SPIN); mtx_init(&proc0.p_statmtx, "pstatl", NULL, MTX_SPIN); mtx_init(&proc0.p_itimmtx, "pitiml", NULL, MTX_SPIN); mtx_init(&proc0.p_profmtx, "pprofl", NULL, MTX_SPIN); mtx_init(&devmtx, "cdev", NULL, MTX_DEF); mtx_lock(&Giant); } #ifdef DDB void db_show_mtx(const struct lock_object *lock) { struct thread *td; const struct mtx *m; m = (const struct mtx *)lock; db_printf(" flags: {"); if (LOCK_CLASS(lock) == &lock_class_mtx_spin) db_printf("SPIN"); else db_printf("DEF"); if (m->lock_object.lo_flags & LO_RECURSABLE) db_printf(", RECURSE"); if (m->lock_object.lo_flags & LO_DUPOK) db_printf(", DUPOK"); db_printf("}\n"); db_printf(" state: {"); if (mtx_unowned(m)) db_printf("UNOWNED"); else if (mtx_destroyed(m)) db_printf("DESTROYED"); else { db_printf("OWNED"); if (m->mtx_lock & MTX_CONTESTED) db_printf(", CONTESTED"); if (m->mtx_lock & MTX_RECURSED) db_printf(", RECURSED"); } db_printf("}\n"); if (!mtx_unowned(m) && !mtx_destroyed(m)) { td = mtx_owner(m); db_printf(" owner: %p (tid %d, pid %d, \"%s\")\n", td, td->td_tid, td->td_proc->p_pid, td->td_name); if (mtx_recursed(m)) db_printf(" recursed: %d\n", m->mtx_recurse); } } #endif Index: projects/clang390-import/sys/kern/subr_witness.c =================================================================== --- projects/clang390-import/sys/kern/subr_witness.c (revision 305686) +++ projects/clang390-import/sys/kern/subr_witness.c (revision 305687) @@ -1,3017 +1,3025 @@ /*- * Copyright (c) 2008 Isilon Systems, Inc. * Copyright (c) 2008 Ilya Maykov * Copyright (c) 1998 Berkeley Software Design, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Berkeley Software Design Inc's name may not be used to endorse or * promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY BERKELEY SOFTWARE DESIGN INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL BERKELEY SOFTWARE DESIGN INC BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from BSDI $Id: mutex_witness.c,v 1.1.2.20 2000/04/27 03:10:27 cp Exp $ * and BSDI $Id: synch_machdep.c,v 2.3.2.39 2000/04/27 03:10:25 cp Exp $ */ /* * Implementation of the `witness' lock verifier. Originally implemented for * mutexes in BSD/OS. Extended to handle generic lock objects and lock * classes in FreeBSD. 
*/ /* * Main Entry: witness * Pronunciation: 'wit-n&s * Function: noun * Etymology: Middle English witnesse, from Old English witnes knowledge, * testimony, witness, from 2wit * Date: before 12th century * 1 : attestation of a fact or event : TESTIMONY * 2 : one that gives evidence; specifically : one who testifies in * a cause or before a judicial tribunal * 3 : one asked to be present at a transaction so as to be able to * testify to its having taken place * 4 : one who has personal knowledge of something * 5 a : something serving as evidence or proof : SIGN * b : public affirmation by word or example of usually * religious faith or conviction * 6 capitalized : a member of the Jehovah's Witnesses */ /* * Special rules concerning Giant and lock orders: * * 1) Giant must be acquired before any other mutexes. Stated another way, * no other mutex may be held when Giant is acquired. * * 2) Giant must be released when blocking on a sleepable lock. * * This rule is less obvious, but is a result of Giant providing the same * semantics as spl(). Basically, when a thread sleeps, it must release * Giant. When a thread blocks on a sleepable lock, it sleeps. Hence rule * 2). * * 3) Giant may be acquired before or after sleepable locks. * * This rule is also not quite as obvious. Giant may be acquired after * a sleepable lock because it is a non-sleepable lock and non-sleepable * locks may always be acquired while holding a sleepable lock. The second * case, Giant before a sleepable lock, follows from rule 2) above. Suppose * you have two threads T1 and T2 and a sleepable lock X. Suppose that T1 * acquires X and blocks on Giant. Then suppose that T2 acquires Giant and * blocks on X. When T2 blocks on X, T2 will release Giant allowing T1 to * execute. Thus, acquiring Giant both before and after a sleepable lock * will not result in a lock order reversal. */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_hwpmc_hooks.h" #include "opt_stack.h" #include "opt_witness.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DDB #include #endif #include #if !defined(DDB) && !defined(STACK) #error "DDB or STACK options are required for WITNESS" #endif /* Note that these traces do not work with KTR_ALQ. */ #if 0 #define KTR_WITNESS KTR_SUBSYS #else #define KTR_WITNESS 0 #endif #define LI_RECURSEMASK 0x0000ffff /* Recursion depth of lock instance. */ #define LI_EXCLUSIVE 0x00010000 /* Exclusive lock instance. */ #define LI_NORELEASE 0x00020000 /* Lock not allowed to be released. */ /* Define this to check for blessed mutexes */ #undef BLESSING #ifndef WITNESS_COUNT #define WITNESS_COUNT 1536 #endif #define WITNESS_HASH_SIZE 251 /* Prime, gives load factor < 2 */ #define WITNESS_PENDLIST (1024 + MAXCPU) /* Allocate 256 KB of stack data space */ #define WITNESS_LO_DATA_COUNT 2048 /* Prime, gives load factor of ~2 at full load */ #define WITNESS_LO_HASH_SIZE 1021 /* * XXX: This is somewhat bogus, as we assume here that at most 2048 threads * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should * probably be safe for the most part, but it's still a SWAG. */ #define LOCK_NCHILDREN 5 #define LOCK_CHILDCOUNT 2048 #define MAX_W_NAME 64 #define FULLGRAPH_SBUF_SIZE 512 /* * These flags go in the witness relationship matrix and describe the * relationship between any two struct witness objects. */ #define WITNESS_UNRELATED 0x00 /* No lock order relation. 
*/ #define WITNESS_PARENT 0x01 /* Parent, aka direct ancestor. */ #define WITNESS_ANCESTOR 0x02 /* Direct or indirect ancestor. */ #define WITNESS_CHILD 0x04 /* Child, aka direct descendant. */ #define WITNESS_DESCENDANT 0x08 /* Direct or indirect descendant. */ #define WITNESS_ANCESTOR_MASK (WITNESS_PARENT | WITNESS_ANCESTOR) #define WITNESS_DESCENDANT_MASK (WITNESS_CHILD | WITNESS_DESCENDANT) #define WITNESS_RELATED_MASK \ (WITNESS_ANCESTOR_MASK | WITNESS_DESCENDANT_MASK) #define WITNESS_REVERSAL 0x10 /* A lock order reversal has been * observed. */ #define WITNESS_RESERVED1 0x20 /* Unused flag, reserved. */ #define WITNESS_RESERVED2 0x40 /* Unused flag, reserved. */ #define WITNESS_LOCK_ORDER_KNOWN 0x80 /* This lock order is known. */ /* Descendant to ancestor flags */ #define WITNESS_DTOA(x) (((x) & WITNESS_RELATED_MASK) >> 2) /* Ancestor to descendant flags */ #define WITNESS_ATOD(x) (((x) & WITNESS_RELATED_MASK) << 2) #define WITNESS_INDEX_ASSERT(i) \ MPASS((i) > 0 && (i) <= w_max_used_index && (i) < witness_count) static MALLOC_DEFINE(M_WITNESS, "Witness", "Witness"); /* * Lock instances. A lock instance is the data associated with a lock while * it is held by witness. For example, a lock instance will hold the * recursion count of a lock. Lock instances are held in lists. Spin locks * are held in a per-cpu list while sleep locks are held in per-thread list. */ struct lock_instance { struct lock_object *li_lock; const char *li_file; int li_line; u_int li_flags; }; /* * A simple list type used to build the list of locks held by a thread * or CPU. We can't simply embed the list in struct lock_object since a * lock may be held by more than one thread if it is a shared lock. Locks * are added to the head of the list, so we fill up each list entry from * "the back" logically. To ease some of the arithmetic, we actually fill * in each list entry the normal way (children[0] then children[1], etc.) but * when we traverse the list we read children[count-1] as the first entry * down to children[0] as the final entry. */ struct lock_list_entry { struct lock_list_entry *ll_next; struct lock_instance ll_children[LOCK_NCHILDREN]; u_int ll_count; }; /* * The main witness structure. One of these per named lock type in the system * (for example, "vnode interlock"). */ struct witness { char w_name[MAX_W_NAME]; uint32_t w_index; /* Index in the relationship matrix */ struct lock_class *w_class; STAILQ_ENTRY(witness) w_list; /* List of all witnesses. */ STAILQ_ENTRY(witness) w_typelist; /* Witnesses of a type. */ struct witness *w_hash_next; /* Linked list in hash buckets. */ const char *w_file; /* File where last acquired */ uint32_t w_line; /* Line where last acquired */ uint32_t w_refcount; uint16_t w_num_ancestors; /* direct/indirect * ancestor count */ uint16_t w_num_descendants; /* direct/indirect * descendant count */ int16_t w_ddb_level; unsigned w_displayed:1; unsigned w_reversed:1; }; STAILQ_HEAD(witness_list, witness); /* * The witness hash table. Keys are witness names (const char *), elements are * witness objects (struct witness *). */ struct witness_hash { struct witness *wh_array[WITNESS_HASH_SIZE]; uint32_t wh_size; uint32_t wh_count; }; /* * Key type for the lock order data hash table. */ struct witness_lock_order_key { uint16_t from; uint16_t to; }; struct witness_lock_order_data { struct stack wlod_stack; struct witness_lock_order_key wlod_key; struct witness_lock_order_data *wlod_next; }; /* * The witness lock order data hash table. 
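
The WITNESS_DTOA()/WITNESS_ATOD() pair works because the parent/ancestor bits sit exactly two positions below the corresponding child/descendant bits, so a two-bit shift converts one endpoint's view of a relation into the other's. A standalone check of that arithmetic (plain C; constants copied from the definitions above):

#include <assert.h>

#define	WITNESS_PARENT		0x01
#define	WITNESS_ANCESTOR	0x02
#define	WITNESS_CHILD		0x04
#define	WITNESS_DESCENDANT	0x08
#define	WITNESS_RELATED_MASK	\
	(WITNESS_PARENT | WITNESS_ANCESTOR | WITNESS_CHILD | WITNESS_DESCENDANT)
#define	WITNESS_DTOA(x)	(((x) & WITNESS_RELATED_MASK) >> 2)
#define	WITNESS_ATOD(x)	(((x) & WITNESS_RELATED_MASK) << 2)

int
main(void)
{

	/* Ancestor-side bits map onto the matching descendant-side bits... */
	assert(WITNESS_ATOD(WITNESS_PARENT) == WITNESS_CHILD);
	assert(WITNESS_ATOD(WITNESS_ANCESTOR) == WITNESS_DESCENDANT);
	/* ...and back again. */
	assert(WITNESS_DTOA(WITNESS_CHILD) == WITNESS_PARENT);
	assert(WITNESS_DTOA(WITNESS_DESCENDANT) == WITNESS_ANCESTOR);
	return (0);
}

This is why _isitmyx() (further down) can verify that w_rmatrix[i1][i2] and w_rmatrix[i2][i1] are mutually consistent by shifting one value and comparing it against the other.
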
Keys are witness index tuples * (struct witness_lock_order_key), elements are lock order data objects * (struct witness_lock_order_data). */ struct witness_lock_order_hash { struct witness_lock_order_data *wloh_array[WITNESS_LO_HASH_SIZE]; u_int wloh_size; u_int wloh_count; }; #ifdef BLESSING struct witness_blessed { const char *b_lock1; const char *b_lock2; }; #endif struct witness_pendhelp { const char *wh_type; struct lock_object *wh_lock; }; struct witness_order_list_entry { const char *w_name; struct lock_class *w_class; }; /* * Returns 0 if one of the locks is a spin lock and the other is not. * Returns 1 otherwise. */ static __inline int witness_lock_type_equal(struct witness *w1, struct witness *w2) { return ((w1->w_class->lc_flags & (LC_SLEEPLOCK | LC_SPINLOCK)) == (w2->w_class->lc_flags & (LC_SLEEPLOCK | LC_SPINLOCK))); } static __inline int witness_lock_order_key_equal(const struct witness_lock_order_key *a, const struct witness_lock_order_key *b) { return (a->from == b->from && a->to == b->to); } static int _isitmyx(struct witness *w1, struct witness *w2, int rmask, const char *fname); static void adopt(struct witness *parent, struct witness *child); #ifdef BLESSING static int blessed(struct witness *, struct witness *); #endif static void depart(struct witness *w); static struct witness *enroll(const char *description, struct lock_class *lock_class); static struct lock_instance *find_instance(struct lock_list_entry *list, const struct lock_object *lock); static int isitmychild(struct witness *parent, struct witness *child); static int isitmydescendant(struct witness *parent, struct witness *child); static void itismychild(struct witness *parent, struct witness *child); static int sysctl_debug_witness_badstacks(SYSCTL_HANDLER_ARGS); static int sysctl_debug_witness_watch(SYSCTL_HANDLER_ARGS); static int sysctl_debug_witness_fullgraph(SYSCTL_HANDLER_ARGS); static int sysctl_debug_witness_channel(SYSCTL_HANDLER_ARGS); static void witness_add_fullgraph(struct sbuf *sb, struct witness *parent); #ifdef DDB static void witness_ddb_compute_levels(void); static void witness_ddb_display(int(*)(const char *fmt, ...)); static void witness_ddb_display_descendants(int(*)(const char *fmt, ...), struct witness *, int indent); static void witness_ddb_display_list(int(*prnt)(const char *fmt, ...), struct witness_list *list); static void witness_ddb_level_descendants(struct witness *parent, int l); static void witness_ddb_list(struct thread *td); #endif static void witness_debugger(int cond, const char *msg); static void witness_free(struct witness *m); static struct witness *witness_get(void); static uint32_t witness_hash_djb2(const uint8_t *key, uint32_t size); static struct witness *witness_hash_get(const char *key); static void witness_hash_put(struct witness *w); static void witness_init_hash_tables(void); static void witness_increment_graph_generation(void); static void witness_lock_list_free(struct lock_list_entry *lle); static struct lock_list_entry *witness_lock_list_get(void); static int witness_lock_order_add(struct witness *parent, struct witness *child); static int witness_lock_order_check(struct witness *parent, struct witness *child); static struct witness_lock_order_data *witness_lock_order_get( struct witness *parent, struct witness *child); static void witness_list_lock(struct lock_instance *instance, int (*prnt)(const char *fmt, ...)); static int witness_output(const char *fmt, ...) 
__printflike(1, 2); static int witness_voutput(const char *fmt, va_list ap) __printflike(1, 0); static void witness_setflag(struct lock_object *lock, int flag, int set); static SYSCTL_NODE(_debug, OID_AUTO, witness, CTLFLAG_RW, NULL, "Witness Locking"); /* * If set to 0, lock order checking is disabled. If set to -1, * witness is completely disabled. Otherwise witness performs full * lock order checking for all locks. At runtime, lock order checking * may be toggled. However, witness cannot be reenabled once it is * completely disabled. */ static int witness_watch = 1; SYSCTL_PROC(_debug_witness, OID_AUTO, watch, CTLFLAG_RWTUN | CTLTYPE_INT, NULL, 0, sysctl_debug_witness_watch, "I", "witness is watching lock operations"); #ifdef KDB /* * When KDB is enabled and witness_kdb is 1, it will cause the system * to drop into kdebug() when: * - a lock hierarchy violation occurs * - locks are held when going to sleep. */ #ifdef WITNESS_KDB int witness_kdb = 1; #else int witness_kdb = 0; #endif SYSCTL_INT(_debug_witness, OID_AUTO, kdb, CTLFLAG_RWTUN, &witness_kdb, 0, ""); #endif /* KDB */ #if defined(DDB) || defined(KDB) /* * When DDB or KDB is enabled and witness_trace is 1, it will cause the system * to print a stack trace: * - a lock hierarchy violation occurs * - locks are held when going to sleep. */ int witness_trace = 1; SYSCTL_INT(_debug_witness, OID_AUTO, trace, CTLFLAG_RWTUN, &witness_trace, 0, ""); #endif /* DDB || KDB */ #ifdef WITNESS_SKIPSPIN int witness_skipspin = 1; #else int witness_skipspin = 0; #endif SYSCTL_INT(_debug_witness, OID_AUTO, skipspin, CTLFLAG_RDTUN, &witness_skipspin, 0, ""); int badstack_sbuf_size; int witness_count = WITNESS_COUNT; SYSCTL_INT(_debug_witness, OID_AUTO, witness_count, CTLFLAG_RDTUN, &witness_count, 0, ""); /* * Output channel for witness messages. By default we print to the console. */ enum witness_channel { WITNESS_CONSOLE, WITNESS_LOG, WITNESS_NONE, }; static enum witness_channel witness_channel = WITNESS_CONSOLE; SYSCTL_PROC(_debug_witness, OID_AUTO, output_channel, CTLTYPE_STRING | CTLFLAG_RWTUN, NULL, 0, sysctl_debug_witness_channel, "A", "Output channel for warnings"); /* * Call this to print out the relations between locks. */ SYSCTL_PROC(_debug_witness, OID_AUTO, fullgraph, CTLTYPE_STRING | CTLFLAG_RD, NULL, 0, sysctl_debug_witness_fullgraph, "A", "Show locks relation graphs"); /* * Call this to print out the witness faulty stacks. */ SYSCTL_PROC(_debug_witness, OID_AUTO, badstacks, CTLTYPE_STRING | CTLFLAG_RD, NULL, 0, sysctl_debug_witness_badstacks, "A", "Show bad witness stacks"); static struct mtx w_mtx; /* w_list */ static struct witness_list w_free = STAILQ_HEAD_INITIALIZER(w_free); static struct witness_list w_all = STAILQ_HEAD_INITIALIZER(w_all); /* w_typelist */ static struct witness_list w_spin = STAILQ_HEAD_INITIALIZER(w_spin); static struct witness_list w_sleep = STAILQ_HEAD_INITIALIZER(w_sleep); /* lock list */ static struct lock_list_entry *w_lock_list_free = NULL; static struct witness_pendhelp pending_locks[WITNESS_PENDLIST]; static u_int pending_cnt; static int w_free_cnt, w_spin_cnt, w_sleep_cnt; SYSCTL_INT(_debug_witness, OID_AUTO, free_cnt, CTLFLAG_RD, &w_free_cnt, 0, ""); SYSCTL_INT(_debug_witness, OID_AUTO, spin_cnt, CTLFLAG_RD, &w_spin_cnt, 0, ""); SYSCTL_INT(_debug_witness, OID_AUTO, sleep_cnt, CTLFLAG_RD, &w_sleep_cnt, 0, ""); static struct witness *w_data; static uint8_t **w_rmatrix; static struct lock_list_entry w_locklistdata[LOCK_CHILDCOUNT]; static struct witness_hash w_hash; /* The witness hash table. 
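
Since witness_watch is exported above as the debug.witness.watch sysctl (CTLFLAG_RWTUN, so it is also a loader tunable), its state can be inspected from userland. A minimal sketch, assuming a kernel built with options WITNESS (the node does not exist otherwise):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int watch;
	size_t len = sizeof(watch);

	/* -1: witness off, 0: order checking off, 1: full order checking. */
	if (sysctlbyname("debug.witness.watch", &watch, &len, NULL, 0) == -1) {
		perror("sysctlbyname");	/* e.g. ENOENT on non-WITNESS kernels */
		return (1);
	}
	printf("debug.witness.watch = %d\n", watch);
	return (0);
}
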
*/ /* The lock order data hash */ static struct witness_lock_order_data w_lodata[WITNESS_LO_DATA_COUNT]; static struct witness_lock_order_data *w_lofree = NULL; static struct witness_lock_order_hash w_lohash; static int w_max_used_index = 0; static unsigned int w_generation = 0; static const char w_notrunning[] = "Witness not running\n"; static const char w_stillcold[] = "Witness is still cold\n"; static struct witness_order_list_entry order_lists[] = { /* * sx locks */ { "proctree", &lock_class_sx }, { "allproc", &lock_class_sx }, { "allprison", &lock_class_sx }, { NULL, NULL }, /* * Various mutexes */ { "Giant", &lock_class_mtx_sleep }, { "pipe mutex", &lock_class_mtx_sleep }, { "sigio lock", &lock_class_mtx_sleep }, { "process group", &lock_class_mtx_sleep }, { "process lock", &lock_class_mtx_sleep }, { "session", &lock_class_mtx_sleep }, { "uidinfo hash", &lock_class_rw }, #ifdef HWPMC_HOOKS { "pmc-sleep", &lock_class_mtx_sleep }, #endif { "time lock", &lock_class_mtx_sleep }, { NULL, NULL }, /* * umtx */ { "umtx lock", &lock_class_mtx_sleep }, { NULL, NULL }, /* * Sockets */ { "accept", &lock_class_mtx_sleep }, { "so_snd", &lock_class_mtx_sleep }, { "so_rcv", &lock_class_mtx_sleep }, { "sellck", &lock_class_mtx_sleep }, { NULL, NULL }, /* * Routing */ { "so_rcv", &lock_class_mtx_sleep }, { "radix node head", &lock_class_rw }, { "rtentry", &lock_class_mtx_sleep }, { "ifaddr", &lock_class_mtx_sleep }, { NULL, NULL }, /* * IPv4 multicast: * protocol locks before interface locks, after UDP locks. */ { "udpinp", &lock_class_rw }, { "in_multi_mtx", &lock_class_mtx_sleep }, { "igmp_mtx", &lock_class_mtx_sleep }, { "if_addr_lock", &lock_class_rw }, { NULL, NULL }, /* * IPv6 multicast: * protocol locks before interface locks, after UDP locks. */ { "udpinp", &lock_class_rw }, { "in6_multi_mtx", &lock_class_mtx_sleep }, { "mld_mtx", &lock_class_mtx_sleep }, { "if_addr_lock", &lock_class_rw }, { NULL, NULL }, /* * UNIX Domain Sockets */ { "unp_link_rwlock", &lock_class_rw }, { "unp_list_lock", &lock_class_mtx_sleep }, { "unp", &lock_class_mtx_sleep }, { "so_snd", &lock_class_mtx_sleep }, { NULL, NULL }, /* * UDP/IP */ { "udp", &lock_class_rw }, { "udpinp", &lock_class_rw }, { "so_snd", &lock_class_mtx_sleep }, { NULL, NULL }, /* * TCP/IP */ { "tcp", &lock_class_rw }, { "tcpinp", &lock_class_rw }, { "so_snd", &lock_class_mtx_sleep }, { NULL, NULL }, /* * BPF */ { "bpf global lock", &lock_class_mtx_sleep }, { "bpf interface lock", &lock_class_rw }, { "bpf cdev lock", &lock_class_mtx_sleep }, { NULL, NULL }, /* * NFS server */ { "nfsd_mtx", &lock_class_mtx_sleep }, { "so_snd", &lock_class_mtx_sleep }, { NULL, NULL }, /* * IEEE 802.11 */ { "802.11 com lock", &lock_class_mtx_sleep}, { NULL, NULL }, /* * Network drivers */ { "network driver", &lock_class_mtx_sleep}, { NULL, NULL }, /* * Netgraph */ { "ng_node", &lock_class_mtx_sleep }, { "ng_worklist", &lock_class_mtx_sleep }, { NULL, NULL }, /* * CDEV */ { "vm map (system)", &lock_class_mtx_sleep }, { "vm page queue", &lock_class_mtx_sleep }, { "vnode interlock", &lock_class_mtx_sleep }, { "cdev", &lock_class_mtx_sleep }, { NULL, NULL }, /* * VM */ { "vm map (user)", &lock_class_sx }, { "vm object", &lock_class_rw }, { "vm page", &lock_class_mtx_sleep }, { "vm page queue", &lock_class_mtx_sleep }, { "pmap pv global", &lock_class_rw }, { "pmap", &lock_class_mtx_sleep }, { "pmap pv list", &lock_class_rw }, { "vm page free queue", &lock_class_mtx_sleep }, { NULL, NULL }, /* * kqueue/VFS interaction */ { "kqueue", &lock_class_mtx_sleep }, { "struct mount 
mtx", &lock_class_mtx_sleep }, { "vnode interlock", &lock_class_mtx_sleep }, { NULL, NULL }, /* + * VFS namecache + */ + { "ncglobal", &lock_class_rw }, + { "ncbuc", &lock_class_rw }, + { "vnode interlock", &lock_class_mtx_sleep }, + { "ncneg", &lock_class_mtx_sleep }, + { NULL, NULL }, + /* * ZFS locking */ { "dn->dn_mtx", &lock_class_sx }, { "dr->dt.di.dr_mtx", &lock_class_sx }, { "db->db_mtx", &lock_class_sx }, { NULL, NULL }, /* * spin locks */ #ifdef SMP { "ap boot", &lock_class_mtx_spin }, #endif { "rm.mutex_mtx", &lock_class_mtx_spin }, { "sio", &lock_class_mtx_spin }, #ifdef __i386__ { "cy", &lock_class_mtx_spin }, #endif #ifdef __sparc64__ { "pcib_mtx", &lock_class_mtx_spin }, { "rtc_mtx", &lock_class_mtx_spin }, #endif { "scc_hwmtx", &lock_class_mtx_spin }, { "uart_hwmtx", &lock_class_mtx_spin }, { "fast_taskqueue", &lock_class_mtx_spin }, { "intr table", &lock_class_mtx_spin }, #ifdef HWPMC_HOOKS { "pmc-per-proc", &lock_class_mtx_spin }, #endif { "process slock", &lock_class_mtx_spin }, { "syscons video lock", &lock_class_mtx_spin }, { "sleepq chain", &lock_class_mtx_spin }, { "rm_spinlock", &lock_class_mtx_spin }, { "turnstile chain", &lock_class_mtx_spin }, { "turnstile lock", &lock_class_mtx_spin }, { "sched lock", &lock_class_mtx_spin }, { "td_contested", &lock_class_mtx_spin }, { "callout", &lock_class_mtx_spin }, { "entropy harvest mutex", &lock_class_mtx_spin }, #ifdef SMP { "smp rendezvous", &lock_class_mtx_spin }, #endif #ifdef __powerpc__ { "tlb0", &lock_class_mtx_spin }, #endif /* * leaf locks */ { "intrcnt", &lock_class_mtx_spin }, { "icu", &lock_class_mtx_spin }, #if defined(SMP) && defined(__sparc64__) { "ipi", &lock_class_mtx_spin }, #endif #ifdef __i386__ { "allpmaps", &lock_class_mtx_spin }, { "descriptor tables", &lock_class_mtx_spin }, #endif { "clk", &lock_class_mtx_spin }, { "cpuset", &lock_class_mtx_spin }, { "mprof lock", &lock_class_mtx_spin }, { "zombie lock", &lock_class_mtx_spin }, { "ALD Queue", &lock_class_mtx_spin }, #if defined(__i386__) || defined(__amd64__) { "pcicfg", &lock_class_mtx_spin }, { "NDIS thread lock", &lock_class_mtx_spin }, #endif { "tw_osl_io_lock", &lock_class_mtx_spin }, { "tw_osl_q_lock", &lock_class_mtx_spin }, { "tw_cl_io_lock", &lock_class_mtx_spin }, { "tw_cl_intr_lock", &lock_class_mtx_spin }, { "tw_cl_gen_lock", &lock_class_mtx_spin }, #ifdef HWPMC_HOOKS { "pmc-leaf", &lock_class_mtx_spin }, #endif { "blocked lock", &lock_class_mtx_spin }, { NULL, NULL }, { NULL, NULL } }; #ifdef BLESSING /* * Pairs of locks which have been blessed * Don't complain about order problems with blessed locks */ static struct witness_blessed blessed_list[] = { }; #endif /* * This global is set to 0 once it becomes safe to use the witness code. */ static int witness_cold = 1; /* * This global is set to 1 once the static lock orders have been enrolled * so that a warning can be issued for any spin locks enrolled later. */ static int witness_spin_warn = 0; /* Trim useless garbage from filenames. */ static const char * fixup_filename(const char *file) { if (file == NULL) return (NULL); while (strncmp(file, "../", 3) == 0) file += 3; return (file); } /* * The WITNESS-enabled diagnostic code. Note that the witness code does * assume that the early boot is single-threaded at least until after this * routine is completed. 
*/ static void witness_initialize(void *dummy __unused) { struct lock_object *lock; struct witness_order_list_entry *order; struct witness *w, *w1; int i; w_data = malloc(sizeof (struct witness) * witness_count, M_WITNESS, M_WAITOK | M_ZERO); w_rmatrix = malloc(sizeof(*w_rmatrix) * (witness_count + 1), M_WITNESS, M_WAITOK | M_ZERO); for (i = 0; i < witness_count + 1; i++) { w_rmatrix[i] = malloc(sizeof(*w_rmatrix[i]) * (witness_count + 1), M_WITNESS, M_WAITOK | M_ZERO); } badstack_sbuf_size = witness_count * 256; /* * We have to release Giant before initializing its witness * structure so that WITNESS doesn't get confused. */ mtx_unlock(&Giant); mtx_assert(&Giant, MA_NOTOWNED); CTR1(KTR_WITNESS, "%s: initializing witness", __func__); mtx_init(&w_mtx, "witness lock", NULL, MTX_SPIN | MTX_QUIET | MTX_NOWITNESS | MTX_NOPROFILE); for (i = witness_count - 1; i >= 0; i--) { w = &w_data[i]; memset(w, 0, sizeof(*w)); w_data[i].w_index = i; /* Witness index never changes. */ witness_free(w); } KASSERT(STAILQ_FIRST(&w_free)->w_index == 0, ("%s: Invalid list of free witness objects", __func__)); /* Witness with index 0 is not used to aid in debugging. */ STAILQ_REMOVE_HEAD(&w_free, w_list); w_free_cnt--; for (i = 0; i < witness_count; i++) { memset(w_rmatrix[i], 0, sizeof(*w_rmatrix[i]) * (witness_count + 1)); } for (i = 0; i < LOCK_CHILDCOUNT; i++) witness_lock_list_free(&w_locklistdata[i]); witness_init_hash_tables(); /* First add in all the specified order lists. */ for (order = order_lists; order->w_name != NULL; order++) { w = enroll(order->w_name, order->w_class); if (w == NULL) continue; w->w_file = "order list"; for (order++; order->w_name != NULL; order++) { w1 = enroll(order->w_name, order->w_class); if (w1 == NULL) continue; w1->w_file = "order list"; itismychild(w, w1); w = w1; } } witness_spin_warn = 1; /* Iterate through all locks and add them to witness. */ for (i = 0; pending_locks[i].wh_lock != NULL; i++) { lock = pending_locks[i].wh_lock; KASSERT(lock->lo_flags & LO_WITNESS, ("%s: lock %s is on pending list but not LO_WITNESS", __func__, lock->lo_name)); lock->lo_witness = enroll(pending_locks[i].wh_type, LOCK_CLASS(lock)); } /* Mark the witness code as being ready for use. */ witness_cold = 0; mtx_lock(&Giant); } SYSINIT(witness_init, SI_SUB_WITNESS, SI_ORDER_FIRST, witness_initialize, NULL); void witness_init(struct lock_object *lock, const char *type) { struct lock_class *class; /* Various sanity checks. */ class = LOCK_CLASS(lock); if ((lock->lo_flags & LO_RECURSABLE) != 0 && (class->lc_flags & LC_RECURSABLE) == 0) kassert_panic("%s: lock (%s) %s can not be recursable", __func__, class->lc_name, lock->lo_name); if ((lock->lo_flags & LO_SLEEPABLE) != 0 && (class->lc_flags & LC_SLEEPABLE) == 0) kassert_panic("%s: lock (%s) %s can not be sleepable", __func__, class->lc_name, lock->lo_name); if ((lock->lo_flags & LO_UPGRADABLE) != 0 && (class->lc_flags & LC_UPGRADABLE) == 0) kassert_panic("%s: lock (%s) %s can not be upgradable", __func__, class->lc_name, lock->lo_name); /* * If we shouldn't watch this lock, then just clear lo_witness. * Otherwise, if witness_cold is set, then it is too early to * enroll this lock, so defer it to witness_initialize() by adding * it to the pending_locks list. If it is not too early, then enroll * the lock now. 
*/ if (witness_watch < 1 || panicstr != NULL || (lock->lo_flags & LO_WITNESS) == 0) lock->lo_witness = NULL; else if (witness_cold) { pending_locks[pending_cnt].wh_lock = lock; pending_locks[pending_cnt++].wh_type = type; if (pending_cnt > WITNESS_PENDLIST) panic("%s: pending locks list is too small, " "increase WITNESS_PENDLIST\n", __func__); } else lock->lo_witness = enroll(type, class); } void witness_destroy(struct lock_object *lock) { struct lock_class *class; struct witness *w; class = LOCK_CLASS(lock); if (witness_cold) panic("lock (%s) %s destroyed while witness_cold", class->lc_name, lock->lo_name); /* XXX: need to verify that no one holds the lock */ if ((lock->lo_flags & LO_WITNESS) == 0 || lock->lo_witness == NULL) return; w = lock->lo_witness; mtx_lock_spin(&w_mtx); MPASS(w->w_refcount > 0); w->w_refcount--; if (w->w_refcount == 0) depart(w); mtx_unlock_spin(&w_mtx); } #ifdef DDB static void witness_ddb_compute_levels(void) { struct witness *w; /* * First clear all levels. */ STAILQ_FOREACH(w, &w_all, w_list) w->w_ddb_level = -1; /* * Look for locks with no parents and level all their descendants. */ STAILQ_FOREACH(w, &w_all, w_list) { /* If the witness has ancestors (is not a root), skip it. */ if (w->w_num_ancestors > 0) continue; witness_ddb_level_descendants(w, 0); } } static void witness_ddb_level_descendants(struct witness *w, int l) { int i; if (w->w_ddb_level >= l) return; w->w_ddb_level = l; l++; for (i = 1; i <= w_max_used_index; i++) { if (w_rmatrix[w->w_index][i] & WITNESS_PARENT) witness_ddb_level_descendants(&w_data[i], l); } } static void witness_ddb_display_descendants(int(*prnt)(const char *fmt, ...), struct witness *w, int indent) { int i; for (i = 0; i < indent; i++) prnt(" "); prnt("%s (type: %s, depth: %d, active refs: %d)", w->w_name, w->w_class->lc_name, w->w_ddb_level, w->w_refcount); if (w->w_displayed) { prnt(" -- (already displayed)\n"); return; } w->w_displayed = 1; if (w->w_file != NULL && w->w_line != 0) prnt(" -- last acquired @ %s:%d\n", fixup_filename(w->w_file), w->w_line); else prnt(" -- never acquired\n"); indent++; WITNESS_INDEX_ASSERT(w->w_index); for (i = 1; i <= w_max_used_index; i++) { if (db_pager_quit) return; if (w_rmatrix[w->w_index][i] & WITNESS_PARENT) witness_ddb_display_descendants(prnt, &w_data[i], indent); } } static void witness_ddb_display_list(int(*prnt)(const char *fmt, ...), struct witness_list *list) { struct witness *w; STAILQ_FOREACH(w, list, w_typelist) { if (w->w_file == NULL || w->w_ddb_level > 0) continue; /* This lock has no ancestors - display its descendants. */ witness_ddb_display_descendants(prnt, w, 0); if (db_pager_quit) return; } } static void witness_ddb_display(int(*prnt)(const char *fmt, ...)) { struct witness *w; KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); witness_ddb_compute_levels(); /* Clear all the displayed flags. */ STAILQ_FOREACH(w, &w_all, w_list) w->w_displayed = 0; /* * First, handle sleep locks which have been acquired at least * once. */ prnt("Sleep locks:\n"); witness_ddb_display_list(prnt, &w_sleep); if (db_pager_quit) return; /* * Now do spin locks which have been acquired at least once. */ prnt("\nSpin locks:\n"); witness_ddb_display_list(prnt, &w_spin); if (db_pager_quit) return; /* * Finally, any locks which have not been acquired yet.
*/ prnt("\nLocks which were never acquired:\n"); STAILQ_FOREACH(w, &w_all, w_list) { if (w->w_file != NULL || w->w_refcount == 0) continue; prnt("%s (type: %s, depth: %d)\n", w->w_name, w->w_class->lc_name, w->w_ddb_level); if (db_pager_quit) return; } } #endif /* DDB */ int witness_defineorder(struct lock_object *lock1, struct lock_object *lock2) { if (witness_watch == -1 || panicstr != NULL) return (0); /* Require locks that witness knows about. */ if (lock1 == NULL || lock1->lo_witness == NULL || lock2 == NULL || lock2->lo_witness == NULL) return (EINVAL); mtx_assert(&w_mtx, MA_NOTOWNED); mtx_lock_spin(&w_mtx); /* * If we already have either an explicit or implied lock order that * is the other way around, then return an error. */ if (witness_watch && isitmydescendant(lock2->lo_witness, lock1->lo_witness)) { mtx_unlock_spin(&w_mtx); return (EDOOFUS); } /* Try to add the new order. */ CTR3(KTR_WITNESS, "%s: adding %s as a child of %s", __func__, lock2->lo_witness->w_name, lock1->lo_witness->w_name); itismychild(lock1->lo_witness, lock2->lo_witness); mtx_unlock_spin(&w_mtx); return (0); } void witness_checkorder(struct lock_object *lock, int flags, const char *file, int line, struct lock_object *interlock) { struct lock_list_entry *lock_list, *lle; struct lock_instance *lock1, *lock2, *plock; struct lock_class *class, *iclass; struct witness *w, *w1; struct thread *td; int i, j; if (witness_cold || witness_watch < 1 || lock->lo_witness == NULL || panicstr != NULL) return; w = lock->lo_witness; class = LOCK_CLASS(lock); td = curthread; if (class->lc_flags & LC_SLEEPLOCK) { /* * Since spin locks include a critical section, this check * implicitly enforces a lock order of all sleep locks before * all spin locks. */ if (td->td_critnest != 0 && !kdb_active) kassert_panic("acquiring blockable sleep lock with " "spinlock or critical section held (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); /* * If this is the first lock acquired then just return as * no order checking is needed. */ lock_list = td->td_sleeplocks; if (lock_list == NULL || lock_list->ll_count == 0) return; } else { /* * If this is the first lock, just return as no order * checking is needed. Avoid problems with thread * migration pinning the thread while checking if * spinlocks are held. If at least one spinlock is held * the thread is in a safe path and it is allowed to * unpin it. */ sched_pin(); lock_list = PCPU_GET(spinlocks); if (lock_list == NULL || lock_list->ll_count == 0) { sched_unpin(); return; } sched_unpin(); } /* * Check to see if we are recursing on a lock we already own. If * so, make sure that we don't mismatch exclusive and shared lock * acquires. */ lock1 = find_instance(lock_list, lock); if (lock1 != NULL) { if ((lock1->li_flags & LI_EXCLUSIVE) != 0 && (flags & LOP_EXCLUSIVE) == 0) { witness_output("shared lock of (%s) %s @ %s:%d\n", class->lc_name, lock->lo_name, fixup_filename(file), line); witness_output("while exclusively locked from %s:%d\n", fixup_filename(lock1->li_file), lock1->li_line); kassert_panic("excl->share"); } if ((lock1->li_flags & LI_EXCLUSIVE) == 0 && (flags & LOP_EXCLUSIVE) != 0) { witness_output("exclusive lock of (%s) %s @ %s:%d\n", class->lc_name, lock->lo_name, fixup_filename(file), line); witness_output("while share locked from %s:%d\n", fixup_filename(lock1->li_file), lock1->li_line); kassert_panic("share->excl"); } return; } /* Warn if the interlock is not locked exactly once. 
*/ if (interlock != NULL) { iclass = LOCK_CLASS(interlock); lock1 = find_instance(lock_list, interlock); if (lock1 == NULL) kassert_panic("interlock (%s) %s not locked @ %s:%d", iclass->lc_name, interlock->lo_name, fixup_filename(file), line); else if ((lock1->li_flags & LI_RECURSEMASK) != 0) kassert_panic("interlock (%s) %s recursed @ %s:%d", iclass->lc_name, interlock->lo_name, fixup_filename(file), line); } /* * Find the previously acquired lock, but ignore interlocks. */ plock = &lock_list->ll_children[lock_list->ll_count - 1]; if (interlock != NULL && plock->li_lock == interlock) { if (lock_list->ll_count > 1) plock = &lock_list->ll_children[lock_list->ll_count - 2]; else { lle = lock_list->ll_next; /* * The interlock is the only lock we hold, so * simply return. */ if (lle == NULL) return; plock = &lle->ll_children[lle->ll_count - 1]; } } /* * Try to perform most checks without a lock. If this succeeds we * can skip acquiring the lock and return success. Otherwise we redo * the check with the lock held to handle races with concurrent updates. */ w1 = plock->li_lock->lo_witness; if (witness_lock_order_check(w1, w)) return; mtx_lock_spin(&w_mtx); if (witness_lock_order_check(w1, w)) { mtx_unlock_spin(&w_mtx); return; } witness_lock_order_add(w1, w); /* * Check for duplicate locks of the same type. Note that we only * have to check for this on the last lock we just acquired. Any * other cases will be caught as lock order violations. */ if (w1 == w) { i = w->w_index; if (!(lock->lo_flags & LO_DUPOK) && !(flags & LOP_DUPOK) && !(w_rmatrix[i][i] & WITNESS_REVERSAL)) { w_rmatrix[i][i] |= WITNESS_REVERSAL; w->w_reversed = 1; mtx_unlock_spin(&w_mtx); witness_output( "acquiring duplicate lock of same type: \"%s\"\n", w->w_name); witness_output(" 1st %s @ %s:%d\n", plock->li_lock->lo_name, fixup_filename(plock->li_file), plock->li_line); witness_output(" 2nd %s @ %s:%d\n", lock->lo_name, fixup_filename(file), line); witness_debugger(1, __func__); } else mtx_unlock_spin(&w_mtx); return; } mtx_assert(&w_mtx, MA_OWNED); /* * If we know that the lock we are acquiring comes after * the lock we most recently acquired in the lock order tree, * then there is no need for any further checks. */ if (isitmychild(w1, w)) goto out; for (j = 0, lle = lock_list; lle != NULL; lle = lle->ll_next) { for (i = lle->ll_count - 1; i >= 0; i--, j++) { MPASS(j < LOCK_CHILDCOUNT * LOCK_NCHILDREN); lock1 = &lle->ll_children[i]; /* * Ignore the interlock. */ if (interlock == lock1->li_lock) continue; /* * If this lock doesn't undergo witness checking, * then skip it. */ w1 = lock1->li_lock->lo_witness; if (w1 == NULL) { KASSERT((lock1->li_lock->lo_flags & LO_WITNESS) == 0, ("lock missing witness structure")); continue; } /* * If we are locking Giant and this is a sleepable * lock, then skip it. */ if ((lock1->li_lock->lo_flags & LO_SLEEPABLE) != 0 && lock == &Giant.lock_object) continue; /* * If we are locking a sleepable lock and this lock * is Giant, then skip it. */ if ((lock->lo_flags & LO_SLEEPABLE) != 0 && lock1->li_lock == &Giant.lock_object) continue; /* * If we are locking a sleepable lock and this lock * isn't sleepable, we want to treat it as a lock * order violation to enforce a general lock order of * sleepable locks before non-sleepable locks. */ if (((lock->lo_flags & LO_SLEEPABLE) != 0 && (lock1->li_lock->lo_flags & LO_SLEEPABLE) == 0)) goto reversal; /* * If we are locking Giant and this is a non-sleepable * lock, then treat it as a reversal.
*/ if ((lock1->li_lock->lo_flags & LO_SLEEPABLE) == 0 && lock == &Giant.lock_object) goto reversal; /* * Check the lock order hierarchy for a reversal. */ if (!isitmydescendant(w, w1)) continue; reversal: /* * We have a lock order violation, check to see if it * is allowed or has already been yelled about. */ #ifdef BLESSING /* * If the lock order is blessed, just bail. We don't * look for other lock order violations though, which * may be a bug. */ if (blessed(w, w1)) goto out; #endif /* Bail if this violation is known */ if (w_rmatrix[w1->w_index][w->w_index] & WITNESS_REVERSAL) goto out; /* Record this as a violation */ w_rmatrix[w1->w_index][w->w_index] |= WITNESS_REVERSAL; w_rmatrix[w->w_index][w1->w_index] |= WITNESS_REVERSAL; w->w_reversed = w1->w_reversed = 1; witness_increment_graph_generation(); mtx_unlock_spin(&w_mtx); #ifdef WITNESS_NO_VNODE /* * There are known LORs between VNODE locks. They are * not an indication of a bug. VNODE locks are flagged * as such (LO_IS_VNODE) and we don't yell if the LOR * is between 2 VNODE locks. */ if ((lock->lo_flags & LO_IS_VNODE) != 0 && (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0) return; #endif /* * Ok, yell about it. */ if (((lock->lo_flags & LO_SLEEPABLE) != 0 && (lock1->li_lock->lo_flags & LO_SLEEPABLE) == 0)) witness_output( "lock order reversal: (sleepable after non-sleepable)\n"); else if ((lock1->li_lock->lo_flags & LO_SLEEPABLE) == 0 && lock == &Giant.lock_object) witness_output( "lock order reversal: (Giant after non-sleepable)\n"); else witness_output("lock order reversal:\n"); /* * Try to locate an earlier lock with * witness w in our list. */ do { lock2 = &lle->ll_children[i]; MPASS(lock2->li_lock != NULL); if (lock2->li_lock->lo_witness == w) break; if (i == 0 && lle->ll_next != NULL) { lle = lle->ll_next; i = lle->ll_count - 1; MPASS(i >= 0 && i < LOCK_NCHILDREN); } else i--; } while (i >= 0); if (i < 0) { witness_output(" 1st %p %s (%s) @ %s:%d\n", lock1->li_lock, lock1->li_lock->lo_name, w1->w_name, fixup_filename(lock1->li_file), lock1->li_line); witness_output(" 2nd %p %s (%s) @ %s:%d\n", lock, lock->lo_name, w->w_name, fixup_filename(file), line); } else { witness_output(" 1st %p %s (%s) @ %s:%d\n", lock2->li_lock, lock2->li_lock->lo_name, lock2->li_lock->lo_witness->w_name, fixup_filename(lock2->li_file), lock2->li_line); witness_output(" 2nd %p %s (%s) @ %s:%d\n", lock1->li_lock, lock1->li_lock->lo_name, w1->w_name, fixup_filename(lock1->li_file), lock1->li_line); witness_output(" 3rd %p %s (%s) @ %s:%d\n", lock, lock->lo_name, w->w_name, fixup_filename(file), line); } witness_debugger(1, __func__); return; } } /* * If requested, build a new lock order. However, don't build a new * relationship between a sleepable lock and Giant if it is in the * wrong direction. The correct lock order is that sleepable locks * always come before Giant. */ if (flags & LOP_NEWORDER && !(plock->li_lock == &Giant.lock_object && (lock->lo_flags & LO_SLEEPABLE) != 0)) { CTR3(KTR_WITNESS, "%s: adding %s as a child of %s", __func__, w->w_name, plock->li_lock->lo_witness->w_name); itismychild(plock->li_lock->lo_witness, w); } out: mtx_unlock_spin(&w_mtx); } void witness_lock(struct lock_object *lock, int flags, const char *file, int line) { struct lock_list_entry **lock_list, *lle; struct lock_instance *instance; struct witness *w; struct thread *td; if (witness_cold || witness_watch == -1 || lock->lo_witness == NULL || panicstr != NULL) return; w = lock->lo_witness; td = curthread; /* Determine lock list for this lock.
*/ if (LOCK_CLASS(lock)->lc_flags & LC_SLEEPLOCK) lock_list = &td->td_sleeplocks; else lock_list = PCPU_PTR(spinlocks); /* Check to see if we are recursing on a lock we already own. */ instance = find_instance(*lock_list, lock); if (instance != NULL) { instance->li_flags++; CTR4(KTR_WITNESS, "%s: pid %d recursed on %s r=%d", __func__, td->td_proc->p_pid, lock->lo_name, instance->li_flags & LI_RECURSEMASK); instance->li_file = file; instance->li_line = line; return; } /* Update per-witness last file and line acquire. */ w->w_file = file; w->w_line = line; /* Find the next open lock instance in the list and fill it. */ lle = *lock_list; if (lle == NULL || lle->ll_count == LOCK_NCHILDREN) { lle = witness_lock_list_get(); if (lle == NULL) return; lle->ll_next = *lock_list; CTR3(KTR_WITNESS, "%s: pid %d added lle %p", __func__, td->td_proc->p_pid, lle); *lock_list = lle; } instance = &lle->ll_children[lle->ll_count++]; instance->li_lock = lock; instance->li_line = line; instance->li_file = file; if ((flags & LOP_EXCLUSIVE) != 0) instance->li_flags = LI_EXCLUSIVE; else instance->li_flags = 0; CTR4(KTR_WITNESS, "%s: pid %d added %s as lle[%d]", __func__, td->td_proc->p_pid, lock->lo_name, lle->ll_count - 1); } void witness_upgrade(struct lock_object *lock, int flags, const char *file, int line) { struct lock_instance *instance; struct lock_class *class; KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if (witness_watch) { if ((lock->lo_flags & LO_UPGRADABLE) == 0) kassert_panic( "upgrade of non-upgradable lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((class->lc_flags & LC_SLEEPLOCK) == 0) kassert_panic( "upgrade of non-sleep lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); } instance = find_instance(curthread->td_sleeplocks, lock); if (instance == NULL) { kassert_panic("upgrade of unlocked lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); return; } if (witness_watch) { if ((instance->li_flags & LI_EXCLUSIVE) != 0) kassert_panic( "upgrade of exclusive lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((instance->li_flags & LI_RECURSEMASK) != 0) kassert_panic( "upgrade of recursed lock (%s) %s r=%d @ %s:%d", class->lc_name, lock->lo_name, instance->li_flags & LI_RECURSEMASK, fixup_filename(file), line); } instance->li_flags |= LI_EXCLUSIVE; } void witness_downgrade(struct lock_object *lock, int flags, const char *file, int line) { struct lock_instance *instance; struct lock_class *class; KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if (witness_watch) { if ((lock->lo_flags & LO_UPGRADABLE) == 0) kassert_panic( "downgrade of non-upgradable lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((class->lc_flags & LC_SLEEPLOCK) == 0) kassert_panic( "downgrade of non-sleep lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); } instance = find_instance(curthread->td_sleeplocks, lock); if (instance == NULL) { kassert_panic("downgrade of unlocked lock (%s) %s @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); return; } if (witness_watch) { if ((instance->li_flags & LI_EXCLUSIVE) == 0) kassert_panic( "downgrade of shared lock (%s) %s @ %s:%d", 
class->lc_name, lock->lo_name, fixup_filename(file), line); if ((instance->li_flags & LI_RECURSEMASK) != 0) kassert_panic( "downgrade of recursed lock (%s) %s r=%d @ %s:%d", class->lc_name, lock->lo_name, instance->li_flags & LI_RECURSEMASK, fixup_filename(file), line); } instance->li_flags &= ~LI_EXCLUSIVE; } void witness_unlock(struct lock_object *lock, int flags, const char *file, int line) { struct lock_list_entry **lock_list, *lle; struct lock_instance *instance; struct lock_class *class; struct thread *td; register_t s; int i, j; if (witness_cold || lock->lo_witness == NULL || panicstr != NULL) return; td = curthread; class = LOCK_CLASS(lock); /* Find lock instance associated with this lock. */ if (class->lc_flags & LC_SLEEPLOCK) lock_list = &td->td_sleeplocks; else lock_list = PCPU_PTR(spinlocks); lle = *lock_list; for (; *lock_list != NULL; lock_list = &(*lock_list)->ll_next) for (i = 0; i < (*lock_list)->ll_count; i++) { instance = &(*lock_list)->ll_children[i]; if (instance->li_lock == lock) goto found; } /* * When WITNESS gets disabled via witness_watch, we may still have * locks registered in the td_sleeplocks queue. We have to make sure * those queues get flushed, so just search for any leftover * registered locks and remove them. */ if (witness_watch > 0) { kassert_panic("lock (%s) %s not locked @ %s:%d", class->lc_name, lock->lo_name, fixup_filename(file), line); return; } else { return; } found: /* First, check for shared/exclusive mismatches. */ if ((instance->li_flags & LI_EXCLUSIVE) != 0 && witness_watch > 0 && (flags & LOP_EXCLUSIVE) == 0) { witness_output("shared unlock of (%s) %s @ %s:%d\n", class->lc_name, lock->lo_name, fixup_filename(file), line); witness_output("while exclusively locked from %s:%d\n", fixup_filename(instance->li_file), instance->li_line); kassert_panic("excl->ushare"); } if ((instance->li_flags & LI_EXCLUSIVE) == 0 && witness_watch > 0 && (flags & LOP_EXCLUSIVE) != 0) { witness_output("exclusive unlock of (%s) %s @ %s:%d\n", class->lc_name, lock->lo_name, fixup_filename(file), line); witness_output("while share locked from %s:%d\n", fixup_filename(instance->li_file), instance->li_line); kassert_panic("share->uexcl"); } /* If we are recursed, unrecurse. */ if ((instance->li_flags & LI_RECURSEMASK) > 0) { CTR4(KTR_WITNESS, "%s: pid %d unrecursed on %s r=%d", __func__, td->td_proc->p_pid, instance->li_lock->lo_name, instance->li_flags); instance->li_flags--; return; } /* The lock is now being dropped, check for NORELEASE flag */ if ((instance->li_flags & LI_NORELEASE) != 0 && witness_watch > 0) { witness_output("forbidden unlock of (%s) %s @ %s:%d\n", class->lc_name, lock->lo_name, fixup_filename(file), line); kassert_panic("lock marked norelease"); } /* Otherwise, remove this item from the list. */ s = intr_disable(); CTR4(KTR_WITNESS, "%s: pid %d removed %s from lle[%d]", __func__, td->td_proc->p_pid, instance->li_lock->lo_name, (*lock_list)->ll_count - 1); for (j = i; j < (*lock_list)->ll_count - 1; j++) (*lock_list)->ll_children[j] = (*lock_list)->ll_children[j + 1]; (*lock_list)->ll_count--; intr_restore(s); /* * In order to reduce contention on w_mtx, we always want to keep a * head object on the lists so that frequent allocation from the * free witness pool (and the subsequent locking) is avoided.
* To keep the code simple, when the head object is completely * emptied, it also means that there are no further objects in the * list, so list ownership needs to be handed over to another object * if the current head needs to be freed. */ if ((*lock_list)->ll_count == 0) { if (*lock_list == lle) { if (lle->ll_next == NULL) return; } else lle = *lock_list; *lock_list = lle->ll_next; CTR3(KTR_WITNESS, "%s: pid %d removed lle %p", __func__, td->td_proc->p_pid, lle); witness_lock_list_free(lle); } } void witness_thread_exit(struct thread *td) { struct lock_list_entry *lle; int i, n; lle = td->td_sleeplocks; if (lle == NULL || panicstr != NULL) return; if (lle->ll_count != 0) { for (n = 0; lle != NULL; lle = lle->ll_next) for (i = lle->ll_count - 1; i >= 0; i--) { if (n == 0) witness_output( "Thread %p exiting with the following locks held:\n", td); n++; witness_list_lock(&lle->ll_children[i], witness_output); } kassert_panic( "Thread %p cannot exit while holding sleeplocks\n", td); } witness_lock_list_free(lle); } /* * Warn if any locks other than 'lock' are held. Flags can be passed in to * exempt Giant and sleepable locks from the checks as well. If any * non-exempt locks are held, then a supplied message is printed to the * output channel along with a list of the offending locks. If indicated in the * flags then a failure results in a panic as well. */ int witness_warn(int flags, struct lock_object *lock, const char *fmt, ...) { struct lock_list_entry *lock_list, *lle; struct lock_instance *lock1; struct thread *td; va_list ap; int i, n; if (witness_cold || witness_watch < 1 || panicstr != NULL) return (0); n = 0; td = curthread; for (lle = td->td_sleeplocks; lle != NULL; lle = lle->ll_next) for (i = lle->ll_count - 1; i >= 0; i--) { lock1 = &lle->ll_children[i]; if (lock1->li_lock == lock) continue; if (flags & WARN_GIANTOK && lock1->li_lock == &Giant.lock_object) continue; if (flags & WARN_SLEEPOK && (lock1->li_lock->lo_flags & LO_SLEEPABLE) != 0) continue; if (n == 0) { va_start(ap, fmt); witness_voutput(fmt, ap); va_end(ap); witness_output( " with the following %slocks held:\n", (flags & WARN_SLEEPOK) != 0 ? "non-sleepable " : ""); } n++; witness_list_lock(lock1, witness_output); } /* * Pin the thread in order to avoid problems with thread migration. * Once all checks of spinlock ownership have passed, the thread is * on a safe path and can be unpinned. */ sched_pin(); lock_list = PCPU_GET(spinlocks); if (lock_list != NULL && lock_list->ll_count != 0) { sched_unpin(); /* * We should only have one spinlock and as long as * the flags cannot match for this lock's class, * check if the first spinlock is the one curthread * should hold. */ lock1 = &lock_list->ll_children[lock_list->ll_count - 1]; if (lock_list->ll_count == 1 && lock_list->ll_next == NULL && lock1->li_lock == lock && n == 0) return (0); va_start(ap, fmt); witness_voutput(fmt, ap); va_end(ap); witness_output(" with the following %slocks held:\n", (flags & WARN_SLEEPOK) != 0 ?
"non-sleepable " : ""); n += witness_list_locks(&lock_list, witness_output); } else sched_unpin(); if (flags & WARN_PANIC && n) kassert_panic("%s", __func__); else witness_debugger(n, __func__); return (n); } const char * witness_file(struct lock_object *lock) { struct witness *w; if (witness_cold || witness_watch < 1 || lock->lo_witness == NULL) return ("?"); w = lock->lo_witness; return (w->w_file); } int witness_line(struct lock_object *lock) { struct witness *w; if (witness_cold || witness_watch < 1 || lock->lo_witness == NULL) return (0); w = lock->lo_witness; return (w->w_line); } static struct witness * enroll(const char *description, struct lock_class *lock_class) { struct witness *w; struct witness_list *typelist; MPASS(description != NULL); if (witness_watch == -1 || panicstr != NULL) return (NULL); if ((lock_class->lc_flags & LC_SPINLOCK)) { if (witness_skipspin) return (NULL); else typelist = &w_spin; } else if ((lock_class->lc_flags & LC_SLEEPLOCK)) { typelist = &w_sleep; } else { kassert_panic("lock class %s is not sleep or spin", lock_class->lc_name); return (NULL); } mtx_lock_spin(&w_mtx); w = witness_hash_get(description); if (w) goto found; if ((w = witness_get()) == NULL) return (NULL); MPASS(strlen(description) < MAX_W_NAME); strcpy(w->w_name, description); w->w_class = lock_class; w->w_refcount = 1; STAILQ_INSERT_HEAD(&w_all, w, w_list); if (lock_class->lc_flags & LC_SPINLOCK) { STAILQ_INSERT_HEAD(&w_spin, w, w_typelist); w_spin_cnt++; } else if (lock_class->lc_flags & LC_SLEEPLOCK) { STAILQ_INSERT_HEAD(&w_sleep, w, w_typelist); w_sleep_cnt++; } /* Insert new witness into the hash */ witness_hash_put(w); witness_increment_graph_generation(); mtx_unlock_spin(&w_mtx); return (w); found: w->w_refcount++; mtx_unlock_spin(&w_mtx); if (lock_class != w->w_class) kassert_panic( "lock (%s) %s does not match earlier (%s) lock", description, lock_class->lc_name, w->w_class->lc_name); return (w); } static void depart(struct witness *w) { struct witness_list *list; MPASS(w->w_refcount == 0); if (w->w_class->lc_flags & LC_SLEEPLOCK) { list = &w_sleep; w_sleep_cnt--; } else { list = &w_spin; w_spin_cnt--; } /* * Set file to NULL as it may point into a loadable module. */ w->w_file = NULL; w->w_line = 0; witness_increment_graph_generation(); } static void adopt(struct witness *parent, struct witness *child) { int pi, ci, i, j; if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); /* If the relationship is already known, there's no work to be done. */ if (isitmychild(parent, child)) return; /* When the structure of the graph changes, bump up the generation. */ witness_increment_graph_generation(); /* * The hard part ... create the direct relationship, then propagate all * indirect relationships. */ pi = parent->w_index; ci = child->w_index; WITNESS_INDEX_ASSERT(pi); WITNESS_INDEX_ASSERT(ci); MPASS(pi != ci); w_rmatrix[pi][ci] |= WITNESS_PARENT; w_rmatrix[ci][pi] |= WITNESS_CHILD; /* * If parent was not already an ancestor of child, * then we increment the descendant and ancestor counters. */ if ((w_rmatrix[pi][ci] & WITNESS_ANCESTOR) == 0) { parent->w_num_descendants++; child->w_num_ancestors++; } /* * Find each ancestor of 'pi'. Note that 'pi' itself is counted as * an ancestor of 'pi' during this loop. */ for (i = 1; i <= w_max_used_index; i++) { if ((w_rmatrix[i][pi] & WITNESS_ANCESTOR_MASK) == 0 && (i != pi)) continue; /* Find each descendant of 'i' and mark it as a descendant. 
*/ for (j = 1; j <= w_max_used_index; j++) { /* * Skip children that are already marked as * descendants of 'i'. */ if (w_rmatrix[i][j] & WITNESS_ANCESTOR_MASK) continue; /* * We are only interested in descendants of 'ci'. Note * that 'ci' itself is counted as a descendant of 'ci'. */ if ((w_rmatrix[ci][j] & WITNESS_ANCESTOR_MASK) == 0 && (j != ci)) continue; w_rmatrix[i][j] |= WITNESS_ANCESTOR; w_rmatrix[j][i] |= WITNESS_DESCENDANT; w_data[i].w_num_descendants++; w_data[j].w_num_ancestors++; /* * Make sure we aren't marking a node as both an * ancestor and descendant. We should have caught * this as a lock order reversal earlier. */ if ((w_rmatrix[i][j] & WITNESS_ANCESTOR_MASK) && (w_rmatrix[i][j] & WITNESS_DESCENDANT_MASK)) { printf("witness rmatrix paradox! [%d][%d]=%d " "both ancestor and descendant\n", i, j, w_rmatrix[i][j]); kdb_backtrace(); printf("Witness disabled.\n"); witness_watch = -1; } if ((w_rmatrix[j][i] & WITNESS_ANCESTOR_MASK) && (w_rmatrix[j][i] & WITNESS_DESCENDANT_MASK)) { printf("witness rmatrix paradox! [%d][%d]=%d " "both ancestor and descendant\n", j, i, w_rmatrix[j][i]); kdb_backtrace(); printf("Witness disabled.\n"); witness_watch = -1; } } } } static void itismychild(struct witness *parent, struct witness *child) { int unlocked; MPASS(child != NULL && parent != NULL); if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); if (!witness_lock_type_equal(parent, child)) { if (witness_cold == 0) { unlocked = 1; mtx_unlock_spin(&w_mtx); } else { unlocked = 0; } kassert_panic( "%s: parent \"%s\" (%s) and child \"%s\" (%s) are not " "the same lock type", __func__, parent->w_name, parent->w_class->lc_name, child->w_name, child->w_class->lc_name); if (unlocked) mtx_lock_spin(&w_mtx); } adopt(parent, child); } /* * Generic code for the isitmy*() functions. The rmask parameter is the * expected relationship of w1 to w2. */ static int _isitmyx(struct witness *w1, struct witness *w2, int rmask, const char *fname) { unsigned char r1, r2; int i1, i2; i1 = w1->w_index; i2 = w2->w_index; WITNESS_INDEX_ASSERT(i1); WITNESS_INDEX_ASSERT(i2); r1 = w_rmatrix[i1][i2] & WITNESS_RELATED_MASK; r2 = w_rmatrix[i2][i1] & WITNESS_RELATED_MASK; /* The flags on one side must be the inverse of the flags on the other */ if (!((WITNESS_ATOD(r1) == r2 && WITNESS_DTOA(r2) == r1) || (WITNESS_DTOA(r1) == r2 && WITNESS_ATOD(r2) == r1))) { /* Don't squawk if we're potentially racing with an update. */ if (!mtx_owned(&w_mtx)) return (0); printf("%s: rmatrix mismatch between %s (index %d) and %s " "(index %d): w_rmatrix[%d][%d] == %hhx but " "w_rmatrix[%d][%d] == %hhx\n", fname, w1->w_name, i1, w2->w_name, i2, i1, i2, r1, i2, i1, r2); kdb_backtrace(); printf("Witness disabled.\n"); witness_watch = -1; } return (r1 & rmask); } /* * Checks if @child is a direct child of @parent. */ static int isitmychild(struct witness *parent, struct witness *child) { return (_isitmyx(parent, child, WITNESS_PARENT, __func__)); } /* * Checks if @descendant is a direct or indirect descendant of @ancestor.
*/ static int isitmydescendant(struct witness *ancestor, struct witness *descendant) { return (_isitmyx(ancestor, descendant, WITNESS_ANCESTOR_MASK, __func__)); } #ifdef BLESSING static int blessed(struct witness *w1, struct witness *w2) { int i; struct witness_blessed *b; for (i = 0; i < nitems(blessed_list); i++) { b = &blessed_list[i]; if (strcmp(w1->w_name, b->b_lock1) == 0) { if (strcmp(w2->w_name, b->b_lock2) == 0) return (1); continue; } if (strcmp(w1->w_name, b->b_lock2) == 0) if (strcmp(w2->w_name, b->b_lock1) == 0) return (1); } return (0); } #endif static struct witness * witness_get(void) { struct witness *w; int index; if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); if (witness_watch == -1) { mtx_unlock_spin(&w_mtx); return (NULL); } if (STAILQ_EMPTY(&w_free)) { witness_watch = -1; mtx_unlock_spin(&w_mtx); printf("WITNESS: unable to allocate a new witness object\n"); return (NULL); } w = STAILQ_FIRST(&w_free); STAILQ_REMOVE_HEAD(&w_free, w_list); w_free_cnt--; index = w->w_index; MPASS(index > 0 && index == w_max_used_index+1 && index < witness_count); bzero(w, sizeof(*w)); w->w_index = index; if (index > w_max_used_index) w_max_used_index = index; return (w); } static void witness_free(struct witness *w) { STAILQ_INSERT_HEAD(&w_free, w, w_list); w_free_cnt++; } static struct lock_list_entry * witness_lock_list_get(void) { struct lock_list_entry *lle; if (witness_watch == -1) return (NULL); mtx_lock_spin(&w_mtx); lle = w_lock_list_free; if (lle == NULL) { witness_watch = -1; mtx_unlock_spin(&w_mtx); printf("%s: witness exhausted\n", __func__); return (NULL); } w_lock_list_free = lle->ll_next; mtx_unlock_spin(&w_mtx); bzero(lle, sizeof(*lle)); return (lle); } static void witness_lock_list_free(struct lock_list_entry *lle) { mtx_lock_spin(&w_mtx); lle->ll_next = w_lock_list_free; w_lock_list_free = lle; mtx_unlock_spin(&w_mtx); } static struct lock_instance * find_instance(struct lock_list_entry *list, const struct lock_object *lock) { struct lock_list_entry *lle; struct lock_instance *instance; int i; for (lle = list; lle != NULL; lle = lle->ll_next) for (i = lle->ll_count - 1; i >= 0; i--) { instance = &lle->ll_children[i]; if (instance->li_lock == lock) return (instance); } return (NULL); } static void witness_list_lock(struct lock_instance *instance, int (*prnt)(const char *fmt, ...)) { struct lock_object *lock; lock = instance->li_lock; prnt("%s %s %s", (instance->li_flags & LI_EXCLUSIVE) != 0 ? "exclusive" : "shared", LOCK_CLASS(lock)->lc_name, lock->lo_name); if (lock->lo_witness->w_name != lock->lo_name) prnt(" (%s)", lock->lo_witness->w_name); prnt(" r = %d (%p) locked @ %s:%d\n", instance->li_flags & LI_RECURSEMASK, lock, fixup_filename(instance->li_file), instance->li_line); } static int witness_output(const char *fmt, ...) 
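/*
 * witness_output() is the front end of the usual printf/vprintf-style
 * pair: it captures its variable arguments and forwards them to
 * witness_voutput(), so callers such as witness_warn() that already
 * hold a va_list can reuse the same channel dispatch.  The pattern, as
 * a self-contained sketch with hypothetical names:
 *
 *	static int
 *	vfoo(const char *fmt, va_list ap)
 *	{
 *		return (vprintf(fmt, ap));
 *	}
 *
 *	static int
 *	foo(const char *fmt, ...)
 *	{
 *		va_list ap;
 *		int ret;
 *
 *		va_start(ap, fmt);
 *		ret = vfoo(fmt, ap);
 *		va_end(ap);
 *		return (ret);
 *	}
 */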
{ va_list ap; int ret; va_start(ap, fmt); ret = witness_voutput(fmt, ap); va_end(ap); return (ret); } static int witness_voutput(const char *fmt, va_list ap) { int ret; ret = 0; switch (witness_channel) { case WITNESS_CONSOLE: ret = vprintf(fmt, ap); break; case WITNESS_LOG: vlog(LOG_NOTICE, fmt, ap); break; case WITNESS_NONE: break; } return (ret); } #ifdef DDB static int witness_thread_has_locks(struct thread *td) { if (td->td_sleeplocks == NULL) return (0); return (td->td_sleeplocks->ll_count != 0); } static int witness_proc_has_locks(struct proc *p) { struct thread *td; FOREACH_THREAD_IN_PROC(p, td) { if (witness_thread_has_locks(td)) return (1); } return (0); } #endif int witness_list_locks(struct lock_list_entry **lock_list, int (*prnt)(const char *fmt, ...)) { struct lock_list_entry *lle; int i, nheld; nheld = 0; for (lle = *lock_list; lle != NULL; lle = lle->ll_next) for (i = lle->ll_count - 1; i >= 0; i--) { witness_list_lock(&lle->ll_children[i], prnt); nheld++; } return (nheld); } /* * This is a bit risky at best. We call this function when we have timed * out acquiring a spin lock, and we assume that the other CPU is stuck * with this lock held. So, we go groveling around in the other CPU's * per-cpu data to try to find the lock instance for this spin lock to * see when it was last acquired. */ void witness_display_spinlock(struct lock_object *lock, struct thread *owner, int (*prnt)(const char *fmt, ...)) { struct lock_instance *instance; struct pcpu *pc; if (owner->td_critnest == 0 || owner->td_oncpu == NOCPU) return; pc = pcpu_find(owner->td_oncpu); instance = find_instance(pc->pc_spinlocks, lock); if (instance != NULL) witness_list_lock(instance, prnt); } void witness_save(struct lock_object *lock, const char **filep, int *linep) { struct lock_list_entry *lock_list; struct lock_instance *instance; struct lock_class *class; /* * This function is used independently in locking code to deal with * Giant, SCHEDULER_STOPPED() check can be removed here after Giant * is gone. */ if (SCHEDULER_STOPPED()) return; KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if (class->lc_flags & LC_SLEEPLOCK) lock_list = curthread->td_sleeplocks; else { if (witness_skipspin) return; lock_list = PCPU_GET(spinlocks); } instance = find_instance(lock_list, lock); if (instance == NULL) { kassert_panic("%s: lock (%s) %s not locked", __func__, class->lc_name, lock->lo_name); return; } *filep = instance->li_file; *linep = instance->li_line; } void witness_restore(struct lock_object *lock, const char *file, int line) { struct lock_list_entry *lock_list; struct lock_instance *instance; struct lock_class *class; /* * This function is used independently in locking code to deal with * Giant, SCHEDULER_STOPPED() check can be removed here after Giant * is gone. 
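 *
 * witness_save() and witness_restore() are used as a pair around code
 * that transiently drops and re-acquires a lock but wants to keep the
 * lock's original file/line attribution.  The usual interface is the
 * wrapper macros from sys/lock.h; a sketch with a hypothetical mutex:
 *
 *	WITNESS_SAVE_DECL(foo);
 *
 *	WITNESS_SAVE(&foo_mtx.lock_object, foo);
 *	mtx_unlock(&foo_mtx);
 *	...
 *	mtx_lock(&foo_mtx);
 *	WITNESS_RESTORE(&foo_mtx.lock_object, foo);
 *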
*/ if (SCHEDULER_STOPPED()) return; KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if (class->lc_flags & LC_SLEEPLOCK) lock_list = curthread->td_sleeplocks; else { if (witness_skipspin) return; lock_list = PCPU_GET(spinlocks); } instance = find_instance(lock_list, lock); if (instance == NULL) kassert_panic("%s: lock (%s) %s not locked", __func__, class->lc_name, lock->lo_name); lock->lo_witness->w_file = file; lock->lo_witness->w_line = line; if (instance == NULL) return; instance->li_file = file; instance->li_line = line; } void witness_assert(const struct lock_object *lock, int flags, const char *file, int line) { #ifdef INVARIANT_SUPPORT struct lock_instance *instance; struct lock_class *class; if (lock->lo_witness == NULL || witness_watch < 1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if ((class->lc_flags & LC_SLEEPLOCK) != 0) instance = find_instance(curthread->td_sleeplocks, lock); else if ((class->lc_flags & LC_SPINLOCK) != 0) instance = find_instance(PCPU_GET(spinlocks), lock); else { kassert_panic("Lock (%s) %s is not sleep or spin!", class->lc_name, lock->lo_name); return; } switch (flags) { case LA_UNLOCKED: if (instance != NULL) kassert_panic("Lock (%s) %s locked @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); break; case LA_LOCKED: case LA_LOCKED | LA_RECURSED: case LA_LOCKED | LA_NOTRECURSED: case LA_SLOCKED: case LA_SLOCKED | LA_RECURSED: case LA_SLOCKED | LA_NOTRECURSED: case LA_XLOCKED: case LA_XLOCKED | LA_RECURSED: case LA_XLOCKED | LA_NOTRECURSED: if (instance == NULL) { kassert_panic("Lock (%s) %s not locked @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); break; } if ((flags & LA_XLOCKED) != 0 && (instance->li_flags & LI_EXCLUSIVE) == 0) kassert_panic( "Lock (%s) %s not exclusively locked @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((flags & LA_SLOCKED) != 0 && (instance->li_flags & LI_EXCLUSIVE) != 0) kassert_panic( "Lock (%s) %s exclusively locked @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((flags & LA_RECURSED) != 0 && (instance->li_flags & LI_RECURSEMASK) == 0) kassert_panic("Lock (%s) %s not recursed @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); if ((flags & LA_NOTRECURSED) != 0 && (instance->li_flags & LI_RECURSEMASK) != 0) kassert_panic("Lock (%s) %s recursed @ %s:%d.", class->lc_name, lock->lo_name, fixup_filename(file), line); break; default: kassert_panic("Invalid lock assertion at %s:%d.", fixup_filename(file), line); } #endif /* INVARIANT_SUPPORT */ } static void witness_setflag(struct lock_object *lock, int flag, int set) { struct lock_list_entry *lock_list; struct lock_instance *instance; struct lock_class *class; if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL) return; class = LOCK_CLASS(lock); if (class->lc_flags & LC_SLEEPLOCK) lock_list = curthread->td_sleeplocks; else { if (witness_skipspin) return; lock_list = PCPU_GET(spinlocks); } instance = find_instance(lock_list, lock); if (instance == NULL) { kassert_panic("%s: lock (%s) %s not locked", __func__, class->lc_name, lock->lo_name); return; } if (set) instance->li_flags |= flag; else instance->li_flags &= ~flag; } void witness_norelease(struct lock_object *lock) { witness_setflag(lock, LI_NORELEASE, 1); } void witness_releaseok(struct lock_object *lock) { witness_setflag(lock, LI_NORELEASE, 0); } #ifdef DDB static 
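/*
 * The DDB glue below walks the WITNESS lock lists from the kernel
 * debugger.  New "show" verbs follow the same shape; a minimal
 * (hypothetical) command would be:
 *
 *	DB_SHOW_COMMAND(mylocks, db_show_mylocks)
 *	{
 *		db_printf("example ddb command\n");
 *	}
 *
 * which is then available as "show mylocks" at the db> prompt.
 */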
void witness_ddb_list(struct thread *td) { KASSERT(witness_cold == 0, ("%s: witness_cold", __func__)); KASSERT(kdb_active, ("%s: not in the debugger", __func__)); if (witness_watch < 1) return; witness_list_locks(&td->td_sleeplocks, db_printf); /* * We only handle spinlocks if td == curthread. This is somewhat broken * if td is currently executing on some other CPU and holds spin locks, * as we won't display those locks. If we had an MI way of getting * the per-CPU data for a given CPU then we could use * td->td_oncpu to get the list of spinlocks for this thread * and "fix" this. * * That still wouldn't really fix this unless we locked the scheduler * lock or stopped the other CPU to make sure it wasn't changing the * list out from under us. It is probably best to just not try to * handle threads on other CPUs for now. */ if (td == curthread && PCPU_GET(spinlocks) != NULL) witness_list_locks(PCPU_PTR(spinlocks), db_printf); } DB_SHOW_COMMAND(locks, db_witness_list) { struct thread *td; if (have_addr) td = db_lookup_thread(addr, true); else td = kdb_thread; witness_ddb_list(td); } DB_SHOW_ALL_COMMAND(locks, db_witness_list_all) { struct thread *td; struct proc *p; /* * It would be nice to list only threads and processes that actually * held sleep locks, but that information is currently not exported * by WITNESS. */ FOREACH_PROC_IN_SYSTEM(p) { if (!witness_proc_has_locks(p)) continue; FOREACH_THREAD_IN_PROC(p, td) { if (!witness_thread_has_locks(td)) continue; db_printf("Process %d (%s) thread %p (%d)\n", p->p_pid, p->p_comm, td, td->td_tid); witness_ddb_list(td); if (db_pager_quit) return; } } } DB_SHOW_ALIAS(alllocks, db_witness_list_all) DB_SHOW_COMMAND(witness, db_witness_display) { witness_ddb_display(db_printf); } #endif static int sysctl_debug_witness_badstacks(SYSCTL_HANDLER_ARGS) { struct witness_lock_order_data *data1, *data2, *tmp_data1, *tmp_data2; struct witness *tmp_w1, *tmp_w2, *w1, *w2; struct sbuf *sb; u_int w_rmatrix1, w_rmatrix2; int error, generation, i, j; tmp_data1 = NULL; tmp_data2 = NULL; tmp_w1 = NULL; tmp_w2 = NULL; if (witness_watch < 1) { error = SYSCTL_OUT(req, w_notrunning, sizeof(w_notrunning)); return (error); } if (witness_cold) { error = SYSCTL_OUT(req, w_stillcold, sizeof(w_stillcold)); return (error); } error = 0; sb = sbuf_new(NULL, NULL, badstack_sbuf_size, SBUF_AUTOEXTEND); if (sb == NULL) return (ENOMEM); /* Allocate and init temporary storage space. */ tmp_w1 = malloc(sizeof(struct witness), M_TEMP, M_WAITOK | M_ZERO); tmp_w2 = malloc(sizeof(struct witness), M_TEMP, M_WAITOK | M_ZERO); tmp_data1 = malloc(sizeof(struct witness_lock_order_data), M_TEMP, M_WAITOK | M_ZERO); tmp_data2 = malloc(sizeof(struct witness_lock_order_data), M_TEMP, M_WAITOK | M_ZERO); stack_zero(&tmp_data1->wlod_stack); stack_zero(&tmp_data2->wlod_stack); restart: mtx_lock_spin(&w_mtx); generation = w_generation; mtx_unlock_spin(&w_mtx); sbuf_printf(sb, "Number of known direct relationships is %d\n", w_lohash.wloh_count); for (i = 1; i < w_max_used_index; i++) { mtx_lock_spin(&w_mtx); if (generation != w_generation) { mtx_unlock_spin(&w_mtx); /* The graph has changed, try again. */ req->oldidx = 0; sbuf_clear(sb); goto restart; } w1 = &w_data[i]; if (w1->w_reversed == 0) { mtx_unlock_spin(&w_mtx); continue; } /* Copy w1 locally so we can release the spin lock.
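 *
 * This is the snapshot-under-spinlock pattern used throughout this
 * handler: re-take w_mtx, check that the generation has not moved,
 * copy what is needed into preallocated storage, and drop the lock
 * before doing anything slow or sleepable (sbuf formatting, stack
 * printing).  Reduced to a skeleton with hypothetical names:
 *
 *	restart:
 *		mtx_lock_spin(&m);
 *		gen = generation;
 *		mtx_unlock_spin(&m);
 *		for (i = 0; i < n; i++) {
 *			mtx_lock_spin(&m);
 *			if (gen != generation) {
 *				mtx_unlock_spin(&m);
 *				goto restart;
 *			}
 *			local = shared[i];
 *			mtx_unlock_spin(&m);
 *			format_slowly(&local);
 *		}
 *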
*/ *tmp_w1 = *w1; mtx_unlock_spin(&w_mtx); if (tmp_w1->w_reversed == 0) continue; for (j = 1; j < w_max_used_index; j++) { if ((w_rmatrix[i][j] & WITNESS_REVERSAL) == 0 || i > j) continue; mtx_lock_spin(&w_mtx); if (generation != w_generation) { mtx_unlock_spin(&w_mtx); /* The graph has changed, try again. */ req->oldidx = 0; sbuf_clear(sb); goto restart; } w2 = &w_data[j]; data1 = witness_lock_order_get(w1, w2); data2 = witness_lock_order_get(w2, w1); /* * Copy information locally so we can release the * spin lock. */ *tmp_w2 = *w2; w_rmatrix1 = (unsigned int)w_rmatrix[i][j]; w_rmatrix2 = (unsigned int)w_rmatrix[j][i]; if (data1) { stack_zero(&tmp_data1->wlod_stack); stack_copy(&data1->wlod_stack, &tmp_data1->wlod_stack); } if (data2 && data2 != data1) { stack_zero(&tmp_data2->wlod_stack); stack_copy(&data2->wlod_stack, &tmp_data2->wlod_stack); } mtx_unlock_spin(&w_mtx); sbuf_printf(sb, "\nLock order reversal between \"%s\"(%s) and \"%s\"(%s)!\n", tmp_w1->w_name, tmp_w1->w_class->lc_name, tmp_w2->w_name, tmp_w2->w_class->lc_name); if (data1) { sbuf_printf(sb, "Lock order \"%s\"(%s) -> \"%s\"(%s) first seen at:\n", tmp_w1->w_name, tmp_w1->w_class->lc_name, tmp_w2->w_name, tmp_w2->w_class->lc_name); stack_sbuf_print(sb, &tmp_data1->wlod_stack); sbuf_printf(sb, "\n"); } if (data2 && data2 != data1) { sbuf_printf(sb, "Lock order \"%s\"(%s) -> \"%s\"(%s) first seen at:\n", tmp_w2->w_name, tmp_w2->w_class->lc_name, tmp_w1->w_name, tmp_w1->w_class->lc_name); stack_sbuf_print(sb, &tmp_data2->wlod_stack); sbuf_printf(sb, "\n"); } } } mtx_lock_spin(&w_mtx); if (generation != w_generation) { mtx_unlock_spin(&w_mtx); /* * The graph changed while we were printing stack data, * try again. */ req->oldidx = 0; sbuf_clear(sb); goto restart; } mtx_unlock_spin(&w_mtx); /* Free temporary storage space. */ free(tmp_data1, M_TEMP); free(tmp_data2, M_TEMP); free(tmp_w1, M_TEMP); free(tmp_w2, M_TEMP); sbuf_finish(sb); error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1); sbuf_delete(sb); return (error); } static int sysctl_debug_witness_channel(SYSCTL_HANDLER_ARGS) { static const struct { enum witness_channel channel; const char *name; } channels[] = { { WITNESS_CONSOLE, "console" }, { WITNESS_LOG, "log" }, { WITNESS_NONE, "none" }, }; char buf[16]; u_int i; int error; buf[0] = '\0'; for (i = 0; i < nitems(channels); i++) if (witness_channel == channels[i].channel) { snprintf(buf, sizeof(buf), "%s", channels[i].name); break; } error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); error = EINVAL; for (i = 0; i < nitems(channels); i++) if (strcmp(channels[i].name, buf) == 0) { witness_channel = channels[i].channel; error = 0; break; } return (error); } static int sysctl_debug_witness_fullgraph(SYSCTL_HANDLER_ARGS) { struct witness *w; struct sbuf *sb; int error; if (witness_watch < 1) { error = SYSCTL_OUT(req, w_notrunning, sizeof(w_notrunning)); return (error); } if (witness_cold) { error = SYSCTL_OUT(req, w_stillcold, sizeof(w_stillcold)); return (error); } error = 0; error = sysctl_wire_old_buffer(req, 0); if (error != 0) return (error); sb = sbuf_new_for_sysctl(NULL, NULL, FULLGRAPH_SBUF_SIZE, req); if (sb == NULL) return (ENOMEM); sbuf_printf(sb, "\n"); mtx_lock_spin(&w_mtx); STAILQ_FOREACH(w, &w_all, w_list) w->w_displayed = 0; STAILQ_FOREACH(w, &w_all, w_list) witness_add_fullgraph(sb, w); mtx_unlock_spin(&w_mtx); /* * Close the sbuf and return to userland. 
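 *
 * sbuf_new_for_sysctl() ties the sbuf drain to the sysctl request, so
 * sbuf_finish() both flushes any buffered bytes to userland and
 * returns the copyout status.  The same pattern suits any
 * variable-size sysctl; a minimal sketch with a hypothetical handler:
 *
 *	static int
 *	sysctl_foo(SYSCTL_HANDLER_ARGS)
 *	{
 *		struct sbuf *sb;
 *		int error;
 *
 *		error = sysctl_wire_old_buffer(req, 0);
 *		if (error != 0)
 *			return (error);
 *		sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
 *		if (sb == NULL)
 *			return (ENOMEM);
 *		sbuf_printf(sb, "value=%d\n", 42);
 *		error = sbuf_finish(sb);
 *		sbuf_delete(sb);
 *		return (error);
 *	}
 *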
*/ error = sbuf_finish(sb); sbuf_delete(sb); return (error); } static int sysctl_debug_witness_watch(SYSCTL_HANDLER_ARGS) { int error, value; value = witness_watch; error = sysctl_handle_int(oidp, &value, 0, req); if (error != 0 || req->newptr == NULL) return (error); if (value > 1 || value < -1 || (witness_watch == -1 && value != witness_watch)) return (EINVAL); witness_watch = value; return (0); } static void witness_add_fullgraph(struct sbuf *sb, struct witness *w) { int i; if (w->w_displayed != 0 || (w->w_file == NULL && w->w_line == 0)) return; w->w_displayed = 1; WITNESS_INDEX_ASSERT(w->w_index); for (i = 1; i <= w_max_used_index; i++) { if (w_rmatrix[w->w_index][i] & WITNESS_PARENT) { sbuf_printf(sb, "\"%s\",\"%s\"\n", w->w_name, w_data[i].w_name); witness_add_fullgraph(sb, &w_data[i]); } } } /* * A simple hash function. Takes a key pointer and a key size. If size == 0, * interprets the key as a string and reads until the null * terminator. Otherwise, reads the first size bytes. Returns an unsigned 32-bit * hash value computed from the key. */ static uint32_t witness_hash_djb2(const uint8_t *key, uint32_t size) { unsigned int hash = 5381; int i; /* hash = hash * 33 + key[i] */ if (size) for (i = 0; i < size; i++) hash = ((hash << 5) + hash) + (unsigned int)key[i]; else for (i = 0; key[i] != 0; i++) hash = ((hash << 5) + hash) + (unsigned int)key[i]; return (hash); } /* * Initializes the two witness hash tables. Called exactly once from * witness_initialize(). */ static void witness_init_hash_tables(void) { int i; MPASS(witness_cold); /* Initialize the hash tables. */ for (i = 0; i < WITNESS_HASH_SIZE; i++) w_hash.wh_array[i] = NULL; w_hash.wh_size = WITNESS_HASH_SIZE; w_hash.wh_count = 0; /* Initialize the lock order data hash. */ w_lofree = NULL; for (i = 0; i < WITNESS_LO_DATA_COUNT; i++) { memset(&w_lodata[i], 0, sizeof(w_lodata[i])); w_lodata[i].wlod_next = w_lofree; w_lofree = &w_lodata[i]; } w_lohash.wloh_size = WITNESS_LO_HASH_SIZE; w_lohash.wloh_count = 0; for (i = 0; i < WITNESS_LO_HASH_SIZE; i++) w_lohash.wloh_array[i] = NULL; } static struct witness * witness_hash_get(const char *key) { struct witness *w; uint32_t hash; MPASS(key != NULL); if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); hash = witness_hash_djb2(key, 0) % w_hash.wh_size; w = w_hash.wh_array[hash]; while (w != NULL) { if (strcmp(w->w_name, key) == 0) goto out; w = w->w_hash_next; } out: return (w); } static void witness_hash_put(struct witness *w) { uint32_t hash; MPASS(w != NULL); MPASS(w->w_name != NULL); if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); KASSERT(witness_hash_get(w->w_name) == NULL, ("%s: trying to add a hash entry that already exists!", __func__)); KASSERT(w->w_hash_next == NULL, ("%s: w->w_hash_next != NULL", __func__)); hash = witness_hash_djb2(w->w_name, 0) % w_hash.wh_size; w->w_hash_next = w_hash.wh_array[hash]; w_hash.wh_array[hash] = w; w_hash.wh_count++; } static struct witness_lock_order_data * witness_lock_order_get(struct witness *parent, struct witness *child) { struct witness_lock_order_data *data = NULL; struct witness_lock_order_key key; unsigned int hash; MPASS(parent != NULL && child != NULL); key.from = parent->w_index; key.to = child->w_index; WITNESS_INDEX_ASSERT(key.from); WITNESS_INDEX_ASSERT(key.to); if ((w_rmatrix[parent->w_index][child->w_index] & WITNESS_LOCK_ORDER_KNOWN) == 0) goto out; hash = witness_hash_djb2((const char*)&key, sizeof(key)) % w_lohash.wloh_size; data = w_lohash.wloh_array[hash]; while (data != NULL) { if 
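/*
 * (The lock-order table is keyed by the (from, to) witness index pair,
 * hashed with witness_hash_djb2() above: h = h * 33 + byte, seeded
 * with 5381.  Worked example on the two-byte string key "ab":
 * h = 5381 * 33 + 'a' = 177670, then h = 177670 * 33 + 'b' = 5863208.)
 */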
(witness_lock_order_key_equal(&data->wlod_key, &key)) break; data = data->wlod_next; } out: return (data); } /* * Verify that parent and child have a known relationship, are not the same, * and child is actually a child of parent. This is done without w_mtx * to avoid contention in the common case. */ static int witness_lock_order_check(struct witness *parent, struct witness *child) { if (parent != child && w_rmatrix[parent->w_index][child->w_index] & WITNESS_LOCK_ORDER_KNOWN && isitmychild(parent, child)) return (1); return (0); } static int witness_lock_order_add(struct witness *parent, struct witness *child) { struct witness_lock_order_data *data = NULL; struct witness_lock_order_key key; unsigned int hash; MPASS(parent != NULL && child != NULL); key.from = parent->w_index; key.to = child->w_index; WITNESS_INDEX_ASSERT(key.from); WITNESS_INDEX_ASSERT(key.to); if (w_rmatrix[parent->w_index][child->w_index] & WITNESS_LOCK_ORDER_KNOWN) return (1); hash = witness_hash_djb2((const char*)&key, sizeof(key)) % w_lohash.wloh_size; w_rmatrix[parent->w_index][child->w_index] |= WITNESS_LOCK_ORDER_KNOWN; data = w_lofree; if (data == NULL) return (0); w_lofree = data->wlod_next; data->wlod_next = w_lohash.wloh_array[hash]; data->wlod_key = key; w_lohash.wloh_array[hash] = data; w_lohash.wloh_count++; stack_zero(&data->wlod_stack); stack_save(&data->wlod_stack); return (1); } /* Call this whenever the structure of the witness graph changes. */ static void witness_increment_graph_generation(void) { if (witness_cold == 0) mtx_assert(&w_mtx, MA_OWNED); w_generation++; } static int witness_output_drain(void *arg __unused, const char *data, int len) { witness_output("%.*s", len, data); return (len); } static void witness_debugger(int cond, const char *msg) { char buf[32]; struct sbuf sb; struct stack st; if (!cond) return; if (witness_trace) { sbuf_new(&sb, buf, sizeof(buf), SBUF_FIXEDLEN); sbuf_set_drain(&sb, witness_output_drain, NULL); stack_zero(&st); stack_save(&st); witness_output("stack backtrace:\n"); stack_sbuf_print_ddb(&sb, &st); sbuf_finish(&sb); } #ifdef KDB if (witness_kdb) kdb_enter(KDB_WHY_WITNESS, msg); #endif } Index: projects/clang390-import/sys/kern/uipc_syscalls.c =================================================================== --- projects/clang390-import/sys/kern/uipc_syscalls.c (revision 305686) +++ projects/clang390-import/sys/kern/uipc_syscalls.c (revision 305687) @@ -1,1755 +1,1575 @@ /*- * Copyright (c) 1982, 1986, 1989, 1990, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)uipc_syscalls.c 8.4 (Berkeley) 2/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_capsicum.h" #include "opt_inet.h" #include "opt_inet6.h" #include "opt_compat.h" #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef KTRACE #include #endif #ifdef COMPAT_FREEBSD32 #include #endif #include #include #include /* * Flags for accept1() and kern_accept4(), in addition to SOCK_CLOEXEC * and SOCK_NONBLOCK. */ #define ACCEPT4_INHERIT 0x1 #define ACCEPT4_COMPAT 0x2 static int sendit(struct thread *td, int s, struct msghdr *mp, int flags); static int recvit(struct thread *td, int s, struct msghdr *mp, void *namelenp); static int accept1(struct thread *td, int s, struct sockaddr *uname, socklen_t *anamelen, int flags); static int getsockname1(struct thread *td, struct getsockname_args *uap, int compat); static int getpeername1(struct thread *td, struct getpeername_args *uap, int compat); static int sockargs(struct mbuf **, char *, socklen_t, int); /* * Convert a user file descriptor to a kernel file entry and check if required * capability rights are present. * A reference on the file entry is held upon returning. */ int getsock_cap(struct thread *td, int fd, cap_rights_t *rightsp, struct file **fpp, u_int *fflagp) { struct file *fp; int error; error = fget_unlocked(td->td_proc->p_fd, fd, rightsp, &fp, NULL); if (error != 0) return (error); if (fp->f_type != DTYPE_SOCKET) { fdrop(fp, td); return (ENOTSOCK); } if (fflagp != NULL) *fflagp = fp->f_flag; *fpp = fp; return (0); } /* * System call interface to the socket abstraction. */ #if defined(COMPAT_43) #define COMPAT_OLDSOCK #endif int -sys_socket(td, uap) - struct thread *td; - struct socket_args /* { - int domain; - int type; - int protocol; - } */ *uap; +sys_socket(struct thread *td, struct socket_args *uap) { struct socket *so; struct file *fp; int fd, error, type, oflag, fflag; AUDIT_ARG_SOCKET(uap->domain, uap->type, uap->protocol); type = uap->type; oflag = 0; fflag = 0; if ((type & SOCK_CLOEXEC) != 0) { type &= ~SOCK_CLOEXEC; oflag |= O_CLOEXEC; } if ((type & SOCK_NONBLOCK) != 0) { type &= ~SOCK_NONBLOCK; fflag |= FNONBLOCK; } #ifdef MAC error = mac_socket_check_create(td->td_ucred, uap->domain, type, uap->protocol); if (error != 0) return (error); #endif error = falloc(td, &fp, &fd, oflag); if (error != 0) return (error); /* An extra reference on `fp' has been held for us by falloc(). 
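 *
 * (The descriptor table slot holds the other reference.  The idiom,
 * used again in kern_accept4() and kern_socketpair() below, is:
 * falloc() the file and descriptor, create the underlying object,
 * finit() the file, publish the descriptor via td_retval, and fdrop()
 * this thread's transient reference.  In outline, error handling
 * trimmed:
 *
 *	error = falloc(td, &fp, &fd, oflag);
 *	error = socreate(domain, &so, type, protocol, td->td_ucred, td);
 *	finit(fp, FREAD | FWRITE, DTYPE_SOCKET, so, &socketops);
 *	td->td_retval[0] = fd;
 *	fdrop(fp, td);
 *
 * with fdclose() instead of finit() on the error path.)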
*/ error = socreate(uap->domain, &so, type, uap->protocol, td->td_ucred, td); if (error != 0) { fdclose(td, fp, fd); } else { finit(fp, FREAD | FWRITE | fflag, DTYPE_SOCKET, so, &socketops); if ((fflag & FNONBLOCK) != 0) (void) fo_ioctl(fp, FIONBIO, &fflag, td->td_ucred, td); td->td_retval[0] = fd; } fdrop(fp, td); return (error); } -/* ARGSUSED */ int -sys_bind(td, uap) - struct thread *td; - struct bind_args /* { - int s; - caddr_t name; - int namelen; - } */ *uap; +sys_bind(struct thread *td, struct bind_args *uap) { struct sockaddr *sa; int error; error = getsockaddr(&sa, uap->name, uap->namelen); if (error == 0) { error = kern_bindat(td, AT_FDCWD, uap->s, sa); free(sa, M_SONAME); } return (error); } int kern_bindat(struct thread *td, int dirfd, int fd, struct sockaddr *sa) { struct socket *so; struct file *fp; cap_rights_t rights; int error; AUDIT_ARG_FD(fd); AUDIT_ARG_SOCKADDR(td, dirfd, sa); error = getsock_cap(td, fd, cap_rights_init(&rights, CAP_BIND), &fp, NULL); if (error != 0) return (error); so = fp->f_data; #ifdef KTRACE if (KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(sa); #endif #ifdef MAC error = mac_socket_check_bind(td->td_ucred, so, sa); if (error == 0) { #endif if (dirfd == AT_FDCWD) error = sobind(so, sa, td); else error = sobindat(dirfd, so, sa, td); #ifdef MAC } #endif fdrop(fp, td); return (error); } -/* ARGSUSED */ int -sys_bindat(td, uap) - struct thread *td; - struct bindat_args /* { - int fd; - int s; - caddr_t name; - int namelen; - } */ *uap; +sys_bindat(struct thread *td, struct bindat_args *uap) { struct sockaddr *sa; int error; error = getsockaddr(&sa, uap->name, uap->namelen); if (error == 0) { error = kern_bindat(td, uap->fd, uap->s, sa); free(sa, M_SONAME); } return (error); } -/* ARGSUSED */ int -sys_listen(td, uap) - struct thread *td; - struct listen_args /* { - int s; - int backlog; - } */ *uap; +sys_listen(struct thread *td, struct listen_args *uap) { struct socket *so; struct file *fp; cap_rights_t rights; int error; AUDIT_ARG_FD(uap->s); error = getsock_cap(td, uap->s, cap_rights_init(&rights, CAP_LISTEN), &fp, NULL); if (error == 0) { so = fp->f_data; #ifdef MAC error = mac_socket_check_listen(td->td_ucred, so); if (error == 0) #endif error = solisten(so, uap->backlog, td); fdrop(fp, td); } return(error); } /* * accept1() */ static int accept1(td, s, uname, anamelen, flags) struct thread *td; int s; struct sockaddr *uname; socklen_t *anamelen; int flags; { struct sockaddr *name; socklen_t namelen; struct file *fp; int error; if (uname == NULL) return (kern_accept4(td, s, NULL, NULL, flags, NULL)); error = copyin(anamelen, &namelen, sizeof (namelen)); if (error != 0) return (error); error = kern_accept4(td, s, &name, &namelen, flags, &fp); if (error != 0) return (error); if (error == 0 && uname != NULL) { #ifdef COMPAT_OLDSOCK if (flags & ACCEPT4_COMPAT) ((struct osockaddr *)name)->sa_family = name->sa_family; #endif error = copyout(name, uname, namelen); } if (error == 0) error = copyout(&namelen, anamelen, sizeof(namelen)); if (error != 0) fdclose(td, fp, td->td_retval[0]); fdrop(fp, td); free(name, M_SONAME); return (error); } int kern_accept(struct thread *td, int s, struct sockaddr **name, socklen_t *namelen, struct file **fp) { return (kern_accept4(td, s, name, namelen, ACCEPT4_INHERIT, fp)); } int kern_accept4(struct thread *td, int s, struct sockaddr **name, socklen_t *namelen, int flags, struct file **fp) { struct file *headfp, *nfp = NULL; struct sockaddr *sa = NULL; struct socket *head, *so; cap_rights_t rights; u_int fflag; pid_t pgid; int 
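/*
 * kern_accept4() backs both accept(2) and accept4(2); plain accept()
 * passes ACCEPT4_INHERIT so the child socket inherits the listener's
 * nonblocking/async state and signal ownership, while accept4() lets
 * userland choose, e.g. (userland sketch):
 *
 *	int fd = accept4(s, NULL, NULL, SOCK_CLOEXEC | SOCK_NONBLOCK);
 */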
error, fd, tmp; if (name != NULL) *name = NULL; AUDIT_ARG_FD(s); error = getsock_cap(td, s, cap_rights_init(&rights, CAP_ACCEPT), &headfp, &fflag); if (error != 0) return (error); head = headfp->f_data; if ((head->so_options & SO_ACCEPTCONN) == 0) { error = EINVAL; goto done; } #ifdef MAC error = mac_socket_check_accept(td->td_ucred, head); if (error != 0) goto done; #endif error = falloc(td, &nfp, &fd, (flags & SOCK_CLOEXEC) ? O_CLOEXEC : 0); if (error != 0) goto done; ACCEPT_LOCK(); if ((head->so_state & SS_NBIO) && TAILQ_EMPTY(&head->so_comp)) { ACCEPT_UNLOCK(); error = EWOULDBLOCK; goto noconnection; } while (TAILQ_EMPTY(&head->so_comp) && head->so_error == 0) { if (head->so_rcv.sb_state & SBS_CANTRCVMORE) { head->so_error = ECONNABORTED; break; } error = msleep(&head->so_timeo, &accept_mtx, PSOCK | PCATCH, "accept", 0); if (error != 0) { ACCEPT_UNLOCK(); goto noconnection; } } if (head->so_error) { error = head->so_error; head->so_error = 0; ACCEPT_UNLOCK(); goto noconnection; } so = TAILQ_FIRST(&head->so_comp); KASSERT(!(so->so_qstate & SQ_INCOMP), ("accept1: so SQ_INCOMP")); KASSERT(so->so_qstate & SQ_COMP, ("accept1: so not SQ_COMP")); /* * Before changing the flags on the socket, we have to bump the * reference count. Otherwise, if the protocol calls sofree(), * the socket will be released due to a zero refcount. */ SOCK_LOCK(so); /* soref() and so_state update */ soref(so); /* file descriptor reference */ TAILQ_REMOVE(&head->so_comp, so, so_list); head->so_qlen--; if (flags & ACCEPT4_INHERIT) so->so_state |= (head->so_state & SS_NBIO); else so->so_state |= (flags & SOCK_NONBLOCK) ? SS_NBIO : 0; so->so_qstate &= ~SQ_COMP; so->so_head = NULL; SOCK_UNLOCK(so); ACCEPT_UNLOCK(); /* An extra reference on `nfp' has been held for us by falloc(). */ td->td_retval[0] = fd; /* connection has been removed from the listen queue */ KNOTE_UNLOCKED(&head->so_rcv.sb_sel.si_note, 0); if (flags & ACCEPT4_INHERIT) { pgid = fgetown(&head->so_sigio); if (pgid != 0) fsetown(pgid, &so->so_sigio); } else { fflag &= ~(FNONBLOCK | FASYNC); if (flags & SOCK_NONBLOCK) fflag |= FNONBLOCK; } finit(nfp, fflag, DTYPE_SOCKET, so, &socketops); /* Sync socket nonblocking/async state with file flags */ tmp = fflag & FNONBLOCK; (void) fo_ioctl(nfp, FIONBIO, &tmp, td->td_ucred, td); tmp = fflag & FASYNC; (void) fo_ioctl(nfp, FIOASYNC, &tmp, td->td_ucred, td); sa = NULL; error = soaccept(so, &sa); if (error != 0) goto noconnection; if (sa == NULL) { if (name) *namelen = 0; goto done; } AUDIT_ARG_SOCKADDR(td, AT_FDCWD, sa); if (name) { /* check sa_len before it is destroyed */ if (*namelen > sa->sa_len) *namelen = sa->sa_len; #ifdef KTRACE if (KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(sa); #endif *name = sa; sa = NULL; } noconnection: free(sa, M_SONAME); /* * close the new descriptor, assuming someone hasn't ripped it * out from under us. */ if (error != 0) fdclose(td, nfp, fd); /* * Release explicitly held references before returning. We return * a reference on nfp to the caller on success if they request it. 
*/ done: if (fp != NULL) { if (error == 0) { *fp = nfp; nfp = NULL; } else *fp = NULL; } if (nfp != NULL) fdrop(nfp, td); fdrop(headfp, td); return (error); } int sys_accept(td, uap) struct thread *td; struct accept_args *uap; { return (accept1(td, uap->s, uap->name, uap->anamelen, ACCEPT4_INHERIT)); } int sys_accept4(td, uap) struct thread *td; struct accept4_args *uap; { if (uap->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) return (EINVAL); return (accept1(td, uap->s, uap->name, uap->anamelen, uap->flags)); } #ifdef COMPAT_OLDSOCK int oaccept(td, uap) struct thread *td; struct accept_args *uap; { return (accept1(td, uap->s, uap->name, uap->anamelen, ACCEPT4_INHERIT | ACCEPT4_COMPAT)); } #endif /* COMPAT_OLDSOCK */ -/* ARGSUSED */ int -sys_connect(td, uap) - struct thread *td; - struct connect_args /* { - int s; - caddr_t name; - int namelen; - } */ *uap; +sys_connect(struct thread *td, struct connect_args *uap) { struct sockaddr *sa; int error; error = getsockaddr(&sa, uap->name, uap->namelen); if (error == 0) { error = kern_connectat(td, AT_FDCWD, uap->s, sa); free(sa, M_SONAME); } return (error); } int kern_connectat(struct thread *td, int dirfd, int fd, struct sockaddr *sa) { struct socket *so; struct file *fp; cap_rights_t rights; int error, interrupted = 0; AUDIT_ARG_FD(fd); AUDIT_ARG_SOCKADDR(td, dirfd, sa); error = getsock_cap(td, fd, cap_rights_init(&rights, CAP_CONNECT), &fp, NULL); if (error != 0) return (error); so = fp->f_data; if (so->so_state & SS_ISCONNECTING) { error = EALREADY; goto done1; } #ifdef KTRACE if (KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(sa); #endif #ifdef MAC error = mac_socket_check_connect(td->td_ucred, so, sa); if (error != 0) goto bad; #endif if (dirfd == AT_FDCWD) error = soconnect(so, sa, td); else error = soconnectat(dirfd, so, sa, td); if (error != 0) goto bad; if ((so->so_state & SS_NBIO) && (so->so_state & SS_ISCONNECTING)) { error = EINPROGRESS; goto done1; } SOCK_LOCK(so); while ((so->so_state & SS_ISCONNECTING) && so->so_error == 0) { error = msleep(&so->so_timeo, SOCK_MTX(so), PSOCK | PCATCH, "connec", 0); if (error != 0) { if (error == EINTR || error == ERESTART) interrupted = 1; break; } } if (error == 0) { error = so->so_error; so->so_error = 0; } SOCK_UNLOCK(so); bad: if (!interrupted) so->so_state &= ~SS_ISCONNECTING; if (error == ERESTART) error = EINTR; done1: fdrop(fp, td); return (error); } -/* ARGSUSED */ int -sys_connectat(td, uap) - struct thread *td; - struct connectat_args /* { - int fd; - int s; - caddr_t name; - int namelen; - } */ *uap; +sys_connectat(struct thread *td, struct connectat_args *uap) { struct sockaddr *sa; int error; error = getsockaddr(&sa, uap->name, uap->namelen); if (error == 0) { error = kern_connectat(td, uap->fd, uap->s, sa); free(sa, M_SONAME); } return (error); } int kern_socketpair(struct thread *td, int domain, int type, int protocol, int *rsv) { struct file *fp1, *fp2; struct socket *so1, *so2; int fd, error, oflag, fflag; AUDIT_ARG_SOCKET(domain, type, protocol); oflag = 0; fflag = 0; if ((type & SOCK_CLOEXEC) != 0) { type &= ~SOCK_CLOEXEC; oflag |= O_CLOEXEC; } if ((type & SOCK_NONBLOCK) != 0) { type &= ~SOCK_NONBLOCK; fflag |= FNONBLOCK; } #ifdef MAC /* We might want to have a separate check for socket pairs. 
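 *
 * (This function backs socketpair(2).  A typical userland call, for
 * reference:
 *
 *	int sv[2];
 *
 *	if (socketpair(PF_LOCAL, SOCK_STREAM, 0, sv) == -1)
 *		err(1, "socketpair");
 *
 * Both sockets are created below, then cross-connected with
 * soconnect2().)
 *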
*/ error = mac_socket_check_create(td->td_ucred, domain, type, protocol); if (error != 0) return (error); #endif error = socreate(domain, &so1, type, protocol, td->td_ucred, td); if (error != 0) return (error); error = socreate(domain, &so2, type, protocol, td->td_ucred, td); if (error != 0) goto free1; /* On success extra reference to `fp1' and 'fp2' is set by falloc. */ error = falloc(td, &fp1, &fd, oflag); if (error != 0) goto free2; rsv[0] = fd; fp1->f_data = so1; /* so1 already has ref count */ error = falloc(td, &fp2, &fd, oflag); if (error != 0) goto free3; fp2->f_data = so2; /* so2 already has ref count */ rsv[1] = fd; error = soconnect2(so1, so2); if (error != 0) goto free4; if (type == SOCK_DGRAM) { /* * Datagram socket connection is asymmetric. */ error = soconnect2(so2, so1); if (error != 0) goto free4; } finit(fp1, FREAD | FWRITE | fflag, DTYPE_SOCKET, fp1->f_data, &socketops); finit(fp2, FREAD | FWRITE | fflag, DTYPE_SOCKET, fp2->f_data, &socketops); if ((fflag & FNONBLOCK) != 0) { (void) fo_ioctl(fp1, FIONBIO, &fflag, td->td_ucred, td); (void) fo_ioctl(fp2, FIONBIO, &fflag, td->td_ucred, td); } fdrop(fp1, td); fdrop(fp2, td); return (0); free4: fdclose(td, fp2, rsv[1]); fdrop(fp2, td); free3: fdclose(td, fp1, rsv[0]); fdrop(fp1, td); free2: if (so2 != NULL) (void)soclose(so2); free1: if (so1 != NULL) (void)soclose(so1); return (error); } int sys_socketpair(struct thread *td, struct socketpair_args *uap) { int error, sv[2]; error = kern_socketpair(td, uap->domain, uap->type, uap->protocol, sv); if (error != 0) return (error); error = copyout(sv, uap->rsv, 2 * sizeof(int)); if (error != 0) { (void)kern_close(td, sv[0]); (void)kern_close(td, sv[1]); } return (error); } static int -sendit(td, s, mp, flags) - struct thread *td; - int s; - struct msghdr *mp; - int flags; +sendit(struct thread *td, int s, struct msghdr *mp, int flags) { struct mbuf *control; struct sockaddr *to; int error; #ifdef CAPABILITY_MODE if (IN_CAPABILITY_MODE(td) && (mp->msg_name != NULL)) return (ECAPMODE); #endif if (mp->msg_name != NULL) { error = getsockaddr(&to, mp->msg_name, mp->msg_namelen); if (error != 0) { to = NULL; goto bad; } mp->msg_name = to; } else { to = NULL; } if (mp->msg_control) { if (mp->msg_controllen < sizeof(struct cmsghdr) #ifdef COMPAT_OLDSOCK && mp->msg_flags != MSG_COMPAT #endif ) { error = EINVAL; goto bad; } error = sockargs(&control, mp->msg_control, mp->msg_controllen, MT_CONTROL); if (error != 0) goto bad; #ifdef COMPAT_OLDSOCK if (mp->msg_flags == MSG_COMPAT) { struct cmsghdr *cm; M_PREPEND(control, sizeof(*cm), M_WAITOK); cm = mtod(control, struct cmsghdr *); cm->cmsg_len = control->m_len; cm->cmsg_level = SOL_SOCKET; cm->cmsg_type = SCM_RIGHTS; } #endif } else { control = NULL; } error = kern_sendit(td, s, mp, flags, control, UIO_USERSPACE); bad: free(to, M_SONAME); return (error); } int -kern_sendit(td, s, mp, flags, control, segflg) - struct thread *td; - int s; - struct msghdr *mp; - int flags; - struct mbuf *control; - enum uio_seg segflg; +kern_sendit(struct thread *td, int s, struct msghdr *mp, int flags, + struct mbuf *control, enum uio_seg segflg) { struct file *fp; struct uio auio; struct iovec *iov; struct socket *so; cap_rights_t rights; #ifdef KTRACE struct uio *ktruio = NULL; #endif ssize_t len; int i, error; AUDIT_ARG_FD(s); cap_rights_init(&rights, CAP_SEND); if (mp->msg_name != NULL) { AUDIT_ARG_SOCKADDR(td, AT_FDCWD, mp->msg_name); cap_rights_set(&rights, CAP_CONNECT); } error = getsock_cap(td, s, &rights, &fp, NULL); if (error != 0) return (error); so 
= (struct socket *)fp->f_data; #ifdef KTRACE if (mp->msg_name != NULL && KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(mp->msg_name); #endif #ifdef MAC if (mp->msg_name != NULL) { error = mac_socket_check_connect(td->td_ucred, so, mp->msg_name); if (error != 0) goto bad; } error = mac_socket_check_send(td->td_ucred, so); if (error != 0) goto bad; #endif auio.uio_iov = mp->msg_iov; auio.uio_iovcnt = mp->msg_iovlen; auio.uio_segflg = segflg; auio.uio_rw = UIO_WRITE; auio.uio_td = td; auio.uio_offset = 0; /* XXX */ auio.uio_resid = 0; iov = mp->msg_iov; for (i = 0; i < mp->msg_iovlen; i++, iov++) { if ((auio.uio_resid += iov->iov_len) < 0) { error = EINVAL; goto bad; } } #ifdef KTRACE if (KTRPOINT(td, KTR_GENIO)) ktruio = cloneuio(&auio); #endif len = auio.uio_resid; error = sosend(so, mp->msg_name, &auio, 0, control, flags, td); if (error != 0) { if (auio.uio_resid != len && (error == ERESTART || error == EINTR || error == EWOULDBLOCK)) error = 0; /* Generation of SIGPIPE can be controlled per socket */ if (error == EPIPE && !(so->so_options & SO_NOSIGPIPE) && !(flags & MSG_NOSIGNAL)) { PROC_LOCK(td->td_proc); tdsignal(td, SIGPIPE); PROC_UNLOCK(td->td_proc); } } if (error == 0) td->td_retval[0] = len - auio.uio_resid; #ifdef KTRACE if (ktruio != NULL) { ktruio->uio_resid = td->td_retval[0]; ktrgenio(s, UIO_WRITE, ktruio, error); } #endif bad: fdrop(fp, td); return (error); } int -sys_sendto(td, uap) - struct thread *td; - struct sendto_args /* { - int s; - caddr_t buf; - size_t len; - int flags; - caddr_t to; - int tolen; - } */ *uap; +sys_sendto(struct thread *td, struct sendto_args *uap) { struct msghdr msg; struct iovec aiov; msg.msg_name = uap->to; msg.msg_namelen = uap->tolen; msg.msg_iov = &aiov; msg.msg_iovlen = 1; msg.msg_control = 0; #ifdef COMPAT_OLDSOCK msg.msg_flags = 0; #endif aiov.iov_base = uap->buf; aiov.iov_len = uap->len; return (sendit(td, uap->s, &msg, uap->flags)); } #ifdef COMPAT_OLDSOCK int -osend(td, uap) - struct thread *td; - struct osend_args /* { - int s; - caddr_t buf; - int len; - int flags; - } */ *uap; +osend(struct thread *td, struct osend_args *uap) { struct msghdr msg; struct iovec aiov; msg.msg_name = 0; msg.msg_namelen = 0; msg.msg_iov = &aiov; msg.msg_iovlen = 1; aiov.iov_base = uap->buf; aiov.iov_len = uap->len; msg.msg_control = 0; msg.msg_flags = 0; return (sendit(td, uap->s, &msg, uap->flags)); } int -osendmsg(td, uap) - struct thread *td; - struct osendmsg_args /* { - int s; - caddr_t msg; - int flags; - } */ *uap; +osendmsg(struct thread *td, struct osendmsg_args *uap) { struct msghdr msg; struct iovec *iov; int error; error = copyin(uap->msg, &msg, sizeof (struct omsghdr)); if (error != 0) return (error); error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov, EMSGSIZE); if (error != 0) return (error); msg.msg_iov = iov; msg.msg_flags = MSG_COMPAT; error = sendit(td, uap->s, &msg, uap->flags); free(iov, M_IOV); return (error); } #endif int -sys_sendmsg(td, uap) - struct thread *td; - struct sendmsg_args /* { - int s; - caddr_t msg; - int flags; - } */ *uap; +sys_sendmsg(struct thread *td, struct sendmsg_args *uap) { struct msghdr msg; struct iovec *iov; int error; error = copyin(uap->msg, &msg, sizeof (msg)); if (error != 0) return (error); error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov, EMSGSIZE); if (error != 0) return (error); msg.msg_iov = iov; #ifdef COMPAT_OLDSOCK msg.msg_flags = 0; #endif error = sendit(td, uap->s, &msg, uap->flags); free(iov, M_IOV); return (error); } int -kern_recvit(td, s, mp, fromseg, controlp) - struct thread *td; - int s; - 
struct msghdr *mp; - enum uio_seg fromseg; - struct mbuf **controlp; +kern_recvit(struct thread *td, int s, struct msghdr *mp, enum uio_seg fromseg, + struct mbuf **controlp) { struct uio auio; struct iovec *iov; struct mbuf *m, *control = NULL; caddr_t ctlbuf; struct file *fp; struct socket *so; struct sockaddr *fromsa = NULL; cap_rights_t rights; #ifdef KTRACE struct uio *ktruio = NULL; #endif ssize_t len; int error, i; if (controlp != NULL) *controlp = NULL; AUDIT_ARG_FD(s); error = getsock_cap(td, s, cap_rights_init(&rights, CAP_RECV), &fp, NULL); if (error != 0) return (error); so = fp->f_data; #ifdef MAC error = mac_socket_check_receive(td->td_ucred, so); if (error != 0) { fdrop(fp, td); return (error); } #endif auio.uio_iov = mp->msg_iov; auio.uio_iovcnt = mp->msg_iovlen; auio.uio_segflg = UIO_USERSPACE; auio.uio_rw = UIO_READ; auio.uio_td = td; auio.uio_offset = 0; /* XXX */ auio.uio_resid = 0; iov = mp->msg_iov; for (i = 0; i < mp->msg_iovlen; i++, iov++) { if ((auio.uio_resid += iov->iov_len) < 0) { fdrop(fp, td); return (EINVAL); } } #ifdef KTRACE if (KTRPOINT(td, KTR_GENIO)) ktruio = cloneuio(&auio); #endif len = auio.uio_resid; error = soreceive(so, &fromsa, &auio, NULL, (mp->msg_control || controlp) ? &control : NULL, &mp->msg_flags); if (error != 0) { if (auio.uio_resid != len && (error == ERESTART || error == EINTR || error == EWOULDBLOCK)) error = 0; } if (fromsa != NULL) AUDIT_ARG_SOCKADDR(td, AT_FDCWD, fromsa); #ifdef KTRACE if (ktruio != NULL) { ktruio->uio_resid = len - auio.uio_resid; ktrgenio(s, UIO_READ, ktruio, error); } #endif if (error != 0) goto out; td->td_retval[0] = len - auio.uio_resid; if (mp->msg_name) { len = mp->msg_namelen; if (len <= 0 || fromsa == NULL) len = 0; else { /* save sa_len before it is destroyed by MSG_COMPAT */ len = MIN(len, fromsa->sa_len); #ifdef COMPAT_OLDSOCK if (mp->msg_flags & MSG_COMPAT) ((struct osockaddr *)fromsa)->sa_family = fromsa->sa_family; #endif if (fromseg == UIO_USERSPACE) { error = copyout(fromsa, mp->msg_name, (unsigned)len); if (error != 0) goto out; } else bcopy(fromsa, mp->msg_name, len); } mp->msg_namelen = len; } if (mp->msg_control && controlp == NULL) { #ifdef COMPAT_OLDSOCK /* * We assume that old recvmsg calls won't receive access * rights and other control info, esp. as control info * is always optional and those options didn't exist in 4.3. * If we receive rights, trim the cmsghdr; anything else * is tossed. 
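 *
 * New-style consumers instead walk the control buffer with the
 * CMSG_*() macros; e.g. receiving a descriptor passed via SCM_RIGHTS
 * (userland sketch):
 *
 *	struct cmsghdr *cm;
 *	int fd;
 *
 *	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL;
 *	    cm = CMSG_NXTHDR(&msg, cm)) {
 *		if (cm->cmsg_level == SOL_SOCKET &&
 *		    cm->cmsg_type == SCM_RIGHTS)
 *			memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
 *	}
 *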
*/ if (control && mp->msg_flags & MSG_COMPAT) { if (mtod(control, struct cmsghdr *)->cmsg_level != SOL_SOCKET || mtod(control, struct cmsghdr *)->cmsg_type != SCM_RIGHTS) { mp->msg_controllen = 0; goto out; } control->m_len -= sizeof (struct cmsghdr); control->m_data += sizeof (struct cmsghdr); } #endif len = mp->msg_controllen; m = control; mp->msg_controllen = 0; ctlbuf = mp->msg_control; while (m && len > 0) { unsigned int tocopy; if (len >= m->m_len) tocopy = m->m_len; else { mp->msg_flags |= MSG_CTRUNC; tocopy = len; } if ((error = copyout(mtod(m, caddr_t), ctlbuf, tocopy)) != 0) goto out; ctlbuf += tocopy; len -= tocopy; m = m->m_next; } mp->msg_controllen = ctlbuf - (caddr_t)mp->msg_control; } out: fdrop(fp, td); #ifdef KTRACE if (fromsa && KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(fromsa); #endif free(fromsa, M_SONAME); if (error == 0 && controlp != NULL) *controlp = control; else if (control) m_freem(control); return (error); } static int -recvit(td, s, mp, namelenp) - struct thread *td; - int s; - struct msghdr *mp; - void *namelenp; +recvit(struct thread *td, int s, struct msghdr *mp, void *namelenp) { int error; error = kern_recvit(td, s, mp, UIO_USERSPACE, NULL); if (error != 0) return (error); if (namelenp != NULL) { error = copyout(&mp->msg_namelen, namelenp, sizeof (socklen_t)); #ifdef COMPAT_OLDSOCK if (mp->msg_flags & MSG_COMPAT) error = 0; /* old recvfrom didn't check */ #endif } return (error); } int -sys_recvfrom(td, uap) - struct thread *td; - struct recvfrom_args /* { - int s; - caddr_t buf; - size_t len; - int flags; - struct sockaddr * __restrict from; - socklen_t * __restrict fromlenaddr; - } */ *uap; +sys_recvfrom(struct thread *td, struct recvfrom_args *uap) { struct msghdr msg; struct iovec aiov; int error; if (uap->fromlenaddr) { error = copyin(uap->fromlenaddr, &msg.msg_namelen, sizeof (msg.msg_namelen)); if (error != 0) goto done2; } else { msg.msg_namelen = 0; } msg.msg_name = uap->from; msg.msg_iov = &aiov; msg.msg_iovlen = 1; aiov.iov_base = uap->buf; aiov.iov_len = uap->len; msg.msg_control = 0; msg.msg_flags = uap->flags; error = recvit(td, uap->s, &msg, uap->fromlenaddr); done2: return (error); } #ifdef COMPAT_OLDSOCK int -orecvfrom(td, uap) - struct thread *td; - struct recvfrom_args *uap; +orecvfrom(struct thread *td, struct recvfrom_args *uap) { uap->flags |= MSG_COMPAT; return (sys_recvfrom(td, uap)); } #endif #ifdef COMPAT_OLDSOCK int -orecv(td, uap) - struct thread *td; - struct orecv_args /* { - int s; - caddr_t buf; - int len; - int flags; - } */ *uap; +orecv(struct thread *td, struct orecv_args *uap) { struct msghdr msg; struct iovec aiov; msg.msg_name = 0; msg.msg_namelen = 0; msg.msg_iov = &aiov; msg.msg_iovlen = 1; aiov.iov_base = uap->buf; aiov.iov_len = uap->len; msg.msg_control = 0; msg.msg_flags = uap->flags; return (recvit(td, uap->s, &msg, NULL)); } /* * Old recvmsg. This code takes advantage of the fact that the old msghdr * overlays the new one, missing only the flags, and with the (old) access * rights where the control fields are now. 
*/ int -orecvmsg(td, uap) - struct thread *td; - struct orecvmsg_args /* { - int s; - struct omsghdr *msg; - int flags; - } */ *uap; +orecvmsg(struct thread *td, struct orecvmsg_args *uap) { struct msghdr msg; struct iovec *iov; int error; error = copyin(uap->msg, &msg, sizeof (struct omsghdr)); if (error != 0) return (error); error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov, EMSGSIZE); if (error != 0) return (error); msg.msg_flags = uap->flags | MSG_COMPAT; msg.msg_iov = iov; error = recvit(td, uap->s, &msg, &uap->msg->msg_namelen); if (msg.msg_controllen && error == 0) error = copyout(&msg.msg_controllen, &uap->msg->msg_accrightslen, sizeof (int)); free(iov, M_IOV); return (error); } #endif int -sys_recvmsg(td, uap) - struct thread *td; - struct recvmsg_args /* { - int s; - struct msghdr *msg; - int flags; - } */ *uap; +sys_recvmsg(struct thread *td, struct recvmsg_args *uap) { struct msghdr msg; struct iovec *uiov, *iov; int error; error = copyin(uap->msg, &msg, sizeof (msg)); if (error != 0) return (error); error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov, EMSGSIZE); if (error != 0) return (error); msg.msg_flags = uap->flags; #ifdef COMPAT_OLDSOCK msg.msg_flags &= ~MSG_COMPAT; #endif uiov = msg.msg_iov; msg.msg_iov = iov; error = recvit(td, uap->s, &msg, NULL); if (error == 0) { msg.msg_iov = uiov; error = copyout(&msg, uap->msg, sizeof(msg)); } free(iov, M_IOV); return (error); } -/* ARGSUSED */ int -sys_shutdown(td, uap) - struct thread *td; - struct shutdown_args /* { - int s; - int how; - } */ *uap; +sys_shutdown(struct thread *td, struct shutdown_args *uap) { struct socket *so; struct file *fp; cap_rights_t rights; int error; AUDIT_ARG_FD(uap->s); error = getsock_cap(td, uap->s, cap_rights_init(&rights, CAP_SHUTDOWN), &fp, NULL); if (error == 0) { so = fp->f_data; error = soshutdown(so, uap->how); /* * Previous versions did not return ENOTCONN, but 0 in * case the socket was not connected. Some important * programs like syslogd up to r279016, 2015-02-19, * still depend on this behavior. 
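 *
 * A userland caller that must work on both old and new kernels can
 * simply tolerate either result (sketch):
 *
 *	if (shutdown(s, SHUT_RDWR) == -1 && errno != ENOTCONN)
 *		err(1, "shutdown");
 *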
*/ if (error == ENOTCONN && td->td_proc->p_osrel < P_OSREL_SHUTDOWN_ENOTCONN) error = 0; fdrop(fp, td); } return (error); } -/* ARGSUSED */ int -sys_setsockopt(td, uap) - struct thread *td; - struct setsockopt_args /* { - int s; - int level; - int name; - caddr_t val; - int valsize; - } */ *uap; +sys_setsockopt(struct thread *td, struct setsockopt_args *uap) { return (kern_setsockopt(td, uap->s, uap->level, uap->name, uap->val, UIO_USERSPACE, uap->valsize)); } int -kern_setsockopt(td, s, level, name, val, valseg, valsize) - struct thread *td; - int s; - int level; - int name; - void *val; - enum uio_seg valseg; - socklen_t valsize; +kern_setsockopt(struct thread *td, int s, int level, int name, void *val, + enum uio_seg valseg, socklen_t valsize) { struct socket *so; struct file *fp; struct sockopt sopt; cap_rights_t rights; int error; if (val == NULL && valsize != 0) return (EFAULT); if ((int)valsize < 0) return (EINVAL); sopt.sopt_dir = SOPT_SET; sopt.sopt_level = level; sopt.sopt_name = name; sopt.sopt_val = val; sopt.sopt_valsize = valsize; switch (valseg) { case UIO_USERSPACE: sopt.sopt_td = td; break; case UIO_SYSSPACE: sopt.sopt_td = NULL; break; default: panic("kern_setsockopt called with bad valseg"); } AUDIT_ARG_FD(s); error = getsock_cap(td, s, cap_rights_init(&rights, CAP_SETSOCKOPT), &fp, NULL); if (error == 0) { so = fp->f_data; error = sosetopt(so, &sopt); fdrop(fp, td); } return(error); } -/* ARGSUSED */ int -sys_getsockopt(td, uap) - struct thread *td; - struct getsockopt_args /* { - int s; - int level; - int name; - void * __restrict val; - socklen_t * __restrict avalsize; - } */ *uap; +sys_getsockopt(struct thread *td, struct getsockopt_args *uap) { socklen_t valsize; int error; if (uap->val) { error = copyin(uap->avalsize, &valsize, sizeof (valsize)); if (error != 0) return (error); } error = kern_getsockopt(td, uap->s, uap->level, uap->name, uap->val, UIO_USERSPACE, &valsize); if (error == 0) error = copyout(&valsize, uap->avalsize, sizeof (valsize)); return (error); } /* * Kernel version of getsockopt. * optval can be a userland or kernel pointer. optlen is always a kernel * pointer. */ int -kern_getsockopt(td, s, level, name, val, valseg, valsize) - struct thread *td; - int s; - int level; - int name; - void *val; - enum uio_seg valseg; - socklen_t *valsize; +kern_getsockopt(struct thread *td, int s, int level, int name, void *val, + enum uio_seg valseg, socklen_t *valsize) { struct socket *so; struct file *fp; struct sockopt sopt; cap_rights_t rights; int error; if (val == NULL) *valsize = 0; if ((int)*valsize < 0) return (EINVAL); sopt.sopt_dir = SOPT_GET; sopt.sopt_level = level; sopt.sopt_name = name; sopt.sopt_val = val; sopt.sopt_valsize = (size_t)*valsize; /* checked non-negative above */ switch (valseg) { case UIO_USERSPACE: sopt.sopt_td = td; break; case UIO_SYSSPACE: sopt.sopt_td = NULL; break; default: panic("kern_getsockopt called with bad valseg"); } AUDIT_ARG_FD(s); error = getsock_cap(td, s, cap_rights_init(&rights, CAP_GETSOCKOPT), &fp, NULL); if (error == 0) { so = fp->f_data; error = sogetopt(so, &sopt); *valsize = sopt.sopt_valsize; fdrop(fp, td); } return (error); } /* * getsockname1() - Get socket name.
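 * Typical userland use of getsockname(2), which lands here, for
 * reference (sketch):
 *
 *	struct sockaddr_storage ss;
 *	socklen_t len = sizeof(ss);
 *
 *	if (getsockname(s, (struct sockaddr *)&ss, &len) == -1)
 *		err(1, "getsockname");
 *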
*/ -/* ARGSUSED */ static int -getsockname1(td, uap, compat) - struct thread *td; - struct getsockname_args /* { - int fdes; - struct sockaddr * __restrict asa; - socklen_t * __restrict alen; - } */ *uap; - int compat; +getsockname1(struct thread *td, struct getsockname_args *uap, int compat) { struct sockaddr *sa; socklen_t len; int error; error = copyin(uap->alen, &len, sizeof(len)); if (error != 0) return (error); error = kern_getsockname(td, uap->fdes, &sa, &len); if (error != 0) return (error); if (len != 0) { #ifdef COMPAT_OLDSOCK if (compat) ((struct osockaddr *)sa)->sa_family = sa->sa_family; #endif error = copyout(sa, uap->asa, (u_int)len); } free(sa, M_SONAME); if (error == 0) error = copyout(&len, uap->alen, sizeof(len)); return (error); } int kern_getsockname(struct thread *td, int fd, struct sockaddr **sa, socklen_t *alen) { struct socket *so; struct file *fp; cap_rights_t rights; socklen_t len; int error; AUDIT_ARG_FD(fd); error = getsock_cap(td, fd, cap_rights_init(&rights, CAP_GETSOCKNAME), &fp, NULL); if (error != 0) return (error); so = fp->f_data; *sa = NULL; CURVNET_SET(so->so_vnet); error = (*so->so_proto->pr_usrreqs->pru_sockaddr)(so, sa); CURVNET_RESTORE(); if (error != 0) goto bad; if (*sa == NULL) len = 0; else len = MIN(*alen, (*sa)->sa_len); *alen = len; #ifdef KTRACE if (KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(*sa); #endif bad: fdrop(fp, td); if (error != 0 && *sa != NULL) { free(*sa, M_SONAME); *sa = NULL; } return (error); } int -sys_getsockname(td, uap) - struct thread *td; - struct getsockname_args *uap; +sys_getsockname(struct thread *td, struct getsockname_args *uap) { return (getsockname1(td, uap, 0)); } #ifdef COMPAT_OLDSOCK int -ogetsockname(td, uap) - struct thread *td; - struct getsockname_args *uap; +ogetsockname(struct thread *td, struct getsockname_args *uap) { return (getsockname1(td, uap, 1)); } #endif /* COMPAT_OLDSOCK */ /* * getpeername1() - Get name of peer for connected socket. 
*/ -/* ARGSUSED */ static int -getpeername1(td, uap, compat) - struct thread *td; - struct getpeername_args /* { - int fdes; - struct sockaddr * __restrict asa; - socklen_t * __restrict alen; - } */ *uap; - int compat; +getpeername1(struct thread *td, struct getpeername_args *uap, int compat) { struct sockaddr *sa; socklen_t len; int error; error = copyin(uap->alen, &len, sizeof (len)); if (error != 0) return (error); error = kern_getpeername(td, uap->fdes, &sa, &len); if (error != 0) return (error); if (len != 0) { #ifdef COMPAT_OLDSOCK if (compat) ((struct osockaddr *)sa)->sa_family = sa->sa_family; #endif error = copyout(sa, uap->asa, (u_int)len); } free(sa, M_SONAME); if (error == 0) error = copyout(&len, uap->alen, sizeof(len)); return (error); } int kern_getpeername(struct thread *td, int fd, struct sockaddr **sa, socklen_t *alen) { struct socket *so; struct file *fp; cap_rights_t rights; socklen_t len; int error; AUDIT_ARG_FD(fd); error = getsock_cap(td, fd, cap_rights_init(&rights, CAP_GETPEERNAME), &fp, NULL); if (error != 0) return (error); so = fp->f_data; if ((so->so_state & (SS_ISCONNECTED|SS_ISCONFIRMING)) == 0) { error = ENOTCONN; goto done; } *sa = NULL; CURVNET_SET(so->so_vnet); error = (*so->so_proto->pr_usrreqs->pru_peeraddr)(so, sa); CURVNET_RESTORE(); if (error != 0) goto bad; if (*sa == NULL) len = 0; else len = MIN(*alen, (*sa)->sa_len); *alen = len; #ifdef KTRACE if (KTRPOINT(td, KTR_STRUCT)) ktrsockaddr(*sa); #endif bad: if (error != 0 && *sa != NULL) { free(*sa, M_SONAME); *sa = NULL; } done: fdrop(fp, td); return (error); } int -sys_getpeername(td, uap) - struct thread *td; - struct getpeername_args *uap; +sys_getpeername(struct thread *td, struct getpeername_args *uap) { return (getpeername1(td, uap, 0)); } #ifdef COMPAT_OLDSOCK int -ogetpeername(td, uap) - struct thread *td; - struct ogetpeername_args *uap; +ogetpeername(struct thread *td, struct ogetpeername_args *uap) { /* XXX uap should have type `getpeername_args *' to begin with. */ return (getpeername1(td, (struct getpeername_args *)uap, 1)); } #endif /* COMPAT_OLDSOCK */ static int sockargs(struct mbuf **mp, char *buf, socklen_t buflen, int type) { struct sockaddr *sa; struct mbuf *m; int error; if (buflen > MLEN) { #ifdef COMPAT_OLDSOCK if (type == MT_SONAME && buflen <= 112) buflen = MLEN; /* unix domain compat. 
hack */ else #endif if (buflen > MCLBYTES) return (EINVAL); } m = m_get2(buflen, M_WAITOK, type, 0); m->m_len = buflen; error = copyin(buf, mtod(m, void *), buflen); if (error != 0) (void) m_free(m); else { *mp = m; if (type == MT_SONAME) { sa = mtod(m, struct sockaddr *); #if defined(COMPAT_OLDSOCK) && BYTE_ORDER != BIG_ENDIAN if (sa->sa_family == 0 && sa->sa_len < AF_MAX) sa->sa_family = sa->sa_len; #endif sa->sa_len = buflen; } } return (error); } int -getsockaddr(namp, uaddr, len) - struct sockaddr **namp; - caddr_t uaddr; - size_t len; +getsockaddr(struct sockaddr **namp, caddr_t uaddr, size_t len) { struct sockaddr *sa; int error; if (len > SOCK_MAXADDRLEN) return (ENAMETOOLONG); if (len < offsetof(struct sockaddr, sa_data[0])) return (EINVAL); sa = malloc(len, M_SONAME, M_WAITOK); error = copyin(uaddr, sa, len); if (error != 0) { free(sa, M_SONAME); } else { #if defined(COMPAT_OLDSOCK) && BYTE_ORDER != BIG_ENDIAN if (sa->sa_family == 0 && sa->sa_len < AF_MAX) sa->sa_family = sa->sa_len; #endif sa->sa_len = len; *namp = sa; } return (error); } Index: projects/clang390-import/sys/kern/vfs_cache.c =================================================================== --- projects/clang390-import/sys/kern/vfs_cache.c (revision 305686) +++ projects/clang390-import/sys/kern/vfs_cache.c (revision 305687) @@ -1,1591 +1,1739 @@ /*- * Copyright (c) 1989, 1993, 1995 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Poul-Henning Kamp of the FreeBSD Project. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * @(#)vfs_cache.c 8.5 (Berkeley) 3/22/95 */ #include __FBSDID("$FreeBSD$"); #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #ifdef KTRACE #include #endif #include SDT_PROVIDER_DECLARE(vfs); SDT_PROBE_DEFINE3(vfs, namecache, enter, done, "struct vnode *", "char *", "struct vnode *"); SDT_PROBE_DEFINE2(vfs, namecache, enter_negative, done, "struct vnode *", "char *"); SDT_PROBE_DEFINE1(vfs, namecache, fullpath, entry, "struct vnode *"); SDT_PROBE_DEFINE3(vfs, namecache, fullpath, hit, "struct vnode *", "char *", "struct vnode *"); SDT_PROBE_DEFINE1(vfs, namecache, fullpath, miss, "struct vnode *"); SDT_PROBE_DEFINE3(vfs, namecache, fullpath, return, "int", "struct vnode *", "char *"); SDT_PROBE_DEFINE3(vfs, namecache, lookup, hit, "struct vnode *", "char *", "struct vnode *"); SDT_PROBE_DEFINE2(vfs, namecache, lookup, hit__negative, "struct vnode *", "char *"); SDT_PROBE_DEFINE2(vfs, namecache, lookup, miss, "struct vnode *", "char *"); SDT_PROBE_DEFINE1(vfs, namecache, purge, done, "struct vnode *"); SDT_PROBE_DEFINE1(vfs, namecache, purge_negative, done, "struct vnode *"); SDT_PROBE_DEFINE1(vfs, namecache, purgevfs, done, "struct mount *"); SDT_PROBE_DEFINE3(vfs, namecache, zap, done, "struct vnode *", "char *", "struct vnode *"); SDT_PROBE_DEFINE2(vfs, namecache, zap_negative, done, "struct vnode *", "char *"); /* * This structure describes the elements in the cache of recent * names looked up by namei. */ struct namecache { LIST_ENTRY(namecache) nc_hash; /* hash chain */ LIST_ENTRY(namecache) nc_src; /* source vnode list */ TAILQ_ENTRY(namecache) nc_dst; /* destination vnode list */ struct vnode *nc_dvp; /* vnode of parent of name */ struct vnode *nc_vp; /* vnode the name refers to */ u_char nc_flag; /* flag bits */ u_char nc_nlen; /* length of name */ char nc_name[0]; /* segment name + nul */ }; /* * struct namecache_ts repeats struct namecache layout up to the * nc_nlen member. * struct namecache_ts is used in place of struct namecache when time(s) need * to be stored. The nc_dotdottime field is used when a cache entry is mapping * both a non-dotdot directory name plus dotdot for the directory's * parent. */ struct namecache_ts { LIST_ENTRY(namecache) nc_hash; /* hash chain */ LIST_ENTRY(namecache) nc_src; /* source vnode list */ TAILQ_ENTRY(namecache) nc_dst; /* destination vnode list */ struct vnode *nc_dvp; /* vnode of parent of name */ struct vnode *nc_vp; /* vnode the name refers to */ u_char nc_flag; /* flag bits */ u_char nc_nlen; /* length of name */ struct timespec nc_time; /* timespec provided by fs */ struct timespec nc_dotdottime; /* dotdot timespec provided by fs */ int nc_ticks; /* ticks value when entry was added */ char nc_name[0]; /* segment name + nul */ }; /* * Flags in namecache.nc_flag */ #define NCF_WHITE 0x01 #define NCF_ISDOTDOT 0x02 #define NCF_TS 0x04 #define NCF_DTS 0x08 #define NCF_DVDROP 0x10 /* * Name caching works as follows: * * Names found by directory scans are retained in a cache * for future reference. It is managed LRU, so frequently * used names will hang around. Cache is indexed by hash value * obtained from (vp, name) where vp refers to the directory * containing name. * * If it is a "negative" entry, (i.e. for a name that is known NOT to * exist) the vnode pointer will be NULL. 
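struct namecache_ts above deliberately repeats the struct namecache layout up to nc_nlen, so the rest of the code can pass a struct namecache pointer everywhere and downcast only when NCF_TS says the entry carries timestamps. A standalone sketch of that common-prefix pattern with hypothetical types (the cast relies on the matching prefix layout, exactly as the kernel code does):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define F_TS	0x04		/* entry carries timestamp data */

struct entry {
	unsigned char flag;
	unsigned char nlen;
	char name[];		/* segment name + NUL */
};

struct entry_ts {
	unsigned char flag;	/* prefix must match struct entry */
	unsigned char nlen;
	long ticks;		/* extra data only F_TS entries have */
	char name[];		/* name sits at a different offset */
};

/* Pick the correct name offset, as nc_get_name() does above. */
static char *
entry_name(struct entry *e)
{

	if ((e->flag & F_TS) == 0)
		return (e->name);
	return (((struct entry_ts *)e)->name);
}

int
main(void)
{
	struct entry_ts *ts;

	ts = malloc(sizeof(*ts) + sizeof("etc"));
	ts->flag = F_TS;
	ts->nlen = 3;
	ts->ticks = 42;
	strcpy(ts->name, "etc");
	printf("%s\n", entry_name((struct entry *)ts));
	free(ts);
	return (0);
}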
* * Upon reaching the last segment of a path, if the reference * is for DELETE, or NOCACHE is set (rewrite), and the * name is located in the cache, it will be dropped. + * + * These locks are used (in the order in which they can be taken): + * NAME TYPE ROLE + * cache_lock rwlock global, needed for all modifications + * bucketlock rwlock for access to given hash bucket + * ncneg_mtx mtx negative entry LRU management + * + * A name -> vnode lookup can be safely performed by either locking cache_lock + * or the relevant hash bucket. + * + * ".." and vnode -> name lookups require cache_lock. + * + * Modifications require both cache_lock and relevant bucketlock taken for + * writing. + * + * Negative entry LRU management requires ncneg_mtx taken on top of either + * cache_lock or bucketlock. */ /* * Structures associated with name caching. */ #define NCHHASH(hash) \ (&nchashtbl[(hash) & nchash]) static LIST_HEAD(nchashhead, namecache) *nchashtbl; /* Hash Table */ static TAILQ_HEAD(, namecache) ncneg; /* Hash Table */ static u_long nchash; /* size of hash table */ SYSCTL_ULONG(_debug, OID_AUTO, nchash, CTLFLAG_RD, &nchash, 0, "Size of namecache hash table"); static u_long ncnegfactor = 16; /* ratio of negative entries */ SYSCTL_ULONG(_vfs, OID_AUTO, ncnegfactor, CTLFLAG_RW, &ncnegfactor, 0, "Ratio of negative namecache entries"); static u_long numneg; /* number of negative entries allocated */ SYSCTL_ULONG(_debug, OID_AUTO, numneg, CTLFLAG_RD, &numneg, 0, "Number of negative entries in namecache"); static u_long numcache; /* number of cache entries allocated */ SYSCTL_ULONG(_debug, OID_AUTO, numcache, CTLFLAG_RD, &numcache, 0, "Number of namecache entries"); static u_long numcachehv; /* number of cache entries with vnodes held */ SYSCTL_ULONG(_debug, OID_AUTO, numcachehv, CTLFLAG_RD, &numcachehv, 0, "Number of namecache entries with vnodes held"); u_int ncsizefactor = 2; SYSCTL_UINT(_vfs, OID_AUTO, ncsizefactor, CTLFLAG_RW, &ncsizefactor, 0, "Size factor for namecache"); struct nchstats nchstats; /* cache effectiveness statistics */ static struct rwlock cache_lock; -RW_SYSINIT(vfscache, &cache_lock, "Name Cache"); +RW_SYSINIT(vfscache, &cache_lock, "ncglobal"); +#define CACHE_TRY_WLOCK() rw_try_wlock(&cache_lock) #define CACHE_UPGRADE_LOCK() rw_try_upgrade(&cache_lock) #define CACHE_RLOCK() rw_rlock(&cache_lock) #define CACHE_RUNLOCK() rw_runlock(&cache_lock) #define CACHE_WLOCK() rw_wlock(&cache_lock) #define CACHE_WUNLOCK() rw_wunlock(&cache_lock) static struct mtx_padalign ncneg_mtx; -MTX_SYSINIT(vfscache_neg, &ncneg_mtx, "Name Cache neg", MTX_DEF); +MTX_SYSINIT(vfscache_neg, &ncneg_mtx, "ncneg", MTX_DEF); +static u_int numbucketlocks; +static struct rwlock_padalign *bucketlocks; +#define HASH2BUCKETLOCK(hash) \ + ((struct rwlock *)(&bucketlocks[((hash) % numbucketlocks)])) + /* * UMA zones for the VFS cache. * * The small cache is used for entries with short names, which are the * most common. The large cache is used for entries which are too big to * fit in the small cache. 
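The per-bucket rwlocks introduced here are lock striping: the name hash selects both the chain and the lock guarding it, so unrelated lookups no longer serialize on one global lock. A userspace sketch of the HASH2BUCKETLOCK() idea using pthreads; the stripe count is an arbitrary power of two chosen for the example:

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>

#define NBUCKETLOCKS	64	/* assumed; must be > 0 */

static pthread_rwlock_t bucketlocks[NBUCKETLOCKS];

/* Same mapping as HASH2BUCKETLOCK(): hash modulo the stripe count. */
static pthread_rwlock_t *
hash2bucketlock(uint32_t hash)
{

	return (&bucketlocks[hash % NBUCKETLOCKS]);
}

int
main(void)
{
	uint32_t hash = 0xdeadbeefU;
	pthread_rwlock_t *bl;
	int i;

	for (i = 0; i < NBUCKETLOCKS; i++)
		pthread_rwlock_init(&bucketlocks[i], NULL);

	/* A lookup read-locks only the stripe its hash selects. */
	bl = hash2bucketlock(hash);
	pthread_rwlock_rdlock(bl);
	/* ... walk the hash chain here ... */
	pthread_rwlock_unlock(bl);

	/*
	 * Whole-table operations write-lock every stripe in index
	 * order, mirroring cache_lock_all_buckets() in this diff.
	 */
	for (i = 0; i < NBUCKETLOCKS; i++)
		pthread_rwlock_wrlock(&bucketlocks[i]);
	for (i = 0; i < NBUCKETLOCKS; i++)
		pthread_rwlock_unlock(&bucketlocks[i]);

	printf("hash %#x -> stripe %u\n", hash, hash % NBUCKETLOCKS);
	return (0);
}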
*/ static uma_zone_t cache_zone_small; static uma_zone_t cache_zone_small_ts; static uma_zone_t cache_zone_large; static uma_zone_t cache_zone_large_ts; #define CACHE_PATH_CUTOFF 35 static struct namecache * cache_alloc(int len, int ts) { if (len > CACHE_PATH_CUTOFF) { if (ts) return (uma_zalloc(cache_zone_large_ts, M_WAITOK)); else return (uma_zalloc(cache_zone_large, M_WAITOK)); } if (ts) return (uma_zalloc(cache_zone_small_ts, M_WAITOK)); else return (uma_zalloc(cache_zone_small, M_WAITOK)); } static void cache_free(struct namecache *ncp) { int ts; if (ncp == NULL) return; ts = ncp->nc_flag & NCF_TS; if ((ncp->nc_flag & NCF_DVDROP) != 0) vdrop(ncp->nc_dvp); if (ncp->nc_nlen <= CACHE_PATH_CUTOFF) { if (ts) uma_zfree(cache_zone_small_ts, ncp); else uma_zfree(cache_zone_small, ncp); } else if (ts) uma_zfree(cache_zone_large_ts, ncp); else uma_zfree(cache_zone_large, ncp); } static char * nc_get_name(struct namecache *ncp) { struct namecache_ts *ncp_ts; if ((ncp->nc_flag & NCF_TS) == 0) return (ncp->nc_name); ncp_ts = (struct namecache_ts *)ncp; return (ncp_ts->nc_name); } static void cache_out_ts(struct namecache *ncp, struct timespec *tsp, int *ticksp) { KASSERT((ncp->nc_flag & NCF_TS) != 0 || (tsp == NULL && ticksp == NULL), ("No NCF_TS")); if (tsp != NULL) *tsp = ((struct namecache_ts *)ncp)->nc_time; if (ticksp != NULL) *ticksp = ((struct namecache_ts *)ncp)->nc_ticks; } static int doingcache = 1; /* 1 => enable the cache */ SYSCTL_INT(_debug, OID_AUTO, vfscache, CTLFLAG_RW, &doingcache, 0, "VFS namecache enabled"); /* Export size information to userland */ SYSCTL_INT(_debug_sizeof, OID_AUTO, namecache, CTLFLAG_RD, SYSCTL_NULL_INT_PTR, sizeof(struct namecache), "sizeof(struct namecache)"); /* * The new name cache statistics */ static SYSCTL_NODE(_vfs, OID_AUTO, cache, CTLFLAG_RW, 0, "Name cache statistics"); #define STATNODE_ULONG(name, descr) \ SYSCTL_ULONG(_vfs_cache, OID_AUTO, name, CTLFLAG_RD, &name, 0, descr); #define STATNODE_COUNTER(name, descr) \ static counter_u64_t name; \ SYSCTL_COUNTER_U64(_vfs_cache, OID_AUTO, name, CTLFLAG_RD, &name, descr); STATNODE_ULONG(numneg, "Number of negative cache entries"); STATNODE_ULONG(numcache, "Number of cache entries"); STATNODE_COUNTER(numcalls, "Number of cache lookups"); STATNODE_COUNTER(dothits, "Number of '.' hits"); STATNODE_COUNTER(dotdothits, "Number of '..' hits"); STATNODE_COUNTER(numchecks, "Number of checks in lookup"); STATNODE_COUNTER(nummiss, "Number of cache misses"); STATNODE_COUNTER(nummisszap, "Number of cache misses we do not want to cache"); STATNODE_COUNTER(numposzaps, "Number of cache hits (positive) we do not want to cache"); STATNODE_COUNTER(numposhits, "Number of cache hits (positive)"); STATNODE_COUNTER(numnegzaps, "Number of cache hits (negative) we do not want to cache"); STATNODE_COUNTER(numneghits, "Number of cache hits (negative)"); /* These count for kern___getcwd(), too. 
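cache_alloc() picks one of four UMA zones based on name length and whether timestamps are wanted, so the common short-name case pays for CACHE_PATH_CUTOFF name bytes rather than NAME_MAX. A userspace sketch of the size-class decision, with malloc standing in for the UMA zones; the cutoff is the value from this file, the large size is an assumption:

#include <stdlib.h>
#include <string.h>

#define PATH_CUTOFF	35	/* CACHE_PATH_CUTOFF */
#define NAME_MAX_LEN	255	/* stand-in for NAME_MAX; assumed */

struct entry {
	unsigned char nlen;
	char name[];
};

/*
 * Allocate from the "small" or "large" size class by name length,
 * as cache_alloc() does with its zones.
 */
static struct entry *
entry_alloc(size_t len)
{
	size_t size;

	size = sizeof(struct entry) +
	    (len <= PATH_CUTOFF ? PATH_CUTOFF : NAME_MAX_LEN) + 1;
	return (malloc(size));
}

int
main(void)
{
	struct entry *e;

	e = entry_alloc(3);
	e->nlen = 3;
	memcpy(e->name, "usr", 4);
	free(e);
	return (0);
}

Fixed size classes keep the allocator's free lists hot; the cost is some internal fragmentation for names shorter than the cutoff.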
*/ STATNODE_COUNTER(numfullpathcalls, "Number of fullpath search calls"); STATNODE_COUNTER(numfullpathfail1, "Number of fullpath search errors (ENOTDIR)"); STATNODE_COUNTER(numfullpathfail2, "Number of fullpath search errors (VOP_VPTOCNP failures)"); STATNODE_COUNTER(numfullpathfail4, "Number of fullpath search errors (ENOMEM)"); STATNODE_COUNTER(numfullpathfound, "Number of successful fullpath calls"); static long numupgrades; STATNODE_ULONG(numupgrades, "Number of updates of the cache after lookup (write lock + retry)"); +static long zap_and_exit_bucket_fail; STATNODE_ULONG(zap_and_exit_bucket_fail, + "Number of times bucketlocked zap_and_exit case failed to writelock"); static void cache_zap(struct namecache *ncp); static int vn_vptocnp_locked(struct vnode **vp, struct ucred *cred, char *buf, u_int *buflen); static int vn_fullpath1(struct thread *td, struct vnode *vp, struct vnode *rdir, char *buf, char **retbuf, u_int buflen); static MALLOC_DEFINE(M_VFSCACHE, "vfscache", "VFS name cache entries"); static uint32_t cache_get_hash(char *name, u_char len, struct vnode *dvp) { uint32_t hash; hash = fnv_32_buf(name, len, FNV1_32_INIT); hash = fnv_32_buf(&dvp, sizeof(dvp), hash); return (hash); } +#ifdef INVARIANTS +static void +cache_assert_bucket_locked(struct namecache *ncp, int mode) +{ + struct rwlock *bucketlock; + uint32_t hash; + + hash = cache_get_hash(nc_get_name(ncp), ncp->nc_nlen, ncp->nc_dvp); + bucketlock = HASH2BUCKETLOCK(hash); + rw_assert(bucketlock, mode); +} +#else +#define cache_assert_bucket_locked(x, y) do { } while (0) +#endif + +static void +cache_lock_all_buckets(void) +{ + u_int i; + + for (i = 0; i < numbucketlocks; i++) + rw_wlock(&bucketlocks[i]); +} + +static void +cache_unlock_all_buckets(void) +{ + u_int i; + + for (i = 0; i < numbucketlocks; i++) + rw_wunlock(&bucketlocks[i]); +} + static int sysctl_nchstats(SYSCTL_HANDLER_ARGS) { struct nchstats snap; if (req->oldptr == NULL) return (SYSCTL_OUT(req, 0, sizeof(snap))); snap = nchstats; snap.ncs_goodhits = counter_u64_fetch(numposhits); snap.ncs_neghits = counter_u64_fetch(numneghits); snap.ncs_badhits = counter_u64_fetch(numposzaps) + counter_u64_fetch(numnegzaps); snap.ncs_miss = counter_u64_fetch(nummisszap) + counter_u64_fetch(nummiss); return (SYSCTL_OUT(req, &snap, sizeof(snap))); } SYSCTL_PROC(_vfs_cache, OID_AUTO, nchstats, CTLTYPE_OPAQUE | CTLFLAG_RD | CTLFLAG_MPSAFE, 0, 0, sysctl_nchstats, "LU", "VFS cache effectiveness statistics"); #ifdef DIAGNOSTIC /* * Grab an atomic snapshot of the name cache hash chain lengths */ static SYSCTL_NODE(_debug, OID_AUTO, hashstat, CTLFLAG_RW, NULL, "hash table stats"); static int sysctl_debug_hashstat_rawnchash(SYSCTL_HANDLER_ARGS) { struct nchashhead *ncpp; struct namecache *ncp; int i, error, n_nchash, *cntbuf; retry: n_nchash = nchash + 1; /* nchash is max index, not count */ if (req->oldptr == NULL) return SYSCTL_OUT(req, 0, n_nchash * sizeof(int)); cntbuf = malloc(n_nchash * sizeof(int), M_TEMP, M_ZERO | M_WAITOK); CACHE_RLOCK(); if (n_nchash != nchash + 1) { CACHE_RUNLOCK(); free(cntbuf, M_TEMP); goto retry; } /* Scan hash tables counting entries */ for (ncpp = nchashtbl, i = 0; i < n_nchash; ncpp++, i++) LIST_FOREACH(ncp, ncpp, nc_hash) cntbuf[i]++; CACHE_RUNLOCK(); for (error = 0, i = 0; i < n_nchash; i++) if ((error = SYSCTL_OUT(req, &cntbuf[i], sizeof(int))) != 0) break; free(cntbuf, M_TEMP); return (error); } SYSCTL_PROC(_debug_hashstat, OID_AUTO, rawnchash, CTLTYPE_INT|CTLFLAG_RD| CTLFLAG_MPSAFE, 0, 0, sysctl_debug_hashstat_rawnchash, "S,int", "nchash 
chain lengths"); static int sysctl_debug_hashstat_nchash(SYSCTL_HANDLER_ARGS) { int error; struct nchashhead *ncpp; struct namecache *ncp; int n_nchash; int count, maxlength, used, pct; if (!req->oldptr) return SYSCTL_OUT(req, 0, 4 * sizeof(int)); CACHE_RLOCK(); n_nchash = nchash + 1; /* nchash is max index, not count */ used = 0; maxlength = 0; /* Scan hash tables for applicable entries */ for (ncpp = nchashtbl; n_nchash > 0; n_nchash--, ncpp++) { count = 0; LIST_FOREACH(ncp, ncpp, nc_hash) { count++; } if (count) used++; if (maxlength < count) maxlength = count; } n_nchash = nchash + 1; CACHE_RUNLOCK(); pct = (used * 100) / (n_nchash / 100); error = SYSCTL_OUT(req, &n_nchash, sizeof(n_nchash)); if (error) return (error); error = SYSCTL_OUT(req, &used, sizeof(used)); if (error) return (error); error = SYSCTL_OUT(req, &maxlength, sizeof(maxlength)); if (error) return (error); error = SYSCTL_OUT(req, &pct, sizeof(pct)); if (error) return (error); return (0); } SYSCTL_PROC(_debug_hashstat, OID_AUTO, nchash, CTLTYPE_INT|CTLFLAG_RD| CTLFLAG_MPSAFE, 0, 0, sysctl_debug_hashstat_nchash, "I", "nchash statistics (number of total/used buckets, maximum chain length, usage percentage)"); #endif /* * Negative entries management */ static void -cache_negative_hit(struct namecache *ncp, int wlocked) +cache_negative_hit(struct namecache *ncp) { - if (!wlocked) { - rw_assert(&cache_lock, RA_RLOCKED); - mtx_lock(&ncneg_mtx); - } else { - rw_assert(&cache_lock, RA_WLOCKED); - } - + mtx_lock(&ncneg_mtx); TAILQ_REMOVE(&ncneg, ncp, nc_dst); TAILQ_INSERT_TAIL(&ncneg, ncp, nc_dst); - - if (!wlocked) - mtx_unlock(&ncneg_mtx); + mtx_unlock(&ncneg_mtx); } static void cache_negative_insert(struct namecache *ncp) { rw_assert(&cache_lock, RA_WLOCKED); + cache_assert_bucket_locked(ncp, RA_WLOCKED); MPASS(ncp->nc_vp == NULL); + mtx_lock(&ncneg_mtx); TAILQ_INSERT_TAIL(&ncneg, ncp, nc_dst); numneg++; + mtx_unlock(&ncneg_mtx); } static void cache_negative_remove(struct namecache *ncp) { rw_assert(&cache_lock, RA_WLOCKED); + cache_assert_bucket_locked(ncp, RA_WLOCKED); MPASS(ncp->nc_vp == NULL); + mtx_lock(&ncneg_mtx); TAILQ_REMOVE(&ncneg, ncp, nc_dst); numneg--; + mtx_unlock(&ncneg_mtx); } static struct namecache * cache_negative_zap_one(void) { struct namecache *ncp; rw_assert(&cache_lock, RA_WLOCKED); ncp = TAILQ_FIRST(&ncneg); KASSERT(ncp->nc_vp == NULL, ("ncp %p vp %p on ncneg", ncp, ncp->nc_vp)); cache_zap(ncp); return (ncp); } /* * cache_zap(): * * Removes a namecache entry from cache, whether it contains an actual * pointer to a vnode or if it is just a negative cache entry. 
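With the bucket locks in place, negative-entry LRU maintenance moves under its own mutex: cache_negative_hit() now unconditionally takes ncneg_mtx and requeues the entry at the tail, so the head of ncneg is always the coldest entry for cache_negative_zap_one() to reclaim. A simplified userspace sketch of that move-to-tail discipline using <sys/queue.h> and a hypothetical element type (the real zap path also involves the bucket and global locks):

#include <stdio.h>
#include <pthread.h>
#include <sys/queue.h>

struct negentry {
	TAILQ_ENTRY(negentry) link;
	int id;
};

static TAILQ_HEAD(, negentry) neglist = TAILQ_HEAD_INITIALIZER(neglist);
static pthread_mutex_t neg_mtx = PTHREAD_MUTEX_INITIALIZER;

/* On a hit, requeue at the tail: tail is hottest, head is coldest. */
static void
negative_hit(struct negentry *ne)
{

	pthread_mutex_lock(&neg_mtx);
	TAILQ_REMOVE(&neglist, ne, link);
	TAILQ_INSERT_TAIL(&neglist, ne, link);
	pthread_mutex_unlock(&neg_mtx);
}

/* Reclaim the coldest entry, as cache_negative_zap_one() picks it. */
static struct negentry *
negative_zap_one(void)
{
	struct negentry *ne;

	pthread_mutex_lock(&neg_mtx);
	ne = TAILQ_FIRST(&neglist);
	if (ne != NULL)
		TAILQ_REMOVE(&neglist, ne, link);
	pthread_mutex_unlock(&neg_mtx);
	return (ne);
}

int
main(void)
{
	struct negentry a = { .id = 1 }, b = { .id = 2 };

	TAILQ_INSERT_TAIL(&neglist, &a, link);
	TAILQ_INSERT_TAIL(&neglist, &b, link);
	negative_hit(&a);				/* a becomes hottest */
	printf("zapped %d\n", negative_zap_one()->id);	/* prints 2 */
	return (0);
}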
*/ static void -cache_zap(struct namecache *ncp) +cache_zap_locked(struct namecache *ncp) { rw_assert(&cache_lock, RA_WLOCKED); + cache_assert_bucket_locked(ncp, RA_WLOCKED); CTR2(KTR_VFS, "cache_zap(%p) vp %p", ncp, ncp->nc_vp); if (ncp->nc_vp != NULL) { SDT_PROBE3(vfs, namecache, zap, done, ncp->nc_dvp, nc_get_name(ncp), ncp->nc_vp); } else { SDT_PROBE2(vfs, namecache, zap_negative, done, ncp->nc_dvp, nc_get_name(ncp)); } LIST_REMOVE(ncp, nc_hash); if (ncp->nc_flag & NCF_ISDOTDOT) { if (ncp == ncp->nc_dvp->v_cache_dd) ncp->nc_dvp->v_cache_dd = NULL; } else { LIST_REMOVE(ncp, nc_src); if (LIST_EMPTY(&ncp->nc_dvp->v_cache_src)) { ncp->nc_flag |= NCF_DVDROP; numcachehv--; } } if (ncp->nc_vp) { TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst); if (ncp == ncp->nc_vp->v_cache_dd) ncp->nc_vp->v_cache_dd = NULL; } else { cache_negative_remove(ncp); } numcache--; } +static void +cache_zap(struct namecache *ncp) +{ + struct rwlock *bucketlock; + uint32_t hash; + + rw_assert(&cache_lock, RA_WLOCKED); + + hash = cache_get_hash(nc_get_name(ncp), ncp->nc_nlen, ncp->nc_dvp); + bucketlock = HASH2BUCKETLOCK(hash); + rw_wlock(bucketlock); + cache_zap_locked(ncp); + rw_wunlock(bucketlock); +} + /* * Lookup an entry in the cache * * Lookup is called with dvp pointing to the directory to search, * cnp pointing to the name of the entry being sought. If the lookup * succeeds, the vnode is returned in *vpp, and a status of -1 is * returned. If the lookup determines that the name does not exist * (negative caching), a status of ENOENT is returned. If the lookup * fails, a status of zero is returned. If the directory vnode is * recycled out from under us due to a forced unmount, a status of * ENOENT is returned. * * vpp is locked and ref'd on return. If we're looking up DOTDOT, dvp is * unlocked. If we're looking up . an extra ref is taken, but the lock is * not recursively acquired. */ +enum { UNLOCKED, WLOCKED, RLOCKED }; + +static void +cache_unlock(int cache_locked) +{ + + switch (cache_locked) { + case UNLOCKED: + break; + case WLOCKED: + CACHE_WUNLOCK(); + break; + case RLOCKED: + CACHE_RUNLOCK(); + break; + } +} + int cache_lookup(struct vnode *dvp, struct vnode **vpp, struct componentname *cnp, struct timespec *tsp, int *ticksp) { + struct rwlock *bucketlock; struct namecache *ncp; uint32_t hash; - int error, ltype, wlocked; + int error, ltype, cache_locked; if (!doingcache) { cnp->cn_flags &= ~MAKEENTRY; return (0); } retry: - wlocked = 0; - counter_u64_add(numcalls, 1); + bucketlock = NULL; + cache_locked = UNLOCKED; error = 0; + counter_u64_add(numcalls, 1); retry_wlocked: if (cnp->cn_nameptr[0] == '.') { if (cnp->cn_namelen == 1) { *vpp = dvp; CTR2(KTR_VFS, "cache_lookup(%p, %s) found via .", dvp, cnp->cn_nameptr); counter_u64_add(dothits, 1); SDT_PROBE3(vfs, namecache, lookup, hit, dvp, ".", *vpp); if (tsp != NULL) timespecclear(tsp); if (ticksp != NULL) *ticksp = ticks; VREF(*vpp); /* * When we lookup "." we still can be asked to lock it * differently... 
*/ ltype = cnp->cn_lkflags & LK_TYPE_MASK; if (ltype != VOP_ISLOCKED(*vpp)) { if (ltype == LK_EXCLUSIVE) { vn_lock(*vpp, LK_UPGRADE | LK_RETRY); if ((*vpp)->v_iflag & VI_DOOMED) { /* forced unmount */ vrele(*vpp); *vpp = NULL; return (ENOENT); } } else vn_lock(*vpp, LK_DOWNGRADE | LK_RETRY); } return (-1); } - if (!wlocked) - CACHE_RLOCK(); if (cnp->cn_namelen == 2 && cnp->cn_nameptr[1] == '.') { counter_u64_add(dotdothits, 1); + if (cache_locked == UNLOCKED) { + CACHE_RLOCK(); + cache_locked = RLOCKED; + } + if (dvp->v_cache_dd == NULL) { SDT_PROBE3(vfs, namecache, lookup, miss, dvp, "..", NULL); goto unlock; } if ((cnp->cn_flags & MAKEENTRY) == 0) { - if (!wlocked && !CACHE_UPGRADE_LOCK()) + if (cache_locked != WLOCKED && + !CACHE_UPGRADE_LOCK()) goto wlock; ncp = NULL; if (dvp->v_cache_dd->nc_flag & NCF_ISDOTDOT) { ncp = dvp->v_cache_dd; cache_zap(ncp); } dvp->v_cache_dd = NULL; CACHE_WUNLOCK(); cache_free(ncp); return (0); } ncp = dvp->v_cache_dd; if (ncp->nc_flag & NCF_ISDOTDOT) *vpp = ncp->nc_vp; else *vpp = ncp->nc_dvp; /* Return failure if negative entry was found. */ if (*vpp == NULL) goto negative_success; CTR3(KTR_VFS, "cache_lookup(%p, %s) found %p via ..", dvp, cnp->cn_nameptr, *vpp); SDT_PROBE3(vfs, namecache, lookup, hit, dvp, "..", *vpp); cache_out_ts(ncp, tsp, ticksp); if ((ncp->nc_flag & (NCF_ISDOTDOT | NCF_DTS)) == NCF_DTS && tsp != NULL) *tsp = ((struct namecache_ts *)ncp)-> nc_dotdottime; goto success; } - } else if (!wlocked) - CACHE_RLOCK(); + } hash = cache_get_hash(cnp->cn_nameptr, cnp->cn_namelen, dvp); + if (cache_locked == UNLOCKED) { + bucketlock = HASH2BUCKETLOCK(hash); + rw_rlock(bucketlock); + } + LIST_FOREACH(ncp, (NCHHASH(hash)), nc_hash) { counter_u64_add(numchecks, 1); if (ncp->nc_dvp == dvp && ncp->nc_nlen == cnp->cn_namelen && !bcmp(nc_get_name(ncp), cnp->cn_nameptr, ncp->nc_nlen)) break; } /* We failed to find an entry */ if (ncp == NULL) { SDT_PROBE3(vfs, namecache, lookup, miss, dvp, cnp->cn_nameptr, NULL); if ((cnp->cn_flags & MAKEENTRY) == 0) { counter_u64_add(nummisszap, 1); } else { counter_u64_add(nummiss, 1); } goto unlock; } /* We don't want to have an entry, so dump it */ if ((cnp->cn_flags & MAKEENTRY) == 0) { counter_u64_add(numposzaps, 1); - if (!wlocked && !CACHE_UPGRADE_LOCK()) - goto wlock; - cache_zap(ncp); - CACHE_WUNLOCK(); - cache_free(ncp); - return (0); + goto zap_and_exit; } /* We found a "positive" match, return the vnode */ if (ncp->nc_vp) { counter_u64_add(numposhits, 1); *vpp = ncp->nc_vp; CTR4(KTR_VFS, "cache_lookup(%p, %s) found %p via ncp %p", dvp, cnp->cn_nameptr, *vpp, ncp); SDT_PROBE3(vfs, namecache, lookup, hit, dvp, nc_get_name(ncp), *vpp); cache_out_ts(ncp, tsp, ticksp); goto success; } negative_success: /* We found a negative match, and want to create it, so purge */ if (cnp->cn_nameiop == CREATE) { counter_u64_add(numnegzaps, 1); - if (!wlocked && !CACHE_UPGRADE_LOCK()) - goto wlock; - cache_zap(ncp); - CACHE_WUNLOCK(); - cache_free(ncp); - return (0); + goto zap_and_exit; } counter_u64_add(numneghits, 1); - cache_negative_hit(ncp, wlocked); + cache_negative_hit(ncp); if (ncp->nc_flag & NCF_WHITE) cnp->cn_flags |= ISWHITEOUT; SDT_PROBE2(vfs, namecache, lookup, hit__negative, dvp, nc_get_name(ncp)); cache_out_ts(ncp, tsp, ticksp); - if (wlocked) - CACHE_WUNLOCK(); - else - CACHE_RUNLOCK(); + MPASS(bucketlock != NULL || cache_locked != UNLOCKED); + if (bucketlock != NULL) + rw_runlock(bucketlock); + cache_unlock(cache_locked); return (ENOENT); wlock: /* * We need to update the cache after our lookup, so upgrade to * 
a write lock and retry the operation. */ CACHE_RUNLOCK(); +wlock_unlocked: CACHE_WLOCK(); numupgrades++; - wlocked = 1; + cache_locked = WLOCKED; goto retry_wlocked; success: /* * On success we return a locked and ref'd vnode as per the lookup * protocol. */ MPASS(dvp != *vpp); ltype = 0; /* silence gcc warning */ if (cnp->cn_flags & ISDOTDOT) { ltype = VOP_ISLOCKED(dvp); VOP_UNLOCK(dvp, 0); } vhold(*vpp); - if (wlocked) - CACHE_WUNLOCK(); - else - CACHE_RUNLOCK(); + MPASS(bucketlock != NULL || cache_locked != UNLOCKED); + if (bucketlock != NULL) + rw_runlock(bucketlock); + cache_unlock(cache_locked); error = vget(*vpp, cnp->cn_lkflags | LK_VNHELD, cnp->cn_thread); if (cnp->cn_flags & ISDOTDOT) { vn_lock(dvp, ltype | LK_RETRY); if (dvp->v_iflag & VI_DOOMED) { if (error == 0) vput(*vpp); *vpp = NULL; return (ENOENT); } } if (error) { *vpp = NULL; goto retry; } if ((cnp->cn_flags & ISLASTCN) && (cnp->cn_lkflags & LK_TYPE_MASK) == LK_EXCLUSIVE) { ASSERT_VOP_ELOCKED(*vpp, "cache_lookup"); } return (-1); unlock: - if (wlocked) - CACHE_WUNLOCK(); - else - CACHE_RUNLOCK(); + MPASS(bucketlock != NULL || cache_locked != UNLOCKED); + if (bucketlock != NULL) + rw_runlock(bucketlock); + cache_unlock(cache_locked); return (0); + +zap_and_exit: + if (bucketlock != NULL) { + rw_assert(&cache_lock, RA_UNLOCKED); + if (!CACHE_TRY_WLOCK()) { + rw_runlock(bucketlock); + bucketlock = NULL; + zap_and_exit_bucket_fail++; + goto wlock_unlocked; + } + cache_locked = WLOCKED; + rw_runlock(bucketlock); + bucketlock = NULL; + } else if (cache_locked != WLOCKED && !CACHE_UPGRADE_LOCK()) + goto wlock; + cache_zap(ncp); + CACHE_WUNLOCK(); + cache_free(ncp); + return (0); } /* * Add an entry to the cache. */ void cache_enter_time(struct vnode *dvp, struct vnode *vp, struct componentname *cnp, struct timespec *tsp, struct timespec *dtsp) { + struct rwlock *bucketlock; struct namecache *ncp, *n2, *ndd, *nneg; struct namecache_ts *n3; struct nchashhead *ncpp; uint32_t hash; int flag; int len; CTR3(KTR_VFS, "cache_enter(%p, %p, %s)", dvp, vp, cnp->cn_nameptr); VNASSERT(vp == NULL || (vp->v_iflag & VI_DOOMED) == 0, vp, ("cache_enter: Adding a doomed vnode")); VNASSERT(dvp == NULL || (dvp->v_iflag & VI_DOOMED) == 0, dvp, ("cache_enter: Doomed vnode used as src")); if (!doingcache) return; /* * Avoid blowout in namecache entries. */ if (numcache >= desiredvnodes * ncsizefactor) return; ndd = nneg = NULL; flag = 0; if (cnp->cn_nameptr[0] == '.') { if (cnp->cn_namelen == 1) return; if (cnp->cn_namelen == 2 && cnp->cn_nameptr[1] == '.') { CACHE_WLOCK(); /* * If dotdot entry already exists, just retarget it * to new parent vnode, otherwise continue with new * namecache entry allocation. */ if ((ncp = dvp->v_cache_dd) != NULL && ncp->nc_flag & NCF_ISDOTDOT) { KASSERT(ncp->nc_dvp == dvp, ("wrong isdotdot parent")); if (ncp->nc_vp != NULL) { TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst); } else { cache_negative_remove(ncp); } if (vp != NULL) { TAILQ_INSERT_HEAD(&vp->v_cache_dst, ncp, nc_dst); } else { cache_negative_insert(ncp); } ncp->nc_vp = vp; CACHE_WUNLOCK(); return; } dvp->v_cache_dd = NULL; SDT_PROBE3(vfs, namecache, enter, done, dvp, "..", vp); CACHE_WUNLOCK(); flag = NCF_ISDOTDOT; } } /* * Calculate the hash key and setup as much of the new * namecache entry as possible before acquiring the lock. 
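The zap_and_exit path added above faces a lock-order problem: the lookup holds only a bucket lock, but zapping needs the global lock, which orders before bucket locks. The code therefore try-locks the global lock and, on failure, drops the bucket lock, bumps zap_and_exit_bucket_fail, and restarts with the global lock held. A userspace sketch of that try-or-back-off ordering with pthreads (the retry is compressed to a comment):

#include <stdio.h>
#include <pthread.h>

static pthread_rwlock_t global_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t bucket_lock = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long bucket_fail;	/* zap_and_exit_bucket_fail */

/*
 * Lock order is global before bucket.  Holding only the bucket lock,
 * acquire the global lock without deadlock: try it, and if that
 * fails, drop the bucket lock and start over in the correct order.
 */
static void
zap_entry(void)
{

	pthread_rwlock_rdlock(&bucket_lock);
	/* ... the entry to zap was found under the bucket lock ... */
	if (pthread_rwlock_trywrlock(&global_lock) != 0) {
		pthread_rwlock_unlock(&bucket_lock);
		bucket_fail++;
		pthread_rwlock_wrlock(&global_lock);
		/* ... must re-find the entry: it may be gone by now ... */
	} else {
		pthread_rwlock_unlock(&bucket_lock);
	}
	/* ... zap with the global write lock held ... */
	pthread_rwlock_unlock(&global_lock);
}

int
main(void)
{

	zap_entry();
	printf("fallbacks: %lu\n", bucket_fail);
	return (0);
}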
*/ ncp = cache_alloc(cnp->cn_namelen, tsp != NULL); ncp->nc_vp = vp; ncp->nc_dvp = dvp; ncp->nc_flag = flag; if (tsp != NULL) { n3 = (struct namecache_ts *)ncp; n3->nc_time = *tsp; n3->nc_ticks = ticks; n3->nc_flag |= NCF_TS; if (dtsp != NULL) { n3->nc_dotdottime = *dtsp; n3->nc_flag |= NCF_DTS; } } len = ncp->nc_nlen = cnp->cn_namelen; hash = cache_get_hash(cnp->cn_nameptr, len, dvp); strlcpy(nc_get_name(ncp), cnp->cn_nameptr, len + 1); CACHE_WLOCK(); /* * See if this vnode or negative entry is already in the cache * with this name. This can happen with concurrent lookups of * the same path name. */ ncpp = NCHHASH(hash); LIST_FOREACH(n2, ncpp, nc_hash) { if (n2->nc_dvp == dvp && n2->nc_nlen == cnp->cn_namelen && !bcmp(nc_get_name(n2), cnp->cn_nameptr, n2->nc_nlen)) { if (tsp != NULL) { KASSERT((n2->nc_flag & NCF_TS) != 0, ("no NCF_TS")); n3 = (struct namecache_ts *)n2; n3->nc_time = ((struct namecache_ts *)ncp)->nc_time; n3->nc_ticks = ((struct namecache_ts *)ncp)->nc_ticks; if (dtsp != NULL) { n3->nc_dotdottime = ((struct namecache_ts *)ncp)-> nc_dotdottime; n3->nc_flag |= NCF_DTS; } } CACHE_WUNLOCK(); cache_free(ncp); return; } } if (flag == NCF_ISDOTDOT) { /* * See if we are trying to add .. entry, but some other lookup * has populated v_cache_dd pointer already. */ if (dvp->v_cache_dd != NULL) { CACHE_WUNLOCK(); cache_free(ncp); return; } KASSERT(vp == NULL || vp->v_type == VDIR, ("wrong vnode type %p", vp)); dvp->v_cache_dd = ncp; } numcache++; if (vp != NULL) { if (vp->v_type == VDIR) { if (flag != NCF_ISDOTDOT) { /* * For this case, the cache entry maps both the * directory name in it and the name ".." for the * directory's parent. */ if ((ndd = vp->v_cache_dd) != NULL) { if ((ndd->nc_flag & NCF_ISDOTDOT) != 0) cache_zap(ndd); else ndd = NULL; } vp->v_cache_dd = ncp; } } else { vp->v_cache_dd = NULL; } } - /* - * Insert the new namecache entry into the appropriate chain - * within the cache entries table. - */ - LIST_INSERT_HEAD(ncpp, ncp, nc_hash); if (flag != NCF_ISDOTDOT) { if (LIST_EMPTY(&dvp->v_cache_src)) { vhold(dvp); numcachehv++; } LIST_INSERT_HEAD(&dvp->v_cache_src, ncp, nc_src); } + bucketlock = HASH2BUCKETLOCK(hash); + rw_wlock(bucketlock); + /* + * Insert the new namecache entry into the appropriate chain + * within the cache entries table. + */ + LIST_INSERT_HEAD(ncpp, ncp, nc_hash); + + /* * If the entry is "negative", we place it into the * "negative" cache queue, otherwise, we place it into the * destination vnode's cache entries queue. 
*/ if (vp != NULL) { TAILQ_INSERT_HEAD(&vp->v_cache_dst, ncp, nc_dst); SDT_PROBE3(vfs, namecache, enter, done, dvp, nc_get_name(ncp), vp); } else { if (cnp->cn_flags & ISWHITEOUT) ncp->nc_flag |= NCF_WHITE; cache_negative_insert(ncp); SDT_PROBE2(vfs, namecache, enter_negative, done, dvp, nc_get_name(ncp)); } + rw_wunlock(bucketlock); if (numneg * ncnegfactor > numcache) nneg = cache_negative_zap_one(); CACHE_WUNLOCK(); cache_free(ndd); cache_free(nneg); } +static u_int +cache_roundup_2(u_int val) +{ + u_int res; + + for (res = 1; res <= val; res <<= 1) + continue; + + return (res); +} + /* * Name cache initialization, from vfs_init() when we are booting */ static void nchinit(void *dummy __unused) { + u_int i; TAILQ_INIT(&ncneg); cache_zone_small = uma_zcreate("S VFS Cache", sizeof(struct namecache) + CACHE_PATH_CUTOFF + 1, NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_ZINIT); cache_zone_small_ts = uma_zcreate("STS VFS Cache", sizeof(struct namecache_ts) + CACHE_PATH_CUTOFF + 1, NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_ZINIT); cache_zone_large = uma_zcreate("L VFS Cache", sizeof(struct namecache) + NAME_MAX + 1, NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_ZINIT); cache_zone_large_ts = uma_zcreate("LTS VFS Cache", sizeof(struct namecache_ts) + NAME_MAX + 1, NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_ZINIT); nchashtbl = hashinit(desiredvnodes * 2, M_VFSCACHE, &nchash); + numbucketlocks = cache_roundup_2(mp_ncpus * 16); + if (numbucketlocks > nchash) + numbucketlocks = nchash; + bucketlocks = malloc(sizeof(*bucketlocks) * numbucketlocks, M_VFSCACHE, + M_WAITOK | M_ZERO); + for (i = 0; i < numbucketlocks; i++) + rw_init_flags(&bucketlocks[i], "ncbuc", RW_DUPOK); numcalls = counter_u64_alloc(M_WAITOK); dothits = counter_u64_alloc(M_WAITOK); dotdothits = counter_u64_alloc(M_WAITOK); numchecks = counter_u64_alloc(M_WAITOK); nummiss = counter_u64_alloc(M_WAITOK); nummisszap = counter_u64_alloc(M_WAITOK); numposzaps = counter_u64_alloc(M_WAITOK); numposhits = counter_u64_alloc(M_WAITOK); numnegzaps = counter_u64_alloc(M_WAITOK); numneghits = counter_u64_alloc(M_WAITOK); numfullpathcalls = counter_u64_alloc(M_WAITOK); numfullpathfail1 = counter_u64_alloc(M_WAITOK); numfullpathfail2 = counter_u64_alloc(M_WAITOK); numfullpathfail4 = counter_u64_alloc(M_WAITOK); numfullpathfound = counter_u64_alloc(M_WAITOK); } SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_SECOND, nchinit, NULL); void cache_changesize(int newmaxvnodes) { struct nchashhead *new_nchashtbl, *old_nchashtbl; u_long new_nchash, old_nchash; struct namecache *ncp; uint32_t hash; int i; new_nchashtbl = hashinit(newmaxvnodes * 2, M_VFSCACHE, &new_nchash); /* If same hash table size, nothing to do */ if (nchash == new_nchash) { free(new_nchashtbl, M_VFSCACHE); return; } /* * Move everything from the old hash table to the new table. * None of the namecache entries in the table can be removed * because to do so, they have to be removed from the hash table. */ CACHE_WLOCK(); + cache_lock_all_buckets(); old_nchashtbl = nchashtbl; old_nchash = nchash; nchashtbl = new_nchashtbl; nchash = new_nchash; for (i = 0; i <= old_nchash; i++) { while ((ncp = LIST_FIRST(&old_nchashtbl[i])) != NULL) { hash = cache_get_hash(nc_get_name(ncp), ncp->nc_nlen, ncp->nc_dvp); LIST_REMOVE(ncp, nc_hash); LIST_INSERT_HEAD(NCHHASH(hash), ncp, nc_hash); } } + cache_unlock_all_buckets(); CACHE_WUNLOCK(); free(old_nchashtbl, M_VFSCACHE); } /* * Invalidate all entries to a particular vnode. 
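nchinit() sizes the stripe array with cache_roundup_2(mp_ncpus * 16) and clamps it to nchash. Note that the helper returns the smallest power of two strictly greater than its argument, so an exact power of two is doubled. A standalone check of that behavior (same loop as the diff; assumes val < 1U << 31, above which the shift would wrap):

#include <stdio.h>

/* Same loop as cache_roundup_2() above. */
static unsigned int
roundup_2(unsigned int val)
{
	unsigned int res;

	for (res = 1; res <= val; res <<= 1)
		continue;
	return (res);
}

int
main(void)
{

	printf("%u -> %u\n", 100U, roundup_2(100U));	/* 128 */
	printf("%u -> %u\n", 256U, roundup_2(256U));	/* 512, not 256 */
	printf("%u -> %u\n", 0U, roundup_2(0U));	/* 1 */
	return (0);
}

Since the result is only a lock-stripe count, overshooting by one power of two is harmless here.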
*/ void cache_purge(struct vnode *vp) { TAILQ_HEAD(, namecache) ncps; struct namecache *ncp, *nnp; CTR1(KTR_VFS, "cache_purge(%p)", vp); SDT_PROBE1(vfs, namecache, purge, done, vp); TAILQ_INIT(&ncps); CACHE_WLOCK(); while (!LIST_EMPTY(&vp->v_cache_src)) { ncp = LIST_FIRST(&vp->v_cache_src); cache_zap(ncp); TAILQ_INSERT_TAIL(&ncps, ncp, nc_dst); } while (!TAILQ_EMPTY(&vp->v_cache_dst)) { ncp = TAILQ_FIRST(&vp->v_cache_dst); cache_zap(ncp); TAILQ_INSERT_TAIL(&ncps, ncp, nc_dst); } if (vp->v_cache_dd != NULL) { ncp = vp->v_cache_dd; KASSERT(ncp->nc_flag & NCF_ISDOTDOT, ("lost dotdot link")); cache_zap(ncp); TAILQ_INSERT_TAIL(&ncps, ncp, nc_dst); } KASSERT(vp->v_cache_dd == NULL, ("incomplete purge")); CACHE_WUNLOCK(); TAILQ_FOREACH_SAFE(ncp, &ncps, nc_dst, nnp) { cache_free(ncp); } } /* * Invalidate all negative entries for a particular directory vnode. */ void cache_purge_negative(struct vnode *vp) { TAILQ_HEAD(, namecache) ncps; struct namecache *ncp, *nnp; CTR1(KTR_VFS, "cache_purge_negative(%p)", vp); SDT_PROBE1(vfs, namecache, purge_negative, done, vp); TAILQ_INIT(&ncps); CACHE_WLOCK(); LIST_FOREACH_SAFE(ncp, &vp->v_cache_src, nc_src, nnp) { if (ncp->nc_vp != NULL) continue; cache_zap(ncp); TAILQ_INSERT_TAIL(&ncps, ncp, nc_dst); } CACHE_WUNLOCK(); TAILQ_FOREACH_SAFE(ncp, &ncps, nc_dst, nnp) { cache_free(ncp); } } /* * Flush all entries referencing a particular filesystem. */ void cache_purgevfs(struct mount *mp) { TAILQ_HEAD(, namecache) ncps; - struct nchashhead *ncpp; + struct rwlock *bucketlock; + struct nchashhead *bucket; struct namecache *ncp, *nnp; + u_long i, j, n_nchash; /* Scan hash tables for applicable entries */ SDT_PROBE1(vfs, namecache, purgevfs, done, mp); TAILQ_INIT(&ncps); CACHE_WLOCK(); - for (ncpp = &nchashtbl[nchash]; ncpp >= nchashtbl; ncpp--) { - LIST_FOREACH_SAFE(ncp, ncpp, nc_hash, nnp) { - if (ncp->nc_dvp->v_mount != mp) - continue; - cache_zap(ncp); - TAILQ_INSERT_TAIL(&ncps, ncp, nc_dst); + n_nchash = nchash + 1; + for (i = 0; i < numbucketlocks; i++) { + bucketlock = (struct rwlock *)&bucketlocks[i]; + rw_wlock(bucketlock); + for (j = i; j < n_nchash; j += numbucketlocks) { + bucket = &nchashtbl[j]; + LIST_FOREACH_SAFE(ncp, bucket, nc_hash, nnp) { + cache_assert_bucket_locked(ncp, RA_WLOCKED); + if (ncp->nc_dvp->v_mount != mp) + continue; + cache_zap_locked(ncp); + TAILQ_INSERT_HEAD(&ncps, ncp, nc_dst); + } } + rw_wunlock(bucketlock); } CACHE_WUNLOCK(); TAILQ_FOREACH_SAFE(ncp, &ncps, nc_dst, nnp) { cache_free(ncp); } } /* * Perform canonical checks and cache lookup and pass on to filesystem * through the vop_cachedlookup only if needed. */ int vfs_cache_lookup(struct vop_lookup_args *ap) { struct vnode *dvp; int error; struct vnode **vpp = ap->a_vpp; struct componentname *cnp = ap->a_cnp; struct ucred *cred = cnp->cn_cred; int flags = cnp->cn_flags; struct thread *td = cnp->cn_thread; *vpp = NULL; dvp = ap->a_dvp; if (dvp->v_type != VDIR) return (ENOTDIR); if ((flags & ISLASTCN) && (dvp->v_mount->mnt_flag & MNT_RDONLY) && (cnp->cn_nameiop == DELETE || cnp->cn_nameiop == RENAME)) return (EROFS); error = VOP_ACCESS(dvp, VEXEC, cred, td); if (error) return (error); error = cache_lookup(dvp, vpp, cnp, NULL, NULL); if (error == 0) return (VOP_CACHEDLOOKUP(dvp, vpp, cnp)); if (error == -1) return (0); return (error); } /* * XXX All of these sysctls would probably be more productive dead. */ static int disablecwd; SYSCTL_INT(_debug, OID_AUTO, disablecwd, CTLFLAG_RW, &disablecwd, 0, "Disable the getcwd syscall"); /* Implementation of the getcwd syscall. 
*/ int sys___getcwd(struct thread *td, struct __getcwd_args *uap) { return (kern___getcwd(td, uap->buf, UIO_USERSPACE, uap->buflen, MAXPATHLEN)); } int kern___getcwd(struct thread *td, char *buf, enum uio_seg bufseg, u_int buflen, u_int path_max) { char *bp, *tmpbuf; struct filedesc *fdp; struct vnode *cdir, *rdir; int error; if (disablecwd) return (ENODEV); if (buflen < 2) return (EINVAL); if (buflen > path_max) buflen = path_max; tmpbuf = malloc(buflen, M_TEMP, M_WAITOK); fdp = td->td_proc->p_fd; FILEDESC_SLOCK(fdp); cdir = fdp->fd_cdir; VREF(cdir); rdir = fdp->fd_rdir; VREF(rdir); FILEDESC_SUNLOCK(fdp); error = vn_fullpath1(td, cdir, rdir, tmpbuf, &bp, buflen); vrele(rdir); vrele(cdir); if (!error) { if (bufseg == UIO_SYSSPACE) bcopy(bp, buf, strlen(bp) + 1); else error = copyout(bp, buf, strlen(bp) + 1); #ifdef KTRACE if (KTRPOINT(curthread, KTR_NAMEI)) ktrnamei(bp); #endif } free(tmpbuf, M_TEMP); return (error); } /* * Thus begins the fullpath magic. */ static int disablefullpath; SYSCTL_INT(_debug, OID_AUTO, disablefullpath, CTLFLAG_RW, &disablefullpath, 0, "Disable the vn_fullpath function"); /* * Retrieve the full filesystem path that corresponds to a vnode from the name * cache (if available) */ int vn_fullpath(struct thread *td, struct vnode *vn, char **retbuf, char **freebuf) { char *buf; struct filedesc *fdp; struct vnode *rdir; int error; if (disablefullpath) return (ENODEV); if (vn == NULL) return (EINVAL); buf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); fdp = td->td_proc->p_fd; FILEDESC_SLOCK(fdp); rdir = fdp->fd_rdir; VREF(rdir); FILEDESC_SUNLOCK(fdp); error = vn_fullpath1(td, vn, rdir, buf, retbuf, MAXPATHLEN); vrele(rdir); if (!error) *freebuf = buf; else free(buf, M_TEMP); return (error); } /* * This function is similar to vn_fullpath, but it attempts to look up the * pathname relative to the global root mount point. This is required for the * auditing sub-system, as audited pathnames must be absolute, relative to the * global root mount point.
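kern___getcwd() and vn_fullpath() both lean on vn_fullpath1(), which assembles the path backwards: it starts at the end of the buffer, prepends each component and its '/', and hands back a pointer into the middle of the buffer, which is why callers must free the original allocation (freebuf) rather than the returned pointer (retbuf). A userspace sketch of the backwards construction, with a hypothetical component list in leaf-to-root order:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* Components as a cache walk would yield them: leaf first. */
	const char *comps[] = { "kernel", "boot" };
	char buf[64], *retbuf;
	size_t buflen, i, len;

	buflen = sizeof(buf);
	buf[--buflen] = '\0';
	for (i = 0; i < 2; i++) {
		len = strlen(comps[i]);
		if (buflen < len + 1)
			return (1);		/* ENOMEM in the kernel */
		buflen -= len;
		memcpy(buf + buflen, comps[i], len);
		buf[--buflen] = '/';		/* prepend the separator */
	}
	retbuf = buf + buflen;			/* points into buf */
	printf("%s\n", retbuf);			/* /boot/kernel */
	return (0);
}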
*/ int vn_fullpath_global(struct thread *td, struct vnode *vn, char **retbuf, char **freebuf) { char *buf; int error; if (disablefullpath) return (ENODEV); if (vn == NULL) return (EINVAL); buf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); error = vn_fullpath1(td, vn, rootvnode, buf, retbuf, MAXPATHLEN); if (!error) *freebuf = buf; else free(buf, M_TEMP); return (error); } int vn_vptocnp(struct vnode **vp, struct ucred *cred, char *buf, u_int *buflen) { int error; CACHE_RLOCK(); error = vn_vptocnp_locked(vp, cred, buf, buflen); if (error == 0) CACHE_RUNLOCK(); return (error); } static int vn_vptocnp_locked(struct vnode **vp, struct ucred *cred, char *buf, u_int *buflen) { struct vnode *dvp; struct namecache *ncp; int error; TAILQ_FOREACH(ncp, &((*vp)->v_cache_dst), nc_dst) { if ((ncp->nc_flag & NCF_ISDOTDOT) == 0) break; } if (ncp != NULL) { if (*buflen < ncp->nc_nlen) { CACHE_RUNLOCK(); vrele(*vp); counter_u64_add(numfullpathfail4, 1); error = ENOMEM; SDT_PROBE3(vfs, namecache, fullpath, return, error, vp, NULL); return (error); } *buflen -= ncp->nc_nlen; memcpy(buf + *buflen, nc_get_name(ncp), ncp->nc_nlen); SDT_PROBE3(vfs, namecache, fullpath, hit, ncp->nc_dvp, nc_get_name(ncp), vp); dvp = *vp; *vp = ncp->nc_dvp; vref(*vp); CACHE_RUNLOCK(); vrele(dvp); CACHE_RLOCK(); return (0); } SDT_PROBE1(vfs, namecache, fullpath, miss, vp); CACHE_RUNLOCK(); vn_lock(*vp, LK_SHARED | LK_RETRY); error = VOP_VPTOCNP(*vp, &dvp, cred, buf, buflen); vput(*vp); if (error) { counter_u64_add(numfullpathfail2, 1); SDT_PROBE3(vfs, namecache, fullpath, return, error, vp, NULL); return (error); } *vp = dvp; CACHE_RLOCK(); if (dvp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); vrele(dvp); error = ENOENT; SDT_PROBE3(vfs, namecache, fullpath, return, error, vp, NULL); return (error); } /* * *vp has its use count incremented still. */ return (0); } /* * The magic behind kern___getcwd() and vn_fullpath(). 
*/ static int vn_fullpath1(struct thread *td, struct vnode *vp, struct vnode *rdir, char *buf, char **retbuf, u_int buflen) { int error, slash_prefixed; #ifdef KDTRACE_HOOKS struct vnode *startvp = vp; #endif struct vnode *vp1; buflen--; buf[buflen] = '\0'; error = 0; slash_prefixed = 0; SDT_PROBE1(vfs, namecache, fullpath, entry, vp); counter_u64_add(numfullpathcalls, 1); vref(vp); CACHE_RLOCK(); if (vp->v_type != VDIR) { error = vn_vptocnp_locked(&vp, td->td_ucred, buf, &buflen); if (error) return (error); if (buflen == 0) { CACHE_RUNLOCK(); vrele(vp); return (ENOMEM); } buf[--buflen] = '/'; slash_prefixed = 1; } while (vp != rdir && vp != rootvnode) { if (vp->v_vflag & VV_ROOT) { if (vp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); vrele(vp); error = ENOENT; SDT_PROBE3(vfs, namecache, fullpath, return, error, vp, NULL); break; } vp1 = vp->v_mount->mnt_vnodecovered; vref(vp1); CACHE_RUNLOCK(); vrele(vp); vp = vp1; CACHE_RLOCK(); continue; } if (vp->v_type != VDIR) { CACHE_RUNLOCK(); vrele(vp); counter_u64_add(numfullpathfail1, 1); error = ENOTDIR; SDT_PROBE3(vfs, namecache, fullpath, return, error, vp, NULL); break; } error = vn_vptocnp_locked(&vp, td->td_ucred, buf, &buflen); if (error) break; if (buflen == 0) { CACHE_RUNLOCK(); vrele(vp); error = ENOMEM; SDT_PROBE3(vfs, namecache, fullpath, return, error, startvp, NULL); break; } buf[--buflen] = '/'; slash_prefixed = 1; } if (error) return (error); if (!slash_prefixed) { if (buflen == 0) { CACHE_RUNLOCK(); vrele(vp); counter_u64_add(numfullpathfail4, 1); SDT_PROBE3(vfs, namecache, fullpath, return, ENOMEM, startvp, NULL); return (ENOMEM); } buf[--buflen] = '/'; } counter_u64_add(numfullpathfound, 1); CACHE_RUNLOCK(); vrele(vp); SDT_PROBE3(vfs, namecache, fullpath, return, 0, startvp, buf + buflen); *retbuf = buf + buflen; return (0); } struct vnode * vn_dir_dd_ino(struct vnode *vp) { struct namecache *ncp; struct vnode *ddvp; ASSERT_VOP_LOCKED(vp, "vn_dir_dd_ino"); CACHE_RLOCK(); TAILQ_FOREACH(ncp, &(vp->v_cache_dst), nc_dst) { if ((ncp->nc_flag & NCF_ISDOTDOT) != 0) continue; ddvp = ncp->nc_dvp; vhold(ddvp); CACHE_RUNLOCK(); if (vget(ddvp, LK_SHARED | LK_NOWAIT | LK_VNHELD, curthread)) return (NULL); return (ddvp); } CACHE_RUNLOCK(); return (NULL); } int vn_commname(struct vnode *vp, char *buf, u_int buflen) { struct namecache *ncp; int l; CACHE_RLOCK(); TAILQ_FOREACH(ncp, &vp->v_cache_dst, nc_dst) if ((ncp->nc_flag & NCF_ISDOTDOT) == 0) break; if (ncp == NULL) { CACHE_RUNLOCK(); return (ENOENT); } l = min(ncp->nc_nlen, buflen - 1); memcpy(buf, nc_get_name(ncp), l); CACHE_RUNLOCK(); buf[l] = '\0'; return (0); } /* ABI compat shims for old kernel modules. */ #undef cache_enter void cache_enter(struct vnode *dvp, struct vnode *vp, struct componentname *cnp); void cache_enter(struct vnode *dvp, struct vnode *vp, struct componentname *cnp) { cache_enter_time(dvp, vp, cnp, NULL, NULL); } /* * This function updates path string to vnode's full global path * and checks the size of the new path string against the pathlen argument. * * Requires a locked, referenced vnode. * Vnode is re-locked on success or ENODEV, otherwise unlocked. * * If sysctl debug.disablefullpath is set, ENODEV is returned, * vnode is left locked and path remain untouched. * * If vp is a directory, the call to vn_fullpath_global() always succeeds * because it falls back to the ".." lookup if the namecache lookup fails. 
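The function that follows guards against a concurrent rename: after building the global path it re-resolves that string with namei() and accepts the result only if it still names the same vnode. The same check-after-rebuild idea can be shown portably, with stat(2) standing in for the vnode identity comparison (illustrative only; the kernel compares vnode pointers, not dev/inode pairs):

#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths currently name the same file, mirroring the
 * "vp1 == vp" check after the namei() re-lookup below.
 */
static int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (0);
	return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
}

int
main(void)
{

	printf("%d\n", same_file("/etc", "/etc"));	/* 1 */
	return (0);
}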
*/ int vn_path_to_global_path(struct thread *td, struct vnode *vp, char *path, u_int pathlen) { struct nameidata nd; struct vnode *vp1; char *rpath, *fbuf; int error; ASSERT_VOP_ELOCKED(vp, __func__); /* Return ENODEV if sysctl debug.disablefullpath==1 */ if (disablefullpath) return (ENODEV); /* Construct global filesystem path from vp. */ VOP_UNLOCK(vp, 0); error = vn_fullpath_global(td, vp, &rpath, &fbuf); if (error != 0) { vrele(vp); return (error); } if (strlen(rpath) >= pathlen) { vrele(vp); error = ENAMETOOLONG; goto out; } /* * Re-lookup the vnode by path to detect a possible rename. * As a side effect, the vnode is relocked. * If vnode was renamed, return ENOENT. */ NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | AUDITVNODE1, UIO_SYSSPACE, path, td); error = namei(&nd); if (error != 0) { vrele(vp); goto out; } NDFREE(&nd, NDF_ONLY_PNBUF); vp1 = nd.ni_vp; vrele(vp); if (vp1 == vp) strcpy(path, rpath); else { vput(vp1); error = ENOENT; } out: free(fbuf, M_TEMP); return (error); } Index: projects/clang390-import/sys/mips/malta/asm_malta.S =================================================================== --- projects/clang390-import/sys/mips/malta/asm_malta.S (nonexistent) +++ projects/clang390-import/sys/mips/malta/asm_malta.S (revision 305687) @@ -0,0 +1,89 @@ +/*- + * Copyright (c) 2016 Ruslan Bukin + * All rights reserved. + * + * Portions of this software were developed by SRI International and the + * University of Cambridge Computer Laboratory under DARPA/AFRL contract + * FA8750-10-C-0237 ("CTSRD"), as part of the DARPA CRASH research programme. + * + * Portions of this software were developed by the University of Cambridge + * Computer Laboratory as part of the CTSRD Project, with support from the + * UK Higher Education Innovation Fund (HEIF). + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#include + +#define VPECONF0_MVP (1 << 1) + + .set noreorder + +#ifdef SMP +/* + * This function must be implemented in assembly because it is called early + * in AP boot without a valid stack. 
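An AP enters platform_processor_id() before it has a stack, which is why the routine lives in assembly: it must not spill and clobbers only v0. All it does is read CP0 register 15 select 1 (the MIPS32r2 EBase register) and mask off the low five bits, the CPUNum field. For reference, a C-level equivalent using inline assembly, usable only once a stack exists; MIPS32r2 targets only, and the field width is taken from the architecture manual, so treat it as an assumption:

#include <stdint.h>

/* Read EBase (CP0 reg 15, sel 1) and extract CPUNum. */
static inline uint32_t
processor_id(void)
{
	uint32_t ebase;

	__asm __volatile(
	    "	.set push		\n"
	    "	.set mips32r2		\n"
	    "	mfc0 %0, $15, 1		\n"
	    "	.set pop		\n"
	    : "=r" (ebase));
	return (ebase & 0x1f);		/* CPUNum field */
}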
+ */ +LEAF(platform_processor_id) + .set push + .set mips32r2 + mfc0 v0, $15, 1 + jr ra + andi v0, 0x1f + .set pop +END(platform_processor_id) + +LEAF(enable_mvp) + .set push + .set mips32r2 + .set noat + li t2, (VPECONF0_MVP) + move $1, t2 + jr ra + .word 0x41810000 | (1 << 11) | 2 # mttc0 t2, $1, 2 + .set pop +END(enable_mvp) + +/* + * Called on APs to wait until they are told to launch. + */ +LEAF(malta_ap_wait) + jal platform_processor_id + nop + +1: + ll t0, malta_ap_boot + bne v0, t0, 1b + nop + + move t0, zero + sc t0, malta_ap_boot + + beqz t0, 1b + nop + + j mpentry + nop +END(malta_ap_wait) +#endif Property changes on: projects/clang390-import/sys/mips/malta/asm_malta.S ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang390-import/sys/mips/malta/files.malta =================================================================== --- projects/clang390-import/sys/mips/malta/files.malta (revision 305686) +++ projects/clang390-import/sys/mips/malta/files.malta (revision 305687) @@ -1,12 +1,16 @@ # $FreeBSD$ mips/malta/gt.c standard mips/malta/gt_pci.c standard mips/malta/gt_pci_bus_space.c standard mips/malta/obio.c optional uart mips/malta/uart_cpu_maltausart.c optional uart mips/malta/uart_bus_maltausart.c optional uart dev/uart/uart_dev_ns8250.c optional uart mips/malta/malta_machdep.c standard mips/malta/yamon.c standard mips/mips/intr_machdep.c standard mips/mips/tick.c standard + +# SMP +mips/malta/asm_malta.S optional smp +mips/malta/malta_mp.c optional smp Index: projects/clang390-import/sys/mips/malta/malta_mp.c =================================================================== --- projects/clang390-import/sys/mips/malta/malta_mp.c (nonexistent) +++ projects/clang390-import/sys/mips/malta/malta_mp.c (revision 305687) @@ -0,0 +1,226 @@ +/*- + * Copyright (c) 2016 Ruslan Bukin + * All rights reserved. + * + * Portions of this software were developed by SRI International and the + * University of Cambridge Computer Laboratory under DARPA/AFRL contract + * FA8750-10-C-0237 ("CTSRD"), as part of the DARPA CRASH research programme. + * + * Portions of this software were developed by the University of Cambridge + * Computer Laboratory as part of the CTSRD Project, with support from the + * UK Higher Education Innovation Fund (HEIF). + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#define MALTA_MAXCPU 2 + +unsigned malta_ap_boot = ~0; + +#define C_SW0 (1 << 8) +#define C_SW1 (1 << 9) +#define C_IRQ0 (1 << 10) +#define C_IRQ1 (1 << 11) +#define C_IRQ2 (1 << 12) +#define C_IRQ3 (1 << 13) +#define C_IRQ4 (1 << 14) +#define C_IRQ5 (1 << 15) + +static inline void +ehb(void) +{ + __asm __volatile( + " .set mips32r2 \n" + " ehb \n" + " .set mips0 \n"); +} + +#define mttc0(rd, sel, val) \ +({ \ + __asm __volatile( \ + " .set push \n" \ + " .set mips32r2 \n" \ + " .set noat \n" \ + " move $1, %0 \n" \ + " .word 0x41810000 | (" #rd " << 11) | " #sel " \n" \ + " .set pop \n" \ + :: "r" (val)); \ +}) + +#define mftc0(rt, sel) \ +({ \ + unsigned long __res; \ + __asm __volatile( \ + " .set push \n" \ + " .set mips32r2 \n" \ + " .set noat \n" \ + " .word 0x41000800 | (" #rt " << 16) | " #sel " \n" \ + " move %0, $1 \n" \ + " .set pop \n" \ + : "=r" (__res)); \ + __res; \ +}) + +#define write_c0_register32(reg, sel, val) \ +({ \ + __asm __volatile( \ + " .set push \n" \ + " .set mips32 \n" \ + " mtc0 %0, $%1, %2 \n" \ + " .set pop \n" \ + :: "r" (val), "i" (reg), "i" (sel)); \ +}) + +#define read_c0_register32(reg, sel) \ +({ \ + uint32_t __retval; \ + __asm __volatile( \ + " .set push \n" \ + " .set mips32 \n" \ + " mfc0 %0, $%1, %2 \n" \ + " .set pop \n" \ + : "=r" (__retval) : "i" (reg), "i" (sel)); \ + __retval; \ +}) + +void +platform_ipi_send(int cpuid) +{ + uint32_t reg; + + /* + * Set thread context. + * Note this is not global, so we don't need lock. + */ + reg = read_c0_register32(1, 1); + reg &= ~(0xff); + reg |= cpuid; + write_c0_register32(1, 1, reg); + + ehb(); + + /* Set cause */ + reg = mftc0(13, 0); + mttc0(13, 0, (reg | C_SW1)); +} + +void +platform_ipi_clear(void) +{ + uint32_t reg; + + reg = mips_rd_cause(); + reg &= ~(C_SW1); + mips_wr_cause(reg); +} + +int +platform_ipi_hardintr_num(void) +{ + + return (-1); +} + +int +platform_ipi_softintr_num(void) +{ + + return (1); +} + +void +platform_init_ap(int cpuid) +{ + uint32_t clock_int_mask; + uint32_t ipi_intr_mask; + + /* + * Clear any pending IPIs. + */ + platform_ipi_clear(); + + /* + * Unmask the clock and ipi interrupts. 
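The set_intr_mask() call that follows composes the AP's initial interrupt mask from exactly two sources: the soft interrupt used for IPIs (number 1, i.e. C_SW1) and hard interrupt 5, the CPU clock. Given the cause-bit layout defined earlier in this file, a small sketch of the mask arithmetic; the soft_int_mask()/hard_int_mask() definitions below are modeled on the usual MIPS helpers and are an assumption, not taken from this diff:

#include <stdio.h>
#include <stdint.h>

/* Cause-register bit layout from the diff: SW0..SW1, then IRQ0..IRQ5. */
#define C_SW1	(1 << 9)
#define C_IRQ5	(1 << 15)

/* Modeled on the usual MIPS helpers; assumed here. */
#define soft_int_mask(n)	(1 << ((n) + 8))
#define hard_int_mask(n)	(1 << ((n) + 10))

int
main(void)
{
	uint32_t ipi_mask, clock_mask;

	ipi_mask = soft_int_mask(1);	/* platform_ipi_softintr_num() */
	clock_mask = hard_int_mask(5);	/* the clock interrupt */
	printf("C_SW1 ok: %d, C_IRQ5 ok: %d, mask = %#x\n",
	    ipi_mask == C_SW1, clock_mask == C_IRQ5,
	    ipi_mask | clock_mask);
	return (0);
}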
+	 */
+	ipi_intr_mask = soft_int_mask(platform_ipi_softintr_num());
+	clock_int_mask = hard_int_mask(5);
+	set_intr_mask(ipi_intr_mask | clock_int_mask);
+
+	mips_wbflush();
+}
+
+void
+platform_cpu_mask(cpuset_t *mask)
+{
+	uint32_t i, m;
+
+	CPU_ZERO(mask);
+	for (i = 0, m = 1 ; i < MALTA_MAXCPU; i++, m <<= 1)
+		CPU_SET(i, mask);
+}
+
+struct cpu_group *
+platform_smp_topo(void)
+{
+
+	return (smp_topo_none());
+}
+
+int
+platform_start_ap(int cpuid)
+{
+	int timeout;
+
+	/* Publish the target cpuid; the AP spins in malta_ap_wait. */
+	if (atomic_cmpset_32(&malta_ap_boot, ~0, cpuid) == 0)
+		return (-1);
+
+	printf("Waiting for cpu%d to start\n", cpuid);
+
+	timeout = 100;
+	do {
+		DELAY(1000);
+		/* The AP acks by storing 0; rearm to ~0 for the next AP. */
+		if (atomic_cmpset_32(&malta_ap_boot, 0, ~0) != 0) {
+			printf("CPU %d started\n", cpuid);
+			return (0);
+		}
+	} while (timeout--);
+
+	printf("CPU %d failed to start\n", cpuid);
+
+	return (-1);
+}

Property changes on: projects/clang390-import/sys/mips/malta/malta_mp.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property

Index: projects/clang390-import/sys/mips/malta/std.malta
===================================================================
--- projects/clang390-import/sys/mips/malta/std.malta	(revision 305686)
+++ projects/clang390-import/sys/mips/malta/std.malta	(revision 305687)
@@ -1,11 +1,11 @@
# $FreeBSD$
files	"../malta/files.malta"
-cpu	CPU_MIPS4KC
+cpu	CPU_MALTA
device	pci
device	ata
device	scbus	# SCSI bus (required for ATA/SCSI)
device	cd	# CD
device	da	# Direct Access (disks)
device	pass	# Passthrough device (direct ATA/SCSI access)

Index: projects/clang390-import/sys/mips/mips/locore.S
===================================================================
--- projects/clang390-import/sys/mips/mips/locore.S	(revision 305686)
+++ projects/clang390-import/sys/mips/mips/locore.S	(revision 305687)
@@ -1,187 +1,202 @@
/*	$OpenBSD: locore.S,v 1.18 1998/09/15 10:58:53 pefo Exp $	*/

/*-
 * Copyright (c) 1992, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Digital Equipment Corporation and Ralph Campbell.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * Copyright (C) 1989 Digital Equipment Corporation. * Permission to use, copy, modify, and distribute this software and * its documentation for any purpose and without fee is hereby granted, * provided that the above copyright notice appears in all copies. * Digital Equipment Corporation makes no representations about the * suitability of this software for any purpose. It is provided "as is" * without express or implied warranty. * * from: Header: /sprite/src/kernel/mach/ds3100.md/RCS/loMem.s, * v 1.1 89/07/11 17:55:04 nelson Exp SPRITE (DECWRL) * from: Header: /sprite/src/kernel/mach/ds3100.md/RCS/machAsm.s, * v 9.2 90/01/29 18:00:39 shirriff Exp SPRITE (DECWRL) * from: Header: /sprite/src/kernel/vm/ds3100.md/vmPmaxAsm.s, * v 1.1 89/07/10 14:27:41 nelson Exp SPRITE (DECWRL) * * from: @(#)locore.s 8.5 (Berkeley) 1/4/94 * JNPR: locore.S,v 1.6.2.1 2007/08/29 12:24:49 girish * $FreeBSD$ */ /* * FREEBSD_DEVELOPERS_FIXME * The start routine below was written for a multi-core CPU * with each core being hyperthreaded. This serves as an example * for a complex CPU architecture. For a different CPU complex * please make necessary changes to read CPU-ID etc. * A clean solution would be to have a different locore file for * each CPU type. */ /* * Contains code that is the first executed at boot time plus * assembly language support routines. */ #include #include #include #include #include "assym.s" .data #ifdef YAMON GLOBAL(fenvp) .space 4 # Assumes mips32? Is that OK? #endif .set noreorder .text GLOBAL(btext) ASM_ENTRY(_start) VECTOR(_locore, unknown) /* UNSAFE TO USE a0..a3, need to preserve the args from boot loader */ mtc0 zero, MIPS_COP_0_CAUSE # Clear soft interrupts #if defined(CPU_CNMIPS) /* * t1: Bits to set explicitly: * Enable FPU */ /* Set these bits */ li t1, (MIPS_SR_COP_0_BIT | MIPS_SR_PX | MIPS_SR_KX | MIPS_SR_UX | MIPS_SR_SX | MIPS_SR_BEV) /* Reset these bits */ li t0, ~(MIPS_SR_DE | MIPS_SR_SR | MIPS_SR_ERL | MIPS_SR_EXL | MIPS_SR_INT_IE | MIPS_SR_COP_2_BIT) #elif defined (CPU_RMI) || defined (CPU_NLM) /* Set these bits */ li t1, (MIPS_SR_COP_2_BIT | MIPS_SR_COP_0_BIT | MIPS_SR_KX | MIPS_SR_UX) /* Reset these bits */ li t0, ~(MIPS_SR_BEV | MIPS_SR_SR | MIPS_SR_INT_IE) #else /* * t0: Bits to preserve if set: * Soft reset * Boot exception vectors (firmware-provided) */ li t0, (MIPS_SR_BEV | MIPS_SR_SR) /* * t1: Bits to set explicitly: * Enable FPU */ li t1, MIPS_SR_COP_1_BIT #ifdef __mips_n64 or t1, MIPS_SR_KX | MIPS_SR_SX | MIPS_SR_UX #endif #endif /* * Read coprocessor 0 status register, clear bits not * preserved (namely, clearing interrupt bits), and set * bits we want to explicitly set. */ mfc0 t2, MIPS_COP_0_STATUS and t2, t0 or t2, t1 mtc0 t2, MIPS_COP_0_STATUS COP0_SYNC /* Make sure KSEG0 is cached */ li t0, MIPS_CCA_CACHED mtc0 t0, MIPS_COP_0_CONFIG COP0_SYNC /*xxximp * now that we pass a0...a3 to the platform_init routine, do we need * to stash this stuff here? 
*/ #ifdef YAMON /* Save YAMON boot environment pointer */ sw a2, _C_LABEL(fenvp) #endif #if defined(CPU_CNMIPS) && defined(SMP) .set push .set mips32r2 rdhwr t2, $0 beqz t2, 1f nop j octeon_ap_wait nop .set pop 1: #endif +#if defined(CPU_MALTA) && defined(SMP) + .set push + .set mips32r2 + jal enable_mvp + nop + jal platform_processor_id + nop + beqz v0, 1f + nop + j malta_ap_wait + nop + .set pop +1: +#endif + /* * Initialize stack and call machine startup. */ PTR_LA sp, _C_LABEL(pcpu_space) PTR_ADDU sp, (PAGE_SIZE * 2) - CALLFRAME_SIZ REG_S zero, CALLFRAME_RA(sp) # Zero out old ra for debugger REG_S zero, CALLFRAME_SP(sp) # Zero out old fp for debugger PTR_LA gp, _C_LABEL(_gp) /* Call the platform-specific startup code. */ jal _C_LABEL(platform_start) nop PTR_LA sp, _C_LABEL(thread0_st) PTR_L a0, TD_PCB(sp) REG_LI t0, ~7 and a0, a0, t0 PTR_SUBU sp, a0, CALLFRAME_SIZ jal _C_LABEL(mi_startup) # mi_startup(frame) sw zero, CALLFRAME_SIZ - 8(sp) # Zero out old fp for debugger PANIC("Startup failed!") VECTOR_END(_locore) Index: projects/clang390-import/sys/net80211/ieee80211_freebsd.c =================================================================== --- projects/clang390-import/sys/net80211/ieee80211_freebsd.c (revision 305686) +++ projects/clang390-import/sys/net80211/ieee80211_freebsd.c (revision 305687) @@ -1,936 +1,972 @@ /*- * Copyright (c) 2003-2009 Sam Leffler, Errno Consulting * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); /* * IEEE 802.11 support (FreeBSD-specific code) */ #include "opt_wlan.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include SYSCTL_NODE(_net, OID_AUTO, wlan, CTLFLAG_RD, 0, "IEEE 80211 parameters"); #ifdef IEEE80211_DEBUG static int ieee80211_debug = 0; SYSCTL_INT(_net_wlan, OID_AUTO, debug, CTLFLAG_RW, &ieee80211_debug, 0, "debugging printfs"); #endif static MALLOC_DEFINE(M_80211_COM, "80211com", "802.11 com state"); static const char wlanname[] = "wlan"; static struct if_clone *wlan_cloner; static int wlan_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct ieee80211_clone_params cp; struct ieee80211vap *vap; struct ieee80211com *ic; int error; error = copyin(params, &cp, sizeof(cp)); if (error) return error; ic = ieee80211_find_com(cp.icp_parent); if (ic == NULL) return ENXIO; if (cp.icp_opmode >= IEEE80211_OPMODE_MAX) { ic_printf(ic, "%s: invalid opmode %d\n", __func__, cp.icp_opmode); return EINVAL; } if ((ic->ic_caps & ieee80211_opcap[cp.icp_opmode]) == 0) { ic_printf(ic, "%s mode not supported\n", ieee80211_opmode_name[cp.icp_opmode]); return EOPNOTSUPP; } if ((cp.icp_flags & IEEE80211_CLONE_TDMA) && #ifdef IEEE80211_SUPPORT_TDMA (ic->ic_caps & IEEE80211_C_TDMA) == 0 #else (1) #endif ) { ic_printf(ic, "TDMA not supported\n"); return EOPNOTSUPP; } vap = ic->ic_vap_create(ic, wlanname, unit, cp.icp_opmode, cp.icp_flags, cp.icp_bssid, cp.icp_flags & IEEE80211_CLONE_MACADDR ? cp.icp_macaddr : ic->ic_macaddr); return (vap == NULL ? EIO : 0); } static void wlan_clone_destroy(struct ifnet *ifp) { struct ieee80211vap *vap = ifp->if_softc; struct ieee80211com *ic = vap->iv_ic; ic->ic_vap_delete(vap); } void ieee80211_vap_destroy(struct ieee80211vap *vap) { CURVNET_SET(vap->iv_ifp->if_vnet); if_clone_destroyif(wlan_cloner, vap->iv_ifp); CURVNET_RESTORE(); } int ieee80211_sysctl_msecs_ticks(SYSCTL_HANDLER_ARGS) { int msecs = ticks_to_msecs(*(int *)arg1); int error, t; error = sysctl_handle_int(oidp, &msecs, 0, req); if (error || !req->newptr) return error; t = msecs_to_ticks(msecs); *(int *)arg1 = (t < 1) ? 
1 : t; return 0; } static int ieee80211_sysctl_inact(SYSCTL_HANDLER_ARGS) { int inact = (*(int *)arg1) * IEEE80211_INACT_WAIT; int error; error = sysctl_handle_int(oidp, &inact, 0, req); if (error || !req->newptr) return error; *(int *)arg1 = inact / IEEE80211_INACT_WAIT; return 0; } static int ieee80211_sysctl_parent(SYSCTL_HANDLER_ARGS) { struct ieee80211com *ic = arg1; return SYSCTL_OUT_STR(req, ic->ic_name); } static int ieee80211_sysctl_radar(SYSCTL_HANDLER_ARGS) { struct ieee80211com *ic = arg1; int t = 0, error; error = sysctl_handle_int(oidp, &t, 0, req); if (error || !req->newptr) return error; IEEE80211_LOCK(ic); ieee80211_dfs_notify_radar(ic, ic->ic_curchan); IEEE80211_UNLOCK(ic); return 0; } void ieee80211_sysctl_attach(struct ieee80211com *ic) { } void ieee80211_sysctl_detach(struct ieee80211com *ic) { } void ieee80211_sysctl_vattach(struct ieee80211vap *vap) { struct ifnet *ifp = vap->iv_ifp; struct sysctl_ctx_list *ctx; struct sysctl_oid *oid; char num[14]; /* sufficient for 32 bits */ ctx = (struct sysctl_ctx_list *) IEEE80211_MALLOC(sizeof(struct sysctl_ctx_list), M_DEVBUF, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (ctx == NULL) { if_printf(ifp, "%s: cannot allocate sysctl context!\n", __func__); return; } sysctl_ctx_init(ctx); snprintf(num, sizeof(num), "%u", ifp->if_dunit); oid = SYSCTL_ADD_NODE(ctx, &SYSCTL_NODE_CHILDREN(_net, wlan), OID_AUTO, num, CTLFLAG_RD, NULL, ""); SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "%parent", CTLTYPE_STRING | CTLFLAG_RD, vap->iv_ic, 0, ieee80211_sysctl_parent, "A", "parent device"); SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "driver_caps", CTLFLAG_RW, &vap->iv_caps, 0, "driver capabilities"); #ifdef IEEE80211_DEBUG vap->iv_debug = ieee80211_debug; SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "debug", CTLFLAG_RW, &vap->iv_debug, 0, "control debugging printfs"); #endif SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "bmiss_max", CTLFLAG_RW, &vap->iv_bmiss_max, 0, "consecutive beacon misses before scanning"); /* XXX inherit from tunables */ SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "inact_run", CTLTYPE_INT | CTLFLAG_RW, &vap->iv_inact_run, 0, ieee80211_sysctl_inact, "I", "station inactivity timeout (sec)"); SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "inact_probe", CTLTYPE_INT | CTLFLAG_RW, &vap->iv_inact_probe, 0, ieee80211_sysctl_inact, "I", "station inactivity probe timeout (sec)"); SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "inact_auth", CTLTYPE_INT | CTLFLAG_RW, &vap->iv_inact_auth, 0, ieee80211_sysctl_inact, "I", "station authentication timeout (sec)"); SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "inact_init", CTLTYPE_INT | CTLFLAG_RW, &vap->iv_inact_init, 0, ieee80211_sysctl_inact, "I", "station initial state timeout (sec)"); if (vap->iv_htcaps & IEEE80211_HTC_HT) { SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "ampdu_mintraffic_bk", CTLFLAG_RW, &vap->iv_ampdu_mintraffic[WME_AC_BK], 0, "BK traffic tx aggr threshold (pps)"); SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "ampdu_mintraffic_be", CTLFLAG_RW, &vap->iv_ampdu_mintraffic[WME_AC_BE], 0, "BE traffic tx aggr threshold (pps)"); SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "ampdu_mintraffic_vo", CTLFLAG_RW, &vap->iv_ampdu_mintraffic[WME_AC_VO], 0, "VO traffic tx aggr threshold (pps)"); SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "ampdu_mintraffic_vi", CTLFLAG_RW, &vap->iv_ampdu_mintraffic[WME_AC_VI], 0, "VI traffic tx aggr threshold (pps)"); } if (vap->iv_caps & IEEE80211_C_DFS) { 
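		/* Writing any value fakes a radar event on the current channel. */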
SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(oid), OID_AUTO, "radar", CTLTYPE_INT | CTLFLAG_RW, vap->iv_ic, 0, ieee80211_sysctl_radar, "I", "simulate radar event"); } vap->iv_sysctl = ctx; vap->iv_oid = oid; } void ieee80211_sysctl_vdetach(struct ieee80211vap *vap) { if (vap->iv_sysctl != NULL) { sysctl_ctx_free(vap->iv_sysctl); IEEE80211_FREE(vap->iv_sysctl, M_DEVBUF); vap->iv_sysctl = NULL; } } int ieee80211_node_dectestref(struct ieee80211_node *ni) { /* XXX need equivalent of atomic_dec_and_test */ atomic_subtract_int(&ni->ni_refcnt, 1); return atomic_cmpset_int(&ni->ni_refcnt, 0, 1); } void ieee80211_drain_ifq(struct ifqueue *ifq) { struct ieee80211_node *ni; struct mbuf *m; for (;;) { IF_DEQUEUE(ifq, m); if (m == NULL) break; ni = (struct ieee80211_node *)m->m_pkthdr.rcvif; KASSERT(ni != NULL, ("frame w/o node")); ieee80211_free_node(ni); m->m_pkthdr.rcvif = NULL; m_freem(m); } } void ieee80211_flush_ifq(struct ifqueue *ifq, struct ieee80211vap *vap) { struct ieee80211_node *ni; struct mbuf *m, **mprev; IF_LOCK(ifq); mprev = &ifq->ifq_head; while ((m = *mprev) != NULL) { ni = (struct ieee80211_node *)m->m_pkthdr.rcvif; if (ni != NULL && ni->ni_vap == vap) { *mprev = m->m_nextpkt; /* remove from list */ ifq->ifq_len--; m_freem(m); ieee80211_free_node(ni); /* reclaim ref */ } else mprev = &m->m_nextpkt; } /* recalculate tail ptr */ m = ifq->ifq_head; for (; m != NULL && m->m_nextpkt != NULL; m = m->m_nextpkt) ; ifq->ifq_tail = m; IF_UNLOCK(ifq); } /* * As above, for mbufs allocated with m_gethdr/MGETHDR * or initialized by M_COPY_PKTHDR. */ #define MC_ALIGN(m, len) \ do { \ (m)->m_data += rounddown2(MCLBYTES - (len), sizeof(long)); \ } while (/* CONSTCOND */ 0) /* * Allocate and setup a management frame of the specified * size. We return the mbuf and a pointer to the start * of the contiguous data area that's been reserved based * on the packet length. The data area is forced to 32-bit * alignment and the buffer length to a multiple of 4 bytes. * This is done mainly so beacon frames (that require this) * can use this interface too. */ struct mbuf * ieee80211_getmgtframe(uint8_t **frm, int headroom, int pktlen) { struct mbuf *m; u_int len; /* * NB: we know the mbuf routines will align the data area * so we don't need to do anything special. */ len = roundup2(headroom + pktlen, 4); KASSERT(len <= MCLBYTES, ("802.11 mgt frame too large: %u", len)); if (len < MINCLSIZE) { m = m_gethdr(M_NOWAIT, MT_DATA); /* * Align the data in case additional headers are added. * This should only happen when a WEP header is added * which only happens for shared key authentication mgt * frames which all fit in MHLEN. */ if (m != NULL) M_ALIGN(m, len); } else { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m != NULL) MC_ALIGN(m, len); } if (m != NULL) { m->m_data += headroom; *frm = m->m_data; } return m; } #ifndef __NO_STRICT_ALIGNMENT /* * Re-align the payload in the mbuf. This is mainly used (right now) * to handle IP header alignment requirements on certain architectures. */ struct mbuf * ieee80211_realign(struct ieee80211vap *vap, struct mbuf *m, size_t align) { int pktlen, space; struct mbuf *n; pktlen = m->m_pkthdr.len; space = pktlen + align; if (space < MINCLSIZE) n = m_gethdr(M_NOWAIT, MT_DATA); else { n = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, space <= MCLBYTES ? MCLBYTES : #if MJUMPAGESIZE != MCLBYTES space <= MJUMPAGESIZE ? MJUMPAGESIZE : #endif space <= MJUM9BYTES ? 
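		    /* choose the smallest cluster that fits: 2k, page-size, 9k, else 16k */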
MJUM9BYTES : MJUM16BYTES); } if (__predict_true(n != NULL)) { m_move_pkthdr(n, m); n->m_data = (caddr_t)(ALIGN(n->m_data + align) - align); m_copydata(m, 0, pktlen, mtod(n, caddr_t)); n->m_len = pktlen; } else { IEEE80211_DISCARD(vap, IEEE80211_MSG_ANY, mtod(m, const struct ieee80211_frame *), NULL, "%s", "no mbuf to realign"); vap->iv_stats.is_rx_badalign++; } m_freem(m); return n; } #endif /* !__NO_STRICT_ALIGNMENT */ int ieee80211_add_callback(struct mbuf *m, void (*func)(struct ieee80211_node *, void *, int), void *arg) { struct m_tag *mtag; struct ieee80211_cb *cb; mtag = m_tag_alloc(MTAG_ABI_NET80211, NET80211_TAG_CALLBACK, sizeof(struct ieee80211_cb), M_NOWAIT); if (mtag == NULL) return 0; cb = (struct ieee80211_cb *)(mtag+1); cb->func = func; cb->arg = arg; m_tag_prepend(m, mtag); m->m_flags |= M_TXCB; return 1; } int ieee80211_add_xmit_params(struct mbuf *m, const struct ieee80211_bpf_params *params) { struct m_tag *mtag; struct ieee80211_tx_params *tx; mtag = m_tag_alloc(MTAG_ABI_NET80211, NET80211_TAG_XMIT_PARAMS, sizeof(struct ieee80211_tx_params), M_NOWAIT); if (mtag == NULL) return (0); tx = (struct ieee80211_tx_params *)(mtag+1); memcpy(&tx->params, params, sizeof(struct ieee80211_bpf_params)); m_tag_prepend(m, mtag); return (1); } int ieee80211_get_xmit_params(struct mbuf *m, struct ieee80211_bpf_params *params) { struct m_tag *mtag; struct ieee80211_tx_params *tx; mtag = m_tag_locate(m, MTAG_ABI_NET80211, NET80211_TAG_XMIT_PARAMS, NULL); if (mtag == NULL) return (-1); tx = (struct ieee80211_tx_params *)(mtag + 1); memcpy(params, &tx->params, sizeof(struct ieee80211_bpf_params)); return (0); } void ieee80211_process_callback(struct ieee80211_node *ni, struct mbuf *m, int status) { struct m_tag *mtag; mtag = m_tag_locate(m, MTAG_ABI_NET80211, NET80211_TAG_CALLBACK, NULL); if (mtag != NULL) { struct ieee80211_cb *cb = (struct ieee80211_cb *)(mtag+1); cb->func(ni, cb->arg, status); } } /* * Add RX parameters to the given mbuf. * * Returns 1 if OK, 0 on error. */ int ieee80211_add_rx_params(struct mbuf *m, const struct ieee80211_rx_stats *rxs) { struct m_tag *mtag; struct ieee80211_rx_params *rx; mtag = m_tag_alloc(MTAG_ABI_NET80211, NET80211_TAG_RECV_PARAMS, sizeof(struct ieee80211_rx_stats), M_NOWAIT); if (mtag == NULL) return (0); rx = (struct ieee80211_rx_params *)(mtag + 1); memcpy(&rx->params, rxs, sizeof(*rxs)); m_tag_prepend(m, mtag); return (1); } int ieee80211_get_rx_params(struct mbuf *m, struct ieee80211_rx_stats *rxs) { struct m_tag *mtag; struct ieee80211_rx_params *rx; mtag = m_tag_locate(m, MTAG_ABI_NET80211, NET80211_TAG_RECV_PARAMS, NULL); if (mtag == NULL) return (-1); rx = (struct ieee80211_rx_params *)(mtag + 1); memcpy(rxs, &rx->params, sizeof(*rxs)); return (0); } /* + * Add TOA parameters to the given mbuf. 
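+ *
+ * Like the xmit/recv parameters above, the data rides on the mbuf as
+ * an m_tag so it survives queueing; both helpers return 1 on success
+ * and 0 otherwise.  A hypothetical caller (local names illustrative
+ * only) would do:
+ *
+ *	struct ieee80211_toa_params tp = { .request_id = my_id };
+ *	if (ieee80211_add_toa_params(m, &tp) == 0)
+ *		m_freem(m);	-- no memory for the tag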
+ */
+int
+ieee80211_add_toa_params(struct mbuf *m, const struct ieee80211_toa_params *p)
+{
+	struct m_tag *mtag;
+	struct ieee80211_toa_params *rp;
+
+	mtag = m_tag_alloc(MTAG_ABI_NET80211, NET80211_TAG_TOA_PARAMS,
+	    sizeof(struct ieee80211_toa_params), M_NOWAIT);
+	if (mtag == NULL)
+		return (0);
+
+	rp = (struct ieee80211_toa_params *)(mtag + 1);
+	memcpy(rp, p, sizeof(*rp));
+	m_tag_prepend(m, mtag);
+	return (1);
+}
+
+int
+ieee80211_get_toa_params(struct mbuf *m, struct ieee80211_toa_params *p)
+{
+	struct m_tag *mtag;
+	struct ieee80211_toa_params *rp;
+
+	mtag = m_tag_locate(m, MTAG_ABI_NET80211, NET80211_TAG_TOA_PARAMS,
+	    NULL);
+	if (mtag == NULL)
+		return (0);
+	rp = (struct ieee80211_toa_params *)(mtag + 1);
+	if (p != NULL)
+		memcpy(p, rp, sizeof(*p));
+	return (1);
+}
+
+/*
 * Transmit a frame to the parent interface.
 */
int
ieee80211_parent_xmitpkt(struct ieee80211com *ic, struct mbuf *m)
{
	int error;

	/*
	 * Assert the IC TX lock is held - this enforces the
	 * processing -> queuing order is maintained
	 */
	IEEE80211_TX_LOCK_ASSERT(ic);
	error = ic->ic_transmit(ic, m);
	if (error) {
		struct ieee80211_node *ni;

		ni = (struct ieee80211_node *)m->m_pkthdr.rcvif;

		/* XXX number of fragments */
		if_inc_counter(ni->ni_vap->iv_ifp, IFCOUNTER_OERRORS, 1);
		ieee80211_free_node(ni);
		ieee80211_free_mbuf(m);
	}
	return (error);
}

/*
 * Transmit a frame to the VAP interface.
 */
int
ieee80211_vap_xmitpkt(struct ieee80211vap *vap, struct mbuf *m)
{
	struct ifnet *ifp = vap->iv_ifp;

	/*
	 * When transmitting via the VAP, we shouldn't hold
	 * any IC TX lock as the VAP TX path will acquire it.
	 */
	IEEE80211_TX_UNLOCK_ASSERT(vap->iv_ic);

	return (ifp->if_transmit(ifp, m));
}

#include

void
get_random_bytes(void *p, size_t n)
{
	uint8_t *dp = p;

	while (n > 0) {
		uint32_t v = arc4random();
		size_t nb = n > sizeof(uint32_t) ? sizeof(uint32_t) : n;

		bcopy(&v, dp, nb);
		dp += sizeof(uint32_t), n -= nb;
	}
}

/*
 * Helper function for events that pass just a single mac address.
 */
static void
notify_macaddr(struct ifnet *ifp, int op, const uint8_t mac[IEEE80211_ADDR_LEN])
{
	struct ieee80211_join_event iev;

	CURVNET_SET(ifp->if_vnet);
	memset(&iev, 0, sizeof(iev));
	IEEE80211_ADDR_COPY(iev.iev_addr, mac);
	rt_ieee80211msg(ifp, op, &iev, sizeof(iev));
	CURVNET_RESTORE();
}

void
ieee80211_notify_node_join(struct ieee80211_node *ni, int newassoc)
{
	struct ieee80211vap *vap = ni->ni_vap;
	struct ifnet *ifp = vap->iv_ifp;

	CURVNET_SET_QUIET(ifp->if_vnet);
	IEEE80211_NOTE(vap, IEEE80211_MSG_NODE, ni, "%snode join",
	    (ni == vap->iv_bss) ? "bss " : "");
	if (ni == vap->iv_bss) {
		notify_macaddr(ifp, newassoc ?
		    RTM_IEEE80211_ASSOC : RTM_IEEE80211_REASSOC, ni->ni_bssid);
		if_link_state_change(ifp, LINK_STATE_UP);
	} else {
		notify_macaddr(ifp, newassoc ?
		    RTM_IEEE80211_JOIN : RTM_IEEE80211_REJOIN, ni->ni_macaddr);
	}
	CURVNET_RESTORE();
}

void
ieee80211_notify_node_leave(struct ieee80211_node *ni)
{
	struct ieee80211vap *vap = ni->ni_vap;
	struct ifnet *ifp = vap->iv_ifp;

	CURVNET_SET_QUIET(ifp->if_vnet);
	IEEE80211_NOTE(vap, IEEE80211_MSG_NODE, ni, "%snode leave",
	    (ni == vap->iv_bss) ? "bss " : "");
	if (ni == vap->iv_bss) {
		rt_ieee80211msg(ifp, RTM_IEEE80211_DISASSOC, NULL, 0);
		if_link_state_change(ifp, LINK_STATE_DOWN);
	} else {
		/* fire off wireless event station leaving */
		notify_macaddr(ifp, RTM_IEEE80211_LEAVE, ni->ni_macaddr);
	}
	CURVNET_RESTORE();
}

void
ieee80211_notify_scan_done(struct ieee80211vap *vap)
{
	struct ifnet *ifp = vap->iv_ifp;

	IEEE80211_DPRINTF(vap, IEEE80211_MSG_SCAN, "%s\n", "notify scan done");

	/* dispatch wireless event indicating scan completed */
	CURVNET_SET(ifp->if_vnet);
	rt_ieee80211msg(ifp, RTM_IEEE80211_SCAN, NULL, 0);
	CURVNET_RESTORE();
}

void
ieee80211_notify_replay_failure(struct ieee80211vap *vap,
	const struct ieee80211_frame *wh, const struct ieee80211_key *k,
	u_int64_t rsc, int tid)
{
	struct ifnet *ifp = vap->iv_ifp;

	IEEE80211_NOTE_MAC(vap, IEEE80211_MSG_CRYPTO, wh->i_addr2,
	    "%s replay detected tid %d <rsc %ju, csc %ju, keyix %u rxkeyix %u>",
	    k->wk_cipher->ic_name, tid,
	    (intmax_t) rsc,
	    (intmax_t) k->wk_keyrsc[tid],
	    k->wk_keyix, k->wk_rxkeyix);

	if (ifp != NULL) {		/* NB: for cipher test modules */
		struct ieee80211_replay_event iev;

		IEEE80211_ADDR_COPY(iev.iev_dst, wh->i_addr1);
		IEEE80211_ADDR_COPY(iev.iev_src, wh->i_addr2);
		iev.iev_cipher = k->wk_cipher->ic_cipher;
		if (k->wk_rxkeyix != IEEE80211_KEYIX_NONE)
			iev.iev_keyix = k->wk_rxkeyix;
		else
			iev.iev_keyix = k->wk_keyix;
		iev.iev_keyrsc = k->wk_keyrsc[tid];
		iev.iev_rsc = rsc;
		CURVNET_SET(ifp->if_vnet);
		rt_ieee80211msg(ifp, RTM_IEEE80211_REPLAY, &iev, sizeof(iev));
		CURVNET_RESTORE();
	}
}

void
ieee80211_notify_michael_failure(struct ieee80211vap *vap,
	const struct ieee80211_frame *wh, u_int keyix)
{
	struct ifnet *ifp = vap->iv_ifp;

	IEEE80211_NOTE_MAC(vap, IEEE80211_MSG_CRYPTO, wh->i_addr2,
	    "michael MIC verification failed <keyix %u>", keyix);
	vap->iv_stats.is_rx_tkipmic++;

	if (ifp != NULL) {		/* NB: for cipher test modules */
		struct ieee80211_michael_event iev;

		IEEE80211_ADDR_COPY(iev.iev_dst, wh->i_addr1);
		IEEE80211_ADDR_COPY(iev.iev_src, wh->i_addr2);
		iev.iev_cipher = IEEE80211_CIPHER_TKIP;
		iev.iev_keyix = keyix;
		CURVNET_SET(ifp->if_vnet);
		rt_ieee80211msg(ifp, RTM_IEEE80211_MICHAEL, &iev, sizeof(iev));
		CURVNET_RESTORE();
	}
}

void
ieee80211_notify_wds_discover(struct ieee80211_node *ni)
{
	struct ieee80211vap *vap = ni->ni_vap;
	struct ifnet *ifp = vap->iv_ifp;

	notify_macaddr(ifp, RTM_IEEE80211_WDS, ni->ni_macaddr);
}

void
ieee80211_notify_csa(struct ieee80211com *ic,
	const struct ieee80211_channel *c, int mode, int count)
{
	struct ieee80211_csa_event iev;
	struct ieee80211vap *vap;
	struct ifnet *ifp;

	memset(&iev, 0, sizeof(iev));
	iev.iev_flags = c->ic_flags;
	iev.iev_freq = c->ic_freq;
	iev.iev_ieee = c->ic_ieee;
	iev.iev_mode = mode;
	iev.iev_count = count;
	TAILQ_FOREACH(vap, &ic->ic_vaps, iv_next) {
		ifp = vap->iv_ifp;
		CURVNET_SET(ifp->if_vnet);
		rt_ieee80211msg(ifp, RTM_IEEE80211_CSA, &iev, sizeof(iev));
		CURVNET_RESTORE();
	}
}

void
ieee80211_notify_radar(struct ieee80211com *ic,
	const struct ieee80211_channel *c)
{
	struct ieee80211_radar_event iev;
	struct ieee80211vap *vap;
	struct ifnet *ifp;

	memset(&iev, 0, sizeof(iev));
	iev.iev_flags = c->ic_flags;
	iev.iev_freq = c->ic_freq;
	iev.iev_ieee = c->ic_ieee;
	TAILQ_FOREACH(vap, &ic->ic_vaps, iv_next) {
		ifp = vap->iv_ifp;
		CURVNET_SET(ifp->if_vnet);
		rt_ieee80211msg(ifp, RTM_IEEE80211_RADAR, &iev, sizeof(iev));
		CURVNET_RESTORE();
	}
}

void
ieee80211_notify_cac(struct ieee80211com *ic,
	const struct ieee80211_channel *c, enum ieee80211_notify_cac_event type)
{
	struct ieee80211_cac_event iev;
	struct ieee80211vap *vap;
	struct ifnet *ifp;

	memset(&iev, 0, sizeof(iev));
	iev.iev_flags = c->ic_flags;
	iev.iev_freq
= c->ic_freq; iev.iev_ieee = c->ic_ieee; iev.iev_type = type; TAILQ_FOREACH(vap, &ic->ic_vaps, iv_next) { ifp = vap->iv_ifp; CURVNET_SET(ifp->if_vnet); rt_ieee80211msg(ifp, RTM_IEEE80211_CAC, &iev, sizeof(iev)); CURVNET_RESTORE(); } } void ieee80211_notify_node_deauth(struct ieee80211_node *ni) { struct ieee80211vap *vap = ni->ni_vap; struct ifnet *ifp = vap->iv_ifp; IEEE80211_NOTE(vap, IEEE80211_MSG_NODE, ni, "%s", "node deauth"); notify_macaddr(ifp, RTM_IEEE80211_DEAUTH, ni->ni_macaddr); } void ieee80211_notify_node_auth(struct ieee80211_node *ni) { struct ieee80211vap *vap = ni->ni_vap; struct ifnet *ifp = vap->iv_ifp; IEEE80211_NOTE(vap, IEEE80211_MSG_NODE, ni, "%s", "node auth"); notify_macaddr(ifp, RTM_IEEE80211_AUTH, ni->ni_macaddr); } void ieee80211_notify_country(struct ieee80211vap *vap, const uint8_t bssid[IEEE80211_ADDR_LEN], const uint8_t cc[2]) { struct ifnet *ifp = vap->iv_ifp; struct ieee80211_country_event iev; memset(&iev, 0, sizeof(iev)); IEEE80211_ADDR_COPY(iev.iev_addr, bssid); iev.iev_cc[0] = cc[0]; iev.iev_cc[1] = cc[1]; CURVNET_SET(ifp->if_vnet); rt_ieee80211msg(ifp, RTM_IEEE80211_COUNTRY, &iev, sizeof(iev)); CURVNET_RESTORE(); } void ieee80211_notify_radio(struct ieee80211com *ic, int state) { struct ieee80211_radio_event iev; struct ieee80211vap *vap; struct ifnet *ifp; memset(&iev, 0, sizeof(iev)); iev.iev_state = state; TAILQ_FOREACH(vap, &ic->ic_vaps, iv_next) { ifp = vap->iv_ifp; CURVNET_SET(ifp->if_vnet); rt_ieee80211msg(ifp, RTM_IEEE80211_RADIO, &iev, sizeof(iev)); CURVNET_RESTORE(); } } void ieee80211_load_module(const char *modname) { #ifdef notyet (void)kern_kldload(curthread, modname, NULL); #else printf("%s: load the %s module by hand for now.\n", __func__, modname); #endif } static eventhandler_tag wlan_bpfevent; static eventhandler_tag wlan_ifllevent; static void bpf_track(void *arg, struct ifnet *ifp, int dlt, int attach) { /* NB: identify vap's by if_init */ if (dlt == DLT_IEEE802_11_RADIO && ifp->if_init == ieee80211_init) { struct ieee80211vap *vap = ifp->if_softc; /* * Track bpf radiotap listener state. We mark the vap * to indicate if any listener is present and the com * to indicate if any listener exists on any associated * vap. This flag is used by drivers to prepare radiotap * state only when needed. */ if (attach) { ieee80211_syncflag_ext(vap, IEEE80211_FEXT_BPF); if (vap->iv_opmode == IEEE80211_M_MONITOR) atomic_add_int(&vap->iv_ic->ic_montaps, 1); } else if (!bpf_peers_present(vap->iv_rawbpf)) { ieee80211_syncflag_ext(vap, -IEEE80211_FEXT_BPF); if (vap->iv_opmode == IEEE80211_M_MONITOR) atomic_subtract_int(&vap->iv_ic->ic_montaps, 1); } } } /* * Change MAC address on the vap (if was not started). */ static void wlan_iflladdr(void *arg __unused, struct ifnet *ifp) { /* NB: identify vap's by if_init */ if (ifp->if_init == ieee80211_init && (ifp->if_flags & IFF_UP) == 0) { struct ieee80211vap *vap = ifp->if_softc; IEEE80211_ADDR_COPY(vap->iv_myaddr, IF_LLADDR(ifp)); } } /* * Module glue. * * NB: the module name is "wlan" for compatibility with NetBSD. 
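 *
 * The cloner registered below is what backs the usual administrative
 * flow, e.g.:
 *
 *	ifconfig wlan0 create wlandev ath0 wlanmode sta
 *
 * ifconfig(8) packs the parent name, opmode and flags into an
 * ieee80211_clone_params that wlan_clone_create() above validates
 * against the parent device's capabilities.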
*/ static int wlan_modevent(module_t mod, int type, void *unused) { switch (type) { case MOD_LOAD: if (bootverbose) printf("wlan: <802.11 Link Layer>\n"); wlan_bpfevent = EVENTHANDLER_REGISTER(bpf_track, bpf_track, 0, EVENTHANDLER_PRI_ANY); wlan_ifllevent = EVENTHANDLER_REGISTER(iflladdr_event, wlan_iflladdr, NULL, EVENTHANDLER_PRI_ANY); wlan_cloner = if_clone_simple(wlanname, wlan_clone_create, wlan_clone_destroy, 0); return 0; case MOD_UNLOAD: if_clone_detach(wlan_cloner); EVENTHANDLER_DEREGISTER(bpf_track, wlan_bpfevent); EVENTHANDLER_DEREGISTER(iflladdr_event, wlan_ifllevent); return 0; } return EINVAL; } static moduledata_t wlan_mod = { wlanname, wlan_modevent, 0 }; DECLARE_MODULE(wlan, wlan_mod, SI_SUB_DRIVERS, SI_ORDER_FIRST); MODULE_VERSION(wlan, 1); MODULE_DEPEND(wlan, ether, 1, 1, 1); #ifdef IEEE80211_ALQ MODULE_DEPEND(wlan, alq, 1, 1, 1); #endif /* IEEE80211_ALQ */ Index: projects/clang390-import/sys/net80211/ieee80211_freebsd.h =================================================================== --- projects/clang390-import/sys/net80211/ieee80211_freebsd.h (revision 305686) +++ projects/clang390-import/sys/net80211/ieee80211_freebsd.h (revision 305687) @@ -1,675 +1,685 @@ /*- * Copyright (c) 2003-2008 Sam Leffler, Errno Consulting * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _NET80211_IEEE80211_FREEBSD_H_ #define _NET80211_IEEE80211_FREEBSD_H_ #ifdef _KERNEL #include #include #include #include #include #include #include #include /* * Common state locking definitions. */ typedef struct { char name[16]; /* e.g. "ath0_com_lock" */ struct mtx mtx; } ieee80211_com_lock_t; #define IEEE80211_LOCK_INIT(_ic, _name) do { \ ieee80211_com_lock_t *cl = &(_ic)->ic_comlock; \ snprintf(cl->name, sizeof(cl->name), "%s_com_lock", _name); \ mtx_init(&cl->mtx, cl->name, NULL, MTX_DEF | MTX_RECURSE); \ } while (0) #define IEEE80211_LOCK_OBJ(_ic) (&(_ic)->ic_comlock.mtx) #define IEEE80211_LOCK_DESTROY(_ic) mtx_destroy(IEEE80211_LOCK_OBJ(_ic)) #define IEEE80211_LOCK(_ic) mtx_lock(IEEE80211_LOCK_OBJ(_ic)) #define IEEE80211_UNLOCK(_ic) mtx_unlock(IEEE80211_LOCK_OBJ(_ic)) #define IEEE80211_LOCK_ASSERT(_ic) \ mtx_assert(IEEE80211_LOCK_OBJ(_ic), MA_OWNED) #define IEEE80211_UNLOCK_ASSERT(_ic) \ mtx_assert(IEEE80211_LOCK_OBJ(_ic), MA_NOTOWNED) /* * Transmit lock. 
* * This is a (mostly) temporary lock designed to serialise all of the * transmission operations throughout the stack. */ typedef struct { char name[16]; /* e.g. "ath0_tx_lock" */ struct mtx mtx; } ieee80211_tx_lock_t; #define IEEE80211_TX_LOCK_INIT(_ic, _name) do { \ ieee80211_tx_lock_t *cl = &(_ic)->ic_txlock; \ snprintf(cl->name, sizeof(cl->name), "%s_tx_lock", _name); \ mtx_init(&cl->mtx, cl->name, NULL, MTX_DEF); \ } while (0) #define IEEE80211_TX_LOCK_OBJ(_ic) (&(_ic)->ic_txlock.mtx) #define IEEE80211_TX_LOCK_DESTROY(_ic) mtx_destroy(IEEE80211_TX_LOCK_OBJ(_ic)) #define IEEE80211_TX_LOCK(_ic) mtx_lock(IEEE80211_TX_LOCK_OBJ(_ic)) #define IEEE80211_TX_UNLOCK(_ic) mtx_unlock(IEEE80211_TX_LOCK_OBJ(_ic)) #define IEEE80211_TX_LOCK_ASSERT(_ic) \ mtx_assert(IEEE80211_TX_LOCK_OBJ(_ic), MA_OWNED) #define IEEE80211_TX_UNLOCK_ASSERT(_ic) \ mtx_assert(IEEE80211_TX_LOCK_OBJ(_ic), MA_NOTOWNED) /* * Stageq / ni_tx_superg lock */ typedef struct { char name[16]; /* e.g. "ath0_ff_lock" */ struct mtx mtx; } ieee80211_ff_lock_t; #define IEEE80211_FF_LOCK_INIT(_ic, _name) do { \ ieee80211_ff_lock_t *fl = &(_ic)->ic_fflock; \ snprintf(fl->name, sizeof(fl->name), "%s_ff_lock", _name); \ mtx_init(&fl->mtx, fl->name, NULL, MTX_DEF); \ } while (0) #define IEEE80211_FF_LOCK_OBJ(_ic) (&(_ic)->ic_fflock.mtx) #define IEEE80211_FF_LOCK_DESTROY(_ic) mtx_destroy(IEEE80211_FF_LOCK_OBJ(_ic)) #define IEEE80211_FF_LOCK(_ic) mtx_lock(IEEE80211_FF_LOCK_OBJ(_ic)) #define IEEE80211_FF_UNLOCK(_ic) mtx_unlock(IEEE80211_FF_LOCK_OBJ(_ic)) #define IEEE80211_FF_LOCK_ASSERT(_ic) \ mtx_assert(IEEE80211_FF_LOCK_OBJ(_ic), MA_OWNED) /* * Node locking definitions. */ typedef struct { char name[16]; /* e.g. "ath0_node_lock" */ struct mtx mtx; } ieee80211_node_lock_t; #define IEEE80211_NODE_LOCK_INIT(_nt, _name) do { \ ieee80211_node_lock_t *nl = &(_nt)->nt_nodelock; \ snprintf(nl->name, sizeof(nl->name), "%s_node_lock", _name); \ mtx_init(&nl->mtx, nl->name, NULL, MTX_DEF | MTX_RECURSE); \ } while (0) #define IEEE80211_NODE_LOCK_OBJ(_nt) (&(_nt)->nt_nodelock.mtx) #define IEEE80211_NODE_LOCK_DESTROY(_nt) \ mtx_destroy(IEEE80211_NODE_LOCK_OBJ(_nt)) #define IEEE80211_NODE_LOCK(_nt) \ mtx_lock(IEEE80211_NODE_LOCK_OBJ(_nt)) #define IEEE80211_NODE_IS_LOCKED(_nt) \ mtx_owned(IEEE80211_NODE_LOCK_OBJ(_nt)) #define IEEE80211_NODE_UNLOCK(_nt) \ mtx_unlock(IEEE80211_NODE_LOCK_OBJ(_nt)) #define IEEE80211_NODE_LOCK_ASSERT(_nt) \ mtx_assert(IEEE80211_NODE_LOCK_OBJ(_nt), MA_OWNED) /* * Power-save queue definitions. */ typedef struct mtx ieee80211_psq_lock_t; #define IEEE80211_PSQ_INIT(_psq, _name) \ mtx_init(&(_psq)->psq_lock, _name, "802.11 ps q", MTX_DEF) #define IEEE80211_PSQ_DESTROY(_psq) mtx_destroy(&(_psq)->psq_lock) #define IEEE80211_PSQ_LOCK(_psq) mtx_lock(&(_psq)->psq_lock) #define IEEE80211_PSQ_UNLOCK(_psq) mtx_unlock(&(_psq)->psq_lock) #ifndef IF_PREPEND_LIST #define _IF_PREPEND_LIST(ifq, mhead, mtail, mcount) do { \ (mtail)->m_nextpkt = (ifq)->ifq_head; \ if ((ifq)->ifq_tail == NULL) \ (ifq)->ifq_tail = (mtail); \ (ifq)->ifq_head = (mhead); \ (ifq)->ifq_len += (mcount); \ } while (0) #define IF_PREPEND_LIST(ifq, mhead, mtail, mcount) do { \ IF_LOCK(ifq); \ _IF_PREPEND_LIST(ifq, mhead, mtail, mcount); \ IF_UNLOCK(ifq); \ } while (0) #endif /* IF_PREPEND_LIST */ /* * Age queue definitions. 
*/ typedef struct mtx ieee80211_ageq_lock_t; #define IEEE80211_AGEQ_INIT(_aq, _name) \ mtx_init(&(_aq)->aq_lock, _name, "802.11 age q", MTX_DEF) #define IEEE80211_AGEQ_DESTROY(_aq) mtx_destroy(&(_aq)->aq_lock) #define IEEE80211_AGEQ_LOCK(_aq) mtx_lock(&(_aq)->aq_lock) #define IEEE80211_AGEQ_UNLOCK(_aq) mtx_unlock(&(_aq)->aq_lock) /* * 802.1x MAC ACL database locking definitions. */ typedef struct mtx acl_lock_t; #define ACL_LOCK_INIT(_as, _name) \ mtx_init(&(_as)->as_lock, _name, "802.11 ACL", MTX_DEF) #define ACL_LOCK_DESTROY(_as) mtx_destroy(&(_as)->as_lock) #define ACL_LOCK(_as) mtx_lock(&(_as)->as_lock) #define ACL_UNLOCK(_as) mtx_unlock(&(_as)->as_lock) #define ACL_LOCK_ASSERT(_as) \ mtx_assert((&(_as)->as_lock), MA_OWNED) /* * Scan table definitions. */ typedef struct mtx ieee80211_scan_table_lock_t; #define IEEE80211_SCAN_TABLE_LOCK_INIT(_st, _name) \ mtx_init(&(_st)->st_lock, _name, "802.11 scan table", MTX_DEF) #define IEEE80211_SCAN_TABLE_LOCK_DESTROY(_st) mtx_destroy(&(_st)->st_lock) #define IEEE80211_SCAN_TABLE_LOCK(_st) mtx_lock(&(_st)->st_lock) #define IEEE80211_SCAN_TABLE_UNLOCK(_st) mtx_unlock(&(_st)->st_lock) typedef struct mtx ieee80211_scan_iter_lock_t; #define IEEE80211_SCAN_ITER_LOCK_INIT(_st, _name) \ mtx_init(&(_st)->st_scanlock, _name, "802.11 scangen", MTX_DEF) #define IEEE80211_SCAN_ITER_LOCK_DESTROY(_st) mtx_destroy(&(_st)->st_scanlock) #define IEEE80211_SCAN_ITER_LOCK(_st) mtx_lock(&(_st)->st_scanlock) #define IEEE80211_SCAN_ITER_UNLOCK(_st) mtx_unlock(&(_st)->st_scanlock) /* * Mesh node/routing definitions. */ typedef struct mtx ieee80211_rte_lock_t; #define MESH_RT_ENTRY_LOCK_INIT(_rt, _name) \ mtx_init(&(rt)->rt_lock, _name, "802.11s route entry", MTX_DEF) #define MESH_RT_ENTRY_LOCK_DESTROY(_rt) \ mtx_destroy(&(_rt)->rt_lock) #define MESH_RT_ENTRY_LOCK(rt) mtx_lock(&(rt)->rt_lock) #define MESH_RT_ENTRY_LOCK_ASSERT(rt) mtx_assert(&(rt)->rt_lock, MA_OWNED) #define MESH_RT_ENTRY_UNLOCK(rt) mtx_unlock(&(rt)->rt_lock) typedef struct mtx ieee80211_rt_lock_t; #define MESH_RT_LOCK(ms) mtx_lock(&(ms)->ms_rt_lock) #define MESH_RT_LOCK_ASSERT(ms) mtx_assert(&(ms)->ms_rt_lock, MA_OWNED) #define MESH_RT_UNLOCK(ms) mtx_unlock(&(ms)->ms_rt_lock) #define MESH_RT_LOCK_INIT(ms, name) \ mtx_init(&(ms)->ms_rt_lock, name, "802.11s routing table", MTX_DEF) #define MESH_RT_LOCK_DESTROY(ms) \ mtx_destroy(&(ms)->ms_rt_lock) /* * Node reference counting definitions. 
* * ieee80211_node_initref initialize the reference count to 1 * ieee80211_node_incref add a reference * ieee80211_node_decref remove a reference * ieee80211_node_dectestref remove a reference and return 1 if this * is the last reference, otherwise 0 * ieee80211_node_refcnt reference count for printing (only) */ #include #define ieee80211_node_initref(_ni) \ do { ((_ni)->ni_refcnt = 1); } while (0) #define ieee80211_node_incref(_ni) \ atomic_add_int(&(_ni)->ni_refcnt, 1) #define ieee80211_node_decref(_ni) \ atomic_subtract_int(&(_ni)->ni_refcnt, 1) struct ieee80211_node; int ieee80211_node_dectestref(struct ieee80211_node *ni); #define ieee80211_node_refcnt(_ni) (_ni)->ni_refcnt struct ifqueue; struct ieee80211vap; void ieee80211_drain_ifq(struct ifqueue *); void ieee80211_flush_ifq(struct ifqueue *, struct ieee80211vap *); void ieee80211_vap_destroy(struct ieee80211vap *); #define IFNET_IS_UP_RUNNING(_ifp) \ (((_ifp)->if_flags & IFF_UP) && \ ((_ifp)->if_drv_flags & IFF_DRV_RUNNING)) /* XXX TODO: cap these at 1, as hz may not be 1000 */ #define msecs_to_ticks(ms) (((ms)*hz)/1000) #define ticks_to_msecs(t) (1000*(t) / hz) #define ticks_to_secs(t) ((t) / hz) #define ieee80211_time_after(a,b) ((long)(b) - (long)(a) < 0) #define ieee80211_time_before(a,b) ieee80211_time_after(b,a) #define ieee80211_time_after_eq(a,b) ((long)(a) - (long)(b) >= 0) #define ieee80211_time_before_eq(a,b) ieee80211_time_after_eq(b,a) struct mbuf *ieee80211_getmgtframe(uint8_t **frm, int headroom, int pktlen); /* tx path usage */ #define M_ENCAP M_PROTO1 /* 802.11 encap done */ #define M_EAPOL M_PROTO3 /* PAE/EAPOL frame */ #define M_PWR_SAV M_PROTO4 /* bypass PS handling */ #define M_MORE_DATA M_PROTO5 /* more data frames to follow */ #define M_FF M_PROTO6 /* fast frame / A-MSDU */ #define M_TXCB M_PROTO7 /* do tx complete callback */ #define M_AMPDU_MPDU M_PROTO8 /* ok for A-MPDU aggregation */ #define M_FRAG M_PROTO9 /* frame fragmentation */ #define M_FIRSTFRAG M_PROTO10 /* first frame fragment */ #define M_LASTFRAG M_PROTO11 /* last frame fragment */ #define M_80211_TX \ (M_ENCAP|M_EAPOL|M_PWR_SAV|M_MORE_DATA|M_FF|M_TXCB| \ M_AMPDU_MPDU|M_FRAG|M_FIRSTFRAG|M_LASTFRAG) /* rx path usage */ #define M_AMPDU M_PROTO1 /* A-MPDU subframe */ #define M_WEP M_PROTO2 /* WEP done by hardware */ #if 0 #define M_AMPDU_MPDU M_PROTO8 /* A-MPDU re-order done */ #endif #define M_80211_RX (M_AMPDU|M_WEP|M_AMPDU_MPDU) #define IEEE80211_MBUF_TX_FLAG_BITS \ M_FLAG_BITS \ "\15M_ENCAP\17M_EAPOL\20M_PWR_SAV\21M_MORE_DATA\22M_FF\23M_TXCB" \ "\24M_AMPDU_MPDU\25M_FRAG\26M_FIRSTFRAG\27M_LASTFRAG" #define IEEE80211_MBUF_RX_FLAG_BITS \ M_FLAG_BITS \ "\15M_AMPDU\16M_WEP\24M_AMPDU_MPDU" /* * Store WME access control bits in the vlan tag. * This is safe since it's done after the packet is classified * (where we use any previous tag) and because it's passed * directly in to the driver and there's no chance someone * else will clobber them on us. */ #define M_WME_SETAC(m, ac) \ ((m)->m_pkthdr.ether_vtag = (ac)) #define M_WME_GETAC(m) ((m)->m_pkthdr.ether_vtag) /* * Mbufs on the power save queue are tagged with an age and * timed out. We reuse the hardware checksum field in the * mbuf packet header to store this data. */ #define M_AGE_SET(m,v) (m->m_pkthdr.csum_data = v) #define M_AGE_GET(m) (m->m_pkthdr.csum_data) #define M_AGE_SUB(m,adj) (m->m_pkthdr.csum_data -= adj) /* * Store the sequence number. 
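 *
 * Like the M_AGE_* macros above (which borrow csum_data), this reuses
 * an otherwise idle pkthdr field -- tso_segsz here -- rather than
 * growing the mbuf, on the assumption that these frames never carry
 * TSO state.  Illustrative round trip:
 *
 *	M_SEQNO_SET(m, seqno);
 *	...
 *	seqno = M_SEQNO_GET(m);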
*/ #define M_SEQNO_SET(m, seqno) \ ((m)->m_pkthdr.tso_segsz = (seqno)) #define M_SEQNO_GET(m) ((m)->m_pkthdr.tso_segsz) #define MTAG_ABI_NET80211 1132948340 /* net80211 ABI */ struct ieee80211_cb { void (*func)(struct ieee80211_node *, void *, int status); void *arg; }; #define NET80211_TAG_CALLBACK 0 /* xmit complete callback */ int ieee80211_add_callback(struct mbuf *m, void (*func)(struct ieee80211_node *, void *, int), void *arg); void ieee80211_process_callback(struct ieee80211_node *, struct mbuf *, int); #define NET80211_TAG_XMIT_PARAMS 1 /* See below; this is after the bpf_params definition */ #define NET80211_TAG_RECV_PARAMS 2 +#define NET80211_TAG_TOA_PARAMS 3 + struct ieee80211com; int ieee80211_parent_xmitpkt(struct ieee80211com *, struct mbuf *); int ieee80211_vap_xmitpkt(struct ieee80211vap *, struct mbuf *); void get_random_bytes(void *, size_t); void ieee80211_sysctl_attach(struct ieee80211com *); void ieee80211_sysctl_detach(struct ieee80211com *); void ieee80211_sysctl_vattach(struct ieee80211vap *); void ieee80211_sysctl_vdetach(struct ieee80211vap *); SYSCTL_DECL(_net_wlan); int ieee80211_sysctl_msecs_ticks(SYSCTL_HANDLER_ARGS); void ieee80211_load_module(const char *); /* * A "policy module" is an adjunct module to net80211 that provides * functionality that typically includes policy decisions. This * modularity enables extensibility and vendor-supplied functionality. */ #define _IEEE80211_POLICY_MODULE(policy, name, version) \ typedef void (*policy##_setup)(int); \ SET_DECLARE(policy##_set, policy##_setup); \ static int \ wlan_##name##_modevent(module_t mod, int type, void *unused) \ { \ policy##_setup * const *iter, f; \ switch (type) { \ case MOD_LOAD: \ SET_FOREACH(iter, policy##_set) { \ f = (void*) *iter; \ f(type); \ } \ return 0; \ case MOD_UNLOAD: \ case MOD_QUIESCE: \ if (nrefs) { \ printf("wlan_" #name ": still in use " \ "(%u dynamic refs)\n", nrefs); \ return EBUSY; \ } \ if (type == MOD_UNLOAD) { \ SET_FOREACH(iter, policy##_set) { \ f = (void*) *iter; \ f(type); \ } \ } \ return 0; \ } \ return EINVAL; \ } \ static moduledata_t name##_mod = { \ "wlan_" #name, \ wlan_##name##_modevent, \ 0 \ }; \ DECLARE_MODULE(wlan_##name, name##_mod, SI_SUB_DRIVERS, SI_ORDER_FIRST);\ MODULE_VERSION(wlan_##name, version); \ MODULE_DEPEND(wlan_##name, wlan, 1, 1, 1) /* * Crypto modules implement cipher support. */ #define IEEE80211_CRYPTO_MODULE(name, version) \ _IEEE80211_POLICY_MODULE(crypto, name, version); \ static void \ name##_modevent(int type) \ { \ if (type == MOD_LOAD) \ ieee80211_crypto_register(&name); \ else \ ieee80211_crypto_unregister(&name); \ } \ TEXT_SET(crypto##_set, name##_modevent) /* * Scanner modules provide scanning policy. */ #define IEEE80211_SCANNER_MODULE(name, version) \ _IEEE80211_POLICY_MODULE(scanner, name, version) #define IEEE80211_SCANNER_ALG(name, alg, v) \ static void \ name##_modevent(int type) \ { \ if (type == MOD_LOAD) \ ieee80211_scanner_register(alg, &v); \ else \ ieee80211_scanner_unregister(alg, &v); \ } \ TEXT_SET(scanner_set, name##_modevent); \ /* * ACL modules implement acl policy. */ #define IEEE80211_ACL_MODULE(name, alg, version) \ _IEEE80211_POLICY_MODULE(acl, name, version); \ static void \ alg##_modevent(int type) \ { \ if (type == MOD_LOAD) \ ieee80211_aclator_register(&alg); \ else \ ieee80211_aclator_unregister(&alg); \ } \ TEXT_SET(acl_set, alg##_modevent); \ /* * Authenticator modules handle 802.1x/WPA authentication. 
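 *
 * A minimal consumer of the macros below (module and algorithm names
 * hypothetical):
 *
 *	static const struct ieee80211_authenticator xauth = { ... };
 *	IEEE80211_AUTH_MODULE(xauth, 1);
 *	IEEE80211_AUTH_ALG(xauth, IEEE80211_AUTH_8021X, xauth);
 *
 * MOD_LOAD then runs the generated xauth_modevent(), registering the
 * authenticator; unload reverses it.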
 */
#define	IEEE80211_AUTH_MODULE(name, version) \
	_IEEE80211_POLICY_MODULE(auth, name, version)

#define	IEEE80211_AUTH_ALG(name, alg, v) \
static void \
name##_modevent(int type) \
{ \
	if (type == MOD_LOAD) \
		ieee80211_authenticator_register(alg, &v); \
	else \
		ieee80211_authenticator_unregister(alg); \
} \
TEXT_SET(auth_set, name##_modevent)

/*
 * Rate control modules provide tx rate control support.
 */
#define	IEEE80211_RATECTL_MODULE(alg, version) \
	_IEEE80211_POLICY_MODULE(ratectl, alg, version); \

#define	IEEE80211_RATECTL_ALG(name, alg, v) \
static void \
alg##_modevent(int type) \
{ \
	if (type == MOD_LOAD) \
		ieee80211_ratectl_register(alg, &v); \
	else \
		ieee80211_ratectl_unregister(alg); \
} \
TEXT_SET(ratectl##_set, alg##_modevent)

struct ieee80211req;
typedef int ieee80211_ioctl_getfunc(struct ieee80211vap *,
    struct ieee80211req *);
SET_DECLARE(ieee80211_ioctl_getset, ieee80211_ioctl_getfunc);
#define	IEEE80211_IOCTL_GET(_name, _get) TEXT_SET(ieee80211_ioctl_getset, _get)

typedef int ieee80211_ioctl_setfunc(struct ieee80211vap *,
    struct ieee80211req *);
SET_DECLARE(ieee80211_ioctl_setset, ieee80211_ioctl_setfunc);
#define	IEEE80211_IOCTL_SET(_name, _set) TEXT_SET(ieee80211_ioctl_setset, _set)

#endif /* _KERNEL */

/* XXX this stuff belongs elsewhere */
/*
 * Message formats for messages from the net80211 layer to user
 * applications via the routing socket.  These messages are appended
 * to an if_announcemsghdr structure.
 */
struct ieee80211_join_event {
	uint8_t		iev_addr[6];
};

struct ieee80211_leave_event {
	uint8_t		iev_addr[6];
};

struct ieee80211_replay_event {
	uint8_t		iev_src[6];	/* src MAC */
	uint8_t		iev_dst[6];	/* dst MAC */
	uint8_t		iev_cipher;	/* cipher type */
	uint8_t		iev_keyix;	/* key id/index */
	uint64_t	iev_keyrsc;	/* RSC from key */
	uint64_t	iev_rsc;	/* RSC from frame */
};

struct ieee80211_michael_event {
	uint8_t		iev_src[6];	/* src MAC */
	uint8_t		iev_dst[6];	/* dst MAC */
	uint8_t		iev_cipher;	/* cipher type */
	uint8_t		iev_keyix;	/* key id/index */
};

struct ieee80211_wds_event {
	uint8_t		iev_addr[6];
};

struct ieee80211_csa_event {
	uint32_t	iev_flags;	/* channel flags */
	uint16_t	iev_freq;	/* setting in MHz */
	uint8_t		iev_ieee;	/* IEEE channel number */
	uint8_t		iev_mode;	/* CSA mode */
	uint8_t		iev_count;	/* CSA count */
};

struct ieee80211_cac_event {
	uint32_t	iev_flags;	/* channel flags */
	uint16_t	iev_freq;	/* setting in MHz */
	uint8_t		iev_ieee;	/* IEEE channel number */
	/* XXX timestamp? */
	uint8_t		iev_type;	/* IEEE80211_NOTIFY_CAC_* */
};

struct ieee80211_radar_event {
	uint32_t	iev_flags;	/* channel flags */
	uint16_t	iev_freq;	/* setting in MHz */
	uint8_t		iev_ieee;	/* IEEE channel number */
	/* XXX timestamp? */
};

struct ieee80211_auth_event {
	uint8_t		iev_addr[6];
};

struct ieee80211_deauth_event {
	uint8_t		iev_addr[6];
};

struct ieee80211_country_event {
	uint8_t		iev_addr[6];
	uint8_t		iev_cc[2];	/* ISO country code */
};

struct ieee80211_radio_event {
	uint8_t		iev_state;	/* 1 on, 0 off */
};

#define	RTM_IEEE80211_ASSOC	100	/* station associate (bss mode) */
#define	RTM_IEEE80211_REASSOC	101	/* station re-associate (bss mode) */
#define	RTM_IEEE80211_DISASSOC	102	/* station disassociate (bss mode) */
#define	RTM_IEEE80211_JOIN	103	/* station join (ap mode) */
#define	RTM_IEEE80211_LEAVE	104	/* station leave (ap mode) */
#define	RTM_IEEE80211_SCAN	105	/* scan complete, results available */
#define	RTM_IEEE80211_REPLAY	106	/* sequence counter replay detected */
#define	RTM_IEEE80211_MICHAEL	107	/* Michael MIC failure detected */
#define	RTM_IEEE80211_REJOIN	108	/* station re-associate (ap mode) */
#define	RTM_IEEE80211_WDS	109	/* WDS discovery (ap mode) */
#define	RTM_IEEE80211_CSA	110	/* Channel Switch Announcement event */
#define	RTM_IEEE80211_RADAR	111	/* radar event */
#define	RTM_IEEE80211_CAC	112	/* Channel Availability Check event */
#define	RTM_IEEE80211_DEAUTH	113	/* station deauthenticate */
#define	RTM_IEEE80211_AUTH	114	/* station authenticate (ap mode) */
#define	RTM_IEEE80211_COUNTRY	115	/* discovered country code (sta mode) */
#define	RTM_IEEE80211_RADIO	116	/* RF kill switch state change */

/*
 * Structure prepended to raw packets sent through the bpf
 * interface when set to DLT_IEEE802_11_RADIO.  This allows
 * user applications to specify pretty much everything in
 * an Atheros tx descriptor.  XXX need to generalize.
 *
 * XXX cannot be more than 14 bytes as it is copied to a sockaddr's
 * XXX sa_data area.
 */
struct ieee80211_bpf_params {
	uint8_t		ibp_vers;	/* version */
#define	IEEE80211_BPF_VERSION	0
	uint8_t		ibp_len;	/* header length in bytes */
	uint8_t		ibp_flags;
#define	IEEE80211_BPF_SHORTPRE	0x01	/* tx with short preamble */
#define	IEEE80211_BPF_NOACK	0x02	/* tx with no ack */
#define	IEEE80211_BPF_CRYPTO	0x04	/* tx with h/w encryption */
#define	IEEE80211_BPF_FCS	0x10	/* frame includes FCS */
#define	IEEE80211_BPF_DATAPAD	0x20	/* frame includes data padding */
#define	IEEE80211_BPF_RTS	0x40	/* tx with RTS/CTS */
#define	IEEE80211_BPF_CTS	0x80	/* tx with CTS only */
	uint8_t		ibp_pri;	/* WME/WMM AC+tx antenna */
	uint8_t		ibp_try0;	/* series 1 try count */
	uint8_t		ibp_rate0;	/* series 1 IEEE tx rate */
	uint8_t		ibp_power;	/* tx power (device units) */
	uint8_t		ibp_ctsrate;	/* IEEE tx rate for CTS */
	uint8_t		ibp_try1;	/* series 2 try count */
	uint8_t		ibp_rate1;	/* series 2 IEEE tx rate */
	uint8_t		ibp_try2;	/* series 3 try count */
	uint8_t		ibp_rate2;	/* series 3 IEEE tx rate */
	uint8_t		ibp_try3;	/* series 4 try count */
	uint8_t		ibp_rate3;	/* series 4 IEEE tx rate */
};

#ifdef _KERNEL
struct ieee80211_tx_params {
	struct ieee80211_bpf_params params;
};
int	ieee80211_add_xmit_params(struct mbuf *m,
	    const struct ieee80211_bpf_params *);
int	ieee80211_get_xmit_params(struct mbuf *m,
	    struct ieee80211_bpf_params *);

#define	IEEE80211_MAX_CHAINS		3
#define	IEEE80211_MAX_EVM_PILOTS	6

#define	IEEE80211_R_NF		0x0000001	/* global NF value valid */
#define	IEEE80211_R_RSSI	0x0000002	/* global RSSI value valid */
#define	IEEE80211_R_C_CHAIN	0x0000004	/* RX chain count valid */
#define	IEEE80211_R_C_NF	0x0000008	/* per-chain NF value valid */
#define	IEEE80211_R_C_RSSI	0x0000010	/* per-chain RSSI value valid */
#define	IEEE80211_R_C_EVM	0x0000020	/* per-chain EVM valid */
#define	IEEE80211_R_C_HT40	0x0000040	/* RX'ed packet is 40 MHz, pilots 4,5 valid */
#define	IEEE80211_R_FREQ	0x0000080	/* Freq value populated, MHz */
#define	IEEE80211_R_IEEE	0x0000100	/* IEEE value populated */
#define	IEEE80211_R_BAND	0x0000200	/* Frequency band populated */

struct ieee80211_rx_stats {
	uint32_t r_flags;		/* IEEE80211_R_* flags */
	uint8_t c_chain;		/* number of RX chains involved */
	int16_t	c_nf_ctl[IEEE80211_MAX_CHAINS];	/* per-chain NF */
	int16_t	c_nf_ext[IEEE80211_MAX_CHAINS];	/* per-chain NF */
	int16_t	c_rssi_ctl[IEEE80211_MAX_CHAINS];	/* per-chain RSSI */
	int16_t	c_rssi_ext[IEEE80211_MAX_CHAINS];	/* per-chain RSSI */
	uint8_t nf;			/* global NF */
	uint8_t rssi;			/* global RSSI */
	uint8_t evm[IEEE80211_MAX_CHAINS][IEEE80211_MAX_EVM_PILOTS];
					/* per-chain, per-pilot EVM values */
	uint16_t c_freq;
	uint8_t c_ieee;
};

struct ieee80211_rx_params {
	struct ieee80211_rx_stats params;
};
int	ieee80211_add_rx_params(struct mbuf *m,
	    const struct ieee80211_rx_stats *rxs);
int	ieee80211_get_rx_params(struct mbuf *m,
	    struct ieee80211_rx_stats *rxs);
+
+struct ieee80211_toa_params {
+	int request_id;
+};
+int	ieee80211_add_toa_params(struct mbuf *m,
+	    const struct ieee80211_toa_params *p);
+int	ieee80211_get_toa_params(struct mbuf *m,
+	    struct ieee80211_toa_params *p);
#endif /* _KERNEL */

/*
 * Malloc API.  Other BSD operating systems have slightly
 * different malloc/free namings (eg DragonflyBSD.)
 */
#define	IEEE80211_MALLOC	malloc
#define	IEEE80211_FREE		free

/* XXX TODO: get rid of WAITOK, fix all the users of it? */
#define	IEEE80211_M_NOWAIT	M_NOWAIT
#define	IEEE80211_M_WAITOK	M_WAITOK
#define	IEEE80211_M_ZERO	M_ZERO

/* XXX TODO: the type fields */

#endif /* _NET80211_IEEE80211_FREEBSD_H_ */

Index: projects/clang390-import/sys/powerpc/booke/pmap.c
===================================================================
--- projects/clang390-import/sys/powerpc/booke/pmap.c	(revision 305686)
+++ projects/clang390-import/sys/powerpc/booke/pmap.c	(revision 305687)
@@ -1,3601 +1,3607 @@
/*-
 * Copyright (C) 2007-2009 Semihalf, Rafal Jaworowski
 * Copyright (C) 2006 Semihalf, Marian Balakowicz
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN
 * NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
 * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Some hw specific parts of this pmap were derived or influenced
 * by NetBSD's ibm4xx pmap module.  More generic code is shared with
 * a few other pmap modules from the FreeBSD tree.
*/ /* * VM layout notes: * * Kernel and user threads run within one common virtual address space * defined by AS=0. * * Virtual address space layout: * ----------------------------- * 0x0000_0000 - 0xafff_ffff : user process * 0xb000_0000 - 0xbfff_ffff : pmap_mapdev()-ed area (PCI/PCIE etc.) * 0xc000_0000 - 0xc0ff_ffff : kernel reserved * 0xc000_0000 - data_end : kernel code+data, env, metadata etc. * 0xc100_0000 - 0xfeef_ffff : KVA * 0xc100_0000 - 0xc100_3fff : reserved for page zero/copy * 0xc100_4000 - 0xc200_3fff : reserved for ptbl bufs * 0xc200_4000 - 0xc200_8fff : guard page + kstack0 * 0xc200_9000 - 0xfeef_ffff : actual free KVA space * 0xfef0_0000 - 0xffff_ffff : I/O devices region */ #include __FBSDID("$FreeBSD$"); #include "opt_kstack_pages.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "mmu_if.h" #define SPARSE_MAPDEV #ifdef DEBUG #define debugf(fmt, args...) printf(fmt, ##args) #else #define debugf(fmt, args...) #endif #define TODO panic("%s: not implemented", __func__); extern unsigned char _etext[]; extern unsigned char _end[]; extern uint32_t *bootinfo; vm_paddr_t kernload; vm_offset_t kernstart; vm_size_t kernsize; /* Message buffer and tables. */ static vm_offset_t data_start; static vm_size_t data_end; /* Phys/avail memory regions. */ static struct mem_region *availmem_regions; static int availmem_regions_sz; static struct mem_region *physmem_regions; static int physmem_regions_sz; /* Reserved KVA space and mutex for mmu_booke_zero_page. */ static vm_offset_t zero_page_va; static struct mtx zero_page_mutex; static struct mtx tlbivax_mutex; /* Reserved KVA space and mutex for mmu_booke_copy_page. */ static vm_offset_t copy_page_src_va; static vm_offset_t copy_page_dst_va; static struct mtx copy_page_mutex; /**************************************************************************/ /* PMAP */ /**************************************************************************/ static int mmu_booke_enter_locked(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int flags, int8_t psind); unsigned int kptbl_min; /* Index of the first kernel ptbl. */ unsigned int kernel_ptbls; /* Number of KVA ptbls. */ /* * If user pmap is processed with mmu_booke_remove and the resident count * drops to 0, there are no more pages to remove, so we need not continue. */ #define PMAP_REMOVE_DONE(pmap) \ ((pmap) != kernel_pmap && (pmap)->pm_stats.resident_count == 0) extern int elf32_nxstack; /**************************************************************************/ /* TLB and TID handling */ /**************************************************************************/ /* Translation ID busy table */ static volatile pmap_t tidbusy[MAXCPU][TID_MAX + 1]; /* * TLB0 capabilities (entry, way numbers etc.). These can vary between e500 * core revisions and should be read from h/w registers during early config. 
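 *
 * A minimal sketch of how the values below are derived (it mirrors
 * tlb0_get_tlbconf() further down; the example numbers assume a
 * typical e500v2 part and are not guaranteed for every core):
 *
 *	uint32_t cfg = mfspr(SPR_TLB0CFG);
 *	tlb0_entries = cfg & TLBCFG_NENTRY_MASK;		e.g. 512
 *	tlb0_ways = (cfg & TLBCFG_ASSOC_MASK) >> TLBCFG_ASSOC_SHIFT;
 *								e.g. 4
 *	tlb0_entries_per_way = tlb0_entries / tlb0_ways;	-> 128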
 */
uint32_t tlb0_entries;
uint32_t tlb0_ways;
uint32_t tlb0_entries_per_way;
uint32_t tlb1_entries;

#define TLB0_ENTRIES		(tlb0_entries)
#define TLB0_WAYS		(tlb0_ways)
#define TLB0_ENTRIES_PER_WAY	(tlb0_entries_per_way)

#define TLB1_ENTRIES		(tlb1_entries)
#define TLB1_MAXENTRIES		64

static vm_offset_t tlb1_map_base = VM_MAXUSER_ADDRESS + PAGE_SIZE;

static tlbtid_t tid_alloc(struct pmap *);
static void tid_flush(tlbtid_t tid);

static void tlb_print_entry(int, uint32_t, uint32_t, uint32_t, uint32_t);

static void tlb1_read_entry(tlb_entry_t *, unsigned int);
static void tlb1_write_entry(tlb_entry_t *, unsigned int);
static int tlb1_iomapped(int, vm_paddr_t, vm_size_t, vm_offset_t *);
static vm_size_t tlb1_mapin_region(vm_offset_t, vm_paddr_t, vm_size_t);

static vm_size_t tsize2size(unsigned int);
static unsigned int size2tsize(vm_size_t);
static unsigned int ilog2(unsigned int);

static void set_mas4_defaults(void);

static inline void tlb0_flush_entry(vm_offset_t);
static inline unsigned int tlb0_tableidx(vm_offset_t, unsigned int);

/**************************************************************************/
/* Page table management */
/**************************************************************************/

static struct rwlock_padalign pvh_global_lock;

/* Data for the pv entry allocation mechanism */
static uma_zone_t pvzone;
static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0;

#define PV_ENTRY_ZONE_MIN	2048	/* min pv entries in uma zone */

#ifndef PMAP_SHPGPERPROC
#define PMAP_SHPGPERPROC	200
#endif

static void ptbl_init(void);
static struct ptbl_buf *ptbl_buf_alloc(void);
static void ptbl_buf_free(struct ptbl_buf *);
static void ptbl_free_pmap_ptbl(pmap_t, pte_t *);

static pte_t *ptbl_alloc(mmu_t, pmap_t, unsigned int, boolean_t);
static void ptbl_free(mmu_t, pmap_t, unsigned int);
static void ptbl_hold(mmu_t, pmap_t, unsigned int);
static int ptbl_unhold(mmu_t, pmap_t, unsigned int);

static vm_paddr_t pte_vatopa(mmu_t, pmap_t, vm_offset_t);
static pte_t *pte_find(mmu_t, pmap_t, vm_offset_t);
static int pte_enter(mmu_t, pmap_t, vm_page_t, vm_offset_t, uint32_t, boolean_t);
static int pte_remove(mmu_t, pmap_t, vm_offset_t, uint8_t);
static void kernel_pte_alloc(vm_offset_t data_end, vm_offset_t addr,
    vm_offset_t pdir);

static pv_entry_t pv_alloc(void);
static void pv_free(pv_entry_t);
static void pv_insert(pmap_t, vm_offset_t, vm_page_t);
static void pv_remove(pmap_t, vm_offset_t, vm_page_t);

static void booke_pmap_init_qpages(void);

/* Number of kva ptbl buffers, each covering one ptbl (PTBL_PAGES). */
#define PTBL_BUFS		(128 * 16)

struct ptbl_buf {
	TAILQ_ENTRY(ptbl_buf) link;	/* list link */
	vm_offset_t kva;		/* va of mapping */
};

/* ptbl free list and a lock used for access synchronization. */
static TAILQ_HEAD(, ptbl_buf) ptbl_buf_freelist;
static struct mtx ptbl_buf_freelist_lock;

/* Base address of kva space allocated for ptbl bufs. */
static vm_offset_t ptbl_buf_pool_vabase;

/* Pointer to ptbl_buf structures.
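 * The array is sized at bootstrap time (see mmu_booke_bootstrap()) and
 * backs the free list above.  As a rough illustration, with PTBL_BUFS
 * as defined above (128 * 16 = 2048 buffers) the pool's KVA window is
 * PTBL_BUFS * PTBL_PAGES * PAGE_SIZE bytes starting at
 * ptbl_buf_pool_vabase; e.g. with 4 KB pages and PTBL_PAGES == 2 (an
 * assumed value for illustration only) that comes to 16 MB of KVA.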
*/ static struct ptbl_buf *ptbl_bufs; #ifdef SMP extern tlb_entry_t __boot_tlb1[]; void pmap_bootstrap_ap(volatile uint32_t *); #endif /* * Kernel MMU interface */ static void mmu_booke_clear_modify(mmu_t, vm_page_t); static void mmu_booke_copy(mmu_t, pmap_t, pmap_t, vm_offset_t, vm_size_t, vm_offset_t); static void mmu_booke_copy_page(mmu_t, vm_page_t, vm_page_t); static void mmu_booke_copy_pages(mmu_t, vm_page_t *, vm_offset_t, vm_page_t *, vm_offset_t, int); static int mmu_booke_enter(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int flags, int8_t psind); static void mmu_booke_enter_object(mmu_t, pmap_t, vm_offset_t, vm_offset_t, vm_page_t, vm_prot_t); static void mmu_booke_enter_quick(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t); static vm_paddr_t mmu_booke_extract(mmu_t, pmap_t, vm_offset_t); static vm_page_t mmu_booke_extract_and_hold(mmu_t, pmap_t, vm_offset_t, vm_prot_t); static void mmu_booke_init(mmu_t); static boolean_t mmu_booke_is_modified(mmu_t, vm_page_t); static boolean_t mmu_booke_is_prefaultable(mmu_t, pmap_t, vm_offset_t); static boolean_t mmu_booke_is_referenced(mmu_t, vm_page_t); static int mmu_booke_ts_referenced(mmu_t, vm_page_t); static vm_offset_t mmu_booke_map(mmu_t, vm_offset_t *, vm_paddr_t, vm_paddr_t, int); static int mmu_booke_mincore(mmu_t, pmap_t, vm_offset_t, vm_paddr_t *); static void mmu_booke_object_init_pt(mmu_t, pmap_t, vm_offset_t, vm_object_t, vm_pindex_t, vm_size_t); static boolean_t mmu_booke_page_exists_quick(mmu_t, pmap_t, vm_page_t); static void mmu_booke_page_init(mmu_t, vm_page_t); static int mmu_booke_page_wired_mappings(mmu_t, vm_page_t); static void mmu_booke_pinit(mmu_t, pmap_t); static void mmu_booke_pinit0(mmu_t, pmap_t); static void mmu_booke_protect(mmu_t, pmap_t, vm_offset_t, vm_offset_t, vm_prot_t); static void mmu_booke_qenter(mmu_t, vm_offset_t, vm_page_t *, int); static void mmu_booke_qremove(mmu_t, vm_offset_t, int); static void mmu_booke_release(mmu_t, pmap_t); static void mmu_booke_remove(mmu_t, pmap_t, vm_offset_t, vm_offset_t); static void mmu_booke_remove_all(mmu_t, vm_page_t); static void mmu_booke_remove_write(mmu_t, vm_page_t); static void mmu_booke_unwire(mmu_t, pmap_t, vm_offset_t, vm_offset_t); static void mmu_booke_zero_page(mmu_t, vm_page_t); static void mmu_booke_zero_page_area(mmu_t, vm_page_t, int, int); static void mmu_booke_activate(mmu_t, struct thread *); static void mmu_booke_deactivate(mmu_t, struct thread *); static void mmu_booke_bootstrap(mmu_t, vm_offset_t, vm_offset_t); static void *mmu_booke_mapdev(mmu_t, vm_paddr_t, vm_size_t); static void *mmu_booke_mapdev_attr(mmu_t, vm_paddr_t, vm_size_t, vm_memattr_t); static void mmu_booke_unmapdev(mmu_t, vm_offset_t, vm_size_t); static vm_paddr_t mmu_booke_kextract(mmu_t, vm_offset_t); static void mmu_booke_kenter(mmu_t, vm_offset_t, vm_paddr_t); static void mmu_booke_kenter_attr(mmu_t, vm_offset_t, vm_paddr_t, vm_memattr_t); static void mmu_booke_kremove(mmu_t, vm_offset_t); static boolean_t mmu_booke_dev_direct_mapped(mmu_t, vm_paddr_t, vm_size_t); static void mmu_booke_sync_icache(mmu_t, pmap_t, vm_offset_t, vm_size_t); static void mmu_booke_dumpsys_map(mmu_t, vm_paddr_t pa, size_t, void **); static void mmu_booke_dumpsys_unmap(mmu_t, vm_paddr_t pa, size_t, void *); static void mmu_booke_scan_init(mmu_t); static vm_offset_t mmu_booke_quick_enter_page(mmu_t mmu, vm_page_t m); static void mmu_booke_quick_remove_page(mmu_t mmu, vm_offset_t addr); static int mmu_booke_change_attr(mmu_t mmu, vm_offset_t addr, vm_size_t sz, vm_memattr_t mode); static 
mmu_method_t mmu_booke_methods[] = { /* pmap dispatcher interface */ MMUMETHOD(mmu_clear_modify, mmu_booke_clear_modify), MMUMETHOD(mmu_copy, mmu_booke_copy), MMUMETHOD(mmu_copy_page, mmu_booke_copy_page), MMUMETHOD(mmu_copy_pages, mmu_booke_copy_pages), MMUMETHOD(mmu_enter, mmu_booke_enter), MMUMETHOD(mmu_enter_object, mmu_booke_enter_object), MMUMETHOD(mmu_enter_quick, mmu_booke_enter_quick), MMUMETHOD(mmu_extract, mmu_booke_extract), MMUMETHOD(mmu_extract_and_hold, mmu_booke_extract_and_hold), MMUMETHOD(mmu_init, mmu_booke_init), MMUMETHOD(mmu_is_modified, mmu_booke_is_modified), MMUMETHOD(mmu_is_prefaultable, mmu_booke_is_prefaultable), MMUMETHOD(mmu_is_referenced, mmu_booke_is_referenced), MMUMETHOD(mmu_ts_referenced, mmu_booke_ts_referenced), MMUMETHOD(mmu_map, mmu_booke_map), MMUMETHOD(mmu_mincore, mmu_booke_mincore), MMUMETHOD(mmu_object_init_pt, mmu_booke_object_init_pt), MMUMETHOD(mmu_page_exists_quick,mmu_booke_page_exists_quick), MMUMETHOD(mmu_page_init, mmu_booke_page_init), MMUMETHOD(mmu_page_wired_mappings, mmu_booke_page_wired_mappings), MMUMETHOD(mmu_pinit, mmu_booke_pinit), MMUMETHOD(mmu_pinit0, mmu_booke_pinit0), MMUMETHOD(mmu_protect, mmu_booke_protect), MMUMETHOD(mmu_qenter, mmu_booke_qenter), MMUMETHOD(mmu_qremove, mmu_booke_qremove), MMUMETHOD(mmu_release, mmu_booke_release), MMUMETHOD(mmu_remove, mmu_booke_remove), MMUMETHOD(mmu_remove_all, mmu_booke_remove_all), MMUMETHOD(mmu_remove_write, mmu_booke_remove_write), MMUMETHOD(mmu_sync_icache, mmu_booke_sync_icache), MMUMETHOD(mmu_unwire, mmu_booke_unwire), MMUMETHOD(mmu_zero_page, mmu_booke_zero_page), MMUMETHOD(mmu_zero_page_area, mmu_booke_zero_page_area), MMUMETHOD(mmu_activate, mmu_booke_activate), MMUMETHOD(mmu_deactivate, mmu_booke_deactivate), MMUMETHOD(mmu_quick_enter_page, mmu_booke_quick_enter_page), MMUMETHOD(mmu_quick_remove_page, mmu_booke_quick_remove_page), /* Internal interfaces */ MMUMETHOD(mmu_bootstrap, mmu_booke_bootstrap), MMUMETHOD(mmu_dev_direct_mapped,mmu_booke_dev_direct_mapped), MMUMETHOD(mmu_mapdev, mmu_booke_mapdev), MMUMETHOD(mmu_mapdev_attr, mmu_booke_mapdev_attr), MMUMETHOD(mmu_kenter, mmu_booke_kenter), MMUMETHOD(mmu_kenter_attr, mmu_booke_kenter_attr), MMUMETHOD(mmu_kextract, mmu_booke_kextract), MMUMETHOD(mmu_kremove, mmu_booke_kremove), MMUMETHOD(mmu_unmapdev, mmu_booke_unmapdev), MMUMETHOD(mmu_change_attr, mmu_booke_change_attr), /* dumpsys() support */ MMUMETHOD(mmu_dumpsys_map, mmu_booke_dumpsys_map), MMUMETHOD(mmu_dumpsys_unmap, mmu_booke_dumpsys_unmap), MMUMETHOD(mmu_scan_init, mmu_booke_scan_init), { 0, 0 } }; MMU_DEF(booke_mmu, MMU_TYPE_BOOKE, mmu_booke_methods, 0); static __inline uint32_t tlb_calc_wimg(vm_paddr_t pa, vm_memattr_t ma) { uint32_t attrib; int i; if (ma != VM_MEMATTR_DEFAULT) { switch (ma) { case VM_MEMATTR_UNCACHEABLE: return (MAS2_I | MAS2_G); case VM_MEMATTR_WRITE_COMBINING: case VM_MEMATTR_WRITE_BACK: case VM_MEMATTR_PREFETCHABLE: return (MAS2_I); case VM_MEMATTR_WRITE_THROUGH: return (MAS2_W | MAS2_M); case VM_MEMATTR_CACHEABLE: return (MAS2_M); } } /* * Assume the page is cache inhibited and access is guarded unless * it's in our available memory array. 
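 *
 * For example (illustrative only): a DRAM page listed in
 * physmem_regions matches the loop below and gets _TLB_ENTRY_MEM
 * (cacheable), while a device page such as a memory-mapped PCI BAR
 * falls through and keeps _TLB_ENTRY_IO, i.e. the same cache-inhibited
 * plus guarded semantics returned for VM_MEMATTR_UNCACHEABLE above.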
*/ attrib = _TLB_ENTRY_IO; for (i = 0; i < physmem_regions_sz; i++) { if ((pa >= physmem_regions[i].mr_start) && (pa < (physmem_regions[i].mr_start + physmem_regions[i].mr_size))) { attrib = _TLB_ENTRY_MEM; break; } } return (attrib); } static inline void tlb_miss_lock(void) { #ifdef SMP struct pcpu *pc; if (!smp_started) return; STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) { if (pc != pcpup) { CTR3(KTR_PMAP, "%s: tlb miss LOCK of CPU=%d, " "tlb_lock=%p", __func__, pc->pc_cpuid, pc->pc_booke_tlb_lock); KASSERT((pc->pc_cpuid != PCPU_GET(cpuid)), ("tlb_miss_lock: tried to lock self")); tlb_lock(pc->pc_booke_tlb_lock); CTR1(KTR_PMAP, "%s: locked", __func__); } } #endif } static inline void tlb_miss_unlock(void) { #ifdef SMP struct pcpu *pc; if (!smp_started) return; STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) { if (pc != pcpup) { CTR2(KTR_PMAP, "%s: tlb miss UNLOCK of CPU=%d", __func__, pc->pc_cpuid); tlb_unlock(pc->pc_booke_tlb_lock); CTR1(KTR_PMAP, "%s: unlocked", __func__); } } #endif } /* Return number of entries in TLB0. */ static __inline void tlb0_get_tlbconf(void) { uint32_t tlb0_cfg; tlb0_cfg = mfspr(SPR_TLB0CFG); tlb0_entries = tlb0_cfg & TLBCFG_NENTRY_MASK; tlb0_ways = (tlb0_cfg & TLBCFG_ASSOC_MASK) >> TLBCFG_ASSOC_SHIFT; tlb0_entries_per_way = tlb0_entries / tlb0_ways; } /* Return number of entries in TLB1. */ static __inline void tlb1_get_tlbconf(void) { uint32_t tlb1_cfg; tlb1_cfg = mfspr(SPR_TLB1CFG); tlb1_entries = tlb1_cfg & TLBCFG_NENTRY_MASK; } /**************************************************************************/ /* Page table related */ /**************************************************************************/ /* Initialize pool of kva ptbl buffers. */ static void ptbl_init(void) { int i; CTR3(KTR_PMAP, "%s: s (ptbl_bufs = 0x%08x size 0x%08x)", __func__, (uint32_t)ptbl_bufs, sizeof(struct ptbl_buf) * PTBL_BUFS); CTR3(KTR_PMAP, "%s: s (ptbl_buf_pool_vabase = 0x%08x size = 0x%08x)", __func__, ptbl_buf_pool_vabase, PTBL_BUFS * PTBL_PAGES * PAGE_SIZE); mtx_init(&ptbl_buf_freelist_lock, "ptbl bufs lock", NULL, MTX_DEF); TAILQ_INIT(&ptbl_buf_freelist); for (i = 0; i < PTBL_BUFS; i++) { ptbl_bufs[i].kva = ptbl_buf_pool_vabase + i * PTBL_PAGES * PAGE_SIZE; TAILQ_INSERT_TAIL(&ptbl_buf_freelist, &ptbl_bufs[i], link); } } /* Get a ptbl_buf from the freelist. */ static struct ptbl_buf * ptbl_buf_alloc(void) { struct ptbl_buf *buf; mtx_lock(&ptbl_buf_freelist_lock); buf = TAILQ_FIRST(&ptbl_buf_freelist); if (buf != NULL) TAILQ_REMOVE(&ptbl_buf_freelist, buf, link); mtx_unlock(&ptbl_buf_freelist_lock); CTR2(KTR_PMAP, "%s: buf = %p", __func__, buf); return (buf); } /* Return ptbl buff to free pool. */ static void ptbl_buf_free(struct ptbl_buf *buf) { CTR2(KTR_PMAP, "%s: buf = %p", __func__, buf); mtx_lock(&ptbl_buf_freelist_lock); TAILQ_INSERT_TAIL(&ptbl_buf_freelist, buf, link); mtx_unlock(&ptbl_buf_freelist_lock); } /* * Search the list of allocated ptbl bufs and find on list of allocated ptbls */ static void ptbl_free_pmap_ptbl(pmap_t pmap, pte_t *ptbl) { struct ptbl_buf *pbuf; CTR2(KTR_PMAP, "%s: ptbl = %p", __func__, ptbl); PMAP_LOCK_ASSERT(pmap, MA_OWNED); TAILQ_FOREACH(pbuf, &pmap->pm_ptbl_list, link) if (pbuf->kva == (vm_offset_t)ptbl) { /* Remove from pmap ptbl buf list. */ TAILQ_REMOVE(&pmap->pm_ptbl_list, pbuf, link); /* Free corresponding ptbl buf. */ ptbl_buf_free(pbuf); break; } } /* Allocate page table. 
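 * A sketch of the expected call pattern, with 'va' a hypothetical user
 * virtual address (pte_enter() below is the real caller):
 *
 *	ptbl = ptbl_alloc(mmu, pmap, PDIR_IDX(va), nosleep);
 *	if (ptbl == NULL)
 *		handle ENOMEM (only possible when nosleep is TRUE)
 *
 * Note that in the sleeping case the pmap lock and the global pv lock
 * are dropped and re-taken around VM_WAIT, so callers must be prepared
 * for the pmap to have changed underneath them.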
 */
static pte_t *
ptbl_alloc(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx, boolean_t nosleep)
{
	vm_page_t mtbl[PTBL_PAGES];
	vm_page_t m;
	struct ptbl_buf *pbuf;
	unsigned int pidx;
	pte_t *ptbl;
	int i, j;

	CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap,
	    (pmap == kernel_pmap), pdir_idx);

	KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)),
	    ("ptbl_alloc: invalid pdir_idx"));
	KASSERT((pmap->pm_pdir[pdir_idx] == NULL),
	    ("pte_alloc: valid ptbl entry exists!"));

	pbuf = ptbl_buf_alloc();
	if (pbuf == NULL)
		panic("pte_alloc: couldn't alloc kernel virtual memory");

	ptbl = (pte_t *)pbuf->kva;

	CTR2(KTR_PMAP, "%s: ptbl kva = %p", __func__, ptbl);

	/* Allocate ptbl pages, this will sleep! */
	for (i = 0; i < PTBL_PAGES; i++) {
		pidx = (PTBL_PAGES * pdir_idx) + i;
		while ((m = vm_page_alloc(NULL, pidx,
		    VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) {
			PMAP_UNLOCK(pmap);
			rw_wunlock(&pvh_global_lock);
			if (nosleep) {
				ptbl_free_pmap_ptbl(pmap, ptbl);
				for (j = 0; j < i; j++)
					vm_page_free(mtbl[j]);
				atomic_subtract_int(&vm_cnt.v_wire_count, i);
				return (NULL);
			}
			VM_WAIT;
			rw_wlock(&pvh_global_lock);
			PMAP_LOCK(pmap);
		}
		mtbl[i] = m;
	}

	/* Map allocated pages into kernel_pmap. */
	mmu_booke_qenter(mmu, (vm_offset_t)ptbl, mtbl, PTBL_PAGES);

	/* Zero whole ptbl. */
	bzero((caddr_t)ptbl, PTBL_PAGES * PAGE_SIZE);

	/* Add pbuf to the pmap ptbl bufs list. */
	TAILQ_INSERT_TAIL(&pmap->pm_ptbl_list, pbuf, link);

	return (ptbl);
}

/* Free ptbl pages and invalidate pdir entry. */
static void
ptbl_free(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx)
{
	pte_t *ptbl;
	vm_paddr_t pa;
	vm_offset_t va;
	vm_page_t m;
	int i;

	CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap,
	    (pmap == kernel_pmap), pdir_idx);

	KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)),
	    ("ptbl_free: invalid pdir_idx"));

	ptbl = pmap->pm_pdir[pdir_idx];

	CTR2(KTR_PMAP, "%s: ptbl = %p", __func__, ptbl);

	KASSERT((ptbl != NULL), ("ptbl_free: null ptbl"));

	/*
	 * Invalidate the pdir entry as soon as possible, so that other CPUs
	 * don't attempt to look up the page tables we are releasing.
	 */
	mtx_lock_spin(&tlbivax_mutex);
	tlb_miss_lock();

	pmap->pm_pdir[pdir_idx] = NULL;

	tlb_miss_unlock();
	mtx_unlock_spin(&tlbivax_mutex);

	for (i = 0; i < PTBL_PAGES; i++) {
		va = ((vm_offset_t)ptbl + (i * PAGE_SIZE));
		pa = pte_vatopa(mmu, kernel_pmap, va);
		m = PHYS_TO_VM_PAGE(pa);
		vm_page_free_zero(m);
		atomic_subtract_int(&vm_cnt.v_wire_count, 1);
		mmu_booke_kremove(mmu, va);
	}

	ptbl_free_pmap_ptbl(pmap, ptbl);
}

/*
 * Decrement ptbl pages hold count and attempt to free ptbl pages.
 * Called when removing pte entry from ptbl.
 *
 * Return 1 if ptbl pages were freed.
 */
static int
ptbl_unhold(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx)
{
	pte_t *ptbl;
	vm_paddr_t pa;
	vm_page_t m;
	int i;

	CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap,
	    (pmap == kernel_pmap), pdir_idx);

	KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)),
	    ("ptbl_unhold: invalid pdir_idx"));
	KASSERT((pmap != kernel_pmap),
	    ("ptbl_unhold: unholding kernel ptbl!"));

	ptbl = pmap->pm_pdir[pdir_idx];

	//debugf("ptbl_unhold: ptbl = 0x%08x\n", (u_int32_t)ptbl);
	KASSERT(((vm_offset_t)ptbl >= VM_MIN_KERNEL_ADDRESS),
	    ("ptbl_unhold: non kva ptbl"));

	/* decrement hold count */
	for (i = 0; i < PTBL_PAGES; i++) {
		pa = pte_vatopa(mmu, kernel_pmap,
		    (vm_offset_t)ptbl + (i * PAGE_SIZE));
		m = PHYS_TO_VM_PAGE(pa);
		m->wire_count--;
	}

	/*
	 * Free ptbl pages if there are no pte entries in this ptbl.
	 * wire_count has the same value for all ptbl pages, so check the last
	 * page.
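 *
 * Illustrative example: a ptbl holding three valid PTEs has
 * wire_count == 3 on each of its pages (one from the initial wired
 * allocation plus one ptbl_hold() per additional PTE); three matching
 * pte_remove() calls bring the count to 0, at which point ptbl_free()
 * above runs and the pdir slot is cleared.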
*/ if (m->wire_count == 0) { ptbl_free(mmu, pmap, pdir_idx); //debugf("ptbl_unhold: e (freed ptbl)\n"); return (1); } return (0); } /* * Increment hold count for ptbl pages. This routine is used when a new pte * entry is being inserted into the ptbl. */ static void ptbl_hold(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx) { vm_paddr_t pa; pte_t *ptbl; vm_page_t m; int i; CTR3(KTR_PMAP, "%s: pmap = %p pdir_idx = %d", __func__, pmap, pdir_idx); KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)), ("ptbl_hold: invalid pdir_idx")); KASSERT((pmap != kernel_pmap), ("ptbl_hold: holding kernel ptbl!")); ptbl = pmap->pm_pdir[pdir_idx]; KASSERT((ptbl != NULL), ("ptbl_hold: null ptbl")); for (i = 0; i < PTBL_PAGES; i++) { pa = pte_vatopa(mmu, kernel_pmap, (vm_offset_t)ptbl + (i * PAGE_SIZE)); m = PHYS_TO_VM_PAGE(pa); m->wire_count++; } } /* Allocate pv_entry structure. */ pv_entry_t pv_alloc(void) { pv_entry_t pv; pv_entry_count++; if (pv_entry_count > pv_entry_high_water) pagedaemon_wakeup(); pv = uma_zalloc(pvzone, M_NOWAIT); return (pv); } /* Free pv_entry structure. */ static __inline void pv_free(pv_entry_t pve) { pv_entry_count--; uma_zfree(pvzone, pve); } /* Allocate and initialize pv_entry structure. */ static void pv_insert(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pve; //int su = (pmap == kernel_pmap); //debugf("pv_insert: s (su = %d pmap = 0x%08x va = 0x%08x m = 0x%08x)\n", su, // (u_int32_t)pmap, va, (u_int32_t)m); pve = pv_alloc(); if (pve == NULL) panic("pv_insert: no pv entries!"); pve->pv_pmap = pmap; pve->pv_va = va; /* add to pv_list */ PMAP_LOCK_ASSERT(pmap, MA_OWNED); rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_INSERT_TAIL(&m->md.pv_list, pve, pv_link); //debugf("pv_insert: e\n"); } /* Destroy pv entry. */ static void pv_remove(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pve; //int su = (pmap == kernel_pmap); //debugf("pv_remove: s (su = %d pmap = 0x%08x va = 0x%08x)\n", su, (u_int32_t)pmap, va); PMAP_LOCK_ASSERT(pmap, MA_OWNED); rw_assert(&pvh_global_lock, RA_WLOCKED); /* find pv entry */ TAILQ_FOREACH(pve, &m->md.pv_list, pv_link) { if ((pmap == pve->pv_pmap) && (va == pve->pv_va)) { /* remove from pv_list */ TAILQ_REMOVE(&m->md.pv_list, pve, pv_link); if (TAILQ_EMPTY(&m->md.pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); /* free pv entry struct */ pv_free(pve); break; } } //debugf("pv_remove: e\n"); } /* * Clean pte entry, try to free page table page if requested. * * Return 1 if ptbl pages were freed, otherwise return 0. */ static int pte_remove(mmu_t mmu, pmap_t pmap, vm_offset_t va, uint8_t flags) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); vm_page_t m; pte_t *ptbl; pte_t *pte; //int su = (pmap == kernel_pmap); //debugf("pte_remove: s (su = %d pmap = 0x%08x va = 0x%08x flags = %d)\n", // su, (u_int32_t)pmap, va, flags); ptbl = pmap->pm_pdir[pdir_idx]; KASSERT(ptbl, ("pte_remove: null ptbl")); pte = &ptbl[ptbl_idx]; if (pte == NULL || !PTE_ISVALID(pte)) return (0); if (PTE_ISWIRED(pte)) pmap->pm_stats.wired_count--; /* Handle managed entry. */ if (PTE_ISMANAGED(pte)) { /* Get vm_page_t for mapped pte. 
*/ m = PHYS_TO_VM_PAGE(PTE_PA(pte)); if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); if (PTE_ISREFERENCED(pte)) vm_page_aflag_set(m, PGA_REFERENCED); pv_remove(pmap, va, m); } mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); *pte = 0; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); pmap->pm_stats.resident_count--; if (flags & PTBL_UNHOLD) { //debugf("pte_remove: e (unhold)\n"); return (ptbl_unhold(mmu, pmap, pdir_idx)); } //debugf("pte_remove: e\n"); return (0); } /* * Insert PTE for a given page and virtual address. */ static int pte_enter(mmu_t mmu, pmap_t pmap, vm_page_t m, vm_offset_t va, uint32_t flags, boolean_t nosleep) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); pte_t *ptbl, *pte; CTR4(KTR_PMAP, "%s: su = %d pmap = %p va = %p", __func__, pmap == kernel_pmap, pmap, va); /* Get the page table pointer. */ ptbl = pmap->pm_pdir[pdir_idx]; if (ptbl == NULL) { /* Allocate page table pages. */ ptbl = ptbl_alloc(mmu, pmap, pdir_idx, nosleep); if (ptbl == NULL) { KASSERT(nosleep, ("nosleep and NULL ptbl")); return (ENOMEM); } } else { /* * Check if there is valid mapping for requested * va, if there is, remove it. */ pte = &pmap->pm_pdir[pdir_idx][ptbl_idx]; if (PTE_ISVALID(pte)) { pte_remove(mmu, pmap, va, PTBL_HOLD); } else { /* * pte is not used, increment hold count * for ptbl pages. */ if (pmap != kernel_pmap) ptbl_hold(mmu, pmap, pdir_idx); } } /* * Insert pv_entry into pv_list for mapped page if part of managed * memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { flags |= PTE_MANAGED; /* Create and insert pv entry. */ pv_insert(pmap, va, m); } pmap->pm_stats.resident_count++; mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); if (pmap->pm_pdir[pdir_idx] == NULL) { /* * If we just allocated a new page table, hook it in * the pdir. */ pmap->pm_pdir[pdir_idx] = ptbl; } pte = &(pmap->pm_pdir[pdir_idx][ptbl_idx]); *pte = PTE_RPN_FROM_PA(VM_PAGE_TO_PHYS(m)); *pte |= (PTE_VALID | flags | PTE_PS_4KB); /* 4KB pages only */ tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); return (0); } /* Return the pa for the given pmap/va. */ static vm_paddr_t pte_vatopa(mmu_t mmu, pmap_t pmap, vm_offset_t va) { vm_paddr_t pa = 0; pte_t *pte; pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) pa = (PTE_PA(pte) | (va & PTE_PA_MASK)); return (pa); } /* Get a pointer to a PTE in a page table. */ static pte_t * pte_find(mmu_t mmu, pmap_t pmap, vm_offset_t va) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); KASSERT((pmap != NULL), ("pte_find: invalid pmap")); if (pmap->pm_pdir[pdir_idx]) return (&(pmap->pm_pdir[pdir_idx][ptbl_idx])); return (NULL); } /* Set up kernel page tables. */ static void kernel_pte_alloc(vm_offset_t data_end, vm_offset_t addr, vm_offset_t pdir) { int i; vm_offset_t va; pte_t *pte; /* Initialize kernel pdir */ for (i = 0; i < kernel_ptbls; i++) kernel_pmap->pm_pdir[kptbl_min + i] = (pte_t *)(pdir + (i * PAGE_SIZE * PTBL_PAGES)); /* * Fill in PTEs covering kernel code and data. They are not required * for address translation, as this area is covered by static TLB1 * entries, but for pte_vatopa() to work correctly with kernel area * addresses. 
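 *
 * For example (hypothetical lookup), once these PTEs are filled in,
 *
 *	pa = pte_vatopa(mmu, kernel_pmap, (vm_offset_t)_etext);
 *
 * resolves to kernload + ((vm_offset_t)_etext - kernstart), even
 * though the translation itself is served by a static TLB1 entry.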
*/ for (va = addr; va < data_end; va += PAGE_SIZE) { pte = &(kernel_pmap->pm_pdir[PDIR_IDX(va)][PTBL_IDX(va)]); *pte = PTE_RPN_FROM_PA(kernload + (va - kernstart)); *pte |= PTE_M | PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID | PTE_PS_4KB; } } /**************************************************************************/ /* PMAP related */ /**************************************************************************/ /* * This is called during booke_init, before the system is really initialized. */ static void mmu_booke_bootstrap(mmu_t mmu, vm_offset_t start, vm_offset_t kernelend) { vm_paddr_t phys_kernelend; struct mem_region *mp, *mp1; int cnt, i, j; vm_paddr_t s, e, sz; vm_paddr_t physsz, hwphyssz; u_int phys_avail_count; vm_size_t kstack0_sz; vm_offset_t kernel_pdir, kstack0; vm_paddr_t kstack0_phys; void *dpcpu; debugf("mmu_booke_bootstrap: entered\n"); /* Set interesting system properties */ hw_direct_map = 0; elf32_nxstack = 1; /* Initialize invalidation mutex */ mtx_init(&tlbivax_mutex, "tlbivax", NULL, MTX_SPIN); /* Read TLB0 size and associativity. */ tlb0_get_tlbconf(); /* * Align kernel start and end address (kernel image). * Note that kernel end does not necessarily relate to kernsize. * kernsize is the size of the kernel that is actually mapped. */ kernstart = trunc_page(start); data_start = round_page(kernelend); data_end = data_start; /* * Addresses of preloaded modules (like file systems) use * physical addresses. Make sure we relocate those into * virtual addresses. */ preload_addr_relocate = kernstart - kernload; /* Allocate the dynamic per-cpu area. */ dpcpu = (void *)data_end; data_end += DPCPU_SIZE; /* Allocate space for the message buffer. */ msgbufp = (struct msgbuf *)data_end; data_end += msgbufsize; debugf(" msgbufp at 0x%08x end = 0x%08x\n", (uint32_t)msgbufp, data_end); data_end = round_page(data_end); /* Allocate space for ptbl_bufs. */ ptbl_bufs = (struct ptbl_buf *)data_end; data_end += sizeof(struct ptbl_buf) * PTBL_BUFS; debugf(" ptbl_bufs at 0x%08x end = 0x%08x\n", (uint32_t)ptbl_bufs, data_end); data_end = round_page(data_end); /* Allocate PTE tables for kernel KVA. */ kernel_pdir = data_end; kernel_ptbls = howmany(VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS, PDIR_SIZE); data_end += kernel_ptbls * PTBL_PAGES * PAGE_SIZE; debugf(" kernel ptbls: %d\n", kernel_ptbls); debugf(" kernel pdir at 0x%08x end = 0x%08x\n", kernel_pdir, data_end); debugf(" data_end: 0x%08x\n", data_end); if (data_end - kernstart > kernsize) { kernsize += tlb1_mapin_region(kernstart + kernsize, kernload + kernsize, (data_end - kernstart) - kernsize); } data_end = kernstart + kernsize; debugf(" updated data_end: 0x%08x\n", data_end); /* * Clear the structures - note we can only do it safely after the * possible additional TLB1 translations are in place (above) so that * all range up to the currently calculated 'data_end' is covered. */ dpcpu_init(dpcpu, 0); memset((void *)ptbl_bufs, 0, sizeof(struct ptbl_buf) * PTBL_SIZE); memset((void *)kernel_pdir, 0, kernel_ptbls * PTBL_PAGES * PAGE_SIZE); /*******************************************************/ /* Set the start and end of kva. */ /*******************************************************/ virtual_avail = round_page(data_end); virtual_end = VM_MAX_KERNEL_ADDRESS; /* Allocate KVA space for page zero/copy operations. 
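 * Three single-page KVA windows are reserved back to back:
 * zero_page_va for mmu_booke_zero_page(), then copy_page_src_va and
 * copy_page_dst_va for mmu_booke_copy_page().  For example, if
 * virtual_avail came out of the bootstrap above as 0xc2009000 (a
 * made-up value), the windows would land at 0xc2009000, 0xc200a000
 * and 0xc200b000 with 4 KB pages.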
 */
	zero_page_va = virtual_avail;
	virtual_avail += PAGE_SIZE;
	copy_page_src_va = virtual_avail;
	virtual_avail += PAGE_SIZE;
	copy_page_dst_va = virtual_avail;
	virtual_avail += PAGE_SIZE;
	debugf("zero_page_va = 0x%08x\n", zero_page_va);
	debugf("copy_page_src_va = 0x%08x\n", copy_page_src_va);
	debugf("copy_page_dst_va = 0x%08x\n", copy_page_dst_va);

	/* Initialize page zero/copy mutexes. */
	mtx_init(&zero_page_mutex, "mmu_booke_zero_page", NULL, MTX_DEF);
	mtx_init(&copy_page_mutex, "mmu_booke_copy_page", NULL, MTX_DEF);

	/* Allocate KVA space for ptbl bufs. */
	ptbl_buf_pool_vabase = virtual_avail;
	virtual_avail += PTBL_BUFS * PTBL_PAGES * PAGE_SIZE;
	debugf("ptbl_buf_pool_vabase = 0x%08x end = 0x%08x\n",
	    ptbl_buf_pool_vabase, virtual_avail);

	/* Calculate corresponding physical addresses for the kernel region. */
	phys_kernelend = kernload + kernsize;
	debugf("kernel image and allocated data:\n");
	debugf(" kernload = 0x%09llx\n", (uint64_t)kernload);
	debugf(" kernstart = 0x%08x\n", kernstart);
	debugf(" kernsize = 0x%08x\n", kernsize);

	if (sizeof(phys_avail) / sizeof(phys_avail[0]) < availmem_regions_sz)
		panic("mmu_booke_bootstrap: phys_avail too small");

	/*
	 * Remove kernel physical address range from avail regions list. Page
	 * align all regions. Non-page aligned memory isn't very interesting
	 * to us. Also, sort the entries for ascending addresses.
	 */

	/* Retrieve phys/avail mem regions */
	mem_regions(&physmem_regions, &physmem_regions_sz,
	    &availmem_regions, &availmem_regions_sz);
	sz = 0;
	cnt = availmem_regions_sz;
	debugf("processing avail regions:\n");
	for (mp = availmem_regions; mp->mr_size; mp++) {
		s = mp->mr_start;
		e = mp->mr_start + mp->mr_size;
		debugf(" %09jx-%09jx -> ", (uintmax_t)s, (uintmax_t)e);
		/* Check whether this region holds all of the kernel. */
		if (s < kernload && e > phys_kernelend) {
			availmem_regions[cnt].mr_start = phys_kernelend;
			availmem_regions[cnt++].mr_size = e - phys_kernelend;
			e = kernload;
		}
		/* Look whether this region starts within the kernel. */
		if (s >= kernload && s < phys_kernelend) {
			if (e <= phys_kernelend)
				goto empty;
			s = phys_kernelend;
		}
		/* Now look whether this region ends within the kernel. */
		if (e > kernload && e <= phys_kernelend) {
			if (s >= kernload)
				goto empty;
			e = kernload;
		}
		/* Now page align the start and size of the region. */
		s = round_page(s);
		e = trunc_page(e);
		if (e < s)
			e = s;
		sz = e - s;
		debugf("%09jx-%09jx = %jx\n",
		    (uintmax_t)s, (uintmax_t)e, (uintmax_t)sz);

		/* Check whether some memory is left here. */
		if (sz == 0) {
		empty:
			memmove(mp, mp + 1,
			    (cnt - (mp - availmem_regions)) * sizeof(*mp));
			cnt--;
			mp--;
			continue;
		}

		/* Do an insertion sort.
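 *
 * Illustrative example with made-up addresses: if the list already
 * holds regions starting at 0x00000000 and 0x20000000 and the region
 * just trimmed starts at 0x10000000, the loop below stops at the
 * 0x20000000 entry, memmove() shifts that tail up one slot, and the
 * new region is written into the gap, keeping ascending order.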
*/ for (mp1 = availmem_regions; mp1 < mp; mp1++) if (s < mp1->mr_start) break; if (mp1 < mp) { memmove(mp1 + 1, mp1, (char *)mp - (char *)mp1); mp1->mr_start = s; mp1->mr_size = sz; } else { mp->mr_start = s; mp->mr_size = sz; } } availmem_regions_sz = cnt; /*******************************************************/ /* Steal physical memory for kernel stack from the end */ /* of the first avail region */ /*******************************************************/ kstack0_sz = kstack_pages * PAGE_SIZE; kstack0_phys = availmem_regions[0].mr_start + availmem_regions[0].mr_size; kstack0_phys -= kstack0_sz; availmem_regions[0].mr_size -= kstack0_sz; /*******************************************************/ /* Fill in phys_avail table, based on availmem_regions */ /*******************************************************/ phys_avail_count = 0; physsz = 0; hwphyssz = 0; TUNABLE_ULONG_FETCH("hw.physmem", (u_long *) &hwphyssz); debugf("fill in phys_avail:\n"); for (i = 0, j = 0; i < availmem_regions_sz; i++, j += 2) { debugf(" region: 0x%jx - 0x%jx (0x%jx)\n", (uintmax_t)availmem_regions[i].mr_start, (uintmax_t)availmem_regions[i].mr_start + availmem_regions[i].mr_size, (uintmax_t)availmem_regions[i].mr_size); if (hwphyssz != 0 && (physsz + availmem_regions[i].mr_size) >= hwphyssz) { debugf(" hw.physmem adjust\n"); if (physsz < hwphyssz) { phys_avail[j] = availmem_regions[i].mr_start; phys_avail[j + 1] = availmem_regions[i].mr_start + hwphyssz - physsz; physsz = hwphyssz; phys_avail_count++; } break; } phys_avail[j] = availmem_regions[i].mr_start; phys_avail[j + 1] = availmem_regions[i].mr_start + availmem_regions[i].mr_size; phys_avail_count++; physsz += availmem_regions[i].mr_size; } physmem = btoc(physsz); /* Calculate the last available physical address. */ for (i = 0; phys_avail[i + 2] != 0; i += 2) ; Maxmem = powerpc_btop(phys_avail[i + 1]); debugf("Maxmem = 0x%08lx\n", Maxmem); debugf("phys_avail_count = %d\n", phys_avail_count); debugf("physsz = 0x%09jx physmem = %jd (0x%09jx)\n", (uintmax_t)physsz, (uintmax_t)physmem, (uintmax_t)physmem); /*******************************************************/ /* Initialize (statically allocated) kernel pmap. */ /*******************************************************/ PMAP_LOCK_INIT(kernel_pmap); kptbl_min = VM_MIN_KERNEL_ADDRESS / PDIR_SIZE; debugf("kernel_pmap = 0x%08x\n", (uint32_t)kernel_pmap); debugf("kptbl_min = %d, kernel_ptbls = %d\n", kptbl_min, kernel_ptbls); debugf("kernel pdir range: 0x%08x - 0x%08x\n", kptbl_min * PDIR_SIZE, (kptbl_min + kernel_ptbls) * PDIR_SIZE - 1); kernel_pte_alloc(data_end, kernstart, kernel_pdir); for (i = 0; i < MAXCPU; i++) { kernel_pmap->pm_tid[i] = TID_KERNEL; /* Initialize each CPU's tidbusy entry 0 with kernel_pmap */ tidbusy[i][TID_KERNEL] = kernel_pmap; } /* Mark kernel_pmap active on all CPUs */ CPU_FILL(&kernel_pmap->pm_active); /* * Initialize the global pv list lock. 
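 * Every walk or modification of a page's pv list in this file happens
 * under this lock; the canonical pattern used throughout is:
 *
 *	rw_wlock(&pvh_global_lock);
 *	TAILQ_FOREACH(pv, &m->md.pv_list, pv_link)
 *		... inspect or tear down the mapping ...
 *	rw_wunlock(&pvh_global_lock);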
*/ rw_init(&pvh_global_lock, "pmap pv global"); /*******************************************************/ /* Final setup */ /*******************************************************/ /* Enter kstack0 into kernel map, provide guard page */ kstack0 = virtual_avail + KSTACK_GUARD_PAGES * PAGE_SIZE; thread0.td_kstack = kstack0; thread0.td_kstack_pages = kstack_pages; debugf("kstack_sz = 0x%08x\n", kstack0_sz); debugf("kstack0_phys at 0x%09llx - 0x%09llx\n", kstack0_phys, kstack0_phys + kstack0_sz); debugf("kstack0 at 0x%08x - 0x%08x\n", kstack0, kstack0 + kstack0_sz); virtual_avail += KSTACK_GUARD_PAGES * PAGE_SIZE + kstack0_sz; for (i = 0; i < kstack_pages; i++) { mmu_booke_kenter(mmu, kstack0, kstack0_phys); kstack0 += PAGE_SIZE; kstack0_phys += PAGE_SIZE; } pmap_bootstrapped = 1; debugf("virtual_avail = %08x\n", virtual_avail); debugf("virtual_end = %08x\n", virtual_end); debugf("mmu_booke_bootstrap: exit\n"); } #ifdef SMP void tlb1_ap_prep(void) { tlb_entry_t *e, tmp; unsigned int i; /* Prepare TLB1 image for AP processors */ e = __boot_tlb1; for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&tmp, i); if ((tmp.mas1 & MAS1_VALID) && (tmp.mas2 & _TLB_ENTRY_SHARED)) memcpy(e++, &tmp, sizeof(tmp)); } } void pmap_bootstrap_ap(volatile uint32_t *trcp __unused) { int i; /* * Finish TLB1 configuration: the BSP already set up its TLB1 and we * have the snapshot of its contents in the s/w __boot_tlb1[] table * created by tlb1_ap_prep(), so use these values directly to * (re)program AP's TLB1 hardware. * * Start at index 1 because index 0 has the kernel map. */ for (i = 1; i < TLB1_ENTRIES; i++) { if (__boot_tlb1[i].mas1 & MAS1_VALID) tlb1_write_entry(&__boot_tlb1[i], i); } set_mas4_defaults(); } #endif static void booke_pmap_init_qpages(void) { struct pcpu *pc; int i; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE); if (pc->pc_qmap_addr == 0) panic("pmap_init_qpages: unable to allocate KVA"); } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, booke_pmap_init_qpages, NULL); /* * Get the physical page address for the given pmap/virtual address. */ static vm_paddr_t mmu_booke_extract(mmu_t mmu, pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; PMAP_LOCK(pmap); pa = pte_vatopa(mmu, pmap, va); PMAP_UNLOCK(pmap); return (pa); } /* * Extract the physical page address associated with the given * kernel virtual address. */ static vm_paddr_t mmu_booke_kextract(mmu_t mmu, vm_offset_t va) { tlb_entry_t e; int i; /* Check TLB1 mappings */ for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) continue; if (va >= e.virt && va < e.virt + e.size) return (e.phys + (va - e.virt)); } return (pte_vatopa(mmu, kernel_pmap, va)); } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ static void mmu_booke_init(mmu_t mmu) { int shpgperproc = PMAP_SHPGPERPROC; /* * Initialize the address space (zone) for the pv entries. Set a * high water mark so that the system can recover from excessive * numbers of pv entries. */ pvzone = uma_zcreate("PV ENTRY", sizeof(struct pv_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_high_water = 9 * (pv_entry_max / 10); uma_zone_reserve_kva(pvzone, pv_entry_max); /* Pre-fill pvzone with initial number of pv entries. 
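 * PV_ENTRY_ZONE_MIN entries are preallocated up front so that early
 * pv_alloc() calls are unlikely to fail before the VM is fully up.
 * pv_alloc() above also wakes the page daemon once pv_entry_count
 * crosses pv_entry_high_water (9/10 of pv_entry_max), giving the
 * system a chance to reclaim before the zone runs dry.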
*/ uma_prealloc(pvzone, PV_ENTRY_ZONE_MIN); /* Initialize ptbl allocation. */ ptbl_init(); } /* * Map a list of wired pages into kernel virtual address space. This is * intended for temporary mappings which do not need page modification or * references recorded. Existing mappings in the region are overwritten. */ static void mmu_booke_qenter(mmu_t mmu, vm_offset_t sva, vm_page_t *m, int count) { vm_offset_t va; va = sva; while (count-- > 0) { mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(*m)); va += PAGE_SIZE; m++; } } /* * Remove page mappings from kernel virtual address space. Intended for * temporary mappings entered by mmu_booke_qenter. */ static void mmu_booke_qremove(mmu_t mmu, vm_offset_t sva, int count) { vm_offset_t va; va = sva; while (count-- > 0) { mmu_booke_kremove(mmu, va); va += PAGE_SIZE; } } /* * Map a wired page into kernel virtual address space. */ static void mmu_booke_kenter(mmu_t mmu, vm_offset_t va, vm_paddr_t pa) { mmu_booke_kenter_attr(mmu, va, pa, VM_MEMATTR_DEFAULT); } static void mmu_booke_kenter_attr(mmu_t mmu, vm_offset_t va, vm_paddr_t pa, vm_memattr_t ma) { uint32_t flags; pte_t *pte; KASSERT(((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_kenter: invalid va")); flags = PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID; flags |= tlb_calc_wimg(pa, ma) << PTE_MAS2_SHIFT; flags |= PTE_PS_4KB; pte = pte_find(mmu, kernel_pmap, va); mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); if (PTE_ISVALID(pte)) { CTR1(KTR_PMAP, "%s: replacing entry!", __func__); /* Flush entry from TLB0 */ tlb0_flush_entry(va); } *pte = PTE_RPN_FROM_PA(pa) | flags; //debugf("mmu_booke_kenter: pdir_idx = %d ptbl_idx = %d va=0x%08x " // "pa=0x%08x rpn=0x%08x flags=0x%08x\n", // pdir_idx, ptbl_idx, va, pa, pte->rpn, pte->flags); /* Flush the real memory from the instruction cache. */ if ((flags & (PTE_I | PTE_G)) == 0) __syncicache((void *)va, PAGE_SIZE); tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } /* * Remove a page from kernel page table. */ static void mmu_booke_kremove(mmu_t mmu, vm_offset_t va) { pte_t *pte; CTR2(KTR_PMAP,"%s: s (va = 0x%08x)\n", __func__, va); KASSERT(((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_kremove: invalid va")); pte = pte_find(mmu, kernel_pmap, va); if (!PTE_ISVALID(pte)) { CTR1(KTR_PMAP, "%s: invalid pte", __func__); return; } mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); /* Invalidate entry in TLB0, update PTE. */ tlb0_flush_entry(va); *pte = 0; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } /* * Initialize pmap associated with process 0. */ static void mmu_booke_pinit0(mmu_t mmu, pmap_t pmap) { PMAP_LOCK_INIT(pmap); mmu_booke_pinit(mmu, pmap); PCPU_SET(curpmap, pmap); } /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ static void mmu_booke_pinit(mmu_t mmu, pmap_t pmap) { int i; CTR4(KTR_PMAP, "%s: pmap = %p, proc %d '%s'", __func__, pmap, curthread->td_proc->p_pid, curthread->td_proc->p_comm); KASSERT((pmap != kernel_pmap), ("pmap_pinit: initializing kernel_pmap")); for (i = 0; i < MAXCPU; i++) pmap->pm_tid[i] = TID_NONE; CPU_ZERO(&kernel_pmap->pm_active); bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); bzero(&pmap->pm_pdir, sizeof(pte_t *) * PDIR_NENTRIES); TAILQ_INIT(&pmap->pm_ptbl_list); } /* * Release any resources held by the given physical map. * Called when a pmap initialized by mmu_booke_pinit is being released. * Should only be called if the map contains no valid mappings. 
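 *
 * A sketch of the expected lifecycle (vmspace teardown is the real
 * caller; the calls here are illustrative):
 *
 *	mmu_booke_pinit(mmu, pmap);
 *	... mmu_booke_enter() / mmu_booke_remove() activity ...
 *	mmu_booke_remove(mmu, pmap, start, end);   all mappings gone
 *	mmu_booke_release(mmu, pmap);              resident_count == 0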
*/ static void mmu_booke_release(mmu_t mmu, pmap_t pmap) { KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); } /* * Insert the given physical page at the specified virtual address in the * target physical map with the protection requested. If specified the page * will be wired down. */ static int mmu_booke_enter(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { int error; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); error = mmu_booke_enter_locked(mmu, pmap, va, m, prot, flags, psind); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (error); } static int mmu_booke_enter_locked(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int pmap_flags, int8_t psind __unused) { pte_t *pte; vm_paddr_t pa; uint32_t flags; int error, su, sync; pa = VM_PAGE_TO_PHYS(m); su = (pmap == kernel_pmap); sync = 0; //debugf("mmu_booke_enter_locked: s (pmap=0x%08x su=%d tid=%d m=0x%08x va=0x%08x " // "pa=0x%08x prot=0x%08x flags=%#x)\n", // (u_int32_t)pmap, su, pmap->pm_tid, // (u_int32_t)m, va, pa, prot, flags); if (su) { KASSERT(((va >= virtual_avail) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_enter_locked: kernel pmap, non kernel va")); } else { KASSERT((va <= VM_MAXUSER_ADDRESS), ("mmu_booke_enter_locked: user pmap, non user va")); } if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * If there is an existing mapping, and the physical address has not * changed, must be protection or wiring change. */ if (((pte = pte_find(mmu, pmap, va)) != NULL) && (PTE_ISVALID(pte)) && (PTE_PA(pte) == pa)) { /* * Before actually updating pte->flags we calculate and * prepare its new value in a helper var. */ flags = *pte; flags &= ~(PTE_UW | PTE_UX | PTE_SW | PTE_SX | PTE_MODIFIED); /* Wiring change, just update stats. */ if ((pmap_flags & PMAP_ENTER_WIRED) != 0) { if (!PTE_ISWIRED(pte)) { flags |= PTE_WIRED; pmap->pm_stats.wired_count++; } } else { if (PTE_ISWIRED(pte)) { flags &= ~PTE_WIRED; pmap->pm_stats.wired_count--; } } if (prot & VM_PROT_WRITE) { /* Add write permissions. */ flags |= PTE_SW; if (!su) flags |= PTE_UW; if ((flags & PTE_MANAGED) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } else { /* Handle modified pages, sense modify status. */ /* * The PTE_MODIFIED flag could be set by underlying * TLB misses since we last read it (above), possibly * other CPUs could update it so we check in the PTE * directly rather than rely on that saved local flags * copy. */ if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); } if (prot & VM_PROT_EXECUTE) { flags |= PTE_SX; if (!su) flags |= PTE_UX; /* * Check existing flags for execute permissions: if we * are turning execute permissions on, icache should * be flushed. */ if ((*pte & (PTE_UX | PTE_SX)) == 0) sync++; } flags &= ~PTE_REFERENCED; /* * The new flags value is all calculated -- only now actually * update the PTE. */ mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); *pte &= ~PTE_FLAGS_MASK; *pte |= flags; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } else { /* * If there is an existing mapping, but it's for a different * physical address, pte_enter() will delete the old mapping. */ //if ((pte != NULL) && PTE_ISVALID(pte)) // debugf("mmu_booke_enter_locked: replace\n"); //else // debugf("mmu_booke_enter_locked: new\n"); /* Now set up the flags and install the new mapping. 
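 * For instance (illustrative), a writable, non-executable user-space
 * mapping ends up with
 *
 *	flags = PTE_SR | PTE_UR | PTE_SW | PTE_UW | PTE_M | PTE_VALID;
 *
 * where the PTE_U* bits are the user-mode counterparts of the
 * supervisor PTE_S* bits.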
*/ flags = (PTE_SR | PTE_VALID); flags |= PTE_M; if (!su) flags |= PTE_UR; if (prot & VM_PROT_WRITE) { flags |= PTE_SW; if (!su) flags |= PTE_UW; if ((m->oflags & VPO_UNMANAGED) == 0) vm_page_aflag_set(m, PGA_WRITEABLE); } if (prot & VM_PROT_EXECUTE) { flags |= PTE_SX; if (!su) flags |= PTE_UX; } /* If its wired update stats. */ if ((pmap_flags & PMAP_ENTER_WIRED) != 0) flags |= PTE_WIRED; error = pte_enter(mmu, pmap, m, va, flags, (pmap_flags & PMAP_ENTER_NOSLEEP) != 0); if (error != 0) return (KERN_RESOURCE_SHORTAGE); if ((flags & PMAP_ENTER_WIRED) != 0) pmap->pm_stats.wired_count++; /* Flush the real memory from the instruction cache. */ if (prot & VM_PROT_EXECUTE) sync++; } if (sync && (su || pmap == PCPU_GET(curpmap))) { __syncicache((void *)va, PAGE_SIZE); sync = 0; } return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ static void mmu_booke_enter_object(mmu_t mmu, pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_page_t m; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { mmu_booke_enter_locked(mmu, pmap, start + ptoa(diff), m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP, 0); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } static void mmu_booke_enter_quick(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); mmu_booke_enter_locked(mmu, pmap, va, m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP, 0); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly rounded to the page size. */ static void mmu_booke_remove(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_offset_t endva) { pte_t *pte; uint8_t hold_flag; int su = (pmap == kernel_pmap); //debugf("mmu_booke_remove: s (su = %d pmap=0x%08x tid=%d va=0x%08x endva=0x%08x)\n", // su, (u_int32_t)pmap, pmap->pm_tid, va, endva); if (su) { KASSERT(((va >= virtual_avail) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_remove: kernel pmap, non kernel va")); } else { KASSERT((va <= VM_MAXUSER_ADDRESS), ("mmu_booke_remove: user pmap, non user va")); } if (PMAP_REMOVE_DONE(pmap)) { //debugf("mmu_booke_remove: e (empty)\n"); return; } hold_flag = PTBL_HOLD_FLAG(pmap); //debugf("mmu_booke_remove: hold_flag = %d\n", hold_flag); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); for (; va < endva; va += PAGE_SIZE) { pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) pte_remove(mmu, pmap, va, hold_flag); } PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); //debugf("mmu_booke_remove: e\n"); } /* * Remove physical page from all pmaps in which it resides. 
 */
static void
mmu_booke_remove_all(mmu_t mmu, vm_page_t m)
{
	pv_entry_t pv, pvn;
	uint8_t hold_flag;

	rw_wlock(&pvh_global_lock);
	for (pv = TAILQ_FIRST(&m->md.pv_list); pv != NULL; pv = pvn) {
		pvn = TAILQ_NEXT(pv, pv_link);
		PMAP_LOCK(pv->pv_pmap);
		hold_flag = PTBL_HOLD_FLAG(pv->pv_pmap);
		pte_remove(mmu, pv->pv_pmap, pv->pv_va, hold_flag);
		PMAP_UNLOCK(pv->pv_pmap);
	}
	vm_page_aflag_clear(m, PGA_WRITEABLE);
	rw_wunlock(&pvh_global_lock);
}

/*
 * Map a range of physical addresses into kernel virtual address space.
 */
static vm_offset_t
mmu_booke_map(mmu_t mmu, vm_offset_t *virt, vm_paddr_t pa_start,
    vm_paddr_t pa_end, int prot)
{
	vm_offset_t sva = *virt;
	vm_offset_t va = sva;

	//debugf("mmu_booke_map: s (sva = 0x%08x pa_start = 0x%08x pa_end = 0x%08x)\n",
	//    sva, pa_start, pa_end);

	while (pa_start < pa_end) {
		mmu_booke_kenter(mmu, va, pa_start);
		va += PAGE_SIZE;
		pa_start += PAGE_SIZE;
	}
	*virt = va;

	//debugf("mmu_booke_map: e (va = 0x%08x)\n", va);
	return (sva);
}

/*
 * The pmap must be activated before its address space can be accessed in any
 * way.
 */
static void
mmu_booke_activate(mmu_t mmu, struct thread *td)
{
	pmap_t pmap;
	u_int cpuid;

	pmap = &td->td_proc->p_vmspace->vm_pmap;

	CTR5(KTR_PMAP, "%s: s (td = %p, proc = '%s', id = %d, pmap = 0x%08x)",
	    __func__, td, td->td_proc->p_comm, td->td_proc->p_pid, pmap);

	KASSERT((pmap != kernel_pmap), ("mmu_booke_activate: kernel_pmap!"));

	sched_pin();

	cpuid = PCPU_GET(cpuid);
	CPU_SET_ATOMIC(cpuid, &pmap->pm_active);
	PCPU_SET(curpmap, pmap);

	if (pmap->pm_tid[cpuid] == TID_NONE)
		tid_alloc(pmap);

	/* Load PID0 register with pmap tid value. */
	mtspr(SPR_PID0, pmap->pm_tid[cpuid]);
	__asm __volatile("isync");

	mtspr(SPR_DBCR0, td->td_pcb->pcb_cpu.booke.dbcr0);

	sched_unpin();

	CTR3(KTR_PMAP, "%s: e (tid = %d for '%s')", __func__,
	    pmap->pm_tid[PCPU_GET(cpuid)], td->td_proc->p_comm);
}

/*
 * Deactivate the specified process's address space.
 */
static void
mmu_booke_deactivate(mmu_t mmu, struct thread *td)
{
	pmap_t pmap;

	pmap = &td->td_proc->p_vmspace->vm_pmap;

	CTR5(KTR_PMAP, "%s: td=%p, proc = '%s', id = %d, pmap = 0x%08x",
	    __func__, td, td->td_proc->p_comm, td->td_proc->p_pid, pmap);

	td->td_pcb->pcb_cpu.booke.dbcr0 = mfspr(SPR_DBCR0);

	CPU_CLR_ATOMIC(PCPU_GET(cpuid), &pmap->pm_active);
	PCPU_SET(curpmap, NULL);
}

/*
 * Copy the range specified by src_addr/len
 * from the source map to the range dst_addr/len
 * in the destination map.
 *
 * This routine is only advisory and need not do anything.
 */
static void
mmu_booke_copy(mmu_t mmu, pmap_t dst_pmap, pmap_t src_pmap,
    vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr)
{

}

/*
 * Set the physical protection on the specified range of this map as requested.
 */
static void
mmu_booke_protect(mmu_t mmu, pmap_t pmap, vm_offset_t sva, vm_offset_t eva,
    vm_prot_t prot)
{
	vm_offset_t va;
	vm_page_t m;
	pte_t *pte;

	if ((prot & VM_PROT_READ) == VM_PROT_NONE) {
		mmu_booke_remove(mmu, pmap, sva, eva);
		return;
	}

	if (prot & VM_PROT_WRITE)
		return;

	PMAP_LOCK(pmap);
	for (va = sva; va < eva; va += PAGE_SIZE) {
		if ((pte = pte_find(mmu, pmap, va)) != NULL) {
			if (PTE_ISVALID(pte)) {
				m = PHYS_TO_VM_PAGE(PTE_PA(pte));

				mtx_lock_spin(&tlbivax_mutex);
				tlb_miss_lock();

				/* Handle modified pages. */
				if (PTE_ISMODIFIED(pte) && PTE_ISMANAGED(pte))
					vm_page_dirty(m);

				tlb0_flush_entry(va);
				*pte &= ~(PTE_UW | PTE_SW | PTE_MODIFIED);

				tlb_miss_unlock();
				mtx_unlock_spin(&tlbivax_mutex);
			}
		}
	}
	PMAP_UNLOCK(pmap);
}

/*
 * Clear the write and modified bits in each of the given page's mappings.
*/ static void mmu_booke_remove_write(mmu_t mmu, vm_page_t m) { pv_entry_t pv; pte_t *pte; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL) { if (PTE_ISVALID(pte)) { m = PHYS_TO_VM_PAGE(PTE_PA(pte)); mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); /* Handle modified pages. */ if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); /* Flush mapping from TLB0. */ *pte &= ~(PTE_UW | PTE_SW | PTE_MODIFIED); tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } } PMAP_UNLOCK(pv->pv_pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); } static void mmu_booke_sync_icache(mmu_t mmu, pmap_t pm, vm_offset_t va, vm_size_t sz) { pte_t *pte; pmap_t pmap; vm_page_t m; vm_offset_t addr; vm_paddr_t pa = 0; int active, valid; va = trunc_page(va); sz = round_page(sz); rw_wlock(&pvh_global_lock); pmap = PCPU_GET(curpmap); active = (pm == kernel_pmap || pm == pmap) ? 1 : 0; while (sz > 0) { PMAP_LOCK(pm); pte = pte_find(mmu, pm, va); valid = (pte != NULL && PTE_ISVALID(pte)) ? 1 : 0; if (valid) pa = PTE_PA(pte); PMAP_UNLOCK(pm); if (valid) { if (!active) { /* Create a mapping in the active pmap. */ addr = 0; m = PHYS_TO_VM_PAGE(pa); PMAP_LOCK(pmap); pte_enter(mmu, pmap, m, addr, PTE_SR | PTE_VALID | PTE_UR, FALSE); __syncicache((void *)addr, PAGE_SIZE); pte_remove(mmu, pmap, addr, PTBL_UNHOLD); PMAP_UNLOCK(pmap); } else __syncicache((void *)va, PAGE_SIZE); } va += PAGE_SIZE; sz -= PAGE_SIZE; } rw_wunlock(&pvh_global_lock); } /* * Atomically extract and hold the physical page with the given * pmap and virtual address pair if that mapping permits the given * protection. */ static vm_page_t mmu_booke_extract_and_hold(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pte_t *pte; vm_page_t m; uint32_t pte_wbit; vm_paddr_t pa; m = NULL; pa = 0; PMAP_LOCK(pmap); retry: pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) { if (pmap == kernel_pmap) pte_wbit = PTE_SW; else pte_wbit = PTE_UW; if ((*pte & pte_wbit) || ((prot & VM_PROT_WRITE) == 0)) { if (vm_page_pa_tryrelock(pmap, PTE_PA(pte), &pa)) goto retry; m = PHYS_TO_VM_PAGE(PTE_PA(pte)); vm_page_hold(m); } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } /* * Initialize a vm_page's machine-dependent fields. */ static void mmu_booke_page_init(mmu_t mmu, vm_page_t m) { TAILQ_INIT(&m->md.pv_list); } /* * mmu_booke_zero_page_area zeros the specified hardware page by * mapping it into virtual memory and using bzero to clear * its contents. * * off and size must reside within a single page. */ static void mmu_booke_zero_page_area(mmu_t mmu, vm_page_t m, int off, int size) { vm_offset_t va; /* XXX KASSERT off and size are within a single page? */ mtx_lock(&zero_page_mutex); va = zero_page_va; mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(m)); bzero((caddr_t)va + off, size); mmu_booke_kremove(mmu, va); mtx_unlock(&zero_page_mutex); } /* * mmu_booke_zero_page zeros the specified hardware page. 
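 * The loop below zeroes the page through a temporary KVA window one
 * cache line at a time with dcbz instead of calling bzero(); note
 * that dcbz is only usable here because mmu_booke_kenter() installs a
 * cacheable mapping (issuing dcbz against a cache-inhibited mapping
 * would trap).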
 */
static void
mmu_booke_zero_page(mmu_t mmu, vm_page_t m)
{
	vm_offset_t off, va;

	mtx_lock(&zero_page_mutex);
	va = zero_page_va;

	mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(m));
	for (off = 0; off < PAGE_SIZE; off += cacheline_size)
		__asm __volatile("dcbz 0,%0" :: "r"(va + off));
	mmu_booke_kremove(mmu, va);

	mtx_unlock(&zero_page_mutex);
}

/*
 * mmu_booke_copy_page copies the specified (machine independent) page by
 * mapping the page into virtual memory and using memcopy to copy the page,
 * one machine dependent page at a time.
 */
static void
mmu_booke_copy_page(mmu_t mmu, vm_page_t sm, vm_page_t dm)
{
	vm_offset_t sva, dva;

	sva = copy_page_src_va;
	dva = copy_page_dst_va;

	mtx_lock(&copy_page_mutex);
	mmu_booke_kenter(mmu, sva, VM_PAGE_TO_PHYS(sm));
	mmu_booke_kenter(mmu, dva, VM_PAGE_TO_PHYS(dm));
	memcpy((caddr_t)dva, (caddr_t)sva, PAGE_SIZE);
	mmu_booke_kremove(mmu, dva);
	mmu_booke_kremove(mmu, sva);
	mtx_unlock(&copy_page_mutex);
}

static inline void
mmu_booke_copy_pages(mmu_t mmu, vm_page_t *ma, vm_offset_t a_offset,
    vm_page_t *mb, vm_offset_t b_offset, int xfersize)
{
	void *a_cp, *b_cp;
	vm_offset_t a_pg_offset, b_pg_offset;
	int cnt;

	mtx_lock(&copy_page_mutex);
	while (xfersize > 0) {
		a_pg_offset = a_offset & PAGE_MASK;
		cnt = min(xfersize, PAGE_SIZE - a_pg_offset);
		mmu_booke_kenter(mmu, copy_page_src_va,
		    VM_PAGE_TO_PHYS(ma[a_offset >> PAGE_SHIFT]));
		a_cp = (char *)copy_page_src_va + a_pg_offset;
		b_pg_offset = b_offset & PAGE_MASK;
		cnt = min(cnt, PAGE_SIZE - b_pg_offset);
		mmu_booke_kenter(mmu, copy_page_dst_va,
		    VM_PAGE_TO_PHYS(mb[b_offset >> PAGE_SHIFT]));
		b_cp = (char *)copy_page_dst_va + b_pg_offset;
		bcopy(a_cp, b_cp, cnt);
		mmu_booke_kremove(mmu, copy_page_dst_va);
		mmu_booke_kremove(mmu, copy_page_src_va);
		a_offset += cnt;
		b_offset += cnt;
		xfersize -= cnt;
	}
	mtx_unlock(&copy_page_mutex);
}

static vm_offset_t
mmu_booke_quick_enter_page(mmu_t mmu, vm_page_t m)
{
	vm_paddr_t paddr;
	vm_offset_t qaddr;
	uint32_t flags;
	pte_t *pte;

	paddr = VM_PAGE_TO_PHYS(m);

	flags = PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID;
	flags |= tlb_calc_wimg(paddr, pmap_page_get_memattr(m)) << PTE_MAS2_SHIFT;
	flags |= PTE_PS_4KB;

	critical_enter();
	qaddr = PCPU_GET(qmap_addr);

	pte = pte_find(mmu, kernel_pmap, qaddr);

	KASSERT(*pte == 0, ("mmu_booke_quick_enter_page: PTE busy"));

	/*
	 * XXX: tlbivax is broadcast to other cores, but qaddr should
	 * not be present in other TLBs.  Is there a better instruction
	 * sequence to use? Or just forget it & use mmu_booke_kenter()...
	 */
	__asm __volatile("tlbivax 0, %0" :: "r"(qaddr & MAS2_EPN_MASK));
	__asm __volatile("isync; msync");

	*pte = PTE_RPN_FROM_PA(paddr) | flags;

	/* Flush the real memory from the instruction cache. */
	if ((flags & (PTE_I | PTE_G)) == 0)
		__syncicache((void *)qaddr, PAGE_SIZE);

	return (qaddr);
}

static void
mmu_booke_quick_remove_page(mmu_t mmu, vm_offset_t addr)
{
	pte_t *pte;

	pte = pte_find(mmu, kernel_pmap, addr);

	KASSERT(PCPU_GET(qmap_addr) == addr,
	    ("mmu_booke_quick_remove_page: invalid address"));
	KASSERT(*pte != 0,
	    ("mmu_booke_quick_remove_page: PTE not in use"));

	*pte = 0;
	critical_exit();
}

/*
 * Return whether or not the specified physical page was modified
 * in any of physical maps.
 */
static boolean_t
mmu_booke_is_modified(mmu_t mmu, vm_page_t m)
{
	pte_t *pte;
	pv_entry_t pv;
	boolean_t rv;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("mmu_booke_is_modified: page %p is not managed", m));
	rv = FALSE;

	/*
	 * If the page is not exclusive busied, then PGA_WRITEABLE cannot be
	 * concurrently set while the object is locked.  Thus, if PGA_WRITEABLE
	 * is clear, no PTEs can be modified.
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (rv); rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { if (PTE_ISMODIFIED(pte)) rv = TRUE; } PMAP_UNLOCK(pv->pv_pmap); if (rv) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Return whether or not the specified virtual address is eligible * for prefault. */ static boolean_t mmu_booke_is_prefaultable(mmu_t mmu, pmap_t pmap, vm_offset_t addr) { return (FALSE); } /* * Return whether or not the specified physical page was referenced * in any physical maps. */ static boolean_t mmu_booke_is_referenced(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_is_referenced: page %p is not managed", m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { if (PTE_ISREFERENCED(pte)) rv = TRUE; } PMAP_UNLOCK(pv->pv_pmap); if (rv) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Clear the modify bits on the specified physical page. */ static void mmu_booke_clear_modify(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("mmu_booke_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PG_AWRITEABLE, then no PTEs can be modified. * If the object containing the page is locked and the page is not * exclusive busied, then PG_AWRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); if (*pte & (PTE_SW | PTE_UW | PTE_MODIFIED)) { tlb0_flush_entry(pv->pv_va); *pte &= ~(PTE_SW | PTE_UW | PTE_MODIFIED | PTE_REFERENCED); } tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); } /* * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. + * As an optimization, update the page's dirty field if a modified bit is + * found while counting reference bits. This opportunistic update can be + * performed at low cost and can eliminate the need for some future calls + * to pmap_is_modified(). However, since this function stops after + * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some + * dirty pages. Those dirty pages will only be detected by a future call + * to pmap_is_modified(). 
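* (The scan is bounded by PMAP_TS_REFERENCED_MAX: the loop below stops counting once that many reference bits have been found and cleared.)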
*/ static int mmu_booke_ts_referenced(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; int count; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_ts_referenced: page %p is not managed", m)); count = 0; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { + if (PTE_ISMODIFIED(pte)) + vm_page_dirty(m); if (PTE_ISREFERENCED(pte)) { mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(pv->pv_va); *pte &= ~PTE_REFERENCED; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); - if (++count > 4) { + if (++count >= PMAP_TS_REFERENCED_MAX) { PMAP_UNLOCK(pv->pv_pmap); break; } } } PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); return (count); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range must * have the wired attribute set. In contrast, invalid mappings cannot have * the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, so * there is no need to invalidate any TLB entries. */ static void mmu_booke_unwire(mmu_t mmu, pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va; pte_t *pte; PMAP_LOCK(pmap); for (va = sva; va < eva; va += PAGE_SIZE) { if ((pte = pte_find(mmu, pmap, va)) != NULL && PTE_ISVALID(pte)) { if (!PTE_ISWIRED(pte)) panic("mmu_booke_unwire: pte %p isn't wired", pte); *pte &= ~PTE_WIRED; pmap->pm_stats.wired_count--; } } PMAP_UNLOCK(pmap); } /* * Return true if the pmap's pv is one of the first 16 pvs linked to from this * page. This count may be changed upwards or downwards in the future; it is * only necessary that true be returned for a small subset of pmaps for proper * page aging. */ static boolean_t mmu_booke_page_exists_quick(mmu_t mmu, pmap_t pmap, vm_page_t m) { pv_entry_t pv; int loops; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_page_exists_quick: page %p is not managed", m)); loops = 0; rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { if (pv->pv_pmap == pmap) { rv = TRUE; break; } if (++loops >= 16) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Return the number of managed mappings to the given physical page that are * wired. */ static int mmu_booke_page_wired_mappings(mmu_t mmu, vm_page_t m) { pv_entry_t pv; pte_t *pte; int count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL) if (PTE_ISVALID(pte) && PTE_ISWIRED(pte)) count++; PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); return (count); } static int mmu_booke_dev_direct_mapped(mmu_t mmu, vm_paddr_t pa, vm_size_t size) { int i; vm_offset_t va; /* * This currently does not work for entries that * overlap TLB1 entries. */ for (i = 0; i < TLB1_ENTRIES; i ++) { if (tlb1_iomapped(i, pa, size, &va) == 0) return (0); } return (EFAULT); } void mmu_booke_dumpsys_map(mmu_t mmu, vm_paddr_t pa, size_t sz, void **va) { vm_paddr_t ppa; vm_offset_t ofs; vm_size_t gran; /* Minidumps are based on virtual memory addresses. */ if (do_minidump) { *va = (void *)(vm_offset_t)pa; return; } /* Raw physical memory dumps don't have a virtual address. */ /* We always map a 256MB page at 256M. 
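* For example, pa = 0x12345678 gives ppa = rounddown2(pa, gran) = 0x10000000 and ofs = 0x02345678; a second 256MB entry is added only when sz extends past gran - ofs.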
*/ gran = 256 * 1024 * 1024; ppa = rounddown2(pa, gran); ofs = pa - ppa; *va = (void *)gran; tlb1_set_entry((vm_offset_t)va, ppa, gran, _TLB_ENTRY_IO); if (sz > (gran - ofs)) tlb1_set_entry((vm_offset_t)(va + gran), ppa + gran, gran, _TLB_ENTRY_IO); } void mmu_booke_dumpsys_unmap(mmu_t mmu, vm_paddr_t pa, size_t sz, void *va) { vm_paddr_t ppa; vm_offset_t ofs; vm_size_t gran; tlb_entry_t e; int i; /* Minidumps are based on virtual memory addresses. */ /* Nothing to do... */ if (do_minidump) return; for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) break; } /* Raw physical memory dumps don't have a virtual address. */ i--; e.mas1 = 0; e.mas2 = 0; e.mas3 = 0; tlb1_write_entry(&e, i); gran = 256 * 1024 * 1024; ppa = rounddown2(pa, gran); ofs = pa - ppa; if (sz > (gran - ofs)) { i--; e.mas1 = 0; e.mas2 = 0; e.mas3 = 0; tlb1_write_entry(&e, i); } } extern struct dump_pa dump_map[PHYS_AVAIL_SZ + 1]; void mmu_booke_scan_init(mmu_t mmu) { vm_offset_t va; pte_t *pte; int i; if (!do_minidump) { /* Initialize phys. segments for dumpsys(). */ memset(&dump_map, 0, sizeof(dump_map)); mem_regions(&physmem_regions, &physmem_regions_sz, &availmem_regions, &availmem_regions_sz); for (i = 0; i < physmem_regions_sz; i++) { dump_map[i].pa_start = physmem_regions[i].mr_start; dump_map[i].pa_size = physmem_regions[i].mr_size; } return; } /* Virtual segments for minidumps: */ memset(&dump_map, 0, sizeof(dump_map)); /* 1st: kernel .data and .bss. */ dump_map[0].pa_start = trunc_page((uintptr_t)_etext); dump_map[0].pa_size = round_page((uintptr_t)_end) - dump_map[0].pa_start; /* 2nd: msgbuf and tables (see pmap_bootstrap()). */ dump_map[1].pa_start = data_start; dump_map[1].pa_size = data_end - data_start; /* 3rd: kernel VM. */ va = dump_map[1].pa_start + dump_map[1].pa_size; /* Find start of next chunk (from va). */ while (va < virtual_end) { /* Don't dump the buffer cache. */ if (va >= kmi.buffer_sva && va < kmi.buffer_eva) { va = kmi.buffer_eva; continue; } pte = pte_find(mmu, kernel_pmap, va); if (pte != NULL && PTE_ISVALID(pte)) break; va += PAGE_SIZE; } if (va < virtual_end) { dump_map[2].pa_start = va; va += PAGE_SIZE; /* Find last page in chunk. */ while (va < virtual_end) { /* Don't run into the buffer cache. */ if (va == kmi.buffer_sva) break; pte = pte_find(mmu, kernel_pmap, va); if (pte == NULL || !PTE_ISVALID(pte)) break; va += PAGE_SIZE; } dump_map[2].pa_size = va - dump_map[2].pa_start; } } /* * Map a set of physical memory pages into the kernel virtual address space. * Return a pointer to where it is mapped. This routine is intended to be used * for mapping device memory, NOT real memory. */ static void * mmu_booke_mapdev(mmu_t mmu, vm_paddr_t pa, vm_size_t size) { return (mmu_booke_mapdev_attr(mmu, pa, size, VM_MEMATTR_DEFAULT)); } static void * mmu_booke_mapdev_attr(mmu_t mmu, vm_paddr_t pa, vm_size_t size, vm_memattr_t ma) { tlb_entry_t e; void *res; uintptr_t va, tmpva; vm_size_t sz; int i; /* * Check if this is premapped in TLB1. Note: this should probably also * check whether a sequence of TLB1 entries exist that match the * requirement, but now only checks the easy case. */ if (ma == VM_MEMATTR_DEFAULT) { for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) continue; if (pa >= e.phys && (pa + size) <= (e.phys + e.size)) return (void *)(e.virt + (vm_offset_t)(pa - e.phys)); } } size = roundup(size, PAGE_SIZE); /* * The device mapping area is between VM_MAXUSER_ADDRESS and * VM_MIN_KERNEL_ADDRESS. 
This gives 1GB of device addressing. */ #ifdef SPARSE_MAPDEV /* * With a sparse mapdev, align to the largest starting region. This * could feasibly be optimized for a 'best-fit' alignment, but that * calculation could be very costly. */ do { tmpva = tlb1_map_base; va = roundup(tlb1_map_base, 1 << flsl(size)); } while (!atomic_cmpset_int(&tlb1_map_base, tmpva, va + size)); #else va = atomic_fetchadd_int(&tlb1_map_base, size); #endif res = (void *)va; do { sz = 1 << (ilog2(size) & ~1); if (va % sz != 0) { do { sz >>= 2; } while (va % sz != 0); } if (bootverbose) printf("Wiring VA=%x to PA=%jx (size=%x)\n", va, (uintmax_t)pa, sz); tlb1_set_entry(va, pa, sz, _TLB_ENTRY_SHARED | tlb_calc_wimg(pa, ma)); size -= sz; pa += sz; va += sz; } while (size > 0); return (res); } /* * 'Unmap' a range mapped by mmu_booke_mapdev(). */ static void mmu_booke_unmapdev(mmu_t mmu, vm_offset_t va, vm_size_t size) { #ifdef SUPPORTS_SHRINKING_TLB1 vm_offset_t base, offset; /* * Unmap only if this is inside kernel virtual space. */ if ((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)) { base = trunc_page(va); offset = va & PAGE_MASK; size = roundup(offset + size, PAGE_SIZE); kva_free(base, size); } #endif } /* * mmu_booke_object_init_pt preloads the ptes for a given object into the * specified pmap. This eliminates the blast of soft faults on process startup * and immediately after an mmap. */ static void mmu_booke_object_init_pt(mmu_t mmu, pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("mmu_booke_object_init_pt: non-device object")); } /* * Perform the pmap work for mincore. */ static int mmu_booke_mincore(mmu_t mmu, pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { /* XXX: this should be implemented at some point */ return (0); } static int mmu_booke_change_attr(mmu_t mmu, vm_offset_t addr, vm_size_t sz, vm_memattr_t mode) { vm_offset_t va; pte_t *pte; int i, j; tlb_entry_t e; /* Check TLB1 mappings */ for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) continue; if (addr >= e.virt && addr < e.virt + e.size) break; } if (i < TLB1_ENTRIES) { /* Only allow full mappings to be modified for now. */ /* Validate the range. */ for (j = i, va = addr; va < addr + sz; va += e.size, j++) { tlb1_read_entry(&e, j); if (va != e.virt || (sz - (va - addr) < e.size)) return (EINVAL); } for (va = addr; va < addr + sz; va += e.size, i++) { tlb1_read_entry(&e, i); e.mas2 &= ~MAS2_WIMGE_MASK; e.mas2 |= tlb_calc_wimg(e.phys, mode); /* * Write it out to the TLB. Should really re-sync with other * cores. */ tlb1_write_entry(&e, i); } return (0); } /* Not in TLB1, try through pmap */ /* First validate the range. */ for (va = addr; va < addr + sz; va += PAGE_SIZE) { pte = pte_find(mmu, kernel_pmap, va); if (pte == NULL || !PTE_ISVALID(pte)) return (EINVAL); } mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); for (va = addr; va < addr + sz; va += PAGE_SIZE) { pte = pte_find(mmu, kernel_pmap, va); *pte &= ~(PTE_MAS2_MASK << PTE_MAS2_SHIFT); *pte |= tlb_calc_wimg(PTE_PA(pte), mode << PTE_MAS2_SHIFT); tlb0_flush_entry(va); } tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); return (pte_vatopa(mmu, kernel_pmap, va)); } /**************************************************************************/ /* TID handling */ /**************************************************************************/ /* * Allocate a TID. 
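* (The TID is the translation ID in MAS1 that tags this pmap's TLB0 entries; each CPU manages its own TID space through its tid_next counter.)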
If necessary, steal one from someone else. * The new TID is flushed from the TLB before returning. */ static tlbtid_t tid_alloc(pmap_t pmap) { tlbtid_t tid; int thiscpu; KASSERT((pmap != kernel_pmap), ("tid_alloc: kernel pmap")); CTR2(KTR_PMAP, "%s: s (pmap = %p)", __func__, pmap); thiscpu = PCPU_GET(cpuid); tid = PCPU_GET(tid_next); if (tid > TID_MAX) tid = TID_MIN; PCPU_SET(tid_next, tid + 1); /* If we are stealing TID then clear the relevant pmap's field */ if (tidbusy[thiscpu][tid] != NULL) { CTR2(KTR_PMAP, "%s: warning: stealing tid %d", __func__, tid); tidbusy[thiscpu][tid]->pm_tid[thiscpu] = TID_NONE; /* Flush all entries from TLB0 matching this TID. */ tid_flush(tid); } tidbusy[thiscpu][tid] = pmap; pmap->pm_tid[thiscpu] = tid; __asm __volatile("msync; isync"); CTR3(KTR_PMAP, "%s: e (%02d next = %02d)", __func__, tid, PCPU_GET(tid_next)); return (tid); } /**************************************************************************/ /* TLB0 handling */ /**************************************************************************/ static void tlb_print_entry(int i, uint32_t mas1, uint32_t mas2, uint32_t mas3, uint32_t mas7) { int as; char desc[3]; tlbtid_t tid; vm_size_t size; unsigned int tsize; desc[2] = '\0'; if (mas1 & MAS1_VALID) desc[0] = 'V'; else desc[0] = ' '; if (mas1 & MAS1_IPROT) desc[1] = 'P'; else desc[1] = ' '; as = (mas1 & MAS1_TS_MASK) ? 1 : 0; tid = MAS1_GETTID(mas1); tsize = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; size = 0; if (tsize) size = tsize2size(tsize); debugf("%3d: (%s) [AS=%d] " "sz = 0x%08x tsz = %d tid = %d mas1 = 0x%08x " "mas2(va) = 0x%08x mas3(pa) = 0x%08x mas7 = 0x%08x\n", i, desc, as, size, tsize, tid, mas1, mas2, mas3, mas7); } /* Convert TLB0 va and way number to tlb0[] table index. */ static inline unsigned int tlb0_tableidx(vm_offset_t va, unsigned int way) { unsigned int idx; idx = (way * TLB0_ENTRIES_PER_WAY); idx += (va & MAS2_TLB0_ENTRY_IDX_MASK) >> MAS2_TLB0_ENTRY_IDX_SHIFT; return (idx); } /* * Invalidate TLB0 entry. */ static inline void tlb0_flush_entry(vm_offset_t va) { CTR2(KTR_PMAP, "%s: s va=0x%08x", __func__, va); mtx_assert(&tlbivax_mutex, MA_OWNED); __asm __volatile("tlbivax 0, %0" :: "r"(va & MAS2_EPN_MASK)); __asm __volatile("isync; msync"); __asm __volatile("tlbsync; msync"); CTR1(KTR_PMAP, "%s: e", __func__); } /* Print out contents of the MAS registers for each TLB0 entry */ void tlb0_print_tlbentries(void) { uint32_t mas0, mas1, mas2, mas3, mas7; int entryidx, way, idx; debugf("TLB0 entries:\n"); for (way = 0; way < TLB0_WAYS; way ++) for (entryidx = 0; entryidx < TLB0_ENTRIES_PER_WAY; entryidx++) { mas0 = MAS0_TLBSEL(0) | MAS0_ESEL(way); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mas2 = entryidx << MAS2_TLB0_ENTRY_IDX_SHIFT; mtspr(SPR_MAS2, mas2); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); idx = tlb0_tableidx(mas2, way); tlb_print_entry(idx, mas1, mas2, mas3, mas7); } } /**************************************************************************/ /* TLB1 handling */ /**************************************************************************/ /* * TLB1 mapping notes: * * TLB1[0] Kernel text and data. * TLB1[1-15] Additional kernel text and data mappings (if required), PCI * windows, other devices mappings. */ /* * Read an entry from given TLB1 slot. 
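* The slot is selected through MAS0[TLBSEL, ESEL]; tlbre then fills the remaining MAS registers, which are copied into *entry along with the decoded virt, phys and size fields.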
*/ void tlb1_read_entry(tlb_entry_t *entry, unsigned int slot) { uint32_t mas0; KASSERT((entry != NULL), ("%s(): Entry is NULL!", __func__)); mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(slot); mtspr(SPR_MAS0, mas0); __asm __volatile("isync; tlbre"); entry->mas1 = mfspr(SPR_MAS1); entry->mas2 = mfspr(SPR_MAS2); entry->mas3 = mfspr(SPR_MAS3); switch ((mfpvr() >> 16) & 0xFFFF) { case FSL_E500v2: case FSL_E500mc: case FSL_E5500: case FSL_E6500: entry->mas7 = mfspr(SPR_MAS7); break; default: entry->mas7 = 0; break; } entry->virt = entry->mas2 & MAS2_EPN_MASK; entry->phys = ((vm_paddr_t)(entry->mas7 & MAS7_RPN) << 32) | (entry->mas3 & MAS3_RPN); entry->size = tsize2size((entry->mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT); } /* * Write given entry to TLB1 hardware. * Use 32 bit pa, clear 4 high-order bits of RPN (mas7). */ static void tlb1_write_entry(tlb_entry_t *e, unsigned int idx) { uint32_t mas0; //debugf("tlb1_write_entry: s\n"); /* Select entry */ mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(idx); //debugf("tlb1_write_entry: mas0 = 0x%08x\n", mas0); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mtspr(SPR_MAS1, e->mas1); __asm __volatile("isync"); mtspr(SPR_MAS2, e->mas2); __asm __volatile("isync"); mtspr(SPR_MAS3, e->mas3); __asm __volatile("isync"); switch ((mfpvr() >> 16) & 0xFFFF) { case FSL_E500mc: case FSL_E5500: case FSL_E6500: mtspr(SPR_MAS8, 0); __asm __volatile("isync"); /* FALLTHROUGH */ case FSL_E500v2: mtspr(SPR_MAS7, e->mas7); __asm __volatile("isync"); break; default: break; } __asm __volatile("tlbwe; isync; msync"); //debugf("tlb1_write_entry: e\n"); } /* * Return the largest uint value log such that 2^log <= num. */ static unsigned int ilog2(unsigned int num) { int lz; __asm ("cntlzw %0, %1" : "=r" (lz) : "r" (num)); return (31 - lz); } /* * Convert TLB TSIZE value to mapped region size. */ static vm_size_t tsize2size(unsigned int tsize) { /* * size = 4^tsize KB * size = 4^tsize * 2^10 = 2^(2 * tsize - 10) */ return ((1 << (2 * tsize)) * 1024); } /* * Convert region size (must be power of 4) to TLB TSIZE value. */ static unsigned int size2tsize(vm_size_t size) { return (ilog2(size) / 2 - 5); } /* * Register permanent kernel mapping in TLB1. * * Entries are created starting from index 0 (current free entry is * kept in tlb1_idx) and are not supposed to be invalidated. */ int tlb1_set_entry(vm_offset_t va, vm_paddr_t pa, vm_size_t size, uint32_t flags) { tlb_entry_t e; uint32_t ts, tid; int tsize, index; for (index = 0; index < TLB1_ENTRIES; index++) { tlb1_read_entry(&e, index); if ((e.mas1 & MAS1_VALID) == 0) break; /* Check if we're just updating the flags, and update them. 
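* (An existing entry that already matches pa, va and size only needs its MAS2 flag bits refreshed.)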
*/ if (e.phys == pa && e.virt == va && e.size == size) { e.mas2 = (va & MAS2_EPN_MASK) | flags; tlb1_write_entry(&e, index); return (0); } } if (index >= TLB1_ENTRIES) { printf("tlb1_set_entry: TLB1 full!\n"); return (-1); } /* Convert size to TSIZE */ tsize = size2tsize(size); tid = (TID_KERNEL << MAS1_TID_SHIFT) & MAS1_TID_MASK; /* XXX TS is hard coded to 0 for now as we only use a single address space */ ts = (0 << MAS1_TS_SHIFT) & MAS1_TS_MASK; e.phys = pa; e.virt = va; e.size = size; e.mas1 = MAS1_VALID | MAS1_IPROT | ts | tid; e.mas1 |= ((tsize << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK); e.mas2 = (va & MAS2_EPN_MASK) | flags; /* Set supervisor RWX permission bits */ e.mas3 = (pa & MAS3_RPN) | MAS3_SR | MAS3_SW | MAS3_SX; e.mas7 = (pa >> 32) & MAS7_RPN; tlb1_write_entry(&e, index); /* * XXX in general TLB1 updates should be propagated between CPUs, since * the current design assumes the same TLB1 set-up on all cores. */ return (0); } /* * Map a contiguous RAM region into the TLB1 using at most * KERNEL_REGION_MAX_TLB_ENTRIES entries. * * If necessary, round up the last entry size and return the total size * used by all allocated entries. */ vm_size_t tlb1_mapin_region(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { vm_size_t pgs[KERNEL_REGION_MAX_TLB_ENTRIES]; vm_size_t mapped, pgsz, base, mask; int idx, nents; /* Round up to the next 1M */ size = roundup2(size, 1 << 20); mapped = 0; idx = 0; base = va; pgsz = 64*1024*1024; while (mapped < size) { while (mapped < size && idx < KERNEL_REGION_MAX_TLB_ENTRIES) { while (pgsz > (size - mapped)) pgsz >>= 2; pgs[idx++] = pgsz; mapped += pgsz; } /* We under-map. Correct for this. */ if (mapped < size) { while (pgs[idx - 1] == pgsz) { idx--; mapped -= pgsz; } /* XXX We may increase beyond our starting point. */ pgsz <<= 2; pgs[idx++] = pgsz; mapped += pgsz; } } nents = idx; mask = pgs[0] - 1; /* Align address to the boundary */ if (va & mask) { va = (va + mask) & ~mask; pa = (pa + mask) & ~mask; } for (idx = 0; idx < nents; idx++) { pgsz = pgs[idx]; debugf("%u: %llx -> %x, size=%x\n", idx, pa, va, pgsz); tlb1_set_entry(va, pa, pgsz, _TLB_ENTRY_SHARED | _TLB_ENTRY_MEM); pa += pgsz; va += pgsz; } mapped = (va - base); #ifdef __powerpc64__ printf("mapped size 0x%016lx (wasted space 0x%16lx)\n", #else printf("mapped size 0x%08x (wasted space 0x%08x)\n", #endif mapped, mapped - size); return (mapped); } /* * TLB1 initialization routine, to be called after the very first * assembler level setup done in locore.S. */ void tlb1_init() { uint32_t mas0, mas1, mas2, mas3, mas7; uint32_t tsz; tlb1_get_tlbconf(); mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(0); mtspr(SPR_MAS0, mas0); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); kernload = ((vm_paddr_t)(mas7 & MAS7_RPN) << 32) | (mas3 & MAS3_RPN); tsz = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; kernsize += (tsz > 0) ? tsize2size(tsz) : 0; /* Setup TLB miss defaults */ set_mas4_defaults(); } /* * pmap_early_io_unmap() should be used shortly after pmap_early_io_map(), * as in the following snippet: * * x = pmap_early_io_map(...); * * pmap_early_io_unmap(x, size); * * with no further TLB1 allocations in between.
*/ void pmap_early_io_unmap(vm_offset_t va, vm_size_t size) { int i; tlb_entry_t e; vm_size_t isize; size = roundup(size, PAGE_SIZE); isize = size; for (i = 0; i < TLB1_ENTRIES && size > 0; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) continue; if (va <= e.virt && (va + isize) >= (e.virt + e.size)) { size -= e.size; e.mas1 &= ~MAS1_VALID; tlb1_write_entry(&e, i); } } if (tlb1_map_base == va + isize) tlb1_map_base -= isize; } vm_offset_t pmap_early_io_map(vm_paddr_t pa, vm_size_t size) { vm_paddr_t pa_base; vm_offset_t va, sz; int i; tlb_entry_t e; KASSERT(!pmap_bootstrapped, ("Do not use after PMAP is up!")); for (i = 0; i < TLB1_ENTRIES; i++) { tlb1_read_entry(&e, i); if (!(e.mas1 & MAS1_VALID)) continue; if (pa >= e.phys && (pa + size) <= (e.phys + e.size)) return (e.virt + (pa - e.phys)); } pa_base = rounddown(pa, PAGE_SIZE); size = roundup(size + (pa - pa_base), PAGE_SIZE); tlb1_map_base = roundup2(tlb1_map_base, 1 << (ilog2(size) & ~1)); va = tlb1_map_base + (pa - pa_base); do { sz = 1 << (ilog2(size) & ~1); tlb1_set_entry(tlb1_map_base, pa_base, sz, _TLB_ENTRY_SHARED | _TLB_ENTRY_IO); size -= sz; pa_base += sz; tlb1_map_base += sz; } while (size > 0); return (va); } /* * Setup MAS4 defaults. * These values are loaded to MAS0-2 on a TLB miss. */ static void set_mas4_defaults(void) { uint32_t mas4; /* Defaults: TLB0, PID0, TSIZED=4K */ mas4 = MAS4_TLBSELD0; mas4 |= (TLB_SIZE_4K << MAS4_TSIZED_SHIFT) & MAS4_TSIZED_MASK; #ifdef SMP mas4 |= MAS4_MD; #endif mtspr(SPR_MAS4, mas4); __asm __volatile("isync"); } /* * Print out contents of the MAS registers for each TLB1 entry */ void tlb1_print_tlbentries(void) { uint32_t mas0, mas1, mas2, mas3, mas7; int i; debugf("TLB1 entries:\n"); for (i = 0; i < TLB1_ENTRIES; i++) { mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(i); mtspr(SPR_MAS0, mas0); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); tlb_print_entry(i, mas1, mas2, mas3, mas7); } } /* * Return 0 if the physical IO range is encompassed by one of the * TLB1 entries, otherwise return the related error code. */ static int tlb1_iomapped(int i, vm_paddr_t pa, vm_size_t size, vm_offset_t *va) { uint32_t prot; vm_paddr_t pa_start; vm_paddr_t pa_end; unsigned int entry_tsize; vm_size_t entry_size; tlb_entry_t e; *va = (vm_offset_t)NULL; tlb1_read_entry(&e, i); /* Skip invalid entries */ if (!(e.mas1 & MAS1_VALID)) return (EINVAL); /* * The entry must be cache-inhibited, guarded, and r/w * so it can function as an i/o page */ prot = e.mas2 & (MAS2_I | MAS2_G); if (prot != (MAS2_I | MAS2_G)) return (EPERM); prot = e.mas3 & (MAS3_SR | MAS3_SW); if (prot != (MAS3_SR | MAS3_SW)) return (EPERM); /* The address should be within the entry range. */ entry_tsize = (e.mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; KASSERT((entry_tsize), ("tlb1_iomapped: invalid entry tsize")); entry_size = tsize2size(entry_tsize); pa_start = (((vm_paddr_t)e.mas7 & MAS7_RPN) << 32) | (e.mas3 & MAS3_RPN); pa_end = pa_start + entry_size; if ((pa < pa_start) || ((pa + size) > pa_end)) return (ERANGE); /* Return virtual address of this mapping. */ *va = (e.mas2 & MAS2_EPN_MASK) + (pa - pa_start); return (0); } /* * Invalidate all TLB0 entries which match the given TID. Note this is * dedicated for cases when invalidations should NOT be propagated to other * CPUs.
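* Interrupts are disabled (wrteei 0) for the duration of the walk over every TLB0 way and entry, so the MAS scratch registers cannot be clobbered part-way through.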
*/ static void tid_flush(tlbtid_t tid) { register_t msr; uint32_t mas0, mas1, mas2; int entry, way; /* Don't evict kernel translations */ if (tid == TID_KERNEL) return; msr = mfmsr(); __asm __volatile("wrteei 0"); for (way = 0; way < TLB0_WAYS; way++) for (entry = 0; entry < TLB0_ENTRIES_PER_WAY; entry++) { mas0 = MAS0_TLBSEL(0) | MAS0_ESEL(way); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mas2 = entry << MAS2_TLB0_ENTRY_IDX_SHIFT; mtspr(SPR_MAS2, mas2); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); if (!(mas1 & MAS1_VALID)) continue; if (((mas1 & MAS1_TID_MASK) >> MAS1_TID_SHIFT) != tid) continue; mas1 &= ~MAS1_VALID; mtspr(SPR_MAS1, mas1); __asm __volatile("isync; tlbwe; isync; msync"); } mtmsr(msr); } Index: projects/clang390-import/sys/powerpc/conf/MPC85XX =================================================================== --- projects/clang390-import/sys/powerpc/conf/MPC85XX (revision 305686) +++ projects/clang390-import/sys/powerpc/conf/MPC85XX (revision 305687) @@ -1,94 +1,95 @@ # # Custom kernel for Freescale MPC85XX development boards like the CDS etc. # # $FreeBSD$ # cpu BOOKE cpu BOOKE_E500 ident MPC85XX machine powerpc powerpc include "dpaa/config.dpaa" makeoptions DEBUG="-Wa,-me500 -g" makeoptions WERROR="-Werror -Wno-format -Wno-redundant-decls" makeoptions NO_MODULES=yes options FPU_EMU options _KPOSIX_PRIORITY_SCHEDULING options ALT_BREAK_TO_DEBUGGER options BREAK_TO_DEBUGGER options BOOTP options BOOTP_NFSROOT #options BOOTP_NFSV3 options CD9660 options COMPAT_43 options DDB #options DEADLKRES options DEVICE_POLLING #options DIAGNOSTIC options FDT #makeoptions FDT_DTS_FILE=mpc8555cds.dts options FFS options GDB options GEOM_PART_GPT options INET options INET6 options INVARIANTS options INVARIANT_SUPPORT options KDB options KTRACE options MD_ROOT options MPC85XX options MSDOSFS options NFS_ROOT options NFSCL options NFSLOCKD options PROCFS options PSEUDOFS options SCHED_ULE options CAPABILITIES options CAPABILITY_MODE options SMP options SYSVMSG options SYSVSEM options SYSVSHM options WITNESS options WITNESS_SKIPSPIN device ata device bpf device cfi device crypto device cryptodev device da device ds1553 device em device alc device ether device fxp device gpio device iic device iicbus #device isa device loop device md device miibus device pass device pci device quicc device random #device rl device scbus device scc device sec device tsec device tun device uart options USB_DEBUG # enable debug msgs #device uhci +device ehci device umass device usb device vlan Index: projects/clang390-import/sys/riscv/riscv/pmap.c =================================================================== --- projects/clang390-import/sys/riscv/riscv/pmap.c (revision 305686) +++ projects/clang390-import/sys/riscv/riscv/pmap.c (revision 305687) @@ -1,3279 +1,3284 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * Copyright (c) 1994 John S. Dyson * All rights reserved. * Copyright (c) 1994 David Greenman * All rights reserved. * Copyright (c) 2003 Peter Wemm * All rights reserved. * Copyright (c) 2005-2010 Alan L. Cox * All rights reserved. * Copyright (c) 2014 Andrew Turner * All rights reserved. * Copyright (c) 2014 The FreeBSD Foundation * All rights reserved. * Copyright (c) 2015-2016 Ruslan Bukin * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. 
* * Portions of this software were developed by Andrew Turner under * sponsorship from The FreeBSD Foundation. * * Portions of this software were developed by SRI International and the * University of Cambridge Computer Laboratory under DARPA/AFRL contract * FA8750-10-C-0237 ("CTSRD"), as part of the DARPA CRASH research programme. * * Portions of this software were developed by the University of Cambridge * Computer Laboratory as part of the CTSRD Project, with support from the * UK Higher Education Innovation Fund (HEIF). * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduced protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define NPDEPG (PAGE_SIZE/(sizeof (pd_entry_t))) #define NUPDE (NPDEPG * NPDEPG) #define NUSERPGTBLS (NUPDE + NPDEPG) #if !defined(DIAGNOSTIC) #ifdef __GNUC_GNU_INLINE__ #define PMAP_INLINE __attribute__((__gnu_inline__)) inline #else #define PMAP_INLINE extern inline #endif #else #define PMAP_INLINE #endif #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif #define pmap_l2_pindex(v) ((v) >> L2_SHIFT) #define NPV_LIST_LOCKS MAXCPU #define PHYS_TO_PV_LIST_LOCK(pa) \ (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS]) #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \ struct rwlock **_lockp = (lockp); \ struct rwlock *_new_lock; \ \ _new_lock = PHYS_TO_PV_LIST_LOCK(pa); \ if (_new_lock != *_lockp) { \ if (*_lockp != NULL) \ rw_wunlock(*_lockp); \ *_lockp = _new_lock; \ rw_wlock(*_lockp); \ } \ } while (0) #define CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m) \ CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, VM_PAGE_TO_PHYS(m)) #define RELEASE_PV_LIST_LOCK(lockp) do { \ struct rwlock **_lockp = (lockp); \ \ if (*_lockp != NULL) { \ rw_wunlock(*_lockp); \ *_lockp = NULL; \ } \ } while (0) #define VM_PAGE_TO_PV_LIST_LOCK(m) \ PHYS_TO_PV_LIST_LOCK(VM_PAGE_TO_PHYS(m)) /* The list of all the user pmaps */ LIST_HEAD(pmaplist, pmap); static struct pmaplist allpmaps; static MALLOC_DEFINE(M_VMPMAP, "pmap", "PMAP L1"); struct pmap kernel_pmap_store; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ vm_offset_t kernel_vm_end = 0; struct msgbuf *msgbufp = NULL; vm_paddr_t dmap_phys_base; /* The start of the dmap region */ vm_paddr_t dmap_phys_max; /* The limit of the dmap region */ vm_offset_t dmap_max_addr; /* The virtual address limit of the dmap */ /* This code assumes all L1 DMAP entries will be used */ CTASSERT((DMAP_MIN_ADDRESS & ~L1_OFFSET) == DMAP_MIN_ADDRESS); CTASSERT((DMAP_MAX_ADDRESS & ~L1_OFFSET) == DMAP_MAX_ADDRESS); static struct rwlock_padalign 
pvh_global_lock; /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static struct mtx pv_chunks_mutex; static struct rwlock pv_list_locks[NPV_LIST_LOCKS]; static void free_pv_chunk(struct pv_chunk *pc); static void free_pv_entry(pmap_t pmap, pv_entry_t pv); static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp); static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp); static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp); static int pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_t sva, pd_entry_t ptepde, struct spglist *free, struct rwlock **lockp); static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp); static vm_page_t _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp); static void _pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free); static int pmap_unuse_l3(pmap_t, vm_offset_t, pd_entry_t, struct spglist *); /* * These load the old table data and store the new value. * They need to be atomic as the System MMU may write to the table at * the same time as the CPU. */ #define pmap_load_store(table, entry) atomic_swap_64(table, entry) #define pmap_set(table, mask) atomic_set_64(table, mask) #define pmap_load_clear(table) atomic_swap_64(table, 0) #define pmap_load(table) (*table) /********************/ /* Inline functions */ /********************/ static __inline void pagecopy(void *s, void *d) { memcpy(d, s, PAGE_SIZE); } static __inline void pagezero(void *p) { bzero(p, PAGE_SIZE); } #define pmap_l1_index(va) (((va) >> L1_SHIFT) & Ln_ADDR_MASK) #define pmap_l2_index(va) (((va) >> L2_SHIFT) & Ln_ADDR_MASK) #define pmap_l3_index(va) (((va) >> L3_SHIFT) & Ln_ADDR_MASK) #define PTE_TO_PHYS(pte) ((pte >> PTE_PPN0_S) * PAGE_SIZE) static __inline pd_entry_t * pmap_l1(pmap_t pmap, vm_offset_t va) { return (&pmap->pm_l1[pmap_l1_index(va)]); } static __inline pd_entry_t * pmap_l1_to_l2(pd_entry_t *l1, vm_offset_t va) { vm_paddr_t phys; pd_entry_t *l2; phys = PTE_TO_PHYS(pmap_load(l1)); l2 = (pd_entry_t *)PHYS_TO_DMAP(phys); return (&l2[pmap_l2_index(va)]); } static __inline pd_entry_t * pmap_l2(pmap_t pmap, vm_offset_t va) { pd_entry_t *l1; l1 = pmap_l1(pmap, va); if (l1 == NULL) return (NULL); if ((pmap_load(l1) & PTE_V) == 0) return (NULL); if ((pmap_load(l1) & PTE_RX) != 0) return (NULL); return (pmap_l1_to_l2(l1, va)); } static __inline pt_entry_t * pmap_l2_to_l3(pd_entry_t *l2, vm_offset_t va) { vm_paddr_t phys; pt_entry_t *l3; phys = PTE_TO_PHYS(pmap_load(l2)); l3 = (pd_entry_t *)PHYS_TO_DMAP(phys); return (&l3[pmap_l3_index(va)]); } static __inline pt_entry_t * pmap_l3(pmap_t pmap, vm_offset_t va) { pd_entry_t *l2; l2 = pmap_l2(pmap, va); if (l2 == NULL) return (NULL); if ((pmap_load(l2) & PTE_V) == 0) return (NULL); if ((pmap_load(l2) & PTE_RX) != 0) return (NULL); return (pmap_l2_to_l3(l2, va)); } static __inline int pmap_is_write(pt_entry_t entry) { return (entry & PTE_W); } static __inline int pmap_is_current(pmap_t pmap) { return ((pmap == pmap_kernel()) || (pmap == curthread->td_proc->p_vmspace->vm_map.pmap)); } static __inline int pmap_l3_valid(pt_entry_t l3) { return (l3 & PTE_V); } static __inline int 
pmap_l3_valid_cacheable(pt_entry_t l3) { /* TODO */ return (0); } #define PTE_SYNC(pte) cpu_dcache_wb_range((vm_offset_t)pte, sizeof(*pte)) /* Checks if the page is dirty. */ static inline int pmap_page_dirty(pt_entry_t pte) { return (pte & PTE_D); } static __inline void pmap_resident_count_inc(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); pmap->pm_stats.resident_count += count; } static __inline void pmap_resident_count_dec(pmap_t pmap, int count) { PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT(pmap->pm_stats.resident_count >= count, ("pmap %p resident count underflow %ld %d", pmap, pmap->pm_stats.resident_count, count)); pmap->pm_stats.resident_count -= count; } static void pmap_distribute_l1(struct pmap *pmap, vm_pindex_t l1index, pt_entry_t entry) { struct pmap *user_pmap; pd_entry_t *l1; /* Distribute new kernel L1 entry to all the user pmaps */ if (pmap != kernel_pmap) return; LIST_FOREACH(user_pmap, &allpmaps, pm_list) { l1 = &user_pmap->pm_l1[l1index]; if (entry) pmap_load_store(l1, entry); else pmap_load_clear(l1); } } static pt_entry_t * pmap_early_page_idx(vm_offset_t l1pt, vm_offset_t va, u_int *l1_slot, u_int *l2_slot) { pt_entry_t *l2; pd_entry_t *l1; l1 = (pd_entry_t *)l1pt; *l1_slot = (va >> L1_SHIFT) & Ln_ADDR_MASK; /* Check locore has used a table L1 map */ KASSERT((l1[*l1_slot] & PTE_RX) == 0, ("Invalid bootstrap L1 table")); /* Find the address of the L2 table */ l2 = (pt_entry_t *)init_pt_va; *l2_slot = pmap_l2_index(va); return (l2); } static vm_paddr_t pmap_early_vtophys(vm_offset_t l1pt, vm_offset_t va) { u_int l1_slot, l2_slot; pt_entry_t *l2; u_int ret; l2 = pmap_early_page_idx(l1pt, va, &l1_slot, &l2_slot); /* Check locore has used L2 superpages */ KASSERT((l2[l2_slot] & PTE_RX) != 0, ("Invalid bootstrap L2 table")); /* L2 is superpages */ ret = (l2[l2_slot] >> PTE_PPN1_S) << L2_SHIFT; ret += (va & L2_OFFSET); return (ret); } static void pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t min_pa, vm_paddr_t max_pa) { vm_offset_t va; vm_paddr_t pa; pd_entry_t *l1; u_int l1_slot; pt_entry_t entry; pn_t pn; pa = dmap_phys_base = min_pa & ~L1_OFFSET; va = DMAP_MIN_ADDRESS; l1 = (pd_entry_t *)kern_l1; l1_slot = pmap_l1_index(DMAP_MIN_ADDRESS); for (; va < DMAP_MAX_ADDRESS && pa < max_pa; pa += L1_SIZE, va += L1_SIZE, l1_slot++) { KASSERT(l1_slot < Ln_ENTRIES, ("Invalid L1 index")); /* superpages */ pn = (pa / PAGE_SIZE); entry = (PTE_V | PTE_RWX); entry |= (pn << PTE_PPN0_S); pmap_load_store(&l1[l1_slot], entry); } /* Set the upper limit of the DMAP region */ dmap_phys_max = pa; dmap_max_addr = va; cpu_dcache_wb_range((vm_offset_t)l1, PAGE_SIZE); cpu_tlb_flushID(); } static vm_offset_t pmap_bootstrap_l3(vm_offset_t l1pt, vm_offset_t va, vm_offset_t l3_start) { vm_offset_t l2pt, l3pt; pt_entry_t entry; pd_entry_t *l2; vm_paddr_t pa; u_int l2_slot; pn_t pn; KASSERT((va & L2_OFFSET) == 0, ("Invalid virtual address")); l2 = pmap_l2(kernel_pmap, va); l2 = (pd_entry_t *)((uintptr_t)l2 & ~(PAGE_SIZE - 1)); l2pt = (vm_offset_t)l2; l2_slot = pmap_l2_index(va); l3pt = l3_start; for (; va < VM_MAX_KERNEL_ADDRESS; l2_slot++, va += L2_SIZE) { KASSERT(l2_slot < Ln_ENTRIES, ("Invalid L2 index")); pa = pmap_early_vtophys(l1pt, l3pt); pn = (pa / PAGE_SIZE); entry = (PTE_V); entry |= (pn << PTE_PPN0_S); pmap_load_store(&l2[l2_slot], entry); l3pt += PAGE_SIZE; } /* Clean the L2 page table */ memset((void *)l3_start, 0, l3pt - l3_start); cpu_dcache_wb_range(l3_start, l3pt - l3_start); cpu_dcache_wb_range((vm_offset_t)l2, PAGE_SIZE); return (l3pt); } /* * Bootstrap the system 
enough to run with virtual memory. */ void pmap_bootstrap(vm_offset_t l1pt, vm_paddr_t kernstart, vm_size_t kernlen) { u_int l1_slot, l2_slot, avail_slot, map_slot, used_map_slot; uint64_t kern_delta; pt_entry_t *l2; vm_offset_t va, freemempos; vm_offset_t dpcpu, msgbufpv; vm_paddr_t pa, min_pa, max_pa; int i; kern_delta = KERNBASE - kernstart; physmem = 0; printf("pmap_bootstrap %lx %lx %lx\n", l1pt, kernstart, kernlen); printf("%lx\n", l1pt); printf("%lx\n", (KERNBASE >> L1_SHIFT) & Ln_ADDR_MASK); /* Set this early so we can use the pagetable walking functions */ kernel_pmap_store.pm_l1 = (pd_entry_t *)l1pt; PMAP_LOCK_INIT(kernel_pmap); /* * Initialize the global pv list lock. */ rw_init(&pvh_global_lock, "pmap pv global"); LIST_INIT(&allpmaps); /* Assume the address we were loaded to is a valid physical address */ min_pa = max_pa = KERNBASE - kern_delta; /* * Find the minimum physical address. physmap is sorted, * but may contain empty ranges. */ for (i = 0; i < (physmap_idx * 2); i += 2) { if (physmap[i] == physmap[i + 1]) continue; if (physmap[i] <= min_pa) min_pa = physmap[i]; if (physmap[i + 1] > max_pa) max_pa = physmap[i + 1]; break; } /* Create a direct map region early so we can use it for pa -> va */ pmap_bootstrap_dmap(l1pt, min_pa, max_pa); va = KERNBASE; pa = KERNBASE - kern_delta; /* * Start to initialize phys_avail by copying from physmap * up to the physical address KERNBASE points at. */ map_slot = avail_slot = 0; for (; map_slot < (physmap_idx * 2); map_slot += 2) { if (physmap[map_slot] == physmap[map_slot + 1]) continue; if (physmap[map_slot] <= pa && physmap[map_slot + 1] > pa) break; phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } /* Add the memory before the kernel */ if (physmap[avail_slot] < pa) { phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = pa; physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } used_map_slot = map_slot; /* * Read the page table to find out what is already mapped. * This assumes we have mapped a block of memory from KERNBASE * using a single L1 entry. */ l2 = pmap_early_page_idx(l1pt, KERNBASE, &l1_slot, &l2_slot); /* Sanity check the index, KERNBASE should be the first VA */ KASSERT(l2_slot == 0, ("The L2 index is non-zero")); /* Find how many pages we have mapped */ for (; l2_slot < Ln_ENTRIES; l2_slot++) { if ((l2[l2_slot] & PTE_V) == 0) break; /* Check locore used L2 superpages */ KASSERT((l2[l2_slot] & PTE_RX) != 0, ("Invalid bootstrap L2 table")); va += L2_SIZE; pa += L2_SIZE; } va = roundup2(va, L2_SIZE); freemempos = KERNBASE + kernlen; freemempos = roundup2(freemempos, PAGE_SIZE); /* Create the l3 tables for the early devmap */ freemempos = pmap_bootstrap_l3(l1pt, VM_MAX_KERNEL_ADDRESS - L2_SIZE, freemempos); cpu_tlb_flushID(); #define alloc_pages(var, np) \ (var) = freemempos; \ freemempos += (np * PAGE_SIZE); \ memset((char *)(var), 0, ((np) * PAGE_SIZE)); /* Allocate dynamic per-cpu area. */ alloc_pages(dpcpu, DPCPU_SIZE / PAGE_SIZE); dpcpu_init((void *)dpcpu, 0); /* Allocate memory for the msgbuf, e.g. 
for /sbin/dmesg */ alloc_pages(msgbufpv, round_page(msgbufsize) / PAGE_SIZE); msgbufp = (void *)msgbufpv; virtual_avail = roundup2(freemempos, L2_SIZE); virtual_end = VM_MAX_KERNEL_ADDRESS - L2_SIZE; kernel_vm_end = virtual_avail; pa = pmap_early_vtophys(l1pt, freemempos); /* Finish initialising physmap */ map_slot = used_map_slot; for (; avail_slot < (PHYS_AVAIL_SIZE - 2) && map_slot < (physmap_idx * 2); map_slot += 2) { if (physmap[map_slot] == physmap[map_slot + 1]) { continue; } /* Have we used the current range? */ if (physmap[map_slot + 1] <= pa) { continue; } /* Do we need to split the entry? */ if (physmap[map_slot] < pa) { phys_avail[avail_slot] = pa; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; } else { phys_avail[avail_slot] = physmap[map_slot]; phys_avail[avail_slot + 1] = physmap[map_slot + 1]; } physmem += (phys_avail[avail_slot + 1] - phys_avail[avail_slot]) >> PAGE_SHIFT; avail_slot += 2; } phys_avail[avail_slot] = 0; phys_avail[avail_slot + 1] = 0; /* * Maxmem isn't the "maximum memory", it's one larger than the * highest page of the physical address space. It should be * called something like "Maxphyspage". */ Maxmem = atop(phys_avail[avail_slot - 1]); cpu_tlb_flushID(); } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pv_memattr = VM_MEMATTR_WRITE_BACK; } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { int i; /* * Initialize the pv chunk list mutex. */ mtx_init(&pv_chunks_mutex, "pmap pv chunk list", NULL, MTX_DEF); /* * Initialize the pool of pv list locks. */ for (i = 0; i < NPV_LIST_LOCKS; i++) rw_init(&pv_list_locks[i], "pmap pv list"); } /* * Normal, non-SMP, invalidation functions. * We inline these within pmap.c for speed. */ PMAP_INLINE void pmap_invalidate_page(pmap_t pmap, vm_offset_t va) { /* TODO */ sched_pin(); __asm __volatile("sfence.vm"); sched_unpin(); } PMAP_INLINE void pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { /* TODO */ sched_pin(); __asm __volatile("sfence.vm"); sched_unpin(); } PMAP_INLINE void pmap_invalidate_all(pmap_t pmap) { /* TODO */ sched_pin(); __asm __volatile("sfence.vm"); sched_unpin(); } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { pd_entry_t *l2p, l2; pt_entry_t *l3p, l3; vm_paddr_t pa; pa = 0; PMAP_LOCK(pmap); /* * Start with the l2 table. We are unable to allocate * pages in the l1 table. */ l2p = pmap_l2(pmap, va); if (l2p != NULL) { l2 = pmap_load(l2p); if ((l2 & PTE_RX) == 0) { l3p = pmap_l2_to_l3(l2p, va); if (l3p != NULL) { l3 = pmap_load(l3p); pa = PTE_TO_PHYS(l3); pa |= (va & L3_OFFSET); } } else { /* L2 is superpages */ pa = (l2 >> PTE_PPN1_S) << L2_SHIFT; pa |= (va & L2_OFFSET); } } PMAP_UNLOCK(pmap); return (pa); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection.
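* If vm_page_pa_tryrelock() must drop the pmap lock to acquire the page lock, the lookup is retried from the top.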
*/ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pt_entry_t *l3p, l3; vm_paddr_t phys; vm_paddr_t pa; vm_page_t m; pa = 0; m = NULL; PMAP_LOCK(pmap); retry: l3p = pmap_l3(pmap, va); if (l3p != NULL && (l3 = pmap_load(l3p)) != 0) { if ((pmap_is_write(l3)) || ((prot & VM_PROT_WRITE) == 0)) { phys = PTE_TO_PHYS(l3); if (vm_page_pa_tryrelock(pmap, phys, &pa)) goto retry; m = PHYS_TO_VM_PAGE(phys); vm_page_hold(m); } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } vm_paddr_t pmap_kextract(vm_offset_t va) { pd_entry_t *l2; pt_entry_t *l3; vm_paddr_t pa; if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) { pa = DMAP_TO_PHYS(va); } else { l2 = pmap_l2(kernel_pmap, va); if (l2 == NULL) panic("pmap_kextract: No l2"); if ((pmap_load(l2) & PTE_RX) != 0) { /* superpages */ pa = (pmap_load(l2) >> PTE_PPN1_S) << L2_SHIFT; pa |= (va & L2_OFFSET); return (pa); } l3 = pmap_l2_to_l3(l2, va); if (l3 == NULL) panic("pmap_kextract: No l3..."); pa = PTE_TO_PHYS(pmap_load(l3)); pa |= (va & PAGE_MASK); } return (pa); } /*************************************************** * Low level mapping routines..... ***************************************************/ void pmap_kenter_device(vm_offset_t sva, vm_size_t size, vm_paddr_t pa) { pt_entry_t entry; pt_entry_t *l3; vm_offset_t va; pn_t pn; KASSERT((pa & L3_OFFSET) == 0, ("pmap_kenter_device: Invalid physical address")); KASSERT((sva & L3_OFFSET) == 0, ("pmap_kenter_device: Invalid virtual address")); KASSERT((size & PAGE_MASK) == 0, ("pmap_kenter_device: Mapping is not page-sized")); va = sva; while (size != 0) { l3 = pmap_l3(kernel_pmap, va); KASSERT(l3 != NULL, ("Invalid page table, va: 0x%lx", va)); pn = (pa / PAGE_SIZE); entry = (PTE_V | PTE_RWX); entry |= (pn << PTE_PPN0_S); pmap_load_store(l3, entry); PTE_SYNC(l3); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /* * Remove a page from the kernel pagetables. * Note: not SMP coherent. */ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt_entry_t *l3; l3 = pmap_l3(kernel_pmap, va); KASSERT(l3 != NULL, ("pmap_kremove: Invalid address")); if (pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(va, L3_SIZE); pmap_load_clear(l3); PTE_SYNC(l3); pmap_invalidate_page(kernel_pmap, va); } void pmap_kremove_device(vm_offset_t sva, vm_size_t size) { pt_entry_t *l3; vm_offset_t va; KASSERT((sva & L3_OFFSET) == 0, ("pmap_kremove_device: Invalid virtual address")); KASSERT((size & PAGE_MASK) == 0, ("pmap_kremove_device: Mapping is not page-sized")); va = sva; while (size != 0) { l3 = pmap_l3(kernel_pmap, va); KASSERT(l3 != NULL, ("Invalid page table, va: 0x%lx", va)); pmap_load_clear(l3); PTE_SYNC(l3); va += PAGE_SIZE; size -= PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. 
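* On RISC-V the whole range is covered by the direct map, so the DMAP address of 'start' is returned and '*virt' is left unchanged.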
*/ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { return PHYS_TO_DMAP(start); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) { pt_entry_t *l3, pa; vm_offset_t va; vm_page_t m; pt_entry_t entry; pn_t pn; int i; va = sva; for (i = 0; i < count; i++) { m = ma[i]; pa = VM_PAGE_TO_PHYS(m); pn = (pa / PAGE_SIZE); l3 = pmap_l3(kernel_pmap, va); entry = (PTE_V | PTE_RWX); entry |= (pn << PTE_PPN0_S); pmap_load_store(l3, entry); PTE_SYNC(l3); va += L3_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qremove(vm_offset_t sva, int count) { pt_entry_t *l3; vm_offset_t va; KASSERT(sva >= VM_MIN_KERNEL_ADDRESS, ("usermode va %lx", sva)); va = sva; while (count-- > 0) { l3 = pmap_l3(kernel_pmap, va); KASSERT(l3 != NULL, ("pmap_kremove: Invalid address")); if (pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(va, L3_SIZE); pmap_load_clear(l3); PTE_SYNC(l3); va += PAGE_SIZE; } pmap_invalidate_range(kernel_pmap, sva, va); } /*************************************************** * Page table page management routines..... ***************************************************/ static __inline void pmap_free_zero_pages(struct spglist *free) { vm_page_t m; while ((m = SLIST_FIRST(free)) != NULL) { SLIST_REMOVE_HEAD(free, plinks.s.ss); /* Preserve the page's PG_ZERO setting. */ vm_page_free_toq(m); } } /* * Schedule the specified unused page table page to be freed. Specifically, * add the page to the specified list of pages that will be released to the * physical memory manager after the TLB has been updated. */ static __inline void pmap_add_delayed_free_list(vm_page_t m, struct spglist *free, boolean_t set_PG_ZERO) { if (set_PG_ZERO) m->flags |= PG_ZERO; else m->flags &= ~PG_ZERO; SLIST_INSERT_HEAD(free, m, plinks.s.ss); } /* * Decrements a page table page's wire count, which is used to record the * number of valid page table entries within the page. If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. 
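* (The wire count mirrors the number of valid entries within the page, so the final unwire also unhooks the page from its parent table; see _pmap_unwire_l3().)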
*/ static inline boolean_t pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { --m->wire_count; if (m->wire_count == 0) { _pmap_unwire_l3(pmap, va, m, free); return (TRUE); } else { return (FALSE); } } static void _pmap_unwire_l3(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { vm_paddr_t phys; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * unmap the page table page */ if (m->pindex >= NUPDE) { /* PD page */ pd_entry_t *l1; l1 = pmap_l1(pmap, va); pmap_load_clear(l1); pmap_distribute_l1(pmap, pmap_l1_index(va), 0); PTE_SYNC(l1); } else { /* PTE page */ pd_entry_t *l2; l2 = pmap_l2(pmap, va); pmap_load_clear(l2); PTE_SYNC(l2); } pmap_resident_count_dec(pmap, 1); if (m->pindex < NUPDE) { pd_entry_t *l1; /* We just released a PT, unhold the matching PD */ vm_page_t pdpg; l1 = pmap_l1(pmap, va); phys = PTE_TO_PHYS(pmap_load(l1)); pdpg = PHYS_TO_VM_PAGE(phys); pmap_unwire_l3(pmap, va, pdpg, free); } pmap_invalidate_page(pmap, va); /* * This is a release store so that the ordinary store unmapping * the page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ pmap_add_delayed_free_list(m, free, TRUE); } /* * After removing an l3 entry, this routine is used to * conditionally free the page, and manage the hold/wire counts. */ static int pmap_unuse_l3(pmap_t pmap, vm_offset_t va, pd_entry_t ptepde, struct spglist *free) { vm_paddr_t phys; vm_page_t mpte; if (va >= VM_MAXUSER_ADDRESS) return (0); KASSERT(ptepde != 0, ("pmap_unuse_pt: ptepde != 0")); phys = PTE_TO_PHYS(ptepde); mpte = PHYS_TO_VM_PAGE(phys); return (pmap_unwire_l3(pmap, va, mpte, free)); } void pmap_pinit0(pmap_t pmap) { PMAP_LOCK_INIT(pmap); bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); pmap->pm_l1 = kernel_pmap->pm_l1; } int pmap_pinit(pmap_t pmap) { vm_paddr_t l1phys; vm_page_t l1pt; /* * allocate the l1 page */ while ((l1pt = vm_page_alloc(NULL, 0xdeadbeef, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) VM_WAIT; l1phys = VM_PAGE_TO_PHYS(l1pt); pmap->pm_l1 = (pd_entry_t *)PHYS_TO_DMAP(l1phys); if ((l1pt->flags & PG_ZERO) == 0) pagezero(pmap->pm_l1); bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); /* Install kernel pagetables */ memcpy(pmap->pm_l1, kernel_pmap->pm_l1, PAGE_SIZE); /* Add to the list of all user pmaps */ LIST_INSERT_HEAD(&allpmaps, pmap, pm_list); return (1); } /* * This routine is called if the desired page table page does not exist. * * If page table page allocation fails, this routine may sleep before * returning NULL. It sleeps only if a lock pointer was given. * * Note: If a page allocation fails at page table level two or three, * one or two pages may be held during the wait, only to be released * afterwards. This conservative approach is easily argued to avoid * race conditions. */ static vm_page_t _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp) { vm_page_t m, /*pdppg, */pdpg; pt_entry_t entry; vm_paddr_t phys; pn_t pn; PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Allocate a page table page. */ if ((m = vm_page_alloc(NULL, ptepindex, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) { if (lockp != NULL) { RELEASE_PV_LIST_LOCK(lockp); PMAP_UNLOCK(pmap); rw_runlock(&pvh_global_lock); VM_WAIT; rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); } /* * Indicate the need to retry. While waiting, the page table * page may have been allocated. 
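* (by another thread, in which case the caller's retry will find it and simply bump its wire count.)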
*/ return (NULL); } if ((m->flags & PG_ZERO) == 0) pmap_zero_page(m); /* * Map the pagetable page into the process address space, if * it isn't already there. */ if (ptepindex >= NUPDE) { pd_entry_t *l1; vm_pindex_t l1index; l1index = ptepindex - NUPDE; l1 = &pmap->pm_l1[l1index]; pn = (VM_PAGE_TO_PHYS(m) / PAGE_SIZE); entry = (PTE_V); entry |= (pn << PTE_PPN0_S); pmap_load_store(l1, entry); pmap_distribute_l1(pmap, l1index, entry); PTE_SYNC(l1); } else { vm_pindex_t l1index; pd_entry_t *l1, *l2; l1index = ptepindex >> (L1_SHIFT - L2_SHIFT); l1 = &pmap->pm_l1[l1index]; if (pmap_load(l1) == 0) { /* recurse for allocating page dir */ if (_pmap_alloc_l3(pmap, NUPDE + l1index, lockp) == NULL) { --m->wire_count; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); return (NULL); } } else { phys = PTE_TO_PHYS(pmap_load(l1)); pdpg = PHYS_TO_VM_PAGE(phys); pdpg->wire_count++; } phys = PTE_TO_PHYS(pmap_load(l1)); l2 = (pd_entry_t *)PHYS_TO_DMAP(phys); l2 = &l2[ptepindex & Ln_ADDR_MASK]; pn = (VM_PAGE_TO_PHYS(m) / PAGE_SIZE); entry = (PTE_V); entry |= (pn << PTE_PPN0_S); pmap_load_store(l2, entry); PTE_SYNC(l2); } pmap_resident_count_inc(pmap, 1); return (m); } static vm_page_t pmap_alloc_l3(pmap_t pmap, vm_offset_t va, struct rwlock **lockp) { vm_pindex_t ptepindex; pd_entry_t *l2; vm_paddr_t phys; vm_page_t m; /* * Calculate pagetable page index */ ptepindex = pmap_l2_pindex(va); retry: /* * Get the page directory entry */ l2 = pmap_l2(pmap, va); /* * If the page table page is mapped, we just increment the * hold count, and activate it. */ if (l2 != NULL && pmap_load(l2) != 0) { phys = PTE_TO_PHYS(pmap_load(l2)); m = PHYS_TO_VM_PAGE(phys); m->wire_count++; } else { /* * Here if the pte page isn't mapped, or if it has been * deallocated. */ m = _pmap_alloc_l3(pmap, ptepindex, lockp); if (m == NULL && lockp != NULL) goto retry; } return (m); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. 
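 *
 * For illustration (editor's sketch; the guard macro and helper name are
 * hypothetical): the expected teardown order is to destroy the managed
 * mappings first, which drops resident_count to zero, and only then
 * release the pmap itself.
 */
#ifdef PMAP_USAGE_SKETCH
static void
example_pmap_teardown(pmap_t pmap)
{

	/* Remove all managed, non-wired user mappings. */
	pmap_remove_pages(pmap);
	/* Now the KASSERT on resident_count in pmap_release() holds. */
	pmap_release(pmap);
}
#endif
/*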
*/ void pmap_release(pmap_t pmap) { vm_page_t m; KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pmap->pm_l1)); m->wire_count--; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); /* Remove pmap from the allpmaps list */ LIST_REMOVE(pmap, pm_list); /* Remove kernel pagetables */ bzero(pmap->pm_l1, PAGE_SIZE); } #if 0 static int kvm_size(SYSCTL_HANDLER_ARGS) { unsigned long ksize = VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS; return sysctl_handle_long(oidp, &ksize, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_size, "LU", "Size of KVM"); static int kvm_free(SYSCTL_HANDLER_ARGS) { unsigned long kfree = VM_MAX_KERNEL_ADDRESS - kernel_vm_end; return sysctl_handle_long(oidp, &kfree, 0, req); } SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD, 0, 0, kvm_free, "LU", "Amount of KVM free"); #endif /* 0 */ /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { vm_paddr_t paddr; vm_page_t nkpg; pd_entry_t *l1, *l2; pt_entry_t entry; pn_t pn; mtx_assert(&kernel_map->system_mtx, MA_OWNED); addr = roundup2(addr, L2_SIZE); if (addr - 1 >= kernel_map->max_offset) addr = kernel_map->max_offset; while (kernel_vm_end < addr) { l1 = pmap_l1(kernel_pmap, kernel_vm_end); if (pmap_load(l1) == 0) { /* We need a new PDP entry */ nkpg = vm_page_alloc(NULL, kernel_vm_end >> L1_SHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); pn = (paddr / PAGE_SIZE); entry = (PTE_V); entry |= (pn << PTE_PPN0_S); pmap_load_store(l1, entry); pmap_distribute_l1(kernel_pmap, pmap_l1_index(kernel_vm_end), entry); PTE_SYNC(l1); continue; /* try again */ } l2 = pmap_l1_to_l2(l1, kernel_vm_end); if ((pmap_load(l2) & PTE_A) != 0) { kernel_vm_end = (kernel_vm_end + L2_SIZE) & ~L2_OFFSET; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } continue; } nkpg = vm_page_alloc(NULL, kernel_vm_end >> L2_SHIFT, VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (nkpg == NULL) panic("pmap_growkernel: no memory to grow kernel"); if ((nkpg->flags & PG_ZERO) == 0) { pmap_zero_page(nkpg); } paddr = VM_PAGE_TO_PHYS(nkpg); pn = (paddr / PAGE_SIZE); entry = (PTE_V); entry |= (pn << PTE_PPN0_S); pmap_load_store(l2, entry); PTE_SYNC(l2); pmap_invalidate_page(kernel_pmap, kernel_vm_end); kernel_vm_end = (kernel_vm_end + L2_SIZE) & ~L2_OFFSET; if (kernel_vm_end - 1 >= kernel_map->max_offset) { kernel_vm_end = kernel_map->max_offset; break; } } } /*************************************************** * page management routines. 
***************************************************/ CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 3); CTASSERT(_NPCPV == 168); static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) #define PC_FREE0 0xfffffffffffffffful #define PC_FREE1 0xfffffffffffffffful #define PC_FREE2 0x000000fffffffffful static const uint64_t pc_freemask[_NPCM] = { PC_FREE0, PC_FREE1, PC_FREE2 }; #if 0 #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs, pv_entry_count; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif #endif /* 0 */ /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. * * Returns NULL if PV entries were reclaimed from the specified pmap. * * We do not, however, unmap 2mpages because subsequent accesses will * allocate per-page pv entries until repromotion occurs, thereby * exacerbating the shortage of free pv entries. */ static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp) { panic("RISCVTODO: reclaim_pv_chunk"); } /* * free the pv_entry back to the free list */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; rw_assert(&pvh_global_lock, RA_LOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_frees, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, 1)); PV_STAT(atomic_subtract_long(&pv_entry_count, 1)); pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 64; bit = idx % 64; pc->pc_map[field] |= 1ul << bit; if (pc->pc_map[0] != PC_FREE0 || pc->pc_map[1] != PC_FREE1 || pc->pc_map[2] != PC_FREE2) { /* 98% of the time, pc is already at the head of the list. 
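 * Keeping partially-full chunks at the head means the next
 * get_pv_entry() call finds a free slot immediately.  For example, pv
 * entry index 100 within a chunk maps to field 100 / 64 = 1,
 * bit 100 % 64 = 36 of pc_map[] above.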
*/ if (__predict_false(pc != TAILQ_FIRST(&pmap->pm_pvchunk))) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; mtx_lock(&pv_chunks_mutex); TAILQ_REMOVE(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); PV_STAT(atomic_subtract_int(&pv_entry_spare, _NPCPV)); PV_STAT(atomic_subtract_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_frees, 1)); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(DMAP_TO_PHYS((vm_offset_t)pc)); #if 0 /* TODO: For minidump */ dump_drop_page(m->phys_addr); #endif vm_page_unwire(m, PQ_INACTIVE); vm_page_free(m); } /* * Returns a new PV entry, allocating a new PV chunk from the system when * needed. If this PV chunk allocation fails and a PV list lock pointer was * given, a PV chunk is reclaimed from an arbitrary pmap. Otherwise, NULL is * returned. * * The given PV list lock may be released. */ static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp) { int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; rw_assert(&pvh_global_lock, RA_LOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(atomic_add_long(&pv_entry_allocs, 1)); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = ffsl(pc->pc_map[field]) - 1; break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 64 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ if (pc->pc_map[0] == 0 && pc->pc_map[1] == 0 && pc->pc_map[2] == 0) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); } PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_subtract_int(&pv_entry_spare, 1)); return (pv); } } /* No free items, allocate another chunk */ m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) { if (lockp == NULL) { PV_STAT(pc_chunk_tryfail++); return (NULL); } m = reclaim_pv_chunk(pmap, lockp); if (m == NULL) goto retry; } PV_STAT(atomic_add_int(&pc_chunk_count, 1)); PV_STAT(atomic_add_int(&pc_chunk_allocs, 1)); #if 0 /* TODO: This is for minidump */ dump_add_page(m->phys_addr); #endif pc = (void *)PHYS_TO_DMAP(m->phys_addr); pc->pc_pmap = pmap; pc->pc_map[0] = PC_FREE0 & ~1ul; /* preallocated bit 0 */ pc->pc_map[1] = PC_FREE1; pc->pc_map[2] = PC_FREE2; mtx_lock(&pv_chunks_mutex); TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); mtx_unlock(&pv_chunks_mutex); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(atomic_add_long(&pv_entry_count, 1)); PV_STAT(atomic_add_int(&pv_entry_spare, _NPCPV - 1)); return (pv); } /* * First find and then remove the pv entry for the specified pmap and virtual * address from the specified pv list. Returns the pv entry if found and NULL * otherwise. This operation can be performed on pv lists for either 4KB or * 2MB page mappings. */ static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_LOCKED); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); pvh->pv_gen++; break; } } return (pv); } /* * First find and then destroy the pv entry for the specified pmap and virtual * address. This operation can be performed on pv lists for either 4KB or 2MB * page mappings. 
*/ static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } /* * Conditionally create the PV entry for a 4KB page mapping if the required * memory can be allocated without resorting to reclamation. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m, struct rwlock **lockp) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_LOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* Pass NULL instead of the lock pointer to disable reclamation. */ if ((pv = get_pv_entry(pmap, NULL)) != NULL) { pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; return (TRUE); } else return (FALSE); } /* * pmap_remove_l3: do the things to unmap a page in a process */ static int pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_t va, pd_entry_t l2e, struct spglist *free, struct rwlock **lockp) { pt_entry_t old_l3; vm_paddr_t phys; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(va, L3_SIZE); old_l3 = pmap_load_clear(l3); PTE_SYNC(l3); pmap_invalidate_page(pmap, va); if (old_l3 & PTE_SW_WIRED) pmap->pm_stats.wired_count -= 1; pmap_resident_count_dec(pmap, 1); if (old_l3 & PTE_SW_MANAGED) { phys = PTE_TO_PHYS(old_l3); m = PHYS_TO_VM_PAGE(phys); if (pmap_page_dirty(old_l3)) vm_page_dirty(m); if (old_l3 & PTE_A) vm_page_aflag_set(m, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m); pmap_pvh_free(&m->md, pmap, va); } return (pmap_unuse_l3(pmap, va, l2e, free)); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { struct rwlock *lock; vm_offset_t va, va_next; pd_entry_t *l1, *l2; pt_entry_t l3_pte, *l3; struct spglist free; int anyvalid; /* * Perform an unsynchronized read. This is, however, safe. */ if (pmap->pm_stats.resident_count == 0) return; anyvalid = 0; SLIST_INIT(&free); rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); lock = NULL; for (; sva < eva; sva = va_next) { if (pmap->pm_stats.resident_count == 0) break; l1 = pmap_l1(pmap, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } /* * Calculate index for next page table. */ va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (l2 == NULL) continue; l3_pte = pmap_load(l2); /* * Weed out invalid mappings. */ if (l3_pte == 0) continue; if ((pmap_load(l2) & PTE_RX) != 0) continue; /* * Limit our scan to either the end of the va represented * by the current page table page, or to the end of the * range being removed. 
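 * In other words, va_next is clamped to eva below before the inner l3
 * walk, and "va" tracks the start of the pending run of removed entries
 * so that one ranged invalidation can cover the whole run.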
*/ if (va_next > eva) va_next = eva; va = va_next; for (l3 = pmap_l2_to_l3(l2, sva); sva != va_next; l3++, sva += L3_SIZE) { if (l3 == NULL) panic("l3 == NULL"); if (pmap_load(l3) == 0) { if (va != va_next) { pmap_invalidate_range(pmap, va, sva); va = va_next; } continue; } if (va == va_next) va = sva; if (pmap_remove_l3(pmap, l3, sva, l3_pte, &free, &lock)) { sva += L3_SIZE; break; } } if (va != va_next) pmap_invalidate_range(pmap, va, sva); } if (lock != NULL) rw_wunlock(lock); if (anyvalid) pmap_invalidate_all(pmap); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) */ void pmap_remove_all(vm_page_t m) { pv_entry_t pv; pmap_t pmap; pt_entry_t *l3, tl3; pd_entry_t *l2, tl2; struct spglist free; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); SLIST_INIT(&free); rw_wlock(&pvh_global_lock); while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pmap_resident_count_dec(pmap, 1); l2 = pmap_l2(pmap, pv->pv_va); KASSERT(l2 != NULL, ("pmap_remove_all: no l2 table found")); tl2 = pmap_load(l2); KASSERT((tl2 & PTE_RX) == 0, ("pmap_remove_all: found a table when expecting " "a block in %p's pv list", m)); l3 = pmap_l2_to_l3(l2, pv->pv_va); if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(pv->pv_va, L3_SIZE); tl3 = pmap_load_clear(l3); PTE_SYNC(l3); pmap_invalidate_page(pmap, pv->pv_va); if (tl3 & PTE_SW_WIRED) pmap->pm_stats.wired_count--; if ((tl3 & PTE_A) != 0) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if (pmap_page_dirty(tl3)) vm_page_dirty(m); pmap_unuse_l3(pmap, pv->pv_va, pmap_load(l2), &free); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); pmap_free_zero_pages(&free); } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t va, va_next; pd_entry_t *l1, *l2; pt_entry_t *l3p, l3; pt_entry_t entry; if ((prot & VM_PROT_READ) == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } if ((prot & VM_PROT_WRITE) == VM_PROT_WRITE) return; PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { l1 = pmap_l1(pmap, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (l2 == NULL) continue; if ((pmap_load(l2) & PTE_RX) != 0) continue; if (va_next > eva) va_next = eva; va = va_next; for (l3p = pmap_l2_to_l3(l2, sva); sva != va_next; l3p++, sva += L3_SIZE) { l3 = pmap_load(l3p); if (pmap_l3_valid(l3)) { entry = pmap_load(l3p); entry &= ~(PTE_W); pmap_load_store(l3p, entry); PTE_SYNC(l3p); /* XXX: Use pmap_invalidate_range */ pmap_invalidate_page(pmap, va); } } } PMAP_UNLOCK(pmap); /* TODO: Only invalidate entries we are touching */ pmap_invalidate_all(pmap); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. 
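 * (For example, vm_fault() enters the faulted page this way, passing
 * PMAP_ENTER_WIRED in "flags" when the mapping is wired.)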
* * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind __unused) { struct rwlock *lock; pd_entry_t *l1, *l2; pt_entry_t new_l3, orig_l3; pt_entry_t *l3; pv_entry_t pv; vm_paddr_t opa, pa, l2_pa, l3_pa; vm_page_t mpte, om, l2_m, l3_m; boolean_t nosleep; pt_entry_t entry; pn_t l2_pn; pn_t l3_pn; pn_t pn; va = trunc_page(va); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); pa = VM_PAGE_TO_PHYS(m); pn = (pa / PAGE_SIZE); new_l3 = PTE_V | PTE_R | PTE_X; if (prot & VM_PROT_WRITE) new_l3 |= PTE_W; if ((va >> 63) == 0) new_l3 |= PTE_U; new_l3 |= (pn << PTE_PPN0_S); if ((flags & PMAP_ENTER_WIRED) != 0) new_l3 |= PTE_SW_WIRED; CTR2(KTR_PMAP, "pmap_enter: %.16lx -> %.16lx", va, pa); mpte = NULL; lock = NULL; rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); if (va < VM_MAXUSER_ADDRESS) { nosleep = (flags & PMAP_ENTER_NOSLEEP) != 0; mpte = pmap_alloc_l3(pmap, va, nosleep ? NULL : &lock); if (mpte == NULL && nosleep) { CTR0(KTR_PMAP, "pmap_enter: mpte == NULL"); if (lock != NULL) rw_wunlock(lock); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } l3 = pmap_l3(pmap, va); } else { l3 = pmap_l3(pmap, va); /* TODO: This is not optimal, but should mostly work */ if (l3 == NULL) { l2 = pmap_l2(pmap, va); if (l2 == NULL) { l2_m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (l2_m == NULL) panic("pmap_enter: l2 pte_m == NULL"); if ((l2_m->flags & PG_ZERO) == 0) pmap_zero_page(l2_m); l2_pa = VM_PAGE_TO_PHYS(l2_m); l2_pn = (l2_pa / PAGE_SIZE); l1 = pmap_l1(pmap, va); entry = (PTE_V); entry |= (l2_pn << PTE_PPN0_S); pmap_load_store(l1, entry); pmap_distribute_l1(pmap, pmap_l1_index(va), entry); PTE_SYNC(l1); l2 = pmap_l1_to_l2(l1, va); } KASSERT(l2 != NULL, ("No l2 table after allocating one")); l3_m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO); if (l3_m == NULL) panic("pmap_enter: l3 pte_m == NULL"); if ((l3_m->flags & PG_ZERO) == 0) pmap_zero_page(l3_m); l3_pa = VM_PAGE_TO_PHYS(l3_m); l3_pn = (l3_pa / PAGE_SIZE); entry = (PTE_V); entry |= (l3_pn << PTE_PPN0_S); pmap_load_store(l2, entry); PTE_SYNC(l2); l3 = pmap_l2_to_l3(l2, va); } pmap_invalidate_page(pmap, va); } om = NULL; orig_l3 = pmap_load(l3); opa = PTE_TO_PHYS(orig_l3); /* * Is the specified virtual address already mapped? */ if (pmap_l3_valid(orig_l3)) { /* * Wiring change, just update stats. We don't worry about * wiring PT pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT page will be also. */ if ((flags & PMAP_ENTER_WIRED) != 0 && (orig_l3 & PTE_SW_WIRED) == 0) pmap->pm_stats.wired_count++; else if ((flags & PMAP_ENTER_WIRED) == 0 && (orig_l3 & PTE_SW_WIRED) != 0) pmap->pm_stats.wired_count--; /* * Remove the extra PT page reference. */ if (mpte != NULL) { mpte->wire_count--; KASSERT(mpte->wire_count > 0, ("pmap_enter: missing reference to page table page," " va: 0x%lx", va)); } /* * Has the physical page changed? */ if (opa == pa) { /* * No, might be a protection or wiring change. 
*/ if ((orig_l3 & PTE_SW_MANAGED) != 0) { new_l3 |= PTE_SW_MANAGED; if (pmap_is_write(new_l3)) vm_page_aflag_set(m, PGA_WRITEABLE); } goto validate; } /* Flush the cache, there might be uncommitted data in it */ if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(orig_l3)) cpu_dcache_wb_range(va, L3_SIZE); } else { /* * Increment the counters. */ if ((new_l3 & PTE_SW_WIRED) != 0) pmap->pm_stats.wired_count++; pmap_resident_count_inc(pmap, 1); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { new_l3 |= PTE_SW_MANAGED; pv = get_pv_entry(pmap, &lock); pv->pv_va = va; CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; if (pmap_is_write(new_l3)) vm_page_aflag_set(m, PGA_WRITEABLE); } /* * Update the L3 entry. */ if (orig_l3 != 0) { validate: orig_l3 = pmap_load_store(l3, new_l3); PTE_SYNC(l3); opa = PTE_TO_PHYS(orig_l3); if (opa != pa) { if ((orig_l3 & PTE_SW_MANAGED) != 0) { om = PHYS_TO_VM_PAGE(opa); if (pmap_page_dirty(orig_l3)) vm_page_dirty(om); if ((orig_l3 & PTE_A) != 0) vm_page_aflag_set(om, PGA_REFERENCED); CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, opa); pmap_pvh_free(&om->md, pmap, va); } } else if (pmap_page_dirty(orig_l3)) { if ((orig_l3 & PTE_SW_MANAGED) != 0) vm_page_dirty(m); } } else { pmap_load_store(l3, new_l3); PTE_SYNC(l3); } pmap_invalidate_page(pmap, va); if ((pmap != pmap_kernel()) && (pmap == &curproc->p_vmspace->vm_pmap)) cpu_icache_sync_range(va, PAGE_SIZE); if (lock != NULL) rw_wunlock(lock); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { struct rwlock *lock; vm_offset_t va; vm_page_t m, mpte; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpte = NULL; m = m_start; lock = NULL; rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); mpte = pmap_enter_quick_locked(pmap, va, m, prot, mpte, &lock); m = TAILQ_NEXT(m, listq); } if (lock != NULL) rw_wunlock(lock); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... 
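 *
 * A minimal sketch of the intended use (editor's illustration; the guard
 * macro and helper name are hypothetical): speculative read-only
 * prefaulting, where failure to map is acceptable and silently ignored.
 */
#ifdef PMAP_USAGE_SKETCH
static void
example_prefault_page(pmap_t pmap, vm_offset_t va, vm_page_t m)
{

	/* Best effort: not wired, read access, no sleeping allocations. */
	pmap_enter_quick(pmap, va, m, VM_PROT_READ);
}
#endif
/*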
*/ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { struct rwlock *lock; lock = NULL; rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL, &lock); if (lock != NULL) rw_wunlock(lock); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp) { struct spglist free; vm_paddr_t phys; pd_entry_t *l2; pt_entry_t *l3; vm_paddr_t pa; pt_entry_t entry; pn_t pn; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("pmap_enter_quick_locked: managed mapping within the clean submap")); rw_assert(&pvh_global_lock, RA_LOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); CTR2(KTR_PMAP, "pmap_enter_quick_locked: %p %lx", pmap, va); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { vm_pindex_t l2pindex; /* * Calculate pagetable page index */ l2pindex = pmap_l2_pindex(va); if (mpte && (mpte->pindex == l2pindex)) { mpte->wire_count++; } else { /* * Get the l2 entry */ l2 = pmap_l2(pmap, va); /* * If the page table page is mapped, we just increment * the hold count, and activate it. Otherwise, we * attempt to allocate a page table page. If this * attempt fails, we don't retry. Instead, we give up. */ if (l2 != NULL && pmap_load(l2) != 0) { phys = PTE_TO_PHYS(pmap_load(l2)); mpte = PHYS_TO_VM_PAGE(phys); mpte->wire_count++; } else { /* * Pass NULL instead of the PV list lock * pointer, because we don't intend to sleep. */ mpte = _pmap_alloc_l3(pmap, l2pindex, NULL); if (mpte == NULL) return (mpte); } } l3 = (pt_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mpte)); l3 = &l3[pmap_l3_index(va)]; } else { mpte = NULL; l3 = pmap_l3(kernel_pmap, va); } if (l3 == NULL) panic("pmap_enter_quick_locked: No l3"); if (pmap_load(l3) != 0) { if (mpte != NULL) { mpte->wire_count--; mpte = NULL; } return (mpte); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m, lockp)) { if (mpte != NULL) { SLIST_INIT(&free); if (pmap_unwire_l3(pmap, va, mpte, &free)) { pmap_invalidate_page(pmap, va); pmap_free_zero_pages(&free); } mpte = NULL; } return (mpte); } /* * Increment counters */ pmap_resident_count_inc(pmap, 1); pa = VM_PAGE_TO_PHYS(m); pn = (pa / PAGE_SIZE); /* RISCVTODO: check permissions */ entry = (PTE_V | PTE_RWX); entry |= (pn << PTE_PPN0_S); /* * Now validate mapping with RO protection */ if ((m->oflags & VPO_UNMANAGED) == 0) entry |= PTE_SW_MANAGED; pmap_load_store(l3, entry); PTE_SYNC(l3); pmap_invalidate_page(pmap, va); return (mpte); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, * so there is no need to invalidate any TLB entries. 
*/ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va_next; pd_entry_t *l1, *l2; pt_entry_t *l3; boolean_t pv_lists_locked; pv_lists_locked = FALSE; PMAP_LOCK(pmap); for (; sva < eva; sva = va_next) { l1 = pmap_l1(pmap, sva); if (pmap_load(l1) == 0) { va_next = (sva + L1_SIZE) & ~L1_OFFSET; if (va_next < sva) va_next = eva; continue; } va_next = (sva + L2_SIZE) & ~L2_OFFSET; if (va_next < sva) va_next = eva; l2 = pmap_l1_to_l2(l1, sva); if (pmap_load(l2) == 0) continue; if (va_next > eva) va_next = eva; for (l3 = pmap_l2_to_l3(l2, sva); sva != va_next; l3++, sva += L3_SIZE) { if (pmap_load(l3) == 0) continue; if ((pmap_load(l3) & PTE_SW_WIRED) == 0) panic("pmap_unwire: l3 %#jx is missing " "PTE_SW_WIRED", (uintmax_t)*l3); /* * PG_W must be cleared atomically. Although the pmap * lock synchronizes access to PG_W, another processor * could be setting PG_M and/or PG_A concurrently. */ atomic_clear_long(l3, PTE_SW_WIRED); pmap->pm_stats.wired_count--; } } if (pv_lists_locked) rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. */ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { } /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. */ void pmap_zero_page(vm_page_t m) { vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)); pagezero((void *)va); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { vm_offset_t va = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)); if (off == 0 && size == PAGE_SIZE) pagezero((void *)va); else bzero((char *)va + off, size); } /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. */ void pmap_copy_page(vm_page_t msrc, vm_page_t mdst) { vm_offset_t src = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(msrc)); vm_offset_t dst = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(mdst)); pagecopy((void *)src, (void *)dst); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { void *a_cp, *b_cp; vm_page_t m_a, m_b; vm_paddr_t p_a, p_b; vm_offset_t a_pg_offset, b_pg_offset; int cnt; while (xfersize > 0) { a_pg_offset = a_offset & PAGE_MASK; m_a = ma[a_offset >> PAGE_SHIFT]; p_a = m_a->phys_addr; b_pg_offset = b_offset & PAGE_MASK; m_b = mb[b_offset >> PAGE_SHIFT]; p_b = m_b->phys_addr; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); cnt = min(cnt, PAGE_SIZE - b_pg_offset); if (__predict_false(!PHYS_IN_DMAP(p_a))) { panic("!DMAP a %lx", p_a); } else { a_cp = (char *)PHYS_TO_DMAP(p_a) + a_pg_offset; } if (__predict_false(!PHYS_IN_DMAP(p_b))) { panic("!DMAP b %lx", p_b); } else { b_cp = (char *)PHYS_TO_DMAP(p_b) + b_pg_offset; } bcopy(a_cp, b_cp, cnt); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } } vm_offset_t pmap_quick_enter_page(vm_page_t m) { return (PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m))); } void pmap_quick_remove_page(vm_offset_t addr) { } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page. 
This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct rwlock *lock; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); rv = FALSE; rw_rlock(&pvh_global_lock); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } rw_runlock(lock); rw_runlock(&pvh_global_lock); return (rv); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { struct rwlock *lock; pmap_t pmap; pt_entry_t *l3; pv_entry_t pv; int count, md_gen; if ((m->oflags & VPO_UNMANAGED) != 0) return (0); rw_rlock(&pvh_global_lock); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: count = 0; TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } l3 = pmap_l3(pmap, pv->pv_va); if (l3 != NULL && (pmap_load(l3) & PTE_SW_WIRED) != 0) count++; PMAP_UNLOCK(pmap); } rw_runlock(lock); rw_runlock(&pvh_global_lock); return (count); } /* * Destroy all managed, non-wired mappings in the given user-space * pmap. This pmap cannot be active on any processor besides the * caller. * * This function cannot be applied to the kernel pmap. Moreover, it * is not intended for general use. It is only to be used during * process termination. Consequently, it can be implemented in ways * that make it faster than pmap_remove(). First, it can more quickly * destroy mappings by iterating over the pmap's collection of PV * entries, rather than searching the page table. Second, it doesn't * have to test and clear the page table entries atomically, because * no processor is currently accessing the user address space. In * particular, a page table entry's dirty bit won't change state once * this function starts. 
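 *
 * Sketch of the intended call site (editor's illustration; the guard
 * macro and helper name are hypothetical): process exit, once no other
 * CPU can be running in this address space.
 */
#ifdef PMAP_USAGE_SKETCH
static void
example_exit_mappings(struct vmspace *vm)
{

	pmap_remove_pages(vmspace_pmap(vm));
}
#endif
/*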
*/ void pmap_remove_pages(pmap_t pmap) { pd_entry_t ptepde, *l2; pt_entry_t *l3, tl3; struct spglist free; vm_page_t m; pv_entry_t pv; struct pv_chunk *pc, *npc; struct rwlock *lock; int64_t bit; uint64_t inuse, bitmask; int allfree, field, freed, idx; vm_paddr_t pa; lock = NULL; SLIST_INIT(&free); rw_rlock(&pvh_global_lock); PMAP_LOCK(pmap); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { allfree = 1; freed = 0; for (field = 0; field < _NPCM; field++) { inuse = ~pc->pc_map[field] & pc_freemask[field]; while (inuse != 0) { bit = ffsl(inuse) - 1; bitmask = 1UL << bit; idx = field * 64 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; l2 = pmap_l2(pmap, pv->pv_va); ptepde = pmap_load(l2); l3 = pmap_l2_to_l3(l2, pv->pv_va); tl3 = pmap_load(l3); /* * We cannot remove wired pages from a process' mapping at this time */ if (tl3 & PTE_SW_WIRED) { allfree = 0; continue; } pa = PTE_TO_PHYS(tl3); m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("vm_page_t %p phys_addr mismatch %016jx %016jx", m, (uintmax_t)m->phys_addr, (uintmax_t)tl3)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("pmap_remove_pages: bad l3 %#jx", (uintmax_t)tl3)); if (pmap_is_current(pmap) && pmap_l3_valid_cacheable(pmap_load(l3))) cpu_dcache_wb_range(pv->pv_va, L3_SIZE); pmap_load_clear(l3); PTE_SYNC(l3); pmap_invalidate_page(pmap, pv->pv_va); /* * Update the vm_page_t clean/reference bits. */ if (pmap_page_dirty(tl3)) vm_page_dirty(m); CHANGE_PV_LIST_LOCK_TO_VM_PAGE(&lock, m); /* Mark free */ pc->pc_map[field] |= bitmask; pmap_resident_count_dec(pmap, 1); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; pmap_unuse_l3(pmap, pv->pv_va, ptepde, &free); freed++; } } PV_STAT(atomic_add_long(&pv_entry_frees, freed)); PV_STAT(atomic_add_int(&pv_entry_spare, freed)); PV_STAT(atomic_subtract_long(&pv_entry_count, freed)); if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } pmap_invalidate_all(pmap); if (lock != NULL) rw_wunlock(lock); rw_runlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * This is used to check if a page has been accessed or modified. As we * don't have a bit to see if it has been modified we have to assume it * has been if the page is read/write. */ static boolean_t pmap_page_test_mappings(vm_page_t m, boolean_t accessed, boolean_t modified) { struct rwlock *lock; pv_entry_t pv; pt_entry_t *l3, mask, value; pmap_t pmap; int md_gen; boolean_t rv; rv = FALSE; rw_rlock(&pvh_global_lock); lock = VM_PAGE_TO_PV_LIST_LOCK(m); rw_rlock(lock); restart: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_runlock(lock); PMAP_LOCK(pmap); rw_rlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto restart; } } l3 = pmap_l3(pmap, pv->pv_va); mask = 0; value = 0; if (modified) { mask |= PTE_D; value |= PTE_D; } if (accessed) { mask |= PTE_A; value |= PTE_A; } #if 0 if (modified) { mask |= ATTR_AP_RW_BIT; value |= ATTR_AP(ATTR_AP_RW); } if (accessed) { mask |= ATTR_AF | ATTR_DESCR_MASK; value |= ATTR_AF | L3_PAGE; } #endif rv = (pmap_load(l3) & mask) == value; PMAP_UNLOCK(pmap); if (rv) goto out; } out: rw_runlock(lock); rw_runlock(&pvh_global_lock); return (rv); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. 
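 * On this port the check reduces to testing PTE_D in each mapping's l3
 * entry via pmap_page_test_mappings(m, FALSE, TRUE) above; the accessed
 * test (pmap_is_referenced()) uses PTE_A the same way.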
*/ boolean_t pmap_is_modified(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can have PG_M set. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); return (pmap_page_test_mappings(m, FALSE, TRUE)); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pt_entry_t *l3; boolean_t rv; rv = FALSE; PMAP_LOCK(pmap); l3 = pmap_l3(pmap, addr); if (l3 != NULL && pmap_load(l3) != 0) { rv = TRUE; } PMAP_UNLOCK(pmap); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); return (pmap_page_test_mappings(m, TRUE, FALSE)); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { pmap_t pmap; struct rwlock *lock; pv_entry_t pv; pt_entry_t *l3, oldl3; pt_entry_t newl3; int md_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_rlock(&pvh_global_lock); lock = VM_PAGE_TO_PV_LIST_LOCK(m); retry_pv_loop: rw_wlock(lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); rw_wunlock(lock); goto retry_pv_loop; } } l3 = pmap_l3(pmap, pv->pv_va); retry: oldl3 = pmap_load(l3); if (pmap_is_write(oldl3)) { newl3 = oldl3 & ~(PTE_W); if (!atomic_cmpset_long(l3, oldl3, newl3)) goto retry; /* TODO: use pmap_page_dirty(oldl3) ? */ if ((oldl3 & PTE_A) != 0) vm_page_dirty(m); pmap_invalidate_page(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } rw_wunlock(lock); vm_page_aflag_clear(m, PGA_WRITEABLE); rw_runlock(&pvh_global_lock); } static __inline boolean_t safe_to_clear_referenced(pmap_t pmap, pt_entry_t pte) { return (FALSE); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. + * As an optimization, update the page's dirty field if a modified bit is + * found while counting reference bits. This opportunistic update can be + * performed at low cost and can eliminate the need for some future calls + * to pmap_is_modified(). However, since this function stops after + * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some + * dirty pages. Those dirty pages will only be detected by a future call + * to pmap_is_modified(). 
*/ int pmap_ts_referenced(vm_page_t m) { pv_entry_t pv, pvf; pmap_t pmap; struct rwlock *lock; pd_entry_t *l2; - pt_entry_t *l3; + pt_entry_t *l3, old_l3; vm_paddr_t pa; int cleared, md_gen, not_cleared; struct spglist free; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); SLIST_INIT(&free); cleared = 0; pa = VM_PAGE_TO_PHYS(m); lock = PHYS_TO_PV_LIST_LOCK(pa); rw_rlock(&pvh_global_lock); rw_wlock(lock); retry: not_cleared = 0; if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { if (pvf == NULL) pvf = pv; pmap = PV_PMAP(pv); if (!PMAP_TRYLOCK(pmap)) { md_gen = m->md.pv_gen; rw_wunlock(lock); PMAP_LOCK(pmap); rw_wlock(lock); if (md_gen != m->md.pv_gen) { PMAP_UNLOCK(pmap); goto retry; } } l2 = pmap_l2(pmap, pv->pv_va); KASSERT((pmap_load(l2) & PTE_RX) == 0, ("pmap_ts_referenced: found an invalid l2 table")); l3 = pmap_l2_to_l3(l2, pv->pv_va); - if ((pmap_load(l3) & PTE_A) != 0) { - if (safe_to_clear_referenced(pmap, pmap_load(l3))) { + old_l3 = pmap_load(l3); + if (pmap_page_dirty(old_l3)) + vm_page_dirty(m); + if ((old_l3 & PTE_A) != 0) { + if (safe_to_clear_referenced(pmap, old_l3)) { /* * TODO: We don't handle the access flag * at all. We need to be able to set it in * the exception handler. */ panic("RISCVTODO: safe_to_clear_referenced\n"); - } else if ((pmap_load(l3) & PTE_SW_WIRED) == 0) { + } else if ((old_l3 & PTE_SW_WIRED) == 0) { /* * Wired pages cannot be paged out so * doing accessed bit emulation for * them is wasted effort. We do the * hard work for unwired pages only. */ pmap_remove_l3(pmap, l3, pv->pv_va, pmap_load(l2), &free, &lock); pmap_invalidate_page(pmap, pv->pv_va); cleared++; if (pvf == pv) pvf = NULL; pv = NULL; KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && cleared + not_cleared < PMAP_TS_REFERENCED_MAX); out: rw_wunlock(lock); rw_runlock(&pvh_global_lock); pmap_free_zero_pages(&free); return (cleared + not_cleared); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no PTEs can have PG_M set. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; /* RISCVTODO: We lack support for tracking if a page is modified */ } void * pmap_mapbios(vm_paddr_t pa, vm_size_t size) { return ((void *)PHYS_TO_DMAP(pa)); } void pmap_unmapbios(vm_paddr_t pa, vm_size_t size) { } /* * Sets the memory attribute for the specified page. 
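 * Note: on this port the update of the direct mapping is unimplemented;
 * the body below records the attribute and then panics ("RISCVTODO")
 * for any non-fictitious page that lies within the DMAP.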
*/ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { m->md.pv_memattr = ma; /* * RISCVTODO: Implement the below (from the amd64 pmap) * If "m" is a normal page, update its direct mapping. This update * can be relied upon to perform any cache operations that are * required for data coherence. */ if ((m->flags & PG_FICTITIOUS) == 0 && PHYS_IN_DMAP(VM_PAGE_TO_PHYS(m))) panic("RISCVTODO: pmap_page_set_memattr"); } /* * perform the pmap work for mincore */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { panic("RISCVTODO: pmap_mincore"); } void pmap_activate(struct thread *td) { pmap_t pmap; critical_enter(); pmap = vmspace_pmap(td->td_proc->p_vmspace); td->td_pcb->pcb_l1addr = vtophys(pmap->pm_l1); __asm __volatile("csrw sptbr, %0" :: "r"(td->td_pcb->pcb_l1addr >> PAGE_SHIFT)); pmap_invalidate_all(pmap); critical_exit(); } void pmap_sync_icache(pmap_t pm, vm_offset_t va, vm_size_t sz) { panic("RISCVTODO: pmap_sync_icache"); } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. */ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { } /** * Get the kernel virtual address of a set of physical pages. If there are * physical addresses not covered by the DMAP perform a transient mapping * that will be removed when calling pmap_unmap_io_transient. * * \param page The pages the caller wishes to obtain the virtual * address on the kernel memory map. * \param vaddr On return contains the kernel virtual memory address * of the pages passed in the page parameter. * \param count Number of pages passed in. * \param can_fault TRUE if the thread using the mapped pages can take * page faults, FALSE otherwise. * * \returns TRUE if the caller must call pmap_unmap_io_transient when * finished or FALSE otherwise. * */ boolean_t pmap_map_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; boolean_t needs_mapping; int error, i; /* * Allocate any KVA space that we need, this is done in a separate * loop to prevent calling vmem_alloc while pinned. */ needs_mapping = FALSE; for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (__predict_false(paddr >= DMAP_MAX_PHYSADDR)) { error = vmem_alloc(kernel_arena, PAGE_SIZE, M_BESTFIT | M_WAITOK, &vaddr[i]); KASSERT(error == 0, ("vmem_alloc failed: %d", error)); needs_mapping = TRUE; } else { vaddr[i] = PHYS_TO_DMAP(paddr); } } /* Exit early if everything is covered by the DMAP */ if (!needs_mapping) return (FALSE); if (!can_fault) sched_pin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (paddr >= DMAP_MAX_PHYSADDR) { panic( "pmap_map_io_transient: TODO: Map out of DMAP data"); } } return (needs_mapping); } void pmap_unmap_io_transient(vm_page_t page[], vm_offset_t vaddr[], int count, boolean_t can_fault) { vm_paddr_t paddr; int i; if (!can_fault) sched_unpin(); for (i = 0; i < count; i++) { paddr = VM_PAGE_TO_PHYS(page[i]); if (paddr >= DMAP_MAX_PHYSADDR) { panic("RISCVTODO: pmap_unmap_io_transient: Unmap data"); } } } Index: projects/clang390-import/sys/security/audit/audit_syscalls.c =================================================================== --- projects/clang390-import/sys/security/audit/audit_syscalls.c (revision 305686) +++ projects/clang390-import/sys/security/audit/audit_syscalls.c (revision 305687) @@ -1,873 +1,873 @@ /*- * Copyright (c) 1999-2009 Apple Inc. * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of Apple Inc. ("Apple") nor the names of * its contributors may be used to endorse or promote products derived * from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY APPLE AND ITS CONTRIBUTORS "AS IS" AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL APPLE OR ITS CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef AUDIT /* * System call to allow a user space application to submit a BSM audit record * to the kernel for inclusion in the audit log. This function does little * verification on the audit record that is submitted. * * XXXAUDIT: Audit preselection for user records does not currently work, * since we pre-select only based on the AUE_audit event type, not the event * type submitted as part of the user audit data. */ /* ARGSUSED */ int sys_audit(struct thread *td, struct audit_args *uap) { int error; void * rec; struct kaudit_record *ar; if (jailed(td->td_ucred)) return (ENOSYS); error = priv_check(td, PRIV_AUDIT_SUBMIT); if (error) return (error); if ((uap->length <= 0) || (uap->length > audit_qctrl.aq_bufsz)) return (EINVAL); ar = currecord(); /* * If there's no current audit record (audit() itself not audited) * commit the user audit record. */ if (ar == NULL) { /* * This is not very efficient; we're required to allocate a * complete kernel audit record just so the user record can * tag along. * * XXXAUDIT: Maybe AUE_AUDIT in the system call context and * special pre-select handling? */ td->td_ar = audit_new(AUE_NULL, td); if (td->td_ar == NULL) return (ENOTSUP); td->td_pflags |= TDP_AUDITREC; ar = td->td_ar; } if (uap->length > MAX_AUDIT_RECORD_SIZE) return (EINVAL); rec = malloc(uap->length, M_AUDITDATA, M_WAITOK); error = copyin(uap->record, rec, uap->length); if (error) goto free_out; /* Verify the record. */ if (bsm_rec_verify(rec) == 0) { error = EINVAL; goto free_out; } #ifdef MAC error = mac_system_check_audit(td->td_ucred, rec, uap->length); if (error) goto free_out; #endif /* * Attach the user audit record to the kernel audit record. Because * this system call is an auditable event, we will write the user * record along with the record for this audit event. * * XXXAUDIT: KASSERT appropriate starting values of k_udata, k_ulen, * k_ar_commit & AR_COMMIT_USER? 
*/ ar->k_udata = rec; ar->k_ulen = uap->length; ar->k_ar_commit |= AR_COMMIT_USER; /* * Currently we assume that all preselection has been performed in * userspace. We unconditionally set these masks so that the records * get committed both to the trail and pipe. In the future we will * want to setup kernel based preselection. */ ar->k_ar_commit |= (AR_PRESELECT_USER_TRAIL | AR_PRESELECT_USER_PIPE); return (0); free_out: /* * audit_syscall_exit() will free the audit record on the thread even * if we allocated it above. */ free(rec, M_AUDITDATA); return (error); } /* * System call to manipulate auditing. */ /* ARGSUSED */ int sys_auditon(struct thread *td, struct auditon_args *uap) { struct ucred *cred, *newcred, *oldcred; int error; union auditon_udata udata; struct proc *tp; if (jailed(td->td_ucred)) return (ENOSYS); AUDIT_ARG_CMD(uap->cmd); #ifdef MAC error = mac_system_check_auditon(td->td_ucred, uap->cmd); if (error) return (error); #endif error = priv_check(td, PRIV_AUDIT_CONTROL); if (error) return (error); if ((uap->length <= 0) || (uap->length > sizeof(union auditon_udata))) return (EINVAL); memset((void *)&udata, 0, sizeof(udata)); /* * Some of the GET commands use the arguments too. */ switch (uap->cmd) { case A_SETPOLICY: case A_OLDSETPOLICY: case A_SETKMASK: case A_SETQCTRL: case A_OLDSETQCTRL: case A_SETSTAT: case A_SETUMASK: case A_SETSMASK: case A_SETCOND: case A_OLDSETCOND: case A_SETCLASS: case A_SETPMASK: case A_SETFSIZE: case A_SETKAUDIT: case A_GETCLASS: case A_GETPINFO: case A_GETPINFO_ADDR: case A_SENDTRIGGER: error = copyin(uap->data, (void *)&udata, uap->length); if (error) return (error); AUDIT_ARG_AUDITON(&udata); break; } /* * XXXAUDIT: Locking? */ switch (uap->cmd) { case A_OLDGETPOLICY: case A_GETPOLICY: if (uap->length == sizeof(udata.au_policy64)) { if (!audit_fail_stop) udata.au_policy64 |= AUDIT_CNT; if (audit_panic_on_write_fail) udata.au_policy64 |= AUDIT_AHLT; if (audit_argv) udata.au_policy64 |= AUDIT_ARGV; if (audit_arge) udata.au_policy64 |= AUDIT_ARGE; break; } if (uap->length != sizeof(udata.au_policy)) return (EINVAL); if (!audit_fail_stop) udata.au_policy |= AUDIT_CNT; if (audit_panic_on_write_fail) udata.au_policy |= AUDIT_AHLT; if (audit_argv) udata.au_policy |= AUDIT_ARGV; if (audit_arge) udata.au_policy |= AUDIT_ARGE; break; case A_OLDSETPOLICY: case A_SETPOLICY: if (uap->length == sizeof(udata.au_policy64)) { if (udata.au_policy & (~AUDIT_CNT|AUDIT_AHLT| AUDIT_ARGV|AUDIT_ARGE)) return (EINVAL); audit_fail_stop = ((udata.au_policy64 & AUDIT_CNT) == 0); audit_panic_on_write_fail = (udata.au_policy64 & AUDIT_AHLT); audit_argv = (udata.au_policy64 & AUDIT_ARGV); audit_arge = (udata.au_policy64 & AUDIT_ARGE); break; } if (uap->length != sizeof(udata.au_policy)) return (EINVAL); if (udata.au_policy & ~(AUDIT_CNT|AUDIT_AHLT|AUDIT_ARGV| AUDIT_ARGE)) return (EINVAL); /* * XXX - Need to wake up waiters if the policy relaxes? 
 */
		audit_fail_stop = ((udata.au_policy & AUDIT_CNT) == 0);
		audit_panic_on_write_fail = (udata.au_policy & AUDIT_AHLT);
		audit_argv = (udata.au_policy & AUDIT_ARGV);
		audit_arge = (udata.au_policy & AUDIT_ARGE);
		break;

	case A_GETKMASK:
		if (uap->length != sizeof(udata.au_mask))
			return (EINVAL);
		udata.au_mask = audit_nae_mask;
		break;

	case A_SETKMASK:
		if (uap->length != sizeof(udata.au_mask))
			return (EINVAL);
		audit_nae_mask = udata.au_mask;
		break;

	case A_OLDGETQCTRL:
	case A_GETQCTRL:
		if (uap->length == sizeof(udata.au_qctrl64)) {
			udata.au_qctrl64.aq64_hiwater =
			    (u_int64_t)audit_qctrl.aq_hiwater;
			udata.au_qctrl64.aq64_lowater =
			    (u_int64_t)audit_qctrl.aq_lowater;
			udata.au_qctrl64.aq64_bufsz =
			    (u_int64_t)audit_qctrl.aq_bufsz;
			udata.au_qctrl64.aq64_minfree =
			    (u_int64_t)audit_qctrl.aq_minfree;
			break;
		}
		if (uap->length != sizeof(udata.au_qctrl))
			return (EINVAL);
		udata.au_qctrl = audit_qctrl;
		break;

	case A_OLDSETQCTRL:
	case A_SETQCTRL:
		if (uap->length == sizeof(udata.au_qctrl64)) {
+			/* NB: aq64_minfree is unsigned unlike aq_minfree. */
			if ((udata.au_qctrl64.aq64_hiwater > AQ_MAXHIGH) ||
			    (udata.au_qctrl64.aq64_lowater >=
			    udata.au_qctrl64.aq64_hiwater) ||
			    (udata.au_qctrl64.aq64_bufsz > AQ_MAXBUFSZ) ||
-			    (udata.au_qctrl64.aq64_minfree > 100) ||
-			    (udata.au_qctrl64.aq64_minfree < 0))
+			    (udata.au_qctrl64.aq64_minfree > 100))
				return (EINVAL);
			audit_qctrl.aq_hiwater =
			    (int)udata.au_qctrl64.aq64_hiwater;
			audit_qctrl.aq_lowater =
			    (int)udata.au_qctrl64.aq64_lowater;
			audit_qctrl.aq_bufsz =
			    (int)udata.au_qctrl64.aq64_bufsz;
			audit_qctrl.aq_minfree =
			    (int)udata.au_qctrl64.aq64_minfree;
			audit_qctrl.aq_delay = -1;	/* Not used. */
			break;
		}
		if (uap->length != sizeof(udata.au_qctrl))
			return (EINVAL);
		if ((udata.au_qctrl.aq_hiwater > AQ_MAXHIGH) ||
		    (udata.au_qctrl.aq_lowater >= udata.au_qctrl.aq_hiwater) ||
		    (udata.au_qctrl.aq_bufsz > AQ_MAXBUFSZ) ||
		    (udata.au_qctrl.aq_minfree < 0) ||
		    (udata.au_qctrl.aq_minfree > 100))
			return (EINVAL);
		audit_qctrl = udata.au_qctrl;
		/*
		 * XXX The queue delay value isn't used with the kernel.
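		 *
		 * A userland sketch of driving this command (editor's
		 * illustration with hypothetical values; as noted above,
		 * aq_delay is ignored):
		 *
		 *	struct au_qctrl q;
		 *
		 *	q.aq_hiwater = 100;
		 *	q.aq_lowater = 10;
		 *	q.aq_bufsz = 32768;
		 *	q.aq_minfree = 20;
		 *	q.aq_delay = -1;
		 *	if (auditon(A_SETQCTRL, &q, sizeof(q)) < 0)
		 *		err(1, "auditon");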
*/ audit_qctrl.aq_delay = -1; break; case A_GETCWD: return (ENOSYS); break; case A_GETCAR: return (ENOSYS); break; case A_GETSTAT: return (ENOSYS); break; case A_SETSTAT: return (ENOSYS); break; case A_SETUMASK: return (ENOSYS); break; case A_SETSMASK: return (ENOSYS); break; case A_OLDGETCOND: case A_GETCOND: if (uap->length == sizeof(udata.au_cond64)) { if (audit_enabled && !audit_suspended) udata.au_cond64 = AUC_AUDITING; else udata.au_cond64 = AUC_NOAUDIT; break; } if (uap->length != sizeof(udata.au_cond)) return (EINVAL); if (audit_enabled && !audit_suspended) udata.au_cond = AUC_AUDITING; else udata.au_cond = AUC_NOAUDIT; break; case A_OLDSETCOND: case A_SETCOND: if (uap->length == sizeof(udata.au_cond64)) { if (udata.au_cond64 == AUC_NOAUDIT) audit_suspended = 1; if (udata.au_cond64 == AUC_AUDITING) audit_suspended = 0; if (udata.au_cond64 == AUC_DISABLED) { audit_suspended = 1; audit_shutdown(NULL, 0); } break; } if (uap->length != sizeof(udata.au_cond)) return (EINVAL); if (udata.au_cond == AUC_NOAUDIT) audit_suspended = 1; if (udata.au_cond == AUC_AUDITING) audit_suspended = 0; if (udata.au_cond == AUC_DISABLED) { audit_suspended = 1; audit_shutdown(NULL, 0); } break; case A_GETCLASS: if (uap->length != sizeof(udata.au_evclass)) return (EINVAL); udata.au_evclass.ec_class = au_event_class( udata.au_evclass.ec_number); break; case A_SETCLASS: if (uap->length != sizeof(udata.au_evclass)) return (EINVAL); au_evclassmap_insert(udata.au_evclass.ec_number, udata.au_evclass.ec_class); break; case A_GETPINFO: if (uap->length != sizeof(udata.au_aupinfo)) return (EINVAL); if (udata.au_aupinfo.ap_pid < 1) return (ESRCH); if ((tp = pfind(udata.au_aupinfo.ap_pid)) == NULL) return (ESRCH); if ((error = p_cansee(td, tp)) != 0) { PROC_UNLOCK(tp); return (error); } cred = tp->p_ucred; if (cred->cr_audit.ai_termid.at_type == AU_IPv6) { PROC_UNLOCK(tp); return (EINVAL); } udata.au_aupinfo.ap_auid = cred->cr_audit.ai_auid; udata.au_aupinfo.ap_mask.am_success = cred->cr_audit.ai_mask.am_success; udata.au_aupinfo.ap_mask.am_failure = cred->cr_audit.ai_mask.am_failure; udata.au_aupinfo.ap_termid.machine = cred->cr_audit.ai_termid.at_addr[0]; udata.au_aupinfo.ap_termid.port = (dev_t)cred->cr_audit.ai_termid.at_port; udata.au_aupinfo.ap_asid = cred->cr_audit.ai_asid; PROC_UNLOCK(tp); break; case A_SETPMASK: if (uap->length != sizeof(udata.au_aupinfo)) return (EINVAL); if (udata.au_aupinfo.ap_pid < 1) return (ESRCH); newcred = crget(); if ((tp = pfind(udata.au_aupinfo.ap_pid)) == NULL) { crfree(newcred); return (ESRCH); } if ((error = p_cansee(td, tp)) != 0) { PROC_UNLOCK(tp); crfree(newcred); return (error); } oldcred = tp->p_ucred; crcopy(newcred, oldcred); newcred->cr_audit.ai_mask.am_success = udata.au_aupinfo.ap_mask.am_success; newcred->cr_audit.ai_mask.am_failure = udata.au_aupinfo.ap_mask.am_failure; proc_set_cred(tp, newcred); PROC_UNLOCK(tp); crfree(oldcred); break; case A_SETFSIZE: if (uap->length != sizeof(udata.au_fstat)) return (EINVAL); if ((udata.au_fstat.af_filesz != 0) && (udata.au_fstat.af_filesz < MIN_AUDIT_FILE_SIZE)) return (EINVAL); audit_fstat.af_filesz = udata.au_fstat.af_filesz; break; case A_GETFSIZE: if (uap->length != sizeof(udata.au_fstat)) return (EINVAL); udata.au_fstat.af_filesz = audit_fstat.af_filesz; udata.au_fstat.af_currsz = audit_fstat.af_currsz; break; case A_GETPINFO_ADDR: if (uap->length != sizeof(udata.au_aupinfo_addr)) return (EINVAL); if (udata.au_aupinfo_addr.ap_pid < 1) return (ESRCH); if ((tp = pfind(udata.au_aupinfo_addr.ap_pid)) == NULL) return (ESRCH); 
cred = tp->p_ucred; udata.au_aupinfo_addr.ap_auid = cred->cr_audit.ai_auid; udata.au_aupinfo_addr.ap_mask.am_success = cred->cr_audit.ai_mask.am_success; udata.au_aupinfo_addr.ap_mask.am_failure = cred->cr_audit.ai_mask.am_failure; udata.au_aupinfo_addr.ap_termid = cred->cr_audit.ai_termid; udata.au_aupinfo_addr.ap_asid = cred->cr_audit.ai_asid; PROC_UNLOCK(tp); break; case A_GETKAUDIT: if (uap->length != sizeof(udata.au_kau_info)) return (EINVAL); audit_get_kinfo(&udata.au_kau_info); break; case A_SETKAUDIT: if (uap->length != sizeof(udata.au_kau_info)) return (EINVAL); if (udata.au_kau_info.ai_termid.at_type != AU_IPv4 && udata.au_kau_info.ai_termid.at_type != AU_IPv6) return (EINVAL); audit_set_kinfo(&udata.au_kau_info); break; case A_SENDTRIGGER: if (uap->length != sizeof(udata.au_trigger)) return (EINVAL); if ((udata.au_trigger < AUDIT_TRIGGER_MIN) || (udata.au_trigger > AUDIT_TRIGGER_MAX)) return (EINVAL); return (audit_send_trigger(udata.au_trigger)); default: return (EINVAL); } /* * Copy data back to userspace for the GET commands. */ switch (uap->cmd) { case A_GETPOLICY: case A_OLDGETPOLICY: case A_GETKMASK: case A_GETQCTRL: case A_OLDGETQCTRL: case A_GETCWD: case A_GETCAR: case A_GETSTAT: case A_GETCOND: case A_OLDGETCOND: case A_GETCLASS: case A_GETPINFO: case A_GETFSIZE: case A_GETPINFO_ADDR: case A_GETKAUDIT: error = copyout((void *)&udata, uap->data, uap->length); if (error) return (error); break; } return (0); } /* * System calls to manage the user audit information. */ /* ARGSUSED */ int sys_getauid(struct thread *td, struct getauid_args *uap) { int error; if (jailed(td->td_ucred)) return (ENOSYS); error = priv_check(td, PRIV_AUDIT_GETAUDIT); if (error) return (error); return (copyout(&td->td_ucred->cr_audit.ai_auid, uap->auid, sizeof(td->td_ucred->cr_audit.ai_auid))); } /* ARGSUSED */ int sys_setauid(struct thread *td, struct setauid_args *uap) { struct ucred *newcred, *oldcred; au_id_t id; int error; if (jailed(td->td_ucred)) return (ENOSYS); error = copyin(uap->auid, &id, sizeof(id)); if (error) return (error); audit_arg_auid(id); newcred = crget(); PROC_LOCK(td->td_proc); oldcred = td->td_proc->p_ucred; crcopy(newcred, oldcred); #ifdef MAC error = mac_cred_check_setauid(oldcred, id); if (error) goto fail; #endif error = priv_check_cred(oldcred, PRIV_AUDIT_SETAUDIT, 0); if (error) goto fail; newcred->cr_audit.ai_auid = id; proc_set_cred(td->td_proc, newcred); PROC_UNLOCK(td->td_proc); crfree(oldcred); return (0); fail: PROC_UNLOCK(td->td_proc); crfree(newcred); return (error); } /* * System calls to get and set process audit information.
*/ /* ARGSUSED */ int sys_getaudit(struct thread *td, struct getaudit_args *uap) { struct auditinfo ai; struct ucred *cred; int error; cred = td->td_ucred; if (jailed(cred)) return (ENOSYS); error = priv_check(td, PRIV_AUDIT_GETAUDIT); if (error) return (error); if (cred->cr_audit.ai_termid.at_type == AU_IPv6) return (E2BIG); bzero(&ai, sizeof(ai)); ai.ai_auid = cred->cr_audit.ai_auid; ai.ai_mask = cred->cr_audit.ai_mask; ai.ai_asid = cred->cr_audit.ai_asid; ai.ai_termid.machine = cred->cr_audit.ai_termid.at_addr[0]; ai.ai_termid.port = cred->cr_audit.ai_termid.at_port; return (copyout(&ai, uap->auditinfo, sizeof(ai))); } /* ARGSUSED */ int sys_setaudit(struct thread *td, struct setaudit_args *uap) { struct ucred *newcred, *oldcred; struct auditinfo ai; int error; if (jailed(td->td_ucred)) return (ENOSYS); error = copyin(uap->auditinfo, &ai, sizeof(ai)); if (error) return (error); audit_arg_auditinfo(&ai); newcred = crget(); PROC_LOCK(td->td_proc); oldcred = td->td_proc->p_ucred; crcopy(newcred, oldcred); #ifdef MAC error = mac_cred_check_setaudit(oldcred, &ai); if (error) goto fail; #endif error = priv_check_cred(oldcred, PRIV_AUDIT_SETAUDIT, 0); if (error) goto fail; bzero(&newcred->cr_audit, sizeof(newcred->cr_audit)); newcred->cr_audit.ai_auid = ai.ai_auid; newcred->cr_audit.ai_mask = ai.ai_mask; newcred->cr_audit.ai_asid = ai.ai_asid; newcred->cr_audit.ai_termid.at_addr[0] = ai.ai_termid.machine; newcred->cr_audit.ai_termid.at_port = ai.ai_termid.port; newcred->cr_audit.ai_termid.at_type = AU_IPv4; proc_set_cred(td->td_proc, newcred); PROC_UNLOCK(td->td_proc); crfree(oldcred); return (0); fail: PROC_UNLOCK(td->td_proc); crfree(newcred); return (error); } /* ARGSUSED */ int sys_getaudit_addr(struct thread *td, struct getaudit_addr_args *uap) { int error; if (jailed(td->td_ucred)) return (ENOSYS); if (uap->length < sizeof(*uap->auditinfo_addr)) return (EOVERFLOW); error = priv_check(td, PRIV_AUDIT_GETAUDIT); if (error) return (error); return (copyout(&td->td_ucred->cr_audit, uap->auditinfo_addr, sizeof(*uap->auditinfo_addr))); } /* ARGSUSED */ int sys_setaudit_addr(struct thread *td, struct setaudit_addr_args *uap) { struct ucred *newcred, *oldcred; struct auditinfo_addr aia; int error; if (jailed(td->td_ucred)) return (ENOSYS); error = copyin(uap->auditinfo_addr, &aia, sizeof(aia)); if (error) return (error); audit_arg_auditinfo_addr(&aia); if (aia.ai_termid.at_type != AU_IPv6 && aia.ai_termid.at_type != AU_IPv4) return (EINVAL); newcred = crget(); PROC_LOCK(td->td_proc); oldcred = td->td_proc->p_ucred; crcopy(newcred, oldcred); #ifdef MAC error = mac_cred_check_setaudit_addr(oldcred, &aia); if (error) goto fail; #endif error = priv_check_cred(oldcred, PRIV_AUDIT_SETAUDIT, 0); if (error) goto fail; newcred->cr_audit = aia; proc_set_cred(td->td_proc, newcred); PROC_UNLOCK(td->td_proc); crfree(oldcred); return (0); fail: PROC_UNLOCK(td->td_proc); crfree(newcred); return (error); } /* * Syscall to manage audit files. */ /* ARGSUSED */ int sys_auditctl(struct thread *td, struct auditctl_args *uap) { struct nameidata nd; struct ucred *cred; struct vnode *vp; int error = 0; int flags; if (jailed(td->td_ucred)) return (ENOSYS); error = priv_check(td, PRIV_AUDIT_CONTROL); if (error) return (error); vp = NULL; cred = NULL; /* * If a path is specified, open the replacement vnode, perform * validity checks, and grab another reference to the current * credential. * * On Darwin, a NULL path argument is also used to disable audit. 
*/ if (uap->path == NULL) return (EINVAL); NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | AUDITVNODE1, UIO_USERSPACE, uap->path, td); flags = AUDIT_OPEN_FLAGS; error = vn_open(&nd, &flags, 0, NULL); if (error) return (error); vp = nd.ni_vp; #ifdef MAC error = mac_system_check_auditctl(td->td_ucred, vp); VOP_UNLOCK(vp, 0); if (error) { vn_close(vp, AUDIT_CLOSE_FLAGS, td->td_ucred, td); return (error); } #else VOP_UNLOCK(vp, 0); #endif NDFREE(&nd, NDF_ONLY_PNBUF); if (vp->v_type != VREG) { vn_close(vp, AUDIT_CLOSE_FLAGS, td->td_ucred, td); return (EINVAL); } cred = td->td_ucred; crhold(cred); /* * XXXAUDIT: Should audit_suspended actually be cleared by * audit_worker? */ audit_suspended = 0; audit_rotate_vnode(cred, vp); return (error); } #else /* !AUDIT */ int sys_audit(struct thread *td, struct audit_args *uap) { return (ENOSYS); } int sys_auditon(struct thread *td, struct auditon_args *uap) { return (ENOSYS); } int sys_getauid(struct thread *td, struct getauid_args *uap) { return (ENOSYS); } int sys_setauid(struct thread *td, struct setauid_args *uap) { return (ENOSYS); } int sys_getaudit(struct thread *td, struct getaudit_args *uap) { return (ENOSYS); } int sys_setaudit(struct thread *td, struct setaudit_args *uap) { return (ENOSYS); } int sys_getaudit_addr(struct thread *td, struct getaudit_addr_args *uap) { return (ENOSYS); } int sys_setaudit_addr(struct thread *td, struct setaudit_addr_args *uap) { return (ENOSYS); } int sys_auditctl(struct thread *td, struct auditctl_args *uap) { return (ENOSYS); } #endif /* AUDIT */ Index: projects/clang390-import/sys/sparc64/sparc64/pmap.c =================================================================== --- projects/clang390-import/sys/sparc64/sparc64/pmap.c (revision 305686) +++ projects/clang390-import/sys/sparc64/sparc64/pmap.c (revision 305687) @@ -1,2334 +1,2328 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * Copyright (c) 1994 John S. Dyson * All rights reserved. * Copyright (c) 1994 David Greenman * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is also stored by the * logical address mapping module, this module may throw away valid virtual * to physical mappings at almost any time. However, invalidations of * mappings must be done as requested. * * In order to cope with hardware architectures which make virtual to * physical map invalidates expensive, this module may delay invalidate * reduced protection operations until such time as they are actually * necessary. This module is given full information as to which processors * are currently using which maps, and to when physical maps must be made * correct. */ #include "opt_kstack_pages.h" #include "opt_pmap.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * Virtual address of message buffer */ struct msgbuf *msgbufp; /* * Map of physical memory regions */ vm_paddr_t phys_avail[128]; static struct ofw_mem_region mra[128]; struct ofw_mem_region sparc64_memreg[128]; int sparc64_nmemreg; static struct ofw_map translations[128]; static int translations_size; static vm_offset_t pmap_idle_map; static vm_offset_t pmap_temp_map_1; static vm_offset_t pmap_temp_map_2; /* * First and last available kernel virtual addresses */ vm_offset_t virtual_avail; vm_offset_t virtual_end; vm_offset_t kernel_vm_end; vm_offset_t vm_max_kernel_address; /* * Kernel pmap */ struct pmap kernel_pmap_store; struct rwlock_padalign tte_list_global_lock; /* * Allocate physical memory for use in pmap_bootstrap. */ static vm_paddr_t pmap_bootstrap_alloc(vm_size_t size, uint32_t colors); static void pmap_bootstrap_set_tte(struct tte *tp, u_long vpn, u_long data); static void pmap_cache_remove(vm_page_t m, vm_offset_t va); static int pmap_protect_tte(struct pmap *pm1, struct pmap *pm2, struct tte *tp, vm_offset_t va); static int pmap_unwire_tte(pmap_t pm, pmap_t pm2, struct tte *tp, vm_offset_t va); static void pmap_init_qpages(void); /* * Map the given physical page at the specified virtual address in the * target pmap with the protection requested. If specified the page * will be wired down. * * The page queues and pmap must be locked.
*/ static int pmap_enter_locked(pmap_t pm, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind); extern int tl1_dmmu_miss_direct_patch_tsb_phys_1[]; extern int tl1_dmmu_miss_direct_patch_tsb_phys_end_1[]; extern int tl1_dmmu_miss_patch_asi_1[]; extern int tl1_dmmu_miss_patch_quad_ldd_1[]; extern int tl1_dmmu_miss_patch_tsb_1[]; extern int tl1_dmmu_miss_patch_tsb_2[]; extern int tl1_dmmu_miss_patch_tsb_mask_1[]; extern int tl1_dmmu_miss_patch_tsb_mask_2[]; extern int tl1_dmmu_prot_patch_asi_1[]; extern int tl1_dmmu_prot_patch_quad_ldd_1[]; extern int tl1_dmmu_prot_patch_tsb_1[]; extern int tl1_dmmu_prot_patch_tsb_2[]; extern int tl1_dmmu_prot_patch_tsb_mask_1[]; extern int tl1_dmmu_prot_patch_tsb_mask_2[]; extern int tl1_immu_miss_patch_asi_1[]; extern int tl1_immu_miss_patch_quad_ldd_1[]; extern int tl1_immu_miss_patch_tsb_1[]; extern int tl1_immu_miss_patch_tsb_2[]; extern int tl1_immu_miss_patch_tsb_mask_1[]; extern int tl1_immu_miss_patch_tsb_mask_2[]; /* * If a user pmap is processed with pmap_remove and the resident count * drops to 0, there are no more pages to remove, so we need not * continue. */ #define PMAP_REMOVE_DONE(pm) \ ((pm) != kernel_pmap && (pm)->pm_stats.resident_count == 0) /* * The threshold (in bytes) above which tsb_foreach() is used in pmap_remove() * and pmap_protect() instead of trying each virtual address. */ #define PMAP_TSB_THRESH ((TSB_SIZE / 2) * PAGE_SIZE) SYSCTL_NODE(_debug, OID_AUTO, pmap_stats, CTLFLAG_RD, 0, ""); PMAP_STATS_VAR(pmap_nenter); PMAP_STATS_VAR(pmap_nenter_update); PMAP_STATS_VAR(pmap_nenter_replace); PMAP_STATS_VAR(pmap_nenter_new); PMAP_STATS_VAR(pmap_nkenter); PMAP_STATS_VAR(pmap_nkenter_oc); PMAP_STATS_VAR(pmap_nkenter_stupid); PMAP_STATS_VAR(pmap_nkremove); PMAP_STATS_VAR(pmap_nqenter); PMAP_STATS_VAR(pmap_nqremove); PMAP_STATS_VAR(pmap_ncache_enter); PMAP_STATS_VAR(pmap_ncache_enter_c); PMAP_STATS_VAR(pmap_ncache_enter_oc); PMAP_STATS_VAR(pmap_ncache_enter_cc); PMAP_STATS_VAR(pmap_ncache_enter_coc); PMAP_STATS_VAR(pmap_ncache_enter_nc); PMAP_STATS_VAR(pmap_ncache_enter_cnc); PMAP_STATS_VAR(pmap_ncache_remove); PMAP_STATS_VAR(pmap_ncache_remove_c); PMAP_STATS_VAR(pmap_ncache_remove_oc); PMAP_STATS_VAR(pmap_ncache_remove_cc); PMAP_STATS_VAR(pmap_ncache_remove_coc); PMAP_STATS_VAR(pmap_ncache_remove_nc); PMAP_STATS_VAR(pmap_nzero_page); PMAP_STATS_VAR(pmap_nzero_page_c); PMAP_STATS_VAR(pmap_nzero_page_oc); PMAP_STATS_VAR(pmap_nzero_page_nc); PMAP_STATS_VAR(pmap_nzero_page_area); PMAP_STATS_VAR(pmap_nzero_page_area_c); PMAP_STATS_VAR(pmap_nzero_page_area_oc); PMAP_STATS_VAR(pmap_nzero_page_area_nc); PMAP_STATS_VAR(pmap_ncopy_page); PMAP_STATS_VAR(pmap_ncopy_page_c); PMAP_STATS_VAR(pmap_ncopy_page_oc); PMAP_STATS_VAR(pmap_ncopy_page_nc); PMAP_STATS_VAR(pmap_ncopy_page_dc); PMAP_STATS_VAR(pmap_ncopy_page_doc); PMAP_STATS_VAR(pmap_ncopy_page_sc); PMAP_STATS_VAR(pmap_ncopy_page_soc); PMAP_STATS_VAR(pmap_nnew_thread); PMAP_STATS_VAR(pmap_nnew_thread_oc); static inline u_long dtlb_get_data(u_int tlb, u_int slot); /* * Quick sort callouts for comparing memory regions */ static int mr_cmp(const void *a, const void *b); static int om_cmp(const void *a, const void *b); static int mr_cmp(const void *a, const void *b) { const struct ofw_mem_region *mra; const struct ofw_mem_region *mrb; mra = a; mrb = b; if (mra->mr_start < mrb->mr_start) return (-1); else if (mra->mr_start > mrb->mr_start) return (1); else return (0); } static int om_cmp(const void *a, const void *b) { const struct ofw_map *oma; const struct
ofw_map *omb; oma = a; omb = b; if (oma->om_start < omb->om_start) return (-1); else if (oma->om_start > omb->om_start) return (1); else return (0); } static inline u_long dtlb_get_data(u_int tlb, u_int slot) { u_long data; register_t s; slot = TLB_DAR_SLOT(tlb, slot); /* * We read ASI_DTLB_DATA_ACCESS_REG twice back-to-back in order to * work around errata of USIII and beyond. */ s = intr_disable(); (void)ldxa(slot, ASI_DTLB_DATA_ACCESS_REG); data = ldxa(slot, ASI_DTLB_DATA_ACCESS_REG); intr_restore(s); return (data); } /* * Bootstrap the system enough to run with virtual memory. */ void pmap_bootstrap(u_int cpu_impl) { struct pmap *pm; struct tte *tp; vm_offset_t off; vm_offset_t va; vm_paddr_t pa; vm_size_t physsz; vm_size_t virtsz; u_long data; u_long vpn; phandle_t pmem; phandle_t vmem; u_int dtlb_slots_avail; int i; int j; int sz; uint32_t asi; uint32_t colors; uint32_t ldd; /* * Set the kernel context. */ pmap_set_kctx(); colors = dcache_color_ignore != 0 ? 1 : DCACHE_COLORS; /* * Find out what physical memory is available from the PROM and * initialize the phys_avail array. This must be done before * pmap_bootstrap_alloc is called. */ if ((pmem = OF_finddevice("/memory")) == -1) OF_panic("%s: finddevice /memory", __func__); if ((sz = OF_getproplen(pmem, "available")) == -1) OF_panic("%s: getproplen /memory/available", __func__); if (sizeof(phys_avail) < sz) OF_panic("%s: phys_avail too small", __func__); if (sizeof(mra) < sz) OF_panic("%s: mra too small", __func__); bzero(mra, sz); if (OF_getprop(pmem, "available", mra, sz) == -1) OF_panic("%s: getprop /memory/available", __func__); sz /= sizeof(*mra); #ifdef DIAGNOSTIC OF_printf("pmap_bootstrap: physical memory\n"); #endif qsort(mra, sz, sizeof (*mra), mr_cmp); physsz = 0; getenv_quad("hw.physmem", &physmem); physmem = btoc(physmem); for (i = 0, j = 0; i < sz; i++, j += 2) { #ifdef DIAGNOSTIC OF_printf("start=%#lx size=%#lx\n", mra[i].mr_start, mra[i].mr_size); #endif if (physmem != 0 && btoc(physsz + mra[i].mr_size) >= physmem) { if (btoc(physsz) < physmem) { phys_avail[j] = mra[i].mr_start; phys_avail[j + 1] = mra[i].mr_start + (ctob(physmem) - physsz); physsz = ctob(physmem); } break; } phys_avail[j] = mra[i].mr_start; phys_avail[j + 1] = mra[i].mr_start + mra[i].mr_size; physsz += mra[i].mr_size; } physmem = btoc(physsz); /* * Calculate the size of kernel virtual memory, and the size and mask * for the kernel TSB based on the physical memory size but limited * by the number of dTLB slots available for locked entries if we have * to lock the TSB in the TLB (given that for spitfire-class CPUs all * of the dt64 slots can hold locked entries but there is no large * dTLB for unlocked ones, we don't use more than half of it for the * TSB). * Note that for reasons unknown OpenSolaris doesn't take advantage of * ASI_ATOMIC_QUAD_LDD_PHYS on UltraSPARC-III. However, given that no * public documentation is available for these, the latter just might * not support it, yet. */ if (cpu_impl == CPU_IMPL_SPARC64V || cpu_impl >= CPU_IMPL_ULTRASPARCIIIp) { tsb_kernel_ldd_phys = 1; virtsz = roundup(physsz * 5 / 3, PAGE_SIZE_4M << (PAGE_SHIFT - TTE_SHIFT)); } else { dtlb_slots_avail = 0; for (i = 0; i < dtlb_slots; i++) { data = dtlb_get_data(cpu_impl == CPU_IMPL_ULTRASPARCIII ?
TLB_DAR_T16 : TLB_DAR_T32, i); if ((data & (TD_V | TD_L)) != (TD_V | TD_L)) dtlb_slots_avail++; } #ifdef SMP dtlb_slots_avail -= PCPU_PAGES; #endif if (cpu_impl >= CPU_IMPL_ULTRASPARCI && cpu_impl < CPU_IMPL_ULTRASPARCIII) dtlb_slots_avail /= 2; virtsz = roundup(physsz, PAGE_SIZE_4M << (PAGE_SHIFT - TTE_SHIFT)); virtsz = MIN(virtsz, (dtlb_slots_avail * PAGE_SIZE_4M) << (PAGE_SHIFT - TTE_SHIFT)); } vm_max_kernel_address = VM_MIN_KERNEL_ADDRESS + virtsz; tsb_kernel_size = virtsz >> (PAGE_SHIFT - TTE_SHIFT); tsb_kernel_mask = (tsb_kernel_size >> TTE_SHIFT) - 1; /* * Allocate the kernel TSB and lock it in the TLB if necessary. */ pa = pmap_bootstrap_alloc(tsb_kernel_size, colors); if (pa & PAGE_MASK_4M) OF_panic("%s: TSB unaligned", __func__); tsb_kernel_phys = pa; if (tsb_kernel_ldd_phys == 0) { tsb_kernel = (struct tte *)(VM_MIN_KERNEL_ADDRESS - tsb_kernel_size); pmap_map_tsb(); bzero(tsb_kernel, tsb_kernel_size); } else { tsb_kernel = (struct tte *)TLB_PHYS_TO_DIRECT(tsb_kernel_phys); aszero(ASI_PHYS_USE_EC, tsb_kernel_phys, tsb_kernel_size); } /* * Allocate and map the dynamic per-CPU area for the BSP. */ pa = pmap_bootstrap_alloc(DPCPU_SIZE, colors); dpcpu0 = (void *)TLB_PHYS_TO_DIRECT(pa); /* * Allocate and map the message buffer. */ pa = pmap_bootstrap_alloc(msgbufsize, colors); msgbufp = (struct msgbuf *)TLB_PHYS_TO_DIRECT(pa); /* * Patch the TSB addresses and mask as well as the ASIs used to load * it into the trap table. */ #define LDDA_R_I_R(rd, imm_asi, rs1, rs2) \ (EIF_OP(IOP_LDST) | EIF_F3_RD(rd) | EIF_F3_OP3(INS3_LDDA) | \ EIF_F3_RS1(rs1) | EIF_F3_I(0) | EIF_F3_IMM_ASI(imm_asi) | \ EIF_F3_RS2(rs2)) #define OR_R_I_R(rd, imm13, rs1) \ (EIF_OP(IOP_MISC) | EIF_F3_RD(rd) | EIF_F3_OP3(INS2_OR) | \ EIF_F3_RS1(rs1) | EIF_F3_I(1) | EIF_IMM(imm13, 13)) #define SETHI(rd, imm22) \ (EIF_OP(IOP_FORM2) | EIF_F2_RD(rd) | EIF_F2_OP2(INS0_SETHI) | \ EIF_IMM((imm22) >> 10, 22)) #define WR_R_I(rd, imm13, rs1) \ (EIF_OP(IOP_MISC) | EIF_F3_RD(rd) | EIF_F3_OP3(INS2_WR) | \ EIF_F3_RS1(rs1) | EIF_F3_I(1) | EIF_IMM(imm13, 13)) #define PATCH_ASI(addr, asi) do { \ if (addr[0] != WR_R_I(IF_F3_RD(addr[0]), 0x0, \ IF_F3_RS1(addr[0]))) \ OF_panic("%s: patched instructions have changed", \ __func__); \ addr[0] |= EIF_IMM((asi), 13); \ flush(addr); \ } while (0) #define PATCH_LDD(addr, asi) do { \ if (addr[0] != LDDA_R_I_R(IF_F3_RD(addr[0]), 0x0, \ IF_F3_RS1(addr[0]), IF_F3_RS2(addr[0]))) \ OF_panic("%s: patched instructions have changed", \ __func__); \ addr[0] |= EIF_F3_IMM_ASI(asi); \ flush(addr); \ } while (0) #define PATCH_TSB(addr, val) do { \ if (addr[0] != SETHI(IF_F2_RD(addr[0]), 0x0) || \ addr[1] != OR_R_I_R(IF_F3_RD(addr[1]), 0x0, \ IF_F3_RS1(addr[1])) || \ addr[3] != SETHI(IF_F2_RD(addr[3]), 0x0)) \ OF_panic("%s: patched instructions have changed", \ __func__); \ addr[0] |= EIF_IMM((val) >> 42, 22); \ addr[1] |= EIF_IMM((val) >> 32, 10); \ addr[3] |= EIF_IMM((val) >> 10, 22); \ flush(addr); \ flush(addr + 1); \ flush(addr + 3); \ } while (0) #define PATCH_TSB_MASK(addr, val) do { \ if (addr[0] != SETHI(IF_F2_RD(addr[0]), 0x0) || \ addr[1] != OR_R_I_R(IF_F3_RD(addr[1]), 0x0, \ IF_F3_RS1(addr[1]))) \ OF_panic("%s: patched instructions have changed", \ __func__); \ addr[0] |= EIF_IMM((val) >> 10, 22); \ addr[1] |= EIF_IMM((val), 10); \ flush(addr); \ flush(addr + 1); \ } while (0) if (tsb_kernel_ldd_phys == 0) { asi = ASI_N; ldd = ASI_NUCLEUS_QUAD_LDD; off = (vm_offset_t)tsb_kernel; } else { asi = ASI_PHYS_USE_EC; ldd = ASI_ATOMIC_QUAD_LDD_PHYS; off = (vm_offset_t)tsb_kernel_phys; } 
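/* * Patch the instruction and data MMU miss and protection handlers in * the trap table with the ASI, quad-load variant and TSB address/mask * selected above; each PATCH_* macro flush()es the instruction it * rewrites so the I-cache stays coherent with the modified text. */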
PATCH_TSB(tl1_dmmu_miss_direct_patch_tsb_phys_1, tsb_kernel_phys); PATCH_TSB(tl1_dmmu_miss_direct_patch_tsb_phys_end_1, tsb_kernel_phys + tsb_kernel_size - 1); PATCH_ASI(tl1_dmmu_miss_patch_asi_1, asi); PATCH_LDD(tl1_dmmu_miss_patch_quad_ldd_1, ldd); PATCH_TSB(tl1_dmmu_miss_patch_tsb_1, off); PATCH_TSB(tl1_dmmu_miss_patch_tsb_2, off); PATCH_TSB_MASK(tl1_dmmu_miss_patch_tsb_mask_1, tsb_kernel_mask); PATCH_TSB_MASK(tl1_dmmu_miss_patch_tsb_mask_2, tsb_kernel_mask); PATCH_ASI(tl1_dmmu_prot_patch_asi_1, asi); PATCH_LDD(tl1_dmmu_prot_patch_quad_ldd_1, ldd); PATCH_TSB(tl1_dmmu_prot_patch_tsb_1, off); PATCH_TSB(tl1_dmmu_prot_patch_tsb_2, off); PATCH_TSB_MASK(tl1_dmmu_prot_patch_tsb_mask_1, tsb_kernel_mask); PATCH_TSB_MASK(tl1_dmmu_prot_patch_tsb_mask_2, tsb_kernel_mask); PATCH_ASI(tl1_immu_miss_patch_asi_1, asi); PATCH_LDD(tl1_immu_miss_patch_quad_ldd_1, ldd); PATCH_TSB(tl1_immu_miss_patch_tsb_1, off); PATCH_TSB(tl1_immu_miss_patch_tsb_2, off); PATCH_TSB_MASK(tl1_immu_miss_patch_tsb_mask_1, tsb_kernel_mask); PATCH_TSB_MASK(tl1_immu_miss_patch_tsb_mask_2, tsb_kernel_mask); /* * Enter fake 8k pages for the 4MB kernel pages, so that * pmap_kextract() will work for them. */ for (i = 0; i < kernel_tlb_slots; i++) { pa = kernel_tlbs[i].te_pa; va = kernel_tlbs[i].te_va; for (off = 0; off < PAGE_SIZE_4M; off += PAGE_SIZE) { tp = tsb_kvtotte(va + off); vpn = TV_VPN(va + off, TS_8K); data = TD_V | TD_8K | TD_PA(pa + off) | TD_REF | TD_SW | TD_CP | TD_CV | TD_P | TD_W; pmap_bootstrap_set_tte(tp, vpn, data); } } /* * Set the start and end of KVA. The kernel is loaded starting * at the first available 4MB super page, so we advance to the * end of the last one used for it. */ virtual_avail = KERNBASE + kernel_tlb_slots * PAGE_SIZE_4M; virtual_end = vm_max_kernel_address; kernel_vm_end = vm_max_kernel_address; /* * Allocate kva space for temporary mappings. */ pmap_idle_map = virtual_avail; virtual_avail += PAGE_SIZE * colors; pmap_temp_map_1 = virtual_avail; virtual_avail += PAGE_SIZE * colors; pmap_temp_map_2 = virtual_avail; virtual_avail += PAGE_SIZE * colors; /* * Allocate a kernel stack with guard page for thread0 and map it * into the kernel TSB. We must ensure that the virtual address is * colored properly for corresponding CPUs, since we're allocating * from phys_avail so the memory won't have an associated vm_page_t. */ pa = pmap_bootstrap_alloc(KSTACK_PAGES * PAGE_SIZE, colors); kstack0_phys = pa; virtual_avail += roundup(KSTACK_GUARD_PAGES, colors) * PAGE_SIZE; kstack0 = virtual_avail; virtual_avail += roundup(KSTACK_PAGES, colors) * PAGE_SIZE; if (dcache_color_ignore == 0) KASSERT(DCACHE_COLOR(kstack0) == DCACHE_COLOR(kstack0_phys), ("pmap_bootstrap: kstack0 miscolored")); for (i = 0; i < KSTACK_PAGES; i++) { pa = kstack0_phys + i * PAGE_SIZE; va = kstack0 + i * PAGE_SIZE; tp = tsb_kvtotte(va); vpn = TV_VPN(va, TS_8K); data = TD_V | TD_8K | TD_PA(pa) | TD_REF | TD_SW | TD_CP | TD_CV | TD_P | TD_W; pmap_bootstrap_set_tte(tp, vpn, data); } /* * Calculate the last available physical address. */ for (i = 0; phys_avail[i + 2] != 0; i += 2) ; Maxmem = sparc64_btop(phys_avail[i + 1]); /* * Add the PROM mappings to the kernel TSB. 
*/ if ((vmem = OF_finddevice("/virtual-memory")) == -1) OF_panic("%s: finddevice /virtual-memory", __func__); if ((sz = OF_getproplen(vmem, "translations")) == -1) OF_panic("%s: getproplen translations", __func__); if (sizeof(translations) < sz) OF_panic("%s: translations too small", __func__); bzero(translations, sz); if (OF_getprop(vmem, "translations", translations, sz) == -1) OF_panic("%s: getprop /virtual-memory/translations", __func__); sz /= sizeof(*translations); translations_size = sz; #ifdef DIAGNOSTIC OF_printf("pmap_bootstrap: translations\n"); #endif qsort(translations, sz, sizeof (*translations), om_cmp); for (i = 0; i < sz; i++) { #ifdef DIAGNOSTIC OF_printf("translation: start=%#lx size=%#lx tte=%#lx\n", translations[i].om_start, translations[i].om_size, translations[i].om_tte); #endif if ((translations[i].om_tte & TD_V) == 0) continue; if (translations[i].om_start < VM_MIN_PROM_ADDRESS || translations[i].om_start > VM_MAX_PROM_ADDRESS) continue; for (off = 0; off < translations[i].om_size; off += PAGE_SIZE) { va = translations[i].om_start + off; tp = tsb_kvtotte(va); vpn = TV_VPN(va, TS_8K); data = ((translations[i].om_tte & ~((TD_SOFT2_MASK << TD_SOFT2_SHIFT) | (cpu_impl >= CPU_IMPL_ULTRASPARCI && cpu_impl < CPU_IMPL_ULTRASPARCIII ? (TD_DIAG_SF_MASK << TD_DIAG_SF_SHIFT) : (TD_RSVD_CH_MASK << TD_RSVD_CH_SHIFT)) | (TD_SOFT_MASK << TD_SOFT_SHIFT))) | TD_EXEC) + off; pmap_bootstrap_set_tte(tp, vpn, data); } } /* * Get the available physical memory ranges from /memory/reg. These * are only used for kernel dumps, but it may not be wise to do PROM * calls in that situation. */ if ((sz = OF_getproplen(pmem, "reg")) == -1) OF_panic("%s: getproplen /memory/reg", __func__); if (sizeof(sparc64_memreg) < sz) OF_panic("%s: sparc64_memreg too small", __func__); if (OF_getprop(pmem, "reg", sparc64_memreg, sz) == -1) OF_panic("%s: getprop /memory/reg", __func__); sparc64_nmemreg = sz / sizeof(*sparc64_memreg); /* * Initialize the kernel pmap (which is statically allocated). */ pm = kernel_pmap; PMAP_LOCK_INIT(pm); for (i = 0; i < MAXCPU; i++) pm->pm_context[i] = TLB_CTX_KERNEL; CPU_FILL(&pm->pm_active); /* * Initialize the global tte list lock, which is more commonly * known as the pmap pv global lock. */ rw_init(&tte_list_global_lock, "pmap pv global"); /* * Flush all non-locked TLB entries possibly left over by the * firmware. */ tlb_flush_nonlocked(); } static void pmap_init_qpages(void) { struct pcpu *pc; int i; if (dcache_color_ignore != 0) return; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE * DCACHE_COLORS); if (pc->pc_qmap_addr == 0) panic("pmap_init_qpages: unable to allocate KVA"); } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, pmap_init_qpages, NULL); /* * Map the 4MB kernel TSB pages. */ void pmap_map_tsb(void) { vm_offset_t va; vm_paddr_t pa; u_long data; int i; for (i = 0; i < tsb_kernel_size; i += PAGE_SIZE_4M) { va = (vm_offset_t)tsb_kernel + i; pa = tsb_kernel_phys + i; data = TD_V | TD_4M | TD_PA(pa) | TD_L | TD_CP | TD_CV | TD_P | TD_W; stxa(AA_DMMU_TAR, ASI_DMMU, TLB_TAR_VA(va) | TLB_TAR_CTX(TLB_CTX_KERNEL)); stxa_sync(0, ASI_DTLB_DATA_IN_REG, data); } } /* * Set the secondary context to be the kernel context (needed for FP block * operations in the kernel). */ void pmap_set_kctx(void) { stxa(AA_DMMU_SCXR, ASI_DMMU, (ldxa(AA_DMMU_SCXR, ASI_DMMU) & TLB_CXR_PGSZ_MASK) | TLB_CTX_KERNEL); flush(KERNBASE); } /* * Allocate a physical page of memory directly from the phys_avail map. 
* Can only be called from pmap_bootstrap before avail start and end are * calculated. */ static vm_paddr_t pmap_bootstrap_alloc(vm_size_t size, uint32_t colors) { vm_paddr_t pa; int i; size = roundup(size, PAGE_SIZE * colors); for (i = 0; phys_avail[i + 1] != 0; i += 2) { if (phys_avail[i + 1] - phys_avail[i] < size) continue; pa = phys_avail[i]; phys_avail[i] += size; return (pa); } OF_panic("%s: no suitable region found", __func__); } /* * Set a TTE. This function is intended as a helper when tsb_kernel is * direct-mapped but we haven't taken over the trap table, yet, as it's the * case when we are taking advantage of ASI_ATOMIC_QUAD_LDD_PHYS to access * the kernel TSB. */ void pmap_bootstrap_set_tte(struct tte *tp, u_long vpn, u_long data) { if (tsb_kernel_ldd_phys == 0) { tp->tte_vpn = vpn; tp->tte_data = data; } else { stxa((vm_paddr_t)tp + offsetof(struct tte, tte_vpn), ASI_PHYS_USE_EC, vpn); stxa((vm_paddr_t)tp + offsetof(struct tte, tte_data), ASI_PHYS_USE_EC, data); } } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.tte_list); m->md.color = DCACHE_COLOR(VM_PAGE_TO_PHYS(m)); m->md.pmap = NULL; } /* * Initialize the pmap module. */ void pmap_init(void) { vm_offset_t addr; vm_size_t size; int result; int i; for (i = 0; i < translations_size; i++) { addr = translations[i].om_start; size = translations[i].om_size; if ((translations[i].om_tte & TD_V) == 0) continue; if (addr < VM_MIN_PROM_ADDRESS || addr > VM_MAX_PROM_ADDRESS) continue; result = vm_map_find(kernel_map, NULL, 0, &addr, size, 0, VMFS_NO_SPACE, VM_PROT_ALL, VM_PROT_ALL, MAP_NOFAULT); if (result != KERN_SUCCESS || addr != translations[i].om_start) panic("pmap_init: vm_map_find"); } } /* * Extract the physical page address associated with the given * map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pm, vm_offset_t va) { struct tte *tp; vm_paddr_t pa; if (pm == kernel_pmap) return (pmap_kextract(va)); PMAP_LOCK(pm); tp = tsb_tte_lookup(pm, va); if (tp == NULL) pa = 0; else pa = TTE_GET_PA(tp) | (va & TTE_GET_PAGE_MASK(tp)); PMAP_UNLOCK(pm); return (pa); } /* * Atomically extract and hold the physical page with the given * pmap and virtual address pair if that mapping permits the given * protection. */ vm_page_t pmap_extract_and_hold(pmap_t pm, vm_offset_t va, vm_prot_t prot) { struct tte *tp; vm_page_t m; vm_paddr_t pa; m = NULL; pa = 0; PMAP_LOCK(pm); retry: if (pm == kernel_pmap) { if (va >= VM_MIN_DIRECT_ADDRESS) { tp = NULL; m = PHYS_TO_VM_PAGE(TLB_DIRECT_TO_PHYS(va)); (void)vm_page_pa_tryrelock(pm, TLB_DIRECT_TO_PHYS(va), &pa); vm_page_hold(m); } else { tp = tsb_kvtotte(va); if ((tp->tte_data & TD_V) == 0) tp = NULL; } } else tp = tsb_tte_lookup(pm, va); if (tp != NULL && ((tp->tte_data & TD_SW) || (prot & VM_PROT_WRITE) == 0)) { if (vm_page_pa_tryrelock(pm, TTE_GET_PA(tp), &pa)) goto retry; m = PHYS_TO_VM_PAGE(TTE_GET_PA(tp)); vm_page_hold(m); } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pm); return (m); } /* * Extract the physical page address associated with the given kernel virtual * address. 
*/ vm_paddr_t pmap_kextract(vm_offset_t va) { struct tte *tp; if (va >= VM_MIN_DIRECT_ADDRESS) return (TLB_DIRECT_TO_PHYS(va)); tp = tsb_kvtotte(va); if ((tp->tte_data & TD_V) == 0) return (0); return (TTE_GET_PA(tp) | (va & TTE_GET_PAGE_MASK(tp))); } int pmap_cache_enter(vm_page_t m, vm_offset_t va) { struct tte *tp; int color; rw_assert(&tte_list_global_lock, RA_WLOCKED); KASSERT((m->flags & PG_FICTITIOUS) == 0, ("pmap_cache_enter: fake page")); PMAP_STATS_INC(pmap_ncache_enter); if (dcache_color_ignore != 0) return (1); /* * Find the color for this virtual address and note the added mapping. */ color = DCACHE_COLOR(va); m->md.colors[color]++; /* * If all existing mappings have the same color, the mapping is * cacheable. */ if (m->md.color == color) { KASSERT(m->md.colors[DCACHE_OTHER_COLOR(color)] == 0, ("pmap_cache_enter: cacheable, mappings of other color")); if (m->md.color == DCACHE_COLOR(VM_PAGE_TO_PHYS(m))) PMAP_STATS_INC(pmap_ncache_enter_c); else PMAP_STATS_INC(pmap_ncache_enter_oc); return (1); } /* * If there are no mappings of the other color, and the page still has * the wrong color, this must be a new mapping. Change the color to * match the new mapping, which is cacheable. We must flush the page * from the cache now. */ if (m->md.colors[DCACHE_OTHER_COLOR(color)] == 0) { KASSERT(m->md.colors[color] == 1, ("pmap_cache_enter: changing color, not new mapping")); dcache_page_inval(VM_PAGE_TO_PHYS(m)); m->md.color = color; if (m->md.color == DCACHE_COLOR(VM_PAGE_TO_PHYS(m))) PMAP_STATS_INC(pmap_ncache_enter_cc); else PMAP_STATS_INC(pmap_ncache_enter_coc); return (1); } /* * If the mapping is already non-cacheable, just return. */ if (m->md.color == -1) { PMAP_STATS_INC(pmap_ncache_enter_nc); return (0); } PMAP_STATS_INC(pmap_ncache_enter_cnc); /* * Mark all mappings as uncacheable, flush any lines with the other * color out of the dcache, and set the color to none (-1). */ TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) { atomic_clear_long(&tp->tte_data, TD_CV); tlb_page_demap(TTE_GET_PMAP(tp), TTE_GET_VA(tp)); } dcache_page_inval(VM_PAGE_TO_PHYS(m)); m->md.color = -1; return (0); } static void pmap_cache_remove(vm_page_t m, vm_offset_t va) { struct tte *tp; int color; rw_assert(&tte_list_global_lock, RA_WLOCKED); CTR3(KTR_PMAP, "pmap_cache_remove: m=%p va=%#lx c=%d", m, va, m->md.colors[DCACHE_COLOR(va)]); KASSERT((m->flags & PG_FICTITIOUS) == 0, ("pmap_cache_remove: fake page")); PMAP_STATS_INC(pmap_ncache_remove); if (dcache_color_ignore != 0) return; KASSERT(m->md.colors[DCACHE_COLOR(va)] > 0, ("pmap_cache_remove: no mappings %d <= 0", m->md.colors[DCACHE_COLOR(va)])); /* * Find the color for this virtual address and note the removal of * the mapping. */ color = DCACHE_COLOR(va); m->md.colors[color]--; /* * If the page is cacheable, just return and keep the same color, even * if there are no longer any mappings. */ if (m->md.color != -1) { if (m->md.color == DCACHE_COLOR(VM_PAGE_TO_PHYS(m))) PMAP_STATS_INC(pmap_ncache_remove_c); else PMAP_STATS_INC(pmap_ncache_remove_oc); return; } KASSERT(m->md.colors[DCACHE_OTHER_COLOR(color)] != 0, ("pmap_cache_remove: uncacheable, no mappings of other color")); /* * If the page is not cacheable (color is -1), and the number of * mappings for this color is not zero, just return. There are * mappings of the other color still, so remain non-cacheable. */ if (m->md.colors[color] != 0) { PMAP_STATS_INC(pmap_ncache_remove_nc); return; } /* * The number of mappings for this color is now zero. 
Recache the * other colored mappings, and change the page color to the other * color. There should be no lines in the data cache for this page, * so flushing should not be needed. */ TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) { atomic_set_long(&tp->tte_data, TD_CV); tlb_page_demap(TTE_GET_PMAP(tp), TTE_GET_VA(tp)); } m->md.color = DCACHE_OTHER_COLOR(color); if (m->md.color == DCACHE_COLOR(VM_PAGE_TO_PHYS(m))) PMAP_STATS_INC(pmap_ncache_remove_cc); else PMAP_STATS_INC(pmap_ncache_remove_coc); } /* * Map a wired page into kernel virtual address space. */ void pmap_kenter(vm_offset_t va, vm_page_t m) { vm_offset_t ova; struct tte *tp; vm_page_t om; u_long data; rw_assert(&tte_list_global_lock, RA_WLOCKED); PMAP_STATS_INC(pmap_nkenter); tp = tsb_kvtotte(va); CTR4(KTR_PMAP, "pmap_kenter: va=%#lx pa=%#lx tp=%p data=%#lx", va, VM_PAGE_TO_PHYS(m), tp, tp->tte_data); if (DCACHE_COLOR(VM_PAGE_TO_PHYS(m)) != DCACHE_COLOR(va)) { CTR5(KTR_SPARE2, "pmap_kenter: off color va=%#lx pa=%#lx o=%p ot=%d pi=%#lx", va, VM_PAGE_TO_PHYS(m), m->object, m->object ? m->object->type : -1, m->pindex); PMAP_STATS_INC(pmap_nkenter_oc); } if ((tp->tte_data & TD_V) != 0) { om = PHYS_TO_VM_PAGE(TTE_GET_PA(tp)); ova = TTE_GET_VA(tp); if (m == om && va == ova) { PMAP_STATS_INC(pmap_nkenter_stupid); return; } TAILQ_REMOVE(&om->md.tte_list, tp, tte_link); pmap_cache_remove(om, ova); if (va != ova) tlb_page_demap(kernel_pmap, ova); } data = TD_V | TD_8K | VM_PAGE_TO_PHYS(m) | TD_REF | TD_SW | TD_CP | TD_P | TD_W; if (pmap_cache_enter(m, va) != 0) data |= TD_CV; tp->tte_vpn = TV_VPN(va, TS_8K); tp->tte_data = data; TAILQ_INSERT_TAIL(&m->md.tte_list, tp, tte_link); } /* * Map a wired page into kernel virtual address space. This additionally * takes a flag argument which is or'ed to the TTE data. This is used by * sparc64_bus_mem_map(). * NOTE: if the mapping is non-cacheable, it's the caller's responsibility * to flush entries that might still be in the cache, if applicable. */ void pmap_kenter_flags(vm_offset_t va, vm_paddr_t pa, u_long flags) { struct tte *tp; tp = tsb_kvtotte(va); CTR4(KTR_PMAP, "pmap_kenter_flags: va=%#lx pa=%#lx tp=%p data=%#lx", va, pa, tp, tp->tte_data); tp->tte_vpn = TV_VPN(va, TS_8K); tp->tte_data = TD_V | TD_8K | TD_PA(pa) | TD_REF | TD_P | flags; } /* * Remove a wired page from kernel virtual address space. */ void pmap_kremove(vm_offset_t va) { struct tte *tp; vm_page_t m; rw_assert(&tte_list_global_lock, RA_WLOCKED); PMAP_STATS_INC(pmap_nkremove); tp = tsb_kvtotte(va); CTR3(KTR_PMAP, "pmap_kremove: va=%#lx tp=%p data=%#lx", va, tp, tp->tte_data); if ((tp->tte_data & TD_V) == 0) return; m = PHYS_TO_VM_PAGE(TTE_GET_PA(tp)); TAILQ_REMOVE(&m->md.tte_list, tp, tte_link); pmap_cache_remove(m, va); TTE_ZERO(tp); } /* * Inverse of pmap_kenter_flags, used by bus_space_unmap(). */ void pmap_kremove_flags(vm_offset_t va) { struct tte *tp; tp = tsb_kvtotte(va); CTR3(KTR_PMAP, "pmap_kremove_flags: va=%#lx tp=%p data=%#lx", va, tp, tp->tte_data); TTE_ZERO(tp); } /* * Map a range of physical addresses into kernel virtual address space. * * The value passed in *virt is a suggested virtual address for the mapping. * Architectures which can support a direct-mapped physical to virtual region * can return the appropriate address within that region, leaving '*virt' * unchanged. */ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { return (TLB_PHYS_TO_DIRECT(start)); } /* * Map a list of wired pages into kernel virtual address space. 
This is * intended for temporary mappings which do not need page modification or * references recorded. Existing mappings in the region are overwritten. */ void pmap_qenter(vm_offset_t sva, vm_page_t *m, int count) { vm_offset_t va; PMAP_STATS_INC(pmap_nqenter); va = sva; rw_wlock(&tte_list_global_lock); while (count-- > 0) { pmap_kenter(va, *m); va += PAGE_SIZE; m++; } rw_wunlock(&tte_list_global_lock); tlb_range_demap(kernel_pmap, sva, va); } /* * Remove page mappings from kernel virtual address space. Intended for * temporary mappings entered by pmap_qenter. */ void pmap_qremove(vm_offset_t sva, int count) { vm_offset_t va; PMAP_STATS_INC(pmap_nqremove); va = sva; rw_wlock(&tte_list_global_lock); while (count-- > 0) { pmap_kremove(va); va += PAGE_SIZE; } rw_wunlock(&tte_list_global_lock); tlb_range_demap(kernel_pmap, sva, va); } /* * Initialize the pmap associated with process 0. */ void pmap_pinit0(pmap_t pm) { int i; PMAP_LOCK_INIT(pm); for (i = 0; i < MAXCPU; i++) pm->pm_context[i] = TLB_CTX_KERNEL; CPU_ZERO(&pm->pm_active); pm->pm_tsb = NULL; pm->pm_tsb_obj = NULL; bzero(&pm->pm_stats, sizeof(pm->pm_stats)); } /* * Initialize a preallocated and zeroed pmap structure, such as one in a * vmspace structure. */ int pmap_pinit(pmap_t pm) { vm_page_t ma[TSB_PAGES]; vm_page_t m; int i; /* * Allocate KVA space for the TSB. */ if (pm->pm_tsb == NULL) { pm->pm_tsb = (struct tte *)kva_alloc(TSB_BSIZE); if (pm->pm_tsb == NULL) return (0); } /* * Allocate an object for it. */ if (pm->pm_tsb_obj == NULL) pm->pm_tsb_obj = vm_object_allocate(OBJT_PHYS, TSB_PAGES); for (i = 0; i < MAXCPU; i++) pm->pm_context[i] = -1; CPU_ZERO(&pm->pm_active); VM_OBJECT_WLOCK(pm->pm_tsb_obj); for (i = 0; i < TSB_PAGES; i++) { m = vm_page_grab(pm->pm_tsb_obj, i, VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | VM_ALLOC_ZERO); m->valid = VM_PAGE_BITS_ALL; m->md.pmap = pm; ma[i] = m; } VM_OBJECT_WUNLOCK(pm->pm_tsb_obj); pmap_qenter((vm_offset_t)pm->pm_tsb, ma, TSB_PAGES); bzero(&pm->pm_stats, sizeof(pm->pm_stats)); return (1); } /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. */ void pmap_release(pmap_t pm) { vm_object_t obj; vm_page_t m; #ifdef SMP struct pcpu *pc; #endif CTR2(KTR_PMAP, "pmap_release: ctx=%#x tsb=%p", pm->pm_context[curcpu], pm->pm_tsb); KASSERT(pmap_resident_count(pm) == 0, ("pmap_release: resident pages %ld != 0", pmap_resident_count(pm))); /* * After the pmap was freed, it might be reallocated to a new process. * When switching, this might lead us to wrongly assume that we need * not switch contexts because old and new pmap pointer are equal. * Therefore, make sure that this pmap is not referenced by any PCPU * pointer any more. This could happen in two cases: * - A process that referenced the pmap is currently exiting on a CPU. * However, it is guaranteed to not switch in any more after setting * its state to PRS_ZOMBIE. * - A process that referenced this pmap ran on a CPU, but we switched * to a kernel thread, leaving the pmap pointer unchanged. 
*/ #ifdef SMP sched_pin(); STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) atomic_cmpset_rel_ptr((uintptr_t *)&pc->pc_pmap, (uintptr_t)pm, (uintptr_t)NULL); sched_unpin(); #else critical_enter(); if (PCPU_GET(pmap) == pm) PCPU_SET(pmap, NULL); critical_exit(); #endif pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES); obj = pm->pm_tsb_obj; VM_OBJECT_WLOCK(obj); KASSERT(obj->ref_count == 1, ("pmap_release: tsbobj ref count != 1")); while (!TAILQ_EMPTY(&obj->memq)) { m = TAILQ_FIRST(&obj->memq); m->md.pmap = NULL; m->wire_count--; atomic_subtract_int(&vm_cnt.v_wire_count, 1); vm_page_free_zero(m); } VM_OBJECT_WUNLOCK(obj); } /* * Grow the number of kernel page table entries. Unneeded. */ void pmap_growkernel(vm_offset_t addr) { panic("pmap_growkernel: can't grow kernel"); } int pmap_remove_tte(struct pmap *pm, struct pmap *pm2, struct tte *tp, vm_offset_t va) { vm_page_t m; u_long data; rw_assert(&tte_list_global_lock, RA_WLOCKED); data = atomic_readandclear_long(&tp->tte_data); if ((data & TD_FAKE) == 0) { m = PHYS_TO_VM_PAGE(TD_PA(data)); TAILQ_REMOVE(&m->md.tte_list, tp, tte_link); if ((data & TD_WIRED) != 0) pm->pm_stats.wired_count--; if ((data & TD_PV) != 0) { if ((data & TD_W) != 0) vm_page_dirty(m); if ((data & TD_REF) != 0) vm_page_aflag_set(m, PGA_REFERENCED); if (TAILQ_EMPTY(&m->md.tte_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); pm->pm_stats.resident_count--; } pmap_cache_remove(m, va); } TTE_ZERO(tp); if (PMAP_REMOVE_DONE(pm)) return (0); return (1); } /* * Remove the given range of addresses from the specified map. */ void pmap_remove(pmap_t pm, vm_offset_t start, vm_offset_t end) { struct tte *tp; vm_offset_t va; CTR3(KTR_PMAP, "pmap_remove: ctx=%#lx start=%#lx end=%#lx", pm->pm_context[curcpu], start, end); if (PMAP_REMOVE_DONE(pm)) return; rw_wlock(&tte_list_global_lock); PMAP_LOCK(pm); if (end - start > PMAP_TSB_THRESH) { tsb_foreach(pm, NULL, start, end, pmap_remove_tte); tlb_context_demap(pm); } else { for (va = start; va < end; va += PAGE_SIZE) if ((tp = tsb_tte_lookup(pm, va)) != NULL && !pmap_remove_tte(pm, NULL, tp, va)) break; tlb_range_demap(pm, start, end - 1); } PMAP_UNLOCK(pm); rw_wunlock(&tte_list_global_lock); } void pmap_remove_all(vm_page_t m) { struct pmap *pm; struct tte *tpn; struct tte *tp; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); rw_wlock(&tte_list_global_lock); for (tp = TAILQ_FIRST(&m->md.tte_list); tp != NULL; tp = tpn) { tpn = TAILQ_NEXT(tp, tte_link); if ((tp->tte_data & TD_PV) == 0) continue; pm = TTE_GET_PMAP(tp); va = TTE_GET_VA(tp); PMAP_LOCK(pm); if ((tp->tte_data & TD_WIRED) != 0) pm->pm_stats.wired_count--; if ((tp->tte_data & TD_REF) != 0) vm_page_aflag_set(m, PGA_REFERENCED); if ((tp->tte_data & TD_W) != 0) vm_page_dirty(m); tp->tte_data &= ~TD_V; tlb_page_demap(pm, va); TAILQ_REMOVE(&m->md.tte_list, tp, tte_link); pm->pm_stats.resident_count--; pmap_cache_remove(m, va); TTE_ZERO(tp); PMAP_UNLOCK(pm); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&tte_list_global_lock); } static int pmap_protect_tte(struct pmap *pm, struct pmap *pm2, struct tte *tp, vm_offset_t va) { u_long data; vm_page_t m; PMAP_LOCK_ASSERT(pm, MA_OWNED); data = atomic_clear_long(&tp->tte_data, TD_SW | TD_W); if ((data & (TD_PV | TD_W)) == (TD_PV | TD_W)) { m = PHYS_TO_VM_PAGE(TD_PA(data)); vm_page_dirty(m); } return (1); } /* * Set the physical protection on the specified range of this map as requested. 
*/ void pmap_protect(pmap_t pm, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t va; struct tte *tp; CTR4(KTR_PMAP, "pmap_protect: ctx=%#lx sva=%#lx eva=%#lx prot=%#lx", pm->pm_context[curcpu], sva, eva, prot); if ((prot & VM_PROT_READ) == VM_PROT_NONE) { pmap_remove(pm, sva, eva); return; } if (prot & VM_PROT_WRITE) return; PMAP_LOCK(pm); if (eva - sva > PMAP_TSB_THRESH) { tsb_foreach(pm, NULL, sva, eva, pmap_protect_tte); tlb_context_demap(pm); } else { for (va = sva; va < eva; va += PAGE_SIZE) if ((tp = tsb_tte_lookup(pm, va)) != NULL) pmap_protect_tte(pm, NULL, tp, va); tlb_range_demap(pm, sva, eva - 1); } PMAP_UNLOCK(pm); } /* * Map the given physical page at the specified virtual address in the * target pmap with the protection requested. If specified the page * will be wired down. */ int pmap_enter(pmap_t pm, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { int rv; rw_wlock(&tte_list_global_lock); PMAP_LOCK(pm); rv = pmap_enter_locked(pm, va, m, prot, flags, psind); rw_wunlock(&tte_list_global_lock); PMAP_UNLOCK(pm); return (rv); } /* * Map the given physical page at the specified virtual address in the * target pmap with the protection requested. If specified the page * will be wired down. * * The page queues and pmap must be locked. */ static int pmap_enter_locked(pmap_t pm, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind __unused) { struct tte *tp; vm_paddr_t pa; vm_page_t real; u_long data; boolean_t wired; rw_assert(&tte_list_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pm, MA_OWNED); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); PMAP_STATS_INC(pmap_nenter); pa = VM_PAGE_TO_PHYS(m); wired = (flags & PMAP_ENTER_WIRED) != 0; /* * If this is a fake page from the device_pager, but it covers actual * physical memory, convert to the real backing page. */ if ((m->flags & PG_FICTITIOUS) != 0) { real = vm_phys_paddr_to_vm_page(pa); if (real != NULL) m = real; } CTR6(KTR_PMAP, "pmap_enter_locked: ctx=%p m=%p va=%#lx pa=%#lx prot=%#x wired=%d", pm->pm_context[curcpu], m, va, pa, prot, wired); /* * If there is an existing mapping, and the physical address has not * changed, it must be a protection or wiring change. */ if ((tp = tsb_tte_lookup(pm, va)) != NULL && TTE_GET_PA(tp) == pa) { CTR0(KTR_PMAP, "pmap_enter_locked: update"); PMAP_STATS_INC(pmap_nenter_update); /* * Wiring change, just update stats. */ if (wired) { if ((tp->tte_data & TD_WIRED) == 0) { tp->tte_data |= TD_WIRED; pm->pm_stats.wired_count++; } } else { if ((tp->tte_data & TD_WIRED) != 0) { tp->tte_data &= ~TD_WIRED; pm->pm_stats.wired_count--; } } /* * Save the old bits and clear the ones we're interested in. */ data = tp->tte_data; tp->tte_data &= ~(TD_EXEC | TD_SW | TD_W); /* * If we're enabling write permissions, mark the page writable; if * we're turning them off, sense modify status. */ if ((prot & VM_PROT_WRITE) != 0) { tp->tte_data |= TD_SW; if (wired) tp->tte_data |= TD_W; if ((m->oflags & VPO_UNMANAGED) == 0) vm_page_aflag_set(m, PGA_WRITEABLE); } else if ((data & TD_W) != 0) vm_page_dirty(m); /* * If we're turning on execute permissions, flush the icache. */ if ((prot & VM_PROT_EXECUTE) != 0) { if ((data & TD_EXEC) == 0) icache_page_inval(pa); tp->tte_data |= TD_EXEC; } /* * Delete the old mapping. */ tlb_page_demap(pm, TTE_GET_VA(tp)); } else { /* * If there is an existing mapping, but it's for a different * physical address, delete the old mapping.
*/ if (tp != NULL) { CTR0(KTR_PMAP, "pmap_enter_locked: replace"); PMAP_STATS_INC(pmap_nenter_replace); pmap_remove_tte(pm, NULL, tp, va); tlb_page_demap(pm, va); } else { CTR0(KTR_PMAP, "pmap_enter_locked: new"); PMAP_STATS_INC(pmap_nenter_new); } /* * Now set up the data and install the new mapping. */ data = TD_V | TD_8K | TD_PA(pa); if (pm == kernel_pmap) data |= TD_P; if ((prot & VM_PROT_WRITE) != 0) { data |= TD_SW; if ((m->oflags & VPO_UNMANAGED) == 0) vm_page_aflag_set(m, PGA_WRITEABLE); } if (prot & VM_PROT_EXECUTE) { data |= TD_EXEC; icache_page_inval(pa); } /* * If it's wired, update stats. We also don't need reference or * modify tracking for wired mappings, so set the bits now. */ if (wired) { pm->pm_stats.wired_count++; data |= TD_REF | TD_WIRED; if ((prot & VM_PROT_WRITE) != 0) data |= TD_W; } tsb_tte_enter(pm, m, va, TS_8K, data); } return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pm, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_page_t m; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); m = m_start; rw_wlock(&tte_list_global_lock); PMAP_LOCK(pm); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { pmap_enter_locked(pm, start + ptoa(diff), m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), 0, 0); m = TAILQ_NEXT(m, listq); } rw_wunlock(&tte_list_global_lock); PMAP_UNLOCK(pm); } void pmap_enter_quick(pmap_t pm, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&tte_list_global_lock); PMAP_LOCK(pm); pmap_enter_locked(pm, va, m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), 0, 0); rw_wunlock(&tte_list_global_lock); PMAP_UNLOCK(pm); } void pmap_object_init_pt(pmap_t pm, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); } static int pmap_unwire_tte(pmap_t pm, pmap_t pm2, struct tte *tp, vm_offset_t va) { PMAP_LOCK_ASSERT(pm, MA_OWNED); if ((tp->tte_data & TD_WIRED) == 0) panic("pmap_unwire_tte: tp %p is missing TD_WIRED", tp); atomic_clear_long(&tp->tte_data, TD_WIRED); pm->pm_stats.wired_count--; return (1); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range must * have the wired attribute set. In contrast, invalid mappings cannot have * the wired attribute set, so they are ignored. * * The wired attribute of the translation table entry is not a hardware * feature, so there is no need to invalidate any TLB entries.
*/ void pmap_unwire(pmap_t pm, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va; struct tte *tp; PMAP_LOCK(pm); if (eva - sva > PMAP_TSB_THRESH) tsb_foreach(pm, NULL, sva, eva, pmap_unwire_tte); else { for (va = sva; va < eva; va += PAGE_SIZE) if ((tp = tsb_tte_lookup(pm, va)) != NULL) pmap_unwire_tte(pm, NULL, tp, va); } PMAP_UNLOCK(pm); } static int pmap_copy_tte(pmap_t src_pmap, pmap_t dst_pmap, struct tte *tp, vm_offset_t va) { vm_page_t m; u_long data; if ((tp->tte_data & TD_FAKE) != 0) return (1); if (tsb_tte_lookup(dst_pmap, va) == NULL) { data = tp->tte_data & ~(TD_PV | TD_REF | TD_SW | TD_CV | TD_W); m = PHYS_TO_VM_PAGE(TTE_GET_PA(tp)); tsb_tte_enter(dst_pmap, m, va, TS_8K, data); } return (1); } void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { struct tte *tp; vm_offset_t va; if (dst_addr != src_addr) return; rw_wlock(&tte_list_global_lock); if (dst_pmap < src_pmap) { PMAP_LOCK(dst_pmap); PMAP_LOCK(src_pmap); } else { PMAP_LOCK(src_pmap); PMAP_LOCK(dst_pmap); } if (len > PMAP_TSB_THRESH) { tsb_foreach(src_pmap, dst_pmap, src_addr, src_addr + len, pmap_copy_tte); tlb_context_demap(dst_pmap); } else { for (va = src_addr; va < src_addr + len; va += PAGE_SIZE) if ((tp = tsb_tte_lookup(src_pmap, va)) != NULL) pmap_copy_tte(src_pmap, dst_pmap, tp, va); tlb_range_demap(dst_pmap, src_addr, src_addr + len - 1); } rw_wunlock(&tte_list_global_lock); PMAP_UNLOCK(src_pmap); PMAP_UNLOCK(dst_pmap); } void pmap_zero_page(vm_page_t m) { struct tte *tp; vm_offset_t va; vm_paddr_t pa; KASSERT((m->flags & PG_FICTITIOUS) == 0, ("pmap_zero_page: fake page")); PMAP_STATS_INC(pmap_nzero_page); pa = VM_PAGE_TO_PHYS(m); if (dcache_color_ignore != 0 || m->md.color == DCACHE_COLOR(pa)) { PMAP_STATS_INC(pmap_nzero_page_c); va = TLB_PHYS_TO_DIRECT(pa); cpu_block_zero((void *)va, PAGE_SIZE); } else if (m->md.color == -1) { PMAP_STATS_INC(pmap_nzero_page_nc); aszero(ASI_PHYS_USE_EC, pa, PAGE_SIZE); } else { PMAP_STATS_INC(pmap_nzero_page_oc); PMAP_LOCK(kernel_pmap); va = pmap_temp_map_1 + (m->md.color * PAGE_SIZE); tp = tsb_kvtotte(va); tp->tte_data = TD_V | TD_8K | TD_PA(pa) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(va, TS_8K); cpu_block_zero((void *)va, PAGE_SIZE); tlb_page_demap(kernel_pmap, va); PMAP_UNLOCK(kernel_pmap); } } void pmap_zero_page_area(vm_page_t m, int off, int size) { struct tte *tp; vm_offset_t va; vm_paddr_t pa; KASSERT((m->flags & PG_FICTITIOUS) == 0, ("pmap_zero_page_area: fake page")); KASSERT(off + size <= PAGE_SIZE, ("pmap_zero_page_area: bad off/size")); PMAP_STATS_INC(pmap_nzero_page_area); pa = VM_PAGE_TO_PHYS(m); if (dcache_color_ignore != 0 || m->md.color == DCACHE_COLOR(pa)) { PMAP_STATS_INC(pmap_nzero_page_area_c); va = TLB_PHYS_TO_DIRECT(pa); bzero((void *)(va + off), size); } else if (m->md.color == -1) { PMAP_STATS_INC(pmap_nzero_page_area_nc); aszero(ASI_PHYS_USE_EC, pa + off, size); } else { PMAP_STATS_INC(pmap_nzero_page_area_oc); PMAP_LOCK(kernel_pmap); va = pmap_temp_map_1 + (m->md.color * PAGE_SIZE); tp = tsb_kvtotte(va); tp->tte_data = TD_V | TD_8K | TD_PA(pa) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(va, TS_8K); bzero((void *)(va + off), size); tlb_page_demap(kernel_pmap, va); PMAP_UNLOCK(kernel_pmap); } } void pmap_copy_page(vm_page_t msrc, vm_page_t mdst) { vm_offset_t vdst; vm_offset_t vsrc; vm_paddr_t pdst; vm_paddr_t psrc; struct tte *tp; KASSERT((mdst->flags & PG_FICTITIOUS) == 0, ("pmap_copy_page: fake dst page")); KASSERT((msrc->flags & PG_FICTITIOUS) == 0, ("pmap_copy_page: fake src page")); 
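/* * The cases below pick the cheapest safe copy method: the direct map * when both pages are consistently colored, physical ASI copies when a * page is uncacheable (color of -1), and otherwise temporary kernel * mappings of the matching color. */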
PMAP_STATS_INC(pmap_ncopy_page); pdst = VM_PAGE_TO_PHYS(mdst); psrc = VM_PAGE_TO_PHYS(msrc); if (dcache_color_ignore != 0 || (msrc->md.color == DCACHE_COLOR(psrc) && mdst->md.color == DCACHE_COLOR(pdst))) { PMAP_STATS_INC(pmap_ncopy_page_c); vdst = TLB_PHYS_TO_DIRECT(pdst); vsrc = TLB_PHYS_TO_DIRECT(psrc); cpu_block_copy((void *)vsrc, (void *)vdst, PAGE_SIZE); } else if (msrc->md.color == -1 && mdst->md.color == -1) { PMAP_STATS_INC(pmap_ncopy_page_nc); ascopy(ASI_PHYS_USE_EC, psrc, pdst, PAGE_SIZE); } else if (msrc->md.color == -1) { if (mdst->md.color == DCACHE_COLOR(pdst)) { PMAP_STATS_INC(pmap_ncopy_page_dc); vdst = TLB_PHYS_TO_DIRECT(pdst); ascopyfrom(ASI_PHYS_USE_EC, psrc, (void *)vdst, PAGE_SIZE); } else { PMAP_STATS_INC(pmap_ncopy_page_doc); PMAP_LOCK(kernel_pmap); vdst = pmap_temp_map_1 + (mdst->md.color * PAGE_SIZE); tp = tsb_kvtotte(vdst); tp->tte_data = TD_V | TD_8K | TD_PA(pdst) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(vdst, TS_8K); ascopyfrom(ASI_PHYS_USE_EC, psrc, (void *)vdst, PAGE_SIZE); tlb_page_demap(kernel_pmap, vdst); PMAP_UNLOCK(kernel_pmap); } } else if (mdst->md.color == -1) { if (msrc->md.color == DCACHE_COLOR(psrc)) { PMAP_STATS_INC(pmap_ncopy_page_sc); vsrc = TLB_PHYS_TO_DIRECT(psrc); ascopyto((void *)vsrc, ASI_PHYS_USE_EC, pdst, PAGE_SIZE); } else { PMAP_STATS_INC(pmap_ncopy_page_soc); PMAP_LOCK(kernel_pmap); vsrc = pmap_temp_map_1 + (msrc->md.color * PAGE_SIZE); tp = tsb_kvtotte(vsrc); tp->tte_data = TD_V | TD_8K | TD_PA(psrc) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(vsrc, TS_8K); ascopyto((void *)vsrc, ASI_PHYS_USE_EC, pdst, PAGE_SIZE); tlb_page_demap(kernel_pmap, vsrc); PMAP_UNLOCK(kernel_pmap); } } else { PMAP_STATS_INC(pmap_ncopy_page_oc); PMAP_LOCK(kernel_pmap); vdst = pmap_temp_map_1 + (mdst->md.color * PAGE_SIZE); tp = tsb_kvtotte(vdst); tp->tte_data = TD_V | TD_8K | TD_PA(pdst) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(vdst, TS_8K); vsrc = pmap_temp_map_2 + (msrc->md.color * PAGE_SIZE); tp = tsb_kvtotte(vsrc); tp->tte_data = TD_V | TD_8K | TD_PA(psrc) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(vsrc, TS_8K); cpu_block_copy((void *)vsrc, (void *)vdst, PAGE_SIZE); tlb_page_demap(kernel_pmap, vdst); tlb_page_demap(kernel_pmap, vsrc); PMAP_UNLOCK(kernel_pmap); } } vm_offset_t pmap_quick_enter_page(vm_page_t m) { vm_paddr_t pa; vm_offset_t qaddr; struct tte *tp; pa = VM_PAGE_TO_PHYS(m); if (dcache_color_ignore != 0 || m->md.color == DCACHE_COLOR(pa)) return (TLB_PHYS_TO_DIRECT(pa)); critical_enter(); qaddr = PCPU_GET(qmap_addr); qaddr += (PAGE_SIZE * ((DCACHE_COLORS + DCACHE_COLOR(pa) - DCACHE_COLOR(qaddr)) % DCACHE_COLORS)); tp = tsb_kvtotte(qaddr); KASSERT(tp->tte_data == 0, ("pmap_quick_enter_page: PTE busy")); tp->tte_data = TD_V | TD_8K | TD_PA(pa) | TD_CP | TD_CV | TD_W; tp->tte_vpn = TV_VPN(qaddr, TS_8K); return (qaddr); } void pmap_quick_remove_page(vm_offset_t addr) { vm_offset_t qaddr; struct tte *tp; if (addr >= VM_MIN_DIRECT_ADDRESS) return; tp = tsb_kvtotte(addr); qaddr = PCPU_GET(qmap_addr); KASSERT((addr >= qaddr) && (addr < (qaddr + (PAGE_SIZE * DCACHE_COLORS))), ("pmap_quick_remove_page: invalid address")); KASSERT(tp->tte_data != 0, ("pmap_quick_remove_page: PTE not in use")); stxa(TLB_DEMAP_VA(addr) | TLB_DEMAP_NUCLEUS | TLB_DEMAP_PAGE, ASI_DMMU_DEMAP, 0); stxa(TLB_DEMAP_VA(addr) | TLB_DEMAP_NUCLEUS | TLB_DEMAP_PAGE, ASI_IMMU_DEMAP, 0); flush(KERNBASE); TTE_ZERO(tp); critical_exit(); } int unmapped_buf_allowed; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { 
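/* * unmapped_buf_allowed is left at zero above, so the buffer cache only * passes mapped buffers to this pmap and this path should not be * reached. */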
panic("pmap_copy_pages: not implemented"); } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page. This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pm, vm_page_t m) { struct tte *tp; int loops; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); loops = 0; rv = FALSE; rw_wlock(&tte_list_global_lock); TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) { if ((tp->tte_data & TD_PV) == 0) continue; if (TTE_GET_PMAP(tp) == pm) { rv = TRUE; break; } if (++loops >= 16) break; } rw_wunlock(&tte_list_global_lock); return (rv); } /* * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { struct tte *tp; int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&tte_list_global_lock); TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) if ((tp->tte_data & (TD_PV | TD_WIRED)) == (TD_PV | TD_WIRED)) count++; rw_wunlock(&tte_list_global_lock); return (count); } /* * Remove all pages from specified address space, this aids process exit * speeds. This is much faster than pmap_remove in the case of running down * an entire address space. Only works for the current pmap. */ void pmap_remove_pages(pmap_t pm) { } /* * Returns TRUE if the given page has a managed mapping. */ boolean_t pmap_page_is_mapped(vm_page_t m) { struct tte *tp; boolean_t rv; rv = FALSE; if ((m->oflags & VPO_UNMANAGED) != 0) return (rv); rw_wlock(&tte_list_global_lock); TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) if ((tp->tte_data & TD_PV) != 0) { rv = TRUE; break; } rw_wunlock(&tte_list_global_lock); return (rv); } -#define PMAP_TS_REFERENCED_MAX 5 - /* * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. - * - * XXX: The exact number of bits to check and clear is a matter that - * should be tested and standardized at some point in the future for - * optimal aging of shared pages. * * As an optimization, update the page's dirty field if a modified bit is * found while counting reference bits. This opportunistic update can be * performed at low cost and can eliminate the need for some future calls * to pmap_is_modified(). However, since this function stops after * finding PMAP_TS_REFERENCED_MAX reference bits, it may not detect some * dirty pages. Those dirty pages will only be detected by a future call * to pmap_is_modified(). 
 */
int
pmap_ts_referenced(vm_page_t m)
{
	struct tte *tpf;
	struct tte *tpn;
	struct tte *tp;
	u_long data;
	int count;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_ts_referenced: page %p is not managed", m));
	count = 0;
	rw_wlock(&tte_list_global_lock);
	if ((tp = TAILQ_FIRST(&m->md.tte_list)) != NULL) {
		tpf = tp;
		do {
			tpn = TAILQ_NEXT(tp, tte_link);
			TAILQ_REMOVE(&m->md.tte_list, tp, tte_link);
			TAILQ_INSERT_TAIL(&m->md.tte_list, tp, tte_link);
			if ((tp->tte_data & TD_PV) == 0)
				continue;
			data = atomic_clear_long(&tp->tte_data, TD_REF);
			if ((data & TD_W) != 0)
				vm_page_dirty(m);
			if ((data & TD_REF) != 0 && ++count >=
			    PMAP_TS_REFERENCED_MAX)
				break;
		} while ((tp = tpn) != NULL && tp != tpf);
	}
	rw_wunlock(&tte_list_global_lock);
	return (count);
}

boolean_t
pmap_is_modified(vm_page_t m)
{
	struct tte *tp;
	boolean_t rv;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_is_modified: page %p is not managed", m));
	rv = FALSE;

	/*
	 * If the page is not exclusive busied, then PGA_WRITEABLE cannot be
	 * concurrently set while the object is locked.  Thus, if
	 * PGA_WRITEABLE is clear, no TTEs can have TD_W set.
	 */
	VM_OBJECT_ASSERT_WLOCKED(m->object);
	if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0)
		return (rv);
	rw_wlock(&tte_list_global_lock);
	TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) {
		if ((tp->tte_data & TD_PV) == 0)
			continue;
		if ((tp->tte_data & TD_W) != 0) {
			rv = TRUE;
			break;
		}
	}
	rw_wunlock(&tte_list_global_lock);
	return (rv);
}

/*
 * pmap_is_prefaultable:
 *
 *	Return whether or not the specified virtual address is eligible
 *	for prefault.
 */
boolean_t
pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr)
{
	boolean_t rv;

	PMAP_LOCK(pmap);
	rv = tsb_tte_lookup(pmap, addr) == NULL;
	PMAP_UNLOCK(pmap);
	return (rv);
}

/*
 * Return whether or not the specified physical page was referenced
 * in any physical maps.
 */
boolean_t
pmap_is_referenced(vm_page_t m)
{
	struct tte *tp;
	boolean_t rv;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_is_referenced: page %p is not managed", m));
	rv = FALSE;
	rw_wlock(&tte_list_global_lock);
	TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) {
		if ((tp->tte_data & TD_PV) == 0)
			continue;
		if ((tp->tte_data & TD_REF) != 0) {
			rv = TRUE;
			break;
		}
	}
	rw_wunlock(&tte_list_global_lock);
	return (rv);
}

/*
 * This function is advisory.
 */
void
pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice)
{
}

void
pmap_clear_modify(vm_page_t m)
{
	struct tte *tp;
	u_long data;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_clear_modify: page %p is not managed", m));
	VM_OBJECT_ASSERT_WLOCKED(m->object);
	KASSERT(!vm_page_xbusied(m),
	    ("pmap_clear_modify: page %p is exclusive busied", m));

	/*
	 * If the page is not PGA_WRITEABLE, then no TTEs can have TD_W set.
	 * If the object containing the page is locked and the page is not
	 * exclusive busied, then PGA_WRITEABLE cannot be concurrently set.
	 */
	if ((m->aflags & PGA_WRITEABLE) == 0)
		return;
	rw_wlock(&tte_list_global_lock);
	TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) {
		if ((tp->tte_data & TD_PV) == 0)
			continue;
		data = atomic_clear_long(&tp->tte_data, TD_W);
		if ((data & TD_W) != 0)
			tlb_page_demap(TTE_GET_PMAP(tp), TTE_GET_VA(tp));
	}
	rw_wunlock(&tte_list_global_lock);
}

void
pmap_remove_write(vm_page_t m)
{
	struct tte *tp;
	u_long data;

	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
	    ("pmap_remove_write: page %p is not managed", m));

	/*
	 * If the page is not exclusive busied, then PGA_WRITEABLE cannot be
	 * set by another thread while the object is locked.  Thus,
	 * if PGA_WRITEABLE is clear, no page table entries need updating.
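	 * Otherwise, clear both TD_SW (software write permission) and TD_W
	 * (hardware modified) on every managed mapping below, dirtying the
	 * page and demapping the TTE whenever TD_W was in fact set.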
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&tte_list_global_lock); TAILQ_FOREACH(tp, &m->md.tte_list, tte_link) { if ((tp->tte_data & TD_PV) == 0) continue; data = atomic_clear_long(&tp->tte_data, TD_SW | TD_W); if ((data & TD_W) != 0) { vm_page_dirty(m); tlb_page_demap(TTE_GET_PMAP(tp), TTE_GET_VA(tp)); } } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&tte_list_global_lock); } int pmap_mincore(pmap_t pm, vm_offset_t addr, vm_paddr_t *locked_pa) { /* TODO; */ return (0); } /* * Activate a user pmap. The pmap must be activated before its address space * can be accessed in any way. */ void pmap_activate(struct thread *td) { struct vmspace *vm; struct pmap *pm; int context; critical_enter(); vm = td->td_proc->p_vmspace; pm = vmspace_pmap(vm); context = PCPU_GET(tlb_ctx); if (context == PCPU_GET(tlb_ctx_max)) { tlb_flush_user(); context = PCPU_GET(tlb_ctx_min); } PCPU_SET(tlb_ctx, context + 1); pm->pm_context[curcpu] = context; #ifdef SMP CPU_SET_ATOMIC(PCPU_GET(cpuid), &pm->pm_active); atomic_store_acq_ptr((uintptr_t *)PCPU_PTR(pmap), (uintptr_t)pm); #else CPU_SET(PCPU_GET(cpuid), &pm->pm_active); PCPU_SET(pmap, pm); #endif stxa(AA_DMMU_TSB, ASI_DMMU, pm->pm_tsb); stxa(AA_IMMU_TSB, ASI_IMMU, pm->pm_tsb); stxa(AA_DMMU_PCXR, ASI_DMMU, (ldxa(AA_DMMU_PCXR, ASI_DMMU) & TLB_CXR_PGSZ_MASK) | context); flush(KERNBASE); critical_exit(); } void pmap_sync_icache(pmap_t pm, vm_offset_t va, vm_size_t sz) { } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. */ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { } Index: projects/clang390-import/sys/sys/queue.h =================================================================== --- projects/clang390-import/sys/sys/queue.h (revision 305686) +++ projects/clang390-import/sys/sys/queue.h (revision 305687) @@ -1,787 +1,819 @@ /*- * Copyright (c) 1991, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
 *
 *	@(#)queue.h	8.5 (Berkeley) 8/20/94
 * $FreeBSD$
 */

#ifndef _SYS_QUEUE_H_
#define	_SYS_QUEUE_H_

#include <sys/cdefs.h>

/*
 * This file defines four types of data structures: singly-linked lists,
 * singly-linked tail queues, lists and tail queues.
 *
 * A singly-linked list is headed by a single forward pointer. The elements
 * are singly linked for minimum space and pointer manipulation overhead at
 * the expense of O(n) removal for arbitrary elements. New elements can be
 * added to the list after an existing element or at the head of the list.
 * Elements being removed from the head of the list should use the explicit
 * macro for this purpose for optimum efficiency. A singly-linked list may
 * only be traversed in the forward direction. Singly-linked lists are ideal
 * for applications with large datasets and few or no removals or for
 * implementing a LIFO queue.
 *
 * A singly-linked tail queue is headed by a pair of pointers, one to the
 * head of the list and the other to the tail of the list. The elements are
 * singly linked for minimum space and pointer manipulation overhead at the
 * expense of O(n) removal for arbitrary elements. New elements can be added
 * to the list after an existing element, at the head of the list, or at the
 * end of the list. Elements being removed from the head of the tail queue
 * should use the explicit macro for this purpose for optimum efficiency.
 * A singly-linked tail queue may only be traversed in the forward direction.
 * Singly-linked tail queues are ideal for applications with large datasets
 * and few or no removals or for implementing a FIFO queue.
 *
 * A list is headed by a single forward pointer (or an array of forward
 * pointers for a hash table header). The elements are doubly linked
 * so that an arbitrary element can be removed without a need to
 * traverse the list. New elements can be added to the list before
 * or after an existing element or at the head of the list. A list
 * may be traversed in either direction.
 *
 * A tail queue is headed by a pair of pointers, one to the head of the
 * list and the other to the tail of the list. The elements are doubly
 * linked so that an arbitrary element can be removed without a need to
 * traverse the list. New elements can be added to the list before or
 * after an existing element, at the head of the list, or at the end of
 * the list. A tail queue may be traversed in either direction.
 *
 * For details on the use of these macros, see the queue(3) manual page.
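 *
 * As a quick illustration of the last case, a doubly-ended queue might be
 * declared and walked backwards as follows (identifiers are illustrative
 * only, not part of this API):
 *
 *	struct req {
 *		TAILQ_ENTRY(req) link;
 *	};
 *	TAILQ_HEAD(reqhead, req) q = TAILQ_HEAD_INITIALIZER(q);
 *	struct req *r;
 *
 *	TAILQ_INSERT_TAIL(&q, r, link);
 *	TAILQ_FOREACH_REVERSE(r, &q, reqhead, link)
 *		process(r);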
 *
 * Below is a summary of implemented functions where:
 *  +  means the macro is available
 *  -  means the macro is not available
 *  s  means the macro is available but is slow (runs in O(n) time)
 *
 *				SLIST	LIST	STAILQ	TAILQ
 * _HEAD			+	+	+	+
 * _CLASS_HEAD			+	+	+	+
 * _HEAD_INITIALIZER		+	+	+	+
 * _ENTRY			+	+	+	+
 * _CLASS_ENTRY			+	+	+	+
 * _INIT			+	+	+	+
 * _EMPTY			+	+	+	+
 * _FIRST			+	+	+	+
 * _NEXT			+	+	+	+
 * _PREV			-	+	-	+
 * _LAST			-	-	+	+
 * _FOREACH			+	+	+	+
 * _FOREACH_FROM		+	+	+	+
 * _FOREACH_SAFE		+	+	+	+
 * _FOREACH_FROM_SAFE		+	+	+	+
 * _FOREACH_REVERSE		-	-	-	+
 * _FOREACH_REVERSE_FROM	-	-	-	+
 * _FOREACH_REVERSE_SAFE	-	-	-	+
 * _FOREACH_REVERSE_FROM_SAFE	-	-	-	+
 * _INSERT_HEAD			+	+	+	+
 * _INSERT_BEFORE		-	+	-	+
 * _INSERT_AFTER		+	+	+	+
 * _INSERT_TAIL			-	-	+	+
 * _CONCAT			s	s	+	+
 * _REMOVE_AFTER		+	-	+	-
 * _REMOVE_HEAD			+	-	+	-
 * _REMOVE			s	+	s	+
 * _SWAP			+	+	+	+
 *
 */
#ifdef QUEUE_MACRO_DEBUG
+#warning Use QUEUE_MACRO_DEBUG_TRACE and/or QUEUE_MACRO_DEBUG_TRASH
+#define QUEUE_MACRO_DEBUG_TRACE
+#define QUEUE_MACRO_DEBUG_TRASH
+#endif
+
+#ifdef QUEUE_MACRO_DEBUG_TRACE
/* Store the last 2 places the queue element or head was altered */
struct qm_trace {
	unsigned long	 lastline;
	unsigned long	 prevline;
	const char	*lastfile;
	const char	*prevfile;
};

#define	TRACEBUF	struct qm_trace trace;
#define	TRACEBUF_INITIALIZER	{ __LINE__, 0, __FILE__, NULL } ,
-#define	TRASHIT(x)	do {(x) = (void *)-1;} while (0)
-#define	QMD_SAVELINK(name, link)	void **name = (void *)&(link)

#define	QMD_TRACE_HEAD(head) do {					\
	(head)->trace.prevline = (head)->trace.lastline;		\
	(head)->trace.prevfile = (head)->trace.lastfile;		\
	(head)->trace.lastline = __LINE__;				\
	(head)->trace.lastfile = __FILE__;				\
} while (0)

#define	QMD_TRACE_ELEM(elem) do {					\
	(elem)->trace.prevline = (elem)->trace.lastline;		\
	(elem)->trace.prevfile = (elem)->trace.lastfile;		\
	(elem)->trace.lastline = __LINE__;				\
	(elem)->trace.lastfile = __FILE__;				\
} while (0)

-#else
+#else	/* !QUEUE_MACRO_DEBUG_TRACE */
#define	QMD_TRACE_ELEM(elem)
#define	QMD_TRACE_HEAD(head)
-#define	QMD_SAVELINK(name, link)
#define	TRACEBUF
#define	TRACEBUF_INITIALIZER
+#endif	/* QUEUE_MACRO_DEBUG_TRACE */
+
+#ifdef QUEUE_MACRO_DEBUG_TRASH
+#define	TRASHIT(x)	do {(x) = (void *)-1;} while (0)
+#define	QMD_IS_TRASHED(x)	((x) == (void *)(intptr_t)-1)
+#else	/* !QUEUE_MACRO_DEBUG_TRASH */
#define	TRASHIT(x)
-#endif	/* QUEUE_MACRO_DEBUG */
+#define	QMD_IS_TRASHED(x)	0
+#endif	/* QUEUE_MACRO_DEBUG_TRASH */

+#if defined(QUEUE_MACRO_DEBUG_TRACE) || defined(QUEUE_MACRO_DEBUG_TRASH)
+#define	QMD_SAVELINK(name, link)	void **name = (void *)&(link)
+#else	/* !QUEUE_MACRO_DEBUG_TRACE && !QUEUE_MACRO_DEBUG_TRASH */
+#define	QMD_SAVELINK(name, link)
+#endif	/* QUEUE_MACRO_DEBUG_TRACE || QUEUE_MACRO_DEBUG_TRASH */
+
#ifdef __cplusplus
/*
 * In C++ there can be structure lists and class lists:
 */
#define	QUEUE_TYPEOF(type) type
#else
#define	QUEUE_TYPEOF(type) struct type
#endif

/*
 * Singly-linked List declarations.
 */
#define	SLIST_HEAD(name, type)						\
struct name {								\
	struct type *slh_first;	/* first element */			\
}

#define	SLIST_CLASS_HEAD(name, type)					\
struct name {								\
	class type *slh_first;	/* first element */			\
}

#define	SLIST_HEAD_INITIALIZER(head)					\
	{ NULL }

#define	SLIST_ENTRY(type)						\
struct {								\
	struct type *sle_next;	/* next element */			\
}

#define	SLIST_CLASS_ENTRY(type)						\
struct {								\
	class type *sle_next;	/* next element */			\
}

/*
 * Singly-linked List functions.
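 *
 * A minimal usage sketch, with illustrative identifiers that are not part
 * of this API:
 *
 *	struct entry {
 *		int value;
 *		SLIST_ENTRY(entry) link;
 *	};
 *	SLIST_HEAD(entryhead, entry) head = SLIST_HEAD_INITIALIZER(head);
 *
 *	struct entry *np = malloc(sizeof(*np));
 *	SLIST_INSERT_HEAD(&head, np, link);
 *	SLIST_FOREACH(np, &head, link)
 *		use(np->value);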
*/ +#if (defined(_KERNEL) && defined(INVARIANTS)) +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) do { \ + if (*(prevp) != (elm)) \ + panic("Bad prevptr *(%p) == %p != %p", \ + (prevp), *(prevp), (elm)); \ +} while (0) +#else +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) +#endif + #define SLIST_CONCAT(head1, head2, type, field) do { \ QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head1); \ if (curelm == NULL) { \ if ((SLIST_FIRST(head1) = SLIST_FIRST(head2)) != NULL) \ SLIST_INIT(head2); \ } else if (SLIST_FIRST(head2) != NULL) { \ while (SLIST_NEXT(curelm, field) != NULL) \ curelm = SLIST_NEXT(curelm, field); \ SLIST_NEXT(curelm, field) = SLIST_FIRST(head2); \ SLIST_INIT(head2); \ } \ } while (0) #define SLIST_EMPTY(head) ((head)->slh_first == NULL) #define SLIST_FIRST(head) ((head)->slh_first) #define SLIST_FOREACH(var, head, field) \ for ((var) = SLIST_FIRST((head)); \ (var); \ (var) = SLIST_NEXT((var), field)) #define SLIST_FOREACH_FROM(var, head, field) \ for ((var) = ((var) ? (var) : SLIST_FIRST((head))); \ (var); \ (var) = SLIST_NEXT((var), field)) #define SLIST_FOREACH_SAFE(var, head, field, tvar) \ for ((var) = SLIST_FIRST((head)); \ (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ (var) = (tvar)) #define SLIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ for ((var) = ((var) ? (var) : SLIST_FIRST((head))); \ (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ (var) = (tvar)) #define SLIST_FOREACH_PREVPTR(var, varp, head, field) \ for ((varp) = &SLIST_FIRST((head)); \ ((var) = *(varp)) != NULL; \ (varp) = &SLIST_NEXT((var), field)) #define SLIST_INIT(head) do { \ SLIST_FIRST((head)) = NULL; \ } while (0) #define SLIST_INSERT_AFTER(slistelm, elm, field) do { \ SLIST_NEXT((elm), field) = SLIST_NEXT((slistelm), field); \ SLIST_NEXT((slistelm), field) = (elm); \ } while (0) #define SLIST_INSERT_HEAD(head, elm, field) do { \ SLIST_NEXT((elm), field) = SLIST_FIRST((head)); \ SLIST_FIRST((head)) = (elm); \ } while (0) #define SLIST_NEXT(elm, field) ((elm)->field.sle_next) #define SLIST_REMOVE(head, elm, type, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.sle_next); \ if (SLIST_FIRST((head)) == (elm)) { \ SLIST_REMOVE_HEAD((head), field); \ } \ else { \ QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head); \ while (SLIST_NEXT(curelm, field) != (elm)) \ curelm = SLIST_NEXT(curelm, field); \ SLIST_REMOVE_AFTER(curelm, field); \ } \ TRASHIT(*oldnext); \ } while (0) #define SLIST_REMOVE_AFTER(elm, field) do { \ SLIST_NEXT(elm, field) = \ SLIST_NEXT(SLIST_NEXT(elm, field), field); \ } while (0) #define SLIST_REMOVE_HEAD(head, field) do { \ SLIST_FIRST((head)) = SLIST_NEXT(SLIST_FIRST((head)), field); \ +} while (0) + +#define SLIST_REMOVE_PREVPTR(prevp, elm, field) do { \ + QMD_SLIST_CHECK_PREVPTR(prevp, elm); \ + *(prevp) = SLIST_NEXT(elm, field); \ + TRASHIT((elm)->field.sle_next); \ } while (0) #define SLIST_SWAP(head1, head2, type) do { \ QUEUE_TYPEOF(type) *swap_first = SLIST_FIRST(head1); \ SLIST_FIRST(head1) = SLIST_FIRST(head2); \ SLIST_FIRST(head2) = swap_first; \ } while (0) /* * Singly-linked Tail queue declarations. 
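 *
 * The head/tail pointer pair makes STAILQ a natural FIFO; a sketch with
 * illustrative identifiers:
 *
 *	struct job {
 *		STAILQ_ENTRY(job) link;
 *	};
 *	STAILQ_HEAD(jobq, job) q = STAILQ_HEAD_INITIALIZER(q);
 *	struct job *j;
 *
 *	STAILQ_INSERT_TAIL(&q, j, link);	(enqueue)
 *	j = STAILQ_FIRST(&q);			(peek at the oldest entry)
 *	STAILQ_REMOVE_HEAD(&q, link);		(dequeue)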
*/ #define STAILQ_HEAD(name, type) \ struct name { \ struct type *stqh_first;/* first element */ \ struct type **stqh_last;/* addr of last next element */ \ } #define STAILQ_CLASS_HEAD(name, type) \ struct name { \ class type *stqh_first; /* first element */ \ class type **stqh_last; /* addr of last next element */ \ } #define STAILQ_HEAD_INITIALIZER(head) \ { NULL, &(head).stqh_first } #define STAILQ_ENTRY(type) \ struct { \ struct type *stqe_next; /* next element */ \ } #define STAILQ_CLASS_ENTRY(type) \ struct { \ class type *stqe_next; /* next element */ \ } /* * Singly-linked Tail queue functions. */ #define STAILQ_CONCAT(head1, head2) do { \ if (!STAILQ_EMPTY((head2))) { \ *(head1)->stqh_last = (head2)->stqh_first; \ (head1)->stqh_last = (head2)->stqh_last; \ STAILQ_INIT((head2)); \ } \ } while (0) #define STAILQ_EMPTY(head) ((head)->stqh_first == NULL) #define STAILQ_FIRST(head) ((head)->stqh_first) #define STAILQ_FOREACH(var, head, field) \ for((var) = STAILQ_FIRST((head)); \ (var); \ (var) = STAILQ_NEXT((var), field)) #define STAILQ_FOREACH_FROM(var, head, field) \ for ((var) = ((var) ? (var) : STAILQ_FIRST((head))); \ (var); \ (var) = STAILQ_NEXT((var), field)) #define STAILQ_FOREACH_SAFE(var, head, field, tvar) \ for ((var) = STAILQ_FIRST((head)); \ (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ (var) = (tvar)) #define STAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ for ((var) = ((var) ? (var) : STAILQ_FIRST((head))); \ (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ (var) = (tvar)) #define STAILQ_INIT(head) do { \ STAILQ_FIRST((head)) = NULL; \ (head)->stqh_last = &STAILQ_FIRST((head)); \ } while (0) #define STAILQ_INSERT_AFTER(head, tqelm, elm, field) do { \ if ((STAILQ_NEXT((elm), field) = STAILQ_NEXT((tqelm), field)) == NULL)\ (head)->stqh_last = &STAILQ_NEXT((elm), field); \ STAILQ_NEXT((tqelm), field) = (elm); \ } while (0) #define STAILQ_INSERT_HEAD(head, elm, field) do { \ if ((STAILQ_NEXT((elm), field) = STAILQ_FIRST((head))) == NULL) \ (head)->stqh_last = &STAILQ_NEXT((elm), field); \ STAILQ_FIRST((head)) = (elm); \ } while (0) #define STAILQ_INSERT_TAIL(head, elm, field) do { \ STAILQ_NEXT((elm), field) = NULL; \ *(head)->stqh_last = (elm); \ (head)->stqh_last = &STAILQ_NEXT((elm), field); \ } while (0) #define STAILQ_LAST(head, type, field) \ (STAILQ_EMPTY((head)) ? 
NULL : \ __containerof((head)->stqh_last, \ QUEUE_TYPEOF(type), field.stqe_next)) #define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next) #define STAILQ_REMOVE(head, elm, type, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.stqe_next); \ if (STAILQ_FIRST((head)) == (elm)) { \ STAILQ_REMOVE_HEAD((head), field); \ } \ else { \ QUEUE_TYPEOF(type) *curelm = STAILQ_FIRST(head); \ while (STAILQ_NEXT(curelm, field) != (elm)) \ curelm = STAILQ_NEXT(curelm, field); \ STAILQ_REMOVE_AFTER(head, curelm, field); \ } \ TRASHIT(*oldnext); \ } while (0) #define STAILQ_REMOVE_AFTER(head, elm, field) do { \ if ((STAILQ_NEXT(elm, field) = \ STAILQ_NEXT(STAILQ_NEXT(elm, field), field)) == NULL) \ (head)->stqh_last = &STAILQ_NEXT((elm), field); \ } while (0) #define STAILQ_REMOVE_HEAD(head, field) do { \ if ((STAILQ_FIRST((head)) = \ STAILQ_NEXT(STAILQ_FIRST((head)), field)) == NULL) \ (head)->stqh_last = &STAILQ_FIRST((head)); \ } while (0) #define STAILQ_SWAP(head1, head2, type) do { \ QUEUE_TYPEOF(type) *swap_first = STAILQ_FIRST(head1); \ QUEUE_TYPEOF(type) **swap_last = (head1)->stqh_last; \ STAILQ_FIRST(head1) = STAILQ_FIRST(head2); \ (head1)->stqh_last = (head2)->stqh_last; \ STAILQ_FIRST(head2) = swap_first; \ (head2)->stqh_last = swap_last; \ if (STAILQ_EMPTY(head1)) \ (head1)->stqh_last = &STAILQ_FIRST(head1); \ if (STAILQ_EMPTY(head2)) \ (head2)->stqh_last = &STAILQ_FIRST(head2); \ } while (0) /* * List declarations. */ #define LIST_HEAD(name, type) \ struct name { \ struct type *lh_first; /* first element */ \ } #define LIST_CLASS_HEAD(name, type) \ struct name { \ class type *lh_first; /* first element */ \ } #define LIST_HEAD_INITIALIZER(head) \ { NULL } #define LIST_ENTRY(type) \ struct { \ struct type *le_next; /* next element */ \ struct type **le_prev; /* address of previous next element */ \ } #define LIST_CLASS_ENTRY(type) \ struct { \ class type *le_next; /* next element */ \ class type **le_prev; /* address of previous next element */ \ } /* * List functions. 
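 *
 * Compared with SLIST, the back pointer lets an element unlink itself
 * without a list walk; an illustrative sketch:
 *
 *	struct item {
 *		LIST_ENTRY(item) link;
 *	};
 *	LIST_HEAD(itemhead, item) head = LIST_HEAD_INITIALIZER(head);
 *	struct item *ip;
 *
 *	LIST_INSERT_HEAD(&head, ip, link);
 *	LIST_REMOVE(ip, link);		(needs neither the head nor a search)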
*/ #if (defined(_KERNEL) && defined(INVARIANTS)) #define QMD_LIST_CHECK_HEAD(head, field) do { \ if (LIST_FIRST((head)) != NULL && \ LIST_FIRST((head))->field.le_prev != \ &LIST_FIRST((head))) \ panic("Bad list head %p first->prev != head", (head)); \ } while (0) #define QMD_LIST_CHECK_NEXT(elm, field) do { \ if (LIST_NEXT((elm), field) != NULL && \ LIST_NEXT((elm), field)->field.le_prev != \ &((elm)->field.le_next)) \ panic("Bad link elm %p next->prev != elm", (elm)); \ } while (0) #define QMD_LIST_CHECK_PREV(elm, field) do { \ if (*(elm)->field.le_prev != (elm)) \ panic("Bad link elm %p prev->next != elm", (elm)); \ } while (0) #else #define QMD_LIST_CHECK_HEAD(head, field) #define QMD_LIST_CHECK_NEXT(elm, field) #define QMD_LIST_CHECK_PREV(elm, field) #endif /* (_KERNEL && INVARIANTS) */ #define LIST_CONCAT(head1, head2, type, field) do { \ QUEUE_TYPEOF(type) *curelm = LIST_FIRST(head1); \ if (curelm == NULL) { \ if ((LIST_FIRST(head1) = LIST_FIRST(head2)) != NULL) { \ LIST_FIRST(head2)->field.le_prev = \ &LIST_FIRST((head1)); \ LIST_INIT(head2); \ } \ } else if (LIST_FIRST(head2) != NULL) { \ while (LIST_NEXT(curelm, field) != NULL) \ curelm = LIST_NEXT(curelm, field); \ LIST_NEXT(curelm, field) = LIST_FIRST(head2); \ LIST_FIRST(head2)->field.le_prev = &LIST_NEXT(curelm, field); \ LIST_INIT(head2); \ } \ } while (0) #define LIST_EMPTY(head) ((head)->lh_first == NULL) #define LIST_FIRST(head) ((head)->lh_first) #define LIST_FOREACH(var, head, field) \ for ((var) = LIST_FIRST((head)); \ (var); \ (var) = LIST_NEXT((var), field)) #define LIST_FOREACH_FROM(var, head, field) \ for ((var) = ((var) ? (var) : LIST_FIRST((head))); \ (var); \ (var) = LIST_NEXT((var), field)) #define LIST_FOREACH_SAFE(var, head, field, tvar) \ for ((var) = LIST_FIRST((head)); \ (var) && ((tvar) = LIST_NEXT((var), field), 1); \ (var) = (tvar)) #define LIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ for ((var) = ((var) ? (var) : LIST_FIRST((head))); \ (var) && ((tvar) = LIST_NEXT((var), field), 1); \ (var) = (tvar)) #define LIST_INIT(head) do { \ LIST_FIRST((head)) = NULL; \ } while (0) #define LIST_INSERT_AFTER(listelm, elm, field) do { \ QMD_LIST_CHECK_NEXT(listelm, field); \ if ((LIST_NEXT((elm), field) = LIST_NEXT((listelm), field)) != NULL)\ LIST_NEXT((listelm), field)->field.le_prev = \ &LIST_NEXT((elm), field); \ LIST_NEXT((listelm), field) = (elm); \ (elm)->field.le_prev = &LIST_NEXT((listelm), field); \ } while (0) #define LIST_INSERT_BEFORE(listelm, elm, field) do { \ QMD_LIST_CHECK_PREV(listelm, field); \ (elm)->field.le_prev = (listelm)->field.le_prev; \ LIST_NEXT((elm), field) = (listelm); \ *(listelm)->field.le_prev = (elm); \ (listelm)->field.le_prev = &LIST_NEXT((elm), field); \ } while (0) #define LIST_INSERT_HEAD(head, elm, field) do { \ QMD_LIST_CHECK_HEAD((head), field); \ if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \ LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\ LIST_FIRST((head)) = (elm); \ (elm)->field.le_prev = &LIST_FIRST((head)); \ } while (0) #define LIST_NEXT(elm, field) ((elm)->field.le_next) #define LIST_PREV(elm, head, type, field) \ ((elm)->field.le_prev == &LIST_FIRST((head)) ? 
NULL : \ __containerof((elm)->field.le_prev, \ QUEUE_TYPEOF(type), field.le_next)) #define LIST_REMOVE(elm, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.le_next); \ QMD_SAVELINK(oldprev, (elm)->field.le_prev); \ QMD_LIST_CHECK_NEXT(elm, field); \ QMD_LIST_CHECK_PREV(elm, field); \ if (LIST_NEXT((elm), field) != NULL) \ LIST_NEXT((elm), field)->field.le_prev = \ (elm)->field.le_prev; \ *(elm)->field.le_prev = LIST_NEXT((elm), field); \ TRASHIT(*oldnext); \ TRASHIT(*oldprev); \ } while (0) #define LIST_SWAP(head1, head2, type, field) do { \ QUEUE_TYPEOF(type) *swap_tmp = LIST_FIRST(head1); \ LIST_FIRST((head1)) = LIST_FIRST((head2)); \ LIST_FIRST((head2)) = swap_tmp; \ if ((swap_tmp = LIST_FIRST((head1))) != NULL) \ swap_tmp->field.le_prev = &LIST_FIRST((head1)); \ if ((swap_tmp = LIST_FIRST((head2))) != NULL) \ swap_tmp->field.le_prev = &LIST_FIRST((head2)); \ } while (0) /* * Tail queue declarations. */ #define TAILQ_HEAD(name, type) \ struct name { \ struct type *tqh_first; /* first element */ \ struct type **tqh_last; /* addr of last next element */ \ TRACEBUF \ } #define TAILQ_CLASS_HEAD(name, type) \ struct name { \ class type *tqh_first; /* first element */ \ class type **tqh_last; /* addr of last next element */ \ TRACEBUF \ } #define TAILQ_HEAD_INITIALIZER(head) \ { NULL, &(head).tqh_first, TRACEBUF_INITIALIZER } #define TAILQ_ENTRY(type) \ struct { \ struct type *tqe_next; /* next element */ \ struct type **tqe_prev; /* address of previous next element */ \ TRACEBUF \ } #define TAILQ_CLASS_ENTRY(type) \ struct { \ class type *tqe_next; /* next element */ \ class type **tqe_prev; /* address of previous next element */ \ TRACEBUF \ } /* * Tail queue functions. */ #if (defined(_KERNEL) && defined(INVARIANTS)) #define QMD_TAILQ_CHECK_HEAD(head, field) do { \ if (!TAILQ_EMPTY(head) && \ TAILQ_FIRST((head))->field.tqe_prev != \ &TAILQ_FIRST((head))) \ panic("Bad tailq head %p first->prev != head", (head)); \ } while (0) #define QMD_TAILQ_CHECK_TAIL(head, field) do { \ if (*(head)->tqh_last != NULL) \ panic("Bad tailq NEXT(%p->tqh_last) != NULL", (head)); \ } while (0) #define QMD_TAILQ_CHECK_NEXT(elm, field) do { \ if (TAILQ_NEXT((elm), field) != NULL && \ TAILQ_NEXT((elm), field)->field.tqe_prev != \ &((elm)->field.tqe_next)) \ panic("Bad link elm %p next->prev != elm", (elm)); \ } while (0) #define QMD_TAILQ_CHECK_PREV(elm, field) do { \ if (*(elm)->field.tqe_prev != (elm)) \ panic("Bad link elm %p prev->next != elm", (elm)); \ } while (0) #else #define QMD_TAILQ_CHECK_HEAD(head, field) #define QMD_TAILQ_CHECK_TAIL(head, headname) #define QMD_TAILQ_CHECK_NEXT(elm, field) #define QMD_TAILQ_CHECK_PREV(elm, field) #endif /* (_KERNEL && INVARIANTS) */ #define TAILQ_CONCAT(head1, head2, field) do { \ if (!TAILQ_EMPTY(head2)) { \ *(head1)->tqh_last = (head2)->tqh_first; \ (head2)->tqh_first->field.tqe_prev = (head1)->tqh_last; \ (head1)->tqh_last = (head2)->tqh_last; \ TAILQ_INIT((head2)); \ QMD_TRACE_HEAD(head1); \ QMD_TRACE_HEAD(head2); \ } \ } while (0) #define TAILQ_EMPTY(head) ((head)->tqh_first == NULL) #define TAILQ_FIRST(head) ((head)->tqh_first) #define TAILQ_FOREACH(var, head, field) \ for ((var) = TAILQ_FIRST((head)); \ (var); \ (var) = TAILQ_NEXT((var), field)) #define TAILQ_FOREACH_FROM(var, head, field) \ for ((var) = ((var) ? 
(var) : TAILQ_FIRST((head))); \ (var); \ (var) = TAILQ_NEXT((var), field)) #define TAILQ_FOREACH_SAFE(var, head, field, tvar) \ for ((var) = TAILQ_FIRST((head)); \ (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \ (var) = (tvar)) #define TAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ for ((var) = ((var) ? (var) : TAILQ_FIRST((head))); \ (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \ (var) = (tvar)) #define TAILQ_FOREACH_REVERSE(var, head, headname, field) \ for ((var) = TAILQ_LAST((head), headname); \ (var); \ (var) = TAILQ_PREV((var), headname, field)) #define TAILQ_FOREACH_REVERSE_FROM(var, head, headname, field) \ for ((var) = ((var) ? (var) : TAILQ_LAST((head), headname)); \ (var); \ (var) = TAILQ_PREV((var), headname, field)) #define TAILQ_FOREACH_REVERSE_SAFE(var, head, headname, field, tvar) \ for ((var) = TAILQ_LAST((head), headname); \ (var) && ((tvar) = TAILQ_PREV((var), headname, field), 1); \ (var) = (tvar)) #define TAILQ_FOREACH_REVERSE_FROM_SAFE(var, head, headname, field, tvar) \ for ((var) = ((var) ? (var) : TAILQ_LAST((head), headname)); \ (var) && ((tvar) = TAILQ_PREV((var), headname, field), 1); \ (var) = (tvar)) #define TAILQ_INIT(head) do { \ TAILQ_FIRST((head)) = NULL; \ (head)->tqh_last = &TAILQ_FIRST((head)); \ QMD_TRACE_HEAD(head); \ } while (0) #define TAILQ_INSERT_AFTER(head, listelm, elm, field) do { \ QMD_TAILQ_CHECK_NEXT(listelm, field); \ if ((TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field)) != NULL)\ TAILQ_NEXT((elm), field)->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else { \ (head)->tqh_last = &TAILQ_NEXT((elm), field); \ QMD_TRACE_HEAD(head); \ } \ TAILQ_NEXT((listelm), field) = (elm); \ (elm)->field.tqe_prev = &TAILQ_NEXT((listelm), field); \ QMD_TRACE_ELEM(&(elm)->field); \ QMD_TRACE_ELEM(&(listelm)->field); \ } while (0) #define TAILQ_INSERT_BEFORE(listelm, elm, field) do { \ QMD_TAILQ_CHECK_PREV(listelm, field); \ (elm)->field.tqe_prev = (listelm)->field.tqe_prev; \ TAILQ_NEXT((elm), field) = (listelm); \ *(listelm)->field.tqe_prev = (elm); \ (listelm)->field.tqe_prev = &TAILQ_NEXT((elm), field); \ QMD_TRACE_ELEM(&(elm)->field); \ QMD_TRACE_ELEM(&(listelm)->field); \ } while (0) #define TAILQ_INSERT_HEAD(head, elm, field) do { \ QMD_TAILQ_CHECK_HEAD(head, field); \ if ((TAILQ_NEXT((elm), field) = TAILQ_FIRST((head))) != NULL) \ TAILQ_FIRST((head))->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else \ (head)->tqh_last = &TAILQ_NEXT((elm), field); \ TAILQ_FIRST((head)) = (elm); \ (elm)->field.tqe_prev = &TAILQ_FIRST((head)); \ QMD_TRACE_HEAD(head); \ QMD_TRACE_ELEM(&(elm)->field); \ } while (0) #define TAILQ_INSERT_TAIL(head, elm, field) do { \ QMD_TAILQ_CHECK_TAIL(head, field); \ TAILQ_NEXT((elm), field) = NULL; \ (elm)->field.tqe_prev = (head)->tqh_last; \ *(head)->tqh_last = (elm); \ (head)->tqh_last = &TAILQ_NEXT((elm), field); \ QMD_TRACE_HEAD(head); \ QMD_TRACE_ELEM(&(elm)->field); \ } while (0) #define TAILQ_LAST(head, headname) \ (*(((struct headname *)((head)->tqh_last))->tqh_last)) #define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next) #define TAILQ_PREV(elm, headname, field) \ (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last)) #define TAILQ_REMOVE(head, elm, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.tqe_next); \ QMD_SAVELINK(oldprev, (elm)->field.tqe_prev); \ QMD_TAILQ_CHECK_NEXT(elm, field); \ QMD_TAILQ_CHECK_PREV(elm, field); \ if ((TAILQ_NEXT((elm), field)) != NULL) \ TAILQ_NEXT((elm), field)->field.tqe_prev = \ (elm)->field.tqe_prev; \ else { \ (head)->tqh_last = (elm)->field.tqe_prev; \ 
QMD_TRACE_HEAD(head); \ } \ *(elm)->field.tqe_prev = TAILQ_NEXT((elm), field); \ TRASHIT(*oldnext); \ TRASHIT(*oldprev); \ QMD_TRACE_ELEM(&(elm)->field); \ } while (0) #define TAILQ_SWAP(head1, head2, type, field) do { \ QUEUE_TYPEOF(type) *swap_first = (head1)->tqh_first; \ QUEUE_TYPEOF(type) **swap_last = (head1)->tqh_last; \ (head1)->tqh_first = (head2)->tqh_first; \ (head1)->tqh_last = (head2)->tqh_last; \ (head2)->tqh_first = swap_first; \ (head2)->tqh_last = swap_last; \ if ((swap_first = (head1)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head1)->tqh_first; \ else \ (head1)->tqh_last = &(head1)->tqh_first; \ if ((swap_first = (head2)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head2)->tqh_first; \ else \ (head2)->tqh_last = &(head2)->tqh_first; \ } while (0) #endif /* !_SYS_QUEUE_H_ */ Index: projects/clang390-import/sys/vm/pmap.h =================================================================== --- projects/clang390-import/sys/vm/pmap.h (revision 305686) +++ projects/clang390-import/sys/vm/pmap.h (revision 305687) @@ -1,161 +1,171 @@ /*- * Copyright (c) 1991, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * The Mach Operating System project at Carnegie-Mellon University. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.h 8.1 (Berkeley) 6/11/93 * * * Copyright (c) 1987, 1990 Carnegie-Mellon University. * All rights reserved. * * Author: Avadis Tevanian, Jr. * * Permission to use, copy, modify and distribute this software and * its documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND * FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. 
 *
 * Carnegie Mellon requests users of this software to return to
 *
 *  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 *  School of Computer Science
 *  Carnegie Mellon University
 *  Pittsburgh PA 15213-3890
 *
 * any improvements or extensions that they make and grant Carnegie the
 * rights to redistribute these changes.
 *
 * $FreeBSD$
 */

/*
 * Machine address mapping definitions -- machine-independent
 * section.  [For machine-dependent section, see "machine/pmap.h".]
 */

#ifndef	_PMAP_VM_
#define	_PMAP_VM_

/*
 * Each machine-dependent implementation is expected to keep certain
 * statistics.  They may do this any way they so choose, but are expected
 * to return the statistics in the following structure.
 */
struct pmap_statistics {
	long resident_count;	/* # of pages mapped (total) */
	long wired_count;	/* # of pages wired */
};
typedef struct pmap_statistics *pmap_statistics_t;

/*
 * Each machine-dependent implementation is required to provide:
 *
 * vm_memattr_t	pmap_page_get_memattr(vm_page_t);
 * boolean_t	pmap_page_is_mapped(vm_page_t);
 * boolean_t	pmap_page_is_write_mapped(vm_page_t);
 * void		pmap_page_set_memattr(vm_page_t, vm_memattr_t);
 */
#include <machine/pmap.h>

#ifdef _KERNEL
struct thread;
/*
 * Updates to kernel_vm_end are synchronized by the kernel_map's system mutex.
 */
extern vm_offset_t kernel_vm_end;

/*
 * Flags for pmap_enter().  The bits in the low-order byte are reserved
 * for the protection code (vm_prot_t) that describes the fault type.
 */
#define	PMAP_ENTER_NOSLEEP	0x0100
#define	PMAP_ENTER_WIRED	0x0200

+/*
+ * Define the maximum number of machine-dependent reference bits that are
+ * cleared by a call to pmap_ts_referenced().  This limit serves two purposes.
+ * First, it bounds the cost of reference bit maintenance on widely shared
+ * pages.  Second, it prevents numeric overflow during maintenance of a
+ * widely shared page's "act_count" field.  An overflow could result in the
+ * premature deactivation of the page.
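+ *
+ * (For scale: "act_count" is a single-byte counter in struct vm_page, so
+ * an unbounded count returned for a page with many thousands of mappings
+ * could wrap it before the usual clamping applies; a small cap such as 5
+ * keeps each update far below that limit.)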
+ */
+#define	PMAP_TS_REFERENCED_MAX	5
+
void		 pmap_activate(struct thread *td);
void		 pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva,
		    int advice);
void		 pmap_align_superpage(vm_object_t, vm_ooffset_t, vm_offset_t *,
		    vm_size_t);
void		 pmap_clear_modify(vm_page_t m);
void		 pmap_copy(pmap_t, pmap_t, vm_offset_t, vm_size_t, vm_offset_t);
void		 pmap_copy_page(vm_page_t, vm_page_t);
void		 pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset,
		    vm_page_t mb[], vm_offset_t b_offset, int xfersize);
int		 pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m,
		    vm_prot_t prot, u_int flags, int8_t psind);
void		 pmap_enter_object(pmap_t pmap, vm_offset_t start,
		    vm_offset_t end, vm_page_t m_start, vm_prot_t prot);
void		 pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m,
		    vm_prot_t prot);
vm_paddr_t	 pmap_extract(pmap_t pmap, vm_offset_t va);
vm_page_t	 pmap_extract_and_hold(pmap_t pmap, vm_offset_t va,
		    vm_prot_t prot);
void		 pmap_growkernel(vm_offset_t);
void		 pmap_init(void);
boolean_t	 pmap_is_modified(vm_page_t m);
boolean_t	 pmap_is_prefaultable(pmap_t pmap, vm_offset_t va);
boolean_t	 pmap_is_referenced(vm_page_t m);
vm_offset_t	 pmap_map(vm_offset_t *, vm_paddr_t, vm_paddr_t, int);
int		 pmap_mincore(pmap_t pmap, vm_offset_t addr,
		    vm_paddr_t *locked_pa);
void		 pmap_object_init_pt(pmap_t pmap, vm_offset_t addr,
		    vm_object_t object, vm_pindex_t pindex, vm_size_t size);
boolean_t	 pmap_page_exists_quick(pmap_t pmap, vm_page_t m);
void		 pmap_page_init(vm_page_t m);
int		 pmap_page_wired_mappings(vm_page_t m);
int		 pmap_pinit(pmap_t);
void		 pmap_pinit0(pmap_t);
void		 pmap_protect(pmap_t, vm_offset_t, vm_offset_t, vm_prot_t);
void		 pmap_qenter(vm_offset_t, vm_page_t *, int);
void		 pmap_qremove(vm_offset_t, int);
vm_offset_t	 pmap_quick_enter_page(vm_page_t);
void		 pmap_quick_remove_page(vm_offset_t);
void		 pmap_release(pmap_t);
void		 pmap_remove(pmap_t, vm_offset_t, vm_offset_t);
void		 pmap_remove_all(vm_page_t m);
void		 pmap_remove_pages(pmap_t);
void		 pmap_remove_write(vm_page_t m);
void		 pmap_sync_icache(pmap_t, vm_offset_t, vm_size_t);
int		 pmap_ts_referenced(vm_page_t m);
void		 pmap_unwire(pmap_t pmap, vm_offset_t start, vm_offset_t end);
void		 pmap_zero_page(vm_page_t);
void		 pmap_zero_page_area(vm_page_t, int off, int size);

#define	pmap_resident_count(pm)	((pm)->pm_stats.resident_count)
#define	pmap_wired_count(pm)	((pm)->pm_stats.wired_count)

#endif /* _KERNEL */

#endif /* _PMAP_VM_ */
Index: projects/clang390-import/tests/sys/kern/Makefile
===================================================================
--- projects/clang390-import/tests/sys/kern/Makefile	(revision 305686)
+++ projects/clang390-import/tests/sys/kern/Makefile	(revision 305687)
@@ -1,39 +1,40 @@
# $FreeBSD$

TESTSRC=	${SRCTOP}/contrib/netbsd-tests/kernel
.PATH: ${SRCTOP}/sys/kern

TESTSDIR=	${TESTSBASE}/sys/kern

ATF_TESTS_C+=	kern_copyin
ATF_TESTS_C+=	kern_descrip_test
ATF_TESTS_C+=	ptrace_test
PLAIN_TESTS_C+=	subr_unit_test
ATF_TESTS_C+=	unix_seqpacket_test
ATF_TESTS_C+=	unix_passfd_test
TEST_METADATA.unix_seqpacket_test+=	timeout="15"
+ATF_TESTS_C+=	waitpid_nohang
LIBADD.ptrace_test+=	pthread
LIBADD.unix_seqpacket_test+=	pthread
NETBSD_ATF_TESTS_C+=	lockf_test
NETBSD_ATF_TESTS_C+=	mqueue_test

CFLAGS.mqueue_test+=	-I${SRCTOP}/tests
LIBADD.mqueue_test+=	rt

# subr_unit.c contains functions whose prototypes lie in headers that cannot be
# included in userland.  But as far as subr_unit_test goes, they're effectively
# static.  So it's ok to disable -Wmissing-prototypes for this program.
CFLAGS.subr_unit.c+=	-Wno-missing-prototypes
SRCS.subr_unit_test+=	subr_unit.c

WARNS?=	5

TESTS_SUBDIRS+=	acct
TESTS_SUBDIRS+=	execve
TESTS_SUBDIRS+=	pipe

.include <bsd.own.mk>
.include <bsd.test.mk>
Index: projects/clang390-import/tests/sys/kern/waitpid_nohang.c
===================================================================
--- projects/clang390-import/tests/sys/kern/waitpid_nohang.c	(nonexistent)
+++ projects/clang390-import/tests/sys/kern/waitpid_nohang.c	(revision 305687)
@@ -0,0 +1,70 @@
+/*-
+ * Copyright (c) 2016 Jilles Tjoelker
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/wait.h>
+
+#include <atf-c.h>
+#include <signal.h>
+#include <unistd.h>
+
+ATF_TC_WITHOUT_HEAD(waitpid_nohang);
+ATF_TC_BODY(waitpid_nohang, tc)
+{
+	pid_t child, pid;
+	int status, r;
+
+	child = fork();
+	ATF_REQUIRE(child != -1);
+	if (child == 0) {
+		sleep(10);
+		_exit(1);
+	}
+
+	status = 42;
+	pid = waitpid(child, &status, WNOHANG);
+	ATF_REQUIRE(pid == 0);
+	ATF_CHECK(status == 42);
+
+	r = kill(child, SIGTERM);
+	ATF_REQUIRE(r == 0);
+	r = waitid(P_PID, child, NULL, WEXITED | WNOWAIT);
+	ATF_REQUIRE(r == 0);
+
+	status = -1;
+	pid = waitpid(child, &status, WNOHANG);
+	ATF_REQUIRE(pid == child);
+	ATF_CHECK(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM);
+}
+
+ATF_TP_ADD_TCS(tp)
+{
+
+	ATF_TP_ADD_TC(tp, waitpid_nohang);
+	return (atf_no_error());
+}
Property changes on: projects/clang390-import/tests/sys/kern/waitpid_nohang.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/clang390-import/tools/regression/capsicum/libcasper/dns.c
===================================================================
--- projects/clang390-import/tools/regression/capsicum/libcasper/dns.c	(revision 305686)
+++ projects/clang390-import/tools/regression/capsicum/libcasper/dns.c	(nonexistent)
@@ -1,702 +0,0 @@
-/*-
- * Copyright (c) 2013 The FreeBSD Foundation
- * All rights reserved.
- *
- * This software was developed by Pawel Jakub Dawidek under sponsorship from
- * the FreeBSD Foundation.
- * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. - */ - -#include -__FBSDID("$FreeBSD$"); - -#include - -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -#include - -static int ntest = 1; - -#define CHECK(expr) do { \ - if ((expr)) \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - else \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - ntest++; \ -} while (0) -#define CHECKX(expr) do { \ - if ((expr)) { \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - } else { \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - exit(1); \ - } \ - ntest++; \ -} while (0) - -#define GETHOSTBYNAME 0x01 -#define GETHOSTBYNAME2_AF_INET 0x02 -#define GETHOSTBYNAME2_AF_INET6 0x04 -#define GETHOSTBYADDR_AF_INET 0x08 -#define GETHOSTBYADDR_AF_INET6 0x10 -#define GETADDRINFO_AF_UNSPEC 0x20 -#define GETADDRINFO_AF_INET 0x40 -#define GETADDRINFO_AF_INET6 0x80 - -static bool -addrinfo_compare(struct addrinfo *ai0, struct addrinfo *ai1) -{ - struct addrinfo *at0, *at1; - - if (ai0 == NULL && ai1 == NULL) - return (true); - if (ai0 == NULL || ai1 == NULL) - return (false); - - at0 = ai0; - at1 = ai1; - while (true) { - if ((at0->ai_flags == at1->ai_flags) && - (at0->ai_family == at1->ai_family) && - (at0->ai_socktype == at1->ai_socktype) && - (at0->ai_protocol == at1->ai_protocol) && - (at0->ai_addrlen == at1->ai_addrlen) && - (memcmp(at0->ai_addr, at1->ai_addr, - at0->ai_addrlen) == 0)) { - if (at0->ai_canonname != NULL && - at1->ai_canonname != NULL) { - if (strcmp(at0->ai_canonname, - at1->ai_canonname) != 0) { - return (false); - } - } - - if (at0->ai_canonname == NULL && - at1->ai_canonname != NULL) { - return (false); - } - if (at0->ai_canonname != NULL && - at1->ai_canonname == NULL) { - return (false); - } - - if (at0->ai_next == NULL && at1->ai_next == NULL) - return (true); - if (at0->ai_next == NULL || at1->ai_next == NULL) - return (false); - - at0 = at0->ai_next; - at1 = at1->ai_next; - } else { - return (false); - } - } - - /* NOTREACHED */ - fprintf(stderr, "Dead code reached in addrinfo_compare()\n"); - exit(1); -} - -static bool -hostent_aliases_compare(char **aliases0, char **aliases1) -{ - int i0, i1; - - if (aliases0 == NULL && aliases1 == NULL) - return (true); - 
if (aliases0 == NULL || aliases1 == NULL) - return (false); - - for (i0 = 0; aliases0[i0] != NULL; i0++) { - for (i1 = 0; aliases1[i1] != NULL; i1++) { - if (strcmp(aliases0[i0], aliases1[i1]) == 0) - break; - } - if (aliases1[i1] == NULL) - return (false); - } - - return (true); -} - -static bool -hostent_addr_list_compare(char **addr_list0, char **addr_list1, int length) -{ - int i0, i1; - - if (addr_list0 == NULL && addr_list1 == NULL) - return (true); - if (addr_list0 == NULL || addr_list1 == NULL) - return (false); - - for (i0 = 0; addr_list0[i0] != NULL; i0++) { - for (i1 = 0; addr_list1[i1] != NULL; i1++) { - if (memcmp(addr_list0[i0], addr_list1[i1], length) == 0) - break; - } - if (addr_list1[i1] == NULL) - return (false); - } - - return (true); -} - -static bool -hostent_compare(const struct hostent *hp0, const struct hostent *hp1) -{ - - if (hp0 == NULL && hp1 != NULL) - return (true); - - if (hp0 == NULL || hp1 == NULL) - return (false); - - if (hp0->h_name != NULL || hp1->h_name != NULL) { - if (hp0->h_name == NULL || hp1->h_name == NULL) - return (false); - if (strcmp(hp0->h_name, hp1->h_name) != 0) - return (false); - } - - if (!hostent_aliases_compare(hp0->h_aliases, hp1->h_aliases)) - return (false); - if (!hostent_aliases_compare(hp1->h_aliases, hp0->h_aliases)) - return (false); - - if (hp0->h_addrtype != hp1->h_addrtype) - return (false); - - if (hp0->h_length != hp1->h_length) - return (false); - - if (!hostent_addr_list_compare(hp0->h_addr_list, hp1->h_addr_list, - hp0->h_length)) { - return (false); - } - if (!hostent_addr_list_compare(hp1->h_addr_list, hp0->h_addr_list, - hp0->h_length)) { - return (false); - } - - return (true); -} - -static unsigned int -runtest(cap_channel_t *capdns) -{ - unsigned int result; - struct addrinfo *ais, *aic, hints, *hintsp; - struct hostent *hps, *hpc; - struct in_addr ip4; - struct in6_addr ip6; - - result = 0; - - hps = gethostbyname("example.com"); - if (hps == NULL) - fprintf(stderr, "Unable to resolve %s IPv4.\n", "example.com"); - hpc = cap_gethostbyname(capdns, "example.com"); - if (hostent_compare(hps, hpc)) - result |= GETHOSTBYNAME; - - hps = gethostbyname2("example.com", AF_INET); - if (hps == NULL) - fprintf(stderr, "Unable to resolve %s IPv4.\n", "example.com"); - hpc = cap_gethostbyname2(capdns, "example.com", AF_INET); - if (hostent_compare(hps, hpc)) - result |= GETHOSTBYNAME2_AF_INET; - - hps = gethostbyname2("example.com", AF_INET6); - if (hps == NULL) - fprintf(stderr, "Unable to resolve %s IPv6.\n", "example.com"); - hpc = cap_gethostbyname2(capdns, "example.com", AF_INET6); - if (hostent_compare(hps, hpc)) - result |= GETHOSTBYNAME2_AF_INET6; - - hints.ai_flags = 0; - hints.ai_family = AF_UNSPEC; - hints.ai_socktype = 0; - hints.ai_protocol = 0; - hints.ai_addrlen = 0; - hints.ai_addr = NULL; - hints.ai_canonname = NULL; - hints.ai_next = NULL; - - hintsp = &hints; - - if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { - fprintf(stderr, - "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", - gai_strerror(errno)); - } - if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { - if (addrinfo_compare(ais, aic)) - result |= GETADDRINFO_AF_UNSPEC; - freeaddrinfo(ais); - freeaddrinfo(aic); - } - - hints.ai_family = AF_INET; - if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { - fprintf(stderr, - "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", - gai_strerror(errno)); - } - if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { - if 
(addrinfo_compare(ais, aic)) - result |= GETADDRINFO_AF_INET; - freeaddrinfo(ais); - freeaddrinfo(aic); - } - - hints.ai_family = AF_INET6; - if (getaddrinfo("freebsd.org", "25", hintsp, &ais) != 0) { - fprintf(stderr, - "Unable to issue [system] getaddrinfo() for AF_UNSPEC: %s\n", - gai_strerror(errno)); - } - if (cap_getaddrinfo(capdns, "freebsd.org", "25", hintsp, &aic) == 0) { - if (addrinfo_compare(ais, aic)) - result |= GETADDRINFO_AF_INET6; - freeaddrinfo(ais); - freeaddrinfo(aic); - } - - /* - * 8.8.178.135 is IPv4 address of freefall.freebsd.org - * as of 27 October 2013. - */ - inet_pton(AF_INET, "8.8.178.135", &ip4); - hps = gethostbyaddr(&ip4, sizeof(ip4), AF_INET); - if (hps == NULL) - fprintf(stderr, "Unable to resolve %s.\n", "8.8.178.135"); - hpc = cap_gethostbyaddr(capdns, &ip4, sizeof(ip4), AF_INET); - if (hostent_compare(hps, hpc)) - result |= GETHOSTBYADDR_AF_INET; - - /* - * 2001:1900:2254:206c::16:87 is IPv6 address of freefall.freebsd.org - * as of 27 October 2013. - */ - inet_pton(AF_INET6, "2001:1900:2254:206c::16:87", &ip6); - hps = gethostbyaddr(&ip6, sizeof(ip6), AF_INET6); - if (hps == NULL) { - fprintf(stderr, "Unable to resolve %s.\n", - "2001:1900:2254:206c::16:87"); - } - hpc = cap_gethostbyaddr(capdns, &ip6, sizeof(ip6), AF_INET6); - if (hostent_compare(hps, hpc)) - result |= GETHOSTBYADDR_AF_INET6; - - return (result); -} - -int -main(void) -{ - cap_channel_t *capcas, *capdns, *origcapdns; - const char *types[2]; - int families[2]; - - printf("1..91\n"); - - capcas = cap_init(); - CHECKX(capcas != NULL); - - origcapdns = capdns = cap_service_open(capcas, "system.dns"); - CHECKX(capdns != NULL); - - cap_close(capcas); - - /* No limits set. */ - - CHECK(runtest(capdns) == - (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6 | - GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_UNSPEC | GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); - - /* - * Allow: - * type: NAME, ADDR - * family: AF_INET, AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - - CHECK(runtest(capdns) == - (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6 | - GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); - - cap_close(capdns); - - /* - * Allow: - * type: NAME - * family: AF_INET, AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - - CHECK(runtest(capdns) == - (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYNAME2_AF_INET6)); - - cap_close(capdns); - - /* - * Allow: - * type: ADDR - * family: AF_INET, AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - 
families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - - CHECK(runtest(capdns) == - (GETHOSTBYADDR_AF_INET | GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_INET | GETADDRINFO_AF_INET6)); - - cap_close(capdns); - - /* - * Allow: - * type: NAME, ADDR - * family: AF_INET - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == - (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET | GETHOSTBYADDR_AF_INET | - GETADDRINFO_AF_INET)); - - cap_close(capdns); - - /* - * Allow: - * type: NAME, ADDR - * family: AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == - (GETHOSTBYNAME2_AF_INET6 | GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_INET6)); - - cap_close(capdns); - - /* Below we also test further limiting capability. */ - - /* - * Allow: - * type: NAME - * family: AF_INET - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET)); - - cap_close(capdns); - - /* - * Allow: - * type: NAME - * family: AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == 
GETHOSTBYNAME2_AF_INET6); - - cap_close(capdns); - - /* - * Allow: - * type: ADDR - * family: AF_INET - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET | GETADDRINFO_AF_INET)); - - cap_close(capdns); - - /* - * Allow: - * type: ADDR - * family: AF_INET6 - */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == 0); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == 0); - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - types[1] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - families[1] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_INET6)); - - cap_close(capdns); - - /* Trying to raise the limits. */ - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(cap_dns_type_limit(capdns, NULL, 0) == -1 && - errno == ENOTCAPABLE); - CHECK(cap_dns_family_limit(capdns, NULL, 0) == -1 && - errno == ENOTCAPABLE); - - /* Do the limits still hold? 
*/ - CHECK(runtest(capdns) == (GETHOSTBYNAME | GETHOSTBYNAME2_AF_INET)); - - cap_close(capdns); - - capdns = cap_clone(origcapdns); - CHECK(capdns != NULL); - - types[0] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 1) == 0); - families[0] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 1) == 0); - - types[0] = "NAME"; - types[1] = "ADDR"; - CHECK(cap_dns_type_limit(capdns, types, 2) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - families[1] = AF_INET6; - CHECK(cap_dns_family_limit(capdns, families, 2) == -1 && - errno == ENOTCAPABLE); - - types[0] = "NAME"; - CHECK(cap_dns_type_limit(capdns, types, 1) == -1 && - errno == ENOTCAPABLE); - families[0] = AF_INET; - CHECK(cap_dns_family_limit(capdns, families, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(cap_dns_type_limit(capdns, NULL, 0) == -1 && - errno == ENOTCAPABLE); - CHECK(cap_dns_family_limit(capdns, NULL, 0) == -1 && - errno == ENOTCAPABLE); - - /* Do the limits still hold? */ - CHECK(runtest(capdns) == (GETHOSTBYADDR_AF_INET6 | - GETADDRINFO_AF_INET6)); - - cap_close(capdns); - - cap_close(origcapdns); - - exit(0); -} Property changes on: projects/clang390-import/tools/regression/capsicum/libcasper/dns.c ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang390-import/tools/regression/capsicum/libcasper/grp.c =================================================================== --- projects/clang390-import/tools/regression/capsicum/libcasper/grp.c (revision 305686) +++ projects/clang390-import/tools/regression/capsicum/libcasper/grp.c (nonexistent) @@ -1,1550 +0,0 @@ -/*- - * Copyright (c) 2013 The FreeBSD Foundation - * All rights reserved. - * - * This software was developed by Pawel Jakub Dawidek under sponsorship from - * the FreeBSD Foundation. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. 
- */ - -#include -__FBSDID("$FreeBSD$"); - -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -#include - -static int ntest = 1; - -#define CHECK(expr) do { \ - if ((expr)) \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - else \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - ntest++; \ -} while (0) -#define CHECKX(expr) do { \ - if ((expr)) { \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - } else { \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - exit(1); \ - } \ - ntest++; \ -} while (0) - -#define GID_WHEEL 0 -#define GID_OPERATOR 5 - -#define GETGRENT0 0x0001 -#define GETGRENT1 0x0002 -#define GETGRENT2 0x0004 -#define GETGRENT (GETGRENT0 | GETGRENT1 | GETGRENT2) -#define GETGRENT_R0 0x0008 -#define GETGRENT_R1 0x0010 -#define GETGRENT_R2 0x0020 -#define GETGRENT_R (GETGRENT_R0 | GETGRENT_R1 | GETGRENT_R2) -#define GETGRNAM 0x0040 -#define GETGRNAM_R 0x0080 -#define GETGRGID 0x0100 -#define GETGRGID_R 0x0200 -#define SETGRENT 0x0400 - -static bool -group_mem_compare(char **mem0, char **mem1) -{ - int i0, i1; - - if (mem0 == NULL && mem1 == NULL) - return (true); - if (mem0 == NULL || mem1 == NULL) - return (false); - - for (i0 = 0; mem0[i0] != NULL; i0++) { - for (i1 = 0; mem1[i1] != NULL; i1++) { - if (strcmp(mem0[i0], mem1[i1]) == 0) - break; - } - if (mem1[i1] == NULL) - return (false); - } - - return (true); -} - -static bool -group_compare(const struct group *grp0, const struct group *grp1) -{ - - if (grp0 == NULL && grp1 == NULL) - return (true); - if (grp0 == NULL || grp1 == NULL) - return (false); - - if (strcmp(grp0->gr_name, grp1->gr_name) != 0) - return (false); - - if (grp0->gr_passwd != NULL || grp1->gr_passwd != NULL) { - if (grp0->gr_passwd == NULL || grp1->gr_passwd == NULL) - return (false); - if (strcmp(grp0->gr_passwd, grp1->gr_passwd) != 0) - return (false); - } - - if (grp0->gr_gid != grp1->gr_gid) - return (false); - - if (!group_mem_compare(grp0->gr_mem, grp1->gr_mem)) - return (false); - - return (true); -} - -static unsigned int -runtest_cmds(cap_channel_t *capgrp) -{ - char bufs[1024], bufc[1024]; - unsigned int result; - struct group *grps, *grpc; - struct group sts, stc; - - result = 0; - - (void)setgrent(); - if (cap_setgrent(capgrp) == 1) - result |= SETGRENT; - - grps = getgrent(); - grpc = cap_getgrent(capgrp); - if (group_compare(grps, grpc)) { - result |= GETGRENT0; - grps = getgrent(); - grpc = cap_getgrent(capgrp); - if (group_compare(grps, grpc)) - result |= GETGRENT1; - } - - getgrent_r(&sts, bufs, sizeof(bufs), &grps); - cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); - if (group_compare(grps, grpc)) { - result |= GETGRENT_R0; - getgrent_r(&sts, bufs, sizeof(bufs), &grps); - cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); - if (group_compare(grps, grpc)) - result |= GETGRENT_R1; - } - - (void)setgrent(); - if (cap_setgrent(capgrp) == 1) - result |= SETGRENT; - - getgrent_r(&sts, bufs, sizeof(bufs), &grps); - cap_getgrent_r(capgrp, &stc, bufc, sizeof(bufc), &grpc); - if (group_compare(grps, grpc)) - result |= GETGRENT_R2; - - grps = getgrent(); - grpc = cap_getgrent(capgrp); - if (group_compare(grps, grpc)) - result |= GETGRENT2; - - grps = getgrnam("wheel"); - grpc = cap_getgrnam(capgrp, "wheel"); - if (group_compare(grps, grpc)) { - grps = getgrnam("operator"); - grpc = cap_getgrnam(capgrp, "operator"); - if (group_compare(grps, grpc)) - result |= GETGRNAM; - } - - getgrnam_r("wheel", &sts, bufs, sizeof(bufs), &grps); - 
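/*
 * The reentrant variants are exercised the same way: the query just made
 * through libc is repeated below through the casper channel, and the two
 * results must agree field by field (see group_compare()).
 */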
cap_getgrnam_r(capgrp, "wheel", &stc, bufc, sizeof(bufc), &grpc); - if (group_compare(grps, grpc)) { - getgrnam_r("operator", &sts, bufs, sizeof(bufs), &grps); - cap_getgrnam_r(capgrp, "operator", &stc, bufc, sizeof(bufc), - &grpc); - if (group_compare(grps, grpc)) - result |= GETGRNAM_R; - } - - grps = getgrgid(GID_WHEEL); - grpc = cap_getgrgid(capgrp, GID_WHEEL); - if (group_compare(grps, grpc)) { - grps = getgrgid(GID_OPERATOR); - grpc = cap_getgrgid(capgrp, GID_OPERATOR); - if (group_compare(grps, grpc)) - result |= GETGRGID; - } - - getgrgid_r(GID_WHEEL, &sts, bufs, sizeof(bufs), &grps); - cap_getgrgid_r(capgrp, GID_WHEEL, &stc, bufc, sizeof(bufc), &grpc); - if (group_compare(grps, grpc)) { - getgrgid_r(GID_OPERATOR, &sts, bufs, sizeof(bufs), &grps); - cap_getgrgid_r(capgrp, GID_OPERATOR, &stc, bufc, sizeof(bufc), - &grpc); - if (group_compare(grps, grpc)) - result |= GETGRGID_R; - } - - return (result); -} - -static void -test_cmds(cap_channel_t *origcapgrp) -{ - cap_channel_t *capgrp; - const char *cmds[7], *fields[4], *names[5]; - gid_t gids[5]; - - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - - names[0] = "wheel"; - names[1] = "daemon"; - names[2] = "kmem"; - names[3] = "sys"; - names[4] = "operator"; - - gids[0] = 0; - gids[1] = 1; - gids[2] = 2; - gids[3] = 3; - gids[4] = 5; - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == 0); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == 0); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: setgrent - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "getgrent"; - cmds[1] = "getgrent_r"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - 
cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "setgrent"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (GETGRENT0 | GETGRENT1 | GETGRENT_R0 | - GETGRENT_R1 | GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: setgrent - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "getgrent"; - cmds[1] = "getgrent_r"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "setgrent"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (GETGRENT0 | GETGRENT1 | GETGRENT_R0 | - GETGRENT_R1 | GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrent - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent_r"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrent"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT_R2 | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrent - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent_r"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrent"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - 
CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT_R2 | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrent_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrent_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT0 | GETGRENT1 | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrnam, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrent_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrnam"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrent_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT0 | GETGRENT1 | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrnam - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrnam"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - 
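/*
 * For reference, a minimal consumer-side sketch of the pattern these cases
 * verify: open the system.grp service, narrow it, enter capability mode,
 * and resolve a group through the channel.  This is illustrative only; the
 * header paths (<libcasper.h>, <casper/cap_grp.h>) are assumed to match
 * the libcasper layout these tests build against, and cap_enter(2) is
 * assumed to be available.
 */

#include <sys/capsicum.h>

#include <err.h>
#include <grp.h>
#include <stdio.h>

#include <libcasper.h>

#include <casper/cap_grp.h>

int
main(void)
{
	cap_channel_t *capcas, *capgrp;
	const char *cmds[] = { "getgrnam" };
	const char *fields[] = { "gr_name", "gr_gid" };
	struct group *grp;

	/* Open casper and the system.grp service, then drop casper itself. */
	capcas = cap_init();
	if (capcas == NULL)
		err(1, "cap_init");
	capgrp = cap_service_open(capcas, "system.grp");
	if (capgrp == NULL)
		err(1, "cap_service_open");
	cap_close(capcas);

	/* Narrow the channel before sandboxing; widening later would fail. */
	if (cap_grp_limit_cmds(capgrp, cmds, 1) < 0)
		err(1, "cap_grp_limit_cmds");
	if (cap_grp_limit_fields(capgrp, fields, 2) < 0)
		err(1, "cap_grp_limit_fields");

	if (cap_enter() < 0)
		err(1, "cap_enter");

	/* Only getgrnam-style lookups, and only gr_name/gr_gid, remain. */
	grp = cap_getgrnam(capgrp, "wheel");
	if (grp == NULL)
		errx(1, "cap_getgrnam(\"wheel\") failed");
	printf("wheel has gid %u\n", (unsigned)grp->gr_gid);

	cap_close(capgrp);
	return (0);
}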
/* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam_r, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrnam - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam_r"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrnam"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM_R | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrnam_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrnam_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, - * getgrgid, getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrnam_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrgid"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrnam_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRGID | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrgid - * fields: - * groups: - */ - capgrp = 
cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrgid"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid_r - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrgid - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrgid"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID_R)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: wheel, daemon, kmem, sys, operator - * gids: - * Disallow: - * cmds: getgrgid_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) == 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID)); - - cap_close(capgrp); - - /* - * Allow: - * cmds: setgrent, getgrent, getgrent_r, getgrnam, getgrnam_r, - * getgrgid - * fields: gr_name, gr_passwd, gr_gid, gr_mem - * groups: - * names: - * gids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getgrgid_r - * fields: - * groups: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 6) 
== 0); - cmds[0] = "setgrent"; - cmds[1] = "getgrent"; - cmds[2] = "getgrent_r"; - cmds[3] = "getgrnam"; - cmds[4] = "getgrnam_r"; - cmds[5] = "getgrgid"; - cmds[6] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getgrgid_r"; - CHECK(cap_grp_limit_cmds(capgrp, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 5) == 0); - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID)); - - cap_close(capgrp); -} - -#define GR_NAME 0x01 -#define GR_PASSWD 0x02 -#define GR_GID 0x04 -#define GR_MEM 0x08 - -static unsigned int -group_fields(const struct group *grp) -{ - unsigned int result; - - result = 0; - - if (grp->gr_name != NULL && grp->gr_name[0] != '\0') - result |= GR_NAME; - - if (grp->gr_passwd != NULL && grp->gr_passwd[0] != '\0') - result |= GR_PASSWD; - - if (grp->gr_gid != (gid_t)-1) - result |= GR_GID; - - if (grp->gr_mem != NULL && grp->gr_mem[0] != NULL) - result |= GR_MEM; - - return (result); -} - -static bool -runtest_fields(cap_channel_t *capgrp, unsigned int expected) -{ - char buf[1024]; - struct group *grp; - struct group st; - - (void)cap_setgrent(capgrp); - grp = cap_getgrent(capgrp); - if (group_fields(grp) != expected) - return (false); - - (void)cap_setgrent(capgrp); - cap_getgrent_r(capgrp, &st, buf, sizeof(buf), &grp); - if (group_fields(grp) != expected) - return (false); - - grp = cap_getgrnam(capgrp, "wheel"); - if (group_fields(grp) != expected) - return (false); - - cap_getgrnam_r(capgrp, "wheel", &st, buf, sizeof(buf), &grp); - if (group_fields(grp) != expected) - return (false); - - grp = cap_getgrgid(capgrp, GID_WHEEL); - if (group_fields(grp) != expected) - return (false); - - cap_getgrgid_r(capgrp, GID_WHEEL, &st, buf, sizeof(buf), &grp); - if (group_fields(grp) != expected) - return (false); - - return (true); -} - -static void -test_fields(cap_channel_t *origcapgrp) -{ - cap_channel_t *capgrp; - const char *fields[4]; - - /* No limits. 
*/ - - CHECK(runtest_fields(origcapgrp, GR_NAME | GR_PASSWD | GR_GID | GR_MEM)); - - /* - * Allow: - * fields: gr_name, gr_passwd, gr_gid, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == 0); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_GID | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_passwd, gr_gid, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_passwd"; - fields[1] = "gr_gid"; - fields[2] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_PASSWD | GR_GID | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_gid, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_gid"; - fields[2] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_passwd"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_GID | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_passwd, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_passwd, gr_gid - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 3) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD | GR_GID)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_passwd - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_PASSWD)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_gid - */ - capgrp = cap_clone(origcapgrp); - 
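/*
 * cap_clone() returns an independent channel to the same service, so each
 * case below can apply its own field limits from a clean slate instead of
 * inheriting the narrowing done by an earlier case.
 */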
CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_GID)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_name, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_name"; - fields[1] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_passwd"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_NAME | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_passwd, gr_gid - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_passwd"; - fields[1] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_PASSWD | GR_GID)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_passwd, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_passwd"; - fields[1] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_gid"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_PASSWD | GR_MEM)); - - cap_close(capgrp); - - /* - * Allow: - * fields: gr_gid, gr_mem - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - fields[0] = "gr_gid"; - fields[1] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 2) == 0); - fields[0] = "gr_name"; - fields[1] = "gr_passwd"; - fields[2] = "gr_gid"; - fields[3] = "gr_mem"; - CHECK(cap_grp_limit_fields(capgrp, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "gr_passwd"; - CHECK(cap_grp_limit_fields(capgrp, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(capgrp, GR_GID | GR_MEM)); - - cap_close(capgrp); -} - -static bool -runtest_groups(cap_channel_t *capgrp, const char **names, const gid_t *gids, - size_t ngroups) -{ - char buf[1024]; - struct group *grp; - struct group st; - unsigned int i, got; - - (void)cap_setgrent(capgrp); - got = 0; - for (;;) { - grp = cap_getgrent(capgrp); - if (grp == NULL) - break; - got++; - for (i = 0; i < ngroups; i++) { - if (strcmp(names[i], grp->gr_name) == 0 && - gids[i] == grp->gr_gid) { - break; - } - } - if (i == ngroups) - return (false); - } - if (got != ngroups) - return (false); - - (void)cap_setgrent(capgrp); - got = 0; - for (;;) { - cap_getgrent_r(capgrp, &st, buf, sizeof(buf), &grp); - if (grp == NULL) - break; - got++; - for (i = 0; i < ngroups; i++) { - if 
(strcmp(names[i], grp->gr_name) == 0 && - gids[i] == grp->gr_gid) { - break; - } - } - if (i == ngroups) - return (false); - } - if (got != ngroups) - return (false); - - for (i = 0; i < ngroups; i++) { - grp = cap_getgrnam(capgrp, names[i]); - if (grp == NULL) - return (false); - } - - for (i = 0; i < ngroups; i++) { - cap_getgrnam_r(capgrp, names[i], &st, buf, sizeof(buf), &grp); - if (grp == NULL) - return (false); - } - - for (i = 0; i < ngroups; i++) { - grp = cap_getgrgid(capgrp, gids[i]); - if (grp == NULL) - return (false); - } - - for (i = 0; i < ngroups; i++) { - cap_getgrgid_r(capgrp, gids[i], &st, buf, sizeof(buf), &grp); - if (grp == NULL) - return (false); - } - - return (true); -} - -static void -test_groups(cap_channel_t *origcapgrp) -{ - cap_channel_t *capgrp; - const char *names[5]; - gid_t gids[5]; - - /* - * Allow: - * groups: - * names: wheel, daemon, kmem, sys, tty - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "wheel"; - names[1] = "daemon"; - names[2] = "kmem"; - names[3] = "sys"; - names[4] = "tty"; - CHECK(cap_grp_limit_groups(capgrp, names, 5, NULL, 0) == 0); - gids[0] = 0; - gids[1] = 1; - gids[2] = 2; - gids[3] = 3; - gids[4] = 4; - - CHECK(runtest_groups(capgrp, names, gids, 5)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: kmem, sys, tty - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "kmem"; - names[1] = "sys"; - names[2] = "tty"; - CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == 0); - names[3] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 4, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "kmem"; - gids[0] = 2; - gids[1] = 3; - gids[2] = 4; - - CHECK(runtest_groups(capgrp, names, gids, 3)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: wheel, kmem, tty - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "wheel"; - names[1] = "kmem"; - names[2] = "tty"; - CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == 0); - names[3] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 4, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "wheel"; - gids[0] = 0; - gids[1] = 2; - gids[2] = 4; - - CHECK(runtest_groups(capgrp, names, gids, 3)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: - * gids: 2, 3, 4 - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "kmem"; - names[1] = "sys"; - names[2] = "tty"; - gids[0] = 2; - gids[1] = 3; - gids[2] = 4; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == 0); - gids[3] = 0; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 4) == -1 && - errno == ENOTCAPABLE); - gids[0] = 0; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && - errno == ENOTCAPABLE); - gids[0] = 2; - - CHECK(runtest_groups(capgrp, names, gids, 3)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: - * gids: 0, 2, 4 - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "wheel"; - names[1] = "kmem"; - names[2] = "tty"; - gids[0] = 0; - gids[1] = 2; - gids[2] = 4; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == 0); - gids[3] = 1; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 4) == -1 && - errno == ENOTCAPABLE); - gids[0] = 1; - 
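/*
 * gid 1 lies outside the allowed {0, 2, 4} set, so even a single-entry
 * limit request for it must be rejected with ENOTCAPABLE.
 */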
CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && - errno == ENOTCAPABLE); - gids[0] = 0; - - CHECK(runtest_groups(capgrp, names, gids, 3)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: kmem - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "kmem"; - CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == 0); - names[1] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 2, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "kmem"; - gids[0] = 2; - - CHECK(runtest_groups(capgrp, names, gids, 1)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: wheel, tty - * gids: - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "wheel"; - names[1] = "tty"; - CHECK(cap_grp_limit_groups(capgrp, names, 2, NULL, 0) == 0); - names[2] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 3, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - CHECK(cap_grp_limit_groups(capgrp, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "wheel"; - gids[0] = 0; - gids[1] = 4; - - CHECK(runtest_groups(capgrp, names, gids, 2)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: - * gids: 2 - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "kmem"; - gids[0] = 2; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == 0); - gids[1] = 1; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 2) == -1 && - errno == ENOTCAPABLE); - gids[0] = 1; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && - errno == ENOTCAPABLE); - gids[0] = 2; - - CHECK(runtest_groups(capgrp, names, gids, 1)); - - cap_close(capgrp); - - /* - * Allow: - * groups: - * names: - * gids: 0, 4 - */ - capgrp = cap_clone(origcapgrp); - CHECK(capgrp != NULL); - - names[0] = "wheel"; - names[1] = "tty"; - gids[0] = 0; - gids[1] = 4; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 2) == 0); - gids[2] = 1; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 3) == -1 && - errno == ENOTCAPABLE); - gids[0] = 1; - CHECK(cap_grp_limit_groups(capgrp, NULL, 0, gids, 1) == -1 && - errno == ENOTCAPABLE); - gids[0] = 0; - - CHECK(runtest_groups(capgrp, names, gids, 2)); - - cap_close(capgrp); -} - -int -main(void) -{ - cap_channel_t *capcas, *capgrp; - - printf("1..199\n"); - - capcas = cap_init(); - CHECKX(capcas != NULL); - - capgrp = cap_service_open(capcas, "system.grp"); - CHECKX(capgrp != NULL); - - cap_close(capcas); - - /* No limits. 
*/ - - CHECK(runtest_cmds(capgrp) == (SETGRENT | GETGRENT | GETGRENT_R | - GETGRNAM | GETGRNAM_R | GETGRGID | GETGRGID_R)); - - test_cmds(capgrp); - - test_fields(capgrp); - - test_groups(capgrp); - - cap_close(capgrp); - - exit(0); -} Property changes on: projects/clang390-import/tools/regression/capsicum/libcasper/grp.c ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang390-import/tools/regression/capsicum/libcasper/pwd.c =================================================================== --- projects/clang390-import/tools/regression/capsicum/libcasper/pwd.c (revision 305686) +++ projects/clang390-import/tools/regression/capsicum/libcasper/pwd.c (nonexistent) @@ -1,1536 +0,0 @@ -/*- - * Copyright (c) 2013 The FreeBSD Foundation - * All rights reserved. - * - * This software was developed by Pawel Jakub Dawidek under sponsorship from - * the FreeBSD Foundation. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. 
- */ - -#include -__FBSDID("$FreeBSD$"); - -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -#include - -static int ntest = 1; - -#define CHECK(expr) do { \ - if ((expr)) \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - else \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__);\ - ntest++; \ -} while (0) -#define CHECKX(expr) do { \ - if ((expr)) { \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - } else { \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__);\ - exit(1); \ - } \ - ntest++; \ -} while (0) - -#define UID_ROOT 0 -#define UID_OPERATOR 2 - -#define GETPWENT0 0x0001 -#define GETPWENT1 0x0002 -#define GETPWENT2 0x0004 -#define GETPWENT (GETPWENT0 | GETPWENT1 | GETPWENT2) -#define GETPWENT_R0 0x0008 -#define GETPWENT_R1 0x0010 -#define GETPWENT_R2 0x0020 -#define GETPWENT_R (GETPWENT_R0 | GETPWENT_R1 | GETPWENT_R2) -#define GETPWNAM 0x0040 -#define GETPWNAM_R 0x0080 -#define GETPWUID 0x0100 -#define GETPWUID_R 0x0200 - -static bool -passwd_compare(const struct passwd *pwd0, const struct passwd *pwd1) -{ - - if (pwd0 == NULL && pwd1 == NULL) - return (true); - if (pwd0 == NULL || pwd1 == NULL) - return (false); - - if (strcmp(pwd0->pw_name, pwd1->pw_name) != 0) - return (false); - - if (pwd0->pw_passwd != NULL || pwd1->pw_passwd != NULL) { - if (pwd0->pw_passwd == NULL || pwd1->pw_passwd == NULL) - return (false); - if (strcmp(pwd0->pw_passwd, pwd1->pw_passwd) != 0) - return (false); - } - - if (pwd0->pw_uid != pwd1->pw_uid) - return (false); - - if (pwd0->pw_gid != pwd1->pw_gid) - return (false); - - if (pwd0->pw_change != pwd1->pw_change) - return (false); - - if (pwd0->pw_class != NULL || pwd1->pw_class != NULL) { - if (pwd0->pw_class == NULL || pwd1->pw_class == NULL) - return (false); - if (strcmp(pwd0->pw_class, pwd1->pw_class) != 0) - return (false); - } - - if (pwd0->pw_gecos != NULL || pwd1->pw_gecos != NULL) { - if (pwd0->pw_gecos == NULL || pwd1->pw_gecos == NULL) - return (false); - if (strcmp(pwd0->pw_gecos, pwd1->pw_gecos) != 0) - return (false); - } - - if (pwd0->pw_dir != NULL || pwd1->pw_dir != NULL) { - if (pwd0->pw_dir == NULL || pwd1->pw_dir == NULL) - return (false); - if (strcmp(pwd0->pw_dir, pwd1->pw_dir) != 0) - return (false); - } - - if (pwd0->pw_shell != NULL || pwd1->pw_shell != NULL) { - if (pwd0->pw_shell == NULL || pwd1->pw_shell == NULL) - return (false); - if (strcmp(pwd0->pw_shell, pwd1->pw_shell) != 0) - return (false); - } - - if (pwd0->pw_expire != pwd1->pw_expire) - return (false); - - if (pwd0->pw_fields != pwd1->pw_fields) - return (false); - - return (true); -} - -static unsigned int -runtest_cmds(cap_channel_t *cappwd) -{ - char bufs[1024], bufc[1024]; - unsigned int result; - struct passwd *pwds, *pwdc; - struct passwd sts, stc; - - result = 0; - - setpwent(); - cap_setpwent(cappwd); - - pwds = getpwent(); - pwdc = cap_getpwent(cappwd); - if (passwd_compare(pwds, pwdc)) { - result |= GETPWENT0; - pwds = getpwent(); - pwdc = cap_getpwent(cappwd); - if (passwd_compare(pwds, pwdc)) - result |= GETPWENT1; - } - - getpwent_r(&sts, bufs, sizeof(bufs), &pwds); - cap_getpwent_r(cappwd, &stc, bufc, sizeof(bufc), &pwdc); - if (passwd_compare(pwds, pwdc)) { - result |= GETPWENT_R0; - getpwent_r(&sts, bufs, sizeof(bufs), &pwds); - cap_getpwent_r(cappwd, &stc, bufc, sizeof(bufc), &pwdc); - if (passwd_compare(pwds, pwdc)) - result |= GETPWENT_R1; - } - - setpwent(); - cap_setpwent(cappwd); - - getpwent_r(&sts, bufs, sizeof(bufs), &pwds); - cap_getpwent_r(cappwd, 
&stc, bufc, sizeof(bufc), &pwdc); - if (passwd_compare(pwds, pwdc)) - result |= GETPWENT_R2; - - pwds = getpwent(); - pwdc = cap_getpwent(cappwd); - if (passwd_compare(pwds, pwdc)) - result |= GETPWENT2; - - pwds = getpwnam("root"); - pwdc = cap_getpwnam(cappwd, "root"); - if (passwd_compare(pwds, pwdc)) { - pwds = getpwnam("operator"); - pwdc = cap_getpwnam(cappwd, "operator"); - if (passwd_compare(pwds, pwdc)) - result |= GETPWNAM; - } - - getpwnam_r("root", &sts, bufs, sizeof(bufs), &pwds); - cap_getpwnam_r(cappwd, "root", &stc, bufc, sizeof(bufc), &pwdc); - if (passwd_compare(pwds, pwdc)) { - getpwnam_r("operator", &sts, bufs, sizeof(bufs), &pwds); - cap_getpwnam_r(cappwd, "operator", &stc, bufc, sizeof(bufc), - &pwdc); - if (passwd_compare(pwds, pwdc)) - result |= GETPWNAM_R; - } - - pwds = getpwuid(UID_ROOT); - pwdc = cap_getpwuid(cappwd, UID_ROOT); - if (passwd_compare(pwds, pwdc)) { - pwds = getpwuid(UID_OPERATOR); - pwdc = cap_getpwuid(cappwd, UID_OPERATOR); - if (passwd_compare(pwds, pwdc)) - result |= GETPWUID; - } - - getpwuid_r(UID_ROOT, &sts, bufs, sizeof(bufs), &pwds); - cap_getpwuid_r(cappwd, UID_ROOT, &stc, bufc, sizeof(bufc), &pwdc); - if (passwd_compare(pwds, pwdc)) { - getpwuid_r(UID_OPERATOR, &sts, bufs, sizeof(bufs), &pwds); - cap_getpwuid_r(cappwd, UID_OPERATOR, &stc, bufc, sizeof(bufc), - &pwdc); - if (passwd_compare(pwds, pwdc)) - result |= GETPWUID_R; - } - - return (result); -} - -static void -test_cmds(cap_channel_t *origcappwd) -{ - cap_channel_t *cappwd; - const char *cmds[7], *fields[10], *names[6]; - uid_t uids[5]; - - fields[0] = "pw_name"; - fields[1] = "pw_passwd"; - fields[2] = "pw_uid"; - fields[3] = "pw_gid"; - fields[4] = "pw_change"; - fields[5] = "pw_class"; - fields[6] = "pw_gecos"; - fields[7] = "pw_dir"; - fields[8] = "pw_shell"; - fields[9] = "pw_expire"; - - names[0] = "root"; - names[1] = "toor"; - names[2] = "daemon"; - names[3] = "operator"; - names[4] = "bin"; - names[5] = "kmem"; - - uids[0] = 0; - uids[1] = 1; - uids[2] = 2; - uids[3] = 3; - uids[4] = 5; - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == 0); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == 0); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - 
* users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: setpwent - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cap_setpwent(cappwd); - - cmds[0] = "getpwent"; - cmds[1] = "getpwent_r"; - cmds[2] = "getpwnam"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "setpwent"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | GETPWENT_R0 | - GETPWENT_R1 | GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: setpwent - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cap_setpwent(cappwd); - - cmds[0] = "getpwent"; - cmds[1] = "getpwent_r"; - cmds[2] = "getpwnam"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "setpwent"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | GETPWENT_R0 | - GETPWENT_R1 | GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwent - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent_r"; - cmds[2] = "getpwnam"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwent"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT_R2 | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwent - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent_r"; - cmds[2] = "getpwnam"; - cmds[3] = 
"getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwent"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT_R2 | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwent_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwnam"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwent_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwnam, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwent_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwnam"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwent_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT0 | GETPWENT1 | - GETPWNAM | GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwnam - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno 
== ENOTCAPABLE); - cmds[0] = "getpwnam"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam_r, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwnam - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam_r"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwnam"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM_R | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, - * getpwuid, getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwnam_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwnam_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, - * getpwuid, getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwnam_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwuid"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwnam_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWUID | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, 
getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid_r - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwuid - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwuid"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid_r - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwuid - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwuid"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID_R)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid - * users: - * names: root, toor, daemon, operator, bin, kmem - * uids: - * Disallow: - * cmds: getpwuid_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID)); - - cap_close(cappwd); - - /* - * Allow: - * cmds: setpwent, getpwent, getpwent_r, getpwnam, getpwnam_r, - * getpwuid - * users: - * names: - * uids: 0, 1, 2, 3, 5 - * Disallow: - * cmds: getpwuid_r - * users: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - 
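The cases above all exercise the same contract: cap_pwd limits only ever narrow, and an attempt to re-add a previously dropped command fails with ENOTCAPABLE. For reference, a minimal stand-alone sketch of how a consumer hits that contract; this is hypothetical illustration code, not part of the deleted test, and it assumes the system.pwd service plus linking with -lcasper -lcap_pwd:

#include <sys/types.h>

#include <err.h>
#include <errno.h>
#include <stdlib.h>

#include <libcasper.h>

#include <casper/cap_pwd.h>

int
main(void)
{
	cap_channel_t *capcas, *cappwd;
	const char *narrow[] = { "getpwnam", "getpwnam_r" };
	const char *wider[] = { "getpwnam", "getpwnam_r", "setpwent" };

	capcas = cap_init();
	if (capcas == NULL)
		err(1, "unable to contact casper");
	cappwd = cap_service_open(capcas, "system.pwd");
	cap_close(capcas);
	if (cappwd == NULL)
		err(1, "unable to open system.pwd service");

	/* Narrowing to a subset of commands is permitted. */
	if (cap_pwd_limit_cmds(cappwd, narrow, 2) == -1)
		err(1, "cap_pwd_limit_cmds");
	/* Widening the set again is refused by the service. */
	if (cap_pwd_limit_cmds(cappwd, wider, 3) == 0 || errno != ENOTCAPABLE)
		errx(1, "limits were unexpectedly widened");

	cap_close(cappwd);
	exit(0);
}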
CHECK(cap_pwd_limit_cmds(cappwd, cmds, 6) == 0); - cmds[0] = "setpwent"; - cmds[1] = "getpwent"; - cmds[2] = "getpwent_r"; - cmds[3] = "getpwnam"; - cmds[4] = "getpwnam_r"; - cmds[5] = "getpwuid"; - cmds[6] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 7) == -1 && errno == ENOTCAPABLE); - cmds[0] = "getpwuid_r"; - CHECK(cap_pwd_limit_cmds(cappwd, cmds, 1) == -1 && errno == ENOTCAPABLE); - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 5) == 0); - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | - GETPWNAM | GETPWNAM_R | GETPWUID)); - - cap_close(cappwd); -} - -#define PW_NAME _PWF_NAME -#define PW_PASSWD _PWF_PASSWD -#define PW_UID _PWF_UID -#define PW_GID _PWF_GID -#define PW_CHANGE _PWF_CHANGE -#define PW_CLASS _PWF_CLASS -#define PW_GECOS _PWF_GECOS -#define PW_DIR _PWF_DIR -#define PW_SHELL _PWF_SHELL -#define PW_EXPIRE _PWF_EXPIRE - -static unsigned int -passwd_fields(const struct passwd *pwd) -{ - unsigned int result; - - result = 0; - - if (pwd->pw_name != NULL && pwd->pw_name[0] != '\0') - result |= PW_NAME; -// else -// printf("No pw_name\n"); - - if (pwd->pw_passwd != NULL && pwd->pw_passwd[0] != '\0') - result |= PW_PASSWD; - else if ((pwd->pw_fields & _PWF_PASSWD) != 0) - result |= PW_PASSWD; -// else -// printf("No pw_passwd\n"); - - if (pwd->pw_uid != (uid_t)-1) - result |= PW_UID; -// else -// printf("No pw_uid\n"); - - if (pwd->pw_gid != (gid_t)-1) - result |= PW_GID; -// else -// printf("No pw_gid\n"); - - if (pwd->pw_change != 0 || (pwd->pw_fields & _PWF_CHANGE) != 0) - result |= PW_CHANGE; -// else -// printf("No pw_change\n"); - - if (pwd->pw_class != NULL && pwd->pw_class[0] != '\0') - result |= PW_CLASS; - else if ((pwd->pw_fields & _PWF_CLASS) != 0) - result |= PW_CLASS; -// else -// printf("No pw_class\n"); - - if (pwd->pw_gecos != NULL && pwd->pw_gecos[0] != '\0') - result |= PW_GECOS; - else if ((pwd->pw_fields & _PWF_GECOS) != 0) - result |= PW_GECOS; -// else -// printf("No pw_gecos\n"); - - if (pwd->pw_dir != NULL && pwd->pw_dir[0] != '\0') - result |= PW_DIR; - else if ((pwd->pw_fields & _PWF_DIR) != 0) - result |= PW_DIR; -// else -// printf("No pw_dir\n"); - - if (pwd->pw_shell != NULL && pwd->pw_shell[0] != '\0') - result |= PW_SHELL; - else if ((pwd->pw_fields & _PWF_SHELL) != 0) - result |= PW_SHELL; -// else -// printf("No pw_shell\n"); - - if (pwd->pw_expire != 0 || (pwd->pw_fields & _PWF_EXPIRE) != 0) - result |= PW_EXPIRE; -// else -// printf("No pw_expire\n"); - -if (false && pwd->pw_fields != (int)result) { -printf("fields=0x%x != result=0x%x\n", (const unsigned int)pwd->pw_fields, result); -printf(" fields result\n"); -printf("PW_NAME %d %d\n", (pwd->pw_fields & PW_NAME) != 0, (result & PW_NAME) != 0); -printf("PW_PASSWD %d %d\n", (pwd->pw_fields & PW_PASSWD) != 0, (result & PW_PASSWD) != 0); -printf("PW_UID %d %d\n", (pwd->pw_fields & PW_UID) != 0, (result & PW_UID) != 0); -printf("PW_GID %d %d\n", (pwd->pw_fields & PW_GID) != 0, (result & PW_GID) != 0); -printf("PW_CHANGE %d %d\n", (pwd->pw_fields & PW_CHANGE) != 0, (result & PW_CHANGE) != 0); -printf("PW_CLASS %d %d\n", (pwd->pw_fields & PW_CLASS) != 0, (result & PW_CLASS) != 0); -printf("PW_GECOS %d %d\n", (pwd->pw_fields & PW_GECOS) != 0, (result & PW_GECOS) != 0); -printf("PW_DIR %d %d\n", (pwd->pw_fields & PW_DIR) != 0, (result & PW_DIR) != 0); -printf("PW_SHELL %d %d\n", (pwd->pw_fields & PW_SHELL) != 0, (result & PW_SHELL) != 0); -printf("PW_EXPIRE %d %d\n", (pwd->pw_fields & PW_EXPIRE) != 0, (result & PW_EXPIRE) 
!= 0); -} - -//printf("result=0x%x\n", result); - return (result); -} - -static bool -runtest_fields(cap_channel_t *cappwd, unsigned int expected) -{ - char buf[1024]; - struct passwd *pwd; - struct passwd st; - -//printf("expected=0x%x\n", expected); - cap_setpwent(cappwd); - pwd = cap_getpwent(cappwd); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - cap_setpwent(cappwd); - cap_getpwent_r(cappwd, &st, buf, sizeof(buf), &pwd); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - pwd = cap_getpwnam(cappwd, "root"); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - cap_getpwnam_r(cappwd, "root", &st, buf, sizeof(buf), &pwd); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - pwd = cap_getpwuid(cappwd, UID_ROOT); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - cap_getpwuid_r(cappwd, UID_ROOT, &st, buf, sizeof(buf), &pwd); - if ((passwd_fields(pwd) & ~expected) != 0) - return (false); - - return (true); -} - -static void -test_fields(cap_channel_t *origcappwd) -{ - cap_channel_t *cappwd; - const char *fields[10]; - - /* No limits. */ - - CHECK(runtest_fields(origcappwd, PW_NAME | PW_PASSWD | PW_UID | - PW_GID | PW_CHANGE | PW_CLASS | PW_GECOS | PW_DIR | PW_SHELL | - PW_EXPIRE)); - - /* - * Allow: - * fields: pw_name, pw_passwd, pw_uid, pw_gid, pw_change, pw_class, - * pw_gecos, pw_dir, pw_shell, pw_expire - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_name"; - fields[1] = "pw_passwd"; - fields[2] = "pw_uid"; - fields[3] = "pw_gid"; - fields[4] = "pw_change"; - fields[5] = "pw_class"; - fields[6] = "pw_gecos"; - fields[7] = "pw_dir"; - fields[8] = "pw_shell"; - fields[9] = "pw_expire"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 10) == 0); - - CHECK(runtest_fields(origcappwd, PW_NAME | PW_PASSWD | PW_UID | - PW_GID | PW_CHANGE | PW_CLASS | PW_GECOS | PW_DIR | PW_SHELL | - PW_EXPIRE)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_name, pw_passwd, pw_uid, pw_gid, pw_change - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_name"; - fields[1] = "pw_passwd"; - fields[2] = "pw_uid"; - fields[3] = "pw_gid"; - fields[4] = "pw_change"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); - fields[5] = "pw_class"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_class"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_NAME | PW_PASSWD | PW_UID | - PW_GID | PW_CHANGE)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_class, pw_gecos, pw_dir, pw_shell, pw_expire - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_class"; - fields[1] = "pw_gecos"; - fields[2] = "pw_dir"; - fields[3] = "pw_shell"; - fields[4] = "pw_expire"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); - fields[5] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_CLASS | PW_GECOS | PW_DIR | - PW_SHELL | PW_EXPIRE)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_name, pw_uid, pw_change, pw_gecos, pw_shell - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_name"; - fields[1] = "pw_uid"; - fields[2] = "pw_change"; - fields[3] = "pw_gecos"; - fields[4] = "pw_shell"; - 
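test_fields() drives cap_pwd_limit_fields() through many allowed-set combinations; in ordinary consumer code the idiom is much smaller. A hypothetical helper sketch (the helper name is illustrative, and the channel is assumed to be opened and cloned the same way main() at the end of this file does it), showing that fields outside the allowed set come back zeroed or empty rather than failing the lookup:

#include <sys/types.h>

#include <err.h>
#include <pwd.h>
#include <stdint.h>
#include <stdio.h>

#include <libcasper.h>

#include <casper/cap_pwd.h>

static void
print_shell(cap_channel_t *cappwd, const char *user)
{
	const char *fields[] = { "pw_name", "pw_uid", "pw_shell" };
	struct passwd *pwd;

	if (cap_pwd_limit_fields(cappwd, fields, 3) == -1)
		err(1, "cap_pwd_limit_fields");
	pwd = cap_getpwnam(cappwd, user);
	if (pwd == NULL)
		errx(1, "user %s not found", user);
	/* pw_passwd, pw_gecos, etc. are outside the allowed set. */
	printf("%s (uid %ju) uses %s\n", pwd->pw_name,
	    (uintmax_t)pwd->pw_uid, pwd->pw_shell);
}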
CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); - fields[5] = "pw_class"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_class"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_NAME | PW_UID | PW_CHANGE | - PW_GECOS | PW_SHELL)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_passwd, pw_gid, pw_class, pw_dir, pw_expire - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_passwd"; - fields[1] = "pw_gid"; - fields[2] = "pw_class"; - fields[3] = "pw_dir"; - fields[4] = "pw_expire"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 5) == 0); - fields[5] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 6) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_PASSWD | PW_GID | PW_CLASS | - PW_DIR | PW_EXPIRE)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_uid, pw_class, pw_shell - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_uid"; - fields[1] = "pw_class"; - fields[2] = "pw_shell"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 3) == 0); - fields[3] = "pw_change"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 4) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_change"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_UID | PW_CLASS | PW_SHELL)); - - cap_close(cappwd); - - /* - * Allow: - * fields: pw_change - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - fields[0] = "pw_change"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == 0); - fields[1] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 2) == -1 && - errno == ENOTCAPABLE); - fields[0] = "pw_uid"; - CHECK(cap_pwd_limit_fields(cappwd, fields, 1) == -1 && - errno == ENOTCAPABLE); - - CHECK(runtest_fields(cappwd, PW_CHANGE)); - - cap_close(cappwd); -} - -static bool -runtest_users(cap_channel_t *cappwd, const char **names, const uid_t *uids, - size_t nusers) -{ - char buf[1024]; - struct passwd *pwd; - struct passwd st; - unsigned int i, got; - - cap_setpwent(cappwd); - got = 0; - for (;;) { - pwd = cap_getpwent(cappwd); - if (pwd == NULL) - break; - got++; - for (i = 0; i < nusers; i++) { - if (strcmp(names[i], pwd->pw_name) == 0 && - uids[i] == pwd->pw_uid) { - break; - } - } - if (i == nusers) - return (false); - } - if (got != nusers) - return (false); - - cap_setpwent(cappwd); - got = 0; - for (;;) { - cap_getpwent_r(cappwd, &st, buf, sizeof(buf), &pwd); - if (pwd == NULL) - break; - got++; - for (i = 0; i < nusers; i++) { - if (strcmp(names[i], pwd->pw_name) == 0 && - uids[i] == pwd->pw_uid) { - break; - } - } - if (i == nusers) - return (false); - } - if (got != nusers) - return (false); - - for (i = 0; i < nusers; i++) { - pwd = cap_getpwnam(cappwd, names[i]); - if (pwd == NULL) - return (false); - } - - for (i = 0; i < nusers; i++) { - cap_getpwnam_r(cappwd, names[i], &st, buf, sizeof(buf), &pwd); - if (pwd == NULL) - return (false); - } - - for (i = 0; i < nusers; i++) { - pwd = cap_getpwuid(cappwd, uids[i]); - if (pwd == NULL) - return (false); - } - - for (i = 0; i < nusers; i++) { - cap_getpwuid_r(cappwd, uids[i], &st, buf, sizeof(buf), &pwd); - if (pwd == NULL) - return (false); - } - - return (true); -} - -static void -test_users(cap_channel_t *origcappwd) -{ - cap_channel_t *cappwd; - const char 
*names[6]; - uid_t uids[6]; - - /* - * Allow: - * users: - * names: root, toor, daemon, operator, bin, tty - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "root"; - names[1] = "toor"; - names[2] = "daemon"; - names[3] = "operator"; - names[4] = "bin"; - names[5] = "tty"; - CHECK(cap_pwd_limit_users(cappwd, names, 6, NULL, 0) == 0); - uids[0] = 0; - uids[1] = 0; - uids[2] = 1; - uids[3] = 2; - uids[4] = 3; - uids[5] = 4; - - CHECK(runtest_users(cappwd, names, uids, 6)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: daemon, operator, bin - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "operator"; - names[2] = "bin"; - CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == 0); - names[3] = "tty"; - CHECK(cap_pwd_limit_users(cappwd, names, 4, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "tty"; - CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - uids[0] = 1; - uids[1] = 2; - uids[2] = 3; - - CHECK(runtest_users(cappwd, names, uids, 3)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: daemon, bin, tty - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "bin"; - names[2] = "tty"; - CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == 0); - names[3] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 4, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - uids[0] = 1; - uids[1] = 3; - uids[2] = 4; - - CHECK(runtest_users(cappwd, names, uids, 3)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: - * uids: 1, 2, 3 - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "operator"; - names[2] = "bin"; - uids[0] = 1; - uids[1] = 2; - uids[2] = 3; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == 0); - uids[3] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 4) == -1 && - errno == ENOTCAPABLE); - uids[0] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && - errno == ENOTCAPABLE); - uids[0] = 1; - - CHECK(runtest_users(cappwd, names, uids, 3)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: - * uids: 1, 3, 4 - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "bin"; - names[2] = "tty"; - uids[0] = 1; - uids[1] = 3; - uids[2] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == 0); - uids[3] = 5; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 4) == -1 && - errno == ENOTCAPABLE); - uids[0] = 5; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && - errno == ENOTCAPABLE); - uids[0] = 1; - - CHECK(runtest_users(cappwd, names, uids, 3)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: bin - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "bin"; - CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == 0); - names[1] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 2, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "bin"; - uids[0] = 3; - - CHECK(runtest_users(cappwd, names, uids, 1)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: 
daemon, tty - * uids: - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "tty"; - CHECK(cap_pwd_limit_users(cappwd, names, 2, NULL, 0) == 0); - names[2] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 3, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "operator"; - CHECK(cap_pwd_limit_users(cappwd, names, 1, NULL, 0) == -1 && - errno == ENOTCAPABLE); - names[0] = "daemon"; - uids[0] = 1; - uids[1] = 4; - - CHECK(runtest_users(cappwd, names, uids, 2)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: - * uids: 3 - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "bin"; - uids[0] = 3; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == 0); - uids[1] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 2) == -1 && - errno == ENOTCAPABLE); - uids[0] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && - errno == ENOTCAPABLE); - uids[0] = 3; - - CHECK(runtest_users(cappwd, names, uids, 1)); - - cap_close(cappwd); - - /* - * Allow: - * users: - * names: - * uids: 1, 4 - */ - cappwd = cap_clone(origcappwd); - CHECK(cappwd != NULL); - - names[0] = "daemon"; - names[1] = "tty"; - uids[0] = 1; - uids[1] = 4; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 2) == 0); - uids[2] = 3; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 3) == -1 && - errno == ENOTCAPABLE); - uids[0] = 3; - CHECK(cap_pwd_limit_users(cappwd, NULL, 0, uids, 1) == -1 && - errno == ENOTCAPABLE); - uids[0] = 1; - - CHECK(runtest_users(cappwd, names, uids, 2)); - - cap_close(cappwd); -} - -int -main(void) -{ - cap_channel_t *capcas, *cappwd; - - printf("1..188\n"); - - capcas = cap_init(); - CHECKX(capcas != NULL); - - cappwd = cap_service_open(capcas, "system.pwd"); - CHECKX(cappwd != NULL); - - cap_close(capcas); - - /* No limits. */ - - CHECK(runtest_cmds(cappwd) == (GETPWENT | GETPWENT_R | GETPWNAM | - GETPWNAM_R | GETPWUID | GETPWUID_R)); - - test_cmds(cappwd); - - test_fields(cappwd); - - test_users(cappwd); - - cap_close(cappwd); - - exit(0); -} Property changes on: projects/clang390-import/tools/regression/capsicum/libcasper/pwd.c ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang390-import/tools/regression/capsicum/libcasper/sysctl.c =================================================================== --- projects/clang390-import/tools/regression/capsicum/libcasper/sysctl.c (revision 305686) +++ projects/clang390-import/tools/regression/capsicum/libcasper/sysctl.c (nonexistent) @@ -1,1510 +0,0 @@ -/*- - * Copyright (c) 2013 The FreeBSD Foundation - * All rights reserved. - * - * This software was developed by Pawel Jakub Dawidek under sponsorship from - * the FreeBSD Foundation. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. 
- * - * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. - */ - -#include -__FBSDID("$FreeBSD$"); - -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -#include - -/* - * We need some sysctls to perform the tests on. - * We remember their values and restore them after the test is done. - */ -#define SYSCTL0_PARENT "kern" -#define SYSCTL0_NAME "kern.sync_on_panic" -#define SYSCTL1_PARENT "debug" -#define SYSCTL1_NAME "debug.minidump" - -static int ntest = 1; - -#define CHECK(expr) do { \ - if ((expr)) \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - else \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - ntest++; \ -} while (0) -#define CHECKX(expr) do { \ - if ((expr)) { \ - printf("ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - } else { \ - printf("not ok %d %s:%u\n", ntest, __FILE__, __LINE__); \ - exit(1); \ - } \ - ntest++; \ -} while (0) - -#define SYSCTL0_READ0 0x0001 -#define SYSCTL0_READ1 0x0002 -#define SYSCTL0_READ2 0x0004 -#define SYSCTL0_WRITE 0x0008 -#define SYSCTL0_READ_WRITE 0x0010 -#define SYSCTL1_READ0 0x0020 -#define SYSCTL1_READ1 0x0040 -#define SYSCTL1_READ2 0x0080 -#define SYSCTL1_WRITE 0x0100 -#define SYSCTL1_READ_WRITE 0x0200 - -static unsigned int -runtest(cap_channel_t *capsysctl) -{ - unsigned int result; - int oldvalue, newvalue; - size_t oldsize; - - result = 0; - - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, &oldsize, - NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue)) - result |= SYSCTL0_READ0; - } - - newvalue = 123; - if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, NULL, NULL, &newvalue, - sizeof(newvalue)) == 0) { - result |= SYSCTL0_WRITE; - } - - if ((result & SYSCTL0_WRITE) != 0) { - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, - &oldsize, NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 123) - result |= SYSCTL0_READ1; - } - } - - oldsize = sizeof(oldvalue); - newvalue = 4567; - if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, &oldsize, - &newvalue, sizeof(newvalue)) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 123) - result |= SYSCTL0_READ_WRITE; - } - - if ((result & SYSCTL0_READ_WRITE) != 0) { - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL0_NAME, &oldvalue, - &oldsize, NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 4567) - result |= SYSCTL0_READ2; - } - } - - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, &oldsize, - NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue)) - result |= SYSCTL1_READ0; - } - - newvalue = 506; - if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, NULL, NULL, &newvalue, - sizeof(newvalue)) == 0) { - result |= 
SYSCTL1_WRITE; - } - - if ((result & SYSCTL1_WRITE) != 0) { - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, - &oldsize, NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 506) - result |= SYSCTL1_READ1; - } - } - - oldsize = sizeof(oldvalue); - newvalue = 7008; - if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, &oldsize, - &newvalue, sizeof(newvalue)) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 506) - result |= SYSCTL1_READ_WRITE; - } - - if ((result & SYSCTL1_READ_WRITE) != 0) { - oldsize = sizeof(oldvalue); - if (cap_sysctlbyname(capsysctl, SYSCTL1_NAME, &oldvalue, - &oldsize, NULL, 0) == 0) { - if (oldsize == sizeof(oldvalue) && oldvalue == 7008) - result |= SYSCTL1_READ2; - } - } - - return (result); -} - -static void -test_operation(cap_channel_t *origcapsysctl) -{ - cap_channel_t *capsysctl; - nvlist_t *limits; - - /* - * Allow: - * SYSCTL0_PARENT/RDWR/RECURSIVE - * SYSCTL1_PARENT/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, "foo.bar", - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, "foo.bar", - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == SYSCTL0_READ0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/RDWR/RECURSIVE - * SYSCTL1_NAME/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, 
SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/RDWR - * SYSCTL1_PARENT/RDWR - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/RDWR - * SYSCTL1_NAME/RDWR - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - 
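For the system.sysctl service the limit descriptor is an nvlist: keys are sysctl node names and values are CAP_SYSCTL_* flags. As the usage in this file suggests, cap_limit_set() consumes the nvlist whether it succeeds or fails, which is why a fresh one is created before every call. A hypothetical read-only helper built from the same calls (a sketch only; channel setup as in main()):

#include <sys/types.h>
#include <sys/nv.h>

#include <err.h>

#include <libcasper.h>

#include <casper/cap_sysctl.h>

static int
read_only_sysctl(cap_channel_t *capsysctl, const char *name)
{
	nvlist_t *limits;
	size_t size;
	int value;

	/* Allow read access to exactly one node, then query it. */
	limits = nvlist_create(0);
	nvlist_add_number(limits, name, CAP_SYSCTL_READ);
	if (cap_limit_set(capsysctl, limits) == -1)
		err(1, "cap_limit_set");
	size = sizeof(value);
	if (cap_sysctlbyname(capsysctl, name, &value, &size, NULL, 0) == -1)
		err(1, "cap_sysctlbyname(%s)", name);
	return (value);
}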
CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/RDWR - * SYSCTL1_PARENT/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | - SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/RDWR - * SYSCTL1_NAME/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ/RECURSIVE - * SYSCTL1_PARENT/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = 
nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ/RECURSIVE - * SYSCTL1_NAME/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - 
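The distinction the surrounding cases keep probing is CAP_SYSCTL_RECURSIVE: a limit on a parent node such as "kern" covers descendants like "kern.sync_on_panic" only when that flag is set, while without it only an exact name match passes (which is why the plain PARENT/READ case expects runtest() to return 0). A sketch of that distinction as a hypothetical probe; each probe clones the unrestricted channel, since limits can never be widened afterwards:

#include <sys/types.h>
#include <sys/nv.h>

#include <err.h>
#include <stdbool.h>
#include <stdint.h>

#include <libcasper.h>

#include <casper/cap_sysctl.h>

static bool
child_readable(cap_channel_t *origcapsysctl, uint64_t flags)
{
	cap_channel_t *capsysctl;
	nvlist_t *limits;
	size_t size;
	int value;
	bool ok;

	capsysctl = cap_clone(origcapsysctl);
	if (capsysctl == NULL)
		err(1, "cap_clone");
	/* Limit the clone at the parent node with the given flags. */
	limits = nvlist_create(0);
	nvlist_add_number(limits, "kern", flags);
	if (cap_limit_set(capsysctl, limits) == -1)
		err(1, "cap_limit_set");
	size = sizeof(value);
	ok = cap_sysctlbyname(capsysctl, "kern.sync_on_panic", &value,
	    &size, NULL, 0) == 0;
	cap_close(capsysctl);
	return (ok);
}

Under the behavior tested here, child_readable(chan, CAP_SYSCTL_READ) would return false, and child_readable(chan, CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE) would return true.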
CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ - * SYSCTL1_PARENT/READ - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ - * SYSCTL1_NAME/READ - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | 
CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ - * SYSCTL1_PARENT/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_READ0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ - * SYSCTL1_NAME/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_READ0)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/WRITE/RECURSIVE - * SYSCTL1_PARENT/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | 
CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/WRITE/RECURSIVE - * SYSCTL1_NAME/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | 
CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/WRITE - * SYSCTL1_PARENT/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/WRITE - * SYSCTL1_NAME/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); 
- nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/WRITE - * SYSCTL1_PARENT/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_WRITE); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/WRITE - * SYSCTL1_NAME/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_WRITE | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - 
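The *_READ_WRITE bits in runtest() come from a single cap_sysctlbyname() call that passes both an old-value buffer and a new value, and such a call needs CAP_SYSCTL_READ and CAP_SYSCTL_WRITE together on the node. That is why the mixed read-only/write-only cases that follow expect only SYSCTL0_READ0 | SYSCTL1_WRITE. A reduced sketch of the combined call (hypothetical helper name):

#include <sys/types.h>

#include <err.h>

#include <libcasper.h>

#include <casper/cap_sysctl.h>

static int
swap_sysctl(cap_channel_t *capsysctl, const char *name, int newvalue)
{
	int oldvalue;
	size_t oldsize;

	oldsize = sizeof(oldvalue);
	/* Fetch the previous value and store the new one in one call. */
	if (cap_sysctlbyname(capsysctl, name, &oldvalue, &oldsize,
	    &newvalue, sizeof(newvalue)) == -1)
		err(1, "cap_sysctlbyname(%s)", name);
	return (oldvalue);
}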
/* - * Allow: - * SYSCTL0_PARENT/READ/RECURSIVE - * SYSCTL1_PARENT/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ/RECURSIVE - * SYSCTL1_NAME/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ - * SYSCTL1_PARENT/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ - * SYSCTL1_NAME/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ - * SYSCTL1_PARENT/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == SYSCTL1_WRITE); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_NAME/READ - * SYSCTL1_NAME/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL1_WRITE)); - - cap_close(capsysctl); -} - -static void -test_names(cap_channel_t *origcapsysctl) -{ - cap_channel_t *capsysctl; - nvlist_t *limits; - - /* - * Allow: - * SYSCTL0_PARENT/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_READ | 
CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL0_READ0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/READ/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_READ | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_READ0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL0_WRITE); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/WRITE/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == 
ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_WRITE | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_WRITE); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/RDWR/RECURSIVE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - nvlist_add_number(limits, SYSCTL1_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, - CAP_SYSCTL_RDWR | CAP_SYSCTL_RECURSIVE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | - SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/READ - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, 
SYSCTL0_PARENT, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/READ - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_READ); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_READ0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/WRITE - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_WRITE); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == SYSCTL1_WRITE); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL0_PARENT/RDWR - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_PARENT, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_PARENT, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == 0); - - cap_close(capsysctl); - - /* - * Allow: - * SYSCTL1_NAME/RDWR - */ - - capsysctl = cap_clone(origcapsysctl); - CHECK(capsysctl != NULL); - - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == 0); - limits = nvlist_create(0); - 
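-	/*
-	 * Attempting to re-extend the limit back to SYSCTL0_NAME must
-	 * fail with ENOTCAPABLE: an existing limit may only be narrowed.
-	 */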
nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - nvlist_add_number(limits, SYSCTL1_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - limits = nvlist_create(0); - nvlist_add_number(limits, SYSCTL0_NAME, CAP_SYSCTL_RDWR); - CHECK(cap_limit_set(capsysctl, limits) == -1 && errno == ENOTCAPABLE); - - CHECK(runtest(capsysctl) == (SYSCTL1_READ0 | SYSCTL1_READ1 | - SYSCTL1_READ2 | SYSCTL1_WRITE | SYSCTL1_READ_WRITE)); - - cap_close(capsysctl); -} - -int -main(void) -{ - cap_channel_t *capcas, *capsysctl; - int scvalue0, scvalue1; - size_t scsize; - - printf("1..256\n"); - - scsize = sizeof(scvalue0); - CHECKX(sysctlbyname(SYSCTL0_NAME, &scvalue0, &scsize, NULL, 0) == 0); - CHECKX(scsize == sizeof(scvalue0)); - scsize = sizeof(scvalue1); - CHECKX(sysctlbyname(SYSCTL1_NAME, &scvalue1, &scsize, NULL, 0) == 0); - CHECKX(scsize == sizeof(scvalue1)); - - capcas = cap_init(); - CHECKX(capcas != NULL); - - capsysctl = cap_service_open(capcas, "system.sysctl"); - CHECKX(capsysctl != NULL); - - cap_close(capcas); - - /* No limits set. */ - - CHECK(runtest(capsysctl) == (SYSCTL0_READ0 | SYSCTL0_READ1 | - SYSCTL0_READ2 | SYSCTL0_WRITE | SYSCTL0_READ_WRITE | - SYSCTL1_READ0 | SYSCTL1_READ1 | SYSCTL1_READ2 | SYSCTL1_WRITE | - SYSCTL1_READ_WRITE)); - - test_operation(capsysctl); - - test_names(capsysctl); - - cap_close(capsysctl); - - CHECK(sysctlbyname(SYSCTL0_NAME, NULL, NULL, &scvalue0, - sizeof(scvalue0)) == 0); - CHECK(sysctlbyname(SYSCTL1_NAME, NULL, NULL, &scvalue1, - sizeof(scvalue1)) == 0); - - exit(0); -} Property changes on: projects/clang390-import/tools/regression/capsicum/libcasper/sysctl.c ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang390-import/tools/regression/capsicum/libcasper/Makefile =================================================================== --- projects/clang390-import/tools/regression/capsicum/libcasper/Makefile (revision 305686) +++ projects/clang390-import/tools/regression/capsicum/libcasper/Makefile (nonexistent) @@ -1,32 +0,0 @@ -# $FreeBSD$ - -SERVICES= dns -SERVICES+= grp -SERVICES+= pwd -SERVICES+= sysctl - -CFLAGS= -O2 -pipe -std=gnu99 -fstack-protector -CFLAGS+= -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -CFLAGS+= -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -CFLAGS+= -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -CFLAGS+= -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -CFLAGS+= -Wold-style-definition -Wno-pointer-sign - -CFLAGS+= -ggdb - -SERVTEST= ${SERVICES:=.t} - -all: ${SERVTEST} - -.for SERVICE in ${SERVICES} - -${SERVICE}.t: ${SERVICE}.c - ${CC} ${CFLAGS} ${@:.t=.c} -o $@ -lnv -lcasper -lcap_${@:.t=} - -.endfor - -test: all - @prove -r ${.CURDIR} - -clean: - rm -f ${SERVTEST} Property changes on: projects/clang390-import/tools/regression/capsicum/libcasper/Makefile ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang390-import/usr.bin/bmake/Makefile 
=================================================================== --- projects/clang390-import/usr.bin/bmake/Makefile (revision 305686) +++ projects/clang390-import/usr.bin/bmake/Makefile (revision 305687) @@ -1,177 +1,177 @@ # This is a generated file, do NOT edit! # See contrib/bmake/bsd.after-import.mk # # $FreeBSD$ .sinclude "Makefile.inc" SRCTOP?= ${.CURDIR:H:H} # look here first for config.h CFLAGS+= -I${.CURDIR} # for after-import CLEANDIRS+= FreeBSD CLEANFILES+= bootstrap -# $Id: Makefile,v 1.67 2016/06/07 00:46:12 sjg Exp $ +# $Id: Makefile,v 1.72 2016/08/18 23:02:26 sjg Exp $ # Base version on src date -_MAKE_VERSION= 20160606 +_MAKE_VERSION= 20160818 PROG?= ${.CURDIR:T} SRCS= \ arch.c \ buf.c \ compat.c \ cond.c \ dir.c \ for.c \ hash.c \ job.c \ main.c \ make.c \ make_malloc.c \ meta.c \ metachar.c \ parse.c \ str.c \ strlist.c \ suff.c \ targ.c \ trace.c \ util.c \ var.c # from lst.lib/ SRCS+= \ lstAppend.c \ lstAtEnd.c \ lstAtFront.c \ lstClose.c \ lstConcat.c \ lstDatum.c \ lstDeQueue.c \ lstDestroy.c \ lstDupl.c \ lstEnQueue.c \ lstFind.c \ lstFindFrom.c \ lstFirst.c \ lstForEach.c \ lstForEachFrom.c \ lstInit.c \ lstInsert.c \ lstIsAtEnd.c \ lstIsEmpty.c \ lstLast.c \ lstMember.c \ lstNext.c \ lstOpen.c \ lstPrev.c \ lstRemove.c \ lstReplace.c \ lstSucc.c # this file gets generated by configure .sinclude "Makefile.config" .if !empty(LIBOBJS) SRCS+= ${LIBOBJS:T:.o=.c} .endif # just in case prefix?= /usr srcdir?= ${.CURDIR} DEFAULT_SYS_PATH?= ${prefix}/share/mk CPPFLAGS+= -DUSE_META CFLAGS+= ${CPPFLAGS} CFLAGS+= -D_PATH_DEFSYSPATH=\"${DEFAULT_SYS_PATH}\" CFLAGS+= -I. -I${srcdir} ${XDEFS} -DMAKE_NATIVE CFLAGS+= ${COPTS.${.ALLSRC:M*.c:T:u}} COPTS.main.c+= "-DMAKE_VERSION=\"${_MAKE_VERSION}\"" # meta mode can be useful even without filemon FILEMON_H ?= /usr/include/dev/filemon/filemon.h .if exists(${FILEMON_H}) && ${FILEMON_H:T} == "filemon.h" COPTS.meta.c += -DHAVE_FILEMON_H -I${FILEMON_H:H} .endif .PATH: ${srcdir} .PATH: ${srcdir}/lst.lib .if make(obj) || make(clean) SUBDIR+= unit-tests .endif MAN= ${PROG}.1 MAN1= ${MAN} .if (${PROG} != "make") CLEANFILES+= my.history .if make(${MAN}) || !exists(${srcdir}/${MAN}) my.history: ${MAKEFILE} @(echo ".Nm"; \ echo "is derived from NetBSD"; \ echo ".Xr make 1 ."; \ echo "It uses autoconf to facilitate portability to other platforms."; \ echo ".Pp") > $@ .NOPATH: ${MAN} ${MAN}: make.1 my.history @echo making $@ @sed -e 's/^.Nx/NetBSD/' -e '/^.Nm/s/make/${PROG}/' \ -e '/^.Sh HISTORY/rmy.history' \ -e '/^.Sh HISTORY/,$$s,^.Nm,make,' ${srcdir}/make.1 > $@ all beforeinstall: ${MAN} _mfromdir=. .endif .endif MANTARGET?= cat MANDEST?= ${MANDIR}/${MANTARGET}1 .if ${MANTARGET} == "cat" _mfromdir=${srcdir} .endif .include CPPFLAGS+= -DMAKE_NATIVE -DHAVE_CONFIG_H COPTS.var.c += -Wno-cast-qual COPTS.job.c += -Wno-format-nonliteral COPTS.parse.c += -Wno-format-nonliteral COPTS.var.c += -Wno-format-nonliteral # Force these SHAREDIR= ${SHAREDIR.bmake:U${prefix}/share} BINDIR= ${BINDIR.bmake:U${prefix}/bin} MANDIR= ${MANDIR.bmake:U${SHAREDIR}/man} .if !exists(.depend) ${OBJS}: config.h .endif # make sure that MAKE_VERSION gets updated. 
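# (MAKE_VERSION reaches the binary through COPTS.main.c above, so the
# dependency below forces main.o to rebuild whenever the sources or
# this Makefile change.)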
main.o: ${SRCS} ${MAKEFILE}

# A simple unit-test driver to help catch regressions
accept test:
	cd ${.CURDIR}/unit-tests && MAKEFLAGS= ${.MAKE} -r -m / TEST_MAKE=${TEST_MAKE:U${.OBJDIR}/${PROG:T}} ${.TARGET}

# override some simple things
BINDIR= /usr/bin
MANDIR= /usr/share/man/man

# make sure we get this
CFLAGS+= ${COPTS.${.IMPSRC:T}}

after-import: ${SRCTOP}/contrib/bmake/bsd.after-import.mk
	cd ${.CURDIR} && ${.MAKE} -f ${SRCTOP}/contrib/bmake/bsd.after-import.mk

Index: projects/clang390-import/usr.sbin/newsyslog/newsyslog.c
===================================================================
--- projects/clang390-import/usr.sbin/newsyslog/newsyslog.c	(revision 305686)
+++ projects/clang390-import/usr.sbin/newsyslog/newsyslog.c	(revision 305687)
@@ -1,2681 +1,2681 @@
/*-
 * ------+---------+---------+-------- + --------+---------+---------+---------*
 * This file includes significant modifications done by:
 * Copyright (c) 2003, 2004  - Garance Alistair Drosehn.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * ------+---------+---------+-------- + --------+---------+---------+---------*
 */

/*
 * This file contains changes from the Open Software Foundation.
 */

/*
 * Copyright 1988, 1989 by the Massachusetts Institute of Technology
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose and without fee is hereby granted, provided
 * that the above copyright notice appear in all copies and that both that
 * copyright notice and this permission notice appear in supporting
 * documentation, and that the names of M.I.T. and the M.I.T. S.I.P.B. not be
 * used in advertising or publicity pertaining to distribution of the
 * software without specific, written prior permission.  M.I.T. and the M.I.T.
 * S.I.P.B. make no representations about the suitability of this software
 * for any purpose.  It is provided "as is" without express or implied
 * warranty.
 *
 */

/*
 * newsyslog - roll over selected logs at the appropriate time, keeping a
 * specified number of backup files around.
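 * The rotation rules are read from newsyslog.conf(5); see parse_file()
 * below for the exact field layout.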
*/ #include __FBSDID("$FreeBSD$"); #define OSF #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pathnames.h" #include "extern.h" /* * Compression suffixes */ #ifndef COMPRESS_SUFFIX_GZ #define COMPRESS_SUFFIX_GZ ".gz" #endif #ifndef COMPRESS_SUFFIX_BZ2 #define COMPRESS_SUFFIX_BZ2 ".bz2" #endif #ifndef COMPRESS_SUFFIX_XZ #define COMPRESS_SUFFIX_XZ ".xz" #endif #define COMPRESS_SUFFIX_MAXLEN MAX(MAX(sizeof(COMPRESS_SUFFIX_GZ),sizeof(COMPRESS_SUFFIX_BZ2)),sizeof(COMPRESS_SUFFIX_XZ)) /* * Compression types */ #define COMPRESS_TYPES 4 /* Number of supported compression types */ #define COMPRESS_NONE 0 #define COMPRESS_GZIP 1 #define COMPRESS_BZIP2 2 #define COMPRESS_XZ 3 /* * Bit-values for the 'flags' parsed from a config-file entry. */ #define CE_BINARY 0x0008 /* Logfile is in binary, do not add status */ /* messages to logfile(s) when rotating. */ #define CE_NOSIGNAL 0x0010 /* There is no process to signal when */ /* trimming this file. */ #define CE_TRIMAT 0x0020 /* trim file at a specific time. */ #define CE_GLOB 0x0040 /* name of the log is file name pattern. */ #define CE_SIGNALGROUP 0x0080 /* Signal a process-group instead of a single */ /* process when trimming this file. */ #define CE_CREATE 0x0100 /* Create the log file if it does not exist. */ #define CE_NODUMP 0x0200 /* Set 'nodump' on newly created log file. */ #define CE_PID2CMD 0x0400 /* Replace PID file with a shell command.*/ #define MIN_PID 5 /* Don't touch pids lower than this */ #define MAX_PID 99999 /* was lower, see /usr/include/sys/proc.h */ #define kbytes(size) (((size) + 1023) >> 10) #define DEFAULT_MARKER "" #define DEBUG_MARKER "" #define INCLUDE_MARKER "" #define DEFAULT_TIMEFNAME_FMT "%Y%m%dT%H%M%S" #define MAX_OLDLOGS 65536 /* Default maximum number of old logfiles */ struct compress_types { const char *flag; /* Flag in configuration file */ const char *suffix; /* Compression suffix */ const char *path; /* Path to compression program */ }; static const struct compress_types compress_type[COMPRESS_TYPES] = { { "", "", "" }, /* no compression */ { "Z", COMPRESS_SUFFIX_GZ, _PATH_GZIP }, /* gzip compression */ { "J", COMPRESS_SUFFIX_BZ2, _PATH_BZIP2 }, /* bzip2 compression */ { "X", COMPRESS_SUFFIX_XZ, _PATH_XZ } /* xz compression */ }; struct conf_entry { STAILQ_ENTRY(conf_entry) cf_nextp; char *log; /* Name of the log */ char *pid_cmd_file; /* PID or command file */ char *r_reason; /* The reason this file is being rotated */ int firstcreate; /* Creating log for the first time (-C). 
*/ int rotate; /* Non-zero if this file should be rotated */ int fsize; /* size found for the log file */ uid_t uid; /* Owner of log */ gid_t gid; /* Group of log */ int numlogs; /* Number of logs to keep */ int trsize; /* Size cutoff to trigger trimming the log */ int hours; /* Hours between log trimming */ struct ptime_data *trim_at; /* Specific time to do trimming */ unsigned int permissions; /* File permissions on the log */ int flags; /* CE_BINARY */ int compress; /* Compression */ int sig; /* Signal to send */ int def_cfg; /* Using the rule for this file */ }; struct sigwork_entry { SLIST_ENTRY(sigwork_entry) sw_nextp; int sw_signum; /* the signal to send */ int sw_pidok; /* true if pid value is valid */ pid_t sw_pid; /* the process id from the PID file */ const char *sw_pidtype; /* "daemon" or "process group" */ int sw_runcmd; /* run command or send PID to signal */ char sw_fname[1]; /* file the PID was read from or shell cmd */ }; struct zipwork_entry { SLIST_ENTRY(zipwork_entry) zw_nextp; const struct conf_entry *zw_conf; /* for chown/perm/flag info */ const struct sigwork_entry *zw_swork; /* to know success of signal */ int zw_fsize; /* size of the file to compress */ char zw_fname[1]; /* the file to compress */ }; struct include_entry { STAILQ_ENTRY(include_entry) inc_nextp; const char *file; /* Name of file to process */ }; struct oldlog_entry { char *fname; /* Filename of the log file */ time_t t; /* Parsed timestamp of the logfile */ }; typedef enum { FREE_ENT, KEEP_ENT } fk_entry; STAILQ_HEAD(cflist, conf_entry); static SLIST_HEAD(swlisthead, sigwork_entry) swhead = SLIST_HEAD_INITIALIZER(swhead); static SLIST_HEAD(zwlisthead, zipwork_entry) zwhead = SLIST_HEAD_INITIALIZER(zwhead); STAILQ_HEAD(ilist, include_entry); int dbg_at_times; /* -D Show details of 'trim_at' code */ static int archtodir = 0; /* Archive old logfiles to other directory */ static int createlogs; /* Create (non-GLOB) logfiles which do not */ /* already exist. 1=='for entries with */ /* C flag', 2=='for all entries'. */ int verbose = 0; /* Print out what's going on */ static int needroot = 1; /* Root privs are necessary */ int noaction = 0; /* Don't do anything, just show it */ static int norotate = 0; /* Don't rotate */ static int nosignal; /* Do not send any signals */ static int enforcepid = 0; /* If PID file does not exist or empty, do nothing */ static int force = 0; /* Force the trim no matter what */ static int rotatereq = 0; /* -R = Always rotate the file(s) as given */ /* on the command (this also requires */ /* that a list of files *are* given on */ /* the run command). */ static char *requestor; /* The name given on a -R request */ static char *timefnamefmt = NULL;/* Use time based filenames instead of .0 */ static char *archdirname; /* Directory path to old logfiles archive */ static char *destdir = NULL; /* Directory to treat at root for logs */ static const char *conf; /* Configuration file to use */ struct ptime_data *dbg_timenow; /* A "timenow" value set via -D option */ static struct ptime_data *timenow; /* The time to use for checking at-fields */ #define DAYTIME_LEN 16 static char daytime[DAYTIME_LEN];/* The current time in human readable form, * used for rotation-tracking messages. 
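 * (parse_args fills it from ptimeget_ctime() + 4, skipping the
 * day-of-week prefix)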
*/ static char hostname[MAXHOSTNAMELEN]; /* hostname */ static const char *path_syslogpid = _PATH_SYSLOGPID; static struct cflist *get_worklist(char **files); static void parse_file(FILE *cf, struct cflist *work_p, struct cflist *glob_p, struct conf_entry *defconf_p, struct ilist *inclist); static void add_to_queue(const char *fname, struct ilist *inclist); static char *sob(char *p); static char *son(char *p); static int isnumberstr(const char *); static int isglobstr(const char *); static char *missing_field(char *p, char *errline); static void change_attrs(const char *, const struct conf_entry *); static const char *get_logfile_suffix(const char *logfile); static fk_entry do_entry(struct conf_entry *); static fk_entry do_rotate(const struct conf_entry *); static void do_sigwork(struct sigwork_entry *); static void do_zipwork(struct zipwork_entry *); static struct sigwork_entry * save_sigwork(const struct conf_entry *); static struct zipwork_entry * save_zipwork(const struct conf_entry *, const struct sigwork_entry *, int, const char *); static void set_swpid(struct sigwork_entry *, const struct conf_entry *); static int sizefile(const char *); static void expand_globs(struct cflist *work_p, struct cflist *glob_p); static void free_clist(struct cflist *list); static void free_entry(struct conf_entry *ent); static struct conf_entry *init_entry(const char *fname, struct conf_entry *src_entry); static void parse_args(int argc, char **argv); static int parse_doption(const char *doption); static void usage(void); static int log_trim(const char *logname, const struct conf_entry *log_ent); static int age_old_log(const char *file); static void savelog(char *from, char *to); static void createdir(const struct conf_entry *ent, char *dirpart); static void createlog(const struct conf_entry *ent); static int parse_signal(const char *str); /* * All the following take a parameter of 'int', but expect values in the * range of unsigned char. Define wrappers which take values of type 'char', * whether signed or unsigned, and ensure they end up in the right range. */ #define isdigitch(Anychar) isdigit((u_char)(Anychar)) #define isprintch(Anychar) isprint((u_char)(Anychar)) #define isspacech(Anychar) isspace((u_char)(Anychar)) #define tolowerch(Anychar) tolower((u_char)(Anychar)) int main(int argc, char **argv) { struct cflist *worklist; struct conf_entry *p; struct sigwork_entry *stmp; struct zipwork_entry *ztmp; SLIST_INIT(&swhead); SLIST_INIT(&zwhead); parse_args(argc, argv); argc -= optind; argv += optind; if (needroot && getuid() && geteuid()) errx(1, "must have root privs"); worklist = get_worklist(argv); /* * Rotate all the files which need to be rotated. Note that * some users have *hundreds* of entries in newsyslog.conf! */ while (!STAILQ_EMPTY(worklist)) { p = STAILQ_FIRST(worklist); STAILQ_REMOVE_HEAD(worklist, cf_nextp); if (do_entry(p) == FREE_ENT) free_entry(p); } /* * Send signals to any processes which need a signal to tell * them to close and re-open the log file(s) we have rotated. * Note that zipwork_entries include pointers to these * sigwork_entry's, so we can not free the entries here. 
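 * (They are freed near the bottom of main(), once all zipwork has
 * been processed.)
 */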
*/ if (!SLIST_EMPTY(&swhead)) { if (noaction || verbose) printf("Signal all daemon process(es)...\n"); SLIST_FOREACH(stmp, &swhead, sw_nextp) do_sigwork(stmp); if (!(rotatereq && nosignal)) { if (noaction) printf("\tsleep 10\n"); else { if (verbose) printf("Pause 10 seconds to allow " "daemon(s) to close log file(s)\n"); sleep(10); } } } /* * Compress all files that we're expected to compress, now * that all processes should have closed the files which * have been rotated. */ if (!SLIST_EMPTY(&zwhead)) { if (noaction || verbose) printf("Compress all rotated log file(s)...\n"); while (!SLIST_EMPTY(&zwhead)) { ztmp = SLIST_FIRST(&zwhead); do_zipwork(ztmp); SLIST_REMOVE_HEAD(&zwhead, zw_nextp); free(ztmp); } } /* Now free all the sigwork entries. */ while (!SLIST_EMPTY(&swhead)) { stmp = SLIST_FIRST(&swhead); SLIST_REMOVE_HEAD(&swhead, sw_nextp); free(stmp); } while (wait(NULL) > 0 || errno == EINTR) ; return (0); } static struct conf_entry * init_entry(const char *fname, struct conf_entry *src_entry) { struct conf_entry *tempwork; if (verbose > 4) printf("\t--> [creating entry for %s]\n", fname); tempwork = malloc(sizeof(struct conf_entry)); if (tempwork == NULL) err(1, "malloc of conf_entry for %s", fname); if (destdir == NULL || fname[0] != '/') tempwork->log = strdup(fname); else asprintf(&tempwork->log, "%s%s", destdir, fname); if (tempwork->log == NULL) err(1, "strdup for %s", fname); if (src_entry != NULL) { tempwork->pid_cmd_file = NULL; if (src_entry->pid_cmd_file) tempwork->pid_cmd_file = strdup(src_entry->pid_cmd_file); tempwork->r_reason = NULL; tempwork->firstcreate = 0; tempwork->rotate = 0; tempwork->fsize = -1; tempwork->uid = src_entry->uid; tempwork->gid = src_entry->gid; tempwork->numlogs = src_entry->numlogs; tempwork->trsize = src_entry->trsize; tempwork->hours = src_entry->hours; tempwork->trim_at = NULL; if (src_entry->trim_at != NULL) tempwork->trim_at = ptime_init(src_entry->trim_at); tempwork->permissions = src_entry->permissions; tempwork->flags = src_entry->flags; tempwork->compress = src_entry->compress; tempwork->sig = src_entry->sig; tempwork->def_cfg = src_entry->def_cfg; } else { /* Initialize as a "do-nothing" entry */ tempwork->pid_cmd_file = NULL; tempwork->r_reason = NULL; tempwork->firstcreate = 0; tempwork->rotate = 0; tempwork->fsize = -1; tempwork->uid = (uid_t)-1; tempwork->gid = (gid_t)-1; tempwork->numlogs = 1; tempwork->trsize = -1; tempwork->hours = -1; tempwork->trim_at = NULL; tempwork->permissions = 0; tempwork->flags = 0; tempwork->compress = COMPRESS_NONE; tempwork->sig = SIGHUP; tempwork->def_cfg = 0; } return (tempwork); } static void free_entry(struct conf_entry *ent) { if (ent == NULL) return; if (ent->log != NULL) { if (verbose > 4) printf("\t--> [freeing entry for %s]\n", ent->log); free(ent->log); ent->log = NULL; } if (ent->pid_cmd_file != NULL) { free(ent->pid_cmd_file); ent->pid_cmd_file = NULL; } if (ent->r_reason != NULL) { free(ent->r_reason); ent->r_reason = NULL; } if (ent->trim_at != NULL) { ptime_free(ent->trim_at); ent->trim_at = NULL; } free(ent); } static void free_clist(struct cflist *list) { struct conf_entry *ent; while (!STAILQ_EMPTY(list)) { ent = STAILQ_FIRST(list); STAILQ_REMOVE_HEAD(list, cf_nextp); free_entry(ent); } free(list); list = NULL; } static fk_entry do_entry(struct conf_entry * ent) { #define REASON_MAX 80 int modtime; fk_entry free_or_keep; double diffsecs; char temp_reason[REASON_MAX]; int oversized; free_or_keep = FREE_ENT; if (verbose) printf("%s <%d%s>: ", ent->log, ent->numlogs, 
compress_type[ent->compress].flag); ent->fsize = sizefile(ent->log); oversized = ((ent->trsize > 0) && (ent->fsize >= ent->trsize)); modtime = age_old_log(ent->log); ent->rotate = 0; ent->firstcreate = 0; if (ent->fsize < 0) { /* * If either the C flag or the -C option was specified, * and if we won't be creating the file, then have the * verbose message include a hint as to why the file * will not be created. */ temp_reason[0] = '\0'; if (createlogs > 1) ent->firstcreate = 1; else if ((ent->flags & CE_CREATE) && createlogs) ent->firstcreate = 1; else if (ent->flags & CE_CREATE) strlcpy(temp_reason, " (no -C option)", REASON_MAX); else if (createlogs) strlcpy(temp_reason, " (no C flag)", REASON_MAX); if (ent->firstcreate) { if (verbose) printf("does not exist -> will create.\n"); createlog(ent); } else if (verbose) { printf("does not exist, skipped%s.\n", temp_reason); } } else { if (ent->flags & CE_TRIMAT && !force && !rotatereq && !oversized) { diffsecs = ptimeget_diff(timenow, ent->trim_at); if (diffsecs < 0.0) { /* trim_at is some time in the future. */ if (verbose) { ptime_adjust4dst(ent->trim_at, timenow); printf("--> will trim at %s", ptimeget_ctime(ent->trim_at)); } return (free_or_keep); } else if (diffsecs >= 3600.0) { /* * trim_at is more than an hour in the past, * so find the next valid trim_at time, and * tell the user what that will be. */ if (verbose && dbg_at_times) printf("\n\t--> prev trim at %s\t", ptimeget_ctime(ent->trim_at)); if (verbose) { ptimeset_nxtime(ent->trim_at); printf("--> will trim at %s", ptimeget_ctime(ent->trim_at)); } return (free_or_keep); } else if (verbose && noaction && dbg_at_times) { /* * If we are just debugging at-times, then * a detailed message is helpful. Also * skip "doing" any commands, since they * would all be turned off by no-action. */ printf("\n\t--> timematch at %s", ptimeget_ctime(ent->trim_at)); return (free_or_keep); } else if (verbose && ent->hours <= 0) { printf("--> time is up\n"); } } if (verbose && (ent->trsize > 0)) printf("size (Kb): %d [%d] ", ent->fsize, ent->trsize); if (verbose && (ent->hours > 0)) printf(" age (hr): %d [%d] ", modtime, ent->hours); /* * Figure out if this logfile needs to be rotated. */ temp_reason[0] = '\0'; if (rotatereq) { ent->rotate = 1; snprintf(temp_reason, REASON_MAX, " due to -R from %s", requestor); } else if (force) { ent->rotate = 1; snprintf(temp_reason, REASON_MAX, " due to -F request"); } else if (oversized) { ent->rotate = 1; snprintf(temp_reason, REASON_MAX, " due to size>%dK", ent->trsize); } else if (ent->hours <= 0 && (ent->flags & CE_TRIMAT)) { ent->rotate = 1; } else if ((ent->hours > 0) && ((modtime >= ent->hours) || (modtime < 0))) { ent->rotate = 1; } /* * If the file needs to be rotated, then rotate it. */ if (ent->rotate && !norotate) { if (temp_reason[0] != '\0') ent->r_reason = strdup(temp_reason); if (verbose) printf("--> trimming log....\n"); if (noaction && !verbose) printf("%s <%d%s>: trimming\n", ent->log, ent->numlogs, compress_type[ent->compress].flag); free_or_keep = do_rotate(ent); } else { if (verbose) printf("--> skipping\n"); } } return (free_or_keep); #undef REASON_MAX } static void parse_args(int argc, char **argv) { int ch; char *p; timenow = ptime_init(NULL); ptimeset_time(timenow, time(NULL)); strlcpy(daytime, ptimeget_ctime(timenow) + 4, DAYTIME_LEN); /* Let's get our hostname */ (void)gethostname(hostname, sizeof(hostname)); /* Truncate domain */ if ((p = strchr(hostname, '.')) != NULL) *p = '\0'; /* Parse command line options. 
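 * Note that -n (no action) deliberately falls through to -r, so a
 * no-action run does not require root; -F (force) and -N (no rotate)
 * are rejected as mutually exclusive once the loop ends.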
*/ while ((ch = getopt(argc, argv, "a:d:f:nrst:vCD:FNPR:S:")) != -1) switch (ch) { case 'a': archtodir++; archdirname = optarg; break; case 'd': destdir = optarg; break; case 'f': conf = optarg; break; case 'n': noaction++; /* FALLTHROUGH */ case 'r': needroot = 0; break; case 's': nosignal = 1; break; case 't': if (optarg[0] == '\0' || strcmp(optarg, "DEFAULT") == 0) timefnamefmt = strdup(DEFAULT_TIMEFNAME_FMT); else timefnamefmt = strdup(optarg); break; case 'v': verbose++; break; case 'C': /* Useful for things like rc.diskless... */ createlogs++; break; case 'D': /* * Set some debugging option. The specific option * depends on the value of optarg. These options * may come and go without notice or documentation. */ if (parse_doption(optarg)) break; usage(); /* NOTREACHED */ case 'F': force++; break; case 'N': norotate++; break; case 'P': enforcepid++; break; case 'R': rotatereq++; requestor = strdup(optarg); break; case 'S': path_syslogpid = optarg; break; case 'm': /* Used by OpenBSD for "monitor mode" */ default: usage(); /* NOTREACHED */ } if (force && norotate) { warnx("Only one of -F and -N may be specified."); usage(); /* NOTREACHED */ } if (rotatereq) { if (optind == argc) { warnx("At least one filename must be given when -R is specified."); usage(); /* NOTREACHED */ } /* Make sure "requestor" value is safe for a syslog message. */ for (p = requestor; *p != '\0'; p++) { if (!isprintch(*p) && (*p != '\t')) *p = '.'; } } if (dbg_timenow) { /* * Note that the 'daytime' variable is not changed. * That is only used in messages that track when a * logfile is rotated, and if a file *is* rotated, * then it will still rotated at the "real now" time. */ ptime_free(timenow); timenow = dbg_timenow; fprintf(stderr, "Debug: Running as if TimeNow is %s", ptimeget_ctime(dbg_timenow)); } } /* * These debugging options are mainly meant for developer use, such * as writing regression-tests. They would not be needed by users * during normal operation of newsyslog... */ static int parse_doption(const char *doption) { const char TN[] = "TN="; int res; if (strncmp(doption, TN, sizeof(TN) - 1) == 0) { /* * The "TimeNow" debugging option. This might be off * by an hour when crossing a timezone change. */ dbg_timenow = ptime_init(NULL); res = ptime_relparse(dbg_timenow, PTM_PARSE_ISO8601, time(NULL), doption + sizeof(TN) - 1); if (res == -2) { warnx("Non-existent time specified on -D %s", doption); return (0); /* failure */ } else if (res < 0) { warnx("Malformed time given on -D %s", doption); return (0); /* failure */ } return (1); /* successfully parsed */ } if (strcmp(doption, "ats") == 0) { dbg_at_times++; return (1); /* successfully parsed */ } /* XXX - This check could probably be dropped. */ if ((strcmp(doption, "neworder") == 0) || (strcmp(doption, "oldorder") == 0)) { warnx("NOTE: newsyslog always uses 'neworder'."); return (1); /* successfully parsed */ } warnx("Unknown -D (debug) option: '%s'", doption); return (0); /* failure */ } static void usage(void) { fprintf(stderr, "usage: newsyslog [-CFNPnrsv] [-a directory] [-d directory] [-f config_file]\n" " [-S pidfile] [-t timefmt] [[-R tagname] file ...]\n"); exit(1); } /* * Parse a configuration file and return a linked list of all the logs * which should be processed. 
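 * With no filenames on the command line this is simply the expanded
 * config-file list; otherwise a per-file worklist is built, falling
 * back to a built-in default entry for files not listed anywhere.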
*/ static struct cflist * get_worklist(char **files) { FILE *f; char **given; struct cflist *cmdlist, *filelist, *globlist; struct conf_entry *defconf, *dupent, *ent; struct ilist inclist; struct include_entry *inc; int gmatch, fnres; defconf = NULL; STAILQ_INIT(&inclist); filelist = malloc(sizeof(struct cflist)); if (filelist == NULL) err(1, "malloc of filelist"); STAILQ_INIT(filelist); globlist = malloc(sizeof(struct cflist)); if (globlist == NULL) err(1, "malloc of globlist"); STAILQ_INIT(globlist); inc = malloc(sizeof(struct include_entry)); if (inc == NULL) err(1, "malloc of inc"); inc->file = conf; if (inc->file == NULL) inc->file = _PATH_CONF; STAILQ_INSERT_TAIL(&inclist, inc, inc_nextp); STAILQ_FOREACH(inc, &inclist, inc_nextp) { if (strcmp(inc->file, "-") != 0) f = fopen(inc->file, "r"); else { f = stdin; inc->file = ""; } if (!f) err(1, "%s", inc->file); if (verbose) printf("Processing %s\n", inc->file); parse_file(f, filelist, globlist, defconf, &inclist); (void) fclose(f); } /* * All config-file information has been read in and turned into * a filelist and a globlist. If there were no specific files * given on the run command, then the only thing left to do is to * call a routine which finds all files matched by the globlist * and adds them to the filelist. Then return the worklist. */ if (*files == NULL) { expand_globs(filelist, globlist); free_clist(globlist); if (defconf != NULL) free_entry(defconf); return (filelist); /* NOTREACHED */ } /* * If newsyslog was given a specific list of files to process, * it may be that some of those files were not listed in any * config file. Those unlisted files should get the default * rotation action. First, create the default-rotation action * if none was found in a system config file. */ if (defconf == NULL) { defconf = init_entry(DEFAULT_MARKER, NULL); defconf->numlogs = 3; defconf->trsize = 50; defconf->permissions = S_IRUSR|S_IWUSR; } /* * If newsyslog was run with a list of specific filenames, * then create a new worklist which has only those files in * it, picking up the rotation-rules for those files from * the original filelist. * * XXX - Note that this will copy multiple rules for a single * logfile, if multiple entries are an exact match for * that file. That matches the historic behavior, but do * we want to continue to allow it? If so, it should * probably be handled more intelligently. */ cmdlist = malloc(sizeof(struct cflist)); if (cmdlist == NULL) err(1, "malloc of cmdlist"); STAILQ_INIT(cmdlist); for (given = files; *given; ++given) { /* * First try to find exact-matches for this given file. */ gmatch = 0; STAILQ_FOREACH(ent, filelist, cf_nextp) { if (strcmp(ent->log, *given) == 0) { gmatch++; dupent = init_entry(*given, ent); STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp); } } if (gmatch) { if (verbose > 2) printf("\t+ Matched entry %s\n", *given); continue; } /* * There was no exact-match for this given file, so look * for a "glob" entry which does match. */ gmatch = 0; if (verbose > 2 && globlist != NULL) printf("\t+ Checking globs for %s\n", *given); STAILQ_FOREACH(ent, globlist, cf_nextp) { fnres = fnmatch(ent->log, *given, FNM_PATHNAME); if (verbose > 2) printf("\t+ = %d for pattern %s\n", fnres, ent->log); if (fnres == 0) { gmatch++; dupent = init_entry(*given, ent); /* This new entry is not a glob! 
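 * (it now names one specific file, so CE_GLOB is cleared before the
 * entry is queued)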
*/ dupent->flags &= ~CE_GLOB; STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp); /* Only allow a match to one glob-entry */ break; } } if (gmatch) { if (verbose > 2) printf("\t+ Matched %s via %s\n", *given, ent->log); continue; } /* * This given file was not found in any config file, so * add a worklist item based on the default entry. */ if (verbose > 2) printf("\t+ No entry matched %s (will use %s)\n", *given, DEFAULT_MARKER); dupent = init_entry(*given, defconf); /* Mark that it was *not* found in a config file */ dupent->def_cfg = 1; STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp); } /* * Free all the entries in the original work list, the list of * glob entries, and the default entry. */ free_clist(filelist); free_clist(globlist); free_entry(defconf); /* And finally, return a worklist which matches the given files. */ return (cmdlist); } /* * Expand the list of entries with filename patterns, and add all files * which match those glob-entries onto the worklist. */ static void expand_globs(struct cflist *work_p, struct cflist *glob_p) { int gmatch, gres; size_t i; char *mfname; struct conf_entry *dupent, *ent, *globent; glob_t pglob; struct stat st_fm; /* * The worklist contains all fully-specified (non-GLOB) names. * * Now expand the list of filename-pattern (GLOB) entries into * a second list, which (by definition) will only match files * that already exist. Do not add a glob-related entry for any * file which already exists in the fully-specified list. */ STAILQ_FOREACH(globent, glob_p, cf_nextp) { gres = glob(globent->log, GLOB_NOCHECK, NULL, &pglob); if (gres != 0) { warn("cannot expand pattern (%d): %s", gres, globent->log); continue; } if (verbose > 2) printf("\t+ Expanding pattern %s\n", globent->log); for (i = 0; i < pglob.gl_matchc; i++) { mfname = pglob.gl_pathv[i]; /* See if this file already has a specific entry. */ gmatch = 0; STAILQ_FOREACH(ent, work_p, cf_nextp) { if (strcmp(mfname, ent->log) == 0) { gmatch++; break; } } if (gmatch) continue; /* Make sure the named matched is a file. */ gres = lstat(mfname, &st_fm); if (gres != 0) { /* Error on a file that glob() matched?!? */ warn("Skipping %s - lstat() error", mfname); continue; } if (!S_ISREG(st_fm.st_mode)) { /* We only rotate files! */ if (verbose > 2) printf("\t+ . skipping %s (!file)\n", mfname); continue; } if (verbose > 2) printf("\t+ . add file %s\n", mfname); dupent = init_entry(mfname, globent); /* This new entry is not a glob! */ dupent->flags &= ~CE_GLOB; /* Add to the worklist. */ STAILQ_INSERT_TAIL(work_p, dupent, cf_nextp); } globfree(&pglob); if (verbose > 2) printf("\t+ Done with pattern %s\n", globent->log); } } /* * Parse a configuration file and update a linked list of all the logs to * process. 
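 * Each entry supplies, in order: the logfile name, an optional
 * owner:group pair, the mode, the count of old logs to keep, the
 * size cutoff, the interval and/or an "@"/"$" at-time, and then
 * optional flags (e.g. Z/J/X select gzip/bzip2/xz), an optional
 * pid/cmd file, and an optional signal.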
*/ static void parse_file(FILE *cf, struct cflist *work_p, struct cflist *glob_p, struct conf_entry *defconf_p, struct ilist *inclist) { char line[BUFSIZ], *parse, *q; char *cp, *errline, *group; struct conf_entry *working; struct passwd *pwd; struct group *grp; glob_t pglob; int eol, ptm_opts, res, special; size_t i; errline = NULL; while (fgets(line, BUFSIZ, cf)) { if ((line[0] == '\n') || (line[0] == '#') || (strlen(line) == 0)) continue; if (errline != NULL) free(errline); errline = strdup(line); for (cp = line + 1; *cp != '\0'; cp++) { if (*cp != '#') continue; if (*(cp - 1) == '\\') { strcpy(cp - 1, cp); cp--; continue; } *cp = '\0'; break; } q = parse = missing_field(sob(line), errline); parse = son(line); if (!*parse) errx(1, "malformed line (missing fields):\n%s", errline); *parse = '\0'; /* * Allow people to set debug options via the config file. * (NOTE: debug options are undocumented, and may disappear * at any time, etc). */ if (strcasecmp(DEBUG_MARKER, q) == 0) { q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) warnx("debug line specifies no option:\n%s", errline); else { *parse = '\0'; parse_doption(q); } continue; } else if (strcasecmp(INCLUDE_MARKER, q) == 0) { if (verbose) printf("Found: %s", errline); q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) { warnx("include line missing argument:\n%s", errline); continue; } *parse = '\0'; if (isglobstr(q)) { res = glob(q, GLOB_NOCHECK, NULL, &pglob); if (res != 0) { warn("cannot expand pattern (%d): %s", res, q); continue; } if (verbose > 2) printf("\t+ Expanding pattern %s\n", q); for (i = 0; i < pglob.gl_matchc; i++) add_to_queue(pglob.gl_pathv[i], inclist); globfree(&pglob); } else add_to_queue(q, inclist); continue; } special = 0; working = init_entry(q, NULL); if (strcasecmp(DEFAULT_MARKER, q) == 0) { special = 1; if (defconf_p != NULL) { warnx("Ignoring duplicate entry for %s!", q); free_entry(working); continue; } defconf_p = working; } q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) errx(1, "malformed line (missing fields):\n%s", errline); *parse = '\0'; if ((group = strchr(q, ':')) != NULL || (group = strrchr(q, '.')) != NULL) { *group++ = '\0'; if (*q) { if (!(isnumberstr(q))) { if ((pwd = getpwnam(q)) == NULL) errx(1, "error in config file; unknown user:\n%s", errline); working->uid = pwd->pw_uid; } else working->uid = atoi(q); } else working->uid = (uid_t)-1; q = group; if (*q) { if (!(isnumberstr(q))) { if ((grp = getgrnam(q)) == NULL) errx(1, "error in config file; unknown group:\n%s", errline); working->gid = grp->gr_gid; } else working->gid = atoi(q); } else working->gid = (gid_t)-1; q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) errx(1, "malformed line (missing fields):\n%s", errline); *parse = '\0'; } else { working->uid = (uid_t)-1; working->gid = (gid_t)-1; } if (!sscanf(q, "%o", &working->permissions)) errx(1, "error in config file; bad permissions:\n%s", errline); q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) errx(1, "malformed line (missing fields):\n%s", errline); *parse = '\0'; if (!sscanf(q, "%d", &working->numlogs) || working->numlogs < 0) errx(1, "error in config file; bad value for count of logs to save:\n%s", errline); q = parse = missing_field(sob(parse + 1), errline); parse = son(parse); if (!*parse) errx(1, "malformed line (missing fields):\n%s", errline); *parse = '\0'; if (isdigitch(*q)) working->trsize = 
atoi(q);
		else if (strcmp(q, "*") == 0)
			working->trsize = -1;
		else {
			warnx("Invalid value of '%s' for 'size' in line:\n%s",
			    q, errline);
			working->trsize = -1;
		}

		working->flags = 0;
		working->compress = COMPRESS_NONE;

		q = parse = missing_field(sob(parse + 1), errline);
		parse = son(parse);
		eol = !*parse;
		*parse = '\0';
		{
			char *ep;
			u_long ul;

			ul = strtoul(q, &ep, 10);
			if (ep == q)
				working->hours = 0;
			else if (*ep == '*')
				working->hours = -1;
			else if (ul > INT_MAX)
				errx(1, "interval is too large:\n%s", errline);
			else
				working->hours = ul;

			if (*ep == '\0' || strcmp(ep, "*") == 0)
				goto no_trimat;
			if (*ep != '@' && *ep != '$')
				errx(1, "malformed interval/at:\n%s", errline);

			working->flags |= CE_TRIMAT;
			working->trim_at = ptime_init(NULL);
			ptm_opts = PTM_PARSE_ISO8601;
			if (*ep == '$')
				ptm_opts = PTM_PARSE_DWM;
			ptm_opts |= PTM_PARSE_MATCHDOM;
			res = ptime_relparse(working->trim_at, ptm_opts,
			    ptimeget_secs(timenow), ep + 1);
			if (res == -2)
				errx(1, "nonexistent time for 'at' value:\n%s",
				    errline);
			else if (res < 0)
				errx(1, "malformed 'at' value:\n%s", errline);
		}
no_trimat:

		if (eol)
			q = NULL;
		else {
			q = parse = sob(parse + 1);	/* Optional field */
			parse = son(parse);
			if (!*parse)
				eol = 1;
			*parse = '\0';
		}

		for (; q && *q && !isspacech(*q); q++) {
			switch (tolowerch(*q)) {
			case 'b':
				working->flags |= CE_BINARY;
				break;
			case 'c':
				working->flags |= CE_CREATE;
				break;
			case 'd':
				working->flags |= CE_NODUMP;
				break;
			case 'g':
				working->flags |= CE_GLOB;
				break;
			case 'j':
				working->compress = COMPRESS_BZIP2;
				break;
			case 'n':
				working->flags |= CE_NOSIGNAL;
				break;
			case 'r':
				working->flags |= CE_PID2CMD;
				break;
			case 'u':
				working->flags |= CE_SIGNALGROUP;
				break;
			case 'w':
				/* Deprecated flag - keep for compatibility purposes */
				break;
			case 'x':
				working->compress = COMPRESS_XZ;
				break;
			case 'z':
				working->compress = COMPRESS_GZIP;
				break;
			case '-':
				break;
			case 'f':	/* Used by OpenBSD for "CE_FOLLOW" */
			case 'm':	/* Used by OpenBSD for "CE_MONITOR" */
			case 'p':	/* Used by NetBSD for "CE_PLAIN0" */
			default:
				errx(1, "illegal flag in config file -- %c",
				    *q);
			}
		}

		if (eol)
			q = NULL;
		else {
			q = parse = sob(parse + 1);	/* Optional field */
			parse = son(parse);
			if (!*parse)
				eol = 1;
			*parse = '\0';
		}

		working->pid_cmd_file = NULL;
		if (q && *q) {
			if (*q == '/')
				working->pid_cmd_file = strdup(q);
			else if (isalnum(*q))
				goto got_sig;
			else {
				errx(1,
				    "illegal pid file or signal in config file:\n%s",
				    errline);
			}
		}
		if (eol)
			q = NULL;
		else {
			q = parse = sob(parse + 1);	/* Optional field */
			*(parse = son(parse)) = '\0';
		}

		working->sig = SIGHUP;
		if (q && *q) {
got_sig:
			working->sig = parse_signal(q);
			if (working->sig < 1 || working->sig >= sys_nsig) {
				errx(1,
				    "illegal signal in config file:\n%s",
				    errline);
			}
		}

		/*
		 * Finish figuring out what pid-file to use (if any) in
		 * later processing if this logfile needs to be rotated.
		 */
		if ((working->flags & CE_NOSIGNAL) == CE_NOSIGNAL) {
			/*
			 * This config-entry specified 'n' for nosignal,
			 * see if it also specified an explicit pid_cmd_file.
			 * This would be a pretty pointless combination.
			 */
			if (working->pid_cmd_file != NULL) {
				warnx("Ignoring '%s' because flag 'n' was specified in line:\n%s",
				    working->pid_cmd_file, errline);
				free(working->pid_cmd_file);
				working->pid_cmd_file = NULL;
			}
		} else if (working->pid_cmd_file == NULL) {
			/*
			 * This entry did not specify the 'n' flag, which
			 * means it should signal syslogd unless it had
			 * specified some other pid-file (and obviously the
			 * syslog pid-file will not be for a process-group).
			 * Also, we should only try to notify syslog if we
			 * are root.
			 */
			if (working->flags & CE_SIGNALGROUP) {
				warnx("Ignoring flag 'U' in line:\n%s",
				    errline);
				working->flags &= ~CE_SIGNALGROUP;
			}
			if (needroot)
				working->pid_cmd_file = strdup(path_syslogpid);
		}

		/*
		 * Add this entry to the appropriate list of entries, unless
		 * it was some kind of special entry (eg: <default>).
		 */
		if (special) {
			;	/* Do not add to any list */
		} else if (working->flags & CE_GLOB) {
			STAILQ_INSERT_TAIL(glob_p, working, cf_nextp);
		} else {
			STAILQ_INSERT_TAIL(work_p, working, cf_nextp);
		}
	}
	if (errline != NULL)
		free(errline);
}

static char *
missing_field(char *p, char *errline)
{

	if (!p || !*p)
		errx(1, "missing field in config file:\n%s", errline);
	return (p);
}

/*
 * In our sort we return it in the reverse of what qsort normally
 * would do, as we want the newest files first.  If we have two
 * entries with the same time we don't really care about order.
 *
 * Support function for qsort() in delete_oldest_timelog().
 */
static int
oldlog_entry_compare(const void *a, const void *b)
{
	const struct oldlog_entry *ola = a, *olb = b;

	if (ola->t > olb->t)
		return (-1);
	else if (ola->t < olb->t)
		return (1);
	else
		return (0);
}

/*
 * Check whether the file corresponding to dp is an archive of the logfile
 * logfname, based on the timefnamefmt format string.  Return true and fill
 * out tm if this is the case; otherwise return false.
 */
static int
validate_old_timelog(int fd, const struct dirent *dp, const char *logfname,
    struct tm *tm)
{
	struct stat sb;
	size_t logfname_len;
	char *s;
	int c;

	logfname_len = strlen(logfname);

	if (dp->d_type != DT_REG) {
		/*
		 * Some filesystems (e.g. NFS) don't fill out the d_type field
		 * and leave it set to DT_UNKNOWN; in this case we must obtain
		 * the file type ourselves.
		 */
		if (dp->d_type != DT_UNKNOWN ||
		    fstatat(fd, dp->d_name, &sb, AT_SYMLINK_NOFOLLOW) != 0 ||
		    !S_ISREG(sb.st_mode))
			return (0);
	}
	/* Ignore everything but files with our logfile prefix. */
	if (strncmp(dp->d_name, logfname, logfname_len) != 0)
		return (0);
	/* Ignore the actual non-rotated logfile. */
	if (dp->d_namlen == logfname_len)
		return (0);
	/*
	 * Make sure we have found a logfile, so the
	 * postfix is valid, i.e. format is: '.
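For reference, a hypothetical newsyslog.conf fragment tying the parsed
fields together (paths, sizes, and times here are invented for
illustration; the flag letters match the switch in parse_file() above):

	# logfilename         [owner:group]  mode count size when  flags [/pid_file]
	/var/log/myapp.log    root:wheel     640  7     100  *     JC    /var/run/myapp.pid
	/var/log/daily.log                   644  31    *    @T00  Z

The first entry rotates once the log exceeds 100 kB, keeps 7
bzip2-compressed old logs, creates the file if it is missing, and sends
the default SIGHUP to the pid in /var/run/myapp.pid; the second rotates
every midnight regardless of size and gzips the result.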