diff --git a/en_US.ISO8859-1/books/arch-handbook/mac/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/mac/chapter.sgml index 6e053546e9..4a826bf9b2 100644 --- a/en_US.ISO8859-1/books/arch-handbook/mac/chapter.sgml +++ b/en_US.ISO8859-1/books/arch-handbook/mac/chapter.sgml @@ -1,7821 +1,7821 @@ Chris Costello TrustedBSD Project
chris@FreeBSD.org
Robert Watson TrustedBSD Project
rwatson@FreeBSD.org
The TrustedBSD MAC Framework MAC Documentation Copyright This documentation was developed for the FreeBSD Project by Chris Costello at Safeport Network Services and Network Associates Laboratories, the Security Research Division of Network Associates, Inc. under DARPA/SPAWAR contract N66001-01-C-8035 (CBOSS), as part of the DARPA CHATS research program. Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met: Redistributions of source code (SGML DocBook) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified. Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS DOCUMENTATION IS PROVIDED BY THE NETWORKS ASSOCIATES TECHNOLOGY, INC "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NETWORKS ASSOCIATES TECHNOLOGY, INC BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Synopsis FreeBSD includes experimental support for several mandatory access control policies, as well as a framework for kernel security extensibility, the TrustedBSD MAC Framework. 
The MAC Framework provides a pluggable access control framework, permitting new security policies to be easily linked into the kernel, loaded at boot, or loaded dynamically at run-time. The framework provides a variety of features to make it easier to implement new policies, including the ability to easily tag security labels (such as confidentiality information) onto system objects. This chapter introduces the MAC policy framework and provides documentation for a sample MAC policy module. Introduction The TrustedBSD MAC framework provides a mechanism to allow the compile-time or run-time extension of the kernel access control model. New system policies may be implemented as kernel modules and linked to the kernel; if multiple policy modules are present, their results will be composed. The MAC Framework provides a variety of access control infrastructure services to assist policy writers, including support for transient and persistent policy-agnostic object security labels. This support is currently considered experimental. - + Policy Background Mandatory Access Control (MAC) refers to a set of access control policies that are mandatorily enforced on users by the operating system. MAC policies may be contrasted with Discretionary Access Control (DAC) protections, by which non-administrative users may (at their discretion) protect objects. In traditional UNIX systems, DAC protections include file permissions and access control lists; MAC protections include process controls preventing inter-user debugging and firewalls. A variety of MAC policies have been formulated by operating system designers and security researchers, including the Multi-Level Security (MLS) confidentiality policy, the Biba integrity policy, Role-Based Access Control (RBAC), and Type Enforcement (TE).
Each model bases decisions on a variety of factors, including user identity, role, and security clearance, as well as security labels on objects representing concepts such as data sensitivity and integrity. The TrustedBSD MAC Framework is capable of supporting policy modules that implement all of these policies, as well as a broad class of system hardening policies. In addition, despite the name, the MAC Framework can also be used to implement purely discretionary policies, as policy modules are given substantial flexibility in how they authorize protections. MAC Framework Kernel Architecture The TrustedBSD MAC Framework permits kernel modules to extend the operating system security policy, as well as providing infrastructure functionality required by many access control modules. If multiple policies are simultaneously loaded, the MAC Framework will usefully (for some definition of useful) compose the results of the policies. Kernel Elements The MAC Framework contains a number of kernel elements: Framework management interfaces Concurrency and synchronization primitives. Policy registration Extensible security label for kernel objects Policy entry point composition operators Label management primitives Entry point API invoked by kernel services Entry point API to policy modules Entry points implementations (policy life cycle, object life cycle/label management, access control checks). Policy-agnostic label-management system calls mac_syscall() multiplex system call Various security policies implemented as MAC policy modules Management Interfaces The TrustedBSD MAC Framework may be directly managed using sysctls, loader tunables, and system calls. In most cases, sysctls and loader tunables modify the same parameters, and control behavior such as enforcement of protections relating to various kernel subsystems. In addition, if MAC debugging support is compiled into the kernel, a variety of counters will be maintained tracking label allocation. 
In most cases, it is advised that per-subsystem enforcement controls not be used to control policy behavior in production environments, as they broadly impact the operation of all active policies. Instead, per-policy controls should be preferred to ensure proper policy operation. Loading and unloading of policy modules is performed using the system module management system calls and other system interfaces, including loader variables. Concurrency and Synchronization As the set of active policies may change at run-time, and the invocation of entry points is non-atomic, synchronization is required to prevent unloading or loading of new policies while an entry point invocation is in progress, freezing the list of policies for the duration. This is accomplished by means of a Framework busy count. Whenever an entry point is entered, the busy count is incremented; whenever it is exited, the busy count is decremented. While the busy count is elevated, policy list changes are not permitted, and threads attempting to modify the policy list will sleep until the list is not busy. The busy count is protected by a mutex, and a condition variable is used to wake up sleepers waiting on policy list modifications. Various optimizations are used to reduce the overhead of the busy count, including avoiding the full cost of incrementing and decrementing if the list is empty or contains only static entries (policies that are loaded before the system starts, and cannot be unloaded). Policy Registration The MAC Framework maintains two lists of active policies: a static list, and a dynamic list. The lists differ only with regard to their locking semantics: an elevated reference count is not required to make use of the static list. When kernel modules containing MAC Framework policies are loaded, the policy module will use SYSINIT to invoke a registration function; when a policy module is unloaded, SYSINIT will likewise invoke a de-registration function.
Registration may fail if a policy module is loaded more than once, if insufficient resources are available for the registration (for example, the policy might require labeling and insufficient labeling state might be available), or if other policy prerequisites are not met (some policies may only be loaded prior to boot). Likewise, de-registration may fail if a policy refuses an unload. Entry Points Kernel services interact with the MAC Framework in two ways: they invoke a series of APIs to notify the framework of relevant events, and they include a policy-agnostic label structure in security-relevant objects. This label structure is maintained by the MAC Framework via label management entry points, and permits the Framework to offer a labeling service to policy modules through relatively non-invasive changes to the kernel subsystem maintaining the object. For example, label structures have been added to processes, process credentials, sockets, pipes, vnodes, Mbufs, network interfaces, IP reassembly queues, and a variety of other security-relevant structures. Kernel services also invoke the MAC Framework when they perform important security decisions, permitting policy modules to augment those decisions based on their own criteria (possibly including data stored in security labels). Policy Composition When more than one policy module is loaded into the kernel at a time, the results of the policy modules will be composed by the framework using a composition operator. This operator is currently hard-coded, and requires that all active policies approve a request for it to occur. As policies may return a variety of error conditions (success, access denied, object doesn't exist, ...), a precedence operator selects the resulting error from the set of errors returned by policies. While it is not guaranteed that the resulting composition will be useful or secure, we've found that it is for many useful selections of policies.
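The composition rule described above (every policy must approve, and a precedence operator picks one error when several are returned) can be sketched in user-space C. This is only an illustration: the function names, the toy policies, and in particular the precedence ordering are assumptions made for the example, not a statement of what the framework hard-codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical check signature: returns 0 to allow, or an errno value. */
typedef int (*policy_check_fn)(int subject, int object);

/*
 * Select the error to report when two policies return different results.
 * The ordering below is an assumption chosen for illustration.
 */
static int
error_select(int e1, int e2)
{
	/* Highest precedence first. */
	static const int precedence[] = {
		EDEADLK, EINVAL, ESRCH, ENOENT, EACCES, EPERM
	};
	size_t i;

	for (i = 0; i < sizeof(precedence) / sizeof(precedence[0]); i++)
		if (e1 == precedence[i] || e2 == precedence[i])
			return (precedence[i]);
	/* Any other error still takes precedence over success. */
	return (e1 != 0 ? e1 : e2);
}

/* Consult every policy; the request succeeds only if all return 0. */
static int
compose_checks(const policy_check_fn *checks, size_t n, int subject,
    int object)
{
	int error = 0;
	size_t i;

	assert(checks != NULL || n == 0);
	for (i = 0; i < n; i++)
		error = error_select(checks[i](subject, object), error);
	return (error);
}

/* Three toy policies used to exercise the composition. */
static int
allow_all(int s, int o)
{
	(void)s; (void)o;
	return (0);
}

static int
deny_odd_objects(int s, int o)
{
	(void)s;
	return ((o % 2) != 0 ? EACCES : 0);
}

static int
hide_big_objects(int s, int o)
{
	(void)s;
	return (o > 100 ? ENOENT : 0);
}
```

With all three toy policies loaded, a request on object 2 is allowed, a request on object 3 is denied with EACCES, and a request on object 101 reports ENOENT because, under the assumed ordering, "object doesn't exist" outranks "access denied".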
Labeling Support As many interesting access control extensions rely on security labels on objects, the MAC Framework provides a set of policy-agnostic label management system calls covering a variety of user-exposed objects. Common label types include partition identifiers, sensitivity labels, integrity labels, compartments, domains, roles, and types. Policy modules participate in the internalization and externalization of string-based labels provided by user applications, and can expose multiple label elements to applications if desired. In-memory labels are stored in struct label, which consists of a fixed-length array of unions, each holding a void * pointer and a long. Policies registering for label storage will be assigned a "slot" identifier, which may be used to dereference the label storage. The semantics of the storage are left entirely up to the policy module: modules are provided with a variety of entry points associated with the kernel object life cycle, including initialization, association/creation, and destruction. Using these interfaces, it is possible to implement reference counting and other storage mechanisms. Direct access to the kernel object is generally not required by policy modules to retrieve a label, as the MAC Framework generally passes both a pointer to the object and a direct pointer to the object's label into entry points. Initialization entry points frequently include a blocking disposition flag indicating whether or not an initialization is permitted to block; if blocking is not permitted, a failure may be returned to cancel allocation of the label. This may occur, for example, in the network stack during interrupt handling, where blocking is not permitted. Due to the performance cost of maintaining labels on in-flight network packets (Mbufs), policies must specifically declare a requirement that Mbuf labels be allocated.
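The slot-based storage described above can be pictured with a small user-space model. The layout follows the description (a fixed-length array of unions holding a void * and a long), but the slot limit, the field names, and every helper function here are assumptions made for illustration rather than the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LABEL_MAX_SLOTS	4	/* hypothetical limit, for the sketch only */

/* Model of the label: one union per policy slot (field names assumed). */
struct label {
	union {
		void	*l_ptr;
		long	 l_long;
	} l_perpolicy[LABEL_MAX_SLOTS];
};

static int next_slot = 0;

/* Hand out the next free slot identifier, or -1 if none remain. */
static int
label_slot_alloc(void)
{
	if (next_slot >= LABEL_MAX_SLOTS)
		return (-1);
	return (next_slot++);
}

/* A policy that stores a plain integer, such as a partition identifier. */
static void
label_set_long(struct label *l, int slot, long v)
{
	assert(slot >= 0 && slot < LABEL_MAX_SLOTS);
	l->l_perpolicy[slot].l_long = v;
}

static long
label_get_long(const struct label *l, int slot)
{
	assert(slot >= 0 && slot < LABEL_MAX_SLOTS);
	return (l->l_perpolicy[slot].l_long);
}

/* A policy that hangs reference-counted state off its slot. */
struct policy_state {
	int	refs;
	int	level;
};

static int
policy_label_init(struct label *l, int slot, int level)
{
	struct policy_state *ps;

	assert(slot >= 0 && slot < LABEL_MAX_SLOTS);
	ps = malloc(sizeof(*ps));
	if (ps == NULL)
		return (-1);
	ps->refs = 1;
	ps->level = level;
	l->l_perpolicy[slot].l_ptr = ps;
	return (0);
}

static void
policy_label_destroy(struct label *l, int slot)
{
	struct policy_state *ps;

	assert(slot >= 0 && slot < LABEL_MAX_SLOTS);
	ps = l->l_perpolicy[slot].l_ptr;
	/* Drop one reference; free the state when the last one goes. */
	if (ps != NULL && --ps->refs == 0)
		free(ps);
	l->l_perpolicy[slot].l_ptr = NULL;
}
```

Each policy touches only its own slot, which is why the framework can stay agnostic about what a given policy stores there: one policy may keep a bare long, another a pointer to richer state.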
Dynamically loaded policies making use of labels must be able to handle the case where their init function has not been called on an object, as objects may already exist when the policy is loaded. In the case of file system labels, special support is provided for the persistent storage of security labels in extended attributes. Where available, EA transactions are used to permit consistent compound updates of security labels on vnodes. Currently, if a labeled policy permits dynamic unloading, its state slot cannot be reclaimed. System Calls The MAC Framework implements a number of system calls: most of these calls support the policy-agnostic label retrieval and manipulation APIs exposed to user applications. The label management calls accept a label description structure, struct mac, which contains a series of MAC label elements. Each element contains a character string name, and character string value. Each policy will be given the chance to claim a particular element name, permitting policies to expose multiple independent elements if desired. Policy modules perform the internalization and externalization between kernel labels and user-provided labels via entry points, permitting a variety of semantics. Label management system calls are generally wrapped by user library functions to perform memory allocation and error handling. In addition, mac_syscall() permits policy modules to create new system calls without allocating system calls. mac_execve() permits an atomic process credential label change when executing a new image. MAC Policy Architecture Security policies are either linked directly into the kernel, or compiled into loadable kernel modules that may be loaded at boot, or dynamically using the module loading system calls at runtime. Policy modules interact with the system through a set of declared entry points, providing access to a stream of system events and permitting the policy to influence access control decisions. 
Each policy contains a number of elements: Optional configuration parameters for policy. Centralized implementation of the policy logic and parameters. Optional implementation of policy life cycle events, such as initialization and destruction. Optional support for initializing, maintaining, and destroying labels on selected kernel objects. Optional support for user process inspection and modification of labels on selected objects. Implementation of selected access control entry points that are of interest to the policy. Declaration of policy identity, module entry points, and policy properties. Policy Declaration Modules may be declared using the MAC_POLICY_SET() macro, which names the policy, provides a reference to the MAC entry point vector, provides load-time flags determining how the policy framework should handle the policy, and optionally requests the allocation of label state by the framework.
static struct mac_policy_ops mac_policy_ops =
{
        .mpo_destroy = mac_policy_destroy,
        .mpo_init = mac_policy_init,
        .mpo_init_bpfdesc_label = mac_policy_init_bpfdesc_label,
        .mpo_init_cred_label = mac_policy_init_label,
/* ... */
        .mpo_check_vnode_setutimes = mac_policy_check_vnode_setutimes,
        .mpo_check_vnode_stat = mac_policy_check_vnode_stat,
        .mpo_check_vnode_write = mac_policy_check_vnode_write,
};
The MAC policy entry point vector, mac_policy_ops in this example, associates functions defined in the module with specific entry points. A complete listing of available entry points and their prototypes may be found in the MAC entry point reference section. Of specific interest during module registration are the .mpo_destroy and .mpo_init entry points. .mpo_init will be invoked once a policy is successfully registered with the module framework but prior to any other entry points becoming active. This permits the policy to perform any policy-specific allocation and initialization, such as initialization of any data or locks.
.mpo_destroy will be invoked when a policy module is unloaded to permit releasing of any allocated memory and destruction of locks. Currently, these two entry points are invoked with the MAC policy list mutex held to prevent any other entry points from being invoked: this will be changed, but in the meantime, policies should be careful about what kernel primitives they invoke so as to avoid lock ordering or sleeping problems. The policy declaration's module name field exists so that the module may be uniquely identified for the purposes of module dependencies. An appropriate string should be selected. The full string name of the policy is displayed to the user via the kernel log during load and unload events, and also exported when providing status information to userland processes. Policy Flags The policy declaration flags field permits the module to provide the framework with information about its capabilities at the time the module is loaded. Currently, three flags are defined: MPC_LOADTIME_FLAG_UNLOADOK This flag indicates that the policy module may be unloaded. If this flag is not provided, then the policy framework will reject requests to unload the module. This flag might be omitted by modules that allocate label state and are unable to free that state at runtime. MPC_LOADTIME_FLAG_NOTLATE This flag indicates that the policy module must be loaded and initialized early in the boot process. If the flag is specified, attempts to register the module following boot will be rejected. The flag may be used by policies that require pervasive labeling of all system objects, and cannot handle objects that have not been properly initialized by the policy. MPC_LOADTIME_FLAG_LABELMBUFS This flag indicates that the policy module requires labeling of Mbufs, and that memory should always be allocated for the storage of Mbuf labels. By default, the MAC Framework will not allocate label storage for Mbufs unless at least one loaded policy has this flag set.
This measurably improves network performance when policies do not require Mbuf labeling. A kernel option, MAC_ALWAYS_LABEL_MBUF, exists to force the MAC Framework to allocate Mbuf label storage regardless of the setting of this flag, and may be useful in some environments. Policies using the MPC_LOADTIME_FLAG_LABELMBUFS without the MPC_LOADTIME_FLAG_NOTLATE flag set must be able to correctly handle NULL Mbuf label pointers passed into entry points. This is necessary as in-flight Mbufs without label storage may persist after a policy enabling Mbuf labeling has been loaded. If a policy is loaded before the network subsystem is active (i.e., the policy is not being loaded late), then all Mbufs are guaranteed to have label storage. Policy Entry Points Four classes of entry points are offered to policies registered with the framework: entry points associated with the registration and management of policies, entry points denoting initialization, creation, destruction, and other life cycle events for kernel objects, events associated with access control decisions that the policy module may influence, and calls associated with the management of labels on objects. In addition, a mac_syscall() entry point is provided so that policies may extend the kernel interface without registering new system calls. Policy module writers should be aware of the kernel locking strategy, as well as what object locks are available during which entry points. Writers should attempt to avoid deadlock scenarios by avoiding grabbing non-leaf locks inside of entry points, and also follow the locking protocol for object access and modification. In particular, writers should be aware that while necessary locks to access objects and their labels are generally held, sufficient locks to modify an object or its label may not be present for all entry points. Locking information for arguments is documented in the MAC framework entry point document. 
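The requirement that late-loaded policies tolerate NULL Mbuf label pointers can be illustrated with a short user-space sketch. The struct label here is a simplified stand-in, the check function is hypothetical, and treating an unlabeled Mbuf as the lowest level is an assumed default that a real policy would have to choose deliberately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct label (assumption). */
struct label {
	long	level;
};

/*
 * Hypothetical delivery check: may a packet reach this socket?
 * A NULL mbuf label means the mbuf was in flight before the policy
 * was loaded, so the policy must not dereference it.
 */
static int
policy_check_deliver(const struct label *mbuflabel,
    const struct label *solabel)
{
	long mlevel;

	assert(solabel != NULL);
	/* Assumed default: an unlabeled mbuf is treated as level 0. */
	mlevel = (mbuflabel != NULL) ? mbuflabel->level : 0;
	return (mlevel <= solabel->level ? 0 : EACCES);
}
```

A policy that cannot define a sensible default for unlabeled Mbufs is a candidate for MPC_LOADTIME_FLAG_NOTLATE instead, since loading before the network subsystem is active guarantees label storage on every Mbuf.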
Policy entry points will pass a reference to the object label along with the object itself. This permits labeled policies to be unaware of the internals of the object yet still make decisions based on the label. The exception to this is the process credential, which is assumed to be understood by policies as a first class security object in the kernel. Policies that do not implement labels on kernel objects will be passed NULL pointers for label arguments to entry points. MAC Policy Entry Point Reference General-Purpose Module Entry Points <function>&mac.mpo;_init</function> void &mac.mpo;_init struct mac_policy_conf *conf &mac.thead; conf MAC policy definition Policy load event. The policy list mutex is held, so caution should be applied. <function>&mac.mpo;_destroy</function> void &mac.mpo;_destroy struct mac_policy_conf *conf &mac.thead; conf MAC policy definition Policy unload event. The policy list mutex is held, so caution should be applied. <function>&mac.mpo;_syscall</function> int &mac.mpo;_syscall struct thread *td int call void *arg &mac.thead; td Calling thread call Syscall number arg Pointer to syscall arguments This entry point provides a policy-multiplexed system call so that policies may provide additional services to user processes without registering specific system calls. The policy name provided during registration is used to demux calls from userland, and the arguments will be forwarded to this entry point. When implementing new services, security modules should be sure to invoke appropriate access control checks from the MAC framework as needed. For example, if a policy implements an augmented signal functionality, it should call the necessary signal access control checks to invoke the MAC framework and other registered policies. Modules must currently perform the copyin() of the syscall data on their own.
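A policy's multiplexed system call handler typically switches on the call number and copies its own arguments in or out. The following user-space sketch models that shape only: the call numbers, the argument structure, and the fake_copyin()/fake_copyout() helpers are hypothetical stand-ins (the real entry point also receives a struct thread *td, omitted here so the example compiles outside the kernel).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-policy call numbers. */
enum { MYPOL_GET_LEVEL = 1, MYPOL_SET_LEVEL = 2 };

/* Hypothetical argument block passed by the user process. */
struct mypol_args {
	int	level;
};

static int mypol_level = 0;

/* Stand-in for copyin(): fails on a NULL "user" address. */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Stand-in for copyout(): fails on a NULL "user" address. */
static int
fake_copyout(const void *kaddr, void *uaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(uaddr, kaddr, len);
	return (0);
}

/* Demultiplex on the call number; the policy copies its own arguments. */
static int
mypol_syscall(int call, void *arg)
{
	struct mypol_args a;
	int error;

	switch (call) {
	case MYPOL_GET_LEVEL:
		a.level = mypol_level;
		return (fake_copyout(&a, arg, sizeof(a)));
	case MYPOL_SET_LEVEL:
		error = fake_copyin(arg, &a, sizeof(a));
		if (error != 0)
			return (error);
		if (a.level < 0)
			return (EINVAL);
		mypol_level = a.level;
		return (0);
	default:
		return (EINVAL);
	}
}
```

A real handler would also perform whatever access control checks the new service calls for before acting, as the text above recommends.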
<function>&mac.mpo;_thread_userret</function> void &mac.mpo;_thread_userret struct thread *td &mac.thead; td Returning thread This entry point permits policy modules to perform MAC-related events when a thread returns to user space. This is required for policies that have floating process labels, as it is not always possible to acquire the process lock at arbitrary points in the stack during system call processing; process labels might represent traditional authentication data, process history information, or other data. Label Operations <function>&mac.mpo;_init_bpfdesc_label</function> void &mac.mpo;_init_bpfdesc_label struct label *label &mac.thead; label New label to apply Initialize the label on a newly instantiated bpfdesc (BPF descriptor) <function>&mac.mpo;_init_cred_label</function> void &mac.mpo;_init_cred_label struct label *label &mac.thead; label New label to initialize Initialize the label for a newly instantiated user credential. <function>&mac.mpo;_init_devfsdirent_label</function> void &mac.mpo;_init_devfsdirent_label struct label *label &mac.thead; label New label to apply Initialize the label on a newly instantiated devfs entry. <function>&mac.mpo;_init_ifnet_label</function> void &mac.mpo;_init_ifnet_label struct label *label &mac.thead; label New label to apply Initialize the label on a newly instantiated network interface. <function>&mac.mpo;_init_ipq_label</function> void &mac.mpo;_init_ipq_label struct label *label int flag &mac.thead; label New label to apply flag Blocking/non-blocking &man.malloc.9;; see below Initialize the label on a newly instantiated IP fragment reassembly queue. The flag field may be one of M_WAITOK and M_NOWAIT, and should be employed to avoid performing a blocking &man.malloc.9; during this initialization call. IP fragment reassembly queue allocation frequently occurs in performance sensitive environments, and the implementation should be careful to avoid blocking or long-lived operations. 
This entry point is permitted to fail resulting in the failure to allocate the IP fragment reassembly queue. <function>&mac.mpo;_init_mbuf_label</function> void &mac.mpo;_init_mbuf_label int flag struct label *label &mac.thead; flag Blocking/non-blocking &man.malloc.9;; see below label Policy label to initialize Initialize the label on a newly instantiated mbuf packet header (mbuf). The flag field may be one of M_WAITOK and M_NOWAIT, and should be employed to avoid performing a blocking &man.malloc.9; during this initialization call. Mbuf allocation frequently occurs in performance sensitive environments, and the implementation should be careful to avoid blocking or long-lived operations. This entry point is permitted to fail resulting in the failure to allocate the mbuf header. <function>&mac.mpo;_init_mount_label</function> void &mac.mpo;_init_mount_label struct label *mntlabel struct label *fslabel &mac.thead; mntlabel Policy label to be initialized for the mount itself fslabel Policy label to be initialized for the file system Initialize the labels on a newly instantiated mount point. <function>&mac.mpo;_init_mount_fs_label</function> void &mac.mpo;_init_mount_fs_label struct label *label &mac.thead; label Label to be initialized Initialize the label on a newly mounted file system. <function>&mac.mpo;_init_pipe_label</function> void &mac.mpo;_init_pipe_label struct label*label &mac.thead; label Label to be filled in Initialize a label for a newly instantiated pipe. <function>&mac.mpo;_init_socket_label</function> void &mac.mpo;_init_socket_label struct label *label int flag &mac.thead; label New label to initialize flag &man.malloc.9; flags Initialize a label for a newly instantiated socket. <function>&mac.mpo;_init_socket_peer_label</function> void &mac.mpo;_init_socket_peer_label struct label *label int flag &mac.thead; label New label to initialize flag &man.malloc.9; flags Initialize the peer label for a newly instantiated socket. 
<function>&mac.mpo;_init_proc_label</function> void &mac.mpo;_init_proc_label struct label *label &mac.thead; label New label to initialize Initialize the label for a newly instantiated process. <function>&mac.mpo;_init_vnode_label</function> void &mac.mpo;_init_vnode_label struct label *label &mac.thead; label New label to initialize Initialize the label on a newly instantiated vnode. <function>&mac.mpo;_destroy_bpfdesc_label</function> void &mac.mpo;_destroy_bpfdesc_label struct label *label &mac.thead; label bpfdesc label Destroy the label on a BPF descriptor. In this entry point a policy should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_cred_label</function> void &mac.mpo;_destroy_cred_label struct label *label &mac.thead; label Label being destroyed Destroy the label on a credential. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_devfsdirent_label</function> void &mac.mpo;_destroy_devfsdirent_label struct label *label &mac.thead; label Label being destroyed Destroy the label on a devfs entry. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_ifnet_label</function> void &mac.mpo;_destroy_ifnet_label struct label *label &mac.thead; label Label being destroyed Destroy the label on a removed interface. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_ipq_label</function> void &mac.mpo;_destroy_ipq_label struct label *label &mac.thead; label Label being destroyed Destroy the label on an IP fragment queue. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. 
<function>&mac.mpo;_destroy_mbuf_label</function> void &mac.mpo;_destroy_mbuf_label struct label *label &mac.thead; label Label being destroyed Destroy the label on an mbuf header. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_mount_label</function> void &mac.mpo;_destroy_mount_label struct label *label &mac.thead; label Mount point label being destroyed Destroy the label on a mount point. In this entry point, a policy module should free the internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_mount_label</function> void &mac.mpo;_destroy_mount_label struct label *mntlabel struct label *fslabel &mac.thead; mntlabel Mount point label being destroyed fslabel File system label being destroyed Destroy the labels on a mount point. In this entry point, a policy module should free the internal storage associated with mntlabel and fslabel so that they may be destroyed. <function>&mac.mpo;_destroy_socket_label</function> void &mac.mpo;_destroy_socket_label struct label *label &mac.thead; label Socket label being destroyed Destroy the label on a socket. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_socket_peer_label</function> void &mac.mpo;_destroy_socket_peer_label struct label *peerlabel &mac.thead; peerlabel Socket peer label being destroyed Destroy the peer label on a socket. In this entry point, a policy module should free any internal storage associated with peerlabel so that it may be destroyed. <function>&mac.mpo;_destroy_pipe_label</function> void &mac.mpo;_destroy_pipe_label struct label *label &mac.thead; label Pipe label Destroy the label on a pipe. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed.
<function>&mac.mpo;_destroy_proc_label</function> void &mac.mpo;_destroy_proc_label struct label *label &mac.thead; label Process label Destroy the label on a process. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_destroy_vnode_label</function> void &mac.mpo;_destroy_vnode_label struct label *label &mac.thead; label Vnode label Destroy the label on a vnode. In this entry point, a policy module should free any internal storage associated with label so that it may be destroyed. <function>&mac.mpo;_copy_mbuf_label</function> void &mac.mpo;_copy_mbuf_label struct label *src struct label *dest &mac.thead; src Source label dest Destination label Copy the label information in src into dest. <function>&mac.mpo;_copy_pipe_label</function> void &mac.mpo;_copy_pipe_label struct label *src struct label *dest &mac.thead; src Source label dest Destination label Copy the label information in src into dest. <function>&mac.mpo;_copy_vnode_label</function> void &mac.mpo;_copy_vnode_label struct label *src struct label *dest &mac.thead; src Source label dest Destination label Copy the label information in src into dest.
<function>&mac.mpo;_externalize_cred_label</function> int &mac.mpo;_externalize_cred_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_externalize_ifnet_label</function> int &mac.mpo;_externalize_ifnet_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_externalize_pipe_label</function> int &mac.mpo;_externalize_pipe_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_externalize_socket_label</function> int &mac.mpo;_externalize_socket_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_externalize_socket_peer_label</function> int &mac.mpo;_externalize_socket_peer_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_externalize_vnode_label</function> int &mac.mpo;_externalize_vnode_label &mac.externalize.paramdefs; &mac.thead; &mac.externalize.tbody; &mac.externalize.para; <function>&mac.mpo;_internalize_cred_label</function> int &mac.mpo;_internalize_cred_label &mac.internalize.paramdefs; &mac.thead; &mac.internalize.tbody; &mac.internalize.para; <function>&mac.mpo;_internalize_ifnet_label</function> int &mac.mpo;_internalize_ifnet_label &mac.internalize.paramdefs; &mac.thead; &mac.internalize.tbody; &mac.internalize.para; <function>&mac.mpo;_internalize_pipe_label</function> int &mac.mpo;_internalize_pipe_label &mac.internalize.paramdefs; &mac.thead; &mac.internalize.tbody; &mac.internalize.para; <function>&mac.mpo;_internalize_socket_label</function> int &mac.mpo;_internalize_socket_label &mac.internalize.paramdefs; &mac.thead; &mac.internalize.tbody; &mac.internalize.para; <function>&mac.mpo;_internalize_vnode_label</function> int &mac.mpo;_internalize_vnode_label &mac.internalize.paramdefs; &mac.thead; &mac.internalize.tbody; &mac.internalize.para; Label 
Events This class of entry points is used by the MAC framework to permit policies to maintain label information on kernel objects. For each labeled kernel object of interest to a MAC policy, entry points may be registered for relevant life cycle events. All objects implement initialization, creation, and destruction hooks. Some objects will also implement relabeling, allowing user processes to change the labels on objects. Some objects will also implement object-specific events, such as label events associated with IP reassembly. A typical labeled object will have the following life cycle of entry points:

Label initialization          o
(object-specific wait)         \
Label creation                  o
                                 \
Relabel events,                   o--<--.
Various object-specific,          |     |
Access control events             ~-->--o
                                         \
Label destruction                         o

Label initialization permits policies to allocate memory and set initial values for labels without context for the use of the object. The label slot allocated to a policy will be zeroed by default, so some policies may not need to perform initialization. Label creation occurs when the kernel structure is associated with an actual kernel object. For example, mbufs may be allocated and remain unused in a pool until they are required. mbuf allocation causes label initialization on the mbuf to take place, but mbuf creation occurs when the mbuf is associated with a datagram. Typically, context will be provided for a creation event, including the circumstances of the creation, and labels of other relevant objects in the creation process. For example, when an mbuf is created from a socket, the socket and its label will be presented to registered policies in addition to the new mbuf and its label. Memory allocation in creation events is discouraged, as it may occur in performance-sensitive parts of the kernel; in addition, creation calls are not permitted to fail, so a failure to allocate memory cannot be reported. 
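The split between initialization, creation, and destruction described above can be sketched in plain C. This is a userspace mock rather than kernel code: struct label here is a hypothetical stand-in with a single policy slot, struct example_label and the example_* function names are invented for illustration, and calloc() stands in for kernel allocation. The sketch assumes a policy that allocates in its initialization hook, only fills in preallocated storage in its creation hook, and frees in its destruction hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct label: one policy slot. */
struct label {
	void *slot;	/* assumed zeroed before initialization */
};

/* Hypothetical per-policy label contents. */
struct example_label {
	int level;
};

/* Initialization: no object context yet; allocation is allowed here. */
int
example_init_label(struct label *label)
{
	struct example_label *el;

	el = calloc(1, sizeof(*el));
	if (el == NULL)
		return (ENOMEM);
	label->slot = el;
	return (0);
}

/* Creation: cannot fail, so it only fills in preallocated storage. */
void
example_create_from_parent(const struct label *parent, struct label *child)
{
	const struct example_label *src = parent->slot;
	struct example_label *dst = child->slot;

	assert(src != NULL && dst != NULL);
	dst->level = src->level;
}

/* Destruction: release policy storage so the object can be reused. */
void
example_destroy_label(struct label *label)
{
	free(label->slot);
	label->slot = NULL;
}
```

Because the creation hook in this sketch performs no allocation, it has no failure path to report, which matches the constraint that creation calls are not permitted to fail.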
Object specific events do not generally fall into the other broad classes of label events, but will generally provide an opportunity to modify or update the label on an object based on additional context. For example, the label on an IP fragment reassembly queue may be updated during the MAC_UPDATE_IPQ entry point as a result of the acceptance of an additional mbuf to that queue. Access control events are discussed in detail in the following section. Label destruction permits policies to release storage or state associated with a label during its association with an object so that the kernel data structures supporting the object may be reused or released. In addition to labels associated with specific kernel objects, an additional class of labels exists: temporary labels. These labels are used to store update information submitted by user processes. These labels are initialized and destroyed as with other label types, but the creation event is MAC_INTERNALIZE, which accepts a user label to be converted to an in-kernel representation. File System Object Labeling Event Operations <function>&mac.mpo;_associate_vnode_devfs</function> void &mac.mpo;_associate_vnode_devfs struct mount *mp struct label *fslabel struct devfs_dirent *de struct label *delabel struct vnode *vp struct label *vlabel &mac.thead; mp Devfs mount point fslabel Devfs file system label (mp->mnt_fslabel) de Devfs directory entry delabel Policy label associated with de vp vnode associated with de vlabel Policy label associated with vp Fill in the label (vlabel) for a newly created devfs vnode based on the devfs directory entry passed in de and its label. 
<function>&mac.mpo;_associate_vnode_extattr</function> int &mac.mpo;_associate_vnode_extattr struct mount *mp struct label *fslabel struct vnode *vp struct label *vlabel &mac.thead; mp File system mount point fslabel File system label vp Vnode to label vlabel Policy label associated with vp Attempt to retrieve the label for vp from the file system extended attributes. Upon success, the value 0 is returned. Should extended attribute retrieval not be supported, an accepted fallback is to copy fslabel into vlabel. In the event of an error, an appropriate value for errno should be returned. <function>&mac.mpo;_associate_vnode_singlelabel</function> void &mac.mpo;_associate_vnode_singlelabel struct mount *mp struct label *fslabel struct vnode *vp struct label *vlabel &mac.thead; mp File system mount point fslabel File system label vp Vnode to label vlabel Policy label associated with vp On non-multilabel file systems, this entry point is called to set the policy label for vp based on the file system label, fslabel. <function>&mac.mpo;_create_devfs_device</function> void &mac.mpo;_create_devfs_device dev_t dev struct devfs_dirent *devfs_dirent struct label *label &mac.thead; dev Device corresponding with devfs_dirent devfs_dirent Devfs directory entry to be labeled. label Label for devfs_dirent to be filled in. Fill out the label on a devfs_dirent being created for the passed device. This call will be made when the device file system is mounted, regenerated, or a new device is made available. <function>&mac.mpo;_create_devfs_directory</function> void &mac.mpo;_create_devfs_directory char *dirname int dirnamelen struct devfs_dirent *devfs_dirent struct label *label &mac.thead; dirname Name of directory being created dirnamelen Length of string dirname devfs_dirent Devfs directory entry for directory being created. Fill out the label on a devfs_dirent being created for the passed directory. 
This call will be made when the device file system is mounted, regenerated, or a new device requiring a specific directory hierarchy is made available. <function>&mac.mpo;_create_devfs_symlink</function> void &mac.mpo;_create_devfs_symlink struct ucred *cred struct mount *mp struct devfs_dirent *dd struct label *ddlabel struct devfs_dirent *de struct label *delabel &mac.thead; cred Subject credential mp Devfs mount point dd Link destination ddlabel Label associated with dd de Symlink entry delabel Label associated with de Fill in the label (delabel) for a newly created &man.devfs.5; symbolic link entry. <function>&mac.mpo;_create_vnode_extattr</function> int &mac.mpo;_create_vnode_extattr struct ucred *cred struct mount *mp struct label *fslabel struct vnode *dvp struct label *dlabel struct vnode *vp struct label *vlabel struct componentname *cnp &mac.thead; cred Subject credential mp File system mount point fslabel File system label dvp Parent directory vnode dlabel Label associated with dvp vp Newly created vnode vlabel Policy label associated with vp cnp Component name for vp Write out the label for vp to the appropriate extended attribute. If the write succeeds, fill in vlabel with the label, and return 0. Otherwise, return an appropriate error. <function>&mac.mpo;_create_mount</function> void &mac.mpo;_create_mount struct ucred *cred struct mount *mp struct label *mntlabel struct label *fslabel &mac.thead; cred Subject credential mp Object; file system being mounted mntlabel Policy label to be filled in for mp fslabel Policy label for the file system mp mounts. Fill out the labels on the mount point being created by the passed subject credential. This call will be made when a new file system is mounted. <function>&mac.mpo;_create_root_mount</function> void &mac.mpo;_create_root_mount struct ucred *cred struct mount *mp struct label *mntlabel struct label *fslabel &mac.thead; See . 
Fill out the labels on the mount point being created by the passed subject credential. This call will be made when the root file system is mounted, after &mac.mpo;_create_mount. <function>&mac.mpo;_relabel_vnode</function> void &mac.mpo;_relabel_vnode struct ucred *cred struct vnode *vp struct label *vnodelabel struct label *newlabel &mac.thead; cred Subject credential vp vnode to relabel vnodelabel Existing policy label for vp newlabel New, possibly partial label to replace vnodelabel Update the label on the passed vnode given the passed update vnode label and the passed subject credential. <function>&mac.mpo;_setlabel_vnode_extattr</function> int &mac.mpo;_setlabel_vnode_extattr struct ucred *cred struct vnode *vp struct label *vlabel struct label *intlabel &mac.thead; cred Subject credential vp Vnode for which the label is being written vlabel Policy label associated with vp intlabel Label to write out Write out the policy from intlabel to an extended attribute. This is called from vop_stdcreatevnode_ea. <function>&mac.mpo;_update_devfsdirent</function> void &mac.mpo;_update_devfsdirent struct devfs_dirent *devfs_dirent struct label *direntlabel struct vnode *vp struct label *vnodelabel &mac.thead; devfs_dirent Object; devfs directory entry direntlabel Policy label for devfs_dirent to be updated. vp Parent vnode Locked vnodelabel Policy label for vp Update the devfs_dirent label from the passed devfs vnode label. This call will be made when a devfs vnode has been successfully relabeled to commit the label change such that it lasts even if the vnode is recycled. It will also be made when a symlink is created in devfs, following a call to mac_vnode_create_from_vnode to initialize the vnode label. 
IPC Object Labeling Event Operations <function>&mac.mpo;_create_mbuf_from_socket</function> void &mac.mpo;_create_mbuf_from_socket struct socket *so struct label *socketlabel struct mbuf *m struct label *mbuflabel &mac.thead; so Socket Socket locking WIP socketlabel Policy label for so m Object; mbuf mbuflabel Policy label to fill in for m Set the label on a newly created mbuf header from the passed socket label. This call is made when a new datagram or message is generated by the socket and stored in the passed mbuf. <function>&mac.mpo;_create_pipe</function> void &mac.mpo;_create_pipe struct ucred *cred struct pipe *pipe struct label *pipelabel &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe Set the label on a newly created pipe from the passed subject credential. This call is made when a new pipe is created. <function>&mac.mpo;_create_socket</function> void &mac.mpo;_create_socket struct ucred *cred struct socket *so struct label *socketlabel &mac.thead; cred Subject credential Immutable so Object; socket to label socketlabel Label to fill in for so Set the label on a newly created socket from the passed subject credential. This call is made when a socket is created. <function>&mac.mpo;_create_socket_from_socket</function> void &mac.mpo;_create_socket_from_socket struct socket *oldsocket struct label *oldsocketlabel struct socket *newsocket struct label *newsocketlabel &mac.thead; oldsocket Listening socket oldsocketlabel Policy label associated with oldsocket newsocket New socket newsocketlabel Policy label associated with newsocket Label a socket, newsocket, newly &man.accept.2;ed, based on the &man.listen.2; socket, oldsocket. 
<function>&mac.mpo;_relabel_pipe</function> void &mac.mpo;_relabel_pipe struct ucred *cred struct pipe *pipe struct label *oldlabel struct label *newlabel &mac.thead; cred Subject credential pipe Pipe oldlabel Current policy label associated with pipe newlabel Policy label update to apply to pipe Apply a new label, newlabel, to pipe. <function>&mac.mpo;_relabel_socket</function> void &mac.mpo;_relabel_socket struct ucred *cred struct socket *so struct label *oldlabel struct label *newlabel &mac.thead; cred Subject credential Immutable so Object; socket oldlabel Current label for so newlabel Label update for so Update the label on a socket from the passed socket label update. <function>&mac.mpo;_set_socket_peer_from_mbuf</function> void &mac.mpo;_set_socket_peer_from_mbuf struct mbuf *mbuf struct label *mbuflabel struct label *oldlabel struct label *newlabel &mac.thead; mbuf First datagram received over socket mbuflabel Label for mbuf oldlabel Current label for the socket newlabel Policy label to be filled out for the socket Set the peer label on a stream socket from the passed mbuf label. This call will be made when the first datagram is received by the stream socket, with the exception of Unix domain sockets. <function>&mac.mpo;_set_socket_peer_from_socket</function> void &mac.mpo;_set_socket_peer_from_socket struct socket *oldsocket struct label *oldsocketlabel struct socket *newsocket struct label *newsocketpeerlabel &mac.thead; oldsocket Local socket oldsocketlabel Policy label for oldsocket newsocket Peer socket newsocketpeerlabel Policy label to fill in for newsocket Set the peer label on a stream UNIX domain socket from the passed remote socket endpoint. This call will be made when the socket pair is connected, and will be made for both endpoints. 
Network Object Labeling Event Operations <function>&mac.mpo;_create_bpfdesc</function> void &mac.mpo;_create_bpfdesc struct ucred *cred struct bpf_d *bpf_d struct label *bpflabel &mac.thead; cred Subject credential Immutable bpf_d Object; bpf descriptor bpflabel Policy label to be filled in for bpf_d Set the label on a newly created BPF descriptor from the passed subject credential. This call will be made when a BPF device node is opened by a process with the passed subject credential. <function>&mac.mpo;_create_ifnet</function> void &mac.mpo;_create_ifnet struct ifnet *ifnet struct label *ifnetlabel &mac.thead; ifnet Network interface ifnetlabel Policy label to fill in for ifnet Set the label on a newly created interface. This call may be made when a new physical interface becomes available to the system, or when a pseudo-interface is instantiated during boot or as a result of a user action. <function>&mac.mpo;_create_ipq</function> void &mac.mpo;_create_ipq struct mbuf *fragment struct label *fragmentlabel struct ipq *ipq struct label *ipqlabel &mac.thead; fragment First received IP fragment fragmentlabel Policy label for fragment ipq IP reassembly queue to be labeled ipqlabel Policy label to be filled in for ipq Set the label on a newly created IP fragment reassembly queue from the mbuf header of the first received fragment. <function>&mac.mpo;_create_datagram_from_ipq</function> void &mac.mpo;_create_datagram_from_ipq struct ipq *ipq struct label *ipqlabel struct mbuf *datagram struct label *datagramlabel &mac.thead; ipq IP reassembly queue ipqlabel Policy label for ipq datagram Datagram to be labeled datagramlabel Policy label to be filled in for datagram Set the label on a newly reassembled IP datagram from the IP fragment reassembly queue from which it was generated. 
<function>&mac.mpo;_create_fragment</function> void &mac.mpo;_create_fragment struct mbuf *datagram struct label *datagramlabel struct mbuf *fragment struct label *fragmentlabel &mac.thead; datagram Datagram datagramlabel Policy label for datagram fragment Fragment to be labeled fragmentlabel Policy label to be filled in for fragment Set the label on the mbuf header of a newly created IP fragment from the label on the mbuf header of the datagram it was generated from. <function>&mac.mpo;_create_mbuf_from_mbuf</function> void &mac.mpo;_create_mbuf_from_mbuf struct mbuf *oldmbuf struct label *oldmbuflabel struct mbuf *newmbuf struct label *newmbuflabel &mac.thead; oldmbuf Existing (source) mbuf oldmbuflabel Policy label for oldmbuf newmbuf New mbuf to be labeled newmbuflabel Policy label to be filled in for newmbuf Set the label on the mbuf header of a newly created datagram from the mbuf header of an existing datagram. This call may be made in a number of situations, including when an mbuf is re-allocated for alignment purposes. <function>&mac.mpo;_create_mbuf_linklayer</function> void &mac.mpo;_create_mbuf_linklayer struct ifnet *ifnet struct label *ifnetlabel struct mbuf *mbuf struct label *mbuflabel &mac.thead; ifnet Network interface ifnetlabel Policy label for ifnet mbuf mbuf header for new datagram mbuflabel Policy label to be filled in for mbuf Set the label on the mbuf header of a newly created datagram generated for the purposes of a link layer response for the passed interface. This call may be made in a number of situations, including for ARP or ND6 responses in the IPv4 and IPv6 stacks. 
<function>&mac.mpo;_create_mbuf_from_bpfdesc</function> void &mac.mpo;_create_mbuf_from_bpfdesc struct bpf_d *bpf_d struct label *bpflabel struct mbuf *mbuf struct label *mbuflabel &mac.thead; bpf_d BPF descriptor bpflabel Policy label for bpf_d mbuf New mbuf to be labeled mbuflabel Policy label to fill in for mbuf Set the label on the mbuf header of a newly created datagram generated using the passed BPF descriptor. This call is made when a write is performed to the BPF device associated with the passed BPF descriptor. <function>&mac.mpo;_create_mbuf_from_ifnet</function> void &mac.mpo;_create_mbuf_from_ifnet struct ifnet *ifnet struct label *ifnetlabel struct mbuf *mbuf struct label *mbuflabel &mac.thead; ifnet Network interface ifnetlabel Policy label for ifnet mbuf mbuf header for new datagram mbuflabel Policy label to be filled in for mbuf Set the label on the mbuf header of a newly created datagram generated from the passed network interface. <function>&mac.mpo;_create_mbuf_multicast_encap</function> void &mac.mpo;_create_mbuf_multicast_encap struct mbuf *oldmbuf struct label *oldmbuflabel struct ifnet *ifnet struct label *ifnetlabel struct mbuf *newmbuf struct label *newmbuflabel &mac.thead; oldmbuf mbuf header for existing datagram oldmbuflabel Policy label for oldmbuf ifnet Network interface ifnetlabel Policy label for ifnet newmbuf mbuf header to be labeled for new datagram newmbuflabel Policy label to be filled in for newmbuf Set the label on the mbuf header of a newly created datagram generated from the existing passed datagram when it is processed by the passed multicast encapsulation interface. This call is made when an mbuf is to be delivered using the virtual interface. 
<function>&mac.mpo;_create_mbuf_netlayer</function> void &mac.mpo;_create_mbuf_netlayer struct mbuf *oldmbuf struct label *oldmbuflabel struct mbuf *newmbuf struct label *newmbuflabel &mac.thead; oldmbuf Received datagram oldmbuflabel Policy label for oldmbuf newmbuf Newly created datagram newmbuflabel Policy label for newmbuf Set the label on the mbuf header of a newly created datagram generated by the IP stack in response to an existing received datagram (oldmbuf). This call may be made in a number of situations, including when responding to ICMP request datagrams. <function>&mac.mpo;_fragment_match</function> int &mac.mpo;_fragment_match struct mbuf *fragment struct label *fragmentlabel struct ipq *ipq struct label *ipqlabel &mac.thead; fragment IP datagram fragment fragmentlabel Policy label for fragment ipq IP fragment reassembly queue ipqlabel Policy label for ipq Determine whether an mbuf header containing an IP datagram fragment (fragment) matches the label of the passed IP fragment reassembly queue (ipq). Return (1) for a successful match, or (0) for no match. This call is made when the IP stack attempts to find an existing fragment reassembly queue for a newly received fragment; if this fails, a new fragment reassembly queue may be instantiated for the fragment. Policies may use this entry point to prevent the reassembly of otherwise matching IP fragments if policy does not permit them to be reassembled based on the label or other information. <function>&mac.mpo;_relabel_ifnet</function> void &mac.mpo;_relabel_ifnet struct ucred *cred struct ifnet *ifnet struct label *ifnetlabel struct label *newlabel &mac.thead; cred Subject credential ifnet Object; Network interface ifnetlabel Policy label for ifnet newlabel Label update to apply to ifnet Update the label of the network interface, ifnet, based on the passed update label, newlabel, and the passed subject credential, cred. 
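Returning to &mac.mpo;_fragment_match described above, the following is a minimal userspace sketch of how a policy might implement the 1/0 match result. The struct label layout with a single integer level, the rule that fragments match only when levels are equal, and the example_fragment_match name are all assumptions made for illustration; struct mbuf and struct ipq are left as opaque forward declarations because the sketch only inspects the labels.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins: the sketch never looks inside these objects. */
struct mbuf;
struct ipq;

/* Hypothetical label layout: a single integer sensitivity level. */
struct label {
	int level;
};

/*
 * Return 1 if the fragment may join the reassembly queue, 0 otherwise.
 * Assumed rule: labels match only when their levels are equal.
 */
int
example_fragment_match(struct mbuf *fragment, struct label *fragmentlabel,
    struct ipq *ipq, struct label *ipqlabel)
{
	(void)fragment;
	(void)ipq;
	assert(fragmentlabel != NULL && ipqlabel != NULL);
	return (fragmentlabel->level == ipqlabel->level ? 1 : 0);
}
```

Under this assumed rule, a fragment whose label differs from the queue label is reported as not matching, so the IP stack may instantiate a separate reassembly queue for it, as the entry point description notes.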
<function>&mac.mpo;_update_ipq</function> void &mac.mpo;_update_ipq struct mbuf *fragment struct label *fragmentlabel struct ipq *ipq struct label *ipqlabel &mac.thead; fragment IP fragment fragmentlabel Policy label for fragment ipq IP fragment reassembly queue ipqlabel Policy label to be updated for ipq Update the label on an IP fragment reassembly queue (ipq) based on the acceptance of the passed IP fragment mbuf header (fragment). Process Labeling Event Operations <function>&mac.mpo;_create_cred</function> void &mac.mpo;_create_cred struct ucred *parent_cred struct ucred *child_cred &mac.thead; parent_cred Parent subject credential child_cred Child subject credential Set the label of a newly created subject credential from the passed subject credential. This call will be made when &man.crcopy.9; is invoked on a newly created struct ucred. This call should not be confused with a process forking or creation event. <function>&mac.mpo;_execve_transition</function> void &mac.mpo;_execve_transition struct ucred *old struct ucred *new struct vnode *vp struct label *vnodelabel &mac.thead; old Existing subject credential Immutable new New subject credential to be labeled vp File to execute Locked vnodelabel Policy label for vp Update the label of a newly created subject credential (new) from the passed existing subject credential (old) based on a label transition caused by executing the passed vnode (vp). This call occurs when a process executes the passed vnode and one of the policies returns a success from the mpo_execve_will_transition entry point. Policies may choose to implement this call simply by invoking mpo_create_cred and passing the two subject credentials so as not to implement a transitioning event. Policies should not leave this entry point unimplemented if they implement mpo_create_cred, even if they do not implement mpo_execve_will_transition. 
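The suggestion above, that a policy without transition semantics can implement its execve transition hook by invoking its own credential creation hook, can be sketched as follows. This is a userspace mock: struct ucred with an embedded cr_label, the integer level field, and the example_* names are hypothetical, and the parameters are named oldcred and newcred here rather than old and new.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in: the sketch never looks inside the vnode. */
struct vnode;

/* Hypothetical label layout and credential carrying one label. */
struct label {
	int level;
};

struct ucred {
	struct label cr_label;
};

/* Credential creation: the child simply inherits the parent's label. */
void
example_create_cred(struct ucred *parent_cred, struct ucred *child_cred)
{
	assert(parent_cred != NULL && child_cred != NULL);
	child_cred->cr_label.level = parent_cred->cr_label.level;
}

/*
 * Execve transition for a policy with no transition semantics: ignore
 * the executed vnode and reuse the credential creation logic, so the
 * new credential is still labeled if another policy forces a transition.
 */
void
example_execve_transition(struct ucred *oldcred, struct ucred *newcred,
    struct vnode *vp, struct label *vnodelabel)
{
	(void)vp;
	(void)vnodelabel;
	example_create_cred(oldcred, newcred);
}
```

The point of the delegation is that the new credential never ends up with an unset label, even when this policy itself did not ask for the transition.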
<function>&mac.mpo;_execve_will_transition</function> int &mac.mpo;_execve_will_transition struct ucred *old struct vnode *vp struct label *vnodelabel &mac.thead; old Subject credential prior to &man.execve.2; Immutable vp File to execute vnodelabel Policy label for vp Determine whether the policy will want to perform a transition event as a result of the execution of the passed vnode by the passed subject credential. Return 1 if a transition is required, 0 if not. Even if a policy returns 0, it should behave correctly in the presence of an unexpected invocation of mpo_execve_transition, as that call may happen as a result of another policy requesting a transition. <function>&mac.mpo;_create_proc0</function> void &mac.mpo;_create_proc0 struct ucred *cred &mac.thead; cred Subject credential to be filled in Create the subject credential of process 0, the parent of all kernel processes. <function>&mac.mpo;_create_proc1</function> void &mac.mpo;_create_proc1 struct ucred *cred &mac.thead; cred Subject credential to be filled in Create the subject credential of process 1, the parent of all user processes. <function>&mac.mpo;_relabel_cred</function> void &mac.mpo;_relabel_cred struct ucred *cred struct label *newlabel &mac.thead; cred Subject credential newlabel Label update to apply to cred Update the label on a subject credential from the passed update label. Access Control Checks Access control entry points permit policy modules to influence access control decisions made by the kernel. Generally, although not always, arguments to an access control entry point will include one or more authorizing credentials and information (possibly including a label) for any other objects involved in the operation. An access control entry point may return 0 to permit the operation, or an &man.errno.2; error value. 
The results of invoking the entry point across various registered policy modules will be composed as follows: if all modules permit the operation to succeed, success will be returned. If one or more modules return a failure, a failure will be returned. If more than one module returns a failure, the errno value to return to the user will be selected using the following precedence, implemented by the error_select() function in kern_mac.c, listed from most precedence to least precedence: EDEADLK, EINVAL, ESRCH, EACCES, EPERM. If none of the error values returned by all modules are listed in the precedence chart, then an arbitrarily selected value from the set will be returned. In general, the rules provide precedence to errors in the following order: kernel failures, invalid arguments, object not present, access not permitted, other. <function>&mac.mpo;_check_bpfdesc_receive</function> int &mac.mpo;_check_bpfdesc_receive struct bpf_d *bpf_d struct label *bpflabel struct ifnet *ifnet struct label *ifnetlabel &mac.thead; bpf_d Subject; BPF descriptor bpflabel Policy label for bpf_d ifnet Object; network interface ifnetlabel Policy label for ifnet Determine whether the MAC framework should permit datagrams from the passed interface to be delivered to the buffers of the passed BPF descriptor. Return (0) for success, or an errno value for failure. Suggested failure: EACCES for label mismatches, EPERM for lack of privilege. <function>&mac.mpo;_check_kenv_dump</function> int &mac.mpo;_check_kenv_dump struct ucred *cred &mac.thead; cred Subject credential Determine whether the subject should be allowed to retrieve the kernel environment (see &man.kenv.2;). <function>&mac.mpo;_check_kenv_get</function> int &mac.mpo;_check_kenv_get struct ucred *cred char *name &mac.thead; cred Subject credential name Kernel environment variable name Determine whether the subject should be allowed to retrieve the value of the specified kernel environment variable. 
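The composition rule for access control results given at the start of this section (EDEADLK, then EINVAL, ESRCH, EACCES, and EPERM) can be illustrated with a small userspace sketch. It follows only the precedence list in the text and is not a copy of the kernel's error_select(); the names example_error_select and example_compose are hypothetical, and the choice made when neither value appears in the list is an assumption (the text only says the value is arbitrary).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Pick between two policy results using the documented precedence.
 * A result of 0 means the policy permitted the operation.
 */
int
example_error_select(int error1, int error2)
{
	static const int precedence[] = {
		EDEADLK, EINVAL, ESRCH, EACCES, EPERM
	};
	size_t i;

	for (i = 0; i < sizeof(precedence) / sizeof(precedence[0]); i++)
		if (error1 == precedence[i] || error2 == precedence[i])
			return (precedence[i]);
	/* Neither value is listed: assume any failure beats success. */
	return (error1 != 0 ? error1 : error2);
}

/* Fold the results of n policy modules into one return value. */
int
example_compose(const int *results, size_t n)
{
	int error = 0;
	size_t i;

	assert(results != NULL || n == 0);
	for (i = 0; i < n; i++)
		error = example_error_select(results[i], error);
	return (error);
}
```

With this fold, three policies returning EPERM, EINVAL, and EACCES compose to EINVAL, because invalid arguments outrank both access and privilege failures in the list.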
<function>&mac.mpo;_check_kenv_set</function> int &mac.mpo;_check_kenv_set struct ucred *cred char *name &mac.thead; cred Subject credential name Kernel environment variable name Determine whether the subject should be allowed to set the specified kernel environment variable. <function>&mac.mpo;_check_kenv_unset</function> int &mac.mpo;_check_kenv_unset struct ucred *cred char *name &mac.thead; cred Subject credential name Kernel environment variable name Determine whether the subject should be allowed to unset the specified kernel environment variable. <function>&mac.mpo;_check_kld_load</function> int &mac.mpo;_check_kld_load struct ucred *cred struct vnode *vp struct label *vlabel &mac.thead; cred Subject credential vp Kernel module vnode vlabel Label associated with vp Determine whether the subject should be allowed to load the specified module file. <function>&mac.mpo;_check_kld_stat</function> int &mac.mpo;_check_kld_stat struct ucred *cred &mac.thead; cred Subject credential Determine whether the subject should be allowed to retrieve a list of loaded kernel module files and associated statistics. <function>&mac.mpo;_check_kld_unload</function> int &mac.mpo;_check_kld_unload struct ucred *cred &mac.thead; cred Subject credential Determine whether the subject should be allowed to unload a kernel module. <function>&mac.mpo;_check_pipe_ioctl</function> int &mac.mpo;_check_pipe_ioctl struct ucred *cred struct pipe *pipe struct label *pipelabel unsigned long cmd void *data &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe cmd &man.ioctl.2; command data &man.ioctl.2; data Determine whether the subject should be allowed to make the specified &man.ioctl.2; call. 
<function>&mac.mpo;_check_pipe_poll</function> int &mac.mpo;_check_pipe_poll struct ucred *cred struct pipe *pipe struct label *pipelabel &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe Determine whether the subject should be allowed to poll pipe. <function>&mac.mpo;_check_pipe_read</function> int &mac.mpo;_check_pipe_read struct ucred *cred struct pipe *pipe struct label *pipelabel &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe Determine whether the subject should be allowed read access to pipe. <function>&mac.mpo;_check_pipe_relabel</function> int &mac.mpo;_check_pipe_relabel struct ucred *cred struct pipe *pipe struct label *pipelabel struct label *newlabel &mac.thead; cred Subject credential pipe Pipe pipelabel Current policy label associated with pipe newlabel Label update to pipelabel Determine whether the subject should be allowed to relabel pipe. <function>&mac.mpo;_check_pipe_stat</function> int &mac.mpo;_check_pipe_stat struct ucred *cred struct pipe *pipe struct label *pipelabel &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe Determine whether the subject should be allowed to retrieve statistics related to pipe. <function>&mac.mpo;_check_pipe_write</function> int &mac.mpo;_check_pipe_write struct ucred *cred struct pipe *pipe struct label *pipelabel &mac.thead; cred Subject credential pipe Pipe pipelabel Policy label associated with pipe Determine whether the subject should be allowed to write to pipe. 
<function>&mac.mpo;_check_socket_bind</function> int &mac.mpo;_check_socket_bind struct ucred *cred struct socket *socket struct label *socketlabel struct sockaddr *sockaddr &mac.thead; cred Subject credential socket Socket to be bound socketlabel Policy label for socket sockaddr Address of socket <function>&mac.mpo;_check_socket_connect</function> int &mac.mpo;_check_socket_connect struct ucred *cred struct socket *socket struct label *socketlabel struct sockaddr *sockaddr &mac.thead; cred Subject credential socket Socket to be connected socketlabel Policy label for socket sockaddr Address of socket Determine whether the subject credential (cred) can connect the passed socket (socket) to the passed socket address (sockaddr). Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatches, EPERM for lack of privilege. <function>&mac.mpo;_check_socket_receive</function> int &mac.mpo;_check_socket_receive struct ucred *cred struct socket *so struct label *socketlabel &mac.thead; cred Subject credential so Socket socketlabel Policy label associated with so Determine whether the subject should be allowed to receive information from the socket so. <function>&mac.mpo;_check_socket_send</function> int &mac.mpo;_check_socket_send struct ucred *cred struct socket *so struct label *socketlabel &mac.thead; cred Subject credential so Socket socketlabel Policy label associated with so Determine whether the subject should be allowed to send information across the socket so. <function>&mac.mpo;_check_cred_visible</function> int &mac.mpo;_check_cred_visible struct ucred *u1 struct ucred *u2 &mac.thead; u1 Subject credential u2 Object credential Determine whether the subject credential u1 can see other subjects with the passed subject credential u2. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatches, EPERM for lack of privilege, or ESRCH to hide visibility. 
This call may be made in a number of situations, including inter-process status sysctls used by ps, and in procfs lookups. <function>&mac.mpo;_check_socket_visible</function> int &mac.mpo;_check_socket_visible struct ucred *cred struct socket *socket struct label *socketlabel &mac.thead; cred Subject credential socket Object; socket socketlabel Policy label for socket <function>&mac.mpo;_check_ifnet_relabel</function> int &mac.mpo;_check_ifnet_relabel struct ucred *cred struct ifnet *ifnet struct label *ifnetlabel struct label *newlabel &mac.thead; cred Subject credential ifnet Object; network interface ifnetlabel Existing policy label for ifnet newlabel Policy label update to later be applied to ifnet Determine whether the subject credential can relabel the passed network interface to the passed label update. <function>&mac.mpo;_check_socket_relabel</function> int &mac.mpo;_check_socket_relabel struct ucred *cred struct socket *socket struct label *socketlabel struct label *newlabel &mac.thead; cred Subject credential socket Object; socket socketlabel Existing policy label for socket newlabel Label update to later be applied to socketlabel Determine whether the subject credential can relabel the passed socket to the passed label update. <function>&mac.mpo;_check_cred_relabel</function> int &mac.mpo;_check_cred_relabel struct ucred *cred struct label *newlabel &mac.thead; cred Subject credential newlabel Label update to later be applied to cred Determine whether the subject credential can relabel itself to the passed label update. <function>&mac.mpo;_check_vnode_relabel</function> int &mac.mpo;_check_vnode_relabel struct ucred *cred struct vnode *vp struct label *vnodelabel struct label *newlabel &mac.thead; cred Subject credential Immutable vp Object; vnode Locked vnodelabel Existing policy label for vp newlabel Policy label update to later be applied to vp Determine whether the subject credential can relabel the passed vnode to the passed label update. 
<function>&mac.mpo;_check_mount_stat</function> int &mac.mpo;_check_mount_stat struct ucred *cred struct mount *mp struct label *mountlabel &mac.thead; cred Subject credential mp Object; file system mount mountlabel Policy label for mp Determine whether the subject credential can see the results of a statfs performed on the file system. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatches or EPERM for lack of privilege. This call may be made in a number of situations, including during invocations of &man.statfs.2; and related calls, as well as to determine what file systems to exclude from listings of file systems, such as when &man.getfsstat.2; is invoked. <function>&mac.mpo;_check_proc_debug</function> int &mac.mpo;_check_proc_debug struct ucred *cred struct proc *proc &mac.thead; cred Subject credential Immutable proc Object; process Determine whether the subject credential can debug the passed process. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, EPERM for lack of privilege, or ESRCH to hide visibility of the target. This call may be made in a number of situations, including use of the &man.ptrace.2; and &man.ktrace.2; APIs, as well as for some types of procfs operations. <function>&mac.mpo;_check_vnode_access</function> int &mac.mpo;_check_vnode_access struct ucred *cred struct vnode *vp struct label *label int flags &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp flags &man.access.2; flags Determine how invocations of &man.access.2; and related calls by the subject credential should return when performed on the passed vnode using the passed access flags. This should generally be implemented using the same semantics used in &mac.mpo;_check_vnode_open. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatches or EPERM for lack of privilege. 
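The return-value convention shared by the check entry points above can be sketched in plain C. This is a minimal illustration that compiles outside the kernel, not a framework policy: struct toy_label, its fields, TOY_ACCESS_WRITE and toy_check_vnode_access are hypothetical names, and the integer sensitivity level stands in for whatever a real policy keeps in its label.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical label: a single sensitivity level plus a privilege bit. */
struct toy_label {
	int level;       /* higher means more sensitive */
	int privileged;  /* nonzero if the subject holds policy privilege */
};

#define TOY_ACCESS_WRITE 0x2  /* stand-in for a write-style access flag */

/*
 * Sketch of a check entry point: return 0 to allow the operation,
 * EACCES for a label mismatch, EPERM for lack of privilege.
 */
int
toy_check_vnode_access(const struct toy_label *subj,
    const struct toy_label *obj, int flags)
{
	assert(subj != NULL && obj != NULL);

	/* Label mismatch: the subject is not cleared for the object. */
	if (subj->level < obj->level)
		return (EACCES);

	/* In this toy policy, writes also require privilege. */
	if ((flags & TOY_ACCESS_WRITE) != 0 && !subj->privileged)
		return (EPERM);

	return (0);
}
```

A policy that wants to hide the existence of an object, as suggested for the visibility and process checks, would return ESRCH from the first branch instead of EACCES.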
<function>&mac.mpo;_check_vnode_chdir</function> int &mac.mpo;_check_vnode_chdir struct ucred *cred struct vnode *dvp struct label *dlabel &mac.thead; cred Subject credential dvp Object; vnode to &man.chdir.2; into dlabel Policy label for dvp Determine whether the subject credential can change the process working directory to the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_chroot</function> int &mac.mpo;_check_vnode_chroot struct ucred *cred struct vnode *dvp struct label *dlabel &mac.thead; cred Subject credential dvp Directory vnode dlabel Policy label associated with dvp Determine whether the subject should be allowed to &man.chroot.2; into the specified directory (dvp). <function>&mac.mpo;_check_vnode_create</function> int &mac.mpo;_check_vnode_create struct ucred *cred struct vnode *dvp struct label *dlabel struct componentname *cnp struct vattr *vap &mac.thead; cred Subject credential dvp Object; vnode dlabel Policy label for dvp cnp Component name for dvp vap vnode attributes for vap Determine whether the subject credential can create a vnode with the passed parent directory, passed name information, and passed attribute information. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. This call may be made in a number of situations, including as a result of calls to &man.open.2; with O_CREAT, &man.mknod.2;, &man.mkfifo.2;, and others.
<function>&mac.mpo;_check_vnode_delete</function> int &mac.mpo;_check_vnode_delete struct ucred *cred struct vnode *dvp struct label *dlabel struct vnode *vp struct label *label struct componentname *cnp &mac.thead; cred Subject credential dvp Parent directory vnode dlabel Policy label for dvp vp Object; vnode to delete label Policy label for vp cnp Component name for vp Determine whether the subject credential can delete a vnode from the passed parent directory and passed name information. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. This call may be made in a number of situations, including as a result of calls to &man.unlink.2; and &man.rmdir.2;. Policies implementing this entry point should also implement &mac.mpo;_check_vnode_rename_to to authorize deletion of objects as a result of being the target of a rename. <function>&mac.mpo;_check_vnode_deleteacl</function> int &mac.mpo;_check_vnode_deleteacl struct ucred *cred struct vnode *vp struct label *label acl_type_t type &mac.thead; cred Subject credential Immutable vp Object; vnode Locked label Policy label for vp type ACL type Determine whether the subject credential can delete the ACL of passed type from the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_exec</function> int &mac.mpo;_check_vnode_exec struct ucred *cred struct vnode *vp struct label *label &mac.thead; cred Subject credential vp Object; vnode to execute label Policy label for vp Determine whether the subject credential can execute the passed vnode. Determination of execute privilege is made separately from decisions about any transitioning event. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege.
<function>&mac.mpo;_check_vnode_getacl</function> int &mac.mpo;_check_vnode_getacl struct ucred *cred struct vnode *vp struct label *label acl_type_t type &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp type ACL type Determine whether the subject credential can retrieve the ACL of passed type from the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_getextattr</function> int &mac.mpo;_check_vnode_getextattr struct ucred *cred struct vnode *vp struct label *label int attrnamespace const char *name struct uio *uio &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp attrnamespace Extended attribute namespace name Extended attribute name uio I/O structure pointer; see &man.uio.9; Determine whether the subject credential can retrieve the extended attribute with the passed namespace and name from the passed vnode. Policies implementing labeling using extended attributes may be interested in special handling of operations on those extended attributes. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_link</function> int &mac.mpo;_check_vnode_link struct ucred *cred struct vnode *dvp struct label *dlabel struct vnode *vp struct label *label struct componentname *cnp &mac.thead; cred Subject credential dvp Directory vnode dlabel Policy label associated with dvp vp Link destination vnode label Policy label associated with vp cnp Component name for the link being created Determine whether the subject should be allowed to create a link to the vnode vp with the name specified by cnp. 
<function>&mac.mpo;_check_vnode_mmap</function> int &mac.mpo;_check_vnode_mmap struct ucred *cred struct vnode *vp struct label *label int prot &mac.thead; cred Subject credential vp Vnode to map label Policy label associated with vp prot Mmap protections (see &man.mmap.2;) Determine whether the subject should be allowed to map the vnode vp with the protections specified in prot. <function>&mac.mpo;_check_vnode_mmap_downgrade</function> void &mac.mpo;_check_vnode_mmap_downgrade struct ucred *cred struct vnode *vp struct label *label int *prot &mac.thead; cred See . vp label prot Mmap protections to be downgraded Downgrade the mmap protections based on the subject and object labels. <function>&mac.mpo;_check_vnode_mprotect</function> int &mac.mpo;_check_vnode_mprotect struct ucred *cred struct vnode *vp struct label *label int prot &mac.thead; cred Subject credential vp Mapped vnode prot Memory protections Determine whether the subject should be allowed to set the specified memory protections on memory mapped from the vnode vp. <function>&mac.mpo;_check_vnode_poll</function> int &mac.mpo;_check_vnode_poll struct ucred *active_cred struct ucred *file_cred struct vnode *vp struct label *label &mac.thead; active_cred Subject credential file_cred Credential associated with the struct file vp Polled vnode label Policy label associated with vp Determine whether the subject should be allowed to poll the vnode vp. <function>&mac.mpo;_check_vnode_rename_from</function> int &mac.mpo;_check_vnode_rename_from struct ucred *cred struct vnode *dvp struct label *dlabel struct vnode *vp struct label *label struct componentname *cnp &mac.thead; cred Subject credential dvp Directory vnode dlabel Policy label associated with dvp vp Vnode to be renamed label Policy label associated with vp cnp Component name for vp Determine whether the subject should be allowed to rename the vnode vp to something else.
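Unlike the other checks in this section, the mmap downgrade entry point described above returns nothing and instead edits the protections through a pointer. The following is a minimal sketch of that shape, compilable outside the kernel. struct mm_label, the MM_PROT_* constants and mm_mmap_downgrade are hypothetical names, and the level comparison is an assumed toy policy rather than anything the framework prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for protection bits; the values are illustrative only. */
#define MM_PROT_READ   0x1
#define MM_PROT_WRITE  0x2
#define MM_PROT_EXEC   0x4

/* Hypothetical label holding one sensitivity level. */
struct mm_label {
	int level;
};

/*
 * Sketch of a downgrade hook: instead of returning an error, it clears
 * the protection bits the subject is not entitled to.
 */
void
mm_mmap_downgrade(const struct mm_label *subj, const struct mm_label *obj,
    int *prot)
{
	assert(subj != NULL && obj != NULL && prot != NULL);

	/* Subject below the object: may not read or execute it. */
	if (subj->level < obj->level)
		*prot &= ~(MM_PROT_READ | MM_PROT_EXEC);

	/* Subject above the object: may not write down to it. */
	if (subj->level > obj->level)
		*prot &= ~MM_PROT_WRITE;
}
```

Because the hook cannot fail, the caller simply continues with whatever protections remain set in prot.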
<function>&mac.mpo;_check_vnode_rename_to</function> int &mac.mpo;_check_vnode_rename_to struct ucred *cred struct vnode *dvp struct label *dlabel struct vnode *vp struct label *label int samedir struct componentname *cnp &mac.thead; cred Subject credential dvp Directory vnode dlabel Policy label associated with dvp vp Overwritten vnode label Policy label associated with vp samedir Boolean; 1 if the source and destination directories are the same cnp Destination component name Determine whether the subject should be allowed to rename to the vnode vp, into the directory dvp, or to the name represented by cnp. If there is no existing file to overwrite, vp and label will be NULL. <function>&mac.mpo;_check_socket_listen</function> int &mac.mpo;_check_socket_listen struct ucred *cred struct socket *socket struct label *socketlabel &mac.thead; cred Subject credential socket Object; socket socketlabel Policy label for socket Determine whether the subject credential can listen on the passed socket. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_lookup</function> int &mac.mpo;_check_vnode_lookup struct ucred *cred struct vnode *dvp struct label *dlabel struct componentname *cnp &mac.thead; cred Subject credential dvp Object; vnode dlabel Policy label for dvp cnp Component name being looked up Determine whether the subject credential can perform a lookup in the passed directory vnode for the passed name. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. 
<function>&mac.mpo;_check_vnode_open</function> int &mac.mpo;_check_vnode_open struct ucred *cred struct vnode *vp struct label *label int acc_mode &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp acc_mode &man.open.2; access mode Determine whether the subject credential can perform an open operation on the passed vnode with the passed access mode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_readdir</function> int &mac.mpo;_check_vnode_readdir struct ucred *cred struct vnode *dvp struct label *dlabel &mac.thead; cred Subject credential dvp Object; directory vnode dlabel Policy label for dvp Determine whether the subject credential can perform a readdir operation on the passed directory vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_readlink</function> int &mac.mpo;_check_vnode_readlink struct ucred *cred struct vnode *vp struct label *label &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp Determine whether the subject credential can perform a readlink operation on the passed symlink vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. This call may be made in a number of situations, including an explicit readlink call by the user process, or as a result of an implicit readlink during a name lookup by the process. <function>&mac.mpo;_check_vnode_revoke</function> int &mac.mpo;_check_vnode_revoke struct ucred *cred struct vnode *vp struct label *label &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp Determine whether the subject credential can revoke access to the passed vnode. Return 0 for success, or an errno value for failure. 
Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setacl</function> int &mac.mpo;_check_vnode_setacl struct ucred *cred struct vnode *vp struct label *label acl_type_t type struct acl *acl &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp type ACL type acl ACL Determine whether the subject credential can set the passed ACL of passed type on the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setextattr</function> int &mac.mpo;_check_vnode_setextattr struct ucred *cred struct vnode *vp struct label *label int attrnamespace const char *name struct uio *uio &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp attrnamespace Extended attribute namespace name Extended attribute name uio I/O structure pointer; see &man.uio.9; Determine whether the subject credential can set the extended attribute of passed name and passed namespace on the passed vnode. Policies implementing security labels backed into extended attributes may want to provide additional protections for those attributes. Additionally, policies should avoid making decisions based on the data referenced from uio, as there is a potential race condition between this check and the actual operation. The uio may also be NULL if a delete operation is being performed. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setflags</function> int &mac.mpo;_check_vnode_setflags struct ucred *cred struct vnode *vp struct label *label u_long flags &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp flags File flags; see &man.chflags.2; Determine whether the subject credential can set the passed flags on the passed vnode. 
Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setmode</function> int &mac.mpo;_check_vnode_setmode struct ucred *cred struct vnode *vp struct label *label mode_t mode &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp mode File mode; see &man.chmod.2; Determine whether the subject credential can set the passed mode on the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setowner</function> int &mac.mpo;_check_vnode_setowner struct ucred *cred struct vnode *vp struct label *label uid_t uid gid_t gid &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp uid User ID gid Group ID Determine whether the subject credential can set the passed uid and passed gid as file uid and file gid on the passed vnode. The IDs may be set to (-1) to request no update. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_vnode_setutimes</function> int &mac.mpo;_check_vnode_setutimes struct ucred *cred struct vnode *vp struct label *label struct timespec atime struct timespec mtime &mac.thead; cred Subject credential vp Object; vp label Policy label for vp atime Access time; see &man.utimes.2; mtime Modification time; see &man.utimes.2; Determine whether the subject credential can set the passed access timestamps on the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. 
<function>&mac.mpo;_check_proc_sched</function> int &mac.mpo;_check_proc_sched struct ucred *ucred struct proc *proc &mac.thead; cred Subject credential proc Object; process Determine whether the subject credential can change the scheduling parameters of the passed process. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, EPERM for lack of privilege, or ESRCH to limit visibility. See &man.setpriority.2; for more information. <function>&mac.mpo;_check_proc_signal</function> int &mac.mpo;_check_proc_signal struct ucred *cred struct proc *proc int signal &mac.thead; cred Subject credential proc Object; process signal Signal; see &man.kill.2; Determine whether the subject credential can deliver the passed signal to the passed process. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, EPERM for lack of privilege, or ESRCH to limit visibility. <function>&mac.mpo;_check_vnode_stat</function> int &mac.mpo;_check_vnode_stat struct ucred *cred struct vnode *vp struct label *label &mac.thead; cred Subject credential vp Object; vnode label Policy label for vp Determine whether the subject credential can stat the passed vnode. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. See &man.stat.2; for more information. <function>&mac.mpo;_check_ifnet_transmit</function> int &mac.mpo;_check_ifnet_transmit struct ucred *cred struct ifnet *ifnet struct label *ifnetlabel struct mbuf *mbuf struct label *mbuflabel &mac.thead; cred Subject credential ifnet Network interface ifnetlabel Policy label for ifnet mbuf Object; mbuf to be sent mbuflabel Policy label for mbuf Determine whether the network interface can transmit the passed mbuf. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatch, or EPERM for lack of privilege. 
<function>&mac.mpo;_check_socket_deliver</function> int &mac.mpo;_check_socket_deliver struct ucred *cred struct ifnet *ifnet struct label *ifnetlabel struct mbuf *mbuf struct label *mbuflabel &mac.thead; cred Subject credential ifnet Network interface ifnetlabel Policy label for ifnet mbuf Object; mbuf to be delivered mbuflabel Policy label for mbuf Determine whether the socket may receive the datagram stored in the passed mbuf header. Return 0 for success, or an errno value for failure. Suggested failures: EACCES for label mismatch, or EPERM for lack of privilege. <function>&mac.mpo;_check_socket_visible</function> int &mac.mpo;_check_socket_visible struct ucred *cred struct socket *so struct label *socketlabel &mac.thead; cred Subject credential Immutable so Object; socket socketlabel Policy label for so Determine whether the subject credential cred can "see" the passed socket (socket) using system monitoring functions, such as those employed by &man.netstat.8; and &man.sockstat.1;. Return 0 for success, or an errno value for failure. Suggested failure: EACCES for label mismatches, EPERM for lack of privilege, or ESRCH to hide visibility. <function>&mac.mpo;_check_system_acct</function> int &mac.mpo;_check_system_acct struct ucred *ucred struct vnode *vp struct label *vlabel &mac.thead; ucred Subject credential vp Accounting file; &man.acct.5; vlabel Label associated with vp Determine whether the subject should be allowed to enable accounting, based on its label and the label of the accounting log file. <function>&mac.mpo;_check_system_nfsd</function> int &mac.mpo;_check_system_nfsd struct ucred *cred &mac.thead; cred Subject credential Determine whether the subject should be allowed to call &man.nfssvc.2;. 
<function>&mac.mpo;_check_system_reboot</function> int &mac.mpo;_check_system_reboot struct ucred *cred int howto &mac.thead; cred Subject credential howto howto parameter from &man.reboot.2; Determine whether the subject should be allowed to reboot the system in the specified manner. <function>&mac.mpo;_check_system_settime</function> int &mac.mpo;_check_system_settime struct ucred *cred &mac.thead; cred Subject credential Determine whether the user should be allowed to set the system clock. <function>&mac.mpo;_check_system_swapon</function> int &mac.mpo;_check_system_swapon struct ucred *cred struct vnode *vp struct label *vlabel &mac.thead; cred Subject credential vp Swap device vlabel Label associated with vp Determine whether the subject should be allowed to add vp as a swap device. <function>&mac.mpo;_check_system_sysctl</function> int &mac.mpo;_check_system_sysctl struct ucred *cred int *name u_int *namelen void *old size_t *oldlenp int inkernel void *new size_t newlen &mac.thead; cred Subject credential name See &man.sysctl.3; namelen old oldlenp inkernel Boolean; 1 if called from kernel new See &man.sysctl.3; newlen Determine whether the subject should be allowed to make the specified &man.sysctl.3; transaction. Label Management Calls Relabel events occur when a user process has requested that the label on an object be modified. A two-phase update occurs: first, an access control check will be performed to determine if the update is both valid and permitted, and then the update itself is performed via a separate entry point. Relabel entry points typically accept the object, object label reference, and an update label submitted by the process. Memory allocation during relabel is discouraged, as relabel calls are not permitted to fail (failure should be reported earlier in the relabel check). 
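The two-phase relabel sequence described above can be sketched as follows. This is an assumption-laden illustration in plain C: the rl_* types and functions are hypothetical and do not correspond to framework entry points, and the rule that lowering a level requires privilege is an invented toy policy. What it shows is the split itself: every failure is reported by the check, and the update step has no failure path and allocates nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical label and subject types, for illustration only. */
struct rl_label {
	int level;
};

struct rl_cred {
	int privileged;
	struct rl_label label;
};

/* Phase one: validate and authorize the update; this step may fail. */
int
rl_check_relabel(const struct rl_cred *cred, const struct rl_label *cur,
    const struct rl_label *newlabel)
{
	assert(cred != NULL && cur != NULL && newlabel != NULL);

	if (newlabel->level < 0)
		return (EINVAL);	/* the update is not a valid label */
	if (newlabel->level < cur->level && !cred->privileged)
		return (EPERM);		/* lowering the level needs privilege */
	return (0);
}

/* Phase two: apply the update; no allocation and no failure path. */
void
rl_relabel(struct rl_label *cur, const struct rl_label *newlabel)
{
	cur->level = newlabel->level;
}

/* Caller-side sequence: apply only after the check has succeeded. */
int
rl_set_label(const struct rl_cred *cred, struct rl_label *cur,
    const struct rl_label *newlabel)
{
	int error = rl_check_relabel(cred, cur, newlabel);

	if (error != 0)
		return (error);
	rl_relabel(cur, newlabel);
	return (0);
}
```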
Userland Architecture The TrustedBSD MAC Framework includes a number of policy-agnostic elements, including MAC library interfaces for abstractly managing labels, modifications to the system credential management and login libraries to support the assignment of MAC labels to users, and a set of tools to monitor and modify labels on processes, files, and network interfaces. More details on the user architecture will be added to this section in the near future. APIs for Policy-Agnostic Label Management The TrustedBSD MAC Framework provides a number of library and system calls permitting applications to manage MAC labels on objects using a policy-agnostic interface. This permits applications to manipulate labels for a variety of policies without being written to support specific policies. These interfaces are used by general-purpose tools such as &man.ifconfig.8;, &man.ls.1; and &man.ps.1; to view labels on network interfaces, files, and processes. The APIs also support MAC management tools including &man.getfmac.8;, &man.getpmac.8;, &man.setfmac.8;, &man.setfsmac.8;, and &man.setpmac.8;. The MAC APIs are documented in &man.mac.3;. Applications handle MAC labels in two forms: an internalized form used to return and set labels on processes and objects (mac_t), and externalized form based on C strings appropriate for storage in configuration files, display to the user, or input from the user. Each MAC label contains a number of elements, each consisting of a name and value pair. Policy modules in the kernel bind to specific names and interpret the values in policy-specific ways. In the externalized string form, labels are represented by a comma-delimited list of name and value pairs separated by the / character. Labels may be directly converted to and from text using provided APIs; when retrieving labels from the kernel, internalized label storage must first be prepared for the desired label element set. 
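To make the externalized string form concrete, the sketch below splits a label such as biba/low,mls/high into its name and value elements. It is a simplified illustration, not part of the MAC library: struct label_element and parse_label_text are hypothetical, the buffer sizes are arbitrary, and it assumes that values contain no commas. Real applications would use the conversion routines documented in &man.mac.3; instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one element of an externalized label. */
struct label_element {
	char name[32];
	char value[64];
};

/*
 * Split text such as "biba/low,mls/high" into name/value elements.
 * Returns the number of elements stored, or -1 on malformed input.
 */
int
parse_label_text(const char *text, struct label_element *out, size_t max)
{
	size_t count = 0;

	assert(text != NULL && out != NULL);
	while (*text != '\0') {
		/* One element runs up to the next comma or end of string. */
		const char *end = strchr(text, ',');
		size_t len = (end != NULL) ? (size_t)(end - text) : strlen(text);
		const char *slash = memchr(text, '/', len);
		size_t nlen, vlen;

		if (slash == NULL || count == max)
			return (-1);
		nlen = (size_t)(slash - text);
		vlen = len - nlen - 1;
		if (nlen == 0 || nlen >= sizeof(out[count].name) ||
		    vlen >= sizeof(out[count].value))
			return (-1);
		memcpy(out[count].name, text, nlen);
		out[count].name[nlen] = '\0';
		memcpy(out[count].value, slash + 1, vlen);
		out[count].value[vlen] = '\0';
		count++;

		/* Step past this element and its trailing comma, if any. */
		text += len;
		if (*text == ',')
			text++;
	}
	return ((int)count);
}
```

Parsing of this kind concerns only the externalized form; internalized label storage still has to be prepared before a label can be retrieved from the kernel.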
Typically, this is done in one of two ways: using &man.mac.prepare.3; and an arbitrary list of desired label elements, or one of the variants of the call that loads a default element set from the &man.mac.conf.5; configuration file. Per-object defaults permit application writers to usefully display labels associated with objects without being aware of the policies present in the system. Currently, direct manipulation of label elements other than by conversion to a text string, string editing, and conversion back to an internalized label is not supported by the MAC library. Such interfaces may be added in the future if they prove necessary for application writers. Binding of Labels to Users The standard user context management interface, &man.setusercontext.3;, has been modified to retrieve MAC labels associated with a user's class from &man.login.conf.5;. These labels are then set along with other user context when either LOGIN_SETALL is specified, or when LOGIN_SETMAC is explicitly specified. It is expected that, in a future version of FreeBSD, the MAC label database will be separated from the login.conf user class abstraction, and be maintained in a separate database. However, the &man.setusercontext.3; API should remain the same following such a change. Conclusion The TrustedBSD MAC framework permits kernel modules to augment the system security policy in a highly integrated manner. They may do this based on existing object properties, or based on label data that is maintained with the assistance of the MAC framework. The framework is sufficiently flexible to implement a variety of policy types, including information flow security policies such as MLS and Biba, as well as policies based on existing BSD credentials or file protections. Policy authors may wish to consult this documentation as well as existing security modules when implementing a new security service.
diff --git a/en_US.ISO8859-1/books/arch-handbook/newbus/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/newbus/chapter.sgml index c20617c4fc..186880d010 100644 --- a/en_US.ISO8859-1/books/arch-handbook/newbus/chapter.sgml +++ b/en_US.ISO8859-1/books/arch-handbook/newbus/chapter.sgml @@ -1,360 +1,360 @@ Jeroen Ruigrok van der Werven (asmodai)
asmodai@FreeBSD.org
Written by
Hiten Pandya
hiten@uk.FreeBSD.org
Newbus Special thanks to Matthew N. Dodd, Warner Losh, Bill Paul, Doug Rabson, Mike Smith, Peter Wemm and Scott Long. This chapter explains the Newbus device framework in detail. Device Drivers Purpose of a Device Driver A device driver is a software component which provides the interface between the kernel's generic view of a peripheral (e.g. disk, network adapter) and the actual implementation of the peripheral. The device driver interface (DDI) is the defined interface between the kernel and the device driver component. Types of Device Drivers Historically, &unix;, and thus FreeBSD, defined four types of device drivers: block device drivers character device drivers network device drivers pseudo-device drivers Block devices operated on fixed-size blocks of data. This type of driver depended on the so-called buffer cache, whose purpose was to cache accessed blocks of data in a dedicated part of memory. Often this buffer cache was based on write-behind, which meant that data modified in memory was synced to disk whenever the system did its periodic disk flushing, thus optimizing writes. Character devices However, in FreeBSD 4.0 and later the distinction between block and character devices no longer exists. Overview of Newbus Newbus is the implementation of a new bus architecture based on abstraction layers, introduced in FreeBSD 3.0 when the Alpha port was imported into the source tree. It was not until 4.0 that it became the default system to use for device drivers. Its goal is to provide a more object-oriented means of interconnecting the various busses and devices which a host system provides to the operating system. Its main features include, amongst others: dynamic attaching easy modularization of drivers pseudo-busses One of the most prominent changes is the migration from the flat and ad-hoc system to a device tree layout.
At the top level resides the root device, which is the parent on which all other devices hang. For each architecture, there is typically a single child of root which has such things as host-to-PCI bridges, etc. attached to it. For x86, this root device is the nexus device; for Alpha, different models have different top-level devices corresponding to the different hardware chipsets, including lca, apecs, cia and tsunami. A device in the Newbus context represents a single hardware entity in the system. For instance, each PCI device is represented by a Newbus device. Any device in the system can have children; a device which has children is often called a bus. Examples of common busses in the system are ISA and PCI, which manage lists of devices attached to ISA and PCI busses respectively. Often, a connection between different kinds of bus is represented by a bridge device which normally has one child for the attached bus. An example of this is a PCI-to-PCI bridge which is represented by a device pcibN on the parent PCI bus and has a child pciN for the attached bus. This layout simplifies the implementation of the PCI bus tree, allowing common code to be used for both top-level and bridged busses. Each device in the Newbus architecture asks its parent to map its resources. The parent then asks its own parent until the nexus is reached. So, basically the nexus is the only part of the Newbus system which knows about all resources. An ISA device might want to map its IO port at 0x230, so it asks its parent, in this case the ISA bus. The ISA bus hands it over to the PCI-to-ISA bridge, which in turn asks the PCI bus, which reaches the host-to-PCI bridge and finally the nexus. The beauty of this transition upwards is that there is room to translate the requests. For example, the 0x230 IO port request might become memory-mapped at 0xb0000230 on a MIPS box by the PCI bridge. Resource allocation can be controlled at any place in the device tree.
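The upward walk of a resource request, including the 0x230 to 0xb0000230 translation mentioned above, can be sketched in a few lines of C. This is a toy model under stated assumptions: struct toy_device, toy_bridge_translate and toy_alloc_resource are hypothetical and are not the Newbus resource API, which works on opaque device_t handles and struct resource objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device node; real Newbus devices are opaque device_t. */
struct toy_device {
	const char *name;
	struct toy_device *parent;
	/* Optional hook letting a bridge rewrite the address on the way up. */
	uint64_t (*translate)(uint64_t addr);
};

/* Example translation: a bridge that maps I/O ports into memory space. */
uint64_t
toy_bridge_translate(uint64_t addr)
{
	return (UINT64_C(0xb0000000) + addr);
}

/*
 * Pass a resource request from a device to its parent, and so on,
 * until the root (which has no parent) is reached. Returns the address
 * as seen by the root and stores the number of hops taken.
 */
uint64_t
toy_alloc_resource(const struct toy_device *dev, uint64_t addr, int *hops)
{
	assert(dev != NULL && hops != NULL);

	*hops = 0;
	while (dev->parent != NULL) {
		dev = dev->parent;
		if (dev->translate != NULL)
			addr = dev->translate(addr);
		(*hops)++;
	}
	return (addr);
}
```

Because every node on the path sees the request, the decision about how a resource is allocated can sit at any level of the tree.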
For instance, on many Alpha platforms, ISA interrupts are managed separately from PCI interrupts and resource allocations for ISA interrupts are managed by the Alpha's ISA bus device. On IA-32, ISA and PCI interrupts are both managed by the top-level nexus device. For both ports, memory and port address space is managed by a single entity - nexus for IA-32 and the relevant chipset driver on Alpha (e.g. CIA or tsunami). In order to normalize access to memory and port mapped resources, Newbus integrates the bus_space APIs from NetBSD. These provide a single API to replace inb/outb and direct memory reads/writes. The advantage of this is that a single driver can easily use either memory-mapped registers or port-mapped registers (some hardware supports both). This support is integrated into the resource allocation mechanism. When a resource is allocated, a driver can retrieve the associated bus_space_tag_t and bus_space_handle_t from the resource. Newbus also allows for definitions of interface methods in files dedicated to this purpose. These are the .m files that are found under the src/sys hierarchy. The core of the Newbus system is an extensible object-based programming model. Each device in the system has a table of methods which it supports. The system and other devices use those methods to control the device and request services. The different methods supported by a device are defined by a number of interfaces. An interface is simply a group of related methods which can be implemented by a device. In the Newbus system, the methods for a device are provided by the various device drivers in the system. When a device is attached to a driver during auto-configuration, it uses the method table declared by the driver. A device can later detach from its driver and re-attach to a new driver with a new method table. This allows dynamic replacement of drivers, which can be useful for driver development.
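The idea that a device borrows its method table from whichever driver it is attached to, and can swap that table by detaching and re-attaching, can be illustrated with a small C sketch. All names here (struct toy_driver, struct toy_device, toy_device_attach and so on) are hypothetical; real Newbus drivers declare device_method_t arrays and are attached by the bus code rather than by a helper like this one.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical method table; real drivers declare device_method_t arrays. */
struct toy_driver {
	const char *name;
	int (*probe)(struct toy_device *dev);
	int (*attach)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_driver *driver;	/* NULL while unattached */
	int state;
};

/* Attach: from now on the device uses this driver's method table. */
int
toy_device_attach(struct toy_device *dev, const struct toy_driver *drv)
{
	assert(dev != NULL && drv != NULL);

	if (drv->probe(dev) != 0)
		return (-1);
	dev->driver = drv;
	return (drv->attach(dev));
}

/* Detach: drop the method table so another driver can take over. */
void
toy_device_detach(struct toy_device *dev)
{
	dev->driver = NULL;
}

/* Two interchangeable drivers that differ only in their attach method. */
static int toy_probe_ok(struct toy_device *dev) { (void)dev; return (0); }
static int toy_attach_v1(struct toy_device *dev) { dev->state = 1; return (0); }
static int toy_attach_v2(struct toy_device *dev) { dev->state = 2; return (0); }

const struct toy_driver toy_driver_v1 = { "foo_v1", toy_probe_ok, toy_attach_v1 };
const struct toy_driver toy_driver_v2 = { "foo_v2", toy_probe_ok, toy_attach_v2 };
```

Attaching toy_driver_v1, detaching, and then attaching toy_driver_v2 leaves the same device object driven by a different set of functions, which is the property that makes in-place driver replacement possible.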
The interfaces are described by an interface definition language similar to the language used to define vnode operations for file systems. The interface would be stored in a methods file (which would normally be named foo_if.m). Newbus Methods # Foo subsystem/driver (a comment...) INTERFACE foo METHOD int doit { device_t dev; }; # DEFAULT is the method that will be used, if a method was not # provided via: DEVMETHOD() METHOD void doit_to_child { device_t dev; device_t child; } DEFAULT doit_generic_to_child; When this interface is compiled, it generates a header file foo_if.h which contains function declarations: int FOO_DOIT(device_t dev); void FOO_DOIT_TO_CHILD(device_t dev, device_t child); A source file, foo_if.c, is also created to accompany the automatically generated header file; it contains implementations of those functions which look up the location of the relevant functions in the object's method table and call that function. The system defines two main interfaces. The first fundamental interface is called device and includes methods which are relevant to all devices. Methods in the device interface include probe, attach and detach to control detection of hardware and shutdown, suspend and resume for critical event notification. The second, more complex interface is bus. This interface contains methods suitable for devices which have children, including methods to access bus specific per-device information (&man.bus.generic.read.ivar.9; and &man.bus.generic.write.ivar.9;), event notification (child_detached, driver_added) and resource management (alloc_resource, activate_resource, deactivate_resource, release_resource). Many methods in the bus interface perform services for some child of the bus device. These methods would normally use the first two arguments to specify the bus providing the service and the child device which is requesting the service.
To simplify driver code, many of these methods have accessor functions which look up the parent and call a method on the parent. For instance, the method BUS_TEARDOWN_INTR(device_t dev, device_t child, ...) can be called using the function bus_teardown_intr(device_t child, ...). Some bus types in the system define additional interfaces to provide access to bus-specific functionality. For instance, the PCI bus driver defines the pci interface which has two methods read_config and write_config for accessing the configuration registers of a PCI device. Newbus API As the Newbus API is huge, this section documents only part of it. More information to come in the next revision of this document. Important locations in the source hierarchy src/sys/[arch]/[arch] - Kernel code for a specific machine architecture resides in this directory. For example, the i386 architecture, or the SPARC64 architecture. src/sys/dev/[bus] - device support for a specific [bus] resides in this directory. src/sys/dev/pci - PCI bus support code resides in this directory. src/sys/[isa|pci] - PCI/ISA device drivers reside in this directory. The PCI/ISA bus support code used to exist in this directory in FreeBSD version 4.0. Important structures and type definitions devclass_t - This is a type definition of a pointer to a struct devclass. device_method_t - This is the same as kobj_method_t (see src/sys/kobj.h). device_t - This is a type definition of a pointer to a struct device. device_t represents a device in the system. It is a kernel object. See src/sys/sys/bus_private.h for implementation details. driver_t - This is a type definition which references struct driver. The driver struct is a class of the device kernel object; it also holds data private to the driver.
<emphasis>driver_t</emphasis> implementation struct driver { KOBJ_CLASS_FIELDS; void *priv; /* driver private data */ };
device_state_t - This is a type definition of an enumeration, device_state. It contains the possible states of a Newbus device before and after the autoconfiguration process.
Device states <emphasis>device_state_t</emphasis> /* * src/sys/sys/bus.h */ typedef enum device_state { DS_NOTPRESENT, /* not probed or probe failed */ DS_ALIVE, /* probe succeeded */ DS_ATTACHED, /* attach method called */ DS_BUSY /* device is open */ } device_state_t;
diff --git a/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml index 3ce99fa171..3ce6e041f1 100644 --- a/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml +++ b/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml @@ -1,944 +1,944 @@ John Baldwin Robert Watson $FreeBSD$ 2002 2003 John Baldwin Robert Watson SMPng Design Document - + Introduction This document presents the current design and implementation of the SMPng Architecture. First, the basic primitives and tools are introduced. Next, a general architecture for the FreeBSD kernel's synchronization and execution model is laid out. Then, locking strategies for specific subsystems are discussed, documenting the approaches taken to introduce fine-grained synchronization and parallelism for each subsystem. Finally, detailed implementation notes are provided to motivate design choices, and make the reader aware of important implications involving the use of specific primitives. This document is a work-in-progress, and will be updated to reflect on-going design and implementation activities associated with the SMPng Project. Many sections currently exist only in outline form, but will be fleshed out as work proceeds. Updates or suggestions regarding the document may be directed to the document editors. The goal of SMPng is to allow concurrency in the kernel. The kernel is basically one rather large and complex program. To make the kernel multi-threaded we use some of the same tools used to make other programs multi-threaded. These include mutexes, shared/exclusive locks, semaphores, and condition variables. For the definitions of these and other SMP-related terms, please see - the section of this article. + the section of this article. - + Basic Tools and Locking Fundamentals Atomic Instructions and Memory Barriers There are several existing treatments of memory barriers and atomic instructions, so this section will not include a lot of detail. 
To put it simply, one cannot go around reading variables without a lock if a lock is used to protect writes to that variable. This becomes obvious when you consider that memory barriers simply determine relative order of memory operations; they do not make any guarantee about timing of memory operations. That is, a memory barrier does not force the contents of a CPU's local cache or store buffer to flush. Instead, the memory barrier at lock release simply ensures that all writes to the protected data will be visible to other CPUs or devices if the write to release the lock is visible. The CPU is free to keep that data in its cache or store buffer as long as it wants. However, if another CPU performs an atomic instruction on the same datum, the first CPU must guarantee that the updated value is made visible to the second CPU along with any other operations that memory barriers may require. For example, assuming a simple model where data is considered visible when it is in main memory (or a global cache), when an atomic instruction is triggered on one CPU, other CPUs' store buffers and caches must flush any writes to that same cache line along with any pending operations behind a memory barrier. This requires one to take special care when using an item protected by atomic instructions. For example, in the sleep mutex implementation, we have to use an atomic_cmpset rather than an atomic_set to turn on the MTX_CONTESTED bit. The reason is that we read the value of mtx_lock into a variable and then make a decision based on that read. However, the value we read may be stale, or it may change while we are making our decision. Thus, when the atomic_set executes, it may end up setting the bit on a value other than the one we made the decision on. Therefore, we have to use an atomic_cmpset to set the value only if the value we made the decision on is up-to-date and valid. Finally, atomic instructions only allow one item to be updated or read.
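The compare-and-set reasoning above can be sketched in portable C11. This is not the FreeBSD mutex code: it uses atomic_compare_exchange_strong from stdatomic.h as a stand-in for atomic_cmpset, and LOCK_UNOWNED, LOCK_CONTESTED and mark_contested are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_UNOWNED	((uintptr_t)0)	/* hypothetical "free" value */
#define LOCK_CONTESTED	((uintptr_t)1)	/* hypothetical flag bit */

/*
 * Set the contested bit only if the lock word still holds the value the
 * caller based its decision on.  A plain atomic OR would set the bit even
 * if the word had changed in the meantime.
 */
static bool
mark_contested(_Atomic uintptr_t *lock_word, uintptr_t observed)
{
	assert(lock_word != NULL);
	if (observed == LOCK_UNOWNED)
		return (false);	/* lock looked free: retry the acquire instead */
	/* Fails, leaving *lock_word untouched, if the observed value is stale. */
	return (atomic_compare_exchange_strong(lock_word, &observed,
	    observed | LOCK_CONTESTED));
}
```

A caller that gets false back is expected to re-read the lock word and decide again, rather than assume the bit was set.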
If one needs to atomically update several items, then a lock must be used instead. For example, if two counters must be read and have values that are consistent relative to each other, then those counters must be protected by a lock rather than by separate atomic instructions. Read Locks versus Write Locks Read locks do not need to be as strong as write locks. Both types of locks need to ensure that the data they are accessing is not stale. However, only write access requires exclusive access. Multiple threads can safely read a value. Using different types of locks for reads and writes can be implemented in a number of ways. First, sx locks can be used in this manner by using an exclusive lock when writing and a shared lock when reading. This method is quite straightforward. A second method is a bit more obscure. You can protect a datum with multiple locks. Then for reading that data you simply need to have a read lock of one of the locks. However, to write to the data, you need to have a write lock of all of the locks. This can make writing rather expensive but can be useful when data is accessed in various ways. For example, the parent process pointer is protected by both the proctree_lock sx lock and the per-process mutex. Sometimes the proc lock is easier as we are just checking to see who a parent of a process is that we already have locked. However, other places such as inferior need to walk the tree of processes via parent pointers and locking each process would be prohibitive as well as a pain to guarantee that the condition you are checking remains valid for both the check and the actions taken as a result of the check. Locking Conditions and Results If you need a lock to check the state of a variable so that you can take an action based on the state you read, you can not just hold the lock while reading the variable and then drop the lock before you act on the value you read. Once you drop the lock, the variable can change rendering your decision invalid. 
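The check-then-act rule in this part of the text can be shown with a minimal userland sketch. The lock here is a toy spin lock built on C11 atomic_flag, standing in for a kernel mutex, and pool_sketch and its functions are hypothetical names. The point is only that the test of free_slots and the decrement happen under one continuous hold of the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct pool_sketch {
	atomic_flag lock;	/* toy spin lock, stand-in for a mutex */
	int free_slots;		/* protected by lock */
};

static void
pool_lock(struct pool_sketch *p)
{
	while (atomic_flag_test_and_set(&p->lock))
		;	/* spin until the flag was previously clear */
}

static void
pool_unlock(struct pool_sketch *p)
{
	atomic_flag_clear(&p->lock);
}

/* Check and act under the same lock hold, so the check cannot go stale. */
static bool
pool_take_slot(struct pool_sketch *p)
{
	bool taken = false;

	pool_lock(p);
	assert(p->free_slots >= 0);
	if (p->free_slots > 0) {
		p->free_slots--;
		taken = true;
	}
	pool_unlock(p);
	return (taken);
}
```

Dropping the lock between the comparison and the decrement would let two callers both see one free slot and both take it.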
Thus, you must hold the lock both while reading the variable and while performing the action as a result of the test. - + General Architecture and Design Interrupt Handling Following the pattern of several other multi-threaded &unix; kernels, FreeBSD deals with interrupt handlers by giving them their own thread context. Providing a context for interrupt handlers allows them to block on locks. To help avoid latency, however, interrupt threads run at real-time kernel priority. Thus, interrupt handlers should not execute for very long to avoid starving other kernel threads. In addition, since multiple handlers may share an interrupt thread, interrupt handlers should not sleep or use a sleepable lock to avoid starving another interrupt handler. The interrupt threads currently in FreeBSD are referred to as heavyweight interrupt threads. They are called this because switching to an interrupt thread involves a full context switch. In the initial implementation, the kernel was not preemptive and thus interrupts that interrupted a kernel thread would have to wait until the kernel thread blocked or returned to userland before they would have an opportunity to run. To deal with the latency problems, the kernel in FreeBSD has been made preemptive. Currently, we only preempt a kernel thread when we release a sleep mutex or when an interrupt comes in. However, the plan is to make the FreeBSD kernel fully preemptive as described below. Not all interrupt handlers execute in a thread context. Instead, some handlers execute directly in primary interrupt context. These interrupt handlers are currently misnamed fast interrupt handlers since the INTR_FAST flag used in earlier versions of the kernel is used to mark these handlers. The only interrupts which currently use these types of interrupt handlers are clock interrupts and serial I/O device interrupts. Since these handlers do not have their own context, they may not acquire blocking locks and thus may only use spin mutexes. 
Finally, there is one optional optimization that can be added in MD code called lightweight context switches. Since an interrupt thread executes in a kernel context, it can borrow the vmspace of any process. Thus, in a lightweight context switch, the switch to the interrupt thread does not switch vmspaces but borrows the vmspace of the interrupted thread. In order to ensure that the vmspace of the interrupted thread does not disappear out from under us, the interrupted thread is not allowed to execute until the interrupt thread is no longer borrowing its vmspace. This can happen when the interrupt thread either blocks or finishes. If an interrupt thread blocks, then it will use its own context when it is made runnable again. Thus, it can release the interrupted thread. The cons of this optimization are that it is very machine specific and complex and thus only worth the effort if there is a large performance improvement. At this point it is probably too early to tell, and in fact, it will probably hurt performance as almost all interrupt handlers will immediately block on Giant and require a thread fix-up when they block. Also, an alternative method of interrupt handling has been proposed by Mike Smith that works like so: Each interrupt handler has two parts: a predicate which runs in primary interrupt context and a handler which runs in its own thread context. If an interrupt handler has a predicate, then when an interrupt is triggered, the predicate is run. If the predicate returns true then the interrupt is assumed to be fully handled and the kernel returns from the interrupt. If the predicate returns false or there is no predicate, then the threaded handler is scheduled to run. Fitting light weight context switches into this scheme might prove rather complicated.
Since we may want to change to this scheme at some point in the future, it is probably best to defer work on light weight context switches until we have settled on the final interrupt handling architecture and determined how light weight context switches might or might not fit into it. Kernel Preemption and Critical Sections Kernel Preemption in a Nutshell Kernel preemption is fairly simple. The basic idea is that a CPU should always be doing the highest priority work available. Well, that is the ideal at least. There are a couple of cases where the expense of achieving the ideal is not worth being perfect. Implementing full kernel preemption is very straightforward: when you schedule a thread to be executed by putting it on a runqueue, you check to see if its priority is higher than the currently executing thread. If so, you initiate a context switch to that thread. While locks can protect most data in the case of a preemption, not all of the kernel is preemption safe. For example, if a thread holding a spin mutex is preempted and the new thread attempts to grab the same spin mutex, the new thread may spin forever as the interrupted thread may never get a chance to execute. Also, some code such as the code to assign an address space number for a process during exec() on the Alpha needs to not be preempted as it supports the actual context switch code. Preemption is disabled for these code sections by using a critical section. Critical Sections The responsibility of the critical section API is to prevent context switches inside of a critical section. With a fully preemptive kernel, every setrunqueue of a thread other than the current thread is a preemption point. One implementation is for critical_enter to set a per-thread flag that is cleared by its counterpart. If setrunqueue is called with this flag set, it does not preempt regardless of the priority of the new thread relative to the current thread.
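The single-flag scheme just described can be sketched as follows. This is an illustration, not kernel code: thread_sketch, in_critical, should_preempt and the _sketch function names are hypothetical, and the sketch assumes that a numerically lower priority value means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>

struct thread_sketch {
	int priority;		/* assumed: lower value = higher priority */
	bool in_critical;	/* the per-thread flag */
};

static void
critical_enter_sketch(struct thread_sketch *td)
{
	td->in_critical = true;
}

static void
critical_exit_sketch(struct thread_sketch *td)
{
	assert(td->in_critical);
	td->in_critical = false;
}

/* The decision a setrunqueue-style function would make for newtd. */
static bool
should_preempt(const struct thread_sketch *curtd,
    const struct thread_sketch *newtd)
{
	if (curtd->in_critical)
		return (false);	/* never switch inside a critical section */
	return (newtd->priority < curtd->priority);
}
```

Because the flag is a plain boolean, an inner enter/exit pair would clear it while an outer section is still active.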
However, since critical sections are used in spin mutexes to prevent context switches and multiple spin mutexes can be acquired, the critical section API must support nesting. For this reason the current implementation uses a nesting count instead of a single per-thread flag. In order to minimize latency, preemptions inside of a critical section are deferred rather than dropped. If a thread is made runnable that would normally be preempted to outside of a critical section, then a per-thread flag is set to indicate that there is a pending preemption. When the outermost critical section is exited, the flag is checked. If the flag is set, then the current thread is preempted to allow the higher priority thread to run. Interrupts pose a problem with regards to spin mutexes. If a low-level interrupt handler needs a lock, it needs to not interrupt any code needing that lock to avoid possible data structure corruption. Currently, providing this mechanism is piggybacked onto critical section API by means of the cpu_critical_enter and cpu_critical_exit functions. Currently this API disables and re-enables interrupts on all of FreeBSD's current platforms. This approach may not be purely optimal, but it is simple to understand and simple to get right. Theoretically, this second API need only be used for spin mutexes that are used in primary interrupt context. However, to make the code simpler, it is used for all spin mutexes and even all critical sections. It may be desirable to split out the MD API from the MI API and only use it in conjunction with the MI API in the spin mutex implementation. If this approach is taken, then the MD API likely would need a rename to show that it is a separate API now. Design Tradeoffs As mentioned earlier, a couple of trade-offs have been made to sacrifice cases where perfect preemption may not always provide the best performance. The first trade-off is that the preemption code does not take other CPUs into account. 
Suppose we have two CPUs, A and B, with the priority of A's thread as 4 and the priority of B's thread as 2. If CPU B makes a thread with priority 1 runnable, then in theory, we want CPU A to switch to the new thread so that we will be running the two highest priority runnable threads. However, the cost of determining which CPU to enforce a preemption on as well as actually signaling that CPU via an IPI along with the synchronization that would be required would be enormous. Thus, the current code would instead force CPU B to switch to the higher priority thread. Note that this still puts the system in a better position as CPU B is executing a thread of priority 1 rather than a thread of priority 2. The second trade-off limits immediate kernel preemption to real-time priority kernel threads. In the simple case of preemption defined above, a thread is always preempted immediately (or as soon as a critical section is exited) if a higher priority thread is made runnable. However, many threads executing in the kernel only execute in a kernel context for a short time before either blocking or returning to userland. Thus, if the kernel preempts these threads to run another non-realtime kernel thread, the kernel may switch out the executing thread just before it is about to sleep or execute. The cache on the CPU must then adjust to the new thread. When the kernel returns to the interrupted CPU, it must refill all the cache information that was lost. In addition, two extra context switches are performed that could be avoided if the kernel deferred the preemption until the first thread blocked or returned to userland. Thus, by default, the preemption code will only preempt immediately if the higher priority thread is a real-time priority thread. Turning on full kernel preemption for all kernel threads has value as a debugging aid since it exposes more race conditions. It is especially useful on UP systems where many races are hard to simulate otherwise.
Thus, there will be a kernel option to enable preemption for all kernel threads that can be used for debugging purposes. Thread Migration Simply put, a thread migrates when it moves from one CPU to another. In a non-preemptive kernel this can only happen at well-defined points such as when calling tsleep or returning to userland. However, in the preemptive kernel, an interrupt can force a preemption and possible migration at any time. This can have negative effects on per-CPU data since, with the exception of curthread and curpcb, the data can change whenever you migrate. Since you can potentially migrate at any time, this renders per-CPU data rather useless. Thus it is desirable to be able to disable migration for sections of code that need per-CPU data to be stable. Critical sections currently prevent migration since they do not allow context switches. However, this may be too strong of a requirement to enforce in some cases since a critical section also effectively blocks interrupt threads on the current processor. As a result, it may be desirable to provide an API whereby code may indicate that if the current thread is preempted it should not migrate to another CPU. One possible implementation is to use a per-thread nesting count td_pinnest along with a td_pincpu which is updated to the current CPU on each context switch. Each CPU has its own run queue that holds threads pinned to that CPU. A thread is pinned when its nesting count is greater than zero and a thread starts off unpinned with a nesting count of zero. When a thread is put on a runqueue, we check to see if it is pinned. If so, we put it on the per-CPU runqueue, otherwise we put it on the global runqueue. When choosethread is called to retrieve the next thread, it could either always prefer bound threads to unbound threads or use some sort of bias when comparing priorities.
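The proposed pinning scheme can be sketched in a few lines of C. The fields td_pinnest and td_pincpu come from the text; everything else (pinned_thread_sketch, pin_thread, unpin_thread, pick_runq and the runq_kind enum) is hypothetical, and the run queues themselves are reduced to an enum value that says which queue would be chosen.

```c
#include <assert.h>

struct pinned_thread_sketch {
	int td_pinnest;	/* nesting count: > 0 means pinned */
	int td_pincpu;	/* CPU the thread last ran on */
};

enum runq_kind { RUNQ_GLOBAL, RUNQ_PERCPU };

/* Increase the nesting count; the thread stays pinned until it reaches 0. */
static void
pin_thread(struct pinned_thread_sketch *td)
{
	td->td_pinnest++;
}

static void
unpin_thread(struct pinned_thread_sketch *td)
{
	assert(td->td_pinnest > 0);
	td->td_pinnest--;
}

/* Pinned threads go on the queue of td_pincpu, all others on the global one. */
static enum runq_kind
pick_runq(const struct pinned_thread_sketch *td)
{
	return (td->td_pinnest > 0 ? RUNQ_PERCPU : RUNQ_GLOBAL);
}
```

Using a count rather than a flag lets nested code sections pin and unpin independently without unpinning the thread too early.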
If the nesting count is only ever written to by the thread itself and is only read by other threads when the owning thread is not executing but while holding the sched_lock, then td_pinnest will not need any other locks. The migrate_disable function would increment the nesting count and migrate_enable would decrement the nesting count. Due to the locking requirements specified above, they will only operate on the current thread and thus would not need to handle the case of making a thread migrateable that currently resides on a per-CPU run queue. It is still debatable if this API is needed or if the critical section API is sufficient by itself. Many of the places that need to prevent migration also need to prevent preemption as well, and in those places a critical section must be used regardless. Callouts The timeout() kernel facility permits kernel services to register functions for execution as part of the softclock() software interrupt. Events are scheduled based on a desired number of clock ticks, and callbacks to the consumer-provided function will occur at approximately the right time. The global list of pending timeout events is protected by a global spin mutex, callout_lock; all access to the timeout list must be performed with this mutex held. When softclock() is woken up, it scans the list of pending timeouts for those that should fire. In order to avoid lock order reversal, the softclock thread will release the callout_lock mutex when invoking the provided timeout() callback function. If the CALLOUT_MPSAFE flag was not set during registration, then Giant will be grabbed before invoking the callout, and then released afterwards. The callout_lock mutex will be re-grabbed before proceeding. The softclock() code is careful to leave the list in a consistent state while releasing the mutex. If DIAGNOSTIC is enabled, then the time taken to execute each function is measured, and a warning generated if it exceeds a threshold. 
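The way softclock() drops callout_lock around each callback can be sketched in userland C. This is a simplified model, not the kernel code: the lock is a boolean that only records whether it is held, Giant and CALLOUT_MPSAFE are left out, and callout_sketch, softclock_sketch and count_fire are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct callout_sketch {
	int ticks;			/* tick at which to fire */
	void (*func)(void *);
	void *arg;
	struct callout_sketch *next;
};

static bool callout_lock_held;		/* stand-in for the spin mutex */
static struct callout_sketch *pending;	/* protected by the lock */

static void lock_callouts(void) { assert(!callout_lock_held); callout_lock_held = true; }
static void unlock_callouts(void) { assert(callout_lock_held); callout_lock_held = false; }

/* Example callback: checks the lock was dropped, then counts the call. */
static void
count_fire(void *arg)
{
	assert(!callout_lock_held);
	(*(int *)arg)++;
}

static void
softclock_sketch(int now)
{
	struct callout_sketch **pp, *c;

	lock_callouts();
	pp = &pending;
	while ((c = *pp) != NULL) {
		if (c->ticks <= now) {
			*pp = c->next;	/* unlink first: list stays consistent */
			unlock_callouts();
			c->func(c->arg);
			lock_callouts();
			pp = &pending;	/* list may have changed: rescan */
		} else
			pp = &c->next;
	}
	unlock_callouts();
}
```

Unlinking the entry before releasing the lock is what keeps the list consistent while the callback runs.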
- + Specific Locking Strategies Credentials struct ucred is the kernel's internal credential structure, and is generally used as the basis for process-driven access control within the kernel. BSD-derived systems use a copy-on-write model for credential data: multiple references may exist for a credential structure, and when a change needs to be made, the structure is duplicated, modified, and then the reference replaced. Due to wide-spread caching of the credential to implement access control on open, this results in substantial memory savings. With a move to fine-grained SMP, this model also saves substantially on locking operations by requiring that modification only occur on an unshared credential, avoiding the need for explicit synchronization when consuming a known-shared credential. Credential structures with a single reference are considered mutable; shared credential structures must not be modified or a race condition is risked. A mutex, cr_mtxp, protects the reference count of struct ucred so as to maintain consistency. Any use of the structure requires a valid reference for the duration of the use, or the structure may be released out from under the illegitimate consumer. The struct ucred mutex is a leaf mutex, and for performance reasons, is implemented via a mutex pool. Usually, credentials are used in a read-only manner for access control decisions, and in this case td_ucred is generally preferred because it requires no locking. When a process' credential is updated, the proc lock must be held across the check and update operations to avoid races. The process credential p_ucred must be used for check and update operations to prevent time-of-check, time-of-use races. If system call invocations will perform access control after an update to the process credential, the value of td_ucred must also be refreshed to the current process value. This will prevent use of a stale credential following a change.
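The copy-on-write credential model can be sketched in userland C. This is a single-threaded illustration only: the reference count is not protected by a mutex as it is for struct ucred, and cred_sketch, cred_new, cred_hold, cred_free and cred_writable are hypothetical names, not the kernel's credential API.

```c
#include <assert.h>
#include <stdlib.h>

struct cred_sketch {
	int ref;	/* reference count (unlocked in this sketch) */
	unsigned uid;
};

static struct cred_sketch *
cred_new(unsigned uid)
{
	struct cred_sketch *c = malloc(sizeof(*c));

	if (c != NULL) {
		c->ref = 1;
		c->uid = uid;
	}
	return (c);
}

static struct cred_sketch *
cred_hold(struct cred_sketch *c)
{
	c->ref++;
	return (c);
}

static void
cred_free(struct cred_sketch *c)
{
	assert(c->ref > 0);
	if (--c->ref == 0)
		free(c);
}

/*
 * Return a credential that is safe to modify.  An unshared credential is
 * returned as-is; a shared one is duplicated and the caller's reference to
 * the shared copy is dropped.  Returns NULL if the duplicate cannot be made.
 */
static struct cred_sketch *
cred_writable(struct cred_sketch *c)
{
	struct cred_sketch *n;

	if (c->ref == 1)
		return (c);
	n = cred_new(c->uid);
	if (n == NULL)
		return (NULL);
	cred_free(c);
	return (n);
}
```

Other holders of the shared credential keep seeing the old values, which is why they can read it without taking any lock.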
The kernel automatically refreshes the td_ucred pointer in the thread structure from the process p_ucred whenever a process enters the kernel, permitting use of a fresh credential for kernel access control. File Descriptors and File Descriptor Tables Details to follow. Jail Structures struct prison stores administrative details pertinent to the maintenance of jails created using the &man.jail.2; API. This includes the per-jail hostname, IP address, and related settings. This structure is reference-counted since pointers to instances of the structure are shared by many credential structures. A single mutex, pr_mtx, protects read and write access to the reference count and all mutable variables inside the struct prison. Some variables are set only when the jail is created, and a valid reference to the struct prison is sufficient to read these values. The precise locking of each entry is documented via comments in sys/jail.h. MAC Framework The TrustedBSD MAC Framework maintains data in a variety of kernel objects, in the form of struct label. In general, labels in kernel objects are protected by the same lock as the remainder of the kernel object. For example, the v_label label in struct vnode is protected by the vnode lock on the vnode. In addition to labels maintained in standard kernel objects, the MAC Framework also maintains a list of registered and active policies. The policy list is protected by a global mutex (mac_policy_list_lock) and a busy count (also protected by the mutex). Since many access control checks may occur in parallel, entry to the framework for a read-only access to the policy list requires holding the mutex while incrementing (and later decrementing) the busy count. The mutex need not be held for the duration of the MAC entry operation--some operations, such as label operations on file system objects--are long-lived.
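The read-side use of the busy count can be sketched as follows. This is not the MAC Framework code: the lock is a toy spin lock built on C11 atomic_flag standing in for mac_policy_list_lock, and policy_list_busy, policy_list_unbusy and policy_busy_count are hypothetical names. The lock is held only while the count is adjusted, not for the duration of the entry into the framework.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag policy_list_lock = ATOMIC_FLAG_INIT;	/* toy lock */
static int policy_busy_count;	/* protected by policy_list_lock */

static void
list_lock(void)
{
	while (atomic_flag_test_and_set(&policy_list_lock))
		;	/* spin */
}

static void
list_unlock(void)
{
	atomic_flag_clear(&policy_list_lock);
}

/* Enter the framework for read-only access to the policy list. */
static void
policy_list_busy(void)
{
	list_lock();
	policy_busy_count++;
	list_unlock();
	/* Long-lived work proceeds here without the lock held. */
}

/* Leave the framework. */
static void
policy_list_unbusy(void)
{
	list_lock();
	assert(policy_busy_count > 0);
	policy_busy_count--;
	list_unlock();
}
```

Several readers can be inside the framework at once; each one only serializes briefly on the count.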
To modify the policy list, such as during policy registration and de-registration, the mutex must be held and the busy count must be zero, to prevent modification of the list while it is in use. A condition variable, mac_policy_list_not_busy, is available to threads that need to wait for the list to become unbusy, but this condition variable must only be waited on if the caller is holding no other locks, or a lock order violation may be possible. The busy count, in effect, acts as a form of shared/exclusive lock over access to the framework: the difference is that, unlike with an sx lock, consumers waiting for the list to become unbusy may be starved, rather than permitting lock order problems with regard to the busy count and other locks that may be held on entry to (or inside) the MAC Framework. Modules For the module subsystem there exists a single lock that is used to protect the shared data. This lock is a shared/exclusive (SX) lock and has a good chance of needing to be acquired (shared or exclusively), therefore a few macros have been added to make access to the lock easier. These macros can be found in sys/module.h and are quite basic in terms of usage. The main structures protected under this lock are the module_t structures (when shared) and the global modulelist_t structure, modules. One should review the related source code in kern/kern_module.c to further understand the locking strategy. Newbus Device Tree The newbus system will have one sx lock. Readers will hold a shared (read) lock (&man.sx.slock.9;) and writers will hold an exclusive (write) lock (&man.sx.xlock.9;). Internal functions will not do locking at all. Externally visible ones will lock as needed. Those items that do not matter if the race is won or lost will not be locked, since they tend to be read all over the place (e.g. &man.device.get.softc.9;).
There will be relatively few changes to the newbus data structures, so a single lock should be sufficient and not impose a performance penalty. Pipes ... Processes and Threads - process hierarchy - proc locks, references - thread-specific copies of proc entries to freeze during system calls, including td_ucred - inter-process operations - process groups and sessions Scheduler Lots of references to sched_lock and notes pointing at specific primitives and related magic elsewhere in the document. Select and Poll The select() and poll() functions permit threads to block waiting on events on file descriptors--most frequently, whether or not the file descriptors are readable or writable. ... SIGIO The SIGIO service permits a process to request the delivery of a SIGIO signal to its process group when the read/write status of specified file descriptors changes. At most one process or process group is permitted to register for SIGIO from any given kernel object, and that process or group is referred to as the owner. Each object supporting SIGIO registration contains a pointer field that is NULL if the object is not registered, or points to a struct sigio describing the registration. This field is protected by a global mutex, sigio_lock. Callers to SIGIO maintenance functions must pass in this field by reference so that local register copies of the field are not made when unprotected by the lock. One struct sigio is allocated for each registered object associated with any process or process group, and contains back-pointers to the object, owner, signal information, a credential, and the general disposition of the registration. Each process or process group contains a list of registered struct sigio structures, p_sigiolst for processes, and pg_sigiolst for process groups. These lists are protected by the process or process group locks respectively.
Most fields in each struct sigio are constant for the duration of the registration, with the exception of the sio_pgsigio field which links the struct sigio into the process or process group list. Developers implementing new kernel objects supporting SIGIO will, in general, want to avoid holding structure locks while invoking SIGIO supporting functions, such as fsetown() or funsetown() to avoid defining a lock order between structure locks and the global SIGIO lock. This is generally possible through use of an elevated reference count on the structure, such as reliance on a file descriptor reference to a pipe during a pipe operation. Sysctl The sysctl() MIB service is invoked from both within the kernel and from userland applications using a system call. At least two issues are raised in locking: first, the protection of the structures maintaining the namespace, and second, interactions with kernel variables and functions that are accessed by the sysctl interface. Since sysctl permits the direct export (and modification) of kernel statistics and configuration parameters, the sysctl mechanism must become aware of appropriate locking semantics for those variables. Currently, sysctl makes use of a single global sx lock to serialize use of sysctl(); however, it is assumed to operate under Giant and other protections are not provided. The remainder of this section speculates on locking and semantic changes to sysctl. - Need to change the order of operations for sysctl's that update values from read old, copyin and copyout, write new to copyin, lock, read old and write new, unlock, copyout. Normal sysctl's that just copyout the old value and set a new value that they copyin may still be able to follow the old model. However, it may be cleaner to use the second model for all of the sysctl handlers to avoid lock operations. - To allow for the common case, a sysctl could embed a pointer to a mutex in the SYSCTL_FOO macros and in the struct. 
This would work for most sysctls. For values protected by sx locks, spin mutexes, or other locking strategies besides a single sleep mutex, SYSCTL_PROC nodes could be used to get the locking right. Taskqueue The taskqueue interface has two basic locks associated with it in order to protect the related shared data. The taskqueue_queues_mutex is meant to serve as a lock to protect the taskqueue_queues TAILQ. The other mutex lock associated with this system is the one in the struct taskqueue data structure. The use of the synchronization primitive here is to protect the integrity of the data in the struct taskqueue. It should be noted that there are no separate macros to assist users in locking down their own work since these locks are most likely not going to be used outside of kern/subr_taskqueue.c. - + Implementation Notes Details of the Mutex Implementation - Should we require mutexes to be owned for mtx_destroy() since we cannot safely assert that they are unowned by anyone else otherwise? Spin Mutexes - Use a critical section... Sleep Mutexes - Describe the races with contested mutexes - Why it is safe to read mtx_lock of a contested mutex when holding sched_lock. - Priority propagation Witness - What does it do - How does it work - + Miscellaneous Topics Interrupt Source and ICU Abstractions - struct isrc - pic drivers Other Random Questions/Topics Should we pass an interlock into sema_wait? - Generic turnstiles for sleep mutexes and sx locks. - Should we have non-sleepable sx locks? - + Glossary - + atomic An operation is atomic if all of its effects are visible to other CPUs together when the proper access protocol is followed. The degenerate case is the atomic instructions provided directly by machine architectures. At a higher level, if several members of a structure are protected by a lock, then a set of operations is atomic if they are all performed while holding the lock without releasing the lock in between any of the operations.
operation - + block A thread is blocked when it is waiting on a lock, resource, or condition. Unfortunately this term is a bit overloaded as a result. sleep - + critical section A section of code that is not allowed to be preempted. A critical section is entered and exited using the &man.critical.enter.9; API. - + MD Machine dependent. MI - + memory operation A memory operation reads and/or writes to a memory location. - + MI Machine independent. MD - + operation memory operation - + primary interrupt context Primary interrupt context refers to the code that runs when an interrupt occurs. This code can either run an interrupt handler directly or schedule an asynchronous interrupt thread to execute the interrupt handlers for a given interrupt source. realtime kernel thread A high priority kernel thread. Currently, the only realtime priority kernel threads are interrupt threads. thread - + sleep A thread is asleep when it is blocked on a condition variable or a sleep queue via msleep or tsleep. block - + sleepable lock A sleepable lock is a lock that can be held by a thread which is asleep. Lockmgr locks and sx locks are currently the only sleepable locks in FreeBSD. Eventually, some sx locks such as the allproc and proctree locks may become non-sleepable locks. sleep - + thread A kernel thread represented by a struct thread. Threads own locks and hold a single execution context.