Index: head/lib/libc/locale/setlocale.3 =================================================================== --- head/lib/libc/locale/setlocale.3 (revision 368816) +++ head/lib/libc/locale/setlocale.3 (revision 368817) @@ -1,211 +1,211 @@ .\" Copyright (c) 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" This code is derived from software contributed to Berkeley by .\" Donn Seeley at BSDI. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" @(#)setlocale.3 8.1 (Berkeley) 6/9/93 .\" $FreeBSD$ .\" .Dd August 7, 2020 .Dt SETLOCALE 3 .Os .Sh NAME .Nm setlocale .Nd natural language formatting for C .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In locale.h .Ft char * .Fn setlocale "int category" "const char *locale" .Sh DESCRIPTION The .Fn setlocale function sets the C library's notion of natural language formatting style for particular sets of routines. Each such style is called a .Sq locale and is invoked using an appropriate name passed as a C string. .Pp The .Fn setlocale function recognizes several categories of routines. These are the categories and the sets of routines they select: .Bl -tag -width LC_MONETARY .It Dv LC_ALL Set the entire locale generically. .It Dv LC_COLLATE Set a locale for string collation routines. This controls alphabetic ordering in .Fn strcoll and .Fn strxfrm . .It Dv LC_CTYPE Set a locale for the .Xr ctype 3 and .Xr multibyte 3 functions. This controls recognition of upper and lower case, alphabetic or non-alphabetic characters, and so on. .It Dv LC_MESSAGES Set a locale for message catalogs, see .Xr catopen 3 function. .It Dv LC_MONETARY Set a locale for formatting monetary values; this affects the .Fn localeconv function. .It Dv LC_NUMERIC Set a locale for formatting numbers. This controls the formatting of decimal points in input and output of floating point numbers in functions such as .Fn printf and .Fn scanf , as well as values returned by .Fn localeconv . .It Dv LC_TIME Set a locale for formatting dates and times using the .Fn strftime function. .It Dv LANG Sets the generic locale category for native language, local customs and coded character set in the absence of more specific locale variables. .El .Pp Only three locales are defined by default, the empty string .Li \&"\|" which denotes the native environment, and the .Li \&"C" and .Li \&"POSIX" locales, which denote the C language environment. 
A .Fa locale argument of .Dv NULL causes .Fn setlocale to return the current locale. .Pp The option .Fl a to the .Xr locale 1 command can be used to display all further possible names for the .Fa locale argument that are recognized. Specifying any unrecognized value for .Fa locale makes .Fn setlocale fail. .Pp By default, C programs start in the .Li \&"C" locale. .Pp The only function in the library that sets the locale is .Fn setlocale ; the locale is never changed as a side effect of some other routine. .Sh RETURN VALUES Upon successful completion, .Fn setlocale returns the string associated with the specified .Fa category for the requested .Fa locale . The .Fn setlocale function returns .Dv NULL and fails to change the locale if the given combination of .Fa category and .Fa locale makes no sense. +.Sh FILES +.Bl -tag -width /usr/share/locale/locale/category -compact +.It Pa $PATH_LOCALE/ Ns Em locale/category +.It Pa /usr/share/locale/ Ns Em locale/category +locale file for the locale +.Em locale +and the category +.Em category . +.El .Sh EXAMPLES The following code illustrates how a program can initialize the international environment for one language, while selectively modifying the program's locale such that regular expressions and string operations can be applied to text recorded in a different language: .Bd -literal setlocale(LC_ALL, "de"); setlocale(LC_COLLATE, "fr"); .Ed .Pp When a process is started, its current locale is set to the C or POSIX locale. An internationalized program that depends on locale data not defined in the C or POSIX locale must invoke the setlocale subroutine in the following manner before using any of the locale-specific information: .Bd -literal setlocale(LC_ALL, ""); .Ed -.Sh FILES -.Bl -tag -width /usr/share/locale/locale/category -compact -.It Pa $PATH_LOCALE/ Ns Em locale/category -.It Pa /usr/share/locale/ Ns Em locale/category -locale file for the locale -.Em locale -and the category -.Em category . 
-.El .Sh ERRORS No errors are defined. .Sh SEE ALSO .Xr locale 1 , .Xr localedef 1 , .Xr catopen 3 , .Xr ctype 3 , .Xr localeconv 3 , .Xr multibyte 3 , .Xr strcoll 3 , .Xr strxfrm 3 , .Xr euc 5 , .Xr utf8 5 , .Xr environ 7 .Sh STANDARDS The .Fn setlocale function conforms to .St -isoC-99 . .Sh HISTORY The .Fn setlocale function first appeared in .Bx 4.4 . Index: head/lib/libc/net/gethostbyname.3 =================================================================== --- head/lib/libc/net/gethostbyname.3 (revision 368816) +++ head/lib/libc/net/gethostbyname.3 (revision 368817) @@ -1,376 +1,376 @@ .\" Copyright (c) 1983, 1987, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" From: @(#)gethostbyname.3 8.4 (Berkeley) 5/25/95 .\" $FreeBSD$ .\" .Dd October 4, 2017 .Dt GETHOSTBYNAME 3 .Os .Sh NAME .Nm gethostbyname , .Nm gethostbyname2 , .Nm gethostbyaddr , .Nm gethostent , .Nm sethostent , .Nm endhostent , .Nm herror , .Nm hstrerror .Nd get network host entry .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In netdb.h .Vt int h_errno ; .Ft struct hostent * .Fn gethostbyname "const char *name" .Ft struct hostent * .Fn gethostbyname2 "const char *name" "int af" .Ft struct hostent * .Fn gethostbyaddr "const void *addr" "socklen_t len" "int af" .Ft struct hostent * .Fn gethostent void .Ft void .Fn sethostent "int stayopen" .Ft void .Fn endhostent void .Ft void .Fn herror "const char *string" .Ft const char * .Fn hstrerror "int err" .Sh DESCRIPTION .Bf -symbolic The .Xr getaddrinfo 3 and .Xr getnameinfo 3 functions are preferred over the .Fn gethostbyname , .Fn gethostbyname2 , and .Fn gethostbyaddr functions. .Ef .Pp The .Fn gethostbyname , .Fn gethostbyname2 and .Fn gethostbyaddr functions each return a pointer to an object with the following structure describing an internet host referenced by name or by address, respectively. .Pp The .Fa name argument passed to .Fn gethostbyname or .Fn gethostbyname2 should point to a .Dv NUL Ns -terminated hostname. 
The .Fa addr argument passed to .Fn gethostbyaddr should point to an address which is .Fa len bytes long, in binary form (i.e., not an IP address in human readable .Tn ASCII form). The .Fa af argument specifies the address family (e.g.\& .Dv AF_INET , AF_INET6 , etc.) of this address. .Pp The structure returned contains either the information obtained from the name server, .Xr named 8 , broken-out fields from a line in .Pa /etc/hosts , or database entries supplied by the .Xr yp 8 system. The order of the lookups is controlled by the .Sq hosts entry in .Xr nsswitch.conf 5 . .Bd -literal struct hostent { char *h_name; /* official name of host */ char **h_aliases; /* alias list */ int h_addrtype; /* host address type */ int h_length; /* length of address */ char **h_addr_list; /* list of addresses from name server */ }; #define h_addr h_addr_list[0] /* address, for backward compatibility */ .Ed .Pp The members of this structure are: .Bl -tag -width h_addr_list .It Va h_name Official name of the host. .It Va h_aliases A .Dv NULL Ns -terminated array of alternate names for the host. .It Va h_addrtype The type of address being returned; usually .Dv AF_INET . .It Va h_length The length, in bytes, of the address. .It Va h_addr_list A .Dv NULL Ns -terminated array of network addresses for the host. Host addresses are returned in network byte order. .It Va h_addr The first address in .Va h_addr_list ; this is for backward compatibility. .El .Pp When using the nameserver, .Fn gethostbyname and .Fn gethostbyname2 will search for the named host in the current domain and its parents unless the name ends in a dot. If the name contains no dot, and if the environment variable .Dq Ev HOSTALIASES contains the name of an alias file, the alias file will first be searched for an alias matching the input name. See .Xr hostname 7 for the domain search procedure and the alias file format. 
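The members of struct hostent described above can be walked as in the following minimal sketch. The helper name print_host_addrs is hypothetical and is not part of libc; the sketch assumes an AF_INET result and uses inet_ntop(3) to render each entry of h_addr_list.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>

/*
 * print_host_addrs() is a hypothetical helper, not a libc function.
 * It prints the official name and every AF_INET address of a host and
 * returns the number of addresses printed, or -1 if the lookup failed.
 */
int
print_host_addrs(const char *name)
{
	struct hostent *hp;
	char buf[INET_ADDRSTRLEN];
	char **ap;
	int n = 0;

	if ((hp = gethostbyname(name)) == NULL)
		return (-1);
	/* This sketch only understands IPv4 results. */
	if (hp->h_addrtype != AF_INET)
		return (0);

	printf("official name: %s\n", hp->h_name);
	/* h_addr_list is a NULL-terminated array of binary addresses. */
	for (ap = hp->h_addr_list; *ap != NULL; ap++) {
		/* Entries are in network byte order, as inet_ntop() expects. */
		if (inet_ntop(AF_INET, *ap, buf, sizeof(buf)) != NULL) {
			printf("address: %s\n", buf);
			n++;
		}
	}
	return (n);
}
```

Calling print_host_addrs("127.0.0.1") should print a single address, since a dotted-quad name is normally converted without consulting the name server.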
.Pp The .Fn gethostbyname2 function is an evolution of .Fn gethostbyname which is intended to allow lookups in address families other than .Dv AF_INET , for example .Dv AF_INET6 . .Pp The .Fn sethostent function may be used to request the use of a connected .Tn TCP socket for queries. Queries will by default use .Tn UDP datagrams. If the .Fa stayopen flag is non-zero, a .Tn TCP connection to the name server will be used. It will remain open after calls to .Fn gethostbyname , .Fn gethostbyname2 or .Fn gethostbyaddr have completed. .Pp The .Fn endhostent function closes the .Tn TCP connection. .Pp The .Fn herror function writes a message to the diagnostic output consisting of the string argument .Fa string , the constant string .Qq Li ":\ " , and a message corresponding to the value of .Va h_errno . .Pp The .Fn hstrerror function returns a string which is the message text corresponding to the value of the .Fa err argument. .Sh FILES .Bl -tag -width /etc/nsswitch.conf -compact .It Pa /etc/hosts .It Pa /etc/nsswitch.conf .It Pa /etc/resolv.conf .El .Sh EXAMPLES Print out the hostname associated with a specific IP address: .Bd -literal -offset indent const char *ipstr = "127.0.0.1"; struct in_addr ip; struct hostent *hp; if (!inet_aton(ipstr, &ip)) errx(1, "can't parse IP address %s", ipstr); if ((hp = gethostbyaddr((const void *)&ip, sizeof ip, AF_INET)) == NULL) errx(1, "no name associated with %s", ipstr); printf("name associated with %s is %s\en", ipstr, hp->h_name); .Ed .Sh DIAGNOSTICS Error return status from .Fn gethostbyname , .Fn gethostbyname2 and .Fn gethostbyaddr is indicated by return of a .Dv NULL pointer. The integer .Va h_errno may then be checked to see whether this is a temporary failure or an invalid or unknown host. The routine .Fn herror can be used to print an error message describing the failure. If its argument .Fa string is .Pf non- Dv NULL , it is printed, followed by a colon and a space. The error message is printed with a trailing newline. 
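The error reporting described above might be used as in the following sketch. The helper name lookup_or_report is hypothetical; the sketch only assumes the documented behaviour that a NULL return sets h_errno and that herror() prints its argument, a colon and a space, and the message.

```c
#include <netdb.h>
#include <stdio.h>

/*
 * lookup_or_report() is a hypothetical helper, not a libc function.
 * It returns 0 if the host was found and -1 otherwise, after writing
 * a diagnostic for the failure.
 */
int
lookup_or_report(const char *name)
{
	int err;

	if (gethostbyname(name) != NULL)
		return (0);

	/* Save h_errno first; later library calls may change it. */
	err = h_errno;

	/* Writes "<name>: <message>" and a newline to the diagnostic output. */
	herror(name);

	/* TRY_AGAIN is the only value documented as usually temporary. */
	if (err == TRY_AGAIN)
		fprintf(stderr, "%s: a later retry may succeed\n", name);
	else
		fprintf(stderr, "%s: %s (not retrying)\n", name, hstrerror(err));
	return (-1);
}
```

A caller that wants to retry would typically loop only while the saved value is TRY_AGAIN.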
.Pp The variable .Va h_errno can have the following values: .Bl -tag -width HOST_NOT_FOUND .It Dv HOST_NOT_FOUND No such host is known. .It Dv TRY_AGAIN This is usually a temporary error and means that the local server did not receive a response from an authoritative server. A retry at some later time may succeed. .It Dv NO_RECOVERY Some unexpected server failure was encountered. This is a non-recoverable error. .It Dv NO_DATA The requested name is valid but does not have an IP address; this is not a temporary error. This means that the name is known to the name server but there is no address associated with this name. Another type of request to the name server using this domain name will result in an answer; for example, a mail-forwarder may be registered for this domain. .El .Sh SEE ALSO .Xr getaddrinfo 3 , .Xr getnameinfo 3 , .Xr inet_aton 3 , .Xr resolver 3 , .Xr hosts 5 , .Xr hostname 7 , .Xr named 8 -.Sh CAVEAT +.Sh HISTORY The +.Fn herror +function appeared in +.Bx 4.3 . +The +.Fn endhostent , +.Fn gethostbyaddr , +.Fn gethostbyname , +.Fn gethostent , +and +.Fn sethostent +functions appeared in +.Bx 4.2 . +The +.Fn gethostbyname2 +function first appeared in +.Tn BIND +version 4.9.4. +.Sh CAVEATS +The .Fn gethostent function is defined, and .Fn sethostent and .Fn endhostent are redefined, when .Lb libc is built to use only the routines to lookup in .Pa /etc/hosts and not the name server. .Pp The .Fn gethostent function reads the next line of .Pa /etc/hosts , opening the file if necessary. .Pp The .Fn sethostent function opens and/or rewinds the file .Pa /etc/hosts . If the .Fa stayopen argument is non-zero, the file will not be closed after each call to .Fn gethostbyname , .Fn gethostbyname2 or .Fn gethostbyaddr . .Pp The .Fn endhostent function closes the file. -.Sh HISTORY -The -.Fn herror -function appeared in -.Bx 4.3 . 
-The -.Fn endhostent , -.Fn gethostbyaddr , -.Fn gethostbyname , -.Fn gethostent , -and -.Fn sethostent -functions appeared in -.Bx 4.2 . -The -.Fn gethostbyname2 -function first appeared in -.Tn BIND -version 4.9.4. .Sh BUGS These functions use a thread-specific data storage; if the data is needed for future use, it should be copied before any subsequent calls overwrite it. .Pp Though these functions are thread-safe, still it is recommended to use the .Xr getaddrinfo 3 family of functions, instead. .Pp Only the Internet address format is currently understood. Index: head/lib/libc/stdlib/system.3 =================================================================== --- head/lib/libc/stdlib/system.3 (revision 368816) +++ head/lib/libc/stdlib/system.3 (revision 368817) @@ -1,111 +1,111 @@ .\" Copyright (c) 1990, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" This code is derived from software contributed to Berkeley by .\" the American National Standards Committee X3, on Information .\" Processing Systems. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. 
.\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)system.3 8.1 (Berkeley) 6/4/93 .\" $FreeBSD$ .\" .Dd July 25, 2015 .Dt SYSTEM 3 .Os .Sh NAME .Nm system .Nd pass a command to the shell .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In stdlib.h .Ft int .Fn system "const char *string" .Sh DESCRIPTION The .Fn system function hands the argument .Fa string to the command interpreter .Xr sh 1 . The calling process waits for the shell to finish executing the command, ignoring .Dv SIGINT and .Dv SIGQUIT , and blocking .Dv SIGCHLD . .Pp If .Fa string is a .Dv NULL pointer, .Fn system will return non-zero if the command interpreter .Xr sh 1 is available, and zero if it is not. .Sh RETURN VALUES The .Fn system function returns the exit status of the shell as returned by .Xr waitpid 2 , or \-1 if an error occurred when invoking .Xr fork 2 or .Xr waitpid 2 . A return value of 127 means the execution of the shell failed. .Sh SEE ALSO .Xr sh 1 , .Xr execve 2 , .Xr fork 2 , .Xr waitpid 2 , .Xr popen 3 , .Xr posix_spawn 3 .Sh STANDARDS The .Fn system function conforms to .St -isoC and is expected to be .St -p1003.2 compatible. 
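The status described under RETURN VALUES can be decoded with the macros from sys/wait.h, as in the following sketch. The helper name run_command is hypothetical, and the sketch assumes a non-NULL command string.

```c
#include <sys/wait.h>
#include <stdlib.h>

/*
 * run_command() is a hypothetical helper, not a libc function.
 * It returns the exit code of the shell (0 to 255), or -1 if
 * fork(2) or waitpid(2) failed or the shell was killed by a signal.
 * The cmd argument must not be NULL; a NULL argument makes system()
 * report shell availability instead of running a command.
 */
int
run_command(const char *cmd)
{
	int status;

	status = system(cmd);
	if (status == -1)
		return (-1);	/* fork(2) or waitpid(2) failed */
	if (WIFEXITED(status))
		return (WEXITSTATUS(status));	/* 127: shell could not be executed */
	return (-1);		/* terminated by a signal */
}
```

A fixed command string such as "exit 3" yields 3 here. Strings built from user input need the care described in the SECURITY CONSIDERATIONS section.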
.Sh SECURITY CONSIDERATIONS The .Fn system function is easily misused in a manner that enables a malicious user to run arbitrary commands, because all meta-characters supported by .Xr sh 1 would be honored. User-supplied parameters should always be carefully sanitized before they appear in -.Fa string. +.Fa string . Index: head/lib/libc/sys/_umtx_op.2 =================================================================== --- head/lib/libc/sys/_umtx_op.2 (revision 368816) +++ head/lib/libc/sys/_umtx_op.2 (revision 368817) @@ -1,1531 +1,1531 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd November 23, 2020 .Dt _UMTX_OP 2 .Os .Sh NAME .Nm _umtx_op .Nd interface for implementation of userspace threading synchronization primitives .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/umtx.h .Ft int .Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2" .Sh DESCRIPTION The .Fn _umtx_op system call provides kernel support for userspace implementation of the threading synchronization primitives. The .Lb libthr uses the syscall to implement .St -p1003.1-2001 pthread locks, like mutexes, condition variables and so on. .Ss STRUCTURES The operations, performed by the .Fn _umtx_op syscall, operate on userspace objects which are described by the following structures. Reserved fields and paddings are omitted. All objects require ABI-mandated alignment, but this is not currently enforced consistently on all architectures. .Pp The following flags are defined for flag fields of all structures: .Bl -tag -width indent .It Dv USYNC_PROCESS_SHARED Allow selection of the process-shared sleep queue for the thread sleep container, when the lock ownership cannot be granted immediately, and the operation must sleep. The process-shared or process-private sleep queue is selected based on the attributes of the memory mapping which contains the first byte of the structure, see .Xr mmap 2 . 
Otherwise, if the flag is not specified, the process-private sleep queue is selected regardless of the memory mapping attributes, as an optimization. .Pp See the .Sx SLEEP QUEUES subsection below for more details on sleep queues. .El .Bl -hang -offset indent .It Sy Mutex .Bd -literal struct umutex { volatile lwpid_t m_owner; uint32_t m_flags; uint32_t m_ceilings[2]; uintptr_t m_rb_lnk; }; .Ed .Pp The .Dv m_owner field is the actual lock. It contains either the thread identifier of the lock owner in the locked state, or zero when the lock is unowned. The highest bit set indicates that there is contention on the lock. The constants are defined for special values: .Bl -tag -width indent .It Dv UMUTEX_UNOWNED Zero, the value stored in the unowned lock. .It Dv UMUTEX_CONTESTED The contention indicator. .It Dv UMUTEX_RB_OWNERDEAD A thread owning the robust mutex terminated. The mutex is in unlocked state. .It Dv UMUTEX_RB_NOTRECOV The robust mutex is in a non-recoverable state. It cannot be locked until reinitialized. .El .Pp The .Dv m_flags field may contain the following umutex-specific flags, in addition to the common flags: .Bl -tag -width indent .It Dv UMUTEX_PRIO_INHERIT Mutex implements .Em Priority Inheritance protocol. .It Dv UMUTEX_PRIO_PROTECT Mutex implements .Em Priority Protection protocol. .It Dv UMUTEX_ROBUST Mutex is robust, as described in the .Sx ROBUST UMUTEXES section below. .It Dv UMUTEX_NONCONSISTENT Robust mutex is in a transient non-consistent state. Not used by kernel. .El .Pp In the manual page, mutexes not having .Dv UMUTEX_PRIO_INHERIT and .Dv UMUTEX_PRIO_PROTECT flags set, are called normal mutexes. Each type of mutex .Pq normal, priority-inherited, and priority-protected has a separate sleep queue associated with the given key. .Pp For priority protected mutexes, the .Dv m_ceilings array contains priority ceiling values. 
The .Dv m_ceilings[0] is the ceiling value for the mutex, as specified by .St -p1003.1-2008 for the .Em Priority Protected mutex protocol. The .Dv m_ceilings[1] is used only for the unlock of a priority protected mutex, when unlock is done in an order other than the reversed lock order. In this case, .Dv m_ceilings[1] must contain the ceiling value for the last locked priority protected mutex, for proper priority reassignment. If, instead, the unlocking mutex was the last priority propagated mutex locked by the thread, .Dv m_ceilings[1] should contain \-1. This is required because kernel does not maintain the ordered lock list. .It Sy Condition variable .Bd -literal struct ucond { volatile uint32_t c_has_waiters; uint32_t c_flags; uint32_t c_clockid; }; .Ed .Pp A non-zero .Dv c_has_waiters value indicates that there are in-kernel waiters for the condition, executing the .Dv UMTX_OP_CV_WAIT request. .Pp The .Dv c_flags field contains flags. Only the common flags .Pq Dv USYNC_PROCESS_SHARED are defined for ucond. .Pp The .Dv c_clockid member provides the clock identifier to use for timeout, when the .Dv UMTX_OP_CV_WAIT request has both the .Dv CVWAIT_CLOCKID flag and the timeout specified. Valid clock identifiers are a subset of those for .Xr clock_gettime 2 : .Bl -bullet -compact .It .Dv CLOCK_MONOTONIC .It .Dv CLOCK_MONOTONIC_FAST .It .Dv CLOCK_MONOTONIC_PRECISE .It .Dv CLOCK_PROF .It .Dv CLOCK_REALTIME .It .Dv CLOCK_REALTIME_FAST .It .Dv CLOCK_REALTIME_PRECISE .It .Dv CLOCK_SECOND .It .Dv CLOCK_UPTIME .It .Dv CLOCK_UPTIME_FAST .It .Dv CLOCK_UPTIME_PRECISE .It .Dv CLOCK_VIRTUAL .El .It Sy Reader/writer lock .Bd -literal struct urwlock { volatile int32_t rw_state; uint32_t rw_flags; uint32_t rw_blocked_readers; uint32_t rw_blocked_writers; }; .Ed .Pp The .Dv rw_state field is the actual lock. It contains both the flags and counter of the read locks which were granted. 
Names of the .Dv rw_state bits are the following: .Bl -tag -width indent .It Dv URWLOCK_WRITE_OWNER Write lock was granted. .It Dv URWLOCK_WRITE_WAITERS There are write lock waiters. .It Dv URWLOCK_READ_WAITERS There are read lock waiters. .It Dv URWLOCK_READER_COUNT(c) Returns the count of currently granted read locks. .El .Pp At any given time there may be only one thread to which the writer lock is granted on the .Vt struct urwlock , and no threads are granted the read lock. Or, at the given time, up to .Dv URWLOCK_MAX_READERS threads may be granted the read lock simultaneously, but the write lock is not granted to any thread. .Pp The following flags for the .Dv rw_flags member of .Vt struct urwlock are defined, in addition to the common flags: .Bl -tag -width indent .It Dv URWLOCK_PREFER_READER If specified, immediately grant read lock requests when .Dv urwlock is already read-locked, even in the presence of unsatisfied write lock requests. By default, if there is a write lock waiter, further read requests are not granted, to prevent unfair write lock waiter starvation. .El .Pp The .Dv rw_blocked_readers and .Dv rw_blocked_writers members contain the count of threads which are sleeping in the kernel, waiting for the associated request type to be granted. The fields are used by the kernel to update the .Dv URWLOCK_READ_WAITERS and .Dv URWLOCK_WRITE_WAITERS flags of the .Dv rw_state lock after the requesting thread is woken up. .It Sy Semaphore .Bd -literal struct _usem2 { volatile uint32_t _count; uint32_t _flags; }; .Ed .Pp The .Dv _count word represents a counting semaphore. A non-zero value indicates an unlocked (posted) semaphore, while zero represents the locked state. The maximal supported semaphore count is .Dv USEM_MAX_COUNT . .Pp The .Dv _count word, besides the counter of posts (unlocks), also contains the .Dv USEM_HAS_WAITERS bit, which indicates that the locked semaphore has waiting threads. 
.Pp The .Dv USEM_COUNT() macro, applied to the .Dv _count word, returns the current semaphore counter, which is the number of posts issued on the semaphore. .Pp The following bits for the .Dv _flags member of .Vt struct _usem2 are defined, in addition to the common flags: .Bl -tag -width indent .It Dv USEM_NAMED Flag is ignored by the kernel. .El .It Sy Timeout parameter .Bd -literal struct _umtx_time { struct timespec _timeout; uint32_t _flags; uint32_t _clockid; }; .Ed .Pp Several .Fn _umtx_op operations allow the blocking time to be limited, failing the request if it cannot be satisfied in the specified time period. The timeout is specified by passing either the address of .Vt struct timespec , or its extended variant, .Vt struct _umtx_time , as the .Fa uaddr2 argument of .Fn _umtx_op . They are distinguished by the .Fa uaddr value, which must be equal to the size of the structure pointed to by .Fa uaddr2 , cast to .Vt uintptr_t . .Pp The .Dv _timeout member specifies the time when the timeout should occur. Legal values for clock identifier .Dv _clockid are shared with the .Fa clock_id argument to the .Xr clock_gettime 2 function, and use the same underlying clocks. The specified clock is used to obtain the current time value. Interval counting is always performed by the monotonic wall clock. .Pp The .Dv _flags argument allows the following flags to further define the timeout behaviour: .Bl -tag -width indent .It Dv UMTX_ABSTIME The .Dv _timeout value is the absolute time. The thread will be unblocked and the request failed when the specified clock value equals or exceeds .Dv _timeout . .Pp If the flag is absent, the timeout value is relative, that is, the amount of time measured by the monotonic wall clock from the moment of the request start. .El .El .Ss SLEEP QUEUES When a locking request cannot be immediately satisfied, the thread is typically put to .Em sleep , which is a non-runnable state terminated by the .Em wake operation. 
Lock operations include a .Em try variant which returns an error rather than sleeping if the lock cannot be obtained. Also, .Fn _umtx_op provides requests which explicitly put the thread to sleep. .Pp Wakes need to know which threads to make runnable, so sleeping threads are grouped into containers called .Em sleep queues . A sleep queue is identified by a key, which for .Fn _umtx_op is defined as the physical address of some variable. Note that the .Em physical address is used, which means that the same variable mapped multiple times will give one key value. This mechanism enables the construction of .Em process-shared locks. .Pp A related attribute of the key is shareability. Some requests always interpret keys as private for the current process, creating sleep queues with the scope of the current process even if the memory is shared. Others either select the shareability automatically from the mapping attributes, or take additional input as the .Dv USYNC_PROCESS_SHARED common flag. This is done as an optimization, allowing the lock scope to be limited regardless of the kind of backing memory. .Pp Only the address of the start byte of the variable specified as key is important for determining the corresponding sleep queue. The size of the variable does not matter, so, for example, sleep on the same address interpreted as .Vt uint32_t and .Vt long on a little-endian 64-bit platform would collide. .Pp The last attribute of the key is the object type. The sleep queue to which a sleeping thread is assigned is an individual one for simple wait requests, mutexes, rwlocks, condvars and other primitives, even when the physical address of the key is the same. .Pp When waking up a limited number of threads from a given sleep queue, the highest priority threads that have been blocked for the longest on the queue are selected. .Ss ROBUST UMUTEXES The .Em robust umutexes are provided as a substrate for a userspace library to implement .Tn POSIX robust mutexes. 
A robust umutex must have the .Dv UMUTEX_ROBUST flag set. .Pp On thread termination, the kernel walks two lists of mutexes. The head addresses of the two lists must be provided by a prior .Dv UMTX_OP_ROBUST_LISTS request. The lists are singly-linked. The link to the next element is provided by the .Dv m_rb_lnk member of the .Vt struct umutex . .Pp Robust list processing is aborted if the kernel finds a mutex with any of the following conditions: .Bl -dash -offset indent -compact .It the .Dv UMUTEX_ROBUST flag is not set .It not owned by the current thread, except when the mutex is pointed to by the .Dv robust_inactive member of the .Vt struct umtx_robust_lists_params , registered for the current thread .It the combination of mutex flags is invalid .It read of the umutex memory faults .It the list length limit described in .Xr libthr 3 is reached. .El .Pp Every mutex in both lists is unlocked as if the .Dv UMTX_OP_MUTEX_UNLOCK request is performed on it, but instead of the .Dv UMUTEX_UNOWNED value, the .Dv m_owner field is written with the .Dv UMUTEX_RB_OWNERDEAD value. When a mutex in the .Dv UMUTEX_RB_OWNERDEAD state is locked by the kernel due to the .Dv UMTX_OP_MUTEX_TRYLOCK and .Dv UMTX_OP_MUTEX_LOCK requests, the lock is granted and the .Er EOWNERDEAD error is returned. .Pp Also, the kernel handles the .Dv UMUTEX_RB_NOTRECOV value of the .Dv m_owner field specially, always returning the .Er ENOTRECOVERABLE error for lock attempts, without granting the lock. .Ss OPERATIONS The following operations, requested by the .Fa op argument to the function, are implemented: .Bl -tag -width indent .It Dv UMTX_OP_WAIT Wait. The arguments for the request are: .Bl -tag -width "obj" .It Fa obj Pointer to a variable of type .Vt long . .It Fa val Current value of the .Dv *obj . .El .Pp The current value of the variable pointed to by the .Fa obj argument is compared with the .Fa val . 
If they are equal, the requesting thread is put to interruptible sleep until woken up or the optionally specified timeout expires. .Pp The comparison and sleep are atomic. In other words, if another thread writes a new value to .Dv *obj and then issues .Dv UMTX_OP_WAKE , the request is guaranteed to not miss the wakeup, which might otherwise happen between comparison and blocking. .Pp The physical address of the memory where the .Fa *obj variable is located is used as a key to index sleeping threads. .Pp The read of the current value of the .Dv *obj variable is not guarded by barriers. In particular, it is the user's duty to ensure the lock acquire and release memory semantics, if the .Dv UMTX_OP_WAIT and .Dv UMTX_OP_WAKE requests are used as a substrate for implementing a simple lock. .Pp The request is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. .Pp Optionally, a timeout for the request may be specified. .It Dv UMTX_OP_WAKE Wake the threads possibly sleeping due to .Dv UMTX_OP_WAIT . The arguments for the request are: .Bl -tag -width "obj" .It Fa obj Pointer to a variable, used as a key to find sleeping threads. .It Fa val Up to .Fa val threads are woken up by this request. Specify .Dv INT_MAX to wake up all waiters. .El .It Dv UMTX_OP_MUTEX_TRYLOCK Try to lock umutex. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the umutex. .El .Pp Operates the same as the .Dv UMTX_OP_MUTEX_LOCK request, but returns .Er EBUSY instead of sleeping if the lock cannot be obtained immediately. .It Dv UMTX_OP_MUTEX_LOCK Lock umutex. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the umutex. .El .Pp Locking is performed by writing the current thread id into the .Dv m_owner word of the .Vt struct umutex . The write is atomic, preserves the .Dv UMUTEX_CONTESTED contention indicator, and provides the acquire barrier for lock entrance semantic.
.Pp If the lock cannot be obtained immediately because another thread owns the lock, the current thread is put to sleep, with the .Dv UMUTEX_CONTESTED bit set beforehand. Upon wake up, the lock conditions are re-tested. .Pp The request adheres to the priority protection or inheritance protocol of the mutex, specified by the .Dv UMUTEX_PRIO_PROTECT or .Dv UMUTEX_PRIO_INHERIT flag, respectively. .Pp Optionally, a timeout for the request may be specified. .Pp A request with a timeout specified is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. A request without a timeout specified is always restarted after return from a signal handler. .It Dv UMTX_OP_MUTEX_UNLOCK Unlock umutex. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the umutex. .El .Pp Unlocks the mutex by writing the .Dv UMUTEX_UNOWNED (zero) value into the .Dv m_owner word of the .Vt struct umutex . The write is done with a release barrier, to provide lock leave semantic. .Pp If there are threads sleeping in the sleep queue associated with the umutex, one thread is woken up. If more than one thread sleeps in the sleep queue, the .Dv UMUTEX_CONTESTED bit is set together with the write of the .Dv UMUTEX_UNOWNED value into .Dv m_owner . .Pp The request adheres to the priority protection or inheritance protocol of the mutex, specified by the .Dv UMUTEX_PRIO_PROTECT or .Dv UMUTEX_PRIO_INHERIT flag, respectively. See the description of the .Dv m_ceilings member of the .Vt struct umutex structure for additional details of the request operation on the priority protected protocol mutex. .It Dv UMTX_OP_SET_CEILING Set ceiling for the priority protected umutex. The arguments to the request are: .Bl -tag -width "uaddr" .It Fa obj Pointer to the umutex. .It Fa val New ceiling value. .It Fa uaddr Address of a variable of type .Vt uint32_t .
If not .Dv NULL and the update was successful, the previous ceiling value is written to the location pointed to by .Fa uaddr . .El .Pp The request locks the umutex pointed to by the .Fa obj parameter, waiting for the lock if not immediately available. After the lock is obtained, the new ceiling value .Fa val is written to the .Dv m_ceilings[0] member of the .Vt struct umutex , after which the umutex is unlocked. .Pp The locking does not adhere to the priority protect protocol, to conform to the .Tn POSIX requirements for the .Xr pthread_mutex_setprioceiling 3 interface. .It Dv UMTX_OP_CV_WAIT Wait for a condition. The arguments to the request are: .Bl -tag -width "uaddr2" .It Fa obj Pointer to the .Vt struct ucond . .It Fa val Request flags, see below. .It Fa uaddr Pointer to the umutex. .It Fa uaddr2 Optional pointer to a .Vt struct timespec for timeout specification. .El .Pp The request must be issued by the thread owning the mutex pointed to by the .Fa uaddr argument. The .Dv c_has_waiters member of the .Vt struct ucond , pointed to by the .Fa obj argument, is set to an arbitrary non-zero value, after which the .Fa uaddr mutex is unlocked (following the appropriate protocol), and the current thread is put to sleep on the sleep queue keyed by the .Fa obj argument. The operations are performed atomically. It is guaranteed to not miss a wakeup from .Dv UMTX_OP_CV_SIGNAL or .Dv UMTX_OP_CV_BROADCAST sent between mutex unlock and putting the current thread on the sleep queue. .Pp Upon wakeup, if the timeout expired and no other threads are sleeping in the same sleep queue, the .Dv c_has_waiters member is cleared. After wakeup, the .Fa uaddr umutex is not relocked. .Pp The following flags are defined: .Bl -tag -width "CVWAIT_CLOCKID" .It Dv CVWAIT_ABSTIME Timeout is absolute. .It Dv CVWAIT_CLOCKID Clockid is provided. .El .Pp Optionally, a timeout for the request may be specified.
Unlike other requests, the timeout value is specified directly by a .Vt struct timespec , pointed to by the .Fa uaddr2 argument. If the .Dv CVWAIT_CLOCKID flag is provided, the timeout uses the clock from the .Dv c_clockid member of the .Vt struct ucond , pointed to by .Fa obj argument. Otherwise, .Dv CLOCK_REALTIME is used, regardless of the clock identifier possibly specified in the .Vt struct _umtx_time . If the .Dv CVWAIT_ABSTIME flag is supplied, the timeout specifies absolute time value, otherwise it denotes a relative time interval. .Pp The request is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. .It Dv UMTX_OP_CV_SIGNAL Wake up one condition waiter. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to .Vt struct ucond . .El .Pp The request wakes up at most one thread sleeping on the sleep queue keyed by the .Fa obj argument. If the woken up thread was the last on the sleep queue, the .Dv c_has_waiters member of the .Vt struct ucond is cleared. .It Dv UMTX_OP_CV_BROADCAST Wake up all condition waiters. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to .Vt struct ucond . .El .Pp The request wakes up all threads sleeping on the sleep queue keyed by the .Fa obj argument. The .Dv c_has_waiters member of the .Vt struct ucond is cleared. .It Dv UMTX_OP_WAIT_UINT Same as .Dv UMTX_OP_WAIT , but the type of the variable pointed to by .Fa obj is .Vt u_int .Pq a 32-bit integer . .It Dv UMTX_OP_RW_RDLOCK Read-lock a .Vt struct rwlock lock. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the lock (of type .Vt struct rwlock ) to be read-locked. .It Fa val Additional flags to augment locking behaviour. 
The valid flags in the .Fa val argument are: .Bl -tag -width indent .It Dv URWLOCK_PREFER_READER .El .El .Pp The request obtains the read lock on the specified .Vt struct rwlock by incrementing the count of readers in the .Dv rw_state word of the structure. If the .Dv URWLOCK_WRITE_OWNER bit is set in the word .Dv rw_state , the lock was granted to a writer which has not yet relinquished its ownership. In this case the current thread is put to sleep until it makes sense to retry. .Pp If the .Dv URWLOCK_PREFER_READER flag is set either in the .Dv rw_flags word of the structure, or in the .Fa val argument of the request, the presence of the threads trying to obtain the write lock on the same structure does not prevent the current thread from trying to obtain the read lock. Otherwise, if the flag is not set, and the .Dv URWLOCK_WRITE_WAITERS flag is set in .Dv rw_state , the current thread does not attempt to obtain read-lock. Instead it sets the .Dv URWLOCK_READ_WAITERS in the .Dv rw_state word and puts itself to sleep on corresponding sleep queue. Upon wakeup, the locking conditions are re-evaluated. .Pp Optionally, a timeout for the request may be specified. .Pp The request is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. .It Dv UMTX_OP_RW_WRLOCK Write-lock a .Vt struct rwlock lock. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the lock (of type .Vt struct rwlock ) to be write-locked. .El .Pp The request obtains a write lock on the specified .Vt struct rwlock , by setting the .Dv URWLOCK_WRITE_OWNER bit in the .Dv rw_state word of the structure. If there is already a write lock owner, as indicated by the .Dv URWLOCK_WRITE_OWNER bit being set, or there are read lock owners, as indicated by the read-lock counter, the current thread does not attempt to obtain the write-lock. 
Instead it sets the .Dv URWLOCK_WRITE_WAITERS in the .Dv rw_state word and puts itself to sleep on the corresponding sleep queue. Upon wakeup, the locking conditions are re-evaluated. .Pp Optionally, a timeout for the request may be specified. .Pp The request is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. .It Dv UMTX_OP_RW_UNLOCK Unlock rwlock. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the lock (of type .Vt struct rwlock ) to be unlocked. .El .Pp The unlock type (read or write) is determined by the current lock state. Note that the .Vt struct rwlock does not save information about the identity of the thread which acquired the lock. .Pp If there are pending writers after the unlock, and the .Dv URWLOCK_PREFER_READER flag is not set in the .Dv rw_flags member of the .Fa *obj structure, one writer is woken up, selected as described in the .Sx SLEEP QUEUES subsection. If the .Dv URWLOCK_PREFER_READER flag is set, a pending writer is woken up only if there are no pending readers. .Pp If there are no pending writers, or if the .Dv URWLOCK_PREFER_READER flag is set, all pending readers are woken up by the unlock. .It Dv UMTX_OP_WAIT_UINT_PRIVATE Same as .Dv UMTX_OP_WAIT_UINT , but unconditionally selects the process-private sleep queue. .It Dv UMTX_OP_WAKE_PRIVATE Same as .Dv UMTX_OP_WAKE , but unconditionally selects the process-private sleep queue. .It Dv UMTX_OP_MUTEX_WAIT Wait for mutex availability. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Address of the mutex. .El .Pp Similarly to the .Dv UMTX_OP_MUTEX_LOCK , put the requesting thread to sleep if the mutex lock cannot be obtained immediately. The .Dv UMUTEX_CONTESTED bit is set in the .Dv m_owner word of the mutex to indicate that there is a waiter, before the thread is added to the sleep queue. Unlike the .Dv UMTX_OP_MUTEX_LOCK request, the lock is not obtained.
.Pp The operation is not implemented for priority protected and priority inherited protocol mutexes. .Pp Optionally, a timeout for the request may be specified. .Pp A request with a timeout specified is not restartable. An unblocked signal delivered during the wait always results in sleep interruption and .Er EINTR error. A request without a timeout automatically restarts if the signal disposition requested restart via the .Dv SA_RESTART flag in .Vt struct sigaction member .Dv sa_flags . .It Dv UMTX_OP_NWAKE_PRIVATE Wake up a batch of sleeping threads. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the array of pointers. .It Fa val Number of elements in the array pointed to by .Fa obj . .El .Pp For each element in the array pointed to by .Fa obj , wakes up all threads waiting on the .Em private sleep queue with the key being the byte addressed by the array element. .It Dv UMTX_OP_MUTEX_WAKE Check if a normal umutex is unlocked and wake up a waiter. The arguments for the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the umutex. .El .Pp If the .Dv m_owner word of the mutex pointed to by the .Fa obj argument indicates unowned mutex, which has its contention indicator bit .Dv UMUTEX_CONTESTED set, clear the bit and wake up one waiter in the sleep queue associated with the byte addressed by the .Fa obj , if any. Only normal mutexes are supported by the request. The sleep queue is always one for a normal mutex type. .Pp This request is deprecated in favor of .Dv UMTX_OP_MUTEX_WAKE2 since mutexes using it cannot synchronize their own destruction. That is, the .Dv m_owner word has already been set to .Dv UMUTEX_UNOWNED when this request is made, so that another thread can lock, unlock and destroy the mutex (if no other thread uses the mutex afterwards). Clearing the .Dv UMUTEX_CONTESTED bit may then modify freed memory. .It Dv UMTX_OP_MUTEX_WAKE2 Check if a umutex is unlocked and wake up a waiter. 
The arguments for the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the umutex. .It Fa val The umutex flags. .El .Pp The request does not read the .Dv m_flags member of the .Vt struct umutex ; instead, the .Fa val argument supplies flag information, in particular, to determine the sleep queue where the waiters are found for wake up. .Pp If the mutex is unowned, one waiter is woken up. .Pp If the mutex memory cannot be accessed, all waiters are woken up. .Pp If there is more than one waiter on the sleep queue, or there is only one waiter but the mutex is owned by a thread, the .Dv UMUTEX_CONTESTED bit is set in the .Dv m_owner word of the .Vt struct umutex . .It Dv UMTX_OP_SEM2_WAIT Wait until semaphore is available. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the semaphore (of type .Vt struct _usem2 ) . .It Fa uaddr Size of the memory passed in via the .Fa uaddr2 argument. .It Fa uaddr2 Optional pointer to a structure of type .Vt struct _umtx_time , which may be followed by a structure of type .Vt struct timespec . .El .Pp Put the requesting thread onto a sleep queue if the semaphore counter is zero. If the thread is put to sleep, the .Dv USEM_HAS_WAITERS bit is set in the .Dv _count word to indicate waiters. The function returns either due to .Dv _count indicating the semaphore is available (non-zero count due to post), or due to a wakeup. The return does not guarantee that the semaphore is available, nor does it consume the semaphore lock on successful return. .Pp Optionally, a timeout for the request may be specified. .Pp A request with non-absolute timeout value is not restartable. An unblocked signal delivered during such wait results in sleep interruption and .Er EINTR error. 
.Pp If .Dv UMTX_ABSTIME was not set, the operation was interrupted, and the caller passed in a .Fa uaddr2 large enough to hold a .Vt struct timespec following the initial .Vt struct _umtx_time , then the .Vt struct timespec is updated to contain the unslept amount. .It Dv UMTX_OP_SEM2_WAKE Wake up waiters on semaphore lock. The arguments to the request are: .Bl -tag -width "obj" .It Fa obj Pointer to the semaphore (of type .Vt struct _usem2 ) . .El .Pp The request wakes up one waiter for the semaphore lock. The function does not increment the semaphore lock count. If the .Dv USEM_HAS_WAITERS bit was set in the .Dv _count word, and the last sleeping thread was woken up, the bit is cleared. .It Dv UMTX_OP_SHM Manage anonymous .Tn POSIX shared memory objects (see .Xr shm_open 2 ) , which can be attached to a byte of physical memory, mapped into the process address space. The objects are used to implement process-shared locks in .Dv libthr . .Pp The .Fa val argument specifies the sub-request of the .Dv UMTX_OP_SHM request: .Bl -tag -width indent .It Dv UMTX_SHM_CREAT Creates the anonymous shared memory object, which can be looked up with the specified key -.Fa uaddr. +.Fa uaddr . If the object associated with the .Fa uaddr key already exists, it is returned instead of creating a new object. The object's size is one page. On success, the file descriptor referencing the object is returned. The descriptor can be used for mapping the object using .Xr mmap 2 , or for other shared memory operations. .It Dv UMTX_SHM_LOOKUP Same as the .Dv UMTX_SHM_CREAT request, but if there is no shared memory object associated with the specified key .Fa uaddr , an error is returned, and no new object is created. .It Dv UMTX_SHM_DESTROY De-associate the shared object with the specified key -.Fa uaddr. +.Fa uaddr . The object is destroyed after the last open file descriptor is closed and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE Checks whether there is a live shared object associated with the supplied key .Fa uaddr . Returns zero if there is, and an error otherwise. This request is an optimization of the .Dv UMTX_SHM_LOOKUP request. It is cheaper when only the liveness of the associated object is asked for, since no file descriptor is installed in the process fd table on success. .El .Pp The .Fa uaddr argument specifies the virtual address whose backing physical memory byte identity is used as a key for the anonymous shared object creation or lookup. .It Dv UMTX_OP_ROBUST_LISTS Register the list heads for the current thread's robust mutex lists. The arguments to the request are: .Bl -tag -width "uaddr" .It Fa val Size of the structure passed in the .Fa uaddr argument. .It Fa uaddr Pointer to the structure of type .Vt struct umtx_robust_lists_params . .El .Pp The structure is defined as .Bd -literal struct umtx_robust_lists_params { uintptr_t robust_list_offset; uintptr_t robust_priv_list_offset; uintptr_t robust_inact_offset; }; .Ed .Pp The .Dv robust_list_offset member contains the address of the first element in the list of locked robust shared mutexes. The .Dv robust_priv_list_offset member contains the address of the first element in the list of locked robust private mutexes. The private and shared robust locked lists are split to allow fast termination of the shared list on fork, in the child. .Pp The .Dv robust_inact_offset contains a pointer to the mutex which might be locked in the near future, or might have been just unlocked. It is typically set by the lock or unlock mutex implementation code around the whole operation, since lists can only be changed race-free when the thread owns the mutex. The kernel inspects the .Dv robust_inact_offset in addition to walking the shared and private lists. Also, the mutex pointed to by .Dv robust_inact_offset is handled more loosely at thread termination time than other mutexes on the list.
That mutex is allowed to be not owned by the current thread, in which case list processing is continued. See .Sx ROBUST UMUTEXES subsection for details. .El .Pp The .Fa op argument may be a bitwise OR of a single command from above with one or more of the following flags: .Bl -tag -width indent .It Dv UMTX_OP__I386 Request i386 ABI compatibility from the native .Nm system call. Specifically, this implies that: .Bl -hang -offset indent .It .Fa obj arguments that point to a word, point to a 32-bit integer. .It The .Dv UMTX_OP_NWAKE_PRIVATE .Fa obj argument is a pointer to an array of 32-bit pointers. .It The .Dv m_rb_lnk member of .Vt struct umutex is a 32-bit pointer. .It .Vt struct timespec uses a 32-bit time_t. .El .Pp .Dv UMTX_OP__32BIT has no effect if this flag is set. This flag is valid for all architectures, but it is ignored on i386. .It Dv UMTX_OP__32BIT Request non-i386, 32-bit ABI compatibility from the native .Nm system call. Specifically, this implies that: .Bl -hang -offset indent .It .Fa obj arguments that point to a word, point to a 32-bit integer. .It The .Dv UMTX_OP_NWAKE_PRIVATE .Fa obj argument is a pointer to an array of 32-bit pointers. .It The .Dv m_rb_lnk member of .Vt struct umutex is a 32-bit pointer. .It .Vt struct timespec uses a 64-bit time_t. .El .Pp This flag has no effect if .Dv UMTX_OP__I386 is set. This flag is valid for all architectures. .El .Pp Note that if any 32-bit ABI compatibility is being requested, then care must be taken with robust lists. A single thread may not mix 32-bit compatible robust lists with native robust lists. The first .Dv UMTX_OP_ROBUST_LISTS call in a given thread determines which ABI that thread will use for robust lists going forward. .Sh RETURN VALUES If successful, all requests, except .Dv UMTX_SHM_CREAT and .Dv UMTX_SHM_LOOKUP sub-requests of the .Dv UMTX_OP_SHM request, will return zero. The .Dv UMTX_SHM_CREAT and .Dv UMTX_SHM_LOOKUP return a shared memory file descriptor on success. 
On error \-1 is returned, and the .Va errno variable is set to indicate the error. .Sh ERRORS The .Fn _umtx_op operations can fail with the following errors: .Bl -tag -width "[ETIMEDOUT]" .It Bq Er EFAULT One of the arguments points to invalid memory. .It Bq Er EINVAL The clock identifier, specified for the .Vt struct _umtx_time timeout parameter, or in the .Dv c_clockid member of .Vt struct ucond , is invalid. .It Bq Er EINVAL The type of the mutex, encoded by the .Dv m_flags member of .Vt struct umutex , is invalid. .It Bq Er EINVAL The .Dv m_owner member of the .Vt struct umutex has changed the lock owner thread identifier during unlock. .It Bq Er EINVAL The .Dv timeout.tv_sec or .Dv timeout.tv_nsec member of .Vt struct _umtx_time is less than zero, or .Dv timeout.tv_nsec is greater than 1000000000. .It Bq Er EINVAL The .Fa op argument specifies an invalid operation. .It Bq Er EINVAL The .Fa uaddr argument for the .Dv UMTX_OP_SHM request specifies an invalid operation. .It Bq Er EINVAL The .Dv UMTX_OP_SET_CEILING request specifies a non-priority-protected mutex. .It Bq Er EINVAL The new ceiling value for the .Dv UMTX_OP_SET_CEILING request, or one or more of the values read from the .Dv m_ceilings array during lock or unlock operations, is greater than .Dv RTP_PRIO_MAX . .It Bq Er EPERM Unlock attempted on an object not owned by the current thread. .It Bq Er EOWNERDEAD The lock was requested on a umutex where the .Dv m_owner field was set to the .Dv UMUTEX_RB_OWNERDEAD value, indicating a terminated robust mutex. The lock was granted to the caller, so this error in fact indicates success with additional conditions. .It Bq Er ENOTRECOVERABLE The lock was requested on a umutex whose .Dv m_owner field is equal to the .Dv UMUTEX_RB_NOTRECOV value, indicating an abandoned robust mutex after termination. The lock was not granted to the caller.
.It Bq Er ENOTTY The shared memory object, associated with the address passed to the .Dv UMTX_SHM_ALIVE sub-request of .Dv UMTX_OP_SHM request, was destroyed. .It Bq Er ESRCH For the .Dv UMTX_SHM_LOOKUP , .Dv UMTX_SHM_DESTROY , and .Dv UMTX_SHM_ALIVE sub-requests of the .Dv UMTX_OP_SHM request, there is no shared memory object associated with the provided key. .It Bq Er ENOMEM The .Dv UMTX_SHM_CREAT sub-request of the .Dv UMTX_OP_SHM request cannot be satisfied, because allocation of the shared memory object would exceed the .Dv RLIMIT_UMTXP resource limit, see .Xr setrlimit 2 . .It Bq Er EAGAIN The maximum number of readers .Dv ( URWLOCK_MAX_READERS ) were already granted ownership of the given .Vt struct rwlock for read. .It Bq Er EBUSY A try mutex lock operation was not able to obtain the lock. .It Bq Er ETIMEDOUT The request specified a timeout in the .Fa uaddr and .Fa uaddr2 arguments, and timed out before obtaining the lock or being woken up. .It Bq Er EINTR A signal was delivered during wait, for a non-restartable operation. Operations with timeouts are typically non-restartable, but timeouts specified in absolute time may be restartable. .It Bq Er ERESTART A signal was delivered during wait, for a restartable operation. Mutex lock requests without timeout specified are restartable. The error is not returned to userspace code since restart is handled by usual adjustment of the instruction counter. .El .Sh SEE ALSO .Xr clock_gettime 2 , .Xr mmap 2 , .Xr setrlimit 2 , .Xr shm_open 2 , .Xr sigaction 2 , .Xr thr_exit 2 , .Xr thr_kill 2 , .Xr thr_kill2 2 , .Xr thr_new 2 , .Xr thr_self 2 , .Xr thr_set_name 2 , .Xr signal 3 .Sh STANDARDS The .Fn _umtx_op system call is non-standard and is used by the .Lb libthr to implement .St -p1003.1-2001 .Xr pthread 3 functionality. 
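.Sh EXAMPLES
The following is a minimal sketch, not taken from the FreeBSD sources, of how the UMTX_OP_WAIT_UINT_PRIVATE and UMTX_OP_WAKE_PRIVATE requests described under OPERATIONS might be wrapped. The helper names word_wait and word_wake_all are hypothetical. The non-FreeBSD branch is an assumption added only so that the sketch builds on other systems; it fails with ENOSYS and is not part of the interface.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/umtx.h>
#endif

/*
 * Hypothetical helper: sleep on the process-private queue while *word
 * still equals `expected`.  Returns 0 when woken or when the values
 * already differ, and -1 with errno set on error (for example EINTR).
 */
static int
word_wait(unsigned int *word, unsigned int expected)
{
	assert(word != NULL);
#ifdef __FreeBSD__
	/* No timeout is requested, so uaddr and uaddr2 are NULL. */
	return (_umtx_op(word, UMTX_OP_WAIT_UINT_PRIVATE, expected,
	    NULL, NULL));
#else
	(void)expected;
	errno = ENOSYS;		/* assumption: stub for other systems */
	return (-1);
#endif
}

/* Hypothetical helper: wake every thread sleeping on `word`. */
static int
word_wake_all(unsigned int *word)
{
	assert(word != NULL);
#ifdef __FreeBSD__
	/* INT_MAX asks the kernel to wake all waiters. */
	return (_umtx_op(word, UMTX_OP_WAKE_PRIVATE, INT_MAX, NULL, NULL));
#else
	errno = ENOSYS;		/* assumption: stub for other systems */
	return (-1);
#endif
}
```

Since the wait is not restartable and may end with EINTR, a caller would typically re-read the word and call word_wait again in a loop. The memory ordering around the word remains the caller's responsibility, as noted for UMTX_OP_WAIT.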
.Sh BUGS A window between unlocking a robust mutex and resetting the pointer in the .Dv robust_inact_offset member of the registered .Vt struct umtx_robust_lists_params allows another thread to destroy the mutex, thus making the kernel inspect freed or reused memory. The .Li libthr implementation is only vulnerable to this race when operating on a shared mutex. A possible fix for the current implementation is to strengthen the checks for shared mutexes before terminating them, in particular, verifying that the mutex memory is mapped from a shared memory object allocated by the .Dv UMTX_OP_SHM request. This is not done because it is believed that the race is adequately covered by other consistency checks, while adding the check would prevent alternative implementations of .Li libpthread . Index: head/lib/libc/sys/copy_file_range.2 =================================================================== --- head/lib/libc/sys/copy_file_range.2 (revision 368816) +++ head/lib/libc/sys/copy_file_range.2 (revision 368817) @@ -1,207 +1,203 @@ .\" SPDX-License-Identifier: BSD-2-Clause .\" .\" Copyright (c) 2019 Rick Macklem .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd March 30, 2020 .Dt COPY_FILE_RANGE 2 .Os .Sh NAME .Nm copy_file_range .Nd kernel copy of a byte range from one file to another or within one file .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In unistd.h .Ft ssize_t .Fo copy_file_range .Fa "int infd" .Fa "off_t *inoffp" .Fa "int outfd" .Fa "off_t *outoffp" .Fa "size_t len" .Fa "unsigned int flags" .Fc .Sh DESCRIPTION The .Fn copy_file_range system call copies up to .Fa len bytes from .Fa infd to .Fa outfd in the kernel. It may do this using a file system specific technique if .Fa infd and .Fa outfd are on the same file system. If .Fa infd and .Fa outfd refer to the same file, the byte ranges defined by the input file offset, output file offset and .Fa len cannot overlap. The .Fa infd argument must be opened for reading and the .Fa outfd argument must be opened for writing, but not .Dv O_APPEND . If .Fa inoffp or .Fa outoffp is .Dv NULL , the file offset for .Fa infd or .Fa outfd respectively will be used and updated by the number of bytes copied. If .Fa inoffp or .Fa outoffp is not .Dv NULL , the byte offset pointed to by .Fa inoffp or .Fa outoffp respectively will be used/updated and the file offset for .Fa infd or .Fa outfd respectively will not be affected. The .Fa flags argument must be 0. .Pp This system call attempts to maintain holes in the output file for the byte range being copied. However, this does not always work well. 
It is recommended that sparse files be copied in a loop using .Xr lseek 2 with .Dv SEEK_HOLE , .Dv SEEK_DATA arguments and this system call for the data ranges found. -.Pp .Sh RETURN VALUES If it succeeds, the call returns the number of bytes copied, which can be fewer than .Fa len . Returning fewer bytes than .Fa len does not necessarily indicate that EOF was reached. However, a return of zero for a non-zero .Fa len argument indicates that the offset for .Fa infd is at or beyond EOF. .Fn copy_file_range should be used in a loop until copying of the desired byte range has been completed. If an error has occurred, a \-1 is returned and the error code is placed in the global variable .Va errno . .Sh ERRORS The .Fn copy_file_range system call will fail if: .Bl -tag -width Er .It Bq Er EBADF If -.Fa -infd +.Fa infd is not open for reading or -.Fa -outfd +.Fa outfd is not open for writing, or opened for writing with .Dv O_APPEND , or if .Fa infd and .Fa outfd refer to the same file. .It Bq Er EFBIG If the copy exceeds the process's file size limit or the maximum file size for the file system .Fa outfd resides on. .It Bq Er EINTR A signal interrupted the system call before it could be completed. This may happen for files on some NFS mounts. When this happens, the values pointed to by .Fa inoffp and .Fa outoffp are reset to the initial values for the system call. .It Bq Er EINVAL .Fa infd and .Fa outfd refer to the same file and the byte ranges overlap or -.Fa -flags +.Fa flags is not zero. .It Bq Er EIO An I/O error occurred while reading/writing the files. .It Bq Er EINTEGRITY Corrupted data was detected while reading from a file system. .It Bq Er EISDIR If either .Fa infd or .Fa outfd refers to a directory. .It Bq Er ENOSPC File system that stores .Fa outfd is full. .El .Sh SEE ALSO .Xr lseek 2 .Sh STANDARDS The .Fn copy_file_range system call is expected to be compatible with the Linux system call of the same name. 
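.Sh EXAMPLES
The RETURN VALUES section advises calling copy_file_range in a loop, because a short count does not by itself mean that EOF was reached. A minimal sketch of such a loop follows. The names copy_fully and copy_demo are hypothetical, the temporary file paths are illustrative, and the _GNU_SOURCE definition is an assumption that is only needed to expose the prototype on Linux with glibc.

```c
#define _GNU_SOURCE	/* assumption: Linux glibc only; harmless on FreeBSD */
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to `len` bytes from infd to outfd using
 * the descriptors' own file offsets (both offset pointers are NULL).
 * Returns the number of bytes copied, which is less than `len` only if
 * the input reached EOF, or -1 with errno set on error.
 */
static ssize_t
copy_fully(int infd, int outfd, size_t len)
{
	size_t done = 0;

	assert(infd >= 0 && outfd >= 0);
	while (done < len) {
		ssize_t n = copy_file_range(infd, NULL, outfd, NULL,
		    len - done, 0);
		if (n == -1)
			return (-1);
		if (n == 0)	/* input offset is at or beyond EOF */
			break;
		done += (size_t)n;
	}
	return ((ssize_t)done);
}

/*
 * Hypothetical self-check: copy a short temporary file and compare.
 * Returns 0 on success, or -1 with errno left as set by the failing call.
 */
static int
copy_demo(void)
{
	static const char msg[] = "hello, world";
	const size_t len = sizeof(msg) - 1;
	char in_path[] = "/tmp/cfr_in_XXXXXX";
	char out_path[] = "/tmp/cfr_out_XXXXXX";
	char buf[sizeof(msg)];
	int in, out, saved, rc = -1;

	errno = 0;
	in = mkstemp(in_path);
	out = mkstemp(out_path);
	if (in >= 0 && out >= 0 &&
	    write(in, msg, len) == (ssize_t)len &&
	    lseek(in, 0, SEEK_SET) == 0 &&
	    copy_fully(in, out, len) == (ssize_t)len &&
	    pread(out, buf, len, 0) == (ssize_t)len &&
	    memcmp(buf, msg, len) == 0)
		rc = 0;
	saved = errno;
	if (in >= 0) {
		close(in);
		unlink(in_path);
	}
	if (out >= 0) {
		close(out);
		unlink(out_path);
	}
	errno = saved;
	return (rc);
}
```

The sketch gives up on any error, including EINTR. A caller that passes explicit offset pointers could instead retry after EINTR, since the ERRORS section states that the pointed-to offsets are reset to their initial values in that case.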
.Sh HISTORY The .Fn copy_file_range function appeared in .Fx 13.0 . Index: head/lib/libc/sys/execve.2 =================================================================== --- head/lib/libc/sys/execve.2 (revision 368816) +++ head/lib/libc/sys/execve.2 (revision 368817) @@ -1,379 +1,379 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" @(#)execve.2 8.5 (Berkeley) 6/1/94 .\" $FreeBSD$ .\" .Dd March 30, 2020 .Dt EXECVE 2 .Os .Sh NAME .Nm execve , .Nm fexecve .Nd execute a file .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft int .Fn execve "const char *path" "char *const argv[]" "char *const envp[]" .Ft int .Fn fexecve "int fd" "char *const argv[]" "char *const envp[]" .Sh DESCRIPTION The .Fn execve system call transforms the calling process into a new process. The new process is constructed from an ordinary file, whose name is pointed to by .Fa path , called the .Em new process file . The .Fn fexecve system call is equivalent to .Fn execve except that the file to be executed is determined by the file descriptor .Fa fd instead of a .Fa path . This file is either an executable object file, or a file of data for an interpreter. An executable object file consists of an identifying header, followed by pages of data representing the initial program (text) and initialized data pages. Additional pages may be specified by the header to be initialized with zero data; see .Xr elf 5 and .Xr a.out 5 . .Pp An interpreter file begins with a line of the form: .Pp .Bd -ragged -offset indent -compact .Sy \&#! .Em interpreter .Bq Em arg .Ed .Pp When an interpreter file is .Sy execve Ap d , the system actually .Sy execve Ap s the specified .Em interpreter . If the optional .Em arg is specified, it becomes the first argument to the .Em interpreter , and the name of the originally .Sy execve Ap d file becomes the second argument; otherwise, the name of the originally .Sy execve Ap d file becomes the first argument. The original arguments are shifted over to become the subsequent arguments. The zeroth argument is set to the specified .Em interpreter . .Pp The argument .Fa argv is a pointer to a null-terminated array of character pointers to null-terminated character strings. These strings construct the argument list to be made available to the new process. 
At least one argument must be present in the array; by custom, the first element should be the name of the executed program (for example, the last component of .Fa path ) . .Pp The argument .Fa envp is also a pointer to a null-terminated array of character pointers to null-terminated strings. A pointer to this array is normally stored in the global variable .Va environ . These strings pass information to the new process that is not directly an argument to the command (see .Xr environ 7 ) . .Pp File descriptors open in the calling process image remain open in the new process image, except for those for which the close-on-exec flag is set (see .Xr close 2 and .Xr fcntl 2 ) . Descriptors that remain open are unaffected by .Fn execve . If any of the standard descriptors (0, 1, and/or 2) are closed at the time .Fn execve is called, and the process will gain privilege as a result of set-id semantics, those descriptors will be re-opened automatically. No programs, whether privileged or not, should assume that these descriptors will remain closed across a call to .Fn execve . .Pp Signals set to be ignored in the calling process are set to be ignored in the new process. Signals which are set to be caught in the calling process image are set to default action in the new process image. Blocked signals remain blocked regardless of changes to the signal action. The signal stack is reset to be undefined (see .Xr sigaction 2 for more information). .Pp If the set-user-ID mode bit of the new process image file is set (see .Xr chmod 2 ) , the effective user ID of the new process image is set to the owner ID of the new process image file. If the set-group-ID mode bit of the new process image file is set, the effective group ID of the new process image is set to the group ID of the new process image file. (The effective group ID is the first element of the group list.) 
The real user ID, real group ID and other group IDs of the new process image remain the same as the calling process image. After any set-user-ID and set-group-ID processing, the effective user ID is recorded as the saved set-user-ID, and the effective group ID is recorded as the saved set-group-ID. These values may be used in changing the effective IDs later (see .Xr setuid 2 ) . .Pp The set-ID bits are not honored if the respective file system has the .Cm nosuid option enabled or if the new process file is an interpreter file. Syscall tracing is disabled if effective IDs are changed. .Pp The new process also inherits the following attributes from the calling process: .Pp .Bl -column parent_process_ID -offset indent -compact .It process ID Ta see Xr getpid 2 .It parent process ID Ta see Xr getppid 2 .It process group ID Ta see Xr getpgrp 2 .It access groups Ta see Xr getgroups 2 .It working directory Ta see Xr chdir 2 .It root directory Ta see Xr chroot 2 .It control terminal Ta see Xr termios 4 .It resource usages Ta see Xr getrusage 2 .It interval timers Ta see Xr getitimer 2 .It resource limits Ta see Xr getrlimit 2 .It file mode mask Ta see Xr umask 2 .It signal mask Ta see Xr sigaction 2 , .Xr sigprocmask 2 .El .Pp When a program is executed as a result of an .Fn execve system call, it is entered as follows: .Bd -literal -offset indent main(argc, argv, envp) int argc; char **argv, **envp; .Ed .Pp where .Fa argc is the number of elements in .Fa argv (the ``arg count'') and .Fa argv points to the array of character pointers to the arguments themselves. .Pp The .Fn fexecve ignores the file offset of .Fa fd . Since execute permission is checked by .Fn fexecve , the file descriptor .Fa fd need not have been opened with the .Dv O_EXEC flag. However, if the file to be executed denies read permission for the process preparing to do the exec, the only way to provide the .Fa fd to .Fn fexecve is to use the .Dv O_EXEC flag when opening .Fa fd . 
Note that the file to be executed cannot be open for writing. .Sh RETURN VALUES As the .Fn execve system call overlays the current process image with a new process image, the successful call has no process to return to. If .Fn execve does return to the calling process, an error has occurred; the return value will be -1 and the global variable .Va errno is set to indicate the error. .Sh ERRORS The .Fn execve system call will fail and return to the calling process if: .Bl -tag -width Er .It Bq Er ENOTDIR A component of the path prefix is not a directory. .It Bq Er ENAMETOOLONG A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters. .It Bq Er ENOEXEC When invoking an interpreted script, the length of the first line, inclusive of the .Sy \&#! prefix and terminating newline, exceeds .Dv MAXSHELLCMDLEN characters. .It Bq Er ENOENT The new process file does not exist. .It Bq Er ELOOP Too many symbolic links were encountered in translating the pathname. .It Bq Er EACCES Search permission is denied for a component of the path prefix. .It Bq Er EACCES The new process file is not an ordinary file. .It Bq Er EACCES The new process file mode denies execute permission. .It Bq Er ENOEXEC The new process file has the appropriate access permission, but has an invalid magic number in its header. .It Bq Er ETXTBSY The new process file is a pure procedure (shared text) file that is currently open for writing by some process. .It Bq Er ENOMEM The new process requires more virtual memory than is allowed by the imposed maximum .Pq Xr getrlimit 2 . .It Bq Er E2BIG The number of bytes in the new process's argument list is larger than the system-imposed limit. This limit is specified by the .Xr sysctl 3 MIB variable .Dv KERN_ARGMAX . .It Bq Er EFAULT The new process file is not as long as indicated by the size values in its header. .It Bq Er EFAULT The .Fa path , .Fa argv , or .Fa envp arguments point to an illegal address.
.It Bq Er EIO An I/O error occurred while reading from the file system. .It Bq Er EINTEGRITY Corrupted data was detected while reading from the file system. .El .Pp In addition, the .Fn fexecve system call will fail and return to the calling process if: .Bl -tag -width Er .It Bq Er EBADF The .Fa fd argument is not a valid file descriptor open for executing. .El .Sh SEE ALSO .Xr ktrace 1 , .Xr _exit 2 , .Xr fork 2 , .Xr open 2 , .Xr execl 3 , .Xr exit 3 , .Xr sysctl 3 , .Xr a.out 5 , .Xr elf 5 , .Xr fdescfs 5 , .Xr environ 7 , .Xr mount 8 .Sh STANDARDS The .Fn execve system call conforms to .St -p1003.1-2001 , with the exception of reopening descriptors 0, 1, and/or 2 in certain circumstances. A future update of the Standard is expected to require this behavior, and it may become the default for non-privileged processes as well. .\" NB: update this caveat when TC1 is blessed. The support for executing interpreted programs is an extension. The .Fn fexecve system call conforms to The Open Group Extended API Set 2 specification. .Sh HISTORY The .Fn execve system call appeared in -.At V7 . +.At v7 . The .Fn fexecve system call appeared in .Fx 8.0 . .Sh CAVEATS If a program is .Em setuid to a non-super-user, but is executed when the real .Em uid is ``root'', then the program has some of the powers of a super-user as well. .Pp When executing an interpreted program through .Fn fexecve , the kernel supplies .Pa /dev/fd/n as a second argument to the interpreter, where .Ar n is the file descriptor passed in the .Fa fd argument to .Fn fexecve . For this construction to work correctly, the .Xr fdescfs 5 file system must be mounted on .Pa /dev/fd .
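The argv and envp conventions and the return behavior described for execve can be illustrated with a short sketch. It is not part of the manual page: run_sh_command is a hypothetical helper name, and the sketch assumes that /bin/sh exists and that the caller is content to pass its own environ to the child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (an assumption for illustration, not a library call):
 * run "/bin/sh -c cmd" in a child process and return the child's exit
 * status, or -1 if the child could not be created or did not exit normally.
 */
int
run_sh_command(const char *cmd)
{
	/* By custom argv[0] names the program; the array ends with NULL. */
	char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execve("/bin/sh", argv, environ);
		/* execve() returns only on failure; errno holds the reason. */
		_exit(127);
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Calling run_sh_command("exit 7") forks a child, replaces its image with the shell, and reports the shell's exit status to the parent. The code after execve() in the child runs only when the call fails, which is why it exits immediately rather than falling through.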
Index: head/lib/libc/sys/fhlink.2 =================================================================== --- head/lib/libc/sys/fhlink.2 (revision 368816) +++ head/lib/libc/sys/fhlink.2 (revision 368817) @@ -1,277 +1,277 @@ .\" SPDX-License-Identifier: BSD-2-Clause .\" .\" Copyright (c) 2018 Gandi .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" $FreeBSD$ .\" .Dd September 23, 2020 .Dt FHLINK 2 .Os .Sh NAME .Nm fhlink , .Nm fhlinkat .Nd make a hard file link .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft int .Fn fhlink "fhandle_t *fhp" "const char *to" .Ft int .Fn fhlinkat "fhandle_t *fhp" "int tofd" "const char *to" .Fc .Sh DESCRIPTION The .Fn fhlink system call atomically creates the specified directory entry (hard link) .Fa to with the attributes of the underlying object pointed at by .Fa fhp . If the link is successful: the link count of the underlying object is incremented; .Fa fhp and .Fa to share equal access and rights to the underlying object. .Pp If .Fa fhp is removed, the file .Fa to is not deleted and the link count of the underlying object is decremented. .Pp The object pointed at by the .Fa fhp argument must exist for the hard link to succeed and both .Fa fhp and .Fa to must be in the same file system. The .Fa fhp argument may not be a directory. .Pp The .Fn fhlinkat system call is equivalent to .Fn fhlink except in the case where .Fa to is a relative path. In this case a relative path .Fa to is interpreted relative to the directory associated with the file descriptor .Fa tofd instead of the current working directory. .Pp Values for .Fa flag are constructed by a bitwise-inclusive OR of flags from the following list, defined in .In fcntl.h : .Bl -tag -width indent .It Dv AT_SYMLINK_FOLLOW If .Fa fhp names a symbolic link, a new link for the target of the symbolic link is created. .It Dv AT_BENEATH Only allow linking to a file that is beneath the topping directory. See the description of the .Dv O_BENEATH flag in the .Xr open 2 manual page. .It Dv AT_RESOLVE_BENEATH Only walk paths below the topping directory. See the description of the .Dv O_RESOLVE_BENEATH flag in the .Xr open 2 manual page. .El .Pp If .Fn fhlinkat is passed the special value .Dv AT_FDCWD in the .Fa tofd parameter, the current working directory is used for the .Fa to argument.
If .Fa tofd has value .Dv AT_FDCWD , the behavior is identical to a call to .Fn link . Unless .Fa flag contains the .Dv AT_SYMLINK_FOLLOW flag, if .Fa fhp names a symbolic link, a new link is created for the symbolic link .Fa fhp and not its target. .Sh RETURN VALUES .Rv -std link .Sh ERRORS The .Fn fhlink system call will fail and no link will be created if: .Bl -tag -width Er .It Bq Er ENOTDIR A component of .Fa to prefix is not a directory. .It Bq Er ENAMETOOLONG A component of .Fa to exceeded 255 characters, or entire length of .Fa to name exceeded 1023 characters. .It Bq Er ENOENT A component of .Fa to prefix does not exist. .It Bq Er EOPNOTSUPP The file system containing the file pointed at by .Fa fhp does not support links. .It Bq Er EMLINK The link count of the file pointed at by .Fa fhp would exceed 32767. .It Bq Er EACCES A component of .Fa to prefix denies search permission. .It Bq Er EACCES The requested link requires writing in a directory with a mode that denies write permission. .It Bq Er ELOOP Too many symbolic links were encountered in translating one of the pathnames. .It Bq Er ENOENT The file pointed at by .Fa fhp does not exist. .It Bq Er EEXIST The link named by .Fa to does exist. .It Bq Er EPERM The file pointed at by .Fa fhp is a directory. .It Bq Er EPERM The file pointed at by .Fa fhp has its immutable or append-only flag set, see the .Xr chflags 2 manual page for more information. .It Bq Er EPERM The parent directory of the file named by .Fa to has its immutable flag set. .It Bq Er EXDEV The link named by .Fa to and the file pointed at by .Fa fhp are on different file systems. .It Bq Er ENOSPC The directory in which the entry for the new link is being placed cannot be extended because there is no space left on the file system containing the directory. 
.It Bq Er EDQUOT The directory in which the entry for the new link is being placed cannot be extended because the user's quota of disk blocks on the file system containing the directory has been exhausted. .It Bq Er EIO An I/O error occurred while reading from or writing to the file system to make the directory entry. .It Bq Er EINTEGRITY Corrupted data was detected while reading from the file system. .It Bq Er EROFS The requested link requires writing in a directory on a read-only file system. .It Bq Er EFAULT One of the pathnames specified is outside the process's allocated address space. .It Bq Er ESTALE The file handle .Fa fhp is no longer valid. .El .Pp In addition to the errors returned by the .Fn fhlink , the .Fn fhlinkat system call may fail if: .Bl -tag -width Er .It Bq Er EBADF The .Fa fhp or .Fa to argument does not specify an absolute path and the .Fa tofd argument is neither .Dv AT_FDCWD nor a valid file descriptor open for searching. .It Bq Er EINVAL The value of the .Fa flag argument is not valid. .It Bq Er ENOTDIR The .Fa fhp or .Fa to argument is not an absolute path and .Fa tofd is neither .Dv AT_FDCWD nor a file descriptor associated with a directory. .El .Sh SEE ALSO -.Xr fhstat 2 , -.Xr fhreadlink 2 , .Xr fhopen 2 , +.Xr fhreadlink 2 , +.Xr fhstat 2 Index: head/lib/libc/sys/open.2 =================================================================== --- head/lib/libc/sys/open.2 (revision 368816) +++ head/lib/libc/sys/open.2 (revision 368817) @@ -1,669 +1,669 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2.
Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)open.2 8.2 (Berkeley) 11/16/93 .\" $FreeBSD$ .\" .Dd September 23, 2020 .Dt OPEN 2 .Os .Sh NAME .Nm open , openat .Nd open or create a file for reading, writing or executing .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In fcntl.h .Ft int .Fn open "const char *path" "int flags" "..." .Ft int .Fn openat "int fd" "const char *path" "int flags" "..." .Sh DESCRIPTION The file name specified by .Fa path is opened for either execution or reading and/or writing as specified by the argument .Fa flags and the file descriptor returned to the calling process. The .Fa flags argument may indicate the file is to be created if it does not exist (by specifying the .Dv O_CREAT flag). 
In this case .Fn open and .Fn openat require an additional argument .Fa "mode_t mode" , and the file is created with mode .Fa mode as described in .Xr chmod 2 and modified by the process' umask value (see .Xr umask 2 ) . .Pp The .Fn openat function is equivalent to the .Fn open function except in the case where the .Fa path specifies a relative path, or the .Dv O_BENEATH flag is provided. For .Fn openat and relative .Fa path , the file to be opened is determined relative to the directory associated with the file descriptor .Fa fd instead of the current working directory. The .Fa flag parameter and the optional fourth parameter correspond exactly to the parameters of .Fn open . If .Fn openat is passed the special value .Dv AT_FDCWD in the .Fa fd parameter, the current working directory is used and the behavior is identical to a call to .Fn open . .Pp When .Fn openat is called with an absolute .Fa path without the .Dv O_BENEATH flag, it ignores the .Fa fd argument. When .Dv O_BENEATH is specified with an absolute .Fa path , a directory passed by the .Fa fd argument is used as the topping point for the resolution. When .Dv O_BENEATH is specified with a relative path, the .Fa fd argument is used both as the starting point, and as the topping point for the resolution. See the definition of the .Dv O_BENEATH flag below. .Pp In .Xr capsicum 4 capability mode, .Fn open is not permitted. The .Fa path argument to .Fn openat must be strictly relative to a file descriptor .Fa fd , as defined in .Pa sys/kern/vfs_lookup.c . .Fa path must not be an absolute path and must not contain ".." components which cause the path resolution to escape the directory hierarchy starting at .Fa fd . Additionally, no symbolic link in .Fa path may target absolute path or contain escaping ".." components. .Fa fd must not be .Dv AT_FDCWD . .Pp If the .Dv vfs.lookup_cap_dotdot .Xr sysctl 3 MIB is set to zero, ".." 
components in the paths, used in capability mode, or with the .Dv O_BENEATH flag, are completely disabled. If the .Dv vfs.lookup_cap_dotdot_nonlocal MIB is set to zero, ".." is not allowed if found on non-local filesystem. .Pp The flags specified are formed by .Em or Ns 'ing the following values .Pp .Bd -literal -offset indent -compact O_RDONLY open for reading only O_WRONLY open for writing only O_RDWR open for reading and writing O_EXEC open for execute only O_SEARCH open for search only, an alias for O_EXEC O_NONBLOCK do not block on open O_APPEND append on each write O_CREAT create file if it does not exist O_TRUNC truncate size to 0 O_EXCL error if create and file exists O_SHLOCK atomically obtain a shared lock O_EXLOCK atomically obtain an exclusive lock O_DIRECT eliminate or reduce cache effects O_FSYNC synchronous writes O_SYNC synchronous writes O_NOFOLLOW do not follow symlinks O_NOCTTY ignored O_TTY_INIT ignored O_DIRECTORY error if file is not a directory O_CLOEXEC set FD_CLOEXEC upon open O_VERIFY verify the contents of the file O_BENEATH require resolved path to be strictly relative to topping directory O_RESOLVE_BENEATH require walked path to be strictly relative to topping directory .Ed .Pp Opening a file with .Dv O_APPEND set causes each write on the file to be appended to the end. If .Dv O_TRUNC is specified and the file exists, the file is truncated to zero length. If .Dv O_EXCL is set with .Dv O_CREAT and the file already exists, .Fn open returns an error. This may be used to implement a simple exclusive access locking mechanism. If .Dv O_EXCL is set and the last component of the pathname is a symbolic link, .Fn open will fail even if the symbolic link points to a non-existent name. If the .Dv O_NONBLOCK flag is specified and the .Fn open system call would result in the process being blocked for some reason (e.g., waiting for carrier on a dialup line), .Fn open returns immediately. 
The descriptor remains in non-blocking mode for subsequent operations. .Pp If .Dv O_FSYNC is used in the mask, all writes will immediately and synchronously be written to disk. .Pp .Dv O_SYNC is a synonym for .Dv O_FSYNC required by .Tn POSIX . .Pp If .Dv O_NOFOLLOW is used in the mask and the target file passed to .Fn open is a symbolic link then the .Fn open will fail. .Pp When opening a file, a lock with .Xr flock 2 semantics can be obtained by setting .Dv O_SHLOCK for a shared lock, or .Dv O_EXLOCK for an exclusive lock. If creating a file with .Dv O_CREAT , the request for the lock will never fail (provided that the underlying file system supports locking). .Pp .Dv O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care. .Pp .Dv O_NOCTTY may be used to ensure the OS does not assign this file as the controlling terminal when it opens a tty device. This is the default on .Fx , but is present for .Tn POSIX compatibility. The .Fn open system call will not assign controlling terminals on .Fx . .Pp .Dv O_TTY_INIT may be used to ensure the OS restores the terminal attributes when initially opening a TTY. This is the default on .Fx , but is present for .Tn POSIX compatibility. The initial call to .Fn open on a TTY will always restore default terminal attributes on .Fx . .Pp .Dv O_DIRECTORY may be used to ensure the resulting file descriptor refers to a directory. This flag can be used to prevent applications with elevated privileges from opening files which are even unsafe to open with .Dv O_RDONLY , such as device nodes. .Pp .Dv O_CLOEXEC may be used to set .Dv FD_CLOEXEC flag for the newly returned file descriptor. 
.Pp .Dv O_VERIFY may be used to indicate to the kernel that the contents of the file should be verified before allowing the open to proceed. The details of what .Dq verified means is implementation specific. The run-time linker (rtld) uses this flag to ensure shared objects have been verified before operating on them. .Pp .Dv O_BENEATH returns .Er ENOTCAPABLE if the specified path, after resolving all symlinks and ".." references, does not end up with tail residing in the directory hierarchy of children beneath the topping directory. Topping directory is the process current directory if relative .Fa path is used for .Fn open , and the directory referenced by the .Fa fd argument when using .Fn openat . .Dv O_BENEATH allows arbitrary prefix that ends up at the topping directory, after which all further resolved components must be under it. .Pp .Dv O_RESOLVE_BENEATH returns .Er ENOTCAPABLE if any intermediate component of the specified relative path does not reside in the directory hierarchy beneath the topping directory. Comparing to -.Dv O_BENEATH, +.Dv O_BENEATH , absolute paths or even the temporal escape from beneath of the topping directory is not allowed. .Pp When .Fa fd is opened with .Dv O_SEARCH , execute permissions are checked at open time. The .Fa fd may not be used for any read operations like .Xr getdirentries 2 . The primary use for this descriptor will be as the lookup descriptor for the .Fn *at family of functions. .Pp If successful, .Fn open returns a non-negative integer, termed a file descriptor. It returns \-1 on failure. The file pointer used to mark the current position within the file is set to the beginning of the file. .Pp If a sleeping open of a device node from .Xr devfs 5 is interrupted by a signal, the call always fails with .Er EINTR , even if the .Dv SA_RESTART flag is set for the signal. A sleeping open of a fifo (see .Xr mkfifo 2 ) is restarted as normal. 
.Pp When a new file is created it is given the group of the directory which contains it. .Pp Unless .Dv O_CLOEXEC flag was specified, the new descriptor is set to remain open across .Xr execve 2 system calls; see .Xr close 2 , .Xr fcntl 2 and .Dv O_CLOEXEC description. .Pp The system imposes a limit on the number of file descriptors open simultaneously by one process. The .Xr getdtablesize 2 system call returns the current system limit. .Sh RETURN VALUES If successful, .Fn open and .Fn openat return a non-negative integer, termed a file descriptor. They return \-1 on failure, and set .Va errno to indicate the error. .Sh ERRORS The named file is opened unless: .Bl -tag -width Er .It Bq Er ENOTDIR A component of the path prefix is not a directory. .It Bq Er ENAMETOOLONG A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters. .It Bq Er ENOENT .Dv O_CREAT is not set and the named file does not exist. .It Bq Er ENOENT A component of the path name that must exist does not exist. .It Bq Er EACCES Search permission is denied for a component of the path prefix. .It Bq Er EACCES The required permissions (for reading and/or writing) are denied for the given flags. .It Bq Er EACCES .Dv O_TRUNC is specified and write permission is denied. .It Bq Er EACCES .Dv O_CREAT is specified, the file does not exist, and the directory in which it is to be created does not permit writing. .It Bq Er EPERM .Dv O_CREAT is specified, the file does not exist, and the directory in which it is to be created has its immutable flag set, see the .Xr chflags 2 manual page for more information. .It Bq Er EPERM The named file has its immutable flag set and the file is to be modified. .It Bq Er EPERM The named file has its append-only flag set, the file is to be modified, and .Dv O_TRUNC is specified or .Dv O_APPEND is not specified. .It Bq Er ELOOP Too many symbolic links were encountered in translating the pathname. 
.It Bq Er EISDIR The named file is a directory, and the arguments specify it is to be modified. .It Bq Er EISDIR The named file is a directory, and the flags specified .Dv O_CREAT without .Dv O_DIRECTORY . .It Bq Er EROFS The named file resides on a read-only file system, and the file is to be modified. .It Bq Er EROFS .Dv O_CREAT is specified and the named file would reside on a read-only file system. .It Bq Er EMFILE The process has already reached its limit for open file descriptors. .It Bq Er ENFILE The system file table is full. .It Bq Er EMLINK .Dv O_NOFOLLOW was specified and the target is a symbolic link. .It Bq Er ENXIO The named file is a character special or block special file, and the device associated with this special file does not exist. .It Bq Er ENXIO .Dv O_NONBLOCK is set, the named file is a fifo, .Dv O_WRONLY is set, and no process has the file open for reading. .It Bq Er EINTR The .Fn open operation was interrupted by a signal. .It Bq Er EOPNOTSUPP .Dv O_SHLOCK or .Dv O_EXLOCK is specified but the underlying file system does not support locking. .It Bq Er EOPNOTSUPP The named file is a special file mounted through a file system that does not support access to it (e.g.\& NFS). .It Bq Er EWOULDBLOCK .Dv O_NONBLOCK and one of .Dv O_SHLOCK or .Dv O_EXLOCK is specified and the file is locked. .It Bq Er ENOSPC .Dv O_CREAT is specified, the file does not exist, and the directory in which the entry for the new file is being placed cannot be extended because there is no space left on the file system containing the directory. .It Bq Er ENOSPC .Dv O_CREAT is specified, the file does not exist, and there are no free inodes on the file system on which the file is being created. .It Bq Er EDQUOT .Dv O_CREAT is specified, the file does not exist, and the directory in which the entry for the new file is being placed cannot be extended because the user's quota of disk blocks on the file system containing the directory has been exhausted. 
.It Bq Er EDQUOT .Dv O_CREAT is specified, the file does not exist, and the user's quota of inodes on the file system on which the file is being created has been exhausted. .It Bq Er EIO An I/O error occurred while making the directory entry or allocating the inode for .Dv O_CREAT . .It Bq Er EINTEGRITY Corrupted data was detected while reading from the file system. .It Bq Er ETXTBSY The file is a pure procedure (shared text) file that is being executed and the .Fn open system call requests write access. .It Bq Er EFAULT The .Fa path argument points outside the process's allocated address space. .It Bq Er EEXIST .Dv O_CREAT and .Dv O_EXCL were specified and the file exists. .It Bq Er EOPNOTSUPP An attempt was made to open a socket (not currently implemented). .It Bq Er EINVAL An attempt was made to open a descriptor with an illegal combination of .Dv O_RDONLY , .Dv O_WRONLY , or .Dv O_RDWR , and .Dv O_EXEC or .Dv O_SEARCH . .It Bq Er EINVAL The .Dv O_RESOLVE_BENEATH flag is specified and .Dv path is absolute. .It Bq Er EBADF The .Fa path argument does not specify an absolute path and the .Fa fd argument is neither .Dv AT_FDCWD nor a valid file descriptor open for searching. .It Bq Er ENOTDIR The .Fa path argument is not an absolute path and .Fa fd is neither .Dv AT_FDCWD nor a file descriptor associated with a directory. .It Bq Er ENOTDIR .Dv O_DIRECTORY is specified and the file is not a directory. .It Bq Er ECAPMODE .Dv AT_FDCWD is specified and the process is in capability mode. .It Bq Er ECAPMODE .Fn open was called and the process is in capability mode. .It Bq Er ENOTCAPABLE .Fa path is an absolute path, or contained a ".." component leading to a directory outside of the directory hierarchy specified by .Fa fd , and the process is in capability mode. .It Bq Er ENOTCAPABLE The .Dv O_BENEATH flag was provided, and the absolute .Fa path does not have its tail fully contained under the topping directory, or the relative .Fa path escapes it. 
.It Bq Er ENOTCAPABLE The .Dv O_RESOLVE_BENEATH flag was provided, and the relative .Fa path escapes topping directory. .El .Sh SEE ALSO .Xr chmod 2 , .Xr close 2 , .Xr dup 2 , .Xr fexecve 2 , .Xr fhopen 2 , .Xr getdtablesize 2 , .Xr getfh 2 , .Xr lgetfh 2 , .Xr lseek 2 , .Xr read 2 , .Xr umask 2 , .Xr write 2 , .Xr fopen 3 , .Xr capsicum 4 .Sh STANDARDS These functions are specified by .St -p1003.1-2008 . .Fx sets .Va errno to .Er EMLINK instead of .Er ELOOP as specified by .Tn POSIX when .Dv O_NOFOLLOW is set in flags and the final component of pathname is a symbolic link to distinguish it from the case of too many symbolic link traversals in one of its non-final components. .Sh HISTORY The .Fn open function appeared in .At v1 . The .Fn openat function was introduced in .Fx 8.0 . .Sh BUGS The Open Group Extended API Set 2 specification requires that the test for whether .Fa fd is searchable is based on whether .Fa fd is open for searching, not whether the underlying directory currently permits searches. The present implementation of the .Fa openat checks the current permissions of directory instead. .Pp The .Fa mode argument is variadic and may result in different calling conventions than might otherwise be expected. Index: head/lib/libc/sys/pdfork.2 =================================================================== --- head/lib/libc/sys/pdfork.2 (revision 368816) +++ head/lib/libc/sys/pdfork.2 (revision 368817) @@ -1,188 +1,188 @@ .\" .\" Copyright (c) 2009-2010, 2012-2013 Robert N. M. Watson .\" All rights reserved. .\" .\" This software was developed at the University of Cambridge Computer .\" Laboratory with support from a grant from Google, Inc. .\" .\" This software was developed by SRI International and the University of .\" Cambridge Computer Laboratory under DARPA/AFRL contract (FA8750-10-C-0237) .\" ("CTSRD"), as part of the DARPA CRASH research programme. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd October 14, 2018 .Dt PDFORK 2 .Os .Sh NAME .Nm pdfork , .Nm pdgetpid , .Nm pdkill .Nd System calls to manage process descriptors .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/procdesc.h .Ft pid_t .Fn pdfork "int *fdp" "int flags" .Ft int .Fn pdgetpid "int fd" "pid_t *pidp" .Ft int .Fn pdkill "int fd" "int signum" .Sh DESCRIPTION Process descriptors are special file descriptors that represent processes, and are created using .Fn pdfork , a variant of .Xr fork 2 , which, if successful, returns a process descriptor in the integer pointed to by .Fa fdp . Processes created via .Fn pdfork will not cause .Dv SIGCHLD on termination. 
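As a minimal sketch of the interface in the synopsis above, the following C fragment creates a child with
.Fn pdfork ,
confirms with
.Fn pdgetpid
that the descriptor refers to that child, and then closes the descriptor.
The helper name
.Fn pdfork_roundtrip
is hypothetical, and the non-FreeBSD branch is only an assumption added so the fragment compiles on other systems.

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/procdesc.h>
#endif

/*
 * Hypothetical helper (not part of libc): fork a child with pdfork(),
 * check that the process descriptor maps back to the child's PID,
 * then drop the descriptor.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
pdfork_roundtrip(void)
{
#ifdef __FreeBSD__
	pid_t pid, queried;
	int fd, saved;

	pid = pdfork(&fd, 0);
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(0);		/* child: nothing to do */

	/* Parent: fd now holds the process descriptor. */
	if (pdgetpid(fd, &queried) == -1) {
		saved = errno;
		(void)close(fd);
		errno = saved;
		return (-1);
	}
	(void)close(fd);		/* last reference to the descriptor */
	if (queried != pid) {
		errno = EINVAL;
		return (-1);
	}
	return (0);
#else
	/* Assumption: process descriptors are unavailable here. */
	errno = ENOSYS;
	return (-1);
#endif
}
```

A caller would typically keep the descriptor open for as long as it needs to signal or wait for the child, rather than closing it immediately as this sketch does.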
.Fn pdfork can accept the flags: .Bl -tag -width ".Dv PD_DAEMON" .It Dv PD_DAEMON Instead of the default terminate-on-close behaviour, allow the process to live until it is explicitly killed with .Xr kill 2 . .Pp This option is not permitted in .Xr capsicum 4 capability mode (see .Xr cap_enter 2 ) . .El .Bl -tag -width ".Dv PD_DAEMON" .It Dv PD_CLOEXEC Set close-on-exec on process descriptor. .El .Pp .Fn pdgetpid queries the process ID (PID) in the process descriptor .Fa fd . .Pp .Fn pdkill is functionally identical to .Xr kill 2 , except that it accepts a process descriptor, .Fa fd , rather than a PID. .Pp The following system calls also have effects specific to process descriptors: .Pp .Xr fstat 2 queries status of a process descriptor; currently only the .Fa st_mode , .Fa st_birthtime , .Fa st_atime , .Fa st_ctime and .Fa st_mtime fields are defined. If the owner read, write, and execute bits are set then the process represented by the process descriptor is still alive. .Pp .Xr poll 2 and .Xr select 2 allow waiting for process state transitions; currently only .Dv POLLHUP is defined, and will be raised when the process dies. Process state transitions can also be monitored using .Xr kqueue 2 filter .Dv EVFILT_PROCDESC ; currently only .Dv NOTE_EXIT is implemented. .Pp .Xr close 2 will close the process descriptor unless .Dv PD_DAEMON is set; if the process is still alive and this is the last reference to the process descriptor, the process will be terminated with the signal .Dv SIGKILL . .Sh RETURN VALUES .Fn pdfork returns a PID, 0 or -1, as .Xr fork 2 does. .Pp .Fn pdgetpid and .Fn pdkill return 0 on success and -1 on failure. .Sh ERRORS These functions may return the same error numbers as their PID-based equivalents (e.g. .Fn pdfork may return the same error numbers as .Xr fork 2 ) , with the following additions: .Bl -tag -width Er .It Bq Er EINVAL The signal number given to .Fn pdkill is invalid. 
.It Bq Er ENOTCAPABLE The process descriptor being operated on has insufficient rights (e.g. .Dv CAP_PDKILL for .Fn pdkill ) . .El .Sh SEE ALSO .Xr close 2 , .Xr fork 2 , .Xr fstat 2 , .Xr kill 2 , -.Xr poll 2 , .Xr kqueue 2 , +.Xr poll 2 , .Xr wait4 2 , .Xr capsicum 4 , .Xr procdesc 4 .Sh HISTORY The .Fn pdfork , .Fn pdgetpid , and .Fn pdkill system calls first appeared in .Fx 9.0 . .Pp Support for process descriptors mode was developed as part of the .Tn TrustedBSD Project. .Sh AUTHORS .An -nosplit These functions and the capability facility were created by .An Robert N. M. Watson Aq Mt rwatson@FreeBSD.org and .An Jonathan Anderson Aq Mt jonathan@FreeBSD.org at the University of Cambridge Computer Laboratory with support from a grant from Google, Inc. Index: head/lib/libc/sys/ptrace.2 =================================================================== --- head/lib/libc/sys/ptrace.2 (revision 368816) +++ head/lib/libc/sys/ptrace.2 (revision 368817) @@ -1,1173 +1,1171 @@ .\" $FreeBSD$ .\" $NetBSD: ptrace.2,v 1.2 1995/02/27 12:35:37 cgd Exp $ .\" .\" This file is in the public domain. .Dd July 15, 2019 .Dt PTRACE 2 .Os .Sh NAME .Nm ptrace .Nd process tracing and debugging .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/ptrace.h .Ft int .Fn ptrace "int request" "pid_t pid" "caddr_t addr" "int data" .Sh DESCRIPTION The .Fn ptrace system call provides tracing and debugging facilities. It allows one process (the .Em tracing process) to control another (the .Em traced process). The tracing process must first attach to the traced process, and then issue a series of .Fn ptrace system calls to control the execution of the process, as well as access process memory and register state. For the duration of the tracing session, the traced process will be .Dq re-parented , with its parent process ID (and resulting behavior) changed to the tracing process. It is permissible for a tracing process to attach to more than one other process at a time. 
When the tracing process has completed its work, it must detach the traced process; if a tracing process exits without first detaching all processes it has attached, those processes will be killed. .Pp Most of the time, the traced process runs normally, but when it receives a signal (see .Xr sigaction 2 ) , it stops. The tracing process is expected to notice this via .Xr wait 2 or the delivery of a .Dv SIGCHLD signal, examine the state of the stopped process, and cause it to terminate or continue as appropriate. The signal may be a normal process signal, generated as a result of traced process behavior, or use of the .Xr kill 2 system call; alternatively, it may be generated by the tracing facility as a result of attaching, stepping by the tracing process, or an event in the traced process. The tracing process may choose to intercept the signal, using it to observe process behavior (such as .Dv SIGTRAP ) , or forward the signal to the process if appropriate. The .Fn ptrace system call is the mechanism by which all this happens. .Pp A traced process may report additional signal stops corresponding to events in the traced process. These additional signal stops are reported as .Dv SIGTRAP or .Dv SIGSTOP signals. The tracing process can use the .Dv PT_LWPINFO request to determine which events are associated with a .Dv SIGTRAP or .Dv SIGSTOP signal. Note that multiple events may be associated with a single signal. For example, events indicated by the .Dv PL_FLAG_BORN , .Dv PL_FLAG_FORKED , and .Dv PL_FLAG_EXEC flags are also reported as a system call exit event .Pq Dv PL_FLAG_SCX . The signal stop for a new child process enabled via .Dv PTRACE_FORK will report a .Dv SIGSTOP signal. All other additional signal stops use .Dv SIGTRAP . .Pp Each traced process has a tracing event mask. An event in the traced process only reports a signal stop if the corresponding flag is set in the tracing event mask. 
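The stop reporting described above can be sketched as follows: a child requests tracing with
.Dv PT_TRACE_ME ,
the parent waits for the first stop, and
.Dv PT_LWPINFO
is used to check whether that stop carries
.Dv PL_FLAG_EXEC .
The helper name
.Fn first_stop_is_exec
is hypothetical, and the non-FreeBSD branch is an assumption added only so the fragment compiles elsewhere.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/ptrace.h>
#endif

/*
 * Hypothetical helper: run `path` under tracing and inspect the first
 * reported stop.  Returns 1 if PL_FLAG_EXEC is set for that stop,
 * 0 if it is not, and -1 on error.
 */
int
first_stop_is_exec(const char *path)
{
#ifdef __FreeBSD__
	struct ptrace_lwpinfo pl;
	pid_t pid;
	int result, status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: declare that the parent will trace us. */
		(void)ptrace(PT_TRACE_ME, 0, NULL, 0);
		execl(path, path, (char *)NULL);
		_exit(127);		/* execl() failed */
	}

	/* Parent: wait for the child to report a stop. */
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	if (!WIFSTOPPED(status))
		return (-1);		/* child exited instead of stopping */

	result = -1;
	if (ptrace(PT_LWPINFO, pid, (caddr_t)&pl, sizeof(pl)) == 0)
		result = (pl.pl_flags & PL_FLAG_EXEC) != 0;

	/* End the session; the traced child is terminated. */
	(void)ptrace(PT_KILL, pid, NULL, 0);
	(void)waitpid(pid, &status, 0);
	return (result);
#else
	/* Assumption: this ptrace interface is unavailable here. */
	(void)path;
	errno = ENOSYS;
	return (-1);
#endif
}
```

A real debugger would continue the child with
.Dv PT_CONTINUE
and loop over further stops instead of killing it after the first one.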
The current set of tracing event flags include: .Bl -tag -width "Dv PTRACE_SYSCALL" .It Dv PTRACE_EXEC Report a stop for a successful invocation of .Xr execve 2 . This event is indicated by the .Dv PL_FLAG_EXEC flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SCE Report a stop on each system call entry. This event is indicated by the .Dv PL_FLAG_SCE flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SCX Report a stop on each system call exit. This event is indicated by the .Dv PL_FLAG_SCX flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SYSCALL Report stops for both system call entry and exit. .It Dv PTRACE_FORK This event flag controls tracing for new child processes of a traced process. .Pp When this event flag is enabled, new child processes will enable tracing and stop before executing their first instruction. The new child process will include the .Dv PL_FLAG_CHILD flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . The traced process will report a stop that includes the .Dv PL_FLAG_FORKED flag. The process ID of the new child process will also be present in the .Va pl_child_pid member of .Vt "struct ptrace_lwpinfo" . If the new child process was created via .Xr vfork 2 , the traced process's stop will also include the .Dv PL_FLAG_VFORKED flag. Note that new child processes will be attached with the default tracing event mask; they do not inherit the event mask of the traced process. .Pp When this event flag is not enabled, new child processes will execute without tracing enabled. .It Dv PTRACE_LWP This event flag controls tracing of LWP .Pq kernel thread creation and destruction. When this event is enabled, new LWPs will stop and report an event with .Dv PL_FLAG_BORN set before executing their first instruction, and exiting LWPs will stop and report an event with .Dv PL_FLAG_EXITED set before completing their termination. 
.Pp Note that new processes do not report an event for the creation of their initial thread, and exiting processes do not report an event for the termination of the last thread. .It Dv PTRACE_VFORK Report a stop event when a parent process resumes after a .Xr vfork 2 . .Pp When a thread in the traced process creates a new child process via .Xr vfork 2 , the stop that reports .Dv PL_FLAG_FORKED and .Dv PL_FLAG_SCX occurs just after the child process is created, but before the thread waits for the child process to stop sharing process memory. If a debugger is not tracing the new child process, it must ensure that no breakpoints are enabled in the shared process memory before detaching from the new child process. This means that no breakpoints are enabled in the parent process either. .Pp The .Dv PTRACE_VFORK flag enables a new stop that indicates when the new child process stops sharing the process memory of the parent process. A debugger can reinsert breakpoints in the parent process and resume it in response to this event. This event is indicated by setting the .Dv PL_FLAG_VFORK_DONE flag. .El .Pp The default tracing event mask when attaching to a process via .Dv PT_ATTACH , .Dv PT_TRACE_ME , or .Dv PTRACE_FORK includes only .Dv PTRACE_EXEC events. All other event flags are disabled. .Pp The .Fa request argument specifies what operation is being performed; the meaning of the rest of the arguments depends on the operation, but except for one special case noted below, all .Fn ptrace calls are made by the tracing process, and the .Fa pid argument specifies the process ID of the traced process or a corresponding thread ID. The .Fa request argument can be: .Bl -tag -width "Dv PT_GET_EVENT_MASK" .It Dv PT_TRACE_ME This request is the only one used by the traced process; it declares that the process expects to be traced by its parent. All the other arguments are ignored. 
(If the parent process does not expect to trace the child, it will probably be rather confused by the results; once the traced process stops, it cannot be made to continue except via .Fn ptrace . ) When a process has used this request and calls .Xr execve 2 or any of the routines built on it (such as .Xr execv 3 ) , it will stop before executing the first instruction of the new image. Also, any setuid or setgid bits on the executable being executed will be ignored. If the child was created by .Xr vfork 2 system call or .Xr rfork 2 call with the .Dv RFMEM flag specified, the debugging events are reported to the parent only after the .Xr execve 2 is executed. .It Dv PT_READ_I , Dv PT_READ_D These requests read a single .Vt int of data from the traced process's address space. Traditionally, .Fn ptrace has allowed for machines with distinct address spaces for instruction and data, which is why there are two requests: conceptually, .Dv PT_READ_I reads from the instruction space and .Dv PT_READ_D reads from the data space. In the current .Fx implementation, these two requests are completely identical. The .Fa addr argument specifies the address (in the traced process's virtual address space) at which the read is to be done. This address does not have to meet any alignment constraints. The value read is returned as the return value from .Fn ptrace . .It Dv PT_WRITE_I , Dv PT_WRITE_D These requests parallel .Dv PT_READ_I and .Dv PT_READ_D , except that they write rather than read. The .Fa data argument supplies the value to be written. .It Dv PT_IO This request allows reading and writing arbitrary amounts of data in the traced process's address space. The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_io_desc" , which is defined as follows: .Bd -literal struct ptrace_io_desc { int piod_op; /* I/O operation */ void *piod_offs; /* child offset */ void *piod_addr; /* parent offset */ size_t piod_len; /* request length */ }; /* * Operations in piod_op. 
*/ #define PIOD_READ_D 1 /* Read from D space */ #define PIOD_WRITE_D 2 /* Write to D space */ #define PIOD_READ_I 3 /* Read from I space */ #define PIOD_WRITE_I 4 /* Write to I space */ .Ed .Pp The .Fa data argument is ignored. The actual number of bytes read or written is stored in .Va piod_len upon return. .It Dv PT_CONTINUE The traced process continues execution. The .Fa addr argument is an address specifying the place where execution is to be resumed (a new value for the program counter), or .Po Vt caddr_t Pc Ns 1 to indicate that execution is to pick up where it left off. The .Fa data argument provides a signal number to be delivered to the traced process as it resumes execution, or 0 if no signal is to be sent. .It Dv PT_STEP The traced process is single stepped one instruction. The .Fa addr argument should be passed .Po Vt caddr_t Pc Ns 1 . The .Fa data argument provides a signal number to be delivered to the traced process as it resumes execution, or 0 if no signal is to be sent. .It Dv PT_KILL The traced process terminates, as if .Dv PT_CONTINUE had been used with .Dv SIGKILL given as the signal to be delivered. .It Dv PT_ATTACH This request allows a process to gain control of an otherwise unrelated process and begin tracing it. It does not need any cooperation from the to-be-traced process. In this case, .Fa pid specifies the process ID of the to-be-traced process, and the other two arguments are ignored. This request requires that the target process must have the same real UID as the tracing process, and that it must not be executing a setuid or setgid executable. (If the tracing process is running as root, these restrictions do not apply.) The tracing process will see the newly-traced process stop and may then control it as if it had been traced all along. 
.It Dv PT_DETACH This request is like PT_CONTINUE, except that it does not allow specifying an alternate place to continue execution, and after it succeeds, the traced process is no longer traced and continues execution normally. .It Dv PT_GETREGS This request reads the traced process's machine registers into the .Do .Vt "struct reg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETREGS This request is the converse of .Dv PT_GETREGS ; it loads the traced process's machine registers from the .Do .Vt "struct reg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_GETFPREGS This request reads the traced process's floating-point registers into the .Do .Vt "struct fpreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETFPREGS This request is the converse of .Dv PT_GETFPREGS ; it loads the traced process's floating-point registers from the .Do .Vt "struct fpreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_GETDBREGS This request reads the traced process's debug registers into the .Do .Vt "struct dbreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETDBREGS This request is the converse of .Dv PT_GETDBREGS ; it loads the traced process's debug registers from the .Do .Vt "struct dbreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_LWPINFO This request can be used to obtain information about the kernel thread, also known as light-weight process, that caused the traced process to stop. The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_lwpinfo" , which is defined as follows: .Bd -literal struct ptrace_lwpinfo { lwpid_t pl_lwpid; int pl_event; int pl_flags; sigset_t pl_sigmask; sigset_t pl_siglist; siginfo_t pl_siginfo; char pl_tdname[MAXCOMLEN + 1]; pid_t pl_child_pid; u_int pl_syscall_code; u_int pl_syscall_narg; }; .Ed .Pp The .Fa data argument is to be set to the size of the structure known to the caller. 
This allows the structure to grow without affecting older programs. .Pp The fields in the .Vt "struct ptrace_lwpinfo" have the following meaning: .Bl -tag -width indent -compact .It Va pl_lwpid LWP id of the thread .It Va pl_event Event that caused the stop. Currently defined events are: .Bl -tag -width "Dv PL_EVENT_SIGNAL" -compact .It Dv PL_EVENT_NONE No reason given .It Dv PL_EVENT_SIGNAL Thread stopped due to the pending signal .El .It Va pl_flags Flags that specify additional details about observed stop. Currently defined flags are: .Bl -tag -width indent -compact .It Dv PL_FLAG_SCE The thread stopped due to system call entry, right after the kernel is entered. The debugger may examine syscall arguments that are stored in memory and registers according to the ABI of the current process, and modify them, if needed. .It Dv PL_FLAG_SCX The thread is stopped immediately before syscall is returning to the usermode. The debugger may examine system call return values in the ABI-defined registers and/or memory. .It Dv PL_FLAG_EXEC When .Dv PL_FLAG_SCX is set, this flag may be additionally specified to inform that the program being executed by debuggee process has been changed by successful execution of a system call from the .Fn execve 2 family. .It Dv PL_FLAG_SI Indicates that .Va pl_siginfo member of .Vt "struct ptrace_lwpinfo" contains valid information. .It Dv PL_FLAG_FORKED Indicates that the process is returning from a call to .Fn fork 2 that created a new child process. The process identifier of the new process is available in the .Va pl_child_pid member of .Vt "struct ptrace_lwpinfo" . .It Dv PL_FLAG_CHILD The flag is set for first event reported from a new child which is automatically attached when .Dv PTRACE_FORK is enabled. .It Dv PL_FLAG_BORN This flag is set for the first event reported from a new LWP when .Dv PTRACE_LWP is enabled. It is reported along with .Dv PL_FLAG_SCX . 
.It Dv PL_FLAG_EXITED This flag is set for the last event reported by an exiting LWP when .Dv PTRACE_LWP is enabled. Note that this event is not reported when the last LWP in a process exits. The termination of the last thread is reported via a normal process exit event. .It Dv PL_FLAG_VFORKED Indicates that the thread is returning from a call to .Xr vfork 2 that created a new child process. This flag is set in addition to .Dv PL_FLAG_FORKED . .It Dv PL_FLAG_VFORK_DONE Indicates that the thread has resumed after a child process created via .Xr vfork 2 has stopped sharing its address space with the traced process. .El .It Va pl_sigmask The current signal mask of the LWP .It Va pl_siglist The current pending set of signals for the LWP. Note that signals that are delivered to the process would not appear on an LWP siglist until the thread is selected for delivery. .It Va pl_siginfo The siginfo that accompanies the signal pending. Only valid for .Dv PL_EVENT_SIGNAL stop when .Dv PL_FLAG_SI is set in .Va pl_flags . .It Va pl_tdname The name of the thread. .It Va pl_child_pid The process identifier of the new child process. Only valid for a .Dv PL_EVENT_SIGNAL stop when .Dv PL_FLAG_FORKED is set in .Va pl_flags . .It Va pl_syscall_code The ABI-specific identifier of the current system call. Note that for indirect system calls this field reports the indirected system call. Only valid when .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX is set in -.Va pl_flags. +.Va pl_flags . .It Va pl_syscall_narg The number of arguments passed to the current system call not counting the system call identifier. Note that for indirect system calls this field reports the arguments passed to the indirected system call. Only valid when .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX is set in -.Va pl_flags. +.Va pl_flags . .El .It Dv PT_GETNUMLWPS This request returns the number of kernel threads associated with the traced process. .It Dv PT_GETLWPLIST This request can be used to get the current thread list. 
A pointer to an array of type .Vt lwpid_t should be passed in .Fa addr , with the array size specified by .Fa data . The return value from .Fn ptrace is the count of array entries filled in. .It Dv PT_SETSTEP This request will turn on single stepping of the specified process. Stepping is automatically disabled when a single step trap is caught. .It Dv PT_CLEARSTEP This request will turn off single stepping of the specified process. .It Dv PT_SUSPEND This request will suspend the specified thread. .It Dv PT_RESUME This request will resume the specified thread. .It Dv PT_TO_SCE This request will set the .Dv PTRACE_SCE event flag to trace all future system call entries and continue the process. The .Fa addr and .Fa data arguments are used the same as for -.Dv PT_CONTINUE. +.Dv PT_CONTINUE . .It Dv PT_TO_SCX This request will set the .Dv PTRACE_SCX event flag to trace all future system call exits and continue the process. The .Fa addr and .Fa data arguments are used the same as for -.Dv PT_CONTINUE. +.Dv PT_CONTINUE . .It Dv PT_SYSCALL This request will set the .Dv PTRACE_SYSCALL event flag to trace all future system call entries and exits and continue the process. The .Fa addr and .Fa data arguments are used the same as for -.Dv PT_CONTINUE. +.Dv PT_CONTINUE . .It Dv PT_GET_SC_ARGS For the thread which is stopped in either .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX state, that is, on entry or exit to a syscall, this request fetches the syscall arguments. .Pp The arguments are copied out into the buffer pointed to by the .Fa addr pointer, sequentially. Each syscall argument is stored as the machine word. Kernel copies out as many arguments as the syscall accepts, see the .Va pl_syscall_narg member of the .Vt struct ptrace_lwpinfo , but not more than the .Fa data bytes in total are copied. .It Dv PT_GET_SC_RET Fetch the system call return values on exit from a syscall. This request is only valid for threads stopped in a syscall exit (the .Dv PL_FLAG_SCX state). 
The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_sc_ret" , which is defined as follows: .Bd -literal struct ptrace_sc_ret { register_t sr_retval[2]; int sr_error; }; .Ed .Pp The .Fa data argument is set to the size of the structure. .Pp If the system call completed successfully, .Va sr_error is set to zero and the return values of the system call are saved in .Va sr_retval . If the system call failed to execute, .Va sr_error field is set to a positive .Xr errno 2 value. If the system call completed in an unusual fashion, .Va sr_error is set to a negative value: -.Pp .Bl -tag -width Dv EJUSTRETURN -compact .It Dv ERESTART System call will be restarted. .It Dv EJUSTRETURN System call completed successfully but did not set a return value .Po for example, .Xr setcontext 2 and .Xr sigreturn 2 .Pc . .El .It Dv PT_FOLLOW_FORK This request controls tracing for new child processes of a traced process. If .Fa data is non-zero, .Dv PTRACE_FORK is set in the traced process's event tracing mask. If .Fa data is zero, .Dv PTRACE_FORK is cleared from the traced process's event tracing mask. .It Dv PT_LWP_EVENTS This request controls tracing of LWP creation and destruction. If .Fa data is non-zero, .Dv PTRACE_LWP is set in the traced process's event tracing mask. If .Fa data is zero, .Dv PTRACE_LWP is cleared from the traced process's event tracing mask. .It Dv PT_GET_EVENT_MASK This request reads the traced process's event tracing mask into the integer pointed to by .Fa addr . The size of the integer must be passed in .Fa data . .It Dv PT_SET_EVENT_MASK This request sets the traced process's event tracing mask from the integer pointed to by .Fa addr . The size of the integer must be passed in .Fa data . .It Dv PT_VM_TIMESTAMP This request returns the generation number or timestamp of the memory map of the traced process as the return value from .Fn ptrace .
This provides a low-cost way for the tracing process to determine if the VM map changed since the last time this request was made. .It Dv PT_VM_ENTRY This request is used to iterate over the entries of the VM map of the traced process. The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_vm_entry" , which is defined as follows: .Bd -literal struct ptrace_vm_entry { int pve_entry; int pve_timestamp; u_long pve_start; u_long pve_end; u_long pve_offset; u_int pve_prot; u_int pve_pathlen; long pve_fileid; uint32_t pve_fsid; char *pve_path; }; .Ed .Pp The first entry is returned by setting .Va pve_entry to zero. Subsequent entries are returned by leaving .Va pve_entry unmodified from the value returned by previous requests. The .Va pve_timestamp field can be used to detect changes to the VM map while iterating over the entries. The tracing process can then take appropriate action, such as restarting. By setting .Va pve_pathlen to a non-zero value on entry, the pathname of the backing object is returned in the buffer pointed to by .Va pve_path , provided the entry is backed by a vnode. The .Va pve_pathlen field is updated with the actual length of the pathname (including the terminating null character). The .Va pve_offset field is the offset within the backing object at which the range starts. The range is located in the VM space at .Va pve_start and extends up to .Va pve_end (inclusive). .Pp The .Fa data argument is ignored. .El .Sh ARM MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_SETVFPREGS" .It Dv PT_GETVFPREGS Return the thread's .Dv VFP machine state in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_SETVFPREGS Set the thread's .Dv VFP machine state from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .El -.Pp .Sh x86 MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_GETXSTATE_INFO" .It Dv PT_GETXMMREGS Copy the XMM FPU state into the buffer pointed to by the argument .Fa addr . 
The buffer has the same layout as the 32-bit save buffer for the machine instruction .Dv FXSAVE . .Pp This request is only valid for i386 programs, both on native 32-bit systems and on amd64 kernels. For 64-bit amd64 programs, the XMM state is reported as part of the FPU state returned by the .Dv PT_GETFPREGS request. .Pp The .Fa data argument is ignored. .It Dv PT_SETXMMREGS Load the XMM FPU state for the thread from the buffer pointed to by the argument .Fa addr . The buffer has the same layout as the 32-bit load buffer for the machine instruction .Dv FXRSTOR . .Pp As with -.Dv PT_GETXMMREGS, +.Dv PT_GETXMMREGS , this request is only valid for i386 programs. .Pp The .Fa data argument is ignored. .It Dv PT_GETXSTATE_INFO Report which XSAVE FPU extensions are supported by the CPU and allowed in userspace programs. The .Fa addr argument must point to a variable of type .Vt struct ptrace_xstate_info , which contains the information on the request return. .Vt struct ptrace_xstate_info is defined as follows: .Bd -literal struct ptrace_xstate_info { uint64_t xsave_mask; uint32_t xsave_len; }; .Ed The .Dv xsave_mask field is a bitmask of the currently enabled extensions. The meaning of the bits is defined in the Intel and AMD processor documentation. The .Dv xsave_len field reports the length of the XSAVE area for storing the hardware state for currently enabled extensions in the format defined by the x86 .Dv XSAVE machine instruction. .Pp The .Fa data argument value must be equal to the size of the .Vt struct ptrace_xstate_info . .It Dv PT_GETXSTATE Return the content of the XSAVE area for the thread. The .Fa addr argument points to the buffer where the content is copied, and the .Fa data argument specifies the size of the buffer. The kernel copies out as much content as allowed by the buffer size. The buffer layout is specified by the layout of the save area for the .Dv XSAVE machine instruction. 
.It Dv PT_SETXSTATE Load the XSAVE state for the thread from the buffer specified by the .Fa addr pointer. The buffer size is passed in the .Fa data argument. The buffer must be at least as large as the .Vt struct savefpu (defined in .Pa x86/fpu.h ) to allow the complete x87 FPU and XMM state load. It must not be larger than the XSAVE state length, as reported by the .Dv xsave_len field from the .Vt struct ptrace_xstate_info of the .Dv PT_GETXSTATE_INFO request. Layout of the buffer is identical to the layout of the load area for the .Dv XRSTOR machine instruction. .It Dv PT_GETFSBASE Return the value of the base used when doing segmented memory addressing using the %fs segment register. The .Fa addr argument points to an .Vt unsigned long variable where the base value is stored. .Pp The .Fa data argument is ignored. .It Dv PT_GETGSBASE Like the .Dv PT_GETFSBASE request, but returns the base for the %gs segment register. .It Dv PT_SETFSBASE Set the base for the %fs segment register to the value pointed to by the .Fa addr argument. .Fa addr must point to the .Vt unsigned long variable containing the new base. .Pp The .Fa data argument is ignored. .It Dv PT_SETGSBASE Like the .Dv PT_SETFSBASE request, but sets the base for the %gs segment register. .El .Sh PowerPC MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_SETVRREGS" .It Dv PT_GETVRREGS Return the thread's .Dv ALTIVEC machine state in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_SETVRREGS Set the thread's .Dv ALTIVEC machine state from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_GETVSRREGS Return doubleword 1 of the thread's .Dv VSX registers VSR0-VSR31 in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_SETVSRREGS Set doubleword 1 of the thread's .Dv VSX registers VSR0-VSR31 from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. 
.El .Pp Additionally, other machine-specific requests can exist. .Sh RETURN VALUES Most requests return 0 on success and \-1 on error. Some requests can cause .Fn ptrace to return \-1 as a non-error value, among them are .Dv PT_READ_I and .Dv PT_READ_D , which return the value read from the process memory on success. To disambiguate, .Va errno can be set to 0 before the call and checked afterwards. .Pp The current .Fn ptrace implementation always sets .Va errno to 0 before calling into the kernel, both for historic reasons and for consistency with other operating systems. It is recommended to assign zero to .Va errno explicitly for forward compatibility. .Sh ERRORS The .Fn ptrace system call may fail if: .Bl -tag -width Er .It Bq Er ESRCH .Bl -bullet -compact .It No process having the specified process ID exists. .El .It Bq Er EINVAL .Bl -bullet -compact .It A process attempted to use .Dv PT_ATTACH on itself. .It The .Fa request argument was not one of the legal requests. .It The signal number (in .Fa data ) to .Dv PT_CONTINUE was neither 0 nor a legal signal number. .It .Dv PT_GETREGS , .Dv PT_SETREGS , .Dv PT_GETFPREGS , .Dv PT_SETFPREGS , .Dv PT_GETDBREGS , or .Dv PT_SETDBREGS was attempted on a process with no valid register set. (This is normally true only of system processes.) .It .Dv PT_VM_ENTRY was given an invalid value for .Fa pve_entry . This can also be caused by changes to the VM map of the process. .It The size (in .Fa data ) provided to .Dv PT_LWPINFO was less than or equal to zero, or larger than the .Vt ptrace_lwpinfo structure known to the kernel. .It The size (in .Fa data ) provided to the x86-specific .Dv PT_GETXSTATE_INFO request was not equal to the size of the .Vt struct ptrace_xstate_info . .It The size (in .Fa data ) provided to the x86-specific .Dv PT_SETXSTATE request was less than the size of the x87 plus the XMM save area. 
.It The size (in .Fa data ) provided to the x86-specific .Dv PT_SETXSTATE request was larger than returned in the .Dv xsave_len member of the .Vt struct ptrace_xstate_info from the .Dv PT_GETXSTATE_INFO request. .It The base value, provided to the amd64-specific requests .Dv PT_SETFSBASE or .Dv PT_SETGSBASE , pointed outside of the valid user address space. This error will not occur in 32-bit programs. .El .It Bq Er EBUSY .Bl -bullet -compact .It .Dv PT_ATTACH was attempted on a process that was already being traced. .It A request attempted to manipulate a process that was being traced by some process other than the one making the request. .It A request (other than .Dv PT_ATTACH ) specified a process that was not stopped. .El .It Bq Er EPERM .Bl -bullet -compact .It A request (other than .Dv PT_ATTACH ) attempted to manipulate a process that was not being traced at all. .It An attempt was made to use .Dv PT_ATTACH on a process in violation of the requirements listed under .Dv PT_ATTACH above. .El .It Bq Er ENOENT .Bl -bullet -compact .It .Dv PT_VM_ENTRY previously returned the last entry of the memory map. No more entries exist. .El .It Bq Er ENAMETOOLONG .Bl -bullet -compact .It .Dv PT_VM_ENTRY cannot return the pathname of the backing object because the buffer is not big enough. .Fa pve_pathlen holds the minimum buffer size required on return. .El .El .Sh SEE ALSO .Xr execve 2 , .Xr sigaction 2 , .Xr wait 2 , .Xr execv 3 , .Xr i386_clr_watch 3 , .Xr i386_set_watch 3 .Sh HISTORY The .Fn ptrace function appeared in .At v6 . Index: head/lib/libc/sys/revoke.2 =================================================================== --- head/lib/libc/sys/revoke.2 (revision 368816) +++ head/lib/libc/sys/revoke.2 (revision 368817) @@ -1,110 +1,110 @@ .\" Copyright (c) 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" This code is derived from software contributed to Berkeley by .\" Berkeley Software Design, Inc. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)revoke.2 8.1 (Berkeley) 6/4/93 .\" $FreeBSD$ .\" -.Dd Jan 25, 2016 +.Dd January 25, 2016 .Dt REVOKE 2 .Os .Sh NAME .Nm revoke .Nd revoke file access .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft int .Fn revoke "const char *path" .Sh DESCRIPTION The .Fn revoke system call invalidates all current open file descriptors in the system for the file named by .Fa path . 
Subsequent operations on any such descriptors fail, with the exceptions that a .Fn read from a character device file which has been revoked returns a count of zero (end of file), and a .Fn close system call will succeed. If the file is a special file for a device which is open, the device close function is called as if all open references to the file had been closed using a special close method which does not block. .Pp Access to a file may be revoked only by its owner or the super user. The .Fn revoke system call is currently supported only for block and character special device files. It is normally used to prepare a terminal device for a new login session, preventing any access by a previous user of the terminal. .Sh RETURN VALUES .Rv -std revoke .Sh ERRORS Access to the named file is revoked unless one of the following: .Bl -tag -width Er .It Bq Er ENOTDIR A component of the path prefix is not a directory. .It Bq Er ENAMETOOLONG A component of a pathname exceeded 255 characters, or an entire path name exceeded 1024 characters. .It Bq Er ENOENT The named file or a component of the path name does not exist. .It Bq Er EACCES Search permission is denied for a component of the path prefix. .It Bq Er ELOOP Too many symbolic links were encountered in translating the pathname. .It Bq Er EFAULT The .Fa path argument points outside the process's allocated address space. .It Bq Er EINVAL The implementation does not support the .Fn revoke operation on the named file. .It Bq Er EPERM The caller is neither the owner of the file nor the super user. .El .Sh SEE ALSO .Xr revoke 1 , .Xr close 2 .Sh HISTORY The .Fn revoke system call first appeared in .Bx 4.3 Reno . .Sh BUGS The non-blocking close method is only correctly implemented for terminal devices. 
Index: head/lib/libc/sys/rtprio.2 =================================================================== --- head/lib/libc/sys/rtprio.2 (revision 368816) +++ head/lib/libc/sys/rtprio.2 (revision 368817) @@ -1,198 +1,201 @@ .\"- .\" Copyright (c) 1994, Henrik Vestergaard Draboel .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. All advertising materials mentioning features or use of this software .\" must display the following acknowledgement: .\" This product includes software developed by Henrik Vestergaard Draboel. .\" 4. The name of the author may not be used to endorse or promote products .\" derived from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\"- .\" Copyright (c) 2011 Xin LI .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd December 27, 2011 .Dt RTPRIO 2 .Os .Sh NAME .Nm rtprio , .Nm rtprio_thread .Nd examine or modify realtime or idle priority .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/rtprio.h .Ft int .Fn rtprio "int function" "pid_t pid" "struct rtprio *rtp" .Ft int .Fn rtprio_thread "int function" "lwpid_t lwpid" "struct rtprio *rtp" .Sh DESCRIPTION The .Fn rtprio system call is used to lookup or change the realtime or idle priority of a process, or the calling thread. The .Fn rtprio_thread system call is used to lookup or change the realtime or idle priority of a thread. 
.Pp The .Fa function argument specifies the operation to be performed. RTP_LOOKUP to lookup the current priority, and RTP_SET to set the priority. .Pp For the .Fn rtprio system call, the .Fa pid argument specifies the process to operate on, 0 for the calling thread. When .Fa pid is non-zero, the system call reports the highest priority in the process, or sets all threads' priority in the process, depending on value of the .Fa function argument. .Pp For the .Fn rtprio_thread system call, the .Fa lwpid specifies the thread to operate on, 0 for the calling thread. .Pp The .Fa *rtp argument is a pointer to a struct rtprio which is used to specify the priority and priority type. This structure has the following form: .Bd -literal struct rtprio { u_short type; u_short prio; }; .Ed .Pp The value of the .Va type field may be RTP_PRIO_REALTIME for realtime priorities, RTP_PRIO_NORMAL for normal priorities, and RTP_PRIO_IDLE for idle priorities. The priority specified by the .Va prio field ranges between 0 and .Dv RTP_PRIO_MAX .Pq usually 31 . 0 is the highest possible priority. .Pp -Realtime and idle priority is inherited through fork() and exec(). +Realtime and idle priority is inherited through +.Fn fork +and +.Fn exec . .Pp A realtime thread can only be preempted by a thread of equal or higher priority, or by an interrupt; idle priority threads will run only when no other real/normal priority thread is runnable. Higher real/idle priority threads preempt lower real/idle priority threads. Threads of equal real/idle priority are run round-robin. .Sh RETURN VALUES .Rv -std rtprio rtprio_thread .Sh ERRORS The .Fn rtprio and .Fn rtprio_thread system calls will fail if: .Bl -tag -width Er .It Bq Er EFAULT The rtp pointer passed to .Fn rtprio or .Fn rtprio_thread was invalid. .It Bq Er EINVAL The specified .Fa prio was out of range. .It Bq Er EPERM The calling thread is not allowed to set the realtime priority. 
Only root is allowed to change the realtime priority of any thread, and non-root may only change the idle priority of threads the user owns, when the .Xr sysctl 8 variable .Va security.bsd.unprivileged_idprio is set to non-zero. .It Bq Er ESRCH The specified process or thread was not found or visible. .El .Sh SEE ALSO .Xr nice 1 , .Xr ps 1 , .Xr rtprio 1 , .Xr setpriority 2 , .Xr nice 3 , .Xr renice 8 , .Xr p_cansee 9 .Sh AUTHORS .An -nosplit The original author was .An Henrik Vestergaard Draboel Aq Mt hvd@terry.ping.dk . This implementation in .Fx was substantially rewritten by .An David Greenman . The .Fn rtprio_thread system call was implemented by .An David Xu . Index: head/lib/libc/sys/sendfile.2 =================================================================== --- head/lib/libc/sys/sendfile.2 (revision 368816) +++ head/lib/libc/sys/sendfile.2 (revision 368817) @@ -1,445 +1,445 @@ .\" Copyright (c) 2003, David G. Lawrence .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice unmodified, this list of conditions, and the following .\" disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd March 30, 2020 .Dt SENDFILE 2 .Os .Sh NAME .Nm sendfile .Nd send a file to a socket .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/socket.h .In sys/uio.h .Ft int .Fo sendfile .Fa "int fd" "int s" "off_t offset" "size_t nbytes" .Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags" .Fc .Sh DESCRIPTION The .Fn sendfile system call sends a regular file or shared memory object specified by descriptor .Fa fd out a stream socket specified by descriptor .Fa s . .Pp The .Fa offset argument specifies where to begin in the file. Should .Fa offset fall beyond the end of file, the system will return success and report 0 bytes sent as described below. The .Fa nbytes argument specifies how many bytes of the file should be sent, with 0 having the special meaning of send until the end of file has been reached. .Pp An optional header and/or trailer can be sent before and after the file data by specifying a pointer to a .Vt "struct sf_hdtr" , which has the following structure: .Pp .Bd -literal -offset indent -compact struct sf_hdtr { struct iovec *headers; /* pointer to header iovecs */ int hdr_cnt; /* number of header iovecs */ struct iovec *trailers; /* pointer to trailer iovecs */ int trl_cnt; /* number of trailer iovecs */ }; .Ed .Pp The .Fa headers and .Fa trailers pointers, if .Pf non- Dv NULL , point to arrays of .Vt "struct iovec" structures. See the .Fn writev system call for information on the iovec structure. 
The number of iovecs in these arrays is specified by .Fa hdr_cnt and .Fa trl_cnt . .Pp If .Pf non- Dv NULL , the system will write the total number of bytes sent on the socket to the variable pointed to by .Fa sbytes . .Pp The least significant 16 bits of .Fa flags argument is a bitmap of these values: .Bl -tag -offset indent -width "SF_USER_READAHEAD" .It Dv SF_NODISKIO This flag causes .Nm to return .Er EBUSY instead of blocking when a busy page is encountered. This rare situation can happen if some other process is now working with the same region of the file. It is advised to retry the operation after a short period. .Pp Note that in older .Fx versions the .Dv SF_NODISKIO had slightly different notion. The flag prevented .Nm to run I/O operations in case if an invalid (not cached) page is encountered, thus avoiding blocking on I/O. Starting with .Fx 11 .Nm sending files off the .Xr ffs 7 filesystem does not block on I/O (see .Sx IMPLEMENTATION NOTES ), so the condition no longer applies. However, it is safe if an application utilizes .Dv SF_NODISKIO and on .Er EBUSY performs the same action as it did in older .Fx versions, e.g., .Xr aio_read 2 , .Xr read 2 or .Nm in a different context. .It Dv SF_NOCACHE The data sent to socket will not be cached by the virtual memory system, and will be freed directly to the pool of free pages. .It Dv SF_SYNC .Nm sleeps until the network stack no longer references the VM pages of the file, making subsequent modifications to it safe. Please note that this is not a guarantee that the data has actually been sent. .It Dv SF_USER_READAHEAD .Nm has some internal heuristics to do readahead when sending data. This flag forces .Nm to override any heuristically calculated readahead and use exactly the application specified readahead. See .Sx SETTING READAHEAD for more details on readahead. .El .Pp When using a socket marked for non-blocking I/O, .Fn sendfile may send fewer bytes than requested. 
In this case, the number of bytes successfully written is returned in .Fa *sbytes (if specified), and the error .Er EAGAIN is returned. .Sh SETTING READAHEAD .Nm uses internal heuristics based on request size and file system layout to do readahead. Additionally application may request extra readahead. The most significant 16 bits of .Fa flags specify amount of pages that .Nm may read ahead when reading the file. A macro .Fn SF_FLAGS is provided to combine readahead amount and flags. An example showing specifying readahead of 16 pages and .Dv SF_NOCACHE flag: .Pp .Bd -literal -offset indent -compact SF_FLAGS(16, SF_NOCACHE) .Ed .Pp .Nm will use either application specified readahead or internally calculated, whichever is bigger. Setting flag .Dv SF_USER_READAHEAD would turn off any heuristics and set maximum possible readahead length to the number of pages specified via flags. .Sh IMPLEMENTATION NOTES The .Fx implementation of .Fn sendfile does not block on disk I/O when it sends a file off the .Xr ffs 7 filesystem. The syscall returns success before the actual I/O completes, and data is put into the socket later unattended. However, the order of data in the socket is preserved, so it is safe to do further writes to the socket. .Pp The .Fx implementation of .Fn sendfile is "zero-copy", meaning that it has been optimized so that copying of the file data is avoided. .Sh TUNING .Ss physical paging buffers .Fn sendfile uses vnode pager to read file pages into memory. The pager uses a pool of physical buffers to run its I/O operations. When system runs out of pbufs, sendfile will block and report state .Dq Li zonelimit . Size of the pool can be tuned with .Va vm.vnode_pbufs .Xr loader.conf 5 tunable and can be checked with .Xr sysctl 8 OID of the same name at runtime. .Ss sendfile(2) buffers On some architectures, this system call internally uses a special .Fn sendfile buffer .Pq Vt "struct sf_buf" to handle sending file data to the client. 
If the sending socket is blocking, and there are not enough .Fn sendfile buffers available, .Fn sendfile will block and report a state of .Dq Li sfbufa . If the sending socket is non-blocking and there are not enough .Fn sendfile buffers available, the call will block and wait for the necessary buffers to become available before finishing the call. .Pp The number of .Vt sf_buf Ns 's allocated should be proportional to the number of nmbclusters used to send data to a client via .Fn sendfile . Tune accordingly to avoid blocking! Busy installations that make extensive use of .Fn sendfile may want to increase these values to be inline with their .Va kern.ipc.nmbclusters (see .Xr tuning 7 for details). .Pp The number of .Fn sendfile buffers available is determined at boot time by either the .Va kern.ipc.nsfbufs .Xr loader.conf 5 variable or the .Dv NSFBUFS kernel configuration tunable. The number of .Fn sendfile buffers scales with .Va kern.maxusers . The .Va kern.ipc.nsfbufsused and .Va kern.ipc.nsfbufspeak read-only .Xr sysctl 8 variables show current and peak .Fn sendfile buffers usage respectively. These values may also be viewed through .Nm netstat Fl m . .Pp If .Xr sysctl 8 OID .Va kern.ipc.nsfbufs doesn't exist, your architecture does not need to use .Fn sendfile buffers because their task can be efficiently performed by the generic virtual memory structures. .Sh RETURN VALUES .Rv -std sendfile .Sh ERRORS .Bl -tag -width Er .It Bq Er EAGAIN The socket is marked for non-blocking I/O and not all data was sent due to the socket buffer being filled. If specified, the number of bytes successfully sent will be returned in .Fa *sbytes . .It Bq Er EBADF The .Fa fd argument is not a valid file descriptor. .It Bq Er EBADF The .Fa s argument is not a valid socket descriptor. .It Bq Er EBUSY A busy page was encountered and .Dv SF_NODISKIO had been specified. Partial data may have been sent. .It Bq Er EFAULT An invalid address was specified for an argument. 
.It Bq Er EINTR A signal interrupted .Fn sendfile before it could be completed. If specified, the number of bytes successfully sent will be returned in .Fa *sbytes . .It Bq Er EINVAL The .Fa fd argument is not a regular file. .It Bq Er EINVAL The .Fa s argument is not a SOCK_STREAM type socket. .It Bq Er EINVAL The .Fa offset argument is negative. .It Bq Er EIO An error occurred while reading from .Fa fd . .It Bq Er EINTEGRITY Corrupted data was detected while reading from .Fa fd . .It Bq Er ENOTCAPABLE The .Fa fd or the .Fa s argument has insufficient rights. .It Bq Er ENOBUFS The system was unable to allocate an internal buffer. .It Bq Er ENOTCONN The .Fa s argument points to an unconnected socket. .It Bq Er ENOTSOCK The .Fa s argument is not a socket. .It Bq Er EOPNOTSUPP The file system for descriptor .Fa fd does not support .Fn sendfile . .It Bq Er EPIPE The socket peer has closed the connection. .El .Sh SEE ALSO -.Xr loader.conf 5 , .Xr netstat 1 , .Xr open 2 , .Xr send 2 , .Xr socket 2 , .Xr writev 2 , -.Xr sysctl 8 , -.Xr tuning 7 +.Xr loader.conf 5 , +.Xr tuning 7 , +.Xr sysctl 8 .Rs .%A K. Elmeleegy .%A A. Chanda .%A A. L. Cox .%A W. Zwaenepoel .%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management .%J The Proceedings of the 2005 USENIX Annual Technical Conference .%P pp 223-236 .%D 2005 .Re .Sh HISTORY The .Fn sendfile system call first appeared in .Fx 3.0 . This manual page first appeared in .Fx 3.1 . In .Fx 10 support for sending shared memory descriptors had been introduced. In .Fx 11 a non-blocking implementation had been introduced. .Sh AUTHORS The initial implementation of .Fn sendfile system call and this manual page were written by .An David G. Lawrence Aq Mt dg@dglawrence.com . The .Fx 11 implementation was written by .An Gleb Smirnoff Aq Mt glebius@FreeBSD.org . .Sh BUGS The .Fn sendfile system call will not fail, i.e., return .Dv -1 and set .Va errno to .Er EFAULT , if provided an invalid address for .Fa sbytes . 
The .Fn sendfile system call does not support SCTP sockets, it will return .Dv -1 and set .Va errno to -.Er EINVAL. +.Er EINVAL . Index: head/lib/libc/sys/thr_exit.2 =================================================================== --- head/lib/libc/sys/thr_exit.2 (revision 368816) +++ head/lib/libc/sys/thr_exit.2 (revision 368817) @@ -1,95 +1,95 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_EXIT 2 .Os .Sh NAME .Nm thr_exit .Nd terminate current thread .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft void .Fn thr_exit "long *state" .Sh DESCRIPTION .Bf -symbolic This function is intended for implementing threading. Normal applications should call .Xr pthread_exit 3 instead. .Ef .Pp The .Fn thr_exit system call terminates the current kernel-scheduled thread. .Pp If the .Fa state argument is not .Dv NULL , the location pointed to by the argument is updated with an arbitrary non-zero value, and an .Xr _umtx_op 2 .Dv UMTX_OP_WAKE operation is consequently performed on the location. .Pp Attempts to terminate the last thread in the process are silently ignored. Use .Xr _exit 2 syscall to terminate the process. .Sh RETURN VALUES The function does not return a value. A return from the function indicates that the calling thread was the last one in the process. .Sh SEE ALSO .Xr _exit 2 , +.Xr _umtx_op 2 , .Xr thr_kill 2 , .Xr thr_kill2 2 , .Xr thr_new 2 , .Xr thr_self 2 , .Xr thr_set_name 2 , -.Xr _umtx_op 2 , .Xr pthread_exit 3 .Sh STANDARDS The .Fn thr_exit system call is non-standard and is used by .Lb libthr to implement .St -p1003.1-2001 .Xr pthread 3 functionality. .Sh HISTORY The .Fn thr_exit system call first appeared in .Fx 5.2 . Index: head/lib/libc/sys/thr_new.2 =================================================================== --- head/lib/libc/sys/thr_new.2 (revision 368816) +++ head/lib/libc/sys/thr_new.2 (revision 368817) @@ -1,250 +1,250 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. 
Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_NEW 2 .Os .Sh NAME .Nm thr_new .Nd create new thread of execution .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft int .Fn thr_new "struct thr_param *param" "int param_size" .Sh DESCRIPTION .Bf -symbolic This function is intended for implementing threading. Normal applications should call .Xr pthread_create 3 instead. .Ef .Pp The .Fn thr_new system call creates a new kernel-scheduled thread of execution in the context of the current process. The newly created thread shares all attributes of the process with the existing kernel-scheduled threads in the process, but has private processor execution state. The machine context for the new thread is copied from the creating thread's context, including coprocessor state. FPU state and specific machine registers are excluded from the copy. 
These are set according to ABI requirements and syscall parameters. The FPU state for the new thread is reinitialized to clean. .Pp The .Fa param structure supplies parameters affecting the thread creation. The structure is defined in the .In sys/thr.h header as follows .Bd -literal struct thr_param { void (*start_func)(void *); void *arg; char *stack_base; size_t stack_size; char *tls_base; size_t tls_size; long *child_tid; long *parent_tid; int flags; struct rtprio *rtp; }; .Ed and contains the following fields: .Bl -tag -width ".Va parent_tid" .It Va start_func Pointer to the thread entry function. The kernel arranges for the new thread to start executing the function upon the first return to userspace. .It Va arg Opaque argument supplied to the entry function. .It Va stack_base Stack base address. The stack must be allocated by the caller. On some architectures, the ABI might require that the system put information on the stack to ensure the execution environment for .Va start_func . .It Va stack_size Stack size. .It Va tls_base TLS base address. The value of TLS base is loaded into the ABI-defined machine register in the new thread context. .It Va tls_size TLS size. .It Va child_tid Address to store the new thread identifier, for the child's use. .It Va parent_tid Address to store the new thread identifier, for the parent's use. .Pp Both .Va child_tid and .Va parent_tid are provided, with the intent that .Va child_tid is used by the new thread to get its thread identifier without issuing the .Xr thr_self 2 syscall, while .Va parent_tid is used by the thread creator. The latter is separate from .Va child_tid because the new thread might exit and free its thread data before the parent has a chance to execute far enough to access it. .It Va flags Thread creation flags. The .Va flags member may specify the following flags: .Bl -tag -width ".Dv THR_SYSTEM_SCOPE" .It Dv THR_SUSPENDED Create the new thread in the suspended state. 
The flag is not currently implemented. .It Dv THR_SYSTEM_SCOPE Create the system scope thread. The flag is not currently implemented. .El .It Va rtp Real-time scheduling priority for the new thread. May be .Dv NULL to inherit the priority from the creating thread. .El .Pp The .Fa param_size argument should be set to the size of the .Fa param structure. .Pp After the first successful creation of an additional thread, the process is marked by the kernel as multi-threaded. In particular, the .Dv P_HADTHREADS flag is set in the process' .Dv p_flag (visible in the .Xr ps 1 output), and several operations are executed in multi-threaded mode. For instance, the .Xr execve 2 system call terminates all threads but the calling one on successful execution. .Sh RETURN VALUES If successful, .Fn thr_new will return zero, otherwise \-1 is returned, and .Va errno is set to indicate the error. .Sh ERRORS The .Fn thr_new operation returns the following errors: .Bl -tag -width Er .\" When changing this list, consider updating share/man/man3/pthread_create.3, .\" since that function can return any of these errors. .It Bq Er EFAULT The memory pointed to by the .Fa param argument is not valid. .It Bq Er EFAULT The memory pointed to by the .Fa param structure .Fa child_tid , parent_tid or .Fa rtp arguments is not valid. .It Bq Er EFAULT The specified stack base is invalid, or the kernel was unable to put required initial data on the stack. .It Bq Er EINVAL The .Fa param_size argument specifies a negative value, or the value is greater than the largest .Fa struct param size the kernel can interpret. .It Bq Er EINVAL The .Fa rtp member is not .Dv NULL and specifies invalid scheduling parameters. .It Bq Er EINVAL The specified TLS base is invalid. .It Bq Er EPERM The caller does not have permission to set the scheduling parameters or scheduling policy. .It Bq Er EPROCLIM Creation of the new thread would exceed the .Dv RACCT_NTHR limit, see .Xr racct 2 . 
.It Bq Er EPROCLIM Creation of the new thread would exceed the .Dv kern.threads.max_threads_per_proc .Xr sysctl 2 limit. .It Bq Er ENOMEM There was not enough kernel memory to allocate the new thread structures. .El .Sh SEE ALSO .Xr ps 1 , +.Xr _umtx_op 2 , .Xr execve 2 , .Xr racct 2 , .Xr thr_exit 2 , .Xr thr_kill 2 , .Xr thr_kill2 2 , .Xr thr_self 2 , .Xr thr_set_name 2 , -.Xr _umtx_op 2 , .Xr pthread_create 3 .Sh STANDARDS The .Fn thr_new system call is non-standard and is used by the .Lb libthr to implement .St -p1003.1-2001 .Xr pthread 3 functionality. .Sh HISTORY The .Fn thr_new system call first appeared in .Fx 5.2 . Index: head/lib/libc/sys/thr_self.2 =================================================================== --- head/lib/libc/sys/thr_self.2 (revision 368816) +++ head/lib/libc/sys/thr_self.2 (revision 368817) @@ -1,95 +1,95 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_SELF 2 .Os .Sh NAME .Nm thr_self .Nd return thread identifier for the calling thread .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft int .Fn thr_self "long *id" .Sh DESCRIPTION The .Fn thr_self system call stores the system-wide thread identifier for the current kernel-scheduled thread in the variable pointed by the argument .Va id . .Pp The thread identifier is an integer in the range from .Dv PID_MAX + 2 (100001) to .Dv INT_MAX . The thread identifier is guaranteed to be unique at any given time, for each running thread in the system. After the thread exits, the identifier may be reused. .Sh RETURN VALUES If successful, .Fn thr_self will return zero, otherwise \-1 is returned, and .Va errno is set to indicate the error. .Sh ERRORS The .Fn thr_self operation may return the following errors: .Bl -tag -width Er .It Bq Er EFAULT The memory pointed to by the .Fa id argument is not valid. .El .Sh SEE ALSO +.Xr _umtx_op 2 , .Xr thr_exit 2 , .Xr thr_kill 2 , .Xr thr_kill2 2 , .Xr thr_new 2 , .Xr thr_set_name 2 , -.Xr _umtx_op 2 , .Xr pthread_getthreadid_np 3 , .Xr pthread_self 3 .Sh STANDARDS The .Fn thr_self system call is non-standard and is used by .Lb libthr to implement .St -p1003.1-2001 .Xr pthread 3 functionality. .Sh HISTORY The .Fn thr_self system call first appeared in .Fx 5.2 . 
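The identifier range stated in the thr_self.2 DESCRIPTION above can be illustrated with a small check. This is only a sketch and not part of the committed man page. The names `THR_ID_MIN`, `tid_in_documented_range`, `check_documented_bounds` and `current_tid_or_minus_one` are hypothetical. The lower bound 100001 is copied from the text. The assumption that `<sys/types.h>` followed by `<sys/thr.h>` is enough to declare `thr_self` is untested here. The FreeBSD-specific call is guarded so the fragment also compiles on other systems.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/thr.h>
#endif

/* Hypothetical constant: lowest identifier per the man page (PID_MAX + 2). */
#define THR_ID_MIN 100001L

/* Hypothetical helper: true if id lies in the documented range. */
static bool
tid_in_documented_range(long id)
{
	return (id >= THR_ID_MIN && id <= (long)INT_MAX);
}

/* Spot checks of the documented lower and upper bounds. */
static void
check_documented_bounds(void)
{
	assert(tid_in_documented_range(THR_ID_MIN));
	assert(!tid_in_documented_range(THR_ID_MIN - 1));
	assert(tid_in_documented_range((long)INT_MAX));
}

/*
 * Returns the calling thread's identifier on FreeBSD, or -1 when the
 * system call fails or is not available on this platform.
 */
static long
current_tid_or_minus_one(void)
{
#ifdef __FreeBSD__
	long id;

	if (thr_self(&id) == 0)
		return (id);
#endif
	return (-1);
}
```

On FreeBSD the value returned by `current_tid_or_minus_one()` would be expected to satisfy `tid_in_documented_range()`. Because identifiers may be reused after a thread exits, such a check says nothing about whether the thread still exists.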
Index: head/lib/libc/sys/thr_set_name.2 =================================================================== --- head/lib/libc/sys/thr_set_name.2 (revision 368816) +++ head/lib/libc/sys/thr_set_name.2 (revision 368817) @@ -1,99 +1,99 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_SET_NAME 2 .Os .Sh NAME .Nm thr_set_name .Nd set user-visible thread name .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft int .Fn thr_set_name "long id" "const char *name" .Sh DESCRIPTION The .Fn thr_set_name system call sets the user-visible name for the thread with the identifier .Va id in the current process to the NUL-terminated string .Va name . The name will be silently truncated to fit into a buffer of .Dv MAXCOMLEN + 1 bytes. The thread name can be seen in the output of the .Xr ps 1 and .Xr top 1 commands, in the kernel debuggers and kernel tracing facility outputs, and in userland debuggers and program core files, as notes. .Sh RETURN VALUES If successful, .Fn thr_set_name returns zero; otherwise, \-1 is returned, and .Va errno is set to indicate the error. .Sh ERRORS The .Fn thr_set_name system call may return the following errors: .Bl -tag -width Er .It Bq Er EFAULT The memory pointed to by the .Fa name argument is not valid. .It Bq Er ESRCH The thread with the identifier .Fa id does not exist in the current process. .El .Sh SEE ALSO .Xr ps 1 , +.Xr _umtx_op 2 , .Xr thr_exit 2 , .Xr thr_kill 2 , .Xr thr_kill2 2 , .Xr thr_new 2 , .Xr thr_self 2 , -.Xr _umtx_op 2 , .Xr pthread_set_name_np 3 , .Xr ddb 4 , .Xr ktr 9 .Sh STANDARDS The .Fn thr_set_name system call is non-standard and is used by the .Lb libthr . .Sh HISTORY The .Fn thr_set_name system call first appeared in .Fx 5.2 . Index: head/lib/libc/sys/thr_suspend.2 =================================================================== --- head/lib/libc/sys/thr_suspend.2 (revision 368816) +++ head/lib/libc/sys/thr_suspend.2 (revision 368817) @@ -1,135 +1,134 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_SUSPEND 2 .Os .Sh NAME .Nm thr_suspend .Nd suspend the calling thread .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft int .Fn thr_suspend "struct timespec *timeout" .Sh DESCRIPTION .Bf -symbolic This function is intended for implementing threading. Normal applications should use .Xr pthread_cond_timedwait 3 together with .Xr pthread_cond_broadcast 3 for typical safe suspension with cooperation of the thread being suspended, or .Xr pthread_suspend_np 3 and .Xr pthread_resume_np 3 in some specific situations, instead. .Ef .Pp The .Fn thr_suspend system call puts the calling thread in a suspended state, where it is not eligible for CPU time. 
This state is exited by another thread calling .Xr thr_wake 2 , when the time interval specified by .Fa timeout has elapsed, or by the delivery of a signal to the suspended thread. .Pp If the .Fa timeout argument is .Dv NULL , the suspended state can be only terminated by explicit .Fn thr_wake or signal. .Pp If a wake from .Xr thr_wake 2 was delivered before the .Nm call, the thread is not put into a suspended state. Instead, the call returns immediately without an error. .Pp If a thread previously called .Xr thr_wake 2 with its own thread identifier, which resulted in setting the internal kernel flag to immediately abort interruptible sleeps with an .Er EINTR error .Po see .Xr thr_wake 2 .Pc , the flag is cleared. As with .Xr thr_wake 2 called from another thread, the next .Nm call does not result in suspension. -.Pp .Sh RETURN VALUES .Rv -std thr_suspend .Sh ERRORS The .Fn thr_suspend operation returns the following errors: .Bl -tag -width Er .It Bq Er EFAULT The memory pointed to by the .Fa timeout argument is not valid. .It Bq Er ETIMEDOUT The specified timeout expired. .It Bq Er ETIMEDOUT The .Fa timeout argument specified a zero time interval. .It Bq Er EINTR The sleep was interrupted by a signal. .El .Sh SEE ALSO .Xr ps 1 , .Xr thr_wake 2 , .Xr pthread_resume_np 3 , .Xr pthread_suspend_np 3 .Sh STANDARDS The .Fn thr_suspend system call is non-standard. .Sh HISTORY The .Fn thr_suspend system call first appeared in .Fx 5.2 . Index: head/lib/libc/sys/thr_wake.2 =================================================================== --- head/lib/libc/sys/thr_wake.2 (revision 368816) +++ head/lib/libc/sys/thr_wake.2 (revision 368817) @@ -1,117 +1,117 @@ .\" Copyright (c) 2016 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd May 5, 2020 .Dt THR_WAKE 2 .Os .Sh NAME .Nm thr_wake .Nd wake up the suspended thread .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/thr.h .Ft int .Fn thr_wake "long id" .Sh DESCRIPTION .Bf -symbolic This function is intended for implementing threading. Normal applications should use .Xr pthread_cond_timedwait 3 together with .Xr pthread_cond_broadcast 3 for typical safe suspension with cooperation of the thread being suspended, or .Xr pthread_suspend_np 3 and .Xr pthread_resume_np 3 in some specific situations, instead. 
.Ef .Pp Passing the thread identifier of the calling thread .Po see .Xr thr_self 2 .Pc to .Fn thr_wake sets a thread's flag to cause the next signal-interruptible sleep of that thread in the kernel to fail immediately with the .Er EINTR error. The flag is cleared by an interruptible sleep attempt or by a call to .Xr thr_suspend 2 . This is used by the system threading library to implement cancellation. .Pp If .Fa id is not equal to the current thread identifier, the specified thread is woken up if suspended by the -.Xr thr_suspend +.Xr thr_suspend 2 system call. If the thread is not suspended at the time of the .Nm call, the wake is remembered and the next attempt of the thread to suspend itself with the .Xr thr_suspend 2 results in immediate return with success. Only one wake is remembered. .Sh RETURN VALUES .Rv -std thr_wake .Sh ERRORS The .Fn thr_wake operation returns these errors: .Bl -tag -width Er .It Bq Er ESRCH The specified thread was not found or does not belong to the process of the calling thread. .El .Sh SEE ALSO .Xr ps 1 , .Xr thr_self 2 , .Xr thr_suspend 2 , .Xr pthread_cancel 3 , .Xr pthread_resume_np 3 , .Xr pthread_suspend_np 3 .Sh STANDARDS The .Fn thr_wake system call is non-standard and is used by .Lb libthr to implement .St -p1003.1-2001 .Xr pthread 3 functionality. .Sh HISTORY The .Fn thr_wake system call first appeared in .Fx 5.2 . Index: head/lib/libc/x86/sys/pkru.3 =================================================================== --- head/lib/libc/x86/sys/pkru.3 (revision 368816) +++ head/lib/libc/x86/sys/pkru.3 (revision 368817) @@ -1,206 +1,206 @@ .\" Copyright (c) 2019 The FreeBSD Foundation, Inc. .\" All rights reserved. .\" .\" This documentation was written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1.
Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd February 16, 2019 .Dt PKRU 3 .Os .Sh NAME .Nm Protection Key Rights for User pages .Nd provide fast user-managed key-based access control for pages .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In machine/sysarch.h .Ft int .Fn x86_pkru_get_perm "unsigned int keyidx" "int *access" "int *modify" .Ft int .Fn x86_pkru_set_perm "unsigned int keyidx" "int access" "int modify" .Ft int .Fo x86_pkru_protect_range .Fa "void *addr" .Fa "unsigned long len" .Fa "unsigned int keyidx" .Fa "int flag" .Fc .Ft int .Fn x86_pkru_unprotect_range "void *addr" "unsigned long len" .Sh DESCRIPTION The protection keys feature provides an additional mechanism, besides the normal page permissions as established by .Xr mmap 2 and .Xr mprotect 2 , to control access to user-mode addresses. 
The mechanism gives safety measures which can be used to avoid incidental read or modification of sensitive memory, or as a debugging feature. It cannot guard against conscious accesses since permissions are user-controllable. .Pp If supported by hardware, each mapped user linear address has an associated 4-bit protection key. A new per-thread PKRU hardware register determines, for each protection key, whether user-mode addresses with that protection key may be read or written. .Pp Only one key may apply to a given range at a time. The default protection key index is zero; it is used even if no key was explicitly assigned to the address, or if the key was removed. .Pp The protection prevents the system from accessing user addresses as well as the user applications. When a system call is unable to read or write user memory due to key protection, it returns the .Er EFAULT error code. Note that some side effects may have occurred if this error is reported. .Pp Protection keys require that the system uses 4-level paging (also called long mode), which means that they are only available on amd64 systems. Both 64-bit and 32-bit applications can use protection keys. More information about the hardware feature is provided in the IA32 Software Developer's Manual published by Intel Corp. .Pp The key indexes written into the page table entries are managed by the .Fn sysarch syscall. Per-key permissions are managed using the user-mode instructions .Em RDPKRU and -.Em WRPKRU. +.Em WRPKRU . The system provides convenient library helpers for both the syscall and the instructions, described below. .Pp The .Fn x86_pkru_protect_range function assigns key .Fa keyidx to the range starting at .Fa addr and having length .Fa len . The starting address is truncated to the page start, and the end is rounded up to the end of the page. After a successful call, the range has the specified key assigned, even if the key is zero and it did not change the page table entries.
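The rounding rule for the range (start truncated to the page start, end rounded up to the end of the page) can be sketched as plain address arithmetic. This is an illustration only, not the kernel's implementation. The names `SKETCH_PAGE_SIZE`, `struct page_range` and `covered_pages` are hypothetical. The 4096-byte page size is an assumption, and real code would query the page size at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch; must be a power of two. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* Hypothetical result type: [start, end) in bytes, both page aligned. */
struct page_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Hypothetical helper: compute the page-aligned range that covers
 * [addr, addr + len).  Overflow of addr + len is not handled here.
 */
static struct page_range
covered_pages(uintptr_t addr, uintptr_t len)
{
	struct page_range r;
	const uintptr_t mask = SKETCH_PAGE_SIZE - 1;

	/* The mask trick below is only valid for power-of-two page sizes. */
	assert((SKETCH_PAGE_SIZE & mask) == 0);

	r.start = addr & ~mask;			/* truncate to page start */
	r.end = (addr + len + mask) & ~mask;	/* round up to page end */
	return (r);
}
```

Under these assumptions, a 16-byte request at address 0x1234 covers the single page from 0x1000 up to (but not including) 0x2000, so the key would apply to that whole page rather than to the 16 bytes alone.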
.Pp The .Fa flags argument takes the logical OR of the following values: .Bl -tag -width .It Bq Va AMD64_PKRU_EXCL Only assign the key if the range does not have any other keys assigned (including the zero key). You must first remove any existing key with .Fn x86_pkru_unprotect_range in order for this request to succeed. If the .Va AMD64_PKRU_EXCL flag is not specified, .Fn x86_pkru_protect_range replaces any existing key. .It Bq Va AMD64_PKRU_PERSIST The keys assigned to the range are persistent. They are re-established when the current mapping is destroyed and a new mapping is created in any sub-range of the specified range. You must use a .Fn x86_pkru_unprotect_range call to forget the key. .El .Pp The .Fn x86_pkru_unprotect_range function removes any keys assigned to the specified range. Existing mappings are changed to use key index zero in page table entries. Keys are no longer considered installed for all mappings in the range, for the purposes of .Fn x86_pkru_protect_range with the .Va AMD64_PKRU_EXCL flag. .Pp The .Fn x86_pkru_get_perm function returns access rights for the key specified by the .Fa keyidx argument. If the value pointed to by .Fa access is zero after the call, no read or write permission is granted for mappings which are assigned the key .Fa keyidx . If the value pointed to by .Fa access is not zero, read access is permitted. A non-zero value of the variable pointed to by the .Fa modify argument indicates that write access is permitted. .Pp Conversely, the .Fn x86_pkru_set_perm function establishes the access and modify permissions for the given key index as specified by its arguments. .Sh RETURN VALUES .Rv -std .Sh ERRORS .Bl -tag -width Er .It Bq Er EOPNOTSUPP The hardware does not support protection keys. .It Bq Er EINVAL The supplied key index is invalid (greater than 15). .It Bq Er EINVAL The supplied .Fa flags argument for .Fn x86_pkru_protect_range has reserved bits set.
.It Bq Er EFAULT The supplied address range does not completely fit into the user-managed address range. .It Bq Er ENOMEM The memory shortage prevents the completion of the operation. .It Bq Er EBUSY The .Va AMD64_PKRU_EXCL flag was specified for .Fn x86_pkru_protect_range and the range already has defined protection keys. .El .Sh SEE ALSO .Xr mmap 2 , .Xr mprotect 2 , .Xr munmap 2 , .Xr sysarch 2 . .Sh STANDARDS The .Nm functions are non-standard and first appeared in .Fx 13.0 .
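As a reading aid for the permission semantics of x86_pkru_get_perm in pkru.3 above, the following sketch decodes a PKRU register value in portable C. It is not the libc implementation. It assumes the register layout described in Intel's Software Developer's Manual, where key i owns an access-disable bit at position 2*i and a write-disable bit at position 2*i+1. The names `PKRU_NKEYS` and `pkru_decode_perm` are hypothetical, and reporting `modify` as zero whenever access is disabled is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 4-bit protection key gives 16 possible key indexes (0..15). */
#define PKRU_NKEYS 16u

/*
 * Hypothetical helper: decode the rights for keyidx from a raw PKRU
 * value.  Returns 0 on success, or -1 if keyidx is greater than 15
 * (the man page reports EINVAL for that case).
 */
static int
pkru_decode_perm(uint32_t pkru, unsigned int keyidx, int *access, int *modify)
{
	uint32_t ad, wd;

	/* This sketch requires both output pointers. */
	assert(access != NULL && modify != NULL);

	if (keyidx >= PKRU_NKEYS)
		return (-1);

	ad = (pkru >> (2 * keyidx)) & 1u;	/* access-disable bit */
	wd = (pkru >> (2 * keyidx + 1)) & 1u;	/* write-disable bit */

	*access = (ad == 0);			/* non-zero: reads allowed */
	*modify = (ad == 0 && wd == 0);		/* non-zero: writes allowed */
	return (0);
}
```

With this layout, a PKRU value of zero grants read and write access for every key, while setting only the write-disable bit of a key leaves its mappings readable but not writable.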