diff --git a/lib/libc/gen/dup3.3 b/lib/libc/gen/dup3.3 index f2798930797b..338a9ae74c64 100644 --- a/lib/libc/gen/dup3.3 +++ b/lib/libc/gen/dup3.3 @@ -1,114 +1,125 @@ .\" Copyright (c) 2013 Jilles Tjoelker .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd August 16, 2013 +.Dd May 17, 2025 .Dt DUP3 3 .Os .Sh NAME .Nm dup3 .Nd duplicate an existing file descriptor .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In fcntl.h .In unistd.h .Ft int .Fn dup3 "int oldd" "int newd" "int flags" .Sh DESCRIPTION The .Fn dup3 function duplicates an existing object descriptor while allowing the value of the new descriptor to be specified. 
.Pp The close-on-exec flag on the new file descriptor is determined by the .Dv O_CLOEXEC bit in .Fa flags . .Pp +The close-on-fork flag on the new file descriptor is determined by the +.Dv O_CLOFORK +bit in +.Fa flags . +.Pp If .Fa oldd \*(Ne .Fa newd and .Fa flags == 0, the behavior is identical to .Li dup2(oldd, newd) . .Pp If .Fa oldd == .Fa newd , then .Fn dup3 fails, unlike .Xr dup2 2 . .Sh RETURN VALUES The value -1 is returned if an error occurs. The external variable .Va errno indicates the cause of the error. .Sh ERRORS The .Fn dup3 function fails if: .Bl -tag -width Er .It Bq Er EBADF The .Fa oldd argument is not a valid active descriptor or the .Fa newd argument is negative or exceeds the maximum allowable descriptor number .It Bq Er EINVAL The .Fa oldd argument is equal to the .Fa newd argument. .It Bq Er EINVAL The .Fa flags argument has bits set other than -.Dv O_CLOEXEC . +.Dv O_CLOEXEC +or +.Dv O_CLOFORK . .El .Sh SEE ALSO .Xr accept 2 , .Xr close 2 , .Xr dup2 2 , .Xr fcntl 2 , .Xr getdtablesize 2 , .Xr open 2 , .Xr pipe 2 , .Xr socket 2 , .Xr socketpair 2 .Sh STANDARDS The .Fn dup3 function does not conform to any standard. .Sh HISTORY The .Fn dup3 function appeared in .Fx 10.0 . +The +.Dv O_CLOFORK +flag appeared in +.Fx 15.0 . diff --git a/lib/libsys/accept.2 b/lib/libsys/accept.2 index 53926b3153d2..2da2af066a5b 100644 --- a/lib/libsys/accept.2 +++ b/lib/libsys/accept.2 @@ -1,236 +1,248 @@ .\" Copyright (c) 1983, 1990, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. 
Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd October 9, 2014 +.Dd May 17, 2025 .Dt ACCEPT 2 .Os .Sh NAME .Nm accept , .Nm accept4 .Nd accept a connection on a socket .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/socket.h .Ft int .Fn accept "int s" "struct sockaddr * restrict addr" "socklen_t * restrict addrlen" .Ft int .Fn accept4 "int s" "struct sockaddr * restrict addr" "socklen_t * restrict addrlen" "int flags" .Sh DESCRIPTION The argument .Fa s is a socket that has been created with .Xr socket 2 , bound to an address with .Xr bind 2 , and is listening for connections after a .Xr listen 2 . 
The .Fn accept system call extracts the first connection request on the queue of pending connections, creates a new socket, and allocates a new file descriptor for the socket which inherits the state of the .Dv O_NONBLOCK and .Dv O_ASYNC properties and the destination of .Dv SIGIO and .Dv SIGURG signals from the original socket .Fa s . .Pp The .Fn accept4 system call is similar, but the .Dv O_NONBLOCK property of the new socket is instead determined by the .Dv SOCK_NONBLOCK flag in the .Fa flags argument, the .Dv O_ASYNC property is cleared, the signal destination is cleared and the close-on-exec flag on the new file descriptor can be set via the .Dv SOCK_CLOEXEC flag in the .Fa flags argument. +Similarly, the +.Dv O_CLOFORK +property can be set via the +.Dv SOCK_CLOFORK +flag in the +.Fa flags +argument. .Pp If no pending connections are present on the queue, and the original socket is not marked as non-blocking, .Fn accept blocks the caller until a connection is present. If the original socket is marked non-blocking and no pending connections are present on the queue, .Fn accept returns an error as described below. The accepted socket may not be used to accept more connections. The original socket .Fa s remains open. .Pp The argument .Fa addr is a result argument that is filled-in with the address of the connecting entity, as known to the communications layer. The exact format of the .Fa addr argument is determined by the domain in which the communication is occurring. A null pointer may be specified for .Fa addr if the address information is not desired; in this case, .Fa addrlen is not used and should also be null. Otherwise, the .Fa addrlen argument is a value-result argument; it should initially contain the amount of space pointed to by .Fa addr ; on return it will contain the actual length (in bytes) of the address returned. This call is used with connection-based socket types, currently with .Dv SOCK_STREAM . 
.Pp It is possible to .Xr select 2 a socket for the purposes of doing an .Fn accept by selecting it for read. .Pp For certain protocols which require an explicit confirmation, such as .Tn ISO or .Tn DATAKIT , .Fn accept can be thought of as merely dequeueing the next connection request and not implying confirmation. Confirmation can be implied by a normal read or write on the new file descriptor, and rejection can be implied by closing the new socket. .Pp For some applications, performance may be enhanced by using an .Xr accept_filter 9 to pre-process incoming connections. .Pp When using .Fn accept , portable programs should not rely on the .Dv O_NONBLOCK and .Dv O_ASYNC properties and the signal destination being inherited, but should set them explicitly using .Xr fcntl 2 ; .Fn accept4 sets these properties consistently, but may not be fully portable across .Ux platforms. .Sh RETURN VALUES These calls return \-1 on error. If they succeed, they return a non-negative integer that is a descriptor for the accepted socket. .Sh ERRORS The .Fn accept and .Fn accept4 system calls will fail if: .Bl -tag -width Er .It Bq Er EBADF The descriptor is invalid. .It Bq Er EINTR The .Fn accept operation was interrupted. .It Bq Er EMFILE The per-process descriptor table is full. .It Bq Er ENFILE The system file table is full. .It Bq Er ENOTSOCK The descriptor references a file, not a socket. .It Bq Er EINVAL .Xr listen 2 has not been called on the socket descriptor. .It Bq Er EFAULT The .Fa addr argument is not in a writable part of the user address space. .It Bo Er EWOULDBLOCK Bc or Bq Er EAGAIN The socket is marked non-blocking and no connections are present to be accepted. .It Bq Er ECONNABORTED A connection arrived, but it was closed while waiting on the listen queue. .El .Pp The .Fn accept4 system call will also fail if: .Bl -tag -width Er .It Bq Er EINVAL The .Fa flags argument is invalid. 
.El .Sh SEE ALSO .Xr bind 2 , .Xr connect 2 , .Xr getpeername 2 , .Xr getsockname 2 , .Xr listen 2 , .Xr select 2 , .Xr socket 2 , .Xr accept_filter 9 .Sh HISTORY The .Fn accept system call appeared in .Bx 4.2 . .Pp The .Fn accept4 system call appeared in .Fx 10.0 . +.Pp +The +.Dv SOCK_CLOFORK +flag appeared in +.Fx 15.0 . diff --git a/lib/libsys/closefrom.2 b/lib/libsys/closefrom.2 index aaa4c55607ac..1885a6fdeaa8 100644 --- a/lib/libsys/closefrom.2 +++ b/lib/libsys/closefrom.2 @@ -1,92 +1,99 @@ .\" Copyright (c) 2009 Hudson River Trading LLC .\" Written by: John H. Baldwin .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" -.Dd March 3, 2022 +.Dd May 17, 2025 .Dt CLOSEFROM 2 .Os .Sh NAME .Nm closefrom , .Nm close_range .Nd delete open file descriptors .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft void .Fn closefrom "int lowfd" .Ft int .Fn close_range "u_int lowfd" "u_int highfd" "int flags" .Sh DESCRIPTION The .Fn closefrom system call deletes all open file descriptors greater than or equal to .Fa lowfd from the per-process object reference table. Any errors encountered while closing file descriptors are ignored. .Pp The .Fn close_range system call deletes all open file descriptors between .Fa lowfd and .Fa highfd inclusive, clamped to the range of open file descriptors. Any errors encountered while closing file descriptors are ignored. Supported .Fa flags : .Bl -tag -width ".Dv CLOSE_RANGE_CLOEXEC" .It Dv CLOSE_RANGE_CLOEXEC Set the close-on-exec flag on descriptors in the range instead of closing them. +.It Dv CLOSE_RANGE_CLOFORK +Set the close-on-fork flag on descriptors in the range instead of closing them. .El .Sh RETURN VALUES Upon successful completion, .Fn close_range returns a value of 0. Otherwise, a value of -1 is returned and the global variable .Va errno is set to indicate the error. .Sh ERRORS The .Fn close_range system call will fail if: .Bl -tag -width Er .It Bq Er EINVAL The .Fa highfd argument is lower than the .Fa lowfd argument. .It Bq Er EINVAL An invalid flag was set. .El .Sh SEE ALSO .Xr close 2 .Sh HISTORY The .Fn closefrom function first appeared in .Fx 8.0 . +.Pp +The +.Dv CLOSE_RANGE_CLOFORK +flag appeared in +.Fx 15.0 . diff --git a/lib/libsys/execve.2 b/lib/libsys/execve.2 index 5a35980e9555..dc85b9321e48 100644 --- a/lib/libsys/execve.2 +++ b/lib/libsys/execve.2 @@ -1,379 +1,382 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd January 26, 2022 +.Dd July 02, 2025 .Dt EXECVE 2 .Os .Sh NAME .Nm execve , .Nm fexecve .Nd execute a file .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft int .Fn execve "const char *path" "char *const argv[]" "char *const envp[]" .Ft int .Fn fexecve "int fd" "char *const argv[]" "char *const envp[]" .Sh DESCRIPTION The .Fn execve system call transforms the calling process into a new process. 
The new process is constructed from an ordinary file, whose name is pointed to by .Fa path , called the .Em new process file . The .Fn fexecve system call is equivalent to .Fn execve except that the file to be executed is determined by the file descriptor .Fa fd instead of a .Fa path . This file is either an executable object file, or a file of data for an interpreter. An executable object file consists of an identifying header, followed by pages of data representing the initial program (text) and initialized data pages. Additional pages may be specified by the header to be initialized with zero data; see .Xr elf 5 and .Xr a.out 5 . .Pp An interpreter file begins with a line of the form: .Pp .Bd -ragged -offset indent -compact .Sy \&#! .Em interpreter .Bq Em arg .Ed .Pp When an interpreter file is .Sy execve Ap d , the system actually .Sy execve Ap s the specified .Em interpreter . If the optional .Em arg is specified, it becomes the first argument to the .Em interpreter , and the name of the originally .Sy execve Ap d file becomes the second argument; otherwise, the name of the originally .Sy execve Ap d file becomes the first argument. The original arguments are shifted over to become the subsequent arguments. The zeroth argument is set to the specified .Em interpreter . .Pp The argument .Fa argv is a pointer to a null-terminated array of character pointers to null-terminated character strings. These strings construct the argument list to be made available to the new process. At least one argument must be present in the array; by custom, the first element should be the name of the executed program (for example, the last component of .Fa path ) . .Pp The argument .Fa envp is also a pointer to a null-terminated array of character pointers to null-terminated strings. A pointer to this array is normally stored in the global variable .Va environ . These strings pass information to the new process that is not directly an argument to the command (see .Xr environ 7 ) . 
.Pp File descriptors open in the calling process image remain open in the new process image, except for those for which the close-on-exec flag is set (see .Xr close 2 and .Xr fcntl 2 ) . Descriptors that remain open are unaffected by -.Fn execve . +.Fn execve , +except that the close-on-fork flag +.Dv FD_CLOFORK +is cleared from all file descriptors. If any of the standard descriptors (0, 1, and/or 2) are closed at the time .Fn execve is called, and the process will gain privilege as a result of set-id semantics, those descriptors will be re-opened automatically. No programs, whether privileged or not, should assume that these descriptors will remain closed across a call to .Fn execve . .Pp Signals set to be ignored in the calling process are set to be ignored in the new process. Signals which are set to be caught in the calling process image are set to default action in the new process image. Blocked signals remain blocked regardless of changes to the signal action. The signal stack is reset to be undefined (see .Xr sigaction 2 for more information). .Pp If the set-user-ID mode bit of the new process image file is set (see .Xr chmod 2 ) , the effective user ID of the new process image is set to the owner ID of the new process image file. If the set-group-ID mode bit of the new process image file is set, the effective group ID of the new process image is set to the group ID of the new process image file. (The effective group ID is the first element of the group list.) The real user ID, real group ID and other group IDs of the new process image remain the same as the calling process image. After any set-user-ID and set-group-ID processing, the effective user ID is recorded as the saved set-user-ID, and the effective group ID is recorded as the saved set-group-ID. These values may be used in changing the effective IDs later (see .Xr setuid 2 ) .
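Reviewer sketch, not part of the patch: the close-on-exec rule from the file descriptor paragraph above could be exercised roughly as follows. The helpers set_cloexec() and is_cloexec() are hypothetical names introduced only for illustration; they use nothing beyond the F_GETFD and F_SETFD commands and the FD_CLOEXEC flag documented in fcntl.2. The same pattern would presumably apply to FD_CLOFORK on systems that define it, but that is an assumption and is not shown here.

```c
#include <assert.h>	/* only for the assert()-style checks in the usage note */
#include <fcntl.h>
#include <unistd.h>

/*
 * Mark fd close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 with errno set on failure.
 * Hypothetical helper, not a libc function.
 */
static int
set_cloexec(int fd)
{
	int flags;

	flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return (-1);
	if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
		return (-1);
	return (0);
}

/*
 * Report whether fd would be closed by execve().
 * Returns 1 if FD_CLOEXEC is set, 0 if it is clear, -1 on error.
 */
static int
is_cloexec(int fd)
{
	int flags;

	flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return (-1);
	return ((flags & FD_CLOEXEC) != 0);
}
```

A caller might create a descriptor with pipe(2), call set_cloexec() on it, and then check the result with assert(is_cloexec(fd) == 1) before calling execve(). Reading the flags first and OR-ing in FD_CLOEXEC avoids silently clearing other descriptor flags such as FD_RESOLVE_BENEATH.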
.Pp The set-ID bits are not honored if the respective file system has the .Cm nosuid option enabled or if the new process file is an interpreter file. Syscall tracing is disabled if effective IDs are changed. .Pp The new process also inherits the following attributes from the calling process: .Pp .Bl -column parent_process_ID -offset indent -compact .It process ID Ta see Xr getpid 2 .It parent process ID Ta see Xr getppid 2 .It process group ID Ta see Xr getpgrp 2 .It access groups Ta see Xr getgroups 2 .It working directory Ta see Xr chdir 2 .It root directory Ta see Xr chroot 2 .It control terminal Ta see Xr termios 4 .It resource usages Ta see Xr getrusage 2 .It interval timers Ta see Xr getitimer 2 .It resource limits Ta see Xr getrlimit 2 .It file mode mask Ta see Xr umask 2 .It signal mask Ta see Xr sigaction 2 , .Xr sigprocmask 2 .El .Pp When a program is executed as a result of an .Fn execve system call, it is entered as follows: .Bd -literal -offset indent main(argc, argv, envp) int argc; char **argv, **envp; .Ed .Pp where .Fa argc is the number of elements in .Fa argv (the ``arg count'') and .Fa argv points to the array of character pointers to the arguments themselves. .Pp The .Fn fexecve ignores the file offset of .Fa fd . Since execute permission is checked by .Fn fexecve , the file descriptor .Fa fd need not have been opened with the .Dv O_EXEC flag. However, if the file to be executed denies read permission for the process preparing to do the exec, the only way to provide the .Fa fd to .Fn fexecve is to use the .Dv O_EXEC flag when opening .Fa fd . Note that the file to be executed can not be open for writing. .Sh RETURN VALUES As the .Fn execve system call overlays the current process image with a new process image the successful call has no process to return to. If .Fn execve does return to the calling process an error has occurred; the return value will be -1 and the global variable .Va errno is set to indicate the error. 
.Sh ERRORS The .Fn execve system call will fail and return to the calling process if: .Bl -tag -width Er .It Bq Er ENOTDIR A component of the path prefix is not a directory. .It Bq Er ENAMETOOLONG A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters. .It Bq Er ENOEXEC When invoking an interpreted script, the length of the first line, inclusive of the .Sy \&#! prefix and terminating newline, exceeds .Dv MAXSHELLCMDLEN characters. .It Bq Er ENOENT The new process file does not exist. .It Bq Er ELOOP Too many symbolic links were encountered in translating the pathname. .It Bq Er EACCES Search permission is denied for a component of the path prefix. .It Bq Er EACCES The new process file is not an ordinary file. .It Bq Er EACCES The new process file mode denies execute permission. .It Bq Er EINVAL .Fa argv did not contain at least one element. .It Bq Er ENOEXEC The new process file has the appropriate access permission, but has an invalid magic number in its header. .It Bq Er ETXTBSY The new process file is a pure procedure (shared text) file that is currently open for writing by some process. .It Bq Er ENOMEM The new process requires more virtual memory than is allowed by the imposed maximum .Pq Xr getrlimit 2 . .It Bq Er E2BIG The number of bytes in the new process' argument list is larger than the system-imposed limit. This limit is specified by the .Xr sysctl 3 MIB variable .Dv KERN_ARGMAX . .It Bq Er EFAULT The new process file is not as long as indicated by the size values in its header. .It Bq Er EFAULT The .Fa path , .Fa argv , or .Fa envp arguments point to an illegal address. .It Bq Er EIO An I/O error occurred while reading from the file system. .It Bq Er EINTEGRITY Corrupted data was detected while reading from the file system. 
.El .Pp In addition, the .Fn fexecve will fail and return to the calling process if: .Bl -tag -width Er .It Bq Er EBADF The .Fa fd argument is not a valid file descriptor open for executing. .El .Sh SEE ALSO .Xr ktrace 1 , .Xr _exit 2 , .Xr fork 2 , .Xr open 2 , .Xr execl 3 , .Xr exit 3 , .Xr sysctl 3 , .Xr fdescfs 4 , .Xr a.out 5 , .Xr elf 5 , .Xr environ 7 , .Xr mount 8 .Sh STANDARDS The .Fn execve system call conforms to .St -p1003.1-2001 , with the exception of reopening descriptors 0, 1, and/or 2 in certain circumstances. A future update of the Standard is expected to require this behavior, and it may become the default for non-privileged processes as well. .\" NB: update this caveat when TC1 is blessed. The support for executing interpreted programs is an extension. The .Fn fexecve system call conforms to The Open Group Extended API Set 2 specification. .Sh HISTORY The .Fn execve system call appeared in .At v7 . The .Fn fexecve system call appeared in .Fx 8.0 . .Sh CAVEATS If a program is .Em setuid to a non-super-user, but is executed when the real .Em uid is ``root'', then the program has some of the powers of a super-user as well. .Pp When executing an interpreted program through .Fn fexecve , kernel supplies .Pa /dev/fd/n as a second argument to the interpreter, where .Ar n is the file descriptor passed in the .Fa fd argument to .Fn fexecve . For this construction to work correctly, the .Xr fdescfs 4 filesystem shall be mounted on .Pa /dev/fd . diff --git a/lib/libsys/fcntl.2 b/lib/libsys/fcntl.2 index 604de43e5e8c..3cf8adc29f88 100644 --- a/lib/libsys/fcntl.2 +++ b/lib/libsys/fcntl.2 @@ -1,813 +1,848 @@ .\" Copyright (c) 1983, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. 
Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd June 5, 2025 +.Dd June 24, 2025 .Dt FCNTL 2 .Os .Sh NAME .Nm fcntl .Nd file control .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In fcntl.h .Ft int .Fn fcntl "int fd" "int cmd" "..." .Sh DESCRIPTION The .Fn fcntl system call provides for control over descriptors. The argument .Fa fd is a descriptor to be operated on by .Fa cmd as described below. Depending on the value of .Fa cmd , .Fn fcntl can take an additional third argument .Fa arg . Unless otherwise noted below for a specific operation, .Fa arg has type .Vt int . 
.Bl -tag -width F_DUP2FD_CLOEXEC .It Dv F_DUPFD Return a new descriptor as follows: .Pp .Bl -bullet -compact -offset 4n .It Lowest numbered available descriptor greater than or equal to .Fa arg . .It Same object references as the original descriptor. .It New descriptor shares the same file offset if the object was a file. .It Same access mode (read, write or read/write). .It Same file status flags (i.e., both file descriptors share the same file status flags). .It The close-on-exec flag .Dv FD_CLOEXEC associated with the new file descriptor is cleared, so the file descriptor is to remain open across .Xr execve 2 system calls. .It +The close-on-fork flag +.Dv FD_CLOFORK +associated with the new file descriptor is cleared, so the file descriptor is +to remain open across +.Xr fork 2 +system calls. +.It The .Dv FD_RESOLVE_BENEATH flag, described below, will be set if it was set on the original descriptor. .El .It Dv F_DUPFD_CLOEXEC Like .Dv F_DUPFD , but the .Dv FD_CLOEXEC flag associated with the new file descriptor is set, so the file descriptor is closed when .Xr execve 2 system call executes. +.It Dv F_DUPFD_CLOFORK +Like +.Dv F_DUPFD , +but the +.Dv FD_CLOFORK +flag associated with the new file descriptor is set, so the file descriptor +is closed in the child process when the +.Xr fork 2 +system call executes. .It Dv F_DUP2FD It is functionally equivalent to .Bd -literal -offset indent dup2(fd, arg) .Ed .It Dv F_DUP2FD_CLOEXEC Like .Dv F_DUP2FD , but the .Dv FD_CLOEXEC flag associated with the new file descriptor is set. .Pp The .Dv F_DUP2FD and .Dv F_DUP2FD_CLOEXEC constants are not portable, so they should not be used if portability is needed. Use .Fn dup2 instead of .Dv F_DUP2FD . +.It Dv F_DUP3FD +Used to implement the +.Fn dup3 +call. +Do not use it. .It Dv F_GETFD Get the flags associated with the file descriptor .Fa fd . The following flags are defined: .Bl -tag -width FD_RESOLVE_BENEATH .It Dv FD_CLOEXEC The file will be closed upon execution of .Fn exec .Fa ( arg is ignored).
Otherwise, the file descriptor will remain open. +.It Dv FD_CLOFORK +The file will be closed in the child process created by the +.Fn fork +family of system calls. .It Dv FD_RESOLVE_BENEATH All path name lookups relative to that file descriptor will behave as if the lookup had .Dv O_RESOLVE_BENEATH or .Dv AT_RESOLVE_BENEATH semantics. It is not permitted to call .Xr fchdir 2 or .Xr fchroot 2 on such a file descriptor. The .Dv FD_RESOLVE_BENEATH flag is sticky, meaning that it is preserved by .Xr dup 2 and similar operations, and opening a directory with .Xr openat 2 where the directory descriptor has the flag set causes the new directory descriptor to also have the flag set. .El .It Dv F_SETFD Set flags associated with .Fa fd . The available flags are -.Dv FD_CLOEXEC +.Dv FD_CLOEXEC , +.Dv FD_CLOFORK and .Dv FD_RESOLVE_BENEATH . The .Dv FD_RESOLVE_BENEATH flag cannot be cleared once set. .It Dv F_GETFL Get descriptor status flags, as described below .Fa ( arg is ignored). .It Dv F_SETFL Set descriptor status flags to .Fa arg . .It Dv F_GETOWN Get the process ID or process group currently receiving .Dv SIGIO and .Dv SIGURG signals; process groups are returned as negative values .Fa ( arg is ignored). .It Dv F_SETOWN Set the process or process group to receive .Dv SIGIO and .Dv SIGURG signals; process groups are specified by supplying .Fa arg as negative, otherwise .Fa arg is interpreted as a process ID. .It Dv F_READAHEAD Set or clear the read ahead amount for sequential access to the third argument, .Fa arg , which is rounded up to the nearest block size. A zero value in .Fa arg turns off read ahead, a negative value restores the system default. .It Dv F_RDAHEAD Equivalent to Darwin counterpart which sets read ahead amount of 128KB when the third argument, .Fa arg is non-zero. A zero value in .Fa arg turns off read ahead. .It Dv F_ADD_SEALS Add seals to the file as described below, if the underlying filesystem supports seals.
.It Dv F_GET_SEALS Get seals associated with the file, if the underlying filesystem supports seals. .It Dv F_ISUNIONSTACK Check if the vnode is part of a union stack (either the "union" flag from .Xr mount 2 or unionfs). This is a hack not intended to be used outside of libc. .It Dv F_KINFO Fills a .Vt struct kinfo_file for the file referenced by the specified file descriptor. The .Fa arg argument should point to the storage for .Vt struct kinfo_file . The .Va kf_structsize member of the passed structure must be initialized with the sizeof of .Vt struct kinfo_file , to allow for the interface versioning and evolution. .El .Pp The flags for the .Dv F_GETFL and .Dv F_SETFL commands are as follows: .Bl -tag -width O_NONBLOCKX .It Dv O_NONBLOCK Non-blocking I/O; if no data is available to a .Xr read 2 system call, or if a .Xr write 2 operation would block, the read or write call returns -1 with the error .Er EAGAIN . .It Dv O_APPEND Force each write to append at the end of file; corresponds to the .Dv O_APPEND flag of .Xr open 2 . .It Dv O_DIRECT Minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care. .It Dv O_ASYNC Enable the .Dv SIGIO signal to be sent to the process group when I/O is possible, e.g., upon availability of data to be read. .It Dv O_SYNC Enable synchronous writes. Corresponds to the .Dv O_SYNC flag of .Xr open 2 . .Dv O_FSYNC is an historical synonym for .Dv O_SYNC . .It Dv O_DSYNC Enable synchronous data writes. Corresponds to the .Dv O_DSYNC flag of .Xr open 2 . .El .Pp The seals that may be applied with .Dv F_ADD_SEALS are as follows: .Bl -tag -width F_SEAL_SHRINK .It Dv F_SEAL_SEAL Prevent any further seals from being applied to the file. 
.It Dv F_SEAL_SHRINK Prevent the file from being shrunk with .Xr ftruncate 2 . .It Dv F_SEAL_GROW Prevent the file from being enlarged with .Xr ftruncate 2 . .It Dv F_SEAL_WRITE Prevent any further .Xr write 2 calls to the file. Any writes in progress will finish before .Fn fcntl returns. If any writeable mappings exist, F_ADD_SEALS will fail and return .Dv EBUSY . .El .Pp Seals are on a per-inode basis and require support by the underlying filesystem. If the underlying filesystem does not support seals, .Dv F_ADD_SEALS and .Dv F_GET_SEALS will fail and return .Dv EINVAL . .Pp Several operations are available for doing advisory file locking; they all operate on the following structure: .Bd -literal struct flock { off_t l_start; /* starting offset */ off_t l_len; /* len = 0 means until end of file */ pid_t l_pid; /* lock owner */ short l_type; /* lock type: read/write, etc. */ short l_whence; /* type of l_start */ int l_sysid; /* remote system id or zero for local */ }; .Ed These advisory file locking operations take a pointer to .Vt struct flock as the third argument .Fa arg . The commands available for advisory record locking are as follows: .Bl -tag -width F_SETLKWX .It Dv F_GETLK Get the first lock that blocks the lock description pointed to by the third argument, .Fa arg , taken as a pointer to a .Fa "struct flock" (see above). The information retrieved overwrites the information passed to .Fn fcntl in the .Fa flock structure. If no lock is found that would prevent this lock from being created, the structure is left unchanged by this system call except for the lock type which is set to .Dv F_UNLCK . .It Dv F_SETLK Set or clear a file segment lock according to the lock description pointed to by the third argument, .Fa arg , taken as a pointer to a .Fa "struct flock" (see above). .Dv F_SETLK is used to establish shared (or read) locks .Pq Dv F_RDLCK or exclusive (or write) locks, .Pq Dv F_WRLCK , as well as remove either type of lock .Pq Dv F_UNLCK . 
If a shared or exclusive lock cannot be set, .Fn fcntl returns immediately with .Er EAGAIN . .It Dv F_SETLKW This command is the same as .Dv F_SETLK except that if a shared or exclusive lock is blocked by other locks, the process waits until the request can be satisfied. If a signal that is to be caught is received while .Fn fcntl is waiting for a region, the .Fn fcntl will be interrupted if the signal handler has not specified the .Dv SA_RESTART (see .Xr sigaction 2 ) . .El .Pp When a shared lock has been set on a segment of a file, other processes can set shared locks on that segment or a portion of it. A shared lock prevents any other process from setting an exclusive lock on any portion of the protected area. A request for a shared lock fails if the file descriptor was not opened with read access. .Pp An exclusive lock prevents any other process from setting a shared lock or an exclusive lock on any portion of the protected area. A request for an exclusive lock fails if the file was not opened with write access. .Pp The value of .Fa l_whence is .Dv SEEK_SET , .Dv SEEK_CUR , or .Dv SEEK_END to indicate that the relative offset, .Fa l_start bytes, will be measured from the start of the file, current position, or end of the file, respectively. The value of .Fa l_len is the number of consecutive bytes to be locked. If .Fa l_len is negative, .Fa l_start means end edge of the region. The .Fa l_pid and .Fa l_sysid fields are only used with .Dv F_GETLK to return the process ID of the process holding a blocking lock and the system ID of the system that owns that process. Locks created by the local system will have a system ID of zero. After a successful .Dv F_GETLK request, the value of .Fa l_whence is .Dv SEEK_SET . .Pp Locks may start and extend beyond the current end of a file, but may not start or extend before the beginning of the file. A lock is set to extend to the largest possible value of the file offset for that file if .Fa l_len is set to zero. 
If .Fa l_whence and .Fa l_start point to the beginning of the file, and .Fa l_len is zero, the entire file is locked. If an application wishes only to do entire file locking, the .Xr flock 2 system call is much more efficient. .Pp There is at most one type of lock set for each byte in the file. Before a successful return from an .Dv F_SETLK or an .Dv F_SETLKW request when the calling process has previously existing locks on bytes in the region specified by the request, the previous lock type for each byte in the specified region is replaced by the new lock type. As specified above under the descriptions of shared locks and exclusive locks, an .Dv F_SETLK or an .Dv F_SETLKW request fails or blocks respectively when another process has existing locks on bytes in the specified region and the type of any of those locks conflicts with the type specified in the request. .Pp The queuing for .Dv F_SETLKW requests on local files is fair; that is, while the thread is blocked, subsequent requests conflicting with its requests will not be granted, even if these requests do not conflict with existing locks. .Pp This interface follows the completely stupid semantics of System V and .St -p1003.1-88 that require that all locks associated with a file for a given process are removed when .Em any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. For example if an application for updating the password file locks the password file database while making the update, and then calls .Xr getpwnam 3 to retrieve a record, the lock will be lost because .Xr getpwnam 3 opens, reads, and closes the password database. The database close will release all locks that the process has associated with the database, even if the library routine never requested a lock on the database. 
Another minor semantic problem with this interface is that locks are not inherited by a child process created using the .Xr fork 2 system call. The .Xr flock 2 interface has much more rational last close semantics and allows locks to be inherited by child processes. The .Xr flock 2 system call is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children. .Pp The .Fn fcntl , .Xr flock 2 , and .Xr lockf 3 locks are compatible. Processes using different locking interfaces can cooperate over the same file safely. However, only one of such interfaces should be used within the same process. If a file is locked by a process through .Xr flock 2 , any record within the file will be seen as locked from the viewpoint of another process using .Fn fcntl or .Xr lockf 3 , and vice versa. Note that .Fn fcntl F_GETLK returns \-1 in .Fa l_pid if the process holding a blocking lock previously locked the file descriptor by .Xr flock 2 . .Pp All locks associated with a file for a given process are removed when the process terminates. .Pp All locks obtained before a call to .Xr execve 2 remain in effect until the new program releases them. If the new program does not know about the locks, they will not be released until the program exits. .Pp A potential for deadlock occurs if a process controlling a locked region is put to sleep by attempting to lock the locked region of another process. This implementation detects that sleeping until a locked region is unlocked would cause a deadlock and fails with an .Er EDEADLK error. .Sh RETURN VALUES Upon successful completion, the value returned depends on .Fa cmd as follows: .Bl -tag -width F_GETOWNX -offset indent .It Dv F_DUPFD A new file descriptor. .It Dv F_DUP2FD A file descriptor equal to .Fa arg . .It Dv F_GETFD Value of flag (only the low-order bit is defined). .It Dv F_GETFL Value of flags. .It Dv F_GETOWN Value of file descriptor owner. 
.It other Value other than -1. .El .Pp Otherwise, a value of -1 is returned and .Va errno is set to indicate the error. .Sh ERRORS The .Fn fcntl system call will fail if: .Bl -tag -width Er .It Bq Er EAGAIN The argument .Fa cmd is .Dv F_SETLK , the type of lock .Pq Fa l_type is a shared lock .Pq Dv F_RDLCK or exclusive lock .Pq Dv F_WRLCK , and the segment of a file to be locked is already exclusive-locked by another process; or the type is an exclusive lock and some portion of the segment of a file to be locked is already shared-locked or exclusive-locked by another process. .It Bq Er EBADF The .Fa fd argument is not a valid open file descriptor. .Pp The argument .Fa cmd is .Dv F_DUP2FD , and .Fa arg is not a valid file descriptor. .Pp The argument .Fa cmd is .Dv F_SETLK or .Dv F_SETLKW , the type of lock .Pq Fa l_type is a shared lock .Pq Dv F_RDLCK , and .Fa fd is not a valid file descriptor open for reading. .Pp The argument .Fa cmd is .Dv F_SETLK or .Dv F_SETLKW , the type of lock .Pq Fa l_type is an exclusive lock .Pq Dv F_WRLCK , and .Fa fd is not a valid file descriptor open for writing. .It Bq Er EBUSY The argument .Fa cmd is .Dv F_ADD_SEALS , attempting to set .Dv F_SEAL_WRITE , and writeable mappings of the file exist. .It Bq Er EDEADLK The argument .Fa cmd is .Dv F_SETLKW , and a deadlock condition was detected. .It Bq Er EINTR The argument .Fa cmd is .Dv F_SETLKW , and the system call was interrupted by a signal. .It Bq Er EINVAL The .Fa cmd argument is .Dv F_DUPFD and .Fa arg is negative or greater than the maximum allowable number (see .Xr getdtablesize 2 ) . .Pp The argument .Fa cmd is .Dv F_GETLK , .Dv F_SETLK or .Dv F_SETLKW and the data to which .Fa arg points is not valid. .Pp The argument .Fa cmd is .Dv F_ADD_SEALS or .Dv F_GET_SEALS , and the underlying filesystem does not support sealing. .Pp The argument .Fa cmd is invalid. 
.It Bq Er EMFILE The argument .Fa cmd is .Dv F_DUPFD and the maximum number of file descriptors permitted for the process are already in use, or no file descriptors greater than or equal to .Fa arg are available. .It Bq Er ENOTTY The .Fa fd argument is not a valid file descriptor for the requested operation. This may be the case if .Fa fd is a device node, or a descriptor returned by .Xr kqueue 2 . .It Bq Er ENOLCK The argument .Fa cmd is .Dv F_SETLK or .Dv F_SETLKW , and satisfying the lock or unlock request would result in the number of locked regions in the system exceeding a system-imposed limit. .It Bq Er EOPNOTSUPP The argument .Fa cmd is .Dv F_GETLK , .Dv F_SETLK or .Dv F_SETLKW and .Fa fd refers to a file for which locking is not supported. .It Bq Er EOVERFLOW The argument .Fa cmd is .Dv F_GETLK , .Dv F_SETLK or .Dv F_SETLKW and an .Fa off_t calculation overflowed. .It Bq Er EPERM The .Fa cmd argument is .Dv F_SETOWN and the process ID or process group given as an argument is in a different session than the caller. .Pp The .Fa cmd argument is .Dv F_ADD_SEALS and the .Dv F_SEAL_SEAL seal has already been set. .It Bq Er ESRCH The .Fa cmd argument is .Dv F_SETOWN and the process ID given as argument is not in use. .El .Pp In addition, if .Fa fd refers to a descriptor open on a terminal device (as opposed to a descriptor open on a socket), a .Fa cmd of .Dv F_SETOWN can fail for the same reasons as in .Xr tcsetpgrp 3 , and a .Fa cmd of .Dv F_GETOWN for the reasons as stated in .Xr tcgetpgrp 3 . .Sh SEE ALSO .Xr close 2 , .Xr dup2 2 , .Xr execve 2 , .Xr flock 2 , .Xr getdtablesize 2 , .Xr open 2 , .Xr sigaction 2 , .Xr lockf 3 , .Xr tcgetpgrp 3 , .Xr tcsetpgrp 3 .Sh STANDARDS The .Dv F_DUP2FD -constant is non portable. -It is provided for compatibility with AIX and Solaris. +and +.Dv F_DUP3FD +constants are not portable. +They are provided for compatibility with AIX and Solaris. 
.Pp Per .St -susv4 , a call with .Dv F_SETLKW should fail with .Bq Er EINTR after any caught signal and should continue waiting during thread suspension such as a stop signal. However, in this implementation a call with .Dv F_SETLKW is restarted after catching a signal with a .Dv SA_RESTART handler or a thread suspension such as a stop signal. .Sh HISTORY The .Fn fcntl system call appeared in .Bx 4.2 . .Pp The .Dv F_DUP2FD constant first appeared in .Fx 7.1 . +.Pp +The +.Dv F_DUPFD_CLOFORK +and +.Dv F_DUP3FD +flags appeared in +.Fx 15.0 . diff --git a/lib/libsys/fork.2 b/lib/libsys/fork.2 index 7d548a42890d..e59b208a9ff5 100644 --- a/lib/libsys/fork.2 +++ b/lib/libsys/fork.2 @@ -1,268 +1,278 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd August 5, 2021 +.Dd May 17, 2024 .Dt FORK 2 .Os .Sh NAME .Nm fork .Nd create a new process .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft pid_t .Fn fork void .Ft pid_t .Fn _Fork void .Sh DESCRIPTION The .Fn fork function causes creation of a new process. The new process (child process) is an exact copy of the calling process (parent process) except for the following: .Bl -bullet -offset indent .It The child process has a unique process ID. .It The child process has a different parent process ID (i.e., the process ID of the parent process). .It The child process has its own copy of the parent's descriptors, except for descriptors returned by .Xr kqueue 2 , which are not inherited from the parent process. These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, so that an .Xr lseek 2 on a descriptor in the child process can affect a subsequent .Xr read 2 or .Xr write 2 by the parent. This descriptor copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes. +Any file descriptors that were marked with the close-on-fork flag, +.Dv FD_CLOFORK +.Po see +.Xr fcntl 2 +and +.Dv O_CLOFORK +in +.Xr open 2 +.Pc , +will not be present in the child process, but remain open in the parent. .It The child process' resource utilizations are set to 0; see .Xr setrlimit 2 .
.It All interval timers are cleared; see .Xr setitimer 2 . .It The robust mutexes list (see .Xr pthread_mutexattr_setrobust 3 ) is cleared for the child. .It The atfork handlers established with the .Xr pthread_atfork 3 function are called as appropriate before fork in the parent process, and after the child is created, in parent and child. .It The child process has only one thread, corresponding to the calling thread in the parent process. If the process has more than one thread, locks and other resources held by the other threads are not released and therefore only async-signal-safe functions (see .Xr sigaction 2 ) are guaranteed to work in the child process until a call to .Xr execve 2 or a similar function. The .Fx implementation of .Fn fork provides a usable .Xr malloc 3 , and .Xr rtld 1 services in the child process. .El .Pp The .Fn fork function is not async-signal safe and creates a cancellation point in the parent process. It cannot be safely used from signal handlers, and the atfork handlers established by .Xr pthread_atfork 3 do not need to be async-signal safe either. .Pp The .Fn _Fork function creates a new process, similarly to .Fn fork , but it is async-signal safe. .Fn _Fork does not call atfork handlers, and does not create a cancellation point. It can be used safely from signal handlers, but then no userspace services ( .Xr malloc 3 or .Xr rtld 1 ) are available in the child if forked from multi-threaded parent. In particular, if using dynamic linking, all dynamic symbols used by the child after .Fn _Fork must be pre-resolved. Note: resolving can be done globally by specifying the .Ev LD_BIND_NOW environment variable to the dynamic linker, or per-binary by passing the .Fl z Ar now option to the static linker .Xr ld 1 , or by using each symbol before the .Fn _Fork call to force the binding. 
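The sharing of file offsets between parent and child described in the list above can be observed with a small helper. The following is a minimal sketch, not part of the manual page: offset_seen_after_child_seek is a hypothetical name, and the helper assumes the path names a regular file that the caller may open for reading.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path, fork a child that moves the file
 * offset to child_offset, and return the offset the parent observes
 * once the child has exited.  Returns -1 on any error.
 */
off_t
offset_seen_after_child_seek(const char *path, off_t child_offset)
{
	pid_t pid;
	off_t seen;
	int fd, status;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);

	switch (pid = fork()) {
	case -1:
		close(fd);
		return (-1);
	case 0:
		/*
		 * Child: the descriptor is a copy, but it references
		 * the same file object as the parent's descriptor.
		 */
		if (lseek(fd, child_offset, SEEK_SET) == -1)
			_exit(1);
		_exit(0);
	default:
		break;
	}

	/* Parent: wait for the child, then read back the offset. */
	if (waitpid(pid, &status, 0) == -1 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		close(fd);
		return (-1);
	}
	seen = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return (seen);
}
```

The parent never seeks itself, so a non-zero result can only come from the child's lseek(2) on the shared file object. A descriptor that is absent from the child, such as one marked close-on-fork, would not be affected this way.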
.Sh RETURN VALUES Upon successful completion, .Fn fork and .Fn _Fork return a value of 0 to the child process and return the process ID of the child process to the parent process. Otherwise, a value of -1 is returned to the parent process, no child process is created, and the global variable .Va errno is set to indicate the error. .Sh EXAMPLES The following example shows a common pattern of how .Fn fork is used in practice. .Bd -literal -offset indent #include <err.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> int main(void) { pid_t pid; /* * If child is expected to use stdio(3), state of * the reused io streams must be synchronized between * parent and child, to avoid double output and other * possible issues. */ fflush(stdout); switch (pid = fork()) { case -1: err(1, "Failed to fork"); case 0: printf("Hello from child process!\en"); /* * Since we wrote into stdout, child needs to use * exit(3) and not _exit(2). This causes handlers * registered with atexit(3) to be called twice, * once in parent, and once in the child. If such * behavior is undesirable, consider * terminating child with _exit(2) or _Exit(3). */ exit(0); default: break; } printf("Hello from parent process (child's PID: %d)!\en", pid); return (0); } .Ed .Pp The output of such a program is along the lines of: .Bd -literal -offset indent Hello from parent process (child's PID: 27804)! Hello from child process! .Ed .Sh ERRORS The .Fn fork system call will fail and no child process will be created if: .Bl -tag -width Er .It Bq Er EAGAIN The system-imposed limit on the total number of processes under execution would be exceeded. The limit is given by the .Xr sysctl 3 MIB variable .Dv KERN_MAXPROC . (The limit is actually ten less than this except for the super user). .It Bq Er EAGAIN The user is not the super user, and the system-imposed limit on the total number of processes under execution by a single user would be exceeded. The limit is given by the .Xr sysctl 3 MIB variable .Dv KERN_MAXPROCPERUID .
.It Bq Er EAGAIN The user is not the super user, and the soft resource limit corresponding to the .Fa resource argument .Dv RLIMIT_NPROC would be exceeded (see .Xr getrlimit 2 ) . .It Bq Er ENOMEM There is insufficient swap space for the new process. .El .Sh SEE ALSO .Xr execve 2 , .Xr rfork 2 , .Xr setitimer 2 , .Xr setrlimit 2 , .Xr sigaction 2 , .Xr vfork 2 , .Xr wait 2 , .Xr pthread_atfork 3 .Sh STANDARDS The .Fn fork and .Fn _Fork functions conform to .St -p1003.1-2024 . .Sh HISTORY The .Fn fork function appeared in .At v1 . The .Fn _Fork function appeared in .Fx 13.1 . diff --git a/lib/libsys/open.2 b/lib/libsys/open.2 index 84c4f02fce8a..a0e905a8f375 100644 --- a/lib/libsys/open.2 +++ b/lib/libsys/open.2 @@ -1,853 +1,880 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd April 3, 2025 +.Dd May 17, 2025 .Dt OPEN 2 .Os .Sh NAME .Nm open , openat .Nd open or create a file for reading, writing or executing .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In fcntl.h .Ft int .Fn open "const char *path" "int flags" "..." .Ft int .Fn openat "int fd" "const char *path" "int flags" "..." .Sh DESCRIPTION The file name specified by .Fa path is opened for either execution or reading and/or writing as specified by the argument .Fa flags and the file descriptor returned to the calling process. The .Fa flags argument may indicate the file is to be created if it does not exist (by specifying the .Dv O_CREAT flag). In this case .Fn open and .Fn openat require an additional argument .Fa "mode_t mode" , and the file is created with mode .Fa mode as described in .Xr chmod 2 and modified by the process' umask value (see .Xr umask 2 ) . .Pp The .Fn openat function is equivalent to the .Fn open function except in the case where the .Fa path specifies a relative path. For .Fn openat and relative .Fa path , the file to be opened is determined relative to the directory associated with the file descriptor .Fa fd instead of the current working directory. The .Fa flag parameter and the optional fourth parameter correspond exactly to the parameters of .Fn open . If .Fn openat is passed the special value .Dv AT_FDCWD in the .Fa fd parameter, the current working directory is used and the behavior is identical to a call to .Fn open . 
.Pp When .Fn openat is called with an absolute .Fa path , it ignores the .Fa fd argument. .Pp In .Xr capsicum 4 capability mode, .Fn open is not permitted. The .Fa path argument to .Fn openat must be strictly relative to a file descriptor .Fa fd ; that is, .Fa path must not be an absolute path and must not contain ".." components which cause the path resolution to escape the directory hierarchy starting at .Fa fd . Additionally, no symbolic link in .Fa path may target absolute path or contain escaping ".." components. .Fa fd must not be .Dv AT_FDCWD . .Pp If the .Dv vfs.lookup_cap_dotdot .Xr sysctl 3 MIB is set to zero, ".." components in the paths, used in capability mode, are completely disabled. If the .Dv vfs.lookup_cap_dotdot_nonlocal MIB is set to zero, ".." is not allowed if found on non-local filesystem. .Pp The .Fa flags are formed by .Em or Ns 'ing the following values: .Pp .Bl -tag -width O_RESOLVE_BENEATH .It Dv O_RDONLY open for reading only .It Dv O_WRONLY open for writing only .It Dv O_RDWR open for reading and writing .It Dv O_EXEC open for execute only .It Dv O_SEARCH open for search only (an alias for .Dv O_EXEC typically used with .Dv O_DIRECTORY ) .It Dv O_NONBLOCK do not block on open .It Dv O_APPEND set file pointer to the end of the file before each write .It Dv O_CREAT create file if it does not exist .It Dv O_TRUNC truncate size to 0 .It Dv O_EXCL fail if .Dv O_CREAT is set and the file exists .It Dv O_SHLOCK atomically obtain a shared lock .It Dv O_EXLOCK atomically obtain an exclusive lock .It Dv O_DIRECT read and write directly from the backing store .It Dv O_FSYNC synchronous data and metadata writes .Pq historical synonym for Dv O_SYNC .It Dv O_SYNC synchronous data and metadata writes .It Dv O_DSYNC synchronous data writes .It Dv O_NOFOLLOW do not follow symlinks .It Dv O_NOCTTY ignored .It Dv O_TTY_INIT ignored .It Dv O_DIRECTORY error if file is not a directory .It Dv O_CLOEXEC automatically close file on .Xr execve 2 +.It Dv 
O_CLOFORK +automatically close file in any child process created with +.Xr fork 2 .It Dv O_VERIFY verify the contents of the file with .Xr mac_veriexec 4 .It Dv O_RESOLVE_BENEATH .Pq Xr openat 2 only path resolution must not cross the .Fa fd directory .It Dv O_PATH record only the target path in the opened descriptor .It Dv O_EMPTY_PATH .Pq Xr openat 2 only open file referenced by .Fa fd if path is empty .It Dv O_NAMEDATTR open a named attribute or named attribute directory .El .Pp Exactly one of the flags .Dv O_RDONLY , .Dv O_WRONLY , .Dv O_RDWR , or .Dv O_EXEC must be provided. .Pp Opening a file with .Dv O_APPEND set causes each write on the resulting file descriptor to be appended to the end of the file. .Pp If .Dv O_TRUNC is specified and the file exists, the file is truncated to zero length. .Pp If .Dv O_CREAT is set, but file already exists, this flag has no effect except when .Dv O_EXCL is set too, in this case .Fn open fails with .Er EEXIST . This may be used to implement a simple exclusive access locking mechanism. In all other cases, the file is created and the access permission bits (see .Xr chmod 2) of the file mode are set to the value of the third argument taken as .Fa "mode_t mode" and passed through the .Xr umask 2 . This argument does not affect whether the file is opened for reading, writing, or for both. The open' request for a lock on the file, created with .Dv O_CREAT , will never fail provided that the underlying file system supports locking; see also .Dv O_SHLOCK and .Dv O_EXLOCK below. .Pp If .Dv O_EXCL is set and the last component of the pathname is a symbolic link, .Fn open will fail even if the symbolic link points to a non-existent name. .Pp If .Dv O_NONBLOCK is specified and the .Fn open system call would block for some reason (for example, waiting for carrier on a dialup line), .Fn open returns immediately. The descriptor remains in non-blocking mode for subsequent operations.
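The simple exclusive access locking mechanism mentioned above for O_CREAT combined with O_EXCL might look like the following sketch. The names lockfile_try_acquire and lockfile_release are hypothetical, and the 0600 mode is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take a lock by creating path
 * exclusively.  Returns 1 if the lock file was created, 0 if it
 * already exists (EEXIST), and -1 on any other error.
 */
int
lockfile_try_acquire(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return (errno == EEXIST ? 0 : -1);
	close(fd);
	return (1);
}

/* Hypothetical helper: drop the lock by removing the lock file. */
int
lockfile_release(const char *path)
{
	assert(path != NULL);
	return (unlink(path));
}
```

A lock file of this kind outlives a process that exits without removing it, so a stale file must be cleaned up by other means. Locks obtained with O_SHLOCK, O_EXLOCK or flock(2) do not have this property.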
.Pp If .Dv O_SYNC is used in the mask, all writes will immediately and synchronously be written to disk. .Dv O_FSYNC is an historical synonym for .Dv O_SYNC . .Pp If .Dv O_DSYNC is used in the mask, all data and metadata required to read the data will be synchronously written to disk, but changes to metadata such as file access and modification timestamps may be written later. .Pp If .Dv O_NOFOLLOW is used in the mask and the target file passed to .Fn open is a symbolic link then the .Fn open will fail. .Pp When opening a file, a lock with .Xr flock 2 semantics can be obtained by setting .Dv O_SHLOCK for a shared lock, or .Dv O_EXLOCK for an exclusive lock. .Pp .Dv O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care. The semantics of this flag are filesystem dependent, and some filesystems may ignore it entirely. .Pp .Dv O_NOCTTY may be used to ensure the OS does not assign this file as the controlling terminal when it opens a tty device. This is the default on .Fx , but is present for POSIX compatibility. The .Fn open system call will not assign controlling terminals on .Fx . .Pp .Dv O_TTY_INIT may be used to ensure the OS restores the terminal attributes when initially opening a TTY. This is the default on .Fx , but is present for POSIX compatibility. The initial call to .Fn open on a TTY will always restore default terminal attributes on .Fx . .Pp .Dv O_DIRECTORY may be used to ensure the resulting file descriptor refers to a directory. This flag can be used to prevent applications with elevated privileges from opening files which are even unsafe to open with .Dv O_RDONLY , such as device nodes. .Pp .Dv O_CLOEXEC may be used to set .Dv FD_CLOEXEC flag for the newly returned file descriptor. 
.Pp +.Dv O_CLOFORK +may be used to set the +.Dv FD_CLOFORK +flag for the newly returned file descriptor. +The file will be closed in any child process created with +.Xr fork 2 , +.Xr vfork 2 +or +.Xr rfork 2 +with the +.Dv RFFDG +flag, remaining open in the parent. +Both the +.Dv O_CLOEXEC +and +.Dv O_CLOFORK +flags can be modified with the +.Dv F_SETFD +.Xr fcntl 2 +command. +.Pp .Dv O_VERIFY may be used to indicate to the kernel that the contents of the file should be verified before allowing the open to proceed. The details of what .Dq verified means is implementation specific. The run-time linker (rtld) uses this flag to ensure shared objects have been verified before operating on them. .Pp .Dv O_RESOLVE_BENEATH returns .Er ENOTCAPABLE if any intermediate component of the specified relative path does not reside in the directory hierarchy beneath the starting directory. Absolute paths or even the temporal escape from beneath of the starting directory is not allowed. .Pp When a directory is opened with .Dv O_SEARCH , execute permissions are checked at open time. The returned file descriptor may not be used for any read operations like .Xr getdirentries 2 . The primary use of this descriptor is as the lookup descriptor for the .Fn *at family of functions. If .Dv O_SEARCH was not requested at open time, then the .Fn *at functions use the current directory permissions for the directory referenced by the descriptor at the time of the .Fn *at call. .Pp .Dv O_PATH returns a file descriptor that can be used as a directory file descriptor for .Fn openat and other system calls taking a file descriptor argument, like .Xr fstatat 2 and others.
The other functionality of the returned file descriptor is limited to the following descriptor-level operations: .Pp .Bl -tag -width __acl_aclcheck_fd -offset indent -compact .It Xr fcntl 2 but advisory locking is not allowed .It Xr dup 2 .It Xr close 2 .It Xr fstat 2 .It Xr fstatfs 2 .It Xr fchdir 2 .It Xr fchroot 2 .It Xr fexecve 2 .It Xr funlinkat 2 can be passed as the third argument .It Dv SCM_RIGHTS can be passed over a .Xr unix 4 socket using a .Dv SCM_RIGHTS message .It Xr kqueue 2 only with .Dv EVFILT_VNODE .It Xr __acl_get_fd 2 .It Xr __acl_aclcheck_fd 2 .It Xr extattr 2 .It Xr capsicum 4 can be passed to .Fn cap_*_limit and .Fn cap_*_get system calls (such as .Xr cap_rights_limit 2 ) . .El .Pp Other operations like .Xr read 2 , .Xr ftruncate 2 , and any other that operate on file and not on file descriptor (except .Xr fstat 2 ) , are not allowed. .Pp A file descriptor created with the .Dv O_PATH flag can be opened as a normal (operable) file descriptor by specifying it as the .Fa fd argument to .Fn openat with an empty .Fa path and the .Dv O_EMPTY_PATH flag. Such an open behaves as if the current path of the file referenced by .Fa fd is passed, except that path walk permissions are not checked. See also the description of .Dv AT_EMPTY_PATH flag for .Xr fstatat 2 and related syscalls. .Pp Conversely, a file descriptor .Dv fd referencing a filesystem file can be converted to the .Dv O_PATH type of descriptor by using the following call .Dl opath_fd = openat(fd, \[dq]\[dq], O_EMPTY_PATH | O_PATH); .Pp If successful, .Fn open returns a non-negative integer, termed a file descriptor. It returns \-1 on failure. The file descriptor value returned is the lowest numbered descriptor currently not in use by the process. The file pointer used to mark the current position within the file is set to the beginning of the file. 
.Pp If a sleeping open of a device node from .Xr devfs 4 is interrupted by a signal, the call always fails with .Er EINTR , even if the .Dv SA_RESTART flag is set for the signal. A sleeping open of a fifo (see .Xr mkfifo 2 ) is restarted as normal. .Pp When a new file is created, it is assigned the group of the directory which contains it. .Pp Unless .Dv O_CLOEXEC flag was specified, the new descriptor is set to remain open across .Xr execve 2 system calls; see .Xr close 2 , .Xr fcntl 2 and the description of the .Dv O_CLOEXEC flag. .Pp When the .Dv O_NAMEDATTR flag is specified for an .Fn openat where the .Fa fd argument is for a file object, a named attribute for the file object is opened and not the file object itself. If the .Dv O_CREAT flag has been specified as well, the named attribute will be created if it does not exist. When the .Dv O_NAMEDATTR flag is specified for a .Fn open , a named attribute for the current working directory is opened and not the current working directory. The .Fa path argument for this .Fn openat or .Fn open must be a single component name with no embedded .Ql / . If the .Fa path argument is .Ql .\& then the named attribute directory for the file object is opened. (See .Xr named_attribute 7 for more information.) .Pp The system imposes a limit on the number of file descriptors open simultaneously by one process. The .Xr getdtablesize 2 system call returns the current system limit. .Sh RETURN VALUES If successful, .Fn open and .Fn openat return a non-negative integer, termed a file descriptor. They return \-1 on failure, and set .Va errno to indicate the error. .Sh ERRORS The named file is opened unless: .Bl -tag -width Er .It Bq Er ENOTDIR A component of the path prefix is not a directory. .It Bq Er ENAMETOOLONG A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters. .It Bq Er ENOENT .Dv O_CREAT is not set and the named file does not exist. 
.It Bq Er ENOENT A component of the path name that must exist does not exist. .It Bq Er EACCES Search permission is denied for a component of the path prefix. .It Bq Er EACCES The required permissions (for reading and/or writing) are denied for the given flags. .It Bq Er EACCES .Dv O_TRUNC is specified and write permission is denied. .It Bq Er EACCES .Dv O_CREAT is specified, the file does not exist, and the directory in which it is to be created does not permit writing. .It Bq Er EPERM .Dv O_CREAT is specified, the file does not exist, and the directory in which it is to be created has its immutable flag set, see the .Xr chflags 2 manual page for more information. .It Bq Er EPERM The named file has its immutable flag set and the file is to be modified. .It Bq Er EPERM The named file has its append-only flag set, the file is to be modified, and .Dv O_TRUNC is specified or .Dv O_APPEND is not specified. .It Bq Er ELOOP Too many symbolic links were encountered in translating the pathname. .It Bq Er EISDIR The named file is a directory, and the arguments specify it is to be modified. .It Bq Er EISDIR The named file is a directory, and the flags specified .Dv O_CREAT without .Dv O_DIRECTORY . .It Bq Er EROFS The named file resides on a read-only file system, and the file is to be modified. .It Bq Er EROFS .Dv O_CREAT is specified and the named file would reside on a read-only file system. .It Bq Er EMFILE The process has already reached its limit for open file descriptors. .It Bq Er ENFILE The system file table is full. .It Bq Er EMLINK .Dv O_NOFOLLOW was specified and the target is a symbolic link. POSIX specifies a different error for this case; see the note in .Sx STANDARDS below. .It Bq Er ENXIO The named file is a character special or block special file, and the device associated with this special file does not exist. .It Bq Er ENXIO .Dv O_NONBLOCK is set, the named file is a fifo, .Dv O_WRONLY is set, and no process has the file open for reading. 
.It Bq Er EINTR The .Fn open operation was interrupted by a signal. .It Bq Er EOPNOTSUPP .Dv O_SHLOCK or .Dv O_EXLOCK is specified but the underlying file system does not support locking. .It Bq Er EOPNOTSUPP The named file is a special file mounted through a file system that does not support access to it (for example, NFS). .It Bq Er EWOULDBLOCK .Dv O_NONBLOCK and one of .Dv O_SHLOCK or .Dv O_EXLOCK is specified and the file is locked. .It Bq Er ENOSPC .Dv O_CREAT is specified, the file does not exist, and the directory in which the entry for the new file is being placed cannot be extended because there is no space left on the file system containing the directory. .It Bq Er ENOSPC .Dv O_CREAT is specified, the file does not exist, and there are no free inodes on the file system on which the file is being created. .It Bq Er EDQUOT .Dv O_CREAT is specified, the file does not exist, and the directory in which the entry for the new file is being placed cannot be extended because the user's quota of disk blocks on the file system containing the directory has been exhausted. .It Bq Er EDQUOT .Dv O_CREAT is specified, the file does not exist, and the user's quota of inodes on the file system on which the file is being created has been exhausted. .It Bq Er EIO An I/O error occurred while making the directory entry or allocating the inode for .Dv O_CREAT . .It Bq Er EINTEGRITY Corrupted data was detected while reading from the file system. .It Bq Er ETXTBSY The file is a pure procedure (shared text) file that is being executed and the .Fn open system call requests write access. .It Bq Er EFAULT The .Fa path argument points outside the process's allocated address space. .It Bq Er EEXIST .Dv O_CREAT and .Dv O_EXCL were specified and the file exists. .It Bq Er EOPNOTSUPP An attempt was made to open a socket (not currently implemented). 
.It Bq Er EINVAL An attempt was made to open a descriptor with an illegal combination of .Dv O_RDONLY , .Dv O_WRONLY , or .Dv O_RDWR , and .Dv O_EXEC or .Dv O_SEARCH . .It Bq Er EINVAL .Dv O_CREAT is specified, and the last component of the .Fa path argument is invalid on the file system on which the file is being created. .It Bq Er EBADF The .Fa path argument does not specify an absolute path and the .Fa fd argument is neither .Dv AT_FDCWD nor a valid file descriptor open for searching. .It Bq Er ENOTDIR The .Fa path argument is not an absolute path and .Fa fd is neither .Dv AT_FDCWD nor a file descriptor associated with a directory. .It Bq Er ENOTDIR .Dv O_DIRECTORY is specified and the file is not a directory. .It Bq Er ECAPMODE .Dv AT_FDCWD is specified and the process is in capability mode. .It Bq Er ECAPMODE .Fn open was called and the process is in capability mode. .It Bq Er ENOTCAPABLE .Fa path is an absolute path and the process is in capability mode. .It Bq Er ENOTCAPABLE .Fa path is an absolute path and .Dv O_RESOLVE_BENEATH is specified. .It Bq Er ENOTCAPABLE .Fa path contains a ".." component leading to a directory outside of the directory hierarchy specified by .Fa fd and the process is in capability mode. .It Bq Er ENOTCAPABLE .Fa path contains a ".." component leading to a directory outside of the directory hierarchy specified by .Fa fd and .Dv O_RESOLVE_BENEATH is specified. .It Bq Er ENOTCAPABLE .Fa path contains a ".." component, the .Dv vfs.lookup_cap_dotdot .Xr sysctl 3 is set, and the process is in capability mode. .It Bq Er ENOATTR .Dv O_NAMEDATTR has been specified and the file object is not a named attribute directory or named attribute. 
.El .Sh SEE ALSO .Xr chmod 2 , .Xr close 2 , .Xr dup 2 , .Xr fexecve 2 , .Xr fhopen 2 , .Xr getdtablesize 2 , .Xr getfh 2 , .Xr lgetfh 2 , .Xr lseek 2 , .Xr read 2 , .Xr umask 2 , .Xr write 2 , .Xr fopen 3 , .Xr capsicum 4 , .Xr named_attribute 7 .Sh STANDARDS These functions are specified by .St -p1003.1-2008 . .Pp .Fx sets .Va errno to .Er EMLINK instead of .Er ELOOP as specified by POSIX when .Dv O_NOFOLLOW is set in flags and the final component of pathname is a symbolic link to distinguish it from the case of too many symbolic link traversals in one of its non-final components. .Pp The Open Group Extended API Set 2 specification, that introduced the .Fn *at API, required that the test for whether .Fa fd is searchable is based on whether .Fa fd is open for searching, not whether the underlying directory currently permits searches. The present implementation of the .Fa openat system call is believed to be compatible with .\" .St -p1003.1-2017 , .\" XXX: This should be replaced in the future when an appropriate argument to .\" the St macro is available: -p1003.1-2017 .No IEEE Std 1003.1-2008, 2017 Edition ("POSIX.1") , which specifies that behavior for .Dv O_SEARCH , in the absence of the flag the implementation checks the current permissions of a directory. .Sh HISTORY The .Fn open function appeared in .At v1 . The .Fn openat function was introduced in .Fx 8.0 . .Dv O_DSYNC appeared in 13.0. .Dv O_NAMEDATTR appeared in 15.0. +.Dv O_CLOFORK +appeared in +.Fx 15.0 . .Sh BUGS The .Fa mode argument is variadic and may result in different calling conventions than might otherwise be expected. diff --git a/lib/libsys/pipe.2 b/lib/libsys/pipe.2 index 9531c9717395..37d6eba420de 100644 --- a/lib/libsys/pipe.2 +++ b/lib/libsys/pipe.2 @@ -1,175 +1,182 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. 
.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd December 1, 2017 +.Dd May 17, 2025 .Dt PIPE 2 .Os .Sh NAME .Nm pipe , .Nm pipe2 .Nd create descriptor pair for interprocess communication .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In unistd.h .Ft int .Fn pipe "int fildes[2]" .Ft int .Fn pipe2 "int fildes[2]" "int flags" .Sh DESCRIPTION The .Fn pipe function creates a .Em pipe , which is an object allowing bidirectional data flow, and allocates a pair of file descriptors. 
.Pp The .Fn pipe2 system call allows control over the attributes of the file descriptors via the .Fa flags argument. Values for .Fa flags are constructed by a bitwise-inclusive OR of flags from the following list, defined in .In fcntl.h : .Bl -tag -width ".Dv O_NONBLOCK" .It Dv O_CLOEXEC Set the close-on-exec flag for the new file descriptors. +.It Dv O_CLOFORK +Set the close-on-fork flag for the new file descriptors. .It Dv O_NONBLOCK Set the non-blocking flag for the ends of the pipe. .El .Pp If the .Fa flags argument is 0, the behavior is identical to a call to .Fn pipe . .Pp By convention, the first descriptor is normally used as the .Em read end of the pipe, and the second is normally the .Em write end , so that data written to .Fa fildes[1] appears on (i.e., can be read from) .Fa fildes[0] . This allows the output of one program to be sent to another program: the source's standard output is set up to be the write end of the pipe, and the sink's standard input is set up to be the read end of the pipe. The pipe itself persists until all its associated descriptors are closed. .Pp A pipe that has had an end closed is considered .Em widowed . Writing on such a pipe causes the writing process to receive a .Dv SIGPIPE signal. Widowing a pipe is the only way to deliver end-of-file to a reader: after the reader consumes any buffered data, reading a widowed pipe returns a zero count. .Pp The bidirectional nature of this implementation of pipes is not portable to older systems, so it is recommended to use the convention for using the endpoints in the traditional manner when using a pipe in one direction. .Sh IMPLEMENTATION NOTES The .Fn pipe function calls the .Fn pipe2 system call. As a result, system call traces such as those captured by .Xr dtrace 1 or .Xr ktrace 1 will show calls to .Fn pipe2 . 
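As a hedged illustration of the flags argument, the sketch below creates a pipe whose ends are non-blocking and close-on-exec, adding O_CLOFORK only when the header defines it. The _GNU_SOURCE definition is an assumption needed for the pipe2 prototype on glibc and is not expected to matter on FreeBSD. The function names make_private_pipe and pipe_end_is_private are hypothetical.

```c
#define _GNU_SOURCE	/* assumption: exposes pipe2() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Create a pipe whose ends are non-blocking and close-on-exec, and
 * close-on-fork where O_CLOFORK is defined.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
make_private_pipe(int fildes[2])
{
	int flags = O_CLOEXEC | O_NONBLOCK;

	assert(fildes != NULL);
#ifdef O_CLOFORK
	flags |= O_CLOFORK;
#endif
	return (pipe2(fildes, flags));
}

/*
 * Return 1 if fd has both FD_CLOEXEC and O_NONBLOCK set, 0 if either
 * is missing, -1 if the flags cannot be read.
 */
int
pipe_end_is_private(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return (-1);
	return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```

With close-on-fork set on both ends, a forked child inherits neither descriptor, so a pipe created this way suits in-process uses such as a self-pipe for waking an event loop rather than parent-to-child communication.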
.Sh RETURN VALUES .Rv -std pipe .Sh ERRORS The .Fn pipe and .Fn pipe2 system calls will fail if: .Bl -tag -width Er .It Bq Er EFAULT .Ar fildes argument points to an invalid memory location. .It Bq Er EMFILE Too many descriptors are active. .It Bq Er ENFILE The system file table is full. .It Bq Er ENOMEM Not enough kernel memory to establish a pipe. .El .Pp The .Fn pipe2 system call will also fail if: .Bl -tag -width Er .It Bq Er EINVAL The .Fa flags argument is invalid. .El .Sh SEE ALSO .Xr sh 1 , .Xr fork 2 , .Xr read 2 , .Xr socketpair 2 , .Xr write 2 .Sh HISTORY The .Fn pipe function appeared in .At v3 . .Pp Bidirectional pipes were first used on .At V.4 . .Pp The .Fn pipe2 function appeared in .Fx 10.0 . .Pp The .Fn pipe function became a wrapper around .Fn pipe2 in .Fx 11.0 . +.Pp +The +.Dv O_CLOFORK +flag appeared in +.Fx 15.0 . diff --git a/lib/libsys/recv.2 b/lib/libsys/recv.2 index f3ee60b75663..b78cd70b8a1d 100644 --- a/lib/libsys/recv.2 +++ b/lib/libsys/recv.2 @@ -1,401 +1,402 @@ .\" Copyright (c) 1983, 1990, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. 
.\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd July 30, 2022 +.Dd May 17, 2025 .Dt RECV 2 .Os .Sh NAME .Nm recv , .Nm recvfrom , .Nm recvmsg , .Nm recvmmsg .Nd receive message(s) from a socket .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/socket.h .Ft ssize_t .Fn recv "int s" "void *buf" "size_t len" "int flags" .Ft ssize_t .Fn recvfrom "int s" "void *buf" "size_t len" "int flags" "struct sockaddr * restrict from" "socklen_t * restrict fromlen" .Ft ssize_t .Fn recvmsg "int s" "struct msghdr *msg" "int flags" .Ft ssize_t .Fn recvmmsg "int s" "struct mmsghdr * restrict msgvec" "size_t vlen" "int flags" "const struct timespec * restrict timeout" .Sh DESCRIPTION The .Fn recvfrom , .Fn recvmsg , and .Fn recvmmsg system calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented. .Pp If .Fa from is not a null pointer and the socket is not connection-oriented, the source address of the message is filled in. The .Fa fromlen argument is a value-result argument, initialized to the size of the buffer associated with .Fa from , and modified on return to indicate the actual size of the address stored there. 
.Pp The .Fn recv function is normally used only on a .Em connected socket (see .Xr connect 2 ) and is identical to .Fn recvfrom with a null pointer passed as its .Fa from argument. .Pp The .Fn recvmmsg function is used to receive multiple messages at a call. Their number is supplied by .Fa vlen . The messages are placed in the buffers described by .Fa msgvec vector, after reception. The size of each received message is placed in the .Fa msg_len field of each element of the vector. If .Fa timeout is NULL the call blocks until the data is available for each supplied message buffer. Otherwise it waits for data for the specified amount of time. If the timeout expired and there is no data received, a value 0 is returned. The .Xr ppoll 2 system call is used to implement the timeout mechanism, before first receive is performed. .Pp The .Fn recv , .Fn recvfrom and .Fn recvmsg return the length of the message on successful completion, whereas .Fn recvmmsg returns the number of received messages. If a message is too long to fit in the supplied buffer, excess bytes may be discarded depending on the type of socket the message is received from (see .Xr socket 2 ) . .Pp If no messages are available at the socket, the receive call waits for a message to arrive, unless the socket is non-blocking (see .Xr fcntl 2 ) in which case the value \-1 is returned and the global variable .Va errno is set to .Er EAGAIN . The receive calls except .Fn recvmmsg normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested; this behavior is affected by the socket-level options .Dv SO_RCVLOWAT and .Dv SO_RCVTIMEO described in .Xr getsockopt 2 . The .Fn recvmmsg function implements this behaviour for each message in the vector. .Pp The .Xr select 2 system call may be used to determine when more data arrives. 
.Pp The .Fa flags argument to a .Fn recv function is formed by .Em or Ap ing one or more of the values: .Bl -column ".Dv MSG_CMSG_CLOEXEC" -offset indent .It Dv MSG_OOB Ta process out-of-band data .It Dv MSG_PEEK Ta peek at incoming message .It Dv MSG_TRUNC Ta return real packet or datagram length .It Dv MSG_WAITALL Ta wait for full request or error .It Dv MSG_DONTWAIT Ta do not block .It Dv MSG_CMSG_CLOEXEC Ta set received fds close-on-exec +.It Dv MSG_CMSG_CLOFORK Ta set received fds close-on-fork .It Dv MSG_WAITFORONE Ta do not block after receiving the first message (only for .Fn recvmmsg ) .El .Pp The .Dv MSG_OOB flag requests receipt of out-of-band data that would not be received in the normal data stream. Some protocols place expedited data at the head of the normal data queue, and thus this flag cannot be used with such protocols. The .Dv MSG_PEEK flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue. Thus, a subsequent receive call will return the same data. The .Dv MSG_TRUNC flag causes the receive operation to return the full length of the packet or datagram even if larger than provided buffer. The flag is supported on SOCK_DGRAM sockets for .Dv AF_INET , .Dv AF_INET6 and .Dv AF_UNIX families. The .Dv MSG_WAITALL flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned. The .Dv MSG_DONTWAIT flag requests the call to return when it would block otherwise. If no data is available, .Va errno is set to .Er EAGAIN . This flag is not available in .St -ansiC or .St -isoC-99 compilation mode. The .Dv MSG_WAITFORONE flag sets MSG_DONTWAIT after the first message has been received. This flag is only relevant for .Fn recvmmsg . 
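The MSG_CMSG_CLOEXEC and MSG_CMSG_CLOFORK flags apply to descriptors received as SCM_RIGHTS ancillary data. The following is a minimal sketch, assuming an AF_UNIX socket pair and a single passed descriptor. MSG_CMSG_CLOFORK is used only where the header defines it, and the helper names send_fd and recv_fd_private are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over sock along with one byte of data. Returns 0 or -1. */
int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);	/* the buffer holds one header by construction */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/*
 * Receive one descriptor from sock. The kernel sets close-on-exec on
 * it at receive time, and close-on-fork where MSG_CMSG_CLOFORK exists.
 * Returns the new descriptor, or -1 on failure.
 */
int
recv_fd_private(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd, flags = MSG_CMSG_CLOEXEC;

#ifdef MSG_CMSG_CLOFORK
	flags |= MSG_CMSG_CLOFORK;
#endif
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, flags) <= 0)
		return (-1);
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

Setting the flags in the receive call avoids the window that a separate F_SETFD call would leave open in a multithreaded program, during which another thread could fork or exec and leak the descriptor.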
.Pp The .Fn recvmsg system call uses a .Fa msghdr structure to minimize the number of directly supplied arguments. This structure has the following form, as defined in .In sys/socket.h : .Bd -literal struct msghdr { void *msg_name; /* optional address */ socklen_t msg_namelen; /* size of address */ struct iovec *msg_iov; /* scatter/gather array */ int msg_iovlen; /* # elements in msg_iov */ void *msg_control; /* ancillary data, see below */ socklen_t msg_controllen;/* ancillary data buffer len */ int msg_flags; /* flags on received message */ }; .Ed .Pp Here .Fa msg_name and .Fa msg_namelen specify the source address if the socket is unconnected; .Fa msg_name may be given as a null pointer if no names are desired or required. The .Fa msg_iov and .Fa msg_iovlen arguments describe scatter gather locations, as discussed in .Xr read 2 . The .Fa msg_control argument, which has length .Fa msg_controllen , points to a buffer for other protocol control related messages or other miscellaneous ancillary data. The messages are of the form: .Bd -literal struct cmsghdr { socklen_t cmsg_len; /* data byte count, including hdr */ int cmsg_level; /* originating protocol */ int cmsg_type; /* protocol-specific type */ /* followed by u_char cmsg_data[]; */ }; .Ed .Pp As an example, the SO_TIMESTAMP socket option returns a reception timestamp for UDP packets. .Pp With .Dv AF_UNIX domain sockets, ancillary data can be used to pass file descriptors and process credentials. See .Xr unix 4 for details. .Pp The .Fa msg_flags field is set on return according to the message received. .Dv MSG_EOR indicates end-of-record; the data returned completed a record (generally used with sockets of type .Dv SOCK_SEQPACKET ) . .Dv MSG_TRUNC indicates that the trailing portion of a datagram was discarded because the datagram was larger than the buffer supplied. .Dv MSG_CTRUNC indicates that some control data were discarded due to lack of space in the buffer for ancillary data. 
.Dv MSG_OOB is returned to indicate that expedited or out-of-band data were received. .Pp The .Fn recvmmsg system call uses the .Fa mmsghdr structure, defined as follows in the .In sys/socket.h header: .Bd -literal struct mmsghdr { struct msghdr msg_hdr; /* message header */ ssize_t msg_len; /* message length */ }; .Ed .Pp On data reception the .Fa msg_len field is updated to the length of the received message. .Sh RETURN VALUES On successful completion, the .Fn recv , .Fn recvfrom , and .Fn recvmsg functions return the number of bytes received, while the .Fn recvmmsg function returns the number of messages received. If no messages are available to be received and the peer has performed an orderly shutdown, 0 is returned. Otherwise, -1 is returned and .Va errno is set to indicate the error. .Sh ERRORS The calls fail if: .Bl -tag -width Er .It Bq Er EBADF The argument .Fa s is an invalid descriptor. .It Bq Er ECONNRESET The remote socket end is forcibly closed. .It Bq Er ENOTCONN The socket is associated with a connection-oriented protocol and has not been connected (see .Xr connect 2 and .Xr accept 2 ) . .It Bq Er ENOTSOCK The argument .Fa s does not refer to a socket. .It Bq Er EMFILE The .Fn recvmsg system call was used to receive rights (file descriptors) that were in flight on the connection. However, the receiving program did not have enough free file descriptor slots to accept them. In this case the descriptors are closed, with pending data either discarded in the case of the unreliable datagram protocol or preserved in the case of a reliable protocol. The pending data can be retrieved with another call to .Fn recvmsg . .It Bq Er EMSGSIZE The .Fa msg_iovlen member of the .Fa msghdr structure pointed to by .Fa msg is less than or equal to 0, or is greater than .Va IOV_MAX . .It Bq Er EAGAIN The socket is marked non-blocking and the receive operation would block, or a receive timeout had been set and the timeout expired before data were received. 
.It Bq Er EINTR The receive was interrupted by delivery of a signal before any data were available. .It Bq Er EFAULT The receive buffer pointer(s) point outside the process's address space. .El .Sh SEE ALSO .Xr fcntl 2 , .Xr getsockopt 2 , .Xr read 2 , .Xr select 2 , .Xr socket 2 , .Xr CMSG_DATA 3 , .Xr unix 4 .Sh HISTORY The .Fn recv function appeared in .Bx 4.2 . The .Fn recvmmsg function appeared in .Fx 11.0 . diff --git a/lib/libsys/socket.2 b/lib/libsys/socket.2 index a383cbcc4d80..b211611c6354 100644 --- a/lib/libsys/socket.2 +++ b/lib/libsys/socket.2 @@ -1,349 +1,358 @@ .\" Copyright (c) 1983, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd January 15, 2023 +.Dd May 17, 2025 .Dt SOCKET 2 .Os .Sh NAME .Nm socket .Nd create an endpoint for communication .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/socket.h .Ft int .Fn socket "int domain" "int type" "int protocol" .Sh DESCRIPTION The .Fn socket system call creates an endpoint for communication and returns a descriptor. .Pp The .Fa domain argument specifies a communications domain within which communication will take place; this selects the protocol family which should be used. These families are defined in the include file .In sys/socket.h . The currently understood formats are: .Pp .Bd -literal -offset indent -compact PF_LOCAL Host-internal protocols (alias for PF_UNIX), PF_UNIX Host-internal protocols, PF_INET Internet version 4 protocols, PF_INET6 Internet version 6 protocols, PF_DIVERT Firewall packet diversion/re-injection, PF_ROUTE Internal routing protocol, PF_KEY Internal key-management function, PF_NETGRAPH Netgraph sockets, PF_NETLINK Netlink protocols, PF_BLUETOOTH Bluetooth protocols, PF_INET_SDP OFED socket direct protocol (IPv4), AF_HYPERV HyperV sockets .Ed .Pp Each protocol family is connected to an address family, which has the same name except that the prefix is .Dq Dv AF_ in place of .Dq Dv PF_ . Other protocol families may be also defined, beginning with .Dq Dv PF_ , with corresponding address families. .Pp The socket has the indicated .Fa type , which specifies the semantics of communication. 
Currently defined types are: .Pp .Bd -literal -offset indent -compact SOCK_STREAM Stream socket, SOCK_DGRAM Datagram socket, SOCK_RAW Raw-protocol interface, SOCK_SEQPACKET Sequenced packet stream .Ed .Pp A .Dv SOCK_STREAM type provides sequenced, reliable, two-way connection based byte streams. An out-of-band data transmission mechanism may be supported. A .Dv SOCK_DGRAM socket supports datagrams (connectionless, unreliable messages of a fixed (typically small) maximum length). A .Dv SOCK_SEQPACKET socket may provide a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer may be required to read an entire packet with each read system call. This facility may have protocol-specific properties. .Dv SOCK_RAW sockets provide access to internal network protocols and interfaces. The .Dv SOCK_RAW type is available only to the super-user and is described in .Xr ip 4 and .Xr ip6 4 . .Pp Additionally, the following flags are allowed in the .Fa type argument: .Pp .Bd -literal -offset indent -compact SOCK_CLOEXEC Set close-on-exec on the new descriptor, +SOCK_CLOFORK Set close-on-fork on the new descriptor, SOCK_NONBLOCK Set non-blocking mode on the new socket .Ed .Pp The .Fa protocol argument specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is particular to the .Dq "communication domain" in which communication is to take place; see .Xr protocols 5 . .Pp The .Fa protocol argument may be set to zero (0) to request the default implementation of a socket type for the protocol, if any. .Pp Sockets of type .Dv SOCK_STREAM are full-duplex byte streams, similar to pipes. 
A stream socket must be in a .Em connected state before any data may be sent or received on it. A connection to another socket is created with a .Xr connect 2 system call. Once connected, data may be transferred using .Xr read 2 and .Xr write 2 calls or some variant of the .Xr send 2 and .Xr recv 2 functions. (Some protocol families, such as the Internet family, support the notion of an .Dq implied connect , which permits data to be sent piggybacked onto a connect operation by using the .Xr sendto 2 system call.) When a session has been completed a .Xr close 2 may be performed. Out-of-band data may also be transmitted as described in .Xr send 2 and received as described in .Xr recv 2 . .Pp The communications protocols used to implement a .Dv SOCK_STREAM ensure that data is not lost or duplicated. If a piece of data for which the peer protocol has buffer space cannot be successfully transmitted within a reasonable length of time, then the connection is considered broken and calls will indicate an error with -1 returns and with .Er ETIMEDOUT as the specific code in the global variable .Va errno . The protocols optionally keep sockets .Dq warm by forcing transmissions roughly every minute in the absence of other activity. An error is then indicated if no response can be elicited on an otherwise idle connection for an extended period (e.g.\& 5 minutes). By default, a .Dv SIGPIPE signal is raised if a process sends on a broken stream, but this behavior may be inhibited via .Xr setsockopt 2 . .Pp .Dv SOCK_SEQPACKET sockets employ the same system calls as .Dv SOCK_STREAM sockets. The only difference is that .Xr read 2 calls will return only the amount of data requested, and any remaining in the arriving packet will be discarded. .Pp .Dv SOCK_DGRAM and .Dv SOCK_RAW sockets allow sending of datagrams to correspondents named in .Xr send 2 calls. Datagrams are generally received with .Xr recvfrom 2 , which returns the next datagram with its return address. 
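The flags accepted in the type argument can be combined with the socket type by bitwise OR. The sketch below, which assumes an AF_UNIX datagram socket purely for illustration, requests close-on-exec and non-blocking mode at creation and adds SOCK_CLOFORK only where the header defines it. The helper names private_dgram_socket and socket_flags_ok are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create a local datagram socket that is non-blocking and
 * close-on-exec, and close-on-fork where SOCK_CLOFORK is defined.
 * Returns the descriptor, or -1 with errno set.
 */
int
private_dgram_socket(void)
{
	int type = SOCK_DGRAM | SOCK_CLOEXEC | SOCK_NONBLOCK;

#ifdef SOCK_CLOFORK
	type |= SOCK_CLOFORK;
#endif
	return (socket(AF_UNIX, type, 0));
}

/*
 * Return 1 if fd has both FD_CLOEXEC and O_NONBLOCK set, 0 if either
 * is missing, -1 if the flags cannot be read.
 */
int
socket_flags_ok(int fd)
{
	int fdflags, flflags;

	assert(fd >= 0);
	fdflags = fcntl(fd, F_GETFD);
	flflags = fcntl(fd, F_GETFL);
	if (fdflags == -1 || flflags == -1)
		return (-1);
	return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```

Requesting the flags in the socket call itself sets them atomically with descriptor creation, which a later fcntl call cannot guarantee when other threads may fork or exec concurrently.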
 .Pp
 An
 .Xr fcntl 2
 system call can be used to specify a process group to receive
 a
 .Dv SIGURG
 signal when the out-of-band data arrives.
 It may also enable non-blocking I/O
 and asynchronous notification of I/O events
 via
 .Dv SIGIO .
 .Pp
 The operation of sockets is controlled by socket level
 .Em options .
 These options are defined in the file
 .In sys/socket.h .
 The
 .Xr setsockopt 2
 and
 .Xr getsockopt 2
 system calls are used to set and get options, respectively.
 .Sh RETURN VALUES
 A -1 is returned if an error occurs, otherwise the return
 value is a descriptor referencing the socket.
 .Sh ERRORS
 The
 .Fn socket
 system call fails if:
 .Bl -tag -width Er
 .It Bq Er EACCES
 Permission to create a socket of the specified type and/or protocol
 is denied.
 .It Bq Er EAFNOSUPPORT
 The address family (domain) is not supported or the
 specified domain is not supported by this protocol family.
 .It Bq Er EMFILE
 The per-process descriptor table is full.
 .It Bq Er ENFILE
 The system file table is full.
 .It Bq Er ENOBUFS
 Insufficient buffer space is available.
 The socket cannot be created until sufficient resources are freed.
 .It Bq Er EPERM
 User has insufficient privileges to carry out the requested operation.
 .It Bq Er EPROTONOSUPPORT
 The protocol type or the specified protocol is not supported
 within this domain.
 .It Bq Er EPROTOTYPE
 The socket type is not supported by the protocol.
 .El
 .Sh SEE ALSO
 .Xr accept 2 ,
 .Xr bind 2 ,
 .Xr connect 2 ,
 .Xr getpeername 2 ,
 .Xr getsockname 2 ,
 .Xr getsockopt 2 ,
 .Xr ioctl 2 ,
 .Xr listen 2 ,
 .Xr read 2 ,
 .Xr recv 2 ,
 .Xr select 2 ,
 .Xr send 2 ,
 .Xr shutdown 2 ,
 .Xr socketpair 2 ,
 .Xr write 2 ,
 .Xr CMSG_DATA 3 ,
 .Xr getprotoent 3 ,
 .Xr divert 4 ,
 .Xr ip 4 ,
 .Xr ip6 4 ,
 .Xr netgraph 4 ,
 .Xr protocols 5
 .Rs
 .%T "An Introductory 4.3 BSD Interprocess Communication Tutorial"
 .%B PS1
 .%N 7
 .Re
 .Rs
 .%T "BSD Interprocess Communication Tutorial"
 .%B PS1
 .%N 8
 .Re
 .Sh STANDARDS
 The
 .Fn socket
 function conforms to
 .St -p1003.1-2008 .
 The
 .Tn POSIX
 standard specifies only the
 .Dv AF_INET ,
 .Dv AF_INET6 ,
 and
 .Dv AF_UNIX
 constants for address families, and requires the use of
 .Dv AF_*
 constants for the
 .Fa domain
 argument of
 .Fn socket .
 The
 .Dv SOCK_CLOEXEC
-flag is expected to conform to the next revision of the
+and
+.Dv SOCK_CLOFORK
+flags are expected to conform to
+.St -p1003.1-2024 .
 .Tn POSIX
 standard.
 The
 .Dv SOCK_RDM
 .Fa type ,
 the
 .Dv PF_*
 constants, and other address families are
 .Fx
 extensions.
 .Sh HISTORY
 The
 .Fn socket
 system call appeared in
 .Bx 4.2 .
+.Pp
+The
+.Dv SOCK_CLOFORK
+flag appeared in
+.Fx 15.0 .
diff --git a/lib/libsys/socketpair.2 b/lib/libsys/socketpair.2
index 5874a0791f4d..60dec74f9cc2 100644
--- a/lib/libsys/socketpair.2
+++ b/lib/libsys/socketpair.2
@@ -1,104 +1,105 @@
 .\" Copyright (c) 1983, 1991, 1993
 .\"	The Regents of the University of California.  All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\"    notice, this list of conditions and the following disclaimer.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\"    notice, this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\" 3. Neither the name of the University nor the names of its contributors
 .\"    may be used to endorse or promote products derived from this software
 .\"    without specific prior written permission.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
-.Dd February 10, 2018
+.Dd May 17, 2025
 .Dt SOCKETPAIR 2
 .Os
 .Sh NAME
 .Nm socketpair
 .Nd create a pair of connected sockets
 .Sh LIBRARY
 .Lb libc
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/socket.h
 .Ft int
 .Fn socketpair "int domain" "int type" "int protocol" "int *sv"
 .Sh DESCRIPTION
 The
 .Fn socketpair
 system call creates an unnamed pair of connected sockets in
 the specified communications
 .Fa domain ,
 of the specified
 .Fa type ,
 and using the optionally specified
 .Fa protocol .
 The descriptors used in referencing the new sockets
 are returned in
 .Fa sv Ns [0]
 and
 .Fa sv Ns [1] .
 The two sockets are indistinguishable.
 .Pp
 The
-.Dv SOCK_CLOEXEC
+.Dv SOCK_CLOEXEC ,
+.Dv SOCK_CLOFORK
 and
 .Dv SOCK_NONBLOCK
 flags in the
 .Fa type
 argument apply to both descriptors.
 .Sh RETURN VALUES
 .Rv -std socketpair
 .Sh ERRORS
 The call succeeds unless:
 .Bl -tag -width Er
 .It Bq Er EMFILE
 Too many descriptors are in use by this process.
 .It Bq Er EAFNOSUPPORT
 The specified address family is not supported on this machine.
 .It Bq Er EPROTONOSUPPORT
 The specified protocol is not supported on this machine.
 .It Bq Er EOPNOTSUPP
 The specified protocol does not support creation of socket pairs.
 .It Bq Er EFAULT
 The address
 .Fa sv
 does not specify a valid part of the
 process address space.
 .El
 .Sh SEE ALSO
 .Xr pipe 2 ,
 .Xr read 2 ,
 .Xr socket 2 ,
 .Xr write 2
 .Sh STANDARDS
 The
 .Fn socketpair
 system call conforms to
 .St -p1003.1-2001
 and
 .St -p1003.1-2008 .
 .Sh HISTORY
 The
 .Fn socketpair
 system call appeared in
 .Bx 4.2 .
 .Sh BUGS
 This call is currently implemented only for the
 .Ux
 domain.