diff --git a/lib/libc/sys/ptrace.2 b/lib/libc/sys/ptrace.2 index 6148e6d333d5..43ec2b76bbfd 100644 --- a/lib/libc/sys/ptrace.2 +++ b/lib/libc/sys/ptrace.2 @@ -1,1235 +1,1260 @@ .\" $FreeBSD$ .\" $NetBSD: ptrace.2,v 1.2 1995/02/27 12:35:37 cgd Exp $ .\" .\" This file is in the public domain. -.Dd April 10, 2021 +.Dd May 20, 2021 .Dt PTRACE 2 .Os .Sh NAME .Nm ptrace .Nd process tracing and debugging .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In sys/types.h .In sys/ptrace.h .Ft int .Fn ptrace "int request" "pid_t pid" "caddr_t addr" "int data" .Sh DESCRIPTION The .Fn ptrace system call provides tracing and debugging facilities. It allows one process (the .Em tracing process) to control another (the .Em traced process). The tracing process must first attach to the traced process, and then issue a series of .Fn ptrace system calls to control the execution of the process, as well as access process memory and register state. For the duration of the tracing session, the traced process will be .Dq re-parented , with its parent process ID (and resulting behavior) changed to the tracing process. It is permissible for a tracing process to attach to more than one other process at a time. When the tracing process has completed its work, it must detach the traced process; if a tracing process exits without first detaching all processes it has attached, those processes will be killed. .Pp Most of the time, the traced process runs normally, but when it receives a signal (see .Xr sigaction 2 ) , it stops. The tracing process is expected to notice this via .Xr wait 2 or the delivery of a .Dv SIGCHLD signal, examine the state of the stopped process, and cause it to terminate or continue as appropriate. The signal may be a normal process signal, generated as a result of traced process behavior, or use of the .Xr kill 2 system call; alternatively, it may be generated by the tracing facility as a result of attaching, stepping by the tracing process, or an event in the traced process. 
The tracing process may choose to intercept the signal, using it to observe process behavior (such as .Dv SIGTRAP ) , or forward the signal to the process if appropriate. The .Fn ptrace system call is the mechanism by which all this happens. .Pp A traced process may report additional signal stops corresponding to events in the traced process. These additional signal stops are reported as .Dv SIGTRAP or .Dv SIGSTOP signals. The tracing process can use the .Dv PT_LWPINFO request to determine which events are associated with a .Dv SIGTRAP or .Dv SIGSTOP signal. Note that multiple events may be associated with a single signal. For example, events indicated by the .Dv PL_FLAG_BORN , .Dv PL_FLAG_FORKED , and .Dv PL_FLAG_EXEC flags are also reported as a system call exit event .Pq Dv PL_FLAG_SCX . The signal stop for a new child process enabled via .Dv PTRACE_FORK will report a .Dv SIGSTOP signal. All other additional signal stops use .Dv SIGTRAP . +.Sh DETACH AND TERMINATION +.Pp +Normally, exiting tracing process should wait for all pending +debugging events and then detach from all alive traced processes +before exiting using +.Dv PT_DETACH +request. +If tracing process exits without detaching, for instance due to abnormal +termination, the destiny of the traced children processes is determined +by the +.Dv kern.kill_on_debugger_exit +sysctl control. +.Pp +If the control is set to the default value 1, such traced processes +are terminated. +If set to zero, kernel implicitly detaches traced processes. +Traced processes are un-stopped if needed, and then continue the execution +without tracing. +Kernel drops any +.Dv SIGTRAP +signals queued to the traced children, which could be either generated by +not yet consumed debug events, or sent by other means, the later should +not be done anyway. +.Sh TRACING EVENTS .Pp Each traced process has a tracing event mask. 
An event in the traced process only reports a signal stop if the corresponding flag is set in the tracing event mask. The current set of tracing event flags include: .Bl -tag -width "Dv PTRACE_SYSCALL" .It Dv PTRACE_EXEC Report a stop for a successful invocation of .Xr execve 2 . This event is indicated by the .Dv PL_FLAG_EXEC flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SCE Report a stop on each system call entry. This event is indicated by the .Dv PL_FLAG_SCE flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SCX Report a stop on each system call exit. This event is indicated by the .Dv PL_FLAG_SCX flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . .It Dv PTRACE_SYSCALL Report stops for both system call entry and exit. .It Dv PTRACE_FORK This event flag controls tracing for new child processes of a traced process. .Pp When this event flag is enabled, new child processes will enable tracing and stop before executing their first instruction. The new child process will include the .Dv PL_FLAG_CHILD flag in the .Va pl_flags member of .Vt "struct ptrace_lwpinfo" . The traced process will report a stop that includes the .Dv PL_FLAG_FORKED flag. The process ID of the new child process will also be present in the .Va pl_child_pid member of .Vt "struct ptrace_lwpinfo" . If the new child process was created via .Xr vfork 2 , the traced process's stop will also include the .Dv PL_FLAG_VFORKED flag. Note that new child processes will be attached with the default tracing event mask; they do not inherit the event mask of the traced process. .Pp When this event flag is not enabled, new child processes will execute without tracing enabled. .It Dv PTRACE_LWP This event flag controls tracing of LWP .Pq kernel thread creation and destruction. 
When this event is enabled, new LWPs will stop and report an event with .Dv PL_FLAG_BORN set before executing their first instruction, and exiting LWPs will stop and report an event with .Dv PL_FLAG_EXITED set before completing their termination. .Pp Note that new processes do not report an event for the creation of their initial thread, and exiting processes do not report an event for the termination of the last thread. .It Dv PTRACE_VFORK Report a stop event when a parent process resumes after a .Xr vfork 2 . .Pp When a thread in the traced process creates a new child process via .Xr vfork 2 , the stop that reports .Dv PL_FLAG_FORKED and .Dv PL_FLAG_SCX occurs just after the child process is created, but before the thread waits for the child process to stop sharing process memory. If a debugger is not tracing the new child process, it must ensure that no breakpoints are enabled in the shared process memory before detaching from the new child process. This means that no breakpoints are enabled in the parent process either. .Pp The .Dv PTRACE_VFORK flag enables a new stop that indicates when the new child process stops sharing the process memory of the parent process. A debugger can reinsert breakpoints in the parent process and resume it in response to this event. This event is indicated by setting the .Dv PL_FLAG_VFORK_DONE flag. .El .Pp The default tracing event mask when attaching to a process via .Dv PT_ATTACH , .Dv PT_TRACE_ME , or .Dv PTRACE_FORK includes only .Dv PTRACE_EXEC events. All other event flags are disabled. +.Sh PTRACE REQUESTS .Pp The .Fa request argument specifies what operation is being performed; the meaning of the rest of the arguments depends on the operation, but except for one special case noted below, all .Fn ptrace calls are made by the tracing process, and the .Fa pid argument specifies the process ID of the traced process or a corresponding thread ID. 
The .Fa request argument can be: .Bl -tag -width "Dv PT_GET_EVENT_MASK" .It Dv PT_TRACE_ME This request is the only one used by the traced process; it declares that the process expects to be traced by its parent. All the other arguments are ignored. (If the parent process does not expect to trace the child, it will probably be rather confused by the results; once the traced process stops, it cannot be made to continue except via .Fn ptrace . ) When a process has used this request and calls .Xr execve 2 or any of the routines built on it (such as .Xr execv 3 ) , it will stop before executing the first instruction of the new image. Also, any setuid or setgid bits on the executable being executed will be ignored. If the child was created by .Xr vfork 2 system call or .Xr rfork 2 call with the .Dv RFMEM flag specified, the debugging events are reported to the parent only after the .Xr execve 2 is executed. .It Dv PT_READ_I , Dv PT_READ_D These requests read a single .Vt int of data from the traced process's address space. Traditionally, .Fn ptrace has allowed for machines with distinct address spaces for instruction and data, which is why there are two requests: conceptually, .Dv PT_READ_I reads from the instruction space and .Dv PT_READ_D reads from the data space. In the current .Fx implementation, these two requests are completely identical. The .Fa addr argument specifies the address (in the traced process's virtual address space) at which the read is to be done. This address does not have to meet any alignment constraints. The value read is returned as the return value from .Fn ptrace . .It Dv PT_WRITE_I , Dv PT_WRITE_D These requests parallel .Dv PT_READ_I and .Dv PT_READ_D , except that they write rather than read. The .Fa data argument supplies the value to be written. .It Dv PT_IO This request allows reading and writing arbitrary amounts of data in the traced process's address space. 
The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_io_desc" , which is defined as follows: .Bd -literal struct ptrace_io_desc { int piod_op; /* I/O operation */ void *piod_offs; /* child offset */ void *piod_addr; /* parent offset */ size_t piod_len; /* request length */ }; /* * Operations in piod_op. */ #define PIOD_READ_D 1 /* Read from D space */ #define PIOD_WRITE_D 2 /* Write to D space */ #define PIOD_READ_I 3 /* Read from I space */ #define PIOD_WRITE_I 4 /* Write to I space */ .Ed .Pp The .Fa data argument is ignored. The actual number of bytes read or written is stored in .Va piod_len upon return. .It Dv PT_CONTINUE The traced process continues execution. The .Fa addr argument is an address specifying the place where execution is to be resumed (a new value for the program counter), or .Po Vt caddr_t Pc Ns 1 to indicate that execution is to pick up where it left off. The .Fa data argument provides a signal number to be delivered to the traced process as it resumes execution, or 0 if no signal is to be sent. .It Dv PT_STEP The traced process is single stepped one instruction. The .Fa addr argument should be passed .Po Vt caddr_t Pc Ns 1 . The .Fa data argument provides a signal number to be delivered to the traced process as it resumes execution, or 0 if no signal is to be sent. .It Dv PT_KILL The traced process terminates, as if .Dv PT_CONTINUE had been used with .Dv SIGKILL given as the signal to be delivered. .It Dv PT_ATTACH This request allows a process to gain control of an otherwise unrelated process and begin tracing it. It does not need any cooperation from the to-be-traced process. In this case, .Fa pid specifies the process ID of the to-be-traced process, and the other two arguments are ignored. This request requires that the target process must have the same real UID as the tracing process, and that it must not be executing a setuid or setgid executable. (If the tracing process is running as root, these restrictions do not apply.) 
The tracing process will see the newly-traced process stop and may then control it as if it had been traced all along. .It Dv PT_DETACH This request is like PT_CONTINUE, except that it does not allow specifying an alternate place to continue execution, and after it succeeds, the traced process is no longer traced and continues execution normally. .It Dv PT_GETREGS This request reads the traced process's machine registers into the .Do .Vt "struct reg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETREGS This request is the converse of .Dv PT_GETREGS ; it loads the traced process's machine registers from the .Do .Vt "struct reg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_GETFPREGS This request reads the traced process's floating-point registers into the .Do .Vt "struct fpreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETFPREGS This request is the converse of .Dv PT_GETFPREGS ; it loads the traced process's floating-point registers from the .Do .Vt "struct fpreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_GETDBREGS This request reads the traced process's debug registers into the .Do .Vt "struct dbreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_SETDBREGS This request is the converse of .Dv PT_GETDBREGS ; it loads the traced process's debug registers from the .Do .Vt "struct dbreg" .Dc (defined in .In machine/reg.h ) pointed to by .Fa addr . .It Dv PT_LWPINFO This request can be used to obtain information about the kernel thread, also known as light-weight process, that caused the traced process to stop. 
The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_lwpinfo" , which is defined as follows: .Bd -literal struct ptrace_lwpinfo { lwpid_t pl_lwpid; int pl_event; int pl_flags; sigset_t pl_sigmask; sigset_t pl_siglist; siginfo_t pl_siginfo; char pl_tdname[MAXCOMLEN + 1]; pid_t pl_child_pid; u_int pl_syscall_code; u_int pl_syscall_narg; }; .Ed .Pp The .Fa data argument is to be set to the size of the structure known to the caller. This allows the structure to grow without affecting older programs. .Pp The fields in the .Vt "struct ptrace_lwpinfo" have the following meaning: .Bl -tag -width indent -compact .It Va pl_lwpid LWP id of the thread .It Va pl_event Event that caused the stop. Currently defined events are: .Bl -tag -width "Dv PL_EVENT_SIGNAL" -compact .It Dv PL_EVENT_NONE No reason given .It Dv PL_EVENT_SIGNAL Thread stopped due to the pending signal .El .It Va pl_flags Flags that specify additional details about observed stop. Currently defined flags are: .Bl -tag -width indent -compact .It Dv PL_FLAG_SCE The thread stopped due to system call entry, right after the kernel is entered. The debugger may examine syscall arguments that are stored in memory and registers according to the ABI of the current process, and modify them, if needed. .It Dv PL_FLAG_SCX The thread is stopped immediately before syscall is returning to the usermode. The debugger may examine system call return values in the ABI-defined registers and/or memory. .It Dv PL_FLAG_EXEC When .Dv PL_FLAG_SCX is set, this flag may be additionally specified to inform that the program being executed by debuggee process has been changed by successful execution of a system call from the .Fn execve 2 family. .It Dv PL_FLAG_SI Indicates that .Va pl_siginfo member of .Vt "struct ptrace_lwpinfo" contains valid information. .It Dv PL_FLAG_FORKED Indicates that the process is returning from a call to .Fn fork 2 that created a new child process. 
The process identifier of the new process is available in the .Va pl_child_pid member of .Vt "struct ptrace_lwpinfo" . .It Dv PL_FLAG_CHILD The flag is set for first event reported from a new child which is automatically attached when .Dv PTRACE_FORK is enabled. .It Dv PL_FLAG_BORN This flag is set for the first event reported from a new LWP when .Dv PTRACE_LWP is enabled. It is reported along with .Dv PL_FLAG_SCX . .It Dv PL_FLAG_EXITED This flag is set for the last event reported by an exiting LWP when .Dv PTRACE_LWP is enabled. Note that this event is not reported when the last LWP in a process exits. The termination of the last thread is reported via a normal process exit event. .It Dv PL_FLAG_VFORKED Indicates that the thread is returning from a call to .Xr vfork 2 that created a new child process. This flag is set in addition to .Dv PL_FLAG_FORKED . .It Dv PL_FLAG_VFORK_DONE Indicates that the thread has resumed after a child process created via .Xr vfork 2 has stopped sharing its address space with the traced process. .El .It Va pl_sigmask The current signal mask of the LWP .It Va pl_siglist The current pending set of signals for the LWP. Note that signals that are delivered to the process would not appear on an LWP siglist until the thread is selected for delivery. .It Va pl_siginfo The siginfo that accompanies the signal pending. Only valid for .Dv PL_EVENT_SIGNAL stop when .Dv PL_FLAG_SI is set in .Va pl_flags . .It Va pl_tdname The name of the thread. .It Va pl_child_pid The process identifier of the new child process. Only valid for a .Dv PL_EVENT_SIGNAL stop when .Dv PL_FLAG_FORKED is set in .Va pl_flags . .It Va pl_syscall_code The ABI-specific identifier of the current system call. Note that for indirect system calls this field reports the indirected system call. Only valid when .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX is set in .Va pl_flags . 
.It Va pl_syscall_narg The number of arguments passed to the current system call not counting the system call identifier. Note that for indirect system calls this field reports the arguments passed to the indirected system call. Only valid when .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX is set in .Va pl_flags . .El .It Dv PT_GETNUMLWPS This request returns the number of kernel threads associated with the traced process. .It Dv PT_GETLWPLIST This request can be used to get the current thread list. A pointer to an array of type .Vt lwpid_t should be passed in .Fa addr , with the array size specified by .Fa data . The return value from .Fn ptrace is the count of array entries filled in. .It Dv PT_SETSTEP This request will turn on single stepping of the specified process. Stepping is automatically disabled when a single step trap is caught. .It Dv PT_CLEARSTEP This request will turn off single stepping of the specified process. .It Dv PT_SUSPEND This request will suspend the specified thread. .It Dv PT_RESUME This request will resume the specified thread. .It Dv PT_TO_SCE This request will set the .Dv PTRACE_SCE event flag to trace all future system call entries and continue the process. The .Fa addr and .Fa data arguments are used the same as for .Dv PT_CONTINUE . .It Dv PT_TO_SCX This request will set the .Dv PTRACE_SCX event flag to trace all future system call exits and continue the process. The .Fa addr and .Fa data arguments are used the same as for .Dv PT_CONTINUE . .It Dv PT_SYSCALL This request will set the .Dv PTRACE_SYSCALL event flag to trace all future system call entries and exits and continue the process. The .Fa addr and .Fa data arguments are used the same as for .Dv PT_CONTINUE . .It Dv PT_GET_SC_ARGS For the thread which is stopped in either .Dv PL_FLAG_SCE or .Dv PL_FLAG_SCX state, that is, on entry or exit to a syscall, this request fetches the syscall arguments. 
.Pp The arguments are copied out into the buffer pointed to by the .Fa addr pointer, sequentially. Each syscall argument is stored as the machine word. Kernel copies out as many arguments as the syscall accepts, see the .Va pl_syscall_narg member of the .Vt struct ptrace_lwpinfo , but not more than the .Fa data bytes in total are copied. .It Dv PT_GET_SC_RET Fetch the system call return values on exit from a syscall. This request is only valid for threads stopped in a syscall exit (the .Dv PL_FLAG_SCX state). The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_sc_ret" , which is defined as follows: .Bd -literal struct ptrace_sc_ret { register_t sr_retval[2]; int sr_error; }; .Ed .Pp The .Fa data argument is set to the size of the structure. .Pp If the system call completed successfully, .Va sr_error is set to zero and the return values of the system call are saved in .Va sr_retval . If the system call failed to execute, .Va sr_error field is set to a positive .Xr errno 2 value. If the system call completed in an unusual fashion, .Va sr_error is set to a negative value: .Bl -tag -width Dv EJUSTRETURN -compact .It Dv ERESTART System call will be restarted. .It Dv EJUSTRETURN System call completed sucessfully but did not set a return value .Po for example, .Xr setcontext 2 and .Xr sigreturn 2 .Pc . .El .It Dv PT_FOLLOW_FORK This request controls tracing for new child processes of a traced process. If .Fa data is non-zero, .Dv PTRACE_FORK is set in the traced process's event tracing mask. If .Fa data is zero, .Dv PTRACE_FORK is cleared from the traced process's event tracing mask. .It Dv PT_LWP_EVENTS This request controls tracing of LWP creation and destruction. If .Fa data is non-zero, .Dv PTRACE_LWP is set in the traced process's event tracing mask. If .Fa data is zero, .Dv PTRACE_LWP is cleared from the traced process's event tracing mask. 
.It Dv PT_GET_EVENT_MASK This request reads the traced process's event tracing mask into the integer pointed to by .Fa addr . The size of the integer must be passed in .Fa data . .It Dv PT_SET_EVENT_MASK This request sets the traced process's event tracing mask from the integer pointed to by .Fa addr . The size of the integer must be passed in .Fa data . .It Dv PT_VM_TIMESTAMP This request returns the generation number or timestamp of the memory map of the traced process as the return value from .Fn ptrace . This provides a low-cost way for the tracing process to determine if the VM map changed since the last time this request was made. .It Dv PT_VM_ENTRY This request is used to iterate over the entries of the VM map of the traced process. The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_vm_entry" , which is defined as follows: .Bd -literal struct ptrace_vm_entry { int pve_entry; int pve_timestamp; u_long pve_start; u_long pve_end; u_long pve_offset; u_int pve_prot; u_int pve_pathlen; long pve_fileid; uint32_t pve_fsid; char *pve_path; }; .Ed .Pp The first entry is returned by setting .Va pve_entry to zero. Subsequent entries are returned by leaving .Va pve_entry unmodified from the value returned by previous requests. The .Va pve_timestamp field can be used to detect changes to the VM map while iterating over the entries. The tracing process can then take appropriate action, such as restarting. By setting .Va pve_pathlen to a non-zero value on entry, the pathname of the backing object is returned in the buffer pointed to by .Va pve_path , provided the entry is backed by a vnode. The .Va pve_pathlen field is updated with the actual length of the pathname (including the terminating null character). The .Va pve_offset field is the offset within the backing object at which the range starts. The range is located in the VM space at .Va pve_start and extends up to .Va pve_end (inclusive). .Pp The .Fa data argument is ignored. 
.It Dv PT_COREDUMP This request creates a coredump for the stopped program. The .Fa addr argument specifies a pointer to a .Vt "struct ptrace_coredump" , which is defined as follows: .Bd -literal struct ptrace_coredump { int pc_fd; uint32_t pc_flags; off_t pc_limit; }; .Ed The fields of the structure are: .Bl -tag -width pc_flags .It Dv pc_fd File descriptor to write the dump to. It must refer to a regular file, opened for writing. .It Dv pc_flags Flags. The following flags are defined: .Bl -tag -width PC_COMPRESS .It Dv PC_COMPRESS Request compression of the dump. .It Dv PC_ALL Include non-dumpable entries into the dump. The dumper ignores .Dv MAP_NOCORE flag of the process map entry, but device mappings are not dumped even with .Dv PC_ALL set. .El .It Dv pc_limit Maximum size of the coredump. Specify zero for no limit. .El .Pp The size of .Vt "struct ptrace_coredump" must be passed in .Fa data . .Pp The process must be stopped before dumping core. A single thread in the target process is temporarily unsuspended in kernel to write the dump. If the .Nm call fails before a thread is unsuspended, there is no event to .Xr waitpid 2 for. If a thread was unsuspended, it will stop again before the .Nm call returns, and the process must be waited upon using .Xr waitpid 2 to consume the new stop event. Since it is hard to deduce whether a thread was unsuspended before an error occurred, it is recommended to unconditionally perform .Xr waitpid 2 with .Dv WNOHANG flag after .Dv PT_COREDUMP , and silently accept zero result from it. .El .Sh ARM MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_SETVFPREGS" .It Dv PT_GETVFPREGS Return the thread's .Dv VFP machine state in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_SETVFPREGS Set the thread's .Dv VFP machine state from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. 
.El .Sh x86 MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_GETXSTATE_INFO" .It Dv PT_GETXMMREGS Copy the XMM FPU state into the buffer pointed to by the argument .Fa addr . The buffer has the same layout as the 32-bit save buffer for the machine instruction .Dv FXSAVE . .Pp This request is only valid for i386 programs, both on native 32-bit systems and on amd64 kernels. For 64-bit amd64 programs, the XMM state is reported as part of the FPU state returned by the .Dv PT_GETFPREGS request. .Pp The .Fa data argument is ignored. .It Dv PT_SETXMMREGS Load the XMM FPU state for the thread from the buffer pointed to by the argument .Fa addr . The buffer has the same layout as the 32-bit load buffer for the machine instruction .Dv FXRSTOR . .Pp As with .Dv PT_GETXMMREGS , this request is only valid for i386 programs. .Pp The .Fa data argument is ignored. .It Dv PT_GETXSTATE_INFO Report which XSAVE FPU extensions are supported by the CPU and allowed in userspace programs. The .Fa addr argument must point to a variable of type .Vt struct ptrace_xstate_info , which contains the information on the request return. .Vt struct ptrace_xstate_info is defined as follows: .Bd -literal struct ptrace_xstate_info { uint64_t xsave_mask; uint32_t xsave_len; }; .Ed The .Dv xsave_mask field is a bitmask of the currently enabled extensions. The meaning of the bits is defined in the Intel and AMD processor documentation. The .Dv xsave_len field reports the length of the XSAVE area for storing the hardware state for currently enabled extensions in the format defined by the x86 .Dv XSAVE machine instruction. .Pp The .Fa data argument value must be equal to the size of the .Vt struct ptrace_xstate_info . .It Dv PT_GETXSTATE Return the content of the XSAVE area for the thread. The .Fa addr argument points to the buffer where the content is copied, and the .Fa data argument specifies the size of the buffer. The kernel copies out as much content as allowed by the buffer size. 
The buffer layout is specified by the layout of the save area for the .Dv XSAVE machine instruction. .It Dv PT_SETXSTATE Load the XSAVE state for the thread from the buffer specified by the .Fa addr pointer. The buffer size is passed in the .Fa data argument. The buffer must be at least as large as the .Vt struct savefpu (defined in .Pa x86/fpu.h ) to allow the complete x87 FPU and XMM state load. It must not be larger than the XSAVE state length, as reported by the .Dv xsave_len field from the .Vt struct ptrace_xstate_info of the .Dv PT_GETXSTATE_INFO request. Layout of the buffer is identical to the layout of the load area for the .Dv XRSTOR machine instruction. .It Dv PT_GETFSBASE Return the value of the base used when doing segmented memory addressing using the %fs segment register. The .Fa addr argument points to an .Vt unsigned long variable where the base value is stored. .Pp The .Fa data argument is ignored. .It Dv PT_GETGSBASE Like the .Dv PT_GETFSBASE request, but returns the base for the %gs segment register. .It Dv PT_SETFSBASE Set the base for the %fs segment register to the value pointed to by the .Fa addr argument. .Fa addr must point to the .Vt unsigned long variable containing the new base. .Pp The .Fa data argument is ignored. .It Dv PT_SETGSBASE Like the .Dv PT_SETFSBASE request, but sets the base for the %gs segment register. .El .Sh PowerPC MACHINE-SPECIFIC REQUESTS .Bl -tag -width "Dv PT_SETVRREGS" .It Dv PT_GETVRREGS Return the thread's .Dv ALTIVEC machine state in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_SETVRREGS Set the thread's .Dv ALTIVEC machine state from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .It Dv PT_GETVSRREGS Return doubleword 1 of the thread's .Dv VSX registers VSR0-VSR31 in the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. 
.It Dv PT_SETVSRREGS Set doubleword 1 of the thread's .Dv VSX registers VSR0-VSR31 from the buffer pointed to by .Fa addr . .Pp The .Fa data argument is ignored. .El .Pp Additionally, other machine-specific requests can exist. .Sh RETURN VALUES Most requests return 0 on success and \-1 on error. Some requests can cause .Fn ptrace to return \-1 as a non-error value, among them are .Dv PT_READ_I and .Dv PT_READ_D , which return the value read from the process memory on success. To disambiguate, .Va errno can be set to 0 before the call and checked afterwards. .Pp The current .Fn ptrace implementation always sets .Va errno to 0 before calling into the kernel, both for historic reasons and for consistency with other operating systems. It is recommended to assign zero to .Va errno explicitly for forward compatibility. .Sh ERRORS The .Fn ptrace system call may fail if: .Bl -tag -width Er .It Bq Er ESRCH .Bl -bullet -compact .It No process having the specified process ID exists. .El .It Bq Er EINVAL .Bl -bullet -compact .It A process attempted to use .Dv PT_ATTACH on itself. .It The .Fa request argument was not one of the legal requests. .It The signal number (in .Fa data ) to .Dv PT_CONTINUE was neither 0 nor a legal signal number. .It .Dv PT_GETREGS , .Dv PT_SETREGS , .Dv PT_GETFPREGS , .Dv PT_SETFPREGS , .Dv PT_GETDBREGS , or .Dv PT_SETDBREGS was attempted on a process with no valid register set. (This is normally true only of system processes.) .It .Dv PT_VM_ENTRY was given an invalid value for .Fa pve_entry . This can also be caused by changes to the VM map of the process. .It The size (in .Fa data ) provided to .Dv PT_LWPINFO was less than or equal to zero, or larger than the .Vt ptrace_lwpinfo structure known to the kernel. .It The size (in .Fa data ) provided to the x86-specific .Dv PT_GETXSTATE_INFO request was not equal to the size of the .Vt struct ptrace_xstate_info . 
.It The size (in .Fa data ) provided to the x86-specific .Dv PT_SETXSTATE request was less than the size of the x87 plus the XMM save area. .It The size (in .Fa data ) provided to the x86-specific .Dv PT_SETXSTATE request was larger than returned in the .Dv xsave_len member of the .Vt struct ptrace_xstate_info from the .Dv PT_GETXSTATE_INFO request. .It The base value, provided to the amd64-specific requests .Dv PT_SETFSBASE or .Dv PT_SETGSBASE , pointed outside of the valid user address space. This error will not occur in 32-bit programs. .El .It Bq Er EBUSY .Bl -bullet -compact .It .Dv PT_ATTACH was attempted on a process that was already being traced. .It A request attempted to manipulate a process that was being traced by some process other than the one making the request. .It A request (other than .Dv PT_ATTACH ) specified a process that was not stopped. .El .It Bq Er EPERM .Bl -bullet -compact .It A request (other than .Dv PT_ATTACH ) attempted to manipulate a process that was not being traced at all. .It An attempt was made to use .Dv PT_ATTACH on a process in violation of the requirements listed under .Dv PT_ATTACH above. .El .It Bq Er ENOENT .Bl -bullet -compact .It .Dv PT_VM_ENTRY previously returned the last entry of the memory map. No more entries exist. .El .It Bq Er ENAMETOOLONG .Bl -bullet -compact .It .Dv PT_VM_ENTRY cannot return the pathname of the backing object because the buffer is not big enough. .Fa pve_pathlen holds the minimum buffer size required on return. .El .El .Sh SEE ALSO .Xr execve 2 , .Xr sigaction 2 , .Xr wait 2 , .Xr execv 3 , .Xr i386_clr_watch 3 , .Xr i386_set_watch 3 .Sh HISTORY The .Fn ptrace function appeared in .At v6 . 
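Reviewer note on the DETACH AND TERMINATION text added above: a minimal sketch, assuming a FreeBSD system that carries this change, of how a debugger might read the `kern.kill_on_debugger_exit` control and map it to the two outcomes the manual page describes. The names `read_kill_on_debugger_exit`, `orphan_fate_for`, and `enum orphan_fate` are hypothetical and are not part of the patch. Only the values 0 and 1 are covered by the text, so any other value is reported as unknown rather than guessed.

```c
/*
 * Sketch only, not part of the patch.  All helper names are hypothetical.
 * On systems other than FreeBSD the sysctl lookup is compiled out and the
 * reader simply reports "unavailable".
 */
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

enum orphan_fate {
	FATE_KILLED,	/* value 1 (default): traced processes are terminated */
	FATE_DETACHED,	/* value 0: kernel implicitly detaches them */
	FATE_UNKNOWN	/* control unreadable, or value not covered by the text */
};

/*
 * Return the current value of kern.kill_on_debugger_exit, or -1 if it
 * cannot be read (for example, on a kernel without this change).
 */
int
read_kill_on_debugger_exit(void)
{
#ifdef __FreeBSD__
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname("kern.kill_on_debugger_exit", &val, &len,
	    NULL, 0) == 0 && len == sizeof(val))
		return (val);
#endif
	return (-1);
}

/* Map a control value to the behavior documented in ptrace.2. */
enum orphan_fate
orphan_fate_for(int val)
{
	switch (val) {
	case 1:
		return (FATE_KILLED);
	case 0:
		return (FATE_DETACHED);
	default:
		return (FATE_UNKNOWN);
	}
}
```

A debugger could call `orphan_fate_for(read_kill_on_debugger_exit())` at startup and warn the user when the result is `FATE_KILLED`, since an abnormal debugger exit would then take the traced processes down with it. Detaching explicitly with `PT_DETACH` remains the recommended path in either case.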
diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c index e1b40a171345..cb5996982a3a 100644 --- a/sys/kern/kern_exit.c +++ b/sys/kern/kern_exit.c @@ -1,1401 +1,1417 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1989, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_exit.c 8.7 (Berkeley) 2/12/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include /* for acct_process() function prototype */ #include #include #include #include #include #include #include #ifdef KTRACE #include #endif #include #include #include #include #include #include #include #include #include #ifdef KDTRACE_HOOKS #include dtrace_execexit_func_t dtrace_fasttrap_exit; #endif SDT_PROVIDER_DECLARE(proc); SDT_PROBE_DEFINE1(proc, , , exit, "int"); +static int kern_kill_on_dbg_exit = 1; +SYSCTL_INT(_kern, OID_AUTO, kill_on_debugger_exit, CTLFLAG_RWTUN, + &kern_kill_on_dbg_exit, 0, + "Kill ptraced processes when debugger exits"); + struct proc * proc_realparent(struct proc *child) { struct proc *p, *parent; sx_assert(&proctree_lock, SX_LOCKED); if ((child->p_treeflag & P_TREE_ORPHANED) == 0) return (child->p_pptr->p_pid == child->p_oppid ? child->p_pptr : child->p_reaper); for (p = child; (p->p_treeflag & P_TREE_FIRST_ORPHAN) == 0;) { /* Cannot use LIST_PREV(), since the list head is not known. 
*/ p = __containerof(p->p_orphan.le_prev, struct proc, p_orphan.le_next); KASSERT((p->p_treeflag & P_TREE_ORPHANED) != 0, ("missing P_ORPHAN %p", p)); } parent = __containerof(p->p_orphan.le_prev, struct proc, p_orphans.lh_first); return (parent); } void reaper_abandon_children(struct proc *p, bool exiting) { struct proc *p1, *p2, *ptmp; sx_assert(&proctree_lock, SX_LOCKED); KASSERT(p != initproc, ("reaper_abandon_children for initproc")); if ((p->p_treeflag & P_TREE_REAPER) == 0) return; p1 = p->p_reaper; LIST_FOREACH_SAFE(p2, &p->p_reaplist, p_reapsibling, ptmp) { LIST_REMOVE(p2, p_reapsibling); p2->p_reaper = p1; p2->p_reapsubtree = p->p_reapsubtree; LIST_INSERT_HEAD(&p1->p_reaplist, p2, p_reapsibling); if (exiting && p2->p_pptr == p) { PROC_LOCK(p2); proc_reparent(p2, p1, true); PROC_UNLOCK(p2); } } KASSERT(LIST_EMPTY(&p->p_reaplist), ("p_reaplist not empty")); p->p_treeflag &= ~P_TREE_REAPER; } static void reaper_clear(struct proc *p) { struct proc *p1; bool clear; sx_assert(&proctree_lock, SX_LOCKED); LIST_REMOVE(p, p_reapsibling); if (p->p_reapsubtree == 1) return; clear = true; LIST_FOREACH(p1, &p->p_reaper->p_reaplist, p_reapsibling) { if (p1->p_reapsubtree == p->p_reapsubtree) { clear = false; break; } } if (clear) proc_id_clear(PROC_ID_REAP, p->p_reapsubtree); } void proc_clear_orphan(struct proc *p) { struct proc *p1; sx_assert(&proctree_lock, SA_XLOCKED); if ((p->p_treeflag & P_TREE_ORPHANED) == 0) return; if ((p->p_treeflag & P_TREE_FIRST_ORPHAN) != 0) { p1 = LIST_NEXT(p, p_orphan); if (p1 != NULL) p1->p_treeflag |= P_TREE_FIRST_ORPHAN; p->p_treeflag &= ~P_TREE_FIRST_ORPHAN; } LIST_REMOVE(p, p_orphan); p->p_treeflag &= ~P_TREE_ORPHANED; } /* * exit -- death of process. */ void sys_sys_exit(struct thread *td, struct sys_exit_args *uap) { exit1(td, uap->rval, 0); /* NOTREACHED */ } /* * Exit: deallocate address space and other resources, change proc state to * zombie, and unlink proc from allproc and parent's lists. 
Save exit status * and rusage for wait(). Check for child processes and orphan them. */ void exit1(struct thread *td, int rval, int signo) { struct proc *p, *nq, *q, *t; struct thread *tdt; ksiginfo_t *ksi, *ksi1; int signal_parent; mtx_assert(&Giant, MA_NOTOWNED); KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo)); p = td->td_proc; /* * XXX in case we're rebooting we just let init die in order to * work around an unsolved stack overflow seen very late during * shutdown on sparc64 when the gmirror worker process exists. * XXX what to do now that sparc64 is gone... remove if? */ if (p == initproc && rebooting == 0) { printf("init died (signal %d, exit %d)\n", signo, rval); panic("Going nowhere without my init!"); } /* * Deref SU mp, since the thread does not return to userspace. */ td_softdep_cleanup(td); /* * MUST abort all other threads before proceeding past here. */ PROC_LOCK(p); /* * First check if some other thread or external request got * here before us. If so, act appropriately: exit or suspend. * We must ensure that stop requests are handled before we set * P_WEXIT. */ thread_suspend_check(0); while (p->p_flag & P_HADTHREADS) { /* * Kill off the other threads. This requires * some co-operation from other parts of the kernel * so it may not be instantaneous. With this state set * any thread entering the kernel from userspace will * thread_exit() in trap(). Any thread attempting to * sleep will return immediately with EINTR or EWOULDBLOCK * which will hopefully force them to back out to userland * freeing resources as they go. Any thread attempting * to return to userland will thread_exit() from userret(). * thread_exit() will unsuspend us when the last of the * other threads exits. * If there is already a thread singler after resumption, * calling thread_single will fail; in that case, we just * re-check all suspension request, the thread should * either be suspended there or exit. 
*/ if (!thread_single(p, SINGLE_EXIT)) /* * All other activity in this process is now * stopped. Threading support has been turned * off. */ break; /* * Recheck for new stop or suspend requests which * might appear while process lock was dropped in * thread_single(). */ thread_suspend_check(0); } KASSERT(p->p_numthreads == 1, ("exit1: proc %p exiting with %d threads", p, p->p_numthreads)); racct_sub(p, RACCT_NTHR, 1); /* Let event handler change exit status */ p->p_xexit = rval; p->p_xsig = signo; /* * Ignore any pending request to stop due to a stop signal. * Once P_WEXIT is set, future requests will be ignored as * well. */ p->p_flag &= ~P_STOPPED_SIG; KASSERT(!P_SHOULDSTOP(p), ("exiting process is stopped")); /* Note that we are exiting. */ p->p_flag |= P_WEXIT; /* * Wait for any processes that have a hold on our vmspace to * release their reference. */ while (p->p_lock > 0) msleep(&p->p_lock, &p->p_mtx, PWAIT, "exithold", 0); PROC_UNLOCK(p); /* Drain the limit callout while we don't have the proc locked */ callout_drain(&p->p_limco); #ifdef AUDIT /* * The Sun BSM exit token contains two components: an exit status as * passed to exit(), and a return value to indicate what sort of exit * it was. The exit status is WEXITSTATUS(rv), but it's not clear * what the return value is. */ AUDIT_ARG_EXIT(rval, 0); AUDIT_SYSCALL_EXIT(0, td); #endif /* Are we a task leader with peers? */ if (p->p_peers != NULL && p == p->p_leader) { mtx_lock(&ppeers_lock); q = p->p_peers; while (q != NULL) { PROC_LOCK(q); kern_psignal(q, SIGKILL); PROC_UNLOCK(q); q = q->p_peers; } while (p->p_peers != NULL) msleep(p, &ppeers_lock, PWAIT, "exit1", 0); mtx_unlock(&ppeers_lock); } itimers_exit(p); if (p->p_sysent->sv_onexit != NULL) p->p_sysent->sv_onexit(p); /* * Check if any loadable modules need anything done at process exit. * E.g. SYSV IPC stuff. * Event handler could change exit status. * XXX what if one of these generates an error? 
*/ EVENTHANDLER_DIRECT_INVOKE(process_exit, p); /* * If parent is waiting for us to exit or exec, * P_PPWAIT is set; we will wakeup the parent below. */ PROC_LOCK(p); stopprofclock(p); p->p_ptevents = 0; /* * Stop the real interval timer. If the handler is currently * executing, prevent it from rearming itself and let it finish. */ if (timevalisset(&p->p_realtimer.it_value) && _callout_stop_safe(&p->p_itcallout, CS_EXECUTING, NULL) == 0) { timevalclear(&p->p_realtimer.it_interval); msleep(&p->p_itcallout, &p->p_mtx, PWAIT, "ritwait", 0); KASSERT(!timevalisset(&p->p_realtimer.it_value), ("realtime timer is still armed")); } PROC_UNLOCK(p); umtx_thread_exit(td); seltdfini(td); /* * Reset any sigio structures pointing to us as a result of * F_SETOWN with our pid. The P_WEXIT flag interlocks with fsetown(). */ funsetownlst(&p->p_sigiolst); /* * Close open files and release open-file table. * This may block! */ pdescfree(td); fdescfree(td); /* * If this thread tickled GEOM, we need to wait for the giggling to * stop before we return to userland */ if (td->td_pflags & TDP_GEOM) g_waitidle(); /* * Remove ourself from our leader's peer list and wake our leader. */ if (p->p_leader->p_peers != NULL) { mtx_lock(&ppeers_lock); if (p->p_leader->p_peers != NULL) { q = p->p_leader; while (q->p_peers != p) q = q->p_peers; q->p_peers = p->p_peers; wakeup(p->p_leader); } mtx_unlock(&ppeers_lock); } vmspace_exit(td); (void)acct_process(td); #ifdef KTRACE ktrprocexit(td); #endif /* * Release reference to text vnode */ if (p->p_textvp != NULL) { vrele(p->p_textvp); p->p_textvp = NULL; } /* * Release our limits structure. */ lim_free(p->p_limit); p->p_limit = NULL; tidhash_remove(td); /* * Call machine-dependent code to release any * machine-dependent resources other than the address space. * The address space is released by "vmspace_exitfree(p)" in * vm_waitproc(). */ cpu_exit(td); WITNESS_WARN(WARN_PANIC, NULL, "process (pid %d) exiting", p->p_pid); /* * Remove from allproc. 
It still sits in the hash. */ sx_xlock(&allproc_lock); LIST_REMOVE(p, p_list); #ifdef DDB /* * Used by ddb's 'ps' command to find this process via the * pidhash. */ p->p_list.le_prev = NULL; #endif sx_xunlock(&allproc_lock); sx_xlock(&proctree_lock); PROC_LOCK(p); p->p_flag &= ~(P_TRACED | P_PPWAIT | P_PPTRACE); PROC_UNLOCK(p); /* * killjobc() might drop and re-acquire proctree_lock to * revoke control tty if exiting process was a session leader. */ killjobc(); /* * Reparent all children processes: * - traced ones to the original parent (or init if we are that parent) * - the rest to init */ q = LIST_FIRST(&p->p_children); if (q != NULL) /* only need this if any child is S_ZOMB */ wakeup(q->p_reaper); for (; q != NULL; q = nq) { nq = LIST_NEXT(q, p_sibling); ksi = ksiginfo_alloc(TRUE); PROC_LOCK(q); q->p_sigparent = SIGCHLD; if ((q->p_flag & P_TRACED) == 0) { proc_reparent(q, q->p_reaper, true); if (q->p_state == PRS_ZOMBIE) { /* * Inform reaper about the reparented * zombie, since wait(2) has something * new to report. Guarantee queueing * of the SIGCHLD signal, similar to * the _exit() behaviour, by providing * our ksiginfo. Ksi is freed by the * signal delivery. */ if (q->p_ksi == NULL) { ksi1 = NULL; } else { ksiginfo_copy(q->p_ksi, ksi); ksi->ksi_flags |= KSI_INS; ksi1 = ksi; ksi = NULL; } PROC_LOCK(q->p_reaper); pksignal(q->p_reaper, SIGCHLD, ksi1); PROC_UNLOCK(q->p_reaper); } else if (q->p_pdeathsig > 0) { /* * The child asked to received a signal * when we exit. */ kern_psignal(q, q->p_pdeathsig); } } else { /* - * Traced processes are killed since their existence - * means someone is screwing up. + * Traced processes are killed by default + * since their existence means someone is + * screwing up. 
*/ t = proc_realparent(q); if (t == p) { proc_reparent(q, q->p_reaper, true); } else { PROC_LOCK(t); proc_reparent(q, t, true); PROC_UNLOCK(t); } /* * Since q was found on our children list, the * proc_reparent() call moved q to the orphan * list due to present P_TRACED flag. Clear * orphan link for q now while q is locked. */ proc_clear_orphan(q); - q->p_flag &= ~(P_TRACED | P_STOPPED_TRACE); + q->p_flag &= ~P_TRACED; q->p_flag2 &= ~P2_PTRACE_FSTP; q->p_ptevents = 0; + p->p_xthread = NULL; FOREACH_THREAD_IN_PROC(q, tdt) { tdt->td_dbgflags &= ~(TDB_SUSPEND | TDB_XSIG | TDB_FSTP); + tdt->td_xsig = 0; + } + if (kern_kill_on_dbg_exit) { + q->p_flag &= ~P_STOPPED_TRACE; + kern_psignal(q, SIGKILL); + } else if ((q->p_flag & (P_STOPPED_TRACE | + P_STOPPED_SIG)) != 0) { + sigqueue_delete_proc(q, SIGTRAP); + ptrace_unsuspend(q); } - kern_psignal(q, SIGKILL); } PROC_UNLOCK(q); if (ksi != NULL) ksiginfo_free(ksi); } /* * Also get rid of our orphans. */ while ((q = LIST_FIRST(&p->p_orphans)) != NULL) { PROC_LOCK(q); KASSERT(q->p_oppid == p->p_pid, ("orphan %p of %p has unexpected oppid %d", q, p, q->p_oppid)); q->p_oppid = q->p_reaper->p_pid; /* * If we are the real parent of this process * but it has been reparented to a debugger, then * check if it asked for a signal when we exit. */ if (q->p_pdeathsig > 0) kern_psignal(q, q->p_pdeathsig); CTR2(KTR_PTRACE, "exit: pid %d, clearing orphan %d", p->p_pid, q->p_pid); proc_clear_orphan(q); PROC_UNLOCK(q); } #ifdef KDTRACE_HOOKS if (SDT_PROBES_ENABLED()) { int reason = CLD_EXITED; if (WCOREDUMP(signo)) reason = CLD_DUMPED; else if (WIFSIGNALED(signo)) reason = CLD_KILLED; SDT_PROBE1(proc, , , exit, reason); } #endif /* Save exit status. */ PROC_LOCK(p); p->p_xthread = td; if (p->p_sysent->sv_ontdexit != NULL) p->p_sysent->sv_ontdexit(td); #ifdef KDTRACE_HOOKS /* * Tell the DTrace fasttrap provider about the exit if it * has declared an interest. 
*/ if (dtrace_fasttrap_exit) dtrace_fasttrap_exit(p); #endif /* * Notify interested parties of our demise. */ KNOTE_LOCKED(p->p_klist, NOTE_EXIT); /* * If this is a process with a descriptor, we may not need to deliver * a signal to the parent. proctree_lock is held over * procdesc_exit() to serialize concurrent calls to close() and * exit(). */ signal_parent = 0; if (p->p_procdesc == NULL || procdesc_exit(p)) { /* * Notify parent that we're gone. If parent has the * PS_NOCLDWAIT flag set, or if the handler is set to SIG_IGN, * notify process 1 instead (and hope it will handle this * situation). */ PROC_LOCK(p->p_pptr); mtx_lock(&p->p_pptr->p_sigacts->ps_mtx); if (p->p_pptr->p_sigacts->ps_flag & (PS_NOCLDWAIT | PS_CLDSIGIGN)) { struct proc *pp; mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx); pp = p->p_pptr; PROC_UNLOCK(pp); proc_reparent(p, p->p_reaper, true); p->p_sigparent = SIGCHLD; PROC_LOCK(p->p_pptr); /* * Notify parent, so in case he was wait(2)ing or * executing waitpid(2) with our pid, he will * continue. */ wakeup(pp); } else mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx); if (p->p_pptr == p->p_reaper || p->p_pptr == initproc) { signal_parent = 1; } else if (p->p_sigparent != 0) { if (p->p_sigparent == SIGCHLD) { signal_parent = 1; } else { /* LINUX thread */ signal_parent = 2; } } } else PROC_LOCK(p->p_pptr); sx_xunlock(&proctree_lock); if (signal_parent == 1) { childproc_exited(p); } else if (signal_parent == 2) { kern_psignal(p->p_pptr, p->p_sigparent); } /* Tell the prison that we are gone. */ prison_proc_free(p->p_ucred->cr_prison); /* * The state PRS_ZOMBIE prevents other proesses from sending * signal to the process, to avoid memory leak, we free memory * for signal queue at the time when the state is set. */ sigqueue_flush(&p->p_sigqueue); sigqueue_flush(&td->td_sigqueue); /* * We have to wait until after acquiring all locks before * changing p_state. 
We need to avoid all possible context * switches (including ones from blocking on a mutex) while * marked as a zombie. We also have to set the zombie state * before we release the parent process' proc lock to avoid * a lost wakeup. So, we first call wakeup, then we grab the * sched lock, update the state, and release the parent process' * proc lock. */ wakeup(p->p_pptr); cv_broadcast(&p->p_pwait); sched_exit(p->p_pptr, td); PROC_SLOCK(p); p->p_state = PRS_ZOMBIE; PROC_UNLOCK(p->p_pptr); /* * Save our children's rusage information in our exit rusage. */ PROC_STATLOCK(p); ruadd(&p->p_ru, &p->p_rux, &p->p_stats->p_cru, &p->p_crux); PROC_STATUNLOCK(p); /* * Make sure the scheduler takes this thread out of its tables etc. * This will also release this thread's reference to the ucred. * Other thread parts to release include pcb bits and such. */ thread_exit(); } #ifndef _SYS_SYSPROTO_H_ struct abort2_args { char *why; int nargs; void **args; }; #endif int sys_abort2(struct thread *td, struct abort2_args *uap) { struct proc *p = td->td_proc; struct sbuf *sb; void *uargs[16]; int error, i, sig; /* * Do it right now so we can log either proper call of abort2(), or * note, that invalid argument was passed. 512 is big enough to * handle 16 arguments' descriptions with additional comments. */ sb = sbuf_new(NULL, NULL, 512, SBUF_FIXEDLEN); sbuf_clear(sb); sbuf_printf(sb, "%s(pid %d uid %d) aborted: ", p->p_comm, p->p_pid, td->td_ucred->cr_uid); /* * Since we can't return from abort2(), send SIGKILL in cases, where * abort2() was called improperly */ sig = SIGKILL; /* Prevent from DoSes from user-space. */ if (uap->nargs < 0 || uap->nargs > 16) goto out; if (uap->nargs > 0) { if (uap->args == NULL) goto out; error = copyin(uap->args, uargs, uap->nargs * sizeof(void *)); if (error != 0) goto out; } /* * Limit size of 'reason' string to 128. Will fit even when * maximal number of arguments was chosen to be logged. 
*/ if (uap->why != NULL) { error = sbuf_copyin(sb, uap->why, 128); if (error < 0) goto out; } else { sbuf_printf(sb, "(null)"); } if (uap->nargs > 0) { sbuf_printf(sb, "("); for (i = 0;i < uap->nargs; i++) sbuf_printf(sb, "%s%p", i == 0 ? "" : ", ", uargs[i]); sbuf_printf(sb, ")"); } /* * Final stage: arguments were proper, string has been * successfully copied from userspace, and copying pointers * from user-space succeed. */ sig = SIGABRT; out: if (sig == SIGKILL) { sbuf_trim(sb); sbuf_printf(sb, " (Reason text inaccessible)"); } sbuf_cat(sb, "\n"); sbuf_finish(sb); log(LOG_INFO, "%s", sbuf_data(sb)); sbuf_delete(sb); exit1(td, 0, sig); return (0); } #ifdef COMPAT_43 /* * The dirty work is handled by kern_wait(). */ int owait(struct thread *td, struct owait_args *uap __unused) { int error, status; error = kern_wait(td, WAIT_ANY, &status, 0, NULL); if (error == 0) td->td_retval[1] = status; return (error); } #endif /* COMPAT_43 */ /* * The dirty work is handled by kern_wait(). */ int sys_wait4(struct thread *td, struct wait4_args *uap) { struct rusage ru, *rup; int error, status; if (uap->rusage != NULL) rup = &ru; else rup = NULL; error = kern_wait(td, uap->pid, &status, uap->options, rup); if (uap->status != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&status, uap->status, sizeof(status)); if (uap->rusage != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&ru, uap->rusage, sizeof(struct rusage)); return (error); } int sys_wait6(struct thread *td, struct wait6_args *uap) { struct __wrusage wru, *wrup; siginfo_t si, *sip; idtype_t idtype; id_t id; int error, status; idtype = uap->idtype; id = uap->id; if (uap->wrusage != NULL) wrup = &wru; else wrup = NULL; if (uap->info != NULL) { sip = &si; bzero(sip, sizeof(*sip)); } else sip = NULL; /* * We expect all callers of wait6() to know about WEXITED and * WTRAPPED. 
*/ error = kern_wait6(td, idtype, id, &status, uap->options, wrup, sip); if (uap->status != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&status, uap->status, sizeof(status)); if (uap->wrusage != NULL && error == 0 && td->td_retval[0] != 0) error = copyout(&wru, uap->wrusage, sizeof(wru)); if (uap->info != NULL && error == 0) error = copyout(&si, uap->info, sizeof(si)); return (error); } /* * Reap the remains of a zombie process and optionally return status and * rusage. Asserts and will release both the proctree_lock and the process * lock as part of its work. */ void proc_reap(struct thread *td, struct proc *p, int *status, int options) { struct proc *q, *t; sx_assert(&proctree_lock, SA_XLOCKED); PROC_LOCK_ASSERT(p, MA_OWNED); KASSERT(p->p_state == PRS_ZOMBIE, ("proc_reap: !PRS_ZOMBIE")); mtx_spin_wait_unlocked(&p->p_slock); q = td->td_proc; if (status) *status = KW_EXITCODE(p->p_xexit, p->p_xsig); if (options & WNOWAIT) { /* * Only poll, returning the status. Caller does not wish to * release the proc struct just yet. */ PROC_UNLOCK(p); sx_xunlock(&proctree_lock); return; } PROC_LOCK(q); sigqueue_take(p->p_ksi); PROC_UNLOCK(q); /* * If we got the child via a ptrace 'attach', we need to give it back * to the old parent. */ if (p->p_oppid != p->p_pptr->p_pid) { PROC_UNLOCK(p); t = proc_realparent(p); PROC_LOCK(t); PROC_LOCK(p); CTR2(KTR_PTRACE, "wait: traced child %d moved back to parent %d", p->p_pid, t->p_pid); proc_reparent(p, t, false); PROC_UNLOCK(p); pksignal(t, SIGCHLD, p->p_ksi); wakeup(t); cv_broadcast(&p->p_pwait); PROC_UNLOCK(t); sx_xunlock(&proctree_lock); return; } PROC_UNLOCK(p); /* * Remove other references to this process to ensure we have an * exclusive reference. 
*/ sx_xlock(PIDHASHLOCK(p->p_pid)); LIST_REMOVE(p, p_hash); sx_xunlock(PIDHASHLOCK(p->p_pid)); LIST_REMOVE(p, p_sibling); reaper_abandon_children(p, true); reaper_clear(p); PROC_LOCK(p); proc_clear_orphan(p); PROC_UNLOCK(p); leavepgrp(p); if (p->p_procdesc != NULL) procdesc_reap(p); sx_xunlock(&proctree_lock); proc_id_clear(PROC_ID_PID, p->p_pid); PROC_LOCK(p); knlist_detach(p->p_klist); p->p_klist = NULL; PROC_UNLOCK(p); /* * Removal from allproc list and process group list paired with * PROC_LOCK which was executed during that time should guarantee * nothing can reach this process anymore. As such further locking * is unnecessary. */ p->p_xexit = p->p_xsig = 0; /* XXX: why? */ PROC_LOCK(q); ruadd(&q->p_stats->p_cru, &q->p_crux, &p->p_ru, &p->p_rux); PROC_UNLOCK(q); /* * Decrement the count of procs running with this uid. */ (void)chgproccnt(p->p_ucred->cr_ruidinfo, -1, 0); /* * Destroy resource accounting information associated with the process. */ #ifdef RACCT if (racct_enable) { PROC_LOCK(p); racct_sub(p, RACCT_NPROC, 1); PROC_UNLOCK(p); } #endif racct_proc_exit(p); /* * Free credentials, arguments, and sigacts. */ proc_unset_cred(p); pargs_drop(p->p_args); p->p_args = NULL; sigacts_free(p->p_sigacts); p->p_sigacts = NULL; /* * Do any thread-system specific cleanups. */ thread_wait(p); /* * Give vm and machine-dependent layer a chance to free anything that * cpu_exit couldn't release while still running in process context. 
*/ vm_waitproc(p); #ifdef MAC mac_proc_destroy(p); #endif KASSERT(FIRST_THREAD_IN_PROC(p), ("proc_reap: no residual thread!")); uma_zfree(proc_zone, p); atomic_add_int(&nprocs, -1); } static int proc_to_reap(struct thread *td, struct proc *p, idtype_t idtype, id_t id, int *status, int options, struct __wrusage *wrusage, siginfo_t *siginfo, int check_only) { struct rusage *rup; sx_assert(&proctree_lock, SA_XLOCKED); PROC_LOCK(p); switch (idtype) { case P_ALL: if (p->p_procdesc == NULL || (p->p_pptr == td->td_proc && (p->p_flag & P_TRACED) != 0)) { break; } PROC_UNLOCK(p); return (0); case P_PID: if (p->p_pid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_PGID: if (p->p_pgid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_SID: if (p->p_session->s_sid != (pid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_UID: if (p->p_ucred->cr_uid != (uid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_GID: if (p->p_ucred->cr_gid != (gid_t)id) { PROC_UNLOCK(p); return (0); } break; case P_JAILID: if (p->p_ucred->cr_prison->pr_id != (int)id) { PROC_UNLOCK(p); return (0); } break; /* * It seems that the thread structures get zeroed out * at process exit. This makes it impossible to * support P_SETID, P_CID or P_CPUID. */ default: PROC_UNLOCK(p); return (0); } if (p_canwait(td, p)) { PROC_UNLOCK(p); return (0); } if (((options & WEXITED) == 0) && (p->p_state == PRS_ZOMBIE)) { PROC_UNLOCK(p); return (0); } /* * This special case handles a kthread spawned by linux_clone * (see linux_misc.c). The linux_wait4 and linux_waitpid * functions need to be able to distinguish between waiting * on a process and waiting on a thread. It is a thread if * p_sigparent is not SIGCHLD, and the WLINUXCLONE option * signifies we want to wait for threads and not processes. 
*/ if ((p->p_sigparent != SIGCHLD) ^ ((options & WLINUXCLONE) != 0)) { PROC_UNLOCK(p); return (0); } if (siginfo != NULL) { bzero(siginfo, sizeof(*siginfo)); siginfo->si_errno = 0; /* * SUSv4 requires that the si_signo value is always * SIGCHLD. Obey it despite the rfork(2) interface * allows to request other signal for child exit * notification. */ siginfo->si_signo = SIGCHLD; /* * This is still a rough estimate. We will fix the * cases TRAPPED, STOPPED, and CONTINUED later. */ if (WCOREDUMP(p->p_xsig)) { siginfo->si_code = CLD_DUMPED; siginfo->si_status = WTERMSIG(p->p_xsig); } else if (WIFSIGNALED(p->p_xsig)) { siginfo->si_code = CLD_KILLED; siginfo->si_status = WTERMSIG(p->p_xsig); } else { siginfo->si_code = CLD_EXITED; siginfo->si_status = p->p_xexit; } siginfo->si_pid = p->p_pid; siginfo->si_uid = p->p_ucred->cr_uid; /* * The si_addr field would be useful additional * detail, but apparently the PC value may be lost * when we reach this point. bzero() above sets * siginfo->si_addr to NULL. */ } /* * There should be no reason to limit resources usage info to * exited processes only. A snapshot about any resources used * by a stopped process may be exactly what is needed. */ if (wrusage != NULL) { rup = &wrusage->wru_self; *rup = p->p_ru; PROC_STATLOCK(p); calcru(p, &rup->ru_utime, &rup->ru_stime); PROC_STATUNLOCK(p); rup = &wrusage->wru_children; *rup = p->p_stats->p_cru; calccru(p, &rup->ru_utime, &rup->ru_stime); } if (p->p_state == PRS_ZOMBIE && !check_only) { proc_reap(td, p, status, options); return (-1); } return (1); } int kern_wait(struct thread *td, pid_t pid, int *status, int options, struct rusage *rusage) { struct __wrusage wru, *wrup; idtype_t idtype; id_t id; int ret; /* * Translate the special pid values into the (idtype, pid) * pair for kern_wait6. The WAIT_MYPGRP case is handled by * kern_wait6() on its own. 
*/ if (pid == WAIT_ANY) { idtype = P_ALL; id = 0; } else if (pid < 0) { idtype = P_PGID; id = (id_t)-pid; } else { idtype = P_PID; id = (id_t)pid; } if (rusage != NULL) wrup = &wru; else wrup = NULL; /* * For backward compatibility we implicitly add flags WEXITED * and WTRAPPED here. */ options |= WEXITED | WTRAPPED; ret = kern_wait6(td, idtype, id, status, options, wrup, NULL); if (rusage != NULL) *rusage = wru.wru_self; return (ret); } static void report_alive_proc(struct thread *td, struct proc *p, siginfo_t *siginfo, int *status, int options, int si_code) { bool cont; PROC_LOCK_ASSERT(p, MA_OWNED); sx_assert(&proctree_lock, SA_XLOCKED); MPASS(si_code == CLD_TRAPPED || si_code == CLD_STOPPED || si_code == CLD_CONTINUED); cont = si_code == CLD_CONTINUED; if ((options & WNOWAIT) == 0) { if (cont) p->p_flag &= ~P_CONTINUED; else p->p_flag |= P_WAITED; PROC_LOCK(td->td_proc); sigqueue_take(p->p_ksi); PROC_UNLOCK(td->td_proc); } sx_xunlock(&proctree_lock); if (siginfo != NULL) { siginfo->si_code = si_code; siginfo->si_status = cont ? SIGCONT : p->p_xsig; } if (status != NULL) *status = cont ? SIGCONT : W_STOPCODE(p->p_xsig); PROC_UNLOCK(p); td->td_retval[0] = p->p_pid; } int kern_wait6(struct thread *td, idtype_t idtype, id_t id, int *status, int options, struct __wrusage *wrusage, siginfo_t *siginfo) { struct proc *p, *q; pid_t pid; int error, nfound, ret; bool report; AUDIT_ARG_VALUE((int)idtype); /* XXX - This is likely wrong! */ AUDIT_ARG_PID((pid_t)id); /* XXX - This may be wrong! */ AUDIT_ARG_VALUE(options); q = td->td_proc; if ((pid_t)id == WAIT_MYPGRP && (idtype == P_PID || idtype == P_PGID)) { PROC_LOCK(q); id = (id_t)q->p_pgid; PROC_UNLOCK(q); idtype = P_PGID; } /* If we don't know the option, just return. 
*/ if ((options & ~(WUNTRACED | WNOHANG | WCONTINUED | WNOWAIT | WEXITED | WTRAPPED | WLINUXCLONE)) != 0) return (EINVAL); if ((options & (WEXITED | WUNTRACED | WCONTINUED | WTRAPPED)) == 0) { /* * We will be unable to find any matching processes, * because there are no known events to look for. * Prefer to return error instead of blocking * indefinitely. */ return (EINVAL); } loop: if (q->p_flag & P_STATCHILD) { PROC_LOCK(q); q->p_flag &= ~P_STATCHILD; PROC_UNLOCK(q); } sx_xlock(&proctree_lock); loop_locked: nfound = 0; LIST_FOREACH(p, &q->p_children, p_sibling) { pid = p->p_pid; ret = proc_to_reap(td, p, idtype, id, status, options, wrusage, siginfo, 0); if (ret == 0) continue; else if (ret != 1) { td->td_retval[0] = pid; return (0); } nfound++; PROC_LOCK_ASSERT(p, MA_OWNED); if ((options & WTRAPPED) != 0 && (p->p_flag & P_TRACED) != 0) { PROC_SLOCK(p); report = ((p->p_flag & (P_STOPPED_TRACE | P_STOPPED_SIG)) && p->p_suspcount == p->p_numthreads && (p->p_flag & P_WAITED) == 0); PROC_SUNLOCK(p); if (report) { CTR4(KTR_PTRACE, "wait: returning trapped pid %d status %#x " "(xstat %d) xthread %d", p->p_pid, W_STOPCODE(p->p_xsig), p->p_xsig, p->p_xthread != NULL ? p->p_xthread->td_tid : -1); report_alive_proc(td, p, siginfo, status, options, CLD_TRAPPED); return (0); } } if ((options & WUNTRACED) != 0 && (p->p_flag & P_STOPPED_SIG) != 0) { PROC_SLOCK(p); report = (p->p_suspcount == p->p_numthreads && ((p->p_flag & P_WAITED) == 0)); PROC_SUNLOCK(p); if (report) { report_alive_proc(td, p, siginfo, status, options, CLD_STOPPED); return (0); } } if ((options & WCONTINUED) != 0 && (p->p_flag & P_CONTINUED) != 0) { report_alive_proc(td, p, siginfo, status, options, CLD_CONTINUED); return (0); } PROC_UNLOCK(p); } /* * Look in the orphans list too, to allow the parent to * collect it's child exit status even if child is being * debugged. * * Debugger detaches from the parent upon successful * switch-over from parent to child. 
At this point due to * re-parenting the parent loses the child to debugger and a * wait4(2) call would report that it has no children to wait * for. By maintaining a list of orphans we allow the parent * to successfully wait until the child becomes a zombie. */ if (nfound == 0) { LIST_FOREACH(p, &q->p_orphans, p_orphan) { ret = proc_to_reap(td, p, idtype, id, NULL, options, NULL, NULL, 1); if (ret != 0) { KASSERT(ret != -1, ("reaped an orphan (pid %d)", (int)td->td_retval[0])); PROC_UNLOCK(p); nfound++; break; } } } if (nfound == 0) { sx_xunlock(&proctree_lock); return (ECHILD); } if (options & WNOHANG) { sx_xunlock(&proctree_lock); td->td_retval[0] = 0; return (0); } PROC_LOCK(q); if (q->p_flag & P_STATCHILD) { q->p_flag &= ~P_STATCHILD; PROC_UNLOCK(q); goto loop_locked; } sx_xunlock(&proctree_lock); error = msleep(q, &q->p_mtx, PWAIT | PCATCH | PDROP, "wait", 0); if (error) return (error); goto loop; } void proc_add_orphan(struct proc *child, struct proc *parent) { sx_assert(&proctree_lock, SX_XLOCKED); KASSERT((child->p_flag & P_TRACED) != 0, ("proc_add_orphan: not traced")); if (LIST_EMPTY(&parent->p_orphans)) { child->p_treeflag |= P_TREE_FIRST_ORPHAN; LIST_INSERT_HEAD(&parent->p_orphans, child, p_orphan); } else { LIST_INSERT_AFTER(LIST_FIRST(&parent->p_orphans), child, p_orphan); } child->p_treeflag |= P_TREE_ORPHANED; } /* * Make process 'parent' the new parent of process 'child'. * Must be called with an exclusive hold of proctree lock. 
*/ void proc_reparent(struct proc *child, struct proc *parent, bool set_oppid) { sx_assert(&proctree_lock, SX_XLOCKED); PROC_LOCK_ASSERT(child, MA_OWNED); if (child->p_pptr == parent) return; PROC_LOCK(child->p_pptr); sigqueue_take(child->p_ksi); PROC_UNLOCK(child->p_pptr); LIST_REMOVE(child, p_sibling); LIST_INSERT_HEAD(&parent->p_children, child, p_sibling); proc_clear_orphan(child); if ((child->p_flag & P_TRACED) != 0) { proc_add_orphan(child, child->p_pptr); } child->p_pptr = parent; if (set_oppid) child->p_oppid = parent->p_pid; }
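Reviewer note: the branch added to exit1() for traced children can be summarized with a small userland model. This is only a sketch of the decision as it reads in the hunk above. The flag values, struct model_proc, enum orphan_action and debugger_exit_model() are all hypothetical names invented for illustration; the kernel's real flag values and data structures differ, and the model does not try to reproduce what ptrace_unsuspend() does to the process flags.

```c
#include <stdbool.h>

/* Hypothetical flag bits for the model; not the kernel's values. */
#define M_TRACED	0x1
#define M_STOPPED_TRACE	0x2
#define M_STOPPED_SIG	0x4

enum orphan_action {
	ACT_KILL,		/* kernel: kern_psignal(q, SIGKILL) */
	ACT_RESUME,		/* kernel: ptrace_unsuspend(q) */
	ACT_LEAVE_RUNNING	/* nothing further to do */
};

/* Hypothetical stand-in for the parts of struct proc the branch touches. */
struct model_proc {
	unsigned flags;
	int queued_sigtrap;	/* count of pending SIGTRAPs */
};

/*
 * Model of what happens to traced child q when its debugger exits.
 * kill_on_exit mirrors the kern.kill_on_debugger_exit sysctl.
 */
static enum orphan_action
debugger_exit_model(struct model_proc *q, bool kill_on_exit)
{
	q->flags &= ~M_TRACED;		/* tracing ends in every case */
	if (kill_on_exit) {
		q->flags &= ~M_STOPPED_TRACE;
		return (ACT_KILL);
	}
	if ((q->flags & (M_STOPPED_TRACE | M_STOPPED_SIG)) != 0) {
		/* kernel: sigqueue_delete_proc(q, SIGTRAP) */
		q->queued_sigtrap = 0;
		return (ACT_RESUME);
	}
	return (ACT_LEAVE_RUNNING);
}
```

One point the model makes visible: with the sysctl set to zero, pending SIGTRAP signals are dropped only on the path where the child was stopped, which is how the hunk is written. Whether that fully matches the manual page wording about dropping queued SIGTRAP signals may be worth confirming with the author.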